References
- Lu, L., Jiang, H., & Zhang, H. (2001). A robust audio classification and segmentation method, in Proc. ACM International Conference on Multimedia, 203-211.
- Xu, M., et al. (2003). Creating audio keywords for event detection in soccer video, in Proc. IEEE International Conference on Multimedia and Expo, 281-284.
- Cheng, W., Chu, W., & Wu, J. (2003). Semantic context detection based on hierarchical audio models, in Proc. ACM SIGMM International Workshop on Multimedia Information Retrieval, 109-115.
- Portêlo, J., et al. (2009). Non-speech audio event detection, in Proc. International Conference on Acoustics, Speech and Signal Processing, 1973-1976.
- Heittola, T., et al. (2013). Context-dependent sound event detection, EURASIP Journal on Audio, Speech, and Music Processing, 11-13.
- Lee, H., Pham, P., Largman, Y., & Ng, A. Y. (2009). Unsupervised feature learning for audio classification using convolutional deep belief networks, in Proc. Advances in Neural Information Processing Systems, 1096-1104.
- Kons, Z., & Toledo-Ronen, O. (2013). Audio event classification using deep neural networks, in Proc. INTERSPEECH, 1482-1486.
- Ballan, L., et al. (2009). Deep networks for audio event classification in soccer videos, in Proc. International Conference on Multimedia and Expo, 474-477.
- Bengio, Y., & LeCun, Y. (2007). Scaling learning algorithms towards AI, Large-scale Kernel Machines, Vol. 34, No. 5, 321-360.
- Barker, J., et al. (2012). The PASCAL CHiME speech separation and recognition challenge, Computer Speech & Language, Vol. 27, No. 3, 621-633. https://doi.org/10.1016/j.csl.2012.10.004
- Downie, S., et al. (2010). The Music Information Retrieval Evaluation eXchange: Some observations and insights, Advances in Music Information Retrieval. Springer, 93-115.
- Malkin, R. G. (2007). Multimodal Technologies for Perception of Humans. Springer, 323-330.
- Smeaton, A. F., et al. (2006). Evaluation campaigns and TRECVid, in Proc. ACM International Workshop on Multimedia Information Retrieval, 321-330.
- Vincent, E., et al. (2012). The signal separation evaluation campaign (2007-2010): Achievements and remaining challenges, Signal Processing, Vol. 92, No. 8, 1928-1936. https://doi.org/10.1016/j.sigpro.2011.10.007
- Larochelle, H., et al. (2007). An empirical evaluation of deep architectures on problems with many factors of variation, in Proc. International Conference on Machine Learning, 473-480.
- Dahl, G. E., Sainath, T. N., & Hinton, G. E. (2013). Improving deep neural networks for LVCSR using rectified linear units and dropout, in Proc. International Conference on Acoustics, Speech and Signal Processing, 8609-8613.
- Bottou, L. (2004). Advanced Lectures on Machine Learning. Springer, 146-168.
- Salamon, J., Jacoby, C., & Bello, J. P. (2014). A dataset and taxonomy for urban sound research, in Proc. ACM International Conference on Multimedia, 1041-1044.
- Young, S., et al. (1999). The HTK Book. Cambridge, U.K.: Entropic.
- Bergstra, J., et al. (2010). Theano: A CPU and GPU math expression compiler, in Proc. Python for Scientific Computing Conference, Vol. 4, p. 3.