CNN based Sound Event Detection Method using NMF Preprocessing in Background Noise Environment

Jang, Bumsuk;Lee, Sang-Hyun;

doi:10.7236/IJASC.2020.9.2.20

International journal of advanced smart convergence

Volume 9 Issue 2 / Pages 20-27 / 2020 / 2288-2847 (pISSN) / 2288-2855 (eISSN)

The Institute of Internet, Broadcasting and Communication (한국인터넷방송통신학회)


Jang, Bumsuk (BS SOFT Co., LTD.) ;
Lee, Sang-Hyun (Department of Computer Engineering, Honam University)

Received : 2020.03.17
Accepted : 2020.03.25
Published : 2020.06.30


Abstract

Sound event detection in real-world environments suffers from interference by non-stationary, time-varying noise. This paper presents an adaptive noise reduction method for sound event detection based on non-negative matrix factorization (NMF), and proposes a deep learning model that integrates a convolutional neural network (CNN) with NMF. To improve the separation quality of the NMF, the method includes a noise update technique that learns and adapts to the characteristics of the current noise in real time. The noise update technique analyzes the sparsity and activity of the noise basis at the present time and decides whether to perform update training based on the noise candidate group obtained at every frame in the preceding noise reduction stage. Noise basis ranks selected as candidates for update training are updated in real time with discriminative NMF training. This NMF was applied to both a CNN and a hidden Markov model (HMM) to improve sound event detection performance. Because the CNN showed the more pronounced improvement, the method can be widely used in CNN-based algorithms that operate on sound sources.
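
As a rough illustration of the NMF noise reduction idea described in the abstract, the following Python sketch separates a magnitude spectrogram using fixed event and noise dictionaries and a soft mask. It is not the authors' implementation: the function name `nmf_denoise`, the dictionaries `W_event` and `W_noise`, the Euclidean multiplicative update, and the toy data are all assumptions made for this example, and the real-time noise update and discriminative training steps are not modeled.

```python
import numpy as np

def nmf_denoise(V, W_event, W_noise, n_iter=100, eps=1e-10):
    """Suppress noise in a magnitude spectrogram V (freq x frames).

    W_event and W_noise are hypothetical pre-trained dictionaries
    (freq x rank). They are kept fixed; only the activations H are
    estimated, using multiplicative updates for the Euclidean cost.
    """
    W = np.hstack([W_event, W_noise])            # combined dictionary
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)     # keeps H non-negative
    k = W_event.shape[1]
    V_event = W_event @ H[:k]                    # event part of the model
    V_noise = W_noise @ H[k:]                    # noise part of the model
    mask = V_event / (V_event + V_noise + eps)   # Wiener-like soft mask
    return mask * V, H

# Toy data (assumed): event energy in the low bins, noise in the high bins.
W_event = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)[:, None]
W_noise = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)[:, None]
H_true = np.array([[1.0, 2.0, 3.0],
                   [0.5, 0.5, 0.5]])
V = np.hstack([W_event, W_noise]) @ H_true

denoised, H = nmf_denoise(V, W_event, W_noise)
```

In a pipeline of the kind the abstract outlines, the masked spectrogram `denoised` would be the input passed on to the CNN or HMM classifier. An adaptive variant would also re-estimate `W_noise` over time, which this sketch leaves out.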


