Sound Event Detection Based on Deep Neural Networks

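  • 정석환 (Department of Electrical and Electronic Convergence System Engineering, Keimyung University)
  • 정용주 (Department of Electronic Engineering, Keimyung University)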
  • Received : 2019.02.07
  • Accepted : 2019.04.15
  • Published : 2019.04.30

Abstract

In this paper, several deep neural network architectures were applied to sound event detection, and their performance was compared on a common audio database. The FNN, CNN, RNN and CRNN were each implemented with hyper-parameters optimized for both the database and the architecture of the network. Among the implemented networks, the CRNN performed best under all test conditions, followed by the CNN. Although the RNN has the advantage of tracking temporal correlations in audio signals, it performed poorly compared with the CNN and the CRNN.
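To make the CRNN idea concrete, the sketch below traces how the shape of a spectrogram-like input could change as it passes through convolutional blocks, a recurrent layer, and a per-frame output layer. It is only an illustration under stated assumptions: the input size (256 frames, 40 frequency bands), the number of filters, the pooling factors, the hidden size, and the number of event classes are hypothetical values chosen for the example, not the hyper-parameters used in the paper, and all function names are invented here.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class FeatureMap:
    """Shape of an activation tensor: (channels, time frames, frequency bins)."""
    channels: int
    frames: int
    freq_bins: int


def conv_block(x: FeatureMap, filters: int, freq_pool: int) -> FeatureMap:
    """'Same'-padded convolution followed by max pooling along frequency only.

    Pooling only the frequency axis keeps one output per input frame,
    so the time resolution needed for frame-level detection is preserved.
    """
    return FeatureMap(filters, x.frames, x.freq_bins // freq_pool)


def to_sequence(x: FeatureMap) -> Tuple[int, int]:
    """Flatten channels and frequency into one feature vector per frame."""
    return (x.frames, x.channels * x.freq_bins)


def recurrent_layer(seq: Tuple[int, int], hidden_units: int) -> Tuple[int, int]:
    """Recurrent layer that returns its hidden state at every frame."""
    frames, _ = seq
    return (frames, hidden_units)


def output_layer(seq: Tuple[int, int], n_classes: int) -> Tuple[int, int]:
    """Per-frame output layer: one activity score per event class."""
    frames, _ = seq
    return (frames, n_classes)


def crnn_output_shape(
    frames: int = 256,                        # assumed input length
    freq_bands: int = 40,                     # assumed number of frequency bands
    filters: int = 96,                        # assumed filters per conv block
    freq_pools: Sequence[int] = (5, 4, 2),    # assumed pooling factors
    hidden_units: int = 96,                   # assumed recurrent hidden size
    n_classes: int = 16,                      # assumed number of event classes
) -> Tuple[int, int]:
    """Trace the tensor shape through a hypothetical CRNN."""
    x = FeatureMap(channels=1, frames=frames, freq_bins=freq_bands)
    for pool in freq_pools:
        x = conv_block(x, filters, pool)
    seq = to_sequence(x)
    seq = recurrent_layer(seq, hidden_units)
    return output_layer(seq, n_classes)


if __name__ == "__main__":
    print(crnn_output_shape())
```

Under these assumed settings the frequency axis shrinks from 40 bins to 1 across the three convolutional blocks while the 256 frames are kept, so the recurrent layer receives one feature vector per frame. A CNN-only model would replace the recurrent layer with further feed-forward layers, and an RNN-only model would skip the convolutional blocks and feed the input frames to the recurrent layer directly.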

Fig. 1 CNN architecture

Fig. 2 RNN architecture

Fig. 3 CRNN architecture

Fig. 4 Convergence of the loss function and accuracy over epochs as the learning rate changes

Table 1. Performance of the CRNN as the learning rate changes

Table 2. Performance comparison between the FNN, CNN, RNN and CRNN
