Frequency-Cepstral Features for Bag of Words Based Acoustic Context Awareness

  • Sangwook Park (Department of Electrical, Electronics, and Radio Engineering, Korea University)
  • Woohyun Choi (Department of Electrical, Electronics, and Radio Engineering, Korea University)
  • Hanseok Ko (Department of Electrical, Electronics, and Radio Engineering, Korea University)
  • Received : 2013.12.19
  • Accepted : 2014.06.02
  • Published : 2014.07.31

Abstract

Among acoustic signal analysis tasks, acoustic context awareness is one of the most formidable in terms of complexity, since it requires a sophisticated understanding of individual acoustic events. Conventional context-awareness methods detect or recognize individual acoustic events and then infer the surrounding context from those decisions. In practice, however, this approach can perform poorly, because events may occur simultaneously or may be acoustically so similar that they are difficult to distinguish from one another. In particular, babble noise in a bus or subway environment confuses the context-awareness task, since babble sounds much the same in any environment. This paper therefore proposes a frequency-cepstral feature vector to mitigate this confusion in a binary situation-awareness task: bus versus metro. With a Support Vector Machine (SVM) as the classifier, the proposed feature scheme is shown to outperform the conventional scheme.
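The paper's own implementation is not reproduced here; as a rough illustration of the pipeline the abstract describes (frame-level cepstral features fed to an SVM for a binary bus/metro decision), the following is a minimal sketch in Python. Plain MFCCs stand in for the proposed frequency-cepstral features, and the librosa/scikit-learn choices, function names, and parameters are all illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the described pipeline, assuming librosa and scikit-learn.
# MFCCs substitute for the paper's frequency-cepstral features; every name and
# parameter below is an illustrative assumption.
import numpy as np
import librosa
from sklearn.svm import SVC

SR = 16000  # assumed sampling rate

def clip_features(y, sr=SR, n_mfcc=13):
    """Frame-level cepstral features for one clip, averaged into a single
    clip-level vector (a simple stand-in for the proposed features)."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return mfcc.mean(axis=1)

# Synthetic stand-in data: in practice these would be recorded bus/metro clips.
rng = np.random.default_rng(0)
clips = [rng.standard_normal(SR) * (1.0 + 0.5 * (i % 2)) for i in range(20)]
labels = [i % 2 for i in range(20)]  # 0 = bus, 1 = metro (hypothetical labels)

X = np.vstack([clip_features(y) for y in clips])
clf = SVC(kernel="rbf").fit(X, labels)  # binary bus/metro classifier
print(clf.predict(X[:4]))               # context decisions per clip
```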

Acoustic context awareness is the task of judging, from the variety of sound sources that occur, what kind of place one is in or what kind of event is taking place; it is one step more complex than acoustic event detection or recognition. Existing context-awareness techniques infer the current situation on the basis of acoustic event detection or recognition. Such an approach, however, makes accurate situation judgments difficult in real environments, where multiple sources occur simultaneously or similar sources occur. In particular, buses and subways are hard to tell apart because of passenger babble noise. To overcome this problem, this paper studies a Bag of Words based context-awareness algorithm that can recognize bus and subway situations in which similar acoustic events occur, and proposes a feature vector for codebook generation. The effectiveness of the proposed feature vector is verified through experiments using a Support Vector Machine.
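The Korean abstract additionally mentions a Bag of Words representation built from a codebook over the proposed features. As a hedged sketch of that idea, not the paper's actual procedure, the fragment below clusters pooled frame features into a codebook with k-means and encodes each clip as a normalized codeword histogram, which would then replace the averaged features fed to the SVM in the previous sketch; the codebook size and all names are assumptions.

```python
# Minimal Bag-of-Words sketch, assuming scikit-learn's KMeans for codebook
# generation; the codebook size and all names are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(pooled_frames, n_codewords=64, seed=0):
    """Cluster frame-level feature vectors (frames x dims) into codewords."""
    return KMeans(n_clusters=n_codewords, random_state=seed,
                  n_init=10).fit(pooled_frames)

def bow_histogram(codebook, clip_frames):
    """Encode one clip as a normalized histogram of codeword occurrences."""
    words = codebook.predict(clip_frames)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Toy usage with random frame features (64 frames x 13 dims per clip):
rng = np.random.default_rng(0)
per_clip = [rng.standard_normal((64, 13)) for _ in range(10)]
codebook = build_codebook(np.vstack(per_clip))
X_bow = np.vstack([bow_histogram(codebook, f) for f in per_clip])
print(X_bow.shape)  # (10, 64): one BoW vector per clip, ready for the SVM
```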

