Sound recognition and tracking system design using robust sound extraction section

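  • Kim Woo-jun (김우준), Department of Electronic Engineering, Dong-A University
  • Kim Young-seop (김영섭), Signal Works Co., Ltd. ((주)시그널웍스)
  • Lee Kwang-seok (이광석), Department of Electronic Engineering, Gyeongnam National University of Science and Technology
  • Received: 2016.07.12
  • Reviewed: 2016.08.24
  • Published: 2016.08.31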

Abstract

This paper presents the design of a system that recognizes and localizes sound sources produced in abnormal situations. The system first detects a sound segment that is robust to ambient environmental sound, then uses the signal within that segment for recognition and localization. To detect the segment, the short-term weighted average delta energy is computed from the received audio signal and passed through a low-pass filter; the segment robust to background sound is then defined by comparing the filter output values. For recognition, data from the detected segment are used to generate recognition information, which is trained and recognized with an HMM (Hidden Markov Model), a conventional recognition method. For sound signals that contain ambient background sound, this approach achieves a recognition rate 3.94% higher than HMM recognition preceded by conventional segment detection based on signal energy. In addition, based on the recognition result, the location estimated from the TDOA (Time Delay of Arrival) between the signals within the segment agrees 97.44% with the angle of the actual source location.
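The abstract does not say how the TDOA is converted to an angle, so the following is only a minimal sketch of one common approach, not the authors' implementation. It assumes a two-microphone pair, a far-field source, a speed of sound of 343 m/s, and a plain time-domain cross-correlation peak as the delay estimator. The function names, the 0.2 m microphone spacing, and the 48 kHz sampling rate are all hypothetical values chosen for illustration.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def estimate_delay_samples(ref, other, max_lag):
    """Return the lag (in samples) by which `other` trails `ref`.

    The lag is taken at the peak of the cross-correlation, searched
    over the range -max_lag..+max_lag.
    """
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Sum ref[i] * other[i + lag] over indices where both samples exist.
        start = max(0, -lag)
        stop = min(len(ref), len(other) - lag)
        score = sum(ref[i] * other[i + lag] for i in range(start, stop))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def tdoa_to_angle_deg(delay_s, mic_spacing_m):
    """Far-field arrival angle (degrees from broadside) for a microphone pair."""
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp: noise can push |ratio| above 1
    return math.degrees(math.asin(ratio))


# Synthetic check: white noise reaching the second microphone 14 samples late.
rng = random.Random(0)
fs = 48_000          # sampling rate in Hz (hypothetical)
mic_spacing = 0.2    # metres between the two microphones (hypothetical)
true_delay = 14      # samples

source = [rng.gauss(0.0, 1.0) for _ in range(2048)]
mic_a = source
mic_b = [0.0] * true_delay + source[:-true_delay]

# The physically possible delay is bounded by spacing / speed of sound.
max_lag = math.ceil(mic_spacing / SPEED_OF_SOUND * fs)
lag = estimate_delay_samples(mic_a, mic_b, max_lag)
angle = tdoa_to_angle_deg(lag / fs, mic_spacing)
print(f"estimated lag: {lag} samples, angle: {angle:.2f} degrees")
```

Restricting the lag search to the physically possible range keeps the cost low and rejects spurious peaks. In a system like the one described, the two input signals would presumably be the portions of each channel that fall inside the detected segment rather than the full recording.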
