Audio-Visual Fusion for Sound Source Localization and Improved Attention

  • Lee, Byoung-Gi (Center for Cognitive Robotics Research, Korea Institute of Science and Technology) ;
  • Choi, Jong-Suk (Center for Cognitive Robotics Research, Korea Institute of Science and Technology) ;
  • Yoon, Sang-Suk (Center for Intelligent Robotics, Korea Institute of Science and Technology) ;
  • Choi, Mun-Taek (Center for Intelligent Robotics, Korea Institute of Science and Technology) ;
  • Kim, Mun-Sang (Center for Intelligent Robotics, Korea Institute of Science and Technology) ;
  • Kim, Dai-Jin (Dept. of Computer Science and Engineering, POSTECH)
  • Received : 2010.12.10
  • Accepted : 2011.04.13
  • Published : 2011.07.01

Abstract

Service robots are equipped with various sensors such as vision cameras, sonar sensors, laser scanners, and microphones. Although these sensors each have their own functions, some of them can be made to work together to perform more complicated tasks. Audio-visual fusion is a typical and powerful combination of audio and video sensors, because audio information is complementary to visual information and vice versa. Human beings also depend mainly on visual and auditory information in their daily lives. In this paper, we conduct two studies using audio-visual fusion: one on enhancing the performance of sound source localization, and the other on improving robot attention through sound localization and face detection.
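As a rough illustration of the kind of audio-visual attention loop described above, the sketch below estimates the sound direction from the time difference of arrival (TDOA) between two microphones using GCC-PHAT, and then snaps the robot's attention to a nearby face detection when one is available. The microphone spacing, sampling rate, gating threshold, and fusion rule are illustrative assumptions for this sketch, not the configuration or algorithm used in the paper.

```python
import numpy as np

SOUND_SPEED = 343.0   # speed of sound in air [m/s]
MIC_DISTANCE = 0.2    # assumed spacing between the two microphones [m]
FS = 16000            # assumed sampling rate [Hz]


def gcc_phat(sig, ref, fs=FS, max_tau=MIC_DISTANCE / SOUND_SPEED):
    """Estimate the time delay (s) of `sig` relative to `ref` via GCC-PHAT."""
    n = sig.size + ref.size
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cc = np.fft.irfft(R / (np.abs(R) + 1e-12), n=n)   # PHAT weighting
    max_shift = int(fs * max_tau)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / float(fs)


def sound_azimuth(left, right):
    """Convert the inter-microphone TDOA into an azimuth angle [deg]."""
    tau = gcc_phat(left, right)
    return np.degrees(np.arcsin(np.clip(tau * SOUND_SPEED / MIC_DISTANCE, -1.0, 1.0)))


def attention_target(audio_azimuth, face_azimuths, gate_deg=15.0):
    """Pick the bearing the robot should turn to.

    If a detected face lies within `gate_deg` of the audio estimate, trust the
    (more precise) visual bearing; otherwise fall back to the audio bearing.
    """
    for face_az in face_azimuths:
        if abs(face_az - audio_azimuth) <= gate_deg:
            return face_az
    return audio_azimuth


if __name__ == "__main__":
    # Simulate a source about 20 degrees to the right by delaying the left channel.
    rng = np.random.default_rng(0)
    src = rng.standard_normal(FS)
    delay = int(round(FS * MIC_DISTANCE * np.sin(np.radians(20.0)) / SOUND_SPEED))
    left, right = np.roll(src, delay), src
    az = sound_azimuth(left, right)
    print("audio azimuth: %.1f deg" % az)
    print("attention target: %.1f deg" % attention_target(az, face_azimuths=[22.0]))
```

The fusion rule in this sketch simply prefers the visual bearing whenever a detected face agrees with the audio estimate within the gate, since a face detector typically localizes a speaker far more precisely than a two-microphone TDOA estimate; this is only one plausible way to combine the two cues.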

Keywords

Audio-Vision Fusion; Sound Source Localization; Human Attention; Robot Tracking

References

  1. Nakadai, K., Hidai, K., Okuno, H. G. and Kitano, H., 2001, "Real-Time Multiple Speaker Tracking by Multi-Modal Integration for Mobile Robots," in Proc. Eurospeech 2001, pp. 1193-1196.
  2. Lim, Y. and Choi, J., 2009, "Speaker Selection and Tracking in a Cluttered Environment with Audio and Visual Information," IEEE Trans. Consumer Electronics, Vol. 55(3), pp. 1581-1589. https://doi.org/10.1109/TCE.2009.5278030
  3. Hornstein, J., Lopes, M., Santos-Victor, J. and Lacerda, F., 2006, "Sound Localization for Humanoid Robots - Building Audio-Motor Maps based on the HRTF," in Proc. IEEE/RSJ IROS 2006, pp. 1170-1176.
  4. Chan, V., 2009, "Audio-Visual Sensor Fusion for Object Localization," INE NewsLetter, 8 June.
  5. Zabih, R. and Woodfill, J., 1994, "Non-Parametric Local Transforms for Computing Visual Correspondence," in Proc. 3rd European Conference on Computer Vision, pp. 151-158.
  6. Froba, B. and Ernst, A., 2004, "Face Detection with the Modified Census Transform," in Proc. IEEE International Conference on Automatic Face and Gesture Recognition, pp. 91-96.
  7. Jun, B.-J. and Kim, D., 2007, "Robust Real-Time Face Detection Using Face Certainty Map," in Proc. ICB 2007, pp. 29-38.
  8. Haas, H., 1972, "The Influence of a Single Echo on the Audibility of Speech," Journal of the Audio Engineering Society, Vol. 20, pp. 146-159.
  9. Lee, B.-G., Choi, J.-S., Kim, D. and Kim, M., 2010, "Verification of Sound Source Localization in Reverberation Room and its Real Time Adaptation Using Visual Information," in Proc. ARSO 2010, pp. 176-181.

Cited by

  1. Interaction Intent Analysis of Multiple Persons using Nonverbal Behavior Features, Vol. 19, No. 8, 2013. https://doi.org/10.5302/J.ICROS.2013.13.1893