Design of a Distant Speech Command Recognition System

Performance Evaluation of an Automatic Distance Speech Recognition System

  • Oh, Yoo-Rhee (Department of Information and Communications, Gwangju Institute of Science and Technology) ;
  • Yoon, Jae-Sam (Department of Information and Communications, Gwangju Institute of Science and Technology) ;
  • Park, Ji-Hoon (Department of Information and Communications, Gwangju Institute of Science and Technology) ;
  • Kim, Min-A (Department of Information and Communications, Gwangju Institute of Science and Technology) ;
  • Kim, Hong-Kook (Department of Information and Communications, Gwangju Institute of Science and Technology) ;
  • Kong, Dong-Geon (Micro Systems Lab., Samsung Advanced Institute of Technology) ;
  • Myung, Hyun (Micro Systems Lab., Samsung Advanced Institute of Technology) ;
  • Bang, Seok-Won (Micro Systems Lab., Samsung Advanced Institute of Technology)
  • Published: 2007.07.11

Abstract

In this paper, we implement an automatic distant speech recognition system for voice-enabled services. We first construct a baseline automatic speech recognition (ASR) system whose acoustic models are trained on utterances recorded with a close-talking microphone. To improve the baseline's performance on distant speech, we adapt the acoustic models to compensate for the spectral differences between microphones and for the environmental mismatch between close-talking and distant speech. Next, we develop a voice activity detection algorithm for distant speech. We compare the baseline system and the developed ASR system on the PBW (Phonetically Balanced Word) 452 task. The developed system reduces the average word error rate (WER) by 30.6% compared with the baseline.
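For readers unfamiliar with the metric, the following is a minimal sketch of how WER and a WER reduction are commonly computed. It is not taken from the paper: the function names are hypothetical, and it assumes that the reported reduction is a relative one (baseline WER minus improved WER, divided by baseline WER), which the abstract does not state explicitly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Assumed definition: fraction of the baseline WER that was removed."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - improved_wer) / baseline_wer
```

For an isolated-word task such as PBW 452, each utterance is a single word, so the WER reduces to the fraction of misrecognized words. Under the assumed definition, a hypothetical baseline WER of 0.20 lowered to 0.10 would be a 50% relative reduction.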

Keywords