Development of a Korean Speech Recognition Platform (ECHOS)


  • Published: 2005.11.01

Abstract

We introduce a Korean speech recognition platform (ECHOS) developed for education and research purposes. ECHOS lowers the entry barrier to speech recognition research and can be used as a reference engine by providing elementary speech recognition modules. It has a simple object-oriented architecture, implemented in the C++ language with the standard template library (STL). The input of ECHOS is digital speech data sampled at 8 or 16 kHz. Its output is the 1-best recognition result, N-best recognition results, and a word graph. The recognition engine is composed of MFCC/PLP feature extraction, HMM-based acoustic modeling, n-gram language modeling, and finite state network (FSN)- and lexical tree-based search algorithms. It can handle various tasks from isolated word recognition to large vocabulary continuous speech recognition. We compare the performance of ECHOS and the hidden Markov model toolkit (HTK) for validation. In an FSN-based task, ECHOS shows word accuracy similar to HTK, while the recognition time is doubled because of the object-oriented implementation. For an 8000-word continuous speech recognition task, using a lexical tree search algorithm different from the one used in HTK, ECHOS increases the word error rate by 40% relative but reduces the recognition time to half.



References

  1. HTK Home page. http://htk.eng.cam.ac.uk
  2. CMU Sphinx: Open Source Speech Recognition. http://www.speech.cs.cmu.edu/sphinx/Sphinx.html
  3. Automatic Speech Recognition: Software. http://www.isip.msstate.edu/projects/speech/software/
  4. Multipurpose Large Vocabulary Continuous Speech Recognition Engine Julius. http://www.ar.media.kyoto-u.ac.jp/members/ian/doc
  5. ezCSR. http://speech.chungbuk.ac.kr/~owkwon/srhome/index.html
  6. O.-W. Kwon, H.-R. Kim, C. D. Yoo, B.-W. Kim, and Y.-J. Lee, 'Design of a Korean speech recognition platform,' Malsori (말소리), 51, 2004
  7. Standard Template Library Programmer's Guide. http://www.sgi.com/tech/stl/
  8. R. Miller, Practical UML: A Hands-On Introduction for Developers
  9. L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition, (Prentice-Hall, 1993)
  10. F. Jelinek, Statistical Methods for Speech Recognition (Language, Speech, and Communication), (MIT Press, 1999)
  11. S.B. Davis and P. Mermelstein, 'Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences,' IEEE Trans. ASSP, 28, 357-366, Aug. 1980 https://doi.org/10.1109/TASSP.1980.1163420
  12. H. Hermansky, 'Perceptual linear predictive (PLP) analysis of speech,' Journal of the Acoustical Society of America, 87 (4), 1738-1752, 1990 https://doi.org/10.1121/1.399423
  13. Aurora, Distributed Speech Recognition. http://portal.etsi.org/stq/kta/DSR/dsr.asp
  14. X. Huang, A. Acero, and H.-W. Hon, Spoken Language Processing, 648-650, (Prentice Hall, 2001)
  15. M.K. Ravishankar, Efficient Algorithms for Speech Recognition, (PhD Thesis, CMU, 1996)