A Study on Speech Recognition Using the HM-Net Topology Design Algorithm Based on Decision Tree State-clustering

결정트리 상태 클러스터링에 의한 HM-Net 구조결정 알고리즘을 이용한 음성인식에 관한 연구

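  • 정현열 (School of Electronics and Information Engineering, Yeungnam University) ;
  • 정호열 (School of Electronics and Information Engineering, Yeungnam University) ;
  • 오세진 (Division of Digital Information and Communication, Daegu Science College) ;
  • 황철준 (Division of Digital Information and Communication, Daegu Science College) ;
  • 김범국 (Division of Digital Information and Communication, Daegu Science College)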
  • Published : 2002.02.01

Abstract

In this paper, we study speech recognition using the HM-Net (Hidden Markov Network) topology design algorithm based on decision tree state clustering, with the aim of improving the performance of acoustic models. Korean has many allophonic and grammatical rules compared to other languages, so we investigate the allophonic variations defined in Korean phonetics and construct a phoneme question set for the phonetic decision tree. The basic idea of the HM-Net topology design algorithm is to keep the basic structure of the SSS (Successive State Splitting) algorithm while splitting again the states of pre-constructed context-dependent acoustic models. That is, a phonetic decision tree is generated at each state position of the models using the phoneme question set, and the state sequences of the context-dependent acoustic models are iteratively retrained using the PDT-SSS (Phonetic Decision Tree-based SSS) algorithm. To verify the effectiveness of the algorithm, we carried out speech recognition experiments on the 452 words of the Center for Korean Language Engineering (KLE452) and on 200 sentences from an air flight reservation task (YNU200). The experimental results show that, after state splitting, recognition accuracy improves progressively as the number of states varies, in the phoneme, word, and continuous speech recognition experiments alike. We obtained average phoneme and word recognition accuracies of 71.5% and 99.2%, respectively, with 2,000 states, and an average continuous speech recognition accuracy of 91.6% with 800 states. We also carried out word recognition experiments using HTK (HMM Toolkit), which performs state tying, in order to compare it with the parameter sharing of the HM-Net topology design algorithm. In these word recognition experiments, the HM-Net topology design algorithm achieved on average 4.0% higher recognition accuracy than the context-dependent acoustic models generated by HTK, which indicates its effectiveness.

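As basic research toward improving acoustic model performance in Korean speech recognition, this paper studies speech recognition using a topology design algorithm for the HM-Net (Hidden Markov Network) based on decision tree state clustering. Compared with other languages, Korean has many grammatical rules and allophones, so we surveyed the various allophones defined in Korean phonetics and built a phoneme question set for the phonetic decision tree. The idea behind the HM-Net topology design algorithm in this paper is to retain the structure of the SSS (Successive State Splitting) algorithm while splitting again the states of context-dependent acoustic models prepared in advance. That is, a phonetic decision tree is generated from the phoneme question set at each state position of the models, and the state sequences of the context-dependent acoustic models are retrained by the PDT-SSS (Phonetic Decision Tree-based SSS) algorithm. To confirm the effectiveness of the HM-Net topology design algorithm based on decision tree state clustering, we ran speech recognition experiments on the 452 words of the Center for Korean Language Engineering (KLE) and on the YNU200 sentences related to air flight reservation. The experiments confirmed that, after state splitting, the recognition rate improves gradually as the number of states varies in phoneme, word, and continuous speech recognition. With 2,000 states, the average phoneme and word recognition rates were 71.5% and 99.2%, respectively, and with 800 states the average continuous speech recognition rate was 91.6%. In addition, to compare the parameter sharing of the HM-Net topology design algorithm, we ran word recognition experiments using HTK, which performs state tying. The results showed an average improvement of 4.0% in recognition rate over the context-dependent acoustic models built with HTK, confirming the effectiveness of the HM-Net topology design algorithm based on decision tree state clustering applied in this paper.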
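
The abstract describes growing a phonetic decision tree at each state position by asking questions from a phoneme question set. The exact splitting criterion of PDT-SSS is not given here, so the following is only a minimal sketch of the general idea, assuming the common likelihood-gain criterion with a single one-dimensional Gaussian per cluster. The function names, the question names (`L_Nasal`, `R_Vowel`), and the toy feature values are all hypothetical and are not taken from the paper.

```python
import math


def gaussian_log_likelihood(values, var_floor=1e-4):
    """Log-likelihood of 1-D values under a Gaussian fitted to them.

    The variance is floored (an assumption of this sketch) so that tiny
    clusters do not produce an infinite likelihood.
    """
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    sq_dev = sum((v - mean) ** 2 for v in values)
    var = max(sq_dev / n, var_floor)
    return -0.5 * (n * math.log(2.0 * math.pi * var) + sq_dev / var)


def best_question(samples, questions, min_count=1):
    """Return (name, gain) of the question with the largest likelihood gain.

    samples:   list of (left_phone, right_phone, feature_value)
    questions: dict mapping a name to (side, phone_set), side in {"left", "right"}
    """
    base = gaussian_log_likelihood([s[2] for s in samples])
    best_name, best_gain = None, 0.0
    for name, (side, phones) in questions.items():
        idx = 0 if side == "left" else 1
        yes = [s[2] for s in samples if s[idx] in phones]
        no = [s[2] for s in samples if s[idx] not in phones]
        if len(yes) < min_count or len(no) < min_count:
            continue  # skip splits that leave one side without enough data
        gain = gaussian_log_likelihood(yes) + gaussian_log_likelihood(no) - base
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name, best_gain


# Hypothetical toy data: the feature value depends on the left context only.
samples = [
    ("m", "a", 1.0), ("n", "i", 1.2), ("m", "k", 0.9), ("n", "s", 1.1),
    ("p", "a", 5.0), ("t", "i", 5.2), ("p", "k", 4.9), ("t", "s", 5.1),
]
questions = {
    "L_Nasal": ("left", {"m", "n", "ng"}),
    "R_Vowel": ("right", {"a", "i", "u"}),
}

name, gain = best_question(samples, questions)
print(name, round(gain, 2))
```

In this toy set the left-context question separates the two groups of values, so it is the one selected. A full system would apply such a choice repeatedly at each state position and retrain the models after each split, which this sketch does not attempt.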
