Phonetic Question Set Generation Algorithm

음소 질의어 집합 생성 알고리즘

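  • 김성아 (Speech Information Processing Laboratory, Department of Computer Science, Korea University)
  • 육동석 (Speech Information Processing Laboratory, Department of Computer Science, Korea University)
  • 권오일 (Hyundai Autonet Co., Ltd.)
  • Published: 2004.02.01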

Abstract

Because training data are scarce in large-vocabulary continuous speech recognition, similar context-dependent phones are often clustered with decision trees so that they can share data. Building the decision trees, and using them to predict unseen triphones, requires a phonetic question set. The phonetic question set, which groups phones with similar co-articulation effects into categories, is usually written by phonetic or linguistic experts. This knowledge-based approach, however, may reduce the homogeneity of the resulting clusters. Moreover, the experts must revise the question set whenever the language or the PLU (phone-like unit) of the recognition system changes. We therefore propose a data-driven method that generates the phonetic question set automatically. Because the proposed method derives the phone categories from the distribution of the speech data, it does not depend on the language or the PLU, and it may improve the homogeneity of the clusters. In large-vocabulary speech recognition experiments, the proposed algorithm reduced the error rate by 14.3%.

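A phonetic question set is a grouping of phones that show similar co-articulation effects in context. It is used during the training of a speech recognition system, when the states of HMMs (hidden Markov models) are clustered with decision trees. Until now, phonetic question sets have mostly been written by hand by phoneticians or linguists. Such knowledge-based phonetic questions have two drawbacks: they depend on the language or the phone-like unit (PLU), and they may degrade the homogeneity within the generated clusters. To address these problems, this paper proposes an algorithm that automatically generates a phonetic question set, regardless of the language or the PLU, based on the similarity between phones measured from speech data. In experiments, the error rate of a recognizer that used the phonetic questions generated by the proposed method decreased by about 14.3%, which indicates that a data-driven phonetic question set is effective for state clustering.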
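
The abstract does not state which similarity measure or clustering procedure the algorithm uses, so the following is only a minimal sketch of the general idea of deriving phone categories from data. It assumes, purely for illustration, that each phone is summarized by a mean feature vector, that similarity is squared Euclidean distance between those means, and that categories come from bottom-up (agglomerative) clustering. The names `toy_means`, `distance`, `centroid`, and `generate_question_set` are hypothetical and do not come from the paper.

```python
from itertools import combinations

# Hypothetical per-phone mean feature vectors (made-up numbers, not real data).
toy_means = {
    "m": [1.0, 0.0],
    "n": [1.1, 0.1],
    "s": [5.0, 5.0],
    "z": [5.2, 4.9],
    "a": [10.0, 0.0],
}


def distance(a, b):
    """Squared Euclidean distance between two vectors (assumed similarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(phones, means):
    """Unweighted average of the mean vectors of the phones in one cluster."""
    dim = len(next(iter(means.values())))
    return [sum(means[p][d] for p in phones) / len(phones) for d in range(dim)]


def generate_question_set(means):
    """Cluster phones bottom-up; every merged cluster becomes one question.

    A question here is simply a set of phones, to be read as
    "does the neighbouring phone belong to this set?".
    """
    clusters = [frozenset([p]) for p in sorted(means)]
    questions = []
    while len(clusters) > 1:
        # Pick the pair of clusters whose centroids are closest.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: distance(
                centroid(pair[0], means), centroid(pair[1], means)
            ),
        )
        merged = a | b
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        # A set that contains every phone cannot split anything, so skip it.
        if len(merged) < len(means):
            questions.append(merged)
    return questions


if __name__ == "__main__":
    for q in generate_question_set(toy_means):
        print(sorted(q))
```

On this toy input, the phones with nearby means are grouped first, and each intermediate cluster is kept as a candidate question. A real system would more plausibly compare full acoustic model distributions than bare mean vectors; that choice is left open here because the abstract does not specify it.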
