Removal of Less Relevant Candidates from Multiple Speech Recognition Results Using Positional Accuracy Based on Levenshtein Distance

Removal of Heterogeneous Candidates Using Positional Accuracy Based on Levenshtein Distance on Isolated n-best Recognition

  • Yun, Young-Sun (Department of Information and Communication Engineering, Hannam University)
  • Received : 2011.08.24
  • Reviewed : 2011.11.03
  • Published : 2011.11.30

Abstract

Many isolated word recognition systems produce irrelevant words among their recognition results because they rely only on acoustic information or on a small amount of language information. In this paper, I propose a word similarity measure, based on Levenshtein distance, for selecting (or removing) less common words from the candidates. Word similarity is computed from positional accuracy, which reflects frequency information together with character alignment information. This paper also discusses several techniques for improving the selection of disparate words: different loss values, phone accuracy based on confusion information, candidate weights by rank order, and partial comparisons. Through experiments, I found that the proposed methods remove heterogeneous words effectively without loss of performance.
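
As a rough illustration of the general idea (not the paper's positional accuracy measure, whose exact definition is not given in this abstract), the sketch below scores each n-best candidate by its mean normalized Levenshtein similarity to the other candidates and drops those that fall below a threshold. The function names, the normalization by the longer string, and the threshold value are all assumptions made for this example.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Standard edit distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                       # delete x
                curr[j - 1] + 1,                   # insert y
                prev[j - 1] + (0 if x == y else 1),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def filter_candidates(candidates: List[str], threshold: float = 0.4) -> List[str]:
    """Keep candidates whose mean similarity to the others reaches threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    if len(candidates) < 2:
        return list(candidates)
    kept = []
    for i, cand in enumerate(candidates):
        others = [c for j, c in enumerate(candidates) if j != i]
        mean_sim = sum(similarity(cand, o) for o in others) / len(others)
        if mean_sim >= threshold:
            kept.append(cand)
    return kept


if __name__ == "__main__":
    n_best = ["station", "stations", "nation", "zebra"]
    print(filter_candidates(n_best))
```

In this toy n-best list, "zebra" shares almost no aligned characters with the other candidates, so its mean similarity is far below the threshold and it is removed, while the three mutually similar words are kept. The refinements named in the abstract (loss values, confusion-based phone accuracy, rank weights, partial comparison) are not modeled here.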
