Segmentation of Continuous Speech based on PCA of Feature Vectors

Segmentation of Continuous Speech Using Principal Component Analysis

  • 신옥근 (Division of Automation and Information Engineering, Korea Maritime University)
  • Published : 2000.02.01

Abstract

In speech corpus generation and speech recognition, it is sometimes necessary to segment the input speech data without any prior knowledge. One way to accomplish this kind of segmentation, often called blind segmentation or acoustic segmentation, is to find the boundaries that minimize the Euclidean distances among the feature vectors within each segment. However, this metric alone is prone to errors because the feature vectors fluctuate and vary within a segment. In this paper, we introduce principal component analysis (PCA) to take the trend of the feature vectors into account: the proposed distance measure is the distance between each feature vector and its projection onto the principal components. The proposed measure is applied in the LBDP (level building dynamic programming) algorithm in a continuous speech segmentation experiment. The results were promising, with a 3-6% reduction in the deletion rate compared with the pure Euclidean measure.
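
The contrast between the two measures can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`leading_axis`, `euclidean_distortion`, `pca_residual_distortion`) are hypothetical, only the first principal component is used, it is estimated by power iteration, and squared distances are summed. All of these are assumptions made for illustration.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def leading_axis(centered, iterations=100):
    """Unit vector along the first principal component, by power iteration.

    Assumption: a single component and a fixed iteration count are used
    purely for illustration.
    """
    dim = len(centered[0])
    # Scatter matrix S = sum_i x_i x_i^T of the mean-centered vectors.
    scatter = [[sum(x[a] * x[b] for x in centered) for b in range(dim)]
               for a in range(dim)]
    # Start from the centered vector with the largest norm.
    axis = list(max(centered, key=lambda x: sum(c * c for c in x)))
    for _ in range(iterations):
        norm = math.sqrt(sum(c * c for c in axis))
        if norm == 0.0:
            return [0.0] * dim  # no variation, so no principal axis
        unit = [c / norm for c in axis]
        axis = [sum(row[b] * unit[b] for b in range(dim)) for row in scatter]
    norm = math.sqrt(sum(c * c for c in axis))
    return [c / norm for c in axis] if norm > 0.0 else [0.0] * dim


def euclidean_distortion(vectors):
    """Sum of squared Euclidean distances from each vector to the mean."""
    mu = mean_vector(vectors)
    return sum(sum((v - m) ** 2 for v, m in zip(vec, mu)) for vec in vectors)


def pca_residual_distortion(vectors):
    """Sum of squared distances from each vector to its projection onto
    the line through the mean along the first principal component."""
    mu = mean_vector(vectors)
    centered = [[v - m for v, m in zip(vec, mu)] for vec in vectors]
    axis = leading_axis(centered)
    total = 0.0
    for x in centered:
        along = sum(c * a for c, a in zip(x, axis))
        # Pythagoras: residual^2 = |x|^2 - (component along the axis)^2.
        total += max(sum(c * c for c in x) - along * along, 0.0)
    return total


# Toy segment whose feature vectors drift steadily along one direction.
drifting = [[float(t), 2.0 * t] for t in range(5)]
print(euclidean_distortion(drifting))
print(pca_residual_distortion(drifting))
```

For this toy segment the Euclidean distortion is 50, while the PCA residual is zero up to rounding. A steady trend inside a segment is therefore not penalized by the projection-based measure, which is the behavior the abstract motivates.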

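One approach to blind segmentation, which extracts phoneme boundaries from the speech signal or its feature vectors alone, without prior knowledge of the phonemes, is to find the boundaries that minimize the distances among the feature vectors within each phoneme. The Euclidean distance is often used as the distance measure in such methods, but because the feature vectors vary considerably even within a single phoneme, the plain Euclidean measure alone is not effective for extracting phoneme boundaries. This paper proposes using principal component analysis to obtain a distance between feature vectors that reflects the overall trend of the feature vectors belonging to a phoneme. In this method, the distance between each feature vector and its projection onto the principal components is used as the measure. The proposed measure was applied in the LBDP algorithm in an experiment on extracting phoneme boundaries from continuous speech. The results show that deletion errors were reduced by about 3-6% compared with the plain Euclidean measure, indicating that the method can be useful.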
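
The boundary search itself can be sketched as a level-by-level dynamic program. This is a simplified illustration, not the paper's LBDP formulation: the names `segment_cost` and `level_building_segmentation` are hypothetical, the number of segments is assumed to be known in advance, and the cost shown is the plain Euclidean baseline (sum of squared distances to the segment mean). A different segment cost could be passed through the `cost` parameter.

```python
def segment_cost(frames, start, end):
    """Baseline cost of frames[start:end]: the sum of squared Euclidean
    distances to the segment mean."""
    segment = frames[start:end]
    count = len(segment)
    mean = [sum(column) / count for column in zip(*segment)]
    return sum(sum((v - m) ** 2 for v, m in zip(frame, mean))
               for frame in segment)


def level_building_segmentation(frames, num_segments, cost=segment_cost):
    """Split frames into num_segments contiguous segments with minimum
    total cost. Returns (interior boundary indices, total cost)."""
    n = len(frames)
    if not 1 <= num_segments <= n:
        raise ValueError("num_segments must be between 1 and len(frames)")
    inf = float("inf")
    # best[level][end]: minimum cost of covering frames[:end] with
    # exactly `level` segments; back[level][end]: start of the last one.
    best = [[inf] * (n + 1) for _ in range(num_segments + 1)]
    back = [[0] * (n + 1) for _ in range(num_segments + 1)]
    best[0][0] = 0.0
    for level in range(1, num_segments + 1):
        for end in range(level, n + 1):
            for start in range(level - 1, end):
                if best[level - 1][start] == inf:
                    continue
                total = best[level - 1][start] + cost(frames, start, end)
                if total < best[level][end]:
                    best[level][end] = total
                    back[level][end] = start
    # Trace the segment starts back from the last level.
    starts = []
    end = n
    for level in range(num_segments, 0, -1):
        end = back[level][end]
        starts.append(end)
    starts.reverse()
    return starts[1:], best[num_segments][n]


# Toy one-dimensional "feature vectors" with three obvious plateaus.
frames = [[0.0], [0.1], [0.0], [5.0], [5.1], [5.0], [9.0], [9.1]]
boundaries, total = level_building_segmentation(frames, 3)
print(boundaries, total)
```

On the toy input the recovered interior boundaries are the indices 3 and 6, where the plateaus change. The triple loop evaluates every candidate segment at every level, which is acceptable for a sketch but would usually be restricted by minimum and maximum segment lengths in practice.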
