Performance Improvement of Speaker Recognition Using Enhanced Feature Extraction in Glottal Flow Signals and Multiple Feature Parameter Combination
Kang, Jihoon; Kim, Youngil; Jeong, Sangbae;
In this paper, we use source mel-frequency cepstral coefficients (SMFCCs), skewness, and kurtosis extracted from glottal flow signals to improve speaker recognition performance. Because the high-band magnitude response of glottal flow signals is generally somewhat flat, the SMFCCs are extracted using only the response below a predefined cutoff frequency. The extracted SMFCCs, skewness, and kurtosis are concatenated with conventional feature parameters. Dimensionality reduction by principal component analysis (PCA) and linear discriminant analysis (LDA) is then applied so that performance can be compared with conventional systems under equivalent conditions. The proposed recognition system outperformed the conventional system in large-scale speaker recognition experiments. The performance improvement was especially noticeable for small numbers of Gaussian mixtures.
speaker recognition; glottal flow; skewness; kurtosis; PCA; LDA
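The concatenation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`skewness`, `kurtosis`, `augment_features`) are hypothetical, the moments are the standard population (biased) estimates, and kurtosis is reported in its non-excess form (a Gaussian gives 3), which the abstract does not specify. The glottal flow frame is assumed to have been estimated already, and the SMFCC, PCA, and LDA stages are omitted.

```python
def _central_moment(frame, order):
    """Population central moment of the given order for one frame."""
    n = len(frame)
    mean = sum(frame) / n
    return sum((x - mean) ** order for x in frame) / n


def skewness(frame):
    """Third standardized moment; 0.0 for a constant frame (assumption)."""
    m2 = _central_moment(frame, 2)
    if m2 == 0:
        return 0.0
    return _central_moment(frame, 3) / m2 ** 1.5


def kurtosis(frame):
    """Fourth standardized moment, non-excess form (assumption)."""
    m2 = _central_moment(frame, 2)
    if m2 == 0:
        return 0.0
    return _central_moment(frame, 4) / m2 ** 2


def augment_features(conventional, glottal_frame):
    """Append glottal-flow skewness and kurtosis to a conventional
    feature vector (e.g. MFCCs) for the same frame."""
    return list(conventional) + [skewness(glottal_frame), kurtosis(glottal_frame)]


# Hypothetical values: a 2-dimensional conventional vector and a toy glottal frame.
augmented = augment_features([1.0, 2.0], [-1.0, 1.0, -1.0, 1.0])
```

The augmented vector has two more dimensions than the conventional one, which is why a dimensionality reduction step is needed before comparing systems under equivalent conditions.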