Development of Age Classification Deep Learning Algorithm Using Korean Speech

  • So, Soonwon (Department of Biomedical Engineering, Hanyang University) ;
  • You, Sung Min (Department of Biomedical Engineering, Hanyang University) ;
  • Kim, Joo Young (Department of Biomedical Engineering, Hanyang University) ;
  • An, Hyun Jun (Department of Biomedical Engineering, Hanyang University) ;
  • Cho, Baek Hwan (Department of Medical Device Management and Research, Sungkyunkwan University) ;
  • Yook, Sunhyun (Department of Biomedical Engineering, Hanyang University) ;
  • Kim, In Young (Department of Biomedical Engineering, Hanyang University)
  • Received : 2018.02.12
  • Accepted : 2018.03.01
  • Published : 2018.04.30

Abstract

In modern society, speech recognition technology is emerging as an important means of identification in electronic commerce, forensics, law enforcement, and other systems. In this study, we develop an age classification algorithm that extracts only Mel-Frequency Cepstral Coefficients (MFCCs), which express the characteristics of Korean speech, and applies them to deep learning. We extract 13th-order MFCCs from Korean speech data to construct a dataset, and use a deep artificial neural network to classify males into their 20s, 30s, and 50s and females into their 20s, 40s, and 50s. Our final model achieved classification accuracies of 78.6% for males and 71.9% for females.
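The abstract names two stages, 13th-order MFCC extraction and a deep neural network classifier, but does not give their parameters. The following is a minimal NumPy sketch of both stages under stated assumptions that do not come from the paper: a 16 kHz sampling rate, 0.97 pre-emphasis, 25 ms Hamming-windowed frames with a 10 ms hop, a 512-point FFT, 26 mel filters, frame-averaged MFCCs as one 13-dimensional vector per utterance, and a ReLU multilayer perceptron with a 3-way softmax output. The names `mfcc`, `MLP`, `MALE_GROUPS`, and `FEMALE_GROUPS` are hypothetical. The network is untrained here, so the sketch only illustrates the data flow and shapes.

```python
import numpy as np

# Hypothetical label sets, taken from the age groups named in the abstract.
MALE_GROUPS = ["20s", "30s", "50s"]
FEMALE_GROUPS = ["20s", "40s", "50s"]


def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (HTK-style formula)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n)) * np.sqrt(2.0 / n)
    basis[0] *= np.sqrt(0.5)  # rescale the DC row so the transform is orthonormal
    return x @ basis.T


def mfcc(signal, sr=16000, n_mfcc=13, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, preemph=0.97):
    """Return an (n_frames, n_mfcc) array of MFCCs for a 1-D signal (assumed settings)."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts high frequencies before framing.
    emph = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    flen = int(round(frame_len * sr))
    fstep = int(round(frame_step * sr))
    n_frames = 1 + max(0, len(emph) - flen) // fstep
    # Zero-pad so the last frame is complete.
    emph = np.pad(emph, (0, max(0, (n_frames - 1) * fstep + flen - len(emph))))
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = emph[idx] * np.hamming(flen)
    # Periodogram estimate of each frame's power spectrum.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_energies = np.log(np.maximum(energies, 1e-10))  # avoid log(0)
    return dct_ii(log_energies, n_mfcc)


class MLP:
    """Hypothetical feedforward network: ReLU hidden layers, softmax output."""

    def __init__(self, sizes, seed=0):
        rng = np.random.default_rng(seed)
        # He-style initialization suits ReLU layers.
        self.weights = [rng.normal(0.0, np.sqrt(2.0 / m), (m, n))
                        for m, n in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.zeros(n) for n in sizes[1:]]

    def predict_proba(self, x):
        h = x
        for w, b in zip(self.weights[:-1], self.biases[:-1]):
            h = np.maximum(0.0, h @ w + b)
        logits = h @ self.weights[-1] + self.biases[-1]
        logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        return p / p.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    sig = 0.5 * np.sin(2 * np.pi * 150 * t)  # 1 s synthetic tone standing in for speech
    feats = mfcc(sig, sr)                    # (frames, 13)
    x = feats.mean(axis=0)                   # one 13-dim vector per utterance
    model = MLP([13, 64, 64, len(MALE_GROUPS)])
    probs = model.predict_proba(x[None, :])
    print(feats.shape, MALE_GROUPS[int(probs.argmax())])
```

Because the abstract reports separate accuracies for males and females, the sketch assumes one three-class model per gender. Training the weights (for example with a cross-entropy loss) is not shown.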

