Gender Analysis in Elderly Speech Signal Processing

  • Lee, JiYeoun (Department of Biomedical Engineering, Jungwon University)
  • Received : 2018.09.04
  • Accepted : 2018.10.20
  • Published : 2018.10.28

Abstract

Aging changes the vocal cords, which can shift the frequency of speech, so the speech signals of the elderly can be automatically distinguished from normal speech signals through various analyses. The purpose of this study is to improve the elderly voice recognition performance of existing smart medical systems and, by offering a convenient voice-based interface, to provide an easily accessible tool for elderly and disabled people who risk being excluded from a rapidly changing technological society. As sex analysis, the sex of the subjects was reported and equal numbers of female and male voice samples were used. As gender analysis, the experiments targeted the voices of the elderly rather than voices of all ages. Finally, we applied the method of rethinking standards and reference models to reduce sex and gender bias. The voices of 10 Korean women and 10 Korean men aged 70 to 80 were used in this study. When the F0 values extracted directly from the waveform were compared with the F0 values extracted by the TF32 and WaveSurfer speech analysis programs, WaveSurfer analyzed the F0 of elderly voices better than TF32. Nevertheless, a voice analysis program designed specifically for elderly voices is still needed. In conclusion, analyzing the voices of the elderly is expected to improve the speech recognition and synthesis performance of existing smart medical systems.
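The abstract does not state how F0 was read from the waveform or how the two programs were scored against it. As a minimal sketch only, the comparison could be set up as below, assuming an autocorrelation-based F0 estimate for a single voiced frame and mean absolute error as the score. Both choices, and the names `synth_tone`, `estimate_f0_autocorr`, and `mean_abs_error`, are illustrative assumptions rather than the study's method, and the input is a synthetic tone, not elderly speech data.

```python
import math


def synth_tone(f0_hz, sample_rate, duration_s):
    """Generate a pure sine tone as a stand-in for one voiced speech frame."""
    n = int(sample_rate * duration_s)
    return [math.sin(2.0 * math.pi * f0_hz * i / sample_rate) for i in range(n)]


def estimate_f0_autocorr(samples, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate F0 (Hz) as sample_rate / lag of the autocorrelation peak.

    The search range f0_min..f0_max is an assumed range for adult voices.
    """
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]  # remove DC offset before correlating
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        score = sum(x[i] * x[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def mean_abs_error(reference, estimates):
    """Average absolute deviation (Hz) of estimates from reference F0 values."""
    return sum(abs(r - e) for r, e in zip(reference, estimates)) / len(reference)


# Known synthetic F0 values play the role of the manually measured reference.
sample_rate = 16000
reference_f0 = [110.0, 180.0, 220.0]
estimated_f0 = [
    estimate_f0_autocorr(synth_tone(f0, sample_rate, 0.1), sample_rate)
    for f0 in reference_f0
]
print("estimated F0 (Hz):", [round(f, 1) for f in estimated_f0])
print("mean absolute error (Hz):", round(mean_abs_error(reference_f0, estimated_f0), 2))
```

In an actual comparison, `reference_f0` would hold the values measured by hand from the waveform, and `mean_abs_error` would be computed once per program (for example, once for TF32 output and once for WaveSurfer output); the program with the lower error tracks the reference more closely.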
