Dysarthric speaker identification with different degrees of dysarthria severity using deep belief networks

  • Farhadipour, Aref (Department of Media Engineering, IRI Broadcast University) ;
  • Veisi, Hadi (Faculty of New Sciences and Technologies, University of Tehran) ;
  • Asgari, Mohammad (Department of Media Engineering, IRI Broadcast University) ;
  • Keyvanrad, Mohammad Ali (Department of Computer Engineering and Information Technology, Amirkabir University of Technology)
  • Received : 2017.11.14
  • Accepted : 2018.04.10
  • Published : 2018.10.01

Abstract

Dysarthria is a degenerative disorder of the central nervous system that affects the control of articulation and pitch; consequently, it alters the uniqueness of the sound produced by a speaker, making dysarthric speaker recognition a challenging task. In this paper, a feature-extraction method based on deep belief networks is presented for identifying speakers suffering from dysarthria. The effectiveness of the proposed method is demonstrated and compared with that of well-known Mel-frequency cepstral coefficient features. For classification, a multilayer perceptron neural network with two structures is proposed. Our evaluations using the Universal Access speech database produced promising results that outperformed other baseline methods. In addition, speaker identification under both text-dependent and text-independent conditions is explored. The highest accuracy achieved by the proposed system is 97.3%.
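As a rough illustration of the pipeline the abstract describes, the minimal sketch below stacks two restricted Boltzmann machines as a DBN-style feature extractor and feeds the learned representation to a multilayer perceptron classifier. It uses scikit-learn in place of the DeeBNet MATLAB toolbox referenced by the paper; the layer sizes, hyperparameters, and random stand-in data for MFCC frames are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch only: a DBN-style feature extractor (two greedily
# pretrained RBMs) followed by an MLP classifier for speaker identification.
# Layer sizes, learning rates, and the synthetic data are assumptions,
# not values reported in the paper.
import numpy as np
from sklearn.neural_network import BernoulliRBM, MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
n_speakers = 8
X = rng.random((800, 39))             # stand-in for 39-dim MFCC frames
y = rng.integers(0, n_speakers, 800)  # stand-in speaker labels

dbn_mlp = Pipeline([
    ("scale", MinMaxScaler()),  # BernoulliRBM expects inputs in [0, 1]
    ("rbm1", BernoulliRBM(n_components=256, learning_rate=0.05,
                          n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=128, learning_rate=0.05,
                          n_iter=20, random_state=0)),
    ("mlp", MLPClassifier(hidden_layer_sizes=(64,), max_iter=300,
                          random_state=0)),
])
dbn_mlp.fit(X, y)  # RBMs pretrain layer by layer; the MLP is trained on top
print("training accuracy: %.3f" % dbn_mlp.score(X, y))
```

In the paper's setting, the feature frames would come from the Universal Access speech recordings and the labels would index the dysarthric speakers rather than the random placeholders used here.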
