Dysarthric speaker identification with different degrees of dysarthria severity using deep belief networks

  • Farhadipour, Aref (Department of Media Engineering, IRI Broadcast University) ;
  • Veisi, Hadi (Faculty of New Sciences and Technologies, University of Tehran) ;
  • Asgari, Mohammad (Department of Media Engineering, IRI Broadcast University) ;
  • Keyvanrad, Mohammad Ali (Department of Computer Engineering and Information Technology, Amirkabir University of Technology)
  • Received : 2017.11.14
  • Accepted : 2018.04.10
  • Published : 2018.10.01

Abstract

Dysarthria is a degenerative disorder of the central nervous system that affects the control of articulation and pitch; consequently, it alters the uniqueness of the sound produced by a speaker, making dysarthric speaker recognition a challenging task. In this paper, a feature-extraction method based on deep belief networks is presented for identifying speakers suffering from dysarthria. The effectiveness of the proposed method is demonstrated and compared with that of well-known Mel-frequency cepstral coefficient features. For classification, a multilayer perceptron neural network with two structures is proposed. Our evaluations using the Universal Access speech database produced promising results that outperformed other baseline methods. In addition, speaker identification under both text-dependent and text-independent conditions is explored. The highest accuracy achieved by the proposed system is 97.3%.
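As a rough illustration of the pipeline the abstract describes, the minimal sketch below stacks two restricted Boltzmann machines as a DBN-style feature extractor and feeds the learned representation to a multilayer perceptron classifier. It uses scikit-learn in place of the DeeBNet MATLAB toolbox referenced by the paper; the layer sizes, hyperparameters, and random stand-in data for MFCC frames are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch only: a DBN-style feature extractor (two greedily
# pretrained RBMs) followed by an MLP classifier for speaker identification.
# Layer sizes, learning rates, and the synthetic data are assumptions,
# not values reported in the paper.
import numpy as np
from sklearn.neural_network import BernoulliRBM, MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
n_speakers = 8
X = rng.random((800, 39))             # stand-in for 39-dim MFCC frames
y = rng.integers(0, n_speakers, 800)  # stand-in speaker labels

dbn_mlp = Pipeline([
    ("scale", MinMaxScaler()),  # BernoulliRBM expects inputs in [0, 1]
    ("rbm1", BernoulliRBM(n_components=256, learning_rate=0.05,
                          n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=128, learning_rate=0.05,
                          n_iter=20, random_state=0)),
    ("mlp", MLPClassifier(hidden_layer_sizes=(64,), max_iter=300,
                          random_state=0)),
])
dbn_mlp.fit(X, y)  # RBMs pretrain layer by layer; the MLP is trained on top
print("training accuracy: %.3f" % dbn_mlp.score(X, y))
```

In the paper's setting, the feature frames would come from the Universal Access speech recordings and the labels would index the dysarthric speakers rather than the random placeholders used here.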
