Development of a Work Management System Based on Speech and Speaker Recognition

  • Received : 2021.03.16
  • Reviewed : 2021.04.06
  • Published : 2021.06.30

Abstract

A voice interface can not only make daily life more convenient through artificial-intelligence speakers but also improve the working environment of a factory. This paper presents a voice-assisted work management system that supports both speech and speaker recognition. The system can simultaneously provide voice-based machine control and authentication of authorized workers. We applied two speech recognition methods: Google's Speech application programming interface (API) service and Mozilla's DeepSpeech speech-to-text engine. For worker identification, we adopted the SincNet architecture for speaker recognition. We implemented a prototype of the work management system that provides voice control with 26 commands and identifies 100 workers by voice. Worker identification using our model was almost perfect, and the command recognition accuracy was 97.0% with the Google API after post-processing and 92.0% with our DeepSpeech model.
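The post-processing step mentioned above can be approximated by snapping a noisy transcript onto the fixed command vocabulary using the Levenshtein (edit) distance cited in the references. This is a minimal sketch under that assumption; the command names and the distance threshold are illustrative, not the paper's actual 26 commands or tuning.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def match_command(transcript: str, commands: list[str], max_dist: int = 3):
    """Return the command closest to the transcript, or None if nothing is close enough."""
    best = min(commands, key=lambda c: levenshtein(transcript, c))
    return best if levenshtein(transcript, best) <= max_dist else None

# Hypothetical command set for illustration only.
COMMANDS = ["start machine", "stop machine", "open valve", "close valve"]
print(match_command("stap machine", COMMANDS))  # → stop machine
```

Rejecting transcripts beyond a distance threshold (rather than always taking the nearest command) is one plausible way such a system avoids executing misrecognized speech as a machine command.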
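For the speaker-recognition side, the SincNet architecture referenced above constrains its first convolutional layer to band-pass filters built from parametrized sinc functions, learning only each filter's low and high cutoff frequencies. The sketch below constructs one such windowed band-pass kernel with fixed cutoffs; in SincNet itself the cutoffs are trainable parameters, and the specific values here are illustrative.

```python
import numpy as np

def sinc_bandpass(f1_hz: float, f2_hz: float, sr: int, length: int) -> np.ndarray:
    """Band-pass FIR kernel as the difference of two low-pass sinc filters,
    Hamming-windowed, following the SincNet filter parametrization."""
    n = np.arange(-(length // 2), length // 2 + 1)  # symmetric time axis (odd length)
    f1, f2 = f1_hz / sr, f2_hz / sr                 # normalized cutoff frequencies
    # np.sinc(x) = sin(pi x)/(pi x), so this matches 2f·sinc(2πf n) in the paper's notation
    h = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    return h * np.hamming(len(n))

# Example: a 300–3000 Hz filter at a 16 kHz sampling rate (values illustrative).
h = sinc_bandpass(300.0, 3000.0, 16000, 251)
```

Because only two scalars per filter are learned, the layer stays interpretable (each filter is a genuine band-pass over a known frequency range), which is the design rationale SincNet's authors give for using it on raw waveforms.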

Keywords

Acknowledgement

This paper was supported by Research Fund, Kumoh National Institute of Technology (2018-104-079).

References

  1. Google, Cloud Speech-to-Text, see https://cloud.google.com/speech-to-text
  2. Herve Bourlard, Nelson Morgan, Connectionist Speech Recognition: A Hybrid Approach, The Kluwer International Series in Engineering and Computer Science; v. 247, Kluwer Academic Publishers, 1994.
  3. R. Parente, N. Kock, John Sonsini, "An Analysis of the Implementation and Impact of Speech-recognition Technology in the Healthcare Sector," Perspectives in Health Information Management, Vol. 1, 2004.
  4. Kulyukin, V. Human-Robot Interaction Through Gesture-Free Spoken Dialogue. Autonomous Robots 16, pp. 239-257 (2004). https://doi.org/10.1023/B:AURO.0000025789.33843.6d
  5. Norberto Pires, J. (2005), "Robot by Voice: Experiments on Commanding an Industrial Robot Using the Human Voice", Industrial Robot, Vol. 32 No. 6, pp. 505-511. https://doi.org/10.1108/01439910510629244
  6. Adam Rogowski, Industrially oriented voice control system, Robotics and Computer-Integrated Manufacturing, Elsevier. Vol. 28, Issue 3, June 2012, pp. 303-315. https://doi.org/10.1016/j.rcim.2011.09.010
  7. K. Zinchenko, C. Wu, K. Song, "A Study on Speech Recognition Control for a Surgical Robot," in IEEE Transactions on Industrial Informatics, Vol. 13, No. 2, pp. 607-615, April 2017. https://doi.org/10.1109/TII.2016.2625818
  8. Ismail, Ahmed; Abdlerazek, Samir; El-Henawy, Ibrahim M. 2020. "Development of Smart Healthcare System Based on Speech Recognition Using Support Vector Machine and Dynamic Time Warping" Sustainability 12, No. 6: 2403. https://doi.org/10.3390/su12062403
  9. Anwer, Saba; Waris, Asim; Sultan, Hajrah; Butt, Shahid I.; Zafar, Muhammad H.; Sarwar, Moaz; Niazi, Imran K.; Shafique, Muhammad; Pujari, Amit N. 2020. "Eye and Voice-Controlled Human Machine Interface System for Wheelchairs Using Image Gradient Approach" Sensors 20, No. 19: 5510. https://doi.org/10.3390/s20195510
  10. Ohneiser, Oliver; Jauer, Malte; Rein, Jonathan R.; Wallace, Matt. 2018. "Faster Command Input Using the Multimodal Controller Working Position "TriControl"" Aerospace 5, No. 2: 54. https://doi.org/10.3390/aerospace5020054
  11. Kaczmarek, Wojciech; Panasiuk, Jaroslaw; Borys, Szymon; Banach, Patryk. 2020. "Industrial Robot Control by Means of Gestures and Voice Commands in Off-Line and On-Line Mode" Sensors 20, No. 21: 6358. https://doi.org/10.3390/s20216358
  12. Microsoft, Azure Speech to Text, see https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/
  13. Ye-Ji Kim, Yong-Seong Moon, Seong-Hun Jeong, Dae-Han Jeong, Tae-Hyong Kim, "Voice Recognition and Control System Based on Deep Learning for Smart Lighting", KSC2017, Korea Information Science Society, 2017.12.
  14. Yong-Seong Moon, Ye-Ji Kim, Seong-Hun Jeong, Yu-Hee Kim, Chang-Yeol Lee, Tae-Hyong Kim, "Dialog Management for Voice Recognition based Light Control", KCC2018, Korea Information Science Society, 2018.06.
  15. C. Shayamunda, T. D. Ramotsoela, G. P. Hancke, "Biometric Authentication System for Industrial Applications using Speaker Recognition," IECON 2020 The 46th Annual Conference of the IEEE Industrial Electronics Society, Singapore, 2020, pp. 4459-4464.
  16. E. K. Wang, X. Liu, C. -M. Chen, S. Kumari, M. Shojafar, M. S. Hossain, "Voice-Transfer Attacking on Industrial Voice Control Systems in 5G-Aided IIoT Domain," in IEEE Transactions on Industrial Informatics, 2020. doi: 10.1109/TII.2020.3023677.
  17. Mozilla, Project DeepSpeech, see https://github.com/mozilla/DeepSpeech, 2016.
  18. Mirco Ravanelli, Yoshua Bengio, "Speaker Recognition from Raw Waveform with SincNet", arXiv:1808.00158, 2018.
  19. Awni Y. Hannun, Carl Case, J. Casper, Bryan Catanzaro, G. Diamos, Erich Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates, A. Ng, "Deep Speech: Scaling up end-to-end speech recognition", arXiv:1412.5567, 2014.
  20. Wit.ai, Inc, wit.ai: Build Natural Language Experiences, see https://wit.ai/
  21. DeepSpeech, "DeepSpeech Model", https://deepspeech.readthedocs.io/en/v0.9.3/DeepSpeech.html
  22. Mark Heath, NAudio, see https://github.com/naudio/NAudio
  23. librosa, A python package for music and audio analysis, see https://github.com/librosa/librosa
  24. kenlm, KenLM: Faster and Smaller Language Model Queries, see https://github.com/kpu/kenlm
  25. Wikipedia, Levenshtein distance, see https://en.wikipedia.org/wiki/Levenshtein_distance