Development of a Work Management System Based on Speech and Speaker Recognition

  • Received : 2021.03.16
  • Reviewed : 2021.04.06
  • Published : 2021.06.30

Abstract

A voice interface can not only make daily life more convenient through artificial-intelligence speakers but also improve the working environment of a factory. This paper presents a voice-assisted work management system that supports both speech and speaker recognition. The system can simultaneously provide voice-based machine control and authentication of authorized workers. We applied two speech recognition methods: Google's Speech application programming interface (API) service and Mozilla's DeepSpeech speech-to-text engine. For worker identification, we adopted the SincNet architecture for speaker recognition. We implemented a prototype of the work management system that provides voice control with 26 commands and identifies 100 workers by voice. Worker identification using our model was almost perfect, and the command recognition accuracy was 97.0% with the Google API after post-processing and 92.0% with our DeepSpeech model.
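The post-processing step mentioned above can be approximated by snapping a noisy transcript onto the fixed command vocabulary using the Levenshtein (edit) distance cited in the references. This is a minimal sketch under that assumption; the command names and the distance threshold are illustrative, not the paper's actual 26 commands or tuning.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def match_command(transcript: str, commands: list[str], max_dist: int = 3):
    """Return the command closest to the transcript, or None if nothing is close enough."""
    best = min(commands, key=lambda c: levenshtein(transcript, c))
    return best if levenshtein(transcript, best) <= max_dist else None

# Hypothetical command set for illustration only.
COMMANDS = ["start machine", "stop machine", "open valve", "close valve"]
print(match_command("stap machine", COMMANDS))  # → stop machine
```

Rejecting transcripts beyond a distance threshold (rather than always taking the nearest command) is one plausible way such a system avoids executing misrecognized speech as a machine command.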
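For the speaker-recognition side, the SincNet architecture referenced above constrains its first convolutional layer to band-pass filters built from parametrized sinc functions, learning only each filter's low and high cutoff frequencies. The sketch below constructs one such windowed band-pass kernel with fixed cutoffs; in SincNet itself the cutoffs are trainable parameters, and the specific values here are illustrative.

```python
import numpy as np

def sinc_bandpass(f1_hz: float, f2_hz: float, sr: int, length: int) -> np.ndarray:
    """Band-pass FIR kernel as the difference of two low-pass sinc filters,
    Hamming-windowed, following the SincNet filter parametrization."""
    n = np.arange(-(length // 2), length // 2 + 1)  # symmetric time axis (odd length)
    f1, f2 = f1_hz / sr, f2_hz / sr                 # normalized cutoff frequencies
    # np.sinc(x) = sin(pi x)/(pi x), so this matches 2f·sinc(2πf n) in the paper's notation
    h = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    return h * np.hamming(len(n))

# Example: a 300–3000 Hz filter at a 16 kHz sampling rate (values illustrative).
h = sinc_bandpass(300.0, 3000.0, 16000, 251)
```

Because only two scalars per filter are learned, the layer stays interpretable (each filter is a genuine band-pass over a known frequency range), which is the design rationale SincNet's authors give for using it on raw waveforms.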

Keywords

Acknowledgement

This paper was supported by Research Fund, Kumoh National Institute of Technology (2018-104-079).

References

  1. Google, Cloud Speech-to-Text, see https://cloud.google.com/speech-to-text
  2. Herve Bourlard, Nelson Morgan, Connectionist Speech Recognition: A Hybrid Approach, The Kluwer International Series in Engineering and Computer Science; v. 247, Kluwer Academic Publishers, 1994.
  3. R. Parente, N. Kock, John Sonsini, "An Analysis of the Implementation and Impact of Speech-recognition Technology in the Healthcare Sector," Perspectives in Health Information Management, Vol. 1, 2004.
  4. Kulyukin, V. Human-Robot Interaction Through Gesture-Free Spoken Dialogue. Autonomous Robots 16, pp. 239-257 (2004). https://doi.org/10.1023/B:AURO.0000025789.33843.6d
  5. Norberto Pires, J. (2005), "Robot by Voice: Experiments on Commanding an Industrial Robot Using the Human Voice", Industrial Robot, Vol. 32 No. 6, pp. 505-511. https://doi.org/10.1108/01439910510629244
  6. Adam Rogowski, Industrially oriented voice control system, Robotics and Computer-Integrated Manufacturing, Elsevier. Vol. 28, Issue 3, June 2012, pp. 303-315. https://doi.org/10.1016/j.rcim.2011.09.010
  7. K. Zinchenko, C. Wu, K. Song, "A Study on Speech Recognition Control for a Surgical Robot," in IEEE Transactions on Industrial Informatics, Vol. 13, No. 2, pp. 607-615, April 2017. https://doi.org/10.1109/TII.2016.2625818
  8. Ismail, Ahmed; Abdlerazek, Samir; El-Henawy, Ibrahim M. 2020. "Development of Smart Healthcare System Based on Speech Recognition Using Support Vector Machine and Dynamic Time Warping" Sustainability 12, No. 6: 2403. https://doi.org/10.3390/su12062403
  9. Anwer, Saba; Waris, Asim; Sultan, Hajrah; Butt, Shahid I.; Zafar, Muhammad H.; Sarwar, Moaz; Niazi, Imran K.; Shafique, Muhammad; Pujari, Amit N. 2020. "Eye and Voice-Controlled Human Machine Interface System for Wheelchairs Using Image Gradient Approach" Sensors 20, No. 19: 5510. https://doi.org/10.3390/s20195510
  10. Ohneiser, Oliver; Jauer, Malte; Rein, Jonathan R.; Wallace, Matt. 2018. "Faster Command Input Using the Multimodal Controller Working Position "TriControl"" Aerospace 5, No. 2: 54. https://doi.org/10.3390/aerospace5020054
  11. Kaczmarek, Wojciech; Panasiuk, Jaroslaw; Borys, Szymon; Banach, Patryk. 2020. "Industrial Robot Control by Means of Gestures and Voice Commands in Off-Line and On-Line Mode" Sensors 20, No. 21: 6358. https://doi.org/10.3390/s20216358
  12. Microsoft, Azure Speech to Text, see https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/
  13. Ye-Ji Kim, Yong-Seong Moon, Seong-Hun Jeong, Dae-Han Jeong, Tae-Hyong Kim, "Voice Recognition and Control System Based on Deep Learning for Smart Lighting", KSC2017, Korea Information Science Society, 2017.12.
  14. Yong-Seong Moon, Ye-Ji Kim, Seong-Hun Jeong, Yu-Hee Kim, Chang-Yeol Lee, Tae-Hyong Kim, "Dialog Management for Voice Recognition based Light Control", KCC2018, Korea Information Science Society, 2018.06.
  15. C. Shayamunda, T. D. Ramotsoela, G. P. Hancke, "Biometric Authentication System for Industrial Applications using Speaker Recognition," IECON 2020 The 46th Annual Conference of the IEEE Industrial Electronics Society, Singapore, 2020, pp. 4459-4464.
  16. E. K. Wang, X. Liu, C. -M. Chen, S. Kumari, M. Shojafar, M. S. Hossain, "Voice-Transfer Attacking on Industrial Voice Control Systems in 5G-Aided IIoT Domain," in IEEE Transactions on Industrial Informatics, 2020. doi: 10.1109/TII.2020.3023677.
  17. Mozilla, Project DeepSpeech, see https://github.com/mozilla/DeepSpeech, 2016.
  18. Mirco Ravanelli, Yoshua Bengio, "Speaker Recognition from Raw Waveform with SincNet", arXiv:1808.00158, 2018.
  19. Awni Y. Hannun, Carl Case, J. Casper, Bryan Catanzaro, G. Diamos, Erich Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates, A. Ng, "Deep Speech: Scaling up end-to-end speech recognition", arXiv:1412.5567, 2014.
  20. Wit.ai, Inc, wit.ai: Build Natural Language Experiences, see https://wit.ai/
  21. DeepSpeech, "DeepSpeech Model", https://deepspeech.readthedocs.io/en/v0.9.3/DeepSpeech.html
  22. Mark Heath, NAudio, see https://github.com/naudio/NAudio
  23. librosa, A python package for music and audio analysis, see https://github.com/librosa/librosa
  24. kenlm, KenLM: Faster and Smaller Language Model Queries, see https://github.com/kpu/kenlm
  25. Wikipedia, Levenshtein distance, see https://en.wikipedia.org/wiki/Levenshtein_distance