A study on a user-defined spoken wake-up word recognition system using a deep neural network-hidden Markov model hybrid model

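  • 윤기무 (Department of Computer Science and Engineering, Incheon National University) ;
  • 김우일 (Department of Computer Science and Engineering, Incheon National University)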
  • Received : 2020.01.23
  • Accepted : 2020.03.04
  • Published : 2020.03.31


A Wake-Up Word (WUW) is a short utterance used to switch a speech recognizer into recognition mode. A WUW chosen by the person who actually uses the speech recognizer is called a user-defined WUW. In this paper, to recognize user-defined WUWs, we construct systems based on a traditional Gaussian Mixture Model-Hidden Markov Model (GMM-HMM), a Linear Discriminant Analysis (LDA)-GMM-HMM, and an LDA-Deep Neural Network (DNN)-HMM, and compare their performance. In addition, to improve the recognition accuracy of the WUW system, a threshold method is applied to each model, which significantly reduces both the WUW recognition error rate and the rejection failure rate for non-WUW utterances. For the LDA-DNN-HMM system, when the WUW error rate is 9.84%, the rejection failure rate for non-WUW utterances is 0.0058%, which is about 4.82 times lower than that of the LDA-GMM-HMM system. These results demonstrate that the LDA-DNN-HMM model developed in this paper is highly effective for building a user-defined WUW recognition system.
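
The two figures quoted in the abstract can be illustrated with a minimal sketch. The abstract does not say which score the threshold is applied to, so the sketch assumes each utterance has already been reduced to a single confidence score, and that an utterance is accepted as the WUW when its score is at or above the threshold. The function name `wuw_error_rates` and the sample scores are hypothetical and do not come from the paper.

```python
from typing import Sequence, Tuple


def wuw_error_rates(
    wuw_scores: Sequence[float],
    non_wuw_scores: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (WUW error rate, non-WUW rejection failure rate) in percent.

    Assumption: a higher score means the utterance looks more like the WUW,
    and an utterance is accepted when score >= threshold.
    """
    if len(wuw_scores) == 0 or len(non_wuw_scores) == 0:
        raise ValueError("both score lists must be non-empty")

    # True WUW utterances that fall below the threshold are missed.
    missed = sum(1 for s in wuw_scores if s < threshold)
    # Non-WUW utterances at or above the threshold are wrongly accepted.
    accepted = sum(1 for s in non_wuw_scores if s >= threshold)

    wuw_error = 100.0 * missed / len(wuw_scores)
    rejection_failure = 100.0 * accepted / len(non_wuw_scores)
    return wuw_error, rejection_failure


# Hypothetical scores, for illustration only.
wuw = [0.9, 0.8, 0.4, 0.95]
non_wuw = [0.1, 0.3, 0.85, 0.2, 0.05]

print(wuw_error_rates(wuw, non_wuw, threshold=0.5))
print(wuw_error_rates(wuw, non_wuw, threshold=0.9))
```

Evaluating both rates at the same threshold, as here, matches how the abstract reports them as a pair (a 9.84% WUW error rate alongside a 0.0058% rejection failure rate).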


Supported by: Incheon National University

