Multimodal Emotion Recognition using Face Image and Speech

얼굴영상과 음성을 이용한 멀티모달 감정인식

  • Published: 2012.03.30

Abstract

Endowing a machine with emotional intelligence is a challenging research issue of growing importance to those working in human-computer interaction. Emotion recognition technology therefore plays an important role in human-computer interaction research, as it enables more natural, human-like communication between humans and computers. In this paper, we propose a multimodal emotion recognition system that uses face images and speech to improve recognition performance. In the face-based recognizer, a distance measure is computed by applying 2D-PCA to the MCS-LBP image and classifying with a nearest-neighbor classifier; in the speech-based recognizer, a likelihood measure is obtained from a Gaussian mixture model based on pitch and mel-frequency cepstral coefficient (MFCC) features. The individual matching scores obtained from face and speech are combined by a weighted summation, and the fused score is used to classify the human emotion. Experimental results show that the proposed method improves recognition accuracy by about 11.25% to 19.75% over the unimodal approaches, confirming that the proposed fusion achieves a significant and effective performance improvement.
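The weighted-summation fusion described in the abstract can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the score ranges, the min-max normalization step, and the weight value are assumptions, since a raw face distance (smaller is better) and a GMM log-likelihood (larger is better) must first be mapped to a common scale before they can be summed.

```python
def minmax_norm(score, lo, hi):
    # Map a raw matching score into [0, 1]; lo/hi stand for the
    # observed score range of that modality (an assumed choice of
    # normalization; the paper may use a different scheme).
    return (score - lo) / (hi - lo)

def fused_score(face_dist, speech_loglik, w=0.5,
                face_range=(0.0, 1.0), speech_range=(-50.0, 0.0)):
    # Face score is a distance, so invert it after normalization;
    # speech score is a GMM log-likelihood, where larger is better.
    s_face = 1.0 - minmax_norm(face_dist, *face_range)
    s_speech = minmax_norm(speech_loglik, *speech_range)
    # Weighted-summation fusion; w weights the face modality.
    return w * s_face + (1.0 - w) * s_speech
```

In a full system, a fused score would be computed per emotion class and the class with the highest fused score selected; the weight w would typically be tuned on a validation set.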

Keywords

References

  1. S. S. Intille et al., "Using a live-in laboratory for ubiquitous computing research," Lecture Notes in Computer Science, Vol. 3968, 2006, pp. 349-365.
  2. K. Partridge and P. Golle, "On using existing time-use study data for ubiquitous computing applications," Proceedings of the 10th international conference on Ubiquitous computing, ACM, Vol. 344, 2008, pp. 144-153.
  3. Y. S. Shin, "The effect of facial expression recognition based on the dimensions of emotion using PCA representation and neural networks," Lecture Notes in Computer Science, Vol. 3656, 2005, pp. 1133-1140.
  4. P. Penev and J. Atick, "Local feature analysis: a general statistical theory for object representation," Network : Computation in Neural Systems, Vol. 7, 1996, pp. 477-500. https://doi.org/10.1088/0954-898X_7_3_002
  5. P. Belhumeur, J. Hespanha and D. Kriegman, "Eigenfaces vs. fisherfaces: Recognition using class specific linear projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, 1997, pp. 711-720. https://doi.org/10.1109/34.598228
  6. M. S. Bartlett and T. J. Sejnowski, "Independent components of face images: a representation for face recognition," Proceedings of the Fourth Annual Joint Symposium on Neural Computation, 1997.
  7. C. Padgett and G. Cottrell, "Representing face images for emotion classification," Advances in Neural Information Processing Systems, Vol. 9, 1997.
  8. Z. Zhang, M. Lyons, M. Schuster, and S. Akamatsu, "Comparison between geometry-based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron," Third IEEE International Conference on Automatic Face and Gesture Recognition, 1998, pp. 454-459.
  9. J. Nicholson, K. Takahashi and R. Nakatsu, "Emotion recognition in speech using neural networks," Neural Computing & Applications, Vol. 9, 2000, pp. 290-296. https://doi.org/10.1007/s005210070006
  10. A. A. Razak, R. Komiya and M. I. Z. Abidin, "Comparison between fuzzy and NN method for speech emotion recognition," Proceedings of the Third International Conference on Information Technology and Applications, 2005.
  11. C. M. Lee, S. S. Narayanan and R. Pieraccini, "Classifying emotions in human-machine spoken dialogs," ICME'02, Vol. 1, 2002, pp. 737-740.
  12. H. J. Go, Y. T. Kim and M. G. Chun, "A Multimodal Emotion Recognition Using the Facial Image and Speech Signal," International Journal of Fuzzy Logic and Intelligent Systems, Vol. 5, No. 1, 2005, pp. 1-6. https://doi.org/10.5391/IJFIS.2005.5.1.001
  13. M. Song, M. You, N. Li, and C. Chen, "A robust multimodal approach for emotion recognition," Neurocomputing, Vol. 71, 2008, pp. 1913-1920. https://doi.org/10.1016/j.neucom.2007.07.041
  14. C. Shan, S. Gong and P. W. McOwan, "Facial expression recognition based on local binary pattern: A comprehensive study," Image and Vision Computing, Vol. 27, 2009, pp. 803-816. https://doi.org/10.1016/j.imavis.2008.08.005
  15. T. Ahonen, A. Hadid and M. Pietikainen, "Face recognition with local binary patterns," ECCV, 2004, pp. 469-481.
  16. G. Zhang, X. Huang, S. Z. Li, Y. Wang and X. Wu, "Boosting local binary pattern-based face recognition," in Proc. Advances in Biometric Person Authentication, Vol. 3338, 2004, pp. 179-186.
  17. X. Fu and W. Wei, "Centralized binary patterns embedded with image Euclidean distance for facial expression recognition," Fourth International Conference on Natural Computation, Vol. 4, 2008, pp. 115-199.
  18. K. Meena and A. Suruliandi, "Local binary patterns and its variants for face recognition," International Conference on Recent Trends in Information Technology, 2011, pp. 782-786.
  19. M. Kirby and L. Sirovich, "Application of the Karhunen-Loeve procedure for the characterization of human faces," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, No. 1, 1990, pp. 103-108. https://doi.org/10.1109/34.41390
  20. M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, Vol. 3, No. 1, 1991, pp. 71-86. https://doi.org/10.1162/jocn.1991.3.1.71
  21. Y. Jian, Z. David, F. Alejandro and J. Y. Yang, "Two-dimensional PCA: A new approach to appearance-based face representation and recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 1, 2004, pp. 131-137. https://doi.org/10.1109/TPAMI.2004.1261097
  22. A. P. Dempster, N. M. Laird and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society B, 1977, pp. 1-38.
  23. A. Ross and A. K. Jain, "Information fusion in biometrics," Pattern Recognition Letters, Vol. 24, No. 13, 2003, pp. 2115-2125. https://doi.org/10.1016/S0167-8655(03)00079-5
  24. C. Sanderson and K. K. Paliwal, "Identity verification using speech and face information," Digital Signal Processing, Vol. 14, No. 5, 2004, pp. 449-480. https://doi.org/10.1016/j.dsp.2004.05.001
  25. M. J. Lyons, S. Akamatsu, M. Kamachi and J. Gyoba, "Coding facial expressions with Gabor wavelets," Proceedings, Third IEEE International Conference on Automatic Face and Gesture Recognition, 1998, pp. 200-205.
  26. M. Minear and D. C. Park, "A lifespan database of adult facial stimuli," Behavior Research Methods, Instruments and Computers, Vol. 36, 2004, pp. 630-633. https://doi.org/10.3758/BF03206543
  27. K. H. Hyun, E. H. Kim and Y. K. Kwak, "Improvement of emotion recognition by Bayesian classifier using non-zero-pitch concept," IEEE International Workshop on Robot and Human Interactive Communication, 2005, pp. 312-316.
  28. 박창현, 심귀보, "Pattern recognition methods for emotion recognition using speech signals," Journal of the Korean Institute of Fuzzy and Intelligent Systems, Vol. 12, No. 3, 2006, pp. 284-288.
  29. 김정철, 허범근, 신나라, 홍기천, "A real-time face authentication system using Haar-like features and PCA in a mobile environment," Journal of the Korea Society of Digital Industry and Information Management, Vol. 6, No. 2, 2010, pp. 199-207.
  30. 이현구, 노용완, "A study on speech interfaces for next-generation PCs," Journal of the Korea Society of Digital Industry and Information Management, Vol. 2, No. 3, 2006, pp. 59-66.
  31. 강면구, 서정태, 김원구, "GMM-based emotion recognition using speech signals," The Journal of the Acoustical Society of Korea, Vol. 23, No. 3, 2004, pp. 235-241.
  32. 고현주, 이대종, 전명근, "Emotion recognition using facial expression and speech," Journal of KIISE, Vol. 31, No. 6, 2004, pp. 799-807.
  33. 이현구, 김동규, "A study on five-senses information processing for HCI," Journal of the Korea Society of Digital Industry and Information Management, Vol. 5, No. 2, 2009, pp. 77-86.