Automatic Vowel Sequence Reproduction for a Talking Robot Based on PARCOR Coefficient Template Matching

  • Vo, Nhu Thanh
  • Sawada, Hideyuki
  • Received: 2016.04.20
  • Accepted: 2016.05.24
  • Published: 2016.06.30


This paper describes an automatic vowel sequence reproduction system for a talking robot built to reproduce the human voice by imitating the working behavior of the human articulatory system. A sound analysis system is developed that records a sentence spoken by a human (mainly vowel sequences in Japanese), analyzes it, and generates the command packet the talking robot needs to repeat it. An algorithm based on a short-time energy method is developed to separate and count the phonemes in the recording. Template matching with partial correlation (PARCOR) coefficients is then applied to find the voice in the talking robot's database that is most similar to the spoken voice. By combining the phoneme separation and counting results with this vowel detection, the talking robot can reproduce a vowel sequence similar to the one spoken by the human. Two tests are performed to verify the working behavior of the robot. The results indicate that the robot can repeat a sequence of vowels spoken by a human with an average success rate of more than 60%.
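The abstract names the two processing steps but not their details, so the following Python sketch only illustrates them under stated assumptions. Short-time energy is assumed to be the sum of squared samples per frame, and phonemes are assumed to be segmented wherever that energy exceeds a fixed threshold. PARCOR coefficients are computed here with the Levinson-Durbin recursion, and the closest template is chosen by Euclidean distance. The frame length, hop, threshold, and order values, all function names, the vowel labels, and the `tone` helper (pure sinusoids standing in for recorded vowels) are hypothetical and are not taken from the paper.

```python
import math


def short_time_energy(signal, frame_len, hop):
    """Sum of squared samples in each full frame; a trailing partial frame is dropped."""
    return [
        sum(x * x for x in signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def segment_phonemes(energies, threshold):
    """Group consecutive frames with energy above threshold into (start, end) runs, end exclusive."""
    segments, start = [], None
    for i, e in enumerate(energies):
        if e > threshold and start is None:
            start = i
        elif e <= threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(energies)))
    return segments


def parcor(frame, order):
    """PARCOR (reflection) coefficients k_1..k_order via the Levinson-Durbin recursion.

    Assumed sign convention: predictor x[n] ~ sum_j a_j x[n-j], with k_i = a_i at step i.
    """
    n = len(frame)
    r = [sum(frame[t] * frame[t + lag] for t in range(n - lag)) for lag in range(order + 1)]
    if r[0] == 0:
        return [0.0] * order
    a = [0.0] * (order + 1)  # a[1..i] hold the order-i predictor coefficients
    err = r[0]               # prediction-error energy
    ks = []
    for i in range(1, order + 1):
        if err <= 1e-12 * r[0]:  # frame already perfectly predicted; pad with zeros
            ks.extend([0.0] * (order - len(ks)))
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        ks.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return ks


def match_vowel(frame, templates, order):
    """Label of the template whose PARCOR vector is closest (Euclidean) to the frame's."""
    k = parcor(frame, order)
    return min(templates, key=lambda label: math.dist(k, templates[label]))


def reproduce_sequence(signal, templates, frame_len=256, hop=128, threshold=1.0, order=10):
    """Segment by short-time energy, then classify the middle frame of each segment."""
    energies = short_time_energy(signal, frame_len, hop)
    labels = []
    for s, e in segment_phonemes(energies, threshold):
        start = ((s + e) // 2) * hop
        labels.append(match_vowel(signal[start:start + frame_len], templates, order))
    return labels  # len(labels) is the phoneme count


def tone(w, n, amp=0.5):
    """Hypothetical stand-in for a recorded vowel: a pure sinusoid at w rad/sample."""
    return [amp * math.sin(w * t) for t in range(n)]


if __name__ == "__main__":
    # Two coefficients are enough to tell pure tones apart; real vowels need more.
    templates = {"a": parcor(tone(0.3, 256), 2), "i": parcor(tone(1.5, 256), 2)}
    silence = [0.0] * 2000
    signal = silence + tone(0.3, 4000) + silence + tone(1.5, 4000) + silence
    print(reproduce_sequence(signal, templates, order=2))  # ['a', 'i']
```

Classifying only the middle frame of each segment keeps the onset and offset transitions out of the PARCOR estimate, and the length of the returned list doubles as the phoneme count. With real recordings, the templates would instead come from the robot's own vowel database, and a higher analysis order (10 by default here, itself an assumption) would typically be used to capture formant structure.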


Talking robot; PARCOR; Vowel sequence; Human speech; Short-time energy


Supported by: Japan Society for the Promotion of Science