A Study on Realization of Speech Recognition System based on VoiceXML for Railroad Reservation Service

철도예약서비스를 위한 VoiceXML 기반의 음성인식 구현에 관한 연구

  • Received : 2010.12.06
  • Accepted : 2011.03.03
  • Published : 2011.04.26

Abstract

This paper proposes a method for implementing real-time speech recognition with VoiceXML in a SIP-based telephony environment for a railroad reservation service. In the proposed method, a voice signal arriving over the PSTN or the Internet is handled by a VoiceXML dialog and passed to a speech recognition system; the recognition result is returned to the VoiceXML dialog, which delivers it to the user. The VASR system consists of a dialog server that processes the dialog, an APP server that processes the voice signal, and a speech recognition system that performs the recognition. To process voice signals in the telephony environment, the implementation records the voice signal with the Record tag of VoiceXML and plays it back in real time to transfer it to the speech recognition system.
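
The record-and-return flow described in the abstract can be illustrated with a small sketch. The abstract does not give the actual markup or server code, so everything below is an assumption made for illustration: the function names, the form and variable names (`reservation`, `utterance`), the timing values, and the submit URL are hypothetical. The sketch only uses elements and attributes defined in W3C VoiceXML 2.0 (`record`, `filled`, `submit`, `prompt`, `block`) and Python's standard `xml.etree.ElementTree` module. The first function builds a dialog that records one utterance and posts the audio to an application server; the second builds the dialog that reads a recognition result back to the caller.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def _new_vxml_root() -> ET.Element:
    # Every VoiceXML 2.0 document starts with a <vxml version="2.0"> root.
    return ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})


def build_record_dialog(submit_url: str, prompt_text: str) -> str:
    """Build a dialog that records one utterance and posts it to submit_url.

    Names and timing values are illustrative assumptions, not taken
    from the paper.
    """
    root = _new_vxml_root()
    form = ET.SubElement(root, "form", {"id": "reservation"})
    record = ET.SubElement(form, "record", {
        "name": "utterance",     # form item variable holding the audio
        "beep": "true",          # play a tone before recording starts
        "maxtime": "10s",        # upper bound on recording length
        "finalsilence": "2s",    # trailing silence that ends recording
        "type": "audio/x-wav",
    })
    ET.SubElement(record, "prompt").text = prompt_text
    # Once the recording is filled, send it to the application server.
    filled = ET.SubElement(record, "filled")
    ET.SubElement(filled, "submit", {
        "next": submit_url,
        "method": "post",
        "enctype": "multipart/form-data",
        "namelist": "utterance",
    })
    return ET.tostring(root, encoding="unicode")


def build_result_dialog(recognized_text: str) -> str:
    """Build the dialog that speaks the recognition result to the caller."""
    root = _new_vxml_root()
    form = ET.SubElement(root, "form", {"id": "result"})
    block = ET.SubElement(form, "block")
    ET.SubElement(block, "prompt").text = f"You said {recognized_text}."
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical application-server URL.
    print(build_record_dialog("http://app.example/recognize",
                              "Please say your destination."))
    print(build_result_dialog("Seoul"))
```

In this sketch the application server would receive the posted audio, hand it to the speech recognition system, and answer the request with the output of `build_result_dialog`; how the paper's APP server streams the recorded audio to the recognizer in real time is not specified in the abstract and is not modeled here.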
