Speech Enhancement using RNN Phoneme based VAD

  • Lee, Kang (Dept. of Electronic, Computer Information Engineering, Inha University) ;
  • Kang, Sang-Ick (Dept. of Electronic, Computer Information Engineering, Inha University) ;
  • Kwon, Jang-woo (Dept. of Electronic, Computer Information Engineering, Inha University) ;
  • Lee, Sangmin (Dept. of Electronic, Computer Information Engineering, Inha University)
  • Received : 2016.12.28
  • Reviewed : 2017.04.12
  • Published : 2017.05.25

Abstract

In this paper, we combine high-performance computing hardware with a machine learning algorithm to build an accurate voice activity detector (VAD) for speech enhancement. Because speech is a sequence of phonemes, a recurrent neural network (RNN), which takes information from previous frames into account, is a suitable way to build a speech model. Training a model on every noise that occurs in the real world is practically impossible, so the proposed algorithm is trained on phonemes instead. Using the trained model, we detect the speech-present frames in a noisy speech signal and use the detection result to enhance the signal. The phoneme-based RNN model performs well on speech signals, which are highly correlated from frame to frame. To verify the performance of the proposed algorithm, we compare the VAD output with label data and, through objective speech quality evaluation in various noise environments, compare the enhancement results with existing speech enhancement algorithms.
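The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the network architecture, the features, or the enhancement rule. The sketch assumes an Elman-style recurrent layer, a hypothetical silence class at index 0 whose complement is treated as "speech present", and a simple spectral-subtraction gain whose noise estimate is updated only on frames the VAD marks as non-speech. All names (`rnn_phoneme_vad`, `vad_gated_enhance`, `SILENCE`) and parameter values are illustrative, and the weights would come from training on phoneme-labelled speech.

```python
import math

SILENCE = 0  # hypothetical index of the silence / non-speech class


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rnn_phoneme_vad(frames, w_in, w_rec, w_out):
    """Run a small Elman-style RNN over feature frames.

    Returns one flag per frame: True when the most likely phoneme
    class is not the (assumed) silence class.
    """
    hidden = [0.0] * len(w_rec)
    flags = []
    for feat in frames:
        # The new hidden state depends on the current frame and on the
        # previous hidden state, which is how earlier frames are used.
        hidden = [
            math.tanh(
                sum(w * x for w, x in zip(w_in[j], feat))
                + sum(w * h for w, h in zip(w_rec[j], hidden))
            )
            for j in range(len(hidden))
        ]
        scores = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
        posteriors = softmax(scores)
        flags.append(posteriors.index(max(posteriors)) != SILENCE)
    return flags


def vad_gated_enhance(power_frames, is_speech, alpha=0.9, floor=0.05):
    """Spectral-subtraction-style enhancement gated by VAD flags.

    Assumption: the first frame contains noise only and seeds the
    noise estimate. The estimate is smoothed on non-speech frames
    and frozen on speech frames.
    """
    noise = list(power_frames[0])
    enhanced = []
    for frame, speech in zip(power_frames, is_speech):
        if not speech:
            noise = [alpha * n + (1 - alpha) * p for n, p in zip(noise, frame)]
        gains = [
            max(1.0 - n / p, floor) if p > 0 else floor
            for n, p in zip(noise, frame)
        ]
        enhanced.append([g * p for g, p in zip(gains, frame)])
    return enhanced
```

Calling `rnn_phoneme_vad` on a list of per-frame feature vectors yields the speech-present flags, and passing those flags together with the per-frame power spectra to `vad_gated_enhance` returns the attenuated spectra. Freezing the noise estimate during speech is the reason an accurate VAD matters here: a missed speech frame would leak speech energy into the noise estimate and cause over-subtraction later.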
