Robust speech quality enhancement method against background noise and packet loss at voice-over-IP receiver

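  • 김지연 (Department of Electronics Convergence Engineering, Kwangwoon University)
  • 김형국 (Department of Electronics Convergence Engineering, Kwangwoon University)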
  • Received : 2018.09.18
  • Accepted : 2018.11.22
  • Published : 2018.11.30

Abstract

Improving voice quality is a major concern in telecommunications. In this paper, we propose a speech quality enhancement method for the VoIP (voice-over-IP) receiver that is robust to background noise and packet loss. To enhance the quality of the speech signal arriving at the receiver over an IP network, the proposed method combines network jitter estimation based on a hybrid Markov chain, adaptive playout scheduling driven by the estimated jitter, and speech enhancement based on restoration of amplitude and phase. Experimental results show that the proposed method removes background noise added to the speech signal before encoding at the sender and improves speech quality in an unstable network environment.
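To make the idea of jitter-driven playout scheduling concrete, the following is a minimal sketch and not the method of this paper: it swaps the hybrid Markov chain estimator for a simple exponentially weighted moving average of network delay and delay variation, in the style of classic adaptive playout algorithms. The class name `PlayoutEstimator`, the function `schedule`, and the values of `gain` and `margin` are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class PlayoutEstimator:
    """EWMA delay/variation tracker (illustrative; NOT the paper's hybrid Markov chain)."""
    gain: float = 0.1        # smoothing gain (assumed value)
    margin: float = 4.0      # safety margin in units of delay variation (assumed value)
    mean_delay: float = 0.0
    variation: float = 0.0
    initialized: bool = False

    def update(self, send_ms: float, recv_ms: float) -> float:
        """Feed one packet's timestamps and return the suggested playout delay in ms."""
        delay = recv_ms - send_ms
        if not self.initialized:
            # The first packet seeds the mean; variation starts at zero.
            self.mean_delay = delay
            self.initialized = True
        else:
            # Move the running mean toward the new delay sample.
            self.mean_delay += self.gain * (delay - self.mean_delay)
            # Track the mean absolute deviation from the updated mean.
            deviation = abs(delay - self.mean_delay)
            self.variation += self.gain * (deviation - self.variation)
        return self.mean_delay + self.margin * self.variation


def schedule(packets, estimator):
    """Return (playout_time_ms, is_late) for each (send_ms, recv_ms) pair.

    A packet that arrives after its playout time is flagged as late and
    would have to be handled by packet loss concealment.
    """
    results = []
    for send_ms, recv_ms in packets:
        playout_delay = estimator.update(send_ms, recv_ms)
        playout_time = send_ms + playout_delay
        results.append((playout_time, recv_ms > playout_time))
    return results
```

The sketch adjusts the playout delay on every packet for brevity; practical receivers usually change it only at talkspurt boundaries or by time-scale modification of the decoded signal, so that the adjustment itself is not audible.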

Keywords

Fig. 1. Overall flow chart of the proposed VoIP receiver-based speech quality enhancement.

Fig. 2. Block diagram of network jitter estimation.

Fig. 3. Block diagram of adaptive playout control and signal reconstruction.

Fig. 4. Overall flow chart of packet loss concealment.

Fig. 5. Block diagram of the proposed speech enhancement.

Fig. 6. Block diagram of the phase estimation.

Table 1. Statistics of the network traces.

Table 2. Averaged MOS scores.
