Performance Enhancement of Speech Declipping using Clipping Detector

  • Eunmi Seo (Dept. of Electronics Engineering, Kwangwoon Univ.) ;
  • Jeongchan Yu (Dept. of Electronics Engineering, Kwangwoon Univ.) ;
  • Yujin Lim (Dept. of Electronics Engineering, Kwangwoon Univ.) ;
  • Hochong Park (Dept. of Electronics Engineering, Kwangwoon Univ.)
  • Received : 2022.12.08
  • Accepted : 2023.01.25
  • Published : 2023.01.30

Abstract

In this paper, we propose a method for enhancing the performance of speech declipping using a clipping detector. Clipping occurs when the input speech level exceeds the dynamic range of the microphone, and it significantly degrades speech quality. Recently, many high-performance speech declipping methods based on machine learning have been developed. However, when the degree of clipping is not high, they often deteriorate the speech signal because of distortion introduced in the signal reconstruction process. To solve this problem, we propose a new approach that combines the declipping network with a clipping detector, which enables a selective declipping operation depending on the clipping level and provides high-quality speech at all clipping levels. We measured the declipping performance using various metrics and confirmed that the proposed method improves the average performance over all clipping levels compared with conventional methods, and greatly improves the performance when the clipping distortion is small.
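To make the selective-declipping idea concrete, the sketch below gates a declipping call behind a toy clipping detector. This is a minimal illustration, not the paper's method: the detector (fraction of samples saturated at the clipping level), the decision threshold `ratio_limit`, and the function names `hard_clip`, `clipping_ratio`, and `selective_declip` are all assumptions introduced here, and an identity function stands in for the trained declipping network.

```python
import numpy as np

def hard_clip(x, threshold):
    """Simulate clipping: saturate samples that exceed the microphone's dynamic range."""
    return np.clip(x, -threshold, threshold)

def clipping_ratio(x, threshold, tol=1e-4):
    """Toy clipping detector (assumption): fraction of samples stuck at the clipping level."""
    return float(np.mean(np.abs(x) >= threshold - tol))

def selective_declip(x, threshold, declip_fn, ratio_limit=0.01):
    """Run the declipping model only when the detector judges clipping to be severe.

    When clipping is mild, the signal bypasses the declipper, avoiding the
    reconstruction distortion that motivates the selective operation described
    in the abstract. The ratio_limit value is an illustrative choice.
    """
    if clipping_ratio(x, threshold) >= ratio_limit:
        return declip_fn(x)  # severe clipping: restore with the declipping network
    return x                 # mild or no clipping: pass through unchanged

# Minimal usage: a sine tone clipped at roughly half its peak amplitude.
fs = 16000
t = np.arange(fs) / fs
clean = 0.9 * np.sin(2 * np.pi * 440.0 * t)
clipped = hard_clip(clean, 0.5)
restored = selective_declip(clipped, 0.5, declip_fn=lambda s: s)  # identity stand-in
print(f"clipping ratio: {clipping_ratio(clipped, 0.5):.3f}")
```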

Keywords

Acknowledgement

This work was supported by the Research Grant of Kwangwoon University in 2022, by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2021R1F1A1059233), and by the Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea government (MOTIE) (P0017124, HRD Program for Industrial Innovation).
