
CNN based dual-channel sound enhancement in the MAV environment


  • Kim, Young-Jin (Department of Computer Science & Engineering, Graduate School, Korea University of Technology and Education) ;
  • Kim, Eun-Gyung (School of Computer Science & Engineering, Korea University of Technology and Education)
  • Received : 2019.08.15
  • Accepted : 2019.08.30
  • Published : 2019.12.31

Abstract

Recently, as the industrial scope of multi-rotor unmanned aerial vehicles (UAVs) has greatly expanded, demand for data collection, processing, and analysis using UAVs is also increasing. However, acoustic data collected with a UAV is severely corrupted by the UAV's motor noise and wind noise, which makes the data difficult to process and analyze. We therefore studied a method to enhance the target sound in the acoustic signal received through microphones attached to a UAV. In this paper, we extend the densely connected dilated convolutional network, an existing single-channel acoustic enhancement technique, so that it exploits the inter-channel characteristics of the acoustic signal. The extended model outperformed the original model on all evaluation measures, namely SDR, PESQ, and STOI.
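For readers unfamiliar with the architecture being extended, the following is a rough NumPy sketch of a densely connected dilated convolution block [9][12] operating on a two-channel input (e.g., two microphone feature rows stacked along the channel axis). The shapes, growth rate, dilation schedule, and the ReLU standing in for the SELU of [13] are all illustrative assumptions, not the authors' actual configuration.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """1-D convolution along the time axis with the given dilation.
    x: (in_channels, time), w: (out_channels, in_channels, kernel)."""
    c_out, c_in, k = w.shape
    pad = (k - 1) * dilation // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    t = x.shape[1]
    y = np.zeros((c_out, t))
    for tap in range(k):
        y += np.einsum('oi,it->ot', w[:, :, tap],
                       xp[:, tap * dilation:tap * dilation + t])
    return np.maximum(y, 0.0)  # ReLU stands in for the SELU used in [13]

def dense_dilated_block(x, weights, dilations=(1, 2, 4, 8)):
    """DenseNet-style connectivity [12]: each layer receives the
    concatenation of the input and all earlier feature maps, while the
    dilation grows so later layers see a wider temporal context [9]."""
    feats = [x]
    for d, w in zip(dilations, weights):
        feats.append(dilated_conv1d(np.concatenate(feats, axis=0), w, dilation=d))
    return np.concatenate(feats, axis=0)

# Toy dual-channel input: 2 microphone channels stacked on the channel axis.
rng = np.random.default_rng(0)
c0, growth, kernel = 2, 4, 3                      # illustrative sizes only
weights = [rng.standard_normal((growth, c0 + i * growth, kernel)) * 0.1
           for i in range(4)]
x = rng.standard_normal((c0, 16))                 # (channels, time frames)
y = dense_dilated_block(x, weights)
print(y.shape)                                    # 2 input + 4 layers * growth 4 channels
```

The dual-channel extension here simply enters through the channel axis (`c0 = 2`); the paper's point is that the network can then learn inter-channel (spatial) cues that a single-channel model never sees.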

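Of the three evaluation measures in the abstract, PESQ and STOI are standardized metrics (refs. 18, 19), but the SDR idea can be sketched directly. Below is a minimal, assumption-level NumPy version that projects the estimate onto the reference and measures the residual as distortion; this is a simplified, scale-invariant reading of the BSS Eval `s_target` definition, not necessarily the exact variant used in the paper.

```python
import numpy as np

def sdr(reference, estimate):
    """Signal-to-distortion ratio in dB: project the estimate onto the
    reference and treat the residual as distortion. Higher is better."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference            # component explained by the reference
    distortion = estimate - target        # everything else (noise + artifacts)
    return 10.0 * np.log10(np.dot(target, target) / np.dot(distortion, distortion))

# Cleaner estimates score higher: compare 1% noise against 10% noise.
s = np.sin(np.linspace(0.0, 20.0, 1000))          # stand-in "clean" target
n = np.random.default_rng(1).standard_normal(1000)
print(sdr(s, s + 0.01 * n), sdr(s, s + 0.1 * n))
```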


Acknowledgement

This paper was supported by the Education and Research Promotion Program of KOREATECH in 2018.

References

  1. Korea Embedded Software and System Industry Association. KESSIA ISSUE REPORT [Internet]. Available: http://www.fkii.or.kr.
  2. L. Wang, and A. Cavallaro, "Acoustic sensing from a multi-rotor drone," IEEE Sensors Journal, vol. 18, no. 11, pp. 4570-4582, Apr. 2018. https://doi.org/10.1109/JSEN.2018.2825879
  3. D. Floreano, and R. J. Wood, "Science, technology and the future of small autonomous drones," Nature, vol. 521, no. 7553, pp. 460-466, May 2015. https://doi.org/10.1038/nature14542
  4. K. Daniel, S. Rohde, N. Goddemeier, and C. Wietfeld, "Cognitive agent mobility for aerial sensor networks," IEEE Sensors Journal, vol. 11, no. 11, pp. 2671-2682, Jun. 2011. https://doi.org/10.1109/JSEN.2011.2159489
  5. G. Sinibaldi, and L. Marino, "Experimental analysis on the noise of propellers for small UAV," Applied Acoustics, vol. 74, no. 1, pp. 79-88, Jan. 2013. https://doi.org/10.1016/j.apacoust.2012.06.011
  6. S. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Transactions on acoustics, speech, and signal processing, vol. 27, no. 2, pp. 113-120, Apr. 1979. https://doi.org/10.1109/TASSP.1979.1163209
  7. J. S. Lim, and A. V. Oppenheim, "Enhancement and bandwidth compression of noisy speech," Proceedings of the IEEE, vol. 67, no. 12, pp. 1586-1604, Dec. 1979. https://doi.org/10.1109/PROC.1979.11540
  8. Y. Ephraim, and D. Malah, "Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator," IEEE Transactions on acoustics, speech, and signal processing, vol. 32, no. 6, pp. 1109-1121, Dec. 1984. https://doi.org/10.1109/TASSP.1984.1164453
  9. Y. Li, X. Li, Y. Dong, M. Li, S. Xu and S. Xiong, "Densely Connected Network with Time-frequency Dilated Convolution for Speech Enhancement," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6860-6864, May 2019.
  10. D. Wang, and J. Chen, "Supervised speech separation based on deep learning: An overview," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 10, pp. 1702-1726, May 2018. https://doi.org/10.1109/TASLP.2018.2842159
  11. T. Gao, J. Du, Y. Xu, C. Liu, L. R. Dai, and C. H. Lee, "Improving Deep Neural Network Based Speech Enhancement in Low SNR Environments," In International Conference on Latent Variable Analysis and Signal Separation, pp. 75-82, 2015.
  12. G. Huang, Z. Liu, L. V. D. Maaten, and K. Q. Weinberger, "Densely Connected Convolutional Networks," in IEEE Conference on Computer Vision and Pattern Recognition, pp. 2261-2269, 2017.
  13. G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter, "Self-normalizing neural networks," In Advances in neural information processing systems, pp. 971-980, 2017.
  14. J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, and D. S. Pallett, "DARPA TIMIT acoustic-phonetic continous speech corpus CD-ROM. NIST speech disc 1-1.1," NASA STI/Recon Technical Report N, vol. 93, Feb. 1993.
  15. M. Strauss, P. Mordel, V. Miguet, and A. Deleforge, "DREGON: Dataset and Methods for UAV-Embedded Sound Source Localization," In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1-8, 2018.
  16. D. Mirabilii, and E. A. Habets, "Simulating Multi-Channel Wind Noise Based on the Corcos Model," In International Workshop on Acoustic Signal Enhancement (IWAENC), pp. 560-564, 2018.
  17. D. Diaz-Guerra, A. Miguel, and J. R. Beltran, "gpuRIR: A python library for Room Impulse Response simulation with GPU acceleration," arXiv preprint 1810.11359, 2018.
  18. A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, "Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs," In IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, 2001.
  19. C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, "An Algorithm for Intelligibility Prediction of Time-Frequency Weighted Noisy Speech," IEEE Transactions on Audio Speech and Language Processing, vol. 19, no. 7, pp. 2125-2136, Feb. 2011. https://doi.org/10.1109/TASL.2011.2114881