CNN based Complex Spectrogram Enhancement in Multi-Rotor UAV Environments

  • Kim, Young-Jin (Department of Computer Science & Engineering, Korea University of Technology and Education)
  • Kim, Eun-Gyung (School of Computer Science & Engineering, Korea University of Technology and Education)
  • Received : 2020.01.28
  • Accepted : 2020.03.01
  • Published : 2020.04.30

Abstract

Sound collected by a multi-rotor unmanned aerial vehicle (UAV) is severely degraded by the ego noise of the motors and propellers and by wind noise generated during flight. In a multi-rotor UAV environment, both the magnitude and the phase of the target sound are heavily corrupted, so enhancement must take both into account. The phase, however, is difficult to improve because, unlike the magnitude, it exhibits little structural regularity. In this study, we propose a CNN-based complex spectrogram enhancement method that removes noise from the complex spectrogram, a representation that encodes both magnitude and phase. Experimental results show that the proposed method improves enhancement performance by jointly considering the magnitude and phase of the complex spectrogram.

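The core representation the abstract relies on, feeding both magnitude and phase to the network by stacking the real and imaginary parts of the complex spectrogram as two input channels, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the FFT size, hop length, window choice, and toy "rotor noise" signal are all assumptions.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Complex short-time Fourier transform with a Hann window.
    Returns an array of shape (freq_bins, frames)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def complex_spectrogram_features(x):
    """Stack real and imaginary parts as a 2-channel 'image'.
    A CNN trained on this input can regress enhanced real/imag
    parts jointly, so both magnitude and phase are addressed."""
    spec = stft(x)
    return np.stack([spec.real, spec.imag])  # (2, freq, time)

# Toy noisy signal: a sine target plus broadband noise standing in
# for multi-rotor ego noise (hypothetical, for illustration only).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
noisy = np.sin(2 * np.pi * 440 * t) + 0.5 * rng.standard_normal(16000)

feat = complex_spectrogram_features(noisy)
print(feat.shape)  # (2, 257, 122): 2 channels, 257 bins, 122 frames
```

Stacking real and imaginary parts sidesteps the unstructured-phase problem: the network never predicts phase directly, yet the enhanced complex spectrogram it outputs implies both a magnitude and a phase when inverted back to a waveform.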