Optimization of the Kernel Size in CNN Noise Attenuator

  • Lee, Haeng-Woo (이행우; Dept. of Information Communication Engineering, Namseoul University)
  • Received : 2020.09.13
  • Accepted : 2020.12.15
  • Published : 2020.12.31

Abstract

In this paper, we study how the kernel size of the CNN (convolutional neural network) layer affects the performance of an acoustic noise attenuator. Instead of a conventional adaptive filter, the system uses a deep learning algorithm built around a neural-network adaptive prediction filter to improve noise attenuation performance. Speech is estimated from a single noisy input speech signal using a CNN filter with 100 neurons and 16 filters, trained with the error back-propagation algorithm. The approach exploits the quasi-periodic property of the voiced segments of a speech signal. To verify how the performance of the noise attenuator depends on kernel size, we wrote a simulation program using the TensorFlow and Keras libraries and ran simulations. The results show that the mean squared error (MSE) and mean absolute error (MAE) are smallest when the kernel size is about 16, and that both values increase when the kernel is smaller or larger than this. This suggests that, for a speech signal, the features are captured best when the kernel size is about 16.
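To make the terms in the abstract concrete, the following is a minimal pure-Python sketch of what "kernel size" means for a 1-D convolution and how MSE and MAE are computed. It is not the paper's TensorFlow/Keras simulation program: the function names, the sample values, and the fixed kernel weights are all hypothetical and chosen only for illustration (a CNN would learn its kernel weights by back-propagation).

```python
# Illustrative sketch only: pure-Python helpers, not the paper's Keras code.
# All names and sample values below are hypothetical.

def conv1d_valid(signal, kernel):
    """Slide `kernel` over `signal` without padding.

    Uses the cross-correlation form common in CNN layers (no kernel flip).
    Each output value depends on len(kernel) consecutive input samples,
    so the kernel size sets how much signal context each output sees.
    """
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, signal[i:i + k]))
        for i in range(len(signal) - k + 1)
    ]


def mse(estimate, reference):
    """Mean squared error between two equal-length sequences."""
    return sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(reference)


def mae(estimate, reference):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(e - r) for e, r in zip(estimate, reference)) / len(reference)


if __name__ == "__main__":
    noisy = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]  # hypothetical samples
    kernel = [0.25, 0.5, 0.25]  # hypothetical fixed weights, kernel size 3
    filtered = conv1d_valid(noisy, kernel)
    print(len(filtered))  # 8 - 3 + 1 = 6 output samples

    estimate = [1.0, 2.0, 3.0]   # hypothetical estimated speech
    reference = [1.0, 2.5, 2.0]  # hypothetical clean speech
    print(mse(estimate, reference), mae(estimate, reference))
```

A kernel-size study of the kind described above would repeat training with different kernel lengths and compare the resulting MSE and MAE on the same test signal.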
