Data Augmentation for DNN-based Speech Enhancement

Lee, Seung Gwan; Lee, Sangmin

  • Received : 2019.04.05
  • Accepted : 2019.06.10
  • Published : 2019.07.31


This paper proposes a data augmentation algorithm to improve the performance of deep neural network (DNN)-based speech enhancement. Many deep learning studies seek to maximize model performance when only a limited amount of training data is available, and the most commonly used approach is data augmentation, a technique that artificially increases the amount of data. For effective data augmentation, we use a formant enhancement method that assigns different weights to the formant frequencies. The DNN model trained with the proposed data augmentation algorithm was evaluated in various noise environments, and its speech enhancement performance was compared with that of DNN models trained with conventional data augmentation and without data augmentation. The proposed data augmentation algorithm achieved higher speech enhancement performance than both of these baselines.
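The abstract describes the augmentation only at a high level: formant frequencies receive different weights, and each weighting yields extra training data. The following is a minimal NumPy sketch of that idea under stated assumptions, not the authors' implementation. It assumes fixed formant center frequencies (the hypothetical values 500, 1500, and 2500 Hz), Gaussian-shaped gain bumps with an assumed 100 Hz width, and weighting applied to an STFT magnitude spectrogram. The helper names `stft_mag`, `formant_gain`, and `augment_spectrogram` and the example weight sets are illustrative only; a real system would likely estimate formants per frame rather than use fixed values.

```python
import numpy as np

def stft_mag(x, n_fft=512, hop=256):
    """Magnitude spectrogram (frames x bins) using a Hann window."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))

def formant_gain(freqs, formants, weights, bandwidth=100.0):
    """Gain that is about weights[k] at formants[k] and 1 far from every formant.

    Hypothetical Gaussian-bump weighting; not the paper's exact formula.
    """
    freqs = np.asarray(freqs, dtype=float)
    gain = np.ones_like(freqs)
    for fc, w in zip(formants, weights):
        bump = np.exp(-0.5 * ((freqs - fc) / bandwidth) ** 2)  # peak 1 at fc
        gain += (w - 1.0) * bump
    return np.clip(gain, 0.0, None)  # never produce negative magnitudes

def augment_spectrogram(mag, sr, n_fft, formants, weight_sets, bandwidth=100.0):
    """Return one formant-weighted copy of `mag` per weight set."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)  # bin center frequencies in Hz
    if mag.shape[-1] != freqs.size:
        raise ValueError("spectrogram bins do not match n_fft")
    return [mag * formant_gain(freqs, formants, w, bandwidth) for w in weight_sets]

# Usage on a synthetic 1 s "voiced" signal at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)
mag = stft_mag(x)
formants = [500.0, 1500.0, 2500.0]  # assumed F1-F3 in Hz (illustrative only)
weight_sets = [(1.2, 1.0, 1.0), (1.0, 1.3, 1.1), (0.8, 1.2, 1.4)]
augmented = augment_spectrogram(mag, sr, 512, formants, weight_sets)
```

Each weight set produces one additional training spectrogram from the same utterance, which is how this style of augmentation multiplies the amount of data. In a speech enhancement setup, such weighted clean spectra would presumably be resynthesized (for example, with the original phase) and mixed with noise to form new noisy/clean training pairs; the abstract does not specify this step.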


Speech Enhancement; Data Augmentation; Deep Neural Network (DNN); Noise Reduction


Supported by: National Research Foundation of Korea (NRF)