Performance assessments of feature vectors and classification algorithms for amphibian sound classification

Performance analysis of feature vectors and recognition algorithms for identifying amphibian calls

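  • Sangwook Park (Department of Electrical and Electronic Engineering, Korea University);
  • Kyungdeuk Ko (Department of Electrical and Electronic Engineering, Korea University);
  • Hanseok Ko (Department of Electrical and Electronic Engineering, Korea University)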
  • Received : 2017.09.12
  • Accepted : 2017.11.29
  • Published : 2017.11.30


This paper assesses the performance of several key algorithms for classifying amphibian species by their sounds. First, nine target species, including endangered species, are defined and a database of their sounds is built. The assessment considers three feature vectors, MFCC (Mel Frequency Cepstral Coefficient), RCGCC (Robust Compressive Gammachirp filterbank Cepstral Coefficient), and SPCC (Subspace Projection Cepstral Coefficient), and three classifiers, GMM (Gaussian Mixture Model), SVM (Support Vector Machine), and DBN-DNN (Deep Belief Network - Deep Neural Network). In addition, an i-vector based classification system, which is widely used for speaker recognition, is evaluated on this task. Experimental results indicate that SPCC-SVM achieved the best performance, 98.81 %, while the other methods also performed well, each above 90 %.
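
To make the first of the three feature vectors concrete, the following is a minimal sketch of a generic MFCC front end written with NumPy only. It is not the authors' configuration: the function names, the frame length, hop size, FFT size, number of mel filters, and number of cepstral coefficients are all assumptions chosen for illustration, and RCGCC and SPCC are not covered here.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, centre, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def mfcc(signal, sample_rate, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    """Return an (n_frames, n_ceps) array of cepstral coefficients.

    All default parameter values are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame (zero-padded to n_fft points).
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # Mel filterbank energies, then log compression.
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_mel = np.log(mel_energy + 1e-10)
    # DCT-II (unnormalised) decorrelates the log energies into cepstra.
    n = np.arange(n_filters)
    k = np.arange(n_ceps)[:, None]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return log_mel @ dct_basis.T

# Usage: one second of a synthetic 1 kHz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
features = mfcc(np.sin(2 * np.pi * 1000.0 * t), sample_rate)
```

Each row of `features` is one frame-level feature vector. In a pipeline of the kind the abstract describes, such vectors would then be passed to a classifier such as a GMM, an SVM, or a DBN-DNN.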


Supported by: Ministry of Environment

