Music/Voice Separation Based on Kernel Back-Fitting Using Weighted β-Order MMSE Estimation

  • Kim, Hyoung-Gook (Department of Electronics Convergence Engineering, Kwangwoon University)
  • Kim, Jin Young (Department of Electronics and Computer Engineering, Chonnam National University)
  • Received : 2015.03.16
  • Accepted : 2015.12.28
  • Published : 2016.06.01

Abstract

Separating mixed signals into music and voice components has attracted the attention of many researchers. Recently, iterative kernel back-fitting, also known as kernel additive modeling, was proposed and achieved good results for music/voice separation. Within this framework, generalized spatial Wiener filtering (GW) is typically used to obtain minimum mean square error (MMSE) estimates of the short-time Fourier transforms of the sources. In this paper, we propose an advanced music/voice separation method that uses generalized weighted $\beta$-order MMSE estimation (WbE) within iterative kernel back-fitting (KBF). In the proposed method, WbE performs the separation step on the mixed music signal, while KBF fits the kernel spectrogram models at each iteration. Experimental results show that the proposed method achieves better separation performance than GW and existing Bayesian estimators.
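The abstract names two alternating ingredients: a kernel fitting step (KBF) and a per-bin MMSE separation step (GW in kernel additive modeling, WbE in the proposed method). The Python sketch below shows how they alternate, under assumptions that are not taken from the paper: a single-channel magnitude spectrogram stored as nested lists, a median-based fit (`median_kernel`), two hypothetical kernels (`repeating_offsets`, with the repetition period assumed known, for the accompaniment and `VOICE_OFFSETS` for the voice), and a per-bin Wiener mask standing in for full generalized spatial Wiener filtering. The `"beta"` branch swaps in the unweighted $\beta$-order MMSE amplitude estimator of You, Koh and Rahardja, treating the other sources as noise; the weighting that turns it into WbE is not described in the abstract and is not modelled here.

```python
import math
import statistics


def hyp1f1(a, b, z, tol=1e-14, max_terms=2000):
    """Kummer's function M(a, b; z) from its power series (used here for 0 <= z <= 200)."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) / (b + k) * z / (k + 1)
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def beta_order_gain(xi, gamma, beta):
    """Gain of the beta-order MMSE amplitude estimator (You, Koh and Rahardja):
    G = sqrt(v)/gamma * [Gamma(beta/2 + 1) * M(-beta/2, 1; -v)] ** (1/beta),
    v = xi * gamma / (1 + xi); xi, gamma are the a priori / a posteriori SNRs,
    gamma > 0, beta > -2 and beta != 0."""
    v = xi * gamma / (1.0 + xi)
    if v > 200.0:
        # Large-v limit of the formula: G -> v / gamma = xi / (1 + xi), the Wiener gain.
        return xi / (1.0 + xi)
    # Kummer's transformation M(a, 1; -v) = exp(-v) * M(1 - a, 1; v) keeps all
    # series terms positive, avoiding cancellation in the alternating series.
    m = math.exp(-v) * hyp1f1(1.0 + beta / 2.0, 1.0, v)
    return math.sqrt(v) / gamma * (math.gamma(beta / 2.0 + 1.0) * m) ** (1.0 / beta)


def median_kernel(spec, offsets):
    """Kernel fitting step: each bin becomes the median of the bins at the given
    (frequency, time) offsets that fall inside the grid; (0, 0) must be included."""
    n_f, n_t = len(spec), len(spec[0])
    return [[statistics.median(spec[f + df][t + dt] for df, dt in offsets
                               if 0 <= f + df < n_f and 0 <= t + dt < n_t)
             for t in range(n_t)] for f in range(n_f)]


def repeating_offsets(period, reps=2):
    """Hypothetical accompaniment kernel: the same bin in frames t + k * period."""
    return [(0, k * period) for k in range(-reps, reps + 1)]


# Hypothetical voice kernel: a short frequency-direction neighbourhood.
VOICE_OFFSETS = [(df, 0) for df in range(-2, 3)]


def separate(mix_mag, psds, estimator="wiener", beta=1.0, eps=1e-12):
    """Separation step: magnitude estimate of every source given the current PSDs."""
    n_f, n_t = len(mix_mag), len(mix_mag[0])
    estimates = []
    for j, psd in enumerate(psds):
        est = [[0.0] * n_t for _ in range(n_f)]
        for f in range(n_f):
            for t in range(n_t):
                r = mix_mag[f][t]
                rest = sum(p[f][t] for k, p in enumerate(psds) if k != j) + eps
                if estimator == "wiener":
                    gain = psd[f][t] / (psd[f][t] + rest)  # masks sum to 1 (up to eps)
                elif r > 0.0:
                    # Target source vs. the sum of the other sources, treated as noise.
                    gain = beta_order_gain(psd[f][t] / rest, r * r / rest, beta)
                else:
                    gain = 0.0  # silent bin: nothing to scale
                est[f][t] = gain * r
        estimates.append(est)
    return estimates


def kernel_backfitting(mix_mag, kernels, n_iter=5, estimator="wiener", beta=1.0):
    """Alternate separation and per-source kernel fitting, then separate once more."""
    n_src = len(kernels)
    # Start with every source owning an equal share of the mixture power.
    psds = [[[r * r / n_src for r in row] for row in mix_mag] for _ in range(n_src)]
    for _ in range(n_iter):
        estimates = separate(mix_mag, psds, estimator, beta)
        psds = [median_kernel([[a * a for a in row] for row in est], offsets)
                for est, offsets in zip(estimates, kernels)]
    return separate(mix_mag, psds, estimator, beta)


# Toy 6 x 12 magnitude spectrogram: a note in bin 2 repeating every 4 frames,
# plus a one-off broadband event in frame 5.
mix = [[1.0] * 12 for _ in range(6)]
for t in range(0, 12, 4):
    mix[2][t] += 2.0
for f in range(6):
    mix[f][5] += 3.0
music, voice = kernel_backfitting(mix, [repeating_offsets(4), VOICE_OFFSETS])
```

With the Wiener step, the two magnitude estimates add back up to the mixture in every bin (up to `eps`). The $\beta$-order branch (`estimator="beta"`) gives up that property, because each source is estimated on its own against the sum of the others.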
