Vocal separation method using weighted β-order minimum mean square error estimation based on kernel back-fitting


  • Received : 2015.08.13
  • Accepted : 2015.09.17
  • Published : 2016.01.31


In this paper, we propose a vocal separation method that uses weighted ${\beta}$-order minimum mean square error estimation (WbE) based on the kernel back-fitting algorithm. In speech enhancement, WbE is well known to outperform existing Bayesian estimators, such as the minimum mean square error (MMSE) estimator of the short-time spectral amplitude (STSA) and the MMSE estimator of the logarithm of the STSA (LSA), in terms of both objective and subjective measures. In the proposed method, WbE is applied to a basic iterative kernel back-fitting algorithm to improve vocal separation performance for monaural music signals. The experimental results show that the proposed method achieves better separation performance than existing methods.
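The abstract does not state the estimator's gain function, so the following is only a minimal sketch of a weighted ${\beta}$-order MMSE spectral gain under stated assumptions. It assumes the cost function $((X^{\beta} - \hat{X}^{\beta}) / X^{\alpha})^2$ and the usual complex Gaussian model from the speech enhancement literature, for which the gain is the ratio of two conditional moments of the amplitude $X$. The names `hyp1f1_b1`, `weighted_beta_gain`, `xi` (a priori SNR), `gamma_snr` (a posteriori SNR), and `alpha` are illustrative and do not come from the paper. How the gain is combined with the kernel back-fitting iterations is not shown here.

```python
import math


def hyp1f1_b1(a, x, tol=1e-12, max_terms=5000):
    """Power series for Kummer's function M(a, 1, x).

    Intended for a > 0 and 0 <= x < ~700, where all terms are positive
    and the sum does not overflow a double.
    """
    term, total = 1.0, 1.0
    for n in range(max_terms):
        # Ratio of consecutive terms: (a + n) * x / (n + 1)^2
        term *= (a + n) * x / ((n + 1) ** 2)
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def weighted_beta_gain(xi, gamma_snr, beta=1.0, alpha=0.0):
    """Spectral gain for the assumed weighted beta-order MMSE cost.

    Assumes beta != 0, alpha < 1 and beta - 2 * alpha > -2.
    With alpha = 0 and beta = 1 this reduces to the MMSE-STSA gain.
    """
    v = xi / (1.0 + xi) * gamma_snr
    m = beta - 2.0 * alpha
    # Kummer's transformation M(a, 1, -v) = exp(-v) * M(1 - a, 1, v) is applied
    # to numerator and denominator; the exp(-v) factors cancel in the ratio.
    num = math.gamma(m / 2.0 + 1.0) * hyp1f1_b1(1.0 + m / 2.0, v)
    den = math.gamma(1.0 - alpha) * hyp1f1_b1(1.0 - alpha, v)
    return math.sqrt(v) / gamma_snr * (num / den) ** (1.0 / beta)
```

At high SNR the gain approaches the Wiener gain $\xi / (1 + \xi)$, and a positive `alpha` gives stronger attenuation than `alpha = 0` for the same SNR values.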


Vocal separation; Kernel back-fitting; Weighted ${\beta}$-order MMSE estimation




Supported by : Institute for Information & communications Technology Promotion (IITP), National Research Foundation of Korea (NRF)