Music and Voice Separation Using Log-Spectral Amplitude Estimator Based on Kernel Spectrogram Models Backfitting

  • 이준용 (Department of Radio Engineering, Kwangwoon University) ;
  • 김형국 (Department of Radio Engineering, Kwangwoon University)
  • Received : 2014.12.15
  • Accepted : 2015.04.01
  • Published : 2015.05.31

Abstract

In this paper, we propose a music and voice separation method that applies a log-spectral amplitude estimator within kernel spectrogram model backfitting. The existing kernel spectrogram model approach separates the sources by estimating a model of each target source and learning from it a Wiener-type gain designed under the mean square error (MSE) criterion. We replace this Wiener-type gain with a log-spectral amplitude estimator in order to extract clearer music and voice signals than the existing approach. Experimental results show that the proposed method outperforms the existing methods.
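The difference between the two gain rules named in the abstract can be illustrated with a minimal sketch. It assumes the standard Ephraim-Malah form of the log-spectral amplitude gain, G = xi/(1+xi) * exp(0.5 * E1(v)) with v = xi * gamma / (1+xi), where xi is an a priori target-to-interference power ratio and gamma is an a posteriori ratio. How the paper derives these ratios from the kernel backfitting estimates is not stated in the abstract, so the function names, the `v_floor` guard, and the sample values below are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp1(x: float) -> float:
    """Exponential integral E1(x) for x > 0 (pure-Python approximation)."""
    if x <= 0.0:
        raise ValueError("exp1 is only defined here for x > 0")
    if x <= 1.0:
        # Series: E1(x) = -gamma - ln(x) + sum_{k>=1} (-1)^(k+1) x^k / (k * k!)
        total = -EULER_GAMMA - math.log(x)
        power_over_fact = 1.0  # holds x^k / k!
        for k in range(1, 60):
            power_over_fact *= x / k
            term = power_over_fact / k
            total += term if k % 2 == 1 else -term
            if term < 1e-17:
                break
        return total
    # Continued fraction (modified Lentz), better conditioned for x > 1.
    tiny = 1e-300
    b = x + 1.0
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 200):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(-x)


def wiener_gain(xi: float) -> float:
    """MSE-optimal (Wiener-type) gain for an a priori ratio xi."""
    return xi / (1.0 + xi)


def lsa_gain(xi: float, gamma: float, v_floor: float = 1e-8) -> float:
    """Log-spectral amplitude gain in the Ephraim-Malah form.

    v_floor is an assumed numerical guard, since E1(v) diverges as v -> 0.
    """
    v = max(wiener_gain(xi) * gamma, v_floor)
    return wiener_gain(xi) * math.exp(0.5 * exp1(v))


if __name__ == "__main__":
    # Hypothetical (xi, gamma) pairs for a single time-frequency bin.
    for xi, gamma in [(0.1, 1.0), (1.0, 2.0), (10.0, 11.0)]:
        print(f"xi={xi:5.1f} gamma={gamma:5.1f} "
              f"Wiener={wiener_gain(xi):.3f} LSA={lsa_gain(xi, gamma):.3f}")
```

In a separation pipeline, either gain would be applied per time-frequency bin to the magnitude of the mixture spectrogram. Because exp(0.5 * E1(v)) is never below 1, the log-spectral amplitude gain is never smaller than the Wiener-type gain for the same xi, and the two coincide as v grows large.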
