Speech Enhancement Using Nonnegative Matrix Factorization with Temporal Continuity

  • Received: 2015.01.27
  • Reviewed: 2015.03.27
  • Published: 2015.05.31

Abstract

This paper addresses the enhancement of noise-degraded speech using nonnegative matrix factorization (NMF) with temporal continuity. Speech and noise signals are modeled with Poisson distributions, and the basis vectors and gain vectors of NMF are modeled with Gamma distributions. Temporal continuity of the gain vectors is known to be critical to the quality of the enhanced speech. Here, temporal continuity is imposed by placing Gamma-Markov chain (GMC) priors on the noise gain vectors during the separation phase. Simulation results show that the GMC prior effectively models the temporal continuity of noise signals and tracks changes in the noise.
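Modeling a magnitude spectrogram V as Poisson with mean WH makes maximum-likelihood NMF equivalent to minimizing the generalized Kullback-Leibler divergence D(V || WH). The sketch below is only a plain maximum-likelihood baseline under that model, not the authors' algorithm: it assumes fixed, made-up speech and noise bases, estimates the gains with Lee and Seung's multiplicative KL updates, and splits the mixture with a Wiener-style mask. It deliberately omits the Gamma priors on bases and gains and the GMC prior on the noise gains, which are what the paper adds; all function names and numbers are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Product of plain-list matrices a (m x k) and b (k x n)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def kl_divergence(v, vhat, eps=1e-12):
    """Generalized KL divergence D(V || Vhat); its minimizer is the Poisson ML estimate."""
    total = 0.0
    for row_v, row_h in zip(v, vhat):
        for x, y in zip(row_v, row_h):
            y = max(y, eps)
            total += (x * math.log(x / y) if x > 0 else 0.0) - x + y
    return total

def estimate_gains(v, w, iters=200, seed=0, eps=1e-12):
    """KL multiplicative updates (Lee and Seung) for the gains H with W held fixed."""
    rng = random.Random(seed)
    n_freq, n_atoms, n_frames = len(v), len(w[0]), len(v[0])
    h = [[rng.uniform(0.5, 1.5) for _ in range(n_frames)] for _ in range(n_atoms)]
    col_sums = [sum(w[f][j] for f in range(n_freq)) for j in range(n_atoms)]  # W^T 1
    history = []
    for _ in range(iters):
        vhat = matmul(w, h)
        history.append(kl_divergence(v, vhat))
        ratio = [[v[f][t] / max(vhat[f][t], eps) for t in range(n_frames)]
                 for f in range(n_freq)]
        for j in range(n_atoms):
            for t in range(n_frames):
                # H <- H * (W^T (V / WH)) / (W^T 1). `ratio` is fixed during this sweep,
                # so updating h in place equals a simultaneous update.
                num = sum(w[f][j] * ratio[f][t] for f in range(n_freq))
                h[j][t] *= num / max(col_sums[j], eps)
    history.append(kl_divergence(v, matmul(w, h)))
    return h, history

def enhance(v, w_speech, w_noise, iters=200):
    """Hypothetical supervised separation: fixed bases, estimated gains, Wiener-style mask."""
    k_s = len(w_speech[0])
    w = [ws + wn for ws, wn in zip(w_speech, w_noise)]  # W = [W_s  W_n], column-stacked
    h, history = estimate_gains(v, w, iters)
    speech = matmul(w_speech, h[:k_s])
    noise = matmul(w_noise, h[k_s:])
    s_hat, n_hat = [], []
    for f in range(len(v)):
        mask = [s / max(s + n, 1e-12) for s, n in zip(speech[f], noise[f])]
        s_hat.append([m * x for m, x in zip(mask, v[f])])
        n_hat.append([(1.0 - m) * x for m, x in zip(mask, v[f])])
    return s_hat, n_hat, h, history

# Toy data (made up): 3 frequency bins, 2 speech atoms, 1 noise atom, 4 frames.
W_S = [[0.9, 0.0], [0.1, 0.6], [0.0, 0.4]]
W_N = [[0.2], [0.3], [0.5]]
H_TRUE = [[2.0, 0.0, 1.0, 3.0], [0.0, 2.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]

if __name__ == "__main__":
    V = matmul([ws + wn for ws, wn in zip(W_S, W_N)], H_TRUE)
    s_hat, n_hat, h, history = enhance(V, W_S, W_N, iters=300)
    print("KL divergence: %.4f -> %.6f" % (history[0], history[-1]))
```

Because the noise gains here are estimated independently per frame, nothing ties frame t to frame t-1; a temporal prior such as the GMC prior described above is what would couple neighboring noise gains.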
