A NMF-Based Speech Enhancement Method Using a Prior Time Varying Information and Gain Function

Kwon, Kisoo; Jin, Yu Gwang; Bae, Soo Hyun; Kim, Nam Soo

doi:10.7840/kics.2013.38C.6.503

The Journal of Korean Institute of Communications and Information Sciences (한국통신학회논문지)

Volume 38C, Issue 6 / Pages 503-511 / 2013 / 1226-4717 (pISSN) / 2287-3880 (eISSN)

The Korean Institute of Communications and Information Sciences (한국통신학회)

시간 변화에 따른 사전 정보와 이득 함수를 적용한 NMF 기반 음성 향상 기법

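Kisoo Kwon (Department of Electrical and Computer Engineering and Institute of New Media and Communications, Seoul National University);
Yu Gwang Jin (Department of Electrical and Computer Engineering and Institute of New Media and Communications, Seoul National University);
Soo Hyun Bae (Department of Electrical and Computer Engineering and Institute of New Media and Communications, Seoul National University);
Nam Soo Kim (Department of Electrical and Computer Engineering and Institute of New Media and Communications, Seoul National University)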

Received : 2013.03.28
Accepted : 2013.05.15
Published : 2013.06.30

https://doi.org/10.7840/kics.2013.38C.6.503

Abstract

This paper presents a speech enhancement method based on non-negative matrix factorization (NMF). In the training phase, a basis matrix is obtained for each source from a speech database and a database of a specific noise type. In the enhancement phase, these basis matrices are used to separate the noisy signal into speech and noise estimates. To improve performance, the frame-to-frame change of the encoding matrix (the usage of each basis over time) is modeled from the training data with independent Gaussian distributions, and a constraint derived from these models is added to the objective function so that, at each time frame, the optimization is pulled toward the models with a given weight. In addition, the encoding matrix obtained at each frame is smoothed with its value from the previous frame. Finally, a log-spectral amplitude (LSA) type algorithm is applied as the gain function. In the experiments, PESQ was used as the performance measure, and the method applying these two processes outperformed conventional NMF-based speech enhancement.
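The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`nmf_kl`, `smooth_activations`, `enhance`), the KL-divergence multiplicative updates, the smoothing weight `alpha`, the basis counts, and the random stand-in spectrograms are all assumptions made for illustration. The Gaussian prior on encoding changes is omitted because the abstract does not give its exact form, and a simple ratio (Wiener-type) gain stands in for the LSA-type gain.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def nmf_kl(V, rank, n_iter=200, W=None, seed=0):
    """KL-divergence NMF with multiplicative updates, V ~ W @ H.

    If W is supplied it is held fixed and only the encoding matrix H is
    estimated (enhancement phase); otherwise both are learned (training).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    fixed_W = W is not None
    if not fixed_W:
        W = rng.random((n_freq, rank)) + EPS
    H = rng.random((W.shape[1], n_frames)) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + EPS))) / (W.T @ ones + EPS)
        if not fixed_W:
            W *= ((V / (W @ H + EPS)) @ H.T) / (ones @ H.T + EPS)
    return W, H


def smooth_activations(H, alpha=0.7):
    """Recursive smoothing: mix each frame with the previous smoothed frame."""
    H_s = H.copy()
    for t in range(1, H.shape[1]):
        H_s[:, t] = alpha * H_s[:, t - 1] + (1.0 - alpha) * H[:, t]
    return H_s


def enhance(V_noisy, W_speech, W_noise, alpha=0.7, n_iter=200):
    """Separate a noisy magnitude spectrogram using fixed, pre-trained bases."""
    W = np.hstack([W_speech, W_noise])
    _, H = nmf_kl(V_noisy, W.shape[1], n_iter=n_iter, W=W)
    H = smooth_activations(H, alpha)
    k = W_speech.shape[1]
    S_hat = W_speech @ H[:k]   # speech estimate
    N_hat = W_noise @ H[k:]    # noise estimate
    # Ratio (Wiener-type) gain, used here as a stand-in for the LSA-type gain.
    gain = S_hat / (S_hat + N_hat + EPS)
    return gain * V_noisy, gain


# Random non-negative matrices stand in for magnitude spectrograms
# (rows: frequency bins, columns: time frames).
rng = np.random.default_rng(1)
V_speech = rng.random((64, 100))
V_noise = rng.random((64, 100))
W_s, _ = nmf_kl(V_speech, rank=8)   # training phase: speech bases
W_n, _ = nmf_kl(V_noise, rank=4)    # training phase: noise bases
V_mix = V_speech[:, :20] + V_noise[:, :20]
V_enh, gain = enhance(V_mix, W_s, W_n)
print(V_enh.shape, float(gain.min()), float(gain.max()))
```

In this sketch the whole utterance is factorized at once and the encoding matrix is smoothed afterwards; a frame-by-frame variant would apply the same recursion inside the update loop. Replacing the ratio gain with an LSA-type gain would require the exponential integral, which is why the simpler gain is used here.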
