• Title/Summary/Keyword: Binary mask


Binary Mask Criteria Based on Distortion Constraints Induced by a Gain Function for Speech Enhancement

  • Kim, Gibak
    • IEIE Transactions on Smart Processing and Computing / v.2 no.4 / pp.197-202 / 2013
  • Large gains in speech intelligibility can be obtained using the SNR-based binary mask approach. This approach retains the time-frequency (T-F) units of the mixture signal in which the target signal is stronger than the interfering noise (masker) (e.g., SNR > 0 dB) and removes the T-F units in which the interfering noise is dominant. This paper introduces two alternative binary masks based on distortion constraints to improve speech intelligibility. The distortion constraints are induced by a gain function for estimating the short-time spectral amplitude. One binary mask is designed to retain the speech-underestimated T-F units while removing the speech-overestimated T-F units. The other is designed to retain the noise-overestimated T-F units while removing the noise-underestimated T-F units. Listening tests with oracle binary masks were conducted to assess the potential of the two binary masks for improving intelligibility. The results suggest that the two binary masks based on distortion constraints can provide large gains in intelligibility when applied to noise-corrupted speech.
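
The SNR-based rule in this abstract (retain a T-F unit when the target is stronger than the masker, remove it otherwise) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the `eps` guard, and the toy power values are assumptions, and the oracle setting is assumed, where target and masker powers are known separately.

```python
import math

def local_snr_db(target_power, masker_power, eps=1e-12):
    """Local SNR in dB for one T-F unit; eps (an assumption) avoids log of zero."""
    return 10.0 * math.log10((target_power + eps) / (masker_power + eps))

def snr_binary_mask(target, masker, threshold_db=0.0):
    """Return 1 (retain) or 0 (remove) for every T-F unit.

    target and masker are 2-D lists of power values indexed [frame][band].
    """
    return [
        [1 if local_snr_db(t, m) > threshold_db else 0
         for t, m in zip(t_row, m_row)]
        for t_row, m_row in zip(target, masker)
    ]

def apply_mask(mixture, mask):
    """Zero out the removed units of the mixture and keep the rest untouched."""
    return [[x * g for x, g in zip(x_row, g_row)]
            for x_row, g_row in zip(mixture, mask)]

# Toy example: 2 frames x 2 bands of made-up power values.
target = [[4.0, 0.1], [0.5, 2.0]]
masker = [[1.0, 1.0], [1.0, 1.0]]
mixture = [[5.0, 1.1], [1.5, 3.0]]

mask = snr_binary_mask(target, masker)   # threshold of 0 dB, as in the abstract
enhanced = apply_mask(mixture, mask)
print(mask)
print(enhanced)
```

The two alternative masks proposed in the paper use distortion constraints instead of the SNR criterion; their exact conditions are not given in the abstract, so they are not sketched here.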


Eigenvoice Adaptation of Classification Model for Binary Mask Estimation (Eigenvoice를 이용한 이진 마스크 분류 모델 적응 방법)

  • Kim, Gibak
    • Journal of Broadcast Engineering / v.20 no.1 / pp.164-170 / 2015
  • This paper deals with adapting the classification model used in the binary mask approach to suppress noise in noisy environments. The binary mask estimation approach is known to improve the intelligibility of noisy speech. However, the training data used to build the classification model for binary mask estimation must include the same type of noise as the test data. Eigenvoice adaptation is applied to a noise-independent classification model, and the adapted model is used as a noise-dependent model. The results are reported as hit rates and false alarm rates. The experiments confirm that classification accuracy improves as the number of adaptation sentences increases.
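
The abstract does not define its two metrics. The sketch below assumes the usual reading for binary mask estimation: the hit rate is the fraction of target-dominant units (ideal mask 1) that the classifier also labels 1, and the false alarm rate is the fraction of noise-dominant units (ideal mask 0) that it wrongly labels 1. The function name and the toy labels are hypothetical.

```python
def hit_fa_rates(ideal, estimated):
    """Return (hit rate, false alarm rate) for two flat lists of 0/1 labels."""
    hits = sum(1 for i, e in zip(ideal, estimated) if i == 1 and e == 1)
    false_alarms = sum(1 for i, e in zip(ideal, estimated) if i == 0 and e == 1)
    n_target = sum(1 for i in ideal if i == 1)   # target-dominant units
    n_noise = len(ideal) - n_target              # noise-dominant units
    hit = hits / n_target if n_target else 0.0
    fa = false_alarms / n_noise if n_noise else 0.0
    return hit, fa

# Toy labels for eight T-F units (made up for illustration).
ideal     = [1, 1, 1, 1, 0, 0, 0, 0]
estimated = [1, 1, 1, 0, 1, 0, 0, 0]

hit, fa = hit_fa_rates(ideal, estimated)
print(hit, fa)  # 0.75 0.25
```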

Two-Microphone Binary Mask Speech Enhancement in Diffuse and Directional Noise Fields

  • Abdipour, Roohollah;Akbari, Ahmad;Rahmani, Mohsen
    • ETRI Journal / v.36 no.5 / pp.772-782 / 2014
  • Two-microphone binary mask speech enhancement (2mBMSE) has been of particular interest in recent literature and has shown promising results. Current 2mBMSE systems rely on spatial cues of speech and noise sources. Although these cues are helpful for directional noise sources, they lose their efficiency in diffuse noise fields. We propose a new system that is effective in both directional and diffuse noise conditions. The system exploits two features. The first determines whether a given time-frequency (T-F) unit of the input spectrum is dominated by a diffuse or directional source. A diffuse signal is certainly a noise signal, but a directional signal could correspond to a noise or speech source. The second feature discriminates between T-F units dominated by speech or directional noise signals. Speech enhancement is performed using a binary mask, calculated based on the proposed features. In both directional and diffuse noise fields, the proposed system segregates speech T-F units with hit rates above 85%. It outperforms previous solutions in terms of signal-to-noise ratio and perceptual evaluation of speech quality improvement, especially in diffuse noise conditions.

Adaptation of Classification Model for Improving Speech Intelligibility in Noise (음성 명료도 향상을 위한 분류 모델의 잡음 환경 적응)

  • Jung, Junyoung;Kim, Gibak
    • Journal of Broadcast Engineering / v.23 no.4 / pp.511-518 / 2018
  • This paper deals with improving speech intelligibility by applying a binary mask to the time-frequency units of speech in noise. Each mask value is set to "1" or "0" according to whether speech or noise is dominant, which is determined by comparing the signal-to-noise ratio with a predefined threshold. A Bayesian classifier trained with a Gaussian mixture model is used to estimate the binary mask of each time-frequency unit. The binary mask based noise suppressor improves speech intelligibility only in noise conditions that are included in the training data. In this paper, speaker adaptation techniques from speech recognition are applied to adapt the Gaussian mixture model to a new noise environment. Experiments with noise-corrupted speech are conducted to demonstrate the improvement in speech intelligibility obtained by employing adaptation techniques in a new noise environment.

Method for Spectral Enhancement by Binary Mask for Speech Recognition Enhancement Under Noise Environment (잡음환경에서 음성인식 성능향상을 위한 바이너리 마스크를 이용한 스펙트럼 향상 방법)

  • Choi, Gab-Keun;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea / v.29 no.7 / pp.468-474 / 2010
  • The major obstacle to the practical use of speech recognition is distortion caused by ambient and channel noise. In general, ambient noise degrades performance and restricts where recognition can be used. Speech recognition based on DSR (Distributed Speech Recognition) has the same problem. Various noise cancelling algorithms have been applied to address it, but spectral loss and residual noise caused by incorrect noise estimation in low-SNR environments lower the recognition rate. This paper proposes a speech enhancement method that uses MMSE-STSA for noise cancelling and an ideal binary mask to compensate for the damaged spectrum. In experiments in noisy environments (SNR from 15 dB down to 0 dB), the proposed method showed better spectral results and recognition performance.

Noise Reduction Using Gaussian Mixture Model and Morphological Filter (가우스 혼합모델과 형태학적 필터를 이용한 잡음 제거)

  • Eom Il-Kyu;Kim Yoo-Shin
    • Journal of the Institute of Electronics Engineers of Korea SP / v.41 no.1 / pp.29-36 / 2004
  • Generally, wavelet coefficients can be classified into two categories: large coefficients carrying much signal information and small coefficients carrying little. This statistical characteristic of wavelet coefficients is approximated by a Gaussian mixture model and applied efficiently to noise reduction. In this paper, we propose an image denoising method using mixture modeling of wavelet coefficients. A binary mask value is generated by a suitable threshold that classifies the wavelet coefficients into the two categories, and the binary mask information is used to remove image noise. We also develop a method for enhancing the mask values using a morphological filter and apply it to image denoising to improve the proposed method. Simulation results show that the proposed method achieves better PSNRs than state-of-the-art denoising methods.
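
The pipeline in this abstract (threshold the coefficients into a binary mask, then clean the mask with a morphological filter) can be sketched roughly as below. Several things here are assumptions rather than the paper's method: the abstract does not say which morphological filter is used, so an opening (erosion followed by dilation) with a 3-sample window stands in for it; the coefficients are treated as a 1-D row rather than 2-D image subbands; and the threshold is passed in by hand instead of being derived from a Gaussian mixture model.

```python
def threshold_mask(coeffs, threshold):
    """Label a coefficient 1 if its magnitude exceeds the threshold, else 0."""
    return [1 if abs(c) > threshold else 0 for c in coeffs]

def erode(mask):
    """Keep a 1 only where the sample and both neighbours are 1 (borders count as 0)."""
    padded = [0] + mask + [0]
    return [min(padded[i], padded[i + 1], padded[i + 2]) for i in range(len(mask))]

def dilate(mask):
    """Set a 1 wherever the sample or either neighbour is 1."""
    padded = [0] + mask + [0]
    return [max(padded[i], padded[i + 1], padded[i + 2]) for i in range(len(mask))]

def open_mask(mask):
    """Morphological opening: removes runs of 1s shorter than the 3-sample window."""
    return dilate(erode(mask))

# Toy row of wavelet coefficients (made up): one isolated spike and one run.
coeffs = [0.1, 5.0, 0.2, 0.1, 4.0, 6.0, 5.5, 0.3]

raw_mask = threshold_mask(coeffs, 1.0)
cleaned_mask = open_mask(raw_mask)
denoised = [c * m for c, m in zip(coeffs, cleaned_mask)]
print(raw_mask)      # [0, 1, 0, 0, 1, 1, 1, 0]
print(cleaned_mask)  # [0, 0, 0, 0, 1, 1, 1, 0]
```

The opening discards the isolated large coefficient on the assumption that it is noise; whether that trade-off matches the paper's filter cannot be told from the abstract.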

Adaptive Noise Canceller for Speech Enhancement Using 2-D Binary Mask (2차원 이진 마스크를 이용한 적응형 음성향상 잡음 제거기)

  • Lee, Gihyoun;Lee, Jyung Hyun;Cho, Jin-Ho;Kim, Myoung Nam
    • Journal of Korea Multimedia Society / v.19 no.7 / pp.1127-1136 / 2016
  • Speech enhancement algorithms play an important role in numerous speech signal processing applications. Over the last few decades, many speech enhancement algorithms have been studied, based on spectral subtraction, the Wiener filter, subspace methods, and so on. They perform well, but their performance can deteriorate under specific noises or in low-SNR environments. In this paper, a new speech enhancement algorithm based on an adaptive noise canceller is proposed. The proposed algorithm improves the performance of adaptive noise cancelling by using a 2-D binary mask. Objective experimental measures confirm that the proposed algorithm is useful and performs better than recently proposed speech enhancement algorithms.

Resolution Limit Analysis of Isolated Patterns Using Optical Proximity Correction Method with Attenuated Phase Shift Mask (Attenuated Phase Shift Mask에 광 근접 효과 보정을 적용한 고립 패턴의 해상 한계 분석)

  • 김종선;오용호;임성우;고춘수;이재철
    • Journal of the Korean Institute of Electrical and Electronic Material Engineers / v.13 no.11 / pp.901-907 / 2000
  • As the minimum feature size for making ULSI approaches the wavelength of the light source in optical lithography, the aerial image is so severely distorted by the optical proximity effect that accurate reconstruction of the mask image on the wafer surface is almost impossible. We applied Optical Proximity Correction (OPC) to isolated patterns, assuming an Attenuated Phase Shift Mask (APSM) as well as a binary mask, to correct the widening of isolated patterns. In this study, we found that applying OPC to the APSM gives much greater improvement than applying OPC to the binary mask, not only in the resolution and fidelity of the images but also in the process margin. We also propose an OPC method for the APSM for isolated patterns whose size is less than the wavelength of the ArF excimer laser. Finally, we predicted the resolution limit of optical lithography through aerial image simulation.


Binary Mask Estimation using Training-based SNR Estimation for Improving Speech Intelligibility (음성 명료도 향상을 위한 학습 기반의 신호 대 잡음 비 추정을 이용한 이산 마스크 추정 방법)

  • Kim, Gibak
    • Journal of Broadcast Engineering / v.17 no.6 / pp.1061-1068 / 2012
  • This paper deals with a noise reduction algorithm that uses the binary masking approach in the time-frequency domain to improve speech intelligibility. In the binary masking approach, the noise-corrupted speech is decomposed into time-frequency units. Noise-dominant time-frequency units are removed by setting the corresponding binary mask values to "0", and target-dominant units are retained untouched by setting the mask values to "1". We propose estimating the binary mask by comparing the local signal-to-noise ratio (SNR) to a threshold. The local SNR is estimated by a training-based approach. An optimal threshold is proposed, obtained by observing the distribution of the training database. The proposed method is evaluated with normal-hearing subjects, and intelligibility scores are computed by counting the number of words correctly recognized.