• Title/Summary/Keyword: Noisy environments


Feature Compensation Combining SNR-Dependent Feature Reconstruction and Class Histogram Equalization

  • Suh, Young-Joo; Kim, Hoi-Rin
    • ETRI Journal / v.30 no.5 / pp.753-755 / 2008
  • In this letter, we propose a new histogram equalization technique for feature compensation in speech recognition under noisy environments. The proposed approach combines a signal-to-noise-ratio-dependent feature reconstruction method and the class histogram equalization technique to effectively reduce the acoustic mismatch present in noisy speech features. Experimental results from the Aurora 2 task confirm the superiority of the proposed approach for acoustic feature compensation.
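Histogram equalization (HEQ) maps the empirical CDF of each feature dimension onto a reference distribution. The sketch below shows only plain, single-class HEQ for one dimension. It assumes a standard Gaussian reference and a (rank + 0.5)/N CDF estimate, and `histogram_equalize` is a hypothetical name. The class-based variant and the SNR-dependent reconstruction described in the letter are not reproduced.

```python
from statistics import NormalDist


def histogram_equalize(values: list[float]) -> list[float]:
    """Plain HEQ for one feature dimension (not the letter's class-based HEQ)."""
    n = len(values)
    reference = NormalDist(mu=0.0, sigma=1.0)  # assumed reference distribution
    # Rank the samples; ties are broken by original position.
    order = sorted(range(n), key=lambda i: values[i])
    equalized = [0.0] * n
    for rank, i in enumerate(order):
        cdf = (rank + 0.5) / n  # empirical CDF kept strictly inside (0, 1)
        equalized[i] = reference.inv_cdf(cdf)  # map onto the reference quantile
    return equalized


if __name__ == "__main__":
    noisy_c1 = [4.1, 3.9, 5.0, 4.4, 3.7, 4.8]  # toy values for one cepstral coefficient
    print([round(v, 3) for v in histogram_equalize(noisy_c1)])
```

Because only the ranks matter, a constant offset added to every value (such as a stationary channel bias in the cepstral domain) leaves the output unchanged.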


A Comparison of Front-Ends for Robust Speech Recognition

  • Kim, Doh-Suk; Jeong, Jae-Hoon; Lee, Soo-Young; Kil, Rhee M.
    • The Journal of the Acoustical Society of Korea / v.17 no.3E / pp.3-11 / 1998
  • The Zero-Crossings with Peak Amplitudes (ZCPA) model, motivated by the human auditory periphery, was proposed to extract reliable features from speech signals even in noisy environments for robust speech recognition. In this paper, the performance of the ZCPA model is further improved by applying conventional speech processing techniques to the model output. Spectral and cepstral representations of the ZCPA model output are compared, and dynamic features computed with time-derivative windows of several different lengths are evaluated. Comparative evaluations against other front-ends in real-world noisy environments are also performed, and the results demonstrate the superiority of the ZCPA model.
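One channel of ZCPA-style feature extraction can be sketched as follows. The sketch assumes the signal has already passed through one cochlear bandpass filter. It takes the inverse of the interval between successive upward zero crossings as the frequency estimate and adds the log(1 + peak) of the peak amplitude inside that interval to the matching frequency bin. The bin edges, the compression function, and the name `zcpa_histogram` are illustrative assumptions, not the paper's exact configuration.

```python
import math


def zcpa_histogram(channel: list[float], fs: float, bin_edges: list[float]) -> list[float]:
    """Accumulate a ZCPA-style frequency histogram for one bandpass channel."""
    # Indices where the signal goes from negative to non-negative (upward crossings).
    ups = [n for n in range(1, len(channel)) if channel[n - 1] < 0.0 <= channel[n]]
    hist = [0.0] * (len(bin_edges) - 1)
    for start, stop in zip(ups, ups[1:]):
        freq = fs / (stop - start)  # inverse of the zero-crossing interval, in Hz
        peak = max(channel[start:stop])  # peak amplitude within the interval
        for b in range(len(hist)):
            if bin_edges[b] <= freq < bin_edges[b + 1]:
                hist[b] += math.log1p(peak)  # assumed nonlinear compression
                break
    return hist


if __name__ == "__main__":
    fs = 8000.0
    tone = [2.0 * math.sin(2 * math.pi * n / 16 + 0.1) for n in range(800)]  # 500 Hz
    print(zcpa_histogram(tone, fs, [0.0, 400.0, 600.0, 4000.0]))
```

Because the bin index depends only on zero-crossing timing, scaling the input changes the log(1 + peak) weights but not which bins are hit.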


Robust Feature Extraction for Voice Activity Detection in Nonstationary Noisy Environments (음성구간검출을 위한 비정상성 잡음에 강인한 특징 추출)

  • Hong, Jungpyo; Park, Sangjun; Jeong, Sangbae; Hahn, Minsoo
    • Phonetics and Speech Sciences / v.5 no.1 / pp.11-16 / 2013
  • This paper proposes a robust feature extraction method for accurate voice activity detection (VAD). VAD is one of the principal modules in speech signal processing systems such as speech codecs, speech enhancement, and speech recognition. In noisy environments, nonstationary noise causes VAD accuracy to decline drastically, because fluctuating features in noise-only intervals increase the false alarm rate. To improve VAD performance, this paper proposes harmonic-weighted energy. This feature focuses on voiced speech intervals and uses weighted harmonic-to-noise ratios to determine the degree of harmonicity in the frame energy. For performance evaluation, receiver operating characteristic (ROC) curves and the equal error rate (EER) are measured.
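The abstract does not give the exact weighting, so the following is a hypothetical illustration of an HNR-weighted energy. Harmonicity is taken as the peak normalized autocorrelation r over an assumed pitch range of 70-400 Hz. This value is converted to an HNR of r / (1 - r), and the frame energy is scaled by HNR / (1 + HNR). That weight is algebraically equal to r, but it is written in HNR form to mirror the abstract. The function names are hypothetical.

```python
import math
import random


def normalized_autocorr_peak(frame: list[float], min_lag: int, max_lag: int) -> float:
    """Largest normalized autocorrelation over candidate pitch lags, clipped to [0, 1)."""
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        a, b = frame[:-lag], frame[lag:]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        if den > 0.0:
            best = max(best, num / den)
    return min(best, 1.0 - 1e-6)  # keep r < 1 so the HNR below stays finite


def harmonic_weighted_energy(frame: list[float], fs: float,
                             f0_min: float = 70.0, f0_max: float = 400.0) -> float:
    """Hypothetical HNR-weighted frame energy (not the paper's exact formula)."""
    energy = sum(x * x for x in frame)
    r = normalized_autocorr_peak(frame, int(fs / f0_max), int(fs / f0_min))
    hnr = r / (1.0 - r)  # harmonic-to-noise ratio implied by r
    weight = hnr / (1.0 + hnr)  # near 1 for voiced frames, small for noise
    return weight * energy


if __name__ == "__main__":
    fs = 8000.0
    voiced = [math.sin(2 * math.pi * 200 * n / fs) for n in range(256)]
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(256)]
    for name, frame in (("voiced", voiced), ("noise", noise)):
        print(name, harmonic_weighted_energy(frame, fs) / sum(x * x for x in frame))
```

With equal raw energy, a noise-only frame receives a much smaller weighted energy than a voiced frame. This is the property that helps suppress false alarms in noise intervals.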

Optimized Wiener Filter for Noise Reduction in VoIP Environments (VoIP 환경에서의 잡음제거를 위한 최적화된 위너 필터)

  • Jeong, Sang-Bae; Lee, Sung-Doke; Hahn, Min-Soo
    • MALSORI / no.64 / pp.105-119 / 2007
  • Noise reduction technologies are indispensable for achieving acceptable speech quality in VoIP systems. This paper proposes a Wiener filter optimized to the estimated SNR of noisy speech for noise reduction in VoIP environments. The proposed noise canceller is applied as a pre-processor before speech encoding. Its performance is evaluated with PESQ under various noisy conditions, with the proposed algorithm applied to G.711, G.723.1, and G.729A, which are all VoIP speech codecs. The PESQ results show that the proposed noise reduction scheme outperforms the noise suppression in the IS-127 EVRC and in the ETSI standard for the advanced distributed speech recognition front-end.
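For reference, the textbook per-bin Wiener suppression rule that such a filter builds on is G = ξ / (1 + ξ), where ξ is the a priori SNR. The sketch below estimates ξ with the simple maximum-likelihood rule max(|Y|²/λ_N - 1, 0) and applies an assumed gain floor. The paper's optimization of the filter to the estimated SNR is not reproduced, and the function names are hypothetical.

```python
def wiener_gains(noisy_power: list[float], noise_power: list[float],
                 gain_floor: float = 0.1) -> list[float]:
    """Per-bin Wiener gain xi / (1 + xi) with an assumed floor against over-suppression."""
    gains = []
    for py, pn in zip(noisy_power, noise_power):
        xi = max(py / pn - 1.0, 0.0)  # maximum-likelihood a priori SNR estimate
        gains.append(max(xi / (1.0 + xi), gain_floor))
    return gains


def apply_gains(noisy_spectrum: list[complex], gains: list[float]) -> list[complex]:
    """Scale each complex STFT bin; the noisy phase is kept unchanged."""
    return [g * y for g, y in zip(gains, noisy_spectrum)]


if __name__ == "__main__":
    print(wiener_gains([10.0, 1.0, 2.0], [1.0, 1.0, 1.0]))
```

Bins with high local SNR pass almost unchanged, while bins dominated by noise are attenuated down to the floor.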


Voice Activity Detection Based on Signal Energy and Entropy-difference in Noisy Environments (엔트로피 차와 신호의 에너지에 기반한 잡음환경에서의 음성검출)

  • Ha, Dong-Gyung; Cho, Seok-Je; Jin, Gang-Gyoo; Shin, Ok-Keun
    • Journal of Advanced Marine Engineering and Technology / v.32 no.5 / pp.768-774 / 2008
  • In many areas of speech signal processing, such as automatic speech recognition and packet-based voice communication, voice activity detection (VAD) plays an important role in overall system performance. In this paper, we present a new feature parameter for VAD: the product of the signal energy and the difference between two types of entropy. To this end, we first define a Mel filter-bank based entropy and calculate its difference from the conventional frequency-domain entropy. The difference is then multiplied by the spectral energy of the signal to yield the final feature parameter, which we call PEED (product of energy and entropy difference). Experiments verify that the proposed VAD parameter is more effective than the conventional spectral-entropy-based parameter across various SNRs and noisy environments.
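A PEED-like feature for one frame can be sketched as follows. The sketch assumes a plain DFT power spectrum, 20 triangular Mel filters, natural-log Shannon entropies, and the sign convention H_Mel - H_spectral, following the abstract's wording "its difference from the conventional entropy". The paper's exact filter-bank layout, normalization, and sign may differ. The code avoids external dependencies, so the DFT is an O(N²) loop.

```python
import cmath
import math


def power_spectrum(frame: list[float]) -> list[float]:
    """|DFT|^2 for bins 0..N/2 using a direct DFT (no external libraries)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_energies(power: list[float], fs: float, n_filters: int) -> list[float]:
    """Triangular Mel filter-bank energies over the one-sided power spectrum."""
    n_fft = 2 * (len(power) - 1)
    mel_max = hz_to_mel(fs / 2.0)
    # Filter edges and centres expressed as fractional DFT bin positions.
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) * n_fft / fs for i in range(n_filters + 2)]
    energies = []
    for m in range(1, n_filters + 1):
        left, centre, right = edges[m - 1], edges[m], edges[m + 1]
        e = 0.0
        for k, p in enumerate(power):
            if left < k < centre:
                e += p * (k - left) / (centre - left)
            elif centre <= k < right:
                e += p * (right - k) / (right - centre)
        energies.append(e)
    return energies


def entropy(values: list[float]) -> float:
    """Shannon entropy (nats) of non-negative values normalized to sum to 1."""
    total = sum(values)
    if total <= 0.0:
        return 0.0
    return -sum(v / total * math.log(v / total) for v in values if v > 0.0)


def peed(frame: list[float], fs: float, n_filters: int = 20) -> float:
    """Energy times (Mel entropy - spectral entropy); the sign convention is assumed."""
    power = power_spectrum(frame)
    energy = sum(power)
    return energy * (entropy(mel_energies(power, fs, n_filters)) - entropy(power))
```

Since both entropies are invariant to scaling, PEED grows with the square of the signal amplitude. The entropy difference carries the spectral-shape information.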

Practical Considerations for Hardware Implementations of the Auditory Model and Evaluations in Real World Noisy Environments

  • Kim, Doh-Suk; Jeong, Jae-Hoon; Lee, Soo-Young; Kil, Rhee M.
    • The Journal of the Acoustical Society of Korea / v.16 no.1E / pp.15-23 / 1997
  • The Zero-Crossings with Peak Amplitudes (ZCPA) model, motivated by the human auditory periphery, was proposed to extract reliable features from speech signals even in noisy environments for robust speech recognition. In this paper, some practical considerations for digital hardware implementations of the ZCPA model are addressed and evaluated for the recognition of speech corrupted by several real-world noises as well as white Gaussian noise. The infinite impulse response (IIR) filters that constitute the cochlear filterbank of the ZCPA model are replaced by Hamming bandpass filters, whose frequency responses are less similar to biological neural tuning curves. Experimental results demonstrate that the detailed frequency response of the cochlear filters is not critical to performance. The sensitivity of the model output to variations in microphone gain is also investigated, and the results confirm the good reliability of the ZCPA model.
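Suppose the Hamming bandpass filters are FIR filters designed by the Hamming-windowed-sinc method. This is an assumption, since the abstract does not state the design procedure. Under that assumption, one cochlear channel could be built as shown below. The tap count and band edges are illustrative, and `magnitude_response` is a helper added only to check the design.

```python
import math


def hamming_bandpass(num_taps: int, f_low: float, f_high: float, fs: float) -> list[float]:
    """Windowed-sinc FIR bandpass: (ideal lowpass at f_high - ideal lowpass at f_low) * Hamming."""
    if num_taps % 2 == 0:
        raise ValueError("use an odd number of taps for a symmetric linear-phase filter")
    mid = (num_taps - 1) / 2
    lo, hi = f_low / fs, f_high / fs  # cutoffs as fractions of the sampling rate

    def sinc(x: float) -> float:
        return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * hi * sinc(2 * hi * t) - 2 * lo * sinc(2 * lo * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))  # Hamming window
        taps.append(ideal * window)
    return taps


def magnitude_response(taps: list[float], freq: float, fs: float) -> float:
    """|H(f)| of an FIR filter evaluated directly from its taps."""
    re = sum(h * math.cos(2 * math.pi * freq * n / fs) for n, h in enumerate(taps))
    im = sum(h * math.sin(2 * math.pi * freq * n / fs) for n, h in enumerate(taps))
    return math.hypot(re, im)


if __name__ == "__main__":
    taps = hamming_bandpass(101, 800.0, 1200.0, 8000.0)
    for f in (0.0, 1000.0, 3000.0):
        print(f, round(magnitude_response(taps, f, 8000.0), 4))
```

Unlike an IIR gammatone-like filter, this design has a flat passband and steep, symmetric skirts. That makes it a convenient way to test whether the detailed filter shape matters.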


The Performance Improvement of Speech Recognition System based on Stochastic Distance Measure

  • Jeon, B.S.; Lee, D.J.; Song, C.K.; Lee, S.H.; Ryu, J.W.
    • International Journal of Fuzzy Logic and Intelligent Systems / v.4 no.2 / pp.254-258 / 2004
  • In this paper, we propose a speech recognition system that is robust in noisy environments. Since noise severely degrades the performance of speech recognition systems, it is important to design recognition methods that are robust to noise. The proposed method adopts a new distance measure based on stochastic probability instead of the conventional minimum-error approach. To evaluate the proposed method, we compared it with a conventional distance measure on 10 isolated Korean digits corrupted by car noise. The proposed method achieved a higher recognition rate than the conventional distance measure under various car-noise conditions.
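The abstract does not specify the stochastic distance measure. The sketch below is only a generic illustration of the contrast between the two decision styles and is not the authors' measure. It compares a minimum-Euclidean-distance decision with a maximum diagonal-Gaussian log-likelihood decision. The class statistics are toy values.

```python
import math

Vector = list[float]


def euclidean_classify(x: Vector, means: dict[str, Vector]) -> str:
    """Minimum-distance decision: pick the template mean closest to x."""
    return min(means, key=lambda label: sum((a - m) ** 2 for a, m in zip(x, means[label])))


def gaussian_log_likelihood(x: Vector, mean: Vector, var: Vector) -> float:
    """Log density of x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (a - m) ** 2 / v
                      for a, m, v in zip(x, mean, var))


def probabilistic_classify(x: Vector, means: dict[str, Vector],
                           variances: dict[str, Vector]) -> str:
    """Maximum-likelihood decision: pick the class whose Gaussian best explains x."""
    return max(means, key=lambda label: gaussian_log_likelihood(x, means[label], variances[label]))


if __name__ == "__main__":
    means = {"a": [0.0, 0.0], "b": [3.0, 0.0]}
    variances = {"a": [0.1, 0.1], "b": [4.0, 4.0]}
    x = [1.2, 0.0]
    print(euclidean_classify(x, means), probabilistic_classify(x, means, variances))
```

With these toy statistics, the Euclidean rule picks "a" and the likelihood rule picks "b". A probabilistic score accounts for how much each class is expected to spread, which a pure distance ignores.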

Feature Vector Processing for Speech Emotion Recognition in Noisy Environments (잡음 환경에서의 음성 감정 인식을 위한 특징 벡터 처리)

  • Park, Jeong-Sik; Oh, Yung-Hwan
    • Phonetics and Speech Sciences / v.2 no.1 / pp.77-85 / 2010
  • This paper proposes an efficient feature vector processing technique to protect a Speech Emotion Recognition (SER) system against a variety of noises. In the proposed approach, emotional feature vectors are extracted from speech processed by comb filtering. These features are then used to construct robust models based on feature vector classification. We modify conventional comb filtering by using the speech presence probability to minimize drawbacks caused by incorrect pitch estimation under background noise. The modified comb filtering correctly enhances the harmonics, which are an important cue in SER. The feature vector classification technique categorizes feature vectors as either discriminative or non-discriminative based on a log-likelihood criterion. This method successfully selects the discriminative vectors while preserving correct emotional characteristics, so robust emotion models can be constructed using only these discriminative vectors. In SER experiments using an emotional speech corpus contaminated by various noises, our approach outperformed the baseline system.
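Pitch-synchronous comb filtering can be sketched as follows. The sketch assumes a known integer pitch period in samples and equal tap weights. Averaging samples spaced one period apart reinforces the harmonics and averages down uncorrelated noise. The `spp_weighted_comb` blend is a hypothetical illustration of using a speech presence probability: it relies on the comb output only where speech is likely. It is not the paper's published modification.

```python
import math
import random


def comb_filter(x: list[float], period: int, taps: int = 2) -> list[float]:
    """Average each sample with samples k*period away (|k| <= taps); equal weights assumed."""
    out = []
    for n in range(len(x)):
        acc, count = 0.0, 0
        for k in range(-taps, taps + 1):
            m = n + k * period
            if 0 <= m < len(x):  # near the edges, fewer neighbours are available
                acc += x[m]
                count += 1
        out.append(acc / count)
    return out


def spp_weighted_comb(x: list[float], period: int, presence_prob: list[float],
                      taps: int = 2) -> list[float]:
    """Hypothetical blend: p * combed + (1 - p) * original, per sample."""
    combed = comb_filter(x, period, taps)
    return [p * c + (1.0 - p) * s for p, c, s in zip(presence_prob, combed, x)]


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * n / 40) for n in range(400)]  # period of 40 samples
    print(max(abs(a - b) for a, b in zip(comb_filter(tone, 40), tone)))  # tiny: tone passes
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    print(sum(v * v for v in comb_filter(noise, 40)) / sum(v * v for v in noise))  # well below 1
```

If the pitch period is wrong, the comb smears the harmonics. Falling back toward the unfiltered signal when speech presence is uncertain limits that damage.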


Noisy Speech Recognition using Probabilistic Spectral Subtraction (확률적 스펙트럼 차감법을 이용한 잡음 환경에서의 음성인식)

  • Chi, Sang-Mun; Oh, Yung-Hwan
    • The Journal of the Acoustical Society of Korea / v.16 no.6 / pp.94-99 / 1997
  • This paper describes a probabilistic spectral subtraction technique that uses knowledge of both noise and speech to reduce automatic speech recognition errors in noisy environments. Conventional spectral subtraction estimates a noise prototype in non-speech intervals and obtains the clean speech spectrum by subtracting this prototype from the noisy speech spectrum. Consequently, noise cannot be suppressed effectively with a single noise prototype when the prototype's characteristics differ from those of the noise contained in the input speech. To overcome this drawback, the probabilistic subtraction method uses multiple noise prototypes. In this paper, the probabilistic characteristics of the noise and the knowledge of speech embedded in hidden Markov models trained in clean environments are used to suppress noise. Furthermore, dynamic feature parameters are considered in addition to static feature parameters for effective noise suppression. The proposed method reduced error rates in the recognition of 50 Korean words. At an SNR (signal-to-noise ratio) of 10 dB, the recognition rate was 86.25% with probabilistic subtraction, 80.25% with spectral subtraction, and 72.75% without any noise suppression.
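The conventional single-prototype scheme described above can be sketched on power spectra as shown below. The noise prototype is estimated by averaging frames flagged as non-speech. It is then subtracted from each noisy frame, with a spectral floor to avoid negative power. The floor value is assumed. `mix_prototypes` only illustrates combining several prototypes with given weights. In the paper, those weights come from noise statistics and clean-speech HMMs, which are not modeled here.

```python
def estimate_noise_prototype(frames_power: list[list[float]],
                             is_speech: list[bool]) -> list[float]:
    """Average the power spectra of frames flagged as non-speech."""
    noise_frames = [f for f, speech in zip(frames_power, is_speech) if not speech]
    if not noise_frames:
        raise ValueError("need at least one non-speech frame")
    n_bins = len(noise_frames[0])
    return [sum(f[k] for f in noise_frames) / len(noise_frames) for k in range(n_bins)]


def mix_prototypes(prototypes: list[list[float]], weights: list[float]) -> list[float]:
    """Hypothetical: weighted average of several noise prototypes (weights assumed to sum to 1)."""
    return [sum(w * p[k] for w, p in zip(weights, prototypes)) for k in range(len(prototypes[0]))]


def spectral_subtract(noisy_power: list[float], noise_power: list[float],
                      floor: float = 0.01) -> list[float]:
    """Power spectral subtraction, never going below floor * noisy power."""
    return [max(py - pn, floor * py) for py, pn in zip(noisy_power, noise_power)]


if __name__ == "__main__":
    proto = estimate_noise_prototype([[2.0, 4.0], [4.0, 6.0], [10.0, 10.0]], [False, False, True])
    print(proto, spectral_subtract([5.0, 4.0], proto))
```

The second output bin shows the weakness the abstract describes. When the actual noise in a frame differs from the single prototype, subtraction overshoots and only the floor remains.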


Speech Enhancement Algorithm Based on Teager Energy and Speech Absence Probability in Noisy Environments (잡음환경에서 Teager 에너지와 음성부재확률 기반의 음성향상 알고리즘)

  • Park, Yun-Sik; An, Hong-Sub; Lee, Sang-Min
    • Journal of the Institute of Electronics Engineers of Korea SP / v.49 no.3 / pp.81-88 / 2012
  • In this paper, we propose a novel speech enhancement algorithm for effective noise suppression in various noisy environments. To improve the decision between speech and noise segments, the proposed method uses a local speech absence probability (LSAP) based on the Teager energy (TE) of noisy speech, instead of the conventional LSAP, as the feature parameter for voice activity detection (VAD) in each frequency subband. In addition, the method uses the global SAP (GSAP) derived in each frame as a weighting parameter that modifies the adopted TE operator to improve its performance. The proposed algorithm is evaluated with objective tests under various environments and yields better results than conventional methods.
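The discrete Teager energy operator is Ψ[x(n)] = x(n)² - x(n-1)x(n+1). For a pure tone A cos(Ωn + φ), it evaluates to the constant A² sin²(Ω), which makes a convenient sanity check. The sketch shows only this standard operator. The GSAP-based modification of the operator and the subband LSAP computation from the paper are not reproduced.

```python
import math


def teager_energy(x: list[float]) -> list[float]:
    """Standard discrete Teager energy operator, defined for interior samples only."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


if __name__ == "__main__":
    amp, omega = 2.0, 0.3
    tone = [amp * math.cos(omega * n + 0.2) for n in range(50)]
    psi = teager_energy(tone)
    print(min(psi), max(psi), (amp * math.sin(omega)) ** 2)  # all three agree
```

Because Ψ grows with both amplitude and frequency, it responds more strongly to speech formant energy than to low-frequency noise of the same power. This motivates using it as a VAD feature.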