Search | Korea Science

Speech Recognition in Car Noise Environments Using Multiple Models Based on a Hybrid Method of Spectral Subtraction and Residual Noise Masking

Song, Myung-Gyu;Jung, Hoi-In;Shim, Kab-Jong;Kim, Hyung-Soon
- The Journal of the Acoustical Society of Korea / v.18 no.3E / pp.3-8 / 1999
In speech recognition for real-world applications, the performance degradation caused by the mismatch between training and testing environments must be overcome. In this paper, to reduce this mismatch, we propose a hybrid method of spectral subtraction and residual noise masking. We also employ a multiple-model approach to improve robustness across various noise environments. In this approach, multiple model sets are built for several noise masking levels, and the model set appropriate for the estimated noise level is selected automatically in the recognition phase. In speaker-independent isolated word recognition experiments in car noise environments, the proposed method, using model sets with only two masking levels, reduced the average word error rate by 60% compared with the spectral subtraction method.
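The idea of subtracting a noise estimate, masking the residual, and choosing a model set by noise level can be sketched roughly as follows. The abstract does not give the masking rule or the selection criterion, so the constant floor in `subtract_and_mask`, the nearest-level rule in `select_model_set`, and all numeric values are illustrative assumptions rather than the authors' method.

```python
def subtract_and_mask(noisy_power, noise_power, mask_level):
    """Spectral subtraction followed by masking of the residual.

    The noise estimate is subtracted bin by bin; any result below
    mask_level is raised to mask_level, so residual noise is hidden
    under a constant floor (an assumed masking rule).
    """
    return [max(y - n, mask_level) for y, n in zip(noisy_power, noise_power)]


def select_model_set(noise_level, mask_levels):
    """Pick the masking level (and hence model set) nearest the noise estimate."""
    return min(mask_levels, key=lambda level: abs(level - noise_level))


noisy = [9.0, 4.0, 2.5, 12.0]   # hypothetical noisy power spectrum
noise = [2.0, 3.5, 2.0, 2.0]    # hypothetical noise estimate
levels = [1.0, 4.0]             # two masking levels, as in the abstract

noise_level = sum(noise) / len(noise)
chosen = select_model_set(noise_level, levels)
enhanced = subtract_and_mask(noisy, noise, chosen)
```

Bins where subtraction leaves little energy end up at the chosen masking level, which is the level the matching model set would have been trained with.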

Implementation of a Robust Speech Recognizer in Noisy Car Environment Using a DSP (DSP를 이용한 자동차 소음에 강인한 음성인식기 구현)

Chung, Ik-Joo
- Speech Sciences / v.15 no.2 / pp.67-77 / 2008
In this paper, we implemented a robust speech recognizer using the TMS320VC33 DSP. For this implementation, we built a speech and noise database suited to a recognizer that uses spectral subtraction for noise removal. The recognizer has an explicit structure in that the speech signal is enhanced through spectral subtraction before endpoint detection and feature extraction. This makes the operation of the recognizer clear and helps build HMMs with minimal model mismatch. Since the recognizer was developed for controlling car facilities and for voice dialing, it has two recognition engines: a speaker-independent one for controlling car facilities and a speaker-dependent one for voice dialing. We adopted a conventional DTW algorithm for the latter and a continuous HMM for the former. Through various off-line recognition tests, we selected optimal values of several recognition parameters for a resource-limited embedded recognizer, which led to HMMs with three mixtures per state. The car-noise-added speech database is enhanced using spectral subtraction before HMM parameter estimation, to reduce the model mismatch caused by the nonlinear distortion that spectral subtraction introduces. The hardware module developed includes a microcontroller for the host interface, which processes the protocol between the DSP and a host.
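For readers unfamiliar with the speaker-dependent engine, a conventional DTW matcher can be sketched as below. This is a textbook formulation on scalar features, not the DSP implementation described in the abstract; the function names and the template data are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two scalar sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template nearest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Hypothetical enrolled templates for voice dialing
templates = {"home": [1, 2, 3], "office": [5, 5, 1]}
label = recognize([1, 2, 2, 3], templates)
```

A real recognizer would compare sequences of feature vectors (for example cepstral frames) with a vector distance in place of `abs`.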

A Feed-forward Method for Reducing Current Mismatch in Charge Pumps (전하 펌프의 전류 부정합 감소를 위한 피드포워드 방식)

Lee, Jae-Hwan;Jeong, Hang-Geun
- Journal of the Institute of Electronics Engineers of Korea SC / v.46 no.1 / pp.63-67 / 2009
Current mismatch in a charge pump degrades the spectral purity of phase-locked loops (PLLs), for example by producing reference spurs. The current mismatch can be reduced by increasing the output resistance of the charge pump, as in a cascoded output stage. However, as the supply voltage is lowered, it becomes hard to stack transistors. In this paper, a new method for reducing the current mismatch is proposed. The proposed method is based on feed-forward compensation for the channel length modulation effect of the output stage. The new method has been demonstrated through simulations on typical $0.18\,\mu\mathrm{m}$ CMOS circuits.

A Spectral Smoothing Algorithm for Unit Concatenating Speech Synthesis (코퍼스 기반 음성합성기를 위한 합성단위 경계 스펙트럼 평탄화 알고리즘)

Kim Sang-Jin;Jang Kyung Ae;Hahn Minsoo
- MALSORI / no.56 / pp.225-235 / 2005
Speech unit concatenation with a large database is presently the most popular method for speech synthesis. In this approach, mismatches at the unit boundaries are unavoidable and are one of the causes of quality degradation. This paper proposes an algorithm to reduce undesired discontinuities between successive units. Optimal matching points are calculated in two steps: first, the Kullback-Leibler distance is used for spectral matching; then unit sliding and overlap windowing are used for waveform matching. The proposed algorithm is implemented in a corpus-based unit-concatenating Korean text-to-speech system that has an automatically labeled database. Experimental results show that our algorithm performs better than raw concatenation or the overlap smoothing method.
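The spectral matching step can be illustrated with a small sketch. It assumes a symmetric Kullback-Leibler distance between normalized magnitude spectra and a simple search over candidate frames; the abstract does not state which variant or which spectral representation the authors used, and the frame data here are made up.

```python
import math


def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL distance, KL(p||q) + KL(q||p), between two spectra.

    Both spectra are normalized to sum to 1; eps guards against log(0).
    """
    sp, sq = sum(p), sum(q)
    total = 0.0
    for a, b in zip(p, q):
        a = a / sp + eps
        b = b / sq + eps
        total += (a - b) * math.log(a / b)
    return total


def best_match(candidate_frames, target_frame):
    """Index of the candidate frame spectrally closest to the target frame."""
    return min(range(len(candidate_frames)),
               key=lambda i: symmetric_kl(candidate_frames[i], target_frame))


# Hypothetical frames near the end of the left unit, and the first
# frame of the right unit
left_frames = [[1.0, 1.0, 1.0], [1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
right_frame = [2.0, 4.0, 6.0]
match_index = best_match(left_frames, right_frame)
```

Because the spectra are normalized, the distance depends on spectral shape and not on overall level, which is why the second frame matches here.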

Spectral Subtraction Using Spectral Harmonics for Robust Speech Recognition in Car Environments

Beh, Jounghoon;Ko, Hanseok
- The Journal of the Acoustical Society of Korea / v.22 no.2E / pp.62-68 / 2003
This paper presents a novel noise-compensation scheme to solve the mismatch problem between training and testing conditions for automatic speech recognition (ASR) systems, specifically in car environments. Conventional spectral subtraction schemes rely on the signal-to-noise ratio (SNR): attenuation is imposed on the parts of the spectrum that appear to have low SNR, and the parts with high SNR are accentuated. However, these schemes assume that the power spectrum of noise is generally lower in magnitude than that of speech. While this assumption is adequate for high-SNR environments, it is grossly inadequate for low-SNR scenarios such as car environments. This paper proposes an efficient spectral subtraction scheme aimed specifically at low-SNR noisy environments, which distinctively extracts the harmonics in the speech spectrum. Representative experiments confirm the superior performance of the proposed method over conventional methods. The experiments are conducted using car-noise-corrupted utterances from the Aurora2 corpus.

The Hybrid Bandwidth Extension Method Using Spectral Folding and GMM Transformation (Spectral Folding방법과 GMM 변환을 이용한 대역폭 확장의 Hybrid 방법)

Choi Mu-Yeol;Kim Hyung-Soon
- Proceedings of the KSPS conference / 2006.05a / pp.131-134 / 2006
Narrowband speech over the telephone network lacks the low-band (0-300 Hz) and high-band (3400-8000 Hz) information found in wideband speech (0-8000 Hz). As a result, narrowband speech is characterized by reduced intelligibility, muffled quality, and degraded speaker identification. Spectral folding is the easiest way to reconstruct the missing high band; however, the reconstructed speech still sounds band-limited because the low-band and mid-band frequency components are absent. To compensate for these missing components, we propose combining the spectral folding method with the GMM transformation method, a statistical method for reconstructing wideband speech. In the reconstructed wideband speech, the absent frequency components were filled in with relatively low spectral mismatch. In subjective speech quality evaluations, the proposed method was preferred to other methods.
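Spectral folding itself is simple enough to show in a few lines: inserting a zero after every sample doubles the sampling rate and mirrors the narrowband spectrum into the new high band. The sketch below is a generic illustration with a made-up four-sample signal and a naive DFT; it does not include the GMM transformation and is not the authors' implementation.

```python
import cmath


def fold_upsample(x):
    """Insert a zero after every sample (2x upsampling, no low-pass filter)."""
    out = []
    for s in x:
        out.extend([s, 0.0])
    return out


def dft_magnitude(x):
    """Magnitude of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]


narrow = [1.0, 0.5, -0.25, 0.75]   # hypothetical narrowband samples
wide = fold_upsample(narrow)       # twice the sampling rate
narrow_mag = dft_magnitude(narrow)
wide_mag = dft_magnitude(wide)
```

In `wide_mag`, the bins above the original Nyquist frequency repeat the narrowband magnitudes in mirrored order, which is the "folded" high band that the abstract describes as still sounding band-limited.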

Effective frequency doubling of fs-pulse with simultaneous group velocity matching and quasi-phase matching in periodically poled lithium niobate (주기적으로 분극반전된 $LiNbO_3$에서 군속도 일치와 의사위상정합에 의한 펨토초 펄스의 효율적인 2차 조화파발생)

Lee, Yu-Nan;S. Kurimura;K. Kitamura;Hun, No-Jeong;Sik, Cha-Myeong
- Proceedings of the Optical Society of Korea Conference / 2003.02a / pp.224-225 / 2003
Since group velocity (GV) mismatch significantly limits the efficiency of nonlinear interactions such as second harmonic generation (SHG), several techniques have been developed to compensate for GV mismatch. The simplest way to avoid the GV mismatch problem is to reduce the device length. However, this results in a poor trade-off between the SHG spectral bandwidth and the conversion efficiency. (omitted)

Noise-Robust Speech Recognition Using Histogram-Based Over-estimation Technique (히스토그램 기반의 과추정 방식을 이용한 잡음에 강인한 음성인식)

권영욱;김형순
- The Journal of the Acoustical Society of Korea / v.19 no.6 / pp.53-61 / 2000
In speech recognition under noisy environments, reducing the mismatch between training and testing environments is an important issue. Spectral subtraction is a widely used technique because of its simplicity and relatively good performance in noisy environments. In this paper, we introduce the histogram method as a reliable noise estimation approach for spectral subtraction. This method has advantages over conventional noise estimation methods in that it does not need to detect non-speech intervals and can estimate the noise spectra even in time-varying noise environments. Even when spectral subtraction is performed using a reliable average noise spectrum obtained by the histogram method, a considerable amount of residual noise remains because the instantaneous noise spectrum varies about its mean. To overcome this limitation, we propose a new over-estimation technique based on the distribution characteristics of the histogram used for noise estimation. Since the proposed technique decides the degree of over-estimation adaptively according to the measured noise distribution, it has the advantage of being less affected by SNR variation. In speaker-independent isolated word recognition experiments in a car noise environment under various SNR conditions, the proposed histogram-based over-estimation technique outperforms the conventional over-estimation technique.
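A rough sketch of histogram-based noise estimation for one frequency bin follows: the power values observed over time are binned, and the center of the most populated bin is taken as the noise level, with no speech/non-speech detection. The bin count, the fixed over-estimation factor `alpha`, the spectral floor, and the sample data are all assumptions; the abstract does not specify how the adaptive over-estimation factor is derived from the histogram.

```python
def histogram_noise_estimate(values, num_bins=10):
    """Estimate the noise level as the center of the fullest histogram bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    peak = counts.index(max(counts))
    return lo + (peak + 0.5) * width


def over_subtract(noisy_power, noise_power, alpha, floor):
    """Subtract alpha times the noise estimate, keeping a spectral floor."""
    return [max(y - alpha * n, floor * y)
            for y, n in zip(noisy_power, noise_power)]


# Hypothetical power history of one frequency bin; the large values
# are frames where speech is present
bin_history = [1.0, 1.1, 0.9, 1.05, 0.95, 8.0, 9.0, 1.0, 10.0, 1.02]
noise = histogram_noise_estimate(bin_history)
cleaned = over_subtract([4.0, 1.5], [1.0, 1.0], alpha=2.0, floor=0.1)
```

Because most frames in a bin contain only noise, the histogram peak tracks the noise level even though speech frames are mixed in.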

Harmonics-based Spectral Subtraction and Feature Vector Normalization for Robust Speech Recognition

Beh, Joung-Hoon;Lee, Heung-Kyu;Kwon, Oh-Il;Ko, Han-Seok
- Speech Sciences / v.11 no.1 / pp.7-20 / 2004
In this paper, we propose a two-step noise compensation algorithm in feature extraction for robust speech recognition. The proposed method requires no a priori information about the noisy environment and is simple to implement. First, in the frequency domain, Harmonics-based Spectral Subtraction (HSS) is applied to reduce the additive background noise and make the shape of the harmonics in the speech spectrum more pronounced. We then apply a judiciously weighted variance Feature Vector Normalization (FVN) to compensate for both channel distortion and additive noise. The weighted variance FVN compensates for the variance mismatch in both the speech and the non-speech regions. Representative performance evaluation using the Aurora 2 database shows that the proposed method yields a 27.18% relative improvement in accuracy under a multi-noise training task and a 57.94% relative improvement under a clean training task.
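As background for the normalization step, plain per-dimension mean and variance normalization of feature vectors looks like the sketch below. This is the unweighted baseline only; the weighting that the abstract applies to speech and non-speech regions is not described there and is not reproduced here. The function name and data are hypothetical.

```python
from statistics import mean, pstdev


def normalize_features(frames):
    """Normalize each feature dimension to zero mean and unit variance."""
    dims = list(zip(*frames))                  # one tuple per dimension
    mus = [mean(d) for d in dims]
    sigmas = [pstdev(d) or 1.0 for d in dims]  # avoid dividing by zero
    return [[(v - m) / s for v, m, s in zip(frame, mus, sigmas)]
            for frame in frames]


# Two hypothetical 2-dimensional feature vectors
normalized = normalize_features([[1.0, 10.0], [3.0, 10.0]])
```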

Speech Recognition in Noisy Environments using Histogram-based Over-estimation (히스토그램 기반의 Over-estimation을 이용한 잡음환경에서의 음성인식)

권영욱
- Proceedings of the Acoustical Society of Korea Conference / 1998.08a / pp.262-266 / 1998
In speech recognition under noisy environments, reducing the mismatch between training and testing environments is an important issue, and spectral subtraction is a widely used technique because of its simplicity and relatively good performance in noisy environments. In this paper, we introduce the histogram method as a reliable noise estimation approach for spectral subtraction. To deal with the problem of residual noise after spectral subtraction, we propose a new over-estimation technique based on the distribution characteristics of the histogram used for noise estimation. Since the proposed technique decides the degree of over-estimation adaptively according to the measured noise distribution, it copes with SNR variations more effectively than the conventional over-estimation technique.
