Search | Korea Science

Robust Speech Recognition in the Car Interior Environment having Car Noise and Audio Output (자동차 잡음 및 오디오 출력신호가 존재하는 자동차 실내 환경에서의 강인한 음성인식)

Park, Chul-Ho;Bae, Jae-Chul;Bae, Keun-Sung
- MALSORI / no.62 / pp.85-96 / 2007
In this paper, we carried out recognition experiments on noisy speech containing various levels of car noise and audio-system output, using a speech interface. The speech interface consists of three parts: pre-processing, an acoustic echo canceller, and post-processing. First, a high-pass filter is employed as the pre-processing part to remove some engine noise. Then, an echo canceller implemented as an FIR-type filter with an NLMS adaptive algorithm is used to remove the music or speech coming from the car's audio system. Finally, an MMSE-STSA based speech enhancement method is applied to the output of the echo canceller to further remove residual noise. For the recognition experiments, we generated test signals by adding music to the car-noise speech from the Aurora 2 database. An HTK-based continuous HMM system is constructed as the recognition system. Experimental results show that the proposed speech interface is very promising for robust speech recognition in a noisy car environment.
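The echo-cancellation stage described above can be illustrated with a minimal NLMS sketch. This is not the authors' implementation: the filter length `taps`, step size `mu`, the synthetic three-tap echo path, and the function names are all assumptions chosen for illustration, and the demo contains no speech or car noise.

```python
import random

def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Adaptive FIR echo canceller with NLMS updates (illustrative sketch).

    mic: microphone samples (here: echo only, no near-end speech)
    ref: reference samples played by the audio system
    Returns the residual signal and the final filter weights.
    """
    w = [0.0] * taps      # FIR weights, estimate of the echo path
    buf = [0.0] * taps    # most recent reference samples, newest first
    residual = []
    for d, x in zip(mic, ref):
        buf = [x] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))   # estimated echo
        e = d - y                                    # echo-cancelled output
        norm = sum(xi * xi for xi in buf) + eps      # input power normalisation
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, buf)]
        residual.append(e)
    return residual, w

def demo():
    """Synthetic check: white-noise reference through an assumed 3-tap echo path."""
    rng = random.Random(0)
    ref = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
    echo_path = [0.6, -0.3, 0.1]  # hypothetical room/cabin response
    mic = [
        sum(h * ref[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
        for n in range(len(ref))
    ]
    residual, w = nlms_echo_cancel(mic, ref)
    return residual, w, echo_path
```

In this noise-free setting the leading weights should approach the assumed echo path and the residual should decay towards zero; with real near-end speech present, adaptation would normally need to be slowed or frozen.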

Hardware Implementation for Real-Time Speech Processing with Multiple Microphones

Seok, Cheong-Gyu;Choi, Jong-Suk;Kim, Mun-Sang;Park, Gwi-Tea
- 제어로봇시스템학회:학술대회논문집 / 2005.06a / pp.215-220 / 2005
Nowadays, various speech processing systems are being introduced in the field of robotics. However, real-time processing and high performance are required to properly implement a speech processing system for autonomous robots. Achieving these goals requires advanced hardware techniques as well as intelligent software algorithms. For example, we need nonlinear amplifier boards that can adjust the compression ratio (CR) via computer programming. The necessity for noise reduction, double-buffering on an EPLD (erasable programmable logic device), simultaneous multi-channel AD conversion, and distant sound localization is also explained in this paper. These ideas can be used to improve distant and omni-directional speech recognition. This speech processing system, based on an embedded Linux system, is to be mounted on the new home service robot being developed at KIST (Korea Institute of Science and Technology).

A User-friendly Remote Speech Input Method in Spontaneous Speech Recognition System

Suh, Young-Joo;Park, Jun;Lee, Young-Jik
- The Journal of the Acoustical Society of Korea / v.17 no.2E / pp.38-46 / 1998
In this paper, we propose a remote speech input device, a new method of user-friendly speech input for spontaneous speech recognition systems. We define user friendliness in terms of hands-free operation and microphone independence in speech recognition applications. Our method adopts two algorithms: automatic speech detection and microphone array delay-and-sum beamforming (DSBF)-based speech enhancement. The automatic speech detection algorithm is composed of two stages: detection of a candidate speech portion, and classification of that candidate as speech or nonspeech using pitch information. The DSBF algorithm adopts the time-domain cross-correlation method for its time delay estimation. In the performance evaluation, the speech detection algorithm shows within-200 ms start point accuracy of 93%, 99% under 15dB, 20dB, and 25dB signal-to-noise ratio (SNR) environments, respectively, and the accuracies for the end point are 72%, 89%, and 93% for the corresponding environments. The classification of speech and nonspeech for the region of the input signal where a start point was detected is performed by the pitch information-based method. The percentages of correct classification for speech and nonspeech input are 99% and 90%, respectively. The eight-microphone array-based speech enhancement using the DSBF algorithm shows a maximum SNR gain of 6dB over a single microphone and an error reduction of more than 15% in the spontaneous speech recognition domain.
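Delay-and-sum beamforming with time-domain cross-correlation delay estimation can be sketched as follows. This is an illustrative toy under stated assumptions, not the paper's system: integer-sample delays, four synthetic channels instead of eight microphones, white-noise "speech", and all function names and parameters (`max_lag`, `demo`) are hypothetical.

```python
import random

def estimate_delay(ref, sig, max_lag):
    """Return the lag (in samples) maximising the cross-correlation of sig against ref."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(ref)):
            j = i + lag
            if 0 <= j < len(sig):
                score += ref[i] * sig[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def delay_and_sum(channels, max_lag=16):
    """Align every channel to channel 0 and average the aligned samples."""
    ref = channels[0]
    lags = [estimate_delay(ref, ch, max_lag) for ch in channels]
    out = []
    for i in range(len(ref)):
        acc, count = 0.0, 0
        for ch, lag in zip(channels, lags):
            j = i + lag
            if 0 <= j < len(ch):
                acc += ch[j]
                count += 1
        out.append(acc / count if count else 0.0)
    return out, lags

def demo(n=400, seed=1):
    """Synthetic array: one source, known integer delays, independent sensor noise."""
    rng = random.Random(seed)
    src = [rng.gauss(0.0, 1.0) for _ in range(n)]
    delays = [0, 3, 5, 2]  # assumed per-microphone delays in samples
    channels = [
        [(src[i - d] if i - d >= 0 else 0.0) + rng.gauss(0.0, 0.5) for i in range(n)]
        for d in delays
    ]
    return src, delays, channels
```

Averaging aligned channels keeps the source coherent while independent noise partially cancels, which is the mechanism behind an SNR gain from a microphone array; the size of the gain in this toy setup says nothing about the figures reported in the abstract.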

On a Reduction of Computation Time of FFT Cepstrum (FFT 켑스트럼의 처리시간 단축에 관한 연구)

Jo, Wang-Rae;Kim, Jong-Kuk;Bae, Myung-Jin
- Speech Sciences / v.10 no.2 / pp.57-64 / 2003
Cepstrum coefficients are the most popular features for speech recognition and speaker recognition. They are also used for speech synthesis and speech coding, but have the major drawback of a long processing time. In this paper, we propose a new method that reduces the processing time of FFT cepstrum analysis. We use normal-ordered inputs for the FFT function and bit-reversed inputs for the IFFT function. We can therefore omit the bit-reversing process and reduce the processing time of FFT cepstrum analysis.
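One way to realise this idea, offered as a sketch rather than the authors' code, is to pair a decimation-in-frequency FFT (natural-order input, bit-reversed output) with a decimation-in-time inverse FFT (bit-reversed input, natural-order output). Because the log-magnitude step is element-wise, it can be applied to the spectrum while it is still in bit-reversed order. The choice of these two butterfly structures, the `floor` guard, and all function names are assumptions.

```python
import cmath
import math

def fft_dif(a):
    """In-place radix-2 DIF FFT: natural-order input, bit-reversed output."""
    n = len(a)  # must be a power of two
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))
                u, v = a[start + k], a[start + k + span]
                a[start + k] = u + v
                a[start + k + span] = (u - v) * w
        span //= 2
    return a

def ifft_dit(a):
    """In-place radix-2 DIT inverse FFT: bit-reversed input, natural-order output."""
    n = len(a)
    span = 1
    while span < n:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(2j * cmath.pi * k / (2 * span))
                u, v = a[start + k], a[start + k + span] * w
                a[start + k] = u + v
                a[start + k + span] = u - v
        span *= 2
    return [x / n for x in a]

def real_cepstrum(frame, floor=1e-12):
    """Real cepstrum IFFT(log|FFT(x)|) with no bit-reversal pass anywhere."""
    spec = fft_dif([complex(x) for x in frame])           # bit-reversed order
    log_mag = [complex(math.log(max(abs(s), floor))) for s in spec]
    return [c.real for c in ifft_dit(log_mag)]            # natural order again

def real_cepstrum_reference(frame, floor=1e-12):
    """Direct O(N^2) DFT version, used only to check the fast version."""
    n = len(frame)
    spec = [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
    log_mag = [math.log(max(abs(s), floor)) for s in spec]
    return [sum(log_mag[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]
```

The saving comes from skipping both reordering passes; the butterflies themselves are unchanged, so how much time is actually saved depends on the implementation and frame size.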

A study on the Visible Speech Processing System for the Hearing Impaired (청각 장애자를 위한 시각 음성 처리 시스템에 관한 연구)

김원기;김남현
- Journal of Biomedical Engineering Research / v.11 no.1 / pp.75-82 / 1990
The purpose of this study is to support speech training for the hearing impaired with a visible speech processing system. In brief, this system converts the features of speech signals into graphics on a monitor, and helps adjust the speech features of the hearing impaired toward normal ones. The features used in this system are formant and pitch. They are extracted using digital signal processing techniques such as the linear predictive method or the AMDF (Average Magnitude Difference Function). To train the abnormal speech of the hearing impaired effectively, easily visible features have been studied.
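As an illustration of AMDF-based pitch extraction, the following minimal sketch picks the lag at which the average magnitude difference is smallest. The search range (`f_min`, `f_max`), the sampling rate, and the synthetic test frame are assumptions, not values from the paper.

```python
import math

def amdf(frame, lag):
    """Average magnitude difference between the frame and itself shifted by lag."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n

def estimate_pitch_hz(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Pitch estimate: the lag with the deepest AMDF valley inside the search range."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)  # frame must be longer than lag_max
    best_lag = min(range(lag_min, lag_max + 1), key=lambda lag: amdf(frame, lag))
    return sample_rate / best_lag

def synthetic_voiced_frame(f0=100.0, sample_rate=8000, length=400):
    """Toy periodic frame: fundamental plus one harmonic (stand-in for voiced speech)."""
    return [
        math.sin(2 * math.pi * f0 * i / sample_rate)
        + 0.5 * math.sin(2 * math.pi * 2 * f0 * i / sample_rate)
        for i in range(length)
    ]
```

For example, `estimate_pitch_hz(synthetic_voiced_frame(), 8000)` searches lags 20 to 133 and should settle on the 80-sample period of the toy signal. Real speech would additionally need a voiced/unvoiced decision and some smoothing across frames.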

Performance Enhancement of Speech Intelligibility in Communication System Using Combined Beamforming (directional microphone) and Speech Filtering Method (방향성 마이크로폰과 음성 필터링을 이용한 통신 시스템의 음성 인지도 향상)

Shin, Min-Cheol;Wang, Se-Myung
- Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2005.05a / pp.334-337 / 2005
Speech intelligibility is one of the most important factors in a communication system. It is related to the speech-to-noise ratio, and background noise reduction techniques are being developed to enhance this ratio. As part of a solution to noise reduction, this paper introduces a directional microphone using a beamforming method, combined with a speech filtering method. The directional microphone narrows the spatial range of the processed signal to the direction of the target speech signal. Noise located in the same direction as the speech still remains in the processed signal. To separate this mixed signal into speech and noise, a speech filtering method is then applied to pick out only the speech signal from the processed signal. The speech filtering method is based on the characteristics of the speech signal itself. The combined directional microphone and speech filtering method enhances speech intelligibility in a communication system.

Introduction of ETRI Broadcast News Speech Recognition System (ETRI 방송뉴스음성인식시스템 소개)

Park Jun
- Proceedings of the KSPS conference / 2006.05a / pp.89-93 / 2006
This paper presents the ETRI broadcast news speech recognition system. There are two major issues in broadcast news speech recognition: 1) real-time processing and 2) out-of-vocabulary handling. For real-time processing, we devised a dual decoder architecture. The input speech signal is segmented at long pauses between utterances, and the two decoders process the speech segments alternately. One decoder can start to recognize the current speech segment without waiting for the other decoder to finish recognizing the previous segment. Thus, the processing delay does not accumulate. For out-of-vocabulary handling, we updated both the vocabulary and the language model based on recent news articles on the internet. By updating the language model as well as the vocabulary, we can improve the performance by up to 17.2% ERR.

Implementation and Performance Evaluation of the System for Speech Services using VMEbus (VMEbus 를 이용한 음성 서비스 시스템의 구현 및 성능평가)

Kwon, Oh-Il;Kang, Kyung-Young;Kim, Tong-Ha;Rhee, Tae-Won
- The Journal of the Acoustical Society of Korea / v.15 no.1 / pp.93-101 / 1996
In this paper, we implement a speech processing system to provide better speech services to subscribers on the telephone network. We develop a dedicated board that processes speech signals, and devise a system that stores and replays speech signals with one master board controlling multiple DSP (Digital Signal Processing) boards over the VMEbus. We use a CPU30 board as the master board, develop an SPM (Signal Processing Module) board as the DSP board, and then evaluate the performance of the system.

Noisy Speech Recognition Based on Noise-Adapted HMMs Using Speech Feature Compensation

Chung, Yong-Joo
- Journal of the Institute of Convergence Signal Processing / v.15 no.2 / pp.37-41 / 2014
The vector Taylor series (VTS) based method usually employs clean speech Hidden Markov Models (HMMs) when compensating speech feature vectors or adapting the parameters of trained HMMs. It is well known that noisy speech HMMs trained by the Multi-condition TRaining (MTR) and the Multi-Model-based Speech Recognition framework (MMSR) methods perform better than the clean speech HMM in noisy speech recognition. In this paper, we propose a method to use noise-adapted HMMs in the VTS-based speech feature compensation method. We derived a novel mathematical relation between the training and test noisy speech feature vectors in the log-spectrum domain, and the VTS is used to estimate the statistics of the test noisy speech. An iterative EM algorithm is used to estimate the training noisy speech from the test noisy speech along with the noise parameters. The proposed method was applied to the noise-adapted HMMs trained by the MTR and MMSR and significantly reduced the relative word error rate in noisy speech recognition experiments on the Aurora 2 database.
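For orientation only, the standard log-spectral mismatch relation that VTS methods commonly linearize is sketched below. This is the textbook clean-speech relation for additive noise with channel effects ignored, not the new training/test relation derived in this paper, and the symbols ($x$ for clean speech, $n$ for noise, $y$ for noisy speech, with means $\mu_x$ and $\mu_n$) are assumptions.

```latex
% Noisy log-spectrum as a function of clean speech x and additive noise n
y = x + \log\left(1 + e^{\,n - x}\right)

% First-order Taylor expansion around the means (\mu_x, \mu_n)
y \approx \mu_x + \log\left(1 + e^{\,\mu_n - \mu_x}\right)
      + G\,(x - \mu_x) + (1 - G)\,(n - \mu_n),
\qquad
G = \frac{1}{1 + e^{\,\mu_n - \mu_x}}
```

Because the expansion is linear in $x$ and $n$, Gaussian statistics of the noisy features follow directly from those of the speech and noise, which is what makes iterative EM estimation of the noise parameters tractable in this family of methods.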

Design and Implementation of a Text-to-Speech System using the Prosody and Duration Information (운율 및 길이 정보를 이용한 무제한 음성 합성기의 설계 및 구현)

Yang, Jin-Seok;Kim, Jae-Beom;Lee, Jeong-Hyeon
- The Transactions of the Korea Information Processing Society / v.3 no.5 / pp.1121-1129 / 1996
To produce more natural speech in a Text-to-Speech system, prosody and duration must be processed in advance; we extracted the prosody and duration information by means of trial-and-error experiments. In this paper, a method is proposed to improve naturalness in a Text-to-Speech system using this information. As a result, the Text-to-Speech system proposed and implemented in this paper produced more natural synthesized speech than systems that do not use this information.
