The Journal of the Acoustical Society of Korea
The Acoustical Society of Korea
Volume 21, Issue 6 - Aug 2002
Design of a Low Bit-rate Speech Coder Based on Mixed Multi-band Excitation Model
The Journal of the Acoustical Society of Korea, volume 21, issue 6, 2002, Pages 510~521
An MBE (multi-band excitation) coder can achieve high-quality synthetic speech below 4.0 kbps. There are, however, significant differences in fine structure between the original spectrum and the synthetic spectrum. They are mainly due to the exclusive partition of voiced and unvoiced regions in the frequency domain and to a decision procedure based on an experimental threshold. This paper proposes the MMBE (mixed multi-band excitation) speech model to overcome these drawbacks of the MBE coder. In addition, two analysis methods that do not need any threshold-based decision procedure are presented. In the MMBE speech model, voiced and unvoiced components can be mixed over the entire frequency axis. To illustrate the potential of the proposed speech model, we develop a 2.6 kbps MMBE coder and compare it with a 2.9 kbps MBE coder by both objective and subjective methods. The results show that the proposed coder performs better than the MBE coder even at the lower bit rate.
Long-term Prediction of Speech Signal Using a Neural Network
The Journal of the Acoustical Society of Korea, volume 21, issue 6, 2002, Pages 522~530
This paper introduces a neural network (NN)-based nonlinear predictor for the LP (linear prediction) residual. To evaluate the effectiveness of the NN-based nonlinear predictor for the LP residual, we first compared the average prediction gain of the linear long-term predictor with that of the NN-based nonlinear long-term predictor. Then, the effects of quantization noise in the nonlinear prediction residual were investigated for the NN-based nonlinear predictor. A new NN predictor takes into consideration not only the prediction error but also quantization effects. To increase robustness against the quantization noise of the nonlinear prediction residual, a constrained backpropagation learning algorithm that satisfies a Kuhn-Tucker inequality condition is proposed. Experimental results indicate that the prediction gain of the proposed NN predictor was not seriously decreased even when the constrained optimization algorithm was employed.
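The linear baseline mentioned in the abstract above can be illustrated with a minimal sketch of a one-tap linear long-term predictor and its prediction gain. This is not the authors' NN predictor. The function name, the lag range, and the toy residual are assumptions made only for illustration.

```python
import math


def long_term_prediction_gain(residual, min_lag, max_lag):
    """Search a one-tap linear long-term predictor e[n] ~ b * e[n - lag].

    Returns (best_lag, tap, gain_db), where gain_db is the ratio of the
    residual energy to the prediction-error energy, in decibels.
    """
    current = residual[max_lag:]
    energy = sum(x * x for x in current)
    best_lag, best_tap, best_err = min_lag, 0.0, energy
    for lag in range(min_lag, max_lag + 1):
        # Samples delayed by `lag`, aligned with `current`.
        past = residual[max_lag - lag:len(residual) - lag]
        past_energy = sum(p * p for p in past)
        if past_energy == 0.0:
            continue
        cross = sum(c * p for c, p in zip(current, past))
        tap = cross / past_energy      # least-squares optimal tap for this lag
        err = energy - tap * cross     # error energy with that tap
        if err < best_err:
            best_lag, best_tap, best_err = lag, tap, err
    if best_err <= 0.0:
        return best_lag, best_tap, float("inf")
    return best_lag, best_tap, 10.0 * math.log10(energy / best_err)


# Toy residual (assumed): pulse train with period 40 plus a small ripple.
toy = [(1.0 if n % 40 == 0 else 0.0) + 0.01 * math.sin(0.37 * n)
       for n in range(400)]
lag, tap, gain_db = long_term_prediction_gain(toy, min_lag=20, max_lag=60)
print(lag, round(tap, 3), round(gain_db, 1))
```

A nonlinear predictor would replace the single tap with a learned mapping from past residual samples; the gain would be measured the same way.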
Tandemless Transcoding for AMR and EVRC Speech Coders
The Journal of the Acoustical Society of Korea, volume 21, issue 6, 2002, Pages 531~542
A novel tandemless transcoding method for AMR and EVRC speech coders is proposed in this paper. In contrast to the conventional tandem method, the parameters that are common to speech coders adopting the CELP algorithm are transcoded directly. The proposed algorithm consists of LSP transcoding, pitch delay transcoding, gain transcoding, and fixed codebook vector transcoding. Evaluation results show that the novel algorithm achieves better speech quality than the tandem method and reduces computational complexity and delay.
Large Vocabulary Continuous Speech Recognition Based on Language Model Network
The Journal of the Acoustical Society of Korea, volume 21, issue 6, 2002, Pages 543~551
In this paper, we present an efficient decoding method that runs in real time for a 20k-word continuous speech recognition task. The basic search method is a one-pass Viterbi decoder on the search space constructed from the novel language model (LM) network. With the consistent search space representation derived from various language models by the LM network, we incorporate basic pruning strategies, and the tokens that survive pruning constitute a dynamic search space. To facilitate post-processing, the decoder then produces a word graph and an N-best list. The decoder is tested on a 20k-word database and evaluated with respect to accuracy and real-time factor (RTF).
Estimation and Weighting of Sub-band Reliability for Multi-band Speech Recognition
The Journal of the Acoustical Society of Korea, volume 21, issue 6, 2002, Pages 552~558
Recently, multi-band speech recognition, based on Fletcher's human speech recognition (HSR) model, has been intensively studied by many researchers. As a new automatic speech recognition (ASR) technique, multi-band speech recognition splits the frequency domain into several sub-bands and recognizes each sub-band independently. The likelihood scores of the sub-bands are weighted according to the reliabilities of the sub-bands and recombined to make a final decision. This approach is known to be robust in noisy environments. When the noise is stationary, a sub-band SNR can be estimated using the noise information in non-speech intervals. However, if the noise is non-stationary, it is not feasible to obtain the sub-band SNR. This paper proposes inverse sub-band distance (ISD) weighting, where a distance for each sub-band is calculated by stochastic matching of input feature vectors and hidden Markov models. The inverse distance is used as the sub-band weight. Experiments on 1500 to 1800 Hz band-limited white noise and classical guitar sound revealed that the proposed method could represent the sub-band reliability effectively and improve performance under both stationary and non-stationary band-limited noise.
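The weighting idea in the abstract above can be sketched in a few lines. The abstract states only that the inverse distance is used as a sub-band weight. Normalizing the weights to sum to one and combining log-likelihoods as a weighted sum are assumptions here, and all names and numbers are hypothetical.

```python
def inverse_distance_weights(distances, eps=1e-12):
    """Turn per-sub-band distances into weights proportional to 1/distance.

    Normalizing so the weights sum to 1 is an assumption of this sketch.
    """
    inverse = [1.0 / (d + eps) for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


def combine_scores(subband_log_likelihoods, weights):
    """Weighted sum of sub-band log-likelihoods, one total per class.

    subband_log_likelihoods[b][c] is the score of class c in sub-band b.
    """
    n_classes = len(subband_log_likelihoods[0])
    return [
        sum(w * band[c] for w, band in zip(weights, subband_log_likelihoods))
        for c in range(n_classes)
    ]


# Hypothetical example: sub-band 0 matches the models best (smallest distance).
distances = [1.0, 2.0, 4.0]
scores = [[-1.0, -5.0], [-4.0, -2.0], [-6.0, -1.0]]
weights = inverse_distance_weights(distances)
combined = combine_scores(scores, weights)
best_class = max(range(len(combined)), key=combined.__getitem__)
print(weights, combined, best_class)
```

In this toy case, equal weights would favor class 1, while the inverse-distance weights favor class 0 because the most reliable sub-band dominates the decision.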
An Efficient Approach for Noise Robust Speech Recognition by Using the Deterministic Noise Model
The Journal of the Acoustical Society of Korea, volume 21, issue 6, 2002, Pages 559~565
In this paper, we propose an efficient method that estimates the HMM (hidden Markov model) parameters of noisy speech. In previous methods, noisy speech HMM parameters are usually obtained by analytical methods using assumed noise statistics. However, because these methods rely on simplifying assumptions, it is difficult to come close to the real statistics of the noisy speech. Instead of using such simplifications, we used some useful statistics from the clean speech HMMs and employed a deterministic noise model. We found that the new scheme showed improved results with reduced computation cost.
Audio Quality Enhancement at a Low-bit Rate Perceptual Audio Coding
The Journal of the Acoustical Society of Korea, volume 21, issue 6, 2002, Pages 566~575
Low-bitrate audio coding enables a number of Internet and mobile multimedia streaming services to operate more efficiently. With the help of next-generation mobile telephone technologies and digital audio/video compression algorithms, we can enjoy real-time multimedia content on our mobile devices (cellular phone, PDA, notebook, etc.). However, the limited bandwidth of mobile communication networks prohibits transmitting high-quality AV content. In addition, most of the bandwidth is assigned to video content. In this paper, we design a novel and simple method for reproducing high-frequency components. The spectrum of the high-frequency components, which are lost by down-sampling, is modeled by its energy ratio to the low-frequency band on the Bark scale, and these values are multiplexed with the conventionally coded bitstream. At the decoder side, the high-frequency components are reconstructed by duplicating the low-frequency band spectrum, scaled by the decoded energy ratios. Segmental SNR and MOS test results confirm that the proposed method enhances subjective sound quality with only 10% to 20% additional bits. In addition, the proposed method can be applied to all kinds of frequency-domain audio compression algorithms, such as MPEG-1/2, AAC, and AC-3.
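The encoder and decoder steps described in the abstract above can be sketched as follows. This is a simplified illustration, not the authors' codec: it uses uniform bands instead of Bark-scale bands, works on plain magnitude values, and ignores quantization of the ratios. All function names and the example spectrum are assumptions.

```python
import math


def band_energy(values):
    return sum(v * v for v in values)


def encode_energy_ratios(spectrum, cutoff, band_size):
    """Encoder side: per-band energy ratio of the high band to the low band.

    High-band bin cutoff + i is paired with low-band bin i, so the high band
    must not be longer than the low band.
    """
    low, high = spectrum[:cutoff], spectrum[cutoff:]
    ratios = []
    for start in range(0, len(high), band_size):
        source = low[start:start + band_size]
        target = high[start:start + band_size]
        e_source = band_energy(source)
        ratios.append(band_energy(target) / e_source if e_source > 0 else 0.0)
    return ratios


def decode_high_band(low, ratios, band_size):
    """Decoder side: copy the low band and scale each band by sqrt(ratio)."""
    high = []
    for b, ratio in enumerate(ratios):
        source = low[b * band_size:(b + 1) * band_size]
        scale = math.sqrt(ratio)  # energy ratio -> magnitude scale factor
        high.extend(v * scale for v in source)
    return high


# Hypothetical magnitude spectrum: 4 low-band bins followed by 4 high-band bins.
spectrum = [8.0, 6.0, 4.0, 2.0, 2.0, 1.0, 0.5, 0.25]
ratios = encode_energy_ratios(spectrum, cutoff=4, band_size=2)
rebuilt = decode_high_band(spectrum[:4], ratios, band_size=2)
print(ratios, rebuilt)
```

The rebuilt high band matches the original band energies but not necessarily the fine structure, which is the trade-off that keeps the side information small.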
A Study on the Denoising Method by Multi-threshold for Underwater Transient Noise Measurement
The Journal of the Acoustical Society of Korea, volume 21, issue 6, 2002, Pages 576~584
This paper proposes a new denoising method using wavelet packets to reject unknown external noise and white Gaussian ambient noise when measuring transient noise, which is one of the important elements for ship classification. The previous denoising method, which applies the same wavelet threshold at each node of multi-single sensors to reject white noise, is not adequate in the underwater environment, where many external noises exist. The proposed algorithm applies a modified soft threshold to each node according to a discriminated threshold so as to reject both unknown external noise and white Gaussian ambient noise. Numerical simulation verifies that the SNR is increased by more than 25 dB, and the simulation results are confirmed through a sea trial using multi-single sensors.
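The idea of applying a different threshold to each wavelet packet node can be sketched with standard soft thresholding. The abstract does not define the authors' modified soft threshold or how the per-node thresholds are chosen, so the standard rule, the node names, and the threshold values below are assumptions.

```python
def soft_threshold(coeffs, threshold):
    """Standard soft thresholding: shrink each coefficient toward zero."""
    out = []
    for c in coeffs:
        if c > threshold:
            out.append(c - threshold)
        elif c < -threshold:
            out.append(c + threshold)
        else:
            out.append(0.0)  # coefficients inside the dead zone are removed
    return out


def denoise_nodes(nodes, thresholds):
    """Apply a separate threshold to each node's coefficients.

    nodes and thresholds are dicts keyed by a (hypothetical) node name.
    """
    return {
        name: soft_threshold(coeffs, thresholds[name])
        for name, coeffs in nodes.items()
    }


# Hypothetical coefficients for two wavelet packet nodes.
nodes = {"aa": [3.0, -0.5], "ad": [0.4, -2.0]}
thresholds = {"aa": 1.0, "ad": 0.5}
cleaned = denoise_nodes(nodes, thresholds)
print(cleaned)
```

Using one shared threshold corresponds to passing the same value for every node, which is the case the abstract describes as inadequate when external noise is present.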
A Method on the Improvement of Speaker Enrolling Speed for a Multilayer Perceptron Based Speaker Verification System through Reducing Learning Data
The Journal of the Acoustical Society of Korea, volume 21, issue 6, 2002, Pages 585~591
While the multilayer perceptron (MLP) provides several advantages over existing pattern recognition methods, it requires a relatively long time for learning. This prolongs speaker enrollment in a speaker verification system that uses the MLP as a classifier. This paper proposes a method that shortens the enrollment time by adopting the cohort speakers method used in existing parametric systems and reducing the number of background speakers required to train the MLP. The effect of the method is confirmed by an experiment that applies it to a continuant and MLP-based speaker verification system.