The Journal of the Acoustical Society of Korea
Journal Basic Information
Journal DOI :
The Acoustical Society of Korea
Editor in Chief :
Volume & Issues
Volume 21, Issue 8 - Nov 2002
Volume 21, Issue 7 - Oct 2002
Volume 21, Issue 6 - Aug 2002
Volume 21, Issue 5 - Jul 2002
Volume 21, Issue 4 - May 2002
Volume 21, Issue 3 - Apr 2002
Volume 21, Issue 2 - Feb 2002
Volume 21, Issue 1 - Jan 2002
Volume 21, Issue 1E - 00 2002
A Study on Seasonal Variation of Propagation Loss in the Yellow Sea Using Broadband Source of Low Frequency
The Journal of the Acoustical Society of Korea, volume 21, issue 3, 2002, Pages 213~220
Sound waves in the sea propagate under the influence of water depth, sound velocity structure, sea-surface and bottom roughness, and bottom sediment distribution. In shallow water in particular, the sound velocity structure varies in time and space, and the sediment distribution varies widely from place to place. To investigate the seasonal variation of low-frequency sound propagation in the Yellow Sea, propagation experiments were conducted along the same track in the central Yellow Sea in spring, summer, and autumn. In this paper we examine the measured propagation loss together with the sound velocity structure and investigate its seasonal variation. The propagation losses measured in summer were larger than those measured in spring and autumn, and the losses measured in autumn were smaller than those measured in spring. The seasonal difference in propagation loss increased with sound frequency and with propagation range.
Decision of Error Tolerance in Sonar Array by the Monte-Carlo Method
The Journal of the Acoustical Society of Korea, volume 21, issue 3, 2002, Pages 221~229
In this paper, the error tolerance of each array element that satisfies a given beam-pattern error tolerance is determined using the Monte-Carlo method. The conventional deterministic method determines each element's error tolerance from the acceptance pattern by testing all cases, but it is unsuitable for arrays with many elements because the required computation grows exponentially with the number of elements. To alleviate this problem, we apply a new algorithm that reduces the growth of computation time with the number of array elements. We validated the determined error-tolerance region through several simulations.
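As a rough illustration of the Monte-Carlo idea (not the paper's accelerated algorithm, which the abstract does not specify), the sketch below draws random gain and phase errors for every element, checks the resulting beam pattern against an acceptance mask, and reports the largest phase tolerance whose pass rate stays above a target. The 8-element half-wavelength array, the -10 dB sidelobe mask, and the 95% target are hypothetical choices. The cost grows linearly with the number of trials rather than exponentially with the number of elements.

```python
import cmath
import math
import random

# Hypothetical design: 8-element uniform line array, half-wavelength spacing,
# and an acceptance mask requiring every sidelobe to stay below -10 dB.
N_ELEM = 8
D_OVER_LAMBDA = 0.5
MASK_DB = -10.0
# Sidelobe region sampled in u = sin(theta); |u| < 0.3 is treated as main lobe.
SIDELOBE_U = [k / 100.0 for k in range(-100, 101) if abs(k) >= 30]


def array_factor(u, weights):
    """Complex far-field response of the line array at u = sin(theta)."""
    kd = 2.0 * math.pi * D_OVER_LAMBDA
    return sum(w * cmath.exp(1j * kd * n * u) for n, w in enumerate(weights))


def meets_mask(weights):
    """True if every sidelobe sample is below MASK_DB relative to broadside."""
    peak = max(abs(array_factor(0.0, weights)), 1e-12)
    for u in SIDELOBE_U:
        ratio = max(abs(array_factor(u, weights)) / peak, 1e-12)
        if 20.0 * math.log10(ratio) > MASK_DB:
            return False
    return True


def pass_rate(amp_tol, phase_tol, trials, rng):
    """Monte-Carlo estimate of the fraction of arrays that still meet the mask
    when each element's gain and phase errors are uniform in +/-amp_tol and
    +/-phase_tol (radians)."""
    passed = 0
    for _ in range(trials):
        weights = [
            (1.0 + rng.uniform(-amp_tol, amp_tol))
            * cmath.exp(1j * rng.uniform(-phase_tol, phase_tol))
            for _ in range(N_ELEM)
        ]
        passed += meets_mask(weights)
    return passed / trials


def largest_phase_tolerance(candidates, amp_tol, trials=200, target=0.95, seed=0):
    """Largest candidate phase tolerance whose estimated pass rate reaches target."""
    rng = random.Random(seed)
    best = None
    for tol in sorted(candidates):
        if pass_rate(amp_tol, tol, trials, rng) < target:
            break
        best = tol
    return best
```

With no amplitude error, a 0.05 rad phase tolerance keeps the ideal pattern's roughly -13 dB first sidelobe safely under the mask, while fully random phases almost never pass.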
A Study on the Performance Improvement of Uplink in Multi-rate Mobile Communication System Using Adaptive Parallel Interference Canceller
The Journal of the Acoustical Society of Korea, volume 21, issue 3, 2002, Pages 230~236
This paper studies the architecture of a new parallel interference canceller for the reverse link of a next-generation mobile communication system supporting multiple data rates. The proposed method adopts a new algorithm, applicable to multi-rate systems, that reduces multiple access interference (MAI), which degrades the performance of CDMA systems and limits channel capacity. The proposed system is evaluated by simulation under various conditions. The results show improved performance compared with existing conventional interference cancellers. Although the amount of computation increases, performance improves overall.
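The sketch below illustrates parallel interference cancellation (PIC) for a synchronous, single-rate CDMA link, assuming known user amplitudes and noise-free chips. It is a generic multistage PIC, not the paper's adaptive multi-rate canceller; the fixed partial-cancellation weights are hypothetical stand-ins for whatever adaptation the authors use. In each stage, every user's interference estimate is built from the previous stage's decisions and subtracted in parallel.

```python
def correlate(signal, code):
    """Despread: normalized correlation of the received chips with one code."""
    return sum(s * c for s, c in zip(signal, code)) / len(code)


def hard(v):
    """Hard bit decision in {-1, +1}."""
    return 1 if v >= 0 else -1


def matched_filter(received, codes):
    """Conventional single-user detection: one decision per user."""
    return [hard(correlate(received, c)) for c in codes]


def pic_stage(received, codes, amps, decisions, weight=1.0):
    """One PIC stage: all users are refined at once from the previous
    stage's decisions. weight < 1 gives partial cancellation, a common
    safeguard when early decisions are unreliable."""
    refined = []
    for k, code in enumerate(codes):
        cleaned = list(received)
        for j, (cj, aj, bj) in enumerate(zip(codes, amps, decisions)):
            if j == k:
                continue  # never cancel the user being detected
            for n, chip in enumerate(cj):
                cleaned[n] -= weight * aj * bj * chip
        refined.append(hard(correlate(cleaned, code)))
    return refined


def pic_detect(received, codes, amps, weights=(0.5, 1.0)):
    """Matched-filter start followed by one PIC stage per weight."""
    decisions = matched_filter(received, codes)
    for w in weights:
        decisions = pic_stage(received, codes, amps, decisions, w)
    return decisions
```

For instance, with codes [1, 1, 1, 1] and [1, 1, 1, -1] (cross-correlation 0.5), amplitudes 1 and 4, and bits +1 and -1, the matched filter decides [-1, -1] because the strong user masks the weak one, while a full-weight cancellation stage returns the correct [1, -1].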
Optimization of MPEG-4 AAC Codec on PDA
The Journal of the Acoustical Society of Korea, volume 21, issue 3, 2002, Pages 237~244
This paper describes the optimization of the MPEG-4 VM (Moving Picture Experts Group-4 Verification Model) GA (General Audio) AAC (Advanced Audio Coding) encoder and the design of a decoder for a PDA (Personal Digital Assistant) using the MPEG-4 VM source. We profiled the VM source code and applied several optimization methods to the functions selected by profiling. On an Intel Pentium III 600 MHz PC running Windows 98, encoding took about 20 times the playing time of the input sample with additional options enabled, and about 10 times without any options. Decoding on the PDA took over 35 seconds for a 17-second input sample. After optimization, the encoding time was reduced by 50% and real-time decoding was achieved on the PDA.
A Study on the Transaural Filter Implementation for 5.1 Channel Speaker System
The Journal of the Acoustical Society of Korea, volume 21, issue 3, 2002, Pages 245~255
This thesis presents a method for delivering more realistic sound by cancelling the cross-talk inherent in the 5.1-channel speaker system. The acoustical model for cross-talk cancellation is the free-field model, which minimizes sound distortion. I used Bark-scale sound-quality compensation based on psychoacoustics. For the surround channels, band-limited sound-quality compensation is performed in the frequency domain. I also performed sound-quality assessment tests on the traditional 2-channel stereo and 5.1-channel systems, in a test chamber that satisfies the ITU-R specifications. To assess the transaural filter, I used the IACC (Inter-Aural Cross-Correlation) together with the preferences of amateur listeners and golden-ear experts. With the proposed method, I obtained more than 38 dB of channel separation with the Dolby standard speaker array. The subjective test with the experts showed a 0.4-point increase in the diffusion rating.
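IACC is conventionally defined as the peak of the normalized interaural cross-correlation over lags within ±1 ms (as in ISO 3382). A minimal sketch, assuming the two ear signals are equal-rate sample lists; the abstract does not say how the author computed it:

```python
import math
import random


def iacc(left, right, fs, max_lag_ms=1.0):
    """Inter-aural cross-correlation coefficient: the largest magnitude of
    the normalized cross-correlation of the ear signals for |lag| <= 1 ms."""
    max_lag = int(round(fs * max_lag_ms / 1000.0))
    norm = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    if norm == 0.0:
        return 0.0
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[i] with right[i + lag] over the overlapping samples.
        acc = sum(
            left[i] * right[i + lag]
            for i in range(len(left))
            if 0 <= i + lag < len(right)
        )
        best = max(best, abs(acc) / norm)
    return best
```

Identical ear signals give 1.0, a pure 0.5 ms interaural delay still gives a value close to 1, and uncorrelated signals give a value near 0; a lower IACC is usually read as a more diffuse, spacious image.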
Design of the Extended Kalman Filter for Frequency-amplitude Tracker
The Journal of the Acoustical Society of Korea, volume 21, issue 3, 2002, Pages 256~263
In this study, tracking the temporal variation of frequency and amplitude in the presence of additive white Gaussian noise is considered using the Extended Kalman Filter (EKF). The EKF has many applications and has been applied to tracking time-varying frequency. However, existing EKF frequency trackers could operate only when the amplitude varied little over time, or required an additional amplitude tracker when it varied substantially. In this study, an EKF frequency-amplitude tracker that simultaneously tracks both the frequency and the amplitude of the measured signal under relatively large amplitude variation is proposed to improve the tracking of time-varying frequency, and its performance is verified by simulation and experiment.
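A compact sketch of an EKF that estimates phase, frequency and amplitude jointly, under assumptions of our own: a single real sinusoid y[n] = a[n] cos(phi[n]) plus white noise, frequency in radians per sample, and random-walk models for frequency and amplitude with hand-picked variances q and r. The abstract does not give the authors' state model, so this is illustrative only. Because the measurement is scalar, the Kalman gain needs no matrix inversion.

```python
import math


def ekf_freq_amp(samples, omega0, amp0, q=(1e-6, 1e-6, 1e-5), r=0.01):
    """Track frequency (rad/sample) and amplitude of y[n] = a[n] cos(phi[n]) + noise.

    State x = [phi, omega, a]: the phase advances by omega each sample, while
    omega and a follow random walks (process-noise variances q[1], q[2]).
    r is the measurement-noise variance. Returns (omega, a) after each update.
    """
    x = [0.0, omega0, amp0]
    P = [[0.1, 0.0, 0.0], [0.0, 1e-3, 0.0], [0.0, 0.0, 0.1]]
    track = []
    for y in samples:
        # Measurement update, linearizing h(x) = a*cos(phi) at the prior estimate.
        phi, _, a = x
        H = [-a * math.sin(phi), 0.0, math.cos(phi)]
        PH = [sum(P[i][j] * H[j] for j in range(3)) for i in range(3)]
        S = sum(H[i] * PH[i] for i in range(3)) + r
        K = [v / S for v in PH]
        innovation = y - a * math.cos(phi)
        x = [x[i] + K[i] * innovation for i in range(3)]
        HP = [sum(H[m] * P[m][j] for m in range(3)) for j in range(3)]
        P = [[P[i][j] - K[i] * HP[j] for j in range(3)] for i in range(3)]
        P = [[0.5 * (P[i][j] + P[j][i]) for j in range(3)] for i in range(3)]
        track.append((x[1], x[2]))
        # Time update with F = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]: P <- F P F^T + Q.
        x = [x[0] + x[1], x[1], x[2]]
        P = [
            [P[0][0] + P[0][1] + P[1][0] + P[1][1] + q[0], P[0][1] + P[1][1], P[0][2] + P[1][2]],
            [P[1][0] + P[1][1], P[1][1] + q[1], P[1][2]],
            [P[2][0] + P[2][1], P[2][1], P[2][2] + q[2]],
        ]
    return track
```

Fed with cos(0.3 n) and started from a slightly wrong frequency (0.295) and amplitude (0.9), the estimates should settle near 0.3 rad/sample and 1.0. Convergence from poor initial guesses is not guaranteed, a known weakness of EKF frequency trackers.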
Performance Improvement of Stereophonic Acoustic Echo Canceler Using Non-linear Pre-processing Filter
The Journal of the Acoustical Society of Korea, volume 21, issue 3, 2002, Pages 264~273
Adaptive filters cannot exactly estimate the echo path of the receiving room because of the cross-correlation between the stereo signals. In this paper, a new pre-processing method that reduces the cross-correlation without degrading stereophony is proposed to improve the performance of the stereophonic acoustic echo canceller. To reduce the cross-correlation, the absolute values of two orthogonal signals derived from each channel signal are added to the original channel signals. Assuming that the power of each channel signal is larger than that of the cross-correlation, the computation of the pre-processing can be reduced. Simulation results show that the stereophonic acoustic echo canceller with the proposed pre-processing performs better than those with conventional methods.
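The abstract's pre-processor (absolute values of orthogonal signals derived from each channel) is not described in enough detail to reproduce. For comparison, the sketch below shows a well-known related technique, the half-wave rectifier non-linearity of Benesty et al., which adds the positive half-wave of one channel and the negative half-wave of the other, and measures how much it lowers the interchannel correlation. alpha is a hypothetical strength parameter.

```python
import math
import random


def half_wave_preprocess(left, right, alpha=0.5):
    """Half-wave rectifier non-linearity for stereo echo cancellation:
    positive half-wave added to the left channel, negative to the right."""
    new_left = [x + alpha * (x + abs(x)) / 2.0 for x in left]
    new_right = [x + alpha * (x - abs(x)) / 2.0 for x in right]
    return new_left, new_right


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length signals."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var = sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var)
```

For a fully correlated Gaussian pair (the right channel a scaled copy of the left), the correlation drops from 1 to roughly 0.97 with alpha = 0.5. Larger alpha decorrelates more but adds more audible distortion. Applying the same-polarity rectifier to both channels would not help, since a scaled copy would stay a scaled copy.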
Korean Word Segmentation and Compound-noun Decomposition Using Markov Chain and Syllable N-gram
The Journal of the Acoustical Society of Korea, volume 21, issue 3, 2002, Pages 274~284
Word segmentation errors occurring in text preprocessing often insert incorrect words into the recognition vocabulary and lead to poor language models for Korean large-vocabulary continuous speech recognition. We propose an automatic word segmentation algorithm using Markov chains and syllable-based n-gram language models to correct word segmentation errors in text corpora. We assume that a sentence is generated from a Markov chain in which spaces and non-space characters are generated on self-transitions and other transitions, respectively. The word segmentation of the sentence is then obtained by finding the maximum-likelihood path using syllable n-gram scores. In experiments, the algorithm achieved 91.58% word accuracy and 96.69% syllable accuracy when segmenting 254 sentences of newspaper columns with all spaces removed. It improved word accuracy from 91.00% to 96.27% when correcting word segmentation at line breaks, and achieved 96.22% accuracy on compound-noun decomposition.
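A simplified sketch of maximum-likelihood spacing: the input's spaces are discarded, and a Viterbi search decides after each syllable whether to emit a space, scoring candidates with an add-k smoothed syllable trigram in which the space is an ordinary symbol. This replaces the paper's Markov-chain formulation with a plain n-gram search; the toy training sentences and the smoothing constant are hypothetical.

```python
import math
from collections import Counter

SP = " "  # the space is an ordinary symbol of the n-gram model


class SyllableTrigram:
    """Add-k smoothed syllable trigram trained on correctly spaced text."""

    def __init__(self, sentences, k=0.1):
        self.k = k
        self.tri = Counter()
        self.bi = Counter()
        vocab = set()
        for sent in sentences:
            syms = ["<s>", "<s>"] + list(sent) + ["</s>"]
            for a, b, c in zip(syms, syms[1:], syms[2:]):
                self.tri[(a, b, c)] += 1
                self.bi[(a, b)] += 1
                vocab.add(c)
        self.v = len(vocab)

    def logp(self, a, b, c):
        """log P(c | a, b) with add-k smoothing."""
        return math.log((self.tri[(a, b, c)] + self.k) / (self.bi[(a, b)] + self.k * self.v))


def segment(text, model):
    """Viterbi search over 'space or no space' after each syllable.
    A state is the last two emitted symbols, all a trigram needs."""
    chars = [c for c in text if c != SP]
    beams = {("<s>", "<s>"): (0.0, "")}
    for i, ch in enumerate(chars):
        nxt = {}
        for (h1, h2), (score, out) in beams.items():
            base = score + model.logp(h1, h2, ch)
            options = [((h2, ch), base, out + ch)]
            if i < len(chars) - 1:  # never put a space after the last syllable
                options.append(((ch, SP), base + model.logp(h2, ch, SP), out + ch + SP))
            for state, s, o in options:
                if state not in nxt or s > nxt[state][0]:
                    nxt[state] = (s, o)  # keep the best path into each state
        beams = nxt
    (h1, h2), (score, out) = max(
        beams.items(), key=lambda kv: kv[1][0] + model.logp(kv[0][0], kv[0][1], "</s>")
    )
    return out
```

Trained on the two toy sentences "나는 학교에 간다" and "나는 집에 간다", `segment("나는학교에간다", model)` returns "나는 학교에 간다".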
Context-adaptive Smoothing for Speech Synthesis
The Journal of the Acoustical Society of Korea, volume 21, issue 3, 2002, Pages 285~292
One of the problems to be solved in Text-To-Speech (TTS) is the discontinuity at unit-joining points. To cope with this problem, a smoothing method using a low-pass filter is employed in this paper. In the proposed smoothing method, a filter coefficient that controls the amount of smoothing is determined according to the context information of the speech to be synthesized. This method efficiently reduces both the discontinuities at unit-joining points and the artifacts caused by undesired smoothing. The amount of smoothing is determined from the discontinuities around unit-joining points in the current synthesized speech and the discontinuities predicted from context. The discontinuity predictor is implemented as a CART with context feature variables. To evaluate the performance of the proposed method, a corpus-based concatenative TTS system was used as the baseline. More than 6075 of listeners judged the quality of speech synthesized with the proposed smoothing to be superior to that of speech synthesized without smoothing in both naturalness and intelligibility.
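The abstract gives the principle but not the filter or the mapping from discontinuities to the filter coefficient, so both pieces below are hypothetical. The smoothing amount is the fraction of the observed joint discontinuity that the context prediction (a CART output in the paper) does not explain, and it sets the coefficient of a 3-tap low-pass filter applied to the two frames that meet at the joint.

```python
def smoothing_amount(observed_jump, predicted_jump):
    """Hypothetical rule: smooth only the part of the joint discontinuity
    that the context-based prediction does not explain (0 = none, 1 = full)."""
    if observed_jump <= 0.0:
        return 0.0
    return min(max(1.0 - predicted_jump / observed_jump, 0.0), 1.0)


def smooth_join(track, join, alpha):
    """Low-pass the two frames meeting at index `join` (first frame of the
    second unit) with the 3-tap kernel [b, 1 - 2b, b], b = alpha / 3,
    so alpha = 1 gives a 3-point moving average."""
    b = alpha / 3.0
    out = list(track)
    for n in (join - 1, join):
        if 0 < n < len(track) - 1:
            out[n] = b * track[n - 1] + (1.0 - 2.0 * b) * track[n] + b * track[n + 1]
    return out
```

A jump that the context predicts (predicted equal to observed) is left untouched, whereas an unexplained jump of 10 between two flat parameter tracks is spread into three steps of about 3.3.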
Optimized Time Scale Modification (TSM) System Integrating G.729 Speech Decoder and Dual SOLA Algorithm
The Journal of the Acoustical Society of Korea, volume 21, issue 3, 2002, Pages 293~303
This paper implements an optimized Time Scale Modification (TSM) system using the ITU G.729 speech decoder and the Dual SOLA algorithm. The proposed system assumes an 8 kHz sampling rate and input speech in 80-sample frames from the ITU G.729 speech decoder, and the TSM feature of Dual SOLA produces high-quality output speech that is slowed down or sped up as the user chooses. In particular, the proposed optimized Dual SOLA, based on various simulations and theoretical analysis, together with an additional speech interpolation procedure, makes it possible to build a high-performance integrated TSM system even at the maximum time-scale modification rate. The system performance is analyzed and verified with various input speech and playback speeds.
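A plain SOLA (synchronized overlap-add) sketch for reference, assuming 8 kHz speech and an 80-sample analysis hop to match the 10 ms G.729 frame. It is not the paper's Dual SOLA and omits the interpolation step; the frame length and search range are hypothetical. Each analysis frame is placed near its nominal output position, shifted by the lag that maximizes normalized cross-correlation with the existing output, and cross-faded in.

```python
import math


def sola(x, alpha, frame=320, hop=80, search=40):
    """Basic SOLA time-scale modification of a sample list.
    alpha > 1 slows speech down, alpha < 1 speeds it up."""
    syn_hop = int(round(alpha * hop))
    if syn_hop + 2 * search >= frame:
        raise ValueError("frame too short for this alpha and search range")
    y = list(x[:frame])
    m = 1
    while m * hop + frame <= len(x):
        seg = x[m * hop: m * hop + frame]
        nominal = m * syn_hop
        best_k, best_c = 0, -math.inf
        for k in range(-search, search + 1):
            start = nominal + k
            overlap = min(len(y) - start, frame)
            if start < 0 or overlap <= 0:
                continue
            num = sum(y[start + i] * seg[i] for i in range(overlap))
            den = math.sqrt(sum(y[start + i] ** 2 for i in range(overlap))
                            * sum(seg[i] ** 2 for i in range(overlap)))
            c = num / den if den > 0 else 0.0
            if c > best_c:
                best_k, best_c = k, c
        start = nominal + best_k
        overlap = min(len(y) - start, frame)
        del y[start + overlap:]  # drop any old output beyond the new frame
        for i in range(overlap):  # linear cross-fade over the overlap
            w = (i + 1) / (overlap + 1)
            y[start + i] = (1.0 - w) * y[start + i] + w * seg[i]
        y.extend(seg[overlap:])
        m += 1
    return y
```

With alpha = 1.5 the output is about 1.5 times as long as the input, and with alpha = 0.5 about half as long, apart from the fixed frame at the ends and the chosen lag.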
A Study on the Frequency Scaling Methods Using LSP Parameters Distribution Characteristics
The Journal of the Acoustical Society of Korea, volume 21, issue 3, 2002, Pages 304~309
We propose methods to reduce the computation of the real root method, which is mainly used in the CELP (Code Excited Linear Prediction) vocoder. In the real root method, the real roots of the polynomial equations are found and transformed into LSP parameters. However, this method is time-consuming because the root search proceeds sequentially over the frequency range. In this paper, to reduce the computation time of the real root method, we compare it with two methods. In the first method, we use the mel scale for the search frequency range, which is linear below 1 kHz and logarithmic above. In the second method, the search frequency range and search interval are ordered according to each coefficient's distribution. To compare the real root method with the proposed methods, we measured two things. First, we compared the positions of the LSP (Line Spectrum Pairs) parameters obtained by the proposed methods with those obtained by the real root method. Second, we measured how much the computation time was reduced. The experimental results of both methods showed that the search time was reduced by about 47% on average without changing the LSP parameters.
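The sketch below shows a generic form of the real-root grid search, under assumptions of our own: LPC coefficients are given as a[0..p] with a[0] = 1, the sum and difference polynomials are evaluated through their real-valued forms on the unit circle, and each sign change between grid points is refined by bisection. `mel_grid` uses the common mel formula 2595 log10(1 + f/700) as a stand-in for the paper's "linear below 1 kHz, logarithmic above" grid; the coefficient-distribution ordering of the second method is not reproduced.

```python
import cmath
import math


def lsp_functions(a, omega):
    """Real-valued forms of the sum and difference polynomials.

    With A(z) = a[0] + a[1] z^-1 + ... + a[p] z^-p and m = p + 1:
      P(e^jw) = 2 e^(-jwm/2) Re(e^(jwm/2) A(e^jw))
      Q(e^jw) = 2j e^(-jwm/2) Im(e^(jwm/2) A(e^jw))
    so the zeros of the returned (real, imag) pair in (0, pi) are the LSFs.
    """
    m = len(a)
    value = sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(a))
    rotated = cmath.exp(0.5j * omega * m) * value
    return rotated.real, rotated.imag


def uniform_grid(n):
    """n search points evenly spaced in (0, pi)."""
    return [math.pi * (i + 0.5) / n for i in range(n)]


def mel_grid(n, fs=8000.0):
    """n points evenly spaced on the mel scale, mapped to radians:
    dense at low frequencies, sparse at high ones."""
    top = 2595.0 * math.log10(1.0 + (fs / 2.0) / 700.0)
    grid = []
    for i in range(n):
        hz = 700.0 * (10.0 ** (top * (i + 0.5) / n / 2595.0) - 1.0)
        grid.append(2.0 * math.pi * hz / fs)
    return grid


def lsf_search(a, grid, iters=40):
    """Sequential sign-change search over the grid, refined by bisection.
    Returns the sorted LSFs and the number of function evaluations."""
    roots, evals = [], 0
    for which in (0, 1):  # 0: P (real part), 1: Q (imaginary part)
        prev_w = grid[0]
        prev_v = lsp_functions(a, prev_w)[which]
        evals += 1
        for w in grid[1:]:
            v = lsp_functions(a, w)[which]
            evals += 1
            if prev_v * v < 0:
                lo, hi, f_lo = prev_w, w, prev_v
                for _ in range(iters):
                    mid = 0.5 * (lo + hi)
                    f_mid = lsp_functions(a, mid)[which]
                    evals += 1
                    if f_lo * f_mid <= 0:
                        hi = mid
                    else:
                        lo, f_lo = mid, f_mid
                roots.append(0.5 * (lo + hi))
            prev_w, prev_v = w, v
    return sorted(roots), evals
```

For a single resonance (p = 2) the LSFs have closed forms, cos w = (1 - a1 - a2)/2 from P and cos w = -(1 + a1 - a2)/2 from Q, which both grids reproduce, with a 64-point mel grid needing far fewer evaluations than a 256-point uniform grid. A coarse grid can still miss two closely spaced roots, so the grid density has to suit the expected LSF spacing.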
A Study on the Technique of Spectrum Flattening for Improved Pitch Detection
The Journal of the Acoustical Society of Korea, volume 21, issue 3, 2002, Pages 310~314
Accurate pitch (fundamental frequency) extraction is important in speech signal processing tasks such as speech recognition, analysis, and synthesis. However, extracting the exact pitch from a speech signal is very difficult because of the effects of formants and transitional amplitude. In this paper, therefore, the pitch is detected after the formant components are removed by flattening the spectrum in the frequency domain, where the effect of transitions and phoneme changes is small. We propose a new flattening method for the log spectrum and compare its performance with that of the LPC method and the cepstrum method. The results show that the proposed method performs better than the conventional methods.
A Temporal Decomposition Method Based on a Rate-distortion Criterion
The Journal of the Acoustical Society of Korea, volume 21, issue 3, 2002, Pages 315~322
In this paper, a new temporal decomposition method is proposed that takes into consideration not only spectral distortion but also bit rate. The interpolation functions, which are among the necessary parameters for temporal decomposition, are obtained from a training speech corpus. Since the interval between two targets uniquely defines the interpolation function, the interpolation can be represented without additional information. The target locations are determined by minimizing the bit rate while keeping the maximum spectral distortion below a given threshold. The proposed method has been applied to compressing LSP coefficients, which are widely used as spectral parameters. Simulation results show that an average spectral distortion of about 1.4 dB can be achieved at an average bit rate of about 8 bits/frame.
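As a simplified sketch of the rate-distortion target selection, assume that every target costs the same number of bits, so minimizing bit rate means minimizing the number of targets, and that the frames between two targets are reconstructed by linear interpolation with Euclidean error. Both are stand-ins: the paper uses interpolation functions learned from a corpus and spectral distortion in dB. Dynamic programming then finds the fewest targets that keep every interpolated frame under the threshold.

```python
import math


def max_interp_error(frames, i, j):
    """Worst Euclidean error when the frames strictly between targets i and j
    are replaced by linear interpolation of the two target vectors."""
    worst = 0.0
    for k in range(i + 1, j):
        t = (k - i) / (j - i)
        approx = [a + t * (b - a) for a, b in zip(frames[i], frames[j])]
        err = math.sqrt(sum((x - y) ** 2 for x, y in zip(frames[k], approx)))
        worst = max(worst, err)
    return worst


def select_targets(frames, threshold):
    """Fewest targets (first and last frame included) such that no
    interpolated frame exceeds `threshold`, by dynamic programming."""
    n = len(frames)
    count = [math.inf] * n  # fewest targets needed to reach frame j
    prev = [-1] * n         # previous target on that best path
    count[0] = 1
    for j in range(1, n):
        for i in range(j):
            if count[i] + 1 < count[j] and max_interp_error(frames, i, j) <= threshold:
                count[j] = count[i] + 1
                prev[j] = i
    targets, j = [], n - 1
    while j != -1:
        targets.append(j)
        j = prev[j]
    return targets[::-1]
```

On a one-dimensional trajectory that rises linearly over five frames and falls linearly over the next five, a tight threshold selects frames 0, 5 and 10, while a loose one keeps only the end points.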
Frame Reliability Weighting for Robust Speech Recognition
The Journal of the Acoustical Society of Korea, volume 21, issue 3, 2002, Pages 323~329
This paper proposes a frame reliability weighting method to compensate for time-selective noise that occurs at random positions in the speech signal and contaminates certain parts of it. Speech frames have different degrees of reliability, and the reliability is proportional to the SNR (signal-to-noise ratio). While frame SNR can be estimated from noise information in non-speech intervals under stationary noise, it is difficult to obtain the noise spectrum for time-selective noise. Therefore, we used statistical models of clean speech to estimate frame reliability. The proposed MFR (model-based frame reliability) approximates frame SNR values using filterbank energy vectors obtained by inverse transformation of the input MFCC (mel-frequency cepstral coefficient) vectors and the mean vectors of a reference model. Experiments on various burst noises showed that the proposed method represents frame reliability effectively. Using MFR values as weighting factors in the likelihood calculation improved recognition performance.
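A sketch of the MFR idea with hypothetical details: log filterbank energies are recovered from MFCCs by inverting an unnormalized DCT-II (zero-padding a truncated cepstrum), the reference model's mean is treated as the clean-speech energy and the excess input energy as noise, and a logistic function maps the estimated frame SNR to a weight in the likelihood sum. The logistic centre and slope, the noise floor, and the use of natural-log energies are assumptions, not values from the paper.

```python
import math


def dct2(log_energies):
    """Unnormalized DCT-II, mapping log filterbank energies to cepstra."""
    nb = len(log_energies)
    return [
        sum(x * math.cos(math.pi * n * (m + 0.5) / nb) for m, x in enumerate(log_energies))
        for n in range(nb)
    ]


def cepstrum_to_log_fbank(cep, n_bands):
    """Inverse of dct2; a truncated cepstrum is zero-padded, which yields a
    smoothed log filterbank envelope."""
    c = list(cep) + [0.0] * (n_bands - len(cep))
    return [
        c[0] / n_bands
        + (2.0 / n_bands) * sum(c[n] * math.cos(math.pi * n * (m + 0.5) / n_bands)
                                for n in range(1, n_bands))
        for m in range(n_bands)
    ]


def frame_snr_db(input_cep, ref_mean_cep, n_bands, floor=1e-10):
    """Hypothetical model-based SNR: reference-mean energy as signal,
    positive excess of the input over it as noise."""
    e_in = [math.exp(v) for v in cepstrum_to_log_fbank(input_cep, n_bands)]
    e_ref = [math.exp(v) for v in cepstrum_to_log_fbank(ref_mean_cep, n_bands)]
    noise = sum(max(i - r, floor) for i, r in zip(e_in, e_ref))
    return 10.0 * math.log10(sum(e_ref) / noise)


def reliability_weight(snr_db, center=10.0, slope=0.5):
    """Logistic map from frame SNR to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - center)))


def weighted_log_likelihood(frame_logliks, weights):
    """Utterance score with each frame's log-likelihood scaled by its weight."""
    return sum(w * ll for w, ll in zip(weights, frame_logliks))
```

A frame that matches the reference mean gets a weight near 1, while one whose band energies are six times the reference (about -7 dB estimated SNR) gets a weight near 0 and barely affects the utterance score.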
Analysis of Korean Spontaneous Speech Characteristics for Spoken Dialogue Recognition
The Journal of the Acoustical Society of Korea, volume 21, issue 3, 2002, Pages 330~338
Spontaneous speech is ungrammatical and shows serious phonological variations, which make it much harder to recognize than read speech. In this paper, for conversational speech recognition, we analyze transcriptions of real conversational speech and classify the characteristics of conversational speech from the speech recognition perspective. Reflecting these features, we build a baseline system for conversational speech recognition. The characteristics are classified into long silences, disfluencies, and phonological variations, each grouped by similar features. To deal with these characteristics, we first update the silence model and add a filled-pause model and a garbage model; second, we add multiple phonetic transcriptions to the lexicon for the most frequent phonological variations. In our experiments, the baseline morpheme error rate (MER) is 31.65%; we obtain MER reductions of 2.08% for the silence and garbage models, 0.73% for the filled-pause model, and 0.73% for phonological variations. Finally, we obtain an MER of 27.92% for conversational speech recognition, which will serve as a baseline for further study.
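The morpheme error rate is conventionally computed like a word error rate: substitutions, deletions and insertions are counted from a minimum edit-distance alignment against the reference morphemes and divided by the reference length. A minimal sketch, with token lists standing in for morpheme-analyzed transcriptions (the abstract does not describe the authors' scoring tool):

```python
def edit_distance(ref, hyp):
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,             # deletion of r
                         cur[j - 1] + 1,          # insertion of h
                         prev[j - 1] + (r != h))  # match or substitution
        prev = cur
    return prev[-1]


def error_rate(refs, hyps):
    """Error rate in percent over a corpus of (reference, hypothesis) token lists."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * errors / sum(len(r) for r in refs)
```

Dropping one morpheme from a hypothetical six-morpheme reference such as ["나", "는", "학교", "에", "가", "ㄴ다"] gives an MER of about 16.7%.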