The Journal of the Acoustical Society of Korea
The Acoustical Society of Korea
Volume 21, Issue 4 - May 2002
Separation of Single Channel Mixture Using Time-domain Basis Functions
장길진 ; 오영환 ;
The Journal of the Acoustical Society of Korea, volume 21, issue 4, 2002, Pages 146~146
We present a new technique for achieving source separation when given only a single channel recording. The main idea is based on exploiting the inherent time structure of sound sources by learning a priori sets of time-domain basis functions that encode the sources in a statistically efficient manner. We derive a learning algorithm using a maximum likelihood approach given the observed single channel data and sets of basis functions. For each time point we infer the source parameters and their contribution factors. This inference is possible due to the prior knowledge of the basis functions and the associated coefficient densities. A flexible model for density estimation allows accurate modeling of the observation, and our experimental results exhibit a high level of separation performance for simulated mixtures as well as real environment recordings employing mixtures of two different sources. We show separation results of two music signals as well as the separation of two voice signals.
Statistical Extraction of Speech Features Using Independent Component Analysis and Its Application to Speaker Identification
장길진 ; 오영환 ;
The Journal of the Acoustical Society of Korea, volume 21, issue 4, 2002, Pages 156~156
We apply independent component analysis (ICA) to extract an optimal basis for the problem of finding efficient features for representing the speech signals of a given speaker. The speech segments are assumed to be generated by a linear combination of the basis functions; the distribution of a speaker's speech segments is therefore modeled by adapting the basis functions so that each source component is statistically independent. The learned basis functions are oriented and localized in both space and frequency, bearing a resemblance to Gabor wavelets. These features are speaker-dependent characteristics, and to assess their efficiency we performed speaker identification experiments and compared our results with those of the conventional Fourier basis. Our results show that the proposed method is more efficient than the conventional Fourier-based features in that it obtains a higher speaker identification rate.
Hybrid Type Vibration Power Flow Analysis Method Using SEA Parameters
박영호 ; 홍석윤 ;
The Journal of the Acoustical Society of Korea, volume 21, issue 4, 2002, Pages 164~164
This paper proposes a hybrid method for vibration analysis in the medium-to-high frequency ranges using the Power Flow Analysis (PFA) algorithm and Statistical Energy Analysis (SEA) coupling concepts. The main part of the developed method is the application of the coupling loss factor (CLF) suggested in SEA to the power transmission and reflection coefficients in the PFA boundary conditions. The developed hybrid method shows very promising results for various damping loss factors over wide frequency ranges. This paper also presents results of applying the Power Flow Finite Element Method (PFFEM), forming a new joint element matrix with the CLF, to analyze plate structures of various shapes. The analytical results for automobile and complex plate structures show good agreement with those of PFFEM using the PFA coefficients.
Dialog System Using Multimedia Techniques for the Elderly with Dementia
김성일 ; 정현열 ;
The Journal of the Acoustical Society of Korea, volume 21, issue 4, 2002, Pages 170~170
The goal of the present research is to improve the quality of life of elderly people with dementia. In this paper, this is realized by developing a dialog system controlled by three kinds of modules: a speech recognition engine, a graphical agent, and a database classified by a nursing schedule. The system was evaluated in the actual environment of a nursing facility by introducing it to an older male patient with dementia. A comparison study between the dialog system and professional caregivers was then carried out at the nursing home for 5 days in each case. The evaluation results showed that the dialog system was more responsive in catering to the needs of the dementia patient than the professional caregivers. Moreover, the proposed system led the patient to talk more than the caregivers did.
Adaptive Wavelet Denoising for Speech Recognition in Car Interior Noise
김이재 ; 양성일 ; Kwon, Y. ; Jarng, Soon S. ;
The Journal of the Acoustical Society of Korea, volume 21, issue 4, 2002, Pages 178~178
In this paper, we propose an adaptive wavelet method for car interior noise cancellation. For this purpose, we use a node-dependent threshold that minimizes the Bayesian risk. We propose a noise estimation method based on spectral entropy, using a histogram of intensity and a candidate best basis instead of Donoho's best bases. We also modify the hard threshold function. Experimental results show that the proposed algorithm is more efficient than the conventional one, especially for heavily noisy signals.
Observation of Strong In-plane End Vibration of a Cylindrical Shell
The Journal of the Acoustical Society of Korea, volume 21, issue 4, 2002, Pages 183~183
In this paper, strong in-plane vibration has been experimentally observed at the end of a finite cylindrical shell. The strong in-plane vibration was generated by the evanescent wave field, which was excited along about half the length of the shell. The evanescent waves were generated by mode conversion of elastic waves at the ends of the cylindrical shell.
Outdoor Noise Propagation: Geometry Based Algorithm
The Journal of the Acoustical Society of Korea, volume 21, issue 4, 2002, Pages 339~438
This paper presents a method for computer simulation of noise propagation in outdoor environments. Sound propagating in three-dimensional space generates reflected waves whenever it hits boundary surfaces. A receiver located away from a sound source receives multiple sound waves reflected from various boundary surfaces in the space. The algorithm developed in this paper is based on the ray theory of sound. Given three-dimensional geometry and sound sources as input, we can compute sound effects over all the boundary surfaces. We present two approaches to computing sound: the first, called forward tracing, traces sounds forward from the sound sources, while the second, called geometry based computation, computes possible propagation routes between sources and receivers. We compare the two approaches and suggest geometry based sound computation for outdoor simulation. This approach is also very efficient in that it saves computational time compared to forward sound tracing. The impulse response is governed by the physical environment. When a sound source waveform is convolved with the numerically computed impulse response in time, the result is a synthetic sound. This technique can easily be generalized to synthesize realistic stereo sounds for virtual reality, and the simulation result is visualized using VRML.
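The synthesis step mentioned in this abstract, convolving a source waveform with a computed impulse response, can be illustrated with a minimal sketch. The function name `convolve` and the toy impulse response (a direct path plus a single reflection) are assumptions made for illustration; they are not taken from the paper.

```python
def convolve(source, impulse_response):
    """Direct-form discrete convolution of two finite sequences."""
    if not source or not impulse_response:
        return []
    out = [0.0] * (len(source) + len(impulse_response) - 1)
    for i, s in enumerate(source):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


# Hypothetical impulse response: a direct path, plus one reflection
# arriving 2 samples later at half the amplitude.
impulse_response = [1.0, 0.0, 0.5]
source = [1.0, 2.0, 3.0]

received = convolve(source, impulse_response)
print(received)  # [1.0, 2.0, 3.5, 1.0, 1.5]
```

Each reflection path found by a propagation model would contribute one delayed, attenuated tap to the impulse response; a practical implementation would likely use FFT-based convolution for long responses.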
Dynamic Redundant Audio Transmission for Packet Loss Recovery in VoIP Systems
The Journal of the Acoustical Society of Korea, volume 21, issue 4, 2002, Pages 349~360
In the ITU H.323 teleconference system, the RTP/RTCP protocol is used to transfer real-time multimedia streams. Both sender and receiver experience packet loss and jitter, which result from network congestion over the Internet. Audio quality over the Internet depends on the number of lost packets and on the jitter between successive packets. The goal of our study is to improve speech quality over the Internet by checking the packet loss characteristics of the network and adopting a buffer control management mechanism at the receiver. We suggest a dynamic redundant audio transmission mechanism that examines the packet loss rate and uses the feedback information provided through RTCP.
Formant-broadened CMS Using the Log-spectrum Transformed from the Cepstrum
The Journal of the Acoustical Society of Korea, volume 21, issue 4, 2002, Pages 361~373
In this paper, we propose a channel normalization method to improve the performance of CMS (cepstral mean subtraction), which is widely adopted to normalize channel variation for speech and speaker recognition. CMS, which estimates the channel effects by averaging the long-term cepstrum, has the weak point that the estimated channel is biased by the formants of voiced speech, which carry useful speech information. The proposed Formant-broadened Cepstral Mean Subtraction (FBCMS) is based on two facts: the formants can be found easily in the log spectrum, which is obtained from the cepstrum by the Fourier transform, and the formants correspond to the dominant poles of the all-pole model that is usually used to model the vocal tract. FBCMS evaluates only the poles to be broadened from the log spectrum, without polynomial factorization, and produces a formant-broadened cepstrum by broadening the bandwidths of the formant poles. We can estimate the channel cepstrum effectively by averaging the formant-broadened cepstral coefficients. We performed experiments to compare FBCMS with CMS and pole-filtered CMS (PFCMS) using 4 simulated telephone channels. In the channel estimation experiment, we evaluated the distance between the cepstrum of the real channel and the cepstrum of the estimated channel, and found that the mean cepstrum was closer to the channel cepstrum because the bias of the mean cepstrum toward speech was softened. In the text-independent speaker identification experiment, the proposed method was superior to conventional CMS and comparable to pole-filtered CMS. Consequently, we showed that the proposed method can efficiently normalize channel variation on the basis of conventional CMS.
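For orientation, the conventional CMS baseline that this abstract builds on can be sketched as follows. This is only the standard mean-subtraction step under the usual assumption that a stationary channel adds a constant vector in the cepstral domain; it is not the formant-broadening procedure of FBCMS, and the function name and toy frames are hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient long-term mean from every cepstral frame."""
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    return [[f[k] - mean[k] for k in range(dim)] for f in frames]


# Toy cepstral frames (two coefficients per frame).
clean = [[1.0, 4.0], [3.0, 8.0]]

# The same frames after a hypothetical channel adds a constant offset.
channel = [0.5, -1.0]
distorted = [[c + h for c, h in zip(f, channel)] for f in clean]

# After CMS the constant channel offset is gone.
print(cepstral_mean_subtraction(clean))      # [[-1.0, -2.0], [1.0, 2.0]]
print(cepstral_mean_subtraction(distorted))  # [[-1.0, -2.0], [1.0, 2.0]]
```

The sketch also shows the weak point named in the abstract: the mean removes whatever is constant across frames, including the average contribution of the speech itself, which is why the estimated channel is biased.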
Speech Enhancement Based on Voice/Unvoice Classification
The Journal of the Acoustical Society of Korea, volume 21, issue 4, 2002, Pages 374~379
In this paper, a novel method to reduce noise using voice/unvoice classification is proposed. Voice and unvoice are important features of speech, and the proposed method processes noisy speech differently for the voice and unvoice parts. Speech is classified into voice/unvoice using the zero-crossing rate and energy, and a modified speech/noise dominant decision is proposed based on the voice/unvoice classification. The proposed method was tested under white noise and airplane noise conditions. Both a comparison of segmental SNR with the existing method and listening to the enhanced speech showed that the proposed method performed better than the existing method.
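A classification based on zero-crossing rate and energy, as named in this abstract, could look roughly like the sketch below. The decision rule and the threshold values `zcr_max` and `energy_min` are illustrative assumptions; the abstract does not give the paper's actual rule or thresholds.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    if not frame:
        return 0.0
    return sum(x * x for x in frame) / len(frame)


def is_voiced(frame, zcr_max=0.3, energy_min=0.1):
    """Hypothetical rule: voiced frames have few zero crossings and high energy."""
    return (
        zero_crossing_rate(frame) <= zcr_max
        and short_time_energy(frame) >= energy_min
    )


# Toy frames: a slow, strong oscillation and a fast, weak one.
voiced_like = [0.8, 0.9, 0.7, -0.6, -0.8, -0.7, 0.5, 0.9]
unvoiced_like = [0.05, -0.04, 0.06, -0.05, 0.04, -0.06, 0.05, -0.04]

print(is_voiced(voiced_like))    # True
print(is_voiced(unvoiced_like))  # False
```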
Improvement of Keyword Spotting Performance Using Normalized Confidence Measure
The Journal of the Acoustical Society of Korea, volume 21, issue 4, 2002, Pages 380~386
Conventional post-processing, such as the confidence measure (CM) proposed by Rahim, calculates each phone's CM using the likelihood between the phoneme model and the anti-model, and the word's CM is then obtained by averaging the phone-level CMs. In the conventional method, the CMs of some specific keywords are very low, and these keywords are usually rejected. The reason is that the statistics of phone-level CMs are not consistent. In other words, phone-level CMs have a different probability density function (pdf) for each phone, especially each tri-phone. To overcome this problem, we propose a normalized confidence measure (NCM). Our approach is to transform the CM pdf of each tri-phone to the same pdf under the assumption that CM pdfs are Gaussian. To evaluate our method, we use a common keyword spotting system. In that system, context-dependent HMM models are used for modeling keyword utterances and context-independent HMM models are applied to non-keyword utterances. The experimental results show that the proposed NCM reduced the FAR (false alarm rate) from 0.44 to 0.33 FA/KW/HR (false alarm/keyword/hour) when the MDR (missed detection rate) is about 8%. This is a 25% improvement in FAR.
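One way to map Gaussian scores with different means and variances onto a common density is standardization (subtract the mean, divide by the standard deviation). The sketch below uses that as an assumed stand-in, since the abstract does not spell out the transform; the tri-phone labels and statistics are invented for illustration. The last lines reproduce the 25% figure from the reported FAR values.

```python
def normalize_cm(raw_cm, mean, std):
    """Map a Gaussian-distributed score onto a zero-mean, unit-variance scale."""
    return (raw_cm - mean) / std


def word_ncm(phone_scores, stats):
    """Average the normalized phone-level scores to get a word-level score."""
    normalized = [
        normalize_cm(cm, *stats[phone]) for phone, cm in phone_scores
    ]
    return sum(normalized) / len(normalized)


# Hypothetical per-tri-phone (mean, standard deviation) of raw CM scores.
triphone_stats = {"k-a+t": (2.0, 0.5), "s-i+n": (-1.0, 2.0)}

# Raw scores that look very different but sit one standard deviation
# above their own tri-phone's mean.
phone_scores = [("k-a+t", 2.5), ("s-i+n", 1.0)]
word_score = word_ncm(phone_scores, triphone_stats)
print(word_score)  # 1.0

# Relative FAR reduction from the figures reported in the abstract.
far_reduction = (0.44 - 0.33) / 0.44
print(round(far_reduction * 100))  # 25
```

After normalization, a single rejection threshold has the same meaning for every tri-phone, which is the consistency problem the abstract describes.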
Speech Enhancement Based on Mixture Hidden Filter Model (HFM) Under Nonstationary Noise
The Journal of the Acoustical Society of Korea, volume 21, issue 4, 2002, Pages 387~393
An enhancement technique for noisy signals using a mixture HFM (Hidden Filter Model) is proposed. Given the parameters of the clean signal and the noise, the noisy signal is modeled by a linear state-space model with Markov switching parameters. Estimating the original signal requires estimating the state vector. The estimation procedure is based on the mixture interacting multiple model (MIMM), and the speech estimator is given by the weighted sum of parallel Kalman filters operating interactively. Simulation results show that the proposed method offers performance gains relative to previous results with slightly increased complexity.
Transmission of Channel Error Information over Voice Packet
The Journal of the Acoustical Society of Korea, volume 21, issue 4, 2002, Pages 394~400
In digital speech communications, the quality of service can be increased by a speech coding scheme that adapts to the error rate of voice packet transmission. However, current communication protocols in cellular and Internet communications do not provide a function for transmitting channel error information. To solve this problem, this paper proposes a new method for real-time transmission of channel error information, in which the channel error information is embedded in the voice packet. The proposed method utilizes the pulse positions of the codevector in the ACELP speech codec, which results in little degradation in speech quality and a low false alarm rate. Simulations with various speech data show that the proposed method meets the requirements for speech quality, detection rate, and false alarm rate.
Performance Improvement of Connected Digit Recognition by Considering Phonemic Variations in Korean Digit and Speaking Styles
The Journal of the Acoustical Society of Korea, volume 21, issue 4, 2002, Pages 401~406
Each Korean digit consists of a single syllable, so recognizers, as well as Korean speakers, often have difficulty recognizing them. When digit strings are pronounced, the original pronunciation of each digit changes considerably because of the co-articulation effect. In addition to these problems, the distortion caused by various channels and noises degrades the recognition performance for Korean connected digit strings. This paper deals with techniques to improve this recognition performance, which include defining a set of PLUs by considering phonemic variations in Korean digits and constructing a recognizer that handles speakers' various speaking styles. In speaker-independent connected digit recognition experiments using telephone speech, the proposed techniques with 1 Gaussian/state gave a string accuracy of 83.2%, i.e., a 7.2% error rate reduction relative to the baseline system. With 11 Gaussians/state, we achieved the highest string accuracy of 91.8%, i.e., a 4.7% error rate reduction.
Automatic Generation of Concatenate Morphemes for Korean LVCSR
The Journal of the Acoustical Society of Korea, volume 21, issue 4, 2002, Pages 407~414
In this paper, we present a method that automatically generates concatenated-morpheme-based language models to improve the performance of Korean large vocabulary continuous speech recognition (LVCSR). The focus is on reducing recognition errors for monosyllable morphemes, which account for 54% of the training text corpus and are more frequently misrecognized. The knowledge-based method using POS patterns has disadvantages such as the difficulty of writing rules and the production of many low-frequency concatenated morphemes. The proposed method automatically selects morpheme pairs from the training text data based on measures such as frequency, mutual information, and unigram log likelihood. Experiments were performed using a 7M-morpheme text corpus and a 20K-morpheme lexicon. The frequency measure, with a constraint on the number of morphemes used for concatenation, produced the best result, reducing monosyllables from 54% to 30%, bigram perplexity from 117.9 to 97.3, and MER from 21.3% to 17.6%.
Study on Linear Parameters Identification of Loudspeaker
The Journal of the Acoustical Society of Korea, volume 21, issue 4, 2002, Pages 415~420
Two methods are presented for identifying the linear parameters of a loudspeaker: a box method and an added mass method. These methods are compared with conventional software to show the advantages and disadvantages of the developed methods. The results identified by the conventional method of the Laud software differ significantly from those of the developed methods. However, the two developed methods show a 4% error in the identified Thiele-Small (TS) parameters. The box method shows that the TS parameters depend on the amount of porous material.
Measurements of High-frequency Sea Surface Backscattering Signals
The Journal of the Acoustical Society of Korea, volume 21, issue 4, 2002, Pages 421~429
Sea surface backscattering signal measurements were conducted in the shallow waters off the east coast of Korea to study acoustic wave scattering from the sea surface. The grazing angles ranged from 20° to 40° at a frequency of 60 kHz. The wind speed and surface roughness in the experiment area were 3 m/s and below 1 m, respectively. The measured acoustic backscattering strengths greatly exceeded the composite roughness predictions at low grazing angles. To account for this discrepancy, the scattering strengths due to a near-surface bubble layer were considered. The prediction with the bubble contribution was found to be in good agreement with the experimental measurements.