The Journal of the Acoustical Society of Korea
The Acoustical Society of Korea
Volume 22, Issue 7 - Oct 2003
RPCA-GMM for Speaker Identification
The Journal of the Acoustical Society of Korea, volume 22, issue 7, 2003, Pages 519~527
Speech is strongly affected by outliers introduced by unexpected events such as additive background noise, changes in a speaker's utterance pattern, and voice detection errors. Such outliers can severely degrade speaker recognition performance. In this paper, we propose a GMM based on robust principal component analysis (RPCA-GMM) using M-estimation to address both outliers and the high dimensionality of training feature vectors in speaker identification. First, a new feature vector of reduced dimension is obtained by robust PCA based on M-estimation. The robust PCA projects the original feature vector onto the reduced-dimensional linear subspace spanned by the leading eigenvectors of the covariance matrix of the feature vectors. Second, a GMM with diagonal covariance matrices is trained on these transformed feature vectors. We performed speaker identification experiments to show the effectiveness of the proposed method, comparing RPCA-GMM with a GMM trained on PCA-transformed feature vectors and with the conventional GMM with diagonal covariance matrices. For every 2% increase in the proportion of outliers, the proposed method maintains almost the same speaker identification rate, degrading by only 0.03%, whereas the conventional GMM and the PCA-based GMM degrade by 0.65% and 0.55%, respectively. This means that the proposed method is more robust to outliers.
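The robust PCA step can be illustrated with a small sketch. The abstract does not say which M-estimator or weight function the authors use, so the Huber-type weights, the threshold `c`, the 2-D toy data, and all function names below are assumptions made for illustration only. The mean is re-estimated with weights w and the covariance with weights w squared, so that a single distant point has bounded influence on the leading eigenvector.

```python
import math

def weighted_mean_cov(points, w_mean, w_cov):
    """Weighted mean and 2x2 covariance (sxx, sxy, syy) of 2-D points."""
    sw = sum(w_mean)
    mx = sum(w * x for w, (x, _) in zip(w_mean, points)) / sw
    my = sum(w * y for w, (_, y) in zip(w_mean, points)) / sw
    sc = sum(w_cov)
    sxx = sum(w * (x - mx) ** 2 for w, (x, _) in zip(w_cov, points)) / sc
    syy = sum(w * (y - my) ** 2 for w, (_, y) in zip(w_cov, points)) / sc
    sxy = sum(w * (x - mx) * (y - my) for w, (x, y) in zip(w_cov, points)) / sc
    return (mx, my), (sxx, sxy, syy)

def robust_mean_cov(points, iters=30, c=2.0):
    """Iteratively reweighted (M-estimation style) mean and covariance.

    Assumption: Huber-type weight w = 1 if d <= c else c / d, where d is the
    Mahalanobis distance under the current estimate. iters=0 gives plain PCA.
    """
    w = [1.0] * len(points)
    for _ in range(iters + 1):
        (mx, my), (sxx, sxy, syy) = weighted_mean_cov(
            points, w, [wi * wi for wi in w])
        det = sxx * syy - sxy * sxy
        new_w = []
        for x, y in points:
            dx, dy = x - mx, y - my
            d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
            d = math.sqrt(max(d2, 0.0))
            new_w.append(1.0 if d <= c else c / d)
        w = new_w
    return (mx, my), (sxx, sxy, syy)

def leading_axis_deg(cov):
    """Angle (degrees) of the leading eigenvector of a symmetric 2x2 matrix."""
    sxx, sxy, syy = cov
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))

# Toy data: inliers close to the line y = x, plus one gross outlier.
inliers = [(float(t), t + 0.2 * (-1) ** t) for t in range(-5, 6)]
data = inliers + [(0.0, 30.0)]

_, plain_cov = robust_mean_cov(data, iters=0)
_, robust_cov = robust_mean_cov(data)
plain_deg = leading_axis_deg(plain_cov)
robust_deg = leading_axis_deg(robust_cov)
print(f"plain PCA axis: {plain_deg:.1f} deg, robust PCA axis: {robust_deg:.1f} deg")
```

In this toy case the inliers lie along the 45-degree direction; the plain estimate is pulled toward the outlier while the reweighted estimate stays near the inlier direction. Projecting features onto the leading robust eigenvectors and then fitting a diagonal-covariance GMM would follow, but that part is not sketched here.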
A Variable Parameter Model based on SSMS for an On-line Speech and Character Combined Recognition System
The Journal of the Acoustical Society of Korea, volume 22, issue 7, 2003, Pages 528~538
An SCCRS (Speech and Character Combined Recognition System) has been developed for mobile devices such as PDAs (personal digital assistants). In the SCCRS, feature extraction is carried out separately for speech and for handwritten characters, but recognition is performed in a common engine. The recognition engine is essentially a CHMM (Continuous Hidden Markov Model) with a variable parameter topology, which minimizes the number of model parameters and reduces recognition time. To generate a context-independent variable parameter model, we propose SSMS (Successive State and Mixture Splitting), which yields appropriate numbers of mixtures and states through splitting in the mixture domain and in the time domain. The recognition results show that the proposed SSMS method reduces the total number of GOPDDs (Gaussian Output Probability Density Distributions) by up to 40.0% compared with the conventional fixed-parameter model, at the same recognition performance in the speech recognition system.
Improvement of MLLR Speaker Adaptation Algorithm to Reduce Over-adaptation Using ICA and PCA
The Journal of the Acoustical Society of Korea, volume 22, issue 7, 2003, Pages 539~544
This paper describes how to reduce the effect of the occupation threshold by controlling the transformation of the mixture components of HMM parameters in a hierarchical tree structure, so as to prevent over-adaptation. To reduce correlations between data elements and to remove elements with small variance, we employ PCA (principal component analysis) and ICA (independent component analysis), which give as good a representation as possible and lessen the effect of over-adaptation. When the occupation threshold is lowered and the number of transformation functions is increased, the ordinary MLLR adaptation algorithm yields a lower recognition rate than the SI models, whereas the proposed MLLR adaptation algorithm improves the word recognition rate by more than 2% over the SI models.
Improving Wave Propagation Performance of an Ultrasonic Waveguide for Heat Isolation
The Journal of the Acoustical Society of Korea, volume 22, issue 7, 2003, Pages 545~553
This paper is concerned with using a waveguide to protect the piezoelectric transducers of an ultrasonic flowmeter from the high temperature of hot fluid in a pipe, and with improving the propagation of ultrasonic longitudinal vibration in the waveguide. The waveguide material has been chosen for efficient insulation against heat transferred along the waveguide, and the minimum waveguide length needed to protect the piezoelectric transducer has been estimated. The forced response of the longitudinal vibration in a uniform circular rod has been obtained, and the length of the waveguide has been selected for maximum amplitude. The longitudinal vibration response of a conically tapered rod excited at a natural frequency has been obtained to confirm that the wave motion is amplified as the cross-sectional size of the waveguide decreases along the axial direction. The fact that the dispersion of a pulse wave in a waveguide is reduced as the cross-sectional radius decreases has been examined theoretically and confirmed experimentally using a single-rod waveguide. An evaluation of the wave propagation performance shows that a bundle-type waveguide is a practical design.
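As a rough illustration of choosing a rod length for maximum longitudinal response, the sketch below uses the textbook thin-rod result for a lossless uniform rod with free-free ends, whose natural frequencies are f_n = n c / (2 L) with bar speed c = sqrt(E / rho). The boundary conditions, the material constants, and the 1 MHz drive frequency are assumptions for illustration; the abstract does not give the paper's actual model or values.

```python
import math

def bar_speed(youngs_modulus_pa, density_kg_m3):
    """Longitudinal wave speed in a thin rod, c = sqrt(E / rho)."""
    return math.sqrt(youngs_modulus_pa / density_kg_m3)

def natural_frequency(n, length_m, speed_m_s):
    """n-th longitudinal natural frequency of a free-free uniform rod."""
    return n * speed_m_s / (2.0 * length_m)

def resonant_length(n, drive_hz, speed_m_s):
    """Rod length whose n-th natural frequency equals the drive frequency."""
    return n * speed_m_s / (2.0 * drive_hz)

# Illustrative values only (roughly steel), not taken from the paper.
c = bar_speed(200e9, 7850.0)
drive = 1.0e6  # assumed 1 MHz ultrasonic excitation
candidates = [resonant_length(n, drive, c) for n in range(1, 4)]
print([round(length * 1e3, 3) for length in candidates], "mm")
```

Any length in `candidates` that also exceeds the minimum length required for heat isolation would be a resonant choice under these simplifying assumptions.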
A Study on Estimation of the Sound Speed of Seabed from the Frequency-dependent Interference Pattern of Broadband Signal
The Journal of the Acoustical Society of Korea, volume 22, issue 7, 2003, Pages 554~561
Results of numerical simulations and experimental data analysis are presented for identifying mode cutoff frequencies and estimating the seabed sound speed from the spectrum of an acoustic signal received at a fixed source-receiver range. Model simulations for a Pekeris waveguide show that the frequency-dependent propagation loss and interference pattern are closely related to the mode cutoff frequencies, and that these frequencies can be identified from changes in the interference pattern. The concept examined in the numerical simulations is applied to signals acquired in a sea test. The cutoff frequency and the seabed sound speed are estimated from the interference pattern of the measured signal. The propagation loss predicted using the estimated seabed sound speed as a model input parameter is similar to the propagation loss derived from the measured data.
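The link between cutoff frequency and seabed sound speed can be made concrete with the standard Pekeris-waveguide relation f_n = (2n - 1) c1 / (4 h sqrt(1 - (c1/c2)^2)), where c1 is the water sound speed, c2 the seabed sound speed, and h the water depth. The sketch below inverts that relation for c2. The function names and the numerical values are assumptions for illustration; the abstract does not state how the authors performed the inversion.

```python
import math

def cutoff_frequency(n, c_water, c_seabed, depth):
    """Cutoff frequency (Hz) of mode n in an ideal Pekeris waveguide."""
    s = math.sqrt(1.0 - (c_water / c_seabed) ** 2)
    return (2 * n - 1) * c_water / (4.0 * depth * s)

def seabed_speed_from_cutoff(n, f_cutoff, c_water, depth):
    """Invert the cutoff relation for the seabed sound speed (m/s)."""
    s = (2 * n - 1) * c_water / (4.0 * depth * f_cutoff)
    if s >= 1.0:
        raise ValueError("cutoff frequency too low for this mode and depth")
    return c_water / math.sqrt(1.0 - s * s)

# Illustrative values only: 100 m of water at 1500 m/s over a 1700 m/s seabed.
f1 = cutoff_frequency(1, 1500.0, 1700.0, 100.0)
c2 = seabed_speed_from_cutoff(1, f1, 1500.0, 100.0)
print(f"mode 1 cutoff: {f1:.2f} Hz, recovered seabed speed: {c2:.1f} m/s")
```

In practice the cutoff frequency would be read from changes in the measured interference pattern, and its uncertainty carries over directly to the estimated seabed speed.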
Matched-target Model Inversion for the Position Estimation of Moving Targets
The Journal of the Acoustical Society of Korea, volume 22, issue 7, 2003, Pages 562~572
A matched-target model inversion method was developed for passive sonar to estimate the position of moving targets. Based on the well-known matched-field processing used in underwater acoustics, the method finds the target position by matching the measured target directions and frequencies with the corresponding values of the proposed target model. For efficient and accurate estimation, the parameter search uses a hybrid optimization method, which starts with a global optimization such as a genetic algorithm or simulated annealing and then applies a local optimization with a simple downhill algorithm. The proposed method was tested using simulations of three different moving scenarios. The simulation results show that the method converges robustly, even with measurement errors exceeding 5 times the standard deviation of their Gaussian distribution, and that its calculation time is practical.
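The global-then-local search strategy can be sketched on a toy problem. Everything below is an assumption for illustration: a stationary target, three hypothetical sensors, noiseless bearings only (no frequencies), a basic simulated annealing loop as the global stage, and a compass search standing in for the "simple downhill algorithm". It is not the authors' target model.

```python
import math
import random

SENSORS = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]  # hypothetical sensor positions
TRUE_POS = (3.0, 4.0)                             # hypothetical target position

def bearings(pos):
    """Bearing (radians) from each sensor to a candidate position."""
    return [math.atan2(pos[1] - sy, pos[0] - sx) for sx, sy in SENSORS]

MEASURED = bearings(TRUE_POS)

def cost(pos):
    """Sum of squared wrapped bearing mismatches."""
    total = 0.0
    for m, p in zip(MEASURED, bearings(pos)):
        d = math.atan2(math.sin(m - p), math.cos(m - p))
        total += d * d
    return total

def simulated_annealing(start, rng, steps=2000, t0=1.0, step=2.0):
    """Global stage: random moves, accepting some uphill moves while hot."""
    cur, cur_c = start, cost(start)
    best, best_c = cur, cur_c
    for k in range(steps):
        temp = t0 * (1.0 - k / steps) + 1e-9
        cand = (cur[0] + rng.uniform(-step, step),
                cur[1] + rng.uniform(-step, step))
        cand_c = cost(cand)
        if cand_c < cur_c or rng.random() < math.exp((cur_c - cand_c) / temp):
            cur, cur_c = cand, cand_c
            if cur_c < best_c:
                best, best_c = cur, cur_c
    return best, best_c

def downhill(start, step=0.5, tol=1e-8):
    """Local stage: compass search that only accepts improving moves."""
    cur, cur_c = start, cost(start)
    while step > tol:
        improved = False
        for dx, dy in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand = (cur[0] + dx, cur[1] + dy)
            cand_c = cost(cand)
            if cand_c < cur_c:
                cur, cur_c, improved = cand, cand_c, True
        if not improved:
            step /= 2.0
    return cur, cur_c

start = (9.0, 9.0)
sa_pos, sa_cost = simulated_annealing(start, random.Random(0))
final_pos, final_cost = downhill(sa_pos)
print("after annealing:", sa_pos, sa_cost)
print("after downhill: ", final_pos, final_cost)
```

The global stage is meant to land in the right basin of the misfit surface, and the local stage then refines the estimate cheaply; by construction the local stage never returns a worse cost than the point it started from.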
Speech Enhancement Using Acoustic Channel Estimation
The Journal of the Acoustical Society of Korea, volume 22, issue 7, 2003, Pages 573~578
Recently, speaker localization techniques have attracted growing interest in teleconference systems. Such techniques can enhance speech quality by using a microphone array to receive only the speaker's signal. However, because they rely on a signal estimated in advance, they do not match the real acoustic environment and therefore perform poorly. This paper proposes an adaptive matched filter microphone array that estimates the acoustic room environment from the received signal, and examines its effectiveness through simulations.
Prediction Model of Propagation Path Loss of the Free Space in the Sea
The Journal of the Acoustical Society of Korea, volume 22, issue 7, 2003, Pages 579~584
All of the propagation path loss prediction models presented to date are only for ground living space. In reality, sea surface free space differs from ground living space in its physical hierarchical structure. If a propagation path loss prediction model for ground living space is applied to sea surface free space, the predicted propagation path loss will be smaller than the actual value, while the maximum straight-line service distance will become shorter. This paper therefore proposes and simulates a prediction model that predicts propagation path loss more accurately in sea surface free space, focusing on the CDMA mobile communication frequency band. The simulation results are then compared with actual survey measurements to verify the model's practicality.
Sound Detection Characteristics Using Fabry-Perot Fiber Optic Sensor which Simply Supported in Structure
The Journal of the Acoustical Society of Korea, volume 22, issue 7, 2003, Pages 585~591
In this paper, a fiber optic sensor based on a Fabry-Perot interferometer, which has the advantages of small size and light weight, was used. The sensor head is 1 cm long, the total length of the fiber is 9.5 chi, and the sensor is simply supported at both ends. To analyze the acoustic characteristics, a non-directional speaker was used as the sound source. Sound was applied in the lateral direction, and the signals detected by the fiber optic sensor and by a microphone were compared. Below 1 kHz the fiber optic sensor is more sensitive than the microphone, but at 2 kHz it is less sensitive. This characteristic depends on how the fiber optic sensor is supported. It was confirmed that the Fabry-Perot interferometric sensor detects acoustic signals effectively. This kind of sensor can be applied to structural health monitoring of intelligent structures.
The Design of Object-based 3D Audio Broadcasting System
The Journal of the Acoustical Society of Korea, volume 22, issue 7, 2003, Pages 592~602
This paper describes the basic structure of a novel object-based 3D audio broadcasting system. To overcome the limitations of current uni-directional audio broadcasting services, the object-based 3D audio broadcasting system is designed to let users interact with important audio objects and to provide realistic 3D effects based on the MPEG-4 standard. The system is composed of 6 sub-modules. The audio input module collects the background sound object, which is recorded by a 3D microphone, and audio objects, which are recorded by a monaural microphone or extracted by a source separation method. The sound scene authoring module edits the 3D information of audio objects, such as acoustical characteristics, location, and directivity. It also defines the final sound scene with a 3D background sound, which the producer intends to deliver to a receiving terminal. The encoder module encodes scene descriptors and audio objects for effective transmission. The decoder module extracts scene descriptors and audio objects by decoding the received bitstreams. The sound scene composition module reconstructs the 3D sound scene from the scene descriptors and audio objects. The 3D sound renderer module maximizes the 3D sound effects by adapting the final sound to the listener's acoustical environment. It also receives the user's controls on audio objects and sends them to the scene composition module to change the sound scene.
Feature Compensation Method Based on Parallel Combined Mixture Model
The Journal of the Acoustical Society of Korea, volume 22, issue 7, 2003, Pages 603~611
This paper proposes an effective feature compensation scheme based on a speech model for robust speech recognition. Conventional model-based methods require off-line training with a noisy speech database and are not suitable for online adaptation. In the proposed scheme, the need for off-line training with a noisy speech database is relaxed by employing the parallel model combination technique to estimate the correction factors. Applying the model combination process to the mixture model alone, rather than to the entire HMM, makes online model combination possible. Exploiting the availability of a noise model from off-line sources, we accomplish online adaptation via MAP (Maximum A Posteriori) estimation. In addition, an online channel estimation procedure is derived within the proposed framework. For a more efficient implementation, we propose a selective model combination that reduces the computational complexity. Representative experimental results indicate that the proposed algorithm is effective in realizing robust speech recognition under the combined adverse conditions of additive background noise and channel distortion.
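To give a feel for how model combination can yield correction factors without noisy training data, the sketch below applies the common log-add approximation of parallel model combination to mixture means in the log-spectral domain: the noisy-speech mean is log(exp(mu_speech) + exp(mu_noise)), and the correction factor is its offset from the clean mean. This simplification ignores variances and channel effects, and the function names and numbers are assumptions; the abstract does not specify the exact combination the authors use.

```python
import math

def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))

def combine_means(speech_means, noise_mean):
    """Log-add approximation: noisy-speech mean for each mixture component."""
    return [log_add(m, noise_mean) for m in speech_means]

def correction_factors(speech_means, noise_mean):
    """Offset between the combined (noisy) mean and the clean mean."""
    noisy = combine_means(speech_means, noise_mean)
    return [ny - m for ny, m in zip(noisy, speech_means)]

# Illustrative log-spectral means for one frequency bin (not from the paper).
clean_means = [10.0, 4.0, 0.0]
noise_mean = 0.0
factors = correction_factors(clean_means, noise_mean)
print([round(f, 4) for f in factors])
```

Components well above the noise level receive almost no correction, while components near or below it receive a large one. A compensated feature could then be formed by subtracting a posterior-weighted sum of these factors from the noisy observation, which is a typical use of such factors rather than a detail confirmed by the abstract.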
Additive Data Insertion into MP3 Bitstream Using linbits Characteristics
The Journal of the Acoustical Society of Korea, volume 22, issue 7, 2003, Pages 612~621
As the use of MP3 audio compression has increased, the demand for inserting additive data, such as copyright information or information about the music content, has grown, and related research has been actively pursued. When additive data is inserted into an MP3 bitstream, modifying the bitstream structure should cause neither distortion of the music quality nor a change in file size. In our study, to satisfy these conditions, we inserted additive data into the bitstream by modifying some bits of the linbits among the quantized integer coefficients with large values, taking into account the characteristics of the linbits and their distributions. A subjective sound quality test using MOS confirmed that a quality of MOS 4.6 can be achieved at a data insertion rate of 60 bytes/sec. With the proposed method, additive data such as copyright information or information about the media itself can be inserted effectively, so that various applications such as audio database management can be realized.
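The idea of hiding data in linbits can be sketched on a plain list of quantized coefficients. In MP3 Huffman coding, a magnitude of 15 or more is coded as an escape value followed by a fixed-width linbits field holding (magnitude - 15), so changing the least significant bit of that field alters neither the Huffman codeword nor the field width. The sketch below is a simplified stand-in that embeds one bit per escaped coefficient; it does not parse real MP3 frames, and the authors' selection rule based on linbits distributions is not reproduced.

```python
ESCAPE = 15  # magnitudes >= 15 use the escape code plus a linbits field

def embed(coeffs, bits):
    """Write one payload bit into the LSB of each linbits value (in order).

    Payload bits beyond the number of escaped coefficients are not embedded.
    """
    out = list(coeffs)
    payload = iter(bits)
    for i, value in enumerate(out):
        mag = abs(value)
        if mag < ESCAPE:
            continue  # no linbits field for this coefficient
        bit = next(payload, None)
        if bit is None:
            break  # payload exhausted
        lin = ((mag - ESCAPE) & ~1) | bit
        out[i] = (lin + ESCAPE) if value > 0 else -(lin + ESCAPE)
    return out

def extract(coeffs, n_bits):
    """Read back the first n_bits payload bits from the linbits LSBs."""
    bits = []
    for value in coeffs:
        mag = abs(value)
        if mag >= ESCAPE and len(bits) < n_bits:
            bits.append((mag - ESCAPE) & 1)
    return bits

# Hypothetical quantized coefficients and payload.
coeffs = [3, 20, -17, 15, 0, 40, -15]
payload = [1, 0, 1, 1, 0]
marked = embed(coeffs, payload)
recovered = extract(marked, len(payload))
print(marked, recovered)
```

Each marked coefficient differs from the original by at most one quantization step and stays at or above the escape threshold, which is why the set of carrier coefficients is the same for the embedder and the extractor.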