REFERENCE LINKING PLATFORM OF KOREA S&T JOURNALS
The Journal of the Acoustical Society of Korea
Journal Basic Information
The Acoustical Society of Korea
Volume & Issues
Volume 20, Issue 8 - Nov 2001
Volume 20, Issue 7 - Oct 2001
Volume 20, Issue 6 - Aug 2001
Volume 20, Issue 5 - Jul 2001
Volume 20, Issue 4 - May 2001
Volume 20, Issue 3 - Apr 2001
Volume 20, Issue 2 - Feb 2001
Volume 20, Issue 1 - Jan 2001
Volume 20, Issue 4E - 00 2001
Volume 20, Issue 3E - 00 2001
Volume 20, Issue 2E - 00 2001
Volume 20, Issue 1E - 00 2001
Design of Random Number Generator for Simulation of Speech-Waveform Coders
The Journal of the Acoustical Society of Korea, volume 20, issue 2, 2001, Pages 3~9
In this paper, a random number generator for the simulation of speech-waveform coders is designed. A random number generator having a desired probability density function and a desired power spectral density is discussed, and experimental results are presented. The technique is based on the Sondhi algorithm, which consists of a linear filter and a memoryless nonlinearity. Several methods of obtaining memoryless nonlinearities for some typical continuous distributions are discussed. The Sondhi algorithm is analyzed in the time domain using the diagonal expansion of the bivariate Gaussian probability density function. It is shown that the Sondhi algorithm gives satisfactory results when the memoryless nonlinearity is given in an antisymmetric form, as in the uniform, Cauchy, binary, and gamma distributions. It is also shown that the Sondhi algorithm does not perform well when the corresponding memoryless nonlinearity cannot be obtained analytically, as in the Student-t and F distributions, or when the memoryless nonlinearity cannot be expressed in an antisymmetric form, as in the chi-squared and lognormal distributions.
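The structure described above (a linear filter followed by a memoryless nonlinearity) can be illustrated with a minimal sketch. It is not the paper's implementation: the first-order recursive filter, the coefficient `rho`, and the function names are assumptions made for illustration, and the sketch omits the step in which the filter is chosen so that the output, after the nonlinearity, has a prescribed power spectral density. The nonlinearity shown maps unit-variance Gaussian samples to a uniform distribution and is antisymmetric about zero.

```python
import math
import random

def gaussian_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def correlated_gaussian(n, rho, rng):
    """Linear filter stage: first-order recursion driven by white Gaussian noise.

    The innovation is scaled so the output keeps unit variance.
    (Hypothetical filter choice; the abstract does not specify one.)
    """
    scale = math.sqrt(1.0 - rho * rho)
    x = rng.gauss(0.0, 1.0)
    out = []
    for _ in range(n):
        out.append(x)
        x = rho * x + scale * rng.gauss(0.0, 1.0)
    return out

def to_uniform(samples):
    """Memoryless nonlinearity: N(0, 1) samples -> uniform on (-0.5, 0.5)."""
    return [gaussian_cdf(s) - 0.5 for s in samples]

rng = random.Random(0)
gauss = correlated_gaussian(20000, rho=0.9, rng=rng)
uniform = to_uniform(gauss)
```

The output samples remain correlated in time because the nonlinearity is applied sample by sample, but their correlation differs from that of the Gaussian input, which is why the full method has to account for the nonlinearity when designing the filter.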
A Comparison of Speech/Music Discrimination Features for Audio Indexing
The Journal of the Acoustical Society of Korea, volume 20, issue 2, 2001, Pages 10~15
In this paper, we compare combinations of features for speech/music discrimination, that is, the classification of audio signals as speech or music. Audio signals are classified into three classes (speech, music, speech and music) and into two classes (speech, music). Experiments were carried out on three types of features (Mel-cepstrum, energy, and zero-crossings) to find the best combination of features for speech/music discrimination. We use a Gaussian Mixture Model (GMM) as the discrimination algorithm and combine the different features into a single vector prior to modeling the data with a GMM. In the three-class case, the best result is achieved using Mel-cepstrum, energy, and zero-crossings in a single feature vector (speech: 95.1%, music: 61.9%, speech and music: 55.5%). In the two-class case, the best result is achieved using either Mel-cepstrum and energy, or Mel-cepstrum, energy, and zero-crossings, in a single feature vector (speech: 98.9%, music: 100%).
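As a rough illustration of combining features into a single vector per frame, the sketch below computes two of the three features named above, short-time energy and zero-crossing rate. The frame length, hop size, test tones, and function names are assumptions for illustration only; the Mel-cepstrum computation and the GMM classifier are not shown.

```python
import math

def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def frame_features(signal, frame_len, hop):
    """One [energy, zero-crossing rate] vector per frame."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        feats.append([frame_energy(frame), zero_crossing_rate(frame)])
    return feats

def tone(freq, rate, n, phase=0.3):
    """Synthetic test tone (hypothetical input, not data from the paper)."""
    return [math.sin(2 * math.pi * freq * k / rate + phase) for k in range(n)]

rate = 8000
low_feats = frame_features(tone(100, rate, 800), frame_len=200, hop=100)
high_feats = frame_features(tone(1500, rate, 800), frame_len=200, hop=100)
```

Both tones have the same energy, so only the zero-crossing rate separates them here, which hints at why combining complementary features in one vector can help a classifier.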
A Study on the Use of Speech Recognition Technology for Content-based Video Indexing and Retrieval
The Journal of the Acoustical Society of Korea, volume 20, issue 2, 2001, Pages 16~20
An important aspect of video program indexing and retrieval is the ability to segment a video program into meaningful segments, in other words, content-based video program segmentation. In this paper, a new approach using speech recognition technology is proposed for content-based video program segmentation. This approach uses a speech recognition technique to synchronize closed captions with the speech signal. Experimental results demonstrate that the proposed scheme is very promising for content-based video program segmentation.
A Study on Noisy Speech Recognition Using a Bayesian Adaptation Method
The Journal of the Acoustical Society of Korea, volume 20, issue 2, 2001, Pages 21~26
An expectation-maximization (EM) based Bayesian adaptation method for the noise mean is proposed for noise-robust speech recognition. In the algorithm, the on-line test utterances are used for unsupervised Bayesian adaptation, and the prior distribution of the noise mean is estimated from the off-line training data. For noisy speech modeling, the parallel model combination (PMC) method is employed. The proposed method is shown to be effective compared with the conventional PMC method in speech recognition experiments under a car-noise condition.
Performance Comparison of Out-Of-Vocabulary Word Rejection Algorithms in Variable Vocabulary Word Recognition
The Journal of the Acoustical Society of Korea, volume 20, issue 2, 2001, Pages 27~34
Utterance verification is used in variable vocabulary word recognition to reject words that are not in the vocabulary or that are not correctly recognized. Utterance verification is an important technology for designing a user-friendly speech recognition system. We propose a new utterance verification algorithm for a no-training utterance verification system based on the minimum verification error. First, using the PBW (Phonetically Balanced Words) DB (445 words), we create no-training anti-phoneme models that include many PLUs (Phoneme-Like Units), so that the anti-phoneme models have the minimum verification error. Then, for OOV (Out-Of-Vocabulary) rejection, the phoneme-based confidence measure, which uses the likelihoods of the phoneme model (null hypothesis) and the anti-phoneme model (alternative hypothesis), is normalized by the null hypothesis, so that it tends to be more robust for OOV rejection. Finally, the word-based confidence measure, which is built on the phoneme-based confidence measure, is shown to provide improved detection of near-misses in speech recognition as well as better discrimination between in-vocabulary words and OOVs. Using the proposed anti-model and confidence measure, we achieve a significant performance improvement: CA (Correctly Accept for In-Vocabulary) is about 89% and CR (Correctly Reject for OOV) is about 90%, an improvement of about 15-21% in ERR (Error Reduction Rate).
Automatic Generation of Pronunciation Variants for Korean Continuous Speech Recognition
The Journal of the Acoustical Society of Korea, volume 20, issue 2, 2001, Pages 35~43
Many speech recognition systems use a pronunciation lexicon with possibly multiple phonetic transcriptions for each word. The pronunciation lexicon is often created manually. This process requires a lot of time and effort, and furthermore, it is very difficult to maintain the consistency of the lexicon. To handle these problems, we present a model based on morphophonological analysis for automatically generating Korean pronunciation variants. By analyzing phonological variations frequently found in spoken Korean, we have derived about 700 phonemic contexts that trigger the multilevel application of the corresponding phonological process, which consists of phonemic and allophonic rules. In generating pronunciation variants, morphological analysis is performed first to handle variations of phonological words. According to the morphological category, a set of tables reflecting the phonemic context is looked up to generate pronunciation variants. Our experiments show that the proposed model produces mostly correct pronunciation variants of phonological words. We then evaluated the usefulness of the pronunciation lexicon and the training phonetic transcriptions produced by the proposed system.
An Analysis of Pulse Length Effect on Underwater Simulated Target Strength Estimated Model
The Journal of the Acoustical Society of Korea, volume 20, issue 2, 2001, Pages 44~51
This paper presents a practical echo signal synthesis model for predicting the target strength and signal shape of a submarine, intended as a valuable tool for active sonar engineers. It is based on the UTAHID (Underwater TArget by Highlight Distribution) model, which relocates highlight points along the external hull according to aspect angle and synthesizes the echo signal by a modified grouping of highlights into an internal scatter cloud. The proposed model is used to analyze target strength characteristics for various incident pulse lengths, as well as the synthesized signal signature, target time spreading loss, echo elongation effect, and so on. It can thus be used efficiently in various real systems related to underwater target echo signal synthesis, such as active sonar, acoustic countermeasure, and surveillance systems.
Performance of a Passive Ranging by Using Dual Focused Beamformers
The Journal of the Acoustical Society of Korea, volume 20, issue 2, 2001, Pages 52~57
Passive ranging techniques using a focused beamformer have been studied for underwater applications. The passive ranging method using a focused beamformer is well known to perform well; in particular, passive ranging sonar is known to perform well at low signal-to-noise ratios. However, its performance is degraded in multi-source environments. In this paper, we propose a technique that uses dual focused beamformers to estimate the range. In addition, when the sampling frequency is low, it is very difficult to steer the focused beam in the desired direction, and the resulting distorted beam pattern degrades performance. We therefore also study the effect of the sampling rate on passive ranging using a focused beamformer. The performance of the proposed method is verified via computer simulation.
Constraints for the Design of Room Reverberation Filter by Using 5-DOF Reverberation Model
The Journal of the Acoustical Society of Korea, volume 20, issue 2, 2001, Pages 58~65
Recently, a 5-degrees-of-freedom (DOF) reverberation model was proposed as a method of representing the subjective perception of reverberation by objective measures. This model approximates the sound energy decay curve by five objective measures that have been widely used in concert hall acoustics. However, it is noteworthy that an infinite number of impulse responses can correspond to a selected 5-DOF reverberation model, and some of the resulting filters may produce very unnatural and unrealistic sound. In this paper, the limitations of the 5-DOF reverberation model when it is used as a filter design criterion are investigated. For a given 5-DOF reverberation model, additional constraints for obtaining natural reverberation are suggested. These are based on listening tests with several quite different source sounds.
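For readers unfamiliar with the sound energy decay curve that the model approximates, the sketch below computes it from an impulse response by Schroeder backward integration and reads off the time at which it has decayed by 60 dB. The synthetic impulse response, the sample rate, and the function names are assumptions for illustration; the abstract does not list the five measures of the 5-DOF model, so none of them are reproduced here.

```python
import math

def energy_decay_curve_db(impulse_response):
    """Schroeder backward integration, normalized to 0 dB at time zero."""
    tail = 0.0
    edc = [0.0] * len(impulse_response)
    for i in range(len(impulse_response) - 1, -1, -1):
        tail += impulse_response[i] ** 2
        edc[i] = tail
    total = edc[0]
    return [10.0 * math.log10(e / total) for e in edc]

def time_to_decay(edc_db, sample_rate, level_db=-60.0):
    """First time (in seconds) at which the curve reaches level_db, or None."""
    for i, level in enumerate(edc_db):
        if level <= level_db:
            return i / sample_rate
    return None

# Hypothetical impulse response: alternating-sign samples with an
# exponential envelope whose energy falls by 60 dB in 0.5 s.
sample_rate = 8000
decay = math.log(1e6) / (2 * 0.5 * sample_rate)
impulse_response = [((-1) ** n) * math.exp(-decay * n) for n in range(sample_rate)]

edc_db = energy_decay_curve_db(impulse_response)
rt60 = time_to_decay(edc_db, sample_rate)
```

Replacing the alternating signs with any other sign pattern leaves the decay curve unchanged, a small instance of the point made above that many different impulse responses share the same energy decay description.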
New Echo Embedding Technique for Robust Audio Watermarking
The Journal of the Acoustical Society of Korea, volume 20, issue 2, 2001, Pages 66~76
Conventional echo watermarking techniques often exhibit inherent trade-offs between imperceptibility and robustness. In this paper, a new echo embedding technique is proposed. The proposed method makes it possible to embed high-energy echoes without deteriorating the host audio quality, so that the watermark is robust to common signal processing modifications and resistant to tampering. This is possible because the echo kernels are designed on the basis of psychoacoustic analyses. In addition, we propose some novel techniques to improve robustness against signal processing attacks. Subjective and objective evaluations confirmed that the proposed method could improve robustness without perceptible distortion.
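The basic idea behind conventional echo watermarking can be sketched as follows: a bit is encoded by adding a delayed, attenuated copy of the host signal, with the delay chosen by the bit value. This is a simplified illustration and not the proposed method: the single-echo kernel, the delays (50 and 80 samples), the gain, and the autocorrelation-based detector are all assumptions, and the psychoacoustically designed kernels of the paper are not reproduced.

```python
import random

def embed_echo(host, delay, gain):
    """Add one delayed, scaled copy of the host signal to itself."""
    out = list(host)
    for n in range(delay, len(host)):
        out[n] += gain * host[n - delay]
    return out

def autocorrelation(signal, lag):
    """Unnormalized autocorrelation of the signal at a single lag."""
    return sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))

def embed_bit(host, bit, delays=(50, 80), gain=0.5):
    """Encode one bit by choosing which of two echo delays to use."""
    return embed_echo(host, delays[bit], gain)

def detect_bit(marked, delays=(50, 80)):
    """Decide the bit from which candidate delay shows more self-similarity."""
    scores = [autocorrelation(marked, d) for d in delays]
    return 0 if scores[0] >= scores[1] else 1

# Hypothetical host: white Gaussian noise standing in for an audio segment.
rng = random.Random(1)
host = [rng.gauss(0.0, 1.0) for _ in range(4000)]
recovered = [detect_bit(embed_bit(host, b)) for b in (0, 1)]
```

The gain parameter makes the trade-off mentioned above concrete: a larger echo is easier to detect after processing but more likely to be audible.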
Time-Scale Modification of Polyphonic Audio Signals Using Sinusoidal Modeling
The Journal of the Acoustical Society of Korea, volume 20, issue 2, 2001, Pages 77~85
This paper proposes a method of time-scale modification of polyphonic audio signals based on a sinusoidal model. The signals are modeled with a sinusoidal component and a noise component. A multiresolution filter bank is designed that splits the input signal into six octave-spaced subbands without aliasing, and sinusoidal modeling is applied to each subband signal. To alleviate the smearing of transients in time-scale modification, a dynamic segmentation method is applied to the subbands; it determines the analysis-synthesis frame size adaptively to fit the time-frequency characteristics of each subband signal. To extract the sinusoidal components and calculate their parameters, the matching pursuit algorithm is applied to each analysis frame of the subband signal. In accordance with spectrum analysis, a psychoacoustic model implementing the effect of frequency masking is incorporated into matching pursuit to provide a reasonable stopping condition for the iteration and to reduce the number of sinusoids. The noise component, obtained by subtracting the signal synthesized from the sinusoidal components from the original signal, is modeled by a line-segment model of the short-time spectrum envelope. For various polyphonic audio signals, simulation results show that the suggested sinusoidal modeling can synthesize the original signal without loss of perceptual quality and, because it represents transients without any perceptual loss, can perform more robust, higher-quality time-scale modification for large scale factors.
Sound Source Detection Technique Considering the Effects of Source Bandwidth and Measurement Noise Correlation
The Journal of the Acoustical Society of Korea, volume 20, issue 2, 2001, Pages 86~92
Various array processing techniques for identifying the position or bearing of a noise source have been developed. Typical array processing techniques, which are based on the time delay between the signals received at two sensors, are classified into conventional beamforming, correlation function, and NAH (Near-Field Acoustic Holography) techniques, each of which has its own characteristics with respect to application field and signal processing method. In this study, the correlation function technique, which can be applied to broadband noise source detection, is adopted, and an effective detection technique is proposed that considers the effects of source bandwidth and measurement noise correlation. The validity of the proposed technique is evaluated using a 3-dimensional nonlinear array, which does not give 3-dimensional position or bearing ambiguity.
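The time-delay principle underlying the correlation function technique can be sketched with two sensors: the delay is taken as the lag that maximizes the cross-correlation of the two received signals. This is a minimal illustration under simplifying assumptions (a white broadband source, an integer-sample delay, and independent measurement noise at each sensor); it does not model the correlated measurement noise or the bandwidth effects that the proposed technique addresses, and all names and values are hypothetical.

```python
import random

def cross_correlation(a, b, lag):
    """Sum of a[n] * b[n + lag] over the overlapping samples."""
    start = max(0, -lag)
    stop = min(len(a), len(b) - lag)
    return sum(a[n] * b[n + lag] for n in range(start, stop))

def estimate_delay(a, b, max_lag):
    """Lag (in samples) at which b best matches a delayed copy of a."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: cross_correlation(a, b, lag))

# Hypothetical scenario: sensor B receives the source 12 samples after sensor A.
rng = random.Random(2)
source = [rng.gauss(0.0, 1.0) for _ in range(2000)]
true_delay = 12
sensor_a = [s + 0.3 * rng.gauss(0.0, 1.0) for s in source]
sensor_b = [0.0] * true_delay + [
    s + 0.3 * rng.gauss(0.0, 1.0) for s in source[:-true_delay]
]
estimated = estimate_delay(sensor_a, sensor_b, max_lag=40)
```

With several sensor pairs in a known geometry, such pairwise delays constrain the source position or bearing; a narrowband source would produce a periodic correlation function with ambiguous peaks, which is one reason source bandwidth matters.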