The Journal of the Acoustical Society of Korea
Journal Basic Information
The Acoustical Society of Korea
Volume & Issues
Volume 20, Issue 8 - Nov 2001
Volume 20, Issue 7 - Oct 2001
Volume 20, Issue 6 - Aug 2001
Volume 20, Issue 5 - Jul 2001
Volume 20, Issue 4 - May 2001
Volume 20, Issue 3 - Apr 2001
Volume 20, Issue 2 - Feb 2001
Volume 20, Issue 1 - Jan 2001
Volume 20, Issue 4E - 00 2001
Volume 20, Issue 3E - 00 2001
Volume 20, Issue 2E - 00 2001
Volume 20, Issue 1E - 00 2001
An Adaptive Digital Filter for Target Signal Enhancement in Active Sonar
The Journal of the Acoustical Society of Korea, volume 20, issue 3, 2001, Pages 3~7
In an active sonar system using a CW signal, when the noise, including reverberation, does not have white characteristics, the CFAR detector estimates too high a threshold. As a result, it cannot detect targets or resolve closely spaced multiple targets. To solve these problems, we propose an adaptive reverberation rejection filter. The proposed filter is composed of an adaptive filter and a fixed filter that uses its coefficients. To study the performance of the proposed adaptive reverberation rejection filter, various experiments were performed in moving active sonar environments. The results show that the proposed method performs better than previous methods.
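As a rough sketch of the adaptive-cancellation idea behind such a reverberation rejection filter, the following NLMS filter subtracts the component of the input that is predictable from a reverberation reference. This is a standard generic building block, not the paper's exact design; the filter length, step size, and signal names are all illustrative assumptions.

```python
import numpy as np

def nlms_cancel(reference, noisy, n_taps=16, mu=0.5, eps=1e-8):
    """Adaptive canceller: predicts the interference component of `noisy`
    from `reference` and subtracts it, sample by sample (NLMS update)."""
    w = np.zeros(n_taps)
    out = np.zeros(len(noisy))
    for n in range(n_taps - 1, len(noisy)):
        x = reference[n - n_taps + 1:n + 1][::-1]  # newest sample first
        e = noisy[n] - w @ x                       # enhanced output sample
        w += mu * e * x / (x @ x + eps)            # normalised LMS step
        out[n] = e
    return out, w

# Toy demo: weak tone (the "target") buried in coloured interference
rng = np.random.default_rng(0)
interference = np.convolve(rng.standard_normal(4000), np.ones(8) / 8, mode="same")
target = 0.1 * np.sin(2 * np.pi * 0.05 * np.arange(4000))
noisy = target + interference
enhanced, _ = nlms_cancel(interference, noisy)
```

After convergence, the output is dominated by the target component because only the interference is predictable from the reference.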
An Acoustic Noise Cancellation Using Subband Block Conjugate Gradient Algorithm
The Journal of the Acoustical Society of Korea, volume 20, issue 3, 2001, Pages 8~14
In this paper, we present a new cost function for a subband block adaptive algorithm and a block conjugate gradient algorithm for acoustic noise cancellation. For the cost function, we process the subband signals in data blocks for each subband and recombine them into a whole data block. The resulting cost function is quadratic in the adaptive filter coefficients, which guarantees the convergence of the suggested block conjugate gradient algorithm. The block conjugate gradient algorithm that minimizes the suggested cost function performs better than the full-band block conjugate gradient algorithm, and computer simulation results for noise cancellation show the efficiency of the suggested algorithm.
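The conjugate gradient minimisation of a quadratic block cost of this kind can be sketched as follows. This is a generic single-band least-squares version: the paper's subband decomposition and recombination are omitted, so treat it as an illustrative assumption rather than the authors' algorithm.

```python
import numpy as np

def block_conjugate_gradient(X, d, n_iter=None):
    """Conjugate-gradient minimisation of the quadratic block cost
    J(w) = ||d - X w||^2 via the normal equations R w = p,
    with R = X^T X and p = X^T d."""
    R, p = X.T @ X, X.T @ d
    w = np.zeros(X.shape[1])
    r = p - R @ w                      # residual of the normal equations
    s = r.copy()                       # first search direction
    for _ in range(n_iter or X.shape[1]):
        Rs = R @ s
        alpha = (r @ r) / (s @ Rs)     # optimal step along s
        w += alpha * s
        r_new = r - alpha * Rs
        beta = (r_new @ r_new) / (r @ r)
        s = r_new + beta * s           # next conjugate direction
        r = r_new
        if r @ r < 1e-20:
            break
    return w

# Toy demo: recover known filter coefficients from a data block
rng = np.random.default_rng(3)
X = rng.standard_normal((64, 8))       # block of input regressors
w_true = rng.standard_normal(8)
d = X @ w_true                         # desired block output
w = block_conjugate_gradient(X, d)
```

In exact arithmetic CG reaches the minimiser of a quadratic cost in at most as many iterations as there are coefficients, which is the convergence guarantee the quadratic form provides.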
Voice Conversion Using Linear Multivariate Regression Model and LP-PSOLA Synthesis Method
The Journal of the Acoustical Society of Korea, volume 20, issue 3, 2001, Pages 15~23
This paper presents a voice conversion technique that modifies the utterance of a source speaker so that it sounds as if it were spoken by a target speaker. Feature parameter conversion methods that transform the vocal tract and prosodic characteristics between the source and target speakers are described. The transformation of vocal tract characteristics is achieved by modifying the LPC cepstral coefficients using Linear Multivariate Regression (LMR). Prosodic transformation is done by changing the average pitch period between speakers, and it is applied to the residual signal using the LP-PSOLA scheme. Experimental results show that speech transformed by LMR and the LP-PSOLA synthesis method captures many of the characteristics of the target speaker.
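The LMR mapping of cepstral vectors can be sketched as an affine least-squares fit over time-aligned source/target frames. The frame dimensionality and data here are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def fit_lmr(src, tgt):
    """Least-squares fit of a linear multivariate regression y ~= A x + b
    mapping source-speaker cepstral vectors to target-speaker ones."""
    X = np.hstack([src, np.ones((len(src), 1))])   # append bias column
    W, *_ = np.linalg.lstsq(X, tgt, rcond=None)    # (dim+1, dim) weights
    return W

def apply_lmr(W, src):
    """Convert source cepstral vectors with the fitted regression."""
    X = np.hstack([src, np.ones((len(src), 1))])
    return X @ W

# Toy demo: recover a known affine mapping from paired frames
rng = np.random.default_rng(1)
A = rng.standard_normal((12, 12)) * 0.3
b = rng.standard_normal(12)
src = rng.standard_normal((200, 12))               # "source" cepstra
tgt = src @ A.T + b                                 # "target" cepstra
W = fit_lmr(src, tgt)
converted = apply_lmr(W, src)
```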
Corpus-based Korean Text-to-speech Conversion System
The Journal of the Acoustical Society of Korea, volume 20, issue 3, 2001, Pages 24~33
This paper describes a baseline implementation of a corpus-based Korean TTS system. Conventional TTS systems using small-sized speech databases still generate machine-like synthetic speech. To overcome this problem, we introduce a corpus-based TTS system that can generate natural synthetic speech without prosodic modifications. The corpus should contain the natural prosody of the source speech and multiple instances of each synthesis unit. To make phone-level synthesis units, we train a speech recognizer on the target speech and then perform automatic phoneme segmentation. We also detect the fine pitch period using laryngograph signals, which is used for prosodic feature extraction. For break strength allocation, four levels of break indices are defined by pause length and attached to phones to reflect prosodic variations at phrase boundaries. To predict break strength from text, we use the statistical information of POS (part-of-speech) sequences. The best triphone sequences are selected by a Viterbi search that minimizes the accumulated Euclidean distance of the concatenation distortion. To obtain synthesis quality high enough for commercial use, we introduce a domain-specific database. By adding a domain-specific database to the general-domain database, we can greatly improve the quality of synthetic speech in that domain. Subjective evaluation shows that the new Korean corpus-based TTS system is more natural than the conventional demisyllable-based one.
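The Viterbi search over candidate units can be sketched generically as follows. The scalar join cost here is a stand-in for the paper's Euclidean concatenation distortion over triphone features, so the cost function and candidate representation are illustrative assumptions.

```python
import numpy as np

def viterbi_unit_selection(candidates, join_cost):
    """Pick one unit per position minimising the accumulated
    concatenation (join) cost between consecutive units."""
    n = len(candidates)
    cost = [np.zeros(len(candidates[0]))]          # cost-to-reach per candidate
    back = []                                      # backpointers per position
    for t in range(1, n):
        prev, cur = candidates[t - 1], candidates[t]
        c = np.empty(len(cur))
        b = np.empty(len(cur), dtype=int)
        for j, u in enumerate(cur):
            trans = [cost[t - 1][i] + join_cost(v, u) for i, v in enumerate(prev)]
            b[j] = int(np.argmin(trans))           # best predecessor
            c[j] = trans[b[j]]
        cost.append(c)
        back.append(b)
    path = [int(np.argmin(cost[-1]))]              # backtrack the best path
    for b in reversed(back):
        path.append(int(b[path[-1]]))
    return list(reversed(path))

# Toy demo: three positions, two candidate "units" each (scalar features)
best = viterbi_unit_selection([[0.0, 5.0], [0.1, 4.0], [0.2, 3.0]],
                              lambda a, b: abs(a - b))
```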
An Efficient Pitch Estimation for IMBE (Improved Multi-band Excitation) Speech Coder
The Journal of the Acoustical Society of Korea, volume 20, issue 3, 2001, Pages 34~41
In an IMBE (Improved Multi-Band Excitation) speech coder, initial pitch estimation occupies most of the coder's total computing time because of its complex cost function and exhaustive search over candidate pitches. The use of future frames in initial pitch estimation causes an inevitable time delay, so it is difficult to implement a real-time coder. Furthermore, unvoiced frames undergo the same unnecessary pitch estimation as voiced frames. In this paper, each frame is classified as voiced or unvoiced by the Dyadic Wavelet Transform (DyWT), and initial pitch estimation is then performed only for voiced frames. Different pitch estimation algorithms are therefore employed for voiced and unvoiced frames, reducing the time delay at the transmitter and receiver. Simulation results show that the relative complexity of initial pitch estimation is reduced by 23%, and the processing time decreases to between 1/10 and 1/11 of that of the IMBE coder, while speech quality is almost maintained.
Improvement of VAD Performance for the Reduction of the Bit Rate Under the Noise Environment in the G.723.1
The Journal of the Acoustical Society of Korea, volume 20, issue 3, 2001, Pages 42~47
This paper improves the performance of the VAD (Voice Activity Detector) in the G.723.1 Annex A 6.3 kbps/5.3 kbps dual-rate speech coder, which was developed for Internet telephony and videoconferencing. The VAD decision is based on a three-level energy threshold. We evaluate processing time, speech quality, and bit rate. The processing time is reduced owing to more accurate VAD decisions during silence periods. In the subjective quality test, there is almost no difference compared with G.723.1. To measure the bit rate, we count the active speech frames (VAD = 1); the more silence periods occur, the more the bit rate can be reduced.
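A toy three-level energy-threshold decision might look like the following. The threshold values and the hangover behaviour in the ambiguous band are illustrative assumptions, not the G.723.1 Annex A specification.

```python
def three_level_vad(frame_energies, low, mid, high):
    """Toy three-level energy VAD: frames above `high` are speech,
    frames at or below `low` are silence, the quiet ambiguous band
    (low, mid) is treated as silence, and [mid, high) keeps the
    previous decision as a simple hangover."""
    decisions, active = [], 0
    for e in frame_energies:
        if e >= high:
            active = 1                # clearly speech
        elif e <= low:
            active = 0                # clearly silence
        elif e < mid:
            active = 0                # quiet ambiguous band
        # e in [mid, high): hangover, keep the previous decision
        decisions.append(active)
    return decisions

# Frames: silence, silence, loud speech, speech, trailing speech, silence
labels = three_level_vad([0.1, 0.1, 5.0, 4.0, 2.0, 0.1], 0.5, 1.0, 3.0)
```

Counting the frames with label 1 then gives the active-speech frame count used to measure the bit rate.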
A Novel Speech Enhancement Based on Speech/Noise-dominant Decision in Time-frequency Domain
The Journal of the Acoustical Society of Korea, volume 20, issue 3, 2001, Pages 48~55
A novel method to reduce additive non-stationary noise is proposed. The method requires neither information about the noise nor an estimate of the noise statistics from pause regions. The enhancement is performed on a band-by-band basis for each time frame. Based both on a decision as to whether a particular band in a frame is speech- or noise-dominant and on the masking property of the human auditory system, an appropriate amount of noise is reduced using spectral subtraction. The proposed method was tested under various noisy conditions (car noise, F16 noise, white Gaussian noise, pink noise, tank noise, and babble noise). Judged by the segmental SNR in comparison with the spectral subtraction method, by visual inspection of the enhanced spectrograms, and by listening to the enhanced speech, the method effectively reduces the various noises while minimizing distortion of the speech.
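The band-wise spectral subtraction step can be sketched as follows. This is a generic magnitude-domain version: the paper's speech/noise-dominance decision and masking-based weighting are simplified to a fixed over-subtraction factor and spectral floor, which are assumptions.

```python
import numpy as np

def spectral_subtract(noisy_mag, noise_mag, alpha=2.0, beta=0.02):
    """Per-band power spectral subtraction with a spectral floor.
    alpha: over-subtraction factor; beta: floor relative to the noise,
    which prevents negative power (and 'musical noise' artefacts)."""
    sub = noisy_mag**2 - alpha * noise_mag**2      # subtract noise power
    floor = (beta * noise_mag)**2                  # clamp to a small floor
    return np.sqrt(np.maximum(sub, floor))

# Toy demo on one frame of band magnitudes: bands 0 and 2 are
# noise-dominant, bands 1 and 3 are speech-dominant
noisy = np.array([1.0, 3.0, 0.5, 2.0])
noise = np.array([0.9, 0.4, 0.6, 0.5])
clean_est = spectral_subtract(noisy, noise)
```

Noise-dominant bands collapse to the floor while speech-dominant bands are left largely intact.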
Robust Speech Recognition Using Missing Data Theory
The Journal of the Acoustical Society of Korea, volume 20, issue 3, 2001, Pages 56~62
In this paper, we apply missing data theory to speech recognition. It can be used to maintain the high performance of a speech recognizer when missing data occur. In general, a hidden Markov model (HMM) is used as a stochastic classifier for speech recognition. Acoustic events are represented by continuous probability density functions in a continuous-density HMM (CDHMM). Missing data theory has the advantage of being easily applicable to the CDHMM. A marginalization method is used for processing missing data because it has low complexity and is easy to apply to automatic speech recognition (ASR). Spectral subtraction is used for detecting missing data: if the difference between the energy of the speech and that of the background noise is below a given threshold, the data are declared missing. We propose a new method that examines the reliability of detected missing data using the voicing probability. The voicing probability is used to find voiced frames and to process missing data in voiced regions, which carry more redundant information than consonants. The experimental results show that our method improves performance over a baseline system that uses only spectral subtraction. In a 452-word isolated word recognition experiment, the proposed method using the voicing probability reduced the average word error rate by 12% in a typical noise situation.
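For a diagonal-covariance Gaussian, as used in each CDHMM mixture component, the marginalization step simply drops the unreliable dimensions from the likelihood product; a minimal sketch (the reliability mask would come from the spectral subtraction and voicing-probability tests):

```python
import math

def marginal_log_likelihood(x, reliable, mean, var):
    """Log-likelihood of observation x under a diagonal Gaussian,
    marginalising (integrating out) the unreliable dimensions:
    they drop out of the product, so only reliable terms are summed."""
    ll = 0.0
    for xi, r, m, v in zip(x, reliable, mean, var):
        if r:  # keep only components judged reliable (speech-dominated)
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
    return ll

# Toy demo: the second component is corrupted by noise and marked missing
ll_marginal = marginal_log_likelihood([0.0, 100.0], [True, False],
                                      [0.0, 0.0], [1.0, 1.0])
ll_full = marginal_log_likelihood([0.0, 100.0], [True, True],
                                  [0.0, 0.0], [1.0, 1.0])
```

Marginalising the corrupted component keeps the score close to what clean speech would give, instead of letting one outlier dominate the frame likelihood.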
Development of a Lipsync Algorithm Based on Audio-visual Corpus
The Journal of the Acoustical Society of Korea, volume 20, issue 3, 2001, Pages 63~69
A corpus-based lip sync algorithm for synthesizing natural face animation is proposed in this paper. To obtain the lip parameters, marks were attached to the speaker's face, and the marks' positions were extracted with image processing methods. The spoken utterances were labeled with HTK, and prosodic information (duration, pitch, and intensity) was analyzed. An audio-visual corpus was constructed by combining the speech and image information. The basic unit used in our approach is the syllable. Based on this audio-visual corpus, lip information represented by the marks' positions was synthesized. That is, the best syllable units are selected from the audio-visual corpus, and the visual information of the selected syllable units is concatenated. There are two steps to obtain the best units. One is to select the N-best candidates for each syllable. The other is to select the smoothest unit sequence, which is done by the Viterbi decoding algorithm. For these steps, two distances between syllable units are proposed: a phonetic environment distance measure and a prosody distance measure. Computer simulation results show that the proposed algorithm performs well. In particular, pitch and intensity information is shown to be as important as duration information in lip sync.
A Syllabic Segmentation Method for the Korean Continuous Speech
The Journal of the Acoustical Society of Korea, volume 20, issue 3, 2001, Pages 70~75
This paper proposes a syllabic segmentation method for Korean continuous speech. The method consists of three major steps: (1) labeling the vowel, consonant, and silence units and forming a token sequence from the speech data using time-domain segmental parameters (pitch, energy, ZCR, and PVR); (2) scanning the tokens against the structure of the Korean syllable using a parser designed as a finite state automaton; and (3) re-segmenting the syllable parts which have two or more syllables using pseudo-syllable nucleus information. Experimental evaluation of the proposed method on continuous words and sentence units yields 73.5% and 85.9%, respectively.
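Step (2), the finite-state scan of the token sequence into (C)V(C) syllable structures, can be sketched as follows. The token alphabet is reduced to "C" and "V" for illustration; the paper's automaton over full consonant/vowel/silence labels is more detailed, so this is an assumption about its general shape.

```python
def parse_syllables(tokens):
    """Finite-state scan of a C/V token sequence into Korean-style
    (C)V(C) syllables. An onset consonant attaches to the following
    vowel; a trailing consonant closes the syllable unless the next
    token is a vowel, in which case it becomes the next onset."""
    syllables, i = [], 0
    while i < len(tokens):
        syl = []
        if tokens[i] == "C":                       # optional onset
            syl.append("C")
            i += 1
        if i < len(tokens) and tokens[i] == "V":   # obligatory nucleus
            syl.append("V")
            i += 1
        else:
            return None                            # invalid syllable structure
        # optional coda: take a C only if it is not the onset of a vowel
        if (i < len(tokens) and tokens[i] == "C"
                and not (i + 1 < len(tokens) and tokens[i + 1] == "V")):
            syl.append("C")
            i += 1
        syllables.append("".join(syl))
    return syllables
```

For example, the token sequence C V C V C parses into the syllables CV and CVC, while a sequence with no vowel nucleus is rejected.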
A Review of the Possible Causes of Negative Source Impedance in Fluid Machines
Keith S. Peat
The Journal of the Acoustical Society of Korea, volume 20, issue 3, 2001, Pages 76~82
Most fluid machines can be considered periodic noise sources when operated under constant conditions, which allows a frequency-domain representation of the source and of the associated acoustic field in the duct. In such a representation, the source is characterized by frequency-dependent values of both strength and impedance. Although knowledge of these values can be gained either by experimentation or by modeling, the one-port acoustic characteristics of an in-duct source with high flow velocity, high temperature, and high sound level can be measured only by the multi-load method, using an overdetermined set of open pipes of different lengths as the applied loads. The problem, however, is that negative source resistances have often been measured. This paper reviews the possible causes of the problem, with reference to experimental and theoretical results, in an attempt to clarify the issue. A new interpretation is given of the violation of basic assumptions and of the defect in the algorithm of the multi-load method. The major cause of the problem is the violation of the time-invariance assumption for the source, and the load impedance can seriously affect the final measured value of the source impedance.
Optimal Design of a One-chip-type SAW Duplexer Filter Using Micro-strip Line Lumped Elements
The Journal of the Acoustical Society of Korea, volume 20, issue 3, 2001, Pages 83~90
Conventional SAW duplexer filters employ a quarter-wavelength transmission line, which makes fabrication of the strip line on the package difficult. Their manufacturing process is also complicated, because it requires integrating a separate transmitting filter, receiving filter, and isolation circuit. This paper concerns the development of a new duplexer filter structure that contains the transmitting filter, the receiving filter, and the isolation circuit in a single chip. For the composition of the duplexer, we design the component SAW ladder filters and an isolation network consisting of lumped inductor and capacitor elements. The performance of the whole duplexer is optimized by nonlinear multivariable minimization of a suitable target function, and the result is compared with that of commercial filters.
Target Scattering Echo Simulation for Active Sonar System in the Geometric Optics Region
The Journal of the Acoustical Society of Korea, volume 20, issue 3, 2001, Pages 91~97
Since field information on the target signal is important in the development and verification of active sonar systems, experimental methods and simulation techniques are widely used to analyze the detailed characteristics of target scattered echoes. In this paper, a scale-target experiment is therefore performed to develop and improve a target signal simulation model. Since the experimental results show that specular reflection is the major component among the scattering mechanisms, a target signal simulation model based on Geometric Optics Theory (GOT) is developed. A complex target is separated into simple shapes, known as canonical shapes. The contributions from the individual canonical shapes are summed with proper phase and amplitude to produce the target strength of the whole complex body. The simulated target signal is compared with the experimental results and discussed.
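The coherent summation over canonical shapes can be sketched for a single frequency as follows. The amplitude and delay of each contribution would come from the GOT analysis of each shape; here they are purely illustrative inputs.

```python
import numpy as np

def combine_echo(amplitudes, delays, freq):
    """Coherent sum of echo contributions from canonical sub-shapes:
    each contributes amplitude * exp(-j*2*pi*f*delay), and a target
    strength in dB follows from the magnitude of the summed pressure."""
    pressure = np.sum(np.asarray(amplitudes)
                      * np.exp(-2j * np.pi * freq * np.asarray(delays)))
    ts_db = 20.0 * np.log10(np.abs(pressure))
    return pressure, ts_db

# Two equal contributions arriving in phase add to +6 dB over one alone
p_inphase, ts_inphase = combine_echo([1.0, 1.0], [0.0, 0.0], 1000.0)
# A half-cycle relative delay makes the same two contributions cancel
p_cancel, _ = combine_echo([1.0, 1.0], [0.0, 0.0005], 1000.0)
```

The phase term is what makes the combination interfere constructively or destructively, which is why the contributions must be summed with proper phase and amplitude rather than as magnitudes.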
Improving a Sound Localization Using 1/3-octave Band Pass Filter
The Journal of the Acoustical Society of Korea, volume 20, issue 3, 2001, Pages 98~103
The human binaural auditory system has the capability of differentiating the direction and distance of sound sources. This capability is well characterized in terms of the interaural intensity difference (IID), the interaural time difference (ITD), and/or the spectral shape difference (SSD) arising from the acoustic transfer of a sound source to the outer ears. This paper proposes an effective way of extracting the three sound perception factors (IID, ITD, SSD) from the head-related transfer functions (HRTFs), which depend on the direction and distance of the acoustic source relative to the listener. It includes a method for estimating the equivalent ITD and the 1/3-octave-band-based IID factors, and their use for locating a sound source in space. Subjective and objective tests were carried out to examine the effectiveness of the proposed methodology and its applicability to real sound systems; the experimental results are presented in this paper.
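A minimal sketch of extracting ITD and IID factors from a pair of ear signals (for example, HRTF-filtered test signals) is shown below. The cross-correlation approach is a common generic estimator and an assumption here, not necessarily the paper's exact method.

```python
import numpy as np

def estimate_itd(left, right, fs):
    """Estimate the interaural time difference (ITD), in seconds, as the
    lag of the cross-correlation peak between the two ear signals."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)   # positive: left is delayed
    return lag / fs

def estimate_iid_db(left, right):
    """Interaural intensity difference (IID) in dB, left re right."""
    return 10.0 * np.log10(np.sum(left**2) / np.sum(right**2))

# Toy demo: the left-ear signal is a delayed, attenuated copy of the right
rng = np.random.default_rng(4)
right = rng.standard_normal(1000)
delay = 5                                       # samples
left = 0.5 * np.concatenate([np.zeros(delay), right])[:1000]
itd = estimate_itd(left, right, fs=1000.0)
iid = estimate_iid_db(left, right)
```

In a real system these factors would be computed per 1/3-octave band (for the IID) rather than broadband, following the band-based scheme the paper describes.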