REFERENCE LINKING PLATFORM OF KOREA S&T JOURNALS
The Journal of the Acoustical Society of Korea
Journal Basic Information
The Acoustical Society of Korea
Volume & Issues
Volume 28, Issue 8 - Nov 2009
Volume 28, Issue 7 - Oct 2009
Volume 28, Issue 6 - Aug 2009
Volume 28, Issue 5 - Jul 2009
Volume 28, Issue 4 - May 2009
Volume 28, Issue 3 - Apr 2009
Volume 28, Issue 2 - Feb 2009
Volume 28, Issue 1 - Jan 2009
A Relevant Distortion Criterion for Interpolation of the Head-Related Transfer Functions
Lee, Ki-Seung ; Lee, Seok-Pil ;
The Journal of the Acoustical Society of Korea, volume 28, issue 2, 2009, Pages 85~95
In binaural synthesis environments, a wide variety of head-related transfer functions (HRTFs) measured in various directions is desirable for obtaining accurate and varied spatial sound images. To reduce the size of the HRTF set, interpolation is often employed, where the HRTF for any direction is obtained from a limited number of representative HRTFs. In this paper, we study distortion measures for interpolation, which play an important role in the interpolation process. Using various objective distortion metrics, the differences between the interpolated and the measured HRTFs were computed. These were then compared and analyzed against the results of listening tests. From the results, the objective distortion measures that reflected the perceptual differences in spatial sound image were selected. This measure was employed in a practical interpolation technique. We applied the proposed method to four HRTF sets, measured from three human heads and one mannequin. As a result, the Mel-frequency cepstral distortion was shown to be a good predictor of the differences in spatial sound location for the three HRTF sets measured from humans, and the time-domain signal-to-distortion ratio gave good prediction results for all four HRTF sets.
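As an illustration of the kind of objective measure discussed above, the following is a minimal sketch of a time-domain signal-to-distortion ratio between a measured and an interpolated head-related impulse response. The function name, the plain energy-ratio definition, and the sample values are assumptions made for illustration; the abstract does not give the exact formula used in the paper.

```python
import math


def signal_to_distortion_ratio_db(measured, interpolated):
    """Energy of the measured response over the energy of the error, in dB.

    Assumed definition (not taken from the paper):
    SDR = 10 * log10( sum(x^2) / sum((x - y)^2) )
    """
    if len(measured) != len(interpolated):
        raise ValueError("responses must have the same length")
    signal_energy = sum(x * x for x in measured)
    error_energy = sum((x - y) ** 2 for x, y in zip(measured, interpolated))
    if signal_energy == 0:
        raise ValueError("measured response has zero energy")
    if error_energy == 0:
        return math.inf  # identical responses: no distortion
    return 10.0 * math.log10(signal_energy / error_energy)


# Made-up impulse responses, for illustration only.
measured_hrir = [1.0, 0.0, 0.0, 0.0]
interpolated_hrir = [0.9, 0.0, 0.0, 0.0]
sdr = signal_to_distortion_ratio_db(measured_hrir, interpolated_hrir)
```

A higher value means the interpolated response is closer to the measured one; comparing such values with listening-test scores is what lets a measure be judged as a perceptual predictor.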
Correlation Between the Headphone's Acoustical Characteristics and Subjective Preferences
Lee, Ki-Seung ; Lee, Seok-Pil ;
The Journal of the Acoustical Society of Korea, volume 28, issue 2, 2009, Pages 96~106
In this paper, the correlation between the headphone's acoustical characteristics and the subjective preferences is analyzed, and the possibility of predicting the subjective preferences from the acoustical characteristics is investigated. The headphone's acoustical characteristics include the total harmonic distortion, the variation of the frequency response measured separately for each channel, and the inter-aural correlation coefficients. These characteristics were measured in a noise-free anechoic chamber, using a head and torso simulator. The subjective preferences were scored in terms of loudness, clearness, spaciousness, fullness and overall impression. In the subjective listening test, 12 subjects with extensive listening experience participated. The program material included 5 kinds of music: Korean popular song, pop song, light music, male voice and classical. Eight headphone models were employed, including 4 closed-type circumaural headphones, 2 open-type supra-aural headphones and 2 intra-concha headphones. A significance test was carried out on the results of the subjective test, using a two-way ANOVA. The correlation coefficients between the acoustical parameters and the subjective preferences were computed. Experimental results showed that the variation of the magnitude of the frequency response measured from the right channel revealed a higher correlation with the subjective preferences, whereas the inter-aural correlation coefficients showed very low correlation.
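The correlation analysis described above can be sketched with a Pearson correlation coefficient between one acoustical parameter and one preference score across headphone models. This assumes the Pearson coefficient is the one intended, which the abstract does not state, and all numbers below are invented for illustration.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical data: frequency-response variation (dB) and overall
# preference score for 8 headphone models. Not taken from the paper.
response_variation_db = [3.1, 4.5, 2.8, 6.0, 5.2, 3.9, 7.1, 4.0]
overall_preference = [4.2, 3.6, 4.4, 2.9, 3.1, 3.8, 2.5, 3.7]
r = pearson_correlation(response_variation_db, overall_preference)
```

A value of `r` near -1 or +1 would indicate that the parameter tracks the preference score closely, while a value near 0 would indicate little linear relationship.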
An Analysis on Audio Quality Deterioration of Acoustic OFDM
Cho, Ki-Ho ; Yu, Hwan-Sik ; Chang, Jun-Hyuck ; Kim, Nam-Soo ;
The Journal of the Acoustical Society of Korea, volume 28, issue 2, 2009, Pages 107~111
Acoustic OFDM is used for acoustic communication in the audible frequency band, employing a loudspeaker as the transmitter and a microphone as the receiver antenna. Since acoustic OFDM can transmit about 1 kbps using a 1600 Hz band, the acoustic OFDM signal is inserted into an audio signal such as music or speech. However, audio quality definitely deteriorates during the insertion process. This paper explains the cause of the audio quality deterioration and discusses how to reduce it.
Salient Region Detection Algorithm for Music Video Browsing
Kim, Hyoung-Gook ; Shin, Dong ;
The Journal of the Acoustical Society of Korea, volume 28, issue 2, 2009, Pages 112~118
This paper proposes a rapid salient region detection algorithm for a music video browsing system, which can be applied to mobile devices and digital video recorders (DVRs). The input music video is decomposed into music and video tracks. For the music track, the music highlight, including the musical chorus, is detected based on structure analysis using energy-based peak position detection. Using emotional models generated by an SVM-AdaBoost learning algorithm, the music signal of each music video is automatically classified into one of the predefined emotional classes. For the video track, the face scene including the singer or actor/actress is detected based on a boosted cascade of simple features. Finally, the salient region is generated based on the alignment of the boundaries of the music highlight and the visual face scene. Users first select their favorite music videos from the various music videos on the mobile device or DVR, using the information on each music video's emotion, and can then quickly browse the 30-second salient region produced by the proposed algorithm. A mean opinion score (MOS) test with a database of 200 music videos is conducted to compare the detected salient region with the predefined manual part. The MOS test results show that the salient region detected by the proposed method performed much better than the predefined manual part without audiovisual processing.
Implementation of Non-Stringed Guitar Based on Physical Modeling Synthesis
Kang, Myeong-Su ; Cho, Sang-Jin ; Chong, Ui-Pil ;
The Journal of the Acoustical Society of Korea, volume 28, issue 2, 2009, Pages 119~126
This paper describes a non-stringed guitar composed of laser strings, frets, a sound synthesis algorithm and a processor. The laser strings, which can represent strokes and arpeggios, comprise laser modules and photodiodes. The frets are implemented with a voltage divider. The guitar body does not need to be implemented physically because commuted waveguide synthesis is used. The proposed frets enable players to play all chords with the chord glove as well as guitar solos. Sliding, hammering-on and pulling-off sounds are synthesized using parameters from the voltage divider. Because pitch shifting corresponds to a time-varying propagation speed in the digital waveguide model, the proposed model can synthesize vibrato as well. After the signals from the laser strings and frets are transformed into parameters for the synthesis algorithm, the digital signal processor, a TMS320F2812, runs the real-time synthesis algorithm and communicates with the DAC. A demonstration movie clip available via the Internet shows a player performing the song 'Arirang', synthesized in real time by the proposed algorithm and interfaces. Consequently, we conclude that the proposed synthesis algorithm is efficient for guitar solos and that the non-stringed guitar can be played in real time without problems.
Speech Intelligibility Analysis on the Laser Detected Sound of the Glass Windows
Kim, Seock-Hyun ; Lee, Hyun-Woo ; Kim, Hee-Dong ;
The Journal of the Acoustical Society of Korea, volume 28, issue 2, 2009, Pages 127~134
In this study, the possibility of laser eavesdropping is investigated on window glasses of various thicknesses. Glass windows are excited by a maximum length sequence (MLS) signal and the vibration sound is detected by a laser Doppler vibrometer. From the detected sound, speech intelligibility is objectively estimated. The speech transmission index (STI), which is based on the modulation transfer function (MTF), is calculated for the estimation. Finally, the effect of a disturbing wave on speech intelligibility is analysed by using an outside speaker and a window shaker attached to the glass window. The purpose of the study is to estimate the possibility of remote eavesdropping with a laser sensor and to evaluate the performance of a homemade window shaker in protecting against remote eavesdropping.
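For readers unfamiliar with the excitation signal mentioned above, the following sketch generates one period of a maximum length sequence with a linear feedback shift register. The register length (4 bits) and feedback taps (4, 3) are illustrative assumptions and are not the parameters used in the study, which the abstract does not report.

```python
def maximum_length_sequence(n_bits=4, taps=(4, 3)):
    """Return one period (2**n_bits - 1 samples) of an MLS as +1/-1 values.

    Uses a Fibonacci linear feedback shift register. The default taps
    correspond to the primitive polynomial x^4 + x^3 + 1 (an assumption
    chosen for illustration).
    """
    state = [1] * n_bits  # any non-zero start state works
    bits = []
    for _ in range(2 ** n_bits - 1):
        bits.append(state[-1])          # output the last register stage
        feedback = 0
        for tap in taps:
            feedback ^= state[tap - 1]  # XOR of the tapped stages
        state = [feedback] + state[:-1]  # shift the register by one
    return [1 if b else -1 for b in bits]


def circular_autocorrelation(seq, lag):
    """Circular autocorrelation of a sequence at the given lag."""
    n = len(seq)
    return sum(seq[i] * seq[(i + lag) % n] for i in range(n))


mls = maximum_length_sequence()
```

The circular autocorrelation of such a sequence is a single peak at lag zero and a constant -1 elsewhere, which is why it is convenient for measuring the response of a system such as a vibrating window pane.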
Synthesis and Classification of Active Sonar Target Signal Using Highlight Model
Kim, Tae-Hwan ; Park, Jeong-Hyun ; Nam, Jong-Geun ; Lee, Su-Hyung ; Bae, Keun-Sung ;
The Journal of the Acoustical Society of Korea, volume 28, issue 2, 2009, Pages 135~140
In this paper, we synthesized active sonar target signals based on the highlight model, and then carried out target classification using the synthesized signals. Different signals are synthesized as the target aspect angle changes. Two experiments were carried out to examine the results. First, the classification results for each aspect angle are shown. Second, the results for aspect angles divided into two groups are obtained. Time-domain feature extraction is performed using a matched filter and envelope detection, which reveals the pattern of the individual highlights. Artificial neural networks and a multi-class SVM are used to classify the target signals.
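The feature extraction step above can be sketched as follows, assuming a simple highlight model in which the echo is a sum of delayed, scaled copies of the transmitted pulse. The pulse, delays, amplitudes, and the rectify-and-smooth envelope detector are all illustrative assumptions; the paper's actual signal parameters and envelope method are not given in the abstract.

```python
def matched_filter(received, replica):
    """Correlate the received signal with the transmitted replica."""
    n, m = len(received), len(replica)
    return [
        sum(received[i + k] * replica[k] for k in range(m))
        for i in range(n - m + 1)
    ]


def envelope(signal, window=3):
    """Crude envelope: full-wave rectification plus a moving average."""
    rectified = [abs(v) for v in signal]
    half = window // 2
    smoothed = []
    for i in range(len(rectified)):
        lo = max(0, i - half)
        hi = min(len(rectified), i + half + 1)
        smoothed.append(sum(rectified[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical transmitted pulse and an echo with two highlights:
# a strong one starting at sample 10 and a weaker one at sample 25.
pulse = [1, -1, 1, 1, -1]
echo = [0] * 10 + pulse + [0] * 10 + [0.5 * v for v in pulse] + [0] * 5

filtered = matched_filter(echo, pulse)
env = envelope(filtered)
strongest_highlight = max(range(len(env)), key=lambda i: env[i])
```

The positions and heights of the peaks in `env` form a time-domain pattern of the highlights, which is the kind of feature vector a classifier such as a neural network or SVM could take as input.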
The Algorithm Improved the Speed for the 3-Dimensional CT Video Composition
Jeong, Chan-Woong ; Park, Jin-Woo ; Jun, Kyu-Suk ;
The Journal of the Acoustical Society of Korea, volume 28, issue 2, 2009, Pages 141~147
This paper presents a new fast algorithm, the rotation-based method (RBM), for reconstructing 3-dimensional images in a cone beam computerized tomography (CB CT) system. A cone beam system has a shorter radiation exposure time than a fan beam system. The Three-Pass Shear Matrices (TPSM) method, which uses fewer transcendental functions than the one-pass shear method, is applied to reduce the computation time. To evaluate the quality of the 3-D images and the reconstruction time, further 3-D images were reconstructed by the Radon transform under the same conditions. In terms of quality, the images produced by the Radon transform were slightly better than those produced by RBM. In terms of reconstruction time, however, the RBM algorithm was 35 times faster than the Radon transform. This algorithm offered
frames a second. This means that it will be possible to reconstruct 3-D dynamic images in real time.
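The idea of rotating by three shear passes can be sketched with the standard three-shear factorisation of a 2-D rotation: a shear along x by -tan(θ/2), a shear along y by sin θ, and a second shear along x by -tan(θ/2). Whether the paper's TPSM uses exactly this factorisation is an assumption, and the function below rotates a single point rather than a full image.

```python
import math


def rotate_by_three_shears(x, y, theta):
    """Rotate the point (x, y) counter-clockwise by theta using three shears."""
    t = math.tan(theta / 2.0)
    s = math.sin(theta)
    x = x - t * y  # first pass: shear along x
    y = y + s * x  # second pass: shear along y
    x = x - t * y  # third pass: shear along x again
    return x, y


def rotate_directly(x, y, theta):
    """Reference rotation using the usual rotation matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return c * x - s * y, s * x + c * y


sheared = rotate_by_three_shears(2.0, 1.0, 0.3)
direct = rotate_directly(2.0, 1.0, 0.3)
```

Only two transcendental values, tan(θ/2) and sin θ, are needed per rotation angle, and each pass moves data along a single axis, which is what makes shear-based rotation attractive for fast image processing.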
Enhancement of SBR for Speech Signal Using Adaptive Noise Floor Level
Lee, Se-Won ; Oh, Seoung-Jun ; Ahn, Chang-Beom ; Lee, Tae-Jin ; Kang, Kyoung-Ok ; Park, Ho-Chong ;
The Journal of the Acoustical Society of Korea, volume 28, issue 2, 2009, Pages 148~154
In audio coding, SBR technology synthesizes the high bands using patched time-frequency information from the low bands together with correction parameters. Since SBR transmits only correction parameters for the high bands, it provides low-rate coding of the high bands and is used as a core module of MPEG-4 HE-AAC. SBR was originally designed for audio signals, and its performance tends to decrease for speech signals; the major reason is an excessive noise floor in the high bands caused by incorrect tonality computation. In this paper, a new method that determines the noise floor level adaptively according to the speech characteristics is proposed in order to solve this problem of SBR for speech signals. The proposed method maintains compatibility with standard SBR, and the subjective performance evaluation shows that it improves SBR performance compared with standard SBR, especially for male speech signals.
A Performance Improvement Method using Variable Break in Corpus Based Japanese Text-to-Speech System
Na, Deok-Su ; Min, So-Yeon ; Lee, Jong-Seok ; Bae, Myung-Jin ;
The Journal of the Acoustical Society of Korea, volume 28, issue 2, 2009, Pages 155~163
In text-to-speech systems, the conversion of text into prosodic parameters is necessarily composed of three steps. These are the placement of prosodic boundaries, the determination of segmental durations, and the specification of fundamental frequency contours. Prosodic boundaries, as the most important and basic parameter, affect the estimation of durations and fundamental frequency. Break prediction is an important step in text-to-speech systems, as break indices (BIs) have a great influence on how correctly prosodic phrase boundaries are represented. However, accurate prediction is difficult since BIs are often chosen according to the meaning of a sentence or the reading style of the speaker. In Japanese, the prediction of an accentual phrase boundary (APB) and a major phrase boundary (MPB) is particularly difficult. Thus, this paper presents a method to compensate for the prediction errors of APBs and MPBs. First, we define a subtle BI, for which it is difficult to decide clearly between an APB and an MPB, as a variable break (VB), and an explicit BI as a fixed break (FB). The VB is chosen using a classification and regression tree, and multiple prosodic targets relating to pitch and duration are then generated. Finally, unit selection is conducted using the multiple prosodic targets. In the MOS test, the original speech scored 4.99, while the proposed method scored 4.25 and the conventional method scored 4.01. The experimental results show that the proposed method improves the naturalness of synthesized speech.