Phonetics and Speech Sciences
Journal Basic Information
The Korean Society of Speech Sciences
Volume & Issues
Volume 1, Issue 4 - Dec 2009
Volume 1, Issue 3 - Sep 2009
Volume 1, Issue 2 - Jun 2009
Volume 1, Issue 1 - Mar 2009
In Search of Models in Speech Communication Research
Fujisaki, Hiroya ;
Phonetics and Speech Sciences, volume 1, issue 1, 2009, Pages 9~22
This paper first presents the author's personal view on the importance of modeling in scientific research in general, and then describes two of his works toward modeling certain aspects of human speech communication. The first work is concerned with the physiological and physical mechanisms of controlling the voice fundamental frequency of speech, which is an important parameter for expressing information on tone, accent, and intonation. The second work is concerned with the cognitive processes involved in a discrimination test of speech stimuli, which gives rise to the phenomenon of so-called categorical perception. They are meant to illustrate the power of models based on deep understanding and precise formulation of the functions of the mechanisms/processes that underlie observed phenomena. Finally, it also presents the author's view on some models that are yet to be developed.
The Role of Prosody in Dialect Synthesis and Authentication
Yoon, Kyu-Chul ;
Phonetics and Speech Sciences, volume 1, issue 1, 2009, Pages 25~31
The purpose of this paper is to examine the viability of synthesizing the Masan dialect from Seoul dialect utterances and to examine the role of prosody in the authentication of the synthesized Masan dialect. The synthesis was performed by transferring one or more of the prosodic features of the Masan utterance onto the Seoul utterance. The hypothesis is that, given an utterance composed of phonemes shared by both dialects, the more prosodic features of the Masan utterance are transferred onto the Seoul utterance, the more authentic a Masan utterance it will be judged to be. The prosodic features involved were the fundamental frequency contour, the segmental durations, and the intensity contour. The synthesized Masan utterances were evaluated by thirteen native speakers of the Masan dialect. The results showed that the fundamental frequency contour and the segmental durations had main effects on the perceptual shift from the Seoul to the Masan dialect.
Statistical Patterns in Consonant Cluster Simplification in Seoul Korean: Within-dialect Interspeaker and Intraspeaker Variation
Cho, Tae-Hong ; Kim, Sa-Hyang ;
Phonetics and Speech Sciences, volume 1, issue 1, 2009, Pages 33~40
This study examines how young speakers of Seoul Korean produce the tri-consonantal clusters /lkt/ and /lpt/, as in palk-ta ('to be bright') and palp-ta ('to step on'). Production data were collected from 20 speakers of Seoul Korean. Narrow transcription of the data showed that simplification is not obligatory, as some speakers often preserve all three consonants. When clusters were simplified, there was a clear asymmetry between /lkt/ and /lpt/. In the production of /lkt/, speakers showed no clear preference for either C1 preservation (C1=/l/) or C2 preservation (C2=/k/ in /lkt/ and /p/ in /lpt/), but in the production of /lpt/ there was a strong preference for the C1-preserved variant over the C2-preserved variant. Compared with the production data in Cho (1999), simplification patterns appear to have changed over the past 10 years toward preserving the first member of the cluster (/l/) more often, especially with /lkt/. There was no substantial between-item variation, indicating that simplification patterns are not lexically specified. Finally, the results suggest that the process of tri-consonantal simplification has not been fully phonologized in the grammar of the language, as is evident from the substantial inter- and intra-speaker variation.
Vowel Duration and the Feature of the Following Consonant
Yun, Il-Sung ;
Phonetics and Speech Sciences, volume 1, issue 1, 2009, Pages 41~46
The duration of a vowel is known to vary as a function of the (phonological or phonetic) voicing feature of the following consonant. This study questions that general belief. A spectrographic experiment using 14 Korean obstruents (three sets of stops: /p, p', pʰ/, /t, t', tʰ/, /k, k', kʰ/; one set of affricates: /c, c', cʰ/; one set of fricatives: /s, s'/) reveals that (1) phonetic voicing in the intervocalic lax consonants /p, t, k, c, s/ has nothing to do with the duration of the preceding vowel; (2) vowels are significantly shorter before tense consonants than before their lax cognates, while tense consonants are significantly longer than their lax cognates. Importantly, Korean obstruents are all phonologically voiceless. Therefore, the voicing feature is rejected as the cause of preconsonantal vowel shortening in Korean, both phonetically and phonologically. It is suggested that the temporal phenomenon is basically a kind of physiologically motivated coarticulation, though it is restricted by the phonology of a given language. Under this assumption, the feature voicing should be replaced with the feature tenseness as the cause, which would allow the temporal phenomenon to be explained on the same basis irrespective of language.
Experimental Phonetic Study of Yanjin Sino-Korean Dialect
Kim, Hyun-Gi ;
Phonetics and Speech Sciences, volume 1, issue 1, 2009, Pages 47~52
The speech of Sino-Koreans has evolved for geopolitical reasons since 1945. The aim of this study is to collect Yanji dialectal speech and to compare it with South Korean dialectal speech. Twenty Yanbian University students participated as informants. Acoustic speech information was analyzed using the Windows Vista version of Multi-Speech. The dialectal speech characteristics of Yanji Sino-Korean included a posterior vowel /[...]/ and neutralization of the mid vowel /o/ between /o/ and /Ɔ/. Lenis stops showed a tendency toward glottalization based on VOT values. Sibilants contained aspiration following constriction, and the lateral /l/ was realized as the approximant /r/.
Utterance Verification using Phone-Level Log-Likelihood Ratio Patterns in Word Spotting Systems
Kim, Chong-Hyon ; Kwon, Suk-Bong ; Kim, Hoi-Rin ;
Phonetics and Speech Sciences, volume 1, issue 1, 2009, Pages 55~62
This paper proposes an improved method to verify a keyword segment produced by a word spotting system. First, a baseline word spotting system is implemented. To improve the performance of the word spotting system, we use a two-pass structure consisting of a word spotting system and an utterance verification system. Verifying keywords with the basic likelihood ratio test (LRT) based utterance verification system raises certain problems that degrade performance. We therefore propose a method that uses phone-level log-likelihood ratio (PLLR) patterns to compute a confidence measure for each keyword. The proposed method generates weights according to the PLLR patterns and assigns a different weight to each phone when generating the confidence measure for a keyword. The proposed method is shown to be more appropriate for word spotting systems and improves the final word spotting accuracy.
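The idea of weighting phone-level log-likelihood ratios can be illustrated with a minimal Python sketch. The abstract does not specify how the weights are derived from the PLLR patterns, so the softmax-style weighting in pattern_weights, the parameter alpha, and all function names below are assumptions made for illustration only, not the authors' method.

```python
import math


def phone_llr(target_loglik, anti_loglik, n_frames):
    """Duration-normalised log-likelihood ratio for one phone segment."""
    return (target_loglik - anti_loglik) / n_frames


def pattern_weights(pllrs, alpha=1.0):
    """Hypothetical weighting (an assumption, not from the paper):
    phones with a low LLR receive a larger weight, via a softmax of -alpha * llr."""
    exps = [math.exp(-alpha * x) for x in pllrs]
    total = sum(exps)
    return [e / total for e in exps]


def keyword_confidence(pllrs, weights=None):
    """Weighted mean of phone-level LLRs.
    With uniform weights this reduces to the plain average of the phone scores."""
    if weights is None:
        weights = [1.0] * len(pllrs)
    return sum(w * x for w, x in zip(weights, pllrs)) / sum(weights)


if __name__ == "__main__":
    # Three phones of a hypothesised keyword; the last one matches poorly.
    pllrs = [2.0, 1.5, -3.0]
    uniform = keyword_confidence(pllrs)
    weighted = keyword_confidence(pllrs, pattern_weights(pllrs))
    # The weighted score is pulled toward the badly matching phone,
    # so a single poor phone is not averaged away by the good ones.
    print(uniform, weighted)
```

In a two-pass system of the kind described, such a score would be compared against a threshold to accept or reject each keyword hypothesis from the first pass.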
HMM-based Music Identification System for Copyright Protection
Kim, Hee-Dong ; Kim, Do-Hyun ; Kim, Ji-Hwan ;
Phonetics and Speech Sciences, volume 1, issue 1, 2009, Pages 63~67
In this paper, in order to protect music copyrights, we propose a music identification system that is scalable in the number of registered pieces of music and robust to signal-level variations of the registered music. For its implementation, we define the new concepts of 'music word' and 'music phoneme' as recognition units with which to construct 'music acoustic models'. With these concepts, we apply the HMM-based framework used in continuous speech recognition to identify the music. Each music file is transformed into a sequence of 39-dimensional vectors. This sequence of vectors is represented as ordered states with Gaussian mixtures. These ordered states are trained using the Baum-Welch re-estimation method. Music files whose copyright status is suspect are also transformed into sequences of vectors. The most probable music file is then identified using the Viterbi algorithm through the music identification network. We implemented a music identification system for 1,000 MP3 music files and tested it with variations in MP3 bit rate and music speed rate. The proposed music identification system is robust to these signal variations. In addition, the scalability of the system is independent of the number of registered music files, since the system is based on the HMM method.
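The decoding step can be sketched in Python as follows. This is a heavily simplified illustration under stated assumptions: each registered piece is modelled as a left-to-right chain of states with a single one-dimensional Gaussian per state (the paper uses 39-dimensional vectors and Gaussian mixtures), the transition probabilities are fixed at 0.5 rather than trained with Baum-Welch, and each piece is scored separately rather than through a single identification network. The names viterbi_score and identify are hypothetical.

```python
import math

NEG_INF = float("-inf")


def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_score(obs, means, var=1.0,
                  log_stay=math.log(0.5), log_next=math.log(0.5)):
    """Best-path log score of obs through a left-to-right chain of states.
    The path must start in the first state and end in the last state."""
    n = len(means)
    score = [NEG_INF] * n
    score[0] = log_gauss(obs[0], means[0], var)
    for x in obs[1:]:
        new = [NEG_INF] * n
        for j in range(n):
            stay = score[j] + log_stay
            move = score[j - 1] + log_next if j > 0 else NEG_INF
            best = max(stay, move)
            if best > NEG_INF:
                new[j] = best + log_gauss(x, means[j], var)
        score = new
    return score[-1]


def identify(obs, models):
    """Return the name of the registered piece with the highest Viterbi score."""
    return max(models, key=lambda name: viterbi_score(obs, models[name]))


if __name__ == "__main__":
    # Toy "registered music": each piece is a sequence of state means.
    models = {"piece_a": [0.0, 5.0, 10.0], "piece_b": [10.0, 5.0, 0.0]}
    # A query that lingers longer in each state, loosely mimicking a slower playback.
    query = [0.0, 0.1, 0.0, 5.0, 5.2, 4.9, 10.0, 10.1]
    print(identify(query, models))
```

Because each state may repeat any number of times, the same model can absorb queries of different lengths, which is one reason an HMM framework is a natural fit for tempo variation.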
Global Covariance based Principal Component Analysis for Speaker Identification
Seo, Chang-Woo ; Lim, Young-Hwan ;
Phonetics and Speech Sciences, volume 1, issue 1, 2009, Pages 69~73
This paper proposes an efficient global covariance-based principal component analysis (GCPCA) for speaker identification. Principal component analysis (PCA) is a feature extraction method that reduces the dimension of the feature vectors and the correlation among them by projecting the original feature space onto a smaller subspace through a transformation. However, conventional PCA requires a large amount of training data, because the eigenvalue and eigenvector matrices are computed from a full covariance matrix estimated separately for each speaker. The proposed method first calculates a global covariance matrix using the training data of all speakers. It then finds the eigenvalue matrix and the corresponding eigenvector matrix from the global covariance matrix. Compared to conventional PCA and Gaussian mixture model (GMM) methods, the proposed method shows better speaker identification performance while requiring less storage space and lower complexity.
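The two steps named in the abstract, pooling all speakers' data into one covariance matrix and then taking its eigendecomposition, can be sketched with NumPy. This is a minimal illustration under assumptions: the function names global_pca and project, the dictionary input format, and the use of random data are all hypothetical, and the speaker modelling that follows the projection is not shown.

```python
import numpy as np


def global_pca(features_by_speaker, n_components):
    """Estimate one PCA basis from the pooled training data of all speakers.

    features_by_speaker maps a speaker id to an (n_frames, dim) array.
    Returns the global mean, the leading eigenvalues, and the matching
    eigenvectors (as columns), sorted by decreasing eigenvalue.
    """
    pooled = np.vstack(list(features_by_speaker.values()))
    mean = pooled.mean(axis=0)
    cov = np.cov(pooled - mean, rowvar=False)   # global covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvals[order], eigvecs[:, order]


def project(frames, mean, basis):
    """Project feature frames onto the shared low-dimensional subspace."""
    return (frames - mean) @ basis


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for per-speaker feature vectors (dimension 4).
    data = {
        "spk1": rng.normal(size=(50, 4)),
        "spk2": rng.normal(loc=1.0, size=(60, 4)),
    }
    mean, eigvals, basis = global_pca(data, n_components=2)
    reduced = project(data["spk1"], mean, basis)
    print(basis.shape, reduced.shape)
```

Since a single basis is shared by every speaker, only one eigenvector matrix has to be stored, whereas a per-speaker PCA would need one matrix per enrolled speaker.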
The Effects of Reading Pronunciation Training of Korean Phonological Process Words for Chinese Learners
Lee, Yu-Ra ; Kim, Soo-Jin ;
Phonetics and Speech Sciences, volume 1, issue 1, 2009, Pages 77~86
This study examines how a combined intervention program affects the acquisition of reading pronunciation of Korean phonological process words, and how each phonological process rule is acquired, by four learners of Korean whose first language is Chinese. The training program combines a multisensory Auditory, Visual and Kinesthetic (AVK) approach, a holistic approach, and a metalinguistic approach. The purpose of the training is to evaluate how accurately the learners read words involving the phonological processes of fortisization, nasalization, lateralization, and the intermediate sound /ㅅ/ (/[...]/). We assess how they read untrained words that involve the four processes above. The intervention effects are analyzed with a multiple probe across subjects design. The results indicate that the intervention, which combines explanation of phonological process rules with word activities, affected the four Chinese subjects for every type of word. The implications of the study are as follows. First, it demonstrates the effect of Korean pronunciation intervention in a concrete way. Second, it offers a way to evaluate phonological processes and to train learners of Korean.
The Prosodic Characteristics of Utterance of Sentences with Ambiguous Word in Patients with Neurogenic Communication Disorders
Lee, Myoung-Soon ; Kwon, Do-Ha ;
Phonetics and Speech Sciences, volume 1, issue 1, 2009, Pages 87~91
The purpose of this study was to examine the prosodic characteristics of utterances of ambiguous sentences in patients with neurogenic communication disorders. Ambiguous words on which prosody may have an impact were used to investigate this matter. Tone duration, pitch and intensity were analyzed to examine the characteristics of prosody in patients with lesions in the left or right hemisphere and in normal controls. The whole process was recorded using Praat 4.3.14, and for statistical analysis, two-way ANOVA and multiple comparison analyses were carried out using SPSS 10.0 for Windows. The conclusions of this study are as follows. The length of the vowel in Korean homographs differed depending on the meaning, and the vowel duration was longest in patients with lesions in the left hemisphere. This agrees with the finding that such patients have a problem with the timing of prosody (Danly & Shapiro, 1982). On the other hand, patients with lesions in the right hemisphere were found to be deficient in pitch variability. Among the various acoustic parameters, this study focused on duration, which is closely related to the suprasegmental characteristics of prosody. More acoustic parameters should be taken into account in future studies.
Characteristics of Connected Speech in ADSD
Hwang, Yon-Shin ; Kim, Jae-Ok ; Choi, Hong-Shik ;
Phonetics and Speech Sciences, volume 1, issue 1, 2009, Pages 93~98
The aim of this study was to investigate the voice characteristics of adductor spasmodic dysphonia (ADSD) through electroglottographic and acoustic measurement at the sentence level. The clinical records of 86 female ADSD patients (age group of [...] years) and the control records of 86 normal females (age group of [...] years) were recorded with Speech Studio (Laryngograph Ltd., UK). An independent t-test was used to compare the ADSD and normal groups. The results were as follows. (1) Fundamental frequency (F0) was significantly lower in the ADSD group than in the normal group. (2) Irregularity of frequency and closed quotient (CQ) were significantly higher in the ADSD group than in the normal group. (3) Voiceless duration was longer and voiced duration was significantly shorter in the ADSD group than in the normal group. (4) Fricative duration was longer in the ADSD group than in the normal group, but the difference was not significant. In conclusion, a strained, tight and choked voice shows an increase in CQ, a tremulous voice shows an increase in irregularity of frequency, and a less feminine voice shows a decrease in F0. The increases in voiceless duration and fricative duration and the decrease in voiced duration are related to diminished speech intelligibility.
A Comparison of the Voice Differences of Patients with Idiopathic Parkinson's Disease and a Normal-Aging Group
Kang, Young-Ae ; Kim, Yong-Duk ; Ban, Jae-Chun ; Seong, Cheol-Jae ;
Phonetics and Speech Sciences, volume 1, issue 1, 2009, Pages 99~107
In view of the hypothesis that the effects of Parkinson's disease on voice production can be detected before pharmacological intervention, the voice differences between patients with idiopathic Parkinson's disease and a healthy aging group were analyzed diagnostically, with the long-term objective of establishing early disease-progression biomarkers for clinical purposes. Fifteen patients with idiopathic Parkinson's disease (prior to pharmacological intervention) and a healthy control group of 15 were selected, and every voice was recorded three times using Praat (ver. 5022) with a headset microphone. The relevant parameters - acoustic measures of /a/ phonation, F0 related parameters, MPT related parameters, articulatory ratio, VOT - were then analyzed by MANOVA. Significant differences were found in the F0 related (low F0, high F0, F0 range) and MPT related parameters. There were also significant differences in acoustic measurements (intensity, shimmer, HNR, jitter), AMR (/[...]/) and VOT (/ta/). The findings indicated that the voice production of patients with idiopathic Parkinson's disease has normal pitch but poor quality. In particular, given the slow articulatory ratios and VOT values, tongue-tip function in the patients was poorer than in the healthy group.