REFERENCE LINKING PLATFORM OF KOREA S&T JOURNALS
Phonetics and Speech Sciences
The Korean Society of Speech Sciences
How Different are Learner Speech and Loanword Phonology?
Kim, Jong-Mi ;
Phonetics and Speech Sciences, volume 1, issue 3, 2009, Pages 3~18
Do loanword properties emerge in the acquisition of a foreign language and if so, how? Classic studies in adult language learning assumed loanword properties that range from near-ceiling to near-chance level of appearance depending on speech proficiency. The present research argues that such variations reflect different phonological types, rather than speech proficiency. To investigate the difference between learner speech and loanword phonology, the current research analyzes speech data from 92 Korean speakers at five different proficiency levels who read 19 pairs of English words and sentences that contained loanwords. The experimental method is primarily acoustic: the phonological cause in the loanwords (e.g., the insertion of an epenthetic vowel at the end of the word stamp) is tested for its appearance in learner speech, in comparison with native speech from 11 English speakers and 11 Korean speakers. The data investigated cover segment deletion, insertion, substitution, and alternation in both learner speech and native speech. The results indicate that learner speech does not present the loanword properties in many cases, but depends on the types of phonological causes. The relatively easy acquisition of target pronunciation is evidenced in the cases of segment deletion, insertion, substitution, and alternation, except when the loanword property involves the successful command of the target phonology such as the de-aspiration of [p] in apple. Such a case of difficult learning draws a sharp distinction from the cases of easy learning in the development of learner speech, particularly beyond the intermediate level of proficiency. Overall, learner speech departs from loanword phonology and develops toward the native speech value, depending on phonological contrasts in the native and foreign languages.
The Duration Feature of Acoustic Signals and Korean Speakers' Perception of English Stops
Kim, Mun-Hyong ; Jun, Jong-Sup ;
Phonetics and Speech Sciences, volume 1, issue 3, 2009, Pages 19~28
This paper reports experimental findings about the duration feature of the acoustic components of English stops in Korean speakers' voicing perception. In our experiment, 35 participants discriminated between recorded stimuli and digitally transformed stimuli with different duration features from the original stimuli. 72 sets of paired stimuli are generated to test the effects of the duration feature in various phonetic contexts. The result of our experiment is a complicated cross-tabulation with 540 cells defined by five categorical independent variables plus one response variable. To find a meaningful generalization out of this complex frequency table, we ran logit log-linear regression analyses. Surprisingly, we have found that there is no single effect of the duration feature in all phonetic contexts on Korean speakers' perception of the voicing contrasts of English stops. Instead, the logit log-linear analyses reveal that there are interaction effects among phonetic contexts (=C), the places of articulation of stops (=P), and the voicing contrast (=V), and among duration (=T), phonetic contexts, and the places of articulation. To put it in mathematical terms, the distribution of the data can be explained by a simple log-linear equation, logF=
The Role of Contrast in Prosodically Induced Acoustic Variation
Choi, Han-Sook ;
Phonetics and Speech Sciences, volume 1, issue 3, 2009, Pages 29~37
This paper presents results from speech production experiments on English, Korean, and Hindi that compare variation in the acoustic expression of dissimilar phonological laryngeal contrast in stops conditioned by prosodic prominence. Target stops are analyzed from utterance-initial, -medial, and -final positions, with a variation in contrastive focal accent, from the speech data by six male American English speakers, five male Seoul Korean speakers, and five male Delhi Hindi speakers. The results show that prosodic prominence conditions enhanced distinctiveness between contrastive segments in the three languages. The manner in which prosodic prominence and prosodic phrase structure is marked at the level of segmental variation is, however, found to be language-specific to some extent. In addition, a correlation between the size of the phonological inventory and the corresponding acoustic variation was found but the linear correlation was not strongly supported with the findings in the present study.
Voice Onset Time of Korean Stops as a Function of Speaking Rate
Oh, Eun-Jin ;
Phonetics and Speech Sciences, volume 1, issue 3, 2009, Pages 39~48
Previous studies on the effects of speaking rate on voice onset time (VOT) of stops in English, French, Icelandic, and Thai indicate that speaking rate asymmetrically affects VOT values. That is, pre-voiced and long-lag stops vary due to the rate factor more than short-lag stops do. One suggested explanation for this asymmetry is that it is due to the necessity of maintaining phonetic contrasts among the stop categories. Since pre-voiced and long-lag stops represent the ends of the VOT scale, they encompass broad swathes of that range and consequently allow for large variations. On the other hand, the VOT variations of short-lag stops may result in overlap with the VOTs of long-lag stops. This study aimed to explore the effects of speaking rate on the VOTs of Korean stops and see whether Korean fortis and lenis stops are limited in the degrees of variation as a function of rates due to the existence of stops with larger VOT values, lenis and aspirated stops respectively. Conversely, aspirated stops were expected to show more variation since there are no other categories with longer VOTs. Fortis, lenis, and aspirated stops in /CVn/ words (C = bilabial or velar stop, V = /i/ or /a/) were examined in isolation, and at normal and fast rates in a carrier sentence. Speaking rates were controlled by alternating words or sentences on a computer screen at intervals of two seconds for the isolation- and normal-rate conditions and one second for the fast-rate condition. This study found that while the VOTs of fortis stops did not change significantly, those of lenis and aspirated stops showed considerable changes as a function of speaking rates. Also, overlap between lenis and aspirated stops occurred considerably at all speaking rates. These phenomena were interpreted to relate to the fact that VOT contrasts between lenis and aspirated stops in Korean are currently being collapsed. 
Large variations of lenis stops as a function of rates seem to occur due to a weak motivation to limit the degree of variations for the purpose of maintaining phonetic contrasts. The significant overlap between lenis and aspirated stops at all rates was interpreted to occur because the VOT merger between the two categories became considerably fixed. Also the percentage of correctly-classified VOTs by optimal-boundary values between lenis and aspirated stops turned out to be lower than in previously-studied languages. This was interpreted to be further evidence that VOTs are losing their role in contrasting the two stop categories in Korean.
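The optimal-boundary classification mentioned in the abstract above can be illustrated with a minimal sketch. It is not the author's procedure: the function name, the single-threshold rule (lenis below the boundary, aspirated at or above it), and the choice of midpoints as candidate boundaries are all assumptions made for illustration.

```python
def optimal_boundary(lenis_vots, aspirated_vots):
    """Return (boundary, percent_correct) for a single VOT threshold.

    Hypothetical helper: tokens with VOT below the boundary count as
    lenis, tokens at or above it as aspirated. Assumes the two lists
    together contain at least two distinct VOT values.
    """
    values = sorted(set(lenis_vots) | set(aspirated_vots))
    # Candidate boundaries: midpoints between adjacent observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    total = len(lenis_vots) + len(aspirated_vots)
    best_boundary, best_correct = None, -1
    for boundary in candidates:
        correct = sum(v < boundary for v in lenis_vots)
        correct += sum(v >= boundary for v in aspirated_vots)
        if correct > best_correct:  # keep the first boundary on ties
            best_boundary, best_correct = boundary, correct
    return best_boundary, 100.0 * best_correct / total
```

A low percentage returned for two categories would correspond to the heavy VOT overlap the abstract describes.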
Rhythmic Differences between Spontaneous and Read Speech of English
Kim, Sul-Ki ; Jang, Tae-Yeoub ;
Phonetics and Speech Sciences, volume 1, issue 3, 2009, Pages 49~55
This study investigates whether rhythm metrics can be used to capture the rhythmic differences between spontaneous and read English speech. Transcription of spontaneous speech tokens extracted from a corpus is read by three English native speakers to generate the corresponding read speech tokens. Two data sets are compared in terms of seven rhythm measures that are suggested by previous studies. Results show that there is a significant difference in the values of vowel-based metrics (VarcoV and nPVI-V) between spontaneous and read speech. This manifests a greater variability in vocalic intervals in spontaneous speech than in read speech. The current study is especially meaningful as it demonstrates a way in which speech styles can be differentiated and parameterized in numerical terms.
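The two vowel-based metrics named in the abstract above can be sketched as follows, using their commonly cited definitions: VarcoV as the standard deviation of vocalic interval durations divided by their mean (times 100), and nPVI-V as the mean normalized difference between successive vocalic intervals (times 100). The function names and the use of the population standard deviation are assumptions; the paper's exact computation is not given here.

```python
import statistics


def varco(durations):
    """Rate-normalized variability: 100 * SD / mean of the durations.

    Assumption: population standard deviation (pstdev) is used.
    """
    return 100.0 * statistics.pstdev(durations) / statistics.mean(durations)


def npvi(durations):
    """Normalized Pairwise Variability Index over successive intervals.

    Each pairwise difference is divided by the mean of the pair, so the
    index is less sensitive to overall speaking rate. Requires at least
    two intervals.
    """
    pairs = zip(durations, durations[1:])
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100.0 * sum(terms) / len(terms)
```

Both functions take a list of vocalic interval durations in any consistent unit, since each metric is a ratio.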
Korean Speakers' Perception of Hindi Stop Consonants
Ahn, Hyun-Kee ;
Phonetics and Speech Sciences, volume 1, issue 3, 2009, Pages 57~63
The two specific research questions pursued in this paper are: (i) how Korean speakers perceive Hindi stops in terms of the three laryngeal categories of Korean stops; (ii) how well Korean speakers do with an ABX perception test that utilizes a total of 52 Hindi minimal pairs where all sounds are identical except for the laryngeal features of a stop in each word. A total of 45 university students participated in this experiment. The results showed that (i) Koreans tended to perceive Hindi voiceless unaspirated stops as Korean fortis ones, voiceless aspirated stops as aspirated ones, voiced stops as lenis ones, and breathy stops as aspirated ones, and (ii) Koreans had difficulty in distinguishing between voiceless aspirated and breathy stops in Hindi.
Formant Trajectories of English Vowels Produced by American Males
Yang, Byung-Gon ;
Phonetics and Speech Sciences, volume 1, issue 3, 2009, Pages 65~72
Formant values are the most important acoustic correlates of English vowels. Classical studies on English vowels reported the first three formant values measured at a single timepoint on a sustained vowel segment. However, many recent studies revealed that partial onset or offset segments carrying information on dynamic spectral changes may contribute to the exact identification of English vowels with an accuracy almost comparable to that of the whole vowel segment or word. The purpose of this study was to examine formant trajectories of nine English vowels collected by Hillenbrand et al. (1995). Acoustic analysis was systematically made by a Praat script at six equidistant timepoints over the vowel segment. Results showed that the first formant trajectories played an important role in distinguishing each vowel within the front- or back-vowel groups. The second formant trajectories of the back vowels varied more drastically than those of the front vowels. The third formant value was similar across vowels except for the high vowel /i/. In the vowel space on F1 by F2 axes, the formant trajectories of each vowel clearly showed a transition toward the locus of the following consonant /d/. Other acoustic data revealed that there were some vowel-inherent duration or pitch values. From this study we can conclude that the dynamic spectral changes are very important in specifying acoustic characteristics of the English vowels. Further studies on vowels and diphthongs in different contexts are desirable.
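As a small illustration of sampling a vowel at six equidistant timepoints, the sketch below computes the measurement times only. The study itself used a Praat script; this Python helper is hypothetical, and placing the points strictly inside the segment (excluding the boundaries) is an assumption, since the abstract does not say whether the endpoints were included.

```python
def equidistant_timepoints(start, end, n=6):
    """Return n equally spaced times strictly inside [start, end].

    Assumption: the segment is divided into n + 1 equal parts and the
    n interior division points are used, so no measurement falls on the
    vowel boundaries themselves.
    """
    step = (end - start) / (n + 1)
    return [start + step * (i + 1) for i in range(n)]
```

Formant values would then be read at each returned time with whatever analysis tool is in use.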
Synthesis and Evaluation of Prosodically Exaggerated Utterances
Yoon, Kyu-Chul ;
Phonetics and Speech Sciences, volume 1, issue 3, 2009, Pages 73~85
This paper introduces the technique of synthesizing and evaluating human utterances with exaggerated or atypical prosody. Prosody exaggeration can be implemented by manipulating either the fundamental frequency (F0) contour, the segmental durations, or the intensity contour of an utterance. Of these three prosodic elements, two or more can be exaggerated at the same time. The algorithms of synthesis and evaluation were suggested. Learner utterances exaggerated in each of the three prosodic features were evaluated with respect to their original native versions in terms of the differences in their F0 contours, the segmental durations, and the intensity contours. The measure of differences was the Euclidean distance metric between the matching points in their F0 and intensity contours. The measure was calculated after the exaggerated learner utterances were aligned by the segments and rendered identical to their native version in terms of their segmental durations. For the evaluation of the segmental durations, no prior modifications were made in durations and the same measure was used. The results from the pilot experiment suggest the viability of this measure in the evaluation of learner utterances with atypical prosody with respect to their native versions.
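The distance measure described in the abstract above can be sketched as a Euclidean distance between two contours sampled at matching points. This is a minimal illustration under the assumption that the time alignment has already been done, so both contours have the same number of samples; the function name is hypothetical and the paper's full alignment procedure is not reproduced.

```python
import math


def contour_distance(learner, native):
    """Euclidean distance between two equal-length contours.

    Each list holds F0 (or intensity) values at matching points of the
    time-aligned learner and native utterances.
    """
    if len(learner) != len(native):
        raise ValueError("contours must be sampled at the same points")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(learner, native)))
```

A smaller value means the learner contour lies closer to the native one at the sampled points.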
A Study of the English Pronunciation of Korean Exchange Students
Park, Hee-Suk ;
Phonetics and Speech Sciences, volume 1, issue 3, 2009, Pages 87~93
The purpose of this experimental study is to investigate and compare the vowel lengths of English diphthongs and low vowels among native-English-speaking Americans and Korean college exchange students. To do this, eight words and sixteen sentences were uttered and recorded by nine subjects: five Korean subjects and four American subjects. Results showed that the lengths of English low vowels differed between American and Korean subjects, which may contribute to the foreign accent of Korean speakers. Comparing the average length of English low vowels of Korean subjects with that of American subjects, we can see that American subjects tend to pronounce the English low vowels longer than Korean subjects do. In the pronunciation of the diphthongs /eI/ and /ou/, Korean subjects pronounced them longer than American subjects did. However, in the pronunciation of the diphthongs /au/, /aI/, and /ɔI/, American subjects pronounced them longer than Korean subjects did.
Probabilistic Target Speech Detection and Its Application to Multi-Input-Based Speech Enhancement
Lee, Young-Jae ; Kim, Su-Hwan ; Han, Seung-Ho ; Han, Min-Soo ; Kim, Young-Il ; Jeong, Sang-Bae ;
Phonetics and Speech Sciences, volume 1, issue 3, 2009, Pages 95~102
In this paper, an efficient target speech detection algorithm is proposed for the performance improvement of multi-input speech enhancement. Using the normalized cross correlation value between two selected channels, the proposed algorithm estimates the probabilistic distribution function of the value from the pure noise interval. Then, log-likelihoods are calculated with the function and the normalized cross correlation value to detect the target speech interval precisely. The detection results are applied to the generalized sidelobe canceller-based algorithm. Experimental results show that the proposed algorithm significantly improves the speech recognition performance and the signal-to-noise ratios.
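The detection idea in the abstract above can be sketched roughly as follows. This is not the authors' algorithm: the zero-lag form of the normalized cross-correlation, the Gaussian model for its values in noise-only frames, the decision rule, and all function names are assumptions chosen to keep the illustration short.

```python
import math
import statistics


def ncc(x, y):
    """Zero-lag normalized cross-correlation of two equal-length frames."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0 else 0.0


def fit_noise_model(noise_ncc_values):
    """Fit a Gaussian (assumed form) to NCC values from noise-only frames."""
    mu = statistics.mean(noise_ncc_values)
    sigma = statistics.pstdev(noise_ncc_values)
    return mu, max(sigma, 1e-6)  # floor avoids division by zero


def log_likelihood(value, mu, sigma):
    """Log-likelihood of an NCC value under the fitted noise model."""
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (value - mu) ** 2 / (2 * sigma ** 2))


def is_target_speech(value, mu, sigma, threshold):
    """Flag a frame as target speech when it is unlikely to be pure noise.

    The threshold is a hypothetical tuning parameter.
    """
    return log_likelihood(value, mu, sigma) < threshold
```

Frames flagged this way could then gate the adaptation of a multi-channel enhancer, which is the role the abstract assigns to the detection results.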
Building English-to-Korean Transliteration Dictionary Based on Pronouncing Dictionary
Lee, Do-Gil ;
Phonetics and Speech Sciences, volume 1, issue 3, 2009, Pages 103~108
This paper proposes a method for building a transliteration dictionary based on pronunciation information extracted from two kinds of existing dictionaries. It also proposes a method for transforming the pronunciation information into Korean transliterated words. To express the pronunciation information, we define the Phoman code system. To avoid the phonetic estimation of English words, which is the most serious problem in transliteration, the proposed method uses the pronunciation information extracted from the existing dictionaries. Therefore, unlike previous approaches, the proposed method does not need any incomplete phonetic estimation process, so it can produce accurate transliteration results. The proposed method has been fully implemented.
Exclusion of Non-similar Candidates using Positional Accuracy based on Levenstein Distance from N-best Recognition Results of Isolated Word Recognition
Yun, Young-Sun ; Kang, Jeom-Ja ;
Phonetics and Speech Sciences, volume 1, issue 3, 2009, Pages 109~115
Many isolated word recognition systems may generate non-similar words as recognition candidates because they use only acoustic information. In this paper, we investigate several techniques that can exclude non-similar words from N-best candidate words by applying the Levenshtein distance measure. First, word distance methods based on phone and syllable distances are considered. These methods use either the plain Levenshtein distance on phones or a double Levenshtein distance algorithm on the syllables of candidates. Next, word similarity approaches are presented that use the position information of the characters in word candidates. Each character's position is labeled as an inserted, deleted, or correct position after alignment between the source and target strings. The word similarities are obtained from the characters' positional probabilities, that is, the frequency ratio with which the same characters are observed at a given position. The experimental results show that the proposed methods are effective for removing non-similar words from the N-best recognition candidates without loss of system performance.
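For reference, the plain edit distance that the abstract above builds on can be sketched with the standard dynamic-programming recurrence. This covers only the basic distance with unit costs, not the paper's double-distance or positional-probability methods; the function name is hypothetical.

```python
def levenshtein(source, target):
    """Minimum number of insertions, deletions, and substitutions.

    Works on any pair of sequences, e.g. strings of characters or
    lists of phone labels. Only two rows of the table are kept.
    """
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        curr = [i]
        for j, t in enumerate(target, start=1):
            cost = 0 if s == t else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]
```

Passing lists of phone labels instead of strings gives a phone-level distance between two candidate words.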
Acoustic Characteristics of Korean Stops in Korean Child-directed Speech
Kim, Min-Jung ;
Phonetics and Speech Sciences, volume 1, issue 3, 2009, Pages 117~122
A variety of cross-linguistic studies has documented that the acoustic properties of speech addressed to young children include exaggeration of pitch contours and acoustically salient features of phonetic units. It has been suggested that phonetic modifications of child-directed speech facilitate young children's learning of speech sounds by providing detailed phonetic information about the target word. While there are several studies reporting vowel modifications in speech to infants (i.e., hyper-articulated vowels), there has been little research about consonant modifications in speech to young children (except for VOT). The present study examines acoustic properties of Korean stops in Korean mothers' speech to their children (seven children aged 27 to 38 months). Korean tense, lax, and aspirated stops are all voiceless in word-initial position, and are perceptually differentiated by several acoustic parameters including VOT, the fundamental frequency (f0) of the following vowel, and the amplitude difference of the first and second harmonics at the voice onset of the following vowel. This study compares values of these parameters in Korean child-directed speech to those in adult-directed speech from the same speakers. Conclusions focus on the acoustic properties of Korean stops in child-directed speech and how they are modified to help young Korean children learn the three-way phonetic contrast.
Nasalance and Intensity of Profound Hearing-Impaired Adults
Choi, Eun-Ah ; Park, Han-Sang ; Seong, Cheol-Jae ;
Phonetics and Speech Sciences, volume 1, issue 3, 2009, Pages 123~132
This study investigates the differences in nasalance across handicap, gender, and vowels, and the correlation between nasal energy and oral energy, both of which are used to compute nasalance. For this study, 20 hearing-impaired adults and 20 normal-hearing adults as a control group were asked to read 7 Korean vowels (including /o, u, ɯ, i/). Subjects' readings were recorded by NasalView and analyzed by Praat. Results showed that the hearing-impaired group (HL) has a significantly higher nasalance than the normal-hearing group (NH), and that there was a significant positive correlation between nasal energy and oral energy. The higher nasalance of the hearing-impaired group seems to be due to improper velopharyngeal control caused by a lack of proper auditory feedback.
A Comparison of Resonance Parameters before and after Pharyngeal Flap Surgery:A Preliminary Report
Kang, Young-Ae ; Kang, Nak-Heon ; Lee, Tae-Yong ; Seong, Cheol-Jae ;
Phonetics and Speech Sciences, volume 1, issue 3, 2009, Pages 133~144
Pharyngeal flap surgery changes the space and shape of the oral cavity and vocal tract, and these changing conditions bring resonance change. The purpose of this study was to determine the most reliable and valuable parameters for evaluating hypernasality to distinguish two patients before and after pharyngeal flap surgery. Each patient was asked to clearly speak the vowels /a/, /i/, /u/, /e/, /o/ for voice recording. There were nine parameters for each vowel: Formant (F1, F2, F3), Bandwidth (BW1, BW2, BW3), LPC energy slope (|(A2-A1)/(F2-F1)|), and Band Energy (0-500 Hz, 500-1000 Hz). From the results of discriminant analyses on the acoustic parameters, the vowels /a/ and /e/ appeared to be insignificant, but the vowels /i/, /u/, and /o/ appeared to be efficient in the separation. Recognition scores of 95%, 100%, and 100% were reached when the vowels /i/, /u/, and /o/ were analyzed, respectively. The results showed that F2, BW3, and LPC slope are more important parameters than the others. Finally, a least-squares fit showed a relation between the perceptual evaluation score and the LPC energy slope.
The Acoustic Characteristics of Transgenders' Voice
Yoo, Jae-Yeon ;
Phonetics and Speech Sciences, volume 1, issue 3, 2009, Pages 145~151
The purpose of this study was to investigate the acoustic characteristics of transgenders' voice. This study obtained acoustic measurements (F0, jitter, shimmer, and NHR) from 45 subjects (15 male adults, 15 female adults, and 15 male-to-female transgender adults) and compared the measurements of the vowel /a/ produced by the 3 groups. The MDVP was used to measure the acoustic parameters. A one-way ANOVA was used for statistical analysis. The results were as follows. First, there was a significant difference among the 3 groups in F0: the F0 of the transgender group was higher than that of the male group and lower than that of the female group. Second, there was a significant difference between the male and transgender groups in jitter and shimmer, such that the transgender group tended to phonate more roughly than the male group.
Resonance Changes in the External Auditory Canal Associated with the Ear Canal Volume
Choi, Ah-Hyun ; Lee, Mi-So ; Choi, Ah-Reum ; Heo, Seung-Deok ;
Phonetics and Speech Sciences, volume 1, issue 3, 2009, Pages 151~154
The external ear generates resonance gain because of its anatomical characteristics. The ear canal resonance is influenced by the length and volume of the ear canal, the pinna, the concha cavity, the body trunk, and the speed of the sound wave. This study focuses on the influence of the ear canal volume. Seventeen healthy adults (32 ears) with no history of medical or ear disease participated. The maximum resonance frequency of the ear canal was 2675 Hz and 2784 Hz at the two azimuths measured, and the resonance gain was 18.1 dB and 17.9 dB, respectively. The ear canal volume was 0.78 cc, and 1.32 cc including static compliance. The ear canal resonance changed depending on the ear canal volume, and a statistically significant correlation was found (p=0.013). The resonance gain was not correlated with the ear canal volume. The change of resonance frequency according to the ear canal volume will be useful information in the field of audiological rehabilitation, especially for hearing aid fitting. In addition, we expect that this study can provide basic information for the study of the external ear resonance characteristics.
Automated Speech Analysis Applied to Sasang Constitution Classification
Kang, Jae-Hwan ; Yoo, Jong-Hyang ; Lee, Hae-Jung ; Kim, Jong-Yeol ;
Phonetics and Speech Sciences, volume 1, issue 3, 2009, Pages 155~163
This paper introduces an automatic voice classification system for the diagnosis of individual constitution based on Sasang Constitutional Medicine (SCM) in Traditional Korean Medicine (TKM). To develop this algorithm, we used the voices of 473 speakers and extracted a total of 144 speech features from speech data consisting of five sustained vowels and one sentence. The classification system, based on a rule-based algorithm derived from a non-parametric statistical method, presents binary negative decisions. In conclusion, 55.7% of the speech data were diagnosed by this system, of which 72.8% were correct negative decisions.
A Study on the Speech Rates of 5- to 7-Year-old Children Depending upon their Tasks
Shin, Myung-Sun ; Ahn, Jong-Bok ;
Phonetics and Speech Sciences, volume 1, issue 3, 2009, Pages 163~168
This study investigated the determination of speech rates, words per minute (WPM) and syllables per minute (SPM), of 5- to 7-year-old normal children to understand if there are any differences in the rates according to the children's age and sex. All participants were required to conduct story retelling tasks (SRT) and picture description tasks (PDT). In SRT, there was a significant difference between the groups of 5-year-old and 7-year-old children on WPM. However, there was no significant difference between the age groups regarding SPM. In addition, there was no significant difference between the groups according to sex on WPM and SPM. In PDT, there was no significant difference between the groups according to their ages and sex on WPM and SPM. The current research found that the speech rates of the preschool children might be somewhat different in their utterance abilities according to their age, but there was no obvious difference according to their sex. The findings can advance development of a clinical tool to screen children with fluency disorders and to determine the steps in establishing speech rates of children in the language development period.
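The two rate measures used in the abstract above are simple ratios, sketched below. The function name is hypothetical, and how words and syllables were counted in the study (for example, whether fillers or repetitions were excluded) is not stated, so the counts are taken as given.

```python
def rate_per_minute(count, duration_seconds):
    """Units (words or syllables) produced per minute of speech.

    WPM uses a word count and SPM a syllable count over the same
    sample duration.
    """
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return count * 60.0 / duration_seconds
```

For example, 90 words in a 45-second sample gives `rate_per_minute(90, 45)`, i.e. 120 WPM.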