REFERENCE LINKING PLATFORM OF KOREA S&T JOURNALS
Phonetics and Speech Sciences
The Korean Society of Speech Sciences
Strong (stressed) syllables in English and lexical segmentation by Koreans
Kim, Sun-Mi ; Nam, Ki-Chun ;
Phonetics and Speech Sciences, volume 3, issue 1, 2011, Pages 3~14
It has been posited that native listeners of English use the Metrical Segmentation Strategy (MSS) to segment continuous speech. English native speakers tend to perceive strong syllables as potential word onsets, because a high proportion of words in the English vocabulary begin with strong syllables. This study investigates whether Koreans employ the same strategy when segmenting English speech input. Word-spotting experiments were conducted using vowel-initial and consonant-initial bisyllabic targets embedded in nonsense trisyllables in Experiments 1 and 2, respectively. The effect of strong syllables was significant in the reaction time (RT) analysis but not in the error analysis. In both experiments, Korean listeners detected words more slowly when the word-initial syllable was strong (stressed) than when it was weak (unstressed). However, the error analysis showed no effect of initial stress in Experiment 1 or in the item (F2) analysis in Experiment 2. Only the subject (F1) analysis in Experiment 2 showed that participants made more errors when the word started with a strong syllable. These findings suggest that Korean listeners do not use the Metrical Segmentation Strategy for segmenting English speech. They do not treat strong syllables as word beginnings, but rather have difficulty recognizing words that start with a strong syllable. These results are discussed in terms of the intonational properties of Korean prosodic phrases, which have been found to serve as lexical segmentation cues in Korean.
A Study on the Correlation between English Word-final Stop and Vowel Duration Produced by Speakers of Korean
Kim, Ji-Eun ;
Phonetics and Speech Sciences, volume 3, issue 1, 2011, Pages 15~22
The purposes of this study are (1) to investigate the correlation between English word-final stops and the duration of the vowels preceding them and (2) to suggest a way to detect pronunciation errors and teach the pronunciation of English word-final stops. For these purposes, the productions of 18 Korean speakers were recorded, analysed using Speech Analyzer, and compared with those of native English speakers. In addition, two native English speakers evaluated the subjects' pronunciation. The major findings are as follows: the voicing-dependent effect on English vowels produced by native Korean speakers is smaller than that for native English speakers; Korean speakers release English word-final stops less often than native English speakers; and the pronunciation of English word-final stops and the duration of adjacent vowels are closely related, in that the pronunciation score for final stops is correlated with the ratio of vowel durations before voiced stops to those before voiceless stops. The study concludes with pedagogical suggestions that may be useful for English pronunciation teaching.
Formant Trajectories of English Vowels Produced by American Children
Yang, Byung-Gon ;
Phonetics and Speech Sciences, volume 3, issue 1, 2011, Pages 23~34
Many Korean children have difficulty learning English vowels. The gestures inside the oral and pharyngeal cavities are hard to control when the children cannot see them and when the target vowel system is quite different from that of Korean. This study collects the children's acoustic data for twelve English vowels published online by Hillenbrand et al. (1995) and examines the acoustic features of English vowels for the benefit of phoneticians and English teachers. The author used Praat to obtain the data systematically at six equidistant timepoints over the vowel segment, avoiding any obvious errors. First, the results show inherent acoustic properties of the vowels in the children's distributions of vowel duration, f0, and intensity values. Second, the children's gestures for each vowel coincide with the regression analysis of all formant values at different timepoints, regardless of differences in the vocal folds and vocal tract. Third, locus points appear higher than those of American males and females, while the gestures along the timepoints display very similar patterns. From these results the author concludes that vowel formant trajectories provide useful and important information on dynamic articulatory gestures, which may be applicable to teaching and correcting Korean children's English vowels. Further developmental studies of vowel formants and pitch values are desirable.
Evaluation of Teaching English Intonation through Native Utterances with Exaggerated Intonation
Yoon, Kyu-Chul ;
Phonetics and Speech Sciences, volume 3, issue 1, 2011, Pages 35~43
The purpose of this paper is to evaluate the viability of employing the intonation exaggeration technique proposed in  in teaching English prosody to university students. Fifty-six female university students, twenty-two in a control group and the other thirty-four in an experimental group, participated in a teaching experiment as part of their regular coursework for a five-and-a-half-week period. For the study material of the experimental group, a set of utterances was synthesized whose intonation contours had been exaggerated, whereas the control group was given the same set without any intonation modification. Recordings were made both before and after the teaching experiment, and one sentence set was chosen for analysis. The parameters analyzed were the pitch range, the words containing the highest and lowest pitch points, and the 3-dimensional comparison of the three prosodic features. An AXB test and a subjective rating test were also performed, along with a qualitative screening of the individual intonation contours. The results showed that the experimental group performed slightly better, in that their intonation contours were more similar to that of the model native speaker's utterance. This suggests that the intonation exaggeration technique can be employed in teaching English prosody to students.
An Analysis of Formants Extracted from Emotional Speech and Acoustical Implications for the Emotion Recognition System and Speech Recognition System
Yi, So-Pae ;
Phonetics and Speech Sciences, volume 3, issue 1, 2011, Pages 45~50
Formant structure of speech associated with five different emotions (anger, fear, happiness, neutral, sadness) was analysed. Acoustic separability of vowels (or emotions) associated with a specific emotion (or vowel) was estimated using F-ratio. According to the results, neutral showed the highest separability of vowels followed by anger, happiness, fear, and sadness in descending order. Vowel /A/ showed the highest separability of emotions followed by /U/, /O/, /I/ and /E/ in descending order. The acoustic results were interpreted and explained in the context of previous articulatory and perceptual studies. Suggestions for the performance improvement of an automatic emotion recognition system and automatic speech recognition system were made.
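The abstract above does not say which formulation of the F-ratio was used. As a rough illustration only, the sketch below assumes one common definition: the variance of the class means divided by the average within-class variance. The function name and the formant values are hypothetical and are not taken from the paper.

```python
from statistics import mean, pvariance


def f_ratio(groups):
    """Variance of the class means divided by the mean within-class variance.

    `groups` is a list of classes (e.g. vowels), each a list of measurements
    of one acoustic feature (e.g. F1 in Hz). A larger value means the classes
    are easier to separate on that feature.
    """
    class_means = [mean(g) for g in groups]
    between = pvariance(class_means)
    within = mean(pvariance(g) for g in groups)
    return between / within


# Hypothetical F1 values (Hz) for three vowels in two speaking conditions.
well_separated = [[300, 310, 290], [700, 710, 690], [500, 510, 490]]
overlapping = [[480, 520, 500], [490, 530, 510], [470, 510, 490]]

print(f_ratio(well_separated))
print(f_ratio(overlapping))
```

Under this definition, a condition whose vowel clusters are tight and far apart yields a much larger ratio than one whose clusters overlap.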
The Interlanguage Speech Intelligibility Benefit for Listeners (ISIB-L): The Case of English Liquids
Lee, Joo-Kyeong ; Xue, Xiaojiao ;
Phonetics and Speech Sciences, volume 3, issue 1, 2011, Pages 51~65
This study investigates the interlanguage speech intelligibility benefit for listeners (ISIB-L), examining Chinese talkers' production of English liquids and its perception by native listeners and by non-native Chinese and Korean listeners. An Accent Judgment Task was conducted to measure non-native talkers' and listeners' phonological proficiency, and two proficiency groups (high and low) participated in the experiment. The English liquids /l/ and /r/ produced by Chinese talkers were considered in terms of position (syllable-initial and syllable-final), context (segment, word, and sentence), and lexical density (minimal vs. non-minimal pair) to see whether these factors play a role in ISIB-L. Results showed that both matched and mismatched interlanguage speech intelligibility benefits for listeners occurred except for initial /l/. Non-native Chinese and Korean listeners, though only those with high proficiency, were more accurate at identifying initial /r/, final /l/, and final /r/, but initial /l/ was significantly more intelligible to native listeners than to non-native listeners. There was evidence of contextual and lexical density effects on ISIB-L. No ISIB-L was demonstrated in the sentence context, but both matched and mismatched ISIB-L were observed in the word context; this finding held true only for high-proficiency listeners. Listeners recognized the targets better in the non-minimal pair (sparse density) environment than in the minimal pair (higher density) environment. These findings suggest that ISIB-L for English liquids is influenced by talkers' and listeners' proficiency, syllable position in association with L1 and L2 phonological structure, context, and word neighborhood density.
An Acoustic Analysis of the Aspiration Merger in Korean
Mi, Jang ;
Phonetics and Speech Sciences, volume 3, issue 1, 2011, Pages 67~75
In Korean, 'Aspiration Merger' is the process by which a heteromorphemic sequence of a lenis stop and /h/ becomes a single aspirated stop word-medially. However, the contrast between lenis stop-plus-/h/ and an underlying aspirated stop is maintained when the sequence spans a Phonological Phrase boundary. The phonetic properties of the two categories are compared by varying their position in the prosodic domain between APP (Across Phonological Phrase) and PPM (Phonological Phrase Medial) positions. In the results for noise duration and change of intensity, lenis stop-plus-/h/ shows a large difference between the APP and PPM positions. The noise duration comparison shows that the two categories are completely neutralized into an aspirated stop in the PPM position and that this complete neutralization is sensitive to prosodic phrasing.
The Acquisition of External Sandhi in a Second Language: Production of Obstruent Nasalization by Chinese Learners of Korean
Han, Jeong-Im ;
Phonetics and Speech Sciences, volume 3, issue 1, 2011, Pages 77~83
The present study reports the results of an acoustic study of nasal assimilation at word boundaries in Chinese-Korean interlanguage. Twelve Chinese learners of Korean and four Korean native speakers recorded obstruent#nasal sequences in noun compounds and verb phrases, and their production patterns were examined in detail. While the Chinese learners nasalized the word-final obstruents in only 11.7% of the obstruent#nasal sequences, the Korean native speakers showed complete nasalization of those sequences. However, there was a small but consistent effect of learning on the production of external sandhi in L2, as shown by differences in the rate of nasalization between the two proficiency groups of Chinese participants. On average, the intermediate-level learners nasalized the target stops at a rate of 16%, and the beginning-level learners showed a 7% nasalization rate. In addition, the context difference between noun compounds and verb phrases was found not to influence the nasalization pattern across word boundaries.
Two-step a priori SNR Estimation in the Log-mel Domain Considering Phase Information
Lee, Yun-Kyung ; Kwon, Oh-Wook ;
Phonetics and Speech Sciences, volume 3, issue 1, 2011, Pages 87~94
The decision directed (DD) approach is widely used to determine a priori SNR from noisy speech signals. In conventional speech enhancement systems with a DD approach, a priori SNR is estimated by using only the magnitude components and consequently follows a posteriori SNR with one frame delay. We propose a phase-dependent two-step a priori SNR estimator based on the minimum mean square error (MMSE) in the log-mel spectral domain so that we can consider both magnitude and phase information, and it can overcome the performance degradation caused by one frame delay. From the experimental results, the proposed estimator is shown to improve the output SNR of enhanced speech signals by 2.3 dB compared to the conventional DD approach-based system.
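For orientation, the conventional decision-directed estimator that the abstract above takes as its baseline can be sketched for a single frequency bin as follows. This is a minimal sketch of the classic magnitude-only DD recursion, not the proposed log-mel, phase-aware estimator. The Wiener gain, the smoothing constant `alpha`, the floor `xi_min`, and the toy input are all assumptions chosen for illustration.

```python
def dd_a_priori_snr(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Track the a priori SNR of one frequency bin across frames.

    noisy_power: per-frame noisy power |Y|^2 for the bin.
    noise_power: noise power estimate for the bin (assumed constant here).
    """
    xi_track = []
    prev_clean_power = None
    for y2 in noisy_power:
        gamma = y2 / noise_power        # a posteriori SNR of this frame
        ml_term = max(gamma - 1.0, 0.0)  # instantaneous estimate
        if prev_clean_power is None:
            xi = ml_term                 # no history in the first frame
        else:
            # Weighted mix of the previous frame's clean-speech estimate
            # and the current instantaneous estimate.
            xi = alpha * prev_clean_power / noise_power + (1.0 - alpha) * ml_term
        xi = max(xi, xi_min)
        gain = xi / (1.0 + xi)           # Wiener gain (an assumed choice)
        prev_clean_power = (gain ** 2) * y2
        xi_track.append(xi)
    return xi_track


# Toy input: two noise-only frames, then a sudden speech onset.
xi = dd_a_priori_snr([1.0, 1.0, 11.0, 11.0, 11.0], noise_power=1.0)
print(xi)
```

Because `alpha` is close to 1, the estimate at the onset frame stays far below the instantaneous value of 10 and only catches up over the following frames. This lag behind the a posteriori SNR is the delay that two-step estimators aim to remove.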
Electroglottographic Measurements of Glottal Function in Voice according to Gender and Age
Ko, Do-Heung ;
Phonetics and Speech Sciences, volume 3, issue 1, 2011, Pages 97~102
Electroglottography (EGG) is a common method for providing non-invasive measurements of glottal activity. EGG has been used in vocal pathology as a clinical or research tool to measure vocal fold contact. This paper presents pitch, jitter, and closed quotient (CQ) measurements from the electroglottographic signals of young (mean = 22.7 years) and elderly (mean = 74.3 years) male and female subjects, with the aim of characterizing EGG data according to age and gender. The sustained corner vowels /i/, /a/, and /u/ were measured at around 70 dB SPL, since phonation intensity is the most notable of the EGG variables and showed a positive correlation with closed phase. In CQ, there was a significant difference between young and elderly female subjects, while there was no significant difference between young and elderly male subjects. The mean value for young males was higher than that for elderly males, while the mean value for young females was lower than that for elderly females. Thus, in terms of mean values, CQ increased with age for females and decreased with age for males. Although laryngeal degeneration due to increased age seems to occur to a lesser extent in females, the significant increase of CQ in elderly female voices could not be explained in terms of age-related physiological changes. For the standard deviation of pitch and for jitter, the mean values for young and elderly males were higher than those for young and elderly females. That is, male subjects showed higher mean values of these voice variables than female subjects, which could be considered a sign of vocal instability in males. These results may provide insights into the control and regulation of normal phonation and into the detection and characterization of pathology.
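The abstract does not define how jitter and CQ were computed. As a hedged illustration, the sketch below assumes two widely used definitions: local jitter as the mean absolute difference between consecutive glottal periods divided by the mean period, and CQ as the closed-phase duration divided by the period, averaged over cycles. The function names and cycle values are hypothetical.

```python
from statistics import mean


def local_jitter_percent(periods):
    """Mean absolute difference of consecutive periods, relative to the mean period, in percent."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return 100.0 * mean(diffs) / mean(periods)


def closed_quotient(closed_durations, periods):
    """Average fraction of each glottal cycle during which the vocal folds are in contact."""
    return mean(c / t for c, t in zip(closed_durations, periods))


# Hypothetical glottal cycles (milliseconds) segmented from an EGG signal.
periods_ms = [10.0, 10.2, 9.8, 10.0]
closed_ms = [5.0, 5.1, 4.9, 5.0]

print(local_jitter_percent(periods_ms))
print(closed_quotient(closed_ms, periods_ms))
```

Both measures operate on per-cycle durations, so they depend on how cycle boundaries and the contact threshold are detected in the EGG signal; that step is outside this sketch.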
Characteristics of Speech Breathing in de novo Idiopathic Parkinson's Disease during Passage Reading Tasks
Kim, Byung-Me ; Sohn, Young-Ho ; Baek, Seung-Jae ; Lee, Phil-Hyu ; Nam, Chung-Mo ; Lee, Ji-Eun ; Choi, Yae-Lin ;
Phonetics and Speech Sciences, volume 3, issue 1, 2011, Pages 103~110
The speech of patients with idiopathic Parkinson's disease (IPD) is characterized by hypokinetic dysarthria, which is possibly a consequence of impaired respiratory support. This study focused on the respiratory characteristics of speech breathing in de novo IPD patients who had received no prior vocal or anti-Parkinson treatment. A total of 40 subjects participated in the study: 20 de novo IPD patients between the ages of 50 and 80, and 20 normal subjects matched for age, height, and weight. Forced Expiratory Vital Capacity (FVC), Forced Expiratory Volume in 1 sec (FEV1), and FEV1 as a percentage of FVC (FEV1/FVC) were measured with a PC-based spirometer (Cosmed). In addition, Maximum Phonation Time (MPT), Mean Airflow Rate (MFR), Subglottal Pressure (Psub), and the number of syllables produced per breath were measured with a Phonatory Aerodynamic System (Kay PENTAX). All subjects were asked to read a standardized Korean paragraph, and the measurements were obtained from this task. Results indicated no statistically significant differences in respiratory function (FEV1/FVC%) or aerodynamic function between the two groups, but the number of syllables per breath was significantly lower in the IPD patient group than in the normal group and could be predicted by FVC and MFR. The study therefore suggests that airflow from the lungs (MFR) is used inefficiently during speech in de novo IPD patients.
A Study on the Relationship between the Self-reported Voice Problems and Voice Disorders in the Adult Populations
Byeon, Hae-Won ;
Phonetics and Speech Sciences, volume 3, issue 1, 2011, Pages 111~116
The purpose of this study was to analyze the association between self-reported voice problems and voice disorders in the Korean adult population. Data were collected from the 4th Korea National Health and Nutrition Examination Survey (2008) for 3,135 subjects (1,310 men and 1,825 women) aged 19 years and older. Multinomial logistic regression analyses were used to examine the association. After adjusting for covariates (age, sex, education level, job, smoking, alcohol drinking, thyroid disorders, and pain and discomfort during the last 2 weeks), self-reported voice problems were independently associated with functional voice disorders (OR=4.70, 95% CI: 3.14-7.03) and organic voice disorders (OR=3.89, 95% CI: 1.57-9.65). The results of the present study indicate that self-reported voice problems are valuable indicators of voice disorders. Further research is needed to ascertain the effect of self-reported voice problems on voice disorders in adults.
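The reported intervals can be read with a small worked check. Assuming they are Wald intervals, which are symmetric on the log-odds scale (the abstract does not state the interval type), the geometric mean of the two bounds should reproduce the odds ratio, and the width gives the standard error of the log odds ratio. The helper name below is hypothetical.

```python
import math


def ci_to_log_se(lower, upper, z=1.96):
    """Recover the log-OR standard error and the implied OR from a Wald 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    implied_or = math.sqrt(lower * upper)  # midpoint on the log scale
    return se, implied_or


# Intervals as reported in the abstract.
functional_se, functional_or = ci_to_log_se(3.14, 7.03)
organic_se, organic_or = ci_to_log_se(1.57, 9.65)

print(round(functional_or, 2), round(functional_se, 3))
print(round(organic_or, 2), round(organic_se, 3))
```

The implied odds ratios agree with the reported 4.70 and 3.89 to within rounding, and the much larger standard error for organic voice disorders reflects the wider interval for that outcome.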
The Effect of Signal-to-Noise Ratio on Sentence Recognition Performance in Pre-school Age Children with Hearing Impairment
Lee, Mi-Sook ;
Phonetics and Speech Sciences, volume 3, issue 1, 2011, Pages 117~123
Most individuals with hearing impairment have difficulty understanding speech in noisy situations. This study investigated sentence recognition ability in pre-school age children with cochlear implants and hearing aids, using the Korean Standard-Sentence Lists for Preschoolers (KS-SL-P2). The subjects were 10 pre-school age children with hearing aids, 12 pre-school age children with cochlear implants, and 10 pre-school age children with normal hearing. Three signal-to-noise ratio (SNR) conditions (+10 dB, +5 dB, 0 dB) were applied. For all pre-school age children with cochlear implants and hearing aids, sentence recognition scores increased significantly as SNR increased. The highest sentence recognition scores in speech noise were obtained at an SNR of +10 dB. Significant differences existed between groups in sentence recognition ability, with the cochlear implant group performing better than the hearing aid group. These findings suggest that a sentence recognition test using speech noise is useful for evaluating pre-school age children's listening skills.
Effects of Listener's Experience, Severity of Speaker's Articulation, and Linguistic Cues on Speech Intelligibility in Congenitally Deafened Adults with Cochlear Implants
Lee, Young-Mee ; Sung, Jee-Eun ; Park, Jeong-Mi ; Sim, Hyun-Sub ;
Phonetics and Speech Sciences, volume 3, issue 1, 2011, Pages 125~134
The current study investigated the effects of experience with deaf speech, severity of the speaker's articulation, and linguistic cues on the speech intelligibility of congenitally deafened adults with cochlear implants. Speech intelligibility was judged by 28 experienced listeners and 40 inexperienced listeners using a word transcription task. A three-way (2 × 2 × 4) mixed design was used, with experience with deaf speech (experienced/inexperienced listener) as a between-subject factor and the severity of the speaker's articulation (mild to moderate/moderate to severe) and linguistic cues (no/phonetic/semantic/combined) as within-subject factors. The dependent measure was the number of correctly transcribed words. Results revealed that all three main effects were statistically significant. Experienced listeners performed better on the transcription task than inexperienced listeners, and listeners were better at transcribing mild-to-moderate speakers than moderate-to-severe speakers. There were significant differences in speech intelligibility among the four types of cues, with the combined cues providing the greatest enhancement of the intelligibility scores (combined > semantic > phonological > no). Three two-way interactions were statistically significant, indicating that the type of cues and the severity of speakers differentiated experienced listeners from inexperienced listeners. The results suggest that a combination of linguistic cues increases the speech intelligibility of congenitally deafened adults with cochlear implants, and that experience with deaf speech is especially critical when evaluating the speech intelligibility of severe speakers compared to that of mild speakers.
The maximum phonation time and temporal aspects in Korean stops in children with spastic cerebral palsy
Jeong, Jin-Ok ; Kim, Deog-Yong ; Sim, Hyun-Sub ; Park, Eun-Sook ;
Phonetics and Speech Sciences, volume 3, issue 1, 2011, Pages 135~143
This study evaluated the respiratory capacity of children with spastic cerebral palsy, grouped by GMFCS (Gross Motor Function Classification System) level, and identified in these children the acoustic characteristics of the three types of Korean stops (stop consonants), which require temporal coordination of the laryngeal and supralaryngeal systems. Thirty-two children with dysarthria due to spastic cerebral palsy were divided into two subgroups: 14 children classified at GMFCS levels I~III were placed in Group I and 18 classified at GMFCS levels IV~V were placed in Group II. In addition, 18 children with normal speech were selected as the control group. Prolonged phonation of /a/ (sustained vowel /a/) and nine Korean VCV syllables were used. The acoustic characteristics examined were maximum phonation time (MPT), closure duration, and aspiration duration. The results were as follows: 1) The MPTs of the cerebral palsy (CP) groups, both Group I and Group II, were significantly shorter than those of the normal group. 2) The closure durations of the two CP groups were longer than those of the normal group for all 9 target syllables. 3) The aspiration durations of the two CP groups were longer than those of the normal group. 4) The closure durations of the normal group and CP Group I differed significantly among tense, aspirated, and lax stops. However, CP Group II differed from the normal group. 5) The aspiration durations of the normal group and CP Group I differed significantly among aspirated, tense, and lax stops. However, CP Group II differed from the normal group. 6) Place of articulation had less influence than manner of articulation on closure and aspiration durations.
A study of prosodic features of patients with idiopathic Parkinson's disease
Kang, Young-Ae ; Seong, Cheol-Jae ; Yoon, Kyu-Chul ;
Phonetics and Speech Sciences, volume 3, issue 1, 2011, Pages 145~151
In view of the hypothesis that the effects of Parkinson's disease on voice production can be detected before pharmacological intervention, the prosodic features of patients with idiopathic Parkinson's disease (IPD) and a healthy aging group were analyzed with the long-term objective of establishing, for clinical purposes, early disease-progression biomarkers. Twenty patients (8 male; 12 female) with IPD (prior to pharmacological intervention) and a healthy control group of 22 (10 male; 12 female) were selected. Ten sentences were recorded with a head-worn microphone, and one sentence was chosen for the analysis in this paper. The relevant parameters, i.e. a 3-dimensional model (F0, intensity, duration) and pitch- and intensity-related slopes (maxEnergy, maxF0, meanAbS, semiT, meanEnergy, meanF0), were analyzed by two-group discriminant analysis. The stepwise estimation method of discriminant analysis was performed separately by gender. The discriminant functions predicted 83.9% of the male test data correctly, while the prediction rate was 93.1% for the female group. The results showed that meanF0_slope and semiT_slope were more important than the other parameters for the male group. For the female group, meanEnergy_slope and maxEnergy_slope were the important ones. These findings indicate that the significant parameters differ between the male and female groups. Gender-related lifestyle differences may be responsible for this difference. The dysprosodic features of IPD appear not simultaneously but progressively in terms of F0, intensity, and duration.