REFERENCE LINKING PLATFORM OF KOREA S&T JOURNALS
Phonetics and Speech Sciences
The Korean Society of Speech Sciences
Volume & Issues
Volume 2, Issue 4 - Dec 2010
Volume 2, Issue 3 - Sep 2010
Volume 2, Issue 2 - Jun 2010
Volume 2, Issue 1 - Mar 2010
Generational Differences in the Perception of Korean Stops
Kang, Kyoung-Ho ;
Phonetics and Speech Sciences, volume 2, issue 3, 2010, Pages 3~10
The proposal that a sound change is occurring in Korean stops was evidenced in this study through identification experiments on Korean stops. Perceptual weight of acoustic correlates to Korean stop manner contrast [VOT (Voice Onset Time), H1-H2 (amplitude difference between the first and second harmonics), and F0 (Fundamental frequency)] was examined with re-synthesized / /, /ta/, and / / syllables for younger and older Seoul speakers of Korean. For the identification of the aspirated and lenis stops, F0 cue weight relative to VOT was greater for the younger listeners than the older listeners. For H1-H2 cue weight, the two listener groups were more or less the same. These findings were parallel to the production differences found in the earlier work of the author. Combined with production differences, these perception differences between younger and older generations of Seoul speakers suggested that there are generational differences in the phonetic targets of Korean aspirated and lenis stops and such differences are realized in the perception of the stops.
A Study on the Declination According to Length of Utterance, Clause Boundary and Focus in Korean
Kwak, Sook-Young ;
Phonetics and Speech Sciences, volume 2, issue 3, 2010, Pages 11~22
The present study attempts to investigate declination in Korean and its relevant aspects to the length of utterance, the clause boundary, and focus. More specifically, I examine the relation of declination with the length of utterance, the declination reset at the clause boundary, and the effect of focus on declination. Results showed that the length of utterance had no relation with the first and last pitch values of the utterance but that they were consistent regardless of the length of utterance. However, the declination slope changed to be relatively gentle from the fourth accentual phrase to the end of the whole intonational phrase. There was a reset of declination in such a way that the first pitch in the second phrase was always lower than that of the first phrase, but the first pitch in the third phrase was not always lower than that of the second phrase when the whole utterance was composed of three phrases. Finally, the pitch values of the focusing words decreased as their position went back in a sentence. One declination line was formed in the case of focused utterance, but in the case of an utterance that contained a clause boundary, a new declination line was formed at the start of each new clause. These findings can be applied to developing a Korean speech synthesizer that contains natural prosody; they can be also utilized for teaching Korean prosody.
A Study of the Effects of Vowels on the Production of English Labials /p, b, f, v/ by Korean Learners of English
Koo, Hee-San ;
Phonetics and Speech Sciences, volume 2, issue 3, 2010, Pages 23~27
The purpose of this study was to find how English vowels /a, e, i, o, u/ affect the production of English labials /p, b, f, v/ by Korean learners of English. Sixty syllables were composed of five vowels and four labials in the syllable types CV, VC, and VCV. The nonsense syllables were produced three times by nine subjects. The major results show that (1) in inter-vocalic position, the subjects had higher scores in producing /v/ with /a, e, o/ and /u/, while they had lower scores in producing /p/ with /i/ and /o/, (2) in post-vocalic position, the subjects had higher scores in producing /v/ and /f/ with /a, e/, and /o/, while they had lower scores in producing /b/ with /e/ and /i/, and (3) in pre-vocalic position, the subjects had higher scores in producing /v/ with /e, o, u/ and /f/ with /u/, while they had lower scores in producing /b/ with /e/, /i/ and /u/. The results suggest that on the whole, Korean learners of English have considerable difficulty in producing /p/ with /i/ in inter-vocalic position and /b/ with /i/ and /e/ in pre-vocalic position.
Korean Speakers' Pronunciation and Pronunciation Training of English Stops
Kim, Ji-Eun ;
Phonetics and Speech Sciences, volume 2, issue 3, 2010, Pages 29~36
The purposes of this study are (1) to see if language transfer effect is found in Korean speakers' pronunciation of English stops and to correct them and (2) to investigate the effectiveness of mimicry training and Speech Analyzer training on subjects' pronunciation of English stops. For these purposes, 20 Korean speakers' VOT values of English stops were measured using Speech Analyzer and their post-training production was compared with their pre-training production. The result shows that Korean speakers have no difficulty in correcting pronunciation errors of English voiceless stops and voiced stops and such a result indicates that language transfer effect is not noticed as expected. In addition, the result of pronunciation training shows that the training using Speech Analyzer is more effective than mimicry training.
An Acoustic Study of Korean and English Voiceless Sibilant Fricatives
Sung, Eun-Kyung ; Cho, Yun-Jeong ;
Phonetics and Speech Sciences, volume 2, issue 3, 2010, Pages 37~46
This study investigates acoustic characteristics of English and Korean voiceless sibilant fricatives as they appear before the three vowels, /i/, / / and /u/. Three measurements - duration, center of gravity and major spectral peak - are employed to compare acoustic properties and vowel effect for each fricative sound. This study also investigates the question of whether Korean sibilant fricatives are acoustically similar to the English voiceless alveolar fricative /s/ or to the palato-alveolar /ʃ/. The results show that in the duration of frication noise, English /ʃ/ is the longest and Korean lax /s/ the shortest of the four sounds. It is also observed that English alveolar /s/ has the highest value, whereas Korean /s/ shows the lowest value in the frequency of center of gravity. In terms of major spectral peak, while English /s/ reveals the highest frequency, English /ʃ/ shows the lowest value. In addition, evidence indicates that there is a strong vowel effect in the fricative sounds of both languages, although the vowel effect patterns of the two languages are inconsistent. For instance, in the major spectral peak, both Korean lax /s/ and tense /s͈/ show significantly higher frequencies before the vowel / / than before the other vowels, whereas both English /s/ and /ʃ/ exhibit significantly higher frequencies before the vowel /i/ than before the other vowels. These results indicate that Korean sibilant fricatives are acoustically distinct from both English /s/ and /ʃ/.
Glottal Characteristics of Word-initial Vowels in the Prosodic Boundary: Acoustic Correlates
Sohn, Hyang-Sook ;
Phonetics and Speech Sciences, volume 2, issue 3, 2010, Pages 47~63
This study provides a description of the glottal characteristics of the word-initial low vowels /a, æ/ in terms of a set of acoustic parameters and discusses glottal configuration as their acoustic correlates. Furthermore, it examines the effect of prosodic boundary on the glottal properties of the vowels, seeking an account of the possible role of prosodic structure based on prosodic theory. Acoustic parameters reported to indicate glottal characteristics were obtained from the measurements made directly from the speech spectrum on recordings of Korean and English collected from 45 speakers. They consist of two separate groups of native Korean and native English speakers, each including both male and female speakers. Based on the three acoustic parameters of open quotient (OQ), first-formant bandwidth (B1), and spectral tilt (ST), comparisons were made between the speech of males and females, between the speech of native Korean and native English speakers, and between Korean and English produced by native Korean speakers. Acoustic analysis of the experimental data indicates that some or all glottal parameters play a crucial role in differentiating the speech groups, despite substantial interspeaker variations. Statistical analysis of the Korean data indicates prosodic strengthening with respect to the acoustic parameters B1 and OQ, suggesting acoustic enhancement in terms of the degree of glottal abduction and the glottal closure during a vibratory cycle.
/W/-Variants in Korean
Oh, Mi-Ra ;
Phonetics and Speech Sciences, volume 2, issue 3, 2010, Pages 65~73
No systematic study has examined the relationship between acoustic variability and /w/-deletion in Korean. Most previous studies on /w/-deletion have described /w/-variants in categorical terms, i.e., /w/-deletion or a full glide (Silva 1991; Kang 1997; Yun 2005). These studies are based either on impressionistic judgements without a systematic acoustic analysis or on an exclusive examination of internal acoustic variability of /w/ such as F2, without examining the availability of external acoustic cues such as voice onset time (VOT) of a consonant. However, given the important influence of the adjacent sounds for segmental realizations, it is necessary to examine possible acoustic variability in the differentiation of /w/-variants. The present study aims to address this issue by evaluating the acoustic properties of /CwV/, including VOT and formant transitions. In the analysis, 432 tokens in word-initial position (216 /CwV/ words and 216 /CV/ words) were examined. The results indicated that /w/ exhibits four different variants. Firstly, /w/ is realized as a full glide. Such a variant is characterized by a VOT difference and significant differences in F1 and F2 at voicing onset compared with /CwV/ and /CV/. Secondly, /w/ can be maintained but coarticulated with the following vowel. Such a variant is demonstrated by differences in VOT and F2. Thirdly, /w/ is categorically deleted, which is indicated by the absence of any differences in VOT, F1, and F2. Fourthly, /w/ overlaps a consonant. The F2 difference without VOT difference is manifested in the variant. In contrast to VOT, F1, and F2 differences, pitch plays little role in determining /w/-variants in Korean. These findings suggest that allophones can be produced along a gradient continuum of acoustic cues, exhibiting sounds intermediate between the full realization of a given category and its deletion. Furthermore, each variant can be cued by a set of internal and external acoustic cues.
A Study of Acoustic Analysis in the Chinese' Korean Language Learners
Kim, Hyun-Ji ; You, Jae-Yeon ;
Phonetics and Speech Sciences, volume 2, issue 3, 2010, Pages 75~80
The present research investigated the characteristics of voice between genders and nationalities by measuring the acoustic parameter values of Korean and Chinese students. Sound Forge was used to collect voice samples and Praat was used to measure and analyze jitter, shimmer, NHR, F0, and pitch range. The results of this research are as follows. First, during prolongation of the vowels, there was no significant difference in F0 between Korean and Chinese males or between Korean and Chinese females. Korean males and females had higher F0 values than Chinese males and females. Secondly, during sentence reading, there was no significant difference between Korean and Chinese males in F0, but between the female groups there was a significant difference in F0. Thirdly, during sentence reading, the pitch range in Korean males was found to be narrower compared to Korean and Chinese females who had wider pitch range, showing a significant difference. Fourthly, jitter in the five vowels /a, i, u, e, o/ was found to be higher in Chinese than Korean subjects. In the vowels /a, e, u/ females were higher than males showing a significant difference. Fifthly, shimmer in the vowels /a, e, u/ was found to be higher in Chinese than Korean subjects showing a significant difference. Finally, NHR in the vowels /a, u, o/ was found to be higher in Chinese than Korean subjects showing a significant difference.
Compensation in VC and Word
Yun, Il-Sung ;
Phonetics and Speech Sciences, volume 2, issue 3, 2010, Pages 81~89
Korean and three other languages (English, Arabic, and Japanese) were compared with regard to the compensatory movements in a VC (Vowel and Consonant) sequence and word. For this, Korean data were collected from an experiment and the other languages' data from the literature. All the test words of the languages had the same syllabic structure, i.e., /CVCV(r)/, where C was an oral stop and intervocalic consonants were either bilabial or alveolar stops. The present study found that (1) Korean is most striking in the durational variations of segments (vowel and the following hetero-syllabic consonant); (2) unlike the three languages that show a constant sum of VC, Korean yields a three-way distinction in the length of VC according to the type (lax unaspirated vs. tense unaspirated vs. tense aspirated) of the following stop consonant; (3) a durational constancy is maintained up to the word level in the three languages, but Korean word duration varies as a function of the feature tenseness of the intervocalic consonants; (4) consonant duration is proven to differentiate Korean the most from the other languages. It is suggested that the durational difference between a lax consonant and its tense cognate(s) and the degree of compensation between V and C are determined by the phonology in each language.
Sensitivity to Phrase-initial Tone and Laryngeal Feature Identification of Foreign Learners of Korean
Lee, Hye-Sook ;
Phonetics and Speech Sciences, volume 2, issue 3, 2010, Pages 91~99
This paper reports on an identification test where KFL learners identified the Korean three-way laryngeal contrast in the phrase-initial position, when the phrase-initial tone was systematically manipulated. It turns out that heritage learners have some sensitivity to phrase-initial tone and show a plain-aspirated alternation in their identification according to the phrase-initial tone, as native speakers do, whereas non-heritage students do not show such tone sensitivity. However, after a weekly prosody training, second-year non-heritage students have shown a significant improvement in their performance. This paper clearly shows that the phrase-initial tone plays a critical role in distinguishing laryngeal features of Korean obstruents, and also suggests that prosody including the tone-segment correlation should be incorporated in the KFL curriculum.
Prosodic Boundary Effects on the V-to-V Lingual Movement in Korean
Cho, Tae-Hong ; Yoon, Yeo-Min ; Kim, Sa-Hyang ;
Phonetics and Speech Sciences, volume 2, issue 3, 2010, Pages 101~113
The present study investigated how the kinematics of the /a/-to-/i/ tongue movement in Korean would be influenced by prosodic boundary. The /a/-to-/i/ sequence was used as 'transboundary' test materials which occurred across a prosodic boundary as in /ilnjəʃ a/ # /minsakwae/ ('일년차#민사과에' 'the first year worker' # 'dept. of civil affairs'). It also tested whether the V-to-V tongue movement would be further influenced by its syllable structure with /m/ which was placed either in the coda condition (/am#i/) or in the onset condition (/a#mi/). Results of an EMA (Electromagnetic Articulography) study showed that kinematical parameters such as the movement distance (displacement), the movement duration, and the movement velocity (speed) all varied as a function of the boundary strength, showing an articulatory strengthening pattern of a "larger, longer and faster" movement. Interestingly, however, the larger, longer and faster pattern associated with boundary marking in Korean has often been observed with stress (prominence) marking in English. It was proposed that language-specific prosodic systems induce different ways in which phonetics and prosody interact: Korean, as a language without lexical stress and pitch accent, has more degrees of freedom to express prosodic strengthening, while languages such as English have constraints, so that some strengthening patterns are reserved for lexical stress. The V-to-V tongue movement was also found to be influenced by the intervening consonant /m/'s syllable affiliation, showing more preboundary lengthening of the tongue movement when /m/ was part of the preboundary syllable (/am#i/). The results, together, show that the fine-grained phonetic details do not simply arise as low-level physical phenomena, but reflect higher-level linguistic structures, such as syllable and prosodic structures. It was also discussed how the boundary-induced kinematic patterns could be accounted for in terms of the task dynamic model and the theory of the prosodic gesture (π-gesture).
The Effect of Acoustic Correlates of Domain-initial Strengthening in Lexical Segmentation of English by Native Korean Listeners
Kim, Sa-Hyang ; Cho, Tae-Hong ;
Phonetics and Speech Sciences, volume 2, issue 3, 2010, Pages 115~124
The current study investigated the role of acoustic correlates of domain-initial strengthening in lexical segmentation of a non-native language. In a series of cross-modal identity-priming experiments, native Korean listeners heard English auditory stimuli and made lexical decisions on visual targets (i.e., written words). The auditory stimuli contained critical two-word sequences which created temporary lexical ambiguity (e.g., 'mill#company', with the competitor 'milk'). There was either an IP boundary or a word boundary between the two words in the critical sequences. The initial CV of the second word (e.g., [kʌ] in 'company') was spliced from another token of the sequence in IP- or Wd-initial positions. The prime words were postboundary words (e.g., company) in Experiment 1, and preboundary words (e.g., mill) in Experiment 2. In both experiments, Korean listeners showed priming effects only in IP contexts, indicating that they can make use of IP boundary cues of English in lexical segmentation of English. The acoustic correlates of domain-initial strengthening were also exploited by Korean listeners, but significant effects were found only for the segmentation of postboundary words. The results therefore indicate that L2 listeners can make use of prosodically driven phonetic detail in lexical segmentation of L2, as long as the direction of those cues is similar in their L1 and L2. The exact use of the cues by Korean listeners was, however, different from that found with native English listeners in Cho, McQueen, and Cox (2007). The differential use of the prosodically driven phonetic cues by the native and non-native listeners is thus discussed.
Performance Improvement of Robust Speaker Verification According to Various Standard Deviations of a Reference Distribution in Histogram Transformation
Kwon, Chul-Hong ;
Phonetics and Speech Sciences, volume 2, issue 3, 2010, Pages 127~134
Additive noise and channel mismatch strongly degrade the performance of speaker verification systems, as they distort the features of speech. In this paper a histogram transformation technique is presented to improve the robustness of text-independent speaker verification systems. The technique transforms the features extracted from speech such that their histogram is conformed to a reference distribution. The effect of different standard deviations for the reference distribution is investigated. Experimental results indicate that, in channel mismatched environments, the proposed technique offers significant improvements over existing techniques. We also verify performance improvement of the proposed method using statistics.
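The transformation described in this abstract can be illustrated with a minimal sketch. It assumes a rank-based mapping of a single feature dimension onto a Gaussian reference; the function name `histogram_transform`, the mid-rank CDF estimate, and the sample values are illustrative assumptions, not details taken from the paper.

```python
from statistics import NormalDist

def histogram_transform(values, ref_mean=0.0, ref_std=1.0):
    """Map one feature dimension onto a Gaussian reference distribution.

    Each value is replaced by the reference quantile of its empirical
    rank, so the output histogram approximates N(ref_mean, ref_std**2).
    ref_std must be positive.
    """
    n = len(values)
    reference = NormalDist(mu=ref_mean, sigma=ref_std)
    # Indices of the values in ascending order (ties keep input order).
    order = sorted(range(n), key=lambda i: values[i])
    transformed = [0.0] * n
    for rank, idx in enumerate(order):
        # Mid-rank empirical CDF estimate stays strictly inside (0, 1).
        p = (rank + 0.5) / n
        transformed[idx] = reference.inv_cdf(p)
    return transformed

# Hypothetical feature values, used only to exercise the function.
features = [3.2, -1.0, 0.4, 7.5, 2.1]
narrow = histogram_transform(features, ref_std=0.5)
wide = histogram_transform(features, ref_std=2.0)
```

Changing `ref_std` only rescales the transformed values around the reference mean, which is the parameter whose effect the abstract says was investigated.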
Confidence Measure of Forensic Speaker Identification System According to Pitch Variances
Kim, Min-Seok ; Kim, Kyung-Wha ; Yang, IL-Ho ; Yu, Ha-Jin ;
Phonetics and Speech Sciences, volume 2, issue 3, 2010, Pages 135~139
Forensic speaker identification needs high accuracy and reliability. However, the current level of speaker identification does not meet this demand. Therefore, the confidence evaluation of results is one of the issues in forensic speaker identification. In this paper, we propose a new confidence measure for forensic speaker identification systems. It is based on pitch differences between the registered utterances of the identified speaker and the test utterance. In the experiments, we evaluate this confidence measure by speech identification tasks in various environments. The results show that the proposed measure can be a good indicator of whether the result is reliable or not.
Statistical Model-Based Voice Activity Detection Using Spatial Cues for Dual-Channel Noisy Speech Recognition
Shin, Min-Hwa ; Park, Ji-Hun ; Kim, Hong-Kook ; Lee, Yeon-Woo ; Lee, Seong-Ro ;
Phonetics and Speech Sciences, volume 2, issue 3, 2010, Pages 141~148
In this paper, voice activity detection (VAD) for dual-channel noisy speech recognition is proposed in which spatial cues are employed. In the proposed method, a probability model for speech presence/absence is constructed using spatial cues obtained from the dual-channel input signal, and a speech activity interval is detected through this probability model. In particular, spatial cues are composed of interaural time differences and interaural level differences of dual-channel speech signals, and the probability model for speech presence/absence is based on a Gaussian kernel density. In order to evaluate the performance of the proposed VAD method, speech recognition is performed for speech segments that only include speech intervals detected by the proposed VAD method. The performance of the proposed method is compared with those of several methods such as an SNR-based method, a direction of arrival (DOA) based method, and a phase vector based method. It is shown from the speech recognition experiments that the proposed method outperforms conventional methods by providing relative word error rate reductions of 11.68%, 41.92%, and 10.15% compared with the SNR-based, DOA-based, and phase vector based methods, respectively.
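For readers unfamiliar with the metric, a relative word error rate (WER) reduction is conventionally computed as the WER difference divided by the baseline WER. The sketch below shows that arithmetic; the function name and the WER values are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer, proposed_wer):
    """Relative WER reduction in percent: 100 * (baseline - proposed) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - proposed_wer) / baseline_wer

# Hypothetical WERs (not from the paper), chosen only to show the arithmetic.
baseline = 20.0
proposed = 15.0
reduction = relative_wer_reduction(baseline, proposed)
print(f"{reduction:.2f}% relative reduction")
```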
Effectiveness of Computer-Animated Pure Tone Audiometry for Screening
Kim, Young-Min ; Lee, Moo-Kyung ;
Phonetics and Speech Sciences, volume 2, issue 3, 2010, Pages 151~156
The purpose of this study was to develop a computer-animated pure tone audiometry for screening (CAPTAS) for toddlers and to determine its validity and reliability. The CAPTAS utilizes an animated cartoon story producing visual and auditory stimuli. The intensities were 40 dB, 60 dB, and 80 dB. The frequencies were 500 Hz, 1000 Hz, 2000 Hz, and 4000 Hz. The subjects were 20 severely hearing-impaired children (9 males and 11 females). The correlation between the mean hearing thresholds of the children who were able to perform PTA and their average CAPTAS hearing thresholds was computed, and it indicated high validity for CAPTAS. To verify test-retest reliability, all children took the CAPTAS and repeated it periodically. The results confirmed its reliability.
Phonation Threshold Flow and Phonation Threshold Pressure in Patients with Adductor Spasmodic Dysphonia
Choi, Seong-Hee ; Jiang, Jack J. ; Yun, Bo-Ram ; Lee, Ji-Yeoun ; Lim, Sung-Eun ; Choi, Hong-Shik ;
Phonetics and Speech Sciences, volume 2, issue 3, 2010, Pages 157~164
This study investigated the characteristics of two aerodynamic indices, PTP (phonation threshold pressure) and PTF (phonation threshold flow), in patients with ADSD (adductor spasmodic dysphonia) and examined whether these two new aerodynamic indices can differentiate between normal and ADSD groups. Additionally, PTP and PTF values were compared in terms of overall severity of ADSD in the patient group. The severity of ADSD was rated on a 7-point rating scale by two experienced speech language pathologists. The Kay Elemetrics Phonatory Aerodynamic System (PAS) (Kay Elemetrics Corp., Lincoln Park, NJ) was used to collect PTP and PTF measurements from 16 female normal subjects and 31 female patients with ADSD. Significantly lower PTF values (P<0.05) were observed in the ADSD group than in the normal controls. Significantly lower PTF values were also observed in severe ADSD patients (P<.001). However, PTP could not distinguish patients with ADSD from the control group (P=0.119) or distinguish among the ADSD groups according to severity (P=0.177). Consequently, PTF was more sensitive than PTP and might differentiate between normal speakers and ADSD patients and among different levels of severity within ADSD, suggesting that PTF could be a useful diagnostic parameter for measuring aerodynamic function in ADSD and for indicating neurolaryngeal dysfunction in patients with ADSD.
Acoustic Analysis with Moving Window in Normal and Pathologic Voices
Choi, Seong-Hee ; Lee, Ji-Yeoun ; Jiang, Jack J. ;
Phonetics and Speech Sciences, volume 2, issue 3, 2010, Pages 165~170
In this study, the most stable portion was identified using a 5% moving window during sustained /a/ phonation in normal and pathologic voice signals, and the perturbation values were compared between normal and pathologic voices at the mid-point and at the most stable portion identified by the moving window, respectively. The results revealed that some severe pathologic voice signals can be eligible for perturbation analysis by identifying the most stable portion with Err less than 10. In addition, the perturbation acoustic parameters did not differentiate the pathologic voice signals from the normal voice signals when the mid-point was selected for the perturbation analysis (p>0.05). However, significantly higher %shimmer and lower SNR values were observed in pathologic voices (p<0.05) when the most stable portion was selected by the moving window. In conclusion, the moving window could objectively identify the most stable portion, which makes it possible to get the minimum perturbation values (%jitter, %shimmer) and maximum SNR values. Thus, the moving window technique can be applied for more reliable and accurate perturbation acoustic analysis.
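A moving-window search of this kind can be sketched as follows. The abstract does not state the stability criterion, so the coefficient of variation used here, the function name `most_stable_window`, and the sample contour are all assumptions made for illustration only.

```python
from statistics import mean, pstdev

def most_stable_window(contour, fraction=0.05):
    """Return (start, end) of the window with the lowest relative variation.

    The window length is `fraction` of the contour length (at least 2
    samples). Stability is scored by the coefficient of variation, which
    is an assumed criterion; window means are assumed to be nonzero.
    """
    size = max(2, int(len(contour) * fraction))
    if size > len(contour):
        raise ValueError("contour is too short for the requested window")
    best_start, best_score = 0, float("inf")
    for start in range(len(contour) - size + 1):
        window = contour[start:start + size]
        score = pstdev(window) / abs(mean(window))
        # Keep the earliest window when several share the lowest score.
        if score < best_score:
            best_start, best_score = start, score
    return best_start, best_start + size

# Hypothetical per-frame values (e.g. an F0 contour in Hz).
contour = [100, 110, 95, 105, 100.0, 100.0, 120, 90]
segment = most_stable_window(contour, fraction=0.25)
```

Perturbation measures would then be computed on `contour[segment[0]:segment[1]]` rather than on a fixed mid-point segment.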
The Relationship Between Speech Intelligibility and Comprehensibility for Children with Cochlear Implants
Heo, Hyun-Sook ; Ha, Seung-Hee ;
Phonetics and Speech Sciences, volume 2, issue 3, 2010, Pages 171~178
This study examined the relationship between speech intelligibility and comprehensibility for hearing-impaired children with cochlear implants. Speech intelligibility was measured by orthographic transcription of the acoustic signal at the level of words and sentences. Comprehensibility was evaluated by examining listeners' ability to answer questions about the contents of a narrative. Speech samples were collected from 12 speakers (aged 6~15 years) with cochlear implants. For each speaker, 4 different listeners (a total of 48 listeners) completed 2 tasks: one task involved making orthographic transcriptions and the other involved answering comprehension questions. The results of the study were as follows: (1) Speech intelligibility and comprehensibility scores tended to increase as severity decreased. (2) Across all speakers, the relationship between speech intelligibility and comprehensibility scores was significant when severity was not considered. However, within severity groups, there was a significant relationship between comprehensibility and speech intelligibility only for the moderate-severe group. These results suggest that speech intelligibility scores measured by orthographic transcription may not accurately reflect how well listeners comprehend the speech of children with cochlear implants and that, therefore, measures of both speech intelligibility and listener comprehension should be considered in evaluating speech ability and information-bearing capability in speakers with cochlear implants.