REFERENCE LINKING PLATFORM OF KOREA S&T JOURNALS
Phonetics and Speech Sciences
Journal Basic Information
Journal DOI :
The Korean Society of Speech Sciences
Editor in Chief :
Volume & Issues
Volume 4, Issue 4 - Dec 2012
Volume 4, Issue 3 - Sep 2012
Volume 4, Issue 2 - Jun 2012
Volume 4, Issue 1 - Mar 2012
The Effect of Word Frequency and Neighborhood Density on Spoken Word Segmentation in Korean
Song, Jin-Young ; Nam, Ki-Chun ; Koo, Min-Mo ;
Phonetics and Speech Sciences, volume 4, issue 2, 2012, Pages 3~20
DOI : 10.13064/KSSS.2012.4.2.003
The purpose of this study was to investigate whether the segmentation unit for Korean nouns is the 'syllable' and whether the segmentation of spoken words occurs at the lexical level. A syllable monitoring task was administered that required participants to detect an auditorily presented target in visually presented words. In Experiment 1, the syllable neighborhood density of high-frequency words that can be segmented as both CV-CVC and CVC-VC was controlled. The syllable effect and the neighborhood density effect were significant, and the syllable effect emerged differently depending on syllable neighborhood density. Similar results were obtained in Experiment 2, which used low-frequency words. The effect of word frequency on the syllable effect was also examined. The results of Experiments 1 and 2 indicate that the segmentation unit for Korean nouns is indeed the 'syllable' and that this process can occur at the lexical level.
The Influence of Chinese Falling-Rising Tone on the Pitch of Sino-Korean Words Pronounced by Chinese Learners: Focusing on the Partly-Different-Form-Same-Meaning Words
Liu, Si Yang ; Kim, Young-Joo ;
Phonetics and Speech Sciences, volume 4, issue 2, 2012, Pages 21~31
DOI : 10.13064/KSSS.2012.4.2.021
The purpose of this study is to identify the influence of the Chinese falling-rising tone on the pitch patterns of partly-different-form-same-meaning Sino-Korean words produced by Chinese learners of Korean, and to examine how the falling-rising tone of the corresponding Chinese words affects the pitch patterns of the Sino-Korean words. The scope of this research is limited to Chinese learners of Korean and to two groups of Sino-Korean words, the AB:CB type and the AB:AC type, which are the second most frequently occurring types of different-form-same-meaning Sino-Korean words. In this study, Chinese learners pronounced both Chinese words and the corresponding Sino-Korean words. The learners' pitch patterns were recorded, analyzed with speech analysis software, and compared with the tones of the corresponding Chinese words. The results showed that AB:CB type Sino-Korean words were not affected by the Chinese 'falling-rising tone - high and level tone' pattern. Likewise, the Chinese falling-rising tone had no significant influence on the pitch patterns of AB:AC type Sino-Korean words. Nevertheless, the Chinese learners clearly made pitch errors on both AB:CB and AB:AC type words. In conclusion, the Chinese learners' pitch patterns for partly-different-form-same-meaning Sino-Korean words differ from those of native Korean speakers, but their pitch errors cannot be attributed to the Chinese falling-rising tone.
A Study on the Voice Onset Time of English Voiceless Stops in the Buckeye Corpus
Yoon, Kyu-Chul ;
Phonetics and Speech Sciences, volume 4, issue 2, 2012, Pages 33~40
DOI : 10.13064/KSSS.2012.4.2.033
The purpose of this paper is to investigate the voice onset time (VOT) of the English voiceless stops [p, t, k] in the Buckeye Corpus of Conversational Speech. Three young female speakers were chosen for this study, and their VOT values were semi-automatically extracted along with other factors. The factors used for the analysis were place of articulation, location in the word, syllabic stress, content versus function word, word frequency calculated from the corpus, and speech rate expressed in syllables per second. Results showed that, for each speaker and across the three places of articulation, all the factors had a statistically significant effect on VOT. The paper is significant in that the materials analyzed come from a corpus of spontaneous, natural English speech.
An Analysis of the Vowel Formants of the Young Males in the Buckeye Corpus
Yoon, Kyu-Chul ; Noh, Hye-Uk ;
Phonetics and Speech Sciences, volume 4, issue 2, 2012, Pages 41~49
DOI : 10.13064/KSSS.2012.4.2.041
The purpose of this paper is to extract the vowel formants of ten young male speakers from the Buckeye Corpus of Conversational Speech and to compare them with earlier works in terms of various phonetic factors that are expected to affect the formant distribution. The first two formant frequencies were automatically extracted with a Praat script, along with factors such as place of articulation, content versus function word, syllabic stress, location in the word, location in the utterance, speech rate over three consecutive words, and word frequency in the corpus. The results indicated that the formant patterns from the corpus differed considerably from those of earlier works, although the overall pattern was similar, and that these factors strongly affected the realization of the two formants.
A Study on the Detection and the Correction of Prosodic Errors Produced by Chinese Korean-Learners
Yune, Young-Sook ;
Phonetics and Speech Sciences, volume 4, issue 2, 2012, Pages 51~59
DOI : 10.13064/KSSS.2012.4.2.051
The purpose of this study is to examine the pitch patterns of Korean accentual phrases (APs) produced by Chinese learners of Korean when reading a Korean text. The Korean accentual phrase is defined by a specific F0 contour, and the pitch contour of an AP differs depending on its length and on the nature of its initial segment. To examine whether Chinese speakers are also aware of these phonetic properties, we examined the AP pitch contours produced by 15 Chinese speakers of differing proficiency and compared them with those produced by six native Korean speakers. The results show that the Chinese speakers' pitch errors occurred in the interaction between the initial segment and tone and in the type of pitch pattern. Moreover, even when the Chinese speakers produced the same type of pitch pattern as the native speakers, their internal tonal modulation differed. Finally, on the basis of these results, we propose a teaching method that visualizes the F0 contour.
Durational Interaction of Stops and Vowels in English and Korean Child-Directed Speech
Choi, Han-Sook ;
Phonetics and Speech Sciences, volume 4, issue 2, 2012, Pages 61~70
DOI : 10.13064/KSSS.2012.4.2.061
The current study examines the durational interaction of tautosyllabic consonants and vowels in word-initial position in English and Korean child-directed speech (CDS). The effect of phonological laryngeal contrasts in stops on the duration of the following vowel, the effect of intrinsic vowel duration on the release duration of the preceding stop, and the acoustic realization of the contrastive segments are explored in different prosodic contexts (phrase-initial vs. phrase-medial, focally accented vs. non-focused) in CDS, a marked speech style. The trade-off between Voice Onset Time (VOT), as consonant release duration, and voicing phonation time, as vowel duration, which has been reported for adult-to-adult speech, and patterns of durational variability are investigated in the CDS of two languages with different linguistic rhythms under systematically controlled prosodic contexts. Speech data were collected from four native English-speaking mothers and four native Korean-speaking mothers talking to their infants at the one-word stage. In addition to the acoustic measurements, the transformed delta measure is employed as a variability index for individual tokens. Results confirm a durational correlation between prevocalic consonants and the following vowels. The interaction takes a compensatory form, with longer VOTs followed by shorter vowels in both languages. The CV interaction is asymmetric: the effect of the consonant on vowel duration is greater than the VOT differences induced by the vowel. Prosodic effects are found in that the acoustic difference between contrastive segments is enhanced under focal accent, supporting the paradigmatic strengthening effect. Positional variation, however, does not systematically affect the measured acoustic quantities. Overall vowel and syllable durations are longer in English tokens but vary less across prosodic contexts.
The constancy of syllable duration, therefore, is not found to be more strongly sustained in Korean CDS. The stylistic variation of CDS is discussed in relation to a listener who is still undergoing linguistic development.
Extraction of Speech Features for Emotion Recognition
Kwon, Chul-Hong ; Song, Seung-Kyu ; Kim, Jong-Yeol ; Kim, Keun-Ho ; Jang, Jun-Su ;
Phonetics and Speech Sciences, volume 4, issue 2, 2012, Pages 73~78
DOI : 10.13064/KSSS.2012.4.2.073
Emotion recognition is an important technology in the field of human-machine interfaces. To apply speech technology to emotion recognition, this study aims to establish the relationship between emotional groups and their corresponding voice characteristics by investigating various speech features, including features related to both the speech source and the vocal tract filter. Experimental results show that the speech parameters that are statistically significant for classifying the emotional groups are mainly related to the speech source: jitter, shimmer, F0 (F0_min, F0_max, F0_mean, F0_std), harmonic parameters (H1, H2, HNR05, HNR15, HNR25, HNR35), and SPI.
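Jitter and shimmer, the first two source parameters listed above, quantify cycle-to-cycle perturbation of the glottal period and of the peak amplitude. The abstract does not say which variants were computed, so the Python sketch below assumes the common "local" definitions (mean absolute difference between consecutive cycles divided by the mean value). It also assumes per-cycle periods and peak amplitudes have already been measured by a separate pitch-marking step; the function names are hypothetical.

```python
from typing import Sequence


def _relative_mean_abs_diff(values: Sequence[float]) -> float:
    """Mean absolute difference between consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two consecutive cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


def local_jitter(periods_s: Sequence[float]) -> float:
    """Cycle-to-cycle period perturbation; periods are given in seconds."""
    return _relative_mean_abs_diff(periods_s)


def local_shimmer(peak_amplitudes: Sequence[float]) -> float:
    """Cycle-to-cycle peak-amplitude perturbation; amplitudes in linear units."""
    return _relative_mean_abs_diff(peak_amplitudes)


if __name__ == "__main__":
    # Hypothetical per-cycle measurements, for illustration only.
    periods = [0.0100, 0.0102, 0.0099, 0.0101]
    amplitudes = [0.80, 0.78, 0.82, 0.79]
    print(f"jitter (local):  {local_jitter(periods):.2%}")
    print(f"shimmer (local): {local_shimmer(amplitudes):.2%}")
```

Both measures are dimensionless ratios, so they are usually reported as percentages; a perfectly periodic, constant-amplitude signal gives 0 for both.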
Histogram Equalization Using Background Speakers' Utterances for Speaker Identification
Kim, Myung-Jae ; Yang, Il-Ho ; So, Byung-Min ; Kim, Min-Seok ; Yu, Ha-Jin ;
Phonetics and Speech Sciences, volume 4, issue 2, 2012, Pages 79~86
DOI : 10.13064/KSSS.2012.4.2.079
In this paper, we propose a novel approach to improving histogram equalization for speaker identification. Our method pools all speech features of the UBM training data to form a reference distribution. The rank of each feature vector is calculated within the sorted collection of the UBM training data and the test data, and these ranks are used to perform order-based histogram equalization. The proposed method improves the accuracy of the speaker recognition system on short utterances. We use four speech databases to evaluate the proposed speaker recognition system and compare it with cepstral mean normalization (CMN), mean and variance normalization (MVN), and histogram equalization (HEQ). Our system reduced the error rate by 33.3% relative to the baseline system.
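The order-based equalization described above can be sketched roughly as follows. This is an illustrative assumption, not the authors' implementation: each feature dimension of a test utterance is ranked within the pooled, sorted collection of background (UBM) features and test features; the rank is turned into an empirical CDF value; and that value is mapped through the inverse CDF of a standard normal target distribution. The Gaussian target and the name `rank_based_heq` are hypothetical choices made for the sketch.

```python
from statistics import NormalDist

import numpy as np


def rank_based_heq(test_feats, ref_feats):
    """Order-based histogram equalization against a pooled reference (sketch).

    test_feats: array of shape (T, D), frames of one test utterance.
    ref_feats:  array of shape (N, D), pooled frames of background (UBM) data.
    Returns an array of shape (T, D) with each dimension mapped toward N(0, 1).
    """
    test = np.asarray(test_feats, dtype=float)
    ref = np.asarray(ref_feats, dtype=float)
    # Pool reference and test frames, then sort each dimension independently.
    pooled = np.sort(np.concatenate([ref, test], axis=0), axis=0)
    n = pooled.shape[0]
    to_gaussian = NormalDist().inv_cdf
    out = np.empty_like(test)
    for d in range(test.shape[1]):
        column = pooled[:, d]
        # 1-based rank of each test value in the pooled column; ties get their average rank.
        lo = np.searchsorted(column, test[:, d], side="left")
        hi = np.searchsorted(column, test[:, d], side="right")
        rank = (lo + hi + 1) / 2.0
        # Empirical CDF strictly inside (0, 1), then the Gaussian inverse CDF.
        cdf = (rank - 0.5) / n
        out[:, d] = [to_gaussian(float(p)) for p in cdf]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    background = rng.normal(2.0, 3.0, size=(5000, 13))  # stand-in for UBM features
    utterance = rng.normal(2.5, 2.0, size=(200, 13))    # stand-in for a short test utterance
    equalized = rank_based_heq(utterance, background)
    print(equalized.mean(axis=0).round(2))
```

Ranking against the large pooled reference, rather than within the test utterance alone, is what makes the CDF estimate usable when the test utterance has only a few frames, which matches the motivation stated for short utterances.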
A Cepstral Analysis of Breathy Voice with Vocal Fold Paralysis
Kang, Young-Ae ; Seong, Cheol-Jae ;
Phonetics and Speech Sciences, volume 4, issue 2, 2012, Pages 89~94
DOI : 10.13064/KSSS.2012.4.2.089
The aim of this study is to investigate the usefulness of cepstral peak prominence (CPP) and long-term average spectrum (LTAS) band energy for analyzing breathy voice associated with vocal fold paralysis. Thirty-four female subjects with vocal fold paralysis after thyroidectomy participated in this study. Based on perceptual judgements by three speech pathologists and one phonetician, subjects were divided into two groups: a breathy voice group (n = 21) and a non-breathy voice group (n = 13). A maximum sustained phonation task was recorded for acoustic analysis. CPP-related (mean F0, mean CPP, and mean CPPs) and LTAS-related (minimum, maximum, and mean) parameters were used, and independent-samples t-tests were conducted. Mean CPP and mean CPPs differed significantly between groups, with higher values in the non-breathy voice group than in the breathy voice group, so CPP could be regarded as a useful parameter for clinical analysis of breathy voice. For LTAS, energy in the 0 to 2 kHz band differed significantly between groups: the minimum value of the non-breathy group was lower than that of the breathy group, whereas its maximum value was higher. The frequency band below 2 kHz thus seems to be related to breathy voice.
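For readers unfamiliar with CPP, the Python sketch below computes a single-frame value in the usual way. It takes the cepstrum of the dB spectrum, finds the highest cepstral peak within the quefrency range of plausible F0 values, and measures how far that peak rises above a linear regression line fitted to the cepstrum. The Hann window, the 60 to 300 Hz F0 search range, the 1 ms lower bound of the regression, and the dB scaling are all assumptions. Published CPP implementations differ on these details, and this is not the software used in the study. The smoothed variant (CPPs) is not implemented.

```python
import numpy as np


def cepstral_peak_prominence(frame, fs, f0_min=60.0, f0_max=300.0):
    """Cepstral peak prominence (dB) of one analysis frame (illustrative sketch)."""
    x = np.asarray(frame, dtype=float)
    x = x * np.hanning(len(x))
    n = len(x)
    # Log magnitude spectrum in dB; the small offset avoids log(0).
    log_spec = 20.0 * np.log10(np.abs(np.fft.fft(x)) + 1e-12)
    # Cepstrum: inverse transform of the log spectrum, also expressed in dB.
    ceps_db = 20.0 * np.log10(np.abs(np.fft.ifft(log_spec)) + 1e-12)
    quefrency = np.arange(n) / fs  # seconds

    # Quefrency indices corresponding to the assumed F0 search range.
    lo = int(np.floor(fs / f0_max))
    hi = int(np.ceil(fs / f0_min))
    if hi >= n // 2:
        raise ValueError("frame too short for the requested f0_min")
    peak = lo + int(np.argmax(ceps_db[lo:hi + 1]))

    # Linear regression baseline from 1 ms up to half the frame (assumed range).
    start = max(1, int(0.001 * fs))
    stop = n // 2
    slope, intercept = np.polyfit(quefrency[start:stop], ceps_db[start:stop], 1)
    baseline = slope * quefrency[peak] + intercept
    return float(ceps_db[peak] - baseline)


if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    pulses = np.zeros(2048)
    pulses[::128] = 1.0  # idealized 125 Hz glottal pulse train
    noise = rng.standard_normal(2048)
    print("periodic + noise:", round(cepstral_peak_prominence(pulses + 0.01 * noise, fs), 1))
    print("noise only:      ", round(cepstral_peak_prominence(noise, fs), 1))
```

In the usage example, the nearly periodic signal yields a clearly larger CPP than white noise. This is the same direction as the group difference reported above, where the less breathy (more periodic) voices had higher CPP.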
An Aerodynamic and Acoustic Analysis of the Breathy Voice of Thyroidectomy Patients
Kang, Young-Ae ; Yoon, Kyu-Chul ; Kim, Jae-Ock ;
Phonetics and Speech Sciences, volume 4, issue 2, 2012, Pages 95~104
DOI : 10.13064/KSSS.2012.4.2.095
Thyroidectomy patients may have vocal fold paralysis or paresis, resulting in a breathy voice. The aim of this study was to investigate the aerodynamic and acoustic characteristics of breathy voice in thyroidectomy patients. Thirty-five subjects with vocal fold paralysis after thyroidectomy participated in this study. Based on perceptual judgements by three speech pathologists and one phonetician, subjects were divided into two groups: a breathy voice group (n = 21) and a non-breathy voice group (n = 14). Aerodynamic analysis used three tasks (Voicing Efficiency, Maximum Sustained Phonation, and Vital Capacity), and acoustic measures were taken during the Maximum Sustained Phonation task. The breathy voice group had significantly higher subglottal pressure and more pathological voice characteristics than the non-breathy voice group. A logistic regression on the aerodynamic measures achieved 94.1% classification accuracy; its predictors of breathiness were maximum sound pressure level, sound pressure level range, and phonation time in the Maximum Sustained Phonation task, and pitch range, peak air pressure, and mean peak air pressure in the Voicing Efficiency task. A logistic regression on the acoustic measures achieved 88.6% classification accuracy, with five frequency perturbation parameters as predictors. Vocal fold paralysis creates air turbulence at the glottis, which makes frequency-related parameters fluctuate and increases aspiration noise in the high-frequency region. These changes determine perceived breathiness.
Comparative Studies on the Self Voice Assessment of Voice Disorder Patients and the Hearer Voice Assessment of a Comparative Group of normal subjects
Lee, Yu-Jin ; Hwang, Young-Jin ;
Phonetics and Speech Sciences, volume 4, issue 2, 2012, Pages 105~114
DOI : 10.13064/KSSS.2012.4.2.105
This paper discusses the difference between the self-assessment of voice by patients with voice disorders and the hearer voice assessment made by a comparison group of normal subjects. The study was conducted with 25 subjects with voice disorders and 32 hearers in a comparison group of normal subjects. The results are as follows. First, on the K-VHI and VHI-H, the hearers in the normal comparison group perceived the voice disorders as more serious than the voice disorder group did, in all subdomains. Likewise, on the K-VQOL and VRQOL-H, the hearers perceived the voice disorders as more serious than the voice disorder group did, in all subdomains. Second, the hearer voice assessments showed no gender difference in the perceived severity of voice disorders. Third, on the emotional subscale of the VHI-H, hearers who were professional voice users perceived the voice disorders as more serious than other hearers did. On the VRQOL-H, however, there was no difference between professional voice users and others.
The Study of Breath Competence Depending on Utterance Condition by Healthy Speakers: a Preliminary Study
Lee, In-Ae ; Lee, Hye-Eun ; Hwang, Young-Jin ;
Phonetics and Speech Sciences, volume 4, issue 2, 2012, Pages 115~120
DOI : 10.13064/KSSS.2012.4.2.115
This study sought to compare breath competence in three utterance conditions: reading a passage aloud, speaking spontaneously, and singing. We tested 15 normal females (ages averaging …) and measured breath competence with an objective aero-mechanical instrument, the PAS (Phonatory Aerodynamic System, model 6600, KAY Electronics, Inc.). Inspiration-expiration cycles were characterized by breath group number, breath group duration, and the ratio of inspiration to expiration. The results were as follows. Breath group number and breath group duration showed no significant difference across conditions; the only significant difference was in the ratio of inspiration to expiration. Singing showed the most varied inspiration-to-expiration ratio, followed by reading a text aloud and then spontaneous speech. Average frequency and maximum intensity also varied across utterance conditions. These results suggest that breath competence and phonation competence are closely interrelated.
The Noise Effect on Stuttering and Overall Speech Rate: Multi-talker Babble Noise
Park, Jin ; Chung, In-Kie ;
Phonetics and Speech Sciences, volume 4, issue 2, 2012, Pages 121~126
DOI : 10.13064/KSSS.2012.4.2.121
This study examines how the frequency of stuttering changes when adults who stutter are exposed to one type of background noise, namely multi-talker babble noise. Eight American English-speaking adults who stutter participated in this study. Each participant read sentences aloud under three speaking conditions: typical solo reading (TSR), typical choral reading (TCR), and multi-talker babble noise reading (BNR). Speech fluency was computed as the percentage of syllables stuttered (%SS), and speaking rate was also assessed, as a measure of vocal change, to determine whether it changed significantly across the speaking conditions. Participants read more fluently during both BNR and TCR than during TSR, and their speaking rate did not change significantly across the three conditions. The effect of multi-talker babble noise on the frequency of stuttering is discussed, together with possible explanations for it.
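The fluency measure above, %SS, is a simple proportion of stuttered syllables to total syllables in a speaking sample. The minimal Python sketch below computes it from syllable counts; the function name and the counts in the usage line are hypothetical placeholders, not data from the study.

```python
def percent_syllables_stuttered(stuttered: int, total: int) -> float:
    """Percentage of syllables stuttered (%SS) for one speaking sample."""
    if total <= 0:
        raise ValueError("total syllable count must be positive")
    if not 0 <= stuttered <= total:
        raise ValueError("stuttered count must lie between 0 and total")
    return 100.0 * stuttered / total


if __name__ == "__main__":
    # Hypothetical counts: 6 stuttered syllables in a 300-syllable reading sample.
    print(f"{percent_syllables_stuttered(6, 300):.1f} %SS")
```

Because %SS is normalized by the number of syllables spoken, it can be compared across conditions even if the samples differ in length.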
The Effects of Increased Processing Demands on the Sentence Comprehension of Korean-speaking Adults with Aphasia
Choi, So-Young ;
Phonetics and Speech Sciences, volume 4, issue 2, 2012, Pages 127~134
DOI : 10.13064/KSSS.2012.4.2.127
The purpose of this study is to present evidence for a particular processing approach based on the language-specific characteristics of Korean. To compare sentence-comprehension abilities, this study measured the accuracy and reaction times (RT) of 12 aphasic patients (AP) and 12 normal controls (NC) in a sentence-picture matching task. Four versions of each sentence were constructed by crossing two types of voice (active/passive) with two types of word order (agent-first/patient-first). To examine the effects of increased processing demands, picture stimuli were manipulated so that they appeared immediately after the sentence was presented. As expected, the AP group showed higher error rates and longer RTs than the NC group in all conditions. Nevertheless, Korean speakers with aphasia performed above chance level in sentence comprehension, even with passive sentences. The aphasic participants understood sentences more quickly and accurately when they were given in the active voice and in agent-first order, and the NC group showed a similar pattern. These results confirm that Korean adults with aphasia do not completely lose their knowledge of sentence comprehension. When processing demands were increased by delaying the onset of the picture stimulus, the effect on RT was more pronounced in the AP group than in the NC group. These findings fit well with the idea that the computational system for interpreting sentences is intact in aphasia but is compromised when processing demands increase.
Discourse Characteristics in Healthy Elderly: Effects of Aging, Gender and Educational Level
Choi, Hyun-Joo ;
Phonetics and Speech Sciences, volume 4, issue 2, 2012, Pages 135~143
DOI : 10.13064/KSSS.2012.4.2.135
Discourse is regarded as an important component of communication assessment, but studies of the discourse characteristics of the elderly are scant. The purpose of this study was to determine the effects of aging, gender, and educational level on discourse in elderly people with normal cognitive function. Forty normal elderly and forty young people participated in this study. A picture description task (the Boston Cookie Theft picture) was used to examine discourse function, and the descriptions were analyzed for both productivity (total number of sentences, total number of syllables, and syllables per sentence) and semantics (CIU ratio). The results were as follows: 1) only the CIU ratio differed significantly with age; 2) females produced a higher total number of syllables and more syllables per sentence than males; 3) the CIU ratio differed significantly with educational level. These results suggest that impaired communicative function is an aspect of cognitive impairment that can be related to aging, and that discourse performance in the elderly is associated with gender and educational level.
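The productivity and semantic measures listed above reduce to simple counts and ratios once a picture description has been transcribed and scored. The sketch below assumes the CIU ratio is the number of correct information units divided by the total number of words, as in the common %CIU measure. The abstract does not define it, and the class, field names, and sample counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscourseSample:
    """Counts from one scored picture description (hypothetical schema)."""
    sentences: int
    syllables: int
    words: int
    cius: int  # correct information units


def discourse_measures(sample: DiscourseSample) -> dict[str, float]:
    """Productivity measures plus the CIU ratio (assumed to be CIUs / words)."""
    if sample.sentences <= 0 or sample.words <= 0:
        raise ValueError("sample must contain at least one sentence and one word")
    return {
        "total_sentences": float(sample.sentences),
        "total_syllables": float(sample.syllables),
        "syllables_per_sentence": sample.syllables / sample.sentences,
        "ciu_ratio": sample.cius / sample.words,
    }


if __name__ == "__main__":
    # Hypothetical counts for a single participant, for illustration only.
    sample = DiscourseSample(sentences=10, syllables=250, words=120, cius=90)
    for name, value in discourse_measures(sample).items():
        print(f"{name}: {value:.2f}")
```

How "words" are segmented in Korean transcripts (for example, by eojeol or by morpheme) changes the denominator, so the CIU ratio is only comparable across studies that use the same counting rule.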