REFERENCE LINKING PLATFORM OF KOREA S&T JOURNALS
Phonetics and Speech Sciences
The Korean Society of Speech Sciences
Volume 4, Issue 3 - Sep 2012
English /s/ and Korean s
/ Contrast in Seoul and Busan Dialects: A Study of Category Solidity
Kang, Kyoung-Ho ;
Phonetics and Speech Sciences, volume 4, issue 3, 2012, Pages 3~12
DOI : 10.13064/KSSS.2012.4.3.003
The primary goal of the current study was to examine category solidity of Korean alveolar fricatives in the Busan and Seoul dialects of Korean. Considering the common belief of
neutralization in Kyungsang speech, plain
fricatives of Busan speakers were examined against the same fricatives of Seoul speakers. Perceptual distance between Korean
on the one hand and English /s/ on the other was investigated by use of a cross-linguistic mapping method. Two experiments of a perceptual mapping task of English /s/ to Korean
-production task were conducted on users of the Busan and Seoul dialects of Korean. The results from the perception and production experiments suggested that at a micro-level, younger Busan speakers have less solid category stability for Korean
compared with Seoul speakers, although their production of
was as highly distinctive from each other as that of Seoul speakers.
Phonological Status of Korean /w/: Based on the Perception Test
Kang, Hyun-Sook ;
Phonetics and Speech Sciences, volume 4, issue 3, 2012, Pages 13~23
DOI : 10.13064/KSSS.2012.4.3.013
The sound /w/ has been traditionally regarded as an independent segment in Korean regardless of the phonological contexts in which it occurs. There have been, however, some questions regarding whether it is an independent phoneme in /CwV/ context (cf. Kang 2006). The present pilot study examined how Korean /w/ is realized in
context by performing some perception tests. Our assumption was that if Korean /w/ is a part of the preceding complex consonant like
, it should be more or less uniformly articulated and perceived as such. If /w/ is an independent segment, it will be realized with speaker variability. Experiments I and II examined the identification rates as "labialized" of the spliced original stimuli of
, and the cross-spliced stimuli
. The results showed that the rounding of /w/ is perceived at significantly different temporal points, with speaker and context variability. We therefore conclude that /w/ in
context is an independent segment, not a part of the preceding segment. A full-scale production study should be performed in the future to verify the conclusion suggested in this paper.
The Effect of Visual Cues in the Identification of the English Consonants /b/ and /v/ by Native Korean Speakers
Kim, Yoon-Hyun ; Koh, Sung-Ryong ; Hazan, Valerie ;
Phonetics and Speech Sciences, volume 4, issue 3, 2012, Pages 25~30
DOI : 10.13064/KSSS.2012.4.3.025
This study investigated whether native Korean listeners could use visual cues for the identification of the English consonants /b/ and /v/. Both auditory and audiovisual tokens of word minimal pairs in which the target phonemes were located in word-initial or word-medial position were used. Participants were instructed to decide which consonant they heard in
conditions: cue (audio-only, audiovisual) and location (word-initial, word-medial). Mean identification scores were significantly higher in the audiovisual than in the audio-only condition and in the word-initial than in the word-medial condition. In addition, following signal detection theory, sensitivity (d') and response bias (c) were calculated from hit rates and false alarm rates. These measures showed that the higher identification rate in the audiovisual condition was related to an increase in sensitivity. There were no significant differences in response bias across conditions. These results suggest that native Korean speakers can use visual cues when identifying confusable non-native phonemic contrasts, and that visual cues can enhance non-native speech perception.
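The abstract does not give the formulas used, so the following is only a minimal sketch of the standard signal detection computations, d' = z(H) - z(F) and c = -(z(H) + z(F)) / 2, where H is the hit rate and F the false alarm rate. The function name, the response counts, and the log-linear correction are assumptions made for illustration and are not taken from the study.

```python
from statistics import NormalDist


def dprime_and_bias(hits, misses, false_alarms, correct_rejections):
    """Return (d', c) from raw response counts for a two-choice task."""
    # Log-linear correction (an assumption of this sketch): keeps both rates
    # away from 0 and 1, where the inverse normal CDF is undefined.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # z-transform: inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)        # sensitivity
    c = -0.5 * (z(hit_rate) + z(fa_rate))     # response bias (criterion)
    return d_prime, c


# Hypothetical counts, e.g. treating "/b/ presented, /b/ reported" as a hit.
d_prime, c = dprime_and_bias(hits=40, misses=10, false_alarms=10, correct_rejections=40)
print(f"d' = {d_prime:.2f}, c = {c:.2f}")
```

With symmetric counts like these, c comes out at zero (no bias toward either response), while a larger gap between hit and false alarm rates raises d'.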
L1-L2 Transfer in VOT and f0 Production by Korean English Learners: L1 Sound Change and L2 Stop Production
Kim, Mi-Ryoung ;
Phonetics and Speech Sciences, volume 4, issue 3, 2012, Pages 31~41
DOI : 10.13064/KSSS.2012.4.3.031
Recent studies have shown that the stop system of Korean is undergoing a sound change in terms of two acoustic parameters, voice onset time (VOT) and fundamental frequency (f0). Because of a VOT merger of a consonantal opposition and onset-f0 interaction, the relative importance of the two parameters has been changing in Korean, where f0 is a primary cue and VOT is a secondary cue in distinguishing lax from aspirated stops in speech production as well as perception. In English, however, VOT is a primary cue and f0 is a secondary cue in contrasting voiced and voiceless stops. This study examines how Korean English learners use the two acoustic parameters of L1 in producing L2 English stops and whether the sound change of acoustic parameters in L1 affects L2 speech production. The data were collected from six adult Korean English learners. Results show that Korean English learners use not only VOT but also f0 to contrast L2 voiced and voiceless stops. However, unlike VOT, which varied among speakers, the effect of onset consonants on f0 in L2 English was steady and robust, indicating that f0 also plays an important role in signaling the [voice] contrast in L2 English. The results suggest that the important role of f0 in contrasting lax and aspirated stops in L1 Korean is transferred to the contrast of voiced and voiceless stops in L2 English. The results imply that, for Korean English learners, f0 rather than VOT will serve as an important perceptual cue in contrasting voiced and voiceless stops in L2 English.
F0 Extrema Timing of HL and LH in North Kyungsang Korean: Evidence from a Mimicry Task
Kim, Jung-Sun ;
Phonetics and Speech Sciences, volume 4, issue 3, 2012, Pages 43~49
DOI : 10.13064/KSSS.2012.4.3.043
This paper describes the categorical effects of pitch accent contrasts in a mimicry task. It focuses, specifically, on examining how fundamental frequency (f0) variation reflects phonological contrasts from speakers of two distinct varieties of Korean (i.e., North Kyungsang and South Cholla). The results showed that, in a mimicry task using synthetic speech continua, there was a categorical effect in f0 peak timing for North Kyungsang speakers, but the timing of f0 peaks and valleys in the responses of South Cholla speakers was more variable, presenting a gradient or non-categorical effect. Evidence of categorical effects was represented as the shift of f0 peak times along an acoustic continuum for North Kyungsang speakers. The range for the shift of f0 valley times was much narrower, compared to that of f0 peak times. The degree of a shift near the middle of the continuum showed variability across individual mimicry responses. However, the categorical structure in mimicry responses regarding the clustering of f0 peak points was more significant for North Kyungsang speakers than for South Cholla speakers. Additionally, the finding of the current study implies that the location of f0 peak times depends on individuals' imitative (or cognitive) abilities.
A Study on the Relation Between Korean Speakers' English Stop Pronunciation Accuracy and Pronunciation Proficiency
Kim, Ji-Eun ;
Phonetics and Speech Sciences, volume 4, issue 3, 2012, Pages 51~58
DOI : 10.13064/KSSS.2012.4.3.051
The purpose of this study is to measure the impact of Korean speakers' English stop pronunciation on their general pronunciation proficiency. For this purpose, 20 Korean speakers read English sentences and their pronunciations were rated by native English speakers. The Korean speakers' VOT values of English stops in sentences were then measured and compared with the native speakers' pronunciation ratings. Two relations were analyzed: (1) between the proficiency score of each speaker and VOT values; and (2) between the proficiency score of each sentence and VOT values. The results show that there is a relation between the proficiency score of each sentence and VOT values of /t, b, d, g/, and that there is a relation between VOT values of /t, b, d, g/ and proficiency scores of each speaker, while there is a weak relation between VOT values of /p, k/ and proficiency scores of each speaker.
The Contribution of Prosody to the Foreign Accent of Chinese Talkers' English Speech
Liu, Xing ; Lee, Joo-Kyeong ;
Phonetics and Speech Sciences, volume 4, issue 3, 2012, Pages 59~73
DOI : 10.13064/KSSS.2012.4.3.059
This study investigates the contribution of prosody to the foreign accent in Chinese speakers' English production by examining synthesized speech that crosses native and non-native talkers' prosody and segments. For the stimuli of the foreign accent ratings, we transplanted gender-matched native speakers' prosody onto non-native talkers' segments and vice versa, utilizing the TD-PSOLA algorithm. Eight native English listeners judged the foreign accent and comprehensibility of the transplanted stimuli. Results showed that the synthesized stimuli were perceived as having a stronger foreign accent, regardless of speakers' proficiency, when English speakers' prosody was crossed with Chinese speakers' segments. This suggests that segments contribute more than prosody to native listeners' evaluation of foreign accent. When transplanted with English speakers' segments, Chinese speakers' prosody showed a difference in duration rather than pitch between high and low proficiency, such that a stronger foreign accent was detected when low-proficiency Chinese speakers' duration was crossed with English speakers' segments. This indicates that prosody, more specifically duration, plays a role, though the prosodic role is not overall as significant as that of segments. According to the post hoc acoustic analysis, the temporal features that made the duration parameter prominent as opposed to pitch were speaking rate, pause duration and pause frequency. Finally, foreign accent and comprehensibility showed no significant correlation, such that native listeners had no difficulty listening to highly foreign-accented speech.
Reduction and Frequency Analyses of Vowels and Consonants in the Buckeye Speech Corpus
Yang, Byung-Gon ;
Phonetics and Speech Sciences, volume 4, issue 3, 2012, Pages 75~83
DOI : 10.13064/KSSS.2012.4.3.075
This study had three aims: first, to examine the degree of deviation between dictionary-prescribed symbols and the actual speech of American English speakers; second, to measure the frequency of vowel and consonant production of American English speakers; and third, to investigate gender differences in the segmental sounds in a speech corpus. The Buckeye Speech Corpus was recorded by forty American male and female subjects for one hour per subject. The vowels and consonants in both the phonemic and phonetic transcriptions were extracted from the original files of the corpus and their frequencies were obtained using scripts written in R, a free software environment. Results were as follows. Firstly, the American English speakers produced a reduced number of vowels and consonants in daily conversation. The reduction rate from the dictionary transcriptions to the actual transcriptions was around 38.2%. Secondly, the American English speakers used more front high and back low vowels, while stops, fricatives, and nasals accounted for three-fourths of the consonants. This indicates that the segmental inventory has a nonlinear frequency distribution in the speech corpus. Thirdly, the two gender groups produced vowels and consonants similarly even though there were a few noticeable differences in their speech. From these results we propose that English teachers consider pronunciation education reflecting the actual speech sounds and that linguists find a way to establish unmarked segmentals from speech corpora.
Further Issues on the Duration Differences in Vowels due to the Voicing of the Following Stops in English
Oh, Eun-Jin ;
Phonetics and Speech Sciences, volume 4, issue 3, 2012, Pages 85~92
DOI : 10.13064/KSSS.2012.4.3.085
It is a well-known phenomenon that vowel duration in English is generally longer before a voiced stop than a voiceless one. Past research has postulated that the closure duration of the voiceless stop is generally longer than that of the voiced stop and that the duration of a preceding vowel is determined complementarily by the closure duration of the stop. To shed further light on the phenomenon, this study examined fourteen native speakers of American English who read the monosyllabic words [bVC] (V = [i, ɪ, eɪ, ɛ, æ, ʌ, ɑ], C = [t, d]). First, we found that mean vowel duration was 38 ms longer before the voiced stop than the voiceless (mean duration ratio = 1.24). Second, mean closure duration of the voiced stop was only shorter by 5 ms compared to the voiceless stop (mean duration ratio = 0.97). Therefore, for our subjects, vowel duration was not determined complementarily by the closure duration of the following stop. Third, vowels with longer inherent durations (viz., tense, diphthong, and low vowels) tended to show larger duration ratios in the voiced and voiceless contexts than the vowels with shorter durations (viz., lax vowels). This indicates that the lengthening of inherently shorter vowels before a voiced stop is limited in order to avoid overlapping with longer vowels in the duration range. Fourth, there was no significant gender difference in vowel duration ratios in the contexts of voiced and voiceless stops. Finally, considerable individual differences were found in the vowel and consonant duration ratios.
Forensic Automatic Speaker Identification System for Korean Speakers
Kim, Kyung-Wha ; So, Byung-Min ; Yu, Ha-Jin ;
Phonetics and Speech Sciences, volume 4, issue 3, 2012, Pages 95~101
DOI : 10.13064/KSSS.2012.4.3.095
In this paper, we introduce the automatic speaker identification system 'SPO (Supreme Prosecutors Office) Verifier'. SPO Verifier is a GMM (Gaussian mixture model)-UBM (universal background model) based automatic speaker recognition system and has been developed using Korean speakers' utterances. The system uses a channel compensation algorithm to compensate for recording device characteristics. It also lets users manage reference models with utterances from various environments to get more accurate recognition results. To evaluate the performance of SPO Verifier on Korean speakers, we compared it with one of the most widely used commercial systems in the forensic field. The results showed that SPO Verifier has a lower EER (equal error rate) than the commercial system.
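For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below approximates it by sweeping a decision threshold over the observed scores. It is not the evaluation code of SPO Verifier; the function name and all scores are hypothetical.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores."""
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        # Keep the threshold where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Hypothetical verification scores (higher means "more likely the same speaker").
same_speaker = [0.9, 0.8, 0.7, 0.4]
different_speaker = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(same_speaker, different_speaker))
```

A lower EER means the two score distributions overlap less, which is the sense in which the abstract compares the two systems.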
Target signal detection using MUSIC spectrum in noise environments
Park, Sang-Jun ; Jeong, Sang-Bae ;
Phonetics and Speech Sciences, volume 4, issue 3, 2012, Pages 103~110
DOI : 10.13064/KSSS.2012.4.3.103
In this paper, a target signal detection method using multiple signal classification (MUSIC) algorithm is proposed. The MUSIC algorithm is a subspace-based direction of arrival (DOA) estimation method. Using the inverse of the eigenvalue-weighted eigen spectra, the algorithm detects the DOAs of multiple sources. To apply the algorithm in target signal detection for GSC-based beamforming, we utilize its spectral response for the DOA of the target source in noisy conditions. The performance of the proposed target signal detection method is compared with those of the normalized cross-correlation (NCC), the fixed beamforming, and the power ratio method. Experimental results show that the proposed algorithm significantly outperforms the conventional ones in receiver operating characteristics (ROC) curves.
Performance Comparison and Duration Model Improvement of Speaker Adaptation Methods in HMM-based Korean Speech Synthesis
Lee, Hea-Min ; Kim, Hyung-Soon ;
Phonetics and Speech Sciences, volume 4, issue 3, 2012, Pages 111~117
DOI : 10.13064/KSSS.2012.4.3.111
In this paper, we compare the performance of several speaker adaptation methods for an HMM-based Korean speech synthesis system with small amounts of adaptation data. According to objective and subjective evaluations, a hybrid method of constrained structural maximum a posteriori linear regression (CSMAPLR) and maximum a posteriori (MAP) adaptation shows better performance than other methods when only five minutes of adaptation data are available for the target speaker. During the objective evaluation, we find that the duration models are not adapted to the target speaker as well as the spectral envelope and pitch models are. To alleviate the problem, we propose a duration rectification method and a duration interpolation method. Both the objective and subjective evaluations reveal that incorporating the two proposed methods into the conventional speaker adaptation method is effective in improving the performance of the duration model adaptation.
Modified Generic Mode Coding Scheme for Enhanced Sound Quality of G.718 SWB
Cho, Keun-Seok ; Jeong, Sang-Bae ;
Phonetics and Speech Sciences, volume 4, issue 3, 2012, Pages 119~125
DOI : 10.13064/KSSS.2012.4.3.119
This paper describes a new algorithm for encoding spectral shape and envelope in the generic mode of G.718 super-wide band (SWB). In the G.718 SWB coder, generic mode coding and sinusoidal enhancement are used for the quantization of modified discrete cosine transform (MDCT)-based parameters in the high frequency band. In the generic mode, the high frequency band is divided into sub-bands and, for every sub-band, the most similar match under the selected similarity criteria is searched from the coded and envelope-normalized wideband content. In order to improve the quantization scheme in the high frequency region of speech/audio signals, we propose a modified generic mode that improves on the generic mode of G.718 SWB. The proposed generic mode uses perceptual vector quantization of spectral envelopes and an increased resolution for spectral copy. The performance of the proposed algorithm is evaluated in terms of objective quality. Experimental results show that the proposed algorithm increases sound quality significantly.
Alveolar Fricative Sound Errors by the Type of Morpheme in the Spontaneous Speech of 3- and 4-Year-Old Children
Kim, Soo-Jin ; Kim, Jung-Mee ; Yoon, Mi-Sun ; Chang, Moon-Soo ; Cha, Jae-Eun ;
Phonetics and Speech Sciences, volume 4, issue 3, 2012, Pages 129~136
DOI : 10.13064/KSSS.2012.4.3.129
Korean alveolar fricatives are late-developing speech sounds. Most previous research on phonemes used individual words or pseudo words to produce sounds, but word-level phonological analysis does not always reflect a child's practical articulation ability. Also, there has been limited research on articulation development looking at speech production by grammatical morphemes despite its importance in Korean language. Therefore, this research examines the articulation development and phonological patterns of the /s/ phoneme in terms of morphological types produced in children's spontaneous conversational speech. The subjects were twenty-two typically developing 3- and 4-year-old Koreans. All children showed normal levels in three screening tests: hearing, vocabulary, and articulation. Spontaneous conversational samples were recorded at the children's homes. The results are as follows. The error rates decreased with increasing age in all morphological contexts. Also, error percentages within an age group were significantly lower in lexical morphemes than in grammatical morphemes. The stopping of fricative sounds was the main error pattern in all morphological contexts and reduced as age increased. This research shows that articulation performance can differ significantly by morphological contexts. The present study provides data that can be used to identify the difficult context for articulatory evaluation and therapy of alveolar fricative sounds.
The Usefulness of Multiple-Choice Name Matching Test in Aphasic Patients
Min, Yong ; Ko, Myoung-Hwan ; Seo, Jeong-Hwan ;
Phonetics and Speech Sciences, volume 4, issue 3, 2012, Pages 137~142
DOI : 10.13064/KSSS.2012.4.3.137
The aim of this study is to investigate the usefulness of the multiple-choice name matching test (MC-NMT) in adults with aphasia by comparing it with the Korean version of the Boston Naming Test (K-BNT) and subsets of the Korean version of the Western Aphasia Battery (K-WAB). Thirty-nine patients with aphasia participated in the study. All patients were examined with the K-BNT, MC-NMT and K-WAB. The MC-NMT consisted of the 30 original BNT object stimuli, each presented with four response choices (written words) of similar frequency, including one correct and three incorrect responses. Cards containing the drawings were presented to the patient one at a time. An item was passed if the patient chose the correct response within 10 seconds. We analyzed two groups: a total group and a low K-BNT group (15 points or below). We evaluated the correlation between the K-BNT and MC-NMT scores and the production, naming, repetition, comprehension, reading and writing scores in subsets of the K-WAB. In the total group, there was a highly positive correlation between the K-BNT score and the naming score of the K-WAB, whereas the MC-NMT was highly correlated with reading scores in the K-WAB. In the low K-BNT group, the K-BNT strongly correlated with production, naming and repetition scores of the K-WAB, which suggests that the K-BNT reflects motor language function. The MC-NMT, however, was strongly correlated with comprehension, reading and writing scores of the K-WAB, which suggests that it reflects sensory language function. We suggest that the combination of the K-BNT and the newly developed MC-NMT will be useful for evaluating speech functions in aphasic patients.
Speech Rates of Male Esophageal Speech
Park, Won-Kyoung ; Shim, Hee-Jeong ; Ko, Do-Heung ;
Phonetics and Speech Sciences, volume 4, issue 3, 2012, Pages 143~149
DOI : 10.13064/KSSS.2012.4.3.143
The purpose of this study is to investigate the speech rate of an esophageal speech group that is capable of vocalization after surgery. The subjects in this experiment were 10 male esophageal speakers and 10 male laryngeal speakers. Each group read a reading passage that was recorded by a DAT recorder (Roland, EDIROL R-09). These recordings were analyzed using CSL (Computerized Speech Lab, model 4150). The results were as follows: (1) the overall speech rate of esophageal speech was 2.50 SPS (syllables per second) while the overall speech rate of laryngeal speech was 4.23 SPS; (2) the articulatory rate of esophageal speech was 3.14 SPS while the articulatory rate of laryngeal speech was 4.75 SPS. Speech rates as well as articulatory rates of esophageal speech were significantly lower than those of laryngeal speech. These differences between the two groups may be due to reduced efficiency of airflow across the pharyngeal-esophageal segment for esophageal speakers compared with airflow through the glottis for laryngeal speakers. These results may provide a guideline for speech rates of esophageal speakers in clinical settings.
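The abstract does not define the two rates, so the sketch below assumes the common convention: overall speech rate divides the syllable count by total reading time including pauses, and articulatory rate divides it by the time that remains after pauses are removed. The function name and the numbers are hypothetical and do not come from the study's data.

```python
def speech_rates(n_syllables, total_seconds, pause_seconds):
    """Return (overall rate, articulatory rate) in syllables per second (SPS)."""
    if pause_seconds < 0 or total_seconds <= pause_seconds:
        raise ValueError("total duration must exceed pause duration")
    overall = n_syllables / total_seconds                        # pauses included
    articulatory = n_syllables / (total_seconds - pause_seconds)  # pauses excluded
    return overall, articulatory


# Hypothetical reading: 100 syllables in 40 s, of which 8 s were pauses.
overall, articulatory = speech_rates(100, 40.0, 8.0)
print(f"overall = {overall:.2f} SPS, articulatory = {articulatory:.2f} SPS")
```

Under this convention the articulatory rate is never lower than the overall rate, which matches the ordering of the figures reported for both groups.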
Prevalence of Voice Disorders and Characteristics of Korean Voice Handicap Index in the Elderly
Song, Yun-Kyung ;
Phonetics and Speech Sciences, volume 4, issue 3, 2012, Pages 151~159
DOI : 10.13064/KSSS.2012.4.3.151
The purpose of this study is to evaluate the prevalence of voice disorders and the Korean voice handicap index in the elderly. For this study, 169 elderly participants completed two questionnaires and a sustained vowel /a/ task. Self-reported voice symptoms and the Korean voice handicap index were analyzed, and acoustic voice evaluation was performed with MDVP. The results showed that the self-reported prevalence of voice disorders in the elderly is significantly higher than that in adults. In acoustic evaluation, 32.2% of the male elderly and 40.9% of the female elderly exceeded the thresholds of Jitter (%), Shimmer (%) and NHR. In addition, Korean voice handicap index scores of the female elderly are significantly higher than those of female adults. These findings indicate the high frequency of voice disorders in the elderly and the need to focus on this group. Additional studies on the voice-related quality of life of the elderly are needed.
Characteristics of Speech Intelligibility and the Vowel Space in Patients with Parkinson's disease
Shim, Hee-Jeong ; Park, Won-Kyoung ; Ko, Do-Heung ;
Phonetics and Speech Sciences, volume 4, issue 3, 2012, Pages 161~169
DOI : 10.13064/KSSS.2012.4.3.161
The purpose of this study was to investigate the characteristics of speech intelligibility of spontaneous speech and the vowel space parameters in patients with Parkinson's disease (PD). Ten PD patients (M=5, F=5) and a corresponding control group of ten normal adults participated in this study. Firstly, subjects were asked to tell a story about their hometown and youth in order to analyze speech intelligibility. Secondly, the subjects were asked to repeat four vowels (/a/, /i/, /u/, /e/) five times in order to compare their vowel spaces. The results were as follows: (1) the speech intelligibility of the PD group was lower than that of the control group. (2) Four parameters (vowel area, vowel articulatory index, formant centralization ratio, and F2i/F1u ratio) differed significantly between the groups. For instance, vowel area and F2 ratio were wider and higher, respectively. As a result, a decrease in speech intelligibility of patients with PD is likely to show different types of errors from the normal group. The results of this research are meaningful in the sense that they could provide an objective standard for speech intelligibility and vowel space parameters.
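The abstract names the vowel space parameters without giving formulas. The sketch below assumes commonly used definitions: the area of the /i/-/e/-/a/-/u/ quadrilateral in the F2-F1 plane (shoelace formula), the formant centralization ratio FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a), and the vowel articulatory index as its inverse. Whether the study used exactly these definitions is an assumption, and the formant values are invented for illustration.

```python
def polygon_area(points):
    """Shoelace formula; points are (x, y) pairs listed in order around the polygon."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def vowel_metrics(formants):
    """formants maps 'a', 'i', 'u', 'e' to (F1, F2) in Hz; returns (area, FCR, VAI)."""
    f1 = {v: f[0] for v, f in formants.items()}
    f2 = {v: f[1] for v, f in formants.items()}
    # The order i -> e -> a -> u traces the perimeter of the vowel quadrilateral.
    area = polygon_area([(f2[v], f1[v]) for v in ("i", "e", "a", "u")])
    # FCR rises as vowels centralize; VAI is defined here as its inverse.
    fcr = (f2["u"] + f2["a"] + f1["i"] + f1["u"]) / (f2["i"] + f1["a"])
    vai = 1.0 / fcr
    return area, fcr, vai


# Hypothetical mean formants (Hz) for one speaker.
speaker = {"i": (300, 2300), "e": (500, 2000), "a": (800, 1300), "u": (350, 800)}
area, fcr, vai = vowel_metrics(speaker)
print(f"area = {area:.0f} Hz^2, FCR = {fcr:.3f}, VAI = {vai:.3f}")
```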
Efficacy of CPAP (Continuous Positive Airway Pressure) Therapy on Reducing the Degree of Hypernasality in Speakers with Repaired Cleft Palate
Ha, Seung-Hee ; Jung, Seung-Eun ; Koh, Kyung-S. ;
Phonetics and Speech Sciences, volume 4, issue 3, 2012, Pages 171~177
DOI : 10.13064/KSSS.2012.4.3.171
The purpose of this study was to investigate whether CPAP therapy was effective in reducing the degree of hypernasality in individuals with repaired cleft palate and whether the efficacy of CPAP therapy was maintained. Five individuals with cleft palate participated in an 8-week home-based CPAP program. Results from perceptual evaluation of hypernasality and nasalance scores before and after CPAP therapy and at the follow-up speech evaluation were compared. The results showed that responses to CPAP therapy varied among individuals. Three individuals exhibited reductions in the degree of perceived hypernasality, while nasalance scores decreased in all individuals after the therapy. The results also showed that the effect of CPAP therapy was generally maintained until approximately three months after the completion of CPAP therapy.
A Study on Vestibulosaccular Hearing
Heo, Seung-Deok ;
Phonetics and Speech Sciences, volume 4, issue 3, 2012, Pages 179~186
DOI : 10.13064/KSSS.2012.4.3.179
The aims of this study are to consider auditory physiological characteristics and to confirm audiological evaluation and interpretation with regard to cases of sensorineural hearing loss that show an abnormal air-bone gap (AB gap). Vestibulosaccular hearing occurs when there is an abnormally large AB gap in sensorineural hearing loss, also known as pure cochlear conductive hearing loss. Generally, an AB gap is caused by damage to the external and/or middle ear. In conductive hearing loss, loss of air-conduction hearing occurs due to a loss of resonance in the outer ear and/or impedance mismatching in the middle ear. Most of these types of hearing loss can be treated medically and surgically. However, there is no medical treatment for an AB gap in sensorineural hearing loss, and hearing loss can worsen gradually or suddenly. In addition, many studies have reported that head trauma makes hearing loss even more serious. Therefore, in order to differentiate between conductive hearing losses, it is important to check whether or not there is an enlarged vestibular aqueduct by means of temporal bone computerized tomography and/or magnetic resonance imaging.