Phonetics and Speech Sciences
Journal Basic Information
The Korean Society of Speech Sciences
Volume & Issues
Volume 4, Issue 4 - Dec 2012
Volume 4, Issue 3 - Sep 2012
Volume 4, Issue 2 - Jun 2012
Volume 4, Issue 1 - Mar 2012
The Patterns of Vowel Insertion in Korean Speakers' Production of English C+/l/ and C+/r/ Clusters
Kang, Seo-Yoon ;
Phonetics and Speech Sciences, volume 4, issue 4, 2012, Pages 3~17
DOI : 10.13064/KSSS.2012.4.4.003
This study examines Korean speakers' production of English consonant clusters, focusing on vowel insertion. An acoustic analysis and a statistical test were carried out to identify the factors involved in this production. The following factors were considered: phonetic properties, L1 transfer, and cluster types. Specifically, liquid types were compared to see whether vowel insertion patterns differ between C+/l/ and C+/r/ onset clusters, that is, which of the two cluster types Korean speakers produce more accurately. Interestingly, the percentage of correct productions was higher for C+/r/ onset clusters than for C+/l/ onset clusters, contrary to Eckman's (1977) Markedness Differential Hypothesis. In other words, vowel insertion occurred more often in C+/l/ clusters than in C+/r/ onset clusters. This may be attributed to L1 transfer. Furthermore, three patterns of vowel insertion in the C+/l/ clusters were identified through an acoustic analysis based on vowel duration and formants: a) vowel insertion with gemination, b) phonological epenthesis, and c) phonetic intrusion. However, phonetic intrusion mainly occurred in the C+/r/ clusters. Data were collected from 54 Korean speakers. This study provides evidence for L1 transfer, the duration effect of /l/ in a different context, and three kinds of vowel insertion patterns in conjunction with gestural coordination by age groups.
Speech Rate Variation in Synchronous Speech
Kim, Miran ; Nam, Hosung ;
Phonetics and Speech Sciences, volume 4, issue 4, 2012, Pages 19~27
DOI : 10.13064/KSSS.2012.4.4.019
When two speakers read a text together, the resulting speech has been shown to exhibit greatly reduced variability (e.g., in pause duration and placement and in speech rate). This paper provides a quantitative analysis of speech rate variation in synchronous speech by examining the global and local patterns in two dialects of Mandarin Chinese (Taiwan and Shanghai). We analyzed the speech data in terms of mean speech rate, with reference to the "just noticeable difference" (JND), within and across subjects. Our findings show that speakers have lower and less variable speech rates when they read a text synchronously than when they read alone. This global pattern is observed consistently across speakers and dialects, while each dialect maintains its own local pattern of speech rate variation. We conclude that paired speakers lower their speech rates and decrease the variability in order to ensure the synchrony of their speech.
An Analysis of the Vowel Formants of the Young versus Old Speakers in the Buckeye Corpus
Kim, Ji-Eun ; Yoon, Kyuchul ;
Phonetics and Speech Sciences, volume 4, issue 4, 2012, Pages 29~35
DOI : 10.13064/KSSS.2012.4.4.029
The purpose of this study was to measure the first two vowel formants of forty male and female speakers (twenty young vs. old male speakers and twenty young vs. old female speakers) from the Buckeye Corpus of Conversational Speech and to examine the vowel formant changes across two generations (younger vs. older). The results indicated that the vowel space of the younger generation (in their thirties or younger) was shifted to the lower left compared to that of the older generation (in their forties or older) in both male and female speakers. When the results were compared to those of Peterson & Barney (1952), differences appeared in the size of the vowel spaces over time.
A Study of an Independent Evaluation of Prosody and Segmentals: with Reference to the Difference in the Foreign Accent of Korean, Chinese, and Japanese Learners of English
Park, Hansang ;
Phonetics and Speech Sciences, volume 4, issue 4, 2012, Pages 37~43
DOI : 10.13064/KSSS.2012.4.4.037
This study investigates an independent evaluation of prosody and segmentals with reference to the difference in the foreign accent of Korean, Chinese, and Japanese learners of English. For this study, a set of stimuli was created with a prosody swapping technique from English sentences read by male and female Korean, Chinese, and Japanese learners of English. Two groups of subjects, American and Korean, evaluated the difference in the prosody and segmentals of the stimuli by pairwise difference rating. The results showed that there was no significant difference in the evaluation scores of prosody and segmentals across accents for either subject group. The results also showed that both subject groups gave higher scores to segmentals than to prosody. These results are significant in that they run counter to the claim of some previous studies that prosodic factors could have a greater influence on foreign accent and intelligibility than segmentals.
An Analysis of the Vowel Formants of the Young Females in the Buckeye Corpus
Yoon, Kyuchul ;
Phonetics and Speech Sciences, volume 4, issue 4, 2012, Pages 45~52
DOI : 10.13064/KSSS.2012.4.4.045
The purpose of this paper is to automatically measure the first two vowel formants of the ten young female speakers from the Buckeye Corpus of Conversational Speech and then to analyze various potential factors that may affect the formant distribution of the eight peripheral vowels of English. The factors that were analyzed included the place of articulation, the content versus function word information, the syllabic stress information, the location in a word, the location in an utterance, the speech rate of the three consecutive words, and the word frequency in the corpus. The results indicate that the overall formant patterns of the female speakers were similar to those of earlier works. The effects of the factors on the realization of the two formants were also similar to those from the male speakers with minor differences.
The Interlanguage Speech Intelligibility Benefit (ISIB) of English Prosody: The Case of Focal Prominence for Korean Learners of English and Natives
Lee, Joo-Kyeong ; Han, Jeong-Im ; Choi, Tae-Hwan ; Lim, Injae ;
Phonetics and Speech Sciences, volume 4, issue 4, 2012, Pages 53~68
DOI : 10.13064/KSSS.2012.4.4.053
This study investigated the speech intelligibility of Korean-accented and native English focus speech for Korean and native English listeners. Three different types of focus in English (broad, narrow, and contrastive) were naturally induced in semantically optimal dialogues. Seven high-proficiency and seven low-proficiency Korean speakers and seven native speakers participated in recording the stimuli with another native speaker. Fifteen listeners from each of the Korean high-proficiency, Korean low-proficiency, and native groups judged audio signals of focus sentences. Results showed that Korean listeners were more accurate at identifying the focal prominence for Korean speakers' narrow focus speech than for that of native speakers, which suggests that the interlanguage speech intelligibility benefit for talkers (ISIB-T) held true for narrow focus regardless of Korean speakers' and listeners' proficiency. However, Korean listeners did not outperform native listeners for Korean speakers' production of narrow focus, which did not support the ISIB for listeners (ISIB-L). Broad and contrastive focus speech did not provide evidence for either the ISIB-T or the ISIB-L. These findings are explained by the interlanguage shared by Korean speakers and listeners, in which they have established more L1-like common phonetic features and phonological representations. Once semantically and syntactically interpreted in higher-level processing in Korean narrow focus speech, the narrow focus was phonetically realized in a way more intelligible to Korean listeners due to the interlanguage. This may elicit the ISIB. However, Korean speakers did not appear to have complete semantic/syntactic access to either broad or contrastive focus, which might have detrimental effects on lower-level phonetic outputs in top-down processing. This may explain why Korean listeners had no advantage over native listeners for Korean talkers and vice versa.
Analysis of the Relationship Between Sasang Constitutional Groups and Speech Features Based on a Listening Evaluation of Voice Characteristics
Kwon, Chulhong ; Kim, Jongyeol ; Kim, Keunho ; Jang, Junsu ;
Phonetics and Speech Sciences, volume 4, issue 4, 2012, Pages 71~77
DOI : 10.13064/KSSS.2012.4.4.071
Sasang constitution experts utilize voice characteristics as an auxiliary measure for deciding a person's constitutional group. This study aims at establishing a relationship between speech features and the constitutional groups by subjective listening evaluation of voice characteristics. A speech database of 841 speakers whose constitutional groups had already been diagnosed by Sasang constitution experts was constructed. Speech features related to speech source and vocal tract filter were extracted from five vowels and one sentence. Statistically significant speech features for classifying the groups were analyzed using SPSS. The features that contributed to constitution classification were speaking rate, Energy, A1, A2, A3, H1, H2, H4, and CPP for males in their 20s; F0_mean, CPP, SPI, HNR, Shimmer, Energy, A1, A2, A3, H1, H2, and H4 for females in their 20s; Energy, A1, A2, A3, H1, H2, H4, and CPP for males in their 60s; and Jitter, HNR, CPP, and SPI for females in their 60s. Experimental results show that speech technology is useful in classifying constitutional groups.
Harmonic Peak Picking-based MVF Estimation for Improvement of HMM-based Speech Synthesis System Using TBE Model
Park, Jihoon ; Hahn, Minsoo ;
Phonetics and Speech Sciences, volume 4, issue 4, 2012, Pages 79~86
DOI : 10.13064/KSSS.2012.4.4.079
In the two-band excitation (TBE) model, maximum voiced frequency (MVF) is the most important feature of the excitation parameter because the synthetic speech quality depends on MVF. Thus, this paper proposes an enhanced MVF estimation scheme based on the peak picking method. In the proposed scheme, the local peak and the peak lobe are picked from the spectrum of a linear predictive residual signal. The normalized distance between neighboring peak lobes is calculated and utilized as a feature to estimate MVF. Experimental results of both objective and subjective tests show that the proposed scheme improves synthetic speech quality compared with that of the conventional one.
Microphone Array Based Speech Enhancement Using Independent Vector Analysis
Wang, Xingyang ; Quan, Xingri ; Bae, Keunsung ;
Phonetics and Speech Sciences, volume 4, issue 4, 2012, Pages 87~92
DOI : 10.13064/KSSS.2012.4.4.087
Speech enhancement aims to improve speech quality by removing background noise from noisy speech. Independent vector analysis is a type of frequency-domain independent component analysis method that is known to be free from the frequency bin permutation problem in the process of blind source separation from multi-channel inputs. This paper proposes a new method of microphone array based speech enhancement that combines independent vector analysis and beamforming techniques. Independent vector analysis is used to separate speech and noise components from multi-channel noisy speech, and delay-sum beamforming is used to determine the enhanced speech among the separated signals. To verify the effectiveness of the proposed method, experiments on computer-simulated multi-channel noisy speech with various signal-to-noise ratios were carried out, and both PESQ and output signal-to-noise ratio were obtained as objective speech quality measures. Experimental results show that the proposed method is superior to a conventional microphone array based noise removal approach such as GSC beamforming for speech enhancement.
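As a rough illustration of the delay-sum beamforming mentioned in the abstract above, the following minimal sketch aligns each microphone channel by a known integer sample delay and averages the aligned samples. It is a generic textbook form, not the authors' procedure for choosing among separated signals; the function name `delay_and_sum`, the integer-delay assumption, and the toy two-microphone signal are all hypothetical.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Average the channels after removing each channel's sample delay.

    channels[i] is assumed to contain the target delayed by delays[i]
    samples (non-negative integers). This is an illustrative assumption.
    """
    if len(channels) != len(delays):
        raise ValueError("one delay is required per channel")
    # Number of output samples available once every delay is compensated.
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    if n <= 0:
        raise ValueError("channels are too short for the given delays")
    return [
        sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(n)
    ]


# Toy example: the second microphone hears the same signal one sample later.
source = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
mic0 = list(source)
mic1 = [0.0] + source
enhanced = delay_and_sum([mic0, mic1], delays=[0, 1])
```

With correct delays the target adds coherently across channels, while uncorrelated noise is attenuated by the averaging.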
The Aspect of Voice Characteristics Change after Botulinum Toxin-A Injection in Patients with Adductor Spasmodic Dysphonia according to Vocal Tremor
Ko, Hyeju ; Choi, Hong-Shik ; Lim, Sung-Eun ; Choi, Yaelin ;
Phonetics and Speech Sciences, volume 4, issue 4, 2012, Pages 95~107
DOI : 10.13064/KSSS.2012.4.4.095
Because botulinum toxin type A (BTX-A), known as the most effective treatment for adductor spasmodic dysphonia (ADSD), is not effective in treating vocal tremors, voice assessment must be employed to accurately perform differential diagnosis of spasmodic dysphonia and vocal tremor. In this study, the characteristics of vocal changes after botulinum toxin injection were compared by analyzing the voice characteristics resulting from the presence of vocal tremors using objective analysis devices, with the aim of helping to provide prognoses and to determine remedial effects in clinical cases comprising patients with ADSD accompanied by vocal tremors. Respiratory function tests, aerodynamic analysis, electroglottography (EGG), acoustic analysis, auditory perception tests, and K-VHI were conducted at intervals of four, eight, and twelve weeks before and after injection, targeting a group of 17 female ADSD patients (an ADSD group of four with vocal tremor, the T group, and an ADSD group of 13 without vocal tremor, the NT group). For average FVC and FEV1, the T group showed significantly lower averages than the NT group, whereas the T group showed a significantly higher average ATRI than the NT group. In addition, the T group showed a significantly lower Fatr than the NT group. For the ADSD patients with vocal tremor, the tremor remained unchanged despite a noticeable decrease in wringing voices. In other words, as vocal tremor and wringing voices are two distinct features, the two need to be targeted separately for differential diagnosis.
Stuttering Reduction Rate during Sentence Reading: Choral Speech and Altered Auditory Feedback
Park, Jin ; Park, Heeyoung ;
Phonetics and Speech Sciences, volume 4, issue 4, 2012, Pages 109~115
DOI : 10.13064/KSSS.2012.4.4.109
This paper mainly aims to investigate how differently choral speech and altered auditory feedback (i.e., delayed auditory feedback and frequency-altered feedback) enhance speech fluency during sentence reading. To do this, a stuttering reduction rate was used to measure how much stuttering frequency was reduced during each of the fluency enhancing conditions (i.e., typical choral reading, DAF, FAF) relative to typical solo reading. The results showed that stuttering frequency was reduced in the three fluency enhancing conditions and that the highest mean stuttering reduction rate was observed during typical choral reading. The stuttering reduction rate observed during typical choral reading is discussed, along with some further speculation.
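The abstract above does not give the formula for the stuttering reduction rate. One plausible reading, sketched below as an assumption, is the percentage drop in stuttering frequency in a fluency enhancing condition relative to solo reading. The function name and the event counts are hypothetical and are not taken from the paper.

```python
def stuttering_reduction_rate(solo_count: float, condition_count: float) -> float:
    """Percentage reduction in stuttering relative to solo reading.

    Assumed definition: 100 * (solo - condition) / solo.
    """
    if solo_count <= 0:
        raise ValueError("solo reading must contain at least one stuttering event")
    return 100.0 * (solo_count - condition_count) / solo_count


# Hypothetical counts of stuttered syllables for one speaker.
solo = 20
conditions = {"choral": 2, "DAF": 8, "FAF": 10}
rates = {name: stuttering_reduction_rate(solo, count)
         for name, count in conditions.items()}
best_condition = max(rates, key=rates.get)
```

Normalizing by the solo baseline makes the rate comparable across speakers who stutter at very different frequencies.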
The Prosodic Characteristics of Children with Cochlear Implant with Respect to the Articulation Rate, Pause, and Duration
Oh, Soonyoung ; Seong, Cheoljae ;
Phonetics and Speech Sciences, volume 4, issue 4, 2012, Pages 117~127
DOI : 10.13064/KSSS.2012.4.4.117
This research reports the prosodic characteristics (including articulation rate, pause characteristics, and duration) of children with cochlear implants with reference to those of children with normal hearing. Subjects are 8- to 10-year-old children, with the number of each gender balanced at 24. The dialogue speech data comprise four types of sentence patterns. Results show the following. 1) There is a statistically meaningful difference in articulation rate between the two groups. 2) Pauses are not observed in exclamatory and declarative sentences in children with normal hearing. While imperative sentences show no statistical difference in the number of pauses between the two groups, interrogative sentences do. 3) Declarative, exclamatory, and interrogative sentences reveal a statistical difference between the two groups in the duration of the sentence-final two-syllable word, whereas imperative sentences show no difference. 4) For the RFP (duration ratio of the sentence-final syllable to the penultimate syllable), no statistically meaningful difference exists between the two groups in any type of sentence. 5) Lastly, the RWS (the ratio of the sentence-final two-syllable word duration to the whole sentence duration) shows a statistical difference between the two groups in imperative sentences, but not in the other types.
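The two duration ratios defined in the abstract above, RFP and RWS, can be computed directly from segment durations. The sketch below follows those definitions; the function names and the example durations (in seconds) are hypothetical and are not data from the study.

```python
import math


def rfp(final_syllable_dur: float, penultimate_syllable_dur: float) -> float:
    """RFP: sentence-final syllable duration over penultimate syllable duration."""
    if penultimate_syllable_dur <= 0:
        raise ValueError("penultimate syllable duration must be positive")
    return final_syllable_dur / penultimate_syllable_dur


def rws(final_word_dur: float, sentence_dur: float) -> float:
    """RWS: sentence-final two-syllable word duration over whole sentence duration."""
    if sentence_dur <= 0:
        raise ValueError("sentence duration must be positive")
    return final_word_dur / sentence_dur


# Hypothetical utterance: the final word is 0.25 s + 0.50 s in a 3.0 s sentence.
penult, final = 0.25, 0.50
example_rfp = rfp(final, penult)
example_rws = rws(penult + final, 3.0)
```

An RFP above 1 indicates that the final syllable is longer than the penultimate one, which is the usual signature of sentence-final lengthening.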
A Literature Review on Reading Fluency
Lee, Suhyang ;
Phonetics and Speech Sciences, volume 4, issue 4, 2012, Pages 129~138
DOI : 10.13064/KSSS.2012.4.4.129
Reading fluency is an important variable in reading comprehension. However, a limited number of studies on reading fluency are available in Korea. The purpose of this study is to review articles about reading fluency published over the last 10 years and to present a direction for future research. Forty research papers published from 2002 to 2012 in the Journal of Learning Disabilities and Language, Speech, and Hearing Services in Schools were selected. These papers were analyzed in terms of their subjects and research methods. About 64% of the articles focused on typically developing children and children with dyslexia. About 67% of the research consisted of descriptive studies. Based on these results, suggestions were made for future research on reading fluency.
An Experimental Study of Comfortable Pitch and Loudness with Target Matching: Effects on Electroglottographic and Acoustic Measures
Choi, Seong Hee ;
Phonetics and Speech Sciences, volume 4, issue 4, 2012, Pages 139~146
DOI : 10.13064/KSSS.2012.4.4.139
This study was designed to examine comfort levels of pitch and loudness with target matching and their effects on electroglottographic (EGG) and acoustic measures. Twelve speakers, six males and six females, were instructed to produce a sustained vowel /a/ for three seconds at a comfortable pitch and loudness level, both without any instruction and with a target matching procedure for either a certain f0 or SPL, separately, with visual and auditory feedback. The pitch targets for females and males were presented by progressing up and down randomly at intervals of 5 Hz from 150 Hz to 310 Hz (33 frequency targets in total) and from 85 Hz to 190 Hz (22 frequency targets in total), respectively. The loudness levels were 65, 75, 85, and 95 dB (four intensity targets in total) for both males and females. Subjective estimations of comfort levels were obtained using a 10-point equal-appearing interval rating scale following each phonation. The results showed that males and females demonstrated similar trends in loudness levels, with greatest comfort at 75 dB, whereas pitch comfort ratings showed greater variability, with females having a wider range with target matching. In the comfort levels of individuals, most male and female speakers rated soft phonations as more comfortable than loud ones. On the other hand, most male speakers perceived the highest comfort levels below the comfortable pitch levels they phonated under natural conditions. For most female speakers, however, higher frequency ranges were perceived to be more comfortable than those of the natural condition, although the comfortable pitch levels in spontaneous phonations were within the comfort level ranges determined by targeted phonations. When acoustic measures (%jitter, %shimmer, SNR) and EGG measures (CQ%) were compared between spontaneous comfortable phonations and targeted phonations produced by the same subject at similar f0 and intensity, no significant differences were observed (p>0.05). Thus, target matching procedures may be considered a compatible and alternative method to reduce the variability of comfortable pitch and loudness levels by eliciting consistent comfortable phonations.
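The target counts reported in the abstract above follow from the stated ranges and the 5 Hz step, as the short sketch below shows. The function name, the fixed random seed, and the simple shuffle used to stand in for the randomized presentation order are assumptions for illustration only.

```python
import random
from typing import List


def pitch_targets(low_hz: int, high_hz: int, step_hz: int = 5) -> List[int]:
    """Return every target frequency from low_hz to high_hz inclusive."""
    return list(range(low_hz, high_hz + 1, step_hz))


female_targets = pitch_targets(150, 310)   # 150, 155, ..., 310 Hz
male_targets = pitch_targets(85, 190)      # 85, 90, ..., 190 Hz
loudness_targets_db = [65, 75, 85, 95]

# Hypothetical randomized presentation order (seed chosen arbitrarily).
rng = random.Random(0)
female_order = female_targets[:]
rng.shuffle(female_order)
```

The inclusive range gives (310 - 150) / 5 + 1 = 33 targets for females and (190 - 85) / 5 + 1 = 22 targets for males, matching the totals in the abstract.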
The Effects of Vocal Relaxation Training on Voice Improvement of Children with Vocal Nodules
Han, Ji Eun ; Seong, Cheol Jae ;
Phonetics and Speech Sciences, volume 4, issue 4, 2012, Pages 147~154
DOI : 10.13064/KSSS.2012.4.4.147
The purpose of this study is to examine the effect on voice improvement of vocal training that relaxes vocal contact when it is applied to children with vocal nodules. Subjects were 20 boys aged 5 to 12 with vocal nodules, seen in otolaryngology, for whom voice therapy had been advised. The vocal therapy was conducted for 40 minutes per week for a total of eight sessions. Results were evaluated by videostroboscopy, auditory-perceptual evaluation on the GRBAS scale, aerodynamic tests, and acoustic analysis before and after therapy. First, the size of the vocal nodules was reduced and the unstable pattern of vocal contact was improved. Glottic closure was increased and phase symmetry was decreased during vocal vibration. Mucosal wave was increased and muscle tension of the larynx was reduced. Second, auditory-perceptual evaluation showed that the subjects' overall voice quality improved. The GRBAS scale evaluation showed that the rough, breathy, and strained characteristics of the subjects' voices were reduced after therapy. Third, the measurements of acoustic parameters showed a statistically significant improvement. The fundamental frequency of the subjects' voices increased, and the values of Jitter, Shimmer, NHR, and [H1-H2] decreased. Fourth, the maximum phonation time of the children increased. These results imply that the vocal relaxation training conducted in this study has a very positive effect on the voice of children with vocal nodules.
Acoustic Characteristics of Female Senior Citizens in Communities: The Effects of Residence and Depression
Hwang, Jaeho ; Kim, JungWan ;
Phonetics and Speech Sciences, volume 4, issue 4, 2012, Pages 155~162
DOI : 10.13064/KSSS.2012.4.4.155
The population of Korea is ageing as the number of elderly people increases due to improvements in health care and diet. Accordingly, interest in how to live actively during the years after retirement and how to communicate effectively is expected to increase the demand for voice improvement methods and technology. However, criteria for evaluating the voice strength and characteristics of the elderly are lacking. In this study, we analyzed the acoustic characteristics of elderly women living in the community according to residential status and mental health status (e.g., depressive mood). We selected women (n=63) above the age of 65 who were living in the Seoul metropolitan area and Daegu Gyeongbuk. The selected subjects were divided into two groups: a normal speaker group (n=40) and a speaker group comprised of those suffering from depressive mood (n=23). This study analyzed the voice characteristics of the subjects based on data collected through sustained phonation of the vowel /a/. There were differences in MPT, F0, Jitter, Shimmer, and NHR depending on location of residence but no difference with regard to depressive mood. Therefore, location of residence must be considered a key factor in establishing voice norms for the elderly.