Phonetics and Speech Sciences
The Korean Society of Speech Sciences
Volume 6, Issue 2 - Jun 2014
Measuring Correlation between Mental Fatigues and Speech Features
Kim, Jungin ; Kwon, Chulhong ;
Phonetics and Speech Sciences, volume 6, issue 2, 2014, Pages 3~8
DOI : 10.13064/KSSS.2014.6.2.003
This paper examines how mental fatigue affects the human voice. To this end, a monotonous task intended to increase the feeling of fatigue and a subjective questionnaire for rating fatigue were designed. The questionnaire responses confirmed that the designed task was monotonous. To investigate the statistical relationship between fatigue and the speech features extracted from the collected speech data, a paired-samples t-test was used. The statistical analysis shows that the speech parameters most closely related to fatigue are the first formant bandwidth, jitter, H1-H2, cepstral peak prominence, and harmonics-to-noise ratio. The experimental results indicate that the voice becomes breathier as mental fatigue progresses.
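The paired-samples t-test mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name and the jitter values are hypothetical, and only the t statistic and degrees of freedom are computed (the p-value lookup is omitted).

```python
import math
from statistics import mean, stdev

def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for a paired-samples t-test."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # The test works on the per-speaker differences, not on the raw values.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # sample SD of differences
    return mean(diffs) / standard_error, n - 1

# Hypothetical jitter values (%) for four speakers, before and after the task.
jitter_before = [1.0, 2.0, 3.0, 4.0]
jitter_after = [2.0, 4.0, 5.0, 5.0]
t_value, dof = paired_t_statistic(jitter_before, jitter_after)
print(f"t = {t_value:.3f}, df = {dof}")
```

The same function would be applied once per speech feature (for example jitter or H1-H2), each time comparing the rested and fatigued recordings of the same speakers.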
Computer-Based Fluency Evaluation of English Speaking Tests for Koreans
Jang, Byeong-Yong ; Kwon, Oh-Wook ;
Phonetics and Speech Sciences, volume 6, issue 2, 2014, Pages 9~20
DOI : 10.13064/KSSS.2014.6.2.009
In this paper, we propose an automatic fluency evaluation algorithm for English speaking tests. In the proposed algorithm, acoustic features are extracted from an input spoken utterance, and a fluency score is then computed using support vector regression (SVR). We estimate the parameters of feature modeling and SVR from the speech signals and the corresponding scores given by human raters. Correlation analysis shows that speech rate, articulation rate, and mean length of runs are the best features for fluency evaluation. Experimental results show that the correlation between the human score and the SVR score is 0.87 for 3 speaking tests, which suggests that the proposed algorithm could serve as a secondary fluency evaluation tool.
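The three features named in this abstract can be sketched under their commonly used definitions. The abstract does not give the paper's exact formulas, so the definitions, the function name, and the data below are assumptions for illustration only.

```python
def fluency_features(runs, pauses):
    """Compute speech rate, articulation rate, and mean length of runs.

    runs   -- list of (syllable_count, duration_sec) for each pause-free run
    pauses -- list of silent pause durations in seconds

    Assumed (common) definitions, not taken from the paper:
      speech rate       = syllables / total time including pauses
      articulation rate = syllables / speaking time excluding pauses
      mean length of runs = syllables per pause-free run
    """
    if not runs:
        raise ValueError("at least one run is required")
    syllables = sum(count for count, _ in runs)
    speaking_time = sum(duration for _, duration in runs)
    total_time = speaking_time + sum(pauses)
    return {
        "speech_rate": syllables / total_time,
        "articulation_rate": syllables / speaking_time,
        "mean_length_of_runs": syllables / len(runs),
    }

# Hypothetical utterance: three runs separated by two half-second pauses.
features = fluency_features(runs=[(6, 2.0), (4, 1.0), (5, 2.0)], pauses=[0.5, 0.5])
print(features)
```

In a pipeline like the one described, such features would form the input vector to the regression model, with human rater scores as the training targets.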
An Automatic Method of Detecting Audio Signal Tampering in Forensic Phonetics
Yang, Il-Ho ; Kim, Kyung-Wha ; Kim, Myung-Jae ; Baek, Rock-Seon ; Heo, Hee-Soo ; Yu, Ha-Jin ;
Phonetics and Speech Sciences, volume 6, issue 2, 2014, Pages 21~28
DOI : 10.13064/KSSS.2014.6.2.021
We propose a novel scheme for digital audio authentication of audio files that have been edited by inserting small audio segments from different environmental sources. The purpose of this research is to detect the inserted sections in a given audio file. We expect the proposed method to assist human investigators by flagging suspect audio sections that appear to have been recorded or transmitted in a different environment. GMM-UBM and GSV-SVM are applied to model the dominant environment of a given audio file. Four kinds of likelihood-ratio-based scores and an SVM score are used to measure the likelihood under the dominant environment model. We also use an ensemble score that combines these five scores. In the experiments, the proposed method achieves the lowest average equal error rate when the ensemble score is used. Even when the dominant environment is unknown, the proposed method gives similar accuracy.
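The equal error rate used as the evaluation metric here can be approximated from two sets of scores with a simple threshold sweep. This is a generic sketch, not the paper's evaluation code: the function name and scores are hypothetical, and higher scores are assumed to mean "matches the dominant environment".

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    A segment is accepted when score >= threshold.
    FAR = fraction of non-target scores accepted,
    FRR = fraction of target scores rejected.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    best_gap, best_eer = None, None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer

# Hypothetical scores: segments from the dominant environment vs. inserted ones.
dominant = [0.9, 0.8, 0.7, 0.4]
inserted = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(dominant, inserted)
print(f"EER = {eer:.2f}")
```

With finite data the two error curves rarely cross exactly, which is why the sketch reports the midpoint at the closest threshold rather than interpolating.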
Noise Robust Speech Recognition Based on Noisy Speech Acoustic Model Adaptation
Chung, Yongjoo ;
Phonetics and Speech Sciences, volume 6, issue 2, 2014, Pages 29~34
DOI : 10.13064/KSSS.2014.6.2.029
In Vector Taylor Series (VTS)-based noisy speech recognition methods, Hidden Markov Models (HMMs) are usually trained with clean speech. However, better performance is expected when the HMMs are trained with noisy speech. In a previous study, we found that Minimum Mean Square Error (MMSE) estimation of the training noisy speech in the log-spectrum domain produces improved recognition results, but because that algorithm operated in the log-spectrum domain, it could not be used for HMM adaptation. In this paper, we modify the previous algorithm to derive a novel mathematical relation between test and training noisy speech in the cepstrum domain, and we adapt the mean and covariance of the noisy speech HMM trained with Multi-condition TRaining (MTR). In noisy speech recognition experiments on the Aurora 2 database, the proposed method achieved a 10.6% relative improvement in Word Error Rate (WER) over the MTR method, whereas the previous MMSE estimation of the training noisy speech achieved a 4.3% relative improvement, which shows the superiority of the proposed method.
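For readers unfamiliar with the metric, relative improvement in WER is the reduction in error expressed as a percentage of the baseline error. The sketch below uses hypothetical WER values, since the abstract reports only the relative figures and not the underlying error rates.

```python
def relative_improvement(baseline_wer, new_wer):
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

# Hypothetical absolute WERs (%), chosen only to reproduce a 10.6% relative gain.
mtr_wer = 20.0
adapted_wer = 17.88
gain = relative_improvement(mtr_wer, adapted_wer)
print(f"relative improvement = {gain:.1f}%")
```

Note that a 10.6% relative improvement on a 20% baseline is an absolute reduction of only about 2.1 percentage points.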
Preliminary Study for Comparison of Subjective Voice Evaluations among Vocal and Applied Music Major Students
Lee, Dahye ; Hwang, Youngjin ; Kim, Jaeock ;
Phonetics and Speech Sciences, volume 6, issue 2, 2014, Pages 37~45
DOI : 10.13064/KSSS.2014.6.2.037
The purpose of this study was to determine whether the Korean Singing Voice Handicap Index (K-SVHI) is suitable for singers in genres other than vocal music to assess their vocal problems subjectively. Twenty-six college students majoring in vocal music and twenty-six students majoring in applied music were included in the study. They were divided into G0 and G1 by voice quality using the GRBAS scale during the singing tasks. The K-SVHI was divided into three sub-areas (Physical, Functional, and Emotional). In the singing task, neither group showed a significant difference in K-SVHI scores by G scale. In the reading task, the vocal music group had significantly higher K-SVHI scores in G0 than in G1, while the applied music group had significantly higher K-SVHI scores in G1 than in G0. Also, the two groups were not significantly different in G0 or G1 in the singing task, while the vocal music group showed higher K-SVHI scores than the applied music group in G0 in the reading task. In addition, the vocal music group had higher K-SVHI scores than the applied music group in G1 in both tasks. When the groups were compared in the three sub-areas of the K-SVHI, significant differences were found in the Emotional and Functional areas. These results show that singers perceive their voice problems differently depending on musical genre, which means that the K-SVHI may not be a proper tool for evaluating the voice handicap of singers across diverse genres of vocal music.
Spectral and Cepstral Analyses of Esophageal Speakers
Shim, Hee-Jeong ; Jang, Hyo-Ryung ; Shin, Hee-Baek ; Ko, Do-Heung ;
Phonetics and Speech Sciences, volume 6, issue 2, 2014, Pages 47~54
DOI : 10.13064/KSSS.2014.6.2.047
The purpose of this study was to compare spectral and cepstral measurements in esophageal speakers. Measurements from thirteen male esophageal speakers were compared with those of a control group of thirteen normal speakers using the sustained vowel /a/. The main results can be summarized as follows: (a) the CPP and L/H ratio of the esophageal group were significantly lower than those of the control group; (b) the CPP was significantly correlated with spectral parameters such as jitter, shimmer, NHR, and VTI; and (c) the ROC analysis showed that a CPP threshold of 10.25 dB classified esophageal speakers well, with 100% sensitivity and specificity. Thus, cepstral-based acoustic measures such as CPP may be more reliable predictors than other spectral-based acoustic measures such as jitter and shimmer. Cepstral-based acoustic measures were also found to be effective in distinguishing esophageal voice quality from normal voice quality. This research will contribute to establishing a baseline for speech characteristics in the voice rehabilitation of laryngectomees.
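The threshold-based classification reported in result (c) can be sketched as below. Only the 10.25 dB cut-off comes from the abstract; the CPP values and function name are hypothetical, and the rule "CPP below the threshold means esophageal" is an assumption based on result (a).

```python
def sensitivity_specificity(patient_cpp, control_cpp, threshold_db):
    """Classify a voice as esophageal when its CPP is below the threshold.

    sensitivity = esophageal speakers correctly flagged / all esophageal speakers
    specificity = control speakers correctly passed / all control speakers
    """
    true_positives = sum(1 for cpp in patient_cpp if cpp < threshold_db)
    true_negatives = sum(1 for cpp in control_cpp if cpp >= threshold_db)
    return true_positives / len(patient_cpp), true_negatives / len(control_cpp)

# Hypothetical CPP values in dB, not data from the study.
esophageal = [6.1, 8.4, 9.9]
controls = [10.8, 12.3, 14.0]
sens, spec = sensitivity_specificity(esophageal, controls, threshold_db=10.25)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

An ROC analysis repeats this calculation over many candidate thresholds and picks the one with the best trade-off between the two rates.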
The Articulation Characteristics of the Profound Hearing-Impaired Children with Reference to Formant Bandwidth
Choi, Eunah ;
Phonetics and Speech Sciences, volume 6, issue 2, 2014, Pages 55~64
DOI : 10.13064/KSSS.2014.6.2.055
This study measured the formant bandwidths of children with profound hearing impairment and examined the characteristics of their articulation. For this study, 10 children with cochlear implants (CI), 10 children with hearing aids (HA), and 10 children with normal hearing (NH) were asked to read 7 Korean vowels (/ɑ, ʌ, o, u, ɯ, i, ɛ/). The subjects' readings were recorded with NasalView and analyzed with Praat. Formant bandwidths reflect the degree of vocal fold opening and the characteristics of radiation. The analysis of formant bandwidth shows that the hearing-impaired children maintain vocal fold tension when they produce high vowels, and it reveals their radiation characteristics. A narrower B1 indicates better maintenance of vocal fold tension, a wider B2 indicates a more front articulation, and a wider B3 indicates rounder lips. B1 was widest for CI and narrowest for NH, and females' B1 was wider than males'. Among vowels, B1 was widest for /a/ and narrowest for /i/. B2 was wider for HA and NH than for CI, and females' B2 was wider than males'. B2 was widest for /i/ and narrowest for /ʌ/. B3 was widest for NH and narrowest for CI, and males' B3 was wider than females'. Among vowels, B3 was widest for /o/ and narrowest for /ɛ/. In summary, first, the analysis of B1 shows that NH children and males maintained vocal fold tension better than the hearing-impaired children or females, and that all children articulated /i/ with more vocal fold tension than other vowels. Second, the analysis of B2 shows that NH and HA children articulate vowels with weaker rounding than CI children do, and that females articulate vowels with weaker rounding than males do. Third, the analysis of B3 shows that NH children articulate vowels with rounder lips than HA or CI children do, and that males articulate vowels with rounder lips than females do. These results suggest that the analysis of formant bandwidth can be applied to articulation therapy for hearing-impaired children with hearing aids or cochlear implants.
Phonological Characteristics of Early Vocabulary in Young Children with Cleft Palate
Ha, Seunghee ;
Phonetics and Speech Sciences, volume 6, issue 2, 2014, Pages 65~71
DOI : 10.13064/KSSS.2014.6.2.065
The purpose of this study was to investigate whether young children with cleft palate differ from noncleft typically developing children in terms of expressive vocabulary size, phonological characteristics, and lexical selectivity. A total of 12 children with cleft palate and 12 noncleft children matched by age and gender participated in the study. The groups were compared on the size of expressive vocabulary reported on the Korean version of the MacArthur-Bates Communicative Development Inventories and, in a language sample, on the number of different words, consonant inventory, the percentage of words beginning with obstruents and with vowels, nasals, and glottal sounds, and the percentage of words that do not include obstruents. Correlation analyses were also performed to examine the relationship between measures of expressive vocabulary size and phonological characteristics. The results showed that expressive vocabulary size and consonant inventory were significantly smaller for children with cleft palate than for noncleft children. Children with cleft palate produced significantly more words that begin with a vowel or do not include obstruents, and fewer words beginning with obstruents, than noncleft children. The two groups showed different patterns of significant correlations between measures of expressive vocabulary size and phonological characteristics, indicating that children with cleft palate show different lexical selectivity from their noncleft peers. The results suggest that children with cleft palate aged 18-30 months demonstrate a slower rate of lexical and phonological development than their noncleft peers and that they develop lexical selectivity reflecting cleft palate speech. The results have clinical implications for speech-language intervention for young children with cleft palate.
Comparison of Acoustic Phonetic Characteristics of Korean Fricative Sounds Pronounced by Hearing-impaired Children and Normal Children
Kim, YunHa ; Kim, Eunyeon ; Jang, Seoung-Jin ; Choi, Yaelin ;
Phonetics and Speech Sciences, volume 6, issue 2, 2014, Pages 73~79
DOI : 10.13064/KSSS.2014.6.2.073
The alveolar fricative sounds /s/ and /s'/ are the last sounds acquired by normal children in Korean speech development. They are especially difficult for hearing-impaired children to articulate and often cause articulation errors. Acoustic phonetic evaluation uses testing tools to provide indirect and objective information. These objective resources can be compared with standardized speech resources when interpreting the results of a test. However, most previous studies in Korea have not included acoustic studies that used the spectrum moment values of hearing-impaired children. Therefore, this study was conducted to compare the characteristics of hearing-impaired children's pronunciation of fricative sounds using spectrum moment values. For this purpose, the study selected a total of 10 hearing-impaired children (5 boys and 5 girls) currently in 3rd or 5th grade and attending an elementary school in Seoul or Gyeonggi-do. In the selection process, their age, type of hearing aid, cochlear implantation (CI) before two years of age, hearing capacity (dB) before and after wearing the hearing aid, duration of speech rehabilitation, and time of learning alveolar fricative sounds were all considered. In addition, 10 normal children (5 boys and 5 girls) were selected from among 3rd or 5th grade students attending an elementary school in Seoul or Gyeonggi-do. The subjects were asked to read the carrier sentence "I say _______," with a list of 12 meaningless syllables composed of CV and VCV syllables containing the alveolar fricative sounds /s/ and /s'/ and the vowels /a/, /i/, and /u/. The recordings were processed with the Time-frequency Analysis Software Program to measure M1 (mean), M2 (variance), M3 (skewness), and M4 (kurtosis) of the fricative noise.
When the spectrum threshold values of hearing-impaired and normal children's alveolar fricative pronunciation were compared according to vowel (/a/, /i/, and /u/), fricative (/s/ and /s'/), and syllable structure (CV, VCV), no significant differences were found except for M3 in the between-group comparison by disability. In the comparison of syllable structures, there were statistically significant differences in M1, M2, M3, and M4, with clinical significance. However, there was no significant difference when the alveolar fricative sounds were compared according to vowel.
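The four spectral moments named in this abstract can be computed by treating the normalized power spectrum as a probability distribution over frequency. This is a generic sketch of that idea, not the procedure of the software used in the study; the function name and the toy spectrum are hypothetical, and M4 is given as plain (non-excess) kurtosis, whereas some tools subtract 3.

```python
def spectral_moments(freqs_hz, power):
    """Return (M1 mean, M2 variance, M3 skewness, M4 kurtosis) of a spectrum."""
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum must contain positive power")
    # Normalize so the power values behave like probabilities.
    probs = [p / total for p in power]
    m1 = sum(f * p for f, p in zip(freqs_hz, probs))

    def central_moment(order):
        return sum(((f - m1) ** order) * p for f, p in zip(freqs_hz, probs))

    m2 = central_moment(2)
    m3 = central_moment(3) / m2 ** 1.5  # skewness (dimensionless)
    m4 = central_moment(4) / m2 ** 2    # kurtosis (dimensionless, non-excess)
    return m1, m2, m3, m4

# Toy symmetric spectrum with three bins, for illustration only.
m1, m2, m3, m4 = spectral_moments([1000.0, 2000.0, 3000.0], [1.0, 2.0, 1.0])
print(m1, m2, m3, m4)
```

For real fricative noise, `freqs_hz` and `power` would come from a short-time Fourier transform of the frication interval.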
Listener's Age Estimation by Prosody Manipulation
Kim, Jiyoun ; Seong, Cheoljae ;
Phonetics and Speech Sciences, volume 6, issue 2, 2014, Pages 81~88
DOI : 10.13064/KSSS.2014.6.2.081
The normal aging process affects speech production, and these changes are perceived by listeners. This study examined whether age perception changed under various conditions of prosodic manipulation in normal listeners, comparing the prosodic changes according to age and sex in adulthood. The older and younger voices were resynthesized by manipulating speaking rate and pitch to shift the perceived age of the groups toward each other. Two-way repeated-measures ANOVAs were conducted to determine whether the type of resynthesized prosodic cue resulted in a significant shift in the perceived age of young and old voices. Manipulating the speaking rate resulted in a significant shift in perceived age for both the older and younger groups. A significant shift in age estimates was not observed for the younger male group when pitch was manipulated. There were significant gender-by-age group interactions for prosodic manipulation type. Age-related changes in the prosodic properties of speech may ultimately influence speech perception.
Vowel Space Area and Speech Intelligibility of Children with Cochlear Implants
Park, Hyemi ; Huh, Myungjin ;
Phonetics and Speech Sciences, volume 6, issue 2, 2014, Pages 89~96
DOI : 10.13064/KSSS.2014.6.2.089
This study measured speech intelligibility in relation to the vowel space area and listener perception through acoustic analysis of children who had received cochlear implants. It also provides basic data for the evaluation of speech intelligibility by analyzing the correlation between the vowel space area and speech intelligibility. As a research method, the vowel space area was analyzed by obtaining the value of [...] in children three years after receiving cochlear implants, and these children were compared with normal children by measuring speech intelligibility through interval scaling. A product-moment correlation analysis was conducted to investigate the correlation. Results showed that the vowel space area of the children who had received cochlear implants was significantly different from that of the normal children, though their speech intelligibility was similar to that of the normal children. The correlation analysis showed no significant correlation between the vowel space area and speech intelligibility. Therefore, the period of improving intelligibility after receiving cochlear implants and the objective standards of the vowel space area could be established. In addition, acoustic rating was required to increase the accuracy of objective measurement in the evaluation of speech intelligibility.
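One common way to compute a vowel space area is the polygon area spanned by corner vowels in the F1-F2 plane, using the shoelace formula. The abstract does not preserve which values the study used, so the F1/F2 assumption, the function name, and the formant values below are all hypothetical.

```python
def vowel_space_area(corner_vowels):
    """Polygon area (Hz^2) via the shoelace formula.

    corner_vowels -- list of (F1_hz, F2_hz) points, given in order around
                     the polygon (for example /i/, /a/, /u/).
    """
    if len(corner_vowels) < 3:
        raise ValueError("at least three vowels are needed to form an area")
    twice_area = 0.0
    for index, (x1, y1) in enumerate(corner_vowels):
        # Wrap around so the last point connects back to the first.
        x2, y2 = corner_vowels[(index + 1) % len(corner_vowels)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

# Hypothetical mean formants (F1, F2) in Hz for /i/, /a/, /u/.
area = vowel_space_area([(300.0, 2300.0), (800.0, 1300.0), (350.0, 800.0)])
print(f"vowel space area = {area:.0f} Hz^2")
```

A larger area generally indicates more distinct vowel targets, which is why it is often examined alongside intelligibility ratings.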
Characteristics of Maximal Tongue and Lip Strength and Tongue Endurance Scores According to Age and Gender in Healthy Korean Adults
Song, Yunkyung ;
Phonetics and Speech Sciences, volume 6, issue 2, 2014, Pages 97~106
DOI : 10.13064/KSSS.2014.6.2.097
The purpose of this study was to (1) establish Korean adult normative data for the Iowa Oral Performance Instrument, (2) investigate the characteristics of maximal tongue and lip strength and tongue endurance scores according to age and gender, and (3) examine the correlation among those scores. The results showed no significant gender differences in maximal tongue strength or tongue endurance scores. However, there were significant age differences in maximal tongue and lip strength and in tongue endurance scores. The data will provide an important database for speech-language pathology in the diagnosis and treatment of tongue and lip dysfunction.
Difference in Voice Parameters of MDVP and Praat Programs according to Severity of Voice Disorders in Vocal Nodule
Shim, SangYong ; Kim, HyangHee ; Kim, JaeOck ; Shin, JiCheol ;
Phonetics and Speech Sciences, volume 6, issue 2, 2014, Pages 107~114
DOI : 10.13064/KSSS.2014.6.2.107
MDVP and Praat have nine measured variables in common: F0, jitter local, jitter absolute, jitter relative average perturbation, jitter period perturbation quotient, shimmer local, shimmer dB, shimmer amplitude perturbation quotient, and NHR. In the present study, 30 female subjects were classified by disorder (control group, vocal nodule group), age (from 18 to 50 years old), gender (women), and severity of voice disorder (GRBAS G0, G1, G2). The subjects' vowel /a/ was then evaluated with MDVP and Praat. First, the jitter and shimmer variables of MDVP differed significantly by severity, while Praat showed differences by severity in the jitter, shimmer, and NHR parameters. Second, the jitter and NHR levels of MDVP were significantly higher than those of Praat regardless of severity. The results confirm the relationships among GRBAS, MDVP, and Praat as well as the differences in acoustic variables between MDVP and Praat.
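Two of the shared variables, jitter absolute and jitter local, can be sketched from a list of glottal period durations. The definitions follow the usual textbook form (mean absolute difference between consecutive periods, optionally divided by the mean period); the function names and period values are hypothetical, and neither program's period-detection step is reproduced here.

```python
def jitter_absolute(periods_sec):
    """Mean absolute difference between consecutive periods, in seconds."""
    if len(periods_sec) < 2:
        raise ValueError("at least two periods are required")
    diffs = [abs(b - a) for a, b in zip(periods_sec, periods_sec[1:])]
    return sum(diffs) / len(diffs)

def jitter_local(periods_sec):
    """Jitter absolute divided by the mean period (a dimensionless ratio)."""
    mean_period = sum(periods_sec) / len(periods_sec)
    return jitter_absolute(periods_sec) / mean_period

# Hypothetical glottal periods (seconds) alternating between 10 ms and 11 ms.
periods = [0.010, 0.011, 0.010, 0.011]
print(f"jitter absolute = {jitter_absolute(periods) * 1e6:.0f} microseconds")
print(f"jitter local    = {jitter_local(periods) * 100:.2f} %")
```

Because the two programs may extract periods differently, the same recording can yield different jitter values even when the formula is identical, which is consistent with the differences the abstract reports.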
The Relationship between Acoustic Characteristics and Voice Handicap Index in Esophageal Speakers
Jang, Hyo-Ryung ; Shim, Hee-Jeong ; Shin, Hee-Baek ; Ko, Do-Heung ; Kim, Hyun-Ki ;
Phonetics and Speech Sciences, volume 6, issue 2, 2014, Pages 115~121
DOI : 10.13064/KSSS.2014.6.2.115
This paper investigates the relationship between acoustic characteristics and the Voice Handicap Index (VHI) in 29 male esophageal speakers. Acoustic characteristics were measured from three productions of the sustained vowel /a/. A stable 2-second portion of the vocalization was analyzed with the MDVP program. Specifically, relationships between four VHI scores (total, functional, physical, and emotional) and three acoustic characteristics (jitter, shimmer, and NHR) were investigated using the Pearson correlation coefficient. We found no relationship between NHR and the VHI scores. However, both jitter and shimmer had statistically significant correlations with all four VHI scores. This research will contribute to establishing a baseline for speech characteristics in the voice rehabilitation of esophageal speakers. Further research could examine an overall quality-of-life survey, which is widely used as a subjective measure of voice for esophageal speakers.
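The Pearson correlation coefficient used in this analysis can be written out directly from its definition. The sketch below is illustrative only: the function name and the jitter and VHI values are hypothetical, not data from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical jitter (%) and total VHI scores for four speakers.
jitter = [1.2, 2.5, 3.1, 4.8]
vhi_total = [30, 52, 61, 90]
r = pearson_r(jitter, vhi_total)
print(f"r = {r:.3f}")
```

In a design like the one described, this calculation would be repeated for each pairing of the three acoustic measures with the four VHI scores.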
The Comprehension and Production of Tense Markings in Language Delayed Children and Typically Developing Children
Jo, Miok ; Choi, Soyoung ; Hwang, Mina ;
Phonetics and Speech Sciences, volume 6, issue 2, 2014, Pages 123~131
DOI : 10.13064/KSSS.2014.6.2.123
The purpose of this study is to investigate the comprehension and production of various tense markings in Korean-speaking children with and without language delay. Thirty children with language delay (LD) and 30 typically developing (TD) children participated in the study. In each group, half were at the age of 4 years and the other half at 7 years. In both the comprehension and production tasks, 28 verbs containing four types of tense markings were used: past tense '-et ta', two present progressives '-ko itta' and '-enta', and future tense '-elyeko hanta'. In the comprehension task, the children were presented with three printed still scenes from a video recording of a verb action, representing the future, present progressive, and past tense of the verb, respectively. They then listened to the action verb with one of the 4 tense markings and had to pick the scene that matched the verb tense. In the production task, the children were given one of the three scenes and asked to produce the verb with the appropriate tense marking. In both tasks, the LD children performed significantly worse than the TD children, and the older children performed significantly better than the younger children. Interestingly, the pattern of performance across the different types of tense markings at the two language-age levels was closely similar in LD and TD children. This similarity between groups seemed stronger in the comprehension task than in the production task.
A Study on the Validation of Phonation Threshold Power and the Clinical Usefulness of PTW: A Preliminary Study
Hwang, Youngjin ; Lee, Inae ;
Phonetics and Speech Sciences, volume 6, issue 2, 2014, Pages 133~138
DOI : 10.13064/KSSS.2014.6.2.133
This study attempted to validate Phonation Threshold Power in patients with functional voice disorder. 50 subjects participated in the study (32 subjects were patients who had functional voice disorders and 20 subjects were normal adults). The PAS (Phonatory Aerodynamic System, model 6600, KAY Electronics, Inc.) was used to collect and analyze the data. Phonation Threshold Power was obtained by multiplying Phonation Threshold Pressure by Phonation Threshold Airflow, both of which were measured with the PAS protocol. These measures were used because they reflect the ease of phonation. The results showed that the difference in Phonation Threshold Power between patients with functional voice disorder and normal adults could serve as a significant index. Patients with functional voice disorder showed higher values than normal adults. The results also showed that Phonation Threshold Power is more sensitive than Phonation Threshold Pressure and Phonation Threshold Airflow. The measured data also provided useful information for diagnosing patients with vocal fold disorders.
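The product described in this abstract can be sketched as a one-line calculation. The abstract states only that pressure and airflow are multiplied; the input units (cmH2O and L/s), the conversion to watts, the function name, and the sample values are assumptions for illustration.

```python
CM_H2O_TO_PASCAL = 98.0665       # 1 cmH2O expressed in pascals
LITRE_PER_S_TO_M3_PER_S = 1e-3   # 1 L/s expressed in cubic metres per second

def phonation_threshold_power(pressure_cm_h2o, airflow_l_per_s):
    """Phonation threshold power in watts: threshold pressure x threshold airflow.

    Assumes pressure in cmH2O and airflow in L/s; both are converted
    to SI units so the product comes out in watts.
    """
    pressure_pa = pressure_cm_h2o * CM_H2O_TO_PASCAL
    airflow_m3_per_s = airflow_l_per_s * LITRE_PER_S_TO_M3_PER_S
    return pressure_pa * airflow_m3_per_s

# Hypothetical threshold values, not measurements from the study.
power_watts = phonation_threshold_power(pressure_cm_h2o=5.0, airflow_l_per_s=0.1)
print(f"phonation threshold power = {power_watts * 1000:.1f} mW")
```

Because the measure multiplies two quantities that both tend to rise when phonation is effortful, small increases in each can produce a larger relative increase in the product.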
Initial-syllable lengthening of an utterance-internal phrase in Korean
Yun, Ilsung ;
Phonetics and Speech Sciences, volume 6, issue 2, 2014, Pages 141~151
DOI : 10.13064/KSSS.2014.6.2.141
This study reports anti-hierarchical initial-syllable lengthening of an utterance-internal phrase in Korean. That is, the phrase-initial syllable (e.g., /a/ of "apa-do" or /ma/ of "mapa-do") starting with a voiced phoneme (i.e., vowels or voiced consonants) manifests itself as significantly longer when it is preceded by another phrase without a pause than when it leads an utterance or follows a pause utterance-internally. The phenomenon was examined with regard to two other factors: (1) tempo and (2) tenseness of the consonant (/p, [...]/) following the target syllable /a/. First, the effect of tempo on initial lengthening was not significant. Apart from the statistical significance, however, a tendency was observed, i.e., the slower the tempo is, the greater the lengthening. By contrast, the faster the tempo is, the higher the ratio (%) of lengthening. Second, contrary to our expectations, initial-syllable lengthening was even greater before tense stops /[...]/ than before lax stop /p/ regardless of tempo, and it was remarkable when it comes to the ratio (%), which means that initial lengthening is free of the pre-consonantal vowel shortening effect. Final-syllable lengthening is a pre-boundary marker, while the initial-syllable lengthening is regarded as a post-boundary marker of a phrase.
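The contrast between the amount of lengthening and the ratio (%) of lengthening can be made concrete with a small calculation. The abstract does not define either measure, so the definitions, the function name, and the durations below are assumptions chosen to mirror the reported tendency.

```python
def lengthening(baseline_ms, phrase_internal_ms):
    """Return (absolute lengthening in ms, lengthening ratio in percent).

    baseline_ms        -- syllable duration when it leads an utterance or
                          follows a pause (assumed reference condition)
    phrase_internal_ms -- duration when preceded by another phrase, no pause
    """
    if baseline_ms <= 0:
        raise ValueError("baseline duration must be positive")
    absolute = phrase_internal_ms - baseline_ms
    ratio_percent = 100.0 * absolute / baseline_ms
    return absolute, ratio_percent

# Hypothetical durations (ms) at a slow and a fast tempo.
slow_abs, slow_ratio = lengthening(baseline_ms=100.0, phrase_internal_ms=130.0)
fast_abs, fast_ratio = lengthening(baseline_ms=60.0, phrase_internal_ms=84.0)
print(f"slow: +{slow_abs:.0f} ms ({slow_ratio:.0f}%)")
print(f"fast: +{fast_abs:.0f} ms ({fast_ratio:.0f}%)")
```

With these illustrative numbers the slow tempo shows the larger absolute lengthening while the fast tempo shows the higher ratio, because the fast-tempo baseline is shorter.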
Prosody and comprehension of ambiguous dative NPs in Korean
Kang, Soyoung ;
Phonetics and Speech Sciences, volume 6, issue 2, 2014, Pages 153~161
DOI : 10.13064/KSSS.2014.6.2.153
The current study reports the results of a cross-modal naming experiment investigating the effects of prosodic boundary location on the comprehension of ambiguous dative NPs in Korean (Yeongmi-ka Ceonghi-eykey norae-rul pwulecwu-n pwuin-ul [...]). The underlined dative NP, Ceonghi-eykey, can temporarily be attached either to the embedded rel-marked verb, pwulecwu-n ('sing-rel'), or to the matrix verb that appears later. Participants heard sentence fragments manipulated for the location of an Intonation Phrase boundary (the biggest prosodic boundary in the model of Seoul Korean) and immediately afterward had to name visually presented targets that resolve the ambiguity of the dative NPs. The prosodic manipulation did not result in a difference in naming time, suggesting that the location of a prosodic boundary failed to influence the way Korean listeners interpreted ambiguous dative NPs. Possible reasons for the null effect are discussed.
An Experimental Study of Vowel Epenthesis among Korean Learners of English
Shin, Dong-Jin ; Iverson, Paul ;
Phonetics and Speech Sciences, volume 6, issue 2, 2014, Pages 163~174
DOI : 10.13064/KSSS.2014.6.2.163
Korean L2 speakers have many problems learning the pronunciation of English words. One of these problems is vowel epenthesis. Vowel epenthesis is the insertion of vowels into or between words, and Korean learners of English typically do this between successive consonants, either within clusters, or across syllables, word boundaries or following final coda consonants. The aim of this study was to investigate whether individual differences in vowel epenthesis are more closely related to the perception and production of segments (vowels and consonants) and prosody or if they are relatively independent from these processes. Subjects completed a battery of production and perception tasks. They read sentences, identified vowels and consonants, read target words likely to have epenthetic vowels (e.g., abduction) and demonstrated stress recognition and epenthetic vowel perception. The results revealed that Korean second-language learners (L2) have problems with vowel epenthesis in production and perception, but production and perception abilities were not correlated with one another. Vowel epenthesis was strongly related to vowel production and perception, suggesting that problems with segments may be combined with L1 phonotactics to produce epenthesis.