Phonetics and Speech Sciences
The Korean Society of Speech Sciences
A Validity Study on Measurement of Mental Fatigue Using Speech Technology
Song, Seungkyu ; Kim, Jongyeol ; Jang, Junsu ; Kwon, Chulhong ;
Phonetics and Speech Sciences, volume 5, issue 1, 2013, Pages 3~10
DOI : 10.13064/KSSS.2013.5.1.003
This study proposes a method to measure mental fatigue using speech technology, which has not been used in previous research and is easier than existing complex and difficult methods. It aims to establish a relationship between the human voice and mental fatigue based on experiments measuring the influence of mental fatigue on the human voice. Two monotonous tasks of simple calculation, such as finding the sum of three one-digit numbers, were used to measure the feeling of monotony, and two sets of subjective questionnaires were used to measure mental fatigue. While thirty subjects performed the experiment, responses to the questionnaires and speech data were collected. Speech features related to the speech source and the vocal tract filter were extracted from the speech data. According to the results, the speech parameters closely related to mental fatigue are the mean and standard deviation of fundamental frequency, jitter, and shimmer. This study shows that speech technology is a useful method for measuring mental fatigue.
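As an illustration of the speech-source parameters named in this abstract, the sketch below computes the mean and standard deviation of fundamental frequency together with jitter and shimmer from per-cycle measurements. The abstract does not say which jitter or shimmer variant the authors used, so the common "local" definition (mean absolute difference between consecutive cycles, relative to the mean) is an assumption here, and the function names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def local_perturbation_percent(values):
    """Mean absolute difference between consecutive cycles, as a percentage
    of the mean value. Applied to glottal periods this is 'local jitter';
    applied to peak amplitudes it is 'local shimmer' (assumed variant)."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


def f0_mean_and_sd(periods_s):
    """Mean and population standard deviation of F0 (Hz) from periods (s)."""
    f0 = [1.0 / p for p in periods_s]
    return mean(f0), pstdev(f0)


# Hypothetical per-cycle measurements from a sustained vowel.
periods = [0.0100, 0.0102, 0.0099, 0.0101]   # seconds
amplitudes = [0.80, 0.78, 0.82, 0.79]        # arbitrary linear units

jitter = local_perturbation_percent(periods)
shimmer = local_perturbation_percent(amplitudes)
f0_mean, f0_sd = f0_mean_and_sd(periods)
print(f"F0 mean={f0_mean:.1f} Hz, SD={f0_sd:.2f} Hz, "
      f"jitter={jitter:.2f}%, shimmer={shimmer:.2f}%")
```

In practice the periods and amplitudes would come from a pitch-marking tool; the sketch only shows the arithmetic applied once those values are available.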
Robust Feature Extraction for Voice Activity Detection in Nonstationary Noisy Environments
Hong, Jungpyo ; Park, Sangjun ; Jeong, Sangbae ; Hahn, Minsoo ;
Phonetics and Speech Sciences, volume 5, issue 1, 2013, Pages 11~16
DOI : 10.13064/KSSS.2013.5.1.011
This paper proposes robust feature extraction for accurate voice activity detection (VAD). VAD is one of the principal modules in speech signal processing applications such as speech coding, speech enhancement, and speech recognition. Noisy environments contain nonstationary noises, which cause VAD accuracy to decline drastically because the fluctuation of features in noise intervals increases false alarm rates. To improve VAD performance, this paper proposes harmonic-weighted energy. This feature extraction method focuses on voiced speech intervals and uses weighted harmonic-to-noise ratios to determine the amount of harmonicity relative to frame energy. For performance evaluation, receiver operating characteristic curves and the equal error rate are measured.
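The equal error rate mentioned in this abstract can be illustrated with a small sketch. It sweeps a decision threshold over frame-level scores and reports the operating point where the false alarm rate and the miss rate are closest. This is not the authors' evaluation code: the function name, the scores, and the labels are hypothetical, and the scores stand in for any VAD feature where larger values mean "more speech-like".

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    scores: per-frame feature values (higher means more speech-like).
    labels: 1 for speech frames, 0 for noise frames.
    Returns (eer, threshold) at the point where the false alarm rate and
    the miss rate are closest; the EER is their average at that point.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both speech and noise frames")

    best = None
    for t in sorted(set(scores)):
        # A frame is classified as speech when its score is >= t.
        false_alarms = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        misses = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        far = false_alarms / negatives
        miss_rate = misses / positives
        gap = abs(far - miss_rate)
        if best is None or gap < best[0]:
            best = (gap, (far + miss_rate) / 2.0, t)
    return best[1], best[2]


# Hypothetical frame scores: four noise frames followed by four speech frames.
scores = [0.1, 0.2, 0.3, 0.4, 0.35, 0.6, 0.7, 0.8]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
eer, threshold = equal_error_rate(scores, labels)
print(f"EER={eer:.2f} at threshold {threshold}")
```

Each threshold in the sweep also yields one point of the receiver operating characteristic curve, so the same loop could collect (false alarm rate, detection rate) pairs for plotting.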
Aerodynamic Characteristics, Vocal Efficiency, and Closed Quotient Differences according to Fundamental Frequency Fixation
Kim, Jaeock ;
Phonetics and Speech Sciences, volume 5, issue 1, 2013, Pages 19~26
DOI : 10.13064/KSSS.2013.5.1.019
The aerodynamic characteristics (subglottal pressure (Ps) and mean airflow rate (MFR)), fundamental frequency (Fo), intensity (I), vocal efficiency (VE), and closed quotient (CQ) were compared during a sustained vowel /o/ under three conditions: at a comfortable loudness and pitch level (condition 1), at a maximum loudness level with a fixed pitch (condition 2), and at a maximum loudness level without a fixed pitch (condition 3). Multiple regression analyses were also performed to measure the effect of the aerodynamic characteristics on VE and CQ in each condition. The results showed that Fo, Ps, MFR, VE, and CQ increased as I increased, with and without a fixed pitch. Most notably, VE in condition 3 was the highest of all the conditions, but CQ was not very high. According to the multiple regression analyses, VE was significantly affected by I and Ps in all conditions; Fo was the other main factor affecting VE at high pitch. However, none of the aerodynamic characteristics significantly affected CQ. As I increases, Fo should be increased by increasing Ps and VE. Therefore, researchers should consider and specify Fo, Ps, and I a priori when measuring VE to examine the complex and delicate vocal mechanism.
Improving Cognitive Abilities for People with Alzheimer's Disease: Application and Effect of Reality Orientation Therapy (ROT)
Kim, JungWan ;
Phonetics and Speech Sciences, volume 5, issue 1, 2013, Pages 27~38
DOI : 10.13064/KSSS.2013.5.1.027
Healthcare providers in Korea are using conservative pharmacological treatment for Alzheimer's disease (AD) to delay the progress of the disease or to mitigate its behavioral and neurological symptoms. However, there is a growing need for practical non-pharmacological interventions, as the effects of pharmacological treatments have faced limitations. This research provided a cognitive rehabilitation program to 3 AD patients and used a multiple baseline design across subjects to examine the effects. Performing reality orientation therapy (ROT) for 1 cycle (4 weeks) resulted in a slight increase in accuracy and responsiveness on an orientation task, mainly in patients with mild cases of AD. Also, in the sub-domains of the Korean-Mini Mental Status Examination, performed to examine changes in cognitive ability, there were minimal changes in place orientation. In functional communication, however, there were no significant differences before and after the intervention. In conclusion, we found that ROT was an effective intervention for improving accuracy and responsiveness in the orientation of patients with mild cases of AD. In future studies, the effect of non-pharmacological interventions can be evaluated more reliably by examining the interaction effects of sample size, length of the intervention, outcome measurements, and pharmacological intervention.
Effects of Phonetic Complexity and Articulatory Severity on Percentage of Correct Consonant and Speech Intelligibility in Adults with Dysarthria
Song, HanNae ; Lee, Youngmee ; Sim, HyunSub ; Sung, JeeEun ;
Phonetics and Speech Sciences, volume 5, issue 1, 2013, Pages 39~46
DOI : 10.13064/KSSS.2013.5.1.039
This study examined the effects of phonetic complexity and articulatory severity on Percentage of Correct Consonant (PCC) and speech intelligibility in adults with dysarthria. Speech samples of thirty-two words from APAC (Assessment of Phonology and Articulation of Children) were collected from 38 dysarthric speakers with one of two levels of articulatory severity (mild or mild-moderate). PCC and speech intelligibility scores were calculated for each of the 4 levels of phonetic complexity. A two-way mixed ANOVA revealed that (1) the group with mild severity showed significantly higher PCC and speech intelligibility scores than the mild-moderate articulatory severity group, (2) PCC at phonetic complexity level 4 was significantly lower than at the other levels, and (3) an interaction effect of articulatory severity and phonetic complexity was observed only for PCC. Pearson correlation analysis demonstrated that the degree of correlation between PCC and speech intelligibility varied depending on the level of articulatory severity and phonetic complexity. The clinical implications of the findings are discussed.
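For readers unfamiliar with the PCC measure used in this abstract, a minimal sketch follows. It assumes the usual definition (correctly produced consonants divided by target consonants, times 100) and a simplified input in which target and produced consonants are already aligned one to one, with None marking an omission. The function name and the sample transcription are hypothetical and not taken from the study.

```python
def percentage_of_correct_consonants(targets, productions):
    """PCC = 100 * (correctly produced consonants) / (target consonants).

    targets and productions are aligned lists of consonant symbols;
    a production of None marks an omitted consonant.
    """
    if len(targets) != len(productions):
        raise ValueError("targets and productions must be aligned")
    if not targets:
        raise ValueError("no target consonants")
    correct = sum(1 for t, p in zip(targets, productions) if t == p)
    return 100.0 * correct / len(targets)


# Hypothetical aligned transcription: one substitution and one omission.
targets = ["k", "t", "p", "s"]
productions = ["k", "t", "b", None]
print(percentage_of_correct_consonants(targets, productions))
```

Computing the score separately for the word subsets at each phonetic complexity level would give one PCC value per level and speaker, which is the shape of data a two-way mixed ANOVA expects.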
Consonant Confusions Matrices in Adults with Dysarthria Associated with Cerebral Palsy
Lee, Youngmee ; Sung, JeeEun ; Sim, HyunSub ;
Phonetics and Speech Sciences, volume 5, issue 1, 2013, Pages 47~54
DOI : 10.13064/KSSS.2013.5.1.047
The aim of this study was to analyze consonant articulation errors produced by 90 speakers with cerebral palsy (CP). Phonetic transcriptions were made for 37 single-word utterances containing 70 phonemes: 48 initial consonants and 22 final consonants. Errors of substitution, omission, and distortion were analyzed using a confusion matrix paradigm that visualizes error patterns. Results showed that substitution errors in initial and final consonants were most frequent, followed by omission and distortion. Consonant omission occurred more frequently in final consonants. In both initial and final consonants, within-place errors were more prominent than within-manner errors. The current results suggest that consonant confusion matrices for dysarthric speech may provide useful information for evaluating speech intelligibility and developing automatic speech recognition systems for adults with CP-associated dysarthria.
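The confusion matrix paradigm described in this abstract can be sketched as a table of counts indexed by (target consonant, produced consonant). The example below is only an illustration under simplifying assumptions: each transcribed consonant is given as a pair, None marks an omission, and distortions are left out because they need a finer transcription than a single symbol. All names and data are hypothetical.

```python
from collections import Counter


def build_confusion_matrix(pairs):
    """Count how often each target consonant was produced as each response.

    pairs: iterable of (target, produced); produced is None for an omission.
    Returns a Counter keyed by (target, produced).
    """
    return Counter(pairs)


def error_summary(matrix):
    """Collapse the matrix into counts of correct, substituted, and omitted."""
    summary = {"correct": 0, "substitution": 0, "omission": 0}
    for (target, produced), count in matrix.items():
        if produced is None:
            summary["omission"] += count
        elif produced == target:
            summary["correct"] += count
        else:
            summary["substitution"] += count
    return summary


# Hypothetical transcriptions pooled over several words.
pairs = [("k", "k"), ("k", "t"), ("s", "t"), ("p", None), ("k", "t")]
matrix = build_confusion_matrix(pairs)
print(matrix[("k", "t")])
print(error_summary(matrix))
```

Grouping the off-diagonal cells by place or manner of articulation would then show whether errors stay within a place or within a manner category.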
The Aerodynamic Comparisons between Pathologic Whispers and Phonation in Patients with Muscle Misuse Dysphonia
Seo, Inhyo ; Hwang, Youngjin ; Seong, Cheoljae ;
Phonetics and Speech Sciences, volume 5, issue 1, 2013, Pages 55~62
DOI : 10.13064/KSSS.2013.5.1.055
This study compared multiple aerodynamic parameters of whispers and phonation in patients with muscle misuse dysphonia (MMD) to evaluate voice aerodynamic analysis as a means of discriminating between whispers and phonation. Eleven patients with MMD were examined. Whispers had a shorter maximum phonation time (MPT; p<.01), a lower phonatory sound pressure level (SPLp; p<.01), a higher phonatory flow rate (PFR; p<.01), a lower phonatory efficiency (PE; p<.01), and a lower phonatory resistance (PR; p<.05) than phonation. The subglottal pressure level (Psub; p>.05) was not significantly different between whispers and phonation. The ROC analysis showed that a threshold of 23.83 ppm for PE classified whispers with perfect sensitivity (100%) and specificity (100%). These results indicate that PE reliably distinguished between whispers and phonation. The results also suggest that PE may provide a useful tool for studying the laryngeal source.
A Study on the Characteristics of Phonation Threshold Pressure and Phonation Threshold Airflow of Patients with Functional Voice Disorder
Lee, Inae ; Yun, Joowon ; Hwang, Youngjin ;
Phonetics and Speech Sciences, volume 5, issue 1, 2013, Pages 63~69
DOI : 10.13064/KSSS.2013.5.1.063
This study investigated the characteristics of phonation threshold pressure and phonation threshold airflow in patients with functional voice disorders. 50 subjects participated in the study (32 subjects were patients who had functional voice disorders and 20 subjects were normal adults). The PAS (Phonatory Aerodynamic System, model 6600, KAY Electronics, Inc.) was used to collect and analyze the data. Phonation threshold pressure was measured using the Voicing Efficiency protocol of the PAS. Phonation threshold airflow was measured using the Maximum Sustained Phonation protocol of the PAS. These protocols were used because of the ease of phonation. The results showed that the differences in phonation threshold pressure and phonation threshold airflow between patients with functional voice disorders and normal adults could serve as a significant index. Patients with functional voice disorders showed higher values than normal adults. These results suggest that phonation threshold pressure and phonation threshold airflow are very useful in diagnosing voice disorders. The measured data also provided useful information for diagnosing patients with vocal fold diseases.
A Study on the Formant Comparison of Korean Monophthongs according to Age and Gender -A Survey on Patients in Oriental Hospitals-
Kim, Young-Su ; Kim, Keun Ho ; Kim, Jong Yeol ; Jang, Jun-Su ;
Phonetics and Speech Sciences, volume 5, issue 1, 2013, Pages 73~80
DOI : 10.13064/KSSS.2013.5.1.073
Formants are among the essential vocal features for research on voice production, recognition, and synthesis. Numerous studies have been conducted on foreign languages, including studies of English vowels. However, studies of Korean have relied on a limited amount of voice data. In this study, we compare four formants according to age and gender using a large number of Korean monophthongs. A total of 2614 Korean speakers participated in our experiments. We summarize the statistical results by the mean and standard deviation of each formant for five monophthongs. The results show a notable difference for each age and gender group. A quantitative study based on a large dataset is suggested for future studies on Korean speech sounds.
Acoustic Characteristics on the Adolescent Period Aged from 16 to 18 Years
Ko, Hye-Ju ; Kang, Min-Jae ; Kwon, Hyuk-Jae ; Choi, Yaelin ; Lee, Mi-Geum ; Choi, Hong-Shik ;
Phonetics and Speech Sciences, volume 5, issue 1, 2013, Pages 81~90
DOI : 10.13064/KSSS.2013.5.1.081
During adolescence, the mutational period is characterized by changes in the laryngeal structure, the length of the vocal cords, and the tone of voice. Adolescents usually reach an adult voice at 15 or 16, but the mutational period is sometimes delayed. Therefore, studies on the voice of adolescents aged 16 to 18, right after the mutational period, are required. Accordingly, this paper attempted to provide basic normative data for patients with voice disorders during this period by evaluating the vocal characteristics of males and females aged 16 to 18 with an objective device and by comparing and analyzing them by sex and age. The study was conducted on a total of 60 subjects, with 10 subjects of each sex at each age. The vocal analysis was conducted using maximum phonation time (MPT) measurement, sustained vowels, and sentence reading. For the sustained vowel /a/, fundamental frequency (hereinafter F0), jitter, shimmer, and noise-to-harmonic ratio (hereinafter NHR) were measured using the Multi-Dimensional Voice Program (MDVP) in the Multi-Speech program of the Computerized Speech Lab (Kay Elemetrics). For sentence reading, mean F0 values were measured using the Real-Time Pitch (RTP) Model 5121 in the Multi-Speech program of the Computerized Speech Lab (Kay Elemetrics). As a result, according to sex, there were statistically significant differences in F0, jitter, shimmer, mean F0, and minimum F0; and according to age, there were statistically significant differences in MPT. In conclusion, the voice of adolescents aged 16 to 18 reached the maturity level of adults, but the voice quality, which can be considered on the scale of voice disorders, showed a transition to the adult voice during the mutational period.
The Internal Structure of an Identification Function in Korean Lexical Pitch Accent in North Kyungsang Dialect
Kim, Jungsun ;
Phonetics and Speech Sciences, volume 5, issue 1, 2013, Pages 91~98
DOI : 10.13064/KSSS.2013.5.1.091
This paper investigated Korean prosody as it relates to graded internal structure in an identification function. Within Korean prosody, variants regarded as dialectal variations can appear as different prosodic scales, which contain the range of within-category variations. The current experiment was intended to show how the prosodic scale corresponding to the range of within-category differences relates to f0 contours for speakers of two Korean dialects, North Kyungsang and South Cholla. In an identification task, participants responded by selecting an item from two answer choices. The probability of choosing the correct response from the two choices was computed by a logistic regression analysis using intercepts and slopes. That is, the probability of the correct response between the two choices was represented as an S-shaped curve. To investigate the graded internal structure of labeling, the 25%, 50%, and 75% points of predicted probability were assessed. Listeners from North Kyungsang showed progressive variations, whereas listeners from South Cholla revealed random patterns in the internal structure of the identification function. The results were plotted as scatterplots, applying the range of within-category variation and the predicted probability obtained from the logistic regression analyses. The scatterplots showed different degrees of response to the f0 scales (i.e., variations within categories). The results demonstrate that the gradient structures of native pitch accent users become more progressive in response to f0 scales.
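The relationship between the fitted intercept and slope and the 25%, 50%, and 75% points mentioned in this abstract can be made concrete with a short sketch. For a logistic model p(x) = 1 / (1 + exp(-(b0 + b1 * x))), the stimulus value at which the predicted probability equals p is x = (ln(p / (1 - p)) - b0) / b1. The coefficients below are hypothetical and are not taken from the paper; the function names are likewise illustrative.

```python
import math


def predicted_probability(x, intercept, slope):
    """Logistic identification function: probability of the response at x."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def stimulus_at_probability(p, intercept, slope):
    """Invert the logistic function: the x at which the prediction equals p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if slope == 0:
        raise ValueError("slope must be non-zero")
    return (math.log(p / (1.0 - p)) - intercept) / slope


# Hypothetical coefficients for one listener on an f0 continuum (step units).
intercept, slope = -5.0, 0.5
for p in (0.25, 0.50, 0.75):
    x = stimulus_at_probability(p, intercept, slope)
    print(f"{int(p * 100)}% point at step {x:.2f}")
```

The distance between the 25% and 75% points shrinks as the slope grows, so it gives one simple way to compare how sharply different listeners divide the continuum.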
Prosodic Modifications of the Internal Phonetic Structure of Monosyllabic CVC Words in Conversational Speech
Mo, Yoonsook ;
Phonetics and Speech Sciences, volume 5, issue 1, 2013, Pages 99~108
DOI : 10.13064/KSSS.2013.5.1.099
Previous laboratory studies have shown that prosodic structures are encoded in the modulations of phonetic patterns of speech including suprasegmental as well as segmental features. In particular, effects of prosodic context on duration and intensity of syllables and words have been widely reported. Drawing on prosodically annotated large-scale speech data from the Buckeye corpus of conversational speech of American English, the current study attempted to examine whether and how prosodic prominence and phrase boundary of everyday conversational speech, as determined by a large group of ordinary listeners, are related to the phonetic realization of duration and intensity. The results showed that the patterns of word durations and intensities are influenced by prosodic structure. Closer examinations revealed, however, that the effects of prosodic prominence are not the same as those of prosodic phrase boundary. With regard to intensity measures, the results revealed the systematic changes in the patterns of overall RMS intensity near prosodic phrase boundary but the prominence effects are restricted to the nucleus. In terms of duration measures, both prosodic prominence and phrase boundary are the most closely related to the lengthening of the nucleus. Yet, prosodic prominence is more closely related to the lengthening of the onset while phrase boundary lengthens the coda duration more. The findings from the current study suggest that the phonetic realizations of prosodic prominence are different from those of prosodic phrase boundary, and speakers signal different prosodic structures through deliberate modulations of the internal phonetic structure of words and listeners attend to such phonetic variations.
Articulatory Attributes in Korean Nonassimilating Contexts
Son, Minjung ;
Phonetics and Speech Sciences, volume 5, issue 1, 2013, Pages 109~121
DOI : 10.13064/KSSS.2013.5.1.109
This study examined several kinematic properties of the primary articulator (the tongue dorsum) and the supplementary articulator (the jaw) in the articulation of the voiceless velar stop (/k/) within nonassimilating contexts. We examined in particular the spatiotemporal properties (constriction duration and constriction maxima) from the constriction onset to the constriction offset by analyzing a velar (/k/) followed by the coronal fricative (/s/), the coronal stop (/t/), and the labial (/p/) in across-word boundary conditions (/k#s/, /k#t/, and /k#p/). Along with these measurements, we investigated intergestural temporal coordination between C1 and C2 and the jaw articulator in relation to its coordination with the articulation of consonant sequences. The articulatory movement data was collected by means of electromagnetic midsagittal articulometry (EMMA). Four native speakers of Seoul Korean participated in the laboratory experiment. The results showed several characteristics. First, a velar (/k/) in C1 was not categorically reduced. Constriction duration and constriction degree of the velar (/k/) were similar within nonassimilating contexts (/k#s/=/k#t/=/k#p/). This might mean that spatiotemporal attributes during constriction duration were stable and consistent across different contexts, which might be subsequently associated with the nontarget status of the velar in place assimilation. Second, the gestural overlap could be represented as the order of /k#s/ (less) < /k#p/ (intermediate) < /k#t/ (more), as measured by the onset-to-onset lag (a longer lag indicated less gestural overlap).
This indicates a gestural overlap within nonassimilating contexts may not be constrained by any of the several constraints including the perceptual recoverability constraint (e.g., more overlap in Front-to-Back sequences compared to the reverse order (Back-to-Front) since perceptual cues in C1 can be recovered anytime during C2 articulation), the low-level speech motor constraint (e.g., more overlap in lingual-nonlingual sequences as compared to the lingual-lingual sequences), or phonological contexts effects (e.g., similarity in gestural overlap within nonassimilating contexts). As one possible account for more overlap in /k#t/ sequences as compared to /k#p/, we suspect speakers' knowledge may be receptive to extreme encroachment on C1 by the gestural overlap of the coronal in C2 since it does not obscure the perceptual cue of C1 as much as the labial in C2. Third, actual jaw position during C2 was higher in coronals (/s/, /t/) than in the labial (/p/). However, within the coronals, there was no manner-dependent jaw height difference in C2 (/s/=/t/). Vertical jaw position of C1 and C2 was seen as inter-dependent as higher jaw position in C1 was closely associated with C2. Lastly, a greater gap in jaw height was associated with longer intergestural timing (e.g., less overlap), but was confined to the cluster type (/kp/) with the lingual-nonlingual sequence. This study showed that Korean jaw articulation was independent from coordinating primary articulators in gestural overlap in some cluster types (/k#s/, /k#t/) while not in others (e.g., /k#p/). Overall, the results coherently indicate the velar stop (/k/) in C1 was robust in articulation, which may have subsequently contributed to the nontarget status of the velar (/k/) in place assimilation processes.
Korean Intonation Patterns from the Viewpoint of F0
Lee, Ji Yeon ; Lee, Ho-Young ;
Phonetics and Speech Sciences, volume 5, issue 1, 2013, Pages 123~130
DOI : 10.13064/KSSS.2013.5.1.123
Previous research on Korean intonation has mainly focused on the F0 slope and the duration of intonation patterns. This study investigated Korean intonation patterns, both boundary and phrasal tones, in relation to the F0 percentage change between pitch targets. We measured the percentage change between the pitch targets of both boundary and phrasal tones. Additionally, the F0 change between the preceding pitch target and the first pitch target of the boundary tone and the F0 targets of the sequence of two LH phrasal tones ('LH + LH') were measured. Two phrasal tones, LHLH and HLH, were compared with 'LH + LH' and the 'HLH' in the LHLH pattern respectively. We found that the percentage change between pitch targets in the phrasal tone is fixed to some extent. This helped explain why the slope of the phrasal tone is closely related to the number of syllables and the duration of the phrasal tone as discussed in previous studies. Since we analyzed the intonation patterns with the utterances from a large speech corpus, the results of this paper are expected to be used in building a larger annotated corpus of Korean.
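The percentage change between pitch targets discussed in this abstract can be illustrated with a few lines of arithmetic. The abstract does not give the exact formula, so the sketch assumes the change is measured relative to the earlier target, that is 100 * (next - previous) / previous, with F0 in Hz. The function name and the target values are hypothetical.

```python
def f0_percentage_changes(targets_hz):
    """Percentage change between each pair of consecutive pitch targets.

    Assumed definition: 100 * (next - previous) / previous, so a rise is
    positive and a fall is negative.
    """
    if any(f <= 0 for f in targets_hz):
        raise ValueError("F0 targets must be positive")
    return [100.0 * (nxt - prev) / prev
            for prev, nxt in zip(targets_hz, targets_hz[1:])]


# Hypothetical F0 targets (Hz) for an LHL sequence within one phrase.
targets = [200.0, 250.0, 200.0]
print(f0_percentage_changes(targets))
```

A relative measure of this kind normalizes for speakers' differing pitch ranges, which is one plausible reason to prefer it over raw Hz differences when pooling utterances from a large corpus.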