Phonetics and Speech Sciences
Journal Basic Information
The Korean Society of Speech Sciences
Volume & Issues
Volume 7, Issue 4 - Dec 2015
Volume 7, Issue 3 - Sep 2015
Volume 7, Issue 2 - Jun 2015
Volume 7, Issue 1 - Mar 2015
Evaluation of Frequency Warping Based Features and Spectro-Temporal Features for Speaker Recognition
Choi, Young Ho ; Ban, Sung Min ; Kim, Kyung-Wha ; Kim, Hyung Soon ;
Phonetics and Speech Sciences, volume 7, issue 1, 2015, Pages 3~10
DOI : 10.13064/KSSS.2015.7.1.003
In this paper, different frequency scales for cepstral feature extraction are evaluated for text-independent speaker recognition. To this end, mel-frequency cepstral coefficients (MFCCs), linear frequency cepstral coefficients (LFCCs), and bilinear warped frequency cepstral coefficients (BWFCCs) are applied to the speaker recognition experiment. In addition, the spectro-temporal features extracted by the cepstral-time matrix (CTM) are examined as an alternative to the delta and delta-delta features. Experiments on the NIST speaker recognition evaluation (SRE) 2004 task are carried out using the Gaussian mixture model-universal background model (GMM-UBM) method and the joint factor analysis (JFA) method, both based on the ALIZE 3.0 toolkit. Experimental results with both methods show that BWFCC with an appropriate warping factor yields better performance than MFCC and LFCC. It is also shown that the feature set including the spectro-temporal information based on the CTM outperforms the conventional feature set including the delta and delta-delta features.
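The frequency scales compared in this abstract can be illustrated compactly. Below is a minimal sketch (not from the paper; the function names and the example warping factor are illustrative) of the mel warping used by MFCCs and the first-order all-pass (bilinear) warping underlying BWFCCs; setting the warping factor alpha to 0 recovers the linear scale of LFCCs.

```python
import numpy as np

def mel_warp(f_hz):
    """Mel-scale mapping used to space MFCC filterbank channels."""
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def bilinear_warp(omega, alpha):
    """First-order all-pass (bilinear) frequency warping.

    omega: normalized frequency in radians, 0 <= omega <= pi.
    alpha: warping factor in (-1, 1); alpha = 0 leaves the scale
    linear (LFCC-like), alpha > 0 stretches low frequencies
    (mel-like resolution).
    """
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega),
                                    1.0 - alpha * np.cos(omega))
```

Cepstral coefficients are then obtained by a DCT of log filterbank energies placed on the chosen warped scale; the experiments above vary essentially only this scale.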
Correlation between Physical Fatigue and Speech Signals
Kim, Taehun ; Kwon, Chulhong ;
Phonetics and Speech Sciences, volume 7, issue 1, 2015, Pages 11~17
DOI : 10.13064/KSSS.2015.7.1.011
This paper deals with the correlation between physical fatigue and speech signals. A treadmill task to increase fatigue and a subjective questionnaire for rating tiredness were designed. The questionnaire results and the collected bio-signals showed that the designed task induces physical fatigue. A paired-samples t-test between the speech signals and fatigue showed that the parameters statistically significantly related to fatigue are fundamental frequency, first and second formant frequencies, long-term average spectral slope, smoothed pitch perturbation quotient, relative average perturbation, pitch perturbation quotient, cepstral peak prominence, and harmonics-to-noise ratio. According to the experimental results, the mouth opens less widely and the voice becomes breathy as physical fatigue accumulates.
A comparison of CPP analysis among breathiness ranks
Kang, Youngae ; Koo, Bonseok ; Jo, Cheolwoo ;
Phonetics and Speech Sciences, volume 7, issue 1, 2015, Pages 21~26
DOI : 10.13064/KSSS.2015.7.1.021
The aim of this study is to synthesize pathological breathy voice and to build a cepstral peak prominence (CPP) table across breathiness ranks by cepstral analysis, to supplement the reliability of perceptual auditory judgment. The KlattGrid synthesizer included in Praat was used. Synthesis parameters consist of two groups: constants and variables. Constant parameters are pitch, amplitude, flutter, open phase, oral formant, and bandwidth. Variable parameters are breathiness (BR), aspiration amplitude (AH), and spectral tilt (TL). 560 samples of the synthetic breathy vowel /a/ for a male voice were created, and three raters ranked their breathiness. 217 samples proved inadequate on the basis of perceptual judgment and cepstral analysis, leaving 343 samples. The CPP values and other related parameters from the cepstral analysis were classified under four breathiness ranks (B0~B3), and the mean and standard deviation of CPP (in dB) were tabulated for each rank. The value of CPP decreases toward the severe end of breathiness because there is more noise and less harmonic energy.
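As a rough illustration of the cepstral analysis involved, the following sketch (a simplification under stated assumptions, not the study's KlattGrid/Praat pipeline) computes a CPP-style measure: the height, in dB, of the cepstral peak within a plausible f0 quefrency range, above a regression line fit to the cepstrum over that range.

```python
import numpy as np

def cpp_db(signal, sr, f0_lo=60.0, f0_hi=330.0):
    """Cepstral peak prominence (simplified, single frame).

    Peak of the real cepstrum of the log-magnitude spectrum,
    measured above a linear trend fit over the same quefrency
    range; breathier (noisier) voices yield lower values.
    """
    windowed = signal * np.hanning(len(signal))
    log_mag = 20.0 * np.log10(np.abs(np.fft.rfft(windowed)) + 1e-12)
    ceps = np.fft.irfft(log_mag)                 # real cepstrum, dB domain
    q = np.arange(len(ceps)) / sr                # quefrency in seconds
    lo, hi = int(sr / f0_hi), int(sr / f0_lo)    # plausible pitch periods
    trend = np.polyfit(q[lo:hi], ceps[lo:hi], 1)
    peak = lo + int(np.argmax(ceps[lo:hi]))
    return ceps[peak] - np.polyval(trend, q[peak])
```

A strongly periodic vowel shows a prominent rahmonic peak; adding aspiration noise flattens the cepstrum and lowers CPP, which is the direction of the rank effect (B0 toward B3) described above.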
Effects of Background Noises on Speech-related Variables of Adults who Stutter
Park, Jin ; Oh, Sunyoung ; Jun, Je-Pyo ; Kang, Jin Seok ;
Phonetics and Speech Sciences, volume 7, issue 1, 2015, Pages 27~37
DOI : 10.13064/KSSS.2015.7.1.027
This study investigated the effects of background noises (i.e., white noise, multi-speaker conversational babble) on stuttering rate and other speech-related measures (i.e., articulation rate, speech effort). Nine Korean-speaking adults who stutter participated in the study. Each participant read a series of passages under each of four experimental conditions: typical solo reading (TR), choral reading (CR), reading with white noise presented (WR), and reading with multi-speaker conversational babble presented (BR). Stuttering rate was computed as the percentage of syllables stuttered (%SS), and articulation rate was assessed as another speech-related measure under each condition. To examine the amount of physical effort needed to read, speech effort was measured using the 9-point Speech Effort Self-Rating Scale originally employed by Ingham et al. (2006). Results showed no significant differences among the passage reading conditions in stuttering rate, articulation rate, or speech effort. In conclusion, it can be argued that the two types of background noise (white noise and multi-speaker conversational babble) do not differ in the extent to which each enhances the fluency of adults who stutter. Self-ratings of speech effort may also be useful in measuring speech-related variables associated with vocal changes induced under each fluency-enhancing condition.
Acoustic Characteristics of Stop Consonants in Normal Elderly
Yoo, Hyunji ; Kim, HyangHee ;
Phonetics and Speech Sciences, volume 7, issue 1, 2015, Pages 39~45
DOI : 10.13064/KSSS.2015.7.1.039
Changes in speech production in the normal elderly may be subtle and gradual; acoustic analysis is therefore appropriate for identifying the effect of aging on speech. This study examined four speech parameters: voice onset time (VOT), VOT range, the fundamental frequency (f0) of the following vowel, and the f0 difference, in two age groups, old (mean age 74.57 yrs.) and young (mean age 27.43 yrs.). The results show that, compared to the older group, the younger group demonstrated significantly shorter VOTs in lenis stops and longer VOTs in aspirated stops. VOT ranges were relatively broad and consequently overlapped between the phonation types (lenis, fortis, aspirated). The f0 values of the following vowel, an integral parameter alongside VOT, were lower in the older group than in the young group. The f0 differences in the old female group were significantly narrower than in the young female group, making clear distinctions difficult. In conclusion, the contrast in temporal information was obscured, and the domain of glottal information was diminished, for stop consonants in the Korean elderly. The findings suggest that central/peripheral changes with aging could lead to a deficit in coordination between phonation and articulation.
The Change of Acceptability for the Mild Dysarthric Speakers' Speech due to Speech Rate and Loudness Manipulation
Kim, Jiyoun ; Seong, Cheoljae ;
Phonetics and Speech Sciences, volume 7, issue 1, 2015, Pages 47~55
DOI : 10.13064/KSSS.2015.7.1.047
This study examined whether speech acceptability changed under various prosodic manipulations. Both speech rate and voice loudness are reportedly associated with acceptability and intelligibility. Speech samples from twelve speakers with mild dysarthria were recorded, and speech rate and loudness were changed by digitally manipulating habitually produced sentences. Three loudness levels (70, 75, & 80 dB) and four speech rates (normal, 20% faster, 20% slower, & 40% slower) were presented to 12 speech-language pathologists (SLPs), who rated sentence acceptability on a 7-point Likert scale. Repeated-measures ANOVAs were conducted to determine whether the prosodic type of the resynthesized cue resulted in a significant change in speech acceptability. The faster speech rate (20% faster), rather than the habitual and slower rates (20%, 40% slower), resulted in a significant improvement in acceptability ratings (p < .001). Increased vocal loudness (up to 80 dB) also resulted in a significant improvement in acceptability ratings (p < .05). Changes in the speech rate and loudness properties of speech may thus contribute to improved acceptability.
The final stop consonant perception in typically developing children aged 4 to 6 years and adults
Byeon, Kyeongeun ; Ha, Seunghee ;
Phonetics and Speech Sciences, volume 7, issue 1, 2015, Pages 57~65
DOI : 10.13064/KSSS.2015.7.1.057
This study aimed to identify the developmental pattern of final stop consonant perception using a gating task. Sixty-four subjects participated in the study: 16 children aged 4 years, 16 children aged 5 years, 17 children aged 6 years, and 15 adults. One-syllable words with consonant-vowel-consonant (CVC) structure, mokㄱ-motㄱ and papㄱ-patㄱ, were used as stimuli. In order to remove the redundancy of acoustic cues in the stimulus words, 40 ms (-40ms) and 60 ms (-60ms) of the entire duration of the final consonant were deleted. Three conditions (the whole word segment, -40ms, -60ms) were used for the speech perception experiment, and 48 tokens (4 stimuli × 3 conditions × 4 trials) in total were presented to participants. The results indicated that the 5- and 6-year-olds showed final consonant perception similar to adults for the stimuli papㄱ-patㄱ, and only the 6-year-old children showed perception similar to adults for the stimuli mokㄱ-motㄱ. The results suggest that younger typically developing children require more acoustic information to accurately perceive final consonants than older children and adults, and that final consonant perception may become adult-like around 6 years of age. The study provides fundamental data on the developmental pattern of speech perception in typically developing children, which can be compared to that of children with communication disorders.
Comparison of Acoustic Characteristics of Vowel and Stops in 3, 4 year-old Normal Hearing Children According to Parents' Deafness: Preliminary Study
Hong, Jisook ; Kang, Youngae ; Kim, Jaeock ;
Phonetics and Speech Sciences, volume 7, issue 1, 2015, Pages 67~77
DOI : 10.13064/KSSS.2015.7.1.067
The purpose of this study was to investigate how deaf parents influence the speech sounds of their normal-hearing children. Twenty-four normal-hearing children aged 3 to 4, of deaf adults (CODA) and of normal-hearing parents (NORMAL), participated in the study. F1, F2, and the vowel triangle area were measured for 7 vowels, and voice onset times (VOTs) and closure durations were measured for 9 stops. The results are as follows. First, F1 and F2 for all vowels were higher and the vowel triangle area was larger in CODA than in NORMAL, although the differences were not statistically significant. Second, VOTs were longer in CODA than in NORMAL for most stop phonemes. Third, the manner and place of articulation of the stops did not differentiate CODA from NORMAL in VOTs or closure durations. CODA do not demonstrate the speech characteristics of deaf people; however, they seem to speak differently from NORMAL, which suggests that CODA might be influenced in some way by the different linguistic environment created by deaf parents.
The Stability and Variability based on Vowels in Voice Quality Analysis
Choi, Seong Hee ; Choi, Chul-Hee ;
Phonetics and Speech Sciences, volume 7, issue 1, 2015, Pages 79~86
DOI : 10.13064/KSSS.2015.7.1.079
This study explored the vowel effect on acoustic perturbation measures in voice quality analysis. The perturbation parameters (%jitter, %shimmer) and a noise parameter (SNR) were measured for 7 Korean vowels (/a/, /ɛ/, /i/, /o/, /u/, /ɯ/, /ʌ/) using CSpeech with 50 normal young Korean adults (24 males and 26 females). A significant vowel effect was found only in %shimmer; in particular, the low-back vowel /a/ differed significantly from the other vowels in %shimmer. The least perturbation and noise were exhibited by the high-back vowel /ɯ/ and the vowel /o/, respectively. With respect to tongue height, significantly higher %shimmer was demonstrated for low vowels than for high vowels. In addition, back vowels (tongue advancement) and rounded vowels (lip rounding) showed significantly less perturbation and noise. The least within-speaker variability of perturbation and noise across three repeated measures was demonstrated for the vowel /i/. However, there was no significant difference among the 3 token measures within a single session for any vowel tested except /o/. Consequently, the vowel /a/, commonly used in acoustic perturbation measures, exhibited higher perturbation and noise, whereas higher stability and less variability were demonstrated for the high-back vowel /u/. These results suggest that the Korean high-back vowel /u/ may be more appropriate and reliable for acoustic perturbation measures.
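For reference, %jitter and %shimmer are cycle-to-cycle perturbation measures. A minimal sketch of the local (first-order) definitions is given below; this is illustrative only, as CSpeech, the tool used in the study, applies its own period extraction and smoothing.

```python
import numpy as np

def percent_jitter(periods):
    """Local %jitter: mean absolute difference between consecutive
    pitch periods, relative to the mean period."""
    p = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(p))) / np.mean(p)

def percent_shimmer(amplitudes):
    """Local %shimmer: mean absolute difference between consecutive
    cycle peak amplitudes, relative to the mean amplitude."""
    a = np.asarray(amplitudes, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(a))) / np.mean(a)
```

Both measures are near zero for a perfectly steady phonation and grow with cycle-to-cycle irregularity, which is why vowel-dependent articulation can shift them.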
Dutch Listeners' Perception of Korean Stop Consonants
Choi, Jiyoun ;
Phonetics and Speech Sciences, volume 7, issue 1, 2015, Pages 89~95
DOI : 10.13064/KSSS.2015.7.1.089
We explored Dutch listeners' perception of the Korean three-way contrast among fortis, lenis, and aspirated stops. The three Korean stops are all voiceless word-initially, whereas Dutch distinguishes between voiced and voiceless stops, so the Korean voiceless stops were expected to be difficult for Dutch listeners. Among the three Korean stops, fortis stops are phonetically most similar to Dutch voiceless stops, and were thus expected to be the easiest for Dutch listeners to distinguish. Dutch and Korean listeners carried out a discrimination task using three crucial comparisons, i.e., fortis-lenis, fortis-aspirated, and lenis-aspirated stops. Results showed that discrimination between lenis and aspirated stops was the most difficult of the three comparisons for both Dutch and Korean listeners. As expected, Dutch listeners discriminated fortis from the other stops relatively accurately. It seems likely that Dutch listeners relied heavily on VOT but less on F0 when discriminating among the three Korean stops.
An acoustical analysis of synchronous English speech using automatic intonation contour extraction
Yi, So Pae ;
Phonetics and Speech Sciences, volume 7, issue 1, 2015, Pages 97~105
DOI : 10.13064/KSSS.2015.7.1.097
This research focuses on the intonational characteristics of synchronous English speech. Intonation contours were extracted from 1,848 utterances produced in two speaking modes (solo vs. synchronous) by 28 native speakers of English (12 women and 16 men). Synchronous speech is found to be slower than solo speech, and women are found to speak more slowly than men. The effect on speech rate of speaking mode is larger than that of gender, but there is no interaction between the two factors (speaking mode vs. gender) for speech rate. Analysis of pitch point features shows that synchronous speech has smaller Pt (pitch point movement time), Pr (pitch point pitch range), Ps (pitch point slope), and Pd (pitch point distance) than solo speech, again with no interaction between speaking mode and gender. Analysis of sentence-level features reveals that synchronous speech has smaller Sr (sentence-level pitch range), Ss (sentence slope), MaxNr (normalized maximum pitch), and MinNr (normalized minimum pitch), but greater Min (minimum pitch) and Sd (sentence duration), than solo speech. It is also shown that the higher the Mid (median pitch), MaxNr, and MinNr in the solo speaking mode, the more they are reduced in the synchronous speaking mode. Max, Min, and Mid show greater speaker discriminability than the other features.
The effect of word length on f0 intervals: Evidence from North Kyungsang children
Kim, Jungsun ;
Phonetics and Speech Sciences, volume 7, issue 1, 2015, Pages 107~116
DOI : 10.13064/KSSS.2015.7.1.107
The present experiment investigated the effect of word length on the length of f0 intervals for North Kyungsang children. To obtain the f0 intervals, the f0 values at the midpoints of the vowels in each word were measured, and intervals were computed on a logarithmic scale according to the number of syllables in the word. The results indicated that the mean f0 intervals in words of different lengths differed significantly for the HH in HH vs. HHL and the LH in LH vs. LLH for the North Kyungsang children, whereas adult speakers from the North Kyungsang region differed significantly only for the HH in HH vs. HHL; in this characteristic the adults departed noticeably from the children. The adult data were included to confirm that the children were using the North Kyungsang dialect. With respect to individual speaker differences, the North Kyungsang children showed more or less consistent patterns in quantile-quantile plots for HH vs. HHL, but showed more variation for HL vs. LHL and LH vs. LLH. Individual variation was largest for HL vs. LHL and smallest for HH vs. HHL. Considering these results, the effect of word length on f0 intervals tended to show pitch-accent-type-specific characteristics in the process of prosodic acquisition.
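Logarithmic f0 intervals of the kind measured above can be illustrated with the common semitone measure; the abstract does not specify its exact log scale, so this sketch is an assumption for illustration.

```python
import math

def semitone_interval(f0_from_hz, f0_to_hz):
    """Logarithmic pitch interval in semitones; 12 st = one octave.
    Positive values indicate a rise, negative values a fall."""
    return 12.0 * math.log2(f0_to_hz / f0_from_hz)
```

For example, a fall from 240 Hz to 200 Hz across an HL sequence is roughly -3 st; expressing intervals this way makes them comparable across children's and adults' different absolute f0 ranges.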
A Corpus-based study on the Effects of Gender on Voiceless Fricatives in American English
Yoon, Tae-Jin ;
Phonetics and Speech Sciences, volume 7, issue 1, 2015, Pages 117~124
DOI : 10.13064/KSSS.2015.7.1.117
This paper investigates the acoustic characteristics of English fricatives in the TIMIT corpus, with a special focus on the role of gender in the rendering of fricatives in American English. The TIMIT database includes 630 talkers and 2,342 different sentences, comprising over five hours of speech. Acoustic analyses were conducted on spectral and temporal properties, treating gender as an independent factor. The results revealed that most acoustic properties of voiceless sibilants differ between male and female speakers, whereas those of voiceless non-sibilants do not. A classification experiment using linear discriminant analysis (LDA) showed that 85.73% of voiceless fricatives are correctly classified: sibilants are 88.61% correctly classified, whereas non-sibilants are only 57.91% correctly classified. The majority of errors come from the misclassification of /θ/ as [f]. The average accuracy of gender classification is 77.67%, with most of the errors arising from the classification of female speakers' non-sibilants. The results are accounted for by appeal to biological differences as well as macro-social factors. The paper contributes to the understanding of the role of gender in a large-scale speech corpus.
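The LDA classifier used for the fricative and gender classification experiments can be sketched generically as follows. This is a standard two-class implementation under a pooled-covariance assumption, not the paper's actual setup or feature set.

```python
import numpy as np

def lda_train(X0, X1):
    """Fisher's two-class LDA: project onto w = S_pooled^-1 (mu1 - mu0)
    and threshold at the midpoint of the projected class means."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    s_pooled = ((n0 - 1) * np.cov(X0, rowvar=False) +
                (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
    w = np.linalg.solve(s_pooled, mu1 - mu0)    # discriminant direction
    threshold = w @ (mu0 + mu1) / 2.0           # midpoint decision rule
    return w, threshold

def lda_predict(X, w, threshold):
    """Return 1 where the projection exceeds the midpoint threshold."""
    return (X @ w > threshold).astype(int)
```

Training such a model on per-token acoustic features (e.g., spectral moments and durations) and scoring held-out tokens is the kind of procedure that yields the classification accuracies reported above.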
Individual differences in autistic traits and variability in production patterns: a case of affricates by young Seoul Korean speakers
Kang, Soyoung ; Kong, Eun Jong ; Seo, Misun ;
Phonetics and Speech Sciences, volume 7, issue 1, 2015, Pages 125~131
DOI : 10.13064/KSSS.2015.7.1.125
The current study explores whether speaker variability in the fronted articulations of Seoul Korean affricates can be explained by cognitive differences as measured by individual autistic traits. The goal was to test Yu's (2010; 2013) proposal that individual differences in cognitive style can be an important factor in speakers' use of sound variants. The spectral peak frequencies (SPF) of affricates relative to those of fricatives, reported in Kong et al. (2014), were used to acoustically represent the relative anteriority of the place of constriction. When these individual SPFs were related to Autism-Spectrum Quotient scores (Baron-Cohen et al., 2001), a correlation was found for the male speakers, but not for the female speakers, such that male speakers with more anterior affricate productions scored lower in AQ. We discuss how these findings bear on Yu's proposal.
Vowel Variation in PC Communication Language and Phonetic Similarity
Ji, Yunjoo ; Kim, Ilkyu ;
Phonetics and Speech Sciences, volume 7, issue 1, 2015, Pages 133~138
DOI : 10.13064/KSSS.2015.7.1.133
The purpose of this study is to provide a deeper understanding of how people can understand PC communication language they have never seen or heard before without any problem. To answer this question, we focus on the vowel variation through which new variants are created for PC communication, and hypothesize that a phonetic constraint requires the vowel of the variant to be maximally phonetically similar to the vowel of the original word. Through a corpus analysis of a dictionary of PC communication language, we show that the hypothesis is supported: 90% of the variants collected from the dictionary conform to the postulated phonetic constraint.
Developing a Korean Standard Speech DB
Shin, Jiyoung ; Jang, Hyejin ; Kang, Younmin ; Kim, Kyung-Wha ;
Phonetics and Speech Sciences, volume 7, issue 1, 2015, Pages 139~150
DOI : 10.13064/KSSS.2015.7.1.139
The purpose of this study is to develop a speech corpus for standard Korean speech. For the samples to viably represent the state of spoken Korean, demographic factors were considered so as to achieve a balanced spread of age, gender, and dialect: nine regional dialects were categorized, and five age groups were established, from individuals in their 20s to their 60s. A speech-sample collection protocol was developed in which each speaker performs five tasks: two reading tasks, two semi-spontaneous speech tasks, and one spontaneous speech task. This configuration accommodates the gathering of rich and well-balanced speech samples across various speech types and is expected to improve the utility of the corpus. Samples from 639 individuals were collected using the protocol; speech samples were also collected from other sources, for a combined total of samples from 1,012 individuals. The data accumulated in this database will be used to develop a speaker identification system, and may also be applied towards, but not limited to, phonetic studies, sociolinguistics, and language pathology. We plan to supplement the large-scale speech corpus next year, in terms of both research methodology and content, to better answer the needs of diverse fields.
Korean /l/-flapping in an /i/-/i/ context
Son, Minjung ;
Phonetics and Speech Sciences, volume 7, issue 1, 2015, Pages 151~163
DOI : 10.13064/KSSS.2015.7.1.151
In this study, we describe the kinematic characteristics of Korean /l/-flapping at two speech rates (fast vs. comfortable). Production data were collected from seven native speakers of Seoul Korean (four females and three males) using electromagnetic midsagittal articulometry (EMMA), which provided two-dimensional data in the x-y plane. We examined kinematic properties of the vertical/horizontal tongue tip gesture, the vertical/horizontal (rear) tongue body gesture, and the jaw gesture in an /i/-/i/ context. Gestural landmarks of the vertical tongue tip gesture were measured directly and served as the anchoring time points to which measures of the other trajectories were referred. Velocity profiles, closing/opening spatiotemporal properties, constriction duration, and constriction minima were analyzed. The results are summarized as follows. First, the spatiotemporal values of the vertical tongue tip gesture were gradiently distributed on a continuum, showing more reduction at the fast speech rate but not a single instance of categorical reduction (deletion). Second, Korean /l/-flapping predominantly exhibited a backward-sliding tongue tip movement (in 83% of productions), which clearly distinguishes it from the forward-sliding movement of English. Lastly, although no positional variation with speech rate was observed for the jaw and tongue body, their truncated spatial displacement at the fast rate indicated vocalic reduction. The present study shows that Korean /l/-flapping has mixed articulatory properties with respect to the flapping sounds of other languages such as English and Xiangxiang Chinese: it shares a language-universal property, the gradient nature of flapping, but also shows a language-particular property, distinguished from English, in that a backward gliding movement occurs during the tongue tip closing movement.