REFERENCE LINKING PLATFORM OF KOREA S&T JOURNALS
Phonetics and Speech Sciences
Journal Basic Information
The Korean Society of Speech Sciences
Volume & Issues
Volume 5, Issue 4 - Dec 2013
Volume 5, Issue 3 - Sep 2013
Volume 5, Issue 2 - Jun 2013
Volume 5, Issue 1 - Mar 2013
An Acoustic Analysis of Diadochokinesis in Patients with Parkinson's Disease
Kang, Young Ae ; Park, Hyun Young ; Koo, Bon Seok ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 3~15
DOI : 10.13064/KSSS.2013.5.4.003
The acoustic analysis of diadochokinesis (DDK) has been used to evaluate dysarthria. However, no automatic method for this evaluation has been available. The aim of this study was to introduce a new automated program for measuring DDK tasks and to apply it to clinical patients with idiopathic Parkinson's disease (IPD). Forty-seven patients with IPD and a healthy control group of twenty participants were selected, and every DDK task was recorded three times. Twenty-five acoustic parameters were developed for the program. The relevant parameters (DDK times, pitch-related parameters, and intensity parameters) were analyzed by two-way ANOVA. Significant differences between the groups were found in DDK times, pitch-related parameters, and intensity parameters. The findings indicated that the pitch of the control group was more stable than that of the IPD group. Although the patients with IPD had higher intensity values, this resulted from the IPD group's weakness in controlling their speech within a single breath.
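The abstract does not describe how the program detects syllables in a DDK recording. The following is only a rough illustration of one simple energy-based approach, not the authors' program. It computes a frame-wise RMS envelope and treats each rising edge above a fixed fraction of the peak as a syllable onset. From those onsets it derives the DDK rate and the regularity of the intervals. The function names, the 10 ms frame, and the 0.3 threshold are hypothetical choices.

```python
import numpy as np


def rms_envelope(signal, sr, frame_ms=10.0):
    """Frame-wise RMS energy; returns (envelope, hop length in samples)."""
    hop = int(sr * frame_ms / 1000)
    n_frames = len(signal) // hop
    frames = signal[: n_frames * hop].reshape(n_frames, hop)
    return np.sqrt((frames ** 2).mean(axis=1)), hop


def ddk_measures(signal, sr, threshold_ratio=0.3):
    """Hypothetical DDK measures computed from onsets found on the RMS envelope."""
    env, hop = rms_envelope(signal, sr)
    active = env > threshold_ratio * env.max()
    # An onset is an active frame whose previous frame is inactive.
    onsets = np.flatnonzero(active[1:] & ~active[:-1]) + 1
    if active[0]:
        onsets = np.r_[0, onsets]
    times = onsets * hop / sr
    intervals = np.diff(times)
    return {
        "n_syllables": len(onsets),
        "rate_per_s": len(onsets) / (len(signal) / sr),
        "mean_interval_s": float(intervals.mean()),
        # Coefficient of variation of intervals: lower means more regular repetition.
        "interval_cv": float(intervals.std() / intervals.mean()),
    }


# Synthetic check: ten 80 ms tone bursts every 200 ms in a 2 s recording.
sr = 16000
sig = np.zeros(2 * sr)
for k in range(10):
    start = int(round((0.05 + 0.2 * k) * sr))
    n = int(0.08 * sr)
    sig[start:start + n] = np.sin(2 * np.pi * 150 * np.arange(n) / sr)
print(ddk_measures(sig, sr))
```

Real recordings of /pʌ/, /tʌ/, /kʌ/ repetitions would need a more robust onset detector than a single global threshold, because intensity declines over the course of a breath.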
Speech Intelligibility and Vowel Space Characteristics of Alaryngeal Speech
Shim, Hee-Jeong ; Jang, Hyo-Ryung ; Ko, Do-Heung ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 17~24
DOI : 10.13064/KSSS.2013.5.4.017
This study aimed to identify the speech characteristics associated with different voice rehabilitation techniques in twenty-six male patients who had undergone total or partial laryngectomy. The speech intelligibility of standard esophageal (SE), tracheoesophageal (TE), and electrolarynx (EL) speech was measured using the CSL, and eleven listeners rated the speech on a 5-point scale. Vowel space parameters (vowel space, VAI, FCR, and F2 ratio) were measured by averaging five repetitions of each vowel (/a/, /e/, /i/, /u/) and substituting the averages into the corresponding formulas. The results showed statistically significant differences in speech intelligibility and vowel space between SE and TE. The speech intelligibility and vowel space of TE were higher than those of SE or EL. Speech intelligibility also correlated highly with some of the parameters (vowel space, VAI, F2 ratio). Compared with SE and EL, TE speech was the closest to that of normal speakers, although it still deviated considerably from laryngeal speech. This was due to insufficient airflow into the esophagus during phonation and to differences in articulatory movement among the groups. These findings are expected to contribute to establishing a baseline for speech characteristics in voice rehabilitation for patients with alaryngeal speech.
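The abstract says averaged formants were "put into the parameter formula" but does not give the formulas. The following sketch assumes the definitions commonly used in the dysarthria literature for the corner vowels /a/, /i/, /u/:

- a triangular vowel space area computed with the shoelace formula;
- FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a), following Sapir et al. (2010);
- VAI = 1 / FCR;
- F2 ratio = F2i / F2u.

The abstract does not say whether the authors' vowel space was a triangle or a quadrilateral that includes /e/. All formant values below are hypothetical.

```python
import numpy as np


def mean_formants(repetitions):
    """Average (F1, F2) in Hz over repeated productions of one vowel."""
    return tuple(np.mean(np.asarray(repetitions, dtype=float), axis=0))


def vowel_space_metrics(a, i, u):
    """a, i, u: (F1, F2) in Hz for the corner vowels /a/, /i/, /u/."""
    (f1a, f2a), (f1i, f2i), (f1u, f2u) = a, i, u
    # Triangular vowel space area (shoelace formula), in Hz^2.
    vsa = abs(f1i * (f2a - f2u) + f1a * (f2u - f2i) + f1u * (f2i - f2a)) / 2
    fcr = (f2u + f2a + f1i + f1u) / (f2i + f1a)  # formant centralization ratio
    return {"VSA": vsa, "FCR": fcr, "VAI": 1 / fcr, "F2_ratio": f2i / f2u}


# Hypothetical values: two /a/ repetitions averaged, /i/ and /u/ already averaged.
a = mean_formants([(790, 1290), (810, 1310)])
print(vowel_space_metrics(a, (300, 2300), (350, 800)))
```

FCR rises and VAI falls as the vowels centralize, so the two measures carry the same information in opposite directions.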
Acoustic Characteristics of Patients with Total Laryngectomees via Voice Rehabilitation Techniques
Jang, Hyo-Ryung ; Shim, Hee-Jeong ; Ko, Do-Heung ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 25~32
DOI : 10.13064/KSSS.2013.5.4.025
This research aimed to identify the acoustic characteristics of different voice rehabilitation techniques in 17 laryngectomees: the electrolarynx (EL), standard esophageal (SE) speech, and tracheoesophageal (TE) speech. Voice quality was analyzed using MDVP. To compare the acoustic characteristics, patients were asked to produce the vowel /a/. The acoustic analysis included fundamental frequency (f0), jitter, shimmer, and noise-to-harmonic ratio (NHR). The main results showed no statistically significant differences between the average measurements of SE and TE speakers, the same tendency reported in previous studies. There was, however, a significant difference between SE and EL speakers. There were no statistically significant differences between the average measurements of TE and EL speakers on any acoustic measure. This research will contribute to establishing a baseline for speech characteristics in voice rehabilitation for laryngectomees. In the future, the present findings and issues should be considered in the context of gender. In particular, the number of women diagnosed with laryngeal cancer continues to rise, and their acoustic characteristics may differ from those of men.
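MDVP performs its own period extraction, and that step is not reproduced here. The sketch below assumes that cycle-by-cycle periods and peak amplitudes are already available. It computes jitter percent and shimmer percent as they are commonly defined: the mean absolute difference between consecutive cycles, relative to the mean. NHR is omitted because it requires a spectral harmonic/noise decomposition. The input values are hypothetical.

```python
import numpy as np


def local_jitter_percent(periods):
    """Mean absolute difference of consecutive periods, relative to the mean period (%)."""
    p = np.asarray(periods, dtype=float)
    return float(np.mean(np.abs(np.diff(p))) / np.mean(p) * 100)


def local_shimmer_percent(amplitudes):
    """The same computation applied to cycle peak amplitudes (%)."""
    a = np.asarray(amplitudes, dtype=float)
    return float(np.mean(np.abs(np.diff(a))) / np.mean(a) * 100)


# Hypothetical cycles of a sustained /a/: periods in seconds, amplitudes in arbitrary units.
print(local_jitter_percent([0.010, 0.011, 0.010, 0.011]))  # about 9.52
print(local_shimmer_percent([1.0, 0.9, 1.0, 0.9]))         # about 10.53
```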
Voice Quality of Dysarthric Speakers in Connected Speech
Seo, Inhyo ; Seong, Cheoljae ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 33~41
DOI : 10.13064/KSSS.2013.5.4.033
This study investigated the perceptual and cepstral/spectral characteristics of phonation, and the relationships between them, in the connected speech of speakers with dysarthria. Twenty-two participants were divided into two groups: eleven dysarthric speakers and eleven healthy control participants matched for age and gender. Three speech pathologists performed a perceptual evaluation of both groups' connected speech using the GRBAS scale. The cepstral/spectral characteristics of phonation were then compared between the groups. Dysarthric speakers received significantly worse (higher) ratings for G (overall dysphonia grade), B (breathiness), and S (strain), and their smoothed cepstral peak prominence (CPPs) was significantly lower. The CPPs correlated significantly with the perceptual ratings, including G, B, and S. The utility of CPPs is supported by its strong relationship with perceptually rated dysphonia severity in dysarthric speakers. Receiver operating characteristic (ROC) analysis showed that a CPPs threshold of 5.08 dB classified dysarthria well, with 63.6% sensitivity and perfect specificity (100%). These results indicate that the CPPs reliably distinguished healthy controls from dysarthric speakers. However, the CPP frequency (CPP F0) and the low-high spectral ratio (L/H ratio) did not differ significantly between the two groups.
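As a minimal sketch of the general CPP recipe, the following takes the cepstrum of a log-magnitude spectrum. It then finds the peak in the quefrency range of plausible F0 values and reports that peak's height above a linear regression line. It computes an unsmoothed, single-frame CPP. CPPs additionally smooths across frames and quefrency. The window, regression range, and dB conventions here are assumptions. Absolute values from this sketch are therefore not comparable with the 5.08 dB threshold reported above.

```python
import numpy as np


def cepstral_peak_prominence(frame, sr, f0_min=60.0, f0_max=300.0):
    """Single-frame CPP in dB (no temporal or quefrency smoothing)."""
    windowed = frame * np.hanning(len(frame))
    spectrum_db = 20 * np.log10(np.abs(np.fft.rfft(windowed)) + 1e-12)
    cepstrum = np.fft.irfft(spectrum_db, n=len(frame))
    cep_db = 20 * np.log10(np.abs(cepstrum) + 1e-12)
    quefrency = np.arange(len(cep_db)) / sr  # seconds
    # Search only quefrencies corresponding to plausible F0 values.
    lo, hi = int(sr / f0_max), int(sr / f0_min)
    q, c = quefrency[lo:hi + 1], cep_db[lo:hi + 1]
    peak = int(np.argmax(c))
    # Regression line over the same range (the range choice differs between tools).
    slope, intercept = np.polyfit(q, c, 1)
    return float(c[peak] - (slope * q[peak] + intercept))


rng = np.random.default_rng(0)
sr, n = 16000, 2048
pulses = np.zeros(n)
pulses[::128] = 1.0                                # 125 Hz pulse train
voiced = pulses + 0.01 * rng.standard_normal(n)    # strongly periodic frame
noise = rng.standard_normal(n)                     # aperiodic frame
print(cepstral_peak_prominence(voiced, sr), cepstral_peak_prominence(noise, sr))
```

With eleven speakers per group, the reported 63.6% sensitivity corresponds to 7 of the 11 dysarthric speakers falling below the threshold. The 100% specificity corresponds to all 11 controls lying above it.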
Perceptions on Evaluation and Treatment of Swallowing Disorders in Speech-Language Pathologists
Yoon, Ji Hye ; Lee, Hyun-Joung ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 43~51
DOI : 10.13064/KSSS.2013.5.4.043
The purpose of this study is to survey speech-language pathologists' (SLPs') perceptions of the evaluation and treatment of swallowing disorders. An online questionnaire was sent to 279 respondents: students in undergraduate/graduate speech therapy programs and/or SLPs working in various settings. The survey consisted of three parts:
1) background information and educational/clinical experience related to dysphagia (swallowing disorders);
2) the current state of dysphagia diagnosis and treatment in clinical practice (certified SLPs only);
3) perceptions of the diagnosis, treatment, and education of dysphagia.
Participants rated each item on a five-point Likert scale (1 = not at all, 5 = extremely) or gave self-reported answers. The results showed that SLPs have a high interest in swallowing disorders. However, most regarded these disorders as very difficult to diagnose and treat because they had not been trained as swallowing specialists. More opportunities for education and practice are therefore needed to establish SLPs' expertise.
The Characteristics of Voice Handicap Index and Vocal Misuse and Overuse in Female Elementary Teachers
Choi, Seong Hee ; Choi, Chul-Hee ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 53~61
DOI : 10.13064/KSSS.2013.5.4.053
Voice disorders are particularly common among female teachers because of work-related vocal demands. However, only a few studies have evaluated individual risk factors together with work-related risk factors in diagnosing voice disorders. This study evaluated sixty-seven female elementary teachers (36 with and 31 without voice disorders) to compare their vocal misuse, overuse, and vocal hygiene behaviors. Total Voice Handicap Index (VHI) scores and VHI subscale (P, E, F) scores did not differ significantly between the two groups (p>0.05). There was also no relationship between the VHI and acoustic measures (p>0.05). Loud talking, talking in noisy situations, and excessive speaking were significantly more frequent in teachers with voice disorders (p<0.05). These overuse and misuse behaviors were therefore identified as risk factors for developing voice disorders in female teachers. Hydration was the most common vocal hygiene behavior when teachers experienced vocal fatigue. However, hydration with hot green tea or coffee, as well as throat clearing, was often misused as vocal hygiene. Teachers in both groups reported relatively high voice handicap regardless of voice disorder status. This study suggests that a multidimensional voice assessment protocol is required to capture teachers' voice problems. A vocal education program may also be important for improving vocal hygiene knowledge and changing vocal behavior in female teachers.
A Comparison of Aerodynamic Characteristics in Muscle Tension Dysphonia and Adductor Spasmodic Dysphonia
Heo, Jeonghwa ; Song, Kibum ; Choi, Yanggyu ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 63~70
DOI : 10.13064/KSSS.2013.5.4.063
The purpose of this study is to identify the aerodynamic characteristics of muscle tension dysphonia and adductor spasmodic dysphonia, and the differences between them. The goal is to find factors that can provide additional information when preparing an objective examination standard for distinguishing the two dysphonias. Forty-eight individuals diagnosed with muscle tension dysphonia or adductor spasmodic dysphonia participated in this study. The Phonatory Aerodynamic System (PAS) was used to measure the aerodynamic characteristics of the two dysphonias. The results show noticeable differences between the two groups in airflow variation and glottal resistance. This study concludes that aerodynamic characteristics may be used as additional information, alongside diverse evaluations, for classifying muscle tension dysphonia and adductor spasmodic dysphonia.
A Comparison of Voice Analysis of Children with Cochlear Implant and with Normal Hearing
Yoon, Misun ; Choi, Eunah ; Sung, Youngju ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 71~78
DOI : 10.13064/KSSS.2013.5.4.071
The purpose of this study was to compare the acoustic voice outcomes of children with cochlear implants (CI) with those of children with normal hearing (NH). Participants were 41 children using a unilateral cochlear implant (18 males and 23 females) and children with normal hearing matched for age and sex. In the CI group, the mean age at implantation was approximately 3 years, and the mean duration of implant use was 4 years. Acoustic analyses were performed using the MDVP of the CSL. Speech samples were three sustained vowels, /a, i, u/. Nine parameters (F0, Fhi, Flo, Jitter, Shimmer, vF0, vAm, NHR, and SPI) were analyzed. For /a/ phonation, children with CI showed no significant differences in these parameters. For /i/ and /u/ phonation, however, F0, Fhi, vF0, and SPI differed significantly. These results indicate that voice differences between children with CI and children with NH emerge depending on vowel context. High vowels are therefore recommended as speech samples for acoustic evaluation. Furthermore, perceptual analysis and speech therapy for phonation control would be necessary for children with CI.
L2 Proficiency Effect on the Acoustic Cue-Weighting Pattern by Korean L2 Learners of English: Production and Perception of English Stops
Kong, Eun Jong ; Yoon, In Hee ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 81~90
DOI : 10.13064/KSSS.2013.5.4.081
This study explored how Korean L2 learners of English use multiple acoustic cues (VOT and F0) in perceiving and producing the voicing contrast of English alveolar stops. Thirty-four 18-year-old high-school students participated in the study. Their English proficiency was classified as either 'high' (HEP) or 'low' (LEP) according to the high-school English level standardization. Thirty synthesized syllables, created by crossing a 6-step VOT continuum with a 5-step F0 continuum, were presented as audio stimuli. The listeners judged how close each stimulus was to L2 /t/ or /d/ using a visual analogue scale. The L2 /d/ and /t/ productions of 22 learners (12 HEP, 10 LEP) were acoustically analyzed by measuring VOT and F0 at the vowel onset. Results showed that LEP listeners attended to F0 in the stimuli more sensitively than HEP listeners. This suggests that HEP listeners were better able to inhibit less important acoustic dimensions in L2 perception. The L2 production patterns also showed a group difference: HEP speakers used the VOT dimension (the primary cue in L2) more effectively than LEP speakers. Taken together, relative cue-weighting strategies in L2 perception and production are closely related to the learner's L2 proficiency. More proficient learners had better control in inhibiting and enhancing the relevant acoustic parameters.
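The abstract does not name the statistic used to quantify cue weighting. One common approach, assumed here, is to regress the visual-analogue ratings on z-scored VOT and F0 steps. The relative size of the two coefficients is then taken as each cue's weight. The sketch also builds the 6 x 5 stimulus grid described above. The ratings in the usage lines are synthetic, constructed with known weights, and are not data from the study.

```python
import itertools

import numpy as np

N_VOT_STEPS, N_F0_STEPS = 6, 5


def stimulus_grid():
    """All 6 x 5 = 30 (VOT step, F0 step) combinations."""
    steps = itertools.product(range(1, N_VOT_STEPS + 1), range(1, N_F0_STEPS + 1))
    return np.array(list(steps), dtype=float)


def cue_weights(grid, ratings):
    """Relative VOT and F0 weights from a linear fit to z-scored cue steps."""
    z = (grid - grid.mean(axis=0)) / grid.std(axis=0)
    design = np.column_stack([np.ones(len(z)), z])
    coef, *_ = np.linalg.lstsq(design, ratings, rcond=None)
    b_vot, b_f0 = np.abs(coef[1:])
    total = b_vot + b_f0
    return b_vot / total, b_f0 / total


# Synthetic listener whose ratings depend twice as much on VOT as on F0.
grid = stimulus_grid()
z = (grid - grid.mean(axis=0)) / grid.std(axis=0)
ratings = 50 + 10 * z[:, 0] + 5 * z[:, 1]
print(cue_weights(grid, ratings))  # about (0.667, 0.333)
```

Z-scoring puts the two continua on a common scale, so the coefficients can be compared even though VOT and F0 steps are in different physical units.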
A Study of an Independent Evaluation of Prosody and Segmentals: With Reference to the Difference in the Evaluation of English Pronunciation across Subject Groups
Park, Hansang ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 91~98
DOI : 10.13064/KSSS.2013.5.4.091
This study investigates differences in how the foreign-accentedness of English pronunciation is evaluated across subject groups, evaluated accents, and compared components. It evaluates the prosody and segmentals of foreign-accented English sentences independently by pairwise difference rating. The stimuli were built with the prosody swapping technique. The segmentals and prosody of English sentences read by native speakers of American English (one male and one female) were combined with the corresponding segmentals and prosody of the same sentences read by native speakers of Chinese, Japanese, or Korean (one male and one female from each language). These stimuli were evaluated by four subject groups: native speakers of American English, Korean, Chinese, and Japanese. The results showed that the Japanese subject group scored higher in prosody difference than in segmental difference, while the other groups showed the opposite pattern. This study is significant in showing that attitudes toward differences in the segmentals and prosody of foreign-accented English vary with the native language of the subject group. In other words, for native speakers of some languages, differences in prosody may influence foreign-accentedness more than differences in segmentals do. For native speakers of other languages, the reverse holds.
A Comparative Study of Relative Distances among English Front Vowels Produced by Korean and American Speakers
Yang, Byunggon ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 99~107
DOI : 10.13064/KSSS.2013.5.4.099
The purpose of this study is to examine the relative distances among English front vowels in a message produced by 47 Korean and American speakers. The aim is to improve the teaching of English vowel pronunciation to Korean learners of English. A Praat script was developed to collect the first and second formant values (F1 and F2) of eight words in each sound file recorded from an Internet speech archive. The Euclidean distances were then measured for three vowel pairs: [i-ɛ], [i-ɪ], and [ɛ-æ]. The first pair, [i-ɛ], was set as the reference, and the distances of the other two pairs were expressed as percentages of it. This allowed vowels to be compared across speakers with different vocal tract lengths. The results were as follows.
First, F1 values of the front vowels produced by the Korean and American speakers increased from the high front vowel to the low front vowel, with differences between the groups. The Korean speakers generally produced the front vowels with smaller jaw openings than the American speakers did.
Second, the relative distance of the high front vowel pair [i-ɪ] differed significantly between the Korean and American speakers, whereas that of the low front vowel pair [ɛ-æ] did not.
Finally, Korean speakers at the higher proficiency level produced front vowels with higher F1 values than those at the lower proficiency level.
The author concluded that Korean speakers should produce the high front vowels distinctively by securing a sufficient relative distance between their formant values. Further studies should examine how strongly the Korean speakers' English proficiency correlates with the relative distance of target words in comparable productions.
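The distance measure described above is simple enough to show directly. The sketch computes Euclidean distances in the (F1, F2) plane and expresses the [i-ɪ] and [ɛ-æ] distances as percentages of the reference pair [i-ɛ]. The function names and formant values are hypothetical illustrations, not measurements from the study. The original Praat script is not reproduced.

```python
import math


def euclidean(v1, v2):
    """Distance between two (F1, F2) points in Hz."""
    return math.hypot(v1[0] - v2[0], v1[1] - v2[1])


def relative_distances(formants):
    """Distances of [i-ɪ] and [ɛ-æ] as percentages of the reference pair [i-ɛ]."""
    ref = euclidean(formants["i"], formants["ɛ"])
    return {
        "i-ɪ": 100 * euclidean(formants["i"], formants["ɪ"]) / ref,
        "ɛ-æ": 100 * euclidean(formants["ɛ"], formants["æ"]) / ref,
    }


# Hypothetical speaker means, (F1, F2) in Hz.
speaker = {"i": (300, 2300), "ɪ": (420, 2140), "ɛ": (600, 1900), "æ": (780, 1660)}
print(relative_distances(speaker))  # {'i-ɪ': 40.0, 'ɛ-æ': 60.0}
```

A longer vocal tract scales all formants down by roughly a common factor, and that factor cancels in the ratio. This is why percentages relative to a reference pair compare more fairly across speakers than raw distances in Hz.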
A Study on Human Evaluators Using the Evaluation Model of English Pronunciation
Yoon, Kyuchul ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 109~119
DOI : 10.13064/KSSS.2013.5.4.109
The purpose of this paper is to show the tendencies of evaluators in the pronunciation evaluation of English utterances. These tendencies were visualized using the evaluation model of English pronunciation proposed in . One hundred fifty female university students and four evaluators participated in the study. The students read eight English sentences aloud, and the evaluators rated their English pronunciation according to their own criteria. The models built from these evaluations efficiently showed each evaluator's tendencies in terms of fundamental frequency, intensity, segmental durations, and segmental spectra. These were compared with the corresponding properties of the five native speakers of English chosen for building the models. However, the human evaluators were not always consistent in their evaluations and sometimes gave conflicting scores to the same students.
Perception and Production of Wh-Questions & Indefinite Yes-No Questions Produced by Chinese Korean-Learners
Yune, Youngsook ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 121~128
DOI : 10.13064/KSSS.2013.5.4.121
In Korean, wh-questions and indefinite yes-no questions have the same morphemic and syntactic structure. In speech, however, the two types of questions are distinguished by prosody. In this study, we examined whether Chinese learners of Korean can distinguish between these two types of questions in production. We also examined whether they can correctly perceive the different meanings of a question from prosodic information. For this purpose, we analysed the two types of interrogative sentences produced by 5 native speakers and 15 Chinese learners of Korean. The 5 Korean native speakers distinguished the two types of questions by salient prosodic differences: prosodic structure, the pitch range of the wh-phrase versus the indefinite phrase, and the boundary tone. The 15 Chinese speakers, however, did not distinguish the two types with the same prosodic features in production. In the perception analysis, by contrast, they distinguished between them easily.
Effects of Prosodic Strengthening on the Production of English High Front Vowels /i, ɪ/ by Native vs. Non-Native Speakers
Kim, Sahyang ; Hur, Yuna ; Cho, Taehong ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 129~136
DOI : 10.13064/KSSS.2013.5.4.129
This study investigated how the acoustic characteristics (i.e., duration, F1, F2) of the English high front vowels /i, ɪ/ are modulated by boundary- and prominence-induced strengthening in native vs. non-native (Korean) speech production. It also examined how prosodic strengthening modifies the durational difference in vowels due to the voicing of the following consonant (voiced vs. voiceless) in the two speaker groups. Five native speakers of Canadian English and eight Korean learners of English (intermediate-advanced level) produced eight CVC minimal pairs (e.g., 'beat'-'bit') in varying prosodic contexts. Native speakers distinguished the two vowels in terms of duration, F1, and F2, whereas non-native speakers showed only durational differences. Both groups maximally distinguished the two vowels when the vowels were accented (F2, duration). Neither group showed boundary-induced strengthening in any of the three measures. The durational differences due to the voicing of the following consonant were also maximized under accent. The results are discussed further in terms of the phonetics-prosody interface in L2 production.
F0 Perturbation as a Perceptual Cue to Stop Distinction in Busan and Seoul Dialects of Korean
Kang, Kyoung-Ho ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 137~143
DOI : 10.13064/KSSS.2013.5.4.137
Recent investigations of the acoustic correlates of Korean stop manner contrasts have reported a diachronic transition in Korean stops. Young Seoul speakers rely relatively more on the F0 characteristics of the stops than on their VOT characteristics to distinguish aspirated and lenis stops. This finding has been examined in tonal dialects of Korean, and the results suggested that speakers of tonal dialects do not share this transition. These results also suggested that, in tonal dialects, the function of F0 in segmental stop classification interferes with its function in lexical tone classification. The current study investigated these findings in terms of perception. The perceptual behavior of Seoul and Busan speakers of Korean was compared, in particular by measuring the perceptual cue weights of F0 and VOT. Regression and correlation analyses revealed that Busan speakers are closer to older Seoul speakers than to younger Seoul speakers: their cue weights for VOT and F0 were comparable in the aspirated-lenis distinction. This contrasts with the younger Seoul speakers, who showed clear dominance of F0 over VOT for the same distinction. These findings provide perceptual evidence for the dual function of F0 in segmental and lexical distinctions in tonal dialects of Korean.
Speech Production and Perception of Word-medial Singleton and Geminate Sonorants in Korean
Kim, Taekyung ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 145~155
DOI : 10.13064/KSSS.2013.5.4.145
This study investigated three questions about Korean singleton and geminate sonorants in word-medial position. The first concerned their articulatory characteristics. The second concerned the effects of the duration of the sonorant consonant and of the preceding vowel on perception. The third concerned the differences between native Korean speakers and foreign learners of Korean in perceiving the singleton-geminate contrast. The Korean sonorant consonants /m, n, l/ were examined in VCCV and VCV sequences through speech production and perception experiments. The results suggest that the duration of the sonorant consonant is the most important cue for native Korean speakers in recognizing whether a sonorant is geminate. When the consonant duration is ambiguous, the duration of the preceding vowel and other factors affect recognition of the contrast. A perception experiment showed that Chinese learners of Korean did not clearly distinguish singleton consonants from geminate consonants. The results provide basic data on the recognition of the singleton-geminate contrast in word-medial position in Korean and can be used in teaching Korean pronunciation as a foreign language.
The Primitive Representation in Speech Perception: Phoneme or Distinctive Features
Bae, Moon-Jung ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 157~169
DOI : 10.13064/KSSS.2013.5.4.157
Using a target detection task, this study compared the processing automaticity of phonemes and features in spoken syllable stimuli. The goal was to determine the primitive representation in speech perception: the phoneme or the distinctive feature. To this end, we adapted the visual search task (Treisman et al., 1992) to auditory stimuli. That task was originally developed to investigate the processing of visual features (e.g., color, shape, or their conjunction). In our task, the distinctive features (e.g., aspiration or coronal) corresponded to visual primitive features (e.g., color and shape), and the phonemes (e.g., / /) corresponded to visual conjunctive features (e.g., colored shapes). Automaticity was measured by the set-size effect, that is, the increase in reaction time as the number of distractors increased. Three experiments compared phonemes with the laryngeal features (Experiment 1), the manner features (Experiment 2), and the place features (Experiment 3). The distinctive features were consistently processed faster and more automatically than the phonemes. Processing automaticity also differed among the classes of distinctive features: the laryngeal features were the most automatic, the manner features moderately automatic, and the place features the least automatic. These results are consistent with previous studies (Bae et al., 2002; Bae, 2010) that showed a perceptual hierarchy of distinctive features.
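The set-size effect described above is usually summarized as the slope of mean reaction time against the number of items in the display. A flat slope is read as automatic (parallel) processing, and a steep slope as effortful (serial) processing. The following sketch fits that slope by least squares. The function name and reaction times are hypothetical and chosen only to illustrate a feature-like versus phoneme-like contrast.

```python
import numpy as np


def set_size_slope(set_sizes, mean_rts_ms):
    """Least-squares slope (ms per item) and intercept of RT against set size."""
    slope, intercept = np.polyfit(
        np.asarray(set_sizes, dtype=float), np.asarray(mean_rts_ms, dtype=float), 1
    )
    return float(slope), float(intercept)


sizes = [2, 4, 6, 8]
feature_rts = [606, 612, 618, 624]  # hypothetical: nearly flat
phoneme_rts = [670, 720, 770, 820]  # hypothetical: steep
print(set_size_slope(sizes, feature_rts))  # about (3.0, 600.0)
print(set_size_slope(sizes, phoneme_rts))  # about (25.0, 620.0)
```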
Diachronic Change of High Vowel Devoicing in Japanese Dialects
Byun, Hi-Gyung ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 171~184
DOI : 10.13064/KSSS.2013.5.4.171
This study investigated the devoicing rate of Japanese high vowels, focusing on regional and generational differences, by acoustically analyzing vowels from two large speech databases. The first database was collected between 1986 and 1988 in 41 areas (prefectures) and included 607 participants (299 high school students and 308 of their grandparents). The second was collected in 2006-2007 in seven areas as a follow-up to the first and consists of 463 participants aged 8 to 90. The results revealed generational as well as regional differences in the devoicing rate in almost all areas. Based on these results, a new distribution map reflecting the current devoicing rate of the younger generation was presented. By comparing the two data sets, this study also confirmed that the age difference in the devoicing rate reflects not age-grading but a sound change in progress. Finally, the study discusses the social factors behind changes in the devoicing rate in some areas. It then fits the devoicing rates of five areas to an S-curve model to predict future devoicing rates.
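The abstract does not specify the S-curve model or its time axis, which could be birth year or recording year. A standard choice for modeling the spread of a sound change is the logistic function p(t) = 1 / (1 + exp(-k (t - t0))). The sketch assumes that form and fits it by linear regression on the logit of the rate. Nonlinear least squares would be an alternative. Rates of exactly 0 or 1 would need clipping before the logit is taken. The data below are synthetic, generated from known parameters.

```python
import numpy as np


def fit_logistic(years, rates):
    """Fit p(t) = 1 / (1 + exp(-k (t - t0))) by linear regression on logit(p)."""
    t = np.asarray(years, dtype=float)
    p = np.asarray(rates, dtype=float)  # must lie strictly between 0 and 1
    k, b = np.polyfit(t, np.log(p / (1 - p)), 1)
    return float(k), float(-b / k)  # slope k and midpoint year t0


def predict_rate(year, k, t0):
    """Predicted devoicing rate at the given year(s)."""
    return 1 / (1 + np.exp(-k * (np.asarray(year, dtype=float) - t0)))


# Synthetic rates generated with k = 0.05 and midpoint t0 = 1970.
years = [1930, 1950, 1970, 1990]
rates = predict_rate(years, 0.05, 1970.0)
k, t0 = fit_logistic(years, rates)
print(k, t0, predict_rate(2030, k, t0))
```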
The Production of Stops by Seoul and Yanbian Korean Speakers
Oh, Mira ; Yang, Hui ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 185~193
DOI : 10.13064/KSSS.2013.5.4.185
This study investigates dialectal differences in the acoustic properties of Korean lenis, aspirated, and tense stops in Seoul Korean (standard Korean) and Yanbian Korean (spoken in the largest Korean Autonomous Prefecture in China). This production study examines the main acoustic cues that each dialect uses to mark the laryngeal distinction among the three types of Korean stops. Measurements included VOT and the initial F0 of the following vowel. Data were collected from 10 young Seoul Korean speakers, 10 young Yanbian Korean speakers, and 6 older Yanbian speakers. There were two key findings. First, aspirated and lenis stops are mainly differentiated by F0 in Seoul Korean and by VOT in Yanbian Korean. Second, there is no VOT merger between lenis and aspirated stops in Yanbian Korean, whereas there is one in Seoul Korean. These results are discussed in terms of the phenomenon of VOT shift and the function of F0. It is argued that whether F0 can substitute for the VOT difference as the primary cue for coding the laryngeal contrast can be predicted from the pitch accent system of the language involved.
Formant Transition Shapes of Korean Front Vowels
Oh, Eunjin ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 195~200
DOI : 10.13064/KSSS.2013.5.4.195
This study investigates the formant transition shapes of Korean front vowels produced by native speakers of Seoul Korean. Sixteen speakers (eight male and eight female) produced [pVt] syllables in which the vowel was [i], [e], or [ɛ]. F1, F2, and F3 transition shapes were estimated from formant values at 11 points that divide the vowel duration into 10 intervals. Overall, male and female speakers showed similar formant transition shapes for the three front vowels. They also reached their maximum and minimum formant values at similar measurement points. For [e] and [ɛ], both male and female speakers showed similar formant values across the 11 measurement points and similar measurement points for the maximum and minimum values. This indicates that the two Korean vowels have merged not only in their steady-state formant values but also in their dynamic transition shapes.
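The 11-point procedure described above can be sketched as interpolating a formant track at equally spaced points between the vowel onset and offset. The sketch assumes the 10 intervals are of equal length, and that the track comes from a formant tracker as arrays of times and values. It then reports the 1-based measurement points where the maximum and minimum occur. The function names and the F2 track are hypothetical.

```python
import numpy as np


def sample_formant_track(times, values, onset, offset, n_intervals=10):
    """Interpolate a formant track at n_intervals + 1 equally spaced points."""
    points = onset + (offset - onset) * np.arange(n_intervals + 1) / n_intervals
    return points, np.interp(points, times, values)


def extreme_points(sampled):
    """1-based measurement points holding the maximum and minimum values."""
    return int(np.argmax(sampled)) + 1, int(np.argmin(sampled)) + 1


# Hypothetical F2 track (Hz) that rises to a peak at 120 ms and then falls.
times = np.linspace(0.0, 0.2, 201)
f2 = 2200 - 40000 * (times - 0.12) ** 2
points, sampled = sample_formant_track(times, f2, onset=0.0, offset=0.2)
print(np.round(sampled), extreme_points(sampled))  # max at point 7, min at point 1
```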
Cross-Generational Differences of /o/ and /u/ in Informal Text Reading
Han, Jeong-Im ; Kang, Hyunsook ; Kim, Joo-Yeon ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 201~207
DOI : 10.13064/KSSS.2013.5.4.201
This study is a follow-up to Han and Kang (2013) and Kang and Han (2013). Those studies examined cross-generational changes in the Korean vowels /o/ and /u/ through acoustic analyses of their formants, their Euclidean distances, and the overlap fraction values generated in SOAM 2D (Wassink, 2006). Their results showed an ongoing approximation of /o/ and /u/ that was more evident in female speakers and in non-initial vowels. However, those studies used non-words in a frame sentence. To see the extent to which these two vowels are merged in real words in spontaneous speech, we conducted an acoustic analysis of the formants of /o/ and /u/. These vowels were produced by two age groups of female speakers while reading a letter sample. The results demonstrate the following:
1) the younger speakers distinguished /o/ and /u/ mostly by F2 rather than F1;
2) the Euclidean distance between the two vowels was shorter in non-initial than in initial position, but it did not differ between the two age groups (20s vs. 40s-50s);
3) overall, /o/ and /u/ overlapped more in non-initial than in initial position, and in non-initial position the younger speakers showed a more congested distribution of the vowels than the older speakers.
Post-Processing of IVA-Based 2-Channel Blind Source Separation for Solving the Frequency Bin Permutation Problem
Chu, Zhihao ; Bae, Keunsung ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 211~216
DOI : 10.13064/KSSS.2013.5.4.211
Independent vector analysis (IVA) is a well-known frequency-domain ICA (FD-ICA) method used to solve the frequency permutation problem. It generally works quite well for blind source separation, but permutation errors can still remain in some frequency bins. This paper proposes a post-processing method that improves the source separation performance of IVA by fixing these remaining frequency permutations. The proposed method uses the correlation coefficient of the power ratios between frequency bins of the signals separated by IVA-based 2-channel source separation. Experimental results verified that the proposed method could fix the remaining frequency permutation problem in IVA and improve the speech quality of the separated signals.
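The abstract does not say which bins are compared or how the swap decision is made. The sketch below is therefore a centroid-based variant of power-ratio permutation alignment, not the authors' algorithm. For each output and frame it computes the power ratio |Y_k|^2 / sum_j |Y_j|^2 and averages these ratio sequences over all bins into a centroid. It then swaps the two outputs in any bin whose ratios correlate better with the centroid when swapped. The function names and the synthetic STFT are hypothetical.

```python
import numpy as np


def power_ratio(Y):
    """Y: complex STFT of the separated signals, shape (2, n_bins, n_frames)."""
    P = np.abs(Y) ** 2
    return P / (P.sum(axis=0, keepdims=True) + 1e-12)


def corr(a, b):
    """Pearson correlation coefficient of two 1-D arrays."""
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def fix_permutations(Y, n_iter=3):
    """Swap the two outputs in bins whose power ratios match the centroid better swapped."""
    Y = Y.copy()
    for _ in range(n_iter):
        R = power_ratio(Y)
        centroid = R.mean(axis=1)  # average power-ratio sequence, shape (2, n_frames)
        for f in range(Y.shape[1]):
            keep = corr(R[0, f], centroid[0]) + corr(R[1, f], centroid[1])
            swap = corr(R[1, f], centroid[0]) + corr(R[0, f], centroid[1])
            if swap > keep:
                Y[:, f] = Y[::-1, f].copy()
    return Y


# Synthetic example: two sources with different activity envelopes; three bins swapped.
rng = np.random.default_rng(1)
n_bins, n_frames = 16, 200
env_a, env_b = rng.random(n_frames) + 0.01, rng.random(n_frames) + 0.01
Y_true = np.stack([np.tile(env_a, (n_bins, 1)), np.tile(env_b, (n_bins, 1))]).astype(complex)
Y_perm = Y_true.copy()
for f in (3, 7, 11):
    Y_perm[:, f] = Y_true[::-1, f]
print(np.allclose(fix_permutations(Y_perm), Y_true))  # True
```

As with any permutation alignment, the overall output order is arbitrary. The sketch only makes the bins consistent with whichever assignment the majority of bins already share.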
Design and Implementation of Server-Based Web Reader kWebAnywhere
Yun, Young-Sun ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 217~225
DOI : 10.13064/KSSS.2013.5.4.217
This paper describes the design and implementation of the kWebAnywhere system, which is based on WebAnywhere. WebAnywhere helps people with severely diminished eyesight and blind people access Internet information through Web interfaces. It is a server-based web reader that reads web content aloud over the Internet using TTS (text-to-speech) technology, without requiring any software to be installed on the client's system. The system works in general web browsers through their built-in audio functions. It can serve blind users who cannot afford a screen reader, as well as web developers designing for web accessibility. However, WebAnywhere supports only a single language and cannot be applied directly to Korean web content. In this paper, we therefore modified WebAnywhere to serve content written in both English and Korean. To distinguish it from the original system, the modified system is called kWebAnywhere. kWebAnywhere was modified to support the Korean TTS system VoiceText and to include a user interface for controlling the parameters of the TTS system. VoiceText does not support the Festival API used in WebAnywhere. We therefore developed the Festival Wrapper, which translates VoiceText's private APIs into Festival APIs so that VoiceText can communicate with the WebAnywhere engine. We expect that the developed system can help people with severely diminished eyesight and blind people access Internet content easily.
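The abstract describes the Festival Wrapper only as translating VoiceText's private API into the Festival API that the WebAnywhere engine expects. The sketch below illustrates that adapter idea, together with routing Korean and English text to different engines. Every class, method, and function name (`SpeechEngine`, `KoreanTTSStub`, `KoreanEngineWrapper`, `route`, and so on) is hypothetical. None of them reproduces the real Festival or VoiceText interfaces, and the "audio" is a placeholder byte string.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechEngine(Protocol):
    """Hypothetical interface the reader engine calls: text in, audio bytes out."""

    def synthesize(self, text: str) -> bytes: ...


class KoreanTTSStub:
    """Hypothetical Korean TTS whose call signature differs from SpeechEngine."""

    def speak(self, text: str, speed: int, pitch: int) -> bytes:
        return f"ko|{speed}|{pitch}|{text}".encode("utf-8")  # placeholder "audio"


class EnglishTTSStub:
    """Hypothetical engine that already matches SpeechEngine."""

    def synthesize(self, text: str) -> bytes:
        return f"en|{text}".encode("utf-8")


@dataclass
class KoreanEngineWrapper:
    """Adapter: exposes KoreanTTSStub through the SpeechEngine interface."""

    engine: KoreanTTSStub
    speed: int = 100  # parameters a user interface could adjust
    pitch: int = 100

    def synthesize(self, text: str) -> bytes:
        return self.engine.speak(text, self.speed, self.pitch)


def contains_hangul(text: str) -> bool:
    """True if any character is a precomposed Hangul syllable (U+AC00 to U+D7A3)."""
    return any("\uac00" <= ch <= "\ud7a3" for ch in text)


def route(text: str, korean: SpeechEngine, english: SpeechEngine) -> bytes:
    """Send Korean text to the Korean engine and everything else to the English one."""
    return (korean if contains_hangul(text) else english).synthesize(text)


korean = KoreanEngineWrapper(KoreanTTSStub(), speed=120)
english = EnglishTTSStub()
print(route("안녕하세요", korean, english), route("Hello", korean, english))
```

The reader engine only ever calls `synthesize`. Supporting another proprietary TTS would then mean writing one more wrapper rather than changing the engine.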
A Single-Channel Speech Dereverberation Method Using Sparse Prior Imposition in Reverberation Filter Estimation
Zee, Min-Seon ; Park, Hyung-Min ;
Phonetics and Speech Sciences, volume 5, issue 4, 2013, Pages 227~232
DOI : 10.13064/KSSS.2013.5.4.227
A reverberation filter is generally much shorter than the corresponding dereverberation filter. A single-channel speech dereverberation method based on reverberation filter estimation has therefore been developed to improve dereverberation performance. Unfortunately, a typical reverberation filter still has too many coefficients to be estimated accurately from limited speech observations. To exploit the sparsity of reverberation filter coefficients, this paper presents an algorithm that imposes a sparse prior on the reverberation filter estimation process. Simulation results demonstrate that imposing the sparse prior further improves the performance of the dereverberation method based on reverberation filter estimation.
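The abstract does not state the form of the sparse prior or the estimator. A common way to impose sparsity is an L1 penalty, which corresponds to a Laplacian prior under MAP estimation. The following minimal sketch assumes that choice. It estimates an FIR reverberation filter h from y = x * h by minimizing 0.5*||y - Xh||^2 + lam*||h||_1 with ISTA (iterative soft-thresholding). For simplicity it assumes the source x is known; in a single-channel dereverberation setting, x would itself be an estimate. The filter length, `lam`, and the synthetic signals are hypothetical.

```python
import numpy as np


def convolution_matrix(x, L):
    """N x L matrix X with X[n, k] = x[n - k], so X @ h equals np.convolve(x, h)[:N]."""
    N = len(x)
    X = np.zeros((N, L))
    for k in range(L):
        X[k:, k] = x[: N - k]
    return X


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def estimate_sparse_filter(x, y, L, lam=0.01, n_iter=500):
    """ISTA for min_h 0.5 * ||y - X h||^2 + lam * ||h||_1."""
    X = convolution_matrix(x, L)
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    h = np.zeros(L)
    for _ in range(n_iter):
        grad = X.T @ (X @ h - y)
        h = soft_threshold(h - step * grad, step * lam)
    return h


# Synthetic test: a sparse 64-tap filter applied to white noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
h_true = np.zeros(64)
h_true[[0, 10, 37]] = [1.0, 0.5, -0.3]
y = np.convolve(x, h_true)[: len(x)]
h_est = estimate_sparse_filter(x, y, L=64, lam=0.01)
print(np.round(h_est[[0, 10, 37]], 3))
```

The soft-thresholding step is what implements the prior: it drives small coefficients exactly to zero. This reduces the effective number of parameters that must be estimated from limited observations.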