Phonetics and Speech Sciences
The Korean Society of Speech Sciences
Volume 1, Issue 2 - Jun 2009
The Effect of the Number of Clusters on Speech Recognition with Clustering by ART2/LBG
Lee, Chang-Young ;
Phonetics and Speech Sciences, volume 1, issue 2, 2009, Pages 3~8
In an effort to improve speech recognition, we investigated the effect of the number of clusters. In usual LBG clustering, the number of codebook clusters is doubled on each bifurcation and hence cannot be chosen arbitrarily in a natural way. To control the number of clusters, we combined adaptive resonance theory (ART2) with LBG and performed the clustering in two stages. The codebook thus formed was used in subsequent processing of fuzzy vector quantization (FVQ) and HMM for speech recognition tests. Compared to conventional LBG, our method was shown to reduce the best recognition error rate by 0~0.9% depending on the vocabulary size. The results also showed that between 400 and 800 would be the optimal number of clusters in the limit of small and large vocabulary speech recognitions of isolated words, respectively.
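The doubling constraint mentioned in the abstract can be illustrated with a minimal sketch of the LBG splitting step. This is not the authors' implementation: the function name `lbg_split`, the perturbation factor `eps`, and the toy centroid are assumptions, and the k-means style refinement that normally follows each split is omitted.

```python
def lbg_split(codebook, eps=0.01):
    """One LBG bifurcation: replace each centroid with two perturbed copies.

    `eps` is an assumed perturbation factor; real systems tune it.
    """
    split = []
    for centroid in codebook:
        split.append([x * (1 + eps) for x in centroid])
        split.append([x * (1 - eps) for x in centroid])
    return split


# Start from a single (hypothetical) centroid and bifurcate four times.
codebook = [[0.5, -1.0]]
sizes = [len(codebook)]
for _ in range(4):
    codebook = lbg_split(codebook)
    sizes.append(len(codebook))

print(sizes)  # [1, 2, 4, 8, 16]
```

Because each split doubles the codebook, only powers of two are reachable this way, which is the limitation the two-stage ART2/LBG approach is described as addressing.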
Performance Comparison of the Speech Enhancement Methods for Noisy Speech Recognition
Chung, Yong-Joo ;
Phonetics and Speech Sciences, volume 1, issue 2, 2009, Pages 9~14
Speech enhancement methods can be generally classified into a few categories and they have been usually compared with each other in terms of speech quality. For the successful use of speech enhancement methods in speech recognition systems, performance comparisons in terms of speech recognition accuracy are necessary. In this paper, we compared the speech recognition performance of some of the representative speech enhancement algorithms which are popularly cited in the literature and used widely. We also compared the performance of speech enhancement methods with other noise robust speech recognition methods like PMC to verify the usefulness of speech enhancement approaches in noise robust speech recognition systems.
Prominence Detection Using Feature Differences of Neighboring Syllables for English Speech Clinics
Shim, Sung-Geon ; You, Ki-Sun ; Sung, Won-Yong ;
Phonetics and Speech Sciences, volume 1, issue 2, 2009, Pages 15~22
Prominence of speech, which is often called 'accent,' affects the fluency of speaking American English greatly. In this paper, we present an accurate prominence detection method that can be utilized in computer-aided language learning (CALL) systems. We employed pitch movement, overall syllable energy, 300-2200 Hz band energy, syllable duration, and spectral and temporal correlation as features to model the prominence of speech. After the features for vowel syllables of speech were extracted, prominent syllables were classified by SVM (Support Vector Machine). To further improve accuracy, the differences in characteristics of neighboring syllables were added as additional features. We also applied a speech recognizer to extract more precise syllable boundaries. The performance of our prominence detector was measured based on the Intonational Variation in English (IViE) speech corpus. We obtained 84.9% accuracy which is about 10% higher than previous research.
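As an illustration of the neighboring-syllable idea, the following sketch appends the differences between each syllable's features and those of its left and right neighbors. The abstract does not specify how the differences are computed, so the subtraction order, the handling of utterance edges (a zero difference), and the name `add_neighbor_differences` are all assumptions; the SVM classification step is not shown.

```python
def add_neighbor_differences(features):
    """Augment per-syllable feature vectors with differences to neighbors.

    `features` is a list of equal-length lists, one per syllable.
    At utterance edges the missing neighbor is replaced by the syllable
    itself, so the difference is zero (an assumed convention).
    """
    augmented = []
    last = len(features) - 1
    for i, current in enumerate(features):
        previous = features[i - 1] if i > 0 else current
        following = features[i + 1] if i < last else current
        diff_prev = [c - p for c, p in zip(current, previous)]
        diff_next = [c - f for c, f in zip(current, following)]
        augmented.append(current + diff_prev + diff_next)
    return augmented


# Hypothetical [pitch movement, energy] values for three syllables.
syllables = [[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]]
augmented = add_neighbor_differences(syllables)
print(augmented[1])  # [3.0, 30.0, 2.0, 20.0, 1.0, 10.0]
```

The middle syllable stands out from both neighbors, so its difference features are positive on both sides, which is the kind of contrast a classifier could exploit.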
Positive Peaked Electrically Compound Action Potentials in Cochlear Implant Recipients
Heo, Seung-Deok ;
Phonetics and Speech Sciences, volume 1, issue 2, 2009, Pages 25~30
Animal experiments have shown that positive peaked electrically compound action potentials (ECAPs) can be recorded at round window, intracochlear, and nerve trunk sites by stimulating with a monopolar pulse. However, positive peaked ECAPs of cochlear implant recipients have never been reported because ECAPs are recorded from intracochlear electrodes after bipolar stimulation. In our experiment, the positive peaked ECAPs were recorded from 18 intracochlear electrodes in cochlear implant recipients with multiple cochlear anomalies. Thresholds in each channel were measured and the latency of P-, N-wave, and amplitude of P-N were analyzed. These results were identical with the electrically auditory brainstem response (EABR) on the input-output characteristics. In conclusion, the positive peaked ECAPs from the cochlear implant recipients are antidromic ECAPs recorded by perimodiolar electrodes stimulating cochlear implants with multiple anomalies. Therefore, positive peaked ECAPs can be used as useful audiological tools to evaluate the eighth nerve ending.
The Effect of a Portable Voice Feedback Device on the Hyperfunctional Voice Behaviors of Children with Vocal Nodules
Lee, Moo-Kyung ;
Phonetics and Speech Sciences, volume 1, issue 2, 2009, Pages 31~36
This study attempted to examine the effects of a portable voice feedback device on the hyperfunctional voice behaviors of children with vocal nodules when they wore the device in their daily lives. The device could set fundamental frequency and intensity at optimal levels for the subjects. It produces an audible alarm for inappropriate hyperfunctional voices beyond the preset levels. In addition, the frequency of hyperfunctional voice behaviors was recorded by the device, so the users were able to chart their number of hyperfunctional voice behaviors per day. According to results acquired after having subjects wear the device for 12 weeks, the subjects' frequency of hyperfunctional voice behaviors decreased significantly (p < .01). The decline was especially significant from the first to the fourth week.
The Effect of Vocal Function Exercise on Voice Improvement in Patients with Vocal Nodules
Lim, Hye-Jin ; Kim, Jeong-Kyu ; Kwon, Do-Ha ; Park, Jun-Young ;
Phonetics and Speech Sciences, volume 1, issue 2, 2009, Pages 37~42
The purpose of the present study was to determine the effect of the management program known as vocal function exercise (VFE) on voice quality. Typical VFE was modified and applied to patients with vocal nodules by controlling intensity of voice and relieving the vocal fold to solve hyperfunctional problems in VFE. Eight female subjects aged between 28 and 54 who had been diagnosed with vocal nodules took part in the study. The patients performed VFEs once a week for eight weeks. Vocal function exercises consist of voice hygiene, respiratory training, phonation training, and glide training. The subjects' voices were analyzed before and after therapy in terms of acoustics, maximum phonation time (MPT), GRBAS, and voice handicap index (VHI). As a result, among the acoustic parameters, fundamental frequency (F0) increased significantly, shimmer decreased markedly, and noise to harmonic ratio (NHR) was clearly lowered. In addition, MPT increased significantly. The GRBAS scale indicated significant improvement in grade, roughness, and strained voice. VHI indicated significant improvement in the emotional part. In conclusion, VFE was effective in improving voice quality for patients with vocal nodules.
The Effect of Voice Therapy in Vocal Polyp Patients
Kim, Seong-Tae ; Jeong, Go-Eun ; Kim, Sang-Yoon ; Choi, Seung-Ho ; Lim, Gil-Chai ; Han, Ju-Hee ; Nam, Soon-Yuhl ;
Phonetics and Speech Sciences, volume 1, issue 2, 2009, Pages 43~49
Vocal polyps are benign phonotraumatic lesions which are traditionally treated using phonomicrosurgical techniques. In the case of hyperfunctional voice use, voice therapy is effective and results in voice improvement. However, evidence on the utility of voice therapy for vocal polyps is in great demand. The purpose of this study was to evaluate the effects of voice therapy in patients with vocal polyps. The authors reviewed the medical records of 193 patients with vocal nodules or vocal polyps, and 64 patients (31 nodules and 33 polyps) were enrolled. All of the subjects had received an explanation of their problems and vocal hygiene education, and had been treated with Seong-Tae Kim's multiple voice therapy technique over 4 to 16 sessions (mean: 8.6 sessions). All subjects were examined by perceptual assessment, acoustic and aerodynamic measures, and VRP (voice range profile). In perceptual assessment, patients with vocal nodules had more breathy and strained voices than the vocal polyp group. Both groups showed significantly reduced rough, breathy voice after voice therapy. Patients with vocal polyps had worse voice quality than patients with nodules in acoustic measures. Both groups showed reduced jitter and shimmer after voice therapy. In aerodynamic measures, MPT and Psub were increased, and MFR was reduced (p<.05). Participants' frequency range and intensity range were increased after voice therapy, but only frequency range resulted in a significant difference (p<.05). In conclusion, the therapeutic effect of voice therapy in patients with vocal nodules and polyps was demonstrated perceptually and acoustically. We suggest that voice therapy, including advice, vocal hygiene, and Seong-Tae Kim's multiple voice therapy technique, is useful as an initial choice of treatment for patients with vocal polyps before considering a surgical approach.
Effects of Neonatal Hearing Screening Program (NHSP) Information on Parental Satisfaction
Ahn, Hyun-Sook ; Cho, Soo-Jin ;
Phonetics and Speech Sciences, volume 1, issue 2, 2009, Pages 51~59
This study was designed to investigate the effects of neonatal hearing screening program (NHSP) information on parental satisfaction with the Parent Satisfaction Questionnaire with Neonatal Hearing Screening Program (PSQ-NHSP) by Mazlan et al. (2006). The PSQ-NHSP consisted of four aspects: information, personnel in charge of the hearing test, appointment activity, and overall satisfaction in the neonatal hearing screening program. A total of 106 parents (50 in the experimental group and 56 in the control group) participated in this study in one general hospital and two delivery clinics. The fifty parents in the experimental group received information and counseling with educational materials before filling out the PSQ-NHSP, but the fifty-six parents in the control group did not receive any counseling or education materials before completing the PSQ-NHSP. The PSQ-NHSP demonstrated excellent internal consistency reliability. The results of the study were as follows. First, the overall satisfaction and personnel in charge of hearing test aspects showed higher rates of satisfaction than the appointment activity aspect for total subjects. Second, the overall parental satisfaction rate of the experimental group was significantly higher than that of the control group in all items. Lastly, thirty-two participants (30%) made at least one comment in response to the open-set items. A total of 29 comments were related to satisfaction with participating in the NHSP and 11 comments were related to dissatisfaction. In conclusion, to improve parental satisfaction it is important to provide parents with education and information about the NHSP before the test. In addition, PSQ-NHSP was found to be a useful instrument for identifying the benefits and shortfalls of the NHSP.
Semantic Priming Effect of Korean Lexical Ambiguity: A Comparison of Homonymy and Polysemy
Yu, Gi-Soon ; Nam, Ki-Chun ;
Phonetics and Speech Sciences, volume 1, issue 2, 2009, Pages 63~73
The present study was conducted to explore how the processing of lexical ambiguity differs between homonymy and polysemy, and whether the two types of lexical ambiguity are represented separately in the mental lexicon, using a semantic priming paradigm. Homonymy (M1 means the literal meaning of '사과', i.e. apple and M2 means another literal meaning of '사과', i.e. apologize) was used in Experiment 1, and polysemy (M1 means the literal meaning of '바람', i.e. wind and M2 means the figurative meaning of '바람', i.e. wanton) was used in Experiment 2. The results of both experiments showed that a significant semantic priming effect occurs regardless of the type of ambiguities (homonymy and polysemy) and the difference of their semantic processes. However, the semantic priming effect for polysemy was larger than that for homonymy. This result supports the hypothesis that the semantic process of homonymy is different from that of polysemy.
A Study of the Pitch Measurement Location and Reference Line for a Research of Declination in Korean
Kwak, Sook-Young ; Shin, Ji-Young ;
Phonetics and Speech Sciences, volume 1, issue 2, 2009, Pages 75~84
The aim of this paper is to find an adequate method to study declination in Korean. In previous studies of declination in Korean, maximum and minimum pitch values in an accentual phrase were measured. But this method is inadequate when an accentual phrase is located at the end of the intonational phrase. So in order to exclude the final tone of an intonational phrase, we propose to measure pitch values of the first and second tone in an accentual phrase when the tonal pattern of the accentual phrase is 'LHLH'. In this case, the line that connects every first tone of an accentual phrase is the baseline, and the line that connects every second tone of an accentual phrase is the topline. By a comparison of declination between focused and neutral utterances, we will show that the topline of declination is more directly related to the speaker's plan than the baseline.
The Role of Pitch and Length in Spoken Word Recognition: Differences between Seoul and Daegu Dialects
Lee, Yoon-Hyoung ; Pak, Hyen-Sou ;
Phonetics and Speech Sciences, volume 1, issue 2, 2009, Pages 85~94
The purpose of this study was to see the effects of pitch and length patterns on spoken word recognition. In Experiment 1, a syllable monitoring task was used to see the effects of pitch and length on the pre-lexical level of spoken word recognition. For both Seoul dialect speakers and Daegu dialect speakers, pitch and length did not affect the syllable detection processes. This result implies that there is little effect of pitch and length in pre-lexical processing. In Experiment 2, a lexical decision task was used to see the effect of pitch and length on the lexical access level of spoken word recognition. In this experiment, word frequency (low and high) as well as pitch and length was manipulated. The results showed that pitch and length information did not play an important role for Seoul dialect speakers, but that it did affect lexical decision processing for Daegu dialect speakers. Pitch and length seem to affect lexical access during the word recognition process of Daegu dialect speakers.
Perceptual Structure of Korean Consonants in High Vowel Contexts
Bae, Moon-Jung ;
Phonetics and Speech Sciences, volume 1, issue 2, 2009, Pages 95~103
We investigated the perceptual structure of Korean consonants by analyzing the confusion among consonants in various vowel contexts. The 36 CV syllable types combined by 18 consonants and 2 vowels (/i/ and /u/) were presented with masking noises or in degraded intensity. The confusion data were analyzed by the INDSCAL (Individual Difference Scaling), ADCLUS (Additive Clustering) and the probability of the transmitted information. The results were compared with those of a previous study with /a/ vowel context (Bae and Kim, 2002). The overall results showed that the laryngeal features (aspiration, lax and tense) are the most salient features in the perception of Korean consonants regardless of vowel contexts, but the perceptual saliency of place features varies across vowel conditions. In high vowel (front and back vowel) contexts, sibilant consonants were perceptually more salient than in low vowel contexts. In back vowel contexts, grave (labial and velar) consonants were perceptually salient. These findings imply that place features and vowel features strongly interact in speech perception as well as in speech production. All statistical measures from our confusion data confirmed that the perceptual structure of Korean consonants corresponds to the hierarchical structure suggested in the feature geometry (Clements, 1991). We discuss the link between speech perception and production as the basis of phonology.
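One common way to quantify transmitted information from a confusion matrix is the mutual information between stimuli and responses, in bits. The sketch below assumes that this is the kind of measure meant; the abstract does not give the formula, and the function name and toy matrices are hypothetical.

```python
import math


def transmitted_information(confusions):
    """Mutual information (bits) between stimulus rows and response columns.

    `confusions[i][j]` is the count of stimulus i heard as response j.
    """
    total = sum(sum(row) for row in confusions)
    row_sums = [sum(row) for row in confusions]
    col_sums = [sum(col) for col in zip(*confusions)]
    bits = 0.0
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            if count == 0:
                continue  # zero cells contribute nothing
            p_ij = count / total
            # p_ij / (p_i * p_j), written with raw counts
            ratio = count * total / (row_sums[i] * col_sums[j])
            bits += p_ij * math.log2(ratio)
    return bits


perfect = transmitted_information([[10, 0], [0, 10]])  # no confusions
chance = transmitted_information([[5, 5], [5, 5]])     # pure guessing
print(perfect, chance)
```

With two equally likely stimuli, error-free identification transmits one bit and chance-level responding transmits none, so the measure summarizes how much of a feature contrast survives the masking noise.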
Identifying Frication and Aspiration Noise in the Frequency Domain: The Case of Korean Alveolar Lax Fricatives
Yoon, Kyu-Chul ;
Phonetics and Speech Sciences, volume 1, issue 2, 2009, Pages 105~110
This paper introduces the technique of semi-automatically identifying different types of noise in the frequency domain. Given the lower cutoff frequency of the frication noise, and a user-specified constant, the technique identifies the boundary between the frication and aspiration noise in a Korean lax fricative followed by the vowel /a/ by comparing the upper and lower sums of energy with respect to the cutoff frequency. The user-specified constant can be adjusted for different speakers. When the technique was applied to distinguish the two types of noise of Korean lax fricatives from the same speaker, the average and standard deviation of the difference between the manually inserted boundaries and the automatically inserted boundaries were 2.67ms and 1.80ms respectively.
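The comparison of upper and lower energy sums can be sketched as follows. This is only one plausible reading of the abstract: the decision rule (a frame counts as frication while the upper-band sum exceeds the constant times the lower-band sum), the per-frame spectrum representation, and all names and numbers are assumptions rather than the author's procedure.

```python
def band_energy_sums(spectrum, freqs, cutoff_hz):
    """Sum spectral energy below and at-or-above the cutoff frequency."""
    lower = sum(p for p, f in zip(spectrum, freqs) if f < cutoff_hz)
    upper = sum(p for p, f in zip(spectrum, freqs) if f >= cutoff_hz)
    return lower, upper


def find_boundary(frames, freqs, cutoff_hz, constant):
    """Return the index of the first frame judged to be aspiration.

    Assumed rule: frication while upper > constant * lower.
    Returns None if every frame looks like frication.
    """
    for index, spectrum in enumerate(frames):
        lower, upper = band_energy_sums(spectrum, freqs, cutoff_hz)
        if upper <= constant * lower:
            return index
    return None


# Hypothetical four-bin power spectra for four consecutive frames.
freqs = [1000, 3000, 5000, 7000]
frames = [
    [1, 1, 8, 9],  # energy concentrated above the cutoff
    [1, 2, 7, 8],
    [5, 6, 3, 2],  # energy shifts below the cutoff
    [6, 7, 2, 1],
]
boundary = find_boundary(frames, freqs, cutoff_hz=4000, constant=1.0)
print(boundary)  # 2
```

Raising or lowering `constant` moves the boundary earlier or later, which corresponds to the per-speaker adjustment the abstract mentions.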
The Relationship Between Voice and the Image Triggered by the Voice: American Speakers and American Listeners
Moon, Seung-Jae ;
Phonetics and Speech Sciences, volume 1, issue 2, 2009, Pages 111~118
The present study aims at investigating the relationship between voices and the physical images triggered by the voices. It is the final part of a four-part series and the results reported in the present study are limited to those of American speakers and American listeners. Combined with the results from previous studies (Moon, 2000; Moon, 2002; Tak, 2005), the results suggest that (1) there is a very strong, much higher than chance-level relationship between voices and the pictures chosen for the voices by the perception experiment subjects; (2) the more physical characteristics that are given, the better the chance for correctly matching voices with pictures; and (3) culture (in the present study, language environment) seems to play a role in conjuring up the mental images from voices.
A Study on the Intonational Patterns in English Information Structures
Kim, Hwa-Young ; Oh, Mi-Ra ;
Phonetics and Speech Sciences, volume 1, issue 2, 2009, Pages 119~128
Many researchers have argued about the relationship between information structure and intonation. Their results can be summarized in three main points: the intonation of topic and focus in English information structures is implemented as i) a pitch accent, ii) a tune (a pitch accent + (an) edge tone(s)), or iii) a boundary tone. The purpose of this paper is to study various intonational patterns of topic and focus in English information structures, using natural conversations. In this paper, the types of topics and foci are divided, based on contrastiveness. The topics are classified as five non-contrastive and four contrastive topics. The foci are classified as neutral focus, informational focus, and contrastive focus. The results show that the intonation of the topic in English information structures is mainly implemented as a pitch accent, except for the type of the pronoun topic (Lp) which is not implemented as a pitch accent or a tune. However, the intonation of the focus is implemented as a tune in the neutral focus (Fn) and as a pitch accent or a tune in the informational focus (Fi) and the contrastive focus (Fe). In our discussion and conclusion, we suggest that it is not always true that for the meaning of contrast, the topic or the focus is represented as a pitch accent, which has been the main contrastive intonation from earlier studies.
Frequency Related Information and Syllable Structure Constraints on Sino-Korean
Shin, Ji-Young ;
Phonetics and Speech Sciences, volume 1, issue 2, 2009, Pages 129~140
The purpose of the present study is to investigate frequency related information and syllable structure constraints on Sino-Korean. Previous studies on Sino-Korean have mostly investigated the historical change of sounds and reviewed archaic features of Chinese language in Sino-Korean. Unfortunately, there is little study on the sounds of contemporary Sino-Korean in terms of syllable structure constraints. For the purpose of the present study, sounds of 7,742 Chinese characters used in Sino-Korean (7,795 syllables) were investigated and syllable matrices were made based on the results of frequency related information. As a result, 483 syllable types were observed and the most frequently observed syllables were as follows: /ku/ (103) > /ki/ (100) > /ju/ (87) > /pi/ (86). Only 16 out of 19 consonants are used for Sino-Korean. / / and / / are never used in Sino-Korean and /kh, / occur only a few times (3, 2, 1 respectively). /k/ (17.5%) shows the highest frequency and /n, , l, tc, m/ occupied the next rankings. Among 20 vowel types, /a/ showed the highest frequency and /o, u, i, / occupied the next rankings. Based on the syllable matrices, gaps were observed and classified into accidental or systematic ones. Onset and nucleus, nucleus and coda, onset and coda, and other syllable structure constraints of Sino-Korean were listed.
A Study on the Rhythm of Korean EFL Learners' English Pronunciation
Chung, Hyun-Song ;
Phonetics and Speech Sciences, volume 1, issue 2, 2009, Pages 141~149
An emphasis on teaching suprasegmental features of English, specifically English rhythm, is essential in order to improve the 'intelligibility' of the pronunciation of Korean EFL learners among interlocutors who use English as a Lingua Franca (ELF). By redefining the ELF suggested by Jenkins (2000, 2002), this paper argues that Lingua Franca Core (LFC) must include suprasegmental features such as 'stress-based rhythm' and word stress. However, because 'isochrony' is difficult to measure in a foot, the rhythm unit must be expanded to an intonational phrase which has prominence in it, and the rhythm of the unit can be measured by calculating the duration of each segment in context. The rhythmic pattern of Korean learners of English and that of native speakers or other non-native English speakers can then be calculated and compared by using correlation coefficients of the segmental duration. In terms of sociolinguistic factors, improving the 'comprehensibility' and 'accentedness' of Korean EFL learners' pronunciation is also important in international communication, which calls for more emphasis on suprasegmental features.
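The duration-based comparison described above can be sketched with a Pearson correlation coefficient over matched segment durations. The abstract does not say which coefficient is used, so Pearson's r is an assumption here, as are the function name and the toy durations.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences of durations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant durations")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical segment durations (ms) for the same phrase.
native = [60.0, 120.0, 45.0, 150.0]
same_rhythm_slower = [d * 1.5 for d in native]  # same pattern, slower rate
reversed_pattern = list(reversed(native))

r_same = pearson(native, same_rhythm_slower)
r_reversed = pearson(native, reversed_pattern)
print(round(r_same, 3), round(r_reversed, 3))
```

A speaker who keeps the native timing pattern at a slower rate still correlates perfectly, so the coefficient reflects the relative rhythm pattern rather than overall speaking rate.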
Tensification Preference of Native Seoul Speakers of Korean
Lee, Ho-Young ;
Phonetics and Speech Sciences, volume 1, issue 2, 2009, Pages 151~162
This paper aims to investigate how tensification preference has changed over time and discuss how appropriately tensification preference is reflected in Principles of Standard Pronunciation and Standard Korean Language Dictionary. For this research, a questionnaire survey of tensification preference was conducted. 173 test words were used and 156 native Seoul speakers participated in this survey. The results have shown that tensification preference has gradually increased from older to younger generations. In addition, Principles of Standard Pronunciation and Standard Korean Language Dictionary do not reflect real pronunciation appropriately. Therefore, some ways of incorporating the actual pronunciation of Seoul speakers in the Principles of Standard Pronunciation and the Standard Korean Language Dictionary are suggested.
Vocal Characteristics and Differences in Gender and Voice Classification among Classical Singers
Nam, Do-Hyun ; Kim, Wha-Soak ;
Phonetics and Speech Sciences, volume 1, issue 2, 2009, Pages 163~171
This study attempted to investigate vocal characteristics and differences in gender and voice classification among classical singers. Twenty-three female singers (M = 23.1 yrs, SD = 3.6 yrs, average 6.3 yrs singing experience, all classified as sopranos) and twenty male singers (M = 25.2 yrs, SD = 3.6 yrs, average 6.3 yrs singing experience, 8 tenors, 12 baritones) were recruited to participate in the present study. Speaking fundamental frequency (F0), closed quotient (CQ), maximum phonation time (MPT), breathing types, maximum inspiratory pressure (MIP), maximum expiratory pressure (MEP), and singers' formants were measured. In addition, vibratory patterns were observed using stroboscopy. Speaking F0, singing CQ, breathing types, formant frequency in singers' formants, MIP, MEP, and MPT differed significantly between genders. Generally, singers' formants were observed in male singers, and the pattern of singers' formants differed between tenors and baritones. Singing CQ values were lower than speaking CQ values in the female singers (P<.001). Furthermore, MEP, MIP, and singing CQ were significantly lower for female singers than for male singers (P<.001). MPT and speaking F0, however, were not significantly different between tenors and baritones.