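The Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings (Proceedings of the KSPS conference)
The Korean Society Of Phonetic Sciences And Speech Technology
- Semiannual
Science and technology standard classification
- Language > General language
Proceedings of the 2007 joint conference of the Korean Society of Phonetic Sciences and Speech Technology and the Korean Association of Speech Sciences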
-
King Sejong the Great, his students at the Jip-hyeun-jeon school, and Choe Sejin, their sixteenth-century successor, indicated that Middle Korean had three distinctive pitches: low, high, and rising (phyeong-, geo-, sang-sheong). Thanks to Hun-min-jeng-∅eum, its Annotation, and the side-dot literature of the fifteenth and sixteenth centuries, we can compare Middle Korean with the Hamgyeong dialect, the Gyeongsang dialect, and other extant tone dialects as joint preservers of what was probably the tonal system of a unitary parent Korean language. What is most remarkable about Middle Korean phonetic work is its manifest superiority in conception and execution compared with anything produced in present-day linguistic scholarship. At that stage in linguistics, prior to the technology and equipment needed for the scientific analysis of sound waves, auditory description was the only possible frame for an accurate and systematic classification. Auditory phonetics still remains fundamental in pitch description, even though modern acoustic categories may supplement and supersede auditory ones in tonological analysis. Auditory phonetics, however, has the serious shortcoming that its theory and practice are too subjective to be developed into a science for the present century. With joint researchers, I am developing a new pitch scale: a semiautomatic auditory grade pitch analysis program. We expect the result of our work to be a significant step forward for this area of linguistics.
-
This paper describes a multimodal dialog system that uses the Hidden Information State (HIS) method to manage human-machine dialog. The HIS dialog manager is a variation of the classic partially observable Markov decision process (POMDP), which provides one of the stochastic frameworks for dialog modeling. Because dialog modeling with a conventional POMDP requires a very large state space, it has been hard to apply POMDPs to real dialog system domains. In the HIS dialog manager, the system groups belief states to reduce the size of the state space, so that the manager can be used in real-world domains. We applied the HIS method to a multimodal dialog system for the smart-home domain.
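As a rough illustration of the belief state that a POMDP dialog manager maintains, the sketch below applies the standard POMDP belief update, b'(s') ∝ P(o | s', a) Σ_s P(s' | s, a) b(s), to a toy two-goal smart-home example. It is not the HIS method itself (no grouping of belief states is shown), and the state names, the action name, and all probabilities are hypothetical values chosen for illustration.

```python
def belief_update(belief, action, observation, transition, observation_model):
    """Standard POMDP belief update for one (action, observation) step."""
    states = list(belief)
    unnormalized = {}
    for s_next in states:
        # Prediction step: probability of reaching s_next from the old belief.
        predicted = sum(
            transition[(s, action)].get(s_next, 0.0) * belief[s] for s in states
        )
        # Correction step: weight by how likely the observation is in s_next.
        likelihood = observation_model[(s_next, action)].get(observation, 0.0)
        unnormalized[s_next] = likelihood * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical toy model: the user's goal does not change while the system asks.
transition = {
    ("want_light", "ask"): {"want_light": 1.0},
    ("want_tv", "ask"): {"want_tv": 1.0},
}
# Hypothetical noisy recognizer: what the system hears given the true goal.
observation_model = {
    ("want_light", "ask"): {"heard_light": 0.8, "heard_tv": 0.2},
    ("want_tv", "ask"): {"heard_light": 0.3, "heard_tv": 0.7},
}
prior = {"want_light": 0.5, "want_tv": 0.5}
posterior = belief_update(prior, "ask", "heard_light", transition, observation_model)
print(posterior)
```

The update loops over every state, so its cost grows with the size of the state space; this is the cost that motivates grouping belief states in the approach the abstract describes.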
-
In this paper, we investigate a fast speaker adaptation method based on eigenvoice in several noisy environments. To overcome its weakness against noise, we propose a noisy-environment clustering method that divides the noisy adaptation utterances into groups with similar environments by vector-quantization-based clustering, using the cepstral mean as the feature vector. Each utterance group is then used for adaptation to build an environment-dependent model. In our experiment, we obtained a 19-37% relative improvement in error rate compared with the simultaneous speaker adaptation and environmental compensation method.
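The clustering step can be sketched as a plain k-means (vector quantization) pass over one cepstral-mean vector per utterance. This is a minimal sketch under stated assumptions, not the authors' implementation: the two-dimensional vectors, the initial codebook, and the function names are hypothetical, and real cepstral means would have more dimensions.

```python
def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], vec)),
    )


def vq_cluster(vectors, initial_codebook, iterations=20):
    """Refine a codebook with k-means; return (codebook, label per vector)."""
    codebook = [list(c) for c in initial_codebook]
    for _ in range(iterations):
        labels = [nearest(codebook, v) for v in vectors]
        for k in range(len(codebook)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old codeword if a cell is empty
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest(codebook, v) for v in vectors]
    return codebook, labels


# Hypothetical cepstral means of four adaptation utterances (two environments).
cepstral_means = [[0.1, 0.2], [0.0, 0.3], [2.0, 1.9], [2.1, 2.2]]
codebook, labels = vq_cluster(cepstral_means, [cepstral_means[0], cepstral_means[2]])
print(labels)  # utterances sharing a label form one environment group
```

Each group of utterances with the same label would then be used to adapt its own environment-dependent model.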
-
This paper reviews the issues in implementing sound recognizers in real environments. The first is signal corruption caused by background noise and reverberation. The second is the open-set problem, that is, the problem of rejecting out-of-vocabulary words and noises. Both issues must be solved to build noise-robust recognizers.
-
For efficient interaction between humans and robots, the speech interface is a core problem, especially in noisy and reverberant conditions. This paper analyzes the main issues of spoken language interfaces for humanoid robots, such as sound source localization, voice activity detection, and speaker recognition.
-
In this paper, we introduce a system in which a car navigation scenario is plugged into a multimodal interface based on multimodal middleware. In a map-based system, the combination of speech and pen input/output modalities can offer users better expressive power. To achieve multimodal tasks in car environments, we chose SCXML (State Chart XML), a multimodal authoring language standardized by the W3C, to control modality components such as XHTML, VoiceXML, and GPS. In the Network Manager, GPS signals from the navigation software are converted to the EMMA meta language and sent to the MultiModal Interaction Runtime Framework (MMI). MMI not only handles GPS signals and a user's multimodal I/O but also combines them with device information, user preferences, and reasoned RDF to give the user intelligent or personalized services. A self-simulation test has shown that the middleware accomplishes a navigational multimodal task over multiple users in car environments.
-
This paper proposes a flexible method of selecting feature vectors for speaker identification. In speaker identification, the overlapped region between speaker models lowers accuracy. A recently proposed method discards overlapped feature vectors without regard to the source causing the overlap. We suggest a new method that uses both the features that overlap among speakers and the non-overlapped features to mitigate the effects of overlap.
-
The purpose of this study is to investigate the prosodic characteristics of the speech of adults with cerebral palsy. The results showed some correlations between their articulation scores and the prosodic properties of their speech: speakers with low articulation scores showed a slower speech rate, a larger number of IPs and pauses, and longer pauses. They also showed steeper slopes of [L+H] in their APs.
-
This study investigated basic data on the voice range profile (VRP) of untrained boys and girls. The VRP comparison was made between 5 boys (10 to 11 years old) and girls (10 to 11 years old). The VRP was measured with the Dr. Speech 4.0 (Tiger-electronics) phonetogram program. In the comparison of the boys' and girls' maximum and minimum ranges, the mean of the boys' maximum range was 93.68 dB (SD 7.90) and that of the girls was 93.12 dB (SD 5.11); there was no difference. The mean of the minimum range was 68.08 dB (SD 3.59) for the boys and 71.10 dB (SD 3.06) for the girls.
-
This study investigated the effect of speech tasks on habitual pitch. Seven male and female young adult speakers participated in this study. The experiment consisted of seven different speech tasks, including counting, reading, sustained phonation of /a/, prolonged /i:/, and answering /ne/. Data were analyzed with Visi-Pitch IV. The results showed that there was no significant F0 difference among the speech tasks.
-
Neuromagnetic fields were recorded from 10 normal subjects to investigate the time course of cerebral neural activation during the resolution of lexical ambiguity. All recordings were made using a whole-head 306-channel MEG system (Elekta Neuromag Inc., Vectorview™). The observed activity was described by sLORETA (standardized low resolution brain electromagnetic tomography) techniques implemented in CURRY software (Neuroscan). In the results, the bilateral occipito-temporal lobe was activated at 170 ms. Activity at 250 ms was associated with the bilateral temporal lobe in the ambiguous condition, and with the left parietal and temporal lobes in the unambiguous condition. The left frontal and temporal lobes were activated at 350 ms in all conditions. At approximately 430 ms, the right frontal and temporal lobes were activated in the ambiguity-resolving condition, and the left parietal and right temporal lobes in the ambiguity-preserving condition. In conclusion, the cerebral activation related to resolving lexical ambiguity was in the right frontal lobe, and the activation related to preserving ambiguity was in the left parietal lobe.
-
The purpose of this study was to compare and analyze some acoustic parameters of children with cochlear implants (N=20, aged 3-10) and to provide basic data for their speech rehabilitation. Acoustic analyses of seven Korean monophthongs produced in 4 contexts (V, CV, VC, CVC) were conducted for the children with cochlear implants and for normal-hearing children (N=20, aged 3-10). Subjects were asked to pronounce a list of vowels three times. The results are as follows. First, in the cochlear implant (CI) group, there were no significant differences in F1 and F2. Second, in the normal-hearing group, there were significant differences in the F2 of /ㅜ/ between V and CVC and between VC and CVC. Third, there were significant differences in F1 and F2 between the CI group and the normal-hearing group.
-
The present paper focuses on the interaction between lexical-semantic information and affective prosody. Previous studies showed that the influence of lexical-semantic information on the affective evaluation of prosody was relatively clear, but the influence of emotional prosody on word evaluation remains ambiguous. In the present study, we explore whether affective prosody influences the evaluation of the affective meaning of a word and vice versa, using more ecological stimuli (sentences) than single words. We asked participants to evaluate the emotional valence of sentences recorded with affective prosody (negative, neutral, and positive) in Experiment 1, and the emotional valence of their prosody in Experiment 2. The results showed that the emotional valence of prosody can influence the emotional evaluation of sentences and vice versa. Interestingly, positive prosody is likely to be more responsible for this interaction.
-
The purpose of the study is to give Korean learners of English better knowledge of vowel sounds in learning English. The traditional description of the cardinal vowel system developed by Daniel Jones in 1917 is not enough to give English learners clear ideas about producing native-like vowel sounds. For this reason, three native Korean subjects (one male, one female, and one child) were chosen to produce 7 cardinal vowels, which were compared with the vowel sounds of native English and American speakers. The differences between the produced vowel sounds were quantified and visualized with the Sona-match program. The results were fairly remarkable. First, the Korean learners' vowel sounds were articulated differently from what they intended to produce. Second, the Koreans' tongue positions were slightly lower and further forward, toward the lips, than those of the English and American speakers. However, their front vowel /i/ was quite close to that of the English and American speakers. Lastly, the mid vowel /ə/ was not produced in any articulation by the native Korean speakers. It is thought that the mid vowel /ə/ is a type of weak sound, regarded as 'schwa', which requires a great deal of exposure to the language to acquire the physical skill of articulating it.
-
To compare two hypotheses on the origin of the semantic interference effect that have been offered in the psycholinguistic literature, we conducted two experiments using the picture-word interference paradigm. When participants named pictures of objects presented simultaneously with distractor words, they were required to use either native words (Experiment 1) or loanwords (Experiment 2). The pictures were paired with three kinds of distractor words: identical to, semantically related to, or neutral with respect to the picture. Two observations were obtained from the two experiments. First, naming times were faster in the context of identical distractors than in the context of neutral ones. Second, naming times were slower in the presence of semantically related distractors than with neutral ones. These findings support the claim that semantic interference is based on a lexical retrieval conflict.
-
The purpose of the present study was to investigate the characteristics of disfluency in Korean-English bilingual children and in Korean monolingual children matched with them by chronological age. Twenty-eight children, 14 bilingual and 14 monolingual, participated in this study. The experimental tasks consisted of a play situation and a task situation. The conclusions are as follows. (a) The total disfluency score of the bilingual children was significantly higher than that of the monolingual children, as was their normal disfluency score. The most frequent type was Interjection in both groups. All children showed higher scores in the task situation than in the play situation. The bilingual children differed quantitatively and qualitatively from the monolingual children in disfluency scores and types. (b) The bilingual children were divided into two groups, 6 Korean-dominant and 8 English-dominant bilinguals. All showed more disfluency in their non-dominant language. The most frequent type was Interjection in both groups. (c) The higher the chronological age and the expressive language test score, the lower the disfluency score. The earlier the age of exposure to the second language, the higher the disfluency score. There was no correlation between months of residence in a foreign country and disfluency.
-
The aim of this paper is to show that, in a noisy environment, there is a significant difference in the perceptibility of place of articulation between fortis plosives and aspirated plosives in Korean. For this research, a perceptual experiment was conducted. Two groups of subjects heard stimuli with noise and were asked to report which sound they had heard. The result is that, with noise, aspirated plosives cannot be heard clearly whereas fortis plosives can be heard well.
-
We encounter new cant these days. This cant is classified as 'variation cant', which is used in communication language. This study therefore focuses on the aspects and actual conditions of cant in communication language.
-
This paper aims to analyze the pronunciation of Korean Point-of-Interest (POI) data, which consist of 224 sound files, from a phonological point of view, adopting the notion of the prosodic word within the framework of Intonational Phonology. Each POI word is broken down into prosodic words, which are defined as the minimal sequence of segments that can be produced as one Accentual Phrase (AP). The pronunciation of each POI word is then analyzed in terms of its prosodic words. The results show that, in most cases, a prosodic word is realized as one AP; that, in some cases, two prosodic words are pronounced as one AP; and that no cases are found where 3 prosodic words are realized as one AP.
-
This study investigates pause positions in Korean students' reading of an English script. Twelve native speakers of English and 18 Korean students were asked to read The North Wind and the Sun. The common pause positions were determined by examining the pauses in the native speakers' readings. The Korean students were asked to mark pauses on a script and were then trained to place pauses as native speakers of English do. Although some errors were corrected after the training, others remained in the Korean students' readings. The Korean students made fewer errors in marking the script than in reading it. They seem to know where to put pauses, but lack of practice makes it difficult for them to pause in the right positions when they read. This suggests that teachers should continue to teach students where to put pauses when reading or speaking English.
-
This study analyzes Korean middle school students' pronunciation errors of stop-liquid sequences in English. The results showed two typical errors: the insertion of a vowel between a stop and a liquid and the substitution of a liquid with a flap or vice versa. Those pronunciation errors seem to occur since English and Korean have different syllable structures and different types of liquids. A teaching material, which emphasizes no vowel insertion for a proper pronunciation of the consonant clusters, was designed to reduce Korean students' pronunciation errors. Errors were reduced substantially after a 50-minute class with the newly designed material.
-
This study was designed to examine the effects of concurrent linguistic or cognitive tasks on speech rate. Eight normal speakers repeated sentences either with or without a simultaneous linguistic task or cognitive task. The linguistic task consisted of generating verbs from nouns, and the cognitive task consisted of performing mental arithmetic. Speech rate was measured from acoustic data. A one-way ANOVA was conducted to test for differences in speech rate among the 3 types of tasks. The results showed no significant difference between sentence repetition and the linguistic task. However, significant differences were reported for two comparisons: sentence repetition and the linguistic task, and the linguistic and cognitive tasks.
-
Second language learners' variable degree of production difficulty according to the cluster type has previously been accounted for in terms of sonority distance between adjacent segments. As an alternative to this previous model, I propose a Phonetically Based Consonant Cluster Acquisition Model (PCCAM) in which consonant cluster markedness is defined based on the articulatory and perceptual factors associated with each consonant sequence. The validity of PCCAM has been tested through Korean speakers' production of English consonant clusters.
-
The current study investigates the degree to which various prosodic cues at the boundaries of a prosodic phrase in Korean (Accentual Phrase) contributed to word segmentation. Since most phonological words in Korean are produced as one AP, it was hypothesized that the detection of acoustic cues at AP boundaries would facilitate word segmentation. The prosodic characteristics of Korean APs include initial strengthening at the beginning of the phrase and pitch rise and final lengthening at the end. A perception experiment revealed that the cues that conform to the above-mentioned prosodic characteristics of Korean facilitated listeners' word segmentation. Results also showed that duration and amplitude cues were more helpful in segmentation than pitch. Further, the results showed that a pitch cue that did not conform to the Korean AP interfered with segmentation.
-
This paper suggests a method to improve the performance of pathological/normal voice classification. The effectiveness of mel-frequency filter bank energies is analyzed using the Fisher discriminant ratio (FDR). Mel-frequency cepstral coefficients (MFCCs) and feature vectors obtained through a linear discriminant analysis (LDA) transformation of the filter bank energies (FBE) are implemented. The paper shows that the FBE-LDA-based GMM is a more effective method for pathological/normal voice classification than the MFCC-based GMM.
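A common two-class form of the Fisher discriminant ratio for a single feature is FDR = (μ₁ − μ₂)² / (σ₁² + σ₂²). The sketch below applies it to each filter-bank channel to rank channels by how well they separate the two classes. The abstract does not state the exact FDR definition used, so this form is an assumption, and the energy values and function names are hypothetical.

```python
from statistics import mean, pvariance


def fisher_discriminant_ratio(class_a, class_b):
    """Two-class FDR of one feature: (mu_a - mu_b)^2 / (var_a + var_b)."""
    spread = pvariance(class_a) + pvariance(class_b)
    if spread == 0:
        raise ValueError("both classes have zero variance for this feature")
    return (mean(class_a) - mean(class_b)) ** 2 / spread


def rank_channels(normal_frames, pathological_frames):
    """Each argument is a list of frames; each frame lists one energy per channel."""
    scores = [
        fisher_discriminant_ratio(a, b)
        for a, b in zip(zip(*normal_frames), zip(*pathological_frames))
    ]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Hypothetical filter-bank energies: 3 frames x 2 channels per class.
normal = [[1.0, 5.0], [2.0, 5.5], [3.0, 4.5]]
pathological = [[1.5, 9.0], [2.5, 9.5], [3.5, 8.5]]
order, scores = rank_channels(normal, pathological)
print(order, scores)  # channels listed from most to least discriminative
```

In this toy data the second channel separates the classes far better than the first, which is the kind of per-channel comparison an FDR analysis is meant to expose.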
-
The conventional feature recombination technique is very effective in band-limited noise conditions, but in broad-band noise conditions it does not produce a notable performance improvement over the full-band system. To cope with this drawback, we introduce a new technique of sub-band likelihood computation in feature recombination and propose a new feature recombination method based on it. Furthermore, reliable sub-band selection based on the signal-to-noise ratio is used to improve the performance of the proposed method. Experimental results show that the average error reduction rate in various noise conditions is more than 27% compared with the conventional full-band speaker identification system.
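The idea of SNR-based sub-band selection can be illustrated with a simplified sketch: keep only the sub-bands whose estimated SNR reaches a threshold, sum each speaker's log-likelihoods over those bands, and pick the best-scoring speaker. This is not the authors' recombination method, whose details the abstract does not give; the threshold, the speaker names, and all numbers are hypothetical.

```python
def identify_speaker(subband_loglik, subband_snr_db, snr_threshold_db=5.0):
    """Pick the speaker with the highest summed log-likelihood over reliable bands."""
    reliable = [b for b, snr in enumerate(subband_snr_db) if snr >= snr_threshold_db]
    if not reliable:
        # Fall back to all bands when none passes the threshold.
        reliable = list(range(len(subband_snr_db)))
    scores = {
        speaker: sum(loglik[b] for b in reliable)
        for speaker, loglik in subband_loglik.items()
    }
    return max(scores, key=scores.get), reliable


# Hypothetical per-band log-likelihoods for two enrolled speakers (3 sub-bands).
loglik = {
    "spk_a": [-10.0, -12.0, -30.0],
    "spk_b": [-14.0, -15.0, -5.0],
}
snr_db = [15.0, 10.0, -2.0]  # the third band is dominated by noise
best, used_bands = identify_speaker(loglik, snr_db)
print(best, used_bands)
```

With all three bands summed, the noisy third band would tip the decision toward the other speaker, which is why discarding unreliable bands can change the outcome.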
-
We propose a multi-stage recognizer architecture that reduces the computational load and makes recognition fast. To improve the performance of the baseline multi-stage recognizer, we introduce a new feature: a confidence vector for each phone segment, used instead of the best phoneme sequence. The multi-stage recognizer with the new feature performs better on n-best and is more robust.
-
This study implemented and tested a computational lexical processing model that hybridizes the full-list model and the decomposition model, applying lexical acquisition, one of the early stages of human lexical processing, to Korean. As a result, we could simulate the lexical acquisition process for linguistic input through experiments and training, and suggest a theoretical foundation for the order in which certain grammatical categories are acquired. The model also provided evidence from which we can infer the type of mental lexicon in the human brain, through the full-list dictionary and the decomposition dictionary that were automatically produced in the study.
-
The purpose of this study was to obtain nasalance values for Chinese and Korean based on vowels. The NasalView was used to measure the amount of nasal acoustic energy in the speech of 96 normal adults from China and Korea. Means and standard deviations of the nasalance and deviation scores are presented for each of three different vowels. The Chinese speakers were found to have significantly higher nasalance scores on the vowels /a/ and /u/.
-
Muscle groups located in and around the vocal tract can produce audible changes in the frequency and/or intensity of the voice. Vocal vibrato is a characteristic feature of the singing of performers trained in the western classical tradition and is generally considered to result from modulation in frequency, amplitude, and timbre. Vocal tremor is also characterized by periodic fluctuations in voice frequency or intensity and is a symptom of neurological diseases such as spasmodic dysphonia and Parkinson's disease. Vocal vibrato and vocal tremor may share many of the same origins and mechanisms in the voice production system. The purpose of this study is to find the acoustic characteristics of the vibrato of singers of Pansori, a Korean traditional song form, and of the vocal tremor of spasmodic dysphonia patients. Twelve Pansori singers and seven spasmodic dysphonia patients participated in this study. Power spectra and real-time spectrograms were used to analyze the acoustic characteristics of Pansori singing and of the patients' voices. The results are as follows. First, the vowel formant differences between Pansori singing and the spasmodic dysphonia patients' voices are higher F1 and F3. Second, the vibrato rate differs between Pansori singing and spasmodic dysphonia patients: 4 to 6 per second and 5 to 6 per second, respectively. The vibrato rate of pitch is 5.7 Hz to 42.4 Hz for Pansori singing and 3.8 Hz to 27.9 Hz for spasmodic dysphonia patients; the vibrato rate of intensity range is 0.07 dB to 8.26 dB for Pansori singing and 0.07 dB to 4.81 dB for spasmodic dysphonia patients.
-
Studies of cry characteristics in newborn infants have aimed to determine whether cry analysis could be successful in the early detection of infants at risk for developmental difficulties. Crying presupposes functioning of the respiratory, laryngeal, and supralaryngeal muscles. The nervous system controls the capacity, stability, and coordination of the movements of these muscles. Hence, the cry provides information about how the nervous system is functioning. Three patients (Down syndrome, Cornelia de Lange syndrome, patent ductus arteriosus) were assessed with a Computerized Speech Lab (CSL). Tests were chosen to assess fundamental frequency (mean, maximum, and minimum values), melody contour, NHR, and energy. We compared the data from the patients with data from healthy volunteers. Variations in cry characteristics were documented in a number of medical abnormalities.
-
This study was designed to investigate the fast ab/adduction rate of the articulation valves in normal adults. Measurement of the fast ab/adduction rate has traditionally been used for assessment, diagnosis, and therapy in patients with dysarthria, functional articulation disorders, or apraxia of speech. The fast ab/adduction rate reflects documented structural and physiological changes in the central nervous system and in the peripheral components of the oral and speech production mechanism. Fast ab/adduction rates were obtained from 20 normal subjects through repetitions testing vocal function (/ihi/), tongue function (/tʌ/), velopharyngeal function (/m/), and labial function (/pʌ/). The Aerophone II was used for data recording. The findings are as follows: the average fast ab/adduction rates were 6.21 cps for vocal function, 7.42 cps for tongue function, 5.23 cps for velopharyngeal function, and 6.93 cps for labial function. These results provide guidelines for normal diadochokinetic rates. In addition, they can help indicate the severity of disease and evaluate treatment.
-
Submucosal cleft palate is a subtype of cleft palate. Because it is detected late, treatment (for example, surgery or speech therapy) for patients with submucosal cleft palate is usually late. In this study, we aim to find the objective characteristics of patients with submucosal cleft palate by comparing them with normal speakers and with patients with complete cleft palate. The experimental groups were 10 patients with submucosal cleft palate who underwent surgery in our hospital and 10 patients with complete cleft palate, with 10 normal speakers as the control group. The speech material used in this study was 5 simple vowels. Using the CSL program, we evaluated formants and bandwidths. We analyzed the spectral characteristics of the speech signals of the 3 groups before and after the operation.
-
Objective: There is a close relationship between abnormal intraoral structure and functional speech problems. Patients with cleft palate and ankyloglossia are typical examples. An abnormal structure can be surgically repaired toward a normal one. However, ankyloglossia may cause functional limitation, for example a speech disorder, even after adequate surgical treatment, and each individual has a different speech disorder. The objective of this study is to evaluate the speech of children with ankyloglossia and to determine whether ankyloglossia is associated with articulation problems. We also wanted to present criteria for the indication of frenectomy. Study design: The experimental group was composed of 10 children who visited our department of oral and maxillofacial surgery, dental hospital, Chonbuk university, because of ankyloglossia and articulation problems. At the time of the speech test, the average age was 5 years 7 months and the M:F ratio was 4:1. The VPI consonant discrimination degree, PPVT, PCAT, Nasometer II, and Visi-Pitch test results were obtained from each group. Result: There was a significant difference in language development as measured by PPVT. All but 3 members of the experimental group showed retarded language development. The rate of consonant errors was higher for alveolar consonants. Consonant errors in the experimental group occurred mostly in alveolar consonants, and the main type of consonant error was distortion. Conclusion: We can judge the severity of ankyloglossia in a patient by examining the degree of language development and a speech test of alveolar consonants, and we can decide on frenulotomy using these results.
-
Anomia, or word-finding difficulty, is one of the most common features of aphasia. Previous studies support the view that picture naming consists of three stages, in order: object recognition, semantic, and phonological output. Anomic patients show many symptoms, which means that anomia can be subdivided into several symptom groups. Our anomia assessment battery consists of several parts: (1) a picture naming set, (2) a picture-word matching task, (3) a lexical decision task for mental lexicon damage, (4) a naming task for phonological lexicon damage, and (5) a semantic decision task. Pictures and words were selected on the basis of usage frequency, semantic category, and word length. We administered this battery to many anomic aphasics and subdivided the patients into several groups. We hope that our anomia evaluation set is useful for evaluating anomic aphasics.
-
The present study was performed to investigate acoustically the voices of normal Chinese adults. Sixty normal Chinese adults (30 males and 30 females) aged 20 to 39 years produced the sustained vowel /a/, and by analyzing the recordings acoustically with Dr. Speech we obtained the fundamental frequency (F0), jitter, shimmer, and NNE. On average, male voices showed 118.1 Hz in F0, 0.186% in jitter, 1.12% in shimmer, and -13.7 dB in NNE, and female voices showed 252.4 Hz in F0, 0.186% in jitter, 0.81% in shimmer, and -11.3 dB in NNE. No parameter except F0 showed a significant difference between male and female voices.
-
This study analyzes Korean students' pronunciation errors in stop-nasal sequences in English. For the study, 23 English words of stop-nasal sequences were pronounced by 4 natives and 21 Korean students. The results showed two kinds of pronunciation errors: the insertion of a vowel and the nasal assimilation between stops and nasals. A teaching material was designed based on the errors. After a 60-minute class with the material they were asked to pronounce the same words in another session. There was a substantial improvement in the error correction.
-
The purpose of this study was to examine the syllable structures of ten Korean numeric sounds produced by ten subjects of the same age. Each sound was normalized and divided into onset, vowel, and coda sections. Then, acoustical measurements of each syllable were done to compare the ten sounds. Results showed that there was not much deviation from the grand average duration and intensity for the majority of the sounds except the two diphthongal sounds on which their boundary points varied among the speakers. Some syllable boundaries were quite obvious while others were ambiguous. There seemed some tradeoff among the syllable components depending on their acoustic features.
-
The lexical decision task (LDT) is commonly assumed to involve activation of the semantic level. However, there are few studies of the feedback effect from the semantic level. The purpose of the present study is to investigate whether the feedback effect from the semantic level is facilitatory or inhibitory in the Korean LDT. In Experiment 1, we manipulated the number of phonological syllable neighbors (PSN) and the number of semantic neighbors (SEN) orthogonally while the orthographic syllable neighbor (OSN) was dense. A significant facilitatory effect was found for words with many SENs. In Experiment 2, we examined the same conditions as in Experiment 1, but with a sparse OSN. Although a similar pattern of lexical decision latencies was shown, it was not statistically significant. These results can be explained by feedback activation from the semantic level. If a target has many SENs and many PSNs, it receives more feedback activation from the semantic level than a target with few SENs and PSNs.
-
This study examined the acoustic characteristics of women divers' Soombijil sound. A total of 18 women divers participated in this study. Acoustic analysis was performed with Praat. Soombijil sounds were classified into three types according to pitch variation in the beginning, middle, and ending parts. Type I showed an increasing-decreasing-flat shape. Type II was identified by a flat-flat-increasing shape. Type III showed an increasing-decreasing-increasing shape. The mean duration of the Soombijil sound was 1.48 sec. The frequency range was 1591.54 to 4477.13 Hz. FFT analysis showed that frequencies were concentrated at 500 to 2000 Hz. Types I and II showed two peaks, at 500 Hz and at 1500 to 2000 Hz. Type III had one peak below 500 Hz.
-
The present study was carried out to investigate how two languages are represented and processed in late Korean-English bilinguals. To this end, we compared the naming times of Korean-English bilinguals on a series of picture-word interference tasks. The experiment was divided into four parts, each of which required participants to name pictures in Korean or in English with distractor words visually presented in either Korean or English. The distractor words were semantically related or unrelated to the picture. The results showed that, in the different-language conditions (L1 naming with L2 distractor, L2 naming with L1 distractor), there was only a numerical difference between the semantically related and unrelated conditions. In the same-language conditions (L1 naming with L1 distractor, L2 naming with L2 distractor), however, a significant semantic interference effect occurred. The interference effect was stronger in the L1 distractor condition than in the L2 distractor condition. These results suggest that the semantic processing of L1 and of L2 in late bilinguals is independent of each other.
-
This study analyzes Korean high school students' pronunciation errors in word-initial onglides in English. For this study, 24 Korean high school students read, in a frame sentence, 34 English words comprising words with word-initial glide-vowel sequences and vowel-initial words. The results showed 2 different error types: glide deletion and vowel distortion. After the analysis of the first recording, the subjects were taught how to pronounce glide-vowel sequences properly in a 60-minute class. Comparison of the analyses of the first and second recordings showed that the subjects' pronunciation of glide-vowel sequences improved. After the training, pronunciation errors in [jɪ], a diphthong unique to English, decreased substantially. However, most subjects still had difficulty pronouncing [wʊ], [wu], and [wo]. There was no significant correlation between English course grade and error reduction. -
The purpose of this paper is to show that the current notation of Japanese proper names in Korean has some problems: it cannot represent the distinction between voiced and voiceless sounds. A further purpose is to propose a more accurate notation that is coherent and efficient. After introducing some general background on the phonemes of Japanese, I measured the voice onset time (VOT) of the stops [k, t] at the beginning, in the middle, and at the end of a word, and compared the spectrograms of affricates with those of fricatives. In conclusion, Japanese voiceless [k, t, tʃ] should be written as [ㅋ, ㅌ, ㅊ], voiced [g, d, dʒ] as [ㄱ, ㄷ, ㅈ], and the affricate [ts] as [ㅊ] in Korean. -
The purpose of this paper is to identify the characteristics of boundary tones in Korean emotional speech. I mainly focus on the patterns and f0 values of boundary tones and on the f0 values of utterance-final phrases.
-
The purpose of this study is to analyze the prosodic characteristics of Korean news utterances. Prosodic phrases were described in terms of the K-ToBI labeling system. In addition, the change of the intonation contour over the course of a sentence was discussed with respect to media type and gender. An analysis of the resets showed that 331 out of 729 resets occurred at intonation phrase boundaries. This means that resets depend on the speaker's own volition rather than on the prosodic unit of the intonation phrase. The declination of the intonation contour was gentler in radio news than in TV news, because declination becomes slower as sentences get longer.
-
The purpose of this study was to investigate and quantitatively describe the acoustic characteristics of current Korean monophthongs. Recordings were made of 33 men and 27 women producing the vowels /i, e, ɛ, a, (not representable), O, u, (not representable)/ in a carrier phrase "This character is _." A listening test was conducted in which 19 participants judged each vowel. F1, F2, and F3 were measured for the vowels judged as the intended vowel by more than 17 listeners. The analysis of the formant data shows some interesting results, including clear confirmation of a 7-vowel system in current Korean. -
As the Internet has become prevalent in daily life, harmful content on the Internet has been increasing and has become a very serious problem. Pornographic video in particular is harmful to children. To block such content, many filtering systems have been developed based on keyword-based or image-based methods. The main purpose of this paper is to devise a system that classifies pornographic videos based on audio information. As the feature vector we use MFCC together with Mel-Cepstrum Modulation Energy (MCME), which is the modulation energy calculated on the time trajectory of the Mel-frequency cepstral coefficients (MFCC), and as the classifier we use a Gaussian Mixture Model (GMM). In the experiments, the proposed system correctly classified 97.5% of the pornographic data and 99.5% of the non-pornographic data. We expect that the proposed method can be used as a component of a more accurate classification system that uses video and audio information simultaneously.
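The idea of a modulation energy computed on a cepstral time trajectory can be illustrated with a minimal Python sketch. The abstract does not give the exact definition, so this sketch assumes that the modulation energy is the power of the mean-removed trajectory of one cepstral coefficient, summed over the DFT bins that fall inside a chosen modulation-frequency band. The function name `modulation_energy`, the frame rate, and the band limits are hypothetical.

```python
import cmath
import math


def modulation_energy(trajectory, frame_rate_hz, band_hz):
    """Energy of one cepstral-coefficient trajectory inside a modulation band.

    trajectory    : values of a single cepstral coefficient, one per frame
    frame_rate_hz : number of frames per second (assumed, e.g. 100)
    band_hz       : (low, high) modulation-frequency limits in Hz (assumed)
    """
    n = len(trajectory)
    mean = sum(trajectory) / n
    centered = [x - mean for x in trajectory]  # remove the static (DC) part
    low, high = band_hz
    energy = 0.0
    for k in range(1, n // 2 + 1):  # positive-frequency DFT bins only
        freq = k * frame_rate_hz / n
        if low <= freq <= high:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(centered)
            )
            energy += abs(coeff) ** 2 / n
    return energy
```

In a system of the kind described, one such value per cepstral coefficient and per band could be appended to the MFCC vector before training the classifier; how the features are actually combined is not stated in the abstract.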
-
Stroke causes several physical deficits. Dysarthria is one of the most difficult problems in conventional medicine because of the weakness of neuromotor control. The purpose of this study is to find the acoustic characteristics of the effects of acupuncture therapy on post-stroke dysarthria. Seven patients with stroke (infarction or hemorrhage) were selected by CT or MR imaging. The authors applied acupuncture therapy for 4 weeks by inserting needles into 8 acupuncture points: ipsilateral ST4 and ST6 and contralateral LI4 and ST36 with respect to the side of facial palsy, CV23, CV24, and bilateral "Sheyu". Speech samples were composed of the five simple vowels /a, e, i, o, u/ and meaningless polysyllabic words of the form CVCVC (C: stop, affricate, or fricative; V: /e/). VOT, the total duration of each speech sample, and vowel formants (F1 and F2) were analyzed on spectrograms. The results are as follows: 1. VOT of bilabial and velar stops decreased after treatment. The difference in VOT of the glottalized bilabial stop between pre- and post-treatment was statistically significant (p < 0.05). 2. Total duration of the polysyllabic words decreased after treatment. The decrease in total duration for words containing bilabials was statistically significant (p < 0.05). 3. The difference in the first formant of the rounded vowel /o/ between pre- and post-treatment was statistically significant (p < 0.05).
-
The purpose of this study is to observe how Korean learners with low (KL) and high (KH) English proficiency realize English rhythm with respect to the relative temporal stability, or temporal constraint, of syllables. A speech cycling task, in which a short phrase is repeated in time with a series of beeps at a constant interval, was used to examine the temporal distribution of stressed beats.
-
This study outlines a small-footprint dialog-style ETRI Korean TTS system that applies HMM-based speech synthesis techniques. To build the VoiceFont, 500 dialog-style sentences were used to train the HMMs. Context information about phonemes, syllables, words, phrases, and sentences was extracted fully automatically to build context-dependent HMMs. In training the acoustic model, acoustic features such as mel-cepstra and log F0, together with their delta and delta-delta, were used. The size of the VoiceFont built through the training is 0.93Mb. The developed HMM-based TTS system was installed on an ARM720T processor operating at a 60 MHz clock. To reduce computation time, the MLSA inverse filtering module was implemented in assembly language. The fully implemented system runs 1.73 times faster than real time.
-
It is reported that orthognathic patients suffer not only from aesthetic problems but also from resonance and articulation disorders because of abnormalities of the oral cavity. These disorders affect the patients' communication and usually need to be corrected by orthognathic surgery. The speech of orthognathic patients is affected by changes in the capacity and structure of the oral cavity after surgery. This study was designed to investigate the acoustic characteristics of speech, in terms of nasal resonance and speech intelligibility, in patients before and after orthognathic surgery.
-
We investigate whether word frequency effects occur in native speakers' production of homophones, such that less frequent words are produced with greater duration and F0 than more frequent words. Acoustic analyses of homophone data produced by four speakers reveal a tendency for vowels in less frequent words to be longer than those in more frequent words, and statistical tests confirm that the differences are significant. On the other hand, no considerable correlation was found between F0 and word frequency.
-
In the literature on the tense consonants of Korean, it has been proposed either that a tense consonant is underlyingly represented by a single consonant (the singleton hypothesis) or that it is represented by a sequence of two lenis consonants (the geminate hypothesis). One piece of empirical evidence supporting the geminate hypothesis is that the closure duration of tense consonants in intervocalic position is more than twice as long as that of their lenis counterparts. In this paper, we report on the closure duration of three types of plosives in various phonotactically permitted contexts in Korean. The measurements show that the duration of the tense consonants in post-sonorant contexts is reduced by a third compared with that in intervocalic position. These temporal differences suggest that measuring closure durations in intervocalic position alone is not sufficient to sustain the geminate hypothesis.
-
The purpose of this paper is to examine the viability of simulating one dialect with the speech segments of another dialect through prosody cloning. The hypothesis is that, among Korean regional dialects, it is not the segmental differences but the prosodic differences that play the major role in the perception of an authentic dialect. This work intends to support the hypothesis by simulating the Masan dialect with speech segments from the Seoul dialect. The dialect simulation was performed by transplanting the prosodic features of Masan utterances onto the same utterances produced by a Seoul speaker. Thus, the simulated Masan utterances were composed of Seoul speech segments, but their prosody came from the original Masan utterances. The prosodic features involved were the fundamental frequency contour, the segmental durations, and the intensity contour. The simulated Masan utterances were evaluated by four native Masan speakers, and the role of prosody in dialect authentication and speech synthesis was discussed.
-
The goal of this paper is to investigate the effects of three factors, namely phrasal accent (accented vs. unaccented), prosodic boundary (IP-initial vs. IP-medial), and coda voicing (e.g., bed vs. bet), on the acoustic realization of English vowels (/i, ɪ/, /ɛ, æ/) as produced by native (Canadian) and nonnative (Korean) speakers. The speech corpus included 16 minimal pairs (e.g., bet-bat, bet-bed) embedded in a sentence. Results show that the phonological contrast between vowels is maximized when they are accented, though the contrast maximization pattern was not the same for the English and Korean speakers. However, domain-initial position does not affect the phonetic manifestation of vowels. Results also show that the phonological contrast due to coda voicing is maximized only when the vowels are accented. These results suggest that the phonetic realization of vowels is affected by phrasal accent only, and not by the location within the prosodic domain. -
This paper proposes an intonation labeling method using Momel and presents the results of applying the proposed method to a speech corpus consisting of 80 passages read by 4 speakers (2 male and 2 female). The results show that Momel works well enough to derive meaningful pitch targets, which can be labeled with H and L tones. In addition, the results of the analysis of the Korean speech corpus are consistent with earlier work.
-
The affricates of Korean were alveolar sounds in the 15th century. They have changed to post-alveolar or alveo-palatal sounds since the 18th century, at least in Southern Korean. These days, an advanced articulation of the affricates is observed, especially in the speech of the younger generation. The aim of this paper is to show, by means of cut-off frequencies, the differences between affricates pronounced in the alveo-palatal position and those pronounced in a more advanced position. We recorded the speech of freshmen (in their early twenties) at Seoul National University. The result was that the cut-off frequency of the articulations judged auditorily as advanced was higher than that of the others. In particular, we found that women have a tendency to advance the place of articulation of the affricates.
-
The dialogic reading program is designed to involve children actively during shared reading and to provide a rich avenue for language development. The present study examines the effects of a modified dialogic reading program on parent-child interactions in parents of children with developmental language delays. Six children with developmental language delays and their parents participated. The 4-week program was composed of three group sessions and one individual feedback session. Parent-child interactions were videotaped before and after the program. As a result, all six parents showed an increase in positive behaviors during the interaction after completing the program, and negative behaviors partly decreased. These results are discussed in the conclusions.
-
This study was designed to compare the self-rating scales (SSS, S-24, P-FA, and PSI) translated into Korean in adults with stuttering. Eighteen adults with stuttering participated. Each scale was divided into two sub-categories, avoidance and locus of control. The correlations among the scales and among the sub-categories were evaluated. Objective stuttering severity and self-rated stuttering severity were compared. Results indicated that the scales were significantly correlated. The total scores of the scales and the scores of each sub-category were also significantly correlated. Neither the total scores nor subjective stuttering severity differed significantly from objective stuttering severity. The self-rating scales for adults with stuttering currently used in clinics and research in Korea are suitable tools with which adults with stuttering can subjectively evaluate the characteristics of, and their attitudes toward, stuttering.
-
Five male trained singers (age: 25.0 ± 1.4 years, career: 6.8 ± 1.1 years) and five female trained singers (age: 22.0 ± 1.0 years, career: 5.8 ± 1.2 years) participated in this study. SaO2 (oxyhemoglobin saturation), measured with an Oxy-Pulse meter, and PACO2 (alveolar CO2 pressure), measured with a Quick et CO2, were compared before and after vocal training. As a result, PACO2 was lower than the normal range (36-40 mmHg) after vocal training, leading to hypocapnia. This causes headache and dizziness. -
This paper reports a comparative acoustic analysis of speech produced by two Korean groups, normal speakers and speakers with AOS, focusing on utterances with V-CV structure. Major concerns include: 1) types of errors (distortion/substitution) according to the place of articulation, 2) duration of each syllable, 3) VOTs of stop sounds, and 4) F1 and F2 of vowels. Based on the differences in these phonetic characteristics between the two groups, we aim to clarify some characteristics of AOS and to provide fundamental criteria for diagnosing and evaluating the disorder.
-
The purpose of this study was to evaluate the effects of multisensory (AVK: Auditory, Visual, and Kinesthetic) treatment on reading pronunciation involving phonological processes (tensification, palatalization, and lateralization) in middle school students with delayed language development caused by mental retardation. Participants were three children with reading pronunciation difficulties in phonological processing. The conclusions were as follows. First, the three children improved in tensification, palatalization, and lateralization through the multisensory treatment program. Second, multisensory treatment was effective in facilitating generalization; the three children showed prominent generalization effects in lateralization. Third, three weeks after the end of the intervention, they partially maintained the performance rates of the later phase of the intervention for reading with phonological processing.
-
The purpose of this study is 1) to describe the phoneme inventories of children with cochlear implants (CI) and 2) to describe their utterances using narrow phonetic transcription. All the subjects had more than 2 years of experience with a CI and showed open-set sentence perception of more than 87%. Average consonant accuracy was 81.36%, and it rose to 87.41% when distortion errors were not counted. The children showed error patterns different from those of hearing aid users. The prominent error pattern was weakening of consonants.
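The two consonant accuracy figures differ only in how distortions are scored, which a small Python sketch can make concrete. It assumes that "not counted" means distortions are treated as correct productions (the abstract does not say whether they are instead dropped from the denominator); the function name and the label strings are hypothetical.

```python
def consonant_accuracy(judgments, count_distortions_as_errors=True):
    """Percentage of target consonants judged correct.

    judgments: one label per target consonant, e.g. "correct",
               "distortion", "substitution", or "omission" (assumed labels).
    """
    correct = 0
    for label in judgments:
        if label == "correct":
            correct += 1
        elif label == "distortion" and not count_distortions_as_errors:
            # Lenient scoring: a distorted but recognizable consonant passes.
            correct += 1
    return 100.0 * correct / len(judgments)
```

For ten target consonants with eight correct, one distorted, and one substituted, the strict score is 80.0 and the lenient score is 90.0, which mirrors the direction of the difference reported above.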
-
This experimental study aims to find the correlation between semantic predictability and pitch-accent realization. For the experiment, we classified predictability into three degrees: unpredictable, implicitly predictable, and explicitly predictable. Each degree was then divided into two subcategories: adverbs/adverbial phrases of time or place, and items other than adverbs/adverbial phrases of time or place. The materials were 9 sentences for each subcategory. One male and one female native speaker of English participated in the experiment. Their read speech was recorded on Digital Audio Tape and analyzed with the Pitchworks program. The results show that the ratio of pitch-accented items is roughly inversely related to the degree of predictability.
-
The primary goal of this study is to find out how speaking rate influences production and perception across languages. In both the production and the perception experiments, the English native speaker changed production and perception simultaneously; in particular, the production of temporal features changed relatively fast. In contrast, the Chinese and Korean speakers changed their production rather than their perception in following the speaking rate.
-
The aim of this study is to clarify the degree of difficulty that Japanese L2 (second language) learners have in learning Korean sounds and phonological rules. A total of 31 subjects completed a questionnaire survey and a word identification test. In addition, each subject's pronunciation was evaluated by 3 Korean native speakers. As for Korean sounds, the results show that Japanese L2 learners tend to perceive listening as more difficult than pronouncing, although the listening test scores were higher than the pronunciation test scores for a majority of the items. As for Korean phonological rules, 1) there were some items for which applying the phonological rules was difficult even though the learners had knowledge of them, and 2) there were also some items for which Korean native speakers judged that the phonological rules had been applied in the learners' pronunciation, even though the learners pronounced them without any knowledge of the rules.
-
This study was designed to compare patients' subjective rating scales for voice evaluation translated into Korean (Voice Handicap Index, VHI; Voice-Related Quality of Life, V-RQOL; Voice Rating Score, VRS), collected from 24 professional voice users diagnosed with organic voice disorders. First, the correlations among the scales were examined. Second, the correlations between the patients' subjective rating scales and acoustic measures (Jitter%, Shimmer%, NHR) were examined. Third, the scales were compared with the clinician's objective scale (G in the GRBAS scale). Results indicated significant correlations among the patients' subjective rating scales, and significant correlations of the clinician's rating scale with Jitter% and Shimmer%, but not with NHR. In addition, there were significant correlations of G with VHI and VHI-P (one of the subscales of VHI). However, none of the acoustic measures were correlated with the patients' subjective rating scales.
-
This study proposes a computational model to investigate, as part of language information processing, the roles of phonological and orthographic information in visual word recognition and the type of representation used in the mental lexicon. The computational model reproduced the phonological and orthographic neighborhood effects observed in Korean word recognition, and provided evidence implying that the mental lexicon is represented as phonological information in the process of Korean word recognition.
-
The purpose of this study was to evaluate the speech production ability of congenitally deaf children with cochlear implants. Forty children participated in the study. The results are as follows: (1) the mean speech intelligibility score was 3.05 on a 5-point scale, (2) the mean percentage of correct vowels was 86.19% and the mean percentage of correct consonants was 74.89%, and (3) voice profiles showed that their voices were high-pitched, hypernasal, and breathy, although 12.5% of the children were evaluated as having normal voice quality. The overall speech production abilities of children with cochlear implants were superior to those of deaf children reported in the literature. However, their abilities were not equal to those of children with normal hearing.
-
In this paper, a packet loss concealment (PLC) algorithm for CELP-type speech coders is proposed to improve the quality of decoded speech under burst packet loss conditions. The proposed algorithm is based on the recovery of voiced excitation using an estimate of the voicing probability and on the generation of random excitation by permuting the previously decoded excitation. The voicing probability is estimated from the correlation computed with the previous correctly decoded excitation and pitch. The proposed algorithm is implemented as a PLC algorithm for G.729, and its performance is compared with that of the PLC employed in G.729 by means of perceptual evaluation of speech quality (PESQ) and an A-B preference test under random and burst packet losses at rates of 3% and 5%. It is shown that the proposed algorithm provides better speech quality than the PLC of G.729, especially under burst packet losses.
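The two ingredients named above, a correlation-based voicing probability and a random excitation obtained by permuting past excitation, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' algorithm or the G.729 procedure: the voicing probability is taken to be the normalized correlation between the last two pitch cycles clipped to [0, 1], and the concealed excitation is assumed to be a linear mix of a periodic part and a permuted part. All function names are hypothetical.

```python
import random


def voicing_probability(excitation, pitch_lag):
    """Normalized correlation of the last two pitch cycles, clipped to [0, 1].

    Requires len(excitation) >= 2 * pitch_lag.
    """
    current = excitation[-pitch_lag:]
    previous = excitation[-2 * pitch_lag:-pitch_lag]
    num = sum(a * b for a, b in zip(current, previous))
    den = (sum(a * a for a in current) * sum(b * b for b in previous)) ** 0.5
    if den == 0.0:
        return 0.0
    return max(0.0, min(1.0, num / den))


def conceal_frame(excitation, pitch_lag, frame_len, rng):
    """Build a substitute excitation frame for one lost packet.

    Requires len(excitation) >= max(2 * pitch_lag, frame_len).
    """
    p_voiced = voicing_probability(excitation, pitch_lag)
    last_cycle = excitation[-pitch_lag:]
    # Voiced part: repeat the last pitch cycle.
    periodic = [last_cycle[i % pitch_lag] for i in range(frame_len)]
    # Unvoiced part: a random permutation of recently decoded excitation.
    shuffled = list(excitation[-frame_len:])
    rng.shuffle(shuffled)
    # Assumed mixing rule: weight the two parts by the voicing probability.
    return [p_voiced * v + (1.0 - p_voiced) * u
            for v, u in zip(periodic, shuffled)]
```

With a strongly periodic history the sketch simply continues the last pitch cycle, while with an uncorrelated history it falls back to the permuted excitation, which keeps the energy of the recent signal without imposing a false pitch.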
-
Spoken dialog system development includes many laborious and inefficient tasks. Since a spoken dialog system has many components, such as speech recognition, language understanding, dialog management, and knowledge management, a developer must make an effort to edit the corpus and train each model separately. To reduce the cost of editing the corpus and training each model, a more systematic and efficient working environment is needed. As such an environment, we propose DialogStudio, a spoken dialog system workbench.
-
This paper proposes methods to enhance the speech quality of a source-controlled variable bit-rate coder based on waveform interpolation. The methods estimate and generate, by repetition and extrapolation schemes, the parameters that are not transmitted from the encoder to the decoder. For performance evaluation, PESQ (Perceptual Evaluation of Speech Quality) scores were measured. The experimental results show that the proposed methods outperform the conventional source-controlled variable bit-rate coder. In particular, the extrapolation method performs better than the repetition method.
-
The goal of our research is to build a text-independent speaker identification system that can be used on mobile devices without any additional adaptation process. In this paper, we show that exploiting the advantages of both PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis) can increase performance in this situation. The proposed method reduced the recognition error by a relative 13.5%.
-
Various studies have combined neural networks and hidden Markov models within a single system in the expectation that this may combine the advantages of both. Influenced by these studies, the tandem approach was proposed, which uses a neural network as the classifier and hidden Markov models as the decoder. In this paper, we applied the trend information of segmental features to the tandem architecture and used the posterior probabilities output by the neural network as inputs to the recognition system. Experiments were performed on the Aurora2 database to examine the potential of the trend-feature-based tandem architecture. The proposed method shows better results than the baseline system in very low SNR environments.
-
This paper presents an FSN-based LVCSR system and its application to a speech TV program guide. Unlike the more popular systems based on statistical language models, we used an FSN grammar built with a graph-theory-based FSN optimization algorithm and knowledge-based advanced word boundary modeling. For memory and latency efficiency, we implemented dynamic pruning scheduling based on the histogram of active words and their likelihood distribution. We achieved a 10.7% word accuracy improvement with a 57.3% speedup.
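Pruning driven by a histogram of hypothesis likelihoods can be illustrated with a generic Python sketch. This shows ordinary histogram pruning as used in beam-search decoders, not the dynamic pruning scheduling of the paper, whose details the abstract does not give; the function name, the bin count, and the rule of always keeping the best bin are assumptions.

```python
def histogram_prune(scores, max_active, num_bins=32):
    """Keep roughly the max_active best log-likelihood scores.

    Scores are bucketed by distance from the best score, so the cut-off
    is found in one pass over the bins instead of a full sort.
    """
    if len(scores) <= max_active:
        return list(scores)
    best, worst = max(scores), min(scores)
    if best == worst:
        return list(scores)
    width = (best - worst) / num_bins

    def bin_of(score):
        # Bin 0 holds the scores closest to the best hypothesis.
        return min(int((best - score) / width), num_bins - 1)

    counts = [0] * num_bins
    for s in scores:
        counts[bin_of(s)] += 1

    kept = 0
    cutoff_bin = num_bins
    for idx, count in enumerate(counts):
        if kept + count > max_active:
            cutoff_bin = idx
            break
        kept += count
    cutoff_bin = max(cutoff_bin, 1)  # assumed: never drop the best bin
    return [s for s in scores if bin_of(s) < cutoff_bin]
```

The result is approximate by design: whole bins are kept or dropped, so the number of survivors can be smaller than `max_active`, or larger when the best bin alone is over the limit.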
-
This paper presents a dialogue interface that uses a dialogue management system as a method for controlling home appliances in Home Network Services. To realize this type of dialogue interface, we annotated a dialogue set of 96,000 utterance pairs and developed an example-based dialogue system. This paper also introduces an automatic error correction module for SMS-style sentences. With this module, we increase the accuracy of the NLU (Natural Language Understanding) module. Our NLU module shows an accuracy of 86.2%, an improvement of 5.25% over the baseline. The task completion rate of the proposed SMS dialogue interface was 82%.