Proceedings of the KSPS conference (대한음성학회:학술대회논문집)
The Korean Society Of Phonetic Sciences And Speech Technology
- Semi Annual
Domain
- Linguistics > Linguistics, General
2003.10a
-
ㆍAcoustic characteristics of stops in speech with contextual variability
ㆍPossibility of stop recognition by a post-processing technique
ㆍFurther work: speech database; modification of the decoder; automatic segmentation of acoustic parameters
-
The aim of this paper is to analyze pathological voice by separating the signal into periodic and aperiodic parts. Separation was performed recursively from the residual signal of the voice signal. Based on an initial estimate of the aperiodic part of the spectrum, the aperiodic part is determined by an extrapolation method, and the periodic part is obtained by subtracting the aperiodic part from the original spectrum. A parameter, HNR, is derived from this separation, and its statistics are compared with those of jitter and shimmer for normal, benign, and malignant cases.
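The HNR parameter derived from the separation can be sketched as a simple power ratio of the two parts (a minimal illustration; the recursive separation itself is not reproduced here, and the per-bin power values are placeholders):

```python
import math

def hnr_db(periodic_power, aperiodic_power):
    """Harmonics-to-noise ratio in dB from separated spectral parts.

    Each argument is a list of per-bin power values obtained after the
    periodic/aperiodic decomposition of the spectrum.
    """
    harmonic = sum(periodic_power)
    noise = sum(aperiodic_power)
    return 10.0 * math.log10(harmonic / noise)

# A voice whose periodic part carries 10x the aperiodic power has HNR = 10 dB.
print(hnr_db([10.0, 10.0], [1.0, 1.0]))
```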
-
This paper proposes an automatic pronunciation correction system which provides users with correction guidelines for each pronunciation error. For this purpose, we develop an HMM speech recognizer which automatically classifies the pronunciation errors Koreans make when speaking a foreign language. We also collect a speech database of native and non-native speakers using phonetically balanced word lists, and analyze the types of mispronunciation found in an automatic mispronunciation detection experiment with the recognizer.
-
Two experiments were conducted to support the suggestion in Wonil Choi & Kichun Nam (2003) that the same information processing is used in both input modalities, visual and auditory. A primed lexical decision task was performed with pseudoword prime stimuli. No priming effect occurred in any experimental condition. This result might be interpreted as visual facilitative information and phonological inhibitory information cancelling each other out.
-
This paper presents preliminary results on automatic pronunciation scoring for non-native English speakers and describes the development of an English speech recognizer for educational and evaluation purposes. The proposed recognizer, featuring two refined acoustic model sets, implements noise-robust data compensation, phonetic alignment, highly reliable rejection, keyword and phrase detection, an easy-to-use language modeling toolkit, etc. The developed recognizer achieves an average correlation of 0.725 between the human raters' and the machine's scores, using the YOUTH speech database for training and K-SEC for testing.
-
This study reveals the perceptual role of the stop release burst in Koreans' recognition of POA (place of articulation) and voicing in English word-final stops. Ten Korean subjects participated in a perception experiment whose stimuli were prepared on the basis of the amount of acoustic information, including the release burst. The results show that i) the release burst plays an important role in the recognition of POA, in the order of velar, alveolar, and bilabial stops, and ii) the release burst enhances the correct recognition of voiceless stops more than that of voiced stops. This leads us to conclude that the role of the stop release burst differs with respect to the POA and voicing of the stops, which is possibly related to the different intensity of release for each voicing category and POA.
-
The Korean diphthong /je/ is generally realized as the monophthong /e/ or the neutralized /E/ in real speech, and its realization depends on the preceding consonant and the syllable position. When a preceding consonant exists, /je/ is realized as /je/; when there is no preceding consonant, /je/ varies. In the second syllable, /je/ is realized as the monophthong /e/, whereas in the first syllable it is realized as the diphthong /je/.
-
The purpose of this paper was to analyze the effects of nasalization on vowels. Ten males and seven females produced five vowels (/a/, /e/, /i/, /o/, /u/) in two conditions: normal and nasalized. We compared the formants of normal vowels with those of nasalized vowels and examined the nasal formants in the nasalized vowels. The results were as follows: First, there was a significant difference between normal and nasalized vowels in terms of F1 and F2. Second, nasal formants were observed in nasalized vowels more frequently in females than in males. Third, N1 appeared to influence the F1 of vowels, whereas N2 seemed to have an impact on F2 and/or F3.
-
In this paper, we introduce a new method for describing the annotation information of a speech database. As a structured description method, an XML-based description standardized by the W3C is applied to represent the metadata of the speech database. It will be continuously revised through the speech technology standard forum during this year.
-
This paper shows experimentally that extending the concept of the multiple-pronunciation dictionary used in conversational continuous speech recognition, so that it covers the irregular pronunciation variations that frequently appear in conversational utterances, improves recognition performance. Pronunciation variations frequent in conversational speech, such as phoneme contraction and deletion, typical mispronunciations, and voicing changes, reduce the efficiency of the language model, increase the vocabulary size, degrade recognition performance, and leave the recognizer's output in a non-normalized form. When these variations are incorporated into the pronunciation dictionary, each is treated as a variant pronunciation of its representative word, and the language model and lexicon are built from the representative words only. The decoder also searches the phone sequences of the variant pronunciations, but it consults the language model through the representative word and outputs the representative word as the recognition result, thereby improving recognition performance and yielding a normalized output pattern. In this study, pronunciation variations were incorporated into dictionaries at the pseudo-morpheme [2] level as well as the eojeol (word-phrase) level. Experiments showed an ERR improvement of 10.9% with the eojeol-level multiple-pronunciation dictionary and 4.3% with the pseudo-morpheme-level dictionary.
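The representative-word scheme can be illustrated with a toy dictionary (the words and phone strings below are hypothetical placeholders, not entries from the paper's eojeol or pseudo-morpheme lexicons):

```python
# Each representative word maps to all of its pronunciation variants;
# the language model and lexicon see only the representative word.
lexicon = {
    "geuraeseo": ["k M r E s V", "k M r E s o"],   # hypothetical phone strings
    "mworago":   ["m w V r a k o", "m V r a k o"],
}

def canonical_of(phones, lexicon):
    """Map a decoded phone sequence back to its representative word,
    which is what the decoder reports and scores against the LM."""
    for word, variants in lexicon.items():
        if phones in variants:
            return word
    return None

print(canonical_of("m V r a k o", lexicon))  # the variant maps to "mworago"
```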
-
In this paper, we investigate the factors relevant to the design and implementation of an integrated management system for speech corpora. The purpose is to manage, in one system, the various kinds of speech corpora necessary for speech research, which are constructed in different data formats. In addition, we consider ways to let users search effectively for speech corpora that meet the conditions they want and to add newly constructed corpora with ease. To achieve this, we design a global schema for the integrated management of newly added information without changing existing speech corpora, and we construct a web-based integrated management system, based on this schema, that can be accessed without temporal or spatial restrictions. We then show the steps by which these can be implemented and, examining the system, describe related topics for future study.
-
The paper presents experimental results from a study of the characteristics of Korean lexical processing and the representation of the morphemes involved in Korean noun and verb Eojeols. The investigation is also related to the 'English past tense debate', which concerns human mental computation. Experiments using fMRI show that Korean noun Eojeols and both regular and irregular verb Eojeols exhibit a similar activation pattern. Thus, the results indicate that morphological processing in Korean noun and verb Eojeols proceeds quite differently from Indo-European morphological processing.
-
The purpose of this study is to examine the regions of the cerebrum that handle lexical and idiomatic ambiguity. The stimulus set consists of two parts, each with 20 sets of sentences; in each part, 10 sets are experimental conditions and the other 10 are control conditions. Each set has two sentences, a 'context' sentence and a 'target' sentence, plus a sentence-verification question to guarantee the patients' concentration on the task. The results, based on 15 patients, showed significant activation in the right frontal lobe of the cerebral cortex for both kinds of ambiguity. This means that the right hemisphere participates in the resolution of ambiguity, and that no region is specialized for lexical or idiomatic ambiguity alone.
-
This study carried out an experimental English pronunciation assessment to examine the relationships between different rater categories. The results show that i) the correlation between Korean raters and native American English raters is high enough (r=.98) to be considered reliable, ii) prior instruction about the assessment rubric and knowledge of English phonetics and phonology exert little influence on the rating scores, and iii) the correlation between the automatic ILT (Interactive Language Tutor) rating, which uses speech recognition technology, and the natives' ratings is stronger than that between the ILT and the Koreans' ratings.
-
The aim of this paper is to investigate the development of inflected words in Korean, based on an analysis of the spontaneous speech of 3- to 8-year-old children. For this purpose, the authors transcribed the spontaneous speech of 10 Korean children at each age and classified the inflected words. The results of the analysis are as follows:
① Among verbs, simple words account for 62%, derived words for 18%, and complex words for 20%; among adjectives, simple words account for 82%, derived words for 7%, and complex words for 11%. ② As the children get older, derived and complex words increase while simple words decrease. ③ The ability of word formation begins to appear at age 4 and seems to be almost complete by age 8.
-
This paper addresses a method of convolutive source separation based on SEONS (Second Order Nonstationary Source Separation) [1], which was originally developed for blind separation of instantaneous mixtures using nonstationarity. To tackle the convolutive BSS problem, we transform it into multiple short-term instantaneous problems in the frequency domain and separate the instantaneous mixtures in every frequency bin. Moreover, we employ an H-infinity filtering technique to reduce the effect of sensor noise. Numerical experiments demonstrate the effectiveness of the proposed approach and compare its performance with existing methods.
-
In this paper, speech quality is improved by removing abrupt noise intervals and then filling the gaps with estimates of the previous speech waveform. The abrupt noise detection signal is the prediction error signal obtained with the LP coefficients of the previous frame, and the noise intervals are estimated using spectral energy. After removing the estimated noise intervals, we applied several waveform substitution techniques, such as zero substitution, previous-frame repetition, pattern matching, and pitch waveform replication. To validate the algorithm, an LPC spectral distortion test and a recognition test were executed, and the results show that speech quality is improved fairly well.
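The detection signal can be sketched as an LPC residual computed with the previous frame's coefficients (a minimal sketch: coefficient estimation and the spectral-energy decision step are omitted, and the signals are made up):

```python
def prediction_error(frame, prev_lpc):
    """Residual of a linear predictor whose coefficients come from the
    previous frame; an abrupt click inflates this error sharply."""
    p = len(prev_lpc)
    errors = []
    for n in range(p, len(frame)):
        predicted = sum(prev_lpc[k] * frame[n - 1 - k] for k in range(p))
        errors.append(frame[n] - predicted)
    return errors

# A first-order predictor tracks a constant signal perfectly...
print(prediction_error([1.0, 1.0, 1.0, 1.0], [1.0]))   # [0.0, 0.0, 0.0]
# ...but a sudden click produces a large residual spike.
print(prediction_error([1.0, 1.0, 9.0, 1.0], [1.0]))   # [0.0, 8.0, -8.0]
```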
-
For speech recognition in noisy environments, this paper proposes a new method that removes noise added to speech by using a time-adaptive threshold, computed frame by frame from standard deviations in the wavelet transform domain. The threshold is set using the standard deviations of the cD1 and cA3 coefficients of the wavelet transform, so that it adapts to changes in the speech. A method for removing the residual noise in silence intervals is also proposed. Experiments confirmed that the proposed method improves on conventional denoising methods based on the wavelet transform and the wavelet packet transform in terms of SNR (signal-to-noise ratio) and MSE (mean squared error).
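The frame-adaptive thresholding idea can be sketched with a one-level Haar DWT and soft thresholding (a simplification: the paper's deeper decomposition and its specific cD1/cA3 statistic are reduced to one detail band, and the scale factor is hypothetical):

```python
import math

def haar_dwt(x):
    """One-level Haar DWT: (approximation, detail) coefficients."""
    a = [(x[2*i] + x[2*i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    d = [(x[2*i] - x[2*i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    return a, d

def frame_threshold(detail, alpha=1.0):
    """Time-adaptive threshold from the standard deviation of this
    frame's detail coefficients (alpha is a made-up tuning factor)."""
    m = sum(detail) / len(detail)
    return alpha * math.sqrt(sum((c - m) ** 2 for c in detail) / len(detail))

def soft_threshold(coeffs, thr):
    """Shrink coefficients toward zero; small (noise-like) ones vanish."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]

print(soft_threshold([2.0, -0.5, 1.5], 1.0))  # small coefficient is zeroed
```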
-
In real-time packetized voice applications, missing frames are a major source of voice quality degradation, so packet loss concealment (PLC) algorithms are needed to guarantee the QoS of VoIP. Still, current speech codecs for VoIP perform poorly under consecutive packet losses. In this paper, we propose a new PLC algorithm for the G.729 codec. Our algorithm works better especially when consecutive packet losses occur, mainly because it adopts an adaptive gain controller that utilizes the number of missing packets, combined with a fixed-codebook vector estimation algorithm and LPC bandwidth expansion.
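The adaptive gain controller can be sketched as an attenuation that grows with the run of consecutive lost packets (the decay factor is a hypothetical illustration; G.729's actual attenuation schedule and the paper's controller differ):

```python
def concealment_gain(consecutive_losses, decay=0.9):
    """Scale factor applied to the concealed excitation: the longer the
    loss run, the more the substituted signal is attenuated, which
    avoids sustaining a stale, possibly wrong, waveform at full level."""
    return decay ** consecutive_losses

# Gains applied over the first four consecutive lost frames.
print([round(concealment_gain(n), 3) for n in range(1, 5)])
```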
-
This study compared the pitch range, speech rate, pauses, intonation types, and boundary tones of four sentences produced by five transgender speakers (male to female) to those of twenty normal adults. Voice analysis was done with Praat (version 4.049). The results indicated differences in pitch range, speech rate, pauses, intonation types, and boundary tones among the three groups (transgender speakers, normal males, and normal females). In particular, the transgender speakers used boundary tones more frequently than the normal adults.
-
The purpose of this study was to characterize the phonetic contrasts of one-syllable words and speech intelligibility in hearing-impaired adults. Seven hearing-impaired subjects participated in this experiment (2 males, 5 females). The test materials were 77 pairs of one-syllable words with phonetic contrasts. The results were as follows: (1) The average intelligibility score (scored accuracy) was highest for onset-feature contrasts. (2) The error percentages (excluding combinations of contrasts) were highest for articulatory-manner contrasts of the onset, tongue-height contrasts of the nucleus, and articulatory-place contrasts of the coda, respectively.
-
The aim of the paper is to analyze prosodic characteristics in apraxia of speech and to establish fundamental resources for the diagnosis of motor speech disorders. The sentences are of two types (declarative and interrogative), with one to three constituents. The stimuli were constructed to assess apraxic speech in both articulation and humming tasks, and features of the speech patterns such as utterance duration and boundary tones were examined. The results of the analysis are as follows: 1) in the interrogative sentences, rising boundary tones appeared only in the humming tasks; 2) utterance duration was relatively shorter in the humming tasks than in speech with articulation.
-
The purpose of this study was to investigate the fundamental frequency (F0), the first to third formants (F1-F3), and the durations of the voice signals of children with hearing impairment. Each subject recorded sustained /i/ and /a/, four VbVs, and four VsVs, which were analyzed with Praat 4.1.6. The results were as follows: First, the F0 of children with hearing impairment was higher than that of normal children. Second, for the vowel /a/, F1, F2, and duration were higher than those of normal children. Third, for the vowel /i/, F1 and duration were higher than those of normal children, while F2 was lower. Therapeutic implications are drawn.
-
The purpose of this study was to provide acoustic data on the voices of laryngectomized patients for more scientific and efficient voice rehabilitation. Phonations of prolonged /a/ by 9 electronic artificial larynx (AL) users, 5 esophageal speech (EP) users, and 2 tracheo-esophageal (TEP) voice users were recorded and analyzed using Multi-Speech. Habitual F0, mean F0, F0 standard deviation, maximum F0, minimum F0, jitter, shimmer, and NHR were compared among the subject groups using t-tests. The EP and TEP groups exhibited higher F0 than the AL group, while the AL and TEP groups showed more stable F0 than the EP group. In addition, the quality of the TEP and EP voices was comparatively better in terms of jitter, shimmer, and NHR.
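For reference, local jitter and shimmer can both be computed as a relative mean consecutive perturbation, applied to pitch periods or peak amplitudes respectively (a minimal sketch of the standard definitions; Multi-Speech's exact formulas may differ, and the period values are made up):

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive values, relative to
    the mean value, in percent. Applied to pitch periods this is local
    jitter; applied to per-cycle peak amplitudes it is local shimmer."""
    diffs = [abs(values[i + 1] - values[i]) for i in range(len(values) - 1)]
    mean_diff = sum(diffs) / len(diffs)
    mean_val = sum(values) / len(values)
    return 100.0 * mean_diff / mean_val

periods = [10.0, 10.2, 9.9, 10.1]   # pitch periods in ms (made up)
print(round(local_perturbation(periods), 2))
```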
-
Speech recognition performance depends on various factors, one of which is the characteristics and placement distance of the microphone used when the speech data are collected. In the present experiment, test speech databases were therefore created according to microphone type and placement distance. Acoustic models were then built from these databases, and each model was assessed with the data to determine recognition performance as a function of microphone type and distance.
-
To improve the recognition performance on Korean connected-digit telephone speech, this paper investigates both the Aurora feature extraction method, which employs a noise-reducing two-stage Wiener filter, and the DWFBA method. CMN and MRTCN are applied to the static features for channel compensation. The telephone digit speech database released by SITEC is used for recognition experiments with the HTK system. The experimental results show that, without channel compensation, the Aurora feature is slightly better than MFCC and DWFBA; with channel compensation included, the Aurora feature is slightly better than DWFBA with MRTCN.
-
In this paper, data-driven temporal filter methods [1] are investigated for robust feature extraction. A principal component analysis technique is applied to the time trajectories of the feature sequences of the training speech data to obtain appropriate temporal filters. We performed recognition experiments on the Korean connected-digit telephone speech database released by SITEC using the data-driven temporal filters, and discuss the experimental results together with our findings.
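The data-driven filter design can be sketched as extracting the leading principal component of the feature time-trajectories (power iteration stands in for a full PCA here, and the trajectory data are made-up placeholders):

```python
def leading_temporal_filter(trajectories, iters=100):
    """First eigenvector of the covariance of feature time-trajectories,
    found by power iteration; this vector serves as a data-driven
    temporal filter applied along the time axis of the features."""
    n = len(trajectories[0])
    means = [sum(t[j] for t in trajectories) / len(trajectories) for j in range(n)]
    centered = [[t[j] - means[j] for j in range(n)] for t in trajectories]
    cov = [[sum(row[i] * row[j] for row in centered) / len(centered)
            for j in range(n)] for i in range(n)]
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Variance lies entirely along the first axis, so the filter picks it out.
print(leading_temporal_filter([[1.0, 0.0], [-1.0, 0.0], [2.0, 0.0], [-2.0, 0.0]]))
```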
-
Voice input is often required in new application environments such as telephone-based information retrieval, car navigation systems, and user-friendly interfaces, but the low success rate of speech recognition makes it difficult to extend its application to new fields. Popular approaches to increasing recognition accuracy post-process the recognition results, but previous approaches were mainly lexically oriented error correction. We suggest a new semantically oriented approach that corrects both semantic-level and lexical errors and is more accurate, especially for domain-specific speech error correction. Through extensive experiments on a speech-driven in-vehicle telematics information application, we demonstrate the superior performance of our approach and some advantages over previous lexically oriented approaches.
-
This study investigates Korean phonation types in terms of the spectral characteristics of release bursts. In particular, it compares the intensity of the speech signal in the release burst and the center of gravity and skewness of the release-burst spectra across phonation types for the Korean alveolar plosives. The results showed no significant difference in intensity, center of gravity, or skewness across phonation types, but a significant difference across speakers.
-
The present thesis examines the correlation of VOT and F0 in the three-way distinction of Korean obstruents through production and perception tests. In the production test, one female native speaker of Korean with a Seoul dialect (the author) recorded 15 repetitions of a monosyllabic word list including /ka, kha, k*a, pa, pha, p*a, ta, tha, t*a, ca, cha, c*a/ in random order. The VOT and the F0 of the following vowels were measured; the result was significant for the three-way distinction, with a strong correlation between VOT and F0, and no overlap among the domains was observed in the VOT-F0 plot. For the perception test, I manipulated the data recorded in the production test, raising or lowering their F0 values, and 14 subjects (seven males and seven females) participated in an identification test. The results were as follows: the fortis stimuli were not influenced by F0 changes, and the VOT and F0 values at the lenis-aspirated boundary were negatively correlated. From these results I conclude the following: 1) VOT and F0 can distinguish the three domains of Korean obstruents without overlap; 2) fortis perception does not need F0 as an acoustic cue; and 3) VOT and F0 in the lenis-aspirated distinction stand in a phonetic trading relation [2].
-
The purpose of this study was to provide preliminary data on the acoustic differences in one-syllable words spoken by speakers with different language backgrounds. Twenty native speakers of Korean and of English were asked to read 7 one-syllable words written in their native language; the phonetic and phonemic characteristics of the 7 words were similar between the two languages. The ratios of the durations of the body (onset+nucleus) and the rhyme (nucleus+coda) to the duration of each syllable were calculated using CSL (Computerized Speech Laboratory). The results correspond to the body-coda structure of the Korean syllable, which is supported by recent experimental psychological studies. More acoustic studies of the Korean syllable structure are required to establish a clinical foundation for phonological awareness and reading intervention programs.
-
To produce synthesized speech of high quality (intelligibility and naturalness), accurate grapheme-to-phoneme conversion and an accurate prosody model are essential. In this paper, we analyze Chinese texts using segmentation, POS tagging, and unknown-word recognition. We present a grapheme-to-phoneme conversion that uses dictionary-based and rule-based methods, and we construct a prosody model using a probabilistic method with decision-tree-based error correction. Based on the results of this analysis, we can successfully select and concatenate the exact syllable synthesis units from the Chinese synthesis DB.
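The dictionary-plus-rule G2P strategy can be sketched as a lookup with a per-character rule fallback (the entries below are hypothetical placeholders, not the paper's actual Chinese data):

```python
def grapheme_to_phoneme(word, dictionary, char_rules):
    """Return the word's phoneme string: dictionary lookup first, with
    per-character rules as the fallback for out-of-vocabulary words."""
    if word in dictionary:
        return dictionary[word]
    return " ".join(char_rules.get(ch, "?") for ch in word)

dictionary = {"AB": "pa1 po2"}                     # hypothetical word entry
char_rules = {"A": "pa1", "B": "po4", "C": "ma3"}  # hypothetical rules

print(grapheme_to_phoneme("AB", dictionary, char_rules))  # dictionary wins
print(grapheme_to_phoneme("AC", dictionary, char_rules))  # rule fallback
```

The dictionary entry deliberately disagrees with the per-character rules for "B", illustrating why word-level entries must take precedence over rules.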
-
This paper explores the perceptual relevance of the acoustic correlates of emotional speech using a formant synthesizer, focusing on the roles of mean pitch, pitch range, speech rate, and phonation type in synthesizing emotional speech. The results back up the traditional impressionistic observations, but suggest that some phonation types should be synthesized with further refinement.
-
In this paper, we construct a phonetic GMM for a text-independent speaker identification system. The basic idea is to combine the advantages of the baseline GMM and the HMM: the GMM is better suited to text-independent speaker identification, while HMMs work better in text-dependent systems. The phonetic GMM represents a more sophisticated text-dependent speaker model built on a text-independent one. In speaker identification, the phonetic GMM, using HMM-based speaker-independent phoneme recognition, performs better than the baseline GMM. In addition, an N-best recognition algorithm is used to decrease the computational complexity and to make the system applicable to new speakers.
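The phonetic-GMM idea, per-phoneme speaker GMMs scored over frames that a phoneme recognizer has labeled, can be sketched in one dimension (all model values below are made-up placeholders, and real systems use multivariate mixtures):

```python
import math

def gauss_logpdf(x, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one frame under a (here 1-D) Gaussian mixture."""
    return math.log(sum(w * math.exp(gauss_logpdf(x, m, v))
                        for w, m, v in zip(weights, means, variances)))

def phonetic_gmm_score(frames, phone_labels, speaker_models):
    """Total log-likelihood of a speaker: each frame is scored by the GMM
    of the phoneme that the recognizer assigned to that frame."""
    return sum(gmm_loglik(x, *speaker_models[ph])
               for x, ph in zip(frames, phone_labels))

# One single-component "GMM" per phoneme: (weights, means, variances).
models = {"a": ([1.0], [0.0], [1.0]), "i": ([1.0], [2.0], [1.0])}
print(round(phonetic_gmm_score([0.1, 1.9], ["a", "i"], models), 3))
```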