Integrated Search | Korea Science

Vocabulary Recognition Post-Processing System using Phoneme Similarity Error Correction

안찬식;오상엽
- Journal of the Korea Society of Computer and Information, Vol. 15, No. 7, pp. 83-90, 2010
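In vocabulary recognition systems, recognition rates degrade because similar phonemes are confused and inaccurately supplied vocabulary leads to misrecognition. When features are extracted from inaccurate vocabulary input, the result is either a misrecognition or recognition as a similar phoneme, and when feature extraction fails, phonemes are recognized as similar phonemes. This paper therefore proposes an error-correction post-processing system for vocabulary recognition that uses phoneme similarity rates based on the characteristics of each phoneme. The phoneme similarity rate was obtained by applying the MFCC and LPC feature extraction methods to each phoneme of training data trained with monophones. Similar phonemes are guided toward recognition as the correct phoneme, which minimizes the misrecognition errors caused by inaccurate vocabulary. The error correction rate was computed from the phoneme similarity rate and the confidence score, and error correction was applied to vocabulary judged erroneous during recognition. In the performance evaluation, compared with a system based on error-pattern learning and a system based on semantics, MFCC and LPC showed recognition improvements of 7.5% and 5.3%, respectively.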
https://doi.org/10.9708/jksci.2010.15.7.083

Segmentation of Continuous Korean Speech Based on Boundaries of Voiced and Unvoiced Sounds

유강주;신욱근
- Transactions of the Korea Information Processing Society, Vol. 7, No. 7, pp. 2246-2253, 2000
In this paper, we show that one can enhance the performance of blind segmentation of phoneme boundaries by adopting knowledge of Korean syllabic structure and of the regions of voiced/unvoiced sounds. The proposed method consists of three processes: extracting candidate phoneme boundaries, detecting the boundaries of voiced/unvoiced sounds, and selecting the final phoneme boundaries. The candidate phoneme boundaries are extracted by a clustering method based on the similarity between two adjacent clusters. The similarity measure employed in this process is the ratio of the probability densities of adjacent clusters. To detect the boundaries of voiced/unvoiced sounds, we first compute the power density spectrum of the speech signal in the 0-400 Hz frequency band. Then the points where the variation of this power density spectrum is greater than a threshold are chosen as the boundaries of voiced/unvoiced sounds. The final phoneme boundaries consist of all the candidate phoneme boundaries in the voiced regions and a limited number of candidate phoneme boundaries in the unvoiced regions. The experimental results showed a decrease of about 40% in the insertion rate compared with the blind segmentation method we adopted.
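
The voiced/unvoiced boundary step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the frame length, the normalization of the variation by the peak frame power, the threshold value, and the function names `low_band_power` and `voicing_boundaries` are all assumptions made for illustration. Only the 0-400 Hz band and the "variation greater than a threshold" rule come from the abstract.

```python
import cmath
import math


def low_band_power(frame, sample_rate, max_hz=400.0):
    """Power of one frame in the 0..max_hz band, using a direct DFT of the low bins."""
    n = len(frame)
    max_bin = int(max_hz * n / sample_rate)  # highest DFT bin at or below max_hz
    power = 0.0
    for k in range(max_bin + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        power += abs(coeff) ** 2
    return power / n


def voicing_boundaries(signal, sample_rate, frame_len=160, threshold=0.5):
    """Sample indices where low-band power jumps between adjacent frames.

    Assumption: the jump is measured relative to the largest frame power,
    so `threshold` is a fraction between 0 and 1.
    """
    starts = range(0, len(signal) - frame_len + 1, frame_len)
    powers = [low_band_power(signal[s:s + frame_len], sample_rate) for s in starts]
    if not powers:
        return []
    peak = max(powers) or 1.0  # avoid dividing by zero on an all-silent signal
    return [j * frame_len
            for j in range(1, len(powers))
            if abs(powers[j] - powers[j - 1]) / peak > threshold]
```

Calling `voicing_boundaries(samples, 8000)` on a list of float samples returns the frame-aligned positions where energy below 400 Hz appears or disappears, which is the cue the abstract uses to separate voiced from unvoiced regions.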

Phoneme Recognition Using Frequency State Neural Network

이준모;황영수;김성종;신인철
- The Journal of the Acoustical Society of Korea, Vol. 13, No. 4, pp. 12-19, 1994
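This paper proposes a neural network that adds the frequency-band structure of phonemes to the conventional TSNN method, which handles only the temporal structure of phonemes. The proposed network was trained on the phonemes (아, 이, 오, ㅅ, ㅊ, ㅍ, ㄱ, ㅇ, ㄹ, ㅁ) and then used for recognition. Taking time and frequency factors as simultaneous inputs, it gave slightly better recognition results than the conventional TDNN and TSNN methods, which recognize phonemes from temporal factors alone.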

Syllable and Phoneme Frequencies in the Spontaneous Speech of 2-5 year-old Korean Children

김민정;배소영;고도흥
- Speech Sciences, Vol. 8, No. 4, pp. 99-107, 2001
The purpose of this study was to investigate the syllable and phoneme frequencies in the spontaneous speech of Korean children. Sixty-four normally developing children aged 2 to 5 were involved (male : female = 1 : 1, 16 children in each age group). Fifty connected utterances were analyzed using KCLA (Korean Computerized Language Analysis) 2.0 and Excel. The findings were as follows: 1) /i/ was the most frequently used syllable and was followed by /yo/, /k/, /s'/, /nen/ and so on. 2) The most frequently used Korean phonemes were the syllable-initial consonant /k/, the syllable-medial vowel /a/ and the syllable-final consonant /n/. 3) All seven syllable-final consonants (/p, t, k, m, n, ŋ, l/) were used more frequently in the word-medial position than in the word-final position. Three syllable-initial consonants (/k, l, s'/) were used more frequently in the word-medial position than in the word-initial position. The syllable and phoneme frequencies in Korean children's spontaneous speech will provide valuable information for interpreting the severity of phonological disorders and for developing tools for Korean phonological assessment and intervention.

A Comparative Study on the Frequency of Allophones, Phonemes and Letters in Korean

이상억
- Speech Sciences, Vol. 8, No. 3, pp. 51-73, 2001
This study starts with an investigation of the frequency of allophones from the narrowly transcribed data of (1) most frequently used 2000 words and (2) some passages of standard Seoul Korean. Consequently this entails the investigation of the frequency of phonemes by adding the number of allophones. These two investigations are conducted for the first time in the study of Korean phonology. Previous studies on the reported 'frequency of phoneme' are in fact studies on the 'frequency of letters' and the critical difference between these two types of studies has yet to be clarified accurately. This paper also reveals the proportional distribution of natural classes among Korean phonemes and letters.

Support Vector Machine Based Phoneme Segmentation for Lip Synch Application

Lee, Kun-Young;Ko, Han-Seok
- Speech Sciences, Vol. 11, No. 2, pp. 193-210, 2004
In this paper, we develop a real-time lip-synch system that drives a 2-D avatar's lip motion in synch with an incoming speech utterance. To achieve real-time operation, we contain the processing time by invoking merge and split procedures that perform coarse-to-fine phoneme classification. At each stage of phoneme classification, we apply the support vector machine (SVM) to reduce the computational load while retaining the desired accuracy. The coarse-to-fine phoneme classification is accomplished via two stages of feature extraction: first, each speech frame is acoustically analyzed for 3 classes of lip opening using Mel Frequency Cepstral Coefficients (MFCC) as a feature; second, the classification of each frame is refined for detailed lip shape using formant information. We implemented the system with 2-D lip animation, which shows the effectiveness of the proposed two-stage procedure in accomplishing a real-time lip-synch task. The method using phoneme merging and SVM was observed to be about twice as fast in recognition as the method employing the Hidden Markov Model (HMM). A typical latency per frame observed for our method was on the order of 18.22 milliseconds, while an HMM method applied under identical conditions took about 30.67 milliseconds.

Research about auto-segmentation via SVM

권호민;한학용;김창근;허강인
- Proceedings of the IEEK Conference, Proceedings of the IEEK 2003 Summer Conference, Vol. IV, pp. 2220-2223, 2003
In this paper, we used Support Vector Machines (SVMs), recently proposed as a learning method in the artificial neural network family, to divide continuous speech into phonemes (initial, medial, and final sounds), and then performed continuous speech recognition on the result. The decision boundary of a phoneme is determined by an algorithm that takes the maximum frequency in a short interval. Recognition is performed with a Continuous Hidden Markov Model (CHMM), and we compared the results with those for phonemes segmented by visual inspection. The experiments confirmed that the proposed SVM method is more effective than Gaussian Mixture Models (GMMs) for initial sounds.

Predictors of Preschoolers' Reading Skills: Analysis by Age Groups and Reading Tasks

최나야;이순형
- Journal of Families and Better Life, Vol. 26, No. 4, pp. 41-54, 2008
The purpose of this study was to investigate predictors concerning preschoolers' ability to read words, in terms of their sub-skills of alphabet knowledge, phonological awareness, and phonological processing. Fourteen literacy sub-tests and three types of reading tasks were administered to 289 kindergartners aged 4 to 6 in Busan. The main results are as follows. Sub-skills that predicted reading ability varied with children's age. Irrespective of children's age groups, knowledge of consonant names and digit naming speed commonly explained the reading of real words. In contrast, skills of syllable deletion and phoneme substitution and knowledge of alphabet composition principles were related to only 4-year-olds' reading skills. Exclusively included was digit memory in predicting 5-year-olds' reading abilities, and knowledge of vowel sounds in 6-year-olds' reading skills. The type of reading task also influenced reading ability. A few common variables such as knowledge of consonant names and vowel sounds, digit naming speed, and phoneme substitution skill explained all types of word reading. Syllable counting skills, however, had predictive value only for the reading of real words. Phoneme insertion skills and digit memory had predictive value for the reading of pseudo words and low frequency letters. Likewise, knowledge of consonant sounds and vowel stroke-adding principles were significant only for the reading of low frequency letters.

Phoneme-Boundary-Detection and Phoneme Recognition Research using Neural Network

임유두;강민구;최영호
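- Proceedings of the Korean Institute of Information and Communication Sciences Conference, Korea Institute of Maritime Information and Communication Sciences 1999 Fall Conference, pp. 224-229, 1999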
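Speech recognition research is proceeding in two directions: building recognition systems based on phoneme-like units, and maximizing the efficiency of word-level recognition systems. Implementing a useful recognition system based on phoneme-like units requires solving two problems: detecting phoneme boundaries and improving the recognition rate for the detected phonemes. Conventional LPC (Linear Predictive Coefficient) methods detect phoneme boundaries by using the Itakura-Saito method to compute the distance between the LPC of reference phoneme data and the LPC of the input speech frame. More recently, methods have been proposed that use MFCC (Mel-Frequency-Cepstrum Coefficient) to detect spectral transitions as phoneme boundaries. All of these methods share the drawback of poor adaptability. To overcome this drawback, this paper proposes a new recognition system that uses auto-correlation for phoneme boundary detection and a highly adaptive multilayer feed-forward neural network for phoneme recognition. The proposed system is more adaptive than existing methods, has the advantage that the feature-extraction and recognition algorithms are independent, and confirms the feasibility of a frame-level phoneme recognition system.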

Thai Phoneme Segmentation using Dual-Band Energy Contour

Ratsameewichai, S.;Theera-Umpon, N.;Vilasdechanon, J.;Uatrongjit, S.;Likit-Anurucks, K.
- Proceedings of the IEEK Conference, IEEK 2002 ITC-CSCC, Vol. 1, pp. 110-112, 2002
In this paper, a new technique for phoneme segmentation of Thai isolated speech is proposed. Based on features of Thai speech, the isolated speech is first divided into low- and high-frequency components using wavelet decomposition. Then the energy contour of each decomposed signal is computed and used to locate phoneme boundaries. To verify the proposed scheme, experiments were performed using 1,000 syllables of data recorded from 10 speakers. The accuracy rates are 96.0, 89.9, 92.7 and 98.9% for the initial consonant, vowel, final consonant and silence, respectively.
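
A dual-band energy contour of the kind this abstract describes can be sketched as follows. The abstract does not name the wavelet or the window size, so the one-level Haar transform, the window of 8 coefficients, and the function names `haar_split`, `energy_contour`, and `dual_band_contours` are assumptions chosen to keep the example short, not the authors' settings.

```python
import math


def haar_split(signal):
    """One-level Haar wavelet decomposition: returns (low, high) coefficients.

    Assumption: Haar is used only because it is the simplest wavelet.
    A trailing odd sample is dropped.
    """
    scale = 1 / math.sqrt(2)
    pairs = range(0, len(signal) - 1, 2)
    low = [(signal[i] + signal[i + 1]) * scale for i in pairs]
    high = [(signal[i] - signal[i + 1]) * scale for i in pairs]
    return low, high


def energy_contour(coeffs, window=8):
    """Sum of squared coefficients over consecutive non-overlapping windows."""
    return [sum(c * c for c in coeffs[i:i + window])
            for i in range(0, len(coeffs) - window + 1, window)]


def dual_band_contours(signal, window=8):
    """Energy contours of the low- and high-frequency bands of `signal`."""
    low, high = haar_split(signal)
    return energy_contour(low, window), energy_contour(high, window)
```

Comparing the two contours over time shows where energy shifts between the bands; a boundary detector would then look for such shifts, though the decision rule itself is not given in the abstract.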

52 search results (processing time: 0.027 s)