• Title/Summary/Keyword: Large vocabulary continuous speech recognition

Search Result 36, Processing Time 0.026 seconds

A Corpus Selection Based Approach to Language Modeling for Large Vocabulary Continuous Speech Recognition (대용량 연속 음성 인식 시스템에서의 코퍼스 선별 방법에 의한 언어모델 설계)

  • Oh, Yoo-Rhee;Yoon, Jae-Sam;kim, Hong-Kook
    • Proceedings of the KSPS conference
    • /
    • 2005.11a
    • /
    • pp.103-106
    • /
    • 2005
  • In this paper, we propose a language modeling approach to improve the performance of a large vocabulary continuous speech recognition system. The proposed approach is based on the active learning framework that helps to select a text corpus from a plenty amount of text data required for language modeling. The perplexity is used as a measure for the corpus selection in the active learning. From the recognition experiments on the task of continuous Korean speech, the speech recognition system employing the language model by the proposed language modeling approach reduces the word error rate by about 6.6 % with less computational complexity than that using a language model constructed with randomly selected texts.

  • PDF

Study on Efficient Generation of Dictionary for Korean Vocabulary Recognition (한국어 음성인식을 위한 효율적인 사전 구성에 관한 연구)

  • Lee Sang-Bok;Choi Dae-Lim;Kim Chong-Kyo
    • Proceedings of the KSPS conference
    • /
    • 2002.11a
    • /
    • pp.41-44
    • /
    • 2002
  • This paper is related to the enhancement of speech recognition rate using enhanced pronunciation dictionary. Modern large vocabulary, continuous speech recognition systems have pronunciation dictionaries. A pronunciation dictionary provides pronunciation information for each word in the vocabulary in phonemic units, which are modeled in detail by the acoustic models. But in most speech recognition system based on Hidden Markov Model, actual pronunciation variations are disregarded. Without the pronunciation variations in the speech recognition system, the phonetic transcriptions in the dictionary do not match the actual occurrences in the database. In this paper, we proposed the unvoiced rule of semivowel in allophone rules to pronunciation dictionary. Experimental results on speech recognition system give higher performance than existing pronunciation dictionaries.

  • PDF

Modeling Cross-morpheme Pronunciation Variations for Korean Large Vocabulary Continuous Speech Recognition (한국어 연속음성인식 시스템 구현을 위한 형태소 단위의 발음 변화 모델링)

  • Chung Minhwa;Lee Kyong-Nim
    • MALSORI
    • /
    • no.49
    • /
    • pp.107-121
    • /
    • 2004
  • In this paper, we describe a cross-morpheme pronunciation variation model which is especially useful for constructing morpheme-based pronunciation lexicon to improve the performance of a Korean LVCSR. There are a lot of pronunciation variations occurring at morpheme boundaries in continuous speech. Since phonemic context together with morphological category and morpheme boundary information affect Korean pronunciation variations, we have distinguished phonological rules that can be applied to phonemes in within-morpheme and cross-morpheme. The results of 33K-morpheme Korean CSR experiments show that an absolute reduction of 1.45% in WER from the baseline performance of 18.42% WER was achieved by modeling proposed pronunciation variations with a possible multiple context-dependent pronunciation lexicon.

  • PDF

Improvement and Evaluation of the Korean Large Vocabulary Continuous Speech Recognition Platform (ECHOS) (한국어 음성인식 플랫폼(ECHOS)의 개선 및 평가)

  • Kwon, Suk-Bong;Yun, Sung-Rack;Jang, Gyu-Cheol;Kim, Yong-Rae;Kim, Bong-Wan;Kim, Hoi-Rin;Yoo, Chang-Dong;Lee, Yong-Ju;Kwon, Oh-Wook
    • MALSORI
    • /
    • no.59
    • /
    • pp.53-68
    • /
    • 2006
  • We report the evaluation results of the Korean speech recognition platform called ECHOS. The platform has an object-oriented and reusable architecture so that researchers can easily evaluate their own algorithms. The platform has all intrinsic modules to build a large vocabulary speech recognizer: Noise reduction, end-point detection, feature extraction, hidden Markov model (HMM)-based acoustic modeling, cross-word modeling, n-gram language modeling, n-best search, word graph generation, and Korean-specific language processing. The platform supports both lexical search trees and finite-state networks. It performs word-dependent n-best search with bigram in the forward search stage, and rescores the lattice with trigram in the backward stage. In an 8000-word continuous speech recognition task, the platform with a lexical tree increases 40% of word errors but decreases 50% of recognition time compared to the HTK platform with flat lexicon. ECHOS reduces 40% of recognition errors through incorporation of cross-word modeling. With the number of Gaussian mixtures increasing to 16, it yields word accuracy comparable to the previous lexical tree-based platform, Julius.

  • PDF

Review And Challenges In Speech Recognition (ICCAS 2005)

  • Ahmed, M.Masroor;Ahmed, Abdul Manan Bin
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2005.06a
    • /
    • pp.1705-1709
    • /
    • 2005
  • This paper covers review and challenges in the area of speech recognition by taking into account different classes of recognition mode. The recognition mode can be either speaker independent or speaker dependant. Size of the vocabulary and the input mode are two crucial factors for a speech recognizer. The input mode refers to continuous or isolated speech recognition system and the vocabulary size can be small less than hundred words or large less than few thousands words. This varies according to system design and objectives.[2]. The organization of the paper is: first it covers various fundamental methods of speech recognition, then it takes into account various deficiencies in the existing systems and finally it discloses the various probable application areas.

  • PDF

Performance of Pseudomorpheme-Based Speech Recognition Units Obtained by Unsupervised Segmentation and Merging (비교사 분할 및 병합으로 구한 의사형태소 음성인식 단위의 성능)

  • Bang, Jeong-Uk;Kwon, Oh-Wook
    • Phonetics and Speech Sciences
    • /
    • v.6 no.3
    • /
    • pp.155-164
    • /
    • 2014
  • This paper proposes a new method to determine the recognition units for large vocabulary continuous speech recognition (LVCSR) in Korean by applying unsupervised segmentation and merging. In the proposed method, a text sentence is segmented into morphemes and position information is added to morphemes. Then submorpheme units are obtained by splitting the morpheme units through the maximization of posterior probability terms. The posterior probability terms are computed from the morpheme frequency distribution, the morpheme length distribution, and the morpheme frequency-of-frequency distribution. Finally, the recognition units are obtained by sequentially merging the submorpheme pair with the highest frequency. Computer experiments are conducted using a Korean LVCSR with a 100k word vocabulary and a trigram language model obtained by a 300 million eojeol (word phrase) corpus. The proposed method is shown to reduce the out-of-vocabulary rate to 1.8% and reduce the syllable error rate relatively by 14.0%.

Korean broadcast news transcription system with out-of-vocabulary(OOV) update module (한국어 방송 뉴스 인식 시스템을 위한 OOV update module)

  • Jung Eui-Jung;Yun Seung
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.33-36
    • /
    • 2002
  • We implemented a robust Korean broadcast news transcription system for out-of-vocabulary (OOV), tested its performance. The occurrence of OOV words in the input speech is inevitable in large vocabulary continuous speech recognition (LVCSR). The known vocabulary will never be complete due to the existence of for instance neologisms, proper names, and compounds in some languages. The fixed vocabulary and language model of LVCSR system directly face with these OOV words. Therefore our Broadcast news recognition system has an offline OOV update module of language model and vocabulary to solve OOV problem and selects morpheme-based recognition unit (so called, pseudo-morpheme) for OOV robustness.

  • PDF

Design of Linguistic Contents of Speech Copora for Speech Recognition and Synthesis for Common Use (공동 이용을 위한 음성 인식 및 합성용 음성코퍼스의 발성 목록 설계)

  • Kim Yoen-Whoa;Kim Hyoung-Ju;Kim Bong-Wan;Lee Yong-Ju
    • MALSORI
    • /
    • no.43
    • /
    • pp.89-99
    • /
    • 2002
  • Recently, researches into ways of improving large vocabulary continuous speech recognition and speech synthesis are being carried out intensively as the field of speech information technology is progressing rapidly. In the field of speech recognition, developments of stochastic methods such as HMM require large amount of speech data for training, and also in the field of speech synthesis, recent practices show that synthesis of better quality can be produced by selecting and connecting only the variable size of speech data from the large amount of speech data. In this paper we design and discuss linguistic contents for speech copora for speech recognition and synthesis to be shared in common.

  • PDF

Modified Phonetic Decision Tree For Continuous Speech Recognition

  • Kim, Sung-Ill;Kitazoe, Tetsuro;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea
    • /
    • v.17 no.4E
    • /
    • pp.11-16
    • /
    • 1998
  • For large vocabulary speech recognition using HMMs, context-dependent subword units have been often employed. However, when context-dependent phone models are used, they result in a system which has too may parameters to train. The problem of too many parameters and too little training data is absolutely crucial in the design of a statistical speech recognizer. Furthermore, when building large vocabulary speech recognition systems, unseen triphone problem is unavoidable. In this paper, we propose the modified phonetic decision tree algorithm for the automatic prediction of unseen triphones which has advantages solving these problems through following two experiments in Japanese contexts. The baseline experimental results show that the modified tree based clustering algorithm is effective for clustering and reducing the number of states without any degradation in performance. The task experimental results show that our proposed algorithm also has the advantage of providing a automatic prediction of unseen triphones.

  • PDF

On the Development of a Large-Vocabulary Continuous Speech Recognition System for the Korean Language (대용량 한국어 연속음성인식 시스템 개발)

  • Choi, In-Jeong;Kwon, Oh-Wook;Park, Jong-Ryeal;Park, Yong-Kyu;Kim, Do-Yeong;Jeong, Ho-Young;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.5
    • /
    • pp.44-50
    • /
    • 1995
  • This paper describes a large-vocabulary continuous speech recognition system using continuous hidden Markov models for the Korean language. To improve the performance of the system, we study on the selection of speech modeling units, inter-word modeling, search algorithm, and grammars. We used triphones as basic speech modeling units, generalized triphones and function word-dependent phones are used to improve the trainability of speech units and to reduce errors in function words. Silence between words is optionally inserted by using a silence model and a null transition. Word pair grammar and bigram model based oil word classes are used. Also we implement a search algorithm to find N-best candidate sentences. A postprocessor reorders the N-best sentences using word triple grammar, selects the most likely sentence as the final recognition result, and finally corrects trivial errors related with postpositions. In recognition tests using a 3,000-word continuous speech database, the system attained $93.1\%$ word recognition accuracy and $73.8\%$ sentence recognition accuracy using word triple grammar in postprocessing.

  • PDF