• Title/Summary/Keyword: Diphone


Korean Word Recognition Using Diphone-Level Hidden Markov Model (Diphone 단위의 hidden Markov model을 이용한 한국어 단어 인식)

  • Park, Hyun-Sang;Un, Chong-Kwan;Park, Yong-Kyu;Kwon, Oh-Wook
    • The Journal of the Acoustical Society of Korea / v.13 no.1 / pp.14-23 / 1994
  • In this paper, speech units appropriate for recognition of the Korean language are studied. For better speech recognition, co-articulatory effects within an utterance should be considered when selecting a recognition unit. One way to model such effects is to use larger units of speech. The diphone was found to be a good recognition unit because it can model transitional regions explicitly. When diphones are used, stationary phoneme models may be inserted between diphones. Computer simulation of isolated word recognition was done with a 7-word database spoken by seven male speakers. The best performance was obtained when transition regions between phonemes were modeled by two-state HMMs and stationary phoneme regions by one-state HMMs, excluding /b/, /d/, and /g/. By merging rarely occurring diphone units, the recognition rate was increased from $93.98\%$ to $96.29\%$. In addition, a local interpolation technique was used to smooth a poorly modeled HMM with a well-trained HMM. With this technique, a recognition rate of $97.22\%$ was obtained after merging some diphone units.

  • PDF
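The unit layout described in the abstract above (two-state models for transitions, one-state models for stationary phonemes, none for /b/, /d/, /g/) can be illustrated with a minimal sketch. The function name `expand_to_units`, the `left-right` unit naming, the phoneme symbols, and the silence padding at word boundaries are all assumptions made for illustration; the paper's actual inventory and topology details are not given in the abstract.

```python
# Hypothetical sketch: expand a phoneme string into diphone and stationary units.
# Unit names, phoneme symbols, and silence padding are illustrative assumptions.

SKIP_STATIONARY = {"b", "d", "g"}  # no stationary model for these, per the abstract


def expand_to_units(phonemes, silence="sil"):
    """Return a list of (unit_name, n_hmm_states) for one isolated word."""
    seq = [silence] + list(phonemes) + [silence]
    units = []
    for i in range(len(seq) - 1):
        left, right = seq[i], seq[i + 1]
        # Transition region between two phonemes: two-state HMM.
        units.append((f"{left}-{right}", 2))
        # Stationary region of the right-hand phoneme: one-state HMM,
        # skipped for the trailing silence and for /b/, /d/, /g/.
        is_final_silence = (i + 1 == len(seq) - 1)
        if not is_final_silence and right not in SKIP_STATIONARY:
            units.append((right, 1))
    return units


if __name__ == "__main__":
    for name, n_states in expand_to_units(["g", "a", "m"]):
        print(name, n_states)
```

Concatenating the per-unit state counts gives the length of the word-level HMM; for the three-phoneme example this is four transition units and two stationary units.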

'Hanmal' Korean Language Diphone Database for Speech Synthesis

  • Chung, Hyun-Song
    • Speech Sciences / v.12 no.1 / pp.55-63 / 2005
  • This paper introduces the 'Hanmal' Korean language diphone database for speech synthesis, which has been publicly available on the MBROLA web site since 1999 but has never been properly published in a journal. The diphone database is compatible with the MBROLA programme of high-quality multilingual speech synthesis systems. The paper presents the usefulness of the diphone database and describes its phonetic and phonological structure, showing the process of creating the text corpus. A machine-readable Korean SAMPA convention for the control data input to the MBROLA application is also suggested. Diphone concatenation and prosody manipulation are performed using the MBR-PSOLA algorithm. A set of segment duration models can be applied to the diphone synthesis of Korean.

  • PDF
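As an illustration of the control data mentioned in the abstract above, MBROLA reads a plain-text phoneme file in which each line holds a phoneme symbol, a duration in milliseconds, and optional pairs of (position in percent, pitch in Hz), with `_` denoting silence. The sketch below writes such lines. The helper names `pho_line` and `to_pho` are hypothetical, and the symbols `a` and `n` are placeholders, not necessarily symbols from the paper's Korean SAMPA convention.

```python
# Hypothetical sketch: format segments as MBROLA-style phoneme lines.
# The symbols used in the example are placeholders, not the database's inventory.


def pho_line(phoneme, duration_ms, pitch_points=()):
    """One line: symbol, duration in ms, then optional (percent, Hz) pitch pairs."""
    parts = [phoneme, str(int(duration_ms))]
    for pos_percent, hz in pitch_points:
        parts += [str(int(pos_percent)), str(int(hz))]
    return " ".join(parts)


def to_pho(segments):
    """Join segments, each (symbol, duration) or (symbol, duration, pitch_points)."""
    return "\n".join(pho_line(*seg) for seg in segments) + "\n"


if __name__ == "__main__":
    segments = [
        ("_", 100),                           # leading silence
        ("a", 120, [(0, 120), (100, 110)]),   # falling pitch across the vowel
        ("n", 80),
        ("_", 100),                           # trailing silence
    ]
    print(to_pho(segments), end="")
```

A duration model such as the one the abstract mentions would supply the `duration_ms` values; here they are made-up numbers.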

A Study on the Diphone Recognition of Korean Connected Words and Eojeol Reconstruction (한국어 연결단어의 이음소 인식과 어절 형성에 관한 연구)

  • ;Jeong, Hong
    • The Journal of the Acoustical Society of Korea / v.14 no.4 / pp.46-63 / 1995
  • This thesis describes an unlimited-vocabulary connected speech recognition system using Time Delay Neural Networks (TDNN). The recognition unit is the diphone, which includes the transition between two phonemes; 329 diphone units are used. Recognition of Korean connected speech consists of three parts: feature extraction from the input speech signal, diphone recognition, and post-processing. In feature extraction, diphone intervals are extracted from the input speech signal, and feature vectors of 16th-order filter-bank coefficients are calculated for each frame in the diphone interval. Diphone recognition uses a three-stage hierarchical structure and is carried out by 30 Time Delay Neural Networks. In particular, the structure of the TDNN is modified to increase the recognition rate. In post-processing, misrecognized diphone strings are corrected using the probabilities of phoneme transition and phoneme confusion, and eojeols (Korean words or phrases) are then formed by combining the recognized diphones.

  • PDF
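The abstract above does not say how the transition and confusion probabilities are combined during post-processing. One common way to use the two together is Viterbi decoding, shown here as a minimal sketch under that assumption. The function name `viterbi_correct`, the symbol set, the probability tables, the probability floor, and the uniform prior on the first symbol are all invented for illustration.

```python
import math

# Hypothetical sketch: correct a recognized symbol string with Viterbi decoding.
# trans[(a, b)] stands for P(b follows a); confusion[(true, seen)] for P(seen | true).
# All probabilities below are made up; the paper's values are not in the abstract.


def viterbi_correct(observed, symbols, trans, confusion, floor=1e-6):
    """Return the most likely intended sequence for the recognized symbols."""

    def logp(table, key):
        # Unlisted events get a small floor probability instead of zero.
        return math.log(table.get(key, floor))

    # Best log-score of any path ending in each symbol (uniform prior on the first).
    score = {s: logp(confusion, (s, observed[0])) for s in symbols}
    back = []
    for obs in observed[1:]:
        new_score, pointers = {}, {}
        for s in symbols:
            prev = max(symbols, key=lambda p: score[p] + logp(trans, (p, s)))
            new_score[s] = score[prev] + logp(trans, (prev, s)) + logp(confusion, (s, obs))
            pointers[s] = prev
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final symbol.
    path = [max(symbols, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    symbols = ["g", "a", "k"]
    trans = {("g", "a"): 0.9, ("k", "a"): 0.05}
    confusion = {("g", "g"): 0.7, ("g", "k"): 0.3, ("k", "k"): 0.9, ("a", "a"): 0.95}
    print(viterbi_correct(["k", "a"], symbols, trans, confusion))
```

In the example, a recognized `k` followed by `a` is reinterpreted as `g` followed by `a`, because the assumed transition table makes `k` before `a` unlikely while `g` is often confused with `k`.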

Definition and Function of Two Song Types of the Bush Warbler (Cettia diphone borealis)

  • Shi-Ryong Park;Eui-Dong Han;Ha-Cheol Sung
    • Animal cells and systems / v.3 no.2 / pp.149-151 / 1999
  • It has been suggested that the bush warbler (Cettia diphone borealis) uses different song types in various situations. We analyzed song features and conducted playback experiments in order to reveal the function of the songs of the bush warbler. Two song types were identified. The short song type has a shorter song duration than the normal song type and consists of only one or two syllables. Owing to its short syllables and the low amplitude of the whistle portion, we were able to discriminate the short song type (S song type) from the normal song type (N song type). In the playback experiments, bush warblers sang the short song type at high rates for the first three minutes after playback. After 6 minutes of playback, males changed to singing normal songs. These results suggest that the short song of the bush warbler may function to threaten or drive off intruding males.

  • PDF

Implementation of Vocabulary-Independent Speech Recognizer Using a DSP (DSP를 이용한 가변어휘 음성인식기 구현에 관한 연구)

  • Chung, Ik-Joo
    • Speech Sciences / v.11 no.3 / pp.143-156 / 2004
  • In this paper, we implemented a vocabulary-independent speech recognizer using the TMS320VC33 DSP. For this implementation, we developed a very small recognition engine based on diphone sub-word units, which is especially suited to embedded applications where system resources are severely limited. The recognition accuracy of the developed recognizer, with 1 mixture per state and 4 states per diphone, is 94.5% when tested on a set of 2,000 frequently used words. The hardware design focused on minimal use of parts, which reduces material cost. The final hardware includes only a DSP, a 512 Kword flash ROM, and a voice codec. In porting the recognition engine to the DSP, we introduced several methods of using data and program memory efficiently and developed a versatile software protocol for the host interface. Finally, we also made an evaluation board for testing the developed hardware recognition module.

  • PDF

A Song Transition among the Geographic Populations of Bush Warbler (Cettia diphone) (휘파람새(Cettia Diphone)개체군간 song 변이의 방향)

  • Park, Dae Sik;Sooil Kim;Shi-Ryong Park
    • The Korean Journal of Ecology / v.19 no.2 / pp.141-149 / 1996
  • This study examined the occurrence of geographic song variation and the direction of its transition among bush warbler populations distributed in Korea and Japan. Bush warbler songs (n=283) of 25 males from Cheongwon and Jeju, Korea, and from Chiba, Japan, were analyzed. Chiba individuals had more song types, a higher dominant frequency, and a longer duration of the introductory whistle portion than Cheongwon and Jeju individuals. The eight measured song parameters consistently showed a decreasing or increasing tendency, and the direction of this tendency was related to geographic location from Chiba to Cheongwon. The difference in song parameters between the Cheongwon and Chiba populations was the greatest among the pairs of geographic populations compared. The degree of discrimination among the three populations was 92.00%. These results indicate that there is geographic song variation between the bush warblers of Japan and Korea, and that the song transition has been directed from Chiba (Japan) through Jeju to Cheongwon (Korea).

  • PDF

Behavioral Function of the Anomalous Song in the Bush Warbler, Cettia diphone

  • Park, Shi-Ryong;Cheong, Seok-Wan;Chung, Hoon
    • Animal cells and systems / v.8 no.2 / pp.89-95 / 2004
  • Bush warblers (Cettia diphone) have been recognized to possess two types of songs: a normal song that plays roles in attracting mates and territorial defense, and an anomalous song. The present study suggests that the anomalous song functions as an alarm signal as well as other unknown signals. Field observations and playback experiments on the anomalous song of the bush warbler were conducted in order to investigate the contextual information exchanged between sender and receiver. In the field observations, the males frequently emitted anomalous songs toward potential predators. The males responded with an anomalous song to stuffed potential predators. The distance between where the anomalous song occurred and the stimulating source varied depending upon the kind of stimulus. Male bush warblers possibly show different responses to the anomalous song depending on the level of danger. When the anomalous song was played back to terrestrial males and females, no distinctive behavior was observed. The anomalous song may be sung to defend the territory against predators or to distract invaders from the nest and female, because the male and female behaviors were related to the anomalous song and its phonetic characteristics.

Design and Implementation of Korean Text-to-Speech System (다이폰을 이용한 한국어 문자-음성 변환 시스템의 설계 및 구현)

  • 정준구
    • Proceedings of the Acoustical Society of Korea Conference / 1994.06c / pp.91-94 / 1994
  • This paper describes the design and implementation of a Korean text-to-speech system. Parametric synthesis is chosen as the speech synthesis method, and PARCOR coefficients, obtained by LPC analysis, are used as the acoustic parameters. The diphone is used as the synthesis unit because it preserves the basic naturalness of human speech. The diphone database consists of 1228 PCM files. LPC synthesis has the drawback of reducing the clarity of synthesized speech when synthesizing unvoiced sounds. We improve the clarity of the synthesized speech by using the residual signal as the excitation signal for unvoiced sounds. In addition, to improve naturalness, we control the prosody of the synthesized speech by controlling the energy and pitch patterns. The synthesis system is implemented on a PC/486 and uses a 70 Hz-4.5 kHz band-pass filter for speech input/output, an amplifier, and a TMS320C30 DSP board.

  • PDF
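PARCOR coefficients of the kind mentioned in the abstract above are conventionally obtained as a by-product of the Levinson-Durbin recursion on a frame's autocorrelation. The sketch below shows that textbook recursion in plain Python; it is not taken from the paper. The function names, the analysis order, and the test signal are assumptions, and sign conventions for PARCOR versus reflection coefficients differ between texts.

```python
# Hypothetical sketch: reflection (PARCOR-style) coefficients via Levinson-Durbin.
# Convention used here: prediction-error filter A(z) = 1 + a1*z^-1 + ... + ap*z^-p.


def autocorrelation(frame, order):
    """Autocorrelation r[0..order] of one (already windowed) frame."""
    return [
        sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
        for k in range(order + 1)
    ]


def levinson_durbin(r, order):
    """Return (lpc_coeffs, reflection_coeffs, residual_energy)."""
    if r[0] <= 0:
        raise ValueError("frame has no energy")
    a = [1.0] + [0.0] * order
    err = r[0]
    refl = []
    for i in range(1, order + 1):
        # Correlation of the current prediction error with the next lag.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        refl.append(k)
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k  # residual energy shrinks at every order
    return a, refl, err


if __name__ == "__main__":
    frame = [0.5 ** n for n in range(64)]  # decaying toy signal, not real speech
    a, refl, err = levinson_durbin(autocorrelation(frame, 4), 4)
    print(refl, err)
```

For a valid autocorrelation sequence every reflection coefficient has magnitude below one, which is why this parameter set is convenient for quantization and interpolation between diphone frames.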

Perceptual Evaluation of Duration Models in Spoken Korean

  • Chung, Hyun-Song
    • Speech Sciences / v.9 no.1 / pp.207-215 / 2002
  • Perceptual evaluation of duration models of spoken Korean was carried out based on the Classification and Regression Tree (CART) model for text-to-speech conversion. A reference set of durations was produced by a commercial text-to-speech synthesis system for comparison. The duration model which was built in the previous research (Chung & Huckvale, 2001) was applied to a Korean language speech synthesis diphone database, 'Hanmal (HN 1.0)'. The synthetic speech produced by the CART duration model was preferred in the subjective preference test by a small margin and the synthetic speech from the commercial system was superior in the clarity test. In the course of preparing the experiment, a labeled database of spoken Korean with 670 sentences was constructed. As a result of the experiment, a trained duration model for speech synthesis was obtained. The 'Hanmal' diphone database for Korean speech synthesis was also developed as a by-product of the perceptual evaluation.

  • PDF