Search | Korea Science

Yoo, Jae-Kwon;Lee, Kyoung-Mi
- The Journal of the Korea Contents Association
- /
- v.11 no.5
- /
- pp.138-147
- /
- 2011
While most Korean speech databases are developed for adults' speech, not for children's speech, there are various children's speech databases based on other languages. Because there are wide differences between children's and adults' speech in acoustic and linguistic characteristics, the children's speech database needs to be developed. In this paper, to find the differences between them in Korean, we built speech recognizers using HMM and tested them according to gender, age, and the presence of VTLN(Vocal Tract Length Normalization). This paper shows the speech recognizer made by children's speech has a much higher recognition rate than that made by adults' speech and using VTLN helps to improve the recognition rate in Korean.
https://doi.org/10.5392/JKCA.2011.11.5.138 인용 PDF KSCI

Seung Min Kim;Dae Eol Park;Dae Seon Choi
- Smart Media Journal
- /
- v.12 no.2
- /
- pp.56-65
- /
- 2023
In broadcasts such as news and coverage programs, voice is modulated to protect the identity of the informant. Adjusting the pitch is commonly used voice modulation method, which allows easy voice restoration to the original voice by adjusting the pitch. Therefore, since broadcast voice modulation methods cannot properly protect the identity of the speaker and are vulnerable to security, a new voice modulation method is needed to replace them. In this paper, using the Lightweight speech de-identification model as the evaluation target model, we compare speech de-identification performance with broadcast voice modulation method using pitch modulation. Among the six modulation methods in the Lightweight speech de-identification model, we experimented on the de-identification performance of Korean speech as a human test and EER(Equal Error Rate) test compared with broadcast voice modulation using three modulation methods: McAdams, Resampling, and Vocal Tract Length Normalization(VTLN). Experimental results show VTLN modulation methods performed higher de-identification performance in both human tests and EER tests. As a result, the modulation methods of the Lightweight model for Korean speech has sufficient de-identification performance and will be able to replace the security-weak broadcast voice modulation.
https://doi.org/10.30693/SMJ.2023.12.2.56 인용 PDF

Shin, Ok-Keun
- The KIPS Transactions:PartB
- /
- v.12B no.4 s.100
- /
- pp.437-442
- /
- 2005
For the purpose of speaker normalization in speaker independent speech recognition systems, experiments are conducted on a method based on Gaussian mixture model(GMM). The method, which is an improvement of the previous study based on vector quantizer, consists of modeling the probability distribution of canonical feature vectors by a GMM with an appropriate number of clusters, and of estimating the warp factor of a test speaker by making use of the obtained probabilistic model. The purpose of this study is twofold: improving the existing ML based methods, and comparing the performance of what is called 'soft decision' method with that of the previous study based on vector quantizer. The effectiveness of the proposed method is investigated by recognition experiments on the TIMIT corpus. The experimental results showed that a little improvement could be obtained tv adjusting the number of clusters in GMM appropriately.
https://doi.org/10.3745/KIPSTB.2005.12B.4.437 인용 PDF KSCI