• Title/Summary/Keyword: Discriminative Training


Discriminative Training of Stochastic Segment Model Based on HMM Segmentation for Continuous Speech Recognition

  • Chung, Yong-Joo;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea / v.15 no.4E / pp.21-27 / 1996
  • In this paper, we propose a discriminative training algorithm for the stochastic segment model (SSM) in continuous speech recognition. As the SSM is usually trained by maximum likelihood estimation (MLE), a discriminative training algorithm is required to improve the recognition performance. Since the SSM does not assume the conditional independence of the observation sequence, as hidden Markov models (HMMs) do, the search space for decoding an unknown input utterance increases considerably. To reduce the computational complexity and the search space in an iterative training algorithm for discriminative SSMs, a hybrid architecture of SSMs and HMMs is used, in which the segment boundaries are obtained using HMMs. Given the segment boundaries, the parameters of the SSM are discriminatively trained by the minimum error classification criterion based on a generalized probabilistic descent (GPD) method. With the discriminative training of the SSM, the word error rate is reduced by 17% compared with the MLE-trained SSM in speaker-independent continuous speech recognition.

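The criterion named in the abstract above can be illustrated with the general textbook form of a smoothed classification-error loss minimized by gradient steps. The sketch below is not the SSM-specific algorithm of the paper: the linear discriminants, the finite-difference gradient, and the values of `eta`, `gamma`, `step`, and `h` are all assumptions chosen for illustration.

```python
import math


def mce_loss(scores, correct, eta=1.0, gamma=1.0):
    """Smoothed classification-error loss for one training token.

    scores  : discriminant values g_j(x), one per class
    correct : index of the true class
    eta     : sharpness of the soft maximum over competitors (assumed value)
    gamma   : slope of the sigmoid loss (assumed value)
    """
    competitors = [g for j, g in enumerate(scores) if j != correct]
    # Soft maximum over competing classes, computed stably:
    # (1/eta) * log(mean(exp(eta * g_j)))
    m = max(competitors)
    soft_max = m + math.log(
        sum(math.exp(eta * (g - m)) for g in competitors) / len(competitors)
    ) / eta
    d = -scores[correct] + soft_max  # misclassification measure, d > 0 means error
    return 1.0 / (1.0 + math.exp(-gamma * d))  # near 0 if correct, near 1 if wrong


def gpd_step(weights, x, correct, step=0.5, h=1e-6):
    """One descent step on the loss for hypothetical linear discriminants
    g_j(x) = weights[j] . x, using a finite-difference gradient."""
    def loss(w):
        scores = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
        return mce_loss(scores, correct)

    base = loss(weights)
    updated = [row[:] for row in weights]
    for j, row in enumerate(weights):
        for i in range(len(row)):
            bumped = [r[:] for r in weights]
            bumped[j][i] += h
            updated[j][i] -= step * (loss(bumped) - base) / h
    return updated
```

Repeating `gpd_step` over training tokens with a decreasing step size gives the iterative character of this kind of training; in the paper's setting the discriminants would come from the SSM rather than from a linear model.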

Discriminative Weight Training for Gender Identification (변별적 가중치 학습을 적용한 성별인식 알고리즘)

  • Kang, Sang-Ick;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.27 no.5 / pp.252-255 / 2008
  • In this paper, we apply discriminative weight training to support vector machine (SVM) based gender identification. In our approach, the gender decision rule is expressed as an SVM over optimally weighted mel-frequency cepstral coefficients (MFCCs), with the weights obtained by a minimum classification error (MCE) method. This differs from previous work in that a different weight is assigned to each MFCC filter bank, which is considered more realistic. According to the experimental results, the proposed approach is effective for gender identification using an SVM.

Analysis and Implementation of Speech/Music Classification for 3GPP2 SMV Codec Employing SVM Based on Discriminative Weight Training (SMV코덱의 음성/음악 분류 성능 향상을 위한 최적화된 가중치를 적용한 입력벡터 기반의 SVM 구현)

  • Kim, Sang-Kyun;Chang, Joon-Hyuk;Cho, Ki-Ho;Kim, Nam-Soo
    • The Journal of the Acoustical Society of Korea / v.28 no.5 / pp.471-476 / 2009
  • In this paper, we apply discriminative weight training to support vector machine (SVM) based speech/music classification for the selectable mode vocoder (SMV) of 3GPP2. In our approach, the speech/music decision rule is expressed as the SVM discriminant function, incorporating optimally weighted features of the SMV obtained by a minimum classification error (MCE) method. This differs from previous work in that a different weight is assigned to each feature of the SMV. The performance of the proposed approach is evaluated under various conditions and yields better results than the conventional SVM scheme.

Improving transformer-based acoustic model performance using sequence discriminative training (Sequence dicriminative training 기법을 사용한 트랜스포머 기반 음향 모델 성능 향상)

  • Lee, Chae-Won;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.41 no.3 / pp.335-341 / 2022
  • In this paper, we adopt a transformer, which shows remarkable performance in natural language processing, as the acoustic model (AM) of a hybrid speech recognition system. The transformer acoustic model uses attention structures to process sequential data and shows high performance with low computational cost. This paper proposes a method to improve the performance of the transformer AM by applying each of four sequence discriminative training algorithms, a weighted finite-state transducer (wFST)-based learning approach used in the existing DNN-HMM model. Compared to the cross entropy (CE) learning method, the sequence discriminative method shows a 5 % relative improvement in word error rate (WER).
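For readers unfamiliar with the metric, the following minimal sketch shows how a word error rate and a relative WER change are commonly computed. It assumes the 5 % figure in the abstract is a relative reduction with respect to the CE baseline, which the abstract does not state explicitly; the function names are hypothetical and the reference is assumed to be non-empty.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the processed part of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer, new_wer):
    """Fraction of the baseline WER removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer
```

Under this reading, a hypothetical baseline WER of 10.0 % that drops to 9.5 % corresponds to a relative reduction of 0.05, that is, 5 %.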

Model Adaptation Using Discriminative Noise Adaptive Training Approach for New Environments

  • Jung, Ho-Young;Kang, Byung-Ok;Lee, Yun-Keun
    • ETRI Journal / v.30 no.6 / pp.865-867 / 2008
  • Conventional environment adaptation for robust speech recognition is usually conducted using transform-based techniques. Here, we present a discriminative adaptation strategy based on a multi-condition-trained model, and propose a new method that provides universal application to a new environment using the environment's specific conditions. Experimental results show that a speech recognition system adapted using the proposed method works successfully for other conditions as well as for those of the new environment.


Minimum Classification Error Training to Improve Discriminability of PCMM-Based Feature Compensation (PCMM 기반 특징 보상 기법에서 변별력 향상을 위한 Minimum Classification Error 훈련의 적용)

  • Kim Wooil;Ko Hanseok
    • The Journal of the Acoustical Society of Korea / v.24 no.1 / pp.58-68 / 2005
  • In this paper, we propose a scheme to improve the discriminative property of the feature compensation method for robust speech recognition in noisy environments. The estimation of the noisy speech model used in existing feature compensation methods does not guarantee posterior probabilities that discriminate reliably among the Gaussian components. Estimation of the posterior probabilities is a crucial step in determining the discriminative factor of the Gaussian models, which in turn determines the intelligibility of the restored speech signals. The proposed scheme employs minimum classification error (MCE) training to estimate the parameters of the noisy speech model. To apply MCE training, we propose to identify the 'competing components' that are expected to affect the discriminative ability. The proposed method is applied to feature compensation based on the parallel combined mixture model (PCMM). The performance is examined on the Aurora 2.0 database and on speech recorded inside a car under real driving conditions. The experimental results show improved recognition performance in both simulated environments and real-life conditions, which verifies the effectiveness of the proposed scheme for robust speech recognition systems.

Enhancement of Speech/Music Classification for 3GPP2 SMV Codec Employing Discriminative Weight Training (변별적 가중치 학습을 이용한 3GPP2 SVM의 실시간 음성/음악 분류 성능 향상)

  • Kang, Sang-Ick;Chang, Joon-Hyuk;Lee, Seong-Ro
    • The Journal of the Acoustical Society of Korea / v.27 no.6 / pp.319-324 / 2008
  • In this paper, we propose a novel approach to improve the performance of speech/music classification for the selectable mode vocoder (SMV) of 3GPP2 using discriminative weight training based on the minimum classification error (MCE) algorithm. We first present an analysis of the features and the classification method adopted in the conventional SMV. The proposed speech/music decision rule is then expressed as the geometric mean of optimally weighted features selected from the SMV. The performance of the proposed algorithm is evaluated under various conditions and yields better results than the conventional scheme of the SMV.

Training Method and Speaker Verification Measures for Recurrent Neural Network based Speaker Verification System

  • Kim, Tae-Hyung
    • The Journal of Korean Institute of Communications and Information Sciences / v.34 no.3C / pp.257-267 / 2009
  • This paper presents a training method for neural networks and the use of mean squared error (MSE) values as the basis for deciding on the identity claim of a speaker in a speaker verification system based on recurrent neural networks (RNNs). RNNs are employed to capture the temporally dynamic characteristics of the speech signal. During supervised learning of the RNNs, target outputs are generated automatically so that they represent the temporal variation of the input speech sounds. To increase the capability of discriminating between the true speaker and an impostor, a discriminative training method for RNNs is presented. This paper shows the use and the effectiveness of the MSE value, which is obtained from the Euclidean distance between the target outputs and the network outputs for a speaker's test speech sounds, as the basis of speaker verification. In terms of equal error rate, experiments performed on a Korean speech database show that the proposed speaker verification system performs better than a conventional hidden Markov model based speaker verification system.
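As a rough illustration of an MSE-based verification decision of the kind the abstract describes, the sketch below averages the squared differences between target outputs and network outputs and compares the result with a threshold. The frame-by-frame layout of the outputs, the threshold value, and the rule that a low MSE accepts the claim are assumptions; the abstract does not give these details.

```python
def mean_squared_error(targets, outputs):
    """Average squared difference over all frames and output units."""
    total, count = 0.0, 0
    for target_frame, output_frame in zip(targets, outputs):
        for t, o in zip(target_frame, output_frame):
            total += (t - o) ** 2
            count += 1
    return total / count


def accept_claim(targets, outputs, threshold):
    """Accept the identity claim when the MSE does not exceed the threshold
    (assumed decision rule; the threshold would be tuned on held-out data)."""
    return mean_squared_error(targets, outputs) <= threshold


# Hypothetical two-frame, two-unit example.
targets = [[1.0, 0.0], [0.0, 1.0]]
close_outputs = [[0.9, 0.1], [0.2, 0.8]]  # network tracks the targets
far_outputs = [[0.1, 0.9], [0.9, 0.1]]    # network output far from the targets
```

Sweeping `threshold` and recording false acceptances and false rejections is the usual way to locate the equal error rate mentioned in the abstract.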

Discriminative Weight Training for a Statistical Model-Based Voice Activity Detection (통계적 모델 기반의 음성 검출기를 위한 변별적 가중치 학습)

  • Kang, Sang-Ick;Jo, Q-Haing;Park, Seung-Seop;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.26 no.5 / pp.194-198 / 2007
  • In this paper, we apply discriminative weight training to statistical model-based voice activity detection (VAD). In our approach, the VAD decision rule is expressed as the geometric mean of optimally weighted likelihood ratios (LRs), with the weights obtained by a minimum classification error (MCE) method. This differs from previous work in that a different weight is assigned to each frequency bin, which is considered more realistic. According to the experimental results, the proposed approach is effective for statistical model-based VAD using the LR test.
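A weighted geometric mean of per-bin likelihood ratios is most easily evaluated in the log domain, where it becomes a weighted sum of log ratios. The sketch below shows that decision rule only; the assumption that the weights are non-negative and sum to one, the threshold value, and the function names are illustrative, and the MCE training that would produce the weights is not shown.

```python
import math


def weighted_log_lr(likelihood_ratios, weights):
    """Log of the weighted geometric mean: sum_k w_k * log(LR_k).

    Assumes the weights are non-negative and sum to one, and that every
    likelihood ratio is strictly positive.
    """
    return sum(w * math.log(lr) for lr, w in zip(likelihood_ratios, weights))


def is_speech(likelihood_ratios, weights, threshold=1.0):
    """Declare speech when the weighted geometric mean exceeds the threshold."""
    return weighted_log_lr(likelihood_ratios, weights) > math.log(threshold)
```

With equal weights this reduces to the ordinary geometric mean over frequency bins, so unequal weights can be seen as letting more reliable bins count for more in the decision.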

Voice Activity Detection Based on Real-Time Discriminative Weight Training (실시간 변별적 가중치 학습에 기반한 음성 검출기)

  • Chang, Sang-Ick;Jo, Q-Haing;Chang, Joon-Hyuk
    • Journal of the Institute of Electronics Engineers of Korea SP / v.45 no.4 / pp.100-106 / 2008
  • In this paper, we apply discriminative weight training employing the power spectral flatness measure (PSFM) to statistical model-based voice activity detection (VAD) in various noise environments. In our approach, the VAD decision rule is expressed as the geometric mean of optimally weighted likelihood ratio test (LRT) statistics, with the weights obtained by a minimum classification error (MCE) method. This differs from previous work in that different weights are assigned to each frequency bin and noise environment depending on the PSFM. According to the experimental results, the proposed approach is effective for statistical model-based VAD using the LRT.