• Title/Summary/Keyword: LPCC

Speaker Verification Performance Improvement Using Weighted Residual Cepstrum (가중된 예측 오차 파라미터를 사용한 화자 확인 성능 개선)

  • 위진우;강철호
    • The Journal of the Acoustical Society of Korea / v.20 no.5 / pp.48-53 / 2001
  • In speaker verification based on LPC analysis, the prediction residues are usually discarded and only the LPCC (LPC cepstrum) is used to compose feature vectors. In this study, LPCC and the residual cepstrum (RCEP), extracted from the prediction residues, are both used as feature parameters for speaker verification in various environments. We propose a weighting function that enlarges inter-speaker variation by emphasizing pitch, a speaker-inherent component contained in the residual cepstrum. Simulation results show that using RCEP together with LPCC improves the average speaker verification rate by 6%, and that the proposed weighted RCEP combined with LPCC yields a further 2.45% improvement over the unweighted case.
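A minimal per-frame sketch of the two feature streams named above, LPCC from the LPC coefficients and RCEP from the prediction residual; the LPC order, cepstrum length, and use of librosa are our assumptions, and the paper's pitch-weighting function is not reproduced here.

```python
# Sketch: LPCC and residual cepstrum (RCEP) for one windowed speech frame.
# order/n_ceps are hypothetical choices, not taken from the paper.
import numpy as np
import librosa
from scipy.signal import lfilter

def lpcc_and_rcep(frame, order=12, n_ceps=12):
    a = librosa.lpc(frame, order=order)          # [1, a1, ..., ap]
    alpha = -a[1:]                               # predictor coefficients
    # Standard LPC -> LPCC recursion: c_n = a_n + sum_{k<n} (k/n) c_k a_{n-k}
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = alpha[n - 1] if n <= order else 0.0
        for k in range(1, n):
            if n - k <= order:
                acc += (k / n) * c[k - 1] * alpha[n - k - 1]
        c[n - 1] = acc
    # Inverse-filter to get the prediction residual, then its real cepstrum.
    # (The paper additionally weights the pitch region of this cepstrum.)
    residual = lfilter(a, [1.0], frame)
    spec = np.abs(np.fft.rfft(residual)) + 1e-10
    rcep = np.fft.irfft(np.log(spec))[:n_ceps]
    return c, rcep
```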

Context Recognition Using Environmental Sound for Client Monitoring System (피보호자 모니터링 시스템을 위한 환경음 기반 상황 인식)

  • Ji, Seung-Eun;Jo, Jun-Yeong;Lee, Chung-Keun;Oh, Siwon;Kim, Wooil
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.2 / pp.343-350 / 2015
  • This paper presents a context recognition method using environmental sound signals, applied to a mobile-based client monitoring system. Seven acoustic contexts are defined and the corresponding environmental sound signals are collected for the experiments. To evaluate context recognition performance, MFCC and LPCC are employed for feature extraction, and statistical pattern recognition is performed using GMM and HMM as acoustic models. The experimental results show that LPCC and HMM are more effective at improving context recognition accuracy than MFCC and GMM, respectively. The recognition system using LPCC and HMM achieves 96.03% recognition accuracy. These results demonstrate that LPCC is effective for representing environmental sounds, which contain more varied frequency components than human speech, and that HMM models the time-varying nature of environmental sounds more effectively than GMM.
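A minimal sketch of the GMM branch of such a recognizer (the HMM branch would be analogous): one Gaussian mixture per acoustic context, classification by maximum average log-likelihood. The context labels, the MFCC front end shown, and the mixture size are our assumptions.

```python
# Sketch: per-context GMMs over frame-level features, maximum-likelihood decision.
import librosa
from sklearn.mixture import GaussianMixture

def train_gmms(features_by_context, n_components=8):
    """features_by_context: {context: (n_frames, n_dims) training features}."""
    return {ctx: GaussianMixture(n_components=n_components).fit(feats)
            for ctx, feats in features_by_context.items()}

def classify(models, y, sr=16000):
    feats = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T   # (frames, dims)
    # score() is the average per-frame log-likelihood under each context model
    return max(models, key=lambda ctx: models[ctx].score(feats))
```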

Speaker Identification using Incremental Neural Network and LPCC (Incremental Neural Network 과 LPCC을 이용한 화자인식)

  • 허광승;박창현;이동욱;심귀보
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2002.12a / pp.341-344 / 2002
  • Speech carries speaker-specific characteristics. This paper introduces a speaker recognition system based on neural-network incremental learning. Sentences recorded through a computer are transformed into the frequency domain by an FFT, and vowels are extracted using formants, which carry the vowels' characteristics. The extracted vowels are then processed by LPC to obtain coefficients that capture the speaker's characteristics. Through LPCC processing and vector quantization, ten feature points are fed as training input, and speaker recognition is performed by a neural network whose hidden and output layers grow with the number of speakers.
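A rough sketch of the vector-quantization step described above, which compresses a speaker's LPCC frames into the ten feature points fed to the network; the codebook size is from the abstract, while the k-means clustering choice is ours.

```python
# Sketch: reduce one speaker's LPCC frames to 10 codebook vectors.
from sklearn.cluster import KMeans

def speaker_codebook(lpcc_frames, n_codes=10):
    """lpcc_frames: (n_frames, n_dims) LPCC vectors from one speaker."""
    km = KMeans(n_clusters=n_codes, n_init=10).fit(lpcc_frames)
    return km.cluster_centers_   # the 10 network inputs per speaker
```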

Parameter Comparison in Speaker Identification under Noisy Environments (화자식별을 위한 파라미터의 잡음환경에서의 성능비교)

  • Choi, Hong-Sub
    • Speech Sciences / v.7 no.3 / pp.185-195 / 2000
  • This paper compares the feature parameters used in speaker identification systems under noisy environments. The compared parameters are the LP cepstrum (LPCC), cepstral mean subtraction (CMS), pole-filtered CMS (PFCMS), the adaptive component weighted cepstrum (ACW), and the postfilter cepstrum (PF). A GMM-based text-independent speaker identification system is designed for this purpose. A series of experiments shows that the LPCC parameter is adequate for modeling the speaker when the training and test environments match. Under mismatched training and testing conditions, however, the modified parameters are preferable to LPCC. In particular, the CMS and PFCMS parameters are more effective under microphone mismatch, while the ACW and PF parameters perform better under noisier mismatches.
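Of the compared parameters, CMS is the simplest to illustrate: the per-utterance mean of each cepstral dimension is subtracted, suppressing stationary channel effects such as a fixed microphone response. A minimal sketch:

```python
# Sketch: cepstral mean subtraction (CMS) over one utterance.
import numpy as np

def cms(cepstra):
    """cepstra: (n_frames, n_ceps) LPCC matrix for one utterance."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)
```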

A Design of a Scream Detecting Engine for Surveillance Systems (보안 시스템을 위한 비명 검출 엔진 설계)

  • Seo, Ji-Hun;Lee, Hye-In;Lee, Seok-Pil
    • The Transactions of The Korean Institute of Electrical Engineers / v.63 no.11 / pp.1559-1563 / 2014
  • Recently, crime prevention using CCTV has drawn special attention in response to rising crime rates, and security systems such as CCTV with audio capability are being developed to raise instant alarms. This paper proposes a scream detection engine for surveillance systems that operates amid the various ambient noises of real environments. The proposed engine detects scream signals among ambient noises using features extracted in the time and frequency domains. Experimental results show that the engine's performance is very promising compared with traditional engines that use model-based features such as LPC, LPCC, and MFCC. The proposed method has low computational complexity because it uses the FFT and cross-correlation coefficients instead of extracting complex features like LPC, LPCC, and MFCC, so it can serve audio-based surveillance systems efficiently at the low SNRs found in the field.
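An illustrative sketch in the spirit of that low-complexity design, combining an FFT band-energy ratio with a cross-correlation measure between consecutive frames; the band edges and the idea of thresholding these two scores are our assumptions rather than the paper's exact features.

```python
# Sketch: two cheap per-frame scores for scream-like signals.
import numpy as np

def scream_scores(frame_a, frame_b, sr=16000, band=(1000, 4000)):
    spec = np.abs(np.fft.rfft(frame_a))
    freqs = np.fft.rfftfreq(len(frame_a), d=1.0 / sr)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    band_ratio = spec[mask].sum() / (spec.sum() + 1e-10)  # high-band energy share
    # Normalized cross-correlation of consecutive frames: sustained screams
    # keep adjacent frames highly correlated, unlike impulsive noises.
    xcorr = np.dot(frame_a, frame_b) / (
        np.linalg.norm(frame_a) * np.linalg.norm(frame_b) + 1e-10)
    return band_ratio, xcorr
```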

Classification of Consonants by SOM and LVQ (SOM과 LVQ에 의한 자음의 분류)

  • Lee, Chai-Bong;Lee, Chang-Young
    • The Journal of the Korea institute of electronic communication sciences / v.6 no.1 / pp.34-42 / 2011
  • As a step toward the practical realization of a phonetic typewriter, this paper concentrates on the classification of consonants. Since many consonants show no periodic behavior in the time domain, the validity of Fourier analysis for them is not self-evident; vector quantization (VQ) via LBG clustering is therefore performed first to check whether MFCC and LPCC feature vectors are meaningful for consonants at all. The VQ results did not support a clear-cut conclusion about the validity of Fourier analysis for consonants. For classification, two kinds of neural networks are employed: the self-organizing map (SOM) and learning vector quantization (LVQ). Results from SOM revealed that some pairs of phonemes are not resolved. Although LVQ is inherently free from this difficulty, its classification accuracy was found to be low, which suggests that, for consonant classification by LVQ, other types of feature vectors should be deployed alongside MFCC. Nevertheless, the MFCC/LVQ combination was not inferior to phoneme classification by a language-model-based approach. Throughout this work, LPCC performed worse than MFCC.
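A minimal sketch of the LBG codebook training used for the VQ check above, assuming a power-of-two codebook size; the split perturbation and iteration count are our choices.

```python
# Sketch: LBG clustering -- split the codebook, then Lloyd-refine.
import numpy as np

def lbg(X, n_codes=16, eps=0.01, n_iter=20):
    """X: (n_vectors, n_dims) MFCC or LPCC features; n_codes: power of two."""
    codebook = X.mean(axis=0, keepdims=True)
    while len(codebook) < n_codes:
        # Split every codeword into a perturbed pair, doubling the codebook.
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(n_iter):
            d = ((X[:, None, :] - codebook[None]) ** 2).sum(-1)
            assign = d.argmin(axis=1)
            for j in range(len(codebook)):
                if np.any(assign == j):
                    codebook[j] = X[assign == j].mean(axis=0)
    return codebook
```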

Audio Fingerprint Retrieval Method Based on Feature Dimension Reduction and Feature Combination

  • Zhang, Qiu-yu;Xu, Fu-jiu;Bai, Jian
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.2 / pp.522-539 / 2021
  • To address the problems of existing audio fingerprinting methods on long speech segments, such as overly large fingerprint dimension, poor robustness, and low retrieval accuracy and efficiency, a robust audio fingerprint retrieval method based on feature dimension reduction and feature combination is proposed. First, the Mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC) of the original speech are extracted, and the MFCC and LPCC feature matrices are combined. Second, an information-entropy-based method reduces the column dimension of the combined matrix, and an energy-based method then reduces its row dimension. Finally, the audio fingerprint is constructed from the reduced feature combination matrix. At retrieval time, matching is performed with the normalized Hamming distance. Experimental results show that the proposed method yields a smaller audio fingerprint dimension and better robustness for long speech segments, and achieves higher retrieval efficiency while maintaining high recall and precision.
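The matching step named above is straightforward to sketch; only the comparison is shown, assuming the fingerprints are already binary arrays of equal length (the fingerprint construction itself is omitted).

```python
# Sketch: normalized Hamming distance (bit error rate) between fingerprints.
import numpy as np

def normalized_hamming(fp_a, fp_b):
    """fp_a, fp_b: equal-length binary (0/1) fingerprint arrays."""
    fp_a, fp_b = np.asarray(fp_a), np.asarray(fp_b)
    return np.count_nonzero(fp_a != fp_b) / fp_a.size
```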

Discriminative Feature Vector Selection for Emotion Classification Based on Speech (음성신호기반의 감정분석을 위한 특징벡터 선택)

  • Choi, Ha-Na;Byun, Sung-Woo;Lee, Seok-Pil
    • The Transactions of The Korean Institute of Electrical Engineers / v.64 no.9 / pp.1363-1368 / 2015
  • As computing hardware has shrunk and wearable devices have proliferated, a computer's ability to recognize human emotion has become an important concern, and research on analyzing emotional states is increasing. The human voice carries much information about emotion. This paper proposes a discriminative feature vector selection method for speech-based emotion classification. Feature vectors such as pitch, MFCC, LPC, and LPCC are extracted from voice signals labeled with four emotions (happy, neutral, sad, angry), and the separability of the extracted feature vectors is compared using the Bhattacharyya distance, so that the more effective feature vectors can be recommended for emotion classification.
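For two feature classes modeled as Gaussians (the Gaussian assumption is ours), the Bhattacharyya distance used above has the closed form sketched below; larger values indicate better separability of a feature for a pair of emotions.

```python
# Sketch: Bhattacharyya distance between two Gaussian class models.
import numpy as np

def bhattacharyya(mu1, cov1, mu2, cov2):
    cov = (cov1 + cov2) / 2.0
    diff = mu1 - mu2
    term1 = diff @ np.linalg.solve(cov, diff) / 8.0   # mean separation
    # term2 grows with the covariance mismatch between the two classes
    term2 = 0.5 * np.log(np.linalg.det(cov) /
                         np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2
```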

A Method of Evaluating Korean Articulation Quality for Rehabilitation of Articulation Disorder in Children

  • Lee, Keonsoo;Nam, Yunyoung
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.8 / pp.3257-3269 / 2020
  • Articulation disorders are characterized by an inability to achieve clear pronunciation due to misuse of the articulators. This paper proposes a method of detecting such disorders by comparison against standard pronunciations. The method defines standard pronunciations by clustering the speech of normally developing children using three features: linear predictive cepstral coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC), and relative spectral analysis perceptual linear prediction (RASTA-PLP). By calculating the distance between the centroid of the standard pronunciation and an input pronunciation, disordered speech whose features fall outside the cluster is detected. 89 children (58 normally developing and 31 with disorders) were recruited, 35 U-TAP test words were selected, and each word's standard pronunciation was built from the normal children's speech and compared with each pronunciation of the children with disorders. In the experiments, the disordered pronunciations were successfully distinguished from the standard pronunciations.
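The detection rule described above can be sketched as a centroid-plus-cutoff test per word; the specific cutoff (mean plus two standard deviations of the normal children's distances) is a hypothetical choice, not the paper's.

```python
# Sketch: fit a standard-pronunciation cluster, flag out-of-cluster inputs.
import numpy as np

def fit_standard(features):
    """features: (n, d) feature vectors from normal children for one word."""
    centroid = features.mean(axis=0)
    dists = np.linalg.norm(features - centroid, axis=1)
    cutoff = dists.mean() + 2 * dists.std()   # hypothetical threshold
    return centroid, cutoff

def is_disordered(feature, centroid, cutoff):
    return np.linalg.norm(feature - centroid) > cutoff
```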

Performance Comparison of Automatic Detection of Laryngeal Diseases by Voice (후두질환 음성의 자동 식별 성능 비교)

  • Kang Hyun Min;Kim Soo Mi;Kim Yoo Shin;Kim Hyung Soon;Jo Cheol-Woo;Yang Byunggon;Wang Soo-Geun
    • MALSORI / no.45 / pp.35-45 / 2003
  • Laryngeal diseases cause significant changes in the quality of speech production, and automatic detection of laryngeal diseases by voice is attractive because of its nonintrusive nature. In this paper, we apply speech recognition techniques to the detection of laryngeal cancer and investigate which feature parameters and classification methods are appropriate for this purpose. Linear predictive cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC) are examined as feature parameters, along with parameters reflecting the periodicity of speech and its perturbation. As classifiers, multilayer perceptron neural networks and Gaussian mixture models (GMM) are employed. According to our experiments, higher-order LPCC combined with the periodicity information parameters yields the best performance.
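A hedged sketch of the periodicity side of that winning feature set: a frame-wise pitch track and a crude jitter-like perturbation measure to concatenate with higher-order LPCC. The YIN-based estimate and the pitch range are our assumptions, not the paper's parameters.

```python
# Sketch: simple periodicity/perturbation features to append to LPCC.
import numpy as np
import librosa

def periodicity_features(y, sr=16000):
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)     # frame-wise pitch (Hz)
    periods = 1.0 / f0
    # Jitter-like measure: mean cycle-to-cycle period change, normalized.
    jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods)
    return np.array([np.mean(f0), jitter])
```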
