• Title/Summary/Keyword: optimal number of clusters

Determining the Optimal Number of Signal Clusters Using Iterative HMM Classification

  • Ernest, Duker Junior;Kim, Yoon Joong
    • International journal of advanced smart convergence / v.7 no.2 / pp.33-37 / 2018
  • In this study, we propose an iterative clustering algorithm that automatically partitions a set of unlabeled voice signal data into an optimal number of clusters and generates an HMM for each cluster. During clustering, the likelihood of each candidate clustering is computed by iteratively training and testing HMMs while varying the number of clusters for the given data, and maximum likelihood estimation is used to determine the optimal number of clusters. We tested the effectiveness of this algorithm on a small-vocabulary digit clustering task by mapping the unsupervised decoded output of the optimal clustering to the ground-truth transcription, and found that the two were highly correlated.

Group Search Optimization Data Clustering Using Silhouette (실루엣을 적용한 그룹탐색 최적화 데이터클러스터링)

  • Kim, Sung-Soo;Baek, Jun-Young;Kang, Bum-Soo
    • Journal of the Korean Operations Research and Management Science Society / v.42 no.3 / pp.25-34 / 2017
  • K-means is a popular and efficient data clustering method that uses only intra-cluster distance as its validity index and requires the number of clusters to be fixed in advance. Without a suitable number of clusters, K-means is of little use on unlabeled data. This paper proposes Group Search Optimization (GSO) using Silhouette to find the optimal clustering solution, together with the number of clusters, for unlabeled data. Silhouette can serve as a validity index for deciding both the number of clusters and the optimal solution because it considers intra- and inter-cluster distances simultaneously. The performance of GSO using Silhouette is validated through experiments on, and analysis of, several data sets.
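
The idea of using Silhouette as a validity index to pick the number of clusters can be sketched as follows. This is a minimal illustration, not the paper's method: plain k-means with a deterministic farthest-point initialisation stands in for GSO, and the function names `kmeans_labels` and `mean_silhouette` and the toy data are assumptions made for this example.

```python
import math

def kmeans_labels(points, k, iters=50):
    """Plain k-means (a stand-in for GSO); farthest-point init keeps it deterministic."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute the centers.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

def mean_silhouette(points, labels):
    """Mean of s(i) = (b - a) / max(a, b); singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0  # convention for this sketch: undefined for a single cluster
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean intra-cluster distance; b: mean distance to the nearest other cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical toy data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]

scores = {k: mean_silhouette(points, kmeans_labels(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

On this toy data the three-group partition scores highest, because splitting or merging any group lowers the ratio of inter- to intra-cluster distance for the affected points.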

An Optimal Clustering using Hybrid Self Organizing Map

  • Jun, Sung-Hae
    • International Journal of Fuzzy Logic and Intelligent Systems / v.6 no.1 / pp.10-14 / 2006
  • Many clustering methods have been studied. Most of them require the number of clusters to be determined beforehand, but few methods determine the number of population clusters objectively, and choosing it is difficult. In general, the number of clusters is decided from subjective prior knowledge. Because clustering results depend on the number of clusters, it must be chosen carefully. In this paper, we propose an efficient method for determining the number of clusters using a hybrid self-organizing map and a new criterion for evaluating the clustering result. In the experiment, we verify our model by comparing it with other clustering methods on data sets from the UCI machine learning repository.

Fast Search Algorithm for Determining the Optimal Number of Clusters using Cluster Validity Index (클러스터 타당성 평가기준을 이용한 최적의 클러스터 수 결정을 위한 고속 탐색 알고리즘)

  • Lee, Sang-Wook
    • The Journal of the Korea Contents Association / v.9 no.9 / pp.80-89 / 2009
  • A fast and efficient search algorithm for determining the optimal number of clusters in clustering algorithms is presented. The method is based on a cluster validity index, which is a measure of clustering optimality. As the clustering procedure progresses and reaches an optimal cluster configuration, the cluster validity index is expected to be minimized or maximized. In this paper, a fast non-exhaustive search method for finding the optimal number of clusters is designed and shown to work well in clustering. The proposed algorithm is implemented with the k-means++ algorithm as the underlying clustering technique, using CB and PBM as cluster validity indices. Experimental results show that the proposed method reduces computation time without loss of accuracy on several artificial and real-life data sets.

The Effect of the Number of Clusters on Speech Recognition with Clustering by ART2/LBG

  • Lee, Chang-Young
    • Phonetics and Speech Sciences / v.1 no.2 / pp.3-8 / 2009
  • In an effort to improve speech recognition, we investigated the effect of the number of clusters. In the usual LBG clustering, the number of codebook clusters is doubled at each bifurcation and hence cannot be chosen arbitrarily in a natural way. To control the number of clusters, we combined adaptive resonance theory (ART2) with LBG and performed the clustering in two stages. The codebook thus formed was used in subsequent fuzzy vector quantization (FVQ) and HMM processing for speech recognition tests. Compared to conventional LBG, our method was shown to reduce the best recognition error rate by 0 to 0.9%, depending on the vocabulary size. The results also showed that the optimal number of clusters lies between 400 and 800, in the limits of small- and large-vocabulary recognition of isolated words, respectively.

Optimal Combination of VNTR Typing for Discrimination of Isolated Mycobacterium tuberculosis in Korea

  • Lee, Jihye;Kang, Heeyoon;Kim, Sarang;Yoo, Heekyung;Kim, Hee Jin;Park, Young Kil
    • Tuberculosis and Respiratory Diseases / v.76 no.2 / pp.59-65 / 2014
  • Background: Variable-number tandem repeat (VNTR) typing is a promising method for discriminating Mycobacterium tuberculosis isolates in molecular epidemiology. The purpose of this study is to determine the optimal VNTR combinations for discriminating isolated M. tuberculosis strains in Korea. Methods: A total of 317 clinical isolates collected throughout Korea were genotyped using IS6110 restriction fragment length polymorphism (RFLP), and then analysed for the number of VNTR copies at 32 VNTR loci. Results: The discriminatory power of the various combinations was as follows: 25 clusters in 83 strains from the internationally standardized 15 VNTR loci (Hunter-Gaston discriminatory index [HGDI], 0.9958), 25 clusters in 65 strains using IS6110 RFLP (HGDI, 0.9977), 14 clusters in 32 strains from 12 hyper-variable VNTR loci (HGDI, 0.9995), 6 clusters in 13 strains from 32 VNTR loci (HGDI, 0.9998), and 7 clusters in 14 strains from both the 12 hyper-variable VNTR loci and IS6110 RFLP (HGDI, 0.9999). Conclusion: The combination of 12 hyper-variable VNTR loci can be an effective tool for genotyping Korean M. tuberculosis isolates, among which the Beijing strains are predominant.
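
The Hunter-Gaston discriminatory index quoted above is commonly defined as D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of isolates of type j and N is the total number of isolates. A minimal sketch of that calculation follows; the function name `hgdi` and the type counts are hypothetical, and the abstract's own values cannot be reproduced here because it does not give the individual cluster sizes.

```python
def hgdi(type_counts):
    """Hunter-Gaston index: probability that two isolates drawn at random
    (without replacement) belong to different types."""
    n_total = sum(type_counts)
    if n_total < 2:
        raise ValueError("need at least two isolates")
    same_type_pairs = sum(n * (n - 1) for n in type_counts)
    return 1 - same_type_pairs / (n_total * (n_total - 1))

# Hypothetical example: 8 isolates, one cluster of 3, one of 2, three unique types.
print(round(hgdi([3, 2, 1, 1, 1]), 4))  # 0.8571
```

Unique isolates enter as counts of 1 and contribute nothing to the sum, so the index approaches 1 as clustered strains become a smaller share of the collection.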

A Two-Stage Method for Near-Optimal Clustering (최적에 가까운 군집화를 위한 이단계 방법)

  • 윤복식
    • Journal of the Korean Operations Research and Management Science Society / v.29 no.1 / pp.43-56 / 2004
  • The purpose of clustering is to partition a set of objects into several clusters based on some appropriate similarity measure. In most cases, clustering is performed without any prior information on the number of clusters or the structure of the given data, which makes it an example of a very complicated combinatorial optimization problem. In this paper we propose a general-purpose clustering method that can determine the proper number of clusters as well as efficiently carry out clustering analysis for various types of data. The method is composed of two stages. In the first stage, two different hierarchical clustering methods are used to obtain a reasonably good clustering result, which is improved in the second stage by an ASA (accelerated simulated annealing) algorithm equipped with specially designed perturbation schemes. Extensive experimental results are given to demonstrate the usefulness of our ASA clustering method.

The Effect of the Number of Phoneme Clusters on Speech Recognition (음성 인식에서 음소 클러스터 수의 효과)

  • Lee, Chang-Young
    • The Journal of the Korea institute of electronic communication sciences / v.9 no.11 / pp.1221-1226 / 2014
  • In an effort to improve the efficiency of speech recognition, we investigate the effect of the number of phoneme clusters. For this purpose, codebooks with varying numbers of phoneme clusters are prepared by a modified k-means clustering algorithm. The subsequent processing uses fuzzy vector quantization (FVQ) and a hidden Markov model (HMM) for speech recognition tests. The results show two distinct regimes. For a large number of phoneme clusters, the recognition performance is roughly independent of that number. For a small number of phoneme clusters, however, the recognition error rate increases nonlinearly as the number decreases. Numerical calculation suggests that this nonlinear regime might be modeled by a power-law function. The results also show that about 166 phoneme clusters would be optimal for recognition of 300 isolated words. This amounts to roughly 3 variations per phoneme.
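
One common way to check a power-law relationship such as error = a · n^b is a straight-line least-squares fit on log-log axes. The sketch below illustrates that approach only; it is not necessarily the numerical procedure used in the paper, the name `fit_power_law` is an assumption, and the data are synthetic values generated from a known power law rather than measurements.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b via linear regression on (log x, log y)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x, mean_y = sum(log_x) / n, sum(log_y) / n
    slope = (sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
             / sum((u - mean_x) ** 2 for u in log_x))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope  # (a, b)

# Synthetic data only: error rates generated from 50 * n**-1.2.
cluster_counts = [10, 20, 40, 80]
error_rates = [50.0 * n ** -1.2 for n in cluster_counts]

a, b = fit_power_law(cluster_counts, error_rates)
print(round(a, 3), round(b, 3))
```

A negative exponent b corresponds to the behaviour described above, where the error rate rises as the number of clusters shrinks.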

Improvement of Self Organizing Maps using Gap Statistic and Probability Distribution

  • Jun, Sung-Hae
    • International Journal of Fuzzy Logic and Intelligent Systems / v.8 no.2 / pp.116-120 / 2008
  • Clustering is a method for unsupervised learning. General clustering tools have depended on statistical methods and machine learning algorithms. One of the popular clustering algorithms based on machine learning is the self-organizing map (SOM), a neural network model for clustering. SOM and extended SOM have been used in diverse classification and clustering fields such as data mining. However, SOM has difficulty determining the optimal number of clusters. In this paper, we propose an improvement of SOM using the gap statistic and probability distributions. The gap statistic was introduced to estimate the number of clusters in a data set, and we use it to address this problem of SOM. In addition, the weights of the feature nodes are updated through probability distributions. After updating is completed according to the prior and posterior distributions, the weights of the SOM have probability distributions for optimal clustering. To verify the improved performance of our work, we run experiments comparing it with other learning algorithms on simulated data sets.

A Study of optimized clustering method based on SOM for CRM

  • Jong T. Rhee;Lee, Joon.
    • Proceedings of the Korea Inteligent Information System Society Conference / 2001.01a / pp.464-469 / 2001
  • Customer relationship management (CRM) is an advanced marketing support system that analyzes customers' transaction data and classifies or targets customer groups to effectively increase market share and profit. Many engines have been developed to implement this function, and those for classification and clustering are considered the core ones. In this study, an improved clustering method based on self-organizing maps (SOM) is proposed. The proposed method finds the optimal number of clusters so that the effectiveness of clustering is increased. It considers all the data types existing in CRM data warehouses. In particular, an adaptive algorithm that applies the concepts of degeneration and fusion is used to find the optimal number of clusters. The feasibility and efficiency of the proposed method are demonstrated through simulation with simplified customer data.
