Automated K-Means Clustering and R Implementation
 Title & Authors
Kim, Sung-Soo
 Abstract
The crucial problems in K-means clustering are deciding the number of clusters and the initial cluster centroids. Hence, K-means clustering is generally carried out as a two-stage procedure: the first stage runs hierarchical clustering to obtain the number of clusters and the cluster centroids, and the second stage runs nonhierarchical K-means clustering using the results of the first stage. Here we provide an automated K-means clustering procedure for obtaining the initial cluster centroids, which is also useful for large data sets, and provide a software program implemented in R.
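The two-stage procedure can be illustrated with a minimal sketch. This is written in Python and is not the author's R program. It assumes Euclidean data and uses the increase in within-cluster sum of squares caused by each Ward merge as the fusion height. It applies Mojena's upper-tail rule: stop before the first merge whose height exceeds the mean plus `sd_mult` standard deviations of all heights. The default `sd_mult=1.25` is an assumption, not a value from the paper. The second stage uses Lloyd's K-means rather than the Hartigan-Wong algorithm that R's `kmeans()` uses by default. The function names `ward`, `mojena_k`, `kmeans` and `automated_kmeans` are hypothetical.

```python
"""Sketch of two-stage automated K-means: Ward's method + Mojena's rule
choose k and seed centroids; Lloyd's K-means then refines the partition."""
from statistics import mean, stdev


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def ward(points, stop_at=1):
    """Naive O(n^3) Ward agglomeration down to `stop_at` clusters.

    Returns (clusters, heights): clusters are lists of point indices, and
    heights[s] is the increase in within-cluster sum of squares caused by
    merge number s (assumed here as the fusion level for Mojena's rule).
    """
    clusters = [[i] for i in range(len(points))]
    cents = [tuple(p) for p in points]
    heights = []
    while len(clusters) > stop_at:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ni, nj = len(clusters[i]), len(clusters[j])
                cost = ni * nj / (ni + nj) * sq_dist(cents[i], cents[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        heights.append(cost)
        clusters[i] = clusters[i] + clusters[j]
        cents[i] = centroid([points[m] for m in clusters[i]])
        del clusters[j], cents[j]  # j > i, so index i is unaffected
    return clusters, heights


def mojena_k(heights, n, sd_mult=1.25):
    """Mojena's upper-tail rule: stop before the first merge whose height
    exceeds mean + sd_mult * sd of all heights (needs >= 2 heights)."""
    threshold = mean(heights) + sd_mult * stdev(heights)
    for step, h in enumerate(heights):
        if h > threshold:
            return n - step  # number of clusters just before this merge
    return 1


def kmeans(points, seeds, max_iter=100):
    """Lloyd's K-means started from the given seed centroids."""
    cents = [tuple(c) for c in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(cents)), key=lambda c: sq_dist(p, cents[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(len(cents)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                cents[c] = centroid(members)
    return labels, cents


def automated_kmeans(points, sd_mult=1.25):
    """Stage 1: Ward + Mojena pick k and seeds. Stage 2: K-means."""
    _, heights = ward(points)
    k = mojena_k(heights, len(points), sd_mult)
    clusters, _ = ward(points, stop_at=k)  # rerun Ward, cut at k clusters
    seeds = [centroid([points[i] for i in c]) for c in clusters]
    labels, cents = kmeans(points, seeds)
    return k, labels, cents


if __name__ == "__main__":
    # Three well-separated unit squares of four points each (toy data).
    data = [(cx + dx, cy + dy)
            for cx, cy in [(0, 0), (100, 0), (50, 87)]
            for dx in (0, 1) for dy in (0, 1)]
    k, labels, cents = automated_kmeans(data)
    print(k, labels)  # 3 [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

On the toy data, the nine within-square merges have heights of 0.5 or 1.0, and the two between-square merges have heights of about 20000. Mojena's threshold falls at roughly 13800, so the rule keeps three clusters, and K-means started from the Ward centroids converges immediately. The naive Ward loop costs O(n^3) time and is run twice, so it only suits small examples. It does not show how the paper handles large data sets.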
 Keywords
K-means clustering; Ward's method; Mojena's stopping rule; model-based clustering; BIC (Bayesian Information Criterion); automated K-means clustering
 Language
Korean
 Cited by
1.
A Variable Selection Procedure for K-Means Clustering, Korean Journal of Applied Statistics, 2012, 25(3), 471-483
2.
Variable Selection and Outlier Detection for Automated K-means Clustering, Communications for Statistical Applications and Methods, 2015, 22(1), 55-67
 References
1.
Kim, S. S. (1999). K-means and hierarchical cluster analysis using statistical graphics, 한국분류학회지 (Journal of the Korean Classification Society), 3, 13-27

2.
Huh, M. H. and Lee, Y. G. (2004). Reproducibility evaluation of K-means clustering and its applications, Korean Journal of Applied Statistics, 17, 135-144

3.
Banfield, J. D. and Raftery, A. E. (1993). Model-based Gaussian and non-Gaussian clustering, Biometrics, 49, 803-821

4.
Brusco, M. J. and Cradit, J. D. (2001). A variable-selection heuristic for K-means clustering, Psychometrika, 66, 249-270

5.
Chen, J. S., Ching, R. K. H. and Lin, Y. S. (2004). An extended study of the K-means algorithm for data clustering and its applications, The Journal of the Operational Research Society, 55, 976-987

6.
Dasgupta, A. and Raftery, A. E. (1998). Detecting features in spatial point processes with clutter via model-based clustering, Journal of the American Statistical Association, 93, 294-302

7.
Everitt, B. S., Landau, S. and Leese, M. (2001). Cluster Analysis, Arnold, London

8.
Fraley, C. (1998). Algorithms for model-based Gaussian hierarchical clustering, SIAM Journal on Scientific Computing, 20, 270-281

9.
Fraley, C. and Raftery, A. E. (1998). How many clusters? Which clustering methods? Answers via model-based cluster analysis, The Computer Journal, 41, 578-588

10.
Fraley, C. and Raftery, A. E. (2006). MCLUST Version 3 for R: Normal Mixture Modeling and Model-Based Clustering, Technical Report No. 504, Department of Statistics, University of Washington

11.
Hartigan, J. A. and Wong, M. A. (1979). A K-means clustering algorithm, Applied Statistics, 28, 100-108

12.
Kim, S. S., Kwon, S. and Cook, D. (2000). Interactive visualization of hierarchical clusters using MDS and MST, Metrika, 51, 39-51

13.
Krzanowski, W. J. (1988). Principles of Multivariate Analysis, Oxford Science, Oxford

14.
Milligan, G. and Cooper, M. C. (1985). An examination of procedures for determining the number of clusters in a data set, Psychometrika, 50, 159-179

15.
Mojena, R. (1977). Hierarchical grouping methods and stopping rules: An evaluation, The Computer Journal, 20, 359-363

16.
Mojena, R., Wishart, D. and Andrews, G. B. (1980). Stopping rules for Ward's clustering method, COMPSTAT, 426-432

17.
Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods, Journal of the American Statistical Association, 66, 846-850

18.
SPSS (2000). Clementine Application Templates for Telecommunication Industries (Telco CAT), Chicago, SPSS Inc.

19.
Stanford, D. C. and Raftery, A. E. (2000). Principal curve clustering with noise, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 601-609

20.
Ward, J. H. (1963). Hierarchical grouping to optimize an objective function, Journal of the American Statistical Association, 58, 236-244

21.
Wehrens, R., Buydens, L. M. C., Fraley, C. and Raftery, A. E. (2004). Model-based clustering for image segmentation and large data sets via sampling, Journal of Classification, 21, 231-253