A Variable Selection Procedure for K-Means Clustering
Kim, Sung-Soo
One of the most important problems in cluster analysis is selecting the variables that truly define the cluster structure while eliminating the noisy variables that mask it. Brusco and Cradit (2001) presented VS-KM (variable-selection heuristic for K-means clustering), a procedure for selecting true variables for K-means clustering based on the adjusted Rand index. VS-KM starts with a fixed number of clusters in K-means and adds variables sequentially based on the adjusted Rand index. This paper presents an updated procedure that combines VS-KM with the automated K-means procedure of Kim (2009). The resulting automated variable selection procedure recalculates the number of clusters and the initial cluster centers whenever a new variable is added, and adds a variable based on the adjusted Rand index. Simulation results indicate that the proposed procedure is very effective at selecting true variables and at eliminating noisy variables. The program, implemented in R, can be obtained from the website " and vnvarkm.r".
K-means clustering; variable selection; Mojena's stopping rule; VS-KM; HINoV; adjusted Rand index
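
The adjusted Rand index that drives the selection step can be illustrated with a short sketch. The code below is not the paper's R program; it is a minimal Python version of the Hubert and Arabie (1985) index, and the function name `adjusted_rand_index` and the handling of degenerate inputs are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same objects.

    Illustrative sketch only; the name and the degenerate-case handling
    are assumptions, not taken from the paper's R implementation.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    n = len(labels_a)
    if n < 2:
        # No pairs of objects to compare; treat as perfect agreement.
        return 1.0

    # Contingency counts n_ij and the cluster sizes of each partition.
    joint = Counter(zip(labels_a, labels_b))
    sizes_a = Counter(labels_a)
    sizes_b = Counter(labels_b)

    # Numbers of object pairs placed together in both / in each partition.
    sum_ij = sum(comb(c, 2) for c in joint.values())
    sum_a = sum(comb(c, 2) for c in sizes_a.values())
    sum_b = sum(comb(c, 2) for c in sizes_b.values())

    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    maximum = (sum_a + sum_b) / 2          # largest attainable agreement
    if maximum == expected:
        # Degenerate case, e.g. both partitions are a single cluster.
        return 1.0
    return (sum_ij - expected) / (maximum - expected)
```

The index equals 1 when two partitions agree up to a relabeling of clusters and is close to 0 when they agree no more than chance would predict, which is why it is a natural yardstick for judging whether a candidate variable recovers a similar cluster structure.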
 Cited by
Variable Selection and Outlier Detection for Automated K-means Clustering, Communications for Statistical Applications and Methods, 2015, vol. 22, no. 1, pp. 55-67.
Operational Management System and Characteristics Analysis on the Rural Experience Programs: the Case of Comprehensive Rural Village Development Projects, 황한철; 노용식; 박정수, Journal of Korean Society of Rural Planning, 2015, vol. 21, no. 2, pp. 103-114.
 References
Banfield, J. D. and Raftery, A. E. (1993). Model-based Gaussian and non-Gaussian clustering, Biometrics, 49, 803-821.

Brusco, M. J. and Cradit, J. D. (2001). A variable-selection heuristic for K-means clustering, Psychometrika, 66, 249-270.

Carmone, F. J., Kara, A. and Maxwell, S. (1999). HINoV: A new model to improve market segmentation by identifying noisy variables, Journal of Marketing Research, 36, 501-509.

De Sarbo, W. S., Carroll, J. D., Clark, L. A. and Green, P. E. (1984). Synthesized clustering: A method for amalgamating alternative clustering bases with different weighting of variables, Psychometrika, 49, 57-78.

De Soete, G. (1986). Optimal variable weighting for ultrametric and additive tree clustering, Quality and Quantity, 20, 169-180.

Everitt, B. S., Landau, S. and Leese, M. (2001). Cluster Analysis, Arnold.

Fowlkes, E. B., Gnanadesikan, R. and Kettenring, J. R. (1987). Variable selection in clustering and other contexts, In C. L. Mallows (Ed.), Design, Data and Analysis, 13-34.

Fowlkes, E. B., Gnanadesikan, R. and Kettenring, J. R. (1988). Variable selection in clustering, Journal of Classification, 5, 205-228.

Fowlkes, E. B. and Mallows, C. L. (1983). A method for comparing two hierarchical clusterings (with comments and rejoinder), Journal of the American Statistical Association, 78, 553-584.

Fraley, C. and Raftery, A. E. (1998). How many clusters? Which clustering methods? Answers via model-based cluster analysis, Computer Journal, 41, 578-588.

Gnanadesikan, R., Kettenring, J. R. and Tsao, S. L. (1995). Weighting and selection of variables for cluster analysis, Journal of Classification, 7, 271-285.

Hubert, L. and Arabie, P. (1985). Comparing partitions, Journal of Classification, 2, 193-218.

Kim, S. (1999). Interactive visualization of K-means and Hierarchical clusters, The Journal of Data Science and Classification, 3, 13-27.

Kim, S. (2009). Automated K-means clustering and R implementation, The Korean Journal of Applied Statistics, 22, 723-733.

Kim, S.-G. (2011). Variable selection in normal mixture model based clustering under heteroscedasticity, The Korean Journal of Applied Statistics, 24, 1213-1224.

Kim, S., Kwon, S. and Cook, D. (2000). Interactive visualization of hierarchical clusters using MDS and MST, Metrika, 51, 39-51.

Milligan, G. W. (1980a). An examination of the effect of six types of error perturbation on fifteen clustering algorithms, Psychometrika, 45, 325-342.

Milligan, G. W. (1980b). An algorithm for generating artificial test clusters, Psychometrika, 50, 123-127.

Milligan, G. W. (1989). A validation study of a variable-weighting algorithm for cluster analysis, Journal of Classification, 6, 53-71.

Milligan, G. and Cooper, M. C. (1985). An examination of procedures for determining the number of clusters in a data set, Psychometrika, 50, 159-179.

Mojena, R. (1977). Hierarchical grouping methods and stopping rules: An evaluation, The Computer Journal, 20, 359-363.

Mojena, R., Wishart, D. and Andrews, G. B. (1980). Stopping rules for Ward's clustering method, COMPSTAT, 426-432.

Raftery, A. E. and Dean, N. (2006). Variable selection for model-based clustering, Journal of the American Statistical Association, 101, 168-178.

Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods, Journal of the American Statistical Association, 66, 846-850.

Qiu, W.-L. and Joe, H. (2006). Generation of random clusters with specified degree of separation, Journal of Classification, 23, 315-334.

Steinley, D. and Brusco, M. J. (2008). A new variable weighting and selection procedure for K-means cluster analysis, Multivariate Behavioral Research, 43, 77-108.

Waller, N. G., Underhill, J. M. and Kaiser, H. (1999). A method for generating simulated plasmodes and artificial test clusters with user-defined shape, size, and orientation, Multivariate Behavioral Research, 34, 123-142.

Ward, J. H. (1963). Hierarchical grouping to optimize an objective function, Journal of the American Statistical Association, 58, 236-244.