Variable Selection in Normal Mixture Model Based Clustering under Heteroscedasticity
Kim, Seung-Gu;
In high-dimensional settings where the number of variables far exceeds the number of observations, noninformative variables must be removed in order to cluster the observations. Most model-based approaches to variable selection assume homoscedasticity, and their models are mainly estimated by a penalized likelihood method. This paper proposes a different approach that removes the noninformative variables effectively and clusters the observations simultaneously, based on a modified normal mixture model. The validity of the model is established and an EM algorithm is derived to estimate the parameters. Simulation studies and an experiment using a real microarray dataset show the effectiveness of the proposed method.
Informative variables; variable selection; clustering; EM algorithm; microarray gene expression
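To make the setting concrete, the following is a minimal sketch of a plain EM algorithm for a normal mixture whose components each have their own diagonal variances, which is one simple way to express heteroscedasticity. It is not the modified model or the variable selection step proposed in the paper, whose details the abstract does not give. The function names `em_diag_gmm` and `_e_step`, the initialisation, and the variance floor `var_floor` are assumptions made for illustration.

```python
import numpy as np


def _e_step(X, pi, mu, var):
    """Return posterior memberships tau (n, k) and the observed-data log-likelihood."""
    diff = X[:, None, :] - mu[None, :, :]                      # (n, k, p)
    # Log density of N(x_i | mu_g, diag(var_g)), summed over the p variables.
    log_dens = -0.5 * (np.log(2 * np.pi * var)[None] + diff**2 / var[None]).sum(axis=2)
    log_w = np.log(pi)[None, :] + log_dens                     # (n, k)
    # Log-sum-exp over components for numerical stability.
    m = log_w.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(log_w - m).sum(axis=1, keepdims=True))
    return np.exp(log_w - log_norm), float(log_norm.sum())


def em_diag_gmm(X, k, n_iter=50, seed=0, var_floor=1e-6):
    """EM for a k-component normal mixture with component-specific diagonal variances.

    Illustrative only: no variable selection, and no guard against empty components.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Assumed initialisation: k distinct observations as means, pooled variances.
    mu = X[rng.choice(n, size=k, replace=False)].copy()
    var = np.maximum(np.tile(X.var(axis=0), (k, 1)), var_floor)
    pi = np.full(k, 1.0 / k)
    loglik = []
    for _ in range(n_iter):
        tau, ll = _e_step(X, pi, mu, var)                      # E-step
        loglik.append(ll)
        nk = tau.sum(axis=0)                                   # effective cluster sizes
        pi = nk / n                                            # M-step: mixing proportions
        mu = (tau.T @ X) / nk[:, None]                         # M-step: component means
        diff = X[:, None, :] - mu[None, :, :]
        var = (tau[:, :, None] * diff**2).sum(axis=0) / nk[:, None]
        var = np.maximum(var, var_floor)                       # keep variances positive
    tau, ll = _e_step(X, pi, mu, var)                          # memberships at final estimates
    loglik.append(ll)
    return pi, mu, var, tau, loglik
```

Calling `em_diag_gmm(X, 2)` on an observations-by-variables array returns the mixing proportions, the component means, the component-specific variances, the posterior memberships, and the log-likelihood trace. Each observation can then be assigned to the component with the largest membership, for example with `tau.argmax(axis=1)`.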
 Cited by
A Variable Selection Procedure for K-Means Clustering, Korean Journal of Applied Statistics, 2012, vol. 25, no. 3, pp. 471-483.
Variable Selection in Clustering by Recursive Fit of Normal Distribution-based Salient Mixture Model, Kim, Seung-Gu, Korean Journal of Applied Statistics, 2013, vol. 26, no. 5, pp. 821-834.
 References
Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A. and Bloomfield, C. D. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring, Science, 286, 531-537.

Kim, S.-G. (2006). Use of factor analyzer normal mixture model with mean pattern modeling on clustering genes, Communications of the Korean Statistical Society, 13, 113-123. (Korean with English abstract)

McLachlan, G. J., Bean, R. W. and Ben-Tovim Jones, L. (2006). A simple implementation of a normal mixture approach to differential gene expression in multiclass microarrays, Bioinformatics, 22, 1608-1615.

McLachlan, G. J. and Peel, D. (2000). Finite Mixture Models, John Wiley & Sons.

Meng, X.-L. and Rubin, D. (1993). Maximum likelihood estimation via the ECM algorithm: A general framework, Biometrika, 80, 267-278.

Ng, S. K., McLachlan, G. J., Wang, K., Ben-Tovim Jones, L. and Ng, S. W. (2006). A mixture model with random-effects components for clustering correlated gene-expression profiles, Bioinformatics, 22, 1745-1752.

Pan, W. and Shen, X. (2007). Penalized model-based clustering with application to variable selection, Journal of Machine Learning Research, 8, 1145-1164.

Raftery, A. E. and Dean, N. (2006). Variable selection for model-based clustering, Journal of the American Statistical Association, 101, 168-178.

Schwarz, G. (1978). Estimating the dimension of a model, Annals of Statistics, 6, 461-464.

Wang, S. and Zhu, J. (2008). Variable selection for model-based high-dimensional clustering and its application to microarray data, Biometrics, 64, 440-448.

Xie, B., Pan, W. and Shen, X. (2008). Variable selection in penalized model-based clustering via regularization on grouped parameters, Biometrics, 64, 921-930.