Variable Selection in Clustering by Recursive Fit of Normal Distribution-based Salient Mixture Model

  • Kim, Seung-Gu (Department of Data and Information, Sangji University)
  • Received : 2013.08.26
  • Reviewed : 2013.10.21
  • Published : 2013.10.31

Abstract

Law et al. (2004) proposed a normal distribution-based "salient mixture model" for variable selection in clustering. However, this model has substantial problems, such as the unidentifiability of its components and the inaccurate selection of informative variables when the cluster size is small. This paper points out these problems in fitting the model and in its variable selection, and proposes an alternative method. Experiments on simulated data and real data show that the proposed method is more useful than the existing one.

References

  1. Bouguila, N., Almakadmeh, K. and Boutemedjet, S. (2012). A finite mixture model for simultaneous high-dimensional clustering, localized feature selection and outlier rejection, Expert Systems with Applications, 39, 6641-6656. https://doi.org/10.1016/j.eswa.2011.12.038
  2. Bouguila, N. and Ziou, D. (2006). A hybrid SEM algorithm for high-dimensional unsupervised learning using a finite generalized Dirichlet mixture, IEEE Transactions on Image Processing, 15, 2657-2668. https://doi.org/10.1109/TIP.2006.877379
  3. Boutemedjet, S., Bouguila, N. and Ziou, D. (2009). A hybrid feature extraction selection approach for high-dimensional non-Gaussian data clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 1429-1443. https://doi.org/10.1109/TPAMI.2008.155
  4. Elguebaly, T. and Bouguila, N. (2013). Simultaneous Bayesian clustering and feature selection using RJMCMC-based learning of finite generalized Dirichlet mixture models, Signal Processing, 93, 1531-1546. https://doi.org/10.1016/j.sigpro.2012.07.037
  5. Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A. and Bloomfield, C. D. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring, Science, 286, 531-537. https://doi.org/10.1126/science.286.5439.531
  6. Graham, M. W. and Miller, D. J. (2006). Unsupervised learning of parsimonious mixtures on large spaces with integrated feature and component selection, IEEE Transactions on Signal Processing, 54, 1289-1303. https://doi.org/10.1109/TSP.2006.870586
  7. Kim, S. G. (2011). Variable selection in normal mixture model based clustering under heteroscedasticity, The Korean Journal of Applied Statistics, 24, 1-12. https://doi.org/10.5351/KJAS.2011.24.1.001
  8. Law, M. H. C., Figueiredo, M. A. T. and Jain, A. K. (2004). Simultaneous feature selection and clustering using mixture models, IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 1154-1166. https://doi.org/10.1109/TPAMI.2004.71
  9. Li, Y., Dong, M. and Hua, J. (2008). Localized feature selection for clustering, Pattern Recognition Letters, 29, 10-18. https://doi.org/10.1016/j.patrec.2007.08.012
  10. Li, Y., Dong, M. and Hua, J. (2009). Simultaneous localized feature selection and model detection for Gaussian mixtures, IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 953-960. https://doi.org/10.1109/TPAMI.2008.261
  11. McLachlan, G. J. and Peel, D. (2000). Finite Mixture Models, Wiley, New York.
  12. Pan, W. and Shen, X. (2006). Penalized model-based clustering with application to variable selection, Journal of Machine Learning Research, 8, 1145-1164.
  13. Schwarz, G. (1978). Estimating the dimension of a model, Annals of Statistics, 6, 461-464. https://doi.org/10.1214/aos/1176344136
  14. Wang, S. and Zhu, J. (2008). Variable selection for model-based high-dimensional clustering and its application to microarray data, Biometrics, 64, 440-448.
  15. Xie, B., Pan, W. and Shen, X. (2008). Variable selection in penalized model-based clustering via regularization on grouped parameters, Biometrics, 64, 921-930. https://doi.org/10.1111/j.1541-0420.2007.00955.x