Comparison of clustering with yeast microarray gene expression data

  • Lee, Kyung-A (Department of Statistics, Duksung Women's University)
  • Kim, Jae-Hee (Department of Statistics, Duksung Women's University)
  • Received : 2011.05.16
  • Accepted : 2011.07.12
  • Published : 2011.08.01

Abstract

We carry out cluster analyses of yeast cell-cycle microarray gene expression data. We compare five methods on the yeast data: model-based clustering, K-means, partitioning around medoids (PAM), self-organizing maps (SOM), and hierarchical clustering with Ward's method. To assess the validity of each clustering result, we compute and compare three measures: connectivity, the Dunn index, and silhouette values.
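
The three validity measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the toy profiles, the function names, and the neighbourhood size `n_neighbors=2` are assumptions made for illustration, and the formulas follow the standard textbook definitions (connectivity penalises nearest neighbours placed in other clusters, so lower is better; the Dunn index and the mean silhouette width are higher for compact, well-separated clusters).

```python
import math

def dist(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def connectivity(points, labels, n_neighbors=2):
    """Sum of 1/rank over each point's nearest neighbours that fall in
    a different cluster (lower is better; 0 means none are split)."""
    total = 0.0
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(p, points[j]))
        for rank, j in enumerate(others[:n_neighbors], start=1):
            if labels[j] != labels[i]:
                total += 1.0 / rank
    return total

def dunn_index(points, labels):
    """Smallest between-cluster distance divided by the largest
    within-cluster distance (higher is better)."""
    n = len(points)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    between = min(dist(points[i], points[j])
                  for i, j in pairs if labels[i] != labels[j])
    within = max(dist(points[i], points[j])
                 for i, j in pairs if labels[i] == labels[j])
    return between / within

def mean_silhouette(points, labels):
    """Average silhouette width s(i) = (b - a) / max(a, b), where a is the
    mean distance to the point's own cluster and b the mean distance to
    the nearest other cluster. Singletons get s(i) = 0 by convention."""
    n = len(points)
    widths = []
    for i in range(n):
        def mean_dist_to(cluster):
            members = [j for j in range(n) if labels[j] == cluster and j != i]
            return sum(dist(points[i], points[j]) for j in members) / len(members)
        if sum(1 for lab in labels if lab == labels[i]) == 1:
            widths.append(0.0)
            continue
        a = mean_dist_to(labels[i])
        b = min(mean_dist_to(c) for c in set(labels) if c != labels[i])
        widths.append((b - a) / max(a, b))
    return sum(widths) / n

# Hypothetical toy data: six "genes" measured at three time points,
# forming two visibly distinct expression patterns.
profiles = [
    (0.0, 1.0, 0.0), (0.1, 0.9, 0.1), (0.0, 1.1, -0.1),
    (1.0, 0.0, 1.0), (0.9, 0.1, 1.1), (1.1, -0.1, 0.9),
]
good_partition = [0, 0, 0, 1, 1, 1]  # matches the two patterns
bad_partition = [0, 1, 0, 1, 0, 1]   # mixes the two patterns

for name, labels in [("good", good_partition), ("bad", bad_partition)]:
    print(name,
          connectivity(profiles, labels),
          dunn_index(profiles, labels),
          mean_silhouette(profiles, labels))
```

On this toy example the partition that matches the two patterns scores better on all three measures, which is the kind of comparison the abstract describes across clustering methods.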
