Verification of Improving a Clustering Algorithm for Microarray Data with Missing Values

  • Kim, Su-Young (Center for Korean Studies Materials, The Academy of Korean Studies)
  • Received : 20101100
  • Accepted : 20110100
  • Published : 2011.04.30

Abstract

Gene expression microarray data often contain multiple missing values. Most gene expression analyses (including gene clustering analysis), however, require a complete data matrix as input. In ordinary clustering methods, a single missing value forces one to discard all data for a gene, even if the rest of the data for that gene are intact. The quality of the analysis may decrease seriously as the missing rate increases. On the other hand, imputing missing values may introduce artifacts that reduce the reliability of the analysis. To clarify this trade-off in microarray clustering analysis, this paper compared the accuracy of clustering with and without imputation over several microarray data sets with different missing rates. This paper also tested the clustering efficiency of several imputation methods, including our proposed algorithm. The results showed that, for incomplete microarray data, it is worthwhile to also check the clustering result obtained without any imputed data.
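The three ways of handling missing values contrasted in the abstract can be illustrated with a small sketch. It is not the algorithm proposed in the paper, which the abstract does not describe. The function names, the toy matrix, the choice of row-mean imputation, and the rescaled partial Euclidean distance are all assumptions made for illustration only.

```python
import math

# Hypothetical toy matrix: rows are genes, columns are arrays,
# and None marks a missing expression value.
genes = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, None, 3.0, 4.0],
    [4.0, 3.0, None, 1.0],
    [4.0, 3.0, 2.0, 1.0],
]

def complete_cases(rows):
    """Complete-case deletion: keep only genes with no missing value."""
    return [r for r in rows if all(v is not None for v in r)]

def row_mean_impute(rows):
    """Simple imputation: fill each gap with the mean of that gene's
    observed values (one of many possible imputation schemes)."""
    out = []
    for r in rows:
        observed = [v for v in r if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        out.append([mean if v is None else v for v in r])
    return out

def partial_distance(a, b):
    """Imputation-free alternative: Euclidean distance over the columns
    observed in both genes, rescaled to the full number of columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf  # no shared observations, so no usable distance
    squared = sum((x - y) ** 2 for x, y in pairs)
    return math.sqrt(squared * len(a) / len(pairs))

kept = complete_cases(genes)       # the two incomplete genes are dropped
imputed = row_mean_impute(genes)   # all genes are kept, gaps are filled
dist_similar = partial_distance(genes[0], genes[1])
dist_different = partial_distance(genes[0], genes[2])
```

In this toy case, deletion discards half of the genes, while the partial distance still places the incomplete gene `genes[1]` next to `genes[0]` without inventing any value. A distance of this kind could be plugged into a distance-based clustering method, which is the sort of imputation-free check the abstract recommends alongside imputation-based results.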
