Verification of Improving a Clustering Algorithm for Microarray Data with Missing Values
Kim, Su-Young
Gene expression microarray data often contain multiple missing values. Most gene expression analyses (including gene clustering), however, require a complete data matrix as input. In ordinary clustering methods, a single missing value forces the analyst to discard all data for that gene, even if the rest of its measurements are intact, and the quality of the analysis may decrease seriously as the missing rate increases. On the other hand, imputing missing values may introduce artifacts that reduce the reliability of the analysis. To resolve this dilemma in microarray clustering analysis, this paper compares the accuracy of clustering with and without imputation over several microarray data sets with different missing rates. It also tests the clustering efficiency of several imputation methods, including our proposed algorithm. The results show that, for incomplete microarray data, it is worthwhile to check the clustering result obtained in this alternative way, without any imputed data.
Microarray; gene expression; clustering; missing value
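The three ways of handling an incomplete expression matrix that the abstract contrasts can be illustrated with a small sketch. This is not the algorithm proposed in the paper, which the abstract does not describe. It assumes a generic partial-distance strategy (Euclidean distance over the coordinates observed in both genes, rescaled to the full length) as one possible way to cluster without imputation. All function names and the toy data are hypothetical.

```python
import math


def is_missing(x):
    """Treat None and NaN as missing measurements."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def drop_incomplete(rows):
    """Listwise deletion: keep only genes with no missing value."""
    return [r for r in rows if not any(is_missing(x) for x in r)]


def row_mean_impute(rows):
    """Simple imputation: fill each gap with the mean of that gene's observed values."""
    out = []
    for r in rows:
        observed = [x for x in r if not is_missing(x)]
        mean = sum(observed) / len(observed) if observed else 0.0
        out.append([mean if is_missing(x) else x for x in r])
    return out


def partial_distance(a, b):
    """Euclidean distance over coordinates observed in both vectors,
    rescaled by (total length / number of shared coordinates)."""
    pairs = [(x, y) for x, y in zip(a, b)
             if not is_missing(x) and not is_missing(y)]
    if not pairs:
        return math.inf  # no shared observations: distance undefined
    squared = sum((x - y) ** 2 for x, y in pairs)
    return math.sqrt(squared * len(a) / len(pairs))


# Toy data (assumed): three genes, four arrays, one missing value.
genes = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, None, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
]

kept = drop_incomplete(genes)        # the second gene is discarded entirely
filled = row_mean_impute(genes)      # the gap is replaced by an estimate
d01 = partial_distance(genes[0], genes[1])  # uses the 3 shared arrays only
d12 = partial_distance(genes[1], genes[2])
```

A distance function of this kind can be plugged into any clustering method that only needs pairwise distances, so the incomplete gene stays in the analysis without any imputed value entering the computation.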