Verification of Improving a Clustering Algorithm for Microarray Data with Missing Values

Kim, Su-Young;

doi:10.5351/KJAS.2011.24.2.315

The Korean Journal of Applied Statistics (응용통계연구)

Volume 24, Issue 2 / Pages 315-321 / 2011 / 1225-066X (pISSN) / 2383-5818 (eISSN)

The Korean Statistical Society (한국통계학회)

Kim, Su-Young (Center for Korean Studies Materials, The Academy of Korean Studies)

Received : 20101100
Accepted : 20110100
Published : 2011.04.30

https://doi.org/10.5351/KJAS.2011.24.2.315

Abstract

Gene expression microarray data often contain multiple missing values. Most gene expression analyses (including gene clustering), however, require a complete data matrix as input. In ordinary clustering methods, a single missing value forces one to discard all data for a gene, even if the rest of the data for that gene are intact. The quality of the analysis may decrease seriously as the missing rate increases. Conversely, imputing missing values may introduce artifacts that reduce the reliability of the analysis. To clarify this tension in microarray clustering analysis, this paper compares the accuracy of clustering with and without imputation over several microarray data sets with different missing rates. It also tests the clustering efficiency of several imputation methods, including our proposed algorithm. The results show that, for incomplete microarray data, it is worthwhile to check the clustering result in this alternative way, without any imputed data.
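To make the two alternatives in the abstract concrete, the following is a minimal sketch in Python with NumPy. It is not the algorithm proposed in the paper, whose details the abstract does not give. It contrasts a simple row-mean imputation with a generic partial-distance k-means that clusters genes using only their observed entries. The function names, the toy matrix, the initial centroids, and the rescaling of partial distances by d / (number of observed entries) are all assumptions made for illustration; the sketch also assumes every gene has at least one observed value.

```python
import numpy as np


def row_mean_impute(X):
    """Replace each NaN with the mean of the observed values in the same row (gene)."""
    filled = X.copy()
    row_means = np.nanmean(X, axis=1)
    rows, cols = np.where(np.isnan(X))
    filled[rows, cols] = row_means[rows]
    return filled


def partial_sq_distances(X, centroids):
    """Squared distances over observed entries only, rescaled to the full dimension."""
    observed = ~np.isnan(X)                           # shape (n, d)
    X0 = np.where(observed, X, 0.0)                   # NaN -> 0 so arithmetic is safe
    diff = X0[:, None, :] - centroids[None, :, :]     # shape (n, k, d)
    diff = diff * observed[:, None, :]                # ignore missing coordinates
    n_obs = observed.sum(axis=1)                      # observed entries per gene
    scale = X.shape[1] / np.maximum(n_obs, 1)
    return (diff ** 2).sum(axis=2) * scale[:, None]


def kmeans_partial(X, init_centroids, n_iter=20):
    """k-means on incomplete data without imputation (illustrative sketch)."""
    centroids = init_centroids.astype(float).copy()
    for _ in range(n_iter):
        labels = partial_sq_distances(X, centroids).argmin(axis=1)
        for j in range(centroids.shape[0]):
            members = X[labels == j]
            if len(members) == 0:
                continue                              # keep an empty cluster's centroid
            observed = ~np.isnan(members)
            counts = observed.sum(axis=0)
            sums = np.where(observed, members, 0.0).sum(axis=0)
            upd = counts > 0                          # coordinates seen in some member
            centroids[j, upd] = sums[upd] / counts[upd]
    labels = partial_sq_distances(X, centroids).argmin(axis=1)
    return labels, centroids


# Toy expression matrix: 4 genes x 4 arrays, two well-separated groups, NaN = missing.
X = np.array([
    [0.0, 0.1, np.nan, 0.2],
    [0.1, np.nan, 0.0, 0.1],
    [5.0, 5.1, 4.9, np.nan],
    [np.nan, 5.0, 5.2, 5.1],
])
init = np.array([[0.0, 0.0, 0.0, 0.0], [5.0, 5.0, 5.0, 5.0]])

labels, centroids = kmeans_partial(X, init)   # clustering without imputation
filled = row_mean_impute(X)                   # complete matrix for ordinary methods
```

In this toy case every gene contains a missing value, so a method that requires complete rows would have no genes left to cluster, while the partial-distance variant still assigns all four genes. Comparing such assignments against those obtained from the imputed matrix is one way to run the kind of check the abstract recommends.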
