Unsupervised Clustering of Multivariate Time Series Microarray Experiments based on Incremental Non-Gaussian Analysis

  • Ng, Kam Swee;
  • Yang, Hyung-Jeong;
  • Kim, Soo-Hyung;
  • Kim, Sun-Hee;
  • Anh, Nguyen Thi Ngoc
  • Received: 2012.01.09
  • Accepted: 2012.02.17
  • Published: 2012.03.28


Multiple gene expression levels obtained from time series microarray experiments have been exploited effectively to improve understanding of a wide range of biological phenomena. However, microarray data usually take the form of large, high-dimensional matrices of gene expression values. Of the huge number of genes present on a microarray, only a small number are expected to be relevant to a given task. Hence, the crucial goal of gene selection is to discount the majority of unaffected genes and thereby improve the accuracy of disease diagnosis. In this paper, a non-Gaussian weight matrix obtained from an incremental model is proposed to extract useful features from multivariate time series microarrays. The proposed method can automatically identify a small number of significant features by discovering hidden variables among a huge number of features. Unsupervised hierarchical clustering is then used to evaluate the effectiveness of the proposed methodology. In terms of the predictive accuracy of clustering, the proposed method achieves promising results compared with existing analysis methods. Furthermore, it offers a robust approach with low memory and computational costs.
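The abstract does not give the update equations, so the following is only a minimal Python sketch of the kind of pipeline it describes, assembled from standard, swapped-in components rather than the authors' method. Assumptions: a SPIRIT/PAST-style streaming PCA update stands in for the incremental model; symmetric FastICA with a tanh contrast produces the non-Gaussian weight matrix; each feature is ranked by its largest normalized absolute weight; and average-linkage hierarchical clustering (SciPy) is scored by cluster purity as a stand-in for "predictive accuracy of clustering". All function names, the forgetting factor `lam`, the number of hidden variables `k`, the number of kept features `m`, and the synthetic demo data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def incremental_hidden_weights(X, k, lam=0.96, eps=1e-3):
    """Track k hidden-variable weight vectors one row at a time.

    Assumption: a SPIRIT/PAST-style streaming PCA update stands in for the
    paper's incremental model. X should be column-centred.
    Returns an orthonormal (k, n_features) weight matrix.
    """
    W = np.eye(k, X.shape[1])           # start from the first k unit vectors
    d = np.full(k, eps)                 # running energy of each hidden variable
    for row in X:
        x = np.asarray(row, dtype=float).copy()
        for i in range(k):
            y = W[i] @ x                # current value of hidden variable i
            d[i] = lam * d[i] + y * y   # exponentially forgotten energy
            W[i] += (y / d[i]) * (x - y * W[i])
            x -= y * W[i]               # deflate before tracking the next variable
    Q, _ = np.linalg.qr(W.T)            # re-orthonormalize the tracked directions
    return Q.T


def non_gaussian_weights(X, W, n_iter=200, seed=0):
    """Rotate the hidden variables toward maximal non-Gaussianity.

    Assumption: symmetric FastICA with a tanh contrast. Returns a
    (k, n_features) matrix mapping centred features to independent components.
    """
    Y = (X - X.mean(axis=0)) @ W.T                      # hidden variables, T x k
    evals, evecs = np.linalg.eigh(np.atleast_2d(np.cov(Y, rowvar=False)))
    Wh = evecs @ np.diag(evals ** -0.5) @ evecs.T       # whitening matrix
    Z = Y @ Wh.T                                        # whitened hidden variables
    R = np.random.default_rng(seed).standard_normal((W.shape[0],) * 2)
    for _ in range(n_iter):
        G = np.tanh(Z @ R.T)                            # g(r_i . z) for every sample
        R = G.T @ Z / len(Z) - np.diag((1 - G ** 2).mean(axis=0)) @ R
        u, _, vt = np.linalg.svd(R)
        R = u @ vt                                      # symmetric decorrelation
    return R @ Wh @ W


def select_features(M, m):
    """Score each feature by its largest normalized absolute weight; keep the top m."""
    Mn = M / np.linalg.norm(M, axis=1, keepdims=True)
    scores = np.abs(Mn).max(axis=0)
    return np.argsort(scores)[::-1][:m]


def cluster_samples(X_sel, n_clusters=2):
    """Average-linkage hierarchical clustering cut into n_clusters flat clusters."""
    return fcluster(linkage(X_sel, method="average"), t=n_clusters, criterion="maxclust")


def purity(pred, truth):
    """Fraction of samples whose true label is the majority label of their cluster."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    hits = 0
    for c in np.unique(pred):
        _, counts = np.unique(truth[pred == c], return_counts=True)
        hits += counts.max()
    return hits / len(truth)


# Hypothetical demo: 60 samples x 40 features, two groups differing in features 0-4.
rng = np.random.default_rng(1)
truth = np.repeat([0, 1], 30)
X = rng.standard_normal((60, 40))
X[truth == 1, :5] += 3.0
Xc = X - X.mean(axis=0)
W = incremental_hidden_weights(Xc, k=3)
M = non_gaussian_weights(Xc, W)
selected = select_features(M, m=5)
pred = cluster_samples(Xc[:, selected])
print(sorted(selected.tolist()), round(float(purity(pred, truth)), 3))
```

In this sketch the streaming pass stores only the k weight vectors and k energies, so its memory grows as O(k·n) rather than with the number of time points, and the FastICA rotation works on the k hidden variables instead of all n features. This mirrors, under the stated assumptions, the low-cost property the abstract claims.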


Multivariate Time Series; Principal Component Analysis; Independent Component Analysis; Microarray Analysis; Feature Selection; Incremental Model; Clustering

