A Biclustering Method for Time Series Analysis

  • Received : 2010.02.20
  • Accepted : 2010.05.17
  • Published : 2010.06.01

Abstract

Biclustering finds meaningful subsets of objects and attributes simultaneously, revealing structure that traditional clustering methods may not detect. It is widely used to analyze microarray data, which represent gene expression levels across conditions. Biclustering algorithms usually do not consider the sequential relation between attributes. For time series data, however, bicluster solutions should preserve the time sequence. This paper proposes a new biclustering algorithm for time series data by modifying the plaid model. The proposed algorithm introduces a parameter that controls the interval between two selected time points. In addition, the pruning step, which prevents over-fitting, is modified so that it eliminates only starting or ending points. Results on artificial data sets show that the proposed method is more suitable for extracting biclusters from time series data. Moreover, applying the proposed method to real-world time-course microarray data sets and apartment price data sets from metropolitan areas yields some interesting observations.
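
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the two ideas it names: limiting the interval between selected time points and pruning only at the start or end of a bicluster. The function names, the `max_gap` and `threshold` parameters, and the per-time-point `score` values are all hypothetical illustrations and are not taken from the paper or from the plaid model itself.

```python
from typing import List, Sequence


def longest_gap_limited_run(selected: Sequence[int], max_gap: int) -> List[int]:
    """Return the longest chain of selected time indices in which
    consecutive indices are at most `max_gap` apart (hypothetical rule)."""
    best: List[int] = []
    run: List[int] = []
    for t in sorted(selected):
        if run and t - run[-1] > max_gap:
            # Gap too wide: close the current run and start a new one.
            if len(run) > len(best):
                best = run
            run = []
        run.append(t)
    if len(run) > len(best):
        best = run
    return best


def prune_endpoints(points: Sequence[int], score: Sequence[float],
                    threshold: float) -> List[int]:
    """Drop weak time points from the two ends only; interior points stay,
    so the bicluster remains one unbroken stretch of the time axis."""
    lo, hi = 0, len(points)
    while lo < hi and score[points[lo]] < threshold:
        lo += 1
    while hi > lo and score[points[hi - 1]] < threshold:
        hi -= 1
    return list(points[lo:hi])


# Assumed example: time points 0..8, with 3 and 4 not selected.
selected = [0, 1, 2, 5, 6, 7, 8]
# Assumed per-time-point fit scores, indexed by time.
score = [0.9, 0.8, 0.7, 0.0, 0.0, 0.1, 0.9, 0.2, 0.8]

run = longest_gap_limited_run(selected, max_gap=1)
kept = prune_endpoints(run, score, threshold=0.5)
```

With these assumed inputs, the gap limit keeps the run 5 to 8, and pruning then removes only the weak starting point 5. Time point 7 also scores below the threshold but is retained because it lies in the interior, which is the behavior the modified pruning step is described as having.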

Keywords

Biclustering; Time-series Data; Plaid Model; Binary Least Square
