Symbolic Cluster Analysis for Distribution Valued Dissimilarity

  • Matsui, Yusuke (Graduate School of Information Science and Technology, Hokkaido University) ;
  • Minami, Hiroyuki (Information Initiative Center, Hokkaido University) ;
  • Mizuta, Masahiro (Information Initiative Center, Hokkaido University)
  • Received : 2014.01.09
  • Accepted : 2014.05.07
  • Published : 2014.05.31

Abstract

We propose a novel hierarchical clustering method for distribution valued dissimilarities. The analysis of large and complex data has attracted significant interest. Symbolic Data Analysis (SDA), proposed by Diday in the 1980s, provides a new framework for statistical analysis. In SDA, the unit of analysis is a symbolic object: an object with internal variation, such as an interval, a histogram, or a distribution. In this study, we focus on cluster analysis for distribution valued dissimilarities, which are one kind of symbolic object. Hierarchical clustering generally has two steps: a find step and an update step. In the find step, the nearest pair of clusters is identified; we extend this step to distribution valued dissimilarities by introducing a measure of their order relations. In the update step, dissimilarities between clusters are redefined as a mixture of distributions with a mixing ratio. We illustrate the proposed method with a real example and a simulation study.
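
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the order-relation measure or the mixing ratio, so the sketch makes two assumptions: the order relation between two dissimilarities is scored by P(X < Y) for independent draws, and the mixing ratio is proportional to cluster size. Distributions are modelled as weighted samples, and all function names (`from_samples`, `prob_less`, `mixture`, `find_nearest`, `cluster`) are hypothetical.

```python
# Sketch only: order measure and mixing ratio are ASSUMPTIONS, not the paper's.
# A distribution is a list of (value, weight) pairs with weights summing to 1.
# A cluster is a tuple of member indices; a pair key is (c1, c2) with c1 < c2.

def from_samples(values):
    """Empirical distribution with equal weight on each observed value."""
    w = 1.0 / len(values)
    return [(v, w) for v in values]

def prob_less(d1, d2):
    """Assumed order measure: P(X < Y) + 0.5 * P(X == Y), X ~ d1, Y ~ d2."""
    total = 0.0
    for x, wx in d1:
        for y, wy in d2:
            if x < y:
                total += wx * wy
            elif x == y:
                total += 0.5 * wx * wy
    return total

def mixture(d1, d2, ratio):
    """Mixture distribution: ratio * d1 + (1 - ratio) * d2."""
    return ([(v, w * ratio) for v, w in d1]
            + [(v, w * (1.0 - ratio)) for v, w in d2])

def key(c1, c2):
    """Canonical ordering of a pair of clusters."""
    return (c1, c2) if c1 < c2 else (c2, c1)

def find_nearest(diss):
    """Find step: the pair whose dissimilarity tends to be smaller than the rest."""
    def score(pair):
        others = [p for p in diss if p != pair]
        if not others:
            return 1.0
        return sum(prob_less(diss[pair], diss[p]) for p in others) / len(others)
    return max(sorted(diss), key=score)

def cluster(diss):
    """Agglomerate until one cluster remains; returns the merge history."""
    diss = dict(diss)
    merges = []
    while diss:
        a, b = find_nearest(diss)
        merged = tuple(sorted(a + b))
        merges.append((a, b))
        # Assumed mixing ratio: proportional to the size of each merged cluster.
        ratio = len(a) / (len(a) + len(b))
        rest = {c for pair in diss for c in pair} - {a, b}
        # Keep dissimilarities that do not involve the merged clusters.
        new = {p: d for p, d in diss.items() if a not in p and b not in p}
        # Update step: dissimilarity to the new cluster is a mixture.
        for c in rest:
            new[key(merged, c)] = mixture(diss[key(a, c)], diss[key(b, c)], ratio)
        diss = new
    return merges

# Toy data (made up for illustration): objects 0, 1 are close, as are 2, 3.
samples = {
    (0, 1): [1.0, 1.2, 0.9], (2, 3): [1.5, 1.4, 1.6],
    (0, 2): [5.0, 5.5, 4.8], (0, 3): [5.2, 6.0, 5.1],
    (1, 2): [4.9, 5.3, 5.6], (1, 3): [5.4, 5.0, 5.8],
}
diss = {key((i,), (j,)): from_samples(v) for (i, j), v in samples.items()}
merges = cluster(diss)
print(merges)
```

On this toy input the sketch first merges objects 0 and 1, then 2 and 3, and finally joins the two resulting clusters. A different order measure or mixing ratio could change the merge order on less clearly separated data.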
