A Divisive Clustering for Mixed Feature-Type Symbolic Data

혼합형태 심볼릭 데이터의 군집분석방법

Kim, Jaejik
김재직

  • Received : 2015.09.14
  • Accepted : 2015.11.03
  • Published : 2015.12.31

Abstract

Modern data analysis deals not only with classical data, expressed as points in p-dimensional Euclidean space, but also with newer types of data such as signals, functions, images, and shapes. Symbolic data can be regarded as one of these newer types. Symbolic data can take various formats, including intervals, histograms, lists, tables, distributions, and models. To date, studies of symbolic data have mainly focused on a single format at a time. This study extends that work to datasets containing both histogram-valued and multimodal-valued variables: it introduces a divisive clustering method for such mixed feature-type symbolic data and applies it to the analysis of industrial accident data.

Keywords

mixed feature-type symbolic data; cluster analysis; industrial accident
