Cluster Analysis Using Principal Coordinates for Binary Data

  • Chae, Seong-San (Department of Information and Statistics, Daejeon University)
  • Kim, Jeong-Il (Department of Information and Statistics, Daejeon University)
  • Published : 2005.12.01

Abstract

We investigate the effect of computing principal coordinates before cluster analysis, using samples of multiple binary outcomes. The ability of the known clustering algorithm to retrieve clusters improves significantly when it is applied to the principal coordinates rather than directly to the distances transformed from four association coefficients for multiple binary variables.
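
The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the abstract does not name the four association coefficients, the distance transformation, or the clustering algorithm, so the simple matching coefficient, the transformation d = sqrt(1 - s), average linkage, two retained axes, and the simulated two-cluster data are all assumptions made here. The function names are hypothetical, and agreement with the true grouping is measured with the Rand statistic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform


def simple_matching_distance(X):
    """Distance sqrt(1 - s), with s the simple matching coefficient (assumed choice)."""
    s = 1.0 - squareform(pdist(X, metric="hamming"))  # share of matching variables
    return np.sqrt(np.clip(1.0 - s, 0.0, None))


def principal_coordinates(D, n_axes):
    """Principal coordinates (classical scaling) of a square distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_axes]   # largest eigenvalues first
    lam = np.clip(eigvals[order], 0.0, None)     # treat negative roots as zero
    return eigvecs[:, order] * np.sqrt(lam)


def rand_index(a, b):
    """Rand statistic: share of object pairs that both partitions treat alike."""
    a, b = np.asarray(a), np.asarray(b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)            # each unordered pair once
    return np.mean(same_a[iu] == same_b[iu])


# Simulated data (assumption): two groups of 30 objects, 12 binary variables each.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 30)
probs = np.where(truth[:, None] == 0, 0.3, 0.7)  # per-group success probability
X = (rng.random((60, 12)) < probs).astype(int)

D = simple_matching_distance(X)

# Route 1: cluster directly on the transformed distances.
direct = fcluster(linkage(squareform(D, checks=False), "average"), 2, "maxclust")

# Route 2: cluster on the leading principal coordinates (Euclidean by default).
Y = principal_coordinates(D, n_axes=2)
via_pcoa = fcluster(linkage(Y, "average"), 2, "maxclust")

print("Rand, direct distances:     ", rand_index(truth, direct))
print("Rand, principal coordinates:", rand_index(truth, via_pcoa))
```

Which route scores higher on a single simulated sample depends on the coefficient, the linkage method, and the number of axes retained, so a fair comparison would repeat the simulation many times and average the Rand statistic.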
