Cluster Analysis Using Principal Coordinates for Binary Data

Chae, Seong-San; Kim, Jeong Il

  • Published: 2005.12.01


This paper investigates the effect of computing principal coordinates before cluster analysis, using samples of multiple binary outcomes. The retrieval ability of the known clustering algorithm improves significantly when principal coordinates are used in place of distances transformed directly from four association coefficients for multiple binary variables.
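The principal coordinates step described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not name the four association coefficients, so the simple matching coefficient S and the transformation d = sqrt(1 - S) are assumptions, as are the function names and the toy data. The coordinates are obtained by classical scaling of the distance matrix (Gower, 1966).

```python
import numpy as np


def simple_matching_distance(X):
    """Return d = sqrt(1 - S), where S is the simple matching coefficient.

    Assumption: the abstract does not say which coefficients were used;
    simple matching is chosen here only for illustration.
    """
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    # Matches are joint presences (1,1) plus joint absences (0,0).
    matches = X @ X.T + (1.0 - X) @ (1.0 - X).T
    S = matches / p
    return np.sqrt(np.clip(1.0 - S, 0.0, None))


def principal_coordinates(D, k=None, tol=1e-10):
    """Classical scaling of a distance matrix D into Euclidean coordinates."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # doubly centered squared distances
    vals, vecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]        # reorder to descending
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > tol                     # drop zero and negative eigenvalues
    vals, vecs = vals[keep], vecs[:, keep]
    if k is not None:                     # optionally keep the first k axes
        vals, vecs = vals[:k], vecs[:, :k]
    return vecs * np.sqrt(vals)


# Hypothetical toy sample: six observations on six binary variables.
X = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
])

D = simple_matching_distance(X)   # distance transformed from the coefficient
Y = principal_coordinates(D)      # coordinates to pass to a clustering routine
```

The rows of Y can then be handed to any agglomerative clustering routine, for example SciPy's scipy.cluster.hierarchy.linkage, in place of the distance matrix D. With this particular distance, keeping all positive eigenvalues reproduces D exactly; truncating to the first k axes is what changes the input to the clustering step.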


Agglomerative Clustering Algorithm; Principal Coordinates; Association Coefficients

