Cluster Analysis with Balancing Weight on Mixed-type Data
Chae, Seong-San; Kim, Jong-Min; Yang, Wan-Youn
We present a set of clustering algorithms that apply a proper weight in the formulation of distance, so that the distance extends to data with mixed numeric and multiple binary attributes. The simple matching and Jaccard coefficients are used to measure the similarity between objects on the multiple binary attributes. These similarities are then converted to dissimilarities between the i-th and j-th objects. We demonstrate the performance of the clustering algorithms with a balancing weight under the different similarity measures. Our experiments show that clustering algorithms with a properly chosen weight achieve a competitive recovery level when data with mixed numeric and multiple binary attributes are clustered.
Keywords: Agglomerative clustering algorithm; mixed-type attribute; association coefficient
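The dissimilarity construction outlined in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the conversion d = 1 - s from similarity to dissimilarity, the Euclidean distance on the numeric part, the convex combination controlled by a hypothetical `weight` parameter, and all function names. It should not be read as the authors' formulation.

```python
from math import sqrt


def simple_matching(a, b):
    """Share of binary attributes on which a and b agree (1-1 or 0-0)."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def jaccard(a, b):
    """Share of 1-1 matches among attributes where at least one object has a 1."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Assumption: two all-zero vectors are treated as identical.
    return both / either if either else 1.0


def mixed_dissimilarity(num_i, num_j, bin_i, bin_j,
                        weight=0.5, similarity=simple_matching):
    """Weighted dissimilarity between objects i and j (illustrative only)."""
    # Numeric part: Euclidean distance; assumes the numeric attributes
    # were already scaled to a range comparable with [0, 1].
    d_num = sqrt(sum((x - y) ** 2 for x, y in zip(num_i, num_j)))
    # Binary part: similarity converted to dissimilarity as 1 - s (assumed).
    d_bin = 1.0 - similarity(bin_i, bin_j)
    # Hypothetical balancing weight between the two parts.
    return weight * d_num + (1.0 - weight) * d_bin
```

Passing `similarity=jaccard` switches the binary part to the Jaccard coefficient. The pairwise values could then be arranged into a dissimilarity matrix and handed to any agglomerative clustering routine that accepts precomputed dissimilarities.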