Selection of Cluster Hierarchy Depth in Hierarchical Clustering using K-Means Algorithm

  • Lee, Won-Hee (Dept. of Electronics & Information Engineering, Chonbuk National University) ;
  • Lee, Shin-Won (Dept. of Electronics & Information Engineering, Chonbuk National University) ;
  • Chung, Sung-Jong (Dept. of Electronics & Information Engineering, Chonbuk National University) ;
  • An, Dong-Un (Dept. of Electronics & Information Engineering, Chonbuk National University)
  • Published : 2008.02.25

Abstract

Many papers have shown that hierarchical clustering performs well but is limited by its quadratic time complexity. In contrast, K-means has a lower time complexity when the number of variables is large. Considering simplicity, quality, and efficiency, we combine the two approaches in a new system, named CONDOR, which builds a hierarchical structure over document clusters using the K-means algorithm. We evaluate its performance for different hierarchy depths and for different numbers of initial centroids, where the number of initial centroids is not fixed but varies with the number of documents relevant to a given query. Compared with the conventional method, in which the initial centroids are fixed in advance, our method performs considerably better.

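As information and communication technology advances, the volume of information grows and the list of results retrieved for a user's query grows with it, so fast, high-quality document clustering algorithms play an important role. Many papers report good performance with hierarchical clustering methods, but these methods are time-consuming. The K-means algorithm, by contrast, can reduce the time complexity. In this paper, we use the K-means algorithm in Condor, a hierarchical clustering system, to retrieve information efficiently and to present the search results as a hierarchy. We show that the system achieves better performance when the cluster hierarchy depth and the initial values are adjusted.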
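
The idea of building a hierarchy by applying K-means repeatedly, down to a chosen depth, can be illustrated with a minimal sketch. This is not the CONDOR implementation: the function names `kmeans`, `build_hierarchy`, and `leaves`, the random choice of initial centroids, the squared Euclidean distance, and the toy two-dimensional "document vectors" are all assumptions made for illustration.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain K-means on equal-length tuples; returns the non-empty clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialisation: k random points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Recompute each centroid as the mean of its members; keep it if the cluster is empty.
        new_centroids = [
            tuple(sum(xs) / len(members) for xs in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return [members for members in clusters if members]


def build_hierarchy(points, k, depth):
    """Split points with K-means, then split each cluster again, `depth` times."""
    node = {"size": len(points), "points": points, "children": []}
    if depth == 0 or len(points) <= k:
        return node  # leaf: depth budget used up, or too few points to split
    clusters = kmeans(points, k)
    if len(clusters) >= 2:
        node["children"] = [build_hierarchy(c, k, depth - 1) for c in clusters]
    return node


def leaves(node):
    """Collect the leaf clusters of a hierarchy."""
    if not node["children"]:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]


# Toy 2-D vectors standing in for document vectors (hypothetical data).
docs = [(0.0, 0.1), (0.1, 0.0), (0.9, 1.0), (1.0, 0.9),
        (5.0, 5.1), (5.1, 5.0), (5.9, 6.0), (6.0, 5.9)]
tree = build_hierarchy(docs, k=2, depth=2)
print([leaf["size"] for leaf in leaves(tree)])
```

In this sketch, a larger `depth` produces more and smaller leaf clusters, while `k` controls how many centroids are used at each level; these correspond loosely to the two quantities the abstract says were varied in the evaluation.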
