A Study on Cluster Hierarchy Depth in Hierarchical Clustering

계층적 클러스터링에서 분류 계층 깊이에 관한 연구

  • Jin, Hai-Nan (Dept. of Computer Engineering, Chonbuk National University) ;
  • Lee, Shin-won (Dept. of Computer Engineering, Chonbuk National University) ;
  • An, Dong-Un (Dept. of Computer Engineering, Chonbuk National University) ;
  • Chung, Sung-Jong (Dept. of Computer Engineering, Chonbuk National University)
  • 김해남 (전북대학교 컴퓨터공학과) ;
  • 이신원 (전북대학교 컴퓨터공학과) ;
  • 안동언 (전북대학교 컴퓨터공학과) ;
  • 정성종 (전북대학교 컴퓨터공학과)
  • Published : 2004.05.14

Abstract

Fast and high-quality document clustering algorithms play an important role in providing data exploration by organizing large amounts of information into a small number of meaningful clusters. In particular, hierarchical clustering provide a view of the data at different levels, making the large document collections are adapted to people's instinctive and interested requires. Many papers have shown that the hierarchical clustering method takes good-performance, but is limited because of its quadratic time complexity. In contrast, K-means has a time complexity that is linear in the number of documents, but is thought to produce inferior clusters. Think of the factor of simpleness, high-quality and high-efficiency, we combine the two approaches providing a new system named CONDOR system [10] with hierarchical structure based on document clustering using K-means algorithm to "get the best of both worlds". The performance of CONDOR system is compared with the VIVISIMO hierarchical clustering system [9], and performance is analyzed on feature words selection of specific topics and the optimum hierarchy depth.

Keywords