Selection of Cluster Hierarchy Depth in Hierarchical Clustering using K-Means Algorithm

Lee, Won-Hee; Lee, Shin-Won; Chung, Sung-Jong; An, Dong-Un

Journal of the Institute of Electronics Engineers of Korea SD (대한전자공학회논문지SD)

Volume 45, Issue 2 / Pages 150-156 / 2008 / 1229-6368 (pISSN)

The Institute of Electronics and Information Engineers (대한전자공학회)

Lee, Won-Hee (Dept. of Electronics & Information Engineering, Chonbuk National University) ;
Lee, Shin-Won (Dept. of Electronics & Information Engineering, Chonbuk National University) ;
Chung, Sung-Jong (Dept. of Electronics & Information Engineering, Chonbuk National University) ;
An, Dong-Un (Dept. of Electronics & Information Engineering, Chonbuk National University)

Published: 2008.02.25

Abstract

Many papers have shown that hierarchical clustering performs well but is limited by its quadratic time complexity. In contrast, K-means has lower time complexity, which matters when the number of variables is large. Aiming for simplicity, high quality, and high efficiency, we combine the two approaches in a new system, named CONDOR, that builds a hierarchical structure by clustering documents with the K-means algorithm. We evaluate its performance for different hierarchy depths and for an initial number of centroids that is not fixed but varies with the number of relevant documents returned for a given query. Compared with the conventional method, in which the initial centroids are fixed in advance, our method performs considerably better.

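As information and communication technology advances, the volume of information grows and the result lists returned for user queries become long, so fast, high-quality document clustering algorithms play an important role. Many papers report good performance with hierarchical clustering methods, but these methods are time-consuming. The K-means algorithm, by contrast, can reduce time complexity. In this paper, we apply the K-means algorithm in CONDOR, a hierarchical clustering system, so that information can be retrieved efficiently and search results can be viewed as a hierarchy. The results show that adjusting the cluster hierarchy depth and the initial values yields better performance.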
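
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way to build a cluster hierarchy with K-means: split the document set top-down, re-running K-means inside each cluster until a chosen depth is reached. The function names (`kmeans`, `build_hierarchy`, `tree_depth`, `leaf_docs`), the fixed `k` per level, the random centroid initialization, and the `min_size` stopping rule are all assumptions made for illustration, not details of the CONDOR system.

```python
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[Vector], k: int, iterations: int = 20,
           seed: int = 0) -> List[List[int]]:
    """Lloyd's K-means; returns non-empty clusters as index lists."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k randomly sampled documents.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignment: List[int] = []
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_assignment == assignment:  # converged
            break
        assignment = new_assignment
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    groups = [[i for i, a in enumerate(assignment) if a == c] for c in range(k)]
    return [g for g in groups if g]


def build_hierarchy(vectors: List[Vector], ids: List[int], k: int,
                    max_depth: int, min_size: int = 2, depth: int = 0) -> Dict:
    """Recursively split `ids` with K-means until `max_depth` is reached."""
    node: Dict = {"depth": depth, "docs": list(ids), "children": []}
    if depth >= max_depth or len(ids) < max(k, min_size):
        return node  # depth limit hit, or too few documents to split
    groups = kmeans([vectors[i] for i in ids], k)
    if len(groups) < 2:
        return node  # K-means could not separate this cluster further
    for group in groups:
        child_ids = [ids[j] for j in group]
        node["children"].append(
            build_hierarchy(vectors, child_ids, k, max_depth, min_size, depth + 1)
        )
    return node


def tree_depth(node: Dict) -> int:
    """Depth of the deepest node below (and including) `node`."""
    return max((tree_depth(ch) for ch in node["children"]), default=node["depth"])


def leaf_docs(node: Dict) -> List[int]:
    """Document ids collected from the leaves, left to right."""
    if not node["children"]:
        return list(node["docs"])
    return [d for ch in node["children"] for d in leaf_docs(ch)]


# Toy document vectors (hypothetical data): two well-separated groups.
docs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
        [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
tree = build_hierarchy(docs, list(range(len(docs))), k=2, max_depth=2)
```

In this sketch `max_depth` plays the role of the cluster hierarchy depth and `k` the initial number of centroids, the two quantities the abstract reports adjusting. A caller could, for example, choose `k` from the size of the result list for a query before calling `build_hierarchy`.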
