A Post Web Document Clustering Algorithm

후처리 웹 문서 클러스터링 알고리즘

  • Im, Yeong-Hui (임영희) (Dept. of Computer Information Communication Engineering, Daejeon University)
  • Published : 2002.02.01

Abstract

Post-clustering algorithms, which cluster the results returned by a Web search engine, have requirements that differ from those of conventional clustering algorithms. In this paper, we propose a new post-clustering algorithm that satisfies as many of those requirements as possible. The proposed algorithm, Concept ART, combines the concept vector, which has several advantages in document clustering, with Fuzzy ART, which is known as a real-time clustering algorithm. We also show that it is applicable to general-purpose clustering as well as post-clustering.
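
The abstract names two building blocks, Fuzzy ART and concept vectors, without stating how Concept ART joins them. The sketch below is therefore not the paper's algorithm. It only illustrates the two ingredients under stated assumptions: textbook Fuzzy ART (complement coding, choice function, vigilance test, and learning rule) run over term-weight vectors scaled to [0, 1], followed by a concept vector computed as the normalized centroid of a cluster. The function names, the parameter values (`rho`, `alpha`, `beta`), and the toy data are all hypothetical.

```python
import math


def complement_code(a):
    """Complement coding: a in [0, 1]^d becomes (a, 1 - a), so |I| = d."""
    return list(a) + [1.0 - x for x in a]


def fuzzy_and(x, y):
    """Component-wise minimum (the fuzzy AND operator)."""
    return [min(p, q) for p, q in zip(x, y)]


def l1(x):
    """L1 norm of a non-negative vector."""
    return sum(x)


def fuzzy_art(inputs, rho=0.6, alpha=0.001, beta=1.0):
    """One pass of textbook Fuzzy ART; returns (labels, weights).

    rho is the vigilance, alpha the choice parameter, beta the learning
    rate (beta = 1.0 is fast learning). All values here are assumptions.
    """
    weights, labels = [], []
    for a in inputs:
        i_vec = complement_code(a)
        # Rank existing categories by the choice function T_j.
        order = sorted(
            range(len(weights)),
            key=lambda j: l1(fuzzy_and(i_vec, weights[j])) / (alpha + l1(weights[j])),
            reverse=True,
        )
        chosen = None
        for j in order:
            # Vigilance test: accept the first category that matches well enough.
            if l1(fuzzy_and(i_vec, weights[j])) / l1(i_vec) >= rho:
                chosen = j
                break
        if chosen is None:
            # No category resonates: commit a new one initialized to the input.
            weights.append(i_vec)
            chosen = len(weights) - 1
        else:
            w = weights[chosen]
            m = fuzzy_and(i_vec, w)
            weights[chosen] = [beta * mi + (1.0 - beta) * wi for mi, wi in zip(m, w)]
        labels.append(chosen)
    return labels, weights


def concept_vector(vectors):
    """Normalized centroid of a cluster (unit Euclidean length)."""
    dim = len(vectors[0])
    mean = [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm > 0 else mean


# Hypothetical toy data: four "documents" over three terms.
docs = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.0, 0.1, 0.9], [0.1, 0.0, 0.8]]
labels, weights = fuzzy_art(docs)
first_cluster = [d for d, lab in zip(docs, labels) if lab == labels[0]]
summary = concept_vector(first_cluster)
```

Because each document is processed once and categories are created on demand, the number of clusters does not have to be fixed in advance, which is one reason an ART-style method suits search results that arrive incrementally. Raising `rho` yields more, tighter clusters.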
