A Document Ranking Method by Document Clustering Using Bayesian SOM and Bootstrap

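  • 최준혁 (Kimpo College, Computer Division, Software Development) ;
  • 전성해 (Inha University Graduate School, Department of Statistics) ;
  • 이정현 (Inha University, Department of Computer Science and Engineering)
  • Published: 2000.07.01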

Abstract

Conventional Boolean retrieval systems based on the vector space model return results quickly, but they cannot accurately reflect the user's retrieval intent, including its semantic information. Consequently, the retrieval results often differ greatly from what users expect, forcing them to spend considerable time finding the desired documents among those retrieved. In this paper, we design a Bayesian SOM (Self-Organizing feature Map) that combines Bayesian statistical methods with the Kohonen network, a kind of unsupervised learning, and use it to classify documents in real time according to their semantic similarity to the user query. If fewer than 30 documents are available for clustering, making their statistical characteristics difficult to observe, the number of documents is increased to at least 50 by bootstrap resampling. In addition, to give a high rank to the documents that are semantically most similar to the user query within the resulting generalized clusters, we compute similarity using the Kohonen centroid of each document cluster and adjust the secondary ranking according to that similarity.
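The two supporting steps described in the abstract, enlarging a small document set and re-ranking by centroid similarity, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Bayesian SOM itself is omitted, a plain mean vector stands in for the Kohonen centroid, cosine similarity is an assumed similarity measure, and the function names (`bootstrap_upsample`, `rerank`) and parameters are hypothetical. Documents and the query are assumed to be term-weight vectors of equal length.

```python
import math
import random


def bootstrap_upsample(docs, min_observed=30, target=50, seed=0):
    """Resample documents with replacement until `target` vectors exist.

    Resampling is applied only when fewer than `min_observed` documents
    are available (the thresholds 30 and 50 come from the abstract).
    """
    docs = list(docs)
    if not docs or len(docs) >= min_observed:
        return docs
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    extra = [rng.choice(docs) for _ in range(target - len(docs))]
    return docs + extra


def cosine(u, v):
    """Cosine similarity; an assumed measure, not stated in the source."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Component-wise mean, used here as a stand-in for a Kohonen centroid."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def rerank(query, clusters):
    """Order clusters by centroid similarity to the query, then order the
    documents inside each cluster by their own similarity to the query."""
    clusters = [c for c in clusters if c]  # skip empty clusters
    ordered = sorted(clusters,
                     key=lambda c: cosine(query, centroid(c)),
                     reverse=True)
    return [doc
            for c in ordered
            for doc in sorted(c, key=lambda d: cosine(query, d), reverse=True)]
```

In this sketch, `bootstrap_upsample` would be called on the retrieved document vectors before clustering, and `rerank` on the clusters produced afterwards; how the paper integrates these steps with the Bayesian SOM is not specified in the abstract.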
