A Novel Technique of Topic Detection for On-line Text Documents: A Topic Tree-based Approach

Korean title (translated): A Hierarchical Tree-based Topic Detection Technique for Online Text Documents

  • Xuan, Man (School of Electrical and Computer Engineering, University of Seoul) ;
  • Kim, Han-Joon (School of Electrical and Computer Engineering, University of Seoul)
  • 현만 (서울시립대학교 전자전기컴퓨터공학부) ;
  • 김한준 (서울시립대학교 전자전기컴퓨터공학부)
  • Published : 2012.11.22

Abstract

Topic detection is the problem of discovering the topics of documents published online. For topic detection, it is important both to extract the correct topic words and to present them in a form that is easy to understand. We consider a topic tree-based approach that presents the results of topic detection for online text documents more effectively and more concisely. To achieve topic tree-based topic detection, we propose a new term weighting method, called CTF-CDF-IDF, which is simple yet effective. Moreover, we modify a conventional clustering method to obtain what we call the incremental k-medoids algorithm. Our experimental results on the Reuters-21578 and Google News collections show that the proposed method is very useful for topic detection.

Keywords