A Topic Classification System Based on Clue Expressions for Person-Related Questions and Passages

  • Received : 2015.07.28
  • Accepted : 2015.10.16
  • Published : 2015.12.31


In general, a Q&A system finds an answer by retrieving passages whose terms match those of the question. However, finding the correct answer is difficult because too many passages are retrieved, and term matching alone is not enough to rank them by relevance to the question. To alleviate this problem, we assign a topic to each sentence and use it for ranking in the Q&A system. We define a set of person-related topic classes and clue expressions, which indicate the topic of a sentence. The proposed topic classification system determines the target topic of an input sentence using clue expressions that are manually collected from a corpus. We describe the architecture of the topic classification system and evaluate the performance of its components.


Topic Classification;Clue Expression;Person-Related Topic Class


Grant : Development of Intelligence-Evolving WiseQA Platform Technology for Human Knowledge Augmentation Services

Supported by : Institute for Information & communications Technology Promotion (IITP)