Modified Naïve Bayes Classifier for Categorizing Questions in Question-Answering Community

Korean title: Classifying Questions in Question-Answering Communities Using an Extended Naïve Bayes Classifier

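  • 연종흠 (Jongheum Yeon), School of Electrical Engineering and Computer Science, Seoul National University
  • 심준호 (Junho Shim), Division of Information Science, Sookmyung Women's University
  • 이상구 (Sang-goo Lee), School of Electrical Engineering and Computer Science, Seoul National University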
  • Published : 2010.01.15

Abstract

Social media refers to content created by users, such as blogs, social networks, and wikis. Question-answering (QA) communities, in which users share information by asking and answering questions, have recently come to be regarded as a kind of social media, and over the past decade they have become a huge source of information. However, as the number of question-answer pairs in these communities grows, it becomes hard for users to find the ones that exactly match their needs. To support more accurate information retrieval, this paper proposes an approach that classifies a question into one of three categories (information, opinion, and suggestion) according to its purpose. Specifically, our approach is based on a modified Naïve Bayes classifier that uses the structural characteristics of QA documents to improve classification accuracy. In our experiments, the approach achieved a classification accuracy of about 71.2%.

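Social media are information content created through user participation, such as blogs, social networks, and wikis. Question-answering community services, in which other users answer the questions a user posts, are one kind of social media and have accumulated a large amount of information over the past several years. However, as the volume of accumulated questions and answers grows, accurately retrieving earlier questions becomes increasingly difficult. For efficient information retrieval in QA communities, this paper proposes a technique that uses an extended Naïve Bayes classifier to automatically classify questions by purpose into information, suggestion, and opinion types. To classify accurately, the classifier exploits the structural characteristics of question-answer documents. Experiments on questions from a real question-answering community yielded a classification accuracy of 71.2%.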
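The abstract does not spell out how the classifier uses the structure of QA documents. A minimal sketch, assuming one plausible reading: a multinomial Naïve Bayes classifier whose term counts are weighted by the field a term appears in (question title, question body, or answer), predicting $\hat{c} = \arg\max_c \left[\log P(c) + \sum_t n_t \log P(t \mid c)\right]$, where $n_t$ is the field-weighted count of term $t$. The class `FieldWeightedNaiveBayes`, the `FIELD_WEIGHTS` values, and the toy training questions are hypothetical illustrations, not the authors' model or data.

```python
import math
from collections import Counter

# Hypothetical per-field weights: a term in the question title counts more
# than one in the body or in an answer. Illustrative assumptions only, not
# parameters reported in the paper.
FIELD_WEIGHTS = {"title": 2.0, "body": 1.0, "answer": 0.5}


class FieldWeightedNaiveBayes:
    """Multinomial Naive Bayes with term counts scaled by structural field."""

    def __init__(self, field_weights, alpha=1.0):
        self.field_weights = field_weights
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant

    def _weighted_counts(self, doc):
        # doc maps a field name ("title", "body", "answer") to a token list.
        counts = Counter()
        for field, tokens in doc.items():
            weight = self.field_weights.get(field, 1.0)
            for token in tokens:
                counts[token] += weight
        return counts

    def fit(self, docs, labels):
        labels = list(labels)
        self.classes = sorted(set(labels))
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        # Accumulate field-weighted term mass per class.
        self.term_mass = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.term_mass[label].update(self._weighted_counts(doc))
        self.vocab = set()
        for mass in self.term_mass.values():
            self.vocab.update(mass)
        self.class_total = {c: sum(m.values()) for c, m in self.term_mass.items()}
        return self

    def _log_likelihood(self, token, c):
        # Smoothed estimate of log P(token | c).
        num = self.term_mass[c][token] + self.alpha
        den = self.class_total[c] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def _score(self, counts, c):
        # log P(c) + sum of weighted count * log P(term | c); unseen terms skipped.
        return self.log_prior[c] + sum(
            n * self._log_likelihood(t, c) for t, n in counts.items() if t in self.vocab
        )

    def predict(self, doc):
        counts = self._weighted_counts(doc)
        return max(self.classes, key=lambda c: self._score(counts, c))


# Toy, made-up training questions (one per category).
train_docs = [
    {"title": ["what", "is", "capital", "france"], "body": ["capital", "city"], "answer": ["paris"]},
    {"title": ["what", "do", "you", "think", "movie"], "body": ["think", "good"], "answer": ["i", "think", "great"]},
    {"title": ["recommend", "good", "laptop"], "body": ["recommend", "budget"], "answer": ["try", "this", "laptop"]},
]
train_labels = ["information", "opinion", "suggestion"]

clf = FieldWeightedNaiveBayes(FIELD_WEIGHTS).fit(train_docs, train_labels)
print(clf.predict({"title": ["recommend", "laptop"]}))  # suggestion
```

Scaling counts by field keeps a single set of class-conditional estimates rather than a separate vocabulary per field; in practice the weights would need to be tuned on held-out questions.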
