Question Classification Based on Word Association for Question and Answer Archives

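  • 김설영 (Department of Computer Engineering, Chonbuk National University) ;
  • 이경순 (Division of Computer Engineering / Center for Advanced Image and Information Technology, Chonbuk National University)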
  • Received : 2010.02.25
  • Accepted : 2010.04.20
  • Published : 2010.08.31

Abstract

Word mismatch is a major cause of low performance in question classification: questions typically consist of only two or three words, and the same meaning can be expressed in many different ways. It is therefore essential to reflect word association in question classification. In this paper, we propose a question classification method based on a translation-based language model that uses word translation probabilities learned from question-question pairs within the same category. In experiments on the Yahoo! Answers question and answer archive, we show that translation probabilities computed from question-question pairs in the same category are more effective for question classification than those computed from question-answer pairs over the entire collection.
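
The idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so the code below assumes a common form of the translation-based language model: a category's unigram model is mixed with translation mass from associated words and then smoothed with the whole collection. The function names, the weights `beta` and `lam`, the toy questions, and the hand-written translation table are all hypothetical; in practice the per-category table `T(w | t)` would be learned from question-question pairs rather than typed in.

```python
import math
from collections import Counter


def unigram(texts):
    """Maximum-likelihood unigram distribution over whitespace tokens."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def score(question, cat_lm, coll_lm, trans, beta=0.5, lam=0.2):
    """Log-likelihood of a question under one category (assumed model form).

    trans[w][t] holds an assumed translation probability T(w | t).
    """
    logp = 0.0
    for w in question.lower().split():
        direct = cat_lm.get(w, 0.0)
        # Mass that reaches w from associated words t seen in the category.
        translated = sum(p * cat_lm.get(t, 0.0)
                         for t, p in trans.get(w, {}).items())
        mixed = (1 - beta) * direct + beta * translated
        # Smooth with the collection model; 1e-6 is a floor for unseen words.
        logp += math.log((1 - lam) * mixed + lam * coll_lm.get(w, 1e-6))
    return logp


def classify(question, training, trans_by_cat):
    """Return the category whose model gives the question the highest score."""
    coll_lm = unigram([q for qs in training.values() for q in qs])
    scores = {
        cat: score(question, unigram(qs), coll_lm, trans_by_cat.get(cat, {}))
        for cat, qs in training.items()
    }
    return max(scores, key=scores.get)


# Toy training questions per category (hypothetical data).
training = {
    "Cars": ["car engine repair", "replace brake pads", "engine oil change"],
    "Computers": ["laptop battery replacement", "install linux laptop",
                  "wifi router setup"],
}
# Hypothetical per-category tables: trans_by_cat[cat][w][t] = T(w | t).
trans_by_cat = {
    "Cars": {"automobile": {"car": 0.6, "engine": 0.2}},
    "Computers": {"notebook": {"laptop": 0.7}},
}

print(classify("automobile maintenance", training, trans_by_cat))
```

In this toy setup neither "automobile" nor "maintenance" appears in any training question, so a plain unigram model cannot separate the two categories. The assumed translation table links "automobile" to "car" and "engine", which is what lets the short question be assigned to the Cars category despite the word mismatch.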
