A Korean Community-based Question Answering System Using Multiple Machine Learning Methods

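  • Sunjae Kwon (권순재), Department of Computer Science and Engineering, Sogang University
  • Juae Kim (김주애), Department of Computer Science and Engineering, Sogang University
  • Sangwoo Kang (강상우), Department of Software, Gachon University
  • Jungyun Seo (서정연), Department of Computer Science and Engineering, Sogang University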
  • Received : 2016.04.21
  • Accepted : 2016.07.28
  • Published : 2016.10.15

Abstract

A community-based question answering (cQA) system answers a user's question by selecting answer documents from those that users have previously posted to web communities. To improve question analysis, earlier methods either hand-built rules suited to a target domain or applied machine learning to only part of the process. These methods, however, are costly to extend or modify for new domains, and in some cases the system overfits to a specific domain. This paper proposes a multiple machine learning method that automates the entire cQA pipeline by applying an appropriate machine learning method at each stage. The system consists of a question analysis part and an answer selection part. The question analysis part comprises a question focus extractor, which identifies the focus phrases of a question using conditional random fields (CRF), and a question type classifier, which classifies the topic of a question using a support vector machine (SVM). In the answer selection part, we use an artificial neural network to train the weights used by the similarity estimation models. In addition, morphological analysis results are often unreliable for text posted on web communities. We therefore propose a method that minimizes the impact of morphological analysis by using character (syllable) features in the question analysis stage. The proposed system outperforms existing systems, achieving a Mean Average Precision (MAP) of 0.765 and an R-Precision of 0.872.
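The abstract states only that the question focus extractor is a CRF tagger that relies on character features rather than morphological analysis. The Python sketch below illustrates one way such per-character features and BIO training labels could be built. The window size, the bigram and `word_start` features, the `FOCUS` tag name, and the helper names `char_features`, `sentence_features`, and `bio_labels` are all assumptions for illustration, not the authors' feature templates. Each sentence becomes a list of feature dicts, a format of the kind that CRF toolkits such as sklearn-crfsuite accept as training input.

```python
# Hypothetical sketch: syllable (character) features for CRF-based question
# focus tagging. No morphological analyzer is used; every feature is read
# directly off the raw characters. Feature choices here are assumptions.


def char_features(sentence: str, i: int, window: int = 2) -> dict:
    """Return a feature dict for the i-th character of `sentence`."""
    ch = sentence[i]
    feats = {
        "bias": 1.0,
        "char": ch,
        "is_space": ch.isspace(),
        "is_digit": ch.isdigit(),
        # Precomposed Hangul syllables occupy U+AC00..U+D7A3
        "is_hangul": "\uac00" <= ch <= "\ud7a3",
    }
    # Neighbouring characters inside the window; "<s>"/"</s>" pad the edges
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if j < 0:
            tok = "<s>"
        elif j >= len(sentence):
            tok = "</s>"
        else:
            tok = sentence[j]
        feats[f"char[{offset:+d}]"] = tok
    # Character bigrams that include the current position
    prev_ch = sentence[i - 1] if i > 0 else "<s>"
    next_ch = sentence[i + 1] if i + 1 < len(sentence) else "</s>"
    feats["bigram[-1,0]"] = prev_ch + ch
    feats["bigram[0,+1]"] = ch + next_ch
    # True if this character begins a space-delimited word (eojeol)
    feats["word_start"] = i == 0 or sentence[i - 1].isspace()
    return feats


def sentence_features(sentence: str) -> list[dict]:
    """One feature dict per character: the CRF's input sequence."""
    return [char_features(sentence, i) for i in range(len(sentence))]


def bio_labels(sentence: str, start: int, end: int, tag: str = "FOCUS") -> list[str]:
    """Label characters in [start, end) as B-/I-<tag>, all others as O."""
    return [
        ("B-" if i == start else "I-") + tag if start <= i < end else "O"
        for i in range(len(sentence))
    ]


if __name__ == "__main__":
    question = "맞춤법 질문"  # toy question; the focus span is the first word
    for feats, label in zip(sentence_features(question), bio_labels(question, 0, 3)):
        print(label, feats["char"], feats["bigram[-1,0]"], feats["word_start"])
```

Because every feature comes from the surface characters, a misspelled or slang-laden community post still yields well-defined features, which is the motivation the abstract gives for avoiding dependence on morphological analysis.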

커뮤니티 기반 질의 응답 시스템은 사용자 질의에 대한 정답을 인터넷 커뮤니티에 사용자들이 게시했던 문서 중에서 선택하여 제공하는 시스템이다. 기존 방법들은 질의 분석의 성능 향상을 위하여 목적 영역에 적합한 규칙을 구축하거나 일부 처리 과정에 기계 학습을 적용하였다. 하지만 기존 방법들은 적용 영역을 확장하거나 수정하는 경우 많은 비용이 소요되며 경우에 따라서는 시스템이 특정 영역에 과적합되는 경우가 발생한다. 본 논문에서는 커뮤니티 기반 질의-응답 시스템의 효과적인 처리를 위해서 시스템의 각 과정에 적합한 기계 학습 방법을 적용하여 전체 과정을 자동화하는 다중 기계학습 방법을 제안한다. 제안 시스템은 사용자 질의를 분석하는 부분과 정답 문서를 선택하는 부분으로 나눌 수 있다. 질의 분석 과정은 질의의 초점 구문을 분석하는 질의 핵심부 추출기와 질의의 주제를 분류하는 질의 유형 분류기로 구성하였으며, 전자는 조건부 무작위장을 사용하고 후자는 지지 벡터 기계를 사용한다. 정답 문서 선택에서는 유사도 측정에서 사용하는 가중치를 인공 신경망으로 학습한다. 또한 인터넷에 커뮤니티에 게시된 데이터는 형태소 분석 결과를 신뢰할 수 없는 경우가 많이 발생한다. 따라서 음절 자질을 사용하여 질의를 분석 단계에서 형태소 분석의 영향을 최소화하는 방법을 제안한다. 제안하는 시스템은 Mean Average Precision 기준으로 0.765, R-Precision 기준으로 0.872의 성능을 보여 기존 시스템보다 성능이 우수하다.
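The reported scores use two standard ranking metrics. As a reference point, the Python sketch below computes average precision (AP), its mean over questions (MAP), and R-Precision, which is precision at rank R where R is the number of relevant answers for a question. The function names and the toy document IDs and relevance sets are invented for illustration; the abstract does not describe the evaluation data or how many answers per question were judged relevant.

```python
# Standard IR definitions of AP/MAP and R-Precision (illustrative sketch).


def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Mean of precision@k over the ranks k where a relevant doc appears.

    Dividing by len(relevant) means relevant docs that are never
    retrieved contribute a precision of zero.
    """
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def r_precision(ranked: list[str], relevant: set[str]) -> float:
    """Precision among the top R results, R = number of relevant docs."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranked[:r] if doc in relevant) / r


def mean_over_queries(metric, runs) -> float:
    """Average a per-question metric over (ranked, relevant) pairs."""
    return sum(metric(ranked, relevant) for ranked, relevant in runs) / len(runs)


if __name__ == "__main__":
    runs = [
        (["d1", "d2", "d3"], {"d1", "d3"}),  # relevant docs at ranks 1 and 3
        (["a", "b"], {"b"}),                 # single relevant doc at rank 2
    ]
    print(round(mean_over_queries(average_precision, runs), 4))  # MAP: 0.6667
    print(mean_over_queries(r_precision, runs))                  # 0.25
```

When a question has exactly one relevant answer, R-Precision reduces to whether that answer is ranked first, while AP also rewards placing it near the top; the two numbers therefore summarize different aspects of the same ranking.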

Acknowledgement

Supported by: Institute for Information & communications Technology Promotion (IITP), National Research Foundation of Korea (NRF)