Answer Snippet Retrieval for Question Answering of Medical Documents

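  • 이현구 (Department of Computer and Communications Engineering, Kangwon National University) ;
  • 김민경 (Department of Computer and Communications Engineering, Kangwon National University) ;
  • 김학수 (Department of Computer and Communications Engineering, Kangwon National University)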
  • Received : 2016.02.04
  • Accepted : 2016.05.08
  • Published : 2016.08.15

Abstract

As the number of online medical documents grows explosively, so does the demand for question-answering systems. Recently, question-answering models based on machine learning have performed well in various domains. However, many question-answering models in the medical domain still rely on information retrieval techniques because training data are sparse. We propose an answer snippet retrieval model, built on various information retrieval techniques, for question answering over medical documents. The proposed model first retrieves candidate answer sentences from medical documents using a cluster-based retrieval technique. It then generates reliable answer snippets by re-ranking the candidate answer sentences with a model based on various sentence retrieval techniques. In experiments on BioASQ 4b, the proposed model outperformed previous models, achieving a MAP of 0.0604.

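The abstract reports its result as MAP (mean average precision). As a rough illustration only, the textbook rank-based definition of MAP can be sketched in Python as below. This is an assumption about the general shape of the metric, not the evaluation code used in the paper: the official BioASQ snippet measures may count partial overlaps between snippets differently, so this sketch is not expected to reproduce the reported 0.0604. The function names and the snippet identifiers are hypothetical.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision of one ranked list against a set of relevant items."""
    if not relevant:
        return 0.0
    seen = set()          # ignore repeated items so a duplicate is not counted twice
    hits = 0
    precision_sum = 0.0
    rank = 0
    for item in ranked:
        if item in seen:
            continue
        seen.add(item)
        rank += 1
        if item in relevant:
            hits += 1
            precision_sum += hits / rank   # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of per-question average precision over (ranked, relevant) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical example: two questions with made-up snippet identifiers.
example_runs = [
    (["s1", "s2", "s3"], {"s1", "s3"}),   # AP = (1/1 + 2/3) / 2
    (["a", "b"], {"b"}),                  # AP = (1/2) / 1
]
print(round(mean_average_precision(example_runs), 4))  # 0.6667
```

Averaging per question before taking the mean means each question contributes equally, regardless of how many relevant snippets it has.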

Acknowledgement

Grant: Development of a Linked Data-Based Conversational Question-Answering Retrieval Framework

Supported by: LG Electronics
