Retrieval Model Based on Word Translation Probabilities and the Degree of Association of Query Concept

어휘 번역확률과 질의개념연관도를 반영한 검색 모델

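  • Jun-Gil Kim (Department of Computer Engineering, Chonbuk National University)
  • Kyung-Soon Lee (Division of Computer Engineering, Research Center for Advanced Image and Information Technology, Chonbuk National University)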
  • Received : 2011.04.29
  • Accepted : 2012.01.31
  • Published : 2012.06.30

Abstract

One of the major challenges for retrieval performance in information retrieval is the word mismatch between users' queries and documents. To address this problem, we propose a retrieval model that incorporates the degree of association of query concepts into a translation-based language model built on word translation probabilities. The word translation probabilities are estimated from pairs consisting of a sentence and its succeeding sentence. To validate the proposed method, we ran experiments on the TREC AP test collection. The results show that the proposed model achieved a significant improvement over the language model and also outperformed the translation-based language model.

정보 검색에서 성능 저하의 주요 요인은 사용자의 질의와 검색 문서 사이에서의 어휘 불일치 때문이다. 어휘 불일치 문제를 해결하기 위해 본 논문에서는 어휘 번역확률을 이용한 번역기반 언어모델에 질의개념연관도를 반영한 검색 모델을 제안한다. 어휘관계 정보를 획득하기 위하여 문장-다음문장 쌍을 이용하여 어휘 번역확률을 계산하였다. 제안모델의 유효성을 검증하기 위해 TREC AP 컬렉션에 대해 실험하였다. 실험결과에서 제안모델이 언어모델에 비해 아주 우수한 성능향상을 보였고, 번역기반 언어모델에 비해서도 높은 성능을 나타냈다.
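The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the reference list points to the GIZA tool for alignment, whereas this sketch swaps in a tiny IBM Model 1 style EM loop (without a NULL word) trained on sentence/next-sentence pairs. It then scores documents with a translation-based language model in the spirit of references 1 and 7. The function names, the mixing weights `lam` and `beta`, and the toy corpus are all hypothetical. The query-concept association component is not sketched, because the abstract does not define it.

```python
import math
from collections import Counter, defaultdict


def successor_pairs(sentences):
    """Pair each tokenized sentence with the sentence that follows it."""
    return list(zip(sentences, sentences[1:]))


def train_translation_probs(pairs, iterations=10):
    """Estimate P(t | s) by IBM Model 1 style EM (assumption: no NULL word).

    s is a word in a sentence, t is a word in the succeeding sentence.
    The result maps (t, s) -> P(t | s).
    """
    targets = {t for _, nxt in pairs for t in nxt}
    # Uniform initialization over all co-occurring (t, s) pairs.
    prob = {}
    for cur, nxt in pairs:
        for s in cur:
            for t in nxt:
                prob[(t, s)] = 1.0 / len(targets)
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each target word over the source words.
        for cur, nxt in pairs:
            for t in nxt:
                norm = sum(prob[(t, s)] for s in cur)
                for s in cur:
                    frac = prob[(t, s)] / norm
                    count[(t, s)] += frac
                    total[s] += frac
        # M-step: renormalize so that sum_t P(t | s) = 1 for every s.
        prob = {(t, s): c / total[s] for (t, s), c in count.items()}
    return prob


def score(query, doc, prob, collection, lam=0.2, beta=0.5):
    """Log query likelihood under a translation-based language model.

    P(w|D) = (1-lam) * [(1-beta) * P_ml(w|D)
                        + beta * sum_t P(w|t) * P_ml(t|D)] + lam * P(w|C)
    lam and beta are hypothetical smoothing weights, not values from the paper.
    """
    doc_tf, coll_tf = Counter(doc), Counter(collection)
    log_p = 0.0
    for w in query:
        p_ml = doc_tf[w] / len(doc)
        p_tr = sum(prob.get((w, t), 0.0) * n / len(doc)
                   for t, n in doc_tf.items())
        p_doc = (1 - beta) * p_ml + beta * p_tr
        p = (1 - lam) * p_doc + lam * coll_tf[w] / len(collection)
        log_p += math.log(max(p, 1e-12))  # floor guards against log(0)
    return log_p


# Toy corpus (hypothetical): "automobile" is followed by a sentence with "car".
corpus = [["automobile", "engine"], ["car", "engine"],
          ["banana", "fruit"], ["fruit", "salad"]]
prob = train_translation_probs(successor_pairs(corpus))
collection = [w for sentence in corpus for w in sentence]
related = score(["car"], ["automobile", "engine"], prob, collection)
unrelated = score(["car"], ["banana", "fruit"], prob, collection)
print(related > unrelated)
```

In this toy setting the query word "car" never occurs in either document, so a plain language model would separate them only through smoothing. The translation term lets the document containing "automobile" receive credit, which is the word mismatch effect the abstract targets.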

References

  1. A. Berger and J. Lafferty, "Information retrieval as statistical translation," Proceedings of the 22nd annual international ACM SIGIR conference, pp.222-229, Aug., 1999.
  2. P. F. Brown, S. A. Della Pietra, V. J. Della Pietra, and R. L. Mercer, "The mathematics of statistical machine translation: parameter estimation," Computational Linguistics, Vol.19, No.2, pp.263-311, 1993.
  3. V. Murdock and W. B. Croft, "A Translation Model for sentence retrieval," Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pp.684-691, 2005.
  4. J. Jeon, W. B. Croft and J. H. Lee, "Finding Similar Questions in Large Question and Answer Archives," Proceedings of the 14th ACM CIKM Conference, pp.84-90, 2005.
  5. J. M. Ponte and W. B. Croft, "A language modeling approach to information retrieval," Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.275-281, 1998.
  6. R. Jin, A. G. Hauptmann, and C. Zhai, "Title language model for information retrieval," Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.42-48, 2002.
  7. X. Xue, J. Jeon and W. B. Croft, "Retrieval Models for Question and Answer Archives," Proceedings of the 31st annual international ACM SIGIR conference, pp.475-482, 2008.
  8. 김설영, 이경순, "질문대답 아카이브에서 어휘 연관성을 이용한 질문 분류," 정보처리학회논문지B, 제17권 제4호, pp.327-332, 2010. https://doi.org/10.3745/KIPSTB.2010.17B.4.327
  9. GIZA tool. http://code.google.com/p/giza-pp/
  10. F. J. Och and H. Ney, "A Systematic Comparison of Various Statistical Alignment Models," Computational Linguistics, Vol.29, No.1, pp.19-51, 2003. https://doi.org/10.1162/089120103321337421