Query Expansion Using Augmented Terms in an Extended Boolean Model

  • Nguyen, Tuan-Quang (Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST))
  • Heo, Jun-Seok (Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST))
  • Lee, Jung-Hoon (Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST))
  • Kim, Yi-Reun (Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST))
  • Whang, Kyu-Young (Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST))
  • Published: 2008.03.31

Abstract

We propose a new query expansion method for the extended Boolean model that improves precision without degrading recall. To improve precision, our method promotes the ranks of documents that contain more of the query terms, since users typically prefer such documents. The proposed method consists of three steps: (1) expanding the query by adding new terms related to each query term, (2) further expanding the query by adding augmented terms, which are conjunctions of the terms, and (3) assigning a weight to each term so that augmented terms have higher weights than the other terms. We conduct extensive experiments to show the effectiveness of the proposed method. The experimental results show that, on the TREC-6 data, the proposed method improves precision by up to 102% compared with the existing thesaurus-based query expansion method proposed by Kwon et al.
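
The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the weighting scheme or the exact form of the augmented terms, so the weights `w_orig`, `w_related`, and `w_aug`, the `related` dictionary standing in for a thesaurus, and the choice to build augmented terms from every combination of two or more original query terms are all assumptions made for illustration. Scoring uses the p-norm OR and AND operators of the extended Boolean model (Salton, Fox, and Wu, 1983).

```python
from itertools import combinations


def pnorm_or(pairs, p=2.0):
    """p-norm OR similarity over (query_weight, document_value) pairs."""
    den = sum(a ** p for a, _ in pairs)
    if den == 0:
        return 0.0
    num = sum((a ** p) * (d ** p) for a, d in pairs)
    return (num / den) ** (1.0 / p)


def pnorm_and(pairs, p=2.0):
    """p-norm AND similarity over (query_weight, document_value) pairs."""
    den = sum(a ** p for a, _ in pairs)
    if den == 0:
        return 0.0
    num = sum((a ** p) * ((1.0 - d) ** p) for a, d in pairs)
    return 1.0 - (num / den) ** (1.0 / p)


def expand_query(terms, related, w_orig=1.0, w_related=0.5, w_aug=2.0):
    """Return a list of (components, weight) entries for the expanded query.

    All three weights are hypothetical; the only property taken from the
    abstract is that augmented terms outweigh the other terms.
    """
    expanded = []
    # Step 1: add terms related to each query term (thesaurus is assumed).
    for t in terms:
        expanded.append(((t,), w_orig))
        for r in related.get(t, []):
            expanded.append(((r,), w_related))
    # Step 2: add augmented terms, here conjunctions of original query terms.
    for size in range(2, len(terms) + 1):
        for combo in combinations(terms, size):
            # Step 3: augmented terms receive the highest weight.
            expanded.append((combo, w_aug))
    return expanded


def score(doc_weights, expanded, p=2.0):
    """Score a document as a p-norm OR over all expanded terms."""
    pairs = []
    for components, weight in expanded:
        if len(components) == 1:
            value = doc_weights.get(components[0], 0.0)
        else:
            # An augmented term is evaluated as a p-norm AND of its parts.
            value = pnorm_and(
                [(1.0, doc_weights.get(c, 0.0)) for c in components], p
            )
        pairs.append((weight, value))
    return pnorm_or(pairs, p)


query = ["query", "expansion"]
related = {"query": ["search"], "expansion": ["enlargement"]}
expanded = expand_query(query, related)

doc_both = {"query": 1.0, "expansion": 1.0}  # contains both query terms
doc_one = {"query": 1.0, "search": 1.0}      # one query term plus a related term

print(score(doc_both, expanded), score(doc_one, expanded))
```

Both example documents match two terms of the expanded query, but `doc_both` scores higher because it fully satisfies the heavily weighted conjunction of the original query terms. This is the rank-promotion effect the abstract describes, under the assumed weights.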

References

  1. BAEZA-YATES, R. AND RIBEIRO-NETO, B., Modern Information Retrieval, Addison Wesley, 1999.
  2. XU, J. AND CROFT, W. B., "Improving the Effectiveness of Information Retrieval with Local Context Analysis," ACM Transactions on Information Systems (TOIS), Vol. 18, No. 1, pp. 79-112, Jan. 2000. https://doi.org/10.1145/333135.333138
  3. KWON, O. W., KIM, M. C., AND CHOI, K. S., "Query Expansion Using Domain Adapted, Weighted Thesaurus in an Extended Boolean Model," In Proc. 3rd Int'l Conf. on Information and Knowledge Management, pp. 140-146, Gaithersburg, Maryland, Nov. 1994.
  4. MANDALA, R., TOKUNAGA, T., AND TANAKA, H., "Combining Multiple Evidence from Different Types of Thesaurus for Query Expansion," In Proc. 22nd Int'l ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. 191-197, Berkeley, Aug. 1999.
  5. SALTON, G. AND VOORHEES, E., "A Comparison of Two Methods for Boolean Query Relevancy Feedback," Information Processing & Management, Vol. 20, No. 5, pp. 637-651, Sept. 1984. https://doi.org/10.1016/0306-4573(84)90080-3
  6. CLARKE, C. L. A., CORMACK, G. V., AND TUDHOPE, E. A., "Relevance Ranking for One to Three Term Queries," Information Processing & Management, Vol. 36, No. 2, pp. 291-311, Mar. 2000. https://doi.org/10.1016/S0306-4573(99)00017-5
  7. SALTON, G., FOX, E. A., AND WU, H., "Extended Boolean Information Retrieval," Communications of the ACM, Vol. 26, No. 12, pp. 1022-1036, 1983. https://doi.org/10.1145/182.358466
  8. SALTON, G. AND LESK, M. E., "Computer Evaluation of Indexing and Text Processing," Journal of the ACM, Vol. 15, No. 1, pp. 8-36, Jan. 1968. https://doi.org/10.1145/321439.321441
  9. CHUNG, Y. M. AND LEE, J. Y., "Optimization of Some Factors Affecting the Performance of Query Expansion," Information Processing & Management, Vol. 40, No. 6, pp. 891-917, Nov. 2004. https://doi.org/10.1016/j.ipm.2003.11.003
  10. NIE, J. AND JIN, F., "Integrating Logical Operators in Query Expansion in Vector Space Model," In Proc. ACM SIGIR Workshop on Mathematical/Formal Methods in Information Retrieval, Tampere, Finland, Aug. 2002.
  11. SILBERSCHATZ, A., GALVIN, P. B., AND GAGNE, G., Operating System Concepts, Wiley, 2003.
  12. HIEMSTRA, D., "A Linguistically Motivated Probabilistic Model of Information Retrieval," In Proc. The 2nd European Conference on Research and Advanced Technology for Digital Libraries (ECDL), pp. 569-584, Crete, Greece, Sept. 1998.
  13. VOORHEES, E. M. AND HARMAN, D., "Overview of the Sixth Text Retrieval Conference (TREC-6)," In Proc. The 6th Text REtrieval Conference, pp. 1-24, Gaithersburg, Maryland, Nov. 19-21, 1997.
  14. FELLBAUM, C., WordNet: An Electronic Lexical Database, MIT Press, 1998.
  15. WHANG, K., LEE, M., KIM, M., AND HAN, W., "Odysseus: a High-Performance ORDBMS Tightly-Coupled with IR Features," In Proc. IEEE 21st Int'l Conf. on Data Engineering (ICDE), pp. 1104-1105, Tokyo, Japan, Apr. 5-8, 2005.

Cited by

  1. Selective sampling techniques for feedback-based data retrieval, vol. 22, pp. 1-2, 2011, https://doi.org/10.1007/s10618-010-0168-8