An Efficient Web Search Method Based on a Style-based Keyword Extraction and a Keyword Mining Profile

A Web Search Method Based on Style-based Keyword Extraction and a Keyword Mining Profile

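  • 주길홍 (Department of Computer Science, Graduate School, Yonsei University) ;
  • 이준휘 (Softgram Technology Research Institute) ;
  • 이원석 (Department of Computer Science, Yonsei University)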
  • Published : 2004.10.01

Abstract

With the popularization of the World Wide Web (WWW), the quantity of information on the web has increased. An efficient search system is therefore needed to return accurate results from diverse information to users, which makes it important to extract and analyze user requirements in a distributed information environment. Conventional search methods use only a keyword for web searching. The search method proposed in this paper adds the context information of the keyword to make searching more effective. It extracts keywords with a new keyword extraction method, also proposed in this paper, and performs the web search based on a keyword mining profile generated from the extracted keywords. Unlike conventional methods, which search for information by a representative word alone, the proposed method searches with an example-based query that includes context information as well as the representative word, and is therefore more efficient and accurate. The method also builds a domain keyword list so that searches can be performed quickly; a domain keyword is a representative word of a specific domain. The performance of the proposed algorithm is analyzed through a series of experiments to identify its various characteristics.

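The popularization of the World Wide Web has rapidly increased the volume of electronic information, and with it the need for search systems that handle this large and diverse body of information efficiently. Correct analysis and description of user requirements is recognized as important for providing accurate search results, and the need to extract and analyze requirements in a distributed environment is growing. This paper proposes and implements a web search method that, unlike conventional methods that search with the target search term alone, adds the context information in which the search term appears. It also proposes and implements a web search system based on a keyword mining profile built from keywords extracted by the proposed keyword extraction method. Because the search uses the context surrounding the search term in addition to the target term that represents the desired information, it provides more accurate and efficient results than conventional search methods. A ranked domain keyword list was built from a specific domain and used as a baseline to show experimentally how the proposed extraction differs from the conventional occurrence-frequency-based approach, and the effectiveness of the search method, which builds a keyword mining profile from an example-based query and searches with it, was verified through experiments.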
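The idea of searching with a keyword plus its context can be illustrated with a small sketch. The abstract does not give the actual extraction or mining algorithm, so everything below is an assumption made for illustration: the function names (`build_profile`, `score`, `search`), the fixed co-occurrence window, the tiny stopword list, and the additive scoring rule are hypothetical and are not the authors' method.

```python
from collections import Counter
import re

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"a", "an", "the", "is", "in", "of", "and", "that"}


def tokenize(text):
    """Lowercase the text and return its non-stopword alphanumeric tokens."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_profile(example_text, target, window=3):
    """Count the terms that occur within `window` tokens of the target keyword.

    This stands in for a keyword profile derived from an example-based query.
    """
    tokens = tokenize(example_text)
    profile = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), i + window + 1
            for ctx in tokens[lo:hi]:
                if ctx != target:
                    profile[ctx] += 1
    return profile


def score(document, target, profile):
    """Return 0 if the target is absent, else 1 plus the weights of matching context terms."""
    terms = set(tokenize(document))
    if target not in terms:
        return 0
    return 1 + sum(weight for term, weight in profile.items() if term in terms)


def search(documents, target, profile):
    """Return documents containing the target, best context match first."""
    scored = [(score(doc, target, profile), doc) for doc in documents]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for s, doc in ranked if s > 0]


if __name__ == "__main__":
    example = "The jaguar is a big cat that hunts in the rain forest."
    docs = [
        "Jaguar unveils a new luxury car",
        "A jaguar is a big cat found in the forest",
        "Cars and trucks",
    ]
    profile = build_profile(example, "jaguar")
    for doc in search(docs, "jaguar", profile):
        print(score(doc, "jaguar", profile), doc)
```

A keyword-only search would treat both documents that mention "jaguar" as equal matches. In this sketch, the context terms taken from the example text move the document about the animal ahead of the one about the car, which is the kind of gain the abstract attributes to adding context information.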
