Design and Implementation of an HTML Pages Modification Detector for Meta-search Engines

  • Published: 2002.06.01


The HTML pages returned by search engines change frequently. This degrades the functionality of meta-search engines, which integrate the result pages of individual search engines and present them to users. To address this problem, this paper designs and implements an HTML page modification detector that detects changes in HTML pages. To extract the structure of a page, the detector uses a position information algorithm (based on the positions of elements in the page) together with a modified Jaak Vilo algorithm, and derives patterns from the result. These patterns represent the structures that occur repeatedly in an HTML page. To measure the accuracy of the detector, we define a strategy for page modifications and carry out experiments based on it.
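The idea of detecting modification through repeatedly occurring structural patterns can be illustrated with a small sketch. This is not the detector described in the paper: it substitutes plain fixed-length tag n-gram counting for the position information algorithm and the modified Jaak Vilo algorithm, and every name in it (`tag_sequence`, `repeated_patterns`, `has_changed`, the `length` and `min_count` parameters) is a hypothetical choice made for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Collects opening and closing tag names in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)


def tag_sequence(html):
    """Reduce a page to its tag sequence, ignoring text and attributes."""
    parser = TagSequence()
    parser.feed(html)
    parser.close()
    return parser.tags


def repeated_patterns(tags, length=4, min_count=2):
    """Return the tag n-grams of the given length seen at least min_count times.

    Assumption: simple n-gram counting stands in for the paper's
    pattern discovery step, which is not reproduced here.
    """
    windows = (tuple(tags[i:i + length]) for i in range(len(tags) - length + 1))
    counts = Counter(windows)
    return {pattern for pattern, count in counts.items() if count >= min_count}


def has_changed(old_html, new_html, length=4, min_count=2):
    """Report a modification when the sets of repeated patterns differ."""
    old = repeated_patterns(tag_sequence(old_html), length, min_count)
    new = repeated_patterns(tag_sequence(new_html), length, min_count)
    return old != new


# Hypothetical result pages: same layout with different hits, then a new layout.
old_page = "<ul>" + "<li><a href='a'>hit</a></li>" * 3 + "</ul>"
same_layout = "<ul>" + "<li><a href='b'>other hit</a></li>" * 5 + "</ul>"
new_layout = "<div>" + "<p><b>hit</b></p>" * 3 + "</div>"
```

In this sketch, `has_changed(old_page, same_layout)` compares only repeated structure, so a page whose result text and result count differ but whose layout is the same is not flagged, while `has_changed(old_page, new_layout)` is. A real detector would need a more robust pattern discovery step and a tolerance for partial matches.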


