Development of Web Crawler for Archiving Web Resources

Research and Development of a Web Crawler for Archiving Web Resources

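  • 김광영 (Information Technology Research Lab, Korea Institute of Science and Technology Information) ;
  • 이원구 (Information Technology Research Lab, Korea Institute of Science and Technology Information) ;
  • 이민호 (Information Technology Research Lab, Korea Institute of Science and Technology Information) ;
  • 윤화묵 (Information Technology Research Lab, Korea Institute of Science and Technology Information) ;
  • 신성호 (Information Technology Research Lab, Korea Institute of Science and Technology Information)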
  • Received : 2011.06.29
  • Accepted : 2011.08.03
  • Published : 2011.09.28

Abstract

Once a web service is terminated and disappears, there is no way to collect, preserve, or use its web resources. Yet web resources, regardless of their importance, are updated periodically or aperiodically, or are deleted. For this reason, web archiving, which collects and preserves web resources, is receiving growing emphasis. Collecting web resources periodically requires a crawler built specifically for web archiving. In this study, we analyze the strengths and weaknesses of existing web crawlers used to collect web resources for archiving. Based on that analysis, we have developed a web archiving system designed for optimal collection of web resources.

Keywords

Archiving Web Resources; Web Crawler; Web Snapshot Robot; Archiving System; Permanent Preservation

Cited by

  1. Study of Analyzing Outcome of Building and Introducing System for Preserving Full-Text of e-Journal vol.2, pp.2, 2012, https://doi.org/10.5865/IJKCT.2012.2.2.005
  2. Refresh Cycle Optimization for Web Crawlers vol.13, pp.6, 2013, https://doi.org/10.5392/JKCA.2013.13.06.030