An XML-based Wrapper System for Integrating Web Information Sources

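  • 배종민 (Gyeongsang National University, Division of Computer Science / Research Institute of Computer and Information Communication)
  • 박은경 (Gyeongsang National University, Department of Computer Science)
  • 정채영 (Gyeongsang National University, Department of Computer Science)
  • Published: 2006.12.30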

Abstract

Web information sources now supply a large share of information services, which has made wrappers for these sources increasingly important. This paper presents the design and implementation of a prototype web wrapper that serves as middleware for integrating web information sources. In particular, we describe a strategy for deriving an XML Schema from HTML documents, a parser for XQuery queries, and a query processing method based on XQJ (the XQuery API for Java). A usage example of the wrapper API illustrates the usefulness of the prototype.
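
To make the idea of deriving an XML Schema from HTML more concrete, the following is a minimal sketch and not the paper's actual derivation strategy, which the abstract does not detail. It assumes the HTML has already been cleaned into well-formed, namespace-free XHTML (for example by a tool such as HTML Tidy) and that the page holds one table whose first row contains `th` header cells. The names `derive_schema`, `infer_type`, `to_name`, `records`, `record`, and the sample rows are hypothetical and chosen only for illustration.

```python
import re
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Illustrative input only: a tiny, already well-formed table.
SAMPLE = """<table>
  <tr><th>Gene ID</th><th>Symbol</th><th>Score</th></tr>
  <tr><td>672</td><td>BRCA1</td><td>0.98</td></tr>
  <tr><td>7157</td><td>TP53</td><td>1</td></tr>
</table>"""


def infer_type(values):
    """Pick the narrowest of three XML Schema types that fits every value."""
    if values and all(re.fullmatch(r"[+-]?\d+", v) for v in values):
        return "xs:integer"
    if values and all(re.fullmatch(r"[+-]?\d+(\.\d+)?", v) for v in values):
        return "xs:decimal"
    return "xs:string"


def to_name(label):
    """Turn a header label into a usable XML element name."""
    name = re.sub(r"\W+", "_", label.strip()).strip("_")
    return name if name and not name[0].isdigit() else "f_" + name


def _element(parent, name, **attrs):
    """Append an xs:element declaration to parent."""
    return ET.SubElement(parent, f"{{{XS}}}element", name=name, **attrs)


def _complex_sequence(element):
    """Give element an anonymous complexType and return its xs:sequence."""
    ctype = ET.SubElement(element, f"{{{XS}}}complexType")
    return ET.SubElement(ctype, f"{{{XS}}}sequence")


def derive_schema(xhtml, root_name="records", row_name="record"):
    """Derive a flat XML Schema from the first header row and the data rows."""
    table = ET.fromstring(xhtml)
    rows = table.findall(".//tr")
    headers = [to_name("".join(th.itertext())) for th in rows[0].findall("th")]
    data = [["".join(td.itertext()).strip() for td in r.findall("td")]
            for r in rows[1:]]

    ET.register_namespace("xs", XS)
    schema = ET.Element(f"{{{XS}}}schema")
    seq = _complex_sequence(_element(schema, root_name))
    row_seq = _complex_sequence(_element(seq, row_name, maxOccurs="unbounded"))
    for i, header in enumerate(headers):
        # Collect one column of text values and infer a type for it.
        column = [row[i] for row in data if i < len(row)]
        _element(row_seq, header, type=infer_type(column))
    return ET.tostring(schema, encoding="unicode")


if __name__ == "__main__":
    print(derive_schema(SAMPLE))
```

In this sketch each table column becomes one typed child element of a repeating `record` element. A real wrapper would also need to handle nested tables, missing headers, and pages that are not table-shaped, none of which is attempted here.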
