A Schema Extraction Method using Elements Information in XML Documents

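  • Sung-Rim Kim (김성림; Full-time Lecturer, Computer Science Major, Division of Information Science, Dongduk Women's University) ;
  • Yong-Ik Yoon (윤용익; Department of Multimedia, Division of Information Science, Sookmyung Women's University)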
  • Published : 2002.06.01

Abstract

XML documents, which are becoming a new standard for expressing and exchanging data on the Internet, do not have a defined schema. They are not well suited to direct use with existing SQL or OQL, and research on schema extraction and query languages for XML documents is actively under way. A user's query may return too many or too few results, so it is important to give users an adequate number of results. This paper proposes a method for extracting multi-level schemas according to the frequency of element occurrence in XML documents. The schema can then be reduced or extended to correspond more flexibly to users' queries.

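XML documents, which are emerging as a new standard for representing and exchanging data on the Internet, have no fixed schema. Because XML documents are ill suited to direct use with existing SQL or OQL, research on schema extraction methods and query languages for such documents is actively under way. This paper presents and experimentally evaluates a method that first extracts a schema from XML documents using element information and then, based on the extracted schema, derives new multi-level schemas according to data frequency.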
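The abstract only outlines the idea of frequency-based, multi-level schemas, so the following is a minimal sketch of one way such levels could be computed, not the authors' algorithm. It assumes that a "schema" is simply a set of root-to-element tag paths and that frequency means the fraction of documents containing a path. The function names, the thresholds, and the sample movie documents are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_paths(xml_text):
    """Return the set of root-to-element tag paths found in one document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def path_frequencies(documents):
    """Fraction of documents in which each path occurs at least once."""
    counts = Counter()
    for doc in documents:
        counts.update(element_paths(doc))
    return {path: n / len(documents) for path, n in counts.items()}


def leveled_schemas(documents, thresholds):
    """Map each threshold to the sorted paths whose frequency meets it.

    Assumption: a higher threshold gives a smaller (reduced) schema and
    a lower threshold gives a larger (extended) one.
    """
    freq = path_frequencies(documents)
    return {t: sorted(p for p, f in freq.items() if f >= t) for t in thresholds}


# Hypothetical sample documents, for illustration only.
docs = [
    "<movie><title>A</title><year>1999</year><director>X</director></movie>",
    "<movie><title>B</title><year>2000</year></movie>",
    "<movie><title>C</title><award>Best</award></movie>",
]
schemas = leveled_schemas(docs, thresholds=[1.0, 0.5, 0.0])
for threshold, paths in schemas.items():
    print(threshold, paths)
```

Because a parent path occurs in every document that contains one of its children, each level produced this way is still a connected tree. Lowering the threshold extends the schema with rarer elements, and raising it reduces the schema to the elements that most documents share.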
