A Labeling Methods for Keyword Search over Large XML Documents

  • 선동한 (Computer and Information Engineering, Korea Aerospace University);
  • 황수찬 (Computer and Information Engineering, Korea Aerospace University)
  • Received : 2014.02.20
  • Accepted : 2014.08.19
  • Published : 2014.09.15

Abstract

As XML documents grow larger and more complex, searching them calls for a keyword-based method that does not require knowledge of the document structure. To support such a method, every keyword that appears as a node in the XML document must be labeled for indexing, and the labels must also represent structural information adequately. Existing labeling methods, however, either record only simple information about the document for indexing or represent structural information in a form that scales poorly as the document grows. As a result, larger XML documents lead to either poor keyword search performance or an exponential increase in space usage. In this paper, we present the Repetitive Prime Labeling Scheme (RPLS), which addresses these problems of existing labeling methods for keyword-based search over large XML documents. RPLS builds on the existing prime number labeling method and allows a parent's prime number to be reused at lower levels, which reduces the number of prime numbers that must be generated. We also report experimental results comparing our method with existing methods.
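For context, the baseline that RPLS builds on can be sketched roughly as follows. This is a minimal illustration of the classic top-down prime number labeling idea (each node receives its own prime, and a node's label is its parent's label multiplied by that prime), not of RPLS itself, whose exact reuse rule the abstract does not specify. The sample tree, the node names, and the functions `primes`, `label_tree`, and `is_ancestor` are all hypothetical.

```python
from collections import deque
from itertools import count
from typing import Dict, Iterator, List


def primes() -> Iterator[int]:
    """Yield 2, 3, 5, ... by trial division (adequate for a small sketch)."""
    found: List[int] = []
    for n in count(2):
        if all(n % p for p in found if p * p <= n):
            found.append(n)
            yield n


def label_tree(tree: Dict[str, List[str]], root: str) -> Dict[str, int]:
    """Assign label(child) = label(parent) * fresh prime, breadth-first.

    Assumption: the root is labeled 1, so it divides every other label.
    """
    gen = primes()
    labels = {root: 1}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in tree.get(node, []):
            labels[child] = labels[node] * next(gen)
            queue.append(child)
    return labels


def is_ancestor(labels: Dict[str, int], a: str, d: str) -> bool:
    """a is a proper ancestor of d iff label(a) divides label(d)."""
    return a != d and labels[d] % labels[a] == 0


# Hypothetical document tree: parent -> ordered list of children.
TREE = {
    "bib": ["book", "article"],
    "book": ["title", "author"],
    "article": ["article_title"],
}

LABELS = label_tree(TREE, "bib")
```

In this baseline every non-root node consumes a new prime, so the primes (and therefore the labels) grow quickly with document size. That growth is the cost the abstract says RPLS reduces by letting a parent's prime be reused at lower levels.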

Cited by

  1. A dynamic and parallel approach for repetitive prime labeling of XML with MapReduce, vol. 73, no. 2, 2017, https://doi.org/10.1007/s11227-016-1803-y
  2. Efficiently Answering Reachability Queries for Tree-Structured Data in Repetitive Prime Number Labeling Schemes, vol. 8, no. 5, 2018, https://doi.org/10.3390/app8050785