XML Document Filtering Based on Segments

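  • Joonho Kwon (School of Electrical Engineering and Computer Science, Seoul National University)
  • Bongki Moon (Department of Computer Science, University of Arizona)
  • Sukho Lee (School of Electrical Engineering and Computer Science, Seoul National University)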
  • Published : 2008.08.15

Abstract

In recent years, publish-subscribe (pub-sub) systems based on XML document filtering have received much attention. In a typical pub-sub system, subscribers express their interests as profiles written in the XPath language, and each incoming document is matched against these profiles so that it is delivered only to interested subscribers. Because the number of subscribers and their profiles can grow very large, scalability is critical to the success of pub-sub services. In this paper, we propose SFiST, a fast and scalable segment-based XML filtering system that extends the FiST system. To eliminate redundant processing during filtering, SFiST extracts sharable segments from the twig patterns of user profiles and maintains them in a hash-based Segment Table. These segments are used to represent user profiles as Terse Sequences, and they are also used in the Compact Segment Index for efficient filtering. Our experimental study shows that SFiST outperforms FiST in both filtering time and memory usage.
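The idea of sharing segments across profiles through a hash-based table can be illustrated with a minimal Python sketch. It is not the SFiST implementation: the abstract does not define how segments are cut from twig patterns, so the sketch assumes linear XPath profiles and treats each maximal run of child-axis steps (split at `//`) as one segment. The names `split_segments`, `SegmentTable`, and `encode_profiles` are hypothetical, and the list of segment ids is only a stand-in for a Terse Sequence.

```python
def split_segments(xpath):
    """Split a linear XPath such as '/a/b//c/d' into runs of child steps.

    Assumption (not from the paper): a segment is a maximal run of
    '/'-connected element names, and '//' ends the current run.
    """
    return [
        tuple(part.strip("/").split("/"))
        for part in xpath.split("//")
        if part.strip("/")
    ]


class SegmentTable:
    """Hash table that stores each distinct segment once and gives it an id."""

    def __init__(self):
        self._ids = {}

    def intern(self, segment):
        # Return the existing id, or assign the next free id to a new segment.
        return self._ids.setdefault(segment, len(self._ids))

    def __len__(self):
        return len(self._ids)


def encode_profiles(profiles):
    """Rewrite every profile as a list of segment ids drawn from one shared table."""
    table = SegmentTable()
    encoded = {
        name: [table.intern(seg) for seg in split_segments(xpath)]
        for name, xpath in profiles.items()
    }
    return table, encoded


# Hypothetical example profiles, chosen only to show the sharing effect.
profiles = {
    "p1": "/book/title//author/name",
    "p2": "/book/title//editor",
    "p3": "/journal//author/name",
}
table, encoded = encode_profiles(profiles)
print(len(table), encoded)
```

In this example the three profiles contain six segment occurrences, but only four distinct segments are stored, so a segment shared by several profiles needs to be matched against a document only once.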
