Improving Performance of Change Detection Algorithms through the Efficiency of Matching

(Original Korean title: 대응효율성을 통한 변화 탐지 알고리즘의 성능 개선)

  • 이석균 (S. K. Lee), Division of Information and Computer Science, Dankook University ;
  • 김동아 (D. A. Kim), Computer Science, Dankook University
  • Published : 2007.04.30

Abstract

Demand has recently grown for effective real-time change detection algorithms for XML/HTML documents, in fields such as detecting defacement attacks on web documents and version management. In particular, applications that perform real-time change detection over large numbers of XML/HTML documents need fast heuristic algorithms that can run in real time, rather than algorithms that compute minimum-cost edit scripts. Existing heuristic algorithms execute quickly, but the edit scripts they generate are of unsatisfactory quality. In this paper, we introduce the existing algorithms XyDiff and X-tree Diff, analyze their problems, and propose X-tree Diff+, an algorithm that addresses those problems. X-tree Diff+ has execution time similar to that of the existing algorithms, but it improves the matching ratio between the nodes of two documents by refining the matching process based on the notion of efficiency of matching.

References

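  1. D. A. Kim and S. K. Lee, 'X-tree Diff: An Efficient Change Detection Algorithm for Tree-structured Data,' The KIPS Transactions: Part C, Vol.10-C, No.6, pp.683-694, 2003 (in Korean) https://doi.org/10.3745/KIPSTC.2003.10C.6.683
  2. D. A. Kim, 'Change Detection and Management for XML Documents,' Ph.D. dissertation, Department of Computer Science and Statistics, Dankook University, pp.1-111, 2005 (in Korean)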
  3. A. Aboulnaga, J. F. Naughton, and C. Zhang, 'Generating Synthetic Complex-structured XML Data,' In Proceedings of the Fourth International Workshop on the Web and Databases, WebDB, 2001
  4. 'Concurrent Versions System(CVS),' Free Software Foundation, http://www.gnu.org/manual/cvs-1.9
  5. F. Curbera and D. A. Epstein, 'Fast Difference and Update of XML Documents,' XTech '99, San Jose, March 1999
  6. D. A. Kim and S. K. Lee, 'Efficient Change Detection in Tree Structured Data,' In Human.Society@Internet 2003, pp.675-681, 2003
  7. D. T. Barnard, G. Clarke and N. Duncan, 'Tree-to-tree correction for document trees,' Technical Report, Department of Computing and Information Science, Queen's University, Kingston, Ontario, Canada, January 1995
  8. E. W. Myers, 'An O(ND) Difference Algorithm and Its Variations,' Algorithmica, 1(2), pp.251-266, 1986 https://doi.org/10.1007/BF01840446
  9. G. Cobena, S. Abiteboul and A. Marian, 'Detecting Changes in XML Documents,' The 18th ICDE, 2002 https://doi.org/10.1109/ICDE.2002.994696
  10. K. Tai, 'The tree-to-tree correction problem,' Journal of the ACM, 26(3), pp.422-433, July 1979 https://doi.org/10.1145/322139.322143
  11. K. Zhang and D. Shasha, 'Simple fast algorithms for the editing distance between trees and related problems,' SIAM Journal on Computing, 18(6), pp.1245-1262, 1989 https://doi.org/10.1137/0218082
  12. NIAGARA Query Engine, http://www.cs.wisc.edu/niagara/
  13. R. Rivest, 'The MD4 Message Digest Algorithm,' MIT and RSA Data Security, Inc., April 1992
  14. R. Wagner and M. Fischer, 'The string-to-string correction problem,' Journal of the ACM, 21, pp.168-173, 1974 https://doi.org/10.1145/321796.321811
  15. S. Chawathe and H. Garcia-Molina, 'Meaningful Change Detection in Structured Data,' In SIGMOD '97, pp.26-37, 1997 https://doi.org/10.1145/253260.253266
  16. S. Lu, 'A tree-to-tree distance and its application to cluster analysis,' IEEE TPAMI, 1(2), pp.219-224, 1979
  17. S. M. Selkow, 'The tree-to-tree editing problem,' Information Processing Letters, 6, pp.184-186, 1977 https://doi.org/10.1016/0020-0190(77)90064-3
  18. XyDiff Tools, http://pauillac.inria.fr/cdrom/www/xydiff/index-eng.htm
  19. Xyleme Project, http://www.xyleme.com/en/
  20. Y. Wang, D. Dewitt and J. Cai, 'X-Diff: An effective change detection algorithm for XML Documents,' In 19th ICDE, India, March 2003 https://doi.org/10.1109/ICDE.2003.1260818