An Index-Based Approach for Subsequence Matching Under Time Warping in Sequence Databases

Effective Index-Based Subsequence Matching Supporting Time Warping in Sequence Databases

  • Published : 2002.04.01

Abstract

This paper discusses index-based subsequence matching that supports time warping in large sequence databases. Time warping makes it possible to find sequences with similar patterns even when they are of different lengths. In earlier work, Kim et al. proposed an efficient method for whole matching under time warping. That method extracts from each data sequence a feature vector that is invariant to time warping and builds a multidimensional index on the resulting set of feature vectors. For filtering in the feature space, it applies a lower-bound function that consistently underestimates the time warping distance and also satisfies the triangle inequality. In this paper, we incorporate a prefix-querying approach based on sliding windows into the earlier method. For indexing, we extract a feature vector from every subsequence inside a sliding window and construct a multidimensional index that uses these feature vectors as indexing attributes. For query processing, we perform a series of index searches using the feature vectors of qualifying query prefixes. Our approach provides effective and scalable subsequence matching even on large databases. We also prove that our approach does not incur false dismissals. To verify the superiority of our approach, we perform extensive experiments. The results show that our approach achieves significant speedup on real-world S&P 500 stock data and on very large synthetic data.

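This paper discusses index-based subsequence matching that supports time warping in large sequence databases. Time warping makes it possible to find sequences with similar patterns even when their lengths differ. Recent work proposed an effective whole-matching method that supports time warping. That method builds an index over a set of feature vectors, extracted from the data sequences, that are unaffected by time warping. For filtering in the feature space, it also uses a lower-bound function of the time warping distance that satisfies the triangle inequality. In this work, we propose a new method that combines this earlier work with a prefix-querying approach based on sliding windows. For indexing, we extract a feature vector from the subsequence corresponding to each sliding window and build a multidimensional index that uses these feature vectors as indexing attributes. For query processing, we perform multiple index searches using the feature vectors of the query prefixes that satisfy the condition. The proposed method supports effective subsequence matching even on large databases. We prove that the proposed method does not cause false dismissals. We conduct a variety of experiments to establish its superiority. The experimental results show that the proposed method yields large performance improvements on both real S&P 500 stock data and large synthetic data.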
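
The filter-and-refine idea summarized in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not stated in the abstract: the feature vector is taken to be (first, last, max, min), the time warping distance uses a max-based recurrence, and the lower bound is the largest absolute difference between feature components. The multidimensional index and the prefix-querying step are replaced by a plain scan over candidate subsequences, so the sketch shows only why filtering in the feature space causes no false dismissals, not the proposed method itself. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Feature = Tuple[float, float, float, float]


def time_warping_distance(s: Sequence[float], q: Sequence[float]) -> float:
    """Time warping distance with a max-based recurrence (an assumption)."""
    n, m = len(s), len(q)
    inf = float("inf")
    # d[i][j] = distance between the first i elements of s and first j of q.
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - q[j - 1])
            best_prev = min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
            d[i][j] = max(cost, best_prev)
    return d[n][m]


def features(s: Sequence[float]) -> Feature:
    """Assumed warping-invariant features: first, last, max, min."""
    return (s[0], s[-1], max(s), min(s))


def lower_bound(a: Feature, b: Feature) -> float:
    """Largest component-wise difference; never exceeds the distance above."""
    return max(abs(x - y) for x, y in zip(a, b))


def filter_and_refine(
    data: Sequence[float],
    query: Sequence[float],
    eps: float,
    min_len: int,
    max_len: int,
) -> List[Tuple[int, int]]:
    """Return (start, length) of subsequences within eps of the query.

    A linear scan stands in for the multidimensional index search.
    """
    fq = features(query)
    matches = []
    for start in range(len(data)):
        for length in range(min_len, max_len + 1):
            if start + length > len(data):
                break
            sub = data[start:start + length]
            # Filter step: cheap test in feature space.
            if lower_bound(features(sub), fq) > eps:
                continue
            # Refine step: exact distance only for surviving candidates.
            if time_warping_distance(sub, query) <= eps:
                matches.append((start, length))
    return matches


data = [5, 1, 2, 2, 3, 9, 1, 2, 3, 7]
print(filter_and_refine(data, [1, 2, 3], eps=0.0, min_len=3, max_len=4))
# [(1, 4), (6, 3)]
```

Because the lower bound never exceeds the exact distance under these assumptions, any subsequence discarded by the filter step could not have passed the refine step, which is the property the abstract refers to as not incurring false dismissals.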

References

  1. R. Agrawal, C. Faloutsos, and A. Swami, 'Efficient Similarity Search in Sequence Databases,' In Proc. Int'l. Conference on Foundations of Data Organization and Algorithms, FODO, pp.69-84, Oct., 1993 https://doi.org/10.1007/3-540-57301-1_5
  2. R. Agrawal et al., 'Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases,' Proc. Int'l conference on Very Large Data Bases, VLDB, pp.490-501, Sept., 1995
  3. N. Beckmann et al., 'The R*-tree: An Efficient and Robust Access Method for Points and Rectangles,' In Proc. Int'l. Conf. on Management of Data, ACM SIGMOD, pp.322-331, May, 1990 https://doi.org/10.1145/93597.98741
  4. D. J. Berndt and J. Clifford, 'Finding Patterns in Time Series: A Dynamic Programming Approach,' Advances in Knowledge Discovery and Data Mining, pp. 229-248, 1996
  5. S. Berchtold, D. A. Keim, and H.-P. Kriegel, 'The X-tree: An Index Structure for High-Dimensional Data,' In Proc. Int'l. Conf. on Very Large Data Bases, VLDB, pp.28-39, 1996
  6. K. K. W. Chu, and M. H. Wong, 'Fast Time-Series Searching with Scaling and Shifting,' In Proc. Int'l. Symp. on Principles of Database Systems, ACM PODS, pp.237-248, May, 1999 https://doi.org/10.1145/303976.304000
  7. G. Das, D. Gunopulos, and H. Mannila, 'Finding Similar Time Series,' In Proc. European Symp. on Principles of Data Mining and Knowledge Discovery, PKDD, pp.88-100, 1997 https://doi.org/10.1007/3-540-63223-9_109
  8. C. Faloutsos, M. Ranganathan, and Y. Manolopoulos, 'Fast Subsequence Matching in Time-series Databases,' In Proc. Int'l. Conf. on Management of Data, ACM SIGMOD, pp.419-429, May, 1994 https://doi.org/10.1145/191839.191925
  9. D. Q. Goldin and P. C. Kanellakis, 'On Similarity Queries for Time-Series Data: Constraint Specification and Implementation,' Proc. Int'l. Conf. on Principles and Practice of Constraint Programming, CP, pp.137-153, Sept., 1995 https://doi.org/10.1007/3-540-60299-2_9
  10. I. Kamel and C. Faloutsos, 'On Packing R-trees,' In Proc. Int'l. Conf. on Information and Knowledge Management, ACM CIKM, pp.490-499, 1993 https://doi.org/10.1145/170088.170403
  11. I. Kamel and C. Faloutsos, 'Parallel R-trees,' Proc. Int'l. Conf. on Management of Data, ACM SIGMOD, pp.195-204, 1992 https://doi.org/10.1145/130283.130315
  12. S. W. Kim, S. H. Park, and W. W. Chu, 'An Index-Based Approach for Similarity Search Supporting Time Warping in Large Sequence Databases,' In Proc. Int'l. Conf. on Data Engineering, IEEE, pp. 607-614, 2001 https://doi.org/10.1109/ICDE.2001.914875
  13. L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993
  14. W. K. Loh, S. W. Kim, and K. Y. Whang, 'Index Interpolation : A Subsequence Matching Algorithm Supporting Moving Average Transform of Arbitrary Order in Time-Series Databases,' IEICE Trans. on Information and Systems, Vol.E84-D, No.1, pp. 76-86, Jan. 2001
  15. W. K. Loh, S. W. Kim, and K. Y. Whang, 'Index Interpolation : An Approach for Subsequence Matching Supporting Normalization Transform in Time-Series Databases,' In Proc. Int'l. Conf. on Information and Knowledge Management, ACM CIKM, 2000 https://doi.org/10.1145/354756.354856
  16. S. H. Park et al., 'Efficient Searches for Similar Subsequences of Different Lengths in Sequence Databases,' In Proc. Int'l. Conf. on Data Engineering, IEEE, pp.23-32, 2000 https://doi.org/10.1109/ICDE.2000.839384
  17. F. P. Preparata and M. Shamos, Computational Geometry: An Introduction, Springer-Verlag, 1995
  18. D. Rafiei and A. Mendelzon, 'Similarity-Based Queries for Time-Series Data,' In Proc. Int'l. Conf. on Management of Data, ACM SIGMOD, pp.13-24, 1997 https://doi.org/10.1145/253260.253264
  19. K. S. Shim, R. Srikant, and R. Agrawal, 'High-dimensional Similarity Joins,' In Proc. Int'l. Conf. on Data Engineering, IEEE, pp.301-311, Apr., 1997 https://doi.org/10.1109/ICDE.1997.581814
  20. B. K. Yi, H. V. Jagadish, and C. Faloutsos, 'Efficient Retrieval of Similar Time Sequences Under Time Warping,' In Proc. Int'l. Conf. on Data Engineering, IEEE, pp.201-208, 1998 https://doi.org/10.1109/ICDE.1998.655778