Shape-Based Subsequence Retrieval Supporting Multiple Models in Time-Series Databases

시계열 데이터베이스에서 복수의 모델을 지원하는 모양 기반 서브시퀀스 검색

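  • 원정임 (Department of Computer Engineering, Graduate School, Hallym University) ;
  • 윤지희 (Division of Information and Communication Engineering, Hallym University) ;
  • 김상욱 (Division of Information and Communication Engineering, Hanyang University) ;
  • 박상현 (Department of Computer Engineering, Pohang University of Science and Technology)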
  • Published : 2003.08.01

Abstract

Shape-based retrieval is the operation of searching for sequences or subsequences whose shapes are similar to that of a query sequence, regardless of their actual element values. In this paper, we propose a similarity model suitable for shape-based retrieval and present an indexing method that supports it. The proposed similarity model retrieves similar shapes accurately by combining shape-preserving transformations such as normalization, moving average, and time warping. Our indexing method stores every distinct subsequence compactly in a disk-based suffix tree for efficient and adaptive query processing. The user can dynamically choose a similarity model suited to a given application; more specifically, the user sets the parameter p of the distance function $L_p$ when submitting a query. Extensive experiments show that our approach not only finds the subsequences whose shapes are similar to a query shape but also significantly outperforms sequential search.

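Shape-based retrieval is the operation of searching a database for sequences (or subsequences) whose shapes are similar to that of a query sequence, regardless of their actual element values. In this paper, we define a new, flexible similarity model for shape-based retrieval in time-series databases and present indexing and query processing methods that support it. The proposed similarity model supports a variety of transformations, including normalization, moving average, and time warping. In particular, the user can freely specify the $L_p$ distance function used to compute the final degree of similarity, so that the model reflects the notion of similarity preferred by the application. We also propose a compressed subsequence tree structure that supports shape-based retrieval effectively, together with an efficient query processing technique built on it. Experimental results show that the proposed method not only successfully retrieves subsequences whose shapes are similar to the query sequence under the user-selected distance function, but also improves performance over sequential search by tens to hundreds of times, depending on the distance function chosen.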
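The abstract does not spell out how the transformations are combined, so the following is only a minimal sketch of one plausible reading: normalize both sequences, smooth them with a moving average, and compare them with a time-warping distance whose base distance is $L_p$. The function names, the choice of min-max normalization, the window size `k`, and the exact warping recurrence are all assumptions made for illustration and are not taken from the paper.

```python
import math


def normalize(seq):
    """Min-max normalization into [0, 1] (an assumed choice of normalization)."""
    lo, hi = min(seq), max(seq)
    if hi == lo:
        return [0.0] * len(seq)  # a flat sequence carries no shape information
    return [(x - lo) / (hi - lo) for x in seq]


def moving_average(seq, k):
    """k-moving average: each output element is the mean of k consecutive inputs."""
    if k < 1 or k > len(seq):
        raise ValueError("window size k must satisfy 1 <= k <= len(seq)")
    return [sum(seq[i:i + k]) / k for i in range(len(seq) - k + 1)]


def warping_distance(a, b, p=2.0):
    """Time-warping distance with an L_p base distance (assumed recurrence).

    For finite p, the |x - y|**p terms along the best warping path are summed
    and the p-th root is taken at the end. For p = math.inf, the largest
    element difference along the best path is returned.
    """
    n, m = len(a), len(b)
    use_max = math.isinf(p)
    # table[i][j] holds the cost of aligning the first i elements of a
    # with the first j elements of b.
    table = [[math.inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            best = min(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1])
            table[i][j] = max(d, best) if use_max else d ** p + best
    total = table[n][m]
    return total if use_max else total ** (1.0 / p)


def shape_distance(query, candidate, k=1, p=2.0):
    """Apply normalization, then moving average, then time warping under L_p."""
    q = moving_average(normalize(query), k)
    c = moving_average(normalize(candidate), k)
    return warping_distance(q, c, p)


if __name__ == "__main__":
    query = [1, 2, 3, 2, 1]
    stretched_and_scaled = [10, 20, 20, 30, 20, 10]
    for p in (1.0, 2.0, math.inf):
        print(p, shape_distance(query, stretched_and_scaled, k=1, p=p))
```

In this sketch the candidate has the same shape as the query but different element values and one repeated element, so normalization removes the value difference and time warping absorbs the repetition. This brute-force comparison corresponds to the sequential-search baseline; the suffix-tree index described in the abstract is not modeled here.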
