A Study of Similarity Measures on Multidimensional Data Sequences Using Semantic Information

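  • Seok-Lyong Lee (School of Industrial and Information Systems Engineering, Hankuk University of Foreign Studies)
  • Ju-Hong Lee (School of Computer Science and Engineering, Inha University)
  • Seok-Ju Chun (Department of Internet Information, Ansan 1 College)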
  • Published : 2003.04.01

Abstract

One-dimensional time-series data, that is, sequences of real values, have been studied in various database applications such as data mining and data warehousing. In today's complex business environment, however, multidimensional data sequences (MDSs) are becoming increasingly important alongside one-dimensional time series. For example, a video stream can be modeled as an MDS in a multidimensional space whose dimensions are attributes such as color and texture. In this paper, we propose effective similarity measures on which similar-pattern retrieval is based. An MDS is partitioned into segments, each of which is represented by various geometric and semantic features, and the similarity measures are defined on these segments. Using the measures, segments that are irrelevant to a given query are pruned from the database in a first filtering step. Both data sequences and query sequences are partitioned into segments, and query processing compares the features of data segments with those of query segments instead of scanning every element of the entire sequences.
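
The segment-and-prune idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract does not specify the segmentation rule, the features, or the measures, so everything here is an assumption made for illustration. The sketch uses fixed-length segments, a single geometric feature (the bounding box of each segment), and the minimum Euclidean distance between two boxes as the pruning measure. The names `Segment`, `partition`, `min_distance`, `candidate_segments`, and the threshold `eps` are hypothetical. Semantic features are not modeled.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence

Point = Sequence[float]


@dataclass
class Segment:
    start: int          # index of the segment's first point in the parent sequence
    low: List[float]    # per-dimension minimum (lower corner of the bounding box)
    high: List[float]   # per-dimension maximum (upper corner of the bounding box)


def partition(seq: Sequence[Point], size: int) -> List[Segment]:
    """Split a sequence into fixed-length segments (an assumed rule) and
    summarise each segment by its bounding box."""
    segments = []
    for start in range(0, len(seq), size):
        chunk = seq[start:start + size]
        dims = list(zip(*chunk))  # transpose: one tuple of values per dimension
        segments.append(
            Segment(start, [min(d) for d in dims], [max(d) for d in dims])
        )
    return segments


def min_distance(a: Segment, b: Segment) -> float:
    """Smallest possible Euclidean distance between a point in box a and a
    point in box b; it is 0 when the boxes overlap."""
    total = 0.0
    for a_lo, a_hi, b_lo, b_hi in zip(a.low, a.high, b.low, b.high):
        gap = max(a_lo - b_hi, b_lo - a_hi, 0.0)
        total += gap * gap
    return sqrt(total)


def candidate_segments(
    data: Sequence[Point], query: Sequence[Point], size: int, eps: float
) -> List[Segment]:
    """Keep only data segments whose box lies within eps of some query
    segment's box; all other segments are pruned without scanning their points."""
    query_segments = partition(query, size)
    return [
        seg
        for seg in partition(data, size)
        if any(min_distance(seg, q) <= eps for q in query_segments)
    ]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.1), (0.9, 0.9), (1.0, 1.0)]
    query = [(0.05, 0.05), (0.1, 0.0)]
    kept = candidate_segments(data, query, size=2, eps=0.1)
    print([seg.start for seg in kept])  # prints [0]
```

Because the box-to-box distance never exceeds the distance between any two points inside the boxes, pruning on it cannot discard a segment that contains a point within `eps` of a query point. The surviving segments would still need a finer comparison.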
