Transformation of Continuous Aggregation Join Queries over Data Streams

  • Tran, Tri Minh (Department of Computer Science, University of Vermont) ;
  • Lee, Byung-Suk (Department of Computer Science, University of Vermont)
  • Published: 2009.03.31

Abstract

Aggregation join queries are an important class of queries over data streams. These queries involve both join and aggregation operations: window-based joins followed by an aggregation on the join output. All existing research addresses join query optimization and aggregation query optimization as separate problems. We observe that, by placing them within the same scope of query optimization, more efficient query execution plans become possible through more versatile query transformations. The enabling idea is to perform aggregation before the join so that the join execution time may be reduced. Such query transformations have been studied in relational databases, but not in data streams. Applying them to data streams brings new challenges due to the incremental and continuous arrival of tuples. This paper addresses these challenges. Specifically, we first present a query processing model geared to facilitate query transformations and propose a query transformation rule specialized to work with streams. The rule is simple yet covers all possible cases of transformation. Then we present a generic query processing algorithm that works with all alternative query execution plans made possible by the transformation, and develop the cost formulas of those plans. Based on the processing algorithm, we validate the rule theoretically by proving the equivalence of the query execution plans. Finally, through extensive experiments, we validate the cost formulas and study the performance of the alternative query execution plans.
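The enabling idea, aggregating before the join, can be illustrated with a small Python sketch. Everything in it is a hypothetical illustration rather than the paper's query processing model or transformation rule: the query, the tuple layouts `(g, k)` for R and `(k, v)` for S, the count-based window, and the names `join_then_aggregate`, `join_partials`, and `WindowedPartialAggregate`. It covers only one simple case, grouping on an attribute of R with SUM and COUNT over S, where pre-aggregating S by the join key preserves the result. To keep it short, R's window is held fixed while S slides.

```python
from collections import defaultdict, deque

# Hypothetical query used only for illustration:
#   SELECT R.g, SUM(S.v), COUNT(*)
#   FROM R [window], S [window]
#   WHERE R.k = S.k
#   GROUP BY R.g
# R tuples are (g, k); S tuples are (k, v).


def join_then_aggregate(r_window, s_window):
    """Plan A: join the current windows, then aggregate the join output."""
    result = defaultdict(lambda: [0, 0])  # g -> [sum of S.v, count]
    for g, rk in r_window:
        for sk, v in s_window:
            if rk == sk:
                result[g][0] += v
                result[g][1] += 1
    return {g: tuple(acc) for g, acc in result.items()}


def join_partials(r_window, partial):
    """Plan B, join step: combine R with per-key partial aggregates of S."""
    result = defaultdict(lambda: [0, 0])
    for g, rk in r_window:
        if rk in partial:  # membership test does not insert into a defaultdict
            s_sum, s_cnt = partial[rk]
            result[g][0] += s_sum
            result[g][1] += s_cnt
    return {g: tuple(acc) for g, acc in result.items()}


class WindowedPartialAggregate:
    """Plan B, aggregation step: per-key SUM/COUNT over the last `size` S tuples.

    SUM and COUNT are maintained incrementally: an arriving tuple is added,
    and the tuple that falls out of the count-based window is subtracted.
    """

    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.partial = defaultdict(lambda: [0, 0])  # k -> [sum of v, count]

    def insert(self, k, v):
        self.window.append((k, v))
        self.partial[k][0] += v
        self.partial[k][1] += 1
        if len(self.window) > self.size:
            old_k, old_v = self.window.popleft()
            self.partial[old_k][0] -= old_v
            self.partial[old_k][1] -= 1
            if self.partial[old_k][1] == 0:
                del self.partial[old_k]  # drop keys no longer in the window


if __name__ == "__main__":
    r_window = [("a", 1), ("a", 2), ("b", 2), ("c", 9)]  # fixed R window contents
    s_stream = [(1, 10), (2, 5), (2, 7), (1, 3), (3, 4), (2, 1)]
    s_agg = WindowedPartialAggregate(size=3)
    for k, v in s_stream:
        s_agg.insert(k, v)
        # Both plans must agree after every arrival/expiration.
        assert join_then_aggregate(r_window, list(s_agg.window)) == join_partials(
            r_window, s_agg.partial
        )
    print(join_partials(r_window, s_agg.partial))  # prints {'a': (4, 2), 'b': (1, 1)}
```

In Plan B each R tuple meets at most one partial-aggregate entry per join key instead of every matching S tuple, which is where the join cost can shrink. The incremental bookkeeping in `WindowedPartialAggregate` hints at the streaming challenge: the partial aggregates must be updated on every arrival and expiration rather than computed once.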
