An Algorithm for Computing Range-Groupby Queries

  • Lee, Yeong-Gu (Dept. of Electronic Computer Science, Korea Advanced Institute of Science and Technology) ;
  • Mun, Yang-Se (Dept. of Electronic Computer Science, Korea Advanced Institute of Science and Technology) ;
  • Hwang, Gyu-Yeong (Dept. of Electronic Computer Science, Korea Advanced Institute of Science and Technology)
  • Published : 2002.08.01

Abstract

Aggregation is an important operation that affects the performance of OLAP systems. In this paper, we define a new class of aggregation queries, called range-groupby queries, and present a method for processing them. A range-groupby query computes, for an arbitrarily specified region of an n-dimensional cube, an aggregate for each combination of values of the grouping attributes. Such queries are used frequently in MOLAP analysis because they summarize trends within an arbitrarily specified subregion of the domain space. To improve query performance, MOLAP applications widely maintain precomputed aggregation results in a structure called the prefix-sum array. For range-groupby queries, however, maintaining precomputed aggregation results for each combination of the grouping attributes incurs enormous storage overhead. We propose a fast algorithm that computes range-groupby queries with minimal storage overhead. Our algorithm maintains only one prefix-sum array, yet still processes range-groupby queries effectively for all possible combinations of the grouping attributes. Compared with the method that maintains a prefix-sum array for each combination of the grouping attributes in an n-dimensional cube, our algorithm reduces the space overhead by (equation omitted), while accessing a similar number of cells.
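To make the idea concrete, the following is a minimal two-dimensional sketch of how a single prefix-sum array could answer both a plain range-sum query and a range-groupby query. It is an illustration under stated assumptions, not the algorithm presented in the paper: the function names (`build_prefix_sum`, `range_sum`, `range_groupby`), the zero-padded array layout, and the choice of SUM as the aggregate are all hypothetical. Each group is evaluated as a range sum in which every grouping dimension is narrowed to a single value.

```python
from itertools import product

def build_prefix_sum(cube):
    """Padded 2-D prefix-sum array: P[i+1][j+1] = sum of cube[0..i][0..j]."""
    rows, cols = len(cube), len(cube[0])
    # The extra zero row and column avoid special cases at the lower boundary.
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            P[i + 1][j + 1] = cube[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P

def range_sum(P, r0, r1, c0, c1):
    """Sum of cube[r0..r1][c0..c1] (inclusive), read from four cells of P."""
    return P[r1 + 1][c1 + 1] - P[r0][c1 + 1] - P[r1 + 1][c0] + P[r0][c0]

def range_groupby(P, r0, r1, c0, c1, group_rows=False, group_cols=False):
    """One SUM per combination of grouping values inside the region.

    A grouping dimension is split into single-value ranges; a non-grouping
    dimension keeps the full query range. (Assumed approach for illustration.)
    """
    row_ranges = [(r, r) for r in range(r0, r1 + 1)] if group_rows else [(r0, r1)]
    col_ranges = [(c, c) for c in range(c0, c1 + 1)] if group_cols else [(c0, c1)]
    result = {}
    for (ra, rb), (ca, cb) in product(row_ranges, col_ranges):
        key = ((ra,) if group_rows else ()) + ((ca,) if group_cols else ())
        result[key] = range_sum(P, ra, rb, ca, cb)
    return result

cube = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
P = build_prefix_sum(cube)

# Region: rows 0..1, columns 1..2, i.e. the cells 2, 3, 6, 7.
total = range_sum(P, 0, 1, 1, 2)                         # 18
by_row = range_groupby(P, 0, 1, 1, 2, group_rows=True)   # {(0,): 5, (1,): 13}
by_col = range_groupby(P, 0, 1, 1, 2, group_cols=True)   # {(1,): 8, (2,): 10}
```

In this sketch the same array `P` serves every grouping combination, so no additional prefix-sum arrays are stored; the cost is that each group needs its own constant number of cell lookups.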
