An Iterative Algorithm for the Bottom Up Computation of the Data Cube using MapReduce

  • Lee, Suan (Department of Computer Science, College of IT, Kangwon National University) ;
  • Jo, Sunhwa (Department of Computer Science, College of IT, Kangwon National University) ;
  • Kim, Jinho (Department of Computer Science, College of IT, Kangwon National University)
  • Published : 2012.12.30

Abstract

The recent explosion in data volume has driven ongoing research into methods that can meet the demands of large-scale data analysis. This paper proposes MRIterativeBUC, an algorithm that computes large data cubes efficiently through distributed parallel processing on the MapReduce framework. MRIterativeBUC adapts the existing BUC algorithm so that it runs efficiently as a sequence of iterative MapReduce steps, and it overcomes the limits on data size, storage, and processing capacity that arise in large data cube computation. It adopts the iceberg cube concept, which computes only the portion of the cube that is of interest to the analyst, and it distributes cube computation steps such as partitioning and sorting across nodes. Together these reduce the amount of data emitted, which in turn lowers network load, the processing required on each node, and ultimately the total cost of cube computation. The bottom-up processing and iterative MapReduce approach proposed here can be extended in various ways and is expected to be applicable in many application areas.
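The ideas named in the abstract (bottom-up cell generation, iceberg pruning, and one MapReduce step per iteration) can be illustrated with a small single-process sketch. This is not the paper's MRIterativeBUC implementation: the function names, the `*` placeholder, and the level-wise pruning rule are assumptions made for illustration, and the shuffle/sort stage is simulated with an in-memory dictionary rather than a real MapReduce cluster.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # hypothetical placeholder for an aggregated ("any value") dimension


def map_phase(rows, n_dims, level, frequent_prev):
    """Emit (cell, measure) pairs for cells with `level` fixed dimensions.

    A cell is emitted only if every cell obtained by relaxing one of its fixed
    dimensions passed the threshold in the previous round. Pruned cells are
    never sent to the reducers, which is what keeps the emitted data small.
    """
    for *dims, measure in rows:
        for idx in combinations(range(n_dims), level):
            cell = tuple(dims[i] if i in idx else ALL for i in range(n_dims))
            parents_ok = all(
                tuple(ALL if i == drop else v for i, v in enumerate(cell))
                in frequent_prev
                for drop in idx
            )
            if parents_ok:
                yield cell, measure


def reduce_phase(pairs, min_support):
    """Group pairs by cell (stand-in for shuffle/sort) and keep iceberg cells."""
    groups = defaultdict(list)
    for cell, measure in pairs:
        groups[cell].append(measure)
    return {
        cell: (len(ms), sum(ms))  # (tuple count, summed measure)
        for cell, ms in groups.items()
        if len(ms) >= min_support
    }


def iterative_iceberg_cube(rows, n_dims, min_support):
    """Run one simulated map/reduce round per lattice level, bottom-up."""
    apex = tuple([ALL] * n_dims)
    cube = reduce_phase(((apex, row[-1]) for row in rows), min_support)
    frequent_prev = set(cube)
    for level in range(1, n_dims + 1):
        if not frequent_prev:  # nothing survived, so no finer cell can either
            break
        result = reduce_phase(
            map_phase(rows, n_dims, level, frequent_prev), min_support
        )
        cube.update(result)
        frequent_prev = set(result)
    return cube


# Toy fact table: (city, product, sales). The values are made up.
rows = [
    ("seoul", "tv", 10),
    ("seoul", "tv", 20),
    ("seoul", "radio", 5),
    ("busan", "tv", 7),
]
cube = iterative_iceberg_cube(rows, n_dims=2, min_support=2)
for cell in sorted(cube):
    print(cell, cube[cell])
```

With a minimum support of 2, cells such as `("busan", "*")` fall below the threshold in the first round, so no finer cell containing `busan` is emitted in the next round. In a real cluster each round would be a separate MapReduce job, and the surviving cells would be passed to the next job through the distributed file system.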
