An Efficient Data Replacement Algorithm for Performance Optimization of MapReduce in Non-dedicated Distributed Computing Environments

Korean title: 비-전용 분산 컴퓨팅 환경에서 맵-리듀스 처리 성능 최적화를 위한 효율적인 데이터 재배치 알고리즘

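  • 류은경 (School of Information and Communication Engineering, Chungbuk National University)
  • 손인국 (School of Information and Communication Engineering, Chungbuk National University)
  • 박준호 (School of Information and Communication Engineering, Chungbuk National University)
  • 복경수 (School of Information and Communication Engineering, Chungbuk National University)
  • 유재수 (School of Information and Communication Engineering, Chungbuk National University)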
  • Received : 2013.07.03
  • Accepted : 2013.08.19
  • Published : 2013.09.28

Abstract

In recent years, the volume of data has grown significantly with the spread of social media and the development of mobile devices. MapReduce is an emerging programming model for processing large amounts of data. However, because MapReduce places data evenly across nodes, as suits a dedicated distributed computing environment, it is not well suited to non-dedicated distributed computing environments, where node availability varies. Data replacement algorithms have been proposed to optimize MapReduce performance in non-dedicated distributed computing environments. However, they spend considerable time on data replacement and increase network load through unnecessary data transmission. In this paper, we propose an efficient data replacement algorithm for optimizing MapReduce performance in non-dedicated distributed computing environments. The proposed scheme computes the ratio of data blocks to assign to each node based on a node availability model, and reduces network load by transmitting data blocks with the current data placement taken into account. Experimental results show that the proposed scheme outperforms the existing scheme.
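The abstract does not specify the paper's availability model or its exact transfer rule, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes each node's availability is summarized by a single number in [0, 1] (a hypothetical simplification), that each node's share of blocks is proportional to that number, and that only surplus blocks are moved, so blocks already on a node that needs them stay in place. The function names `target_block_counts` and `plan_transfers` are hypothetical, and the sketch ignores HDFS replicas.

```python
from typing import Dict, List, Tuple


def target_block_counts(availability: Dict[str, float], total_blocks: int) -> Dict[str, int]:
    """Split total_blocks across nodes in proportion to availability.

    Assumption (hypothetical model): availability[n] is a single score in [0, 1].
    Integer counts use largest-remainder rounding so they sum to total_blocks.
    """
    total_avail = sum(availability.values())
    if total_avail <= 0:
        raise ValueError("at least one node must have positive availability")
    quotas = {n: total_blocks * a / total_avail for n, a in availability.items()}
    counts = {n: int(q) for n, q in quotas.items()}
    leftover = total_blocks - sum(counts.values())
    # Hand the remaining blocks to the nodes with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda n: quotas[n] - counts[n], reverse=True)
    for n in by_remainder[:leftover]:
        counts[n] += 1
    return counts


def plan_transfers(current: Dict[str, int], target: Dict[str, int]) -> List[Tuple[str, str, int]]:
    """Return (source, destination, block_count) moves from current to target.

    Only the difference is transmitted: a node keeps min(current, target) of its
    blocks, surplus nodes send their excess, and deficit nodes receive it.
    Assumes both dicts cover the same nodes and the same total block count.
    """
    if set(current) != set(target) or sum(current.values()) != sum(target.values()):
        raise ValueError("current and target must cover the same nodes and block total")
    surplus = [[n, current[n] - target[n]] for n in target if current[n] > target[n]]
    deficit = [[n, target[n] - current[n]] for n in target if current[n] < target[n]]
    moves: List[Tuple[str, str, int]] = []
    i = j = 0
    while i < len(surplus) and j < len(deficit):
        k = min(surplus[i][1], deficit[j][1])
        moves.append((surplus[i][0], deficit[j][0], k))
        surplus[i][1] -= k
        deficit[j][1] -= k
        if surplus[i][1] == 0:
            i += 1
        if deficit[j][1] == 0:
            j += 1
    return moves


if __name__ == "__main__":
    avail = {"A": 0.5, "B": 0.25, "C": 0.25}
    target = target_block_counts(avail, 12)
    print(target)  # {'A': 6, 'B': 3, 'C': 3}
    # Starting from an even placement of 4 blocks per node:
    print(plan_transfers({"A": 4, "B": 4, "C": 4}, target))  # [('B', 'A', 1), ('C', 'A', 1)]
```

In this toy example, moving from an even placement to the availability-weighted one requires transmitting only 2 of the 12 blocks, which illustrates why accounting for the existing placement can reduce network load compared with redistributing all blocks.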

Keywords

Non-dedicated Distributed Computing; MapReduce; Hadoop