
Improving the Map/Reduce Model through Data Distribution and Task Progress Scheduling

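  • 황인성 (Department of Information Engineering, Inha University)
  • 정경용 (School of Computer Information Engineering, Sangji University)
  • 임기욱 (School of Computer Information Engineering, Sunmoon University)
  • 이정현 (School of Computer and Information Engineering, Inha University)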
  • Received : 2010.09.28
  • Accepted : 2010.10.01
  • Published : 2010.10.28

Abstract

Map/Reduce is a programming model for implementing Cloud Computing, which has recently attracted considerable attention. The model runs application programs that process large amounts of data on many computers. To make good use of the computing nodes, it is important to design a mechanism that splits the data into appropriately sized pieces and distributes them efficiently across a cluster of computing nodes. In addition, how the Map and Reduce phases are scheduled also affects Map/Reduce performance. This paper proposes an efficient distribution scheme that splits large data sets and runs Map tasks while taking into account the performance of each computing node and the network status. We also tune the execution mechanism of Map and Reduce tasks so that Reduce tasks can be processed quickly. We evaluated the proposal experimentally with two sample Map/Reduce applications and assessed how it affects Map/Reduce performance.
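The abstract does not give the authors' distribution formula. As a rough illustration only, the sketch below assumes that each node can be summarized by a measured Map throughput and a network bandwidth, and gives each node a share of the input proportional to its effective rate so that all nodes finish their Map work at about the same time. The `Node` fields, `effective_rate`, and `plan_splits` are hypothetical names, and the "transfer first, then process" cost model is an assumption, not the scheme proposed in the paper.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    compute_mb_per_s: float  # hypothetical: measured Map processing throughput
    network_mb_per_s: float  # hypothetical: measured bandwidth for shipping data to the node


def effective_rate(node: Node) -> float:
    """MB/s for a split that must first be transferred, then processed (assumed serial)."""
    # Time per MB = transfer time per MB + processing time per MB.
    return 1.0 / (1.0 / node.network_mb_per_s + 1.0 / node.compute_mb_per_s)


def plan_splits(total_mb: int, nodes: list[Node]) -> dict[str, int]:
    """Divide total_mb into whole-MB shares proportional to each node's effective rate.

    Largest-remainder rounding keeps the shares summing exactly to total_mb.
    """
    rates = {n.name: effective_rate(n) for n in nodes}
    total_rate = sum(rates.values())
    exact = {name: total_mb * r / total_rate for name, r in rates.items()}
    shares = {name: int(x) for name, x in exact.items()}
    leftover = total_mb - sum(shares.values())
    # Hand the remaining MBs to the nodes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - shares[name], reverse=True)
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


if __name__ == "__main__":
    cluster = [
        Node("A", compute_mb_per_s=100, network_mb_per_s=100),  # fast node, fast link
        Node("B", compute_mb_per_s=50, network_mb_per_s=100),   # slow CPU
        Node("C", compute_mb_per_s=100, network_mb_per_s=25),   # congested link
    ]
    print(plan_splits(1000, cluster))  # {'A': 484, 'B': 323, 'C': 193}
```

With these made-up numbers the nodes finish in roughly 9.7 s each (484/50, 323/33.3, 193/20), whereas an equal 333 MB split would leave node C running about 16.7 s while A idles after about 6.7 s. A real scheduler would also have to refresh the rate estimates as load and network conditions change.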

Keywords

Map/Reduce;Cloud Computing;Predict Performance;Hadoop

Acknowledgement

Supported by: National IT Industry Promotion Agency (NIPA)