Pre-arrangement Based Task Scheduling Scheme for Reducing MapReduce Job Processing Time

  • Park, Jung Hyo (Dept. of Computer Science & Engineering, Hanyang University ERICA Campus) ;
  • Kim, Jun Sang (Dept. of Computer Science & Engineering, Hanyang University ERICA Campus) ;
  • Kim, Chang Hyeon (Dept. of Computer Science & Engineering, Hanyang University ERICA Campus) ;
  • Lee, Won Joo (Dept. of Computer Science, Inha Technical College) ;
  • Jeon, Chang Ho (Dept. of Computer Science & Engineering, Hanyang University ERICA Campus)
  • Received : 2013.10.23
  • Accepted : 2013.11.20
  • Published : 2013.11.29

Abstract

In this paper, we propose a pre-arrangement based task scheduling scheme that reduces MapReduce job processing time. If a task and the data it processes are not located on the same node, the data must be transferred to the node to which the task is allocated, and this transfer time increases the job processing time. To avoid this, we schedule tasks in two steps. In the first step, tasks are arranged in the node list in descending order of data locality. In the second step, tasks are exchanged, using the location information of the data, to improve their data locality. For the performance evaluation, we compare Hadoop with the proposed method against default Hadoop on a small Hadoop cluster in terms of job processing time and the number of tasks allocated to nodes that do not hold the data they process. The results show that the proposed method reduces job processing time by around 18% and decreases the number of tasks allocated to nodes without their data by around 25%.

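The two scheduling steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how locality is measured or how exchange candidates are chosen, so the sketch assumes each task reads one input block with a known set of replica nodes, treats locality as a yes/no property, and swaps a pair of tasks whenever the swap increases the number of node-local tasks. All names (`is_local`, `pre_arrange`, `exchange`, `count_non_local`) are hypothetical.

```python
from typing import Dict, List, Set

Assignment = Dict[str, List[str]]   # node name -> ordered list of task names
Replicas = Dict[str, Set[str]]      # task name -> nodes holding its input data


def is_local(task: str, node: str, replicas: Replicas) -> bool:
    """Assumed locality test: the node stores a replica of the task's input."""
    return node in replicas[task]


def pre_arrange(assignment: Assignment, replicas: Replicas) -> Assignment:
    """Step 1 (assumed form): order each node's list so local tasks come first."""
    return {
        node: sorted(tasks, key=lambda t: not is_local(t, node, replicas))
        for node, tasks in assignment.items()
    }


def exchange(assignment: Assignment, replicas: Replicas) -> Assignment:
    """Step 2 (assumed form): swap task pairs when that adds node-local tasks."""
    result = {node: list(tasks) for node, tasks in assignment.items()}
    nodes = list(result)
    for a in nodes:
        for i in range(len(result[a])):
            ta = result[a][i]
            if is_local(ta, a, replicas):
                continue  # already runs next to its data
            swapped = False
            for b in nodes:
                if b == a:
                    continue
                for j, tb in enumerate(result[b]):
                    before = is_local(ta, a, replicas) + is_local(tb, b, replicas)
                    after = is_local(ta, b, replicas) + is_local(tb, a, replicas)
                    if after > before:
                        result[a][i], result[b][j] = tb, ta
                        swapped = True
                        break
                if swapped:
                    break
    return result


def count_non_local(assignment: Assignment, replicas: Replicas) -> int:
    """Number of tasks placed on a node that does not hold their data."""
    return sum(
        not is_local(task, node, replicas)
        for node, tasks in assignment.items()
        for task in tasks
    )
```

Calling `exchange(pre_arrange(assignment, replicas), replicas)` on an initial assignment and comparing `count_non_local` before and after mirrors, in miniature, the second metric used in the evaluation (tasks allocated to nodes without their data).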
