A Development Study of the VPT for the Improvement of Hadoop Performance

  • Yang, Ill Deung (Department of Computer & Information Engineering, Cheongju University)
  • Kim, Seong Ryeol (Department of Computer & Information Engineering, Cheongju University)
  • Received : 2015.07.02
  • Accepted : 2015.08.11
  • Published : 2015.08.20

Abstract

Hadoop MapReduce (MR) uses a partition function to pass mapper output to reducers. The partition function computes a hash value from each key and takes it modulo the number of reducers to select the target reducer. This legacy partition function is sensitive to the key distribution, so it cannot divide a job evenly when the keys are skewed. When a job is divided unevenly, some reducers need more time to finish, which lengthens the total processing time of the job. This paper proposes the Virtual Partition Table (VPT) and tests it on heavily skewed data. Compared with the legacy partition function, the VPT improved performance by about three seconds on average, and we expect the gain to grow as the volume of processed data increases.
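The sensitivity of the legacy partition function to key skew can be illustrated with a short sketch. It assumes string keys hashed the way Java's `String.hashCode()` does, followed by the mask-and-modulo step of Hadoop's default `HashPartitioner`. The helper names (`java_string_hash`, `hash_partition`, `reducer_loads`) and the sample keys are hypothetical and chosen only for illustration. The abstract does not describe the internals of the VPT, so the sketch covers only the legacy behavior.

```python
from collections import Counter


def java_string_hash(key: str) -> int:
    """Mimic Java's String.hashCode() for ASCII keys (signed 32-bit result)."""
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep the value within 32 bits
    # Reinterpret the unsigned 32-bit value as a signed integer.
    return h - (1 << 32) if h >= (1 << 31) else h


def hash_partition(key: str, num_reducers: int) -> int:
    """Legacy rule: clear the sign bit of the hash, then take it modulo the reducer count."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def reducer_loads(keys, num_reducers):
    """Count how many records each reducer would receive."""
    counts = Counter(hash_partition(k, num_reducers) for k in keys)
    return [counts.get(r, 0) for r in range(num_reducers)]


# Hypothetical skewed input: a single key accounts for 90% of the records.
keys = ["the"] * 900 + ["hadoop"] * 50 + ["vpt"] * 50
loads = reducer_loads(keys, num_reducers=4)
print(loads)
```

Because every record with the same key goes to the same reducer, the reducer that receives the dominant key handles at least 900 of the 1,000 records no matter how the other keys are spread. That imbalance is what lengthens the total processing time.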
