Performance Analysis of Distributed Hadoop Systems

Bae, Byoung-Jin;Kim, Young-Joo;Kim, Young-Kuk;

Proceedings of the Korean Institute of Information and Communication Sciences Conference (한국정보통신학회:학술대회논문집)

2014.05a / Pages 479-482 / 2014

The Korea Institute of Information and Communication Engineering (한국정보통신학회)

Comparative Performance Analysis of Distributed Hadoop Systems (분산 하둡 시스템의 성능 비교 분석)

Bae, Byoung-Jin (Korea Institute of Machinery & Materials) ;
Kim, Young-Joo (Electronics and Telecommunications Research Institute) ;
Kim, Young-Kuk (Dept. of Computer Science & Engineering, Chungnam National University)

Published : 2014.05.28

Abstract

Open-source Hadoop systems are now widely used to manage fast-growing big data efficiently. A Hadoop system consists of a distributed file processing system called HDFS (Hadoop Distributed File System) and a distributed parallel processing system called MapReduce. MapReduce reads big data from HDFS, processes it, and writes the results back to HDFS. The system structure that carries out this processing differs between Hadoop versions. This paper therefore presents a performance analysis of Hadoop systems. To this end, we devise a method for monitoring Hadoop systems and use it to measure the occurrence frequency of the processes, threads, and variables generated inside the Hadoop system itself. The measured results then serve as analysis indicators for predicting the internal performance of Hadoop systems.

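Today, the open-source Hadoop is widely used to efficiently manage rapidly growing big data. Hadoop consists of HDFS (Hadoop Distributed File System), a distributed file processing system, and MapReduce, a distributed parallel processing system. Hadoop's MapReduce framework reads big data from HDFS and writes the analyzed results back to HDFS. This distributed parallel processing approach has a different system structure depending on the Hadoop version. This paper therefore compares and analyzes the internal performance of Hadoop systems as they operate during big data processing under different Hadoop versions. To do so, we devise a method for monitoring a Hadoop system and measure the occurrence frequency of the processes, threads, and variables generated internally, using these measurements as analysis indicators.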
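
The abstract describes measuring how often processes, threads, and variables occur inside a Hadoop system and using those counts as analysis indicators. The paper's actual monitoring method is not shown on this page, so the following is only a minimal sketch of the counting step. The `MonitorEvent` record, the `occurrence_frequency` function, the version labels, and the sample records are all hypothetical and carry no measured data.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class MonitorEvent(NamedTuple):
    """One observation from a hypothetical monitor (not the paper's format)."""
    version: str  # placeholder label for the Hadoop version under test
    kind: str     # "process", "thread", or "variable"
    name: str     # placeholder identifier of the observed entity


def occurrence_frequency(events: Iterable[MonitorEvent]) -> Dict[str, Counter]:
    """Count observed events per kind, grouped by Hadoop version label."""
    freq: Dict[str, Counter] = {}
    for event in events:
        freq.setdefault(event.version, Counter())[event.kind] += 1
    return freq


# Made-up records for illustration only; these are not results from the paper.
sample = [
    MonitorEvent("version-A", "process", "proc-1"),
    MonitorEvent("version-A", "thread", "thread-1"),
    MonitorEvent("version-A", "thread", "thread-2"),
    MonitorEvent("version-B", "process", "proc-1"),
    MonitorEvent("version-B", "variable", "var-1"),
]

freq = occurrence_frequency(sample)
for version, counts in sorted(freq.items()):
    print(version, dict(counts))
```

Grouping by version label keeps the counts for each system side by side, which is the form a version-to-version comparison would need. How the events are captured in the first place is outside the scope of this sketch.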

Keywords

HDFS; MapReduce
