Scaling of Hadoop Cluster for Cost-Effective Processing of MapReduce Applications

  • Ryu, Woo-Seok (Dept. of Health Care Management, Catholic University of Pusan)
  • Received: 2019.10.22
  • Reviewed: 2020.02.15
  • Published: 2020.02.29

Abstract

This paper studies how to set the scale of a Hadoop cluster so that big data can be analyzed cost-effectively. In medical institutions, demand for cloud-based big data analysis is growing now that medical records may be stored outside the hospital. The paper first analyzes the Amazon EMR framework, a widely used cloud-based big data service, and then presents a model for sizing a Hadoop cluster so that it runs a MapReduce application cost-effectively. It also analyzes the factors that influence MapReduce execution through experiments under various conditions. Choosing the cluster whose processing time is most efficient relative to its operating cost increases the efficiency of big data analysis.
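The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the paper's model, which the abstract does not spell out. It assumes that cost is proportional to node count, hourly price, and run time, and it uses the product of cost and processing time as a stand-in efficiency score. The names `Run`, `cost_usd`, `cost_time_product`, and `most_efficient`, along with the run times and the price, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    nodes: int      # number of worker nodes in the cluster (hypothetical)
    seconds: float  # measured wall-clock time of the MapReduce job (hypothetical)


def cost_usd(run: Run, price_per_node_hour: float) -> float:
    """Operating cost, assuming billing proportional to nodes x time."""
    return run.nodes * price_per_node_hour * run.seconds / 3600.0


def cost_time_product(run: Run, price_per_node_hour: float) -> float:
    """Assumed efficiency score: lower means cheaper and faster overall."""
    return cost_usd(run, price_per_node_hour) * run.seconds


def most_efficient(runs: list[Run], price_per_node_hour: float) -> Run:
    """Pick the cluster size with the smallest cost x time product."""
    return min(runs, key=lambda r: cost_time_product(r, price_per_node_hour))


# Made-up measurements: doubling nodes does not halve the run time.
runs = [Run(2, 3600.0), Run(4, 2000.0), Run(8, 1500.0)]
best = most_efficient(runs, price_per_node_hour=0.10)
print(best.nodes, round(cost_usd(best, 0.10), 4))
```

With these made-up numbers the 4-node cluster scores best: the 8-node run is faster, but its extra cost outweighs the time saved. A different score, such as cost alone or cost under a deadline, would be an equally reasonable choice depending on the workload.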
