Hadoop and MapReduce

Park, Jeong-Hyeok; Lee, Sang-Yeol; Kang, Da Hyun; Won, Joong-Ho

doi:10.7465/jkdi.2013.24.5.1013

Journal of the Korean Data and Information Science Society

Volume 24, Issue 5, Pages 1013-1027, 2013, pISSN 1598-9402

The Korean Data and Information Science Society (한국데이터정보과학회)

Park, Jeong-Hyeok (School of Industrial Management Engineering, Korea University);
Lee, Sang-Yeol (School of Industrial Management Engineering, Korea University);
Kang, Da Hyun (School of Industrial Management Engineering, Korea University);
Won, Joong-Ho (School of Industrial Management Engineering, Korea University)

Received: 2013.07.07
Accepted: 2013.08.12
Published: 2013.09.30

Abstract

As the need for large-scale data analysis grows rapidly, Hadoop, a platform for large-scale data processing, and MapReduce, its internal computational model, are receiving considerable attention. This paper reviews the basic concepts of Hadoop and MapReduce that data analysts familiar with statistical programming, for example in R, need in order to use Hadoop. The concepts are illustrated with several examples that combine the R programming language and Hadoop.
