A Study on the Improving Performance of Massively Small File Using the Reuse JVM in MapReduce
 Title & Authors
Choi, Chul Woong; Kim, Jeong In; Kim, Pan Koo;
 Abstract
With the widespread use of smartphones and the Internet of Things (IoT), data are being generated on a large scale, and demand for analyzing such data has increased. Hence, distributed processing systems have gained much attention. Hadoop, a distributed processing system, stores the metadata of stored files in name nodes. When a massive number of small files is stored, this causes three main problems: name node memory becomes insufficient; the small files place load on the system; and scheduling and file processing time increase with the number of small files. In this paper, we address the increase in processing time caused by massive small files, and thus improve processing performance, using the Reuse JVM method provided by Hadoop. Conventionally, a new JVM is created for every task; the Reuse JVM method changes this through a configuration setting so that a single JVM is reused to run multiple tasks sequentially. In the results, the Reuse JVM method showed the best processing performance when used together with CombineFileInputFormat.
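The effect described in the abstract can be illustrated with a toy cost model. The sketch below is not the authors' experiment: the function name `map_phase_seconds`, the cost terms, and every number are hypothetical, chosen only to show why paying JVM startup once per slot (Reuse JVM) and packing many small files into one split (the idea behind CombineFileInputFormat) both cut overhead. In Hadoop 1.x, JVM reuse is, as far as the Hadoop documentation goes, controlled by the `mapred.job.reuse.jvm.num.tasks` job property; that name comes from the Hadoop documentation, not from this abstract.

```python
import math


def map_phase_seconds(num_files, files_per_task, slots,
                      jvm_start_s, sched_s, per_file_s, reuse_jvm):
    """Rough wall-clock estimate for the map phase under a toy cost model.

    All parameters are hypothetical; none are measurements from the paper.
    """
    # One task per input split; packing several files into a split
    # (the CombineFileInputFormat idea) reduces the task count.
    num_tasks = math.ceil(num_files / files_per_task)
    # Tasks run sequentially within a slot; the busiest slot bounds the time.
    tasks_per_slot = math.ceil(num_tasks / slots)
    # Without reuse each task starts its own JVM; with reuse a slot starts one.
    jvm_starts = 1 if reuse_jvm else tasks_per_slot
    startup = jvm_starts * jvm_start_s
    scheduling = tasks_per_slot * sched_s
    processing = tasks_per_slot * files_per_task * per_file_s
    return startup + scheduling + processing


if __name__ == "__main__":
    # Assumed inputs: 10,000 small files, 10 map slots, 1 s JVM startup,
    # 0.5 s scheduling cost per task, 0.1 s of real work per file.
    common = dict(num_files=10_000, slots=10,
                  jvm_start_s=1.0, sched_s=0.5, per_file_s=0.1)
    for label, files_per_task, reuse in [
        ("baseline", 1, False),
        ("reuse JVM only", 1, True),
        ("combined splits only", 100, False),
        ("combined splits + reuse JVM", 100, True),
    ]:
        t = map_phase_seconds(files_per_task=files_per_task,
                              reuse_jvm=reuse, **common)
        print(f"{label:30s} {t:8.1f} s")
```

With these made-up inputs the combined configuration gives the lowest estimate, which agrees with the qualitative result reported in the abstract; the magnitudes themselves carry no meaning.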
 Keywords
Hadoop; Small File; Reuse JVM; MapReduce; Distributed Processing
 Language
Korean