An Empirical Evaluation Analysis of the Performance of In-memory Bigdata Processing Platform
Lee, Jae hwan; Choi, Jun; Koo, Dong hun;
 Abstract
Spark, an in-memory big-data processing framework, is widely used for real-time processing workloads. Spark can keep all intermediate data in cluster memory, which minimizes I/O access. However, when the resident memory of a workload exceeds the physical memory of the cluster, overall performance can drop dramatically. In this paper, we experimentally analyze the bottleneck factors of a PageRank application, which is memory-intensive, and configure the Spark cluster together with the Tachyon File System to make use of memory and relieve the bottleneck, improving performance by about 18%.
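To illustrate why an iterative workload such as PageRank is memory-hungry, the following is a minimal single-machine sketch in plain Python. It is not the authors' Spark implementation; the function name `pagerank`, the damping factor 0.85, and the handling of nodes without outgoing links are assumptions chosen for illustration. It shows that the link structure is reused in every iteration, while a fresh set of intermediate contributions is produced each time.

```python
from collections import defaultdict


def pagerank(edges, iterations=10, damping=0.85):
    """Iterative PageRank over a list of (src, dst) edges (illustrative sketch)."""
    # Link table: built once and read again in every iteration.
    links = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        links[src].append(dst)
        nodes.update((src, dst))

    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Intermediate data: rank contributions sent along each edge.
        contribs = defaultdict(float)
        dangling = 0.0  # rank held by nodes with no outgoing links
        for node in nodes:
            out = links.get(node)
            if out:
                share = ranks[node] / len(out)
                for dst in out:
                    contribs[dst] += share
            else:
                dangling += ranks[node]

        # New ranks replace the old ones; dangling rank is spread evenly.
        ranks = {
            node: (1 - damping) / n + damping * (contribs[node] + dangling / n)
            for node in nodes
        }
    return ranks
```

Calling `pagerank([("a", "b"), ("b", "c"), ("c", "a")])` returns a dictionary of ranks that sum to 1. In a distributed setting, the analogue of `links` is the dataset one would typically want to keep resident in memory across iterations, which is where a workload larger than physical memory becomes a problem.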
 Keywords
Big data; In-memory platform; Spark; Tachyon File System
 Language
Korean