An Empirical Evaluation Analysis of the Performance of In-memory Bigdata Processing Platform
Lee, Jae hwan; Choi, Jun; Koo, Dong hun;
Spark, an in-memory big-data processing framework, is widely used for real-time processing workloads. Spark can keep all intermediate data in cluster memory, which minimizes I/O access. However, when the resident memory of a workload is larger than the physical memory of the cluster, overall performance can drop dramatically. In this paper, we experimentally analyse the bottleneck factors of PageRank, a memory-intensive application, and then configure the Spark cluster with the Tachyon File System to manage memory and address those bottlenecks, improving performance by about 18%.
Bigdata;In-memory Platform;Spark;Tachyon File System;
