A Comparative Analysis of Recursive Query Algorithm Implementations based on High Performance Distributed In-Memory Big Data Processing Platforms
  • Journal title : Journal of KIISE
  • Volume 43, Issue 6,  2016, pp.621-626
  • Publisher : Korean Institute of Information Scientists and Engineers
  • DOI : 10.5626/JOK.2016.43.6.621
 Title & Authors
A Comparative Analysis of Recursive Query Algorithm Implementations based on High Performance Distributed In-Memory Big Data Processing Platforms
Kang, Minseo; Kim, Jaesung; Lee, Jaegil;
 
 Abstract
Recursive query algorithms are used in many social network services, for example to answer reachability queries over a social graph. As these services have evolved, social network data has grown to the point where running a recursive query algorithm on a single machine is almost impossible. To address this problem, we implement a recursive query on two popular distributed in-memory platforms, Spark and Twister. We evaluate the two implementations on 50 Amazon EC2 machines using two real-world data sets, LiveJournal and ClueWeb. The results show that the recursive query algorithm performs better on Spark for the LiveJournal data set, which has a relatively high average degree but fewer vertices. In contrast, the Twister implementation outperforms Spark on the ClueWeb data set, which has a relatively low average degree but many vertices.
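To make the reachability example concrete, here is a minimal single-machine sketch of what such a recursive query computes. It is not the authors' Spark or Twister implementation, which this page does not show. It assumes a semi-naive evaluation strategy, in which each round joins only the newly derived pairs with the edge relation. The function name `reachable_pairs` and the sample edge list are hypothetical.

```python
from collections import defaultdict


def reachable_pairs(edges):
    """Return every (src, dst) pair such that dst is reachable from src."""
    # Index edges by source vertex so each join step is a dictionary lookup.
    successors = defaultdict(set)
    for src, dst in edges:
        successors[src].add(dst)

    closure = set(edges)  # every pair derived so far
    delta = set(edges)    # pairs first derived in the previous round
    while delta:
        # Join only the newly derived pairs with the edge relation.
        derived = {
            (src, nxt)
            for src, mid in delta
            for nxt in successors.get(mid, ())
        }
        # Keep just the pairs not seen before; stop when none are new.
        delta = derived - closure
        closure |= delta
    return closure


# Hypothetical four-vertex chain: 1 -> 2 -> 3 -> 4
edges = [(1, 2), (2, 3), (3, 4)]
pairs = reachable_pairs(edges)
```

On a distributed platform, each pass of the loop would correspond to one iteration over partitioned data, which is why in-memory reuse of the edge relation across iterations matters for this workload.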
 Keywords
distributed in-memory platform;recursive query algorithm;big data;social network service;
 Language
Korean