Learning algorithms for big data logistic regression on RHIPE platform
Jung, Byung Ho; Lim, Dong Hoon
Machine learning is becoming increasingly important in the big data era. Logistic regression is a classification method in machine learning that has been widely used in various fields, including medicine, economics, marketing, and the social sciences. Rhipe, which integrates R with the Hadoop environment, has not been discussed by many researchers owing to the difficulty of its installation and of MapReduce implementation. In this paper, we present MapReduce implementations of the gradient descent algorithm and the Newton-Raphson algorithm for logistic regression using Rhipe. The Newton-Raphson algorithm does not require a learning rate, whereas the gradient descent algorithm requires one to be chosen manually. To process big data efficiently, we choose the learning rate by a mixed procedure of grid search and binary search. In the performance study, our Newton-Raphson algorithm outperforms the gradient descent algorithm on all the tested data.
Big data; Hadoop; logistic regression; R; RHIPE
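
The contrast the abstract draws between the two algorithms can be illustrated with a minimal sketch. It is not the authors' Rhipe code: it is written in plain Python rather than R, the "map" and "reduce" steps are ordinary functions over in-memory lists standing in for Hadoop tasks, and every name in it (`map_chunk`, `reduce_sums`, `fit`, the simulated data, the value `lr=0.5`) is an assumption made for illustration. The idea it shows is that the gradient and Hessian of the logistic log-likelihood are sums over observations, so each data chunk can compute its partial sums independently and a reducer only has to add them up. Newton-Raphson then solves a small linear system per iteration, while gradient descent scales the gradient by a hand-picked learning rate.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def map_chunk(chunk, beta):
    """Map step: gradient and Hessian of the negative log-likelihood on one chunk."""
    p = len(beta)
    grad = [0.0] * p
    hess = [[0.0] * p for _ in range(p)]
    for x, y in chunk:
        mu = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
        w = mu * (1.0 - mu)
        for i in range(p):
            grad[i] += (mu - y) * x[i]
            for j in range(p):
                hess[i][j] += w * x[i] * x[j]
    return grad, hess

def reduce_sums(parts):
    """Reduce step: add the per-chunk gradients and Hessians."""
    p = len(parts[0][0])
    grad = [sum(g[i] for g, _ in parts) for i in range(p)]
    hess = [[sum(h[i][j] for _, h in parts) for j in range(p)] for i in range(p)]
    return grad, hess

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit(chunks, p, method, lr=0.5, tol=1e-8, max_iter=10000):
    """Run map/reduce rounds until the largest update component falls below tol."""
    n = sum(len(c) for c in chunks)
    beta = [0.0] * p
    for it in range(1, max_iter + 1):
        grad, hess = reduce_sums([map_chunk(c, beta) for c in chunks])
        if method == "newton":
            step = solve(hess, grad)            # no learning rate involved
        else:
            step = [lr * g / n for g in grad]   # lr is an assumed, hand-picked value
        beta = [b - s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            return beta, it
    return beta, max_iter

# Simulated data (an assumption, not the paper's data): intercept plus one covariate.
random.seed(0)
data = []
for _ in range(2000):
    u = random.gauss(0.0, 1.0)
    y = 1.0 if random.random() < sigmoid(-0.5 + 1.5 * u) else 0.0
    data.append(([1.0, u], y))
chunks = [data[i::4] for i in range(4)]  # four chunks stand in for HDFS blocks

beta_newton, it_newton = fit(chunks, 2, "newton")
beta_gd, it_gd = fit(chunks, 2, "gd", lr=0.5)
print("Newton-Raphson:", beta_newton, "iterations:", it_newton)
print("Gradient descent:", beta_gd, "iterations:", it_gd)
```

Both runs should arrive at essentially the same coefficients, with Newton-Raphson needing far fewer passes over the data. On a real cluster each pass is a full MapReduce job, so the number of iterations is a plausible driver of total running time, although this toy example makes no claim about the paper's measured results.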