Web access prediction based on parallel deep learning

  • Received : 2019.10.31
  • Accepted : 2019.11.27
  • Published : 2019.11.29

Abstract

Due to the exponential growth of access information on the web, the need to predict web users' next access has increased. Various models, such as Markov models, deep neural networks, support vector machines, and fuzzy inference models, have been proposed for web access prediction. For deep learning based on neural network models, training on large-scale web usage data takes a very long time. To address this problem, deep neural network models are trained in parallel on a cluster of computers. In this paper, we investigated the impact of several important Spark parameters related to data partitions, shuffling, compression, and locality (the basic Spark parameters) on training a Multi-Layer Perceptron model on a Spark standalone cluster. Based on this investigation, we tuned the basic Spark parameters and applied the tuned configuration when training a Multi-Layer Perceptron model for web access prediction. Through experiments, we showed the accuracy of the proposed web access prediction model. We also showed that our tuning of the basic Spark parameters reduces the training time of the Multi-Layer Perceptron model compared with the default Spark parameter configuration.

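With the explosive growth in demand for access to information on the web, the need to predict the next page a web user will access has emerged. Many models have been proposed for web access prediction, including Markov models, deep neural networks, support vector machines, and fuzzy inference models. In deep learning techniques based on neural network models, training on large-scale web usage data takes a very long time. To solve this problem, deep neural network models are trained on multiple computers simultaneously, that is, in parallel. In this paper, we first examined how strongly the basic parameters related to data partitioning, shuffling, compression, and locality affect the training of a multi-layer perceptron model on a Spark cluster. We then tuned these Spark parameters to improve performance when training a multi-layer perceptron model for web access prediction. Experiments showed that the web access prediction model with the proposed Spark parameter tuning improves both the accuracy and the performance of web access prediction compared with the untuned case.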
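
The four groups of basic Spark parameters named in the abstract (data partitions, shuffling, compression, and locality) can be illustrated with a small sketch. The property names below are standard Spark configuration keys, but which keys appear and every value assigned to them are assumptions chosen for illustration; the abstract does not list the exact parameters or the tuned values. The master URL and the script name `train_mlp.py` are likewise hypothetical.

```python
# Illustrative sketch only. The keys are real Spark configuration properties,
# but the values are assumed placeholders, not the tuned values from the paper.
BASIC_SPARK_PARAMS = {
    # Data partitions: default number of partitions for shuffles and parallelize
    "spark.default.parallelism": "48",
    # Shuffling: in-memory buffer per shuffle file and per-reducer fetch size
    "spark.shuffle.file.buffer": "64k",
    "spark.reducer.maxSizeInFlight": "96m",
    # Compression: shuffle output, serialized RDD partitions, and codec
    "spark.shuffle.compress": "true",
    "spark.rdd.compress": "true",
    "spark.io.compression.codec": "lz4",
    # Locality: how long to wait for a data-local slot before falling back
    "spark.locality.wait": "1s",
}


def to_submit_args(params):
    """Render a parameter dict as a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in sorted(params.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


def build_submit_command(master_url, script, params):
    """Assemble a spark-submit command line for a standalone cluster."""
    return ["spark-submit", "--master", master_url, *to_submit_args(params), script]


if __name__ == "__main__":
    # Hypothetical master URL and training script, for illustration only.
    command = build_submit_command(
        "spark://master:7077", "train_mlp.py", BASIC_SPARK_PARAMS
    )
    print(" ".join(command))
```

Keeping the parameters in a single dictionary makes it straightforward to run the same training job once with this set and once with an empty dictionary (Spark defaults) and compare training times, which is the kind of comparison the abstract reports.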
