Network Traffic Measurement Analysis using Machine Learning

  • Hae-Duck Joshua Jeong (Dept. of Computer Software, Korean Bible University)
  • Received : 2023.05.06
  • Accepted : 2023.06.02
  • Published : 2023.06.30

Abstract

Internet traffic has grown exponentially in recent years, driven by the development of the Internet of Things, sensor-equipped mobile networks, and communication functions built into a wide range of devices. The COVID-19 pandemic has further led to an explosion of social network traffic. In this context, research on machine learning-based network traffic analysis has drawn considerable attention. In this paper, we design and develop a new machine learning framework for network traffic analysis that distinguishes normal traffic from abnormal traffic. To achieve this, we combine well-known machine learning algorithms with network traffic analysis techniques. Using KDD CUP'99, one of the most widely used datasets, in the Weka and Apache Spark environments, we compare and investigate results obtained from time-series analysis of several aspects, including malicious code, feature extraction, data formalization, and the implementation of a network traffic measurement tool. Experimental analysis showed that both the logistic regression and the support vector machine algorithms performed very well, with logistic regression performing the better of the two. The quantitative results show that the proposed framework is reliable and practical, and we also compare and analyze its performance against that of a system from another paper. In addition, we found that the framework processes data much faster in the Apache Spark environment than in Weka as the amount of data used to build and classify machine learning models grows.
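To make the classification task in the abstract concrete, the following is a minimal sketch of binary logistic regression separating "normal" from "abnormal" records. It is not the paper's framework: it uses plain Python gradient descent rather than Weka or Apache Spark, the records are synthetic rather than drawn from KDD CUP'99, and the three feature names (duration, source bytes, failed logins) and all function names are assumptions chosen only for illustration.

```python
import math
import random


def make_records(n, seed=0):
    """Generate synthetic (features, label) pairs; label 1 = abnormal, 0 = normal.

    Hypothetical features: [duration_s, src_bytes_kb, failed_logins].
    These are illustrative values, not rows from KDD CUP'99.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        abnormal = rng.random() < 0.5
        if abnormal:
            x = [rng.gauss(0.2, 0.1), rng.gauss(3.0, 0.5), rng.gauss(2.0, 0.5)]
        else:
            x = [rng.gauss(1.0, 0.3), rng.gauss(1.0, 0.5), rng.gauss(0.0, 0.3)]
        rows.append((x, 1 if abnormal else 0))
    return rows


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(model, x):
    """Return the estimated probability that record x is abnormal."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic(rows, lr=0.1, epochs=500):
    """Fit weights and bias with full-batch gradient descent on the log loss."""
    dim = len(rows[0][0])
    w, b = [0.0] * dim, 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in rows:
            err = score((w, b), x) - y  # derivative of log loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, x):
    """Classify a record as abnormal (1) or normal (0) at a 0.5 threshold."""
    return 1 if score(model, x) >= 0.5 else 0


def accuracy(model, rows):
    """Fraction of records whose predicted label matches the true label."""
    return sum(predict(model, x) == y for x, y in rows) / len(rows)


train = make_records(400, seed=1)
test = make_records(200, seed=2)
model = train_logistic(train)
print(f"held-out accuracy: {accuracy(model, test):.3f}")
```

The same train, predict, and evaluate steps are what a Weka or Spark pipeline would carry out at scale. A support vector machine could be compared by swapping the training function while keeping the evaluation code unchanged.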
