Development of a Malicious URL Machine Learning Detection Model Reflecting the Main Features of URLs

  • Kim, Youngjun (Department of Convergence Security, Chung-Ang University)
  • Lee, Jaewoo (Department of Industrial Security, Chung-Ang University)
  • Received: 2022.10.21
  • Reviewed: 2022.11.01
  • Published: 2022.12.31

Abstract

Smishing and hacking-email attacks that exploit social issues such as COVID-19 and the political situation continue to occur. Most of these attacks lure victims to a malicious URL in order to steal personal information, and machine learning and deep learning techniques are being actively studied as a countermeasure. However, we judged that the data sets used in previous studies have features too simple to provide a sufficient basis for classifying a URL as malicious. In this paper, based on an analysis of URL data, we propose nine main features in three categories, "URL Days", "URL Words", and "URL Abnormal", in addition to the lexical URL features used in previous research, and we evaluate them with four machine learning algorithms using F1-Score and accuracy. Compared with previous research, the results improved by 0.9% on average, and the highest F1-Score and accuracy reached 98.5%, indicating that the proposed main features contribute to improved accuracy and performance.
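As a rough illustration of the two ingredients named in the abstract, lexical URL features and the accuracy and F1-Score metrics, the following minimal Python sketch may help. It is not the authors' implementation: the feature names in `lexical_features` are hypothetical examples of lexical features, the nine proposed "URL Days", "URL Words", and "URL Abnormal" features are not specified here and are not reproduced, and the labels in the usage lines are made up purely to show the arithmetic.

```python
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Extract a few illustrative lexical features from a URL.

    These feature names are assumptions chosen for the example;
    they are not the feature set used in the paper.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "dot_count": host.count("."),
        "digit_count": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
    }


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Compute accuracy and F1-Score for a binary labelling.

    `positive` marks the malicious class (an assumption of this sketch).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Toy usage with made-up labels (1 = malicious, 0 = benign).
features = lexical_features("https://example.com/a1")
acc, f1 = accuracy_and_f1([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

In practice, a feature dictionary like this would be built for every URL in the data set and passed to each classifier, with accuracy and F1-Score computed on held-out predictions.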
