Novelty Detection on Web-server Log Dataset

Lee, Hwaseong; Kim, Ki Su

doi:10.6109/jkiice.2019.23.10.1311

Journal of the Korea Institute of Information and Communication Engineering (한국정보통신학회논문지)

Volume 23, Issue 10 / pp. 1311-1319 / 2019 / pISSN 2234-4772 / eISSN 2288-4165

The Korea Institute of Information and Communication Engineering (한국정보통신학회)

Korean title: 웹서버 로그 데이터의 이상상태 탐지 기법 (Novelty Detection Technique for Web-server Log Data)

Lee, Hwaseong (이화성), Agency of Defense and Development
Kim, Ki Su (김기수), Agency of Defense and Development

Received : 2019.07.25
Accepted : 2019.08.25
Published : 2019.10.31

https://doi.org/10.6109/jkiice.2019.23.10.1311

Abstract

Today the web is widely used for sharing information and conducting business, and it has become a target of external hacking aimed at leaking personal information or causing system failures. Conventional signature-based detection is widely used against cyber threats. However, it has difficulty detecting an attack once its pattern changes, as with polymorphic attacks. In particular, injection attacks are known to be among the most critical security risks that exploit web vulnerabilities, and new variants can appear at any time. In this paper, we propose a novelty detection technique that detects abnormal states deviating from the normal state in a web-server log dataset (WSLD). The proposed method first replaces the strings in the WSLD with vectors using a machine learning-based embedding algorithm. It then applies a machine learning-based technique to detect the small number of anomalous records whose tendencies differ from those of the large number of normal records.
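
The signature limitation can be made concrete with a toy Python sketch, which is not taken from the paper. The regular expression `SIGNATURE`, the helper `signature_match`, and the sample requests are all hypothetical. The fixed pattern is written for one classic SQL-injection payload. It catches that exact payload but misses a URL-encoded copy and two rewrites that are equivalent for the attacker.

```python
import re
from urllib.parse import unquote

# Hypothetical, deliberately naive signature for one classic SQL-injection payload.
SIGNATURE = re.compile(r"' or '1'='1", re.IGNORECASE)


def signature_match(request: str) -> bool:
    """Return True if the raw request string contains the fixed attack pattern."""
    return SIGNATURE.search(request) is not None


# Hypothetical log lines: one original payload and three variants of it.
payloads = {
    "original": "GET /products?id=12' OR '1'='1' --",
    "url_encoded": "GET /products?id=12%27%20OR%20%271%27%3D%271%27%20--",
    "rewritten": "GET /products?id=12' OR 2>1 --",
    "inline_comment": "GET /products?id=12'/**/OR/**/'a'='a' --",
}

for name, req in payloads.items():
    # Prints True only for "original"; the three variants slip past the signature.
    print(name, signature_match(req))

# Decoding first recovers the encoded copy, but not the rewritten variants.
print(signature_match(unquote(payloads["url_encoded"])))  # True
```

Normalising the request before matching (here with `unquote`) closes one gap. It still does not make this fixed pattern match the rewritten variants, which is the motivation for modelling the normal state instead of enumerating attack patterns.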

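The web environment is now commonly used for sharing information and conducting business, and it has become a target of external hacking aimed at leaking personal information or causing system failures. Existing cyber-attack detection techniques generally rely on signature-based analysis, so attacks become difficult to detect when their patterns change. In particular, among attacks that exploit web vulnerabilities, injection attacks occur most frequently, and diverse variants are possible at any time. In this paper, we propose a novelty detection technique that detects abnormal states deviating from the normal state in web-server logs. The proposed method is a machine learning-based novelty detection technique. It converts the string fields in web-server logs into vectors with a machine learning-based embedding technique. It then detects abnormal data whose tendencies differ from those of the majority of normal data.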
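
The abstract does not name the embedding algorithm or the detector. What follows is therefore only a minimal, standard-library Python sketch of the two-step idea: vectorise the log strings, then flag records that lie far from the normal data. It is not the authors' implementation. It swaps in two stand-ins:

- a sparse bag of character trigrams in place of a learned embedding;
- a nearest-neighbour cosine-distance score with a threshold in place of the authors' detector.

`embed`, `cosine_distance`, `NoveltyDetector`, the `margin` parameter, and the sample log lines are all hypothetical.

```python
import math
from collections import Counter
from urllib.parse import unquote


def embed(request: str) -> Counter:
    """Stand-in embedding: sparse counts of character trigrams of the decoded, lower-cased request."""
    text = unquote(request).lower()  # undo %-encoding so encoded variants share features
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity of two sparse count vectors (1.0 if either vector is empty)."""
    dot = sum(n * b[g] for g, n in a.items())  # missing keys in a Counter read as 0
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


class NoveltyDetector:
    """Hypothetical detector: a request is novel if its nearest normal neighbour is too far away."""

    def __init__(self, margin: float = 1.5):
        self.margin = margin  # hypothetical slack factor applied to the training threshold
        self.normal: list[Counter] = []
        self.threshold = 0.0

    def fit(self, normal_requests: list[str]) -> "NoveltyDetector":
        """Calibrate on normal data only; needs at least two requests."""
        self.normal = [embed(r) for r in normal_requests]
        # Leave-one-out: distance from each normal request to its closest *other* normal request.
        loo = [
            min(cosine_distance(v, w) for j, w in enumerate(self.normal) if j != i)
            for i, v in enumerate(self.normal)
        ]
        self.threshold = self.margin * max(loo)
        return self

    def score(self, request: str) -> float:
        """Distance from the request to its nearest normal training request."""
        v = embed(request)
        return min(cosine_distance(v, w) for w in self.normal)

    def is_novel(self, request: str) -> bool:
        return self.score(request) > self.threshold


# Toy "normal" log: product lookups with ids 10-19, holding out id=15.
normal_log = [f"GET /products?id={n}" for n in range(10, 20) if n != 15]
detector = NoveltyDetector().fit(normal_log)

print(detector.is_novel("GET /products?id=15"))                                    # False
print(detector.is_novel("GET /products?id=12' OR '1'='1' --"))                     # True
print(detector.is_novel("GET /products?id=12%27%20OR%20%271%27%3D%271%27%20--"))  # True
```

The threshold is calibrated on normal requests alone, using leave-one-out nearest-neighbour distances scaled by `margin`. No attack examples are needed during fitting, which is what distinguishes novelty detection from supervised classification. A fuller version would replace the trigram counts with a trained document-embedding model (Doc2Vec is one option). It would also replace the linear scan over all normal records with a more scalable detector.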
