Anomaly Detection Analysis using Repository based on Inverted Index

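  • 박주미 (Department of Knowledge Information Engineering, Ajou University)
  • 조위덕 (Department of Electronic Engineering, Ajou University)
  • 김강석 (Department of Cyber Security, Ajou University)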
  • Received : 2017.06.27
  • Accepted : 2017.12.17
  • Published : 2018.03.15

Abstract

With the emergence of new service industries driven by advances in information and communication technology, cyberspace risks such as personal information infringement and leakage of industrial secrets have diversified, and security has become a critical issue. In this paper, we use large-volume user log data related to in-company misuse of personal information and internal information leakage to propose a behavior-based anomaly detection method that is better suited to real-time, large-volume data analysis than existing signature-based security countermeasures. Because behavior-based anomaly detection requires a technique for processing large amounts of data, we use Elasticsearch, a real-time search engine based on an inverted index. For data analysis, we perform statistical frequency analysis and preprocessing, and we apply DBSCAN, a density-based clustering algorithm, to classify abnormal data, presenting an example in which visualization simplifies the analysis. Unlike existing anomaly detection systems, the proposed approach attempts anomaly detection analysis without setting a threshold value separately, and it approaches anomaly detection from a statistical perspective.
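
The pipeline the abstract describes (frequency analysis of user log data followed by DBSCAN, with noise points treated as abnormal) can be illustrated with a minimal sketch. This is not the authors' implementation: the sample users, the two action types, and the `eps` and `min_pts` values are assumptions chosen for illustration, and the event list stands in for data that the paper retrieves from Elasticsearch.

```python
from collections import Counter
from math import dist

NOISE = -1  # label given to points that belong to no dense cluster


def frequency_features(log_events, actions):
    """Frequency analysis: map each user to a tuple of per-action counts."""
    counts = Counter(log_events)  # log_events holds (user, action) pairs
    users = sorted({user for user, _ in log_events})
    return {u: tuple(counts[(u, a)] for a in actions) for u in users}


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one cluster label per point, NOISE for outliers."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels


# Hypothetical log data: (views, downloads) per user, expanded into events.
profile = {"alice": (10, 1), "bob": (11, 2), "carol": (9, 1),
           "dave": (10, 2), "mallory": (12, 40)}
events = []
for user, (views, downloads) in profile.items():
    events += [(user, "view")] * views + [(user, "download")] * downloads

features = frequency_features(events, actions=("view", "download"))
users = list(features)
labels = dbscan([features[u] for u in users], eps=3.0, min_pts=3)  # assumed values
anomalies = [u for u, label in zip(users, labels) if label == NOISE]
print(anomalies)
```

In this sketch no per-feature alert threshold is set. A user is flagged only because their frequency vector lies in a sparse region, although DBSCAN itself still needs a neighborhood radius and a minimum point count.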

Acknowledgement

Supported by: Institute for Information & Communications Technology Promotion (IITP)
