Topic Automatic Extraction Model based on Unstructured Security Intelligence Report

  • Hur, YunA (Department of Computer Science and Engineering, Korea University) ;
  • Lee, Chanhee (Department of Computer Science and Engineering, Korea University) ;
  • Kim, Gyeongmin (Department of Computer Science and Engineering, Korea University) ;
  • Lim, HeuiSeok (Department of Computer Science and Engineering, Korea University)
  • Received : 2019.04.24
  • Accepted : 2019.06.20
  • Published : 2019.06.28


As cyber attack methods become more sophisticated, incidents such as security breaches and international crimes are increasing. To predict and respond to these attacks, the characteristics, methods, and types of attack techniques must be identified. To this end, many security companies publish security intelligence reports so that various attack patterns can be identified quickly and further damage prevented. However, the reports each company distributes are unstructured, while the number of published intelligence reports keeps growing. In this paper, we propose a method to extract structured data from unstructured security intelligence reports. We also propose an automatic intelligence report analysis system that divides a large volume of reports into sub-groups by topic, making report analysis more effective and efficient.
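The abstract names two steps, recovering clean text from PDF reports and grouping reports by topic, without giving implementation details here. The following minimal Python sketch illustrates the general idea under clearly stated assumptions: the function names (`clean_pdf_text`, `bag_of_words`, `assign_topic`, `group_reports`) and the `TOPIC_KEYWORDS` lexicon are hypothetical, and topics are assigned by simple keyword matching over a bag-of-words vector rather than by a learned topic model. It is a simplified stand-in, not the authors' method.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical topic lexicon for illustration only; these are NOT the
# topics reported in the paper.
TOPIC_KEYWORDS: Dict[str, set] = {
    "ransomware": {"ransomware", "encrypt", "ransom", "decryptor", "bitcoin"},
    "phishing": {"phishing", "email", "credential", "spoofed", "attachment"},
    "apt": {"apt", "backdoor", "lateral", "espionage", "c2"},
}

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on",
             "by", "is", "are", "was", "with", "for", "that"}


def clean_pdf_text(raw: str) -> str:
    """Repair two common PDF-extraction artifacts: words hyphenated across a
    line break are rejoined, and all runs of whitespace are collapsed.
    Note: a genuine hyphenated compound split at a line break is also joined."""
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def bag_of_words(text: str) -> Counter:
    """Lowercase, tokenize on alphanumeric runs, drop stopwords, count terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def assign_topic(bow: Counter, topics: Dict[str, set] = TOPIC_KEYWORDS) -> str:
    """Score each topic by how often its keywords occur; the highest-scoring
    topic wins (ties go to the topic listed first). Zero matches -> 'unassigned'."""
    scores = {name: sum(bow[w] for w in words) for name, words in topics.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unassigned"


def group_reports(reports: Dict[str, str]) -> Dict[str, List[str]]:
    """Clean, vectorize, and assign each report, returning topic -> report IDs."""
    groups: Dict[str, List[str]] = {}
    for report_id, raw in reports.items():
        topic = assign_topic(bag_of_words(clean_pdf_text(raw)))
        groups.setdefault(topic, []).append(report_id)
    return groups


if __name__ == "__main__":
    sample = {
        "r1": "The group deployed ransom-\nware that would encrypt files "
              "and demand a ransom in bitcoin.",
        "r2": "A phishing email with a spoofed sender harvested credential data.",
        "r3": "Quarterly summary with no specific campaign.",
    }
    print(group_reports(sample))
    # {'ransomware': ['r1'], 'phishing': ['r2'], 'unassigned': ['r3']}
```

Keyword matching keeps the sketch dependency-free and deterministic; a learned topic model would instead discover the topic vocabulary from the report collection rather than requiring it to be written by hand.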


Keywords: Security; Intelligence Report; Analysis; Topic Modeling; Classification

Fig. 1. Example of a PDF file that is problematic for text extraction

Fig. 2. Topic Modeling based on Security Intelligence Reports

Fig. 3. Result of applying the topic model to a test document

Table 1. Result of simply extracting a PDF document as text

Table 2. Example of extracting the same PDF document with the method developed in this work

Table 3. Topics by bag-of-words

Table 4. Satisfaction Evaluation Questions for the Security Intelligence Report Topic Automatic Extraction Model

Table 5. Satisfaction with the Security Intelligence Report Topic Automatic Extraction Model

Supported by: Korea Creative Content Agency (KOCCA)


