Feature Selection for Anomaly Detection Based on Genetic Algorithm

  • Seo, Jae-Hyun (Division of Computer Science & Engineering, WonKwang University)
  • Received : 2018.04.16
  • Reviewed : 2018.07.20
  • Published : 2018.07.28

Abstract

Feature selection, a data preprocessing technique, is a major research area in many applications that deal with large datasets. It has long been used in pattern recognition, machine learning, and data mining, and is now widely applied in fields such as text classification, image retrieval, intrusion detection, and genome analysis. The proposed method is based on a genetic algorithm, a meta-heuristic algorithm. Methods for finding feature subsets fall broadly into two groups: filter methods and wrapper methods. In this study, we use a wrapper method, which evaluates candidate feature subsets with an actual classifier, to search for an optimal feature subset. The training dataset used in the experiments is severely class-imbalanced, which makes it difficult to improve classification performance on rare classes. We apply SMOTE to the training dataset, perform feature selection on the resulting data, and evaluate the selected features with various machine learning algorithms.
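The wrapper approach described in the abstract can be illustrated with a minimal sketch: each chromosome is a bit mask over the features, and its fitness is the accuracy of a classifier trained on only the selected features. Everything concrete below is an assumption made for illustration and is not taken from the paper: the synthetic data (`make_data`), the nearest-centroid classifier (`centroid_accuracy`), and the operators and parameters in `ga_select` (tournament selection, one-point crossover, bit-flip mutation, elitism). The SMOTE oversampling step is omitted.

```python
import random


def make_data(n, rng):
    """Hypothetical synthetic data: features 0 and 1 are informative, 2..7 are noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(2.0 * label, 1.0), rng.gauss(-2.0 * label, 1.0)]
        row += [rng.gauss(0.0, 1.0) for _ in range(6)]
        X.append(row)
        y.append(label)
    return X, y


def centroid_accuracy(mask, X_tr, y_tr, X_va, y_va):
    """Wrapper fitness: validation accuracy of a nearest-centroid classifier
    trained only on the features switched on in `mask`."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for c in set(y_tr):
        rows = [x for x, t in zip(X_tr, y_tr) if t == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_va, y_va):
        pred = min(
            centroids,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])),
        )
        correct += pred == t
    return correct / len(y_va)


def ga_select(fitness, n_features, rng, pop_size=20, generations=15, p_mut=0.1):
    """Search for a feature mask with a simple generational GA (illustrative settings)."""
    cache = {}

    def score(mask):
        # Cache fitness values so each distinct mask is evaluated only once.
        if mask not in cache:
            cache[mask] = fitness(mask)
        return cache[mask]

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    pop = [tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=score)]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = tuple(1 - g if rng.random() < p_mut else g for g in child)  # bit-flip mutation
            children.append(child)
        pop = children
    best = max(pop, key=score)
    return best, score(best)


rng = random.Random(0)
X, y = make_data(300, rng)
X_tr, y_tr, X_va, y_va = X[:200], y[:200], X[200:], y[200:]
best_mask, best_score = ga_select(
    lambda m: centroid_accuracy(m, X_tr, y_tr, X_va, y_va), 8, rng
)
print(best_mask, best_score)
```

Because the fitness function trains and scores a classifier for every candidate mask, a wrapper search of this kind costs far more than a filter method, which is why the sketch caches the score of each mask it has already evaluated.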
