A Detection Model using Labeling based on Inference and Unsupervised Learning Method

  • Hong, Sung-Sam (Department of Computer Engineering, Gachon University) ;
  • Kim, Dong-Wook (Department of Computer Engineering, Gachon University) ;
  • Kim, Byungik (Department of Security R&D Team 1, Korea Internet & Security Agency) ;
  • Han, Myung-Mook (Department of Computer Engineering, Gachon University)
  • Received : 2016.12.14
  • Accepted : 2017.01.22
  • Published : 2017.02.28

Abstract

A detection model uses artificial intelligence, data mining, or other intelligent algorithms to find results that serve a specific purpose. In cyber security, such models are mainly used to detect intrusions, malware, cyber incidents, and attacks. Much of the data collected in real environments, security data included, is unlabeled. Because the class labels of most records are undefined, the type of each record is unknown, so a label determination process is needed for accurate detection and analysis. In this paper, we propose KDFL (K-means and D-S Fusion based Labeling), a method that decides the label of each data record by fusing D-S (Dempster-Shafer) inference with the unsupervised k-means algorithm, together with a detection model architecture that applies the proposed labeling method. In experiments, the proposed method outperformed existing methods on detection rate, accuracy, and F1-measure. Its error rate was also substantially improved, which verifies the performance of the proposed method.
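The abstract does not spell out how KDFL fuses the two sources of evidence, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a two-class frame (`normal`, `attack`), one-dimensional hypothetical anomaly scores, a two-cluster k-means, and an invented mapping from cluster distance to belief mass (`cluster_mass`, with 20% of the mass left on "unknown"). The only standard ingredient is Dempster's rule of combination in `combine`; every function name, constant, and input value is an assumption made for illustration.

```python
from itertools import product

NORMAL = frozenset({"normal"})
ATTACK = frozenset({"attack"})
THETA = NORMAL | ATTACK  # whole frame of discernment, i.e. "unknown"


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions (dict: set -> mass)."""
    fused, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            fused[inter] = fused.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: w / (1.0 - conflict) for s, w in fused.items()}


def two_means(values, iters=20):
    """Tiny Lloyd-style k-means with k=2 on scalars, seeded at min and max."""
    centers = [min(values), max(values)]
    assign = [0] * len(values)
    for _ in range(iters):
        assign = [min((0, 1), key=lambda c: abs(v - centers[c])) for v in values]
        for c in (0, 1):
            members = [v for v, a in zip(values, assign) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, assign


def cluster_mass(value, centers, attack_cluster):
    """Assumed mapping: the closer to the attack center, the more attack mass."""
    d_attack = abs(value - centers[attack_cluster])
    d_normal = abs(value - centers[1 - attack_cluster])
    total = d_attack + d_normal
    p_attack = 0.5 if total == 0 else d_normal / total
    # Hypothetical choice: keep 20% of the mass on THETA as clustering uncertainty.
    return {ATTACK: 0.8 * p_attack, NORMAL: 0.8 * (1.0 - p_attack), THETA: 0.2}


def fuse_labels(values, rule_masses):
    """Label each record by fusing cluster evidence with per-record rule evidence."""
    centers, _ = two_means(values)
    # Assumption: the cluster with the higher center holds the anomalous records.
    attack_cluster = 0 if centers[0] > centers[1] else 1
    labels = []
    for value, rule_mass in zip(values, rule_masses):
        fused = combine(cluster_mass(value, centers, attack_cluster), rule_mass)
        is_attack = fused.get(ATTACK, 0.0) > fused.get(NORMAL, 0.0)
        labels.append("attack" if is_attack else "normal")
    return labels


if __name__ == "__main__":
    scores = [0.1, 0.2, 0.15, 0.9, 1.0, 0.95]  # hypothetical anomaly scores
    weak_normal = {NORMAL: 0.5, THETA: 0.5}    # hypothetical rule-based evidence
    weak_attack = {ATTACK: 0.6, THETA: 0.4}
    print(fuse_labels(scores, [weak_normal] * 3 + [weak_attack] * 3))
```

Keeping part of each mass on the whole frame lets an uncertain source defer to the other one instead of forcing a hard vote, which is the main reason to prefer this kind of fusion over taking the cluster assignment as the label directly.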
