
A Study on Classification of CNN-based Linux Malware using Image Processing Techniques

  • Kim, Se-Jin (Division of Information Security, Hoseo University);
  • Kim, Do-Yeon (Division of Information Security, Hoseo University);
  • Lee, Hoo-Ki (Department of Cyber Security Engineering, Konyang University);
  • Lee, Tae-Jin (Division of Information Security, Hoseo University)
  • Received : 2020.04.20
  • Accepted : 2020.09.04
  • Published : 2020.09.30

Abstract

With the proliferation of Internet of Things (IoT) devices, the use of the Linux operating system, which runs on a wide variety of architectures, has increased. Consequently, security threats against Linux-based IoT devices are growing, and malware variants derived from existing malware continue to appear. In this paper, we propose a system that visualizes the binary data of Executable and Linkable Format (ELF) files as images, applies two image processing techniques, Local Binary Pattern (LBP) and a median filter, and classifies malware with a Convolutional Neural Network (CNN) model. In the experiments, the original (unprocessed) images achieved the highest accuracy and F1-score, at 98.77%, as well as the highest recall, at 98.55%. The median-filtered images achieved the highest precision, at 99.19%, and the lowest false positive rate, at 0.008%. LBP yielded generally lower results than both the original and the median-filtered images. When the classification results for the original images and for each image processing technique were combined by majority vote, accuracy, precision, F1-score, and false positive rate generally improved over the results for the original and median-filtered images. In future work, we plan to apply the system to malware family classification and to add further image processing techniques to improve the accuracy of majority-vote classification.
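The abstract describes a pipeline in which ELF bytes are rendered as a grayscale image, optionally transformed with LBP or a median filter, classified by a CNN, and the three per-image predictions are combined by majority vote. The following is a minimal pure-Python sketch of the non-CNN steps, not the authors' implementation. The fixed row width, zero padding of the last row, edge replication at image borders, the clockwise LBP bit order, the 3x3 window sizes, and the binary "malware"/"benign" labels are all illustrative assumptions, and the CNN itself is omitted.

```python
from collections import Counter
from typing import List, Sequence

Image = List[List[int]]

# 8-neighbor offsets, clockwise from the top-left pixel; neighbor i sets bit i.
# This bit order is one common convention, assumed here for illustration.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def bytes_to_image(data: bytes, width: int = 16) -> Image:
    """Reshape raw file bytes into a grayscale image, one byte (0-255) per pixel.

    `width` is a hypothetical fixed row width; the last row is zero-padded.
    """
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row += [0] * (width - len(row))  # pad the final partial row with zeros
        rows.append(row)
    return rows


def _pixel(img: Image, r: int, c: int) -> int:
    """Read a pixel, clamping coordinates so borders replicate their edge values."""
    r = min(max(r, 0), len(img) - 1)
    c = min(max(c, 0), len(img[0]) - 1)
    return img[r][c]


def lbp(img: Image) -> Image:
    """Basic 3x3 LBP: bit i is 1 when neighbor i is >= the center pixel."""
    out = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            center = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                if _pixel(img, r + dr, c + dc) >= center:
                    code |= 1 << bit
            row.append(code)
        out.append(row)
    return out


def median_filter(img: Image) -> Image:
    """3x3 median filter: each pixel becomes the median of its 9-pixel window."""
    out = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            window = sorted(
                _pixel(img, r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            )
            row.append(window[4])  # the 5th of 9 sorted values is the median
        out.append(row)
    return out


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the most frequent label; 3 voters over 2 labels cannot tie."""
    return Counter(predictions).most_common(1)[0][0]
```

In this sketch, each of the three images (original, `lbp(image)`, `median_filter(image)`) would be passed to its own CNN, and the three predicted labels combined with `majority_vote`; for example, `majority_vote(["malware", "benign", "malware"])` returns `"malware"`. An odd number of voters is what guarantees a decision in the binary case.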

