Predicting Program Code Changes Using a CNN Model

  • Kim, Dong Kwan (Department of Computer Engineering, Mokpo National Maritime University)
  • Received : 2021.07.08
  • Reviewed : 2021.09.20
  • Published : 2021.09.28

Abstract

A software system must change throughout its life cycle for various reasons, such as adding functionality, fixing bugs, and adapting to new computing environments. Because such modifications can introduce unexpected software errors, program code changes should be handled as carefully as new system development. In addition, now that reuse of open source programs is common practice, predicting the code changes of an open source program in advance can be expected to lead to higher-quality software. This paper proposes a Convolutional Neural Network (CNN)-based deep learning model to predict source code changes. The prediction of code changes is treated as a binary classification problem in deep learning, and labeled datasets are used for supervised learning. Java source code and code change logs are collected from GitHub for the training and testing datasets. Software metrics are computed from the collected Java source code and used as input data for the proposed model. The performance of the proposed model is measured with precision, recall, F1-score, and accuracy. The experimental results show that the proposed CNN model achieved an F1-score of 95% and outperformed the multilayer perceptron-based DNN model, whose F1-score is 92%.
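As a rough illustration of the four evaluation metrics named in the abstract, the following sketch computes precision, recall, F1-score, and accuracy for a binary changed/unchanged classifier. The function name `binary_metrics` and the sample labels are hypothetical and are not taken from the paper; the label encoding (1 = changed, 0 = unchanged) is likewise an assumption.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute precision, recall, F1-score, and accuracy for 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # changed, predicted changed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # unchanged, predicted changed
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # changed, predicted unchanged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # unchanged, predicted unchanged

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical labels (assumption): 1 = the class was changed, 0 = it was not.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

With these sample labels there are three true positives, one false positive, one false negative, and three true negatives, so all four metrics come out to 0.75.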

Acknowledgement

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2019R1I1A3A01056368).
