Implementation of a Machine Learning-based Recommender System for Preventing the University Students' Dropout

대학생 중도탈락 예방을 위한 기계 학습 기반 추천 시스템 구현 방안

  • Jeong, Do-Heon (College of Global Convergence Studies, Duksung Women's University)
  • 정도헌 (덕성여자대학교 글로벌융합대학)
  • Received : 2021.08.22
  • Accepted : 2021.10.20
  • Published : 2021.10.28

Abstract

This study proposes an effective automatic classification technique for identifying dropout patterns among university students and, building on it, an intelligent recommender system for preventing dropout. To this end, 1) a data processing method to improve machine learning performance was proposed, based on actual enrollment/dropout data of university students, and 2) performance comparison experiments were conducted using five machine learning algorithms. 3) In the experiments, the proposed method outperformed the baseline method across all algorithms. Precision in identifying enrolled students reached up to 95.6% with Random Forest (RF), and recall for dropout students reached up to 80.0% with Naive Bayes (NB). 4) Finally, based on the experimental results, a way of using a counseling recommender system that gives priority to students who are likely to drop out was suggested. The study confirmed that convergence research applying IT to educational issues can support reasonable decision-making, and we plan to apply a wider range of artificial intelligence technologies in continued research.

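This study aims to propose an effective automatic classification technique for identifying dropout patterns among university students and, building on it, to present an implementation plan for an intelligent recommender system that prevents dropout. To this end, 1) a data processing method that can improve machine learning performance was proposed based on actual enrollment/dropout data of university students, and 2) performance comparison experiments were conducted using five machine learning algorithms. 3) In the experiments, the proposed technique outperformed the baseline across all algorithms. Precision in identifying dropout students was measured at up to 95.6% with Random Forest, and recall for dropout students at up to 80.0% with Naive Bayes. 4) Finally, based on the experimental results, a way of using a recommender system that prioritizes counseling for students with a high likelihood of dropping out was presented. The study confirmed that convergence research applying IT to pressing educational problems can support rational decision-making, and various artificial intelligence technologies are to be applied through continued research.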
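The two reported measures and the counseling-priority idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the label encoding (`ENROLLED`, `DROPOUT`), the toy labels, the student IDs, and the dropout scores are invented and do not come from the study's data or reproduce its method. Per-class precision and recall are computed directly from true and predicted labels, and students are then ranked by a hypothetical dropout score so that the highest-risk students are counseled first.

```python
from typing import Dict, List, Sequence

# Hypothetical label encoding; the study does not specify one.
ENROLLED, DROPOUT = 0, 1


def precision(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """Of the students predicted as `label`, the fraction that truly are `label`."""
    truths_for_predicted = [t for t, p in zip(y_true, y_pred) if p == label]
    if not truths_for_predicted:
        return 0.0
    hits = sum(1 for t in truths_for_predicted if t == label)
    return hits / len(truths_for_predicted)


def recall(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """Of the students who truly are `label`, the fraction predicted as `label`."""
    preds_for_actual = [p for t, p in zip(y_true, y_pred) if t == label]
    if not preds_for_actual:
        return 0.0
    hits = sum(1 for p in preds_for_actual if p == label)
    return hits / len(preds_for_actual)


def counseling_priority(dropout_scores: Dict[str, float], top_k: int) -> List[str]:
    """Return the IDs of the `top_k` students with the highest dropout score."""
    ranked = sorted(dropout_scores.items(), key=lambda item: item[1], reverse=True)
    return [student_id for student_id, _ in ranked[:top_k]]


# Toy labels, invented for illustration only.
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]

enrolled_precision = precision(y_true, y_pred, ENROLLED)  # 4 of 5 predicted-enrolled are correct
dropout_recall = recall(y_true, y_pred, DROPOUT)          # 2 of 3 actual dropouts are found

# Hypothetical per-student dropout scores, e.g. from any probabilistic classifier.
scores = {"s01": 0.12, "s02": 0.81, "s03": 0.47, "s04": 0.93}
first_to_counsel = counseling_priority(scores, top_k=2)

print(enrolled_precision, dropout_recall, first_to_counsel)
```

In a setting like this, recall on the dropout class indicates how many at-risk students the counseling list would reach, while precision on the enrolled class indicates how reliably a student can be left off that list.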

Acknowledgement

This research was supported by Duksung Women's University Research Grants 2020 (3000005346).
