Improvement of early prediction performance of under-performing students using anomaly data

  • Received : 2022.09.05
  • Accepted : 2022.09.30
  • Published : 2022.11.30

Abstract

As competition among universities intensifies with the recent decline in the student population, identifying underperforming students early and working to prevent dropout have come to be regarded as essential tasks for universities. Doing so requires a high-performance model that predicts student performance accurately. This paper proposes a method that improves the prediction performance of a classification model for identifying underperforming students by removing or amplifying anomalous data. Existing approaches to anomalous data have mainly deleted or ignored it. This paper instead presents a criterion for distinguishing noise from change indicators and improves the performance of the predictive model by deleting or amplifying the data accordingly. In experiments on publicly available learning performance data, we found a number of cases in which the proposed method improved classification performance over the existing method.
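The abstract does not specify the anomaly detector or the exact criterion for separating noise from change indicators, so the following is only a minimal sketch of the general delete-or-amplify idea under assumed choices. Outliers are flagged with a simple per-feature z-score test, and a hypothetical rule treats an outlier labelled underperforming (label 1) as a change indicator to be duplicated, while any other outlier is treated as noise and dropped. The function names, the threshold of 2.0, the duplication count, and the toy data are illustrative assumptions, not the paper's method.

```python
from statistics import mean, pstdev


def zscore_outliers(rows, threshold=2.0):
    """Return the indices of rows whose value on any feature lies more than
    `threshold` population standard deviations from that feature's mean."""
    if not rows:
        return set()
    flagged = set()
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue  # constant feature: nothing can be an outlier
        for i, value in enumerate(column):
            if abs(value - mu) / sigma > threshold:
                flagged.add(i)
    return flagged


def delete_or_amplify(rows, labels, target_label=1, threshold=2.0, copies=2):
    """Hypothetical rule (assumption, not the paper's criterion):
    an outlier labelled `target_label` is kept and duplicated `copies`
    extra times (amplified); any other outlier is dropped as noise.
    Non-outliers are passed through unchanged."""
    outliers = zscore_outliers(rows, threshold)
    new_rows, new_labels = [], []
    for i, (row, label) in enumerate(zip(rows, labels)):
        if i not in outliers:
            new_rows.append(row)
            new_labels.append(label)
        elif label == target_label:
            new_rows.extend([row] * (1 + copies))
            new_labels.extend([label] * (1 + copies))
        # otherwise: treated as noise and omitted
    return new_rows, new_labels


# Made-up toy data: [attendance_hours, assignment_score] (hypothetical features)
rows = [
    [10, 12], [11, 13], [10, 11], [12, 12], [11, 12],
    [10, 13], [11, 11], [12, 13],
    [0, 12],   # unusually low attendance, labelled underperforming
    [11, 30],  # unusually high score, labelled not underperforming
]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

new_rows, new_labels = delete_or_amplify(rows, labels)
print(len(new_rows), new_labels.count(1))  # prints "11 3"
```

In this toy run the low-attendance row labelled underperforming appears three times in the output, while the extreme-score row with a passing label is removed; the resulting set would then be used to train the classifier. Plain duplication is the crudest form of amplification, and an oversampling technique such as SMOTE could be substituted for it.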
