Clustering Analysis of Science and Engineering College Students' Understanding of Probability and Statistics

Analysis of Science and Engineering College Students' Understanding of Probability and Statistics Concepts Using Robust PCA

  • Yoo, Yongseok (Department of Electronics Engineering, Incheon National University)
  • 유용석 (인천대학교 전자공학과)
  • Received : 2022.02.20
  • Accepted : 2022.03.20
  • Published : 2022.03.28

Abstract

In this study, we propose a method for quickly and easily analyzing students' understanding of probability and statistics in small university courses. A computer-based test on probability and statistics was administered to 95 science and engineering college students. The students' responses were divided into 7 clusters using Robust PCA and a Gaussian mixture model, and achievement on each topic was then analyzed for each cluster. High-achieving clusters generally showed high achievement on most topics except statistical estimation, while low-achieving clusters showed strengths and weaknesses on different topics. Compared with the widely used approach of PCA-based dimension reduction followed by clustering, the proposed method revealed each cluster's characteristics more clearly. These characteristics can be used to develop individualized learning strategies.
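The pipeline described in the abstract (robust dimension reduction, Gaussian mixture clustering, then per-cluster topic profiles) might be sketched as follows. This is only an illustration under stated assumptions, not the author's implementation: the abstract does not say which Robust PCA variant was used, so the sketch assumes PCA on a minimum covariance determinant (MCD) estimate via scikit-learn's `MinCovDet`. The data are synthetic, and the function names, the six topics, and the two retained components are all hypothetical. Only the sample size (95) and the number of clusters (7) come from the abstract.

```python
import numpy as np
from sklearn.covariance import MinCovDet
from sklearn.mixture import GaussianMixture


def robust_pca_scores(X, n_components=2, random_state=0):
    """Project X onto the leading eigenvectors of a robust covariance estimate.

    Assumption: "Robust PCA" is approximated here by an eigendecomposition
    of the MCD covariance; the paper's exact variant may differ.
    """
    mcd = MinCovDet(random_state=random_state).fit(X)
    eigvals, eigvecs = np.linalg.eigh(mcd.covariance_)
    order = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    return (X - mcd.location_) @ eigvecs[:, order]


def cluster_students(scores, n_clusters=7, random_state=0):
    """Assign each student to one Gaussian mixture component."""
    gmm = GaussianMixture(
        n_components=n_clusters,
        covariance_type="full",
        n_init=5,
        random_state=random_state,
    ).fit(scores)
    return gmm.predict(scores)


def topic_profiles(topic_scores, labels):
    """Mean score per topic for every non-empty cluster."""
    return {
        int(c): topic_scores[labels == c].mean(axis=0)
        for c in np.unique(labels)
    }


# Synthetic stand-in for real test results: 95 students, 6 hypothetical topics.
rng = np.random.default_rng(0)
n_students, n_topics = 95, 6
ability = rng.normal(0.0, 1.0, size=(n_students, 1))
topic_scores = ability + rng.normal(0.0, 0.5, size=(n_students, n_topics))

scores = robust_pca_scores(topic_scores, n_components=2)
labels = cluster_students(scores, n_clusters=7)
profiles = topic_profiles(topic_scores, labels)

for cluster, profile in profiles.items():
    print(cluster, np.round(profile, 2))
```

Each printed row is one cluster's mean score per topic, which is the kind of profile the abstract compares across clusters. Replacing `robust_pca_scores` with an ordinary PCA projection would give the baseline the abstract contrasts against.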

Acknowledgement

This research was supported by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1G1A1011136) and by an Incheon National University Research Grant in 2020.
