Prediction of Correct Answer Rate and Identification of Significant Factors for CSAT English Test Based on Data Mining Techniques
Park, Hee Jin; Jang, Kyoung Ye; Lee, Youn Ho; Kim, Woo Je; Kang, Pil Sung;
The College Scholastic Ability Test (CSAT) is the primary test for evaluating the academic achievement of high-school students and is used by most universities in South Korea for admission decisions. Because its level of difficulty is a significant issue for both students and universities, the government makes a considerable effort to keep the difficulty level consistent every year. However, the actual level of difficulty has fluctuated significantly, which causes many problems for university admission. In this paper, we build two types of data-driven prediction models to predict the correct answer rate and to identify significant factors for the CSAT English test from accumulated CSAT test data, unlike traditional methods that depend on experts' judgments. First, we derive candidate question-specific factors that can influence the correct answer rate, such as position, EBS relation, and readability, from 10 years of annual CSAT practice tests and CSATs. In addition, we derive context-specific factors by employing topic modeling, which identifies the underlying topics in the text. Then, the correct answer rate is predicted by multiple linear regression, and the level of difficulty is predicted by a classification tree. The experimental results show that the level-of-difficulty (difficult/easy) classification model achieves 90% accuracy, whereas the error rate for the correct answer rate is below 16%. Points and problem category are found to be critical for predicting the correct answer rate. In addition, the correct answer rate is influenced by some of the topics discovered by topic modeling. Based on our study, it will be possible to predict the range of the expected correct answer rate at both the question level and the entire-test level, which will help CSAT examiners control the level of difficulty.
College Scholastic Ability Test (CSAT) Difficulties; English Test; Topic Modeling; Multiple Linear Regression; Decision Tree
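The regression step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names (`points`, `position`, `ebs_related`), the toy questions, and the linear rule used to generate the toy answer rates are all made up for illustration, and the mean absolute percentage error is only an assumed reading of the "error rate" mentioned in the abstract. The fit uses ordinary least squares through the normal equations in plain Python.

```python
# Illustrative sketch only: toy data and hypothetical feature names,
# not the data or the code used in the paper.

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear_regression(features, targets):
    """Least-squares fit via the normal equations (X^T X) beta = X^T y."""
    x = [[1.0] + list(row) for row in features]  # prepend intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(beta, row):
    """Predicted correct answer rate for one question."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def mean_absolute_percentage_error(actual, predicted):
    """Assumed error metric; the abstract does not name the one it uses."""
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical questions: (points, position, ebs_related).
questions = [(2, 1, 1), (2, 5, 0), (3, 10, 1), (3, 20, 0), (2, 30, 1), (3, 40, 0)]

# Toy answer rates generated from a made-up linear rule, so the fit can be checked.
rates = [95.0 - 10.0 * p - 0.5 * pos + 8.0 * ebs for p, pos, ebs in questions]

beta = fit_linear_regression(questions, rates)
predictions = [predict(beta, q) for q in questions]
print("coefficients:", [round(b, 3) for b in beta])
print("MAPE (%):", round(mean_absolute_percentage_error(rates, predictions), 6))
```

Because the toy rates follow the made-up rule exactly, the fitted coefficients reproduce that rule and the error is essentially zero; with real test data the residual error would be what the abstract's error figure summarizes. The classification tree and topic modeling steps are not sketched here.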
 Cited by
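Agricultural product price prediction model based on optimization of the hidden layers of an artificial neural network, 배경태; 김창재;
Journal of Korean Institute of Information Technology, 2016, Vol.14, No.12, pp.161-169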