Scoring Korean Written Responses Using English-Based Automated Computer Scoring Models and Machine Translation: A Case of Natural Selection Concept Test
Ha, Minsu
This study tests the efficacy of English-based automated computer scoring models combined with machine translation for scoring Korean college students' written responses to natural selection concept items. To this end, I collected 128 pre-service biology teachers' written responses to a four-item instrument (512 written responses in total). Machine translation software (Google Translate) translated both the original responses and spell-corrected versions of them. Automated computer scoring models (EvoGrader) then judged the presence or absence of five scientific ideas and three naive ideas in both sets of translated responses. The computer-scored results (4096 predictions) were compared with expert-scored results. No significant differences were found between the computer-scored and expert-scored results, either in average scores or in the statistical results derived from average scores. The Pearson correlation coefficients between computer and expert scoring for each student's composite scores were 0.848 for scientific ideas and 0.776 for naive ideas. The inter-rater reliability indices (Cohen's kappa) between computer and expert scoring exceeded 0.8 for linguistically simple concepts (e.g., variation, competition, and limited resources). These findings indicate that English-based automated computer scoring models combined with machine translation are a promising method for scoring Korean college students' written responses to natural selection concept items.
Keywords: automated computer scoring; written response; natural selection; assessment
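The agreement statistics named in the abstract (Cohen's kappa for presence/absence judgments and the Pearson correlation for composite scores) can be illustrated with a minimal sketch. This is not the study's analysis code: the function names and the small `computer`/`expert` score lists are hypothetical and chosen only for illustration. The last lines check that 128 students, four items, and eight ideas (5 + 3) multiply to the 4096 predictions the abstract reports.

```python
from math import sqrt


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length sequences of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Proportion of responses on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal label rates.
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        return 1.0  # both raters used one and the same label throughout
    return (observed - expected) / (1 - expected)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical presence (1) / absence (0) judgments of one idea in ten responses.
computer = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
expert = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
print(f"kappa = {cohen_kappa(computer, expert):.3f}")

# Hypothetical per-student composite scores (number of ideas detected).
computer_composite = [3, 5, 2, 4, 1, 4]
expert_composite = [3, 4, 2, 5, 1, 4]
print(f"r = {pearson_r(computer_composite, expert_composite):.3f}")

# 128 students x 4 items x (5 + 3) ideas, as described in the abstract.
n_predictions = 128 * 4 * (5 + 3)
print(n_predictions)
```

Kappa is computed per idea on binary judgments, whereas the correlation is computed on per-student totals, which is why the abstract reports the two statistics separately.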