Compilation of the Yonsei English Learner Corpus (YELC) 2011 and Its Use for Understanding Current Usage of English by Korean Pre-university Students

Construction and Analysis of a Large-Scale Public English Learner Corpus for Understanding the English Usage of Korean Pre-university Students

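  • 이석재 (Department of English Language and Literature, Yonsei University)
  • 정채관 (English Education Center, Korea Institute for Curriculum and Evaluation)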
  • Received : 2014.08.12
  • Accepted : 2014.09.22
  • Published : 2014.11.28


In recent years, researchers have become increasingly interested in the creation and pedagogical use of English learner corpora. Many studies have shown that, by advancing our understanding of English learners, learner corpora can contribute significantly both to second language acquisition research and to the construction and evaluation of language tests. So far, however, little attention has been paid to corpora of Korean EFL (English as a foreign language) learners. The Yonsei English Learner Corpus (YELC 2011) is a specialized, monolingual, synchronic Korean EFL learner corpus developed by Yonsei University from 2011 to 2012. Over 3,000 Korean high school graduates (or equivalents) who had been admitted to Yonsei University participated in the project. The corpus consists of 6,572 written texts (1,085,828 words) spanning nine English proficiency levels. In this paper, we describe its compilation and, more specifically, the process of corpusization: how we turned a text archive into a corpus. After introducing this process, we report findings on the linguistic features that characterize Korean learners of English at different proficiency levels. We also discuss potential uses of the YELC 2011, which is now freely available for research purposes.


English; English Education; Corpus; English Learner Corpus; Corpus Compilation


