Korean Semantic Similarity Measures for the Vector Space Models

Lee, Young-In;Lee, Hyun-jung;Koo, Myoung-Wan;Cho, Sook Whan

  • Received : 2015.12.09
  • Accepted : 2015.12.17
  • Published : 2015.12.31


This paper argues that, in determining semantic similarity, Korean words should be recategorized with a focus on their semantic relation to ontology, in light of cross-linguistic morphological variation. In particular, it proposes that Korean semantic similarity be measured on three tracks: a human judgments track, a relatedness track, and a cross-part-of-speech (cross-POS) relations track. As demonstrated in Yang et al. (2015), GloVe, an unsupervised learning model for word representations, is applicable to Korean when its performance is compared with human judgment results. Given this compatibility, it was further hypothesized that the model's performance is likely to vary with the specific kinds of relations found in different languages. These relations are analyzed in terms of two major Korean-specific categories, namely lexical relations and cross-POS relations. It is concluded that languages must be analyzed by varying methods, so that semantic components across languages are allowed varying semantic distances in vector space models.
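As a rough illustration of what comparing a vector space model with human judgments can involve, the sketch below computes cosine similarity between word vectors and then the Spearman rank correlation between those model scores and human ratings. Everything in it is an assumption made for illustration: the abstract does not state which similarity measure or correlation statistic was used, and the word labels, vectors, and ratings are made-up toy values, not data from the study.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the two rank lists."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical 2-dimensional "word vectors"; real GloVe vectors are much larger.
toy_vectors = {
    "w1": [1.0, 0.0],
    "w2": [0.9, 0.1],
    "w3": [0.0, 1.0],
    "w4": [0.5, 0.5],
}

# Hypothetical word pairs with made-up human similarity ratings.
pairs = [("w1", "w2"), ("w1", "w4"), ("w1", "w3")]
human_ratings = [9.0, 5.0, 1.0]

model_scores = [cosine_similarity(toy_vectors[a], toy_vectors[b]) for a, b in pairs]
correlation = spearman(model_scores, human_ratings)
print(model_scores, correlation)
```

In a setup like the three-track proposal, one option would be to compute such a correlation separately for the word pairs in each track, so that differences between relation types become visible instead of being averaged away.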


semantic similarity in Korean;semantic relatedness;lexical relation;cross-POS-relations


  1. Hare, M., Elman, J. L., Tabaczynski, T., & McRae, K. (2009). The wind chilled the spectators, but the wine just chilled: Sense, structure, and sentence comprehension. Cognitive Science, 33(4), 610-628.
  2. Pado, S., & Lapata, M. (2007). Dependency-based construction of semantic space models. Computational Linguistics, 33(2), 161-199.
  3. Budanitsky, A., & Hirst, G. (2006). Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics, 32(1), 13-47.
  4. Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 12, 1532-1543.
  5. Yang et al. (2015). A Study on Word Vector Models for Representing Korean Semantic Information. Journal of the Korean Society of Speech Sciences, 7(4), 165-166.
  6. Lopukhin, A. S. (2015). The origin of life is the prerogative of primordial planets of novas. Herald of the Russian Academy of Sciences, 85(5), 453-458.
  7. Cruse, D. A. (1986). Lexical semantics. Cambridge University Press.
  8. Murphy, G. L., & Andrews, J. M. (1993). The conceptual basis of antonymy and synonymy in adjectives. Journal of Memory and Language, 32(3), 301-319.
  9. Kim, H. K. (1967). Korean kinship terminology: A semantic analysis. Language Research, 3(1), 70-81.
  10. Mititelu, V. B. (2008). Hyponymy patterns. In Text, Speech and Dialogue (pp. 37-44). Springer Berlin Heidelberg.
  11. Oakes, M. P. (2005). Using Hearst's Rules for the Automatic Acquisition of Hyponyms for Mining a Pharmaceutical Corpus. In RANLP Text Mining Workshop, 5, 63-67.
  12. Lapata, M., & Lascarides, A. (2003). A probabilistic account of logical metonymy. Computational Linguistics, 29(2), 261-315.
  13. McRae, K., Spivey-Knowlton, M. J., & Tanenhaus, M. K. (1998). Modeling the influence of thematic fit (and other constraints) in on-line sentence comprehension. Journal of Memory and Language, 38(3), 283-312.
  14. McRae et al. (2005). A basis for generating expectancies for verbs from nouns. Memory & Cognition, 33(7), 1174-1184.

Cited by

  1. Hypothetical Research Model and Program Design for Improving Transfer Student's Learning vol.30, pp.3, 2018


Supported by : Sogang University