Korean Named Entity Recognition and Classification using Word Embedding Features
  • Journal title : Journal of KIISE
  • Volume 43, Issue 6, 2016, pp. 678-685
  • Publisher : Korean Institute of Information Scientists and Engineers
  • DOI : 10.5626/JOK.2016.43.6.678
Choi, Yunsu; Cha, Jeongwon
Named Entity Recognition and Classification (NERC) is the task of recognizing named entities, such as person names, locations, and organizations, and classifying them by type. Various studies have addressed Korean NERC, but they share some limitations; for example, they lack some of the features used in English NERC. In this paper, we propose a method that uses word embeddings as features for Korean NERC. We generate word vectors from a POS-tagged corpus using the Continuous Bag-of-Words (CBOW) model, and we derive word cluster symbols by applying the K-means algorithm to those word vectors. We then use the word vectors and word cluster symbols as word embedding features in Conditional Random Fields (CRFs). In our experiments, the proposed method improved performance over the baseline system by 1.17%, 0.61%, and 1.19% in the TV, Sports, and IT domains, respectively. The proposed method also outperformed other NERC systems, which demonstrates its effectiveness and efficiency.
Keywords: natural language processing; named entity recognition and classification; word embedding; continuous bag-of-words model
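The feature pipeline summarized in the abstract (word vectors, K-means cluster symbols, CRF features) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: CBOW training is omitted, and the toy two-dimensional vectors, k = 2, the "morpheme/POS" token format, the cluster symbols (`C0`, `C_UNK`), and all feature names are hypothetical. In practice the vectors would come from a trained CBOW model (for example, gensim's `Word2Vec` with `sg=0`), and the dimensionality and k would be much larger.

```python
from typing import Dict, List, Sequence, Union

Vector = List[float]
Feature = Union[str, float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Lloyd's K-means; returns one cluster index per input vector.

    Centroids start at the first k vectors so the result is deterministic.
    """
    centroids = [list(v) for v in vectors[:k]]
    assignment = [-1] * len(vectors)

    def nearest(v: Vector) -> int:
        return min(range(k), key=lambda c: squared_distance(v, centroids[c]))

    for _ in range(iterations):
        # Assignment step: attach every vector to its nearest centroid.
        new_assignment = [nearest(v) for v in vectors]
        if new_assignment == assignment:
            break  # converged: no vector changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(dim) for dim in zip(*members)]
    return assignment


def cluster_symbol(token: str, clusters: Dict[str, int]) -> str:
    """Cluster symbol for a token; unseen tokens share the hypothetical 'C_UNK'."""
    return f"C{clusters[token]}" if token in clusters else "C_UNK"


def token_features(sentence: List[str], i: int,
                   vectors: Dict[str, Vector],
                   clusters: Dict[str, int]) -> Dict[str, Feature]:
    """CRF feature dict for sentence[i] (feature names are hypothetical)."""
    token = sentence[i]
    feats: Dict[str, Feature] = {
        "token": token,
        "pos": token.rsplit("/", 1)[-1],  # tag part of "morpheme/POS"
        "cluster": cluster_symbol(token, clusters),
        "prev_cluster": cluster_symbol(sentence[i - 1], clusters) if i > 0 else "BOS",
        "next_cluster": (cluster_symbol(sentence[i + 1], clusters)
                         if i + 1 < len(sentence) else "EOS"),
    }
    # Real-valued word-vector features, one per dimension (absent for unseen tokens).
    for d, value in enumerate(vectors.get(token, [])):
        feats[f"vec_{d}"] = value
    return feats


# Hypothetical toy vectors standing in for CBOW output over "morpheme/POS" tokens.
vectors: Dict[str, Vector] = {
    "서울/NNP": [0.9, 0.1],
    "삼성/NNP": [0.1, 0.9],
    "부산/NNP": [0.8, 0.2],
    "현대/NNP": [0.2, 0.8],
}
words = list(vectors)
clusters = dict(zip(words, kmeans([vectors[w] for w in words], k=2)))

sentence = ["삼성/NNP", "이/JKS", "서울/NNP", "에/JKB"]
features = [token_features(sentence, i, vectors, clusters) for i in range(len(sentence))]
for f in features:
    print(f)
```

Each sentence becomes a list of such feature dicts, which is the input shape accepted by python-crfsuite-based toolkits such as sklearn-crfsuite; there, string values act as categorical features and numeric values as real-valued feature weights. The cluster symbol gives the CRF a coarse, discrete view of the embedding space, while the per-dimension values keep the finer-grained information.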