Compound Noun Decomposition by using Syllable-based Embedding and Deep Learning

음절 단위 임베딩과 딥러닝 기법을 이용한 복합명사 분해

  • Hyun-Young Lee (Department of Computer Engineering, Kookmin University);
  • Seung-Shik Kang (School of Software, Kookmin University)
  • Received : 2018.10.10
  • Accepted : 2019.02.17
  • Published : 2019.06.30

Abstract

Traditional compound noun decomposition algorithms often fail to split a compound noun into its unit nouns when it contains an unregistered unit noun. These traditional approaches handle such cases poorly because it is impossible to register every existing unit noun, such as proper nouns, coined words, and foreign words, in a dictionary in advance. To solve this problem, this paper defines compound noun decomposition as a tag sequence labeling problem and proposes a decomposition method that uses syllable-unit embedding and deep learning. To recognize unregistered unit nouns without constructing a unit noun dictionary, each syllable of a compound noun is represented in a continuous vector space, and the compound noun is decomposed into unit nouns by using an LSTM and a linear-chain CRF.
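To make the sequence labeling formulation concrete, the following minimal sketch shows how a decomposition can be encoded as one tag per syllable and decoded back into unit nouns. The two-tag scheme (`B` for the first syllable of a unit noun, `I` for a continuation) and the helper names `units_to_tags` and `tags_to_units` are assumptions chosen for illustration; the abstract does not state the tag set the authors use. In the proposed method the tags would be predicted by the LSTM and linear-chain CRF, which is not modeled here.

```python
from typing import List


def units_to_tags(units: List[str]) -> List[str]:
    """Encode a decomposition as one tag per syllable.

    Assumed scheme: 'B' marks the first syllable of a unit noun,
    'I' marks every following syllable of the same unit noun.
    """
    tags: List[str] = []
    for unit in units:
        if not unit:
            raise ValueError("unit nouns must be non-empty")
        tags.extend(["B"] + ["I"] * (len(unit) - 1))
    return tags


def tags_to_units(compound: str, tags: List[str]) -> List[str]:
    """Decode a tag sequence back into unit nouns.

    Assumes each Hangul syllable is a single precomposed code point,
    so that indexing the string yields syllables.
    """
    if len(compound) != len(tags):
        raise ValueError("need exactly one tag per syllable")
    units: List[str] = []
    for syllable, tag in zip(compound, tags):
        if tag == "B" or not units:
            # Start a new unit noun (also covers a stray leading 'I').
            units.append(syllable)
        else:
            # Continue the current unit noun.
            units[-1] += syllable
    return units


# Illustrative example: "복합명사" (compound noun) = "복합" + "명사".
example_tags = units_to_tags(["복합", "명사"])
example_units = tags_to_units("복합명사", example_tags)
```

Because the tags are assigned per syllable rather than looked up per word, a unit noun that appears in no dictionary can still be recovered as long as its boundary syllables are tagged correctly.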

Acknowledgement

Supported by: National Research Foundation of Korea
