Probabilistic Segmentation and Tagging of Unknown Words
  • Journal title : Journal of KIISE
  • Volume 43, Issue 4,  2016, pp.430-436
  • Publisher : Korean Institute of Information Scientists and Engineers
  • DOI : 10.5626/JOK.2016.43.4.430
 Title & Authors
Kim, Bogyum; Lee, Jae Sung
Processing unknown words, such as proper nouns and newly coined words, is important for a morphological analyzer that must handle documents from various domains. This study proposes a segmentation and tagging method for unknown Korean words within a 3-step probabilistic morphological analysis. To guess unknown words, the method uses the rich set of suffixes that attach to open-class words, such as general nouns and proper nouns. We propose a method that learns suffix patterns from a morpheme-tagged corpus and calculates their probabilities for segmenting and tagging unknown open-class words in the probabilistic morphological analysis model. Experimental results showed that unknown word processing performance improved greatly on documents containing many unregistered words.
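The idea of learning suffix patterns from a morpheme-tagged corpus and using their probabilities to segment and tag an unknown word can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy corpus, the romanized word forms, the tag names (NNG, NNP, JKB, JKO, JX), the function names learn_suffix_counts and guess, and the use of a plain relative frequency P(suffix, tag) as the score. The actual 3-step model is not described in the abstract and is likely more elaborate.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each word is a list of (morpheme, tag) pairs.
# Romanized forms and tag names are illustrative assumptions.
CORPUS = [
    [("seoul", "NNP"), ("eseo", "JKB")],
    [("busan", "NNP"), ("eseo", "JKB")],
    [("hakgyo", "NNG"), ("eseo", "JKB")],
    [("hakgyo", "NNG"), ("reul", "JKO")],
    [("chaek", "NNG"), ("eul", "JKO")],
    [("seoul", "NNP"), ("eun", "JX")],
]
OPEN_TAGS = {"NNG", "NNP"}  # open-class tags considered in this sketch


def learn_suffix_counts(corpus):
    """Count how often each suffix string follows an open-class stem tag."""
    counts = defaultdict(Counter)  # suffix -> Counter of open-class tags
    for word in corpus:
        (_, tag), rest = word[0], word[1:]
        if tag in OPEN_TAGS and rest:
            # Simplification: the suffix is the concatenated surface forms.
            suffix = "".join(morph for morph, _ in rest)
            counts[suffix][tag] += 1
    return counts


def guess(word, counts):
    """Return (stem, tag, suffix, probability) candidates, best first."""
    total = sum(sum(c.values()) for c in counts.values())
    candidates = []
    for i in range(1, len(word)):  # try every stem/suffix split point
        stem, suffix = word[:i], word[i:]
        if suffix in counts:
            for tag, n in counts[suffix].items():
                # Relative-frequency estimate of P(suffix, tag).
                candidates.append((stem, tag, suffix, n / total))
    candidates.sort(key=lambda c: c[3], reverse=True)
    return candidates


if __name__ == "__main__":
    suffix_counts = learn_suffix_counts(CORPUS)
    for candidate in guess("incheoneseo", suffix_counts):
        print(candidate)
```

In this sketch an unseen stem is segmented and tagged purely from the suffix that follows it, which is the intuition stated in the abstract. A word with no known suffix yields no candidates and would need a separate fallback.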
Keywords: unknown word processing; word segmentation; open word class processing; probabilistic morphological analysis
