Korean Sentence Boundary Detection Using Memory-based Machine Learning

Korean title: 메모리 기반의 기계 학습을 이용한 한국어 문장 경계 인식 ("Korean Sentence Boundary Recognition Using Memory-Based Machine Learning")

  • 한군희 (Division of Information and Communication, Cheonan University)
  • 임희석 (Department of Software, Hanshin University)
  • Published : 2004.12.01


This paper proposes a Korean sentence boundary detection system that employs the k-nearest neighbor algorithm. We propose three scoring functions for classifying sentence boundaries and compare their performance. To make the system general and robust, we use domain-independent linguistic features. The proposed system was trained and evaluated on two corpora, the ETRI corpus and the KAIST corpus. In experiments, it achieved about 98.82% precision and 99.09% recall, even though it was trained on a relatively small corpus.
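The abstract does not list the features or define the three scoring functions, so the following is only a minimal sketch of the general memory-based (k-nearest neighbor) approach to sentence boundary detection, under stated assumptions. Each candidate punctuation mark is described by a hypothetical three-value feature vector (previous character, the mark, type of the next character); training simply stores labelled examples; classification compares a candidate with every stored example using an overlap (mismatch-count) distance. The two scoring rules shown, majority vote and inverse-distance weighting, are common k-NN choices and are not the paper's three functions. All names (`MemoryBasedSBD`, `extract_features`, `split_sentences`) and the toy training data are hypothetical.

```python
from collections import Counter


def char_type(ch):
    """Coarse class of the character after a candidate mark (hypothetical feature)."""
    if ch == "":
        return "eos"
    if ch.isspace():
        return "space"
    if ch.isdigit():
        return "digit"
    return "other"


def extract_features(text, i):
    """Hypothetical feature vector for the punctuation mark at text[i]:
    (previous character, the mark itself, type of the next character)."""
    prev = text[i - 1] if i > 0 else ""
    nxt = text[i + 1] if i + 1 < len(text) else ""
    return (prev, text[i], char_type(nxt))


def overlap_distance(a, b):
    """Number of feature positions whose symbolic values differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


class MemoryBasedSBD:
    """k-NN classifier: 'training' just stores labelled examples in memory."""

    def __init__(self, k=3):
        self.k = k
        self.memory = []  # list of (features, is_boundary) pairs

    def train(self, examples):
        self.memory.extend(examples)

    def _neighbors(self, feats):
        # sorted() is stable, so equally distant examples keep training order
        ranked = sorted(self.memory, key=lambda ex: overlap_distance(ex[0], feats))
        return ranked[: self.k]

    def classify_majority(self, feats):
        """Illustrative scoring rule: plain majority vote of the k neighbours."""
        votes = Counter(label for _, label in self._neighbors(feats))
        return votes.most_common(1)[0][0]

    def classify_weighted(self, feats):
        """Illustrative scoring rule: each vote weighted by 1 / (1 + distance)."""
        scores = Counter()
        for ex_feats, label in self._neighbors(feats):
            scores[label] += 1.0 / (1.0 + overlap_distance(ex_feats, feats))
        return max(scores, key=scores.get)


def split_sentences(text, classify, marks=".?!"):
    """Split text at every candidate mark that `classify` labels a boundary."""
    sentences, start = [], 0
    for i, ch in enumerate(text):
        if ch in marks and classify(extract_features(text, i)):
            sentences.append(text[start : i + 1].strip())
            start = i + 1
    if text[start:].strip():
        sentences.append(text[start:].strip())
    return sentences


# Toy, hand-made training data (hypothetical; not from the ETRI or KAIST corpus)
EXAMPLES = [
    (("다", ".", "space"), True),   # declarative ending: ...다.
    (("요", ".", "space"), True),   # polite ending: ...요.
    (("까", "?", "space"), True),   # interrogative ending: ...까?
    (("1", ".", "digit"), False),   # decimal point: 1.5
    (("2", ".", "digit"), False),
    (("r", ".", "space"), False),   # abbreviation such as "Dr."
]

model = MemoryBasedSBD(k=3)
model.train(EXAMPLES)
text = "오늘은 3.5도였다. 내일은 춥겠죠. 그래요?"
print(split_sentences(text, model.classify_majority))
# ['오늘은 3.5도였다.', '내일은 춥겠죠.', '그래요?']
```

The decimal point in "3.5" is rejected because its nearest stored examples are the two decimal-point cases, while "죠." is accepted even though "죠" never appears in the training data: its closest neighbours are the other sentence-final endings. This generalization from similar stored cases, rather than from hand-written rules, is the property that makes memory-based learning attractive when the training corpus is small.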