An Automatic Coding System of Korean Standard Industry/Occupation Code Using Example-based Learning

예제기반의 학습을 이용한 한국어 표준 산업/직업 자동 코딩 시스템

  • 임희석 (Department of Software, Hanshin University)
  • Published : 2005.08.01

Abstract

Standard industry and occupation codes are usually assigned manually in the Korean census. Manual coding is a labor-intensive and expensive task. Furthermore, the resulting codes are inconsistent, because they depend on the skill of the human coders and on their working environments. This paper proposes an automatic code classification system that converts natural-language responses on survey questionnaires into the corresponding numeric codes, using a manually constructed rule base together with example-based machine learning. The system was trained on 400,000 records to which standard codes had already been assigned. It was evaluated with 10-fold cross-validation on three code sets: the population occupation set, the industry set, and the industry survey set. The proposed system achieved accuracies of 76.63%, 82.24%, and 99.68% on these three sets, respectively.
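The combination of a rule base with example-based learning can be illustrated with a minimal sketch. It is not the system described in this paper: the abstract does not specify the features, the similarity measure, or the rule format, so the character-bigram features, Jaccard similarity, similarity-weighted k-nearest-neighbor vote, and all codes and example responses below (`A01`, `T1`, `D1`, `S1`) are assumptions chosen only for illustration.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Return the set of character n-grams of a response, ignoring case and spaces."""
    s = "".join(text.lower().split())
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class ExampleBasedCoder:
    """Hypothetical coder: try hand-written rules first, then fall back to k-NN."""

    def __init__(self, rules, examples, k=3):
        self.rules = rules  # list of (keyword, code) pairs, checked in order
        self.examples = [(char_ngrams(text), code) for text, code in examples]
        self.k = k

    def classify(self, response):
        # Step 1: rule base. A keyword match decides the code immediately.
        lowered = response.lower()
        for keyword, code in self.rules:
            if keyword in lowered:
                return code
        # Step 2: example-based fallback. Vote among the k most similar
        # coded examples, weighting each vote by its similarity.
        feats = char_ngrams(response)
        scored = sorted(
            ((jaccard(feats, ex_feats), code) for ex_feats, code in self.examples),
            key=lambda pair: pair[0],
            reverse=True,
        )[: self.k]
        votes = Counter()
        for sim, code in scored:
            votes[code] += sim
        if not votes:
            return None
        best_code, weight = votes.most_common(1)[0]
        return best_code if weight > 0 else None  # None: no usable evidence


# Made-up rules and coded examples; real data would be census responses.
rules = [("farmer", "A01")]
examples = [
    ("elementary school teacher", "T1"),
    ("high school teacher", "T1"),
    ("taxi driver", "D1"),
    ("bus driver", "D1"),
    ("software developer", "S1"),
]
coder = ExampleBasedCoder(rules, examples, k=3)
print(coder.classify("rice farmer"))   # decided by the rule base
print(coder.classify("truck driver"))  # decided by nearest examples
```

In this sketch, a response that matches no rule and shares no bigram with any stored example is left uncoded (`None`), which in practice would mean routing it to a human coder.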

Keywords

Manual Coding; Automatic Coding; Machine Learning; Industry (Occupation) Code Classification