A Study on Improvement of Korean OCR Accuracy Using Deep Learning

Kang, Ga-Hyeon; Ko, Ji-Hyun; Kwon, Yong-Jun; Kwon, Na-Young; Koh, Seok-Ju

Proceedings of the Korean Institute of Information and Communication Sciences Conference (한국정보통신학회:학술대회논문집)

2018.05a / Pages 693-695 / 2018

The Korea Institute of Information and Communication Engineering (한국정보통신학회)

딥러닝을 이용한 한글 OCR 정확도 향상에 대한 연구

Kang, Ga-Hyeon (강가현) (School of Computer Science, Kyungpook National University) ;
Ko, Ji-Hyun (고지현) (School of Computer Science, Kyungpook National University) ;
Kwon, Yong-Jun (권용준) (School of Computer Science, Kyungpook National University) ;
Kwon, Na-Young (권나영) (School of Computer Science, Kyungpook National University) ;
Koh, Seok-Ju (고석주) (School of Computer Science, Kyungpook National University)

Published : 2018.05.31

Abstract

This paper proposes a method for improving the accuracy of Hangul (Korean) OCR using deep learning. OCR is a program that optically detects and recognizes printed or handwritten characters and encodes them digitally. Tesseract OCR, currently the most widely used engine, recognizes English text with high accuracy. For Hangul, however, accuracy is lower because little training data is available relative to the complexity of the script's structure. This study therefore proposes extracting character regions from target images through image processing and using those regions as training data for deep learning, in order to improve Hangul OCR accuracy. The approach is expected to help extend OCR, which has so far been developed mainly for English letters, digits, and a few other languages, to a wider range of languages.
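
The abstract does not say how character regions are extracted, so the following is only an illustrative sketch of one common image-processing approach: column-projection segmentation of a binarized text line. The function name `extract_char_regions`, the `min_gap` parameter, and the list-of-lists image format are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple

Image = List[List[int]]          # 1 = ink, 0 = background (already binarized)
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), all inclusive


def extract_char_regions(img: Image, min_gap: int = 1) -> List[Box]:
    """Split one binarized text line into character boxes by column gaps.

    Hypothetical helper: blank gaps narrower than `min_gap` columns do not
    split a region, which helps keep the separate strokes of one Hangul
    syllable (e.g. a consonant and a vertical vowel) in the same box.
    """
    if not img or not img[0]:
        return []
    width = len(img[0])
    # Mark columns that contain at least one ink pixel.
    inked = [any(row[x] for row in img) for x in range(width)]

    # Group consecutive inked columns into [start, end] runs.
    runs: List[List[int]] = []
    x = 0
    while x < width:
        if not inked[x]:
            x += 1
            continue
        start = x
        while x < width and inked[x]:
            x += 1
        end = x - 1
        gap = start - runs[-1][1] - 1 if runs else None
        if gap is not None and gap < min_gap:
            runs[-1][1] = end        # gap too narrow: merge with previous run
        else:
            runs.append([start, end])

    # Tighten each run vertically to the rows that actually contain ink.
    boxes: List[Box] = []
    for left, right in runs:
        rows = [y for y, row in enumerate(img) if any(row[left:right + 1])]
        boxes.append((left, rows[0], right, rows[-1]))
    return boxes


if __name__ == "__main__":
    line = [
        [0, 1, 0, 1, 0, 0, 0, 1, 1],
        [0, 1, 0, 1, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0, 0, 1, 1],
    ]
    print(extract_char_regions(line, min_gap=2))
```

Each returned box could then be cropped and labeled to build training samples for a recognizer. A real pipeline would also need binarization, line segmentation, and a gap threshold tuned to the font, none of which this sketch covers.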

Keywords

OCR
