Extraction of Text Regions from Spam-Mail Images Using Color Layers

Kim Ji-Soo; Kim Soo-Hyung; Han Seung-Wan; Nam Taek-Yong; Son Hwa-Jeong; Oh Sung-Ryul

doi:10.3745/KIPSTB.2006.13B.4.409

The KIPS Transactions: Part B (정보처리학회논문지B)

Volume 13B, Issue 4 (Serial No. 107) / Pages 409-416 / 2006 / 1598-284X (pISSN)

Korea Information Processing Society (한국정보처리학회)

색상레이어를 이용한 스팸메일 영상에서의 텍스트 영역 추출

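Kim Ji-Soo (Department of Computer Science, Chonnam National University);
Kim Soo-Hyung (School of Electronics and Computer Engineering, Chonnam National University);
Han Seung-Wan (Electronics and Telecommunications Research Institute);
Nam Taek-Yong (Active Security Technology Research Team, Information Security Research Division, Electronics and Telecommunications Research Institute);
Son Hwa-Jeong (Department of Computer Science, Chonnam National University);
Oh Sung-Ryul (Department of Computer Science, Chonnam National University)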

Published : 2006.08.01

https://doi.org/10.3745/KIPSTB.2006.13B.4.409

Abstract

In this paper, we propose an algorithm for extracting text regions from spam-mail images using color layers. CLTE (color layer-based text extraction) divides the input image into eight planes, one per color layer. It extracts connected components from each of the eight images and then classifies them into text and non-text regions based on component size. We also propose an algorithm for recovering damaged text strokes in the extracted text image. In the binary image, Hangul characters show two types of stroke damage: (1) medial strokes such as 'ㅣ' or 'ㅡ' are deleted, and (2) initial and/or final strokes such as 'ㅇ' or 'ㅁ' are filled with black pixels. In an experiment with 200 spam-mail images, the proposed approach outperformed conventional text extraction methods by more than 10%.

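This paper proposes a color-layer-based algorithm for extracting text regions from spam-mail images. CLTE (color layer-based text extraction) uses color layers to divide the image into eight images. After extracting connected components from each of the eight images, it separates text regions from non-text regions by connected-component size and extracts the text regions. We also propose an algorithm that recovers damaged stroke information from the extracted text regions. Hangul characters in a binary image exhibit two forms of damaged strokes: first, medial strokes such as 'ㅣ' or 'ㅡ' are erased; second, initial or final strokes such as 'ㅁ' or 'ㅇ' are filled with black pixels. The proposed algorithm recovers both kinds of damaged strokes. In experiments on 200 spam-mail images, the proposed algorithm outperformed existing text extraction algorithms by more than 10%.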
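
The pipeline summarized in the abstract (split into eight color layers, find connected components per layer, filter by size) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the eight layers are formed, so the sketch assumes each RGB channel is thresholded at 128 (2 x 2 x 2 = 8 colors), and the function names and the `min_side` / `max_side` size limits are hypothetical. The stroke-recovery step is not covered.

```python
from collections import deque


def split_color_layers(image):
    """Split an RGB image (rows of (r, g, b) tuples) into 8 binary layers.

    Assumption: each channel is thresholded at 128, so every pixel falls
    into exactly one of 2 * 2 * 2 = 8 layers.
    """
    height, width = len(image), len(image[0])
    layers = [[[0] * width for _ in range(height)] for _ in range(8)]
    for y in range(height):
        for x in range(width):
            r, g, b = image[y][x]
            index = ((r >= 128) << 2) | ((g >= 128) << 1) | (b >= 128)
            layers[index][y][x] = 1
    return layers


def connected_components(layer):
    """Return bounding boxes (x0, y0, x1, y1) of 8-connected foreground blobs."""
    height, width = len(layer), len(layer[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for sy in range(height):
        for sx in range(width):
            if not layer[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[sy][sx] = True
            queue = deque([(sx, sy)])
            x0 = x1 = sx
            y0 = y1 = sy
            while queue:
                x, y = queue.popleft()
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and layer[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def is_text_sized(box, min_side, max_side):
    """Hypothetical size rule: the longer box side must lie within the limits."""
    x0, y0, x1, y1 = box
    longer_side = max(x1 - x0 + 1, y1 - y0 + 1)
    return min_side <= longer_side <= max_side


def extract_text_boxes(image, min_side=2, max_side=40):
    """Return (layer_index, bounding_box) pairs for text-sized components."""
    result = []
    for index, layer in enumerate(split_color_layers(image)):
        for box in connected_components(layer):
            if is_text_sized(box, min_side, max_side):
                result.append((index, box))
    return result
```

Processing each color layer separately means a large uniform background forms one oversized component in its own layer and is rejected by the size rule, while small same-colored blobs in other layers survive as text candidates. The actual size criteria used in the paper are not given in the abstract.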
