Text Region Extraction and OCR on Camera Based Images

  • 신현경 (Department of Mathematics and Information, Kyungwon University)
  • Published: 2010.02.28

Abstract

Traditional OCR engines are designed for scanned document images acquired in a calibrated environment. In images from uncalibrated devices such as smartphones, three-dimensional perspective distortion and smooth distortion from curved surfaces become critical problems. To meet the growing demand for recognizing text embedded in photos taken with uncalibrated hand-held devices, we address the problem in three parts: a rotation-invariant method of text region extraction, a scale-invariant method of text line segmentation that does not depend on font size, and three-dimensional perspective mapping. By integrating these methods, we developed an OCR system for camera-captured images.
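The abstract names three-dimensional perspective mapping as one of its three components but does not give the formulation. As a minimal sketch only, assuming the text lies on a plane and that four corner correspondences of the text region are already known, a planar projective map (homography) can be estimated and applied as follows. The function names `solve_linear`, `homography`, and `apply_h`, and the sample coordinates, are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Estimate the 3x3 projective map H (H[2][2] = 1) sending src[i] to dst[i]."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("exactly four point pairs are required")
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_h(h: List[List[float]], p: Point) -> Point:
    """Map a point through H, dividing by the projective coordinate w."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


if __name__ == "__main__":
    # Hypothetical corners of a text region seen at an angle (pixels),
    # mapped to an upright 400 x 300 rectangle.
    corners = [(120.0, 80.0), (510.0, 130.0), (480.0, 420.0), (90.0, 360.0)]
    upright = [(0.0, 0.0), (400.0, 0.0), (400.0, 300.0), (0.0, 300.0)]
    H = homography(corners, upright)
    print(apply_h(H, (300.0, 250.0)))
```

A full rectification would resample every pixel through the inverse map. This sketch does not handle curved-surface distortion, which the abstract treats as a separate problem.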
