Text Line Segmentation of Handwritten Documents by Area Mapping

  • Received: 2015.07.01
  • Reviewed: 2015.09.25
  • Published: 2015.09.30

Abstract

Text line segmentation is a preprocessing step in optical character recognition (OCR) that can significantly affect the accuracy of document analysis applications. This paper proposes a novel method for segmenting text lines in handwritten documents. First, the average width of the connected components is used to build a 1-D Gaussian kernel, which is then used to smooth the input binary image. Adaptive binarization of the smoothed image then yields the text line areas. Segmentation proceeds in two stages. In the first stage, each large connected component is assigned a unique text line label by mapping it onto the text line areas. In the second stage, the segmentation is refined by assigning the small connected components according to their Euclidean distance from the text lines. Grouping the uniquely labelled text candidates in this way gives promising segmentation results. The proposed approach performs well on camera-captured handwritten documents in Korean and English.
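The abstract describes the pipeline only at a high level, so the following is a minimal pure-Python sketch of its four steps, not the authors' implementation. Several details are assumptions not stated in the source: the Gaussian is applied along rows with sigma equal to the average component width; a threshold at a fraction of the mean positive smoothed value (`thresh_ratio`, hypothetical) stands in for the unspecified adaptive binarization; components smaller than `small_ratio` (hypothetical) times the mean pixel count count as "small"; and a small component joins the line whose stage-1 pixels lie closest to its centroid. All function and parameter names are illustrative.

```python
import math
from collections import deque


def connected_components(img):
    """Return the 8-connected foreground components of a 0/1 image as pixel lists."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps


def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, math.ceil(3 * sigma))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def smooth_rows(img, kernel):
    """Convolve every row with `kernel` (zero padding), smearing ink along each row."""
    h, w, r = len(img), len(img[0]), len(kernel) // 2
    return [[sum(kv * img[y][x + i - r] for i, kv in enumerate(kernel) if 0 <= x + i - r < w)
             for x in range(w)] for y in range(h)]


def segment_lines(img, small_ratio=0.5, thresh_ratio=0.5):
    """Return a label image: -1 for background, otherwise a text line id per ink pixel."""
    h, w = len(img), len(img[0])
    out = [[-1] * w for _ in range(h)]
    comps = connected_components(img)
    if not comps:
        return out
    # Step 1 (assumed mapping): sigma = average connected-component width.
    widths = [max(x for _, x in c) - min(x for _, x in c) + 1 for c in comps]
    smoothed = smooth_rows(img, gaussian_kernel_1d(sum(widths) / len(widths)))
    # Step 2 (stand-in for adaptive binarization): threshold at a fraction of the
    # mean positive response; the resulting blobs are the text line areas.
    positive = [v for row in smoothed for v in row if v > 0]
    t = thresh_ratio * sum(positive) / len(positive)
    areas = connected_components([[1 if v > t else 0 for v in row] for row in smoothed])
    area_of = {p: a for a, pixels in enumerate(areas) for p in pixels}
    # Stage 1: each large component takes the line area it overlaps most.
    mean_size = sum(len(c) for c in comps) / len(comps)
    labels = [None] * len(comps)
    for i, c in enumerate(comps):
        if len(c) >= small_ratio * mean_size:
            votes = {}
            for p in c:
                if p in area_of:
                    votes[area_of[p]] = votes.get(area_of[p], 0) + 1
            if votes:
                labels[i] = max(votes, key=votes.get)
    # Stage 2: each remaining component joins the line whose stage-1 pixels are
    # closest (Euclidean distance) to the component's centroid.
    line_pixels = {}
    for c, lab in zip(comps, labels):
        if lab is not None:
            line_pixels.setdefault(lab, []).extend(c)
    for i, c in enumerate(comps):
        if labels[i] is None and line_pixels:
            cy = sum(y for y, _ in c) / len(c)
            cx = sum(x for _, x in c) / len(c)
            labels[i] = min(line_pixels, key=lambda lab: min(
                math.hypot(cy - y, cx - x) for y, x in line_pixels[lab]))
    for c, lab in zip(comps, labels):
        if lab is not None:
            for y, x in c:
                out[y][x] = lab
    return out
```

For example, on a 12×20 image holding two rows of 3×3 blocks and a single-pixel "dot" two rows above the second row, the dot receives the second line's label in stage 2. Lines whose strokes touch vertically would merge into a single line area under this sketch; the abstract does not say how the paper handles that case, so no attempt is made here.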
