Local Similarity based Document Layout Analysis using Improved ARLSA
  • Journal title : International Journal of Contents
  • Volume 11, Issue 2, 2015, pp. 15-19
  • Publisher : The Korea Contents Association
  • DOI : 10.5392/IJoC.2015.11.2.015
 Title & Authors
Kim, Gwangbok; Kim, SooHyung; Na, InSeop
 Abstract
In this paper, we propose an efficient document layout analysis algorithm that includes table detection. Typical document layout analysis methods rely on the height of, and the gaps between, words or columns. To handle documents of varied styles and sizes, our algorithm instead uses the mean value of the distance transform, which represents stroke thickness, and compares each component with the components in its local area. We combine this with a table detection algorithm that uses the same feature as the text classifier. Table candidates, separators, and large components are isolated from the image using Connected Component Analysis (CCA) and the distance transform. The key idea of text classification is that text consists of parallel components with similar thickness and height. To estimate this local similarity, we detect text regions using an adaptive search window size. We also propose an improved adaptive run-length smoothing algorithm (ARLSA) to form proper boundaries between text and non-text zones. Experiments on the ICDAR2009 page segmentation competition test set and our own dataset demonstrate the superiority of our method through F-measure comparison with other algorithms.
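The thickness feature and local-similarity test described in the abstract can be illustrated with a small pure-Python sketch. This is not the authors' implementation. It assumes a binary image given as nested lists (1 = ink), uses a city-block distance transform, labels 8-connected components, and marks a component as text when some neighbour inside a window proportional to its own height has a similar mean thickness and height. The window rule and the parameters `window_scale` and `tol` are hypothetical choices made for illustration.

```python
from collections import deque

Image = list[list[int]]  # binary image: 1 = ink, 0 = background


def distance_transform(img: Image) -> list[list[int]]:
    """Two-pass city-block (L1) distance transform.

    Each ink pixel gets its distance to the nearest background pixel;
    pixels outside the image count as background, so border ink gets 1.
    """
    h, w = len(img), len(img[0])
    big = h + w  # larger than any possible distance
    d = [[big if img[y][x] else 0 for x in range(w)] for y in range(h)]
    for y in range(h):  # forward pass: top and left neighbours
        for x in range(w):
            if d[y][x]:
                up = d[y - 1][x] if y > 0 else 0
                left = d[y][x - 1] if x > 0 else 0
                d[y][x] = min(d[y][x], up + 1, left + 1)
    for y in reversed(range(h)):  # backward pass: bottom and right neighbours
        for x in reversed(range(w)):
            if d[y][x]:
                down = d[y + 1][x] if y < h - 1 else 0
                right = d[y][x + 1] if x < w - 1 else 0
                d[y][x] = min(d[y][x], down + 1, right + 1)
    return d


def connected_components(img: Image) -> list[list[tuple[int, int]]]:
    """Group ink pixels into 8-connected components via breadth-first search."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for sy in range(h):
        for sx in range(w):
            if not img[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue, pixels = deque([(sy, sx)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny in range(max(0, y - 1), min(h, y + 2)):
                    for nx in range(max(0, x - 1), min(w, x + 2)):
                        if img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps


def component_features(img: Image) -> list[dict]:
    """Mean distance value (stroke-thickness proxy) and box geometry per component."""
    d = distance_transform(img)
    feats = []
    for pixels in connected_components(img):
        ys = [y for y, _ in pixels]
        xs = [x for _, x in pixels]
        feats.append({
            "thickness": sum(d[y][x] for y, x in pixels) / len(pixels),
            "height": max(ys) - min(ys) + 1,
            "cy": (min(ys) + max(ys)) / 2,
            "cx": (min(xs) + max(xs)) / 2,
        })
    return feats


def _similar(a: float, b: float, tol: float) -> bool:
    """True when a and b differ by at most tol times the larger value."""
    return abs(a - b) <= tol * max(a, b)


def classify_text(feats: list[dict], window_scale: float = 3.0, tol: float = 0.5) -> list[bool]:
    """Label a component as text if a neighbour inside an adaptive window
    (hypothetical rule: half-size = window_scale * own height) has similar
    thickness and height. window_scale and tol are illustrative values."""
    labels = []
    for i, f in enumerate(feats):
        reach = window_scale * f["height"]  # window grows with component size
        labels.append(any(
            j != i
            and abs(g["cy"] - f["cy"]) <= reach
            and abs(g["cx"] - f["cx"]) <= reach
            and _similar(f["thickness"], g["thickness"], tol)
            and _similar(f["height"], g["height"], tol)
            for j, g in enumerate(feats)
        ))
    return labels
```

For example, on a 20×40 page with three 3×2 glyph-like blocks on one line and a 10×15 filled block further down, `classify_text(component_features(img))` marks the three small blocks as text and the large block as non-text, because its mean thickness (about 2.47) and height differ too much from those of its neighbours.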
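For the boundary-forming step, run-length smoothing fills short background runs between ink pixels so that neighbouring characters merge into blocks; the classic RLSA applies this row-wise and column-wise with fixed thresholds and combines the results with a logical AND. The sketch below shows only a horizontal pass with a per-gap threshold. The adaptation rule is an assumption for illustration, not the paper's improved ARLSA: a gap is filled only when the components on either side have comparable heights (ratio at most `max_ratio`) and the gap is no wider than `scale` times the smaller height. Both parameter names are hypothetical.

```python
from collections import deque

Image = list[list[int]]  # binary image: 1 = ink, 0 = background


def label_components(img: Image) -> tuple[list[list[int]], list[int]]:
    """8-connected labelling: returns a label map (-1 = background) and the
    bounding-box height of each component, indexed by label."""
    h, w = len(img), len(img[0])
    lab = [[-1] * w for _ in range(h)]
    heights: list[int] = []
    for sy in range(h):
        for sx in range(w):
            if not img[sy][sx] or lab[sy][sx] >= 0:
                continue
            cid = len(heights)
            lab[sy][sx] = cid
            queue, rows = deque([(sy, sx)]), []
            while queue:
                y, x = queue.popleft()
                rows.append(y)
                for ny in range(max(0, y - 1), min(h, y + 2)):
                    for nx in range(max(0, x - 1), min(w, x + 2)):
                        if img[ny][nx] and lab[ny][nx] < 0:
                            lab[ny][nx] = cid
                            queue.append((ny, nx))
            heights.append(max(rows) - min(rows) + 1)
    return lab, heights


def adaptive_rlsa_horizontal(img: Image, scale: float = 1.0, max_ratio: float = 2.0) -> Image:
    """Horizontal run-length smoothing with a per-gap threshold.

    Hypothetical adaptive rule (not the paper's): a background run between
    two ink pixels is filled only if the components on either side have
    comparable heights and the run is at most scale * the smaller height.
    The input image is left unchanged; a smoothed copy is returned.
    """
    lab, heights = label_components(img)
    out = [row[:] for row in img]
    for y, row in enumerate(img):
        last_ink = None  # column of the previous ink pixel in this row
        for x, value in enumerate(row):
            if not value:
                continue
            if last_ink is not None and x - last_ink > 1:
                ha, hb = heights[lab[y][last_ink]], heights[lab[y][x]]
                small, large = min(ha, hb), max(ha, hb)
                gap = x - last_ink - 1
                if large <= max_ratio * small and gap <= scale * small:
                    for gx in range(last_ink + 1, x):
                        out[y][gx] = 1  # bridge the short gap
            last_ink = x
    return out
```

With the defaults, two 3-pixel-high glyphs two columns apart are bridged, the same glyphs five columns apart are not, and a glyph two columns away from a 10-pixel-high block stays separate because the height ratio exceeds 2. Tying the threshold to component size, rather than using one global constant, is what lets a single pass cope with mixed font sizes.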
 Keywords
Document Layout Analysis; Page Segmentation; Table Detection; Adaptive RLSA
 Language
English