Text Region Detection Method Using Table Border Pseudo Label

  • Han, Jeong Hoon (Hanyang University - Ansan Campus, Department of Computer Science and Engineering) ;
  • Park, Se Jin (Hanyang University - Ansan Campus, Department of Computer Science and Engineering) ;
  • Moon, Young Shik (Hanyang University - Ansan Campus, Department of Computer Science and Engineering)
  • Received : 2020.08.18
  • Accepted : 2020.09.02
  • Published : 2020.10.31

Abstract

Text region detection is a technique for locating text areas in handwritten or printed documents. The detected regions are digitized in a subsequent recognition step, and the results are used in a variety of applications. However, detection at the level of individual characters is poorly suited to the recognition stage in industrial settings, where large volumes of documents must be processed. In addition, the borders of tables in a document cause false detections during text region detection, which degrades the recognition step. To address these issues, we propose a text region detection method that uses table border information. To exploit this information, the proposed method adds two decoders and controls the flow of each decoder to induce indirect learning. Experiments show that weakly supervised learning with table border pseudo labels improves performance.
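
The abstract does not say how the table border pseudo labels are generated. As a rough illustration only, the sketch below assumes one simple heuristic: in a binarized page, any horizontal or vertical run of ink pixels at least `min_run` long is marked as border, while shorter strokes (such as characters) are left unmarked. The function name `border_pseudo_label`, the `min_run` parameter, and the run-length rule are all hypothetical and are not taken from the paper.

```python
from typing import List

Grid = List[List[int]]  # 1 = ink (dark) pixel, 0 = background


def border_pseudo_label(binary: Grid, min_run: int) -> Grid:
    """Mark pixels on horizontal or vertical ink runs of length >= min_run.

    Assumed heuristic for illustration; not the paper's labeling procedure.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    label = [[0] * width for _ in range(height)]

    # Horizontal pass: scan each row and mark sufficiently long ink runs.
    for y in range(height):
        x = 0
        while x < width:
            if binary[y][x] == 1:
                start = x
                while x < width and binary[y][x] == 1:
                    x += 1
                if x - start >= min_run:
                    for i in range(start, x):
                        label[y][i] = 1
            else:
                x += 1

    # Vertical pass: same rule applied to each column.
    for x in range(width):
        y = 0
        while y < height:
            if binary[y][x] == 1:
                start = y
                while y < height and binary[y][x] == 1:
                    y += 1
                if y - start >= min_run:
                    for j in range(start, y):
                        label[j][x] = 1
            else:
                y += 1

    return label


if __name__ == "__main__":
    # A tiny table cell (outer box) containing a two-pixel "text" stroke.
    page = [
        [1, 1, 1, 1, 1, 1, 1],
        [1, 0, 0, 0, 0, 0, 1],
        [1, 0, 1, 1, 0, 0, 1],
        [1, 0, 0, 0, 0, 0, 1],
        [1, 1, 1, 1, 1, 1, 1],
    ]
    for row in border_pseudo_label(page, min_run=4):
        print(row)
```

In this toy page the outer box is marked as border and the short stroke inside the cell is not. A mask of this kind could serve as a noisy target for an auxiliary border decoder; how the proposed method actually derives and consumes its pseudo labels is described in the full paper, not in this abstract.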
