Development of a Low-cost Industrial OCR System with an End-to-end Deep Learning Technology

  • Received : 2020.03.01
  • Accepted : 2020.03.30
  • Published : 2020.04.30

Abstract

Optical character recognition (OCR) has been studied for decades because of its usefulness across a wide range of applications. OCR performance has recently improved significantly thanks to advances in deep learning, and demand is growing for commercial-grade yet affordable OCR systems. We have developed a low-cost, high-performance industrial OCR system built on the cheapest embedded developer kit that supports GPU acceleration. To achieve the accuracy required for industrial use on limited computing resources, we chose as our baseline a state-of-the-art text recognition algorithm based on an end-to-end deep learning network. We then improved the model by replacing its feature extraction network with the one best suited to our conditions. Among the various candidate networks, EfficientNet-B3 performed best, combining excellent recognition accuracy with relatively low memory consumption. In addition, we optimized the model, originally written with TensorFlow's Python API, using both TensorFlow-TensorRT integration and TensorFlow's C++ API.
