Development of a Low-cost Industrial OCR System with an End-to-end Deep Learning Technology

Subedi, Bharat; Yunusov, Jahongir; Gaybulayev, Abdulaziz; Kim, Tae-Hyong

doi:10.14372/IEMEK.2020.15.2.51

IEMEK Journal of Embedded Systems and Applications (대한임베디드공학회논문지)

Volume 15, Issue 2, pp. 51-60, 2020 (pISSN 1975-5066)

Institute of Embedded Engineering of Korea (대한임베디드공학회)

Subedi, Bharat (Kumoh National Institute of Technology)
Yunusov, Jahongir (Kumoh National Institute of Technology)
Gaybulayev, Abdulaziz (Kumoh National Institute of Technology)
Kim, Tae-Hyong (Kumoh National Institute of Technology)

Received : 2020.03.01
Accepted : 2020.03.30
Published : 2020.04.30

Abstract

Optical character recognition (OCR) has been studied for decades because it is useful in a wide range of applications. In recent years, advances in deep learning have improved OCR performance significantly, and demand for commercial-grade yet affordable OCR systems has grown accordingly. We have developed a low-cost, high-performance industrial OCR system on the cheapest embedded developer kit that supports GPU acceleration. To reach the accuracy required for industrial use on such limited computing resources, we chose as our baseline a state-of-the-art text recognition algorithm built on an end-to-end deep learning network. We then improved the model by replacing its feature extraction network (backbone) with the one best suited to our constraints. Among the candidate networks, EfficientNet-B3 performed best, combining excellent recognition accuracy with relatively low memory consumption. In addition, we optimized the model, originally written with TensorFlow's Python API, in two ways: with the TensorFlow-TensorRT integration and by porting it to TensorFlow's C++ API.
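The abstract does not describe how the recognizer turns network output into text. Assuming, as in common CRNN-style recognizers, that the network emits per-time-step character scores trained with CTC, the final string is produced by a decoding step such as the minimal greedy (best-path) CTC decoder sketched below. The blank index, the alphabet, and the function name `ctc_greedy_decode` are illustrative assumptions, not the authors' code.

```python
from typing import List, Sequence

BLANK = 0  # assumption: class index 0 is the CTC blank symbol


def ctc_greedy_decode(probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path (greedy) CTC decoding of one text line (illustrative sketch).

    probs    -- T x C per-time-step class scores, C == len(alphabet) + 1
    alphabet -- characters for class indices 1..C-1 (index 0 is the blank)
    """
    num_classes = len(alphabet) + 1
    chars: List[str] = []
    prev = None
    for t, row in enumerate(probs):
        if len(row) != num_classes:
            raise ValueError(
                f"time step {t}: expected {num_classes} scores, got {len(row)}"
            )
        # 1. Take the most likely class at this time step.
        k = max(range(num_classes), key=lambda i: row[i])
        # 2. Collapse consecutive repeats of a class and 3. drop blanks.
        if k != prev and k != BLANK:
            chars.append(alphabet[k - 1])
        prev = k
    return "".join(chars)


if __name__ == "__main__":
    # Hypothetical 6-step output over a digits-only alphabet.
    def step(k: int, c: int = 11) -> List[float]:
        row = [0.01] * c
        row[k] = 0.9
        return row

    probs = [step(k) for k in (2, 2, 0, 2, 4, 0)]
    print(ctc_greedy_decode(probs, "0123456789"))  # prints "113"
```

In the usage example, the two consecutive frames of class 2 collapse to a single `1`, while the blank that follows lets the decoder emit a second `1`. Greedy decoding needs no language model and only one pass over the output, which suits a resource-limited embedded target; beam-search decoding can be more accurate at a higher computational cost.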
