Effect of Input Data Video Interval and Input Data Image Similarity on Learning Accuracy in 3D-CNN

  • Kim, Heeil (Department of Computer Engineering, Wonkwang University) ;
  • Chung, Yeongjee (Department of Computer and Software Engineering, Wonkwang University)
  • Received : 2021.04.06
  • Accepted : 2021.04.15
  • Published : 2021.05.31

Abstract

3D-CNN is a deep learning technique for learning time-series data. Because it learns in three dimensions, however, it can generate many parameters, which demands high-performance hardware or significantly slows training. We train a 3D-CNN on hand gestures, find the parameters that give the highest accuracy, and then analyze how its accuracy varies when only the input data change, with no structural change to the network. We propose two methods for changing the input data. First, we choose the interval of the input video, which adjusts the ratio of the stop interval to the gesture interval. Second, we measure and normalize the similarity of images through interclass 2D cross-correlation analysis and obtain the corresponding interframe mean value. The experimental results show that changes in the input data affect learning accuracy without any structural change to the 3D-CNN.
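The two input-data manipulations can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything here is an assumption made for illustration: frames are small grayscale images stored as nested lists, the interval is a contiguous run of frames, and similarity is taken as the zero-shift normalized 2D cross-correlation coefficient averaged over consecutive frame pairs. The function names `select_interval`, `ncc`, and `mean_interframe_similarity` are hypothetical and do not come from the paper.

```python
from math import sqrt


def select_interval(frames, start, length):
    """Return `length` consecutive frames beginning at index `start`.

    Moving `start` or changing `length` shifts the balance between
    motionless (stop) frames and gesture frames fed to the network.
    """
    if start < 0 or length <= 0 or start + length > len(frames):
        raise ValueError("requested interval lies outside the clip")
    return frames[start:start + length]


def ncc(a, b):
    """Zero-shift normalized 2D cross-correlation of two equal-size images.

    Assumed similarity measure; the result lies in [-1, 1].
    """
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if not xs or len(xs) != len(ys):
        raise ValueError("images must be non-empty and the same size")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denom == 0.0:
        return 0.0  # a flat image has no defined correlation; treat as 0
    return sum(p * q for p, q in zip(dx, dy)) / denom


def mean_interframe_similarity(frames):
    """Average the NCC over all consecutive frame pairs of one clip."""
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    scores = [ncc(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
    return sum(scores) / len(scores)
```

A clip with long motionless stretches would give a mean close to 1 under this measure, while a clip dominated by gesture motion would give a lower value, which is one way to quantify how a chosen interval changes the input.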

Acknowledgement

This paper was supported by Wonkwang University, Iksan Korea in 2021.
