Human Action Recognition Based on 3D Convolutional Neural Network from Hybrid Feature

  • Wu, Tingting (Dept. of Information Communication Engineering, Tongmyong University) ;
  • Lee, Eung-Joo (Dept. of Information Communication Engineering, Tongmyong University)
  • Received : 2019.05.29
  • Accepted : 2019.12.26
  • Published : 2019.12.31

Abstract

In 3D convolution, multiple consecutive frames are stacked to form a cube, and a 3D convolution kernel is then applied within that cube. In this structure, each feature map of the convolutional layer is connected to multiple adjacent frames in the previous layer, and so captures motion information. However, because pedestrian posture, motion, and position change over time, convolving at a fixed location is inadequate, and when the 3D kernel is convolved along the time axis it extracts temporal features from only three consecutive frames, which is not enough to capture the full action. This paper proposes an action recognition method based on feature fusion in a 3D convolutional neural network. A pre-computed optical flow image is fed to a VGG16 network to learn temporal features; these temporal features are then fused with the features extracted by the 3D convolutional neural network. Finally, behavior classification is performed by an SVM classifier.
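To make the core operation concrete, the following is a minimal NumPy sketch (not the paper's implementation) of the 3D convolution step described above: consecutive frames are stacked into a cube and a small 3D kernel slides over both the spatial and temporal axes. The cube size, kernel size, and function name are illustrative assumptions.

```python
import numpy as np

def conv3d(cube, kernel):
    """Naive valid-mode 3D convolution (cross-correlation) of a
    (frames, height, width) cube with a (t, h, w) kernel.
    Each output value mixes information from t adjacent frames,
    which is how the 3D kernel captures motion information."""
    T, H, W = cube.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):          # slide along time
        for j in range(out.shape[1]):      # slide along height
            for k in range(out.shape[2]):  # slide along width
                out[i, j, k] = np.sum(cube[i:i+t, j:j+h, k:k+w] * kernel)
    return out

# Stack 7 consecutive 32x32 frames into a cube (sizes are illustrative),
# then apply a 3x3x3 averaging kernel: note the kernel spans only
# 3 consecutive frames in time, the limitation the paper addresses.
cube = np.ones((7, 32, 32))
kernel = np.ones((3, 3, 3)) / 27.0
features = conv3d(cube, kernel)
print(features.shape)  # (5, 30, 30)
```

With an all-ones cube and an averaging kernel, every output value is 1.0, which is a quick sanity check that the window arithmetic is correct. In the proposed pipeline, features like these would be fused with VGG16 optical-flow features before SVM classification.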
