Multiscale Spatial Position Coding under Locality Constraint for Action Recognition
 Title & Authors
Yang, Jiang-feng; Ma, Zheng; Xie, Mei
 Abstract
The traditional bag-of-features model ignores the spatial relationships among local features in human action recognition. To address this problem, we propose a method called Multiscale Spatial Position Coding under Locality Constraint. Specifically, to describe these spatial relationships, we propose a mixed feature that combines a motion feature with a multi-spatial-scale configuration. To exploit the temporal information between features, sub spatial-temporal volumes (sub-STVs) are built. The pooled features of the sub-STVs are then obtained via max pooling. In the classification stage, Locality-Constrained Group Sparse Representation is adopted to exploit the intrinsic group information of the sub-STV features. Experimental results on the KTH, Weizmann, and UCF Sports datasets show that our action recognition system outperforms recently published recognition systems based on classical local ST features.
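The max-pooling step over sub-STVs can be illustrated with a minimal sketch. The abstract does not say how the sub-STVs are constructed or how features are coded, so this example assumes that each sub-STV is an equal-length temporal segment of the video and that every local feature already has a coding vector over a codebook. The function name `max_pool_sub_stvs` and all of its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def max_pool_sub_stvs(codes, frame_idx, num_frames, num_segments):
    """Max-pool feature codes inside equal-length temporal segments.

    Assumption (not from the paper): a sub-STV is one of `num_segments`
    equal-length temporal slices of a video with `num_frames` frames.

    codes:     (n_features, codebook_size) coding coefficients, one row per feature.
    frame_idx: (n_features,) frame index at which each feature was detected.
    Returns an array of shape (num_segments, codebook_size); a segment that
    contains no feature keeps an all-zero row.
    """
    codes = np.asarray(codes, dtype=float)
    frame_idx = np.asarray(frame_idx)
    # Map each frame index to a segment id in [0, num_segments - 1].
    seg = np.minimum(frame_idx * num_segments // num_frames, num_segments - 1)
    pooled = np.zeros((num_segments, codes.shape[1]))
    for s in range(num_segments):
        mask = seg == s
        if mask.any():
            # Element-wise maximum over all features that fall in this segment.
            pooled[s] = codes[mask].max(axis=0)
    return pooled

# Toy input: three features coded over a three-word codebook.
codes = np.array([[0.1, 0.9, 0.0],
                  [0.4, 0.2, 0.0],
                  [0.0, 0.3, 0.7]])
frame_idx = np.array([0, 3, 8])
pooled = max_pool_sub_stvs(codes, frame_idx, num_frames=10, num_segments=2)
```

In this toy input the first two features fall in the first segment and the third falls in the second, so each row of `pooled` is the element-wise maximum of the codes in its segment. The resulting rows would play the role of the sub-STV features that the classification stage treats as a group.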
 Keywords
Action recognition; Action representation; Multiscale spatial position coding under locality constraint
 Language
English