Abstract
Recognizing user-defined moves serves a wide range of applications, including sports monitoring, virtual reality, and natural user interfaces (NUI). However, many efficient human-move recognition methods remain limited to specific situations, such as straightforward NUI gestures or everyday human actions. In particular, most methods depend on a prior segmentation of the recordings, both to train and to recognize moves. This segmentation step is generally performed manually or based on heuristics such as neutral poses or short pauses, which limits the range of applications. Moreover, speed is generally not considered a criterion for distinguishing moves. We present an approach that combines a simplified training phase, requiring minimal user intervention, with a novel method that robustly recognizes moves online from unsegmented data, without requiring transitional pauses or neutral poses, and that additionally takes move speed into account. Trained gestures are automatically segmented in real time by a curvature-based method that detects small pauses during a training session. A set of the most discriminant key poses distinguishing the different moves is also extracted in real time, optimizing the number of key poses. Altogether, this semi-supervised learning approach only requires the user to perform the moves continuously, separated by small pauses. Key-pose transitions and move execution speeds are then fed to a novel recognition algorithm that identifies unsegmented moves online, achieving high robustness and very low latency in our experiments while also distinguishing moves that differ only in speed.
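To make the training-time segmentation concrete, here is a minimal Python sketch of how a continuous recording of skeleton poses could be split at small pauses. It is not the authors' implementation: the paper detects pauses with a curvature-based estimator on the motion curve, whereas this sketch substitutes a simpler low-joint-speed test, and every name and threshold below (segment_by_pauses, speed_thresh, min_pause_s) is hypothetical.

```python
import numpy as np

def segment_by_pauses(poses, fps=30.0, speed_thresh=0.05, min_pause_s=0.3):
    """Split a continuous recording of skeleton poses into move segments.

    poses: (T, J, 3) array of 3D joint positions per frame.
    A frame counts as a pause frame when the mean joint speed falls below
    speed_thresh; runs of pause frames longer than min_pause_s separate
    consecutive moves.
    """
    # Mean joint speed between consecutive frames, scaled to units per second.
    speeds = np.linalg.norm(np.diff(poses, axis=0), axis=2).mean(axis=1) * fps
    is_pause = np.concatenate([[True], speeds < speed_thresh])

    min_pause = int(min_pause_s * fps)
    segments, start, pause_run = [], None, 0
    for t, pause in enumerate(is_pause):
        if pause:
            pause_run += 1
            # Close the current move once the pause is long enough.
            if start is not None and pause_run >= min_pause:
                segments.append((start, t - pause_run + 1))
                start = None
        else:
            if start is None:
                start = t  # a new move begins when motion resumes
            pause_run = 0  # short hesitations inside a move are tolerated
    if start is not None:
        segments.append((start, len(poses)))
    return segments  # list of (first_frame, last_frame) move spans
```

In the full method, the segmented moves are then mined for discriminant key poses, and key-pose transitions together with execution speeds drive the online recognizer.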
Acknowledgements
The authors would like to thank CNPq, FAPEAL, FAPERJ and École Polytechnique for partially funding this research.
Electronic supplementary material
Supplementary material 1 (MOV, 23,113 KB)
Cite this article
Vieira, T., Faugeroux, R., Martínez, D. et al. Online human moves recognition through discriminative key poses and speed-aware action graphs. Machine Vision and Applications 28, 185–200 (2017). https://doi.org/10.1007/s00138-016-0818-y