Learning Three Dimensional Tennis Shots Using Graph Convolutional Networks
Figure 1. A spatial temporal graph of the skeleton. Red dots represent joints and other characteristic points. Lines represent connections between points: red for the lower limbs, blue for the upper limbs, green for the spine, and yellow for the tennis racket.
Figure 2. An example of raw forehand shot phases: (a) beginning of the preparation phase, (b) end of the preparation phase, (c) hitting the ball, and (d) swinging the racket after the hit.
Figure 3. Scheme of the classifier used, consisting of the ST-GCN part and the active features knowledge base.
Figure 4. Efficiency plot of the ST-GCN classifier.
Figure 5. Efficiency of the Fuzzy ST-GCN classifier.
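The spatial temporal graph described in the first figure caption can be illustrated with a small sketch. It assumes the usual ST-GCN construction: points within one frame are linked by spatial edges, and each point is linked to itself in the next frame by a temporal edge. The point names, the specific point pairs, and the function `build_adjacency` are hypothetical and heavily reduced. Only the four edge groups (lower limbs, upper limbs, spine, racket) come from the caption.

```python
import numpy as np

# Hypothetical, heavily reduced point set (assumption, not the authors' marker set).
POINTS = ["pelvis", "chest", "head", "l_knee", "r_knee",
          "r_elbow", "r_hand", "racket_tip"]

# Edge groups follow the caption; the concrete pairs are illustrative assumptions.
SPATIAL_EDGES = {
    "lower_limbs": [("pelvis", "l_knee"), ("pelvis", "r_knee")],
    "upper_limbs": [("chest", "r_elbow"), ("r_elbow", "r_hand")],
    "spine": [("pelvis", "chest"), ("chest", "head")],
    "racket": [("r_hand", "racket_tip")],
}


def build_adjacency(num_frames):
    """Return a symmetric 0/1 adjacency matrix over (frame, point) nodes."""
    n = len(POINTS)
    index = {name: i for i, name in enumerate(POINTS)}
    adj = np.zeros((num_frames * n, num_frames * n), dtype=np.int8)
    for t in range(num_frames):
        offset = t * n
        # Spatial edges: connections between points inside one frame.
        for pairs in SPATIAL_EDGES.values():
            for a, b in pairs:
                i, j = offset + index[a], offset + index[b]
                adj[i, j] = adj[j, i] = 1
        # Temporal edges: the same point in two consecutive frames.
        if t + 1 < num_frames:
            for k in range(n):
                i, j = offset + k, offset + n + k
                adj[i, j] = adj[j, i] = 1
    return adj


adjacency = build_adjacency(num_frames=3)
print(adjacency.shape)
```

Keeping the edge groups in a dictionary mirrors the colour coding of the figure, so a single group (for example the racket) can be dropped or extended without touching the rest of the construction.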
Abstract
1. Introduction
2. Material and Methods
2.1. Capturing Motion Data
2.2. Spatial Temporal Graph
2.3. Recognition of Tennis Shots
3. Experiments and Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
Percentage of Data Belonging to the Training Set | Forehand (True Positive) | Backhand (True Positive) | No Shot (True Positive) |
---|---|---|---|
45% | 74.8% | 74.3% | 78.2% |
50% | 74.4% | 74.2% | 73.1% |
55% | 75.6% | 76.1% | 74.5% |
60% | 68.5% | 74.3% | 64.1% |
65% | 81.2% | 77.1% | 69.6% |

Percentage of Data Belonging to the Training Set | Forehand (True Positive) | Backhand (True Positive) | No Shot (True Positive) |
---|---|---|---|
45% | 74.8% | 74.3% | 78.2% |
50% | 84.4% | 84.1% | 78.1% |
55% | 85.4% | 86.2% | 82.5% |
60% | 86.5% | 87.3% | 86.3% |
65% | 91.2% | 92.8% | 95.9% |
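
The two true-positive tables above can be compared row by row with a few lines of arithmetic. The sketch below copies the table values and computes the mean true positive rate over the three classes for each training-set share. The page does not label the two tables. Given the figure captions, they presumably correspond to the ST-GCN and Fuzzy ST-GCN classifiers in that order, but that mapping is an assumption, so the code only speaks of a first and a second table. The names `first_table`, `second_table`, `mean_rate`, and `compare` are illustrative.

```python
# Column order used in both tables.
CLASSES = ("forehand", "backhand", "no_shot")

# True positive rates (%) keyed by training-set share (%), copied from the tables.
first_table = {
    45: (74.8, 74.3, 78.2),
    50: (74.4, 74.2, 73.1),
    55: (75.6, 76.1, 74.5),
    60: (68.5, 74.3, 64.1),
    65: (81.2, 77.1, 69.6),
}
second_table = {
    45: (74.8, 74.3, 78.2),
    50: (84.4, 84.1, 78.1),
    55: (85.4, 86.2, 82.5),
    60: (86.5, 87.3, 86.3),
    65: (91.2, 92.8, 95.9),
}


def mean_rate(rates):
    """Unweighted mean of the per-class true positive rates."""
    return sum(rates) / len(rates)


def compare(first, second):
    """Return (share, mean of first, mean of second, difference) per row."""
    rows = []
    for share in sorted(first):
        m1, m2 = mean_rate(first[share]), mean_rate(second[share])
        rows.append((share, round(m1, 1), round(m2, 1), round(m2 - m1, 1)))
    return rows


comparison = compare(first_table, second_table)
for share, m1, m2, diff in comparison:
    print(f"{share}% training: {m1:.1f}% vs {m2:.1f}% (difference {diff:+.1f})")
```

With these values the two tables coincide at the 45% share, and the second table is higher for every class at each larger share. The mean is unweighted because the page does not give the number of samples per class.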

Percentage of Data Set | ST-GCN | Fuzzy ST-GCN |
---|---|---|
45% | 648 | 653 |
50% | 525 | 507 |
55% | 497 | 482 |
60% | 484 | 416 |
65% | 473 | 362 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Skublewska-Paszkowska, M.; Powroznik, P.; Lukasik, E. Learning Three Dimensional Tennis Shots Using Graph Convolutional Networks. Sensors 2020, 20, 6094. https://doi.org/10.3390/s20216094