Abstract
Multi-person pose estimation in natural scenes has been an active research topic in recent years. Because the prediction speed of top-down methods depends on the number of people in the scene, bottom-up methods have an advantage in natural scenes. However, our study found that the accuracy of marginal joints (those farther from the center of the body, such as the wrist and ankle) is consistently lower than that of joints closer to the center of the body (such as the shoulder and hip), and that the accuracy gap between joint categories is large. Inspired by the structural characteristics of the human body, this paper proposes a tree structure network (TSNet) for human pose estimation. TSNet divides the joints into several levels according to the structure of the body and predicts them stepwise from the center of the body to its margin. Combined with global features, the joint features of the next layer are predicted by extracting the correlation between the joint features of the current layer and those of the previous layer. Each joint therefore carries not only the joint information of the current and previous layers but also background information. Experimental results show that this method effectively alleviates the uneven accuracy across joints, and that TSNet improves the accuracy of lower-body joints by setting different activation values for different joints. Extensive experiments on the MPII dataset demonstrate the effectiveness of the proposed model and method.
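The idea of dividing joints into levels from the center of the body outward can be illustrated with a minimal sketch. The edge list, the joint names, and the choice of the thorax as the root are assumptions made for illustration only; the abstract does not specify the grouping actually used by TSNet.

```python
from collections import deque

# Hypothetical skeleton over 16 MPII-style joint names. The edges and the
# root are illustrative assumptions, not the grouping used in the paper.
SKELETON_EDGES = [
    ("thorax", "upper_neck"), ("upper_neck", "head_top"),
    ("thorax", "r_shoulder"), ("r_shoulder", "r_elbow"), ("r_elbow", "r_wrist"),
    ("thorax", "l_shoulder"), ("l_shoulder", "l_elbow"), ("l_elbow", "l_wrist"),
    ("thorax", "pelvis"),
    ("pelvis", "r_hip"), ("r_hip", "r_knee"), ("r_knee", "r_ankle"),
    ("pelvis", "l_hip"), ("l_hip", "l_knee"), ("l_knee", "l_ankle"),
]


def joint_levels(edges, root="thorax"):
    """Group joints by their tree distance from the root (breadth-first)."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    depth = {root: 0}
    queue = deque([root])
    while queue:
        joint = queue.popleft()
        for neighbour in adjacency[joint]:
            if neighbour not in depth:
                depth[neighbour] = depth[joint] + 1
                queue.append(neighbour)

    levels = [[] for _ in range(max(depth.values()) + 1)]
    for joint, d in depth.items():
        levels[d].append(joint)
    return levels


# Level 0 holds the root; the last level holds the joints farthest from it.
LEVELS = joint_levels(SKELETON_EDGES)
for index, joints in enumerate(LEVELS):
    print(index, joints)
```

Under these assumptions the wrists and ankles fall into the outermost levels, which matches the abstract's description of marginal joints; a stepwise predictor would process the levels in this order.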
Acknowledgements
This work was supported by the Natural Science Foundation of Fujian Province, China, under Grant 2020J01082, in part by the Science and Technology Bureau of Quanzhou under Grant 2018C113R, and in part by the National Natural Science Foundation of China under Grant 61901183.
Ethics declarations
Funding
This work was funded by the Natural Science Foundation of Fujian Province, China, under Grant 2020J01082, in part by the Science and Technology Bureau of Quanzhou under Grant 2018C113R, and in part by the National Natural Science Foundation of China under Grant 61901183.
Conflicts of interest
There are no conflicts of interest.
Availability of data and material
The data come from a public dataset (MPII).
Code availability
Custom code
About this article
Cite this article
Wan, T., Luo, Y., Zhang, Z. et al. TSNet: Tree structure network for human pose estimation. SIViP 16, 551–558 (2022). https://doi.org/10.1007/s11760-021-01999-y
DOI: https://doi.org/10.1007/s11760-021-01999-y