DOI: 10.1145/3240508.3240702

PVNet: A Joint Convolutional Network of Point Cloud and Multi-View for 3D Shape Recognition

Published: 15 October 2018

Abstract

3D object recognition has attracted wide research attention in the fields of multimedia and computer vision. With the recent proliferation of deep learning, various deep models built on different representations have achieved state-of-the-art performance. Among them, point cloud and multi-view based 3D shape representations have recently proven promising, and their corresponding deep models perform strongly on 3D shape recognition. However, little effort has been devoted to combining point cloud data and multi-view data for 3D shape representation, even though, in our view, the two modalities are complementary to each other. In this paper, we propose the Point-View Network (PVNet), the first framework that integrates both point cloud and multi-view data for joint 3D shape recognition. More specifically, we propose an embedding attention fusion scheme that employs high-level features from the multi-view data to model the intrinsic correlation and discriminability of different structure features from the point cloud data. In particular, the discriminative descriptions are quantified and used as a soft attention mask to further refine the structure features of the 3D shape. We evaluate the proposed method on the ModelNet40 dataset for 3D shape classification and retrieval tasks. Experimental results and comparisons with state-of-the-art methods demonstrate that our framework achieves superior performance.
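The abstract describes the embedding attention fusion only at a high level. The PyTorch snippet below is a minimal, hypothetical sketch of the general idea: a high-level multi-view feature produces a soft attention mask that refines per-point structure features before the two modalities are fused into a single descriptor. The module name (AttentionFusion), tensor dimensions, residual masking, and max-pooling aggregation are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: names, shapes, and layer sizes are assumptions,
# not taken from the PVNet paper or its released code.
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    """Use a global multi-view feature to build a soft attention mask that
    reweights per-point structure features, then fuse both modalities."""

    def __init__(self, view_dim=4096, point_dim=1024, fused_dim=512):
        super().__init__()
        # Project the high-level multi-view feature into the point-feature space.
        self.view_proj = nn.Sequential(
            nn.Linear(view_dim, point_dim),
            nn.ReLU(inplace=True),
        )
        # Soft attention mask over point-feature channels, values in [0, 1].
        self.mask = nn.Sequential(
            nn.Linear(point_dim, point_dim),
            nn.Sigmoid(),
        )
        # Final fusion of the refined point descriptor and the view descriptor.
        self.fuse = nn.Linear(point_dim + view_dim, fused_dim)

    def forward(self, point_feat, view_feat):
        # point_feat: (B, N, point_dim) per-point structure features
        # view_feat:  (B, view_dim) aggregated multi-view feature
        guidance = self.view_proj(view_feat)          # (B, point_dim)
        attn = self.mask(guidance).unsqueeze(1)       # (B, 1, point_dim)
        refined = point_feat * attn + point_feat      # residual soft masking
        global_point = refined.max(dim=1).values      # (B, point_dim) max pool
        return self.fuse(torch.cat([global_point, view_feat], dim=1))


if __name__ == "__main__":
    fusion = AttentionFusion()
    pts = torch.randn(2, 1024, 1024)   # 2 shapes, 1024 points each
    views = torch.randn(2, 4096)       # 2 aggregated multi-view features
    print(fusion(pts, views).shape)    # torch.Size([2, 512])
```

In this sketch the view feature serves only as guidance for the attention mask; a faithful reproduction would follow the layer sizes and fusion details reported in the paper itself.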





      Published In

      MM '18: Proceedings of the 26th ACM International Conference on Multimedia
      October 2018, 2167 pages
      ISBN: 9781450356657
      DOI: 10.1145/3240508


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. 3d shape recognition
      2. multi-view
      3. point cloud
      4. point-view net

      Qualifiers

      • Research-article

      Funding Sources

      • National Key R&D Program of China
      • National Natural Science Funds of China

      Conference

      MM '18: ACM Multimedia Conference
      October 22 - 26, 2018
      Seoul, Republic of Korea

      Acceptance Rates

      MM '18 Paper Acceptance Rate 209 of 757 submissions, 28%;
      Overall Acceptance Rate 995 of 4,171 submissions, 24%



      Cited By

      • (2024) Triadic Elastic Structure Representation for Open-Set Incremental 3D Object Retrieval. Proceedings of the 2024 International Conference on Multimedia Retrieval, 20-28. DOI: 10.1145/3652583.3658046. Online publication date: 30-May-2024
      • (2024) DiffCLIP: Leveraging Stable Diffusion for Language Grounded 3D Classification. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 3584-3593. DOI: 10.1109/WACV57701.2024.00356. Online publication date: 3-Jan-2024
      • (2024) Hypergraph-Based Multi-Modal Representation for Open-Set 3D Object Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(4), 2206-2223. DOI: 10.1109/TPAMI.2023.3332768. Online publication date: Apr-2024
      • (2024) Group Multi-View Transformer for 3D Shape Analysis With Spatial Encoding. IEEE Transactions on Multimedia 26, 9450-9463. DOI: 10.1109/TMM.2024.3394731. Online publication date: 2024
      • (2024) ChainFrame: A Chain Framework for Point Cloud Classification. IEEE Transactions on Industrial Informatics 20(3), 4451-4462. DOI: 10.1109/TII.2023.3323686. Online publication date: Mar-2024
      • (2024) Attention-Based Deep Neural Network for Point Cloud Learning. 2024 International Joint Conference on Neural Networks (IJCNN), 1-8. DOI: 10.1109/IJCNN60899.2024.10651382. Online publication date: 30-Jun-2024
      • (2024) ExpPoint-MAE: Better Interpretability and Performance for Self-Supervised Point Cloud Transformers. IEEE Access 12, 53565-53578. DOI: 10.1109/ACCESS.2024.3388155. Online publication date: 2024
      • (2024) Incorporating Rotation Invariance with Non-invariant Networks for Point Clouds. 2024 International Conference on 3D Vision (3DV), 985-994. DOI: 10.1109/3DV62453.2024.00070. Online publication date: 18-Mar-2024
      • (2024) 3D shape knowledge graph for cross-domain 3D shape retrieval. CAAI Transactions on Intelligence Technology. DOI: 10.1049/cit2.12326. Online publication date: 2-Apr-2024
      • (2024) LATFormer: Locality-Aware Point-View Fusion Transformer for 3D shape recognition. Pattern Recognition 151, 110413. DOI: 10.1016/j.patcog.2024.110413. Online publication date: Jul-2024
