DOI: 10.1145/3240508.3240702

PVNet: A Joint Convolutional Network of Point Cloud and Multi-View for 3D Shape Recognition

Published: 15 October 2018

Abstract

3D object recognition has attracted wide research attention in the fields of multimedia and computer vision. With the recent proliferation of deep learning, various deep models built on different representations have achieved state-of-the-art performance. Among them, point cloud and multi-view based 3D shape representations have recently proven promising, and their corresponding deep models perform strongly on 3D shape recognition. However, little effort has been devoted to combining point cloud data and multi-view data for 3D shape representation, even though, in our view, the two modalities are complementary to each other. In this paper, we propose the Point-View Network (PVNet), the first framework that integrates both point cloud and multi-view data for joint 3D shape recognition. More specifically, we propose an embedding attention fusion scheme that employs high-level features from the multi-view data to model the intrinsic correlation and discriminability of different structure features from the point cloud data. In particular, the discriminative descriptions are quantified and used as a soft attention mask to further refine the structure features of the 3D shape. We evaluate the proposed method on the ModelNet40 dataset for 3D shape classification and retrieval tasks. Experimental results and comparisons with state-of-the-art methods demonstrate that our framework achieves superior performance.
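The abstract describes the embedding attention fusion only at a high level. The PyTorch snippet below is a minimal, hypothetical sketch of the general idea: a high-level multi-view feature produces a soft attention mask that refines per-point structure features before the two modalities are fused into a single descriptor. The module name (AttentionFusion), tensor dimensions, residual masking, and max-pooling aggregation are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: names, shapes, and layer sizes are assumptions,
# not taken from the PVNet paper or its released code.
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    """Use a global multi-view feature to build a soft attention mask that
    reweights per-point structure features, then fuse both modalities."""

    def __init__(self, view_dim=4096, point_dim=1024, fused_dim=512):
        super().__init__()
        # Project the high-level multi-view feature into the point-feature space.
        self.view_proj = nn.Sequential(
            nn.Linear(view_dim, point_dim),
            nn.ReLU(inplace=True),
        )
        # Soft attention mask over point-feature channels, values in [0, 1].
        self.mask = nn.Sequential(
            nn.Linear(point_dim, point_dim),
            nn.Sigmoid(),
        )
        # Final fusion of the refined point descriptor and the view descriptor.
        self.fuse = nn.Linear(point_dim + view_dim, fused_dim)

    def forward(self, point_feat, view_feat):
        # point_feat: (B, N, point_dim) per-point structure features
        # view_feat:  (B, view_dim) aggregated multi-view feature
        guidance = self.view_proj(view_feat)          # (B, point_dim)
        attn = self.mask(guidance).unsqueeze(1)       # (B, 1, point_dim)
        refined = point_feat * attn + point_feat      # residual soft masking
        global_point = refined.max(dim=1).values      # (B, point_dim) max pool
        return self.fuse(torch.cat([global_point, view_feat], dim=1))


if __name__ == "__main__":
    fusion = AttentionFusion()
    pts = torch.randn(2, 1024, 1024)   # 2 shapes, 1024 points each
    views = torch.randn(2, 4096)       # 2 aggregated multi-view features
    print(fusion(pts, views).shape)    # torch.Size([2, 512])
```

In this sketch the view feature serves only as guidance for the attention mask; a faithful reproduction would follow the layer sizes and fusion details reported in the paper itself.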





      Published In

      MM '18: Proceedings of the 26th ACM International Conference on Multimedia
      October 2018, 2167 pages
      ISBN: 9781450356657
      DOI: 10.1145/3240508


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. 3d shape recognition
      2. multi-view
      3. point cloud
      4. point-view net

      Qualifiers

      • Research-article

      Funding Sources

      • National Key R&D Program of China
      • National Natural Science Funds of China

      Conference

      MM '18: ACM Multimedia Conference
      October 22 - 26, 2018
      Seoul, Republic of Korea

      Acceptance Rates

      MM '18 Paper Acceptance Rate 209 of 757 submissions, 28%;
      Overall Acceptance Rate 995 of 4,171 submissions, 24%



      Cited By

      • (2024) Triadic Elastic Structure Representation for Open-Set Incremental 3D Object Retrieval. Proceedings of the 2024 International Conference on Multimedia Retrieval, 20-28. DOI: 10.1145/3652583.3658046. Online publication date: 30-May-2024
      • (2024) DiffCLIP: Leveraging Stable Diffusion for Language Grounded 3D Classification. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 3584-3593. DOI: 10.1109/WACV57701.2024.00356. Online publication date: 3-Jan-2024
      • (2024) Hypergraph-Based Multi-Modal Representation for Open-Set 3D Object Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(4), 2206-2223. DOI: 10.1109/TPAMI.2023.3332768. Online publication date: Apr-2024
      • (2024) Group Multi-View Transformer for 3D Shape Analysis With Spatial Encoding. IEEE Transactions on Multimedia 26, 9450-9463. DOI: 10.1109/TMM.2024.3394731. Online publication date: 2024
      • (2024) ChainFrame: A Chain Framework for Point Cloud Classification. IEEE Transactions on Industrial Informatics 20(3), 4451-4462. DOI: 10.1109/TII.2023.3323686. Online publication date: Mar-2024
      • (2024) Attention-Based Deep Neural Network for Point Cloud Learning. 2024 International Joint Conference on Neural Networks (IJCNN), 1-8. DOI: 10.1109/IJCNN60899.2024.10651382. Online publication date: 30-Jun-2024
      • (2024) ExpPoint-MAE: Better Interpretability and Performance for Self-Supervised Point Cloud Transformers. IEEE Access 12, 53565-53578. DOI: 10.1109/ACCESS.2024.3388155. Online publication date: 2024
      • (2024) Incorporating Rotation Invariance with Non-invariant Networks for Point Clouds. 2024 International Conference on 3D Vision (3DV), 985-994. DOI: 10.1109/3DV62453.2024.00070. Online publication date: 18-Mar-2024
      • (2024) 3D shape knowledge graph for cross-domain 3D shape retrieval. CAAI Transactions on Intelligence Technology. DOI: 10.1049/cit2.12326. Online publication date: 2-Apr-2024
      • (2024) LATFormer: Locality-Aware Point-View Fusion Transformer for 3D shape recognition. Pattern Recognition 151, 110413. DOI: 10.1016/j.patcog.2024.110413. Online publication date: Jul-2024
