Deep Learning for 3D Reconstruction, Augmentation, and Registration: A Review Paper
List of Figures
- Figure 1. The 3D data representations of the Stanford Bunny [33] model: point cloud (left), voxels (middle), and 3D mesh (right) [34].
- Figure 2. RGB-D reconstruction and semantic annotation framework of the ScanNet [39] dataset.
- Figure 3. System structure of the PointOutNet [9] model.
- Figure 4. Pipeline of the pseudo-renderer [12] model.
- Figure 5. Network architecture of the RealPoint3D [13] model.
- Figure 6. Overview of the cycle-consistency-based approach [15].
- Figure 7. Network architecture of the GenRe [20] model.
- Figure 8. Network architecture of the MarrNet [21] model.
- Figure 9. Network architecture of the Perspective Transformer Nets [23] model.
- Figure 10. Proposed methods for reconstructing pose-aware 3D voxelised shapes: p-TL (parts 1 and 3) and p-3D-VAE-GAN (parts 2 and 3) [24].
- Figure 11. The generator in the 3D-GAN [27] model.
- Figure 12. Pipeline for single-image 3D reconstruction [35].
- Figure 13. Main network structure of Residual MeshNet [36].
- Figure 14. Cascaded mesh deformation network [37].
- Figure 15. Pipeline of 3D reconstruction using CoReNet [38].
- Figure 16. Proposed framework for unsupervised learning of 3D structure from images [18].
- Figure 17. Proposed framework of the Pix2Vox++ network [30].
- Figure 18. An overview of the 3D-R2N2 network [11].
- Figure 19. An overview of the shape-learning approach [32].
- Figure 20. An overview of the RPM-Net network [139].
- Figure 21. The architecture of DeepICP [140].
- Figure 22. Proposed pipeline for 3D multi-view registration [145].
- Figure 23. Architecture of MaskNet [165].
- Figure 24. Illustration of the proposed DMR network [167].
- Figure 25. Architecture of PU-Net [168].
- Figure 26. Overview of MPU with three levels of detail [169].
- Figure 27. General overview of CP-Net [170].
- Figure 28. Training of the proposed sampling method [171].
- Figure 29. Architecture of PCN [205].
- Figure 30. Architecture of MSN [209].
- Figure 31. Architecture of PF-Net [214].
- Figure 32. Overview of GRNet [211].
- Figure 33. Overview of SnowflakeNet [190].
Abstract
1. Introduction
1.1. Our Previous Work
1.2. Research Methodology
2. 3D Data Representations
2.1. Point Clouds
2.2. Voxels
2.3. Meshes
3. 3D Benchmark Datasets
3.1. ModelNet
3.2. PASCAL3D+
3.3. ShapeNet
3.4. ObjectNet3D
3.5. ScanNet
4. Object Reconstruction
4.1. Procedural-Based Approaches
4.2. Deep-Learning-Based Approaches
4.3. Single-View Reconstruction
4.3.1. Point Cloud Representation
4.3.2. Voxel Representation
4.3.3. Mesh Representation
4.4. Multiple-View Reconstruction
4.4.1. Point Cloud Representation
4.4.2. Voxel Representation
5. Registration
5.1. Traditional Methods
5.2. Learning-Based Methods
6. Augmentation
6.1. Denoising
6.2. Upsampling
6.3. Downsampling
7. Point Cloud Completion
| Model | Advantages | Limitations |
|---|---|---|
| PCN [205] | Learns a mapping from the space of incomplete observations to the space of complete shapes. | Requires training data prepared as partial shapes, since it expects test inputs that resemble the training data. |
| USCN [207] | Does not require explicit correspondence between example complete shape models and incomplete point sets. | Training GANs can be difficult due to common failure modes such as mode collapse. |
| MSN [209] | Uses EMD as a better metric for measuring completion quality. | Frequently disregards the spatial correlation between points. |
| PF-Net [214] | Accepts a partial point cloud as input and outputs only the missing portion of the point cloud. | The model's intricate design results in a comparatively large number of parameters. |
| GRNet [211] | Uses 3D grids as intermediate representations to regularise unordered point clouds. | Difficult to maintain an organised structure for points in small patches because of the discontinuous character of the point cloud. |
| SnowflakeNet [190] | Focuses specifically on the process of decoding incomplete point clouds. | Fine-grained details are easily lost during pooling operations in the encoding phase. |
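Most of these networks are trained and evaluated with set-to-set distances such as the Chamfer distance or, in MSN's case, the Earth Mover's Distance. For orientation only, the sketch below shows a minimal NumPy implementation of the symmetric (squared) Chamfer distance between two point sets; the function name and toy data are ours, and this illustrates the metric rather than the implementation used by any of the cited papers.

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric (squared) Chamfer distance between point sets.

    p: (N, 3) array, q: (M, 3) array. For each point, take the squared
    distance to its nearest neighbour in the other set, then average
    over both directions.
    """
    d2 = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)  # (N, M)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

# Toy check: a noisy half of a shape vs. the full shape.
rng = np.random.default_rng(0)
complete = rng.random((1024, 3))
partial = complete[:512] + 0.01 * rng.standard_normal((512, 3))
print(chamfer_distance(partial, complete))
```

Because the Chamfer distance matches each point independently to its nearest neighbour, it can score a completion well even when the point density is badly distributed; that is the weakness EMD (an optimal-transport cost) is meant to address, at higher computational expense.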
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| 3D | Three-dimensional |
| 2D | Two-dimensional |
| LiDAR | Light detection and ranging |
| RGB-D | Red, green, blue plus depth |
| CAD | Computer-aided design |
| MLP | Multilayer perceptron |
| CNN | Convolutional neural network |
| FCGFs | Fully convolutional geometric features |
| GPU | Graphics processing unit |
| RAM | Random access memory |
References
- Vinodkumar, P.K.; Karabulut, D.; Avots, E.; Ozcinar, C.; Anbarjafari, G. A Survey on Deep Learning Based Segmentation, Detection and Classification for 3D Point Clouds. Entropy 2023, 25, 635.
- Behley, J.; Garbade, M.; Milioto, A.; Quenzel, J.; Behnke, S.; Stachniss, C.; Gall, J. SemanticKITTI: A dataset for semantic scene understanding of lidar sequences. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9297–9307.
- Armeni, I.; Sener, O.; Zamir, A.R.; Jiang, H.; Brilakis, I.; Fischer, M.; Savarese, S. 3D semantic parsing of large-scale indoor spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1534–1543.
- Qi, C.R.; Chen, X.; Litany, O.; Guibas, L.J. ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes. arXiv 2020, arXiv:2001.10692.
- Zhou, Y.; Tuzel, O. VoxelNet: End-to-end learning for point cloud based 3D object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4490–4499.
- Shi, S.; Wang, X.; Li, H. PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud. arXiv 2018, arXiv:1812.04244.
- Hanocka, R.; Hertz, A.; Fish, N.; Giryes, R.; Fleishman, S.; Cohen-Or, D. MeshCNN: A network with an edge. ACM Trans. Graph. (TOG) 2019, 38, 1–12.
- Wang, S.; Zhu, J.; Zhang, R. Meta-RangeSeg: LiDAR Sequence Semantic Segmentation Using Multiple Feature Aggregation. arXiv 2022, arXiv:2202.13377.
- Fan, H.; Su, H.; Guibas, L.J. A Point Set Generation Network for 3D Object Reconstruction from a Single Image. arXiv 2016, arXiv:1612.00603.
- Chang, A.X.; Funkhouser, T.; Guibas, L.; Hanrahan, P.; Huang, Q.; Li, Z.; Savarese, S.; Savva, M.; Song, S.; Su, H.; et al. ShapeNet: An Information-Rich 3D Model Repository. arXiv 2015, arXiv:1512.03012.
- Choy, C.B.; Xu, D.; Gwak, J.; Chen, K.; Savarese, S. 3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer: Cham, Switzerland, 2016; pp. 628–644.
- Lin, C.H.; Kong, C.; Lucey, S. Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction. arXiv 2017, arXiv:1706.07036.
- Zhang, Y.; Liu, Z.; Liu, T.; Peng, B.; Li, X. RealPoint3D: An Efficient Generation Network for 3D Object Reconstruction From a Single Image. IEEE Access 2019, 7, 57539–57549.
- Xiang, Y.; Kim, W.; Chen, W.; Ji, J.; Choy, C.; Su, H.; Mottaghi, R.; Guibas, L.; Savarese, S. ObjectNet3D: A large scale database for 3D object recognition. In Proceedings of the Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part VIII 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 160–176.
- Navaneet, K.L.; Mathew, A.; Kashyap, S.; Hung, W.C.; Jampani, V.; Babu, R.V. From Image Collections to Point Clouds with Self-supervised Shape and Pose Networks. arXiv 2020, arXiv:2005.01939.
- Sun, X.; Wu, J.; Zhang, X.; Zhang, Z.; Zhang, C.; Xue, T.; Tenenbaum, J.B.; Freeman, W.T. Pix3D: Dataset and methods for single-image 3D shape modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2974–2983.
- Bautista, M.A.; Talbott, W.; Zhai, S.; Srivastava, N.; Susskind, J.M. On the generalization of learning-based 3D reconstruction. arXiv 2020, arXiv:2006.15427.
- Rezende, D.J.; Eslami, S.M.A.; Mohamed, S.; Battaglia, P.; Jaderberg, M.; Heess, N. Unsupervised Learning of 3D Structure from Images. arXiv 2016, arXiv:1607.00662.
- LeCun, Y. The MNIST Database of Handwritten Digits. 1998. Available online: http://yann.lecun.com/exdb/mnist/ (accessed on 12 November 2023).
- Zhang, X.; Zhang, Z.; Zhang, C.; Tenenbaum, J.B.; Freeman, W.T.; Wu, J. Learning to Reconstruct Shapes from Unseen Classes. arXiv 2018, arXiv:1812.11166.
- Wu, J.; Wang, Y.; Xue, T.; Sun, X.; Freeman, W.T.; Tenenbaum, J.B. MarrNet: 3D Shape Reconstruction via 2.5D Sketches. arXiv 2017, arXiv:1711.03129.
- Xiang, Y.; Mottaghi, R.; Savarese, S. Beyond PASCAL: A Benchmark for 3D Object Detection in the Wild. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Steamboat Springs, CO, USA, 24–26 March 2014.
- Yan, X.; Yang, J.; Yumer, E.; Guo, Y.; Lee, H. Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision. arXiv 2016, arXiv:1612.00814.
- Zhu, R.; Galoogahi, H.K.; Wang, C.; Lucey, S. Rethinking Reprojection: Closing the Loop for Pose-Aware Shape Reconstruction from a Single Image. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 57–65.
- Xiao, J.; Hays, J.; Ehinger, K.A.; Oliva, A.; Torralba, A. SUN database: Large-scale scene recognition from abbey to zoo. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 3485–3492.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the Computer Vision—ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part V 13. Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755.
- Wu, J.; Zhang, C.; Xue, T.; Freeman, W.T.; Tenenbaum, J.B. Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling. arXiv 2016, arXiv:1610.07584.
- Wu, Z.; Song, S.; Khosla, A.; Tang, X.; Xiao, J. 3D ShapeNets for 2.5D Object Recognition and Next-Best-View Prediction. arXiv 2014, arXiv:1406.5670.
- Lim, J.J.; Pirsiavash, H.; Torralba, A. Parsing IKEA objects: Fine pose estimation. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 2992–2999.
- Xie, H.; Yao, H.; Zhang, S.; Zhou, S.; Sun, W. Pix2Vox++: Multi-scale Context-aware 3D Object Reconstruction from Single and Multiple Images. Int. J. Comput. Vis. 2020, 128, 2919–2935.
- Gwak, J.; Choy, C.B.; Garg, A.; Chandraker, M.; Savarese, S. Weakly supervised 3D Reconstruction with Adversarial Constraint. arXiv 2017, arXiv:1705.10904.
- Banani, M.E.; Corso, J.J.; Fouhey, D.F. Novel Object Viewpoint Estimation through Reconstruction Alignment. arXiv 2020, arXiv:2006.03586.
- Turk, G.; Levoy, M. Zippered polygon meshes from range images. In Proceedings of the 21st Annual Conference on Computer Graphics and Interactive Techniques, Orlando, FL, USA, 24–29 July 1994; pp. 311–318.
- Hoang, L.; Lee, S.H.; Kwon, O.H.; Kwon, K.R. A Deep Learning Method for 3D Object Classification Using the Wave Kernel Signature and a Center Point of the 3D-Triangle Mesh. Electronics 2019, 8, 1196.
- Kato, H.; Ushiku, Y.; Harada, T. Neural 3D Mesh Renderer. arXiv 2017, arXiv:1711.07566.
- Pan, J.; Li, J.Y.; Han, X.; Jia, K. Residual MeshNet: Learning to Deform Meshes for Single-View 3D Reconstruction. In Proceedings of the 2018 International Conference on 3D Vision (3DV), Verona, Italy, 5–8 September 2018; pp. 719–727.
- Wang, N.; Zhang, Y.; Li, Z.; Fu, Y.; Liu, W.; Jiang, Y.G. Pixel2Mesh: Generating 3D mesh models from single RGB images. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 52–67.
- Popov, S.; Bauszat, P.; Ferrari, V. CoReNet: Coherent 3D scene reconstruction from a single RGB image. arXiv 2020, arXiv:2004.12989.
- Dai, A.; Chang, A.X.; Savva, M.; Halber, M.; Funkhouser, T.; Nießner, M. ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5828–5839.
- Shilane, P.; Min, P.; Kazhdan, M.; Funkhouser, T. The Princeton Shape Benchmark. In Proceedings of the Shape Modeling Applications, Genova, Italy, 7–9 June 2004; IEEE: Piscataway, NJ, USA, 2004; pp. 167–178.
- Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
- Henn, A.; Gröger, G.; Stroh, V.; Plümer, L. Model driven reconstruction of roofs from sparse LIDAR point clouds. ISPRS J. Photogramm. Remote Sens. 2013, 76, 17–29.
- Buyukdemircioglu, M.; Kocaman, S.; Kada, M. Deep learning for 3D building reconstruction: A review. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2022, 43, 359–366.
- Tran, H.; Khoshelham, K. Procedural reconstruction of 3D indoor models from lidar data using reversible jump Markov Chain Monte Carlo. Remote Sens. 2020, 12, 838.
- Mura, C.; Mattausch, O.; Pajarola, R. Piecewise-planar reconstruction of multi-room interiors with arbitrary wall arrangements. In Proceedings of the Computer Graphics Forum; Wiley Online Library: Hoboken, NJ, USA, 2016; Volume 35, pp. 179–188.
- Oesau, S.; Lafarge, F.; Alliez, P. Indoor scene reconstruction using feature sensitive primitive extraction and graph-cut. ISPRS J. Photogramm. Remote Sens. 2014, 90, 68–82.
- Khoshelham, K.; Díaz-Vilariño, L. 3D modelling of interior spaces: Learning the language of indoor architecture. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2014, 40, 321–326.
- Tran, H.; Khoshelham, K.; Kealy, A.; Díaz-Vilariño, L. Shape grammar approach to 3D modeling of indoor environments using point clouds. J. Comput. Civ. Eng. 2019, 33, 04018055.
- Wonka, P.; Wimmer, M.; Sillion, F.; Ribarsky, W. Instant architecture. ACM Trans. Graph. (TOG) 2003, 22, 669–677.
- Becker, S. Generation and application of rules for quality dependent façade reconstruction. ISPRS J. Photogramm. Remote Sens. 2009, 64, 640–653.
- Dick, A.R.; Torr, P.H.; Cipolla, R. Modelling and interpretation of architecture from several images. Int. J. Comput. Vis. 2004, 60, 111–134.
- Becker, S.; Peter, M.; Fritsch, D. Grammar-supported 3D indoor reconstruction from point clouds for "as-built" BIM. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2015, 2, 17–24.
- Döllner, J. Geospatial artificial intelligence: Potentials of machine learning for 3D point clouds and geospatial digital twins. PFG-Photogramm. Remote Sens. Geoinf. Sci. 2020, 88, 15–24.
- Ma, L.; Liu, Y.; Zhang, X.; Ye, Y.; Yin, G.; Johnson, B.A. Deep learning in remote sensing applications: A meta-analysis and review. ISPRS J. Photogramm. Remote Sens. 2019, 152, 166–177.
- Hoeser, T.; Kuenzer, C. Object detection and image segmentation with deep learning on earth observation data: A review-Part I: Evolution and recent trends. Remote Sens. 2020, 12, 1667.
- Rottensteiner, F.; Sohn, G.; Jung, J.; Gerke, M.; Baillard, C.; Benitez, S.; Breitkopf, U. The ISPRS benchmark on urban object classification and 3D building reconstruction. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2012, I-3, 293–298.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Liu, C.; Kong, D.; Wang, S.; Wang, Z.; Li, J.; Yin, B. Deep3D reconstruction: Methods, data, and challenges. Front. Inf. Technol. Electron. Eng. 2021, 22, 652–672.
- Bhat, S.F.; Alhashim, I.; Wonka, P. AdaBins: Depth estimation using adaptive bins. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 4009–4018.
- Kasieczka, G.; Nachman, B.; Shih, D.; Amram, O.; Andreassen, A.; Benkendorfer, K.; Bortolato, B.; Brooijmans, G.; Canelli, F.; Collins, J.H.; et al. The LHC Olympics 2020: A community challenge for anomaly detection in high energy physics. Rep. Prog. Phys. 2021, 84, 124201.
- Yu, X.; Rao, Y.; Wang, Z.; Liu, Z.; Lu, J.; Zhou, J. PoinTr: Diverse point cloud completion with geometry-aware transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 12498–12507.
- Peng, S.; Niemeyer, M.; Mescheder, L.; Pollefeys, M.; Geiger, A. Convolutional occupancy networks. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part III 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 523–540.
- Kato, H.; Beker, D.; Morariu, M.; Ando, T.; Matsuoka, T.; Kehl, W.; Gaidon, A. Differentiable rendering: A survey. arXiv 2020, arXiv:2006.12057.
- Fu, K.; Peng, J.; He, Q.; Zhang, H. Single image 3D object reconstruction based on deep learning: A review. Multimed. Tools Appl. 2021, 80, 463–498.
- Zhang, Y.; Huo, K.; Liu, Z.; Zang, Y.; Liu, Y.; Li, X.; Zhang, Q.; Wang, C. PGNet: A Part-based Generative Network for 3D object reconstruction. Knowl.-Based Syst. 2020, 194, 105574.
- Lu, Q.; Xiao, M.; Lu, Y.; Yuan, X.; Yu, Y. Attention-based dense point cloud reconstruction from a single image. IEEE Access 2019, 7, 137420–137431.
- Yuniarti, A.; Suciati, N. A review of deep learning techniques for 3D reconstruction of 2D images. In Proceedings of the 2019 12th International Conference on Information & Communication Technology and System (ICTS), Surabaya, Indonesia, 18 July 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 327–331.
- Monnier, T.; Fisher, M.; Efros, A.A.; Aubry, M. Share with thy neighbors: Single-view reconstruction by cross-instance consistency. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 285–303.
- Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232.
- Hu, T.; Wang, L.; Xu, X.; Liu, S.; Jia, J. Self-supervised 3D mesh reconstruction from single images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6002–6011.
- Joung, S.; Kim, S.; Kim, M.; Kim, I.J.; Sohn, K. Learning canonical 3D object representation for fine-grained recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 1035–1045.
- Niemeyer, M.; Mescheder, L.; Oechsle, M.; Geiger, A. Differentiable volumetric rendering: Learning implicit 3D representations without 3D supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3504–3515.
- Biundini, I.Z.; Pinto, M.F.; Melo, A.G.; Marcato, A.L.; Honório, L.M.; Aguiar, M.J. A framework for coverage path planning optimization based on point cloud for structural inspection. Sensors 2021, 21, 570.
- Chibane, J.; Alldieck, T.; Pons-Moll, G. Implicit functions in feature space for 3D shape reconstruction and completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 6970–6981.
- Collins, J.; Goel, S.; Deng, K.; Luthra, A.; Xu, L.; Gundogdu, E.; Zhang, X.; Vicente, T.F.Y.; Dideriksen, T.; Arora, H.; et al. ABO: Dataset and benchmarks for real-world 3D object understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 21126–21136.
- Sahu, C.K.; Young, C.; Rai, R. Artificial intelligence (AI) in augmented reality (AR)-assisted manufacturing applications: A review. Int. J. Prod. Res. 2021, 59, 4903–4959.
- Mescheder, L.; Oechsle, M.; Niemeyer, M.; Nowozin, S.; Geiger, A. Occupancy networks: Learning 3D reconstruction in function space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4460–4470.
- Liu, R.; Wu, R.; Van Hoorick, B.; Tokmakov, P.; Zakharov, S.; Vondrick, C. Zero-1-to-3: Zero-shot one image to 3D object. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 9298–9309.
- Xu, D.; Jiang, Y.; Wang, P.; Fan, Z.; Shi, H.; Wang, Z. SinNeRF: Training neural radiance fields on complex scenes from a single image. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 736–753.
- Kanazawa, A.; Tulsiani, S.; Efros, A.A.; Malik, J. Learning category-specific mesh reconstruction from image collections. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 371–386.
- Zhou, T.; Brown, M.; Snavely, N.; Lowe, D.G. Unsupervised learning of depth and ego-motion from video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1851–1858.
- Yu, A.; Ye, V.; Tancik, M.; Kanazawa, A. pixelNeRF: Neural radiance fields from one or few images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 4578–4587.
- Sitzmann, V.; Zollhöfer, M.; Wetzstein, G. Scene representation networks: Continuous 3D-structure-aware neural scene representations. Adv. Neural Inf. Process. Syst. 2019, 32.
- Enebuse, I.; Foo, M.; Ibrahim, B.S.K.K.; Ahmed, H.; Supmak, F.; Eyobu, O.S. A comparative review of hand-eye calibration techniques for vision guided robots. IEEE Access 2021, 9, 113143–113155.
- Tatarchenko, M.; Richter, S.R.; Ranftl, R.; Li, Z.; Koltun, V.; Brox, T. What do single-view 3D reconstruction networks learn? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3405–3414.
- Sünderhauf, N.; Brock, O.; Scheirer, W.; Hadsell, R.; Fox, D.; Leitner, J.; Upcroft, B.; Abbeel, P.; Burgard, W.; Milford, M.; et al. The limits and potentials of deep learning for robotics. Int. J. Robot. Res. 2018, 37, 405–420.
- Han, X.F.; Laga, H.; Bennamoun, M. Image-based 3D object reconstruction: State-of-the-art and trends in the deep learning era. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 1578–1604.
- Varol, G.; Ceylan, D.; Russell, B.; Yang, J.; Yumer, E.; Laptev, I.; Schmid, C. BodyNet: Volumetric inference of 3D human body shapes. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 20–36.
- Najibi, M.; Ji, J.; Zhou, Y.; Qi, C.R.; Yan, X.; Ettinger, S.; Anguelov, D. Motion inspired unsupervised perception and prediction in autonomous driving. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 424–443.
- Xu, Q.; Wang, W.; Ceylan, D.; Mech, R.; Neumann, U. DISN: Deep implicit surface network for high-quality single-view 3D reconstruction. Adv. Neural Inf. Process. Syst. 2019, 32.
- Creswell, A.; White, T.; Dumoulin, V.; Arulkumaran, K.; Sengupta, B.; Bharath, A.A. Generative adversarial networks: An overview. IEEE Signal Process. Mag. 2018, 35, 53–65.
- Zhu, J.Y.; Zhang, Z.; Zhang, C.; Wu, J.; Torralba, A.; Tenenbaum, J.; Freeman, B. Visual object networks: Image generation with disentangled 3D representations. Adv. Neural Inf. Process. Syst. 2018, 31.
- Gadelha, M.; Maji, S.; Wang, R. 3D shape induction from 2D views of multiple objects. In Proceedings of the 2017 International Conference on 3D Vision (3DV), Qingdao, China, 10–12 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 402–411.
- Chan, E.R.; Monteiro, M.; Kellnhofer, P.; Wu, J.; Wetzstein, G. pi-GAN: Periodic implicit generative adversarial networks for 3D-aware image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 5799–5809.
- Park, J.J.; Florence, P.; Straub, J.; Newcombe, R.; Lovegrove, S. DeepSDF: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 165–174.
- Gao, J.; Shen, T.; Wang, Z.; Chen, W.; Yin, K.; Li, D.; Litany, O.; Gojcic, Z.; Fidler, S. GET3D: A generative model of high quality 3D textured shapes learned from images. Adv. Neural Inf. Process. Syst. 2022, 35, 31841–31854.
- Mittal, P.; Cheng, Y.C.; Singh, M.; Tulsiani, S. AutoSDF: Shape priors for 3D completion, reconstruction and generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 306–315.
- Li, X.; Liu, S.; Kim, K.; De Mello, S.; Jampani, V.; Yang, M.H.; Kautz, J. Self-supervised single-view 3D reconstruction via semantic consistency. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XIV 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 677–693.
- de Melo, C.M.; Torralba, A.; Guibas, L.; DiCarlo, J.; Chellappa, R.; Hodgins, J. Next-generation deep learning based on simulators and synthetic data. Trends Cogn. Sci. 2022, 26.
- Loper, M.M.; Black, M.J. OpenDR: An approximate differentiable renderer. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part VII 13. Springer: Berlin/Heidelberg, Germany, 2014; pp. 154–169.
- Ravi, N.; Reizenstein, J.; Novotny, D.; Gordon, T.; Lo, W.Y.; Johnson, J.; Gkioxari, G. Accelerating 3D deep learning with PyTorch3D. arXiv 2020, arXiv:2007.08501.
- Michel, O.; Bar-On, R.; Liu, R.; Benaim, S.; Hanocka, R. Text2Mesh: Text-driven neural stylization for meshes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 13492–13502.
- Fahim, G.; Amin, K.; Zarif, S. Single-view 3D reconstruction: A survey of deep learning methods. Comput. Graph. 2021, 94, 164–190.
- Tang, J.; Han, X.; Pan, J.; Jia, K.; Tong, X. A skeleton-bridged deep learning approach for generating meshes of complex topologies from single RGB images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4541–4550.
- Xu, Q.; Nie, Z.; Xu, H.; Zhou, H.; Attar, H.R.; Li, N.; Xie, F.; Liu, X.J. SuperMeshing: A new deep learning architecture for increasing the mesh density of physical fields in metal forming numerical simulation. J. Appl. Mech. 2022, 89, 011002.
- Dahnert, M.; Hou, J.; Nießner, M.; Dai, A. Panoptic 3D scene reconstruction from a single RGB image. Adv. Neural Inf. Process. Syst. 2021, 34, 8282–8293.
- Liu, F.; Liu, X. Voxel-based 3D detection and reconstruction of multiple objects from a single image. Adv. Neural Inf. Process. Syst. 2021, 34, 2413–2426.
- Pan, J.; Han, X.; Chen, W.; Tang, J.; Jia, K. Deep mesh reconstruction from single RGB images via topology modification networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9964–9973.
- Mustikovela, S.K.; De Mello, S.; Prakash, A.; Iqbal, U.; Liu, S.; Nguyen-Phuoc, T.; Rother, C.; Kautz, J. Self-supervised object detection via generative image synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 8609–8618.
- Huang, Z.; Jampani, V.; Thai, A.; Li, Y.; Stojanov, S.; Rehg, J.M. ShapeClipper: Scalable 3D Shape Learning from Single-View Images via Geometric and CLIP-based Consistency. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 12912–12922.
- Kar, A.; Häne, C.; Malik, J. Learning a multi-view stereo machine. Adv. Neural Inf. Process. Syst. 2017, 30.
- Yang, G.; Cui, Y.; Belongie, S.; Hariharan, B. Learning single-view 3D reconstruction with limited pose supervision. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 86–101.
- Huang, Z.; Stojanov, S.; Thai, A.; Jampani, V.; Rehg, J.M. Planes vs. chairs: Category-guided 3D shape learning without any 3D cues. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 727–744.
- Jiao, L.; Huang, Z.; Liu, X.; Yang, Y.; Ma, M.; Zhao, J.; You, C.; Hou, B.; Yang, S.; Liu, F.; et al. Brain-inspired Remote Sensing Interpretation: A Comprehensive Survey. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 2992–3033.
- Yang, Z.; Ren, Z.; Bautista, M.A.; Zhang, Z.; Shan, Q.; Huang, Q. FvOR: Robust joint shape and pose optimization for few-view object reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 2497–2507.
- Bechtold, J.; Tatarchenko, M.; Fischer, V.; Brox, T. Fostering generalization in single-view 3D reconstruction by learning a hierarchy of local and global shape priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15880–15889.
- Thai, A.; Stojanov, S.; Upadhya, V.; Rehg, J.M. 3D reconstruction of novel object shapes from single images. In Proceedings of the 2021 International Conference on 3D Vision (3DV), London, UK, 1–3 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 85–95.
- Yang, X.; Lin, G.; Zhou, L. Single-View 3D Mesh Reconstruction for Seen and Unseen Categories. IEEE Trans. Image Process. 2023, 32, 3746–3758.
- Anciukevicius, T.; Fox-Roberts, P.; Rosten, E.; Henderson, P. Unsupervised Causal Generative Understanding of Images. Adv. Neural Inf. Process. Syst. 2022, 35, 37037–37054.
- Fan, H.; Su, H.; Guibas, L.J. A point set generation network for 3D object reconstruction from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 605–613.
- Niemeyer, M.; Geiger, A. GIRAFFE: Representing scenes as compositional generative neural feature fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 11453–11464.
- Or-El, R.; Luo, X.; Shan, M.; Shechtman, E.; Park, J.J.; Kemelmacher-Shlizerman, I. StyleSDF: High-resolution 3D-consistent image and geometry generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 13503–13513.
- Xie, H.; Yao, H.; Sun, X.; Zhou, S.; Zhang, S. Pix2Vox: Context-aware 3D reconstruction from single and multi-view images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2690–2698.
- Melas-Kyriazi, L.; Laina, I.; Rupprecht, C.; Vedaldi, A. RealFusion: 360° reconstruction of any object from a single image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 8446–8455.
- Xiang, P.; Wen, X.; Liu, Y.S.; Cao, Y.P.; Wan, P.; Zheng, W.; Han, Z. Snowflake point deconvolution for point cloud completion and generation with skip-transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 6320–6338.
- Boulch, A.; Marlet, R. POCO: Point convolution for surface reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 6302–6314.
- Wen, X.; Zhou, J.; Liu, Y.S.; Su, H.; Dong, Z.; Han, Z. 3D shape reconstruction from 2D images with disentangled attribute flow. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 3803–3813.
- Wang, D.; Cui, X.; Chen, X.; Zou, Z.; Shi, T.; Salcudean, S.; Wang, Z.J.; Ward, R. Multi-view 3D reconstruction with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 5722–5731.
- Kirillov, A.; Wu, Y.; He, K.; Girshick, R. PointRend: Image segmentation as rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9799–9808.
- Chen, Z.; Zhang, H. Learning implicit fields for generative shape modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5939–5948.
- Wen, C.; Zhang, Y.; Li, Z.; Fu, Y. Pixel2Mesh++: Multi-view 3D mesh generation via deformation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1042–1051.
- Jiang, Y.; Ji, D.; Han, Z.; Zwicker, M. SDFDiff: Differentiable rendering of signed distance fields for 3D shape optimization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1251–1261.
- Wu, J.; Zhang, C.; Zhang, X.; Zhang, Z.; Freeman, W.T.; Tenenbaum, J.B. Learning shape priors for single-view 3D completion and reconstruction. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 646–662.
- Ma, W.C.; Yang, A.J.; Wang, S.; Urtasun, R.; Torralba, A. Virtual correspondence: Humans as a cue for extreme-view geometry. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 15924–15934.
- Goodwin, W.; Vaze, S.; Havoutis, I.; Posner, I. Zero-shot category-level object pose estimation. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 516–532.
- Myronenko, A.; Song, X. Point Set Registration: Coherent Point Drift. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 2262–2275.
- Iglesias, J.P.; Olsson, C.; Kahl, F. Global Optimality for Point Set Registration Using Semidefinite Programming. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 8284–8292.
- Sturm, J.; Engelhard, N.; Endres, F.; Burgard, W.; Cremers, D. A benchmark for the evaluation of RGB-D SLAM systems. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 7–12 October 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 573–580.
- Yew, Z.J.; Lee, G.H. RPM-Net: Robust Point Matching using Learned Features. arXiv 2020, arXiv:2003.13479.
- Lu, W.; Wan, G.; Zhou, Y.; Fu, X.; Yuan, P.; Song, S. DeepICP: An End-to-End Deep Neural Network for 3D Point Cloud Registration. arXiv 2019, arXiv:1905.04153.
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 3354–3361.
- Lu, W.; Zhou, Y.; Wan, G.; Hou, S.; Song, S. L3-Net: Towards learning based LiDAR localization for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 6389–6398.
- Gojcic, Z.; Zhou, C.; Wegner, J.D.; Wieser, A. The Perfect Match: 3D Point Cloud Matching with Smoothed Densities. arXiv 2018, arXiv:1811.06879.
- Zeng, A.; Song, S.; Nießner, M.; Fisher, M.; Xiao, J.; Funkhouser, T. 3DMatch: Learning local geometric descriptors from RGB-D reconstructions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1802–1811.
- Gojcic, Z.; Zhou, C.; Wegner, J.D.; Guibas, L.J.; Birdal, T. Learning multiview 3D point cloud registration. arXiv 2020, arXiv:2001.05119.
- Choi, S.; Zhou, Q.Y.; Koltun, V. Robust reconstruction of indoor scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5556–5565.
- Ma, J.; Jiang, X.; Fan, A.; Jiang, J.; Yan, J. Image matching from handcrafted to deep features: A survey. Int. J. Comput. Vis. 2021, 129, 23–79.
- Sotiras, A.; Davatzikos, C.; Paragios, N. Deformable medical image registration: A survey. IEEE Trans. Med. Imaging 2013, 32, 1153–1190.
- Yang, J.; Li, H.; Campbell, D.; Jia, Y. Go-ICP: A globally optimal solution to 3D ICP point-set registration. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 2241–2254.
- Huang, X.; Mei, G.; Zhang, J.; Abbas, R. A comprehensive survey on point cloud registration. arXiv 2021, arXiv:2103.02690.
- Brynte, L.; Larsson, V.; Iglesias, J.P.; Olsson, C.; Kahl, F. On the tightness of semidefinite relaxations for rotation estimation. J. Math. Imaging Vis. 2022, 64, 57–67.
- Yang, H.; Carlone, L. Certifiably optimal outlier-robust geometric perception: Semidefinite relaxations and scalable global optimization. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 2816–2834.
- Huang, S.; Gojcic, Z.; Usvyatsov, M.; Wieser, A.; Schindler, K. PREDATOR: Registration of 3D point clouds with low overlap. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 4267–4276.
- Yew, Z.J.; Lee, G.H. REGTR: End-to-end point cloud correspondences with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 6677–6686.
- Bai, X.; Luo, Z.; Zhou, L.; Chen, H.; Li, L.; Hu, Z.; Fu, H.; Tai, C.L. PointDSC: Robust point cloud registration using deep spatial consistency. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15859–15869.
- Fu, K.; Liu, S.; Luo, X.; Wang, M. Robust point cloud registration framework based on deep graph matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8893–8902.
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. arXiv 2017, arXiv:1706.02413.
- Ren, S.; Chen, X.; Cai, H.; Wang, Y.; Liang, H.; Li, H. Color point cloud registration algorithm based on hue. Appl. Sci. 2021, 11, 5431.
- Yao, W.; Chu, T.; Tang, W.; Wang, J.; Cao, X.; Zhao, F.; Li, K.; Geng, G.; Zhou, M. SPPD: A Novel Reassembly Method for 3D Terracotta Warrior Fragments Based on Fracture Surface Information. ISPRS Int. J. Geo-Inf. 2021, 10, 525.
- Liu, J.; Liang, Y.; Xu, D.; Gong, X.; Hyyppä, J. A ubiquitous positioning solution of integrating GNSS with LiDAR odometry and 3D map for autonomous driving in urban environments. J. Geod. 2023, 97, 39.
- Du, G.; Wang, K.; Lian, S.; Zhao, K. Vision-based robotic grasping from object localization, object pose estimation to grasp estimation for parallel grippers: A review. Artif. Intell. Rev. 2021, 54, 1677–1734.
- Choy, C.; Park, J.; Koltun, V. Fully convolutional geometric features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8958–8966.
- Lee, J.; Kim, S.; Cho, M.; Park, J. Deep Hough voting for robust global registration. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 15994–16003.
- Lu, F.; Chen, G.; Liu, Y.; Zhang, L.; Qu, S.; Liu, S.; Gu, R. HRegNet: A hierarchical network for large-scale outdoor LiDAR point cloud registration. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 16014–16023.
- Sarode, V.; Dhagat, A.; Srivatsan, R.A.; Zevallos, N.; Lucey, S.; Choset, H. MaskNet: A Fully-Convolutional Network to Estimate Inlier Points. arXiv 2020, arXiv:2010.09185.
- Pistilli, F.; Fracastoro, G.; Valsesia, D.; Magli, E. Learning Graph-Convolutional Representations for Point Cloud Denoising. arXiv 2020, arXiv:2007.02578.
- Luo, S.; Hu, W. Differentiable Manifold Reconstruction for Point Cloud Denoising. arXiv 2020, arXiv:2007.13551.
- Yu, L.; Li, X.; Fu, C.; Cohen-Or, D.; Heng, P. PU-Net: Point Cloud Upsampling Network. arXiv 2018, arXiv:1801.06761.
- Wang, Y.; Wu, S.; Huang, H.; Cohen-Or, D.; Sorkine-Hornung, O. Patch-based Progressive 3D Point Set Upsampling. arXiv 2018, arXiv:1811.11286.
- Nezhadarya, E.; Taghavi, E.; Liu, B.; Luo, J. Adaptive Hierarchical Down-Sampling for Point Cloud Classification. arXiv 2019, arXiv:1904.08506.
- Lang, I.; Manor, A.; Avidan, S. SampleNet: Differentiable Point Cloud Sampling. arXiv 2019, arXiv:1912.03663.
- Zaman, A.; Yangyu, F.; Ayub, M.S.; Irfan, M.; Guoyun, L.; Shiya, L. CMDGAT: Knowledge extraction and retention based continual graph attention network for point cloud registration. Expert Syst. Appl. 2023, 214, 119098.
- Zhang, Z.; Li, T.; Tang, X.; Lei, X.; Peng, Y. Introducing Improved Transformer to Land Cover Classification Using Multispectral LiDAR Point Clouds. Remote Sens. 2022, 14, 3808.
- Huang, X.; Li, S.; Zuo, Y.; Fang, Y.; Zhang, J.; Zhao, X. Unsupervised point cloud registration by learning unified Gaussian mixture models. IEEE Robot. Autom. Lett. 2022, 7, 7028–7035.
- Zhao, Y.; Fan, L. Review on Deep Learning Algorithms and Benchmark Datasets for Pairwise Global Point Cloud Registration. Remote Sens. 2023, 15, 2060.
- Shi, C.; Chen, X.; Huang, K.; Xiao, J.; Lu, H.; Stachniss, C. Keypoint matching for point cloud registration using multiplex dynamic graph attention networks. IEEE Robot. Autom. Lett. 2021, 6, 8221–8228.
- Wu, Y.; Zhang, Y.; Fan, X.; Gong, M.; Miao, Q.; Ma, W. INENet: Inliers estimation network with similarity learning for partial overlapping registration. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 1413–1426.
- Wu, Y.; Zhang, Y.; Ma, W.; Gong, M.; Fan, X.; Zhang, M.; Qin, A.; Miao, Q. RORNet: Partial-to-partial registration network with reliable overlapping representations. IEEE Trans. Neural Netw. Learn. Syst. 2023.
- Chen, C.; Wu, Y.; Dai, Q.; Zhou, H.Y.; Xu, M.; Yang, S.; Han, X.; Yu, Y. A survey on graph neural networks and graph transformers in computer vision: A task-oriented perspective. arXiv 2022, arXiv:2209.13232.
- Simonovsky, M.; Komodakis, N. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3693–3702.
- Mou, C.; Zhang, J.; Wu, Z. Dynamic attentive graph learning for image restoration. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 4328–4337.
- Luo, S.; Hu, W. Score-based point cloud denoising. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 4583–4592.
- Chen, H.; Wei, Z.; Li, X.; Xu, Y.; Wei, M.; Wang, J. RePCD-Net: Feature-aware recurrent point cloud denoising network. Int. J. Comput. Vis. 2022, 130, 615–629.
- Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic graph CNN for learning on point clouds. ACM Trans. Graph. (TOG) 2019, 38, 1–12.
- Chen, H.; Luo, S.; Gao, X.; Hu, W. Unsupervised learning of geometric sampling invariant representations for 3D point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 893–903.
- Zhou, L.; Sun, G.; Li, Y.; Li, W.; Su, Z. Point cloud denoising review: From classical to deep learning-based approaches. Graph. Model. 2022, 121, 101140.
- Liu, W.; Sun, J.; Li, W.; Hu, T.; Wang, P. Deep learning on point clouds and its application: A survey. Sensors 2019, 19, 4188.
- Yin, T.; Zhou, X.; Krähenbühl, P. Multimodal virtual point 3D detection. Adv. Neural Inf. Process. Syst. 2021, 34, 16494–16507.
- Xu, Q.; Zhou, Y.; Wang, W.; Qi, C.R.; Anguelov, D. SPG: Unsupervised domain adaptation for 3D object detection via semantic point generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 15446–15456.
- Xiang, P.; Wen, X.; Liu, Y.S.; Cao, Y.P.; Wan, P.; Zheng, W.; Han, Z. SnowflakeNet: Point cloud completion by snowflake point deconvolution with skip-transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 5499–5509.
- Li, R.; Li, X.; Fu, C.W.; Cohen-Or, D.; Heng, P.A. PU-GAN: A point cloud upsampling adversarial network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7203–7212.
- Wang, X.; Ang, M.H., Jr.; Lee, G.H. Cascaded refinement network for point cloud completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 790–799.
- Lang, I.; Manor, A.; Avidan, S. SampleNet: Differentiable point cloud sampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 7578–7588.
- Chen, C.; Chen, Z.; Zhang, J.; Tao, D. SASA: Semantics-augmented set abstraction for point-based 3D object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 22 February–1 March 2022; Volume 36, pp. 221–229.
- Cui, B.; Tao, W.; Zhao, H. High-precision 3D reconstruction for small-to-medium-sized objects utilizing line-structured light scanning: A review. Remote Sens. 2021, 13, 4457.
- Liu, K.; Gao, Z.; Lin, F.; Chen, B.M. FG-Net: A fast and accurate framework for large-scale LiDAR point cloud understanding. IEEE Trans. Cybern. 2022, 53, 553–564.
- Liu, K.; Gao, Z.; Lin, F.; Chen, B.M. FG-Net: Fast large-scale LiDAR point clouds understanding network leveraging correlated feature mining and geometric-aware modelling. arXiv 2020, arXiv:2012.09439.
- Wang, Y.; Yan, C.; Feng, Y.; Du, S.; Dai, Q.; Gao, Y. STORM: Structure-based overlap matching for partial point cloud registration. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 1135–1149.
- Yang, L.; Shrestha, R.; Li, W.; Liu, S.; Zhang, G.; Cui, Z.; Tan, P. SceneSqueezer: Learning to compress scene for camera relocalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8259–8268.
- Wang, T.; Yuan, L.; Chen, Y.; Feng, J.; Yan, S. PnP-DETR: Towards efficient visual analysis with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 4661–4670.
- Zhu, M.; Ghaffari, M.; Peng, H. Correspondence-free point cloud registration with SO(3)-equivariant implicit shape representations. In Proceedings of the Conference on Robot Learning, Auckland, New Zealand, 14–18 December 2022; pp. 1412–1422.
- Wang, H.; Pang, J.; Lodhi, M.A.; Tian, Y.; Tian, D. FESTA: Flow estimation via spatial-temporal attention for scene point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14173–14182.
- Lv, C.; Lin, W.; Zhao, B. Approximate intrinsic voxel structure for point cloud simplification. IEEE Trans. Image Process. 2021, 30, 7241–7255.
- Yang, P.; Snoek, C.G.; Asano, Y.M. Self-Ordering Point Clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 15813–15822.
- Yuan, W.; Khot, T.; Held, D.; Mertz, C.; Hebert, M. PCN: Point completion network. In Proceedings of the 2018 International Conference on 3D Vision (3DV), Verona, Italy, 5–8 September 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 728–737.
- Zamanakos, G.; Tsochatzidis, L.; Amanatiadis, A.; Pratikakis, I. A comprehensive survey of LiDAR-based 3D object detection methods with deep learning for autonomous driving. Comput. Graph. 2021, 99, 153–181.
- Chen, X.; Chen, B.; Mitra, N.J. Unpaired point cloud completion on real scans using adversarial training. arXiv 2019, arXiv:1904.00069.
- Achituve, I.; Maron, H.; Chechik, G. Self-supervised learning for domain adaptation on point clouds. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Virtual, 5–9 January 2021; pp. 123–133.
- Liu, M.; Sheng, L.; Yang, S.; Shao, J.; Hu, S.M. Morphing and sampling network for dense point cloud completion. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11596–11603.
- Zhou, L.; Du, Y.; Wu, J. 3D shape generation and completion through point-voxel diffusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 5826–5835.
- Xie, H.; Yao, H.; Zhou, S.; Mao, J.; Zhang, S.; Sun, W. GRNet: Gridding residual network for dense point cloud completion. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 365–381.
- Pan, L.; Chen, X.; Cai, Z.; Zhang, J.; Zhao, H.; Yi, S.; Liu, Z. Variational relational point completion network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8524–8533.
- Zhang, J.; Chen, X.; Cai, Z.; Pan, L.; Zhao, H.; Yi, S.; Yeo, C.K.; Dai, B.; Loy, C.C. Unsupervised 3D shape completion through GAN inversion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1768–1777.
- Huang, Z.; Yu, Y.; Xu, J.; Ni, F.; Le, X. PF-Net: Point fractal network for 3D point cloud completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 7662–7670.
- Fei, B.; Yang, W.; Chen, W.M.; Li, Z.; Li, Y.; Ma, T.; Hu, X.; Ma, L. Comprehensive review of deep learning-based 3D point cloud completion processing and analysis. IEEE Trans. Intell. Transp. Syst. 2022, 23, 22862–22883.
- Yan, X.; Lin, L.; Mitra, N.J.; Lischinski, D.; Cohen-Or, D.; Huang, H. ShapeFormer: Transformer-based shape completion via sparse representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 6239–6249.
- Zhou, H.; Cao, Y.; Chu, W.; Zhu, J.; Lu, T.; Tai, Y.; Wang, C. SeedFormer: Patch seeds based point cloud completion with upsample transformer. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 416–432.
| Model | Dataset | Data Representation |
|---|---|---|
| PointOutNet [9] | ShapeNet [10], 3D-R2N2 [11] | Point Cloud |
| Pseudo-renderer [12] | ShapeNet [10] | Point Cloud |
| RealPoint3D [13] | ShapeNet [10], ObjectNet3D [14] | Point Cloud |
| Cycle-consistency-based approach [15] | ShapeNet [10], Pix3D [16] | Point Cloud |
| 3D34D [17] | ShapeNet [10] | Point Cloud |
| Unsupervised learning of 3D structure [18] | ShapeNet [10], MNIST3D [19] | Point Cloud |
| Model | Dataset | Data Representation |
|---|---|---|
| GenRe [20] | ShapeNet [10], Pix3D [16] | Voxels |
| MarrNet [21] | ShapeNet [10], PASCAL3D+ [22] | Voxels |
| Perspective Transformer Nets [23] | ShapeNet [10] | Voxels |
| Rethinking reprojection [24] | ShapeNet [10], PASCAL3D+ [22], SUN [25], MS COCO [26] | Voxels |
| 3D-GAN [27] | ModelNet [28], IKEA [29] | Voxels |
| Pix2Vox++ [30] | ShapeNet [10], Pix3D [16], Things3D [30] | Voxels |
| 3D-R2N2 [11] | ShapeNet [10], PASCAL3D+ [22], MVS CAD 3D [11] | Voxels |
| Weak recon [31] | ShapeNet [10], ObjectNet3D [14] | Voxels |
| Relative viewpoint estimation [32] | ShapeNet [10], Pix3D [16], Things3D [30] | Voxels |
| Dataset | Number of Frames | Number of Labels | Object Type | 5 Common Classes |
|---|---|---|---|---|
| ModelNet [28] | 151,128 | 660 | 3D CAD Scans | Bed, Chair, Desk, Sofa, Table |
| PASCAL3D+ [22] | 30,899 | 12 | 3D CAD Scans | Boat, Bus, Car, Chair, Sofa |
| ShapeNet [10] | 220,000 | 3135 | Scans of Artefacts, Plants, People | Table, Car, Chair, Sofa, Rifle |
| ObjectNet3D [14] | 90,127 | 100 | Scans of Artefacts, Vehicles | Bed, Car, Door, Fan, Key |
| ScanNet [39] | 2,492,518 | 1513 | Scans of Bedrooms, Kitchens, Offices | Bed, Chair, Door, Desk, Floor |
| Nr. | Model | Dataset | Data Representation |
|---|---|---|---|
| 1 | PointOutNet [9] | ShapeNet [10], 3D-R2N2 [11] | Point Cloud |
| 2 | Pseudo-renderer [12] | ShapeNet [10] | Point Cloud |
| 3 | RealPoint3D [13] | ShapeNet [10], ObjectNet3D [14] | Point Cloud |
| 4 | Cycle-consistency-based approach [15] | ShapeNet [10], Pix3D [16] | Point Cloud |
| 5 | GenRe [20] | ShapeNet [10], Pix3D [16] | Voxels |
| 6 | MarrNet [21] | ShapeNet [10], PASCAL3D+ [22] | Voxels |
| 7 | Perspective Transformer Nets [23] | ShapeNet [10] | Voxels |
| 8 | Rethinking reprojection [24] | ShapeNet [10], PASCAL3D+ [22], SUN [25], MS COCO [26] | Voxels |
| 9 | 3D-GAN [27] | ModelNet [28], IKEA [29] | Voxels |
| 10 | Neural renderer [35] | ShapeNet [10] | Meshes |
| 11 | Residual MeshNet [36] | ShapeNet [10] | Meshes |
| 12 | Pixel2Mesh [37] | ShapeNet [10] | Meshes |
| 13 | CoReNet [38] | ShapeNet [10] | Meshes |
| Model | Advantages | Limitations |
|---|---|---|
| PointOutNet [9] | Introduces the chamfer distance loss, which is invariant to the permutation of points and has been adopted by many other models as a regulariser (a minimal sketch of this loss follows the table). | Point clouds use less memory, but because they lack connectivity information, they require extensive postprocessing. |
| Pseudo-renderer [12] | Uses 2D supervision in addition to 3D supervision, obtaining projection images of the generated 3D shape from multiple viewpoints for optimisation. | Predicts denser, more accurate point clouds, but is limited by the number of points that point-cloud-based representations can accommodate. |
| RealPoint3D [13] | Attempts to recreate 3D models from natural photographs with complicated backgrounds. | Needs an encoder to extract both the input image's 2D features and the input point cloud's 3D features. |
| Cycle-consistency-based approach [15] | Uses a differentiable renderer to infer a 3D shape without using ground-truth 3D annotation. | Without knowledge of the prior distribution of the 3D features, cycle consistency produces deformed body structures or out-of-view images, which interferes with the training process. |
| GenRe [20] | Can rebuild 3D objects at resolutions of up to 128 × 128 × 128, with more detailed reconstruction outcomes. | Higher resolutions come at the expense of sluggish training or lossy 2D projections, as well as small training batches. |
| MarrNet [21] | Avoids modelling variations in item appearance by generating 2.5D sketches from the original image. | Relies on 3D supervision, which is only available for restricted classes or in a synthetic setting. |
| Perspective Transformer Nets [23] | Learns 3D volumetric representations from 2D observations based on principles of projective geometry. | Struggles to produce images that are consistent across several views, as the underlying 3D scene structure cannot be utilised. |
| Rethinking reprojection [24] | Decoupling shape and pose lowers the number of free parameters in the network, increasing efficiency. | Assumes that the scene or object to be registered is either non-deformable or generally static. |
| 3D-GAN [27] | The generative component maps a latent space to a distribution of intricate 3D shapes. | GAN training is notoriously unstable. |
| Neural renderer [35] | Objects are trained in a canonical pose. | The mesh renderer modifies geometry and colour in response to a target image. |
| Residual MeshNet [36] | Reconstructs 3D meshes with MLPs in a cascaded, hierarchical fashion. | Produces the mesh automatically during the finite element method (FEM) computation, but does not reduce computation time or improve computing productivity. |
| Pixel2Mesh [37] | Extracts perceptual features from the input image and gradually deforms an ellipsoid to obtain the output geometry. | The training data do not include several perspectives of the target object or scene, as would be available in real-world 3D shape reconstruction scenarios. |
| CoReNet [38] | Reconstructs the shape and semantic class of multiple objects directly in a 3D volumetric grid from a single RGB image. | Training on synthetic representations restricts its practicality in real-world situations. |
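The chamfer distance that PointOutNet [9] popularised is compact enough to state in full. Below is a minimal numpy sketch (the function name and array shapes are our own illustrative choices, not code from the original paper): each point is matched to its nearest neighbour in the other cloud, and the two directional averages are summed.

```python
import numpy as np

def chamfer_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Symmetric chamfer distance between point sets p (N, 3) and q (M, 3)."""
    # Pairwise squared distances, shape (N, M).
    d2 = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    # Nearest neighbour in each direction, averaged and summed.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

# Permutation invariance: shuffling one cloud leaves the loss unchanged,
# since min and mean do not depend on point order.
rng = np.random.default_rng(0)
p, q = rng.normal(size=(128, 3)), rng.normal(size=(160, 3))
assert np.isclose(chamfer_distance(p, q), chamfer_distance(p, rng.permutation(q)))
```

This order independence is exactly the permutation invariance credited to the loss in the table above.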
| Nr. | Model | Dataset | Data Representation |
|---|---|---|---|
| 1 | 3D34D [17] | ShapeNet [10] | Point Cloud |
| 2 | Unsupervised learning of 3D structure [18] | ShapeNet [10], MNIST3D [19] | Point Cloud |
| 3 | Pix2Vox++ [30] | ShapeNet [10], Pix3D [16], Things3D [30] | Voxels |
| 4 | 3D-R2N2 [11] | ShapeNet [10], PASCAL3D+ [22], MVS CAD 3D [11] | Voxels |
| 5 | Weak recon [31] | ShapeNet [10], ObjectNet3D [14] | Voxels |
| 6 | Relative viewpoint estimation [32] | ShapeNet [10], Pix3D [16], Things3D [30] | Voxels |
| Model | Advantages | Limitations |
|---|---|---|
| 3D34D [17] | Obtains a more expressive intermediate shape representation by locally associating features with 3D points. | Performs admirably on synthetic objects rendered against a clear background, but not on actual photos, novel categories, or more intricate object geometries. |
| Unsupervised learning of 3D structure [18] | Optimises 3D representations to produce realistic 2D images from all randomly sampled views. | Only basic and coarse shapes can be reconstructed. |
| Pix2Vox++ [30] | Generates a coarse volume for each input image. | Because of memory limitations, the model's cubic complexity in space results in coarse discretisations (a voxelisation sketch after this table illustrates the cubic growth). |
| 3D-R2N2 [11] | Converts partial RGB image inputs into a latent vector, which is then used to predict the complete volumetric shape from previously learned priors. | Only works with coarse 64 × 64 × 64 grids. |
| Weak recon [31] | An alternative to costly 3D CAD annotation; proposes using lower-cost 2D supervision instead. | Reconstructions are hampered by this weakly supervised setting. |
| Relative viewpoint estimation [32] | Predicts a transformation that optimally matches the bottleneck features of two input images during testing. | Can only predict pose for instances of a single item and does not extend to the category level. |
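To make the cubic space complexity noted for Pix2Vox++ [30] and 3D-R2N2 [11] concrete, the sketch below (an illustrative numpy snippet with assumed names, not code from either paper) voxelises a point cloud into a binary occupancy grid. Doubling the resolution multiplies the cell count by eight, which is why memory-bound volumetric models stop at coarse grids such as 64 × 64 × 64.

```python
import numpy as np

def voxelize(points: np.ndarray, resolution: int = 64) -> np.ndarray:
    """Map an (N, 3) point cloud to a binary occupancy grid.

    Memory grows cubically: 64**3 = 262,144 cells, while 128**3 is
    already 2,097,152 -- an 8x jump for one doubling of resolution.
    """
    lo, hi = points.min(axis=0), points.max(axis=0)
    # Normalise coordinates into [0, 1) and convert to voxel indices.
    idx = ((points - lo) / (hi - lo + 1e-9) * resolution).astype(int)
    idx = np.clip(idx, 0, resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid
```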
| Nr. | Model | Dataset | Data Representation |
|---|---|---|---|
| 1 | CPD [136] | Stanford Bunny [33] | Meshes |
| 2 | PSR-SDP [137] | TUM RGB-D [138] | Point Cloud |
| 3 | RPM-Net [139] | ModelNet [28] | Meshes |
| 4 | DeepICP [140] | KITTI [141], SouthBay [142] | Point Cloud, Voxels |
| 5 | 3DSmoothNet [143] | 3DMatch [144] | Point Cloud, Voxels |
| 6 | 3D multi-view registration [145] | 3DMatch [144], Redwood [146], ScanNet [39] | Point Cloud |
| Model | Advantages | Limitations |
|---|---|---|
| CPD [136] | Treats alignment as a probability density estimation problem, in which one point set represents the Gaussian mixture model centroids and the other represents the data points. | While GMM-based methods can increase resilience against outliers and bad initialisations, the optimisation is still founded on local search. |
| PSR-SDP [137] | Allows the global optimality of a local minimiser to be verified significantly faster. | Provides poor estimates even in the presence of a single outlier, because it assumes that all measurements are inliers. |
| RPM-Net [139] | Able to handle partial visibility of the point cloud and obtain a soft assignment of point correspondences. | Computational cost grows as the number of points in the point clouds increases. |
| DeepICP [140] | Improves the conventional ICP algorithm with a neural network by building correspondences from the point cloud's learned attributes (a minimal ICP sketch follows this table). | Combining deep learning with ICP directly takes considerable computing effort. |
| 3DSmoothNet [143] | The first learned, universal matching method that allows trained models to be transferred between modalities. | 290 times slower than the FCGF [162] model. |
| 3D multi-view registration [145] | The first end-to-end algorithm for jointly learning both stages of the registration problem. | Requires a large amount of training data. |
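Several of the methods above are positioned against classical ICP, so a compact reference version clarifies what the learned variants replace. The following numpy/scipy sketch (the function name and iteration budget are assumptions for the example) alternates hard nearest-neighbour correspondences with a closed-form SVD (Kabsch) update of the rigid transform; DeepICP [140] essentially swaps the brittle first step for learned correspondences.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(src: np.ndarray, dst: np.ndarray, iters: int = 20):
    """Minimal point-to-point ICP aligning src (N, 3) onto dst (M, 3)."""
    tree = cKDTree(dst)
    R, t = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        # 1. Hard nearest-neighbour correspondences (the step learned methods replace).
        nn = dst[tree.query(cur)[1]]
        # 2. Closed-form rigid transform between the matched, centred sets (Kabsch).
        mu_s, mu_d = cur.mean(axis=0), nn.mean(axis=0)
        U, _, Vt = np.linalg.svd((cur - mu_s).T @ (nn - mu_d))
        R_step = Vt.T @ U.T
        if np.linalg.det(R_step) < 0:  # guard against reflections
            Vt[-1] *= -1
            R_step = Vt.T @ U.T
        t_step = mu_d - R_step @ mu_s
        cur = cur @ R_step.T + t_step
        # Accumulate the overall transform.
        R, t = R_step @ R, R_step @ t + t_step
    return R, t
```

Like all local-search registration, this sketch converges to the nearest minimum; that is the shared weakness which CPD's [136] GMM formulation softens but, as the table notes, does not remove.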
| Nr. | Model | Dataset | Data Representation |
|---|---|---|---|
| 1 | MaskNet [165] | S3DIS [3], 3DMatch [144], ModelNet [28] | Point Cloud |
| 2 | GPDNet [166] | ShapeNet [10] | Point Cloud |
| 3 | DMR [167] | ModelNet [28] | Point Cloud |
| 4 | PU-Net [168] | ModelNet [28], ShapeNet [10] | Point Cloud |
| 5 | MPU [169] | ModelNet [28], MNIST-CP [19] | Point Cloud |
| 6 | CP-Net [170] | ModelNet [28] | Point Cloud |
| 7 | SampleNet [171] | ModelNet [28], ShapeNet [10] | Point Cloud |
| Model | Advantages | Limitations |
|---|---|---|
| MaskNet [165] | Rejects noise even in partial clouds in a relatively computationally inexpensive manner. | Requires the input of both a partial and a complete point cloud. |
| GPDNet [166] | Deals with the permutation-invariance problem and builds hierarchies of local or non-local features to effectively address the denoising problem. | The point clouds' geometric characteristics are often oversmoothed. |
| DMR [167] | The patch manifold reconstruction (PMR) upsampling technique is straightforward and efficient. | The downsampling step invariably results in detail loss, especially at low noise levels, and can also oversmooth by removing some useful information. |
| PU-Net [168] | Reconstruction loss and repulsion loss are jointly utilised to improve the quality of the output (a sketch of the repulsion term follows this table). | Only learns spatial relationships at a single level of multi-step point cloud decoding via self-attention. |
| MPU [169] | Trained end-to-end on high-resolution point clouds; emphasises a particular level of detail by altering the spatial span of the receptive field across steps. | Cannot be used for completion tasks and is restricted to upsampling sparse locations. |
| CP-Net [170] | Final representations typically retain crucial points that occupy a significant number of channels. | Potential loss of information due to the downsampling process. |
| SampleNet [171] | Makes the sampling procedure for the representative point cloud classification problem differentiable, allowing end-to-end optimisation. | Fails to attain a satisfactory equilibrium between maintaining geometric features and uniform density. |
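PU-Net's [168] joint objective is easiest to see through its repulsion term, which pushes each upsampled point away from its nearest neighbours so the output spreads uniformly rather than clumping. The numpy/scipy sketch below is illustrative only: the function name and the values of k and h are our assumptions, not those of the paper's implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def repulsion_loss(points: np.ndarray, k: int = 4, h: float = 0.03) -> float:
    """Repulsion term in the spirit of PU-Net over an (N, 3) point set.

    For each point's k nearest neighbours at distance r, eta(r) = -r rewards
    larger gaps, while the Gaussian weight w(r) = exp(-r**2 / h**2) switches
    the penalty off once neighbours are farther apart than the radius h.
    """
    tree = cKDTree(points)
    # Query k + 1 neighbours because the closest match of a point is itself.
    dists, _ = tree.query(points, k=k + 1)
    r = dists[:, 1:]  # drop the self-match column
    return float(np.mean(-r * np.exp(-(r ** 2) / h ** 2)))
```

In training, a term of this form is added to the reconstruction loss with a small weight, so that fidelity to the underlying surface and uniform coverage are optimised jointly, as the table row describes.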
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).