Deep Learning on Point Clouds and Its Application: A Survey
Abstract
:1. Introduction
- Recent advances on point clouds with deep learning are surveyed. The architectures can be classified into two categories, i.e., raw point-based and tree-based architectures. Additionally, their differences from unstructured and disordered point clouds are highlighted.
- Applications of point clouds with deep learning are compared, and the future direction is given.
2. Related Works
3. Feature Learning on Point Cloud
3.1. Raw Point-Based Deep Learning
3.1.1. PointNet-Based Deep Learning
3.1.2. ConvNets-Based Deep Learning
3.1.3. RNN-Based Deep Learning
3.1.4. Autoencoder-Based Deep Learning
3.1.5. Others
3.2. Tree-Based Deep Learning
4. Applications of Point Clouds Using Deep Learning
4.1. Datasets
4.2. 3D Object Classification
- Missing data: Scanned models are usually occluded, and some data is lost.
- Noise: All sensors are noisy. There are different types of noise, including point perturbations and outliers. This means that a point has a certain probability within a certain radius around the location where it is sampled (disturbance), or it may appear at random locations (outliers) in space.
- Rotation invariance: Rotation and translation points should not affect classification.
4.3. Semantic Segmentation
- The segmentation algorithm should consider the specific properties of different ground objects.
- The segmentation algorithm should infer the attribute relationships of adjacent partition blocks.
- The segmentation algorithm should be applied to the point clouds acquired by different scanners.
4.4. 3D Object Detection
5. Discussion and Future Direction
5.1. Performance and Characteristics of Reviewed Methods
5.2. Some Future Directions
6. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Balsabarreiro, J.; Fritsch, D. Generation of Visually Aesthetic and Detailed 3d Models of Historical Cities by Using Laser Scanning and Digital Photogrammetry. Digit. Appl. Archaeol. Cult. Herit. 2017, 8, 57–64. [Google Scholar]
- Balsa-Barreiro, J.; Fritsch, D. Generation of 3d/4d Photorealistic Building Models. The Testbed Area for 4d Cultural Heritage World Project: The Historical Center of Calw (Germany). In Advances in Visual Computing, Proceedings of the 11th International Symposium (ISVC 2015), Las Vegas, NV, USA, 14–16 December 2015; Springer: Berlin/Heidelberg, Germany, 2015. [Google Scholar]
- Oliveira, M.; Lopes, L.S.; Lim, G.H.; Kasaei, S.H.; Tomé, A.M.; Chauhan, A. 3D object perception and perceptual learning in the RACE project. Robot. Auton. Syst. 2016, 75, 614–626. [Google Scholar] [CrossRef]
- Mahler, J.; Matl, M.; Satish, V.; Danielczuk, M.; Derose, B.; McKinley, S.; Goldberg, K. Learning ambidextrous robot grasping policies. Sci. Robot. 2019, 4, eaau4984. [Google Scholar] [CrossRef]
- Velodyne Hdl-64e Lidar Specification. Available online: https://velodynelidar.com/hdl-64e.html (accessed on 5 May 2019).
- Charles, R.Q.; Su, H.; Kaichun, M.; Guibas, L.J. Pointnet: Deep Learning on Point Sets for 3d Classification and Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 77–85. [Google Scholar]
- Aubry, M.; Schlickewei, U.; Cremers, D. The Wave Kernel Signature: A Quantum Mechanical Approach to Shape Analysis. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV 2011 Workshops), Barcelona, Spain, 6–13 November 2011. [Google Scholar]
- Bronstein, M.M.; Kokkinos, I. Scale-Invariant Heat Kernel Signatures for Non-Rigid Shape Recognition. In Proceedings of the 23 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2010), San Francisco, CA, USA, 13–18 June 2010. [Google Scholar]
- Hana, X.-F.; Jin, J.S.; Xie, J.; Wang, M.-J.; Jiang, W. A Comprehensive Review of 3d Point Cloud Descriptors. arXiv 2018, arXiv:1802.02297. [Google Scholar]
- Guo, Y.; Bennamoun, M.; Sohel, F.; Lu, M.; Wan, J. 3D Object Recognition in Cluttered Scenes with Local Surface Features: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 2270–2287. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet Classification with Deep Convolutional Neural Networks. In Proceedings of the International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012. [Google Scholar]
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; Adaptive Computation and Machine Learning; MIT Press: London, UK, 2016. [Google Scholar]
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
- Grilli, E.; Menna, F.; Remondino, F. A review of point clouds segmentation and classification algorithms. ISPRS Int. Arch. Photogramm. Remote. Sens. Spat. Inf. Sci. 2017, 42, 339–344. [Google Scholar] [CrossRef]
- Nguyen, A.; Le, B. 3d Point Cloud Segmentation: A Survey. In Proceedings of the IEEE 6th International Conference on Robotics, Automation and Mechatronics (RAM 2013), Manila, Philippines, 12–15 November 2013. [Google Scholar]
- Ahmed, E.; Saint, A.; Shabayek, A.E.R.; Cherenkova, K.; Das, R.; Gusev, G.; Aouada, D.; Ottersten, B. Deep Learning Advances on Different 3d Data Representations: A Survey. arXiv 2018, arXiv:1808.01462. [Google Scholar]
- Kaick, V.; Oliver, Z.H.; Hamarneh, G.; Cohen-Or, D. A Survey on Shape Correspondence. In Proceedings of the Eurographics 2010—State of the Art Reports, Norrköping, Sweden, 3–7 May 2010. [Google Scholar]
- Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; Xiao, J. 3d Shapenets: A Deep Representation for Volumetric Shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), Boston, MA, USA, 7–12 June 2015. [Google Scholar]
- Maturana, D.; Scherer, S. Voxnet: A 3d Convolutional Neural Network for Real-Time Object Recognition. In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS2015), Hamburg, Germany, 28 September–2 October 2015. [Google Scholar]
- Qi, R.C.; Su, H.; Nießner, M.; Dai, A.; Yan, M.; Guibas, L.J. Volumetric and Multi-View Cnns for Object Classification on 3d Data. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- Riegler, G.; Ulusoy, A.O.; Geiger, A. Octnet: Learning Deep 3d Representations at High Resolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Su, H.; Maji, S.; Kalogerakis, E.; Learned-Miller, E. Multi-View Convolutional Neural Networks for 3d Shape Recognition. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV 2015), Santiago, Chile, 7–13 December 2015. [Google Scholar]
- Savva, M.; Yu, F.; Su, H.; Aono, M.; Chen, B.; Cohen-Or, D.; Deng, W.; Su, H.; Bai, S.; Bai, X. Shrec’16 Track Large-Scale 3d Shape Retrieval from Shapenet Core55. In Proceedings of the Eurographics 2016 Workshop on 3D Object Retrieval, Lisbon, Portugal, 8 May 2016. [Google Scholar]
- Fang, Y.; Xie, J.; Dai, G.; Wang, M.; Zhu, F.; Xu, T.; Wong, E. 3d Deep Shape Descriptor. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR2015), Boston, MA, USA, 7–12 June 2015. [Google Scholar]
- Ioannidou, A.; Chatzilari, E.; Nikolopoulos, S.; Kompatsiaris, I. Deep Learning Advances in Computer Vision with 3d Data: A Survey. ACM Comput. Surv. CSUR 2017, 50, 20. [Google Scholar] [CrossRef]
- Nygren, P.; Jasinski, M. A Comparative Study of Segmentation and Classification Methods for 3d Point Clouds. Master’s Thesis, Chalmers University of Technology and University of Gothenburg, Gothenburg, Sweden, 2016. [Google Scholar]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), Boston, MA, USA, 7–12 June 2015. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the International Conference on Learning Representations (IClR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Santiago, Chile, 7–13 December 2015. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015. [Google Scholar]
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected Crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848. [Google Scholar] [CrossRef]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014. [Google Scholar]
- Girshick, R.; Cnn, R.F. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015. [Google Scholar]
- Liu, X.; Wang, B.; Xu, X.; Liang, J.; Ren, J.; Wei, C. Modified Nearest Neighbor Fuzzy Classification Algorithm for Ship Target Recognition. In Proceedings of the Industrial Electronics and Applications, Langkawi, Malaysia, 25–28 September 2011. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.; Berg, A.C. Ssd: Single Shot Multibox Detector. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherland, 11–14 October 2016. [Google Scholar]
- Lin, T.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- Redmon, J.; Farhadi, A. Yolo9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Garcia-Garcia, A.; Orts-Escolano, S.; Oprea, S.; Villena-Martinez, V.; Garcia-Rodriguez, J. A Review on Deep Learning Techniques Applied to Semantic Segmentation. arXiv 2017, arXiv:1704.06857. [Google Scholar]
- Bronstein, M.M.; Bruna, J.; LeCun, Y.; Szlam, A.; VanderGheynst, P. Geometric Deep Learning: Going beyond Euclidean data. IEEE Signal Process. Mag. 2017, 34, 18–42. [Google Scholar] [CrossRef]
- Griffiths, D.; Boehm, J. A Review on Deep Learning Techniques for 3D Sensed Data Classification. Remote. Sens. 2019, 11, 1499. [Google Scholar] [CrossRef]
- Qi, R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Zeng, W.; Gevers, T. 3dcontextnet: Kd Tree Guided Hierarchical Learning of Point Clouds Using Local Contextual Cues. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 2–6 June 2018. [Google Scholar]
- Yu, F.; Koltun, V. Multi-Scale Context Aggregation by Dilated Convolutions. In Proceedings of the 4th International Conference on Learning Representations (ICLR 2016), San Juan, Puerto Rico, 2–4 May 2016. [Google Scholar]
- Gadelha, M.; Wang, R.; Maji, S. Multiresolution Tree Networks for 3d Point Cloud Processing. In Proceedings of the Computer Vision—ECCV 2018—15th European Conference, Munich, Germany, 8–14 September 2018. [Google Scholar]
- Zou, X.; Cheng, M.; Wang, C.; Xia, Y.; Li, J. Tree Classification in Complex Forest Point Clouds Based on Deep Learning. IEEE Geosci. Remote. Sens. Lett. 2017, 14, 2360–2364. [Google Scholar] [CrossRef]
- Li, Y.; Bu, R.; Sun, M.; Pointcnn, B.C. PointCNN: Convolution On X-Transformed Points. In Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018 (NeurIPS 2018), Montréal, QC, Canada, 3–8 December 2018. [Google Scholar]
- Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic Graph Cnn for Learning on Point Clouds. arXiv 2018, arXiv:1801.07829. [Google Scholar]
- Te, G.; Hu, W.; Zheng, A.; Guo, Z. Rgcnn: Regularized Graph Cnn for Point Cloud Segmentation. In Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference (MM 2018), Seoul, Korea, 22–26 October 2018. [Google Scholar]
- Ye, X.; Li, J.; Huang, H.; Du, L.; Zhang, X. 3d Recurrent Neural Networks with Context Fusion for Point Cloud Semantic Segmentation. In Proceedings of the Computer Vision—ECCV 2018—15th European Conference, Munich, Germany, 8–14 September 2018. [Google Scholar]
- Yang, Y.; Feng, C.; Shen, Y.; Tian, D. Foldingnet: Point Cloud Auto-Encoder Via Deep Grid Deformation. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
- Deng, H.; Birdal, T.; Ilic, S. Ppfnet: Global Context Aware Local Features for Robust 3d Point Matching. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
- Deng, H.; Birdal, T.; Ilic, S. Ppf-Foldnet: Unsupervised Learning of Rotation Invariant 3d Local Descriptors. In Proceedings of the Computer Vision—ECCV 2018—15th European Conference, Munich, Germany, 8–14 September 2018. [Google Scholar]
- Remelli, E.; Baqué, P.; Fua, P. Neuralsampler: Euclidean Point Cloud Auto-Encoder and Sampler. arXiv 2019, arXiv:1901.09394. [Google Scholar]
- Su, H.; Jampani, V.; Sun, S.D.; Maji, E.; Yang, Mi.; Kautz, J. Splatnet: Sparse Lattice Networks for Point Cloud Processing. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–22 June2018. [Google Scholar]
- Li, J.; Chen, B.M.; Lee, G.H. So-Net: Self-Organizing Network for Point Cloud Analysis. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Shoef, M.; Fogel, S.; Cohen-Or, D. Pointwise: An Unsupervised Point-Wise Feature Learning Network. arXiv 2019, arXiv:1901.04544. [Google Scholar]
- Sun, J.; Ovsjanikov, M.; Guibas, L. A Concise and Provably Informative Multi-Scale Signature Based on Heat Diffusion. In Computer Graphics Forum; Blackwell: Oxford, UK, 2009. [Google Scholar]
- Roynard, X.; Deschaud, J.-E.; Goulette, F. Paris-Lille-3D: A large and high-quality ground-truth urban point cloud dataset for automatic segmentation and classification. Int. J. Robot. Res. 2018, 37, 545–557. [Google Scholar] [CrossRef] [Green Version]
- Vallet, B.; Brédif, M.; Serna, A.; Marcotegui, B.; Paparoditis, N. Terramobilita/Iqmulus Urban Point Cloud Analysis Benchmark. Comput. Graph. 2015, 49, 126–133. [Google Scholar] [CrossRef]
- Binh-Son, H.; Tran, Mi.; Yeung, Sa. Pointwise Convolutional Neural Networks. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
- Wu, W.; Qi, Z.; Li, F. Pointconv: Deep Convolutional Networks on 3d Point Clouds. arXiv 2018, arXiv:1811.07246. [Google Scholar]
- Lan, S.; Yu, R.; Yu, G.; Davis, L.S. Modeling Local Geometric Structure of 3d Point Clouds Using Geo-Cnn. arXiv 2018, arXiv:1811.07782. [Google Scholar]
- Xu, Y.; Fan, T.; Xu, M.; Zeng, L.; Qiao, Y. SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018. [Google Scholar]
- Karargyris, A. Color Space Transformation Network. In Computer Science; National Library of Medicine: Bethesda, MD, USA, 2015. [Google Scholar]
- Recurrent Neural Network. Available online: https://en.wikipedia.org/wiki/Recurrent_neural_network (accessed on 5 October 2018).
- Sak, H.; Senior, A.; Beaufays, F. Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling. In Proceedings of the 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014), Singapore, 14–18 September 2014. [Google Scholar]
- Huang, Q.; Wang, W.; Neumann, U. Recurrent Slice Networks for 3d Segmentation of Point Clouds. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
- Louizos, C.; Swersky, K.; Li, Y.; Welling, M.; Zemel, R. The Variational Fair Autoencoder. In Proceedings of the 4th International Conference on Learning Representations (ICLR 2016), San Juan, Puerto Rico, 2–4 May 2016. [Google Scholar]
- He, T.; Huang, H.; Yi, L.; Zhou, Y.; Wu, C.; Wang, J.; Soatto, S. Geonet: Deep Geodesic Networks for Point Cloud Analysis. arXiv 2019, arXiv:1901.00680. [Google Scholar]
- Zamorski, M.; Zięba, M.; Nowak, R.; Stokowiec, W.; Trzciński, T. Adversarial Autoencoders for Generating 3d Point Clouds. arXiv 2018, arXiv:1811.07605. [Google Scholar]
- Paoletti, M.; Haut, J.; Plaza, J.; Plaza, A. A new deep convolutional neural network for fast hyperspectral image classification. ISPRS J. Photogramm. Remote. Sens. 2018, 145, 120–147. [Google Scholar] [CrossRef]
- Zhao, Y.; Birdal, T.; Deng, H.; Tombari, F. 3d Point-Capsule Networks. arXiv 2018, arXiv:1812.10775. [Google Scholar]
- Yu, L.; Li, X.; Fu, C.; Cohen-Or, D.; Heng, P. Pu-Net: Point Cloud Upsampling Network. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Zhang, W.; Xiao, C. Pcan: 3d Attention Map Learning Using Contextual Information for Point Cloud Based Retrieval. arXiv 2019, arXiv:1904.09793. [Google Scholar]
- Gronat, P.; Sivic, J.; Arandjelovic, R.; Torii, A.; Pajdla, T. Netvlad: Cnn Architecture for Weakly Supervised Place Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016. [Google Scholar]
- Klokov, R.; Lempitsky, V. Escape from Cells: Deep Kd-Networks for the Recognition of 3d Point Cloud Models. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
- Pickup, D.; Sun, X.; Rosin, P.L.; Martin, R.R.; Cheng, Z.; Lian, Z.; Aono, M.; Ben Hamza, A.; Bronstein, A.; Bronstein, M.; et al. Shape Retrieval of Non-rigid 3D Human Models. Int. J. Comput. Vis. 2016, 120, 169–193. [Google Scholar] [CrossRef] [Green Version]
- Yi, L.; Kim, V.G.; Ceylan, D.; Shen, I.; Yan, M.; Su, H.; Lu, C.; Huang, Q.; Sheffer, A.; Guibas, L. A Scalable Active Framework for Region Annotation in 3d Shape Collections. ACM Trans. Graph. 2016, 35, 210. [Google Scholar] [CrossRef]
- Armeni, I.; Sener, O.; Zamir, A.R.; Jiang, H.; Brilakis, I.; Fischer, M.; Savarese, S. 3d Semantic Parsing of Large-Scale Indoor Spaces. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- Garcia-Garcia, A.; Gomez-Donoso, F.; Garcia-Rodriguez, J.; Orts-Escolano, S.; Cazorla, M.; Azorin-Lopez, J. Pointnet: A 3d Convolutional Neural Network for Real-Time Object Class Recognition. In Proceedings of the 2016 International Joint Conference on Neural Networks, Vancouver, BC, Canada, 24–29 July 2016. [Google Scholar]
- Dai, A.; Chang, A.X.; Savva, M.; Halber, M.; Funkhouser, A.T.; Nießner, M. Scannet: Richly-Annotated 3d Reconstructions of Indoor Scenes. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Eitz, M.; Richter, R.; Boubekeur, T.; Hildebrand, K.; Alexa, M. Sketch-Based Shape Retrieval. ACM Trans. Graph. 2012, 31, 31. [Google Scholar] [CrossRef]
- Chang, X.A.; Funkhouser, T.; Guibas, L.; Hanrahan, P.; Huang, Q.; Li, Z.; Savarese, S.; Savva, M.; Song, S.; Su, H. Shapenet: An Information-Rich 3d Model Repository. arXiv 2015, arXiv:1512.03012. [Google Scholar]
- Riemenschneider, H.; Bódis-Szomorú, A.; Weissenberg, J.; van Gool, L. Learning Where to Classify in Multi-View Semantic Segmentation. In Proceedings of the Computer Vision—ECCV 2014—13th European Conference, Zurich, Switzerland, 6–12 September 2014. [Google Scholar]
- Zeng, A.; Song, S.; Nießner, M.; Fisher, M.; Xiao, J.; Funkhouser, T. 3dmatch: Learning Local Geometric Descriptors from Rgb-D Reconstructions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Gaidon, A.; Wang, Q.; Cabon, Y.; Vig, E. Virtual Worlds as Proxy for Multi-Object Tracking Analysis. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- Geiger, A.; Lenz, P.; Urtasun, R. Are We Ready for Autonomous Driving? The Kitti Vision Benchmark Suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR2012), Providence, RI, USA, 16–21 June 2012. [Google Scholar]
- Bruna, J.; Zaremba, W.; Szlam, A.; LeCun, Y. Spectral Networks and Locally Connected Networks on Graphs. In Proceedings of the 2nd International Conference on Learning Representations (ICLR 2014), Banff, AB, Canada, 14–16 April 2014. [Google Scholar]
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013, 32, 1231–1237. [Google Scholar] [CrossRef] [Green Version]
- Sattler, T.; Tylecek, R.; Brox, T.; Pollefeys, M.; Fisher, R.B. 3d Reconstruction Meets Semantics–Reconstruction Challenge 2017. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), Boston, MA, USA, 7–12 June 2015. [Google Scholar]
- Hackel, T.; Wegner, J.D.; Schindler, K. Fast semantic segmentation of 3d point clouds with strongly varying density. ISPRS Ann. Photogramm. Remote. Sens. Spat. Inf. Sci. 2016, 3, 177–184. [Google Scholar] [CrossRef]
- Serna, A.; Marcotegui, B.; Goulette, F.; Deschaud, Je. Paris-Rue-Madame Database: A 3d Mobile Laser Scanner Dataset for Benchmarking Urban Detection, Segmentation and Classification Methods. In Proceedings of the 4th International Conference on Pattern Recognition Applications and Methods–ICPRAM 2015, Lisbon, Portugal, 10–12 January 2015. [Google Scholar]
- Gehrung, J.; Hebel, M.; Arens, M.; Stilla, U. An Approach to Extract Moving Objects from Mls Data Using a Volumetric Background Representation. In ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences; Copernicus GmbH: Gottingen, Germany, 2017. [Google Scholar]
- Borgmann, B.; Hebel, M.; Arens, M.; Stilla, U. Detection Of Persons In Mls Point Clouds. ISPRS Int. Arch. Photogramm. Remote. Sens. Spat. Inf. Sci. 2017, 2, 203–210. [Google Scholar] [CrossRef]
- Johnson, A.; Hebert, M. Using spin images for efficient object recognition in cluttered 3D scenes. IEEE Trans. Pattern Anal. Mach. Intell. 1999, 21, 433–449. [Google Scholar] [CrossRef] [Green Version]
- Chen, D.; Tian, X.; Shen, Y.; Ouhyoung, M. On Visual Similarity Based 3d Model Retrieval. In Computer Graphics Forum; Blackwell: Oxford, UK, 2009. [Google Scholar]
- Douillard, B.; Underwood, J.; Vlaskine, V.; Quadros, A.; Singh, S. A Pipeline for the Segmentation and Classification of 3d Point Clouds. In Proceedings of the 12th International Symposium on Experimental Robotics (ISER 2010), New Delhi and Agra, India, 18–21 December 2010. [Google Scholar]
- Zaheer, M.; Kottur, S.; Ravanbakhsh, S.; Poczos, B.; Salakhutdinov, R.; Smola, A.J. Deep Sets. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Rusu, B.R.; Blodow, N.; Beetz, M. Fast Point Feature Histograms (Fpfh) for 3d Registration. In Proceedings of the 2009 IEEE International Conference on Robotics and Automation (ICRA 2009), Kobe, Japan, 12–17 May 2009. [Google Scholar]
- Kohonen, T. The Self-Organizing Map. Proc. IEEE 1990, 78, 1464–1480. [Google Scholar] [CrossRef]
- Shi, S.; Wang, X.; Li, H. Pointrcnn: 3d Object Proposal Generation and Detection from Point Cloud. arXiv 2018, arXiv:1812.04244. [Google Scholar]
- Zhou, Y.; Tuzel, O. Voxelnet: End-to-End Learning for Point Cloud Based 3d Object Detection. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
- Sindagi, A.V.; Zhou, Y.; Tuzel, O. Mvx-Net: Multimodal Voxelnet for 3d Object Detection. arXiv 2019, arXiv:1904.01649. [Google Scholar]
- Zhou, J.; Lu, X.; Tan, X.; Shao, Z.; Ding, S.; Ma, L. Fvnet: 3d Front-View Proposal Generation for Real-Time Object Detection from Point Clouds. arXiv 2019, arXiv:1903.10750. [Google Scholar]
- Qi, C.R.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum Pointnets for 3d Object Detection from Rgb-D Data. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
- Qi, R.C.; Litany, O.; He, K.; Guibas, L.J. Deep Hough Voting for 3d Object Detection in Point Clouds. arXiv 2019, arXiv:1904.09664. [Google Scholar]
- Lu, X.; Liu, Y.; Li, K. Fast 3d Line Segment Detection from Unorganized Point Cloud. arXiv 2019, arXiv:1901.02532. [Google Scholar]
- Li, X.; Guivant, J.E.; Kwok, N.; Xu, Y. 3d Backbone Network for 3d Object Detection. arXiv 2019, arXiv:1901.08373. [Google Scholar]
- Razlaw, J.; Quenzel, J.; Behnke, S. Detection and Tracking of Small Objects in Sparse 3d Laser Range Data. arXiv 2019, arXiv:1903.05889. [Google Scholar]
- Simon, M.; Amende, K.; Kraus, A.; Honer, J.; Sämann, T.; Kaulbersch, H.; Milz, S.; Gross, Ho. Complexer-Yolo: Real-Time 3d Object Detection and Tracking on Semantic Point Clouds. arXiv 2019, arXiv:1904.07537. [Google Scholar]
- Ku, J.; Pon, A.D.; Waslander, S.L. Monocular 3d Object Detection Leveraging Accurate Proposals and Shape Reconstruction. arXiv 2019, arXiv:1904.01690. [Google Scholar]
- Koguciuk, D.; Chechlinski, L. 3d Object Recognition with Ensemble Learning—A Study of Point Cloud-Based Deep Learning Models. arXiv 2019, arXiv:1904.08159. [Google Scholar]
- Cheraghian, A.; Rahman, S.; Petersson, L. Zero-Shot Learning of 3d Point Cloud Objects. arXiv 2019, arXiv:1902.10272. [Google Scholar]
- V oulodimos, A.; Doulamis, N.; Doulamis, A.; Protopapadakis, E. Deep learning for computer vision: A brief review. Comput. Intel. Neurosci. 2018, 2018, 13. [Google Scholar]
- Bengio, Y.; Courville, A.C.; Vincent, P. Unsupervised feature learning and deep learning: A review and new perspectives. CoRR 2012, 2012, 1. [Google Scholar]
- Le, E.T.; Kokkinos, I.; Mitra, N.J. Going Deeper with Point Networks. Comput. Vis. Pattern Recognit. 2019.
- Lin, T.; Dollár, P.; Girshick, R.B.; He, K.; Hariharan, B.; Belongie, S.J. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 8 September 2017. [Google Scholar]
Reference | Main Contents |
---|---|
Nygren et al. 2016 [26] | The traditional algorithms for 3D point cloud segmentation and classification |
Nguyen et al. 2013 [15] | The segmentation methods for the 3D point cloud. |
Ahmed et al. 2018 [16] | The 3D data from Euclidean and the non-Euclidean geometry and a discussion on how to apply deep learning to the 3D dataset. |
Hana et al. 2018 [9] | The feature descriptors of point clouds with three classes, i.e., local-based, global-based, and hybrid-based. |
Garcia et al. 2017 [40] | The semantic segmentation methods based on deep learning. |
Bronstein et al. 2017 [41] | The problems of geometric deep learning, extending grid-like deep learning methods to non-Euclidean structures. |
Griffiths et al. 2019 [42] | The classification models for processing 3D unstructured Euclidean data. |
Datasets Name | Descriptions | Application Tasks |
---|---|---|
ModelNet40 [18] | It consists of 12,311 CAD models in 40 object classes. | 3D object classification [48,50,51,79] and shape classification [45] |
ShapeNet part [80] | There are 16,881 shapes represented by 3D CAD models in 16 categories with a total of 50 parts annotated. | Part segmentation [44,48,49,56,80], shapes generation [46], and representation learning [52] |
Stanford 3D semantic parsing [81] | This dataset has 271 rooms in six areas captured by 3D Matterport scanners captured by Matterport Camera. | Semantic Segmentation [8,43,44,46,48,49,78,82] |
SHREC15 [18] | There are 1200 shapes in 50 categories by scanning real human participants and using 3D modeling software [79]. Each class has 24 shapes and most of these shapes are organic with different postures. | Non-rigid shape classification [43] |
SHREC16 [18] | It contains about 51,300 3D models in 55 categories. | 3D shape retrieval [8] |
ScanNet [83] | There are 1513 scanned and reconstructed indoor scenes. | Virtual scan generation [43], segmentation [48], and classification [48] |
S3DIS [81] | It consists of 271 rooms in six areas captured by 3D Matterport scanners. | 3D semantic segmentation [44,48] and representation |
TU-Berlin [84] | It has sketches in 250 categories. Each category has 80 sketches. | Classification [48] |
ShapeNetCore [85] | It has 51,300 3D shapes in 55 categories, which is indicated by triangular meshes. The dataset is labeled manually and a subset of the ShapeNet dataset. | 3D shape retrieval task [78], 3D shape retrieval task [8], and classification [8] |
ModelNet10 [18] | The 10-class of Model-Net (ModelNet10) benchmarks are used for 3D shape classification. They contain 4,899 and 12,311 models respectively. | Object classification [8] Shape classification [78] |
RueMonge2014 [86] | The images are multi-view in high-resolution images from a street in Paris and the number of these images is 428. | 3D point cloud labeling [56] |
3DMatch Benchmark [87] | It contains a total of 62 scenes. | Point Cloud representation [54] |
KITTI-3D Object Detection [88,89] | There are 16 classes, including 40,000 objects in 12,000 images captured by a Velodyne laser scanner. | 3D object detection [20,23,24,90] |
vKITTI [91] | This dataset includes a sparse point cloud captured by LiDAR without color information. It can be used for generalization verification, but it cannot be used for supervised training. | Semantic segmentation [51] |
3DRMS [92] | This dataset comes from the challenge of combining 3D and semantic information in complex scenarios and was captured by a robot that drove through a semantically rich garden with beautiful geometric details. | Semantic segmentation [51] |
Cornell RGBD Dataset | It has 52 labeled point cloud indoor scenes including 24 office scenes and 28 family scenarios with the Microsoft Kinect sensor. The data set consists of approximately 550 views with 2495 segments marked with 27 object classes. | Segmentation [14] |
VMR-Oakland dataset | It contains point clouds captured by mobile platforms with Navlab11 around the Carnegie Mellon University (CMU) campus. | Segmentation [14] |
Robot 3D Scanning Repository | The 3D point clouds acquired by Cyberware 3030 MS are provided for both indoor and outdoor environments. Heat and color information is included in some datasets. | Segmentation [14] |
ATG4D [89] | There are over 1.2 million, 5,969, and 11,969 frames in the training, validation, and test datasets, respectively. This dataset is captured by a PrimeSense sensor. | Point object detection [20] |
Paris-Lille-3D [60] | There are 50 classes in 143.1M point clouds acquired by Mobile Laser Scanning. | Segmentation and classification [60] |
Semantic3D [93] | There are eight classes in 1660M point clouds acquired by static LIDAR scanners. | Semantic segmentation [93] |
Paris-rueMadame [94] | There are 17 classes in 20M point clouds acquired by static LIDAR. | Segmentation, classification, and detection [94] |
IQmulus [61] | There are 22 classes in 12M point clouds acquired by static LIDAR. | Classification and detection [61] |
MLS 1 - TUM City Campus [95,96] | There are more than 16,000 scans captured by mobile laser scanning (MLS) in this dataset. | 3D detection [95,96], city modeling [95,96], and 3D change detection |
Methods | ModelNet 10 | ModelNet 40 | |||
---|---|---|---|---|---|
Class Accuracy | Instance Accuracy | Class Accuracy | Instance Accuracy | Training | |
PointNet [6] | - | - | 86.2 | 89.2 | 3–6 h |
PointNet++ [43] | - | - | - | 91.9 | 20 h |
Deepsets [100] | - | - | - | 90.0 | - |
SO-Net [57] | 95.5 | 95.7 | 90.8 | 93.4 | 3 h |
Dynamic Graph CNN [49] | - | - | - | 92.2 | - |
PointCNN [48] | - | - | 91.7 | - | - |
Kd-Net [78] | 93.5 | 94.0 | 88.5 | 91.8 | 120 h |
3DContextNet [44] | - | - | - | 91.1 | - |
MRTNet [46] | - | - | - | 91.7 | - |
SPLATNet [56] | - | - | 83.7 | 86.4 | - |
FoldingNet [95] | - | 94.4 | - | 88.4 | - |
NeuralSampler [55] | - | 95.3 | - | 88.7 | - |
Intersection over Union (IoU) | |||||||||
---|---|---|---|---|---|---|---|---|---|
Mean | Air- Place | Bag | Cap | Car | Chair | Ear- Phone | Guitar | Knife | |
PointNet [6] | 83.7 | 83.4 | 78.7 | 82.5 | 74.9 | 89.6 | 73.0 | 91.5 | 85.9 |
PointNet++ [43] | 85.1 | 82.4 | 79.0 | 87.7 | 77.3 | 90.8 | 71.8 | 91.0 | 85.9 |
SO-Net [57] | 84.6 | 81.9 | 83.5 | 84.8 | 78.1 | 90.8 | 72.2 | 90.1 | 83.6 |
Dynamic Graph CNN [49] | 85.1 | 84.2 | 83.7 | 84.4 | 77.1 | 90.9 | 78.5 | 91.5 | 87.3 |
Kd-Net [78] | 82.3 | 80.1 | 74.6 | 74.3 | 70.3 | 88.6 | 73.5 | 90.2 | 87.2 |
3DContextNet [44] | 84.3 | 83.3 | 78.0 | 84.2 | 77.2 | 90.1 | 73.1 | 91.6 | 85.9 |
MRTNet [46] | 79.3 | 81.0 | 76.7 | 87.0 | 73.8 | 89.1 | 67.6 | 90.6 | 85.4 |
SPLATNet [56] | 83.7 | 85.4 | 83.2 | 84.3 | 89.1 | 80.3 | 90.7 | 75.5 | 93.1 |
Intersection over Union (IoU) | |||||||||
---|---|---|---|---|---|---|---|---|---|
Mean | Lamp | Laptop | Motor | Mug | Pistol | Rocket | Skate | Table | |
PointNet [6] | 83.7 | 80.8 | 95.3 | 65.2 | 93.0 | 81.2 | 57.9 | 72.8 | 80.6 |
PointNet++ [43] | 85.1 | 83.7 | 95.3 | 71.6 | 94.1 | 81.3 | 58.7 | 76.4 | 82.6 |
SO-Net [57] | 84.6 | 82.3 | 95.2 | 69.3 | 94.2 | 80.0 | 51.6 | 73.1 | 82.6 |
Dynamic Graph CNN [49] | 85.1 | 82.9 | 96.0 | 67.8 | 93.3 | 82.6 | 59.7 | 75.5 | 82.0 |
Kd-Net [78] | 82.3 | 81.0 | 94.9 | 57.4 | 86.7 | 78.1 | 51.8 | 69.9 | 80.3 |
3DContextNet [44] | 84.3 | 81.4 | 95.4 | 69.1 | 92.3 | 81.7 | 60.8 | 71.8 | 81.4 |
MRTNet [46] | 79.3 | 80.6 | 95.1 | 64.4 | 91.8 | 79.7 | 57.0 | 69.1 | 80.6 |
SPLATNet [56] | 83.7 | 83.9 | 96.3 | 75.6 | 95.8 | 83.8 | 64.0 | 75.5 | 81.8 |
Model | Feature Extraction | mAPScanNet | mAPSUN RGB-D | mAP3D | ||
---|---|---|---|---|---|---|
Easy | Moderate | Hard | ||||
FVNet [110] | PointNet | - | - | 65.43 | 57.34 | 51.85 |
VoxelNet [108] | - | - | - | 81.97 | 65.46 | 62.85 |
PointRCNN [107] | PointNet++, multi-scale grouping | - | - | 88.88 | 78.63 | 77.38 |
F-PointNet [111] | PointNet++ | - | - | 81.20 | 70.39 | 62.19 |
MVX-Net [109] | VoxelNet | - | - | 83.20 | 72.70 | 65.20 |
Deep Hough voting model [112] | PointNet++ | 46.80 | 57.70 | - | - | - |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Liu, W.; Sun, J.; Li, W.; Hu, T.; Wang, P. Deep Learning on Point Clouds and Its Application: A Survey. Sensors 2019, 19, 4188. https://doi.org/10.3390/s19194188
Liu W, Sun J, Li W, Hu T, Wang P. Deep Learning on Point Clouds and Its Application: A Survey. Sensors. 2019; 19(19):4188. https://doi.org/10.3390/s19194188
Chicago/Turabian StyleLiu, Weiping, Jia Sun, Wanyi Li, Ting Hu, and Peng Wang. 2019. "Deep Learning on Point Clouds and Its Application: A Survey" Sensors 19, no. 19: 4188. https://doi.org/10.3390/s19194188