Towards a Meaningful 3D Map Using a 3D Lidar and a Camera
Figure 1. Example of a semantic 3D map generated by the proposed method.
Figure 2. Flowchart of the semantic 3D mapping method.
Figure 3. Example of 2D semantic segmentation: (top) input image; (bottom) prediction.
Figure 4. Coordinate alignment: labeled voxels are projected onto the image.
Figure 5. Result of the error minimization process: (first column) a set of voxels with the same label; (second column) results of Euclidean clustering; (third column) results obtained by the classifier.
Figure 6. Example of the trace removal from the top view.
Figure 7. Visualization of semantic 3D mapping results: top view of the entire map and three close-up views with different scenarios.
Figure 8. Effectiveness of the map refinement: (first row) original images; (second row) 2D semantic segmentation; (third row) semantic 3D map without map refinement; (bottom row) semantic 3D map with map refinement.
Figure 9. Comparison of 2D semantic segmentation and 3D semantic segmentation.
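The error-minimization step illustrated in Figure 5 splits same-label voxels into spatially coherent groups by Euclidean clustering before a classifier rejects mislabeled groups. A minimal sketch of threshold-based Euclidean clustering (region growing) on voxel centers; the 0.5 m tolerance and the point data are illustrative assumptions, not values from the paper:

```python
from collections import deque

def euclidean_clusters(points, tol):
    """Group 3D points into clusters: a point joins a cluster if it lies
    within `tol` of any point already in that cluster (region growing)."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue, cluster = deque([seed]), [seed]
        while queue:
            i = queue.popleft()
            # Collect unvisited neighbors within the tolerance of point i.
            near = [j for j in unvisited
                    if sum((a - b) ** 2
                           for a, b in zip(points[i], points[j])) <= tol ** 2]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                cluster.append(j)
        clusters.append(sorted(cluster))
    return clusters

# Two well-separated groups of voxel centers that share one semantic label.
pts = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.0, 0.3, 0.0),
       (5.0, 5.0, 0.0), (5.2, 5.1, 0.0)]
print(sorted(euclidean_clusters(pts, tol=0.5)))  # → [[0, 1, 2], [3, 4]]
```

Production systems would use a kd-tree (e.g., PCL's EuclideanClusterExtraction) rather than this quadratic neighbor search, but the grouping behavior is the same.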
Abstract
1. Introduction
- We developed a semantic 3D mapping algorithm suitable for large-scale environments by combining a 3D Lidar with a camera.
- We presented an incremental semantic labeling method consisting of coordinate alignment, error minimization, and semantic information fusion to enhance the quality of the semantic map.
- We developed a map refinement step that removes traces and improves the accuracy of the semantic 3D map.
- We improved 3D segmentation accuracy over state-of-the-art algorithms on the KITTI dataset.
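The coordinate-alignment step above transfers 2D semantic labels to 3D by projecting voxel centers into the camera image (Figure 4). A hedged sketch of the underlying pinhole projection; the intrinsic matrix, extrinsic transform, and point are illustrative placeholders, not the paper's KITTI calibration:

```python
def project_voxel(center, K, T):
    """Project a 3D voxel center to pixel coordinates.
    T is a 3x4 lidar-to-camera extrinsic transform, K a 3x3 intrinsic
    matrix. Returns None if the point lies behind the camera."""
    # Apply the extrinsic transform to homogeneous coordinates.
    x, y, z = (
        sum(T[r][c] * v for c, v in enumerate(center + (1.0,)))
        for r in range(3)
    )
    if z <= 0:
        return None
    # Pinhole model: u = fx * x / z + cx, v = fy * y / z + cy.
    u = K[0][0] * x / z + K[0][2]
    v = K[1][1] * y / z + K[1][2]
    return (u, v)

# Illustrative intrinsics (fx = fy = 700, cx = 640, cy = 360)
# and identity extrinsics.
K = [[700.0, 0.0, 640.0], [0.0, 700.0, 360.0], [0.0, 0.0, 1.0]]
T = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
print(project_voxel((1.0, 0.5, 10.0), K, T))  # → (710.0, 395.0)
```

Each visible voxel would then be assigned the predicted class of the pixel it projects to, e.g. `label = seg_mask[int(v)][int(u)]` after a bounds check.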
2. Approach
2.1. Semantic Mapping
2.1.1. Consistent Point Cloud Registration
2.1.2. 2D Semantic Segmentation
2.1.3. Incremental Semantic Labeling
Coordinate Alignment
Error Minimization
Semantic Information Fusion
2.2. Map Refinement
2.2.1. Rectification of Label’s Spatial Distribution
2.2.2. Removal of Traces
3. Experimental Results
3.1. Dataset
3.2. Implementation Details
3.3. Qualitative Evaluation
3.4. Quantitative Evaluation
3.5. Time Analysis
4. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Barreiro, J.B.; Vicent, J.P.A.; Garcia, J.L.L. Airborne light detection and ranging (LiDAR) point density analysis. Sci. Res. Essays 2012, 7, 3010–3019. [Google Scholar] [CrossRef]
- Barreiro, J.B.; Lerma, J.L. Empirical study of variation in lidar point density over different land covers. Int. J. Remote Sens. 2014, 35, 3372–3383. [Google Scholar] [CrossRef]
- Barreiro, J.B.; Lerma, J.L. A new methodology to estimate the discrete-return point density on airborne LiDAR surveys. Int. J. Remote Sens. 2014, 35, 1496–1510. [Google Scholar] [CrossRef]
- Bosse, M.; Zlot, R.; Flick, P. Zebedee: Design of a spring-mounted 3-d range sensor with application to mobile mapping. IEEE Trans. Robot. 2012, 28, 1104–1119. [Google Scholar] [CrossRef]
- Zhang, J.; Singh, S. LOAM: Lidar Odometry and Mapping in Real-time. In Proceedings of the 2014 Robotics: Science and Systems Conference, Rome, Italy, 12–16 July 2014. [Google Scholar]
- Nüchter, A.; Hertzberg, J. Towards semantic maps for mobile robots. Robot. Auton. Syst. 2008, 56, 915–926. [Google Scholar] [CrossRef] [Green Version]
- Blodow, N.; Goron, L.C.; Marton, Z.C.; Pangercic, D.; Rühr, T.; Tenorth, M.; Beetz, M. Autonomous semantic mapping for robots performing everyday manipulation tasks in kitchen environments. In Proceedings of the 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2011), San Francisco, CA, USA, 25–30 September 2011; pp. 4263–4270. [Google Scholar]
- Davison, A.J.; Reid, I.D.; Molton, N.D.; Stasse, O. MonoSLAM: Real-time single camera SLAM. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 1052–1067. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Whelan, T.; Salas-Moreno, R.F.; Glocker, B.; Davison, A.J.; Leutenegger, S. ElasticFusion: Real-time dense SLAM and light source estimation. Int. J. Robot. Res. 2016, 35, 1697–1716. [Google Scholar] [CrossRef] [Green Version]
- Mur-Artal, R.; Montiel, J.M.M.; Tardos, J.D. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Trans. Robot. 2015, 31, 1147–1163. [Google Scholar] [CrossRef]
- Temeltas, H.; Kayak, D. SLAM for robot navigation. IEEE Aerosp. Electron. Syst. Mag. 2008, 23, 16–19. [Google Scholar] [CrossRef]
- Leung, C.; Huang, S.; Dissanayake, G. Active SLAM using model predictive control and attractor based exploration. In Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2006), Beijing, China, 9–15 October 2006; pp. 5026–5031. [Google Scholar]
- Vineet, V.; Miksik, O.; Lidegaard, M.; Nießner, M.; Golodetz, S.; Prisacariu, V.A.; Kähler, O.; Murray, D.W.; Izadi, S.; Pérez, P.; et al. Incremental dense semantic stereo fusion for large-scale semantic scene reconstruction. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA 2015), Seattle, WA, USA, 26–30 May 2015; pp. 75–82. [Google Scholar]
- Li, X.; Belaroussi, R. Semi-Dense 3D Semantic Mapping from Monocular SLAM. arXiv 2016, arXiv:1611.04144. [Google Scholar]
- Yang, S.; Huang, Y.; Scherer, S. Semantic 3D occupancy mapping through efficient high order CRFs. arXiv 2017, arXiv:1707.07388. [Google Scholar]
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 77–85. [Google Scholar]
- Tchapmi, L.; Choy, C.; Armeni, I.; Gwak, J.Y.; Savarese, S. Segcloud: Semantic segmentation of 3D point clouds. In Proceedings of the 2017 International Conference on 3D Vision (3DV 2017), Qingdao, China, 10–12 October 2017; pp. 537–547. [Google Scholar]
- Floros, G.; Leibe, B. Joint 2D-3D temporally consistent semantic segmentation of street scenes. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2012), Providence, RI, USA, 16–21 June 2012; pp. 2823–2830. [Google Scholar]
- Larlus, D.; Jurie, F. Combining appearance models and Markov random fields for category level object segmentation. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), Anchorage, AK, USA, 23–28 June 2008; pp. 1–7. [Google Scholar]
- Shelhamer, E.; Long, J.; Darrell, T. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651. [Google Scholar] [CrossRef] [PubMed]
- Paszke, A.; Chaurasia, A.; Kim, S.; Culurciello, E. Enet: A deep neural network architecture for real-time semantic segmentation. arXiv 2016, arXiv:1606.02147. [Google Scholar]
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for scene segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef] [PubMed]
- Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. In Proceedings of the International Conference on Learning Representations (ICLR 2016), San Juan, Puerto Rico, 2–4 May 2016. [Google Scholar]
- Lin, G.; Milan, A.; Shen, C.; Reid, I. Refinenet: Multi-path refinement networks with identity mappings for high-resolution semantic segmentation. arXiv 2016, arXiv:1611.06612. [Google Scholar]
- Cheng, J.; Sun, Y.; Meng, M.Q.H. A dense semantic mapping system based on CRF-RNN network. In Proceedings of the 2017 18th International Conference on Advanced Robotics (ICAR 2017), Hong Kong, China, 10–12 July 2017; pp. 589–594. [Google Scholar]
- Hermans, A.; Floros, G.; Leibe, B. Dense 3d semantic mapping of indoor scenes from rgb-d images. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA 2014), Hong Kong, China, 31 May–7 June 2014; pp. 2631–2638. [Google Scholar]
- Sengupta, S.; Sturgess, P. Semantic octree: Unifying recognition, reconstruction and representation via an octree constrained higher order MRF. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA 2015), Seattle, WA, USA, 26–30 May 2015; pp. 1874–1879. [Google Scholar]
- Puente, I.; González-Jorge, H.; Martínez-Sánchez, J.; Arias, P. Review of mobile mapping and surveying technologies. Measurement 2013, 46, 2127–2145. [Google Scholar] [CrossRef]
- Geiger, A.; Moosmann, F.; Car, Ö.; Schuster, B. Automatic camera and range sensor calibration using a single shot. In Proceedings of the 2012 IEEE International Conference on Robotics and Automation (ICRA 2012), Saint Paul, MN, USA, 14–18 May 2012; pp. 3936–3943. [Google Scholar]
- Pandey, G.; McBride, J.R.; Savarese, S.; Eustice, R.M. Automatic extrinsic calibration of vision and lidar by maximizing mutual information. J. Field Robot. 2015, 32, 696–722. [Google Scholar] [CrossRef]
- Hackel, T.; Savinov, N.; Ladicky, L.; Wegner, J.D.; Schindler, K.; Pollefeys, M. Semantic3D.net: A new Large-scale Point Cloud Classification Benchmark. arXiv 2017, arXiv:1704.03847. [Google Scholar] [CrossRef]
- Weinmann, M.; Jutzi, B.; Mallet, C. Semantic 3D scene interpretation: A framework combining optimal neighborhood size selection with relevant features. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2014, 2, 181–188. [Google Scholar] [CrossRef]
- Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD 1996, 96, 226–231. [Google Scholar]
- Sengupta, S.; Greveson, E.; Shahrokni, A.; Torr, P.H. Urban 3D Semantic Modelling Using Stereo Vision. In Proceedings of the 2013 IEEE International Conference on Robotics and Automation (ICRA 2013), Karlsruhe, Germany, 6–10 May 2013; pp. 580–585. [Google Scholar]
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision Meets Robotics: The KITTI Dataset. Int. J. Robot. Res. 2013, 32, 1231–1237. [Google Scholar] [CrossRef]
| Metric | Method | Road | Sidewalk | Building | Fence | Pole | Vegetation | Vehicle |
|---|---|---|---|---|---|---|---|---|
| Accuracy | Sengupta [34] | 97.8 | 86.5 | 88.5 | 46.1 | 38.2 | 86.9 | 88.5 |
| Accuracy | Sengupta [27] | 97.0 | 73.4 | 89.1 | 45.7 | 3.3 | 81.2 | 72.5 |
| Accuracy | Vineet [13] | 98.7 | 91.8 | 97.2 | 47.8 | 51.4 | 94.1 | 94.1 |
| Accuracy | Yang [15] | 98.7 | 93.8 | 98.2 | 84.7 | 66.3 | 98.7 | 95.5 |
| Accuracy | Ours | 98.7 | 95.2 | 99.3 | 92.1 | 77.2 | 95.3 | 98.3 |
| IoU | Sengupta [34] | 96.3 | 68.4 | 83.8 | 45.2 | 28.9 | 74.3 | 63.5 |
| IoU | Sengupta [27] | 87.8 | 49.1 | 73.8 | 43.7 | 1.9 | 65.2 | 55.8 |
| IoU | Vineet [13] | 94.7 | 73.8 | 88.3 | 46.3 | 41.7 | 83.2 | 79.5 |
| IoU | Yang [15] | 96.6 | 90.0 | 95.4 | 81.1 | 61.5 | 91.0 | 94.6 |
| IoU | Ours | 96.7 | 91.3 | 98.4 | 83.0 | 59.4 | 93.4 | 96.0 |
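The two metrics in the table are the standard per-class scores: accuracy (commonly computed as per-class recall) is TP / (TP + FN), while IoU is TP / (TP + FP + FN), which is why IoU never exceeds accuracy for the same class. A quick check with made-up counts (the numbers below are illustrative, not from the paper):

```python
def per_class_metrics(tp, fp, fn):
    """Per-class accuracy (recall) and intersection-over-union
    from true-positive, false-positive, and false-negative counts."""
    acc = tp / (tp + fn)
    iou = tp / (tp + fp + fn)
    return acc, iou

# Hypothetical voxel counts for one class.
acc, iou = per_class_metrics(tp=900, fp=50, fn=100)
print(round(acc, 3), round(iou, 3))  # → 0.9 0.857
```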
| Stage | Time (s) |
|---|---|
| 2D Semantic Segmentation | 0.2412 |
| Euclidean Clustering | 0.0898 |
| Random Forest | 0.1913 |
| Semantic Information Fusion | 0.0003 |
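Summing the per-stage times in the table gives the end-to-end latency of the labeling pipeline, a quick arithmetic check:

```python
# Per-stage times from the timing table, in seconds.
stage_times = {
    "2D Semantic Segmentation": 0.2412,
    "Euclidean Clustering": 0.0898,
    "Random Forest": 0.1913,
    "Semantic Information Fusion": 0.0003,
}
total = sum(stage_times.values())
print(round(total, 4), round(1.0 / total, 2))  # → 0.5226 1.91
```

That is roughly 0.52 s per frame, or about 1.9 frames per second for the labeling stages alone.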
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Jeong, J.; Yoon, T.S.; Park, J.B. Towards a Meaningful 3D Map Using a 3D Lidar and a Camera. Sensors 2018, 18, 2571. https://doi.org/10.3390/s18082571