Robust Visual Odometry Leveraging Mixture of Manhattan Frames in Indoor Environments
Figure 1. The proposed RGB-D VO system. **Top Left**: structured scene. **Top Right**: cluttered scene. **Bottom Left**: sparse map in a structured scene. **Bottom Right**: sparse map in a cluttered scene.
Figure 2. The dominant directions in the proposed method. $\mathrm{direction}_0$ to $\mathrm{direction}_5$ constitute a set of unit vectors $\{d_i^w\}$; $\mathrm{direction}_0$ to $\mathrm{direction}_2$ constitute the Manhattan frame $M_1$.
Figure 3. Overview of the proposed method.
Figure 4. Rotation estimation in MW scenes. The proposed method first extracts the dominant directions from parallel lines and matches them in the global map. It then detects the Manhattan frame $M_2$ from the dominant directions to obtain the initial rotation from the MF to the current frame; the frame $F_j$ is the first frame to observe this MF. A mean-shift-based tracking strategy refines the rotation, and the drift-free rotation is finally obtained using $F_j$ as the reference frame. The green dashed arrow indicates the virtual dominant direction created by the cross product of the two extracted dominant directions.
Figure 5. Comparison of ATE RMSE (m) for the ICL-NUIM sequences.
Figure 6. The percentage of MFs detected in each sequence of the ICL-NUIM dataset.
Figure 7. Sequences in the TUM RGB-D dataset, ordered as in Table 3.
Figure 8. **Left**: local map for the fr3/longoffice sequence. **Right**: trajectories estimated by our method (blue) and ManhattanSLAM (green), and the ground truth (dashed grey), on the TUM RGB-D fr3/longoffice sequence.
Figure 9. **Left**: comparison of ATE RMSE (m) for the sequences fr1/xyz, fr1/desk, fr2/xyz, and fr2/desk. **Right**: the percentage of MFs detected in each sequence.
Figure 10. **Left**: comparison of ATE RMSE (m) for the sequences fr3/s-nt-far, fr3/s-nt-near, fr3/s-t-far, fr3/s-t-near, and fr3/cabinet. **Right**: the percentage of MFs detected in each sequence.
Figure 11. **Left**: comparison of ATE RMSE (m) for the sequences fr3/l-cabinet and fr3/longoffice. **Right**: the percentage of MFs detected in each sequence.
Figure 12. Drift for the TAMU RGB-D Corridor-A sequence.
Abstract
1. Introduction
- A robust and general RGB-D VO framework for indoor environments. Because it can switch between decoupled and non-decoupled pose estimation depending on the scene, it is better suited to real-world environments (a minimal Manhattan-frame detection sketch follows this list).
- A novel drift-free rotation estimation approach. We detect the dominant directions of every frame by clustering parallel lines and track these directions to detect MFs; a mean-shift algorithm then refines the rotation estimate (see the mean-shift sketch below).
- An accurate and efficient local-map bundle adjustment strategy that combines point and line reprojection errors with rotation constraints from multi-view dominant-direction observations (see the residual sketch below).
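The first contribution hinges on a per-frame test: if a Manhattan frame can be assembled from the clustered dominant directions, the decoupled pipeline runs; otherwise the system falls back to joint 6-DoF point/line estimation. Below is a minimal sketch of such a test under our own assumptions (a 10° orthogonality tolerance and an SVD snap onto SO(3) are illustrative choices); it shows the idea rather than reproducing the paper's implementation.

```python
import numpy as np
from itertools import combinations

def detect_manhattan_frame(dom_dirs, ortho_tol_deg=10.0):
    """Look for a mutually near-orthogonal triplet among dominant directions.

    dom_dirs : (N, 3) unit vectors obtained by clustering parallel lines.
    Returns a 3x3 rotation whose columns are the MF axes, or None if the
    frame should be handled by the non-MW (joint 6-DoF) pipeline.
    """
    # Pairs within ortho_tol_deg of 90 degrees count as orthogonal.
    tol = np.cos(np.radians(90.0 - ortho_tol_deg))
    for i, j, k in combinations(range(len(dom_dirs)), 3):
        d = dom_dirs[[i, j, k]]
        if (abs(d[0] @ d[1]) < tol and abs(d[0] @ d[2]) < tol
                and abs(d[1] @ d[2]) < tol):
            # Snap the near-orthogonal triplet onto SO(3) via SVD.
            U, _, Vt = np.linalg.svd(d.T)
            return U @ np.diag([1.0, 1.0, np.linalg.det(U @ Vt)]) @ Vt
    return None
```

A frame for which this returns a rotation is treated as an MW scene (Section 2.5.2); otherwise the non-decoupled estimation of Section 2.5.1 applies.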
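The mean-shift refinement named in the second contribution can be sketched as follows: each MF axis is shifted toward the local density peak of the observed line directions on the unit sphere, and the three refined axes are re-orthonormalized into a rotation. The Gaussian kernel, bandwidth, and iteration count below are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def mean_shift_refine_rotation(R_init, line_dirs, bandwidth=0.1, iters=5):
    """Refine a Manhattan-frame rotation by mean shift on the unit sphere.

    R_init    : 3x3 rotation; columns are the MF axes in the camera frame.
    line_dirs : (N, 3) unit direction vectors of lines in the camera frame.
    """
    R = R_init.copy()
    for _ in range(iters):
        axes = []
        for k in range(3):
            a = R[:, k]
            # A line direction d and its negation -d are the same direction.
            signs = np.where(line_dirs @ a < 0.0, -1.0, 1.0)
            dirs = line_dirs * signs[:, None]
            # Project directions onto the sphere's tangent plane at the axis.
            tang = dirs - np.outer(dirs @ a, a)
            dist = np.linalg.norm(tang, axis=1)
            w = np.exp(-(dist / bandwidth) ** 2) * (dist < 3.0 * bandwidth)
            if w.sum() < 1e-9:
                axes.append(a)          # no nearby observations: keep axis
                continue
            shift = (w[:, None] * tang).sum(axis=0) / w.sum()
            a_new = a + shift           # mean-shift step, then renormalize
            axes.append(a_new / np.linalg.norm(a_new))
        # Snap the three refined axes back onto SO(3) via SVD.
        A = np.stack(axes, axis=1)
        U, _, Vt = np.linalg.svd(A)
        R = U @ np.diag([1.0, 1.0, np.linalg.det(U @ Vt)]) @ Vt
    return R
```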
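For the third contribution, the shape of the combined cost can be shown as a residual stack: point reprojection terms (line terms are analogous) plus a term penalizing misalignment between world dominant directions rotated into the camera frame and their per-frame observations. This numpy sketch only evaluates the residuals for a single pose; the weight `w_rot` is a hypothetical parameter, and in the actual system such terms would be minimized with g2o over the local-map keyframes and landmarks.

```python
import numpy as np

def local_ba_residuals(R, t, K, pts_w, obs_uv, dom_dirs_w, dom_dirs_obs,
                       w_rot=1.0):
    """Residual stack combining reprojection and dominant-direction terms.

    R, t         : camera pose (world -> camera), 3x3 and (3,).
    K            : 3x3 camera intrinsics.
    pts_w        : (N, 3) map points in the world frame.
    obs_uv       : (N, 2) pixel observations of those points.
    dom_dirs_w   : (M, 3) dominant directions in the world frame.
    dom_dirs_obs : (M, 3) matched dominant directions observed in the camera.
    """
    # Point reprojection error (line segments contribute analogous terms).
    pts_c = pts_w @ R.T + t
    uvw = pts_c @ K.T
    r_repr = (uvw[:, :2] / uvw[:, 2:3] - obs_uv).ravel()
    # Multi-view rotation constraint: world dominant directions, rotated
    # into the camera frame, should align with their observations.
    r_rot = w_rot * (dom_dirs_w @ R.T - dom_dirs_obs).ravel()
    return np.concatenate([r_repr, r_rot])
```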
2. Materials and Methods
2.1. System Overview
2.2. Feature Detection and Matching
2.3. Dominant Direction
2.4. Manhattan Frame Detection
2.5. Pose Estimation
2.5.1. Non-MW Scenes
2.5.2. MW Scenes
2.6. Local Map Bundle Adjustment
3. Results
3.1. ICL-NUIM Dataset
3.2. TUM RGB-D Dataset
3.3. Time Consumption
3.4. Drift
4. Discussion
4.1. Localization Accuracy
4.1.1. ICL-NUIM Dataset
4.1.2. TUM RGB-D Dataset
4.2. Time Consumption
4.3. Drift
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Mur-Artal, R.; Montiel, J.M.M.; Tardós, J.D. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Trans. Robot. 2015, 31, 1147–1163.
- Gomez-Ojeda, R.; Moreno, F.-A.; Zuniga-Noël, D.; Scaramuzza, D.; Gonzalez-Jimenez, J. PL-SLAM: A stereo SLAM system through the combination of points and line segments. IEEE Trans. Robot. 2019, 35, 734–746.
- Pumarola, A.; Vakhitov, A.; Agudo, A.; Sanfeliu, A.; Moreno-Noguer, F. PL-SLAM: Real-time monocular visual SLAM with points and lines. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 4503–4508.
- Zureiki, A.; Devy, M. SLAM and data fusion from visual landmarks and 3D planes. IFAC Proc. Vol. 2008, 41, 14651–14656.
- Zhang, X.; Wang, W.; Qi, X.; Liao, Z.; Wei, R. Point-plane SLAM using supposed planes for indoor environments. Sensors 2019, 19, 3795.
- Sun, C.; Qiao, N.; Ge, W.; Sun, J. Robust RGB-D visual odometry using point and line features. In Proceedings of the 2022 41st Chinese Control Conference (CCC), Hefei, China, 25–27 July 2022; pp. 3826–3831.
- Zhang, C. PL-GM: RGB-D SLAM with a novel 2D and 3D geometric constraint model of point and line features. IEEE Access 2021, 9, 9958–9971.
- Kim, P.; Coltin, B.; Kim, H.J. Visual odometry with drift-free rotation estimation using indoor scene regularities. In Proceedings of the British Machine Vision Conference (BMVC), London, UK, 4–7 September 2017; p. 7.
- Kim, P.; Coltin, B.; Kim, H.J. Low-drift visual odometry in structured environments by decoupling rotational and translational motion. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 7247–7253.
- Kim, P.; Coltin, B.; Kim, H.J. Linear RGB-D SLAM for planar environments. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 333–348.
- Joo, K.; Oh, T.-H.; Rameau, F.; Bazin, J.-C.; Kweon, I.S. Linear RGB-D SLAM for Atlanta world. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 1077–1083.
- Li, Y.; Brasch, N.; Wang, Y.; Navab, N.; Tombari, F. Structure-SLAM: Low-drift monocular SLAM in indoor environments. IEEE Robot. Autom. Lett. 2020, 5, 6583–6590.
- Li, Y.; Yunus, R.; Brasch, N.; Navab, N.; Tombari, F. RGB-D SLAM with structural regularities. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi'an, China, 30 May–5 June 2021; pp. 11581–11587.
- Zhou, Y.; Kneip, L.; Rodriguez, C.; Li, H. Divide and conquer: Efficient density-based tracking of 3D sensors in Manhattan worlds. In Asian Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 3–19.
- Liu, J.; Meng, Z. Visual SLAM with drift-free rotation estimation in Manhattan world. IEEE Robot. Autom. Lett. 2020, 5, 6512–6519.
- Liu, J.; Meng, Z.; You, Z. A robust visual SLAM system in dynamic man-made environments. Sci. China Technol. Sci. 2020, 63, 1628–1636.
- Shu, F.; Xie, Y.; Rambach, J.; Pagani, A.; Stricker, D. Visual SLAM with graph-cut optimized multi-plane reconstruction. In Proceedings of the 2021 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), Bari, Italy, 4–8 October 2021; pp. 165–170.
- Xia, R.; Jiang, K.; Wang, X.; Zhan, Z. Structural line feature selection for improving indoor visual SLAM. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2022, 43, 327–334.
- Yunus, R.; Li, Y.; Tombari, F. ManhattanSLAM: Robust planar tracking and mapping leveraging mixture of Manhattan frames. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi'an, China, 30 May–5 June 2021; pp. 6687–6693.
- Company-Corcoles, J.P.; Garcia-Fidalgo, E.; Ortiz, A. MSC-VO: Exploiting Manhattan and structural constraints for visual odometry. IEEE Robot. Autom. Lett. 2022, 7, 2803–2810.
- Straub, J.; Rosman, G.; Freifeld, O.; Leonard, J.J.; Fisher, J.W. A mixture of Manhattan frames: Beyond the Manhattan world. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 3770–3777.
- Mur-Artal, R.; Tardós, J.D. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Trans. Robot. 2017, 33, 1255–1262.
- Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571.
- Von Gioi, R.G.; Jakubowicz, J.; Morel, J.-M.; Randall, G. LSD: A fast line segment detector with a false detection control. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 722–732.
- Zhang, L.; Koch, R. An efficient and robust line segment matching approach based on LBD descriptor and pairwise geometric consistency. J. Vis. Commun. Image Represent. 2013, 24, 794–805.
- Kümmerle, R.; Grisetti, G.; Strasdat, H.; Konolige, K.; Burgard, W. g2o: A general framework for graph optimization. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 3607–3613.
- Handa, A.; Whelan, T.; McDonald, J.; Davison, A.J. A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 1524–1531.
- Sturm, J.; Engelhard, N.; Endres, F.; Burgard, W.; Cremers, D. A benchmark for the evaluation of RGB-D SLAM systems. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura, Portugal, 7–12 October 2012; pp. 573–580.
- Lu, Y.; Song, D. Robust RGB-D odometry using point and line features. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3934–3942.
Table 1. Comparison of related methods (×: no structural assumption).

Method | Year | Feature Types | Assumption | Pose Estimation Method
---|---|---|---|---
Ours | 2022 | Point, line, direction | MMF | Decoupled
MSC-VO | 2021 | Point, line | MW | Non-decoupled
ManhattanSLAM | 2021 | Point, line, plane | MMF | Decoupled
RGB-D SLAM | 2021 | Point, line, plane | MW | Decoupled
SP-SLAM | 2019 | Point, plane | × | Non-decoupled
ORB-SLAM2 | 2017 | Point | × | Non-decoupled
Table 2. ATE RMSE (m) on the ICL-NUIM dataset.

Sequence | Ours | MSC-VO | ManhattanSLAM | RGB-D SLAM | SP-SLAM | ORB-SLAM2
---|---|---|---|---|---|---
lr-kt0 | 0.006 | 0.006 | 0.007 | 0.006 | 0.016 | 0.014
lr-kt1 | 0.013 | 0.010 | 0.011 | 0.015 | 0.018 | 0.011
lr-kt2 | 0.014 | 0.009 | 0.015 | 0.020 | 0.017 | 0.021
lr-kt3 | 0.017 | 0.038 | 0.011 | 0.012 | 0.022 | 0.018
of-kt0 | 0.025 | 0.028 | 0.025 | 0.041 | 0.031 | 0.049
of-kt1 | 0.018 | 0.017 | 0.013 | 0.020 | 0.018 | 0.029
of-kt2 | 0.015 | 0.014 | 0.015 | 0.011 | 0.027 | 0.030
of-kt3 | 0.015 | 0.010 | 0.013 | 0.014 | 0.012 | 0.012
Average | 0.015 | 0.017 | 0.014 | 0.017 | 0.020 | 0.023
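All accuracy numbers in this and the following tables are ATE RMSE values in metres. For reference, here is a minimal sketch of the standard computation (rigid Horn/Umeyama alignment, then the RMSE of the residual translations). It assumes already time-associated trajectories and is not the TUM benchmark's own evaluation tool.

```python
import numpy as np

def ate_rmse(gt_xyz, est_xyz):
    """ATE RMSE between time-associated ground-truth and estimated positions.

    gt_xyz, est_xyz : (N, 3) arrays of corresponding camera positions.
    Rigidly aligns the estimate to the ground truth (Horn/Umeyama, no
    scale), then returns the RMSE of the residual translations in metres.
    """
    mu_g, mu_e = gt_xyz.mean(axis=0), est_xyz.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (est_xyz - mu_e).T @ (gt_xyz - mu_g)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_e
    err = gt_xyz - (est_xyz @ R.T + t)
    return np.sqrt((err ** 2).sum(axis=1).mean())
```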
Table 3. Characteristics of the TUM RGB-D sequences used for evaluation.

Group | Sequence | Texture | Structure | Plane | Strictly Follows the MW Assumption
---|---|---|---|---|---
1 | fr1/xyz | high | middle | low | middle
1 | fr1/desk | high | middle | low | middle
1 | fr2/xyz | high | middle | low | middle
1 | fr2/desk | high | middle | low | middle
2 | fr3/s-nt-far | low | high | high | low
2 | fr3/s-nt-near | low | high | high | low
2 | fr3/s-t-far | high | high | high | low
2 | fr3/s-t-near | high | high | high | low
2 | fr3/cabinet | low | high | high | high
3 | fr3/l-cabinet | high | middle | middle | middle
3 | fr3/longoffice | high | middle | middle | middle
Table 4. ATE RMSE (m) on the TUM RGB-D dataset (×: failure; -: not reported; *: average not computed due to missing results).

Group | Sequence | Ours | MSC-VO | ManhattanSLAM | RGB-D SLAM | SP-SLAM | ORB-SLAM2
---|---|---|---|---|---|---|---
1 | fr1/xyz | 0.009 | 0.010 | 0.010 | × | 0.010 | 0.010
1 | fr1/desk | 0.015 | 0.019 | 0.027 | × | 0.026 | 0.022
1 | fr2/xyz | 0.004 | 0.005 | 0.008 | × | 0.009 | 0.009
1 | fr2/desk | 0.010 | 0.023 | 0.037 | × | 0.025 | 0.040
1 | Average | 0.010 | 0.014 | 0.021 | * | 0.018 | 0.020
2 | fr3/s-nt-far | 0.021 | 0.077 | 0.040 | 0.022 | 0.031 | ×
2 | fr3/s-nt-near | 0.020 | × | 0.023 | 0.025 | 0.024 | ×
2 | fr3/s-t-far | 0.010 | - | 0.022 | 0.010 | 0.016 | 0.011
2 | fr3/s-t-near | 0.010 | - | 0.012 | 0.015 | 0.010 | 0.011
2 | fr3/cabinet | 0.036 | - | 0.023 | 0.035 | × | ×
2 | Average | 0.019 | * | 0.024 | 0.021 | * | *
3 | fr3/l-cabinet | 0.045 | 0.120 | 0.083 | 0.071 | 0.074 | ×
3 | fr3/longoffice | 0.011 | 0.022 | 0.046 | - | - | 0.021
3 | Average | 0.028 | 0.071 | 0.065 | * | * | *
Table 5. Time consumption of tracking and local mapping.

Method | Tracking (per stage, ms) | Tracking Total (Hz) | Local Map BA (ms)
---|---|---|---
Ours | Feature extraction: 24.39; Pose estimation: 12.59 | 25 | 183.34
ManhattanSLAM | Superpixel extraction and surfel fusion: 37.8 | 16 | -
Table 6. Drift on the TAMU RGB-D dataset.

Sequence | Ours | MSC-VO | ManhattanSLAM | ORB-SLAM2 | Length (m)
---|---|---|---|---|---
Corridor-A | 0.80 | 0.91 | 0.53 | 3.13 | 82
Entry-Hall | 0.76 | 1.07 | 0.39 | 2.22 | 54