Article
Research on Environment Perception System of Quadruped
Robots Based on LiDAR and Vision
Guangrong Chen *,† , Liang Hong †
Robotics Research Center, Beijing Jiaotong University, Beijing 100044, China; 18222038@bjtu.edu.cn
* Correspondence: grchen@bjtu.edu.cn
† These authors contributed equally to this work.
Abstract: Owing to their high stability and adaptability, quadruped robots are currently a topic of intense discussion in the robotics field. To cope with complicated indoor or outdoor environments, a quadruped robot should be equipped with an environment perception system, which typically contains LiDAR or a vision sensor and on which SLAM (Simultaneous Localization and Mapping) is deployed. In this paper, comparative experimental platforms, a quadruped robot and a wheeled vehicle, each carrying LiDAR and a vision sensor, are first established. Secondly, single-sensor SLAM, including LiDAR SLAM and visual SLAM, is investigated separately to highlight the advantages and disadvantages of each. Then, multi-sensor SLAM based on LiDAR and vision is addressed to improve environmental perception performance. Thirdly, YOLOv5 (You Only Look Once) improved by adding ASFF (adaptive spatial feature fusion) is employed for the image processing of gesture recognition and for human–machine interaction. Finally, the challenges of environment perception systems for mobile robots are discussed based on a comparison between wheeled and legged robots. This research provides insight into the environment perception of legged robots.
Keywords: quadruped robot; simultaneous localization and mapping; image processing; deep learning
data from these two sensors, the robustness of perception can be greatly improved [19].
Schlosser et al. fused LiDAR and color imagery for pedestrian detection using CNNs [20]. They incorporated LiDAR by up-sampling the point cloud to a dense depth map and extracting three features representing horizontal disparity, height above ground, and angle (HHA). These features were then used as extra image channels and fed into CNNs to learn a deep hierarchy of feature representations. Dhouioui et al. proposed an embedded system based on two types of data, radar signals and camera images, aiming to identify and classify obstacles on the road; they used machine learning methods and signal processing techniques to optimize the overall computational performance and efficiency [21].
López et al. incorporated vision and laser fusion techniques for simultaneous localization and mapping of Micro Air Vehicles (MAVs) in indoor rescue and/or identification navigation missions [22]. The technique fused laser and visual information, as well as measurement data from inertial components, to obtain a reliable 6-DOF pose estimate of the MAV within a local map. Experimental results showed that sensor fusion can improve position estimation under different test conditions and produce accurate maps. When considering robotic
applications in complex scenarios, traditional geometric maps appear inaccurate due to
their lack of interaction with the environment. Based on this, Jing Li et al. proposed
building a three-dimensional (3D) semantic map with large-scale and precise integration of
LiDAR and camera information to more accurately present real-time road scenes [23]. First,
they performed SLAM through multi-sensor fusion of LiDAR and inertial measurement
unit (IMU) data to locate the robot’s position and build a map of the surrounding scene
while the robot moves. Furthermore, they employed CNN-based image semantic segmentation to develop a semantic map of the environment. In [24], to address the incompleteness of environmental perception when using only a 2D LiDAR, the point cloud information from the RGB-D camera Kinect v2 was calibrated against the 2D LiDAR using intrinsic and extrinsic parameters based on the Cartographer algorithm. Precise calibration of the rigid-body
transform between the sensors is crucial for correct data fusion. To simplify the calibration
process, Valente et al. presented the first framework that makes use of CNNs for odometry estimation by fusing data from 2D laser scanners and monocular cameras without requiring sensor calibration [25]. Alatise et al. presented a fusion of a six-degrees-of-freedom (6-DoF)
inertial sensor and a monocular vision [26]. They integrated a monocular vision-based
object detection algorithm using Speeded-Up Robust Feature (SURF) and Random Sample
Consensus (RANSAC) algorithms to improve the accuracy of detection. By fusing data
from inertial sensors and a camera using an Extended Kalman Filter (EKF), they estimated
the position and orientation of the mobile robot. Xia et al. proposed an automated driving systems data acquisition and analytics platform [27]. It presents a holistic pipeline from raw advanced sensor data collection to data processing and is capable of processing sensor data from multiple CAVs (connected automated vehicles) and extracting objects' identity (ID) numbers, positions, speeds, and orientation information in map and Frenet coordinates. Liu et al. proposed a novel kinematic-model-based VSA (vehicle sideslip angle) estimation method by fusing information from a GNSS and an IMU [28]. Xia et al. proposed a method for fusing the IMU and automotive onboard sensors to estimate the yaw misalignment autonomously [29].
matching. Qiu et al. proposed an Adaptive Spatial Feature Fusion (ASFF) YOLOv5 network (ASFF-YOLOv5) to improve the accuracy of recognition and detection of multiscale road traffic elements [31]. The first step was to use the K-means algorithm for
clustering statistics on the range of multiscale road traffic elements. Then, they employed
a spatial pyramid pooling fast (SPPF) structure to enhance the accuracy of information
extraction. To address the problems in object detection in drone-captured scenarios due to
different altitudes and high drone speeds, Zhu et al. proposed TPH-YOLOv5 to handle
different object scales and motion blur [32]. Based on YOLOv5, they added an additional
prediction head to detect objects of different scales. They replaced the original prediction
heads with Transformer Prediction Heads (TPH) and integrated the Convolutional Block
Attention Model (CBAM) to identify attention regions in scenarios with dense objects.
Experiments on the VisDrone2021 dataset demonstrated that TPH-YOLOv5 performed
well, with impressive interpretability, in drone-captured scenarios. Liu et al. proposed a novel algorithm referred to as YOLOv5-tassel to detect tassels in UAV (unmanned aerial vehicle) RGB imagery [33].
In this paper, the environment perception system of quadruped robots based on LiDAR and vision is investigated. The paper is organized as follows. In Section 2, the comparative experimental platforms are set up. In Section 3, single-sensor SLAM is studied. In Section 4, multi-sensor SLAM is investigated. In Section 5, human–machine interaction via gesture recognition is addressed. In Section 6, the challenges of the environment perception system for legged robots are analyzed. In Section 7, conclusions are drawn and future work is outlined.
2. System Overview
To investigate the environmental perception performance of different mobile platforms, sensors, and algorithms, two platforms, a quadruped robot and a wheeled vehicle, are set up, each equipped with LiDAR and a vision sensor.
Figure 1. (a,b) The comparative experimental platforms with LiDAR and depth camera.
Figure 5. The successful tracking outcome achieved with ORB-SLAM2: feature points are labeled and a point cloud map can be generated.
Figure 7. The failed tracking outcome achieved with ORB-SLAM2: no feature points are labeled and a point cloud map cannot be generated.
To obtain a dense mapping result, we employed RTAB-MAP after the sparse mapping
achieved by ORB-SLAM2. RTAB-MAP is a graph-based SLAM approach. The visual
odometry process in RTAB-MAP involves feature detection, feature matching, motion
prediction, motion estimation, local bundle adjustment, pose update, and key frame and
feature map update [36]. Figure 8 illustrates the experimental result of using RTAB-MAP on the quadruped robot. Figure 8a presents the top view of the map, Figure 8b shows the two-dimensional grid map for navigation, and Figure 8c displays the three-dimensional point cloud map. The resulting map presents three-dimensional stereoscopic visual information, and additional visual features can be extracted after processing. However, the small field of view of the depth camera results in the omission of certain parts of the scene in the constructed map.
Figure 8. The experiment result using RTAB-MAP on the quadruped robot. (a) Top view of the map;
(b) Two-dimensional grid map for navigation; (c) Three-dimensional point cloud maps.
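To make the visual-odometry front end described above concrete, the following is a minimal, illustrative sketch of one frame-to-frame step (ORB feature detection, brute-force matching, and essential-matrix pose recovery) using OpenCV. It is not the ORB-SLAM2 or RTAB-MAP implementation; the camera intrinsic matrix K is assumed to be known from calibration, and both frames are assumed to contain enough texture for matching.

```python
# Illustrative sketch of a single frame-to-frame visual-odometry step.
# Not the ORB-SLAM2/RTAB-MAP code; K is the calibrated camera intrinsic matrix.
import cv2
import numpy as np

def relative_pose(img_prev, img_curr, K):
    # Feature detection and description (ORB features, as in ORB-SLAM2).
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(img_prev, None)
    kp2, des2 = orb.detectAndCompute(img_curr, None)

    # Feature matching with Hamming distance (ORB descriptors are binary).
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:500]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Motion estimation: essential matrix with RANSAC, then recover R and t.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t  # rotation and unit-scale translation between the two frames
```

In the actual pipeline, such pairwise estimates are further refined by local bundle adjustment and used to update the keyframes and the feature map.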
$x_{k|k} = x_{k|k-1} + K_k y_k$ (6)
where $F_k$ is the state transition matrix, $Q_k$ is the prediction noise covariance matrix, $R_k$ is the observation noise covariance matrix, $H_k$ is the observation matrix, and $I$ is the identity matrix.
In the prediction step, Equation (2) shows the state prediction, which obtains the prior at the current moment, $x_{k|k-1}$, from the posterior at the previous moment, $x_{k-1|k-1}$, and the current control input $u_k$. Equation (3) is used to predict the covariance prior.
In the update step, Equation (4) shows the calculation of the residual $y_k$. Equation (5) calculates the gain $K_k$. Equation (6) corrects the prediction, where $x_{k|k}$ is the estimated state at the current moment. Equation (7) yields the posterior estimate $P_{k|k}$.
During fusion, the 3D visual information obtained by the camera needs to be decomposed onto a two-dimensional plane in order to fuse it with the information obtained by the two-dimensional LiDAR. Since the LiDAR and the RGB-D camera are horizontally mounted, this decomposition can be easily performed. Therefore, the fusion problem can be transformed into an extended Kalman filter fusion problem between two two-dimensional planes.
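To make the fusion cycle concrete, the following is a minimal NumPy sketch of the prediction/update loop described by Equations (2)–(7). The state, control, and matrix definitions are placeholders for illustration rather than the actual models used on the robot; in the extended filter, F and H would be the Jacobians of the nonlinear motion and observation models evaluated at the current estimate.

```python
# Minimal sketch of the Kalman prediction/update cycle of Equations (2)-(7).
# The matrices are placeholders; on the robot, the LiDAR- and camera-derived
# 2D poses would each be fused as observations z with their own H and R.
import numpy as np

def predict(x, P, F, B, u, Q):
    x_prior = F @ x + B @ u               # Eq. (2): state prediction
    P_prior = F @ P @ F.T + Q             # Eq. (3): covariance prediction
    return x_prior, P_prior

def update(x_prior, P_prior, z, H, R):
    y = z - H @ x_prior                   # Eq. (4): residual
    S = H @ P_prior @ H.T + R
    K = P_prior @ H.T @ np.linalg.inv(S)  # Eq. (5): Kalman gain
    x_post = x_prior + K @ y              # Eq. (6): corrected state estimate
    I = np.eye(P_prior.shape[0])
    P_post = (I - K @ H) @ P_prior        # Eq. (7): posterior covariance
    return x_post, P_post
```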
[Figure: (a) Top view of the map; (b) Two-dimensional grid map for navigation; (c) Side view of the map; (d) Three-dimensional point cloud maps.]
5. Human–Machine Interaction
For mobile robots, apart from environment perception, interaction with humans is
also essential. Human–machine interaction helps robots understand human intentions,
enabling them to make informed decisions. Here, the recognition of 13 gestures is studied as an interaction method.
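As a hedged illustration of how gesture recognition could drive the robot, the sketch below loads a custom-trained YOLOv5 detector through the public torch.hub interface and maps the most confident detection to a motion command. The weight file 'gesture_best.pt' and the gesture-to-command table are hypothetical placeholders, not the authors' actual model or control interface.

```python
# Hypothetical sketch: mapping detected gestures to robot motion commands.
# 'gesture_best.pt' and the command names are placeholders for illustration.
import torch

# Load a custom-trained YOLOv5 model via the public torch.hub interface.
model = torch.hub.load('ultralytics/yolov5', 'custom', path='gesture_best.pt')

GESTURE_TO_COMMAND = {          # assumed mapping, four of the thirteen gestures
    'palm_open': 'stop',
    'fist': 'walk_forward',
    'point_left': 'turn_left',
    'point_right': 'turn_right',
}

def gesture_command(frame, conf_threshold=0.5):
    """Return a motion command for the most confident gesture in an RGB frame."""
    results = model(frame)                          # run inference on one image
    det = results.pandas().xyxy[0]                  # detections as a DataFrame
    det = det[det['confidence'] >= conf_threshold]
    if det.empty:
        return None
    best = det.sort_values('confidence', ascending=False).iloc[0]
    return GESTURE_TO_COMMAND.get(best['name'])     # None if gesture is unmapped
```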
Figures 11 and 12 illustrate the process of feature fusion using ASFF. In this process, features $X^1$, $X^2$, and $X^3$ from level 1, level 2, and level 3, respectively, are multiplied by weight parameters $\alpha$, $\beta$, and $\gamma$ to obtain weighted features. These weighted features are then summed to obtain the fused ASFF feature, as shown in Equation (8) [39],

$y_{ij}^l = \alpha_{ij}^l \cdot x_{ij}^{1 \to l} + \beta_{ij}^l \cdot x_{ij}^{2 \to l} + \gamma_{ij}^l \cdot x_{ij}^{3 \to l}$ (8)

where $y_{ij}^l$ denotes the $(i, j)$-th vector of the output feature map $y^l$ across channels. $\alpha_{ij}^l$, $\beta_{ij}^l$, and $\gamma_{ij}^l$ refer to the spatial importance weights for the feature maps from the three different levels to level $l$, which are adaptively learned by the network [39].
Figure 11. Illustration of the ASFF mechanism. For each level, the features of all the other levels are
resized to the same shape and spatially fused according to the learned weight maps [39].
where $\alpha_{ij}^l$, $\beta_{ij}^l$, and $\gamma_{ij}^l$ are defined using the softmax function with $\lambda_{\alpha_{ij}}^l$, $\lambda_{\beta_{ij}}^l$, and $\lambda_{\gamma_{ij}}^l$ as control parameters, respectively. We use $1 \times 1$ convolution layers to compute the weight scalar maps $\lambda_{\alpha}^l$, $\lambda_{\beta}^l$, and $\lambda_{\gamma}^l$ from $x^{1 \to l}$, $x^{2 \to l}$, and $x^{3 \to l}$, respectively, so they can be learned through standard back-propagation [39].
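As a simplified illustration (not the actual ASFF-YOLOv5 code), the PyTorch sketch below fuses the three level features at one level according to Equation (8), assuming the features have already been resized to a common shape and channel count; the full ASFF head also performs the resizing and channel compression.

```python
# Simplified ASFF fusion at a single level (Equation (8)); the inputs are
# assumed to be already resized to the same (B, C, H, W) shape.
import torch
import torch.nn as nn

class ASFFFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # 1x1 convolutions compute the scalar control maps (lambda_alpha,
        # lambda_beta, lambda_gamma in [39]) from the three resized features.
        self.w1 = nn.Conv2d(channels, 1, kernel_size=1)
        self.w2 = nn.Conv2d(channels, 1, kernel_size=1)
        self.w3 = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x1, x2, x3):
        lam = torch.cat([self.w1(x1), self.w2(x2), self.w3(x3)], dim=1)  # (B, 3, H, W)
        w = torch.softmax(lam, dim=1)          # alpha + beta + gamma = 1 per location
        alpha, beta, gamma = w[:, 0:1], w[:, 1:2], w[:, 2:3]
        return alpha * x1 + beta * x2 + gamma * x3   # weighted sum, Eq. (8)
```

For example, ASFFFusion(channels=256) applied to three 256-channel feature maps of the same spatial size returns one fused map of the same shape.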
As depicted in Figure 14a,b, the curves represent the training results. The Box curve represents the mean bounding-box loss, where a smaller value indicates more accurate prediction box positioning. Objectness represents the mean loss of object detection, and a smaller value indicates more accurate object detection. Classification represents the mean loss of classification, where a smaller value indicates more accurate classification. The Precision curve represents precision, where a higher value indicates higher accuracy; it can be expressed as Equation (10),

$\text{Precision} = \frac{TP}{TP + FP}$ (10)

where $TP$ represents the number of positive classes predicted as positive classes, and $FP$ represents the number of negative classes predicted as positive classes.
The calculation formula for recall is shown in Equation (11); a higher recall value indicates higher accuracy.

$\text{Recall} = \frac{TP}{TP + FN}$ (11)

where $TP$ is the number of positive classes predicted as positive classes, and $FN$ is the number of positive classes predicted as negative classes. mAP indicates the area enclosed by the precision–recall curve and the coordinate axes; the higher the value, the more accurate the detection.
F1 is another indicator of classification. The calculation formula of F1 is shown as Equation (12).

$F1 = \frac{2 \times \text{recall} \times \text{precision}}{\text{recall} + \text{precision}}$ (12)
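For reference, Equations (10)–(12) can be computed directly from per-class detection counts, as in the short sketch below; the example counts are illustrative only and are not taken from the experiments.

```python
# Precision, recall, and F1 from Equations (10)-(12).
def detection_metrics(tp, fp, fn):
    precision = tp / (tp + fp) if (tp + fp) else 0.0            # Eq. (10)
    recall = tp / (tp + fn) if (tp + fn) else 0.0               # Eq. (11)
    f1 = (2 * recall * precision / (recall + precision)
          if (recall + precision) else 0.0)                     # Eq. (12)
    return precision, recall, f1

# Illustrative counts: 83 correct detections, 9 false alarms, 17 missed gestures.
print(detection_metrics(83, 9, 17))  # -> (approx. 0.902, 0.830, 0.865)
```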
As depicted in Figure 14a,b, the mean box loss of the improved YOLOv5 is approximately 0.15, compared with 0.03 for YOLOv5. The classification loss is around 0.0010, slightly lower than the 0.0015 of YOLOv5. The highest precision reaches approximately 0.9, slightly higher than the nearly 0.9 of YOLOv5. The recall rate is approximately 0.83, higher than the 0.8 of YOLOv5. The mAP is nearly 0.9, which is significantly higher than the approximately 0.85 of YOLOv5. Overall, the improved network with ASFF outperforms the original network.
Figure 16. The results of multi-sensor fusion SLAM on the vehicle. (a) Top view of the map; (b) Two-dimensional grid map for navigation; (c) Side view of the map; (d) Three-dimensional point cloud maps.
The factors that weaken the environment perception performance of legged robots
may include:
• Oscillating body.
• Changing attitude.
• Non-smooth speed.
To reduce the influence of the above three factors, incorporating an IMU or other positioning sensors into the multi-sensor fusion SLAM should be considered.
7. Conclusions
In this paper, the environment perception system of quadruped robots based on
LiDAR and vision is investigated through comparative platforms, sensors, and algorithms.
In the SLAM part, initial experiments are conducted on the quadruped robot using a single sensor, either laser or visual, for map construction. However, these experiments reveal the limitations of a single sensor, including the lack of visual information and incomplete map
construction. To address these issues and achieve more accurate and robust localization
results, we employ the Extended Kalman Filter method to fuse data from the LiDAR
and depth camera. The fusion approach effectively compensates for the missing visual
information in the laser map and addresses the limited field of view. Moreover, when
one sensor fails, the other sensor ensures uninterrupted positioning.
In the visual recognition part, we establish a human–machine interaction system and enhance gesture recognition using the YOLOv5 network improved with ASFF. Experimental results demonstrate a significant improvement in gesture recognition accuracy with the improved YOLOv5 network.
In addition, the difference in environmental perception performance between wheeled and legged robots is studied. The results show that the environmental perception performance of the quadruped robot is weaker than that of the vehicle, since the vehicle moves more stably.
With the rapid advancements in artificial intelligence and computer vision, quadruped
robots are poised to find extensive applications in various fields such as surveying, search
and rescue operations, courier services during epidemics, and assistance for the disabled
as guide dogs. Furthermore, the environmental perception system developed in this study
can be applied not only to quadruped robots but also to autonomous driving, offering
promising prospects for broad applications.
However, the mapping results of the quadruped robot in this study still suffer from
noise and blurred boundaries due to unstable motion. To address these issues, the following
methods can be employed:
• Incorporate an IMU or other positioning sensors into multi-sensor fusion SLAM.
• Reduce the walking speed of the quadruped robot during map construction and
implement intermittent stops to mitigate motion instability.
• Enhance the stability of the robot’s motion by improving gait planning methods and
reducing shaking during movement. Additionally, incorporating cushioning materials
at the foot end can help minimize ground impact while walking.
• Utilize mechanical anti-shake techniques and specialized sensors, such as gyroscopes
and accelerometers, to detect robot movement and compensate for camera motion.
• Introduce filtering algorithms in the mapping algorithm to remove image noise.
• Apply digital video stabilization methods to estimate and smooth motion, filter out unwanted motion, and reconstruct stable video (a minimal sketch of this idea follows this list).
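As a minimal sketch of the digital video stabilization idea in the last item, the OpenCV code below estimates frame-to-frame rigid motion from tracked features, smooths the accumulated trajectory with a moving average, and re-warps each frame. It assumes features can be tracked in every frame and is illustrative only, not the authors' pipeline.

```python
# Illustrative digital video stabilization: estimate per-frame rigid motion,
# smooth the accumulated trajectory, and re-warp the frames.
import cv2
import numpy as np

def stabilize(frames, radius=15):
    transforms = []
    prev_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        pts_prev = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                           qualityLevel=0.01, minDistance=30)
        pts_curr, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts_prev, None)
        good = status.flatten() == 1
        m, _ = cv2.estimateAffinePartial2D(pts_prev[good], pts_curr[good])
        dx, dy = m[0, 2], m[1, 2]
        da = np.arctan2(m[1, 0], m[0, 0])      # rotation angle of the rigid motion
        transforms.append([dx, dy, da])
        prev_gray = gray

    # Smooth the accumulated trajectory with a moving average of width 2*radius+1.
    trajectory = np.cumsum(transforms, axis=0)
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    smoothed = np.vstack([np.convolve(trajectory[:, i], kernel, mode='same')
                          for i in range(3)]).T
    corrected = np.array(transforms) + (smoothed - trajectory)

    # Re-warp each frame with the corrected rigid transform.
    h, w = frames[0].shape[:2]
    out = [frames[0]]
    for frame, (dx, dy, da) in zip(frames[1:], corrected):
        m = np.array([[np.cos(da), -np.sin(da), dx],
                      [np.sin(da),  np.cos(da), dy]], dtype=np.float32)
        out.append(cv2.warpAffine(frame, m, (w, h)))
    return out
```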
Author Contributions: Conceptualization, G.C. and L.H.; methodology, G.C. and L.H.; software,
G.C. and L.H.; validation, G.C. and L.H.; formal analysis, G.C. and L.H.; investigation, G.C. and
L.H.; resources, G.C. and L.H.; data curation, G.C. and L.H.; writing—original draft preparation,
G.C. and L.H.; writing—review and editing, G.C. and L.H.; visualization, G.C. and L.H.; supervision,
G.C.; project administration, G.C.; funding acquisition, G.C. All authors have read and agreed to the
published version of the manuscript.
Funding: This work was supported by National Natural Science Foundation of China (62103036),
Fundamental Research Funds for the Central Universities (2022JBMC025), National Key Research
and Development Program of China (2022YFB4701600), and Joint Fund of the Ministry of Education
for Equipment Pre-research (8091B032147).
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Chen, G.; Wei, N.; Yan, L.; Lu, H.; Li, J. Perturbation-based approximate analytic solutions to an articulated SLIP model for legged
robots. Commun. Nonlinear Sci. Numer. Simul. 2023, 117, 106943. [CrossRef]
2. Hui, Z. Research on Environmental Perception, Recognition and Leader Following Algorithm of the Quadruped Robot. Ph.D.
Thesis, Shandong University, Jinan, China, 2016.
3. Chen, G.; Wang, J.; Wang, S.; Zhao, J.; Shen, W. Compliance control for a hydraulic bouncing system. ISA Trans. 2018, 79, 232–238.
[CrossRef] [PubMed]
4. Chen, G.; Wei, N.; Lu, H.; Yan, L.; Li, J. Optimization and evaluation of swing leg retraction for a hydraulic biped robot. J. Field
Robot. 2023, early view. [CrossRef]
5. Chen, G.; Guo, S.; Hou, B.; Wang, J. Virtual model control for quadruped robots. IEEE Access 2020, 8, 140736–140751. [CrossRef]
6. Gao, Y.; Wang, D.; Wei, W.; Yu, Q.; Liu, X.; Wei, Y. Constrained Predictive Tracking Control for Unmanned Hexapod Robot with
Tripod Gait. Drones 2022, 6, 246. [CrossRef]
7. Lee, J.W.; Lee, W.; Kim, K.D. An algorithm for local dynamic map generation for safe UAV navigation. Drones 2021, 5, 88.
[CrossRef]
8. Lee, D.K.; Nedelkov, F.; Akos, D.M. Assessment of Android Network Positioning as an Alternative Source of Navigation for
Drone Operations. Drones 2022, 6, 35. [CrossRef]
9. Xia, X.; Hashemi, E.; Xiong, L.; Khajepour, A. Autonomous Vehicle Kinematics and Dynamics Synthesis for Sideslip Angle
Estimation Based on Consensus Kalman Filter. IEEE Trans. Control Syst. Technol. 2022, 31, 179–192. [CrossRef]
10. Gao, L.; Xiong, L.; Xia, X.; Lu, Y.; Yu, Z.; Khajepour, A. Improved vehicle localization using on-board sensors and vehicle lateral
velocity. IEEE Sens. J. 2022, 22, 6818–6831. [CrossRef]
11. Ramachandran, A.; Sangaiah, A.K. A review on object detection in unmanned aerial vehicle surveillance. Int. J. Cogn. Comput.
Eng. 2021, 2, 215–228. [CrossRef]
12. Liang, Y.; Li, M.; Jiang, C.; Liu, G. CEModule: A computation efficient module for lightweight convolutional neural networks.
IEEE Trans. Neural Netw. Learn. Syst. 2021, early access. [CrossRef]
13. Zhou, P.; Liu, G.; Wang, J.; Weng, Q.; Zhang, K.; Zhou, Z. Lightweight unmanned aerial vehicle video object detection based on
spatial-temporal correlation. Int. J. Commun. Syst. 2022, 35, e5334. [CrossRef]
14. Ocando, M.G.; Certad, N.; Alvarado, S.; Terrones, Á. Autonomous 2D SLAM and 3D mapping of an environment using a single
2D LIDAR and ROS. In Proceedings of the 2017 Latin American Robotics Symposium (LARS) and 2017 Brazilian Symposium on
Robotics (SBR), Curitiba, Brazil, 8–11 November 2017; pp. 1–6.
15. Jeong, W.; Lee, K.M. CV-SLAM: A new ceiling vision-based SLAM technique. In Proceedings of the 2005 IEEE/RSJ International
Conference on Intelligent Robots and Systems, Edmonton, AB, Canada, 2–6 August 2005; pp. 3195–3200.
16. Davison, A.J.; Reid, I.D.; Molton, N.D.; Stasse, O. MonoSLAM: Real-time single camera SLAM. IEEE Trans. Pattern Anal. Mach.
Intell. 2007, 29, 1052–1067. [CrossRef] [PubMed]
17. Belter, D.; Nowicki, M.; Skrzypczyński, P. Evaluating map-based RGB-D SLAM on an autonomous walking robot. In International
Conference on Automation, 2–4 March 2016, Warsaw, Poland; Springer: Cham, Switzerland, 2016; pp. 469–481.
18. Callmer, J.; Törnqvist, D.; Gustafsson, F.; Svensson, H.; Carlbom, P. Radar SLAM using visual features. EURASIP J. Adv. Signal
Process. 2011, 2011, 71. [CrossRef]
19. Mittal, A.; Shivakumara, P.; Pal, U.; Lu, T.; Blumenstein, M. A new method for detection and prediction of occluded text in
natural scene images. Signal Process. Image Commun. 2022, 100, 116512. [CrossRef]
20. Schlosser, J.; Chow, C.K.; Kira, Z. Fusing lidar and images for pedestrian detection using convolutional neural networks. In
Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016;
pp. 2198–2205.
21. Dhouioui, M.; Frikha, T. Design and implementation of a radar and camera-based obstacle classification system using machine-
learning techniques. J. Real-Time Image Process. 2021, 18, 2403–2415. [CrossRef]
22. López, E.; Barea, R.; Gómez, A.; Saltos, Á.; Bergasa, L.M.; Molinos, E.J.; Nemra, A. Indoor SLAM for micro aerial vehicles using
visual and laser sensor fusion. In Robot 2015: Second Iberian Robotics Conference; Springer: Cham, Switzerland, 2016; pp. 531–542.
23. Li, J.; Zhang, X.; Li, J.; Liu, Y.; Wang, J. Building and optimization of 3D semantic map based on Lidar and camera fusion.
Neurocomputing 2020, 409, 394–407. [CrossRef]
24. Jin, D. Research on Laser Vision Fusion SLAM and Navigation for Mobile Robots in Complex Indoor Environments. Ph.D. Thesis,
Harbin Institute of Technology, Harbin, China, 2020.
25. Valente, M.; Joly, C.; de La Fortelle, A. Deep sensor fusion for real-time odometry estimation. In Proceedings of the 2019 IEEE/RSJ
International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 6679–6685.
26. Alatise, M.B.; Hancke, G.P. Pose estimation of a mobile robot based on fusion of IMU data and vision data using an extended
Kalman filter. Sensors 2017, 17, 2164. [CrossRef] [PubMed]
27. Xia, X.; Meng, Z.; Han, X.; Li, H.; Tsukiji, T.; Xu, R.; Zheng, Z.; Ma, J. An automated driving systems data acquisition and analytics
platform. Transp. Res. Part C Emerg. Technol. 2023, 151, 104120. [CrossRef]
28. Liu, W.; Xia, X.; Xiong, L.; Lu, Y.; Gao, L.; Yu, Z. Automated vehicle sideslip angle estimation considering signal measurement
characteristic. IEEE Sens. J. 2021, 21, 21675–21687. [CrossRef]
29. Xia, X.; Xiong, L.; Huang, Y.; Lu, Y.; Gao, L.; Xu, N.; Yu, Z. Estimation on IMU yaw misalignment by fusing information of
automotive onboard sensors. Mech. Syst. Signal Process. 2022, 162, 107993. [CrossRef]
30. Wang, K.; Liu, M.; Ye, Z. An advanced YOLOv3 method for small-scale road object detection. Appl. Soft Comput. 2021, 112, 107846.
[CrossRef]
31. Qiu, M.; Huang, L.; Tang, B.H. ASFF-YOLOv5: Multielement detection method for road traffic in UAV images based on multiscale
feature fusion. Remote Sens. 2022, 14, 3498. [CrossRef]
32. Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection
on drone-captured scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC,
Canada, 11–17 October 2021; pp. 2778–2788.
33. Liu, W.; Quijano, K.; Crawford, M.M. YOLOv5-Tassel: Detecting tassels in RGB UAV imagery with improved YOLOv5 based on
transfer learning. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 8085–8094. [CrossRef]
34. Norzam, W.; Hawari, H.; Kamarudin, K. Analysis of mobile robot indoor mapping using GMapping based SLAM with different
parameter. In IOP Conference Series: Materials Science and Engineering; IOP Publishing: Bristol, UK, 2019; Volume 705, p. 012037.
35. Mur-Artal, R.; Tardós, J.D. Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras. IEEE Trans. Robot.
2017, 33, 1255–1262. [CrossRef]
36. Labbé, M.; Michaud, F. RTAB-Map as an open-source lidar and visual simultaneous localization and mapping library for
large-scale and long-term online operation. J. Field Robot. 2019, 36, 416–446. [CrossRef]
37. Xiao, Y. Research on Real-Time Positioning and Mapping of Robots Based on Laser Vision Fusion. Master’s Thesis, University
of Chinese Academy of Sciences (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences), Shenzhen,
China, 2018.
38. Moore, T.; Stouch, D. A generalized extended kalman filter implementation for the robot operating system. In Intelligent
Autonomous Systems 13; Springer: Cham, Switzerland, 2016; pp. 335–348.
39. Liu, S.; Huang, D.; Wang, Y. Learning spatial fusion for single-shot object detection. arXiv 2019, arXiv:1911.09516.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.