Article
A Simultaneous Localization and Mapping (SLAM)
Framework for 2.5D Map Building Based on
Low-Cost LiDAR and Vision Fusion
Guolai Jiang 1,2,3, Lei Yin 3, Shaokun Jin 1,2,3, Chaoran Tian 1,2, Xinbo Ma 2
and Yongsheng Ou 1,3,4,*
1 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
2 Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences,
Shenzhen 518055, China; cr.tian@siat.ac.cn (C.T.); xb.Ma@siat.ac.cn (X.M.)
3 Guangdong Provincial Key Laboratory of Robotics and Intelligent System, Shenzhen Institutes of
Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; sk.jin@siat.ac.cn (S.J.);
gl.jiang@siat.ac.cn (G.J.); lei.yin@siat.ac.cn (L.Y.)
4 The CAS Key Laboratory of Human-Machine-Intelligence Synergic Systems, Shenzhen Institutes of
Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
* Correspondence: ys.ou@siat.ac.cn
Received: 8 March 2019; Accepted: 13 May 2019; Published: 22 May 2019
Abstract: The method of simultaneous localization and mapping (SLAM) using a light detection and
ranging (LiDAR) sensor is commonly adopted for robot navigation. However, consumer robots are
price sensitive and often have to use low-cost sensors. Due to the poor performance of a low-cost
LiDAR, error accumulates rapidly during SLAM, and this may cause large errors when building a
larger map. To cope with this problem, this paper proposes a new graph optimization-based SLAM
framework through the combination of a low-cost LiDAR sensor and a vision sensor. In the SLAM
framework, a new cost function considering both scan and image data is proposed, and the Bag of
Words (BoW) model with visual features is applied for loop closure detection. A 2.5D map presenting
both obstacles and vision features is also proposed, as well as a fast relocation method with the map.
Experiments were conducted on a service robot equipped with a 360° low-cost LiDAR and a front-view
RGB-D camera in a real indoor scene. The results show that the proposed method performs better
than using the LiDAR or the camera only, while the relocation speed with our 2.5D map is
much faster than with a traditional grid map.
1. Introduction
Localization and navigation are the key technologies of autonomous mobile service robots,
and simultaneous localization and mapping (SLAM) is considered an essential basis for them.
The main principle of SLAM is to detect the surrounding environment through sensors on the robot,
and to construct a map of the environment while estimating the pose (including both location
and orientation) of the robot. Since SLAM was first put forward in 1988, it has developed very fast,
and many different schemes have been formed. Depending on the main sensors applied, there are two
mainstream practical approaches: LiDAR (light detection and ranging) SLAM and Visual-SLAM.
1.1. LiDAR-SLAM
LiDAR can detect the distance of obstacles, and it is the best sensor for constructing a grid map,
which represents the structure and obstacles on the plane the robot runs on. Early SLAM research
often used LiDAR as the main sensor. The extended Kalman filter (EKF) was applied to estimate the
pose of the robot [1], but the performance was not ideal. For strongly nonlinear systems,
this method brings larger truncation errors, which may lead to inaccurate positioning and mapping.
Particle filter approaches [2,3] were introduced because they can effectively avoid the nonlinearity
problem, but they also increase the amount of calculation as the particle number grows.
In 2007, Grisetti proposed a milestone LiDAR-SLAM method called Gmapping [2]. Based on an
improved Rao-Blackwellized particle filter (RBPF), it improves the positioning accuracy
and reduces the computational complexity through an improved proposal distribution and an adaptive
re-sampling technique.
As an effective alternative to probabilistic approaches, optimization-based methods have become
popular in recent years. In 2010, Kurt Konolige proposed a representative method of this kind called
Karto-SLAM [3], which uses sparse pose adjustment to solve the matrix direct-solution problem in
nonlinear optimization. Hector SLAM [4], proposed in 2011, uses the Gauss-Newton method to solve
the scan matching problem; this method does not need odometry information, but a high-precision
LiDAR is required. In 2016, Google put forward a notable method called Cartographer [5]; by applying
laser loop closing to both sub-maps and the global map, the accumulative error is reduced.
1.2. Visual-SLAM
Using visual sensors to build an environment map is another hot spot for robot navigation. Compared
with LiDAR-SLAM, Visual-SLAM is more complex, because an image carries much more information
but has difficulty in distance measurement. Estimating robot motion by matching image features
extracted under different poses to build a feature map is a common approach to Visual-SLAM.
Mono-SLAM [6], proposed in 2007, is considered the origin of many Visual-SLAM methods.
The extended Kalman filter (EKF) is used as the back-end to track the sparse feature points detected
in the front-end. The uncertainty is expressed by a probability density function; from the observation
model and recursive calculation, the mean and variance of the posterior probability distribution are
obtained. Reference [7] used an RBPF to realize Visual-SLAM. This method avoids the nonlinearity
problem and has high precision, but it needs a large number of particles, which increases the
computational complexity. PTAM [8] is a representative work of Visual-SLAM. It proposed a simple
and effective method to extract key frames, as well as a parallel framework with a real-time tracking
thread and a back-end nonlinear optimization mapping thread. It was the first to put forward the
concept of separating the front end and the back end, guiding the structural design of many later
SLAM methods.
ORB-SLAM [9] is considered a milestone of Visual-SLAM. By applying Oriented FAST and
Rotated BRIEF (ORB) features and the bag-of-words (BoW) model, this method can stably create
a feature map of the environment in real time in many cases. Loop detection and closing via BoW
is an outstanding contribution of this work; it effectively limits the cumulative error, and the system
can quickly recover after tracking is lost.
In recent years, in contrast to feature-based methods, some direct Visual-SLAM methods have been
explored, which estimate robot motion from pixel values directly. Dense image alignment based on
every pixel of the images, proposed in Reference [10], can build a dense 3D map of the environment.
The work in Reference [11] built a semi-dense map by estimating the depth values of pixels with
a large gradient in the image. Engel et al. proposed the LSD-SLAM (Large-Scale Direct Monocular
SLAM) algorithm [12]. The core of LSD-SLAM is to apply a direct method to semi-dense
monocular SLAM, which was rarely seen in previous direct methods. Forster et al. proposed SVO
(Semi-Direct Monocular Visual Odometry) [13] in 2014, called the "sparse direct method", which combines
feature points with direct methods to track some key points (such as corners) and then estimates
the camera motion and position according to the information around the key points. This method
runs fast for Unmanned Aerial Vehicles (UAVs) by adding special constraints and optimizations for
such applications.
An RGB-D camera can provide both color and depth information in its field of view. It is the most
capable sensor for building a complete 3D scene map. Reference [14] proposes the KinectFusion method,
which uses the depth images acquired by Kinect to measure the minimum distance of each pixel in
each frame, and fuses all the depth images to obtain global map information. Reference [15] constructs
an error function by using the photometric and geometric information of image pixels; the camera
pose is obtained by minimizing the error function, and the mapping problem is treated as a pose graph.
Reference [16] presents an improved direct RGB-D SLAM method, which combines the intensity error
and depth error of pixels as the error function and minimizes the cost function to obtain the optimal
camera pose; this process is implemented with g2o. Entropy-based key frame extraction and closed-loop
detection methods are both proposed, thus greatly reducing the path error.
2. SLAM Framework Based on Graph Optimization
A graph-based SLAM framework could be divided into two parts: front-end and back-end, as shown
in Figure 1 below.
[Figure 1 block diagram: sensor data input → front-end (pose estimation) → constraints → back-end (optimization) → pose update.]
Figure 1. A general graph-based SLAM framework.
The front-end is mainly used to estimate the position and pose of the robot from sensor data.
However, the observed data contain varying degrees of noise, whether in images or in laser scans,
especially for low-cost LiDARs and cameras. The noise leads to cumulative errors in pose estimation,
and such errors increase with time. Through filtering or optimization algorithms, the back-end part
can eliminate the cumulative errors and improve the positioning and map accuracy.
In this paper, graph optimization is used as the back-end, and the error is minimized by searching
along the descending gradient through nonlinear optimization. Graph optimization simply describes
an optimization problem in the form of a graph: each node of the graph represents a position and
attitude, and each edge represents the constraint relationship between the poses and the observations.
The sketch map of graph optimization is shown in Figure 2.
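To make the graph structure concrete, the following minimal sketch (our illustration under stated assumptions, not the authors' implementation) stores poses as nodes and relative-pose constraints as edges; the front-end would fill this structure and the back-end would optimize the node poses.

```python
import numpy as np

# Minimal 2D pose graph sketch (illustrative only).
# A node stores a robot pose (x, y, theta); an edge stores the measured
# relative transform between two poses plus an information (weight) matrix.

class PoseGraph:
    def __init__(self):
        self.nodes = {}   # node_id -> np.array([x, y, theta])
        self.edges = []   # (id_from, id_to, measurement, information)

    def add_node(self, node_id, pose):
        self.nodes[node_id] = np.asarray(pose, dtype=float)

    def add_edge(self, id_from, id_to, measurement, information=np.eye(3)):
        # measurement: relative pose of id_to seen from id_from, obtained by
        # scan matching, image feature matching or odometry in the front-end.
        self.edges.append((id_from, id_to, np.asarray(measurement, dtype=float), information))

graph = PoseGraph()
graph.add_node(0, [0.0, 0.0, 0.0])
graph.add_node(1, [1.0, 0.1, 0.05])
graph.add_edge(0, 1, [1.0, 0.0, 0.0])   # front-end constraint; the back-end optimizes node poses
```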
Figure 3. The re-projection error.
As shown in Figure 3, for the feature point pair p1 and p2 in adjacent frames, the real-world spatial
point P corresponding to p1 could be localized first, and then P is re-projected onto the latter frame
to form the re-projected image position p̂2. Due to the error of the estimated position and pose,
p̂2 and p2 do not coincide; the distance between these two points could be denoted as a re-projection
error. Pure vision SLAM usually extracts and matches feature points; approaches like EPnP [29] can be
applied to estimate the pose transformation (R, t) of adjacent frames, where R represents the rotation
matrix and t represents the translation vector.
According to Figure 3, the re-projection error could be calculated as follows.
Firstly, calculate the 3D position [X', Y', Z']^T of P' (the corresponding point of P) in the camera
coordinate system, through the pose transformation relation [R, t]:
P' = RP + t = [X', Y', Z']^T. (1)
Then, get the normalized coordinate Pc:
Pc = [uc, vc, 1]^T = [X'/Z', Y'/Z', 1]^T. (2)
Based on the camera parameters (fx, fy, cx, cy), the re-projected pixel coordinate p̂2 = (us, vs) could be
calculated as:
us = fx uc + cx,
vs = fy vc + cy. (3)
Finally, if there are n feature points matched between two frames, the re-projection error is:
e1 = (1/2) Σ_{i=1}^{n} ||pi − p̂i||². (4)
Compared with the visual error, the laser scan matching error is much simpler to obtain.
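As an illustration of Equations (1)–(4), the sketch below computes the re-projection error for a set of matched points; the variable names (P_world, p_obs) and the loop structure are our assumptions for this example, not the paper's code.

```python
import numpy as np

# Sketch of the re-projection error e1 in Equations (1)-(4). P_world holds the
# 3D points expressed in the previous frame and p_obs their matched pixel
# coordinates in the current frame (assumed inputs).

def reprojection_error(R, t, P_world, p_obs, fx, fy, cx, cy):
    e1 = 0.0
    for P, p in zip(P_world, p_obs):
        X, Y, Z = R @ P + t                      # Eq. (1): transform into the current camera frame
        uc, vc = X / Z, Y / Z                    # Eq. (2): normalized image coordinates
        us, vs = fx * uc + cx, fy * vc + cy      # Eq. (3): re-projected pixel position
        e1 += 0.5 * np.sum((p - np.array([us, vs])) ** 2)   # Eq. (4): accumulated squared error
    return e1
```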
LiDAR-SLAM usually needs scan matching to realize the estimation of the pose transformation of
adjacent frames. The estimated transformation (R, t) cannot guarantee that all the laser data in the
previous frame completely coincide with the laser data in the latter frame according to the pose
transformation. The error could be calculated as:
e2 = Σ_{i=1}^{n} ||Mi − (RSi + t)||², (5)
where n is the number of matched scan points, M is the source scan point set and S is the scan point set
of an adjacent frame.
With multiple adjacent frame pairs obtained by Visual-SLAM or LiDAR-SLAM, the united re-projection
errors of the different frame pairs could be minimized to optimize multiple poses.
3. The SLAM Framework of Low-Cost LiDAR and Vision Fusion
Based on the framework and key parts introduced in Section 2, this paper proposes a new SLAM
framework of low-cost LiDAR and vision fusion. In the framework, for the back-end graph optimization,
a new united error function combining the visual data matching error and the laser data matching
error is introduced. Meanwhile, in order to solve the loop detection problem of traditional
LiDAR-SLAM, a loop detection method is added by introducing visual data and bag-of-words.
The integrated SLAM framework combining laser and vision data is shown in Figure 4 below:
In the framework, both the laser data and the image data can produce a robot pose estimation. For laser
data, scan-to-scan or scan-to-map methods can be applied to estimate the current robot pose. For image
data, we can use the ORB feature for image feature detection and generate the bag of words. Methods
like EPnP [29] could be applied to estimate the pose transformation of adjacent frames.
Consider two adjacent robot poses with an initial guess of the translation and rotation parameters
[R, t], which could be obtained through scan matching, image feature matching or even encoder data
individually. Based on the error formulas introduced in Section 2, the united error function could be
expressed as:
e = β Σ_{i=1}^{m} ||pi − (1/Z_{i−1}) C(RP_{i−1} + t)|| + (1 − β) α Σ_{j=1}^{n} ||Pj − (RP_{j−1} + t)||, (6)
where m is the number of matched features; n is the number of corresponding scan points; C is the
camera parameter matrix; pi is the image position of a feature; Pj is the scan point position; α is a
parameter for unit conversion between the image pixel error and the distance error, mainly judged by
the camera resolution and the laser ranging range; β is a parameter for balancing the two measurements,
taking a value in (0, 1). It is mainly judged by the two sensors' reliability and precision; for example,
if the robot is working in an environment full of image features, β can be larger so that the visual input
is considered more important. In contrast, if the scene is better for laser processing, β can be smaller.
(R, t) can be written in Lie algebra form, and the Lie algebraic transformation formula is:
RP + t = exp(ξ̂)P. (7)
It is a function of the variable ξ. For K pairs of multiple poses with their transformation relations,
the total error could be written as:
E = Σ_{i=1}^{K} ei = Σ_{i=1}^{K} Fi(ξi), (9)
and such an error can be minimized to find better translations and rotations between multiple poses.
Based on the Lie algebraic transformation set ξ, the robot pose set x is related to the transformation
set ξ by:
x = h(ξ). (11)
As a result, the total error of K pairs of related poses could be rewritten as:
E = Σ_{i=1}^{K} ei = Σ_{i=1}^{K} Fi(ξi) = f(x). (13)
The target of pose graph optimization is to find a proper set x, so that the error f(x) is minimized.
Correspondingly, ∆x is the increment of the global independent variable x. Therefore, when the
increment is added, the objective function is:
(1/2) ||f(x + ∆x)||² ≈ (1/2) Σ_{t=1}^{k} ||f(xt) + Jt ∆xt||², (14)
where J represents the Jacobian matrix, which is the partial derivative of the cost function with respect
to the variables, and k is the number of poses to be optimized, i.e., the poses between the current
frame and the loop frame involved in the global optimization.
Position and pose optimization can be regarded as a least squares problem. The commonly used
methods to solve the least squares problem are the Gradient Descent method, the Gauss-Newton
method [30] and the Levenberg-Marquardt (L-M) method [31]. The L-M method is a combination of
the Gradient Descent method and the Gauss-Newton method. This paper uses the L-M method to
solve the least squares problem. According to the L-M method and formula (14), in each iteration step
we need to find:
min_{∆xi} (1/2) ||f(xi) + J(xi)∆xi||²  s.t.  ||I∆xi||² ≤ µ, (15)
where µ is the radius of the searching area (here the area is treated as a sphere) in each step. The
problem could be changed to solving the following incremental equation:
(J(x)^T J(x) + λI)∆x = −J(x)^T f(x), (16)
where J represents the Jacobian matrix, and λ is the Lagrange multiplier. As x is known in each
iteration step, let H = J(x)^T J(x) and g = −J(x)^T f(x); it could be rewritten as:
(H + λI)∆x = g. (17)
As the L-M method could be regarded as a combination of the Gradient Descent method and the
Gauss-Newton method in formula (17), it is more robust for obtaining a reliable ∆x even if H is not
invertible. By solving the above linear equation, we can obtain ∆x, and then iterate on ∆x to obtain
the x corresponding to the minimum objective function, that is, the robot pose. Generally speaking,
the dimension of H is very large, and the complexity of matrix inversion is O(n³). However, because H
contains the constraints between poses, and only adjacent poses have direct constraints, most of
the elements are 0. H is sparse and shaped like an arrow, so the calculation of (17) can be accelerated.
In practice, the pose graph optimization problem could be solved through tools like g2o [32] or
Ceres [33], with a given error function and initial value. In our work, we apply g2o to solve this problem.
The error function is formula (8), and the initial value of x is obtained mainly through scan matching.
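Although the paper delegates this optimization to g2o, a single damped step in the spirit of Equations (14)–(17) can be sketched as follows; residual_fn and jacobian_fn are hypothetical callables returning f(x) and J(x), and the damping-update factors are assumptions.

```python
import numpy as np

# Minimal Levenberg-Marquardt step (illustrative sketch, not the g2o internals).
# residual_fn(x) returns the stacked error vector f(x); jacobian_fn(x) returns J(x).

def lm_step(x, residual_fn, jacobian_fn, lam):
    f = residual_fn(x)
    J = jacobian_fn(x)
    H = J.T @ J                                        # Gauss-Newton approximation of the Hessian
    g = -J.T @ f
    dx = np.linalg.solve(H + lam * np.eye(H.shape[0]), g)   # Eq. (17): (H + lambda*I) dx = g
    # Accept the step only if the cost decreases; otherwise raise the damping.
    if 0.5 * np.sum(residual_fn(x + dx) ** 2) < 0.5 * np.sum(f ** 2):
        return x + dx, lam * 0.5                       # behaves closer to Gauss-Newton
    return x, lam * 2.0                                # behaves closer to gradient descent
```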
The key-frame selection mechanism is as follows: (1) at least 15 frames have passed since the last
global relocation; (2) at least 15 frames have passed since the previous key-frame; (3) key frames must
have tracked at least 50 three-dimensional feature points.
Among them, (1) and (2) are the basis of its uniqueness, because in a short period of time the
characteristics of the field of view will not change significantly; (3) ensures its robustness, as too few
map points will lead to an unstable calculation error.
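Expressed as code, the selection rule above reduces to a simple predicate (a sketch with assumed counter variables):

```python
# Sketch of the key-frame selection rule, criteria (1)-(3) above.
# The counter names are assumptions made for this illustration.

def is_key_frame(frames_since_relocation, frames_since_last_keyframe,
                 tracked_feature_points):
    return (frames_since_relocation >= 15 and       # criterion (1)
            frames_since_last_keyframe >= 15 and    # criterion (2)
            tracked_feature_points >= 50)           # criterion (3)
```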
For each key-frame, the ORB image features and their visual words in the BoW dictionary are
calculated and saved as a bag of features. As a result, a key-frame stores the robot pose and the bag
of features, as well as the scan data obtained at that pose. It should be noted that the key-frame
here is applied for loop detection or relocation; the robot pose of each key-frame is updated after
global optimization.
During the loop detection procedure, for an incoming image frame, the bag of features is calculated
and matched with all possible previous key-frames ("possible" means the key-frame should be from a
period of time before, and close enough to the estimated pose during the SLAM procedure). When the
current image frame is sufficiently similar to a key-frame image by matching the bags of features,
a possible loop closing is considered. Then, the current frame and that key-frame are matched by both
the scan data and the image data, and the error in formula (6) is also calculated. If the matching error
is small enough, the loop is confirmed and the matching result is added as a constraint to the graph
optimization framework, so that the accumulative errors can be eliminated.
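A possible shape of this loop detection step is sketched below; bow_similarity(), match_and_compute_error(), the key-frame attributes and all thresholds are hypothetical placeholders introduced for illustration, not the authors' implementation.

```python
import numpy as np

# Sketch of the loop detection procedure: candidate key-frames are filtered by
# elapsed time and estimated distance, ranked by BoW similarity, and a loop is
# accepted only if the united matching error of Equation (6) is small enough.

def detect_loop(current_frame, key_frames, current_pose,
                min_frame_gap=100, max_dist=3.0,
                sim_threshold=0.6, error_threshold=1.0):
    best = None
    for kf in key_frames:
        if current_frame.index - kf.index < min_frame_gap:
            continue                                  # too recent in time
        if np.linalg.norm(current_pose[:2] - kf.pose[:2]) > max_dist:
            continue                                  # too far in estimated pose
        score = bow_similarity(current_frame.bag_of_words, kf.bag_of_words)  # placeholder
        if score > sim_threshold and (best is None or score > best[0]):
            best = (score, kf)
    if best is None:
        return None
    score, kf = best
    # placeholder: joint scan + image matching, returning the error of Eq. (6)
    R, t, error = match_and_compute_error(current_frame, kf)
    return (kf, R, t) if error < error_threshold else None
```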
where mg(x, y) denotes the possibility that the grid (x, y) is occupied. Generally, the value of mg(x, y)
can be 1 (the grid (x, y) is occupied) or 0 (the grid (x, y) is not occupied).
A feature map is another kind of map, generated by most feature-based Visual-SLAM approaches; it
can be represented as:
Mfeature = { f(x, y, z) }, (19)
where f(x, y, z) denotes that at the world position (x, y, z) there is a feature f(x, y, z); for real
applications, f(x, y, z) could be a descriptor of the feature in a dictionary like BoW.
During Visual-SLAM processing, key-frames with bag-of-features descriptions and the poses at which
they were observed are usually stored as an accessory to the feature map. For relocation applications,
searching and matching in key-frames can greatly improve the processing speed and reliability.
As a result, a feature map is a sparse map which only has values at positions that have features.
This makes a feature map reliable for localization, but not good for navigation and path-planning.
Figure 5. The expression of the proposed 2.5D map.
As shown in Figure 5, a 2.5D map combining a dense obstacle representation on the 2D grid plane and
a sparse feature representation in the 3D space above the plane could be obtained, through the
combination and optimization with LiDAR and camera sensors. A list of key-frames with poses and
visual words extracted during SLAM processing is also attached to the map.
In the map, in order to simplify the calculation, the resolution is unified with that of the grid map;
for example, the smallest grid cell could be 5 cm × 5 cm × 5 cm. In that case, there may be multiple
feature points with different feature values detected in one cell. That situation is also considered in
the map representation.
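One possible data layout for such a 2.5D map is sketched below (an assumption about storage, not the authors' exact structure): occupancy values are indexed by 2D grid cells, feature descriptors by 3D voxels of the same resolution, and the key-frames are kept in an attached list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Illustrative 2.5D map container: dense occupancy on the 2D plane, sparse
# visual features in 3D voxels, plus the attached key-frame list.

@dataclass
class Map25D:
    resolution: float = 0.05                                         # 5 cm grid cells
    occupancy: Dict[Tuple[int, int], float] = field(default_factory=dict)
    features: Dict[Tuple[int, int, int], List[int]] = field(default_factory=dict)
    key_frames: List[dict] = field(default_factory=list)             # pose + visual words + scan

    def cell(self, x, y, z=0.0):
        r = self.resolution
        return (int(x // r), int(y // r), int(z // r))

    def add_obstacle(self, x, y):
        self.occupancy[self.cell(x, y)[:2]] = 1.0

    def add_feature(self, x, y, z, word_id):
        # several features detected in the same cell simply share the voxel key
        self.features.setdefault(self.cell(x, y, z), []).append(word_id)
```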
4.3. Relocation with Proposed 2.5D Map
In real applications, although the robot already has the environment map, it may still lose its
location. Such a situation happens when the robot starts at an unknown position, or is kidnapped
by a human (blocking the sensors and carrying it away) while working. Relocation is then required,
and its speed and success rate have a great influence on the practicability of a mobile robot system.
Currently, for the grid map created by LiDAR-SLAM approaches, Monte Carlo and particle
filter-based methods are widely applied to find the robot pose. However, because the scan data carries
too little unique information about the environment, it may take a long time for the robot to find its
location. On the other hand, image information is rich enough for fast place finding. In this paper,
with the aid of image features and BoW, a fast relocation algorithm with our map is as follows:
Firstly, extract the current image features and calculate the bag of features (visual words) with BoW.
Secondly, list all the previous key-frames with poses in the map which share visual words with the
current image, and find the n best matching key-frames with ranking scores through BoW searching.
Thirdly, for the n best matching key-frames, take their poses as the initial guess for particles.
Finally, apply a particle filter-based method to find the best robot pose; the error function could be
formula (6) in Section 3.
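The four steps above could be organized as in the following sketch; extract_orb_bow(), query_bow(), sample_around() and score_particle() are assumed helper functions used only for illustration, with score_particle() evaluating formula (6) for a particle pose.

```python
# Sketch of the relocation procedure with the 2.5D map: BoW retrieval proposes
# candidate key-frame poses, which seed a particle filter scored by the united
# error of Equation (6). All helper functions below are hypothetical placeholders.

def relocate(image, scan, map25d, n_candidates=5, particles_per_pose=50):
    words = extract_orb_bow(image)                        # step 1: current visual words
    candidates = query_bow(map25d.key_frames, words,      # step 2: ranked key-frame matches
                           top_n=n_candidates)
    particles = []
    for kf in candidates:                                 # step 3: initialize particles near candidates
        particles.extend(sample_around(kf.pose, particles_per_pose))
    # step 4: weight each particle by the inverse of the united error (formula (6))
    weights = [1.0 / (1e-6 + score_particle(p, image, scan, map25d)) for p in particles]
    return particles[weights.index(max(weights))]
```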
5. Experiment
The experiment is divided into three parts. In the first part, a comparative experiment of fixed-point
positioning accuracy is carried out in a small range of scenes; the traditional laser SLAM based on
graph optimization, the visual SLAM based on ORB feature point extraction and the laser-vision
method proposed in this paper are used to collect positioning data. In the second part, a large scene
loop experiment is carried out to verify whether the proposed method can effectively solve the
problem of map closure in laser SLAM. In the last part, we load the built map for the re-localization
experiment.
The LiDAR is RPLIDAR-A2 produced by SLAMTEC company. It is a low-cost LiDAR for service
robotics, which has a 360° coverage field and a range of up to 8 m. The key parameters are listed in
Table 1. It should be noted that the parameters are taken under the ideal situation; being a low-cost
LiDAR, the data collected from it is usually much poorer than that from expensive ones in a real scene.
The RGB-D camera is Xtion-pro produced by ASUS company. The effective range of depth
measurement is 0.6–8 m, the precision is 3 mm, and the angle of view of the depth camera can reach
58° horizontally and 45.5° vertically.
With the robot platform, we have collected several typical indoor databases in the "rosbag" format,
which is easy to read for ROS. Each database contains the sensor data obtained by the robot while it is
running in a real scene, including odometry data, laser scans, color images and depth images. Figure 7
shows one example of the database: Figure 7a displays the robot in the environment; Figure 7b shows
the scan data obtained in the place; Figure 7c is the RGB image; Figure 7d is the depth image.
Figure 7. One example of the database: (a) the robot in the environment; (b) scan data obtained in the
scene; (c) the RGB image; (d) the depth image.
Different experiments were conducted to evaluate the performance of mapping and relocation. For the
mapping, the methods compared include Karto-SLAM [3], orb-SLAM [9] and the method of this paper.
For the relocation with a map, we compared our method with Adaptive Monte Carlo Localization
(AMCL) [34]. In the experiments, the parameter β of formulas (6) and (8) is set to 0.2, considering the
precision and reliability of the two main sensors in the environment.
5.2. Experiment of Building the Map
Firstly, in order to evaluate the positioning accuracy while mapping, we manually marked 6 positions
in a real scene, as shown in Figure 8. The start point, position 0, is specified as the origin (0, 0) of the
world coordinate system. As shown in Figure 8, the robot started from position 0 and stopped at
position 5, and the blue arrow in each picture is the robot's moving direction. Table 2 shows the result
of positioning in the SLAM system from position 1 to position 5.
As can be seen from Table 2, for both methods, the initial positioning error is small. However, as the
robot runs further, the error increases with distance. Sooner or later, such error will be too large to be
ignored, and will finally cause the mapping to fail. This situation may become worse for a low-cost
LiDAR. That is why we need a loop closing module for the SLAM system.
Table 2. The positioning result while Simultaneous Localization and Mapping (SLAM).

Position Number | Real Position (m) | Karto-SLAM (m) | Our Method (m)
1 | (3, 0) | (2.995, −0.002) | (2.994, −0.003)
2 | (6, 0) | (5.987, −0.003) | (5.990, −0.005)
3 | (6, −8) | (6.025, −8.032) | (6.019, −8.021)
4 | (6, −16) | (6.038, −15.954) | (6.028, −15.973)
5 | (3, −16) | (2.949, −15.946) | (2.951, −15.965)
We did a further mapping experiment in a real scene, compared with Karto-SLAM and Orb-SLAM.
The result is shown in Figure 9. Figure 9a is the result of Karto-SLAM; the blue points are the estimated
robot positions during SLAM. Because of the growing cumulative error, the starting point and the
ending point in the red coil cannot be fused together, and the mapping failed. Figure 9b is the feature
map constructed by orb-SLAM. It should be noted that orb-SLAM easily loses track, and the robot
should run and turn very slowly while building the map. When losing track, it needs a manual
operation to turn back to find previous key-frames. As a result, orb-SLAM cannot work directly with
our recorded rosbag databases. Figure 9c is the grid map part of the proposed method; the green
points are the robot positions estimated during SLAM. Because of the effective loop detection and
optimization, the map is much better than those of the compared methods. The final 2.5D map
showing both the grid and the feature points is shown in Figure 10.
Figure 9. Comparison of maps constructed by different methods: (a) the result of Karto-SLAM; (b) the
result of orb-SLAM; (c) the result of the proposed method (showing only the grid map).
only grid map).
Figure
Figure 10. The result
10. The
The result of
result of the
of the proposed
the proposed method
method showing
showing feature position and
feature position and grid
grid map.
map.
5.3. Experiment of Relocation
Finally, with the 2.5D map built by the proposed method, a relocation experiment was carried out.
The comparing methods are the Adaptive Monte Carlo Localization method (with LiDAR and the grid
map) and the orb-SLAM localization mode (with the camera and the feature map).
From the practical point of view of the robot, when the position is lost and after the interference is
removed, the shorter the relocation time of the robot is, the better, which is also the reason why rapid
relocation is needed. In our experiment, considering the practicability and convenience of the robot,
we artificially set the relocation time threshold to 30 s.
During the experiment, the robot is put in a real scene at random positions, with or without manually
given initial poses. Running the different localization algorithms, the robot is driven randomly in the
environment until it finds its correct location. Finally, if the time exceeds the given threshold (30 s) or
the pose converges to a wrong position, we judge it as an unsuccessful relocation try. The success rate
of relocation and the average time consumed (if successful) are shown in Table 3.
Table 3. The relocation results of different methods.

Method | Map and Sensor | With Given Initial Pose (Success Rate / Avg. Time) | Without Given Initial Pose (Success Rate / Avg. Time)
AMCL [34] | Grid map with LiDAR | 95% / 6.2 s | 30% / 20.8 s
Orb-SLAM Localization [9] | Feature map with camera | - / - | 88% / 5.8 s
As can be seen from Table 3, the AMCL approach needs a given initial pose for fast relocation, or it
may often fail within the limited time. Orb-SLAM adopts the bag-of-words approach, which is capable
of fast global relocation; its relocation failures happen in the case of visually similar scenes. The
relocation method with our proposed map is faster and has a higher success rate than the compared
methods, because both the laser scan and the image are considered.
Author Contributions: Conceptualization, S.J., G.J. and L.Y.; methodology, G.J., S.J.; software, G.J., L.Y.;
validation, L.Y., C.T.; formal analysis, S.J.; data acquisition, L.Y.; writing—original draft preparation, G.J.;
writing—review and editing, X.M.; supervision, Y.O.; project administration, Y.O.
Funding: This work was supported by the National Natural Science Foundation of China (Grants No. U1613210),
Guangdong Special Support Program (2017TX04X265), Primary Research & Development Plan of Guangdong Province
(2019B090915002), Shenzhen Fundamental Research Program (JCYJ20170413165528221, JCYJ2016428154842603).
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Smith, R.; Self, M.; Cheeseman, P. Estimating uncertain spatial relationships in robotics. In Proceedings of the
IEEE International Conference on Robotics and Automation (ICRA), Taipei, Taiwan, 14–19 September 2003;
pp. 435–461.
2. Grisetti, G.; Stachniss, C.; Burgard, W. Improved Techniques for Grid Mapping with Rao-Blackwellized
Particle Filters. IEEE Trans. Robot. 2007, 23, 34–46. [CrossRef]
3. Konolige, K.; Grisetti, G.; Kümmerle, R.; Burgard, W.; Limketkai, B.; Vincent, R. Efficient Sparse Pose
Adjustment for 2D mapping. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots
and Systems, Taipei, Taiwan, 18–22 October 2010; pp. 22–29.
4. Kohlbrecher, S.; Von Stryk, O.; Meyer, J.; Klingauf, U. A flexible and scalable SLAM system with full 3D
motion estimation. In Proceedings of the IEEE International Symposium on Safety, Security, and Rescue
Robotics, Kyoto, Japan, 1–5 November 2011; pp. 155–160.
5. Hess, W.; Kohler, D.; Rapp, H.; Andor, D. Real-time loop closure in 2D LIDAR SLAM. In Proceedings of the
IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016;
pp. 1271–1278.
Appl. Sci. 2019, 9, 2105 16 of 17
6. Davison, A.J.; Reid, I.D.; Molton, N.D.; Stasse, O. MonoSLAM: Real-time single camera SLAM.
IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 1052. [CrossRef] [PubMed]
7. Sim, R.; Elinas, P.; Griffin, M. Vision-based SLAM using the Rao-Blackwellised particle filter. IJCAI Workshop
Reason. Uncertain. Robot. 2005, 9, 500–509.
8. Klein, G.; Murray, D. Parallel Tracking and Mapping for Small AR Workspaces. In Proceedings of the IEEE
and ACM International Symposium on Mixed and Augmented Reality, Nara, Japan, 13–16 November 2007;
pp. 1–10.
9. Mur-Artal, R.; Montiel, J.M.M.; Tardós, J.D. ORB-SLAM: A Versatile and Accurate Monocular SLAM System.
IEEE Trans. Robot. 2015, 31, 1147–1163. [CrossRef]
10. Jan, S.; Gumhold, S.; Cremers, D. Real-Time Dense Geometry from a Handheld Camera. In Dagm Conference
on Pattern Recognition; Springer: Berlin, Germany, 2010; pp. 11–20.
11. Engel, J.; Sturm, J.; Cremers, D. Semi-dense Visual Odometry for a Monocular Camera. In Proceedings of the
IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 1449–1456.
12. Engel, J.; Schöps, T.; Cremers, D. LSD-SLAM: Large-Scale Direct Monocular SLAM. In Computer Vision,
Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer:
Cham, Switzerland, 2014; pp. 834–849.
13. Forster, C.; Pizzoli, M.; Scaramuzza, D. SVO: Fast semi-direct monocular visual odometry. In Proceedings of
the IEEE International Conference on Robotics and Automation, Tianjin, China, 3–6 August 2014; pp. 15–22.
14. Newcombe, R.A.; Izadi, S.; Hilliges, O.; Molyneaux, D.; Kim, D.; Davison, A.J.; Kohli, P.; Shotton, J.; Hodges, S.;
Fitzgibbon, A.W. KinectFusion: Real-time dense surface mapping and tracking. In Proceedings of the
IEEE International Symposium on Mixed and Augmented Reality, Basel, Switzerland, 26–29 October 2011;
pp. 127–136.
15. Gokhool, T.; Meilland, M.; Rives, P.; Fernandez-Moral, E. A dense map building approach from spherical
RGB-D images. In Proceedings of the International Conference on Computer Vision Theory and Applications,
Lisbon, Portugal, 5–8 January 2014; pp. 656–663.
16. Kerl, C.; Sturm, J.; Cremers, D. Dense visual SLAM for RGB-D cameras. In Proceedings of the IEEE/RSJ
International Conference on Intelligent Robots and Systems, Chicago, IL, USA, 14–18 September 2014;
pp. 2100–2106.
17. Chen, M.; Yang, S.; Yi, X.; Wu, D. Real-time 3D Mapping using a 2D Laser Scanner and IMU-aided Visual
SLAM. In Proceedings of the IEEE International Conference on Real-time Computing & Robotics IEEE,
Kandima, Maldives, 1–5 August 2018.
18. Tong, Q.; Peiliang, L.; Shaojie, S. VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State
Estimator. IEEE Trans. Robot. 2018, 34, 1004–1020.
19. Li, M.; Mourikis, A.I. 3-D motion estimation and online temporal calibration for camera-IMU systems.
In Proceedings of the International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany,
6–10 May 2013.
20. Shi, J.; He, B.; Zhang, L.; Zhang, J. Vision-based real-time 3D mapping for UAV with laser sensor.
In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS),
Deajeon, Korea, 9–10 October 2016; pp. 9–14.
21. Lynen, S.; Achtelik, M.W.; Weiss, S.; Chli, M.; Siegwart, R. A robust and modular multi-sensor fusion
approach applied to MAV navigation. In Proceedings of the 2013 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS), Tokyo, Japan, 3–8 November 2013.
22. Oh, T.; Lee, D.; Kim, H.; Myung, H. Graph Structure-Based Simultaneous Localization and Mapping
Using a Hybrid Method of 2D Laser Scan and Monocular Camera Image in Environments with Laser Scan
Ambiguity. Sensors 2015, 15, 15830–15852. [CrossRef] [PubMed]
23. López, E.; García, S.; Barea, R.; Bergasa, L.M.; Molinos, E.J.; Arroyo, R.; Romera, E.; Pardo, S. A Multi-Sensorial
Simultaneous Localization and Mapping (SLAM) System for Low-Cost Micro Aerial Vehicles in GPS-Denied
Environments. Sensors 2017, 17, 802. [CrossRef] [PubMed]
24. Nam, T.H.; Shim, J.H.; Cho, Y.I. A 2.5D Map-Based Mobile Robot Localization via Cooperation of Aerial and
Ground Robots. Sensors 2017, 17, 2730. [CrossRef] [PubMed]
25. Zhang, Z.; Zhao, R.; Liu, E.; Yan, K.; Ma, Y. Scale Estimation and Correction of the Monocular Simultaneous
Localization and Mapping (SLAM) Based on Fusion of 1D Laser Range Finder and Vision Data. Sensors 2018,
18, 1948. [CrossRef] [PubMed]
Appl. Sci. 2019, 9, 2105 17 of 17
26. Shin, Y.; Park, Y.; Kim, A. Direct Visual SLAM Using Sparse Depth for Camera-LiDAR System. In Proceedings
of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia,
21–25 May 2018; pp. 1–8.
27. Xu, Y.; Ou, Y.; Xu, T. SLAM of Robot based on the Fusion of Vision and LIDAR. In Proceedings of the 2018
IEEE International Conference on Cyborg and Bionic Systems (CBS), Shenzhen, China, 25–27 October 2018;
pp. 121–126.
28. Jiang, G.; Yin, L.; Liu, G.; Xi, W.; Ou, Y. FFT-Based Scan-Matching for SLAM Applications with Low-Cost
Laser Range Finders. Appl. Sci. 2019, 9, 41. [CrossRef]
29. Lepetit, V.; Morenonoguer, F.; Fua, P. EPnP: An Accurate O(n) Solution to the PnP Problem. Int. J. Comput. Vis.
2009, 81, 155–166. [CrossRef]
30. Bjorck, A. Numerical Methods for Least Squares Problems; SIAM: Philadelphia, PA, USA, 1996.
31. Morrison, D.D. Methods for nonlinear least squares problems and convergence proofs. In Proceedings of the
Jet Propulsion Laboratory Seminar on Tracking Programs and Orbit Determination, Pasadena, CA, USA,
23–26 February 1960; pp. 1–9.
32. Kümmerle, R.; Grisetti, G.; Strasdat, H. G2O: A general framework for graph optimization. In Proceedings
of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011;
pp. 9–13.
33. Ceres Solver. Available online: http://ceres-solver.org (accessed on 10 May 2018).
34. AMCL-ROS. Available online: http://wiki.ros.org/amcl (accessed on 25 April 2016).
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).