
Non-Gaussian SLAP: Simultaneous Localization and Planning Under Non-Gaussian Uncertainty in Static and Dynamic Environments

Mohammadhussein Rafieisakhaei*, Suman Chakravorty†, and P.R. Kumar‡
*,‡ Department of Electrical and Computer Engineering, † Department of Aerospace Engineering
Texas A&M University, College Station, Texas 77840
* Email: mrafieis@tamu.edu

arXiv:1605.01776v3 [cs.RO] 11 Aug 2016

Abstract—Simultaneous Localization and Planning (SLAP) under process and measurement uncertainties is a challenge. It involves solving a stochastic control problem modeled as a Partially Observed Markov Decision Process (POMDP) in a general framework. For a convex environment, we propose an optimization-based open-loop optimal control problem coupled with a receding horizon control strategy to plan high-quality trajectories along which the uncertainty of the state localization is reduced while the system reaches a goal state with minimum control effort. In a static environment with non-convex state constraints, the optimization is modified by defining barrier functions to obtain collision-free paths while maintaining the previous goals. By initializing the optimization with trajectories in different homotopy classes and comparing the resultant costs, we improve the quality of the solution in the presence of action and measurement uncertainties. In dynamic environments with time-varying constraints such as moving obstacles or banned areas, the approach is extended to find collision-free trajectories. In this paper, the underlying spaces are continuous, and beliefs are non-Gaussian. Without obstacles, the optimization is a globally convex problem, while in the presence of obstacles it becomes locally convex. We demonstrate the performance of the method on different scenarios.

I. INTRODUCTION

Uncertainty is a feature of many robotic systems. Whether it is navigation underwater, aerially, or in a cluttered environment, uncertainty presents itself as an inevitable part of the problem. Dealing with uncertainty in any situation is a challenge. For a robust and reliable plan, the controller must solve a stochastic control problem [25, 26], which can be formulated as a Partially Observed Markov Decision Process (POMDP) [3, 45, 19]. However, unless the domain of the planning problem is finite and small, finding the optimal control law in a general set-up is extremely difficult [36, 30]. This is because the planner needs to compute an offline policy function for all conditional probability distributions of the state, referred to as belief states. Approaches that solve the problem by either building a decision tree of discretized observation-action pairs [44, 37], or maintaining a global discretization of the value function over continuous spaces, suffer from the exponential complexity known as the curse of dimensionality [19, 56]. A more significant problem is the exponential growth of the number of trajectories corresponding to action-observation pairs, known as the curse of history [37].

In contrast, deterministic motion-planning methods assume no uncertainty and aim to obtain collision-free paths. In these methods, the dynamics of the system and any kind of uncertainty involved in the problem have no impact on the final trajectory. Falling in this category are sampling-based approaches, such as Rapidly exploring Random Tree (RRT) [27, 21] and Probabilistic RoadMap (PRM)-based [22, 23] methods, and trajectory optimization-based algorithms, such as Covariant Hamiltonian Optimization for Motion Planning (CHOMP) [57] and TrajOpt [43], which aim to obtain a collision-free trajectory.

However, real-world systems contain both motion and sensing uncertainty and require belief space planning. Assuming the belief is fully parameterized by a mean and covariance is a well-entrenched method in the literature. Early extensions of Linear Quadratic Gaussian (LQG)-based methods [6], such as iterative LQG (iLQG) [48], restricted attention to process uncertainty by assuming full observations. By incorporating observation uncertainty, roadmap-based [40, 17] and tree-based methods [50] apply LQG methodology to find a locally optimal nominal path, rather than constructing a trajectory. Van den Berg et al. [51] use stochastic Differential Dynamic Programming (sDDP) to extend LQG-MP [50] to roadmaps, which is then extended in [52] utilizing iLQG to perform the belief value-iterations. Methods described in [49, 38] perform filtering during the planning stage using the most likely observation in order to make the belief propagation deterministic. Algorithms that consider Monte-Carlo simulations on future observations, or methods such as [1, 51] which consider all future observations, can perform better. However, LQG-based methods always require an initial feasible solution, which can limit trajectories and affect planning. In the Feedback-Based Information RoadMap (FIRM) [2], a belief-MDP problem is solved over a belief-graph. This method is advantageous because it breaks the curse of history; however, its belief stabilizations are LQG-based. In LQG-based methods, beyond the Gaussianity assumption on the uncertainty (which is insufficient in many situations), computational cost and scalability are a concern. In the case of large deviations, unpredicted error accumulations force the controller to re-plan. Hence, methods that provide an output trajectory instead of a feedback policy must have a fast re-planning algorithm. Thus, the application of methods such as [51], with high computational complexity, becomes limited to problems of small domains.

Receding Horizon Control (RHC) or Model Predictive Control (MPC)-based methods [15, 31] are closely related methods, where the optimization problem involved in the inner control loop is required to compute quickly. In these methods, at each time-step, a finite-horizon optimal control problem results in an optimal open-loop sequence of controls for the current state of the system, whose first element is applied to the plant [33, 32]. An appealing advantage of MPCs is that, unlike POMDP solvers that obtain an offline feedback policy prescribing the control for all (belief) states, MPCs have a natural online planning procedure for the current state of the plant. Particularly, in problems with changing environments and moving objects, offline plans can become unreliable after the execution of a few steps of planned actions; online planning, in contrast, explores only the relevant portion of the belief space. Deterministic MPCs have received the most attention in the literature, and stochastic MPCs are still under development. A large body of the literature focuses its attention on robust MPCs and optimization in a tube of trajectories generated by propagating the initial state with several samples of process uncertainties [18, 29]. These methods can lead to extreme conservativeness [28]. Another class of methods tackles the problem with linear process and observation models. However, these methods can lead to either non-convexity or high computational costs [16]. In problems with process uncertainty, Monte-Carlo based methods [4, 20], and related methods such as the scenario approach [12], have also been successful in providing probabilistic guarantees with high confidence for convex problems. However, in the presence of obstacles, most problems become inherently non-convex, which limits the application of such methods in robotics problems.

In this paper, a stochastic MPC is proposed for planning in the belief space. We define a finite-horizon optimal control problem with a terminal constraint that samples from an initial non-Gaussian belief and maps belief state samples to observation samples through usage of the observation model. The actions are planned with the aim of compressing the ensemble of observation trajectories. Therefore, the filtering equation, which usually poses a heavy burden on stochastic control problems, is avoided during the planning stage. Additionally, the "most likely observation" assumption is not used. The core problem in a convex environment is convexified for common non-linear observation models. Moreover, non-convex constraints are respected softly through line integration of barrier functions along the state trajectory. In a static environment, we apply the proposed optimization over trajectories in different homotopy classes to find the lowest-cost, smooth, collision-free trajectory with uncertainty reduction and minimum effort. It should be noted that, although the optimization is initialized with trajectories, our algorithm constructs a new trajectory, since the underlying trajectory only defines the homotopy class and is morphed continuously into a locally optimal one. Moreover, as elaborated in the simulations, the initial trajectory need not be completely feasible; if it is feasible, the optimization avoids morphing towards infeasible regions. In dynamic environments, where the obstacles can move or banned areas can change over time, the optimization problem is modified to respect the time-varying constraints via time-varying barrier functions. Therefore, in neither the static nor the dynamic case does the optimization vector size change, and the decision variables remain solely the control variables. Our approach, which simplifies the planning stage and reduces the computational complexity of solving the open-loop optimal control problem, significantly improves online navigation under uncertainty when compared to previous methods. Planning under uncertainty becomes more efficient through the reduced computational cost of our stochastic MPC, which is inherently flexible and scalable, enabling its usage in both static and dynamic environments. This flexibility allows our stochastic MPC to increase its scalability in the static environment, thereby moving from locally-optimal solutions towards a better approximation of a globally-optimal approach by applying the algorithm over multiple homotopy classes. Furthermore, once established, this scalability makes possible the treatment of dynamic environments with time-varying constraints or moving obstacles, which makes the algorithm suitable for online planning.

II. PROBLEM DEFINITION

In this section, we provide the general problem and other definitions.

System equations: We denote the state of the system by x ∈ X ⊂ R^{n_x}, the control action by u ∈ U ⊂ R^{n_u}, and the observation vector by z ∈ Z ⊂ R^{n_z}. We only consider holonomic systems in this paper, where the general system equations are x_{t+1} = f(x_t, u_t, ω_t) with z_t = h(x_t) + ν_t, where {ω_t} and {ν_t} are zero-mean independent, identically distributed (iid), mutually independent random sequences, f: X × U × R^{n_x} → X defines the process (motion) model, and h: X → Z specifies the observation (measurement) model.

Belief (information state): The conditional distribution of x_t given the data history up to time t is called the belief, b_t: X × Z^t × U^t → R. It is defined as b_t(x, z_{0:t}, u_{0:t−1}, b_0) := p_{X_t|Z_{0:t};U_{0:t−1}}(x | z_{0:t}; u_{0:t−1}; b_0), and is denoted by b_t(x) or b_t in this paper [46, 25, 9, 24]. We use a non-Gaussian particle filter representation of a belief in the belief space, B, by taking a number N of state samples {x^i_t}_{i=1}^N with importance weights {w^i_t}_{i=1}^N [47, 13, 42, 41], so that

$$b_t(x) \approx \sum_{i=1}^{N} w^i_t \, \delta(x - x^i_t),$$

where δ(·) is the Dirac delta mass. It can be proven that as the number of particles increases to infinity, the distribution of the particles tends to the true filtering distribution [14]. In our planning problem, we employ the Maximum-A-Posteriori (MAP) state estimate, defined as x̂_t = arg max_{x∈X} b_t(x). The belief dynamics follow Bayesian update equations, summarized as a function τ: B × U × Z → B, where b_{t+1} = τ(b_t, u_t, z_{t+1}) [5].
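As a concrete illustration of this particle representation, the following minimal Python sketch (ours, not the paper's implementation; the kernel-smoothed scoring used for the MAP estimate and its bandwidth are illustrative assumptions) stores weighted samples and extracts x̂_t:

```python
import numpy as np

class ParticleBelief:
    """Non-Gaussian belief b_t(x) ~= sum_i w_t^i * delta(x - x_t^i)."""
    def __init__(self, particles, weights):
        self.x = np.atleast_2d(np.asarray(particles, dtype=float))  # N x n_x samples x_t^i
        w = np.asarray(weights, dtype=float)
        self.w = w / w.sum()                                        # normalized importance weights

    def map_estimate(self, bandwidth=0.1):
        # x_hat_t = argmax_x b_t(x); with a Dirac mixture we score each particle
        # by a kernel-smoothed density and return the highest-density particle.
        d2 = ((self.x[:, None, :] - self.x[None, :, :]) ** 2).sum(axis=2)
        density = (self.w[None, :] * np.exp(-d2 / (2.0 * bandwidth ** 2))).sum(axis=1)
        return self.x[np.argmax(density)]
```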
Fig. 1: Comparison of our method (3) with the traditional belief propagation methods (1 and 2).

Problem 1: General optimal control problem. For a holonomic system, given the initial belief state b_{t_0} and a goal state x_g, we define the following optimal control problem:

$$
\begin{aligned}
\min_{\pi}\;& \mathbb{E}_{\pi}\Big[\sum_{t=0}^{K-1} c^{\pi}_t(x_t, u_t) + c_K(x_K)\Big] && \\
\text{s.t.}\;& b_{t+1} = \tau(b_t, u_t, z_{t+1}), && \text{(1a)}\\
& x_{t+1} = f(x_t, u_t, \omega_t), && \text{(1b)}\\
& z_t = h(x_t) + \nu_t, && \text{(1c)}\\
& \phi^j_t(x_t) < 0, \quad j \in [1, n_c], && \text{(1d)}
\end{aligned}
$$

where the optimization is over feasible policies, π = {π_0, ..., π_t}, where π_t: Z^{t+1} → U specifies an action given the output, u_t = π_t(z_{0:t}). Moreover, c^π_t(·,·): X × U → R is the one-step cost function, φ^j_t(x_t) < 0 are the n_c(K+1) inequality constraints, c^π_K(·): X → R denotes the terminal cost, and the expectation is taken with respect to the policy π.

Now that we have defined our problem, we present the proposed solution for different variations of this problem.

III. OUR SOLUTION

A. Belief Space Planning

Calculation of the solution for the belief space Problem 1 is in general intractable. One attempt to make the problem more tractable is to restrict the policy to strategies such as RHC: reduce the closed-loop optimization to an open-loop optimal control problem, and close the feedback loop by re-planning after the execution of each action to account for the updated belief. Even though the open-loop problem does not provide a feedback policy for all beliefs, it provides an open-loop optimal control sequence that still involves optimization in the belief space, which is a function space. Particularly, since the observation space is continuous, the constraint of equation (1a) makes the problem computationally intractable. However, we will define our cost function such that the filtering equation is incorporated in the cost function. It can be shown that equation (1a) can be omitted to make the optimization more tractable, while the approximate solution maintains its quality.

Linearized system equations: We linearize the process model around a nominal trajectory {x^p_t}_{t=t_0}^{t_0+K}, {u^p_t}_{t=t_0}^{t_0+K-1} for a lookahead time horizon of K, and linearize the observation model about the MAP state trajectory, to obtain:

$$x_{t+1} = f^p_t + A_t x_t + B_t u_t + G_t \omega_t, \tag{2a}$$
$$z_t = h(\hat{x}_t) + H(\hat{x}_t)(x_t - \hat{x}_t) + \nu_t, \tag{2b}$$

where f^p_t := f(x^p_t, u^p_t, 0) − A_t x^p_t − B_t u^p_t, and the matrices A_t = ∂f(x,u,0)/∂x|_{x^p_t,u^p_t}, B_t = ∂f(x,u,0)/∂u|_{x^p_t,u^p_t}, and G_t = ∂f(x,u,ω)/∂ω|_{x^p_t,u^p_t,0} are constant during each time step (but can be time-dependent). In our planning problem we construct an optimal trajectory. Therefore, in holonomic systems and under saturation constraints, the initial nominal trajectory can be morphed to differ significantly from the final optimized trajectory. It is important to note, however, that although we linearize the observation model, the resultant Jacobian H(x̂_t) = ∂h(x)/∂x|_{x̂_t} remains non-linear and state- and time-dependent for many observation models. In other words, H(x̂_t) is a function of the control variables.

Incremental cost: The one-step cost c(·,·): X × U → R is defined as:

$$c(x_t, u_t) = \mathbb{E}_{b_t}\big[(x_t - \hat{x}_t)^T W(\hat{x}_t)(x_t - \hat{x}_t) + u_{t-1}^T V^u_t u_{t-1}\big], \tag{3}$$

where W(x̂_t) := H(x̂_t)^T R(x̂_t) H(x̂_t) ⪰ 0 and V^u_t ≻ 0 are positive (semi-)definite matrices, and R: X → R^{n_z×n_z} is a proper weighting matrix, to be defined later. Conceptually, the special definition of the W matrix converts the weighted error of the state particles into a weighted error of the observation particles. In doing so, we incorporate the uncertainty reduction into the cost and avoid the filtering equation as a constraint in the optimization, which breaks the computational burden of the planning problem, avoids the introduction of new decision variables, avoids propagation of the belief using most likely observations, and distinguishes our method from LQG-based and similar particle filter-based methods. This cost seeks to reduce the dispersion in the ensemble of observation trajectories in terms of the weighted covariance. In other words, the minimization seeks to reduce the uncertainty in the predicted observation, or equivalently in the estimation, which translates into shrinking the support of the belief distribution.

Comparison of our method with traditional approaches: Figure 1 graphically compares our method with traditional methods in the literature that tackle the open-loop problem. In order to perform the filtering equation, a previous belief and action and a current observation are required. In the planning stage, where the controller obtains the best sequence of future actions, a current belief is given; however, all that is known about the future observation is a likelihood distribution. As shown in the figure, in classic methods, the initial belief is propagated using finitely many samples of the observation obtained from the likelihood distribution. Therefore, a decision tree over the future predicted beliefs is constructed so that the optimizer can obtain the best action at each level of the tree. Overall, this first method is computationally intensive.
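For reference, a sketch of the linearization step in Python (ours; finite differencing stands in for analytic Jacobians, and f and h are assumed to be generic callables with the signatures of the process and observation models above):

```python
import numpy as np

def linearize(f, h, x_p, u_p, eps=1e-6):
    # Finite-difference Jacobians of x_{t+1} = f(x, u, w) and z = h(x) about the
    # nominal point (x_p, u_p, w = 0): A_t, B_t, G_t and f_t^p of (2a), H of (2b).
    n_x, n_u = x_p.size, u_p.size
    w0 = np.zeros(n_x)
    n_z = h(x_p).size
    A, G = np.zeros((n_x, n_x)), np.zeros((n_x, n_x))
    B, H = np.zeros((n_x, n_u)), np.zeros((n_z, n_x))
    for i in range(n_x):
        dx = np.zeros(n_x); dx[i] = eps
        A[:, i] = (f(x_p + dx, u_p, w0) - f(x_p - dx, u_p, w0)) / (2 * eps)
        G[:, i] = (f(x_p, u_p, dx) - f(x_p, u_p, -dx)) / (2 * eps)
        H[:, i] = (h(x_p + dx) - h(x_p - dx)) / (2 * eps)
    for j in range(n_u):
        du = np.zeros(n_u); du[j] = eps
        B[:, j] = (f(x_p, u_p + du, w0) - f(x_p, u_p - du, w0)) / (2 * eps)
    f_p = f(x_p, u_p, w0) - A @ x_p - B @ u_p   # drift term f_t^p
    return A, B, G, H, f_p
```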
In the second popular class of methods, only the most likely observation is utilized to perform the filtering equations and propagate the belief. This method can be less accurate than the former, and although it provides a less expensive optimization, the filtering equation is nevertheless part of the optimization constraints, which limits the application of this method to small domains. In our method, however, once the samples of the initial belief are propagated via the predicted model of the system, they are converted into observation particles by a proper usage of the observation model. Thus, a rope-like bundle of propagated observation particle strands is constructed using the initial belief samples, and with the advantage of a particularly defined cost function, the dispersion in the strands is minimized. Hence, the optimization not only morphs the rope towards regions that provide observations, but also seeks to compress the bundle towards the end of the horizon. As a result of the reduced uncertainty in the observation bundle, the belief itself shrinks, and the same results are obtained without performing the filtering equation. Therefore, using this idea, the main computational burden of the problem is broken, and the much cheaper optimization yields the desired results. We provide more details as we proceed in the paper.

B. Convex Environment

In this subsection, we provide our solution to the first variant of Problem 1. Our assumption in this variant is that the inequalities φ^j_t(x̂_t) < 0 can only be convex, such as the ones that define the boundary of the problem or saturation constraints on the control inputs.

Approximating the expectation in the cost: Using a particle representation, we can simplify the cost function by taking the expectation over the state samples. We propagate the initial belief using the noiseless process model (2a) to obtain x^i_{t_2} − x̂_{t_2} = Ã_{t_1:t_2−1}(x^i_{t_1} − x̂_{t_1}), where {x^i_{t_1}}_{i=1}^k is the set of resampled particles at time step t_1 and Ã_{t_1:t_2} := Π_{t=t_1}^{t_2} A_t, defined for t_1 ≤ t_2. Thus, the expectation in the cost (3) is approximated as:

$$\sum_{t=t_0+1}^{t_0+K}\Big[\frac{1}{N}\sum_{i=1}^{N}(x^i_{t_0}-\hat{x}_{t_0})^T \tilde{A}^T_{t_0:t-1} W(\hat{x}_t)\, \tilde{A}_{t_0:t-1}(x^i_{t_0}-\hat{x}_{t_0}) + u^T_{t-1} V^u_t u_{t-1}\Big]. \tag{4}$$

Next, we utilize the R matrix to convexify the cost function for non-linear observations.

Lemma 1 (Scalar observation): Suppose d = (d_1, ..., d_{n_x})^T ∈ R^{n_x} and h(x): X → R is differentiable. If l: X → R defined as l(x) := √(R(x)) (d · H(x)^T) is convex or concave in x, then g: X → R_{≥0}, where g(x) := d^T H(x)^T R(x) H(x) d, is a convex function of x, where H(x) := ∇h|_x is the Jacobian of h.

Lemma 1 can be proved by observing that g(x) = (l(x))^2. Moreover, R can be designed to be a positive and increasing function of the distance from the information sources, in addition to having the features required in Lemma 1. This is mainly because most common observation models, such as range and bearing, are functions of such a distance. Therefore, states more distant from the information sources are penalized more, penalizing the corresponding observation trajectories as well.

Convexifying the first term of the cost: Given the differentiable observation model h(x) = [h_j(x)] and its Jacobian H(x) = [H_j(x)], where H_j(x) = (∂h_j(x)/∂x_1, ..., ∂h_j(x)/∂x_{n_x}) for 1 ≤ j ≤ n_z, if R(x) = diag(R_j(x)) is the diagonal matrix of the R_j(x)'s corresponding to (uncorrelated) observations, then, extending the definition of g to include the matrix R, we have

$$d^T H(x)^T R(x) H(x)\, d = \sum_{j=1}^{n_z} R_j(x)\,\big(d \cdot H_j(x)^T\big)^2,$$

which is a sum of positive convex functions as in Lemma 1. Thus, the first term of the cost in equation (3) can be convexified. In our cost function, the vector d represents any of the vectors e^i_t := (1/√N) Ã_{t_0:t−1}(x^i_{t_0} − x̂_{t_0}) ∈ R^{n_x} for 1 ≤ i ≤ N, summarized in the vector e_t := (e^{1T}_t, e^{2T}_t, ..., e^{NT}_t)^T ∈ R^{N n_x}, which is constant at each time step t.

Problem 2: Core convex problem in a convex feasible space. Define W̄(x̂_t) := BlockDiag(W(x̂_t)) with N equal diagonal blocks of W(x̂_t). Moreover, define the cost of information as cost_info(x̂_t) := e_t^T W̄(x̂_t) e_t and the cost of control effort as cost_eff(u_t) := u_t^T V^u_{t+1} u_t. Under the assumptions of holonomic systems and a convex environment, and given the initial re-sampled set of particles at time step t_0, {x^i_{t_0}}_{i=1}^k, and a goal state x_g, the core convex SLAP problem is:

$$\min_{u_{t_0:t_0+K-1}}\; \sum_{t=t_0+1}^{t_0+K}\big[\text{cost}_{\text{info}}(\hat{x}_t) + \text{cost}_{\text{eff}}(u_{t-1})\big] \quad \text{s.t.}\;\; \hat{x}_{t_0+K} = x_g,$$

where x̂_t = Ã_{t_0:t−1} x̂_{t_0} + Σ_{s=t_0}^{t−1} Ã_{s+1:t−1}(B_s u_s + f^p_s).
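With the particle propagation above, the objective of Problem 2 can be evaluated directly; a minimal Python sketch (ours; the vectorized einsum formulation is an implementation choice, not the paper's code):

```python
import numpy as np

def cost_info(particles_t0, x_hat_t0, A_tilde, W):
    # e_t^i = (1/sqrt(N)) * A_tilde_{t0:t-1} (x_{t0}^i - x_hat_{t0}); the
    # block-diagonal quadratic form e_t^T Wbar e_t reduces to a sum over particles.
    N = particles_t0.shape[0]
    E = (particles_t0 - x_hat_t0) @ A_tilde.T / np.sqrt(N)  # rows are e_t^i
    return np.einsum('ij,jk,ik->', E, W, E)                 # sum_i e_i^T W e_i

def cost_eff(u, V_u):
    # Control-effort term u^T V^u u.
    return u @ V_u @ u
```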
C. Static Environment with Non-Convex Constraints

In this subsection, we extend the solution of the previous subsection to include non-convex constraints on the state, such as obstacles and banned areas, in a static environment with a known map. We define barrier functions to model the obstacles and softly incorporate them in the optimization objective. The optimization in such a case reduces to a locally convex optimization. Therefore, we need to initialize the optimization with a trajectory. This trajectory does not need to be feasible; however, starting from a feasible trajectory, the optimization avoids entering non-feasible states. Moreover, by initializing the optimization with trajectories in different homotopy classes, we find the locally optimal trajectories in different homotopy classes. We discuss the benefit of doing this towards the end of this subsection.

Polygonal obstacles approximated by ellipsoids: Given a set of vertices that constitute a polygonal obstacle, we find the minimum volume enclosing ellipsoid (MVEE) and obtain its parameters [34]. Particularly, for the ith obstacle, the barrier function includes a Gaussian-like function where the argument of the exponential is the MVEE, which can be disambiguated by its center c_i and a positive definite matrix P_i that determines the rotation and axes of the ellipsoid. Moreover, we add a line of infinity over the major and minor axes of the ellipsoid, so the overall function acts as a barrier that prevents the trajectory from entering the region enclosed by the ellipsoid. Note that for non-polygonal obstacles one can still find the MVEE, and the algorithm works independently of this fact. Therefore, given the ellipsoid parameters C := (c_1, c_2, ..., c_{n_b}) ∈ R^{n_x×n_b} and P := (P_1, ..., P_{n_b}) ∈ R^{n_x^2×n_b}, the Obstacle Barrier Function (OBF) is constructed as follows:

$$\Phi_{(P,C)}(x) := M\sum_{i=1}^{n_b}\Big[\exp\!\big(-[(x-c_i)^T P_i (x-c_i)]^{q}\big) + \sum_{\theta=0:\bar{m}:1}\big(\|x-(\theta\zeta^{i,1}+(1-\theta)\zeta^{i,2})\|_2^{-2} + \|x-(\theta\xi^{i,1}+(1-\theta)\xi^{i,2})\|_2^{-2}\big)\Big],$$

where m̄ = 1/m, m ∈ Z^+, M ∈ R^+, q ∈ Z^+, and ζ^{i,1}, ζ^{i,2} and ξ^{i,1}, ξ^{i,2} are the endpoints of the major and minor axes of the ellipsoid, respectively. The second term in the summand places infinity points along the axes of the ellipsoid, at points formed by convex combinations of the two endpoints of each axis. As m̄ tends to zero, the entire axes of the ellipsoid become infinite and therefore act as a barrier to any continuous trajectory of states. One can place more infinity points inside the ellipsoid by forming convex combinations of the existing infinity points. Moreover, the first term of the summand determines the territory of the ellipsoid and imposes an outward gradient around it, acting as a penalty function that pushes the trajectory out of the banned region. Hence, we define the cost of avoiding obstacles as:

$$\text{cost}_{\text{obst}}(x_{t_1}, x_{t_2}) := \int_{x_{t_1}}^{x_{t_2}} \Phi_{(P,C)}(x')\,dx', \tag{5}$$

which is the line integral of the OBF between two given points x_{t_1} and x_{t_2} of the trajectory. The addition of this cost to the optimization objective therefore ensures that the solver minimizes it and keeps the trajectory out of the banned regions. For implementation purposes, the integral in equation (5) is approximated by a finite sum over a few points between x_{t_1} and x_{t_2}. Next, we provide the modified optimization problem, which is locally convex in the inter-obstacle feasible space and can therefore be solved using gradient descent methods [10].

Problem 3: Locally convex problem in a static environment. Given {x^i_{t_0}}_{i=1}^k, x_g and the obstacle parameters (P, C), the static environment problem for a holonomic system is:

$$\min_{u_{t_0:t_0+K-1}}\; \sum_{t=t_0+1}^{t_0+K}\big[\text{cost}_{\text{info}}(\hat{x}_t) + \text{cost}_{\text{eff}}(u_{t-1}) + \text{cost}_{\text{obst}}(\hat{x}_{t-1}, \hat{x}_t)\big] \quad \text{s.t.}\;\; \hat{x}_{t_0+K} = x_g. \tag{6}$$

Moreover, we add convex saturation constraints of the type ||u_t|| ≤ max_u based on the specific robot model.
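A sketch of the OBF and the finite-sum approximation of the line-integral cost (5) follows (ours; the constants M and q, the grid m̄ = 0.25, and the recovery of the axis endpoints from an eigendecomposition of P_i are illustrative choices for a planar environment):

```python
import numpy as np

def obf(x, centers, Ps, M=1e3, q=2, m_bar=0.25):
    # Phi_{(P,C)}(x): Gaussian-like territory term per ellipsoid plus
    # inverse-square "infinity points" at convex combinations
    # (theta = 0, m_bar, ..., 1) of the endpoints of the two axes.
    thetas = np.arange(0.0, 1.0 + 1e-9, m_bar)
    val = 0.0
    for c, P in zip(centers, Ps):
        val += np.exp(-((x - c) @ P @ (x - c)) ** q)
        lam, V = np.linalg.eigh(P)          # ellipsoid axes from P_i
        for k in range(2):                  # major and minor axis (planar case)
            half_axis = V[:, k] / np.sqrt(lam[k])
            end1, end2 = c + half_axis, c - half_axis
            for th in thetas:
                p = th * end1 + (1.0 - th) * end2
                val += 1.0 / max(np.sum((x - p) ** 2), 1e-12)
    return M * val

def cost_obst(x1, x2, centers, Ps, n_pts=10):
    # Finite-sum approximation of the line integral (5) between x1 and x2.
    pts = np.linspace(x1, x2, n_pts)
    seg = np.linalg.norm(x2 - x1) / (n_pts - 1)
    return sum(obf(p, centers, Ps) for p in pts) * seg
```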
Next, we proceed to optimize towards a better approximation among different homotopy classes while reaching the predefined goals, such as uncertainty reduction, collision avoidance, and reaching the final destination with minimal energy effort.

Homotopy classes: Homotopy classes of trajectories are defined as sets of trajectories that can be transformed into each other by a continuous function without colliding with obstacles [7, 8]. As shown in Fig. 2, the two solid trajectories are in one homotopy class, while the dashed trajectory is in a different class.

Fig. 2: Homotopy classes. The solid trajectories are in a different homotopy class from the dashed trajectory.

Homotopy classes and optimal trajectory: There are several methods to find trajectories in different homotopy classes [7, 8]. For instance, in low dimensions one can construct the visibility graph for the pure motion planning problem and find trajectories in different homotopy classes that connect the start state to the goal state by pruning the non-unique paths. These methods provide such paths for different purposes, such as finding the shortest path. However, the uncertainty or dynamics of the system are usually not considered. We initialize our optimization with non-looped trajectories in different homotopy classes [8]. The optimizer considers the cost of uncertainty, effort, and collision-avoidance, along with the linearized dynamics of the (holonomic) system, and morphs the initial trajectory towards a locally optimal trajectory. Our barrier function model of the obstacles prevents the trajectory from entering the banned regions. These barrier functions, along with an optimization tuned through the saturation constraints, a long enough optimization horizon (determined by the time-discretization step of the initial trajectory), and a limited step size of the line-search in the optimization [53, 52], keep the trajectory in its initial homotopy class. Moreover, since the optimization is locally convex, it finds the locally optimal trajectory of that homotopy class under the imposed constraints and conditions, starting from a trajectory in that class. Therefore, by comparing the total costs obtained in the different cases, we obtain the lowest-cost smooth trajectory considering all the predefined costs and, most significant of all, uncertainty reduction. This is the closest output trajectory of our algorithm to the globally optimal trajectory in the presence of uncertainties.

D. Dynamic Environment with Time-Varying Constraints

Now that we have specified all the machinery needed to find the optimal path in terms of the defined cost in a static environment, we extend our method to an environment that is not fully static.

Incorporating dynamic obstacles: If some of the obstacles are moving, the state constraints become time-varying. In such a case, we modify the optimization problem by altering the obstacle cost so that it includes the dynamic obstacles as follows:

$$\Phi^{(\hat{P}_t,\hat{C}_t)}_t(x) := M\sum_{i=1}^{n_b}\Big[\exp\!\big(-[(x-\hat{c}^i_t)^T \hat{P}^i_t (x-\hat{c}^i_t)]^{q}\big) + \sum_{\theta=0:\bar{m}:1}\big(\|x-(\theta\hat{\zeta}^{i,1}_t+(1-\theta)\hat{\zeta}^{i,2}_t)\|_2^{-2} + \|x-(\theta\hat{\xi}^{i,1}_t+(1-\theta)\hat{\xi}^{i,2}_t)\|_2^{-2}\big)\Big],$$

where ĉ^i_t, P̂^i_t, ζ̂^{i,1}_t, ζ̂^{i,2}_t, ξ̂^{i,1}_t and ξ̂^{i,2}_t are the estimated parameters of the ith obstacle at time step t, given by a separate estimator that tracks the obstacles. Note that if the ith obstacle is only moving, then at time t′ > t, ĉ^i_{t′} = ĉ^i_t + v̂^i (t′ − t) and P̂^i_{t′} = R^i_{α̂} P̂^i_t (R^i_{α̂})^T, where v̂^i is a constant estimated velocity vector and R^i_{α̂} is an estimated rotation matrix by α̂ degrees. However, if there is a change of shape in an obstacle, or a new obstacle appears, we run the MVEE algorithm to find the parameters of that obstacle. For our planning purposes, we assume there is a separate estimator that tracks and estimates the obstacles' parameters, and our planner only uses the results obtained by that tracker to find the optimized trajectory. Moreover, since the algorithm is implemented in an RHC fashion, if there is a change in the estimates of the obstacles, the optimization uses the new estimates of the obstacle parameters in the next step. The obstacle cost is modified accordingly as:

$$\text{cost}_{\text{obst}}(x_{t_1}, x_{t_2}, t) := \int_{x_{t_1}}^{x_{t_2}} \Phi^{(\hat{P}_t,\hat{C}_t)}_t(x')\,dx'.$$

Problem 4: Dynamic environment. For a holonomic system, given {x^i_{t_0}}_{i=1}^k, x_g and estimates of the obstacle parameters for the entire lookahead horizon {(P̂_t, Ĉ_t)}_{t=t_0}^{t_0+K+1}, the dynamic environment problem is defined as:

$$\min_{u_{t_0:t_0+K-1}}\; \sum_{t=t_0+1}^{t_0+K}\big[\text{cost}_{\text{info}}(\hat{x}_t) + \text{cost}_{\text{eff}}(u_{t-1}) + \text{cost}_{\text{obst}}(\hat{x}_{t-1}, \hat{x}_t, t-1)\big] \quad \text{s.t.}\;\; \hat{x}_{t_0+K} = x_g. \tag{7}$$

If a new obstacle suddenly appears in part of the trajectory, only that part of the trajectory is changed, provided there is still a feasible path between the two points immediately outside and on either side of that obstacle. Otherwise, the entire algorithm runs again from the current state to the goal state. It should be added that, unlike in a static environment, in a stochastic problem with a dynamic environment, unless the planning horizon is very small, not much can be said regarding the homotopy paths discussed in Section III-C. This is ongoing research.

Now that we have provided our solution for all three cases, we proceed to discuss the implementation strategy.

E. RHC Implementation

The overall feedback control loop is shown in Fig. 3. The system initiates from a non-Gaussian distribution in the feasible state space that constitutes the initial belief. In the case of a dynamic environment, the most complicated of our problems, given the current belief, b_t, the estimates of the obstacles' parameters, {(P̂_t, Ĉ_t)}_{t=t_0}^{t_0+K+1}, the lookahead time horizon, K, and the goal state, x_g, the RHC policy function π: B × R^{n_x×n_b×(K+1)} × R^{n_x^2×n_b×(K+1)} × R × X → U generates an optimal action u_t = π(b_t, P̂_{t:t+K+1}, Ĉ_{t:t+K+1}, K, x_g), which is the first element of the open-loop optimal sequence of actions generated in the different variants of Problem 1. The agent executes u_t, transitioning the state of the system from x_t to x_{t+1}, where a new observation z_{t+1} is obtained by the sensors. The estimator updates the belief as b_{t+1} = τ(b_t, u_t, z_{t+1}), and the policy is fed the updated belief to close the loop. Meanwhile, in a separate loop, the obstacle trackers measure the current state of the obstacles and the obstacle parameter estimators obtain the estimates of the obstacles. As mentioned above, the estimates are fed into the policy function immediately before the controller plans its next action. In the case of the static environment, the policy function is fed the parameters of the obstacles, which remain the same for the entire horizon. Similarly, in the case of a convex environment, the general boundaries and convex constraints take the place of the obstacle parameters in the planning problem.

Stopping execution: The algorithm stops when the probability of reaching the goal, calculated as the area under the belief density over the goal region, exceeds a predefined value [39]. The planning algorithm is given in Algorithm 1.

Algorithm 1: Planning Algorithm
Input: Initial belief state b_{t_0}, goal state x_g, planning horizon K, belief dynamics τ, obstacle parameters {(P̂_t, Ĉ_t)}_{t=t_0}^{t_0+K+1}
1 while P(b_t, r, x_g) ≤ w̆_th do
2   u_t ← π(b_t, P̂_{t:t+K+1}, Ĉ_{t:t+K+1}, K, x_g);
3   execute u_t, perceive z_{t+1};
4   b_{t+1}(x) ← τ(b_t(x), u_t, z_{t+1});
5 end
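Algorithm 1 in code form: a sketch of the receding-horizon loop (ours; the planner pi, belief update tau, plant, sensor and obstacle tracker are hypothetical callables, the belief uses the ParticleBelief fields from the earlier sketch, and the stopping rule P(b_t, r, x_g) is computed as the particle mass within the goal region):

```python
import numpy as np

def goal_probability(b, x_goal, r):
    # P(b_t, r, x_g): belief mass within distance r of the goal,
    # i.e., the area under the belief density over the goal region.
    dists = np.linalg.norm(b.x - x_goal, axis=1)
    return b.w[dists <= r].sum()

def rhc_loop(b, x_goal, K, pi, tau, track_obstacles, plant, sensor,
             r=0.1, w_threshold=0.9):
    # Receding-horizon execution of Algorithm 1: plan an open-loop sequence,
    # apply its first action, perceive, update the belief, and re-plan.
    while goal_probability(b, x_goal, r) <= w_threshold:
        P_hat, C_hat = track_obstacles(K)    # predicted obstacle parameters
        u = pi(b, P_hat, C_hat, K, x_goal)   # first element of the open-loop plan
        plant(u)                             # execute u_t
        z = sensor()                         # perceive z_{t+1}
        b = tau(b, u, z)                     # Bayesian belief update
    return b
```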
Fig. 3: The overall feedback control loop.

F. A Discussion and Comparison on Complexity

The core optimization problem in a convex environment, as defined in Problem 2, is a convex program that does not require an initial solution at all. The number of decision variables is Kn_u, and there is one linear equality constraint, plus the robot's saturation inequality constraints, which number at most Kn_u. Therefore, the optimization involves the minimum number of decision variables. Let us assume for simplicity that the sizes of the x, u, and z vectors are all O(n). Thus, utilizing a common method such as the center of gravity method for convex optimization [11] to obtain a globally optimal solution with ε confidence, the algorithm requires Ω(Kn log(1/ε)) calls to the oracle [35]. On the other hand, in equation (4), Ã^T_{t_0:t−1} W(x̂_t) Ã_{t_0:t−1} requires the multiplication of O(n) × O(n) matrices O(K) times, which takes O(Kn^3). The multiplication of the vectors (x^i_{t_0} − x̂_{t_0}) by an R^{n×n} matrix involves O(Nn^2) time. The outer sum also takes K time. All the other operations, such as the calculation of x̂_t and the constraints, take less time. Hence, the time complexity of the computations is O(Kn^3 + Nn^2).

In LQG-based belief space methods that construct a trajectory, the method described in [51] involves a non-convex optimization, which takes O(Kn^6) computations with a second-order convergence rate to a locally optimal solution. Another RHC-based, particle-filter-based method is described in [39], where the core problem is a convex problem in Kn_u + N decision variables with N(K−1)+1 inequality constraints. That algorithm assumes a linear process model with Gaussian noise and a linear measurement model with Gaussian noise whose covariance is state-dependent. The solution is categorized in the second class of methods in Fig. 1. Moreover, to include more than one observation source, the algorithm requires a modification with integer programming, such that at each time step there can only be one observation. Although an analysis of its time complexity is not given, to the best of our knowledge we calculate it to be O(NK(Kn^3 + Nn^2)) without integer programming. Moreover, its convergence needs Ω((N+Kn) log(1/ε)) calls. In the presence of obstacles, the size of our optimization does not change; however, the rate of convergence reduces to that of gradient descent methods. Furthermore, the solution becomes locally optimal when starting with an infeasible solution whose immediate gradient is not towards the local minima of the obstacles. Theoretically, if the m of the OBF tends to infinity, there are no local minima of the barriers; nevertheless, practically, starting from a semi-feasible trajectory, a tuned optimization results in convergence to a locally optimal feasible solution. In [51], in the presence of obstacles, the convergence rate and computational cost do not change, but the (tuned) optimization must start with a feasible path. In [39], obstacles are modeled with a chance-constrained method that involves the introduction of additional variables and integer programming, with iterative applications of the algorithm. This limits the scalability of that method to complex environments.

IV. SIMULATIONS AND EXAMPLES

In this section, we exhibit some applications of our method. We performed our simulations in MATLAB 2015b on a 2.90 GHz Core i7 machine with dual-core technology and 8 GB of RAM. First, we perform a comparative simulation in which we compare the results of our algorithm for a static environment with and without information sources. Then, we perform an experiment for a KUKA youBot in a static environment. Finally, we perform simulations for dynamic environments. There are 8 multimedia files uploaded for Figures 5, 6, 7 and 8 that include more details on the simulations. In these figures, all axis units are in meters.

A. Simple Comparison Test

Figure 4 shows an environment with three obstacles forming a connected obstacle. The banned areas are enclosed with three MVEEs. In this experiment, we use the visibility graph to find initial trajectories in different homotopy classes. Moreover, instead of using the polygons, we use the ellipsoids that enclose them. Since our optimization utilizes a gradient descent method, we only consider the straight lines between the nodes and ignore the collision of a straight line with the ellipsoid on which its node lies. This increases the speed of finding the visibility graph and, coupled with optimization over the output paths, the minor collisions do not hurt the algorithm.

Fig. 4: Modified visibility graph. The two non-looped non-homotopic trajectories are found with the modified visibility graph and are indicated by the red dotted and blue dashed paths.

Next, each of the two paths is discretized to satisfy the tuning properties described in Section III-C. They are then fed into the optimization function π to produce the optimized smooth collision-free paths. We have produced two sets of results: in the first set, we do not consider the cost of information (as if we were considering the motion planning problem of generating smooth collision-free paths); in the second, we add a landmark as the information source and consider the optimization with the cost of information, to compare the results. As seen in Fig. 5d, the existence of the landmarks changes the paths of the robot, such that the robot visits them to reduce its uncertainty and then continues its path towards the goal state.

Fig. 5: Four cases. Cases a and b show the resulting paths generated by optimizing without considering the information sources, whereas cases c and d consider information sources.
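The MVEEs used above can be computed with the classical Khachiyan iteration, returning the (c_i, P_i) pair consumed by the OBF; a sketch (the standard algorithm behind [34], not code from the paper):

```python
import numpy as np

def mvee(vertices, tol=1e-4):
    # Minimum-volume enclosing ellipsoid {x : (x - c)^T P (x - c) <= 1}
    # of a set of polygon vertices, via Khachiyan's algorithm.
    pts = np.asarray(vertices, dtype=float)       # n_pts x d
    n, d = pts.shape
    Q = np.vstack([pts.T, np.ones(n)])            # lift to homogeneous coordinates
    u = np.full(n, 1.0 / n)
    err = tol + 1.0
    while err > tol:
        X = Q @ np.diag(u) @ Q.T
        lev = np.einsum('ij,ji->i', Q.T @ np.linalg.inv(X), Q)  # leverage scores
        j = int(np.argmax(lev))
        step = (lev[j] - d - 1.0) / ((d + 1.0) * (lev[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step
        err = np.linalg.norm(new_u - u)
        u = new_u
    c = pts.T @ u                                  # center c_i
    S = pts.T @ np.diag(u) @ pts - np.outer(c, c)
    P = np.linalg.inv(S) / d                       # shape matrix P_i
    return c, P
```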
B. KUKA YouBot

In this section, we use the kinematics equations of the KUKA youBot base as described in [55]. Particularly, the state vector can be described by a 3D vector, x = [x_x, x_y, x_θ]^T, describing the position and heading of the robot base, i.e., a planar pose in SE(2). The control consists of the velocities of the four wheels. It can be shown that the discrete motion model can be written as x_{t+1} = f(x_t, u_t, ω_t) = x_t + B u_t dt + G ω_t √dt, where B and G are appropriate constant matrices whose elements depend on the dimensions of the robot, as indicated in [54], and dt is the time-discretization period. Inspired by [57], we model the robot with a configuration of a set of points representing the centers of balls that cover the body of the robot. In our method, we cover the robot with two balls whose radii are proportional to the width of the robot. We find the MVEE of the polygons that are inflated from each vertex by the size of the radius, and modify the cost of obstacles to keep the centers of the balls out of the new barriers. The observation model is a range and bearing based model where the corresponding elements of the R matrix are chosen to be ||(x_x − L_x, x_y − L_y)||_2^2 for all observations, so as to have the desired features described in Lemma 1. Here, (L_x, L_y) represents the coordinates of a landmark. The results depicted in Fig. 6 show that the planned trajectory avoids entering the banned regions bordered by the ellipsoids, so that the robot itself avoids colliding with the three obstacles.

Fig. 6: YouBot base control. There are three obstacles and two landmarks. The robot base is shown by a rectangle with a line at the heading. Initial and planned trajectories are depicted by dashed and solid lines, respectively.

C. Dynamic Environment

In this scenario, we simulate a case where there are four objects, starting from a common position and moving in different directions, downwards and towards the right of the map. The robot starts from a distribution whose mean is at (0,0) and wishes to reach the goal state (2,2) with high probability. As seen in Fig. 7, at the beginning of its trajectory the robot heads towards the landmark at (1, 0.5), and as the moving obstacles get closer, it changes its direction to bypass the objects in the opposite direction. In this scenario, the initial trajectory is just the straight line between the most probable initial location of the robot and the final destination, shown in the figure with a green dashed line, with the planned trajectory of the robot shown as a solid yellow line.

Fig. 7: Dynamic environment. The robot heads towards the landmark to reduce its uncertainty and avoids the moving objects by changing its path to be in the opposite direction of the objects.

In another scenario, shown in Fig. 8, an object is moving in a spiral path (Fig. 8b), with the robot trying to avoid colliding with the obstacle, spending most of its time near the information source, and reaching the goal along a safe, short and smooth path.

Fig. 8: Moving object. The robot spends most of its time near the information source, avoids the object, which is moving in a spiral path, and heads towards the goal region safely.
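As a concrete instance of the models in this section, a sketch of the youBot base step and the range-bearing observation with the distance-squared R weighting (ours; B and G are taken as given constant matrices built from the robot dimensions per [54], and the RNG detail is an implementation choice):

```python
import numpy as np

def youbot_step(x, u, B, G, dt, rng):
    # Discrete base model x_{t+1} = x_t + B u_t dt + G w_t sqrt(dt),
    # with u the four wheel velocities and w ~ N(0, I) process noise.
    w = rng.standard_normal(x.shape)
    return x + B @ u * dt + G @ w * np.sqrt(dt)

def range_bearing(x, landmark):
    # Range-bearing observation of the landmark at (L_x, L_y); the matching
    # R-matrix entry grows with squared distance, as required by Lemma 1.
    dx, dy = landmark[0] - x[0], landmark[1] - x[1]
    z = np.array([np.hypot(dx, dy), np.arctan2(dy, dx) - x[2]])
    r_entry = dx ** 2 + dy ** 2        # ||(x_x - L_x, x_y - L_y)||_2^2
    return z, r_entry
```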
V. CONCLUSION

In this paper, we have proposed a method for controlling a stochastic system starting from a configuration in the state space to reach a goal region. We have proposed a barrier function method that, combined with our optimization and gradient-based methods such as an interior point method, keeps a trajectory in its homotopy class and finds the locally optimal path. The optimization seeks a smooth trajectory that results in the lowest combined cost of uncertainty in the robot's perception and energy effort, while avoiding collision with obstacles and reaching the goal destination with high probability. Moreover, by finding trajectories in different homotopy classes and optimizing over paths in all classes, we can compare the costs incurred by those trajectories. Therefore, we find the lowest-cost trajectory that accounts for the cost of uncertainty incurred along the trajectories, which is the closest trajectory to the globally optimal one, while avoiding the need to solve the belief space dynamic programming equations. Finally, our receding horizon control strategy and the low computational cost of the optimization allow us to incorporate dynamic obstacles, as well as new objects that are detected along the trajectory of the robot, by adjusting the optimization and the barrier functions during execution. This allows us to perform belief space planning in an online manner. Our future work is aimed at extending our results, analyzing the performance and sensitivities, and executing the methods on real robots.

ACKNOWLEDGMENT

This material is based upon work partially supported by NSF under Contract Nos. CNS-1646449 and Science & Technology Center Grant CCF-0939370, the U.S. Army Research Office under Contract No. W911NF-15-1-0279, and NPRP grant NPRP 8-1531-2-651 from the Qatar National Research Fund, a member of Qatar Foundation.

BIBLIOGRAPHY

[1] A. Agha-Mohammadi, S. Chakravorty, and N. Amato. FIRM: Feedback controller-based information-state roadmap, a framework for motion planning under uncertainty. In Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on, pages 4284-4291. IEEE, 2011.
[2] A. Agha-Mohammadi, S. Chakravorty, and N. Amato. FIRM: Sampling-based feedback motion planning under motion uncertainty and imperfect measurements. The International Journal of Robotics Research, page 0278364913501564, 2013.
[3] K. Astrom. Optimal control of Markov decision processes with incomplete state estimation. Journal of Mathematical Analysis and Applications, 10:174-205, 1965.
[4] Daniele Bernardini and Alberto Bemporad. Scenario-based model predictive control of stochastic constrained linear systems. In Proceedings of the 48th IEEE Conference on Decision and Control, held jointly with the 2009 28th Chinese Control Conference (CDC/CCC 2009), pages 6333-6338. IEEE, 2009.
[5] Dimitri Bertsekas. Dynamic Programming and Optimal Control, 3rd Ed. Athena Scientific, 2007.
[6] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena Scientific, Belmont, MA, 1995.
[7] Subhrajit Bhattacharya, Vijay Kumar, and Maxim Likhachev. Search-based path planning with homotopy class constraints. In Third Annual Symposium on Combinatorial Search, 2010.
[8] Subhrajit Bhattacharya, Maxim Likhachev, and Vijay Kumar. Topological constraints in search-based robot path planning. Autonomous Robots, 33(3):273-290, 2012.
[9] M. Boloursaz, R. Kazemi, B. Barazandeh, and F. Behnia. Bounds on compressed voice channel capacity. In Communication and Information Theory (IWCIT), 2014 Iran Workshop on, pages 1-6. IEEE, 2014.
[10] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[11] Sébastien Bubeck. Theory of convex optimization for machine learning. arXiv preprint arXiv:1405.4980, 2014.
[12] Giuseppe C. Calafiore and Marco C. Campi. The scenario approach to robust control design. IEEE Transactions on Automatic Control, 51(5):742-753, 2006.
[13] D. Crisan and A. Doucet. A survey of convergence results on particle filtering methods for practitioners. IEEE Transactions on Signal Processing, 50(3), 2002.
[14] A. Doucet, J.F.G. de Freitas, and N.J. Gordon. Sequential Monte Carlo Methods in Practice. Springer, New York, 2001.
[15] Carlos E. Garcia, David M. Prett, and Manfred Morari. Model predictive control: theory and practice - a survey. Automatica, 25(3):335-348, 1989.
[16] Peter Hokayem, Debasish Chatterjee, Federico Ramponi, Georgios Chaloulos, and John Lygeros. Stable stochastic receding horizon control of linear systems with bounded control inputs. In International Symposium on Mathematical Theory of Networks and Systems (MTNS), pages 31-36, 2010.
[17] V. Huynh and N. Roy. icLQG: combining local and global optimization for control in information space. In IEEE International Conference on Robotics and Automation (ICRA), 2009.
[18] Zhong-Ping Jiang and Yuan Wang. Input-to-state stability for discrete-time nonlinear systems. Automatica, 37(6):857-869, 2001.
[19] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101:99-134, 1998.
[20] Nikolas Kantas, J.M. Maciejowski, and A. Lecchini-Visintini. Sequential Monte Carlo for model predictive control. In Nonlinear Model Predictive Control, pages 263-273. Springer, 2009.
[21] Sertac Karaman and Emilio Frazzoli. Sampling-based algorithms for optimal motion planning. International Journal of Robotics Research, 30(7):846-894, June 2011.
[22] L.E. Kavraki, P. Švestka, J.C. Latombe, and M. Overmars. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation, 12(4):566-580, 1996.
[23] L.E. Kavraki, M.N. Kolountzakis, and J.C. Latombe. Analysis of probabilistic roadmaps for path planning. IEEE Transactions on Robotics and Automation, 14:166-171, February 1998.
[24] Seyed-Mahdi Kazempour-Radi and Mohammad-Hussein Rafiei-Sakhaei. A utilization of wireless sensor network in smart homes, reviewing its combination with modern cellular networks. In Communication and Industrial Application (ICCIA), 2011 International Conference on, pages 1-5. IEEE, 2011.
[25] P. R. Kumar and P. P. Varaiya. Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice-Hall, Englewood Cliffs, NJ, 1986.
[26] P.R. Kumar et al. Control: a perspective. Automatica, 50(1):3-43, 2014.
[27] Steve Lavalle and J. Kuffner. Randomized kinodynamic planning. International Journal of Robotics Research, 20(378-400), 2001.
[28] M. Lazar, W.P.M.H. Heemels, and A.R. Teel. Further input-to-state stability subtleties for discrete-time systems. IEEE Transactions on Automatic Control, 58(6):1609-1613, June 2013.
[29] D. Limon, T. Alamo, D.M. Raimondo, D. Munoz de la Pena, J.M. Bravo, A. Ferramosca, and E.F. Camacho. Input-to-state stability: a unifying framework for robust model predictive control. In Nonlinear Model Predictive Control, pages 1-26. Springer, 2009.
[30] O. Madani, S. Hanks, and A. Condon. On the undecidability of probabilistic planning and infinite-horizon partially observable Markov decision problems. In Proceedings of the Sixteenth Conference on Artificial Intelligence (AAAI), pages 541-548, 1999.
[31] David Q. Mayne. Model predictive control: Recent developments and future promise. Automatica, 50(12):2967-2986, 2014.
[32] David Q. Mayne and Hannah Michalska. Receding horizon control of nonlinear systems. IEEE Transactions on Automatic Control, 35(7):814-824, 1990.
[33] David Q. Mayne, James B. Rawlings, Christopher V. Rao, and Pierre O.M. Scokaert. Constrained model predictive control: Stability and optimality. Automatica, 36(6):789-814, 2000.
[34] Nima Moshtagh. Minimum volume enclosing ellipsoid. Convex Optimization, 111:112, 2005.
[35] Arkadii Nemirovsky. Problem complexity and method efficiency in optimization.
[36] C. Papadimitriou and J. N. Tsitsiklis. The complexity of Markov decision processes. Mathematics of Operations Research, 12(3):441-450, 1987.
[37] J. Pineau, G. Gordon, and S. Thrun. Point-based value iteration: An anytime algorithm for POMDPs. In International Joint Conference on Artificial Intelligence, pages 1025-1032, 2003.
[38] R. Platt, R. Tedrake, L. Kaelbling, and T. Lozano-Perez. Belief space planning assuming maximum likelihood observations. In Robotics: Science and Systems (RSS), 2010.
[39] Robert Platt. Convex receding horizon control in non-Gaussian belief space. In Algorithmic Foundations of Robotics X, pages 443-458. Springer, 2013.
[40] Sam Prentice and Nicholas Roy. The belief roadmap: Efficient planning in belief space by factoring the covariance. International Journal of Robotics Research, 28(11-12), October 2009.
[41] Mohammadhussein Rafieisakhaei, Amirhossein Tamjidi, and Suman Chakravorty. On-line MPC-based stochastic planning in the non-Gaussian belief space with non-convex constraints. 2015.
[42] Mohammadhussein Rafieisakhaei, Amirhossein Tamjidi, Suman Chakravorty, and P.R. Kumar. Feedback motion planning under non-Gaussian uncertainty and non-convex state constraints. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pages 4238-4244. IEEE, 2016.
[43] John Schulman, Yan Duan, Jonathan Ho, Alex Lee, Ibrahim Awwal, Henry Bradlow, Jia Pan, Sachin Patil, Ken Goldberg, and Pieter Abbeel. Motion planning with sequential convex optimization and convex collision checking. The International Journal of Robotics Research, 33(9):1251-1270, 2014.
[44] Guy Shani, Joelle Pineau, and Robert Kaplow. A survey of point-based POMDP solvers. Autonomous Agents and Multi-Agent Systems, 27:1-51, 2013.
[45] R. D. Smallwood and E. J. Sondik. The optimal control of partially observable Markov processes over a finite horizon. Operations Research, 21(5):1071-1088, 1973.
[46] E. J. Sondik. The optimal control of partially observable Markov processes. PhD thesis, Stanford University, 1971.
[47] Sebastian Thrun, Wolfram Burgard, and Dieter Fox. Probabilistic Robotics. MIT Press, 2005.
[48] Emanuel Todorov and Weiwei Li. A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems. In Proceedings of the 2005 American Control Conference, pages 300-306. IEEE, 2005.
[49] Noel Du Toit and Joel W. Burdick. Robotic motion planning in dynamic, cluttered, uncertain environments. In ICRA, May 2010.
[50] Jur Van Den Berg, Pieter Abbeel, and Ken Goldberg. LQG-MP: Optimized path planning for robots with motion uncertainty and imperfect state information. The International Journal of Robotics Research, 30(7):895-913, 2011.
[51] Jur Van den Berg, Sachin Patil, and Ron Alterovitz. Motion planning under uncertainty using differential dynamic programming in belief space. In Proc. of International Symposium of Robotics Research (ISRR), 2011.
[52] Jur Van Den Berg, Sachin Patil, and Ron Alterovitz. Motion planning under uncertainty using iterative local optimization in belief space. The International Journal of Robotics Research, 31(11):1263-1278, 2012.
[53] S.J. Yakowitz. Algorithms and computational techniques in differential dynamic programming. Control and Dynamical Systems: Advances in Theory and Applications, 31:75-91, 2012.
[54] Youbot-store.com. YouBot 3D model - youBot wiki, 2016. URL http://www.youbot-store.com/wiki/index.php/YouBot_3D_Model.
[55] zakharov. zakharov youbot model, 2011. URL https://github.com/zakharov/youbot_model/wiki/KUKA-youBot-kinematics,-dynamics-and-3D-model.
[56] R. Zhou and E. A. Hansen. An improved grid-based approximation algorithm for POMDPs. IJCAI, 2001.
[57] Matt Zucker, Nathan Ratliff, Anca D. Dragan, Mihail Pivtoraiko, Matthew Klingensmith, Christopher M. Dellin, J. Andrew Bagnell, and Siddhartha S. Srinivasa. CHOMP: Covariant Hamiltonian optimization for motion planning. The International Journal of Robotics Research, 32(9-10):1164-1193, 2013.
