Proceedings of the
44th IEEE Conference on Decision and Control, and
the European Control Conference 2005
Seville, Spain, December 12-15, 2005
WeC03.3
A Hierarchical Approach To Multi-Player Pursuit-Evasion Differential Games
Dongxu Li, Student Member, IEEE, Jose B. Cruz, Jr., Life Fellow, IEEE,
Genshe Chen, Chiman Kwan, Senior Member, IEEE, and Mou-Hsiung Chang
Abstract—The increasing use of unmanned assets and robots
in modern military operations renews an interest in the study of
general pursuit-evasion games involving multiple pursuers and
multiple evaders. Due to the difficulty in formulation and
rigorous treatment, the literature in this field is very limited.
This paper presents a hierarchical approach to this kind of
problem. With an additional structure imposed on
decision-making of pursuers, this approach provides
conservative guidance to pursuers by finding certain
engagement between pursuers and evaders, and the
saddle-point strategies are utilized by each pursuer in chasing
the engaged evaders. A combinatorial optimization problem is
formulated and scenarios are created to demonstrate the
feasibility of the algorithm. This is a preliminary study on
multi-player pursuit-evasion games and future directions are
suggested.
I. INTRODUCTION
In a Pursuit-Evasion (PE) game, the problem of one or a
group of pursuers catching one or a group of moving
evaders is studied. It has extensive applications such as
missile guidance, military strategy, aircraft control and aerial
tactics. Under the framework of game theory and optimal
control, a number of formal solutions regarding optimal
strategies in particular PE problems can be achieved [1]-[2].
In the literature, most studies on PE games have concentrated
on two-player games with a single pursuer and a single
evader. As the use of unmanned assets and robots increases in
modern military operations, newly emergent scenarios
usually involve multiple pursuers and evaders. The problem
of formulation and computation of optimal pursuit strategies
of multiple players in continuous time needs to be addressed.
Manuscript received September 14, 2005. This work was supported in part
by the U.S. Army under Contract W911NF-05-C-0018.
D. Li is with the Department of Electrical and Computer Engineering, The
Ohio State University, Columbus, OH 43210 USA. (phone: 614-404-5494;
email: li.447@osu.edu)
J.B. Cruz is with the Department of Electrical and Computer Engineering,
The Ohio State University, Columbus, OH 43210 USA. (email:
jbcruz@ieee.org)
G. Chen is with Intelligent Automation, Inc, 15400 Calhoun Dr, Suite 400,
Rockville, MD 20855 USA. (email: gchen@i-a-i.com)
C. Kwan is with Intelligent Automation, Inc, 15400 Calhoun Dr, Suite
400, Rockville, MD 20855 USA. (email: ckwan@i-a-i.com)
M.H. Chang is with the Mathematics Division, U.S. Army Research
Office, (email: mouhsiung.chang@us.army.mil).
0-7803-9568-9/05/$20.00 ©2005 IEEE
A PE game is usually formulated as a zero-sum game.
Since the 1950s, the deterministic PE game of a single
pursuer and a single evader with perfect information and
common knowledge has been extensively studied. Isaacs
solved a PE problem for a saddle-point equilibrium solution
by the method of “tenet of transition” [1]. Although PE games
with multiple pursuers and multiple evaders have been
investigated recently, most of these studies deal with discrete-time
problems or treat the problem in an ad hoc manner [6], [8]. Little has been
done for generic multi-player PE differential game problems
in continuous time. In this paper, we focus on deterministic
multi-player differential PE game problems [4]. We extend
Isaacs’ approach on a two-player PE game to a game with
multiple players. A conservative strategy is applied from the
pursuer’s perspective to achieve an upper-bound of the
performance index.
This paper is organized as follows. A generic problem is
formulated in the next section. In Section III, the difficulty of
applying conventional optimal control theory to
multi-player PE games is analyzed, and then a
suboptimal hierarchical approach is proposed from the
pursuers’ perspective. In addition, the issue of capturability is
discussed and methods are suggested to address the problem.
Simulation results are presented in section IV. Finally, the
paper concludes with suggestions for future work.
II. PROBLEM FORMULATION
Consider a general PE differential game with $N$ pursuers and $M$ evaders in an $n_0$-dimensional space $S$, $S \subseteq \mathbb{R}^{n_0}$. Denote by $x_p^i$ ($x_e^j$) the state variable associated with pursuer $i$, $i = 1, \dots, N$ (evader $j$, $j = 1, \dots, M$), where $x_p^i \in \mathbb{R}^{n_p^i}$ ($x_e^j \in \mathbb{R}^{n_e^j}$). Notice that $n_p^i$ ($n_e^j$) $\ge n_0$ because of the specific dynamics of the pursuer (evader). Assume that the first $n_0$ elements in $x_p^i$ ($x_e^j$) specify the physical position of pursuer $i$ (evader $j$) in space $S$. In general, the dynamic equations for each pursuer $i$ and evader $j$ are
$$
\begin{aligned}
\dot{x}_p^i(t) &= f_p^i\big(x_p^i(t), u_i(t)\big), \quad \text{with } x_p^i(0) = x_{p0}^i, \\
\dot{x}_e^j(t) &= f_e^j\big(x_e^j(t), v_j(t)\big), \quad \text{with } x_e^j(0) = x_{e0}^j.
\end{aligned} \tag{1}
$$
In (1), $u_i(t) \in U_a^i$ and $v_j(t) \in V_a^j$ are control variables for time $t \ge 0$, where $U_a^i \subseteq \mathbb{R}^{m_p^i}$ and $V_a^j \subseteq \mathbb{R}^{m_e^j}$ are the sets of corresponding admissible control actions; function $f_p^i(\cdot,\cdot)$ ($f_e^j(\cdot,\cdot)$) is a mapping from $\mathbb{R}^{n_p^i} \times U_a^i$ ($\mathbb{R}^{n_e^j} \times V_a^j$) to $\mathbb{R}^{n_p^i}$ ($\mathbb{R}^{n_e^j}$). In this paper, we consider the deterministic case, where the function $f_p^i$ ($f_e^j$) does not depend on time $t$ explicitly. For simplicity of notation, let $x_p = [x_p^{1T}, \dots, x_p^{NT}]^T$, $x_e = [x_e^{1T}, \dots, x_e^{MT}]^T$, $u = [u_1^T, \dots, u_N^T]^T$ and $v = [v_1^T, \dots, v_M^T]^T$, and accordingly, function $f_p = [f_p^{1T}, \dots, f_p^{NT}]^T$ and $f_e = [f_e^{1T}, \dots, f_e^{MT}]^T$. Then, the dynamic equation can be rewritten in a compact form as
$$
\begin{aligned}
\dot{x}_p(t) &= f_p\big(x_p(t), u(t)\big), \quad \text{with } x_p(0) = x_{p0}, \\
\dot{x}_e(t) &= f_e\big(x_e(t), v(t)\big), \quad \text{with } x_e(0) = x_{e0}.
\end{aligned} \tag{2}
$$
Let $X_p = \prod_{i=1}^{N} \mathbb{R}^{n_p^i}$, $X_e = \prod_{j=1}^{M} \mathbb{R}^{n_e^j}$, $U_a = \prod_{i=1}^{N} U_a^i$ and $V_a = \prod_{j=1}^{M} V_a^j$.
For pursuer $i$, define the projection $P: \mathbb{R}^{n_p^i} \to S \subseteq \mathbb{R}^{n_0}$ as
$$
P(x_p^i) = [x_{p1}^i, \dots, x_{pn_0}^i]^T. \tag{3}
$$
A similar operator can be defined for each pursuer and each evader. We use the notation $P$ to denote the projection for all the pursuers and the evaders. Clearly, for any $x_p^i \in \mathbb{R}^{n_p^i}$, $i = 1, \dots, N$ (or $x_e^j \in \mathbb{R}^{n_e^j}$, $j = 1, \dots, M$), $P(x) \in S$.
In a PE game with multiple evaders, evaders are generally not captured simultaneously. The terminal time of the game can be defined based on the capture of all evaders, i.e., for any $j$, $j = 1, \dots, M$, there exists $i$, $0 < i \le N$, such that $d\big(P(x_p^i(t)), P(x_e^j(t))\big) \le \varepsilon$ for some $t \ge 0$. Here $d(\cdot,\cdot)$ is a metric in $\mathbb{R}^{n_0}$, e.g., $d(x, y) = \|x - y\|_2$ for $x, y \in \mathbb{R}^{n_0}$; $\varepsilon$ is a predetermined small positive real number. The capture time of evader $j$, $T_j$, can be defined as
$$
T_j = \inf\big\{ t \mid t \ge 0,\ \exists\, i,\ 1 \le i \le N,\ \text{s.t. } d\big(P(x_p^i(t)), P(x_e^j(t))\big) \le \varepsilon \big\}.
$$
Then, the terminal time of the PE game, $T$, is defined as
$$
T = \max_{1 \le j \le M} T_j. \tag{4}
$$
Clearly, by (4), $T \in \mathbb{R}^+ \cup \{0, \infty\}$, where $\mathbb{R}^+$ stands for the set of all positive real numbers. Consider that the objective function has a general form as follows, where subscript $t$ denotes the time.
$$
J(x_{p0}, x_{e0}, u, v) = \int_0^T G(x_{pt}, x_{et}, u_t, v_t)\, dt + Q(x_T) \tag{5}
$$
subject to (2).
In (5), $u = \{u(t) \mid 0 \le t \le T,\ u(t) \in U_a\}$, and similarly for $v$. Function $G$ is the cost rate, $G(\cdot,\cdot,\cdot,\cdot): X_p \times X_e \times U_a \times V_a \to \mathbb{R}^+ \cup \{0\}$, and function $Q$ quantifies the terminal cost as a function of the states at the terminal time of a game, $Q: X_p \times X_e \to \mathbb{R}^+ \cup \{0\}$. In a game where the capture time is the objective, $G = 1$ and $Q = 0$. In this paper, PE games involving $N$ pursuers and $M$ evaders are modeled as zero-sum games, where pursuers try to find an optimal strategy $u^*$ by minimizing the objective (5) subject to (2), while evaders try to maximize it. We use the notation $U_a[s, t]$ to stand for the following set.
$$
U_a[s, t] = \{ u(\tau) \mid u(\tau) \in U_a,\ 0 \le s \le \tau \le t \le T \}.
$$
Use the notation $u_{s:t}$, $u_{s:t} \in U_a[s, t]$, and similarly for $V_a[s, t]$ and $v_{s:t}$. By (5), for $x_p \in X_p$ and $x_e \in X_e$, define $\bar{V}(x_p, x_e)$ as (see footnote 1)
$$
\bar{V}(x_p, x_e) = \min_{u_{0:T} \in U_a[0,T]}\ \max_{v_{0:T} \in V_a[0,T]} \Big\{ \int_0^T G(x_{pt}, x_{et}, u_t, v_t)\, dt + Q(x_T) \Big\}. \tag{6}
$$
Similarly for $\underline{V}(x_p, x_e)$,
$$
\underline{V}(x_p, x_e) = \max_{v_{0:T} \in V_a[0,T]}\ \min_{u_{0:T} \in U_a[0,T]} \Big\{ \int_0^T G(x_{pt}, x_{et}, u_t, v_t)\, dt + Q(x_T) \Big\}. \tag{7}
$$
If
$$
\bar{V}(x_p, x_e) = \underline{V}(x_p, x_e) = V(x_p, x_e), \tag{8}
$$
$V(x_p, x_e)$ is called the Value function, and this condition is called the Isaacs condition [2]. This Value is the so-called saddle-point equilibrium defined for a zero-sum game [4]. In this paper, the capitalized “Value” stands for the value function defined in (8), avoiding the confusion with the ordinary meaning of the word.
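As an illustration of the capture-time definitions above, the following minimal sketch evaluates $T_j$ and $T = \max_j T_j$ from sampled trajectories. The function name `capture_times`, the array layout, and the numbers in the example are assumptions made for illustration only; the paper defines the quantities in continuous time, so the sampled result is accurate only up to the time step.

```python
import numpy as np

def capture_times(t, pursuers, evaders, eps):
    """Sampled version of the capture times T_j and the terminal time T.

    t        : (K,) sample times
    pursuers : (N, K, n0) projected pursuer positions P(x_p^i(t_k))
    evaders  : (M, K, n0) projected evader positions  P(x_e^j(t_k))
    eps      : capture radius
    """
    # dist[i, j, k] = || P(x_p^i(t_k)) - P(x_e^j(t_k)) ||_2
    diff = pursuers[:, None, :, :] - evaders[None, :, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Evader j counts as captured at sample k if some pursuer is within eps.
    captured = (dist <= eps).any(axis=0)            # shape (M, K)
    first = captured.argmax(axis=1)                 # first True index (0 if none)
    T_j = np.where(captured.any(axis=1), t[first], np.inf)
    return T_j, T_j.max()

# Hypothetical example: one pursuer moving along the x-axis at speed 2,
# two evaders moving along the same axis at speed 1.
t = np.linspace(0.0, 10.0, 1001)
zeros = np.zeros_like(t)
pursuers = np.stack([np.stack([2.0 * t, zeros], axis=-1)])
evaders = np.stack([
    np.stack([1.0 + t, zeros], axis=-1),
    np.stack([4.0 + t, zeros], axis=-1),
])
T_j, T = capture_times(t, pursuers, evaders, eps=0.1)
```

An evader that is never within `eps` of any pursuer gets an infinite capture time, which mirrors the possibility of an unbounded terminal time in (4).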
III. CONSERVATIVE HIERARCHICAL APPROACH
A. Dilemma in Backward Analysis
A PE game has been formulated as a zero-sum game and
the saddle-point equilibrium concept is adopted under the
Isaacs condition. This fact makes the mathematical tools that
are available for solving optimal control problems also useful
in solving conventional two-player PE differential games.
The kernel of modern optimal control theory includes
Pontryagin’s minimum principle and Bellman’s dynamic
programming. Both of them specify a set of conditions on
optimal dynamic controls. The former involves a set of
adjoint Ordinary Differential Equations (ODE) while the
latter is associated with a partial differential equation called
the Hamilton-Jacobi-Bellman (HJB) equation [15] (see footnote 2). In both
approaches, boundary conditions on the states at the terminal
time are needed. Furthermore, Isaacs’ method of “tenet of
transition” in treating two-player PE differential games is
closely related to dynamic programming, which is based on
the underlying idea of state rollback. Starting from the
terminal, an optimal trajectory of the states is traced
backwards and, with a formulation of the Hamilton-Jacobi-Isaacs (HJI) equation, the Value function can be determined,
such that saddle-point equilibrium strategies of state feedback
can be obtained accordingly.
1 We assume the optimum can be attained by some strategy pair $(u^*, v^*)$.
In the problem of a PE game involving multiple players,
the backward analysis encounters tremendous difficulty in
the process of tracing back. The main challenge lies in
identifying the terminal states of both pursuers and evaders.
Starting from the end, under a saddle-point strategy, each
evader has been captured by some specific pursuer. If pursuer
i catches evader j in the game, we say that both players are
engaged. (Note that it is possible that one evader may be
engaged with more than one pursuer and vice versa.) Clearly,
the number of possible engagements between pursuers and
evaders increases at least exponentially with N and M .
This explosion of the number of possible engagements makes
the terminal state in a multi-player PE game extremely
difficult to specify in contrast to that in a two-player game.
Even if it were possible, the backward analysis would have to be
repeated a prohibitive number of times, starting from different
engagements, even when N and M are not very large. Furthermore,
evaders are generally not captured at the same time, which makes
determining the terminal state even less tractable. Therefore,
the backward approach used by Isaacs and Bellman cannot be
practically applied in multi-player PE games and the same
situation holds for the minimum principle as well. In
summary, a multi-player PE differential game cannot be
characterized by either a set of adjoint ODEs or an HJI
equation.
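To give a feel for the growth in the number of engagements, the following sketch counts them under two simple conventions that are assumptions of this illustration, not definitions from the paper: (a) each evader is merely assigned to one pursuer, and (b) each pursuer additionally chases its assigned evaders in a chosen order. The function names are hypothetical.

```python
from math import comb, factorial

def unordered_engagements(n_pursuers, n_evaders):
    # Each evader independently picks the pursuer that captures it.
    return n_pursuers ** n_evaders

def ordered_engagements(n_pursuers, n_evaders):
    # Arrange the evaders in a sequence (M! ways) and cut the sequence
    # into N consecutive, possibly empty, per-pursuer queues.
    return factorial(n_evaders) * comb(n_evaders + n_pursuers - 1, n_pursuers - 1)

for n, m in [(2, 2), (3, 5), (5, 10)]:
    print(n, m, unordered_engagements(n, m), ordered_engagements(n, m))
```

Even under the coarser convention (a), the count is $N^M$, so a separate backward analysis per engagement quickly becomes impractical.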
B. Hierarchical Approach
To attack this problem, first let us study whether or not the
game can be decomposed into games between the pursuers
and the evaders that are closely engaged. Assume that there
exist continuous trajectories for the group of pursuers and
evaders that comprise a saddle-point solution for the game.
The engagement scheme among the pursuers and evaders can
be obtained at the terminal time of the game. Under this
engagement, pursuers do not switch evaders if evaders do not
change their strategy, which is the basic rationale of the
concept of equilibrium in game theory [2], [4]. In this sense,
the problem of determining an engagement may be part of the
pursuers’ strategies if they try to minimize (5). This can be
viewed as a natural hierarchical structure in decision-making
from the pursuers’ perspective. The upper level is to
determine a proper engagement scheme among pursuers and
evaders, while the lower level solves each engaged
differential game. However, given an engagement scheme,
the original game cannot be treated as a set of decoupled PE
games between pursuers and their engaged evaders.
Coordination fills this gap.
The major difficulty of multi-player PE differential games
is to characterize the coordination among pursuers or evaders.
It would be very difficult, if not impossible, to solve such a
game by Isaacs’ approach for the various terminal times and
engagements. In PE games, Singular Surfaces (SS) are almost
always assumed to divide the state space into disjoint regions
with continuous differentiability [1]-[2]. We expect the
number of SS to increase dramatically with the number of
pursuers and evaders, which makes the rigorous treatment of
this kind of problem extremely difficult, because
conventional optimal control theory requires smoothness
conditions on the optimal functions.
2 In differential games, the corresponding equation is called the
Hamilton-Jacobi-Isaacs (HJI) equation, where “minimax” replaces “min”.
Instead of determining the exact equilibrium solutions, we
seek an upper-bound of the performance index from the
pursuers’ perspective. In Isaacs’ approach, the “minmax”
operator in (6) is to determine the best worst-case strategy. It
is a conservative approach and the optimal objective acts as
the least uniform (no dependence on evaders) upper-bound
on the pursuers’ performance. When this result coincides
with that from (7), it becomes a well-defined solution for
zero-sum games. In this paper, we determine a good upper
bound instead of the least upper bound.
We focus on the capture time as our objective function.
The objective function for a game with multiple players is
$T = \max_j T_j$. Assume that each pursuer is engaged with at
least one evader, and only captures one evader at a time. The
original multi-player PE game can be converted into a
hierarchical optimization problem. The upper level is to
determine such an engagement that T is minimized. Given
an engagement, the strategy of each pursuer is obtained by
solving decoupled two-player PE games based on Isaacs’
method at the lower level. This is a conservative approach
because in many situations, the strategies of multiple pursuers
are concealed so that the evaders cannot execute the
“optimal” strategies against the engaged pursuers. The
structure of the approach is illustrated in Fig. 1.
[Fig. 1. Hierarchical Structure Approach: an upper-level block “Optimization on Engagement” placed above a row of lower-level “Two-Player PE Game” blocks.]
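The two-level structure described above can be sketched for the single-stage case, in which there are at least as many pursuers as evaders and every pursuer chases at most one evader. The sketch assumes that the lower level has already produced a matrix of two-player Values (capture times); the matrix entries and the function name `best_engagement` are hypothetical, and brute-force enumeration is used purely for illustration.

```python
from itertools import permutations

def best_engagement(values):
    """Upper-level search over engagements for the single-stage case.

    values[i][j] is the two-player Value (capture time) if pursuer i is
    engaged with evader j, assumed to come from the lower level.
    Returns (T, assignment), where assignment[j] is the pursuer of evader j.
    """
    n_pursuers = len(values)
    n_evaders = len(values[0])
    best_T, best_assignment = float("inf"), None
    # Each evader gets a distinct pursuer; unused pursuers stay idle.
    for assignment in permutations(range(n_pursuers), n_evaders):
        # The game ends when the last evader is captured.
        T = max(values[i][j] for j, i in enumerate(assignment))
        if T < best_T:
            best_T, best_assignment = T, assignment
    return best_T, best_assignment

# Hypothetical Values for 3 pursuers and 2 evaders.
values = [[4.0, 9.0],
          [6.0, 3.0],
          [5.0, 7.0]]
T, assignment = best_engagement(values)
```

In this example the search pairs evader 0 with pursuer 0 and evader 1 with pursuer 1. The enumeration grows factorially with the number of players, which is why the text treats the upper level as a combinatorial optimization problem.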
C. Optimization at the Upper Level
Let $V(x_p^i, x_e^j)$ denote the Value function if pursuer $i$ is engaged with evader $j$. Notice that $V(x_p^i, x_e^j) = T_j$. Assume that $V(x_p^i, x_e^j)$ can be solved analytically. The optimization problem at the upper level can then be formulated as
$$
\min J = \min_{b_{ij}^k}\ \max_j \Big\{ \sum_{i=1}^{N} \sum_{k=1}^{K} V\big(x_p^i(k), x_e^j(k)\big)\, b_{ij}^k \Big\} \tag{9}
$$
subject to $b_{ij}^k \in \{0, 1\}$, $\sum_{i=1}^{N} \sum_{k=1}^{K} b_{ij}^k = 1$ and $\sum_{j=1}^{M} b_{ij}^k \le 1$.
In (9), the problem is formulated with multiple stages, taking into account the case when $N < M$. Here, $k$ is the index for stages; $b_{ij}^k$ is a binary decision variable; $b_{ij}^k = 1$ indicates that pursuer $i$ is engaged with evader $j$ at stage $k$; $b_{ij}^k = 0$ means the opposite. The maximum number of stages considered is $K = \lceil M/N \rceil$, which is the smallest integer not less than $M/N$. Solving problem (9) provides an upper bound of the objective for the original multi-player PE game.
When $N \ge M$ ($K = 1$), problem (9) can be converted into a standard Mixed Integer Linear Programming (MILP) problem by introducing a slack variable $w$, such that commercial solvers such as CPLEX and LINDO can be utilized [10]. A proper formulation is shown in (10).
$$
\min J = \min w \tag{10}
$$
subject to $T_j \le w$ for $1 \le j \le M$, $b_{ij} \in \{0, 1\}$, $\sum_{i=1}^{N} b_{ij} = 1$ and $\sum_{j=1}^{M} b_{ij} \le 1$, with $T_j = \sum_{i=1}^{N} V(x_p^i, x_e^j)\, b_{ij}$.
D. PE Differential Game of Two Players at the Lower Level
Problem (9) requires solving $V(x_p^i, x_e^j)$. A number of solutions have been achieved analytically [1]-[2]. The following theorem specifies a set of conditions for an equilibrium solution. Consider a general dynamics as
$$
\dot{x}(t) = f(x, u, v) \quad \text{with } x(0) = x_0 \in \mathbb{R}^n. \tag{11}
$$
The objective function is
$$
J = \int_0^T G(x_t, u_t, v_t)\, dt + Q(x_T). \tag{12}
$$
Theorem 1: For a two-player PE differential game described in (11) and (12), suppose that the control pair $\{u^*, v^*\}$ is a saddle-point solution and $x^*(t)$ is the corresponding trajectory. Then there exists a costate function $p(t): [0, T] \to \mathbb{R}^n$ such that the following conditions are satisfied:
$$
\begin{aligned}
\dot{x}^*(t) &= f(x_t^*, u_t^*, v_t^*), \\
\dot{p}(t) &= -\frac{\partial H(p_t, x_t^*, u_t^*, v_t^*)}{\partial x}, \\
H(p_t, x_t^*, u_t^*, v_t) &\le H(p_t, x_t^*, u_t^*, v_t^*) \le H(p_t, x_t^*, u_t, v_t^*),
\end{aligned}
$$
with $x^*(0) = x_0$ and $p(T) = \partial Q(x^*(T)) / \partial x$ at the terminal time, where $H(p_t, x_t, u_t, v_t) = G(x_t, u_t, v_t) + p_t^T f(x_t, u_t, v_t)$.
Proof: See [2].
A two-player PE problem can be difficult depending on the dynamics and the positions of the players and analytical solutions may be intractable. Here, we illustrate the idea by solving a PE game with simplified dynamics in order to reduce the complexity of determining $V(x_p^i, x_e^j)$. For practical problems, it is desired that the model reveal the main features of the players without involving too much of the details. Consider a PE game with two dimensions. The dynamics of the pursuer and the evader are
$$
\begin{aligned}
\dot{x}_p &= v_p \cos\theta_p, & \dot{x}_e &= v_e \cos\theta_e, \\
\dot{y}_p &= v_p \sin\theta_p, & \dot{y}_e &= v_e \sin\theta_e.
\end{aligned} \tag{13}
$$
Here $\theta$ is the control variable. Let the initial conditions be $x_{p0}$, $y_{p0}$, $x_{e0}$ and $y_{e0}$. In this model, players are assumed to perform at their maximum speeds and have complete maneuverability. Small ground vehicles can be approximately described by this dynamics. Define the new states as $x = x_p - x_e$ and $y = y_p - y_e$. It follows that
$$
\begin{aligned}
\dot{x} &= v_p \cos\theta_p - v_e \cos\theta_e, \\
\dot{y} &= v_p \sin\theta_p - v_e \sin\theta_e,
\end{aligned} \tag{14}
$$
with $x_0 = x_{p0} - x_{e0}$ and $y_0 = y_{p0} - y_{e0}$. The terminal set is defined as $\{(x, y) \mid \sqrt{x^2 + y^2} \le \varepsilon\}$. The objective function is the capture time, $J = \int_0^T dt$. The Hamiltonian is
$$
H = 1 + V_x (v_p \cos\theta_p - v_e \cos\theta_e) + V_y (v_p \sin\theta_p - v_e \sin\theta_e).
$$
According to Theorem 1, the costate equations are $\dot{V}_x = 0$ and $\dot{V}_y = 0$. Thus, $V_x$ and $V_y$ are constant. By the Main Equation in [1], $\min_{\theta_p} \max_{\theta_e} H = 0$, and we obtain
$$
\cos\theta_p^* = \cos\theta_e^* = -\frac{V_x}{\rho}, \qquad \sin\theta_p^* = \sin\theta_e^* = -\frac{V_y}{\rho},
$$
where $\rho = \sqrt{V_x^2 + V_y^2}$. Substituting into the Hamiltonian gives $H = 1 - \rho\,(v_p - v_e) = 0$. It follows that
$$
\sqrt{V_x^2 + V_y^2} = \frac{1}{v_p - v_e}.
$$
By the definition of the terminal set, the terminal state $(x_T, y_T)$ satisfies
$$
\frac{V_x(x_T)}{V_y(y_T)} = \frac{x_T}{y_T}.
$$
Considering the system dynamics, then
$$
V(x, y) = \frac{\sqrt{x^2 + y^2} - \varepsilon}{v_p - v_e}, \tag{15}
$$
$$
\cos\theta_p^* = \cos\theta_e^* = -\frac{x}{\sqrt{x^2 + y^2}}, \qquad \sin\theta_p^* = \sin\theta_e^* = -\frac{y}{\sqrt{x^2 + y^2}}. \tag{16}
$$
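The closed-form Value in (15) can be cross-checked numerically by integrating the simple-motion dynamics (13) with both players using the headings in (16). The sketch below uses forward-Euler integration; the capture radius `eps = 0.1`, the step size, and the initial positions and speeds are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def value(x, y, v_p, v_e, eps):
    # Value (15) for relative state (x, y) = (x_p - x_e, y_p - y_e), v_p > v_e.
    return (math.hypot(x, y) - eps) / (v_p - v_e)

def saddle_heading(x, y):
    # Common heading (16): cos = -x/r, sin = -y/r, i.e. the direction
    # pointing from the pursuer toward the evader.
    return math.atan2(-y, -x)

def simulate(xp, yp, xe, ye, v_p, v_e, eps, dt=1e-3):
    # Forward-Euler integration of (13) under the headings (16).
    t = 0.0
    while math.hypot(xp - xe, yp - ye) > eps:
        th = saddle_heading(xp - xe, yp - ye)
        xp += v_p * math.cos(th) * dt
        yp += v_p * math.sin(th) * dt
        xe += v_e * math.cos(th) * dt
        ye += v_e * math.sin(th) * dt
        t += dt
    return t

# Assumed example: pursuer at (0, 3) with speed 6, evader at (9, 5) with speed 5.
eps = 0.1
t_closed = value(0.0 - 9.0, 3.0 - 5.0, 6.0, 5.0, eps)
t_sim = simulate(0.0, 3.0, 9.0, 5.0, 6.0, 5.0, eps)
```

Under these headings both players move along the initial line of sight, so the separation shrinks at the constant rate $v_p - v_e$ and the simulated capture time agrees with (15) up to the integration step.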
E. Region of Capture
In practice, the feasibility of the hierarchical approach depends on the capturability between any pair of a pursuer and an evader. In this section, we briefly discuss this issue.
In [1], Isaacs defined the concepts of game of degree and game of kind to distinguish between the problem of finding optimal solutions and the problem of existence. For a two-player PE game, denote by $C$ the capture region in the state space, where capturability is guaranteed. On its boundary $\partial C$, the following condition is satisfied.
$$
\min_{u_t} \max_{v_t}\ \nu^T f(x_t, u_t, v_t) = 0 \tag{17}
$$
Here, $\nu$ denotes the normal direction of $\partial C$. Equation (17) specifies a necessary condition of the capture region. It is similar to the HJI equation regardless of the specific objective function.
An alternative approach is based on a feedback control design method, which combines the concepts of function minimization and Lyapunov stability techniques. This method is called Lyapunov Optimizing Control (LOC) [13]-[14]. The method of LOC can be used to study the capturability of PE games and to design a feasible pursuit strategy for pursuers [14]. It depends on a positive definite Lyapunov-type function $W(x)$. For any $x$ in the state space, if the following condition is satisfied for any $t \ge 0$,
$$
\inf_{u_t} \dot{W}(x) = \inf_{u_t} W_x(x)\, f(x_t, u_t, v_t) < 0 \quad \text{for } \forall\, v_t \in V_a,
$$
then $x$ belongs to the capture region. Clearly, this condition is sufficient and can be easily verified by stability analysis based on the function $W(x)$.
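As a small numerical illustration of the sufficient condition above, the sketch below checks it on a grid for the relative simple-motion dynamics (14), with the candidate function $W(x, y) = x^2 + y^2$. This choice of $W$, the grid resolution, and the function name are assumptions of the illustration, not taken from the paper.

```python
import numpy as np

def loc_condition_holds(x, y, v_p, v_e, n_grid=361):
    """Grid check of inf_u dW/dt < 0 for all v, with W = x^2 + y^2.

    (x, y) is the relative state of (14); the pursuer control is its
    heading theta_p and the evader control is its heading theta_e.
    """
    th = np.linspace(0.0, 2.0 * np.pi, n_grid)
    tp, te = np.meshgrid(th, th, indexing="ij")   # axis 0: pursuer, axis 1: evader
    # dW/dt = W_x * f = 2x*(x_dot) + 2y*(y_dot) along the dynamics (14)
    w_dot = (2.0 * x * (v_p * np.cos(tp) - v_e * np.cos(te))
             + 2.0 * y * (v_p * np.sin(tp) - v_e * np.sin(te)))
    inf_over_pursuer = w_dot.min(axis=0)          # best pursuer heading per evader heading
    return bool((inf_over_pursuer < 0.0).all())

faster_pursuer = loc_condition_holds(3.0, 4.0, v_p=6.0, v_e=5.0)
slower_pursuer = loc_condition_holds(3.0, 4.0, v_p=4.0, v_e=5.0)
```

For this $W$ the infimum over the pursuer heading is at most $-2\sqrt{x^2 + y^2}\,(v_p - v_e)$, so the check passes when the pursuer is faster and fails when it is slower, as one would expect for simple motion.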
F. Summary and Discussion
The hierarchical approach introduced above is a suboptimal method. Instead of solving the problem formulated in (6) within the admissible set $U_a[0, T]$, we do optimization within a subset $U_a^S[0, T]$ of $U_a[0, T]$ by imposing a structure $S$ on the strategy of the pursuers, dividing the decision-making of pursuers into two levels. The problem in (6) can be rewritten in the following form.
$$
\tilde{V}(x_p, x_e) = \min_{u_{0:T} \in U_a^S[0,T]}\ \max_{v_{0:T} \in V_a[0,T]} \Big\{ \int_0^T G(x_{pt}, x_{et}, u_t, v_t)\, dt + Q(x_T) \Big\} \tag{18}
$$
Equation (18) is almost the same as (6) except that optimization of the pursuers is taken over a structured control set $U_a^S[0, T]$. In some sense, it is only a “local” optimum. Clearly, the solution of (18) is an upper-bound of the original objective function in (5), and $\tilde{V}(x_p, x_e) \ge \bar{V}(x_p, x_e)$.
It is also worth noting that the hierarchical approach described in this section is conceptually identical to a two-level Stackelberg game problem [11] or bi-level programming [12], where the optimization at the upper level depends on that at the lower level. Note that in a two-level Stackelberg problem, there may be more than one player at the second level, where each player chooses a strategy according to its individual objective. One special case is that those objective functions are decoupled from each other, which fits in the hierarchical approach in this section. In general, for the more complicated coupled lower level, one has to formulate the second level as a game, and relevant solution concepts such as Nash or Pareto solutions may be adopted.
Finally, the quality of the solution from this suboptimal hierarchical approach depends on the combinatorial optimization problem in (9). This is an NP-hard problem [16], and the computational demand increases at least exponentially with the number of pursuers and evaders, even provided that the underlying $V(x_p^i, x_e^j)$ between pursuer $i$ and evader $j$ is solvable. Practically, for a PE game involving many pursuers and evaders, heuristic and approximation methods should be applied on (9). In that case, the upper-bound obtained is further degraded by loss of optimality. This result would still be acceptable since no better solution is available.
IV. SIMULATION AND DISCUSSION
As stated earlier, conventional optimal control theory is not applicable. Even for the simplest multi-player PE game with two pursuers and one evader, if the evader is caught by one pursuer, the terminal state of the other pursuer cannot be specified. In this section, only the result from the hierarchical approach is presented. We create a pursuit-evasion scenario involving 3 pursuers and 5 evaders in a two-dimensional space. The dynamics of players are given in (13). The capture time is considered as the objective function. The necessary parameters and the initial states are in Table I.
TABLE I
INITIAL STATES OF PURSUERS AND EVADERS
Pursuers                   1        2        3
$(x_{p0}, y_{p0})$         (0, 3)   (0, 5)   (0, 7)
$v_p$ (1/sec)              6        5        7
$\omega_p$ (rad/sec)       0.8      0.8      0.8
$\theta_{p0}$              2        2        2

Evaders                    1        2        3        4        5
$(x_{e0}, y_{e0})$         (1, 5)   (4, 5)   (6, 5)   (7, 5)   (9, 5)
$v_e$ (1/sec)              5        4        3        3        5
Solving the combinatorial optimization problem in (9), we obtain the optimal engagement shown in Table II. The corresponding capture time is 8.8 seconds.
TABLE II
BEST ENGAGEMENT RESULT
Order    P1     P2    P3
1        E5     E3    E2
2        N/A    E4    E1
Next, we consider slightly more complicated dynamics
$$
\begin{aligned}
\dot{x}_p^i &= v_p^i \cos\theta_p^i, & \dot{x}_e^j &= v_e^j \cos\theta_e^j, \\
\dot{y}_p^i &= v_p^i \sin\theta_p^i, & \dot{y}_e^j &= v_e^j \sin\theta_e^j, \\
\dot{\theta}_p^i &= \omega_p^i u_p^i, & \dot{\theta}_e^j &= \omega_e^j u_e^j,
\end{aligned} \tag{19}
$$
with initial conditions $x_{p0}^i$, $x_{e0}^j$, $y_{p0}^i$, $y_{e0}^j$, $\theta_{p0}^i$ and $\theta_{e0}^j$, where $u_p$ and $u_e$ are control variables, and assume $-1 \le u_p, u_e \le 1$; $v_p$, $v_e$, $\omega_p$, $\omega_e$ are constant. Consider the case $\omega_e \gg \omega_p$, i.e., evaders are viewed to directly control their orientations. This model was originally used by Isaacs in studying the homicidal chauffeur game [1]. The angular velocity $\omega_p$ and the initial orientation $\theta_{p0}$ of each pursuer are given in Table I (rows $\omega_p$ and $\theta_{p0}$). It can be verified that each evader $j$ is in the capture region of some pursuer $i$ [2].
While in the capture region, the optimal strategy of evader $j$ can be shown to be that in (16) by maximizing the time derivative of the distance between pursuer $i$ and evader $j$.
$$
\dot{D} = \frac{x\,(v_e^j \cos\theta_e^j - v_p^i \cos\theta_p^i) + y\,(v_e^j \sin\theta_e^j - v_p^i \sin\theta_p^i)}{\sqrt{x^2 + y^2}}
$$
Here $x = x_e^j - x_p^i$ and $y = y_e^j - y_p^i$, i.e., the relative position is taken from the pursuer to the evader, with the opposite sign to the relative states in (14). Given a constant strategy for evader $j$, the strategy of pursuer $i$ may be solved by taking the second order derivative of $D$, which yields
$$
u_p^i = -\operatorname{sign}\big(x \sin\theta_p^i - y \cos\theta_p^i\big). \tag{20}
$$
With the hierarchy imposed on the decision-making of the pursuers, given a possible engagement scheme, we simulate every decoupled game, in which pursuers utilize the strategy in (20) and evaders play according to (16). All possible engagements are enumerated and the best engagement result is the same as that in Table II, but the capture time increases to 12.5 seconds. The corresponding pursuit-evasion trajectories are shown in Fig. 2, in which the trajectory of each pursuer with its engaged evaders is plotted separately. Snapshots at the 1st and the 5th second are illustrated.
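One decoupled game of this kind can be sketched as follows: the pursuer follows the turn-rate-limited dynamics (19) with the bang-bang law (20), while the evader flees along the line of sight as in (16). The capture radius, the initial heading $\pi/2$, the step size and the function name `simulate_chase` are assumptions of this illustration and are not values reported in the paper.

```python
import math

def simulate_chase(p, theta_p, e, v_p, v_e, omega_p,
                   eps=0.1, dt=1e-3, t_max=60.0):
    """Forward-Euler simulation of one pursuer against one evader."""
    xp, yp = p
    xe, ye = e
    t = 0.0
    while t < t_max:
        x, y = xe - xp, ye - yp                 # pursuer-to-evader vector
        if math.hypot(x, y) <= eps:
            return t                            # capture
        # Pursuer law (20): turn toward the line of sight at full rate.
        s = x * math.sin(theta_p) - y * math.cos(theta_p)
        u = -math.copysign(1.0, s) if s != 0.0 else 0.0
        # Evader law (16): head directly away from the pursuer.
        theta_e = math.atan2(y, x)
        xp += v_p * math.cos(theta_p) * dt
        yp += v_p * math.sin(theta_p) * dt
        theta_p += omega_p * u * dt
        xe += v_e * math.cos(theta_e) * dt
        ye += v_e * math.sin(theta_e) * dt
        t += dt
    return math.inf                             # no capture within t_max

# Assumed example: pursuer at (0, 5) heading pi/2, evader at (6, 5).
t_capture = simulate_chase((0.0, 5.0), math.pi / 2, (6.0, 5.0),
                           v_p=5.0, v_e=3.0, omega_p=0.8)
# Simple-motion Value (15) for the same start, a lower bound on the above.
t_lower = (6.0 - 0.1) / (5.0 - 3.0)
```

Because the pursuer must first turn toward the evader, the simulated capture time exceeds the simple-motion Value (15), which is consistent with the increase in capture time reported for the turn-rate-limited model.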
[Fig. 2. Pursuit-evasion Trajectories under the Best Engagement. Panels: (a.1) and (a.2) Pursuer 1 and Evader 5, (b.1) and (b.2) Pursuer 2 and Evaders 3/4, (c.1) and (c.2) Pursuer 3 and Evaders 2/1; panels .1 are by 1 second and panels .2 are by 5 seconds. The 5-second panels mark E3 and E2 as captured.]
The simulation results demonstrate the cooperation of pursuers when the hierarchical approach is applied, where pursuers cooperate by choosing appropriate evaders to go after. In the trajectories shown in Fig. 2, the “best” strategy is utilized by evaders against each pursuer when engagement is determined. This pursuit strategy is conservative from the pursuer’s perspective, since in practice evaders may not know the pursuer’s strategies perfectly.
V. CONCLUSION AND FUTURE WORK
In this paper, we deal with a multi-player PE differential game. A generic problem is formulated. Conventional optimal control theory is not applicable to this kind of problem due to the difficulty in specifying the final states. A suboptimal method is proposed to calculate a locally optimal strategy with a specified control structure imposed on pursuers. A combinatorial optimization problem is formulated to determine an optimal engagement, in which each pairing involves a single pursuer and a single evader. The underlying two-player games are solved by currently available differential game theory. The issue of capturability is discussed. Simulations show the feasibility of the approach. This suboptimal approach provides a theoretical upper-bound to a multi-player PE game when exact equilibrium solutions are unknown. In practice, solving the combinatorial optimization problem is hindered by its NP-hardness.
Future work falls in the following directions: 1) the locally optimal strategy obtained by the hierarchical method may be improved and if the improvement can be implemented iteratively, a true equilibrium solution may be approached asymptotically; 2) approximation and heuristic methods may be designed to reduce the complexity of solving the combinatorial optimization problem in practical implementations; 3) PE games may be modeled in a stochastic environment for more realistic conditions.
REFERENCES
[1] R. Isaacs, Differential Games, John Wiley & Sons, Inc., New York, 1965.
[2] T. Basar and G.J. Olsder, Dynamic Noncooperative Game Theory, 2nd
Ed, the Society for Industrial and Applied Mathematics, 1998.
[3] Bertsekas, D.P. 2000. Dynamic Programming and Optimal
Control: Volume 1, 2nd Edition. Athena Scientific, Belmont,
Massachusetts.
[4] M. J. Osborne and A. Rubinstein, A course in game theory, MIT press,
Cambridge, Massachusetts, 1994, pp 73-89.
[5] V. Turetsky and J. Shinar, “Missile Guidance Laws Based On
Pursuit-Evasion Game Formulations,” Automatica, Vol. 39, No. 3, pp.
740-746, 2003.
[6] Vidal, R., Shakernia, O., Kim, H.J., Shim, D.H. and Sastry, S,
“Probabilistic Pursuit-evasion Games: Theory, Implementation, and
Experimental Evaluation,” IEEE Transactions on Robotics and
Automation, v. 18, pp. 662- 669, 2002.
[7] S. M. LaValle, D. Lin, L. Guibas, J. C.Latombe, and R. Motwani.
“Finding an unpredictable target in a workspace with obstacles,” In
IEEE Int. Conf. Robot. & Autom., 1997.
[8] R. Vidal, S. Rashid, C. Sharp, O. Shakernia, J. Kim, and S. Sastry.
“Pursuit-evasion games with unmanned ground and aerial vehicles,”
IEEE Int. Conf. Robot.& Autom., pages 2948-2955. 2001.
[9] H. Yamaguchi, “A distributed motion coordination strategy for multiple
non-holonomic mobile robots in cooperative hunting operations,” in
Proceedings of the 41st IEEE Conference on Decision and Control, pp.
2984-2991, December 2002.
[10] Tom Schouwenaars, Eric Feron, Bart de Moor, and Jonathan How,
"Mixed Integer Programming for Multi-vehicle Path Planning,"
European Control Conference, September 2001.
[11] M. Simaan and J. B. Cruz Jr., On the Stackelberg strategy in
nonzero-sum games, Journal of Optimization Theory and Applications,
V. 11, 533 - 555, No. 5, 1973.
[12] B. Colson, P. Marcotte and G. Savard, Bilevel programming: A survey,
4OR: A Quarterly Journal of Operations Research, 3, 87-107, 2005.
[13] T.L. Vincent, Guidance Against Maneuvering Targets Using Lyapunov
Optimizing Feedback Control, Proceedings of the American Control
Conference, Anchorage, AK, 2002.
[14] D.J. Sticht, T.L. Vincent and D.G. Schultz, Sufficiency Theorems for
Target Capture, Journal of Optimization Theory and Applications,
Vol.17, No.5/6, 1975.
[15] Yong, J. and Zhou, X.Y., Stochastic Controls: Hamiltonian Systems
and HJB Equations, Springer, 1999.
[16] C.H. Papadimitriou and K. Steiglitz, Combinatorial Optimization:
Algorithms and Complexity. Prentice-Hall, Inc., Englewood Cliffs,
New Jersey, 1982.