2006 CCRTS
The State of the Art and the State of the Practice
A Game Theoretic Approach to Threat Intent Inference
Assigned Track: C2 Modeling and Simulation
Assigned Paper Number: C-076
Dan Shen*
The Ohio State University
205 Dreese Laboratory, 2015 Neil Ave
Columbus, OH 43202
Ph: (614)292-2572
Email: shen.100@osu.edu
Genshe Chen (Principal point of contact)
Intelligent Automation, Inc.
15400 Calhoun Drive, Suite 400
Rockville, MD 20874
Tel: 301 294 5218 (direct)
Fax: 301 294 5201
Email: gchen@i-a-i.com
Jose B. Cruz, Jr.,
The Ohio State University
205 Dreese Laboratory, 2015 Neil Ave
Columbus, OH 43202
Ph: (614)292-1588
Email: cruz.22@osu.edu
Chiman Kwan
Intelligent Automation, Inc.
15400 Calhoun Drive, Suite 400
Rockville, MD 20874
Tel: 301 294 5238 (direct)
Fax: 301 294 5201
Email: ckwan@i-a-i.com
ABSTRACT
A game theoretic approach to threat intent inference
In an adversarial military environment, it is important to predict the enemy's tactical
intent efficiently and promptly from lower-level spatial and temporal information. In this
paper, we propose a decentralized Markov game (MG) theoretic approach to estimate the
belief of each possible enemy course of action (ECOA), which is used to model the
adversary's intent. The approach has the following advantages: 1) Decentralization. Each
cluster or team makes decisions based mostly on local information, which gives each group
more autonomy and flexibility; 2) Uncertainty modeling. The underlying Markov Decision
Process (MDP) can effectively model the uncertainties of the noisy military environment;
3) A game model with three players: red force (enemies), blue force (friendly forces), and
white force (neutral objects); 4) Deception. Because an asymmetric enemy may manipulate
the information available to the friendly force, we integrate the concept of deception into
our game approach to model the act of purposely revealing only partial information to
increase the enemy's payoff. A simulation software package has been developed with
connectivity to the Boeing OEP (Open Experimental Platform) to demonstrate the
performance of our proposed algorithms. Simulations have verified that our proposed
algorithms are scalable, stable, and satisfactory in performance.
1. Introduction
Game theory provides a framework for modeling and analyzing interactions between
intelligent and rational decision makers, or players, in conflict situations in which no
individual decision maker is in complete control of the other decision units entering into
the environment. The idea of the game can be traced back to the Babylonian Talmud, a
compilation of ancient law and tradition set down during the first five centuries A.D.
However, it was not until 1944 that the mathematical theory of games was invented by
John von Neumann and Oskar Morgenstern [1]. Mathematically, game theory is used to
study strategic situations where players choose different actions in an attempt to maximize
their returns, which depend upon the choices of other individuals. To act optimally in a
multi-agent system, an agent should take the strategies of the other agents into account,
and it is therefore essential to be able to model the behavior of the opponents.
In an adversarial military environment, it is important to predict the adversary's tactical
intent efficiently and promptly from lower-level spatial (terrain) and temporal
information. Standard AI tools for solving decision-making problems in complex
situations, such as dynamic decision networks and influence diagrams, are not applicable
to these kinds of situations. Game theory, on the other hand, provides a mathematical
framework designed for the analysis of agent interaction under the assumption of
rationality, where one tries to identify the game equilibria as opposed to applying
traditional utility maximization principles. A game component in multi-agent
decision-making thus uses rationality as a tool to predict the behavior of the other agents
[11-15].
In this paper, the focus is on the application of Markov games [2], the multi-agent
extension of Markov Decision Processes (MDPs), to the estimation of enemy courses of
action (COAs) [3], which approximately model the intent of targets. By successfully
assessing possible future threats from adversaries, decision makers can make more
effective targeting decisions, plan friendly COAs, mitigate the impact of unexpected
adversary actions, and direct sensing systems to better observe adversary behaviors. We
have achieved the following results. First, we developed an innovative framework for
threat intent prediction in an urban warfare setting based on Markov (stochastic) game
theory. It consists of three closely coupled activities: 1) processing and integrating
information from disparate sources to produce an integrated object state; 2) reasoning
about and grouping the cooperative objects that perform common tasks; 3) predicting the
intentions and COAs of asymmetric threats. Second, we have implemented an adversarial
Markov game model with three players: red force (enemies), blue force (friendly forces),
and white force (neutral objects). This is a significant extension of existing game
theoretic tools for modeling and control of military air operations, which do not
explicitly consider the neutral force (or civilians) as an intelligent player [13]. Inherent
information imperfection is handled in two ways: 1) a decentralized decision-making
scheme; and 2) deception with bounded rationality. Third, a software prototype has been
developed with connectivity to the MICA (Mixed Initiative Control of Automa-Teams)
Open Experimental Platform [4] (an ontology-based virtual battlespace) to demonstrate the
performance of our proposed approach. Simulations have verified that our proposed
algorithms are scalable, stable, and perform satisfactorily according to the situation
awareness performance metric.
The paper is organized as follows. Section 2 summarizes the technical approach, which
includes the basic ideas of Markov games, the threat intent inference framework, and the
moving horizon control approach for the game solution. Section 3 describes the
experimental results. Section 4 concludes the paper.
2. Markov Game Framework
We propose a Markov game (MG) theoretic approach to estimate the belief of each
possible enemy COA (ECOA) for the following reasons: 1) Decentralization. Each cluster
or team makes decisions based mostly on local information, which gives each group more
autonomy and flexibility; 2) Uncertainty modeling. A Markov Decision Process (MDP) can
effectively model the uncertainties of the noisy military environment; 3) A game model
with three players: red force (enemies), blue force (friendly forces), and white force
(neutral objects). A game framework is an effective model for capturing the nature of
military conflicts: the determination of one side's strategies is tightly coupled to that of
the other side's strategies, and vice versa. Because an asymmetric threat (terrorist) may
act like a neutral or white object (civilian), we also model the actions of white units in
our game framework; 4) Deception [10]. Because an asymmetric enemy may manipulate the
information available to the friendly force, we integrate the concept of deception into our
game approach to model the act of purposely revealing only partial information to
increase the enemy's payoff.
Fig. 1 Stochastic game model for intent inference
The game theoretic intent inference framework is shown in Fig. 1. The game is constructed
from the initial state and evolves according to the transition rule. At each time step k,
blue and red actions are decided according to the sensor data, rules of engagement,
rationality, terrain information, and current state. These actions determine the updated
probability distribution over the state space according to the transition rule, which also
takes terrain information as one of its inputs. Only one state is selected as the next
state. In a real battle, the state is chosen by nature, while in our model it is drawn at
random according to the probabilities of all possible states: the higher the probability
of a state, the more likely it is to be drawn.
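The random draw of the next state can be illustrated with a minimal Python sketch. The function name `draw_next_state`, the state labels, and the probabilities are hypothetical and chosen only for illustration; they are not taken from the simulation software described in this paper.

```python
import random


def draw_next_state(distribution, rng=random):
    """Draw one state from a {state: probability} mapping."""
    states = list(distribution)
    weights = [distribution[s] for s in states]
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("state probabilities must sum to 1")
    # random.choices samples proportionally to the given weights.
    return rng.choices(states, weights=weights, k=1)[0]


# Hypothetical distribution over three candidate next states (illustrative only).
example_distribution = {"s_a": 0.7, "s_b": 0.2, "s_c": 0.1}

rng = random.Random(0)  # fixed seed so the run is repeatable
counts = {state: 0 for state in example_distribution}
for _ in range(10_000):
    counts[draw_next_state(example_distribution, rng)] += 1
print(counts)
```

Over many draws, the empirical frequencies approach the specified probabilities, so the most probable state is selected most often but less probable states still occur.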
By definition, a Markov (stochastic) game is given by (i) a finite set of players $N$; (ii) a
finite set of states $S$; (iii) for every player $i \in N$, a finite set of available actions
$A^i$ (we denote the overall action space $A = \times_{i \in N} A^i$); (iv) a transition rule
$q : S \times A \to \Delta(S)$, where $\Delta(S)$ is the space of all probability
distributions over $S$; and (v) a payoff function $r : S \times A \to \mathbb{R}^N$. For the
intent inference problem, condition (i) is obviously satisfied. Conditions (ii) and (iii)
hold because we assume that the rules of engagement and terrain information are known and
each player has a limited set of COAs given by doctrine, terrain information, etc. For
example, a river will limit the actions of ground forces. Items (iv) and (v) are designed
according to the specific situation, including terrain information. For our threat
prediction problem, we obtain the following discrete-time decentralized Markov game:
Players (Decision Makers) --- Although, in our distributed (decentralized) Markov game
model, each group (cluster, team) makes decisions, there are three main players: enemy,
friendly force, and white objects. All clusters of enemy (friendly force, or white objects)
can be considered as a single player since they have a common objective.
State Space --- All the possible COAs for the enemy and friendly force make up the state
space. An element $s \in S$ is thus a sample of enemy and friendly force COAs composed
of a set of triplets (Resource, Action Verb, Objective). As an example, an enemy COA
might be: the red team 1 (Resource) attacks (Action Verb) the blue team 2 (Objective). In
the context of this paper, it is assumed that, for the enemy COAs, the Resource is always
an adversary entity while the Objective is a friendly asset. Similarly, for the friendly
force COAs, the Resource is a friendly asset and the Objective is an adversary entity.
We write $s = (s^B, s^R, s^W)$ and $S = S^B \times S^R \times S^W$, where $s^B \in S^B$ is
the COAs of the Blue (friendly) force,
$s^B = \{(r_i^B, a_i^B, o_i^B) \mid r_i^B \in R^B,\ a_i^B \in A^B,\ o_i^B \in O^B\}$, and
$R^B$, $A^B$, $O^B$ are the sets of resources, action verbs, and objectives of the blue
force, respectively. Likewise, $s^R \in S^R$ is the COAs of the Red (enemy) force,
$s^R = \{(r_i^R, a_i^R, o_i^R) \mid r_i^R \in R^R,\ a_i^R \in A^R,\ o_i^R \in O^R\}$, and
$s^W \in S^W$ is the COAs of the White force (neutral objects).
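As an illustration of this state structure, the following Python sketch represents a state as three sets of (Resource, Action Verb, Objective) triplets. The class names `Triplet` and `State` and the white-force entry are assumptions made for this example; the blue and red entries reuse examples from the text.

```python
from typing import FrozenSet, NamedTuple


class Triplet(NamedTuple):
    """One COA element: (Resource, Action Verb, Objective)."""
    resource: str
    action_verb: str
    objective: str


class State(NamedTuple):
    """A state s = (s^B, s^R, s^W): one set of triplets per force."""
    blue: FrozenSet[Triplet]
    red: FrozenSet[Triplet]
    white: FrozenSet[Triplet]


state = State(
    blue=frozenset({Triplet("blue team 1", "attack", "red fighter 2")}),
    red=frozenset({Triplet("red team 1", "attack", "blue team 2")}),
    # Hypothetical white-force entry, for illustration only.
    white=frozenset({Triplet("white vehicle 1", "move to", "bridge 1")}),
)

for force, triplets in state._asdict().items():
    for t in sorted(triplets):
        print(f"{force}: {t.resource} -> {t.action_verb} -> {t.objective}")
```

Immutable sets are used so that a state can serve as a dictionary key when probabilities are attached to states.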
Action Space --- At every time step, each blue group chooses a list of targets with
associated actions and confidences (a probability distribution over the list of targets,
i.e., the confidences sum to 1) based on its local battlefield information, such as the
unit types and positions of possible targets, obtained from level 2 data fusion. Let
$D_i^B$ denote the action space of the $i$-th blue team. Each element $d_i^B$ of $D_i^B$ is
defined as
$$d_i^B = \Big\{ (a_i^B, t_i^B, p_i^B) \;\Big|\; a_i^B \in A^B,\ t_i^B \in O^B,\ 0 < p_i^B \le 1,\ \sum p_i^B = 1 \Big\} \tag{1}$$
where $p_i^B$ is the probability of the action-target couple $(a_i^B, t_i^B)$, which is
defined as the action $a_i^B$ applied to target $t_i^B$. Therefore, the action space of the
blue side is $A^1 = \times_{i \in R^B} D_i^B$. As an example, for the blue small weapon UAV 1
in blue team 1, its action might be
$d_1^B$ = {(attack, red fighter 1, 0.3), (fly to, red fighter 2, 0.5), (avoid, red fighter 3, 0.2)}.
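The constraints in Eq. (1) can be checked mechanically. The sketch below is a minimal illustration, with a hypothetical helper name `is_valid_action`, applied to the blue UAV example from the text.

```python
import math


def is_valid_action(d, tol=1e-9):
    """Check Eq. (1): every probability lies in (0, 1] and they sum to 1."""
    probabilities = [p for (_verb, _target, p) in d]
    return (
        bool(probabilities)
        and all(0.0 < p <= 1.0 for p in probabilities)
        and math.isclose(sum(probabilities), 1.0, abs_tol=tol)
    )


# Action of blue small weapon UAV 1, taken from the example in the text.
d1_blue = [
    ("attack", "red fighter 1", 0.3),
    ("fly to", "red fighter 2", 0.5),
    ("avoid", "red fighter 3", 0.2),
]

print(is_valid_action(d1_blue))
print(is_valid_action([("attack", "red fighter 1", 0.6)]))
```

The first call accepts the example action; the second rejects a list whose confidences do not sum to 1.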
Similarly, each red cluster (obtained from level 2 data fusion) determines a probability
distribution over all possible action-target combinations. Let $D_i^R$ denote the action
space of the $i$-th red cluster. Each element $d_i^R$ of $D_i^R$ is defined as
$$d_i^R = \Big\{ (a_i^R, t_i^R, p_i^R) \;\Big|\; a_i^R \in A^R,\ t_i^R \in O^R,\ 0 < p_i^R \le 1,\ \sum p_i^R = 1 \Big\} \tag{2}$$
where $p_i^R$ is the probability of action $a_i^R$ applied to target $t_i^R$. Therefore, the
action space of the red force is $A^2 = \times_{i \in R^R} D_i^R$. A possible action for red
platform 1 (red fighter 1) is $d_1^R$ = {(attack, small weapon UAV 1, 0.6), (move to, blue
soldier 2, 0.2), (avoid, blue soldier 1, 0.2)}.
Remark: Action and Action Verb are different concepts. An Action is a set of triplets with
associated probabilities, while an Action Verb is just one component of the triplet
composed of Resource, Action Verb, and Objective. All Actions are included in $A^1$ for
player 1 (Blue force) and $A^2$ for player 2 (Red force). All Action Verbs are enumerated
in $A^B$ for player 1 (Blue force) and $A^R$ for player 2 (Red force).
The actions of white objects are relatively simple. The main action type is movement. Let
$D_i^W$ denote the action space of the $i$-th white unit. Each element $d_i^W$ of $D_i^W$
is defined as
$$d_i^W = \Big\{ (a_i^W, t_i^W, p_i^W) \;\Big|\; a_i^W \in A^W,\ t_i^W \in O^W,\ 0 < p_i^W \le 1,\ \sum p_i^W = 1 \Big\} \tag{3}$$
where $p_i^W$ is the probability of action $a_i^W$ applied to target $t_i^W$.
Transition rule --- Because of the uncertainty inherent in military environments, we assume
that the states of the Markov game have inertia, so that decision makers are more likely
to keep pursuing the objective of their previous actions. We define an inertia factor
vector $\eta^i = (\eta_1^i, \eta_2^i, \ldots, \eta_{m_i}^i)^T$ for player $i$, where $m_i$ is
the number of teams or clusters of player $i$, and $0 \le \eta_j^i \le 1$ for
$1 \le j \le m_i$. So, for the $j$-th team of the $i$-th player, there is a probability of
$\eta_j^i$ to keep the current action-target couple and a probability of $(1 - \eta_j^i)$ to
use the new action composed of action-target couples.
There are two steps to calculate the probability distribution over the state space $S$,
where $s_k$ and $s_{k+1}$ are the states at time steps $k$ and $k+1$, respectively, and
$a_k^1$, $a_k^2$, $a_k^3$ are the decisions of player 1 (blue or friendly force), player 2
(red force or enemy), and player 3 (white force), respectively, at time step $k$.
•
Step 1: Taking the inertia factor vector $\eta^i$ into account, we combine the current
state with the decisions of the players to obtain fused probability distributions over
all possible action-target couples for each force. To do this, we first decompose the
current state into the action-target couples for each team of each player. Let
$\Psi_j^i(s_k)$ denote the resulting action-target couple related to the $j$-th team of
the $i$-th player. For example, if there is one triplet (blue team 1, attack, red
fighter 2) in the current state $s_k$, then the action-target couple for blue team 1
(the first team of the blue force) is $\Psi_1^1(s_k)$ = (attack, red fighter 2). Second,
for each specified team, say the $j$-th cluster of player 2 (enemy force), we fuse its
action-target couples by modifying the probability of each possible action-target
couple according to the following formula:
$$
p\big((a_j^R, t_j^R) \mid s_k\big) =
\begin{cases}
p_j^R (1 - \eta_j^2), & (a_j^R, t_j^R, p_j^R) \in d_j^R \text{ and } (a_j^R, t_j^R) \notin \{\Psi_j^2(s_k)\} \\
p_j^R (1 - \eta_j^2) + \eta_j^2, & (a_j^R, t_j^R, p_j^R) \in d_j^R \text{ and } (a_j^R, t_j^R) \in \{\Psi_j^2(s_k)\} \\
\eta_j^2, & (a_j^R, t_j^R, p_j^R) \notin d_j^R \text{ and } (a_j^R, t_j^R) \in \{\Psi_j^2(s_k)\} \\
0, & (a_j^R, t_j^R, p_j^R) \notin d_j^R \text{ and } (a_j^R, t_j^R) \notin \{\Psi_j^2(s_k)\}
\end{cases}
\tag{4}
$$
There are four cases in Eq. (4): 1) The action-target couple $(a_j^R, t_j^R)$ occurs
only in the current action of the $j$-th cluster of player 2 and is not in the current
state $s_k$, which can be represented mathematically by
$(a_j^R, t_j^R, p_j^R) \in d_j^R$ and $(a_j^R, t_j^R) \notin \{\Psi_j^2(s_k)\}$. Then the
probability of $(a_j^R, t_j^R)$ in the current state $s_k$ is 0 and its probability in the
current action is $p_j^R$. So, according to the definition of inertia, the fused
probability of the action-target couple $(a_j^R, t_j^R)$ is
$p_j^R (1 - \eta_j^2) + 0 \cdot \eta_j^2 = p_j^R (1 - \eta_j^2)$. 2) The action-target
couple $(a_j^R, t_j^R)$ occurs both in the current action of the $j$-th cluster of
player 2 and in the current state $s_k$. Then the probability of $(a_j^R, t_j^R)$ in the
current state $s_k$ is 1 and its probability in the current action is $p_j^R$. So the
fused probability of the action-target couple $(a_j^R, t_j^R)$ is
$p_j^R (1 - \eta_j^2) + 1 \cdot \eta_j^2 = p_j^R (1 - \eta_j^2) + \eta_j^2$. 3) The
action-target couple $(a_j^R, t_j^R)$ occurs only in the current state $s_k$. Then the
probability of $(a_j^R, t_j^R)$ in the current state $s_k$ is 1 and its probability in the
current action is 0. So the fused probability of the action-target couple
$(a_j^R, t_j^R)$ is $0 \cdot (1 - \eta_j^2) + 1 \cdot \eta_j^2 = \eta_j^2$. 4) The
action-target couple $(a_j^R, t_j^R)$ occurs neither in the current state $s_k$ nor in
the current action of the $j$-th cluster of player 2. Then the probability of
$(a_j^R, t_j^R)$ in the current state $s_k$ is 0 and its probability in the current
action is 0. So the fused probability of the action-target couple $(a_j^R, t_j^R)$ is
$0 \cdot (1 - \eta_j^2) + 0 \cdot \eta_j^2 = 0$.
Similarly, the new probability distribution for the $j$-th team of player 1 (blue
force) is
$$
p\big((a_j^B, t_j^B) \mid s_k\big) =
\begin{cases}
p_j^B (1 - \eta_j^1), & (a_j^B, t_j^B, p_j^B) \in d_j^B \text{ and } (a_j^B, t_j^B) \notin \{\Psi_j^1(s_k)\} \\
p_j^B (1 - \eta_j^1) + \eta_j^1, & (a_j^B, t_j^B, p_j^B) \in d_j^B \text{ and } (a_j^B, t_j^B) \in \{\Psi_j^1(s_k)\} \\
\eta_j^1, & (a_j^B, t_j^B, p_j^B) \notin d_j^B \text{ and } (a_j^B, t_j^B) \in \{\Psi_j^1(s_k)\} \\
0, & (a_j^B, t_j^B, p_j^B) \notin d_j^B \text{ and } (a_j^B, t_j^B) \notin \{\Psi_j^1(s_k)\}
\end{cases}
\tag{5}
$$
The new probability distribution for the $j$-th team of player 3 (white force) is
$$
p\big((a_j^W, t_j^W) \mid s_k\big) =
\begin{cases}
p_j^W (1 - \eta_j^3), & (a_j^W, t_j^W, p_j^W) \in d_j^W \text{ and } (a_j^W, t_j^W) \notin \{\Psi_j^3(s_k)\} \\
p_j^W (1 - \eta_j^3) + \eta_j^3, & (a_j^W, t_j^W, p_j^W) \in d_j^W \text{ and } (a_j^W, t_j^W) \in \{\Psi_j^3(s_k)\} \\
\eta_j^3, & (a_j^W, t_j^W, p_j^W) \notin d_j^W \text{ and } (a_j^W, t_j^W) \in \{\Psi_j^3(s_k)\} \\
0, & (a_j^W, t_j^W, p_j^W) \notin d_j^W \text{ and } (a_j^W, t_j^W) \notin \{\Psi_j^3(s_k)\}
\end{cases}
\tag{6}
$$
•
Step 2: We determine the probability distribution over all possible outcomes of state
$s_{k+1}$:
$$
q(s_{k+1} \mid s_k, a_k^1, a_k^2, a_k^3) =
\prod_{i=1}^{m_1} p\big((a_i^B, t_i^B) \mid s_k\big)
\prod_{j=1}^{m_2} p\big((a_j^R, t_j^R) \mid s_k\big)
\prod_{j=1}^{m_3} p\big((a_j^W, t_j^W) \mid s_k\big)
\tag{7}
$$
when
$s_{k+1} = \bigcup_{i=1}^{m_1} \{(r_i^B, a_i^B, t_i^B)\} \cup \bigcup_{j=1}^{m_2} \{(r_j^R, a_j^R, t_j^R)\} \cup \bigcup_{j=1}^{m_3} \{(r_j^W, a_j^W, t_j^W)\}$.
Otherwise, $q(s_{k+1} \mid s_k, a_k^1, a_k^2, a_k^3) = 0$.
Here $m_1$ is the number of teams or clusters of player 1 (blue force), $m_2$ is the
number of teams or groups of player 2 (red force), and $m_3$ is the number of units of
player 3 (white force). $\{(r_i^B, a_i^B, t_i^B)\}$ is the set of all possible (with
positive probability) triplets for the $i$-th team of player 1 (blue). Therefore
$\bigcup_{i=1}^{m_1} \{(r_i^B, a_i^B, t_i^B)\}$ contains all the possible (with positive
probability) triplets for the blue force. From Step 1, we know that the fused
probability of each specified $(a_j^B, t_j^B)$ is $p((a_j^B, t_j^B) \mid s_k)$, defined
in Eq. (5). Under the assumption that all teams of the blue force are independent, we
obtain the overall probability of the blue force,
$\prod_{i=1}^{m_1} p((a_i^B, t_i^B) \mid s_k)$. Similarly,
$\prod_{j=1}^{m_2} p((a_j^R, t_j^R) \mid s_k)$ and
$\prod_{j=1}^{m_3} p((a_j^W, t_j^W) \mid s_k)$ are the overall probabilities of the enemy
and white force, respectively. So the probability distribution over all possible
outcomes of state $s_{k+1}$ (composed of all possible sub-states of the blue, red, and
white forces) can be calculated via Eq. (7).
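The two steps of the transition rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in our software: the function names `fuse` and `transition`, the dictionary-based data layout, and the inertia values 0.5 and 0.4 are all hypothetical, and every team is assumed to hold exactly one action-target couple in the current state. The decisions are the blue and red examples given in the Action Space description.

```python
from itertools import product


def fuse(decision, current_couple, eta):
    """Step 1, Eqs. (4)-(6): blend a team's decision with its current couple.

    decision:       {(action_verb, target): probability}, summing to 1
    current_couple: (action_verb, target) the team holds in state s_k
    eta:            the team's inertia factor, 0 <= eta <= 1
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("inertia factor must lie in [0, 1]")
    # Couples in the decision are scaled by (1 - eta): cases 1 and 2.
    fused = {couple: p * (1.0 - eta) for couple, p in decision.items()}
    # The couple already in the state receives the inertia mass eta: cases 2 and 3.
    fused[current_couple] = fused.get(current_couple, 0.0) + eta
    # Case 4 (in neither) stays at probability 0 and is simply not stored.
    return {couple: p for couple, p in fused.items() if p > 0.0}


def transition(fused_by_team):
    """Step 2, Eq. (7): joint next-state distribution for independent teams.

    fused_by_team: {team: fused distribution returned by fuse()}
    Returns {frozenset of (team, action_verb, target): probability}.
    """
    teams = sorted(fused_by_team)
    joint = {}
    for combo in product(*(fused_by_team[team].items() for team in teams)):
        probability = 1.0
        triplets = []
        for team, ((verb, target), p) in zip(teams, combo):
            probability *= p
            triplets.append((team, verb, target))
        joint[frozenset(triplets)] = probability
    return joint


# Decisions taken from the examples in the text; the eta values are assumed.
blue_decision = {("attack", "red fighter 1"): 0.3,
                 ("fly to", "red fighter 2"): 0.5,
                 ("avoid", "red fighter 3"): 0.2}
red_decision = {("attack", "small weapon UAV 1"): 0.6,
                ("move to", "blue soldier 2"): 0.2,
                ("avoid", "blue soldier 1"): 0.2}

blue_fused = fuse(blue_decision, ("attack", "red fighter 2"), eta=0.5)
red_fused = fuse(red_decision, ("attack", "small weapon UAV 1"), eta=0.4)
joint = transition({"blue team 1": blue_fused, "red cluster 1": red_fused})

print(blue_fused)
print(red_fused)
print(len(joint), sum(joint.values()))
```

In this example the blue team's current couple is not part of its new decision (case 3), so it receives exactly the inertia mass, while the red cluster's current couple also appears in its decision (case 2). Because each fused distribution sums to 1, the product over independent teams is again a probability distribution over next states.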
Payoff Functions --- In our proposed decentralized Markov game model, there are two
levels of payoff function for each player (enemy, friendly force, or white force).
•
The lower level payoff functions are used by each team, cluster, or unit to
determine the team actions based on local information. For the $j$-th team of the
blue force, the payoff function is defined as $f_j^B(\tilde{s}_j^B, d_j^B, W_k^B)$,
where $\tilde{s}_j^B \subseteq s$ is the local information obtained by the team, and
$W_k^B$, the weights for all possible action-target couples of the blue force, is
announced to all blue teams and determined according to the top level payoff
functions by the supervisor of the Blue force:
$$
f_j^B(\tilde{s}_j^B, d_j^B, W_k^B) =
\sum_{(a_i^B, t_i^B, p_i^B) \in d_j^B} w^B(j, a_i^B, t_i^B, W_k^B)\, p_i^B\, g^B(j, a_i^B, t_i^B, \tilde{s}_j^B)
\tag{8}
$$
where $w^B(j, a_i^B, t_i^B, W_k^B)$ calculates the weight of any specified
action-target couple for the $j$-th team of the blue force from $W_k^B$, $p_i^B$ is
the probability of the action-target couple $(a_i^B, t_i^B)$, and
$g^B(j, a_i^B, t_i^B, \tilde{s}_j^B)$ determines the gain from the action-target
couple $(a_i^B, t_i^B)$ for the $j$-th team of the blue force according to the
positions and features, such as platform values and defense/offense capability, of
the blue and targeted platforms. Similarly, we obtain the lower level payoff
functions for the $j$-th team of the red and white forces:
$$
f_j^R(\tilde{s}_j^R, d_j^R, W_k^R) =
\sum_{(a_i^R, t_i^R, p_i^R) \in d_j^R} w^R(j, a_i^R, t_i^R, W_k^R)\, p_i^R\, g^R(j, a_i^R, t_i^R, \tilde{s}_j^R)
\tag{9}
$$
$$
f_j^W(\tilde{s}_j^W, d_j^W, W_k^W) =
\sum_{(a_i^W, t_i^W, p_i^W) \in d_j^W} w^W(j, a_i^W, t_i^W, W_k^W)\, p_i^W\, g^W(j, a_i^W, t_i^W, \tilde{s}_j^W)
\tag{10}
$$
Remark 1: For some asymmetric threats, such as suicide bombers, the payoff
functions may only consider the loss of the blue side. For camouflaged and
concealed entities, the objectives are to hide themselves and move close to the
blue units. Other deceptive units will make some irrational movements to hide their
true goals at the cost of time.
Remark 2: The white units only care about their possible losses. For example,
when a dangerous spot is detected, normal white units will find a COA that keeps
them as far as possible from the spot.
•
The top level payoff functions are used to evaluate the overall performance of
each player:
$$J^B = \sum_k \sum_{j=1}^{m_1} f_j^B(\tilde{s}_j^B, d_j^B, W_k^B) \tag{11}$$
$$J^R = \sum_k \sum_{j=1}^{m_2} f_j^R(\tilde{s}_j^R, d_j^R, W_k^R) \tag{12}$$
$$J^W = \sum_k \sum_{j=1}^{m_3} f_j^W(\tilde{s}_j^W, d_j^W, W_k^W) \tag{13}$$
where $k$ is the time index. In our approach, the lower level payoffs are calculated
in a distributed manner and sent back to the commander/supervisor via communication
networks.
Remark 3: Since the gain functions $g^B(j, a_i^B, t_i^B, \tilde{s}_j^B)$ for the blue
force, $g^R(j, a_i^R, t_i^R, \tilde{s}_j^R)$ for the red force, and
$g^W(j, a_i^W, t_i^W, \tilde{s}_j^W)$ for the white force are different functions,
asymmetric force and cost utilities can be represented straightforwardly in our
model. In addition, after an irregular adversary is detected, a different type of
gain function will be assigned dynamically.
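The two payoff levels can be sketched in Python as follows. This is a minimal illustration of the form of Eqs. (8) and (11) only. The function names, the default weight of 1.0 for unlisted couples, the gain function `example_gain`, and all numerical values are assumptions made for this example; the actual weight and gain functions depend on positions and platform features as described above.

```python
def team_payoff(decision, weights, gain, local_info):
    """Lower-level payoff, Eq. (8): sum over couples of weight * probability * gain.

    decision: iterable of (action_verb, target, probability) triplets
    weights:  {(action_verb, target): weight}; missing couples default to 1.0
    gain:     callable (action_verb, target, local_info) -> float
    """
    return sum(
        weights.get((verb, target), 1.0) * p * gain(verb, target, local_info)
        for verb, target, p in decision
    )


def top_level_payoff(history, gain):
    """Top-level payoff, Eq. (11): sum of team payoffs over teams and time steps.

    history: list over time steps k of {team: (decision, weights, local_info)}
    """
    return sum(
        team_payoff(decision, weights, gain, local_info)
        for step in history
        for decision, weights, local_info in step.values()
    )


def example_gain(verb, target, local_info):
    """Hypothetical gain: full platform value for an attack, 10% otherwise."""
    value = local_info["platform_value"].get(target, 0.0)
    return value if verb == "attack" else 0.1 * value


d1_blue = [("attack", "red fighter 1", 0.3),
           ("fly to", "red fighter 2", 0.5),
           ("avoid", "red fighter 3", 0.2)]
weights_k = {("attack", "red fighter 1"): 2.0}          # assumed supervisor weights
info = {"platform_value": {"red fighter 1": 10.0,
                           "red fighter 2": 10.0,
                           "red fighter 3": 10.0}}      # assumed local information

one_step = team_payoff(d1_blue, weights_k, example_gain, info)
two_steps = top_level_payoff([{"blue team 1": (d1_blue, weights_k, info)}] * 2,
                             example_gain)
print(one_step, two_steps)
```

Passing the gain as a callable mirrors Remark 3: a different gain function can be substituted for a force, or swapped at run time once an irregular adversary is detected, without changing the payoff computation itself.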
The strategies --- In this paper, we try several well known types of strategies.
•
Min-max strategies [5]. This kind of strategy gives a conservative solution that
minimizes the maximum possible "loss". In our problem, it is actually a max-min
solution in the sense that each player maximizes his minimum possible payoff. This
kind of strategy is therefore also called the safest solution.
•
Pure Nash Strategies with finite horizon. In game theory, the Nash equilibrium
(named after John Nash [6], who proposed it) is a kind of optimal collective
strategy in a game involving two or more players, where no player has anything
to gain by changing only his or her own strategy. If each player has chosen a
strategy and no player can benefit by changing his or her strategy while the other
players keep theirs unchanged, then the current set of strategy choices and the
corresponding payoffs constitute a Nash equilibrium. In our approach, we use a
game search tree (shown in Fig. 2) to find the solution. The solution to the
Markov game tree is obtained via a K time-step look-ahead approach, in which we
only optimize the solution over the K time-step horizon. This suboptimal
technique has been used successfully for reasoning in games such as chess,
backgammon, and Monopoly.
Fig. 2 A game tree search approach to find pure Nash strategies with finite planning
horizon (moving planning window)
•
Mixed Nash Strategies. A mixed strategy is used in game theory to describe a
strategy comprised of possible actions and an associated probability for each,
which corresponds to how frequently the action is chosen. Mixed strategy Nash
equilibria are equilibria where at least one player is playing a mixed strategy.
Nash proved that every finite game has at least one Nash equilibrium, but not
every finite game has a pure strategy Nash equilibrium.
•
Correlated Equilibria [7]. Unlike Nash equilibria, which are formulated in terms
of independent strategies, correlated equilibria were developed from correlated
strategies in noncooperative games. A correlated equilibrium of a Markov game
describes a solution for playing a dynamic game in which players are able to
communicate but are self-interested. A correlation device generates signals and
announces them privately to each decision maker, and the players choose their
actions according to the private signals they receive. There are two types of
correlation devices: autonomous and stationary. An autonomous correlation device
is a pair $\mathcal{D} = ((M_n^i)_{i \in N,\, n \ge 1}, (d_n)_{n \ge 1})$, where (i)
$M_n^i$ is a finite set of signals for player $i$ at time step $n$, and (ii)
$d_n : M(n) \to \Delta(M_n)$, with $M_n = \times_{i \in N} M_n^i$ and
$M(n) = M_1 \times M_2 \times \cdots \times M_{n-1}$. A stationary correlation
device is a pair $\mathcal{D} = (M, d)$, where $d \in \Delta(M)$ and
$M = \times_{i \in N} M^i$. A stationary correlation device is in fact a special
case of an autonomous correlation device in which $M_n^i$ is independent of $n$ and
$d_n$ is a constant function that is independent of $n$. Given a correlation device
$\mathcal{D}$, we define an extended game $G(\mathcal{D})$. The game
$G(\mathcal{D})$ is played exactly as the original game, but at the beginning of
each stage $n$, a signal combination $m_n = (m_n^i)_{i \in N}$ is drawn according to
the probability function $d_n(m_1, m_2, \ldots, m_{n-1})$ and each player $i$ is
informed of $m_n^i$. Each decision maker must then base his choice of actions on
the received signal. Any deviator is punished by being held to his min-max value.
The punishment only occurs if a player disobeys the recommendation of the device.
It was proved in [7] that every Markov game with an autonomous correlation device
admits a correlated equilibrium.
•
Sequential Nash Strategies [8]. A sequential game is one in which players choose
their strategies following a certain predefined order, and in which at least some
players can observe the moves of the players who decide before them. The
sequential game is a natural framework for addressing some real problems, such as
the Action-Reaction-Counteraction paradigm used in military intelligence and the
advertising campaign strategies of several competing firms in economics. In our
approach, we use a turn-by-turn scheme shown in Fig. 3. At each step k, the
control strategy from only one player, say, player 1 (blue force), is applied and
the corresponding outcome is observed by player 2 (enemy force) before it decides
its next action. This is helpful in estimating the opponent's intent because only
one action is applied at a time and the state change resulting from that action
is observed.
Fig. 3 Two-player Sequential Game
•
Leader-Follower Strategies [9]. Taking limited and imperfect communication into
account, we use the Stackelberg concept to model the cooperation between the
commander and the local teams. The commander (called the leader) declares
incentives to the local teams (called followers) in order to induce them to accept
his desired system behavior as the common desired behavior. The Leader-Follower
strategy is useful in our clearly defined hierarchical systems, which have
asymmetric information structures (in our case the leader, or commander, knows the
cost functional of every decision maker, or local team, while the followers know
only their own).
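Two of the strategy types listed above, the max-min (safest) strategy and pure Nash equilibria, can be illustrated on small one-shot matrix games. The Python sketch below is a simplified illustration only: it does not perform the K time-step tree search of Fig. 2, the function names are hypothetical, and both payoff tables are assumed values. The first table is a zero-sum "two bridges" game in which blue picks a bridge to attack and red picks a bridge to defend.

```python
def maxmin_pure(payoff):
    """Row player's pure max-min strategy: maximize the worst-case payoff."""
    worst = [min(row) for row in payoff]
    best_row = max(range(len(payoff)), key=lambda i: worst[i])
    return best_row, worst[best_row]


def pure_nash(payoff_row, payoff_col):
    """All pure-strategy Nash equilibria (i, j) of a two-player bimatrix game."""
    n_rows, n_cols = len(payoff_row), len(payoff_row[0])
    equilibria = []
    for i in range(n_rows):
        for j in range(n_cols):
            # Neither player may gain by deviating unilaterally.
            row_ok = all(payoff_row[i][j] >= payoff_row[k][j] for k in range(n_rows))
            col_ok = all(payoff_col[i][j] >= payoff_col[i][l] for l in range(n_cols))
            if row_ok and col_ok:
                equilibria.append((i, j))
    return equilibria


# Hypothetical zero-sum game: rows = bridge blue attacks, columns = bridge red
# defends. Blue gains 1 by attacking the undefended bridge and loses 1 otherwise.
blue_bridges = [[-1, 1],
                [1, -1]]
red_bridges = [[-v for v in row] for row in blue_bridges]

# Hypothetical general-sum game with two pure equilibria.
blue_general = [[3, 0],
                [2, 1]]
red_general = [[3, 2],
               [0, 1]]

print(maxmin_pure(blue_bridges))
print(pure_nash(blue_bridges, red_bridges))
print(maxmin_pure(blue_general))
print(pure_nash(blue_general, red_general))
```

The bridge game has no pure Nash equilibrium, which is the situation in which mixed strategies or correlated equilibria become relevant, whereas the second game has two pure equilibria and a max-min strategy that guarantees the row player a payoff of 1.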
To reason efficiently about the enemy's intent or COAs, we divide our approach into two
phases: a training phase and a reasoning phase. In the training phase, we measure or
observe the enemy's actions and compare them with the actions generated by our model. The
results are used to tune the transition rules. In the reasoning phase, we fix the
transition rules and use the generated red actions as the intent of the enemy.
Deception [10] is used to make the other player act to our advantage by making it
believe that the game is in a state other than the actual one. It is only possible in
games with partial information. We propose an equilibrium approach to deception, where
deception is defined as the process by which actions are chosen to induce erroneous
inferences so as to take advantage of them. This framework differs from the earlier
literature on multi-stage games with incomplete information in whether a player is
assumed to have a perfect understanding of the strategy employed by his opponent. We
introduce two elements to describe deception: cognitive types and the strategic
environment.
Cognitive types are defined as follows. Each player $i$ forms an expectation about the
behavior of the other player by pooling together several nodes in which the other player
must move. Each such pool of nodes is referred to as a class of analogy. Players are also
differentiated according to whether or not they distinguish between the behaviors of the
various types of their opponent. Formally, a cognitive type of player $i$ is characterized
by $(An_i, d_i)$, where $An_i$ stands for player $i$'s analogy partition and $d_i$ is a
dummy variable that specifies whether or not type $t_i$ distinguishes between the
behaviors of the various types $t_j$ of player $j$. We let $d_i = 1$ when type $t_i$
distinguishes between the behaviors of types $t_j$ and $d_i = 0$ otherwise.
A strategic environment is described by $(Y, u_i, p)$, where $p$ denotes the prior joint
distribution on the type space $Q = Q_1 \times Q_2$. To simplify notation, we assume that
the types of the two players are distributed independently of each other, and we refer
to $p_i = (p_{t_i})_{t_i}$ as the prior probability of player $i$'s type, where $p_{t_i}$
denotes the prior probability of type $t_i$.
3. Simulations and Experiments
For the simulation, we built a virtual battlespace and a typical urban scenario based on
the ontology concept. An ontology is an explicit, formal, machine-readable semantic model
that defines the classes (or concepts) and their possible inter-relations specific to some
specified domain. To evaluate our data fusion approach, we implemented and tested our
battlespace, scenario, and algorithms on the MICA Open Experimental Platform (OEP)
based on the Boeing C4Isim simulation, which models the collection, processing, and
dissemination of battlefield information.
We used the scenario shown in Fig. 4 to demonstrate the performance of our proposed
threat prediction and situation awareness algorithm. In this urban environment, the
blue force's mission is to capture and secure two bridges that are guarded by the red
force. The two bridge locations are well connected via wide roads, highlighted by dashed
lines. The red force includes armed vehicles, fighters, and asymmetric forces hiding
among and acting like the white objects (civilians and vehicles). The blue force consists
of a few fighters with close air support provided by several unmanned aerial vehicles
(UAVs), such as small sensor UAVs and small weapon UAVs, which also perform searching
and fighting tasks if needed. We assume that the total offense force and the total
defense force are at almost the same level, so neither side is dominant. The red force has
several choices for guarding the bridges. It can deploy all red units to protect one
location, but then the blue force can capture the other location first. The blue force
faces the same dilemma. So the main challenge for both sides is to understand the
situation from the fused sensor data and predict the intent of the opponent under the
"believed" war situation.
Fig. 4: A Simulated Scenario
For this scenario, in a specific simulation run (correlated equilibrium method) shown
in Fig. 5, Blue Team 1 and Blue Team 3 were assigned to capture Bridge 1 and Bridge 2,
respectively, for almost the whole simulation period of 30 minutes. On the other side,
Red Team 1 and Red Team 3 were guarding Bridge 1 and Bridge 2 almost all the time.
Blue Team 2 was moving strategically based on the threat prediction from the Markov
game. At the same time, the Red force was dynamically deploying Red Team 2 and
trying to keep a balance between the Blue force and the Red force at each bridge. As
shown in the movie of the demo, after about 15 minutes of movement, the Blue force
achieves domination at Bridge 1 with 2 Blue teams (Blue Team 1 and Blue Team 3)
attacking 1 red team (Red Team 1). The Blue force captures and secures Bridge 1 before
Red Team 3 reaches Bridge 1 to help Red Team 1. During this Phase I battle, our
algorithm detected two asymmetric adversaries using deception (a person and a vehicle)
that were hidden in a large volume of harmless background civilian activity. After the
capture of Bridge 1, the Blue side deployed part of the remaining force of Team 1 and
Team 3 to secure the bridge and the rest to help Team 2 capture Bridge 2. Finally, the
Blue side won the urban battle at a cost of 5 Blue soldiers and 6 small weapon UAVs.
Another hidden asymmetric threat using deception (a terrorist) was detected and killed
in the Phase II battle to capture Bridge 2.
Fig. 5: Result of a simulation run
In addition to the run described above, we performed many experiments comparing the
following options: without game theoretic fusion (without levels 2 and 3); without
asymmetric threat prediction (with level 2, but the payoff function of the game model
at level 3 does not change dynamically); the game approach with a mixed Nash strategy;
and the game approach with correlated equilibria. Because the simulation is stochastic,
the results are the mean of 10 runs for each case. They are shown in Fig. 6 (only the
damage information of the Blue side is shown). The damage comparison shows that our
proposed Markov game framework with correlated equilibrium and deception consideration
for threat detection and situation awareness performs better than the other options in
terms of Blue-side damage.
Fig. 6: Damage comparison of various options
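To make the distinction between the mixed Nash and correlated equilibrium options
concrete, the following minimal sketch checks the correlated equilibrium conditions on
a small two-player game. It is not the paper's three-player Markov game: the payoff
tables U1 and U2 are the textbook "chicken" game, and the names
is_correlated_equilibrium and expected_payoffs are hypothetical helpers introduced
only for this illustration.

```python
from fractions import Fraction

# Hypothetical textbook payoffs (NOT from the paper's simulation).
# Actions: 0 = hold back, 1 = press. U1[i][j] and U2[i][j] are the payoffs
# to the row and column player when row plays i and column plays j.
U1 = [[6, 2], [7, 0]]
U2 = [[6, 7], [2, 0]]


def is_correlated_equilibrium(dist, u1, u2):
    """dist[i][j] is the probability that the pair (i, j) is recommended.

    The distribution is a correlated equilibrium if no player can gain by
    deviating from any recommended action, given what that recommendation
    reveals about the other player's recommendation.
    """
    n, m = len(u1), len(u1[0])
    for i in range(n):          # row player is told to play i
        for k in range(n):      # and considers playing k instead
            gain = sum(dist[i][j] * (u1[k][j] - u1[i][j]) for j in range(m))
            if gain > 0:
                return False
    for j in range(m):          # column player is told to play j
        for k in range(m):      # and considers playing k instead
            gain = sum(dist[i][j] * (u2[i][k] - u2[i][j]) for i in range(n))
            if gain > 0:
                return False
    return True


def expected_payoffs(dist, u1, u2):
    """Expected payoff of each player under the joint distribution dist."""
    cells = [(i, j) for i in range(len(u1)) for j in range(len(u1[0]))]
    return (sum(dist[i][j] * u1[i][j] for i, j in cells),
            sum(dist[i][j] * u2[i][j] for i, j in cells))


third = Fraction(1, 3)
correlated = [[third, third], [third, 0]]   # never recommends (press, press)

p = Fraction(2, 3)                          # mixed Nash: hold back w.p. 2/3
mixed_nash = [[p * p, p * (1 - p)],
              [(1 - p) * p, (1 - p) * (1 - p)]]

for name, dist in [("correlated", correlated), ("mixed Nash", mixed_nash)]:
    ok = is_correlated_equilibrium(dist, U1, U2)
    print(name, ok, expected_payoffs(dist, U1, U2))
```

In this assumed game both distributions satisfy the equilibrium conditions, since every
Nash equilibrium is also a correlated equilibrium, but the correlated distribution
gives each player an expected payoff of 5 compared with 14/3 under the mixed Nash
strategy. This only illustrates why a correlated equilibrium can outperform a mixed
Nash strategy in principle; it says nothing about the magnitudes reported in Fig. 6.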
4. Conclusions
Game theoretic tools have potential for threat prediction that takes real uncertainties
in Red plans and deception possibilities into consideration. In this paper, we have
evaluated the feasibility and effectiveness of the proposed adversary intent inference
algorithms through extensive simulations. The simulations verified that the proposed
algorithms are scalable, stable, and perform satisfactorily.
Acknowledgements
This research was supported by the US Navy under contract number N68335-06-C-0105.
The views and conclusions contained herein are those of the authors and should not be
interpreted as necessarily representing the official policies or endorsements, either
expressed or implied, of the Navy.
References
[1] J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior,
Princeton University Press, 1944.
[2] E. Solan and N. Vieille, “Correlated Equilibrium in Stochastic Games”, Games and
Economic Behavior, vol. 38, no. 2, pp. 362-399, Feb. 2002.
[3] North Atlantic Treaty Organization, “The Land C2 Information Exchange Data
Model”, ATCCIS Baseline 2.0, March 2002.
[4] User Guide for the Open Experimental Platform (OEP), version 1.3, Boeing, Mar.
2003.
[5] T. Basar and G. J. Olsder, Dynamic Noncooperative Game Theory, SIAM Series in
Classics in Applied Mathematics, Philadelphia, second edition, January 1999.
[6] John Nash, “Noncooperative games”, Annals of Mathematics, vol. 54, pp. 286-295,
1951.
[7] E. Solan and N. Vieille, “Correlated Equilibrium in Stochastic Games”, Games and
Economic Behavior, vol. 38, no. 2, pp. 362-399, Feb. 2002.
[8] D. M. Kreps and R. Wilson, “Sequential equilibria”, Econometrica, vol. 50, no. 4,
pp. 863-894, July 1982.
[9] M. Simaan and J. B. Cruz, Jr., “On the Stackelberg Strategy in Nonzero-Sum
Games”, Journal of Optimization Theory and Applications, Vol. 11, No. 5, May
1973, pp. 533-555.
[10] J. P. Hespanha, Y. S. Ateskan, and H. H. Kizilocak, “Deception in Non-Cooperative
Games with Partial Information”, In Proc. of the 2nd DARPA-JFACC Symposium on
Advances in Enterprise Control, July 2000.
[11] Cruz, J. B., Jr., Simaan, M. A., Gacic, A., Jiang, H., Letellier, B., Li, M., and Liu, Y.,
“Game-Theoretic Modeling and Control of a Military Air Operation”, IEEE
Transactions on Aerospace and Electronic Systems, Vol. 37, No. 4, October, 2001.
[12] Cruz, J. B., Jr., Simaan, M. A., Gacic, A., Liu, Y., “Moving horizon Nash strategies
for a military air operation”, IEEE Transactions on Aerospace and Electronic
Systems, 38(3), 2002.
[13] Cruz, J. B., Jr., Chen, G., Garagic D., Tan, X., Li, D., Shen, D., Wei, M., Wang, X.,
“Team dynamics and tactics for mission planning,” Proceedings, IEEE Conference
on Decision and Control, pp. 3579-3584, December 2003.
[14] D. Shen, J. B. Cruz, G. Chen, and C. Kwan, S. P. Riddle, S. Cox, and C. Matthews,
“A Game Theoretic Approach to Mission Planning For Multiple Aerial Platforms”,
AIAA infotech@aerospace, Sept. 26-29, 2005, Arlington, VA.
[15] G. Chen, D. Shen, J. B. Cruz, C. Kwan, S. Cox, S. P. Riddle, and C. Matthews, “A
novel cooperative path planning for multiple aerospace vehicles”, Proceedings of
AIAA infotech@aerospace, Sept. 26-29, 2005, Arlington, VA.