2006 CCRTS: The State of the Art and the State of the Practice

A Game Theoretic Approach to Threat Intent Inference

Assigned Track: C2 Modeling and Simulation
Assigned Paper Number: C-076

Dan Shen*, The Ohio State University, 205 Dreese Laboratory, 2015 Neil Ave, Columbus, OH 43202, Ph: (614) 292-2572, Email: shen.100@osu.edu

Genshe Chen (principal point of contact), Intelligent Automation, Inc., 15400 Calhoun Drive, Suite 400, Rockville, MD 20874, Tel: (301) 294-5218 (direct), Fax: (301) 294-5201, Email: gchen@i-a-i.com

Jose B. Cruz, Jr., The Ohio State University, 205 Dreese Laboratory, 2015 Neil Ave, Columbus, OH 43202, Ph: (614) 292-1588, Email: cruz.22@osu.edu

Chiman Kwan, Intelligent Automation, Inc., 15400 Calhoun Drive, Suite 400, Rockville, MD 20874, Tel: (301) 294-5238 (direct), Fax: (301) 294-5201, Email: ckwan@i-a-i.com

ABSTRACT

In the adversarial military environment, it is important to efficiently and promptly predict the enemy's tactical intent from lower level spatial and temporal information. In this paper, we propose a decentralized Markov game (MG) theoretic approach to estimate the belief of each possible enemy COA (ECOA); these beliefs are used to model adversary intent. The approach has the following advantages: 1) Decentralization. Each cluster or team makes decisions mostly based on local information; each group is given more autonomy, allowing for more flexibility. 2) The Markov decision process (MDP) foundation can effectively model the uncertainties of the noisy military environment. 3) The game model has three players: red force (enemies), blue force (friendly forces), and white force (neutral objects). 4) Deception. Since an asymmetric enemy may manipulate the information available to the friendly force, we integrate the deception concept into our game approach to model the action of purposely revealing partial information to increase the enemy's payoff. A simulation software package has been developed with connectivity to the Boeing OEP (Open Experimental Platform) to demonstrate the performance of our proposed algorithms. Simulations have verified that our proposed algorithms are scalable, stable, and satisfactory in performance.

1. Introduction

Game theory provides a framework for modeling and analyzing interactions between intelligent and rational decision makers, or players, in conflict situations in which no individual decision maker is in complete control of the other decision units in the environment. The idea of games can be traced back to the Babylonian Talmud, the compilation of ancient law and tradition set down during the first five centuries A.D. However, it was not until 1944 that the mathematical theory of games was invented by John von Neumann and Oskar Morgenstern [1]. Mathematically, game theory is used to study strategic situations in which players choose different actions in an attempt to maximize their returns, which depend upon the choices of other individuals. To make optimal moves in multi-agent systems, the strategies of other agents should be taken into account, and therefore it is essential to be able to model the behavior of the opponents. In the adversarial military environment, it is important to efficiently and promptly predict the enemy's or adversary's tactical intent from lower level spatial (terrain) and temporal information.
Standard AI tools for solving decision-making problems in complex situations, such as dynamic decision networks and influence diagrams, are not applicable to these kinds of situations. Game theory, on the other hand, provides a mathematical framework designed for the analysis of agent interaction under the assumption of rationality, in which one tries to identify the game equilibria as opposed to applying traditional utility maximization principles. A game component in multi-agent decision-making thus uses rationality as a tool to predict the behavior of the other agents [11-15]. In this paper, the focus is on the application of Markov games [2], the multi-agent extension of Markov decision processes (MDPs), to the estimation of enemy courses of action (COAs) [3], which approximately model the intent of targets. By successfully assessing possible future threats from the adversaries, decision makers can make more effective targeting decisions, plan friendly COAs, mitigate the impact of unexpected adversary actions, and direct sensing systems to better observe adversary behaviors.

We have achieved the following important results. First, we developed a highly innovative framework for threat intent prediction in an urban warfare setting based on Markov (stochastic) game theory. It consists of three closely coupled activities: 1) processing and integrating information from disparate sources to produce an integrated object state; 2) reasoning about and grouping cooperative objects that perform common tasks; and 3) predicting the intentions and COAs of asymmetric threats. Second, we have implemented an adversary Markov game model with three players: red force (enemies), blue force (friendly forces), and white force (neutral objects). This is a significant extension of existing game theoretic tools for modeling and control of military air operations, which do not explicitly consider the neutral force (or civilians) as an intelligent player [13]. Inherent information imperfection is considered and implemented in two ways: 1) a decentralized decision making scheme; and 2) deception with bounded rationality. Third, a software prototype has been developed with connectivity to the MICA (Mixed Initiative Control of Automa-Teams) Open Experimental Platform [4] (an ontology-based virtual battlespace) to demonstrate the performance of our proposed approach. It has verified that our proposed algorithms are scalable, stable, and perform satisfactorily according to the situation awareness performance metric.

The paper is organized as follows. In Section 2, we summarize the technical approach, which includes the basic ideas of Markov games, the threat intent inference framework, and the moving horizon control approach for the game solution. Section 3 describes the experimental results. Section 4 concludes the paper.

2. Markov Game Framework

We propose a Markov game (MG) theoretic approach to estimate the belief of each possible enemy COA (ECOA) for the following reasons. 1) Decentralization. Each cluster or team makes decisions mostly based on local information; each group is given more autonomy, allowing for more flexibility. 2) The Markov decision process (MDP) foundation can effectively model the uncertainties of the noisy military environment. 3) The game model has three players: red force (enemies), blue force (friendly forces), and white force (neutral objects).
The game framework is an effective model to capture the nature of military conflicts: the determination of one side's strategies is tightly coupled to that of the other side's strategies, and vice versa. With the consideration that an asymmetric threat (terrorist) may act like a neutral or white object (civilian), we also model the actions of white units in our game framework. 4) Deception [10]. Since an asymmetric enemy may manipulate the information available to the friendly force, we integrate the deception concept into our game approach to model the action of purposely revealing partial information to increase the payoffs of the enemy.

Fig. 1: Stochastic game model for intent inference

The game intent inference framework is shown in Fig. 1. It is constructed from the initial state and evolves according to the transition rule. At each time k, blue and red actions are decided according to the various sensor data, rules of engagement, rationality, terrain information, and the current state. These actions determine the updated probability distribution over the state space according to the transition rule, which also takes terrain information as one of its inputs. Only one state is selected as the next state. In a real battle, the state is chosen by nature, while in our model it is drawn based on the probabilities of all possible states: the larger the probability, the more likely the state is to be drawn.

By definition, a Markov (stochastic) game is given by (i) a finite set of players $N$; (ii) a finite set of states $S$; (iii) for every player $i \in N$, a finite set of available actions $A^i$ (we denote the overall action space $A = \times_{i \in N} A^i$); (iv) a transition rule $q : S \times A \to \Delta(S)$, where $\Delta(S)$ is the space of all probability distributions over $S$; and (v) a payoff function $r : S \times A \to \mathbb{R}^N$. For the intent inference problem, condition (i) is obviously satisfied. Conditions (ii) and (iii) hold because we assume that rules of engagement and terrain information are known and each player has a limited set of COAs given by doctrine, terrain information, etc. For example, a river will limit the actions of ground forces. Items (iv) and (v) are designed according to the specific situation, including terrain information. For our threat prediction problem, we obtain the following discrete time decentralized Markov game.

Players (Decision Makers) --- Although each group (cluster, team) makes its own decisions in our distributed (decentralized) Markov game model, there are three main players: enemy, friendly force, and white objects. All clusters of the enemy (or of the friendly force, or of the white objects) can be considered as a single player since they share a common objective.

State Space --- All the possible COAs for the enemy and the friendly force constitute the state space. An element $s \in S$ is thus a sample of enemy and friendly force COAs composed of a set of triplets (Resource, Action Verb, Objective). As an example, an enemy COA might be: red team 1 (Resource) attacks (Action Verb) blue team 2 (Objective). In the context of this report, it is assumed that, for the enemy COAs, the Resource is always an adversary entity while the Objective is a friendly asset. Similarly, for the friendly force COAs, the Resource is a friendly asset and the Objective is an adversary entity.
Formally, $s = (s^B, s^R, s^W)$ and $S = S^B \times S^R \times S^W$, where $s^B \in S^B$ is the COA set of the Blue (friendly) force, $s^B = \{(r_i^B, a_i^B, o_i^B) \mid r_i^B \in R^B, a_i^B \in A^B, o_i^B \in O^B\}$, and $R^B$, $A^B$, $O^B$ are the sets of resources, action verbs, and objectives of the blue force, respectively. Similarly, $s^R \in S^R$ is the COA set of the Red (enemy) force, $s^R = \{(r_i^R, a_i^R, o_i^R) \mid r_i^R \in R^R, a_i^R \in A^R, o_i^R \in O^R\}$, and $s^W \in S^W$ is the COA set of the White force (neutral objects).

Action Space --- At every time step, each blue group chooses a list of targets with associated actions and confidences (a probability distribution over the list of targets, i.e., the confidences sum to 1) based on its local battlefield information, such as the unit types and positions of possible targets, from level 2 data fusion. Let $D_i^B$ denote the action space of the $i$th blue team. Each element $d_i^B$ of $D_i^B$ is defined as

$$d_i^B = \{(a_i^B, t_i^B, p_i^B) \mid a_i^B \in A^B, t_i^B \in O^B, 0 < p_i^B \le 1, \textstyle\sum p_i^B = 1\} \quad (1)$$

where $p_i^B$ is the probability of the action-target couple $(a_i^B, t_i^B)$, defined as action $a_i^B$ applied to target $t_i^B$. Therefore, the action space of the blue side is $A^1 = \times_i D_i^B$. As an example, for blue small weapon UAV 1 in blue team 1, its action might be $d_1^B$ = {(attack, red fighter 1, 0.3), (fly to, red fighter 2, 0.5), (avoid, red fighter 3, 0.2)}.

Similarly, each red cluster (obtained from the level 2 data fusion) determines a probability distribution over all possible action-target combinations. Let $D_i^R$ denote the action space of the $i$th red cluster. Each element $d_i^R$ of $D_i^R$ is defined as

$$d_i^R = \{(a_i^R, t_i^R, p_i^R) \mid a_i^R \in A^R, t_i^R \in O^R, 0 < p_i^R \le 1, \textstyle\sum p_i^R = 1\} \quad (2)$$

where $p_i^R$ is the probability of action $a_i^R$ applied to target $t_i^R$. Therefore, the action space of the red force is $A^2 = \times_i D_i^R$. A possible action for red platform 1 (red fighter 1) is $d_1^R$ = {(attack, small weapon UAV 1, 0.6), (move to, blue soldier 2, 0.2), (avoid, blue soldier 1, 0.2)}.

Remark: Action and Action Verb are different concepts. An Action is a set of triplets with associated probabilities, while an Action Verb is just one component of the (Resource, Action Verb, Objective) triplet. All Actions are included in $A^1$ for player 1 (Blue force) and $A^2$ for player 2 (Red force). All Action Verbs are enumerated in $A^B$ for player 1 (Blue force) and $A^R$ for player 2 (Red force).

The actions of white objects are relatively simple; the main action type is movement. Let $D_i^W$ denote the action space of the $i$th white unit. Each element $d_i^W$ of $D_i^W$ is defined as

$$d_i^W = \{(a_i^W, t_i^W, p_i^W) \mid a_i^W \in A^W, t_i^W \in O^W, 0 < p_i^W \le 1, \textstyle\sum p_i^W = 1\} \quad (3)$$

where $p_i^W$ is the probability of action $a_i^W$ applied to target $t_i^W$.
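To make the notation above concrete, here is a minimal Python sketch, written for illustration only, of how the COA triplets and the per-team action distributions of Eqs. (1)-(3) could be represented; the names (Triplet, ActionDistribution, etc.) are ours, not the paper's.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Triplet:
    """One (Resource, Action Verb, Objective) element of a COA."""
    resource: str     # e.g. "red team 1" or "blue team 2"
    action_verb: str  # e.g. "attack", "fly to", "avoid"
    objective: str    # e.g. "bridge 1", "blue team 2"

# A state s = (s_B, s_R, s_W) is a set of triplets per side.
State = Tuple[Tuple[Triplet, ...], Tuple[Triplet, ...], Tuple[Triplet, ...]]

@dataclass
class ActionDistribution:
    """A team action d_i: action-target couples with probabilities summing to 1 (Eqs. 1-3)."""
    couples: List[Tuple[str, str, float]]  # (action_verb, target, probability)

    def check(self) -> None:
        total = sum(p for _, _, p in self.couples)
        assert abs(total - 1.0) < 1e-9, "confidences must sum to 1"
        assert all(0.0 < p <= 1.0 for _, _, p in self.couples)

# Example from the text: blue small weapon UAV 1 in blue team 1
d1_blue = ActionDistribution([("attack", "red fighter 1", 0.3),
                              ("fly to", "red fighter 2", 0.5),
                              ("avoid", "red fighter 3", 0.2)])
d1_blue.check()
```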
Transition rule --- Due to the uncertainty of military environments, we assume that the states of the Markov game have inertia, so that the decision makers are more likely to keep pursuing the objective of their previous actions. We define an inertia factor vector $\eta^i = (\eta_1^i, \eta_2^i, \ldots, \eta_{m_i}^i)^T$ for player $i$, where $m_i$ is the number of teams or clusters of player $i$ and $0 \le \eta_j^i \le 1$, $1 \le j \le m_i$. So, for the $j$th team of the $i$th player, there is a probability $\eta_j^i$ of keeping the current action-target couple and a probability $(1 - \eta_j^i)$ of using the new action composed of action-target couples.

There are two steps to calculate the probability distribution over the state space $S$, where $s_k$, $s_{k+1}$ are the states at time steps $k$ and $k+1$, respectively, and $a_k^1$, $a_k^2$, $a_k^3$ are the decisions of player 1 (blue or friendly force), player 2 (red force or enemy), and player 3 (white force), respectively, at time step $k$.

• Step 1: Taking the inertia factor vector $\eta^i$ into account, we combine the current state with the decisions of the players to obtain fused probability distributions over all possible action-target couples for the red and blue forces. To do this, we first decompose the current state into action-target couples for each team of each player (red force or blue force). Let $\Psi_j^i(s_k)$ denote the resulting action-target couple related to the $j$th team of the $i$th player. For example, if there is a triplet (blue team 1, attack, red fighter 2) in the current state $s_k$, then the action-target couple for blue team 1 (the first team of the blue force) is $\Psi_1^1(s_k)$ = (attack, red fighter 2). Second, for each specified team, say the $j$th cluster of player 2 (enemy force), we fuse its action-target couples by modifying the probability of each possible action-target couple according to the following formula:

$$
p((a_j^R, t_j^R) \mid s_k) =
\begin{cases}
p_j^R (1 - \eta_j^2), & (a_j^R, t_j^R, p_j^R) \in d_j^R \text{ and } (a_j^R, t_j^R) \notin \Psi_j^2(s_k) \\
p_j^R (1 - \eta_j^2) + \eta_j^2, & (a_j^R, t_j^R, p_j^R) \in d_j^R \text{ and } (a_j^R, t_j^R) \in \Psi_j^2(s_k) \\
\eta_j^2, & (a_j^R, t_j^R, p_j^R) \notin d_j^R \text{ and } (a_j^R, t_j^R) \in \Psi_j^2(s_k) \\
0, & (a_j^R, t_j^R, p_j^R) \notin d_j^R \text{ and } (a_j^R, t_j^R) \notin \Psi_j^2(s_k)
\end{cases}
\quad (4)
$$

There are four cases in Eq. (4): 1) The action-target couple $(a_j^R, t_j^R)$ occurs only in the current action of the $j$th cluster of player 2 and is not in the current state $s_k$, which can be represented mathematically by $(a_j^R, t_j^R, p_j^R) \in d_j^R$ and $(a_j^R, t_j^R) \notin \Psi_j^2(s_k)$. Then the probability of $(a_j^R, t_j^R)$ in the current state $s_k$ is 0 and its probability in the current action is $p_j^R$, so, according to the definition of inertia, the fused probability of the couple is $p_j^R(1 - \eta_j^2) + 0 \cdot \eta_j^2 = p_j^R(1 - \eta_j^2)$. 2) The couple $(a_j^R, t_j^R)$ occurs both in the current action of the $j$th cluster of player 2 and in the current state $s_k$. Then its probability in the current state $s_k$ is 1 and its probability in the current action is $p_j^R$, so the fused probability is $p_j^R(1 - \eta_j^2) + 1 \cdot \eta_j^2 = p_j^R(1 - \eta_j^2) + \eta_j^2$. 3) The couple $(a_j^R, t_j^R)$ occurs only in the current state $s_k$. Then its probability in the current state $s_k$ is 1 and its probability in the current action is 0, so the fused probability is $0 \cdot (1 - \eta_j^2) + 1 \cdot \eta_j^2 = \eta_j^2$. 4) The couple $(a_j^R, t_j^R)$ occurs neither in the current state $s_k$ nor in the current action of the $j$th cluster of player 2. Then both probabilities are 0, so the fused probability is $0 \cdot (1 - \eta_j^2) + 0 \cdot \eta_j^2 = 0$.
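Eq. (4) can be read as a convex combination, with weight $\eta_j^i$, of the couple inherited from the current state and the team's newly announced distribution. A minimal sketch of this fusion step follows, assuming the illustrative representation introduced above; the function and variable names are ours, and state_couple stands for $\Psi_j^i(s_k)$ (None if the team contributes no triplet to $s_k$).

```python
from typing import Dict, List, Optional, Tuple

Couple = Tuple[str, str]  # (action_verb, target)

def fuse_team_distribution(
    team_action: List[Tuple[str, str, float]],   # d_j: (action_verb, target, probability)
    state_couple: Optional[Couple],               # Psi_j(s_k), the couple kept from the state
    eta: float,                                   # inertia factor eta_j in [0, 1]
) -> Dict[Couple, float]:
    """The four cases of Eq. (4): mix the state couple (weight eta)
    with the team's new action distribution (weight 1 - eta)."""
    fused: Dict[Couple, float] = {}
    # Cases 1 and 2: couples proposed in the current action get weight (1 - eta).
    for verb, target, p in team_action:
        fused[(verb, target)] = p * (1.0 - eta)
    # Cases 2 and 3: the couple inherited from the current state gets weight eta.
    if state_couple is not None:
        fused[state_couple] = fused.get(state_couple, 0.0) + eta
    # Case 4 (couple in neither) is implicit: probability 0, so it is simply absent.
    return fused

# Example: a red cluster with eta = 0.4 that was previously attacking small weapon UAV 1
fused = fuse_team_distribution(
    [("attack", "small weapon UAV 1", 0.6),
     ("move to", "blue soldier 2", 0.2),
     ("avoid", "blue soldier 1", 0.2)],
    state_couple=("attack", "small weapon UAV 1"),
    eta=0.4,
)
assert abs(sum(fused.values()) - 1.0) < 1e-9  # fused probabilities still sum to 1
```

Because the rule mixes two probability distributions, the fused values still sum to 1 whenever the incoming team distribution does.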
By the same fusion rule, the new probability distribution for the $j$th team of player 1 (blue force) is

$$
p((a_j^B, t_j^B) \mid s_k) =
\begin{cases}
p_j^B (1 - \eta_j^1), & (a_j^B, t_j^B, p_j^B) \in d_j^B \text{ and } (a_j^B, t_j^B) \notin \Psi_j^1(s_k) \\
p_j^B (1 - \eta_j^1) + \eta_j^1, & (a_j^B, t_j^B, p_j^B) \in d_j^B \text{ and } (a_j^B, t_j^B) \in \Psi_j^1(s_k) \\
\eta_j^1, & (a_j^B, t_j^B, p_j^B) \notin d_j^B \text{ and } (a_j^B, t_j^B) \in \Psi_j^1(s_k) \\
0, & (a_j^B, t_j^B, p_j^B) \notin d_j^B \text{ and } (a_j^B, t_j^B) \notin \Psi_j^1(s_k)
\end{cases}
\quad (5)
$$

and the new probability distribution for the $j$th team of player 3 (white force) is

$$
p((a_j^W, t_j^W) \mid s_k) =
\begin{cases}
p_j^W (1 - \eta_j^3), & (a_j^W, t_j^W, p_j^W) \in d_j^W \text{ and } (a_j^W, t_j^W) \notin \Psi_j^3(s_k) \\
p_j^W (1 - \eta_j^3) + \eta_j^3, & (a_j^W, t_j^W, p_j^W) \in d_j^W \text{ and } (a_j^W, t_j^W) \in \Psi_j^3(s_k) \\
\eta_j^3, & (a_j^W, t_j^W, p_j^W) \notin d_j^W \text{ and } (a_j^W, t_j^W) \in \Psi_j^3(s_k) \\
0, & (a_j^W, t_j^W, p_j^W) \notin d_j^W \text{ and } (a_j^W, t_j^W) \notin \Psi_j^3(s_k)
\end{cases}
\quad (6)
$$

• Step 2: We determine the probability distribution over all possible outcomes of state $s_{k+1}$:

$$
q(s_{k+1} \mid s_k, a_k^1, a_k^2, a_k^3) = \prod_{i=1}^{m_1} p((a_i^B, t_i^B) \mid s_k) \prod_{j=1}^{m_2} p((a_j^R, t_j^R) \mid s_k) \prod_{j=1}^{m_3} p((a_j^W, t_j^W) \mid s_k)
\quad (7)
$$

when $s_{k+1} = \bigcup_{i=1}^{m_1} \{(r_i^B, a_i^B, t_i^B)\} \cup \bigcup_{j=1}^{m_2} \{(r_j^R, a_j^R, t_j^R)\} \cup \bigcup_{j=1}^{m_3} \{(r_j^W, a_j^W, t_j^W)\}$; otherwise, $q(s_{k+1} \mid s_k, a_k^1, a_k^2, a_k^3) = 0$. Here $m_1$ is the number of teams or clusters of player 1 (blue force), $m_2$ is the number of teams or groups of player 2 (red force), and $m_3$ is the number of units of player 3 (white force). $\{(r_i^B, a_i^B, t_i^B)\}$ is the set of all possible (positive probability) triplets for the $i$th team of player 1 (blue); therefore $\bigcup_{i=1}^{m_1} \{(r_i^B, a_i^B, t_i^B)\}$ contains all the possible (positive probability) triplets for the blue force. From Step 1, we know that the fused probability of each specified $(a_j^B, t_j^B)$ is $p((a_j^B, t_j^B) \mid s_k)$, defined in Eq. (5). With the assumption that all teams of the blue force act independently, we obtain the overall probability for the blue force, $\prod_{i=1}^{m_1} p((a_i^B, t_i^B) \mid s_k)$. Similarly, $\prod_{j=1}^{m_2} p((a_j^R, t_j^R) \mid s_k)$ and $\prod_{j=1}^{m_3} p((a_j^W, t_j^W) \mid s_k)$ are the overall probabilities of the enemy and white forces, respectively. So the probability distribution over all possible outcomes of state $s_{k+1}$ (composed of all possible sub-states of the blue and red forces) can be calculated via Eq. (7).
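Under the team-independence assumption, Eq. (7) is just a product of the fused per-team probabilities of the couples that appear in the candidate next state. The following sketch, building on the hypothetical fuse_team_distribution above, illustrates that computation.

```python
from typing import Dict, List, Tuple

Couple = Tuple[str, str]

def transition_probability(
    candidate_couples: List[Couple],            # one couple per team, as they appear in s_{k+1}
    fused_per_team: List[Dict[Couple, float]],  # Step-1 output for each team (all players)
) -> float:
    """Eq. (7): product over teams of the fused probability of the couple
    that each team contributes to the candidate next state."""
    q = 1.0
    for couple, fused in zip(candidate_couples, fused_per_team):
        q *= fused.get(couple, 0.0)  # a couple absent from the fusion contributes probability 0
    return q

# Example with two teams (one blue, one red), using hypothetical fused distributions
fused_blue = {("attack", "red fighter 1"): 0.7, ("avoid", "red fighter 3"): 0.3}
fused_red = {("attack", "small weapon UAV 1"): 0.76, ("move to", "blue soldier 2"): 0.24}
q = transition_probability(
    [("attack", "red fighter 1"), ("attack", "small weapon UAV 1")],
    [fused_blue, fused_red],
)
print(q)  # 0.7 * 0.76 = 0.532
```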
Payoff Functions --- In our proposed decentralized Markov game model, there are two levels of payoff function for each player (enemy, friendly force, or white force).

• The lower level payoff functions are used by each team, cluster, or unit to determine the team actions based on local information. For the $j$th team of the blue force, the payoff function is defined as $f_j^B(\tilde{s}_j^B, d_j^B, W_k^B)$, where $\tilde{s}_j^B \subseteq s$ is the local information obtained by the team and $W_k^B$, the set of weights for all possible action-target couples of the blue force, is announced to all blue teams and determined from the top level payoff functions by the supervisor of the blue force:

$$
f_j^B(\tilde{s}_j^B, d_j^B, W_k^B) = \sum_{(a_i^B, t_i^B, p_i^B) \in d_j^B} w^B(j, a_i^B, t_i^B, W_k^B) \, p_i^B \, g^B(j, a_i^B, t_i^B, \tilde{s}_j^B)
\quad (8)
$$

where $w^B(j, a_i^B, t_i^B, W_k^B)$ calculates the weight of any specified action-target couple for the $j$th blue team from $W_k^B$, $p_i^B$ is the probability of the action-target couple $(a_i^B, t_i^B)$, and $g^B(j, a_i^B, t_i^B, \tilde{s}_j^B)$ determines the gain from the action-target couple $(a_i^B, t_i^B)$ for the $j$th blue team according to the positions and features, such as platform values and defense/offense capabilities, of the blue and targeted platforms. Similarly, we obtain the lower level payoff functions for the $j$th team of the red and white forces:

$$
f_j^R(\tilde{s}_j^R, d_j^R, W_k^R) = \sum_{(a_i^R, t_i^R, p_i^R) \in d_j^R} w^R(j, a_i^R, t_i^R, W_k^R) \, p_i^R \, g^R(j, a_i^R, t_i^R, \tilde{s}_j^R)
\quad (9)
$$

$$
f_j^W(\tilde{s}_j^W, d_j^W, W_k^W) = \sum_{(a_i^W, t_i^W, p_i^W) \in d_j^W} w^W(j, a_i^W, t_i^W, W_k^W) \, p_i^W \, g^W(j, a_i^W, t_i^W, \tilde{s}_j^W)
\quad (10)
$$

Remark 1: For some asymmetric threats, such as suicide bombers, the payoff functions may consider only the loss of the blue side. For some camouflage and concealment entities, the objective is to hide themselves and move close to the blue units. Other deception units will make some irrational movements to hide their true goals at the cost of time.

Remark 2: The white units care only about their possible losses. For example, when a dangerous spot is detected, normal white units will find a COA that keeps them as far as possible from the spot.

• The top level payoff functions are used to evaluate the overall performance of each player:

$$
J^B = \sum_k \sum_{j=1}^{m_1} f_j^B(\tilde{s}_j^B, d_j^B, W_k^B) \quad (11)
$$

$$
J^R = \sum_k \sum_{j=1}^{m_2} f_j^R(\tilde{s}_j^R, d_j^R, W_k^R) \quad (12)
$$

$$
J^W = \sum_k \sum_{j=1}^{m_3} f_j^W(\tilde{s}_j^W, d_j^W, W_k^W) \quad (13)
$$

where $k$ is the time index. In our approach, the lower level payoffs are calculated in a distributed manner and sent back to the commander/supervisor via communication networks.

Remark 3: Since the gain functions $g^B(j, a_i^B, t_i^B, \tilde{s}_j^B)$ for the blue force, $g^R(j, a_i^R, t_i^R, \tilde{s}_j^R)$ for the red force, and $g^W(j, a_i^W, t_i^W, \tilde{s}_j^W)$ for the white force are different functions, asymmetric force and cost utilities can be represented straightforwardly in our model. In addition, after an irregular adversary is detected, a different type of gain function is assigned dynamically.
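As an illustration of Eqs. (8)-(10), the sketch below evaluates a team-level payoff as a sum of weight × probability × gain over the couples of a team action; the weight table and gain function here are hypothetical stand-ins for the paper's $w$ and $g$, whose exact forms are scenario dependent.

```python
from typing import Callable, Dict, List, Tuple

Couple = Tuple[str, str]  # (action_verb, target)

def team_payoff(
    team_action: List[Tuple[str, str, float]],   # d_j: (action_verb, target, probability)
    weights: Dict[Couple, float],                # W_k, announced by the supervisor
    gain_fn: Callable[[Couple], float],          # stand-in for g(j, a, t, local_info)
) -> float:
    """Eqs. (8)-(10): sum of weight * probability * gain over the couples in d_j."""
    return sum(weights.get((verb, target), 0.0) * p * gain_fn((verb, target))
               for verb, target, p in team_action)

# Hypothetical example: gain proportional to the value of the attacked platform
platform_value = {"red fighter 1": 5.0, "red fighter 2": 3.0, "red fighter 3": 2.0}
gain = lambda couple: platform_value.get(couple[1], 0.0) if couple[0] == "attack" else 0.5
weights = {("attack", "red fighter 1"): 1.0, ("fly to", "red fighter 2"): 0.8,
           ("avoid", "red fighter 3"): 0.6}
d1_blue = [("attack", "red fighter 1", 0.3), ("fly to", "red fighter 2", 0.5),
           ("avoid", "red fighter 3", 0.2)]
print(team_payoff(d1_blue, weights, gain))  # 1.0*0.3*5.0 + 0.8*0.5*0.5 + 0.6*0.2*0.5 = 1.76
```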
The strategies --- In this paper, we try several well known types of strategies.

• Min-max strategies [5]. These strategies give a conservative solution that minimizes the maximum possible "loss". Actually, in our problem, it is a max-min solution in the sense that each player maximizes his minimum possible payoff, so these strategies are also called safest solutions.

• Pure Nash strategies with finite horizon. In game theory, the Nash equilibrium (named after John Nash [6], who proposed it) is a kind of optimal collective strategy in a game involving two or more players, where no player has anything to gain by changing only his or her own strategy. If each player has chosen a strategy and no player can benefit by changing his or her strategy while the other players keep theirs unchanged, then the current set of strategy choices and the corresponding payoffs constitute a Nash equilibrium. In our approach, we use a game search tree (shown in Fig. 2) to find the solution. The solution to the Markov game tree is obtained via a K time-step look-ahead approach, in which we optimize the solution only over a K time-step horizon (a minimal search sketch is given after this list). This suboptimal technique has been used successfully for reasoning in games such as chess, backgammon, and Monopoly.

Fig. 2: A game tree search approach to find pure Nash strategies with a finite planning horizon (moving planning window)

• Mixed Nash strategies. In game theory, a mixed strategy is a strategy comprised of possible actions and an associated probability distribution, which corresponds to how frequently each action is chosen. Mixed strategy Nash equilibria are equilibria where at least one player is playing a mixed strategy. Nash proved that every finite game has a Nash equilibrium, but not every finite game has a pure strategy Nash equilibrium.

• Correlated equilibria [7]. Unlike Nash equilibria, which are formulated in terms of independent strategies, correlated equilibria are developed from correlated strategies in noncooperative games. The correlated equilibrium of a Markov game describes a solution for playing a dynamic game in which players are able to communicate but are self-interested. Based on the signals generated by a correlation device and announced to each decision maker, players choose their actions according to the received private signals. There are two types of correlation devices: autonomous and stationary devices. An autonomous correlation device is a pair $\langle (M_n^i), (d_n) \rangle$, where (i) $M_n^i$ is a finite set of signals for player $i$ at time step $n$, and (ii) $d_n : M(n) \to \Delta(M_n)$, with $M_n = \times_{i \in N} M_n^i$ and $M(n) = M_1 \times M_2 \times \cdots \times M_{n-1}$. A stationary correlation device is a pair $\langle M, d \rangle$, where $d \in \Delta(M)$ and $M = \times_{i \in N} M^i$; it is a special case of an autonomous correlation device in which $M_n^i$ and $d_n$ are independent of $n$. Given a correlation device, we define an extended game. The extended game is played exactly as the original game, but at the beginning of each stage $n$, a signal combination $m_n = (m_n^i)_{i \in N}$ is drawn according to the probability function $d_n(m_1, m_2, \ldots, m_{n-1})$ and each player $i$ is informed of $m_n^i$. Each decision maker must then base his choice of actions on the received signal. Any deviator is punished via his min-max value; the punishment occurs only if a player disobeys the recommendation of the device. It was proved in [7] that every Markov game with an autonomous correlation device admits a correlated equilibrium.

• Sequential Nash strategies [8]. A sequential game is one in which players choose their strategies following a certain predefined order, and in which at least some players can observe the moves of the players who decided before them. The sequential game is a natural framework for addressing some real problems, such as the Action-Reaction-Counteraction paradigm used in military intelligence and the advertising campaign strategies of competing firms in economics. In our approach, we use a turn-by-turn scheme, shown in Fig. 3. At each step k, the control strategy of only one player, say player 1 (blue force), is applied, and the corresponding outcome is observed by player 2 (enemy force) before it decides its next action. This is helpful in estimating the opponent's intent because at each time only one action is applied and the state change resulting from that action is observed.

Fig. 3: Two-player sequential game

• Leader-follower strategies [9]. In consideration of limited and imperfect communication, we use the Stackelberg concept to model the cooperation between the commander and the local teams. The commander (called the leader) declares incentives to the local teams (called followers) in order to induce them to accept his desired system behavior as the common desired behavior. The leader-follower strategy is useful in our clearly defined hierarchical systems, which have asymmetric information structures: in our case the leader (commander) knows the cost functional of every decision maker (local team), while the followers know only their own.
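The sketch below illustrates the finite-horizon look-ahead idea in its simplest form: a K-step min-max (safest solution) search in which blue maximizes its worst-case payoff against red. The actual game tree of Fig. 2 searches over the joint probabilistic action space and supports the other equilibrium concepts listed above, so this is only a schematic, with placeholder transition and payoff callables.

```python
from typing import Callable

def lookahead_value(state, depth: int,
                    blue_actions: Callable, red_actions: Callable,
                    step: Callable, payoff: Callable) -> float:
    """K-step look-ahead in the min-max (safest solution) spirit:
    blue maximizes, red is assumed to minimize blue's payoff."""
    if depth == 0:
        return payoff(state)
    best = float("-inf")
    for b in blue_actions(state):
        worst = float("inf")
        for r in red_actions(state):
            worst = min(worst, lookahead_value(step(state, b, r), depth - 1,
                                               blue_actions, red_actions, step, payoff))
        best = max(best, worst)
    return best

def best_blue_action(state, depth, blue_actions, red_actions, step, payoff):
    """Pick the blue action with the best worst-case K-step value (moving planning window)."""
    return max(blue_actions(state),
               key=lambda b: min(lookahead_value(step(state, b, r), depth - 1,
                                                 blue_actions, red_actions, step, payoff)
                                 for r in red_actions(state)))

# Toy usage: states are integers, blue adds, red subtracts, payoff is the state value.
acts = lambda s: [1, 2]
step = lambda s, b, r: s + b - r
print(best_blue_action(0, 2, acts, acts, step, lambda s: s))  # -> 2
```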
To efficiently reason about the enemy's intent or COAs, we divide our approach into two phases: a training phase and a reasoning phase. In the first, we measure or observe the enemy's actions and compare them with the actions generated by our model; the results are used to tune or adjust the transition rules. In the reasoning phase, we fix the transition rules and use the generated red actions as the intent of the enemy.

Deception [10] is used to make the other player act to one's own advantage by making it believe that the game is in a state other than the actual one; it is only possible in partial information games. We propose an equilibrium approach to deception, where deception is defined as the process by which actions are chosen to induce erroneous inferences so as to take advantage of them. This framework differs from the earlier literature on multistage games with incomplete information in that a player need not have a perfect understanding of the strategy employed by his opponent. We introduce two modeling elements: cognitive types and the strategic environment. Cognitive types are defined as follows. Each player $i$ forms an expectation about the behavior of the other player by pooling together several nodes at which the other player must move; each such pool of nodes is referred to as an analogy class. Players are also differentiated according to whether or not they distinguish between the behaviors of the various types of their opponent. Formally, a cognitive type of player $i$ is characterized by $(An_i, d_i)$, where $An_i$ stands for player $i$'s analogy partition and $d_i$ is a dummy variable that specifies whether or not type $t_i$ distinguishes between the behaviors of the various types $t_j$ of player $j$; we let $d_i = 1$ when type $t_i$ distinguishes between the behaviors of the types $t_j$ and $d_i = 0$ otherwise. A strategic environment is described by $(Y, u_i, p)$, where $p$ denotes the prior joint distribution on the type space $Q = Q_1 \times Q_2$. To simplify notation, we assume that the types of the two players are independently distributed, and we refer to $p_i = (p_{t_i})_{t_i}$ as the prior probabilities of player $i$'s types, where $p_{t_i}$ denotes the prior probability of type $t_i$.
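For illustration, the following sketch stores a cognitive type $(An_i, d_i)$ and independent type priors from the strategic environment; the representation (and the example prior on a "deceptive" red type) is ours, chosen only to make the bookkeeping concrete.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

@dataclass
class CognitiveType:
    """A cognitive type (An_i, d_i): an analogy partition over the opponent's decision
    nodes and a flag telling whether this type distinguishes the opponent's types."""
    analogy_partition: List[FrozenSet[str]]  # pools ("analogy classes") of opponent decision nodes
    distinguishes_types: bool                # d_i = 1 if opponent types are told apart, else d_i = 0

@dataclass
class StrategicEnvironment:
    """Independent type priors p_i over each player's type space (part of (Y, u_i, p))."""
    priors: Dict[str, Dict[str, float]]      # player -> {type: prior probability}

    def joint_prior(self, type_1: str, type_2: str) -> float:
        # With independently distributed types, the joint prior factorizes.
        return self.priors["player1"][type_1] * self.priors["player2"][type_2]

env = StrategicEnvironment(priors={
    "player1": {"regular": 1.0},
    "player2": {"regular": 0.8, "deceptive": 0.2},  # hypothetical prior on a deceptive red type
})
print(env.joint_prior("regular", "deceptive"))  # 0.2
```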
3. Simulations and Experiments

In the simulation part, we build a virtual battlespace and a typical urban scenario based on the ontology concept, which is an explicit, formal, machine-readable semantic model that defines the classes (or concepts) and their possible inter-relations specific to some specified domain. To simulate our data fusion approach, we implemented and tested our battlespace, scenario, and algorithms on the MICA Open Experimental Platform (OEP), based on the Boeing C4Isim simulation, which models the collection, processing, and dissemination of battlefield information.

We used the scenario shown in Fig. 4 to demonstrate the performance of our proposed threat prediction and situation awareness algorithm. In the shown urban environment, the blue force's missions are to capture and secure two bridges that are guarded by the red force. The two bridge locations are well connected via wide roads highlighted by dashed lines. The red force includes armed vehicles, fighters, and asymmetric forces hiding among and acting like the white objects (civilians and civilian vehicles). The blue force consists of a few fighters with close air support provided by several unmanned aerial vehicles (UAVs), such as small sensor UAVs and small weapon UAVs, which will, if needed, perform some searching and fighting tasks too. We assume the total offense force and total defense force are at roughly the same level; neither side is dominant. There are several choices for the red force to guard these objectives efficiently. They can deploy all red units to protect one location; however, the blue force can then capture the other location first. The blue force faces the same dilemma. So the main challenge for both sides is to understand the situation from the fused sensor data and predict the intent of the opponent under the "believed" war situation.

Fig. 4: A simulated scenario

For this scenario, in a specific simulation run (correlated equilibrium method) shown in Fig. 5, Blue Team 1 and Blue Team 3 were assigned to capture Bridge 1 and Bridge 2, respectively, for almost the whole simulation period of 30 minutes. On the other hand, Red Team 1 and Red Team 3 were guarding Bridge 1 and Bridge 2 almost all the time. Blue Team 2 was strategically moving based on the threat prediction from the Markov game. At the same time, the Red force was dynamically deploying Red Team 2, trying to keep a balance between the Blue force and the Red force at each bridge. As shown in the demo movie, after about 15 minutes of movement, the Blue force achieves domination at Bridge 1, with two Blue teams (Blue Team 1 and Blue Team 3) attacking one Red team (Red Team 1). The Blue force captures and secures Bridge 1 before Red Team 3 reaches Bridge 1 to help Red Team 1. During this Phase I battle, our algorithm detected two asymmetric adversaries with deception (a person and a vehicle) which were hidden in a large amount of harmless background civilian activity. After the capture of Bridge 1, the Blue side deployed part of the remaining force of Team 1 and Team 3 to secure the bridge and the rest to help Team 2 capture Bridge 2. Finally, the Blue side won the urban battle at a cost of 5 Blue soldiers and 6 small weapon UAVs. Another hidden asymmetric threat with deception (a terrorist) was detected and killed in the Phase II battle to capture Bridge 2.

Fig. 5: Result of a simulation run

In addition to the run described above, we performed many experiments. We compared the results using various options, such as no game theoretic fusion (without levels 2 and 3), no asymmetric threat prediction (with level 2, but the payoff function of the game model at level 3 does not change dynamically), the game approach with the mixed Nash strategy, and the game approach with correlated equilibria. The results (since the simulation is stochastic, the results are the mean of 10 runs for each case) are shown in Fig. 6; only the damage information of the Blue side is shown. From the damage comparison results, we can see that our proposed Markov game framework with correlated equilibrium and deception consideration for threat detection and situation awareness outperforms the other methods.

Fig. 6: Damage comparison of various options

4. Conclusions

Game theoretic tools have a potential for threat prediction that takes real uncertainties in Red plans and deception possibilities into consideration.
In this paper, we have evaluated the feasibility of the advanced adversary intent inference algorithms and their effectiveness through extensive simulations. The simulations have verified that our proposed algorithms are scalable, stable, and perform satisfactorily.

Acknowledgements

This research was supported by the US Navy under contract number N68335-06-C-0105. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Navy.

References

[1] J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior, Princeton University Press, 1944.
[2] E. Solan and N. Vieille, "Correlated Equilibrium in Stochastic Games," Games and Economic Behavior, vol. 38, no. 2, pp. 362-399, Feb. 2002.
[3] North Atlantic Treaty Organization, "The Land C2 Information Exchange Data Model," ATCCIS Baseline 2.0, March 2002.
[4] Boeing, User Guide for the Open Experimental Platform (OEP), version 1.3, March 2003.
[5] T. Basar and G. J. Olsder, Dynamic Noncooperative Game Theory, 2nd ed., SIAM Classics in Applied Mathematics, Philadelphia, 1999.
[6] J. Nash, "Non-Cooperative Games," Annals of Mathematics, vol. 54, pp. 286-295, 1951.
[7] E. Solan and N. Vieille, "Correlated Equilibrium in Stochastic Games," Games and Economic Behavior, vol. 38, no. 2, pp. 362-399, Feb. 2002.
[8] D. M. Kreps and R. Wilson, "Sequential Equilibria," Econometrica, vol. 50, no. 4, pp. 863-894, July 1982.
[9] M. Simaan and J. B. Cruz, Jr., "On the Stackelberg Strategy in Nonzero-Sum Games," Journal of Optimization Theory and Applications, vol. 11, no. 5, pp. 533-555, May 1973.
[10] J. P. Hespanha, Y. S. Ateskan, and H. H. Kizilocak, "Deception in Non-Cooperative Games with Partial Information," in Proc. of the 2nd DARPA-JFACC Symposium on Advances in Enterprise Control, July 2000.
[11] J. B. Cruz, Jr., M. A. Simaan, A. Gacic, H. Jiang, B. Letellier, M. Li, and Y. Liu, "Game-Theoretic Modeling and Control of a Military Air Operation," IEEE Transactions on Aerospace and Electronic Systems, vol. 37, no. 4, October 2001.
[12] J. B. Cruz, Jr., M. A. Simaan, A. Gacic, and Y. Liu, "Moving Horizon Nash Strategies for a Military Air Operation," IEEE Transactions on Aerospace and Electronic Systems, vol. 38, no. 3, 2002.
[13] J. B. Cruz, Jr., G. Chen, D. Garagic, X. Tan, D. Li, D. Shen, M. Wei, and X. Wang, "Team Dynamics and Tactics for Mission Planning," in Proceedings of the IEEE Conference on Decision and Control, pp. 3579-3584, December 2003.
[14] D. Shen, J. B. Cruz, Jr., G. Chen, C. Kwan, S. P. Riddle, S. Cox, and C. Matthews, "A Game Theoretic Approach to Mission Planning for Multiple Aerial Platforms," in Proceedings of AIAA Infotech@Aerospace, Arlington, VA, Sept. 26-29, 2005.
[15] G. Chen, D. Shen, J. B. Cruz, Jr., C. Kwan, S. Cox, S. P. Riddle, and C. Matthews, "A Novel Cooperative Path Planning for Multiple Aerospace Vehicles," in Proceedings of AIAA Infotech@Aerospace, Arlington, VA, Sept. 26-29, 2005.