Proceedings of the 44th IEEE Conference on Decision and Control, and the European Control Conference 2005, Seville, Spain, December 12-15, 2005. WeC03.3

A Hierarchical Approach To Multi-Player Pursuit-Evasion Differential Games

Dongxu Li, Student Member, IEEE, Jose B. Cruz, Jr., Life Fellow, IEEE, Genshe Chen, Chiman Kwan, Senior Member, IEEE, and Mou-Hsiung Chang

Abstract: The increasing use of unmanned assets and robots in modern military operations renews an interest in the study of general pursuit-evasion games involving multiple pursuers and multiple evaders. Due to the difficulty in formulation and rigorous treatment, the literature in this field is very limited. This paper presents a hierarchical approach to this kind of problem. With an additional structure imposed on decision-making of pursuers, this approach provides conservative guidance to pursuers by finding certain engagement between pursuers and evaders, and the saddle-point strategies are utilized by each pursuer in chasing the engaged evaders. A combinatorial optimization problem is formulated and scenarios are created to demonstrate the feasibility of the algorithm. This is a preliminary study on multi-player pursuit-evasion games and future directions are suggested.

I. INTRODUCTION

In a Pursuit-Evasion (PE) game, the problem of one or a group of pursuers catching one or a group of moving evaders is studied. It has extensive applications such as missile guidance, military strategy, aircraft control and aerial tactics. Under the framework of game theory and optimal control, a number of formal solutions regarding optimal strategies in particular PE problems can be achieved [1]-[2]. In the literature, most studies on PE games have concentrated on two-player games with a single pursuer and a single evader. As the use of unmanned assets and robots increases in modern military operations, newly emergent scenarios usually involve multiple pursuers and evaders.
The problem of formulation and computation of optimal pursuit strategies of multiple players in continuous time needs to be addressed.

A PE game is usually formulated as a zero-sum game. Since the 1950s, the deterministic PE game of a single pursuer and a single evader with perfect information and common knowledge has been extensively studied. Isaacs solved a PE problem for a saddle-point equilibrium solution by the method of “tenet of transition” [1]. Although PE games with multiple pursuers and multiple evaders have been investigated recently, most of these studies deal with discrete-time problems or treat the problem in an ad hoc manner [6], [8]. Little has been done for generic multi-player PE differential game problems in continuous time. In this paper, we focus on deterministic multi-player differential PE game problems [4]. We extend Isaacs’ approach on a two-player PE game to a game with multiple players. A conservative strategy is applied from the pursuers’ perspective to achieve an upper bound of the performance index.

Manuscript received September 14, 2005. This work was supported in part by the U.S. Army under Contract W911NF-05-C-0018.
D. Li is with the Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210 USA (phone: 614-404-5494; email: li.447@osu.edu).
J.B. Cruz is with the Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210 USA (email: jbcruz@ieee.org).
G. Chen is with Intelligent Automation, Inc, 15400 Calhoun Dr, Suite 400, Rockville, MD 20855 USA (email: gchen@i-a-i.com).
C. Kwan is with Intelligent Automation, Inc, 15400 Calhoun Dr, Suite 400, Rockville, MD 20855 USA (email: ckwan@i-a-i.com).
M.H. Chang is with the Mathematics Division, U.S. Army Research Office (email: mouhsiung.chang@us.army.mil).
0-7803-9568-9/05/$20.00 ©2005 IEEE

This paper is organized as follows. A generic problem is formulated in the next section.
In Section III, the difficulty of applying conventional optimal control theory to multi-player PE games is analyzed, and a suboptimal hierarchical approach is then proposed from the pursuers’ perspective. In addition, the issue of capturability is discussed and methods are suggested to address the problem. Simulation results are presented in Section IV. Finally, the paper concludes with suggestions for future work.

II. PROBLEM FORMULATION

Consider a general PE differential game with $N$ pursuers and $M$ evaders in an $n_0$-dimensional space $S$, $S \subseteq \mathbb{R}^{n_0}$. Denote by $x_p^i$ ($x_e^j$) the state variable associated with pursuer $i$, $i = 1, \dots, N$ (evader $j$, $j = 1, \dots, M$), where $x_p^i \in \mathbb{R}^{n_p^i}$ ($x_e^j \in \mathbb{R}^{n_e^j}$). Notice that $n_p^i\,(n_e^j) \ge n_0$ because of the specific dynamics of the pursuer (evader). Assume that the first $n_0$ elements in $x_p^i$ ($x_e^j$) specify the physical position of pursuer $i$ (evader $j$) in space $S$. In general, the dynamic equations for each pursuer $i$ and evader $j$ are

$$\dot{x}_p^i(t) = f_p^i\big(x_p^i(t), u_i(t)\big),\quad x_p^i(0) = x_{p0}^i; \qquad \dot{x}_e^j(t) = f_e^j\big(x_e^j(t), v_j(t)\big),\quad x_e^j(0) = x_{e0}^j. \tag{1}$$

In (1), $u_i(t) \in U_a^i$ and $v_j(t) \in V_a^j$ are control variables for time $t \ge 0$, where $U_a^i \subseteq \mathbb{R}^{m_p^i}$ and $V_a^j \subseteq \mathbb{R}^{m_e^j}$ are the sets of corresponding admissible control actions; function $f_p^i(\cdot,\cdot)$ ($f_e^j(\cdot,\cdot)$) is a mapping from $\mathbb{R}^{n_p^i} \times U_a^i$ ($\mathbb{R}^{n_e^j} \times V_a^j$) to $\mathbb{R}^{n_p^i}$ ($\mathbb{R}^{n_e^j}$). In this paper, we consider the deterministic case, where the function $f_p^i$ ($f_e^j$) does not depend on time $t$ explicitly. For simplicity of notation, let $x_p = [x_p^{1T}, \dots, x_p^{NT}]^T$, $x_e = [x_e^{1T}, \dots, x_e^{MT}]^T$, $u = [u_1^T, \dots, u_N^T]^T$ and $v = [v_1^T, \dots, v_M^T]^T$, and accordingly, function $f_p = [f_p^{1T}, \dots, f_p^{NT}]^T$ and $f_e = [f_e^{1T}, \dots, f_e^{MT}]^T$. Then, the dynamic equation can be rewritten in a compact form as

$$\dot{x}_p(t) = f_p\big(x_p(t), u(t)\big),\quad x_p(0) = x_{p0}; \qquad \dot{x}_e(t) = f_e\big(x_e(t), v(t)\big),\quad x_e(0) = x_{e0}. \tag{2}$$

Let $X_p \triangleq \prod_{i=1}^{N} \mathbb{R}^{n_p^i}$, $X_e \triangleq \prod_{j=1}^{M} \mathbb{R}^{n_e^j}$, $U_a \triangleq \prod_{i=1}^{N} U_a^i$ and $V_a \triangleq \prod_{j=1}^{M} V_a^j$. For pursuer $i$, define the projection $P: \mathbb{R}^{n_p^i} \to S$ as

$$P(x_p^i) = [x_{p1}^i, \dots, x_{pn_0}^i]^T. \tag{3}$$

A similar operator can be defined for each pursuer and each evader. We use the notation $P$ to denote the projection for all the pursuers and the evaders. Clearly, for any $x \in \mathbb{R}^{n_p^i}$, $i = 1, \dots, N$ (or $x \in \mathbb{R}^{n_e^j}$, $j = 1, \dots, M$), $P(x) \in S$.

In a PE game with multiple evaders, evaders are generally not captured simultaneously. The terminal time of the game can be defined based on the capture of all evaders, i.e., for any $j$, $j = 1, \dots, M$, there exists $i$, $0 < i \le N$, such that $d\big(P(x_p^i(t)), P(x_e^j(t))\big) \le \varepsilon$ for some $t \ge 0$. Here $d(x, y)$ is a metric in $\mathbb{R}^{n_0}$, e.g., $d(x, y) = \|x - y\|_2$ for $x, y \in \mathbb{R}^{n_0}$; $\varepsilon$ is a predetermined small positive real number. The capture time of evader $j$, $T_j$, can be defined as

$$T_j = \inf\big\{t \mid t \ge 0,\ \exists\, i,\ 1 \le i \le N,\ \text{s.t.}\ d\big(P(x_p^i(t)), P(x_e^j(t))\big) \le \varepsilon\big\}.$$

Then, the terminal time of the PE game, $T$, is defined as

$$T = \max_{1 \le j \le M} T_j. \tag{4}$$

Clearly, by (4), $T \in \mathbb{R}^+ \cup \{0, \infty\}$, where $\mathbb{R}^+$ stands for the set of all positive real numbers.

Consider that the objective function has a general form as follows, where subscript $t$ denotes the time.

$$J(x_{p0}, x_{e0}, u, v) = \int_0^T G(x_{pt}, x_{et}, u_t, v_t)\,dt + Q(x_T) \tag{5}$$

subject to (2). In (5), $u = \{u(t) \mid 0 \le t \le T,\ u(t) \in U_a\}$ and similarly for $v$. Function $G$ is the cost rate, $G(\cdot,\cdot,\cdot,\cdot): X_p \times X_e \times U_a \times V_a \to \mathbb{R}^+ \cup \{0\}$, and function $Q$ quantifies the terminal cost as a function of the states at the terminal time of a game, $Q: X_p \times X_e \to \mathbb{R}^+ \cup \{0\}$. In a game where the capture time is the objective, $G = 1$ and $Q = 0$.

In this paper, PE games involving $N$ pursuers and $M$ evaders are modeled as zero-sum games, where pursuers try to find an optimal strategy $u^*$ by minimizing the objective (5) subject to (2), while evaders try to maximize it. We use the notation $U_a[s, t]$ to stand for the following set.

$$U_a[s, t] \triangleq \{u(\tau) \mid u(\tau) \in U_a,\ s \le \tau \le t\}, \quad 0 \le s \le t \le T.$$

Similarly for $V_a[s, t]$. Use the notation $u_{s:t}$ for $u_{s:t} \in U_a[s, t]$. By (5), for $x_p \in X_p$ and $x_e \in X_e$, define $\bar{V}(x_p, x_e)$ as follows (see footnote 1).

$$\bar{V}(x_p, x_e) = \min_{u_{0:T} \in U_a[0,T]}\ \max_{v_{0:T} \in V_a[0,T]} \left\{ \int_0^T G(x_{pt}, x_{et}, u_t, v_t)\,dt + Q(x_T) \right\} \tag{6}$$

Similarly for $\underline{V}(x_p, x_e)$,

$$\underline{V}(x_p, x_e) = \max_{v_{0:T} \in V_a[0,T]}\ \min_{u_{0:T} \in U_a[0,T]} \left\{ \int_0^T G(x_{pt}, x_{et}, u_t, v_t)\,dt + Q(x_T) \right\}. \tag{7}$$

If

$$\bar{V}(x_p, x_e) = \underline{V}(x_p, x_e) = V(x_p, x_e), \tag{8}$$

$V(x_p, x_e)$ is called the Value function, and this condition is called the Isaacs condition [2]. This Value is the so-called saddle-point equilibrium defined for a zero-sum game [4]. In this paper, the capitalized “Value” stands for the value function defined in (8), avoiding the confusion with the ordinary meaning of the word.

III. CONSERVATIVE HIERARCHICAL APPROACH

A. Dilemma in Backward Analysis

A PE game has been formulated as a zero-sum game and the saddle-point equilibrium concept is adopted under the Isaacs condition. This fact makes the mathematical tools that are available for solving optimal control problems also useful in solving conventional two-player PE differential games. The kernel of modern optimal control theory includes Pontryagin’s minimum principle and Bellman’s dynamic programming. Both of them specify a set of conditions on optimal dynamic controls. The former involves a set of adjoint Ordinary Differential Equations (ODE) while the latter is associated with a partial differential equation called the Hamilton-Jacobi-Bellman (HJB) equation [15] (see footnote 2). In both approaches, boundary conditions on the states at the terminal time are needed. Furthermore, Isaacs’ method of “tenet of transition” in treating two-player PE differential games is closely related to dynamic programming, which is based on the underlying idea of state rollback. Starting from the terminal state, an optimal trajectory of the states is traced backwards and, with a formulation of the Hamilton-Jacobi-Isaacs (HJI) equation, the Value function can be determined, such that saddle-point equilibrium strategies of state feedback can be obtained accordingly.
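Returning to the definitions of Section II, the capture time $T_j$ and the terminal time $T$ in (4) can be approximated on sampled trajectories. The following is a minimal sketch, not the authors' implementation: the function name `capture_times`, the sampling step `dt`, and the assumption that positions are already projected onto a two-dimensional space $S$ with the Euclidean metric are all illustrative choices. The infimum in the definition is replaced by the first sample at which some pursuer is within $\varepsilon$ of the evader.

```python
import math

def capture_times(pursuer_paths, evader_paths, dt, eps):
    """Approximate T_j and T = max_j T_j from (4) on sampled trajectories.

    pursuer_paths[i][k] and evader_paths[j][k] are (x, y) positions at time
    k * dt (hypothetical data layout; all paths must have equal length).
    An evader that is never within eps of any pursuer gets math.inf.
    """
    times = []
    for e_path in evader_paths:
        t_j = math.inf
        for k, (ex, ey) in enumerate(e_path):
            # Capture: some pursuer is within eps of evader j at sample k.
            if any(math.hypot(p[k][0] - ex, p[k][1] - ey) <= eps
                   for p in pursuer_paths):
                t_j = k * dt
                break
        times.append(t_j)
    return times, max(times)

# Illustrative data: one pursuer moving along the x-axis, two evaders.
pursuer = [[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]]
evaders = [
    [(2.0, 0.0), (2.5, 0.0), (3.0, 0.0), (3.5, 0.0), (4.0, 0.0)],  # fleeing
    [(1.0, 0.0)] * 5,                                              # stationary
]
print(capture_times(pursuer, evaders, dt=0.5, eps=0.5))  # ([1.5, 0.5], 1.5)
```

The terminal time is governed by the last evader captured, which is why the upper-level problem later in the paper takes a min-max form.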
Footnote 1: We assume the optimum can be attained by some strategy pair $(u^*, v^*)$.

In the problem of a PE game involving multiple players, the backward analysis encounters tremendous difficulty in the process of tracing back. The main challenge lies in identifying the terminal states of both pursuers and evaders. Starting from the end, under a saddle-point strategy, each evader has been captured by some specific pursuer. If pursuer $i$ catches evader $j$ in the game, we say that both players are engaged. (Note that it is possible that one evader may be engaged with more than one pursuer and vice versa.) Clearly, the number of possible engagements between pursuers and evaders increases at least exponentially with $N$ and $M$. This explosion of the number of possible engagements makes the terminal state in a multi-player PE game extremely difficult to specify, in contrast to that in a two-player game. Even where it is not impossible, the backward analysis would have to be repeated a great many times, starting from different engagements, even when $N$ and $M$ are not very large. Furthermore, evaders are generally not captured at the same time, which makes determining the terminal state even more intractable. Therefore, the backward approach used by Isaacs and Bellman cannot be practically applied in multi-player PE games, and the same situation holds for the minimum principle as well. In summary, a multi-player PE differential game cannot be characterized by either a set of adjoint ODEs or an HJI equation.

B. Hierarchical Approach

To attack this problem, first let us study whether or not the game can be decomposed into games between the pursuers and the evaders that are closely engaged. Assume that there exist continuous trajectories for the group of pursuers and evaders that comprise a saddle-point solution for the game. The engagement scheme among the pursuers and evaders can be obtained at the terminal time of the game.
Under this engagement, pursuers do not switch evaders if evaders do not change their strategy, which is the basic rationale of the concept of equilibrium in game theory [2], [4]. In this sense, the problem of determining an engagement may be part of the pursuers’ strategies if they try to minimize (5). This can be viewed as a natural hierarchical structure in decision-making from the pursuers’ perspective. The upper level is to determine a proper engagement scheme among pursuers and evaders, while the lower level solves each engaged differential game.

However, given an engagement scheme, the original game cannot be treated as a set of decoupled PE games between pursuers and their engaged evaders. Coordination fills this gap. The major difficulty of multi-player PE differential games is to characterize the coordination among pursuers or evaders. Solving the problem by Isaacs’ approach for the various terminal times and engagements would be very difficult, if not impossible. In PE games, Singular Surfaces (SS) are almost always assumed to divide the state space into disjoint regions with continuous differentiability [1]-[2]. We believe that the number of SS increases dramatically with the number of pursuers and evaders, which makes the rigorous treatment of this kind of problem extremely difficult. This is because conventional optimal control theory requires conditions on the smoothness of the optimal functions.

Footnote 2: In differential games, the corresponding equation is called the Hamilton-Jacobi-Isaacs (HJI) equation, where “minimax” replaces “min”.

Instead of determining the exact equilibrium solutions, we seek an upper bound of the performance index from the pursuers’ perspective. In Isaacs’ approach, the “min-max” operator in (6) determines the best worst-case strategy. It is a conservative approach, and the optimal objective acts as the least uniform (no dependence on evaders) upper bound on the pursuers’ performance.
When this result coincides with that from (7), it becomes a well-defined solution for zero-sum games. In this paper, we determine a good upper bound instead of the least upper bound.

We focus on the capture time as our objective function. The objective function for a game with multiple players is $T = \max_j T_j$. Assume that each pursuer is engaged with at least one evader, and only captures one evader at a time. The original multi-player PE game can be converted into a hierarchical optimization problem. The upper level is to determine such an engagement that $T$ is minimized. Given an engagement, the strategy of each pursuer is obtained by solving decoupled two-player PE games based on Isaacs’ method at the lower level. This is a conservative approach because in many situations, the strategies of multiple pursuers are concealed so that the evaders cannot execute the “optimal” strategies against the engaged pursuers. The structure of the approach is illustrated in Fig. 1.

[Fig. 1. Hierarchical Structure Approach: an upper-level block “Optimization on Engagement” above several lower-level blocks, each labeled “Two-Player PE Game”.]

C. Optimization at the Upper Level

Let $V(x_p^i, x_e^j)$ denote the Value function if pursuer $i$ is engaged with evader $j$. Notice that $V(x_p^i, x_e^j) = T_j$. Assume that $V(x_p^i, x_e^j)$ can be solved analytically and the optimization problem at the upper level can be formulated as

$$\min_{b_{ijk}} J = \min_{b_{ijk}}\ \max_{j}\ \sum_{i=1}^{N} \sum_{k=1}^{K} V\big(x_p^i(k), x_e^j(k)\big)\, b_{ijk} \tag{9}$$

subject to $b_{ijk} \in \{0, 1\}$, $\sum_{i=1}^{N} \sum_{k=1}^{K} b_{ijk} = 1$ and $\sum_{j=1}^{M} b_{ijk} \le 1$.

In (9), the problem is formulated with multiple stages, taking into account the case when $N < M$. Here, $k$ is the index for stages; $b_{ijk}$ is a binary decision variable; $b_{ijk} = 1$ indicates that pursuer $i$ is engaged with evader $j$ at stage $k$; $b_{ijk} = 0$ means the opposite. The maximum number of stages considered is $K = \lceil M/N \rceil$, which is the smallest integer not less than $M/N$. Solving problem (9) provides an upper bound of the objective for the original multi-player PE game.
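For small problems, the upper-level min-max engagement can be found by direct enumeration. The sketch below covers only the single-stage case ($K = 1$, $N \ge M$), in which each evader is assigned to exactly one pursuer and each pursuer takes at most one evader. It is an illustration under those assumptions rather than the authors' solver; the function name `best_engagement` and the cost matrix in the example are hypothetical. The number of candidate engagements is $N!/(N-M)!$, which is why this brute-force search is only practical for a handful of players.

```python
from itertools import permutations

def best_engagement(V):
    """Single-stage min-max engagement by exhaustive search.

    V[i][j] is the capture time if pursuer i engages evader j (use
    float('inf') for a pair without guaranteed capture). Requires
    N >= M. Returns (T, a) where a[j] is the pursuer engaged with
    evader j and T = max_j V[a[j]][j] is minimal over all engagements.
    """
    n, m = len(V), len(V[0])
    if n < m:
        raise ValueError("single-stage form needs at least as many pursuers as evaders")
    best = None
    # Each ordered selection of m distinct pursuers is one engagement.
    for a in permutations(range(n), m):
        t = max(V[a[j]][j] for j in range(m))
        if best is None or t < best[0]:
            best = (t, a)
    return best

# Hypothetical capture times for 3 pursuers (rows) and 2 evaders (columns).
V = [[4.0, 2.0],
     [1.0, 3.0],
     [5.0, 6.0]]
print(best_engagement(V))  # (2.0, (1, 0)): pursuer 1 -> evader 0, pursuer 0 -> evader 1
```

Because the objective is the maximum over evaders rather than the sum, this is a bottleneck-type assignment; a sum-of-costs assignment solver would in general return a different engagement.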
When $N \ge M$ ($K = 1$), problem (9) can be converted into a standard Mixed Integer Linear Programming (MILP) problem by introducing a slack variable $w$, such that commercial solvers such as CPLEX and LINDO can be utilized [10]. A proper formulation is shown in (10).

$$\min J = \min w \tag{10}$$

subject to $T_j \le w$ for $1 \le j \le M$, with $T_j = \sum_{i=1}^{N} V(x_p^i, x_e^j)\, b_{ij}$, $b_{ij} \in \{0, 1\}$, $\sum_{i=1}^{N} b_{ij} = 1$ and $\sum_{j=1}^{M} b_{ij} \le 1$.

D. PE Differential Game of Two Players at the Lower Level

Problem (9) requires solving $V(x_p^i, x_e^j)$. A number of solutions have been achieved analytically [1]-[2]. The following theorem specifies a set of conditions for an equilibrium solution. Consider a general dynamics as

$$\dot{x}(t) = f(x, u, v) \quad \text{with } x(0) = x_0 \in \mathbb{R}^n. \tag{11}$$

The objective function is

$$J = \int_0^T G(x_t, u_t, v_t)\,dt + Q(x_T). \tag{12}$$

Theorem 1: For a two-player PE differential game described in (11) and (12), suppose that the control pair $\{u^*, v^*\}$ is a saddle-point solution and $x^*(t)$ is the corresponding trajectory. Then there exists a costate function $p(t): [0, T] \to \mathbb{R}^n$ such that the following conditions are satisfied:

$$\dot{x}^*(t) = f(x_t^*, u_t^*, v_t^*) \quad \text{with } x^*(0) = x_0,$$

$$H(p_t, x_t^*, u_t^*, v_t) \le H(p_t, x_t^*, u_t^*, v_t^*) \le H(p_t, x_t^*, u_t, v_t^*),$$

$$\dot{p}(t) = -\frac{\partial H(p_t, x_t^*, u_t^*, v_t^*)}{\partial x} \quad \text{and} \quad p(T) = \frac{\partial Q(x^*(T))}{\partial x} \ \text{at the terminal time},$$

where $H(p_t, x_t, u_t, v_t) = G(x_t, u_t, v_t) + p_t^T f(x_t, u_t, v_t)$.

Proof: See [2].

A two-player PE problem can be difficult depending on the dynamics and the positions of the players, and analytical solutions may be intractable. Here, we illustrate the idea by solving a PE game with simplified dynamics in order to reduce the complexity of determining $V(x_p^i, x_e^j)$. For practical problems, it is desired that the model reveal the main features of the players without involving too much of the details. Consider a PE game with two dimensions. The dynamics of the pursuer and the evader are

$$\dot{x}_p = v_p \cos\theta_p,\quad \dot{y}_p = v_p \sin\theta_p; \qquad \dot{x}_e = v_e \cos\theta_e,\quad \dot{y}_e = v_e \sin\theta_e. \tag{13}$$

Here $\theta$ is the control variable. Let the initial conditions be $x_{p0}$, $y_{p0}$, $x_{e0}$ and $y_{e0}$. In this model, players are assumed to perform at their maximum speeds and have complete maneuverability. Small ground vehicles can be approximately described by this dynamics. Define the new states as $x = x_p - x_e$ and $y = y_p - y_e$. It follows that

$$\dot{x} = v_p \cos\theta_p - v_e \cos\theta_e, \qquad \dot{y} = v_p \sin\theta_p - v_e \sin\theta_e \tag{14}$$

with $x_0 = x_{p0} - x_{e0}$ and $y_0 = y_{p0} - y_{e0}$. The terminal set $\Lambda$ is defined as $\Lambda = \{(x, y) \mid \|(x, y)\|_2 \le \varepsilon\}$. The objective function is the capture time, $J = \int_0^T 1\,dt$. The Hamiltonian is

$$H = 1 + V_x (v_p \cos\theta_p - v_e \cos\theta_e) + V_y (v_p \sin\theta_p - v_e \sin\theta_e).$$

According to Theorem 1, the costate equations are $\dot{V}_x = 0$ and $\dot{V}_y = 0$. Thus, $V_x$ and $V_y$ are constant. By the Main Equation in [1], $\min_{\theta_p} \max_{\theta_e} H = 0$, and we obtain

$$\cos\theta_p^* = \cos\theta_e^* = -\frac{V_x}{\rho}, \qquad \sin\theta_p^* = \sin\theta_e^* = -\frac{V_y}{\rho},$$

where $\rho = \sqrt{V_x^2 + V_y^2}$. It follows that

$$\sqrt{V_x^2 + V_y^2} = \frac{1}{v_p - v_e}.$$

By the definition of $\Lambda$, the terminal state $(x_T, y_T)$ satisfies

$$\frac{V_x}{V_y} = \frac{x_T}{y_T}.$$

Considering the system dynamics, then

$$V(x, y) = \frac{\sqrt{x^2 + y^2} - \varepsilon}{v_p - v_e}, \tag{15}$$

$$\cos\theta_p^* = \cos\theta_e^* = -\frac{x}{\sqrt{x^2 + y^2}}, \qquad \sin\theta_p^* = \sin\theta_e^* = -\frac{y}{\sqrt{x^2 + y^2}}. \tag{16}$$

E. Region of Capture

In practice, the feasibility of the hierarchical approach depends on the capturability between any pair of a pursuer and an evader. In this section, we briefly discuss this issue. In [1], Isaacs defined the concepts of game of degree and game of kind to distinguish between the problem of solving optimal solutions and the problem of existence. For a two-player PE game, denote by $C$ the capture region in the state space, where capturability is guaranteed. On its boundary $\partial C$, the following condition is satisfied.

$$\min_{u_t} \max_{v_t}\ \nu^T f(x_t, u_t, v_t) = 0 \tag{17}$$

Here, $\nu$ denotes the normal direction of $\partial C$. Equation (17) specifies a necessary condition of the capture region. It is similar to the HJI equation regardless of the specific objective function.

An alternative approach is based on a feedback control design method, which combines the concepts of function minimization and Lyapunov stability techniques. This method is called Lyapunov Optimizing Control (LOC) [13]-[14].
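For the simple-motion game (13) of Section III.D, the closed-form Value (15) and the common optimal heading (16) are straightforward to evaluate, and this is the quantity $V(x_p^i, x_e^j)$ that the upper-level problem consumes. The sketch below is illustrative only: the function name `simple_motion_value` is hypothetical, the capture radius `eps` is assumed to enter (15) as $(\sqrt{x^2+y^2} - \varepsilon)/(v_p - v_e)$, and a pair with $v_p \le v_e$ is treated as outside the capture region (an assumption consistent with the simple-motion dynamics, not a statement taken from the paper).

```python
import math

def simple_motion_value(xp, yp, xe, ye, v_p, v_e, eps=0.0):
    """Capture time (15) and common optimal heading (16) for dynamics (13).

    Uses the relative state x = xp - xe, y = yp - ye as in (14).
    Returns (time, heading in radians). The heading is None when the
    evader is already captured or when capture is not guaranteed.
    """
    x, y = xp - xe, yp - ye
    r = math.hypot(x, y)
    if r <= eps:
        return 0.0, None            # already inside the terminal set
    if v_p <= v_e:
        return math.inf, None       # assumed: slower pursuer cannot force capture
    # (16): cos(theta*) = -x / r and sin(theta*) = -y / r for both players,
    # i.e. both move along the line of sight from pursuer to evader.
    heading = math.atan2(-y, -x)
    return (r - eps) / (v_p - v_e), heading

t, theta = simple_motion_value(0.0, 0.0, 3.0, 4.0, v_p=2.0, v_e=1.0)
print(t)      # 5.0
print(theta)  # equals math.atan2(4.0, 3.0)
```

Both players share the same heading, so the optimal play is a straight-line tail chase and the separation shrinks at the constant rate $v_p - v_e$.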
The method of LOC can be used to study the capturability of PE games and to design a feasible pursuit strategy for pursuers [14]. It depends on a positive definite Lyapunov-type function $W(x)$. For any $x$ in the state space, if the following condition is satisfied for any $t$,

$$\inf_{u_t} \dot{W}(x) = \inf_{u_t} W_x(x)^T f(x_t, u_t, v_t) < 0 \quad \text{for } t \ge 0,\ \forall\, v_t \in V_a,$$

then $x$ belongs to the capture region. Clearly, this condition is sufficient and can be easily verified by stability analysis based on the function $W(x)$.

F. Summary and Discussion

The hierarchical approach introduced above is a suboptimal method. Instead of solving the problem formulated in (6) within the admissible set $U_a[0, T]$, we do optimization within a subset $U_a^S[0, T]$ of $U_a[0, T]$ by imposing a structure $S$ on the strategy of the pursuers, dividing the decision-making of pursuers into two levels. The problem in (6) can be rewritten in the following form.

$$\bar{V}^S(x_p, x_e) = \min_{u_{0:T} \in U_a^S[0,T]}\ \max_{v_{0:T} \in V_a[0,T]} \left\{ \int_0^T G(x_{pt}, x_{et}, u_t, v_t)\,dt + Q(x_T) \right\} \tag{18}$$

Equation (18) is almost the same as (6) except that optimization of the pursuers is taken over a structured control set $U_a^S[0, T]$. In some sense, it is only a “local” optimum. Clearly, the solution of (18) is an upper bound of the original objective function in (5), and $\bar{V}(x_p, x_e) \le \bar{V}^S(x_p, x_e)$.

It is also worth noting that the hierarchical approach described in this section is conceptually identical to a two-level Stackelberg game problem [11] or bi-level programming [12], where the optimization at the upper level depends on that at the lower level. Note that in a two-level Stackelberg problem, there may be more than one player at the second level, where each player chooses a strategy according to its individual objective. One special case is that those objective functions are decoupled from each other, which fits the hierarchical approach in this section. In general, for the more complicated coupled lower level, one has to formulate the second level as a game, and relevant solution concepts such as Nash or Pareto solutions may be adopted.

Finally, the quality of the solution from this suboptimal hierarchical approach depends on the combinatorial optimization problem in (9). This is an NP-hard problem [16], and the computational demand increases at least exponentially with the number of pursuers and of evaders, provided that the underlying $V(x_p^i, x_e^j)$ between pursuer $i$ and evader $j$ are solvable. Practically, for a PE game involving many pursuers and evaders, heuristic and approximation methods should be applied to (9). In that case, the upper bound obtained is further degraded by loss of optimality. This result would still be acceptable since no better solution is available.

IV. SIMULATION AND DISCUSSION

As stated earlier, conventional optimal control theory is not applicable. Even for the simplest multi-player PE game with two pursuers and one evader, if the evader is caught by one pursuer, the terminal state of the other pursuer cannot be specified. In this section, only the result from the hierarchical approach is presented.

We create a pursuit-evasion scenario involving 3 pursuers and 5 evaders in a two-dimensional space. The dynamics of players are given in (13). The capture time is considered as the objective function. The necessary parameters and the initial states are in Table I.

TABLE I
INITIAL STATES OF PURSUERS AND EVADERS

Pursuers              1         2         3
(x_p0, y_p0)        (0, 3)    (0, 5)    (0, 7)
v_p (1/sec)           6         5         7
ω_p (rad/sec)        0.8       0.8       0.8
θ_p0                 (·)/2     (·)/2     (·)/2

Evaders               1         2         3         4         5
(x_e0, y_e0)        (1, 5)    (4, 5)    (6, 5)    (7, 5)    (9, 5)
v_e (1/sec)           5         4         3         3         5

Solving the combinatorial optimization problem in (9), we obtain the optimal engagement shown in Table II. The corresponding capture time is 8.8 seconds.

TABLE II
BEST ENGAGEMENT RESULT

Order     P1      P2      P3
1         E5      E3      E2
2         N/A     E4      E1

Next, we consider slightly more complicated dynamics,

$$\dot{x}_p^i = v_p^i \cos\theta_p^i,\quad \dot{y}_p^i = v_p^i \sin\theta_p^i,\quad \dot{\theta}_p^i = \omega_p^i u_p^i; \qquad \dot{x}_e^j = v_e^j \cos\theta_e^j,\quad \dot{y}_e^j = v_e^j \sin\theta_e^j,\quad \dot{\theta}_e^j = \omega_e^j u_e^j, \tag{19}$$

with initial conditions $x_{p0}^i$, $y_{p0}^i$, $\theta_{p0}^i$, $x_{e0}^j$, $y_{e0}^j$ and $\theta_{e0}^j$, where $u_p$ and $u_e$ are control variables, and assume $-1 \le u_p, u_e \le 1$; $v_p$, $v_e$, $\omega_p$, $\omega_e$ are constant. Consider the case $\omega_e \gg \omega_p$, i.e., evaders are viewed to directly control their orientations. This model was originally used by Isaacs in studying the homicidal chauffeur game [1]. The angular velocity and the initial orientation of each pursuer are given in Table I (shaded). It can be verified that each evader $j$ is in the capture region of some pursuer $i$ [2]. While in the capture region, the optimal strategy of evader $j$ can be shown to be that in (16) by maximizing the time derivative of the distance between pursuer $i$ and evader $j$,

$$\dot{D} = \frac{x\,(v_e^j \cos\theta_e^j - v_p^i \cos\theta_p^i) + y\,(v_e^j \sin\theta_e^j - v_p^i \sin\theta_p^i)}{\sqrt{x^2 + y^2}}.$$

Here $x = x_e^j - x_p^i$ and $y = y_e^j - y_p^i$; note that these signs are opposite to those of the relative states in (14). Given a constant strategy for evader $j$, the strategy of pursuer $i$ may be solved by taking the second order derivative of $D$, which yields

$$u_p^i = -\operatorname{sign}\big(x \sin\theta_p^i - y \cos\theta_p^i\big). \tag{20}$$

With the hierarchy imposed on the decision-making of the pursuers, given a possible engagement scheme, we simulate every decoupled game, in which pursuers utilize the strategy in (20) and evaders play according to (16). All possible engagements are enumerated and the best engagement result is the same as that in Table II, but the capture time increases to 12.5 seconds. The corresponding pursuit-evasion trajectories are shown in Fig. 2, in which the trajectory of each pursuer with its engaged evaders is plotted separately. Snapshots at the 1st and the 5th second are illustrated.
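One decoupled lower-level game under dynamics (19) can be simulated with a simple forward-Euler loop. The sketch below is a hedged illustration and not the authors' simulation code: the function name `simulate_pair`, the step size, the capture radius, and the numerical values in the example are all hypothetical. It assumes the pursuer's bang-bang turn command takes the form $u_p = -\operatorname{sign}(x \sin\theta_p - y \cos\theta_p)$ with $x = x_e - x_p$, $y = y_e - y_p$, as in (20), and that the evader, which steers instantly, heads directly away from the pursuer along the line of sight, as in (16).

```python
import math

def simulate_pair(xp, yp, th_p, v_p, omega_p, xe, ye, v_e,
                  eps=0.1, dt=0.01, t_max=60.0):
    """Forward-Euler simulation of one pursuer-evader pair under (19).

    Returns the first time at which the separation is <= eps, or None
    if no capture occurs before t_max. All defaults are illustrative.
    """
    t = 0.0
    while t <= t_max:
        x, y = xe - xp, ye - yp              # line of sight, pursuer -> evader
        if math.hypot(x, y) <= eps:
            return t
        th_e = math.atan2(y, x)              # evader flees along the line of sight
        s = x * math.sin(th_p) - y * math.cos(th_p)
        u = -1.0 if s > 0.0 else 1.0         # bang-bang turn toward the evader
        xp += v_p * math.cos(th_p) * dt
        yp += v_p * math.sin(th_p) * dt
        th_p += omega_p * u * dt
        xe += v_e * math.cos(th_e) * dt
        ye += v_e * math.sin(th_e) * dt
        t += dt
    return None

# Hypothetical pair: pursuer at the origin heading north, evader to the east.
t_cap = simulate_pair(0.0, 0.0, math.pi / 2, v_p=2.0, omega_p=1.0,
                      xe=10.0, ye=0.0, v_e=1.0)
print(t_cap)
```

The capture time returned here exceeds the simple-motion value $(10 - 0.1)/(2 - 1) = 9.9$ for the same speeds, because the pursuer first has to turn toward the evader; this mirrors the increase in capture time the paper reports when moving from dynamics (13) to (19).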
[Fig. 2. Pursuit-evasion Trajectories under the Best Engagement. Panels: (a.1) By 1 Second and (a.2) By 5 Seconds, Pursuer 1 and Evader 5; (b.1) By 1 Second and (b.2) By 5 Seconds, Pursuer 2 and Evaders 3/4; (c.1) By 1 Second and (c.2) By 5 Seconds, Pursuer 3 and Evaders 2/1. Annotations in the plots: “E3 is captured” and “E2 is captured”.]

The simulation results demonstrate the cooperation of pursuers when the hierarchical approach is applied, where pursuers cooperate by choosing appropriate evaders to go after. In the trajectories shown in Fig. 2, the “best” strategy is utilized by evaders against each pursuer when engagement is determined. This pursuit strategy is conservative from the pursuer’s perspective, since in practice evaders may not know the pursuer’s strategies perfectly.

V. CONCLUSION AND FUTURE WORK

In this paper, we deal with a multi-player PE differential game. A generic problem is formulated. Conventional optimal control theory is not applicable to this kind of problem due to the difficulty in specifying the final states. A suboptimal method is proposed to calculate a locally optimal strategy with a specified control structure imposed on pursuers. A combinatorial optimization problem is formulated to determine an optimal engagement involving a single pursuer and a single evader. The underlying two-player games are solved by currently available differential game theory. The issue of capturability is discussed. Simulations show the feasibility of the approach. This suboptimal approach provides a theoretical upper bound to a multi-player PE game when exact equilibrium solutions are unknown. In practice, solving the combinatorial optimization problem suffers from the difficulty of its NP-hardness.

Future work falls in the following directions: 1) the locally optimal strategy obtained by the hierarchical method may be improved and, if the improvement can be implemented iteratively, a true equilibrium solution may be approached asymptotically; 2) approximation and heuristic methods may be designed to reduce the complexity of solving the combinatorial optimization problem in practical implementations; 3) PE games may be modeled in a stochastic environment for more realistic conditions.

REFERENCES

[1] R. Isaacs, Differential Games, John Wiley & Sons, Inc., New York, 1965.
[2] T. Basar and G.J. Olsder, Dynamic Noncooperative Game Theory, 2nd Ed., Society for Industrial and Applied Mathematics, 1998.
[3] D.P. Bertsekas, Dynamic Programming and Optimal Control: Volume 1, 2nd Edition, Athena Scientific, Belmont, Massachusetts, 2000.
[4] M.J. Osborne and A. Rubinstein, A Course in Game Theory, MIT Press, Cambridge, Massachusetts, 1994, pp. 73-89.
[5] V. Turetsky and J. Shinar, “Missile Guidance Laws Based On Pursuit-Evasion Game Formulations,” Automatica, Vol. 39, No. 3, pp. 740-746, 2003.
[6] R. Vidal, O. Shakernia, H.J. Kim, D.H. Shim and S. Sastry, “Probabilistic Pursuit-evasion Games: Theory, Implementation, and Experimental Evaluation,” IEEE Transactions on Robotics and Automation, Vol. 18, pp. 662-669, 2002.
[7] S.M. LaValle, D. Lin, L. Guibas, J.C. Latombe and R. Motwani, “Finding an unpredictable target in a workspace with obstacles,” in IEEE Int. Conf. Robot. & Autom., 1997.
[8] R. Vidal, S. Rashid, C. Sharp, O. Shakernia, J. Kim and S. Sastry, “Pursuit-evasion games with unmanned ground and aerial vehicles,” in IEEE Int. Conf. Robot. & Autom., pp. 2948-2955, 2001.
[9] H. Yamaguchi, “A distributed motion coordination strategy for multiple non-holonomic mobile robots in cooperative hunting operations,” in Proceedings of the 41st IEEE Conference on Decision and Control, pp. 2984-2991, December 2002.
[10] T. Schouwenaars, E. Feron, B. de Moor and J. How, “Mixed Integer Programming for Multi-vehicle Path Planning,” European Control Conference, September 2001.
[11] M. Simaan and J.B. Cruz Jr., “On the Stackelberg strategy in nonzero-sum games,” Journal of Optimization Theory and Applications, Vol. 11, No. 5, pp. 533-555, 1973.
[12] B. Colson, P. Marcotte and G. Savard, “Bilevel programming: A survey,” 4OR: A Quarterly Journal of Operations Research, Vol. 3, pp. 87-107, 2005.
[13] T.L. Vincent, “Guidance Against Maneuvering Targets Using Lyapunov Optimizing Feedback Control,” Proceedings of the American Control Conference, Anchorage, AK, 2002.
[14] D.J. Sticht, T.L. Vincent and D.G. Schultz, “Sufficiency Theorems for Target Capture,” Journal of Optimization Theory and Applications, Vol. 17, No. 5/6, 1975.
[15] J. Yong and X.Y. Zhou, Stochastic Controls: Hamiltonian Systems and HJB Equations, Springer, 1999.
[16] C.H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1982.