Optimal Policy in Discrete Pursuit-Evasion Games

Marcos A. M. Vieira and Ramesh Govindan and Gaurav S. Sukhatme

Abstract: We describe an algorithm that computes, for a discrete Pursuit-Evasion game, the optimal policy that pursuers should use to capture evaders in the minimum number of steps. We assume that all players have full knowledge and that evaders also play an optimal strategy. We illustrate how convergence time varies by topology. We have implemented this algorithm, and present results on a physical multi-robot testbed.

I. INTRODUCTION

We are motivated by practical problems in security and monitoring for large, structured spaces (e.g., to ensure the integrity of a large building or complex). The problem we focus on is pursuit-evasion, wherein robots must pursue and catch evaders. In Pursuit-Evasion Games (PEGs), multiple robots (the pursuers) collectively determine the location of one or more evaders, and try to corral them. The game terminates when every evader has been corralled by one or more robots.

Several versions of the problem exist. In certain frameworks, it is acceptable to merely “sight” an evader for it to be “located”; in others, a precise coordinate must be reported. Other formulations insist on a certain speed of convergence with fewer constraints on accuracy. Finally, formulations vary depending on whether the multi-robot control algorithm is required to have provably correct behavior, whether the number of evaders is known a priori, and whether they are malicious or benign. Each variation of the problem brings with it a different set of challenges, and several of these variations have been solved to varying degrees.

We focus on PEGs in bounded, spatially complex environments similar to today’s office environments. In such environments, we can assume imperfect geometric regularity (e.g., the presence of corridors and 90-degree turns, but possibly an irregular placement of doorways or elevator exits).
Moreover, it is increasingly true that such environments are well provisioned with wireless communication capability, and that many such environments will likely have dense embedded sensing (for surveillance or environmental control). This environment-embedded sensing can provide the players with full knowledge of the game.

We consider the class of PEGs played on a discrete graph. Specifically, we use a topological map of the environment, whose nodes correspond to coarse-grained regions and whose links connect neighboring regions [1], [2]. Discrete graph-based games are acceptable for many uses of pursuit-evasion (e.g., surveillance, finding survivors).

Marcos Vieira and Ramesh Govindan and Gaurav Sukhatme are with the Department of Computer Science, University of Southern California, 941 W. 37th Place, Los Angeles, CA, USA {mvieira,ramesh,gaurav}@usc.edu

In this paper, we consider a version of the game in which M pursuers collectively attempt to capture N evaders (Section II). We are interested in the convergence time of the game (i.e., the minimum number of steps for the pursuers to capture the evaders). We present what is, to the best of our knowledge (Section VI), the first algorithm with provably minimal convergence time for multiple pursuers and evaders (Section III). In our formulation, all players have complete knowledge of the state of others, and the evaders also pursue an optimal evasion strategy. Clearly, convergence time depends on the topology as well as on the number of pursuers and evaders: we explore these dependencies for a few canonical topologies (Section IV). Finally, we present results from running an implementation of our algorithm on a physical robot testbed (Section V). The experiments confirm the feasibility of our algorithm, even on a worst-case initial configuration.

II. GAME DEFINITION AND ASSUMPTIONS

Let G = (V, L) be a finite connected undirected graph with vertex set V and edge (line) set L.
There are two teams of players, called pursuers P and evaders E. Initially, P and E occupy some vertices of G. In describing the algorithm, we assume that time is discrete and increments in steps of 1. At each time step, all pursuers and evaders are given the poses of all participants. Both teams play a game on G according to the following rule. At each step, each pursuer chooses a neighboring vertex of G to move to, then the evaders do the same. They then move to the corresponding vertex in G, as defined in [3], and repeat the previous step. An evader is captured if, at some time t, a pursuer is on the same vertex as that evader. The team of pursuers P wins if it captures all evaders. If an evader can avoid capture indefinitely, then the evader team wins the game.

We can now define our game more formally as follows:

Input: Coarse estimated poses of M pursuer robots and N evaders in a bounded environment.
Output: Motion commands for the M pursuer robots.
Goal: Minimize the capture time of the evaders.
Restriction: No motion model of the evaders is available to pursuers.

Before describing the optimal strategy for this game, we discuss some assumptions and describe our notation. We assume the pursuers and the evaders have full knowledge. This is possible in an indoor scenario with an environment-embedded wireless communication network that also provides a sensing capability. The communication network provides a backbone for sharing game state, and the sensing capability detects evader positions, thereby giving full visibility to the pursuers. We assume that the evaders also play an optimal strategy: they too have, at each time instant, the locations of all the pursuers, and decide their moves with a view to avoiding capture for the longest possible time. Finally, we also assume that pursuers and evaders move at the same speed (more precisely, we assume that they move exactly one hop in the topology at each time step).

Let p = |P|, e = |E|, v = |V|.
Let $P_i$ be the position of the ith pursuer and $E_i$ be the position of the ith evader. The tuple $a = \langle P_1, \ldots, P_p, E_1, \ldots, E_e \rangle$ represents the current position of all participants. We define a boolean variable T (turn) to denote whether or not it is the pursuers’ turn to move (recall that, in our algorithm, pursuers and evaders alternate at each time step).¹ We say that the tuple $\langle a, T \rangle$ encodes the state s of a game. In general, a game can have $2 \cdot v^{p+e}$ states, because each pursuer and evader can be at one of v positions, and for each configuration of pursuers and evaders, there are 2 turns (evader and pursuer moves). Each game can be represented by a sequence of transitions through this state space.

Each pursuer and evader executes a deterministic algorithm (called its policy) for determining, given the current state, what the next move should be. We call ρ the pursuer policy and ε the evader policy. Since we consider deterministic policies, if a state is repeated in a particular game, the game will not terminate and the evaders win.

The game terminates when a capture state is reached. In a capture state, at least one pursuer occupies each vertex in which an evader resides. A different definition of termination also exists: if, during the evolution of the game, a pursuer reaches an evader’s position, the evader exits the game. It is easy to see that our definition results in a game that is strictly harder than this variant. When evaders exit the game, the remaining pursuers can always, and more quickly, capture the remaining set of evaders. Indeed, there exist cases where our game might not terminate (a 2-pursuer 2-evader game, in some topologies) but this variant will.

III. THE OPTIMAL POLICIES FOR PURSUIT-EVASION

Pursuit-Evasion is a zero-sum game, since the pursuers’ gain or loss is exactly balanced by the losses or gains of the evaders. The evaders’ goal is to escape for as long as possible, whereas the pursuers have to capture the evaders as fast as possible.
Zero-sum games have been extensively studied in the game theory literature, and our solution models a PEG as a zero-sum game that uses the minimax algorithm [4]. This algorithm minimizes the maximum possible loss for each player in the game.

To describe this algorithm, consider first that the evolution of any PEG can be represented by a game graph, a directed graph with possible cycles. The start state (as defined by the starting configuration of the pursuers and evaders) has a directed edge from itself to every state that the pursuers can reach in one move from the start state. (In our game, we assume that pursuers and evaders move alternately.) In turn, from each of these states, there is a directed edge to all possible next states resulting from evaders’ moves from that state. The graph can thus be recursively defined.

1 This assumption enables us to analyze our algorithm. Of course, in real world experiments, it is difficult to ensure this synchrony.

In general, a game is a traversal on this graph. If this traversal ends in a capture state, the pursuers win the game. However, it is also possible for the traversal to repeat states: such a traversal will result in a non-terminating game and the evaders win.

Suppose now that, in the game graph, we assign to each state s a cost C(s).
• When an evader moves, C(s) denotes the maximum distance from state s to a capture state, and
• When a pursuer moves, C(s) represents the minimum distance from state s to a capture state.

In our game, we consider the following policies:
• The pursuers’ policy ρ is: choose the neighboring state in the game graph which has the smallest C(s). Intuitively, this moves the game at each step as close as possible to a capture state.
• The evaders’ policy ε is: choose the neighboring state in the game graph which has the largest C(s). Intuitively, this moves the game at each step as far away as possible from a capture state. Thus, the evader is truly adversarial.
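The two policies above amount to an argmin and an argmax over the successor states. The following Python sketch illustrates this on a made-up fragment of a game graph; the state names, the `successors` table, and the cost values are hypothetical and serve only to show the selection rule, assuming the costs C(s) have already been computed.

```python
from math import inf

def pursuer_move(state, successors, cost):
    """Policy rho: pick the successor state with the smallest cost C."""
    return min(successors[state], key=lambda s: cost[s])

def evader_move(state, successors, cost):
    """Policy epsilon: pick the successor state with the largest cost C."""
    return max(successors[state], key=lambda s: cost[s])

# Hypothetical fragment of a game graph (names and costs are illustrative).
successors = {
    "start": ["a", "b"],      # pursuers to move
    "a": ["c", "d"],          # evaders to move
    "b": ["d", "escape"],     # evaders to move
}
cost = {"a": 2, "b": inf, "c": 1, "d": 2, "escape": inf}

print(pursuer_move("start", successors, cost))
print(evader_move("a", successors, cost))
print(evader_move("b", successors, cost))
```

An infinite cost marks a state from which the evaders can avoid capture forever, so the evader rule naturally prefers it and the pursuer rule naturally avoids it.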
In what follows, we first show how to compute the game graph efficiently. Then, we prove that these policies are optimal from the pursuers’ perspective: if the game terminates, they reach a capture state in the smallest possible number of moves.

First, we construct the game graph by generating all states and all possible transitions between states. Initially, all states have cost infinity, except the initial set of capture states, which have cost 0 (Algorithm 1).

Let us define F0 as the set of capture states, in other words, the states where at least one pursuer occupies each vertex in which an evader resides. These states have cost value 0 since no move is necessary for the termination of the game.

Now define F1 as the set of states which can reach one of the capture states in one pursuer move. F1 consists of both states where it is the pursuers’ turn to move, and states where it is the evaders’ turn to move. A state where it is the evaders’ turn to move belongs to F1 if, irrespective of the next move made by the evaders, the pursuers can still reach a capture state in a single step.

Similarly, we can define inductively the set Fi+1 as the states from which the pursuers need only i + 1 steps to terminate the game, irrespective of the evaders’ movements.

For any state s = (a, 1), when it is the pursuers’ turn to move, the cost is $C(s) = \min_{s'} C(s') + 1$, where s′ ranges over the states that the pursuers can transition to from s. For any state s = (a, 0), when it is the evaders’ turn to move, the cost is $C(s) = \max_{s'} C(s')$, where s′ ranges over the states that the evaders can transition to from s.

For a pursuer, the goal is to reach a capture state, i.e., to minimize C. For an evader, if C(s) is finite, where s is the current state, then irrespective of what the evader does, it will be captured within C(s) pursuer moves. The evader should then choose a transition from a to a′ which maximizes the time to capture.
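The layered construction of F0, F1, ... and the cost recursion above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `solve_peg`, the adjacency-dictionary input, and the choice to let every player stay in place are assumptions made here. Each round first marks evader-turn states whose moves all lead to marked states, then pursuer-turn states with at least one move to a marked state.

```python
from itertools import product
from math import inf

def solve_peg(adj, p, e):
    """Return (cost, rho, eps) for p pursuers and e evaders on graph adj.

    adj maps each vertex to a list of neighbors. A state is (a, turn), where
    a lists pursuer positions followed by evader positions, and turn is 1
    when the pursuers move next and 0 when the evaders move next.
    """
    verts = sorted(adj)
    # Assumption: a player may stay put, so each vertex is also its own option.
    moves = {x: [x] + list(adj[x]) for x in verts}

    def captured(a):
        pursuers = set(a[:p])
        return all(x in pursuers for x in a[p:])

    def successors(state):
        a, turn = state
        if turn == 1:  # pursuers move, evaders keep their positions
            return [(m + a[p:], 0) for m in product(*(moves[x] for x in a[:p]))]
        return [(a[:p] + m, 1) for m in product(*(moves[x] for x in a[p:]))]

    states = [(a, t) for a in product(verts, repeat=p + e) for t in (0, 1)]
    cost = {s: (0 if captured(s[0]) else inf) for s in states}  # F0 has cost 0
    rho, eps = {}, {}
    changed = True
    while changed:
        changed = False
        # Evader turn: marked only if every move leads to a marked state.
        for s in states:
            if s[1] == 0 and cost[s] == inf:
                succ = successors(s)
                if all(cost[n] < inf for n in succ):
                    best = max(succ, key=cost.get)
                    cost[s], eps[s] = cost[best], best
                    changed = True
        # Pursuer turn: marked if some move leads to a marked state.
        for s in states:
            if s[1] == 1 and cost[s] == inf:
                succ = [n for n in successors(s) if cost[n] < inf]
                if succ:
                    best = min(succ, key=cost.get)
                    cost[s], rho[s] = cost[best] + 1, best  # count this move
                    changed = True
    return cost, rho, eps

# Path 0 - 1 - 2 with one pursuer at 0 and one evader at 2, pursuers to move.
path = {0: [1], 1: [0, 2], 2: [1]}
cost, rho, eps = solve_peg(path, p=1, e=1)
print(cost[((0, 2), 1)])  # prints 2
```

On a 4-cycle, the same sketch leaves the state with one pursuer opposite one evader at infinite cost, which matches the intuition that a single pursuer cannot corner an evader on a ring. The sketch rescans all states in every round for clarity; a worklist over predecessors would avoid the repeated scans.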
If any state has infinite cost, the number of pursuers is not sufficient to capture the evaders, and it is necessary to add more pursuers to the game.

Algorithm 1 Algorithm for computing the game graph.
    {Initialization}
    Generate all states
    Generate all possible transitions
    for all state s do
        if s is a capture state then
            add s to F0
            C(s) ← 0 {cost function}
        else
            C(s) ← ∞
        end if
    end for
    i ← 0
    repeat
        i ← i + 1
        change ← false
        U ← set of all unmarked states that have a transition to a marked state
        for all s in U do
            if s is a pursuer move then
                if s has at least one transition to a marked state then
                    add s to Fi {mark s}
                    C(s) ← min (over all marked neighbors) + 1 {count this move}
                    add transition to ρ
                    change ← true
                end if
            else {evader move}
                if all transitions from s reach a marked state then
                    add s to Fi {mark s}
                    C(s) ← max (over all marked neighbors)
                    add transition to ε
                    change ← true
                end if
            end if
        end for
    until not change

Algorithm 1 computes the optimal policies for evaders and pursuers that we describe above. The algorithm loops as long as states are added to Fi. Since each state can be added only once, the algorithm eventually terminates.

Theorem 3.1: Algorithm 1 provides the optimal policy ρ for pursuers to play the game in a graph G.

Proof: We prove this by induction. The induction hypothesis is that if state s belongs to Fi, then the game will terminate in at most C(s) = i steps, independent of evader movements. If i = 0, by definition of Fi, s is a capture state, and the game is over. Suppose the claim holds for i, and let us prove it for i + 1. Let state s = (a, 1) ∈ Fi+1. By definition of Fi+1, there exists a transition (s, (a′, 0)) with (a′, 0) ∈ Fi. Let the pursuer move. The cost will be the minimum of C(a′) + 1. Suppose the evader can escape from arrangement a′. But then, the evader would have chosen a transition with cost infinity, and (a′, 0) would not have been in Fi.
By definition of Fi, there exists a transition ((a′, 0), (a′′, 1)), and (a′′, 1) is a state with guaranteed capture and C(a′′) = i. Hence, if a state s belongs to Fi+1, the cost C(s) is 1 (to get to (a′′, 1)) plus C(a′′), and the game from (a′′, 1) terminates in i steps by the induction hypothesis.

For a graph with v vertices, the time complexity and space complexity are $O(v^{p+e})$.

IV. RESULTS

Table I shows some instances of games and their properties. The first and second columns are the topology name and its graphical representation, respectively. The third column is the number of pursuers necessary to guarantee the termination of the game. The fourth and fifth columns give the maximum number of steps to terminate the game, given the number of pursuers. It is interesting to see that even though the torus topology needs at least 3 pursuers, if we play the game with 3 pursuers, the game will be shorter than in a 4x4 grid. Another interesting fact is that increasing the number of edges from the 2D 4x4 grid to the 4x4 cylinder does not increase the necessary number of pursuers, but it decreases the number of steps to terminate the game. Increasing the number of edges again (from cylinder to torus) increases the necessary number of pursuers.

Fig. 1. Ring Topology

Figure 1 shows a ring topology with 5 nodes. A game with 2 pursuers and 2 evaders will not terminate in this topology, since we defined the game to end only when all the evaders are captured. To illustrate this, suppose pursuer 1 stayed on the same location as evader 1. Pursuer 2 needs to capture evader 2, but evader 2 just needs to stay away from pursuer 2.

TABLE I
INSTANCES OF GAMES AND THEIR PROPERTIES

Topology Name       Graphical            C(G)   2 pursuers   3 pursuers
Grid 2D 4x4         (figure not shown)   2      6            6
Cylinder Grid 4x4   (figure not shown)   2      5            5
Torus Grid 4x4      (figure not shown)   3      ∞            4

V. EVALUATION

In this section, we describe the results on a physical robot testbed.
To validate that our scheme is implementable, we played a 4 pursuer, 2 evader game. We analyzed all possible such games and chose a worst-case initial configuration (there can be more than one initial configuration with the same convergence time).

Our robot platform consists of an iRobot Create and a small embedded computer mounted on top of it (Figure 2). The Create, a differential drive robot, has a round chassis of 33 cm diameter. The embedded computer, the Ebox 3854, is an 800MHz embedded PC with 256MB shared DDR memory, and supports compact flash sockets. The embedded computer runs Linux Fedora Core 6 as the operating system. For sensing and control, we developed a Create driver for Player [5]. The nominal speed is 0.2 m/s.

The robots use the network shown in Figure 3. This network is deployed above the false ceiling on one floor of a large office building and consists of two tiers: an upper tier containing 9 Stargates (embedded computers running Linux), and a lower tier containing 56 tmoteSky motes (tiny commercial sensor nodes). The sensor nodes run an 8MHz Texas Instruments MSP430 microcontroller, have 10KB RAM and a 2.4 GHz IEEE 802.15.4 Chipcon Wireless Transceiver with a nominal bit rate of 250 Kbps. The sensor nodes provide the capability of a virtual position sensor, and can sense the position of pursuers and evaders using the radio signal strength indicator (RSSI).

Fig. 2. The robot platform - an iRobot Create with an Ebox

Fig. 3. Layout of the network testbed

We executed the algorithm discussed in this paper on a computer and generated a policy file for pursuers and evaders. This file (available to all robots) contains the next state to move to, given the current state. We encode a state as a number in base v. For example, suppose that in a 3 pursuer, 1 evader game with 3 topological nodes, the evader is at location 1 and the pursuers are located at 2, 2, and 0, respectively. The sequence 1220 in base 3 is 51 in decimal. Thus, the current state is represented by the number 51.
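The base-v state encoding in the example above, together with its inverse, can be sketched as follows. The function names `encode_state` and `decode_state` are hypothetical, and the digit order (evader first, then pursuers) simply follows the worked example; the layout of the actual policy file is not specified here.

```python
def encode_state(positions, v):
    """Read the positions as the digits of a base-v number, first digit most significant."""
    code = 0
    for pos in positions:
        if not 0 <= pos < v:
            raise ValueError(f"position {pos} is not a valid node index for v={v}")
        code = code * v + pos
    return code

def decode_state(code, v, n):
    """Recover the n positions (as base-v digits) from a state number."""
    digits = []
    for _ in range(n):
        code, digit = divmod(code, v)
        digits.append(digit)
    return tuple(reversed(digits))

# Example from the text: evader at 1, pursuers at 2, 2 and 0, on 3 nodes.
print(encode_state((1, 2, 2, 0), v=3))   # prints 51
print(decode_state(51, v=3, n=4))        # prints (1, 2, 2, 0)
```

With this representation a robot can recover its own next position by decoding the next state number and reading off the digit at its own index.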
For a 4 pursuer, 2 evader game on 9 nodes, the file size is about 240 Kb. Each robot has an id, which gives a priority. Robots with high priority have their positions coded first. In this way, robots can coordinate among themselves without the need to communicate directly with each other. Robots receive position estimates from the network, check their policy file, and get the next state number to move to. With the id and the next state number, each robot determines its next position by decoding the state number. To decode, each robot considers the next state number as a number in base v, and extracts the ith digit, where i represents its priority.

We play the games on the floor plan shown in Figure 3, using the network whose nodes are shown in that figure. We use a topological map of the environment consisting of 9 nodes, whose nodes correspond to coarse-grained regions and whose links connect neighboring regions, as shown in Figure 4.

Figure 4 shows the trajectory of each pursuer during one instance of a game. The nodes and solid undirected edges connecting them represent the topological map. The solid directed lines (lines with arrows) show the pursuer paths as determined by the localization system. The dashed lines illustrate the evader paths. The edge labels represent the time sequence of the robot. The pursuers’ and evaders’ initial positions are indicated by the corresponding icons. Our robots can only follow walls in one direction (and must cross the corridor to reverse direction) and therefore must correct their orientation a posteriori. For example, it is not possible for the robots to go from position 2 to position 1 directly because the robots follow the walls to their right.

To make this clear, we give a narrative of the game. All pursuers start at position 4 and all evaders start at position 0. Pursuer 3 is the first to move (it tries to corral by going around).
The other pursuers do not move because they know pursuer 3 has to go around so that they can corral the evaders. At time 1, pursuer 3 is at location 3. At time 2, pursuer 1 tries to capture the evaders by moving towards them. Pursuer 3 moves to location 2. At time 3, pursuer 2 tries to capture the evaders by moving towards them and moves to location 3. Pursuers 1 and 3 keep moving and are at locations 2 and 5, respectively. At time 4, pursuer 4 moves closer to the evaders just once; this move is not really necessary in the game. Pursuers are at 5,2,6,3. At location 5, pursuers 1 and 2 switch their orientation (turn around to another wall). At time 5, pursuers are at 2,5,6,3. At time step 6, pursuers are at 1,5,6,3 and evaders are at location 0. If no evader moved, pursuer 1 would capture them. So, at time step 7, evader 2 tries to escape by going to location 1. Evader 2 knew evader 1 was not captured, thus it tried to escape even though it went to the same location as pursuer 1. Evader 1 did not move to location 8 because it would be captured by pursuer 3 anyway. Finally, at time 8, pursuer 1 captured evader 1 at location 0 and pursuer 2 captured evader 2 at location 1. Pursuer 2 is at location 8 to guarantee the evaders are corralled.

In a simulation analysis, the convergence time for the game to end is 6 steps. Our experiment took 8 steps because our robots can only follow walls in one direction and we did not take this constraint into account in our analysis. Indeed, if one takes into account that moving from node 2 to node 1 requires going to node 5 and node 2 again, our results match our analysis. Hence, the experiments confirm the feasibility of our algorithm.

Fig. 4. A 4 pursuer, 2 evader game

VI. RELATED WORK

To situate our work in the existing literature, we classify the type of PEGs using seven criteria: the ratio of the number of pursuers to the number of evaders; whether pursuers and evaders have full and/or global visibility, or whether they can only see within a threshold distance or until occluded by an obstacle (usually modelled by the edge of a polygon in 2D); what additional information robots have with respect to the opponents’ strategy or planning algorithm; whether the environment is modelled as a graph (discrete) or a polygon (continuous half-space with lines in 2D as boundaries); how the evader is captured, whether by being surrounded, seen or sensed by the pursuer, or approached within a certain distance, or physically contacted; the relative speed between the pursuer and evader; and how much uncertainty in sensing, actuation and communication is injected into the game.

Our work assumes the pursuers and evaders have full visibility; they can know about others’ strategy; the environment is modelled as a graph; the evaders are captured only if all of them share a vertex with some pursuer; the evader speed is the same as the pursuer speed; and we did not consider uncertainty.

Other work has explored theoretical bounds on eventual capture [6], [7], or pursuit-evasion under constrained geometries [8], [9], [10], [11], or has examined sophisticated control strategies [12], [13].

In [14], an algorithm to determine if K pursuers are sufficient to capture an evader is presented. The minimum number of pursuers necessary to capture an evader is called the cop number c(G). The authors also showed that every graph is topologically equivalent to a graph with pursuer number at most two.

Pursuit-Evasion Games are also called Cops and Robbers by mathematicians.
In the survey presented by Alspach [15], a number of references on the necessary number of pursuers for a given graph class can be found. Aigner and Fromme [6] proved that in a planar graph G, 3 pursuers are sufficient for the pursuers to win the game. Quilliot [16] extended this result, giving an upper bound on the number of pursuers depending on the genus of the graph G. In [17], the necessary number of pursuers is studied under three graph product operations.

To the best of our knowledge, we are the first to present an algorithm to minimize the time to capture all evaders.

VII. CONCLUSIONS

We presented an optimal algorithm (causing pursuers to take the minimum number of steps to win a Pursuit-Evasion game in a discrete graph). We illustrated how convergence time varies with different topologies. We have validated the feasibility of our algorithm by experimentally playing mobile robot-based pursuit-evasion games on a physical testbed.

REFERENCES

[1] B. Kuipers and Y.-T. Byun, “A robot exploration and mapping strategy based on a semantic hierarchy of spatial representations,” Tech. Rep. AI90-120, 1, 1990. [Online]. Available: citeseer.ist.psu.edu/kuipers91robot.html
[2] M. J. Matarić, “Integration of representation into goal-driven behavior-based robots,” IEEE Transactions on Robotics and Automation, vol. 8, no. 3, pp. 304–312, 1992.
[3] R. J. Nowakowski and P. Winkler, “Vertex-to-vertex pursuit in a graph,” Discrete Mathematics, vol. 43, no. 2-3, pp. 235–239, 1983.
[4] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Pearson Education, 2003. [Online]. Available: http://portal.acm.org/citation.cfm?id=773294
[5] B. Gerkey, R. Vaughan, and A. Howard, “The player/stage project: Tools for multi-robot and distributed sensor systems,” in 11th Int. Conf. on Advanced Robotics (ICAR 2003), June 2003, http://playerstage.sourceforge.net.
[6] M. Aigner and M. Fromme, “A game of cops and robbers,” Tech. Rep., 1984.
[7] J. Sgall, “Solution of David Gale’s lion and man problem,” Theor. Comput. Sci., vol. 259, no. 1-2, pp. 663–670, 2001.
[8] L. J. Guibas, J.-C. Latombe, S. M. LaValle, D. Lin, and R. Motwani, “Visibility-based pursuit-evasion in a polygonal environment,” in WADS ’97: Proceedings of the 5th International Workshop on Algorithms and Data Structures. London, UK: Springer-Verlag, 1997, pp. 17–30.
[9] R. Murrieta-Cid, T. Muppirala, A. Sarmiento, S. Bhattacharya, and S. Hutchinson, “Surveillance strategies for a pursuer with finite sensor range,” Int. J. Rob. Res., vol. 26, no. 3, pp. 233–253, 2007.
[10] S. Bhattacharya, S. Candido, and S. Hutchinson, “Motion strategies for surveillance,” in Robotics: Science and Systems, 2007.
[11] W. Cheung, “Constrained pursuit-evasion problems in the plane,” Master’s thesis, University of British Columbia, 2005.
[12] R. Vidal, O. Shakernia, H. J. Kim, D. H. Shim, and S. Sastry, “Probabilistic pursuit-evasion games: theory, implementation, and experimental evaluation,” Robotics and Automation, IEEE Transactions on, vol. 18, no. 5, pp. 662–669, 2002.
[13] S. Oh, L. Schenato, P. Chen, and S. Sastry, “Tracking and coordination of multiple agents using sensor networks: system design, algorithms and experiments,” Proceedings of the IEEE, vol. 95, no. 1, pp. 234–254, January 2007. [Online]. Available: http://www.truststc.org/pubs/244.html
[14] A. Berarducci and B. Intrigila, “On the cop number of a graph,” Adv. Appl. Math., vol. 14, no. 4, pp. 389–403, 1993.
[15] B. Alspach, “Searching and sweeping graphs: a brief survey,” Le Matematiche (Catania), vol. 59, pp. 5–37, 2004.
[16] A. Quilliot, “A short note about pursuit games played on a graph with a given genus,” J. Comb. Theory, Ser. B, vol. 38, no. 1, pp. 89–92, 1985.
[17] S. Neufeld and R. Nowakowski, “A game of cops and robbers played on products of graphs,” Discrete Math., vol. 186, no. 1-3, pp. 253–268, 1998.