University of Pennsylvania ScholarlyCommons
Lab Papers (GRASP), General Robotics, Automation, Sensing and Perception Laboratory, 8-1-2009

Citation: Berman, S.; Halasz, A.; Hsieh, M.A.; Kumar, V., "Optimized Stochastic Policies for Task Allocation in Swarms of Robots," IEEE Transactions on Robotics, vol. 25, no. 4, pp. 927-937, Aug. 2009. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5161293&isnumber=5191252. Copyright 2009 IEEE. Reprinted with permission of the IEEE.

IEEE TRANSACTIONS ON ROBOTICS, VOL. 25, NO. 4, AUGUST 2009

Optimized Stochastic Policies for Task Allocation in Swarms of Robots

Spring Berman, Member, IEEE, Ádám Halász, Member, IEEE, M. Ani Hsieh, Member, IEEE, and Vijay Kumar, Fellow, IEEE

Abstract—We present a scalable approach to dynamically allocating a swarm of homogeneous robots to multiple tasks, which are to be performed in parallel, following a desired distribution. We employ a decentralized strategy that requires no communication among robots.
It is based on the development of a continuous abstraction of the swarm obtained by modeling population fractions and defining the task allocation problem as the selection of rates of robot ingress and egress to and from each task. These rates are used to determine probabilities that define stochastic control policies for individual robots, which, in turn, produce the desired collective behavior. We address the problem of computing rates to achieve fast redistribution of the swarm subject to constraint(s) on switching between tasks at equilibrium. We present several formulations of this optimization problem that vary in the precedence constraints between tasks and in their dependence on the initial robot distribution. We use each formulation to optimize the rates for a scenario with four tasks and compare the resulting control policies using a simulation in which 250 robots redistribute themselves among four buildings to survey the perimeters.

Index Terms—Distributed control, Markov processes, optimization, stochastic systems, swarm robotics, task allocation.

Manuscript received July 27, 2008; revised January 12, 2009. First published July 10, 2009; current version published July 31, 2009. This paper was recommended for publication by Associate Editors S. Roumeliotis and D. Song, and Editor L. Parker upon evaluation of the reviewers' comments. This work was supported in part by the Army Research Office under Grant W911NF-05-1-0219, in part by the Office of Naval Research under Grant N00014-08-1-0696 and Grant N00014-07-1-0829, in part by the Army Research Laboratory under Grant W911NF-08-2-0004, in part by the National Science Foundation (NSF) under Grant CSR-CPS 0720801, and in part by an NSF Graduate Research Fellowship. S. Berman and V. Kumar are with the General Robotics, Automation, Sensing, and Perception Laboratory, University of Pennsylvania, Philadelphia, PA 19104 USA (e-mail: spring@seas.upenn.edu; kumar@grasp.upenn.edu). Á. Halász is with the Department of Mathematics, West Virginia University, Morgantown, WV 26506 USA (e-mail: halasz@math.wvu.edu). M. A. Hsieh is with the Department of Mechanical Engineering and Mechanics, Drexel University, Philadelphia, PA 19104 USA (e-mail: mhsieh1@drexel.edu). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TRO.2009.2024997

I. INTRODUCTION

ADVANCES in embedded processor, sensor, and actuation technology are paving the way for the development of "swarms" of robots numbered in the hundreds or thousands. We present a strategy for reallocating a swarm of homogeneous robots among a set of tasks that are to be performed in parallel, continuously, and independently of one another. For instance, each task could be an activity at a physical site such as building surveillance, environmental monitoring, construction, or a search-and-rescue operation. The objective for the robots is to autonomously redistribute as quickly and efficiently as possible in a way that causes the steady-state populations at the tasks to follow a predefined distribution. This is an instance of the single-task robot, multi-robot task (ST-MR) problem [1], where the goal is to assign teams of robots to tasks in a way that maximizes the system's performance. This is known as the coalition formation problem when applied to software agents. Tractable approaches to this problem, which is NP-hard, rely on extensive agent cooperation that is not easily implemented in robot systems, since communication can be costly and unreliable and resources are not transferrable [2]. The algorithm in [3] was adapted to the multirobot domain in [4], but robots must compute all possible coalitions and agree on the best ones, and coalition sizes are limited.
The ST-MR problem has recently been addressed with market-based techniques, although allocation strategies for robots have mostly considered the problem of assigning a single robot to each task [2]. Market-based approaches [5] require robots to execute complex bidding schemes based on perceived costs and utilities, and the computation and communication requirements often scale poorly as the number of robots and tasks increases. These algorithms are not suitable for the large-scale systems that we consider. If tasks are at different sites, communication between all robots may not be possible due to interference, obstruction, or power limitations, or it may be too risky, as in military applications. Also, bandwidth becomes a limiting factor in communication as the population size increases. In light of these issues, we propose a strategy that does not use inter-robot communication. We do, however, assume that a central controller broadcasts information about tasks and task transitions without dictating the actions of individual robots. Our strategy should be readily implementable on robots with limited onboard resources, scalable in the number of robots and tasks, and robust to changes in the population. These properties are inherent in decentralized approaches [6]–[8] that are inspired by the self-organized behavior of social insects such as ants [9]. In these approaches, robots switch between simple behaviors based on environmental stimuli and interactions with other robots. We adopt this distributed paradigm using stochastic switching between tasks. We note that the potential-based algorithm in [10] is also scalable, but it is designed for tasks that are depleted and does not address the problem of allocating robots as quickly as possible. Recent work on decentralized control for task allocation has focused on abstracting the physical system to an accurate macroscopic model [11], [12].
Identical robot controllers are defined with stochastic state transitions, and they are averaged to obtain a set of differential or difference equations. System performance is studied by running the model, which is validated through simulations, under many different conditions. We use a controller synthesis approach that is less computationally expensive and gives theoretical guarantees on performance. We model the swarm as a set of differential equations in which the variables are continuous population fractions at tasks. We then use standard analysis and optimization tools to design the model parameters so that the swarm macroscopically displays rapid, efficient deployment to the desired distribution. We use these parameters to define rates of switching between tasks that can be realized on individual robots, which collectively display the properties of the continuous model. We first used this approach to design ant-inspired behaviors that cause robots to converge to the better of two sites [13] or split between two sites in a specified ratio [14]. We extended our methodology to the distribution of a swarm among tasks at many sites [15] and introduced quorum-based stochastic control policies [16]. In the present paper, we focus on optimizing the rates at which robots switch between tasks for fast convergence to the desired distribution subject to a constraint(s) on idle transitions at equilibrium. We began this investigation in [17], in which we accounted for transition times within the differential equation framework. Here, we present several new optimization methods and compare them using a four-site surveillance scenario.

1552-3098/$26.00 © 2009 IEEE. Authorized licensed use limited to: University of Pennsylvania. Downloaded on September 30, 2009 at 13:16 from IEEE Xplore. Restrictions apply.

II. CONTINUOUS MODELS

A.
Definitions and Assumptions

Consider a population of N robots to be allocated among M tasks. We denote the number of robots performing task i ∈ {1, ..., M} at time t by n_i(t), a nonnegative integer, and the desired number of robots for task i by n_i^d, a positive integer. The population fraction performing task i at time t is x_i(t) = n_i(t)/N, and the vector of population fractions is x(t) = [x_1(t), ..., x_M(t)]^T. The target distribution is the set of desired population fractions for each task, x^d = [x_1^d, ..., x_M^d]^T, where x_i^d = n_i^d/N. A specification in terms of fractions rather than integers is practical for scaling, as well as for applications in which losses of robots are common. The precedence constraints between tasks can be modeled using a directed graph G = (V, E), where V, the set of M vertices, represents tasks {1, ..., M} and E, the set of N_E edges, represents possible transitions between tasks. Tasks i and j are adjacent, denoted by i ~ j, if a robot that is working on task i can switch to task j. We denote this relation by the ordered pair (i, j) ∈ V × V, with the set E = {(i, j) ∈ V × V | i ~ j}. For example, if each task i is an activity at a physical site i, then G models the site interconnection topology: V is the set of M sites, and each edge (i, j) represents a one-way route that robots can travel from i to j. If there are P possible routes from i to j, then they are represented by distinct edges (i, j)_m, where m = 1, ..., P. We require G to be strongly connected, which means that a directed path exists between any pair of distinct vertices. This facilitates redistribution by allowing the robots to perform any task starting from any other task; no task acts as a source or a sink. We also consider the case of a fully connected graph, in which every vertex is adjacent to every other vertex. This allows robots to switch directly from one task to another, rather than working on a sequence of intermediate tasks first.
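As a concrete illustration of these definitions (not taken from the paper), the edge set E and the strong-connectivity requirement can be encoded directly; the four-task graph and target distribution below are hypothetical placeholders, and connectivity is checked by requiring every vertex to reach every other vertex by a directed path.

```python
from collections import deque

# Hypothetical 4-task graph: each edge (i, j) is an allowed switch i -> j.
M = 4
E = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 0)]

# Hypothetical target distribution x^d (population fractions, sum to 1).
x_d = [0.4, 0.3, 0.2, 0.1]

def reachable(start, edges, n):
    """Set of vertices reachable from `start` by a directed path (BFS)."""
    adj = {v: [] for v in range(n)}
    for i, j in edges:
        adj[i].append(j)
    seen, queue = {start}, deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def strongly_connected(edges, n):
    """G is strongly connected iff every vertex reaches all n vertices."""
    return all(len(reachable(v, edges, n)) == n for v in range(n))

assert abs(sum(x_d) - 1.0) < 1e-12  # the fractions must sum to one
print(strongly_connected(E, M))     # True: no task is a source or a sink
```

Dropping the edge (3, 0) would leave task 3 with no outgoing edge (a sink), and the check would fail.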
We consider x(t) to represent the distribution of the state of a Markov process on G, for which V is the state space and E is the set of possible transitions. Every edge (i, j) is assigned a constant positive transition rate k_ij, the probability per unit time for one robot at task i to switch to task j. These rates define stochastic transition rules: the robots are programmed to switch from task i to j with probability k_ij δt at each time step δt. The number of transitions between tasks i and j in time ∆t has a Poisson distribution with parameter k_ij ∆t. Our objective is to compute k_ij that cause the robots to quickly redistribute among the tasks in order to occupy them in the population ratios dictated by x^d. The use of constant k_ij is necessary to abstract the system to a linear continuous model (see Sections II-B and II-C), which is used to design the k_ij via optimization techniques (see Section III). We assume that a central controller determines x^d, computes the rates k_ij, and broadcasts the rates to the robots. The robots have complete knowledge of G and the tasks to perform; this information can be preprogrammed and updated via a broadcast if the tasks change. The robots must also be capable of executing the tasks and transitions. For instance, if the tasks are at different sites, the robots must be able to localize themselves in their environment and navigate safely between sites.

B. Base Model

The swarm can be modeled as a function of the rates k_ij by representing it in terms of the continuous quantity x(t). In the limit N → ∞, the physical system of individual robots can be abstracted to the following linear ordinary differential equation (ODE) model according to the theoretical justification given in [18]:

    ẋ_i(t) = Σ_{j | (j,i) ∈ E} k_ji x_j(t) − Σ_{j | (i,j) ∈ E} k_ij x_i(t)    (1)

where i = 1, ..., M. If there are P edges from i to j, each with rate k_ij,m, where m ∈ {1, ..., P}, then k_ij = Σ_{m=1}^{P} k_ij,m.
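The per-robot transition rule above can be sketched as a discrete-time simulation (an illustrative reconstruction, not code from the paper): at each step of length δt, a robot at task i switches to an adjacent task j with probability k_ij δt. The rate values below are arbitrary placeholders on a three-task graph.

```python
import random

random.seed(0)

# Hypothetical rates k[i][j] (probability per unit time) and time step dt.
k = {0: {1: 0.5}, 1: {0: 0.2, 2: 0.3}, 2: {1: 0.1, 0: 0.4}}
dt = 0.01

def step(task):
    """One robot, one time step: switch to j with probability k_ij * dt."""
    u = random.random()
    acc = 0.0
    for j, rate in k[task].items():
        acc += rate * dt
        if u < acc:
            return j
    return task  # no switch this step

# Simulate N robots for T steps and report the empirical distribution x(t).
N, T = 500, 5000
tasks = [0] * N           # all robots start at task 0
for _ in range(T):
    tasks = [step(s) for s in tasks]
fractions = [tasks.count(i) / N for i in range(3)]
print(fractions)  # approaches the equilibrium distribution set by the rates
```

Because the rates are constant, the long-run fractions approach the stationary distribution of the underlying Markov process regardless of the initial placement, which is exactly the mechanism the optimization in Section III exploits.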
We define the flux from task i to j at time t as k_ij x_i(t), the fraction of robots per unit time that are leaving i to switch to j. Hence, model (1) quantifies the rate of change of population fraction x_i(t) as the difference between the total influx and total outflux of robots at task i. The model captures this effect in a simple way by representing robots as switching instantaneously from one task to another, ignoring the time that robots take to effect transitions. Because the k_ij are constant, robots still switch between tasks at equilibrium, when the net flux through each task is zero. This contributes to system robustness since the population at each task, which may be depleted by breakdowns, is constantly replenished. The persistent switching may also serve a useful function, such as patrolling or sampling between sites. Since the number of robots is conserved, system (1) is subject to the constraint

    1^T x = 1.    (2)

System (1) can be equivalently written as the linear model

    ẋ = −Kx    (3)

where K ∈ R^{M×M} is a matrix with the properties

    K^T 1 = 0    (4)
    K_ij ≤ 0  ∀(i, j) ∈ E.    (5)

These two properties result in the following matrix structure:

    K_ij = −k_ji,                if i ≠ j, (j, i) ∈ E
    K_ij = 0,                    if i ≠ j, (j, i) ∉ E    (6)
    K_ij = Σ_{(i,l) ∈ E} k_il,   if i = j.

Theorem 1: If the graph G is strongly connected, then system (3) subject to (2) has a unique, stable equilibrium.

Proof: Since G is strongly connected, the rank of K is M − 1 [19]. The null space of K, x_n, is therefore one-dimensional. This null space is intersected by the (M − 1)-dimensional hyperplane described by constraint (2). Thus, system (3) subject to (2) has a unique equilibrium point, which we call x̄_n = [x̄_n1, ..., x̄_nM]^T.
Now consider the matrix T = tI − K, where t > 0 and I ∈ R^{M×M} is the identity matrix. Choose t large enough that T is a nonnegative matrix. Since G is strongly connected, the matrix −K and, therefore, T are irreducible. Because T is nonnegative and irreducible, by the Perron–Frobenius theorem, T has a real, positive, simple eigenvalue λ_m(T) such that all other eigenvalues of T, λ(T), satisfy |λ(T)| < λ_m(T). This eigenvalue also satisfies the inequalities

    min_j Σ_{i=1}^{M} T_ij ≤ λ_m(T) ≤ max_j Σ_{i=1}^{M} T_ij

[19]. Since the columns of K sum to 0, both sides of these inequalities are t; therefore, λ_m(T) = t. Note that λ(T) = λ(−K) + t. Thus, the eigenvalue of −K corresponding to λ_m(T) is 0, and all other eigenvalues of −K satisfy |λ(−K) + t| < t. It follows that −K has a simple zero eigenvalue and all its other eigenvalues satisfy Re(λ(−K)) < 0. Therefore, the equilibrium point x̄_n is stable. ∎

Theorem 1 proves that system (3) subject to (2) always converges to a single equilibrium x̄_n, which represents the steady-state distribution of population fractions among the M tasks. Hence, we can achieve the target robot distribution x^d from any initial distribution x0 by specifying that x̄_n = x^d through the following constraint on K:

    Kx^d = 0.    (7)

When the k_ij are chosen such that the corresponding K matrix satisfies constraint (7), a swarm of robots that use the k_ij as stochastic transition rules will redistribute from any x0 to x^d. In practice, this redistribution must take place in a reasonably short amount of time. Since (3) is a linear system, the rate of convergence of x to x^d is governed by the real parts of the eigenvalues of K, which are positive homogeneous functions of the k_ij [20]. Thus, the rate of redistribution can be made arbitrarily fast by using high k_ij. However, in actual robotic systems, there is often a substantial cost to using high k_ij.
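Theorem 1 and the matrix structure in (6) are straightforward to check numerically. The sketch below (illustrative; the rates are arbitrary placeholders) assembles K from a rate dictionary, verifies property (4), confirms that −K has a single zero eigenvalue with all others in the open left half-plane, and recovers the equilibrium as the null vector of K normalized to satisfy (2).

```python
import numpy as np

# Hypothetical rates k_ij on a strongly connected 3-task graph.
rates = {(0, 1): 0.5, (1, 0): 0.2, (1, 2): 0.3, (2, 0): 0.4, (2, 1): 0.1}
M = 3

# Structure (6): off-diagonal K[j, i] = -k_ij, diagonal K[i, i] = sum_l k_il.
K = np.zeros((M, M))
for (i, j), kij in rates.items():
    K[j, i] -= kij   # inflow term: -k_ij at row j, column i
    K[i, i] += kij   # diagonal: total outflow rate from task i
assert np.allclose(K.T @ np.ones(M), 0)  # property (4): columns sum to 0

# Spectrum of -K: one (near-)zero eigenvalue, the rest with Re < 0.
eigvals = np.linalg.eigvals(-K)
print(sorted(ev.real for ev in eigvals))

# Equilibrium: null space of K intersected with 1^T x = 1.
w, V = np.linalg.eig(K)
xbar = np.real(V[:, np.argmin(np.abs(w))])
xbar = xbar / xbar.sum()
print(xbar)  # steady-state distribution, the x̄_n of Theorem 1
```

Scaling every k_ij by a common factor scales the nonzero eigenvalues by the same factor without moving x̄_n, which is the convergence-speed-versus-traffic tradeoff discussed above.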
At equilibrium, the probability that any robot performing task i will start switching to task j in time step δt is k_ij n_i^d δt. Thus, raising k_ij increases the equilibrium "traffic" of robots transitioning between tasks i and j. This switching expends power; for instance, if the tasks are at different locations, the robots must travel between them and may experience delays due to congestion along the route. Thus, when choosing the k_ij, we are faced with a tradeoff between rapid convergence to x^d and long-term system efficiency, i.e., few idle transitions between tasks once x^d is achieved. In light of this tradeoff, we frame our objective as the design of an optimal transition rate matrix K that maximizes the convergence rate of system (3) to x^d subject to one of two possible constraints on task transitions at equilibrium. The first is a limit on the total equilibrium flux of robots switching between tasks:

    Σ_{(i,j) ∈ E} k_ij x_i^d ≤ c_tot.    (8)

This constraint does not dictate how the transitioning population is distributed among edges. An alternative constraint achieves this with a set of limits on the equilibrium flux between each pair of adjacent tasks:

    k_ij x_i^d ≤ c_ij,  (i, j) ∈ E.    (9)

C. Time-Delayed Model

As mentioned previously, model (3) does not account for the fact that, in reality, the influx of robots to task j from task i is delayed by the time taken to switch between the tasks, τ_ij. If we assume a constant transition time τ_ij for each edge (i, j), this effect can be included by rewriting equation (1) as a delay differential equation (DDE) given by

    ẋ_i(t) = Σ_{j | (j,i) ∈ E} k_ji x_j(t − τ_ji) − Σ_{j | (i,j) ∈ E} k_ij x_i(t)    (10)

where i = 1, ..., M. Due to the finite τ_ij, there will be robots in the process of switching between tasks; thus, Σ_{i=1}^{M} x_i(t) < 1 for t > 0. Let n_ij(t) be the number of robots in transition from task i to j at time t and y_ij(t) = n_ij(t)/N. Then, the conservation equation for this system is

    Σ_{i=1}^{M} x_i(t) + Σ_{i=1}^{M} Σ_{j | (i,j) ∈ E} y_ij(t) = 1.    (11)

In practice, robots will complete a transition in different amounts of time, and therefore model (10) can be made more realistic by defining the τ_ij as random variables, T_ij. In the case where robots effect transitions by traveling between sites, variations in τ_ij can arise from changes in navigation patterns caused by collision avoidance, congestion, and localization errors. For this case, we can estimate a reasonable form for the probability density of the T_ij from an analogous scenario in which vehicles deliver items along roads to different sites. Vehicle inter-site travel times have been modeled as having an Erlang distribution to capture the fact that the times have positive, minimum possible values and a small probability of being large due to accidents, breakdowns, and low energy, as well as the tendency of their distributions to be skewed toward larger values [21]. We assume that each T_ij follows this distribution with parameters ω_ij ∈ Z+ and θ_ij ∈ R+:

    g(t; ω_ij, θ_ij) = θ_ij^{ω_ij} t^{ω_ij − 1} e^{−θ_ij t} / (ω_ij − 1)!.    (12)

In practice, the parameters are estimated by fitting empirical transition time data to density (12). Under this assumption, the DDE model (10) can be transformed into an equivalent ODE model of the form (1), which allows us to optimize the rates k_ij using the methods that we have developed for this type of model. We use the fact that T_ij has the same distribution as the sum of ω_ij independent random variables, T_1, ..., T_{ω_ij}, with a common distribution f(t; θ_ij) = θ_ij e^{−θ_ij t} [22]. Each of the variables represents a portion of the transition time between tasks i and j. To model these portions of the transition, we define a directed path composed of a sequence of virtual tasks, u = 1, ..., ω_ij, between the real tasks i and j. Assume that robots transition instantaneously from virtual task u to u + 1, which is task j when u = ω_ij, at a constant probability per unit time, θ_ij. It follows that f(t; θ_ij) is the distribution of the time that a robot spends doing virtual task u, and therefore we can define T_u, u ∈ {1, ..., ω_ij}, as this task execution time. We denote the population fraction that is doing virtual task u along edge (i, j) by y_ij^(u). Then, Σ_{u=1}^{ω_ij} y_ij^(u) represents y_ij, the fraction of robots in transition from task i to task j. Fig. 1 illustrates how an edge from model (1) is expanded with two virtual states y_ij^(u).

Fig. 1. Labeled edge (i, j) = (1, 2) that consists of (a) the real tasks, corresponding to model (1), and (b) both real and virtual tasks (for ω_12 = 2), corresponding to model (13).

As in Section II-B, the dynamics of the population fractions at all real and virtual tasks in the expanded system can be written as a set of linear ODEs given by

    ẋ_i(t) = Σ_{j | (j,i) ∈ E} θ_ji y_ji^(ω_ji)(t) − Σ_{j | (i,j) ∈ E} k_ij x_i(t)
    ẏ_ij^(1)(t) = k_ij x_i(t) − θ_ij y_ij^(1)(t)    (13)
    ẏ_ij^(m)(t) = θ_ij (y_ij^(m−1)(t) − y_ij^(m)(t)),  m = 2, ..., ω_ij

where i = 1, ..., M, and (i, j) ∈ E. We now show that this more realistic model converges to a designable target distribution from any initial distribution.

Theorem 2: If the graph G is strongly connected, then system (10) subject to (11) with each τ_ij distributed according to density (12) has a unique, stable equilibrium.

Proof: We prove this for the equivalent ODE model (13) subject to (11), where y_ij = Σ_{u=1}^{ω_ij} y_ij^(u). Let y be the vector of y_ij^(u), where u = 1, ..., ω_ij, (i, j) ∈ E. The system state vector is then z = [x y]^T. We interpret each component of z as the population fraction at task i ∈ {1, ..., M′}, where M′ is the sum of all real and virtual tasks. The interconnection topology of these tasks can be modeled as a directed graph G′ = (V′, E′), where V′ = {1, ..., M′} and E′ = {(i, j) ∈ V′ × V′ | i ~ j}. Since G is strongly connected, so is G′. Each component of z evolves according to the ODE

    ż_i(t) = Σ_{j | (j,i) ∈ E′} k̂_ji z_j(t) − Σ_{j | (i,j) ∈ E′} k̂_ij z_i(t)    (14)

where each k̂_ij is defined by the corresponding coefficient in model (13). The system of equations for all M′ tasks can be written in the same form as the linear model (3) using an expanded transition rate matrix K̂, and the conservation constraint (11) can be written as 1^T z = 1. Since the system can be represented in the same form as system (3) subject to (2), Theorem 1 can now be applied to show that there is a unique, stable equilibrium. ∎

Remark 1: At equilibrium, the incoming and the outgoing flux at each virtual task along the path from task i to j is k_ij x_i, and therefore, model (13) contains the system Kx = 0. Thus, the x-component of z_n, the null space of K̂, is the same as the null space of K in the corresponding model (3) that ignores transition times. This shows that the ratio between the equilibrium populations at any two real tasks is the same in both models. However, the equilibrium populations x̄_ni ≡ x_i^d in model (3) are higher than those in model (13) because the conservation constraint for the latter model accounts for robots in transition.

Remark 2: The modeling approach in this section can still be applied when the distribution of T_ij is complicated (e.g., multimodal) by approximating it as a combination of Erlang distributions; this is a topic for future work.

III. DESIGN OF OPTIMAL TRANSITION RATE MATRIX K

We consider the task of redeploying a swarm modeled as system (3) from an initial distribution x0 to a target distribution x^d. As described in Section II-B, we want to select the rates k_ij in a way that balances fast convergence to x^d with long-term efficiency to conserve power. To this end, we compute the matrix K as the solution to an optimization problem that maximizes a measure of the convergence rate of system (3) to x^d subject to constraint (8) or (9) on idle transitions at equilibrium. We quantify the degree of convergence to x^d by the fraction of misplaced robots,

    µ(x) = ‖x − x^d‖_2.    (15)

We say that one system converges faster than another if it takes less time t_f for µ(x) to decrease to some small fraction f, such as 0.1, of its initial value µ(x0). We formulate several versions of this optimization problem, which are summarized in Table I (FC = fully connected, ROC = rate of convergence). Each version is tailored to an application with a particular combination of properties.

TABLE I
K OPTIMIZATION PROBLEMS

In the problems that follow, both equilibrium-transition constraints are normalized and written in the form f(k) ≤ 1, where f is taken to be f_tot for constraint (8) and f_ind for constraint (9):

    f_tot(k) = Σ_{(i,j) ∈ E} k_ij x_i^d,    f_ind(k) = max_{(i,j) ∈ E} (k_ij x_i^d / c_ij).    (18)

The graph G will be fully connected, in addition to strongly connected, if there are no physical or logical constraints on the flow of robots between pairs of tasks, such as a path in a disaster area that is only wide enough for robots to travel in one direction. In addition, it may be possible to obtain x0, for instance by identifying robots in an image from an aerial camera. Problem P3 is solved using a stochastic optimization method that directly minimizes convergence time.
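As a point of reference for these formulations, the misplaced-robot fraction µ(x) in (15) and the threshold time t_f can be evaluated directly from the linear model (3). The sketch below is illustrative only: the rates and initial distribution are hypothetical, x^d is taken to be the equilibrium implied by K, and f = 0.1.

```python
import numpy as np

# Hypothetical 3-task example: rates k_ij and an initial distribution x0.
rates = {(0, 1): 0.5, (1, 0): 0.2, (1, 2): 0.3, (2, 0): 0.4, (2, 1): 0.1}
M = 3
K = np.zeros((M, M))
for (i, j), kij in rates.items():
    K[j, i] -= kij
    K[i, i] += kij

# x^d: equilibrium of xdot = -K x (null vector of K, normalized to sum 1).
w, V = np.linalg.eig(K)
x_d = np.real(V[:, np.argmin(np.abs(w))])
x_d /= x_d.sum()

x0 = np.array([1.0, 0.0, 0.0])  # all robots start at task 0

def x_of_t(t):
    """Solution of xdot = -K x via eigendecomposition (K diagonalizable here)."""
    return np.real(V @ (np.diag(np.exp(-w * t)) @ np.linalg.solve(V, x0)))

def mu(x):
    """Fraction of misplaced robots, eq. (15)."""
    return np.linalg.norm(x - x_d)

# t_f: time for mu to fall to f = 0.1 of its initial value (bisection).
f, lo, hi = 0.1, 0.0, 100.0
while hi - lo > 1e-6:
    mid = 0.5 * (lo + hi)
    if mu(x_of_t(mid)) > f * mu(x0):
        lo = mid
    else:
        hi = mid
print(round(lo, 3))  # t_f for this particular K and x0
```

Comparing the t_f values produced by two candidate K matrices in this way is exactly the comparison criterion used to rank the optimization problems below.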
The resulting system is used as a baseline to compare the systems computed by the other problems, which manipulate convergence time by maximizing functions of the eigenvalues of K using linear or semidefinite programs. Since these types of programs can be solved with methods that have polynomial complexity in the worst case [23], we can efficiently compute the M × M matrix K for large M. Thus, our allocation approach scales well with the number of tasks. Our K design methods can also be applied to the more realistic model (10) with Erlang-distributed τ_ij when it is expressed as an equivalent linear model (13), as in [17].

A. Maximizing the Asymptotic Rate of Convergence

If G is strongly connected, but not necessarily fully connected, and x0 is unknown, we can designate the asymptotic rate of convergence of system (3) to x^d as the quantity to maximize. Let λ_i(K) signify the eigenvalue of K with the ith smallest real part of all the eigenvalues. By Theorem 1, λ_1(K) = 0 and Re(λ_i(K)) > 0 for i = 2, ..., M. Thus, the asymptotic rate of convergence is governed by Re(λ_2(K)). Noting that K is usually not symmetric, we first find a symmetric matrix S such that λ_2(S) ≤ Re(λ_2(K)). We replace the objective function Re(λ_2(K)) by λ_2(S). We can write this problem as a semidefinite program with a linear matrix inequality that arises from a variational characterization of λ_2(S).

Theorem 3: Define Π = diag(x^d), which is invertible since x^d > 0. Let K be a matrix with the structure in (6). Define the matrices

    N = (1/2)(ΠK^T + KΠ)    (16)
    K̃ = Π^{−1/2} K Π^{1/2},   S = (1/2)(K̃ + K̃^T) = Π^{−1/2} N Π^{−1/2}.    (17)

Then, λ_2(S) ≤ Re(λ_2(K)). (Proofs for the theorems in this section can be found in Appendix A.)

Denote the vector of all k_ij by k ∈ R^{M²−M}, which is the optimization variable. Both constraints on transitions can be written in the form f(k) ≤ 1, with f = f_tot for constraint (8) and f = f_ind for constraint (9), as given in (18). Now, we can state the optimization problem as follows: maximize λ_2(S) subject to f(k) ≤ 1, k ≥ 0. We use an alternate formulation [20]: minimize f(k) subject to λ_2(S) ≥ 1, k ≥ 0. The vector q = [(x_1^d)^{1/2}, ..., (x_M^d)^{1/2}]^T is the eigenvector of Π^{−1/2} N Π^{−1/2} corresponding to the zero eigenvalue. From equation (17) and the characterization of eigenvalues in [24], the constraint λ_2(S) ≥ 1 can be expressed as

    λ_2(S) = inf_{‖x‖=1, x^T q=0} x^T Π^{−1/2} N Π^{−1/2} x ≥ inf_{‖x‖=1, x^T q=0} x^T (I − qq^T) x.    (19)

The problem can now be posed as problem P1, in which the linear matrix inequality comes from (19).

    [P1] minimize f(k) subject to Π^{−1/2} N Π^{−1/2} ⪰ I − qq^T, k ≥ 0.

Denote the optimized vector of rates by k*. If constraint (8) is used, then we can achieve the maximum total flux by multiplying k* by c_tot/f_tot(k*). If constraint (9) is used, we can achieve the maximum flux for each edge by dividing k* by f_ind(k*).

For a strongly connected, but not necessarily fully connected, graph with bidirectional edges, which means that (i, j) ∈ E if and only if (j, i) ∈ E, we explore the advantage of having a reversible Markov process, which is defined by the detailed balance equations:

    k_ij x_i^d = k_ji x_j^d  ∀(i, j) ∈ E.    (20)

Suppose that G has bidirectional edges and that the two edges between each pair of adjacent tasks have equal flux capacities. For example, robots may travel between sites along identical parallel roads, similar to a two-way highway. Then, by condition (20), the Markov process on G is reversible. We adapt the problem of maximizing the asymptotic rate of convergence to this special case and call it problem P1_R.

For constraint (8): Condition (20) implies that KΠ = ΠK^T, and therefore, N = KΠ in equation (16). Substitute KΠ for N in problem P1 (with f = f_tot). Since K = NΠ^{−1}, K is similar to S, so the constraint λ_2(S) ≥ 1 becomes λ_2(K) ≥ 1. Thus, the problem constrains Re(λ_2(K)) directly instead of a lower bound on this value.

For constraint (9): We can maximize all the nonzero eigenvalues of K by setting each transition rate to its maximum value subject to condition (20) and constraint (9): k_ij = (1/x_i^d) min(c_ij, c_ji), (i, j) ∈ E. This is evident by using the Courant–Fischer min–max theorem [24] to express each nonzero eigenvalue of S, and therefore of K, in terms of a quadratic form x* S x (x* is the conjugate transpose of x), which is equal to

    Σ_{(i,j) ∈ E} k_ij x_i^d a_ij ā_ij,   a_ij = x_i (x_i^d)^{−1/2} − x_j (x_j^d)^{−1/2}

where ā_ij is the complex conjugate of a_ij.

B. Maximizing the Overall Convergence Rate

The asymptotic rate of convergence only dictates the long-term system behavior. If G is fully connected and x0 is unknown, we can speed convergence of the faster modes by maximizing a measure of the overall convergence rate, which is a function of all the nonzero eigenvalues of K, Λ(K) = [λ_2(K), ..., λ_M(K)]. We define the quantity to be maximized as 1^T Λ, which weights each eigenvalue equally. We use equations (4) and (7) to write k as a linear function of v ≡ [Λ(K) 0]^T ∈ R^{M²−M}. This allows us to formulate the optimization problem as a linear program with optimization variable v and objective function 1^T v.

Let K be a matrix that satisfies (4), which sets M constraints on the M² entries of K, and (7), which sets M − 1 constraints. We now define the remaining (M − 1)² constraints on K in terms of the variable Λ(K). Since no extra constraints can be applied, no k_ij may be set to zero, which is why G must be fully connected. Construct an orthonormal basis set in R^M, D = {d_1, d_2, ..., d_{M−1}, x^d/‖x^d‖}. Define a matrix in R^{M×M} as

    A = [d_1, ..., d_{M−1}, 1]^T ≡ [Ã^T | 1]^T.    (21)

Since 1^T x^d = 1 by (2), 1 has a nonzero component in the direction of x^d, and therefore the rows of A are linearly independent. Thus, A is invertible. Let B = A^{−1}. Then

    B = [Ã^T | x^d] [ I  0 ; −1^T Ã^T  1 ] ≡ [B̃ | x^d].

Define C ∈ R^{(M−1)×(M−1)} as follows for some fixed Ã:

    C = ÃKB̃.    (22)

Also define Ĉ ∈ R^{M×M} as C augmented with an added row of zeros and an added column of zeros.

Theorem 4: A matrix K can be expressed as K = BĈA if and only if it satisfies (4) and (7).

From this result, K is similar to Ĉ, and therefore the eigenvalues of C are Λ(K). Thus, we can define C as

    C ≡ diag(Λ(K)).    (23)

Now reformulate (7) as Fk = 0, where F ∈ R^{M×(M²−M)}, and (22) with C determined by (23) as Gk = g, where G ∈ R^{(M−1)²×(M²−M)} and g = [Λ(K) 0]^T ∈ R^{(M−1)²}. Define F̃ as any M − 1 rows of F. Then, k can be written as

    k = [G^T F̃^T]^{−T} [g^T 0]^T ≡ H^{−1} v.    (24)

Using definition (24) for k, constraints (8) and (9) are

    r^T H^{−1} v ≤ c_tot,   H^{−1} v ≤ c    (25)

where the entries of r ∈ R^{M²−M} are x_i^d, and the entries of c ∈ R^{M²−M} are c_ij/x_i^d. In addition, property (5) is

    H^{−1} v ≥ 0.    (26)

Note that while this property is not needed to prove Theorem 4, it is required to produce a valid K. The optimization problem can now be posed as problem P2.

    [P2] Maximize 1^T v subject to v_i = 0 for i = M, ..., M² − M, (26), and one of the constraints in (25).

C. Maximizing the Convergence Rate for a Specified x0

If G is strongly connected, but not necessarily fully connected, and x0 is known, we can use a stochastic optimization method to directly minimize the time to converge from x0, which is quantified by t_f. We implement problem P3 below using Metropolis optimization [25] with k as the variable. We chose this method for its simplicity and for the fact that it yields reasonable improvements in t_f with moderate computing resources.

    [P3] Minimize t_f subject to (4), (5), (7), and constraint (8) or (9).

Implementation: At each iteration, k is perturbed by a random vector such that the resulting K matrix satisfies (4), (5), and (7). k is then scaled as in problem P1 to satisfy constraint (8) or (9) while maximizing flux capacity. The resulting K is decomposed into its normalized eigenvectors and eigenvalues, system (3) is mapped into the space spanned by the normalized eigenvectors, and the appropriate transformation is applied to compute x(t) using exp(t diag([Λ(K) 0])). Since the system is stable according to Theorem 1, µ(x) always decreases monotonically with time, and therefore a Newton scheme can be used to calculate t_f.

If G is fully connected and x0 is known, then K can be computed such that ∆ ≡ x^d − x0 is one of its eigenvectors with eigenvalue λ > 0. By maximizing λ, we maximize the convergence rate along the vector from x0 to x^d, which is the most direct route in R^M to the target distribution. We use the decomposition of K from Theorem 4 to formulate the optimization problem as a linear program that maximizes λ.

Theorem 5: Let K be a matrix that satisfies (4) and (7); then, by Theorem 4, K = BĈA. Let d_1 = d in definition (21), where

    d = ∆′/‖∆′‖,   ∆′ = ∆ − (x^{dT} ∆ / ‖x^d‖²) x^d.    (27)

Then, K∆ = λ∆ if and only if C from (22) is defined as

    C = [c | C̃],   c^T = [λ 0]    (28)

where λ and C̃ are unconstrained.

We can now pose the optimization problem as problem P4, in which property (5) and constraints (8) and (9) are defined in terms of the entries of BĈA, with d_1 = d and C defined by (28). The optimization variables are λ and C̃.

    [P4] Maximize λ subject to (5) and constraint (8) or (9).

IV. RESULTS

A.
Effect of Connectivity of G on Asymptotic Convergence Rate

As a preliminary study, we investigated the effect of the connectivity of G on λ2(K) for several strongly connected, directed graphs on three tasks, which are labeled in Fig. 2. We used problem P1R to compute K for graph α with reversibility condition (20) and problem P1 to compute K for graph α without this condition and for all other graphs. We modeled each edge in a graph as providing one unit of equilibrium flux capacity by defining cij = 1 for all (i, j) ∈ E in constraint (9) and ctot = NE in constraint (8). The target distribution was xd1 = 0.2, xd2 = 0.3, and xd3 = 0.5.

BERMAN et al.: OPTIMIZED STOCHASTIC POLICIES FOR TASK ALLOCATION IN SWARMS OF ROBOTS 933

Fig. 2. Graphs on three tasks.

TABLE II. COMPARISON OF λ2(K) FOR GRAPHS ON THREE TASKS.

Fig. 3. Robot activities in the simulation. Numbers in brackets are related references for the stochastic simulation algorithm and motion controllers (see Appendix B).

Fig. 4. Cell decomposition of the free space used for navigation.

Fig. 5. Numbering and connectivity of surveyed buildings for (a) a strongly connected, but not fully connected, graph and (b) a fully connected graph.

Table II gives the resulting λ2(K) of each graph for both constraints, with column 2 indicating whether condition (20) was imposed or not. The fully connected graph α yields the fastest convergence, which is expected since robots can switch from any task directly to any other task. Each removal of an edge from graph α lowers λ2(K), except in the case of constraint (9) applied to the three-edge cycle δ. This is because the optimization problem maximized the flux capacity for each edge of graph δ (and did not for β and γ), which offset its stricter limits on task switching relative to the other graphs.

B.
Comparison of K for Surveillance Simulation

To demonstrate our approach on a realistic application, we simulated a scenario in which each task is the surveillance of one of four buildings on the University of Pennsylvania campus. Robots execute the tasks by monitoring the building perimeters, and they effect task transitions by traveling between buildings. We assume that robots can localize themselves in the campus and sense neighboring robots to avoid collisions. Appendix B describes our simulation methodology in detail, including the motion controllers for perimeter tracking and navigation that are used to implement surveillance behavior and inter-site travel, respectively. Fig. 3 illustrates the integration of switching initiations, perimeter surveillance, and navigation in the simulation.

The buildings to be monitored are highlighted in light dashed lines in Fig. 4, which also shows the cell decomposition used for navigation (see Appendix B). Two different graphs G, shown in Fig. 5, were defined on these buildings. The swarm consists of 250 homogeneous robots and is initially split equally between buildings 3 and 4. The robots must redistribute to achieve the target population fractions xd1 = 0.1, xd2 = 0.4, xd3 = 0.2, and xd4 = 0.3.

Fig. 6. Snapshots of a run using k from problem P1 with constraint (9). The red robots are not engaged in a transition, and the orange robots (∗) are committed to travel to another site or are in the process of traveling.

We compared the system convergence to xd for different sets of transition rates k, each computed from one of the optimization problems discussed in Section III. Problems P1 and P3 were used to compute rates for the system with graph Fig. 5(a), and problems P1R, P2, P3, and P4 were used for the system with graph Fig. 5(b). The snapshots in Fig. 6 illustrate the robot redistribution for one trial. The following discussion summarizes several key points from the simulation results.
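The robot-level rule that drives these runs is simple: a robot performing task i commits to task j with probability kij per unit time. The following is a minimal agent-level sketch of that rule; the rates, swarm size, and target here are illustrative, travel delays and the motion controllers of Appendix B are omitted, and the full simulator instead uses Gillespie's direct method.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 4
xd = np.array([0.1, 0.4, 0.2, 0.3])        # target fractions (sum to 1)
# Illustrative reversible rates on a fully connected graph with unit
# flux capacities: k_ij = min(c_ij, c_ji)/x_i^d = 1/x_i^d.
k = np.zeros((M, M))
for i in range(M):
    for j in range(M):
        if i != j:
            k[i, j] = 1.0 / xd[i]          # rate of switching from i to j

site = np.array([2] * 125 + [3] * 125)     # 250 robots split over two sites
dt = 2e-3                                  # small step: dt * sum_j k[i, j] << 1
for _ in range(int(3.0 / dt)):             # simulate 3 time units
    u = rng.random(site.size)
    for r in range(site.size):
        i, acc = site[r], 0.0
        for j in range(M):                 # switch i -> j w.p. k[i, j] * dt
            if j != i:
                acc += k[i, j] * dt
                if u[r] < acc:
                    site[r] = j
                    break
fractions = np.bincount(site, minlength=M) / site.size
print(np.round(fractions, 2))              # fluctuates around xd
```

Because these rates satisfy detailed balance with respect to xd, the empirical site fractions settle near the target, with fluctuations that shrink as the swarm grows.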
Fig. 7. DDE and stochastic simulations using k from problem P1 with constraint (9). Stochastic plots show the average over 40 runs ± standard deviation.

Fig. 8. Distance from equilibrium for stochastic simulations using graph Fig. 5(a) with (a) constraint (8) and (b) constraint (9). Each plot is an average over 40 runs that use k from the problem labeled in the legend. The bold number to the right of each legend entry is the equilibrium traveler fraction averaged over 1000 data points of the corresponding plot.

Fig. 9. Same quantities as in Fig. 8 for runs using graph Fig. 5(b).

1) Agreement Between Continuous and Stochastic Models: Our top–down methodology relies on the principle that the rates kij designed for the continuous model (3) will produce similar system performance when used as stochastic transition probabilities by individual robots. In Fig. 7, we compare performance in terms of ‖x − xd‖1 for 40 stochastic simulation runs and the continuous DDE model (10) with the same k. Each time delay τij in the DDE model was estimated as the average of 750–850 robot travel times at equilibrium from site i to j. These times were collected from a stochastic simulation using the site graph Fig. 5(b). The stochastic runs average to a plot that is close to the DDE plot and display little variability; if the number of robots were to approach infinity, the standard deviations would decrease to zero. The similarity in performance of the continuous and stochastic models verifies the validity of our top–down methodology.

2) Tradeoff Between Convergence Rate and Equilibrium Traffic: Figs. 8 and 9 compare system performance for different k in terms of the distance from equilibrium,

ν(x, y) = ‖x − xd‖1 − 1ᵀy.  (29)
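Equation (29) is straightforward to evaluate on a simulation trace. A minimal sketch with hypothetical values (x holds the site fractions, y the fractions in transit on each edge):

```python
import numpy as np

def nu(x, y, xd):
    """Distance from equilibrium, equation (29): ||x - xd||_1 - 1^T y."""
    return np.abs(np.asarray(x) - np.asarray(xd)).sum() - np.sum(y)

# Hypothetical check: the 0.1 of the swarm still in transit toward the
# under-populated site exactly accounts for the remaining discrepancy.
xd = [0.1, 0.4, 0.2, 0.3]
print(round(nu([0.1, 0.3, 0.2, 0.3], [0.1], xd), 12))   # -> 0.0
```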
This quantity decreases to zero at equilibrium because then the fraction of travelers 1ᵀy accounts entirely for all the discrepancies |xi − xdi|, i = 1, . . . , M. Each plot is an average over 40 stochastic simulation runs, and the bold numbers beside the legends are the average traveler fractions at equilibrium for each k. (Standard deviations are not shown to avoid cluttering the figures; the maximum standard deviation over all plots is 0.078.) The data in these figures verify that there is a tradeoff between rapid convergence to equilibrium and the number of idle transitions between sites at equilibrium. For instance, the runs in Fig. 8(b) are the slowest to converge, and they yield the lowest equilibrium traffic fractions. It is notable that this tradeoff can occur to different degrees depending on the flux constraint (8) or (9). The P2 plot converges slightly faster in Fig. 9(b) than in Fig. 9(a), but it has a lower equilibrium traffic fraction.

3) Faster Convergence With Increased Site Connectivity: Figs. 8 and 9 show that for both flux constraints, the runs for graph Fig. 5(b) converge faster to equilibrium than those for graph Fig. 5(a). This is due to the difference in allowable pathways between the initial and final distributions. In Fig. 5(b), robots can travel directly from sites 3 and 4 to sites 1 and 2, while in Fig. 5(a), they can only reach sites 1 and 2 via the path 3 → 4 → 1 → 2, which prolongs the redistribution process. The greater number of edges in Fig. 5(b) also reduces the impact on convergence of limiting each edge's flux capacity. The ranges of convergence times to equilibrium for Fig. 5(b) are similar for both constraints, while the convergence times for Fig. 5(a) increase significantly when constraint (9) is applied.
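This connectivity effect can be reproduced in miniature with the closed-form reversible rates of Section III. A minimal sketch, assuming a standard continuous-time Markov chain generator convention for K (off-diagonal entry K[j, i] = kij, columns summing to zero; the paper's own equations (3)–(7), outside this excerpt, fix the exact convention and the sign of the spectrum), comparing a four-site ring to a fully connected four-site graph:

```python
import numpy as np

def rate_matrix(edges, xd, c=1.0):
    """Generator with K[j, i] = k_ij = min(c_ij, c_ji)/x_i^d, the reversible
    closed form; every edge is bidirectional with equal capacity c."""
    M = len(xd)
    K = np.zeros((M, M))
    for i, j in edges:                       # each unordered pair listed once
        kij, kji = c / xd[i], c / xd[j]
        K[j, i] += kij; K[i, i] -= kij       # i -> j flow
        K[i, j] += kji; K[j, j] -= kji       # j -> i flow
    return K

xd = np.array([0.1, 0.4, 0.2, 0.3])
graphs = {"ring": [(0, 1), (1, 2), (2, 3), (3, 0)],
          "full": [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3)]}

gaps = {}
for name, E in graphs.items():
    K = rate_matrix(E, xd)
    assert np.allclose(K @ xd, 0)            # detailed balance: K xd = 0
    lam = np.sort(np.linalg.eigvals(K).real)
    gaps[name] = -lam[-2]                    # nonzero eigenvalue nearest zero
    print(name, round(gaps[name], 3))        # larger gap = faster convergence
```

Adding reversible edges with a common stationary distribution can only enlarge the Dirichlet form, so the fully connected graph's spectral gap is at least that of the ring, in line with the trend of Table II.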
4) Limits on Edge Flux Capacities Eliminate the Advantage of Knowing x0: Since the k produced by problems P3 and P4 are optimized for a specific x0, it seems likely that for any given x0, the P3 and P4 runs will converge at least as fast as the runs corresponding to other problems, which optimize k for the entire domain of x0. As Figs. 8(a) and 9(a) show, this is true if constraint (8) is used. This is because the flux capacity can be distributed among edges in any way as long as the total capacity does not exceed a limit. However, when constraint (9) is used, limits are placed on edges that, if left unconstrained, would be allocated a higher flux capacity to redistribute robots from x0 to xd. The problems that are independent of x0 are more robust to these limitations; their corresponding runs converge as fast as the runs that rely on x0 or outperform them.

5) K From Convex Optimization Is Competitive With K From Stochastic Optimization: The fastest-converging runs that use k from problems P1, P1R, P2, and P4 attain equilibrium at least as quickly as the corresponding runs that use k from problem P3. Hence, we can use efficient convex optimization techniques to compute a k that yields the same system performance as a k from a much more time-consuming stochastic optimization approach.² This facilitates the quick computation of k in real-time task allocation scenarios.

V. CONCLUSION

We have presented an approach to redistributing a swarm of robots among multiple tasks that can be optimized for fast convergence to a target distribution subject to one or more constraints on idle transitions at equilibrium. Our methodology is based on modeling the swarm as a set of continuous linear ODEs and optimizing the transition rates in this model.
We can account for realistic distributions of transition times within the framework of the linear ODE abstraction by augmenting the network of tasks with virtual tasks that represent the progress of transitioning robots. The optimized rates comprise a list of NE transition probabilities per unit time for individual robots to switch between tasks, and they are independent of the swarm size. The collective behavior that arises from individual robots switching stochastically between tasks follows the continuous model prediction. In this way, we synthesize decentralized robot controllers that can be computed a priori, do not require inter-robot communication, and have provable guarantees on performance.

A possible extension of this study is the design of a time-dependent transition rate matrix K(t) that causes the swarm to redistribute according to a trajectory of desired configurations, xd(t). Other extensions include the introduction of communication between robots and the use of nonlinear models to represent robot interactions.

APPENDIX A
PROOFS FOR SECTION III

A. Proof of Theorem 3

Define a convex, symmetric function h : R^M → R as

h(x) = −min{xi + xj},  i, j ∈ {1, . . . , M}, i ≠ j.  (30)

Let λ(A) be the vector of the eigenvalues of a matrix A. By [26, Th. 16.4], since h is convex and symmetric, h(Re(λ(K))) is the infimum of h((1/2)λ(M + Mᵀ)) over all matrices M similar to K. Thus, since K̃ is similar to K,

h(Re(λ(K))) ≤ h((1/2)λ(K̃ + K̃ᵀ)) = h(λ(S))  (31)

where the equality on the right comes from (17). Now, we evaluate both sides of the inequality in (31). By Theorem 1, h(Re(λ(K))) = −Re(λ2(K)). We observe that λ(S) = Re(λ(S)) because S is symmetric. We now show that S is positive semidefinite, denoted by S ⪰ 0, which implies that h(λ(S)) = −λ2(S) and hence reduces (31) to the inequality λ2(S) ≤ Re(λ2(K)). By (17), S ⪰ 0 if N ⪰ 0. Since G is strongly connected, λ2(N) > 0 (see [27, Lemma 10]). Using property (4) and constraint (7), N1 = (1/2)(ΠKᵀ1 + Kxd) = 0, and therefore λ1(N) = 0 with corresponding eigenvector 1. Thus, N ⪰ 0.

B. Proof of Theorem 4

K is similar to P ≡ MKN, where M, P ∈ R^(M×M) and N = M^(−1). Subdivide M as [M̃ᵀ | m]ᵀ and N as [Ñ | n], where m, n ∈ R^(M×1). Then

MN = [ M̃Ñ M̃n ; mᵀÑ mᵀn ] = I  (32)

MKN = [ M̃KÑ M̃Kn ; mᵀKÑ mᵀKn ] = P.  (33)

Choose an N with n = xd. It follows from (32) that mᵀxd = 1, which by (2) implies that m = 1.

Suppose that K satisfies (4) and (7). Since m = 1 and n = xd, these constraints applied to (33) make both the last row and last column of P equal to 0. To satisfy M̃n = M̃xd = 0 in (32), M̃ can be set to Ã. Then, M = A, N = B, and P = Ĉ; therefore, it follows that K = BĈA. Now suppose that K = BĈA. Since ĈAxd = 0 and 1ᵀBĈ = 0, K satisfies (4) and (7).

C. Proof of Theorem 5

Suppose that K∆ = λ∆. Then

K∆ = BĈA∆ = λ∆ ⇒ ĈA∆ = λA∆.  (34)

Using (27) for d1 and the orthonormality of the di,

diᵀ∆ = ‖∆′‖ diᵀd1 + (xdᵀ∆/‖xd‖²) diᵀxd = 0  (35)

for i = 2, . . . , M − 1. From this equation and the fact that 1ᵀ∆ = 0 by constraint (2), A∆ = [d1ᵀ∆ | 0]ᵀ. Thus, (34) is true if and only if C is defined as in (28).

²On a standard 2-GHz laptop, one Metropolis optimization run used for graph Fig. 5(b) took about 15 min for tf to decrease slowly enough with each iteration for K to be deemed close enough to optimal, while all the convex optimization programs computed an optimal K in less than a second.

APPENDIX B
SIMULATION METHODOLOGY

The continuous DDE model (10) was numerically integrated using the Runge–Kutta method. Gillespie's direct method [28] was used to perform a stochastic simulation of the system that is represented deterministically by the DDE model. This method simulates a sequence of robot transition events and their initiation times using the transition rates kij. Each event is identified
with the commitment of a robot to travel to another site. A transition from building i to building j is assigned to a random robot on the perimeter of i. This robot continues to track the perimeter until it reaches the exit for edge (i, j), at which point it begins navigating to the entrance on j. For more details, see [13] and [16].

To simulate perimeter tracking and navigation, we represented each robot k as a planar agent governed by a kinematic model q̇k = uk, where qk ∈ R² denotes the robot's (x, y) coordinates and uk ∈ R² is a control input. Suppose that the boundary of a building m is parameterized by a vector s(s) ∈ R² that maps arc length s to (x, y) coordinates. A robot k monitoring the perimeter of m moves in the direction of a unit vector tangent to this boundary, n̂m(s) ∈ R². To create an approximately uniform distribution of robots around the perimeter, we specify that robot k slows down by a fraction ζ of its nominal speed vp if its distance ‖qkl‖ from the robot l in front of it is less than pm/nm, where pm is the perimeter length and nm is the site population. The robot kinematics are then defined as

q̇k = (1 − σ(‖qkl‖, pm, nm) ζ) vp n̂m(qk)

where σ(‖qkl‖, pm, nm) = 1 if ‖qkl‖ < pm/nm and 0 otherwise.

To implement inter-site navigation, we first decomposed the free space into a tessellation of convex cells. Each edge (i, j) was defined as a sequence of cells to be traversed by robots moving from an exit point on the perimeter of building i to an entry point on the perimeter of j. Dijkstra's algorithm was used a priori to compute the sequence with the shortest cumulative distance between cell centroids, starting from the cell adjacent to the exit at i and ending at the cell adjacent to the entrance at j. The robots are provided with the cell sequence corresponding to each edge. Define Nk as the set of robots within the sensing radius ρ of robot k.
The robot kinematics for navigation are

q̇k = vn (ng(qk) + na(qk, Nk)) / ‖ng(qk) + na(qk, Nk)‖

where vector ng(qk) is computed from local potential functions to ensure arrival at the goal cell [29], and vector na(qk, Nk) is computed from repulsive potential functions to achieve inter-robot collision avoidance.

Suppose that qk is in cell c. Let n̂ce be the unit vector pointing out of c orthogonal to its exit facet. Let n̂cf1, n̂cf2 be unit vectors pointing into c orthogonal to each facet adjacent to the exit facet, and define dk1 and dk2 as the distance from robot k to each of these facets. Also, define η, υ, κ > 0. Then

ng(qk) = η n̂ce + υ ((1/dk1^κ) n̂cf1 + (1/dk2^κ) n̂cf2).

In the last cell in the sequence, this vector is replaced with one pointing from qk to the perimeter entrance point.

Let qkl = qk − ql, with ‖qkl‖ its norm, and ξ > 0. Define a sum of vectors that point away from each neighboring robot,

nn(qk, Nk) = Σ_{l∈Nk} (1/(ξ²‖qkl‖²) − 1/ln²(ξ‖qkl‖²)) (−qkl).

This is derived from the example potential function given in [30], with the added parameter ξ that, when lowered, increases the range of repulsion between robots. Finally,

na(qk, Nk) = nn(qk, Nk) ‖ng(qk)‖ / ‖nn(qk, Nk)‖.

We set the sensing radius ρ to 46 m, which is within the capabilities of some laser range finders. The navigation speed vn was set to 1.3 m/s, which is attainable by some mobile robots that are particularly suited to surveillance tasks, such as PatrolBot and Seekur. The perimeter surveillance speed vp was set to be 4.5 times slower. In the optimization problems, the total equilibrium flux capacity ctot for all possible edges (see graph Fig. 5(b)) was set to 0.175 robots/s and distributed among the edges in proportion to the cumulative distance between the centroids of their associated cells.

REFERENCES

[1] B. Gerkey and M. Mataric, “A formal analysis and taxonomy of task allocation in multi-robot systems,” Int. J. Robot. Res., vol. 23, no. 9, pp. 939–954, 2004.
[2] L.
Vig and J. Adams, “Coalition formation: From software agents to robots,” J. Intell. Robot. Syst., vol. 50, no. 1, pp. 85–118, 2007.
[3] O. Shehory and S. Kraus, “Methods for task allocation via agent coalition formation,” Artif. Intell., vol. 101, no. 1/2, pp. 165–200, 1998.
[4] L. Vig and J. Adams, “Multi-robot coalition formation,” IEEE Trans. Robot., vol. 22, no. 4, pp. 637–649, Aug. 2006.
[5] M. B. Dias, R. M. Zlot, N. Kalra, and A. T. Stentz, “Market-based multirobot coordination: A survey and analysis,” Proc. IEEE, vol. 94, no. 7, pp. 1257–1270, Jul. 2006.
[6] M. J. B. Krieger, J.-B. Billeter, and L. Keller, “Ant-like task allocation and recruitment in cooperative robots,” Nature, vol. 406, pp. 992–995, 2000.
[7] T. H. Labella, M. Dorigo, and J.-L. Deneubourg, “Division of labor in a group of robots inspired by ants' foraging behavior,” ACM Trans. Auton. Adapt. Syst., vol. 1, no. 1, pp. 4–25, 2006.
[8] W. Agassounon and A. Martinoli, “Efficiency and robustness of threshold-based distributed allocation algorithms in multi-agent systems,” in Proc. 1st Int. Joint Conf. Auton. Agents Multi-Agent Syst., 2002, pp. 1090–1097.
[9] N. Franks, S. C. Pratt, N. F. Britton, E. B. Mallon, and D. T. Sumpter, “Information flow, opinion-polling and collective intelligence in house-hunting social insects,” Philos. Trans.: Biol. Sci., vol. 357, pp. 1567–1584, 2002.
[10] O. Shehory, S. Kraus, and O. Yadgar, “Emergent cooperative goal-satisfaction in large-scale automated-agent systems,” Artif. Intell., vol. 110, no. 1, pp. 1–55, 1999.
[11] W. Agassounon, A. Martinoli, and K. Easton, “Macroscopic modeling of aggregation experiments using embodied agents in teams of constant and time-varying sizes,” Auton. Robots, vol. 17, no. 2/3, pp. 163–192, 2004.
[12] K. Lerman, C. V. Jones, A. Galstyan, and M. J. Mataric, “Analysis of dynamic task allocation in multi-robot systems,” Int. J. Robot. Res., vol. 25, no. 4, pp. 225–242, 2006.
[13] S. Berman, Á. Halász, V. Kumar, and S.
Pratt, “Algorithms for the analysis and synthesis of a bio-inspired swarm robotic system,” in Swarm Robotics (LNCS 4433 Series). New York: Springer-Verlag, 2006, pp. 56–70.
[14] S. Berman, Á. Halász, V. Kumar, and S. Pratt, “Bio-inspired group behaviors for the deployment of a swarm of robots to multiple destinations,” in Proc. Int. Conf. Robot. Autom., 2007, pp. 2318–2323.
[15] Á. Halász, M. A. Hsieh, S. Berman, and V. Kumar, “Dynamic redistribution of a swarm of robots among multiple sites,” in Proc. Int. Conf. Intell. Robots Syst., 2007, pp. 2320–2325.
[16] M. A. Hsieh, Á. Halász, S. Berman, and V. Kumar, “Biologically inspired redistribution of a swarm of robots among multiple sites,” Swarm Intell., vol. 2, no. 2, pp. 121–141, 2008.
[17] S. Berman, Á. Halász, M. A. Hsieh, and V. Kumar, “Navigation-based optimization of stochastic strategies for allocating a robot swarm among multiple sites,” presented at the Conf. Decis. Control, Cancun, Mexico, 2008.
[18] D. Gillespie, “Stochastic simulation of chemical kinetics,” Annu. Rev. Phys. Chem., vol. 58, pp. 35–55, 2007.
[19] H. Othmer, Analysis of Complex Reaction Networks, Lecture Notes. St. Paul, MN: Univ. Minnesota Press, 2003.
[20] J. Sun, S. Boyd, L. Xiao, and P. Diaconis, “The fastest mixing Markov process on a graph and a connection to the maximum variance unfolding problem,” SIAM Rev., vol. 48, no. 4, pp. 681–699, 2006.
[21] R. Russell and T. Urban, “Vehicle routing with soft time windows and Erlang travel times,” J. Oper. Res. Soc., vol. 59, no. 9, pp. 1220–1228, 2008.
[22] B. Harris, Theory of Probability. Reading, MA: Addison-Wesley, 1966.
[23] L. Vandenberghe and S. Boyd, “Semidefinite programming,” SIAM Rev., vol. 38, no. 1, pp. 49–95, 1996.
[24] R. A. Horn and C. R.
Johnson, Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1985.
[25] D. P. Landau and K. Binder, A Guide to Monte Carlo Simulations in Statistical Physics. Cambridge, U.K.: Cambridge Univ. Press, 2000.
[26] A. Lewis and M. Overton, “Eigenvalue optimization,” Acta Numer., vol. 5, pp. 149–190, 1996.
[27] C. W. Wu, “On Rayleigh–Ritz ratios of a generalized Laplacian matrix of directed graphs,” Linear Algebra Appl., vol. 402, pp. 207–227, 2005.
[28] D. Gillespie, “A general method for numerically simulating the stochastic time evolution of coupled chemical reactions,” J. Comput. Phys., vol. 22, pp. 403–434, 1976.
[29] D. C. Conner, A. Rizzi, and H. Choset, “Composition of local potential functions for global robot control and navigation,” in Proc. Int. Conf. Intell. Robots Syst., 2003, pp. 3546–3551.
[30] H. G. Tanner, A. Jadbabaie, and G. J. Pappas, “Flocking in fixed and switching networks,” IEEE Trans. Autom. Control, vol. 52, no. 5, pp. 863–868, May 2007.

Spring Berman (M’07) received the B.S.E. degree in mechanical and aerospace engineering, and a Certificate in robotics and intelligent systems, from Princeton University, Princeton, NJ, in 2005 and the M.S.E. degree in mechanical engineering and applied mechanics in 2008 from the University of Pennsylvania, Philadelphia, where she is currently working toward the Ph.D. degree in mechanical engineering and applied mechanics. Her current research interests include swarm robotics and bio-inspired control of multi-robot systems.

M. Ani Hsieh (M’96) received the B.S. degree in engineering and the B.A. degree in economics from Swarthmore College, Swarthmore, PA, in 1999 and the Ph.D. degree in mechanical engineering and applied mechanics from the University of Pennsylvania, Philadelphia, in 2007. She is currently an Assistant Professor with the Department of Mechanical Engineering and Mechanics, Drexel University, Philadelphia.
Her current research interests include bio-inspired control strategies for robot teams, control and coordination of heterogeneous teams, autonomous ground vehicles, and networked robots.

Vijay Kumar (S’87–M’87–SM’02–F’05) received the M.S. and Ph.D. degrees in mechanical engineering from The Ohio State University, Columbus, in 1985 and 1987, respectively. Since 1987, he has been with the University of Pennsylvania, Philadelphia, where he was the Director of the General Robotics, Automation, Sensing, and Perception Laboratory from 1998 to 2004, the Chairman of the Department of Mechanical Engineering and Applied Mechanics from 2005 to 2008, the Deputy Dean of the School of Engineering and Applied Science from 2000 to 2004, and is currently a United Parcel Service Foundation Professor and the Associate Dean for Academic Affairs in the School of Engineering and Applied Science, a faculty member with the Department of Mechanical Engineering and Applied Mechanics, and is also with the Department of Computer and Information Science. He was a member of the editorial boards of the Journal of the Franklin Institute, the American Society of Mechanical Engineers (ASME) Journal of Mechanical Design, and the ASME Journal of Mechanisms and Robotics. His current research interests include robotics and networked multi-agent systems.

Prof. Kumar was a member of the editorial boards of the IEEE TRANSACTIONS ON ROBOTICS AND AUTOMATION and the IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING. He is a recipient of the 1991 National Science Foundation Presidential Young Investigator Award, the Lindback Award for Distinguished Teaching, the 1997 Freudenstein Award for significant accomplishments in mechanisms and robotics, and the 2004 IEEE International Conference on Robotics and Automation Kawamori Best Paper Award.
He is also a Distinguished Lecturer with the IEEE Robotics and Automation Society and an elected member of the Robotics and Automation Society Administrative Committee. He is a Fellow of the ASME.

Ádám Halász (M’06) received the Diploma de Licenta (B.A.) degree in physics from the University of Bucharest, Bucharest, Romania, in 1989 and the M.A. degree in physics and the Ph.D. degree in nuclear physics from Stony Brook University, Stony Brook, NY, in 1995 and 1998, respectively. From 1998 to 2003, he was a Postdoctoral Fellow with the Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, where he was also a Research Scientist with the General Robotics, Automation, Sensing, and Perception Laboratory from 2003 to 2008 and a National Institutes of Health/National Library of Medicine Postdoctoral Bioinformatics Fellow from 2004 to 2006. He is currently an Assistant Professor with the Department of Mathematics, West Virginia University, Morgantown. His current research interests include biomathematics, molecular systems biology, mesoscopic phenomena in cells, and swarm robotics.