
Robustness and performance of threshold-based resource allocation policies

Takayuki Osogami, Mor Harchol-Balter, Alan Scheller-Wolf

Computer Science Department, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA, {osogami,harchol}@cs.cmu.edu; Tepper School of Business, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA, awolf@andrew.cmu.edu.

Area of review: Manufacturing, Service, and Supply Chain Optimization. Subject Classifications: Production/scheduling: Flexible manufacturing/line balancing. Queues: Markovian. Dynamic programming / optimal control: Markov: Infinite state.

Abstract

We provide the first analytical study of the mean response time and robustness of a wide range of threshold-based resource allocation policies for a multiserver queueing system such as those commonly used in modeling call centers. We introduce two different types of robustness: static robustness and dynamic robustness. Static robustness measures robustness against misestimation of load (i.e., constant load differing from that predicted), while dynamic robustness measures robustness against fluctuations in load (i.e., alternating high and low loads, or burstiness). We find that using multiple thresholds can have significant benefit over using only a single threshold with respect to static robustness, but that multiple thresholds surprisingly offer only small advantage with respect to dynamic robustness and mean response time. A careful evaluation of load conditions allows us to establish guidelines for choosing a good resource allocation policy, with respect to simplicity, robustness, and mean response time. Finally, we evaluate the effectiveness of our guidelines in designing resource allocation policies at a call center.

1 Introduction

A common problem in multiserver systems is deciding how to allocate resources (e.g. operators, CPU time, and bandwidth) among jobs to maximize system performance, e.g. with respect to mean response time or throughput. Since good parameter settings typically depend on environmental conditions such as system loads, an allocation policy that is optimal in one environment may provide poor performance when the environment changes, or when the estimation of the environment is wrong. In other words, the policy may not be robust. In this paper, we design several allocation policies for multiserver systems, quantifying their performance with respect to mean response time and robustness, providing insights into which types of policies perform well in different operating environments.

1.1 Model and metric

We consider a multiserver model that consists of two servers and two queues (Beneficiary-Donor model), as shown in Figure 1.

Figure 1: Beneficiary-Donor model.

Jobs arrive at queue 1 and queue 2 according to (possibly Markov modulated) Poisson processes with average arrival rates λ1 and λ2, respectively. Jobs have exponentially distributed service demands; however, the running time of a job may also depend on the affinity between the particular server and the particular job/queue. Hence, we assume that server 1 (beneficiary server) processes jobs in queue 1 (type 1 jobs) with rate µ1, while server 2 (donor server) can process type 1 jobs with rate µ12, and can process jobs in queue 2 (type 2 jobs) with rate µ2. We define ρ1 = λ1/µ1, ρ2 = λ2/µ2, and ρ̂1 = λ1/(µ1 + µ12(1 − ρ2)).
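For concreteness, consider an illustrative numerical instance (the specific values are ours, chosen to match the parameter ranges used in the examples later in the paper): λ1 = 1.15, µ1 = µ12 = 1, µ2 = 1/16, and λ2 = 0.0375. Then ρ1 = 1.15, ρ2 = 0.0375/(1/16) = 0.6, and ρ̂1 = 1.15/(1 + 1·(1 − 0.6)) ≈ 0.82. In this instance queue 1 is overloaded on its own (ρ1 > 1) but can be stabilized with help from server 2 (ρ̂1 < 1); this is precisely the regime for which the threshold-based policies studied below are designed.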
Note that ρ2 < 1 and ρ̂1 < 1 are necessary for the queues to be stable under any allocation policy, since the maximum rate at which type 1 jobs can be processed is µ1, from server 1, plus µ12(1 − ρ2), from server 2. The Beneficiary-Donor model has a wide range of applications in service facilities such as call centers and repair facilities. For example, in call centers, the donor server may be a bilingual operator, and the beneficiary server may be a monolingual operator (Shumsky, 2004; Stanford and Grassmann, 1993, 2000), or the donor server may be a cross-trained or experienced generalist who can handle all types of calls, and the beneficiary server may be a specialized operator who is only trained to handle a specific type of calls (Shumsky, 2004). In a repair facility, the donor server may be a technician who can handle jobs of any difficulty, and the beneficiary server may be a technician with limited expertise (Green, 1985).

We design and evaluate allocation policies for the Beneficiary-Donor model with respect to three objectives. First, as is standard in the literature, we seek to minimize the overall weighted mean response time, c1 p1 E[R1] + c2 p2 E[R2], where ci is the weight (importance) of type i jobs, pi = λi/(λ1 + λ2) is the fraction of type i jobs, and E[Ri] is the mean response time of type i jobs, for i = 1, 2. Here, response time refers to the total time a job spends in the system. Below, we refer to overall weighted mean response time simply as mean response time. In addition to mean response time, we consider an additional metric, robustness, introducing two types of robustness: static robustness and dynamic robustness. Static robustness measures robustness against misestimation of load; to evaluate static robustness, we analyze the mean response time of allocation policies for a range of loads to see how a policy tuned for one load behaves under different loads. Dynamic robustness measures the robustness against fluctuations in load; to evaluate dynamic robustness, we analyze the mean response time of allocation policies under Markov modulated Poisson processes, where arrivals follow a Poisson process at each moment, but the arrival rate changes over time.

1.2 Prior work

There has been a large amount of prior work on the Beneficiary-Donor model, the majority of which focused on proving the optimality of allocation policies in limiting or special cases. With respect to calculating mean response times, only coarse approximations exist for most of the allocation policies in our model. We provide a nearly exact analysis of these, as well as other allocation policies, while also investigating static and dynamic robustness. One common allocation policy is the cµ rule (Cox and Smith, 1971), which biases in favor of jobs with high c (high importance) and high µ (small expected size). Applying the cµ rule to our setting, server 2 serves type 1 jobs (rather than type 2 jobs) if c1 µ12 > c2 µ2, or queue 2 is empty. The cµ rule is provably optimal when server 1 does not exist (Cox and Smith, 1971) or in the fluid limit (Meyn, 2001; Squillante et al., 2002). However, Squillante et al. (2001) as well as Harrison (1998) have shown that the cµ rule may lead to instability (queue length growing unboundedly) even if ρ̂1 < 1 and ρ2 < 1. More recently, Mandelbaum and Stolyar (2004) and Van Mieghem (1995) have introduced and analyzed the generalized cµ rule. However, in our model, the generalized cµ rule reduces to the cµ rule and hence has the same stability issues.
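As a concrete reading of this dispatch rule, the following minimal sketch (in Python; the function and argument names are ours, for illustration only) encodes server 2's decision under the cµ rule in our setting. Note that nothing in the rule reserves work for server 1, which is the source of the instability discussed above.

```python
def cmu_rule_server2(n1: int, n2: int,
                     c1: float, c2: float, mu12: float, mu2: float) -> int:
    """Queue that server 2 serves next under the c-mu rule (0 = idle).

    Server 2 prefers type 1 jobs whenever c1 * mu12 > c2 * mu2, and
    otherwise prefers its own (type 2) jobs; it serves the non-preferred
    queue only when the preferred queue is empty.
    """
    if n1 == 0 and n2 == 0:
        return 0
    if c1 * mu12 > c2 * mu2:
        return 1 if n1 > 0 else 2
    return 2 if n2 > 0 else 1
```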
In light of this instability, Squillante et al. (2001) and Williams (2000) independently proposed a threshold-based policy that, under the right choice of threshold value, improves upon the cµ rule with respect to mean response time, guaranteeing stability whenever ρ̂1 < 1 and ρ2 < 1. We refer to this threshold-based policy as the T1 policy, since it places a threshold value, t1, on queue 1, so that server 2 processes type 1 jobs only when there are at least t1 jobs of type 1, or if queue 2 is empty. The rest of the time server 2 works on type 2 jobs. This "reserves" a certain amount of work for server 1, preventing server 1 from being under-utilized and server 2 from becoming overloaded, as can happen under the cµ rule. Bell and Williams (2001) prove the optimality of the T1 policy for a model closely related to ours in the heavy traffic limit.

However, studies by Meyn (2001) and Ahn et al. (2004) suggest that the T1 policy is not optimal in general. Meyn obtains, via a numerical approach, the optimal allocation policy when both queues have finite buffers. Although not proven, the optimal policy appears to be a "flexible" T1 policy that allows a continuum of T1 thresholds, {t1^(i)}, where threshold t1^(i) is used when the length of queue 2 is i. Ahn et al. characterize the optimal policy with respect to minimizing the total holding cost until all the jobs in the system at time zero leave the system, assuming that there are no arrivals after time zero. They also find that the optimal policy is in general a "flexible" T1 policy.

All of the work above investigates a class of allocation policies that are optimal in limiting or special cases. In contrast, there has been little work on the analysis and evaluation of the mean response time of general allocation policies in our model, and no work evaluating robustness. Complicating this problem is the fact that the state space required to capture the system behavior grows infinitely in two dimensions; i.e., we need to track both the number of type 1 jobs and the number of type 2 jobs. Hence, only approximate analyses exist for most of the allocation policies in our model. For example, Squillante et al. (2001) derive a coarse approximation for the mean response time of the T1 policy under Poisson arrivals based on vacation models. The mean response time of other simple allocation policies (in more general models) such as (idle) cycle stealing, where server 2 works on type 1 jobs when queue 2 is empty, has also been analyzed (with approximation) either by matrix analytic methods with state space truncation (Green, 1985; Stanford and Grassmann, 1993, 2000) or by approximate solutions of a 2D-infinite Markov chain via state space decomposition (Shumsky, 2004). Recently, we have introduced the first nearly exact analysis of the mean response time under a wide range of allocation policies for the Beneficiary-Donor model (Osogami et al., 2004). However, the analysis in (Osogami et al., 2004) is limited to Poisson arrivals.

1.3 Contributions of the paper

• In this paper, we extend the analysis in (Osogami et al., 2004) to more general arrival processes, which allows us to investigate static and dynamic robustness. Our analysis is based on the approach of dimensionality reduction, DR (see for example Osogami, 2005). DR reduces a two-dimensionally (2D) infinite Markov chain to a 1D-infinite Markov chain, which closely approximates the 2D-infinite Markov chain.
In particular, DR allows us to evaluate the mean response time under the T1 policy, and a similar policy called the T2 policy which places a threshold on queue 2.

• We introduce two types of robustness: static robustness and dynamic robustness, and analytically study a wide range of threshold-based allocation policies with respect to both types of robustness. Surprisingly, we will see that policies that excel in static robustness do not necessarily excel in dynamic robustness.

• Specifically, we find that an allocation policy with multiple thresholds can experience significant benefit over allocation policies with a single threshold with respect to static robustness. Illustrating this, we introduce the adaptive dual threshold (ADT) policy, which places two thresholds on queue 1, and show this has significant advantage over single threshold allocation policies with respect to static robustness. The ADT policy operates like a T1 policy, but the threshold value is self-adapted to the load.

• In contrast to this, we find that multiple thresholds surprisingly offer only small advantage over a single threshold with respect to mean response time and dynamic robustness.

• We apply the principles learned to designing allocation policies for call centers: based on the characterization of a call center's operational data, we identify effective allocation policies. We then evaluate our recommended policies via trace driven simulation. Results suggest that our policies can reduce the mean response time by orders of magnitude.

The rest of the paper is organized as follows. Section 2 discusses single threshold allocation policies, and Section 3 discusses multiple threshold allocation policies. In Sections 2-3, we evaluate the policies with respect to mean response time and static robustness. In Section 4, we shift our interest to dynamic robustness. In Section 5, we study a real-world call center fitting our model.

2 Analysis of single threshold allocation policies

In this section, we analytically study the mean response time and static robustness of two single threshold allocation policies. The T1 policy (Section 2.1) places a threshold, t1, on queue 1, whereby server 2 serves type 1 jobs whenever the length of queue 1 is at least t1. Thus, under T1, the beneficiary queue (queue 1) has control. Our second policy, the T2 policy (Section 2.2), places a threshold, t2, on queue 2, whereby server 2 serves type 1 jobs whenever the length of queue 2 is below t2. In this policy, the donor queue (queue 2) has control. In Section 2.1.2, we introduce a nearly exact analysis of the T1 policy based on DR. (DR also enables the analysis of the T2 policy and the ADT policy.) Our analysis will show that the T1 policy is superior to the T2 policy with respect to minimizing the mean response time, but that the T2 policy is superior with respect to static robustness.

2.1 T1 policy

The T1 policy is formally defined as follows:

Definition 1 Let N1 (respectively, N2) denote the number of jobs at queue 1 (respectively, queue 2). The T1 policy with parameter t1, the T1(t1) policy, is characterized by the following set of rules, all of which are enforced preemptively (preemptive-resume):

• Server 1 serves only its own jobs.

• Server 2 serves jobs from queue 1 if either (i) N1 ≥ t1 or (ii) N2 = 0 & N1 ≥ 2. Otherwise, server 2 serves jobs from queue 2.

To achieve maximal efficiency, we assume the following exceptions. When N1 = 1 and N2 = 0, the job is processed by server 2 if and only if µ1 < µ12.
Also, when t1 = 1 and N1 = 1, the job in queue 1 is processed by server 2 if and only if µ1 < µ12, regardless of the number of type 2 jobs. Note that we will discuss the nonpreemptive case in Section 5.

Figure 2 shows the jobs processed by server 2 as a function of N1 and N2 under the T1 policy. Observe that the T1(1) policy is the cµ rule when c1 µ12 > c2 µ2, and the T1(∞) policy is the cµ rule when c1 µ12 ≤ c2 µ2; thus the cµ rule falls within the broader class of T1 policies.

Figure 2: Figure shows whether server 2 works on jobs from queue 1 or queue 2 as a function of N1 and N2, under the T1 policy with parameter t1.

2.1.1 Stability under the T1 policy

In the T1 policy, higher t1 values yield a larger stability region, and in the limit as t1 → ∞, the queues under the T1 policy are stable as long as ρ̂1 < 1 and ρ2 < 1. More formally,

Theorem 1 Under the T1 policy with parameter t1 < ∞, queue 1 is stable if and only if λ1 < µ1 + µ12. Stability of queue 2 is given by the following conditions:

• For 1 < t1 < ∞, queue 2 is stable if and only if

ρ2 < (1 − ρ1^t1) / [ 1 − ρ1^t1 + (1 − ρ1)ρ1^t1 / (1 − ρ1 + µ12/µ1) ]   if ρ1 ≠ 1,
ρ2 < t1 / (t1 + λ1/µ12)   if ρ1 = 1.    (1)

• For t1 = 1, if µ1 ≥ µ12, queue 2 is stable if and only if equation (1) holds with t1 = 2.

• For t1 = 1, if µ1 < µ12, queue 2 is stable if and only if ρ2 < 1 / [ 1 + (ρ1 + λ1/µ12) / (1 − ρ1 + µ12/µ1) ].

Proof: We prove only the case when t1 > 1 and ρ1 ≠ 1. The case when t1 = 1 or ρ1 = 1 can be proved in a similar way. Let N = (N1, N2) be the joint process of the number of jobs in queue 1 and queue 2, respectively. The expected length of a "busy period," during which N1 ≥ t1, is finite if and only if λ1 < µ1 + µ12. This proves the stability condition for queue 1. Based on the strong law of large numbers, the necessary and sufficient condition for stability of queue 2 is ρ2 < F, where F is the time average fraction of time that server 2 processes type 2 jobs given N2 > 0. Below, we derive F. Let Ñ = (Ñ1, Ñ2) be a process in which Ñ behaves the same as N except that it has no transition from Ñ2 = 1 to Ñ2 = 0. Consider a semi-Markov process of Ñ1, where the state space is {0, 1, 2, ..., t1 − 1, t1^+}. The state n denotes that there are n jobs in queue 1 for n = 0, 1, ..., t1 − 1, and the state t1^+ denotes that there are at least t1 jobs in queue 1. The expected sojourn time is 1/λ1 for state 0, 1/(λ1 + µ1) for states n = 1, ..., t1 − 1, and b = (1/(µ1 + µ12)) / (1 − λ1/(µ1 + µ12)) for state t1^+, where b is the mean duration of the busy period in an M/M/1 queue with arrival rate λ1 and service rate µ1 + µ12.
The limiting probabilities for the corresponding embedded discrete time Markov chain are πn = (1 + ρ1)ρ1^(n−1) π0 for n = 1, ..., t1 − 1 and π_{t1^+} = ρ1^(t1−1) π0, where

π0 = (1 − ρ1) / [ (1 + ρ1^(t1−1))(1 − ρ1) + (1 + ρ1)(1 − ρ1^(t1−1)) ].

As server 2 can work on queue 2 if and only if Ñ1 < t1, the fraction of time that server 2 can work on queue 2 is

F = [ π0/λ1 + (1 − π0 − π_{t1^+})/(λ1 + µ1) ] / [ π0/λ1 + (1 − π0 − π_{t1^+})/(λ1 + µ1) + b π_{t1^+} ]
  = (1 − ρ1^t1) / [ 1 − ρ1^t1 + (1 − ρ1)ρ1^t1 / (1 − ρ1 + µ12/µ1) ].

The following corollary is an immediate consequence of Theorem 1.

Corollary 1 Under the T1 policy, the stability region increases with t1 (i.e., the right hand side of equation (1) is an increasing function of t1).

2.1.2 Analysis of the T1 policy

Our analysis of the T1 and other threshold-based policies is based on dimensionality reduction, DR (see for example Osogami, 2005). Advantages of DR include computational efficiency, accuracy, and simplicity; these allow us to extensively investigate the performance characteristics of the allocation policies. DR reduces a 2D-infinite Markov chain (see Figure 3(a)) to a 1D-infinite Markov chain (see Figure 3(b)), which closely approximates the 2D-infinite Markov chain.

Figure 3: Markov chains that model the behavior under the T1(3) policy: (a) 2D-infinite Markov chain; (b) 1D-infinite Markov chain. In the figures, (i, j) represents the state where there are i jobs of type 1 and j jobs of type 2. In (a), µ+ ≡ µ1 + µ12.

To derive the mean response time of T1, the 1D-infinite Markov chain tracks the exact number of type 2 jobs, but tracks the number of type 1 jobs only up to the point t1 − 1. At this point a type 1 arrival starts a "busy period," during which both servers are working on type 1 jobs, and type 2 jobs receive no service. This "busy period" ends when there are once again t1 − 1 jobs of type 1. State (t1^+, j) denotes that there are at least t1 jobs of type 1 and there are j jobs of type 2, for j ≥ 0. The key point is that there is no need to track the exact number of type 1 jobs during this busy period. We approximate the duration of this busy period with a two-phase phase type (PH) distribution with parameters (β1, β2, β12), matching the first three moments (Osogami, 2005). We use the limiting probabilities of the Markov chain in Figure 3(b) to calculate the mean number of jobs of each type, E[N1] and E[N2], which in turn gives their mean response time via Little's law. These limiting probabilities can be obtained efficiently via matrix analytic methods (Latouche and Ramaswami, 1999). Deriving E[N2] from the limiting probabilities is straightforward, since we track the exact number of type 2 jobs. We derive E[N1] by conditioning on the state of the chain. Let E[N1]_{ij} denote the expected number of type 1 jobs given that the chain is in state (i, j). For i = 0, ..., t1 − 1, E[N1]_{ij} = i for all j.
For i = t1^+, E[N1]_{t1^+,j} is the mean number of jobs in an M/M/1 system given that the service rate is the sum of the two servers, µ1 + µ12, and given that the system is busy, plus an additional t1 jobs. We find that the mean response time computed via DR is usually within two percent of the simulated value (Osogami, 2005). The high accuracy of DR stems from the fact that the state space of the 2D-infinite Markov chain in Figure 3(a) is not simply truncated. Rather, two rows representing 3+ jobs of type 1 in the 1D-infinite Markov chain in Figure 3(b) capture the infinite number of rows (row 4, 5, ...) in the 2D-infinite Markov chain in such a way that the first three moments of the sojourn time distribution in these two regions agree.

2.1.3 Characterizing the performance of the T1 policy

Our analysis of Section 2.1.2 allows an efficient and accurate analysis of the T1 policy. In this section, we characterize this performance. We find that the behavior of the T1 policy is quite different depending on whether queue 2 prefers type 1 or type 2 (c1 µ12 ≤ c2 µ2 or c1 µ12 > c2 µ2). The optimal t1 threshold is typically finite when c1 µ12 > c2 µ2, and typically infinite when c1 µ12 ≤ c2 µ2, where the optimality is with respect to minimizing the mean response time. In fact, when c1 = c2 and c1 µ12 ≤ c2 µ2, the following theorem holds (the theorem may extend to the case of c1 µ12 ≤ c2 µ2 with general c1 and c2, but the general case is not proved).

Theorem 2 If c1 = c2 and c1 µ12 ≤ c2 µ2, the mean response time of the T1 policy is minimized at t1 = ∞ (i.e., the cµ rule is optimal).

Proof: Due to Little's law, it is sufficient to prove that the number of jobs completed under the T1(∞) policy is stochastically larger than the number completed under the T1 policy with t1 < ∞ at any moment. Let N^inf(t) = (N1^inf(t), N2^inf(t)) be the joint process of the number of jobs in queue 1 and queue 2, respectively, at time t when t1 = ∞. Let N^fin(t) = (N1^fin(t), N2^fin(t)) be defined analogously for t1 < ∞. With t1 = ∞, server 2 processes type 2 jobs as long as there are type 2 jobs, and thus N1^inf(t) is stochastically larger than N1^fin(t) for all t. Consequently, the number of jobs completed by server 1 is stochastically smaller when t1 < ∞ than when t1 = ∞ at any moment, since server 1 is work-conserving. As long as server 2 is busy, the number of jobs completed by server 2 is stochastically smaller when t1 < ∞ than when t1 = ∞, since µ12 ≤ µ2. Also, using a coupling argument, we can show that the system with t1 = ∞ has server 2 go idle (both queues empty) earlier (stochastically) than the t1 < ∞ system. Thus, when server 2 is idle the number of jobs completed by the t1 = ∞ system is stochastically larger. Thus, the number of jobs completed (either by server 1 or by server 2) under the T1(∞) policy is stochastically larger than that completed under the T1 policy with t1 < ∞.

Figure 4: The mean response time under the T1 policy as a function of t1, for (a) c1 µ1 = 1/4, (b) c1 µ1 = 1, and (c) c1 µ1 = 4. Here, c1 = c2 = 1, c1 µ12 = 1, c2 µ2 = 1/16, and ρ2 = 0.6 are fixed.

Since t1 = ∞ achieves the largest stability region (Corollary 1), if c1 = c2, t1 = ∞ is the optimal choice with respect to both mean response time and the stability region.
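This interplay between t1 and the stability region (Corollary 1) can be checked numerically from equation (1). The short sketch below (our own helper, written for illustration and assuming ρ1 ≠ 1 and 1 < t1 < ∞) evaluates the right-hand side of (1), i.e., the largest ρ2 for which queue 2 remains stable under T1(t1): the bound increases with t1 and, for ρ1 > 1, approaches 1 − (λ1 − µ1)/µ12, which is equivalent to requiring ρ̂1 < 1.

```python
def t1_queue2_stability_bound(rho1: float, t1: int, mu1: float, mu12: float) -> float:
    """Right-hand side of equation (1) for 1 < t1 < infinity and rho1 != 1:
    the supremum of rho2 for which queue 2 is stable under the T1(t1) policy."""
    num = 1.0 - rho1 ** t1
    den = num + (1.0 - rho1) * rho1 ** t1 / (1.0 - rho1 + mu12 / mu1)
    return num / den

# Example with mu1 = mu12 = 1 and lambda1 = 1.15 (so rho1 = 1.15), as in Figure 5:
# the bound grows with t1 toward 1 - (lambda1 - mu1)/mu12 = 0.85.
for t1 in (2, 3, 5, 10, 19, 50):
    print(t1, round(t1_queue2_stability_bound(1.15, t1, 1.0, 1.0), 3))
```

With these parameters the bound is roughly 0.66 at t1 = 3 and roughly 0.84 at t1 = 19, consistent with Figure 5 below, where T1(3) becomes unstable well before ρ2 = 0.8 while T1(19) remains stable at ρ2 = 0.8.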
Note that the T1(∞) policy is the policy of following the cµ rule, as server 2 "prefers" to run its own jobs in a cµ sense when c1 µ12 ≤ c2 µ2. Therefore, below we limit our attention to the case of c1 µ12 > c2 µ2, where server 2 "prefers" to run type 1 jobs in a cµ sense. Note that the condition c1 µ12 > c2 µ2 is achieved when type 1 jobs are small and type 2 jobs are large, when type 1 jobs are more important than type 2 jobs, and/or in the pathological case when type 1 jobs have good affinity with server 2. (These, in addition, may motivate use of the Beneficiary-Donor model, giving smaller or more important jobs better service.) We will see that the optimal t1 threshold is typically finite when c1 µ12 > c2 µ2, in contrast to the cµ rule.

Figure 4 shows the mean response time under the T1 policy as a function of t1; we see that the optimal t1 is finite and depends on environmental conditions such as load (ρ̂1) and job sizes (µ1). Here, different columns correspond to different µ1's. In each column, the mean response time is evaluated at three loads, ρ̂1 = 0.8, 0.9, 0.95, by changing λ1. (Note that ρ̂1 = 0.8, 0.9, 0.95 corresponds to ρ1 = 2.08, 2.34, 2.47 when µ1 = 1/4 in column 1, ρ1 = 1.12, 1.26, 1.33 when µ1 = 1 in column 2, and ρ1 = 0.88, 0.99, 1.05 when µ1 = 4 in column 3.) By Theorem 1, a larger value of t1 leads to a larger stability region, and hence there is a tradeoff between good performance at the estimated load, (ρ̂1, ρ2), which is achieved at smaller t1, and stability at higher ρ̂1 and/or ρ2, which is achieved at larger t1. Note also that the curves have sharper "V shapes" in general at higher ρ̂1, which complicates the choice of t1, since the mean response time quickly diverges to infinity as t1 becomes smaller. In addition, analyses (Osogami, 2005) show that the value of the cµ product primarily determines the behavior of the T1 policy, and individual values of c and µ have smaller effect. Also, when ρ2 is lower (and thus ρ1 is higher for a fixed ρ̂1), the optimal t1 tends to become smaller, and hence the tradeoff between the performance at the estimated load and stability at higher loads is more significant. This makes intuitive sense, since at lower ρ2, server 2 can help more.

Figure 5: The mean response time under the T1(3) policy and the T1(19) policy as a function of ρ2, where c1 = c2 = 1, c1 µ1 = c1 µ12 = 1, c2 µ2 = 1/16, and ρ1 = 1.15 are fixed.

Figure 5 highlights the static robustness of the T1 policy, plotting the mean response time as a function of ρ2 (only λ2 is changed) for T1(3) and T1(19). When ρ2 = 0.4, t1 = 3 is the optimal threshold (but T1(19) still provides finite mean response time). However, if it turns out that ρ2 = 0.8 is the actual load, then the T1(3) policy leads to instability (infinite mean response time), while the T1(19) policy minimizes mean response time. Thus, choosing a higher t1 (=19) guarantees stability against misestimation of the load, but results in worse performance at the estimated load. This experiment and others like it (Osogami, 2005) lead us to conclude that the T1 policy is poor with respect to static robustness.

2.2 T2 policy

In this section, we investigate the mean response time and static robustness of the T2 policy, comparing it to the T1 policy.
The T2 policy is formally defined as follows:

Definition 2 The T2 policy with parameter t2, the T2(t2) policy, is characterized by the following set of rules, all of which are enforced preemptively (preemptive-resume):

• Server 1 serves only its own jobs.

• Server 2 serves jobs from queue 1 if N2 < t2. Otherwise server 2 serves jobs from queue 2.

When N1 = 1 and N2 = 0, we allow the same exception as in the T1 policy.

Figure 6: Part (a) shows whether server 2 works on jobs from queue 1 or queue 2 as a function of N1 and N2, under the T2 policy with parameter t2. Part (b) shows the mean response time under the T2 policy with various t2 threshold values as a function of ρ2. Part (c) compares the mean response time under the "optimized" T2 policy and two T1 policies. In (b) and (c), c1 = c2 = 1, c1 µ1 = c1 µ12 = 1, c2 µ2 = 1/16, and ρ1 = 1.15 are fixed.

Figure 6(a) shows the jobs processed by server 2 as a function of N1 and N2 under the T2 policy. Recall that the T1 policy guarantees stability whenever ρ̂1 < 1 and ρ2 < 1 provided that t1 is chosen appropriately. By contrast, the T2 policy guarantees stability whenever ρ̂1 < 1 and ρ2 < 1 for any finite t2. It clearly dominates the T1 policy in this respect. More formally, the following theorem holds, which can be proved in a similar way to Theorem 1.

Theorem 3 Under the T2 policy with t2 < ∞, queue 1 is stable if and only if ρ̂1 < 1, and queue 2 is stable if and only if ρ2 < 1.

2.2.1 Assessing the performance of the T2 policy

It is not at all obvious how the T2 policy's performance compares with that of the T1 policy, when each is run with its optimal threshold. In this section, we investigate this question. The T2 policy can be analyzed via DR as in Section 2.1.2, approximating the 2D-infinite Markov chain by a 1D-infinite Markov chain tracking the exact number of type 1 jobs (cf. the 1D-infinite Markov chain for the T1 policy tracks the exact number of type 2 jobs). With respect to the number of type 2 jobs, the chain differentiates only between 0, 1, ..., t2 − 1, or t2^+ jobs. Figure 6(b) illustrates that the mean response time under the T2 policy is minimized at a small t2 (in this case t2 = 2) for a range of loads. This figure is representative of a wide range of parameter values that we studied.
Since choosing a small t2 minimizes the mean response time and still provides the maximum stability region, there is no tradeoff between minimizing the mean response time and maximizing the stability region with the T2 policy. However, Figure 6(c) (and many other experiments like it) implies that the mean response time under the T2 policy with the optimal t2 is typically higher than that under the T1 policy with the optimal t1. We conclude that, although the T2 policy has more static robustness than the T1 policy, it performs worse with respect to mean response time.

3 Analysis of multi-threshold allocation policies

The tradeoff between the low mean response time of the T1 policy and the good static robustness of the T2 policy motivates us to introduce a class of multi-threshold allocation policies: ADT policies. We will study how the mean response time and static robustness of these multi-threshold allocation policies compare to those of the single threshold allocation policies.

3.1 The adaptive dual threshold (ADT) policy

The key idea in the design of the ADT policy is that we want the ADT policy to operate as a T1 policy to ensure low mean response time, but we will allow the value of t1 to adapt, depending on the length of queue 2, to provide static robustness. Specifically, the ADT policy behaves like the T1 policy with parameter t1^(1) if the length of queue 2 is less than t2 and otherwise like the T1 policy with parameter t1^(2), where t1^(2) > t1^(1). We will see that, indeed, the ADT policy is far superior to the T1 policy with respect to static robustness. In addition, one might also expect that the mean response time of the optimized ADT policy will significantly improve upon that of the optimized T1 policy, since the ADT policy generalizes the T1 policy (the ADT policy is reduced to the T1 policy by setting t1^(1) = t1^(2)). However, this turns out to be largely false, as we see below. Formally, the ADT policy is characterized by the following rule.

Definition 3 The ADT policy with parameters t1^(1), t1^(2), and t2, the ADT(t1^(1), t1^(2), t2) policy, operates as the T1(t1^(1)) policy if N2 ≤ t2; otherwise, it operates as the T1(t1^(2)) policy.

Figure 7: Figure shows whether server 2 works on jobs from queue 1 or queue 2 as a function of N1 and N2 under the ADT policy with parameters t1^(1), t1^(2), and t2.

Figure 7 shows the jobs processed by server 2 under the ADT policy as a function of N1 and N2. (A separate class of multi-threshold allocation policies that place only one threshold on queue 1 and one on queue 2, the T1T2 policy, is introduced in Osogami et al., 2004.
Its mean response time and static robustness are only marginally improved over the T1 and T2 policies. Thus, T1T2 is in general inferior to ADT.)

At high enough ρ̂1 and ρ2, N2 usually exceeds t2, and the policy behaves similarly to the T1 policy with parameter t1^(2). Thus, the stability condition for ADT is the same as that for T1 with parameter t1^(2). The following theorem can be proved in a similar way to Theorem 1.

Theorem 4 The stability condition for the ADT policy with parameters t1^(1), t1^(2), and t2 is given by the stability condition for the T1 policy with parameter t1^(2) (Theorem 1).

The ADT policy can likewise be analyzed via DR as in Section 2.1.2, by approximating the 2D-infinite Markov chain by a 1D-infinite Markov chain (see Figure 8). For the ADT policy, the 1D-infinite Markov chain tracks the exact number of type 2 jobs, but tracks the number of type 1 jobs only up to the point where there are t1^(2) − 1 jobs. A type 1 arrival at this point starts a "busy period," which ends when there are once again t1^(2) − 1 jobs of type 1. We approximate the duration of this busy period with a two-phase PH distribution with parameters (β1, β2, β12), matching the first three moments as before. State (t1^(2)+, j) denotes that there are at least t1^(2) jobs of type 1 and there are j jobs of type 2, for j ≥ 0. The mean response time is again obtained via matrix analytic methods. In Appendix A.1, we analyze the ADT policy more formally.

Figure 8: The 1D-infinite Markov chain that models the behavior under the ADT(2,4,2) policy.

3.2 Results: Static robustness of the ADT policy

Figure 9 illustrates static robustness of the ADT policy, showing the mean response time under the ADT policy as a function of ρ2; the ADT policy achieves at least as low mean response time as the better of the T1 policies with the two different t1 values throughout the range of ρ2. Though not shown, the ADT policy is also (statically) robust against misestimation of ρ̂1 (Osogami, 2005).

Figure 9: The mean response time under the ADT policy as a function of ρ2. Here, c1 = c2 = 1, c1 µ1 = c1 µ12 = 1, c2 µ2 = 1/16, and ρ1 = 1.15 are fixed.

The robustness of the ADT policy can be attributed to the following. The dual thresholds on queue 1 make the ADT policy adaptive to misestimation of load, in that the ADT policy with parameters t1^(1), t1^(2), and t2 operates like the T1 policy with parameter t1^(1) at the estimated load and like the T1 policy with parameter t1^(2) at a higher load, where t1^(2) > t1^(1). Thus, server 2 can help queue 1 less when there are more type 2 jobs, preventing server 2 from becoming overloaded. This leads to the increased stability region and improved performance.

In specifying the three thresholds, t1^(1), t1^(2), and t2, for the ADT policy in Figure 9, we have used the following sequential heuristic:

1. Set t1^(1) as the optimal t1 value for the T1 policy at the estimated (given) load.
2. Choose t1^(2) so that it achieves stability in a desired range of load. We find that the mean response time at the estimated load is relatively insensitive to t1^(2), and hence we can choose a high t1^(2) to guarantee a large stability region.

3. Find t2 such that the policy provides both low mean response time at the estimated load and good static robustness. This is a nontrivial task. If t2 is set too low, the ADT policy behaves like the T1 policy with parameter t1^(2), degrading the mean response time at the estimated load, since t1^(2) is larger than the optimal t1 in the T1 policy. If t2 is set too high, the ADT policy behaves like the T1 policy with parameter t1^(1). This worsens the mean response time at loads higher than the estimated load. In plotting Figure 9, we found "good" t2 values manually by trying a few different values, which took only a few minutes.

Observe that since the stability region is insensitive to t1^(1) and t2, we can choose these values so that the mean response time at the estimated load is optimized.

3.3 Results: Mean response time of the ADT policy

We have already seen the benefits of the ADT policy when the load is not exactly known (static robustness). One might also expect that, even when the load is known exactly, the ADT policy might significantly improve upon the T1 policy with respect to mean response time. Earlier work of Meyn (2001) provides some support for this expectation; Meyn shows via numerical examples that, in the case of finite buffers for both queues, the policy that minimizes mean response time is a "flexible" T1 policy which allows a continuum of T1 thresholds, {t1^(i)}, where threshold t1^(i) is used when the length of queue 2 is i. The ADT policy can be seen as an approximation of a "flexible" T1 policy, using only two t1 thresholds.

To evaluate the benefit of the ADT policy, we compare it over a range of ρ2 against the T1 policy optimized for the given ρ2. Since the search space of the threshold values for the ADT policy is large, we find locally optimal threshold values, which are found to be optimal within a search space of ±5 for each threshold. We measure the percentage change in the mean response time of ADT versus T1:

(E[R_ADT] − E[R_T1]) / E[R_T1] × 100 (%),    (2)

where E[R_X] denotes the mean response time in policy X ∈ {ADT, T1}.

Figure 10: The percentage change (%) in the mean response time of the (locally) optimized ADT policy over the optimized T1 policy at each given load, as a function of ρ2, for (a) c2 µ2 = 1/4 and (b) c2 µ2 = 1/16. A negative percentage indicates the improvement of ADT over T1. Here, c1 = c2 = 1, c1 µ1 = c1 µ12 = 1, and ρ1 = 1.15 are fixed.

Figure 10 shows the percentage reduction in the mean response time of the locally optimized ADT policy over the T1 policy optimized at each ρ2, as a function of ρ2. Figure 10 shows that, surprisingly, the benefit of the ADT policy is quite small with respect to mean response time under fixed Poisson arrivals; the improvement of the ADT policy is larger at moderately high ρ2 and at smaller c2 µ2 value, but overall the improvement is typically within 3%. We conjecture that adding more thresholds (approaching the flexible T1 policy) will not improve mean response time appreciably, given the small improvement from one to two thresholds. Thus, whereas the ADT policy has significant benefits over the simpler T1 policy with respect to static robustness, the two policies are comparable with respect to mean response time.
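To summarize the threshold logic studied in this section in executable form, the sketch below (our own illustrative code, which omits the N1 = 1 exceptions of Definition 1 that depend on whether µ1 < µ12) returns the queue that server 2 works on under ADT(t1_lo, t1_hi, t2); setting t1_lo = t1_hi recovers the T1 policy. Server 1 always works on its own queue, so only server 2's decision depends on the thresholds.

```python
def t1_server2_target(n1: int, n2: int, t1: int) -> int:
    """Queue served by server 2 under T1(t1), per Definition 1 (0 = idle).
    The small N1 = 1 exceptions that depend on mu1 < mu12 are omitted."""
    if n1 >= t1 or (n2 == 0 and n1 >= 2):
        return 1
    return 2 if n2 > 0 else 0


def adt_server2_target(n1: int, n2: int, t1_lo: int, t1_hi: int, t2: int) -> int:
    """Queue served by server 2 under ADT(t1_lo, t1_hi, t2), per Definition 3:
    act as T1(t1_lo) while queue 2 is short (N2 <= t2), and as the more
    conservative T1(t1_hi) once queue 2 builds up (N2 > t2)."""
    assert t1_lo <= t1_hi
    return t1_server2_target(n1, n2, t1_lo if n2 <= t2 else t1_hi)
```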
4 Dynamic robustness of threshold-based policies

We have seen, in Section 3.3, that the mean response time of the optimized ADT policy is similar to that of the optimized T1 policy, although the ADT policy has greater static robustness. Note that this observation is based on the assumption of Poisson arrivals. Consider, to start, an alternate scenario, in which the load at queue 2 fluctuates. For example, a long high load period (e.g., ρ1 = 1.15 and ρ2 = 0.8) is followed by a long low load period (e.g., ρ1 = 1.15 and ρ2 = 0.4), and the high and low load periods alternate. The T1 policy with a fixed threshold value must have a high mean response time either during the high load period or during the low load period (recall Figure 5). On the other hand, the ADT policy may provide low mean response time during both high and low load periods, since the t1 threshold value is self-adapted to the load (recall Figure 9). In this section, we study the mean response time of the ADT policy when the load fluctuates, or the dynamic robustness of the ADT policy.

We use a Markov modulated Poisson process of order two (MMPP(2)) as an arrival process at queue 2. An MMPP(2) has two phases, which we denote as the high load phase and the low load phase. The duration of each phase has an exponential distribution, which can differ in each phase. During the high (respectively, low) load phase, the arrival process follows a Poisson process with rate λH (respectively, λL), where λH > λL. We postpone describing the techniques used to analyze the ADT policy under the MMPP to Appendix A.2, and first study the results.

Results: Dynamic robustness of the ADT policy vs. the T1 policy

Figure 11 shows the percentage change in the mean response time of the (locally) optimized ADT policy over the optimized T1 policy, when arrivals at queue 1 follow a Poisson process and arrivals at queue 2 follow an MMPP(2). The arrival rates in the MMPP(2) are chosen such that the load during the high load period is ρ2 = 0.8 and the load during the low load period is ρ2 = 0.2, 0.4, or 0.6, while ρ1 = 1.15 is fixed throughout. We choose the horizontal axis to be the expected number of type 2 arrivals during a high load period, and the three lines (solid, dashed, and dotted) to correspond to different expected numbers of type 2 arrivals during a low load period.

Figure 11: The percentage change (%) in the mean response time of the (locally) optimized ADT policy over the optimized T1 policy for each given MMPP(2), shown as a function of the expected number of arrivals during a high load period, E[NH], and during a low load period, E[NL]. A negative percentage indicates the improvement of ADT over T1. Here, c1 = c2 = 1, c1 µ1 = c1 µ12 = 1, c2 µ2 = 1/16, and ρ1 = 1.15 are fixed. Panels shown: (a) ρ2 = [0.2, 0.8]; (c) ρ2 = [0.6, 0.8].

Note that since there are less frequent arrivals during a low load period, having the same number of arrivals during the high and low load periods implies that the low load period is longer. Thus, the number of arrivals during each period is an indicator of how frequently the load changes, which we find to be an important parameter in studying dynamic robustness.
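For completeness, the following sketch (our own, with illustrative parameter names) shows how arrival times can be sampled from the MMPP(2) described above; in this parameterization, the quantity E[NH] on the horizontal axis of Figure 11 corresponds to lam_high times the mean duration of a high load phase.

```python
import random

def mmpp2_arrivals(lam_high: float, lam_low: float,
                   mean_high: float, mean_low: float,
                   horizon: float, seed: int = 0) -> list:
    """Arrival times in [0, horizon] from an MMPP(2): a Poisson process whose
    rate alternates between lam_high and lam_low over exponentially
    distributed high/low load phases."""
    rng = random.Random(seed)
    t, high, arrivals = 0.0, True, []
    while t < horizon:
        mean_dur = mean_high if high else mean_low
        phase_end = min(horizon, t + rng.expovariate(1.0 / mean_dur))
        rate = lam_high if high else lam_low
        s = t + rng.expovariate(rate)
        while s < phase_end:           # Poisson arrivals within the current phase
            arrivals.append(s)
            s += rng.expovariate(rate)
        t, high = phase_end, not high  # discard the overshoot (memorylessness)
    return arrivals
```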
The threshold values of the optimized T1 policy and the (locally) optimized ADT policy are chosen such that the overall weighted mean response time is minimized for each given arrival process. The first thing to notice in Figure 11 is that the improvement of the ADT policy over the T1 policy is smaller when the duration of the high and low load periods is shorter, or equivalently when there are fewer arrivals in each period. This makes intuitive sense, since the MMPP(2) reduces to a Poisson process when the high and low load periods alternate infinitely quickly, and under the Poisson process, the optimized T1 policy and the optimized ADT policy provide similar mean response time; see Section 3.3.

However, even when the durations are longer, the performance improvement of the ADT policy over the T1 policy is comparatively small (3 to 25%). This is mainly because the mean response time of the jobs arriving during the high load period tends to dominate the overall mean response time for two reasons: (i) the response time of jobs arriving during the high load period is much higher than that of jobs arriving during the low load period, partially due to the fact that any reasonable allocation policy (such as the optimized T1 policy and the optimized ADT policy) can provide low mean response time at low load, and (ii) assuming that a low load period and a high load period have the same duration, there are more arrivals during a high load period. Since the T1 policy with a fixed t1 threshold can provide low mean response time for the jobs arriving during the high load period, it can provide low overall mean response time. (Of course, if there were many more arrivals during the low load period than the high load period, the t1 threshold would be adjusted.) It is only when the jobs arriving during the high load period and the jobs arriving during the low load period have roughly equal contribution to the overall mean response time that the ADT policy can have appreciable improvement over the T1 policy. This happens when

Σ_{i=1,2} ci pi^L E[Ri^L] ∼ Σ_{i=1,2} ci pi^H E[Ri^H],

where pi^L (respectively, pi^H) is the fraction of jobs that are type i and arriving during the low (respectively, high) load period, and E[Ri^L] (respectively, E[Ri^H]) is the mean response time of type i jobs arriving during the low (respectively, high) load period, for i = 1, 2. For example, Figure 11 suggests that the ADT policy can provide a mean response time that is 20-30% lower than that of the T1 policy, when the number of arrivals during a low load period is (∼ 10 times) larger than that during a high load period. In addition (not shown), we find that when arrivals at queue 1 follow an MMPP(2) or when arrivals at both queues follow MMPP(2)'s, the improvement of the ADT policy over the T1 policy tends to be smaller than when only arrivals at queue 2 follow an MMPP(2). Overall, we conclude that ADT has appreciable improvement over T1 only when there are more arrivals during the low load period than during the high load period, giving them comparable importance.

5 Application to call center scheduling

In this section we apply the lessons of previous sections to designing allocation policies in a telephone call center, simulated using traces at a call center of an anonymous bank in Israel in 1999 provided by Guedj and Mandelbaum (2000). This call center uses a service architecture that is similar to the Beneficiary-Donor model, based on different classes of callers.
Our goals are to assess what improvement may be possible for the call center through the implementation of threshold-based policies, and more generally to evaluate some of the high level principles of prior sections. For this purpose, we will first study some relevant characteristics of the trace in Section 5.1 (see Brown et al., 2005, for a complementary study of the trace). In particular, we will see that the arrival rate at this call center has great fluctuation, as in Section 4. Based on the lessons learned in previous sections, we expect that the T1 policy may perform well in this call center. We will evaluate this expectation via trace driven simulation in Section 5.2.

5.1 Settings

5.1.1 Trace characteristics

The data span twelve months of 1999 and were collected at the level of individual calls at a small call center of an anonymous bank in Israel. An arriving call is first connected to an interactive voice response unit (VRU), where the customer receives recorded information and possibly performs self-service transactions. In total, roughly 1,200,000 calls arrived at the VRU during the year of 1999; out of those, about 420,000 calls indicated a desire to speak to an agent. Below, we limit our focus to the 420,000 calls that requested connection to an agent.

The calls requesting connection to an agent can be divided into two types: Internet assistance (IN) calls and Regular calls. IN calls generally ask for technical support for online transactions via web sites. All the other calls are classified as Regular calls. Prior to August 1999, both the IN calls and the Regular calls joined a single shared queue, and were served by the same pool of agents. Post August 1999, the call center split the IN and Regular calls into two separate queues to be served by separate pools of agents (see Figure 12). In addition to distinguishing between two types of calls, the call center also differentiates between high and low priority customers, and looks for ways to give high priority customers shorter waiting times.

Figure 12: Post-August architectural model of a call center.

Table 1 summarizes the total number of calls of each type and of each priority class at the call center during 1999. Out of the approximately 420,000 total calls, about 400,000 calls are Regular, and about 20,000 calls are IN. Out of the 400,000 Regular calls, about 140,000 calls have high priority, and about 260,000 calls have low priority. By contrast, almost all IN calls have low priority. Table 2 shows the percentage of calls that are served by agents. About 85% of the calls are served, and 15% of the calls are abandoned before receiving service by agents (as there are only two IN calls with high priority, the corresponding entry is kept blank in the table).

             both prio   high prio   low prio
both types     419,857     137,317    282,540
Regular        400,765     137,315    263,450
IN              19,092           2     19,090

Table 1: Total number of calls during the year.

             both prio   high prio   low prio
both types       84.9%       85.8%      84.4%
Regular          85.2%       85.8%      84.8%
IN               78.9%                  78.9%

Table 2: Percentage of calls served.

Figure 13 details the total number of calls, showing (a) the daily number of calls during a month (November) and (b) the hourly number of calls during a day (November 1). As Figure 13(a) suggests, the number of calls per day drops on weekends (Fridays and Saturdays in Israel). As Figure 13(b) suggests, the call center opens at 7am on weekdays, and the number of calls per hour peaks before lunch time (∼ 200 calls per hour).
After lunch time, there is another peak, and then calls decline through the evening (to roughly 70 calls per hour). The arrival pattern does not differ much day to day during weekdays.

Figure 13: A typical arrival pattern of all 420,000 calls. The figures show the number of (a) daily arrivals in November and (b) hourly arrivals on November 1.

Table 3 summarizes the mean service demand (in seconds) and its squared coefficient of variation for those calls that are served. As there are only two IN calls with high priority, the corresponding entry is kept blank in the table. Note that the IN calls have noticeably longer service demand with higher variability, and this might be a reason for the call center to serve the IN calls by a separate pool of agents, so that the Regular calls are not blocked by long (low priority) IN calls.

             both prio   high prio   low prio
both types       190.1       208.7      180.9
Regular          180.6       208.7      165.7
IN               406.3                  406.3

(a) mean

             both prio   high prio   low prio
both types       2.217       1.871      2.420
Regular          1.836       1.871      1.741
IN               2.974                  2.974

(b) C^2

Table 3: Statistics of the duration of a service: (a) mean (in seconds) and (b) squared coefficient of variation.

5.1.2 Architectural models for experiment

In our experiment, we consider two possible service models for the call center, as shown in Figure 14. In both models, we assume that each queue has a single agent, approximating the fact that each queue is served by a pool of several agents at the call center (in fact, up to 13 agents serve the call center). This approximation becomes more accurate as the load becomes higher, where the study of performance becomes important.

Figure 14: Two architectural models of a call center: (a) Regular-IN model; (b) Priority model.

In the Regular-IN model (Figure 14(a)), we separate the IN calls from the Regular calls as in the original architectural model of a call center (Figure 12), but allow the IN agent to sometimes serve the Regular calls. Since almost all IN calls have low priority while 34% of the Regular calls have high priority, we place more weight (importance) on the Regular calls (specifically, the Regular calls have weight cR = 4, and the IN calls have weight cIN = 1). As the IN calls have longer service demand, more variability, and less importance, we do not want the Regular call agent to serve the IN calls. In this section, we assume a nonpreemptive service discipline, following call center convention. (In previous sections we have chosen preemptive service disciplines for clarity, as the analysis of the nonpreemptive case is complex, though possible, as described in Osogami, 2005.)

In the Priority model (Figure 14(b)), we separate high priority calls from low priority calls, and place more weight on high priority jobs (specifically, high priority calls have weight cH = 16, and low priority calls have weight cL = 1). As high priority calls and low priority calls have roughly the same service demand, high priority calls have higher cµ value; thus, we allow the agent for low priority calls to sometimes serve high priority calls.

5.1.3 Preprocessing of trace

We consider only those arrivals during weekdays, removing holidays (which have smaller numbers of calls).
In our trace driven simulation, we feed the trace of the Regular or high priority calls into queue 1 and the trace of the IN or low priority calls into queue 2. In order to use our trace of limited length multiple times (specifically, 30 times in each run) with different arrival sequences, we follow a common approach in simulation (Uhlig and Mudge, 1997), whereby each call is independently removed from the trace with some probability, as specified below. Another reason for sampling the arrival sequence in this way is to create different loads. Hereby, we define the load at queue i, ρi, as follows for i = 1, 2:

ρi = (1 − qi) × (total number of calls at queue i) × (average duration of a service for queue i) / (total operation hours),

where qi is the fraction of the calls at queue i removed from the trace. We pick the service demand of a call from the lognormal distribution whose parameters are estimated from the trace. Picking service demands from a distribution allows us to use the same trace multiple times with different service demand sequences.

5.2 Results

Our trace characterization (Section 5.1.1) shows that the load at the call center has large fluctuation. We will see that the T1 policy (with a fixed t1 threshold) provides a low mean response time even under this fluctuating load. In addition, we will also see how much improvement the call center can expect with respect to static and dynamic robustness by employing allocation policies such as the T1 and ADT policies, which allow resource sharing. For this purpose, we evaluate the following three allocation policies:

The Dedicated policy: Each agent serves only its own queue, as in the original call center model (Figure 12).

The T1 policy: The rules are specified in Definition 1, but they are enforced nonpreemptively.

The ADT policy: The rules are specified in Definition 3, but they are enforced nonpreemptively.

We study static robustness of these policies in Section 5.2.1, and dynamic robustness in Section 5.2.2.

5.2.1 Static robustness

Figures 15-16 illustrate static robustness of the Dedicated, T1, and ADT policies, plotting the mean response times as a function of ρ2, under the Regular-IN model (Figure 15) and under the Priority model (Figure 16). In the Regular-IN model, ρ2 ranges only between 0 and 0.43; no calls are removed from the trace at ρ2 = 0.43. On the other hand, the Priority model allows us to change ρ2 in a much wider range, and we show only a portion of the full range of ρ2. In both the Regular-IN and Priority models, ρ1 is chosen such that the mean response time of calls at queue 1 under Dedicated is about 60 minutes in column (a), i.e. ρ1 = 0.7 in Regular-IN and ρ1 = 0.42 in Priority, and about 30 minutes in column (b), i.e. ρ1 = 0.53 in Regular-IN and ρ1 = 0.32 in Priority. In both the Regular-IN and Priority models, the top row shows the mean response time under Dedicated, T1(1), T1(10), and T1(∞), and the bottom row shows the mean response time under T1(1), T1(10), and ADT with a different scale on the vertical axis.

The top rows of Figures 15-16 show that all of the T1 policies can significantly improve upon the Dedicated policy for a range of ρ2 for both high and low ρ1.
Figure 15: Static robustness of Dedicated, T1, and ADT under the Regular-IN model: (a) ρ1 = 0.70 and (b) ρ1 = 0.53. The top row plots the mean response time (sec) of Dedicated, T1(1), T1(10), and T1(∞) as a function of ρ2; the bottom row plots T1(1), T1(10), and ADT(1,10,9) on a different vertical scale.

In the case of the Regular-IN model, this improvement implies that resource sharing (T1) has a significant benefit over the original call center architecture (Dedicated) with respect to mean response time. The figures also show that the improvement of the T1 policies over Dedicated becomes smaller at higher ρ2 and lower ρ1. This makes intuitive sense, since the agent at queue 2 can help queue 1 less at higher ρ2, and is needed less at lower ρ1. T1(∞) is the policy where the agent at queue 2 helps queue 1 only when there are no calls waiting at queue 2, and is equivalent to the T2(1) policy. Figures 15-16 (top rows) suggest that T1(∞) can significantly improve upon Dedicated, but its mean response time can be much higher than that of the T1 policy with the optimized t1 threshold value. This is in agreement with what we observed in Section 2.2: the mean response time under the T2 policy, including T2(1), is typically higher than that under the optimized T1 policy when queue 1 has the higher cµ value (c1µ12 > c2µ2).

Figure 16: Static robustness of Dedicated, T1, and ADT under the Priority model: (a) ρ1 = 0.42 and (b) ρ1 = 0.32. The top row plots the mean response time (sec) of Dedicated, T1(1), T1(10), and T1(∞) as a function of ρ2; the bottom row plots T1(1), T1(10), and ADT(1,10,19) on a different vertical scale.

Taking a closer look, we see in Figure 16 (top row) that at very high ρ2 the mean response time under T1(1) becomes higher than that under Dedicated. This is due to the smaller stability region of the T1 policy with a small t1 threshold value, discussed in Section 2.1. Observe that in our parameter settings T1(1) is equivalent to the cµ rule, as the cµ value is higher at queue 1 in both models. The loss of stability of the T1 policies is less apparent in Figures 15-16 than, for example, in Figure 5, because there are no calls after midnight at the call center, and thus all calls are eventually served in our simulation settings.

Overall, Figures 15-16 (top rows) show that the T1 policy lacks static robustness. T1(1) provides low mean response time at lower ρ2, but its mean response time becomes high (sometimes even higher than that under Dedicated) at higher ρ2. Conversely, T1(10) has higher mean response time than T1(1) at lower ρ2, but it can provide lower mean response time at higher ρ2. Specifically, in Figure 15(a), the mean response time under T1(10) can be 10% worse than that under T1(1) at lower ρ2, and the mean response time under T1(1) can be 10% worse than that under T1(10) at higher ρ2. Likewise, in Figure 16(a), the mean response time under T1(10) can be 30% worse than that under T1(1) at lower ρ2, and the mean response time under T1(1) can be 20% worse than that under T1(10) at higher ρ2.
The bottom rows of Figures 15-16 illustrate the static robustness of the ADT policy, where the thresholds t1^(1) = 1 and t1^(2) = 10 are fixed and t2 is chosen via the heuristic introduced in Section 3.2. The figures show that, over the full range of ρ2 and for both high and low ρ1, the mean response time of the ADT policy is roughly at least as low as that of the better of the two T1 policies; i.e., the ADT policy excels in static robustness. This reinforces our findings in Section 3.2.

The conclusion of our experiments is that resource sharing (T1) can significantly improve the mean response time at the call center. The ADT policy is an even better choice if the call center wants static robustness against day-to-day changes in the number of calls, for example due to changes in service at the bank, increased patronage, or the increased popularity of online transactions (which in turn leads to more IN calls).

5.2.2 Dynamic robustness

In the previous section, we studied the effect of misestimation of the average load by considering the mean response time of ADT and T1 at different average loads. To isolate the effect of dynamic robustness, we now hold the average load fixed and study the effect of the load fluctuation inherent in our trace.

Figure 17 illustrates the dynamic robustness of the T1 and ADT policies, plotting the percentage change in the mean response time each month relative to the optimized T1 policy. Recall that the trace has large fluctuations in the arrival rate within each day (Figure 13). As the arrival pattern in the trace differs slightly from month to month, evaluating the mean response time each month allows us to evaluate dynamic robustness over twelve different arrival patterns. The threshold value, t1, of the optimized T1 policy is chosen such that the overall mean response time during the year is minimized, and is fixed throughout the year. In the Regular-IN model (Figure 17(a)), T1(1), T1(∞), and ADT(3,7,12) are evaluated against the optimized T1 policy, T1(6). In the Priority model (Figure 17(b)), T1(1), T1(∞), and ADT(3,6,28) are evaluated against the optimized T1 policy, T1(4). Here, ADT(3,7,12) and ADT(3,6,28) are (locally) optimized ADT policies whose threshold values are chosen to minimize the mean response time during the year, as in Section 4. The loads, ρ1 and ρ2, are chosen such that the mean response time under Dedicated is roughly 60 minutes for both queue 1 and queue 2, although, as Figures 15-16 suggest, the mean response time is much lower under the T1 and ADT policies.

Figure 17: Dynamic robustness of T1 and ADT under (a) the Regular-IN model (ρ1 = 0.70 and ρ2 = 0.37) and (b) the Priority model (ρ1 = 0.42 and ρ2 = 0.82). The figures show the percentage change (%) in the mean response time of T1(1), T1(∞), and the (locally) optimized ADT policy over the T1 policy with the optimal t1 threshold for the year: (a) t1 = 6 and (b) t1 = 4. A negative percentage indicates an improvement over the optimized T1 policy.

Figure 17 shows that the mean response time under T1(∞) can be twice as high as the mean response time under the optimized T1 policy. Overall, under T1(∞) the agent at queue 2 is too conservative; it could help queue 1 more without unduly penalizing the calls at queue 2.
Figure 17 also shows that the mean response time under T1(1) can be 10-20% higher than that under the optimized T1 policy in some months. In the Regular-IN model, T1(1) is superior to the optimized T1 policy during the first seven months, but its mean response time becomes higher during the rest of the year. This is because the number of IN calls per month increases throughout the year, and the T1(1) policy causes starvation at queue 2 during peak hours. In the Priority model, on the other hand, the T1(1) policy is consistently (∼10%) worse than the optimized T1 policy. Overall, the mean response time under the T1(1) policy tends to be high, as it is likely to cause starvation at queue 2 at peak hours.

Finally, Figure 17 shows that the mean response time under the (locally) optimized ADT policy is slightly lower than that under the optimized T1 policy, but, paralleling the observation in Section 4, the performance advantage of ADT over T1 is small under load fluctuation. Specifically, the improvement of the optimized ADT policy over the optimized T1 policy is never more than 5% in the Regular-IN model and never more than 2.5% in the Priority model.

The conclusion of our experiments is that the ADT policy yields only a small improvement over the T1 policy with respect to dynamic robustness. The small improvement suggests that using more thresholds may not improve the mean response time appreciably. Thus, with respect to minimizing the mean response time at the call center, the T1 policy suffices, even though the load at the call center fluctuates considerably within each day. Note that these observations agree with our findings in Section 4.

6 Conclusion

This paper presents the first analytical study of the performance of a wide range of threshold-based (resource) allocation policies in a multiserver system. The speed and accuracy of our analysis allow an extensive evaluation of these allocation policies, and we reach several surprising conclusions.

We first consider single threshold policies, T1 and T2, and find that the T1 policy is superior with respect to (overall weighted) mean response time. That is, the threshold for resource allocation is better determined by the beneficiary queue length (queue 1) than by the donor queue length (queue 2), in all cases studied.

We then compare single threshold policies to a multiple threshold policy, the adaptive dual threshold (ADT) policy, with respect to mean response time, assuming that the load is fixed and known. We find that when the threshold value is chosen appropriately, the mean response time of the T1 policy is at worst very close to the best mean response time achieved by the ADT policy. This is surprising, since the optimal policy appears to have infinitely many thresholds; evidently the improvement these additional thresholds generate is marginal.

We next study static robustness, where the load is constant but may have been misestimated. We find that the ADT policy not only provides low mean response time but also excels in static robustness, whereas the T1 policy does not. The increased flexibility of the ADT policy enables it to provide low mean response time under a range of loads. Hence, when the load is not exactly known, the ADT policy is a much better choice than the T1 policy.

Finally, and surprisingly, our analysis shows that this improvement in static robustness does not necessarily carry over to dynamic robustness, i.e. robustness against load fluctuation.
We observe that when the load is fluctuating, the T1 policy, which lacks static robustness, can often provide low mean response time, comparable to ADT. This can occur, for example, because the mean response time of jobs arriving during the high load period may dominate the overall mean response time.

Complementing our analytical work, we evaluate the performance of various allocation policies using a trace from a call center. The arrival pattern at the call center exhibits quite large fluctuations; our trace driven simulation reinforces our conclusions, implying that this call center could significantly improve mean response time by using the T1 policy, and that the ADT policy offers only a small additional improvement over the T1 policy with respect to dynamic robustness. However, the ADT policy is the better choice if the call center wants static robustness.

Acknowledgement

This work is supported by NSF Career Grant CCR-0133077, NSF Theory CCR-0311383, NSF ITR CCR-0313148, and IBM Corporation via Pittsburgh Digital Greenhouse Grant 2003. The authors also thank Li Zhang, who contributed to an earlier version of this paper (Osogami et al., 2004).

References

H. S. Ahn, I. Duenyas, and R. Q. Zhang. Optimal control of a flexible server. Advances in Applied Probability, 36:139-170, 2004.

S. L. Bell and R. J. Williams. Dynamic scheduling of a system with two parallel servers in heavy traffic with complete resource pooling: Asymptotic optimality of a continuous review threshold policy. Annals of Applied Probability, 11:608-649, 2001.

L. Brown, N. Gans, A. Mandelbaum, A. Sakov, H. Shen, S. Zeltyn, and L. Zhao. Statistical analysis of a telephone call center: A queueing-science perspective. Journal of the American Statistical Association, 100(469):36-50, 2005.

D. R. Cox and W. L. Smith. Queues. Kluwer Academic Publishers, 1971.

L. Green. A queueing system with general use and limited use servers. Operations Research, 33(1):168-182, 1985.

I. Guedj and A. Mandelbaum. “Anonymous bank” call-center data, February 2000. http://ie.technion.ac.il/~serveng/.

J. M. Harrison. Heavy traffic analysis of a system with parallel servers: Asymptotic optimality of discrete review policies. Annals of Applied Probability, 8(3):822-848, 1998.

G. Latouche and V. Ramaswami. Introduction to Matrix Analytic Methods in Stochastic Modeling. ASA-SIAM, Philadelphia, 1999.

A. Mandelbaum and A. Stolyar. Scheduling flexible servers with convex delay costs: Heavy traffic optimality of the generalized cµ-rule. Operations Research, 52(6):836-855, 2004.

S. Meyn. Sequencing and routing in multiclass queueing networks. Part I: Feedback regulation. SIAM Journal on Control and Optimization, 40(3):741-776, 2001.

T. Osogami. Analysis of multi-server systems via dimensionality reduction of Markov chains. PhD thesis, School of Computer Science, Carnegie Mellon University, 2005. http://www.cs.cmu.edu/~osogami/thesis/.

T. Osogami, M. Harchol-Balter, A. Scheller-Wolf, and L. Zhang. Exploring threshold-based policies for load sharing. In Proceedings of the 42nd Annual Allerton Conference on Communication, Control, and Computing, pages 1012-1021, September 2004.

R. Shumsky. Approximation and analysis of a call center with specialized and flexible servers. OR Spektrum, 26(3):307-330, 2004.

M. S. Squillante, C. H. Xia, D. D. Yao, and L. Zhang. Threshold-based priority policies for parallel-server systems with affinity scheduling. In Proceedings of the IEEE American Control Conference, pages 2992-2999, June 2001.
M. S. Squillante, C. H. Xia, and L. Zhang. Optimal scheduling in queuing network models of high-volume commercial web sites. Performance Evaluation, 47(4):223-242, 2002.

D. A. Stanford and W. K. Grassmann. The bilingual server system: A queueing model featuring fully and partially qualified servers. INFOR, 31(4):261-277, 1993.

D. A. Stanford and W. K. Grassmann. Bilingual server call centers. In D. R. McDonald and S. R. E. Turner, editors, Analysis of Communication Networks: Call Centers, Traffic and Performance. American Mathematical Society, 2000.

R. A. Uhlig and T. N. Mudge. Trace-driven memory simulation: A survey. ACM Computing Surveys, 29(2):128-170, 1997.

J. A. Van Mieghem. Dynamic scheduling with convex delay costs: The generalized cµ rule. Annals of Applied Probability, 5(3):809-833, 1995.

R. J. Williams. On dynamic scheduling of a parallel server system with complete resource pooling. In D. R. McDonald and S. R. E. Turner, editors, Analysis of Communication Networks: Call Centers, Traffic and Performance. American Mathematical Society, 2000.

A Analysis of the ADT policy

A.1 Analysis of the ADT policy under Poisson arrivals

Below we provide a formal description of the analysis of the ADT policy; this will be useful when we generalize the arrival process. The Markov chain in Figure 8 is a (nonhomogeneous) QBD process, where level j of the process denotes the j-th column, namely all states of the form (i, j) for each j. The generator matrix, Q, of this process can be expressed as a block tridiagonal matrix:

Q = \begin{pmatrix}
L^{(0)} & F^{(0)} & & & \\
B^{(1)} & L^{(1)} & F^{(1)} & & \\
& B^{(2)} & L^{(2)} & F^{(2)} & \\
& & \ddots & \ddots & \ddots
\end{pmatrix},

where submatrix F(i) encodes transitions from level (column) i to level i + 1 for i ≥ 0, submatrix B(i) encodes transitions from level i to level i − 1 for i ≥ 1, and submatrix L(i) encodes transitions within level i for i ≥ 0. Note that our QBD process repeats after level t2 + 1, i.e., F(i) = F(t2+1), L(i) = L(t2+1), and B(i) = B(t2+1) for all i > t2 + 1.

Before dimensionality reduction, the matrices F(i), L(i), and B(i) had an infinite number of columns and rows, since the number of states in each level was infinite (recall Figure 3(a)). DR reduces the number of states in each level to a finite number; as a result, the matrices F(i), L(i), and B(i) have a finite number of columns and rows. Again, in Figure 8, the bottom two rows correspond to the “busy period,” during which there are ≥ t1^(2) type 1 jobs, approximated by a two-phase PH distribution with parameters (β1, β2, β12).

Assuming 1 < t1^(1) < t1^(2), and that the “busy period” is approximated by a two-phase PH distribution, the matrices F(i), L(i), and B(i) have size (t1^(2) + 2) × (t1^(2) + 2) for all i. Specifically, F(i) = λ2 I for i ≥ 0, where I is an identity matrix of an appropriate size;

B^{(i)} = \mu_2 \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix},

where I is an identity matrix of size t1^(1) for 1 ≤ i ≤ t2 and of size t1^(2) for i > t2, and 0 is a zero matrix of an appropriate size; and

L^{(i)} = \begin{pmatrix}
(*) & \lambda_1 & & & & & \\
\mu_1 & (*) & \lambda_1 & & & & \\
& \mu_1 + \mu_{12} & (*) & \lambda_1 & & & \\
& & \ddots & \ddots & \ddots & & \\
& & & \mu_1 + \mu_{12} & (*) & \lambda_1 & 0 \\
& & & & \beta_1 & -(\beta_1 + \beta_{12}) & \beta_{12} \\
& & & & \beta_2 & 0 & -\beta_2
\end{pmatrix}

for all 1 ≤ i ≤ t2, where the diagonal elements, (*), are determined so that the sum of each row of the generator matrix Q is zero. Matrix L(0) is obtained from L(1) by replacing the (2,1) element by max{µ1, µ12}. For i > t2, L(i) is obtained from L(1) by replacing the (k + 1, k) element (i.e., µ1 + µ12) by µ1 for 3 ≤ k ≤ t2.
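To make this construction concrete, the following sketch (in Python/NumPy; ours, not part of the original analysis) assembles F(i), L(i), and B(i) for a given level i under one reading of the description above: indices are 0-based, the first t1^(2) phases track the number of type 1 jobs, and the last two phases are the PH phases of the busy period, whose parameters (β1, β2, β12) are assumed to have been obtained already via dimensionality reduction. The function name and the exact index ranges are ours and should be checked against Figure 8.

```python
import numpy as np

def adt_level_matrices(i, lam1, lam2, mu1, mu2, mu12,
                       t1_1, t1_2, t2, beta1, beta2, beta12):
    """Sketch of the level-i submatrices (F, L, B) of the ADT QBD process.

    Phases 0..t1_2-1 track the number of type 1 jobs; the last two phases
    form the two-phase PH approximation of the busy period (>= t1_2 type 1
    jobs).  The busy-period parameters beta1, beta2, beta12 are inputs.
    """
    n = t1_2 + 2                                   # phases per level after DR

    F = lam2 * np.eye(n)                           # type 2 arrival: level up

    # Type 2 departures (rate mu2) occur only while server 2 works on queue 2,
    # i.e. while the number of type 1 jobs is below the threshold in force:
    # t1^(1) for levels <= t2, t1^(2) for levels > t2.
    m = t1_1 if i <= t2 else t1_2
    B = np.zeros((n, n))
    B[:m, :m] = mu2 * np.eye(m)

    L = np.zeros((n, n))
    for k in range(t1_2):                          # type 1 arrivals; the last
        L[k, k + 1] = lam1                         # one starts the busy period
    L[1, 0] = max(mu1, mu12) if i == 0 else mu1    # type 1 departures
    for k in range(2, t1_2):
        L[k, k - 1] = mu1 + mu12
    if i > t2:                                     # per the text, some mu1+mu12
        for k in range(3, t2 + 1):                 # entries revert to mu1
            L[k, k - 1] = mu1
    L[t1_2, t1_2 - 1] = beta1                      # busy period ends from phase 1
    L[t1_2, t1_2 + 1] = beta12                     # PH phase 1 -> phase 2
    L[t1_2 + 1, t1_2 - 1] = beta2                  # busy period ends from phase 2

    if i == 0:
        B = np.zeros((n, n))                       # level 0 has no down-transitions
    # diagonal (*) entries: each row of the generator must sum to zero
    np.fill_diagonal(L, -(F.sum(axis=1) + L.sum(axis=1) + B.sum(axis=1)))
    return F, L, B
```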
Using matrix analytic methods, the stationary probability vector of level i, πi, is then given recursively by

πi = πi−1 R(i),

where R(i) is given recursively by

F(i−1) + R(i) L(i) + R(i) R(i+1) B(i+1) = 0,   for i = t2, ..., 1,

where 0 is a zero matrix of an appropriate size. Since our QBD process repeats after level t2 + 1, R(i) = R for all i ≥ t2 + 1, where R is given by the minimal solution to the matrix quadratic equation

F(t2+1) + R L(t2+1) + R^2 B(t2+1) = 0.

The row vector π0 is given by a positive solution of

π0 (L(0) + R(1) B(1)) = 0,

normalized by

π0 Σ_{i=0}^{∞} Π_{k=1}^{i} R(k) 1 = 1,   (3)

where 0 and 1 denote vectors of zeros and ones, respectively, of appropriate dimension. Note that the infinite sum in (3) can be rewritten in closed form using (I − R)^{-1}, since R(i) = R for all i ≥ t2 + 1. The mean response time can then be computed from the stationary distributions (the πi's) via Little's law, as before.

A.2 Analysis of the ADT policy under MMPP

In this section, we describe the analysis of the ADT policy when arrivals at queue 2 follow an MMPP and arrivals at queue 1 follow a Poisson process. An extension to the case where arrivals at queue 1 also follow an MMPP is possible, but requires an additional technique (see e.g. Osogami, 2005). Following standard notation, we denote the parameters of an MMPP by a pair of matrices (D0, D1). In the case of an MMPP(2),

D_0 = \begin{pmatrix} -(\alpha_H + \lambda_H) & \alpha_H \\ \alpha_L & -(\alpha_L + \lambda_L) \end{pmatrix}
\quad \text{and} \quad
D_1 = \begin{pmatrix} \lambda_H & 0 \\ 0 & \lambda_L \end{pmatrix},

where the duration of the high (respectively, low) load period is exponentially distributed with rate αH (respectively, αL). Let λ denote the fundamental rate (the overall average arrival rate) of the MMPP.

The generator matrix, Q̂, when the arrivals at queue 2 follow an MMPP with parameters (D0, D1) is obtained by modifying the generator matrix, Q, for Poisson arrivals, which we introduced in Section A.1. We denote the submatrices of Q̂ by F̂(i), L̂(i), and B̂(i), i.e.,

\hat{Q} = \begin{pmatrix}
\hat{L}^{(0)} & \hat{F}^{(0)} & & & \\
\hat{B}^{(1)} & \hat{L}^{(1)} & \hat{F}^{(1)} & & \\
& \hat{B}^{(2)} & \hat{L}^{(2)} & \hat{F}^{(2)} & \\
& & \ddots & \ddots & \ddots
\end{pmatrix}.

Then, F̂(i), L̂(i), and B̂(i) are obtained from the submatrices F(i), L(i), and B(i) of Q by

F̂(i) = D1 ⊗ F(i) / λ,
B̂(i) = I0 ⊗ B(i),
L̂(i) = D0 ⊗ Ii + I0 ⊗ L(i),

for all i. Here, Ii denotes an identity matrix of the same size as L(i), I0 denotes an identity matrix of the same size as D0, and ⊗ denotes the Kronecker product. Note that, before DR, F(i), L(i), and B(i) have infinite size, and so do F̂(i), L̂(i), and B̂(i). Dimensionality reduction makes F(i), L(i), and B(i) finite, and thus F̂(i), L̂(i), and B̂(i) also become finite. Now, the mean response time is obtained in the same way as in Section A.1.
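To illustrate how the pieces of this appendix fit together, the sketch below (Python/NumPy; ours, not the authors' implementation, which relies on dimensionality reduction and more refined numerics) computes the fundamental rate of an MMPP(D0, D1), lifts one level's Poisson-arrival submatrices to the MMPP case via the Kronecker products above, and solves the matrix quadratic equation of Section A.1 for R by a simple fixed-point iteration. All function names are ours.

```python
import numpy as np

def mmpp_fundamental_rate(D0, D1):
    """lambda = pi D1 1, where pi is the stationary distribution of the
    modulating chain, i.e. pi (D0 + D1) = 0 and pi 1 = 1."""
    G = D0 + D1
    m = G.shape[0]
    A = np.vstack([G.T, np.ones(m)])          # stack pi G = 0 with pi 1 = 1
    b = np.zeros(m + 1)
    b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    return float(pi @ D1 @ np.ones(m))

def mmpp_level_matrices(D0, D1, F, L, B):
    """Lift one level's Poisson-arrival submatrices (F, L, B) to the MMPP
    case via Kronecker products, as in Section A.2."""
    lam = mmpp_fundamental_rate(D0, D1)
    I0 = np.eye(D0.shape[0])                  # identity on the MMPP phases
    Ii = np.eye(L.shape[0])                   # identity on the level's phases
    F_hat = np.kron(D1, F) / lam
    B_hat = np.kron(I0, B)
    L_hat = np.kron(D0, Ii) + np.kron(I0, L)
    return F_hat, L_hat, B_hat

def solve_R(F, L, B, tol=1e-12, max_iter=100_000):
    """Minimal nonnegative solution of F + R L + R^2 B = 0 for the repeating
    levels, via the standard (if slow) iteration R <- -(F + R^2 B) L^{-1}."""
    L_inv = np.linalg.inv(L)
    R = np.zeros_like(F)
    for _ in range(max_iter):
        R_next = -(F + R @ R @ B) @ L_inv
        if np.max(np.abs(R_next - R)) < tol:
            break
        R = R_next
    return R_next
```

Combining these with the level matrices of Section A.1, one would compute R for the repeating levels, run the backward recursion for R(i), i = t2, ..., 1, and normalize π0 via (3) to obtain the stationary distribution and, by Little's law, the mean response time.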