Robustness and performance of threshold-based
resource allocation policies
Takayuki Osogami
Mor Harchol-Balter
Alan Scheller-Wolf
Computer Science Department, Carnegie Mellon University,
5000 Forbes Avenue, Pittsburgh, PA 15213, USA, {osogami,harchol}@cs.cmu.edu,
Tepper School of Business, Carnegie Mellon University,
5000 Forbes Avenue, Pittsburgh, PA 15213, USA, awolf@andrew.cmu.edu.
Area of review: Manufacturing, Service, and Supply Chain Optimization.
Subject Classifications: Production/scheduling: Flexible manufacturing/line balancing. Queues:
Markovian. Dynamic programming / optimal control: Markov: Infinite state.
Abstract
We provide the first analytical study of the mean response time and robustness of a wide
range of threshold-based resource allocation policies for a multiserver queueing system such
as those commonly used in modeling call centers. We introduce two different types of robustness: static robustness and dynamic robustness. Static robustness measures robustness against
misestimation of load (i.e., constant load differing from that predicted), while dynamic robustness measures robustness against fluctuations in load (i.e., alternating high and low loads, or
burstiness). We find that using multiple thresholds can have significant benefit over using
only a single threshold with respect to static robustness, but that multiple thresholds surprisingly offer only small advantage with respect to dynamic robustness and mean response
time. A careful evaluation of load conditions allows us to establish guidelines for choosing
a good resource allocation policy, with respect to simplicity, robustness, and mean response
time. Finally, we evaluate the effectiveness of our guidelines in designing resource allocation
policies at a call center.
1 Introduction
A common problem in multiserver systems is deciding how to allocate resources (e.g. operators,
CPU time, and bandwidth) among jobs to maximize system performance, e.g. with respect to
mean response time or throughput. Since good parameter settings typically depend on environmental conditions such as system loads, an allocation policy that is optimal in one environment
may provide poor performance when the environment changes, or when the estimation of the
environment is wrong. In other words, the policy may not be robust. In this paper, we design
several allocation policies for multiserver systems, quantifying their performance with respect to
mean response time and robustness, providing insights into which types of policies perform well
in different operating environments.
1.1 Model and metric
We consider a multiserver model that consists of two servers and two queues (Beneficiary-Donor
model), as shown in Figure 1. Jobs arrive at queue 1 and queue 2 according to (possibly Markov
modulated) Poisson processes with average arrival rates λ1 and λ2 , respectively. Jobs have exponentially distributed service demands; however, the running time of a job may also depend on the
Figure 1: Beneficiary-Donor model.
affinity between the particular server and the particular job/queue. Hence, we assume that server
1 (beneficiary server) processes jobs in queue 1 (type 1 jobs) with rate µ1 , while server 2 (donor
server) can process type 1 jobs with rate µ12 , and can process jobs in queue 2 (type 2 jobs) with
rate µ2 . We define ρ1 = λ1 /µ1 , ρ2 = λ2 /µ2 , and ρ̂1 = λ1 /(µ1 + µ12 (1 − ρ2 )). Note that ρ2 < 1 and
ρ̂1 < 1 are necessary for the queues to be stable under any allocation policy, since the maximum
rate at which type 1 jobs can be processed is µ1 , from server 1, plus µ12 (1 − ρ2 ), from server 2.
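These load definitions are easy to compute directly. The sketch below (our own helper, not part of the paper's analysis) evaluates ρ1, ρ2, and ρ̂1, and checks the necessary stability conditions ρ2 < 1 and ρ̂1 < 1:

```python
def loads(lam1, lam2, mu1, mu12, mu2):
    """Return (rho1, rho2, rho1_hat) for the Beneficiary-Donor model."""
    rho1 = lam1 / mu1                              # load of queue 1 on server 1 alone
    rho2 = lam2 / mu2                              # load of queue 2 on server 2
    rho1_hat = lam1 / (mu1 + mu12 * (1.0 - rho2))  # effective load of queue 1
    return rho1, rho2, rho1_hat

def may_be_stable(lam1, lam2, mu1, mu12, mu2):
    """Necessary (not sufficient) stability conditions under any policy."""
    _, rho2, rho1_hat = loads(lam1, lam2, mu1, mu12, mu2)
    return rho2 < 1.0 and rho1_hat < 1.0
```

For example, with µ1 = µ12 = 1, µ2 = 1/16, λ1 = 1.15, and λ2 = 0.025 (so ρ2 = 0.4), the effective load is ρ̂1 = 1.15/1.6 = 0.71875, so the necessary conditions hold even though ρ1 > 1.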
The Beneficiary-Donor model has a wide range of applications in service facilities such as call
centers and repair facilities. For example, in call centers, the donor server may be a bilingual
operator, and the beneficiary server may be a monolingual operator (Shumsky, 2004; Stanford
and Grassmann, 1993, 2000), or the donor server may be a cross-trained or experienced generalist
who can handle all types of calls, and the beneficiary server may be a specialized operator who
is only trained to handle a specific type of calls (Shumsky, 2004). In a repair facility, the donor
server may be a technician who can handle jobs of any difficulty, and the beneficiary server may
be a technician with limited expertise (Green, 1985).
We design and evaluate allocation policies for the Beneficiary-Donor model with respect to
three objectives. First, as is standard in the literature, we seek to minimize the overall weighted
mean response time, c1 p1 E[R1 ] + c2 p2 E[R2 ], where ci is the weight (importance) of type i jobs,
pi = λi /(λ1 + λ2 ) is the fraction of type i jobs, and E[Ri ] is the mean response time of type i
jobs, for i = 1, 2. Here, response time refers to the total time a job spends in the system. Below,
we refer to overall weighted mean response time simply as mean response time.
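For concreteness, the metric above is a simple weighted average; the following trivial helper (our own notation, assuming the per-class mean response times E[R1] and E[R2] are already known) computes it:

```python
def weighted_mean_response_time(lam1, lam2, c1, c2, er1, er2):
    """Overall weighted mean response time c1*p1*E[R1] + c2*p2*E[R2],
    where p_i = lam_i / (lam1 + lam2) is the fraction of type i jobs."""
    total = lam1 + lam2
    return c1 * (lam1 / total) * er1 + c2 * (lam2 / total) * er2
```

With c1 = c2 = 1 this reduces to the ordinary (unweighted) overall mean response time.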
In addition to mean response time, we consider an additional metric, robustness, introducing
two types of robustness: static robustness and dynamic robustness. Static robustness measures
robustness against misestimation of load; to evaluate static robustness, we analyze the mean response time of allocation polices for a range of loads to see how a policy tuned for one load behaves
under different loads. Dynamic robustness measures the robustness against fluctuations in load;
to evaluate dynamic robustness, we analyze the mean response time of allocation policies under
Markov modulated Poisson processes, where arrivals follow a Poisson process at each moment,
but the arrival rate changes over time.
1.2 Prior work
There has been a large amount of prior work on the Beneficiary-Donor model, the majority of
which focused on proving the optimality of allocation policies in limiting or special cases. With
respect to calculating mean response times, only coarse approximations exist for most of the
allocation policies in our model. We provide a nearly exact analysis of these, as well as other
allocation policies, while also investigating static and dynamic robustness.
One common allocation policy is the cµ rule (Cox and Smith, 1971), which biases in favor of
jobs with high c (high importance) and high µ (small expected size). Applying the cµ rule to our
setting, server 2 serves type 1 jobs (rather than type 2 jobs) if c1 µ12 > c2 µ2 , or queue 2 is empty.
The cµ rule is provably optimal when server 1 does not exist (Cox and Smith, 1971) or in the fluid
limit (Meyn, 2001; Squillante et al., 2002). However, Squillante et al. (2001) as well as Harrison (1998) have shown that the cµ rule may lead to instability (queue length growing unboundedly) even
if ρ̂1 < 1 and ρ2 < 1. More recently, Mandelbaum and Stolyar (2004) and Van Mieghem (1995)
have introduced and analyzed the generalized cµ rule. However, in our model, the generalized cµ
rule reduces to the cµ rule and hence has the same stability issues.
In light of this instability, Squillante et al. (2001) and Williams (2000) independently proposed
a threshold-based policy that, under the right choice of threshold value, improves upon the cµ
rule with respect to mean response time, guaranteeing stability whenever ρ̂1 < 1 and ρ2 < 1. We
refer to this threshold-based policy as the T1 policy, since it places a threshold value, t1 , on queue
1, so that server 2 processes type 1 jobs only when there are at least t1 jobs of type 1, or if queue
2 is empty. The rest of the time server 2 works on type 2 jobs. This “reserves” a certain amount
of work for server 1, preventing server 1 from being under-utilized and server 2 from becoming
overloaded, as can happen under the cµ rule. Bell and Williams (2001) prove the optimality of
the T1 policy for a model closely related to ours in the heavy traffic limit.
However, studies by Meyn (2001) and Ahn et al. (2004) suggest that the T1 policy is not
optimal in general. Meyn obtains, via a numerical approach, the optimal allocation policy when
both queues have finite buffers. Although not proven, the optimal policy appears to be a “flexible”
T1 policy that allows a continuum of T1 thresholds, {t1^(i)}, where threshold t1^(i) is used when the
length of queue 2 is i. Ahn et al. characterize the optimal policy with respect to minimizing
the total holding cost until all the jobs in the system at time zero leave the system, assuming
that there are no arrivals after time zero. They also find that the optimal policy is in general a
“flexible” T1 policy.
All of the work above investigates a class of allocation policies that are optimal in limiting or
special cases. In contrast, there has been little work on the analysis and evaluation of the mean
response time of general allocation policies in our model, and no work evaluating robustness.
Complicating this problem is the fact that the state space required to capture the system behavior
grows infinitely in two dimensions; i.e., we need to track both the number of type 1 jobs and the
number of type 2 jobs. Hence, only approximate analyses exist for most of the allocation policies
in our model. For example, Squillante et al. (2001) derive a coarse approximation for the mean
response time of the T1 policy under Poisson arrivals based on vacation models. The mean
response time of other simple allocation policies (in more general models) such as (idle) cycle
stealing, where server 2 works on type 1 jobs when queue 2 is empty, have also been analyzed
(with approximation) either by matrix analytic methods with state space truncation (Green,
1985; Stanford and Grassmann, 1993, 2000) or by approximate solutions of a 2D-infinite Markov
chain via state space decomposition (Shumsky, 2004). Recently, we have introduced the first
nearly exact analysis of the mean response time under a wide range of allocation policies for the
Beneficiary-Donor model (Osogami et al., 2004). However, the analysis in (Osogami et al., 2004)
is limited to Poisson arrivals.
1.3 Contributions of the paper
• In this paper, we extend the analysis in (Osogami et al., 2004) to more general arrival
processes, which allows us to investigate static and dynamic robustness. Our analysis is
based on the approach of dimensionality reduction, DR (see for example Osogami, 2005).
DR reduces a two dimensionally (2D) infinite Markov chain to a 1D-infinite Markov chain,
which closely approximates the 2D-infinite Markov chain. In particular, DR allows us to
evaluate the mean response time under the T1 policy, and a similar policy called the T2
policy which places a threshold on queue 2.
• We introduce two types of robustness: static robustness and dynamic robustness, and analytically study a wide range of threshold-based allocation policies with respect to both types
of robustness. Surprisingly, we will see that policies that excel in static robustness do not
necessarily excel in dynamic robustness.
• Specifically, we find that an allocation policy with multiple thresholds can provide significant benefit over allocation policies with a single threshold with respect to static robustness.
Illustrating this, we introduce the adaptive dual threshold (ADT) policy, which places two
thresholds on queue 1, and show this has significant advantage over single threshold allocation policies with respect to static robustness. The ADT policy operates like a T1 policy,
but the threshold value is self-adapted to the load.
• In contrast to this, we find that multiple thresholds surprisingly offer only small advantage
over a single threshold with respect to mean response time and dynamic robustness.
• We apply the principles learned to designing allocation policies for call centers: based on the
characterization of a call center’s operational data, we identify effective allocation policies.
We then evaluate our recommended policies via trace driven simulation. Results suggest
that our policies can reduce the mean response time by orders of magnitude.
The rest of the paper is organized as follows. Section 2 discusses single threshold allocation
policies, and Section 3 discusses multiple threshold allocation policies. In Sections 2-3, we evaluate
the policies with respect to mean response time and static robustness. In Section 4, we shift our
interest to dynamic robustness. In Section 5, we study a real-world call center fitting our model.
2 Analysis of single threshold allocation policies
In this section, we analytically study the mean response time and static robustness of two single
threshold allocation policies. The T1 policy (Section 2.1) places a threshold, t1 , on queue 1,
whereby server 2 serves type 1 jobs whenever the length of queue 1 is at least t1 . Thus, under
T1, the beneficiary queue (queue 1) has control. Our second policy, the T2 policy (Section 2.2),
places a threshold, t2 , on queue 2, whereby server 2 serves type 1 jobs whenever the length of
queue 2 is below t2 . In this policy, the donor queue (queue 2) has control.
In Section 2.1.2, we introduce a nearly exact analysis of the T1 policy based on DR. (DR also
enables the analysis of the T2 policy and the ADT policy.) Our analysis will show that the T1
policy is superior to the T2 policy with respect to minimizing the mean response time, but that
the T2 policy is superior with respect to static robustness.
2.1 T1 policy
The T1 policy is formally defined as follows:
Definition 1 Let N1 (respectively, N2 ) denote the number of jobs at queue 1 (respectively, queue
2). The T1 policy with parameter t1 , the T1(t1 ) policy, is characterized by the following set of
rules, all of which are enforced preemptively (preemptive-resume):
• Server 1 serves only its own jobs.
• Server 2 serves jobs from queue 1 if either (i) N1 ≥ t1 or (ii) N2 = 0 & N1 ≥ 2. Otherwise,
server 2 serves jobs from queue 2.
To achieve maximal efficiency, we assume the following exceptions. When N1 = 1 and N2 = 0,
the job is processed by server 2 if and only if µ1 < µ12 . Also, when t1 = 1 and N1 = 1, the job in
queue 1 is processed by server 2 if and only if µ1 < µ12 regardless of the number of type 2 jobs.
Note that we will discuss the nonpreemptive case in Section 5.
Figure 2 shows the jobs processed by server 2 as a function of N1 and N2 under the T1 policy.
Observe that the T1(1) policy is the cµ rule when c1 µ12 > c2 µ2 , and the T1(∞) policy is the cµ
rule when c1 µ12 ≤ c2 µ2 ; thus the cµ rule falls within the broader class of T1 policies.
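The allocation rule of Definition 1 is compact enough to state in one line. The sketch below (our own illustration, omitting the µ1 < µ12 efficiency exceptions) returns server 2's decision:

```python
def t1_serves_queue1(n1, n2, t1):
    """True iff server 2 works on queue 1 under T1(t1) (Definition 1);
    the mu1 < mu12 efficiency exceptions for N1 = 1 are omitted."""
    return n1 >= t1 or (n2 == 0 and n1 >= 2)
```

With t1 = 1, server 2 serves queue 1 whenever type 1 jobs are present, which is the cµ rule's behavior when c1 µ12 > c2 µ2, consistent with the observation above.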
2.1.1 Stability under the T1 policy
Under the T1 policy, higher t1 values yield a larger stability region, and in the limit as t1 → ∞, the
queues under the T1 policy are stable as long as ρ̂1 < 1 and ρ2 < 1. More formally,
Figure 2: Figure shows whether server 2 works on jobs from queue 1 or queue 2 as a function of
N1 and N2 , under the T1 policy with parameter t1 .
Theorem 1 Under the T1 policy with parameter t1 < ∞, queue 1 is stable if and only if λ1 <
µ1 + µ12 . Stability of queue 2 is given by the following conditions:
• For 1 < t1 < ∞, queue 2 is stable if and only if

    ρ2 < (1 − ρ1^{t1}) / ( 1 − ρ1^{t1} + (1 − ρ1)ρ1^{t1} / (1 − ρ1 + µ12/µ1) )   if ρ1 ≠ 1,
    ρ2 < t1 / (t1 + λ1/µ12)                                                      if ρ1 = 1.      (1)
• For t1 = 1, if µ1 ≥ µ12 , queue 2 is stable if and only if equation (1) holds with t1 = 2.
• For t1 = 1, if µ1 < µ12 , queue 2 is stable if and only if
    ρ2 < 1 / ( 1 + (ρ1 + λ1/µ12) / (1 − ρ1 + µ12/µ1) ).
Proof: We prove only the case when t1 > 1 and ρ1 ≠ 1. The case when t1 = 1 or ρ1 = 1 can be
proved in a similar way. Let N = (N1 , N2 ) be the joint process of the number of jobs in queue
1 and queue 2, respectively. The expected length of a “busy period,” during which N1 ≥ t1 , is
finite if and only if λ1 < µ1 + µ12 . This proves the stability condition for queue 1.
Based on the strong law of large numbers, the necessary and sufficient condition for stability of
queue 2 is ρ2 < F , where F is the time average fraction of time that server 2 processes type 2 jobs
given N2 > 0. Below, we derive F . Let Ñ = (Ñ1 , Ñ2 ) be a process in which Ñ behaves the same
as N except that it has no transition from Ñ2 = 1 to Ñ2 = 0. Consider a semi-Markov process of
Ñ1 , where the state space is (0, 1, 2, ..., t1 − 1, t1^+). The state n denotes that there are n jobs in queue 1 for n = 0, 1, ..., t1 − 1, and the state t1^+ denotes that there are at least t1 jobs in queue 1. The expected sojourn time is 1/λ1 for state 0, 1/(λ1 + µ1) for states n = 1, ..., t1 − 1, and

    b = ( 1/(µ1 + µ12) ) / ( 1 − λ1/(µ1 + µ12) )

for state t1^+, where b is the mean duration of the busy period in an M/M/1 queue with arrival rate λ1 and service rate µ1 + µ12 . The limiting probabilities for the corresponding embedded discrete time Markov chain are πn = (1 + ρ1)ρ1^{n−1} π0 for n = 1, ..., t1 − 1 and π_{t1+} = ρ1^{t1−1} π0 , where

    π0 = (1 − ρ1) / ( (1 + ρ1^{t1−1})(1 − ρ1) + (1 + ρ1)(1 − ρ1^{t1−1}) ).

As server 2 can work on queue 2 if and only if Ñ1 < t1 , the fraction of time that server 2 can work on queue 2 is

    F = ( π0/λ1 + (1 − π0 − π_{t1+})/(λ1 + µ1) ) / ( π0/λ1 + (1 − π0 − π_{t1+})/(λ1 + µ1) + b π_{t1+} )
      = (1 − ρ1^{t1}) / ( 1 − ρ1^{t1} + (1 − ρ1)ρ1^{t1} / (1 − ρ1 + µ12/µ1) ).
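The algebra in this proof can be sanity-checked numerically. The sketch below (our own check, not part of the original analysis) computes F both from the limiting probabilities of the embedded chain and from the simplified closed form, and the two agree; the parameters follow Figure 5 (µ1 = µ12 = 1, λ1 = 1.15):

```python
def f_semi_markov(lam1, mu1, mu12, t1):
    """Fraction of time server 2 can work on queue 2, computed from the
    embedded semi-Markov chain of the proof (requires t1 > 1, rho1 != 1)."""
    rho1 = lam1 / mu1
    pi0 = (1 - rho1) / ((1 + rho1 ** (t1 - 1)) * (1 - rho1)
                        + (1 + rho1) * (1 - rho1 ** (t1 - 1)))
    pi_top = rho1 ** (t1 - 1) * pi0                       # prob. of state t1+
    b = (1.0 / (mu1 + mu12)) / (1.0 - lam1 / (mu1 + mu12))  # M/M/1 busy period
    num = pi0 / lam1 + (1 - pi0 - pi_top) / (lam1 + mu1)
    return num / (num + b * pi_top)

def f_closed_form(lam1, mu1, mu12, t1):
    """The simplified expression for F at the end of the proof."""
    rho1 = lam1 / mu1
    r = rho1 ** t1
    return (1 - r) / (1 - r + (1 - rho1) * r / (1 - rho1 + mu12 / mu1))
```

The two expressions agree for every t1, which confirms that the simplification step in the proof is an identity rather than an approximation.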
The following corollary is an immediate consequence of Theorem 1.
Corollary 1 Under the T1 policy, the stability region increases with t1 (i.e., the right hand side
of equation (1) is an increasing function of t1 ).
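Corollary 1 is also easy to verify numerically. The helper below (our own sketch) evaluates the right-hand side of equation (1) and checks that it increases with t1:

```python
def rho2_sup(t1, lam1, mu1, mu12):
    """Right-hand side of equation (1): the supremum of rho2 for which
    queue 2 remains stable under T1(t1), for 1 < t1 < infinity."""
    rho1 = lam1 / mu1
    if abs(rho1 - 1.0) < 1e-12:
        return t1 / (t1 + lam1 / mu12)
    r = rho1 ** t1
    return (1 - r) / (1 - r + (1 - rho1) * r / (1 - rho1 + mu12 / mu1))
```

For the parameters of Figure 5 (µ1 = µ12 = 1, ρ1 = 1.15), this gives roughly 0.66 for T1(3) and roughly 0.84 for T1(19), consistent with Figure 5: T1(3) becomes unstable at ρ2 = 0.8 while T1(19) does not.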
2.1.2 Analysis of the T1 policy
Our analysis of the T1 and other threshold-based policies is based on dimensionality reduction, DR
(see for example Osogami, 2005). Advantages of DR include computational efficiency, accuracy,
and simplicity; these allow us to extensively investigate the performance characteristics of the
allocation policies. DR reduces a 2D-infinite Markov chain (see Figure 3(a)) to a 1D-infinite
Markov chain (see Figure 3(b)), which closely approximates the 2D-infinite Markov chain. To
derive the mean response time of T1, the 1D-infinite Markov chain tracks the exact number of
type 2 jobs, but tracks the number of type 1 jobs only up to the point t1 − 1. At this point a
type 1 arrival starts a “busy period,” during which both servers are working on type 1 jobs, and
[Transition diagrams: (a) 2D-infinite Markov chain; (b) 1D-infinite Markov chain.]
Figure 3: Markov chains that model the behavior under the T1(3) policy. In figures, (i, j) represents the state where there are i jobs of type 1 and j jobs of type 2. In (a), µ+ ≡ µ1 + µ12 .
type 2 jobs receive no service. This “busy period” ends when there are once again t1 − 1 jobs of
type 1. State (t1^+, j) denotes that there are at least t1 jobs of type 1 and j jobs of type 2 for
j ≥ 0. The key point is that there is no need to track the exact number of type 1 jobs during this
busy period. We approximate the duration of this busy period with a two-phase phase type (PH)
distribution with parameters (β1 , β2 , β12 ), matching the first three moments (Osogami, 2005).
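The three moments being matched are those of an M/M/1 busy period with arrival rate λ1 and service rate µ1 + µ12. The helper below computes them from standard busy-period transform results (our own sketch; the actual two-phase PH fit is the method of Osogami, 2005, which we do not reproduce here):

```python
def busy_period_moments(lam, mu):
    """First three moments of the M/M/1 busy period (requires lam < mu),
    derived from the transform B*(s) = mu / (s + lam + mu - lam * B*(s))."""
    m1 = 1.0 / (mu - lam)
    m2 = 2.0 * m1 ** 2 * (1.0 + lam * m1)
    m3 = 3.0 * m1 * m2 * (1.0 + 2.0 * lam * m1)
    return m1, m2, m3
```

As a sanity check, with lam = 0 the busy period is a single Exp(mu) service, and the first moment matches the mean busy-period duration b used in the proof of Theorem 1.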
We use the limiting probabilities of the Markov chain in Figure 3(b) to calculate the mean
number of jobs of each type, E[N1 ] and E[N2 ], which in turn gives their mean response time via
Little’s law. These limiting probabilities can be obtained efficiently via matrix analytic methods
(Latouche and Ramaswami, 1999). Deriving E[N2 ] from the limiting probabilities is straightforward, since we track the exact number of type 2 jobs. We derive E[N1 ] by conditioning on the
state of the chain. Let E[N1 ]ij denote the expected number of type 1 jobs given that the chain is
in state (i, j). For i = 0, ..., t1 − 1, E[N1 ]ij = i for all j. For i = t1^+, E[N1 ]_{t1+,j} is the mean number
of jobs in an M/M/1 system given that the service rate is the sum of the two servers, µ1 + µ12 ,
and given that the system is busy, plus an additional t1 jobs.
We find that the mean response time computed via DR is usually within two percent of the
simulated value (Osogami, 2005). The high accuracy of DR stems from the fact that the state
space of the 2D-infinite Markov chain in Figure 3(a) is not simply truncated. Rather, two rows
representing 3+ jobs of type 1 in the 1D-infinite Markov chain in Figure 3(b) capture the infinite
number of rows (row 4, 5, ...) in the 2D-infinite Markov chain in such a way that the first three
moments of the sojourn time distribution in these two regions agree.
2.1.3 Characterizing the performance of the T1 policy
Our analysis of Section 2.1.2 allows an efficient and accurate analysis of the T1 policy. In this
section, we characterize this performance. We find that the behavior of the T1 policy is quite
different depending on whether server 2 “prefers” type 1 jobs (c1 µ12 > c2 µ2 ) or type 2 jobs (c1 µ12 ≤ c2 µ2 ).
The optimal t1 threshold is typically finite when c1 µ12 > c2 µ2 , and typically infinite when c1 µ12 ≤
c2 µ2 , where the optimality is with respect to minimizing the mean response time. In fact, when
c1 = c2 and c1 µ12 ≤ c2 µ2 , the following theorem holds (the theorem may extend to the case of
c1 µ12 ≤ c2 µ2 with general c1 and c2 , but the general case is not proved).
Theorem 2 If c1 = c2 and c1 µ12 ≤ c2 µ2 , the mean response time of the T1 policy is minimized
at t1 = ∞ (i.e., the cµ-rule is optimal).
Proof: Due to Little’s law, it is sufficient to prove that the number of jobs completed under the T1(∞) policy is stochastically larger than that completed under the T1 policy with t1 < ∞ at any moment. Let N^inf(t) = (N1^inf(t), N2^inf(t)) be the joint process of the number of jobs in queue 1 and queue 2, respectively, at time t when t1 = ∞. Let N^fin(t) = (N1^fin(t), N2^fin(t)) be defined analogously for t1 < ∞. With t1 = ∞, server 2 processes type 2 jobs as long as there are type 2 jobs, and thus N1^inf(t) is stochastically larger than N1^fin(t) for all t. Consequently, the number of jobs completed by server 1 is stochastically smaller when t1 < ∞ than when t1 = ∞ at any moment, since server 1 is work-conserving.
As long as server 2 is busy, the number of jobs completed by server 2 is stochastically smaller when t1 < ∞ than when t1 = ∞, since µ12 ≤ µ2 . Also, using a coupling argument, we can show that the system with t1 = ∞ has server 2 go idle (both queues empty) earlier (stochastically) than the t1 < ∞ system. Thus, when server 2 is idle, the number of jobs completed by the t1 = ∞ system is stochastically larger. Hence, the number of jobs completed (either by server 1 or by server 2) under the T1(∞) policy is stochastically larger than that completed under the T1 policy with t1 < ∞.
[Figure 4 panels: (a) c1 µ1 = 1/4; (b) c1 µ1 = 1; (c) c1 µ1 = 4. Each panel plots the mean response time against t1 at ρ̂1 = 0.8, 0.9, and 0.95.]
Figure 4: The mean response time under the T1 policy as a function of t1 . Here, c1 = c2 = 1, c1 µ12 = 1, c2 µ2 = 1/16, and ρ2 = 0.6 are fixed.
Since t1 = ∞ achieves the largest stability region (Corollary 1), if c1 = c2 , t1 = ∞ is the
optimal choice with respect to both mean response time and the stability region. Note that the
T1(∞) policy is the policy of following the cµ rule, as server 2 “prefers” to run its own jobs in a
cµ sense when c1 µ12 ≤ c2 µ2 . Therefore, below we limit our attention to the case of c1 µ12 > c2 µ2 ,
where server 2 “prefers” to run type 1 jobs in a cµ sense. Note that condition c1 µ12 > c2 µ2 is
achieved when type 1 jobs are small and type 2 jobs are large, when type 1 jobs are more important
than type 2 jobs, and/or in the pathological case when type 1 jobs have good affinity with server
2. (These, in addition, may motivate use of the Beneficiary-Donor model, giving smaller or more
important jobs better service.) We will see that the optimal t1 threshold is typically finite when
c1 µ12 > c2 µ2 , in contrast to the cµ rule.
Figure 4 shows the mean response time under the T1 policy as a function of t1 ; we see that
optimal t1 is finite and depends on environmental conditions such as load (ρ̂1 ) and job sizes
(µ1 ). Here, different columns correspond to different µ1 ’s. In each column, the mean response
time is evaluated at three loads, ρ̂1 = 0.8, 0.9, 0.95, by changing λ1 (note that ρ̂1 = 0.8, 0.9, 0.95 correspond to ρ1 = 2.08, 2.34, 2.47 when µ1 = 1/4 in column 1, ρ1 = 1.12, 1.26, 1.33 when µ1 = 1 in column 2, and ρ1 = 0.88, 0.99, 1.05 when µ1 = 4 in column 3). By Theorem 1, a larger value
of t1 leads to a larger stability region, and hence there is a tradeoff between good performance
at the estimated load, (ρ̂1 , ρ2 ), which is achieved at smaller t1 , and stability at higher ρ̂1 and/or
ρ2 , which is achieved at larger t1 . Note also that the curves have sharper “V shapes” in general
[Plot legend: T1(3), optimal at ρ2 = 0.4; T1(19), optimal at ρ2 = 0.8.]
Figure 5: The mean response time under the T1(3) policy and the T1(19) policy as a function of ρ2 , where c1 = c2 = 1, c1 µ1 = c1 µ12 = 1, c2 µ2 = 1/16, and ρ1 = 1.15 are fixed.
at higher ρ̂1 , which complicates the choice of t1 , since the mean response time quickly diverges to
infinity as t1 becomes smaller.
In addition, further analyses (Osogami, 2005) show that the value of the cµ product primarily determines the behavior of the T1 policy, while the individual values of c and µ have a smaller effect. Also,
when ρ2 is lower (and thus ρ1 is higher for a fixed ρ̂1 ), the optimal t1 tends to become smaller, and
hence the tradeoff between the performance at the estimated load and stability at higher loads is
more significant. This makes intuitive sense, since at lower ρ2 , server 2 can help more.
Figure 5 highlights the static robustness of the T1 policy, plotting the mean response time
as a function of ρ2 (only λ2 is changed) for T1(3) and T1(19). When ρ2 = 0.4, t1 = 3 is the
optimal threshold (but T1(19) still provides finite mean response time). However, if it turns out
that ρ2 = 0.8 is the actual load, then the T1(3) policy leads to instability (infinite mean response
time), while the T1(19) policy minimizes mean response time. Thus, choosing a higher t1 (=19)
guarantees stability against misestimation of the load, but results in worse performance at the
estimated load. This experiment and others like it (Osogami, 2005) lead us to conclude that the
T1 policy is poor with respect to static robustness.
2.2 T2 policy
In this section, we investigate the mean response time and static robustness of the T2 policy,
comparing it to the T1 policy. The T2 policy is formally defined as follows:
Definition 2 The T2 policy with parameter t2 , the T2(t2 ) policy, is characterized by the following
[Figure 6 legends: (b) compares T2(1), T2(2), and T2(16); (c) compares T2(2) with T1(3) and T1(19).]
Figure 6: Part (a) shows whether server 2 works on jobs from queue 1 or queue 2 as a function of N1 and N2 , under the T2 policy with parameter t2 . Part (b) shows the mean response time under the T2 policy with various t2 threshold values as a function of ρ2 . Part (c) compares the mean response time under the “optimized” T2 policy and two T1 policies. In (b) and (c), c1 = c2 = 1, c1 µ1 = c1 µ12 = 1, c2 µ2 = 1/16, and ρ1 = 1.15 are fixed.
set of rules, all of which are enforced preemptively (preemptive-resume):
• Server 1 serves only its own jobs.
• Server 2 serves jobs from queue 1 if N2 < t2 . Otherwise server 2 serves jobs from queue 2.
When N1 = 1 and N2 = 0, we allow the same exception as in the T1 policy.
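As with T1, the rule of Definition 2 is a one-liner. The sketch below (our own illustration, again omitting the N1 = 1, N2 = 0 efficiency exception) makes the contrast with T1 explicit: under T2, the decision depends only on the length of queue 2.

```python
def t2_serves_queue1(n1, n2, t2):
    """True iff server 2 works on queue 1 under T2(t2) (Definition 2);
    the N1 = 1, N2 = 0 efficiency exception is omitted, and server 2 is
    assumed to turn to queue 1 only when queue 1 is nonempty."""
    return n1 >= 1 and n2 < t2
```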
Figure 6(a) shows the jobs processed by server 2 as a function of N1 and N2 under the T2 policy.
Recall that the T1 policy guarantees stability whenever ρ̂1 < 1 and ρ2 < 1 provided that t1 is
chosen appropriately. By contrast, the T2 policy guarantees stability whenever ρ̂1 < 1 and ρ2 < 1
for any finite t2 . It clearly dominates the T1 policy in this respect. More formally, the following
theorem holds, which can be proved similarly to Theorem 1.
Theorem 3 Under the T2 policy with t2 < ∞, queue 1 is stable if and only if ρ̂1 < 1, and queue 2
is stable if and only if ρ2 < 1.
2.2.1 Assessing the performance of the T2 policy
It is not at all obvious how the T2 policy’s performance compares with that of the T1 policy,
when each is run with its optimal threshold. In this section, we investigate this question. The T2
policy can be analyzed via DR as in Section 2.1.2, approximating the 2D-infinite Markov chain by
a 1D-infinite Markov chain tracking the exact number of type 1 jobs (cf. the 1D-infinite Markov
chain for the T1 policy tracks the exact number of type 2 jobs). With respect to the number of
type 2 jobs, the chain differentiates only among 0, 1, ..., t2 − 1, or t2^+ jobs.
Figure 6(b) illustrates that the mean response time under the T2 policy is minimized at a
small t2 (in this case t2 = 2) over a range of loads. This figure is representative of a wide range of
parameter values that we studied. Since choosing a small t2 minimizes the mean response time
and still provides the maximum stability region, there is no tradeoff between minimizing the mean
response time and maximizing the stability region with the T2 policy.
However, Figure 6(c) (and many other experiments like it) indicates that the mean response time
under the T2 policy with the optimal t2 is typically higher than that under the T1 policy with
the optimal t1 . We conclude that, although the T2 policy has more static robustness than the T1
policy, it performs worse with respect to mean response time.
3 Analysis of multi-threshold allocation policies
The tradeoff between the low mean response time of the T1 policy and good static robustness
of the T2 policy motivates us to introduce a class of multi-threshold allocation policies: ADT
policies. We will study how the mean response time and static robustness of these multi-threshold
allocation policies compare to that of the single threshold allocation policies.
3.1 The adaptive dual threshold (ADT) policy
The key idea in the design of the ADT policy is that we want the ADT policy to operate as a T1
policy to ensure low mean response time, but we will allow the value of t1 to adapt, depending
on the length of queue 2, to provide static robustness. Specifically, the ADT policy behaves like
the T1 policy with parameter t1^(1) if the length of queue 2 is less than t2, and otherwise like the
T1 policy with parameter t1^(2), where t1^(2) > t1^(1). We will see that, indeed, the ADT policy is far
superior to the T1 policy with respect to static robustness. In addition, one might also expect
that the mean response time of the optimized ADT policy will significantly improve upon that
of the optimized T1 policy, since the ADT policy generalizes the T1 policy (the ADT policy is
reduced to the T1 policy by setting t1^(1) = t1^(2)). However, this turns out to be largely false, as we
[Figure 7 (diagram): the (N1, N2) plane is partitioned into a "server 2 works on queue 2" region and a "server 2 works on queue 1" region, with boundaries at N1 = t1^(1), N1 = t1^(2), and N2 = t2.]

Figure 7: Figure shows whether server 2 works on jobs from queue 1 or queue 2 as a function of
N1 and N2 under the ADT policy with parameters t1^(1), t1^(2), and t2.
see below. Formally, the ADT policy is characterized by the following rule.

Definition 3 The ADT policy with parameters t1^(1), t1^(2), and t2, the ADT(t1^(1), t1^(2), t2) policy,
operates as the T1(t1^(1)) policy if N2 ≤ t2; otherwise, it operates as the T1(t1^(2)) policy.
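In code, the rule of Definition 3 reduces to a threshold switch. The sketch below is illustrative only: the function name is ours, we take (as Figure 7 suggests) the T1(t1) convention that server 2 works on queue 1 whenever N1 ≥ t1, and we ignore boundary cases such as empty queues.

```python
def server2_target(n1, n2, t1_lo, t1_hi, t2):
    """Which queue does server 2 work on under ADT(t1_lo, t1_hi, t2)?

    n1, n2 : current lengths of queue 1 and queue 2 (N1, N2)
    t1_lo  : t1^(1), the t1 threshold in effect while N2 <= t2
    t1_hi  : t1^(2), the t1 threshold in effect while N2 > t2 (t1_hi > t1_lo)
    """
    # Pick the active t1 threshold based on the length of queue 2.
    t1 = t1_lo if n2 <= t2 else t1_hi
    # Under T1(t1), server 2 helps queue 1 only when N1 >= t1.
    return 1 if n1 >= t1 else 2
```

With the thresholds of Figure 9, ADT(3,19,6), server 2 helps queue 1 at N1 = 5 when queue 2 is short, but not once N2 exceeds 6.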
Figure 7 shows the jobs processed by server 2 under the ADT policy as a function of N1 and N2 .
(A separate class of multi-threshold allocation policies that place only one threshold on queue 1
and one on queue 2, the T1T2 policy, is introduced in Osogami et al., 2004. Its mean response
time and static robustness improve only marginally on those of the T1 and T2 policies. Thus,
T1T2 is in general inferior to ADT.)
At high enough ρ̂1 and ρ2, N2 usually exceeds t2, and the policy behaves similarly to the T1
policy with parameter t1^(2). Thus, the stability condition for ADT is the same as that for T1 with
parameter t1^(2). The following theorem can be proved in a similar way as Theorem 1.

Theorem 4 The stability condition for the ADT policy with parameters t1^(1), t1^(2), and t2 is given
by the stability condition for the T1 policy with parameter t1^(2) (Theorem 1).
The ADT policy can likewise be analyzed via DR as in Section 2.1.2, by approximating the
2D-infinite Markov chain by a 1D-infinite Markov chain (see Figure 8). For the ADT policy, the
1D-infinite Markov chain tracks the exact number of type 2 jobs, but tracks the number of type
1 jobs only up to the point where there are t1^(2) − 1 jobs. A type 1 arrival at this point starts a
“busy period,” which ends when there are once again t1^(2) − 1 jobs of type 1. We approximate
the duration of this busy period with a two-phase PH distribution with parameters (β1, β2, β12),
[Figure 8 (diagram): a 1D-infinite Markov chain on states (n1, j), where n1 ∈ {0, 1, 2, 3, 4+} tracks the number of type 1 jobs (4+ denoting four or more) and j = 0, 1, 2, ... tracks the number of type 2 jobs; transition rates include λ1, λ2, µ1, µ2, µ1 + µ12, max(µ1, µ12), and the busy-period parameters β1, β2, β12.]

Figure 8: The 1D-infinite Markov chain that models the behavior under the ADT(2,4,2) policy.
matching the first three moments as before. State (t1^(2)+, j) denotes that there are at least t1^(2)
jobs of type 1 and there are j jobs of type 2, for j ≥ 0. The mean response time is again obtained
via matrix analytic methods. In Appendix A.1, we analyze the ADT policy more formally.
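As a rough, independent sanity check on such an analysis, one can also estimate mean response times by simulating the (N1, N2) chain directly. The sketch below is ours, not the paper's DR analysis: it assumes Poisson arrivals and exponential services with rates λ1, λ2, µ1, µ12, µ2, applies the preemptive ADT rule of Definition 3, follows Figure 8 in serving a lone queue 1 job at rate max(µ1, µ12) while server 2 is helping, and recovers mean response times via Little's law.

```python
import random

def simulate_adt(lam1, lam2, mu1, mu12, mu2,
                 t1_lo, t1_hi, t2, n_events=200_000, seed=1):
    """Estimate per-class mean response times under ADT(t1_lo, t1_hi, t2)
    by simulating the (N1, N2) continuous-time Markov chain."""
    rng = random.Random(seed)
    n1 = n2 = 0
    clock = 0.0
    area1 = area2 = 0.0  # time-integrals of N1 and N2

    for _ in range(n_events):
        # Active t1 threshold, as in Definition 3.
        t1 = t1_lo if n2 <= t2 else t1_hi
        help1 = n1 >= t1  # does server 2 work on queue 1?

        # Enabled transitions and their rates.
        rates = {'a1': lam1, 'a2': lam2}
        if n1 > 0:
            if help1:
                # Both servers on queue 1; a single remaining job
                # completes at rate max(mu1, mu12), as in Figure 8.
                rates['d1'] = mu1 + mu12 if n1 >= 2 else max(mu1, mu12)
            else:
                rates['d1'] = mu1
        if n2 > 0 and not help1:
            rates['d2'] = mu2

        total = sum(rates.values())
        dt = rng.expovariate(total)
        area1 += n1 * dt
        area2 += n2 * dt
        clock += dt

        # Pick which event fires, proportionally to its rate.
        u = rng.random() * total
        for ev, r in rates.items():
            if u < r:
                break
            u -= r
        if ev == 'a1':
            n1 += 1
        elif ev == 'a2':
            n2 += 1
        elif ev == 'd1':
            n1 -= 1
        else:
            n2 -= 1

    # Little's law: E[R_i] = E[N_i] / lambda_i.
    return area1 / clock / lam1, area2 / clock / lam2
```

Such a simulation is far slower than DR for the long runs needed at high load, which is precisely why the analytical approach matters.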
3.2 Results: Static robustness of the ADT policy
Figure 9 illustrates static robustness of the ADT policy, showing the mean response time under
the ADT policy as a function of ρ2; the ADT policy achieves a mean response time at least as low
as that of the better of the T1 policies with the two different t1 values throughout the range of ρ2. Though
not shown, the ADT policy is also (statically) robust against misestimation of ρ̂1 (Osogami, 2005).
The robustness of the ADT policy can be attributed to the following. The dual thresholds on
queue 1 make the ADT policy adaptive to misestimation of load, in that the ADT policy with
parameters t1^(1), t1^(2), and t2 operates like the T1 policy with parameter t1^(1) at the estimated load
and like the T1 policy with parameter t1^(2) at a higher load, where t1^(2) > t1^(1). Thus, server 2 can
help queue 1 less when there are more type 2 jobs, preventing server 2 from becoming overloaded.
This leads to the increased stability region and improved performance.
In specifying the three thresholds, t1^(1), t1^(2), and t2, for the ADT policy in Figure 9, we have
used the following sequential heuristic:

1. Set t1^(1) as the optimal t1 value for the T1 policy at the estimated (given) load.
[Figure 9 (plot): mean response time (log scale) vs. ρ2 ∈ [0.4, 0.8], comparing ADT(3,19,6) against T1(3) and T1(19).]

Figure 9: The mean response time under the ADT policy as a function of ρ2. Here, c1 = c2 = 1,
c1µ1 = c1µ12 = 1, c2µ2 = 1/16, and ρ1 = 1.15 are fixed.
2. Choose t1^(2) so that it achieves stability in a desired range of load. We find that the mean
response time at the estimated load is relatively insensitive to t1^(2), and hence we can choose
a high t1^(2) to guarantee a large stability region.
3. Find t2 such that the policy provides both low mean response time at the estimated load
and good static robustness. This is a nontrivial task. If t2 is set too low, the ADT policy
behaves like the T1 policy with parameter t1^(2), degrading the mean response time at the
estimated load, since t1^(2) is larger than the optimal t1 in the T1 policy. If t2 is set too
high, the ADT policy behaves like the T1 policy with parameter t1^(1). This worsens the
mean response time at loads higher than the estimated load. In plotting Figure 9, we found
“good” t2 values manually by trying a few different values, which took only a few minutes.

Observe that since the stability region is insensitive to t1^(1) and t2, we can choose these values so
that the mean response time at the estimated load is optimized.
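The sequential heuristic can be written as a short search procedure. In this sketch, mean_response_time and stable are hypothetical stand-ins, supplied by the caller, for the DR analysis at the estimated load and for the stability check of Theorem 1 over the desired load range; the function name and search bounds are ours.

```python
def choose_adt_thresholds(mean_response_time, stable, t_max=50):
    """Sequential heuristic for the thresholds of ADT(t1_lo, t1_hi, t2).

    mean_response_time((t1_lo, t1_hi, t2)) : mean response time of that
        ADT policy at the estimated load; T1(t) is encoded as (t, t, 0).
    stable(t) : True if the T1(t) policy is stable throughout the
        desired load range (cf. Theorem 1).
    """
    # Step 1: t1^(1) is the optimal single threshold at the estimated load.
    t1_lo = min(range(1, t_max),
                key=lambda t: mean_response_time((t, t, 0)))

    # Step 2: pick a high enough t1^(2) to guarantee the desired stability
    # region; mean response time at the estimated load is insensitive to it.
    t1_hi = next(t for t in range(t1_lo + 1, t_max) if stable(t))

    # Step 3: search t2 (here: a small grid, mimicking the manual search),
    # trading off performance at the estimated load against robustness.
    t2 = min(range(1, t_max),
             key=lambda s: mean_response_time((t1_lo, t1_hi, s)))
    return t1_lo, t1_hi, t2
```

Step 2 raises StopIteration if no threshold in range stabilizes the policy, which signals that the desired load range is infeasible.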
3.3 Results: Mean response time of the ADT policy
We have already seen the benefits of the ADT policy when the load is not exactly known (static
robustness). One might also expect that, even when the load is known exactly, the ADT policy
might significantly improve upon the T1 policy with respect to mean response time. Earlier work
of Meyn (2001) provides some support for this expectation; Meyn shows via numerical examples
that, in the case of finite buffers for both queues, the policy that minimizes mean response time
[Figure 10 (plots): percentage change (%) vs. ρ2 ∈ [0.2, 0.8]; two panels: (a) c2µ2 = 1/4 and (b) c2µ2 = 1/16.]
Figure 10: The percentage change (%) in the mean response time of the (locally) optimized ADT
policy over the optimized T1 policy at each given load, as a function of ρ2 . A negative percentage
indicates the improvement of ADT over T1. Here, c1 = c2 = 1, c1 µ1 = c1 µ12 = 1, and ρ1 = 1.15
are fixed.
is a “flexible” T1 policy which allows a continuum of T1 thresholds, {t1^(i)}, where threshold t1^(i)
is used when the length of queue 2 is i. The ADT policy can be seen as an approximation of a
“flexible” T1 policy, using only two t1 thresholds.
To evaluate the benefit of the ADT policy, we compare it over a range of ρ2 against the T1
policy optimized for the given ρ2 . Since the search space of the threshold values for the ADT
policy is large, we find locally optimal threshold values, which are found to be optimal within a
search space of ±5 for each threshold. We measure the percentage change in the mean response
time of ADT versus T1:
(E[RADT] − E[RT1]) / E[RT1] × 100 (%),      (2)
where E[RX ] denotes the mean response time in policy X ∈ {ADT,T1}.
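The sign convention of metric (2) is easy to misread; a one-line helper (the function name is ours, for illustration) makes it explicit:

```python
def pct_change(r_adt, r_t1):
    """Percentage change of metric (2): negative values mean the ADT
    policy improves on the T1 policy."""
    return (r_adt - r_t1) / r_t1 * 100.0
```

For example, a mean response time of 97 under ADT against 100 under T1 yields −3%, a 3% improvement.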
Figure 10 shows the percentage reduction in the mean response time of the locally optimized
ADT policy over the T1 policy optimized at each ρ2 , as a function of ρ2 . Figure 10 shows that,
surprisingly, the benefit of the ADT policy is quite small with respect to mean response time
under fixed Poisson arrivals; the improvement of the ADT policy is larger at moderately high ρ2
and at smaller c2 µ2 value, but overall the improvement is typically within 3%. We conjecture
that adding more thresholds (approaching the flexible T1 policy) will not improve mean response
time appreciably, given the small improvement from one to two thresholds. Thus, whereas the
ADT policy has significant benefits over the simpler T1 policy with respect to static robustness,
the two policies are comparable with respect to mean response time.
4 Dynamic robustness of threshold-based policies
We have seen, in Section 3.1, that the mean response time of the optimized ADT policy is similar
to that of the optimized T1 policy, although the ADT policy has greater static robustness. Note
that this observation is based on the assumption of Poisson arrivals. Consider, to start, an
alternate scenario, in which the load at queue 2 fluctuates. For example, a long high load period
(e.g., ρ1 = 1.15 and ρ2 = 0.8) is followed by a long low load period (e.g., ρ1 = 1.15 and ρ2 = 0.4),
and the high and low load periods alternate. The T1 policy with a fixed threshold value must
have a high mean response time either during the high load period or during the low load period
(recall Figure 5). On the other hand, the ADT policy may provide low mean response time during
both high and low load periods, since the t1 threshold value is self-adapted to the load (recall
Figure 9). In this section, we study the mean response time of the ADT policy when the load
fluctuates, or the dynamic robustness of the ADT policy.
We use a Markov modulated Poisson process of order two (MMPP(2)) as an arrival process
at queue 2. An MMPP(2) has two phases, which we denote as the high load phase and the low
load phase. The duration of each phase has an exponential distribution, which can differ in each
phase. During the high (respectively, low) load phase, the arrival process follows a Poisson process
with rate λH (respectively, λL ), where λH > λL . We postpone describing the techniques used to
analyze the ADT policy under the MMPP to Appendix A.2, and first study the results.
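For concreteness, arrival times from such an MMPP(2) can be sampled as follows. This is an illustrative sketch: the phase-switching rates rate_hl and rate_lh (whose reciprocals are the mean high and low phase durations) are our parameterization, and the sampler relies on the memorylessness of the exponential phase clocks.

```python
import random

def mmpp2_arrivals(lam_h, lam_l, rate_hl, rate_lh, t_end, seed=1):
    """Generate arrival times on [0, t_end) from a two-phase MMPP(2).

    lam_h, lam_l : Poisson arrival rates in the high and low load phases
    rate_hl      : rate of leaving the high phase (mean duration 1/rate_hl)
    rate_lh      : rate of leaving the low phase (mean duration 1/rate_lh)
    """
    rng = random.Random(seed)
    t, high = 0.0, True
    phase_end = rng.expovariate(rate_hl)
    arrivals = []
    while t < t_end:
        lam = lam_h if high else lam_l
        t_next = t + rng.expovariate(lam)
        if t_next < phase_end:
            t = t_next
            if t < t_end:
                arrivals.append(t)
        else:
            # Phase switch: by memorylessness, jump to the boundary and
            # resample both the arrival and phase clocks in the new phase.
            t = phase_end
            high = not high
            phase_end = t + rng.expovariate(rate_hl if high else rate_lh)
    return arrivals
```

Under this parameterization, the expected number of arrivals during a high load period is λH/rate_hl, which corresponds to E[NH], the quantity on the horizontal axis of Figure 11.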
Results: Dynamic robustness of the ADT policy vs. the T1 policy
Figure 11 shows the percentage change in the mean response time of the (locally) optimized ADT
policy over the optimized T1 policy, when arrivals at queue 1 follow a Poisson process and arrivals
at queue 2 follow an MMPP(2). The arrival rates in the MMPP(2) are chosen such that the load
during the high load period is ρ2 = 0.8 and the load during the low load period is ρ2 = 0.2, 0.4, or
0.6, while ρ1 = 1.15 is fixed throughout. We choose the horizontal axis to be the expected number
of type 2 arrivals during a high load period, and the three lines (solid, dashed, and dotted) to
[Figure 11 (plots): percentage change (%) vs. E[NH] ∈ [10^1, 10^5] (log scale), with three curves: E[NL] = E[NH]/10, E[NL] = E[NH], and E[NL] = 10E[NH]; panels: (a) ρ2 = [0.2, 0.8] and (c) ρ2 = [0.6, 0.8].]
Figure 11: The percentage change (%) in the mean response time of the (locally) optimized ADT
policy over the optimized T1 policy for each given MMPP(2), shown as a function of the expected
number of arrivals during a high load period, E[NH ], and during a low load period, E[NL ]. A
negative percentage indicates the improvement of ADT over T1. Here, c1 = c2 = 1, c1µ1 =
c1µ12 = 1, c2µ2 = 1/16, and ρ1 = 1.15 are fixed.
be different expected number of type 2 arrivals during a low load period. Note that since there
are less frequent arrivals during a low load period, having the same number of arrivals during the
high and low load periods implies that the low load period is longer. Thus, the number of arrivals
during each period is an indicator of how frequently the load changes, which we find to be an
important parameter in studying dynamic robustness. The threshold values of the optimized T1
policy and the (locally) optimized ADT policy are chosen such that the overall weighted mean
response time is minimized for each given arrival process.
The first thing to notice in Figure 11 is that the improvement of the ADT policy over the T1
policy is smaller when the duration of the high and low load periods is shorter, or equivalently
when there are fewer arrivals in each period. This makes intuitive sense, since the MMPP(2) reduces
to a Poisson process when the high and low load periods alternate infinitely quickly, and under
the Poisson process, the optimized T1 policy and the optimized ADT policy provide similar mean
response time; see Section 3.1.
However, even when the durations are longer, the performance improvement of the ADT policy
over the T1 policy is comparatively small (3 to 25%). This is mainly because the mean response
time of the jobs arriving during the high load period tends to dominate the overall mean response
time for two reasons: (i) the response time of jobs arriving during the high load period is much
higher than that of jobs arriving during the low load period, partially due to the fact that any
reasonable allocation policy (such as the optimized T1 policy and the optimized ADT policy)
can provide low mean response time at low load, and (ii) assuming that a low load period and a
high load period have the same duration, there are more arrivals during a high load period. Since
the T1 policy with a fixed t1 threshold can provide low mean response time for the jobs arriving
during the high load period, it can provide low overall mean response time. (Of course, if there
were many more arrivals during the low load period than the high load period, the t1 threshold
would be adjusted.)
It is only when the jobs arriving during the high load period and the jobs arriving during
the low load period have roughly equal contribution to the overall mean response time that
the ADT policy can have appreciable improvement over the T1 policy. This happens when
Σ_{i=1}^{2} ci pi^L E[Ri^L] ∼ Σ_{i=1}^{2} ci pi^H E[Ri^H],

where pi^L (respectively, pi^H) is the fraction of jobs that are type i and arriving during the low
(respectively, high) load period, and E[Ri^L] (respectively, E[Ri^H]) is the mean response time of
type i jobs arriving during the low (respectively, high) load period, for i = 1, 2. For example,
Figure 11 suggests that the ADT policy can provide a mean
response time that is 20-30% lower than that of the T1 policy, when the number of arrivals during
a low load period is (∼ 10 times) larger than that during a high load period.
In addition (not shown), we find that when arrivals at queue 1 follow an MMPP(2) or when
arrivals at both queues follow MMPP(2)’s, the improvement of the ADT policy over the T1 policy
tends to be smaller than when only arrivals at queue 2 follow an MMPP(2). Overall, we conclude
that ADT has appreciable improvement over T1 only when there are more arrivals during the low
load period than during the high load period, giving them comparable importance.
5 Application to call center scheduling
In this section we apply lessons of previous sections to designing allocation policies in a telephone
call center simulated using traces at a call center of an anonymous bank in Israel in 1999 provided
by Guedj and Mandelbaum (2000). This call center uses a service architecture that is similar to
the Beneficiary-Donor model, based on different classes of callers. Our goals are to assess what
improvement may be possible for the call center through the implementation of threshold-based
policies, and more generally to evaluate some of the high level principles of prior sections. For this
purpose, we will first study some relevant characteristics of the trace in Section 5.1 (see Brown et
al., 2005, for a complementary study of the trace). In particular, we will see that the arrival rate
at this call center has great fluctuation as in Section 4. Based on the lessons learned in previous
sections, we expect that the T1 policy may perform well in this call center. We will evaluate this
expectation via trace driven simulation in Section 5.2.
5.1 Settings

5.1.1 Trace characteristics
The data span twelve months of 1999, and were collected at the level of individual calls, at
a small call center of an anonymous bank in Israel. An arriving call is first connected to an
interactive voice response unit (VRU), where the customer receives recorded information and
possibly performs self-service transactions. In total, roughly 1,200,000 calls arrived at the VRU
during the year of 1999; out of those, about 420,000 calls indicated a desire to speak to an agent.
Below, we limit our focus to the 420,000 calls that requested connection to an agent.
The calls requesting connection to an agent can be divided into two types: Internet assistance
(IN) calls and Regular calls. IN calls generally ask for technical support for online transactions
via web sites. All the other calls are classified as Regular calls. Prior to August 1999, both the
IN calls and the Regular calls joined a single shared queue, and were served by the same pool of
agents. Post August 1999, the call center split the IN and Regular calls into two separate queues
to be served by separate pools of agents (see Figure 12). In addition to distinguishing between
two types of calls, the call center also differentiates between high and low priority customers, and
looks for ways to give high priority customers shorter waiting times.
Table 1 summarizes the total number of calls of each type and of each priority class at the
call center during 1999. Out of the approximately 420,000 total calls, about 400,000 calls are
Regular, and about 20,000 calls are IN. Out of the 400,000 Regular calls, about 140,000 calls have
high priority, and about 260,000 calls have low priority. By contrast, almost all IN calls have low
priority. Table 2 shows the percentage of calls that are served by agents. About 85% of the calls
are served, and 15% of the calls are abandoned before receiving service by agents (as there are
[Figure 12 (diagram): Regular calls and IN calls routed to separate queues and agent pools.]

Figure 12: Post-August architectural model of a call center.
only two IN calls with high priority, the entry is kept blank in the table).
             both prio   high prio   low prio
both types     419,857     137,317    282,540
Regular        400,765     137,315    263,450
IN              19,092           2     19,090

Table 1: Total number of calls during the year.

             both prio   high prio   low prio
both types       84.9%       85.8%      84.4%
Regular          85.2%       85.8%      84.8%
IN               78.9%                  78.9%

Table 2: Percentage of calls served.
Figure 13 details the total number of calls, showing (a) the daily number of calls during a
month (November) and (b) the hourly number of calls during a day (November 1). As Figure 13(a)
suggests, the number of calls per day drops on weekends (Fridays and Saturdays in Israel). As
Figure 13(b) suggests, the call center opens at 7am on weekdays, and the number of calls per
hour peaks before lunch time (∼ 200 calls per hour). After lunch time, there is another peak, and
then calls decline through the evening (to roughly 70 calls per hour). The arrival pattern does
not differ much day to day during weekdays.
Table 3 summarizes the mean service demand (in seconds) and its squared coefficient of variation for those calls that are served. As there are only two IN calls with high priority, the entry is
kept blank in the table. Note that the IN calls have noticeably longer service demand with higher
variability, and this might be a reason for the call center to serve the IN calls by a separate pool
of agents, so that the Regular calls are not blocked by long (low priority) IN calls.
5.1.2 Architectural models for experiment
In our experiment, we consider two possible service models for the call center, as shown in Figure 14. In both models, we assume that each queue has a single agent, approximating the fact
that each queue is served by a pool of several agents at the call center (in fact, up to 13 agents
[Figure 13 (plots): (a) daily arrivals (up to ∼2000 calls per day) across the days of November; (b) hourly arrivals (up to ∼200 calls per hour) over the hours of November 1.]
Figure 13: A typical arrival pattern of all 420,000 calls. The figures show the number of (a) daily
arrivals in November and (b) hourly arrivals on November 1.
(a) mean
             both prio   high prio   low prio
both types       190.1       208.7      180.9
Regular          180.6       208.7      165.7
IN               406.3                  406.3

(b) C2
             both prio   high prio   low prio
both types       2.217       1.871      2.420
Regular          1.836       1.871      1.741
IN               2.974                  2.974

Table 3: Statistics of the duration of a service: (a) mean (in seconds) and (b) squared coefficient
of variation.
serve the call center). This approximation becomes more accurate as the load becomes higher,
where the study of performance becomes important. In the Regular-IN model (Figure 14(a)), we
separate the IN calls from the Regular calls as in the original architectural model of a call center
(Figure 12), but allow the IN agent to sometimes serve the Regular calls. Since almost all IN
calls have low priority while 34% of the Regular calls have high priority, we place more weight
(importance) on the Regular calls (specifically, the Regular calls have weight cR = 4, and the IN
calls have weight cIN = 1). As the IN calls have longer service demand, more variability, and
less importance, we do not want the Regular call agent to serve the IN calls. In this section, we
assume a nonpreemptive service discipline, following call center convention. (In previous sections
we have chosen preemptive service disciplines for clarity, as the analysis of the nonpreemptive
case is complex, though possible, as described in Osogami, 2005.)
In the Priority model (Figure 14(b)), we separate high priority calls from low priority calls, and
place more weight on high priority jobs (specifically, high priority calls have weight cH = 16, and
[Figure 14 (diagrams): (a) the Regular-IN model, with Regular calls (1/µR = 181) and IN calls (1/µI = 406); (b) the Priority model, with high priority calls (cH = 16, 1/µH = 209) and low priority calls (cL = 1, 1/µL = 181).]
Figure 14: Two architectural models of a call center.
low priority calls have weight cL = 1). As high priority calls and low priority calls have roughly
the same service demand, high priority calls have higher cµ value; thus, we allow the agent for
low priority calls to sometimes serve high priority calls.
5.1.3 Preprocessing of trace
We consider only those arrivals during weekdays, removing holidays (which have smaller numbers
of calls). In our trace driven simulation, we feed the trace of the Regular or high priority calls
into queue 1 and the trace of the IN or low priority calls into queue 2. In order to use our
trace of limited length multiple times (specifically, 30 times in each run) with different arrival
sequences, we follow a common approach in simulation (Uhlig and Mudge, 1997), whereby each
call is independently removed from the trace with some probability, as specified below. Another
reason for sampling the arrival sequence in this way is to create different loads. Here, we define
the load at queue i, ρi, as follows for i = 1, 2:

ρi = [(1 − qi) × (total number of calls at queue i) × (average duration of a service for queue i)] / (total operation hours),
where qi is the fraction of the calls at queue i removed from the trace.
We pick the service demand of a call from the lognormal distribution whose parameters are
estimated from the trace. Picking service demands from a distribution allows us to use the same
trace multiple times with different service demand sequences.
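The thinning, load computation, and lognormal service demands can be sketched as follows. This is illustrative: the function names are ours, and the lognormal parameters are matched to a given mean and squared coefficient of variation (e.g., from Table 3) via the standard moment formulas.

```python
import math
import random

def thin_and_load(arrival_times, q, mean_service, total_time, seed=1):
    """Remove each call independently with probability q, and compute
    the resulting load as defined above.

    arrival_times : call arrival times for one queue (seconds)
    mean_service  : average duration of a service (seconds), from the trace
    total_time    : total operation time covered by the trace (seconds)
    """
    rng = random.Random(seed)
    kept = [t for t in arrival_times if rng.random() >= q]
    rho = (1 - q) * len(arrival_times) * mean_service / total_time
    return kept, rho

def lognormal_service(mean, scv, rng):
    """Sample a service demand from a lognormal distribution with the
    given mean and squared coefficient of variation (SCV)."""
    sigma2 = math.log(1 + scv)          # SCV = exp(sigma^2) - 1
    mu = math.log(mean) - sigma2 / 2.0  # E[X] = exp(mu + sigma^2/2)
    return rng.lognormvariate(mu, math.sqrt(sigma2))
```

For instance, Regular calls would be sampled with mean 180.6 and SCV 1.836, and IN calls with mean 406.3 and SCV 2.974, per Table 3.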
5.2 Results
Our trace characterization (Section 5.1.1) shows that the load at the call center has large fluctuation. We will see that the T1 policy (with a fixed t1 threshold) provides a low mean response time
even under this fluctuating load. In addition, we will also see how much improvement the call
center can expect with respect to static and dynamic robustness by employing allocation policies
such as the T1 and ADT policies, which allow resource sharing. For this purpose, we evaluate
the following three allocation policies:
The Dedicated policy: Each agent serves only its own queue, as in the original call center
model (Figure 12).
The T1 policy: The rules are specified in Definition 1, but they are enforced nonpreemptively.
The ADT policy: The rules are specified in Definition 3, but they are enforced nonpreemptively.
We study static robustness of these policies in Section 5.2.1, and dynamic robustness in Section 5.2.2.
5.2.1 Static robustness
Figures 15-16 illustrate static robustness of the Dedicated, T1, and ADT policies, plotting the
mean response times as a function of ρ2 , under the Regular-IN model (Figure 15) and under the
Priority model (Figure 16). In the Regular-IN model, ρ2 ranges only between 0 and 0.43; no calls
are removed from the trace at ρ2 = 0.43. On the other hand, the Priority model allows us to
change ρ2 in a much wider range, and we show only a portion of the full range of ρ2 . In both the
Regular-IN and Priority models, ρ1 is chosen such that the mean response time of calls at queue 1
under Dedicated is about 60 minutes in column (a), i.e. ρ1 = 0.7 in Regular-IN and ρ1 = 0.42
in Priority, and about 30 minutes in column (b), i.e. ρ1 = 0.53 in Regular-IN and ρ1 = 0.32 in
Priority. In both the Regular-IN and Priority models, the top row shows the mean response time
under Dedicated, T1(1), T1(10), and T1(∞), and the bottom row shows the mean response time
under T(1), T(10), and ADT with a different scale on the vertical axis.
The top rows of Figures 15-16 show that all of the T1 policies can significantly improve upon
the Dedicated policy for a range of ρ2 for both high and low ρ1 . In the case of the Regular-IN
[Figure 15 (plots): mean response time (sec, log scale) vs. ρ2 ∈ [0.1, 0.4] under the Regular-IN model; top row: Dedicated, T1(1), T1(10), T1(∞); bottom row: T1(1), T1(10), ADT(1,10,9); columns: (a) ρ1 = 0.70 and (b) ρ1 = 0.53.]
Figure 15: Static robustness of Dedicated, T1, and ADT under the Regular-IN model.
model, this improvement implies that resource sharing (T1) has a significant benefit over the
original call center architecture (Dedicated) with respect to mean response time. The figures also
show that the improvement of the T1 policies over Dedicated becomes smaller at higher ρ2 and
lower ρ1 . This makes intuitive sense, since the agent at queue 2 can help queue 1 less at higher
ρ2 , and is needed less at lower ρ1 .
T1(∞) is the policy where the agent at queue 2 helps queue 1 only when there are no calls
waiting at queue 2, and is equivalent to the T2(1) policy. Figures 15-16 (top rows) suggest that
T1(∞) can significantly improve upon Dedicated, but its mean response time can be much higher
than the T1 policy with the optimized t1 threshold value. This is in agreement with what we have
observed in Section 2.2: the mean response time under the T2 policy, including T2(1), is typically
higher than that under the optimized T1 policy when queue 1 has higher cµ value (c1 µ12 > c2 µ2 ).
[Figure 16 (plots): mean response time (sec, log scale) vs. ρ2 ∈ [0.4, 1] under the Priority model; top row: Dedicated, T1(1), T1(10), T1(∞); bottom row: T1(1), T1(10), ADT(1,10,19); columns: (a) ρ1 = 0.42 and (b) ρ1 = 0.32.]
Figure 16: Static robustness of Dedicated, T1, and ADT under the Priority model.
Taking a closer look, we see in Figure 16 (top row) that at very high ρ2 , the mean response
time under T1(1) becomes higher than that under Dedicated. This is due to the smaller stability
region of the T1 policy with small T1 threshold value, which is discussed in Section 2.1. Observe
that in our parameter settings, T1(1) is equivalent to the cµ rule, as the cµ value is higher at
queue 1 for both models. The loss of stability of the T1 policies is less clear in Figures 15-16 than,
for example, in Figure 5, due to the fact that there are no calls after midnight at the call center,
and thus all the calls are served eventually in our simulation settings.
Overall, Figures 15-16 (top rows) show that the T1 policy lacks static robustness. T1(1)
provides low mean response time at lower ρ2 , but its mean response time becomes high (sometimes
even higher than that under Dedicated) at higher ρ2 . On the other hand, T1(10) has higher mean
response time than T1(1) at lower ρ2 , but it can provide lower mean response time at higher ρ2 .
Specifically, in Figure 15(a), the mean response time under T1(10) can be 10% worse than that
under T1(1) at lower ρ2, and the mean response time under T1(1) can be 10% worse than that
under T1(10) at higher ρ2. Likewise, in Figure 16(a), the mean response time under T1(10) can
be 30% worse than that under T1(1) at lower ρ2, and the mean response time under T1(1) can
be 20% worse than that under T1(10) at higher ρ2.
The bottom rows of Figures 15-16 illustrate static robustness of the ADT policy, where thresholds
t1^(1) = 1 and t1^(2) = 10 are fixed and t2 is chosen via our heuristic introduced in Section 3.2.
The figures show that the ADT policy provides roughly at least as good mean response time as
the better of the two T1 policies for the full range of ρ2 and for both high and low ρ1 ; i.e. the
ADT policy excels in static robustness. This reinforces our findings in Section 3.2.
The conclusion of our experiments is that resource sharing (T1) can significantly improve the
mean response time at the call center. The ADT policy is an even better choice if the call center
wants static robustness against changes in the number of calls in each day, for example, due
to changes in service at the bank, increased patronage, or due to increased popularity of online
transactions (which in turn leads to increased IN calls).
5.2.2 Dynamic robustness
In the previous section, we studied the effect of misestimation of the average load, by considering
the mean response time of ADT and T1 at different average loads. To isolate the effect of dynamic
robustness, we now hold the average load fixed, and study the effect of the load fluctuation inherent
in our trace.
Figure 17 illustrates dynamic robustness of the T1 and ADT policies, plotting the percentage
change in the mean response time each month against the optimized T1 policy. Recall that the
trace has large fluctuations in the arrival rate within each day (Figure 13). As the arrival pattern
in the trace is slightly different in each month, evaluating the mean response time each month
allows us to evaluate dynamic robustness over twelve different arrival patterns. The threshold
value, t1 , of the optimized T1 policy is chosen such that the overall mean response time during the
year is minimized, and is fixed throughout the year. In the Regular-IN model (Figure 17(a)), T1(1),
T1(∞), and ADT(3,7,12) are evaluated against the optimized T1 policy, T1(6). In the Priority
model (Figure 17(b)), T1(1), T1(∞), and ADT(3,6,28) are evaluated against the optimized T1
[Figure 17 appears here: two panels plotting the percentage change over the optimized T1 policy by month; (a) Regular-IN: ρ1 = 0.70 and ρ2 = 0.37; (b) Priority: ρ1 = 0.42 and ρ2 = 0.82.]
Figure 17: Dynamic robustness of T1 and ADT under (a) the Regular-IN model and (b) the Priority model. The figures show the percentage change (%) in the mean response time of T1(1), T1(∞), and the (locally) optimized ADT policy over the T1 policy with the optimal t1 threshold for the year: (a) t1 = 6 and (b) t1 = 4. A negative percentage indicates improvement over the optimized T1 policy.
policy, T1(4). Here, ADT(3,7,12) and ADT(3,6,28) are (locally) optimized ADT policies whose
threshold values are chosen to minimize the mean response time during the year, as in Section 4.
The loads, ρ1 and ρ2 , are chosen such that the mean response time under Dedicated is roughly
60 minutes for both queue 1 and queue 2, but as Figures 15-16 suggest, mean response time is
much lower under T1 and ADT policies.
Figure 17 shows that the mean response time under T1(∞) can be twice as high as the mean
response time under the optimized T1 policy. Overall, in T1(∞) the agent at queue 2 is too
conservative, and could help queue 1 more without penalizing calls at queue 2 too much.
Figure 17 also shows that the mean response time under T1(1) can be 10-20% higher than that under the optimized T1 policy in some months. In the Regular-IN model, T1(1) is superior to the
optimized T1 policy during the first seven months, but its mean response time becomes higher
during the rest of the year. This is due to the fact that the number of IN calls per month increases
throughout the year, and the T1(1) policy causes starvation at queue 2 during the peak hours.
On the other hand, in the Priority model the T1(1) policy is consistently (∼ 10%) worse than the
optimized T1 policy. Overall, the mean response time under the T1(1) policy tends to be high,
as it is likely to cause starvation at queue 2 at peak hours.
Finally, Figure 17 shows that the mean response time under the (locally) optimized ADT
policy is slightly lower than that under the optimized T1 policy, but, consistent with the observation in Section 4,
the performance advantage of ADT over T1 is small under load fluctuation. Specifically, the
improvement of the optimized ADT policy over the optimized T1 policy is never more than 5%
in the Regular-IN model and is never more than 2.5% in the Priority model.
The conclusion of our experiments is that the ADT policy yields only a small improvement
over the T1 policy with respect to dynamic robustness. The small improvement suggests that
using more thresholds may not improve the mean response time appreciably. Thus, with respect
to minimizing the mean response time at the call center, the T1 policy suffices, even though the
load at the call center has large fluctuations within each day. Note that these observations agree
with our findings in Section 4.
6 Conclusion
This paper presents the first analytical study of the performance of a wide range of threshold-based
(resource) allocation policies in a multiserver system. The speed and accuracy of our analysis allow
an extensive evaluation of these allocation policies, and we find surprising conclusions.
We first consider single threshold policies, T1 and T2, and find that the T1 policy is superior
with respect to (overall weighted) mean response time. That is, the threshold for resource allocation is better determined by the beneficiary queue length (queue 1) than by the donor queue
length (queue 2), in all cases studied.
We then compare single threshold policies to a multiple threshold policy, the adaptive dual
threshold (ADT) policy, with respect to mean response time, assuming that the load is fixed and
known. We find that when the threshold value is chosen appropriately, the mean response time
of the T1 policy is at worst very close to the best mean response time achieved by the ADT
policy. This is surprising, since the optimal policy appears to have infinitely many thresholds,
but evidently the improvement these thresholds generate is marginal.
We next study static robustness, where the load is constant, but may have been misestimated.
We find that the ADT policy not only provides low mean response time but also excels in static
robustness, whereas the T1 policy does not. The increased flexibility of the ADT policy enables
it to provide low mean response time under a range of loads. Hence, when the load is not exactly
known, the ADT policy is a much better choice than the T1 policy.
Finally, and surprisingly, our analysis shows that this improvement in static robustness does
not necessarily carry over to dynamic robustness, i.e. robustness against load fluctuation. We
observe that when the load is fluctuating, the T1 policy, which lacks static robustness, can often
provide low mean response time, comparable to ADT. This can occur for example because the
mean response time of jobs arriving during the high load period may dominate the overall mean
response time.
Complementing our analytical work, we evaluate the performance of various allocation policies
by using a trace from a call center. The arrival pattern at the call center exhibits quite large
fluctuations; our trace-driven simulation reinforces our conclusions, implying that this call center
could significantly improve mean response time by using the T1 policy, and that the ADT policy
offers only a small improvement over the T1 policy with respect to dynamic robustness. However,
the ADT policy is a better choice if the call center wants static robustness.
Acknowledgement
This work is supported by NSF Career Grant CCR-0133077, NSF Theory CCR-0311383, NSF
ITR CCR-0313148, and IBM Corporation via Pittsburgh Digital Greenhouse Grant 2003. The
authors also thank Li Zhang, who contributed to an earlier version of this paper (Osogami et al.,
2004).
References
H. S. Ahn, I. Duenyas, and R. Q. Zhang. Optimal control of a flexible server. Advances in
Applied Probability, 36:139-170, 2004.
S. L. Bell and R. J. Williams. Dynamic scheduling of a system with two parallel servers in
heavy traffic with complete resource pooling: Asymptotic optimality of a continuous review
threshold policy. Annals of Applied Probability, 11:608-649, 2001.
L. Brown, N. Gans, A. Mandelbaum, A. Sakov, H. Shen, S. Zeltyn, and L. Zhao. Statistical
analysis of a telephone call center: A queueing-science perspective. Journal of the American
Statistical Association, 100(469):36-50, 2005.
D. R. Cox and W. L. Smith. Queues. Kluwer Academic Publishers, 1971.
L. Green. A queueing system with general use and limited use servers. Operations Research,
33(1):168-182, 1985.
I. Guedj and A. Mandelbaum. "Anonymous bank" call-center data, February 2000. http://ie.technion.ac.il/~serveng/.
J. M. Harrison. Heavy traffic analysis of a system with parallel servers: Asymptotic optimality
of discrete review policies. Annals of Applied Probability, 8(3):822-848, 1998.
G. Latouche and V. Ramaswami. Introduction to Matrix Analytic Methods in Stochastic
Modeling. ASA-SIAM, Philadelphia, 1999.
A. Mandelbaum and A. Stolyar. Scheduling flexible servers with convex delay costs: Heavy
traffic optimality of the generalized cµ-rule. Operations Research, 52(6):836-855, 2004.
S. Meyn. Sequencing and routing in multiclass queueing networks, Part I: Feedback regulation.
SIAM Journal on Control and Optimization, 40(3):741-776, 2001.
T. Osogami. Analysis of multi-server systems via dimensionality reduction of Markov chains. PhD thesis, School of Computer Science, Carnegie Mellon University, 2005. http://www.cs.cmu.edu/~osogami/thesis/.
T. Osogami, M. Harchol-Balter, A. Scheller-Wolf, and L. Zhang. Exploring threshold-based
policies for load sharing. In Proceedings of the 42nd Annual Allerton Conference on Communication, Control, and Computing, pages 1012-1021, September 2004.
R. Shumsky. Approximation and analysis of a call center with specialized and flexible servers.
OR Spektrum, 26(3):307-330, 2004.
M. S. Squillante, C. H. Xia, D. D. Yao, and L. Zhang. Threshold-based priority policies
for parallel-server systems with affinity scheduling. In Proceedings of the IEEE American
Control Conference, pages 2992-2999, June 2001.
M. S. Squillante, C. H. Xia, and L. Zhang. Optimal scheduling in queuing network models
of high-volume commercial web sites. Performance Evaluation, 47(4):223-242, 2002.
D. A. Stanford and W. K. Grassmann. The bilingual server system: A queueing model
featuring fully and partially qualified servers. INFOR, 31(4):261-277, 1993.
D. A. Stanford and W. K. Grassmann. Bilingual server call centers. In D.R. McDonald
and S.R.E. Turner, editors, Analysis of Communication Networks: Call Centers, Traffic and
Performance. American Mathematical Society, 2000.
R. A. Uhlig and T. N. Mudge. Trace-driven memory simulation: A survey. ACM Computing
Surveys, 29(2):128-170, 1997.
J. A. Van Mieghem. Dynamic scheduling with convex delay costs: The generalized cµ rule.
Annals of Applied Probability, 5(3):809-833, 1995.
R. J. Williams. On dynamic scheduling of a parallel server system with complete resource
pooling. In D. R. McDonald and S. R. E. Turner, editors, Analysis of Communication
Networks: Call Centers, Traffic and Performance. American Mathematical Society, 2000.
A Analysis of the ADT policy

A.1 Analysis of the ADT policy under Poisson arrivals
Below we provide a formal description of the analysis of the ADT policy; this will be useful when
we generalize the arrival process. The Markov chain in Figure 8 is a (nonhomogeneous) QBD
process, where level j of the process denotes the j-th column, namely all states of the form (i, j)
for each j. The generator matrix, Q, of this process can be expressed as a block tridiagonal matrix:
\[
Q = \begin{pmatrix}
L^{(0)} & F^{(0)} & & \\
B^{(1)} & L^{(1)} & F^{(1)} & \\
 & B^{(2)} & L^{(2)} & F^{(2)} \\
 & & \ddots & \ddots & \ddots
\end{pmatrix},
\]
where submatrix F^{(i)} encodes transitions from level (column) i to level i + 1 for i ≥ 0, submatrix
B^{(i)} encodes transitions from level i to level i − 1 for i ≥ 1, and submatrix L^{(i)} encodes transitions
within level i for i ≥ 0. Note that our QBD process repeats after level t_2 + 1, i.e., F^{(i)} = F^{(t_2+1)},
L^{(i)} = L^{(t_2+1)}, and B^{(i)} = B^{(t_2+1)} for all i > t_2 + 1.
Before dimensionality reduction, matrices F^{(i)}, L^{(i)}, and B^{(i)} had an infinite number of
columns and rows, since the number of states in each level was infinite (recall Figure 3(a)).
DR reduces the number of states in each level to a finite number; as a result, matrices F^{(i)}, L^{(i)},
and B^{(i)} have a finite number of columns and rows. Again, in Figure 8, the bottom two rows
correspond to the "busy period," during which there are ≥ t_1^{(2)} type 2 jobs, approximated by a
two-phase PH distribution with parameters (β1, β2, β12).

Assuming 1 < t_1^{(1)} < t_1^{(2)}, and that the "busy period" is approximated by a two-phase PH
distribution, matrices F^{(i)}, L^{(i)}, and B^{(i)} have size (t_1^{(2)} + 2) × (t_1^{(2)} + 2) for all i. Specifically,
\[
F^{(i)} = \lambda_2 I,
\]
for i ≥ 0, where I is an identity matrix of an appropriate size;
\[
B^{(i)} = \mu_2 \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix},
\]
where I is an identity matrix of size t_1^{(1)} for 1 ≤ i ≤ t_2 and of size t_1^{(2)} for i > t_2, and 0 is a zero
matrix of an appropriate size;
\[
L^{(i)} = \begin{pmatrix}
(*) & \lambda_1 & & & & & \\
\mu_1 & (*) & \lambda_1 & & & & \\
 & \mu_1 + \mu_{12} & (*) & \ddots & & & \\
 & & \ddots & \ddots & \lambda_1 & & \\
 & & & \mu_1 + \mu_{12} & (*) & \lambda_1 & 0 \\
 & & & & \beta_1 & -(\beta_1 + \beta_{12}) & \beta_{12} \\
 & & & & \beta_2 & 0 & -\beta_2
\end{pmatrix}
\]
for all 1 ≤ i ≤ t_2, where the diagonal elements, (*), are determined so that the sum of each row
in the generator matrix Q becomes zero. Matrix L^{(0)} is obtained from L^{(1)} by replacing the (2,1)
element by max{μ1, μ12}. For i > t_2, L^{(i)} is obtained from L^{(1)} by replacing the (k + 1, k) element
(i.e., μ1 + μ12) by μ1 for 3 ≤ k ≤ t_2.
Using matrix analytic methods, the stationary probability vector of level i, \vec{\pi}_i, is then given
recursively by
\[
\vec{\pi}_i = \vec{\pi}_{i-1} R^{(i)},
\]
where R^{(i)} is given recursively by
\[
F^{(i-1)} + R^{(i)} L^{(i)} + R^{(i)} R^{(i+1)} B^{(i+1)} = 0,
\]
for i = t_2, \ldots, 1, where 0 is a zero matrix of an appropriate size. Since our QBD process repeats
after level t_2 + 1, R^{(i)} = R for all i ≥ t_2 + 1, where R is given by the minimal solution to the
following matrix quadratic equation:
\[
F^{(t_2+1)} + R L^{(t_2+1)} + R^2 B^{(t_2+1)} = 0.
\]
A row vector \vec{\pi}_0 is given by a positive solution of
\[
\vec{\pi}_0 \left( L^{(0)} + R^{(1)} B^{(1)} \right) = \vec{0},
\]
normalized by
\[
\vec{\pi}_0 \sum_{i=0}^{\infty} \prod_{k=1}^{i} R^{(k)} \, \vec{1} = 1, \qquad (3)
\]
where \vec{0} and \vec{1} denote vectors of zeros and ones, respectively, with an appropriate number
of elements. Note that the infinite sum in (3) may be rewritten as a closed-form expression using
(I - R)^{-1}, since R^{(i)} = R for all i ≥ t_2 + 1. The mean response time can then be computed from
the stationary distribution (the \vec{\pi}_i's) via Little's law, as before.
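The repeating-portion computation above can be illustrated numerically. The following is a minimal sketch (not the authors' implementation; the 2×2 blocks and all rates are hypothetical, chosen only so that the answer is easy to verify) of solving the matrix quadratic equation F + RL + R²B = 0 for the minimal nonnegative solution R by successive substitution:

```python
import numpy as np

# Hypothetical 2-phase QBD blocks (for illustration only): each row of
# F + L + B sums to zero, as required of a generator's repeating blocks.
lam, mu = 1.0, 3.0
F = lam * np.eye(2)            # transitions one level up (arrivals)
B = mu * np.eye(2)             # transitions one level down (departures)
L = -(lam + mu) * np.eye(2)    # transitions within a level

# Successive substitution for the minimal nonnegative solution of
#   F + R L + R^2 B = 0,  rearranged as  R <- -(F + R^2 B) L^{-1}.
L_inv = np.linalg.inv(L)
R = np.zeros_like(F)
for _ in range(1000):
    R_next = -(F + R @ R @ B) @ L_inv
    if np.max(np.abs(R_next - R)) < 1e-12:
        R = R_next
        break
    R = R_next

# In this decoupled example each phase behaves as an M/M/1 queue,
# so R converges to (lam/mu) * I = (1/3) * I.
print(np.round(R, 6))
```

Successive substitution starting from R = 0 is known to converge monotonically to the minimal nonnegative solution; faster schemes (e.g., logarithmic reduction) exist but are unnecessary at this scale.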
A.2 Analysis of the ADT policy under MMPP
In this section, we describe the analysis of the ADT policy when arrivals at queue 2 follow an
MMPP and arrivals at queue 1 follow a Poisson process. An extension to the case where arrivals at
queue 1 also follow an MMPP is possible, but requires an additional technique (see e.g. Osogami,
2005). Following standard notation, we denote the parameters of an MMPP by a pair of matrices
(D_0, D_1). In the case of an MMPP(2),
\[
D_0 = \begin{pmatrix} -(\alpha_H + \lambda_H) & \alpha_H \\ \alpha_L & -(\alpha_L + \lambda_L) \end{pmatrix}
\quad \text{and} \quad
D_1 = \begin{pmatrix} \lambda_H & 0 \\ 0 & \lambda_L \end{pmatrix},
\]
where the duration of the high (low, respectively) load period has an exponential distribution
with rate αH (αL , respectively). Let λ denote the fundamental rate (the overall average arrival
rate) of the MMPP.
The generator matrix, Q̂, when the arrivals at queue 2 follow an MMPP with parameter
(D0 , D1 ) is obtained by modifying the generator matrix, Q, for the Poisson arrivals, which we
introduced in Section A.1. We denote the submatrices of Q̂ by F̂^{(i)}, L̂^{(i)}, and B̂^{(i)}, i.e.,
\[
\hat{Q} = \begin{pmatrix}
\hat{L}^{(0)} & \hat{F}^{(0)} & & \\
\hat{B}^{(1)} & \hat{L}^{(1)} & \hat{F}^{(1)} & \\
 & \hat{B}^{(2)} & \hat{L}^{(2)} & \hat{F}^{(2)} \\
 & & \ddots & \ddots & \ddots
\end{pmatrix}.
\]
Then, F̂^{(i)}, L̂^{(i)}, and B̂^{(i)} are obtained from the submatrices, F^{(i)}, L^{(i)}, and B^{(i)}, of Q by
\[
\hat{F}^{(i)} = D_1 \otimes F^{(i)}/\lambda, \qquad
\hat{B}^{(i)} = I_0 \otimes B^{(i)}, \qquad
\hat{L}^{(i)} = D_0 \otimes I_i + I_0 \otimes L^{(i)},
\]
for all i. Here, I_i denotes an identity matrix of the same size as L^{(i)}, I_0 denotes an identity matrix
of the same size as D_0, and ⊗ denotes the Kronecker product.
Note that, before DR, F^{(i)}, L^{(i)}, and B^{(i)} have an infinite size, and so do F̂^{(i)}, L̂^{(i)}, and B̂^{(i)}.
Dimensionality reduction reduces the size of F^{(i)}, L^{(i)}, and B^{(i)} to finite, and thus the sizes of F̂^{(i)},
L̂^{(i)}, and B̂^{(i)} also become finite. Now, the mean response time is obtained in the same way as in
Section A.1.