6
Weighted fair queueing
Fair queueing originated in the data communications field, initially as a congestion control device preventing ill-behaved users from unduly affecting the
service offered to others [Nag87]. In a fluid limit, fair queueing realizes head
of line processor sharing, the objective being to share server capacity equally
between all customers having packets to transmit. When the fluid service
rate is modulated according to weights attributed to the contending traffic
streams, we speak of Generalized Processor Sharing (GPS).
Weighted Fair Queueing (WFQ) is a practical implementation of GPS first
proposed by Demers, Keshav and Shenker [DKS89] and further discussed and
analysed under the name of Packet by packet Generalized Processor Sharing
(PGPS) by Parekh and Gallager [PG93a]. In the following we will use PGPS
to denote this algorithm, reserving WFQ as a generic term. PGPS may be
considered as the closest possible approximation to GPS. Its implementation proves somewhat complex, however (see below). A simpler algorithm
which is less precise but much easier to realize was proposed independently by
Golestani, under the name Self Clocked Fair Queueing (SCFQ) [Gol94], and
by Roberts, under the name Virtual Spacing [Rob94]. In fact, this simplified
algorithm was first "invented" by Davin in early work on fair queueing performed at MIT [DH90]. A related scheduling discipline is "Virtual Clock" as
proposed by Zhang [Zha90] which also provides protection from ill-behaved
users and can support diverse throughput guarantees.
6.1 WFQ algorithms
Consider a server of constant rate c handling cells from a number m of traffic
streams where each stream has its own queue and cells from a given stream
are served in FIFO order. This system is depicted in Figure 6.1.1. Each stream
might represent a particular ATM connection or a set of connections grouped
together for traffic handling purposes.
6.1.1 Generalized Processor Sharing
Generalized Processor Sharing (GPS) is a fluid approximation to the queueing system of Figure 6.1.1 where the service rate of any backlogged stream i is
proportional to a certain rate parameter ri [PG93a]. We consider the special
case of GPS where, for each stream i, ri is set to an intrinsic stream rate
and admission control is performed to ensure that Σ ri ≤ c. The parameter
Figure 6.1.1: WFQ system (streams 1 to m, each with its own queue, served by a common server of rate c).
ri can then be interpreted as a minimum bandwidth guarantee. It might be
set to an equivalent bandwidth determined to serve a set of VBR connections constituting stream i or to the leak rate of a leaky bucket controlling
the burstiness of stream i. Choice of rate parameters is further discussed in
Section 6.4.
6.1.2 PGPS
Generalized Processor Sharing is an ideal which cannot be realized in practice
because packets are not infinitely divisible. Packet-by-packet GPS [PG93a]
is a generalization of the Fair Queueing algorithm proposed by Demers et al.
[DKS89]. In this algorithm, variable length packets are served in an order
determined so that, as far as possible, they complete service in the order which
would prevail if they were in fact served using GPS. In practice, this order
can only be determined for the packets present at the end of each service:
the server chooses to serve the packet which would finish first in GPS if there
were no further arrivals. In the case of ATM cells, it can be shown that for a
given arrival process, the difference between the times transmission of a given
cell would finish in PGPS and GPS is bounded by one cell transmission time
(1/c) [PG93a].
To calculate the hypothetical finishing times necessary for PGPS scheduling, Parekh and Gallager introduce the notion of Virtual time [PG93a]. This
is a function of real time t which evolves at a rate inversely proportional to
the sum of the rate factors ri of backlogged connections in the corresponding
GPS system.
Let PGPSi be a variable associated with stream i. Applied to ATM cells,
the PGPS service algorithm can be stated as follows:
• on a stream i cell arrival:
(i) PGPSi ← max{Virtual time, PGPSi} + 1/ri
(ii) time stamp the cell with the value of PGPSi.
• serve cells in increasing order of time stamp.
It is demonstrated in [PG93a] that PGPS provides a minimum rate guarantee to the extent that the work accomplished on a given stream queue in
any interval differs from that of the equivalent GPS system by at most one
cell. The closeness of PGPS to GPS comes at the expense of an algorithm
whose implementation would be rather complicated. The complication resides in keeping track of Virtual time which requires an evaluation of the
number of backlogged streams in the equivalent GPS system.
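The evolution of Virtual time between events can be sketched as follows. This is an illustrative fragment, not the implementation of [PG93a]: it assumes the backlogged set of the equivalent GPS system is constant over the interval considered, and the function names are ours.

```python
def advance_virtual_time(v, dt, backlogged_rates, c):
    """Advance GPS virtual time over an interval of length dt (seconds)
    during which the set of backlogged streams does not change.

    Virtual time grows at rate c / sum(ri over backlogged i), i.e.
    inversely proportional to the sum of the backlogged rate factors.
    The convention for an empty system (virtual time reset at the start
    of each busy period) is omitted here."""
    total = sum(backlogged_rates)
    if total == 0:
        raise ValueError("no backlogged stream: outside a busy period")
    return v + dt * c / total


def pgps_stamp(pgps_i, virtual_time, r_i):
    """Time stamp rule for an arriving stream i cell (step (i) above)."""
    return max(virtual_time, pgps_i) + 1.0 / r_i
```

Even in this simplified form, advancing virtual time requires tracking the backlogged set of the fluid GPS system at every event, which is precisely the complication noted above.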
6.1.3 Virtual clock
A queue scheduling scheme related to PGPS is the Virtual Clock algorithm
proposed by Zhang [Zha90]. In this algorithm, the time stamp, denoted VCi,
is calculated with respect to real time t:
• on a stream i cell arrival:
(i) VCi ← max{t, VCi} + 1/ri
(ii) time stamp the cell with the value of VCi.
• serve cells in increasing order of time stamp.
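As a sketch, the Virtual Clock discipline can be realized with an ordinary priority queue; class and method names are illustrative, and tie-breaking between equal time stamps (here, arrival order) is our assumption.

```python
import heapq
import itertools


class VirtualClockMux:
    """Sketch of a Virtual Clock multiplexer for fixed-size cells.

    Rates are in cells per second; `now` is real time in seconds."""

    def __init__(self, rates):
        self.rates = dict(rates)           # stream id -> rate parameter ri
        self.vc = {i: 0.0 for i in rates}  # per-stream variable VCi
        self.heap = []                     # cells ordered by time stamp
        self.seq = itertools.count()       # FIFO tie-breaking

    def arrive(self, i, now, cell):
        # (i) VCi <- max{t, VCi} + 1/ri ; (ii) stamp the cell with VCi
        self.vc[i] = max(now, self.vc[i]) + 1.0 / self.rates[i]
        heapq.heappush(self.heap, (self.vc[i], next(self.seq), cell))

    def serve(self):
        # transmit the waiting cell with the smallest time stamp
        return heapq.heappop(self.heap)[2] if self.heap else None
```

For example, with rates {"a": 1.0, "b": 2.0} and one cell of each stream arriving at t = 0, stream b's cell (stamp 0.5) is served before stream a's (stamp 1.0).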
This service discipline is simpler than PGPS: it is work conserving, it
provides performance bounds similar to those of PGPS (see [HK96]) and
it guarantees average throughput for each connection. It has, however, one
disadvantage with respect to PGPS, illustrated in the following example:
a stream emits a message of 1 Mbit at the multiplexer rate of
c = 100 Mbit/s; although the stream has a rate parameter r = 1
Mbit/s, its transmission proceeds at peak rate since it is the only
stream with a backlog; however, just before it can emit the last
cell of the burst, another stream becomes active and also begins
to emit a message at 100 Mbit/s; the last cell of the first burst will
have a time stamp roughly equal to the current time t + 1 second (1
Mbit at 1 Mbit/s). Assuming the second connection also has a rate
parameter equal to 1 Mbit/s, this last cell will be further delayed
until nearly 1 Mbit of the new message has been transmitted.
The message transfer time of the second stream would have been
the same if the last cell of the first stream had been emitted
without delay. The PGPS scheme would have emitted the last
cell just after the first cell of the second burst thus achieving
reduced message transfer time.
In general, the Virtual Clock algorithm offers rate guarantees with much
less precision than PGPS. As in the above example, realized throughput can
be much smaller than ri for significant periods of time. The Virtual Spacing
algorithm described below is as simple to implement as Virtual Clock and
achieves nearly the same throughput guarantees as PGPS.
6.1.4 Virtual Spacing/SCFQ
In the following we use the name Virtual Spacing for the special case of
SCFQ in the ATM context of constant length packets. The Virtual Spacing
algorithm is the same as PGPS except that we replace Virtual time by a
simpler variable denoted Spacing time. Spacing time is just equal to the
value of the time stamp of the last cell to have been taken from the head of
the queue, i.e., in a busy period, the cell currently being transmitted.
Let VSi be a variable associated with stream i. The algorithm is then:
• on a stream i cell arrival:
(i) VSi ← max{Spacing time, VSi} + 1/ri
(ii) time stamp the cell with the value of VSi.
• serve cells in increasing order of time stamp.
Since Spacing time cannot be greater than the time stamp of any cell
already waiting when its value was updated, step (i) of the algorithm implies
that the time stamps of the cells of a backlogged stream are arithmetically
spaced by the interval 1/ri. Spacing time only intervenes in the calculation of
the time stamp of a cell arriving on a non-backlogged stream and allows the
stream to be included at an appropriate place in the transmission schedule.
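A minimal sketch of the Virtual Spacing algorithm follows, with Spacing time updated as cells are taken from the head of the queue; class and method names and the FIFO tie-break are our assumptions.

```python
import heapq
import itertools


class VirtualSpacingMux:
    """Sketch of the Virtual Spacing (SCFQ) algorithm for ATM cells.

    Spacing time is the time stamp of the last cell taken from the head
    of the queue; rates are in cells per second."""

    def __init__(self, rates):
        self.rates = dict(rates)
        self.vs = {i: 0.0 for i in rates}  # per-stream variable VSi
        self.spacing_time = 0.0
        self.heap = []
        self.seq = itertools.count()

    def arrive(self, i, cell):
        # (i) VSi <- max{Spacing time, VSi} + 1/ri ; (ii) stamp the cell
        self.vs[i] = max(self.spacing_time, self.vs[i]) + 1.0 / self.rates[i]
        heapq.heappush(self.heap, (self.vs[i], next(self.seq), cell))

    def serve(self):
        # serve in increasing time stamp order; update Spacing time
        if not self.heap:
            return None
        stamp, _, cell = heapq.heappop(self.heap)
        self.spacing_time = stamp
        return cell
```

Note that a cell arriving on a previously idle stream is stamped relative to the current Spacing time, which places it at an appropriate point in the schedule among the already backlogged streams.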
6.2 Performance guarantees
This section discusses performance of WFQ algorithms in terms of delay
bounds and fairness characteristics. We also discuss the adequacy of these
bounds, notably for dimensioning playback buffers for connections with real
time delay constraints.
6.2.1 Leaky bucket controlled streams
Performance bounds can be derived most easily for the idealized GPS service
discipline. Suppose stream i is controlled by a leaky bucket of leak rate ri and
token pool size bi so that the amount of work ui(s, t) arriving in an interval
(s, t) satisfies the inequality:
ui(s, t) ≤ bi + ri(t - s).
Let Vit be the amount of work belonging to connection i in the multiplexer
queue at time t (i.e., the number of cells in the queue plus the remainder of
the cell currently being transmitted). Since the service rate is not less than
ri, we have by Reich's theorem:
Vit ≤ sup_{s≤t} {ui(s, t) - ri(t - s)}   (6.2.1)
    ≤ bi.   (6.2.2)
Now consider the output from the multiplexer in the interval (t, u). Let
the amount of connection i work leaving the system in this interval be ũi(t, u).
This work is composed of the remainder of the cell being transmitted at t,
plus the cells completely transmitted in (t, u), plus the transmitted part of
the cell currently being transmitted at u. We necessarily have:

ũi(t, u) ≤ Vit + ui(t, u)

and thus, using (6.2.1):

ũi(t, u) ≤ sup_{s≤t} {ui(s, t) - ri(t - s)} + ui(t, u)
         = sup_{s≤t} {ui(s, u) - ri(t - s)}
         ≤ sup_{s≤t} {ri(u - s) + bi - ri(t - s)}
         = ri(u - t) + bi.   (6.2.3)
The last inequality shows that the output from the multiplexer conserves
the burstiness bound guaranteed at the network input by the leaky bucket.
This is an extremely desirable property since it ensures that if all multiplexers
on a connection path guarantee the minimum service rate ri then cell loss can
be completely avoided by reserving buffer space equal to bi. Furthermore, the
delay of any cell is bounded by bi/ri at each multiplexer, independently of
the activity of other connections. In fact, Parekh and Gallager [PG94] have
proved the stronger result that, neglecting fixed processing and propagation
times, the overall delay through a network of such GPS servers is bounded
by bi/ri.
It should be noted that (6.2.2) and (6.2.3) derive from the properties
of individual connection i alone and do not rely on all connections being
controlled by a leaky bucket at the access. If some connection were completely
unconstrained at the access, its potential impact on other connections could
still be limited by attributing to it a minimal service rate ri (≥ 0) and a
maximal buffer occupancy. Note finally that to derive the above bounds,
we have only used the minimum service rate property of GPS service. The
bounds thus apply to any other service discipline offering a minimum rate
guarantee including time division multiplexing.
6.2.2 End to end delay
The overall delay bound of bi/ri for GPS applies in the fluid regime assuming
"cut through" switching at each node, i.e., a cell can begin transmission at
a downstream node before it has completed transmission at one or more
upstream stages. With the more realistic assumption of store and forward
cell switching (i.e., each cell must be received entirely at a given stage before
it can be re-transmitted) it is necessary to add the transmission time at each
stage to the overall delay. The delay DiGPS(K) in a network of K stages then
satisfies:

DiGPS(K) ≤ bi/ri + (K - 1)/ri.   (6.2.4)
The need to account in the service discipline for the discrete nature of
ATM cells introduces supplementary slackness in the above bounds. As previously noted, PGPS achieves the closest approximation to the GPS ideal. It
may be shown that the delay DiPGPS(K) in a network of K stages satisfies:

DiPGPS(K) ≤ bi/ri + (K - 1)/ri + Σ_{k=1}^{K} 1/ck,   (6.2.5)
where ck is the link rate at stage k [PG94]. Since the ck are typically much
greater than the stream rate ri, the difference between (6.2.5) and (6.2.4) is
small.
A corresponding bound is proved for SCFQ in [Gol95] which translates in
the ATM context (i.e., for Virtual Spacing) to:

DiVS(K) ≤ bi/ri + (K - 1)/ri + Σ_{k=1}^{K} (mk - 1)/ck,   (6.2.6)
where mk is the number of streams multiplexed at stage k. Note that if all
streams have the same rate parameter, the last term in (6.2.6) is approximately equal to the second. The bound then corresponds to two additional
cell transmission times at the stream rate per multiplexing stage instead of
one for GPS and PGPS.
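For illustration, the three bounds can be compared numerically; the helper functions below simply transcribe (6.2.4), (6.2.5) and (6.2.6), with bi in cells and all rates in cells per second (function names are ours).

```python
def gps_delay_bound(b_i, r_i, K):
    """End to end bound (6.2.4): bi/ri + (K - 1)/ri."""
    return b_i / r_i + (K - 1) / r_i


def pgps_delay_bound(b_i, r_i, link_rates):
    """Bound (6.2.5): adds one cell time 1/ck at each of the K stages."""
    K = len(link_rates)
    return b_i / r_i + (K - 1) / r_i + sum(1.0 / c for c in link_rates)


def vs_delay_bound(b_i, r_i, link_rates, stream_counts):
    """Bound (6.2.6): adds (mk - 1)/ck at each stage, mk being the
    number of streams multiplexed at stage k."""
    K = len(link_rates)
    return (b_i / r_i + (K - 1) / r_i
            + sum((m - 1.0) / c for m, c in zip(stream_counts, link_rates)))
```

With bi = 10 cells, ri = 100 cell/s and K = 3 stages of rate 1000 cell/s carrying 10 streams each, the bounds evaluate to 0.12 s, 0.123 s and 0.147 s respectively, illustrating how small the differences are when the ck greatly exceed ri.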
The usefulness of the above end to end delay bounds depends on the
QoS requirements of the traffic stream in question. We distinguish two broad
classes of streams depending on whether their connections have strict real
time delay constraints or not. For the former, it is essential that the end to
end delay be very small and that its variability (jitter) be known, notably for
dimensioning the receiver playback buffer.
6.2.3 Jitter and playback buffer dimensioning
Assume the playback buffer operates as a spacer emitting cells with a minimum interval of 1/ri. No cells will be lost if the buffer is greater than ri Dmax
where Dmax is the end to end WFQ delay bound. If no cells are lost, the
overall delay actually experienced by the cells of a connection in the network
and the buffer will be a non-decreasing function of the rank of the cell and
will attain a maximum value less than or equal to Dmax.
Consider the case of a telephone connection handled as an individual
stream with rate parameter ri equivalent (in cells per second) to 64 Kbit/s
and a low CDV tolerance parameter (corresponding to bi = 2, say) accounting
for initial jitter. Even for GPS service, to account for the maximum possible
delay in the playback buffer, it would be necessary to allow 6 ms (i.e., one cell
time at 64 Kbit/s) for each multiplexing stage in addition to the initial CDV
tolerance. In a large network, this can represent an unacceptably large delay
for interactive communications. Typically, the playback buffer would not be
dimensioned for the worst possible delay but for a maximum delay equal to
a suitably small quantile of the delay distribution. In this respect, PGPS
and Virtual Spacing perform better than GPS even though their bounds are
greater.
Suppose, for illustration purposes, that all streams are individual 64
Kbit/s CBR connections and all the link bandwidth is allocated (Σ ri = c).
In GPS, each cell takes exactly 6 ms to complete transmission. PGPS and
Virtual Spacing, on the other hand, behave more like a FIFO queue: cell
transmission time is equal to one service time of 1/c plus a random delay
equal to the waiting time in an N·D/D/1 queue (see Part III, Section 15.2).
The bounds (6.2.5) and (6.2.6) are determined from a worst case scenario
and correspond to the delay of the last cell to be served when all connections
emit cells at precisely the same instant. The actual delay of an arbitrary cell
is typically very much smaller than this and a playback buffer dimensioned
as discussed in Section 3.1.2 would be sufficient. This statement is generally
true when multiplexing only streams with low burstiness (i.e., a small value
of bi). Consider now the impact of bursty traffic on the jitter of real time
connections.
Since their delay is small, the cells of CBR streams generally arrive to
a non-backlogged queue. They consequently have their time stamp derived
using the value of virtual time (Virtual time or Spacing time). Cells of bursty
connections, on the other hand, have their time stamps spaced by the reciprocal of their rate parameter. Assume for illustration purposes that all streams
have the same rate parameter r but that some are bursty while others are
CBR. We consider the operation of Virtual Spacing. It is clear that Spacing
time is a non-decreasing function of real time since the time stamps of waiting
cells, including those added to the queue since the last service instant, are
greater than or equal to the current value of Spacing time. At the arrival of
a cell of a CBR connection, its timestamp is set to Spacing time + 1/r. Now,
the time stamp of the first cell of any backlogged stream cannot be greater
than this and is generally smaller (being 1/r greater than the time stamp of
the last cell to have been served which cannot be greater than the current
value of Spacing time). Consequently, in this example, every CBR cell will
have to wait for service behind one cell from every backlogged stream. This
is a systematic increase in the delay of CBR streams which hardly profits the
bursty traffic.
A solution giving priority to non-backlogged streams consists in modifying
the Virtual Spacing algorithm as follows:
• on a stream i packet arrival:
(i) VSi ← max{Spacing time, VSi + 1/ri}
(ii) time stamp the packet with the value of VSi.
• serve packets in increasing order of time stamp.
A similar modification seems appropriate in the case of PGPS. The worst
case delay scenarios used to derive the delay bounds are the same so that
these remain unchanged for the revised algorithm.
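The two time stamp rules can be contrasted in a couple of lines; this is a sketch with hypothetical function names.

```python
def vs_stamp_original(vs_i, spacing_time, r_i):
    # original rule: VSi <- max{Spacing time, VSi} + 1/ri
    return max(spacing_time, vs_i) + 1.0 / r_i


def vs_stamp_modified(vs_i, spacing_time, r_i):
    # modified rule: VSi <- max{Spacing time, VSi + 1/ri}
    return max(spacing_time, vs_i + 1.0 / r_i)
```

A cell arriving on a long-idle (non-backlogged) stream is stamped Spacing time + 1/ri by the original rule, behind the head cells of backlogged streams, but exactly Spacing time by the modified rule, giving it priority. For a backlogged stream (VSi at least Spacing time) the two rules coincide.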
We have argued that the delay bounds are too loose to be useful for low
rate streams with real time constraints (for dimensioning playback buffers,
for example) 1. For bursty connections the delay bounds are even looser in
so far as the parameter bi for such connections is typically much larger than
the mean burst size and realized delays are typically very much smaller than
these worst case bounds [RBC93]. The more significant feature of WFQ
algorithms for such streams is their fairness.
6.2.4 Fairness
The term "weighted fair queueing" implies sharing available bandwidth between active streams in proportion to their rate parameters ri. This can
only be achieved exactly in the theoretical GPS scheduling algorithm. In
any packet by packet algorithm, bandwidth can only be shared to within the
granularity defined by the packet transmission time. We interpret fairness
to mean that, in any given interval (s, t] throughout which stream i remains
backlogged, there is a constant T(s, t) such that the number ni(s, t) of stream
i cells served in (s, t] satisfies:

T(s, t) ri - 1 ≤ ni(s, t) ≤ T(s, t) ri + 1.   (6.2.7)
Notice that even with GPS where the work performed is exactly proportional
to the rate parameters, the number of cells transmitted in an interval can
only be defined by an inequality equivalent to (6.2.7). The inequality shows
that the number of cells taken from the queue tends to a fair share as the
1 Note that the problem is less severe for higher rate video streams than for telephone
connections and could be made negligible for the latter if the 'stream' grouped together a
sufficient number of connections with an appropriately defined rate parameter.
considered interval increases (ni(s, t) ≫ 1). The inequality (6.2.7) can be
demonstrated as follows for the Virtual Spacing algorithm.
Let the time stamps of cells served at or immediately before s and t be vs
and vt, respectively. Assume stream i is backlogged in (s, t] and let θi be the
time stamp of the first cell of this stream to be served in the interval. Since
stream i time stamps are spaced by 1/ri, θi satisfies:

vs ≤ θi ≤ vs + 1/ri.   (6.2.8)

Note that we have θi = vs + 1/ri when stream i becomes backlogged before
s but after the immediately preceding service instant.
Since the time stamps of all stream i cells served in (s, t] are necessarily
less than or equal to vt (Spacing time is non-decreasing), we deduce:
ni(s, t) ≤ (vt - θi) ri + 1,   (6.2.9)
ni(s, t) > (vt - θi) ri   (6.2.10)

(the upper bound because the served stamps θi, θi + 1/ri, ... cannot exceed vt;
the lower bound because the stream remains backlogged, so every stream i
cell with time stamp not greater than vt is served by t).
Substituting the appropriate bounds on θi from (6.2.8) we derive (6.2.7) with
T(s, t) = vt - vs. Note that this result applies to both the original Virtual
Spacing algorithm of Section 6.1.4 and the modified algorithm suggested in
Section 6.2.3. Further fairness properties are demonstrated in the variable
packet length context of SCFQ in [Gol94]. The fairness of PGPS is not directly addressed in [PG93a] but appears to follow immediately from its proven
closeness to the GPS ideal.
6.3 Realising WFQ
WFQ is certainly much more complicated than the simple FIFO queues which
are used exclusively in most ATM switch architectures. It may however
be argued that its substantial traffic control advantages largely justify the
increased complexity. A key requirement is to be able to rapidly sort cells in
increasing time stamp order.
6.3.1 Sort queueing
We consider an output queueing switch module where service scheduling is
performed by a dedicated output controller on each outgoing multiplex. The
switch fabric is assumed capable of routing cells to the required output without significantly changing the input traffic characteristics due to cell loss or
delay.
The weighted fair queueing algorithms discussed in Section 6.1 rely on
service in increasing order of time stamp. The multiplexer output controller
must therefore be able to schedule waiting cells so that the cell selected
for transmission is always the one with the smallest time stamp. One way
of doing this is to sort the cells in a serial memory in order of increasing
time stamp; a newly arriving cell is inserted at the appropriate queue place
by comparing its time stamp with those of cells already waiting. Devices for
rapidly performing this sorting function have been designed by Chao [Cha91].
An alternative design considered in the COST 242 project is based on the
following real time sort algorithm [RBS95].
To perform scheduling according to the value of a time stamp, it is not
necessary to completely sort messages in increasing time stamp order but
only to identify the message with the smallest time stamp at any time when
a service can take place. The following algorithm realizes this objective by
performing a set of comparison and shift operations in parallel on messages
arranged in two tables A(j) and B(j), 0 ≤ j ≤ m. Each word of tables A and
B is either set to a default maximum value (all bits set to 1) or represents
a message and its associated time stamp. The time stamp occupies the k
leftmost bits while the remainder identifies the message content (generally
a pointer to an address). For the sake of simplicity we neglect the problem
of time wrap around (i.e., the fact that the time periodically comes back to
zero every 2^k units) and assume the time stamp unambiguously determines
the service order: the message represented by A(i) will be served before the
message represented by B(j), say, if A(i) < B(j). All words of tables A and
B are initially set to the default value. The sorting algorithm has two phases,
one when a new message is inserted, one when the smallest valued message
is extracted.
A new message is written to word A(0). Words A(j) and B(j), for 0 ≤ j ≤
m, are compared and their contents interchanged if A(j) < B(j). After this
operation we therefore have B(j) ≤ A(j) for 0 ≤ j ≤ m. The words in table
A are then shifted one step downwards: A(j) ← A(j - 1) for 1 ≤ j ≤ m. A(0)
is re-initialized to the default maximum value. Note that the value initially
stored in word A(m) is lost.
Messages are extracted on reading from word B(0). By working out
particular examples it is easy to convince oneself that this address effectively
corresponds to the message with the smallest time stamp. A formal proof
is given in [RBS95]. As above, words A(j) and B(j), for 1 ≤ j ≤ m, are
compared and their content interchanged if A(j) < B(j). The words in table
B are then shifted one step upwards: B(j) ← B(j + 1) for 0 ≤ j < m. The
value of B(m) is re-initialized to the default value.
An integrated circuit design realizing the above algorithm is proposed in
[RBS95]. This circuit turns out to be very similar in conception to so-called
systolic sorters used in data processing applications [CM88].
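A software sketch of the two-table algorithm follows; the compare-and-swap over all j, performed in parallel in hardware, is written here as a loop, and class and method names are ours rather than those of [RBS95].

```python
INF = float("inf")  # default maximum value (all bits set to 1)


class SortQueue:
    """Sketch of the two-table sort algorithm described above.

    Tables A(j) and B(j), 0 <= j <= m, hold (time stamp, message) pairs."""

    def __init__(self, m):
        self.m = m
        self.A = [(INF, None)] * (m + 1)
        self.B = [(INF, None)] * (m + 1)

    def _compare_swap(self, lo):
        # interchange A(j) and B(j) whenever A(j) < B(j)
        for j in range(lo, self.m + 1):
            if self.A[j][0] < self.B[j][0]:
                self.A[j], self.B[j] = self.B[j], self.A[j]

    def insert(self, stamp, msg):
        self.A[0] = (stamp, msg)
        self._compare_swap(0)
        # shift table A one step downwards; A(m) is lost on overflow
        for j in range(self.m, 0, -1):
            self.A[j] = self.A[j - 1]
        self.A[0] = (INF, None)

    def extract(self):
        # B(0) holds the message with the smallest time stamp
        _stamp, msg = self.B[0]
        self._compare_swap(1)
        # shift table B one step upwards
        for j in range(self.m):
            self.B[j] = self.B[j + 1]
        self.B[self.m] = (INF, None)
        return msg
```

Inserting messages with stamps 5, 3 and 4 and then extracting three times returns them in increasing stamp order, as required for time stamp scheduling.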
6.3.2 Virtual Spacing algorithm
The Sort Queue associated with each output may have space to contain the
entire cell but it would probably be more economical to store the cell content
in a general purpose memory with the Sort Queue entry containing the time
stamp and a pointer to the corresponding cell address. This pointer might
be the address itself or just the identity of the connection to which the cell
belongs: the cell address would then be derived from information contained
in a connection context.
If we use a connection context to identify the waiting cells, the Sort Queue
need only be long enough to contain one entry for each connection set up on
the link. This entry bears the time stamp of the cell which is currently head
of the line. When this cell is transmitted, the output controller replaces
the time stamp of the Sort Queue entry by that of the cell next in line.
This is possible because, with Virtual Spacing, the time stamp of a cell on
a backlogged connection can be determined as late as the departure instant
of the preceding cell. A cell arriving to a non-backlogged connection, having
its time stamp determined by the current value of Spacing Time, is entered
directly to the Sort Queue.
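This organisation can be sketched as follows, with a heap standing in for the Sort Queue and one entry per backlogged connection; this is an illustration under our own naming, not the design of [RBS95].

```python
import heapq
import itertools
from collections import deque


class PerConnectionVS:
    """Sketch of Virtual Spacing with one sort-queue entry per connection.

    Cells of a backlogged connection wait in a per-connection FIFO; only
    the head-of-line cell appears in the sort structure, and its successor
    is stamped at the departure instant of the preceding cell."""

    def __init__(self, rates):
        self.rates = dict(rates)
        self.vs = {i: 0.0 for i in rates}       # stamp of last stamped cell
        self.fifo = {i: deque() for i in rates}
        self.spacing_time = 0.0
        self.heap = []                          # one entry per backlogged conn
        self.seq = itertools.count()

    def arrive(self, i, cell):
        self.fifo[i].append(cell)
        if len(self.fifo[i]) == 1:
            # non-backlogged connection: stamp now, enter the sort queue
            self.vs[i] = max(self.spacing_time, self.vs[i]) + 1.0 / self.rates[i]
            heapq.heappush(self.heap, (self.vs[i], next(self.seq), i))

    def serve(self):
        if not self.heap:
            return None
        stamp, _, i = heapq.heappop(self.heap)
        self.spacing_time = stamp
        cell = self.fifo[i].popleft()
        if self.fifo[i]:
            # re-stamp the next cell in line at the departure instant
            self.vs[i] = stamp + 1.0 / self.rates[i]
            heapq.heappush(self.heap, (self.vs[i], next(self.seq), i))
        return cell
```

The sort structure thus never holds more entries than there are connections set up on the link, whatever the number of waiting cells.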
6.4 Fixing rate parameters
The implementation of WFQ would return to ATM some of its original
promise of flexibility necessary for the development of a future-safe network.
In this section we aim to illustrate this potential by suggesting how different
services might be supported. The main distinction is between services with
and without real time delay constraints.
6.4.1 Services with real time constraints
Services like voice or video telephony with strict delay constraints and low
bandwidth requirements are ideally suited to multiplexing with cell scale congestion only, i.e., using Rate Envelope Multiplexing, as discussed in Section
4.1. This operating mode can be simulated using WFQ by attributing to
groups of connections stream parameters with a small value of b and a value of r
chosen to ensure the required cell loss ratio (CLR).
Consider a service like the telephone where individual calls appear as
on/off VBR connections whose bit rate characteristics are known (i.e., we
know the peak rate and the mean rate is known in a statistical sense for the
population of telephone calls). It is then possible to accurately predict the
stationary probability distribution of the combined instantaneous bit rate
of a group of n connections (the number of active connections is binomial).
Denote the bit rate at time t by At(n). Approximating CLR by the freeze-out
fraction, we could fix r such that
E[(At(n) - r)+] / E[At(n)] ≤ ε   (6.4.1)

where ε is the target CLR. A stream memory parameter b of a few tens of cell
places would be necessary to avoid cell loss when the combined arrival rate
is not greater than r. Note that this is not a dedicated memory but rather
a device for deciding when to reject cells: the fact that the stream queue
exceeds b is evidence that the input rate is currently exceeding the allocated
rate r. Any cell loss rate can be fixed, without affecting the service offered
to other connections, simply by the choice of the function r(n) denoting the
rate threshold above which arriving cells will be lost.
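As an illustration of such a rate choice, assume n homogeneous on/off sources of peak rate h and activity probability p, so that the number of active sources is binomial; the freeze-out fraction and the smallest adequate rate threshold can then be computed as sketched below (function names are ours, and the search is restricted to multiples of h).

```python
from math import comb


def freezeout_fraction(n, p, h, r):
    """Freeze-out approximation of the CLR for n independent on/off
    sources of peak rate h (cells/s) and activity probability p, served
    at rate r: E[(A - r)+] / E[A] with A = h * Binomial(n, p)."""
    mean = n * p * h
    excess = sum(comb(n, k) * p**k * (1 - p)**(n - k) * max(0.0, k * h - r)
                 for k in range(n + 1))
    return excess / mean


def min_rate(n, p, h, eps):
    """Smallest multiple of h meeting the target CLR eps."""
    for k in range(n + 1):
        if freezeout_fraction(n, p, h, k * h) <= eps:
            return k * h
    return n * h
```

For example, with n = 10 sources of unit peak rate, p = 0.5 and a target CLR of 10^-2, a rate allocation of 8 (rather than the peak allocation of 10 or the mean of 5) suffices under this approximation.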
This allocation effectively ensures that the stream CLR is not greater than
the target ε. However, the rate allocation may be overly generous, leading to
inefficient link utilisation since no account is taken of statistical resource
sharing with other streams; as the rate available to the stream is generally
greater than r, the CLR may be much smaller than ε. For example, if another
service with the same QoS requirements were handled by the multiplexer it
would be possible to group the connections of both services in a single stream.
Clearly, the rate required for the combined stream calculated as above would
generally be considerably smaller than the sum of the individual rates.
Consider the effective bandwidth framework for REM discussed in Section
5.2.2 and assume stream i connections are all representative of a certain type.
Let ei(ε, c) be the effective bandwidth of a connection of type i for a link of
rate c and target CLR ε and let ni be the number of such connections. If all
streams multiplexed in the WFQ system had real time delay constraints and
the same CLR requirement ε, the allocation of rate parameters ri = ni ei(ε, c)
would be sufficient. It is tempting to suppose that, if streams have different
CLR requirements εi, a rate allocation calculated according to

ri = ni ei(εi, c)   (6.4.2)

would be sufficient.
In [LT95a], Lindberger and Tidblom demonstrate that this is indeed a
conservative rate allocation policy in that realized CLR for each class is less
than the respective target value. It is also argued that the allocation (6.4.2)
is satisfactory even when some multiplexed streams are bursty with a large
buffer parameter b although we note that for periods when the latter are
backlogged, the expected CLR of the real time traffic will be greater than
the target ε.
Multiplexing gains obtained through resource sharing between streams as
discussed above can be said to rely on knowledge of traffic characteristics of
all streams (e.g., due to leaky bucket access control). However, it is shown in
[LT95a] that if one stream (or class of streams) were to set its rate parameter
ri at a value lower than its actual mean rate then, although that stream would
experience a loss ratio greater than the target, the other streams would still
be protected by the fair queueing mechanism. An incorrect mean rate only
affects the stream concerned and any other streams grouped in the same
class.
Absolute guarantees on individual stream performance may be difficult
to achieve since it would be necessary to make worst case assumptions about
competing traffic. Rate parameters might therefore need to be chosen more
conservatively for streams requiring strict guarantees. The appropriate rate
parameters for VBR streams having different CLR requirements would then
need to be determined using an effective bandwidth derived assuming a link
capacity somewhat less than the capacity actually reserved for all these types
of VBR streams.
6.4.2 Leaky bucket defined connections
Individual connections defined by leaky bucket parameters r and b are naturally realized in WFQ as streams with corresponding parameters. WFQ
allows the network to provide throughput r in a very clear sense and to
guarantee negligible cell loss. On the other hand, although delay is bounded
as shown in Section 6.2.3, these bounds are too loose to constitute useful
performance guarantees. It is argued in [RBC93] that realized delays should
typically correspond to peak rate transmission most of the time. However, the
need to account for burst scale congestion and the inherent unpredictability
of data traffic make it impossible to strictly guarantee more than the unduly
pessimistic worst case bounds.
In fact, the quality of service on data connections is manifested by end to
end delays which depend mainly on the user's choice of leaky bucket parameters since these determine access delays. There is an obvious trade-off
between the respective values of r and b: if r is close to the mean emission
rate then b must be very large; conversely, if b is limited (e.g., by the network
operator) then the user may need to choose r several times greater than the
mean rate (see the examples cited in Section 5.3.2 and [RBC93]). It is noted
that, for low access delays, traffic parameter r must be a small multiple of
the mean rate while b should be an order of magnitude greater than the mean
burst size.
While it is straightforward with WFQ to guarantee the minimal throughput r, it remains to determine the engineering rules for dimensioning multiplex buffers to ensure negligible cell loss given the b values of the multiplexed
streams. The required buffer space should be considerably less than the sum of the
parameters b. One possibility would be for the network to fix the value of b
based on available memory; this value could even evolve dynamically according to current traffic conditions.
6.4.3
Best effort data connections
In a private ATM network, in particular, it may be considered unnecessarily
restrictive to impose access control by a leaky bucket: why limit the mean input rate to r at times when more capacity is available on network links? The
danger of uncontrolled connections saturating network links and affecting the
quality of service of other users can be alleviated when WFQ is used by allocating a minimal service rate r and maximum buffer occupancy b, as for
leaky bucket controlled connections. A single "connection" of given parameters r and b may be reserved for all best effort traffic, including datagrams.
The impact of such connections on other users is thus limited while any spare
capacity is immediately available to them. Users would ideally adjust their
input rate to prevailing traffic conditions using end to end protocols like TCP.
This type of service is standard in current LANs and in data networks like the
Internet. In the B-ISDN, the Available Bit Rate (ABR) transfer capability
is a means of providing best effort service in which unreserved bandwidth is
shared dynamically (ref to ABR section).
ABR connections are generally assumed to share link capacity not currently attributed to connections with QoS guarantees. In the present case
of WFQ scheduling, the notion of attributed rate is clear for streams with a
small buffer parameter where r is determined for peak rate allocation or REM
as discussed above in Section 6.4.1. For leaky bucket controlled streams, on
the other hand, the rate reservation is only a long term commitment and
achieving satisfactory QoS relies on bursts usually arriving to find the link
with sufficient capacity available to carry the burst at peak rate. If ABR
connections are attributed all unused capacity, such bursts will never have a
throughput greater than their reserved rate r.
6.4.4
Virtual Private Networks
The creation of Virtual Private Networks using interconnected VPCs is an
interesting possibility afforded by the B-ISDN [WA92]. Users create VPCs
of specified capacity between network nodes according to their particular interconnection requirements. Now, at a given node, the sum of the capacities
of incoming VPCs can exceed the capacity of any outgoing VPC of the same
VPN. With simple FIFO queues, in order to ensure that the traffic offered
to such an outgoing VPC does not exceed its designated capacity and consequently interfere with other connections on the same link, it is necessary to
implement "output policing" [WA92]. This is a considerable complication to
the design of nodes which generally only control the conformity of traffic at
their inputs. The generalized use of WFQ constitutes a (partial) solution to
this problem:
in a node terminating the VPCs of a VPN, a WFQ algorithm
would simply be applied with the parameters r and b of the outgoing VPCs; it still remains possible for the traffic carried by
these VPCs to exceed their allocation but only when the other
connections on the link do not fully utilize their own bandwidth
allocation.
An output policing mechanism for controlling the VPC peak rate, as
considered in [WA92], would probably need to actually space (and not just
virtually space) the cells at the designated rate. However, it is not obvious
that this strict rate control is preferable to the WFQ solution in the definition
of the VPN service. It does not appear any easier to implement.
6.4.5
Link sharing
The considerations in this section are inspired by the discussion on resource
sharing in an unpublished Internet Draft on "A service model for an integrated services Internet" by Shenker, Clark and Zhang. A user to user VPC
is typically used for a variety of applications materialized by a number of
connections identified by a VC identifier. A user may wish to share the bandwidth reserved for the VPC between different groups of applications with
minimum rate guarantees calculated to meet specific performance requirements, notably for real time communications. The minimum rates could be
guaranteed by applying WFQ with rate parameters defined appropriately for
each application group considered as a stream within the VPC. The disadvantage with this approach is that the rate of a momentarily idle stream would
be shared between all streams on the multiplexer in proportion to their rate
parameters. In order for the rate to be reserved for the VPC, except when all
the VPC streams have no backlog, it is necessary to implement a hierarchical
form of WFQ.
Let the rate allocated to the VPC stream be ri and assume this rate is
to be distributed among m substreams with individual rate parameters rij
with Σj rij = ri. A modified Virtual Spacing algorithm realizing the above sharing
objectives is as follows. Let VSi and VSij be variables associated with stream
i and substream ij, respectively. The algorithm is then:
• on a stream i cell arrival:
(i) VSi ← max{Spacing time, VSi} + 1/ri
(ii) VSij ← max{Spacing time, VSij} + 1/rij
(iii) time stamp the cell with the value VSi||VSij (the concatenation
of VSi and VSij);
• serve cells in increasing order of time stamp.
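The two-level algorithm can be sketched in code. The sketch below makes two assumptions that the text leaves open: the "Spacing time" at each level is taken, in the self-clocked manner of SCFQ, to be the corresponding component of the time stamp of the cell most recently selected for service, and the concatenated stamp VSi||VSij is modelled as the tuple (VSi, VSij) compared lexicographically. All class and variable names are ours.

```python
import heapq
import itertools

class HierarchicalVirtualSpacing:
    """Sketch of the two-level (hierarchical) Virtual Spacing algorithm."""

    def __init__(self, rates):
        # rates[i][j] = rate parameter rij of substream j of stream i;
        # the stream rate ri is taken as sum_j rij (per the sharing objective).
        self.rij = rates
        self.ri = {i: sum(subs.values()) for i, subs in rates.items()}
        self.vsi = {i: 0.0 for i in rates}                          # VSi
        self.vsij = {(i, j): 0.0 for i in rates for j in rates[i]}  # VSij
        self.spacing = 0.0        # outer "Spacing time" (SCFQ-style, assumed)
        self.spacing_inner = 0.0  # inner "Spacing time" (assumed)
        self.heap = []
        self.seq = itertools.count()  # FIFO tie-break for equal stamps

    def arrive(self, i, j, cell):
        # (i)   VSi  <- max{Spacing time, VSi}  + 1/ri
        self.vsi[i] = max(self.spacing, self.vsi[i]) + 1.0 / self.ri[i]
        # (ii)  VSij <- max{Spacing time, VSij} + 1/rij
        self.vsij[i, j] = (max(self.spacing_inner, self.vsij[i, j])
                           + 1.0 / self.rij[i][j])
        # (iii) stamp the cell with VSi||VSij, i.e. a lexicographic tuple
        heapq.heappush(self.heap,
                       (self.vsi[i], self.vsij[i, j], next(self.seq), cell))

    def serve(self):
        # serve cells in increasing order of (concatenated) time stamp
        vsi, vsij, _, cell = heapq.heappop(self.heap)
        self.spacing, self.spacing_inner = vsi, vsij
        return cell
```

For example, with two streams of equal rate, one of them split into two substreams, the outer component of the stamp governs sharing between streams, so the single-substream stream is not penalized by the internal sharing of the other; the inner component only orders cells within a stream.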
It remains to demonstrate that the added complexity is justified by the
interest such a link sharing option would have for the user. He would perhaps
be better served by setting up an individual VPC for each application.