Weighted fair queueing

1996, Lecture Notes in Computer Science

6 Weighted fair queueing

Fair queueing originated in the data communications field, initially as a congestion control device preventing ill-behaved users from unduly affecting the service offered to others [Nag87]. In a fluid limit, fair queueing realizes head of line processor sharing, the objective being to share server capacity equally between all customers having packets to transmit. When the fluid service rate is modulated according to weights attributed to the contending traffic streams, we speak of Generalized Processor Sharing (GPS). Weighted Fair Queueing (WFQ) is a practical implementation of GPS first proposed by Demers, Keshav and Shenker [DKS89] and further discussed and analysed under the name of Packet by packet Generalized Processor Sharing (PGPS) by Parekh and Gallager [PG93a]. In the following we will use PGPS to denote this algorithm, reserving WFQ as a generic term. PGPS may be considered as the closest possible approximation to GPS. Its implementation proves somewhat complex, however (see below). A simpler algorithm which is less precise but much easier to realize was proposed independently by Golestani, under the name Self Clocked Fair Queueing (SCFQ) [Gol94], and by Roberts, under the name Virtual Spacing [Rob94]. In fact, this simplified algorithm was first "invented" by Davin in early work on fair queueing performed at MIT [DH90]. A related scheduling discipline is "Virtual Clock" as proposed by Zhang [Zha90], which also provides protection from ill-behaved users and can support diverse throughput guarantees.

6.1 WFQ algorithms

Consider a server of constant rate c handling cells from a number m of traffic streams, where each stream has its own queue and cells from a given stream are served in FIFO order. This system is depicted in Figure 6.1.1. Each stream might represent a particular ATM connection or a set of connections grouped together for traffic handling purposes.
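The fluid sharing principle behind GPS can be illustrated with a short sketch (present-day Python, not part of the original text; the stream identifiers and numerical values are hypothetical): at every instant, the server rate c is divided among the backlogged streams in proportion to their weights.

```python
# Fluid GPS sketch: at each instant the server rate c is shared among
# backlogged streams in proportion to their rate parameters r_i.

def gps_step(backlog, rates, c, dt):
    """Advance the fluid system by dt and return the work served per
    stream; backlog and rates are dicts keyed by (hypothetical) stream ids."""
    active = [i for i in backlog if backlog[i] > 0]
    total = sum(rates[i] for i in active)
    served = {}
    for i in active:
        share = c * rates[i] / total      # proportional fluid share
        served[i] = min(backlog[i], share * dt)
        backlog[i] -= served[i]
    return served

backlog = {1: 10.0, 2: 10.0}              # work in each stream queue
rates = {1: 1.0, 2: 3.0}                  # stream 2 has three times the weight
served = gps_step(backlog, rates, c=4.0, dt=1.0)
print(served)                             # {1: 1.0, 2: 3.0}
```

With weights 1 and 3 and c = 4, the two backlogged streams receive fluid service at rates 1 and 3, as the weighted sharing of GPS prescribes.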
6.1.1 Generalized Processor Sharing

Generalized Processor Sharing (GPS) is a fluid approximation to the queueing system of Figure 6.1.1 where the service rate of any backlogged stream i is proportional to a certain rate parameter r_i [PG93a]. We consider the special case of GPS where, for each stream i, r_i is set to an intrinsic stream rate and admission control is performed to ensure that Σ r_i ≤ c.

Figure 6.1.1: WFQ system (m stream queues served at rate c).

The parameter r_i can then be interpreted as a minimum bandwidth guarantee. It might be set to an equivalent bandwidth determined to serve a set of VBR connections constituting stream i or to the leak rate of a leaky bucket controlling the burstiness of stream i. Choice of rate parameters is further discussed in Section 6.4.

6.1.2 PGPS

Generalized Processor Sharing is an ideal which cannot be realized in practice because packets are not infinitely divisible. Packet-by-packet GPS [PG93a] is a generalization of the Fair Queueing algorithm proposed by Demers et al. [DKS89]. In this algorithm, variable length packets are served in an order determined so that, as far as possible, they complete service in the order which would prevail if they were in fact served using GPS. In practice, this order can only be determined for the packets present at the end of each service: the server chooses to serve the packet which would finish first in GPS if there were no further arrivals. In the case of ATM cells, it can be shown that for a given arrival process, the difference between the times transmission of a given cell would finish in PGPS and GPS is bounded by one cell transmission time (1/c) [PG93a]. To calculate the hypothetical finishing times necessary for PGPS scheduling, Parekh and Gallager introduce the notion of Virtual time [PG93a].
This is a function of real time t which evolves at a rate inversely proportional to the sum of the rate factors r_i of backlogged connections in the corresponding GPS system. Let PGPS_i be a variable associated with stream i. Applied to ATM cells, the PGPS service algorithm can be stated as follows:

- on a stream i cell arrival: (i) PGPS_i ← max{Virtual time, PGPS_i} + 1/r_i; (ii) time stamp the cell with the value of PGPS_i;
- serve cells in increasing order of time stamp.

It is demonstrated in [PG93a] that PGPS provides a minimum rate guarantee to the extent that the work accomplished on a given stream queue in any interval differs from that of the equivalent GPS system by at most one cell. The closeness of PGPS to GPS comes at the expense of an algorithm whose implementation would be rather complicated. The complication resides in keeping track of Virtual time, which requires an evaluation of the number of backlogged streams in the equivalent GPS system.

6.1.3 Virtual clock

A queue scheduling scheme related to PGPS is the Virtual Clock algorithm proposed by Zhang [Zha90]. In this algorithm, the time stamp, denoted VC_i, is calculated with respect to real time t:

- on a stream i cell arrival: (i) VC_i ← max{t, VC_i} + 1/r_i; (ii) time stamp the cell with the value of VC_i;
- serve cells in increasing order of time stamp.

This service discipline is simpler than PGPS: it is work conserving, it provides performance bounds similar to those of PGPS (see [HK96]) and it guarantees average throughput for each connection.
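As an illustration (a present-day sketch, not from the original text; stream names and rate values are hypothetical), the Virtual Clock rule for fixed-length cells amounts to a few lines:

```python
import heapq

class VirtualClock:
    """Sketch of the Virtual Clock discipline for fixed-length cells:
    stamps are computed against real time t and cells are served in
    increasing stamp order (a heap stands in for the sort queue)."""
    def __init__(self):
        self.vc = {}        # per-stream Virtual Clock variable VC_i
        self.queue = []     # (stamp, seq, stream) heap; seq breaks ties
        self.seq = 0
    def arrive(self, stream, r, t):
        # (i) VC_i <- max{t, VC_i} + 1/r_i ; (ii) stamp the cell with VC_i
        self.vc[stream] = max(t, self.vc.get(stream, 0.0)) + 1.0 / r
        heapq.heappush(self.queue, (self.vc[stream], self.seq, stream))
        self.seq += 1
    def serve(self):
        # serve cells in increasing order of time stamp
        return heapq.heappop(self.queue)[2] if self.queue else None

vc = VirtualClock()
vc.arrive('a', r=1.0, t=0.0)   # stamp 1.0
vc.arrive('a', r=1.0, t=0.0)   # stamp 2.0
vc.arrive('b', r=2.0, t=0.0)   # stamp 0.5
order = [vc.serve() for _ in range(3)]
print(order)                   # ['b', 'a', 'a']
```

The higher-weight stream 'b' is served first because its stamps advance in smaller increments 1/r_i.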
It has, however, one disadvantage with respect to PGPS, illustrated in the following example. A stream emits a message of 1 Mbit at the multiplexer rate of c = 100 Mbit/s; although the stream has a rate parameter r = 1 Mbit/s, its transmission proceeds at peak rate since it is the only stream with a backlog. However, just before it can emit the last cell of the burst, another stream becomes active and also begins to emit a message at 100 Mbit/s. The last cell of the first burst will have a time stamp roughly equal to the current time t + 1 second (1 Mbit at 1 Mbit/s). Assuming the second connection also has a rate parameter equal to 1 Mbit/s, this last cell will be further delayed until nearly 1 Mbit of the new message has been transmitted. The message transfer time of the second stream would have been the same if the last cell of the first stream had been emitted without delay. The PGPS scheme would have emitted the last cell just after the first cell of the second burst, thus achieving reduced message transfer time.

In general, the Virtual Clock algorithm offers rate guarantees with much less precision than PGPS. As in the above example, realized throughput can be much smaller than r_i for significant periods of time. The Virtual Spacing algorithm described below is as simple to implement as Virtual Clock and achieves nearly the same throughput guarantees as PGPS.

6.1.4 Virtual Spacing/SCFQ

In the following we use the name Virtual Spacing for the special case of SCFQ in the ATM context of constant length packets. The Virtual Spacing algorithm is the same as PGPS except that we replace Virtual time by a simpler variable denoted Spacing time. Spacing time is just equal to the value of the time stamp of the last cell to have been taken from the head of the queue, i.e., in a busy period, the cell currently being transmitted. Let VS_i be a variable associated with stream i.
The algorithm is then:

- on a stream i cell arrival: (i) VS_i ← max{Spacing time, VS_i} + 1/r_i; (ii) time stamp the cell with the value of VS_i;
- serve cells in increasing order of time stamp.

Since Spacing time cannot be greater than the time stamp of any cell already waiting when its value was updated, step (i) of the algorithm implies that the time stamps of the cells of a backlogged stream are arithmetically spaced by the interval 1/r_i. Spacing time only intervenes in the calculation of the time stamp of a cell arriving on a non-backlogged stream and allows the stream to be included at an appropriate place in the transmission schedule.

6.2 Performance guarantees

This section discusses performance of WFQ algorithms in terms of delay bounds and fairness characteristics. We also discuss the adequacy of these bounds, notably for dimensioning playback buffers for connections with real time delay constraints.

6.2.1 Leaky bucket controlled streams

Performance bounds can be derived most easily for the idealized GPS service discipline. Suppose stream i is controlled by a leaky bucket of leak rate r_i and token pool size b_i so that the amount of work u_i(s, t) arriving in an interval (s, t) satisfies the inequality:

   u_i(s, t) ≤ b_i + r_i(t − s).

Let V_i(t) be the amount of work belonging to connection i in the multiplexer queue at time t (i.e., the number of cells in the queue plus the remainder of the cell currently being transmitted). Since the service rate is not less than r_i, we have by Reich's theorem:

   V_i(t) ≤ sup_{s≤t} {u_i(s, t) − r_i(t − s)}                  (6.2.1)
         ≤ b_i.                                                 (6.2.2)

Now consider the output from the multiplexer in the interval (t, u). Let the amount of connection i work leaving the system in this interval be U_i(t, u). This work is composed of the remainder of the cell being transmitted at t, plus the cells completely transmitted in (t, u), plus the transmitted part of the cell currently being transmitted at u.
We necessarily have:

   U_i(t, u) ≤ V_i(t) + u_i(t, u)

and thus, using (6.2.1):

   U_i(t, u) ≤ sup_{s≤t} {u_i(s, t) − r_i(t − s)} + u_i(t, u)
             = sup_{s≤t} {u_i(s, u) − r_i(t − s)}
             ≤ sup_{s≤t} {r_i(u − s) + b_i − r_i(t − s)}
             = r_i(u − t) + b_i.                                (6.2.3)

The last inequality shows that the output from the multiplexer conserves the burstiness bound guaranteed at the network input by the leaky bucket. This is an extremely desirable property since it ensures that if all multiplexers on a connection path guarantee the minimum service rate r_i then cell loss can be completely avoided by reserving buffer space equal to b_i. Furthermore, the delay of any cell is bounded by b_i/r_i at each multiplexer, independently of the activity of other connections. In fact, Parekh and Gallager [PG94] have proved the stronger result that, neglecting fixed processing and propagation times, the overall delay through a network of such GPS servers is bounded by b_i/r_i.

It should be noted that (6.2.2) and (6.2.3) derive from the properties of individual connection i alone and do not rely on all connections being controlled by a leaky bucket at the access. If some connection were completely unconstrained at the access, its potential impact on other connections could still be limited by attributing to it a minimal service rate r_i (> 0) and a maximal buffer occupancy. Note finally that to derive the above bounds, we have only used the minimum service rate property of GPS service. The bounds thus apply to any other service discipline offering a minimum rate guarantee, including time division multiplexing.

6.2.2 End to end delay

The overall delay bound of b_i/r_i for GPS applies in the fluid regime assuming "cut through" switching at each node, i.e., a cell can begin transmission at a downstream node before it has completed transmission at one or more upstream stages.
With the more realistic assumption of store and forward cell switching (i.e., each cell must be received entirely at a given stage before it can be re-transmitted) it is necessary to add the transmission time at each stage to the overall delay. The delay D_i^GPS(K) in a network of K stages then satisfies:

   D_i^GPS(K) ≤ b_i/r_i + (K − 1)/r_i.                          (6.2.4)

The need to account in the service discipline for the discrete nature of ATM cells introduces supplementary slackness in the above bounds. As previously noted, PGPS achieves the closest approximation to the GPS ideal. It may be shown that the delay D_i^PGPS(K) in a network of K stages satisfies:

   D_i^PGPS(K) ≤ b_i/r_i + (K − 1)/r_i + Σ_{k=1..K} 1/c_k,      (6.2.5)

where c_k is the link rate at stage k [PG94]. Since the c_k are typically much greater than the stream rate r_i, the difference between (6.2.5) and (6.2.4) is small. A corresponding bound is proved for SCFQ in [Gol95] which translates in the ATM context (i.e., for Virtual Spacing) to:

   D_i^VS(K) ≤ b_i/r_i + K/r_i + Σ_{k=1..K} m_k/c_k,            (6.2.6)

where m_k is the number of streams multiplexed at stage k. Note that if all streams have the same rate parameter, the last term in (6.2.6) is approximately equal to the second. The bound then corresponds to two additional cell transmission times at the stream rate per multiplexing stage instead of one for GPS and PGPS.

The usefulness of the above end to end delay bounds depends on the QoS requirements of the traffic stream in question. We distinguish two broad classes of streams depending on whether their connections have strict real time delay constraints or not. For the former, it is essential that the end to end delay be very small and that its variability (jitter) be known, notably for dimensioning the receiver playback buffer.

6.2.3 Jitter and playback buffer dimensioning

Assume the playback buffer operates as a spacer emitting cells with a minimum interval of 1/r_i. No cells will be lost if the buffer is greater than r_i·Dmax,
where Dmax is the end to end WFQ delay bound. If no cells are lost, the overall delay actually experienced by the cells of a connection in the network and the buffer will be a non-decreasing function of the rank of the cell and will attain a maximum value less than or equal to Dmax.

Consider the case of a telephone connection handled as an individual stream with rate parameter r_i equivalent (in cells per second) to 64 Kbit/s and a low CDV tolerance parameter (corresponding to b_i = 2, say) accounting for initial jitter. Even for GPS service, to account for the maximum possible delay in the playback buffer, it would be necessary to allow 6 ms (i.e., one cell time at 64 Kbit/s) for each multiplexing stage in addition to the initial CDV tolerance. In a large network, this can represent an unacceptably large delay for interactive communications. Typically, the playback buffer would not be dimensioned for the worst possible delay but for a maximum delay equal to a suitably small quantile of the delay distribution. In this respect, PGPS and Virtual Spacing perform better than GPS even though their bounds are greater. Suppose, for illustration purposes, that all streams are individual 64 Kbit/s CBR connections and all the link bandwidth is allocated (Σ r_i = c). In GPS, each cell takes exactly 6 ms to complete transmission. PGPS and Virtual Spacing, on the other hand, behave more like a FIFO queue: cell transmission time is equal to one service time of 1/c plus a random delay equal to the waiting time in an N*D/D/1 queue (see Part III, Section 15.2). The bounds (6.2.5) and (6.2.6) are determined from a worst case scenario and correspond to the delay of the last cell to be served when all connections emit cells at precisely the same instant. The actual delay of an arbitrary cell is typically very much smaller than this and a playback buffer dimensioned as discussed in Section 3.1.2 would be sufficient.
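To give an idea of the magnitudes involved, the bounds (6.2.4), (6.2.5) and (6.2.6) can be evaluated numerically (an illustrative sketch with hypothetical parameter values; all rates are in cells per second):

```python
# Numerical comparison of the end-to-end bounds (6.2.4)-(6.2.6) in seconds
# for hypothetical values: a 64 Kbit/s-equivalent stream (~151 cells/s
# with 53-byte cells), K stages of link rate c_k, m_k streams per stage.

def gps_bound(b, r, K):
    return b / r + (K - 1) / r                               # (6.2.4)

def pgps_bound(b, r, K, c):
    return gps_bound(b, r, K) + sum(1.0 / ck for ck in c)    # (6.2.5)

def vs_bound(b, r, K, c, m):
    return b / r + K / r + sum(mk / ck for mk, ck in zip(m, c))  # (6.2.6)

r = 151.0                  # stream rate in cells/s (~64 Kbit/s)
K = 5                      # number of multiplexing stages
c = [353208.0] * K         # link rates in cells/s (~150 Mbit/s)
m = [2000] * K             # streams multiplexed at each stage
print(gps_bound(2, r, K))
print(pgps_bound(2, r, K, c))
print(vs_bound(2, r, K, c, m))
```

For these values the PGPS bound exceeds the GPS bound by only Σ 1/c_k, about 14 µs, while the Virtual Spacing bound adds roughly one further cell time at the stream rate per stage.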
This statement is generally true when multiplexing only streams with low burstiness (i.e., a small value of b_i). Consider now the impact of bursty traffic on the jitter of real time connections. Since their delay is small, the cells of CBR streams generally arrive to a non-backlogged queue. They consequently have their time stamp derived using the value of virtual time (Virtual time or Spacing time). Cells of bursty connections, on the other hand, have their time stamps spaced by the reciprocal of their rate parameter.

Assume for illustration purposes that all streams have the same rate parameter r but that some are bursty while others are CBR. We consider the operation of Virtual Spacing. It is clear that Spacing time is a non-decreasing function of real time since the time stamps of waiting cells, including those added to the queue since the last service instant, are greater than or equal to the current value of Spacing time. At the arrival of a cell of a CBR connection, its time stamp is set to Spacing time + 1/r. Now, the time stamp of the first cell of any backlogged stream cannot be greater than this and is generally smaller (being 1/r greater than the time stamp of the last cell to have been served, which cannot be greater than the current value of Spacing time). Consequently, in this example, every CBR cell will have to wait for service behind one cell from every backlogged stream. This is a systematic increase in the delay of CBR streams which hardly profits the bursty traffic. A solution giving priority to non-backlogged streams consists in modifying the Virtual Spacing algorithm as follows:

- on a stream i packet arrival: (i) VS_i ← max{Spacing time, VS_i + 1/r_i}; (ii) time stamp the packet with the value of VS_i;
- serve packets in increasing order of time stamp.

A similar modification seems appropriate in the case of PGPS.
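The difference between the two update rules can be seen in a two-line sketch (illustrative Python with hypothetical values, not part of the original text):

```python
def stamp_original(spacing_time, vs_i, r_i):
    # original Virtual Spacing rule: VS_i <- max{Spacing time, VS_i} + 1/r_i
    return max(spacing_time, vs_i) + 1.0 / r_i

def stamp_modified(spacing_time, vs_i, r_i):
    # modified rule of Section 6.2.3: VS_i <- max{Spacing time, VS_i + 1/r_i}
    return max(spacing_time, vs_i + 1.0 / r_i)

# A cell arriving on a non-backlogged stream (VS_i <= Spacing time):
# the original rule stamps it Spacing time + 1/r_i, behind the head cells
# of all backlogged streams; the modified rule stamps it Spacing time.
print(stamp_original(10.0, 3.0, 1.0))   # 11.0
print(stamp_modified(10.0, 3.0, 1.0))   # 10.0

# For a backlogged stream (VS_i > Spacing time) both rules coincide.
print(stamp_original(10.0, 12.0, 1.0))  # 13.0
print(stamp_modified(10.0, 12.0, 1.0))  # 13.0
```

The modified rule thus gives the non-backlogged (e.g., CBR) cell priority without changing the stamps of backlogged streams.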
The worst case delay scenarios used to derive the delay bounds are the same, so that these remain unchanged for the revised algorithm.

We have argued that the delay bounds are too loose to be useful for low rate streams with real time constraints (for dimensioning playback buffers, for example)(1). For bursty connections the delay bounds are even looser in so far as the parameter b_i for such connections is typically much larger than the mean burst size and realized delays are typically very much smaller than these worst case bounds [RBC93]. The more significant feature of WFQ algorithms for such streams is their fairness.

6.2.4 Fairness

The term "weighted fair queueing" implies sharing available bandwidth between active streams in proportion to their rate parameters r_i. This can only be achieved exactly in the theoretical GPS scheduling algorithm. In any packet by packet algorithm, bandwidth can only be shared to within the granularity defined by the packet transmission time. We interpret fairness to mean that, in any given interval (s, t] throughout which stream i remains backlogged, there is a constant T(s, t) such that the number U_i(s, t) of stream i cells served in (s, t] satisfies:

   T(s, t)r_i − 1 ≤ U_i(s, t) ≤ T(s, t)r_i + 1.                 (6.2.7)

Notice that even with GPS, where the work performed is exactly proportional to the rate parameters, the number of cells transmitted in an interval can only be defined by an inequality equivalent to (6.2.7). The inequality shows that the number of cells taken from the queue tends to a fair share as the considered interval increases (U_i(s, t) ≫ 1).

(1) Note that the problem is less severe for higher rate video streams than for telephone connections and could be made negligible for the latter if the 'stream' grouped together a sufficient number of connections with an appropriately defined rate parameter.
The inequality (6.2.7) can be demonstrated as follows for the Virtual Spacing algorithm. Let the time stamps of cells served at or immediately before s and t be v_s and v_t, respectively. Assume stream i is backlogged in (s, t] and let θ_i be the time stamp of the first cell of this stream to be served in the interval. Since stream i time stamps are spaced by 1/r_i, θ_i satisfies:

   v_s < θ_i ≤ v_s + 1/r_i.                                     (6.2.8)

Note that we have θ_i = v_s + 1/r_i when stream i becomes backlogged before s but after the immediately preceding service instant. Since the time stamps of all stream i cells served in (s, t] are necessarily less than or equal to v_t (Spacing time is non-decreasing), we deduce:

   U_i(s, t) ≤ (v_t − θ_i)r_i + 1,                              (6.2.9)
   U_i(s, t) ≥ (v_t − θ_i)r_i.                                  (6.2.10)

Substituting appropriate bounds on θ_i from (6.2.8) we derive (6.2.7) with T(s, t) = v_t − v_s. Note that this result applies to both the original Virtual Spacing algorithm of Section 6.1.4 and the modified algorithm suggested in Section 6.2.3. Further fairness properties are demonstrated in the variable packet length context of SCFQ in [Gol94]. The fairness of PGPS is not directly addressed in [PG93a] but appears to follow immediately from its proven closeness to the GPS ideal.

6.3 Realising WFQ

WFQ is certainly much more complicated than the simple FIFO queues which are used exclusively in most ATM switch architectures. It may however be argued that its substantial traffic control advantages largely justify the increased complexity. A key requirement is to be able to rapidly sort cells in increasing time stamp order.

6.3.1 Sort queueing

We consider an output queueing switch module where service scheduling is performed by a dedicated output controller on each outgoing multiplex. The switch fabric is assumed capable of routing cells to the required output without significantly changing the input traffic characteristics due to cell loss or delay. The weighted fair queueing algorithms discussed in Section 6.1 rely on service in increasing order of time stamp.
The multiplexer output controller must therefore be able to schedule waiting cells so that the cell selected for transmission is always the one with the smallest time stamp. One way of doing this is to sort the cells in a serial memory in order of increasing time stamp; a newly arriving cell is inserted at the appropriate queue place by comparing its time stamp with those of cells already waiting. Devices for rapidly performing this sorting function have been designed by Chao [Cha91].

An alternative design considered in the COST 242 project is based on the following real time sort algorithm [RBS95]. To perform scheduling according to the value of a time stamp, it is not necessary to completely sort messages in increasing time stamp order but only to identify the message with the smallest time stamp at any time when a service can take place. The following algorithm realizes this objective by performing a set of comparison and shift operations in parallel on messages arranged in two tables A(j) and B(j), 0 ≤ j ≤ m. Each word of tables A and B is either set to a default maximum value (all bits set to 1) or represents a message and its associated time stamp. The time stamp occupies the k leftmost bits while the remainder identifies the message content (generally a pointer to an address). For the sake of simplicity we neglect the problem of time wrap around (i.e., the fact that the time periodically comes back to zero every 2^k units) and assume the time stamp unambiguously determines the service order: the message represented by A(i) will be served before the message represented by B(j), say, if A(i) < B(j). All words of tables A and B are initially set to the default value.

The sorting algorithm has two phases, one when a new message is inserted, one when the smallest valued message is extracted. A new message is written to word A(0). Words A(j) and B(j), for 0 ≤ j ≤ m, are compared and their contents interchanged if A(j) < B(j).
After this operation we therefore have B(j) ≤ A(j) for 0 ≤ j ≤ m. The words in table A are then shifted one step downwards: A(j) ← A(j − 1) for 1 ≤ j ≤ m. A(0) is re-initialized to the default maximum value. Note that the value initially stored in word A(m) is lost.

Messages are extracted on reading from word B(0). By working out particular examples it is easy to convince oneself that this address effectively corresponds to the message with the smallest time stamp. A formal proof is given in [RBS95]. As above, words A(j) and B(j), for 1 ≤ j ≤ m, are compared and their content interchanged if A(j) < B(j). The words in table B are then shifted one step upwards: B(j) ← B(j + 1) for 0 ≤ j < m. The value of B(m) is re-initialized to the default value.

An integrated circuit design realizing the above algorithm is proposed in [RBS95]. This circuit turns out to be very similar in conception to so-called systolic sorters used in data processing applications [CM88].

6.3.2 Virtual Spacing algorithm

The Sort Queue associated with each output may have space to contain the entire cell but it would probably be more economical to stock the cell content in a general purpose memory with the Sort Queue entry containing the time stamp and a pointer to the corresponding cell address. This pointer might be the address itself or just the identity of the connection to which the cell belongs: the cell address would then be derived from information contained in a connection context. If we use a connection context to identify the waiting cells, the Sort Queue need only be long enough to contain one entry for each connection set up on the link. This entry bears the time stamp of the cell which is currently head of the line. When this cell is transmitted, the output controller replaces the time stamp of the Sort Queue entry by that of the cell next in line.
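The two-table insertion/extraction procedure of Section 6.3.1 can be sketched in software as follows (an illustrative Python model of the hardware algorithm, not part of the original text; the default maximum value is represented by infinity):

```python
INF = float('inf')    # stands for the default maximum value (all bits set)

class SortQueue:
    """Software model of the two-table real-time sort of Section 6.3.1:
    one compare/exchange pass plus one shift per insertion or extraction."""
    def __init__(self, m):
        self.m = m
        self.A = [INF] * (m + 1)
        self.B = [INF] * (m + 1)

    def insert(self, stamp):
        self.A[0] = stamp
        for j in range(self.m + 1):            # compare A(j), B(j) and
            if self.A[j] < self.B[j]:          # interchange if A(j) < B(j)
                self.A[j], self.B[j] = self.B[j], self.A[j]
        for j in range(self.m, 0, -1):         # shift A one step downwards;
            self.A[j] = self.A[j - 1]          # the old A(m) is lost
        self.A[0] = INF

    def extract(self):
        smallest = self.B[0]                   # B(0) holds the smallest stamp
        for j in range(1, self.m + 1):         # compare/exchange as above
            if self.A[j] < self.B[j]:
                self.A[j], self.B[j] = self.B[j], self.A[j]
        for j in range(self.m):                # shift B one step upwards
            self.B[j] = self.B[j + 1]
        self.B[self.m] = INF
        return smallest

sq = SortQueue(8)
for stamp in (5, 3, 4):
    sq.insert(stamp)
print(sq.extract())    # 3, the smallest time stamp held
```

In hardware the compare/exchange and shift steps operate on all words in parallel, giving constant time per operation; the Python loops simply make the data movement explicit.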
Determining the time stamp this late is possible because, with Virtual Spacing, the time stamp of a cell on a backlogged connection can be computed as late as the departure instant of the preceding cell. A cell arriving to a non-backlogged connection, having its time stamp determined by the current value of Spacing time, is entered directly to the Sort Queue.

6.4 Fixing rate parameters

The implementation of WFQ would return to ATM some of its original promise of flexibility necessary for the development of a future-safe network. In this section we aim to illustrate this potential by suggesting how different services might be supported. The main distinction is between services with and without real time delay constraints.

6.4.1 Services with real time constraints

Services like voice or video telephony with strict delay constraints and low bandwidth requirements are ideally suited to multiplexing with cell scale congestion only, i.e., using Rate Envelope Multiplexing, as discussed in Section 4.1. This operating mode can be simulated using WFQ by attributing stream parameters for groups of connections with a small value of b and a value of r chosen to ensure the required cell loss ratio (CLR).

Consider a service like the telephone where individual calls appear as on/off VBR connections whose bit rate characteristics are known (i.e., we know the peak rate and the mean rate is known in a statistical sense for the population of telephone calls). It is then possible to accurately predict the stationary probability distribution of the combined instantaneous bit rate of a group of n connections (the number of active connections is binomial). Denote the bit rate at time t by Λ_t(n). Approximating CLR by the freeze-out fraction, we could fix r such that:

   E[(Λ_t(n) − r)^+] / E[Λ_t(n)] ≤ ε,                           (6.4.1)

where ε is the target CLR. A stream memory parameter b of a few tens of cell places would be necessary to avoid cell loss when the combined arrival rate is not greater than r. Note that this is not a dedicated memory but rather
a device for deciding when to reject cells: the fact that the stream queue exceeds b is evidence that the input rate is currently exceeding the allocated rate r. Any cell loss rate can be fixed, without affecting the service offered to other connections, simply by the choice of the function r(n) denoting the rate threshold above which arriving cells will be lost.

This allocation effectively ensures that the stream CLR is not greater than the target ε. However, the rate allocation may be overly generous leading to inefficient link utilisation since no account is taken of statistical resource sharing with other streams; as the rate available to the stream is generally greater than r, the CLR may be much smaller than ε. For example, if another service with the same QoS requirements were handled by the multiplexer it would be possible to group the connections of both services in a single stream. Clearly, the rate required for the combined stream calculated as above would generally be considerably smaller than the sum of the individual rates.

Consider the effective bandwidth framework for REM discussed in Section 5.2.2 and assume stream i connections are all representative of a certain type. Let e_i(ε, c) be the effective bandwidth of a connection of type i for a link of rate c and target CLR ε and let n_i be the number of such connections. If all streams multiplexed in the WFQ system had real time delay constraints and the same CLR requirement ε, an allocation of rate parameters r_i = n_i e_i(ε, c) would be sufficient. It is tempting to suppose that, if streams have different CLR requirements ε_i, a rate allocation calculated according to:

   r_i = n_i e_i(ε_i, c)                                        (6.4.2)

would be sufficient. In [LT95a], Lindberger and Tidblom demonstrate that this is indeed a conservative rate allocation policy in that realized CLR for each class is less than the respective target value.
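The freeze-out allocation rule (6.4.1) can be evaluated numerically for a group of on/off sources whose number of active connections is binomial (an illustrative sketch, not part of the original text; the source parameters are hypothetical and the peak rate is normalized to 1):

```python
from math import comb

def freeze_out_rate(n, p, peak, eps):
    """Smallest rate r = k*peak such that the freeze-out fraction
    E[(Lambda - r)^+] / E[Lambda] <= eps, where Lambda = peak * Bin(n, p)
    is the combined rate of n on/off sources with activity ratio p."""
    mean = n * p * peak
    for k in range(n + 1):
        r = k * peak
        # E[(Lambda - r)^+] for Lambda/peak ~ Binomial(n, p)
        excess = sum(comb(n, j) * p**j * (1 - p)**(n - j) * (j * peak - r)
                     for j in range(k + 1, n + 1))
        if excess / mean <= eps:
            return r
    return n * peak

# Hypothetical example: 100 sources, activity ratio 0.4, target CLR 1e-3.
r = freeze_out_rate(100, 0.4, 1.0, 1e-3)
print(r)    # somewhat above the mean rate of 40
```

For these values r comes out around the mean rate plus a few standard deviations of the binomial distribution, much less than the peak allocation of 100.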
It is also argued that the allocation (6.4.2) is satisfactory even when some multiplexed streams are bursty with a large buffer parameter b, although we note that for periods when the latter are backlogged, the expected CLR of the real time traffic will be greater than the target ε.

Multiplexing gains obtained through resource sharing between streams as discussed above can be said to rely on knowledge of the traffic characteristics of all streams (e.g., due to leaky bucket access control). However, it is shown in [LT95a] that if one stream (or class of streams) were to set its rate parameter r_i at a value lower than its actual mean rate then, although that stream would experience a loss ratio greater than the target, the other streams would still be protected by the fair queueing mechanism. An incorrect mean rate only affects the stream concerned and any other streams grouped in the same class.

Absolute guarantees on individual stream performance may be difficult to achieve since it would be necessary to make worst case assumptions about competing traffic. The choice of rate parameters might need to be safer for streams requiring strict guarantees: the appropriate rate parameters for VBR streams having different CLR requirements would then need to be determined using an effective bandwidth derived assuming a link capacity somewhat less than the capacity actually reserved for all these types of VBR streams.

6.4.2 Leaky bucket defined connections

Individual connections defined by leaky bucket parameters r and b are naturally realized in WFQ as streams with corresponding parameters. WFQ allows the network to provide throughput r in a very clear sense and to guarantee negligible cell loss. On the other hand, although delay is bounded as shown in Section 6.2.3, these bounds are too loose to constitute useful performance guarantees.
It is argued in [RBC93] that realized delays should typically correspond to peak rate transmission most of the time. However, the need to account for burst scale congestion and the inherent unpredictability of data traffic make it impossible to strictly guarantee more than the unduly pessimistic worst case bounds. In fact, the quality of service on data connections is manifested by end to end delays which depend mainly on the user's choice of leaky bucket parameters, since these determine access delays. There appears an obvious trade-off between the respective values of r and b: if r is close to the mean emission rate then b must be very large; conversely, if b is limited (e.g., by the network operator) then the user may need to choose r several times greater than the mean rate (see the examples cited in Section 5.3.2 and [RBC93]). It is noted that, for low access delays, traffic parameter r must be a small multiple of the mean rate while b should be an order of magnitude greater than the mean burst size.

While it is straightforward with WFQ to guarantee the minimal throughput r, it remains to determine the engineering rules for dimensioning multiplex buffers to ensure negligible cell loss given the b values of the multiplexed streams. The buffer requirement should be considerably less than the sum of the parameters b. One possibility would be for the network to fix the value of b based on available memory; this value could even evolve dynamically according to current traffic conditions.

6.4.3 Best effort data connections

In a private ATM network, in particular, it may be considered unnecessarily restrictive to impose access control by a leaky bucket: why limit the mean input rate to r_i at times when more capacity is available on network links? The danger of uncontrolled connections saturating network links and affecting the
Weighted fair queueing quality of service of other users can be alleviated when WPQ is used by allocating a minimal service rate r and maximum buffer occupancy b, as for leaky bucket controlled connections. A single "connection" of given parameters r and b may be reserved for all best effort traffic, including datagrams. The impact of such connections on other users is thus limited while any spare capacity is immediately available to them. Users would ideally adjust their input rate to prevailing traffic conditions using end to end protocols like TCP. This type of service is standard in current LANs and in data networks like the Internet. In the B-ISDN, the Available Bit Rate (ABR) transfer capability is a means of providing best effort service in which unreserved bandwidth is shared dynamically (ref to ABR section). ABR connections are generally assumed to share link capacity not currently attributed to connections with QoS guarantees. In the present case of WFQ scheduling, the notion of attributed rate is clear for streams with a small buffer parameter where r is determined for peak rate allocation or REM as discussed above in Section 6.4.1. For leaky bucket controlled streams, on the other hand, the rate reservation is only a long term committment and achieving satisfactory QoS relies on bursts usually arriving to find the link with sufficient capacity available to carry the burst at peak rate. If ABR connections are attributed all unused capacity, such bursts will never have a throughput greater than their reserved rate r. 6.4.4 Virtual Private N e t w o r k s The creation of Virtual Private Networks using interconnected VPCs is an interesting possibility afforded by the B-ISDN [WA92]. Users create VPCs of specified capacity between network nodes according to their particular interconnection requirement. Now, at a given node, the sum of the capacities of incoming VPCs can exceed the capacity of any outgoing VPC of the same VPN. 
With simple FIFO queues, in order to ensure that the traffic offered to such an outgoing VPC does not exceed its designated capacity and consequently interfere with other connections on the same link, it is necessary to implement "output policing" [WA92]. This is a considerable complication to the design of nodes, which generally only control the conformity of traffic at their inputs. The generalized use of WFQ constitutes a (partial) solution to this problem: in a node terminating the VPCs of a VPN, a WFQ algorithm would simply be applied with the parameters r and b of the outgoing VPCs; it still remains possible for the traffic carried by these VPCs to exceed their allocation but only when the other connections on the link do not fully utilize their own bandwidth allocation. An output policing mechanism for controlling the VPC peak rate, as considered in [WA92], would probably need to actually space (and not just virtually space) the cells at the designated rate. However, it is not obvious that this strict rate control is preferable to the WFQ solution in the definition of the VPN service, and it does not appear any easier to implement.

6.4.5 Link sharing

The considerations in this section are inspired by the discussion on resource sharing in an unpublished Internet Draft on "A service model for an integrated services Internet" by Shenker, Clark and Zhang. A user to user VPC is typically used for a variety of applications materialized by a number of connections identified by a VC identifier. A user may wish to share the bandwidth reserved for the VPC between different groups of applications with minimum rate guarantees calculated to meet specific performance requirements, notably for real time communications. The minimum rates could be guaranteed by applying WFQ with rate parameters defined appropriately for each application group considered as a stream within the VPC.
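The fluid sharing rule underlying such rate guarantees can be sketched as follows. This is an illustrative Python fragment (the function name and argument layout are assumptions): in GPS, link capacity is divided among the currently backlogged streams in proportion to their rate parameters, so each backlogged stream gets at least its reserved rate and the reservations of idle streams are redistributed to the others.

```python
def gps_shares(capacity, rates, backlogged):
    """Instantaneous fluid (GPS) service rates.

    capacity: link rate c; rates: {stream: rate parameter ri};
    backlogged: set of streams currently having cells queued.
    Capacity is split among backlogged streams in proportion to ri;
    idle streams receive no service.
    """
    active = {i: rates[i] for i in backlogged}
    total = sum(active.values())
    if total == 0:
        return {i: 0.0 for i in rates}
    return {i: capacity * active.get(i, 0.0) / total for i in rates}
```

For example, on a link of capacity 10 with rate parameters 2, 2 and 6, an idle third stream leaves the first two sharing the link equally (5 each), which is precisely the behaviour whose desirability within a VPC is questioned next.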
The disadvantage with this approach is that the rate of a momentarily idle stream would be shared between all streams on the multiplexer in proportion to their rate parameters. In order for this rate to be reserved for the VPC, except when all the VPC streams have no backlog, it is necessary to implement a hierarchical form of WFQ. Let the rate allocated to the VPC stream be ri and assume this rate is to be distributed over m substreams with individual rate parameters rij, where Σj rij = ri. A modified Virtual Spacing algorithm realizing the above sharing objectives is as follows. Let VSi and VSij be variables associated with stream i and substream ij, respectively. The algorithm is then:

• on a stream i cell arrival:
  (i) VSi ← max{Spacing time, VSi} + 1/ri
  (ii) VSij ← max{Spacing time, VSij} + 1/rij
  (iii) time stamp the cell with the value VSi||VSij (the concatenation of VSi and VSij);
• serve cells in increasing order of time stamp.

It remains to demonstrate that the added complexity is justified by the interest such a link sharing option would have for the user. The user would perhaps be better served by setting up an individual VPC for each application.
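The two-level algorithm above can be sketched in a few lines of Python. This is an illustrative sketch, not a definitive implementation: the class and method names are assumptions, the concatenation VSi||VSij is interpreted as lexicographic ordering of the pair (VSi, VSij), and the current Spacing time is passed in as an argument rather than maintained by the scheduler.

```python
import heapq

class HierarchicalVirtualSpacer:
    """Sketch of the modified (two-level) Virtual Spacing algorithm."""

    def __init__(self, stream_rates, substream_rates):
        # stream_rates: {i: ri}; substream_rates: {(i, j): rij}
        self.r = stream_rates
        self.rij = substream_rates
        self.VS = {i: 0.0 for i in stream_rates}        # per-stream VSi
        self.VSij = {k: 0.0 for k in substream_rates}   # per-substream VSij
        self.queue = []   # heap ordered lexicographically by (VSi, VSij)
        self.seq = 0      # FIFO tie-breaker for equal time stamps

    def arrive(self, i, j, spacing_time, cell):
        # (i)  VSi  <- max{Spacing time, VSi}  + 1/ri
        self.VS[i] = max(spacing_time, self.VS[i]) + 1.0 / self.r[i]
        # (ii) VSij <- max{Spacing time, VSij} + 1/rij
        self.VSij[i, j] = max(spacing_time, self.VSij[i, j]) + 1.0 / self.rij[i, j]
        # (iii) stamp the cell with VSi || VSij (here: the ordered pair)
        heapq.heappush(self.queue, (self.VS[i], self.VSij[i, j], self.seq, cell))
        self.seq += 1

    def serve(self):
        # serve cells in increasing order of time stamp
        return heapq.heappop(self.queue)[3] if self.queue else None
```

Because the stream-level stamp VSi leads the comparison, a substream competes first through its VPC's allocation ri; only among cells of the same VPC does the substream stamp VSij decide, which is the intended hierarchical sharing.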