
WO2024187312A1 - A device and methodology for asynchronous deterministic scheduling in networks at large scale - Google Patents


Info

Publication number
WO2024187312A1
Authority
WO
WIPO (PCT)
Prior art keywords
transmission
data packets
classes
transmission cycle
time period
Prior art date
Application number
PCT/CN2023/080838
Other languages
French (fr)
Inventor
Siyu Tang
Guanhua ZHUANG
Mohamed ELATTAR
Hongming GAO
Zhe Lou
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to PCT/CN2023/080838 priority Critical patent/WO2024187312A1/en
Publication of WO2024187312A1 publication Critical patent/WO2024187312A1/en


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00 Traffic control in data switching networks
    • H04L 47/50 Queue scheduling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00 Traffic control in data switching networks
    • H04L 47/50 Queue scheduling
    • H04L 47/62 Queue scheduling characterised by scheduling criteria
    • H04L 47/622 Queue service order
    • H04L 47/623 Weighted service order
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 72/00 Local resource management
    • H04W 72/50 Allocation or scheduling criteria for wireless resources
    • H04W 72/56 Allocation or scheduling criteria for wireless resources based on priority criteria
    • H04W 72/566 Allocation or scheduling criteria for wireless resources based on priority criteria of the information or information source or recipient
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00 Traffic control in data switching networks
    • H04L 47/50 Queue scheduling
    • H04L 47/52 Queue scheduling by attributing bandwidth to queues
    • H04L 47/522 Dynamic queue service slot or variable bandwidth allocation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00 Traffic control in data switching networks
    • H04L 47/50 Queue scheduling
    • H04L 47/60 Queue scheduling implementing hierarchical scheduling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00 Traffic control in data switching networks
    • H04L 47/50 Queue scheduling
    • H04L 47/62 Queue scheduling characterised by scheduling criteria
    • H04L 47/6215 Individual queue per QOS, rate or priority

Definitions

  • This disclosure relates to an apparatus and method for implementing a two-dimensional scheduling methodology for scheduling network packets by introducing an additional parameter, a transmission cycle time period, for a plurality of classes of data packets.
  • E2E end-to-end
  • Some examples of such algorithms are Rate Controlled Service Discipline (RCSD), Weighted Fair Queuing (WFQ), Virtual Clock (VC), Jitter-Earliest Due Date and the Urgency-Based Shaper (UBS).
  • RCSD Rate Controlled Service Discipline
  • WFQ Weighted Fair Queuing
  • VC Virtual Clock
  • UBS Urgency-Based Shaper
  • a data processing apparatus for implementing a two-dimensional scheduling methodology for scheduling data packets, the data processing apparatus comprising one or more processors configured to process data packets, each data packet belonging to a class and each class having a predetermined first transmission priority associated therewith, the one or more processors configured to: determine a transmission cycle time period for a transmission cycle, within which data packets are transmitted; receive, for one or more of a plurality of classes, data packets of data to be transmitted, each class having a transmission time; schedule in a first transmission cycle the data packets of the plurality of classes for transmission in dependence on the first transmission priority and the transmission cycle time period. This allows multi-class real-time traffic to be supported with guaranteed latency and jitter bound.
  • a data processing apparatus as above, wherein the one or more processors are configured to schedule the data packets such that a total transmission time of the scheduled data packets for each of the plurality of classes in the first transmission cycle is equal to the transmission cycle time period. This allows the complete transmission cycle time period to be used during basic scheduling of this disclosure.
  • a data processing apparatus as above wherein the one or more processors are configured to further schedule the data packets based on a strict priority of the plurality of classes and/or a weight assigned to each of the plurality of classes. This allows data packets from the plurality of classes to be prioritised to avoid delaying higher priority traffic.
  • a data processing apparatus as above wherein the one or more processors are further configured to schedule data packets of the plurality of classes for transmission in a subsequent transmission cycle of the determined transmission cycle time period, in dependence on the first transmission priority and the transmission cycle time period. This allows for subsequent transmission cycles to schedule traffic in the same manner as a first transmission cycle.
  • a data processing apparatus as above, wherein the data packets are scheduled in each transmission cycle using a round robin scheme. This provides a scheduling scheme with which to schedule traffic.
  • round robin scheme is a deficit round robin scheme. This allows a deficit counter to be incorporated in the scheduling scheme that can be used to inform further scheduling decisions.
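For context, the deficit round robin mechanism referred to in these bullets works as sketched below. This is a minimal illustration of classic DRR with illustrative quanta and packet sizes, not the claimed two-dimensional scheduler.

```python
from collections import deque

def drr_schedule(queues, quanta):
    """Classic deficit round robin over a fixed batch of packets.

    queues: list of deques of packet sizes in bits; quanta: per-queue
    quantum in bits, added to the queue's deficit counter each round.
    Returns the transmission order as (queue_index, packet_size) tuples.
    """
    deficits = [0] * len(queues)
    order = []
    while any(queues):
        for i, q in enumerate(queues):
            if not q:
                deficits[i] = 0  # an empty queue forfeits its deficit
                continue
            deficits[i] += quanta[i]
            # send packets while the deficit covers the head-of-line packet
            while q and q[0] <= deficits[i]:
                pkt = q.popleft()
                deficits[i] -= pkt
                order.append((i, pkt))
    return order

# Example: two queues, 500-bit quantum each
order = drr_schedule([deque([300, 300]), deque([600])], [500, 500])
# order is [(0, 300), (0, 300), (1, 600)]
```

The deficit counter is what the claim refers to: unused quantum carries over within a round, so a queue whose head packet is too large simply waits until its accumulated deficit covers it.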
  • This allows for early transmission of high priority data packets when non-uniform traffic is received and provides optimal scheduling during uneven flow rates of traffic.
  • a data processing apparatus according to claims 4 or 6, wherein the one or more processors are configured to schedule data packets for each of the plurality of classes in the subsequent transmission cycle in dependence on the first transmission priority and the transmission cycle time period, and/or such that a total transmission time of the scheduled data packets for each of the plurality of classes in the subsequent transmission cycle is equal to the transmission cycle time period. This allows for the subsequent transmission cycles to schedule traffic based on the transmission cycle time period thus providing latency and jitter bound guarantees.
  • a method of data processing for scheduling the transmission of data packets, each data packet belonging to a class and each class having a predetermined first transmission priority associated therewith comprising: determining a transmission cycle time period for a transmission cycle, within which data packets are transmitted; receiving, for one or more of a plurality of classes, data packets of data to be transmitted, each class having a transmission time; scheduling in a first transmission cycle the data packets of the plurality of classes for transmission in dependence on the first transmission priority and the transmission cycle time period.
  • a method of data processing as above further comprising scheduling the data packets based on a strict priority of the plurality of classes and/or a weight assigned to each of the plurality of classes. This allows data packets from the plurality of classes to be prioritised to avoid delaying higher priority traffic.
  • a method of data processing as above further comprising scheduling data packets of the plurality of classes for transmission in a subsequent transmission cycle of the determined transmission cycle time period, in dependence on the first transmission priority and the transmission cycle time period. This allows for subsequent transmission cycles to schedule traffic in the same manner as a first transmission cycle.
  • round robin scheme is a deficit round robin scheme. This allows a deficit counter to be incorporated in the scheduling scheme that can be used to inform further scheduling decisions.
  • the method further comprises, in scheduling the data packets of the plurality of classes for transmission in the first transmission cycle: determining a total transmission time of the scheduled data packets for each of the plurality of classes in comparison to the determined transmission cycle time period; then, if the total transmission time of the scheduled data packets for each of the plurality of classes in a first transmission cycle is less than the determined transmission cycle time period, scheduling data packets for each of the plurality of classes from a subsequent cycle of the round robin scheme based on the first transmission priority and/or a weight assigned to each of the plurality of classes until the total transmission time of the scheduled data packets for each of the plurality of classes is equal to the transmission cycle time period, or, if the total transmission time of the scheduled data packets for each of the plurality of classes is equal to the transmission cycle time period, scheduling data packets in a subsequent transmission cycle.
  • This allows for early transmission of high priority data packets when non-uniform traffic is received and provides optimal scheduling during uneven flow rates of traffic.
  • a method of data processing as above wherein the method further comprises scheduling data packets for each of the plurality of classes in the subsequent transmission cycle in dependence on the first transmission priority and the transmission cycle time period, and/or such that a total transmission time of the scheduled data packets for each of the plurality of classes in the subsequent transmission cycle is equal to the transmission cycle time period.
  • Figure 1 illustrates deadline violation with the linear integer configuration model using the approach proposed by 802.1 Qdv
  • Figure 2 illustrates an example of LJB-DRR: queue architecture
  • Figure 3 illustrates an example of an LJB-DRR scheduling algorithm (Algorithm 1);
  • Figure 4a illustrates an example of a basic two-dimensional scheduling scheme of this disclosure
  • Figure 4b illustrates an example service curve for the basic scheduling scheme of Figure 4a
  • Figure 5 illustrates an example of an LJB-DRR scheduling algorithm including early transmission methodology
  • Figure 6 illustrates an example of a two-dimensional scheduling scheme of this disclosure that includes early transmission scheduling
  • Figure 7a illustrates the data packet latency in a classical DRR scheduling system
  • Figure 7b illustrates the data packet latency in a two-dimensional early transmission scheduling system of this disclosure
  • Figure 8a illustrates a comparison for all classes between the latency bound in a Paternoster scheduling scheme and that of the scheduling scheme of this disclosure
  • Figure 8b illustrates a comparison for all classes between the jitter bound in a Paternoster scheduling scheme and that of the scheduling scheme of this disclosure
  • Figure 9 illustrates comparisons for one class between the latency/jitter bounds in a classic scheduling system and that of the scheduling scheme of this disclosure.
  • the IEEE 802.1 time sensitive networking working group has standardized a number of shaping schemes that attempt to address the problem of time-critical scheduling; examples include Qbv (Time-Aware Shaper) and Qch (Cyclic Queuing and Forwarding).
  • Qbv Time-Aware-Shaper
  • Qch Cyclic Queuing and Forwarding
  • these devices need to be synchronized in time or frequency so that extremely low latency and jitter can be guaranteed.
  • the asynchronous traffic shaper (IEEE 802.1 Qcr) defined in the time sensitive networking working group requires networking devices to maintain per-flow state.
  • One drawback of such an approach is that it leads to scalability issues when deploying such solutions on large scale networks.
  • IEEE 802.1 Qdv specified a methodology (known as Paternoster-SP when no frequency synchronization is used, and Multi-CQF when frequency synchronization is applied) to enqueue packets of a traffic class into three or four queues that rotate at a constant frequency at a single node.
  • An example of queue rotation utilised in Paternoster scheduling is that when a new epoch begins, the “current” queue becomes the “prior” queue, and the “next” and “last” queues (and their remaining allocation for each reservation) become “current” and “next” respectively.
  • the previous prior queue (which should now be empty because the data packets have been transmitted in the previous epoch) becomes the new “last” queue.
  • This Paternoster operation repeats at each epoch such that the four queues alternate during each epoch.
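The four-queue epoch rotation described above can be sketched as follows; the role names and dictionary representation are illustrative conveniences, not the standard's own data structures.

```python
def rotate_epoch(roles):
    """One Paternoster epoch rotation, as described above.

    roles: dict mapping role name -> queue identifier. At each epoch
    boundary: current -> prior, next -> current, last -> next, and the
    old prior queue (assumed drained, since its packets were transmitted
    in the previous epoch) becomes the new last queue.
    """
    return {
        "prior": roles["current"],
        "current": roles["next"],
        "next": roles["last"],
        "last": roles["prior"],  # assumed empty after last epoch's transmissions
    }

roles = {"prior": "q0", "current": "q1", "next": "q2", "last": "q3"}
roles = rotate_epoch(roles)
# after one rotation: prior=q1, current=q2, next=q3, last=q0
```

Applying the rotation four times returns every queue to its original role, matching the statement that the four queues alternate at each epoch.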
  • the packets from the different priority classes within each queue are scheduled only in accordance with a strict priority scheduling scheme that is intrinsic to each class.
  • the length of the rotating queues of different traffic classes does not have to be the same.
  • a general guideline proposed by 802.1 Qdv for setting the queue length (or epoch time) associated with each traffic class is to use integer multiples of T_1, where T_1 is the smallest epoch, associated with the highest priority class. Such a configuration, which uses multiple-integer epochs, leads to a large latency and jitter bound.
  • a linear epoch configuration model is employed, this generates a tighter latency and jitter bound, but causes deadline violation for certain traffic classes.
  • An example of this is shown in Figure 1, which illustrates deadline violation with the linear integer configuration model using the approach proposed by the IEEE 802.1 Qdv working group.
  • Figure 1 demonstrates the maximum queuing time for each of four classes at a single node.
  • the dotted line for each class shown in the figure indicates the delay bound.
  • the square points and line demonstrate epochs configured with a linear integer model, while the triangles and associated line demonstrate the epochs configured with a multiple integer model.
  • the linear model leads to deadline violation with a maximum queuing time exceeding the maximum delay bound, where Q represents the queue length.
  • the asynchronous traffic shaper defined in IEEE 802.1 Qcr needs to maintain per-flow state at each networking node, leading to scalability issues.
  • Traditional scheduling schemes such as WRR/DRR/SP are designed to guarantee flow fairness (i.e., share of bandwidth) rather than guaranteeing bounded latency and jitter.
  • flow fairness i.e., share of bandwidth
  • For WRR/DRR, the latency bound could be computed using network calculus, but no jitter bound could be computed.
  • Other schemes such as RCSD or CJVC require special hardware to re-order packets.
  • the embodiments of this disclosure address the issues identified above in past scheduling methods.
  • the present disclosure is designed to solve the deadline violation issue described above.
  • the general model described in this disclosure is more flexible than the state of the art, supports a wider range of application cycle times and achieves lower implementation complexity.
  • LJB-DRR Latency-Jitter Bounded Deficit Round Robin
  • SLA Service Level Agreement
  • DRR deficit round robin
  • LJB-DRR is a two-dimensional scheduling methodology to guarantee latency and jitter bound for service classes with different SLA requirements.
  • the scheduling methodology introduces a time-based scheduling parameter T, representing a transmission cycle time period, on top of classical scheduling schemes that aim for bandwidth guarantees, e.g., classical-DRR.
  • a novel two-dimensional scheduling methodology that avoids interference from low priority classes to high priority traffic classes (as in the case of classical-DRR) is also described herein, as well as a set of rules to define the basic scheduling method during each transmission cycle time period T so that the latency/jitter bound of each service class can be satisfied.
  • this disclosure provides a set of rules to define the early transmission phase so that the latency/jitter bound of a high priority class can be further reduced.
  • the present disclosure introduces a data processing apparatus and method for implementing a two-dimensional scheduling methodology for scheduling data packets.
  • the data processing apparatus may comprise one or more processors as appropriate and these one or more processors may be configured to process data packets, each data packet belonging to a (traffic) class and each class having a predetermined first transmission priority.
  • the first transmission priority may be the strict transmission priority of that class but may alternatively be an assigned transmission priority.
  • the present disclosure will describe one example of a two-dimensional scheduling methodology relating to LJB-DRR that, as shown in Figure 2, requires only a single queue per service class per port p.
  • the queue length of a class i is defined as Q_i, where a high priority service class is assigned a smaller Q_i (smaller traffic burst) and a lower priority service class is assigned a larger Q_i (larger traffic burst).
  • the present disclosure introduces a periodic scheduling duration T, referred to herein as a transmission cycle time period, where a new round of scheduling repeats every period T.
  • C is defined as the link capacity (in bits per second) and α_i is defined as the fraction of the bandwidth share that is guaranteed for class i traffic.
  • the present disclosure will describe two primary embodiments: the scheduling algorithm is composed of a basic scheduling phase and an early transmission phase. Both of these embodiments may be combined as part of the same scheduling algorithm, if so desired.
  • N indicates the number of queues Q.
  • [w_1, w_2, ..., w_N] denotes the vector of per-class weights; each weight w_i may be a constant integer dependent on the bandwidth share α_i of class i.
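As an illustration only: if the weight w_i is taken to be the bandwidth share accumulated over one transmission cycle, i.e. w_i = α_i · C · T (a natural reading of the definitions above, though the exact formula is not stated in this text), it can be computed as:

```python
def class_weight_bits(alpha_i, link_capacity_bps, cycle_seconds):
    """Maximum bits class i may transmit per transmission cycle T.

    Assumes w_i = alpha_i * C * T, i.e. the class's guaranteed bandwidth
    share accumulated over one transmission cycle time period. This
    formula is an illustrative assumption, not quoted from the patent.
    """
    return round(alpha_i * link_capacity_bps * cycle_seconds)

# Example: 1 Gbit/s link, 10 microsecond cycle, 20% share for class 1
w1 = class_weight_bits(0.2, 1e9, 10e-6)  # -> 2000 bits per cycle
```

Under this reading, the weights across all guaranteed classes sum to at most C · T, which is consistent with the later requirement that the scheduled transmission time per cycle equals the transmission cycle time period.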
  • Figure 2 illustrates a number of classes from 1 to N each comprised of data packets 201 and representing traffic that is latency/jitter guaranteed depicted by the dotted line in Figure 2.
  • the two-dimensional scheduling methodology may be configured to receive, for one or more of a plurality of classes, data packets to be transmitted during the transmission cycle time period ⁇ that has been determined.
  • Each of the plurality of classes that are comprised of data packets may have a set transmission time. This transmission time may represent the time taken for a data packet of that class to be transmitted or alternatively, for all the data packets of that class to be transmitted in a transmission cycle.
  • the one or more processors configured to implement the LJB-DRR scheduling methodology may then be configured to schedule, in a first transmission cycle, the data packets of the plurality of classes for transmission in dependence on the first transmission priority and the transmission cycle time period. This scheduling may be performed based on a calculated weight of each of the classes and/or data packets within the plurality of classes.
  • In addition to the plurality of classes comprised of data packets for scheduling, there may also be a class of best effort data packets that may be scheduled. Ordinarily these data packets will have the lowest priority, but this is not always the case, and any priority may be set for these data packets.
  • Example pseudocode representing the basic scheduling scheme can be seen in Figure 3 of the present disclosure.
  • the pseudocode and function thereof will now be described in relation to Figure 3, and the variables described above.
  • Figure 3 shows Algorithm 1 relating to the Latency Jitter Bounded Deficit Round Robin that may be implemented on the data processing apparatus of this disclosure.
  • the weighting of each class may be set in lines 4 to 6, wherein each class is allowed to transmit at most w_i bits during the transmission cycle time period T.
  • This initialisation step is optional and may be performed by a separate apparatus prior to the basic scheduling loop being performed by the primary apparatus of this disclosure.
  • traffic (data packets) from different service classes may be scheduled based on a weight w_i of each of the plurality of classes, from high priority class to low priority class (line 11 of Algorithm 1).
  • the weight w_i may represent a first transmission priority, or alternatively, the first transmission priority may be the strict priority of the class and the weight may be used to supplement this first transmission priority. This means that the traffic from different classes may be scheduled based on DRR, from high priority classes to low priority classes, based on the transmission cycle time period T.
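A minimal sketch of the basic scheduling phase described above, assuming per-class FIFO queues ordered from highest to lowest priority and a per-class deficit counter reset to w_i at the start of each cycle T (function and variable names are illustrative, not the patent's own):

```python
from collections import deque

def basic_cycle(queues, weights):
    """One basic LJB-DRR transmission cycle (sketch, not Algorithm 1 verbatim).

    queues: per-class deques of packet sizes in bits, ordered from
    highest to lowest priority. weights: per-class w_i, the maximum
    number of bits each class may send during one cycle T.
    Returns the (class_index, packet_size) transmission order.
    """
    order = []
    for i, q in enumerate(queues):
        deficit = weights[i]  # deficit counter reset at the start of each T
        # serve this class until its per-cycle budget is exhausted
        while q and q[0] <= deficit:
            pkt = q.popleft()
            deficit -= pkt
            order.append((i, pkt))
    return order

# Example: three classes, each packet 100 bits, weights w = [200, 200, 100]
queues = [deque([100] * 3), deque([100] * 2), deque([100] * 2)]
print(basic_cycle(queues, [200, 200, 100]))
# -> [(0, 100), (0, 100), (1, 100), (1, 100), (2, 100)]
```

Because every class's budget w_i is tied to the same cycle T, a low priority class can never push a high priority class's packets beyond the current cycle, which is the interference-avoidance property claimed for the two-dimensional scheme.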
  • the output link of a port is not always fully loaded, i.e., incoming traffic rate fluctuates as a function of time in an asynchronous scheduling system.
  • the data scheduling apparatus and method may be configured to perform early transmission of one or more data packets based on the total transmission time of the scheduled data packets within the plurality of classes and the transmission cycle time period. This allows data packets from higher priority classes with short transmission time periods to be scheduled early using available bandwidth that is free during sparse periods of traffic rate flow. This further protects against deadline violation by increasing the efficiency of data packet transmission.
  • FIG. 4a demonstrates an example of basic scheduling performed by the method and/or data processing apparatus of this disclosure, which will now be described.
  • the example includes the use of four service classes (class 1 with the highest priority and class 4 with the lowest priority) as well as one BE class for this example.
  • the computed weight w_i, i.e., the maximum number of bits that are allowed to be sent during each T
  • In Table 1, 8 data packets are buffered in each class queue respectively, each class having an associated strict priority and, optionally, calculated weightings.
  • In each transmission cycle time period T, the data packets from each of the plurality of service classes are scheduled based on a first transmission priority and the transmission cycle time period.
  • a data packet from a BE class may also be scheduled.
  • the deficit counter of each class is reduced to zero at the end of T and reset at the beginning of the next T.
  • the one or more processors may be configured to schedule the data packets such that a total transmission time of the scheduled data packets for each of the plurality of classes in the first transmission cycle is equal to the transmission cycle time period.
  • the data processing apparatus configured to implement the above scheduling may schedule data packets, in the first transmission cycle (the first transmission cycle time period), based on either a strict priority of the plurality of classes and/or a weight assigned to each of the plurality of classes, where the weight, as discussed above, incorporates the transmission cycle time period T.
  • the transmission delay in this example (1 microsecond) does not cause deadline violation, and the correct number of data packets from each class are transmitted in a fair and timely manner during the transmission cycle time period.
  • a subsequent transmission cycle may be performed by the apparatus and/or method of this disclosure.
  • the one or more processors may be further configured to schedule data packets of the plurality of classes for transmission in a subsequent transmission cycle of the determined transmission cycle time period, in dependence on the first transmission priority and the transmission cycle time period.
  • the same steps as in the first transmission cycle may repeat in the subsequent transmission cycle during the second transmission cycle time period.
  • the transmission cycle time period may be the same for every transmission cycle executed by the one or more processors of the data processing apparatus of this disclosure.
  • each class increments by the number of data packets that are allowed to be transmitted per transmission cycle time period, such that each class (class 1 401, class 2 402, class 3 403, class 4 404 and BE class 405) has a predictable step function as its service curve.
  • Figure 5 shows an example in which early transmission is incorporated into the scheduling methodology shown in Algorithm 1.
  • lines 7 to 18 relate to the basic scheduling described above, whereas lines 19 to 26 relate to further commands that may be implemented by the one or more processors of the data processing apparatus or method to perform early transmission.
  • the early transmission portion of the algorithm can be seen in Figure 5 within the dotted lines.
  • a class i may stay in the basic scheduling mode or employ the early transmission mode. Early transmission, however, is recommended for high priority traffic, especially if a transmission cycle time period (scheduling period) T has not yet been completed, e.g., if some time remains in the transmission cycle time period, and the remaining deficit counter c_r is larger than zero, e.g., if data packets remain ready for transmission in one or more of the plurality of classes.
  • the one or more processors of the present disclosure may be configured to execute the early transmission algorithm shown in Figure 5. In doing so, if early transmission is enabled for class i and its buffer is not empty, this class is allowed to transmit at least some of its c_r bits during the remaining transmission cycle time period for the first transmission cycle. However, if the queue (containing data packets for transmission) of class i is empty and the remaining counter is c_r > 0, the algorithm skips to the next class. This is shown in lines 20 to 26 of the algorithm shown in Figure 5.
  • a total transmission time of the scheduled data packets for each of the plurality of classes in a first transmission cycle is less than the determined transmission cycle time period
  • the total transmission time of the data packets may be thought of as the sum of the transmission delays for each packet. In the case in which a class has no more data packets to transmit in the current transmission cycle then the algorithm may skip to the class with the next highest priority and schedule a data packet for transmission from that class early instead.
  • the remaining counter c_r may be decreased accordingly (as seen in line 24 of the algorithm of Figure 5) so that, once it is exhausted, no further data packets can be scheduled for transmission in the current transmission cycle.
  • this may continue until the transmission cycle time period is completely utilised by the data packets from the plurality of classes. If no class is enabled with early transmission, best effort (BE) traffic can utilize the remaining counter c_r.
  • the one or more processors may also be configured to enable and disable one or more of the classes for early transmission. This may be done via user input or based on learnt parameters.
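The early transmission phase described above might be sketched as follows, under the assumption that after the basic pass a remaining counter c_r (in bits) is shared in priority order among classes enabled for early transmission, with best effort traffic taking any leftover; all identifiers here are illustrative, not the patent's own:

```python
from collections import deque

def early_transmission(queues, enabled, c_r, be_queue):
    """Early-transmission phase (sketch of the extension in Figure 5).

    After basic scheduling, c_r is the unused capacity (in bits) left in
    the current transmission cycle T. Classes with early transmission
    enabled may consume it in priority order; best effort traffic takes
    whatever remains. Returns the transmission order and the leftover c_r.
    """
    order = []
    for i, q in enumerate(queues):
        if not enabled[i]:
            continue  # this class stays in the basic scheduling mode
        while q and q[0] <= c_r:
            pkt = q.popleft()
            c_r -= pkt  # decrement of the remaining counter (cf. line 24)
            order.append((i, pkt))
    while be_queue and be_queue[0] <= c_r:  # leftover goes to BE traffic
        pkt = be_queue.popleft()
        c_r -= pkt
        order.append(("BE", pkt))
    return order, c_r

# Example: 300 bits remain; only class 0 is enabled, with two 100-bit packets
order, left = early_transmission([deque([100, 100]), deque([100])],
                                 [True, False], 300, deque([100, 100]))
# -> order [(0, 100), (0, 100), ('BE', 100)], left 0
```

Skipping a class whose queue is empty while c_r > 0, as the text describes, falls out of the inner `while` condition: an empty deque simply passes control to the next class.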
  • Figure 6 which demonstrates an example of the early transmission scheme of this disclosure that may be implemented as a method and/or as part of the data processing apparatus, will now be described.
  • Figure 6 demonstrates one example of how data packets may be scheduled when an early transmission scheduling method is employed by the method and data processing apparatus.
  • 5 packets are buffered in a 1st priority queue (class 1)
  • 2 packets are buffered in the 2nd priority queue (class 2)
  • 3 packets are buffered in the 3rd priority queue (class 3)
  • 1 packet is buffered in the 4th priority queue (class 4) .
  • data packets are scheduled based on the transmission cycle time period and the priority.
  • the transmission delay for each data packet is 1 microsecond for the purpose of this example.
  • the early transmission scheduling shown in Figure 6 can be summarised as follows. At the start of the first transmission cycle, a first class (class 1) has 5 data packets in its buffer ready for transmission/scheduling, a second class (class 2) has 2 data packets in its buffer, a third class (class 3) has 3 packets in its buffer, a fourth class (class 4) has 1 packet in its buffer and the best effort class has multiple data packets ready for transmission/scheduling.
  • each of the classes are limited in the number of data packets they can schedule based on the properties described in relation to the basic scheduling above.
  • the same limitations to the classes may apply here.
  • two data packets may be scheduled by the one or more processors from class 1, utilising all of that class's deficit counter and leaving three data packets in the buffer queue for that class.
  • two data packets may be scheduled by the one or more processors from class 2, utilising all of that class's deficit counter and leaving the buffer queue empty until new data packets for transmission arrive.
  • class 3, in which two data packets may be scheduled by the one or more processors, again utilising all of that class's deficit counter.
  • the one or more processors may be configured to schedule the next data packet from the best effort class leaving transmission time remaining within the transmission cycle time period, e.g., 1 microsecond.
  • Class 1, being the highest priority class and having the appropriate weighting, may then utilise the remaining counter to transmit one data packet at the end of T, thus completing the transmission cycle.
  • the one or more processors/method may perform a second/subsequent transmission cycle wherein the remaining two packets from class 1 are transmitted, followed by two packets from class 2, three packets from class 3, two packets from class 4, etc., as shown in Figure 6.
  • the early transmission scheduling scheme may compare the total transmission time of the scheduled data packets (the aggregate of the transmission delays for each of the data packets scheduled in the first transmission cycle) with the determined transmission cycle time period.
  • schedule data packets in a subsequent transmission cycle, e.g., move to a second transmission cycle beginning a subsequent transmission cycle time period.
  • the one or more processors may be configured to schedule an additional data packet from a high priority class e.g., class 1, in advance of its usual transmission time in a subsequent cycle, therefore providing an opportunity to increase the efficiency of data packet transmission.
  • the advanced data packet may be scheduled at the end of the transmission cycle time period after the normal scheduling has taken place and may be performed based on the priority and/or the weight assigned to each of the plurality of classes comprised of data packets.
  • the one or more processors may be configured to begin a subsequent transmission cycle either using the basic scheduling scheme described above or again using the early transmission scheduling scheme.
  • the method of scheduling therefore continues in the subsequent cycle as described in one of the two scenarios discussed above.
  • if new data packets (traffic) for transmission have arrived at a time equal to 10 microseconds, basic scheduling can be resumed without the need for early transmission, as sufficient traffic from each class is present in the buffer.
  • Figures 7a and 7b demonstrate the differences in data packet scheduling between classical-Deficit Round Robin (Figure 7a) and Latency Jitter Bounded-Deficit Round Robin (Figure 7b) .
  • classical DRR does not associate its weights with a period of time such as the transmission cycle time period used in early transmission scheduling; instead, data packets are transmitted from the service classes purely in a weighted round robin fashion.
  • the worst-case latency for class 1 traffic is therefore 19 μs, whereas in the LJB-DRR scheduling scheme, the worst-case latency for class 1 traffic (data packets) is 11 μs (as shown in Figure 7b), a reduction ratio of almost 50%.
  • This improvement when employing LJB-DRR is due to the early processing that may be performed as part of the scheduling method.
  • the aforementioned scheduling methodologies do not depend on any clock or frequency synchronization scheme (e.g., IEEE 1588v2). This allows the scheduling schemes disclosed herein to be completely asynchronous. Furthermore, the data processing apparatus and method of data processing do not require a network device to maintain per-flow state, and hence, are scalable as network size grows. Combining the transmission cycle time period Δ (periodic scheduling duration) and weighted fairness based on bandwidth (ωi), the LJB-DRR scheduling scheme implemented by the apparatus and method of this disclosure is able to avoid interference of low priority traffic with high priority traffic. The disclosed apparatus and method are therefore capable of supporting multi-class real-time traffic with a guaranteed latency and jitter bound.
  • Δ: periodic scheduling duration
  • ωi: bandwidth-based weight of class i
  • Figures 8a and 8b demonstrate a performance comparison between LJB-DRR and Paternoster-SP for a number of different network topologies.
  • a line topology may be set up in an Omnest network simulator, see Figures 8a and 8b.
  • Three different network loads are used, namely 40%, 60% and 80%.
  • the maximum queuing delay in milliseconds is plotted for each class for each topology and Figure 8 demonstrates that for every topology, the maximum queuing delay is reduced when the two-dimensional scheduling approach of this disclosure is employed.
  • LJB-DRR reduces the end-to-end queuing delay significantly for all packet classes. Furthermore, as shown by Figure 8a, LJB-DRR achieves at least a 97% reduction compared to the Paternoster-SP scheme for all network loads in terms of latency bound.
  • LJB-DRR is also able to reduce the jitter bound compared to Paternoster-SP.
  • the scheduling procedure of this disclosure achieves at least a 69% reduction compared to Paternoster-SP for classes 1 and 2 (for all network loads) and at least an 87% reduction for class 3 and 4 traffic.
  • Figure 9 shows the difference between LJB-DRR and classical-DRR. It can be seen that, for all network loads, the maximum reduction ratio of the latency bound of LJB-DRR when compared to classical-DRR is approximately 45%. The same observation is found for the jitter bound in Figure 9b.
  • the LJB-DRR two-dimensional scheduling scheme implemented by the methodology, and configured to be implemented by the one or more processors, of this disclosure leads to significant improvements in asynchronous network traffic scheduling.

Abstract

The present disclosure provides a data processing apparatus for implementing a two-dimensional scheduling methodology for scheduling data packets, the data processing apparatus comprising one or more processors configured to process data packets, each data packet belonging to a class and each class having a predetermined first transmission priority associated therewith, the one or more processors configured to: determine a transmission cycle time period for a transmission cycle, within which data packets are transmitted; receive, for one or more of a plurality of classes, data packets of data to be transmitted, each class having a transmission time; schedule in a first transmission cycle the data packets of the plurality of classes for transmission in dependence on the first transmission priority and the transmission cycle time period.

Description

A DEVICE AND METHODOLOGY FOR ASYNCHRONOUS DETERMINISTIC SCHEDULING IN NETWORKS AT LARGE SCALE

FIELD OF THE INVENTION
This disclosure relates to an apparatus and method for implementing a two-dimensional scheduling methodology for scheduling network packets by introducing an additional parameter of transmission cycle time period for a plurality of classes of data packets.
BACKGROUND
In the transmission of data across any network there is a need to employ some form of scheduling scheme in order to organise the traffic transmitted over and around the network. There are many ways in which this can be achieved. Traditional scheduling schemes such as Generalized Processor Sharing (GPS), Fair Queuing (FQ), Weighted Fair Queuing (WFQ), Round Robin (RR), Deficit Round Robin (DRR) or Weighted Round Robin (WRR) are designed to improve fairness between different transmission flows and allow data to be transmitted in a fair manner. These schemes normally rely on mathematical tools (for example queuing theory, network calculus) to compute the average waiting time or maximum latency bound of data packet transmission. It is well-known that the external arrival process in queueing theory (e.g., latency analysis of a PP scheduler) necessarily takes the form of a Poisson distribution; however, this does not accurately describe the real-world distribution of data traffic arrivals. On the other hand, when latency bounds are calculated using network calculus, they suffer from a number of drawbacks: for example, the bounds are too loose, they compromise network utilization, or they are too costly to compute. Such drawbacks are undesirable and thus alternate solutions have been sought.
One solution that has been explored is the use of algorithms that are specifically designed for end-to-end (E2E) latency guarantees. Some examples of such algorithms are Rate Controlled Service Discipline (RCSD)/Weighted Fair Queuing (WFQ)/Virtual Circuit (VC)/Jitter-Earliest Due Date/Urgent-based Shaper (UBS). Performing scheduling using these schemes is usually achieved at the cost of increased implementation complexity, for example by introducing per-flow queuing, sorted queues, a large number of FIFO queues, maintenance of flow state, or time/frequency synchronization. Such complexity cannot be supported by existing commodity hardware.
SUMMARY
A data processing apparatus for implementing a two-dimensional scheduling methodology for scheduling data packets, the data processing apparatus comprising one or more processors configured to process data packets, each data packet belonging to a class and each class having a predetermined first transmission priority associated therewith, the one or more processors configured to: determine a transmission cycle time period for a transmission cycle, within which data packets are transmitted; receive, for one or more of a plurality of classes, data packets of data to be transmitted, each class having a transmission time; schedule in a first transmission cycle the data packets of the plurality of classes for transmission in dependence on the first transmission priority and the transmission cycle time period. This allows multi-class real-time traffic to be supported with guaranteed latency and jitter bound.
A data processing apparatus as above, wherein the one or more processors are configured to schedule the data packets such that a total transmission time of the scheduled data packets for each of the plurality of classes in the first transmission cycle is equal to the transmission cycle time period. This allows the complete transmission cycle time period to be used during basic scheduling of this disclosure.
A data processing apparatus as above, wherein the one or more processors are configured to further schedule the data packets based on either a strict priority of the plurality of classes and/or a weight assigned to each of the plurality of classes. This allows data packets from the plurality of classes to be prioritised to avoid delaying higher priority traffic.
A data processing apparatus as above, wherein the one or more processors are further configured to schedule data packets of the plurality of classes for transmission in a subsequent transmission cycle of the determined transmission cycle time period, in dependence on the first transmission priority and the transmission cycle time period. This allows for subsequent transmission cycles to schedule traffic in the same manner as a first transmission cycle.
A data processing apparatus as above, wherein the data packets are scheduled in each transmission cycle using a round robin scheme. This provides a scheduling scheme with which to schedule traffic.
A data processing apparatus as above, wherein the round robin scheme is a deficit round robin scheme. This allows a deficit counter to be incorporated in the scheduling scheme that can be used to inform further scheduling decisions.
A data processing apparatus as above, wherein the one or more processors are configured, in scheduling the data packets of the plurality of classes for transmission in the first transmission cycle, to: determine a total transmission time of the scheduled data packets in comparison to the determined transmission cycle time period; then if a total transmission time of the scheduled data packets for each of the plurality of classes in a first transmission cycle is less than the determined transmission cycle time period, schedule data packets for each of the plurality of classes from a subsequent cycle of the round robin scheme based on the first transmission priority and/or a weight assigned to each of the plurality of classes until the total transmission time of the scheduled data packets for each of the plurality of classes is equal to the transmission cycle time period, or if a total transmission time of the scheduled data packets for each of the plurality of classes is equal to the transmission cycle time period, schedule data packets in a subsequent transmission cycle. This allows for early transmission of high priority data packets when non-uniform traffic is received and provides optimal scheduling during an uneven flow rate of traffic.
A data processing apparatus according to claims 4 or 6, wherein the one or more processors are configured to schedule data packets for each of the plurality of classes in the subsequent transmission cycle in dependence on the first transmission priority and the transmission cycle time period, and/or such that a total transmission time of the scheduled data packets for each of the plurality of classes in the subsequent transmission cycle is equal to the transmission cycle time period. This allows for the subsequent transmission cycles to schedule traffic based on the transmission cycle time period thus providing latency and jitter bound guarantees.
A method of data processing for scheduling the transmission of data packets, each data packet belonging to a class and each class having a predetermined first transmission priority associated therewith, the method comprising: determining a transmission cycle time period for a transmission cycle, within which data packets are transmitted; receiving, for one or more of a plurality of classes, data packets of data to be transmitted, each class having a transmission time; scheduling in a first transmission cycle the data packets of the plurality of classes for transmission in dependence on the first transmission priority and the transmission cycle time period. This allows multi-class real-time traffic to be supported with guaranteed latency and jitter bound.
A method of data processing as above, wherein the data packets are scheduled such that a total transmission time of the scheduled data packets for each of the plurality of classes in the first transmission cycle is equal to the transmission cycle time period. This allows the complete transmission cycle time period to be used during basic scheduling of this disclosure.
A method of data processing as above, further comprising scheduling the data packets based on either a strict priority of the plurality of classes and/or a weight assigned to each of the plurality of classes. This allows data packets from the plurality of classes to be prioritised to avoid delaying higher priority traffic.
A method of data processing as above, further comprising scheduling data packets of the plurality of classes for transmission in a subsequent transmission cycle of the determined transmission cycle time period, in dependence on the first transmission priority and the transmission cycle time period. This allows for subsequent transmission cycles to schedule traffic in the same manner as a first transmission cycle.
A method of data processing as above, wherein the data packets are scheduled in each transmission cycle using a round robin scheme. This provides a scheduling scheme with which to schedule traffic.
A method of data processing as above, wherein the round robin scheme is a deficit round robin scheme. This allows a deficit counter to be incorporated in the scheduling scheme that can be used to inform further scheduling decisions.
A method of data processing as above, wherein the method further comprises, in scheduling the data packets of the plurality of classes for transmission in the first transmission cycle: determining a total transmission time of the scheduled data packets for each of the plurality of classes in comparison to the determined transmission cycle time period; then if a total transmission time of the scheduled data packets for each of the plurality of classes in a first transmission cycle is less than the determined transmission cycle time period, schedule data packets for each of the plurality of classes from a subsequent cycle of the round robin scheme based on the first transmission priority and/or a weight assigned to each of the plurality of classes until the total transmission time of the scheduled data packets for each of the plurality of classes is equal to the transmission cycle time period, or if a total transmission time of the scheduled data packets for each of the plurality of classes is equal to the transmission cycle time period, schedule data packets in a subsequent transmission cycle. This allows for early transmission of high priority data packets when non-uniform traffic is received and provides optimal scheduling during an uneven flow rate of traffic.
A method of data processing as above, wherein the method further comprises scheduling data packets for each of the plurality of classes in the subsequent transmission cycle in dependence on the first transmission priority and the transmission cycle time period, and/or such that a total transmission time of the scheduled data packets for each of the plurality of classes in the subsequent transmission cycle is equal to the transmission cycle time period. This allows for the subsequent transmission cycles to schedule traffic based on the transmission cycle time period Δ, thus providing latency and jitter bound guarantees.
BRIEF DESCRIPTION OF THE FIGURES
The present invention will now be described by way of example with reference to the accompanying drawings. In the drawings:
Figure 1 illustrates deadline violation with the linear integer configuration model using the approach proposed by 802.1 Qdv;
Figure 2 illustrates an example of LJB-DRR: queue architecture;
Figure 3 illustrates an example of an LJB-DRR scheduling algorithm (Algorithm 1) ;
Figure 4a illustrates an example of a basic two-dimensional scheduling scheme of this disclosure;
Figure 4b illustrates an example service curve for the basic scheduling scheme of Figure 4a;
Figure 5 illustrates an example of an LJB-DRR scheduling algorithm including early transmission methodology;
Figure 6 illustrates an example of a two-dimensional scheduling scheme of this disclosure that includes early transmission scheduling;
Figure 7a illustrates the data packet latency in a classical DRR scheduling system;
Figure 7b illustrates the data packet latency in a two-dimensional early transmission scheduling system of this disclosure;
Figure 8a illustrates a comparison for all classes between the latency bound in a Paternoster scheduling scheme and that of the scheduling scheme of this disclosure;
Figure 8b illustrates a comparison for all classes between the jitter bound in a Paternoster scheduling scheme and that of the scheduling scheme of this disclosure;
Figure 9 illustrates comparisons for one class between the latency/jitter bounds in a classic scheduling system and that of the scheduling scheme of this disclosure.
DETAILED DESCRIPTION OF THE INVENTION
The IEEE 802.1 time sensitive networking working group have standardized a number of shaping schemes that attempt to address the problem of time-critical scheduling, such examples include using a Qbv (Time-Aware-Shaper) , and Qch (Cyclic Queue Forwarding) . However, these devices need to be synchronized in time or frequency so that extremely low latency and jitter can be guaranteed. The asynchronous traffic shaper (IEEE 802.1 Qcr) defined in the time sensitive networking working group requires networking devices to maintain per-flow state. One drawback of such an approach is that this approach leads to scalability issues when deploying such solutions on large scale networks. IEEE 802.1 Qdv specified a methodology (known as Paternoster-SP without frequency synchronization and Multi-CQF if frequency synchronization is applied) to enqueue packets of a traffic class into three or four queues that rotate at a constant frequency at a single node. An example of queue rotation utilised in Paternoster scheduling is that when a new epoch begins, the “current” queue becomes the “prior” queue, the “next” and “last” queues (and their remaining allocation for each reservation) become “current” , and “next” respectively. The previous prior queue (which should now be empty because the data packets have been transmitted in the previous epoch) becomes the new “last” queue. This Paternoster operation repeats at each epoch such that the four queues alternate during each epoch. However, in Paternoster scheduling the packets from the different priority classes within each queue are scheduled only in accordance with a strict priority scheduling scheme that is intrinsic to each class.
The length of the rotating queues of different traffic classes does not have to be the same. A general guideline proposed by 802.1 Qdv of setting the queue length (or epoch time) associated with each traffic class is to use multiple values of Δ1, where Δ1 is the smallest epoch associated with the highest priority class. In such a configuration which uses multiple values of the epochs, this leads to a large latency and jitter bound. Conversely, when a linear epoch configuration model is employed, this generates a tighter latency and jitter bound, but causes deadline violation for certain traffic classes. An example of this is shown in Figure 1, which illustrates deadline violation with the linear integer configuration model using the approach proposed by the IEEE 802.1 Qdv working group. Figure 1 demonstrates the maximum queuing time for each class of four classes at a single node. The dotted line for each class shown in the figure indicates the delay bound. The square points and line demonstrate epochs configured with a linear integer model, while the triangles and associated line demonstrate the epochs configured with a multiple integer model. As can be seen in Figure 1, the linear model leads to deadline violation with a maximum queuing time exceeding the maximum delay bound, where Q represents the queue length.
The asynchronous traffic shaper defined in IEEE 802.1 Qcr needs to maintain per-flow state at each networking node, leading to a scalability issue. Traditional scheduling schemes such as WRR/DRR/SP are designed to guarantee flow fairness (i.e., share of bandwidth) rather than bounded latency and jitter. In SP, low priority traffic is starved by higher priority traffic. In WRR/DRR, the latency bound can be computed using network calculus, but no jitter bound can be computed. Other schemes such as RCSD or CJVC require special hardware to re-order packets.
The embodiments of this disclosure address the issues discussed in past scheduling methods. The present disclosure is designed to solve the deadline violation issue described above. The general model described in this disclosure is more flexible than the state of the art and supports a wider range of application cycle times and achieves lower implementational complexity.
In order to address the above problems, the present disclosure introduces a two-dimensional scheduling/shaping methodology known as LJB-DRR (Latency-Jitter Bounded Deficit Round Robin), an asynchronous and lightweight scheduling methodology to guarantee the Service Level Agreement (SLA) requirements of multiple service classes. LJB-DRR introduces traffic awareness on top of the classical deficit round robin (DRR) scheduling scheme. With its two parameters, i.e., weight and scheduling duration, it dispenses with time/frequency synchronization and supports multi-class SLA guarantees.
LJB-DRR is a two-dimensional scheduling methodology to guarantee latency and jitter bound for service classes with different SLA requirements. The scheduling methodology introduces a time-based scheduling parameter Δ, representing a transmission cycle time period, on top of classical scheduling schemes that aims for bandwidth guarantee, e.g., classical-DRR. A novel two-dimensional scheduling methodology that avoids interference from low priority classes to high priority traffic classes (as in the case of classical-DRR) is also described herein as well as a set of rules to define the basic scheduling method during each transmission cycle time period Δ so that the latency/jitter bound of each service class can be satisfied. Furthermore, this disclosure provides a set of rules to define the early transmission phase so that the latency/jitter bound of a high priority class can be further reduced.
In particular, the present disclosure introduces a data processing apparatus and method for implementing a two-dimensional scheduling methodology for scheduling data packets. The data processing apparatus may comprise one or more processors as appropriate and these one or more processors may be configured to process data packets, each data packet belonging to a (traffic) class and each class having a predetermined first transmission priority. The first transmission priority may be the strict transmission priority of that class but may alternatively be an assigned transmission priority.
The present disclosure will describe one example of a two-dimensional scheduling methodology relating to LJB-DRR that, as shown in Figure 2, requires only a single queue per service class per port p. For the purpose of this disclosure the queue length of a class i is defined as σi, where a high priority service class is assigned a smaller σi (smaller traffic burst) and a lower priority service class is assigned a larger σi (larger traffic burst). The present disclosure introduces a periodic scheduling duration Δ, referred to herein as a transmission cycle time period, where a new round of scheduling repeats every Δ period. For the purpose of this disclosure, C is defined as the link capacity (in bits per second) and λi is defined as the fraction of bandwidth share that is guaranteed for class i traffic. In LJB-DRR, the weight ωi associated with class i may be computed as ωi = C·λi·Δ, and may define the maximum number of bits that can be scheduled from class i in each Δ period. Since LJB-DRR is an asynchronous scheme, no clock/frequency synchronization techniques are needed. The present disclosure will describe two primary embodiments, in that the scheduling algorithm is composed of a basic scheduling phase and an early transmission phase. Both of these embodiments may be combined as part of the same scheduling algorithm, if so desired. In this disclosure a number of variables will be used and these may be defined as follows: N denotes the number of queues Q; Ω = [ω1, ω2, …, ωN] denotes the vector of weights, where ωi is the weight of class i; and DC = [c1, c2, …, cN] denotes the vector of deficit counters, where ci is the deficit counter of class i and may be initialised as ci = ωi.
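The weight computation ωi = C·λi·Δ can be sketched as follows. This is a minimal illustration, not from the disclosure; the function name and the bandwidth-share values are assumptions, chosen so that the result matches the 2/2/3/2 packet budget of the worked example of Figure 4a.

```python
# Sketch of the LJB-DRR weight computation w_i = C * lambda_i * delta.
# Function name and example values are illustrative assumptions.

def ljb_drr_weights(link_capacity_bps, bandwidth_fractions, delta_s):
    """Maximum number of bits each class may send per cycle period delta."""
    return [link_capacity_bps * frac * delta_s for frac in bandwidth_fractions]

# Example: C = 1 Gbps, delta = 10 microseconds, four classes.
weights = ljb_drr_weights(1e9, [0.2, 0.2, 0.3, 0.2], 10e-6)

# With fixed 1000-bit packets, the per-cycle packet budget of each class:
packets_per_cycle = [round(w) // 1000 for w in weights]
```

With these assumed shares, the weights come to 2000, 2000, 3000 and 2000 bits per cycle, i.e., 2, 2, 3 and 2 packets of 1000 bits.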
Figure 2 illustrates a number of classes from 1 to N, each comprised of data packets 201 and representing traffic that is latency/jitter guaranteed, depicted by the dotted line in Figure 2. As can be seen in Figure 2, the two-dimensional scheduling methodology may be configured to receive, for one or more of a plurality of classes, data packets to be transmitted during the transmission cycle time period Δ that has been determined. Each of the plurality of classes that are comprised of data packets may have a set transmission time. This transmission time may represent the time taken for a data packet of that class to be transmitted or, alternatively, for all the data packets of that class to be transmitted in a transmission cycle. The one or more processors configured to implement the LJB-DRR scheduling methodology may then be configured to schedule, in a first transmission cycle, the data packets of the plurality of classes for transmission in dependence on the first transmission priority and the transmission cycle time period. This scheduling may be performed based on a calculated weight of each of the classes and/or data packets within the plurality of classes. In addition to the plurality of classes comprised of data packets for scheduling, there may also be a class of best effort data packets that may also be scheduled. Ordinarily these data packets will have the lowest priority, but this is not always the case, and any priority may be set for these data packets.
Example pseudocode representing the basic scheduling scheme can be seen in Figure 3 of the present disclosure. The pseudocode and function thereof will now be described in relation to Figure 3 and the variables described above. Figure 3 shows Algorithm 1 relating to the Latency Jitter Bounded Deficit Round Robin that may be implemented on the data processing apparatus of this disclosure. In the initialization stage of Algorithm 1, the weighting of each class may be set in lines 4 to 6, wherein each class is allowed to transmit at most ωi bits during the transmission cycle time period Δ. This initialisation step is optional and may be performed by a separate apparatus prior to the basic scheduling loop being performed by the primary apparatus of this disclosure.
As shown in Algorithm 1 (Figure 3), the one or more processors of the data processing apparatus may be configured to reset the deficit counter to ci = ωi every Δ period (as seen in lines 8-10 of Algorithm 1). During each transmission cycle time period Δ, traffic (data packets) from different service classes may be scheduled based on a weight ωi of each of the plurality of classes, from high priority class to low priority class (line 11 of Algorithm 1). The weight ωi may represent a first transmission priority or, alternatively, the first transmission priority may be the strict priority of the class and the weight may be used to supplement this first transmission priority. This means that the traffic from different classes may be scheduled based on DRR from high priority classes to low priority classes based on the transmission cycle time period Δ.
As shown in lines 12 to 16 of Algorithm 1, if the queue qi of a service class i is not empty and if the deficit counter ci is greater than the packet size at the head of the queue, the data packets may then be transmitted and the value of the counter may be decremented by the packet size (ci ← ci − s, where s is the packet size in bits). Once the queue is empty or the value of the counter is insufficient to transmit a head-of-line packet, the scheduler will skip to the next class. Packets from class i may be scheduled only when ci > 0 during the basic scheduling methodology.
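The per-class transmit loop just described can be sketched as follows. This is a simplified model, not the patented implementation; the names are illustrative, queues hold packet sizes in bits, and the counter check is written as ci ≥ s (an assumption about how "greater than the packet size" is realised).

```python
from collections import deque

def basic_scheduling_phase(queues, weights):
    """One cycle of the basic LJB-DRR phase over per-class FIFO queues.

    queues  : list of deques (highest priority first), each holding
              packet sizes in bits.
    weights : per-class weights w_i (maximum bits per cycle period delta).
    Returns (transmitted, deficit): packet sizes sent per class and the
    leftover deficit counters c_i after the phase.
    """
    deficit = list(weights)                 # c_i is reset to w_i each cycle
    transmitted = [[] for _ in queues]
    for i, q in enumerate(queues):          # high priority to low priority
        # Send head-of-line packets while the counter covers them,
        # decrementing c_i by the packet size s; then skip to next class.
        while q and deficit[i] >= q[0]:
            s = q.popleft()
            deficit[i] -= s
            transmitted[i].append(s)
    return transmitted, deficit
```

Running this with four queues of 1000-bit packets and weights of 2000, 2000, 3000 and 2000 bits yields the 2/2/3/2 per-cycle schedule of the example described below in relation to Figure 4a.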
Furthermore, due to the stochastic nature of the arrival traffic, the output link of a port is not always fully loaded, i.e., the incoming traffic rate fluctuates as a function of time in an asynchronous scheduling system. To account for this fluctuation of traffic within a system, this disclosure also includes that the data scheduling apparatus and method may be configured to perform early transmission of one or more data packets based on the total transmission time of the scheduled data packets within the plurality of classes and the transmission cycle time period. This allows data packets from higher priority classes with short transmission time periods to be scheduled early using available bandwidth that is free during sparse periods of traffic rate flow. This further protects against deadline violation by increasing the efficiency of data packet transmission.
Figure 4a demonstrates an example of basic scheduling performed by the method and/or data processing apparatus of this disclosure, which will now be described.
As shown in Figure 4a, and similar to Figure 2, the example includes the use of four service classes (class 1 with the highest priority and class 4 with the lowest priority) as well as one BE class for this example. A transmission cycle time period is set in the present example as Δ = 10 μs, a link capacity is set as C = 1 Gbps, and an equal packet size of s = 1000 bits is used. In such a scenario, the computed weight ωi (i.e., the maximum number of bits that are allowed to be sent during each Δ) for each class may be calculated as shown in Table 1. In the example, 8 data packets are buffered in each class queue respectively, each class having an associated strict priority and possibly a calculated weighting. In each transmission cycle time period Δ, the data packets from each of the plurality of service classes are scheduled based on a first transmission priority and the transmission cycle time period. In the present example, this means that classes 1 to N (in this example, 4) respectively are capable of transmitting/scheduling for transmission at most 2, 2, 3, 2 data packets within each transmission cycle, e.g., one round/cycle of round robin. In addition, a data packet from a BE class may also be scheduled. The deficit counter of each class is reduced to zero at the end of Δ and reset at the beginning of the next Δ. In this way, the one or more processors may be configured to schedule the data packets such that a total transmission time of the scheduled data packets for each of the plurality of classes in the first transmission cycle is equal to the transmission cycle time period.
The data processing apparatus configured to implement the above scheduling may schedule data packets, in the first transmission cycle (the first transmission cycle time period), based on either a strict priority of the plurality of classes and/or a weight assigned to each of the plurality of classes, where the weight, as discussed above, incorporates the transmission cycle time period Δ. In this way it can be ensured that the transmission delay (in this example 1 microsecond) does not cause deadline violation and that the correct number of data packets from each class is transmitted in a fair and timely manner during the transmission cycle time period. At the end of the first transmission cycle (the end of the duration of the transmission cycle time period) a subsequent transmission cycle may be performed by the apparatus and/or method of this disclosure. In this situation, the one or more processors may be further configured to schedule data packets of the plurality of classes for transmission in a subsequent transmission cycle of the determined transmission cycle time period, in dependence on the first transmission priority and the transmission cycle time period. The same steps as in the first transmission cycle may repeat in the subsequent transmission cycle during the second transmission cycle time period. The transmission cycle time period may be the same for every transmission cycle executed by the one or more processors of the data processing apparatus of this disclosure.
As can be seen in Figure 4b, a data versus time graph demonstrates the amount of data from each class transmitted per transmission cycle time period. Each class increments by the number of data packets that are allowed to be transmitted per transmission cycle time period, such that class 1 401, class 2 402, class 3 403, class 4 404 and the BE class 405 have predictable step functions as their service curves.
Figure 5 shows an example in which early transmission is incorporated into the scheduling methodology shown in Algorithm 1. As can be seen in Figure 5, lines 7 to 18 relate to the basic scheduling described above, whereas lines 19 to 26 relate to further commands that may be implemented by the one or more processors of the data processing apparatus or method to perform early transmission. The early transmission portion of the algorithm can be seen in Figure 5 within the dotted lines.
Early transmission allows low latency (high priority) traffic to transmit as part of an earlier transmission cycle by using the remaining bandwidth left by other classes within a transmission cycle time period Δ, as seen in Figure 5.
As a further definition, the remaining deficit counter cr=c1+c2+…+cN represents the deficit counter left over after the basic scheduling phase within a transmission cycle time period Δ. This may be an alternate representation of the total transmission time. When the Algorithm of Figure 5 is executed by the one or more processors of the data processing apparatus of this disclosure, a class i may stay in the basic scheduling mode or employ the early transmission mode. Early transmission, however, is recommended for high priority traffic, especially if a transmission cycle time period (scheduling period) Δ has not yet been completed, e.g., if some time remains in the transmission cycle time period, and the remaining deficit counter cr is larger than zero, e.g., if data packets remain ready for transmission in one or more of the plurality of classes. In such cases, the one or more processors of the present disclosure may be configured to execute the early transmission Algorithm shown in Figure 5. In doing so, if early transmission is enabled for class i and its buffer is not empty, this class is allowed to transmit at least some of its cr bits during the remaining transmission cycle time period for the first transmission cycle. However, if the queue (containing data packets for transmission) of class i is empty and the remaining counter is cr > 0, the algorithm skips to the next class. This is shown in lines 20 to 26 of the Algorithm shown in Figure 5.
Said another way, if a total transmission time of the scheduled data packets for each of the plurality of classes in a first transmission cycle is less than the determined transmission cycle time period, data packets are scheduled for each of the plurality of classes from a subsequent cycle of the round robin scheme, based on the first transmission priority and/or a weight assigned to each of the plurality of classes, until the total transmission time of the scheduled data packets for each of the plurality of classes is equal to the transmission cycle time period. The total transmission time of the data packets may be thought of as the sum of the transmission delays for each packet. In the case in which a class has no more data packets to transmit in the current transmission cycle, the algorithm may skip to the class with the next highest priority and schedule a data packet for transmission from that class early instead.
Once a data packet is transmitted or scheduled for transmission during the early transmission phase, the remaining counter cr may be decreased accordingly (as seen in line 24 of the Algorithm of Figure 5) such that, once cr reaches zero, no further data packets can be scheduled for transmission in the current transmission cycle. Optionally, this may continue until the transmission cycle time period is completely utilised by the data packets from the plurality of classes. If no class is enabled with early transmission, best effort (BE) traffic can utilize the remaining counter cr. The one or more processors may also be configured to enable and disable one or more of the classes for early transmission. This may be done via user input or based on learnt parameters.
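The early transmission phase described above can be sketched as follows. This is an illustrative reading of lines 19 to 26 of the Algorithm of Figure 5, not its exact pseudocode: the remaining counter cr accumulates the unspent per-class budgets from the basic phase, and classes enabled for early transmission (highest priority first) may consume it; the fallback in which BE traffic uses any still-unspent counter is omitted for brevity. The buffer state reproduces the t=0 situation of Figure 6.

```python
# Sketch of one transmission cycle with the early-transmission phase.
from collections import deque

def cycle_with_early_tx(queues, budgets, early_enabled):
    """One transmission cycle: basic phase, then early transmission.

    queues        -- dict: class -> deque of packets, highest priority first
    budgets       -- dict: class -> packets allowed per cycle
    early_enabled -- set of classes permitted to transmit early
    """
    scheduled = []
    c_r = 0                               # remaining deficit counter
    for cls, q in queues.items():         # basic phase (strict priority)
        counter = budgets[cls]
        while counter > 0 and q:
            scheduled.append(cls)
            q.popleft()
            counter -= 1
        c_r += counter                    # unspent budget accumulates in c_r
    for cls, q in queues.items():         # early-transmission phase
        if cls not in early_enabled:
            continue                      # class not enabled: skip it
        while c_r > 0 and q:              # empty queue: falls through to next
            scheduled.append(cls)
            q.popleft()
            c_r -= 1
    return scheduled

# Buffer state at t=0 in Figure 6 (BE assumed to hold ample packets).
queues = {"class1": deque([0] * 5), "class2": deque([0] * 2),
          "class3": deque([0] * 3), "class4": deque([0] * 1),
          "BE": deque([0] * 8)}
budgets = {"class1": 2, "class2": 2, "class3": 3, "class4": 2, "BE": 1}
print(cycle_with_early_tx(queues, budgets, early_enabled={"class1"}))
# -> ['class1', 'class1', 'class2', 'class2', 'class3', 'class3', 'class3',
#     'class4', 'BE', 'class1']
```

Class 4 only fills one of its two budgeted slots, so cr=1 survives the basic phase and the enabled class 1 transmits one extra packet, using the full cycle.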
Figure 6, which demonstrates an example of the early transmission scheme of this disclosure that may be implemented as a method and/or as part of the data processing apparatus, will now be described. Figure 6 demonstrates one example of how data packets may be scheduled when an early transmission scheduling method is employed by the method and data processing apparatus. As shown in the example of Figure 6, at time t=0μs, 5 packets are buffered in the 1st priority queue (class 1), 2 packets are buffered in the 2nd priority queue (class 2), 3 packets are buffered in the 3rd priority queue (class 3) and 1 packet is buffered in the 4th priority queue (class 4). In the same way as in the basic scheduling methodology, data packets are scheduled based on the transmission cycle time period and the priority. In the example the transmission cycle time period is set to 10 microseconds, and at time t=10μs, 2 packets arrive and are buffered in the 2nd priority queue, 3 packets arrive and are buffered in the 3rd priority queue, and 4 packets arrive and are buffered in the 4th priority queue. This can be seen in Figure 6, wherein it can also be seen that the transmission delay for each data packet is 1 microsecond for the purpose of this example.
In the example of Figure 6, in a first scheduling cycle, the deficit counter for each service class is initiated as c1=2, c2=2, c3=3, c4=2, cBE=1 according to the weights. The early transmission scheduling shown in Figure 6 can be summarised as follows. At the start of the first transmission cycle, a first class (class 1) has 5 data packets in its buffer ready for transmission/scheduling, a second class (class 2) has 2 data packets in its buffer, a third class (class 3) has 3 packets in its buffer, a fourth class (class 4) has 1 packet in its buffer and the best effort class has multiple data packets ready for transmission/scheduling. As previously described, in the example of Figure 6 each of the classes is limited in the number of data packets it can schedule based on the properties described in relation to the basic scheduling above. The same limitations to the classes may apply here. As such, during a first transmission cycle within a transmission cycle time period, two data packets may be scheduled by the one or more processors from class 1, utilising all of that class's deficit counter and leaving three data packets in the buffer queue for that class, and two data packets may be scheduled by the one or more processors from class 2, utilising all of that class's deficit counter and leaving the buffer queue empty until new data packets for transmission arrive. Similarly, three data packets may be scheduled by the one or more processors from class 3, again utilising all of that class's deficit counter. In relation to class 4, however, only one data packet is present in the buffer and the remaining deficit counter is 1, as class 4 was capable of transmitting two data packets but, in the example of Figure 6, the irregular traffic means that no such second data packet for class 4 is present.
As such, the one or more processors may be configured to schedule the next data packet from the best effort class, leaving transmission time remaining within the transmission cycle time period, e.g., 1 microsecond. The one or more processors may then apply early transmission to class 1, wherein after the basic scheduling described above there is in total cr=1 counter left in the first cycle, e.g., 1 microsecond in the example. Class 1, being the highest priority class and having the appropriate weighting, may then utilise the remaining counter to transmit one data packet at the end of Δ, thus completing the transmission cycle. After the completion of the first transmission cycle, the deficit counters may be reset (c1=2, c2=2, c3=3, c4=2, cBE=1). The one or more processors/method may perform a second/subsequent transmission cycle wherein the remaining two packets from class 1 are transmitted, followed by two packets from class 2, three packets from class 3, two packets from class 4, etc., as shown in Figure 6.
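Laying the first-cycle schedule just described out on a timeline makes the timing concrete. The packet order and the 1 μs per-packet transmission delay are taken from the Figure 6 example; the "c1-early" label for the early-transmitted class 1 packet is an illustrative name.

```python
# Timeline sketch for the first transmission cycle of Figure 6.
schedule = ["c1", "c1", "c2", "c2", "c3", "c3", "c3", "c4", "BE", "c1-early"]
TX_DELAY_US = 1   # transmission delay per packet in this example

timeline = [(i * TX_DELAY_US, (i + 1) * TX_DELAY_US, pkt)
            for i, pkt in enumerate(schedule)]
for start, end, pkt in timeline:
    print(f"{start:>2}-{end:<2} us: {pkt}")

# The early class 1 packet occupies the final microsecond, so the full
# 10 us transmission cycle time period is utilised.
assert timeline[-1] == (9, 10, "c1-early")
```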
In other words, while the scheduling of Figure 6 proceeds in the same way as described for the basic scheme, due to traffic fluctuations there may not be adequate traffic to completely utilise the transmission cycle time period in the first transmission cycle and, as shown in Figure 6, only 9 microseconds of the available 10 microseconds are initially scheduled for data packet transmission (Table 2). Table 2 provides the same calculations of the transmission cycle time period as in Table 1, with similar properties. Instead of beginning a subsequent transmission cycle with a further transmission cycle time period, the early transmission scheduling scheme that may be employed by the data processing apparatus of the present disclosure may determine a total transmission time of the scheduled data packets (the aggregate of the transmission delays for each of the data packets scheduled in the first transmission cycle) and compare it to the determined transmission cycle time period. This may identify that the transmission cycle time period is not being fully utilised and that it is therefore possible to schedule a further data packet for transmission, thus utilising the full transmission cycle time period. In such a case, if a total transmission time of the scheduled data packets for each of the plurality of classes in a first transmission cycle is less than the determined transmission cycle time period (in the example of Figure 6, less than the 10 microseconds), data packets are scheduled for at least one of the plurality of classes from a subsequent cycle of the round robin scheme, based on the first transmission priority and/or a weight assigned to each of the plurality of classes, until the total transmission time of the scheduled data packets for each of the plurality of classes is equal to the transmission cycle time period.
However, if the total transmission time of the scheduled data packets for each of the plurality of classes is equal to the transmission cycle time period (as it is in the basic scheduling scheme, or in a case where traffic is uniform or abundant), data packets are scheduled in a subsequent transmission cycle, e.g., the scheduler moves to a second transmission cycle beginning a subsequent transmission cycle time period.
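The decision between these two branches can be expressed as a small helper (illustrative; the function name and the microsecond units are assumptions):

```python
# Illustrative helper for the branch just described: while the scheduled
# packets do not yet fill the transmission cycle time period, early
# transmission continues; once they do, the scheduler moves to the next cycle.
def next_action(total_tx_time_us, delta_us):
    return "early_transmission" if total_tx_time_us < delta_us else "next_cycle"

print(next_action(9, 10))    # -> early_transmission  (Figure 6, first cycle)
print(next_action(10, 10))   # -> next_cycle          (cycle fully utilised)
```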
In the example shown in Figure 6, when it is determined that there are not sufficient data packets in class 4 to meet the transmission cycle time period requirement, and thus only 9 microseconds are scheduled of the possible 10 microseconds, the one or more processors may be configured to schedule an additional data packet from a high priority class, e.g., class 1, in advance of its usual transmission time in a subsequent cycle, thereby providing an opportunity to increase the efficiency of data packet transmission. In such a case the advanced data packet may be scheduled at the end of the transmission cycle time period, after the normal scheduling has taken place, and the scheduling may be performed based on the priority and/or the weight assigned to each of the plurality of classes comprising data packets.
Once such early transmission has taken place and the transmission cycle time period has been utilised efficiently by ensuring that the maximum number of data packets is scheduled for transmission, the one or more processors may be configured to begin a subsequent transmission cycle, either using the basic scheduling scheme described above or again using the early transmission scheduling scheme. The method of scheduling therefore continues in the subsequent cycle as described in one of the two scenarios discussed above. In the example shown in Figure 6, since new data packets (traffic) for transmission arrive at t=10μs, basic scheduling can be resumed without the need for early transmission, as sufficient traffic from each class is present in the buffer.
Figures 7a and 7b demonstrate the differences in data packet scheduling between classical Deficit Round Robin (Figure 7a) and Latency Jitter Bounded-Deficit Round Robin (Figure 7b). Note that classical DRR does not associate its weights with a period of time such as the transmission cycle time period used in LJB-DRR scheduling; instead, data packets are transmitted from the service classes in a purely weighted round robin fashion. The first cycle in classical DRR shown in Figure 7a ends at t=9μs. From t=10μs, a new cycle starts and schedules packets based on their weights from class 1 to class 4. The last data packet in class 1 will therefore be transmitted at time t=19μs. The worst-case latency for class 1 traffic is therefore 19μs, whereas in the LJB-DRR scheduling scheme the worst latency for class 1 traffic (data packets) is 11μs (as shown in Figure 7b), a reduction ratio of almost 50%. This improvement seen when employing LJB-DRR is due to the early transmission that may be performed as part of the scheduling method.
The aforementioned scheduling methodologies do not depend on any clock or frequency synchronization scheme (e.g., IEEE 1588v2). This allows the scheduling schemes disclosed herein to be completely asynchronous. Furthermore, the data processing apparatus and method of data processing do not require a network device to maintain per-flow state and are hence scalable as the network size grows. Combining the transmission cycle time period Δ (periodic scheduling duration) with weighted fairness based on bandwidth (ωi), the LJB-DRR scheduling scheme implemented by the apparatus and method of this disclosure is able to avoid interference of low priority traffic with high priority traffic. The disclosed apparatus and method are therefore capable of supporting multi-class real-time traffic with a guaranteed latency and jitter bound. This greatly reduces the complexity of hardware implementation (compared to RSCP/WFQ/VC/Jitter-EDD) and is more applicable in large-scale networks, with millions of flows, compared to IEEE 802.1 Qch and Qcr. An asynchronous scheme is normally subject to errors caused by clock and frequency drift.
Figures 8a and 8b demonstrate a performance comparison between LJB-DRR and Paternoster-SP for a number of different network topologies. To compare the performance of LJB-DRR with Paternoster-SP, a line topology may be set up in the Omnest network simulator, see Figures 8a and 8b. Three different network loads are used, namely 40%, 60% and 80%. As seen in Figure 8a, the maximum queuing delay in milliseconds is plotted for each class for each topology, and Figure 8a demonstrates that, for every topology, the maximum queuing delay is reduced when the two-dimensional scheduling approach of this disclosure is employed. Compared to Paternoster-SP, LJB-DRR reduces the end-to-end queuing delay significantly for all packet classes. Furthermore, as shown by Figure 8a, LJB-DRR achieves at least a 97% reduction in the latency bound compared to the Paternoster-SP scheme for all network loads.
In addition, as shown in Figure 8b, LJB-DRR is also able to reduce the jitter bound compared to Paternoster-SP. The scheduling procedure of this disclosure achieves at least a 69% reduction compared to Paternoster-SP for classes 1 and 2 (for all network loads) and at least an 87% reduction for class 3 and class 4 traffic.
Figure 9 shows the difference between LJB-DRR and classical DRR. It can be seen that, for all network loads, the maximum reduction ratio of the latency bound of LJB-DRR compared to classical DRR is approximately 45%. The same observation is made for the jitter bound in Figure 9b.
As such, it can be seen that the LJB-DRR two-dimensional scheduling scheme, implemented by the methodology and configured to be implemented by the one or more processors of this disclosure, leads to significant improvements in asynchronous network traffic scheduling.
The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification  as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that aspects of the present invention may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention. Although the scheduling methodology is described herein as being implemented by a data processing apparatus comprising one or more processors configured to implement the methodology, it should be understood that the scheduling methodology may be implemented as a method having corresponding steps to those discussed in relation to the data processing apparatus.

Claims (16)

  1. A data processing apparatus for implementing a two-dimensional scheduling methodology for scheduling data packets, the data processing apparatus comprising one or more processors configured to process data packets, each data packet belonging to a class and each class having a predetermined first transmission priority associated therewith, the one or more processors configured to:
    determine a transmission cycle time period for a transmission cycle, within which data packets are transmitted;
    receive, for one or more of a plurality of classes, data packets of data to be transmitted, each class having a transmission time;
    schedule in a first transmission cycle the data packets of the plurality of classes for transmission in dependence on the first transmission priority and the transmission cycle time period.
  2. A data processing apparatus according to claim 1, wherein the one or more processors are configured to schedule the data packets such that a total transmission time of the scheduled data packets for each of the plurality of classes in the first transmission cycle is equal to the transmission cycle time period.
  3. A data processing apparatus according to any one of claims 1 or 2, wherein the one or more processors are configured to further schedule the data packets based on either a strict priority of the plurality of classes and/or a weight assigned to each of the plurality of classes.
  4. A data processing apparatus according to any one of claims 1 to 3, wherein the one or more processors are further configured to schedule data packets of the plurality of classes for transmission in a subsequent transmission cycle of the determined transmission cycle time period, in dependence on the first transmission priority and the transmission cycle time period.
  5. A data processing apparatus according to any one of claims 1 to 4, wherein the data packets are scheduled in each transmission cycle using a round robin scheme.
  6. A data processing apparatus according to claim 5, wherein the round robin scheme is a deficit round robin scheme.
  7. A data processing apparatus according to claim 5 or 6, wherein the one or more processors are configured, in scheduling the data packets of the plurality of classes for transmission in the first transmission cycle, to:
    determine a total transmission time of the scheduled data packets in comparison to the determined transmission cycle time period; then
    if a total transmission time of the scheduled data packets for each of the plurality of classes in a first transmission cycle is less than the determined transmission cycle time period, schedule data packets for at least one of the plurality of classes from a subsequent cycle of the round robin scheme based on the first transmission priority and/or a weight assigned to each of the plurality of classes until the total transmission time of the scheduled data packets for each of the plurality of classes is equal to the transmission cycle time period, or
    if a total transmission time of the scheduled data packets for each of the plurality of classes is equal to the transmission cycle time period, schedule data packets in a subsequent transmission cycle.
  8. A data processing apparatus according to claims 4 or 6, wherein the one or more processors are configured to schedule data packets for each of the plurality of classes in the subsequent transmission cycle in dependence on the first transmission priority and the transmission cycle time period, and/or such that a total transmission time of the scheduled data packets for each of the plurality of classes in the subsequent transmission cycle is equal to the transmission cycle time period.
  9. A method of data processing for scheduling the transmission of data packets, each data packet belonging to a class and each class having a predetermined first transmission priority associated therewith, the method comprising:
    determining a transmission cycle time period for a transmission cycle, within which data packets are transmitted;
    receiving, for one or more of a plurality of classes, data packets of data to be transmitted, each class having a transmission time;
    scheduling in a first transmission cycle the data packets of the plurality of classes for transmission in dependence on the first transmission priority and the transmission cycle time period.
  10. A method of data processing according to claim 9, wherein the data packets are scheduled such that a total transmission time of the scheduled data packets for each of the plurality of classes in the first transmission cycle is equal to the transmission cycle time period.
  11. A method of data processing according to any one of claims 9 or 10, further comprising scheduling the data packets based on either a strict priority of the plurality of classes and/or a weight assigned to each of the plurality of classes.
  12. A method of data processing according to any one of claims 9 to 11, further comprising scheduling data packets of the plurality of classes for transmission in a subsequent transmission cycle of the determined transmission cycle time period, in dependence on the first transmission priority and the transmission cycle time period.
  13. A method of data processing according to any one of claims 9 to 12, wherein the data packets are scheduled in each transmission cycle using a round robin scheme.
  14. A method of data processing according to claim 13, wherein the round robin scheme is a deficit round robin scheme.
  15. A method of data processing according to claim 13 or 14, the method further comprising, in scheduling the data packets of the plurality of classes for transmission in the first transmission cycle:
    determining a total transmission time of the scheduled data packets for each of the plurality of classes in comparison to the determined transmission cycle time period; then
    if a total transmission time of the scheduled data packets for each of the plurality of classes in a first transmission cycle is less than the determined transmission cycle time period, schedule data packets for each of the plurality of classes from a subsequent cycle of the round robin scheme based on the first transmission priority and/or a weight assigned to each of the plurality of classes until the total transmission time of the scheduled data packets for each of the plurality of classes is equal to the transmission cycle time period, or
    if a total transmission time of the scheduled data packets for each of the plurality of classes is equal to the transmission cycle time period, schedule data packets in a subsequent transmission cycle.
  16. A method of data processing according to claims 12 or 15, wherein the method further comprises scheduling data packets for each of the plurality of classes in the subsequent transmission cycle in dependence on the first transmission priority and the transmission cycle time period, and/or such that a total transmission time of the scheduled data packets for each of the plurality of classes in the subsequent transmission cycle is equal to the transmission cycle time period.
PCT/CN2023/080838 2023-03-10 2023-03-10 A device and methodology for asynchronous deterministic scheduling in networks at large scale WO2024187312A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2023/080838 WO2024187312A1 (en) 2023-03-10 2023-03-10 A device and methodology for asynchronous deterministic scheduling in networks at large scale

Publications (1)

Publication Number Publication Date
WO2024187312A1 true WO2024187312A1 (en) 2024-09-19

Family

ID=92754210

Country Status (1)

Country Link
WO (1) WO2024187312A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101057481A (en) * 2004-11-15 2007-10-17 法国电信公司 Method and device for scheduling packets for routing in a network with implicit determination of packets to be treated as a priority
CN114424507A (en) * 2019-09-26 2022-04-29 三菱电机株式会社 Method for transmitting data packets and device for carrying out said method
CN115567456A (en) * 2022-08-15 2023-01-03 北京邮电大学 Multi-stage circular queue and forwarding scheduling method and system in time-sensitive network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MARTIN SEBASTIEN; MEDAGLIANI PAOLO; LEGUAY JEREMIE: "Network Slicing for Deterministic Latency", 2021 17TH INTERNATIONAL CONFERENCE ON NETWORK AND SERVICE MANAGEMENT (CNSM), IFIP, 25 October 2021 (2021-10-25), pages 572 - 577, XP034036802, DOI: 10.23919/CNSM52442.2021.9615576 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23926671

Country of ref document: EP

Kind code of ref document: A1