
WO2024234289A1 - Adaptive traffic control for distributed networks - Google Patents

Adaptive traffic control for distributed networks

Info

Publication number
WO2024234289A1
Authority
WO
WIPO (PCT)
Prior art keywords
packet
data packets
buffer
data
data packet
Prior art date
Application number
PCT/CN2023/094478
Other languages
French (fr)
Inventor
Ali Munir
Mahmoud Mohamed BAHNASY
Yashar Ganjali
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to PCT/CN2023/094478 priority Critical patent/WO2024234289A1/en
Publication of WO2024234289A1 publication Critical patent/WO2024234289A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/30 Flow control; Congestion control in combination with information about buffer occupancy at either end or at transit nodes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/35 Flow control; Congestion control by embedding flow control information in regular packets, e.g. piggybacking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/40 Flow control; Congestion control using split connections
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/43 Assembling or disassembling of packets, e.g. segmentation and reassembly [SAR]

Definitions

  • the present disclosure generally relates to network technologies, and more particularly to methods, apparatus, and systems for controlling traffic within a network.
  • DCs: datacenters; DCN: DC network; DCI: DC interconnect.
  • a key challenge for distributed networks is handling interference between DCN and DCI traffic.
  • when DCs face periods of increased network demand, data packets will begin to accumulate in the buffer queues of network switches.
  • Local congestion arising within a DC is only detected by the wider network after a full DCI RTT. This feedback delay creates a mismatch between the local and wider network responses, which results in slow and uncoordinated congestion control.
  • An object of embodiments of the present disclosure is to provide improvements in adaptive traffic control.
  • a first aspect of the present disclosure is to provide a method for controlling traffic within networks that include multiple subnetworks coupled together.
  • the method can be performed by a packet-processing device, located in the network, that has a buffer and is configured to receive data packets transiting within a subnetwork of the network and data packets transiting between subnetworks of the network.
  • the method comprises receiving an incoming data packet; and enqueuing the incoming data packet to the buffer when the data packet is transiting within a subnetwork and a proportion between data packets transiting within the subnetwork and data packets transiting between subnetworks is less than a threshold value, or when the incoming data packet is transiting between subnetworks and the buffer has space available to buffer the incoming data packet.
  • the incoming data packet is identified as transiting within a subnetwork or as transiting between subnetworks by identification data belonging to the data packet.
  • the method further comprises dropping the incoming data packet when the incoming data packet is transiting within a subnetwork and the proportion between data packets transiting within the subnetwork and data packets transiting between subnetworks is greater than or equal to the threshold value, or when the incoming data packet is transiting between subnetworks and the buffer is devoid of space available to buffer the incoming data packet.
  • the proportion between data packets transiting within the subnetwork and data packets transiting between subnetworks depends from a rate of arrival at the packet-processing device of the data packets transiting within the subnetwork, and a rate of arrival at the packet-processing device of the data packets transiting between the subnetworks.
  • the rate of arrival at the packet-processing device of the data packets transiting within the subnetwork depends from a ratio between a count of data packets transiting within the subnetwork within a register window and a length of the register window.
  • the rate of arrival at the packet-processing device of the data packets transiting between subnetworks depends from a ratio between a count of data packets transiting between subnetworks within a register window and a length of the register window.
  • the register window comprises 32 bits or 64 bits.
  • the network includes a first transmitter configured to transmit data packets transiting within a subnetwork to the packet-processing device and a first receiver configured to receive the data packets transiting within the subnetwork from the packet-processing device.
  • the network also includes a second transmitter configured to transmit data packets transiting between subnetworks to the packet-processing device and a second receiver configured to receive the data packets transiting between subnetworks from the packet-processing device.
  • the round-trip times between the first transmitter and the first receiver and between the second transmitter and the second receiver may be within ranges that are exclusive to one another.
  • a non-limiting example of a range for the round-trip time between the first transmitter and the first receiver may be a range comprised between 1 μs and 1000 μs. In some embodiments, a non-limiting example of a range for the round-trip time between the second transmitter and the second receiver may be a range comprised between 5 ms and 100 ms.
  • the threshold value depends from a ratio between an unoccupied space of the buffer and a buffer capacity.
  • the threshold value depends from a ratio between an unoccupied space of the buffer and an unreserved capacity of the buffer.
  • the unreserved capacity is defined as a difference between the buffer capacity and a reserved space in the buffer that comprises an adjustable amount of space or a fixed amount of space.
  • the method further comprises enqueuing the incoming data packet to the reserved space in the buffer when the incoming data packet is transiting between subnetworks and the reserved space in the buffer has space available to buffer the incoming data packet.
  • the identification data comprises a source IP address. In some embodiments, the identification data comprises a destination IP address.
  • the method further comprises, when the incoming data packet is enqueued to the buffer or a reserved space in the buffer, assigning a priority for dequeuing to the incoming data packet.
  • Data packets transiting between subnetworks are assigned a priority that is less than the priority assigned to data packets transiting within a subnetwork.
  • the subnetworks each comprise a datacenter. In some embodiments, the subnetworks belong to a distributed network.
  • the method further comprises, when the incoming data packet is enqueued to the buffer or a reserved space in the buffer, dequeuing the incoming data packet according to a First-in-First-Out (FIFO) policy.
  • the method further comprises, when the incoming data packet is enqueued to the buffer or a reserved space in the buffer, dequeuing the incoming data packet according to a Shortest-Remaining-Processing-Time (SRPT) policy.
  • the network uses a Transmission Control Protocol (TCP). In some embodiments, the network uses remote direct memory access (RDMA). In some embodiments, the network uses a User Datagram Protocol (UDP).
  • the network comprises a Wi-Fi network. In some embodiments, the network comprises a cellular network that is 5G or LTE.
  • a second aspect of the present disclosure provides a network element comprising a packet-processing device.
  • the packet-processing device has a buffer and is coupled with subnetworks to receive data packets transiting within a subnetwork and data packets transiting between subnetworks.
  • the packet-processing device is configured to perform the method of the first aspect.
  • Some embodiments of the second aspect may further provide the embodied variations of the first aspect.
  • a third aspect of the present disclosure provides a system comprising a network with two subnetworks coupled together.
  • the system further comprises a first transmitter, which is in a first subnetwork of the two subnetworks and is configured to send first data packets; a second transmitter, which is in the first subnetwork and is configured to send second data packets; a first receiver, which is in the first subnetwork and is configured to receive the first data packets; a second receiver, which is in the second subnetwork and is configured to receive the second data packets; and a packet-processing device, which is in the network and includes a buffer.
  • the packet-processing device is configured to perform the method of the first aspect, wherein the first data packets constitute data packets transiting within a subnetwork and the second data packets constitute data packets transiting between subnetworks.
  • Some embodiments of the third aspect may further provide the embodied variations of the first aspect.
  • a fourth aspect of the present disclosure provides another system comprising a network with two subnetworks coupled together.
  • the system further comprises a first transmitter, which is in the first subnetwork of the two subnetworks and is configured to send first data packets; a second transmitter, which is in the second subnetwork and is configured to send second data packets; a first receiver, which is in the first subnetwork and is configured to receive the first data packets; a second receiver, which is in the first subnetwork and is configured to receive the second data packets; and a packet-processing device, which is in the network and includes a buffer.
  • the packet-processing device is configured to perform the method of the first aspect, wherein the first data packets constitute data packets transiting within a subnetwork and the second data packets constitute data packets transiting between subnetworks.
  • Some embodiments of the fourth aspect may further provide the embodied variations of the first aspect.
  • a fifth aspect of the present disclosure provides a packet-processing network apparatus comprised in a network including two subnetworks, a first and a second subnetwork, coupled together.
  • the apparatus is deployed in the first subnetwork and is configured to receive first data packets transiting within the first subnetwork and second data packets transiting between the first subnetwork and the second subnetwork.
  • the apparatus comprises a buffer, a processor coupled to the buffer, and a tangible processor-readable memory.
  • the tangible processor-readable memory has recorded thereon instructions to be performed by the processor to carry out a set of actions that comprise the method of the first aspect.
  • Embodiments have been described above in conjunction with aspects of the present invention upon which they can be implemented. Those skilled in the art will appreciate that embodiments may be implemented in conjunction with the aspect with which they are described but may also be implemented with other embodiments of that aspect. When embodiments are mutually exclusive, or are incompatible with each other, it will be apparent to those skilled in the art. Some embodiments may be described in relation to one aspect, but may also be applicable to other aspects, as will be apparent to those of skill in the art.
  • FIG. 1 shows a typical distributed computing network topology in operation, where embodiments of the present disclosure may be implemented.
  • FIG. 2 shows a flowchart of an embodiment of a prior art method for AQM at a switch with coexisting DCN and DCI data flows.
  • FIG. 3 shows a flowchart of a method for enqueuing incoming data packets at a switch with coexisting DCN and DCI data flows, according to an embodiment of the present disclosure.
  • FIG. 4 shows a flowchart of a method for adaptive traffic control, according to an embodiment of the present disclosure.
  • FIG. 5A shows an example of an incoming data packet of a DCN flow being enqueued to the buffer of a switch, according to an embodiment of the present disclosure.
  • FIG. 5B shows an example of an incoming data packet of a DCN flow being dropped from the buffer of a switch, according to an embodiment of the present disclosure.
  • FIG. 6 shows an apparatus for adaptive traffic control, according to embodiments of the present disclosure.
  • FIG. 7 shows a schematic of an embodiment of an electronic device that may implement all or part of the methods and features of the present disclosure.
  • embodiments of the present invention are generally directed towards using conditional logic in the enqueuing of data packets at network switches or other appropriate network devices. Some embodiments may adapt to network conditions by conditionally enqueuing data packets according to the specific mixtures of short-haul and long-haul traffic that are being received by the switches. In yet some other embodiments, the conditional logic in the enqueuing of data packets may be used to manage network congestion arising from mixtures of data packets requiring low latency and/or data packets requiring high throughput.
  • FIG. 1 shows an example of a distributed network topology.
  • the network comprises two datacenters (DCs) : DC 101 and DC 102 connected to each other via a wide area network (WAN) 103.
  • DC 101 comprises a router 104 coupled to WAN 103
  • DC 102 comprises router 105 coupled to WAN 103.
  • Nodes 106 to 109 of DC 101 are connected to router 104 through switches 114 or 115.
  • Nodes 110 to 113 of DC 102 are connected to router 105 through switches 116 or 117.
  • the connections between network components may comprise physical links (e.g., ethernet or optical cables) and/or wireless links (e.g., radiowave or microwave) .
  • DC 101 may be referred to as a subnetwork and DC 102 may be referred to as another subnetwork. Both subnetworks are coupled (connected, in communication with) to each other and may define an entire network or may define only a portion of the entire network.
  • node 106 of DC 101 is transmitting data packets to node 113 of DC 102.
  • This transmission produces a DCI flow 118, which transits via switch 114 and router 104 of DC 101, WAN 103, and router 105 and switch 117 of DC 102.
  • DCI flow 118 has an RTT of about 10 milliseconds (ms) .
  • node 107 of DC 101 is transmitting data packets to node 108 of DC 101.
  • This second transmission produces a DCN flow 119, which transits via switch 114, router 104, and switch 115 of DC 101.
  • DCN flow 119 has an RTT of about 200 microseconds (μs).
  • a confluence of the DCI flow 118 and the DCN flow 119 hence arises at switch 114, which may result in interference between DCI flow 118 and DCN flow 119.
  • routers and middleboxes may include switches.
  • the interference between the DCI flow 118 and DCN flow 119 at switch 114 can result in congestion and build-up of data packets within the buffer of switch 114, wherein either or both of the DCI flow 118 and DCN flow 119 may experience high latency, high packet losses and/or low throughput.
  • although the DCs 101 and 102 may control the transmission rates of data flows to ease congestion during periods of increased demand, responses within the DCI flow 118 will lag behind responses within the DCN flow 119 because of the disparity in RTTs (e.g., ~10 ms for DCI flow 118 versus ~200 μs for DCN flow 119). In this manner, congestion feedback cannot be detected and acted upon for a full DCI RTT. The mismatch in responses results in uncoordinated and ineffective congestion control.
  • FIG. 2 shows a flowchart for a typical method of the prior art for AQM with coexisting DCN and DCI data flows.
  • an incoming data packet 201 is received by a switch 202 and is classified 203 as belonging to either a DCN flow (e.g., DCN flow 119) or a DCI flow (e.g., DCI flow 118) .
  • if the data packet 201 belongs to DCN flow 119, then it is given a high priority 204 for enqueuing; alternatively, if the data packet 201 belongs to a DCI flow 118, it is given a low priority 205 for enqueuing.
  • when DCN traffic is prioritized to ensure it has low latency, it will quickly fill up the switch’s buffer because of its smaller RTT. DCI traffic may then experience high packet losses and hence lower throughput. These effects can take a long time to resolve because of the long propagation delays of the DCI traffic (DCI packets).
  • if DCI traffic were to be prioritized over DCN traffic instead, DCI traffic would fill up the buffer to achieve high throughput and then DCN traffic would suffer from high latency.
  • Embodiments of the present disclosure may mitigate interference between DCN traffic and DCI traffic to concurrently enable low latencies for DCN flows and low loss rates and high throughputs for DCI flows.
  • buffer controls implemented through a network switch may resolve traffic congestion quickly by adapting to the specific mixture of DCI flows and DCN flows transiting through the switch.
  • Such dynamic buffer controls may be automated to admit data packets based on whether they belong to DCN flows or DCI flows, the current buffer utilization, and/or the arrival rates of DCN traffic (DCN packets) and DCI traffic (DCI packets) .
  • Embodiments may also comprise reserved space in the buffer for DCI packets in order to mitigate the impact of DCN flows on DCI flows and to guarantee high throughput for DCI flows. Some embodiments may further prioritize DCN packets for dequeuing to ensure low latency for DCN flows.
  • FIG. 3 shows a flowchart of a method according to an embodiment of the present disclosure.
  • the method of the embodiment of FIG. 3 may comprise an adaptive traffic control 301 in switch 202.
  • the adaptive traffic control 301 decides whether to admit or not admit the incoming data packet 201. In deciding whether to admit the incoming data packet 201, the adaptive traffic control 301 may consider, for example, whether incoming data packet 201 belongs to a DCN flow or a DCI flow, the state of buffer utilization, the rates of arrival of DCN traffic (DCN packets) and DCI traffic (DCI packets) , the occupancy of a reserved space in the buffer, and/or any other appropriate factors.
  • if the incoming data packet 201 is admitted, it may proceed to being classified 203 as a DCN flow 119 or a DCI flow 118. Once classified, the data packet may be respectively assigned a high priority for dequeuing 302 or a low priority for dequeuing 303. If the incoming data packet 201 is not admitted, it may proceed to being dropped 304 from switch 202.
  • FIG. 4 shows a flowchart for an adaptive traffic control method according to an embodiment of the present disclosure.
  • an incoming data packet may be received at a connection point of a network.
  • a connection point of a network may be any intersection among connections between transmitting and receiving nodes in the network. These connections may be connected at the intersection through a packet-processing device (network element) such as, for example, a switch, a middlebox, or a router, which may comprise a buffer.
  • the buffer may comprise a queue or a plurality of queues for data packets arriving at the connection point.
  • the DC may be connected to another DC through a wide area network, such that the incoming data packet could be either transiting within the DC of the connection point, as part of a DCN flow, or transiting between the two DCs, as part of a DCI flow.
  • a determination may be made of whether the incoming data packet belongs to a DCN flow or a DCI flow. This may be determined by obtaining identification data associated with or comprised in the incoming data packet, which may, for example, be IP addresses for the source and destination of the data packet, an end-host mark, a type-of-service (ToS) field, a differentiated service field or any other suitable identifier.
  • a threshold buffer availability, B_avail, and a proportional utilization of the buffer by DCN packets, S, may be calculated at action 403.
  • the threshold buffer availability may be defined as the proportion of the buffer that is unoccupied: B_avail = (Q_c - Q_s) / Q_c, where Q_c is the total capacity of the buffer and Q_s is the total utilization of the buffer.
  • the difference between Q_c and Q_s may define an amount of unoccupied space in the buffer. Unoccupied space in the buffer may be space that is not being utilized by data packets.
  • the buffer may have a portion of its capacity reserved for DCI traffic, z.
  • the amount of reserved space for DCI traffic may be adjustable or pre-set (fixed) .
  • the threshold buffer availability may be defined as: B_avail = (Q_c - Q_s) / (Q_c - z).
  • the difference between Q_c and z may define an amount of space in the buffer that is not reserved for DCI traffic (i.e., an unreserved capacity).
  • the proportional utilization of the buffer by DCN packets may be defined as the proportion S of DCN data packets arriving at the connection point: S = DCN_t / (DCN_t + DCI_t), where DCN_t is the amount of incoming DCN data packets and DCI_t is the amount of incoming DCI data packets.
  • the amounts of incoming DCN data packets and DCI data packets may include the incoming data packet currently under consideration at the connection point and may be defined by rates of arrival at the connection point of the respective types of data packets, with the rates being assessed over a register window.
  • the rate of arrival of DCN data packets, r_DCN, and the rate of arrival of DCI data packets, r_DCI, may be calculated as r_DCN = DCN_counts / register_length and r_DCI = DCI_counts / register_length, where DCN_counts and DCI_counts are the respective numbers of DCN and DCI packets received at the connection point within the register window, and register_length is the total number of packets that may be registered in the window.
  • the length of the register window may, for example, be 32 bits or 64 bits or any other suitable number of bits.
  • the threshold buffer availability may be compared to the utilization of the buffer by DCN packets. If S < B_avail, the incoming data packet of a DCN flow may be enqueued to an unreserved space in the buffer at action 405. If S ≥ B_avail, the incoming data packet of the DCN flow may be dropped at action 406.
  • if the incoming data packet is determined to belong to a DCI flow and unreserved space is available, the incoming data packet may be enqueued to an unreserved space in the buffer at action 405. If no unreserved space is determined to be available (i.e., the buffer is devoid of any space for buffering the incoming data packet), the incoming data packet may be dropped at action 406.
  • FIG. 5A shows an example of adaptive traffic control in operation, according to an embodiment of the present disclosure, wherein an incoming data packet 501 of a DCN flow is enqueued at a switch.
  • FIG. 5B shows another example of adaptive traffic control in operation, according to an embodiment of the present disclosure.
  • the incoming data packet 501 of a DCN flow is dropped instead of being enqueued at the switch.
  • S is greater than B_avail, and so the switch decides to drop the incoming data packet 501.
  • the conditions for enqueuing or dropping an incoming data packet may be implemented in an alternate form that is mathematically equivalent to those disclosed hereinbefore.
  • a proportional utilization of the buffer by DCI packets may instead be compared to a threshold.
  • the incoming data packet may be enqueued if this comparison (Equation 6) holds true; if Equation 6 is not true in such a case, the incoming data packet may be dropped.
  • a simple inversion of the ratios defining each of S and B avail in Equations 2 and 3 respectively may be compared instead. In this case, if S > B avail , the incoming data packet may be enqueued, and if S ⁇ B avail , the incoming data packet may be dropped.
  • Embodiments of the present disclosure may dequeue data packets from the buffer according to various methods.
  • enqueued data packets may be assigned a first or a second priority for dequeuing according to whether they belong to a DCI flow or a DCN flow.
  • an enqueued data packet of a DCN flow may be assigned a higher priority for dequeuing than an enqueued data packet of a DCI flow to ensure that DCN traffic has low latency.
  • data packets may be dequeued according to a First-in-First-Out (FIFO) policy or a Shortest-Remaining-Processing-Time (SRPT) policy.
  • a FIFO or SRPT policy may be combined with prioritization policies for DCN data packets or DCI data packets.
  • the network device enabling adaptive traffic control may be in a DC.
  • This DC may be coupled with a second DC or a plurality of other DCs through a wide area network.
  • the coupling of DCs through a wide area network may produce a complex mixture of RTTs for traffic in the network.
  • DCN traffic may have RTTs comprised in a first range and DCI traffic may have RTTs comprised in a second range that does not overlap with the first range (is exclusive) .
  • the first range may be comprised between 1 and 1000 μs while the second range may be comprised between 5 and 100 ms.
  • the first range may be comprised between 1 and 1000 μs while the second range may be comprised between 1 and 500 ms.
  • the switch may belong to a Wi-Fi wireless network or a cellular network, the latter of which may be operating according to the Long-Term Evolution (LTE) or 5G standard.
  • the network device may belong to a network combining DC, wireless and/or cellular subnetworks.
  • the mixture of DCN data packets and DCI data packets may instead be a mixture of data packets requiring low latency and data packets requiring high throughput.
  • Embodiments of the present disclosure may be implemented in networks using the Transmission Control Protocol (TCP), the Quantized Congestion Notification (QCN) standard, remote direct memory access (RDMA), the Bottleneck Bandwidth and Round-trip propagation time (BBR) protocol, the User Datagram Protocol (UDP), and/or other internet standards and transport protocols or any combination thereof.
  • Embodiments of the present disclosure may be implemented using electronics hardware, software, or a combination thereof.
  • some embodiments may be implemented in commodity switches, which may combine both hardware and software components.
  • the invention may be implemented by one or multiple computer processors executing program instructions stored in memory (processor-readable memory) .
  • the invention is implemented partially or fully in hardware, for example using one or more field programmable gate arrays (FPGAs) or application specific integrated circuits (ASICs) to rapidly perform processing operations.
  • FIG. 6 shows an apparatus 600 for adaptive traffic control, according to embodiments of the present invention.
  • the apparatus is located at a connection point 610 of a network.
  • the apparatus 600 includes a network interface 620 and processing electronics 630.
  • the processing electronics 630 may include a computer processor executing program instructions stored in memory, or other electronics components such as digital circuitry, including for example FPGAs and ASICs.
  • the network interface 620 may include an optical communication interface or a radio communication interface, such as a transmitter and receiver.
  • the apparatus 600 may include several functional components, each of which may be partially or fully implemented using the underlying network interface 620 and processing electronics 630.
  • Examples of functional components may include modules for receiving 640 incoming data packets, obtaining 641 identification data associated with incoming data packets, enqueuing 642 incoming data packets, dropping 643 incoming data packets, and dequeuing 644 enqueued data packets.
  • FIG. 7 shows a schematic diagram of an electronic device 700 that may perform any or all of operations of the above methods and features explicitly or implicitly described herein, such as methods for adaptive traffic control at network switches.
  • a computer equipped with network functions may be configured as electronic device 700.
  • the electronic device 700 may be used to implement the apparatus 600 of FIG. 6, for example.
  • the device 700 includes a processor 710, such as a Central Processing Unit (CPU) or specialized processors such as a Graphics Processing Unit (GPU) or other such processor unit, memory 720, a network interface 730, and a bi-directional bus 740 to communicatively couple the components of electronic device 700.
  • Electronic device 700 may also optionally include non-transitory mass storage 750, an I/O interface 760, and a transceiver 770. According to certain embodiments, any or all of the depicted elements may be utilized, or only a subset of the elements.
  • the device 700 may contain multiple instances of certain elements, such as multiple processors, memories, or transceivers.
  • elements of the hardware device may be directly coupled to other elements without the bi-directional bus 740.
  • other electronics such as integrated circuits, may be employed for performing the required logical operations.
  • the memory 720 may include any type of non-transitory memory such as static random access memory (SRAM) , dynamic random access memory (DRAM) , synchronous DRAM (SDRAM) , read-only memory (ROM) , any combination of such, or the like.
  • Memory 720 may include more than one type of memory, such as ROM for use at boot-up, and DRAM for program and data storage for use while executing programs.
  • the mass storage element 750 may include any type of non-transitory storage device, such as a solid state drive, hard disk drive, a magnetic disk drive, an optical disk drive, USB drive, or any computer program product configured to store data and machine executable program code.
  • the memory 720 or mass storage 750 may have recorded thereon statements and instructions executable by the processor 710 for performing any of the operations described above.
  • mass storage 750 may be remote to the electronic device 700 and accessible through use of a network interface such as interface 730.
  • mass storage 750 is distinct from memory 720 and may generally perform storage tasks compatible with higher latency but may generally provide lesser or no volatility.
  • mass storage 750 may be integrated with the memory 720.
  • Network interface 730 may include at least one of a wired network interface and a wireless network interface.
  • the network interface 730 may include a wired network interface to connect to a network 780, and also may include a radio access network interface 790 for connecting to other devices over a radio link.
  • the network interface 730 enables the electronic device 700 to communicate with remote entities such as those connected to network 780.
  • the bi-directional bus 740 may be one or more of any type of several bus architectures, including a memory bus or memory controller, a peripheral bus, or a video bus.
  • Acts associated with the method described herein can be implemented as coded instructions in a computer program product.
  • the computer program product is a computer-readable medium upon which software code is recorded to execute the method when the computer program product is loaded into memory and executed on the microprocessor of the wireless communication device.
  • each operation of the method may be executed on any computing device, such as a personal computer, server, PDA, or the like and pursuant to one or more, or a part of one or more, program elements, modules or objects generated from any programming language, such as C++, Java, or the like.
  • each operation, or a file or object or the like implementing each said operation may be executed by special purpose hardware or a circuit module designed for that purpose.
  • the present invention may be implemented by using hardware only or by using software and a necessary universal hardware platform. Based on such understandings, the technical solution of the present invention may be embodied in the form of a software product.
  • the software product may be stored in a non-volatile or non-transitory storage medium, which can be a compact disk read-only memory (CD-ROM) , USB flash disk, or a removable hard disk.
  • the software product may include instructions that enable a computer device (personal computer, server, or network device) to execute the methods provided in the embodiments of the present invention. For example, such an execution may correspond to a simulation of the logical operations as described herein.
  • the software product may additionally or alternatively include instructions that enable a computer device to execute operations for configuring or programming a digital logic apparatus in accordance with embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Methods, apparatus, and systems for providing adaptive traffic control in computing or telecommunications networks are disclosed. Data packets transiting within a subnetwork of a network (short-haul) and data packets transiting between different subnetworks of the same network (long-haul) can interfere at connection points in the network. Adaptive traffic control is used in networks with complex mixtures of data flows to resolve congestion that may arise from this interference. Some embodiments can determine whether incoming data packets are to be enqueued to the buffer of a network device at a connection point based on the current ingress of short-haul and long-haul data packets at the connection point.

Description

Adaptive Traffic Control for Distributed Networks
TECHNICAL FIELD
The present disclosure generally relates to network technologies, and more particularly to methods, apparatus, and systems for controlling traffic within a network.
BACKGROUND
Modern internet services and applications often distribute data and computations across multiple datacenters (DCs) to improve performance and reliability. These services and applications frequently run some computations within the DCs themselves, giving rise to DC network (DCN) traffic, and run other computations in different but interconnected DCs, giving rise to DC interconnect (DCI) traffic. Because DCs are generally geographically separated, data packets will typically take longer to transit between DCs than between network elements within a same DC. This difference in transit times creates large disparities in the round-trip times (RTTs) for DCN and DCI traffic, and results in a complex mixture of RTTs for traffic across the distributed network that comprises the DCs.
A key challenge for distributed networks is handling interference between DCN and DCI traffic. When DCs face periods of increased network demand, data packets will begin to accumulate in the buffer queues of network switches. Local congestion arising within a DC is only detected by the wider network after a full DCI RTT. This feedback delay creates a mismatch between the local and wider network responses, which results in slow and uncoordinated congestion control.
Traditional mechanisms for active queue management (AQM) in network switches aim to ease congestion by applying methods for admitting or dropping incoming packets. These mechanisms do not adapt to the specific mix of DCN and DCI traffic, but instead apply static methods for prioritizing either DCN or DCI packets during enqueuing. In distributed networks where DCN and DCI traffic co-exist, this prioritization severely punishes non-prioritized traffic by causing low throughput, high latency and/or frequent packet losses. The detriments to the overall network’s performance limit the benefits of distributed computing to internet services and applications.
Therefore, improvements in adaptive traffic control are desirable.
This background information is provided to reveal information believed by the applicant to be of possible relevance to the present invention. No admission is necessarily intended, nor should be construed, that any of the preceding information constitutes prior art against the present invention.
SUMMARY
An object of embodiments of the present disclosure is to provide improvements in adaptive traffic control.
A first aspect of the present disclosure is to provide a method for controlling traffic within networks that include multiple subnetworks coupled together. The method can be performed by a packet-processing device, located in the network, that has a buffer and is configured to receive data packets transiting within a subnetwork of the network and data packets transiting between subnetworks of the network. The method comprises receiving an incoming data packet; and enqueuing the incoming data packet to the buffer when the data packet is transiting within a subnetwork and a proportion between data packets transiting within the subnetwork and data packets transiting between subnetworks is less than a threshold value, or when the incoming data packet is transiting between subnetworks and the buffer has space available to buffer the incoming data packet. The incoming data packet is identified as transiting within a subnetwork or as transiting between subnetworks by identification data belonging to the data packet.
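By way of a non-limiting illustration only (not a definition of the first aspect), the enqueuing condition above can be sketched in Python as follows; the function and parameter names are assumptions introduced purely for this example.

    def admit(is_intra_subnet: bool, proportion_intra: float,
              threshold: float, buffer_has_space: bool) -> bool:
        """Return True to enqueue the incoming packet, False to drop it.

        is_intra_subnet:  True for packets transiting within a subnetwork,
                          False for packets transiting between subnetworks.
        proportion_intra: proportion of arriving traffic transiting within the subnetwork.
        threshold:        threshold value derived from buffer occupancy.
        buffer_has_space: whether the buffer has space available for the packet.
        """
        if is_intra_subnet:
            # Intra-subnetwork packets are enqueued only while their share of
            # arrivals is less than the threshold value.
            return proportion_intra < threshold
        # Inter-subnetwork packets are enqueued whenever buffer space is available.
        return buffer_has_space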
In some embodiments of the first aspect, the method further comprises dropping the incoming data packet when the incoming data packet is transiting within a subnetwork and the proportion between data packets transiting within the subnetwork and data packets transiting between subnetworks is greater than or equal to the threshold value, or when the incoming data packet is transiting between subnetworks and the buffer is devoid of space available to buffer the incoming data packet.
In some embodiments of the first aspect, the proportion between data packets transiting within the subnetwork and data packets transiting between subnetworks depends from a rate of arrival at the packet-processing device of the data packets transiting within the subnetwork, and a rate of arrival at the packet-processing device of the data packets transiting  between the subnetworks. In some embodiments, the rate of arrival at the packet-processing device of the data packets transiting within the subnetwork depends from a ratio between a count of data packets transiting within the subnetwork within a register window and a length of the register window. In some embodiments, the rate of arrival at the packet-processing device of the data packets transiting between subnetworks depends from a ratio between a count of data packets transiting between subnetworks within a register window and a length of the register window. In some embodiments, the register window comprises 32 bits or 64 bits.
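As a hedged sketch of how such a register window could be maintained in software (the fixed-length deque and the names below are illustrative assumptions, not part of the disclosure):

    from collections import deque

    class ArrivalRegister:
        """Tracks the last `length` packet classifications in a sliding register window."""

        def __init__(self, length: int = 64):
            self.length = length
            # True marks an intra-subnetwork packet, False an inter-subnetwork packet.
            self.window = deque(maxlen=length)

        def record(self, is_intra_subnet: bool) -> None:
            self.window.append(is_intra_subnet)

        def rates(self) -> tuple:
            """Arrival rates: per-class counts in the window divided by the window length."""
            intra_count = sum(self.window)
            inter_count = len(self.window) - intra_count
            return intra_count / self.length, inter_count / self.length

        def proportion_intra(self) -> float:
            """Proportion of arrivals transiting within the subnetwork."""
            r_intra, r_inter = self.rates()
            total = r_intra + r_inter
            return r_intra / total if total else 0.0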
In some embodiments of the first aspect, the network includes a first transmitter configured to transmit data packets transiting within a subnetwork to the packet-processing device and a first receiver configured to receive the data packets transiting within the subnetwork from the packet-processing device. The network also includes a second transmitter configured to transmit data packets transiting between subnetworks to the packet-processing device and a second receiver configured to receive the data packets transiting between subnetworks from the packet-processing device. In these embodiments, the round-trip times between the first transmitter and the first receiver and between the second transmitter and the second receiver may be within ranges that are exclusive to one another. In some embodiments, a non-limiting example of a range for the round-trip time between the first transmitter and the first receiver may be a range comprised between 1 μs and 1000 μs. In some embodiments, a non-limiting example of a range for the round-trip time between the second transmitter and the second receiver may be a range comprised between 5 ms and 100 ms.
In some embodiments of the first aspect, the threshold value depends from a ratio between an unoccupied space of the buffer and a buffer capacity.
In some embodiments of the first aspect, the threshold value depends from a ratio between an unoccupied space of the buffer and an unreserved capacity of the buffer. The unreserved capacity is defined as a difference between the buffer capacity and a reserved space in the buffer that comprises an adjustable amount of space or a fixed amount of space. In some embodiments, the method further comprises enqueuing the incoming data packet to the reserved space in the buffer when the incoming data packet is transiting between subnetworks and the reserved space in the buffer has space available to buffer the incoming data packet.
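As a purely illustrative calculation (the figures below are assumptions, not values taken from the disclosure): for a buffer capacity of 100 packets with 20 packets of reserved space and 40 packets currently queued, the unoccupied space is 100 - 40 = 60 packets and the threshold value based on the unreserved capacity is 60 / (100 - 20) = 0.75; an incoming packet transiting within the subnetwork would then be enqueued only if the proportion of such packets among arrivals is below 0.75, while an incoming packet transiting between subnetworks could still be enqueued to the reserved space if that space is available.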
In some embodiments of the first aspect, the identification data comprises a source IP address. In some embodiments, the identification data comprises a destination IP address.
In some embodiments of the first aspect, the method further comprises, when the incoming data packet is enqueued to the buffer or a reserved space in the buffer, assigning a priority for dequeuing to the incoming data packet. Data packets transiting between subnetworks are assigned a priority that is less than the priority assigned to data packets transiting within a subnetwork.
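A minimal software sketch of this prioritization, assuming two software queues and strict-priority service (an illustration only; a hardware switch would typically realize this in its scheduler):

    from collections import deque

    class PriorityBuffer:
        """Two-queue sketch: intra-subnetwork packets are dequeued before inter-subnetwork ones."""

        def __init__(self):
            self.high = deque()  # packets transiting within a subnetwork (higher priority)
            self.low = deque()   # packets transiting between subnetworks (lower priority)

        def enqueue(self, packet, is_intra_subnet: bool) -> None:
            (self.high if is_intra_subnet else self.low).append(packet)

        def dequeue(self):
            # Strict priority: serve the higher-priority queue first whenever it is non-empty.
            if self.high:
                return self.high.popleft()
            if self.low:
                return self.low.popleft()
            return None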
In some embodiments of the first aspect, the subnetworks each comprise a datacenter. In some embodiments, the subnetworks belong to a distributed network.
In some embodiments of the first aspect, the method further comprises, when the incoming data packet is enqueued to the buffer or a reserved space in the buffer, dequeuing the incoming data packet according to a First-in-First-Out (FIFO) policy.
In some embodiments of the first aspect, the method further comprises, when the incoming data packet is enqueued to the buffer or a reserved space in the buffer, dequeuing the incoming data packet according to a Shortest-Remaining-Processing-Time (SRPT) policy.
In some embodiments of the first aspect, the network uses a Transmission Control Protocol (TCP) . In some embodiments, the network uses remote direct memory access (RDMA) . In some embodiments, the network uses a User Datagram Protocol (UDP) .
In some embodiments of the first aspect, the network comprises a Wi-Fi network. In some embodiments, the network comprises a cellular network that is 5G or LTE.
A second aspect of the present disclosure provides a network element comprising a packet-processing device. The packet-processing device has a buffer and is coupled with subnetworks to receive data packets transiting within a subnetwork and data packets transiting between subnetworks. The packet-processing device is configured to perform the method of the first aspect. Some embodiments of the second aspect may further provide the embodied variations of the first aspect.
A third aspect of the present disclosure provides a system comprising a network with two subnetworks coupled together. The system further comprises a first transmitter, which is in a first subnetwork of the two subnetworks and is configured to send first data packets; a  second transmitter, which is in the first subnetwork and is configured to send second data packets; a first receiver, which is in the first subnetwork and is configured to receive the first data packets; a second receiver, which is in the second subnetwork and is configured to receive the second data packets; and a packet-processing device, which is in the network and includes a buffer. The packet-processing device is configured to perform the method of the first aspect, wherein the first data packets constitute data packets transiting within a subnetwork and the second data packets constitute data packets transiting between subnetworks. Some embodiments of the third aspect may further provide the embodied variations of the first aspect.
A fourth aspect of the present disclosure provides another system comprising a network with two subnetworks coupled together. The system further comprises a first transmitter, which is in the first subnetwork of the two subnetworks and is configured to send first data packets; a second transmitter, which is in the second subnetwork and is configured to send second data packets; a first receiver, which is in the first subnetwork and is configured to receive the first data packets; a second receiver, which is in the first subnetwork and is configured to receive the second data packets; and a packet-processing device, which is in the network and includes a buffer. The packet-processing device is configured to perform the method of the first aspect, wherein the first data packets constitute data packets transiting within a subnetwork and the second data packets constitute data packets transiting between subnetworks. Some embodiments of the fourth aspect may further provide the embodied variations of the first aspect.
A fifth aspect of the present disclosure provides a packet-processing network apparatus comprised in a network including two subnetworks, a first and a second subnetwork, coupled together. The apparatus is deployed in the first subnetwork and is configured to receive first data packets transiting within the first subnetwork and second data packets transiting between the first subnetwork and the second subnetwork. The apparatus comprises a buffer, a processor coupled to the buffer, and a tangible processor-readable memory. The tangible processor-readable memory has recorded thereon instructions to be performed by the processor to carry out a set of actions that comprise the method of the first aspect. Some embodiments of the fifth aspect may further provide the embodied variations of the first aspect.
Embodiments have been described above in conjunction with aspects of the present invention upon which they can be implemented. Those skilled in the art will appreciate that embodiments may be implemented in conjunction with the aspect with which they are described but may also be implemented with other embodiments of that aspect. When embodiments are mutually exclusive, or are incompatible with each other, it will be apparent to those skilled in the art. Some embodiments may be described in relation to one aspect, but may also be applicable to other aspects, as will be apparent to those of skill in the art.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows a typical distributed computing network topology in operation, where embodiments of the present disclosure may be implemented.
FIG. 2 shows a flowchart of an embodiment of a prior art method for AQM at a switch with coexisting DCN and DCI data flows.
FIG. 3 shows a flowchart of a method for enqueuing incoming data packets at a switch with coexisting DCN and DCI data flows, according to an embodiment of the present disclosure.
FIG. 4 shows a flowchart of a method for adaptive traffic control, according to an embodiment of the present disclosure.
FIG. 5A shows an example of an incoming data packet of a DCN flow being enqueued to the buffer of a switch, according to an embodiment of the present disclosure.
FIG. 5B shows an example of an incoming data packet of a DCN flow being dropped from the buffer of a switch, according to an embodiment of the present disclosure.
FIG. 6 shows an apparatus for adaptive traffic control, according to embodiments of the present disclosure.
FIG. 7 shows a schematic of an embodiment of an electronic device that may implement all or part of the methods and features of the present disclosure.
DETAILED DESCRIPTION
To resolve network congestion arising from mixtures of DCN traffic (short-haul traffic) and DCI traffic (long-haul traffic) , embodiments of the present invention, as disclosed  herein, are generally directed towards using conditional logic in the enqueuing of data packets at network switches or other appropriate network devices. Some embodiments may adapt to network conditions by conditionally enqueuing data packets according to the specific mixtures of short-haul and long-haul traffic that are being received by the switches. In yet some other embodiments, the conditional logic in the enqueuing of data packets may be used to manage network congestion arising from mixtures of data packets requiring low latency and/or data packets requiring high throughput.
The present disclosure sets forth various embodiments via the use of block diagrams, flowcharts, and examples. Insofar as such block diagrams, flowcharts, and examples contain one or more functions and/or operations, it will be understood by a person skilled in the art that each function and/or operation within such block diagrams, flowcharts, and examples can be implemented, individually or collectively, by a wide range of hardware, software, firmware, or combination thereof. As used herein, the term “about” should be read as including variation from the nominal value, for example, a +/- 10% variation from the nominal value. It is to be understood that such a variation is always included in a given value provided herein, whether or not it is specifically referred to. The terms “traffic”, “flow” and “data packets” may be used interchangeably throughout the disclosure. The terms “queue” and “buffer” may be used interchangeably throughout the disclosure.
FIG. 1 shows an example of a distributed network topology. The network comprises two datacenters (DCs): DC 101 and DC 102 connected to each other via a wide area network (WAN) 103. DC 101 comprises a router 104 coupled to WAN 103, and DC 102 comprises router 105 coupled to WAN 103. Within each of DC 101 and DC 102 there is a respective plurality of nodes (e.g., nodes 106, 107, 108, and 109 of DC 101; and nodes 110, 111, 112, and 113 of DC 102). These nodes may transmit and receive data while executing computing applications and the computations involved therein. Nodes 106 to 109 of DC 101 are connected to router 104 through switches 114 or 115. Nodes 110 to 113 of DC 102 are connected to router 105 through switches 116 or 117. The connections between network components may comprise physical links (e.g., ethernet or optical cables) and/or wireless links (e.g., radiowave or microwave). DC 101 may be referred to as a subnetwork and DC 102 may be referred to as another subnetwork. Both subnetworks are coupled (connected, in communication with) to each other and may define an entire network or may define only a portion of the entire network.
In the non-limiting example of FIG. 1, node 106 of DC 101 is transmitting data packets to node 113 of DC 102. This transmission produces a DCI flow 118, which transits via switch 114 and router 104 of DC 101, WAN 103, and router 105 and switch 117 of DC 102. In this non-limiting example, DCI flow 118 has an RTT of about 10 milliseconds (ms) . At the same time, node 107 of DC 101 is transmitting data packets to node 108 of DC 101. This second transmission produces a DCN flow 119, which transits via switch 114, router 104, and switch 115 of DC 101. In this non-limiting example, DCN flow 119 has an RTT of about 200 microseconds (μs) . A confluence of the DCI flow 118 and the DCN flow 119 hence arises at switch 114, which may result in interference between DCI flow 118 and DCN flow 119. As will be understood by a person skilled in the art, routers and middleboxes may include switches.
The interference between the DCI flow 118 and DCN flow 119 at switch 114 can result in congestion and build-up of data packets within the buffer of switch 114, wherein either or both of the DCI flow 118 and DCN flow 119 may experience high latency, high packet losses and/or low throughput. Although the DCs 101 and 102 may control the transmission rates of data flows to ease congestion during periods of increased demand, responses within the DCI flow 118 will lag behind responses within the DCN flow 119 because of the disparity in RTTs (e.g., ~10 ms for DCI flow 118 versus ~200 μs for DCN flow 119). In this manner, congestion feedback cannot be detected and acted upon for a full DCI RTT. The mismatch in responses results in uncoordinated and ineffective congestion control.
AQM describes methods for controlling network congestion at a switch through policies for enqueuing, dropping, and dequeuing incoming data packets. These methods generally perform poorly when DCN and DCI traffic coexist in the network. FIG. 2 shows a flowchart for a typical method of the prior art for AQM with coexisting DCN and DCI data flows. In this method, an incoming data packet 201 is received by a switch 202 and is classified 203 as belonging to either a DCN flow (e.g., DCN flow 119) or a DCI flow (e.g., DCI flow 118) . If the data packet 201 belongs to DCN flow 119, then it is given a high priority 204 for enqueuing; alternatively, if the data packet 201 belongs to a DCI flow 118, it is given a low priority 205 for enqueuing. As in this example method, when DCN traffic is prioritized to ensure it has low latency, it will quickly fill up the switch’s buffer because of its smaller RTT. DCI traffic may then experience high packet losses and hence lower throughput.  These effects can take a long time to resolve because of the long propagation delays of the DCI traffic (DCI packets) . In contrast with the method of FIG. 2, if DCI traffic were to be prioritized over DCN traffic instead, DCI traffic would fill up the buffer to achieve high throughput and then DCN traffic would suffer from high latency.
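For contrast with the adaptive scheme described below, the static prior-art prioritization of FIG. 2 might be sketched as follows (an assumption-laden illustration; the flow_class string is a stand-in for whatever classification the switch performs):

    HIGH_PRIORITY, LOW_PRIORITY = 0, 1

    def static_priority(flow_class: str) -> int:
        """Prior-art behaviour: priority depends only on the flow class, never on buffer state.

        flow_class: "DCN" for intra-datacenter flows, "DCI" for inter-datacenter flows.
        """
        # DCN is always preferred, so its short-RTT traffic quickly fills the buffer;
        # DCI traffic then sees losses and low throughput regardless of buffer occupancy.
        return HIGH_PRIORITY if flow_class == "DCN" else LOW_PRIORITY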
Embodiments of the present disclosure may mitigate interference between DCN traffic and DCI traffic to concurrently enable low latencies for DCN flows and low loss rates and high throughputs for DCI flows. In embodiments, buffer controls implemented through a network switch may resolve traffic congestion quickly by adapting to the specific mixture of DCI flows and DCN flows transiting through the switch. Such dynamic buffer controls may be automated to admit data packets based on whether they belong to DCN flows or DCI flows, the current buffer utilization, and/or the arrival rates of DCN traffic (DCN packets) and DCI traffic (DCI packets) . Embodiments may also comprise reserved space in the buffer for DCI packets in order to mitigate the impact of DCN flows on DCI flows and to guarantee high throughput for DCI flows. Some embodiments may further prioritize DCN packets for dequeuing to ensure low latency for DCN flows.
FIG. 3 shows a flowchart of a method according to an embodiment of the present disclosure. In contrast with the prior art method of FIG. 2, the method of the embodiment of FIG. 3 may comprise an adaptive traffic control 301 in switch 202. The adaptive traffic control 301 decides whether or not to admit the incoming data packet 201. In deciding whether to admit the incoming data packet 201, the adaptive traffic control 301 may consider, for example, whether the incoming data packet 201 belongs to a DCN flow or a DCI flow, the state of buffer utilization, the rates of arrival of DCN traffic (DCN packets) and DCI traffic (DCI packets) , the occupancy of a reserved space in the buffer, and/or any other appropriate factors. If the incoming data packet 201 is admitted, it may proceed to being classified 203 as belonging to a DCN flow 119 or a DCI flow 118. Once classified, the data packet may respectively be assigned a high priority for dequeuing 302 or a low priority for dequeuing 303. If the incoming data packet 201 is not admitted, it may be dropped 304 from switch 202.
FIG. 4 shows a flowchart for an adaptive traffic control method according to an embodiment of the present disclosure. At action 401, an incoming data packet may be received at a connection point of a network. A connection point of a network may be any intersection among connections between transmitting and receiving nodes in the network. These connections may be connected at the intersection through a packet-processing device (network element) such as, for example, a switch, a middlebox, or a router, which may comprise a buffer. The buffer may comprise a queue or a plurality of queues for data packets arriving at the connection point. The connection point may be located in a DC, and that DC may be connected to another DC through a wide area network, such that the incoming data packet could be either transiting within the DC of the connection point, as part of a DCN flow, or transiting between the two DCs, as part of a DCI flow. At action 402, a determination may be made of whether the incoming data packet belongs to a DCN flow or a DCI flow. This may be determined by obtaining identification data associated with or comprised in the incoming data packet, which may, for example, be IP addresses for the source and destination of the data packet, an end-host mark, a type-of-service (ToS) field, a differentiated services field, or any other suitable identifier.
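By way of a non-limiting illustration only, the classification at action 402 may be sketched as follows in Python; the specific packet field names, the address prefix of the local DC and the end-host mark value are assumptions made for the purpose of the example and are not prescribed by the present disclosure.

import ipaddress

LOCAL_DC_PREFIX = ipaddress.ip_network("10.0.0.0/8")   # assumed local-DC address plan
DCI_TOS_MARK = 0x28                                     # assumed end-host mark for DCI traffic

def classify(packet):
    # Prefer an explicit end-host mark in the ToS field when one is present.
    if packet.get("tos") == DCI_TOS_MARK:
        return "DCI"
    src = ipaddress.ip_address(packet["src_ip"])
    dst = ipaddress.ip_address(packet["dst_ip"])
    # A packet whose source and destination both lie within the local DC
    # prefix is treated as DCN traffic; otherwise it is treated as DCI traffic.
    if src in LOCAL_DC_PREFIX and dst in LOCAL_DC_PREFIX:
        return "DCN"
    return "DCI"

print(classify({"tos": 0x00, "src_ip": "10.1.2.3", "dst_ip": "10.4.5.6"}))     # "DCN"
print(classify({"tos": 0x00, "src_ip": "10.1.2.3", "dst_ip": "203.0.113.7"}))  # "DCI"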
If the incoming data packet is determined to belong to a DCN flow, a threshold buffer availability, Bavail, and a proportional utilization of the buffer by DCN packets, S, may be calculated at action 403. The threshold buffer availability may be defined as the proportion of the buffer that is unoccupied:

Bavail = (Qc - Qs) / Qc     (Equation 1)

where Qc is the total capacity of the buffer and Qs is the total utilization of the buffer. The difference between Qc and Qs may define an amount of unoccupied space in the buffer. Unoccupied space in the buffer may be space that is not being utilized by data packets. The buffer may have a portion of its capacity reserved for DCI traffic, z. The amount of reserved space for DCI traffic may be adjustable or pre-set (fixed) . In this case, the threshold buffer availability may be defined as:

Bavail = (Qc - Qs) / (Qc - z)     (Equation 2)
The difference between Qc and z may define an amount of space in the buffer that is not reserved for DCI traffic (i.e., an unreserved capacity) . The proportional utilization of the buffer by DCN packets may be defined as the proportion S of DCN data packets arriving at the connection point:

S = DCN_t / (DCN_t + DCI_t)     (Equation 3)
where DCN_t is the amount of incoming DCN data packets and DCI_t is the amount of incoming DCI data packets. The amounts of incoming DCN data packets and DCI data packets may include the incoming data packet currently under consideration at the connection point and may be defined by rates of arrival at the connection point of the respective types of data packets, with the rates being assessed over a register window. Thus, the rate of arrival of DCN data packets, rDCN, and the rate of arrival of DCI data packets, rDCI, may be calculated as:

rDCN = DCN counts / register length     (Equation 4)

rDCI = DCI counts / register length     (Equation 5)
where DCN counts and DCI counts are the respective number of instances of DCN and DCI packets received at the connection point within the register window, and register length is the total number of packets that may be registered in the window. The length of the register window may, for example, be 32 bits or 64 bits or any other suitable number of bits.
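Solely as an illustrative, non-limiting sketch of Equations 1 to 5, the quantities above may be computed as follows; the Python variable and function names, and the use of a sliding packet register, are assumptions made for the example.

from collections import deque

REGISTER_LENGTH = 64                       # register length, in packets (e.g., 64)
register = deque(maxlen=REGISTER_LENGTH)   # most recent arrivals, "DCN" or "DCI"

def record_arrival(kind):
    register.append(kind)

def arrival_rates():
    # Equations 4 and 5: arrival rates of DCN and DCI packets over the register window.
    dcn_counts = sum(1 for k in register if k == "DCN")
    dci_counts = len(register) - dcn_counts
    return dcn_counts / REGISTER_LENGTH, dci_counts / REGISTER_LENGTH

def proportional_dcn_utilization():
    # Equation 3: S = DCN_t / (DCN_t + DCI_t), using the windowed rates.
    r_dcn, r_dci = arrival_rates()
    total = r_dcn + r_dci
    return r_dcn / total if total > 0 else 0.0

def threshold_buffer_availability(q_c, q_s, z=0):
    # Equation 1 when z = 0, Equation 2 when z > 0: Bavail = (Qc - Qs) / (Qc - z).
    return (q_c - q_s) / (q_c - z)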
At action 404, the threshold buffer availability may be compared to the utilization of the buffer by DCN packets. If S < Bavail, the incoming data packet of a DCN flow may be enqueued to an unreserved space in the buffer at action 405. If S ≥ Bavail, the incoming data packet of the DCN flow may be dropped at action 406.
If the incoming data packet is determined to belong to a DCI flow, a determination may be made, at action 407, of whether there is space available to enqueue the incoming data packet to a reserved space for DCI data packets in the buffer. If there is reserved space determined to be available, the incoming data packet may be enqueued to the reserved space for DCI traffic in the buffer at action 408. If no reserved space is determined to be available (i.e., the buffer is devoid of reserved space for buffering the incoming data packet) , a determination may be made, at action 409, of whether there is space available to enqueue the incoming data packet to an unreserved portion of the buffer. If the switch determines that there is unreserved space available, the incoming data packet may be enqueued to an unreserved space in the buffer at action 405. If no unreserved space is determined to be available (i.e., the buffer is devoid of any space for buffering the incoming data packet) , the incoming data packet may be dropped at action 406.
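The admission logic of actions 404 to 409 of FIG. 4 may, for example, be sketched as follows; the function signature and the occupancy counters for the reserved and unreserved portions of the buffer are assumptions made for illustration only.

def admit(packet_kind, s, b_avail, q_c, z, reserved_used, unreserved_used):
    # packet_kind is "DCN" or "DCI"; s and b_avail follow Equations 3 and 2.
    # Returns "unreserved", "reserved", or "drop", mirroring actions 404 to 409.
    if packet_kind == "DCN":
        return "unreserved" if s < b_avail else "drop"   # actions 404, 405, 406
    if reserved_used < z:                                # actions 407, 408
        return "reserved"
    if unreserved_used < q_c - z:                        # actions 409, 405
        return "unreserved"
    return "drop"                                        # action 406

# Example call using, by way of illustration, S = 1/3 and Bavail = 0.8
# (the values of the example of FIG. 5A described below).
print(admit("DCN", s=1/3, b_avail=0.8, q_c=6, z=1, reserved_used=1, unreserved_used=1))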
FIG. 5A shows an example of adaptive traffic control in operation, according to an embodiment of the present disclosure, wherein an incoming data packet 501 of a DCN flow is enqueued at a switch. In this example, the switch’s buffer 502 has six spaces for queuing data packets (Qc = 6) , one of which is reserved for data packets of DCI flows (z = 1) . Two of the spaces in the buffer 502 are being utilized by data packets of DCI flows (Qs = 2) , and the buffer 502 is receiving an incoming data packet 501 of a DCN flow; therefore, according to Equations 2 and 3, the proportional utilization of the buffer 502 by DCN packets is approximately 33% (S ≈ 33%, one DCN data packet, namely the incoming data packet 501, against two DCI data packets) and the threshold buffer availability is 80% (Bavail = (6 - 2) / (6 - 1) = 80%) . With S being less than Bavail, the switch decides to admit the incoming data packet 501 and enqueue it to an unreserved space in the buffer 502.
FIG. 5B shows another example of adaptive traffic control in operation, according to an embodiment of the present disclosure. In contrast with the example of FIG. 5A, the incoming data packet 501 of a DCN flow is dropped instead of being enqueued at the switch. The buffer 502 in this example, like that of FIG. 5A, has six spaces for queuing data packets (Qc = 6) , one of which is again reserved for data packets of DCI flows (z = 1) . Unlike the example of FIG. 5A, three of the spaces in the buffer are being utilized by data packets of DCI flows and two of the spaces in the buffer are being utilized by data packets of DCN flows (Qs = 5) . With the incoming data packet 501 belonging to a DCN flow, the proportional utilization of the buffer 502 by DCN packets is 50% (S = 50%, three DCN data packets, including the incoming data packet 501, against three DCI data packets) and the threshold buffer availability is 20% (Bavail = (6 - 5) / (6 - 1) = 20%) . In this example, S is greater than Bavail, and so the switch decides to drop the incoming data packet 501.
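The arithmetic of the examples of FIGs. 5A and 5B may be verified directly, as in the following sketch, which simply restates the counts shown in the figures:

# FIG. 5A: Qc = 6, z = 1, Qs = 2; one DCN packet (the incoming one) against two DCI packets.
s_a = 1 / (1 + 2)               # S = 1/3, approximately 33%
b_avail_a = (6 - 2) / (6 - 1)   # Bavail = 4/5 = 80%
assert s_a < b_avail_a          # the incoming DCN packet is enqueued

# FIG. 5B: Qc = 6, z = 1, Qs = 5; three DCN packets (including the incoming one) against three DCI packets.
s_b = 3 / (3 + 3)               # S = 50%
b_avail_b = (6 - 5) / (6 - 1)   # Bavail = 1/5 = 20%
assert s_b >= b_avail_b         # the incoming DCN packet is dropped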
In some embodiments of the present disclosure, the conditions for enqueuing or dropping an incoming data packet may be implemented in an alternate form that is mathematically equivalent to those disclosed hereinbefore. For example, a proportional utilization of the buffer by DCI packets may instead be compared to a threshold. In this case, the incoming data packet may be enqueued if the following holds true:

DCI_t / (DCN_t + DCI_t) > 1 - Bavail     (Equation 6)

Furthermore, if Equation 6 is not true in such a case, the incoming data packet may be dropped. In another implementation, the inverses of the ratios defining Bavail and S in Equations 2 and 3, respectively, may be compared instead. In this case, if the inverted ratio derived from S is greater than the inverted ratio derived from Bavail, the incoming data packet may be enqueued, and otherwise the incoming data packet may be dropped. A person skilled in the art will appreciate that the conditions of these examples are mathematically equivalent to the conditions disclosed according to the embodiment of FIG. 4.
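The equivalence between the condition of Equation 6 and the condition of FIG. 4 may be checked numerically, for example with the following non-limiting sketch using exact rational arithmetic:

from fractions import Fraction
import random

for _ in range(1000):
    dcn_t = random.randint(0, 100)
    dci_t = random.randint(0, 100)
    if dcn_t + dci_t == 0:
        continue
    b_avail = Fraction(random.randint(0, 100), 100)
    s = Fraction(dcn_t, dcn_t + dci_t)
    condition_fig4 = s < b_avail                                    # condition of FIG. 4
    condition_eq6 = Fraction(dci_t, dcn_t + dci_t) > 1 - b_avail    # Equation 6
    assert condition_fig4 == condition_eq6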
Embodiments of the present disclosure may dequeue data packets from the buffer according to various methods. In some embodiments, enqueued data packets may be assigned a first or a second priority for dequeuing according to whether they belong to a DCI flow or a DCN flow. For example, an enqueued data packet of a DCN flow may be assigned a higher priority for dequeuing than an enqueued data packet of a DCI flow to ensure that DCN traffic  has low latency. In other embodiments, data packets may be dequeued according to a First-in-First-Out (FIFO) policy or a Shortest-Remaining-Processing-Time (SRPT) policy. In further embodiments, a FIFO or SRPT policy may be combined with prioritization policies for DCN data packets or DCI data packets.
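As a non-limiting sketch, a strict-priority dequeue that serves DCN packets before DCI packets, with FIFO order within each class, may be expressed as follows; the two-queue structure is an assumption made for the example.

from collections import deque

dcn_queue = deque()   # assigned the first (higher) dequeue priority
dci_queue = deque()   # assigned the second (lower) dequeue priority

def dequeue():
    # Serve DCN packets first to keep DCN latency low; within each class,
    # packets leave in FIFO order. Returns None if both queues are empty.
    if dcn_queue:
        return dcn_queue.popleft()
    if dci_queue:
        return dci_queue.popleft()
    return None

dcn_queue.extend(["dcn-1", "dcn-2"])
dci_queue.append("dci-1")
print(dequeue(), dequeue(), dequeue())   # dcn-1 dcn-2 dci-1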
In some embodiments of the present disclosure, the network device enabling adaptive traffic control, as disclosed in FIG. 4, may be in a DC. This DC may be coupled with a second DC or a plurality of other DCs through a wide area network. The coupling of DCs through a wide area network may produce a complex mixture of RTTs for traffic in the network. In some embodiments, DCN traffic may have RTTs comprised in a first range and DCI traffic may have RTTs comprised in a second range that does not overlap with the first range (i.e., the two ranges are exclusive of one another) . For example, in one embodiment, the first range may be comprised between 1 and 1000 μs while the second range may be comprised between 5 and 100 ms. In another embodiment, the first range may be comprised between 1 and 1000 μs while the second range may be comprised between 1 and 500 ms. In other embodiments, the switch may belong to a Wi-Fi wireless network or a cellular network, the latter of which may be operating according to the Long-Term Evolution (LTE) or 5G standard. In further embodiments, the network device may belong to a network combining DC, wireless and/or cellular subnetworks. In still further embodiments, the mixture of DCN data packets and DCI data packets may instead be a mixture of data packets requiring low latency and data packets requiring high throughput.
Embodiments of the present disclosure may be implemented in networks using the Transmission Control Protocol (TCP) , the Quantized Congestion Notification (QCN) standard, remote direct memory access (RDMA) , the Bottleneck Bandwidth and Round-trip propagation time (BBR) protocol, the User Datagram Protocol (UDP) , and/or other internet standards and transport protocols or any combination thereof.
Embodiments of the present disclosure may be implemented using electronic hardware, software, or a combination thereof. For example, some embodiments may be implemented in commodity switches, which may combine both hardware and software components. In some embodiments, the invention may be implemented by one or multiple computer processors executing program instructions stored in memory (processor-readable memory) . In some embodiments, the invention is implemented partially or fully in hardware, for example using one or more field programmable gate arrays (FPGAs) or application specific integrated circuits (ASICs) to rapidly perform processing operations.
FIG. 6 shows an apparatus 600 for adaptive traffic control, according to embodiments of the present invention. The apparatus is located at a connection point 610 of a network. The apparatus 600 includes a network interface 620 and processing electronics 630. The processing electronics 630 may include a computer processor executing program instructions stored in memory, or other electronics components such as digital circuitry, including for example FPGAs and ASICs. The network interface 620 may include an optical communication interface or a radio communication interface, such as a transmitter and receiver. The apparatus 600 may include several functional components, each of which may be partially or fully implemented using the underlying network interface 620 and processing electronics 630. Examples of functional components may include modules for receiving 640 incoming data packets, obtaining 641 identification data associated with incoming data packets, enqueuing 642 incoming data packets, dropping 643 incoming data packets, and dequeuing 644 enqueued data packets.
FIG. 7 shows a schematic diagram of an electronic device 700 that may perform any or all of operations of the above methods and features explicitly or implicitly described herein, such as methods for adaptive traffic control at network switches. For example, a computer equipped with network functions may be configured as electronic device 700. The electronic device 700 may be used to implement the apparatus 600 of FIG. 6, for example.
As shown in FIG. 7, the device 700 includes a processor 710, such as a Central Processing Unit (CPU) or specialized processors such as a Graphics Processing Unit (GPU) or other such processor unit, memory 720, a network interface 730, and a bi-directional bus 740 to communicatively couple the components of electronic device 700. Electronic device 700 may also optionally include non-transitory mass storage 750, an I/O interface 760, and a transceiver 770. According to certain embodiments, any or all of the depicted elements may be utilized, or only a subset of the elements. Furthermore, the device 700 may contain multiple instances of certain elements, such as multiple processors, memories, or transceivers. In addition, elements of the hardware device may be directly coupled to other elements without the bi-directional bus 740. Additionally or alternatively to a processor 710 and memory 720, other electronics, such as integrated circuits, may be employed for performing the required logical operations.
The memory 720 may include any type of non-transitory memory such as static random access memory (SRAM) , dynamic random access memory (DRAM) , synchronous DRAM (SDRAM) , read-only memory (ROM) , any combination of such, or the like. Memory 720 may include more than one type of memory, such as ROM for use at boot-up, and DRAM for program and data storage for use while executing programs. The mass storage element 750 may include any type of non-transitory storage device, such as a solid state drive, hard disk drive, a magnetic disk drive, an optical disk drive, USB drive, or any computer program product configured to store data and machine executable program code. According to certain embodiments, the memory 720 or mass storage 750 may have recorded thereon statements and instructions executable by the processor 710 for performing any of the operations described above. In some embodiments, mass storage 750 may be remote to the electronic device 700 and accessible through use of a network interface such as interface 730. In the embodiment of FIG. 7, mass storage 750 is distinct from memory 720 and may generally perform storage tasks compatible with higher latency but may generally provide lesser or no volatility. In some embodiments, mass storage 750 may be integrated with the memory 720.
Network interface 730 may include at least one of a wired network interface and a wireless network interface. The network interface 730 may include a wired network interface to connect to a network 780, and also may include a radio access network interface 790 for connecting to other devices over a radio link. The network interface 730 enables the electronic device 700 to communicate with remote entities such as those connected to network 780.
The bi-directional bus 740 may be one or more of any type of several bus architectures, including a memory bus or memory controller, a peripheral bus, or a video bus.
It will be appreciated that, although specific embodiments of the technology have been described herein for purposes of illustration, various modifications may be made without departing from the scope of the technology. The specification and drawings are, accordingly, to be regarded simply as an illustration of the invention as defined by the appended claims, and are contemplated to cover any and all modifications, variations, combinations or equivalents that fall within the scope of the present invention. In particular, it is within the scope of the technology to provide a computer program product or program element, or a program storage or memory device such as a magnetic or optical wire, tape or  disc, or the like, for storing signals readable by a machine, for controlling the operation of a computer according to the method of the technology and/or to structure some or all of its components in accordance with the system of the technology.
Acts associated with the method described herein can be implemented as coded instructions in a computer program product. In other words, the computer program product is a computer-readable medium upon which software code is recorded to execute the method when the computer program product is loaded into memory and executed on the microprocessor of the wireless communication device.
Further, each operation of the method may be executed on any computing device, such as a personal computer, server, PDA, or the like and pursuant to one or more, or a part of one or more, program elements, modules or objects generated from any programming language, such as C++, Java, or the like. In addition, each operation, or a file or object or the like implementing each said operation, may be executed by special purpose hardware or a circuit module designed for that purpose.
Through the descriptions of the preceding embodiments, the present invention may be implemented by using hardware only or by using software and a necessary universal hardware platform. Based on such understandings, the technical solution of the present invention may be embodied in the form of a software product. The software product may be stored in a non-volatile or non-transitory storage medium, which can be a compact disk read-only memory (CD-ROM) , USB flash disk, or a removable hard disk. The software product may include instructions that enable a computer device (personal computer, server, or network device) to execute the methods provided in the embodiments of the present invention. For example, such an execution may correspond to a simulation of the logical operations as described herein. The software product may additionally or alternatively include instructions that enable a computer device to execute operations for configuring or programming a digital logic apparatus in accordance with embodiments of the present invention.
Although the present invention has been described with reference to specific features and embodiments thereof, it is evident that various modifications and combinations can be made thereto without departing from the invention. The specification and drawings are, accordingly, to be regarded simply as an illustration of the invention as defined by the  appended claims, and are contemplated to cover any and all modifications, variations, combinations or equivalents that fall within the scope of the present invention.

Claims (55)

  1. A method of controlling traffic within a network, the network including a first subnetwork and a second subnetwork, the first subnetwork being coupled to the second subnetwork, the method comprising,
    by a packet-processing device in the network, the packet-processing device including a buffer, and being configured to receive first data packets transiting within the first subnetwork and second data packets transiting between the first subnetwork and the second subnetwork:
    receiving an incoming data packet, the incoming data packet comprising identification data that identifies the incoming data packet as belonging to the first data packets or as belonging to the second data packets; and
    enqueueing the incoming data packet to the buffer when:
    the identification data identifies the incoming data packet as belonging to the first data packets, and a proportion of the first data packets to a sum of the first data packets and the second data packets is less than a threshold value;
    or
    the identification data identifies the incoming data packet as belonging to the second data packets, and the buffer has space available to buffer the incoming data packet.
  2. The method of claim 1, further comprising,
    by the packet-processing device in the network:
    dropping the incoming data packet when:
    the identification data identifies the incoming data packet as belonging to the first data packets, and the proportion of the first data packets to the sum of the first data packets and the second data packets is greater than or equal to the threshold value;
    or
    the identification data identifies the incoming data packet as belonging to the second data packets, and the buffer is devoid of space available to buffer the incoming data packet.
  3. The method of any one of claims 1 or 2, wherein the proportion of the first data packets to the sum of the first data packets and the second data packets depends from:
    a rate of arrival at the packet-processing device of the first data packets, and
    a rate of arrival at the packet-processing device of the second data packets.
  4. The method of any one of claims 1 to 3, wherein:
    the network includes a first transmitter configured to transmit the first data packets to the packet-processing device and a first receiver configured to receive the first data packets from the packet-processing device, a round-trip time between the first transmitter and the first receiver being within a first range;
    the network includes a second transmitter configured to transmit the second data packets to the packet-processing device and a second receiver configured to receive the second data packets from the packet-processing device, a round-trip time between the second transmitter and the second receiver being within a second range; and
    the first range and the second range are exclusive of one another.
  5. The method of claim 4, wherein the first range is comprised between 1 μs and 1000 μs.
  6. The method of any one of claims 4 or 5, wherein the second range is comprised between 5 ms and 100 ms.
  7. The method of any one of claims 1 to 6, wherein the threshold value depends from a ratio between an unoccupied space of the buffer and a buffer capacity.
  8. The method of any one of claims 1 to 6, wherein the threshold value depends from a ratio between:
    an unoccupied space of the buffer, and
    an unreserved capacity of the buffer, the unreserved capacity being defined as a difference between:
    a buffer capacity, and
    a reserved space in the buffer, the reserved space in the buffer comprising an adjustable amount of space or a fixed amount of space.
  9. The method of claim 8, further comprising,
    by the packet-processing device in the network,
    enqueuing the incoming data packet to the reserved space in the buffer when:
    the identification data identifies the incoming data packet as belonging to the second data packets, and the reserved space in the buffer has space available to buffer the incoming data packet.
  10. The method of claim 3, wherein:
    the rate of arrival at the packet-processing device of the first packets depends from a ratio between:
    a count of the first data packets arriving at the packet-processing device within a register window, and
    a length of the register window; and
    the rate of arrival at the packet-processing device of the second packets depends from a ratio between:
    a count of the second data packets arriving at the packet-processing device within the register window, and
    the length of the register window.
  11. The method of claim 10, wherein the register window comprises 32 bits or 64 bits.
  12. The method of any one of claims 1 to 11, wherein the identification data comprises a source IP address.
  13. The method of any one of claims 1 to 12, wherein the identification data comprises a destination IP address.
  14. The method of any one of claims 1 to 13, further comprising,
    by the packet-processing device in the network,
    when the incoming data packet is enqueued to the buffer:
    assigning a first priority for dequeuing to the incoming data packet when the identification data indicates the data packet as belonging to the first data packets, and
    assigning a second priority for dequeuing to the incoming data packet when the identification data indicates the data packet as belonging to the second data packets, the second priority being less than the first priority.
  15. The method of claim 9, further comprising,
    by the packet-processing device in the network,
    when the incoming data packet is enqueued to the buffer or the reserved space in the buffer:
    assigning a first priority for dequeuing to the incoming data packet when the identification data indicates the data packet as belonging to the first data packets, and
    assigning a second priority for dequeuing to the incoming data packet when the identification data indicates the data packet as belonging to the second data packets, the second priority being less than the first priority.
  16. The method of any one of claims 1 to 15, wherein the first subnetwork comprises a first datacenter.
  17. The method of any one of claims 1 to 16, wherein the second subnetwork comprises a second datacenter.
  18. The method of any one of claims 1 to 17, wherein the first subnetwork and the second subnetwork belong to a distributed network.
  19. The method of any one of claims 1 to 18, further comprising,
    by the packet-processing device in the network,
    when the incoming data packet is enqueued to the buffer:
    dequeuing the incoming data packet according to a First-in-First-Out (FIFO) policy.
  20. The method of any one of claims 9 or 15, further comprising,
    by the packet-processing device in the network,
    when the incoming data packet is enqueued to the buffer or the reserved space in the buffer:
    dequeuing the incoming data packet according to a First-in-First-Out (FIFO) policy.
  21. The method of any one of claims 1 to 18, further comprising,
    by the packet-processing device in the network,
    when the incoming data packet is enqueued to the buffer:
    dequeuing the incoming data packet according to a Shortest-Remaining-Processing-Time (SRPT) policy.
  22. The method of any one of claims 9 or 15, further comprising,
    by the packet-processing device in the network,
    when the incoming data packet is enqueued to the buffer or the reserved space in the buffer:
    dequeuing the incoming data packet according to a Shortest-Remaining-Processing-Time (SRPT) policy.
  23. The method of any one of claims 1 to 22, wherein the network uses a Transmission Control Protocol (TCP) .
  24. The method of any one of claims 1 to 23, wherein the network uses a remote direct memory access (RDMA) .
  25. The method of any one of claims 1 to 24, wherein the network uses a User Datagram Protocol (UDP) .
  26. The method of any one of claims 1 to 25, wherein the network comprises a Wi-Fi network.
  27. The method of any one of claims 1 to 26, wherein the network comprises a cellular network, the cellular network being 5G or LTE.
  28. A network element, comprising:
    a packet-processing device having a buffer,
    the packet-processing device being coupled to a first subnetwork to receive first data packets transiting within the first subnetwork,
    the packet-processing device being coupled to a second subnetwork to receive second data packets transiting between the second subnetwork and the first subnetwork,
    the packet-processing device being configured to:
    receive an incoming data packet, the incoming data packet comprising identification data that identifies the incoming data packet as belonging to the first data packets or as belonging to the second data packets; and
    enqueue the incoming data packet to the buffer when:
    the identification data identifies the incoming data packet as belonging to the first data packets, and a proportion of the first data packets to a sum of the first data packets and the second data packets is less than a threshold value;
    or
    the identification data identifies the incoming data packet as belonging to the second data packets, and the buffer has space available to buffer the incoming data packet.
  29. The network element of claim 28, wherein the packet-processing device is further configured to:
    drop the incoming data packet when:
    the identification data identifies the incoming data packet as belonging to the first data packets, and the proportion of the first data packets to the sum of the first data packets and the second data packets is greater than or equal to the threshold value;
    or
    the identification data identifies the incoming data packet as belonging to the second data packets, and the buffer is devoid of space available to buffer the incoming data packet.
  30. The network element of any one of claims 28 or 29, wherein the proportion of the first data packets to the sum of the first data packets and the second data packets depends from:
    a rate of arrival at the packet-processing device of the first data packets, and
    a rate of arrival at the packet-processing device of the second data packets.
  31. The network element of any one of claims 28 to 30, wherein the threshold value depends from a ratio between an unoccupied space of the buffer and a buffer capacity.
  32. The network element of any one of claims 28 to 30, wherein the threshold value depends from a ratio between:
    an unoccupied space of the buffer, and
    an unreserved capacity of the buffer, the unreserved capacity being defined as a difference between:
    a buffer capacity, and
    a reserved space in the buffer, the reserved space in the buffer comprising an adjustable amount of space or a fixed amount of space.
  33. The network element of claim 32, wherein the packet-processing device is further configured to:
    enqueue the incoming data packet to the reserved space in the buffer when:
    the identification data identifies the incoming data packet as belonging to the second data packets, and the reserved space in the buffer has space available to buffer the incoming data packet.
  34. The network element of claim 30, wherein:
    the rate of arrival at the packet-processing device of the first packets depends from a ratio between:
    a count of the first data packets arriving at the packet-processing device within a register window, and
    a length of the register window; and
    the rate of arrival at the packet-processing device of the second packets depends from a ratio between:
    a count of the second data packets arriving at the packet-processing device within the register window, and
    the length of the register window.
  35. The network element of any one of claims 28 to 34, wherein the packet-processing device is further configured to:
    when the incoming data packet is enqueued to the buffer:
    assign a first priority for dequeuing to the incoming data packet when the identification data indicates the data packet as belonging to the first data packets, and
    assign a second priority for dequeuing to the incoming data packet when the identification data indicates the data packet as belonging to the second data packets, the second priority being less than the first priority.
  36. The network element of claim 33, wherein the packet-processing device is further configured to:
    when the incoming data packet is enqueued to the buffer or the reserved space in the buffer:
    assign a first priority for dequeuing to the incoming data packet when the identification data indicates the data packet as belonging to the first data packets, and
    assign a second priority for dequeuing to the incoming data packet when the identification data indicates the data packet as belonging to the second data packets, the second priority being less than the first priority.
  37. A system comprising:
    a network, the network comprising a first subnetwork and a second subnetwork, the first subnetwork being coupled to the second subnetwork;
    a first transmitter, the first transmitter being in the first subnetwork and being configured to send first data packets;
    a second transmitter, the second transmitter being in the first subnetwork and being configured to send second data packets;
    a first receiver, the first receiver being in the first subnetwork and being configured to receive the first data packets;
    a second receiver, the second receiver being in the second subnetwork and being configured to receive the second data packets; and
    a packet-processing device, the packet-processing device being in the network, and including a buffer, and being configured to:
    receive an incoming data packet, the incoming data packet comprising identification data that identifies the incoming data packet as belonging to the first data packets or as belonging to the second data packets;
    enqueue the incoming data packet to the buffer when:
    the identification data identifies the incoming data packet as belonging to the first data packets, and a proportion of the first data packets to a sum of the first data packets and the second data packets is less than a threshold value;
    or
    the identification data identifies the incoming data packet as belonging to the second data packets, and the buffer has space available to buffer the incoming data packet.
  38. A system comprising:
    a network, the network comprising a first subnetwork and a second subnetwork, the first subnetwork being coupled to the second subnetwork;
    a first transmitter, the first transmitter being in the first subnetwork and being configured to send first data packets;
    a second transmitter, the second transmitter being in the second subnetwork and being configured to send second data packets;
    a first receiver, the first receiver being in the first subnetwork and being configured to receive the first data packets;
    a second receiver, the second receiver being in the first subnetwork and being configured to receive the second data packets; and
    a packet-processing device, the packet-processing device being in the network, and including a buffer, and being configured to:
    receive an incoming data packet, the incoming data packet belonging to the first data packets or the second data packets, and comprising identification data that identifies the incoming data packet as belonging to the first data packets or as belonging to the second data packets;
    enqueue the incoming data packet to the buffer when:
    the identification data identifies the incoming data packet as belonging to the first data packets, and a proportion of the first data packets to a sum of the first data packets and the second data packets is less than a threshold value;
    or
    the identification data identifies the incoming data packet as belonging to the second data packets, and the buffer has space available to buffer the incoming data packet.
  39. The system of any one of claims 37 or 38, wherein the packet-processing device is further configured to:
    drop the incoming data packet when:
    the identification data identifies the incoming data packet as belonging to the first data packets, and the proportion of the first data packets to the sum of the first data packets and the second data packets is greater than or equal to the threshold value;
    or
    the identification data identifies the incoming data packet as belonging to the second data packets, and the buffer is devoid of space available to buffer the incoming data packet.
  40. The system of any one of claims 37 to 39, wherein the proportion of the first data packets to the sum of the first data packets and the second data packets depends from:
    a rate of arrival at the packet-processing device of the first data packets, and
    a rate of arrival at the packet-processing device of the second data packets.
  41. The system of any one of claims 37 to 40, wherein the threshold value depends from a ratio between an unoccupied space of the buffer and a buffer capacity.
  42. The system of any one of claims 37 to 40, wherein the threshold value depends from a ratio between:
    an unoccupied space of the buffer, and
    an unreserved capacity of the buffer, the unreserved capacity being defined as a difference between:
    a buffer capacity, and
    a reserved space in the buffer, the reserved space in the buffer comprising an adjustable amount of space or a fixed amount of space.
  43. The system of claim 42, wherein the packet-processing device is further configured to:
    enqueue the incoming data packet to the reserved space in the buffer when:
    the identification data identifies the incoming data packet as belonging to the second data packets, and the reserved space in the buffer has space available to buffer the incoming data packet.
  44. The system of claim 40, wherein:
    the rate of arrival at the packet-processing device of the first packets depends from a ratio between:
    a count of the first data packets arriving at the packet-processing device within a register window, and
    a length of the register window; and
    the rate of arrival at the packet-processing device of the second packets depends from a ratio between:
    a count of the second data packets arriving at the packet-processing device within the register window, and
    the length of the register window.
  45. The system of any one of claims 37 to 44, wherein the packet-processing device is further configured to:
    when the incoming data packet is enqueued to the buffer:
    assign a first priority for dequeuing to the incoming data packet when the identification data indicates the data packet as belonging to the first data packets, and
    assign a second priority for dequeuing to the incoming data packet when the identification data indicates the data packet as belonging to the second data packets, the second priority being less than the first priority.
  46. The system of claim 43, wherein the packet-processing device is further configured to: when the incoming data packet is enqueued to the buffer or the reserved space in the buffer:
    assign a first priority for dequeuing to the incoming data packet when the identification data indicates the data packet as belonging to the first data packets, and
    assign a second priority for dequeuing to the incoming data packet when the identification data indicates the data packet as belonging to the second data packets, the second priority being less than the first priority.
  47. A packet-processing network apparatus comprised in a network, the network including a first subnetwork and a second subnetwork, the first subnetwork being coupled to the second subnetwork, the apparatus being deployed in the first subnetwork, the apparatus configured to receive first data packets transiting within the first subnetwork and second data packets transiting between the first subnetwork and the second subnetwork, the apparatus comprising:
    a buffer;
    a processor coupled to the buffer; and
    a tangible processor-readable memory having recorded thereon instructions to be performed by the processor to carry out a set of actions, the set of actions comprising:
    receiving an incoming data packet, the incoming data packet comprising identification data that identifies the incoming data packet as belonging to the first data packets or as belonging to the second data packets; and
    enqueueing the incoming data packet to the buffer when:
    the identification data identifies the incoming data packet as belonging to the first data packets, and a proportion of the first data packets to a sum of the first data packets and the second data packets is less than a threshold value;
    or
    the identification data identifies the incoming data packet as belonging to the second data packets, and the buffer has space available to buffer the incoming data packet.
  48. The apparatus of claim 47, wherein the set of actions further comprises:
    dropping the incoming data packet when:
    the identification data identifies the incoming data packet as belonging to the first data packets, and the proportion of the first data packets to the sum of the first data packets and the second data packets is greater than or equal to the threshold value;
    or
    the identification data identifies the incoming data packet as belonging to the second data packets, and the buffer is devoid of space available to buffer the incoming data packet.
  49. The apparatus of claim 47 or claim 48, wherein the proportion of the first data packets to the sum of the first data packets and the second data packets depends from:
    a rate of arrival at the apparatus of the first data packets, and
    a rate of arrival at the apparatus of the second data packets.
  50. The apparatus of any one of claims 47 to 49, wherein the threshold value depends from a ratio between an unoccupied space of the buffer and a buffer capacity.
  51. The apparatus of any one of claims 47 to 49, wherein the threshold value depends from a ratio between:
    an unoccupied space of the buffer, and
    an unreserved capacity of the buffer, the unreserved capacity being defined as a difference between:
    a buffer capacity, and
    a reserved space in the buffer, the reserved space in the buffer comprising an adjustable amount of space or a fixed amount of space.
  52. The apparatus of claim 51, wherein the set of actions further comprises:
    enqueuing the incoming data packet to the reserved space in the buffer when:
    the identification data identifies the incoming data packet as belonging to the second data packets, and the reserved space in the buffer has space available to buffer the incoming data packet.
  53. The apparatus of claim 49, wherein:
    the rate of arrival at the apparatus of the first packets depends from a ratio between:
    a count of the first data packets arriving at the apparatus within a register window, and
    a length of the register window; and
    the rate of arrival at the apparatus of the second packets depends from a ratio between:
    a count of the second data packets arriving at the apparatus within the register window, and
    the length of the register window.
  54. The apparatus of any one of claims 47 to 53, wherein the set of actions further comprises, when the incoming data packet is enqueued to the buffer:
    assigning a first priority for dequeuing to the incoming data packet when the identification data indicates the data packet as belonging to the first data packets, and
    assigning a second priority for dequeuing to the incoming data packet when the identification data indicates the data packet as belonging to the second data packets, the second priority being less than the first priority.
  55. The apparatus of claim 52, wherein the set of actions further comprises, when the incoming data packet is enqueued to the buffer or the reserved space in the buffer:
    assigning a first priority for dequeuing to the incoming data packet when the identification data indicates the data packet as belonging to the first data packets, and
    assigning a second priority for dequeuing to the incoming data packet when the identification data indicates the data packet as belonging to the second data packets, the second priority being less than the first priority.
PCT/CN2023/094478 2023-05-16 2023-05-16 Adaptive traffic control for distributed networks WO2024234289A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2023/094478 WO2024234289A1 (en) 2023-05-16 2023-05-16 Adaptive traffic control for distributed networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2023/094478 WO2024234289A1 (en) 2023-05-16 2023-05-16 Adaptive traffic control for distributed networks

Publications (1)

Publication Number Publication Date
WO2024234289A1 true WO2024234289A1 (en) 2024-11-21

Family

ID=93518475

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/094478 WO2024234289A1 (en) 2023-05-16 2023-05-16 Adaptive traffic control for distributed networks

Country Status (1)

Country Link
WO (1) WO2024234289A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050141419A1 (en) * 2003-06-17 2005-06-30 Cisco Technology, Inc. A Corporation Of California Methods and devices for the coordination of flow control between a TCP/IP network and other networks
WO2012089110A1 (en) * 2010-12-28 2012-07-05 The Chinese University Of Hong Kong Systems and methods to improve performance of tcp over large bandwidth-delay-product networks
US20140153387A1 (en) * 2012-11-30 2014-06-05 Microsoft Corporation Tuning Congestion Notification for Data Center Networks
US9819585B1 (en) * 2015-05-29 2017-11-14 Netronome Systems, Inc. Making a flow ID for an exact-match flow table using a programmable reduce table circuit
US20210119922A1 (en) * 2019-10-16 2021-04-22 Huawei Technologies Co., Ltd. Network Configuration Method and Device
US11134020B1 (en) * 2020-04-16 2021-09-28 Morgan Stanley Services Group Inc. Flow control of two TCP streams between three network nodes
US20210336895A1 (en) * 2019-01-07 2021-10-28 Huawei Technologies Co., Ltd. Data transmission method and network device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23936974

Country of ref document: EP

Kind code of ref document: A1