US20130205038A1 - Lossless socket-based layer 4 transport (reliability) system for a converged ethernet network - Google Patents
- Publication number
- US20130205038A1 (application US13/366,640)
- Authority
- US
- United States
- Prior art keywords
- layer
- network
- dcb
- protocol
- congestion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/11—Identifying congestion
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/19—Flow control; Congestion control at layers above the network layer
- H04L47/193—Flow control; Congestion control at layers above the network layer at the transport layer, e.g. TCP related
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/26—Flow control; Congestion control using explicit feedback to the source, e.g. choke packets
Definitions
- the invention relates to the field of computer networking, and, more particularly, to Ethernet networks.
- Converged Enhanced Ethernet (CEE) datacenters allow high link speeds and short delays while introducing lossless operation (and lossless traffic classes) by means of link layer flow control (LL-FC, a.k.a. Priority Flow Control (PFC) in CEE) beyond the traditional lossy operation (lossy traffic classes).
- LL-FC link layer flow control
- PFC Priority Flow Control
- a reliability system for a Converged Enhanced Ethernet network may include a plurality of end points each comprising a layer 4 transport layer, where each end point is connected to a data center bridging (DCB) layer 2 network.
- the system may also include an adaptor between the layer 4 transport layer comprising one or more protocols, such as TCP, UDP, RCP, DCCP, XCP, etc., and the DCB layer 2 network to translate at least one of flow and congestion control feedback signals, provided by at least one of the DCB network and the transport layer, to consolidated feedback signals for controlling transmission by the transport layer.
- the DCB layer 2 network may generate flow control signals according to a flow control protocol supporting multiple priorities, such as Priority Flow Control (PFC).
- the DCB layer 2 network may generate congestion control feedback signals according to a quantized congestion notification (QCN) protocol.
- PFC and QCN can be individually or simultaneously enabled in the DCB layer 2 network. If both PFC and QCN are enabled, either one or both may be independently used by any end point.
- the end point may be connected to the DCB layer 2 network through an end station, wherein the end station implements a quantized congestion notification (QCN) reaction point imposing rate limits based on a QCN protocol to limit network congestion in the DCB layer 2 network in response to receiving congestion control signals.
- QCN quantized congestion notification
- the network traffic generated by the transport layer may be carried on layer 2 with lossy operation, either by configuring the end station to steer the traffic to a lossy priority of a priority flow control (PFC) protocol, or by not using any PFC protocol.
- the network traffic generated by the transport layer may be carried on layer 2 with lossless operation, by configuring the end station to steer the traffic to a lossless priority of a priority flow control (PFC) protocol and by letting the end station react to layer 2 flow control messages generated by the adjacent switch.
- the adaptor may preprocess the flow and congestion control feedback signals into consolidated feedback signals, with the preprocessing including at least one of delaying, aggregating, filtering, replicating, enhancing and decimating the primary feedback signals.
- the layer 4 transport layer may be a Transmission Control Protocol (TCP), RCP, XCP, DCCP, UDP or any socket-based transport scheme, herein named TCP.
- TCP Transmission Control Protocol
- the interface may provide a reduced-rate consolidated feedback signal indicating congestion severity induced by a TCP flow, and in which the interface comprises a TCP congestion module for controlling TCP flow transmissions in response to the consolidated feedback signal.
- the consolidated feedback signal may comprise at least one of a TCP flow rate limit, a TCP flow buffer occupancy metric, and a TCP flow rate limit for processing TCP ACKs (if present, as UDP does not employ ACKs) or Explicit Congestion Notifications (ECN) and for controlling associated TCP transmissions.
- the congestion module adjusts a TCP flow congestion window and transmission schedule in response to the consolidated feedback signal.
- the method may include providing a plurality of end points each comprising a layer 4 transport layer, where each end point is connected to a data center bridging (DCB) layer 2 network.
- the method may also include positioning an adaptor between the layer 4 transport layer and the DCB layer 2 network to translate at least one of flow and congestion control feedback signals, provided by at least one of the DCB network and the transport layer, to consolidated feedback signals for controlling transmission by the transport layer.
- DCB data center bridging
- the method may further include generating flow control signals at the DCB layer 2 network according to a flow control protocol supporting multiple priorities, such as Priority Flow Control (PFC).
- the method may additionally include generating congestion control feedback signals at the DCB layer 2 network according to a quantized congestion notification (QCN) protocol.
- the method may also include connecting the end point to the DCB layer 2 network through an end station, where the end station implements a quantized congestion notification (QCN) reaction point imposing rate limits based on a QCN protocol to limit network congestion in the DCB layer 2 network in response to receiving congestion control signals.
- the method may further include carrying network traffic generated by the transport layer on layer 2 with lossy operation, either by configuring the end station to steer the traffic to a lossy priority of a priority flow control (PFC) protocol, or by not using any PFC protocol.
- the method may additionally include carrying the network traffic generated by the transport layer on layer 2 with lossless operation, by configuring the end station to steer the traffic to a lossless priority of a priority flow control (PFC) protocol and by letting the end station react to layer 2 flow control messages generated by an adjacent switch.
- by switch we refer to any physical or virtual device that may be used for switching, bridging, steering, sorting, routing, forwarding, or scheduling packets or Ethernet frames.
- the method may also include processing TCP ACKs and/or ECNs, and controlling associated TCP transmissions where the consolidated feedback signal comprises at least one of a TCP flow rate limit, a TCP flow buffer occupancy metric, and a TCP flow rate limit.
- the method may further include adjusting a TCP flow congestion window and transmission schedule in response to the consolidated feedback signal via the congestion module.
- the computer readable program codes may be configured to cause the program to provide a plurality of end points each comprising a layer 4 transport layer, where each end point is connected to a data center bridging (DCB) layer 2 network.
- the computer readable program codes may also position an adaptor between the layer 4 transport layer and the DCB layer 2 network to translate at least one of flow and congestion control feedback signals, provided by at least one of the DCB network and the transport layer, to consolidated feedback signals for controlling transmission by the transport layer.
- FIG. 1 is a block diagram illustrating a Converged Enhanced network in accordance with the invention.
- FIG. 2 is a flowchart illustrating method aspects according to the invention.
- FIG. 3 is a flowchart illustrating method aspects according to the method of FIG. 2 .
- FIG. 4 is a flowchart illustrating method aspects according to the method of FIG. 2 .
- FIG. 5 is a flowchart illustrating method aspects according to the method of FIG. 4 .
- FIG. 6 is a flowchart illustrating method aspects according to the method of FIG. 4 .
- FIG. 7 is a flowchart illustrating method aspects according to the method of FIG. 4 .
- FIG. 8 is a flowchart illustrating method aspects according to the method of FIG. 5 .
- FIG. 9 is a flowchart illustrating method aspects according to the method of FIG. 5 .
- FIG. 10 illustrates a prior art hotspot saturation tree in a 5-stage fat tree.
- FIG. 11 illustrates explicit congestion notification buffering size in the prior art.
- FIG. 12 is a block diagram illustrating an alternative Converged Enhanced network embodiment in accordance with the invention.
- the system 10 includes a plurality of end points 14 a - 14 n each comprising a layer 4 transport layer 16 a - 16 n , where each end point is connected to a data center bridging (DCB) layer 2 network 18 .
- the system 10 also includes an adaptor 20 between the layer 4 transport layer 16 a - 16 n and the DCB layer 2 network 18 to translate at least one of flow and congestion control feedback signals, provided by at least one of the DCB network and the transport layer, to consolidated feedback signals for controlling transmission by the transport layer.
- the DCB layer 2 network 18 generates flow control signals according to a flow control protocol supporting multiple priorities, such as Priority Flow Control (PFC) and/or the like. In another embodiment, the DCB layer 2 network 18 generates congestion control feedback signals according to a quantized congestion notification (QCN) protocol.
- the end point 14 a - 14 n is connected to the DCB layer 2 network 18 through an end station 22 , wherein the end station implements a quantized congestion notification (QCN) reaction point imposing rate limits based on a QCN protocol to limit network congestion in the DCB layer 2 network in response to receiving congestion control signals.
- the network traffic generated by the transport layer 16 a - 16 n is carried on layer 2 18 with lossy operation, either by configuring the end station 22 to steer the traffic to a lossy priority of a priority flow control (PFC) protocol, or by not using any PFC protocol.
- the network traffic generated by the transport layer 16 a - 16 n is carried on layer 2 18 with lossless operation, by configuring the end station to steer the traffic to a lossless priority of a priority flow control (PFC) protocol and by letting the end station 22 react to layer 2 flow control messages generated by the adjacent switch 24 .
- the adaptor 20 preprocesses the flow and congestion control feedback signals into consolidated feedback signals, with the preprocessing including at least one of delaying, aggregating, filtering, replicating, enhancing and decimating the primary feedback signals.
- the layer 4 transport layer 16 a - 16 n is a Transmission Control Protocol (TCP) layer.
- the adaptor 20 provides a reduced-rate consolidated feedback signal indicating congestion severity induced by a TCP flow, and in which the adaptor comprises a TCP congestion module for controlling TCP flow transmissions in response to the consolidated feedback signal.
- the consolidated feedback signal may comprise at least one of a TCP flow rate limit, a TCP flow buffer occupancy metric, and a TCP flow rate limit for processing TCP ACKs and for controlling associated TCP transmissions.
- the congestion module adjusts a TCP flow congestion window and transmission schedule in response to the consolidated feedback signal.
- the method begins at Block 34 and may include providing a plurality of end points each comprising a layer 4 transport layer, where each end point is connected to a data center bridging (DCB) layer 2 network at Block 36 .
- the method may also include positioning an adaptor between the layer 4 transport layer and the DCB layer 2 network to translate at least one of flow and congestion control feedback signals, provided by at least one of the DCB network and the transport layer, to consolidated feedback signals for controlling transmission by the transport layer at Block 38 .
- the method ends at Block 40 .
- the method begins at Block 44 .
- the method may include the steps of FIG. 2 at Blocks 36 and 38 .
- the method may further include generating flow control signals at the DCB layer 2 network according to a flow control protocol supporting multiple priorities, such as Priority Flow Control (PFC) at Block 46 .
- the method ends at Block 48 .
- the method begins at Block 52 .
- the method may include the steps of FIG. 2 at Blocks 36 and 38 .
- the method may additionally include generating congestion control feedback signals at the DCB layer 2 network according to a quantized congestion notification (QCN) protocol at Block 54 .
- the method begins at Block 60 .
- the method may include the steps of FIG. 4 at Blocks 36 , 38 , and 54 .
- the method may also include connecting the end point to the DCB layer 2 network through an end station, where the end station implements a quantized congestion notification (QCN) reaction point imposing rate limits based on a QCN protocol to limit network congestion in the DCB layer 2 network in response to receiving congestion control signals at Block 62 .
- the method ends at Block 64 .
- the method begins at Block 68 .
- the method may include the steps of FIG. 4 at Blocks 36 , 38 , and 54 .
- the method may further include carrying network traffic generated by the transport layer on layer 2 with lossy operation, either by configuring the end station to steer the traffic to a lossy priority of a priority flow control (PFC) protocol, or by not using any PFC protocol at Block 70 .
- the method begins at Block 76 .
- the method may include the steps of FIG. 4 at Blocks 36 , 38 , and 54 .
- the method may additionally include carrying the network traffic generated by the transport layer on layer 2 with lossless operation, by configuring the end station to steer the traffic to a lossless priority of a priority flow control (PFC) protocol and by letting the end station react to layer 2 flow control messages generated by an adjacent switch at Block 78 .
- the method ends at Block 80 .
- the method begins at Block 84 .
- the method may include the steps of FIG. 5 at Blocks 36 , 38 , 54 , and 62 .
- the method may also include processing TCP ACKs and ECNs and controlling associated TCP transmissions where the consolidated feedback signal comprises at least one of a TCP flow rate limit, a TCP flow buffer occupancy metric, and a TCP flow rate limit at Block 86 .
- the method ends at Block 88 .
- the method begins at Block 92 .
- the method may include the steps of FIG. 5 at Blocks 36 , 38 , 54 , and 62 .
- the method may further include adjusting a TCP flow congestion window and transmission schedule in response to the consolidated feedback signal via the congestion module at Block 94 .
- the method ends at Block 96 .
- the computer readable program codes may be configured to cause the program to provide a plurality of end points 14 a - 14 n each comprising a layer 4 transport layer 16 a - 16 n respectively, where each end point is connected to a data center bridging (DCB) layer 2 network 18 .
- the computer readable program codes may also position an adaptor 20 between the layer 4 transport layer 16 a - 16 n and the DCB layer 2 network 18 to translate at least one of flow and congestion control feedback signals, provided by at least one of the DCB network and the transport layer, to consolidated feedback signals for controlling transmission by the transport layer.
- the system 10 provides reliability in a Converged Enhanced Ethernet network.
- CEE Converged Enhanced Ethernet
- DCN data center network
- CEE datacenters allow high link speeds and short delays while introducing lossless operation (and lossless traffic classes) by means of link layer flow control (LL-FC, a.k.a. PFC in CEE) beyond the traditional lossy operation (lossy traffic classes).
- the lossless operation of CEE introduces new challenges, such as deadlocks and saturation tree congestion. In particular, congestion from a single hotspot saturation tree can cause a total DCN collapse within a few tens to hundreds of microseconds (µs).
- FIG. 10 illustrates the problem (hotspot congestion box). If a sufficient fraction of all the inputs' traffic targets one of the outputs (in the figure, the output labeled 128 ), that output link can saturate: it becomes a hotspot (HS) that causes the queues in the switch feeding that link to fill up. If the traffic pattern persists, then, no matter what techniques are used to reassign buffer space, it is all ultimately exhausted. This forces that switch's LL-FC to quickly throttle back all the inputs feeding that switch. That in turn causes the previous stage to fill its buffer space. In a domino effect, the congestion eventually backs up all the way to the network inputs. This has been called tree saturation or, in other contexts, high-order Head of Line (HOL) blocking congestion spreading.
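The domino effect described above can be illustrated with a toy model. The following is a minimal sketch and not part of the patent: it assumes a linear chain of switch stages with small fixed buffers, hop-by-hop lossless flow control that makes an upstream stage wait whenever the next buffer is full, and a hotspot link that drains more slowly than the source injects. All names and parameters (`simulate_chain`, `capacity`, `drain_every`) are hypothetical.

```python
def simulate_chain(stages=4, capacity=3, drain_every=2, ticks=40):
    """Toy lossless chain: stage 0 is fed by the source, the last stage
    feeds the hotspot link. Returns per-stage occupancy after each tick."""
    queues = [0] * stages
    history = []
    for t in range(ticks):
        # The hotspot link drains one frame only every `drain_every` ticks.
        if t % drain_every == 0 and queues[-1] > 0:
            queues[-1] -= 1
        # Forward one frame per hop, downstream-first, only if the next
        # buffer has room (link-level flow control: never drop, just wait).
        for i in range(stages - 2, -1, -1):
            if queues[i] > 0 and queues[i + 1] < capacity:
                queues[i] -= 1
                queues[i + 1] += 1
        # The source injects one frame per tick unless stage 0 is full.
        if queues[0] < capacity:
            queues[0] += 1
        history.append(list(queues))
    return history


if __name__ == "__main__":
    for tick, occupancy in enumerate(simulate_chain()):
        print(tick, occupancy)
```

In this model the buffer next to the hotspot fills first and the full condition then moves back one stage at a time until the source itself is blocked, which is the saturation behavior the paragraph describes, reduced to a single branch of the tree.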
- HOL Head of Line
- the traffic causing the hotspot will root one or more saturation trees partly caused by the inherent traffic distribution and partly by flow interference or high-order HOL blocking.
- lossless LL-FC offers substantial performance benefits, although it has the drawback, besides its complexity, of facilitating saturation tree congestion.
- lossless ICTNs such as CEE-based DCNs will be increasingly exposed to saturation trees and congestion collapse.
- the first attempts in the CEE context were done in IEEE 802.1Qau, by using the QCN mechanism against simple (single bottleneck), yet persistent, hotspot congestion.
- IP Internet Protocol
- Double feedback loop: Unlike TCP in IP networks, FCC mechanisms in DCNs are based on a dual closed-loop control system: (i) LL-FC (PFC) and (ii) end-to-end CM (either QCN, or TCP, or both).
- the former is the smaller and faster loop, taking care of LL correctness and, sometimes, performance, e.g. advanced scheduling [ETS].
- CM involves a larger and slower loop with much longer time constants than the LL RTT; a complete CM solution may include congestion avoidance/prevention and control (after it happens). Since CM is inherently slower than its underlying LL-FC loops, it needs an aggregated view of the ICTN status—whereas the LL-FC relies only on local status.
- CM should compensate for the inertia of its larger loop by (a) acquiring global-view feedback (QCN CNMs, TCP ECNs, Vegas' delays, etc.) about traffic conditions and (b) elaborating a more complex source reaction that considers the outdated global view and, ideally, tries to predict the traffic based on the trends acquired so far.
- TCP does not assume the existence of a fast and lossless LL-FC layer; nor does TCP coexist well with other flow control schemes (QCN), as proven by TCP over ATM/ABR.
- Shallow buffers: The alternative would be to over-design the switch buffers beyond the size mandated for lossless ICTNs. This, however, is not only impractical (see FIG. 11 ), but also aggravates the post-congestion phase by slowing its recovery.
- Although TCP and ABR were extensively studied and improved for BE networks, we still lack conclusive evidence of their applicability and sufficiency in ICTNs.
- recent research invalidates TCP's use for certain types of middleware, as well as the TCP Incast.
- TCP was designed in the early 1980s to curb single-bottleneck congestion in lossy BE networks with e2e lags of 100s of ms and 10s of MB of switch buffers.
- a CEE-based DCN is lossless (hence multi-bottleneck saturation tree congestion), fast (lags of 0.5-50 µs), and has shallow buffers (10 to 100s of KB).
- system 10 uses the following changes/enhancements to TCP, resulting in “DC-TCP”: 1) Employ a software and/or hardware version of TCP, such as (CU)BIC, Reno, Vegas, Compound etc. in the end nodes.
- Future CEE DCNs will implement native L2 CM, i.e. QCN (see 802.1Qau in [42]). Retain the QCN congestion detection, while disabling the QCN rate limiter in the source.
- Congestion signaling and TCP rate limiter: Replace or complement the traditional TCP rate limiter based on duplicate ACKs with a hybrid rate limiter based on backward congestion notifications (BCNs) and QCN congestion notification messages (CNMs). Feed a digested form of CNMs associated with the TCP source into TCP for window control based on L2 feedback.
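One way to picture feeding a digested form of CNMs into TCP window control is sketched below. This is an illustration under stated assumptions, not the patent's algorithm: the feedback magnitude is assumed to be quantized to the range 0 to 63, the digest is assumed to be the worst magnitude seen in an update interval, and the window is cut by at most half per interval and otherwise grown by one segment. The class name, fields, and constants are hypothetical.

```python
from dataclasses import dataclass, field

FB_MAX = 63  # assumed upper bound of the quantized feedback magnitude


@dataclass
class HybridWindowControl:
    cwnd: float = 10.0          # congestion window, in segments
    min_cwnd: float = 1.0
    pending_fb: list = field(default_factory=list)

    def on_cnm(self, fb: int) -> None:
        """Record one CNM feedback magnitude for the current interval."""
        self.pending_fb.append(max(0, min(FB_MAX, fb)))

    def on_interval_end(self) -> float:
        """Digest the CNMs of this interval into a single window update."""
        if self.pending_fb:
            worst = max(self.pending_fb)
            # Multiplicative decrease, scaled by severity, at most 50%.
            cut = 0.5 * worst / FB_MAX
            self.cwnd = max(self.min_cwnd, self.cwnd * (1 - cut))
            self.pending_fb.clear()
        else:
            # No L2 congestion reported: additive increase by one segment.
            self.cwnd += 1.0
        return self.cwnd
```

Digesting per interval rather than reacting to every CNM keeps the window from being cut several times for a single congestion episode; whether the maximum, mean, or another summary is the right digest is a design choice the text leaves open.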
- BCNs backward congestion notifications
- CNMs QCN congestion notification messages
- TCP constants e.g. RTO
- Potentially adapt to changing network size and delay in real time (optional, via delay probing or Feedback Request protocol).
- the TCP receiver will report congestion/loss in the lossy network via duplicate ACKs.
- the CNMs sent towards the source must be appropriately translated at the boundary between the lossless and the lossy networks.
- One possibility is to convert CNMs to TCP window scalings in the boundary switch, as CNMs will not be understood by the lossy network.
- FIG. 12 illustrates one embodiment of system 10 .
- System 10 adapts TCP to a lossless DCN, by combining a re-tuned TCP flavor (CUBIC, Compound and New Reno are favored, others may apply) with L2 QCN signaling.
- System 10 copes with saturation tree issues, microsecond (µs) latency, and shallow buffers.
- System 10 also compensates and adapts to rapidly changing DCN loads.
- System 10 provides full TCP socket compatibility, and therefore legacy application support.
- a method for preventing the spread of packet congestion while simultaneously preventing packet loss in the network having at least one source channel adapter, at least one destination channel adapter, and multiple Fibre Channel over Ethernet (FCoE) enabled switches 24 is enabled by system 10 .
- the system 10 detects congestion occurring within the data center network.
- the system 10 measures the extent of the congestion and generates a feedback signal (value) at the Layer 2 level, notifying the source channel adapter and destination channel adapter that congestion is occurring.
- the system 10 also compensates for that congestion by changing the packet injection rate (within a sliding window) by an amount proportional to the magnitude of the feedback signal and dynamically readjusts the feedback signal (value) based on the extent of congestion.
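The measure-then-compensate behavior in the last three items can be sketched as follows. The patent does not give a formula; this sketch assumes a QCN-style feedback value built from how far a queue is above its target and how fast it is growing, and a rate cut proportional to that value, capped at 50% per update. The function names, the weight `w`, the `gain`, and the cap are all assumptions for illustration.

```python
def congestion_feedback(q_len, q_old, q_eq, w=2.0):
    """Return a non-negative congestion magnitude (0.0 means no congestion).

    q_len: current queue length, q_old: queue length at the previous
    sample, q_eq: target (equilibrium) queue length.
    """
    q_off = q_len - q_eq      # how far the queue is above its target
    q_delta = q_len - q_old   # how fast the queue is growing
    return max(0.0, q_off + w * q_delta)


def adjust_rate(rate, feedback, gain=0.01, min_rate=1.0):
    """Cut the injection rate by an amount proportional to the feedback."""
    decrease = min(0.5, gain * feedback)  # cap any single cut at 50%
    return max(min_rate, rate * (1.0 - decrease))


if __name__ == "__main__":
    fb = congestion_feedback(q_len=30, q_old=20, q_eq=26)
    print(fb, adjust_rate(1000.0, fb))
```

Recomputing the feedback at each sample is what lets the source readjust as the extent of congestion changes: a shrinking queue drives the value toward zero and the rate is then left untouched.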
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
A reliability system for a Converged Enhanced Ethernet network may include a plurality of end points each comprising a layer 4 transport layer, where each end point is connected to a data center bridging (DCB) layer 2 network. The system may also include an adaptor between the layer 4 transport layer and the DCB layer 2 network to translate at least one of flow and congestion control feedback signals, provided by at least one of the DCB network and the transport layer, to consolidated feedback signals for controlling transmission by the transport layer.
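As a rough illustration of the adaptor's role, the sketch below aggregates per-flow layer 2 feedback and decimates it into a lower-rate consolidated signal that a transport layer could consume. It is a minimal sketch under assumptions, not the claimed implementation: the class name `FeedbackAdaptor`, the `decimation` parameter, the numeric `severity` input, and the fields of the consolidated signal are all hypothetical.

```python
from collections import defaultdict


class FeedbackAdaptor:
    """Collects per-flow layer 2 feedback and emits one consolidated
    signal per flow for every `decimation` primary signals received."""

    def __init__(self, decimation=4):
        self.decimation = decimation
        self._pending = defaultdict(list)

    def on_l2_feedback(self, flow_id, severity):
        """Buffer one primary signal; return a consolidated signal or None."""
        samples = self._pending[flow_id]
        samples.append(severity)
        if len(samples) < self.decimation:
            return None  # decimate: stay silent until enough samples arrive
        consolidated = {
            "flow_id": flow_id,
            "mean_severity": sum(samples) / len(samples),  # aggregate
            "peak_severity": max(samples),
        }
        samples.clear()
        return consolidated
```

With `decimation=3`, for example, the first two feedback signals for a flow return `None` and the third returns a single summary, so the transport layer sees one third of the primary signal rate.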
Description
- The invention relates to the field of computer networking, and, more particularly, to Ethernet networks.
- Converged Enhanced Ethernet (CEE) datacenters allow high link speeds and short delays while introducing lossless operation (and lossless traffic classes) by means of link layer flow control (LL-FC, a.k.a. Priority Flow Control (PFC) in CEE) beyond the traditional lossy operation (lossy traffic classes). However, in contrast to the traditional Ethernet and Internet, the lossless operation of CEE introduces new challenges.
- According to one embodiment of the invention, a reliability system for a Converged Enhanced Ethernet network may include a plurality of end points each comprising a
layer 4 transport layer, where each end point is connected to a data center bridging (DCB)layer 2 network. The system may also include an adaptor between thelayer 4 transport layer comprising one or more protocols, such as TCP, UDP, RCP, DCCP, XCP, etc., and theDCB layer 2 network to translate at least one of flow and congestion control feedback signals, provided by at least one of the DCB network and the transport layer, to consolidated feedback signals for controlling transmission by the transport layer. - The
DCB layer 2 network may generate flow control signals according to a flow control protocol supporting multiple priorities, such as Priority Flow Control (PFC). TheDCB layer 2 network may generate congestion control feedback signals according to a quantized congestion notification (QCN) protocol. PFC and QCN can be individually or simultaneously enabled in theDCB layer 2 network. If both PFC and QCN are enabled, either one or both may be independently used by any end point. - The end point may be connected to the
DCB layer 2 network through an end station, wherein the end station implements a quantized congestion notification (QCN) reaction point imposing rate limits based on a QCN protocol to limit network congestion in theDCB layer 2 network in response to receiving congestion control signals. The network traffic generated by the transport layer may be carried onlayer 2 with lossy operation, either by configuring the end station to steer the traffic to a lossy priority of a priority flow control (PFC) protocol, or by not using any PFC protocol. - The network traffic generated by the transport layer may be carried on
layer 2 with lossless operation, by configuring the end station to steer the traffic to a lossless priority of a priority flow control (PFC) protocol and by letting the end station react tolayer 2 flow control messages generated by the adjacent switch. The network traffic generated by the transport layer may be carried onlayer 2 with lossy operation, either by configuring the end station to steer the traffic to a lossy priority of a priority flow control (PFC) protocol, or by not using any PFC protocol. - The network traffic generated by the transport layer may be carried on
layer 2 with lossless operation, by configuring the end station to steer the traffic to a lossless priority of a priority flow control (PFC) protocol and by letting the end station react tolayer 2 flow control messages generated by an adjacent switch.] The adaptor may preprocess the flow and congestion control feedback signals into consolidated feedback signals, with the preprocessing including at least one of delaying, aggregating, filtering, replicating, enhancing and decimating the primary feedback signals. - The
layer 4 transport layer may be a Transmission Control Protocol (TCP), RCP, XCP, DCCP, UDP or any socket-based transport scheme, herein named TCP. The interface may provide a reduced-rate consolidated feedback signal indicating congestion severity induced by a TCP flow, and in which the interface comprises a TCP congestion module for controlling TCP flow transmissions in response to the consolidated feedback signal. - The consolidated feedback signal may comprise at least one of a TCP flow rate limit, a TCP flow buffer occupancy metric, and a TCP flow rate limit for processing TCP ACKs [if existent, as UDP doesn't employ ACK] or Explicit Congestion Notifications (ECN) and for controlling associated TCP transmissions. The congestion module adjusts a TCP flow congestion window and transmission schedule in response to the consolidated feedback signal.
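As a rough illustration of how a congestion module might act on a consolidated feedback signal, the following Python sketch caps a flow's congestion window at the bandwidth-delay product implied by a rate limit, shrinks it further as buffer occupancy grows, and derives a pacing interval as a simple transmission schedule. The class names, the field names, and the 50% occupancy scaling are illustrative assumptions and are not specified in this disclosure.

```python
from dataclasses import dataclass


@dataclass
class ConsolidatedFeedback:
    """Hypothetical consolidated feedback signal for one TCP flow."""
    rate_limit_bytes_per_s: float  # assumed per-flow rate limit
    buffer_occupancy: float        # assumed metric, 0.0 (empty) to 1.0 (full)


@dataclass
class FlowState:
    """Hypothetical per-flow sender state."""
    cwnd_bytes: int
    mss_bytes: int
    rtt_s: float


def adjust_flow(flow: FlowState, fb: ConsolidatedFeedback) -> tuple[int, float]:
    """Return (new congestion window in bytes, pacing interval in seconds)."""
    occupancy = min(max(fb.buffer_occupancy, 0.0), 1.0)
    # Window that sustains the rate limit over one round trip.
    bdp_bytes = fb.rate_limit_bytes_per_s * flow.rtt_s
    # Assumed policy: give back up to half of that window as buffers fill.
    target = bdp_bytes * (1.0 - 0.5 * occupancy)
    # Never grow the window here, and never drop below one segment.
    new_cwnd = max(flow.mss_bytes, min(flow.cwnd_bytes, round(target)))
    # Spread one window of segments evenly across one round trip.
    pacing_interval_s = flow.rtt_s * flow.mss_bytes / new_cwnd
    return new_cwnd, pacing_interval_s
```

In this sketch the feedback can only shrink the window; window growth would remain the job of whichever TCP flavor is in use.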
- Another aspect of the invention is a reliability method for a Converged Enhanced Ethernet network. The method may include providing a plurality of end points each comprising a
layer 4 transport layer, where each end point is connected to a data center bridging (DCB) layer 2 network. The method may also include positioning an adaptor between the layer 4 transport layer and the DCB layer 2 network to translate at least one of flow and congestion control feedback signals, provided by at least one of the DCB network and the transport layer, to consolidated feedback signals for controlling transmission by the transport layer. - The method may further include generating flow control signals at the
DCB layer 2 network according to a flow control protocol supporting multiple priorities, such as Priority Flow Control (PFC). The method may additionally include generating congestion control feedback signals at the DCB layer 2 network according to a quantized congestion notification (QCN) protocol. - The method may also include connecting the end point to the
DCB layer 2 network through an end station, where the end station implements a quantized congestion notification (QCN) reaction point imposing rate limits based on a QCN protocol to limit network congestion in the DCB layer 2 network in response to receiving congestion control signals. The method may further include carrying network traffic generated by the transport layer on layer 2 with lossy operation, either by configuring the end station to steer the traffic to a lossy priority of a priority flow control (PFC) protocol, or by not using any PFC protocol. - The method may additionally include carrying the network traffic generated by the transport layer on
layer 2 with lossless operation, by configuring the end station to steer the traffic to a lossless priority of a priority flow control (PFC) protocol and by letting the end station react to layer 2 flow control messages generated by an adjacent switch. The term switch refers to any physical or virtual device that may be used for switching, bridging, steering, sorting, routing, forwarding, or scheduling packets or Ethernet frames. The method may also include processing TCP ACKs and/or ECNs, and controlling associated TCP transmissions where the consolidated feedback signal comprises at least one of a TCP flow rate limit, a TCP flow buffer occupancy metric, and a TCP flow rate limit. The method may further include adjusting a TCP flow congestion window and transmission schedule in response to the consolidated feedback signal via the congestion module. - Another aspect of the invention is computer readable program codes coupled to tangible media to address reliability in a converged Ethernet network. The computer readable program codes may be configured to cause the program to provide a plurality of end points each comprising a
layer 4 transport layer, where each end point is connected to a data center bridging (DCB) layer 2 network. The computer readable program codes may also position an adaptor between the layer 4 transport layer and the DCB layer 2 network to translate at least one of flow and congestion control feedback signals, provided by at least one of the DCB network and the transport layer, to consolidated feedback signals for controlling transmission by the transport layer. -
FIG. 1 is a block diagram illustrating a Converged Enhanced network in accordance with the invention. -
FIG. 2 is a flowchart illustrating method aspects according to the invention. -
FIG. 3 is a flowchart illustrating method aspects according to the method of FIG. 2. -
FIG. 4 is a flowchart illustrating method aspects according to the method of FIG. 2. -
FIG. 5 is a flowchart illustrating method aspects according to the method of FIG. 4. -
FIG. 6 is a flowchart illustrating method aspects according to the method of FIG. 4. -
FIG. 7 is a flowchart illustrating method aspects according to the method of FIG. 4. -
FIG. 8 is a flowchart illustrating method aspects according to the method of FIG. 5. -
FIG. 9 is a flowchart illustrating method aspects according to the method of FIG. 5. -
FIG. 10 illustrates a prior art hotspot saturation tree in a 5-stage fat tree. -
FIG. 11 illustrates explicit congestion notification buffering size in the prior art. -
FIG. 12 is a block diagram illustrating an alternative Converged Enhanced network embodiment in accordance with the invention. - The invention will now be described more fully hereinafter with reference to the accompanying drawings, in which preferred embodiments of the invention are shown. Like numbers refer to like elements throughout, and like numbers with letter suffixes are used to identify similar parts in a single embodiment.
- With reference now to
FIG. 1, a reliability system 10 for a Converged Enhanced Ethernet network 12 is initially described. In an embodiment, the system 10 includes a plurality of end points 14a-14n each comprising a layer 4 transport layer 16a-16n, where each end point is connected to a data center bridging (DCB) layer 2 network 18. The system 10 also includes an adaptor 20 between the layer 4 transport layer 16a-16n and the DCB layer 2 network 18 to translate at least one of flow and congestion control feedback signals, provided by at least one of the DCB network and the transport layer, to consolidated feedback signals for controlling transmission by the transport layer. - In one embodiment, the
DCB layer 2 network 18 generates flow control signals according to a flow control protocol supporting multiple priorities, such as Priority Flow Control (PFC) and/or the like. In another embodiment, the DCB layer 2 network 18 generates congestion control feedback signals according to a quantized congestion notification (QCN) protocol. - In one embodiment, the end point 14a-14n is connected to the
DCB layer 2 network 18 through an end station 22, wherein the end station implements a quantized congestion notification (QCN) reaction point imposing rate limits based on a QCN protocol to limit network congestion in the DCB layer 2 network in response to receiving congestion control signals. In another embodiment, the network traffic generated by the transport layer 16a-16n is carried on the layer 2 network 18 with lossy operation, either by configuring the end station 22 to steer the traffic to a lossy priority of a priority flow control (PFC) protocol, or by not using any PFC protocol. - In one embodiment, the network traffic generated by the
transport layer 16a-16n is carried on the layer 2 network 18 with lossless operation, by configuring the end station to steer the traffic to a lossless priority of a priority flow control (PFC) protocol and by letting the end station 22 react to layer 2 flow control messages generated by the adjacent switch 24. In another embodiment, the network traffic generated by the transport layer 16a-16n is carried on the layer 2 network 18 with lossy operation, either by configuring the end station 22 to steer the traffic to a lossy priority of a priority flow control (PFC) protocol, or by not using any PFC protocol. - In one embodiment, the network traffic generated by the
transport layer 16a-16n is carried on the layer 2 network 18 with lossless operation, by configuring the end station 22 to steer the traffic to a lossless priority of a priority flow control (PFC) protocol and by letting the end station react to layer 2 flow control messages generated by an adjacent switch 24. In another embodiment, the adaptor 20 preprocesses the flow and congestion control feedback signals into consolidated feedback signals, with the preprocessing including at least one of delaying, aggregating, filtering, replicating, enhancing and decimating the primary feedback signals. - In one embodiment, the
layer 4 transport layer 16a-16n is a Transmission Control Protocol (TCP) layer. In another embodiment, the adaptor 20 provides a reduced-rate consolidated feedback signal indicating congestion severity induced by a TCP flow, and in which the adaptor comprises a TCP congestion module for controlling TCP flow transmissions in response to the consolidated feedback signal. - The consolidated feedback signal may comprise at least one of a TCP flow rate limit, a TCP flow buffer occupancy metric, and a TCP flow rate limit for processing TCP ACKs and for controlling associated TCP transmissions. The congestion module adjusts a TCP flow congestion window and transmission schedule in response to the consolidated feedback signal.
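One possible reading of the preprocessing performed by the adaptor 20 is sketched below in Python. It combines three of the listed operations: filtering (dropping weak signals), aggregating (keeping the worst severity seen per flow), and decimating (emitting one consolidated signal for every few primary signals), which yields a reduced-rate consolidated feedback signal. The class name, the method name, and both thresholds are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class FeedbackAdaptor:
    """Hypothetical adaptor turning primary L2 feedback into consolidated signals."""

    def __init__(self, decimation: int = 4, min_severity: int = 1):
        self.decimation = decimation      # primary signals per consolidated signal
        self.min_severity = min_severity  # weaker signals are filtered out
        self._pending = defaultdict(list)

    def on_primary_feedback(self, flow_id, severity: int):
        """Return a consolidated severity for the flow, or None if nothing is emitted."""
        if severity < self.min_severity:
            return None                   # filtering
        bucket = self._pending[flow_id]
        bucket.append(severity)           # aggregating per flow
        if len(bucket) < self.decimation:
            return None                   # decimating: stay silent until enough samples
        consolidated = max(bucket)        # report the worst congestion observed
        bucket.clear()
        return consolidated
```

Taking the maximum is only one aggregation choice; an average or the most recent value would fit the same structure.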
- Another aspect of the invention is a reliability method for a Converged Enhanced Ethernet network, which is now described with reference to
flowchart 32 of FIG. 2. The method begins at Block 34 and may include providing a plurality of end points each comprising a layer 4 transport layer, where each end point is connected to a data center bridging (DCB) layer 2 network at Block 36. The method may also include positioning an adaptor between the layer 4 transport layer and the DCB layer 2 network to translate at least one of flow and congestion control feedback signals, provided by at least one of the DCB network and the transport layer, to consolidated feedback signals for controlling transmission by the transport layer at Block 38. The method ends at Block 40. - In another method embodiment, which is now described with reference to
flowchart 42 of FIG. 3, the method begins at Block 44. The method may include the steps of FIG. 2 and may further include generating flow control signals at the DCB layer 2 network according to a flow control protocol supporting multiple priorities, such as Priority Flow Control (PFC) at Block 46. The method ends at Block 48. - In another method embodiment, which is now described with reference to
flowchart 50 of FIG. 4, the method begins at Block 52. The method may include the steps of FIG. 2 and may additionally include generating congestion control feedback signals at the DCB layer 2 network according to a quantized congestion notification (QCN) protocol at Block 54. The method ends at Block 56. - In another method embodiment, which is now described with reference to
flowchart 58 of FIG. 5, the method begins at Block 60. The method may include the steps of FIG. 4 and may also include connecting the end point to the DCB layer 2 network through an end station, where the end station implements a quantized congestion notification (QCN) reaction point imposing rate limits based on a QCN protocol to limit network congestion in the DCB layer 2 network in response to receiving congestion control signals at Block 62. The method ends at Block 64. - In another method embodiment, which is now described with reference to
flowchart 66 of FIG. 6, the method begins at Block 68. The method may include the steps of FIG. 4 and may further include carrying network traffic generated by the transport layer on layer 2 with lossy operation, either by configuring the end station to steer the traffic to a lossy priority of a priority flow control (PFC) protocol, or by not using any PFC protocol at Block 70. The method ends at Block 72. - In another method embodiment, which is now described with reference to
flowchart 74 of FIG. 7, the method begins at Block 76. The method may include the steps of FIG. 4 and may additionally include carrying the network traffic generated by the transport layer on layer 2 with lossless operation, by configuring the end station to steer the traffic to a lossless priority of a priority flow control (PFC) protocol and by letting the end station react to layer 2 flow control messages generated by an adjacent switch at Block 78. The method ends at Block 80. - In another method embodiment, which is now described with reference to
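flowchart 82 of FIG. 8, the method begins at Block 84. The method may include the steps of FIG. 5 and may also include processing TCP ACKs and controlling associated TCP transmissions, where the consolidated feedback signal comprises at least one of a TCP flow rate limit, a TCP flow buffer occupancy metric, and a TCP flow rate limit, at Block 86. The method ends at Block 88. - In another method embodiment, which is now described with reference to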
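flowchart 90 of FIG. 9, the method begins at Block 92. The method may include the steps of FIG. 5 and may further include adjusting a TCP flow congestion window and transmission schedule in response to the consolidated feedback signal via the congestion module at Block 94. The method ends at Block 96. - Another aspect of the invention is computer readable program codes coupled to tangible media to address reliability in a converged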
Ethernet network 12. The computer readable program codes may be configured to cause the program to provide a plurality of end points 14a-14n each comprising a layer 4 transport layer 16a-16n respectively, where each end point is connected to a data center bridging (DCB) layer 2 network 18. The computer readable program codes may also position an adaptor 20 between the layer 4 transport layer 16a-16n and the DCB layer 2 network 18 to translate at least one of flow and congestion control feedback signals, provided by at least one of the DCB network and the transport layer, to consolidated feedback signals for controlling transmission by the transport layer. - In view of the foregoing, the
system 10 provides reliability in a Converged Enhanced Ethernet network. For example, current Converged Enhanced Ethernet (CEE) data center networks (DCNs) suffer from saturation tree congestion. CEE datacenters allow high link speeds and short delays while introducing lossless operation (and lossless traffic classes) by means of link layer flow control (LL-FC, also known as PFC in CEE), beyond the traditional lossy operation (lossy traffic classes). However, in contrast to traditional Ethernet and the Internet, the lossless operation of CEE introduces new challenges, such as deadlocks and saturation tree congestion. Namely, a single hotspot saturation tree can cause a total DCN collapse within tens to hundreds of microseconds. -
FIG. 10 illustrates the problem (hotspot congestion box). If a sufficient fraction of all the inputs' traffic targets one of the outputs (in the figure, the output labeled 128), that output link can saturate: it becomes a hotspot (HS) that causes the queues in the switch feeding that link to fill up. If the traffic pattern persists, then, no matter what techniques are used to reassign buffer space, it is all ultimately exhausted. This forces that switch's LL-FC to quickly throttle back all the inputs feeding that switch. That in turn causes the previous stage to fill its buffer space. In a domino effect, the congestion eventually backs up all the way to the network inputs. This has been called tree saturation or, in other contexts, high-order Head of Line (HOL) blocking congestion spreading. - Ultimately, the traffic causing the hotspot will root one or more saturation trees partly caused by the inherent traffic distribution and partly by flow interference or high-order HOL blocking. Once the tree of saturated switches is fully formed, every packet must cross at least one saturated switch. As the time to exit a queue grows exponentially the further a switch is from the hot destination, a majority of the delay is incurred even if only a single switch must be crossed. Hence, the network as a whole suffers a catastrophic loss of throughput: Its aggregate throughput is gated by the throughput of the single hot output.
- Saturation spreads very quickly via LL-FC; according to gathered data, the tree is filled in less than 10 traversal times of the network, far too quickly for software to react in time to the problem. Naturally, the problem also dissipates slowly because all the queues involved must be emptied. Hence, a hardware solution is required that reacts quickly enough to keep the tree from growing large. Clearly the network topology is irrelevant to this effect; saturation trees can be induced in any DCN topology.
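The domino effect described above can be reproduced with a deliberately simplified toy model. The Python sketch below replaces the fat tree with a single chain of switch queues, where queue 0 feeds the hot output and the last queue sits at the network input, and it models LL-FC as "forward only as much as the downstream queue can accept". All parameter names and values are assumptions made for illustration; the model does not represent the topology of FIG. 10.

```python
def simulate_backpressure(stages=4, capacity=8, inject=2, drain=1, ticks=200):
    """Return, per queue, the first tick at which it is full (None if never).

    queues[0] feeds the hot output; queues[-1] is fed by the traffic sources.
    """
    queues = [0] * stages
    first_full = [None] * stages
    for tick in range(1, ticks + 1):
        # The hot output link drains slower than the sources inject.
        queues[0] -= min(queues[0], drain)
        # LL-FC: each stage forwards only what the next queue can still hold.
        for i in range(1, stages):
            space = capacity - queues[i - 1]
            moved = min(queues[i], space, inject)
            queues[i] -= moved
            queues[i - 1] += moved
        # Sources keep injecting until the input queue itself is throttled.
        queues[-1] += min(inject, capacity - queues[-1])
        for i, occupancy in enumerate(queues):
            if occupancy == capacity and first_full[i] is None:
                first_full[i] = tick
    return first_full
```

In this model the queue next to the hot output fills first and the queues toward the inputs fill afterwards, stage by stage; once all are full, delivered throughput equals the drain rate of the hot output.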
- Thus, lossless LL-FC offers substantial performance benefits, but it has the drawback, besides its complexity, of facilitating saturation tree congestion. Unless an efficient CM protocol is designed and implemented to control the fabric operation just below the saturation region and recover from the occasional crossovers, lossless ICTNs such as CEE-based DCNs will be increasingly exposed to saturation trees and congestion collapse. However, while the problem is long outstanding, definitive solutions are not yet practically available. The first attempts in the CEE context were made in IEEE 802.1Qau, by using the QCN mechanism against simple (single bottleneck), yet persistent, hotspot congestion.
- Why not rely on a widely deployed solution such as transmission control protocol (TCP)? The answer is that DCNs and their congestion phenomena are sufficiently different from Ethernet (aka Best Effort) and Internet Protocol (IP) networks to invalidate the direct transfer of TCP to the DCN environment without major adaptations, even if ECN is added. The three main reasons are as follows. Losslessness: TCP has been designed to operate based on loss; packet drops are the basic feedback mechanism that triggers the source reaction. Packet loss, however, contradicts CEE's principles.
- Next, recovery is too slow whenever the TCP window is smaller than 6 packets. In smaller ICTNs with large MTUs, the TCP window size is mostly below 6 packets. Conversely, if the recovery is too aggressive, performance may decrease 10-fold. This has been recently validated by the TCP Incast papers.
- Double feedback loop: Unlike TCP in IP networks, FCC mechanisms in DCNs are based on a dual closed-loop control system: (i) LL-FC (PFC) and (ii) end-to-end CM (either QCN, or TCP, or both). The former is the smaller and faster loop, taking care of LL correctness and, sometimes, performance, e.g. advanced scheduling [ETS]. CM involves a larger and slower loop with much longer time constants than the LL RTT; a complete CM solution may include congestion avoidance/prevention and control (after congestion happens). Since CM is inherently slower than its underlying LL-FC loops, it needs an aggregated view of the ICTN status, whereas the LL-FC relies only on local status. Thus CM should compensate for the inertia of its larger loop by (a) acquiring global-view feedback (QCN CNMs, TCP ECNs, Vegas' delays, etc.) about traffic conditions and (b) elaborating a more complex source reaction that considers the outdated global view and, ideally, tries to predict the traffic based on the trends acquired so far. The problem is that TCP does not assume the existence of a fast and lossless LL-FC layer; nor does TCP coexist well with other flow control schemes (QCN), as proven by TCP over ATM/ABR.
- Shallow buffers: The alternative would be to over-design the switch buffers beyond the size mandated for lossless ICTNs. This, however, is not only impractical (see FIG. 11) but also aggravates the post-congestion phase by slowing its recovery. Whereas TCP (and ABR) were extensively studied and improved for BE networks, we still lack conclusive evidence of their applicability and sufficiency in ICTNs. Furthermore, recent research invalidates TCP's use for certain types of middleware, as well as the TCP Incast. - TCP was designed in the early 1980s to curb single bottleneck congestion in lossy BE networks with e2e lags of 100s of ms and 10s of MB of switch buffers. By contrast, a CEE-based DCN is lossless (hence multi-bottleneck saturation tree congestion), fast (lags of 0.5-50 µs) and has shallow buffers (10s-100s of KB).
- In response,
system 10 uses the following changes/enhancements to TCP, resulting in “DC-TCP”: 1) Employ a software and/or hardware version of TCP, such as (CU)BIC, Reno, Vegas, Compound etc. in the end nodes. - 2) Disable QCN's controller, if present. Future CEE DCN will implement native L2 CM, i.e. QCN (see 802.1Qau in [42]). Retain the QCN congestion detection, while disabling the QCN rate limiter in the source.
- 3) Congestion signaling and TCP rate limiter: Replace or complement the traditional TCP rate limiter based on duplicate ACKs with a hybrid rate limiter based on backward congestion notifications (BCNs) and QCN congestion notification messages (CNMs). Feed a digested form of CNMs associated with the TCP source into TCP for window control based on L2 feedback.
- 4) Detect and compensate for saturation tree. Re-tune the TCP constants (e.g. RTO) based on the DCN topology and size. Potentially adapt to changing network size and delay in real time (optional, via delay probing or Feedback Request protocol).
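The hybrid rate limiter of item 3) might look roughly like the following Python sketch, which lets one congestion window react both to duplicate ACKs (loss in a lossy segment) and to digested CNMs (L2 feedback from the lossless segment). The class and method names are hypothetical. Halving on the third duplicate ACK follows the familiar Reno-style reaction, while the 6-bit feedback range and the cap of a 50% cut per CNM are assumptions borrowed from common descriptions of QCN rather than values stated in this disclosure.

```python
class HybridWindowLimiter:
    """Hypothetical TCP window limiter driven by duplicate ACKs and digested CNMs."""

    DUP_ACK_THRESHOLD = 3
    FB_MAX = 63  # assumed largest magnitude of a 6-bit quantized feedback value

    def __init__(self, cwnd_bytes: int, mss_bytes: int):
        self.cwnd = cwnd_bytes
        self.mss = mss_bytes
        self.dup_acks = 0

    def on_new_ack(self):
        """Additive increase of roughly one segment per round trip."""
        self.dup_acks = 0
        self.cwnd += max(1, self.mss * self.mss // self.cwnd)

    def on_duplicate_ack(self):
        """Loss signal from a lossy network: halve on the third duplicate ACK."""
        self.dup_acks += 1
        if self.dup_acks == self.DUP_ACK_THRESHOLD:
            self.cwnd = max(self.mss, self.cwnd // 2)
            self.dup_acks = 0

    def on_digested_cnm(self, feedback: int):
        """L2 signal: cut the window in proportion to the feedback magnitude."""
        magnitude = min(abs(feedback), self.FB_MAX)
        cut = self.cwnd * magnitude // (2 * self.FB_MAX)  # at most half the window
        self.cwnd = max(self.mss, self.cwnd - cut)
```

Because the CNM carries a multi-bit value, a mild congestion report trims the window slightly, whereas a duplicate-ACK event always triggers the full halving.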
- One challenge will be that congestion may occur in an external Ethernet network that may be lossy, in which case congestion may result in packet loss and ensuing duplicate ACKs from the TCP receiver. This is about interoperability of DC-TCP with TCP in the external network. Hence the need for a hybrid rate limiter that understands BCNs, digested CNMs, as well as duplicate ACKs.
- In case of a DC-TCP sender and a TCP receiver in a lossy network, the TCP receiver will report congestion/loss in the lossy network via duplicate ACKs. In case of a TCP sender in a lossy network and congestion occurring in a lossless network leading to the DC-TCP receiver, the CNMs sent towards the source must be appropriately translated at the boundary between the lossless and the lossy networks. One possibility is to convert CNMs to TCP window scalings in the boundary switch, as CNMs will not be understood by the lossy network.
- While dozens of TCP flavors have been published, generally they deal with fast WANs (same rate as DCNs, but long delays) or wireless applications. Lossless applications of TCP that deal with saturation trees (multiple correlated bottlenecks) and also work with shallow buffers are not known thus far. Furthermore, TCP has not yet been combined with L2 congestion detection mechanisms such as QCN, which provides a multi-bit quantitative feedback (ECN/BCN is commonly binary). To effectively curb DCN congestion, the compensation scheme described above may optionally be applied as well.
FIG. 12 illustrates one embodiment of system 10. -
System 10 adapts TCP to a lossless DCN by combining a re-tuned TCP flavor (CUBIC, Compound and New Reno are favored, others may apply) with L2 QCN signaling. System 10 copes with saturation tree issues, microsecond latency, and shallow buffers. -
System 10 also compensates and adapts to rapidly changing DCN loads. System 10 provides full TCP socket compatibility, and therefore legacy application support. - In a CEE-based
data center network 12, a method for preventing the spread of packet congestion while simultaneously preventing packet loss in the network having at least one source channel adapter, at least one destination channel adapter, and multiple fiber channel over Ethernet (FCoE) enabled switches 24 is enabled by system 10. The system 10 detects congestion occurring within the data center network. The system 10 measures the extent of the congestion and generates a feedback signal (value) at the Layer 2 level, notifying the source channel adapter and destination channel adapter that congestion is occurring. The system 10 also compensates for that congestion by changing the packet injection rate (within a sliding window) by an amount proportional to the magnitude of the feedback signal and dynamically readjusts the feedback signal (value) based on the extent of congestion. - It should be noted that in some alternative implementations, the functions noted in a flowchart block may occur out of the order noted in the figures. For instance, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved because the flow diagrams depicted herein are just examples. There may be many variations to these diagrams or the steps (or operations) described therein without departing from the spirit of the invention. For example, the steps may be performed concurrently and/or in a different order, or steps may be added, deleted, and/or modified. All of these variations are considered a part of the claimed invention.
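As a rough illustration of the measure-and-compensate behavior that system 10 is described as enabling, the Python sketch below computes a feedback value from queue occupancy in the style commonly described for QCN congestion points, and then reduces the injection rate by an amount proportional to the magnitude of that value. The function names, the weight w, the gain, and the 50% cap are illustrative assumptions; the disclosure itself does not fix these constants.

```python
def congestion_feedback(queue_len, prev_queue_len, q_eq, w=2.0):
    """QCN-style feedback: negative values indicate congestion.

    q_eq is an assumed equilibrium queue length; w weights the queue growth term.
    """
    q_off = queue_len - q_eq               # how far the queue is above its set point
    q_delta = queue_len - prev_queue_len   # how fast the queue is growing
    return -(q_off + w * q_delta)


def next_injection_rate(rate, feedback, gain=1.0 / 128.0, max_cut=0.5):
    """Reduce the rate in proportion to the feedback magnitude, capped at max_cut."""
    if feedback >= 0:
        return rate                        # no congestion reported, keep the rate
    cut = min(max_cut, gain * abs(feedback))
    return rate * (1.0 - cut)
```

Recomputing the feedback on each sample means the signal itself tracks the extent of congestion: as the queue drains toward its set point, the reported magnitude and the resulting rate cut shrink together.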
- The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
- The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
- While the preferred embodiment to the invention has been described, it will be understood that those skilled in the art, both now and in the future, may make various improvements and enhancements which fall within the scope of the claims which follow. These claims should be construed to maintain the proper protection for the invention first described.
Claims (18)
1. A system comprising:
a plurality of end points each comprising a layer 4 transport layer, where each end point is connected to a data center bridging (DCB) layer 2 network; and
an adaptor between the layer 4 transport layer and the DCB layer 2 network to translate at least one of flow and congestion control feedback signals provided by at least one of the DCB network and the transport layer, to consolidated feedback signals for controlling transmission by the transport layer.
2. The system of claim 1 wherein the DCB layer 2 network generates flow control signals according to a flow control protocol supporting multiple priorities.
3. The system of claim 1 wherein the DCB layer 2 network generates congestion control feedback signals according to a quantized congestion notification (QCN) protocol.
4. The system of claim 3 wherein the end point is connected to the DCB layer 2 network through an end station, where the end station implements a quantized congestion notification (QCN) reaction point imposing rate limits based on a QCN protocol to limit network congestion in the DCB layer 2 network in response to receiving congestion control signals.
5. The system of claim 3 wherein the network traffic generated by the transport layer is carried on layer 2 with lossy operation, either by configuring the end station to steer the traffic to a lossy priority of a priority flow control (PFC) protocol, or by not using any PFC protocol.
6. The system of claim 3 wherein the network traffic generated by the transport layer is carried on layer 2 with lossless operation, by configuring the end station to steer the traffic to a lossless priority of a priority flow control (PFC) protocol and by letting the end station react to layer 2 flow control messages generated by the adjacent switch.
7. The system of claim 4 wherein the network traffic generated by the transport layer is carried on layer 2 with lossy operation, either by configuring the end station to steer the traffic to a lossy priority of a priority flow control (PFC) protocol, or by not using any PFC protocol.
8. The system of claim 4 wherein the network traffic generated by the transport layer is carried on layer 2 with lossless operation, by configuring the end station to steer the traffic to a lossless priority of a priority flow control (PFC) protocol and by letting the end station react to layer 2 flow control messages generated by an adjacent switch.
9. The system of claim 1 wherein the adaptor preprocesses the flow and congestion control feedback signals into consolidated feedback signals, with the preprocessing including at least one of delaying, aggregating, filtering, replicating, enhancing and decimating the primary feedback signals.
10-17. (canceled)
18. A computer program product embodied in a tangible media comprising:
computer readable program codes coupled to the tangible media to improve reliability of a converged Ethernet network, the computer readable program codes configured to cause the program to:
provide a plurality of end points each comprising a layer 4 transport layer, where each end point is connected to a data center bridging (DCB) layer 2 network; and
position an adaptor between the layer 4 transport layer and the DCB layer 2 network to translate at least one of flow and congestion control feedback signals, provided by at least one of the DCB network and the transport layer, to consolidated feedback signals for controlling transmission by the transport layer.
19. The computer program product of claim 18 further comprising program code configured to: generate flow control signals at the DCB layer 2 network according to a flow control protocol supporting multiple priorities, such as Priority Flow Control (PFC).
20. The computer program product of claim 18 further comprising program code configured to: generate congestion control feedback signals at the DCB layer 2 network according to a quantized congestion notification (QCN) protocol.
21. The computer program product of claim 20 further comprising program code configured to: connect the end point to the DCB layer 2 network through an end station, where the end station implements a quantized congestion notification (QCN) reaction point imposing rate limits based on a QCN protocol to limit network congestion in the DCB layer 2 network in response to receiving congestion control signals.
22. The computer program product of claim 20 further comprising program code configured to: carry network traffic generated by the transport layer on layer 2 with lossy operation, either by configuring the end station to steer the traffic to a lossy priority of a priority flow control (PFC) protocol, or by not using any PFC protocol.
23. The computer program product of claim 20 further comprising program code configured to: carrying the network traffic generated by the transport layer on layer 2 with lossless operation, by configuring the end station to steer the traffic to a lossless priority of a priority flow control (PFC) protocol and by letting the end station react to layer 2 flow control messages generated by an adjacent switch.
24. The computer program product of claim 21 further comprising program code configured to: processing TCP ACKs and controlling associated TCP transmissions where the consolidated feedback signal comprises at least one of a TCP flow rate limit, a TCP flow buffer occupancy metric, and a TCP flow rate limit.
25. The computer program product of claim 21 further comprising program code configured to: adjusting a TCP flow congestion window and transmission schedule in response to the consolidated feedback signal via the congestion module.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/366,640 US20130205038A1 (en) | 2012-02-06 | 2012-02-06 | Lossless socket-based layer 4 transport (reliability) system for a converged ethernet network |
US13/708,933 US9356867B2 (en) | 2012-02-06 | 2012-12-08 | Lossless socket-based layer 4 transport (reliability) system for a converged ethernet network |
EP13700912.2A EP2829025A1 (en) | 2012-02-06 | 2013-01-23 | Lossless socket-based layer 4 transport (reliability) system for a converged ethernet network |
PCT/EP2013/051169 WO2013117427A1 (en) | 2012-02-06 | 2013-01-23 | Lossless socket-based layer 4 transport (reliability) system for a converged ethernet network |
CN201380008254.1A CN104094559B (en) | 2012-02-06 | 2013-01-23 | Method and system for the reliability of converged enhanced Ethernet |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/366,640 US20130205038A1 (en) | 2012-02-06 | 2012-02-06 | Lossless socket-based layer 4 transport (reliability) system for a converged ethernet network |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/708,933 Continuation US9356867B2 (en) | 2012-02-06 | 2012-12-08 | Lossless socket-based layer 4 transport (reliability) system for a converged ethernet network |
Publications (1)
Publication Number | Publication Date |
---|---|
US20130205038A1 (en) | 2013-08-08 |
Family
ID=47598844
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/366,640 Abandoned US20130205038A1 (en) | 2012-02-06 | 2012-02-06 | Lossless socket-based layer 4 transport (reliability) system for a converged ethernet network |
US13/708,933 Expired - Fee Related US9356867B2 (en) | 2012-02-06 | 2012-12-08 | Lossless socket-based layer 4 transport (reliability) system for a converged ethernet network |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/708,933 Expired - Fee Related US9356867B2 (en) | 2012-02-06 | 2012-12-08 | Lossless socket-based layer 4 transport (reliability) system for a converged ethernet network |
Country Status (4)
Country | Link |
---|---|
US (2) | US20130205038A1 (en) |
EP (1) | EP2829025A1 (en) |
CN (1) | CN104094559B (en) |
WO (1) | WO2013117427A1 (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140301197A1 (en) * | 2013-04-05 | 2014-10-09 | International Business Machines Corporation | Virtual quantized congestion notification |
CN104980359A (en) * | 2014-04-04 | 2015-10-14 | 中兴通讯股份有限公司 | Flow control method of fiber channel over Ethernet (FCoE), flow control device of FCoE and flow control system of FCoE |
US9325639B2 (en) | 2013-12-17 | 2016-04-26 | At&T Intellectual Property I, L.P. | Hierarchical caching system for lossless network packet capture applications |
US9614765B2 (en) | 2014-08-26 | 2017-04-04 | Lenovo Enterprise Solutions (Singapore) Pte. Ltd. | Quantized congestion notification (QCN) proxy function in data center bridging capabilities exchange (DCBX) protocol |
CN112968811A (en) * | 2021-02-20 | 2021-06-15 | 中国工商银行股份有限公司 | PFC exception handling method and device for RDMA network |
US11683250B2 (en) * | 2021-10-22 | 2023-06-20 | Palo Alto Networks, Inc. | Managing proxy throughput between paired transport layer connections |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6015744B2 (en) * | 2012-03-23 | 2016-10-26 | 富士通株式会社 | Congestion control method, congestion control device, communication system, and congestion control program |
KR101536141B1 (en) * | 2014-02-13 | 2015-07-13 | 현대자동차주식회사 | Apparatus and method for converting signal between ethernet and can in a vehicle |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6426944B1 (en) * | 1998-12-30 | 2002-07-30 | At&T Corp | Method and apparatus for controlling data messages across a fast packet network |
US20100223397A1 (en) * | 2009-02-27 | 2010-09-02 | Uri Elzur | Method and system for virtual machine networking |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7564869B2 (en) | 2004-10-22 | 2009-07-21 | Cisco Technology, Inc. | Fibre channel over ethernet |
US7961621B2 (en) | 2005-10-11 | 2011-06-14 | Cisco Technology, Inc. | Methods and devices for backward congestion notification |
EP1936880A1 (en) | 2006-12-18 | 2008-06-25 | British Telecommunications Public Limited Company | Method and system for congestion marking |
EP2122936B1 (en) * | 2007-03-12 | 2012-11-14 | Citrix Systems, Inc. | Systems and methods for providing quality of service precedence in tcp congestion control |
US7821939B2 (en) | 2007-09-26 | 2010-10-26 | International Business Machines Corporation | Method, system, and computer program product for adaptive congestion control on virtual lanes for data center ethernet architecture |
US8458305B2 (en) * | 2009-08-06 | 2013-06-04 | Broadcom Corporation | Method and system for matching and repairing network configuration |
US8504690B2 (en) | 2009-08-07 | 2013-08-06 | Broadcom Corporation | Method and system for managing network power policy and configuration of data center bridging |
JP5621996B2 (en) | 2010-02-12 | 2014-11-12 | 日本電気株式会社 | Network system and congestion control method |
US20110261686A1 (en) | 2010-04-21 | 2011-10-27 | Kotha Saikrishna M | Priority Pause (PFC) in Virtualized/Non-Virtualized Information Handling System Environment |
US8767742B2 (en) | 2010-04-22 | 2014-07-01 | International Business Machines Corporation | Network data congestion management system |
US20110261696A1 (en) | 2010-04-22 | 2011-10-27 | International Business Machines Corporation | Network data congestion management probe system |
JP5580706B2 (en) | 2010-09-29 | 2014-08-27 | Kddi株式会社 | Data transfer apparatus, program, and method using retransmission control protocol |
Application Status
Filing Date | Country | Application Number | Publication Number | Status |
---|---|---|---|---|
2012-02-06 | US | US13/366,640 | US20130205038A1 (en) | Abandoned |
2012-12-08 | US | US13/708,933 | US9356867B2 (en) | Expired - Fee Related |
2013-01-23 | EP | EP13700912.2A | EP2829025A1 (en) | Withdrawn |
2013-01-23 | CN | CN201380008254.1A | CN104094559B (en) | Active |
2013-01-23 | WO | PCT/EP2013/051169 | WO2013117427A1 (en) | Application Filing |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140301197A1 (en) * | 2013-04-05 | 2014-10-09 | International Business Machines Corporation | Virtual quantized congestion notification |
US20150295839A1 (en) * | 2013-04-05 | 2015-10-15 | International Business Machines Corporation | Virtual quantized congestion notification |
US9166925B2 (en) * | 2013-04-05 | 2015-10-20 | International Business Machines Corporation | Virtual quantized congestion notification |
US9654410B2 (en) * | 2013-04-05 | 2017-05-16 | International Business Machines Corporation | Virtual quantized congestion notification |
US10182016B2 (en) * | 2013-04-05 | 2019-01-15 | International Business Machines Corporation | Virtual quantized congestion notification |
US9325639B2 (en) | 2013-12-17 | 2016-04-26 | At&T Intellectual Property I, L.P. | Hierarchical caching system for lossless network packet capture applications |
US9577959B2 (en) | 2013-12-17 | 2017-02-21 | At&T Intellectual Property I, L.P. | Hierarchical caching system for lossless network packet capture applications |
CN104980359A (en) * | 2014-04-04 | 2015-10-14 | 中兴通讯股份有限公司 | Flow control method of fiber channel over Ethernet (FCoE), flow control device of FCoE and flow control system of FCoE |
US9614765B2 (en) | 2014-08-26 | 2017-04-04 | Lenovo Enterprise Solutions (Singapore) Pte. Ltd. | Quantized congestion notification (QCN) proxy function in data center bridging capabilities exchange (DCBX) protocol |
CN112968811A (en) * | 2021-02-20 | 2021-06-15 | 中国工商银行股份有限公司 | PFC exception handling method and device for RDMA network |
US11683250B2 (en) * | 2021-10-22 | 2023-06-20 | Palo Alto Networks, Inc. | Managing proxy throughput between paired transport layer connections |
Also Published As
Publication number | Publication date |
---|---|
EP2829025A1 (en) | 2015-01-28 |
CN104094559A (en) | 2014-10-08 |
US20130205039A1 (en) | 2013-08-08 |
CN104094559B (en) | 2016-12-14 |
US9356867B2 (en) | 2016-05-31 |
WO2013117427A1 (en) | 2013-08-15 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US9356867B2 (en) | Lossless socket-based layer 4 transport (reliability) system for a converged ethernet network | |
US9961585B2 (en) | Network-side buffer management | |
US8831041B2 (en) | Prioritizing highly compressed traffic to provide a predetermined quality of service | |
US9413814B2 (en) | Systems and methods for providing quality of service via a flow controlled tunnel | |
US8379515B1 (en) | TCP throughput control by imposing temporal delay | |
US20140281018A1 (en) | Dynamic Optimization of TCP Connections | |
WO2020001192A1 (en) | Data transmission method, computing device, network device and data transmission system | |
US20190253364A1 (en) | Method For Determining TCP Congestion Window, And Apparatus | |
JP2018508151A (en) | Method, apparatus, and system for transmitting transmission control protocol TCP data packet | |
Kühlewind et al. | Using data center TCP (DCTCP) in the Internet | |
Honda et al. | Understanding TCP over TCP: effects of TCP tunneling on end-to-end throughput and latency | |
EP3323229A1 (en) | Method and apparatus for managing network congestion | |
Ye et al. | PTP: Path-specified transport protocol for concurrent multipath transmission in named data networks | |
CN110115011A (en) | Multicast service handling method and access device | |
Nabeshima | Performance evaluation of multcp in high-speed wide area networks | |
TWI757887B (en) | Method, network controller, and computer program product for facilitating multipath transmission of a data stream from a sender to a receiver | |
Verma et al. | Concurrent multipath transfer using delay aware scheduling | |
Arumaithurai et al. | Nf-tcp: Network friendly tcp | |
Andrew et al. | An example of instability in XCP | |
Bisio et al. | Performance enhanced proxy solutions for satellite networks: state of the art, protocol stack and possible interfaces | |
Mareev et al. | Multipoint data transmission issues in high bandwidth-delay product TCP/IP networks | |
López-Pacheco et al. | Enabling large data transfers on dynamic, very high-speed network infrastructures | |
Chen et al. | The performance of TCP congestion control algorithm over high-speed transmission links | |
Hung et al. | Simple slow-start and a fair congestion avoidance for TCP communications | |
Alghawli | Comparative Analysis of TCP-Protocol Operation Algorithms in Self-Similar Traffic |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DECUSATIS, CASIMER M.;GUSAT, MIRCEA;LUIJTEN, RONALD P.;AND OTHERS;SIGNING DATES FROM 20120116 TO 20120119;REEL/FRAME:027656/0771 |
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |