
CN118266203A - Intelligent NIC grouping - Google Patents


Info

Publication number
CN118266203A
Authority
CN
China
Prior art keywords
smart
nic
nics
data
intelligent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280076727.0A
Other languages
Chinese (zh)
Inventor
B·S·洪
江文毅
杨国林
J·何奥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weirui LLC
Original Assignee
Weirui LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Weirui LLC
Priority claimed from PCT/US2022/039016 (WO2023121720A1)
Publication of CN118266203A


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting
    • H04L41/0806 Configuration setting for initial configuration or provisioning, e.g. plug-and-play
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/04 Network management architectures or arrangements
    • H04L41/046 Network management architectures or arrangements comprising network management agents or mobile agents therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/40 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using virtualisation of network functions or resources, e.g. SDN or NFV entities
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Some embodiments provide a method for a first smart NIC of a plurality of smart NICs of a host computer. Each of the smart NICs executes a smart NIC operating system that performs virtual network operations for a set of data computing machines executing on the host computer. The method receives a data message sent by one of the data computing machines executing on the host computer. The method performs virtual network operations on the data message to determine that the data message is to be transmitted from a port of a second smart NIC of the plurality of smart NICs. The method passes the data message to the second smart NIC via a dedicated communication channel that connects the plurality of smart NICs.

Description

Smart NIC Teaming

Background

An increasing number of operations typically associated with a host computer are being pushed to programmable smart network interface controllers (NICs). Some of the operations pushed to these smart NICs include virtual network processing of data messages for computing machines. In some cases, a host computer will have multiple such smart NICs that perform network processing or other operations. It is desirable to enable these smart NICs to work together even though they perform operations that were previously performed on the host computer (e.g., by the host computer's hypervisor).

Summary

Some embodiments provide methods for enabling multiple smart NICs of the same host computer to operate as a single entity (e.g., as a teamed group of smart NICs). In some embodiments, the smart NICs each execute a smart NIC operating system that performs virtual network operations (and/or other operations, such as virtual storage operations) for a set of data computing nodes (e.g., virtual machines (VMs), containers, etc.) executing on the host computer. In some embodiments, the smart NICs are connected by a dedicated communication channel in order to share dynamic state information, to share configuration data (so that one of the smart NICs can act as a single point of contact for a network management and control system), and/or to pass between each other data messages that are sent to and from the data computing nodes (DCNs) and that require virtual network processing.

By executing a smart NIC operating system, the smart NICs are able to perform various tasks that would otherwise be performed by host computer software (e.g., the host computer's hypervisor). These tasks may include virtual network processing for data messages (i.e., performing virtual switching and/or routing, firewall operations, etc.), virtual storage operations, etc. For multiple smart NICs to perform operations that would otherwise be performed entirely by a single entity (e.g., the hypervisor), communication between the smart NICs may be required.

As mentioned, in some embodiments a dedicated communication channel is established between the smart NICs to enable communication between them. In some embodiments, the dedicated communication channel is a physically separate channel. For example, in some embodiments the smart NICs are connected via a set of physical cables that carry only communication between the smart NICs. In different such embodiments, the smart NICs may be connected in series (so that each smart NIC is directly connected to two other smart NICs, except for the smart NICs at the ends of the chain, which are each connected to only one other smart NIC), connected in a loop (similar to the serial connection, but with every smart NIC connected to two other smart NICs), or connected via a separate physical switch so that each smart NIC can communicate directly with any other smart NIC through the switch. The cables may connect to Ethernet ports of the smart NICs if enough ports are available (thereby occupying these ports so that network traffic of the host computer does not use them) or to management ports of the smart NICs (which are typically lower-bandwidth ports). In some embodiments, the smart NICs use a separate purpose-built channel that is designed for connecting the smart NICs to each other, rather than occupying ports that could be used for other purposes.
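The three cabling arrangements described above can be compared with a short sketch. This is only an illustration under stated assumptions: the function names (`chain`, `ring`, `switched`, `hops`) and the integer NIC indices are hypothetical and do not come from the patent, and the physical switch is modeled as giving every pair of NICs a single hop.

```python
from collections import deque


def chain(n):
    """Serial connection: NIC i is cabled to NIC i+1; the end NICs have one neighbor."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def ring(n):
    """Loop: like the chain, but the two end NICs are also cabled together."""
    return {i: sorted({(i - 1) % n, (i + 1) % n} - {i}) for i in range(n)}


def switched(n):
    """Separate physical switch: every NIC reaches every other NIC in one hop."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def hops(adjacency, src, dst):
    """Breadth-first search for the number of NIC-to-NIC hops from src to dst."""
    distance = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return distance[node]
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return None  # dst is unreachable


# With four smart NICs, a message from NIC 0 to NIC 3 crosses
# three links in the chain but only one in the ring or via the switch.
print(hops(chain(4), 0, 3), hops(ring(4), 0, 3), hops(switched(4), 0, 3))
```

The sketch only captures reachability; it says nothing about bandwidth or about which ports (Ethernet, management, or purpose-built) carry the links.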

In other embodiments, the smart NICs communicate via a logically dedicated communication channel that uses existing physical connections. For example, if all of the smart NICs are connected to the same data link layer (layer 2) network, then a dedicated virtual local area network (VLAN) can be used as the dedicated communication channel for the smart NICs. However, if this existing layer 2 network connects many other host computers (each with its own group of smart NICs requiring a separate VLAN) and also carries data messages for the DCNs on the host computers, then the maximum number of VLANs may be reached. Some embodiments instead use an overlay network based on encapsulation (e.g., Virtual Extensible LAN (VXLAN) or Generic Network Virtualization Encapsulation (Geneve)) as the logically dedicated communication channel. Such overlay networks are not as constrained in number as VLANs and also have the benefit of enabling the smart NICs to communicate across multiple layer 2 networks if necessary (i.e., so long as the smart NICs are all on the same layer 3 network).
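The difference in scale between VLANs and encapsulation-based overlays follows from the identifier widths: the IEEE 802.1Q VLAN ID is 12 bits, while the VXLAN and Geneve network identifiers are 24 bits. The sketch below works through the arithmetic; the helper `vlans_exhausted` and its parameters are hypothetical and assume one dedicated VLAN per host computer's smart NIC group.

```python
VLAN_ID_BITS = 12  # IEEE 802.1Q VLAN identifier width
VNI_BITS = 24      # VXLAN / Geneve virtual network identifier width

usable_vlans = 2 ** VLAN_ID_BITS - 2  # VLAN IDs 0 and 4095 are reserved
overlay_ids = 2 ** VNI_BITS


def vlans_exhausted(num_hosts, vlans_for_dcn_traffic):
    """True if one dedicated VLAN per host's smart NIC group no longer fits.

    Assumes (hypothetically) that every host computer needs its own VLAN
    and that DCN traffic already consumes `vlans_for_dcn_traffic` VLAN IDs.
    """
    return num_hosts + vlans_for_dcn_traffic > usable_vlans


print(usable_vlans, overlay_ids)
print(vlans_exhausted(num_hosts=4000, vlans_for_dcn_traffic=200))
```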

In other embodiments, the smart NICs of a host computer communicate via a dedicated communication channel that passes through the host computer. For example, smart NICs are typically connected to the Peripheral Component Interconnect Express (PCIe) subsystem of the host computer, which can be used for the dedicated communication channel. In different embodiments, the smart NICs use the standard peer-to-peer transfer feature of PCIe, leverage the PCIe switching fabric, or use other enhancements on top of PCIe (e.g., Compute Express Link (CXL)).

As mentioned, one use of the dedicated communication channel is for a first smart NIC to pass a data message (e.g., a data message sent to or from the host computer or a DCN executing on the host computer) to a second smart NIC. The smart NICs operate as a single entity in that their smart NIC operating systems collectively implement a set of virtual network operations (e.g., implementing logical switches and/or routers, firewalls, etc.). However, each smart NIC has its own interfaces (e.g., physical functions and virtual functions), to which the DCNs of the host computer are bound, as well as its own physical network ports.

As such, a first smart NIC will receive data messages from the DCNs bound to ports of that smart NIC. If the smart NICs collectively implement the virtual network operations, then this first smart NIC processes these data messages. However, based on this processing, a data message may need to be passed to a second smart NIC via the dedicated communication channel so that the second smart NIC can output the data message. For example, if the destination is another DCN on the host computer that is bound to the second smart NIC, then the first smart NIC will need to pass the data message to the second smart NIC so that the data message can be output via the correct interface. In addition, if all of the ports of the smart NICs are teamed in a link aggregation group (LAG), then the connections for a single DCN are load balanced across these ports, so some of the data messages sent to the first smart NIC from a particular DCN bound to an interface of the first smart NIC will be output to the physical network via the other smart NICs. Conversely, a data message received at a physical network port of the first smart NIC will be processed by the first smart NIC but may need to be sent to a second smart NIC for delivery to a destination DCN bound to that second smart NIC.

In another scenario, if all of the physical network ports of a first smart NIC have failed but the smart NIC itself is still operational, then that smart NIC can still perform virtual network operations on data messages but will need to send those data messages to other smart NICs for output to the physical network, regardless of whether the ports operate in a LAG.
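The output decisions described in the preceding paragraphs can be sketched as a single function. This is a minimal illustration, not the patent's implementation: the `SmartNic` class, the `choose_output` function, the action strings, and the use of a CRC32 hash for LAG load balancing are all assumptions made for the example.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class SmartNic:
    name: str
    local_dcns: set = field(default_factory=set)  # DCNs bound to this NIC's interfaces
    uplink_up: bool = True                         # state of the physical network port(s)


def choose_output(receiving, team, dst_dcn=None, flow_key=b"", lag=False):
    """Return (nic_name, action) for a message already processed on `receiving`."""
    if dst_dcn is not None:
        # Destination is a DCN on this host: output on the NIC it is bound to.
        for nic in team:
            if dst_dcn in nic.local_dcns:
                action = "deliver_local" if nic is receiving else "pass_over_channel"
                return nic.name, action
        raise LookupError(f"{dst_dcn} is not bound to any smart NIC in the team")

    # Destination is on the physical network: pick a NIC with a working port.
    candidates = [nic for nic in team if nic.uplink_up]
    if not candidates:
        raise RuntimeError("no smart NIC has a working physical port")
    if lag:
        # Hash the flow so that one connection always uses the same port.
        chosen = candidates[zlib.crc32(flow_key) % len(candidates)]
    elif receiving.uplink_up:
        chosen = receiving
    else:
        chosen = candidates[0]  # all local ports failed; use a peer's port
    action = "transmit_uplink" if chosen is receiving else "pass_over_channel"
    return chosen.name, action


nic1 = SmartNic("nic1", {"vm1"})
nic2 = SmartNic("nic2", {"vm2"})
print(choose_output(nic1, [nic1, nic2], dst_dcn="vm2"))
```

In this sketch the receiving smart NIC has already performed the virtual network processing; the function only decides whether the message leaves through a local interface or must first cross the dedicated channel.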

In many cases, the smart NICs receive configuration data for the virtual network operations from a network management and control system. Each of the smart NICs has its own set of ports (possibly including a management port), each with its own network address, but many network management and control systems treat each host computer as a single entity (e.g., by communicating with an agent in the hypervisor of a host computer that does not use smart NICs for network virtualization operations). The network management and control system uses a single management network address for each host computer and therefore should not communicate directly with all of the multiple smart NICs of a host computer.

In some embodiments, the smart NICs use clustering technology in order to appear to the network management and control system as a single entity for the host computer. For example, in some embodiments the smart NICs of a host computer perform a leader election to determine a single one of the smart NICs that communicates with the network management and control system. In some such embodiments, each of the smart NIC operating systems runs a deterministic algorithm that selects one of the smart NICs as the point of contact. Any messages required for this leader election are passed over the dedicated communication channel.
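A deterministic election of this kind can be sketched very simply. The rule used here (lowest identifier among the live smart NICs) and the function name `elect_leader` are assumptions chosen for illustration; the text does not specify which deterministic algorithm is used.

```python
def elect_leader(live_nic_ids):
    """Pick the leader from the set of smart NICs currently known to be alive.

    Every smart NIC runs this same function on the same membership view
    (learned over the dedicated channel), so all of them reach the same
    answer without a further round of voting.
    """
    if not live_nic_ids:
        raise ValueError("no live smart NICs to elect from")
    return min(live_nic_ids)


# Each NIC computes the leader independently and gets the same result.
print(elect_leader({"nic-b", "nic-a", "nic-c"}))

# If the current leader fails, re-running the function on the reduced
# membership yields the next leader.
print(elect_leader({"nic-b", "nic-c"}))
```

The key property is that the result depends only on the membership set, not on which NIC evaluates it or in what order the members were discovered.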

The elected smart NIC receives configuration data (e.g., logical switch and logical router configuration data) from the network management and control system and distributes this data to the other smart NICs via the dedicated communication channel, so that all of the smart NICs can perform virtual network operations on data messages sent to and from the DCNs executing on the host computer. In some embodiments, the network management and control system includes a management plane (MP) and a central control plane (CCP), which perform different functions and provide different configuration data to the host computer (in addition to receiving different data from the host computer). In some cases, the smart NICs elect two different leaders, one for communicating with the MP and one for communicating with the CCP.
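The distribution step can be illustrated with a small simulation. Everything named here is hypothetical: the function `distribute_config`, the NIC identifiers, and the example configuration keys are placeholders, and the dedicated channel is modeled as a plain copy.

```python
def distribute_config(leader, team, config):
    """Return each NIC's configuration table after the leader shares `config`.

    The leader receives `config` from the management system; every other
    NIC receives a copy over the (simulated) dedicated channel.
    """
    if leader not in team:
        raise ValueError("leader must be a member of the team")
    tables = {nic: {} for nic in team}
    tables[leader] = dict(config)
    for nic in team:
        if nic != leader:
            tables[nic] = dict(config)  # copy sent over the dedicated channel
    return tables


# Hypothetical case with separate leaders for the MP and the CCP.
team = ["nic-a", "nic-b", "nic-c"]
mp_tables = distribute_config("nic-a", team, {"logical_switch": "ls-1"})
ccp_tables = distribute_config("nic-b", team, {"route": "10.0.0.0/24 via lr-1"})
print(mp_tables["nic-c"], ccp_tables["nic-c"])
```

Whichever smart NIC is the leader for a given plane, every member of the team ends up with the same configuration, which is what lets any of them process a given data message.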

In addition to disseminating the configuration data from the network management and control system, the leader smart NIC receives information from the other smart NICs via the dedicated communication channel, some of which is reported to the network management and control system. This information may include runtime statistics (e.g., data message processing statistics), status information, etc., and may be used by the network management and control system and/or the leader smart NIC to monitor the host computer and/or the smart NICs. The network management and control system may also use this information to modify the virtual network configuration for the smart NICs.
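As a rough sketch of the reporting path, the leader can merge the counters it collects from each smart NIC (including itself) into one per-host report. The function name `aggregate_stats` and the counter names are hypothetical; the text does not define the statistics format.

```python
from collections import Counter


def aggregate_stats(per_nic_stats):
    """Sum per-NIC counters into a single per-host report."""
    total = Counter()
    for stats in per_nic_stats.values():
        total.update(stats)  # Counter.update adds counts key by key
    return dict(total)


# The leader's own counters plus counters received over the dedicated channel.
report = aggregate_stats({
    "nic-a": {"rx_messages": 10, "tx_messages": 4},
    "nic-b": {"rx_messages": 5, "dropped": 1},
})
print(report)
```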

In some embodiments, the smart NICs also use the dedicated communication channel to synchronize dynamic state information for various purposes. For example, the monitoring data retrieved by the elected leader smart NIC may be synchronized to at least one backup smart NIC in case the leader smart NIC fails. In addition, when performing virtual network processing, the smart NICs may need to store dynamic state information and share that data with each other. In many cases, the smart NIC operating system stores connection tracking information that indicates the open connections and a congestion window for each open connection. This connection tracking information is used by firewall operations to determine whether to allow or to drop/block data messages. If a smart NIC becomes inoperable without having shared any state with the other smart NICs, then all of the connections managed by that smart NIC are transferred to other smart NICs that have no record of them. For this reason, the smart NICs share this connection tracking state information with each other so that failover between the smart NICs can be handled seamlessly.
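The effect of sharing connection tracking state can be shown with a toy stateful firewall. This is a simplified sketch under assumptions: the `ConnTracker` class, its methods, and the five-tuple representation are hypothetical, the congestion window is omitted, and the dedicated channel is modeled as a direct write into each peer's table.

```python
class ConnTracker:
    """Per-NIC table of open connections used by a stateful firewall."""

    def __init__(self, name):
        self.name = name
        self.open_connections = set()
        self.peers = []  # other smart NICs reachable over the dedicated channel

    def track(self, five_tuple):
        """Record a newly opened connection and synchronize it to all peers."""
        self.open_connections.add(five_tuple)
        for peer in self.peers:
            peer.open_connections.add(five_tuple)

    def allow(self, five_tuple):
        """Allow a mid-connection message only if its connection is known."""
        return five_tuple in self.open_connections


conn = ("10.0.0.5", 49152, "10.0.1.9", 443, "tcp")

# With state sharing, the second NIC can take over the connection.
nic1, nic2 = ConnTracker("nic1"), ConnTracker("nic2")
nic1.peers.append(nic2)
nic1.track(conn)
print(nic2.allow(conn))

# Without state sharing, the same message would be dropped after failover.
isolated1, isolated2 = ConnTracker("nic1"), ConnTracker("nic2")
isolated1.track(conn)
print(isolated2.allow(conn))
```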

This state sharing may also be used by smart NICs that perform operations other than virtual networking (or that perform multiple types of operations for which state sharing is used). If storage virtualization operations are handled by the smart NICs, then in some embodiments the storage virtualization functions include running a network stack to manage layer 4 connections to the storage devices. In this case, the connection information should again be shared between the smart NICs in case of failover, so that these connections are not reset if one of the smart NICs fails.

The foregoing Summary is intended to serve as a brief introduction to some embodiments of the invention. It is not meant to be an introduction to or overview of all of the inventive subject matter disclosed in this document. The Detailed Description that follows and the Drawings referred to in the Detailed Description further describe the embodiments described in the Summary as well as other embodiments. Accordingly, to understand all of the embodiments described in this document, a full review of the Summary, the Detailed Description, and the Drawings is needed. Moreover, the claimed subject matter is not to be limited by the illustrative details in the Summary, the Detailed Description, and the Drawings, but rather is to be defined by the appended claims, because the claimed subject matter can be embodied in other specific forms without departing from the spirit of the subject matter.

Brief Description of the Drawings

The novel features of the invention are set forth in the appended claims. However, for purposes of explanation, several embodiments of the invention are set forth in the following figures.

FIG. 1 conceptually illustrates a host computer with multiple physical smart NICs that perform network virtualization operations.

FIG. 2 conceptually illustrates the collective operation of a set of smart NICs of a single host computer of some embodiments.

FIG. 3 conceptually illustrates some embodiments in which the smart NICs of a host computer are connected in series.

FIG. 4 conceptually illustrates some embodiments in which each smart NIC of a host computer is directly connected to each other smart NIC of the host computer.

FIG. 5 conceptually illustrates an example of the smart NICs of a host computer connected through a separate physical switch.

FIG. 6 conceptually illustrates two host computers, each with three smart NICs that are connected to a data center network and use overlays on that data center network as their respective dedicated communication channels.

FIG. 7 conceptually illustrates a host computer with three smart NICs that communicate through the PCIe bus of the host computer.

FIG. 8 conceptually illustrates a process of some embodiments for processing a data message at a smart NIC that is one of multiple smart NICs of a host computer.

FIG. 9 conceptually illustrates the paths of two data messages received at a smart NIC via a physical port of that smart NIC.

FIG. 10 conceptually illustrates the paths of two data messages received at a smart NIC from a VM via the VF to which the VM is bound.

FIG. 11 conceptually illustrates the path of a data message sent from a first VM operating on a host computer to a second VM.

FIG. 12 conceptually illustrates the path of a multicast data message received at a first smart NIC of a host computer via a physical port of that smart NIC.

FIG. 13 conceptually illustrates the path of a data message for one of the connections shown in FIG. 10 after a physical port of the smart NIC has failed.

FIG. 14 conceptually illustrates a process of some embodiments for configuring multiple smart NICs to perform network virtualization operations.

FIG. 15 illustrates a pair of smart NICs that each execute a smart NIC operating system with a leader election module.

FIG. 16 illustrates the distribution of configuration data to multiple smart NICs.

FIG. 17 conceptually illustrates an elected leader smart NIC that collects statistics from itself and another smart NIC and reports those statistics to the network management and control system.

FIG. 18 conceptually illustrates three smart NICs of a host computer that each operate a smart NIC operating system with a leader election module.

FIG. 19 conceptually illustrates two smart NICs that share connection state.

FIG. 20 illustrates that the first smart NIC of FIG. 19 has become inoperable while the connections remain open, and that the VMs previously bound to the first smart NIC are now bound to interfaces of the second smart NIC.

FIG. 21 conceptually illustrates an electronic system with which some embodiments of the invention are implemented.

Detailed Description

In the following detailed description of the invention, numerous details, examples, and embodiments of the invention are set forth and described. However, it will be clear and apparent to one skilled in the art that the invention is not limited to the embodiments set forth and that the invention may be practiced without some of the specific details and examples discussed.

Some embodiments provide methods for enabling multiple smart NICs of the same host computer to operate as a single entity (e.g., as a teamed group of smart NICs). In some embodiments, the smart NICs each execute a smart NIC operating system that performs virtual network operations (and/or other operations, such as virtual storage operations) for a set of data computing nodes (e.g., virtual machines (VMs), containers, etc.) executing on the host computer. In some embodiments, the smart NICs are connected by a dedicated communication channel in order to share dynamic state information, to share configuration data (so that one of the smart NICs can act as a single point of contact for a network management and control system), and/or to pass between each other data messages that are sent to and from the data computing nodes (DCNs) and that require virtual network processing.

By executing a smart NIC operating system, the smart NICs are able to perform various tasks that would otherwise be performed by host computer software (e.g., the host computer's hypervisor). These tasks may include virtual network processing for data messages (i.e., performing virtual switching and/or routing, firewall operations, etc.), virtual storage operations, etc. For multiple smart NICs to perform operations that would otherwise be performed entirely by a single entity (e.g., the hypervisor), communication between the smart NICs may be required.

FIG. 1 conceptually illustrates a host computer 100 with multiple physical smart NICs 105 and 110 that perform network virtualization operations. As shown, the host computer 100 includes multiple DCNs (in this case, virtual machines) 115-125 that connect to the smart NICs 105 and 110 in a pass-through mode (i.e., without any network virtualization processing being applied within the virtualization software 130 of the host computer 100). Each of the VMs 115-125 has an associated virtual NIC (vNIC) 135-145 that connects to a different virtual function (VF) 161-164 of one of the smart NICs 105 and 110 via a Peripheral Component Interconnect Express (PCIe) fabric 165 (a motherboard-level interconnect that connects the physical processor of the host computer 100 to the physical interfaces of the smart NICs 105 and 110).

Each vNIC 135-145, and thus each VM 115-125, is bound to a different VF of one of the smart NICs 105 or 110. In some embodiments, the VFs 161-164 are virtualized PCIe functions exposed as interfaces of the smart NICs. Each VF is associated with a physical function (PF), which is a physical interface of the smart NIC that is recognized as a unique PCIe resource. In this case, the smart NIC 105 has one PF 170 and the smart NIC 110 has one PF 175, but in many cases each smart NIC will have more than one PF. The PF 170 is virtualized to provide at least the VFs 161-162, while the PF 175 is virtualized to provide at least the VFs 163-164.
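The binding hierarchy of FIG. 1 can be written down as a few lookup tables. The PF-to-VF and PF-to-NIC relationships below are taken from the text; the individual VM reference numerals (115, 120, 125) and the specific VM-to-VF assignments are assumptions made for illustration, since the text states only that each VM is bound to a different VF.

```python
# From the description of FIG. 1.
PF_TO_NIC = {"PF 170": "smart NIC 105", "PF 175": "smart NIC 110"}
PF_TO_VFS = {
    "PF 170": ["VF 161", "VF 162"],
    "PF 175": ["VF 163", "VF 164"],
}

# ASSUMPTION: hypothetical VM numerals and VM-to-VF bindings; the text only
# requires that each VM be bound to a different VF.
VM_TO_VF = {"VM 115": "VF 161", "VM 120": "VF 162", "VM 125": "VF 163"}


def nic_for_vm(vm):
    """Walk VM -> VF -> PF -> smart NIC to find the NIC a VM is bound to."""
    vf = VM_TO_VF[vm]
    for pf, vfs in PF_TO_VFS.items():
        if vf in vfs:
            return PF_TO_NIC[pf]
    raise LookupError(f"{vf} is not provided by any known PF")


for vm in VM_TO_VF:
    print(vm, "->", nic_for_vm(vm))
```

A lookup of this kind is what determines, for a given destination VM, which smart NIC must ultimately output a data message.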

In some embodiments, the VFs are provided so that different VMs each have a different virtual interface of the smart NICs to which they can connect. In some embodiments, VF drivers 150-160 execute in each of the VMs 115-125 to manage their respective connections to the VFs. As shown, in some embodiments, each VM 115-125 is associated with a vNIC 135-145 that is provided by the virtualization software 130 as a software emulation of the NIC. In different embodiments, the VMs 115-125 access the VFs either through their respective vNICs 135-145 or directly in a pass-through mode (in which the virtualization software 130 is not involved in most network communications). In yet other embodiments, the VMs 115-125 can switch between this pass-through mode and accessing the VFs via their respective vNICs 135-145. In either case, the virtualization software 130 is involved in allocating the VFs 161-164 to the VMs 115-125 and in enabling the VFs to be accessible from the VF drivers 150-160.

It should also be noted that although in this case all of the network virtualization operations have been shifted from the virtualization software 130 of the host computer to the smart NICs 105 and 110, in other embodiments a virtual switch provided by the virtualization software 130 can connect directly to the PFs 170 and 175. In some such embodiments, data traffic is sent from a VM via a vNIC to the virtual switch, which provides the traffic to the PF. In this case, the virtual switch performs basic switching operations but leaves the network virtualization operations to the smart NIC.

The smart NICs 105 and 110 also include physical network ports 181-184. In different embodiments, smart NICs may each include only a single physical network port or multiple (e.g., 2, 3, 4, etc.) physical network ports. These physical network ports 181-184 provide the physical communication to a datacenter network for the host computer 100. In addition, a private communication channel 180 is shown between the two smart NICs 105 and 110, which allows these smart NICs to communicate. As described further below, this communication channel 180 may take various forms (e.g., a direct physical connection, a logical connection via the existing network, or a connection via PCIe messages).

Finally, FIG. 1 illustrates that the smart NICs 105 and 110 perform network virtualization operations 185. In some embodiments, these operations can include logical switching and/or routing operations, distributed firewall operations, encapsulation, and other networking operations that are often performed in the virtualization software of host computers. In some embodiments, all of the smart NICs of a given host computer are provided with the same virtual networking configuration.

Though not shown in the figure, in some embodiments each smart NIC is a NIC that includes (i) a packet processing circuit, such as an application-specific integrated circuit (ASIC), (ii) a general-purpose central processing unit (CPU), and (iii) memory. The packet processing circuit, in some embodiments, is an I/O ASIC that handles the processing of data messages forwarded to and from the DCNs in the host computer and is at least partly controlled by the CPU. In other embodiments, the packet processing circuit is a field-programmable gate array (FPGA) configured to perform packet processing operations, or a firmware-programmable processing core specialized for network processing (which differs from the general-purpose CPU in that the processing core is specialized and thus more efficient at packet processing). The CPU executes a NIC operating system in some embodiments that controls the packet processing circuit and can run other programs. In some embodiments, the CPU configures the packet processing circuit to implement the network virtualization operations by configuring flow entries that the packet processing circuit uses to process data messages.

When a data message is sent by one of the VMs 115-125, that data message is sent (in software of the host computer 100) via the corresponding vNIC 135-145. The data message is passed through the PCIe bus 165 to the corresponding VF 161-164 of the appropriate smart NIC. The smart NIC ASIC processes the data message to apply the configured network virtualization operations 185, then (so long as the data message does not need to be sent to another smart NIC of the host computer and the destination of the data message is external to the host computer) sends the data message out of one of its physical ports 181-184.

It should be noted that, while FIG. 1 illustrates a host computer with virtualization software on which various VMs operate, the discussion of smart NICs herein also applies to host computers that host other types of virtualized DCNs (e.g., containers) as well as to bare metal computing devices (i.e., computing devices on which no virtualization software executes). In the latter case, the bare metal computing device will typically access the PFs of the multiple smart NICs directly rather than any VFs. That is, the smart NICs are used to provide network virtualization (or other operations, such as storage virtualization) without the software on the computing device being aware of these operations.

FIG. 2 conceptually illustrates the collective operation of a set of smart NICs 205-210 of a single host computer of some embodiments. Each of these smart NICs includes multiple VFs for communication with VMs of the host computer as well as multiple physical ports for communication with a datacenter network (e.g., a network to which other host computers, which may or may not use smart NICs, also connect).

Each of the smart NICs runs (i.e., on the CPU of the respective smart NIC) a smart NIC operating system 215-220. Each smart NIC operating system 215-220 controls the ASIC of the smart NIC and performs additional operations, such as network virtualization operations 225 and storage virtualization operations 230. These operations 225 and 230 (and, in other embodiments, other types of operations) are distributed across the various smart NICs 205-210 of the host computer such that the smart NICs appear to operate as a single entity (i.e., in the same way that the virtualization software of the host computer is a single entity). As indicated above, the network virtualization operations 225 include performing logical switching and/or routing of data messages for one or more logical forwarding elements, applying distributed firewall rules, performing network address translation, and other networking features. If each of the smart NICs 205-210 is configured to perform the same network virtualization operations, then any of the smart NICs can receive a data message directed to or sent from one of the DCNs executing on the host computer and process this data message correctly.

Similarly, if the storage virtualization operations 230 are configured across all of the smart NICs, then a VM can be bound to any of the smart NICs, and that smart NIC can handle I/O requests from the VM to the virtual storage network. Whereas VMs are bound to smart NIC network adapter VFs for networking operations, the VFs to which the VMs are bound for storage virtualization purposes are storage VFs (e.g., Non-Volatile Memory Express (NVMe) devices or Small Computer System Interface (SCSI) devices).

For multiple smart NICs to perform these operations as though operating as a single entity (similar to a hypervisor of the host computer), communication may be required between the smart NICs. Therefore, in some embodiments, a private communication channel is set up between the smart NICs to enable this communication.

In some embodiments, the private communication channel is a physically separate channel. For instance, in some embodiments the smart NICs are connected via a set of physical cables that carries only communication between the smart NICs. FIG. 3 conceptually illustrates some embodiments in which the smart NICs of a host computer 300 are connected serially. As shown, each of the smart NICs 305-320 is connected to two other smart NICs. That is, the smart NIC 305 is connected to smart NICs 320 and 310, the smart NIC 310 is connected to smart NICs 305 and 315, the smart NIC 315 is connected to smart NICs 310 and 320, and the smart NIC 320 is connected to smart NICs 315 and 305. Depending on the physical arrangement of the smart NICs, some embodiments do not directly connect the smart NICs at the ends (i.e., in this example the smart NICs 305 and 320 would not be connected).

Having a full ring connection (as shown in FIG. 3) allows any of the smart NICs 305-320 to communicate with any of the other smart NICs if one of the communication links (or one of the smart NICs themselves) fails. For instance, if the link between smart NIC 310 and smart NIC 315 fails, then smart NIC 310 can still reach smart NIC 315 via the other two smart NICs 305 and 320. Similarly, if the smart NIC 310 itself fails, then smart NIC 305 can still reach smart NIC 315 via smart NIC 320.
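
The two failure cases above can be checked with a small reachability sketch. This is an illustrative model only, assuming the private channel can be treated as an undirected graph searched breadth-first; the function name and the `CHAIN` variant (the ring without the end-to-end link) are introduced here for comparison and are not part of the described embodiments.

```python
from collections import deque


def reachable(nics, links, src, dst, failed_links=(), failed_nics=()):
    """Return True if dst can be reached from src over working links and NICs."""
    down_links = {frozenset(link) for link in failed_links}
    down_nics = set(failed_nics)
    if src in down_nics or dst in down_nics:
        return False
    # Build adjacency from the links that are still usable.
    adjacency = {nic: set() for nic in nics}
    for a, b in links:
        if frozenset((a, b)) in down_links or a in down_nics or b in down_nics:
            continue
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Breadth-first search from src.
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbor in adjacency[node] - seen:
            seen.add(neighbor)
            queue.append(neighbor)
    return False


NICS = [305, 310, 315, 320]
RING = [(305, 310), (310, 315), (315, 320), (320, 305)]  # FIG. 3 topology
CHAIN = RING[:-1]  # same NICs without the 320-305 link at the ends
```

In this model, the ring survives either single failure described above, whereas the chain does not: with smart NIC 310 down, smart NIC 305 has no remaining path to smart NIC 315.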

For even more robust failure protection, some embodiments include private communication channel links between each pair of smart NICs (i.e., a full mesh of connections). FIG. 4 conceptually illustrates some embodiments in which each smart NIC of a host computer 400 is connected directly to each other smart NIC of the host computer. As shown, each of the smart NICs 405-420 is connected directly to each of the other three smart NICs 405-420. In such a setup, if the host computer has N smart NICs, then each smart NIC requires N-1 direct connections to the other smart NICs. This type of setup is reasonable for a host computer with a fairly small number of smart NICs (e.g., 3 to 5 smart NICs) but becomes more difficult for larger numbers.
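
To make the scaling concern concrete, the following sketch counts connections for a full mesh and compares them with a ring. The per-NIC figure of N-1 comes from the text; the total of N(N-1)/2 links follows from counting each pair once. The function names are illustrative.

```python
def mesh_ports_per_nic(n):
    """Direct connections each smart NIC needs in a full mesh of n NICs."""
    return max(n - 1, 0)


def mesh_total_links(n):
    """Total links in a full mesh: one per unordered pair of NICs."""
    return n * (n - 1) // 2


def ring_total_links(n):
    """Total links in a ring; fewer than three NICs cannot form a cycle."""
    return n if n >= 3 else max(n - 1, 0)
```

For the four smart NICs of FIG. 4 this gives 3 connections per NIC and 6 links in total, against 4 links for the ring of FIG. 3. The total grows quadratically, which is why the mesh becomes harder to cable as the number of smart NICs increases.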

These connections may use a separate, purpose-built channel for inter-NIC communication in some embodiments. In other embodiments, if the smart NICs have enough physical ports, the connections can repurpose the physical network ports of the NICs (e.g., using Ethernet cables, although if there are more than two smart NICs this can require two of the network ports). Yet other embodiments use management ports of the smart NICs, if these ports are available and if the bandwidth of the management ports is high enough to handle the expected communication between the smart NICs. In some embodiments, the smart NIC components that enable the private communication channel are isolated from the other components of the smart NIC. In this case, even if the other smart NIC components are inoperable (e.g., due to a firmware or software bug, a hardware failure, etc.), the smart NIC is still able to at least relay traffic between the smart NICs.

Rather than having the smart NICs connected to each other directly (whether serially or in a mesh), in other embodiments these smart NICs are connected via a separate physical switch so that each smart NIC can communicate directly with any other smart NIC through the physical switch. FIG. 5 conceptually illustrates an example of the smart NICs of a host computer 500 connected through a separate physical switch 505. As shown, each of the smart NICs 510-520 of the host computer 500 connects to an isolated physical switch 505. In some embodiments, this physical switch 505 handles only inter-smart NIC communication (i.e., it is not part of a datacenter network that handles data messages between DCNs and/or management and control traffic). In fact, this physical switch might not even use the same switching technology (e.g., Ethernet or InfiniBand) that is used to carry networking traffic within the datacenter. As in the previous examples, these connections may use a separate purpose-built channel, a physical network port, or a management port. In addition, for redundancy, some embodiments use two (or more) separate isolated switches, with each smart NIC 510-520 connecting to each of these isolated switches.

Rather than using a separate physical channel for private communication between smart NICs (e.g., if there is no separate purpose-built channel and the network ports cannot be spared for this use), in some embodiments the smart NICs communicate via a logically private communication channel that uses existing physical connections. For instance, all of the smart NICs of a host computer will generally connect to the same physical datacenter network, so a private communication channel can be overlaid on that network.

FIG. 6 conceptually illustrates two host computers 605 and 610, each with three smart NICs that connect to a datacenter network 600 and use overlays on that datacenter network as their respective private communication channels. As shown, the first host computer 605 includes three smart NICs 615-625 that connect to the datacenter network 600, while the second host computer 610 also includes three smart NICs 630-640 that connect to the datacenter network 600.

Each of these respective sets of smart NICs uses a different overlay network (e.g., using encapsulation) as a private communication channel. The first set of smart NICs 615-625 uses a first overlay network 645, and the second set of smart NICs 630-640 uses a second overlay network 650. These overlay networks used as private communication channels may be VXLAN networks, Geneve networks, etc. In some embodiments, the encapsulation network addresses used are those associated with the physical network ports of the smart NICs (i.e., the same network addresses used for encapsulating data traffic between DCNs on their respective host computers), while the underlying overlay network addresses are logical addresses associated with the smart NIC operating systems. In fact, the first set of smart NICs 615-625 could use the same set of overlay network addresses as the second set of smart NICs 630-640.

The use of overlay networks requires only that all of the smart NICs of a host computer be attached to the same layer 3 network (but not necessarily the same subnet). Thus, if one of the smart NICs is connected only to a physically separate management network while the others are connected to a data network within a datacenter (and not to the management network), then the smart NICs cannot communicate via such an overlay network. Some other embodiments use a dedicated VLAN as the private communication channel if all of the smart NICs of a host computer connect to the same data link layer (layer 2) network. However, if this existing physical layer 2 network has numerous other host computers (which have their own sets of smart NICs that require separate VLANs) and also carries data messages for the DCNs on these host computers, then the maximum number of VLANs available on a single network (4094) may be reached.
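
The 4094 figure follows from the 12-bit VLAN identifier of IEEE 802.1Q, in which the values 0 and 4095 are reserved. The sketch below works through that arithmetic and, for comparison, the 24-bit identifier space of VXLAN and Geneve; the budgeting function and its parameter names are illustrative assumptions rather than anything specified in the description.

```python
VLAN_ID_BITS = 12
RESERVED_VLAN_IDS = {0, 4095}   # reserved by IEEE 802.1Q
OVERLAY_VNI_BITS = 24           # VXLAN and Geneve network identifiers


def usable_vlans():
    """Number of VLAN IDs available on a single layer 2 network."""
    return 2 ** VLAN_ID_BITS - len(RESERVED_VLAN_IDS)


def remaining_vlans(hosts_with_nic_groups, vlans_for_dcn_traffic):
    """VLAN IDs left if every host's smart NIC group takes one dedicated VLAN.

    A negative result means the VLAN space is exhausted.
    """
    return usable_vlans() - hosts_with_nic_groups - vlans_for_dcn_traffic


def overlay_ids():
    """Identifier space of a 24-bit overlay network ID."""
    return 2 ** OVERLAY_VNI_BITS
```

Under these assumptions, 3000 hosts plus 1000 VLANs for DCN traffic leave only 94 identifiers, and 4000 hosts plus 100 VLANs exceed the limit, while a 24-bit overlay identifier offers 16,777,216 values.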

In yet other embodiments, the smart NICs of a host computer communicate via a private communication channel through that host computer. As described above, smart NICs typically connect to the PCIe subsystem of the host computer, which can be used for the private communication channel. FIG. 7 conceptually illustrates a host computer 700 with three smart NICs 705-715 that communicate through a PCIe fabric 720 of the host computer. Communicating through the PCIe subsystem typically allows any smart NIC to talk directly to any of the other smart NICs. In different embodiments, the smart NICs use the standard peer-to-peer transfer feature of PCIe, leverage the PCIe switching fabric, or use other enhancements on top of PCIe (e.g., Compute Express Link (CXL)).

As mentioned, one use of the private communication channel is for a first smart NIC to pass a data message (e.g., a data message sent to or from the host computer or a DCN executing on the host computer) to a second smart NIC. The smart NICs operate as a single entity in that their smart NIC operating systems collectively implement a set of virtual networking operations (e.g., implementation of logical switches and/or routers, firewalls, etc.). However, each smart NIC has its own interfaces (e.g., physical and virtual functions), to which the DCNs of the host computer are bound, as well as its own physical network ports.

FIG. 8 conceptually illustrates a process 800 of some embodiments for processing a data message at a smart NIC that is one of multiple smart NICs of a host computer. As described above, each of the smart NICs has one or more interfaces, and the DCNs operating on the host computer are each bound to a different interface. In addition, the virtual networking operations performed on the data messages have been pushed into the smart NIC operating system (rather than being performed by forwarding elements executing in the hypervisor of the host computer). The process 800 will be described in part by reference to FIGS. 9-11, which illustrate examples of data messages being processed by smart NICs.

As shown, the process 800 begins by receiving (at 805) a data message at a smart NIC. This data message could have been received from a datacenter network through a physical port of the smart NIC (e.g., as in FIG. 9) or from the host computer (e.g., from a DCN executing on the host computer) through an interface of the smart NIC that is bound to the host computer or to one or more DCNs on the host computer (e.g., as in FIGS. 10 and 11). It should be understood that the terms data message, packet, data packet, and message are used herein to refer to various formatted collections of bits that may be sent between network endpoints (e.g., between DCNs in a host and/or across a physical network), such as Ethernet frames, IP packets, TCP segments, UDP datagrams, etc. While the examples herein refer to data messages, packets, data packets, or messages, it should be understood that the invention should not be limited to any specific format or type of data message.

The process 800 then applies (at 810) network virtualization operations to the received data message based on the data message headers. As described above, these operations may include logical switching (e.g., based on a logical destination MAC address of the data message), logical routing (e.g., based on a logical destination IP address of the data message), distributed firewall operations (e.g., based on a connection five-tuple of the data message, consisting of source and destination IP addresses, transport layer protocol, and source and destination transport layer ports), network address translation, encapsulation (if required), and other operations that are commonly performed by hypervisors of the host computer. If the smart NICs collectively implement the virtual networking operations, then the smart NIC that first receives the data message performs this processing. When a first smart NIC receives the data message from a second smart NIC through the private communication channel, the second smart NIC will typically have already performed the required network virtualization operations (or the majority of these operations), and the first smart NIC can determine the destination of the data message with minimal additional processing.
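
As an illustration of the distributed firewall step, the sketch below evaluates a connection five-tuple against an ordered rule list. It is a minimal first-match model written for this explanation; the `FiveTuple` and `Rule` types, the rule fields, and the default action are assumptions and do not reflect how any particular smart NIC stores or evaluates firewall rules.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class FiveTuple:
    src_ip: str
    dst_ip: str
    protocol: str      # e.g. "tcp" or "udp"
    src_port: int
    dst_port: int


@dataclass(frozen=True)
class Rule:
    src_net: str             # CIDR block the source must fall in
    dst_net: str             # CIDR block the destination must fall in
    protocol: str            # "tcp", "udp", or "any"
    dst_port: Optional[int]  # None matches any destination port
    action: str              # "allow" or "drop"


def evaluate(rules, pkt, default="drop"):
    """Return the action of the first rule that matches the five-tuple."""
    for rule in rules:
        if (ip_address(pkt.src_ip) in ip_network(rule.src_net)
                and ip_address(pkt.dst_ip) in ip_network(rule.dst_net)
                and rule.protocol in ("any", pkt.protocol)
                and rule.dst_port in (None, pkt.dst_port)):
            return rule.action
    return default
```

If every smart NIC of the host holds the same rule list, each of them reaches the same verdict for a given five-tuple, which is what lets any smart NIC process a message regardless of where it arrived.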

Based on these network virtualization operations, the smart NIC is able to determine a destination for the data message. It should be understood that the process 800 is a conceptual process and does not necessarily reflect the specific operations performed by a smart NIC. For instance, rather than performing a series of determinations regarding whether the destination is of a particular type (i.e., those shown in operations 815, 825, and 840), the smart NIC will typically just identify a matching record (e.g., a flow record) for the data message and perform the action specified by that matching record. It should also be noted that this process does not cover the full spectrum of data message processing options. For instance, in some embodiments the smart NIC may block and/or drop data messages due to firewall rules, congestion, etc.
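
The match-then-act behavior described above can be sketched as a lookup table keyed on header fields. This is a deliberately simplified exact-match model: the `FlowTable` class, the action names, and the choice to report a miss (for example, so that the NIC's CPU could install a new record) are assumptions made for illustration.

```python
class FlowTable:
    """Exact-match flow records mapping a header key to an action."""

    def __init__(self):
        self._records = {}

    def install(self, match, action, target=None):
        """Add or replace the record for a header key."""
        self._records[match] = (action, target)

    def lookup(self, header):
        """Return (action, target) of the matching record, or a miss."""
        return self._records.get(header, ("miss", None))


# Hypothetical records keyed on logical destination MAC address.
table = FlowTable()
table.install("00:00:00:00:00:01", "output_interface", "VF 940")
table.install("00:00:00:00:00:02", "forward_to_nic", "smart NIC 920")
table.install("00:00:00:00:00:03", "drop")
```

A single lookup thus yields the same outcome that the conceptual process reaches through its sequence of determinations, including the drop case that the flowchart does not show.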

The process 800 determines (at 815) whether the destination for the data message is a DCN that is bound to the current smart NIC (i.e., the smart NIC performing the process 800). This could be the case for data messages received from external networks or from other DCNs on the host computer (which may be bound to any of the smart NICs). When the destination is such a DCN bound to the current smart NIC, the process outputs (at 820) the data message from the smart NIC via the interface to which the destination DCN is bound. In some embodiments, the data message is then handled by the host computer (e.g., sent to the DCN either via a vNIC or directly to the VF driver executing on the DCN, without additional network virtualization processing in the hypervisor of the host computer).

When the destination for the data message is not a DCN bound to the current smart NIC, the process 800 determines (at 825) whether the destination is a DCN bound to a different smart NIC of the host computer. This could be the case for data messages received from external networks or from other DCNs on the host computer that are bound to the current smart NIC. Further, if the private communication channel does not provide direct communication between every pair of smart NICs, then a first smart NIC might receive a data message from a second smart NIC and need to send that data message on to a third smart NIC (e.g., in the example shown in FIG. 3). When the destination is such a DCN bound to another smart NIC, the process 800 sends (at 830) the data message to that other smart NIC (or to an intermediary smart NIC if the NICs are connected serially) via the private communication channel between the smart NICs.

FIG. 9 conceptually illustrates the path of two data messages 910 and 915 received at a first smart NIC 900 via the physical port 905 of that smart NIC. In this example, each of the smart NICs 900 and 920 of a host computer 925 has a single physical port. At least two VMs 930 and 935 execute on the host computer 925, and for simplicity only the VFs bound to these VMs are shown. The first smart NIC 900 provides a VF 940 to which the first VM 930 is bound (e.g., via its vNIC, which is not shown), while the second smart NIC 920 provides a VF 945 to which the second VM 935 is bound (also via its vNIC). Each of the smart NICs 900 and 920 performs network virtualization operations 950, and a private communication channel 955 connects the two smart NICs. This private communication channel may be any of the types described above (e.g., a separate physical channel, a VLAN or overlay network on the physical network to which the physical ports 905 and 960 of the smart NICs connect, or a connection through the PCIe subsystem).

The smart NIC 900 performs network virtualization operations 950 on each of the data messages 910 and 915. Because the destination address of the first data message 910 is that of VM1 930, which is bound to that smart NIC 900, the smart NIC 900 outputs the data message 910 via the VF 940 to the VM 930. On the other hand, the network virtualization operations 950 applied to the second data message 915 identify that the destination address of this data message 915 is that of VM2 935, which is bound to the second smart NIC 920. As such, the first smart NIC 900 passes this data message 915 to the second smart NIC 920 via the private communication channel 955. In some embodiments, the first smart NIC 900 also provides context information to the second smart NIC 920 regarding the processing of the data message by the network virtualization operations 950, so that this processing does not need to be fully repeated at the second smart NIC 920. The second smart NIC 920, in some embodiments, applies the network virtualization operations 950 to evaluate this context and determine that the data message 915 should be sent to VM2 935. As such, the smart NIC 920 outputs the data message 915 via the VF 945 to the VM 935.

Returning to FIG. 8, if the destination for the data message is not a DCN on the host computer, then (assuming the data message is not to be dropped or blocked) the destination is external to the host computer. As such, the process 800 identifies (at 835) the physical network output port for the data message. In some cases, all of the ports of all of the smart NICs are teamed in a link aggregation group (LAG) or other teaming mechanism. In this case, the connections for a single DCN bound to a particular smart NIC are load balanced across all of the physical output ports of all of the smart NICs, not only those of the smart NIC that receives the data message. In other cases, the different smart NIC ports might have different connectivity, such that data messages for certain destinations need to be output from one smart NIC and data messages for other destinations need to be output from another smart NIC (irrespective of any load balancing operations).

As such, the process 800 determines (at 840) whether the identified physical network output port is on another smart NIC or on the current smart NIC. If the output port for the data message is a port of another smart NIC, then the process 800 sends (at 830) the data message to that other smart NIC (or to an intermediate smart NIC, if the NICs are connected serially) via the dedicated communication channel between the smart NICs. On the other hand, if the identified output port is a port of the current smart NIC, then the process 800 outputs (at 845) the data message to the physical network via the identified output port. After outputting the data message to a DCN (via an interface of the current smart NIC), to the physical network, or to another smart NIC via the dedicated communication channel, the process 800 ends.
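The forwarding decision described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the lookup tables `DCN_BINDINGS` and `OUTPUT_PORTS`, the function name `dispatch`, and the string identifiers are all hypothetical stand-ins for results that the configured network virtualization operations would produce.

```python
from typing import Dict, Tuple

# Hypothetical tables: destination -> (smart NIC, interface or port).
DCN_BINDINGS: Dict[str, Tuple[str, str]] = {
    "vm1": ("nic-900", "vf-940"),
    "vm2": ("nic-920", "vf-945"),
}
OUTPUT_PORTS: Dict[str, Tuple[str, str]] = {
    "dest1": ("nic-900", "port-905"),
    "dest2": ("nic-920", "port-960"),
}


def dispatch(current_nic: str, destination: str) -> Tuple[str, str]:
    """Return (action, target) for a unicast data message."""
    if destination in DCN_BINDINGS:
        nic, interface = DCN_BINDINGS[destination]
        if nic == current_nic:
            # Destination DCN is bound to this smart NIC.
            return ("output_to_dcn", interface)
        # Destination DCN is bound to another smart NIC of the same host.
        return ("send_over_dedicated_channel", nic)
    # Destination is external: identify the physical output port (835).
    nic, port = OUTPUT_PORTS[destination]
    if nic != current_nic:
        # Port belongs to another smart NIC (840 -> 830).
        return ("send_over_dedicated_channel", nic)
    # Port belongs to the current smart NIC (840 -> 845).
    return ("output_to_physical_network", port)
```

For example, `dispatch("nic-900", "dest2")` selects the dedicated channel toward `nic-920`, while `dispatch("nic-900", "dest1")` outputs locally via `port-905`.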

FIG. 10 conceptually illustrates the paths of two data messages 1005 and 1010 received at the first smart NIC 900 from the first VM 930 via the VF 940 to which that VM is bound. As shown, the first data message 1005 is directed to a first destination (Dest1), while the second data message 1010 is directed to a second destination (Dest2). The smart NIC 900 processes both of these data messages according to the configured network virtualization operations 950, which in this case (i) determine that both data messages should be output to the physical network and (ii) include load balancing across multiple output ports (e.g., in a LAG). In some embodiments, the load balancing operations are applied only to the first data message of a connection, while for subsequent data messages of that connection a cached result directs each data message to the same physical output port.
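A minimal sketch of connection-sticky load balancing follows, assuming a connection is identified by a four-field tuple and that the port is chosen by a CRC32 hash. The class name `LagBalancer`, the tuple layout, and the hash choice are illustrative assumptions; the source does not specify how the load balancing decision is computed.

```python
import zlib
from typing import Dict, List, Tuple

# Hypothetical connection key: (src_ip, dst_ip, src_port, dst_port).
Connection = Tuple[str, str, int, int]


class LagBalancer:
    """Pick an output port once per connection and cache the result."""

    def __init__(self, ports: List[str]) -> None:
        self.ports = list(ports)
        self.cache: Dict[Connection, str] = {}
        self.balancing_runs = 0  # counts how often the hash was evaluated

    def select_port(self, conn: Connection) -> str:
        cached = self.cache.get(conn)
        if cached is not None:
            # Subsequent data messages reuse the cached decision.
            return cached
        # First data message of the connection: run the balancing step.
        self.balancing_runs += 1
        key = "|".join(str(field) for field in conn).encode()
        port = self.ports[zlib.crc32(key) % len(self.ports)]
        self.cache[conn] = port
        return port
```

Because the ports of every smart NIC in the team are in one list, the selected port may belong to a different smart NIC than the one that received the data message, which is the case handled by the dedicated communication channel.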

Based on these operations, the smart NIC 900 outputs the first data message 1005 to the physical network via its own physical port 905. The second data message 1010, however, is sent to the second smart NIC 920 over the dedicated communication channel 955. In some embodiments, the first smart NIC 900 also provides context information indicating that the network virtualization operations have already been performed on the data message 1010 and that the data message should be output via the physical port 960 of the second smart NIC 920. The second smart NIC 920 receives the second data message 1010 via the dedicated communication channel 955 and outputs this data message 1010 to the physical network via its physical port 960.

As described above by reference to FIG. 8, the dedicated communication channel is also used when a DCN bound to an interface of one smart NIC sends a data message to a DCN bound to an interface of another smart NIC. The dedicated communication channel allows the data message to be sent directly between the smart NICs, rather than the first smart NIC outputting the data message onto the physical network to be switched and/or routed back to the host computer via the second smart NIC.

FIG. 11 conceptually illustrates the path of a data message 1100 sent from the first VM 930 operating on the host computer 925 to the second VM 935. Here, the first smart NIC 900 receives the data message 1100 via the VF 940 to which the source VM 930 is bound. The smart NIC 900 applies the network virtualization operations 950 to the data message 1100 and determines that the destination of the data message is a VM bound to the second smart NIC 920. Based on this determination, the smart NIC 900 sends the data message 1100 to the second smart NIC 920 via the dedicated communication channel 955 between the smart NICs. As in the other examples, in some embodiments the first smart NIC 900 also provides context information indicating that the network virtualization operations have already been performed on the data message 1100 and that the data message is directed to a DCN bound to the smart NIC 920 (though not necessarily indicating to which interface the DCN is bound). The second smart NIC 920 receives the data message 1100 via the dedicated communication channel 955 and outputs the data message to the VM 935 via the VF interface 945.

The process 800 described above and the examples shown in FIGS. 9-11 relate to unicast data messages (i.e., data messages with a single destination). In some embodiments, a single data message (e.g., a broadcast or multicast data message) may be sent along multiple paths by the network virtualization operations performed within one smart NIC.

FIG. 12 conceptually illustrates the paths of a multicast data message 1200 received at the first smart NIC 900 via the physical port 905 of that smart NIC. In this example, the first smart NIC 900 applies the network virtualization operations 950 to the multicast data message 1200 and determines that both the first VM 930 and the second VM 935 are in the multicast group to which the data message 1200 is sent. Based on this, the smart NIC 900 (i) outputs a first copy of the multicast data message 1200 to the first VM 930 via the VF 940 and (ii) passes a second copy of the data message 1200 to the second smart NIC 920 via the dedicated communication channel 955. In some embodiments, the first smart NIC 900 also provides the second smart NIC 920 with context information about the processing of the data message by the network virtualization operations 950, so that this processing does not need to be fully repeated at the second smart NIC 920. In some embodiments, the second smart NIC 920 applies the network virtualization operations 950 to evaluate this context and determine that the multicast data message 1200 should be sent to the second VM 935. As such, the smart NIC 920 outputs the data message 1200 to the VM 935 via the VF 945. In some embodiments, if multiple destinations of a multicast data message are bound to the second smart NIC 920, only one copy of the data message is passed via the communication channel 955, allowing the second smart NIC 920 to generate and output the necessary copies of the data message. Similarly, if one of the VMs attached to the first smart NIC sends a broadcast or multicast data message, the recipient smart NIC may process the data message and generate any copies necessary to send the data message to other VMs attached to the first smart NIC, output the data message via its physical port(s), and/or pass the data message to other smart NICs (either to send the data message to VMs bound to those smart NICs, to output the data message via their physical port(s), or a combination thereof).
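The replication rule for multicast data messages (one copy per locally bound member, but only a single copy per remote smart NIC) can be sketched as below. The function name `plan_multicast` and the shape of the `bindings` table are hypothetical; this is an illustration of the rule as described, not the actual data path.

```python
from typing import Dict, Iterable, List, Tuple


def plan_multicast(
    current_nic: str,
    members: Iterable[str],
    bindings: Dict[str, Tuple[str, str]],
) -> Tuple[List[str], List[str]]:
    """Return (local interfaces to copy to, remote smart NICs to copy to).

    `bindings` maps a group member to (smart NIC, interface).
    """
    local_interfaces: List[str] = []
    remote_nics = set()
    for member in members:
        nic, interface = bindings[member]
        if nic == current_nic:
            # One copy per member bound to this smart NIC.
            local_interfaces.append(interface)
        else:
            # A single copy per remote smart NIC, which then produces
            # whatever further copies its own members need.
            remote_nics.add(nic)
    return local_interfaces, sorted(remote_nics)
```

With two group members bound to the second smart NIC, the first smart NIC still sends only one copy over the dedicated communication channel.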

Another situation that can require the use of the dedicated communication channel for passing data messages between smart NICs occurs if all of the physical network ports of a smart NIC have become inoperable but the smart NIC itself is still operable. In this case, the smart NIC can still perform virtual network operations on data messages sent from the DCNs bound to that smart NIC, but needs to send those data messages to other smart NICs for output to the physical network, irrespective of whether the ports operate in a LAG. When the ports operate in a LAG, or the smart NICs are configured in a NIC team using another teaming mechanism, connections that were previously assigned to an inoperable physical port are moved to another physical port (e.g., on another smart NIC).

FIG. 13 conceptually illustrates the path of a data message for one of the connections shown in FIG. 10 after the physical port 905 of the first smart NIC 900 has failed. This could occur due to a problem with the smart NIC itself, the physical cable that connects the port 905 to the datacenter network being disconnected, etc. As shown, another data message 1300 directed to Dest1 is sent from the VM 930 to the smart NIC 900 via the VF 940. The smart NIC 900 processes the data message according to the configured network virtualization operations 950, which determine that the data message should be output to the physical network but that the physical port 905 previously used for this connection is no longer available. As such, the connection is now rebalanced to use the physical port 960 of the second smart NIC 920. The data message 1300 is therefore sent to the second smart NIC 920 via the dedicated communication channel 955. In some embodiments, the first smart NIC 900 also provides context information indicating that the network virtualization operations have already been performed on the data message 1300 and that the data message should be output via the physical port 960 of the second smart NIC 920. The second smart NIC 920 receives the data message 1300 via the dedicated communication channel 955 and outputs this data message 1300 to the physical network via its physical port 960.
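The rebalancing step can be sketched as follows. The function `rebalance_after_failure`, the connection labels, and the round-robin reassignment policy are assumptions made for illustration; the source only states that connections assigned to an inoperable port are moved to another physical port.

```python
from typing import Dict, List


def rebalance_after_failure(
    assignments: Dict[str, str], ports: List[str], failed_port: str
) -> Dict[str, str]:
    """Return new connection -> port assignments that avoid `failed_port`."""
    healthy = [port for port in ports if port != failed_port]
    if not healthy:
        raise RuntimeError("no operable physical port remains in the team")
    updated: Dict[str, str] = {}
    moved = 0
    for conn in sorted(assignments):
        port = assignments[conn]
        if port == failed_port:
            # Spread the displaced connections over the healthy ports
            # (round-robin is an illustrative policy, not from the source).
            port = healthy[moved % len(healthy)]
            moved += 1
        updated[conn] = port
    return updated
```

Connections that were already on a healthy port keep their assignment, so only traffic that used the failed port starts crossing the dedicated communication channel.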

In many situations, the smart NICs receive configuration data for the virtual network operations from a network management and control system. In some embodiments, such a network management and control system receives data defining networking operations (e.g., defining logical networks), security operations, etc. from a user (e.g., a network and/or security administrator), uses this definitional data to generate configuration data for various network elements (e.g., forwarding elements such as virtual switches and routers, middlebox elements such as distributed firewalls, etc.), and provides the configuration data to the network elements so that the network elements can implement the various networking and security operations. Such network elements include the smart NICs that perform network virtualization operations.

Each of the ports of the different smart NICs (possibly including a management port) has its own network address, but many network management and control systems treat each host computer as a single entity. For example, for host computers that do not use smart NICs for network virtualization operations, the network management and control system of some embodiments communicates with an agent in the hypervisor of the host computer. The network management and control system uses a single management network address for each host computer and thus should not communicate directly with all of the multiple smart NICs of a host computer.

In some embodiments, the smart NICs use clustering technology in order to appear to the network management and control system as a single entity for the host computer. For example, in some embodiments, the smart NICs of a host computer perform a leader election to determine a single one of the smart NICs that communicates with the network management and control system. In some such embodiments, each of the smart NIC operating systems runs a deterministic algorithm that selects one of the smart NICs as the point of contact. Any messages needed for this leader election are passed over the dedicated communication channel.

FIG. 14 conceptually illustrates a process 1400 of some embodiments for configuring multiple smart NICs to perform network virtualization operations. In some embodiments, the process 1400 is performed independently by each of a set of smart NICs of a host computer when the smart NICs come online (e.g., when the host computer boots up). The process 1400 may also be performed (again, independently by each smart NIC of the host computer) in response to a change in smart NIC team membership (e.g., a smart NIC being added to or removed from the NIC team, whether by external action or NIC failure). In some embodiments, the smart NICs monitor team membership using beacon or keep-alive messages (e.g., sent via the dedicated communication channel). The process 1400 is described in part by reference to FIGS. 15 and 16, which illustrate the operation of a pair of smart NICs 1500 and 1505 of a host computer (not shown).
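One way to track team membership from keep-alive messages is sketched below. The class `TeamMonitor`, its timeout parameter, and the use of caller-supplied timestamps are assumptions for illustration; the source does not describe the beacon format or timing.

```python
from typing import Dict, Iterable, List


class TeamMonitor:
    """Track which smart NICs have sent a keep-alive recently."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout  # seconds without a beacon before removal
        self.last_seen: Dict[str, float] = {}

    def record_beacon(self, nic_id: str, now: float) -> None:
        # Called when a keep-alive arrives over the dedicated channel.
        self.last_seen[nic_id] = now

    def live_members(self, now: float) -> List[str]:
        return sorted(
            nic for nic, seen in self.last_seen.items()
            if now - seen <= self.timeout
        )

    def membership_changed(self, previous: Iterable[str], now: float) -> bool:
        # A change here is what would trigger a re-run of process 1400.
        return self.live_members(now) != sorted(previous)
```

Passing `now` explicitly keeps the sketch deterministic; a real agent would presumably read a monotonic clock instead.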

FIG. 15 illustrates that each of the smart NICs 1500 and 1505 respectively executes a smart NIC operating system 1510 and 1515. The smart NIC operating systems include multiple modules, such as network virtualization operations 1520 and 1525, control agents 1530 and 1535, and leader election modules 1540 and 1545. A dedicated communication channel 1550 connects the two smart NICs and allows communication between them (e.g., for sending data messages, configuration data, etc.).

In some embodiments, the control agents 1530 and 1535 communicate with a network management and control system that configures network virtualization operations on numerous host computers in a datacenter (e.g., by provisioning these host computers to perform switching and/or routing to implement logical networks). The control agents 1530 and 1535 receive configuration data from this network management and control system and use the configuration data to properly configure their respective network virtualization operations 1520 and 1525. The control agents 1530 and 1535 are able to communicate with each other via the dedicated communication channel 1550.

The leader election modules 1540 and 1545 perform leader election to assign one of the smart NICs as the leader for a particular task (e.g., communication with the network management and control system). The leader election modules 1540 and 1545 may communicate via the dedicated communication channel 1550 in order to confirm leader elections for a task, to share identifying information so that each leader election module is aware of all of the smart NICs of the host computer that can be chosen as the leader for a task, etc.

As shown, the process 1400 begins by using (at 1405) a leader election algorithm to determine which smart NIC is the single point of communication for the network management and control system. In some embodiments, this leader election algorithm is a deterministic algorithm performed separately on each individual smart NIC of the group of smart NICs of a host computer. That is, if there are five smart NICs, then each of the five smart NICs runs the leader election algorithm and arrives at the same elected leader. An example of such an algorithm is a hash-based decision that hashes the identifiers of the five smart NICs and computes the resulting hash modulo 5 (the number of smart NICs) to determine the leader. In other embodiments, the leader election algorithm involves communication and/or negotiation between the smart NICs to arrive at an elected leader smart NIC that is designated to communicate with the network management and control system.
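A minimal sketch of the hash-based election follows. It assumes every smart NIC already knows the same set of identifiers, sorts them so that each NIC hashes an identical input, and uses SHA-256; the function name `elect_leader`, the identifier strings, and the hash choice are illustrative assumptions, since the source names only "hash the identifiers, then take the result modulo the number of smart NICs".

```python
import hashlib
from typing import Iterable


def elect_leader(nic_ids: Iterable[str]) -> str:
    """Deterministically pick one smart NIC from a shared membership set."""
    # Sorting makes the input identical on every smart NIC, regardless of
    # the order in which each one learned about its peers.
    members = sorted(set(nic_ids))
    if not members:
        raise ValueError("at least one smart NIC is required")
    digest = hashlib.sha256("|".join(members).encode()).digest()
    # Hash modulo the number of smart NICs selects the leader's index.
    index = int.from_bytes(digest, "big") % len(members)
    return members[index]
```

Because the result depends only on the membership set, each smart NIC can run this locally and reach the same answer without exchanging election messages; a membership change simply yields a new input and possibly a new leader.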

Once this election has been completed, the process 1400 determines (at 1410) whether the current smart NIC (i.e., the smart NIC performing this process) is elected as the point of contact. It should be understood that the process 1400 is a conceptual process and that each smart NIC does not necessarily make this specific determination. Rather, the smart NIC that is elected as the leader performs a first set of operations, and the other smart NICs perform a different set of operations after the leader election. In the example of FIG. 16, which illustrates the distribution of configuration data to multiple smart NICs, the leader election module 1540 is bolded to indicate that the smart NIC 1500 has been elected as the leader to communicate with the network management and control system 1600.

For smart NICs that are not the elected point of contact with the network management and control system, the process 1400 eventually receives (at 1415) configuration data via the dedicated communication channel from the elected smart NIC. It should be noted that this does not occur until the elected smart NIC has received this configuration data from the network management and control system and distributed the data to the other smart NICs.

At the smart NIC that is elected as the point of contact with the network management and control system, the process establishes (at 1420) communication with the network management and control system using the management IP address assigned to the host computer. In some embodiments, the network management and control system treats each host computer as a single entity and need not be concerned with the internal networking implementation on each host computer. To establish communication, in some embodiments the elected smart NIC sends a message or set of messages from the management IP address to the network management and control system. In some embodiments, the network management and control system will automatically use the assigned IP address, but the elected smart NIC needs to advertise to the datacenter network that messages sent to that IP address should be directed to the particular one of its ports that uses that IP address.

Once communication is established, the process receives (at 1425) configuration data from the network management and control system. In some embodiments, this configuration data specifies how data messages should be handled by the smart NICs. The configuration data can include routing tables, virtual switch configuration, firewall rules, network address translation rules, load balancing rules, etc. In some embodiments, the configuration data is in a particular format for the specific type of network virtualization software running on the smart NIC operating system. In other embodiments, the configuration data is in a generic format and the control agent on each smart NIC is responsible for converting the data into the particular format for the network virtualization software. FIG. 16 illustrates the network management and control system 1600 providing configuration data 1605 to the control agent 1530 of the smart NIC 1500, which has been elected as the point of contact for the network management and control system 1600.

Next, the process shares (at 1430) the received configuration data with the other smart NICs (i.e., those smart NICs that do not communicate directly with the network management and control system). This data is provided to the other smart NICs via the dedicated communication channel between the smart NICs. It is also at this point that the other smart NICs reach operation 1415 in their own processes, as they are now able to receive the configuration data.

The process 1400 (whether being performed on the elected smart NIC or on one of the other smart NICs) next configures (at 1435) the network virtualization operations on that smart NIC based on the configuration data. As mentioned, in some embodiments the control agent uses the configuration data received from the network management and control system (e.g., as a first set of data tuples) to generate the configuration data for the network virtualization operations (e.g., as a second set of data tuples). In some embodiments, the network virtualization operations and/or the control agent in the smart NIC operating system also program the data message processing ASIC of the smart NIC based on this configuration data. The process 1400 then ends, although in practice the elected smart NIC will receive updates from the network management and control system regularly as configuration changes are provided to that system.
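The ordering of operations 1425 through 1435 can be sketched as follows. The `SmartNic` class, its `receive` and `configure` methods, and the dictionary-shaped configuration are hypothetical; in particular, `configure` simply copies the data, whereas a real control agent may translate it into a format specific to the network virtualization software.

```python
from typing import Dict, List, Optional

Config = Dict[str, List[str]]


class SmartNic:
    def __init__(self, nic_id: str) -> None:
        self.nic_id = nic_id
        self.received: Optional[Config] = None
        self.applied: Config = {}

    def receive(self, config: Config) -> None:
        # 1425 on the elected NIC (from the management system),
        # 1415 on the others (over the dedicated channel).
        self.received = dict(config)

    def configure(self) -> None:
        # 1435: configure the local network virtualization operations.
        if self.received is None:
            raise RuntimeError(f"{self.nic_id} has no configuration yet")
        self.applied = dict(self.received)


def distribute(leader: SmartNic, team: List[SmartNic], config: Config) -> None:
    leader.receive(config)                    # 1425
    for nic in team:
        if nic is not leader:
            nic.receive(leader.received)      # 1430 -> 1415
    for nic in team:
        nic.configure()                       # 1435 on every smart NIC
```

The non-elected smart NICs cannot configure themselves until the leader has shared the data, which mirrors the dependency between operations 1430 and 1415.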

FIG. 16 shows the control agent 1530 on the elected smart NIC 1500 providing the configuration data 1605 to the control agent 1535 on the second smart NIC 1505 (e.g., via the dedicated communication channel 1550). The control agent 1530 also uses this configuration data 1605 to configure the network virtualization operations 1520 on its smart NIC 1500, while the control agent 1535 on the second smart NIC 1505 uses the configuration data 1605 to configure its respective network virtualization operations 1525.

In addition to disseminating the configuration data from the network management and control system, in some embodiments the leader smart NIC receives information from the other smart NICs via the dedicated communication channel. In some embodiments, this information includes statistics (e.g., data message processing statistics), status/monitoring information, and other data. In some embodiments, the elected leader smart NIC performs various monitoring tasks based on this information (e.g., ensuring that the various smart NICs are currently operable and, if one of the smart NICs goes down, sending messages to the other smart NICs).

In some embodiments, some of the shared information is reported to the network management and control system. FIG. 17 conceptually illustrates the elected leader smart NIC 1500 collecting statistics from itself and from the other smart NIC 1505 and reporting those statistics to the network management and control system 1600. As shown, the control agents 1530 and 1535 collect statistics from their respective sets of network virtualization operations 1520 and 1525. The control agent 1535 provides these statistics via the dedicated communication channel 1550 to the control agent 1530 at the leader smart NIC 1500. At least some of these statistics from both smart NICs are sent from the control agent 1530 to the network management and control system 1600. In some embodiments, the control agent 1530 or another module on the elected leader smart NIC 1500 aggregates the statistics so that the network management and control system 1600 is provided with information that appears to come from a single entity.
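The aggregation step can be sketched as a per-counter sum across smart NICs. The function name `aggregate_statistics` and the counter names in the usage example are hypothetical; the source does not enumerate which statistics are collected or how they are combined.

```python
from collections import Counter
from typing import Dict, Mapping


def aggregate_statistics(
    per_nic_stats: Mapping[str, Mapping[str, int]]
) -> Dict[str, int]:
    """Sum each counter over all smart NICs into one host-level report."""
    total: Counter = Counter()
    for stats in per_nic_stats.values():
        # Counter.update adds counts rather than replacing them.
        total.update(stats)
    return dict(total)
```

For instance, `aggregate_statistics({"nic-1500": {"rx": 10}, "nic-1505": {"rx": 5}})` yields a single `rx` total of 15, so the report no longer reveals that two smart NICs contributed to it.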

This collected information may be used by the network management and control system 1600 to monitor the host computer and/or the individual smart NICs. The network management and control system may also use this information to modify the virtual network configuration for the smart NICs, in which case the network management and control system provides configuration updates to the leader smart NIC, which in turn distributes these updates to the other smart NICs via the dedicated communication channel.

In some embodiments, the network management and control system includes multiple components that perform different functions and provide different configuration data to the host computers (in addition to receiving different data from the host computers). For example, the network management and control system of some embodiments includes both a management plane (MP) and a central control plane (CCP). The MP receives the configuration data from administrators, persists this data, and provides certain configuration information to the host computers. In addition, in some embodiments, the host computers provide statistics, status, and other real-time data to the MP. In some embodiments, the CCP receives network configuration data from the MP, determines the host computers (and other forwarding elements, such as gateways) that require each portion of the network configuration data, and provides this data to agents on those host computers.

In some embodiments, the smart NICs elect multiple different leaders for multiple different tasks. For example, some embodiments elect one leader for receiving configuration data, another leader for collecting flow statistics, a third leader for collecting monitoring data, etc. In some embodiments, one leader is elected for communication with the MP and a second leader is elected for communication with the CCP. These leader elections may use different hash functions, or different inputs to the same hash function, in order to arrive at different smart NICs as the elected leaders. In some embodiments, if a smart NIC is elected for communication with the MP, then that smart NIC is removed from consideration for communication with the CCP so as to ensure that the load is shared.
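Per-task elections can be sketched by feeding the task name into the hash alongside the member identifiers (the "different inputs to the same hash function" option). The helper `_pick`, the function `elect_task_leaders`, the task names, and the SHA-256 choice are assumptions. The sketch also generalizes the MP/CCP exclusion rule by removing every already-elected smart NIC from later elections while unelected ones remain, which goes slightly beyond what the source states.

```python
import hashlib
from typing import Dict, Iterable, List


def _pick(task: str, candidates: List[str]) -> str:
    # The task name is part of the hash input, so different tasks can
    # land on different smart NICs.
    payload = (task + "|" + "|".join(candidates)).encode()
    digest = hashlib.sha256(payload).digest()
    return candidates[int.from_bytes(digest, "big") % len(candidates)]


def elect_task_leaders(
    nic_ids: Iterable[str], tasks: Iterable[str]
) -> Dict[str, str]:
    """Elect one leader per task, avoiding reuse while NICs remain."""
    members = sorted(set(nic_ids))
    if not members:
        raise ValueError("at least one smart NIC is required")
    leaders: Dict[str, str] = {}
    for task in tasks:
        unused = [nic for nic in members if nic not in leaders.values()]
        # Fall back to the full team once every NIC already leads a task.
        leaders[task] = _pick(task, unused or members)
    return leaders
```

As with the single-leader case, every smart NIC that runs this with the same membership and the same ordered task list arrives at the same assignment.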

FIG. 18 conceptually illustrates three smart NICs 1805-1815 of a host computer (not shown) that respectively operate smart NIC operating systems 1820-1830. As in the previous figures, each of the smart NIC operating systems 1820-1830 includes a respective control agent 1835-1845 and leader election module 1850-1860. In addition, each of the smart NICs 1805-1815 is connected to the other smart NICs via a dedicated communication channel 1865.

In addition, a network management and control system 1800 that includes both an MP 1870 and a CCP 1875 communicates with the smart NICs 1805-1815. Here, the leader election modules 1850-1860 have designated the first smart NIC 1805 as the point of contact for the MP 1870 and the third smart NIC 1815 as the point of contact for the CCP 1875. As such, the control agent 1835 on the first smart NIC 1805 communicates with the MP 1870 and the control agent 1845 on the third smart NIC 1815 communicates with the CCP 1875. In some embodiments, each of the smart NIC operating systems actually runs separate MP agents and CP agents, with the elected MP agent communicating with the MP 1870 and the elected CP agent communicating with the CCP 1875.

For various purposes, the smart NICs in some embodiments also use the dedicated communication channel to synchronize dynamic state information. That is, when a first smart NIC receives or creates a set of dynamic state information, that first smart NIC uses the dedicated communication channel to provide the same set of dynamic state information to one or more of the other smart NICs. Different types of state may be shared with a single other smart NIC or with multiple (or all) other smart NICs of a given host computer. If one of the smart NICs fails, the synchronization of dynamic state information allows that state to be preserved rather than lost. A smart NIC may fail due to an electrical short, a disconnection, overheating, etc.

As mentioned, an elected leader smart NIC among the group of smart NICs of a host computer may collect monitoring data from all of the other smart NICs. This collected data, or data generated from the collected data, may include dynamic state information that is synchronized to at least one backup smart NIC. Thus, if the leader smart NIC fails, the next leader can retrieve the monitoring state information.

In addition, when performing virtual network processing, the smart NICs may need to store dynamic state information and share that data with each other. FIG. 19 conceptually illustrates two smart NICs 1905 and 1910 that share connection state. As in the previous figures, each of the smart NICs (e.g., within its smart NIC operating system) performs network virtualization operations 1915 and 1920. These operations include switching and routing 1925 and 1930 as well as firewall engines 1935 and 1940 that perform firewall operations (i.e., determining whether to allow, block, or drop data messages based on the headers of those data messages). In some embodiments, the firewall operations are stateful and therefore use information from the respective connection trackers 1945 and 1950.

The connection trackers 1945 and 1950 store information about the open connections handled by the smart NICs. As shown, for each open connection some embodiments store at least a 5-tuple (source and destination IP addresses, source and destination transport layer ports, and transport layer protocol), the current state of the connection, and the congestion window for the connection. This connection information is dynamic state that the connection trackers 1945 and 1950 synchronize via the dedicated communication channel 1955 between the smart NICs.
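
To make the stored fields concrete, the sketch below models one connection-tracker entry and a round trip of that entry between two trackers. It is an illustration only: the class names, the JSON encoding, the state strings, and the unit of the congestion window are assumptions, and the transport over the dedicated channel is replaced by a plain string hand-off.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FiveTuple:
    """Key that identifies a connection (frozen so it can be a dict key)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str  # e.g. "tcp"


@dataclass
class ConnEntry:
    state: str              # e.g. "SYN_SENT" or "ESTABLISHED" (assumed labels)
    congestion_window: int  # unit is an assumption (segments)


class ConnectionTracker:
    """Hypothetical in-memory tracker holding one entry per open connection."""

    def __init__(self):
        self.entries = {}

    def upsert(self, key, entry):
        self.entries[key] = entry

    def export_update(self, key):
        """Serialize one entry so it can be sent to a peer smart NIC."""
        return json.dumps({"key": asdict(key), "entry": asdict(self.entries[key])})

    def apply_update(self, payload):
        """Install an entry received from a peer smart NIC."""
        msg = json.loads(payload)
        self.entries[FiveTuple(**msg["key"])] = ConnEntry(**msg["entry"])
```

After `b.apply_update(a.export_update(key))`, tracker `b` holds the same entry for `key` as tracker `a`, which is the property the synchronization is meant to provide.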

As shown, the connection tracker 1945 on the first smart NIC 1905 stores information for two open connections (cxn1 and cxn2), along with the congestion windows for these open connections. Other embodiments may also store additional data (e.g., the receiver window). The firewall engines 1935 and 1940 use this dynamic connection state information from their respective connection trackers to process data messages sent to and from the DCNs on their host computer. Information as to whether a particular connection has been opened (e.g., whether the three-way handshake has completed) allows the firewall engines 1935 and 1940 to determine whether a data message should be allowed. The congestion window is a dynamic state variable determined by the connection endpoints (and learned by the smart NICs) that limits the amount of data for a particular connection that can be sent onto the network (i.e., from a physical port of one of the smart NICs). It typically starts small and increases up to a maximum (which may be set by the receiver window).

If the connection state for an ongoing connection were to be lost (e.g., because the smart NIC that stores that connection state in its connection tracker fails), then, depending on the firewall engine settings, either all traffic for the connection would be blocked by the firewall engine of the smart NIC that picks up the connection, or the firewall engine on that smart NIC would need to relearn the connection state from the endpoints. With the first option, not only would the connection need to be re-established, but the congestion window would start small again, limiting the amount of data that could be transmitted. The latter option avoids dropping the connection, but at the cost of a window of lax security enforcement.
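
The two fallback behaviors can be contrasted in a few lines. This is a hedged sketch rather than any product's firewall logic: `MissingStatePolicy`, `handle_mid_connection_packet`, the `"RELEARNING"` state label, and the dictionary layout of the tracked state are all hypothetical.

```python
from enum import Enum


class MissingStatePolicy(Enum):
    BLOCK = "block"      # drop until the endpoints re-establish the connection
    RELEARN = "relearn"  # admit traffic and rebuild state from what is observed


def handle_mid_connection_packet(tracked, conn_id, policy):
    """Decide the fate of a packet that belongs to an already-open connection.

    `tracked` maps a connection id to its known state. Returns a
    (verdict, tracked) pair; the input mapping is never mutated.
    """
    if conn_id in tracked:
        # State survived (e.g., it was synchronized), so neither fallback applies.
        return "allow", tracked
    if policy is MissingStatePolicy.BLOCK:
        return "drop", tracked
    # RELEARN: accept the packet and record a provisional entry. Enforcement
    # is weaker until the real state and congestion window are known again.
    updated = dict(tracked)
    updated[conn_id] = {"state": "RELEARNING", "congestion_window": None}
    return "allow", updated
```

The first branch is the case that state synchronization aims for: when the entry is already present on the smart NIC that takes over, neither the strict nor the lenient fallback is exercised.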

As such, the connection trackers 1945 and 1950 share their dynamic state information with each other to avoid the need for either of these options. Here, the state information for cxn1 and cxn2 has already been shared, so these connections could be handled by either of the smart NICs 1905 and 1910. At this point, VM 1900 is in the process of opening a new connection (cxn3) and sends a data message 1960 for this connection to the network virtualization operations 1915 on the first smart NIC 1905 (i.e., the smart NIC to which VM 1900 is bound). Accordingly, the connection tracker 1945 also synchronizes this connection state data 1965 to the connection tracker 1950. In some embodiments, each smart NIC synchronizes its connection state data (or other state data) to only one other smart NIC, while in other embodiments each smart NIC synchronizes its connection state data (or other state data) to all of the other smart NICs.

Different embodiments synchronize dynamic state information at different intervals. Some embodiments synchronize each change over the dedicated communication channel, while other embodiments synchronize state data at regular time intervals (e.g., every 1 ms, every 100 ms, every second, every 5 seconds, etc.). If the dedicated communication channel is a purpose-built channel, then this may enable very fast (e.g., every 1 ms or so) synchronization. In addition, some embodiments use a mechanism in the smart NIC to write connection state (or other synchronized data) to a specific memory region in that smart NIC, with this write automatically mirrored to a peer memory region on another smart NIC, enabling even faster synchronization (e.g., a delay of less than 10 μs). If the synchronization interval is longer (a higher delay) such that the congestion window cannot be accurately synchronized, some embodiments synchronize only the basic connection state (i.e., whether the connection is open and allowed). In the case of a failure of a first smart NIC that processes a particular connection, the new smart NIC that starts processing that connection allows traffic for the connection until that new smart NIC has learned the congestion window for the connection.
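
The trade-off between synchronization interval and synchronized detail can be sketched as a payload builder. Everything specific here is an assumption made for illustration: the function name, the dictionary layout, and in particular the 10 ms default threshold, since the text gives example intervals but no cut-off below which the congestion window is considered accurate.

```python
def build_sync_payload(entries, sync_interval_s, cwnd_max_interval_s=0.01):
    """Build the data to send to a peer smart NIC for one synchronization round.

    `entries` maps a connection id to {"state": ..., "congestion_window": ...}.
    The congestion window is included only when synchronization is frequent
    enough for the value to still be meaningful on arrival; otherwise only
    the basic "is this connection open" flag is sent.
    """
    include_cwnd = sync_interval_s <= cwnd_max_interval_s
    payload = {}
    for conn_id, entry in entries.items():
        record = {"open": entry["state"] == "ESTABLISHED"}
        if include_cwnd:
            record["congestion_window"] = entry["congestion_window"]
        payload[conn_id] = record
    return payload
```

With a 1 ms interval the payload carries both the open flag and the congestion window; with a 5 second interval it carries only the open flag, leaving the receiving smart NIC to relearn the window after a failover.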

While VM 1900 is bound to the first smart NIC 1905 (and assuming that traffic for these connections is sent to and from a physical port of this first smart NIC 1905), the second smart NIC 1910 has no actual use for this information. However, FIG. 20 illustrates that the first smart NIC 1905 has become inoperable while these connections remain open, so VM 1900 is now bound to an interface of the second smart NIC 1910. This does not mean that the VM needs to restart all of its connections, because this information has been synchronized from the first smart NIC 1905. If there are more than two smart NICs, then, depending on the configuration, different embodiments either move all of the VMs that were bound to the now-inoperable smart NIC to the same smart NIC or balance them across all of the remaining smart NICs.
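
The two rebinding strategies mentioned for hosts with more than two smart NICs can be sketched as follows. The function name, the VM and NIC labels, and the least-loaded placement rule used for balancing are assumptions for illustration; the text does not say how the balancing is computed.

```python
def rebind_after_failure(bindings, failed_nic, healthy_nics, balance=True):
    """Return a new VM-to-NIC mapping after `failed_nic` becomes inoperable.

    `bindings` maps a VM name to the NIC it is bound to. With balance=False
    every affected VM moves to the same NIC; with balance=True each affected
    VM goes to the currently least-loaded healthy NIC.
    """
    if not healthy_nics:
        raise ValueError("no healthy smart NIC left")
    targets = sorted(healthy_nics)

    # Count the VMs already bound to each healthy NIC so that balancing
    # accounts for existing load, not just the VMs being moved.
    load = {nic: 0 for nic in targets}
    for nic in bindings.values():
        if nic in load:
            load[nic] += 1

    result = dict(bindings)
    affected = sorted(vm for vm, nic in bindings.items() if nic == failed_nic)
    for vm in affected:
        if balance:
            target = min(targets, key=lambda nic: (load[nic], nic))
        else:
            target = targets[0]
        result[vm] = target
        load[target] += 1
    return result
```

Because connection state was synchronized before the failure, either strategy can be applied without the moved VMs having to restart their connections, provided the target NIC received the relevant state.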

As shown, VM 1900 continues to send data messages 2000 for cxn3, now to the second smart NIC 1910. Because the synchronized state for this connection shows that it was open with a congestion window of 3 before the first smart NIC 1905 failed, the firewall engine 1940 is able to process these data messages without requiring the connection or its congestion window to be restarted.

This state sharing may also be used by smart NICs that perform operations other than virtual networking (or that perform multiple types of operations for which state sharing is used). If storage virtualization operations are handled by the smart NICs, then in some embodiments the storage virtualization functions include running a network stack to manage transport layer (e.g., TCP) connections to the storage. In this case as well, connection information should be shared between the smart NICs to allow for failover, so that these connections are not reset if one of the smart NICs fails.

FIG. 21 conceptually illustrates an electronic system 2100 with which some embodiments of the invention are implemented. The electronic system 2100 may be a computer (e.g., a desktop computer, personal computer, tablet computer, server computer, mainframe, blade computer, etc.), a phone, a PDA, or any other sort of electronic device. Such an electronic system includes various types of computer-readable media and interfaces for various other types of computer-readable media. The electronic system 2100 includes a bus 2105, processing unit(s) 2110, a system memory 2125, a read-only memory 2130, a permanent storage device 2135, input devices 2140, and output devices 2145.

The bus 2105 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the electronic system 2100. For instance, the bus 2105 communicatively connects the processing unit(s) 2110 with the read-only memory 2130, the system memory 2125, and the permanent storage device 2135.

From these various memory units, the processing unit(s) 2110 retrieve instructions to execute and data to process in order to execute the processes of the invention. The processing unit(s) may be a single processor or a multi-core processor in different embodiments.

The read-only memory (ROM) 2130 stores static data and instructions that are needed by the processing unit(s) 2110 and other modules of the electronic system. The permanent storage device 2135, on the other hand, is a read-and-write memory device. This device is a non-volatile memory unit that stores instructions and data even when the electronic system 2100 is off. Some embodiments of the invention use a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 2135.

Other embodiments use a removable storage device (such as a floppy disk, flash drive, etc.) as the permanent storage device. Like the permanent storage device 2135, the system memory 2125 is a read-and-write memory device. However, unlike the storage device 2135, the system memory is a volatile read-and-write memory, such as random-access memory. The system memory stores some of the instructions and data that the processor needs at runtime. In some embodiments, the invention's processes are stored in the system memory 2125, the permanent storage device 2135, and/or the read-only memory 2130. From these various memory units, the processing unit(s) 2110 retrieve instructions to execute and data to process in order to execute the processes of some embodiments.

The bus 2105 also connects to the input devices 2140 and output devices 2145. The input devices enable the user to communicate information and select commands to the electronic system. The input devices 2140 include alphanumeric keyboards and pointing devices (also called "cursor control devices"). The output devices 2145 display images generated by the electronic system. The output devices include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD). Some embodiments include devices, such as a touchscreen, that function as both input and output devices.

Finally, as shown in FIG. 21, the bus 2105 also couples the electronic system 2100 to a network 2165 through a network adapter (not shown). In this manner, the computer can be a part of a network of computers (such as a local area network ("LAN"), a wide area network ("WAN"), or an intranet) or a network of networks, such as the Internet. Any or all components of the electronic system 2100 may be used in conjunction with the invention.

Some embodiments include electronic components, such as microprocessors, storage, and memory, that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as computer-readable storage media, machine-readable media, or machine-readable storage media). Some examples of such computer-readable media include RAM, ROM, read-only compact discs (CD-ROM), recordable compact discs (CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), a variety of recordable/rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc.), flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc.), magnetic and/or solid-state hard drives, read-only and recordable optical discs, ultra-density optical discs, any other optical or magnetic media, and floppy disks. The computer-readable media may store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.

While the above discussion primarily refers to microprocessors or multi-core processors that execute software, some embodiments are performed by one or more integrated circuits, such as application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs). In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself.

As used in this specification, the terms "computer," "server," "processor," and "memory" all refer to electronic or other technological devices. These terms exclude people or groups of people. For the purposes of this specification, the terms "display" or "displaying" mean displaying on an electronic device. As used in this specification, the terms "computer-readable medium," "computer-readable media," and "machine-readable medium" are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral signals.

This specification refers throughout to computational and network environments that include virtual machines (VMs). However, virtual machines are merely one example of data compute nodes (DCNs) or data compute end nodes, also referred to as addressable nodes. DCNs may include non-virtualized physical hosts, virtual machines, containers that run on top of a host operating system without the need for a hypervisor or separate operating system, and hypervisor kernel network interface modules.

VMs, in some embodiments, operate with their own guest operating systems on a host, using resources of the host virtualized by virtualization software (e.g., a hypervisor, virtual machine monitor, etc.). The tenant (i.e., the owner of the VM) can choose which applications to operate on top of the guest operating system. Some containers, on the other hand, are constructs that run on top of a host operating system without the need for a hypervisor or separate guest operating system. In some embodiments, the host operating system uses namespaces to isolate the containers from each other and therefore provides operating-system-level segregation of the different groups of applications that operate within different containers. This segregation is akin to the VM segregation that is offered in hypervisor-virtualized environments that virtualize system hardware, and thus can be viewed as a form of virtualization that isolates different groups of applications that operate in different containers. Such containers are more lightweight than VMs.

A hypervisor kernel network interface module, in some embodiments, is a non-VM DCN that includes a network stack with a hypervisor kernel network interface and receive/transmit threads. One example of a hypervisor kernel network interface module is the vmknic module that is part of the ESXi™ hypervisor of VMware, Inc.

It should be understood that while the specification refers to VMs, the examples given could be any type of DCN, including physical hosts, VMs, non-VM containers, and hypervisor kernel network interface modules. In fact, the example networks could include combinations of different types of DCNs in some embodiments.

While the invention has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the invention can be embodied in other specific forms without departing from the spirit of the invention. In addition, a number of the figures (including FIGS. 8 and 14) conceptually illustrate processes. The specific operations of these processes may not be performed in the exact order shown and described. The specific operations may not be performed in one continuous series of operations, and different specific operations may be performed in different embodiments. Furthermore, a process could be implemented using several sub-processes, or as part of a larger macro process. Thus, one of ordinary skill in the art would understand that the invention is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.

Claims (42)

1.一种方法,其包括:1. A method comprising: 在主机计算机的多个智能网络接口控制器(NIC)中的第一智能NIC处,所述智能NIC中的每一者执行智能NIC操作系统,所述智能NIC操作系统针对在所述主机计算机上执行的一组数据计算机器而执行虚拟网络操作:At a first intelligent network interface controller (NIC) of a host computer, each of the intelligent NICs executes an intelligent NIC operating system that performs virtual network operations for a set of data computing machines executing on the host computer: 接收由在所述主机计算机上执行的所述数据计算机器中的一者发送的数据消息;receiving a data message sent by one of the data computing machines executing on the host computer; 对所述数据消息执行虚拟网络操作以确定所述数据消息将从所述多个智能NIC中的第二智能NIC的端口被传输;及performing a virtual network operation on the data message to determine that the data message is to be transmitted from a port of a second smart NIC of the plurality of smart NICs; and 经由连接所述多个智能NIC的专用通信信道而将所述数据消息传递到所述第二智能NIC。The data message is communicated to the second intelligent NIC via a dedicated communication channel connecting the plurality of intelligent NICs. 2.根据权利要求1所述的方法,其中对所述数据消息执行虚拟网络操作以确定所述数据消息将从第二智能NIC的端口被传输包括:2. The method of claim 1 , wherein performing a virtual network operation on the data message to determine that the data message is to be transmitted from a port of the second intelligent NIC comprises: 基于所述虚拟网络操作,确定所述数据消息的目的地,所述目的地在所述主机计算机外部且可通过所述第一智能NIC的第一物理端口到达;及determining, based on the virtual network operation, a destination for the data message, the destination being external to the host computer and reachable through a first physical port of the first smart NIC; and 确定所述第一智能NIC的所述第一物理端口当前不可操作,且所述目的地可通过所述第二智能NIC的第二物理端口到达。It is determined that the first physical port of the first smart NIC is currently inoperable and that the destination is reachable through a second physical port of the second smart NIC. 3.根据权利要求2所述的方法,其中所述第一智能NIC的将所述主机计算机连接到物理网络的所有物理端口均不可操作。3. The method of claim 2, wherein all physical ports of the first smart NIC that connect the host computer to a physical network are inoperable. 4.根据权利要求2所述的方法,其中发送所述数据消息的所述数据计算机器被绑定到所述第一智能NIC的端口。4. 
The method of claim 2, wherein the data computing machine that sends the data message is bound to a port of the first smart NIC. 5.根据权利要求1所述的方法,其中对所述数据消息执行虚拟网络操作包括:5. The method of claim 1 , wherein performing a virtual network operation on the data message comprises: 基于所述虚拟网络操作,确定所述数据消息的在所述主机计算机外部的目的地;及determining a destination of the data message external to the host computer based on the virtual network operation; and 执行负载平衡以将所述数据消息指派给所述第二智能NIC的所述端口。Load balancing is performed to assign the data messages to the ports of the second intelligent NIC. 6.根据权利要求5所述的方法,其中:6. The method according to claim 5, wherein: 所述第一智能NIC的一组物理端口及所述第二智能NIC的一组物理端口在同一链路聚合群组中,跨越所述链路聚合群组而对数据消息进行负载平衡;且A set of physical ports of the first intelligent NIC and a set of physical ports of the second intelligent NIC are in the same link aggregation group, and data messages are load balanced across the link aggregation group; and 发送所述数据消息的所述数据计算机器被绑定到所述第一智能NIC的接口。The data computing machine that sends the data message is bound to an interface of the first intelligent NIC. 7.根据权利要求1所述的方法,其中:7. The method according to claim 1, wherein: 所述数据计算机器是绑定到所述第一智能NIC的第一数据计算机器;且The data computing machine is a first data computing machine bound to the first smart NIC; and 对所述数据消息执行虚拟网络操作包括:基于所述虚拟网络操作而确定所述数据消息的目的地是在所述主机计算机上执行的被绑定到所述第二智能NIC的第二数据计算机器。Performing a virtual network operation on the data message includes determining, based on the virtual network operation, that a destination for the data message is a second data computing machine executing on the host computer and bound to the second smart NIC. 8.根据权利要求1所述的方法,其中所述虚拟网络操作包括逻辑交换操作。8. The method of claim 1, wherein the virtual network operations include logical switching operations. 9.根据权利要求8所述的方法,其中所述虚拟网络操作进一步包括逻辑路由操作。9. The method of claim 8, wherein the virtual network operations further include logical routing operations. 10.根据权利要求1所述的方法,其中所述数据消息是第一数据消息,所述方法进一步包括:10. 
The method of claim 1, wherein the data message is a first data message, the method further comprising: 经由所述专用通信信道而从所述第二智能NIC接收第二数据消息,其中所述第二智能NIC对所述第二数据消息执行了虚拟网络操作以确定所述第二数据消息的目的地是所述数据计算机器;及receiving a second data message from the second smart NIC via the dedicated communication channel, wherein the second smart NIC performs a virtual network operation on the second data message to determine that the second data message is destined for the data computing machine; and 通过所述数据计算机器所绑定到的所述第一智能NIC的端口而将所述第二数据消息发送到所述数据计算机器。The second data message is sent to the data computing machine through the port of the first intelligent NIC to which the data computing machine is bound. 11.根据权利要求10所述的方法,其中所述数据消息是第一数据消息,所述方法进一步包括:11. The method according to claim 10, wherein the data message is a first data message, the method further comprising: 接收由所述数据计算机器发送的第二数据消息;receiving a second data message sent by the data computing machine; 对所述数据消息执行虚拟网络操作以确定所述数据消息将从所述第一智能NIC的物理端口被传输;及performing a virtual network operation on the data message to determine that the data message will be transmitted from a physical port of the first smart NIC; and 从所述物理端口传输所述数据消息。The data message is transmitted from the physical port. 12.一种用于在主机计算机的使用动态状态信息来执行操作的多个智能网络接口控制器(NIC)之间同步状态的方法,所述方法包括:12. A method for synchronizing states between a plurality of intelligent network interface controllers (NICs) of a host computer that use dynamic state information to perform operations, the method comprising: 在所述多个智能NIC中的第一智能NIC处,存储一组动态状态信息;及storing, at a first smart NIC of the plurality of smart NICs, a set of dynamic state information; and 跨越连接所述多个智能NIC的通信信道而同步所述一组动态状态信息,使得所述多个智能NIC中的所述智能NIC中的每一者也存储所述一组动态状态信息。The set of dynamic state information is synchronized across a communication channel connecting the plurality of intelligent NICs such that each of the intelligent NICs in the plurality of intelligent NICs also stores the set of dynamic state information. 
13.根据权利要求12所述的方法,其中所述动态状态信息包括连接跟踪数据。The method of claim 12 , wherein the dynamic state information comprises connection tracking data. 14.根据权利要求13所述的方法,其中所述智能NIC中的每一者执行智能NIC操作系统,所述智能NIC操作系统使用所述连接跟踪数据对向及从在所述主机计算机上执行的一组数据计算机器发送的数据消息执行虚拟网络操作。14. The method of claim 13, wherein each of the smart NICs executes a smart NIC operating system that uses the connection tracking data to perform virtual network operations on data messages sent to and from a set of data computing machines executing on the host computer. 15.根据权利要求14所述的方法,其中所述虚拟网络操作包括确定是否允许所述数据消息的防火墙操作。15. The method of claim 14, wherein the virtual network operation comprises a firewall operation that determines whether to allow the data message. 16.根据权利要求13所述的方法,其中所述连接跟踪数据包括活动传送层连接的列表。16. The method of claim 13, wherein the connection tracking data comprises a list of active transport layer connections. 17.根据权利要求16所述的方法,其中所述连接跟踪数据进一步包括所述活动传送层连接的至少子集中的每一者的拥塞窗口。17. The method of claim 16, wherein the connection tracking data further comprises a congestion window for each of at least a subset of the active transport layer connections. 18.根据权利要求12所述的方法,其中所述动态状态信息包括用于执行虚拟化存储操作的所跟踪存储连接信息。18. The method of claim 12, wherein the dynamic state information comprises tracked storage connection information used to perform virtualized storage operations. 19.根据权利要求12所述的方法,其中所述多个智能NIC中的每一智能NIC使用所述动态状态信息的子集来执行操作,所述方法进一步包括:19. The method of claim 12, wherein each of the plurality of intelligent NICs uses a subset of the dynamic state information to perform operations, the method further comprising: 在第二智能NIC出故障后,即刻通过使用在同步操作中从所述第二智能NIC接收到的动态状态信息来执行先前由所述第二智能NIC执行的操作。Upon failure of the second smart NIC, operations previously performed by the second smart NIC are performed using dynamic state information received from the second smart NIC in a synchronization operation. 20.根据权利要求12所述的方法,其中所述通信信道是具有高带宽及低等待时间的专用通信信道。20. 
The method of claim 12, wherein the communication channel is a dedicated communication channel with high bandwidth and low latency.
21. The method of claim 12, wherein the dedicated communication channel is a set of physical cables that carry only communications between the smart NICs.
22. The method of claim 21, wherein, for each respective pair of smart NICs in the plurality of smart NICs, the set of physical cables includes a respective physical cable connecting the pair of smart NICs.
23. The method of claim 21, wherein each smart NIC of the plurality of smart NICs is directly connected to two other smart NICs via the set of physical cables.
24. The method of claim 21, wherein the set of physical cables is connected to a set of one or more physical switches to carry communications between the smart NICs.
25. The method of claim 12, wherein:
the smart NICs are connected to a physical network via physical ports in order to transmit data to and receive data from other host computers; and
the dedicated communication channel is a logical dedicated communication channel that uses the physical network.
26. The method of claim 25, wherein the logical dedicated communication channel is an overlay network.
27. The method of claim 12, wherein the dedicated communication channel uses a Peripheral Component Interconnect Express (PCIe) subsystem of the host computer.
28. A method comprising:
at a first smart network interface controller (NIC) of a plurality of smart NICs of a host computer, each of the smart NICs performing virtual network operations for a set of data computing machines executing on the host computer:
determining that the first smart NIC is selected to communicate with a network management and control system that configures the virtual network operations;
receiving a set of configuration data for the virtual network operations from the network management and control system; and
providing the received set of configuration data to the other smart NICs of the host computer.
29. The method of claim 28, wherein determining that the first smart NIC is selected to communicate with the network management and control system comprises executing a deterministic selection algorithm to determine that the first smart NIC is selected.
30. The method of claim 29, wherein each of the other smart NICs of the plurality of smart NICs executes the same deterministic algorithm to determine that the first smart NIC is selected.
31. The method of claim 28, wherein determining that the first smart NIC is selected to communicate with the network management and control system comprises exchanging messages with the other smart NICs of the plurality of smart NICs via a dedicated communication channel connecting the smart NICs.
32. The method of claim 28, wherein:
the network management and control system includes a management plane and a control plane;
determining that the first smart NIC is selected to communicate with the network management and control system comprises determining that the first smart NIC is selected to communicate with the management plane; and
the set of configuration data is received from the management plane.
33. The method of claim 32, further comprising determining that a second smart NIC of the plurality of smart NICs is selected to communicate with the control plane.
34. The method of claim 33, wherein:
the set of configuration data is a first set of configuration data; and
the second smart NIC receives a second set of configuration data from the control plane and provides the second set of configuration data to the first smart NIC and the other smart NICs of the host computer.
35. The method of claim 28, wherein the received set of configuration data is provided to the other smart NICs via a dedicated communication channel connecting the plurality of smart NICs.
36. The method of claim 28, further comprising monitoring the other smart NICs to determine whether the other smart NICs are operational.
37. The method of claim 28, wherein, upon determining that the first smart NIC is selected to communicate with the network management and control system, a network address used by the network management and control system to communicate with the host computer is assigned to an interface of the first smart NIC.
38. The method of claim 28, further comprising collecting runtime statistics from the other smart NICs.
39. A machine-readable medium storing a program which, when executed by at least one processing unit, implements the method according to any one of claims 1 to 38.
40. An electronic device comprising:
a set of processing units; and
a machine-readable medium storing a program which, when executed by at least one of the processing units, implements the method according to any one of claims 1 to 38.
41. A system comprising means for implementing the method according to any one of claims 1 to 38.
42. A computer program product comprising instructions which, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 38.
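The selection and configuration-distribution steps recited in claims 28 to 30 and 36 can be illustrated with a minimal Python sketch. It is not taken from the patent: the `SmartNic` class, the `elect_leader` and `distribute_config` functions, and the lowest-identifier rule are hypothetical choices made for illustration. The claims only require that every smart NIC run the same deterministic algorithm, so that all of them arrive at the same result without a central coordinator.

```python
from dataclasses import dataclass, field


@dataclass
class SmartNic:
    """Hypothetical model of one smart NIC on a host computer."""
    nic_id: int
    operational: bool = True
    config: dict = field(default_factory=dict)


def elect_leader(nics):
    """Deterministic selection (assumed rule): lowest nic_id among operational NICs.

    Every NIC that runs this function over the same membership list
    reaches the same answer, regardless of list order.
    """
    candidates = [nic.nic_id for nic in nics if nic.operational]
    if not candidates:
        raise RuntimeError("no operational smart NIC available")
    return min(candidates)


def distribute_config(nics, leader_id, config_data):
    """Store the received configuration on the leader and copy it to the others.

    Returns the ids of the non-leader NICs that received the data.
    """
    recipients = []
    for nic in nics:
        if not nic.operational:
            continue  # a NIC found non-operational by monitoring is skipped
        nic.config = dict(config_data)
        if nic.nic_id != leader_id:
            recipients.append(nic.nic_id)
    return recipients


if __name__ == "__main__":
    nics = [SmartNic(3), SmartNic(1), SmartNic(2)]
    leader = elect_leader(nics)
    print("leader:", leader)
    print("recipients:", distribute_config(nics, leader, {"vlan": 10}))
```

If monitoring marks the selected NIC as non-operational, calling `elect_leader` again over the same list yields the next-lowest identifier, which is one simple way the remaining NICs could agree on a replacement.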
CN202280076727.0A 2021-12-22 2022-08-01 Intelligent NIC grouping Pending CN118266203A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US17/560,142 2021-12-22
US17/560,153 US11863376B2 (en) 2021-12-22 2021-12-22 Smart NIC leader election
US17/560,148 2021-12-22
US17/560,153 2021-12-22
PCT/US2022/039016 WO2023121720A1 (en) 2021-12-22 2022-08-01 Teaming of smart nics

Publications (1)

Publication Number Publication Date
CN118266203A (en) 2024-06-28

Family

ID=86769382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280076727.0A Pending CN118266203A (en) 2021-12-22 2022-08-01 Intelligent NIC grouping

Country Status (2)

Country Link
US (1) US11863376B2 (en)
CN (1) CN118266203A (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9825913B2 (en) 2014-06-04 2017-11-21 Nicira, Inc. Use of stateless marking to speed up stateful firewall rule processing
US11038845B2 (en) 2016-02-23 2021-06-15 Nicira, Inc. Firewall in a virtualized computing environment using physical network interface controller (PNIC) level firewall rules
US11606310B2 (en) 2020-09-28 2023-03-14 Vmware, Inc. Flow processing offload using virtual port identifiers
US11875172B2 (en) 2020-09-28 2024-01-16 VMware LLC Bare metal computer for booting copies of VM images on multiple computing devices using a smart NIC
US12190405B2 (en) 2021-07-06 2025-01-07 Intel Corporation Direct memory writes by network interface of a graphics processing unit
US12229578B2 (en) 2021-12-22 2025-02-18 VMware LLC Teaming of smart NICs
US11995024B2 (en) 2021-12-22 2024-05-28 VMware LLC State sharing between smart NICs
US12124342B2 (en) * 2022-04-29 2024-10-22 Dell Products L.P. Recovery of smart network interface controller operating system
US11899594B2 (en) 2022-06-21 2024-02-13 VMware LLC Maintenance of data message classification cache on smart NIC
US11928367B2 (en) 2022-06-21 2024-03-12 VMware LLC Logical memory addressing for network devices
US11928062B2 (en) 2022-06-21 2024-03-12 VMware LLC Accelerating data message classification with smart NICs

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6393483B1 (en) * 1997-06-30 2002-05-21 Adaptec, Inc. Method and apparatus for network interface card load balancing and port aggregation
US20150222547A1 (en) * 2014-02-06 2015-08-06 Mellanox Technologies Ltd. Efficient management of network traffic in a multi-cpu server
US10997106B1 (en) * 2020-09-22 2021-05-04 Pensando Systems Inc. Inter-smartNIC virtual-link for control and datapath connectivity
US20210357242A1 (en) * 2020-05-18 2021-11-18 Dell Products, Lp System and method for hardware offloading of nested virtual switches

Family Cites Families (200)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6035105A (en) 1996-01-02 2000-03-07 Cisco Technology, Inc. Multiple VLAN architecture system
US5887134A (en) 1997-06-30 1999-03-23 Sun Microsystems System and method for preserving message order while employing both programmed I/O and DMA operations
US5884313A (en) 1997-06-30 1999-03-16 Sun Microsystems, Inc. System and method for efficient remote disk I/O
US6687758B2 (en) 2001-03-07 2004-02-03 Alacritech, Inc. Port aggregation for network connections that are offloaded to network interface devices
US5974547A (en) 1998-03-20 1999-10-26 3Com Corporation Technique for reliable network booting of an operating system to a client computer
US8225002B2 (en) 1999-01-22 2012-07-17 Network Disk, Inc. Data storage and data sharing in a network of heterogeneous computers
US6496935B1 (en) 2000-03-02 2002-12-17 Check Point Software Technologies Ltd System, device and method for rapid packet filtering and processing
JP4168574B2 (en) 2000-06-02 2008-10-22 株式会社日立製作所 Packet transfer apparatus, packet transfer control method, and packet transfer apparatus setting method
US7792923B2 (en) 2000-10-13 2010-09-07 Zhe Khi Pak Disk system adapted to be directly attached to network
US7231430B2 (en) 2001-04-20 2007-06-12 Egenera, Inc. Reconfigurable, virtual processing system, cluster, network and method
EP1482711A3 (en) 2001-04-20 2009-06-24 Egenera, Inc. Virtual networking system and method in a processing system
WO2003090073A1 (en) 2002-04-18 2003-10-30 Venturcom, Inc. System for and method of streaming data to a computer in a network
US7546364B2 (en) 2002-05-16 2009-06-09 Emc Corporation Replication of remote copy data for internet protocol (IP) transmission
US20080008202A1 (en) 2002-10-31 2008-01-10 Terrell William C Router with routing processors and methods for virtualization
US7424710B1 (en) 2002-12-18 2008-09-09 Vmware, Inc. TCP/IP offloading for virtual machines
JP4157409B2 (en) 2003-03-31 2008-10-01 富士通株式会社 Virtual path construction apparatus and virtual path construction method
US7366181B2 (en) 2003-09-06 2008-04-29 Fujitsu Limited Virtual private network (VPN) with channelized ethernet over sonet (EoS) interface and method
WO2005038599A2 (en) 2003-10-14 2005-04-28 Raptor Networks Technology, Inc. Switching system with distributed switching fabric
JP4391265B2 (en) 2004-02-26 2009-12-24 株式会社日立製作所 Storage subsystem and performance tuning method
WO2005099201A2 (en) 2004-04-03 2005-10-20 Troika Networks, Inc. System and method of providing network node services
US7404192B2 (en) 2004-08-03 2008-07-22 International Business Machines Corporation Apparatus, system, and method for isolating a storage application from a network interface driver
US8285907B2 (en) 2004-12-10 2012-10-09 Intel Corporation Packet processing in switched fabric networks
US7747836B2 (en) 2005-03-08 2010-06-29 Netapp, Inc. Integrated storage virtualization and switch system
US7480780B2 (en) 2005-04-19 2009-01-20 Hitachi, Ltd. Highly available external storage system
US7743198B2 (en) 2005-09-08 2010-06-22 International Business Machines Corporation Load distribution in storage area networks
US8230153B2 (en) 2006-01-20 2012-07-24 Broadcom Corporation Method and system for HBA assisted storage virtualization
KR20090087119A (en) 2006-12-06 2009-08-14 퓨전 멀티시스템즈, 인크.(디비에이 퓨전-아이오) Data management devices, systems, and methods in storage using empty data token directives
US8111707B2 (en) 2007-12-20 2012-02-07 Packeteer, Inc. Compression mechanisms for control plane—data plane processing architectures
US20080267177A1 (en) 2007-04-24 2008-10-30 Sun Microsystems, Inc. Method and system for virtualization of packet encryption offload and onload
US20090089537A1 (en) 2007-09-28 2009-04-02 Sun Microsystems, Inc. Apparatus and method for memory address translation across multiple nodes
US7945436B2 (en) 2007-11-06 2011-05-17 Vmware, Inc. Pass-through and emulation in a virtual machine environment
US7792057B2 (en) 2007-12-21 2010-09-07 At&T Labs, Inc. Method and system for computing multicast traffic matrices
CN101540826A (en) 2008-03-21 2009-09-23 张通 Multi-media device for TV set and TV set
JP5164628B2 (en) 2008-03-24 2013-03-21 株式会社日立製作所 Network switch device, server system, and server transfer method in server system
US8793117B1 (en) 2008-04-16 2014-07-29 Scalable Network Technologies, Inc. System and method for virtualization of networking system software via emulation
US20110060859A1 (en) 2008-04-21 2011-03-10 Rishabhkumar Shukla Host-to-host software-based virtual system
US8478835B2 (en) 2008-07-17 2013-07-02 Netapp, Inc. Method and system for using shared memory with optimized data flow to improve input/output throughout and latency
US8375151B1 (en) 2009-02-12 2013-02-12 Siliconsystems, Inc. Command portal for securely communicating and executing non-standard storage subsystem commands
US8667187B2 (en) 2008-09-15 2014-03-04 Vmware, Inc. System and method for reducing communication overhead between network interface controllers and virtual machines
US8442059B1 (en) 2008-09-30 2013-05-14 Gridiron Systems, Inc. Storage proxy with virtual ports configuration
US8250267B2 (en) 2008-10-31 2012-08-21 Netapp, Inc. Control I/O offload in a split-path storage virtualization system
US8161099B2 (en) 2008-12-17 2012-04-17 Microsoft Corporation Techniques to automatically syndicate content over a network
US8144582B2 (en) 2008-12-30 2012-03-27 International Business Machines Corporation Differentiating blade destination and traffic types in a multi-root PCIe environment
US8589919B2 (en) 2009-04-28 2013-11-19 Cisco Technology, Inc. Traffic forwarding for virtual machines
JP4810585B2 (en) 2009-05-11 2011-11-09 株式会社日立製作所 Calculator that supports remote scan
US8352482B2 (en) 2009-07-21 2013-01-08 Vmware, Inc. System and method for replicating disk images in a cloud computing based virtual machine file system
US8756387B2 (en) 2010-03-05 2014-06-17 International Business Machines Corporation Method and apparatus for optimizing the performance of a storage system
US8346919B1 (en) 2010-03-30 2013-01-01 Chelsio Communications, Inc. Failover and migration for full-offload network interface devices
US8954962B2 (en) 2010-09-22 2015-02-10 Juniper Networks, Inc. Automatically reconfiguring physical switches to be in synchronization with changes made to associated virtual system
US8804747B2 (en) 2010-09-23 2014-08-12 Cisco Technology, Inc. Network interface controller for virtual and distributed services
JP5594049B2 (en) 2010-10-18 2014-09-24 富士通株式会社 Virtual computer migration method, computer and program
US9135044B2 (en) 2010-10-26 2015-09-15 Avago Technologies General Ip (Singapore) Pte. Ltd. Virtual function boot in multi-root I/O virtualization environments to enable multiple servers to share virtual functions of a storage adapter through a MR-IOV switch
US20120167082A1 (en) 2010-12-23 2012-06-28 Sanjay Kumar Direct sharing of smart devices through virtualization
CN103416025B (en) 2010-12-28 2016-11-02 思杰系统有限公司 Systems and methods for adding VLAN tags via a cloud bridge
US8825900B1 (en) 2011-04-05 2014-09-02 Nicira, Inc. Method and apparatus for stateless transport layer tunneling
CN103392166B (en) 2011-04-27 2016-04-27 株式会社日立制作所 Information storage system and storage system management method
US9154327B1 (en) 2011-05-27 2015-10-06 Cisco Technology, Inc. User-configured on-demand virtual layer-2 network for infrastructure-as-a-service (IaaS) on a hybrid cloud network
US20120320918A1 (en) 2011-06-14 2012-12-20 International Business Machines Bridge port between hardware lan and virtual switch
US8660124B2 (en) 2011-08-05 2014-02-25 International Business Machines Corporation Distributed overlay network data traffic management by a virtual server
US9203703B2 (en) 2011-08-17 2015-12-01 Nicira, Inc. Packet conflict resolution
US8856518B2 (en) 2011-09-07 2014-10-07 Microsoft Corporation Secure and efficient offloading of network policies to network interface cards
US9158458B2 (en) 2011-09-21 2015-10-13 Os Nexus, Inc. Global management of tiered storage resources
WO2013095392A1 (en) 2011-12-20 2013-06-27 Intel Corporation Systems and method for unblocking a pipeline with spontaneous load deferral and conversion to prefetch
US8660129B1 (en) 2012-02-02 2014-02-25 Cisco Technology, Inc. Fully distributed routing over a user-configured on-demand virtual network for infrastructure-as-a-service (IaaS) on hybrid cloud networks
US9479461B2 (en) 2012-03-16 2016-10-25 Hitachi, Ltd. Computer system and method for communicating data between computers
US9325562B2 (en) 2012-05-15 2016-04-26 International Business Machines Corporation Overlay tunnel information exchange protocol
US9286472B2 (en) 2012-05-22 2016-03-15 Xockets, Inc. Efficient packet handling, redirection, and inspection using offload processors
US10454760B2 (en) 2012-05-23 2019-10-22 Avago Technologies International Sales Pte. Limited Layer-3 overlay gateways
US9059868B2 (en) 2012-06-28 2015-06-16 Dell Products, Lp System and method for associating VLANs with virtual switch ports
JP5958164B2 (en) 2012-08-07 2016-07-27 富士通株式会社 Control apparatus, method and program, system, and information processing method
US10057318B1 (en) 2012-08-10 2018-08-21 Dropbox, Inc. System, method, and computer program for enabling a user to access and edit via a virtual drive objects synchronized to a plurality of synchronization clients
US9008085B2 (en) 2012-08-15 2015-04-14 International Business Machines Corporation Network interface card having overlay gateway functionality
US9130879B2 (en) 2012-08-24 2015-09-08 Vmware, Inc. Methods and systems for offload processing of encapsulated packets
US9697093B2 (en) 2012-09-05 2017-07-04 Veritas Technologies Llc Techniques for recovering a virtual machine
US9317508B2 (en) 2012-09-07 2016-04-19 Red Hat, Inc. Pro-active self-healing in a distributed file system
US8953618B2 (en) 2012-10-10 2015-02-10 Telefonaktiebolaget L M Ericsson (Publ) IP multicast service leave process for MPLS-based virtual private cloud networking
US9571507B2 (en) 2012-10-21 2017-02-14 Mcafee, Inc. Providing a virtual security appliance architecture to a virtual cloud infrastructure
US20150242134A1 (en) 2012-10-22 2015-08-27 Hitachi, Ltd. Method and computer system to allocate actual memory area from storage pool to virtual volume
US8931046B2 (en) 2012-10-30 2015-01-06 Stateless Networks, Inc. System and method for securing virtualized networks
US9116727B2 (en) 2013-01-15 2015-08-25 Lenovo Enterprise Solutions (Singapore) Pte. Ltd. Scalable network overlay virtualization using conventional virtual switches
US9378161B1 (en) 2013-01-17 2016-06-28 Xockets, Inc. Full bandwidth packet handling with server systems including offload processors
EP2946296A4 (en) 2013-01-17 2016-11-16 Xockets Ip Llc Offload processor modules for connection to system memory
US9935841B2 (en) 2013-01-28 2018-04-03 Intel Corporation Traffic forwarding for processing in network environment
US10437591B2 (en) 2013-02-26 2019-10-08 Qualcomm Incorporated Executing an operating system on processors having different instruction set architectures
US9384021B2 (en) 2013-02-27 2016-07-05 Dell Products L.P. System and method for virtualization aware server maintenance mode
US9143582B2 (en) 2013-03-08 2015-09-22 International Business Machines Corporation Interoperability for distributed overlay virtual environments
US9374241B2 (en) 2013-03-14 2016-06-21 International Business Machines Corporation Tagging virtual overlay packets in a virtual networking system
US9197551B2 (en) 2013-03-15 2015-11-24 International Business Machines Corporation Heterogeneous overlay network translation for domain unification
WO2014161133A1 (en) 2013-04-01 2014-10-09 华为技术有限公司 Data exchange method, apparatus and system for virtual machine
US9483431B2 (en) 2013-04-17 2016-11-01 Apeiron Data Systems Method and apparatus for accessing multiple storage devices from multiple hosts without use of remote direct memory access (RDMA)
US10073971B2 (en) 2013-06-28 2018-09-11 Microsoft Technology Licensing, Llc Traffic processing for network performance and security
US9130775B2 (en) 2013-07-10 2015-09-08 Cisco Technology, Inc. Support for virtual extensible local area network segments across multiple data center sites
US9218193B2 (en) 2013-07-12 2015-12-22 International Business Machines Corporation Distributed virtual machine image management for cloud computing
US20150033222A1 (en) 2013-07-25 2015-01-29 Cavium, Inc. Network Interface Card with Virtual Switch and Traffic Flow Policy Enforcement
US20150052280A1 (en) 2013-08-19 2015-02-19 Emulex Design & Manufacturing Corporation Method and system for communications-stack offload to a hardware controller
US9152593B2 (en) 2013-09-06 2015-10-06 Cisco Technology, Inc. Universal PCI express port
US10193771B2 (en) 2013-12-09 2019-01-29 Nicira, Inc. Detecting and handling elephant flows
US9124536B2 (en) 2013-12-12 2015-09-01 International Business Machines Corporation Managing data flows in overlay networks
US9729578B2 (en) 2014-01-10 2017-08-08 Arista Networks, Inc. Method and system for implementing a network policy using a VXLAN network identifier
US9652247B2 (en) 2014-01-24 2017-05-16 Nec Corporation Capturing snapshots of offload applications on many-core coprocessors
WO2015133448A1 (en) 2014-03-04 2015-09-11 日本電気株式会社 Packet processing device, packet processing method, and program
US9384033B2 (en) 2014-03-11 2016-07-05 Vmware, Inc. Large receive offload for virtual machines
US9696942B2 (en) 2014-03-17 2017-07-04 Mellanox Technologies, Ltd. Accessing remote storage devices using a local bus protocol
US9529773B2 (en) 2014-05-02 2016-12-27 Cavium, Inc. Systems and methods for enabling access to extensible remote storage over a network as local storage via a logical storage controller
US9594634B2 (en) 2014-06-02 2017-03-14 Intel Corporation Techniques to efficiently compute erasure codes having positive and negative coefficient exponents to permit data recovery from more than two failed storage units
US9825913B2 (en) 2014-06-04 2017-11-21 Nicira, Inc. Use of stateless marking to speed up stateful firewall rule processing
US9729512B2 (en) 2014-06-04 2017-08-08 Nicira, Inc. Use of stateless marking to speed up stateful firewall rule processing
CN111669362B (en) 2014-06-09 2022-04-08 华为技术有限公司 Information processing method, network node, verification method and server
US9692698B2 (en) 2014-06-30 2017-06-27 Nicira, Inc. Methods and systems to offload overlay network packet encapsulation to hardware
WO2016003489A1 (en) 2014-06-30 2016-01-07 Nicira, Inc. Methods and systems to offload overlay network packet encapsulation to hardware
US9419897B2 (en) 2014-06-30 2016-08-16 Nicira, Inc. Methods and systems for providing multi-tenancy support for Single Root I/O Virtualization
US9832168B2 (en) 2014-07-01 2017-11-28 Cable Television Laboratories, Inc. Service discovery within multi-link networks
US9483187B2 (en) 2014-09-30 2016-11-01 Nimble Storage, Inc. Quality of service implementation in a networked storage system with hierarchical schedulers
US20160162302A1 (en) 2014-12-07 2016-06-09 Strato Scale Ltd. Fast initiation of workloads using memory-resident post-boot snapshots
US9699060B2 (en) 2014-12-17 2017-07-04 Vmware, Inc. Specializing virtual network device processing to avoid interrupt processing for high packet rate applications
US9672070B2 (en) 2014-12-17 2017-06-06 International Business Machines Corporation Efficient validation of resource access consistency for a set of virtual devices
US10445123B2 (en) 2015-01-19 2019-10-15 Vmware, Inc. Hypervisor exchange with virtual-machine consolidation
US10025740B2 (en) 2015-09-14 2018-07-17 Cavium, Inc. Systems and methods for offloading link aggregation to a host bus adapter (HBA) in single root I/O virtualization (SRIOV) mode
US10162793B1 (en) 2015-09-29 2018-12-25 Amazon Technologies, Inc. Storage adapter device for communicating with network storage
US9756407B2 (en) 2015-10-01 2017-09-05 Alcatel-Lucent Usa Inc. Network employing multi-endpoint optical transceivers
JP2017108231A (en) 2015-12-08 2017-06-15 富士通株式会社 Communication control program, communication control method, and information processing device
US10037424B1 (en) 2015-12-22 2018-07-31 Amazon Technologies, Inc. Isolated virtual environments for untrusted applications
WO2017113231A1 (en) 2015-12-30 2017-07-06 华为技术有限公司 Packet transmission method, device and system
JP6549996B2 (en) 2016-01-27 2019-07-24 アラクサラネットワークス株式会社 Network apparatus, communication method, and network system
US12210476B2 (en) 2016-07-19 2025-01-28 Pure Storage, Inc. Disaggregated compute resources and storage resources in a storage system
US20180032249A1 (en) 2016-07-26 2018-02-01 Microsoft Technology Licensing, Llc Hardware to make remote storage access appear as local in a virtualized environment
CN107733670B (en) 2016-08-11 2020-05-12 新华三技术有限公司 A forwarding strategy configuration method and device
US20180088978A1 (en) 2016-09-29 2018-03-29 Intel Corporation Techniques for Input/Output Access to Memory or Storage by a Virtual Machine or Container
US10613974B2 (en) 2016-10-04 2020-04-07 Pure Storage, Inc. Peer-to-peer non-volatile random-access memory
US20180109471A1 (en) 2016-10-13 2018-04-19 Alcatel-Lucent Usa Inc. Generalized packet processing offload in a datacenter
WO2018086014A1 (en) 2016-11-09 2018-05-17 华为技术有限公司 Packet processing method in cloud computing system, host, and system
US20180150256A1 (en) 2016-11-29 2018-05-31 Intel Corporation Technologies for data deduplication in disaggregated architectures
US10516728B2 (en) 2017-03-10 2019-12-24 Microsoft Technology Licensing, Llc Virtual filtering platform in distributed computing systems
US10503427B2 (en) 2017-03-10 2019-12-10 Pure Storage, Inc. Synchronously replicating datasets and other managed objects to cloud-based storage systems
US10050884B1 (en) 2017-03-21 2018-08-14 Citrix Systems, Inc. Method to remap high priority connection with large congestion window to high latency link to achieve better performance
TWI647934B (en) 2017-04-21 2019-01-11 思銳科技股份有限公司 Network topology real machine simulation method and system
US11146508B2 (en) 2017-05-12 2021-10-12 Xilinx, Inc. Data processing system
US11093284B2 (en) 2017-05-12 2021-08-17 Xilinx, Inc. Data processing system
US10958729B2 (en) 2017-05-18 2021-03-23 Intel Corporation Non-volatile memory express over fabric (NVMeOF) using volume management device
CN114020482A (en) 2017-06-02 2022-02-08 伊姆西Ip控股有限责任公司 Method and apparatus for data writing
US10225233B2 (en) 2017-06-07 2019-03-05 Nicira, Inc. Media access control (MAC) address learning in virtualized computing environments
US10976962B2 (en) 2018-03-15 2021-04-13 Pure Storage, Inc. Servicing I/O operations in a cloud-based storage system
US20190044809A1 (en) 2017-08-30 2019-02-07 Intel Corporation Technologies for managing a flexible host interface of a network interface controller
AU2018340854B2 (en) 2017-09-27 2023-02-16 Canopus Networks Assets Pty Ltd Process and apparatus for identifying and classifying video-data
US11108751B2 (en) 2017-10-27 2021-08-31 Nicira, Inc. Segmentation of encrypted segments in networks
TWI654857B (en) 2017-12-25 2019-03-21 中華電信股份有限公司 Buffer scheduling method for traffic exchange
US10740181B2 (en) 2018-03-06 2020-08-11 Western Digital Technologies, Inc. Failed storage device rebuild method
JP6958440B2 (en) 2018-03-08 2021-11-02 富士通株式会社 Information processing equipment, information processing systems and programs
US10728172B2 (en) 2018-03-28 2020-07-28 Quanta Computer Inc. Method and system for allocating system resources
US11509606B2 (en) 2018-06-29 2022-11-22 Intel Corporation Offload of storage node scale-out management to a smart network interface controller
US10445272B2 (en) 2018-07-05 2019-10-15 Intel Corporation Network function virtualization architecture with device isolation
US10785161B2 (en) 2018-07-10 2020-09-22 Cisco Technology, Inc. Automatic rate limiting based on explicit network congestion notification in smart network interface card
US10531592B1 (en) 2018-07-19 2020-01-07 Quanta Computer Inc. Smart rack architecture for diskless computer system
US11438279B2 (en) 2018-07-23 2022-09-06 Pure Storage, Inc. Non-disruptive conversion of a clustered service from single-chassis to multi-chassis
US10795612B2 (en) 2018-07-31 2020-10-06 EMC IP Holding Company LLC Offload processing using storage device slots
US10831603B2 (en) 2018-08-03 2020-11-10 Western Digital Technologies, Inc. Rebuild assist using failed storage device
US10824526B2 (en) 2018-08-03 2020-11-03 Western Digital Technologies, Inc. Using failed storage device in peer-to-peer storage system to perform storage-centric task
US11483245B2 (en) 2018-09-13 2022-10-25 Intel Corporation Technologies for filtering network traffic on ingress
US11489791B2 (en) 2018-10-31 2022-11-01 Intel Corporation Virtual switch scaling for networking applications
US10880210B2 (en) 2018-12-26 2020-12-29 Juniper Networks, Inc. Cloud network having multiple protocols using virtualization overlays across physical and virtualized workloads
US11385981B1 (en) 2018-12-28 2022-07-12 Virtuozzo International Gmbh System and method for deploying servers in a distributed storage to improve fault tolerance
US10567308B1 (en) 2019-01-28 2020-02-18 Dell Products L.P. Virtual machine virtual fabric login system
US11150963B2 (en) 2019-02-28 2021-10-19 Cisco Technology, Inc. Remote smart NIC-based service acceleration
US11943340B2 (en) 2019-04-19 2024-03-26 Intel Corporation Process-to-process secure data movement in network functions virtualization infrastructures
US10999084B2 (en) 2019-05-31 2021-05-04 Microsoft Technology Licensing, Llc Leveraging remote direct memory access (RDMA) for packet capture
US11010103B2 (en) 2019-06-20 2021-05-18 Western Digital Technologies, Inc. Distributed batch processing of non-uniform data objects
US11916800B2 (en) 2019-06-28 2024-02-27 Intel Corporation Dynamic virtual cut-through and dynamic fabric bandwidth allocation between virtual cut-through and store-and-forward traffic
US11494210B2 (en) 2019-07-25 2022-11-08 EMC IP Holding Company LLC Maintaining management communications across virtual storage processors
US20210042255A1 (en) 2019-08-09 2021-02-11 Sony Interactive Entertainment LLC Methods for Using High-Speed Data Communication Fabric to Enable Cross-System Command Buffer Writing for Data Retrieval in Cloud Gaming
US11159453B2 (en) 2019-08-22 2021-10-26 International Business Machines Corporation Fabric-based storage-server connection
LU101361B1 (en) 2019-08-26 2021-03-11 Microsoft Technology Licensing Llc Computer device including nested network interface controller switches
US11714763B2 (en) 2019-10-16 2023-08-01 Intel Corporation Configuration interface to offload capabilities to a network interface
US11438229B2 (en) 2020-01-16 2022-09-06 Dell Products L.P. Systems and methods for operating system deployment and lifecycle management of a smart network interface card
US11962501B2 (en) 2020-02-25 2024-04-16 Sunder Networks Corporation Extensible control plane for network management in a virtual infrastructure environment
US11941458B2 (en) 2020-03-10 2024-03-26 Sk Hynix Nand Product Solutions Corp. Maintaining storage namespace identifiers for live virtualized execution environment migration
US11343152B2 (en) * 2020-04-07 2022-05-24 Cisco Technology, Inc. Traffic management for smart network interface cards
US11689455B2 (en) 2020-05-28 2023-06-27 Oracle International Corporation Loop prevention in virtual layer 2 networks
US11962518B2 (en) 2020-06-02 2024-04-16 VMware LLC Hardware acceleration techniques using flow selection
US12242748B2 (en) 2020-06-03 2025-03-04 Intel Corporation Intermediary for storage command transfers
US12218840B2 (en) 2020-06-16 2025-02-04 Intel Corporation Flexible scheme for adding rules to a NIC pipeline
US12046578B2 (en) 2020-06-26 2024-07-23 Intel Corporation Stacked die network interface controller circuitry
US11374858B2 (en) 2020-06-30 2022-06-28 Pensando Systems, Inc. Methods and systems for directing traffic flows based on traffic flow classifications
US11733907B2 (en) 2020-08-05 2023-08-22 EMC IP Holding Company LLC Optimize recovery time objective and costs of cloud based recovery
US11221972B1 (en) 2020-09-23 2022-01-11 Pensando Systems, Inc. Methods and systems for increasing fairness for small vs large NVMe IO commands
US12021759B2 (en) * 2020-09-28 2024-06-25 VMware LLC Packet processing with hardware offload units
US11593278B2 (en) 2020-09-28 2023-02-28 Vmware, Inc. Using machine executing on a NIC to access a third party storage not supported by a NIC or host
US11606310B2 (en) 2020-09-28 2023-03-14 Vmware, Inc. Flow processing offload using virtual port identifiers
EP4127892A1 (en) 2020-09-28 2023-02-08 VMware, Inc. Distributed storage services supported by a nic
US11736566B2 (en) 2020-09-28 2023-08-22 Vmware, Inc. Using a NIC as a network accelerator to allow VM access to an external storage via a PF module, bus, and VF module
US11636053B2 (en) * 2020-09-28 2023-04-25 Vmware, Inc. Emulating a local storage by accessing an external storage through a shared port of a NIC
US11875172B2 (en) 2020-09-28 2024-01-16 VMware LLC Bare metal computer for booting copies of VM images on multiple computing devices using a smart NIC
US20220100491A1 (en) 2020-09-28 2022-03-31 Vmware, Inc. Integrated installation of resource sharing software on computer and connected network interface card
US12045354B2 (en) * 2020-11-23 2024-07-23 Verizon Patent And Licensing Inc. Smart network interface card-based inline secure communication service
US11645104B2 (en) * 2020-12-22 2023-05-09 Reliance Jio Infocomm Usa, Inc. Intelligent data plane acceleration by offloading to distributed smart network interfaces
US12147318B2 (en) 2020-12-30 2024-11-19 Oracle International Corporation Techniques for replicating state information for high availability
US11445028B2 (en) 2020-12-30 2022-09-13 Dell Products L.P. System and method for providing secure console access with multiple smart NICs using NC-SL and SPDM
US11552904B2 (en) * 2021-01-19 2023-01-10 Reliance Jio Infocomm Usa, Inc. Architecture for high performing data plane applications with smart network interface on compute servers
US20210232528A1 (en) 2021-03-22 2021-07-29 Intel Corporation Configurable device interface
US11640363B2 (en) 2021-07-01 2023-05-02 Dell Products L.P. Managing a smart network interface controller (NIC) of an information handling system
US12190405B2 (en) * 2021-07-06 2025-01-07 Intel Corporation Direct memory writes by network interface of a graphics processing unit

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6393483B1 (en) * 1997-06-30 2002-05-21 Adaptec, Inc. Method and apparatus for network interface card load balancing and port aggregation
US20150222547A1 (en) * 2014-02-06 2015-08-06 Mellanox Technologies Ltd. Efficient management of network traffic in a multi-cpu server
US20210357242A1 (en) * 2020-05-18 2021-11-18 Dell Products, Lp System and method for hardware offloading of nested virtual switches
US10997106B1 (en) * 2020-09-22 2021-05-04 Pensando Systems Inc. Inter-smartNIC virtual-link for control and datapath connectivity

Also Published As

Publication number Publication date
US20230198833A1 (en) 2023-06-22
US11863376B2 (en) 2024-01-02

Similar Documents

Publication Publication Date Title
US11863376B2 (en) Smart NIC leader election
US11995024B2 (en) State sharing between smart NICs
US12229578B2 (en) Teaming of smart NICs
US12177078B2 (en) Managed switch architectures: software managed switches, hardware managed switches, and heterogeneous managed switches
US20240015086A1 (en) Detecting failure of layer 2 service using broadcast messages
US11641321B2 (en) Packet processing for logical datapath sets
US10728174B2 (en) Incorporating layer 2 service between two interfaces of gateway device
WO2023121720A1 (en) Teaming of smart nics
US10554484B2 (en) Control plane integration with hardware switches
US9825851B2 (en) Distributing routing information in a multi-datacenter environment
US8964528B2 (en) Method and apparatus for robust packet distribution among hierarchical managed switching elements
US10411948B2 (en) Cooperative active-standby failover between network systems
US20230353485A1 (en) Packet processing for logical datapath sets

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination