US20250219869A1 - Virtual tunnel endpoint (vtep) mapping for overlay networking - Google Patents
- Publication number
- US20250219869A1 (application US 19/055,419)
- Authority
- US
- United States
- Prior art keywords
- vtep
- state
- detecting
- mapping information
- virtualized computing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0805—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
- H04L43/0817—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/46—Interconnection of networks
- H04L12/4633—Interconnection of networks using encapsulation techniques, e.g. tunneling
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/46—Interconnection of networks
- H04L12/4641—Virtual LANs, VLANs, e.g. virtual private networks [VPN]
- H04L12/4645—Details on frame tagging
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0803—Configuration setting
- H04L41/0813—Configuration setting characterised by the conditions triggering a change of settings
- H04L41/082—Configuration setting characterised by the conditions triggering a change of settings the condition being updates or upgrades of network functionality
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0896—Bandwidth or capacity management, i.e. automatically increasing or decreasing capacities
- H04L41/0897—Bandwidth or capacity management, i.e. automatically increasing or decreasing capacities by horizontal or vertical scaling of resources, or by migrating entities, e.g. virtual resources or entities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0876—Network utilisation, e.g. volume of load or congestion level
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/46—Interconnection of networks
- H04L12/4604—LAN interconnection over a backbone network, e.g. Internet, Frame Relay
- H04L2012/4629—LAN interconnection over a backbone network, e.g. Internet, Frame Relay using multilayer switching, e.g. layer 3 switching
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0895—Configuration of virtualised networks or elements, e.g. virtualised network function or OpenFlow elements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/06—Generation of reports
- H04L43/065—Generation of reports related to network devices
Definitions
- Virtualization allows the abstraction and pooling of hardware resources to support virtual machines in a software-defined data center (SDDC).
- In an SDDC, virtualized computing instances such as virtual machines (VMs) running different operating systems may be supported by the same physical machine (also referred to as a “computer system” or “host”).
- Each VM is generally provisioned with virtual resources to run a guest operating system and applications.
- the virtual resources may include central processing unit (CPU) resources, memory resources, storage resources, network resources, etc.
- the VTEPs may be susceptible to various performance issues that affect the performance of overlay networking in the SDN environment.
- FIG. 1 is a schematic diagram illustrating an example software-defined networking (SDN) environment in which virtual tunnel endpoint (VTEP) mapping for overlay networking may be performed;
- FIG. 2 is a schematic diagram illustrating an example management plane view of the SDN environment in FIG. 1 ;
- FIG. 3 is a flowchart of an example process for a computer system for VTEP mapping for overlay networking in an SDN environment
- FIG. 4 is a flowchart of an example detailed process for a computer system for VTEP mapping for overlay networking in an SDN environment
- FIG. 5 is a schematic diagram illustrating a first example of VTEP mapping for overlay networking
- FIG. 6 is a schematic diagram illustrating an example VTEP state machine
- FIG. 7 is a schematic diagram illustrating an example overlay traffic forwarding based on the mapping information in FIG. 5 ;
- FIG. 8 is a schematic diagram illustrating a second example of VTEP mapping for overlay networking.
- overlay networking may be implemented in an improved manner by dynamically mapping virtual tunnel endpoints (VTEPs) and virtualized computing instances (e.g., virtual machines).
- One example may involve a computer system (e.g., host-A 110 A in FIG. 1 ) monitoring multiple VTEPs that are configured on the computer system for overlay networking, including a first VTEP (e.g., VTEP1 181 ) and a second VTEP (e.g., VTEP2 182 ).
- the computer system may identify mapping information that associates a virtualized computing instance (e.g., VM1 131 ) with the first VTEP. Also, the mapping information may be updated to associate the virtualized computing instance with the second VTEP, thereby migrating the virtualized computing instance from the first VTEP (i.e., UNHEALTHY) to the second VTEP (i.e., HEALTHY).
- an encapsulated packet (e.g., 192 in FIG. 1 ) may be generated and sent towards the destination based on the updated mapping information.
- the encapsulated packet may include the egress packet and an outer header identifying the second VTEP to be a source VTEP.
- mapping information may be updated dynamically and automatically to facilitate high-availability overlay networking, reduce system downtime, and improve data center user experience.
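The dynamic remapping summarized above can be sketched as follows. This is a minimal illustration with hypothetical names and plain dictionaries standing in for the host's mapping information, not the patent's actual implementation:

```python
# Minimal sketch (hypothetical names): a VM-VTEP mapping table that is
# updated when the monitored state of a VTEP changes, so that traffic for
# the affected VM(s) is migrated to a VTEP that is still HEALTHY.
vtep_state = {"VTEP1": "HEALTHY", "VTEP2": "HEALTHY"}  # monitored VTEP states
vm_to_vtep = {"VM1": "VTEP1", "VM2": "VTEP2"}          # VM-VTEP mapping information

def remap_on_failure(failed_vtep: str) -> None:
    """On detecting a HEALTHY -> UNHEALTHY transition, reassociate each VM
    mapped to the failed VTEP with a VTEP that is still HEALTHY."""
    vtep_state[failed_vtep] = "UNHEALTHY"
    healthy = [v for v, s in vtep_state.items() if s == "HEALTHY"]
    if not healthy:
        return  # no healthy VTEP available to migrate to
    for vm, vtep in vm_to_vtep.items():
        if vtep == failed_vtep:
            vm_to_vtep[vm] = healthy[0]

remap_on_failure("VTEP1")  # VM1 is migrated to VTEP2; VM2 is unaffected
```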
- FIG. 1 is a schematic diagram illustrating example software-defined networking (SDN) environment 100 in which VTEP mapping for overlay networking may be performed.
- SDN environment 100 may include additional and/or alternative components than that shown in FIG. 1 .
- Although the terms “first” and “second” are used to describe various elements, these elements should not be limited by these terms. These terms are used to distinguish one element from another. For example, a first element may be referred to as a second element, and vice versa.
- SDN environment 100 includes multiple hosts 110 A-B that are inter-connected via physical network 105 .
- Each host 110 A/ 110 B may include suitable hardware 112 A/ 112 B and virtualization software (e.g., hypervisor-A 114 A, hypervisor-B 114 B) to support various virtual machines (VMs).
- hosts 110 A-B may support respective VMs 131 - 134 .
- Hardware 112 A/ 112 B includes suitable physical components, such as central processing unit(s) or processor(s) 120 A/ 120 B; memory 122 A/ 122 B; physical network interface controllers (PNICs) 171 - 174 ; and storage 126 A/ 126 B, etc.
- SDN environment 100 may include any number of hosts (also known as “host computers”, “host devices”, “physical servers”, “server systems”, “transport nodes,” etc.). Each host may be supporting tens or hundreds of VMs.
- Hypervisor 114 A/ 114 B maintains a mapping between underlying hardware 112 A/ 112 B and virtual resources allocated to respective VMs.
- Virtual resources are allocated to respective VMs 131 - 134 to support a guest operating system (OS; not shown for simplicity) and application(s) 141 - 144 .
- the virtual resources may include virtual CPU, guest physical memory, virtual disk, virtual network interface controller (VNIC), etc.
- Hardware resources may be emulated using virtual machine monitors (VMMs). For example, in FIG. 1 , VNICs 161 - 164 are virtual network adapters for VMs 131 - 134 , respectively, and are emulated by corresponding VMMs (not shown for simplicity) instantiated by their respective hypervisor at respective host-A 110 A and host-B 110 B.
- the VMMs may be considered as part of respective VMs, or alternatively, separated from the VMs. Although one-to-one relationships are shown, one VM may be associated with multiple VNICs (each VNIC having its own network address).
- a virtualized computing instance may represent an addressable data compute node (DCN) or isolated user space instance.
- Any suitable technology may be used to provide isolated user space instances, not just hardware virtualization.
- Other virtualized computing instances may include containers (e.g., running within a VM or on top of a host operating system without the need for a hypervisor or separate operating system or implemented as an operating system level virtualization), virtual private servers, client computers, etc. Such container technology is available from, among others, Docker, Inc.
- the VMs may also be complete computational environments, containing virtual equivalents of the hardware and software components of a physical computing system.
- hypervisor may refer generally to a software layer or component that supports the execution of multiple virtualized computing instances, including system-level software in guest VMs that supports namespace containers such as Docker, etc.
- Hypervisors 114 A-B may each implement any suitable virtualization technology, such as VMware ESX® or ESXi™ (available from VMware, Inc.), Kernel-based Virtual Machine (KVM), etc.
- the term “packet” may refer generally to a group of bits that can be transported together, and may be in another form, such as “frame,” “message,” “segment,” etc.
- The term “traffic” or “flow” may refer generally to multiple packets.
- The term “layer-2” may refer generally to a link layer or media access control (MAC) layer; “layer-3” to a network or Internet Protocol (IP) layer; and “layer-4” to a transport layer (e.g., using Transmission Control Protocol (TCP), User Datagram Protocol (UDP), etc.), in the Open System Interconnection (OSI) model, although the concepts described herein may be used with other networking models.
- Hypervisor 114 A/ 114 B implements virtual switch 115 A/ 115 B and logical distributed router (DR) instance 117 A/ 117 B to handle egress packets from, and ingress packets to, corresponding VMs.
- logical switches and logical DRs may be implemented in a distributed manner and can span multiple hosts.
- logical switches that provide logical layer-2 connectivity, i.e., an overlay network may be implemented collectively by virtual switches 115 A-B and represented internally using forwarding tables 116 A-B at respective virtual switches 115 A-B.
- Forwarding tables 116 A-B may each include entries that collectively implement the respective logical switches.
- logical DRs that provide logical layer-3 connectivity may be implemented collectively by DR instances 117 A-B and represented internally using routing tables (not shown) at respective DR instances 117 A-B.
- the routing tables may each include entries that collectively implement the respective logical DRs.
- Packets may be received from, or sent to, each VM via an associated logical port.
- logical switch ports 165 - 168 (labelled “LSP1” to “LSP4”) are associated with respective VMs 131 - 134 .
- the term “logical port” or “logical switch port” may refer generally to a port on a logical switch to which a virtualized computing instance is connected.
- a “logical switch” may refer generally to a software-defined networking (SDN) construct that is collectively implemented by virtual switches 115 A-B in FIG. 1
- a “virtual switch” may refer generally to a software switch or software implementation of a physical switch.
- There is usually a one-to-one mapping between a logical port on a logical switch and a virtual port on virtual switch 115 A/ 115 B.
- the mapping may change in some scenarios, such as when the logical port is mapped to a different virtual port on a different virtual switch after migration of the corresponding virtualized computing instance (e.g., when the source host and destination host do not have a distributed virtual switch spanning them).
- SDN controller 103 and SDN manager 104 are example network management entities in SDN environment 100 .
- One example of an SDN controller is the NSX controller component of VMware NSX® (available from VMware, Inc.) that operates on a central control plane (CCP).
- SDN controller 103 may be a member of a controller cluster (not shown for simplicity) that is configurable using SDN manager 104 operating on a management plane.
- Network management entity 103 / 104 may be implemented using physical machine(s), VM(s), or both.
- Logical switches, logical routers, and logical overlay networks may be configured using SDN controller 103 , SDN manager 104 , etc.
- a local control plane (LCP) agent (not shown) on host 110 A/ 110 B may interact with SDN controller 103 via control-plane channel 101 / 102 .
- The scale of software-defined data centers (SDDCs) has been increasing rapidly, such as towards hundreds of hypervisors that are each capable of hosting hundreds of VMs.
- overlay networking stretches a layer-2 network over an underlying layer-3 network.
- Any suitable overlay networking protocol(s) may be implemented, such as Virtual extensible Local Area Network (VXLAN), Stateless Transport Tunneling (STT), Generic Network Virtualization Encapsulation (GENEVE), Generic Routing Encapsulation (GRE), etc.
- overlay networking protocols require overlay traffic from VMs to be encapsulated with an outer header with source and destination VTEPs.
- hypervisor 114 A/ 114 B may implement multiple VTEPs to encapsulate and decapsulate packets with an outer header (also known as a tunnel header) identifying a logical overlay network.
- hypervisor-A 114 A at host-A 110 A implements VTEP1 181 and VTEP2 182
- hypervisor-B 114 B at host-B 110 B implements VTEP3 183 and VTEP4 184 .
- Encapsulated packets may be sent via a logical overlay tunnel established between a pair of VTEPs over physical network 105 , over which respective hosts 110 A-B are in layer-3 connectivity with one another.
- the logical overlay tunnel terminates at the VTEPs.
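The encapsulation step can be sketched as follows. A real implementation would build actual VXLAN/GENEVE headers; here a dictionary stands in for the outer header, the IP addresses are hypothetical, and the field names follow the OUTER_SIP/OUTER_DIP notation used later in this description:

```python
# Sketch: generate an encapsulated packet whose outer (tunnel) header
# identifies the source VTEP looked up from the VM-VTEP mapping information.
vm_to_vtep = {"VM1": "VTEP2"}                       # updated mapping information
vtep_ip = {"VTEP2": "10.0.0.2", "VTEP3": "10.0.1.3"}  # hypothetical VTEP IPs

def encapsulate(egress_packet: bytes, src_vm: str, dst_vtep: str, vni: int) -> dict:
    """Wrap an egress packet in an outer header; the source VTEP is
    resolved from the (possibly updated) VM-VTEP mapping."""
    src_vtep = vm_to_vtep[src_vm]
    return {
        "OUTER_SIP": vtep_ip[src_vtep],  # outer source VTEP IP address
        "OUTER_DIP": vtep_ip[dst_vtep],  # outer destination VTEP IP address
        "VNI": vni,                      # identifies the logical overlay network
        "inner": egress_packet,          # the original egress packet
    }

pkt = encapsulate(b"payload", "VM1", "VTEP3", vni=5000)
```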
- FIG. 2 is a schematic diagram illustrating example management plane view 200 of SDN environment 100 in FIG. 1 .
- logical overlay networks may be deployed to support multiple tenants.
- each logical overlay network may be designed to be an abstract representation of a tenant's network in SDN environment 100 .
- a multi-tier topology may be used.
- a logical DR connects logical switches 201 - 202 to facilitate communication among VMs 131 - 134 on different segments. See also logical switch ports “LSP7” 203 and “LSP8” 204 , and logical router ports “LRP1” 207 and “LRP2” 208 connecting DR 205 with logical switches 201 - 202 .
- Logical switch 201 / 202 may be implemented collectively by multiple hosts 110 A-B, such as using virtual switches 115 A-B and represented internally using forwarding tables 116 A-B.
- DR 205 may be implemented collectively by multiple transport nodes, such as using edge node 206 and hosts 110 A-B. For example, DR 205 may be implemented using DR instances 117 A-B and represented internally using routing tables (not shown) at respective hosts 110 A-B.
- Edge node 206 may implement one or more logical DRs and logical service routers (SRs), such as DR 205 and SR 209 in FIG. 2 .
- SR 209 may represent a centralized routing component that provides centralized stateful services to VMs 131 - 134 , such as IP address assignment using dynamic host configuration protocol (DHCP), network address translation (NAT), etc.
- EDGE 206 may be implemented using VM(s) and/or physical machines (“bare metal machines”), and capable of performing functionalities of a switch, router (e.g., logical service router), bridge, gateway, edge appliance, or any combination thereof. In practice, EDGE 206 may be deployed at the edge of a geographical site to facilitate north-south traffic to an external network, such as another data center at a different geographical site.
- One of the challenges in SDN environment 100 is to maintain the availability of overlay networking to support packet forwarding to/from VMs 131 - 134 . These workloads require network connectivity to support various applications, such as web servers, databases, proxies, network functions, etc.
- the complexity of SDN environment 100 also increases, which inevitably introduces more possible failure points.
- multiple VTEPs 181 - 182 may be configured on host-A 110 A for overlay networking.
- VM(s) may be mapped to VTEP 181 / 182 that is responsible for packet encapsulation and decapsulation for the VM(s).
- If VTEP 181 / 182 fails, however, overlay networking connectivity for the VM(s) will be lost. This is especially problematic when there is a large number (e.g., several hundreds) of VMs that are mapped to particular VTEP 181 / 182 .
- FIG. 3 is a flowchart of example process 300 for a computer system to perform VTEP mapping for overlay networking in SDN environment 100 .
- Example process 300 may include one or more operations, functions, or actions illustrated by one or more blocks, such as 310 to 360 . The various blocks may be combined into fewer blocks, divided into additional blocks, and/or eliminated depending on the desired implementation.
- host 110 A as an example “computer system”
- VM1 131 as an example “virtualized computing instance.”
- host-A 110 A may generate and send an encapsulated packet towards the destination based on the updated mapping information.
- the encapsulated packet includes the egress packet and an outer header identifying the second VTEP2 182 to be a source VTEP.
- the mapping information may be updated to (VM1 131 , VTEP2 182 ).
- second encapsulated packet (see 192 in FIGS. 1 - 2 ) may be generated and sent towards destination VTEP3 183 on host-B 110 B.
- host-A 110 A may update the mapping information to reassociate VM1 131 with VTEP1 181 . This has the effect of migrating VM1 131 from second VTEP2 182 to first VTEP1 181 , both being in the HEALTHY state. This way, overlay networking traffic may be load balanced among VTEPs 181 - 182 on host-A 110 A.
- VTEP 181 / 182 may transition between a HEALTHY state and an UNHEALTHY state according to a state machine in FIG. 6 .
- block 320 may involve detecting the state transition to a first UNHEALTHY state (e.g., IP_WAITING in FIG. 6 ) in which first VTEP 181 has not been assigned with a valid IP address by a Dynamic Host Configuration Protocol (DHCP) server, or the lease of the IP address has expired.
- first VTEP 181 may transition to a second UNHEALTHY state (e.g., BFD_DOWN in FIG. 6 ) in which each and every overlay networking path via first VTEP1 181 is down.
- first VTEP 181 may transition to a third UNHEALTHY state (e.g., ADMIN_DOWN in FIG. 6 ) that is configured by a network administrator, such as for maintenance and troubleshooting purposes.
- Examples of the present disclosure should be contrasted against conventional approaches that rely on static VM-VTEP mapping.
- If a VTEP fails, VMs mapped to the VTEP will be affected because all overlay traffic will be dropped.
- the loss of overlay networking connectivity is especially problematic for VMs that are running critical workloads and/or when a large number of VMs (e.g., order of hundreds) are mapped to the VTEP.
- a network administrator may have to intervene and restore connectivity, which is time consuming and inefficient.
- connectivity loss and the need for manual intervention may be reduced using examples of the present disclosure.
- any improvement in the availability of overlay networking is important because every second of downtime may lead to huge losses and degraded user experience.
- FIG. 4 is a flowchart of example detailed process 400 for VTEP mapping for overlay networking in SDN environment 100 .
- Example process 400 may include one or more operations, functions, or actions illustrated at 410 to 460 .
- the various operations, functions or actions may be combined into fewer blocks, divided into additional blocks, and/or eliminated depending on the desired implementation.
- FIG. 5 is a schematic diagram illustrating example 500 of VTEP mapping for overlay networking.
- mapping module 118 A/ 118 B may include (1) an interface sub-module to check the IP address assignment, (2) a monitoring sub-module to manage monitoring sessions established between VTEPs, (3) remap up/down sub-module(s) to update mapping information dynamically, etc.
- The following notation is used: SIP = source IP address; DIP = destination IP address; OUTER_SIP = outer source VTEP IP address in an outer header; OUTER_DIP = outer destination VTEP IP address in the outer header, etc.
- host-A 110 A may be configured with multiple (N) VTEPs for overlay networking.
- VTEP1 181 and VTEP2 182 are configured for overlay networking on host-A 110 A.
- VTEPs 181 - 182 may be created as ports on virtual switch 115 A.
- VTEP 181 / 182 requires an IP address and a MAC address.
- each VTEPi may be associated with an uplink (denoted as UPLINKi), such as UPLINK1 for VTEP1 181 and UPLINK2 for VTEP2 182 . See 501 - 502 in FIG. 5 .
- an “uplink” may represent a logical construct for a connection to a network. From the perspective of host 110 A/B, the term “uplink” may refer generally to a network connection from host 110 A/B via PNIC 171 / 172 / 173 / 174 to a physical network device (e.g., top-of-rack switch, spine switch, router) in physical network 105 .
- the term “downlink,” on the other hand, may refer to a connection from the physical network device to host 110 A/B.
- the mapping between an uplink and a PNIC may be one-to-one (i.e., one PNIC per uplink).
- a NIC teaming policy may be implemented to map multiple PNICs to one uplink.
- the term “NIC teaming” may refer to the grouping of multiple PNICs into one logical NIC.
- host-A 110 A may perform initial VM-VTEP mapping for VM1 131 and VM2 132 .
- host-A 110 A may create a VNIC port on virtual switch 115 A for VNIC 161 of VM1 131 .
- a VNIC port may be created on virtual switch 115 A for VNIC2 162 .
- VM1 131 and VM2 132 are connected to the same virtual switch 115 A via respective VNIC ports.
- a VTEP may be selected for VM 131 / 132 based on any suitable teaming policy.
- VM1 131 is mapped to VTEP1 181 (see 510 ), and VM2 132 to VTEP2 182 (see 520 ).
- the VM-VTEP mapping or association may not change unless there is a change in the teaming policy, or a VTEP is added, removed, or marked as standby.
- Any suitable teaming policy may be used, such as load balancing based on a configuration parameter (e.g., VNIC port ID, VNIC MAC address) associated with VM 131 / 132 , failover order associated with multiple VTEPs 181 - 182 , etc.
- VTEP selection may be performed to achieve load balancing based on source VNIC port ID (denoted as VNICPortID) associated with VM 131 / 132 .
- For example, the selected VTEP may be given by endpointID = VNICPortID % N, where N = number of VTEPs and endpointID = unique ID assigned to a VTEP.
- VTEP selection may be performed to achieve load balancing based on source VNIC MAC address (MACAddr) associated with VM 131 / 132 .
- the sixth octet of the MAC address may be used instead of the VNIC port ID to map VM 131 / 132 to either VTEP1 181 or VTEP2 182 .
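Both hash-based teaming policies can be sketched as follows. The function names are hypothetical; N denotes the number of VTEPs configured on the host:

```python
# Sketch of the two teaming policies described above (hypothetical names).

def select_vtep_by_port_id(vnic_port_id: int, n_vteps: int) -> int:
    # Load balancing based on source VNIC port ID: endpointID = VNICPortID % N
    return vnic_port_id % n_vteps

def select_vtep_by_mac(mac_addr: str, n_vteps: int) -> int:
    # Use the sixth octet of the source VNIC MAC address instead of the port ID.
    sixth_octet = int(mac_addr.split(":")[5], 16)
    return sixth_octet % n_vteps

# With N = 2, VNIC port ID 7 yields endpointID 1 (e.g., VTEP2 182),
# while port ID 4 yields endpointID 0 (e.g., VTEP1 181).
```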
- VTEP selection may be performed based on a failover order associated with VTEPs 181 - 182 .
- host-A 110 A may be configured with two active VTEPs 181 - 182 , as well as a standby VTEP (not shown). Once an active VTEP fails, the standby VTEP may switch to the active mode and take over.
- host-A 110 A may monitor VTEPs 181 - 182 configured for overlay networking.
- Each VTEPi may be associated with a health status or state (denoted as STATE-i) that is either HEALTHY or UNHEALTHY.
- block 410 may involve monitoring whether VTEP 181 / 182 is assigned with a valid IP address by a DHCP server, or a lease for the IP address has expired. Additionally or alternatively, block 410 may involve monitoring a path (also known as a logical overlay tunnel) between local VTEP 181 / 182 on host-A 110 A and remote VTEP 183 / 184 on host-B 110 B. See also 411 - 412 .
- Bidirectional Forwarding Detection (BFD), a protocol defined by the Internet Engineering Task Force (IETF) in a Request for Comments (RFC), may be used between two VTEPs to detect failures in the underlay path between them.
- BFD packets may be generated and sent (e.g., using mapping module 118 A/ 118 B) over a BFD session periodically.
- FIG. 6 is a schematic diagram illustrating example VTEP state machine 600 .
- INIT 601 : initialization state
- NORMAL 602 : normal operational state
- IP_WAITING 603 : awaiting IP address assignment state
- BFD_DOWN 604 : BFD session down state
- ADMIN_DOWN 605 : administrator-configured down state
- a state transition to INIT 601 from IP_WAITING 603 may be detected when a valid IP address is not assigned to VTEP 181 / 182 within a predetermined period of time (i.e., timeout period).
- the IP address assignment might fail for various reasons, such as a DHCP server being unreachable or running out of IP addresses available for assignment (e.g., due to server expansion).
- a state transition from INIT 601 to NORMAL 602 may be detected when a valid IP address is assigned to VTEP 181 / 182 and all its BFD sessions are up and running.
- a state transition from NORMAL 602 to IP_WAITING 603 (i.e., UNHEALTHY) may be detected when an IP address assigned to VTEP 181 / 182 is lost.
- the IP address is leased for a specific amount of time called DHCP lease time.
- the IP address may be lost when the lease is not renewed, such as when DHCP server is unreachable or has run out of IP addresses.
- a state transition from NORMAL 602 to BFD_DOWN 604 may be detected when each and every overlay networking path and associated BFD session established using that VTEP 181 / 182 is down.
- a full-mesh topology may be used to establish BFD sessions among VTEPs 181 - 184 .
- At host-A 110 A, for example, local VTEP1 181 may establish two BFD sessions with respective remote VTEP3 183 and VTEP4 184 on host-B 110 B. The state transition occurs when each and every BFD session is down.
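The "each and every session down" condition can be expressed as a simple predicate. This is a hypothetical sketch; real BFD session state would come from the monitoring sub-module:

```python
# Sketch of the BFD_DOWN trigger with full-mesh BFD: the transition fires
# only when every session from the local VTEP to each remote VTEP is down.
def overlay_paths_all_down(bfd_sessions: dict) -> bool:
    # bfd_sessions maps remote VTEP name -> True if that session is up
    return bool(bfd_sessions) and not any(bfd_sessions.values())
```

For example, sessions to VTEP3 and VTEP4 must both be down for the local VTEP to transition; a single surviving session keeps it in NORMAL.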
- a state transition from BFD_DOWN 604 to NORMAL 602 may occur when at least one BFD session is up, or there is an IP address change event.
- the IP address change may be detected when a new DHCP lease with a different IP address is given by a DHCP server during lease renewal, or an operator manually changes the VTEP IP address (e.g., using SDN manager 104 ). Note that if at least one of the BFD sessions is up and running, VTEP1 181 remains in NORMAL 602 and no state transition to BFD_DOWN 604 will occur.
- a state transition from BFD_DOWN 604 to IP_WAITING 603 may be detected when an IP address assigned to VTEP 181 / 182 is lost. Again, this may occur when the DHCP server becomes unreachable or has run out of IP addresses.
- ADMIN_DOWN 605 represents a state that is configured by a network administrator to bring VTEP 181 / 182 down, such as for maintenance and troubleshooting purposes.
- a state transition from ADMIN_DOWN 605 to INIT 601 may occur when the network administrator performs configuration to bring VTEP 181 / 182 up and running again.
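The transitions above can be collected into a small table-driven state machine. The state names follow FIG. 6, but the event labels are hypothetical names for the triggers just described:

```python
# Sketch of the VTEP state machine in FIG. 6 (event names are hypothetical).
HEALTHY_STATES = {"NORMAL"}
TRANSITIONS = {
    ("IP_WAITING", "assignment_timeout"): "INIT",
    ("INIT", "ip_assigned_and_bfd_up"): "NORMAL",
    ("NORMAL", "ip_lost"): "IP_WAITING",
    ("NORMAL", "all_bfd_sessions_down"): "BFD_DOWN",
    ("NORMAL", "admin_down"): "ADMIN_DOWN",
    ("BFD_DOWN", "bfd_up_or_ip_change"): "NORMAL",
    ("BFD_DOWN", "ip_lost"): "IP_WAITING",
    ("ADMIN_DOWN", "admin_up"): "INIT",
}

def transition(state: str, event: str) -> str:
    """Return the next state; an event that does not apply leaves the VTEP
    in its current state (e.g., only some of its BFD sessions going down)."""
    return TRANSITIONS.get((state, event), state)
```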
- host-A 110 A may update a VM-VTEP mapping after a timeout period.
- block 415 may involve a notification system generating system notifications relating to state transitions, and a remap module listening to the notifications to detect any faulty VTEP.
- the timeout period may be user-configurable to avoid unnecessary remapping due to transient faults.
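The timeout-based debouncing can be sketched as follows. The names and the 5-second default are hypothetical; the point is that remapping fires only after a VTEP has stayed UNHEALTHY for the whole timeout period, filtering out transient faults:

```python
# Sketch of the user-configurable remap timeout (hypothetical names).
REMAP_TIMEOUT = 5.0  # seconds; user-configurable in this sketch

unhealthy_since = {}  # VTEP name -> time the fault was first observed

def should_remap(vtep: str, is_healthy: bool, now: float) -> bool:
    """Trigger remapping only once the VTEP has remained UNHEALTHY for
    the full timeout period; recovery resets the timer."""
    if is_healthy:
        unhealthy_since.pop(vtep, None)  # recovered: reset the timer
        return False
    start = unhealthy_since.setdefault(vtep, now)
    return (now - start) >= REMAP_TIMEOUT
```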
- host-A 110 A may identify a HEALTHY VTEPk where k ≠ i and i, k ∈ {1, . . . , N}. This way, at 430 - 435 , each VM that is mapped to the UNHEALTHY VTEPi may be identified and migrated to the HEALTHY VTEPk.
- both VTEP1 181 and VTEP2 182 may be detected to be HEALTHY (e.g., NORMAL 602 ) at one point in time.
- the mapping information may be updated dynamically based on the state of VTEP 181 / 182 . This reduces the likelihood of the connectivity loss for VM(s) mapped to particular VTEP1 181 based on a teaming policy. Instead of maintaining the mapping statically, the VM(s) may be migrated to facilitate high availability of overlay networking. This reduces system downtime and improves VM performance. Based on the above examples, automatic remapping of VMs to HEALTHY VTEPs may be performed to support high availability of overlay networking to improve VM performance and user experience.
- FIG. 8 is a schematic diagram illustrating a second example 800 of VTEP mapping for overlay networking.
- multiple VMs may be migrated from a source VTEP to respective multiple destination VTEPs for load balancing purposes.
- host-A 110A may generate mapping information that associates multiple VMs (i.e., VM1 131 , VM5 135 , VM6 136 and VM7 137 ) with VTEP-A1 181 .
- host-A 110A may detect a state transition associated with VTEP-A1 181 from HEALTHY to UNHEALTHY (e.g., BFD_DOWN 604 in FIG. 6 ). In response, host-A 110A may update the mapping information to migrate VMs 131, 135-137 from VTEP-A1 181 . For example, VM1 131 may be migrated to VTEP-A2 182 (see 860 ), VM5 135 also to VTEP-A2 182 (see 870 ), VM6 136 to VTEP-A3 801 (see 880 ), and VM7 137 to VTEP-A4 802 (see 890 ). This way, overlay traffic from these VMs may continue to flow while a network administrator fixes issues affecting VTEP-A1 181 .
- the VTEP selection may be load-based, such as based on the number of VMs that are already mapped to VTEP-A2 182 , VTEP-A3 801 or VTEP-A4 802 .
- Another example may involve tracking a performance metric (e.g., packet rate) on the uplinks and selecting a VTEP associated with a particular uplink with the least usage.
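A hedged sketch of the load-based selection described above: prefer the candidate VTEP with the fewest mapped VMs, breaking ties by the lowest uplink packet rate. The disclosure names only the criteria; the function itself is hypothetical.

```python
def pick_least_loaded(candidates, vm_count, pkt_rate=None):
    """Pick a destination VTEP for migration: fewest mapped VMs first,
    then lowest uplink packet rate as a tie-breaker.
    `vm_count` and `pkt_rate` map VTEP label -> current load figures."""
    pkt_rate = pkt_rate or {}
    return min(candidates, key=lambda v: (vm_count.get(v, 0), pkt_rate.get(v, 0.0)))
```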
- host-A 110A may restore the initial mappings by migrating VMs 131, 135-137 back to VTEP-A1 181 . Based on the above, examples of the present disclosure facilitate high-availability overlay networking to reduce downtime in SDN environment 100 .
- SDN environment 100 may include other virtual workloads, such as containers, etc.
- container technologies may be used to run various containers inside respective VMs 131 - 134 .
- Containers are “OS-less”, meaning that they do not include any OS that could weigh 10s of Gigabytes (GB). This makes containers more lightweight, portable, efficient, and suitable for delivery into an isolated OS environment.
- Running containers inside a VM (known as “containers-on-virtual-machine” approach) not only leverages the benefits of container technologies but also that of virtualization technologies.
- the containers may be executed as isolated processes inside respective VMs.
- the above examples can be implemented by hardware (including hardware logic circuitry), software, firmware, or a combination thereof.
- the above examples may be implemented by any suitable computing device, computer system, etc.
- the computer system may include processor(s), memory unit(s) and physical NIC(s) that may communicate with each other via a communication bus, etc.
- the computer system may include a non-transitory computer-readable medium having stored thereon instructions or program code that, when executed by the processor, cause the processor to perform process(es) described herein with reference to FIG. 1 to FIG. 8 .
- Special-purpose hardwired circuitry may be in the form of, for example, one or more application-specific integrated circuits (ASICs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), and others.
- The term "processor" is to be interpreted broadly to include a processing unit, ASIC, logic unit, or programmable gate array, etc.
- a computer-readable storage medium may include recordable/non-recordable media (e.g., read-only memory (ROM), random access memory (RAM), magnetic disk or optical storage media, flash memory devices, etc.).
Abstract
Example methods and systems for virtual tunnel endpoint (VTEP) mapping for overlay networking are described. One example may involve a computer system monitoring multiple VTEPs that are configured for overlay networking. In response to detecting a state transition associated with a first VTEP from a healthy state to an unhealthy state, the computer system may identify mapping information that associates a virtualized computing instance with the first VTEP in the unhealthy state; and update the mapping information to associate the virtualized computing instance with a second VTEP in the healthy state. In response to detecting an egress packet from the virtualized computing instance to a destination, an encapsulated packet may be generated and sent towards the destination based on the updated mapping information. The encapsulated packet may include the egress packet and an outer header identifying the second VTEP to be a source VTEP.
Description
- This application is a continuation of U.S. patent application Ser. No. 17/560,284 filed Dec. 23, 2021, entitled “VIRTUAL TUNNEL ENDPOINT (VTEP) MAPPING FOR OVERLAY NETWORKING”, the entirety of which is incorporated herein by reference.
- Virtualization allows the abstraction and pooling of hardware resources to support virtual machines in a software-defined data center (SDDC). For example, through server virtualization, virtualized computing instances such as virtual machines (VMs) running different operating systems may be supported by the same physical machine (e.g., referred to as a “computer system” or “host”). Each VM is generally provisioned with virtual resources to run a guest operating system and applications. The virtual resources may include central processing unit (CPU) resources, memory resources, storage resources, network resources, etc. In practice, multiple virtual tunnel endpoints (VTEPs) may be configured on a computer system. The VTEPs may be susceptible to various performance issues that affect the performance of overlay networking in the SDN environment.
- FIG. 1 is a schematic diagram illustrating an example software-defined networking (SDN) environment in which virtual tunnel endpoint (VTEP) mapping for overlay networking may be performed;
- FIG. 2 is a schematic diagram illustrating an example management plane view of the SDN environment in FIG. 1 ;
- FIG. 3 is a flowchart of an example process for a computer system to perform VTEP mapping for overlay networking in an SDN environment;
- FIG. 4 is a flowchart of an example detailed process for a computer system to perform VTEP mapping for overlay networking in an SDN environment;
- FIG. 5 is a schematic diagram illustrating a first example of VTEP mapping for overlay networking;
- FIG. 6 is a schematic diagram illustrating an example VTEP state machine;
- FIG. 7 is a schematic diagram illustrating example overlay traffic forwarding based on the mapping information in FIG. 5 ; and
- FIG. 8 is a schematic diagram illustrating a second example of VTEP mapping for overlay networking.
- According to examples of the present disclosure, overlay networking may be implemented in an improved manner by dynamically mapping virtual tunnel endpoints (VTEPs) and virtualized computing instances (e.g., virtual machines). One example may involve a computer system (e.g., host-A 110A in FIG. 1 ) monitoring multiple VTEPs that are configured on the computer system for overlay networking, including a first VTEP (e.g., VTEP1 181 ) and a second VTEP (e.g., VTEP2 182 ). In response to detecting a state transition associated with the first VTEP from a HEALTHY state to an UNHEALTHY state, the computer system may identify mapping information that associates a virtualized computing instance (e.g., VM1 131 ) with the first VTEP. Also, the mapping information may be updated to associate the virtualized computing instance with the second VTEP, thereby migrating the virtualized computing instance from the first VTEP (i.e., UNHEALTHY) to the second VTEP (i.e., HEALTHY).
- In response to detecting an egress packet from the virtualized computing instance to a destination, an encapsulated packet (e.g., 192 in FIG. 1 ) may be generated and sent towards the destination based on the updated mapping information. The encapsulated packet may include the egress packet and an outer header identifying the second VTEP to be a source VTEP. As will be described below, mapping information may be updated dynamically and automatically to facilitate high-availability overlay networking, reduce system downtime, and improve data center user experience.
- In the following detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, similar symbols typically identify similar components, unless context dictates otherwise. The illustrative embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented here. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the drawings, can be arranged, substituted, combined, and designed in a wide variety of different configurations, all of which are explicitly contemplated herein.
- Challenges relating to overlay networking will now be explained in more detail using FIG. 1 , which is a schematic diagram illustrating example software-defined networking (SDN) environment 100 in which VTEP mapping for overlay networking may be performed. It should be understood that, depending on the desired implementation, SDN environment 100 may include additional and/or alternative components than those shown in FIG. 1 . Although the terms "first" and "second" are used to describe various elements, these elements should not be limited by these terms. These terms are used to distinguish one element from another. For example, a first element may be referred to as a second element, and vice versa.
- SDN environment 100 includes multiple hosts 110A-B that are inter-connected via physical network 105 . Each host 110A/110B may include suitable hardware 112A/112B and virtualization software (e.g., hypervisor-A 114A, hypervisor-B 114B) to support various virtual machines (VMs). For example, hosts 110A-B may support respective VMs 131-134. Hardware 112A/112B includes suitable physical components, such as central processing unit(s) or processor(s) 120A/120B; memory 122A/122B; physical network interface controllers (PNICs) 171-174; and storage 126A/126B, etc. In practice, SDN environment 100 may include any number of hosts (also known as "host computers", "host devices", "physical servers", "server systems", "transport nodes," etc.), each supporting tens or hundreds of VMs.
- Hypervisor 114A/114B maintains a mapping between underlying hardware 112A/112B and virtual resources allocated to respective VMs. Virtual resources are allocated to respective VMs 131-134 to support a guest operating system (OS; not shown for simplicity) and application(s) 141-144. For example, the virtual resources may include virtual CPU, guest physical memory, virtual disk, virtual network interface controller (VNIC), etc. Hardware resources may be emulated using virtual machine monitors (VMMs). For example, in FIG. 1 , VNICs 161-164 are virtual network adapters for VMs 131-134, respectively, and are emulated by corresponding VMMs (not shown for simplicity) instantiated by their respective hypervisors at host-A 110A and host-B 110B. The VMMs may be considered part of the respective VMs or, alternatively, separate from the VMs. Although one-to-one relationships are shown, one VM may be associated with multiple VNICs (each VNIC having its own network address).
- Although examples of the present disclosure refer to VMs, it should be understood that a "virtual machine" running on a host is merely one example of a "virtualized computing instance" or "workload." A virtualized computing instance may represent an addressable data compute node (DCN) or isolated user space instance. In practice, any suitable technology may be used to provide isolated user space instances, not just hardware virtualization. Other virtualized computing instances may include containers (e.g., running within a VM or on top of a host operating system without the need for a hypervisor or separate operating system, or implemented as operating system level virtualization), virtual private servers, client computers, etc. Such container technology is available from, among others, Docker, Inc. The VMs may also be complete computational environments, containing virtual equivalents of the hardware and software components of a physical computing system.
- The term "hypervisor" may refer generally to a software layer or component that supports the execution of multiple virtualized computing instances, including system-level software in guest VMs that supports namespace containers such as Docker, etc. Hypervisors 114A-B may each implement any suitable virtualization technology, such as VMware ESX® or ESXi™ (available from VMware, Inc.), Kernel-based Virtual Machine (KVM), etc. The term "packet" may refer generally to a group of bits that can be transported together, and may be in another form, such as "frame," "message," "segment," etc. The term "traffic" or "flow" may refer generally to multiple packets. The term "layer-2" may refer generally to a link layer or media access control (MAC) layer; "layer-3" to a network or Internet Protocol (IP) layer; and "layer-4" to a transport layer (e.g., using Transmission Control Protocol (TCP), User Datagram Protocol (UDP), etc.), in the Open System Interconnection (OSI) model, although the concepts described herein may be used with other networking models.
- Hypervisor 114A/114B implements virtual switch 115A/115B and logical distributed router (DR) instance 117A/117B to handle egress packets from, and ingress packets to, corresponding VMs. In SDN environment 100 , logical switches and logical DRs may be implemented in a distributed manner and can span multiple hosts. For example, logical switches that provide logical layer-2 connectivity, i.e., an overlay network, may be implemented collectively by virtual switches 115A-B and represented internally using forwarding tables 116A-B at respective virtual switches 115A-B. Forwarding tables 116A-B may each include entries that collectively implement the respective logical switches. Further, logical DRs that provide logical layer-3 connectivity may be implemented collectively by DR instances 117A-B and represented internally using routing tables (not shown) at respective DR instances 117A-B. The routing tables may each include entries that collectively implement the respective logical DRs.
virtual switches 115A-B inFIG. 1 , whereas a “virtual switch” may refer generally to a software switch or software implementation of a physical switch. In practice, there is usually a one-to-one mapping between a logical port on a logical switch and a virtual port onvirtual switch 115A/115B. However, the mapping may change in some scenarios, such as when the logical port is mapped to a different virtual port on a different virtual switch after migration of the corresponding virtualized computing instance (e.g., when the source host and destination host do not have a distributed virtual switch spanning them). - Through virtualization of networking services in
SDN environment 100, logical networks (also referred to as overlay networks or logical overlay networks) may be provisioned, changed, stored, deleted, and restored programmatically without having to reconfigure the underlying physical hardware architecture.SDN controller 103 andSDN manager 104 are example network management entities inSDN environment 100. One example of an SDN controller is the NSX controller component of VMware NSX® (available from VMware, Inc.) that operates on a central control plane (CCP).SDN controller 103 may be a member of a controller cluster (not shown for simplicity) that is configurable usingSDN manager 104 operating on a management plane.Network management entity 103/104 may be implemented using physical machine(s), VM(s), or both. Logical switches, logical routers, and logical overlay networks may be configured usingSDN controller 103,SDN manager 104, etc. To send or receive control information, a local control plane (LCP) agent (not shown) onhost 110A/110B may interact withSDN controller 103 via control-plane channel 101/102. - Advances relating to SDN with overlay networking in the last decade has enabled relatively quick and easy deployment and management of substantially large-scale data centers, usually called Software Defined Data Centers (SDDCs). The scale of these SDDCs has been increasing rapidly, such as towards hundreds of hypervisors that are each capable of hosting hundreds of VMs. In general, overlay networking stretches a layer-2 network over an underlying layer-3 network. Any suitable overlay networking protocol(s) may be implemented, such as Virtual extensible Local Area Network (VXLAN), Stateless Transport Tunneling (STT), Generic Network Virtualization Encapsulation (GENEVE), Generic Routing Encapsulation (GRE), etc.
- In practice, overlay networking protocols require overlay traffic from VMs to be encapsulated with an outer header with source and destination VTEPs. For example in
FIG. 1 ,hypervisor 114A/114B may implement multiple VTEPs to encapsulate and decapsulate packets with an outer header (also known as a tunnel header) identifying a logical overlay network. In particular, hypervisor-A 114A at host-A 110A implementsVTEP1 181 andVTEP2 182, while hypervisor-B 114B at host-B 110B implementsVTEP3 183 andVTEP4 184. Encapsulated packets may be sent via a logical overlay tunnel established between a pair of VTEPs overphysical network 105, over whichrespective hosts 110A-B are in layer-3 connectivity with one another. In other words, the logical overlay tunnel terminates at the VTEPs. - Some example logical overlay networks are shown in
FIG. 2 , which is a schematic diagram illustrating examplemanagement plane view 200 ofSDN environment 100 inFIG. 1 . Here,VM1 131 andVM4 134 are located on a first logical layer-2 segment associated with virtual network identifier (VNI)=5000 and connected to a first logical switch (see “LS1” 201).VM2 132 andVM3 133 are located on a second logical layer-2 segment associated with VNI=6000 and connected to a second logical switch (see “LS2” 202). With the growth of infrastructure-as-a-service (laaS), logical overlay networks may be deployed to support multiple tenants. In this case, each logical overlay network may be designed to be an abstract representation of a tenant's network inSDN environment 100. Depending on the desired implementation, a multi-tier topology may be used to isolate multiple tenants. - A logical DR (see “DR” 205) connects logical switches 201-202 to facilitate communication among VMs 131-134 on different segments. See also logical switch ports “LSP7” 203 and “LSP8” 204, and logical router ports “LRP1” 207 and “LRP2” 208 connecting
DR 205 with logical switches 201-202.Logical switch 201/202 may be implemented collectively bymultiple hosts 110A-B, such as usingvirtual switches 115A-B and represented internally using forwarding tables 116A-B. DR 205 may be implemented collectively by multiple transport nodes, such as usingedge node 206 and hosts 110A-B. For example,DR 205 may be implemented usingDR instances 117A-B and represented internally using routing tables (not shown) atrespective hosts 110A-B. - Edge node 206 (labelled “EDGE”) may implement one or more logical DRs and logical service routers (SRs), such as
DR 205 andSR 209 inFIG. 2 .SR 209 may represent a centralized routing component that provides centralized stateful services to VMs 131-134, such as IP address assignment using dynamic host configuration protocol (DHCP), network address translation (NAT), etc.EDGE 206 may be implemented using VM(s) and/or physical machines (“bare metal machines”), and capable of performing functionalities of a switch, router (e.g., logical service router), bridge, gateway, edge appliance, or any combination thereof. In practice,EDGE 206 may be deployed at the edge of a geographical site to facilitate north-south traffic to an external network, such as another data center at a different geographical site. - One of the challenges in
SDN environment 100 is to maintain the availability of overlay networking to support packet forwarding to/from VMs 131-134. These workloads require network connectivity to support various applications, such as web servers, databases, proxies, network functions, etc. However, with the increased use of overlay networking protocols, the complexity ofSDN environment 100 also increases, which inevitably introduces more possible failure points. For example inFIG. 1 , multiple VTEPs 181-182 may be configured on host-A 110A for overlay networking. - Conventionally, VM(s) may be mapped to
VTEP 181/182 that is responsible for packet encapsulation and decapsulation for the VM(s). WhenVTEP 181/182 fails, however, overlay networking connectivity for the VM(s) will be lost. This is especially problematic there is a large number (e.g., several hundreds) of VMs that are mapped toparticular VTEP 181/182. - According to examples of the present disclosure, the health of VTEPs 181-184 may be monitored and VM-VTEP mapping information updated dynamically and automatically on
host 110A/110B to facilitate high-availability overlay networking. In more detail,FIG. 3 is a flowchart ofexample process 300 for a computer system to perform VTEP mapping for overlay networking inSDN environment 100.Example process 300 may include one or more operations, functions, or actions illustrated by one or more blocks, such as 310 to 360. The various blocks may be combined into fewer blocks, divided into additional blocks, and/or eliminated depending on the desired implementation. In the following, various examples will be explained usinghost 110A as an example “computer system” andVM1 131 as an example “virtualized computing instance.” - At 310 in
FIG. 3 , host-A 110A may monitor multiple VTEPs that are configured on host-A 110A for overlay networking, includingfirst VTEP1 181 andsecond VTEP2 182. At 320, 330 and 340, in response to detecting a state transition associated withfirst VTEP 181 from a HEALTHY state to an UNHEALTHY state, host-A 110A may identify mapping information that associatesVM1 131 withfirst VTEP1 181 and update the mapping information to associateVM1 131 withsecond VTEP2 182 instead. This has the effect of migratingVM1 131 fromfirst VTEP1 181 in the UNHEALTHY state tosecond VTEP2 182 in the HEALTHY state. - At 350 and 360 in
FIG. 3 , in response to detecting anegress packet VM1 131 to a destination, host-A 110A may generate and send an encapsulated packet towards the destination based on the updated mapping information. In this case, the encapsulated packet includes the egress packet and an outer header identifying thesecond VTEP2 182 to be a source VTEP. - For example in
FIGS. 1-2 , first encapsulated packet (see 190) may be generated and sent based on mapping information=(VM1 131, VTEP1 181). In this case, first encapsulatedpacket 190 may include an outer header (O1) identifying source VTEP=VTEP1 181. In response to detecting a state transition from the HEALTHY state to the UNHEALHTY state, the mapping information may be updated to (VM1 131, VTEP2 182). Based on the updated mapping information, second encapsulated packet (see 192 inFIGS. 1-2 ) may be generated and sent towardsdestination VTEP3 183 on host-B 110B. Second encapsulatedpacket 192 may include an outer header (O2) identifying source VTEP=VTEP2 182 instead ofVTEP1 181. - Depending on the desired implementation, initial mapping=(
VM1 131, VTEP1 181) may be configured onceVM1 131 is created or connected to a network based on any suitable teaming policy, such as load balancing based on a configuration parameter (e.g., VNIC Port ID, MAC address) associated withVM1 131 and a failover order associated with multiple VTEPs 181-182. The initial mapping=(VM1, VTEP1) may be restored once connectivity viafirst VTEP1 181 has recovered. For example, in response to detecting a subsequent state transition associated withfirst VTEP1 181 from the UNHEALTHY state to the HEALTHY state, host-A 110A may update the mapping information to reassociateVM1 131 withVTEP1 181. This has the effect of migratingVM1 131 fromsecond VTEP2 182 tofirst VTEP1 181, both being in the HEALTHY state. This way, overlay networking traffic may be load balanced among VTEPs 181-182 on host-A 110A. - As will be described further below,
VTEP 181/182 may transition between a HEALTHY state and an UNHEALHTY state according to a state machine inFIG. 6 . In a first example, block 320 may involve detecting the state transition to a first UNHEALTHY state (e.g., IP_WAITING inFIG. 6 ) in whichfirst VTEP 181 has not been assigned with a valid IP address by a Dynamic Host Configuration Protocol (DHCP) server, or the lease of the IP address has expired. In a second example,first VTEP 181 may transition to a second UNHEALTHY state (e.g., BFD_DOWN inFIG. 6 ) in which each and every overlay networking path viafirst VTEP1 181 is down. In a third example,first VTEP 181 may transition to a third UNHEALTHY state (e.g., ADMIN_DOWN inFIG. 6 ) that is configured by a network administrator, such as for maintenance and troubleshooting purposes. - Examples of the present disclosure should be contrasted against conventional approaches that rely on static VM-VTEP mapping. In this case, when there is a failure affecting a VTEP, VMs mapped to the VTEP will be affected because all overlay traffic will be dropped. The loss of overlay networking connectivity is especially problematic for VMs are running critical workloads and/or when a large number of VMs (e.g., order of hundreds) are mapped to the VTEP. In some cases, a network administrator may have to intervene and restore connectivity, which is time consuming and inefficient. As will be described further below, connectivity loss and the need for manual intervention may be reduced using examples of the present disclosure. In enterprises and cloud operations, any improvement in the availability of overlay networking is important because every second of downtime may lead to huge losses and degraded user experience.
-
FIG. 4 is a flowchart of exampledetailed process 400 for VTEP mapping for overlay networking inSDN environment 100.Example process 400 may include one or more operations, functions, or actions illustrated at 410 to 460. The various operations, functions or actions may be combined into fewer blocks, divided into additional blocks, and/or eliminated depending on the desired implementation. The example inFIG. 4 will be explained usingFIG. 5 , which is a schematic diagram illustrating example 500 of VTEP mapping for overlay networking. - Examples of the present disclosure may be implemented any suitable software and/or hardware component(s) that will be collectively represented using VTEP mapping module 118A/118B in
FIG. 1 . Depending on the desired implementation, mapping module 118A/118B may include (1) an interface sub-module to check the IP address assignment, (2) a monitoring sub-module to manage monitoring sessions established between VTEPs, (3) remap up/down sub-module(s) to update mapping information dynamically, etc. In relation to overlay networking, the following notations will be used below: SIP=source IP address, DIP=destination IP address, OUTER_SIP=outer source VTEP IP address in an outer header, OUTER_DIP=outer destination VTEP IP address in the outer header, etc. - Referring first to
FIG. 5 , host-A 110A may be configured with multiple (N) VTEPs for overlay networking. Each VTEP may be denoted as VTEPi, where i=1, . . . , N. For the case N=2,VTEP1 181 andVTEP2 182 are configured for overlay networking on host-A 110A. In practice, VTEPs 181-182 may be created as ports onvirtual switch 115A. Like any other interface,VTEP 181/182 requires an IP address and a MAC address. For example,VTEP1 181 may be associated with (IP address=IP-VTEP1, MAC address=MAC-VTEP1, VTEP label=VTEP1) andVTEP2 182 with (IP address=IP-VTEP2, MAC address=MAC-VTEP2, VTEP label=VTEP2). To connect tophysical network 105, each VTEPi may be associated with an uplink (denoted as UPLINKi), such as UPLINK1 forVTEP1 181 and UPLINK2 forVTEP2 182. See 501-502 inFIG. 5 . - As used herein, an “uplink” may represent a logical construct for a connection to a network. From the perspective of
host 110A/B, the term “uplink” may refer generally to a network connection fromhost 110A/B viaPNIC 171/172/173/174 to a physical network device (e.g., top-of-rack switch, spine switch, router) inphysical network 105. The term “downlink,” on the other hand, may refer to a connection from the physical network device to host 110A/B. In practice, the mapping between an uplink and a PNIC may be one-to-one (i.e., one PNIC per uplink). Alternatively, a NIC teaming policy may be implemented to map multiple PNICs to one uplink. The term “NIC teaming” may refer to the grouping of multiple PNICs into one logical NIC. - Referring also to
FIG. 4 , at 405, host-A 110A may perform initial VM-VTEP mapping forVM1 131 andVM2 132. For example, whenVM1 131 is created and connected to a network, host-A 110A may create a VNIC port onvirtual switch 115A forVNIC 161 ofVM1 131. Similarly, forVM2 132, a VNIC port may be created onvirtual switch 115A forVNIC2 162.VM1 131 andVM2 132 are connected to the samevirtual switch 115A via respective VNIC ports. - To be mappable to different VTEPs, each VM may be configured with multiple VNICs. Each VNIC may be associated with a single VTEP for overlay networking. The one-to-one mapping is to reduce the risk of, if not prevent, MAC flaps on remote hosts. For example,
VM1 131 may be allocated with multiple VNICs (collectively represented as 161 inFIG. 1 ), including a first VNIC that is mappable to VTEP1 181 and a second VNIC mappable to VTEP2 182 viavirtual switch 115A. Similar configuration may be made forVM2 132 on host-A 110A, as well asVM3 133 andVM4 134 on host-B 110B. - Next, a VTEP may be selected for
VM 131/132 based on any suitable teaming policy. In the example inFIG. 5 ,VM1 131 is mapped to VTEP1 181 (see 510), andVM2 132 to VTEP2 182 (see 520). Once determined, the VM-VTEP mapping or association may not change unless there is a change in the teaming policy, or a VTEP is added, removed, or marked as standby. Any suitable teaming policy may be used, such as load balancing based on a configuration parameter (e.g., VNIC port ID, VNIC MAC address) associated withVM 131/132, failover order associated with multiple VTEPs 181-182, etc. These example policies will be discussed below. - (1) In a first example, VTEP selection may be performed to achieve load balancing based on source VNIC port ID (denoted as VNICPortID) associated with
VM 131/132. In this case, VTEP selection may involve determining modulo operation: endpointID=VNICPortID % N. Here, N=number of VTEPs and endpointID=unique ID assigned to a VTEP. For example, the modulo operation mapsVM 131/132 to either endpointID=0 assigned to VTEP1 181 or endpointID=1 assigned toVTEP2 182. - (2) In a second example, VTEP selection may be performed to achieve load balancing based on source VNIC MAC address (MACAddr) associated with
VM 131/132. In this case, VTEP selection may involve determining modulo operation: endpoint!D=MACAddr % N. Here, the sixth octet of the MAC address may be used instead of the VNIC port ID to mapVM 131/132 to eitherVTEP1 181 orVTEP2 182. - (3) In a third example, VTEP selection may be performed based on a failover order associated with VTEPs 181-182. For example, host-
A 110A may be configured with two active VTEPs 181-182, as well as a standby VTEP (not shown). Once an active VTEP fails, the standby VTEP may switch to the active mode and take over. - At 410 in
FIG. 4, host-A 110A may monitor VTEPs 181-182 configured for overlay networking. Each VTEPi may be associated with a health status or state (denoted STATE-i) that is either HEALTHY or UNHEALTHY. For example, block 410 may involve monitoring whether VTEP 181/182 is assigned a valid IP address by a DHCP server, or whether a lease for the IP address has expired. Additionally or alternatively, block 410 may involve monitoring a path (also known as a logical overlay tunnel) between local VTEP 181/182 on host-A 110A and remote VTEP 183/184 on host-B 110B. See also 411-412.
- Any fault detection or continuity check protocol suitable for monitoring purposes may be used at block 411. One example is the Bidirectional Forwarding Detection (BFD) protocol that is defined in Internet Engineering Task Force (IETF) Request for Comments (RFC) 5880, which is incorporated herein by reference. In overlay networking, BFD may be used between two VTEPs to detect failures in the underlay path between them. Using an asynchronous mode, for example, BFD packets may be generated and sent (e.g., using mapping module 118A/118B) over a BFD session periodically.
- Some example HEALTHY and UNHEALTHY states will be discussed using
FIG. 6, which is a schematic diagram illustrating example VTEP state machine 600. There are five states that a VTEP might be in: an initialization state (see INIT 601), a normal operational state (see NORMAL 602), a state awaiting IP address assignment (see IP_WAITING 603), a BFD session down state (see BFD_DOWN 604) and an administrator-configured down state (see ADMIN_DOWN 605). When created, VTEP 181/182 will be in state=INIT 601.
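The state machine of FIG. 6 may be sketched as a small table-driven transition function. This is an illustrative Python sketch only; the event names (e.g., "ip_lost", "all_bfd_down") are assumptions and not part of the disclosure, and the numbered comments refer to transitions 610-690 of FIG. 6.

```python
# Five VTEP states from FIG. 6; NORMAL is the only HEALTHY state.
HEALTHY_STATES = {"NORMAL"}
UNHEALTHY_STATES = {"IP_WAITING", "BFD_DOWN", "ADMIN_DOWN"}

# (current_state, event) -> next_state, mirroring transitions 610-690.
TRANSITIONS = {
    ("IP_WAITING", "ip_timeout"): "INIT",            # 610
    ("INIT", "ip_assigned_and_bfd_up"): "NORMAL",    # 620
    ("NORMAL", "ip_lost"): "IP_WAITING",             # 630
    ("NORMAL", "all_bfd_down"): "BFD_DOWN",          # 640
    ("BFD_DOWN", "bfd_up_or_ip_change"): "NORMAL",   # 641
    ("BFD_DOWN", "ip_lost"): "IP_WAITING",           # 650
    ("INIT", "admin_down"): "ADMIN_DOWN",            # 660
    ("NORMAL", "admin_down"): "ADMIN_DOWN",          # 670
    ("IP_WAITING", "admin_down"): "ADMIN_DOWN",      # 680
    ("BFD_DOWN", "admin_down"): "ADMIN_DOWN",        # 690
    ("ADMIN_DOWN", "admin_up"): "INIT",              # 681
}

def step(state: str, event: str) -> str:
    """Apply one event; unknown (state, event) pairs leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

state = "INIT"  # a newly created VTEP starts in INIT
state = step(state, "ip_assigned_and_bfd_up")
print(state, state in HEALTHY_STATES)    # NORMAL True
state = step(state, "all_bfd_down")
print(state, state in UNHEALTHY_STATES)  # BFD_DOWN True
```

A table-driven encoding keeps the legal transitions in one place, which makes it easy to check that no transition out of ADMIN_DOWN exists other than the administrator bringing the VTEP up again.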
- VTEP 181/182 may be considered HEALTHY when operating in state=NORMAL 602. Otherwise, VTEP 181/182 may be considered UNHEALTHY when in IP_WAITING 603, BFD_DOWN 604 or ADMIN_DOWN 605. In this case, host-A 110A may detect the following state transitions:
- At 610 in FIG. 6, a state transition to INIT 601 from IP_WAITING 603 (i.e., UNHEALTHY) may be detected when a valid IP address is not assigned to VTEP 181/182 within a predetermined period of time (i.e., a timeout period). The IP address assignment might fail for various reasons, such as a DHCP server being unreachable or running out of IP addresses available for assignment (e.g., due to server expansion).
- At 620 in FIG. 6, a state transition from INIT 601 to NORMAL 602 (i.e., HEALTHY) may be detected when a valid IP address is assigned to VTEP 181/182 and all its BFD sessions are up and running.
- At 630 in FIG. 6, a state transition from NORMAL 602 to IP_WAITING 603 (i.e., UNHEALTHY) may be detected when an IP address assigned to VTEP 181/182 is lost. In practice, when an IP address is assigned by the DHCP server, the IP address is leased for a specific amount of time called the DHCP lease time. The IP address may be lost when the lease is not renewed, such as when the DHCP server is unreachable or has run out of IP addresses.
- At 640 in FIG. 6, a state transition from NORMAL 602 to BFD_DOWN 604 (i.e., UNHEALTHY) may be detected when each and every overlay networking path, and associated BFD session, established using that VTEP 181/182 is down. For example, a full-mesh topology may be used to establish BFD sessions among VTEPs 181-184. At host-A 110A, for example, local VTEP1 181 may establish two BFD sessions with respective remote VTEP3 183 and VTEP4 184 on host-B 110B. The state transition occurs when each and every BFD session is down. At 641, a state transition from BFD_DOWN 604 to NORMAL 602 may occur when at least one BFD session is up, or there is an IP address change event. The IP address change may be detected when a new DHCP lease with a different IP address is given by a DHCP server during lease renewal, or an operator manually changes the VTEP IP address (e.g., using SDN manager 103). Note that if at least one of the BFD sessions is up and running, VTEP1 181 remains in NORMAL 602 and no state transition to BFD_DOWN 604 will occur.
- At 650 in FIG. 6, a state transition from BFD_DOWN 604 to IP_WAITING 603 (i.e., UNHEALTHY) may be detected when an IP address assigned to VTEP 181/182 is lost. Again, this may occur when the DHCP server becomes unreachable or has run out of IP addresses.
- At 660, 670, 680 and 690 in FIG. 6, a state transition to ADMIN_DOWN 605 (i.e., UNHEALTHY) from INIT 601, NORMAL 602, IP_WAITING 603 or BFD_DOWN 604 may be detected. ADMIN_DOWN 605 represents a state that is configured by a network administrator to bring VTEP 181/182 down, such as for maintenance and troubleshooting purposes. At 681, a state transition from ADMIN_DOWN 605 to INIT 601 may occur when the network administrator performs configuration to bring VTEP 181/182 up and running again.
- At 415-420 in
FIG. 4, in response to detecting a state transition from HEALTHY to UNHEALTHY, host-A 110A may update a VM-VTEP mapping after a timeout period. In practice, block 415 may involve a notification system generating system notifications relating to state transitions, and a remap module listening to the notifications to detect any faulty VTEP. The timeout period may be user-configurable to avoid unnecessary remapping due to transient faults. Once the timeout period has elapsed, at 425, host-A 110A may identify a HEALTHY VTEPk, where k≠i and i, k ∈ {1, . . . , N}. This way, at 430-435, each VM that is mapped to the UNHEALTHY VTEPi may be identified and migrated to the HEALTHY VTEPk.
- In the example in FIG. 5, both VTEP1 181 and VTEP2 182 may be detected to be HEALTHY (e.g., NORMAL 602) at one point in time. In this case, host-A 110A may configure mapping information identifying first mapping=(VM1 131, VTEP1 181) and second mapping=(VM2 132, VTEP2 182) according to any suitable teaming policy. See 510-540 in FIG. 5.
- After some time, however, host-A 110A may detect a state transition associated with VTEP1 181 from HEALTHY (e.g., NORMAL 602) to UNHEALTHY (e.g., IP_WAITING 603). Once the timeout period has elapsed, host-A 110A may identify VTEP2 182 to be in state=HEALTHY (e.g., NORMAL 602) and update the mapping information to associate VM1 131 with VTEP2 182. This has the effect of migrating VM1 131 from source=VTEP1 181 in the UNHEALTHY state to target=VTEP2 182 in the HEALTHY state. Since VTEP2 182 remains HEALTHY, the (VM2 132, VTEP2 182) mapping is not affected. See 550 (state transition), 560 (updated state) and 570 (updated mapping information) in FIG. 5.
- At 440 in FIG. 4, once the VM-VTEP mapping is updated, host-A 110A may generate and send a report to inform SDN controller 104 of the updated mapping information, such as (VM1 131, VTEP2 182) in FIG. 5. The control information may be sent to cause SDN controller 104 to propagate the updated mapping information to remote hosts, including host-B 110B, to facilitate packet forwarding towards VM1 131 using destination VTEP2 182 instead of VTEP1 181.
- At 445-455 in FIG. 4, in response to detecting a state transition from UNHEALTHY to HEALTHY, host-A 110A may restore a VM-VTEP mapping. In particular, at 450, in response to detecting that VTEPi has recovered, host-A 110A may identify VM(s) previously mapped to VTEPi and migrated to VTEPk. This way, at 455, host-A 110A may migrate the VM(s) from VTEPk back to VTEPi for load balancing purposes. Further, at 460, a report may be generated and sent to inform SDN controller 104 and trigger propagation of the updated mapping to other hosts, including host-B 110B.
- In the example in FIG. 5, host-A 110A may detect a state transition from UNHEALTHY (e.g., IP_WAITING 603) to HEALTHY (e.g., NORMAL 602) for VTEP1 181. In response, host-A 110A may restore the first mapping to (VM1 131, VTEP1 181), assuming that the teaming policy has not changed and no new VTEP has been added or removed. This has the effect of migrating VM1 131 from VTEP2 182 to VTEP1 181. In practice, blocks 410-460 may be repeated as required as VTEP 181/182 alternates between HEALTHY and UNHEALTHY. See also 550 and 580 in FIG. 5.
- In practice, whenever mapping information is updated, host-A 110A may generate and send a notification to management entity 103/104, such as to alert a network administrator using a user interface provided by SDN manager 103 on the management plane. The user interface may also display VTEP state information and support administrative operations to transition to/from the ADMIN_DOWN 605 state. In practice, the user interface may be a graphical user interface (GUI), command line interface (CLI), application programming interface (API), etc.
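The remapping flow of blocks 410-460 described above can be sketched as follows. This is a minimal illustration, assuming simple dictionary-based bookkeeping and modulo-based reselection over the remaining HEALTHY VTEPs; the function and variable names are not from the disclosure.

```python
def healthy(vteps: dict) -> list:
    """Return IDs of VTEPs currently in the NORMAL (i.e., HEALTHY) state."""
    return [v for v, state in vteps.items() if state == "NORMAL"]

def remap(vm_to_vtep: dict, vteps: dict, vm_port_id: dict) -> dict:
    """Move every VM mapped to an UNHEALTHY VTEP onto a HEALTHY one,
    reselecting with endpointID = VNICPortID % (number of healthy VTEPs)."""
    targets = healthy(vteps)
    if not targets:
        return dict(vm_to_vtep)  # nothing healthy to migrate to; keep as-is
    return {
        vm: (vtep if vteps[vtep] == "NORMAL"
             else targets[vm_port_id[vm] % len(targets)])
        for vm, vtep in vm_to_vtep.items()
    }

# FIG. 5 scenario: VTEP1 becomes UNHEALTHY (IP_WAITING), VTEP2 stays NORMAL,
# so VM1 is migrated to VTEP2 while the (VM2, VTEP2) mapping is untouched.
mapping = {"VM1": "VTEP1", "VM2": "VTEP2"}
states = {"VTEP1": "IP_WAITING", "VTEP2": "NORMAL"}
ports = {"VM1": 7, "VM2": 8}
updated = remap(mapping, states, ports)
print(updated)  # {'VM1': 'VTEP2', 'VM2': 'VTEP2'}
```

In practice the updated mapping would also be reported to the controller for propagation to remote hosts, which this sketch omits.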
- FIG. 7 is a schematic diagram illustrating example 700 of overlay traffic forwarding based on the mapping information in FIG. 5. Any suitable tunneling protocol or encapsulation mechanism may be used for overlay networking, such as VXLAN, GENEVE, GRE, etc. The encapsulation mechanisms are generally connectionless. Using GENEVE as an example, various implementation details may be found in a draft document entitled "GENEVE: Generic Network Virtualization Encapsulation" (draft-ietf-nvo3-geneve-16) published by the Internet Engineering Task Force (IETF). The document is incorporated herein by reference.
- At 710 and 720 in FIG. 7, in response to detecting a first egress packet (P1) from VM1 131 to VM3 133, a first encapsulated packet (O1, P1) may be generated based on mapping information=(VM1 131, VTEP1 181). In this case, VTEP1 181 is associated with state=HEALTHY (e.g., NORMAL). The egress packet (P1) may specify (SIP=IP-VM1, DIP=IP-VM3) associated with respective source VM1 131 on host-A 110A and destination VM3 133 on host-B 110B. The first encapsulated packet may include the egress packet (P1) and an outer header (O1) specifying (OUTER_SIP=IP-VTEP1, OUTER_DIP=IP-VTEP3) associated with respective source VTEP1 181 on host-A 110A and destination VTEP3 183 on host-B 110B.
- At 730 in FIG. 7, in response to receiving the first encapsulated packet, destination VTEP3 183 may perform decapsulation and forward the inner packet (P1) to VM3 133. Based on mapping information=(VM1 131, VTEP1 181), any return traffic from VM3 133 to VM1 131 may be sent from VTEP3 183 on host-B 110B to VTEP1 181 on host-A 110A. Note that the source and destination VMs may be associated with the same VNI, or different VNIs. Using the example in FIG. 2, VM1 131 and VM3 133 may be in different VNIs and connected via logical switches (e.g., LS1 201 and LS2 202) and a logical router (e.g., DR 205).
- At 740 in FIG. 7, in response to detecting a state transition associated with VTEP1 181 from HEALTHY to UNHEALTHY, host-A 110A may update the mapping information to (VM1 131, VTEP2 182). Again, this has the effect of migrating VM1 131 from VTEP1 181 in the UNHEALTHY state (e.g., IP_WAITING, BFD_DOWN or ADMIN_DOWN) to VTEP2 182 in the HEALTHY state (e.g., NORMAL).
- At 750 and 760 in FIG. 7, in response to detecting a second egress packet (P2) from VM1 131 to VM3 133, a second encapsulated packet (O2, P2) may be generated based on updated mapping information=(VM1 131, VTEP2 182). The second encapsulated packet may be generated by encapsulating the egress packet (P2) with an outer header (O2) specifying (OUTER_SIP=IP-VTEP2, OUTER_DIP=IP-VTEP3) associated with respective source VTEP2 182 on host-A 110A and destination VTEP3 183 on host-B 110B.
- At 770 in FIG. 7, in response to receiving the second encapsulated packet, destination VTEP3 183 may perform decapsulation and forward the inner packet (P2) to VM3 133. Based on updated mapping information=(VM1 131, VTEP2 182) learned from the second encapsulated packet and/or received from SDN controller 104, any return traffic from VM3 133 to VM1 131 may be sent from VTEP3 183 on host-B 110B to VTEP2 182 on host-A 110A.
- Similar to the example in FIG. 5, the mapping information may be updated dynamically based on the state of VTEP 181/182. This reduces the likelihood of connectivity loss for VM(s) mapped to a particular VTEP (e.g., VTEP1 181) based on a teaming policy. Instead of maintaining the mapping statically, the VM(s) may be migrated to facilitate high availability of overlay networking. This reduces system downtime and improves VM performance. Based on the above examples, automatic remapping of VMs to HEALTHY VTEPs may be performed to support high availability of overlay networking, improving VM performance and user experience.
- FIG. 8 is a schematic diagram illustrating second example 800 of VTEP mapping for overlay networking. In this example, multiple VMs may be migrated from a source VTEP to respective multiple destination VTEPs for load balancing purposes. For example, host-A 110A may be configured with N=4 VTEPs for overlay networking, particularly VTEP-A1 181, VTEP-A2 182, VTEP-A3 801 and VTEP-A4 802.
- At 810-840 in FIG. 8, host-A 110A may generate mapping information that associates multiple VMs (i.e., VM1 131, VM5 135, VM6 136 and VM7 137) with VTEP-A1 181. Here, all VTEPs 181-182, 801-802 are in state=HEALTHY and mapped to respective uplinks 501-502, 803-804.
- At 850 in FIG. 8, host-A 110A may detect a state transition associated with VTEP-A1 181 from HEALTHY to UNHEALTHY (e.g., BFD_DOWN 604 in FIG. 6). In response, host-A 110A may update the mapping information to migrate VMs 131, 135-137 away from VTEP-A1 181. For example, VM1 131 may be migrated to VTEP-A2 182 (see 860), VM5 135 also to VTEP-A2 182 (see 870), VM6 136 to VTEP-A3 801 (see 880), and VM7 137 to VTEP-A4 802 (see 890). This way, overlay traffic from these VMs may continue to flow while a network administrator fixes issues affecting VTEP-A1 181.
- In practice, the destination VTEP for each VM may be selected at random and/or using a teaming policy. For example, VTEP selection may be performed to achieve load balancing based on a configuration parameter (e.g., VNICPortID or MACAddr) associated with VM 131/135/136/137. Since there are N−1=3 VTEPs in state=HEALTHY for overlay networking, the following modulo operation may be performed to select VTEP-A2 182, VTEP-A3 801 or VTEP-A4 802: endpointID = (VNICPortID or MACAddr) % (N−1).
- In another example, the VTEP selection may be load-based, such as based on the number of VMs that are already mapped to VTEP-A2 182, VTEP-A3 801 or VTEP-A4 802. This way, multiple (M=4) VMs requiring migration may be distributed among the N−1=3 VTEPs operating in state=HEALTHY to reduce the risk of overloading a particular VTEP. Another example may involve tracking a performance metric (e.g., packet rate) on the uplinks and selecting a VTEP associated with the uplink with the least usage.
- At 895 in FIG. 8, when faulty VTEP-A1 181 is fixed and transitions into state=HEALTHY again, host-A 110A may restore the initial mappings by migrating VMs 131, 135-137 back to VTEP-A1 181. Based on the above, examples of the present disclosure facilitate high-availability overlay networking to reduce downtime in SDN environment 100.
- Although explained using VMs, it should be understood that
public cloud environment 100 may include other virtual workloads, such as containers, etc. As used herein, the term "container" (also known as "container instance") is used generally to describe an application that is encapsulated with all its dependencies (e.g., binaries, libraries, etc.). In the examples in FIG. 1 to FIG. 8, container technologies may be used to run various containers inside respective VMs 131-134. Containers are "OS-less", meaning that they do not include any OS that could weigh tens of gigabytes (GB). This makes containers more lightweight, portable, efficient, and suitable for delivery into an isolated OS environment. Running containers inside a VM (known as the "containers-on-virtual-machine" approach) not only leverages the benefits of container technologies but also those of virtualization technologies. The containers may be executed as isolated processes inside respective VMs.
- The above examples can be implemented by hardware (including hardware logic circuitry), software or firmware, or a combination thereof. The above examples may be implemented by any suitable computing device, computer system, etc. The computer system may include processor(s), memory unit(s) and physical NIC(s) that may communicate with each other via a communication bus, etc. The computer system may include a non-transitory computer-readable medium having stored thereon instructions or program code that, when executed by the processor, cause the processor to perform process(es) described herein with reference to
FIG. 1 to FIG. 8.
- The techniques introduced above can be implemented in special-purpose hardwired circuitry, in software and/or firmware in conjunction with programmable circuitry, or in a combination thereof. Special-purpose hardwired circuitry may be in the form of, for example, one or more application-specific integrated circuits (ASICs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), and others. The term 'processor' is to be interpreted broadly to include a processing unit, ASIC, logic unit, or programmable gate array, etc.
- The foregoing detailed description has set forth various embodiments of the devices and/or processes via the use of block diagrams, flowcharts, and/or examples. Insofar as such block diagrams, flowcharts, and/or examples contain one or more functions and/or operations, it will be understood by those within the art that each function and/or operation within such block diagrams, flowcharts, or examples can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or any combination thereof.
- Those skilled in the art will recognize that some aspects of the embodiments disclosed herein, in whole or in part, can be equivalently implemented in integrated circuits, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computing systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as virtually any combination thereof, and that designing the circuitry and/or writing the code for the software and/or firmware would be well within the skill of one skilled in the art in light of this disclosure.
- Software and/or firmware to implement the techniques introduced here may be stored on a non-transitory computer-readable storage medium and may be executed by one or more general-purpose or special-purpose programmable microprocessors. A "computer-readable storage medium", as the term is used herein, includes any mechanism that provides (i.e., stores and/or transmits) information in a form accessible by a machine (e.g., a computer, network device, personal digital assistant (PDA), mobile device, manufacturing tool, any device with a set of one or more processors, etc.). A computer-readable storage medium may include recordable/non-recordable media (e.g., read-only memory (ROM), random access memory (RAM), magnetic disk or optical storage media, flash memory devices, etc.).
- The drawings are only illustrations of an example, wherein the units or procedures shown in the drawings are not necessarily essential for implementing the present disclosure. Those skilled in the art will understand that the units in the device in the examples can be arranged in the device in the examples as described, or can be alternatively located in one or more devices different from that in the examples. The units in the examples described can be combined into one module or further divided into a plurality of sub-units.
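Returning to the load-based teaming policy discussed with FIG. 8, the "fewest VMs already mapped" selection can be sketched as follows. This is an illustrative fragment under assumed names, not the disclosed implementation.

```python
from collections import Counter

def least_loaded_vtep(vm_to_vtep: dict, healthy_vteps: list) -> str:
    """Pick the HEALTHY VTEP with the fewest VMs currently mapped to it."""
    load = Counter(vm_to_vtep.values())
    return min(healthy_vteps, key=lambda v: load[v])

# Mid-migration state from FIG. 8: VM1 and VM5 already moved to VTEP-A2 and
# VM6 to VTEP-A3, so VM7 goes to the least-loaded HEALTHY VTEP, VTEP-A4.
mapping = {"VM1": "VTEP-A2", "VM5": "VTEP-A2", "VM6": "VTEP-A3"}
print(least_loaded_vtep(mapping, ["VTEP-A2", "VTEP-A3", "VTEP-A4"]))  # VTEP-A4
```

A similar selector could instead rank VTEPs by an uplink performance metric such as packet rate, as the description notes.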
Claims (20)
1. A method, comprising:
monitoring, by a computer system, multiple virtual tunnel endpoints (VTEP) that are configured on the computer system for overlay networking, wherein the multiple VTEPs include a first VTEP and a second VTEP;
in response to detecting a state transition associated with the first VTEP from a first state to a second state, identifying mapping information that associates a virtualized computing instance supported by the computer system with the first VTEP; and
updating the mapping information to associate the virtualized computing instance with the second VTEP, thereby migrating the virtualized computing instance from the first VTEP to the second VTEP.
2. The method of claim 1 , wherein the method further comprises:
in response to detecting an egress packet from the virtualized computing instance to a destination, generating and sending an encapsulated packet towards the destination based on the updated mapping information, wherein the encapsulated packet includes the egress packet and an outer header identifying the second VTEP to be a source VTEP.
3. The method of claim 1 , wherein the first state is a healthy state and the second state is an unhealthy state, and wherein detecting the state transition comprises at least one of the following:
detecting the state transition to a first unhealthy state in which (a) the first VTEP has not been assigned with a valid Internet Protocol (IP) address or (b) a lease associated with the IP address has expired;
detecting the state transition to a second unhealthy state in which each and every overlay networking path via the first VTEP is down; or
detecting the state transition to a third unhealthy state that is configured by a network administrator.
4. The method of claim 1 , wherein detecting the state transition comprises: determining that the first VTEP remains in an unhealthy state after a timeout period has elapsed.
5. The method of claim 1 , wherein the method further comprises:
generating and sending a report to a management entity to cause the management entity to propagate the updated mapping information to multiple destination VTEPs.
6. The method of claim 1 , wherein identifying the mapping information comprises:
identifying the mapping information that is configured based on one of the following teaming policies: (a) load balancing among the multiple VTEPs based on a configuration parameter associated with the virtualized computing instance and (b) a failover order associated with the multiple VTEPs.
7. The method of claim 1 , wherein the method further comprises:
in response to detecting the state transition, selecting the second VTEP for the virtualized computing instance based on one of the following: a configuration parameter associated with the virtualized computing instance and number of virtualized computing instances mapped to the second VTEP.
8. A non-transitory computer-readable storage medium having stored thereon a set of instructions executable by one or more processors to perform operations comprising:
monitoring, by a computer system, multiple virtual tunnel endpoints (VTEP) that are configured on the computer system for overlay networking, wherein the multiple VTEPs include a first VTEP and a second VTEP;
in response to detecting a state transition associated with the first VTEP from a first state to a second state, identifying mapping information that associates a virtualized computing instance supported by the computer system with the first VTEP; and
updating the mapping information to associate the virtualized computing instance with the second VTEP, thereby migrating the virtualized computing instance from the first VTEP to the second VTEP.
9. The non-transitory computer-readable storage medium of claim 8 , wherein the operations further comprise:
in response to detecting an egress packet from the virtualized computing instance to a destination, generating and sending an encapsulated packet towards the destination based on the updated mapping information, wherein the encapsulated packet includes the egress packet and an outer header identifying the second VTEP to be a source VTEP.
10. The non-transitory computer-readable storage medium of claim 8 , wherein the first state is a healthy state and the second state is an unhealthy state, and wherein detecting the state transition comprises at least one of the following:
detecting the state transition to a first unhealthy state in which (a) the first VTEP has not been assigned with a valid Internet Protocol (IP) address or (b) a lease associated with the IP address has expired;
detecting the state transition to a second unhealthy state in which each and every overlay networking path via the first VTEP is down; or
detecting the state transition to a third unhealthy state that is configured by a network administrator.
11. The non-transitory computer-readable storage medium of claim 8 , wherein detecting the state transition comprises:
determining that the first VTEP remains in an unhealthy state after a timeout period has elapsed.
12. The non-transitory computer-readable storage medium of claim 8 , wherein the operations further comprise:
generating and sending a report to a management entity to cause the management entity to propagate the updated mapping information to multiple destination VTEPs.
13. The non-transitory computer-readable storage medium of claim 8 , wherein identifying the mapping information comprises:
identifying the mapping information that is configured based on one of the following teaming policies: (a) load balancing among the multiple VTEPs based on a configuration parameter associated with the virtualized computing instance and (b) a failover order associated with the multiple VTEPs.
14. The non-transitory computer-readable storage medium of claim 8 , wherein the operations further comprise:
in response to detecting the state transition, selecting the second VTEP for the virtualized computing instance based on one of the following: a configuration parameter associated with the virtualized computing instance and number of virtualized computing instances mapped to the second VTEP.
15. A computer system, comprising:
one or more processors; and
a non-transitory computer-readable storage medium having stored thereon a set of instructions executable by one or more processors to perform operations comprising:
monitoring, by a computer system, multiple virtual tunnel endpoints (VTEP) that are configured on the computer system for overlay networking, wherein the multiple VTEPs include a first VTEP and a second VTEP;
in response to detecting a state transition associated with the first VTEP from a first state to a second state, identifying mapping information that associates a virtualized computing instance supported by the computer system with the first VTEP; and
updating the mapping information to associate the virtualized computing instance with the second VTEP, thereby migrating the virtualized computing instance from the first VTEP to the second VTEP.
16. The computer system of claim 15 , wherein the operations further comprise:
in response to detecting an egress packet from the virtualized computing instance to a destination, generating and sending an encapsulated packet towards the destination based on the updated mapping information, wherein the encapsulated packet includes the egress packet and an outer header identifying the second VTEP to be a source VTEP.
17. The computer system of claim 15 , wherein the first state is a healthy state and the second state is an unhealthy state, and wherein detecting the state transition comprises at least one of the following:
detecting the state transition to a first unhealthy state in which (a) the first VTEP has not been assigned with a valid Internet Protocol (IP) address or (b) a lease associated with the IP address has expired;
detecting the state transition to a second unhealthy state in which each and every overlay networking path via the first VTEP is down; or
detecting the state transition to a third unhealthy state that is configured by a network administrator.
18. The computer system of claim 15 , wherein detecting the state transition comprises:
determining that the first VTEP remains in an unhealthy state after a timeout period has elapsed.
19. The computer system of claim 15 , wherein the operations further comprise:
generating and sending a report to a management entity to cause the management entity to propagate the updated mapping information to multiple destination VTEPs.
20. The computer system of claim 15 , wherein identifying the mapping information comprises:
identifying the mapping information that is configured based on one of the following teaming policies: (a) load balancing among the multiple VTEPs based on a configuration parameter associated with the virtualized computing instance and (b) a failover order associated with the multiple VTEPs.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US19/055,419 US20250219869A1 (en) | 2021-12-23 | 2025-02-17 | Virtual tunnel endpoint (vtep) mapping for overlay networking |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/560,284 US12231262B2 (en) | 2021-12-23 | 2021-12-23 | Virtual tunnel endpoint (VTEP) mapping for overlay networking |
| US19/055,419 US20250219869A1 (en) | 2021-12-23 | 2025-02-17 | Virtual tunnel endpoint (vtep) mapping for overlay networking |
Related Parent Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/560,284 Continuation US12231262B2 (en) | 2021-12-23 | 2021-12-23 | Virtual tunnel endpoint (VTEP) mapping for overlay networking |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20250219869A1 true US20250219869A1 (en) | 2025-07-03 |
Family
ID=86896334
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/560,284 Active 2042-12-27 US12231262B2 (en) | 2021-12-23 | 2021-12-23 | Virtual tunnel endpoint (VTEP) mapping for overlay networking |
| US19/055,419 Pending US20250219869A1 (en) | 2021-12-23 | 2025-02-17 | Virtual tunnel endpoint (vtep) mapping for overlay networking |
Family Applications Before (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/560,284 Active 2042-12-27 US12231262B2 (en) | 2021-12-23 | 2021-12-23 | Virtual tunnel endpoint (VTEP) mapping for overlay networking |
Country Status (1)
| Country | Link |
|---|---|
| US (2) | US12231262B2 (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12519725B2 (en) * | 2022-11-22 | 2026-01-06 | Dell Products L.P. | VTEP multipath data traffic forwarding system |
| CN120856500B (en) * | 2025-09-22 | 2026-01-27 | 中移(苏州)软件技术有限公司 | Data processing methods, apparatus, equipment, storage media, and computer program products |
Family Cites Families (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9397946B1 (en) * | 2013-11-05 | 2016-07-19 | Cisco Technology, Inc. | Forwarding to clusters of service nodes |
| JP6434821B2 (en) * | 2015-02-19 | 2018-12-05 | アラクサラネットワークス株式会社 | Communication apparatus and communication method |
| US10719341B2 (en) * | 2015-12-02 | 2020-07-21 | Nicira, Inc. | Learning of tunnel endpoint selections |
| JP6549996B2 (en) * | 2016-01-27 | 2019-07-24 | アラクサラネットワークス株式会社 | Network apparatus, communication method, and network system |
| US10931629B2 (en) * | 2016-05-27 | 2021-02-23 | Cisco Technology, Inc. | Techniques for managing software defined networking controller in-band communications in a data center network |
| US10454758B2 (en) * | 2016-08-31 | 2019-10-22 | Nicira, Inc. | Edge node cluster network redundancy and fast convergence using an underlay anycast VTEP IP |
| CN107846342B (en) * | 2016-09-20 | 2020-11-06 | 华为技术有限公司 | Method, device and system for forwarding VXLAN message |
| US10999196B2 (en) * | 2019-02-25 | 2021-05-04 | Vmware, Inc. | Global replication mode for overlay runtime state migration |
| US11271776B2 (en) * | 2019-07-23 | 2022-03-08 | Vmware, Inc. | Logical overlay network monitoring |
| US11881986B2 (en) * | 2020-12-30 | 2024-01-23 | Arista Networks, Inc. | Fast failover support for remote connectivity failure for a virtual tunnel |
- 2021
  - 2021-12-23: US application US 17/560,284 filed (patent US12231262B2), status Active
- 2025
  - 2025-02-17: US application US 19/055,419 filed (publication US20250219869A1), status Pending
Also Published As
| Publication number | Publication date |
|---|---|
| US20230208678A1 (en) | 2023-06-29 |
| US12231262B2 (en) | 2025-02-18 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US10432426B2 (en) | Port mirroring in a virtualized computing environment | |
| US10536362B2 (en) | Configuring traffic flow monitoring in virtualized computing environments | |
| US10560375B2 (en) | Packet flow information invalidation in software-defined networking (SDN) environments | |
| CN110971442A (en) | Migrating workloads in a multi-cloud computing environment | |
| US11641305B2 (en) | Network diagnosis in software-defined networking (SDN) environments | |
| US20250219869A1 (en) | Virtual tunnel endpoint (vtep) mapping for overlay networking | |
| US11652717B2 (en) | Simulation-based cross-cloud connectivity checks | |
| US11546242B2 (en) | Logical overlay tunnel monitoring | |
| US11627080B2 (en) | Service insertion in public cloud environments | |
| US20250274392A1 (en) | Handling virtual machine migration in a computing system with multi-site stretched gateways | |
| US20240414025A1 (en) | Managing Traffic for Endpoints in Data Center Environments to Provide Cloud Management Connectivity | |
| US11271776B2 (en) | Logical overlay network monitoring | |
| US10447581B2 (en) | Failure handling at logical routers according to a non-preemptive mode | |
| US11303701B2 (en) | Handling failure at logical routers | |
| US11005745B2 (en) | Network configuration failure diagnosis in software-defined networking (SDN) environments | |
| US11558220B2 (en) | Uplink-aware monitoring of logical overlay tunnels | |
| US11695665B2 (en) | Cross-cloud connectivity checks | |
| US10938632B2 (en) | Query failure diagnosis in software-defined networking (SDN) environments | |
| US12413550B2 (en) | Media access control (MAC) address assignment for virtual network interface cards (VNICS) | |
| US20240031290A1 (en) | Centralized service insertion in an active-active logical service router (sr) cluster | |
| CN119232644A (en) | Redundant containerized virtual router for use with virtual private cloud | |
| US20230163997A1 (en) | Logical overlay tunnel selection | |
| US20210226869A1 (en) | Offline connectivity checks | |
| US20240406104A1 (en) | Adaptive traffic forwarding over multiple connectivity services | |
| US11658899B2 (en) | Routing configuration for data center fabric maintenance |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: VMWARE LLC, CALIFORNIA. Free format text: CHANGE OF NAME; ASSIGNOR: VMWARE, INC.; REEL/FRAME: 070418/0748. Effective date: 20231121. Owner name: VMWARE, INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: MATHEW, SUBIN CYRIAC; RAMAN, CHIDAMBARESWARAN; RODNEY, PRERIT; AND OTHERS; SIGNING DATES FROM 20220210 TO 20220503; REEL/FRAME: 070414/0164 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |