CN115174583A

CN115174583A - Server load balancing method based on programmable data plane

Info

Publication number: CN115174583A
Application number: CN202210744815.5A
Authority: CN
Inventors: 张栋; 郭新共
Original assignee: Fuzhou University
Current assignee: Fuzhou University
Priority date: 2022-06-28
Filing date: 2022-06-28
Publication date: 2022-10-11
Anticipated expiration: 2042-06-28
Also published as: CN115174583B

Abstract

The invention relates to a server load balancing method based on a programmable data plane. In a server cluster of a data center, the current actual performance of each server is measured from its real-time load and available computing power and expressed as a weight. The weight is calculated by the proposed WMLCF (Weighted M-Least-Connection First) algorithm, so that it reflects the server's current actual performance more accurately. The algorithm periodically updates the M servers with the highest current performance and delivers them to the data plane as table entries for load-balancing scheduling of new connection requests. In addition, the load balancing method is implemented as a stateful method, and a Bloom filter is introduced to ensure connection consistency while server weights are being updated. Finally, table entries are stored in compressed form to reduce consumption of the limited SRAM of the programmable data plane, so that a large number of connection states can be kept and load balancing is maintained under highly concurrent connection requests. The invention can therefore achieve high-performance server load balancing.

Description

Server Load Balancing Method Based on Programmable Data Plane

Technical Field

The invention relates to the field of programmable networks and load balancing of server clusters in data centers, and in particular to a stateful load balancing method, based on a programmable data plane, for server clusters in a data center.

Background Art

A large-scale data center usually provides thousands of services, such as e-commerce, social networking, and video services. Each service is assigned a virtual IP (VIP) address and is served jointly by a cluster of physical servers, each of which has its own direct IP (DIP) address. VIP traffic accounts for about 44% of the traffic in a data center, and all of it must pass through a load balancer that schedules it across the backend servers. An efficient load balancing scheme is therefore needed to map and forward every packet of VIP traffic to the corresponding backend server. At the same time, all packets belonging to the same connection must be sent to the same server, a property known as connection consistency. If packets of one connection are forwarded to different servers, the connection is broken, additional forwarding delay is introduced, and the end-user experience suffers.

Existing load balancers are implemented in software, in hardware, or as software-hardware hybrids. Software load balancers run on off-the-shelf servers and offer low cost, high scalability, and high flexibility, but suffer from two basic problems: high latency and low throughput. To ensure connection consistency, software load balancers usually use consistent hashing to distribute connection requests to backend servers. To overcome the limited performance of software load balancers, hybrid load balancers exploit the high performance of hardware such as switches; however, most of them still schedule new connection requests across backend servers using consistent hashing alone. In recent years, switch vendors have begun to develop programmable switching ASICs, such as Intel's Tofino series. These ASICs make the switch data plane programmable, and because they combine high performance (for example, Tb-level throughput) with programmability, many schemes already offload load balancing to the programmable data plane. Such schemes achieve higher throughput and lower forwarding delay, but most of them still schedule backend servers with ordinary hashing or variants of consistent hashing.

Summary of the Invention

Most existing load balancing schemes rely on hashing or variants of consistent hashing to distribute connection requests among servers. They do not consider differences in the actual state of the servers, that is, their different computing capabilities and actual loads. This can lead to unbalanced load distribution: high-performance servers may sit idle while low-performance servers are overloaded, so server resources are not used effectively. In view of this, the present invention proposes a stateful load balancing method based on a programmable data plane. The method schedules connection requests according to each server's available processing capability and real-time load, and its stateful design ensures connection consistency.

The invention aims to implement, on a programmable data plane, a load balancing method that improves the fairness of server load distribution while guaranteeing high-performance forwarding and connection consistency and reducing switch SRAM usage during load-balancing scheduling.

To minimize imbalance in server load distribution, the invention proposes the WMLCF (Weighted M-Least-Connection First) algorithm. The algorithm calculates a dynamic weight for each server from its available computing power and real-time load and, in each weight update cycle, uses these weights to select the M servers with the most remaining computing power to receive connection requests. In addition, the invention introduces a Bloom filter to avoid violations of connection consistency that could otherwise occur while backend server weights are being updated. Finally, because switch memory is limited, the invention stores table entries in the data plane in compressed form, which saves switch SRAM while allowing a large number of connection states to be stored, and thus preserves load balancing performance under highly concurrent connection requests.

The overall implementation of the invention comprises the switch data plane and the control plane:

1. The data plane is responsible for processing and forwarding packets. This path does not involve the control plane, which keeps load balancing fast. It mainly includes the following components:

(1) Connection table (ConnTable): stores the state of each connection, that is, the mapping from each connection (five-tuple) to a DIP server. When a packet of an existing connection arrives, it is matched against the connection table to obtain the corresponding DIP and is forwarded directly, so that all packets of the same connection reach the same DIP server (connection consistency). A new connection request proceeds to the following steps.

(2) Service table (VIPTable): maintains all VIPs, that is, services, provided by the server cluster. If a new connection finds no match in the service table, the requested service is not provided by this cluster and the packet is dropped.

(3) Forwarding table (ForwardingTable): assigns a DIP server to each new connection request. It stores the DIP pool for each VIP, containing only the DIPs of the M servers that the control plane has calculated to have the highest actual available computing power.

(4) Server table (DIPTable): stores the actual DIPs of all servers in the data center and is used for matching and forwarding.

(5) Bloom filter: a compact and efficient data structure commonly used for membership queries. It stores the new connections that arrive while server weights are being adjusted, so that every existing connection is found either in the connection table or in the Bloom filter, which preserves connection consistency during weight adjustment.

2. The control plane is responsible for calculating and updating the real-time server weights and for delivering the corresponding table entries to the data plane. It mainly includes the following modules:

(1) Connection statistics module (Conn_Count Module): counts the long and short connections on each server. Long and short connections are detected in the data plane, which then notifies the control plane.

(2) Server weight module (DIP_Weight Module): stores the weights of the DIP servers for each VIP and periodically delivers the M DIP servers with the smallest weights to the data plane as table entries for load-balancing scheduling.

(3) Weight calculation module (WMLCF Module): periodically calculates the server weights by applying the proposed WMLCF algorithm to each server's real-time remaining computing power and real-time load, then stores the results in the server weight module.
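The lookup order of the data-plane components described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class and method names are hypothetical, the choice among the M pooled servers uses a CRC32 hash of the five-tuple (the text does not say how one of the M servers is picked), the Bloom filter is left out, and connection-table insertion is simplified to an inline dictionary write.

```python
import zlib
from dataclasses import dataclass, field
from typing import Optional

# (src IP, dst IP, src port, dst port, protocol); the destination IP is the VIP.
FiveTuple = tuple[str, str, int, int, int]


@dataclass
class DataPlaneSketch:
    conn_table: dict = field(default_factory=dict)        # ConnTable: five-tuple -> DIP id
    vip_table: set = field(default_factory=set)           # VIPTable: VIPs served by the cluster
    forwarding_table: dict = field(default_factory=dict)  # ForwardingTable: VIP -> M DIP ids
    dip_table: dict = field(default_factory=dict)         # DIPTable: DIP id -> (ip, port)

    def process(self, pkt: FiveTuple) -> Optional[tuple]:
        """Return the (ip, port) the packet is forwarded to, or None if dropped."""
        dip_id = self.conn_table.get(pkt)
        if dip_id is None:                      # no ConnTable hit: new connection
            vip = pkt[1]
            if vip not in self.vip_table:       # service not offered here: drop
                return None
            pool = self.forwarding_table[vip]
            # Assumed selection rule: hash the five-tuple over the M candidates.
            dip_id = pool[zlib.crc32(repr(pkt).encode()) % len(pool)]
            self.conn_table[pkt] = dip_id       # remember the mapping (stateful)
        return self.dip_table[dip_id]


dp = DataPlaneSketch(
    vip_table={"10.0.0.100"},
    forwarding_table={"10.0.0.100": [1, 2]},
    dip_table={1: ("192.168.1.1", 80), 2: ("192.168.1.2", 80)},
)
pkt = ("1.2.3.4", "10.0.0.100", 5555, 80, 6)
first = dp.process(pkt)
```

Because the mapping is recorded in `conn_table`, later packets of `pkt` keep going to the same server even if the control plane replaces the pool in `forwarding_table`.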

The invention is specifically designed with the following steps:

A server load balancing method based on a programmable data plane, characterized in that it comprises the following steps:

Step S1: the switch control plane evaluates the real-time load of each server in the server cluster;

Step S2: the switch control plane evaluates the real-time available computing power of each server in the server cluster;

Step S3: the switch control plane uses the WMLCF algorithm to periodically calculate and update the real-time weight of each server in the cluster from its real-time load and available computing power, as a measure of the server's current actual performance;

Step S4: the M servers with the smallest current weights (that is, the highest performance) are periodically delivered to the data plane as table entries for load-balancing scheduling;

Step S5: connection consistency of data-plane packet forwarding is guaranteed while server weights are periodically adjusted;

Step S6: table entries are compressed to reduce memory usage, so that a large number of connection states can be kept in the limited SRAM of the data plane and load-balancing scheduling remains fast;

The WMLCF algorithm calculates a dynamic weight for each server from its available computing power and real-time load, so that in each weight update cycle the M servers with the most remaining computing power are selected, according to their weights, to receive connection requests.

Further, in step S1, the real-time load evaluation of a server includes detecting long and short connections on the server, counting the active long and short connections, assigning weights to long and short connections, and calculating the server's real-time load.

Further, in step S1, long and short connections on a server are detected in the data plane using a threshold on the number of packets per connection; the number of active long and short connections is counted from the SYN packets (which mark the start of a TCP connection) and the FIN/RST packets (which mark its end); the weights of long and short connections are preconfigured, and a larger weight indicates that the connection imposes a higher load; the real-time load of a server is the sum, over the long and short connection types, of the number of active connections of that type on the server multiplied by the weight of that type.

Further, in step S2, the evaluation of a server's real-time available computing power includes selecting the evaluation indicators and calculating the real-time available computing power.

Further, in step S2, the indicators selected for evaluating server computing power are the remaining available utilization of CPU and memory, which strongly affect server performance; the two indicators are linearly weighted according to how strongly each affects server performance, and the result is taken as the server's real-time available computing power.

Further, in step S3, measuring the current actual performance of a server includes combining the real-time load from step S1 with the real-time available computing power from step S2 and calculating the server's dynamic weight.

Further, in step S3, the dynamic weight of a server is calculated by the WMLCF algorithm and measures the server's current actual performance. The WMLCF algorithm is based on the server's real-time load and real-time available computing power: the dynamic weight is the ratio of the real-time load from step S1 to the real-time available computing power from step S2. A smaller weight means a lower real-time load and more available computing power, that is, higher current actual performance;

The specific implementation of the WMLCF algorithm includes:

1. Real-time server load evaluation

Long and short connections are detected in the data plane using a packet-count threshold, and the control plane is then notified. The control plane classifies the active long and short connections on each server and assigns them different weights; a larger weight indicates that the connection imposes a higher load;

Let a server cluster be S = {S_1, S_2, ..., S_n} and let the real-time load of server S_i be L(S_i). Then:

$$L(S_i) = \sum_{j=0}^{1} w_j \, C_{ij} \tag{1}$$

where C_ij is the number of active connections of type j on server S_i; j = 0 and j = 1 denote short and long connections, respectively; and w_j is the weight of connection type j. From formula (1), a smaller L(S_i) means a lower real-time load on server S_i;
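As a small worked example of formula (1), the load is the weighted sum of the active short and long connection counts. The sketch below assumes illustrative weights (1 for short connections, 4 for long connections); the text only states that the weights are preconfigured, and the function name is hypothetical.

```python
SHORT, LONG = 0, 1  # connection types j = 0 (short) and j = 1 (long)


def real_time_load(active_counts: dict, type_weights: dict) -> float:
    """Formula (1): L(S_i) = sum over j of w_j * C_ij."""
    return sum(w * active_counts.get(j, 0) for j, w in type_weights.items())


# Assumed weights: a long connection counts four times as much as a short one.
weights = {SHORT: 1.0, LONG: 4.0}

# Server with 120 active short connections and 10 active long connections:
# 1.0 * 120 + 4.0 * 10 = 160.0
load = real_time_load({SHORT: 120, LONG: 10}, weights)
```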

2. Server available computing power evaluation

Only the two indicators that strongly affect server performance are considered: the remaining available utilization of CPU and of memory. Their linear weighting gives the server's remaining available computing power C(S_i):

$$C(S_i) = \alpha A_C(S_i) + \beta A_M(S_i), \quad \alpha + \beta = 1 \tag{2}$$

where A_C(S_i) and A_M(S_i) are the remaining available utilization of the CPU and memory of server S_i, respectively; the coefficients α and β depend on how strongly CPU and memory affect server performance, and a stronger effect gives a larger coefficient. From formula (2), a larger C(S_i) means that server S_i has more remaining available computing power;

3. Server weight calculation

The server weight is the ratio of the real-time load to the remaining available computing power. From formulas (1) and (2), the weight W(S_i) of server S_i is:

$$W(S_i) = \frac{L(S_i)}{C(S_i)} \tag{3}$$

From formula (3), a smaller weight W(S_i) means that server S_i has a lower real-time load L(S_i) and more remaining available computing power C(S_i), that is, higher current actual performance.
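Formulas (2) and (3) and the selection of the M lowest-weight servers can be sketched together as below. The coefficient values (α = 0.6, β = 0.4), the sample server figures, and all names are assumptions for illustration; treating a server with no spare capacity as having infinite weight is also an assumption, since the text does not discuss that case.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class ServerState:
    name: str
    load: float      # L(S_i) from formula (1)
    cpu_free: float  # A_C(S_i): fraction of CPU still available, 0..1
    mem_free: float  # A_M(S_i): fraction of memory still available, 0..1


def capacity(s: ServerState, alpha: float = 0.6, beta: float = 0.4) -> float:
    """Formula (2): C(S_i) = alpha * A_C(S_i) + beta * A_M(S_i), alpha + beta = 1."""
    if not math.isclose(alpha + beta, 1.0):
        raise ValueError("alpha + beta must equal 1")
    return alpha * s.cpu_free + beta * s.mem_free


def weight(s: ServerState, alpha: float = 0.6, beta: float = 0.4) -> float:
    """Formula (3): W(S_i) = L(S_i) / C(S_i); smaller means better placed."""
    c = capacity(s, alpha, beta)
    # Assumption: a server with no spare capacity is never preferred.
    return math.inf if c <= 0 else s.load / c


def select_m(servers, m: int, alpha: float = 0.6, beta: float = 0.4):
    """Return the m servers with the smallest weights, best first."""
    return heapq.nsmallest(m, servers, key=lambda s: weight(s, alpha, beta))


cluster = [
    ServerState("s1", load=160, cpu_free=0.5, mem_free=0.5),  # W is about 320
    ServerState("s2", load=100, cpu_free=0.2, mem_free=0.2),  # W is about 500
    ServerState("s3", load=200, cpu_free=0.8, mem_free=0.8),  # W is about 250
    ServerState("s4", load=50, cpu_free=0.0, mem_free=0.0),   # W is infinite
]
chosen = [s.name for s in select_m(cluster, m=2)]
```

Note that `s3` carries the highest load yet is ranked first, because its spare capacity is also the largest; this is the effect the ratio in formula (3) is meant to capture.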

Further, the M servers selected in step S4 are chosen on the basis of the server weights calculated in step S3: the M servers with the smallest weights are delivered to the data plane as table entries for load-balancing scheduling, where M can be configured dynamically according to the size of the server cluster.

Further, in step S5, table-entry delivery incurs a delay while server weights are being periodically updated. Without protection, the first few packets of a new connection could be handled by the original server while its later packets are handled by a newly delivered server, which would violate connection consistency. To avoid this, a Bloom filter stores the new connections that arrive during this period, and all of these connections are forwarded to the original (old) server, which ensures connection consistency.

Further, in step S6, to store a large number of connection states in the limited SRAM of the programmable data plane, the entries of the connection table are stored in compressed form. For each connection state, the match key (Key) of the entry does not store the five-tuple (source IP address, destination IP address, source port, destination port, protocol) directly but stores its hash value; the action value (Action) of the entry does not store the actual server IP address and port directly but stores the ID of the server they map to.
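The entry compression of step S6 can be illustrated with the following sketch. It uses the field widths given in the detailed description (a 24-bit hash as key and a 16-bit server ID as action). The hash function is an assumption: the text does not name one, so a truncated CRC32 stands in here, and the function names are hypothetical.

```python
import socket
import struct
import zlib


def five_tuple_bytes(src_ip: str, dst_ip: str, src_port: int, dst_port: int, proto: int) -> bytes:
    """Uncompressed IPv4 match key: 4 + 4 + 2 + 2 + 1 = 13 bytes."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HHB", src_port, dst_port, proto))


def compress_entry(five_tuple: bytes, dip_id: int) -> bytes:
    """Pack a 24-bit key digest and a 16-bit DIP_ID into 5 bytes."""
    digest = zlib.crc32(five_tuple) & 0xFFFFFF  # assumed hash, truncated to 24 bits
    return digest.to_bytes(3, "big") + dip_id.to_bytes(2, "big")


def decompress_entry(entry: bytes) -> tuple:
    """Return (24-bit digest, DIP_ID) from a 5-byte entry."""
    return int.from_bytes(entry[:3], "big"), int.from_bytes(entry[3:], "big")


key = five_tuple_bytes("1.2.3.4", "10.0.0.100", 5555, 80, 6)
entry = compress_entry(key, dip_id=1234)
```

A truncated digest can map two different five-tuples to the same key; how such collisions are handled is not covered by this sketch.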

In the invention and its preferred embodiments, the current actual performance of each server in a data center server cluster is measured from its real-time load and available computing power and expressed as a weight. The weight is calculated by the proposed WMLCF (Weighted M-Least-Connection First) algorithm and therefore reflects the server's current actual performance more accurately. The algorithm periodically updates the M servers with the highest current performance (smallest weights) and delivers them to the data plane as table entries for load-balancing scheduling of new connection requests. Because scheduling takes place in the data plane, load balancing is fast, and servers with higher performance handle more new connection requests, which improves both the balance of the server load distribution and the utilization of server resources. In addition, the load balancing method is implemented as a stateful method, and a Bloom filter is introduced to ensure connection consistency while server weights are being updated. Finally, table entries are stored in compressed form to reduce consumption of the limited SRAM of the programmable data plane, so that a large number of connection states can be kept and load balancing is maintained under highly concurrent connection requests. The invention can therefore achieve high-performance server load balancing.

Brief Description of the Drawings

The invention is described in further detail below with reference to the accompanying drawings and specific embodiments;

FIG. 1 is a framework diagram of the server load balancing method based on a programmable data plane according to an embodiment of the invention;

FIG. 2 is a schematic diagram of the pseudocode of the WMLCF (Weighted M-Least-Connection First) algorithm proposed by the server load balancing method based on a programmable data plane according to an embodiment of the invention;

FIG. 3 is a flowchart of the server load balancing method based on a programmable data plane according to an embodiment of the invention.

Detailed Description

To make the features and advantages of this patent easier to understand, embodiments are described in detail below:

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the application. Unless otherwise specified, all technical and scientific terms used in this specification have the meanings commonly understood by a person of ordinary skill in the art to which this application belongs.

It should also be noted that the terminology used here serves only to describe specific embodiments and is not intended to limit the exemplary embodiments of this application. As used here, unless the context clearly indicates otherwise, the singular forms are intended to include the plural forms. Furthermore, the terms "comprising" and "including", when used in this specification, indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.

As shown in FIGS. 1 to 3, the WMLCF algorithm proposed in this embodiment calculates server weights from each server's real-time load and available computing power, then uses these weights to select the M servers with the highest overall performance and delivers them to the data plane as table entries for load-balancing scheduling. The pseudocode of the WMLCF algorithm is shown in FIG. 2, and the algorithm works as follows:

1. Real-time server load evaluation

The load of a server depends on its number of active connections. Large and small flows (long and short connections) coexist in a data center, and their traffic distribution follows the "80/20" rule. Compared with short connections, long connections carry larger packets and last longer, so they consume more server resources and amount to a higher load. To reflect the real-time server load more accurately, long and short connections are therefore detected in the data plane using a packet-count threshold, and the control plane is then notified. The control plane classifies the active long and short connections on each server and assigns them different weights; a larger weight indicates that the connection imposes a higher load. The number of active connections is counted from the SYN packets (which mark the start of a TCP connection) and the FIN/RST packets (which mark its end).
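The counting scheme just described can be sketched in Python as follows. The threshold value, the class name, and the per-connection bookkeeping are assumptions for illustration; in the described design, detection happens in the switch data plane and only the resulting counts reach the control plane.

```python
from collections import defaultdict

SHORT, LONG = 0, 1


class ConnTracker:
    """Tracks active short/long connection counts per server (hypothetical helper)."""

    def __init__(self, long_threshold: int = 100):
        self.long_threshold = long_threshold        # assumed packet-count threshold
        self.conns = {}                             # conn id -> [server, packets seen]
        self.active = defaultdict(lambda: [0, 0])   # server -> [short count, long count]

    def on_packet(self, conn, server=None, syn=False, fin_or_rst=False):
        if syn and conn not in self.conns:          # SYN marks the start of a connection
            self.conns[conn] = [server, 0]
            self.active[server][SHORT] += 1         # every connection starts as short
        state = self.conns.get(conn)
        if state is None:                           # packet of an untracked connection
            return
        state[1] += 1
        srv = state[0]
        if state[1] == self.long_threshold:         # threshold reached: reclassify as long
            self.active[srv][SHORT] -= 1
            self.active[srv][LONG] += 1
        if fin_or_rst:                              # FIN/RST marks the end of a connection
            kind = LONG if state[1] >= self.long_threshold else SHORT
            self.active[srv][kind] -= 1
            del self.conns[conn]


tracker = ConnTracker(long_threshold=3)
tracker.on_packet("a", "s1", syn=True)
tracker.on_packet("a")
tracker.on_packet("a")                  # third packet: "a" becomes a long connection
tracker.on_packet("b", "s1", syn=True)  # "b" stays short
counts_mid = list(tracker.active["s1"])
```

The per-server pair of counts is exactly the input C_ij that the load formula needs.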

Let a server cluster be S = {S_1, S_2, ..., S_n} and let the real-time load of server S_i be L(S_i). Then:

$$L(S_i) = \sum_{j=0}^{1} w_j \, C_{ij} \tag{1}$$

where C_ij is the number of active connections of type j on server S_i; j = 0 and j = 1 denote short and long connections, respectively; and w_j is the weight of connection type j. From formula (1), a smaller L(S_i) means a lower real-time load on server S_i.

2. Server available computing power evaluation

Servers in a cluster are heterogeneous, that is, they have different hardware configurations, so considering only the real-time load is not enough: the actual performance of a server also depends on its remaining available computing power. The main indicators that affect a server's remaining computing power are CPU, memory, disk type, I/O speed, and network bandwidth. To reduce computational overhead, only the two indicators that strongly affect server performance are considered, namely the remaining available utilization of CPU and of memory. Their linear weighting gives the server's remaining available computing power C(S_i):

$$C(S_i) = \alpha A_C(S_i) + \beta A_M(S_i), \quad \alpha + \beta = 1 \tag{2}$$

where A_C(S_i) and A_M(S_i) are the remaining available utilization of the CPU and memory of server S_i, respectively; the coefficients α and β depend on how strongly CPU and memory affect server performance, and a stronger effect gives a larger coefficient. From formula (2), a larger C(S_i) means that server S_i has more remaining available computing power.

3. Server weight calculation

The current actual performance of a server depends on its real-time load and its available processing capability. This performance is expressed as a weight: the server weight is the ratio of its real-time load to its remaining available computing power. From formulas (1) and (2), the weight W(S_i) of server S_i is:

$$W(S_i) = \frac{L(S_i)}{C(S_i)} \tag{3}$$

From formula (3), a smaller weight W(S_i) means that server S_i has a lower real-time load L(S_i) and more remaining available computing power C(S_i), that is, higher current actual performance.

In the WMLCF algorithm, a server's weight changes dynamically with its real-time load and remaining available computing power, so it reflects the server's current actual performance more accurately. The M servers with the smallest weights in the cluster (M can be set according to the cluster size) are delivered to the data plane as table entries for load-balancing scheduling of new requests. Delivering only the single server with the smallest weight is avoided because a burst of connection requests arriving in a short time could overload that one server. Selecting several servers to share new connection requests avoids this problem while still ensuring that servers with higher actual performance handle more connection requests.

In addition, connection consistency may be violated while server weights are being updated. Delivering table entries to the data plane requires the control plane and therefore incurs a delay (on the order of milliseconds). If a new connection request arrives during the weight update and entry delivery, the first packet of the new connection may arrive before the new entry is installed and be handled by the old server; once the new entry has been installed, the later packets of the same connection would select the new server, which violates connection consistency. To solve this, the invention introduces a Bloom filter that stores the new connections arriving during the server weight update. All of these connections are handled by the old server, which preserves connection consistency, while new connections that are not in the Bloom filter are handled by the new server. In the data plane, the Bloom filter can be implemented with register objects and occupies little SRAM.
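The role of the Bloom filter during a weight update can be sketched as below. This is an illustrative software model, not switch code: the filter size, the number of hash functions, the use of salted BLAKE2b as the hash family, and the function `choose_pool` are all assumptions, and the point at which the filter is cleared is not specified in the text and is left out.

```python
import hashlib


class BloomFilter:
    """Bit-array Bloom filter; in the switch this role is played by registers."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # One salted hash per index gives num_hashes bit positions for the key.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(key, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def choose_pool(conn: bytes, updating: bool, transit: BloomFilter, old_pool, new_pool):
    """Pick the server pool for a connection that has no ConnTable entry yet."""
    if conn in transit:          # first seen during an update: stay on the old pool
        return old_pool
    if updating:                 # update in progress: record it and use the old pool
        transit.add(conn)
        return old_pool
    return new_pool              # otherwise the freshly installed pool applies


transit = BloomFilter()
old_pool, new_pool = ["dip-old"], ["dip-new"]
during = choose_pool(b"conn-a", True, transit, old_pool, new_pool)
after = choose_pool(b"conn-a", False, transit, old_pool, new_pool)
```

A Bloom filter can report false positives, in which case an unrelated new connection would also be sent to the old pool; that affects balance slightly but not consistency.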

At the same time, the SRAM of existing programmable switch ASICs is limited (about 100 MB), so it cannot directly store the roughly 10 million connections found in a data center. For IPv4 connections, each connection state requires a 13-byte match key (the five-tuple) and a 6-byte action value (DIP:port), that is, 19 bytes per entry, so a connection table holding 10 million connections consumes at least 190 MB, which exceeds the roughly 100 MB of SRAM generally available on existing programmable switches. To store this many connections, we compress the entries of the connection table: for the match key we store a 24-bit hash of the five-tuple, and for the action value we store a DIP_ID. Because a large data center typically has tens of thousands of servers, DIP_ID is set to 16 bits. After compression, one connection state takes only 5 bytes, and 10 million connections consume only 50 MB of SRAM. This is sufficient to store connections on the order of millions in existing programmable switch ASICs.
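The memory figures above can be checked with a short sketch that also packs one compressed entry. The entry sizes are taken from the description; the function names, the use of CRC32 truncated to 24 bits as the hash, and the big-endian byte layout are assumptions made for illustration, since the hash function and encoding are not specified here.

```python
import zlib

KEY_BYTES = 13        # uncompressed five-tuple match key
ACTION_BYTES = 6      # uncompressed DIP:port action value
HASH_BITS = 24        # compressed match key
DIP_ID_BITS = 16      # compressed action value
COMPRESSED_BYTES = (HASH_BITS + DIP_ID_BITS) // 8
CONNECTIONS = 10_000_000


def table_size_mb(entry_bytes: int, connections: int = CONNECTIONS) -> float:
    """Connection table size in MB (1 MB = 10**6 bytes)."""
    return entry_bytes * connections / 1_000_000


def compress_entry(five_tuple: bytes, dip_id: int) -> bytes:
    """Pack a 24-bit hash of the five-tuple and a 16-bit DIP_ID."""
    if not 0 <= dip_id < (1 << DIP_ID_BITS):
        raise ValueError("dip_id does not fit in 16 bits")
    # Assumption: CRC32 truncated to its low 24 bits as the key hash.
    digest = zlib.crc32(five_tuple) & ((1 << HASH_BITS) - 1)
    return digest.to_bytes(3, "big") + dip_id.to_bytes(2, "big")


print(table_size_mb(KEY_BYTES + ACTION_BYTES))  # uncompressed table
print(table_size_mb(COMPRESSED_BYTES))          # compressed table
```

Because the match key is a hash, two different connections can occasionally map to the same 24-bit value; how such collisions are handled is outside the scope of this sketch.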

As will be appreciated by those skilled in the art, embodiments of the present invention may be provided as a method, an apparatus, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code.

The present invention is described with reference to flowcharts of methods, devices (apparatus), and computer program products according to embodiments of the invention. It should be understood that each flow in the flowcharts, and combinations of flows in the flowcharts, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts.

The above are only preferred embodiments of the present invention and do not limit the present invention in any other form. Any person skilled in the art may use the technical content disclosed above to change or modify it into equivalent embodiments. However, any simple modification, equivalent change, or variation made to the above embodiments according to the technical essence of the present invention, without departing from the content of its technical solution, still falls within the protection scope of the technical solution of the present invention.

This patent is not limited to the preferred embodiment described above. Under the teaching of this patent, anyone can derive other forms of server load balancing method based on a programmable data plane, and all equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the coverage of this patent.

Claims

1. A server load balancing method based on a programmable data plane, characterized by comprising the following steps:

step S1: the switch control plane evaluates the real-time load of different servers in the server cluster;

step S2: the switch control plane evaluates the real-time available computing power of different servers in the server cluster;

step S3: the switch control plane periodically calculates and updates the real-time weight of each server in the server cluster using a WMLCF algorithm, according to the real-time load and available computing power of each server, so as to measure the current actual performance of the servers;

step S4: the M servers with the smallest current weights are selected and periodically issued to the data plane in the form of table entries for load balancing scheduling;

step S5: while the server weights are periodically adjusted, the connection consistency of packet forwarding on the data plane is guaranteed;

step S6: the table entries are compressed to reduce memory usage;

wherein the WMLCF algorithm calculates the dynamic weights of the servers based on their available computing power and real-time load, so that in each server weight update period the M servers with the higher actual remaining computing power are selected, according to their weights, to receive connection requests.

2. The server load balancing method based on a programmable data plane according to claim 1, characterized in that: in step S1, the real-time load evaluation of the server comprises detecting long and short connections on the server, counting the active long and short connections, and calculating the weights of the long and short connections and the real-time load of the server.

3. The server load balancing method based on a programmable data plane according to claim 2, characterized in that: in step S1, long and short connections on the server are detected on the data plane based on a threshold on the number of packets of a connection; the numbers of active long and short connections are detected and counted according to the SYN packets and FIN/RST packets of TCP connections; the weights of the long and short connections are configured in advance, a larger weight indicating a higher connection load; and the real-time load of the server is the sum of the products of the numbers of active long and short connections on the server and their respective weights.

4. The server load balancing method based on a programmable data plane according to claim 3, characterized in that: in step S2, the evaluation of the real-time available computing power of the server comprises selecting computing power evaluation indexes and calculating the real-time available computing power.

5. The server load balancing method based on a programmable data plane according to claim 4, characterized in that: in step S2, the indexes selected for the server computing power evaluation comprise the remaining available utilization of the CPU and of the memory, which have an important influence on server performance; the two indexes are linearly weighted according to their degree of influence on server performance to give the real-time available computing power of the server.

6. The server load balancing method based on a programmable data plane according to claim 5, characterized in that: in step S3, measuring the current actual performance of the server comprises combining the real-time load of the server from step S1 with the real-time available computing power of the server from step S2, and calculating the dynamic weight of the server.

7. The server load balancing method based on a programmable data plane according to claim 6, characterized in that: in step S3, the dynamic weight of the server is calculated by the WMLCF algorithm to measure the current actual performance of the server, wherein the WMLCF algorithm is based on the real-time load and the real-time available computing power of the server, namely, the dynamic weight of the server is the ratio of the real-time load of the server from step S1 to the real-time available computing power of the server from step S2; a smaller server weight indicates a lower real-time load together with a higher available computing power, that is, a higher current actual performance of the server;

the specific implementation process of the WMLCF algorithm comprises the following steps:

1. Server real-time load assessment

Long and short connections are detected on the data plane using a method based on a packet-count threshold and are then reported to the control plane, which classifies the active long and short connections on each server and assigns them different weights, a larger weight indicating a higher connection load;

let the server cluster be $S = \{S_1, S_2, \ldots, S_n\}$ and let the real-time load of server $S_i$ be $L(S_i)$; then:

$$L(S_i) = \sum_{j=0}^{1} C_{ij}\, w_j \qquad (1)$$

wherein $C_{ij}$ represents the number of active connections of type $j$ on server $S_i$; $j = 0$ and $j = 1$ represent a short connection and a long connection, respectively; $w_j$ represents the weight of connection type $j$; from equation (1), the smaller the value of $L(S_i)$, the lower the real-time load of server $S_i$;

2. Server available computing power assessment

Only the two indexes that have an important influence on server performance are considered, namely the remaining available utilization of the CPU and the remaining available utilization of the memory; the two indexes are linearly weighted to obtain the remaining available computing power $C(S_i)$ of the server:

$$C(S_i) = \alpha A_C(S_i) + \beta A_M(S_i), \quad \alpha + \beta = 1 \qquad (2)$$

wherein $A_C(S_i)$ and $A_M(S_i)$ respectively represent the remaining available utilization of the CPU and of the memory of server $S_i$; the coefficients $\alpha$ and $\beta$ depend on the degree of influence of the CPU and the memory, respectively, on server performance, a larger influence corresponding to a larger coefficient; from equation (2), the larger the value of $C(S_i)$, the more remaining available computing power server $S_i$ has;

3. Server weight calculation

The server weight is the ratio of the real-time load to the remaining available computing power; based on equations (1) and (2), the weight $W(S_i)$ of server $S_i$ is:

$$W(S_i) = L(S_i) / C(S_i) \qquad (3)$$

from equation (3), the smaller the weight $W(S_i)$ of server $S_i$, the lower its real-time load $L(S_i)$ and the more remaining available computing power $C(S_i)$ it has, which means a higher current actual performance.

8. The server load balancing method based on a programmable data plane according to claim 7, characterized in that: in step S4, the M servers are selected based on the server weights calculated in step S3, and the M servers with the smallest weights are issued to the data plane in the form of table entries for load balancing scheduling; wherein M can be dynamically configured according to the scale of the server cluster.

9. The server load balancing method based on a programmable data plane according to claim 1, characterized in that: in step S5, during the periodic update of the server weights, the delay in issuing table entries could cause the first several packets of a new connection to be processed by the original server and its subsequent packets to be processed by a newly issued server, which violates connection consistency; to avoid this problem, a Bloom filter is used to store the new connections arriving during this period, and these new connections are all forwarded to and processed by the original server, thereby guaranteeing connection consistency.

10. The server load balancing method based on a programmable data plane according to claim 1, characterized in that: in step S6, in order to store a large number of connection states in the limited SRAM space of the programmable data plane, the table entries in the connection table are compressed before being stored: for each connection state, the match key (Key) of its entry does not directly store the five-tuple, namely the source IP address, the destination IP address, the source port, the destination port and the protocol, but stores its hash value; and the action value (Action) of the entry does not directly store the actual server IP address and port, but stores the server number ID to which they are mapped.