CN116489154A - A load sharing method and related device - Google Patents
- Publication number: CN116489154A
- Application number: CN202210266506.1A
- Authority: CN (China)
- Legal status: Pending
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/25—Routing or path finding in a switch fabric
Abstract
A load sharing method and related device are disclosed, belonging to the field of communication technologies and applicable to scenarios in which a data stream is forwarded over multiple equal-cost paths. When a first data stream is forwarded, a plurality of equivalent paths corresponding to the first data stream are acquired, and a target path is determined from among the equivalent paths. When the target path is determined by the number of data streams forwarded by each equivalent path, where the message interval within each data stream is smaller than a set time interval, load balancing is ensured at the source. When the target path is determined by the load levels corresponding to the loads of the equivalent paths, the load level corresponding to the target path is the lightest load level currently allowed, and the load level of each equivalent path is determined according to its current load and the historical loads of the equivalent paths, so that when the loads of the equivalent paths change, the current load balancing strategy can be adjusted quickly and load balancing performance is improved.
Description
The present application claims priority from Chinese patent application No. 202210041176.6, entitled "A load sharing method and related apparatus", filed on January 14, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of communications technologies, and in particular, to a load sharing method and a related device.
Background
In artificial intelligence (AI) computing, a top of rack (ToR) switch in a data center is responsible for forwarding data streams generated by local servers, through spine switches, to remote servers connected to other switches. The ToR switch is provided with a plurality of output interfaces, each connected to one of the plurality of spine switches, thereby forming a plurality of paths. Based on this, when the ToR switch receives a data stream sent by a local server, it needs to select one of the multiple paths over which to send the data stream, so as to realize load sharing among the multiple paths.
Disclosure of Invention
The application provides a load sharing method and a related device, which can promote load balancing among paths so as to avoid network congestion.
In a first aspect, a load sharing method is provided. In the method, network equipment receives a first data stream and acquires a plurality of equivalent paths corresponding to the first data stream. The network device determines a target path from among a plurality of equal cost paths through which to forward the first data stream.
The target path is a path, among the plurality of equivalent paths, that meets a preset condition. The preset condition is either that the path forwards the smallest number of data streams, where the message interval within each data stream is smaller than a set time interval; or that the load level corresponding to the path's load is the lightest load level currently allowed, where the load level of each equivalent path is determined according to the current load of that equivalent path and the historical loads of the plurality of equivalent paths.
In the application, the target path may be determined by the number of data flows forwarded by each equivalent path, or may be determined by a load level corresponding to the load of each equivalent path.
When the target path is determined by the number of data streams forwarded by each equivalent path, that number can be updated immediately after a data stream is sent over an equivalent path. Compared with determining the target path from the load of each equivalent path, this approach does not require the load of each path to be measured, and therefore avoids the problem that a measured load cannot reflect the real load in time. In other words, relative to passively balancing load according to the measured load of each equivalent path, this approach actively performs load sharing based on the number of data streams forwarded by each equivalent path, and ensures load balancing from the source.
When the target path is determined by the load level corresponding to the load of each equivalent path, the load level corresponding to the target path is the lightest load level currently allowed, and the load level of each equivalent path is determined according to the current load of that path and the historical loads of the equivalent paths. That is, the historical loads serve as the reference for determining the load level corresponding to the load of an equivalent path. Compared with using a fixed load threshold as the reference, using the historical loads as the reference means that when the load of an equivalent path changes relative to the historical loads, the load level of that path is adjusted accordingly, and the target path determined from the equivalent paths is adjusted accordingly as well, so that the current load balancing strategy can be adjusted quickly when the load of an equivalent path changes. In other words, in the scheme in which the target path is determined by the load levels corresponding to the loads of the equivalent paths, when the load of an equivalent path changes, the network device can quickly perceive the change and quickly adjust the current load balancing strategy.
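The two selection modes described above can be illustrated with a short sketch. The following Python fragment is provided for illustration only; the function names, the flow-count table and the level-lookup helper are assumptions made for this example and are not taken from the application.

```python
# Illustrative sketch of the two selection modes; all names and data structures
# here are assumptions and are not part of the application.

def select_by_flow_count(paths, flow_counts):
    """Mode 1: choose the equivalent path currently forwarding the fewest data streams."""
    return min(paths, key=lambda p: flow_counts[p])

def select_by_load_level(paths, loads, load_to_level):
    """Mode 2: choose a path whose load level is the lightest level currently allowed.

    load_to_level maps a load to a level derived from historical loads; as in Fig. 3,
    a larger level number is assumed to mean a lighter load.
    """
    lightest = max(load_to_level(loads[p]) for p in paths)
    return next(p for p in paths if load_to_level(loads[p]) == lightest)
```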
Optionally, the load level of each equivalent path is determined according to the current load of that equivalent path and a load level table. The load level table indicates N load thresholds, where N is greater than or equal to 1. The N load thresholds are determined according to the historical loads of the plurality of equivalent paths, the historical loads being the loads recorded when the load balance degree among the plurality of equivalent paths reached the target load balance degree, and the load balance degree being the proportion of equivalent paths whose load differences are below a first difference threshold.
So that the load level corresponding to the load of an equivalent path can be determined with the historical loads as the reference, in the present application the load thresholds indicated by the load level table may be determined according to the historical loads, and the load level table is then configured based on the determined load thresholds. In addition, because the N load thresholds in the load level table are determined according to the historical loads of the equivalent paths, and the historical loads are the loads recorded when the load balance degree among the equivalent paths reached the target load balance degree, load sharing based on the load level table can make more effective use of lightly loaded paths and achieve load balancing among the current paths.
Optionally, in the method, the current loads of the plurality of equivalent paths may also be determined; the load balance degree of the plurality of equivalent paths is determined based on their current loads; and if the determined load balance degree is lower than the target load balance degree, the load level table is adjusted and the operation of determining the current loads of the equivalent paths is performed again, until the most recently determined load balance degree exceeds the target load balance degree.
The load level table can thus be updated using the current loads of the equivalent paths, which improves the load balance among the equivalent paths when subsequent load sharing is carried out based on the load level table.
Alternatively, the load level table may be adjusted as follows: the plurality of equivalent paths are clustered into M load sharing groups based on their current loads, where each load sharing group includes at least one equivalent path and M is greater than or equal to 1; a load threshold corresponding to each load sharing group is determined based on the current loads of the equivalent paths in that group, yielding M load thresholds corresponding one-to-one to the M load sharing groups; and the load level table is updated according to the M load thresholds, for example by replacing the N load thresholds in the load level table with the M load thresholds.
By clustering the equivalent paths, paths with the same load are grouped into one load sharing group. The load thresholds are then reconfigured around the current loads of the equivalent paths in each load sharing group, thereby updating the load level table. When load sharing is subsequently performed based on the updated load level table, if the current load of an equivalent path changes relative to its previous load, the load level corresponding to the current load of that path is adjusted accordingly, because the load thresholds are set essentially around the path's previous load; this triggers an adjustment of the current load balancing strategy and improves the load balance among the equivalent paths.
Optionally, the load threshold corresponding to each load sharing group may be determined based on the current loads of the equivalent paths in that group as follows: the maximum value among the current loads of the equivalent paths in each load sharing group is used as the load threshold corresponding to that load sharing group.
This implementation likewise enables the network device to quickly perceive load changes of the equivalent paths and quickly adjust the load balancing strategy, thereby improving the load balance among the equivalent paths.
Optionally, clustering the plurality of equivalent paths into at least one load sharing group based on their current loads may be implemented as follows: the load of each of the plurality of equivalent paths is collected once per sampling period; each time the loads collected in the current sampling period are obtained, the moving average load of each equivalent path is determined based on a sliding weight; the plurality of equivalent paths are clustered based on their moving average loads to obtain X equivalent path clusters, where X is greater than or equal to 1; if at least one of the X equivalent path clusters does not meet a target condition, the sliding weight is increased and/or the duration of the sampling period is reduced, and the operation of collecting the load of each equivalent path once per sampling period is performed again, until the M equivalent path clusters obtained in the last clustering meet the target condition; the M equivalent path clusters are then used as the M load sharing groups.
Wherein, the target condition is: the distance between the moving average loads of any two equivalent paths in different equivalent path clusters exceeds a distance threshold.
For any two different equivalent path clusters, if the distance between the moving average loads of any two equivalent paths taken from the two clusters exceeds the distance threshold, the clusters are far apart and the clustering result meets expectations. Conversely, if the distance between the moving average loads of some two equivalent paths in the two clusters is below the distance threshold, the two clusters are close together and are not well separated, so clustering needs to be performed again. The clustering procedure above therefore ensures that the clustering result reaches the expected result.
Optionally, in the method, if the X equivalent path clusters meet the target condition, the sliding weight is reduced and/or the duration of the sampling period is increased, and the operation of collecting the load of each of the plurality of equivalent paths once per sampling period is performed again, until at least one equivalent path cluster obtained in the last clustering fails to meet the target condition; the M equivalent path clusters obtained in the second-to-last clustering are then used as the M load sharing groups.
If the X equivalent path clusters meet the target condition, the effect of the first clustering already meets the expected clustering effect. In this case, a very good clustering effect indicates that the previously configured sliding weight is too large or the duration of the sampling period is too small. If the current loads of the equivalent paths were determined directly based on this sliding weight or sampling period duration the next time the load level table is updated, the amount of data processed to determine those loads would be larger than necessary. Through the above implementation, the sliding weight and sampling period duration used in the final clustering just meet the clustering requirement, avoiding a sliding weight that is too large or a sampling period that is too short.
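The adaptive clustering loop described above may be sketched as follows. This is a minimal illustration under assumed interfaces: the one-dimensional clustering routine, the separation check and the adjustment steps are placeholders, only the forward direction (increasing the sliding weight and shortening the sampling period until the clusters separate) is shown, and none of the names are taken from the application.

```python
# A minimal sketch of the adaptive clustering loop; all helpers and constants are assumed.

def clusters_are_separated(clusters, distance_threshold):
    """Target condition: paths drawn from two different clusters always differ by more
    than the distance threshold in their moving-average load."""
    for i, a in enumerate(clusters):
        for b in clusters[i + 1:]:
            if any(abs(x - y) <= distance_threshold for x in a.values() for y in b.values()):
                return False
    return True

def one_dim_cluster(avg_loads, distance_threshold):
    """Group paths whose moving-average loads are close to each other (a simple
    one-dimensional stand-in for the clustering step)."""
    clusters, last = [], None
    for path, load in sorted(avg_loads.items(), key=lambda kv: kv[1]):
        if last is not None and load - last <= distance_threshold:
            clusters[-1][path] = load
        else:
            clusters.append({path: load})
        last = load
    return clusters

def derive_load_sharing_groups(measure_moving_average_loads, weight, period, distance_threshold):
    """Re-sample and re-cluster until the clusters separate; each final cluster is one group."""
    while True:
        avg_loads = measure_moving_average_loads(weight, period)   # {path: smoothed load}
        clusters = one_dim_cluster(avg_loads, distance_threshold)
        if clusters_are_separated(clusters, distance_threshold):
            return clusters
        # clusters are not well separated: increase the sliding weight and shorten the
        # sampling period, as described above, then sample and cluster again
        weight *= 1.5
        period = max(period // 2, 1)
```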
Alternatively, the moving average load of each equivalent path may be determined based on the sliding weight as follows: the historical moving average load of the equivalent path is acquired, the historical moving average load being determined based on the load collected in the sampling period preceding the current sampling period; if the load of the equivalent path collected in the current sampling period exceeds the historical moving average load and the difference between them exceeds a second difference threshold, the moving average load of the equivalent path is determined based on a first sliding weight and the historical moving average load; if the load collected in the current sampling period is lower than the historical moving average load, the moving average load is determined based on a second sliding weight and the historical moving average load, where the second sliding weight is greater than the first sliding weight.
In this application, when the moving average load of each equivalent path is determined based on sliding weights, different sliding weights are used at different stages, so that the real load of the equivalent path is quickly reflected in the determined moving average load.
Optionally, if the load of the equivalent path collected in the current sampling period is lower than the historical moving average load, determining the moving average load based on the second sliding weight and the historical moving average load may be implemented as follows: if the load collected in the current sampling period is lower than the historical moving average load and the number of consecutive load drops does not exceed a target number, the moving average load of the equivalent path is determined based on the second sliding weight and the historical moving average load, where the number of consecutive load drops is the number of consecutive sampling periods in which the collected load has been smaller than the corresponding historical moving average load.
Accordingly, if the load of the equivalent path collected in the current sampling period is lower than the historical moving average load and the number of consecutive load drops exceeds the target number, the moving average load of the equivalent path is determined based on a third sliding weight and the historical moving average load, where the third sliding weight is smaller than the second sliding weight.
If the load of the equivalent path collected in the current sampling period is lower than the historical moving average load, the data flow forwarded over that path may be in a stable-traffic stage, or in a stage where traffic is decreasing or stopping. If the data flow is in a decreasing or stopping stage, the above implementation allows a smaller sliding weight to be used, so that the determined moving average load also drops quickly relative to the historical moving average load, and the load condition of the equivalent path during this stage is truly reflected in the determined moving average load.
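The staged use of the three sliding weights can be sketched as follows. The update rule new average = weight * previous average + (1 - weight) * sample is assumed here because it is consistent with the second sliding weight being larger than the first and the third being smaller than the second; the concrete weight values, the second difference threshold, the target number and the handling of a small load increase are all assumptions.

```python
# Sketch of the staged moving-average update; weights and thresholds are assumed values.

def update_moving_average(prev_avg, sample, consecutive_drops,
                          first_weight=0.2,    # ramp-up: little inertia, track the sample fast
                          second_weight=0.8,   # brief dip: heavy inertia, hold the old average
                          third_weight=0.3,    # sustained drop: track the falling load quickly
                          second_difference_threshold=10.0,
                          target_drop_count=3):
    """Return (new_moving_average, updated_consecutive_drop_count)."""
    if sample > prev_avg and sample - prev_avg > second_difference_threshold:
        weight, consecutive_drops = first_weight, 0
    elif sample < prev_avg:
        consecutive_drops += 1
        weight = second_weight if consecutive_drops <= target_drop_count else third_weight
    else:
        # small increase: not specified in the text, treated here like the stable case
        weight, consecutive_drops = second_weight, 0
    return weight * prev_avg + (1.0 - weight) * sample, consecutive_drops
```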
Alternatively, the load of each of the plurality of equivalent paths may be obtained as follows: a first load of the equivalent path is acquired, the first load being the load of the equivalent path collected in the last sampling period before the first data stream is received; if the first load is smaller than a load threshold, the traffic volume of a second data stream sent over the equivalent path in the sampling period preceding that last sampling period is acquired, and this traffic volume is superimposed on the first load to obtain the load of the equivalent path.
If the first load is smaller than the load threshold, the data flow forwarded over the equivalent path is in a just-started stage, and the size of the data flow already forwarded over that path before the current time can be superimposed on the collected load. This avoids the problem that the load of the equivalent path appears too small when traffic has only just started, which would otherwise lead to uneven load sharing later.
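For illustration, a minimal sketch of this warm-up step is given below; the names and the threshold are assumptions.

```python
# A small sketch, under assumed names, of the load warm-up described above.

def effective_path_load(first_load, load_threshold, recent_stream_bytes):
    """If the load sampled just before the first data stream arrives is still very small,
    the traffic already known to have been sent on this path is superimposed on it, so a
    path that has only just started carrying a flow is not mistaken for an idle path."""
    if first_load < load_threshold:
        return first_load + recent_stream_bytes
    return first_load
```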
Optionally, when the target path includes Y equivalent paths, after the first data stream is forwarded through the target path, if a third data stream is received and the third data stream corresponds to the target path, one equivalent path is selected from the Y equivalent paths by polling, and the third data stream is sent over the selected equivalent path.
To avoid the traffic congestion that would result from sending the first data stream and the third data stream over the same one of the Y equivalent paths, one equivalent path can be selected from the Y equivalent paths by polling and the third data stream sent over that selected path.
Alternatively, the target path may be determined from the plurality of equivalent paths as follows: the number of data streams corresponding to each of the plurality of equivalent paths is acquired, the number of data streams indicating how many data streams the corresponding equivalent path is forwarding; the equivalent path with the smallest corresponding number of data streams is taken as the target path. Accordingly, in the method, the number of data streams corresponding to the target path may then be increased.
After the first data stream is forwarded through the target path, the number of data streams corresponding to the target path can be updated, so that subsequent load sharing is performed based on the updated flow counts, improving the load balance among the equivalent paths.
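A minimal sketch of this flow-count based selection, under an assumed flow-count table, is given below.

```python
# Minimal sketch of flow-count based path selection; the table layout is an assumption.

from collections import defaultdict

flow_counts = defaultdict(int)          # equivalent path -> number of streams mapped to it

def pick_path_by_flow_count(paths):
    """Select the equivalent path with the fewest data streams and count the new stream on it
    immediately, so the very next stream already sees the updated number."""
    target = min(paths, key=lambda p: flow_counts[p])
    flow_counts[target] += 1
    return target
```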
Optionally, in the method, after the first data stream is forwarded through the target path, the time at which a message belonging to the first data stream was last received before the current time may also be determined; if that time precedes the current time by more than the set time interval, the number of data streams corresponding to the target path is reduced.
As the communication time increases, the number of data streams recorded for each equivalent path grows ever larger. If the recorded number of data streams for each equivalent path no longer represents the number of data streams actually forwarded by that path in the most recent period, load sharing based on those numbers cannot effectively promote load balancing. Therefore, a set time interval (i.e., an aging time) is also provided in this application, and the set time interval is used to age the number of data streams corresponding to each equivalent path in the flow table.
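The aging rule can be sketched as follows; the flow-table fields and the way the count is decremented are assumptions for illustration.

```python
# A sketch of flow aging; entry fields and the decrement are assumed.

def age_flow_entry(entry, now, set_time_interval, flow_counts):
    """If no message of the stream has been received within the set time interval, the
    stream no longer counts towards the number of data streams of its equivalent path."""
    if not entry.get("aged") and now - entry["last_packet_time"] > set_time_interval:
        flow_counts[entry["path"]] -= 1
        entry["aged"] = True
```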
Optionally, in the method, after the first data stream is forwarded through the target path, the difference between the current time and an initial sending time may also be obtained, the initial sending time being the time at which a message belonging to the first data stream was first forwarded through the target path; if the difference between the current time and the initial sending time exceeds a path switching interval, the traffic volume of the first data stream forwarded through the target path before the current time is acquired; if that traffic volume exceeds a target value, an alternative path is selected from the plurality of equivalent paths, the alternative path being an equivalent path different from the target path; the first data stream then continues to be sent over the alternative path, the number of data streams corresponding to the alternative path is increased, and the number of data streams corresponding to the target path is reduced.
In this application, if all the messages of a large flow were forwarded over the same equivalent path and that path became congested because of the large flow, the congestion would last a long time. To avoid this, a path switching interval can be set, and the forwarding path of the large flow is switched according to this interval.
Optionally, the path switching interval is greater than a path delay difference, the path delay difference is a difference between a first path delay and a second path delay, the first path delay is a duration required for sending a message through the first path, the second path delay is a duration required for sending a message through the second path, and the first path and the second path are two different paths from a source address to a destination address of the first data stream.
When the path switching interval is larger than the path delay difference, even a large flow whose path is switched in this way will not cause packet disorder.
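The switching decision for a large flow can be sketched as follows; the names, units and the large-flow threshold (the "target value") are assumptions.

```python
# Sketch of the path switching decision for large flows; names and units are assumed.

def should_switch_large_flow(now, initial_sending_time, bytes_forwarded,
                             path_switching_interval, target_value, path_delay_difference):
    """Switch the flow to an alternative equivalent path only after it has used the current
    path for longer than the path switching interval and has sent more than the target value.
    Requiring the interval to exceed the path delay difference keeps packets in order."""
    assert path_switching_interval > path_delay_difference
    return (now - initial_sending_time > path_switching_interval
            and bytes_forwarded > target_value)
```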
Optionally, in the method, after the number of data streams corresponding to each of the plurality of equivalent paths is acquired, if the number of data streams corresponding to every equivalent path has reached a target number, one equivalent path is selected from the plurality of equivalent paths as the target path by polling.
If the number of data streams corresponding to every equivalent path has reached the target number, the flow table used to record the numbers of data streams has reached its specified capacity; in this case one equivalent path is selected from the plurality of equivalent paths as the target path by polling. In other words, once the flow table has grown to the size allowed by its specification, data streams are distributed uniformly over the equivalent paths by polling.
In a second aspect, a network device is provided, where the network device has a function of implementing the load sharing method behavior in the first aspect. The network device comprises a receiving module, a sending module and a processing module, wherein the receiving module and the sending module are used for realizing the receiving and sending related operations in the load sharing method provided by the first aspect, and the processing module is used for realizing the data processing operations in the load sharing method provided by the first aspect.
In a third aspect, a network device is provided. The network device includes a processor and a memory, where the memory is configured to store a program supporting the network device in performing the load sharing method provided in the first aspect, and to store data involved in implementing that method. The processor is configured to execute the program stored in the memory. The network device may further comprise a communication bus for establishing a connection between the processor and the memory.
In a fourth aspect, there is provided a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the load sharing method of the first aspect described above.
In a fifth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the load sharing method of the first aspect described above.
The technical effects obtained in the second, third, fourth and fifth aspects are similar to the technical effects obtained in the corresponding technical means in the first aspect, and are not described in detail herein.
Drawings
Fig. 1 is a schematic architecture diagram of an AI computing networking provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of an AI computing communication model provided in an embodiment of the application;
FIG. 3 is a schematic diagram of a load level provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a method for determining a moving average load according to an embodiment of the present application;
FIG. 5 is a schematic diagram of path overload provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of a load statistics result provided in an embodiment of the present application;
FIG. 7 is a schematic diagram of a DLB system according to an embodiment of the present application;
Fig. 8 is a load sharing method provided in an embodiment of the present application;
fig. 9 is a flowchart of a load sharing method provided in an embodiment of the present application;
FIGS. 10A-10C are schematic diagrams illustrating a process for adjusting a load level table according to embodiments of the present application;
FIG. 11 is a schematic diagram of a load curve of an equivalent path determined based on a piecewise function of fast ramp up and slow ramp down according to an embodiment of the present application;
fig. 12 is a schematic flow chart of adaptively adjusting a load threshold according to an embodiment of the present application;
FIG. 13 is a schematic diagram of an adjusted load threshold provided by an embodiment of the present application;
FIG. 14 is a schematic diagram of a polling-mode routing in a sampling period according to an embodiment of the present application;
fig. 15 is a flowchart of a load sharing method provided in an embodiment of the present application;
fig. 16 is a schematic structural diagram of a network device according to an embodiment of the present application;
fig. 17 is a schematic structural diagram of another network device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Before explaining the embodiment of the present application in detail, an application scenario of the embodiment of the present application is explained.
Currently, forwarding nodes in a network, such as a switch, are generally configured with a plurality of outgoing interfaces, and different outgoing interfaces correspond to different message forwarding paths. In order to avoid traffic congestion of a certain path, when forwarding a message, a forwarding node needs to select a path with the lightest load at the current time from different paths to send the message, and the process is load sharing. The purpose of load sharing is to realize load balancing among various paths and avoid network congestion.
The load sharing process in the network will be explained by taking AI calculation networking as an example.
Fig. 1 is a schematic architecture diagram of an AI computing networking provided in an embodiment of the present application. As shown in fig. 1, the AI computing network includes a server 101, a ToR switch 102, and a spine switch 103. The server 101 and the ToR switch 102 are connected by a wired or wireless connection for communication, and the ToR switch 102 and the spine switch 103 are likewise connected by a wired or wireless connection for communication.
The numbers of servers 101, ToR switches 102, and spine switches 103 in the AI computation network may be arbitrary. Illustratively, fig. 1 includes two ToR switches 102, with four servers 101 connected to each ToR switch 102 and two spine switches 103 connected to each ToR switch 102.
In addition, each server 101 mounts a plurality of graphics processing units (graphics processing unit, GPUs), and there are a plurality of paths between each server 101 and the connected ToR switch 102, each path corresponding to one GPU on the server, for transmitting the data stream generated by the GPU. Illustratively, as shown in fig. 1, each server 101 is loaded with 8 GPUs, so there are 8 paths between each server 101 and ToR switch 102. The bandwidth of these 8 paths may be, for example, 100 gigabits per second (Gbps), i.e., the total bandwidth of the paths between one server 101 and the connected ToR switches 102 is 8 x 100Gbps.
In addition, each ToR switch and a plurality of spine switches 103 are connected separately, and there are a plurality of equivalent paths between each ToR switch 102 and any spine switch 103. Illustratively, in fig. 1, each ToR switch 102 and 2 spine switches 103 are respectively connected, and there are 16 equivalent paths with a bandwidth of 100Gbps between each ToR switch 102 and any spine switch 103, that is, the total bandwidth of the equivalent paths between one ToR switch 102 and one spine switch 103 is 16×100Gbps. In this scenario, there are 32 egress interfaces per ToR switch 102, each corresponding to an equivalent path to spine switch 103.
The GPUs on the various servers in fig. 1 constitute a GPU cluster that performs AI computation. The GPUs in the GPU cluster work in an alternate mode according to a calculation phase and a communication phase, wherein the calculation phase is a process of generating a data stream by the GPU, and the communication phase is a process of sending the data stream to the ToR switch by the GPU. Wherein the computing phase and the communication phase alternately comprise: the method comprises the steps that a first time length and a second time length are preset, and when the GPU cluster performs AI calculation, the GPU alternately performs a calculation phase of the first time length and a communication phase of the second time length. In other words, the GPU current time is either in the computation phase or in the communication phase.
Fig. 2 is a schematic diagram of an AI computing communication model according to an embodiment of the present application. As shown in fig. 2, in the AI computation communication model, each GPU performs a computation phase of 9ms and a communication phase of 10ms in turn.
For the networking illustrated in fig. 1, it is assumed that two data streams are generated in each GPU computing phase, such that each ToR switch 102 will receive 64 data streams in the communication phase, and each ToR switch 102 has only 32 equal cost paths (each equal cost path corresponds to one outgoing interface on the ToR switch). For any ToR switch 102, 64 data streams received by that ToR switch 102 need to be load-shared over 32 equivalent paths.
In the above-described load sharing scenario, the transmission time difference between the 64 data streams is on the order of 100 microseconds (us) for any ToR switch 102, and therefore the concurrency (or synchronicity) of the 64 data streams is high. At this time, if a problem of uneven load sharing occurs between the ToR switch 102 and the spine switch 103, traffic congestion occurs in a certain equivalent path between the ToR switch 102 and the spine switch 103, which may easily cause a decrease in AI computing performance.
The method provided by the embodiment of the application can be applied to the scene of load sharing of the AI calculation networking. Alternatively, the method provided in the embodiment of the present application may be applied to other scenarios where load sharing is required, which is not illustrated here.
In order to facilitate the subsequent understanding of the technical effects of the embodiments of the present application, the following analysis is performed on a technique for load sharing based on a fixed load class table. Wherein the load levels in the fixed load level table are configured based on a preset plurality of fixed load thresholds.
Fig. 3 is a schematic diagram of a load level according to an embodiment of the present application. As shown in fig. 3, a plurality of load levels are divided based on pre-configuring a plurality of fixed load thresholds. In fig. 3, 7 load thresholds are configured, labeled as threshold 1, threshold 2, threshold 3, threshold 4, threshold 5, threshold 6, and threshold 7, respectively. The seven thresholds are gradually increased. Wherein, the load level corresponding to the load range smaller than the threshold 1 is 7, the load level corresponding to the load range between the threshold 1 and the threshold 2 is 6, the load level corresponding to the load range between the threshold 2 and the threshold 3 is 5, the load level corresponding to the load range between the threshold 3 and the threshold 4 is 4, the load level corresponding to the load range between the threshold 4 and the threshold 5 is 3, the load level corresponding to the load range between the threshold 5 and the threshold 6 is 2, the load level corresponding to the load range between the threshold 6 and the threshold 7 is 1, and the load level corresponding to the load range larger than the threshold 7 is 0.
The load level of each equivalent path is determined based on the average load of that path, and when load sharing is required, the equivalent path with the lightest current load is selected (in this embodiment the lightest load corresponds to the largest load level number; in other embodiments it may correspond to the smallest).
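The lookup from a measured load to a load level in a table such as fig. 3 can be sketched as follows; the threshold values used in the example are placeholders.

```python
# A small sketch of the table lookup in Fig. 3; the threshold values are placeholders.

import bisect

def load_to_level(load, thresholds):
    """With thresholds sorted in ascending order, a load below threshold 1 maps to the
    highest (lightest) level and a load above the last threshold maps to level 0."""
    return len(thresholds) - bisect.bisect_right(thresholds, load)

# Example: with 7 thresholds the levels run from 7 (lightest) down to 0 (heaviest).
levels = [load_to_level(x, [10, 20, 30, 40, 50, 60, 70]) for x in (5, 35, 80)]  # -> [7, 4, 0]
```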
The average load of each equivalent path is usually determined by a moving average, i.e., the average load is a moving average load obtained by smoothing the loads collected over several sampling periods before the current time. Consequently, in the stage when a data flow has just begun to pass through an equivalent path of the ToR switch, the calculated moving average load of that path increases only gradually, and only after the data flow has passed through the path for several periods does the calculated average load represent the real load of the path. In other words, when data starts flowing over an equivalent path, the statistical moving average load cannot quickly reflect the real load of the path, and load sharing based on the statistical moving average load is therefore prone to uneven sharing.
Fig. 4 is a schematic diagram of a method for determining a moving average load according to an embodiment of the present application. As shown in fig. 4, the running average load is determined using the load acquired in the last 8 sampling periods before the current time, each sampling period being 16 μs. Specifically, when the data flow passes through a certain equivalent path, the size of the data flow passing through the equivalent path in 16 μs is counted once every 16 μs to obtain the load of the equivalent path in each sampling period, and when the load acquired in the current sampling period is obtained, the load acquired in the previous 7 sampling periods can be combined to determine the moving average load of the equivalent path.
Thus, when the load of the equivalent path is collected in the first sampling period (denoted load 1), no load of the path has been collected before that time, so the loads of the earlier sampling periods default to 0 and the moving average over 8 sampling periods is smaller than the real load of the equivalent path. Only after 8 sampling periods (128 μs) does the calculated moving average approach the real load; that is, more than 128 μs is needed before the real load of the equivalent path is fully reflected.
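The slow ramp-up can be illustrated with a small sketch of the 8-period windowed average; the traffic figures are assumed.

```python
# Sketch illustrating why the 8-period windowed average ramps up slowly; figures are assumed.

def windowed_average(samples, window=8):
    """Average over the last `window` sampling periods, treating missing history as 0."""
    padded = [0.0] * (window - len(samples)) + samples[-window:]
    return sum(padded) / window

# A path that suddenly carries 100 (e.g. Gbps) per 16 us period is reported as only 12.5
# after the first period and does not reach its true load until 8 periods (~128 us) later.
ramp = [windowed_average([100.0] * n) for n in range(1, 9)]   # 12.5, 25.0, ..., 100.0
```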
The above-described process of determining the moving average load is prone to the path overload problem shown in fig. 5. As shown in fig. 5, when the ToR switch receives data stream 1 at 60 Gbps, equal-cost path 1 is selected to transmit data stream 1 based on the current load levels. In this scenario, if the load corresponding to data stream 1 is not yet reflected in the moving average load of equal-cost path 1 (in fig. 5 the collected load of equal-cost path 1 is 4 Gbps), the ToR switch will again select equal-cost path 1, based on the current load levels, to send data stream 2. Data stream 1 and data stream 2 are then transmitted simultaneously over equal-cost path 1, and their total traffic of 120 Gbps exceeds the 100 Gbps bandwidth of equal-cost path 1, causing congestion on that path. That is, large data streams sent simultaneously cannot be load-shared evenly.
In addition, since load sharing is performed based on the average load of the data streams already sent over each equivalent path, if the loads of the equivalent paths have already become uneven, the load level of each equivalent path is only passively adjusted according to the unevenness that has already occurred. This adjustment is slow and can only temporarily alleviate the load imbalance among the equivalent paths.
In addition, after a large data stream is first sent through a certain equivalent path, all subsequent messages of that stream are sent through the same path. If several large data streams are sent through the same equivalent path at the same time, load sharing becomes uneven and the unevenness lasts a long time. That is, once the equivalent path for a large flow has been selected, it cannot be changed and is used throughout; as soon as several large flows cause uneven load sharing, the situation persists for a long time.
Based on the above analysis, the following problems are likely to occur in the technique of load sharing based on the fixed load rank table:
1) The load of an equal-cost path emerges too slowly, so it takes too long to identify the true load contributed by the streams being sent.
2) Since the load levels are determined based on a plurality of fixed load thresholds, a data stream may traverse several load levels while it is transmitted over an equivalent path. For example, in the load statistics shown in fig. 6, two data streams are sent one after the other over the same equivalent path, and while each data stream is being sent the load of the path spans 4 load levels. If the load statistics of the different equivalent paths all look like fig. 6, the different equivalent paths may simultaneously be spread over 4 load levels, so that when load sharing is performed the paths are effectively divided into 4 groups and a message is sent over a path selected from the group with the lightest load level; the number of interfaces available for sending the message is thus reduced several-fold. For example, for the ToR switch with 32 equivalent paths shown in fig. 1, after the paths are divided into 4 groups only 8 equivalent paths can be selected, a 4-fold reduction of the selection space compared with choosing the target path from all 32 equivalent paths.
3) Because path reselection (i.e., choosing a different equivalent path to forward a data stream) is performed passively according to the load of the equivalent paths, the network is already severely congested by the time the adjustment is made; even if the forwarding path of a newly received data stream is switched, the problem can only be alleviated and congestion cannot be avoided.
4) A large data flow uses the same equivalent path throughout its entire life cycle, so once the equivalent path selected for the large flow becomes congested, the congestion lasts a long time and the service may be interrupted.
Based on this, the embodiment of the application provides a load sharing method to solve the problem generated in the load sharing process.
The load sharing method provided by the embodiments of the present application comprises two aspects. In the first aspect, the load thresholds are configured based on the historical loads on the forwarding node, so that when the load of an equivalent path changes compared with the historical load, the load level of that path is adjusted accordingly and the forwarding node can quickly perceive the load change. In the second aspect, load sharing is performed based on the number of data streams sent over each equivalent path, ensuring load balancing from the source. Both aspects are described in detail in the following embodiments.
In order to implement the load sharing method provided by the embodiment of the present application, the embodiment of the present application further provides a dynamic load balancing (dynamic load balance, DLB) system. The DLB system can be deployed on any forwarding node to realize load balancing among all equivalent paths on the forwarding node. Illustratively, the DLB system may be deployed into the ToR switch shown in FIG. 1 to achieve load balancing in AI computation.
Fig. 7 is a schematic architecture diagram of a DLB system according to an embodiment of the present application. As shown in fig. 7, the DLB system includes a plurality of functional modules. Three types of functional modules are included in fig. 7, one is a new module (a dark gray marked box in fig. 7), one is a functional module (a light gray marked box in fig. 7) after the original functional module is improved, and the other is an original functional module (a white marked box in fig. 7).
The function of the newly added or improved functional module is described below.
1) Load analysis module: the function module is a newly added function module and is used for analyzing the load of the equivalent path and determining a proper load threshold according to the load of the equivalent path.
2) An average load calculation module: this functional module is improved over the original functional module and is used for calculating the average load of an equivalent path. The embodiments of the present application modify the average-load calculation algorithm so that the real load emerges more quickly.
3) A routing module: this functional module is improved over the original functional module and is used for selecting an equivalent path from a group of equivalent paths with a certain algorithm. The improvement is that in certain scenarios the equivalent path is selected in a round-robin manner instead of randomly, as described in detail in the following embodiments.
4) A flow table module: this functional module is improved over the original functional module and is used for recording the equivalent path information, timestamp information and the like of each data stream. A data stream identification function and a data stream size statistics function are added, so that when the average load is calculated, the previously counted data stream size can be directly superimposed on the load of the equivalent path, speeding up the emergence of the real load.
The method provided in the embodiments of the present application is explained in detail below.
Fig. 8 is a load sharing method provided in an embodiment of the present application. The method is applied to any forwarding node, and the following embodiments take the first node as an example. As shown in fig. 8, the load sharing method includes the following steps 801 to 804.
Step 801: the first node receives a first data stream.
The first node receiving the first data stream means that the first node receives a message belonging to the first data stream. The message carries the identifier of the data stream to which it belongs, so that when the first node receives the message, it can determine, based on that identifier, which data stream the currently received message belongs to.
Step 802: the first node acquires a plurality of equivalent paths corresponding to the first data stream.
The equivalent paths corresponding to the first data stream are the multiple paths from the first node to a second node, the second node being the next hop to which the first node forwards the first data stream. Based on this, the equivalent paths corresponding to the first data stream may be determined as follows: the first node determines the next hop for forwarding the first data stream, then determines the plurality of outgoing interfaces on the first node that can reach the next hop, and treats each of those outgoing interfaces as one equivalent path.
For example, for the AI networking shown in fig. 1, when the ToR switch receives the first data stream sent by a local GPU, it determines that the next hop for forwarding the first data stream is a spine switch. Because 32 outgoing interfaces for sending traffic to the spine switches are configured on the ToR switch, the ToR switch can determine 32 equivalent paths corresponding to the first data stream, and these 32 equivalent paths are identified by the 32 outgoing interfaces respectively.
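A minimal sketch of this lookup, under an assumed table structure, is given below; every outgoing interface that reaches the next hop is treated as one equivalent path.

```python
# Minimal sketch of step 802 under an assumed forwarding-table structure.

uplinks_per_next_hop = {"spine-layer": ["uplink%d" % i for i in range(32)]}  # assumed ToR state

def equal_cost_paths_for(next_hop):
    """Return the equivalent paths, identified by outgoing interfaces, towards the next hop."""
    return uplinks_per_next_hop[next_hop]
```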
Step 803: the first node determines a target path from the plurality of equivalent paths, the target path being a path, among the plurality of equivalent paths, that meets a preset condition. The preset condition is either that the path forwards the smallest number of data streams, where the message interval within each data stream is smaller than a set time interval; or that the load level corresponding to the path's load is the lightest load level currently allowed, where the load level of each equivalent path is determined according to the current load of that equivalent path and the historical loads of the plurality of equivalent paths.
Step 804: the first node forwards the first data stream over the target path.
And forwarding the first data stream through the target path, namely forwarding the first data stream through an output interface corresponding to the target path.
In the embodiment of the present application, the target path may be determined by the number of data flows forwarded by each equivalent path. Because this number can be updated immediately after a data stream is sent over an equivalent path, and, unlike determining the target path from the load of each equivalent path, the load of each path does not need to be measured, the problem that a measured load cannot reflect the real load in time is avoided. In other words, compared with passively balancing load according to the load of each equivalent path, actively performing load sharing based on the number of data streams forwarded by each path ensures load balancing from the source.
In addition, in the embodiment of the present application, the target path may also be determined by the load level corresponding to the load of each equivalent path. The load level corresponding to the target path is the lightest load level currently allowed, and the load level of each equivalent path is determined according to the current load of that path and the historical loads of the plurality of equivalent paths. That is, the historical loads serve as the reference for determining the load level corresponding to the load of an equivalent path. Compared with using a fixed load threshold as the reference, this means that when the load of an equivalent path changes relative to the historical loads, the load level of that path is adjusted accordingly, and so is the target path determined by the first node, so the first node can quickly adjust the current load balancing strategy when the load of an equivalent path changes. In other words, in the scheme in which the target path is determined by the load levels corresponding to the loads of the equivalent paths, when the load of an equivalent path changes, the first node can quickly perceive the change and quickly adjust the current load balancing strategy.
Based on the embodiment shown in fig. 8, it can be known that, when the first node performs load sharing, the target path may be determined by the load level corresponding to the load of each equivalent path, or may be determined by the number of data flows forwarded by each equivalent path. This is explained in detail below by means of two examples.
Fig. 9 is a flowchart of a load sharing method provided in an embodiment of the present application. The method is used for explaining how the first node determines the target path through the load level corresponding to the load of each equivalent path. As shown in fig. 9, the method includes the following steps 901 to 904.
Step 901: the first node receives a first data stream.
Step 902: the first node acquires a plurality of equivalent paths corresponding to the first data stream.
Step 903: the first node determines a target path from the plurality of equivalent paths based on the load level corresponding to the load of each of the plurality of equivalent paths. The target path is a path, among the plurality of equivalent paths, that meets a preset condition, the preset condition being that the load level corresponding to the path's load is the lightest load level currently allowed, where the load level of each equivalent path is determined according to the current load of that equivalent path and the historical loads of the plurality of equivalent paths.
Step 904: the first node forwards the first data stream over the target path.
In step 903, the load level of each equivalent path may be determined according to the load of the equivalent path and a load level table, where the load level table indicates N load thresholds, and N is a positive integer greater than or equal to 1.
The load level table indicating the N load thresholds can be understood as follows: the load levels in the load level table are determined based on the N load thresholds. Specifically, the N load thresholds are sorted in ascending order, and N+1 value ranges are obtained from the sorted thresholds: the range between each pair of adjacent thresholds, the range above the maximum threshold, and the range below the minimum threshold. One load level is configured for each of the N+1 ranges, yielding the load level table. The larger the load level, the smaller the values in the range corresponding to that level.
In addition, the load level may also be referred to as a load sharing level. Based on this, the correspondence between the load sharing level and the load threshold in the load level table may refer to fig. 3, and a description thereof will not be repeated.
In the embodiment of the present application, the N load thresholds indicated by the load level table are determined according to the historical loads of the plurality of equivalent paths, where the historical loads are loads when the load balance degree among the plurality of equivalent paths reaches the target load balance degree. The load balancing degree is the proportion of equivalent paths, wherein the difference value between loads in the equivalent paths is lower than a first difference value threshold value.
The first difference threshold is a configured value. For example, the first difference threshold is a smaller value, and if the load difference between two equivalent paths is lower than the first difference threshold, the loads of the two equivalent paths may be considered to be substantially the same.
The target load balancing degree is also a configured value. Illustratively, the target load balancing is a value approaching 1, such as 90% or 95%. Based on this, when the load balance degree between the plurality of equivalent paths exceeds the target load balance degree, it can be considered that the loads between most of the equivalent paths are substantially the same, that is, the load balance is achieved between the equivalent paths.
Since the N load thresholds indicated by the load level table are determined according to the historic loads of the plurality of equivalent paths, the historic loads are loads when the load balance degree among the plurality of equivalent paths reaches the target load balance degree. Therefore, when load sharing is performed through the load level table, paths with lighter loads can be more effectively utilized, and load balancing among the current paths can be realized.
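For illustration, the load balance degree check can be sketched as follows. The text leaves the exact form of the pairwise comparison somewhat open, so the sketch uses one possible reading (the fraction of paths whose load is close to the mean load), and the threshold values are placeholders.

```python
# Sketch of the load balance degree check; the reading and the thresholds are assumptions.

def load_balance_degree(loads, first_difference_threshold):
    """Proportion of equivalent paths whose load differs from the mean load by less than
    the first difference threshold (one possible reading of the definition above)."""
    mean = sum(loads) / len(loads)
    close = sum(1 for load in loads if abs(load - mean) < first_difference_threshold)
    return close / len(loads)

def is_balanced_snapshot(loads, first_difference_threshold=2.0, target_degree=0.9):
    """Loads recorded at such a moment are the 'historical loads' from which the N load
    thresholds of the load level table are derived."""
    return load_balance_degree(loads, first_difference_threshold) >= target_degree
```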
The AI computation communication model shown in fig. 2 will be described as an example. Assuming that the N load thresholds indicated by the load level table are determined based on the traffic sent by the ToR switch to the spine switch in the first 10ms communication phase from left to right, in the second 10ms communication phase, the ToR switch may directly use the load level table to perform load sharing, so as to promote load balancing between 32 equivalent paths corresponding to 32 outgoing interfaces on the ToR switch.
It should be noted that, in the AI computing communication model shown in fig. 2, the data stream generated by the GPU in the communication phase of each round and the data stream generated by the GPU in the communication phase of the previous round are substantially the same in size, that is, the size of the data stream in the AI computing communication model has a characteristic of repeated occurrence. Based on this, with the embodiment shown in fig. 9, in the communication phase of each round, load sharing is performed directly using the load level table determined based on the traffic of the communication phase of the previous round, so as to promote load balancing between the equivalent paths in the current round of communication phase.
That is, for the AI networking shown in fig. 1, the first node in the embodiment of fig. 9 is a ToR switch, a plurality of servers are attached to the first node, each server is configured with a plurality of GPUs, and the GPUs are used to perform AI computation. The first node is configured to send the data streams generated by the plurality of GPUs to a second node, which is a spine switch. In this scenario, the historical load in step 903 can be understood as: the loads of the plurality of equal-cost paths on the ToR switch at the time when, during the most recent round of communication phase between the multiple GPUs and the ToR switch before the current time, the load balancing degree among those paths reached the target load balancing degree.
In addition, during the communication phase, the first node can also update the load level table in combination with the current load of each equal-cost path, so as to improve the load balance among the equal-cost paths when load sharing is subsequently performed based on the updated load level table.
For example, for the AI computing communication model shown in fig. 2, after load sharing is performed in the communication phase of each round by using the load level table determined based on the previous round of communication phase, the load thresholds in the load level table may be further updated in combination with the current loads of the paths and the target load balancing degree, so that the load balancing degree between the paths in the current round of communication phase reaches the target degree. The load level table updated in this round can then be used for load sharing in the next round of communication phase.
How to update the load level table is explained in detail below.
In the embodiment of the present application, the process of updating the load level table may be implemented as follows: determining the current loads of the plurality of equal-cost paths; determining the load balancing degree of the equal-cost paths based on their current loads; and if the determined load balancing degree is lower than the target load balancing degree, adjusting the load level table and returning to the operation of determining the current loads of the equal-cost paths, until the most recently determined load balancing degree exceeds the target load balancing degree.
If the determined load balancing degree is lower than the target load balancing degree, it indicates that load balancing among the equal-cost paths is not good after load sharing is performed through the current load level table, so the load level table needs to be adjusted to improve load balancing among the equal-cost paths.
In the embodiment of the application, the adjustment of the load level table can be realized by adjusting the load threshold value. Based on this, in one possible implementation, the implementation process of adjusting the load level table may be: clustering the equivalent paths into M load sharing groups based on the current loads of the equivalent paths, wherein each load sharing group comprises at least one equivalent path, and M is greater than or equal to 1; and determining a load threshold corresponding to each load sharing group based on the current load of the equivalent path in each load sharing group, obtaining M load thresholds corresponding to the M load sharing groups one by one, and updating the load level table according to the M load thresholds.
Wherein, updating the load level table according to the M load thresholds can be understood as: and replacing the original N load thresholds with the M load thresholds, and then configuring each load level in the load level table again based on the M load thresholds to obtain an updated load level table.
By clustering the equal-cost paths, paths with essentially the same load are grouped into one load sharing group. The load thresholds are then reconfigured around the current loads of the equal-cost paths in each load sharing group, thereby updating the load level table. Thus, when load sharing is performed based on the updated load level table, if the current load of a certain equal-cost path changes relative to its previous load, the load level of that path will change accordingly, because the load thresholds in the table are set around the previous loads of the equal-cost paths. Correspondingly, the first node will adjust the current load balancing policy, thereby promoting load balancing among the equal-cost paths.
In other words, the load threshold is adjusted by clustering the equivalent paths, and the load level table is updated based on the adjusted load threshold, so that load balancing among the equivalent paths can be improved after load sharing is performed according to the updated load level table, and the load balancing among the equivalent paths can reach the target load balancing degree.
The load threshold corresponding to each load sharing group may be determined based on the current loads of the equal-cost paths in the group as follows: the maximum value among the current loads of the equal-cost paths in the group is used as the load threshold of that load sharing group. This implementation likewise enables the first node to quickly sense load changes on the equal-cost paths and thus quickly adjust the load balancing policy, improving load balancing among the equal-cost paths.
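A minimal sketch of this step (the function name is hypothetical): one threshold is taken per load sharing group as the maximum current load in the group, and the resulting M thresholds replace the previous N thresholds in the table.

```python
def update_load_thresholds(groups: list[list[float]]) -> list[float]:
    """Derive one load threshold per load sharing group by taking the
    maximum current load inside the group, then return the sorted set of
    M thresholds that replaces the previous N thresholds in the table."""
    thresholds = [max(group_loads) for group_loads in groups if group_loads]
    return sorted(thresholds)

# Example: three load sharing groups found by clustering
groups = [[1.1, 1.3], [4.8, 5.0, 5.2], [9.7]]
print(update_load_thresholds(groups))  # [1.3, 5.2, 9.7]
```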
Alternatively, an average value of the current loads of the equal-cost paths in each load sharing group may be determined and used as the load threshold corresponding to that group. Alternatively, other processing may be performed on the current loads of the equal-cost paths in a load sharing group to obtain its load threshold, which is not illustrated here one by one.
The above updates the N load thresholds based on the current load of each equal-cost path. Alternatively, in another possible implementation, the original load level table may be updated by directly fine-tuning the original N load thresholds, for example by letting them fluctuate up and down slightly. This implementation is simple to operate, and the corresponding algorithm is easy to implement.
In addition, the embodiment of the application also provides a detailed scheme for the multi-path clustering process, and the explanation is provided below.
In one possible implementation, clustering the plurality of equal-cost paths into at least one load sharing group based on their current loads may be implemented as follows: the load of each of the plurality of equal-cost paths is collected once per sampling period; each time the loads collected in the current sampling period are obtained, the moving average load of each equal-cost path is determined based on the sliding weight; the equal-cost paths are clustered based on the moving average load of each path to obtain X equal-cost path clusters; and if at least one of the X clusters does not meet the target condition, the sliding weight is increased and/or the duration of the sampling period is reduced, and the per-sampling-period load collection is performed again, until the M equal-cost path clusters obtained by the last clustering meet the target condition; the M clusters are then used as the M load sharing groups.
Wherein, the target condition is: the distance between the moving average loads of any two equivalent paths in different equivalent path clusters exceeds a distance threshold. The distance threshold is a pre-configured value, and the value is used for judging whether the clustering result reaches an expected clustering result. For any two different equivalent path clusters, if the distance between the moving average loads of any two equivalent paths in the two equivalent path clusters exceeds the distance threshold, the distance between the different equivalent path clusters is far, and the clustering result reaches the expected clustering result. Accordingly, if the distance between the moving average loads of two equivalent paths in the two equivalent path clusters is lower than the distance threshold, the distance between the two equivalent path clusters is relatively close, and the boundary of each equivalent path cluster after clustering is not clear. Therefore, the clustering needs to be performed again.
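For illustration only, the following sketch (the one-dimensional gap-based clustering and all names are assumptions standing in for whatever clustering the device actually uses) shows how equal-cost paths could be grouped by moving average load and how the target condition above could be checked; when the condition is not met, the device would increase the sliding weight and/or shorten the sampling period and repeat.

```python
def cluster_by_gap(avg_loads: dict[str, float], dist_thresh: float) -> list[list[str]]:
    """Simple 1-D clustering sketch: sort paths by moving average load and
    start a new cluster whenever the gap to the previous load exceeds the
    distance threshold."""
    ordered = sorted(avg_loads, key=avg_loads.get)
    clusters: list[list[str]] = []
    prev = None
    for path in ordered:
        if prev is None or avg_loads[path] - avg_loads[prev] > dist_thresh:
            clusters.append([path])
        else:
            clusters[-1].append(path)
        prev = path
    return clusters

def meets_target(clusters, avg_loads, dist_thresh) -> bool:
    """Target condition: any two paths taken from different clusters have
    moving average loads that differ by more than the distance threshold."""
    for i, a in enumerate(clusters):
        for b in clusters[i + 1:]:
            for p in a:
                for q in b:
                    if abs(avg_loads[p] - avg_loads[q]) <= dist_thresh:
                        return False
    return True

avg = {"p1": 1.0, "p2": 1.2, "p3": 5.0, "p4": 5.1}
clusters = cluster_by_gap(avg, dist_thresh=2.0)
print(clusters)                          # [['p1', 'p2'], ['p3', 'p4']]
print(meets_target(clusters, avg, 2.0))  # True
```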
In addition, each time the load of each equal-cost path collected in the current sampling period is obtained, the moving average load of that path based on the sliding weight can be determined by the following formula:

AvePortLoading_(t+1) = AvePortLoading_t + (PortLoading_(t+1) - AvePortLoading_t) / 2^w

where AvePortLoading_(t+1) is the moving average load determined when the load collected in the (t+1)-th sampling period is obtained, PortLoading_(t+1) is the load collected in the (t+1)-th sampling period, AvePortLoading_t is the moving average load determined when the load collected in the t-th sampling period was obtained, and w is the sliding weight.
Based on the above formula, it can be known that, when the load of the equivalent path acquired in the current sampling period is acquired each time, the moving average load of the equivalent path is redetermined based on the load acquired in the current sampling period and the moving average load determined in the last sampling period by the above formula. That is, each time the load of the equivalent path acquired in the current sampling period is acquired, a moving average load corresponding to the current sampling period is determined.
The clustering of the equivalent paths based on the moving average load of each equivalent path in the equivalent paths can be understood as: and clustering the equivalent paths based on the moving average load of each equivalent path in a plurality of sampling periods.
For ease of understanding, the sliding weight is explained as follows: 2^w is an integer value. When 2^w is 1, the load fluctuation caused by the load PortLoading_(t+1) collected in the current sampling period is reflected fully in the moving average load; when 2^w is 2, the moving average load corresponding to the current sampling period reflects only 1/2 of the load fluctuation, and so on. The divisor is written in the form 2^w mainly because some special-purpose chips cannot perform division but can perform a right shift by w bits, so the moving average load algorithm can be implemented on such chips through a right-shift operation.
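A minimal sketch of the update step above, written with a right shift and integer loads as an assumption purely for illustration (the function name is hypothetical):

```python
def moving_avg_update(ave_prev: int, sample: int, w: int) -> int:
    """Moving average update from the formula above, using a right shift
    instead of a division: new = prev + (sample - prev) / 2^w."""
    return ave_prev + ((sample - ave_prev) >> w)

# With w = 0 (2^w = 1) the new sample's fluctuation shows up fully;
# with w = 1 (2^w = 2) only half of the fluctuation is reflected.
print(moving_avg_update(100, 140, 0))  # 140
print(moving_avg_update(100, 140, 1))  # 120
```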
Based on the above explanation of the sliding weight, the larger the sliding weight w, the smaller the fluctuation between the moving average loads of consecutive sampling periods, i.e., the more stable the moving average load of each equal-cost path is across sampling periods, which facilitates clustering the equal-cost paths. Therefore, when the clustering effect is not good, the sliding weight can be increased and the clustering performed again.
In addition, the shorter the duration of the sampling period, the more stable the moving average load of each equal-cost path is across sampling periods, which also facilitates clustering the equal-cost paths. Therefore, when the clustering effect is poor, the duration of the sampling period can be reduced and the clustering performed again.
It should be noted that, the step size of increasing the sliding weight and the step size of decreasing the duration of the sampling period may be preconfigured, which is not limited in the embodiment of the present application.
In addition, after the above-mentioned cyclic process until the clustering effect reaches the expected clustering effect, in order to avoid that the clustering takes a long time when the load level table is updated next time, the sliding weight used at the time of the last clustering and the duration of the sampling period may also be stored. Therefore, when the load level table needs to be updated next time, the current load of each equivalent path can be redetermined directly based on the stored sliding weight and the duration of the sampling period, so that the clustering of each equivalent path can be completed quickly.
In addition, after the first clustering of the plurality of equal-cost paths yields the X equal-cost path clusters, if the X clusters already meet the target condition, the effect of the first clustering has reached the expected clustering effect. In this scenario, a very good clustering effect indicates that the previously set sliding weight is too large or the duration of the sampling period is too small. If the current load of the equal-cost paths is determined directly based on that sliding weight or sampling period the next time the load level table is updated, more data processing will be required to determine the current load of the equal-cost paths.
Based on this, after the first clustering of the plurality of equal-cost paths yields the X equal-cost path clusters, if the X clusters meet the target condition, the sliding weight is reduced and/or the duration of the sampling period is increased, and the per-sampling-period load collection is performed again, until at least one cluster obtained by the most recent clustering does not meet the target condition; the M equal-cost path clusters obtained by the clustering before that (the second-to-last clustering) are then used as the M load sharing groups.
By the method, the sliding weight and the duration of the sampling period used in the final clustering can just meet the clustering requirement, and the situation that the sliding weight is too large or the duration of the sampling period is too small is avoided.
For ease of understanding, the process of adjusting the load level table described above will be explained below by taking fig. 10A to 10C as an example. Fig. 10A to 10C are graphs in which the abscissa represents time and the ordinate represents load, and each line in fig. 10A to 10C represents the load of one equivalent path over time. The load of the equivalent path is specifically referred to as the moving average load.
After load sharing the data stream based on the load level table, the first node gathers the load of each equivalent path. And determining the load balance degree among the equivalent paths based on the collected loads of the equivalent paths, and clustering the equivalent paths based on the loads of the equivalent paths if the load balance degree is lower than the target load balance degree.
After the first clustering, the clustering result shown in fig. 10A is obtained. As shown in fig. 10A, the lines corresponding to the paths are not well separated, so the clustering effect is poor. The sliding weight and/or the duration of the sampling period are then adjusted; after the adjustment, the load of each equal-cost path is collected again, and the equal-cost paths are clustered based on the re-collected loads to obtain the clustering result shown in fig. 10B. As shown in fig. 10B, the lines corresponding to the equal-cost paths clearly fall into four equal-cost path clusters, and the clustering effect reaches the expected effect. At this point, a load threshold can be determined for each equal-cost path cluster, for example by setting the threshold near the maximum load of the cluster. The load level table is then adjusted based on the load thresholds of the clusters, and load sharing is performed again based on the adjusted load level table.
After load sharing is performed again based on the adjusted load level table, the load of each equal-cost path continues to be collected. The load balancing degree among the equal-cost paths is determined based on the collected loads, and if it is lower than the target load balancing degree, the equal-cost paths are clustered again based on their loads, yielding the clustering result shown in fig. 10C. The clustering process is the same as described above and is not repeated here.
As shown in fig. 10C, the lines corresponding to the equal-cost paths clearly fall into two equal-cost path clusters, and the clustering effect reaches the expected effect. At this point, a load threshold can again be determined for each cluster, for example by setting it near the maximum load of the cluster; the load level table is then adjusted based on these thresholds, and load sharing is performed again based on the adjusted table. This continues in the same way until the load balancing degree among all equal-cost paths reaches the target load balancing degree, completing the update of the load level table.
Fig. 10A to 10C are examples of a clustering process, and do not constitute a limitation of the clustering process according to the present application.
In addition, the embodiments of the present application also provide a scheme that uses different sliding weights at different stages when determining the moving average load of each equal-cost path, so that the real load of the equal-cost path is quickly reflected in the determined moving average load. This is explained in detail below.
For any equivalent path, in one possible implementation, the implementation of determining the moving average load of the equivalent path with different sliding weights at different stages may be: acquiring a historical moving average load of the equivalent path, wherein the historical moving average load is determined based on the load acquired in the last sampling period of the current sampling period; if the load of the equivalent path acquired in the current sampling period exceeds the historical moving average load of the equivalent path and the difference between the load and the historical moving average load of the equivalent path exceeds a second difference threshold, determining the moving average load of the equivalent path based on the first sliding weight and the historical moving average load; if the load of the equivalent path acquired in the current sampling period is lower than the historical moving average load of the equivalent path, the moving average load of the equivalent path is determined based on the second sliding weight and the historical moving average load.
Wherein the second sliding weight is greater than the first sliding weight.
The second difference threshold is a preconfigured value. If the load of the equal-cost path collected in the current sampling period exceeds the historical moving average load of the path and the difference between them exceeds the second difference threshold, the data flow currently forwarded by the path is in a traffic rising stage. The sliding weight can then be set smaller, so that the determined moving average load rises quickly compared with the historical moving average load, and the real load condition of the equal-cost path during the traffic rising stage is promptly reflected in the determined moving average load.
Accordingly, if the load of the equal cost path collected during the current sampling period is lower than the historical moving average load of the equal cost path, it is indicated that the data stream forwarded by the equal cost path may be currently in a traffic stabilization stage. In this scenario, the sliding weight can be set to be a little larger, so that the determined sliding average load of the equal-cost paths is relatively stable in different sampling periods of the flow stabilization stage.
In addition, if the load of the equal-cost path collected in the current sampling period is lower than the historical moving average load of the path, the data flow forwarded by the path may be in a traffic stabilization stage, or in a traffic decrease or stop stage. If the data flow forwarded by the equal-cost path is in a traffic decrease or stop stage, the sliding weight can be set smaller, so that the determined moving average load falls quickly compared with the historical moving average load, and the real load condition of the equal-cost path during the traffic decrease or stop stage is promptly reflected in the determined moving average load.
Based on this, if the load of the equivalent path acquired in the current sampling period is lower than the historical moving average load of the equivalent path, the implementation process of determining the moving average load of the equivalent path based on the second sliding weight and the historical moving average load may be: if the load of the equivalent path acquired in the current sampling period is lower than the historical moving average load of the equivalent path and the number of continuous load drops does not exceed the target number, the moving average load of the equivalent path is determined based on the second sliding weight and the historical moving average load.
The number of consecutive load drops is the number of consecutive sampling periods in which the collected load is smaller than the corresponding historical moving average load. For example, a consecutive-drop count of 10 indicates that for 10 consecutive sampling periods, including the current one, the moving average load has been smaller than the moving average load of the preceding sampling period.
The target number is also a configured value. When the continuous load drop times exceeds the target times, the moving average load corresponding to a plurality of sampling periods is smaller than the moving average load corresponding to the last sampling period, and the data flow forwarded by the equal-cost path is in a flow reduction or stop stage. Accordingly, when the number of continuous load drops below the target number, it indicates that the traffic of the data stream forwarded by the current equal-cost path is reduced only temporarily, and the situation still belongs to the traffic stabilization stage.
Based on the analysis, if the load of the equivalent path acquired in the current sampling period is lower than the historical moving average load of the equivalent path, but the number of continuous load drops does not exceed the target number, the moving average load is determined based on the second sliding weight. Accordingly, if the load of the equivalent path acquired in the current sampling period is lower than the historical moving average load of the equivalent path and the number of continuous decreases of the load exceeds the target number, the moving average load of the equivalent path is determined based on the third sliding weight and the historical moving average load. Wherein the third sliding weight is smaller than the second sliding weight.
The scheme of adopting different sliding weights at different stages can be represented by the following algorithm:
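The algorithm itself appears as a figure in the original and is not reproduced in this text; the following Python sketch is an illustrative reconstruction of the piecewise rule described above (all names are hypothetical, and the handling of the case where the load rises by no more than the second difference threshold is an assumption). The chosen weight would then be applied in the moving average formula given earlier.

```python
def pick_sliding_weight(sample, hist_avg, drop_count, *, w1, w2, w3,
                        second_diff_threshold, target_drop_count):
    """Choose the sliding weight for the current sampling period.

    - sample rises well above the historical moving average -> small w1
      (fast rise);
    - sample falls below it but the run of drops is still short -> large w2
      (stay stable);
    - sample falls below it and the run of drops exceeds the target count
      -> small w3 (fast fall).
    """
    if sample > hist_avg and sample - hist_avg > second_diff_threshold:
        return w1
    if sample < hist_avg and drop_count <= target_drop_count:
        return w2
    if sample < hist_avg:
        return w3
    return w2  # otherwise keep the stable weight (assumption)
```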
That is, in the above algorithm, the first sliding weight w1 is smaller than the second sliding weight w2, and the third sliding weight w3 is also smaller than the second sliding weight w2. This algorithm is also referred to as a fast-rise, slow-fall piecewise function.
Note that, the embodiment of the present application does not limit the magnitude relation between the first sliding weight w1 and the third sliding weight w 3.
Fig. 11 is a schematic diagram of a load curve of an equal-cost path determined based on the fast-rise, slow-fall piecewise function according to an embodiment of the present application. As shown in fig. 11, the determined moving average load rises quickly in the traffic rising stage and remains stable in the traffic stabilization stage. Therefore, when the load of the equal-cost path is determined based on the fast-rise, slow-fall piecewise function, the real load of the path is quickly reflected in the determined load.
In addition, the foregoing may be further described by the flowchart shown in fig. 12. As shown in fig. 12, after the first node performs load sharing according to the current load level table (whose load thresholds are preset thresholds), the DLB system in the first node may monitor the status of the load of each equal-cost path, adaptively adjust the load thresholds based on the monitoring result, and thereby update the load level table.
Fig. 13 is a schematic diagram of adjusted load thresholds according to an embodiment of the present application. As shown in fig. 13, after the load thresholds are adjusted based on the current loads of the equal-cost paths, there are only three thresholds. Compared with the 7 fixed, pre-configured load thresholds in fig. 6, when a data stream is sent over a certain equal-cost path, the load level of that path basically does not cross multiple load levels. This avoids the problem that one equal-cost path crossing multiple load levels repeatedly shrinks the set of currently selectable equal-cost paths and makes load sharing uneven. The specific cause analysis may refer to the foregoing description of fig. 6.
In addition, in the embodiment shown in fig. 9, to avoid the situation in which, because the real load of an equal-cost path is not yet reflected in the determined load when traffic has just started, multiple large flows simultaneously select the same equal-cost path and then congest that path as traffic grows (see fig. 5 for the specific process), the load of each of the plurality of equal-cost paths in fig. 9 can be obtained in the following manner.
For any one of the plurality of equal-cost paths, a first load of the path is obtained, where the first load is the load of the path collected in the last sampling period before the first data stream is received. If the first load is smaller than a load threshold, the traffic size of the second data stream sent from the equal-cost path in the sampling period preceding that last sampling period is obtained, and the traffic of the second data stream is superimposed on the first load to obtain the load of the equal-cost path.
Here, the second data stream sent from the equal-cost path in the sampling period preceding the last sampling period can be understood as all data streams sent from the equal-cost path in that sampling period.
The load threshold is a pre-configured value. If the first load is smaller than the load threshold, the data flow forwarded by the equivalent path at the current time is in the just-started stage, and the size of the data flow forwarded by the equivalent path before the current time can be superimposed on the collected load, so that the situation that the load of the equivalent path is too small when the flow is just started is avoided.
In addition, in the embodiment shown in fig. 9, when the target path in step 903 includes Y equal-cost paths (Y being a positive integer greater than 1), consider the case in which, after the first node forwards the first data stream through the target path, the first node receives a third data stream and the target path currently determined for it is the same as the target path determined in step 903; that is, the third data stream also corresponds to the target path in step 903. In this scenario, to avoid the traffic congestion that would be caused by sending the first data stream and the third data stream over the same one of the Y equal-cost paths, one equal-cost path may be selected from the Y equal-cost paths by polling, and the third data stream sent over the selected path.
Selecting one equal-cost path from the Y equal-cost paths by polling can be understood as follows: if the first data stream is transmitted over one of the Y equal-cost paths, one of the other equal-cost paths among the Y is selected to transmit the third data stream. When the first node subsequently receives a new data stream that also corresponds to the target path in step 903, an equal-cost path is again selected from the remaining equal-cost paths among the Y paths to send the new data stream.
It will be appreciated that, during the current sampling period, since the current sampling has not yet ended, the load of each equivalent path is always a load determined based on the data acquired during the previous sampling period, and thus the load level corresponding to the load of each equivalent path is unchanged during the current sampling period. Based on this, if the first node receives a plurality of data streams within the current sampling period, the target paths determined for the plurality of data streams are the same target path. In this scenario, when the target path includes Y equal-cost paths, in order to avoid traffic congestion caused by transmitting the plurality of data streams through the same equal-cost path, one equal-cost path is selected from the Y equal-cost paths, respectively, so as to transmit the plurality of data streams.
As an example, as shown in fig. 14, assuming a sampling period of 1 μs, the target paths satisfying the preset condition in the current sampling period are path 1 and path 2. If 4 data streams are received simultaneously during the current sampling period, the 4 data streams are transmitted over path 1, path 2, path 1 and path 2, respectively. Thus realizing that a round-robin (round-robin) mode is adopted for selecting a plurality of received data streams in the current sampling period.
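A minimal round-robin sketch matching the fig. 14 example (flow names are hypothetical): within one sampling period, new data flows that map to the same target path are spread over its equal-cost paths in turn.

```python
from itertools import cycle

# Two equal-cost paths make up the target path in the current sampling period.
target_paths = ["path 1", "path 2"]
picker = cycle(target_paths)

flows = ["flow A", "flow B", "flow C", "flow D"]  # hypothetical flow names
for flow in flows:
    print(flow, "->", next(picker))
# flow A -> path 1, flow B -> path 2, flow C -> path 1, flow D -> path 2
```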
In summary, based on the embodiment shown in fig. 9, the following technical effects can be achieved:
(1) And adjusting the load threshold indicated by the load level table based on the load of each equivalent path, and updating the load level table to improve the load balance among the equivalent paths.
(2) The load of the equivalent paths is reshaped by adjusting the duration of the sampling period and/or the sliding weight so as to cluster the equivalent paths, and then an appropriate load threshold is set based on the load of each type of equivalent paths. Based on the above, when the load of the equivalent path changes, the first node can more quickly sense the load change of the equivalent path, thereby quickly adjusting the load balancing strategy.
(3) The load appearance is quickened by improving a moving average load algorithm, and the problem of inaccurate load judgment caused by too slow load appearance is fundamentally solved.
(4) In a scene (such as a sampling period) that a plurality of data streams correspond to the same target path, a polling mode is adopted to select paths for the plurality of data streams, so that excessive uneven load of each equivalent path is avoided as much as possible when the real load is not reflected on the collected load in the sampling period.
In addition, for the DLB system shown in fig. 7, the load analysis module may complete the above-described process of clustering equivalent paths and adjusting the load threshold based on the clustering result. The average load calculation module may determine the moving average load by the piecewise function of the above-described fast ramp up and slow ramp down. The routing module can adopt a polling mode to route the data streams in the scene that the data streams correspond to the same target path.
Fig. 15 is a flowchart of a load sharing method provided in an embodiment of the present application. The method is used to illustrate how the first node determines the target path by the number of data streams forwarded over each equivalent path. As shown in fig. 15, the method includes steps 1501 to 1504.
Step 1501: the first node receives a first data stream.
In step 1501, the first node receiving the first data stream may be understood as the first node receiving, for the first time, a message belonging to the first data stream. To facilitate subsequently sending messages of the same data flow over the same path, in the embodiment of the present application a data flow identifier is further configured for each equal-cost path, the data flow identifier indicating the data flows forwarded by that path.
In this scenario, when the first node receives any message, it first determines the identifier of the first data stream to which the message belongs. It then obtains the data flow identifier corresponding to each of the plurality of equal-cost paths. If the identifier of the first data stream differs from all of the obtained data flow identifiers, it is determined that the first data stream is not a data stream the first node has already been sending, and in this scenario the following steps 1502 to 1504 are performed to select a forwarding path for the message.
Accordingly, if it is determined that the first data stream is a data stream that has been transmitted by the first node based on the identification of the first data stream, an equivalent path used to forward the first data stream before the current time is determined, and the message is transmitted through the determined equivalent path.
Step 1502: the first node acquires a plurality of equivalent paths corresponding to the first data flow and the number of the data flows corresponding to each equivalent path in the equivalent paths, wherein the number of the data flows indicates the number of the data flows forwarded by the corresponding equivalent path.
Step 1503: the first node determines a target path from the plurality of equivalent paths based on the number of data flows corresponding to each of the plurality of equivalent paths. The target path is a path meeting preset conditions in a plurality of equivalent paths, wherein the preset conditions comprise that the quantity of forwarded data flows is minimum, and the message interval in each data flow is smaller than the set time interval.
Step 1504: the first node forwards the first data stream over the target path.
The number of data streams corresponding to each equal-cost path may be recorded in a flow table. Based on this, in step 1502, the number of data streams corresponding to each equal-cost path may be obtained from the flow table. Accordingly, after the first data stream is forwarded through the target path in step 1504, the number of data streams corresponding to the target path in the flow table may also be updated, for example increased by 1.
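For illustration only, a small sketch of the flow-count based selection in steps 1502 to 1504 (names are hypothetical): the path forwarding the fewest data flows is chosen and its count is then incremented.

```python
def pick_target_path(flow_counts: dict[str, int]) -> str:
    """Flow-count based selection sketch: choose the equal-cost path that is
    currently forwarding the fewest data flows, then bump its count."""
    target = min(flow_counts, key=flow_counts.get)
    flow_counts[target] += 1
    return target

counts = {"path 1": 3, "path 2": 1, "path 3": 2}
print(pick_target_path(counts))  # "path 2"
print(counts)                    # {'path 1': 3, 'path 2': 2, 'path 3': 2}
```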
In addition, after forwarding the first data stream through the target path, the data stream identifier corresponding to the target path may be further updated, so that when a message belonging to the first data stream is subsequently received again, the message is forwarded based on the target path.
In addition, as the communication time increases, the number of data flows corresponding to each equal-cost path on the first node grows larger and larger. If the number of data streams recorded in the flow table for each equal-cost path no longer represents the number of data streams actually forwarded by that path in the recent period, load sharing based on those numbers cannot promote load balancing well. Therefore, in the embodiment of the present application, a set time interval (i.e., an aging time) is further provided, which is used to age the number of data flows corresponding to each equal-cost path in the flow table.
In one possible implementation, aging the number of data flows corresponding to each equal-cost path in the flow table based on the set time interval may be implemented as follows: after the first data stream is sent through the target path, the time at which a message belonging to the first data stream was last received before the current time is determined; if the difference between that time and the current time exceeds the set time interval, the number of data streams corresponding to the target path is reduced.
The set time interval is preconfigured; for example, it may be a value between 200 μs and 1000 μs. If the difference between the current time and the time at which a message belonging to the first data stream was last received exceeds the set time interval, it indicates that no message of the first data stream has been received recently. At this time, the first data stream needs to be aged out of the data streams corresponding to the target path, so as to reduce the number of data streams corresponding to the target path, for example by 1.
In addition, when the first data stream is aged out of the data stream corresponding to the target path, the identification of the first data stream is deleted from the data stream identifications corresponding to the target path.
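As an illustration only, the following sketch (the flow-table layout and names are hypothetical, and the interval is given in seconds for simplicity) shows the aging step described above: flows whose last message is older than the set time interval are removed, their identifiers deleted, and the path's flow count decreased.

```python
import time

def age_flows(flow_table: dict, path: str, aging_interval_s: float) -> None:
    """Aging sketch. `flow_table[path]` is assumed to look like
    {"count": int, "flows": {flow_id: last_packet_time}}."""
    now = time.monotonic()
    entry = flow_table[path]
    for flow_id, last_seen in list(entry["flows"].items()):
        if now - last_seen > aging_interval_s:
            del entry["flows"][flow_id]              # remove the flow identifier
            entry["count"] = max(0, entry["count"] - 1)  # decrease the flow count
```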
In addition, in the embodiment of the present application, if all the messages of a large flow are forwarded over the same equal-cost path and that path becomes congested because of the large flow, the congestion will last a long time. To avoid this, a path switching interval may be set, and the forwarding path is switched for the large flow according to this interval.
In one possible implementation, the implementation procedure of switching the forwarding path for the large flow through the path switching interval may be: after forwarding the first data stream through the target path, acquiring a difference value between the current time and the initial sending time, wherein the initial sending time is the time for forwarding the message belonging to the first data stream through the target path for the first time; if the difference between the current time and the initial sending time exceeds the path switching interval, acquiring the flow size of the first data flow forwarded through the target path before the current time; if the flow size of the first data flow forwarded before the current time exceeds the target value, selecting an alternative path from a plurality of equivalent paths, wherein the alternative path and the target path are different equivalent paths; and continuing to send the first data stream through the alternative path, increasing the number of the data streams corresponding to the alternative path, and reducing the number of the data streams corresponding to the target path.
Accordingly, if the traffic size of the first data stream forwarded before the current time is lower than the target value, the first data stream does not need to be rerouted in the above manner.
That is, when the timing time reaches the path switching interval, if the flow of the first data stream in the path switching interval exceeds the target value, it indicates that the first data stream is a large stream, and in order to avoid congestion of the target path, an equivalent path is reselected to send the first data stream when the timing time reaches the path switching interval. If the traffic size of the first data stream forwarded before the current time is lower than the target value, the first data stream is a small stream, and then the forwarding path of the first data stream is not required to be switched. In this way, a large stream can be divided into a plurality of small streams, and the divided small streams are then transmitted on different equivalent paths.
The path switching interval is preconfigured. The path switching interval may be a value between 10 mus and 100 mus, for example.
In one possible implementation, the path switch interval may be greater than the path delay difference. The path delay difference is a difference value between a first path delay and a second path delay, the first path delay is a time length required for sending a message through the first path, the second path delay is a time length required for sending the message through the second path, and the first path and the second path are two different paths from a source address to a destination address of the first data stream.
For example, for the networking shown in fig. 1, the first path and the second path may be two paths between two different GPUs. When the path switching interval is larger than the path delay difference, the large flow can not cause disorder even if the path is switched in the mode.
Alternatively, the path switch interval may be determined in other ways, which are not illustrated here.
In addition, after the first node sends the message of the first data stream for the first time, the number of the messages of the first data stream sent by the first node subsequently can be recorded through a counter, so that when the current time reaches the path switching interval, the number of the messages of the first data stream sent by the first node is determined based on the counter, and the number of the messages is used as the flow size of the first data stream forwarded by the first node.
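For illustration, the following sketch shows the large-flow rerouting check described above (field names are hypothetical, and picking the first alternative path is an assumption; the path switching interval and the target value would be preconfigured).

```python
def maybe_reroute(flow: dict, paths: list[str], now: float, *,
                  switch_interval: float, target_bytes: int) -> str:
    """Large-flow rerouting sketch. `flow` is a hypothetical record with:
    path (current target path), start_time (first time a packet of this flow
    was sent on that path), sent_bytes (traffic forwarded since then)."""
    if now - flow["start_time"] < switch_interval:
        return flow["path"]                      # switching interval not reached
    if flow["sent_bytes"] < target_bytes:
        return flow["path"]                      # small flow: keep the path
    # Large flow: pick an alternative equal-cost path and restart the window.
    alternatives = [p for p in paths if p != flow["path"]]
    if alternatives:
        flow["path"] = alternatives[0]
    flow["start_time"] = now
    flow["sent_bytes"] = 0
    return flow["path"]
```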
Furthermore, the specification of the flow table, i.e., the number of data flows that can be recorded in the flow table for the equal-cost paths, is limited. For example, the current flow table allows 100 data flows to be recorded for each equal-cost path. In this scenario, when the first node receives a message belonging to the first data stream for the first time, it obtains the number of data streams corresponding to each of the plurality of equal-cost paths; if the number of data streams corresponding to every equal-cost path has reached the target number, one equal-cost path is selected from the plurality of equal-cost paths as the target path by polling.
Here, the target number is the number of data streams that the flow table allows to be recorded for a single equal-cost path. If the number of data streams corresponding to every equal-cost path has reached the target number, the content recorded in the current flow table has reached the size allowed by its specification; at this time, one equal-cost path is selected from the plurality of equal-cost paths as the target path by polling. In addition, when the path is selected by polling, information about these data streams is not recorded in the flow table.
In summary, based on the embodiment shown in fig. 15, the following technical effects can be achieved:
(1) The embodiment shown in fig. 15 actively performs load sharing based on the number of data flows forwarded by each equivalent path, thus actively ensuring the uniformity of the data flows on each equivalent path, as opposed to passively adjusting the load balancing policy by the load of each equivalent path.
(2) The large flow is split into a plurality of small flows through a large flow rerouting mechanism, so that the flow fine granularity division is ensured. In addition, the path switching interval is configured to be larger than the path delay difference, so that the problem of disorder caused by path switching after dividing a large flow is avoided.
The following explains the structure of the network device according to the embodiment of the present application. The functions of any of the nodes involved in the foregoing embodiments may be implemented by the following network devices.
Fig. 16 is a schematic structural diagram of a network device according to an embodiment of the present application. As shown in fig. 16, the network device 1600 includes a receiving module 1601, a processing module 1602, and a sending module 1603.
A receiving module 1601, configured to receive a first data stream. Specific implementation may refer to step 801 in the embodiment of fig. 8, or step 901 in the embodiment of fig. 9, or step 1501 in the embodiment of fig. 15.
The processing module 1602 is configured to obtain a plurality of equivalent paths corresponding to the first data stream. Specific implementations may refer to step 802 in the embodiment of fig. 8, or step 902 in the embodiment of fig. 9, or step 1502 in the embodiment of fig. 15.
The processing module 1602 is further configured to determine a target path from the plurality of equal-cost paths, where the target path is a path that satisfies a preset condition among the plurality of equal-cost paths. The preset condition includes that the number of forwarded data flows is minimum and the message interval in each data flow is smaller than a set time interval; or that the load level corresponding to the path's load is the lightest load level currently allowed, where the load level of each equal-cost path is determined according to the current load of that path and the historical loads of the plurality of equal-cost paths. Specific implementation may refer to step 803 in the embodiment of fig. 8, or step 903 in the embodiment of fig. 9, or step 1503 in the embodiment of fig. 15.
The sending module 1603 is configured to forward the first data stream through the target path. Specific implementation may refer to step 804 in the embodiment of fig. 8, or step 904 in the embodiment of fig. 9, or step 1504 in the embodiment of fig. 15.
Optionally, the load level of each equivalent path is determined according to the load of the current equivalent path and a load level table, the load level table indicates N load thresholds, the N load thresholds are determined according to the historical loads of the equivalent paths, the historical loads are loads when the load balance degree among the equivalent paths reaches the target load balance degree, the load balance degree is the proportion of equivalent paths, in which the difference value among the loads in the equivalent paths is lower than the first difference value threshold, and N is greater than or equal to 1.
Optionally, the processing module 1602 is further configured to:
determining the current loads of a plurality of equivalent paths;
determining the load balancing degree of the equivalent paths based on the current loads of the equivalent paths;
if the determined load balance degree is lower than the target load balance degree, the load level table is adjusted, and the operation of determining the current loads of the equivalent paths is performed in a returning mode until the last determined load balance degree exceeds the target load balance degree.
Optionally, the processing module 1602 is specifically configured to:
clustering the equivalent paths into M load sharing groups based on the current loads of the equivalent paths, wherein each load sharing group comprises at least one equivalent path, and M is greater than or equal to 1;
and determining a load threshold corresponding to each load sharing group based on the current load of the equivalent path in each load sharing group, obtaining M load thresholds corresponding to the M load sharing groups one by one, and updating the load level table according to the M load thresholds.
Optionally, the processing module 1602 is specifically configured to:
the maximum load among the current loads of the equal-cost paths in each load sharing group is used as the load threshold corresponding to the load sharing group.
Optionally, the processing module 1602 is specifically configured to:
collecting the load of each equivalent path in a plurality of equivalent paths once in every interval sampling period;
when the load of each equivalent path acquired in the current sampling period is acquired each time, determining the sliding average load of each equivalent path based on the sliding weight;
clustering the equivalent paths based on the moving average load of each equivalent path in the equivalent paths to obtain X equivalent path clusters, wherein X is greater than or equal to 1;
If at least one equivalent path cluster in the X equivalent path clusters does not meet the target condition, increasing the sliding weight and/or reducing the duration of the sampling period, and returning to execute the acquisition of the load of each equivalent path in the equivalent paths once every interval sampling period until M equivalent path clusters are obtained after the last clustering to meet the target condition, and taking the M equivalent path clusters as M load sharing groups;
wherein, the target condition is: the distance between the moving average loads of any two equivalent paths in different equivalent path clusters exceeds a distance threshold.
Optionally, the processing module 1602 is specifically configured to:
if the X equivalent path clusters meet the target condition, the sliding weight is reduced and/or the duration of the sampling period is increased, the load of each equivalent path in the equivalent paths is collected once in every interval sampling period is returned to be executed, until at least one equivalent path cluster after the last clustering does not meet the target condition, and the M equivalent path clusters after the last-last clustering are used as M load sharing groups.
Optionally, the processing module 1602 is specifically configured to:
acquiring a historical moving average load of the equivalent path, wherein the historical moving average load is determined based on the load acquired in the last sampling period of the current sampling period;
If the load of the equivalent path acquired in the current sampling period exceeds the historical moving average load of the equivalent path and the difference value between the load and the historical moving average load of the equivalent path exceeds a second difference value threshold, determining the moving average load of the equivalent path based on the first sliding weight and the historical moving average load;
if the load of the equivalent path acquired in the current sampling period is lower than the historical moving average load of the equivalent path, determining the moving average load of the equivalent path based on the second sliding weight and the historical moving average load;
wherein the second sliding weight is greater than the first sliding weight.
Optionally, the processing module 1602 is specifically configured to:
if the load of the equivalent path acquired in the current sampling period is lower than the historical moving average load of the equivalent path and the continuous descending frequency of the load does not exceed the target frequency, determining the moving average load of the equivalent path based on the second sliding weight and the historical moving average load, wherein the continuous descending frequency of the load is the number of continuous sampling periods when the load acquired in the sampling period is continuously smaller than the corresponding historical moving average load.
Optionally, the processing module 1602 is specifically configured to:
if the load of the equivalent path acquired in the current sampling period is lower than the historical moving average load of the equivalent path and the continuous descending times of the load exceeds the target times, determining the moving average load of the equivalent path based on the third sliding weight and the historical moving average load;
Wherein the third sliding weight is smaller than the second sliding weight.
Optionally, the processing module 1602 is further configured to:
acquiring a first load of an equivalent path, wherein the first load is the load of the equivalent path acquired in the last sampling period before receiving a first data stream;
and if the first load is smaller than the load threshold, acquiring the traffic size of the second data stream sent from the equal-cost path in the sampling period preceding the last sampling period, and superposing the traffic size of the second data stream on the first load to obtain the load of the equal-cost path.
Alternatively, when the target path includes Y equivalent paths, the reception module 1601 is further configured to: receiving a third data stream, wherein the third data stream corresponds to the target path;
accordingly, the processing module 1602 is also configured to: selecting an equivalent path from Y equivalent paths in a polling mode;
the sending module 1603 is further configured to: and transmitting the third data stream through the selected equivalent path.
Optionally, the processing module 1602 is specifically configured to:
acquiring the number of data streams corresponding to each equivalent path in a plurality of equivalent paths, wherein the number of the data streams indicates the number of the data streams forwarded by the equivalent paths;
and taking the equivalent path with the minimum number of corresponding data streams in the equivalent paths as a target path.
Accordingly, the processing module 1602 is also configured to:
the number of data streams corresponding to the target path is increased.
Optionally, the processing module 1602 is further configured to:
determining the time of last receiving a message belonging to a first data stream before the current time;
if the difference between the current time and the time at which a message belonging to the first data stream was last received exceeds the set time interval, the number of data streams corresponding to the target path is reduced.
Optionally, the processing module 1602 is further configured to:
acquiring a difference value between the current time and the initial sending time, wherein the initial sending time is the time for forwarding the message belonging to the first data stream through the target path for the first time;
if the difference between the current time and the initial sending time exceeds the path switching interval, acquiring the flow size of the first data flow forwarded through the target path before the current time;
if the flow size of the first data flow forwarded before the current time exceeds the target value, selecting an alternative path from a plurality of equivalent paths, wherein the alternative path and the target path are different equivalent paths;
and continuing to send the first data stream through the alternative path, increasing the number of the data streams corresponding to the alternative path, and reducing the number of the data streams corresponding to the target path.
Optionally, the path switching interval is greater than a path delay difference, the path delay difference is a difference between a first path delay and a second path delay, the first path delay is a duration required for sending a message through the first path, the second path delay is a duration required for sending a message through the second path, and the first path and the second path are two different paths from a source address to a destination address of the first data stream.
Optionally, after acquiring the number of data flows corresponding to each of the plurality of equal cost paths, the processing module 1602 is further configured to:
and if the number of the data streams corresponding to each equivalent path reaches the target number, selecting one equivalent path from the equivalent paths as the target path in a polling mode.
In the embodiment of the present application, the target path may be determined by the number of data flows forwarded by each equal-cost path. Because the number of data streams forwarded by an equal-cost path can be updated immediately after a data stream is sent over it, this approach, compared with determining the target path from the load of each path, does not need to determine the load of each equal-cost path at all, thereby avoiding the problems caused when the determined load cannot reflect the real load in time. In other words, relative to passively achieving load balancing through the load of each equal-cost path, this method actively performs load sharing based on the number of data streams forwarded by each path and ensures load balance at the source.
In addition, in this embodiment of the application, the target path may also be determined by the load level corresponding to the load of each equal-cost path. The load level corresponding to the target path is the lightest load level currently allowed, and the load level of each equal-cost path is determined according to that path's current load and the historical loads of the plurality of equal-cost paths; that is, the load level of an equal-cost path is graded with the historical loads as the reference. Because the first node adjusts the target path based on these per-path load levels, it can quickly adjust the current load balancing strategy when the load of an equal-cost path changes. In other words, in the scheme in which the target path is determined by the load levels of the equal-cost paths, the first node quickly senses a load change on any equal-cost path and can therefore quickly adjust the current load balancing strategy. A hedged sketch of one possible grading scheme follows.
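This passage does not specify how a load is graded against the historical loads, so the sketch below assumes one simple scheme: recent load samples of the equal-cost paths are kept in a sliding window, the window's range is split into a fixed number of levels, and the path whose current load falls into the lightest occupied level is chosen. The window length, the number of levels, and the tie-breaking rule are all assumptions, not details taken from this application.

```python
from collections import deque

HISTORY_LENGTH = 128   # assumed number of historical load samples retained
NUM_LEVELS = 4         # assumed number of load levels

load_history = deque(maxlen=HISTORY_LENGTH)   # historical loads of the equal-cost paths

def load_level(current_load):
    """Grade a current load against the historical loads; level 0 is the lightest."""
    if not load_history:
        return 0
    lo, hi = min(load_history), max(load_history)
    if hi == lo:
        return 0
    ratio = (current_load - lo) / (hi - lo)
    return max(0, min(NUM_LEVELS - 1, int(ratio * NUM_LEVELS)))

def select_by_load_level(path_loads):
    """Pick an equal-cost path whose load level is the lightest level currently present."""
    levels = {path: load_level(load) for path, load in path_loads.items()}
    lightest = min(levels.values())
    target = next(p for p, lvl in levels.items() if lvl == lightest)   # assumed tie-breaking
    load_history.extend(path_loads.values())   # record the loads so later gradings track changes
    return target
```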
It should be noted that the division into the functional modules described above is used only for illustration when the network device provided in the foregoing embodiment performs load sharing. In practical applications, the foregoing functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or some of the functions described above. In addition, the network device provided in the foregoing embodiment and the load sharing method embodiment belong to the same concept; for its detailed implementation process, refer to the method embodiment, and details are not described herein again.
Fig. 17 is a schematic structural diagram of a network device according to an embodiment of the present application. Any of the foregoing nodes may be implemented by the network device shown in fig. 17. Referring to fig. 17, the network device includes a processor 1701, a communication bus 1702, a memory 1703, and at least one communication interface 1704.
The processor 1701 may be a general-purpose central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling execution of the programs in the present application.
Communication bus 1702 is used to transfer information between the above-described components.
The memory 1703 may be, but is not limited to, a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including a compact disc, a laser disc, an optical disc, a digital versatile disc, a Blu-ray disc, or the like), a magnetic disk or another magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 1703 may exist independently and be connected to the processor 1701 through the communication bus 1702, or the memory 1703 may be integrated with the processor 1701.
The memory 1703 is configured to store the program code for executing the solutions of the present application, and the processor 1701 controls the execution. The processor 1701 is configured to execute the program code stored in the memory 1703, and the program code may include one or more software modules. The network device may implement the load sharing described in the foregoing embodiments through the processor 1701 and the one or more software modules in the program code in the memory 1703.
The communication interface 1704 uses any transceiver-like apparatus to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
In a particular implementation, as one embodiment, the network device may include multiple processors, such as processor 1701 and processor 1705 shown in FIG. 17. Each of these processors may be a single-core (single-CPU) processor or may be a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used for implementation, the embodiments may be implemented completely or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
Those skilled in the art will understand that all or some of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing descriptions are merely embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, or the like made within the principles of the embodiments of the present application shall fall within the protection scope of the embodiments of the present application.
Claims (36)
Applications Claiming Priority (2)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202210041176 | 2022-01-14 | | |
| CN2022100411766 | 2022-01-14 | | |
Publications (1)

| Publication Number | Publication Date |
| --- | --- |
| CN116489154A | 2023-07-25 |
Family
ID=87210710
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202210266506.1A (CN116489154A, pending) | A load sharing method and related device | 2022-01-14 | 2022-03-17 |
Country Status (1)

| Country | Link |
| --- | --- |
| CN (1) | CN116489154A |
Cited By (2)

| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN117579543A * | 2024-01-16 | 2024-02-20 | 苏州元脑智能科技有限公司 | Data stream segmentation method, device, equipment and computer readable storage medium |
| CN117579543B * | 2024-01-16 | 2024-04-05 | 苏州元脑智能科技有限公司 | Data stream segmentation method, device, equipment and computer readable storage medium |
Legal Events

| Date | Code | Title | Description |
| --- | --- | --- | --- |
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |