
CN116302578B - A QoS-constrained streaming application delay assurance method and system

Info

Publication number: CN116302578B
Application number: CN202310595972.9A
Authority: CN (China)
Prior art keywords: delay, executors, communication overhead, model, components
Legal status: Active
Other versions: CN116302578A
Other languages: Chinese (zh)
Inventors: 孙大为, 管程钧
Current assignee: China University of Geosciences Beijing
Original assignee: China University of Geosciences Beijing
Events: application filed by China University of Geosciences Beijing; priority to CN202310595972.9A; publication of CN116302578A; application granted; publication of CN116302578B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G06F 9/5083 Techniques for rebalancing the load in a distributed system

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides a QoS-constrained streaming application delay assurance method and system in the field of streaming computing, including: constructing a delay constraint model by modeling the topology as a queuing network; evaluating system delay through the delay constraint model, assigning executors to prioritized components in the system, adjusting the parallelism of components in the topology task online, and optimizing processing delay under different QoS constraint scenarios; constructing a communication overhead model; and, based on a greedy algorithm, performing task scheduling on the components after parallelism adjustment to obtain the task schedule that minimizes communication overhead. The data monitoring module and performance optimization module of Lg-Stream are implemented and integrated into Apache Storm, a typical distributed stream computing system, and the system indicators are comprehensively evaluated in terms of system delay, throughput, resource utilization, and resource usage. Compared with the existing Apache Storm framework, the proposed Lg-Stream achieves a significant improvement in system performance.

Description

A QoS-constrained streaming application delay assurance method and system

Technical Field

The present invention relates to the technical field of streaming computing, and in particular to a QoS-constrained streaming application delay assurance method and system.

Background

Effective use of computing resources in a stream computing environment is critical to system performance. Existing scheduling strategies can hardly guarantee latency under QoS constraints, and scheduling itself also incurs communication costs. How to allocate resources reasonably under QoS constraints while maintaining low latency remains a challenge.

In terms of schema conversion, traditional heterogeneous database migration usually relies on ETL methods. Such methods often require customization, and the data schema conversion strategy usually depends solely on the experience of database administrators, requiring substantial manpower and material resources for development and testing; the cost is too high and the approach is not general-purpose.

The development of distributed stream computing has attracted the attention of experts and scholars at home and abroad, and various stream computing frameworks such as Storm and Heron have emerged. They offer strong real-time computing capability and can execute a variety of computing tasks. Storm's model is simple, and its high fault tolerance, high reliability, high scalability, and support for multiple programming languages have made it a mainstream real-time stream computing framework [Wei X, Li L, Li X, et al. Pec: Proactive elastic collaborative resource scheduling in data stream processing[J]. IEEE Transactions on Parallel and Distributed Systems, 2019, 30(7): 1628-1642]. Storm performs excellently on data timeliness and has strong fault tolerance and scalability, so it is widely used in many fields. However, Storm's task allocation and resource scheduling techniques still have shortcomings: they cannot allocate tasks and use computing resources reasonably according to real-time data, which causes considerable resource waste and system overhead. Currently, Storm's scheduling strategy consists of three steps: setting component parallelism, assigning Executors to Workers, and assigning Workers to Worker nodes. Storm's existing scheduling strategy usually relies on users configuring component parallelism in the task topology; the parallelism is set by users based on their own experience, which may be suboptimal for the components at hand and may lead to longer tuple processing delay and lower throughput. Meanwhile, Storm's default scheduling algorithm does not consider the cost of data transmission between cluster nodes, which also degrades system performance. Reasonable scheduling is therefore crucial for Storm.

Summary of the Invention

The present invention provides a QoS-constrained streaming application delay assurance method and system, addressing the problems that the existing Storm scheduling strategy usually relies on users configuring component parallelism in the task topology, which is suboptimal for the components at hand and may lead to longer tuple processing delay and lower throughput, and that Storm's default scheduling algorithm does not consider the cost of data transmission between cluster nodes.

To achieve the above objectives, the technical solution provided by the present invention is as follows: a QoS-constrained streaming application delay assurance method, characterized in that the steps include:

S1. Modeling the topology tasks of the system as a queuing network and constructing a delay constraint model;

S2. Evaluating the topology delay in the system through the delay constraint model, assigning executors to prioritized components in the system, adjusting the parallelism of components in the topology task online, and optimizing processing delay under different QoS constraint scenarios;

wherein the different QoS constraint scenarios include: limited resources and delay constraints;

S3. Constructing a communication overhead model; based on a greedy algorithm combined with the communication overhead model, performing task scheduling on the components after parallelism adjustment to obtain the task schedule that minimizes communication overhead, thereby completing QoS-constrained streaming application delay assurance.

Preferably, in step S1, modeling the topology tasks of the system as a queuing network and constructing a delay constraint model includes:

constructing, based on the parameters flowing into the queuing network, a delay constraint model to evaluate the average processing delay of the entire topology;

wherein the parameters include: the average arrival rate λ0 of tuples flowing into the queuing network; a single component Si; the average arrival rate λi of its input tuples; the number ci of executors allocated to Si; the average processing rate µi of each executor; the maximum number cmax of executors that can be allocated; and the reduction Li in the average tuple processing delay after one executor is added;

wherein Li = E[Ti](ci) - E[Ti](ci+1); i is a positive integer subscript variable, 1≤i≤N; E denotes expectation; and Ti denotes the average processing delay of a single component Si.

Preferably, in step S2, evaluating the topology delay in the system through the delay constraint model, assigning executors to prioritized components in the system, adjusting the parallelism of components in the topology task online, and optimizing processing delay under different QoS constraint scenarios includes:

in the limited-resources scenario, where the number of executors is limited: computing λ0, λi, and µi within the statistics period, and computing the average processing delay E[Ti](Si) of the input tuples of all components through the delay constraint model; preferentially allocating executors to the component that affects the average processing delay of the whole topology's input tuples the most, until the executors are used up; the resulting executor allocation minimizes the average processing delay of the topology's input tuples, yielding the optimal allocation scheme;

in the delay-constrained scenario: using a greedy algorithm, each time preferentially allocating an executor to the component that has the greatest impact on the total residence time of the whole topology's input tuples; the delay decreases as the number of executors gradually increases, until the delay limit Tmax is just satisfied, yielding the optimal allocation scheme.

Preferably, in step S3, constructing a communication overhead model includes:

constructing a communication overhead model for the topology by evaluating the traffic between nodes.

Preferably, in step S3, performing, based on the greedy algorithm, task scheduling on the components after parallelism adjustment to obtain the task schedule that minimizes communication overhead includes:

collecting, within the statistics period, the tuple transmission rate Ei,j between executors that communicate with each other, the tuple transmission rate Wi,j between Workers, the CPU usage Uwi of each Worker, and the CPU usage Uni of all Worker nodes in the cluster, where j is a positive integer subscript variable, 1≤j≤N, and i≠j;

assigning, based on the greedy algorithm combined with the communication overhead model, executors that communicate with each other to the same node;

the greedy algorithm executes the optimal allocation at each step to obtain the final allocation result, which is the optimal solution for reducing network communication overhead between Worker nodes;

adding, during the allocation process, a performance optimization threshold and a Worker node CPU utilization threshold to control how often the allocation scheme is triggered, obtaining the task schedule that minimizes communication overhead.

Preferably, assigning, based on the greedy algorithm combined with the communication overhead model, executors that communicate with each other to the same node includes:

optimizing inter-Worker communication overhead based on the greedy algorithm combined with the communication overhead model: sorting Ei,j in descending order, selecting by the greedy algorithm the two executors Executori and Executorj with the largest traffic in the Ei,j sequence, and placing the two executors into the Worker with the lowest CPU utilization;

optimizing inter-Worker-node communication overhead based on the greedy algorithm combined with the communication overhead model: sorting Wi,j in descending order, selecting each time by the greedy algorithm the two Workers Workeri and Workerj with the largest traffic in the Wi,j sequence, and placing the two Workers into the Worker node with the lowest CPU utilization;

repeating the above two optimization processes until all executors and Workers are allocated.

Preferably, adding a performance optimization threshold during the allocation process to control how often the allocation scheme is triggered, obtaining the task schedule that minimizes communication overhead, includes:

adding a performance optimization threshold during the allocation process to control how often the allocation scheme is triggered, and judging whether to perform the scheduling operation; the scheduling operation is performed only when the percentage improvement in inter-node traffic after scheduling reaches the performance optimization threshold or above.

Preferably, the method further includes: setting a utilization threshold for Worker node CPU utilization, and if an allocation result exceeds the Worker node CPU utilization threshold, distributing the load to other, lightly loaded Worker nodes.

In one aspect, a QoS-constrained streaming application delay assurance system is provided, the system including:

a scheduling module, configured to construct a delay constraint model by modeling the topology as a queuing network; evaluate the topology delay in the system through the delay constraint model; assign executors to prioritized components in the system; and adjust the parallelism of components in the topology task online to optimize processing delay under different QoS constraint scenarios;

a parallelism optimization module, configured to construct a communication overhead model; and, based on a greedy algorithm combined with the communication overhead model, perform task scheduling on the components after parallelism adjustment to obtain the task schedule that minimizes communication overhead, thereby completing QoS-constrained streaming application delay assurance.

Preferably, the scheduling module is further configured to construct, based on the parameters flowing into the queuing network, a delay constraint model to evaluate the average processing delay of the entire topology;

wherein the parameters include: the average arrival rate λ0 of tuples flowing into the queuing network; a single component Si; the average arrival rate λi of its input tuples; the number ci of executors allocated to Si; the average processing rate µi of each executor; the maximum number cmax of executors that can be allocated; and the reduction Li in the average tuple processing delay after one executor is added;

wherein Li = E[Ti](ci) - E[Ti](ci+1); i is a positive integer subscript variable, 1≤i≤N; E denotes expectation; and Ti denotes the average processing delay of a single component Si.

In one aspect, an electronic device is provided. The electronic device includes a processor and a memory, the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the above QoS-constrained streaming application delay assurance method.

In one aspect, a computer-readable storage medium is provided. The storage medium stores at least one instruction, and the at least one instruction is loaded and executed by a processor to implement the above QoS-constrained streaming application delay assurance method.

Compared with the prior art, the above technical solution has at least the following beneficial effects:

In the above solution, the present invention implements the data monitoring module and the performance optimization module of Lg-Stream and integrates them into Apache Storm, a typical distributed stream computing system, and comprehensively evaluates the system indicators in terms of system delay, throughput, resource utilization, and resource usage. Experimental results show that, compared with the existing Apache Storm framework, the proposed Lg-Stream achieves a significant improvement in system performance. Compared with Apache Storm's existing default scheduling strategy and online scheduling strategy, the proposed Lg-Stream system reduces delay and resource usage and improves throughput and resource utilization.

Brief Description of the Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

Fig. 1 is a schematic flowchart of the QoS-constrained streaming application delay assurance method provided by an embodiment of the present invention;

Fig. 2 shows Algorithm 1, the parallelism elastic scaling algorithm for the limited-resources scenario, provided by an embodiment of the present invention;

Fig. 3 shows Algorithm 2, the parallelism elastic scaling algorithm for the delay-constrained scenario, provided by an embodiment of the present invention;

Fig. 4 is a diagram of the task allocation model provided by an embodiment of the present invention;

Fig. 5 shows Algorithm 1, the scheduling algorithm for optimizing inter-Worker communication overhead, provided by an embodiment of the present invention;

Fig. 6 shows Algorithm 2, the scheduling algorithm for optimizing inter-Worker-node communication overhead, provided by an embodiment of the present invention;

Fig. 7 is a block diagram of the QoS-constrained streaming application delay assurance system provided by an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.

Detailed Description of the Embodiments

In order to make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. Based on the described embodiments, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

Aiming at the problem that existing scheduling strategies can hardly guarantee delay under QoS constraints, the present invention provides a delay-assured scheduling strategy under QoS constraints, Lg-Stream.

As shown in Fig. 1, an embodiment of the present invention provides a QoS-constrained streaming application delay assurance method, which can be implemented by an electronic device. As shown in the flowchart of Fig. 1, the processing flow of the method may include the following steps:

S101. Construct a delay constraint model.

In a feasible implementation, a delay constraint model is constructed based on the parameters flowing into the queuing network to evaluate the average processing delay of the entire topology;

wherein the parameters include: the average arrival rate λ0 of tuples flowing into the queuing network; a single component Si; the average arrival rate λi of its input tuples; the number ci of executors allocated to Si; the average processing rate µi of each executor; the maximum number cmax of executors that can be allocated; and the reduction Li in the average tuple processing delay after one executor is added;

wherein Li = E[Ti](ci) - E[Ti](ci+1); i is a positive integer subscript variable, 1≤i≤N; E denotes expectation; and Ti denotes the average processing delay of a single component Si.
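
The patent leaves the closed form of E[Ti] to the figures, but with Poisson arrivals at rate λi served by ci executors of rate µi, each component can be read as an M/M/c station, which is one standard way to realize the model. The sketch below makes that assumption explicit; the function names and the Erlang-C derivation are illustrative, not quoted from the patent.

```python
import math

def avg_delay(lam: float, mu: float, c: int) -> float:
    """E[T] of an M/M/c station: average sojourn time of a tuple at a
    component with arrival rate lam and c executors of rate mu each.
    (Assumption: the patent's queuing network is built from such stations.)"""
    rho = lam / (c * mu)
    if rho >= 1.0:
        return math.inf                    # unstable: queue grows without bound
    a = lam / mu                           # offered load
    p0 = 1.0 / (sum(a**k / math.factorial(k) for k in range(c))
                + a**c / (math.factorial(c) * (1.0 - rho)))
    p_wait = a**c / (math.factorial(c) * (1.0 - rho)) * p0   # Erlang C
    wq = p_wait / (c * mu - lam)           # mean time spent queueing
    return wq + 1.0 / mu                   # queueing + service

def delay_reduction(lam: float, mu: float, c: int) -> float:
    """Li = E[Ti](ci) - E[Ti](ci + 1): how much one extra executor
    shortens the component's average processing delay."""
    before, after = avg_delay(lam, mu, c), avg_delay(lam, mu, c + 1)
    return math.inf if before == math.inf else before - after
```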

S102. Evaluate the system delay through the delay constraint model, assign executors to prioritized components in the system, adjust the parallelism of components in the topology task online, and optimize processing delay under different QoS constraint scenarios, where the different QoS constraint scenarios include limited resources and delay constraints.

In a feasible implementation, in S102, evaluating the system delay through the delay constraint model, assigning executors to prioritized components in the system, adjusting the parallelism of components online, and optimizing processing delay under the different QoS constraint scenarios (limited resources and delay constraints) includes:

In the limited-resources scenario, where the number of executors is limited: λ0, λi, and µi are computed for the statistics period T, and the average processing delay E[Ti](Si) of the input tuples of all components is computed through the delay constraint model; each time, an executor is preferentially allocated to the component that affects the average processing delay of the whole topology's input tuples the most, until the executors are used up, at which point the resulting allocation minimizes the average processing delay of the topology's input tuples. Fig. 2 shows Algorithm 1, the parallelism elastic scaling algorithm for the limited-resources scenario. First, the number of Executors ci (1≤i≤N) allocated to each component of the topology is initialized to satisfy the component's minimum running requirement minci; the ci are then summed and compared with the maximum number of allocatable Executors cmax, and if even the minimum running requirements cannot be met, no further allocation is performed. Following the greedy idea, under the condition that the total number of Executors allocated to the topology does not exceed cmax, each round assigns one Executor to maxS so that the total processing time of the tuples is minimized; this step is repeated until the Executors are used up, and the parallelism allocation scheme is output. Here maxS is the component for which adding one Executor reduces the average processing delay of the whole topology's input tuples the most.
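
A compact, hedged rendering of Algorithm 1's greedy loop, reusing avg_delay and delay_reduction from the sketch above; the variable names are ours, not the patent's:

```python
def scale_resource_limited(lam, mu, min_c, c_max):
    """Algorithm 1 (sketch): spread at most c_max executors over the N
    components so the topology's average tuple processing delay is minimized.
    lam, mu, min_c are per-component lists indexed by i."""
    c = list(min_c)                        # satisfy each component's minimum first
    if sum(c) > c_max:
        raise ValueError("c_max cannot cover the minimum running requirements")
    while sum(c) < c_max:
        # maxS: the component whose delay drops the most from one more executor
        i = max(range(len(c)), key=lambda k: delay_reduction(lam[k], mu[k], c[k]))
        c[i] += 1
    return c
```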

In the delay-constrained scenario: using a greedy algorithm, each time an executor is preferentially allocated to the component that has the greatest impact on the total residence time of the whole topology's input tuples; the delay decreases as the number of executors gradually increases, until the delay limit Tmax is just satisfied. The resulting Executor allocation uses the fewest Executors while just meeting the delay limit, saving the system's computing resources.

In a feasible implementation, the parallelism elastic scaling algorithm for this delay-constrained scenario is shown as Algorithm 2 in Fig. 3.
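
A sketch of Algorithm 2 under the same M/M/c assumption; how component delays are weighted into a topology-wide figure is our simplification (here, by each component's share of the input rate), and Tmax is assumed attainable:

```python
def scale_delay_constrained(lam0, lam, mu, min_c, t_max):
    """Algorithm 2 (sketch): add executors greedily until the topology-wide
    average delay just satisfies T_max, thereby using as few executors
    as possible."""
    c = list(min_c)

    def topo_delay():
        # total average residence time of an input tuple across the topology;
        # weighting each component's delay by its share of the input rate is
        # our simplification of the patent's queuing-network aggregate
        return sum(avg_delay(lam[k], mu[k], c[k]) * lam[k] / lam0
                   for k in range(len(lam)))

    while topo_delay() > t_max:            # assumes T_max is attainable
        i = max(range(len(c)), key=lambda k: delay_reduction(lam[k], mu[k], c[k]))
        c[i] += 1
    return c
```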

S103. Construct a communication overhead model; based on a greedy algorithm, perform task scheduling on the components after parallelism adjustment to obtain the task schedule that minimizes communication overhead.

In a feasible implementation, constructing the communication overhead model in S103 includes:

constructing a communication overhead model for the topology by evaluating the traffic between nodes.

In a feasible implementation, as shown in Fig. 4, the following three task communication modes exist in a Storm cluster:

(1) Inter-Executor communication. Two Executors communicate within the same Worker on the same Worker node; for example, the communication between E4 and E7 is inter-Executor communication.

(2) Inter-Worker communication. Two Executors communicate across different Workers on the same Worker node; this requires two routing hops, so the communication overhead is larger than inter-Executor communication. For example, the communication between E1 and E3 is inter-Worker communication.

(3) Inter-Worker-node communication. Two Executors communicate across different Worker nodes; this occupies network bandwidth between Worker nodes, so the communication overhead is the largest. For example, the communication between E1 and E4 is inter-Worker-node communication.

Therefore, reducing inter-Worker-node and inter-Worker communication in the topology helps improve overall system performance. To minimize the topology's overall communication overhead, inter-Worker-node and inter-Worker communication should be converted into inter-Executor communication as much as possible.

Define Rj,k as the size of the data flow between two communicating executors Executor j and Executor k, E as the set of Executors, D1 as the Worker in which an Executor runs, and D2 as the Worker node on which a Worker runs; for example, the Worker hosting Executor i can be written D1(Ei), and the Worker node hosting D1(Ei) can be written D2(D1(Ei)). Once the topology has been submitted, and provided no congestion occurs, the total amount of data flow in the whole topology is a fixed value, i.e. the sum of Rj,k over all communicating pairs Ej, Ek ∈ E is constant.

Therefore, when the total inter-Worker-node data flow (the sum of Rj,k over pairs with D2(D1(Ej)) ≠ D2(D1(Ek))) and the total inter-Worker data flow (the sum of Rj,k over pairs with D1(Ej) ≠ D1(Ek) on the same Worker node) are minimized, the total inter-Executor data flow (the sum of Rj,k over pairs with D1(Ej) = D1(Ek)) is maximized. Using CTm,n to denote the communication overhead between Worker node m and Worker node n, it can be expressed as formula (1), where Bm,n is the network bandwidth between m and n; if two Executors are on the same Worker node, the communication overhead between them is considered negligible:

CTm,n = ( Σ Ri,j over all pairs with D2(D1(Ei)) = m and D2(D1(Ej)) = n ) / Bm,n    (1)

where j is a positive integer subscript variable, 1≤j≤N, and i≠j;
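
For illustration, a small sketch of formula (1); the dictionary-based encoding of D1, D2, and Rj,k is our own, not prescribed by the patent:

```python
from collections import defaultdict

def internode_overhead(rates, d1, d2, bandwidth):
    """CT[m, n] between every pair of Worker nodes, per formula (1).

    rates:     {(j, k): R_jk}   tuple rate between communicating Executors
    d1:        {executor: worker}       the D1 mapping
    d2:        {worker: worker_node}    the D2 mapping
    bandwidth: {(m, n): B_mn}   network bandwidth between Worker nodes
    """
    flow = defaultdict(float)
    for (j, k), r in rates.items():
        m, n = d2[d1[j]], d2[d1[k]]
        if m != n:                    # same-node traffic is treated as free
            flow[(m, n)] += r
    return {mn: f / bandwidth[mn] for mn, f in flow.items()}
```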

Because system overhead incurred while executing resource scheduling affects system performance, a performance optimization threshold is added to decide whether to carry out the resource adjustment computed by the scheduling scheme. Let Trafficold be the inter-node traffic before scheduling, Trafficnew the inter-node traffic after scheduling, and Trafficchange the percentage improvement in inter-node traffic after scheduling, expressed as formula (2); Trafficimprove is set by the user, and the scheduling operation is performed only when Trafficchange reaches at least Trafficimprove:

Trafficchange = (Trafficold - Trafficnew) / Trafficold × 100%    (2)
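
Formula (2) as a trigger condition, a minimal sketch (the function name is ours):

```python
def should_reschedule(traffic_old: float, traffic_new: float,
                      traffic_improve: float) -> bool:
    """Apply the new placement only if the relative inter-node traffic
    reduction Traffic_change (formula (2)) reaches the user-set
    threshold Traffic_improve (both in percent)."""
    traffic_change = (traffic_old - traffic_new) / traffic_old * 100.0
    return traffic_change >= traffic_improve
```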

During scheduling, attention must be paid to Worker node overload: once a Worker node is overloaded, the average job processing time soars, tuples begin to fail, and tuples queue up until they eventually time out. A threshold therefore needs to be set on Worker node CPU utilization; if an allocation result would exceed that threshold, the load is distributed to other, lightly loaded Worker nodes.

The Worker nodes of the Storm cluster are denoted by N = {n1, n2, ..., nN}, the available CPU resources of the Worker nodes by C = {cn1, cn2, ..., cnN}, u is the maximum CPU utilization allowed on a Worker node, and cEj is the CPU resource occupied by Executor j running on a Worker node; then, for any ns ∈ N, the Executors running on it must satisfy ΣcEj ≤ u·cns, the sum being taken over the Executors placed on ns.
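
Stated as a check (a sketch; the helper name and list encoding are ours):

```python
def node_within_cap(executor_cpu_costs, c_ns, u):
    """Overload guard for one Worker node n_s: the CPU taken by all
    Executors placed on it must not exceed u * c_ns."""
    return sum(executor_cpu_costs) <= u * c_ns
```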

In a feasible implementation, in S103, performing, based on the greedy algorithm, task scheduling on the components after parallelism adjustment to obtain the task schedule that minimizes communication overhead includes:

collecting, within the period T, the tuple transmission rate Ei,j between executors that communicate with each other, the tuple transmission rate Wi,j between Workers, the CPU usage Uwi of each Worker, and the CPU usage Uni of all Worker nodes in the cluster;

assigning, based on the greedy algorithm, executors that communicate with each other to the same node;

the greedy algorithm executes the optimal choice at each step, and the final allocation result is the optimal solution for reducing network communication overhead between Worker nodes;

adding a performance optimization threshold during the allocation process to control how often it is triggered, obtaining the task schedule that minimizes communication overhead.

In a feasible implementation, Fig. 5 shows Algorithm 1, the scheduling algorithm for optimizing inter-Worker communication overhead. Assigning executors that communicate with each other to the same node based on the greedy algorithm includes:

optimizing inter-Worker communication overhead: sorting Ei,j in descending order, selecting by the greedy algorithm the two executors Executori and Executorj with the largest traffic in the Ei,j sequence, and placing them into the Worker with the lowest central processing unit (CPU) utilization;

optimizing inter-Worker-node communication overhead: similarly to the first stage, sorting Wi,j in descending order and using the greedy algorithm to select each time the two Workers Workeri and Workerj with the largest traffic in the Wi,j sequence, placing them into the Worker node with the lowest CPU utilization;

repeating the above two optimization processes until all executors and Workers are allocated.
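
Both phases of Figs. 5 and 6 share one greedy pattern: take the heaviest-traffic pair and co-locate it on the least-loaded container (a Worker in the first phase, a Worker node in the second). The sketch below captures that pattern once; the CPU bookkeeping and overload guard are our simplifications:

```python
def greedy_colocate(rates, cpu_load, cpu_cost, u_max):
    """One phase of the Fig. 5 / Fig. 6 scheduler (sketch): repeatedly take
    the pair with the heaviest traffic and put both members into the
    least-loaded container that still respects the utilization cap u_max.

    rates:    {(i, j): rate}  traffic between units (Executors or Workers)
    cpu_load: {container: current CPU utilization}  (Workers or Worker nodes)
    cpu_cost: {unit: CPU share the unit adds to its container}
    u_max:    maximum allowed container utilization (overload guard)
    """
    placement = {}
    for (i, j), _ in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
        pending = [u for u in (i, j) if u not in placement]
        if not pending:
            continue
        need = sum(cpu_cost[u] for u in pending)
        # prefer containers that can absorb the pair without overloading;
        # among them, pick the least loaded one
        fitting = [c for c in cpu_load if cpu_load[c] + need <= u_max]
        target = min(fitting or cpu_load, key=lambda c: cpu_load[c])
        for u in pending:
            placement[u] = target
            cpu_load[target] += cpu_cost[u]
    return placement
```

Running this first with Ei,j over Workers and then with Wi,j over Worker nodes reproduces the two stages described above.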

In an embodiment of the present invention, Fig. 6 shows Algorithm 2, the scheduling algorithm for optimizing inter-Worker-node communication overhead.

In a feasible implementation, a performance optimization threshold is added during the allocation process to control how often it is triggered, obtaining the task schedule that minimizes communication overhead.

In a feasible implementation, attention must be paid to Worker node overload during allocation: once a Worker node is overloaded, the average job processing time soars, tuples begin to fail, and tuples queue up until they eventually time out. A threshold is therefore set on Worker node CPU utilization; if an allocation result would exceed that threshold, the load is distributed to other, lightly loaded Worker nodes. Task scheduling itself incurs costs that affect system performance, so the present application adds a performance optimization threshold to determine whether to adjust tasks according to the computed scheduling scheme; scheduling is performed only when the scheduling benefit reaches the set performance optimization threshold or above.

In the embodiments of the present invention, a delay-assured scheduling strategy under QoS constraints, called Lg-Stream, is proposed. The Lg-Stream strategy covers the following aspects: (1) the topology is modeled as a queuing network, a delay constraint model is built to evaluate system delay, and a communication overhead model is built to compute communication overhead; (2) for the limited-resources and delay-constrained scenarios, each assignment of an executor to a component is guaranteed to optimize the system's processing delay to the greatest extent, thereby changing the parallelism of the components; (3) following the greedy idea of placing executors that communicate with each other on the same node as much as possible, communication overhead is reduced through scheduling, and a performance optimization threshold is added to control how often it is triggered.

The present invention implements the data monitoring module and the performance optimization module of Lg-Stream and integrates them into Apache Storm, a typical distributed stream computing system, and comprehensively evaluates the system indicators in terms of system delay, throughput, resource utilization, and resource usage. Experimental results show that, compared with the existing Apache Storm framework, the proposed Lg-Stream achieves a significant improvement in system performance. Compared with Apache Storm's existing default scheduling strategy and online scheduling strategy, the proposed Lg-Stream (latency-guaranteed Stream) system reduces delay and resource usage and improves throughput and resource utilization.

Fig. 7 is a schematic diagram of the QoS-constrained streaming application delay assurance system of the present invention. The system 200 includes:

a scheduling module 210, configured to model the topology tasks of the system as a queuing network and construct a delay constraint model; evaluate the topology delay in the system through the delay constraint model and assign executors to prioritized components in the system; and adjust the parallelism of components in the topology task online to optimize processing delay under different QoS constraint scenarios;

a parallelism optimization module 220, configured to construct a communication overhead model; and, based on a greedy algorithm, perform task scheduling on the components after parallelism adjustment to obtain the task schedule that minimizes communication overhead, thereby completing QoS-constrained streaming application delay assurance.

Preferably, the scheduling module 210 is further configured to construct, based on the parameters flowing into the queuing network, a delay constraint model to evaluate the average processing delay of the entire topology;

wherein the parameters include: the average arrival rate λ0 of tuples flowing into the queuing network; a single component Si; the average arrival rate λi of its input tuples; the number ci of executors allocated to Si; the average processing rate µi of each executor; the maximum number cmax of executors that can be allocated; and the reduction Li in the average tuple processing delay after one executor is added;

wherein Li = E[Ti](ci) - E[Ti](ci+1); i is a positive integer subscript variable, 1≤i≤N; E denotes expectation; and Ti denotes the average processing delay of a single component Si.

Preferably, the scheduling module 210 is further configured to: in the limited-resources scenario, where the number of executors is limited, compute λ0, λi, and µi within the statistics period T, and compute the average processing delay E[Ti](Si) of the input tuples of all components through the delay constraint model; preferentially allocate executors to the component that affects the average processing delay of the whole topology's input tuples the most, until the executors are used up; the resulting executor allocation minimizes the average processing delay of the topology's input tuples, yielding the optimal allocation scheme;

in the delay-constrained scenario: using a greedy algorithm, each time preferentially allocating an executor to the component that has the greatest impact on the total residence time of the whole topology's input tuples; the delay decreases as the number of executors gradually increases, until the delay limit Tmax is just satisfied, yielding the optimal allocation scheme.

Preferably, the parallelism optimization module 220 is further configured to construct a communication overhead model for the topology by evaluating the traffic between nodes.

Preferably, the parallelism optimization module 220 is further configured to collect, within the period T, the tuple transmission rate Ei,j between executors that communicate with each other, the tuple transmission rate Wi,j between Workers, the CPU usage Uwi of each Worker, and the CPU usage Uni of all Worker nodes in the cluster;

assign, based on the greedy algorithm combined with the communication overhead model, executors that communicate with each other to the same node;

the greedy algorithm executes the optimal allocation at each step to obtain the final allocation result, the final allocation result being the optimal solution for reducing network communication overhead between Worker nodes;

and add a performance optimization threshold during the allocation process to control how often the allocation scheme is triggered, obtaining the task schedule that minimizes communication overhead.

Preferably, assigning executors that communicate with each other to the same node based on the greedy algorithm includes:

optimizing inter-Worker communication overhead based on the greedy algorithm combined with the communication overhead model: sorting Ei,j in descending order, selecting by the greedy algorithm the two executors Executori and Executorj with the largest traffic in the Ei,j sequence, and placing the two executors into the Worker with the lowest CPU utilization;

optimizing inter-Worker-node communication overhead based on the greedy algorithm combined with the communication overhead model: sorting Wi,j in descending order, selecting each time by the greedy algorithm the two Workers Workeri and Workerj with the largest traffic in the Wi,j sequence, and placing the two Workers into the Worker node with the lowest CPU utilization;

repeating the above two optimization processes until all executors and Workers are allocated.

Preferably, adding a performance optimization threshold during the allocation process to control how often the allocation scheme is triggered, obtaining the task schedule that minimizes communication overhead, includes:

adding a performance optimization threshold during the allocation process to control how often the allocation scheme is triggered, and judging whether to perform the scheduling operation; the scheduling operation is performed only when the percentage improvement in inter-node traffic after scheduling reaches the performance optimization threshold or above.

Preferably, the system further includes: setting a utilization threshold for Worker node CPU utilization, and if an allocation result exceeds the Worker node CPU utilization threshold, distributing the load to other, lightly loaded Worker nodes.

Preferably, the system further includes:

a data monitoring module 230, configured to collect load information and the sizes of data flows between executors;

a storage module 240, configured to store the information of each task allocation as well as the load information and inter-executor data flow sizes collected by the data monitoring module (Data Monitor), and to keep them updated.

In the embodiments of the present invention, a delay-assured scheduling strategy under QoS constraints, called Lg-Stream, is proposed. The Lg-Stream strategy covers the following aspects: (1) the topology is modeled as a queuing network, a delay constraint model is built to evaluate system delay, and a communication overhead model is built to compute communication overhead; (2) for the limited-resources and delay-constrained scenarios, each assignment of an executor to a component is guaranteed to optimize the system's processing delay to the greatest extent, thereby changing the parallelism of the components; (3) following the greedy idea of placing executors that communicate with each other on the same node as much as possible, communication overhead is reduced through scheduling, and a performance optimization threshold is added to control how often it is triggered.

The present invention implements the data monitoring module and the performance optimization module of Lg-Stream and integrates them into Apache Storm, a typical distributed stream computing system, and comprehensively evaluates the system indicators in terms of system delay, throughput, resource utilization, and resource usage. Experimental results show that, compared with the existing Apache Storm framework, the proposed Lg-Stream achieves a significant improvement in system performance. Compared with Apache Storm's existing default scheduling strategy and online scheduling strategy, the proposed Lg-Stream system reduces delay and resource usage and improves throughput and resource utilization.

Fig. 8 is a schematic structural diagram of an electronic device 300 provided by an embodiment of the present invention. The electronic device 300 may vary considerably in configuration or performance, and may include one or more processors (central processing units, CPU) 301 and one or more memories 302, wherein the memory 302 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 301 to implement the steps of the following QoS-constrained streaming application delay assurance method:

S1. Construct a delay constraint model by modeling the topology as a queuing network;

S2. Evaluate the system delay through the delay constraint model, assign executors to prioritized components in the system, adjust the parallelism of components in the topology task online, and optimize processing delay under different QoS constraint scenarios, where the different QoS constraint scenarios include limited resources and delay constraints;

S3. Construct a communication overhead model; based on a greedy algorithm, perform task scheduling on the components after parallelism adjustment to obtain the task schedule that minimizes communication overhead.

In an exemplary embodiment, a computer-readable storage medium is also provided, for example a memory including instructions, where the instructions can be executed by a processor in a terminal to implement the above QoS-constrained streaming application delay assurance method. For example, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.

Those of ordinary skill in the art can understand that all or part of the steps of the above embodiments can be implemented by hardware, or by instructing related hardware through a program; the program can be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.

Claims (7)

1.一种QoS约束的流应用延迟确保方法,其特征在于,方法步骤包括:1. A stream application delay guarantee method of QoS constraints, characterized in that the method steps include: S1、将系统的拓扑任务建模为排队网络,构建延迟约束模型;S1. Model the topological tasks of the system as a queuing network and build a delay constraint model; 所述步骤S1中,将系统的拓扑任务建模为排队网络,构建延迟约束模型,包括:In the step S1, the topological task of the system is modeled as a queuing network, and a delay constraint model is constructed, including: 基于流入排队网络的参数,构建延迟约束模型评估整个拓扑的平均处理延迟;Based on the parameters of the inflow queuing network, construct a delay constraint model to evaluate the average processing delay of the entire topology; 其中,所述参数包括:流入排队网络的元组平均到达率λ0,单个组件S i ;输入元组的平均到达速率λi,S i 分配的执行器数量c i ;每个执行器的平均处理速率µ i ;分配执行器最大数量c max ;添加一个执行器之后对元组平均处理延迟的减少程度L i Among them, the parameters include: the average arrival rate λ 0 of tuples flowing into the queuing network, a single component S i ; the average arrival rate λ i of input tuples, the number of executors ci assigned by S i ; the average processing rate µ i ; assign the maximum number of executors c max ; add an executor to reduce the average tuple processing delay L i ; 其中,i为一个记录下角标的正整数变量,1≤i≤N,N为大于1的正整数;E表示期望;T i 表示单个组件S i 的平均处理延迟;in, , i is a positive integer variable that records subscripts, 1≤i≤N , N is a positive integer greater than 1; E represents expectation; T i represents the average processing delay of a single component S i ; S2、通过所述延迟约束模型对系统中的拓扑延迟进行评估,为系统中具有优先级的组件分配执行器;在线调整拓扑任务中组件的并行度,对不同QoS约束情况的处理延迟进行优化;S2. Evaluate the topology delay in the system through the delay constraint model, and assign executors to the components with priority in the system; adjust the parallelism of the components in the topology task online, and optimize the processing delay of different QoS constraints; 其中,所述不同QoS约束情况包括:资源有限以及延迟约束;Wherein, the different QoS constraints include: limited resources and delay constraints; 所述步骤S2中,通过所述延迟约束模型对系统中的拓扑延迟进行评估,为系统中具有优先级的组件分配执行器;在线调整拓扑任务中组件的并行度,对不同QoS约束情况的处理延迟进行优化,包括:In the step S2, the topological delay in the system is evaluated through the delay constraint model, and executors are assigned to the components with priority in the system; the parallelism of the components in the topology task is adjusted online, and the processing of different QoS constraints Latency is optimized, including: 在资源有限的场景下执行器数量有限,则计算统计周期内的λ0、λ i 、µ i ,通过所述延迟约束模型计算所有组件输入元组的平均处理延迟E[T i ](S i );优先分配执行器对整个拓扑输入元组的平均处理延迟影响最大的组件,直到执行器分配完,得出的执行器分配结果为使拓扑输入元组的平均处理延迟最小的结果,获得最优分配方案;In a scenario with limited resources, the number of executors is limited, then calculate λ 0 , λ i , µ i within the statistical period, and calculate the average processing delay E[T i ](S i ); give priority to assigning executors to the components that have the greatest impact on the average processing delay of the entire topology input tuple until the executors are allotted. optimal allocation plan; 在延迟约束的场景下,通过贪心算法,每次均优先分配执行器给对整个拓扑输入元组的总停留时间影响最大的组件,此时延迟会随着逐步增加的执行器数量而减少,直到刚好满足延迟限制T max ,获得最优分配方案;In the case of delay constraints, through the greedy algorithm, each time the executor is preferentially assigned to the component that has the greatest impact on the total residence time of the input tuple of the entire topology. At this time, the delay will decrease with the number of executors gradually increasing until It just satisfies the delay limit T max and obtains the optimal allocation scheme; S3、构建通信开销模型;基于贪心算法,结合所述通信开销模型对调整并行度后的组件进行任务调度,获得最小化通信开销的任务调度,完成QoS约束的流应用延迟确保。S3. 
2. The method according to claim 1, characterized in that, in step S3, constructing the communication overhead model comprises:

constructing a communication overhead model for the topology by evaluating the traffic between nodes.

3. The method according to claim 2, characterized in that, in step S3, performing, based on a greedy algorithm combined with the communication overhead model, task scheduling on the components whose parallelism has been adjusted and obtaining the task scheduling that minimizes the communication overhead comprises:

collecting, within a period, the tuple transmission rate Ei,j between executors that communicate with each other, the tuple transmission rate Wi,j between Workers, the CPU utilization Uwi of each Worker, and the CPU utilization Uni of every Worker node in the cluster; wherein j is a positive integer subscript variable, 1 ≤ j ≤ N, and i ≠ j;

allocating, based on the greedy algorithm combined with the communication overhead model, the executors that communicate with each other to the same node;

executing, at each step of the greedy algorithm, the optimal allocation to obtain the final allocation result, the final allocation result being the optimal solution for reducing the network communication overhead between Worker nodes;

adding a performance optimization threshold during allocation to control the trigger frequency of the allocation scheme, thereby obtaining the task scheduling that minimizes the communication overhead.

4. The method according to claim 3, characterized in that allocating, based on the greedy algorithm combined with the communication overhead model, the executors that communicate with each other to the same node comprises:

optimizing the inter-Worker communication overhead based on the greedy algorithm combined with the communication overhead model: sorting Ei,j in descending order, selecting through the greedy algorithm the two executors Executor i and Executor j corresponding to the largest traffic in the Ei,j sequence, and placing the two executors into the Worker with the lowest CPU utilization;

optimizing the inter-Worker-node communication overhead based on the greedy algorithm combined with the communication overhead model: sorting Wi,j in descending order, selecting through the greedy algorithm each time the two Workers, Worker i and Worker j, corresponding to the largest traffic in the Wi,j sequence, and placing the two Workers into the Worker node with the lowest CPU utilization;

repeating the above two optimization processes until all executors and Workers have been allocated.

5. The method according to claim 4, characterized in that adding a performance optimization threshold during allocation to control the trigger frequency of the allocation scheme and obtaining the task scheduling that minimizes the communication overhead comprises:

adding a performance optimization threshold during allocation to control the trigger frequency of the allocation scheme and to decide whether a scheduling operation is performed; the scheduling operation is performed only when the percentage improvement of the inter-node traffic after scheduling reaches or exceeds the performance optimization threshold.

6. The method according to claim 5, characterized by further comprising: setting a utilization threshold for the Worker node CPU utilization; if an allocation result exceeds the Worker node CPU utilization threshold, distributing the load to other Worker nodes with lighter loads.

7. A QoS-constrained streaming application delay assurance system, characterized in that the system comprises:

a scheduling module, configured to model the topology tasks of the system as a queuing network and construct a delay constraint model; to evaluate the system delay through the delay constraint model, allocate executors to the components with priority in the system, adjust the parallelism of the components in the topology task online, and optimize the processing delay under different QoS constraint situations; wherein the different QoS constraint situations include limited resources and delay constraints;

the scheduling module being configured to construct, based on the parameters of the flow entering the queuing network, a delay constraint model to evaluate the average processing delay of the entire topology;

wherein the parameters include: the average arrival rate λ0 of tuples entering the queuing network; for a single component Si, the average arrival rate λi of its input tuples; the number ci of executors allocated to Si; the average processing rate µi of each executor; the maximum number cmax of executors that can be allocated; and the reduction Li in the average tuple processing delay after one more executor is added;

wherein i is a positive integer subscript variable, 1 ≤ i ≤ N; E denotes expectation; and Ti denotes the average processing delay of the single component Si;

a parallelism optimization module, configured to construct a communication overhead model; to perform, based on a greedy algorithm combined with the communication overhead model, task scheduling on the components whose parallelism has been adjusted, obtain the task scheduling that minimizes the communication overhead, and complete the QoS-constrained streaming application delay assurance;

the scheduling module being configured to, in the limited-resource scenario where the number of executors is limited, compute λ0, λi, and µi within the statistical period of a cycle T, and compute through the delay constraint model the average processing delay E[Ti](Si) of the input tuples of all components; to preferentially allocate executors to the component that has the greatest impact on the average processing delay of the input tuples of the entire topology, until all executors have been allocated, the resulting executor allocation minimizing the average processing delay of the topology's input tuples and yielding the optimal allocation scheme;

and, in the delay-constrained scenario, to preferentially allocate, through a greedy algorithm, an executor each time to the component that has the greatest impact on the total sojourn time of the input tuples of the entire topology, the delay decreasing as the number of executors gradually increases until the delay limit Tmax is just satisfied, yielding the optimal allocation scheme.
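
By way of illustration, the greedy executor allocation of claims 1 and 7 can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: the per-component delay estimate avg_delay uses a hypothetical M/M/c-style approximation, 1/(ci·µi − λi), as a stand-in for the claimed delay constraint model, and all function and variable names are illustrative assumptions.

def avg_delay(lam, mu, c):
    """Stand-in estimate of a component's average processing delay with c executors."""
    capacity = c * mu
    return float('inf') if capacity <= lam else 1.0 / (capacity - lam)

def gain(lam, mu, c):
    """Li: reduction in average delay if the component receives one more executor."""
    before, after = avg_delay(lam, mu, c), avg_delay(lam, mu, c + 1)
    return float('inf') if before == float('inf') else before - after

def allocate_resource_limited(lams, mus, total_executors):
    """Limited-resource scenario: spend a fixed executor budget where it helps most."""
    n = len(lams)
    c = [1] * n  # every component needs at least one executor
    for _ in range(total_executors - n):
        best = max(range(n), key=lambda i: gain(lams[i], mus[i], c[i]))
        c[best] += 1
    return c

def allocate_delay_constrained(lams, mus, t_max, c_max):
    """Delay-constrained scenario: add executors until the end-to-end delay meets Tmax."""
    n = len(lams)
    c = [1] * n
    while (sum(avg_delay(lams[i], mus[i], c[i]) for i in range(n)) > t_max
           and sum(c) < c_max):
        best = max(range(n), key=lambda i: gain(lams[i], mus[i], c[i]))
        c[best] += 1
    return c

print(allocate_resource_limited([50.0, 120.0, 80.0], [30.0, 40.0, 35.0], 12))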
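
The communication overhead model of claim 2 reduces to evaluating how much tuple traffic crosses node boundaries under a given placement. A minimal sketch, assuming the measured pairwise tuple rates are available as a dictionary (all names hypothetical):

def inter_node_traffic(traffic, placement):
    """Tuple rate crossing node boundaries: the cost the overhead model evaluates.
    traffic: {(i, j): tuples/s between units i and j}; placement: {unit: node}."""
    return sum(rate for (i, j), rate in traffic.items()
               if placement[i] != placement[j])

# Only the e2->e3 rate counts as network overhead, since e1 and e2 share node n1.
print(inter_node_traffic({("e1", "e2"): 900.0, ("e2", "e3"): 400.0},
                         {"e1": "n1", "e2": "n1", "e3": "n2"}))  # 400.0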
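
Claim 4 applies the same greedy pairing at two levels: executors into Workers (by Ei,j), then Workers into Worker nodes (by Wi,j). The sketch below captures one such pass; the per-unit CPU cost added on placement is a hypothetical stand-in for measured utilization, and the handling of pairs whose members are already placed is an assumption the claims leave open.

def greedy_colocate(traffic, container_cpu, per_unit_cost=0.05):
    """Place communicating unit pairs, heaviest traffic first, into the
    container (Worker or Worker node) with the lowest CPU utilization."""
    placement = {}
    for (i, j), _rate in sorted(traffic.items(), key=lambda kv: -kv[1]):
        pending = [u for u in (i, j) if u not in placement]
        if not pending:
            continue  # both ends already placed by an earlier, heavier pair
        target = min(container_cpu, key=container_cpu.get)
        for unit in pending:
            placement[unit] = target
            container_cpu[target] += per_unit_cost  # hypothetical load bump
    return placement

# First pass: executors -> Workers using Ei,j; a second, identical pass would
# map Workers -> Worker nodes using Wi,j, repeated until everything is placed.
print(greedy_colocate({("e1", "e2"): 900.0, ("e2", "e3"): 400.0},
                      {"w1": 0.10, "w2": 0.20}))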
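
Claims 5 and 6 gate the scheduler with two thresholds: a minimum traffic-improvement percentage before any rescheduling is triggered, and a CPU ceiling above which a Worker node no longer accepts load. A minimal sketch; the 10% and 0.8 values are assumptions, not figures from the patent.

PERF_THRESHOLD = 0.10  # required inter-node traffic improvement (claim 5)
CPU_THRESHOLD = 0.80   # Worker node CPU utilization ceiling (claim 6)

def should_reschedule(current_traffic, planned_traffic):
    """Trigger only when the planned placement improves inter-node traffic enough."""
    if current_traffic <= 0:
        return False
    improvement = (current_traffic - planned_traffic) / current_traffic
    return improvement >= PERF_THRESHOLD

def pick_worker_node(node_cpu):
    """Prefer nodes under the CPU ceiling; fall back to the least loaded one."""
    eligible = {n: u for n, u in node_cpu.items() if u <= CPU_THRESHOLD}
    pool = eligible or node_cpu  # if every node is hot, take the coolest anyway
    return min(pool, key=pool.get)

print(should_reschedule(1000.0, 850.0))                        # True: 15% >= 10%
print(pick_worker_node({"n1": 0.92, "n2": 0.55, "n3": 0.70}))  # n2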
CN202310595972.9A 2023-05-25 2023-05-25 A QoS-constrained streaming application delay assurance method and system Active CN116302578B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310595972.9A CN116302578B (en) 2023-05-25 2023-05-25 A QoS-constrained streaming application delay assurance method and system

Publications (2)

Publication Number Publication Date
CN116302578A (en) 2023-06-23
CN116302578B (en) 2023-08-08

Family

ID=86783698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310595972.9A Active CN116302578B (en) 2023-05-25 2023-05-25 A QoS-constrained streaming application delay assurance method and system

Country Status (1)

Country Link
CN (1) CN116302578B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116761194B (en) * 2023-08-15 2023-11-03 甘肃省公安厅 A police collaborative communication optimization system and method in a wireless communication network
CN118505149B (en) * 2024-05-20 2024-11-22 南通启禾地理信息技术有限公司 A construction project surveying and mapping data management system
CN118535333B (en) * 2024-05-27 2025-05-23 中国地质大学(北京) Operator elastic scaling method and system for fluctuation data stream

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11411872B2 (en) * 2019-10-11 2022-08-09 University Of South Florida Network latency optimization

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106021411A (en) * 2016-05-13 2016-10-12 大连理工大学 Storm task deployment and configuration platform with cluster adaptability
CN114780247A (en) * 2022-05-17 2022-07-22 中国地质大学(北京) Flow application scheduling method and system with flow rate and resource sensing
CN115412501A (en) * 2022-08-30 2022-11-29 哈尔滨工业大学 Multi-level collaborative reconfiguration stream processing system based on Flink and processing method thereof
CN115408136A (en) * 2022-11-01 2022-11-29 安徽思高智能科技有限公司 RPA flow scheduling method based on genetic algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Hanyu He et al. A Latency-Sensitive Elastic Adaptive Scheduling in Distributed Stream Computing System. 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS). 2019, pp. 1798-1803. *

Similar Documents

Publication Publication Date Title
CN116302578B (en) A QoS-constrained streaming application delay assurance method and system
CN110096349B (en) Job scheduling method based on cluster node load state prediction
CN104951372B (en) A kind of Map/Reduce data processing platform (DPP) memory source dynamic allocation methods based on prediction
CN102932422B (en) Cloud environment task scheduling method based on improved ant colony algorithm
CN108768876B (en) Traffic scheduling method facing machine learning framework
CN106844051A (en) The loading commissions migration algorithm of optimised power consumption in a kind of edge calculations environment
CN112817728B (en) Task scheduling method, network device and storage medium
CN111782355A (en) A cloud computing task scheduling method and system based on mixed load
CN116700993B (en) Load balancing method, device, equipment and readable storage medium
CN114780247A (en) Flow application scheduling method and system with flow rate and resource sensing
CN110519783A (en) 5G network based on enhancing study is sliced resource allocation methods
CN114996001A (en) A distributed machine learning task GPU resource scheduling and allocation method and system
CN111970383A (en) Multi-tenant sharing method, system and storage medium of data center network
CN112148381A (en) Software definition-based edge computing priority unloading decision method and system
CN118484282A (en) Supercomputing power operation and management system for multiple scenarios
CN106775949A (en) A kind of Application of composite feature that perceives migrates optimization method online with the virtual machine of the network bandwidth
CN111199316A (en) A cloud-fog collaborative computing grid scheduling method based on execution time evaluation
CN119065818A (en) A data communication system and method based on future network
CN118606016A (en) A task distribution method and device based on Manage and Worker
CN118158092A (en) Computing power network scheduling method and device and electronic equipment
CN111092827A (en) A kind of power communication network resource allocation method and device
CN113891466A (en) Online scheduling system and method for UDL task in edge wireless network
CN118784744A (en) Data transmission method and transmission system for implementing FC protocol
CN116302404B (en) Resource decoupling data center-oriented server non-perception calculation scheduling method
CN113946455A (en) Multi-stage feedback queue flow scheduling method based on bottleneck perception

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant