CN112988344A - Distributed batch task scheduling method, device, equipment and storage medium - Google Patents
- Publication number
- CN112988344A (application CN202110174120.3A)
- Authority
- CN
- China
- Prior art keywords
- task
- node
- subtask
- execution unit
- server
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
Abstract
The invention provides a distributed batch task scheduling method, device, equipment and storage medium, relating to the field of distributed systems. The method includes: acquiring a flow scheduling tree, where the flow scheduling tree is created by a task control center according to pre-stored configuration information; parsing the flow scheduling tree, obtaining a plurality of subtasks to be executed, and registering each subtask in a task scheduling center; and the task scheduling center distributing each registered subtask to an execution unit for execution according to a preset load balancing strategy. This achieves dynamic task distribution and concurrency control and makes the task nodes and the task scheduling center highly available. A complete distributed batch task scheduling method is provided on the basis of a three-layer structure of task control, task scheduling and task execution.
Description
Technical Field
The invention relates to the field of distributed systems, in particular to a distributed batch task scheduling method, a distributed batch task scheduling device, distributed batch task scheduling equipment and a storage medium.
Background
In recent years, with the continuous development of banking business, the scale of back-end systems has kept growing, and batch processing tasks have increased in both number and complexity. When a single complex task is scheduled on a single server, the task's complexity can exceed the performance the server can provide, so the server load becomes excessive and task processing is affected.
In the prior art, a single complex task can be decomposed across a plurality of processing servers for concurrent execution through a distributed deployment scheme, thereby reducing complexity and shortening the task execution time. The scattered subtasks produced by this decomposition therefore require unified management and scheduling.
At present, however, distributed task scheduling methods are based only on a three-layer structure of task consumers, task producers and task schedulers; a complete combination of methods covering task decomposition, task allocation, task control and task circulation is lacking.
Disclosure of Invention
Existing distributed task scheduling methods are based only on a three-layer structure of task consumers, task producers and task schedulers, and lack a complete combination of methods from task decomposition and task allocation to task control and task circulation. To solve this problem, embodiments of the present invention provide a distributed batch task scheduling method, apparatus, device and storage medium that together form a complete distributed batch task scheduling scheme.
In a first aspect, an embodiment of the present invention provides a distributed batch task scheduling method. The method includes: acquiring a flow scheduling tree, where the flow scheduling tree is created by a task control center according to pre-stored configuration information; parsing the flow scheduling tree, obtaining a plurality of subtasks to be executed, and registering each subtask in a task scheduling center; and the task scheduling center distributing each registered subtask to an execution unit for execution according to a preset load balancing strategy.
In the first aspect, the task control center orders the plurality of subtasks to be executed through the flow scheduling tree and registers them with the task scheduling center; the task scheduling center then distributes each registered subtask to an execution unit for execution according to the preset load balancing strategy. This achieves dynamic task distribution and concurrency control and makes the task nodes and the task scheduling center highly available. A complete distributed batch task scheduling method is thus provided on the basis of a three-layer structure of task control, task scheduling and task execution.
Optionally, parsing the flow scheduling tree, obtaining a plurality of subtasks to be executed, and registering each subtask in the task scheduling center includes: parsing the flow scheduling tree and sequentially obtaining the subtasks to be executed according to the execution order of each subtask in the flow scheduling tree; and registering each subtask in the task scheduling center in that execution order.
Optionally, distributing each registered subtask to an execution unit for execution includes: after receiving a subtask, the unit manager of the execution unit obtains the node information of the subtask and the execution flow configuration parameters of the subtask; and the unit manager sequentially issues each subtask, according to its execution flow configuration parameters and node information, to the execution unit indicated by the node information for execution.
Optionally, the node information of a subtask includes the server node, the resource node and the task node corresponding to the subtask.
Optionally, the server node includes at least one server child node. Each server child node includes at least one task node, and the task node is used to respond to the execution unit and execute the subtask.
Optionally, if a temporary node indicating the availability status of the task node is present under the server child node, the task node is available; if no such temporary node is present under the server child node, the task node is unavailable.
Optionally, the resource node includes database table child nodes and resource child nodes. The database table child node stores the configuration information of the database sharding resources, and the resource child node stores the server resource configuration information.
Optionally, the task child node is used to store task information.
Optionally, the preset load balancing strategy includes obtaining the performance weight of each server according to the performance parameters of the server's execution unit and the subtasks the execution unit is currently executing.
Distributing each registered subtask to an execution unit for execution then includes distributing the subtask to the execution unit of the server with the highest performance weight.
Optionally, obtaining the performance weight of a server according to the performance parameters of its execution unit and the subtasks being executed includes: obtaining the initial weight of the execution unit from the performance parameters; obtaining the effective weight of the execution unit from the initial weight and the subtasks being executed; and obtaining the current weight of the execution unit and taking the sum of the effective weight and the current weight as the performance weight of the server.
Optionally, after taking the sum of the effective weight and the current weight as the performance weight of the server, the method further includes: obtaining the sum of the effective weights of all execution units, and updating the current weight of the execution unit to its performance weight minus that sum.
Optionally, each subtask includes a subtask node object and a sub-flow program number. Each sub-flow program number corresponds to at least one subtask flow, and each subtask flow corresponds to at least one subtask node object.
In a second aspect, an embodiment of the present invention provides a distributed batch task scheduling apparatus. The apparatus includes: an acquisition module for acquiring a flow scheduling tree created by the task control center according to pre-stored configuration information; a parsing module for parsing the flow scheduling tree, obtaining a plurality of subtasks to be executed, and registering each subtask in the task scheduling center; and an execution module by which the task scheduling center distributes each registered subtask to an execution unit for execution according to a preset load balancing strategy.
Optionally, the parsing module is specifically configured to parse the flow scheduling tree, sequentially obtain the subtasks to be executed according to the execution order of each subtask in the flow scheduling tree, and register each subtask in the task scheduling center in that order.
Optionally, the execution module is specifically configured such that, after receiving a subtask, the unit manager of the execution unit obtains the node information and the execution flow configuration parameters of the subtask, and sequentially issues each subtask, according to those parameters and its node information, to the execution unit indicated by the node information for execution.
Optionally, the node information of a subtask includes the server node, the resource node and the task node corresponding to the subtask.
Optionally, the server node includes at least one server child node. Each server child node includes at least one task node, and the task node is used to respond to the execution unit and execute the subtask.
Optionally, if a temporary node indicating the availability status of the task node is present under the server child node, the task node is available; if no such temporary node is present under the server child node, the task node is unavailable.
Optionally, the resource node includes database table child nodes and resource child nodes. The database table child node stores the configuration information of the database sharding resources, and the resource child node stores the server resource configuration information.
Optionally, the task child node is used to store task information.
Optionally, the preset load balancing strategy includes obtaining the performance weight of each server according to the performance parameters of the server's execution unit and the subtasks the execution unit is currently executing.
The execution module is specifically configured to distribute the subtask to the execution unit of the server with the highest performance weight.
Optionally, the acquisition module is further configured to obtain the initial weight of the execution unit from the performance parameters of each server's execution unit, obtain the effective weight of the execution unit from the initial weight and the subtasks being executed, and obtain the current weight of the execution unit, taking the sum of the effective weight and the current weight as the performance weight of the server.
Optionally, the acquisition module is further configured to obtain the sum of the effective weights of all execution units and update the current weight of the execution unit to its performance weight minus that sum.
Optionally, each subtask includes a subtask node object and a sub-flow program number. Each sub-flow program number corresponds to at least one subtask flow, and each subtask flow corresponds to at least one subtask node object.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a storage medium and a bus. The storage medium stores machine-readable instructions executable by the processor. When the electronic device operates, the processor and the storage medium communicate via the bus, and the processor executes the machine-readable instructions to perform the steps of the method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a storage medium having a computer program stored thereon, where the computer program is executed by a processor to perform the steps of the method according to the first aspect.
For the beneficial effects of the second to fourth aspects, reference may be made to the first aspect; they are not repeated here.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting its scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic diagram illustrating an application scenario of a distributed task scheduling method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a distributed task scheduling method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a flow scheduling tree in the distributed task scheduling method according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating another distributed task scheduling method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating a subtask flow scheduling tree in the distributed task scheduling method according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating another distributed task scheduling method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram illustrating a server node in the distributed task scheduling method according to an embodiment of the present invention;
FIG. 8 is a schematic diagram illustrating a resource node in a distributed task scheduling method according to an embodiment of the present invention;
FIG. 9 is a schematic diagram illustrating task child nodes in the distributed task scheduling method according to an embodiment of the present invention;
FIG. 10 is a diagram illustrating task registration in a distributed task scheduling method according to an embodiment of the present invention;
FIG. 11 is a diagram illustrating task allocation in a distributed task scheduling method according to an embodiment of the present invention;
FIG. 12 is a diagram illustrating task allocation in a distributed task scheduling method according to an embodiment of the present invention;
FIG. 13 is a diagram illustrating task execution in a distributed task scheduling method according to an embodiment of the present invention;
FIG. 14 is a diagram illustrating task execution in a distributed task scheduling method according to an embodiment of the present invention;
FIG. 15 is a schematic structural diagram illustrating a distributed task scheduling apparatus according to an embodiment of the present invention;
FIG. 16 is a schematic structural diagram of an electronic device provided in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. It should be understood that the drawings are for illustrative and descriptive purposes only and are not used to limit the scope of the present invention; additionally, the schematic drawings are not necessarily drawn to scale. The flowcharts used in this disclosure illustrate operations implemented according to some embodiments of the present invention. It should be understood that the operations of the flowcharts may be performed out of order, and steps without logical dependencies may be performed in reverse order or simultaneously. One skilled in the art, under the direction of this disclosure, may add one or more other operations to, or remove one or more operations from, the flowcharts.
In addition, the described embodiments of the present invention are only some embodiments of the present invention, and not all embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the term "comprising" will be used in the embodiments of the invention to indicate the presence of the features stated hereinafter, but does not exclude the addition of further features. It should also be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. In the description of the present invention, it should also be noted that the terms "first", "second", "third", and the like are used for distinguishing the description, and are not intended to indicate or imply relative importance.
FIG. 1 shows the system architecture of the task scheduling method provided by the present application, which includes a task scheduling center, a task control center and an execution unit manager. In a distributed system there is only one instance of the task scheduling center; it is the core of task distribution and scheduling and is used for task registration monitoring and task distribution. A task control center is created when a distributed task starts and is used for instantiated task registration, task state monitoring and resource monitoring; after the distributed task finishes executing, the task control center is destroyed. That is, each distributed task corresponds to one task control center.
Each distributed task can be disassembled into a plurality of subtasks by the task scheduling center, and each subtask is distributed to an execution unit for processing according to the load balancing strategy. The execution unit manager is used to execute and manage subtasks, for example updating environment information and task states.
It should be noted that the distributed task scheduling method provided by the present application is implemented on a ZooKeeper architecture, which applies to scenarios such as data publishing and subscribing, cluster management, distributed coordination and control, and Master election; on this basis, a distributed task scheduling method with a three-layer architecture of task control center, task scheduling center and execution unit is designed and implemented.
Fig. 2 shows a flowchart of a distributed task scheduling method provided in the present application, and referring to fig. 2, the method includes:
s101, acquiring a flow scheduling tree.
The flow scheduling tree is created by the task control center according to pre-stored configuration information. FIG. 3 shows a schematic diagram of a flow scheduling tree, whose execution order runs from top to bottom and from left to right. Taking FIG. 3 as an example, the first subtask to be executed is "0B-00-00-C". Since there are two concurrent subtasks under "0B-01", the two subtasks "0B-01-01-S-Z-J" and "0B-01-02-S-Z-J" are executed simultaneously. Execution then continues with "0B-ZY-ZY-B", and finally "0B-ZY-ZY-T" is executed. Here "C", "S", "Z", "J", "B" and "T" denote the functional templates obtained after splitting the subtasks.
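The traversal order just described can be pictured as a flat flow in which each step is either a single subtask or a group of concurrent subtasks. The following Python sketch (the list representation is an assumption for illustration, not the patent's own data structure) flattens the example of FIG. 3 into the order in which subtasks are obtained:

```python
# Each step of the flow is either one subtask or a tuple of subtasks
# that run concurrently (names follow the example of FIG. 3).
flow = [
    "0B-00-00-C",
    ("0B-01-01-S-Z-J", "0B-01-02-S-Z-J"),  # two concurrent subtasks under "0B-01"
    "0B-ZY-ZY-B",
    "0B-ZY-ZY-T",
]

def execution_order(flow):
    """Flatten the flow top to bottom, left to right, into the order
    in which subtasks are obtained for registration."""
    order = []
    for step in flow:
        order.extend(step if isinstance(step, tuple) else (step,))
    return order

print(execution_order(flow))
```

Concurrent siblings keep their left-to-right position in the flattened order, matching the registration sequence described for S102 below.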
S102, analyzing the flow scheduling tree, obtaining a plurality of subtasks to be executed, and registering each subtask in the task scheduling center.
The process of analyzing the flow scheduling tree is a process of obtaining an execution sequence of sub-tasks in the flow scheduling tree. After the task scheduling center registers each subtask, the task scheduling center will arrange each subtask in order and wait for execution.
S103, the task scheduling center distributes each registered subtask to an execution unit according to a preset load balancing strategy and executes the subtask.
In this embodiment, the task control center sequences and registers a plurality of to-be-executed subtasks to the task scheduling center through the flow scheduling tree, and then the task scheduling center distributes and executes each of the registered subtasks to the execution unit according to a preset load balancing policy. The dynamic distribution and the concurrency control of the tasks are realized, and the task nodes and the task scheduling center are highly available. A set of complete distributed batch task scheduling method is provided according to a three-layer structure of task control, task scheduling and task execution.
Referring to fig. 4, parsing the flow scheduling tree to obtain a plurality of subtasks to be executed, and registering each subtask in the task scheduling center includes:
and S1021, analyzing the flow scheduling tree, and sequentially acquiring a plurality of subtasks to be executed according to the execution sequence of each subtask in the flow scheduling tree.
Referring to FIGS. 3 and 5, the flow scheduling tree of FIG. 3 is parsed into the plurality of subtasks to be executed shown in FIG. 5. Each subtask includes a subtask node object and a sub-flow program number. Each sub-flow program number corresponds to at least one subtask flow, and each subtask flow corresponds to at least one subtask node object. For example, referring to FIG. 5, sub-flow program number 01 of subtask 01 corresponds to subtask flow 01 and subtask flow 02, each of which corresponds to the subtask node objects S, Z and J.
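The fan-out from sub-flow program number to subtask flows and node objects can be sketched as a small nested structure (a hypothetical representation of the FIG. 5 example; key names are illustrative):

```python
# Hypothetical representation of subtask 01 from FIG. 5: one sub-flow
# program number corresponds to two subtask flows, and each flow
# carries the subtask node objects S, Z and J.
subtask_01 = {
    "subflow_program_number": "01",
    "subtask_flows": {
        "flow_01": ["S", "Z", "J"],
        "flow_02": ["S", "Z", "J"],
    },
}

def node_objects(subtask):
    """Collect every node object reachable from the sub-flow program number."""
    return [obj for flow in subtask["subtask_flows"].values() for obj in flow]

print(node_objects(subtask_01))
```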
S1022, registering each subtask in the task scheduling center according to the execution order.
According to the plurality of subtasks to be executed shown in FIG. 5, each subtask is sequentially registered in the task scheduling center in execution order.
Referring to FIG. 6, distributing each registered subtask to the execution unit for execution includes:
S1031, after receiving a subtask, the unit manager of the execution unit acquires the node information of the subtask and the execution flow configuration parameters of the subtask.
In some embodiments, the node information of a subtask includes the server node, the resource node and the task node corresponding to the subtask.
Referring to FIG. 7, the server node includes at least one server child node. Each server child node includes at least one task node, and the task node is used to respond to the execution unit and execute the subtask.
If a temporary node indicating the availability status of the task node is present under the server child node, the task node is available; if no such temporary node is present, the task node is unavailable.
By way of example, there are multiple child nodes (/apn, named after an IP address) under each server node. For example, the name of the first child node may be /ap1 (10.0.0.55) and the name of the second may be /ap2 (10.0.0.55).
Each child node comprises two classes of node: a heartbeat node and task nodes. The heartbeat node (/heartBeat) is a temporary (ephemeral) node; its presence indicates that the execution unit is available, and its absence indicates that the execution unit is unavailable.
A task node (/task_sub) is a persistent node. When the task scheduling center allocates a task to the execution unit, a task node is created under the corresponding /apn node, such as /task2_sub1 or /task2_sub4.
The /leader_lock node is used as the lock through which the servers elect a master node.
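The availability check therefore reduces to testing for the presence of the ephemeral /heartBeat node. A minimal sketch (an in-memory set of paths stands in for a live ZooKeeper session; all paths are illustrative):

```python
# Availability is inferred purely from the presence of the ephemeral
# /heartBeat node: in ZooKeeper such a node disappears automatically
# when the session of the execution unit that created it ends.
existing_nodes = {
    "/server/ap1/heartBeat",   # ap1's session is alive -> available
    "/server/ap1/task2_sub1",
    "/server/ap2/task2_sub4",  # ap2 has no heartBeat node -> unavailable
}

def is_available(server, nodes=existing_nodes):
    """An execution unit is available iff its heartbeat node exists."""
    return "/server/%s/heartBeat" % server in nodes

print(is_available("ap1"), is_available("ap2"))
```

Against a real cluster the membership test would be an exists() call on the path rather than a set lookup, but the decision rule is the same.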
Referring to FIG. 8, the resource node (/resource) contains a plurality of directories (CN000, CN001, etc.), and each directory includes a database table (sharding) child node and a resource child node (/QR). The database table child node stores the configuration information of the database sharding resources, and the resource child node stores the server resource configuration information.
The resource child node includes control resources (CR) and quantity resources (QR). CR nodes are divided into shared control resources (SCR) and exclusive control resources (ECR). The quantity resource records the resource information of each task node; for example, /QR/ap1 may include /FileSplit(m, n) and /FileEngine(m, n), where m is the total number of data content resources of the node and n is the number currently occupied. The QR inside a database table directory (such as /sharding1/TBPC10101/QR) represents the number of concurrent accesses the database table allows, and its value changes as CR resources are applied for or released.
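The (m, n) bookkeeping on a quantity resource can be sketched as a simple counter in which a subtask acquires a slot only while n < m (class and method names are illustrative, not from the patent):

```python
class QuantityResource:
    """Sketch of a node's quantity resource: m = total, n = occupied."""

    def __init__(self, total):
        self.total = total      # m: total data content resources of the node
        self.occupied = 0       # n: resources currently occupied

    def acquire(self):
        """Take one slot if the node still has spare capacity."""
        if self.occupied < self.total:
            self.occupied += 1
            return True
        return False

    def release(self):
        """Free one slot, e.g. when a subtask finishes or releases its CR."""
        self.occupied = max(0, self.occupied - 1)
```

A table-level QR limiting concurrent accesses behaves the same way: acquire on CR application, release on CR release.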
Referring to FIG. 9, the task node (/task) is used to store task information.
The task node (/task) includes new tasks (/new), executing tasks (/running) and finished tasks (/finished), each of which contains the corresponding task entries.
Once created, a task name (such as task1 or task2) cannot be modified; if the name is modified, the task is considered a new task. Task instance nodes (such as /task1_sub1 and /task1_sub2) are contained under the task name and store the detailed information of task instances. The /wait_res child node waits for task resource reception to complete and starts task scheduling once it does. The /task_psclnd child node is a unique identifier of the distributed task.
S1032: the unit manager issues each subtask in sequence, according to its execution flow configuration parameters and node information, to the execution unit indicated by that node information for execution.
The preset load balancing strategy comprises the following steps: and acquiring the performance weight of the corresponding server according to the performance parameters of the execution unit of each server and the subtasks being executed by the execution unit.
Correspondingly, distributing each registered subtask to the execution unit and executing the subtask comprises: and distributing the subtasks to the execution units corresponding to the servers with the highest performance weights.
Acquiring the performance weight of the corresponding server according to the performance parameters of the execution unit of each server and the subtasks being executed by the execution unit, wherein the method comprises the following steps: and acquiring the initial weight of the execution unit according to the performance parameter of the execution unit of each server. And acquiring the effective weight of the execution unit according to the initial weight and the subtask being executed by the execution unit. And acquiring the current weight of the execution unit, and taking the sum of the effective weight of the execution unit and the current weight of the execution unit as the performance weight of the server.
After obtaining the current weight of the execution unit and taking the sum of the effective weight of the execution unit and the current weight of the execution unit as the performance weight of the server, the method further comprises the following steps: and acquiring the sum of the effective weights of all the execution units, and updating the weight value obtained by subtracting the sum of the effective weights of all the execution units from the performance weight to be the current weight of the execution unit.
In some embodiments, the initial weight (weight) is the weight agreed for each execution unit in the configuration file or at initialization time. The effective weight (effectiveWeight) quantifies the processing capability of each execution unit; its main purpose is to provide a basis for selecting an execution unit. The current weight (currentWeight) is the execution unit's current weight and is initialized to 0.
When distributing a subtask to the execution unit corresponding to the server with the highest performance weight, first calculate totalWeight, the sum of the effectiveWeight values in the current state; then add each execution unit's currentWeight to its effectiveWeight to obtain its performance weight, and sort all performance weights; then distribute the subtask to the execution unit corresponding to the server with the highest performance weight. Finally, subtract totalWeight from the performance weight of the selected execution unit to obtain its new currentWeight.
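The selection procedure above matches the smooth weighted round-robin scheme (popularized by Nginx). A minimal sketch, with execution units represented as plain dictionaries (this representation is an assumption of the illustration, not of the patent):

```python
def pick_execution_unit(units):
    """One smooth weighted round-robin step.
    units: list of dicts with 'effective_weight' and 'current_weight'."""
    total_weight = sum(u["effective_weight"] for u in units)
    # Performance weight = currentWeight + effectiveWeight.
    for u in units:
        u["current_weight"] += u["effective_weight"]
    best = max(units, key=lambda u: u["current_weight"])
    # The winner gives back totalWeight; this becomes its new currentWeight.
    best["current_weight"] -= total_weight
    return best


units = [
    {"name": "ap1", "effective_weight": 4, "current_weight": 0},
    {"name": "ap2", "effective_weight": 2, "current_weight": 0},
    {"name": "ap3", "effective_weight": 1, "current_weight": 0},
]
order = [pick_execution_unit(units)["name"] for _ in range(7)]
# Over 7 picks, units are chosen in proportion to their effective weights
# (4:2:1), and the selections are smoothly interleaved.
```

Note how after one full cycle (totalWeight picks) every currentWeight returns to 0, so the sequence repeats deterministically.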
In some embodiments, effectiveWeight may be obtained using a weighted round-robin algorithm. The weighted round-robin algorithm does not need to record the current connection status and is stateless. However, its weights can only be set in advance and cannot dynamically reflect differences between execution units, so an effective-weight factor with dynamic feedback can be introduced.
First, a CPU upper threshold and a memory upper threshold are configured; for example, the CPU upper threshold may be set to 60% and the memory upper threshold to 90%. The actual CPU usage and actual memory usage of the server are then acquired. If the actual CPU usage < the CPU upper threshold and the actual memory usage < the memory upper threshold, then effectiveWeight = W1 * lg(1 + CPU upper threshold - actual average CPU usage) + W2 * lg(1 + memory upper threshold - actual average memory usage), where W1 and W2 are the influence ratios of the CPU factor and the memory factor, respectively.
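With usage rates expressed as fractions, the formula can be sketched as follows. The default values of W1 and W2, and returning None when a threshold is exceeded (the patent only defines the formula below both thresholds), are assumptions of this example:

```python
import math


def effective_weight(cpu_usage, mem_usage,
                     cpu_threshold=0.60, mem_threshold=0.90,
                     w1=0.5, w2=0.5):
    """Dynamic-feedback effective weight:
    W1*lg(1 + cpu_threshold - cpu_usage) + W2*lg(1 + mem_threshold - mem_usage).
    lg is the base-10 logarithm."""
    if cpu_usage >= cpu_threshold or mem_usage >= mem_threshold:
        return None  # formula is only defined below both thresholds
    return (w1 * math.log10(1 + cpu_threshold - cpu_usage)
            + w2 * math.log10(1 + mem_threshold - mem_usage))


# A lightly loaded server receives a higher effective weight than a
# heavily loaded one, so the scheduler favors it.
assert effective_weight(0.10, 0.20) > effective_weight(0.50, 0.80)
assert effective_weight(0.70, 0.20) is None  # CPU above threshold
```

The logarithm dampens the feedback: small differences in headroom near the thresholds change the weight less than large differences in idle capacity.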
The following describes an application process of the distributed task scheduling method provided in the present application with reference to fig. 10 to 14.
Fig. 10 is a diagram illustrating task registration in the distributed task scheduling method.
Referring to fig. 10, when a distributed task (/bat_root_DEMO) is started, the task control center first parses the distributed scheduling tree to obtain a plurality of subtasks (/tasks); after obtaining the subtasks to be executed, it creates persistent task node information under the /tasks/new node. For example, FIG. 10 shows the task "/flow1_A" created under the /tasks/new node and its corresponding subtasks "/flow1_A_1010_file_db1" and "/flow1_A_1560_file_db1". The task scheduling center monitors the /tasks/new node in real time, and the creation of a task node triggers task allocation processing.
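Using a plain dictionary to stand in for the coordination-service tree (in practice this would be a ZooKeeper-style store; the dictionary and the function name are assumptions of this sketch), registration under /tasks/new might look like:

```python
def register_task(tree, flow_name, subtask_names):
    """Create persistent task nodes under /tasks/new, as the task control
    center does after parsing the distributed scheduling tree."""
    tree["/tasks/new"][flow_name] = {
        sub: {"state": "new"} for sub in subtask_names
    }


tree = {"/tasks/new": {}, "/tasks/running": {}, "/tasks/finished": {}}
register_task(tree, "/flow1_A",
              ["/flow1_A_1010_file_db1", "/flow1_A_1560_file_db1"])

# The scheduling center's watch on /tasks/new would now fire and
# trigger task allocation for the two subtasks.
assert "/flow1_A" in tree["/tasks/new"]
assert len(tree["/tasks/new"]["/flow1_A"]) == 2
```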
Fig. 11 and 12 are schematic diagrams illustrating task allocation in the distributed task scheduling method.
Referring to fig. 11, the task scheduling center assigns tasks to appropriate execution units for processing according to the load balancing policy, and applies for resources according to the resource allocation policy. After the resources are successfully allocated, the task scheduling center creates persistent nodes for the task under the /servers/ap1 and /servers/ap2 nodes, and deletes the successfully distributed task nodes under the /tasks/new node.
Referring to fig. 12, when the execution units ap1 and ap2, through their context listening processes, detect that tasks have been added under /servers/ap1 and /servers/ap2, the controller creates temporary task nodes (whose parent nodes are persistent nodes) under the /tasks/running node.
Fig. 13 and 14 are schematic diagrams illustrating task execution in the distributed task scheduling method.
Referring to fig. 13, the controller in the execution unit updates task node information under the/tasks/running node in real time. And the task scheduling center monitors the task node information under the/tasks/running node and updates the task node information to the corresponding task node under the/servers/apn node.
The task control center monitors the task node information under the /tasks/running node:
When the task execution state under the /tasks/running node is "executing", the corresponding task state in the distributed flow scheduling tree is updated to "executing".
When the task execution state under the /tasks/running node is finished (normally or abnormally), the corresponding task state in the distributed flow scheduling tree is updated to "finished", and a task node for the completed execution is then created under the /tasks/finished node.
Referring to fig. 14, the task scheduling center also monitors the /tasks/finished node. When a finished task node is created, the task scheduling center may delete the corresponding task node under the /servers/apn node, the parent node of the corresponding task node under the /tasks/finished node, and the parent node of the corresponding task node under the /tasks/running node. The task scheduling center then releases resources according to the resource allocation policy.
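The allocation/execution/cleanup flow of FIGS. 11 to 14 amounts to moving a task node through three parents. A condensed in-memory sketch (the tree representation and the single server node "/servers/ap1" are illustrative assumptions):

```python
def allocate(tree, task):
    # Scheduling center: move the task from /tasks/new to a server node.
    tree["/servers/ap1"][task] = {"state": "assigned"}
    tree["/tasks/new"].pop(task, None)


def run(tree, task):
    # Execution unit's controller: create a temporary node under /tasks/running.
    tree["/tasks/running"][task] = {"state": "executing"}


def finish(tree, task):
    # On (normal/abnormal) completion: record the task under /tasks/finished,
    # then the scheduling center deletes the server and running nodes.
    tree["/tasks/finished"][task] = {"state": "finished"}
    tree["/tasks/running"].pop(task, None)
    tree["/servers/ap1"].pop(task, None)


tree = {"/tasks/new": {"t1": {}}, "/tasks/running": {},
        "/tasks/finished": {}, "/servers/ap1": {}}
allocate(tree, "t1")
run(tree, "t1")
finish(tree, "t1")
assert tree["/tasks/finished"] == {"t1": {"state": "finished"}}
assert not tree["/tasks/running"] and not tree["/servers/ap1"]
```

In the real system each transition is driven by a watch on the corresponding node rather than by direct calls, which is what decouples the scheduling center from the execution units.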
On the basis of the three-layer architecture of task control center, task scheduling center, and execution units, a complete scheduling method for distributed batch tasks, from registration and distribution to control and execution, is realized. The method offers dynamic task distribution, task concurrency control, flexible task capacity expansion, task scheduling control, support for multiple task types, high availability of task nodes, and high availability of the scheduling center.
Based on the distributed batch task scheduling method described in the foregoing embodiment, the embodiment of the present application further provides a distributed batch task scheduling device.
FIG. 15 is a schematic structural diagram illustrating a distributed batch task scheduling apparatus according to an embodiment of the present invention.
As shown in fig. 15, an embodiment of the present invention provides a distributed batch task scheduling apparatus, including:
the obtaining module 1501 is configured to obtain a flow scheduling tree, where the flow scheduling tree is created by the task control center according to pre-stored configuration information.
The parsing module 1502 is configured to parse the flow scheduling tree, obtain a plurality of to-be-executed subtasks, and register each subtask in the task scheduling center.
The execution module 1503 is configured to, by the task scheduling center, distribute each registered subtask to the execution unit according to a preset load balancing policy and execute the subtask.
Optionally, the parsing module 1502 is specifically configured to parse the flow scheduling tree, and sequentially obtain a plurality of to-be-executed subtasks according to an execution sequence of each subtask in the flow scheduling tree. Each subtask is registered in the task scheduling center according to the execution order.
Optionally, the execution module 1503 is specifically configured so that, after receiving the subtasks, the unit manager of the execution unit obtains the node information of the subtasks and the execution flow configuration parameters of the subtasks, and issues each subtask in sequence, according to its execution flow configuration parameters and node information, to the execution unit indicated by that node information for execution.
Optionally, the node information of the subtask includes: the server node, resource node, and task node corresponding to the subtask.
Optionally, the server node comprises at least one server sub-node. Each server sub-node comprises at least one task node, and the task node is used for responding to the execution unit and executing the sub-tasks.
Optionally, if a temporary node for indicating the availability status of the task node is further included below the server child node, the availability status of the task node is available. And if the temporary node for indicating the available state of the task node is not included in the server child node, the available state of the task node is unavailable.
Optionally, the resource nodes include database table child nodes and resource child nodes. And the database table child node is used for storing the configuration information of the database fragmentation resources. The resource child node is used for storing the server resource configuration information.
Optionally, the task sub-node is configured to store task information.
Optionally, the preset load balancing policy includes: and acquiring the performance weight of the corresponding server according to the performance parameters of the execution unit of each server and the subtasks being executed by the execution unit.
The execution module 1503 is specifically configured to distribute the subtask to the execution unit corresponding to the server with the highest performance weight.
Optionally, the obtaining module 1501 is further configured to obtain an initial weight of the execution unit according to the performance parameter of the execution unit of each server. And acquiring the effective weight of the execution unit according to the initial weight and the subtask being executed by the execution unit. And acquiring the current weight of the execution unit, and taking the sum of the effective weight of the execution unit and the current weight of the execution unit as the performance weight of the server.
Optionally, the obtaining module 1501 is further configured to obtain a sum of effective weights of all the execution units, and update a weight value obtained by subtracting the sum of effective weights of all the execution units from the performance weight to be a current weight of the execution unit.
Optionally, the subtasks include: a subtask node object and a subflow program number. Each subflow program number corresponds to at least one subtask flow, and each subtask flow corresponds to at least one subtask node object.
The above-mentioned apparatus can be integrated into a server, a computer, or other devices, and the present invention is not limited herein. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, for the specific working process of the distributed batch task scheduling apparatus, reference may be made to the corresponding process of the distributed batch task scheduling method described in the foregoing method embodiments, and details are not repeated here.
It should be understood that the above-described apparatus embodiments are merely exemplary, and that the apparatus and method disclosed in the embodiments of the present invention may be implemented in other ways. For example, the division of the modules into only one logical functional division may be implemented in other ways, and for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some communication interfaces, and may be in an electrical, mechanical or other form. In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a user terminal or a driver terminal to perform all or part of the steps of the method according to the embodiments of the present invention.
That is, those skilled in the art will appreciate that embodiments of the present invention may be implemented in any form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects.
Based on this, the embodiment of the present invention further provides a program product. The program product may be a storage medium such as a USB disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk, and the storage medium may store a computer program; the computer program, when executed by a processor, performs the steps of the distributed batch task scheduling method described in the foregoing method embodiments. The specific implementation and technical effects are similar and are not described here again.
Optionally, an embodiment of the present invention further provides an electronic device, where the electronic device may be a server, a computer, or a like device, and fig. 16 illustrates a schematic structural diagram of the electronic device provided in the embodiment of the present invention.
As shown in fig. 16, the electronic device 16 may include: a processor 1601, a storage medium 1602, and a bus 1603, the storage medium 1602 storing machine-readable instructions executable by the processor 1601. When the electronic device is operated, the processor 1601 communicates with the storage medium 1602 via the bus 1603, and the processor 1601 executes the machine-readable instructions to perform the steps of the distributed batch task scheduling method described in the foregoing embodiments. The specific implementation and technical effects are similar and are not described here again.
For ease of illustration, only one processor is described in the above electronic device. However, it should be noted that in some embodiments, the electronic device in the present invention may further include multiple processors, and thus, the steps performed by one processor described in the present invention may also be performed by multiple processors in combination or individually.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and the present invention shall be covered thereby. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (15)
1. A distributed batch task scheduling method is characterized by comprising the following steps:
acquiring a flow scheduling tree, wherein the flow scheduling tree is created by a task control center according to pre-stored configuration information;
analyzing the flow scheduling tree, acquiring a plurality of subtasks to be executed, and registering each subtask in a task scheduling center;
and the task scheduling center distributes each registered subtask to an execution unit according to a preset load balancing strategy and executes the subtask.
2. The method of claim 1, wherein the parsing the flow scheduling tree, obtaining a plurality of subtasks to be executed, and registering each of the subtasks in a task scheduling center comprises:
analyzing the flow scheduling tree, and sequentially acquiring the plurality of subtasks to be executed according to the execution sequence of each subtask in the flow scheduling tree;
and registering each subtask in a task scheduling center according to the execution sequence.
3. The method according to claim 2, wherein the distributing and executing each of the registered subtasks to an execution unit comprises:
after receiving the subtasks, a unit manager of the execution unit acquires node information of the subtasks and configuration parameters of an execution process of the subtasks;
and the unit manager sequentially issues each subtask to the execution unit indicated by the node information of the subtask to execute according to the execution flow configuration parameter of each subtask and the node information of the subtask.
4. The method of claim 3, wherein the node information of the subtask node includes: and indicating the server node, the resource node and the task node corresponding to the subtask.
5. The method of claim 4, wherein the server node comprises at least one server sub-node;
each server sub-node comprises at least one task node, and the task node is used for responding to the execution unit and executing a sub-task.
6. The method according to claim 5, wherein the availability status of the task node is available if a temporary node for indicating the availability status of the task node is further included under the server sub-node;
and if the server child node does not comprise the temporary node for indicating the availability state of the task node, the availability state of the task node is unavailable.
7. The method of claim 4, wherein the resource nodes include database table child nodes and resource child nodes;
the database table child node is used for storing database fragment resource configuration information;
the resource child node is used for storing the resource configuration information of the server.
8. The method of claim 4, wherein the task sub-nodes are configured to store task information.
9. The method of claim 1, wherein the preset load balancing policy comprises:
acquiring the performance weight of a corresponding server according to the performance parameters of the execution unit of each server and the subtasks being executed by the execution unit;
the distributing and executing each of the registered subtasks to the execution unit includes:
and distributing the subtasks to the execution units corresponding to the servers with the highest performance weights.
10. The method of claim 9, wherein obtaining the performance weight of the corresponding server according to the performance parameter of the execution unit of each server and the subtask being executed by the execution unit comprises:
acquiring the initial weight of the execution unit according to the performance parameters of the execution unit of each server;
obtaining the effective weight of the execution unit according to the initial weight and the subtasks being executed by the execution unit;
and acquiring the current weight of the execution unit, and taking the sum of the effective weight of the execution unit and the current weight of the execution unit as the performance weight of the server.
11. The method of claim 10, after obtaining the current weight of the execution unit and taking the sum of the effective weight of the execution unit and the current weight of the execution unit as the performance weight of the server, further comprising:
and acquiring the sum of the effective weights of all the execution units, and updating the weight value obtained by subtracting the sum of the effective weights of all the execution units from the performance weight to be the current weight of the execution unit.
12. The method of any one of claims 1 to 11, wherein the subtasks include: a subtask node object and a subflow program number;
each subtask flow program number corresponds to at least one subtask flow, and each subtask flow corresponds to at least one subtask node object.
13. A distributed batch task scheduling apparatus, the apparatus comprising:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a flow scheduling tree, and the flow scheduling tree is created by a task control center according to pre-stored configuration information;
the analysis module is used for analyzing the flow scheduling tree, acquiring a plurality of subtasks to be executed and registering each subtask in a task scheduling center;
and the execution module is used for distributing each registered subtask to the execution unit and executing the subtask according to a preset load balancing strategy by the task scheduling center.
14. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating over the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the method of any one of claims 1 to 12 when executed.
15. A storage medium, characterized in that the storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110174120.3A CN112988344A (en) | 2021-02-09 | 2021-02-09 | Distributed batch task scheduling method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110174120.3A CN112988344A (en) | 2021-02-09 | 2021-02-09 | Distributed batch task scheduling method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112988344A true CN112988344A (en) | 2021-06-18 |
Family
ID=76347814
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110174120.3A Pending CN112988344A (en) | 2021-02-09 | 2021-02-09 | Distributed batch task scheduling method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112988344A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114416346A (en) * | 2021-12-23 | 2022-04-29 | 广州市玄武无线科技股份有限公司 | Multi-node task scheduling method, device, equipment and storage medium |
CN115454640A (en) * | 2022-09-21 | 2022-12-09 | 苏州启恒融智信息科技有限公司 | Task processing system and self-adaptive task scheduling method |
CN116028188A (en) * | 2023-01-30 | 2023-04-28 | 合众新能源汽车股份有限公司 | Scheduling system, method and computer-readable medium for cloud computing tasks |
CN118796950A (en) * | 2024-09-12 | 2024-10-18 | 麦乐峰(厦门)智能科技有限公司 | An information collection system based on big data |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104536814A (en) * | 2015-01-16 | 2015-04-22 | 北京京东尚科信息技术有限公司 | Method and system for processing workflow |
CN105630589A (en) * | 2014-11-24 | 2016-06-01 | 航天恒星科技有限公司 | Distributed process scheduling system and process scheduling and execution method |
CN108958920A (en) * | 2018-07-13 | 2018-12-07 | 众安在线财产保险股份有限公司 | A kind of distributed task dispatching method and system |
AU2020103205A4 (en) * | 2020-10-20 | 2021-01-14 | Agricultural Information Institute, Chinese Academy of Agricultural Sciences | Biological information deep mining and analysis system infrastructure construction method |
- 2021-02-09: CN CN202110174120.3A patent/CN112988344A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105630589A (en) * | 2014-11-24 | 2016-06-01 | 航天恒星科技有限公司 | Distributed process scheduling system and process scheduling and execution method |
CN104536814A (en) * | 2015-01-16 | 2015-04-22 | 北京京东尚科信息技术有限公司 | Method and system for processing workflow |
CN108958920A (en) * | 2018-07-13 | 2018-12-07 | 众安在线财产保险股份有限公司 | A kind of distributed task dispatching method and system |
AU2020103205A4 (en) * | 2020-10-20 | 2021-01-14 | Agricultural Information Institute, Chinese Academy of Agricultural Sciences | Biological information deep mining and analysis system infrastructure construction method |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114416346A (en) * | 2021-12-23 | 2022-04-29 | 广州市玄武无线科技股份有限公司 | Multi-node task scheduling method, device, equipment and storage medium |
CN115454640A (en) * | 2022-09-21 | 2022-12-09 | 苏州启恒融智信息科技有限公司 | Task processing system and self-adaptive task scheduling method |
CN115454640B (en) * | 2022-09-21 | 2024-01-19 | 苏州启恒融智信息科技有限公司 | Task processing system and self-adaptive task scheduling method |
CN116028188A (en) * | 2023-01-30 | 2023-04-28 | 合众新能源汽车股份有限公司 | Scheduling system, method and computer-readable medium for cloud computing tasks |
CN116028188B (en) * | 2023-01-30 | 2023-12-01 | 合众新能源汽车股份有限公司 | Scheduling system, method and computer readable medium for cloud computing task |
WO2024159940A1 (en) * | 2023-01-30 | 2024-08-08 | 合众新能源汽车股份有限公司 | Scheduling system and method for cloud computing task, and computer-readable medium |
CN118796950A (en) * | 2024-09-12 | 2024-10-18 | 麦乐峰(厦门)智能科技有限公司 | An information collection system based on big data |
CN118796950B (en) * | 2024-09-12 | 2024-11-12 | 麦乐峰(厦门)智能科技有限公司 | An information collection system based on big data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112988344A (en) | Distributed batch task scheduling method, device, equipment and storage medium | |
CN110908795B (en) | Cloud computing cluster mixed part job scheduling method and device, server and storage device | |
US9262228B2 (en) | Distributed workflow in loosely coupled computing | |
US20090282413A1 (en) | Scalable Scheduling of Tasks in Heterogeneous Systems | |
CN110221920B (en) | Deployment method, device, storage medium and system | |
Delamare et al. | SpeQuloS: a QoS service for BoT applications using best effort distributed computing infrastructures | |
CN113886089B (en) | Task processing method, device, system, equipment and medium | |
CN114896068B (en) | Resource allocation method, resource allocation device, electronic device and storage medium | |
CN110838939A (en) | A lightweight container-based scheduling method and edge IoT management platform | |
CN108268314A (en) | A kind of method of multithreading task concurrent processing | |
CN113255165A (en) | Experimental scheme parallel deduction system based on dynamic task allocation | |
CA2631255A1 (en) | Scalable scheduling of tasks in heterogeneous systems | |
CN112214318A (en) | Task scheduling method, system, device and medium | |
US20120059938A1 (en) | Dimension-ordered application placement in a multiprocessor computer | |
CN114595041A (en) | Resource scheduling system and method | |
US20100122261A1 (en) | Application level placement scheduler in a multiprocessor computing environment | |
CN109257256A (en) | Apparatus monitoring method, device, computer equipment and storage medium | |
CN113472557B (en) | A virtual network element processing method, device and electronic equipment | |
CN118069349A (en) | A variable depth resource management method and system for multiple scenarios | |
Sahu et al. | Efficient load Balancing algorithm analysis in Cloud Computing | |
CN111522664A (en) | Service resource management and control method and device based on distributed service | |
Di Stefano et al. | Improving the allocation of communication-intensive applications in clouds using time-related information | |
US20100122254A1 (en) | Batch and application scheduler interface layer in a multiprocessor computing environment | |
CN116112497B (en) | Node scheduling method, device, equipment and medium of cloud host cluster | |
CN109614239A (en) | System cluster load-balancing method, device and relevant device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||