CN112860387A

CN112860387A - Distributed task scheduling method and device, computer equipment and storage medium

Info

Publication number: CN112860387A
Application number: CN201911183995.9A
Authority: CN
Inventors: 张杨
Original assignee: Shanghai Bilibili Technology Co Ltd
Current assignee: Shanghai Bilibili Technology Co Ltd
Priority date: 2019-11-27
Filing date: 2019-11-27
Publication date: 2021-05-28

Abstract

The invention provides a distributed task scheduling method, a distributed task scheduling device, computer equipment and a computer-readable storage medium. The method comprises the following steps: a master node receives a task to be executed and acquires a resource consumption object corresponding to the task to be executed, wherein the resource consumption object represents the resource that the task to be executed consumes the most when running; acquiring resource state information of a plurality of slave nodes, wherein the resource state information comprises the number of processes currently running on each slave node and the current remaining resource values of the slave node; based on the resource consumption object, taking as the target slave node a slave node whose number of currently running processes conforms to a preset first rule and whose current remaining resource value conforms to a preset second rule; and distributing the task to be executed to the target slave node for processing.

Description

Distributed task scheduling method and device, computer equipment and storage medium

Technical Field

The present invention relates to the field of task scheduling technologies, and in particular, to a distributed task scheduling method and apparatus, a computer device, and a storage medium.

Background

Distributed task scheduling is currently a research direction for many internet companies. In the traditional task scheduling mode, a scheduling platform generally distributes the tasks to be processed among a plurality of execution nodes in turn according to a fixed sequence, or distributes them randomly. The defect of this mode is that the task distribution strategy is simple and crude, so the best match between the tasks to be processed and the execution nodes cannot be achieved. In the prior art, tasks are also distributed according to the processing capacity of the execution nodes: the faster an execution node processes tasks, the more tasks it is allocated. However, allocating tasks qualitatively according to processing speed is still too coarse. In practice, because each task consumes different resources, execution nodes with different resource configurations process the same task at different speeds. Therefore, how to provide a more flexible and reasonable task scheduling scheme has become a technical problem to be solved urgently by those skilled in the art.

Disclosure of Invention

The invention aims to provide a distributed task scheduling method, a distributed task scheduling device, computer equipment and a storage medium, and aims to solve the problem that in the prior art, the matching effect between a task to be processed and a task execution node is poor.

In order to achieve the above object, the present invention provides a distributed task scheduling method, which includes the following steps:

a master node receives a task to be executed and acquires a resource consumption object corresponding to the task to be executed, wherein the resource consumption object represents the resource that the task to be executed consumes the most when running;

acquiring resource state information of a plurality of slave nodes, wherein the resource state information comprises the number of processes currently operated by the slave nodes and the current remaining resource numerical values of the slave nodes;

based on the resource consumption object, taking the slave nodes of which the number of the currently running processes accords with a preset first rule and the current remaining resource numerical value accords with a preset second rule as target slave nodes;

and distributing the task to be executed to the target slave node for processing.

The distributed task scheduling method provided by the invention is characterized in that the resource consumption object comprises a first resource and a second resource, and the step of taking the slave node, in which the number of the currently running processes conforms to a preset first rule and the value of the currently remaining resource conforms to a preset second rule, as the target slave node based on the resource consumption object comprises the following steps:

taking the slave nodes whose number of currently running processes is smaller than the total number of CPU cores as candidate slave nodes;

when the resource consumption object is a first resource, taking the candidate slave node with the maximum value of the currently remaining first resource as a target slave node;

and when the resource consumption object is a second resource, taking the candidate slave node with the maximum value of the currently remaining second resource as a target slave node.

According to the distributed task scheduling method provided by the present invention, the step of using the slave nodes with the number of currently running processes smaller than the total number of cores as candidate slave nodes further comprises:

and taking the slave nodes of which the currently remaining first resource value is greater than the first resource limit and the currently remaining second resource value is greater than the second resource limit as candidate slave nodes, wherein the first resource limit and the second resource limit are preset by the master node.

According to the distributed task scheduling method provided by the present invention, after the step of allocating the task to be executed to the target slave node for processing, the method further includes:

monitoring a first actual resource value and a second actual resource value consumed by the target slave node when processing the task to be executed;

and adjusting the first resource limit and the second resource limit according to the first actual resource value and the second actual resource value.

According to the distributed task scheduling method provided by the invention, the step of acquiring the residual resource information of the plurality of slave nodes comprises the following steps:

acquiring the residual resource information of a plurality of slave nodes from a resource state table, wherein the residual resource information in the resource state table is obtained by the slave nodes reporting to the master node, or the master node subtracting the task allocation information of the last time from the residual resource information reported by the slave nodes last time.

According to the distributed task scheduling method provided by the invention, the step of acquiring the remaining resource information of a plurality of slave nodes from the resource state table comprises the following steps:

acquiring a first moment of last acquiring the residual resource information reported by the slave node;

acquiring a second moment of distributing tasks to the slave nodes at the latest time;

if the first time is earlier than the second time, subtracting the first resource limit from the current remaining first resource value reported by the slave node to serve as first remaining resource information in the resource state table, and subtracting the second resource limit from the current remaining second resource value reported by the slave node to serve as second remaining resource information in the resource state table;

and if the first time is not earlier than the second time, taking the residual resource information reported by the slave node as the residual resource information in the resource state table.

According to the distributed task scheduling method provided by the invention, the step in which the master node receives the task to be executed and acquires the resource consumption object corresponding to the task to be executed comprises the following steps:

the master node receives a task to be executed and acquires the task type contained in the task to be executed;

and acquiring the resource consumption object matched with the task type according to a preset matching relation.

According to the distributed task scheduling method provided by the invention, the resource consumption object comprises a CPU resource and a memory resource.

To achieve the above object, the present invention further provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.

To achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the above method.

The distributed task scheduling method, device, computer equipment and computer storage medium provided by the invention offer a solution for distributing tasks according to the resource consumption type of each task and the resource state of the execution nodes. First, the invention divides tasks into CPU-intensive and memory-intensive types, and then allocates them according to the number of processes currently running on each execution node and its resource state information. Second, the invention sets a resource quota for each task to be processed, which limits the maximum amount of resources the task may consume when running on a slave node and prevents abnormal or malicious tasks from leaving the slave node unable to run. Third, in addition to acquiring the resource state information reported by the slave nodes in real time, the master node keeps a record table of slave node resource state information calculated from the task allocation situation, and selects either the reported information or the recorded information according to the actual situation, thereby solving the problem of inaccurate slave node state information caused by transmission delay.

Drawings

FIG. 1 is a flowchart of a distributed task scheduling method according to a first embodiment of the present invention;

FIG. 2 is a block diagram of a first embodiment of a distributed task scheduler;

FIG. 3 is a schematic hardware structure diagram of a distributed task scheduling apparatus according to a first embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Example one

The invention is suitable for a task scheduling system comprising a multi-node cluster, such as a master-slave task distribution system comprising a master node and a plurality of slave nodes. The master node receives the tasks to be processed, manages the running states of the plurality of slave nodes, and schedules the received tasks among the slave nodes as a whole. Each slave node is a task execution body: it reports its running state to the master node in real time and performs the actual processing of the tasks it receives.

Referring to fig. 1, the present embodiment provides a distributed task scheduling method, which specifically includes the following steps:

S1: The master node receives the task to be executed and acquires the resource consumption object corresponding to the task to be executed, wherein the resource consumption object represents the resource that is consumed the most when the task to be executed runs.

The task to be executed in the invention can be a timed task or a task triggered by a user in real time. The task to be executed is first sent to the master node and is then distributed by the master node.

The task to be executed received by the master node comprises at least a task name and a task type. The task name represents the specific content of the task, and the task type is a type divided in advance according to the task function. Specifically, when tasks are created, the invention divides them in advance into four types: mr, hivesql, shell and email. A task of the mr (MapReduce, a distributed computation framework) type is a batch computing task, a task of the hivesql type is a big data query task, a task of the shell type is a task that interacts between a user and the system through a shell language, and a task of the email type is a task related to e-mail.

Once the task type of the task to be processed has been acquired, the resource consumption object is determined according to the task type. The resource consumption object indicates which resource a task mainly consumes at run time. For example, a CPU-intensive task consumes a large amount of CPU resources during operation, and a memory-intensive task consumes a large amount of memory during operation. In this embodiment, the mr and shell tasks, which need independent resources, are classified as memory-intensive tasks, and the hivesql and email tasks are classified as CPU-intensive tasks. That is, if the task type of the task to be processed is the mr type or the shell type, the corresponding resource consumption object is memory; and if the task type of the task to be processed is the hivesql type or the email type, the corresponding resource consumption object is CPU.
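The preset matching relation between task type and resource consumption object can be illustrated with a lookup table. This is a minimal sketch, assuming the four task types of this embodiment; the names `RESOURCE_CONSUMPTION_OBJECT` and `resource_consumption_object` are hypothetical and do not appear in the original.

```python
# Hypothetical lookup table mirroring the embodiment:
# mr and shell are memory-intensive, hivesql and email are CPU-intensive.
RESOURCE_CONSUMPTION_OBJECT = {
    "mr": "memory",
    "shell": "memory",
    "hivesql": "cpu",
    "email": "cpu",
}


def resource_consumption_object(task_type: str) -> str:
    """Return the resource that a task of this type is assumed to consume most."""
    try:
        return RESOURCE_CONSUMPTION_OBJECT[task_type]
    except KeyError:
        # Rejecting unknown types is an assumption; the text does not say
        # how an unmatched task type should be handled.
        raise ValueError(f"unknown task type: {task_type!r}") from None
```

Keeping the relation in a table rather than in branching code means a new task type can be added without touching the scheduling logic.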

S2: Acquiring resource state information of a plurality of slave nodes, wherein the resource state information comprises the number of processes currently running on each slave node and the current remaining resource values of the slave node.

In this step, the resource state information of all slave nodes scheduled by the master node is acquired. The resource state information specifically comprises the number of processes currently running on each slave node, its remaining CPU resources and its remaining memory resources.

The master node in the invention acquires the resource state information of the slave node in two ways, one is based on the resource state information reported by the slave node at regular time, and the other is based on the resource state information of the slave node recorded by the master node.

For the first mode, the slave nodes report resource state information to ZooKeeper at a preset time interval, and once ZooKeeper receives the resource state information reported by a slave node, it immediately sends the received information to the corresponding master node. ZooKeeper is a distributed service framework that maintains a data structure similar to a file system and monitors the directory nodes in that data structure. Once a directory node is found to have changed (its data is changed or deleted, or a child directory node is added or deleted), ZooKeeper notifies the corresponding master node. The preset time interval may be, for example, every 10 minutes or every 5 minutes, and may be determined according to the distribution density of the tasks to be processed, which is not limited in the present invention.

For the second mode, the master node can calculate the resource state information of a slave node in real time from the resource state information reported by the slave node and the task allocation situation. For example, suppose slave node A reports its resource state information every 10 minutes, and at 15:00 it reports to the master node that the number of currently running processes is 10, the remaining CPU resources are 30 cores, and the remaining memory resources are 20G. If the master node allocates the to-be-processed task P to slave node A at 15:03, and the CPU resource limit of task P is 2 cores and its memory limit is 2G, the master node subtracts the resource limit of task P from the resource state information reported by slave node A and uses the result as its calculated resource state information for slave node A: the number of currently running processes is 10, the remaining CPU resources are 28 cores, and the remaining memory resources are 18G. The setting of resource limits is described in detail below.
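The master-side calculation in this example can be sketched as follows. This is an illustration under stated assumptions: the types `ResourceState` and `Quota` and the function `state_after_assignment` are hypothetical names, and the process count is left unchanged because the example in the text keeps it at 10.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ResourceState:
    """Resource state of one slave node (hypothetical structure)."""
    running_processes: int
    remaining_cpu_cores: float
    remaining_memory_gb: float


@dataclass(frozen=True)
class Quota:
    """Resource limit of one task (hypothetical structure)."""
    cpu_cores: float
    memory_gb: float


def state_after_assignment(reported: ResourceState, quota: Quota) -> ResourceState:
    """Subtract the task's resource limit from the last reported state.

    The process count is not modified, matching the worked example.
    """
    return replace(
        reported,
        remaining_cpu_cores=reported.remaining_cpu_cores - quota.cpu_cores,
        remaining_memory_gb=reported.remaining_memory_gb - quota.memory_gb,
    )


# Slave node A at 15:00, then task P (2 cores, 2G) assigned at 15:03.
reported_a = ResourceState(running_processes=10, remaining_cpu_cores=30, remaining_memory_gb=20)
computed_a = state_after_assignment(reported_a, Quota(cpu_cores=2, memory_gb=2))
print(computed_a)  # 10 processes, 28 cores, 18G remaining
```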

Further, the invention decides whether to use the information reported periodically by the slave node or the information calculated in real time by the master node according to the order of the time T1 at which the slave node last reported its information and the time T2 at which the master node last allocated a task to that slave node. The general principle is as follows: if T1 is earlier than T2, the resource state information is taken from the real-time calculation of the master node; conversely, if T1 is later than T2, the resource state information is taken from the timed report of the slave node.

In the example above, for slave node A, the time 15:03 at which the master node last allocated a task is later than the time 15:00 at which the slave node last reported. The resource state information of slave node A is therefore taken from the real-time calculation of the master node: the number of currently running processes is 10, the remaining CPU resources are 28 cores, and the remaining memory resources are 18G.
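The choice between the reported state and the master-computed state can be sketched as a comparison of the two timestamps. This is a minimal sketch; `State` and `effective_state` are hypothetical names, and treating equal timestamps as "not earlier" (so the reported state wins) follows the wording of the summary rather than any stated implementation.

```python
from datetime import datetime
from typing import NamedTuple


class State(NamedTuple):
    """Resource state of one slave node (hypothetical structure)."""
    running_processes: int
    remaining_cpu_cores: float
    remaining_memory_gb: float


def effective_state(reported: State, computed: State,
                    t1_last_report: datetime,
                    t2_last_assignment: datetime) -> State:
    """Pick the state the master node should trust for scheduling."""
    if t1_last_report < t2_last_assignment:
        # A task was assigned after the last report, so the report is stale.
        return computed
    # The report is at least as recent as the last assignment.
    return reported


reported = State(10, 30, 20)   # reported by slave node A at 15:00
computed = State(10, 28, 18)   # computed by the master after assigning task P
t1 = datetime(2019, 11, 27, 15, 0)
t2 = datetime(2019, 11, 27, 15, 3)
print(effective_state(reported, computed, t1, t2))  # the computed state
```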

S3: Based on the resource consumption object, taking as the target slave node a slave node whose number of currently running processes conforms to a preset first rule and whose current remaining resource value conforms to a preset second rule.

This step allocates the task to be processed to a suitable target slave node according to the resource consumption object acquired in step S1. The specific allocation strategy is to select, as the target slave node, a slave node whose number of currently running processes meets the preset first rule and whose current remaining resource value meets the preset second rule.

The first rule of the invention is that the slave nodes with the number of the current running processes not larger than the total number of CPU cores are taken as the first candidate slave nodes. The rule is to prevent the problem that the running speed of the task to be processed is slow due to the fact that the number of processes running in the slave nodes is too large.
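The first rule amounts to a filter over the slave nodes. The sketch below assumes each node's total CPU core count is known to the master node; `SlaveNode` and `first_candidates` are hypothetical names. It uses "not larger than" as worded in this paragraph, whereas the summary of the invention states the condition as strictly smaller.

```python
from typing import List, NamedTuple, Sequence


class SlaveNode(NamedTuple):
    """Minimal view of a slave node for the first rule (hypothetical)."""
    name: str
    running_processes: int
    total_cpu_cores: int


def first_candidates(nodes: Sequence[SlaveNode]) -> List[SlaveNode]:
    """Keep nodes whose process count does not exceed their CPU core count."""
    return [n for n in nodes if n.running_processes <= n.total_cpu_cores]


nodes = [
    SlaveNode("a", running_processes=10, total_cpu_cores=32),
    SlaveNode("b", running_processes=32, total_cpu_cores=32),
    SlaveNode("c", running_processes=40, total_cpu_cores=32),
]
print([n.name for n in first_candidates(nodes)])  # ['a', 'b']
```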

According to the second rule, slave nodes whose resource state information is larger than the resource quota of the task to be processed are selected from the first candidate slave nodes as second candidate slave nodes. The resource quota of the task to be processed is the maximum number of CPU cores and the maximum amount of memory, set through the conventional cgroup mechanism. The purpose of setting a resource quota for the task to be processed is to prevent slow operation, or even failure to run, caused by the task occupying excessive CPU or memory resources of the slave node while running. Therefore, the remaining CPU resources and the remaining memory resources of a second candidate slave node must be greater than the maximum number of CPU cores and the maximum amount of memory of the task to be processed, respectively; that is, the resource state information of the slave node must be greater than the resource quota of the task to be processed.
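The quota check of the second rule can be sketched as a second filter applied to the first candidates. `NodeResources` and `second_candidates` are hypothetical names; the strict "greater than" comparison follows the text, and the actual cgroup configuration is outside the scope of this sketch.

```python
from typing import List, NamedTuple, Sequence


class NodeResources(NamedTuple):
    """Remaining resources of a first-candidate slave node (hypothetical)."""
    name: str
    remaining_cpu_cores: float
    remaining_memory_gb: float


def second_candidates(nodes: Sequence[NodeResources],
                      quota_cpu_cores: float,
                      quota_memory_gb: float) -> List[NodeResources]:
    """Keep nodes whose remaining CPU and memory both exceed the task quota."""
    return [
        n for n in nodes
        if n.remaining_cpu_cores > quota_cpu_cores
        and n.remaining_memory_gb > quota_memory_gb
    ]


first = [
    NodeResources("a", remaining_cpu_cores=28, remaining_memory_gb=18),
    NodeResources("b", remaining_cpu_cores=2, remaining_memory_gb=64),   # CPU equals quota
    NodeResources("c", remaining_cpu_cores=16, remaining_memory_gb=1),   # too little memory
]
print([n.name for n in second_candidates(first, 2, 2)])  # ['a']
```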

It should be noted that the resource quota of the task to be processed can be dynamically modified according to actual operating conditions. For example, the quota originally set for a certain task to be processed is a maximum of 2 CPU cores and a maximum of 2G of memory. However, statistics on its actual operation over the last thirty days show that the task never occupies more than 1 CPU core or more than 1G of memory while running. In this case, the resource quota of the task can be dynamically adjusted to a maximum of 1 CPU core and a maximum of 1G of memory, so that unnecessary resource waste is avoided and the task to be processed is scheduled and distributed reasonably.
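One possible way to express this adjustment is to lower a quota to the peak usage observed over the statistics window. The text does not give a formula, so the rule below (take the observed peak, never raise the quota) is an assumption, and `adjusted_quota` is a hypothetical name.

```python
from typing import Sequence


def adjusted_quota(current_quota: float, observed_peaks: Sequence[float]) -> float:
    """Lower a quota to the highest usage seen in the window.

    Assumption: the quota is only ever tightened, and it is left unchanged
    when there are no observations.
    """
    if not observed_peaks:
        return current_quota
    return min(current_quota, max(observed_peaks))


# Thirty-day peaks for the example task: never above 1 core or 1G.
cpu_quota = adjusted_quota(2.0, [0.4, 1.0, 0.7])
memory_quota = adjusted_quota(2.0, [0.6, 0.9, 1.0])
print(cpu_quota, memory_quota)  # 1.0 1.0
```

A production scheduler would likely keep some headroom above the observed peak; the sketch omits this to stay close to the example in the text.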

Further, the second rule of the present invention also includes the following: if the resource consumption object of the task to be processed is CPU, the slave node with the largest remaining CPU resources is selected from the second candidate slave nodes as the target slave node; and if the resource consumption object of the task to be processed is memory, the slave node with the largest remaining memory resources is selected from the second candidate slave nodes as the target slave node. Through this allocation rule, the invention relates the resource consumption object of the task to be processed to the resource state information of the slave nodes and assigns each task to the target slave node that suits it best, thereby establishing a more flexible and reasonable task allocation mechanism.
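The final selection among the second candidates can be sketched as a maximum over the resource named by the resource consumption object. `Candidate` and `pick_target` are hypothetical names; returning `None` when no candidate remains and keeping the first node on a tie are assumptions, since the text does not cover those cases.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    """A second-candidate slave node (hypothetical structure)."""
    name: str
    remaining_cpu_cores: float
    remaining_memory_gb: float


def pick_target(candidates: Sequence[Candidate],
                consumption_object: str) -> Optional[Candidate]:
    """Choose the candidate with the most of the resource the task needs most."""
    if not candidates:
        return None  # assumption: the caller decides how to queue or retry
    if consumption_object == "cpu":
        return max(candidates, key=lambda n: n.remaining_cpu_cores)
    if consumption_object == "memory":
        return max(candidates, key=lambda n: n.remaining_memory_gb)
    raise ValueError(f"unknown resource consumption object: {consumption_object!r}")


candidates = [
    Candidate("a", remaining_cpu_cores=28, remaining_memory_gb=18),
    Candidate("b", remaining_cpu_cores=12, remaining_memory_gb=64),
]
print(pick_target(candidates, "cpu").name)     # a
print(pick_target(candidates, "memory").name)  # b
```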

S4: Distributing the task to be executed to the target slave node for processing.

Once the target slave node has been determined, this step assigns the task to be processed to the target slave node, which executes it.

While the target slave node runs the task to be processed, it still reports its resource state information to the master node at the preset time interval and waits for the master node to distribute tasks again.

Referring to FIG. 2, a distributed task scheduling device is shown. In this embodiment, the task scheduling device 10 may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to implement the task scheduling method described above. The program modules referred to herein are series of computer program instruction segments that perform particular functions and are more suitable than the program itself for describing the execution of the task scheduling device 10 on a storage medium. The functions of the program modules of the present embodiment are described below:

the task receiving module 11 is adapted for the master node to receive a task to be executed and to obtain a resource consumption object corresponding to the task to be executed, where the resource consumption object represents the resource that is consumed the most when the task to be executed runs;

a state obtaining module 12, adapted to obtain resource state information of a plurality of slave nodes, where the resource state information includes a number of processes currently running by the slave nodes and a current remaining resource value of the slave nodes;

a target determining module 13, adapted to, based on the resource consumption object, take as the target slave node a slave node whose number of currently running processes conforms to a preset first rule and whose current remaining resource value conforms to a preset second rule;

and the task distribution module 14 is adapted to distribute the task to be executed to the target slave node for processing.

Wherein, the task receiving module 11 includes:

the task type unit 111 is adapted for the master node to receive a task to be executed and to acquire the task type included in the task to be executed;

and the resource consumption object unit 112 is adapted to obtain the resource consumption object matched with the task type according to a preset matching relationship.

The state obtaining module 12 is configured to obtain remaining resource information of a plurality of slave nodes from a resource state table, where the remaining resource information in the resource state table is reported to the master node by the slave nodes, or is calculated by the master node by subtracting task allocation information of the last time from the remaining resource information reported last time by the slave nodes, and the state obtaining module 12 specifically includes:

a first time unit 121, configured to obtain a first time at which the remaining resource information reported by the slave node is obtained last time;

a second time unit 122, adapted to obtain a second time of a last task assignment to the slave node;

a first resource information unit 123, adapted to, if the first time is earlier than the second time, use the current remaining first resource value reported by the slave node minus the first resource quota as the first remaining resource information in the resource state table, and use the current remaining second resource value reported by the slave node minus the second resource quota as the second remaining resource information in the resource state table;

a second resource information unit 124, configured to, if the first time is not earlier than the second time, use the remaining resource information reported by the slave node as the remaining resource information in the resource status table.

Wherein the target determination module 13 comprises:

a first determining unit 131, adapted to take the slave nodes whose number of currently running processes is less than the total number of cores as candidate slave nodes;

a second determining unit 132, adapted to use a slave node whose currently remaining first resource value is greater than a first resource limit and whose currently remaining second resource value is greater than a second resource limit as a candidate slave node, where the first resource limit and the second resource limit are preset by the master node;

a target determining unit 133, adapted to, when the resource consumption object is a first resource, take a candidate slave node with a largest numerical value of the currently remaining first resource as a target slave node; and when the resource consumption object is a second resource, taking the candidate slave node with the maximum value of the currently remaining second resource as a target slave node.

The embodiment also provides a computer device capable of executing programs, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server or a tower server (including an independent server or a server cluster composed of a plurality of servers). The computer device 20 of the present embodiment includes at least, but is not limited to, a memory 21 and a processor 22, which may be communicatively coupled to each other via a system bus, as shown in FIG. 3. It is noted that FIG. 3 only shows the computer device 20 with components 21-22, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.

In the present embodiment, the memory 21 (i.e., a readable storage medium) includes a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 21 may be an internal storage unit of the computer device 20, such as a hard disk or a memory of the computer device 20. In other embodiments, the memory 21 may also be an external storage device of the computer device 20, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, provided on the computer device 20. Of course, the memory 21 may also include both internal and external storage devices of the computer device 20. In this embodiment, the memory 21 is generally used for storing an operating system and various application software installed in the computer device 20, such as the program code of the distributed task scheduling device 10 of the first embodiment. Further, the memory 21 may also be used to temporarily store various types of data that have been output or are to be output.

Processor 22 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 22 is typically used to control the overall operation of the computer device 20. In this embodiment, the processor 22 is configured to run the program code stored in the memory 21 or process data, for example, run the distributed task scheduling apparatus 10, so as to implement the distributed task scheduling method according to the first embodiment.

The present embodiment also provides a computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an App application mall, etc., on which a computer program is stored, which when executed by a processor implements corresponding functions. The computer-readable storage medium of this embodiment is used to store the distributed task scheduling apparatus 10, which, when executed by a processor, implements the distributed task scheduling method of the first embodiment.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.

It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing the related hardware. The program may be stored in a computer-readable medium and, when executed, performs one or a combination of the steps of the method embodiments.

In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," "some examples," or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner.

The above description is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention. All equivalent structures or equivalent process modifications made using the contents of the present specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, are likewise included in the scope of the present invention.

Claims

1. A distributed task scheduling method is characterized by comprising the following steps:

2. The distributed task scheduling method according to claim 1, wherein the resource consumption object includes a first resource and a second resource, and the step of taking the slave node, in which the number of currently running processes meets a preset first rule and the value of the currently remaining resource meets a preset second rule, as the target slave node based on the resource consumption object includes:

3. The distributed task scheduling method of claim 2, wherein the step of using the slave nodes with the number of currently running processes smaller than the total number of cores as candidate slave nodes further comprises:

4. The distributed task scheduling method according to claim 3, wherein after the step of allocating the task to be executed to the target slave node for processing, the method further comprises:

5. The distributed task scheduling method according to claim 1, wherein the step of obtaining the remaining resource information of the plurality of slave nodes comprises:

6. The distributed task scheduling method of claim 5, wherein the step of obtaining the remaining resource information of the plurality of slave nodes from the resource status table comprises:

7. The distributed task scheduling method according to claim 1, wherein the master node receives a task to be executed, and the step of obtaining a resource consumption object corresponding to the task to be executed comprises:

8. The distributed task scheduling method of claim 1, wherein the resource consumption objects include CPU resources and memory resources.

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 8 are implemented by the processor when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.