CN113032116B - Training method of task time prediction model, task scheduling method and related devices - Google Patents
- Publication number: CN113032116B (application number CN202110247231.2A)
- Authority: CN (China)
- Prior art keywords: task, prediction model, operated, time, time prediction
- Legal status: Active (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The application discloses a training method for a task time prediction model, a task scheduling method, and related devices. The training method comprises: converting task information of a historical task into a grayscale image, where the task information comprises the task's run command, the resources required by the task, and the task's code; inputting the grayscale image into the task time prediction model and outputting a predicted running time corresponding to the historical task; and adjusting network parameters of the task time prediction model based on the difference between the predicted running time and the actual running time corresponding to the historical task. This scheme can improve resource utilization.
Description
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a training method of a task time prediction model, a task scheduling method, and related devices.
Background
Many companies' businesses rely on a large number of deep learning models, which typically require training on high-performance devices such as GPUs. For this reason, many companies that use deep learning algorithms build dedicated clusters. The resources of each machine in such a cluster are fixed, but different training tasks require different resources and different running times.
Training tasks that need only a few GPUs are typically run on a single machine rather than distributed, because this is fastest. Suppose every machine in the cluster has 8 GPUs and the tasks running on each machine occupy 7 of them. If a single-machine task requiring 2 GPUs must be scheduled, the cluster's total resources are sufficient, yet no machine has 2 idle GPUs, so the new task cannot be scheduled. This is the resource-fragmentation problem, and it leads to low resource utilization.
Disclosure of Invention
The main technical problem to be solved by the present application is to provide a training method for a task time prediction model, a task scheduling method, and related devices that can improve resource utilization.
To solve the above problem, a first aspect of the present application provides a training method for a task time prediction model, the training method comprising: converting task information of a historical task into a grayscale image, where the task information comprises the task's run command, the resources required by the task, and the task's code; inputting the grayscale image into the task time prediction model and outputting a predicted running time corresponding to the historical task; and adjusting network parameters of the task time prediction model based on the difference between the predicted running time and the actual running time corresponding to the historical task.
To solve the above problem, a second aspect of the present application provides a task scheduling method, comprising: predicting a task to be run with a task time prediction model to obtain the task's predicted running time; selecting, from all machine nodes that satisfy the resources required by the task to be run, a machine node running a target task as the target node, where the difference between the end time point of the target task and the predicted completion time point of the task to be run is the smallest; and scheduling the task to be run onto the target node. The task time prediction model is trained with the training method of the first aspect.
To solve the above problem, a third aspect of the present application provides a task scheduling system, comprising: a plurality of machine nodes for running tasks with system resources; and a task scheduler for predicting a task to be run with a task time prediction model to obtain the task's predicted running time, selecting, from all machine nodes that satisfy the resources required by the task to be run, a machine node running a target task as the target node, and scheduling the task onto the target node, where the difference between the end time point of the target task and the predicted completion time point of the task to be run is the smallest. The task time prediction model is trained with the training method of the first aspect.
To solve the above problem, a fourth aspect of the present application provides a training device for a task time prediction model, comprising: an information processing module for converting task information of a historical task into a grayscale image, the task information comprising the task's run command, the resources required by the task, and the task's code; a first prediction module for inputting the grayscale image into the task time prediction model and outputting a predicted running time corresponding to the historical task; and a model optimization module for adjusting network parameters of the task time prediction model based on the difference between the predicted running time and the actual running time corresponding to the historical task.
To solve the above problem, a fifth aspect of the present application provides a task scheduling device, comprising: a second prediction module for predicting a task to be run with a task time prediction model to obtain the task's predicted running time; a node selection module for selecting, from all machine nodes that satisfy the resources required by the task to be run, a machine node running a target task as the target node, where the difference between the end time point of the target task and the predicted completion time point of the task to be run is the smallest; and a task scheduling module for scheduling the task to be run onto the target node. The task time prediction model is trained with the training method of the first aspect.
To solve the above problem, a sixth aspect of the present application provides an electronic device comprising a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement the training method of the first aspect or the task scheduling method of the second aspect.
To solve the above problem, a seventh aspect of the present application provides a computer-readable storage medium storing program instructions that, when executed by a processor, implement the training method of the first aspect or the task scheduling method of the second aspect.
The beneficial effects of the invention are as follows. In the training method, task information of a historical task, comprising the task's run command, the resources it requires, and its code, is converted into a grayscale image; the image is input into the task time prediction model, which outputs a predicted running time for the task; the model's network parameters are then adjusted based on the difference between the predicted and actual running times. A task time prediction model can thus be trained, providing technical support for improving resource utilization. Using the trained model, the predicted running time of a task to be run can be obtained, so the task can be scheduled onto a machine node where a target task's remaining time is close to the new task's expected duration. The two tasks then finish at nearly the same time, and the node frees its resources for subsequent scheduling as simultaneously as possible. This mitigates resource fragmentation, makes full use of cluster resources, and improves resource utilization.
Drawings
FIG. 1 is a flow chart of one embodiment of a training method of a task time prediction model of the present application;
FIG. 2 is a flowchart illustrating an embodiment of step S11 in FIG. 1;
FIG. 3 is a flowchart illustrating an embodiment of step S13 in FIG. 1;
FIG. 4 is a flow chart of an embodiment of a task scheduling method of the present application;
FIG. 5 is a flowchart illustrating an embodiment of step S42 in FIG. 4;
FIG. 6 is a schematic diagram of a framework of one embodiment of a task scheduling system of the present application;
FIG. 7 is a schematic block diagram of one embodiment of a training apparatus of the task time prediction model of the present application;
FIG. 8 is a schematic diagram of a framework of one embodiment of a task scheduler of the present application;
FIG. 9 is a schematic diagram of a frame of an embodiment of an electronic device of the present application;
FIG. 10 is a schematic diagram of a framework of one embodiment of the computer-readable storage medium of the present application.
Detailed Description
The following describes the embodiments of the present application in detail with reference to the drawings.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" merely describes an association between objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it. Further, "a plurality" herein means two or more.
Referring to fig. 1, fig. 1 is a flowchart illustrating an embodiment of a training method of a task time prediction model according to the present application. Specifically, the training method of the task time prediction model in this embodiment may include the following steps:
step S11: task information of the historical task is converted into a gray scale image. The task information comprises a task running instruction, resources required by the task and a code for running the task.
Artificial intelligence simulates human processes of perception and thinking. However far the field develops, its essence is unchanged: the data to be learned is organized into training samples, an algorithm is written to learn from them, and parameters or algorithms are adjusted until the results meet the design target. Training samples are therefore the foundation of artificial intelligence; texts, sounds, pictures, and similar content can all serve as sample data. For a computer to learn from them, words, sounds, and pictures must be digitized into numeric matrices that an algorithm can process. It follows that, to predict a task's running time with the task time prediction model, the task's information must first be processed into an input the model can read; that is, the task information of a historical task must be converted into a grayscale image.
In the present application, the task information may include the task's run command, the resources required by the task, and the task's code; it may also include other parameters, such as the task name, the task framework, the duration of the task input, and the number of task iterations. The resources required by a task include at least one of GPU, CPU, memory, and disk.
Referring to fig. 2, fig. 2 is a flowchart illustrating an embodiment of step S11 in fig. 1. In an embodiment, the step S11 may specifically include:
step S111: and combining the task information into a binary information text.
Specifically, the task information may include various text formats, such as exe, apk, doc and txt, and the like, and the text needs to be converted into a binary file that can be recognized by a computer; for example, english words are composed of 26 letters, even if some punctuations and special symbols are added, fewer numbers can be used for representing, so that each plurality of bytes can be set to represent one letter or symbol in a computer algorithm, and English text can be converted into a binary file which can be recognized by a computer. Similarly, other plain text written in natural language can be converted into a binary file recognizable by the computer, so that all the contents contained in the task information can be combined into a binary information text.
Step S112: taking each 8 bits of the binary information text as an image value in sequence, and arranging all the image values according to a square array to form an initial image. Wherein, the range of the image value is 0-255, and the blank part in the initial image is filled with zero.
After task information is combined into a binary information text, each 8 bits of the binary information text are sequentially used as an image value, namely, for the combined binary information text, the read binary stream is an 8-bit non-negative integer vector, the value range represented by 8 bits is 0-255, the value range represented by 0-255 corresponds to the pixel value of a gray level image, then each 8 bits of the binary stream is mapped into a pixel point, the pixel value of the pixel point is the image value, then the pixel values of the pixel points are adjusted into a two-dimensional matrix according to the size of the combined binary information text, the two-dimensional matrix is square, a picture is obtained, and the picture is an initial image. It will be appreciated that the number of pixels is not well enough to form a square initial image, and therefore the blank portion of the initial image needs to be filled with zeros.
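The byte-to-pixel mapping of steps S111-S112 can be sketched in a few lines of Python. This is an illustrative sketch, not code from the patent; the function name, the UTF-8 encoding choice, and the sample run command are assumptions.

```python
import math

def task_info_to_square_image(task_info: str) -> list:
    """Steps S111-S112 sketch: encode the task information as bytes (one
    8-bit image value per byte, range 0-255), then arrange the values into
    the smallest square matrix that fits, zero-filling the blank tail."""
    values = list(task_info.encode("utf-8"))      # one pixel value per byte
    side = math.ceil(math.sqrt(len(values)))      # side of the square array
    values += [0] * (side * side - len(values))   # zero-fill the blank part
    return [values[i * side:(i + 1) * side] for i in range(side)]

# A hypothetical run command; real task information would also include
# the required resources and the task's code.
img = task_info_to_square_image("python train.py --gpus 2")
```

Here the 24-byte sample string yields a 5 x 5 matrix with one zero-filled cell at the tail.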
Step S113: and adjusting the initial image to 255 x 255 by using a bilinear interpolation algorithm to obtain the gray image.
It will be appreciated that the size of the binary information text combined by the task information may be different for different tasks, resulting in different sizes of the initial images formed, while the sizes of the images corresponding to different tasks should be the same as the input of the task time prediction model; therefore, a standard image size may be set, for example, the standard image size may be set to 255×255, after the initial image is formed, the size of the initial image may be compared with the standard image size, and if the size of the initial image is different from the standard image size, the initial image needs to be scaled to obtain a gray image, so that the scaled gray image can reach the standard image size. Specifically, when the scaling processing is performed on the initial image, a bilinear difference algorithm may be used to calculate a pixel value of each pixel point in the scaled gray image, and the scaled gray image is used as an input of the task time prediction model.
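The bilinear scaling step can likewise be sketched in pure Python (in practice a library resize routine would be used). The function name and the corner-aligned sampling convention are assumptions; only the use of bilinear interpolation comes from the text.

```python
def bilinear_resize(src: list, out: int = 255) -> list:
    """Resize a square grayscale matrix to out x out using bilinear
    interpolation (step S113). Corner-aligned sampling is an assumption."""
    n = len(src)
    scale = (n - 1) / (out - 1) if out > 1 else 0.0
    dst = [[0.0] * out for _ in range(out)]
    for i in range(out):
        for j in range(out):
            y, x = i * scale, j * scale           # source coordinates
            y0, x0 = int(y), int(x)
            y1, x1 = min(y0 + 1, n - 1), min(x0 + 1, n - 1)
            dy, dx = y - y0, x - x0
            dst[i][j] = (src[y0][x0] * (1 - dy) * (1 - dx)
                         + src[y0][x1] * (1 - dy) * dx
                         + src[y1][x0] * dy * (1 - dx)
                         + src[y1][x1] * dy * dx)
    return dst

# Upscale a 2x2 image to 3x3; the centre pixel averages the four corners.
small = bilinear_resize([[0, 255], [0, 255]], out=3)
```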
Step S12: input the grayscale image into the task time prediction model and output the predicted running time corresponding to the historical task.
The structure of the task time prediction model can be chosen freely; for example, a ResNet-50 backbone may be used, whose final fully connected layer outputs a single value: the predicted running time of the historical task. Residual networks mitigate the degradation problem in which accuracy saturates and then drops as a plain network deepens, so the prediction model can improve accuracy simply by stacking residual layers deeper.
Step S13: adjust network parameters of the task time prediction model based on the difference between the predicted running time and the actual running time corresponding to the historical task.
Because historical tasks serve as training samples, each historical task has a known actual running time. Predicting the historical task with the task time prediction model yields its predicted running time, and the two are expected to agree closely. The predicted and actual running times are therefore compared to obtain their difference, and the network parameters of the model are adjusted according to that difference, updating the task time prediction model.
Referring to fig. 3, fig. 3 is a flowchart illustrating an embodiment of step S13 in fig. 1. In an embodiment, the step S13 may specifically include:
step S131: and determining a square loss function of the task time prediction model based on the difference between the predicted running time and the actual running time corresponding to the historical task.
Step S132: and adjusting network parameters of the task time prediction model by utilizing a square loss function of the task time prediction model.
The method comprises the steps of comparing the predicted running time and the actual running time corresponding to the historical task to obtain the difference between the predicted running time and the actual running time, so that a loss function of a task time prediction model can be determined.
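The role of the square loss in steps S131-S132 can be illustrated on a toy one-parameter model standing in for the network; the real model's parameters would be updated by backpropagation, which this sketch only caricatures. All names and numbers here are illustrative.

```python
def squared_loss(predicted: float, actual: float) -> float:
    """Square loss between predicted and actual running time (step S131)."""
    return (predicted - actual) ** 2

def update_weight(w: float, x: float, actual: float, lr: float = 0.1) -> float:
    """One gradient-descent step on a toy model pred = w * x, standing in
    for the network-parameter adjustment of step S132."""
    pred = w * x
    grad = 2.0 * (pred - actual) * x   # derivative of the square loss w.r.t. w
    return w - lr * grad
```

A single update moves the weight so that the loss against the actual running time shrinks, which is the training loop's purpose.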
Referring to fig. 4, fig. 4 is a flowchart illustrating an embodiment of a task scheduling method according to the present application. The task scheduling method in this embodiment may include the following steps:
step S41: and predicting the task to be operated by using a task time prediction model to obtain the predicted operation time of the task to be operated. The task time prediction model is obtained by training any one of the task time prediction model training methods.
Taking an execution main body of the task scheduling method as a task manager for example, the task manager may receive an operation request of a task to be operated, that is, a user may submit the task to be operated to the task manager, for example, may submit a model training task or an application program. It can be understood that after the task manager obtains the task to be operated, the task to be operated may be analyzed, and task information of the task to be operated is extracted, where the task information of the task to be operated may include a task operation instruction, resources required by the task, and codes for the task to be operated. Then, task information of the task to be operated can be converted into a gray image, and then the gray image is input into a task time prediction model, so that the task time prediction model predicts the task to be operated, and the predicted operation time of the task to be operated is obtained.
Step S42: select, from all machine nodes that satisfy the resources required by the task to be run, a machine node running a target task as the target node, where the difference between the end time point of the target task and the predicted completion time point of the task to be run is the smallest.
Step S43: schedule the task to be run onto the target node.
When task scheduling is needed, the machine nodes satisfying the resources required by the task to be run are first filtered out; the states of the tasks running on those nodes are then analyzed together with the task to be run, and the machine node running the target task is selected as the target node. Specifically, the state of a running task includes its remaining running time; from the remaining running time of each running task and the predicted running time of the task to be run, the end time point of the running task and the predicted completion time point of the task to be run can be obtained. Among all running tasks, the one whose end time point differs least from the predicted completion time point is selected as the target task, and the machine node running it becomes the target node. After the task to be run is scheduled onto the target node, it finishes closest in time to the target task, so the node frees resources for subsequent scheduling almost simultaneously. This mitigates resource fragmentation, makes full use of cluster resources, and improves resource utilization.
Referring to fig. 5, fig. 5 is a flowchart illustrating an embodiment of step S42 in fig. 4. In an embodiment, the step S42 may specifically include:
step S421: when the operation requirement of the task to be operated is received, acquiring the resource requirement information of the task to be operated, and selecting machine nodes with idle resources meeting the resource requirement information from all machine nodes as candidate nodes.
Specifically, when receiving the operation requirement of the task to be operated, firstly acquiring the resource requirement information of the task to be operated, and then screening out the machine node of which the current idle resource meets the resource required by the task to be operated according to the resource requirement information of the task to be operated. The current idle resources refer to memory idle resources, CPU idle resources, GPU idle resources, disk idle resources and the like of the machine node. For example, the resource required by the task to be run is a GPU resource 2 core; when the GPU resource of the machine node A is not used, the GPU resource is a core 8, and the currently occupied GPU resource is a core 7, so that the currently idle resource of the machine node A is a core 1; when the GPU resource of the machine node B is not used, the GPU resource is a 6-core GPU resource, and the currently occupied GPU resource is a 4-core GPU resource, so that the currently idle resource of the machine node B is a 2-core GPU resource; it can be found that the current free resources of the machine node a do not meet the resources required by the task to be run, and the current free resources of the machine node B meet the resources required by the task to be run, so the machine node B can be used as a candidate node.
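The filtering in step S421 can be sketched as follows, reusing the GPU figures from the example above; the dictionary layout and function name are assumptions for illustration.

```python
def candidate_nodes(nodes: dict, required_gpus: int) -> list:
    """Step S421 sketch: keep only the machine nodes whose idle GPUs
    cover the GPUs required by the task to be run."""
    return [name for name, node in nodes.items()
            if node["total_gpus"] - node["used_gpus"] >= required_gpus]

# The figures from the text: node A has 1 idle GPU, node B has 2.
nodes = {"A": {"total_gpus": 8, "used_gpus": 7},
         "B": {"total_gpus": 6, "used_gpus": 4}}
```

With a 2-GPU request, only node B survives the filter; a 1-GPU request would keep both nodes.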
Step S422: compute, for each task running on any candidate node, the difference between that task's end time point and the predicted completion time point of the task to be run, and select the candidate node hosting the task with the smallest difference as the target node.
Specifically, from the remaining running time of each task running on a candidate node and the predicted running time of the task to be run, the running task's end time point and the new task's predicted completion time point can be obtained. For example, if the end time point of a running task C is close to the predicted completion time point of the task to be run, scheduling the new task onto the node hosting task C means the two finish at nearly the same time, so the node frees resources for subsequent scheduling as simultaneously as possible, mitigating resource fragmentation. Among all running tasks, the one whose end time point differs least from the predicted completion time point is selected as the target task, and the node running it becomes the target node. After scheduling, the task to be run and the target task finish closest in time, and the target node frees resources almost simultaneously. This resolves resource fragmentation, greatly improves the overall utilization of the cluster, markedly reduces the time users wait for submitted tasks to start, and makes full use of cluster resources.
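The selection in step S422 reduces to a minimum-gap search over running tasks, sketched below; the node names, time units, and flat end-time lists are illustrative assumptions.

```python
def pick_target_node(candidates: dict, predicted_runtime: float,
                     now: float = 0.0):
    """Step S422 sketch: among candidate nodes, choose the node hosting the
    running task whose end time point is closest to the predicted completion
    time point (now + predicted running time) of the task to be run."""
    finish = now + predicted_runtime
    best_node, best_gap = None, float("inf")
    for node, end_times in candidates.items():
        for end in end_times:                  # end time of each running task
            gap = abs(end - finish)
            if gap < best_gap:
                best_node, best_gap = node, gap
    return best_node

# Node B hosts tasks ending at t=30 and t=95; node C hosts one ending at t=60.
candidates = {"B": [30.0, 95.0], "C": [60.0]}
```

A task predicted to finish at t=50 is placed on node C (gap 10), while one predicted to finish at t=90 goes to node B (gap 5).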
In an embodiment, the target task is the task with the latest end time point on the target node. The required running time of the task to be run is thus predicted from its task information, and the task is then scheduled onto the machine node whose latest-finishing task ends closest to the new task's predicted completion time, so that cluster resources are fully used and resource fragmentation is mitigated.
In addition, a task to be run is scheduled onto an idle machine node only when no non-idle machine node in the cluster satisfies the conditions for being the target node. A non-idle machine node is one currently running tasks; an idle machine node is one running none.
The execution body of the task scheduling method of the present application may be hardware or software. As hardware, it may be any of various electronic devices, including but not limited to a smartphone, a tablet computer, an e-book reader, or an in-vehicle terminal. As software, it may be installed in the electronic devices listed above and implemented as multiple pieces of software or software modules for providing distributed tasks, or as a single piece of software or software module. No particular limitation is imposed here.
According to the training method of the task time prediction model, task information of a historical task is converted into a grayscale image, the image is input into the task time prediction model, and the predicted running time corresponding to the historical task is output; the model's network parameters can then be adjusted based on the difference between the predicted and actual running times. A task time prediction model for predicting task running times can thus be trained, providing technical support for improving resource utilization. Using the model, the predicted running time of a task to be run can be obtained, so the task can be scheduled onto a machine node where the target task's remaining time is close to the new task's expected duration. The two tasks then finish at nearly the same time, and the node frees resources for subsequent scheduling as simultaneously as possible, mitigating resource fragmentation, making full use of cluster resources, and improving resource utilization.
Referring to fig. 6, fig. 6 is a schematic diagram illustrating a task scheduling system according to an embodiment of the present application. The task scheduling system 60 includes: a plurality of machine nodes 601, wherein the machine nodes 601 are used for running tasks by using system resources; a task scheduler 602, where the task scheduler 602 is configured to predict a task to be executed by using a task time prediction model, to obtain a predicted running time of the task to be executed; selecting a machine node 601 running a target task from all machine nodes 601 meeting the resources required by the task to be run as a target node; scheduling the task to be operated to the target node; wherein the difference between the ending time point of the target task and the predicted completion time point of the task to be operated is the smallest; the task time prediction model is obtained by training any one of the training methods of the task time prediction model.
In the above scheme, the task scheduler 602 obtains the predicted running time of the task to be run by using the task time prediction model, so that the task to be run can be scheduled to a machine node 601 on which the remaining time of the target task is close to the time the task to be run will consume. The completion time of the task to be run is then close to the completion time of the target task, and the machine node 601 can release the resources of both tasks at approximately the same time for scheduling subsequent tasks, which alleviates resource fragmentation, makes full use of cluster resources, and improves resource utilization.
Referring to fig. 7, fig. 7 is a schematic diagram of a training apparatus for a task time prediction model according to an embodiment of the present application. The training apparatus 70 for the task time prediction model includes: an information processing module 700, where the information processing module 700 is used for converting task information of a historical task into a grayscale image, the task information comprising a task running instruction, the resources required by the task, and the code to be run by the task; a first prediction module 702, where the first prediction module 702 is configured to input the grayscale image into the task time prediction model and output a predicted running time corresponding to the historical task; and a model optimization module 704, where the model optimization module 704 is used for adjusting network parameters of the task time prediction model based on the difference between the predicted running time and the actual running time corresponding to the historical task.
In some embodiments, the information processing module 700 converts the task information of a historical task into a grayscale image by: combining the task information into a binary information text; taking every 8 bits of the binary information text in sequence as one image value, and arranging all the image values into a square array to form an initial image, where each image value is in the range 0-255 and the blank portion of the initial image is zero-filled; and resizing the initial image to 255 × 255 with a bilinear interpolation algorithm to obtain the grayscale image.
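The conversion steps above can be sketched in plain Python. This is a minimal sketch under assumptions the text does not fix: the combined task information is encoded as UTF-8 bytes, and a hand-rolled bilinear resize stands in for whatever interpolation routine an implementation would actually use; all function names are illustrative.

```python
import math

import numpy as np


def bilinear_resize(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Plain-NumPy bilinear interpolation (stand-in for a library resize)."""
    in_h, in_w = img.shape
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # fractional row weights
    wx = (xs - x0)[None, :]  # fractional column weights
    img = img.astype(float)
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy


def task_info_to_grayscale(task_info: str) -> np.ndarray:
    """Convert combined task information text into a 255x255 grayscale image."""
    # Combine the task information into a binary information text.
    bits = "".join(f"{b:08b}" for b in task_info.encode("utf-8"))
    # Take every 8 bits in sequence as one image value in the range 0-255.
    values = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]
    # Arrange all image values into a square array, zero-filling the blank tail.
    side = math.ceil(math.sqrt(len(values)))
    initial = np.zeros(side * side, dtype=np.uint8)
    initial[:len(values)] = values
    initial = initial.reshape(side, side)
    # Resize the initial image to 255x255 with bilinear interpolation.
    resized = bilinear_resize(initial, 255, 255)
    return np.clip(np.round(resized), 0, 255).astype(np.uint8)
```

Any text-serializable task description (run command, resource demand, task code) passes through the same pipeline, so the prediction network always receives a fixed-size 255 × 255 input.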
In some embodiments, the model optimization module 704 adjusts the network parameters of the task time prediction model based on the difference between the predicted running time and the actual running time corresponding to the historical task by: determining a squared loss function of the task time prediction model based on that difference; and adjusting the network parameters of the task time prediction model using the squared loss function.
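As a minimal, hedged sketch of this optimization step: the squared loss L = (t_pred − t_actual)² is minimized by gradient descent. A toy linear predictor stands in here for the patent's (unspecified) network, and all names are illustrative.

```python
import numpy as np


def squared_loss(pred: np.ndarray, actual: np.ndarray) -> float:
    # L = mean((t_pred - t_actual)^2) over a batch of historical tasks
    return float(np.mean((pred - actual) ** 2))


def sgd_step(w, b, x, t_actual, lr=0.01):
    """One gradient-descent update of the predictor's parameters."""
    t_pred = x @ w + b
    err = t_pred - t_actual                   # dL/dpred (up to a factor 2/N)
    grad_w = 2 * x.T @ err / len(t_actual)    # dL/dw
    grad_b = 2 * float(err.mean())            # dL/db
    return w - lr * grad_w, b - lr * grad_b
```

A real implementation would backpropagate the same squared loss through the network that consumes the 255 × 255 grayscale image, but the update rule per parameter is the same.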
Referring to fig. 8, fig. 8 is a schematic diagram of a task scheduling device according to an embodiment of the present application. The task scheduling device 80 includes: a second prediction module 800, where the second prediction module 800 is configured to predict a task to be run by using a task time prediction model to obtain a predicted running time of the task to be run; a node selection module 802, where the node selection module 802 is configured to select, from all machine nodes satisfying the resources required by the task to be run, a machine node running a target task as a target node, where the difference between the end time point of the target task and the predicted completion time point of the task to be run is the smallest; and a task scheduling module 804, where the task scheduling module 804 is configured to schedule the task to be run to the target node. The task time prediction model is trained by any one of the training methods of the task time prediction model described above.
In some embodiments, the node selection module 802 selects, from all machine nodes satisfying the resources required by the task to be run, a machine node running a target task as the target node by: when a run request for the task to be run is received, acquiring the resource requirement information of the task to be run, and selecting, from all machine nodes, those machine nodes whose idle resources satisfy the resource requirement information as candidate nodes; and calculating, for each task running on the candidate nodes, the difference between its end time point and the predicted completion time point of the task to be run, and selecting the candidate node running the task with the smallest difference as the target node.
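The two selection steps above can be sketched as a best-fit search. The `Node` fields and the resource-dictionary shape below are illustrative assumptions rather than the patent's data structures:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_resources: dict                                  # e.g. {"cpu": 8, "gpu": 1}
    task_end_times: list = field(default_factory=list)    # end time points of running tasks


def pick_target_node(nodes, required, predicted_runtime, now=0.0):
    """Among nodes whose idle resources satisfy the demand, choose the node
    running the task whose end time point is closest to the predicted
    completion time point of the task to be run."""
    finish = now + predicted_runtime
    # Step 1: keep only candidate nodes with enough idle resources.
    candidates = [n for n in nodes
                  if all(n.free_resources.get(k, 0) >= v for k, v in required.items())]
    # Step 2: minimize |task end time - predicted completion time| over all
    # tasks running on the candidate nodes.
    best, best_gap = None, float("inf")
    for node in candidates:
        for end in node.task_end_times:
            gap = abs(end - finish)
            if gap < best_gap:
                best, best_gap = node, gap
    return best
```

With a 30-unit predicted runtime, a node whose running task ends at time 31 is preferred over one ending at time 50, so both tasks free their resources almost simultaneously.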
Referring to fig. 9, fig. 9 is a schematic framework diagram of an embodiment of an electronic device of the present application. The electronic device 90 comprises a memory 91 and a processor 92 coupled to each other, the processor 92 being configured to execute program instructions stored in the memory 91 to implement the steps of any one of the embodiments of the training method of the task time prediction model or the steps of any one of the embodiments of the task scheduling method described above. In one specific implementation scenario, the electronic device 90 may include, but is not limited to, a microcomputer or a server.
Specifically, the processor 92 is configured to control itself and the memory 91 to implement the steps of any one of the embodiments of the training method of the task time prediction model or the steps of any one of the embodiments of the task scheduling method described above. The processor 92 may also be referred to as a CPU (Central Processing Unit). The processor 92 may be an integrated circuit chip with signal processing capabilities. The processor 92 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 92 may be jointly implemented by a plurality of integrated circuit chips.
Referring to fig. 10, fig. 10 is a schematic diagram illustrating an embodiment of a computer readable storage medium according to the present application. The computer readable storage medium 100 stores program instructions 1000 that can be executed by a processor, where the program instructions 1000 are configured to implement the steps of any one of the training method embodiments of the task time prediction model or the steps of any one of the task scheduling method embodiments described above.
In the several embodiments provided in the present application, it should be understood that the disclosed methods and apparatuses may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of modules or units is merely a logical functional division, and there may be other divisions in actual implementation; for example, units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections via some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiments.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Claims (10)
1. A task scheduling method, characterized in that the task scheduling method comprises:
predicting a task to be run by using a task time prediction model to obtain a predicted running time of the task to be run;
selecting, from all machine nodes satisfying the resources required by the task to be run, a machine node running a target task as a target node; wherein the difference between the end time point of the target task and the predicted completion time point of the task to be run is the smallest; and
scheduling the task to be run to the target node;
the training method of the task time prediction model comprises the following steps:
converting task information of the historical task into a gray image; the task information comprises a task operation instruction, resources required by a task and a code for task operation;
inputting the gray image into a task time prediction model, and outputting the predicted running time corresponding to the historical task;
and adjusting network parameters of the task time prediction model based on the difference between the predicted running time and the actual running time corresponding to the historical task.
2. The task scheduling method according to claim 1, wherein selecting, from all machine nodes satisfying the resources required by the task to be run, a machine node running a target task as a target node comprises:
when a run request for the task to be run is received, acquiring the resource requirement information of the task to be run, and selecting, from all machine nodes, those machine nodes whose idle resources satisfy the resource requirement information as candidate nodes; and
calculating, for each task running on the candidate nodes, the difference between its end time point and the predicted completion time point of the task to be run, and selecting the candidate node running the task with the smallest difference as the target node.
3. The task scheduling method according to claim 1, wherein the target task is the task with the latest end time point in the target node.
4. The task scheduling method of claim 1, wherein,
the converting task information of a historical task into a grayscale image comprises the following steps:
combining the task information into a binary information text;
taking every 8 bits of the binary information text in sequence as one image value, and arranging all the image values into a square array to form an initial image; wherein each image value is in the range 0-255, and the blank portion of the initial image is zero-filled; and
resizing the initial image to 255 × 255 with a bilinear interpolation algorithm to obtain the grayscale image.
5. The task scheduling method of claim 1, wherein,
the adjusting network parameters of the task time prediction model based on the difference between the predicted running time and the actual running time corresponding to the historical task comprises:
determining a squared loss function of the task time prediction model based on the difference between the predicted running time and the actual running time corresponding to the historical task; and
adjusting the network parameters of the task time prediction model using the squared loss function of the task time prediction model.
6. The task scheduling method of any one of claims 1, 4 or 5, wherein the resources comprise at least one of a GPU, a CPU, memory, and disk.
7. A task scheduling system, comprising:
a plurality of machine nodes, the machine nodes being used for running tasks with system resources; and
a task scheduler, the task scheduler being configured to: predict a task to be run by using a task time prediction model to obtain a predicted running time of the task to be run; select, from all machine nodes satisfying the resources required by the task to be run, a machine node running a target task as a target node; and schedule the task to be run to the target node; wherein the difference between the end time point of the target task and the predicted completion time point of the task to be run is the smallest; and the task time prediction model is trained by the training method of the task time prediction model according to any one of claims 1 and 4 to 6.
8. A task scheduling device, comprising:
a second prediction module, the second prediction module being configured to predict a task to be run by using a task time prediction model to obtain a predicted running time of the task to be run;
a node selection module, the node selection module being configured to select, from all machine nodes satisfying the resources required by the task to be run, a machine node running a target task as a target node; wherein the difference between the end time point of the target task and the predicted completion time point of the task to be run is the smallest; and
a task scheduling module, the task scheduling module being configured to schedule the task to be run to the target node;
wherein the task time prediction model is trained by the training method of the task time prediction model according to any one of claims 1 and 4 to 6.
9. An electronic device comprising a memory and a processor coupled to each other, the processor configured to execute program instructions stored in the memory to implement the task scheduling method of any one of claims 1 to 6.
10. A computer readable storage medium having stored thereon program instructions, which when executed by a processor implement the task scheduling method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110247231.2A CN113032116B (en) | 2021-03-05 | 2021-03-05 | Training method of task time prediction model, task scheduling method and related devices |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113032116A CN113032116A (en) | 2021-06-25 |
CN113032116B true CN113032116B (en) | 2024-03-05 |
Family
ID=76468478
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113687938B (en) * | 2021-10-27 | 2022-02-22 | 之江实验室 | Intelligent scheduling method and system for medical data calculation tasks |
CN114153588B (en) * | 2021-12-16 | 2024-08-06 | 江西电信信息产业有限公司 | Volte signaling data file acquisition method and system |
CN115081695B (en) * | 2022-06-08 | 2024-09-27 | 深圳市栋森工程项目管理有限公司 | Engineering budget method, system and terminal equipment based on BIM model |
CN114780225B (en) * | 2022-06-14 | 2022-09-23 | 支付宝(杭州)信息技术有限公司 | Distributed model training system, method and device |
CN117574148A (en) * | 2023-11-20 | 2024-02-20 | 国网冀北电力有限公司信息通信分公司 | Training methods, prediction methods and related equipment for intelligent prediction models |
CN118349350A (en) * | 2023-12-08 | 2024-07-16 | 荣耀终端有限公司 | Data processing method, electronic device and storage medium |
CN118672521B (en) * | 2024-08-26 | 2025-01-28 | 山东云海国创云计算装备产业创新中心有限公司 | A method, device, equipment and storage medium for dynamically configuring waiting time based on disk array accelerator card |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107609399A (en) * | 2017-09-09 | 2018-01-19 | 北京工业大学 | Malicious code mutation detection method based on NIN neural networks
CN109949825A (en) * | 2019-03-06 | 2019-06-28 | 河北工业大学 | Noise classification method based on FPGA-accelerated PCNN algorithm |
CN111475298A (en) * | 2020-04-03 | 2020-07-31 | 北京字节跳动网络技术有限公司 | Task processing method, device, device and storage medium |
CN112182577A (en) * | 2020-10-14 | 2021-01-05 | 哈尔滨工程大学 | Android malicious code detection method based on deep learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019144300A1 (en) * | 2018-01-23 | 2019-08-01 | 深圳市大疆创新科技有限公司 | Target detection method and apparatus, and movable platform |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||