
CN113010280B - Processing method, system, device, equipment and medium for distributed task - Google Patents


Info

Publication number
CN113010280B
Authority
CN
China
Prior art keywords
task
server
subtask
subtasks
processing
Prior art date
Legal status
Active
Application number
CN202110195236.5A
Other languages
Chinese (zh)
Other versions
CN113010280A (en)
Inventor
龙飞
Current Assignee
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN202110195236.5A
Publication of CN113010280A
Application granted
Publication of CN113010280B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/48: Program initiating; program switching, e.g. by interrupt
    • G06F 9/4806: Task transfer initiation or dispatching
    • G06F 9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061: Partitioning or combining of resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

The invention relates to a method, system, device, equipment and medium for processing distributed tasks. The method comprises the following steps: receiving a task to be processed and splitting it into a plurality of subtasks, wherein the plurality of subtasks comprise one pre-subtask and at least two post-subtasks, the at least two post-subtasks depending on the pre-subtask; scheduling the pre-subtask to a first server; and, upon determining that execution of the pre-subtask has finished, scheduling the pending post-subtasks to a second server, wherein the pending post-subtasks are some or all of the at least two post-subtasks, and the second server is any server in the cluster where the first server is located. The method can improve task-processing efficiency.

Description

Processing method, system, device, equipment and medium for distributed task
Technical Field
The disclosure relates to the technical field of distributed task processing, and in particular to a method, system, device, equipment and medium for processing distributed tasks.
Background
A distributed task processing system is a set of computers interconnected through a communication network and used to manage distributed system resources; supported by a distributed operating system, the interconnected computers work in coordination to complete tasks jointly. In a distributed environment, a distributed task to be processed is generally split into a plurality of subtasks, and each subtask is distributed to one of several computing nodes for processing.
In a distributed task processing system, multiple tasks may depend on the same resource, and each task performs the same processing on that jointly depended-on resource when it executes. For example, in a long-video transcoding scenario, multiple transcoding tasks depend on the same video source file, and every subtask must download the long-video source before it can run; if the provider of the shared resource has a performance bottleneck, task-processing efficiency decreases.
Disclosure of Invention
The embodiments of the present invention provide a method, system, device, equipment and medium for processing distributed tasks, which can improve task-processing efficiency.
In a first aspect, an embodiment of the present invention provides a method for processing a distributed task, including:
receiving a task to be processed, and splitting the task to be processed into a plurality of subtasks, wherein the plurality of subtasks comprise: one pre-subtask and at least two post-subtasks, the at least two post-subtasks depending on the pre-subtask;
scheduling the pre-subtask to a first server;
and upon determining that execution of the pre-subtask has finished, scheduling the pending post-subtasks to a second server, wherein the pending post-subtasks are some or all of the at least two post-subtasks, and the second server is any server in the cluster where the first server is located.
Optionally, before the scheduling of the pending post-subtasks to the second server, the method further includes:
establishing a first correspondence between a task identifier and the first server according to the task identifier carried by the pre-subtask;
and establishing a second correspondence between the task identifier and a cluster according to the first correspondence and the cluster where the first server is located.
The scheduling of the pending post-subtasks to the second server includes:
scheduling each pending post-subtask, according to the second correspondence and the task identifier it carries, to a second server in the cluster corresponding to that task identifier.
Optionally, the processing method of the distributed task further includes:
migrating the execution result of the pre-subtask from the first server to a third server, wherein the third server and the first server are located in the same cluster.
Optionally, the second server and the first server are the same server;
Before the scheduling of the pending post-subtasks to the second server, the method further includes:
establishing a first correspondence between a task identifier and the first server according to the task identifier carried by the pre-subtask.
The scheduling of the pending post-subtasks to the second server includes:
scheduling each pending post-subtask, according to the first correspondence and the task identifier it carries, to the first server corresponding to that task identifier.
Optionally, the processing method of the distributed task further includes:
migrating the execution result of the pre-subtask from the first server to a third server, wherein the third server and the first server are located in the same cluster;
and updating the first correspondence between the task identifier and the first server, the updated first server being the third server.
Optionally, before the scheduling of the pending post-subtasks to the second server, the method further includes:
determining the pending post-subtasks according to the execution result of the pre-subtask.
In a second aspect, an embodiment of the present invention further provides a distributed task processing system, including a workflow server and a task scheduler, the workflow server being communicatively connected to the task scheduler; wherein
the workflow server is configured to receive a task to be processed and split it into a plurality of subtasks, the plurality of subtasks comprising one pre-subtask and at least two post-subtasks, the at least two post-subtasks depending on the pre-subtask;
the task scheduler is configured to schedule the pre-subtask to a first server;
the workflow server is further configured to determine that execution of the pre-subtask has finished, and to send the pre-subtask and the pending post-subtasks to the task scheduler;
and the task scheduler is configured to schedule the pending post-subtasks to a second server, wherein the pending post-subtasks are some or all of the at least two post-subtasks, and the second server is any server in the cluster where the first server is located.
In a third aspect, an embodiment of the present invention further provides a device for processing a distributed task, including:
a receiving module, configured to receive a task to be processed;
a splitting module, configured to split the task to be processed into a plurality of subtasks, the plurality of subtasks comprising one pre-subtask and at least two post-subtasks, the at least two post-subtasks depending on the pre-subtask;
a subtask determining module, configured to determine that execution of the pre-subtask has finished, and to send the pre-subtask and the pending post-subtasks to a task scheduling module;
and the task scheduling module, configured to schedule the pre-subtask to a first server, and to schedule the pending post-subtasks to a second server, wherein the pending post-subtasks are some or all of the at least two post-subtasks, and the second server is any server in the cluster where the first server is located.
In a fourth aspect, an embodiment of the present invention further provides a device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements any of the methods for processing distributed tasks provided in the first aspect.
In a fifth aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing any of the methods for processing distributed tasks provided in the first aspect.
In the technical solution provided by the embodiments of the present invention, a task to be processed is received and split into a plurality of subtasks, the plurality of subtasks comprising one pre-subtask and at least two post-subtasks that depend on it; the pre-subtask is scheduled to a first server; and, once execution of the pre-subtask is determined to have finished, the pending post-subtasks (some or all of the at least two post-subtasks) are scheduled to a second server, which is any server in the cluster where the first server is located. Because the pre-subtask is executed first, the server executing each pending post-subtask can proceed from the pre-subtask's execution result. This avoids repeated execution of the same work, and hence the drop in overall processing efficiency caused by a single-resource bottleneck, so task-processing efficiency is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the solutions in the prior art, the drawings that are required for the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic flow chart of a method for processing distributed tasks according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a distributed task processing system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a distributed task processing system according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating another method for processing a distributed task according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating another method for processing a distributed task according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a distributed task processing device according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, a further description of aspects of the present disclosure will be provided below. It should be noted that, without conflict, the embodiments of the present disclosure and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced otherwise than as described herein; it will be apparent that the embodiments in the specification are only some, but not all, embodiments of the disclosure.
Fig. 1 is a schematic flowchart of a method for processing a distributed task according to an embodiment of the present invention. As shown in Fig. 1, the method specifically includes:
s110, receiving a task to be processed, and splitting the task to be processed into a plurality of subtasks.
Wherein the plurality of subtasks includes: one pre-subtask and at least two post-subtasks, the at least two post-subtasks being dependent on the pre-subtask.
For example, Fig. 2 is a schematic structural diagram of a distributed task processing system according to an embodiment of the present invention. As shown in Fig. 2, the distributed task processing system 100 includes two workflow servers 110 and a task scheduler 120; each workflow server 110 receives tasks to be processed and is communicatively connected to the task scheduler 120. In other embodiments, the number of workflow servers 110 in the distributed task processing system 100 may be more than two; the embodiment of the present invention does not particularly limit the number of workflow servers 110.
Specifically, a user submits at least one task to be processed; after receiving a task to be processed, the workflow server 110 splits it into a plurality of subtasks. The plurality of subtasks comprises one pre-subtask and at least two post-subtasks. The pre-subtask performs the processing of the resource on which all the post-subtasks jointly depend; in other words, the execution of every post-subtask depends on the execution result of the pre-subtask, so all the post-subtasks depend on the pre-subtask.
For example, the task to be processed is transcoding a long video. The generated pre-subtask is downloading the long video; the post-subtasks are, for example, transcoding the first 300 frames and transcoding the last 300 frames respectively, or transcoding the video and extracting frames at a preset frequency respectively.
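The split described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; all names, such as `split_task`, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Subtask:
    name: str
    task_id: str                      # identifier shared by all subtasks of one task
    depends_on: Optional[str] = None  # name of the pre-subtask this one depends on

def split_task(task_id: str) -> list:
    """Split a long-video transcoding task into one pre-subtask
    (download the source) plus post-subtasks that all depend on it."""
    pre = Subtask("download_source", task_id)
    posts = [
        Subtask("transcode_first_300_frames", task_id, depends_on=pre.name),
        Subtask("transcode_last_300_frames", task_id, depends_on=pre.name),
    ]
    return [pre] + posts

subtasks = split_task("key1")
# one pre-subtask, at least two post-subtasks, all carrying task id "key1"
```

Every subtask carries the same task identifier, which later lets the scheduler tie post-subtasks back to the server that ran the pre-subtask.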
S120, scheduling the pre-subtask to the first server.
Specifically, the workflow server 110 sends the generated pre-subtask to the task scheduler 120. After receiving the pre-subtask, the task scheduler 120 schedules it to the first server 131 according to a given policy. The first server 131 executes the received pre-subtask, stores the execution result, and sends a callback instruction corresponding to the execution result to the workflow server 110.
Continuing the example above, the first server 131 executes the pre-subtask of downloading the long video, stores the downloaded video, and sends a download-complete callback instruction to the workflow server 110.
S130, determining that execution of the pre-subtask has finished.
Specifically, after receiving the callback instruction, the workflow server 110 confirms that execution of the pre-subtask has finished and sends the determined pending post-subtasks to the task scheduler 120.
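This callback handshake can be sketched as follows; an illustrative simplification in which the class and method names are hypothetical.

```python
class WorkflowServer:
    """Sketch of S130: on receiving the pre-subtask's completion callback,
    mark the task's pre-subtask as finished and hand the pending
    post-subtasks to the queue shared with the task scheduler."""
    def __init__(self, scheduler_queue):
        self.scheduler_queue = scheduler_queue  # consumed by the task scheduler
        self.finished = set()                   # task ids whose pre-subtask ended

    def on_callback(self, task_id, pending_post_subtasks):
        self.finished.add(task_id)                          # pre-subtask has ended
        self.scheduler_queue.extend(pending_post_subtasks)  # S140 schedules these

queue = []
wf = WorkflowServer(queue)
wf.on_callback("key1", ["transcode_first_300_frames", "transcode_last_300_frames"])
```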
S140, scheduling the pending post-subtasks to a second server.
The pending post-subtasks are some or all of the at least two post-subtasks, and the second server is any server in the cluster where the first server is located.
For example, the task scheduler 120 schedules the pending post-subtasks sent by the workflow server 110 to the second server 132. As shown in Fig. 2, the second server 132 may be the same server as the first server 131; that is, the task scheduler 120 schedules the pending post-subtasks to the first server 131. Because the first server 131 stores the execution result of the pre-subtask, it can go on to execute each pending post-subtask from that result without repeating the operations on the resource the post-subtasks jointly depend on, avoiding the loss of task-processing efficiency caused by a single-resource bottleneck.
In other embodiments, as shown in Fig. 3, the task scheduler 120 schedules the pending post-subtasks sent by the workflow server 110 to a second server 132 that is located in the same cluster as the first server 131 but is not the same server. Servers in the same cluster are communicatively connected to one another; that is, the second server 132 can retrieve the execution result of the pre-subtask stored on the first server 131. The pending post-subtasks are scheduled to the second server 132, which, starting from the execution result of the pre-subtask, executes each of them without repeating the operations on the resource they jointly depend on, again avoiding the loss of task-processing efficiency caused by a single-resource bottleneck. As an alternative embodiment, the second server and the first server may share data via a data link.
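The cluster-level reuse of the pre-subtask's result can be sketched as follows; a minimal sketch under the assumption that servers in one cluster share a result store, with all names hypothetical.

```python
import random

class Cluster:
    """Servers in one cluster can reach each other's stored results,
    so any server may run a post-subtask using the pre-subtask's
    output instead of recomputing it."""
    def __init__(self, name, servers):
        self.name = name
        self.servers = list(servers)
        self.results = {}  # task_id -> execution result of the pre-subtask

    def run_pre_subtask(self, task_id, result):
        # the first server stores the result (e.g. the downloaded video)
        self.results[task_id] = result

    def schedule_post_subtask(self, task_id):
        # any server in the cluster is eligible; it reuses the stored result
        server = random.choice(self.servers)
        return server, self.results[task_id]

m1 = Cluster("M1", ["server_1", "server_2", "server_3"])
m1.run_pre_subtask("key1", "long_video.mp4")
chosen, source = m1.schedule_post_subtask("key1")
# whichever server is chosen, it gets the already-downloaded source
```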
In the technical solution provided by the embodiments of the present invention, a task to be processed is received and split into a plurality of subtasks, the plurality of subtasks comprising one pre-subtask and at least two post-subtasks that depend on it; the pre-subtask is scheduled to a first server; and, once execution of the pre-subtask is determined to have finished, the pending post-subtasks (some or all of the at least two post-subtasks) are scheduled to a second server, which is any server in the cluster where the first server is located. Because the pre-subtask is executed first, the server executing each pending post-subtask can proceed from the pre-subtask's execution result. This avoids repeated execution of the same work, and hence the drop in overall processing efficiency caused by a single-resource bottleneck, so task-processing efficiency is improved.
Optionally, Fig. 4 is a schematic flowchart of another method for processing a distributed task according to an embodiment of the present invention. As shown in Fig. 4, before S140 of Fig. 1 is executed, the method includes:
S210, establishing a first correspondence between a task identifier and the first server according to the task identifier carried by the pre-subtask.
Specifically, the pre-subtask and the post-subtasks of the same task to be processed all carry the same task identifier. The task scheduler 120 schedules the pre-subtask to the first server 131 and records the correspondence between the task identifier carried by the pre-subtask and the first server 131. For example, if the task identifier carried by the pre-subtask is key1, a first correspondence between key1 and the first server 131 is established.
S220, establishing a second correspondence between the task identifier and a cluster according to the first correspondence and the cluster where the first server is located.
Continuing the example, the task scheduler 120 establishes a second correspondence between the task identifier key1 and the cluster M1, according to the first correspondence between key1 and the first server 131 and the cluster M1 where the first server 131 is located.
As one way of implementing S140 shown in Fig. 1, the method includes:
S230, scheduling each pending post-subtask, according to the second correspondence and the task identifier it carries, to a second server in the cluster corresponding to that task identifier.
Continuing the example, the task scheduler 120 schedules a pending post-subtask carrying the task identifier key1 to any server in the cluster M1, according to the second correspondence between key1 and M1 and the task identifier the post-subtask carries. By establishing the first correspondence between the task identifier and the first server, and from it and the cluster where the first server 131 is located the second correspondence between the task identifier and the cluster, multiple pending post-subtasks can be scheduled simultaneously to any of the servers in that cluster, shortening task-processing time and improving task-processing efficiency.
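Steps S210-S230 can be sketched as two small lookup tables; a minimal illustration with hypothetical names.

```python
class TaskScheduler:
    """Sketch of S210-S230: record the first correspondence
    (task id -> server) and the second correspondence
    (task id -> cluster), then route post-subtasks by cluster."""
    def __init__(self, cluster_of_server):
        self.cluster_of_server = cluster_of_server  # server -> cluster name
        self.first_corr = {}                        # task id -> first server
        self.second_corr = {}                       # task id -> cluster name

    def schedule_pre(self, task_id, server):
        self.first_corr[task_id] = server                           # S210
        self.second_corr[task_id] = self.cluster_of_server[server]  # S220

    def schedule_post(self, task_id, servers_by_cluster):
        # S230: pick a server of the cluster recorded for this task id
        cluster = self.second_corr[task_id]
        return servers_by_cluster[cluster][0]

sched = TaskScheduler({"s1": "M1", "s2": "M1", "s3": "M2"})
sched.schedule_pre("key1", "s1")
target = sched.schedule_post("key1", {"M1": ["s2", "s1"], "M2": ["s3"]})
# target is a server in cluster M1, not necessarily s1 itself
```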
Optionally, with continued reference to fig. 4, the method for processing a distributed task further includes:
S240, migrating the execution result of the pre-subtask from the first server to a third server.
Wherein the third server and the first server are located in the same cluster.
For example, if the first server 131 fails, the execution result of the pre-subtask stored on it is migrated to another server in the cluster where the first server 131 is located, i.e. the third server. The task scheduler 120 schedules the pending post-subtasks to the third server, which continues executing them from the pre-subtask's execution result. This avoids interrupting processing when the server that executed the pre-subtask fails, and improves the stability of the distributed task processing system.
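The migration in S240 can be sketched as follows; a minimal illustration with hypothetical names. In this flow the cluster-level second correspondence still resolves correctly, since the third server is in the same cluster, so no mapping update is needed.

```python
def migrate_pre_result(results_by_server, failed_server, cluster_servers):
    """Sketch of S240: when the first server fails, move its stored
    pre-subtask results to another server (the third server) in the
    same cluster, so post-subtasks can continue from them."""
    third_server = next(s for s in cluster_servers if s != failed_server)
    results_by_server[third_server] = results_by_server.pop(failed_server)
    return third_server

store = {"s1": {"key1": "long_video.mp4"}}
third = migrate_pre_result(store, "s1", ["s1", "s2"])
# the result now lives on s2, still inside cluster M1
```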
It should be noted that the embodiment of the present invention takes failure of the first server 131 only as an example of a reason for migrating the execution result of the pre-subtask. In other embodiments, the migration may be triggered by other causes, which the embodiment of the present invention does not particularly limit.
Optionally, the second server 132 and the first server 131 are the same server. Fig. 5 is a schematic flowchart of another method for processing a distributed task according to an embodiment of the present invention. As shown in Fig. 5, before S140 of Fig. 1 is executed, the method includes:
S310, establishing a first correspondence between a task identifier and the first server according to the task identifier carried by the pre-subtask.
Specifically, the pre-subtask and the post-subtasks of the same task to be processed all carry the same task identifier. The task scheduler 120 schedules the pre-subtask to the first server 131 and records the correspondence between the task identifier carried by the pre-subtask and the first server 131. For example, if the task identifier carried by the pre-subtask is key1, a first correspondence between key1 and the first server 131 is established.
As another way of implementing S140 shown in Fig. 1, the method specifically includes:
S320, scheduling each pending post-subtask, according to the first correspondence and the task identifier it carries, to the first server corresponding to that task identifier.
Continuing the example, the task scheduler 120 schedules a pending post-subtask carrying the task identifier key1 to the first server 131, according to the first correspondence between key1 and the first server 131 and the task identifier the post-subtask carries. Because the first correspondence is simpler to obtain, the pending post-subtasks can be scheduled to the first server more quickly.
Optionally, with continued reference to fig. 5, the method for processing a distributed task further includes:
S330, migrating the execution result of the pre-subtask from the first server to the third server.
Wherein the third server and the first server are located in the same cluster.
For example, if the first server 131 fails, the execution result of the pre-subtask stored on it is migrated to another server in the cluster where the first server 131 is located, i.e. the third server. The task scheduler 120 schedules the pending post-subtasks to the third server, which continues executing them from the pre-subtask's execution result. This avoids interrupting processing when the server that executed the pre-subtask fails, and improves the stability of the distributed task processing system.
S340, updating the first correspondence between the task identifier and the first server, the updated first server being the third server.
Specifically, when the execution result of the pre-subtask stored on the first server 131 is migrated to the third server, the task scheduler 120 treats the third server as the new first server and updates the first correspondence between the task identifier and the first server accordingly. The task scheduler 120 then schedules the pending post-subtasks to the new first server, i.e. the third server, according to the updated first correspondence.
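Steps S330-S340 can be sketched as follows; a minimal illustration with hypothetical names. Unlike the cluster-routed flow, here the task-id-to-server mapping itself must be repointed at the third server.

```python
def migrate_and_update(first_corr, results_by_server, failed_server, cluster_servers):
    """Sketch of S330-S340: migrate the pre-subtask result to a third
    server and repoint the first correspondence (task id -> server)
    at it, so later post-subtasks follow the result."""
    third_server = next(s for s in cluster_servers if s != failed_server)
    results_by_server[third_server] = results_by_server.pop(failed_server)
    for task_id, server in first_corr.items():
        if server == failed_server:
            first_corr[task_id] = third_server  # S340: update the mapping
    return third_server

first_corr = {"key1": "s1"}
store = {"s1": {"key1": "long_video.mp4"}}
migrate_and_update(first_corr, store, "s1", ["s1", "s2"])
# pending post-subtasks carrying key1 are now scheduled to s2
```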
It should be noted that the embodiment of the present invention takes failure of the first server 131 only as an example of a reason for migrating the execution result of the pre-subtask. In other embodiments, the migration may be triggered by other causes, which the embodiment of the present invention does not particularly limit.
Optionally, with continued reference to Fig. 1, before S140 is executed, the method further includes:
determining the pending post-subtasks according to the execution result of the pre-subtask.
For example, the pending post-subtasks are determined from the callback instruction, and they may be all of the post-subtasks. Continuing the example above: after the download of the long video finishes, the workflow server 110 receives the corresponding callback instruction and determines that transcoding the first 300 frames and transcoding the last 300 frames are both pending post-subtasks.
It should be noted that in the example above the pending post-subtasks are all of the post-subtasks; in other embodiments they may be only some of them. For example, post-subtasks B and C both depend on pre-subtask A. If pre-subtask A executes successfully, the workflow server 110 receives a first callback instruction, determines that execution of A has finished, and determines post-subtask B to be pending; if execution of pre-subtask A fails, the workflow server 110 receives a second callback instruction, determines that execution of A has finished, and determines post-subtask C to be pending.
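The callback-driven selection can be sketched as follows; a minimal illustration in which the callback values and names are hypothetical.

```python
def select_pending(callback, on_success, on_failure):
    """The callback from the pre-subtask decides which post-subtasks
    become pending, e.g. post-subtask B on success and post-subtask C
    on failure."""
    if callback == "success":
        return on_success
    if callback == "failure":
        return on_failure
    raise ValueError("unknown callback: %s" % callback)

pending = select_pending("success", ["post_subtask_B"], ["post_subtask_C"])
```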
An embodiment of the present invention further provides a device for processing distributed tasks, configured to execute the method for processing distributed tasks provided by the above embodiments.
Fig. 6 is a schematic structural diagram of a distributed task processing device according to an embodiment of the present invention, and as shown in fig. 6, a distributed task processing device 200 includes:
The receiving module 210 is configured to receive a task to be processed.
A splitting module 220, configured to split the task to be processed into a plurality of subtasks, where the plurality of subtasks includes: one pre-subtask and at least two post-subtasks, the at least two post-subtasks being dependent on the pre-subtask.
The subtask determining module 230 is configured to determine that the execution of the pre-subtask has ended, and to send the post-subtask to be processed and the pre-subtask to a task scheduling module.
A task scheduling module 240, configured to schedule the pre-subtask to a first server, and to schedule the post-subtask to be processed to a second server, where the post-subtask to be processed is some or all of the at least two post-subtasks, and the second server is any server in the cluster where the first server is located.
In the technical scheme provided by the embodiment of the invention, a task to be processed is received and split into a plurality of subtasks, where the plurality of subtasks include one pre-subtask and at least two post-subtasks that depend on the pre-subtask; the pre-subtask is scheduled to a first server; once the execution of the pre-subtask is determined to have ended, the post-subtask to be processed is scheduled to a second server, where the post-subtask to be processed is some or all of the at least two post-subtasks and the second server is any server in the cluster where the first server is located. The pre-subtask is thus executed first, and the server executing a post-subtask to be processed can do so based on the execution result of the pre-subtask. This avoids repeated execution of the same subtask, prevents a single-point resource bottleneck from reducing the overall processing efficiency of the tasks, and thereby improves task processing efficiency.
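The flow summarized above can be sketched end to end. The cluster layout and every function name below (`split`, `schedule_pre`, `schedule_post`) are illustrative assumptions, not the patent's prescribed implementation:

```python
# Illustrative sketch: split a task, run the pre-subtask on a first server,
# then dispatch the post-subtasks to servers in the SAME cluster so each one
# can reuse the pre-subtask's execution result. All names are hypothetical.
import random

CLUSTER = {"cluster-1": ["server-a", "server-b", "server-c"]}
TASK_TO_CLUSTER = {}  # task identifier -> cluster of the first server


def split(task_id):
    # One pre-subtask (e.g. download a long video) and at least two
    # post-subtasks (e.g. transcode the front and rear 300 frames).
    pre = (task_id, "download")
    posts = [(task_id, "transcode-front"), (task_id, "transcode-rear")]
    return pre, posts


def schedule_pre(pre):
    task_id, _ = pre
    first_server = "server-a"               # chosen by the task scheduler
    TASK_TO_CLUSTER[task_id] = "cluster-1"  # remember the first server's cluster
    return first_server


def schedule_post(post):
    # The second server may be ANY server in the first server's cluster,
    # because the pre-subtask's result is available within that cluster.
    task_id, _ = post
    return random.choice(CLUSTER[TASK_TO_CLUSTER[task_id]])


pre, posts = split("task-42")
schedule_pre(pre)
second_servers = [schedule_post(p) for p in posts]
print(all(s in CLUSTER["cluster-1"] for s in second_servers))  # True
```

Pinning the post-subtasks to the first server's cluster is what avoids re-executing the pre-subtask elsewhere: any second server in that cluster can consume the stored result directly.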
Fig. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention, and fig. 7 shows a block diagram of an exemplary device suitable for implementing an embodiment of the present invention. The device shown in fig. 7 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments of the invention.
As shown in fig. 7, device 12 is in the form of a general purpose computing device. Components of device 12 may include, but are not limited to: one or more processors 16, a system memory 28, and a bus 18 that connects the various system components (including the system memory 28 and the processors 16).
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Device 12 typically includes a variety of computer system readable media. Such media can be any medium that is accessible by device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 7, commonly referred to as a "hard disk drive"). Although not shown in fig. 7, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. The system memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
Device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with device 12, and/or any devices (e.g., network card, modem, etc.) that enable device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Also, device 12 may communicate with one or more networks such as a local area network, a wide area network, and/or a public network such as the Internet via network adapter 20. As shown, network adapter 20 communicates with other modules of device 12 over bus 18. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with device 12, including, but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, tape drives, data backup storage systems, and the like.
The processor 16 executes various functional applications and data processing, such as the processing methods of distributed tasks provided by embodiments of the present invention, by running at least one of a plurality of programs stored in the system memory 28.
The embodiment of the invention also provides a computer readable storage medium on which a computer program is stored, and the program, when executed by a processor, implements any of the methods for processing distributed tasks provided by the embodiments of the present invention. That is, the program, when executed by the processor, implements the following:
Receiving a task to be processed, and splitting the task to be processed into a plurality of subtasks, wherein the plurality of subtasks comprise: one pre-subtask and at least two post-subtasks, the at least two post-subtasks depending on the pre-subtask.
Scheduling the pre-subtask to a first server.
Determining that the execution of the pre-subtask has ended, and scheduling the post-subtask to be processed to a second server, wherein the post-subtask to be processed is some or all of the at least two post-subtasks, and the second server is any server in the cluster where the first server is located.
Any combination of one or more computer readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
It should be noted that in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely a specific embodiment of the disclosure to enable one skilled in the art to understand or practice the disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown and described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A method for processing a distributed task, comprising:
receiving a task to be processed, and splitting the task to be processed into a plurality of subtasks, wherein the plurality of subtasks comprise: one pre-subtask and at least two post-subtasks, the at least two post-subtasks depending on the pre-subtask;
scheduling the pre-subtask to a first server;
establishing a first correspondence between a task identifier and the first server according to the task identifier carried by the pre-subtask, wherein the pre-subtask and the post-subtasks corresponding to the same task to be processed carry the same task identifier;
establishing a second correspondence between the task identifier and a cluster according to the first correspondence and the cluster where the first server is located; and
determining that the execution of the pre-subtask has ended, and scheduling, according to the second correspondence and the task identifier carried by the post-subtask to be processed, the post-subtask to be processed to a second server in the cluster corresponding to that task identifier, wherein the post-subtask to be processed is some or all of the at least two post-subtasks, and the second server is any server in the cluster where the first server is located.
2. The method for processing a distributed task according to claim 1, further comprising:
migrating the execution result of the pre-subtask from the first server to a third server, wherein the third server and the first server are located in the same cluster.
3. The method for processing a distributed task according to claim 1, wherein the second server and the first server are the same server;
before the scheduling of the post-subtask to be processed to the second server, the method further comprises:
establishing a first correspondence between the task identifier and the first server according to the task identifier carried by the pre-subtask; and
the scheduling of the post-subtask to be processed to the second server comprises:
scheduling, according to the first correspondence and the task identifier carried by the post-subtask to be processed, the post-subtask to be processed to the first server corresponding to that task identifier.
4. The method for processing a distributed task according to claim 3, further comprising:
migrating the execution result of the pre-subtask from the first server to a third server, wherein the third server and the first server are located in the same cluster; and
updating the first correspondence between the task identifier and the first server, wherein the updated first server is the third server.
5. The method for processing a distributed task according to any one of claims 1 to 4, further comprising, before the scheduling of the post-subtask to be processed to the second server:
determining the post-subtask to be processed according to the execution result of the pre-subtask.
6. A system for processing distributed tasks, comprising: a workflow server and a task scheduler, the workflow server being in communication connection with the task scheduler; wherein,
the workflow server is configured to receive a task to be processed and split the task to be processed into a plurality of subtasks, wherein the plurality of subtasks comprise: one pre-subtask and at least two post-subtasks, the at least two post-subtasks depending on the pre-subtask;
the task scheduler is configured to schedule the pre-subtask to a first server;
the task scheduler is configured to establish a first correspondence between a task identifier and the first server according to the task identifier carried by the pre-subtask, wherein the pre-subtask and the post-subtasks corresponding to the same task to be processed carry the same task identifier;
the task scheduler is configured to establish a second correspondence between the task identifier and a cluster according to the first correspondence and the cluster where the first server is located;
the workflow server is configured to determine that the execution of the pre-subtask has ended, and to send the post-subtask to be processed and the pre-subtask to the task scheduler; and
the task scheduler is configured to schedule, according to the second correspondence and the task identifier carried by the post-subtask to be processed, the post-subtask to be processed to a second server in the cluster corresponding to that task identifier, wherein the post-subtask to be processed is some or all of the at least two post-subtasks, and the second server is any server in the cluster where the first server is located.
7. A distributed task processing device, comprising:
a receiving module, configured to receive a task to be processed;
a splitting module, configured to split the task to be processed into a plurality of subtasks, wherein the plurality of subtasks comprise: one pre-subtask and at least two post-subtasks, the at least two post-subtasks depending on the pre-subtask;
a subtask determining module, configured to determine that the execution of the pre-subtask has ended, and to send the post-subtask to be processed and the pre-subtask to a task scheduling module; and
the task scheduling module, configured to schedule the pre-subtask to a first server; establish a first correspondence between a task identifier and the first server according to the task identifier carried by the pre-subtask; establish a second correspondence between the task identifier and a cluster according to the first correspondence and the cluster where the first server is located; and schedule, according to the second correspondence and the task identifier carried by the post-subtask to be processed, the post-subtask to be processed to a second server in the cluster corresponding to that task identifier, wherein the post-subtask to be processed is some or all of the at least two post-subtasks, and the second server is any server in the cluster where the first server is located.
8. An apparatus, comprising: a memory, a processor, and a computer program stored on the memory and executable by the processor, wherein the processor, when executing the computer program, implements the method for processing distributed tasks according to any one of claims 1-4.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method for processing distributed tasks according to any one of claims 1-4.
CN202110195236.5A 2021-02-19 2021-02-19 Processing method, system, device, equipment and medium for distributed task Active CN113010280B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110195236.5A CN113010280B (en) 2021-02-19 2021-02-19 Processing method, system, device, equipment and medium for distributed task

Publications (2)

Publication Number Publication Date
CN113010280A CN113010280A (en) 2021-06-22
CN113010280B true CN113010280B (en) 2024-08-13

Family

ID=76405069

Country Status (1)

Country Link
CN (1) CN113010280B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113821326A (en) * 2021-09-24 2021-12-21 北京天融信网络安全技术有限公司 A job scheduling method, device, electronic device and storage medium
CN114327872B (en) * 2021-12-14 2024-05-31 特赞(上海)信息科技有限公司 Multimedia asynchronous processing method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108287756A (en) * 2018-01-25 2018-07-17 联动优势科技有限公司 A kind of method and device of processing task
CN112114950A (en) * 2020-09-21 2020-12-22 中国建设银行股份有限公司 Task scheduling method and device and cluster management system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106371918B (en) * 2016-08-23 2019-10-18 北京云纵信息技术有限公司 Task cluster schedule management method and device
CN110888722B (en) * 2019-11-15 2022-05-20 北京奇艺世纪科技有限公司 Task processing method and device, electronic equipment and computer readable storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant