CN115248750A - Distributed task scheduling method, apparatus, computer equipment and medium

Info

Publication number: CN115248750A
Application number: CN202110456914.9A
Authority: CN
Inventors: 张小勇; 和敬刚; 杨志欣; 王洪伟
Original assignee: Dawning Information Industry Beijing Co Ltd
Current assignee: Shuguang Information Industry Henan Co ltd
Priority date: 2021-04-25
Filing date: 2021-04-25
Publication date: 2022-10-28

Abstract

The application relates to a distributed task scheduling method, apparatus, computer device, and medium. In the method, the original master node device creates task nodes while executing a first target task and writes the task information of the first target task into those task nodes. After the original master node device goes down, a secondary node device is converted into the target master node device, which scans a preset directory node in a Zookeeper system; if at least one task node exists under the directory node, it reads task information from the at least one task node, recovers the first target task according to the task information, and executes the first target task. Because the Zookeeper system is highly available, the data storage of the distributed system is unlikely to fail, which ensures the high availability of the distributed system.

Description

Distributed task scheduling method, apparatus, computer equipment and medium

Technical Field

The present application relates to the technical field of distributed systems, and in particular to a distributed task scheduling method, apparatus, computer device, and medium.

Background

A distributed system includes multiple computer devices, comprising a master node device and secondary node devices. The master node device receives task requests and creates and executes tasks according to those requests. When the master node device goes down, the distributed system can convert a secondary node device into a new master node device, and the new master node device then recovers the tasks that the previous master node device did not finish executing.

However, the data storage systems currently used by distributed systems are prone to single points of failure, which undermines the high availability of the distributed system.

Summary of the Invention

In view of this, it is necessary to provide a distributed task scheduling method, apparatus, computer device, and medium that address the above technical problem.

A distributed task scheduling method, comprising:

after being converted from a secondary node device into a target master node device, scanning a preset directory node in a Zookeeper system; if at least one task node exists under the directory node, reading task information from the at least one task node, wherein the at least one task node was created by the original master node device while executing a first target task, and the task information is information related to executing the first target task that was written by the original master node device; and recovering the first target task according to the task information and executing the first target task.

Relying on the distributed storage provided by the Zookeeper system, the embodiments of the present application enable the target master node device to recover the first target task and continue executing it when the original master node device of the distributed system goes down, which improves the availability of the distributed system.

In one embodiment, the method further comprises:

after receiving a task request, creating a second target task according to the task request and executing the second target task; and, while executing the second target task, creating a task node corresponding to the second target task under the preset directory node in the Zookeeper system and writing the task information corresponding to the second target task into that task node.

Writing the task information of the second target task into the Zookeeper system as soon as the task is created prepares for later recovery and execution of the second target task, and thereby serves the high availability of the distributed system.

In one embodiment, creating the task node corresponding to the second target task under the preset directory node in the Zookeeper system and writing the task information corresponding to the second target task into that task node comprises: dividing the second target task into multiple target subtasks, each target subtask being a different stage of the second target task; and, under the directory node, creating a task node for each target subtask in the order in which the target subtasks are executed, and writing the task information of each target subtask into its corresponding task node.

Because the second target task is divided into multiple independent target subtasks, each with its own task node, when the target master node device goes down the new master node device only needs to recover some of the target subtasks of the second target task rather than recover and re-execute the entire task. This avoids repeated work in the distributed system, which would reduce task execution efficiency.
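The per-subtask layout described above can be illustrated with a minimal sketch. It is not the application's implementation: the `store` dictionary is a hypothetical in-memory stand-in for the Zookeeper tree, the `/tasks` path and the zero-padded node names are assumptions, and removing a subtask's node once the subtask completes is one assumed way of marking progress.

```python
import json

# Hypothetical in-memory stand-in for the Zookeeper tree: path -> stored data.
store = {}
DIRECTORY = "/tasks"  # assumed path of the preset directory node


def create_subtask_nodes(task_id, stages):
    """Create one task node per subtask, numbered in execution order."""
    for index, stage in enumerate(stages):
        path = f"{DIRECTORY}/{task_id}-{index:04d}"
        store[path] = json.dumps({"task": task_id, "stage": stage})


def finish_subtask(task_id, index):
    """Remove the node of a completed subtask (assumed progress marker)."""
    del store[f"{DIRECTORY}/{task_id}-{index:04d}"]


def remaining_subtasks(task_id):
    """Return the stages still recorded for a task, in execution order."""
    prefix = f"{DIRECTORY}/{task_id}-"
    paths = sorted(p for p in store if p.startswith(prefix))
    return [json.loads(store[p])["stage"] for p in paths]


create_subtask_nodes("job42", ["fetch", "transform", "load"])
finish_subtask("job42", 0)           # the old master finished "fetch", then went down
print(remaining_subtasks("job42"))   # stages the new master still has to run
```

In this sketch the new master node device would resume from "transform" instead of repeating "fetch".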

In one embodiment, creating, while executing the second target task, the task node corresponding to the second target task under the preset directory node in the Zookeeper system and writing the task information corresponding to the second target task into that task node comprises: dividing the second target task into multiple target subtasks arranged in execution order, each target subtask being a different stage of the second target task; saving the task information of each target subtask in a target task node; and, for each target subtask, when the target subtask is executed, creating a subtask node for it in the Zookeeper system based on the task information stored in the target task node, and deleting the subtask node of the preceding target subtask.

The subtask nodes are not all created at once. Instead, as the target subtasks are executed in order, a subtask node is created when its target subtask starts and deleted after that subtask finishes, and so on for each target subtask in turn. This saves storage space in the Zookeeper system.
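The on-demand variant can be sketched as follows, under stated assumptions: `store` is a hypothetical in-memory stand-in for the Zookeeper tree, and the node paths (`/tasks/job42`, `sub-N`) and the JSON layout are illustrative names that do not come from the application.

```python
import json

store = {}                  # hypothetical stand-in for the Zookeeper tree: path -> data
TASK_NODE = "/tasks/job42"  # assumed path of the target task node


def register_task(stages):
    """Save the information for every subtask in the single target task node."""
    store[TASK_NODE] = json.dumps({"stages": stages})


def start_subtask(index):
    """Create the subtask node for stage `index`, then delete the previous one."""
    stages = json.loads(store[TASK_NODE])["stages"]
    store[f"{TASK_NODE}/sub-{index}"] = stages[index]
    store.pop(f"{TASK_NODE}/sub-{index - 1}", None)


def live_subtask_nodes():
    """List the subtask nodes that currently exist."""
    return sorted(p for p in store if p.startswith(TASK_NODE + "/sub-"))


register_task(["fetch", "transform", "load"])
for i in range(3):
    start_subtask(i)
    # At any moment only the subtask being executed has a node.
    assert len(live_subtask_nodes()) == 1
```

Only one subtask node exists at a time, which is the storage saving the embodiment describes; the full stage list remains available in the target task node for recovery.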

In one embodiment, the method further comprises: while executing the second target task, obtaining task status information of the second target task until the second target task finishes, the task status information indicating the execution status of the second target task; and writing the task status information into the task node corresponding to the second target task, so that after the target master node device goes down it can be determined whether the second target task has finished.

By writing task status information, the embodiments of the present application avoid frequent deletion operations under the directory node, reduce the amount of data the Zookeeper system has to process, and improve the efficiency of the distributed system.
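A minimal sketch of recording status in the task node instead of deleting the node is given below. The status values `RUNNING` and `FINISHED`, the JSON record layout, and the in-memory `store` standing in for the Zookeeper tree are all assumptions made for illustration.

```python
import json

store = {}            # hypothetical stand-in for the Zookeeper tree: path -> JSON text
DIRECTORY = "/tasks"  # assumed path of the preset directory node


def create_task(task_id, info):
    """Create the task node with its task information and an initial status."""
    store[f"{DIRECTORY}/{task_id}"] = json.dumps({"info": info, "status": "RUNNING"})


def set_status(task_id, status):
    """Overwrite the status field of an existing task node."""
    path = f"{DIRECTORY}/{task_id}"
    record = json.loads(store[path])
    record["status"] = status
    store[path] = json.dumps(record)


def unfinished_tasks():
    """Return the ids of tasks whose recorded status is not FINISHED."""
    return sorted(
        path.rsplit("/", 1)[1]
        for path, raw in store.items()
        if json.loads(raw)["status"] != "FINISHED"
    )


create_task("job1", "nightly backup")
create_task("job2", "log rotation")
set_status("job1", "FINISHED")
print(unfinished_tasks())  # ['job2']
```

A newly promoted master node device could call something like `unfinished_tasks()` after scanning, so a finished task does not need to be deleted immediately to be skipped during recovery.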

In one embodiment, writing the task status information into the task node corresponding to the second target task comprises: creating a transaction marker under the directory node, the transaction marker including the identifiers of at least two task nodes into which task status information is to be written and indicating that the write operation is to be performed on those task nodes together; backing up the task information in the at least two task nodes to obtain backup data; and writing the task status information into the at least two task nodes identified by the transaction marker.

If the write succeeds, the backup data and the transaction marker are deleted.

If the write fails, the task information in the at least two task nodes is restored from the backup data, and the transaction marker is deleted.

Creating and deleting the transaction marker makes it possible to operate on multiple task nodes while keeping the operation on the Zookeeper system atomic. According to the application, this solves the problem that the Zookeeper system does not support simultaneous transactional operations on multiple task nodes, and fully supports a high-availability job framework based on the Zookeeper system.
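The marker, backup, write, and rollback sequence can be sketched as follows. This is an illustration under assumptions, not the application's code: `store` is a hypothetical in-memory stand-in for the Zookeeper tree, the marker and backup node names are invented, a write failure is modelled as an `OSError`, and removing the backup after a rollback is an added assumption (the text only states that the marker is deleted in that case).

```python
import json

store = {}                           # hypothetical stand-in: path -> data
DIRECTORY = "/tasks"                 # assumed path of the preset directory node
MARKER = DIRECTORY + "/_txn"         # assumed name of the transaction marker
BACKUP = DIRECTORY + "/_txn_backup"  # assumed location of the backup data


def write_status_to_all(node_ids, status, write=None):
    """Write `status` into every listed task node, or leave all of them unchanged."""
    write = write or store.__setitem__
    paths = [f"{DIRECTORY}/{n}" for n in node_ids]
    store[MARKER] = json.dumps(node_ids)                      # create the marker
    store[BACKUP] = json.dumps({p: store[p] for p in paths})  # back up current data
    try:
        for p in paths:
            write(p, status)                                  # perform the writes
        succeeded = True
    except OSError:
        for p, old in json.loads(store[BACKUP]).items():      # restore from backup
            store[p] = old
        succeeded = False
    del store[BACKUP]  # assumption: backup is also removed after a rollback
    del store[MARKER]
    return succeeded


def flaky_write(path, value):
    """Test helper that fails on node 'b' to simulate a partial write."""
    if path.endswith("/b"):
        raise OSError("simulated write failure")
    store[path] = value


store.update({"/tasks/a": "RUNNING", "/tasks/b": "RUNNING"})
print(write_status_to_all(["a", "b"], "FINISHED", write=flaky_write), store)
print(write_status_to_all(["a", "b"], "FINISHED"), store)
```

In the first call the write to node `a` is undone when node `b` fails, so both nodes keep their old value; in the second call both nodes are updated. Neither call leaves a marker or backup behind.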

After the second target task finishes executing, the task node corresponding to the second target task is deleted from the directory node.

Deleting the task node of a finished second target task reduces the amount of stored data and lightens the data management burden on the Zookeeper system.

A distributed task scheduling apparatus, comprising:

a scanning module, configured to scan a preset directory node in a Zookeeper system after conversion from a secondary node device into a target master node device;

a reading module, configured to read task information from at least one task node if at least one task node exists under the directory node, wherein the at least one task node was created by the original master node device while executing a first target task, and the task information is information related to executing the first target task that was written by the original master node device; and

a recovery module, configured to recover the first target task according to the task information and execute the first target task.

A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the following steps when executing the computer program: after being converted from a secondary node device into a target master node device, scanning a preset directory node in a Zookeeper system; if at least one task node exists under the directory node, reading task information from the at least one task node, wherein the at least one task node was created by the original master node device while executing a first target task, and the task information is information related to executing the first target task that was written by the original master node device; and recovering the first target task according to the task information and executing the first target task.

A computer-readable storage medium storing a computer program, wherein the computer program implements the following steps when executed by a processor: after being converted from a secondary node device into a target master node device, scanning a preset directory node in a Zookeeper system; if at least one task node exists under the directory node, reading task information from the at least one task node, wherein the at least one task node was created by the original master node device while executing a first target task, and the task information is information related to executing the first target task that was written by the original master node device; and recovering the first target task according to the task information and executing the first target task.

The above distributed task scheduling method, apparatus, computer device, and medium can ensure the high availability of a distributed system. In the method, the original master node device creates task nodes while executing the first target task and writes the task information of the first target task into them. After the original master node device goes down, a secondary node device is converted into the target master node device, which scans the preset directory node in the Zookeeper system; if at least one task node exists under the directory node, it reads task information from the at least one task node, recovers the first target task according to the task information, and executes the first target task. Because the Zookeeper system is highly available, the data storage of the distributed system is unlikely to fail, which ensures the high availability of the distributed system.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of the implementation environment involved in a distributed task scheduling method in one embodiment;

FIG. 2 is a schematic structural diagram of a master node device;

FIG. 3 is a schematic structural diagram of another master node device;

FIG. 4 is a flowchart of a distributed task scheduling method provided by an embodiment of the present application;

FIG. 5 is a flowchart of another distributed task scheduling method provided by an embodiment of the present application;

FIG. 6 is a flowchart of another distributed task scheduling method provided by an embodiment of the present application;

FIG. 7 is a flowchart of another distributed task scheduling method provided by an embodiment of the present application;

FIG. 8 is a flowchart of a method for performing a write operation on multiple task nodes in a Zookeeper system provided by an embodiment of the present application;

FIG. 9 is a structural block diagram of a distributed task scheduling apparatus provided by an embodiment of the present application;

FIG. 10 is a structural block diagram of another distributed task scheduling apparatus provided by an embodiment of the present application.

Detailed Description

To make the purpose, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the application and do not limit it.

First, the architecture of the distributed system involved in this application is described.

A distributed system includes a cluster of multiple computer devices that communicate with one another over a network and cooperate to provide a service or resource externally. Generally, one of the computer devices in the distributed system is the master node device, and the remaining computer devices are secondary node devices.

Next, the working process of the distributed system is described.

In the distributed system, the master node device executes tasks and, while doing so, stores the task information and the execution status information of each task.

When the master node device goes down, the distributed system can select a new master node device from the secondary node devices according to a preset rule. Using the stored task information and execution status information, the new master node device can recover the tasks that the previous master node device did not finish and then continue executing them.

In practice, the new master node device depends heavily on the stored data when recovering tasks, so the validity of the stored data has a major effect on the availability of the distributed system. The data storage of existing distributed systems, however, is prone to failure, which makes their high availability unreliable.

To address this problem, the present proposal provides the following technical solution: the master node device writes task information into a Zookeeper system. Because the Zookeeper system is itself a highly available distributed system, it is unlikely to fail. After the master node device goes down, the new master node device can therefore still obtain from the Zookeeper system the task information needed to recover tasks, which ensures the high availability of the distributed system.

Further, because the distributed task scheduling method provided by the embodiments relies on the distributed consistency support of the Zookeeper system, it simplifies keeping high-availability task information consistent across servers. Likewise, relying on the distributed storage of the Zookeeper system simplifies the design for synchronizing high-availability task information between servers. Finally, the application solves the technical problem that the Zookeeper system does not support simultaneous transactional operations on multiple task nodes, and so fully supports a high-availability job framework based on the Zookeeper system.

The implementation environment of the distributed task scheduling method provided by the embodiments is briefly described below.

As shown in FIG. 1, the implementation environment includes a server cluster 101 configured with a Zookeeper system and a computer device cluster 102. A distributed system is deployed on the computer device cluster 102, and the distributed scheduling method provided by this application can be applied to that distributed system.

The Zookeeper system provides support for distributed consistency and distributed storage. The server cluster 101 configured with the Zookeeper system includes a master server and several slave servers, and maintains a file-system-like data structure. This data structure includes multiple directory nodes, each directory node may include multiple subdirectory nodes, and every directory node or subdirectory node can store data. When the master server in the server cluster 101 goes down, a slave server in the cluster can become the new master server so that the Zookeeper system keeps running normally.

The computer device cluster 102 includes multiple computer devices, one of which is the master node device 1021 while the rest are secondary node devices 1022. The master node device 1021 is connected to the master server of the server cluster 101 so that it can perform write and read operations in the Zookeeper system.

In the embodiments of the present application, the master node device may be a server or a terminal. If the master node device is a terminal, its internal structure may be as shown in FIG. 2: it includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface is used to communicate with external terminals over a network connection. When executed by the processor, the computer program implements the distributed task scheduling method provided by this application. The master node device may include a liquid crystal display or an electronic ink display, and its input device may be a button, trackball, or touchpad, or an external keyboard, touchpad, or mouse.

If the master node device is a server, its internal structure may be as shown in FIG. 3: it includes a processor, a memory, and a network interface connected by a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface is used to communicate with external terminals over a network connection. When executed by the processor, the computer program implements a distributed task scheduling method.

Those skilled in the art will understand that the structures shown in FIG. 2 and FIG. 3 are only block diagrams of the parts relevant to the solution of this application and do not limit the computer devices to which the solution is applied. A specific computer device may include more or fewer components than shown in the figures, combine certain components, or arrange the components differently.

Referring to FIG. 4, which shows a flowchart of a distributed task scheduling method provided by an embodiment of the present application, the method may include the following steps:

Step 401: after being converted from a secondary node device into the target master node device, scan the preset directory node in the Zookeeper system.

It should first be noted that both the master node device and the secondary node devices of the distributed system store in advance the path of a directory node, which points to the preset directory node in the Zookeeper system.

While the distributed system is running, the original master node device can receive a task request and create and execute a first target task according to it. The first target task is an application-level task that the distributed system needs to execute; the embodiments do not limit its actual type.

While executing the first target task, the original master node device can access the directory node in the Zookeeper system using the pre-stored path, create a task node under the directory node, and write the task information of the first target task into the task node. The task information of the first target task is information related to executing the first target task.

When the master node device goes down, the distributed system can select a new master node device from the secondary node devices according to a preset rule; this new master node device is referred to in this application as the target master node device. After the selected secondary node device has been converted into the target master node device, it needs to check whether any tasks were left unfinished by the original master node device. The target master node device therefore scans the preset directory node in the Zookeeper system using its pre-stored directory node path.

Step 402: if at least one task node exists under the directory node, read task information from the at least one task node.

The at least one task node was created by the original master node device while executing the first target task, and the task information is information related to executing the first target task that was written by the original master node device.

In one optional implementation, if the target master node device finds at least one task node under the directory node after scanning it, this indicates that there are tasks the original master node device did not finish. In this case, the target master node device can read the task information from the at least one task node.

If no task node exists under the directory node, the original master node device finished executing all first target tasks.

In another optional implementation, the existence of at least one task node under the directory node indicates only that the original master node device once created a first target task. The target master node device reads the task information from the at least one task node. Optionally, the task information includes a task status indicating whether the first target task has finished, so that the target master node device can determine from the task information it reads whether any task was left unfinished by the original master node device.

Step 403: restore the first target task according to the task information, and execute the first target task.

In this embodiment of the present application, when the target master node device determines that the original master node device left a task unfinished, it can re-establish the first target task according to the task information and then continue executing it.

In this embodiment of the present application, the original master node device creates a task node while executing the first target task and writes the task information of the first target task into that task node. After the original master node device goes down, a secondary node device is converted into the target master node device, which then scans the preset directory node in the Zookeeper system. If at least one task node exists under the directory node, the target master node device reads the task information from the at least one task node, restores the first target task according to the task information, and executes the first target task. Because the Zookeeper system is highly available, the data stored for the distributed system is unlikely to become unavailable, which in turn ensures the high availability of the distributed system.
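
As a minimal illustrative sketch of the scan, read, and restore flow of steps 401 to 403, the following Python code uses a plain dictionary in place of the Zookeeper node tree. The directory path `/ha_jobs`, the JSON layout of the task information, and the function names are assumptions made for illustration and do not come from the application.

```python
import json

TASK_DIR = "/ha_jobs"  # hypothetical path of the preset directory node


def children(store, directory):
    """Return the names of the nodes directly under `directory`."""
    prefix = directory.rstrip("/") + "/"
    return sorted(
        path[len(prefix):]
        for path in store
        if path.startswith(prefix) and "/" not in path[len(prefix):]
    )


def recover_tasks(store, directory=TASK_DIR):
    """Scan the directory node, read task information, collect unfinished tasks."""
    restored = []
    for name in children(store, directory):               # step 401: scan
        info = json.loads(store[f"{directory}/{name}"])   # step 402: read
        if info.get("status") != "finished":              # optional status check
            restored.append(info)                         # step 403: re-establish
    return restored


# Task nodes written by the original master node device before it went down.
store = {
    "/ha_jobs": "",
    "/ha_jobs/task-1": json.dumps({"id": "task-1", "status": "running"}),
    "/ha_jobs/task-2": json.dumps({"id": "task-2", "status": "finished"}),
}
print([task["id"] for task in recover_tasks(store)])
```

In a real deployment the dictionary lookups would be replaced by calls to a Zookeeper client, and the restored task information would be handed to whatever component executes tasks.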

In another embodiment of the present application, FIG. 5 shows another distributed scheduling method provided by an embodiment of the present application. The method includes:

Step 501: after receiving a task request, establish a second target task according to the task request, and execute the second target task.

In this embodiment of the present application, after receiving a task request, the target master node device can establish a second target task according to the task request and execute it.

Optionally, in this embodiment of the present application, the task request may carry task information that includes task attribute information, that is, information related to establishing the second target task. The target master node device can establish the second target task according to the task information and execute it.

It should be noted that, in this embodiment of the present application, a target task established by the original master node device is called a first target task, and a task established by the target master node device is called a second target task. There is no temporal ordering between the first target task and the second target task.

Step 502: while executing the second target task, establish a task node corresponding to the second target task under the preset directory node in the Zookeeper system, and write the task information corresponding to the second target task into that task node.

While executing the second target task, the target master node device can access the directory node in the Zookeeper system using the pre-stored address of the directory node, and then establish a task node corresponding to the second target task under that directory node.

Here, a task node corresponding to the second target task is a task node that contains the task identifier of the second target task; the task identifier may be a task number or a name number. The target master node device can determine which target task a task node corresponds to by reading the task identifier in that task node.

After establishing the task node corresponding to the second target task, the target master node device can obtain the task information corresponding to the second target task. This task information includes task attribute information, a task name, a task identifier, a start time, a task status, and so on. The task attribute information is information related to establishing the second target task, and the task status can indicate whether the second target task has finished executing.

In this embodiment of the present application, after establishing the second target task, the target master node device writes the task information of the second target task into the task node corresponding to the second target task under the directory node in the Zookeeper system. If the target master node device goes down, a new master node device can obtain the task information of the unfinished second target task from under the directory node and restore the second target task, which ensures the high availability of the distributed system.

In an optional implementation of this embodiment, once the second target task has finished executing, the target master node device can delete the task node corresponding to the second target task from the directory node. When a new second target task arrives, a task node corresponding to the new second target task is established.

With this implementation, every task node that exists under the directory node corresponds to a second target task that has not finished executing. When the target master node device goes down, the new master node device can quickly identify the unfinished second target tasks, restore them, and continue executing them, which improves both the task execution efficiency and the availability of the distributed system.
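
The create-on-start, delete-on-finish lifecycle described above can be sketched as follows. This is an illustration under stated assumptions rather than the application's implementation: the class `TaskRegistry`, the path `/ha_jobs`, and the JSON fields are hypothetical, and a dictionary stands in for the Zookeeper node tree.

```python
import json

TASK_DIR = "/ha_jobs"  # hypothetical path of the preset directory node


class TaskRegistry:
    """Keeps one task node per unfinished task under the directory node."""

    def __init__(self):
        self.nodes = {}  # stand-in for the Zookeeper node tree

    def register(self, task_id, info):
        """Step 502: create the task node and write the task information."""
        path = f"{TASK_DIR}/{task_id}"
        self.nodes[path] = json.dumps({"id": task_id, **info})
        return path

    def complete(self, task_id):
        """Delete the task node once the task has finished executing."""
        self.nodes.pop(f"{TASK_DIR}/{task_id}", None)

    def unfinished(self):
        """Whatever remains under the directory node is unfinished by construction."""
        return sorted(json.loads(value)["id"] for value in self.nodes.values())


registry = TaskRegistry()
registry.register("job-a", {"name": "nightly-report", "status": "running"})
registry.register("job-b", {"name": "index-rebuild", "status": "running"})
registry.complete("job-a")
print(registry.unfinished())
```

Because finished tasks leave no node behind, a new master node device does not need to inspect any status field to decide what to restore; listing the directory node is enough.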

In practical applications, a second target task may be relatively large. If one second target task corresponds to one task node and the target master node device goes down before the task finishes, the new master node device must restore the entire second target task, and the part that the target master node device had already executed is executed again. This duplicated work in the distributed system reduces task execution efficiency. To solve this technical problem, an embodiment of the present application proposes a new distributed task scheduling method, shown in FIG. 6, which includes the following steps:

Step 601: after receiving a task request, establish a second target task according to the task request, and execute the second target task.

Step 602: divide the second target task into multiple target subtasks, each target subtask being a task of a different stage of the second target task.

In this embodiment of the present application, the user can, through the target master node device, divide the second target task into multiple target subtasks, each of which is an independent stage designed by the user. Each stage is the minimum granularity supported by a high-availability job.

Step 603: under the directory node, establish a task node corresponding to each target subtask in the order in which the target subtasks are executed, and write the task information corresponding to each target subtask into its task node.

The target master node device can establish multiple task nodes under the directory node, each of which stores the task information of one target subtask. The task information of each target subtask is written into its corresponding task node in execution order.

Optionally, the task information corresponding to a target subtask may include the identifier of the second target task, the identifier of the target subtask, the execution time, the execution status, and other information.

Optionally, in this embodiment of the present application, each time the target master node device finishes executing a target subtask, it can delete the task node corresponding to that target subtask from under the directory node, so that every task node remaining under the directory node corresponds to a target subtask that has not finished executing.

Optionally, in this embodiment of the present application, each time the target master node device finishes executing a target subtask, it can instead write the task status information of that target subtask into the task node corresponding to that target subtask. The task status information can indicate whether the target subtask has finished executing. In this case, the task nodes under the directory node include both task nodes of target subtasks that have finished executing and task nodes of target subtasks that have not.

Step 604: if the target master node device goes down, the new master node device restores the unfinished target subtasks from under the directory node in the Zookeeper system.

If the target master node device goes down, the distributed system can select a new master node device from the multiple secondary node devices. The new master node device scans the directory node, restores the target subtasks that have not finished executing, and continues executing them. The new master node device therefore only needs to restore some of the target subtasks of the second target task rather than restoring and re-executing the whole task, which avoids duplicated work in the distributed system and the resulting loss of task execution efficiency.
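
The stage-level recovery of steps 602 to 604 can be sketched as follows, using the variant in which the task node of a target subtask is deleted once that subtask finishes. A dictionary stands in for the Zookeeper node tree; the node naming scheme, the `fail_at` parameter that simulates the master going down, and the stage names are assumptions made for illustration.

```python
def create_stage_nodes(store, task_id, stages):
    """Step 603: one task node per stage, named so that sorting keeps execution order."""
    for index, stage in enumerate(stages):
        store[f"/ha_jobs/{task_id}-{index:04d}"] = stage


def run_stages(store, task_id, actions, fail_at=None):
    """Execute the remaining stages in order, deleting each node when its stage finishes."""
    prefix = f"/ha_jobs/{task_id}-"
    for path in sorted(p for p in store if p.startswith(prefix)):
        stage = store[path]
        if stage == fail_at:
            raise RuntimeError("master node device went down")  # simulated crash
        actions[stage]()
        del store[path]


stages = ["extract", "transform", "load"]
log = []
actions = {name: (lambda n=name: log.append(n)) for name in stages}

store = {}
create_stage_nodes(store, "job-7", stages)
try:
    run_stages(store, "job-7", actions, fail_at="transform")  # original master
except RuntimeError:
    pass
run_stages(store, "job-7", actions)  # new master resumes from the remaining nodes
print(log)
```

Each stage appears in `log` exactly once: the new master node device picks up at `transform` and does not repeat `extract`.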

In an optional implementation, establishing the task nodes of all target subtasks in the Zookeeper system as soon as the second target task is executed would occupy a relatively large amount of storage space in the Zookeeper system. An embodiment of the present application therefore provides another implementation that saves storage space on the Zookeeper system, including:

dividing the second target task into multiple target subtasks arranged in execution order, each target subtask being a task of a different stage of the second target task; storing the task information corresponding to each target subtask in a target task node; and, for each target subtask, when executing the target subtask, establishing a subtask node corresponding to the target subtask in the Zookeeper system based on the task information for that target subtask stored in the target task node, and deleting the subtask node corresponding to the preceding target subtask.

In this embodiment of the present application, after the second target task is divided into multiple target subtasks, a task node is not established in the Zookeeper system for every target subtask. Instead, the task information of all target subtasks is stored in the target task node, where it can be stored in execution order.

The target subtasks are then executed one by one in execution order. When the k-th target subtask is to be executed, the task information corresponding to the k-th target subtask is obtained from the target task node, a subtask node corresponding to the k-th target subtask is established in the Zookeeper system, and execution of the k-th target subtask begins.

In this way, the subtask nodes corresponding to the target subtasks are not all created at once. As the target subtasks are executed in sequence, a subtask node is established only for the target subtask that is currently being executed, and it is deleted after that subtask has been completed. The remaining target subtasks are handled in the same way, which saves storage space on the Zookeeper system.

In this embodiment of the present application, after the subtask node corresponding to the k-th target subtask has been established, the subtask node corresponding to the preceding target subtask, that is, the (k-1)-th target subtask, is deleted. The (k-1)-th target subtask has already finished executing, so finished target subtasks can be clearly distinguished and confusion is avoided.
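
A minimal sketch of this lazy creation of subtask nodes follows. It assumes a dictionary in place of the Zookeeper node tree, a target task node that holds the ordered list of subtask information, and hypothetical node names of the form `sub-k`; none of these names come from the application.

```python
def run_lazily(store, task_path, run_stage):
    """Create the node of subtask k only when k starts; drop the node of subtask k-1."""
    stages = store[task_path]  # ordered subtask information kept in the target task node
    for k, info in enumerate(stages):
        store[f"{task_path}/sub-{k}"] = info          # establish node for subtask k
        store.pop(f"{task_path}/sub-{k - 1}", None)   # delete node of subtask k-1
        run_stage(info, store)
    # The last subtask has finished, so its node is removed as well.
    store.pop(f"{task_path}/sub-{len(stages) - 1}", None)


seen = []  # subtask nodes that exist while each stage is running


def run_stage(info, store):
    seen.append(sorted(path for path in store if "/sub-" in path))


store = {"/ha_jobs/job-7": ["extract", "transform", "load"]}
run_lazily(store, "/ha_jobs/job-7", run_stage)
print(seen)
print(sorted(store))
```

While any stage is running, exactly one subtask node exists, so the storage used for subtask nodes stays constant regardless of how many stages the task has.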

As shown in FIG. 7, an embodiment of the present application provides a new distributed task scheduling method, which includes the following:

Step 701: while executing the second target task, acquire the task status information of the second target task until the second target task finishes executing.

In this embodiment of the present application, while executing the second target task, the target master node device can periodically acquire the task status information of the second target task, which indicates the execution status of the second target task. Optionally, the task status information may include a task name, a task identifier, a task stage identifier, an execution status, and a task stage start time or a task stage end time, where the execution status may be "executing" or "execution finished".

Optionally, if the task status information includes a task stage end time, the second target task has finished executing; if the task status information does not include a task stage end time, the second target task has not finished executing.

Step 702: write the task status information into the task node corresponding to the second target task, so that whether the second target task has finished executing can be determined after the target master node device goes down.

In this embodiment of the present application, the target master node device can write the task status information into the task node corresponding to the second target task. If the target master node device then goes down, the new master node device can scan the directory node and determine whether the second target task has finished executing from the task status information stored in the task node corresponding to the second target task.

In this embodiment of the present application, writing the task status information of the second target task into its corresponding task node makes it possible to determine more definitively whether the second target task has finished executing, so that when the target master node device goes down, the new master node device can quickly restore the unfinished second target task.
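
The optional rule that a task counts as finished only when its status information carries a stage end time can be sketched as follows. The field names, the numeric timestamps, and the dictionary used in place of the Zookeeper node tree are assumptions made for illustration.

```python
import json


def build_status(task_id, stage_id, started, ended=None):
    """Assemble task status information; the end time is present only when finished."""
    status = {"task_id": task_id, "stage_id": stage_id, "stage_start_time": started}
    if ended is not None:
        status["stage_end_time"] = ended
    return status


def write_status(store, path, status):
    """Step 702: overwrite the task node with the latest status information."""
    store[path] = json.dumps(status)


def is_finished(store, path):
    """Rule used by the new master: an end time means the task has finished."""
    return "stage_end_time" in json.loads(store[path])


store = {}
path = "/ha_jobs/job-7"
write_status(store, path, build_status("job-7", "load", started=100))
print(is_finished(store, path))
write_status(store, path, build_status("job-7", "load", started=100, ended=160))
print(is_finished(store, path))
```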

In practical applications, the task information corresponding to a second target task, or to some of its target subtasks, may be relatively large. Because the data storage capacity of each task node in the Zookeeper system is limited, the task information must be split if its size exceeds the storage capacity of a task node.

In this embodiment of the present application, when establishing the task node corresponding to the second target task, the target master node device can check whether the size of the task information corresponding to the second target task exceeds a data volume threshold. If it does, the target master node device establishes a number of task nodes matched to the size of that task information, and these task nodes together store the task information of the same second target task.

Correspondingly, the target master node device can also check whether the size of the task information of each target subtask exceeds the data volume threshold. If the task information of a target subtask exceeds the threshold, the target master node device establishes a number of task nodes matched to the size of that task information, and these task nodes together store the task information of the same target subtask.
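
One way to match the number of task nodes to the size of the task information is to cut the serialized task information into fixed-size chunks. The sketch below assumes this chunking strategy, a hypothetical `.partNNN` naming scheme, and a caller-supplied threshold; the application itself does not specify how the task information is divided.

```python
import math


def split_task_info(task_id, payload, limit):
    """Return {node_name: chunk}: one node if the payload fits, several otherwise."""
    if len(payload) <= limit:
        return {task_id: payload}
    count = math.ceil(len(payload) / limit)
    return {
        f"{task_id}.part{i:03d}": payload[i * limit:(i + 1) * limit]
        for i in range(count)
    }


def join_task_info(nodes):
    """Reassemble the task information by reading the nodes in name order."""
    return b"".join(nodes[name] for name in sorted(nodes))


payload = b"x" * 25
nodes = split_task_info("job-7", payload, limit=10)
print(sorted(nodes))
print(join_task_info(nodes) == payload)
```

The zero-padded part index keeps the nodes in the right order when they are listed and sorted by name during recovery.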

However, in practical applications, having the target master node device write the task information of one second target task, or of one target subtask, into different task nodes introduces a new transactional problem.

The following takes the case of writing the task information of one second target task into different task nodes as an example. The existing Zookeeper system does not support simultaneous transactional operations on multiple task nodes, so the target master node device cannot perform write or read operations on multiple task nodes synchronously. The following situation can therefore arise: when the target master node device writes the task information of a single second target task, the write is performed on only some of the task nodes corresponding to that task and not on the others. The existing Zookeeper system cannot identify which task nodes were written and which were not, which makes it difficult for the Zookeeper system to manage the data.

To solve the above technical problem, an embodiment of the present application provides an implementation for writing to multiple task nodes in the Zookeeper system. FIG. 8 is a flowchart of the method, provided by an embodiment of the present application, for performing a write operation on multiple task nodes in the Zookeeper system. The method includes the following steps:

Step 801: establish a transaction mark under the directory node.

In this embodiment of the present application, when the second target task to which the task status information belongs corresponds to multiple task nodes, the target master node device must first establish a transaction mark before writing the task status information. The transaction mark includes the identifiers of the at least two task nodes into which the task status information is to be written, and it indicates that the write operation is to be performed synchronously on the at least two task nodes corresponding to those identifiers.

Step 802: back up the task information in the at least two task nodes to obtain backup data.

Once the transaction mark has been established, the target master node device knows explicitly that it must perform the write operation on every one of the at least two task nodes identified in the mark, which prevents any task node from being missed.

It should be noted that data in a task node under the directory node in the Zookeeper system does not support update operations, only overwrite operations. To avoid data storage anomalies caused by the write operation, in this embodiment of the present application the original data in the at least two task nodes must be saved before the write operation is performed; the backup data is this original data.

Step 803: write the task status information into the at least two task nodes corresponding to the identifiers of the at least two task nodes.

The target master node device can locate the at least two task nodes under the directory node according to their identifiers, and then perform the write operation on them one by one.

Step 804: if the write succeeds, delete the backup data and the transaction mark.

In this embodiment of the present application, the write is successful once the write operation has been completed on all of the at least two task nodes. After a successful write, the backup data is no longer needed and is therefore deleted. A successful write also means that the transaction of writing the task status information into the task nodes is complete, so the transaction mark is deleted as well.

Step 805: if the write fails, restore the task information in the at least two task nodes corresponding to the identifiers of the at least two task nodes from the backup data, and delete the transaction mark.

If the write fails, the at least two task nodes have not completed the transaction of writing the task status information. To keep the data in the at least two task nodes accurate, the backup data is written back into them. At the same time, the transaction mark is deleted to indicate that this write operation has ended.

In this embodiment of the present application, establishing and deleting a transaction mark makes it possible to perform a simultaneous transactional operation on multiple task nodes and guarantees the atomicity of operations on the Zookeeper system. This solves the problem that the Zookeeper system does not support simultaneous transactional operations on multiple task nodes, and thereby fully supports a high-availability job framework based on the Zookeeper system.
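
Steps 801 to 805 can be sketched as follows, with a dictionary standing in for the Zookeeper node tree. The node names `_txn` and `_backup`, the choice to keep the backup data next to the task nodes, the removal of the backup after a rollback, and the `fail_on` parameter that simulates a failed write are all assumptions made for illustration; the application does not specify them.

```python
def write_all(store, directory, node_names, new_value, fail_on=None):
    """Write `new_value` to every named task node, or leave them all unchanged."""
    marker = f"{directory}/_txn"         # hypothetical transaction mark node
    backup_path = f"{directory}/_backup"  # hypothetical location of the backup data
    paths = [f"{directory}/{name}" for name in node_names]

    store[marker] = list(node_names)                      # step 801: transaction mark
    store[backup_path] = {p: store[p] for p in paths}     # step 802: back up originals
    try:
        for name, path in zip(node_names, paths):         # step 803: write one by one
            if name == fail_on:
                raise OSError(f"write to {path} failed")  # simulated failure
            store[path] = new_value
        succeeded = True                                  # step 804: write succeeded
    except OSError:
        store.update(store[backup_path])                  # step 805: restore originals
        succeeded = False
    store.pop(backup_path, None)  # backup data is no longer needed
    store.pop(marker, None)       # the transaction has ended either way
    return succeeded


names = ["job-7.part000", "job-7.part001"]
store = {f"/ha_jobs/{name}": "v1" for name in names}

ok = write_all(store, "/ha_jobs", names, "v2")
failed = write_all(store, "/ha_jobs", names, "v3", fail_on="job-7.part001")
print(ok, failed, store)
```

After the failed second call, both task nodes still hold `"v2"`: the first node was overwritten with `"v3"` and then restored from the backup, so no partially written state is left behind.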

It should be understood that although the steps in the flowcharts of FIG. 4 to FIG. 8 are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order in which the steps are executed, and they may be executed in other orders. Moreover, at least some of the steps in FIG. 4 to FIG. 8 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and they are not necessarily executed one after another but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in FIG. 9, a distributed task scheduling apparatus 900 is provided, including a scanning module 901, a reading module 902, and a restoration module 903, wherein:

the scanning module 901 is configured to scan the preset directory node in the Zookeeper system after the conversion from a secondary node device to the target master node device;

the reading module 902 is configured to read task information from at least one task node if the at least one task node exists under the directory node, wherein the at least one task node was created by the original master node device while executing the first target task, and the task information is information related to executing the first target task that was written by the original master node device;

the restoration module 903 is configured to restore the first target task according to the task information and execute the first target task.

In one of the embodiments, as shown in FIG. 10, a distributed task scheduling apparatus 1000 is provided, which further includes:

a receiving module 1001, configured to establish a second target task according to a task request after receiving the task request, and to execute the second target task;

an establishing module 1002, configured to establish, while the second target task is being executed, a task node corresponding to the second target task under the preset directory node in the Zookeeper system, and to write the task information corresponding to the second target task into that task node.

在其中一个实施例中，建立模块1002具体用于：In one of the embodiments, the establishment module 1002 is specifically used for:

将第二目标任务分割为多个目标子任务，各目标子任务为第二目标任务的不同阶段的任务；Divide the second target task into a plurality of target subtasks, and each target subtask is a task of different stages of the second target task;

在目录节点下按照各目标子任务的执行先后顺序建立与各目标子任务对应的任务节点，并将各目标子任务对应的任务信息写入与各目标子任务对应的任务节点中。A task node corresponding to each target subtask is established under the directory node according to the execution sequence of each target subtask, and the task information corresponding to each target subtask is written into the task node corresponding to each target subtask.

将第二目标任务分割为多个按照执行先后顺序排列的目标子任务，各目标子任务为第二目标任务的不同阶段的任务；Divide the second target task into a plurality of target subtasks arranged in order of execution, and each target subtask is a task of different stages of the second target task;

将各目标子任务对应的任务信息保存在目标任务节点中；Save the task information corresponding to each target subtask in the target task node;

对于各目标子任务，在执行目标子任务时，基于目标任务节点中存储的目标子任务对应的任务信息在Zookeeper系统中建立目标子任务对应的子任务节点，并删除目标子任务的前一目标子任务对应的子任务节点。For each target subtask, when the target subtask is executed, a subtask node corresponding to the target subtask is established in the Zookeeper system based on the task information corresponding to the target subtask stored in the target task node, and the previous target of the target subtask is deleted. The subtask node corresponding to the subtask.

在执行第二目标任务的过程中，获取第二目标任务的任务状态信息直至第二目标任务执行结束；任务状态信息用于表示第二目标任务的执行状态；In the process of executing the second target task, the task status information of the second target task is obtained until the execution of the second target task ends; the task status information is used to represent the execution status of the second target task;

将任务状态信息写入与第二目标任务对应的任务节点中，以供在目标主节点设备宕机之后，确定第二目标任务是否执行结束。The task status information is written into the task node corresponding to the second target task, so as to determine whether the execution of the second target task ends after the target master node device goes down.

Establish a transaction marker under the directory node, the transaction marker including the identifiers of at least two task nodes into which the task status information is to be written; the transaction marker indicates that the write operations on the at least two task nodes corresponding to those identifiers are to be performed synchronously;

Back up the task information in the at least two task nodes to obtain backup data;

Write the task status information into the at least two task nodes corresponding to the identifiers.
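The marker, backup and write sequence above, together with the success and failure handling recited in claim 7, might look like the following sketch. A plain dict again stands in for the Zookeeper tree; the node names `txn-marker` and `txn-backup`, the `status` field, and the `fail_on` test hook are all hypothetical.

```python
import json


def write_status_atomically(store, directory, node_ids, status, fail_on=None):
    """Write `status` into every listed task node, or into none of them."""
    marker = f"{directory}/txn-marker"       # hypothetical node name
    backup_path = f"{directory}/txn-backup"  # hypothetical node name
    # 1. Transaction marker naming the nodes that must change together.
    store[marker] = json.dumps(node_ids)
    # 2. Back up the current task information of those nodes.
    backup = {nid: store[f"{directory}/{nid}"] for nid in node_ids}
    store[backup_path] = json.dumps(backup)
    try:
        # 3. Write the status into each node.
        for nid in node_ids:
            if nid == fail_on:  # test hook that simulates a failed write
                raise OSError(f"write to {nid} failed")
            record = json.loads(backup[nid])
            record["status"] = status
            store[f"{directory}/{nid}"] = json.dumps(record)
        ok = True
    except OSError:
        # Failure: restore every node from the backup data.
        for nid, raw in backup.items():
            store[f"{directory}/{nid}"] = raw
        ok = False
    # The source deletes backup and marker on success and the marker on
    # failure; also dropping the backup on failure is an assumption.
    del store[backup_path]
    del store[marker]
    return ok
```

The return value tells the caller whether the group write took effect, so a failed attempt can be retried against unchanged task nodes.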

For specific limitations of the distributed task scheduling apparatus, refer to the limitations of the distributed task scheduling method above; they are not repeated here. Each module of the distributed task scheduling apparatus may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in, or be independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to each module.

In one embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program, and the processor implements the following steps when executing the computer program:

After being converted from a secondary node device to the target master node device, scan the preset directory node in the Zookeeper system;

If at least one task node exists under the directory node, read task information from the at least one task node, where the at least one task node was created by the original master node device in the process of executing the first target task, and the task information is information related to execution of the first target task written by the original master node device;

Restore the first target task according to the task information, and execute the first target task.
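The three recovery steps above can be sketched as a single function. This is an illustration under stated assumptions: a dict keyed by node path stands in for the Zookeeper tree, task information is assumed to be JSON, and `execute` is a hypothetical callback that restores and runs one task.

```python
import json


def recover_tasks(store, directory, execute):
    """Scan the directory node, read each task node, and re-run its task."""
    prefix = directory.rstrip("/") + "/"
    # Scan: the direct children of the directory node are the task nodes.
    task_paths = sorted(
        path for path in store
        if path.startswith(prefix) and "/" not in path[len(prefix):]
    )
    recovered = []
    for path in task_paths:
        info = json.loads(store[path])  # read the task information
        execute(info)                   # restore and execute the task
        recovered.append(path)
    return recovered
```

If no task node exists under the directory node, the function returns an empty list and nothing is executed, which corresponds to the case where the original master node left no unfinished work.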

In one of the embodiments, the processor may further implement the following steps when executing the computer program:

After receiving a task request, establish a second target task according to the task request, and execute the second target task;

In the process of executing the second target task, establish a task node corresponding to the second target task under the preset directory node in the Zookeeper system, and write the task information corresponding to the second target task into that task node.
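The write side of the scheme, in which the acting master node records each task it starts, might be sketched as below. The `MasterNode` class, the `task-NNNN` identifiers, the `/tasks` directory path and the layout of the task information are all hypothetical, and a dict stands in for the Zookeeper tree.

```python
import itertools
import json


class MasterNode:
    """Hypothetical master node that records each task before running it."""

    def __init__(self, store, directory="/tasks"):
        self.store = store          # dict standing in for the Zookeeper tree
        self.directory = directory  # assumed path of the preset directory node
        self._ids = itertools.count(1)

    def handle_request(self, request, execute):
        # Establish the task from the incoming request.
        task_id = f"task-{next(self._ids):04d}"
        info = {"id": task_id, "request": request}
        # Create the task node and write the task information into it.
        self.store[f"{self.directory}/{task_id}"] = json.dumps(info)
        # Run the task; the node is already in place if this master fails.
        execute(info)
        return task_id
```

Writing the node before the work starts is a design choice of this sketch: it means a failure at any point during execution still leaves a record that a successor can scan.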

The implementation principles and technical effects of the computer device provided in the embodiments of the present application are similar to those of the foregoing method embodiments and are not repeated here.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the following steps are implemented:

In one of the embodiments, when the computer program is executed by the processor, the following steps may also be implemented:

The implementation principles and technical effects of the computer-readable storage medium provided in this embodiment are similar to those of the foregoing method embodiments and are not repeated here.

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be carried out by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database or other media used in the embodiments provided in this application may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, and the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).

The technical features of the above embodiments may be combined in any manner. For brevity, not every possible combination of these technical features is described; however, any combination that involves no contradiction should be considered within the scope of this specification.

The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art may make several variations and improvements without departing from the concept of the present application, all of which fall within its scope of protection. Therefore, the scope of protection of this patent application shall be subject to the appended claims.

Claims

1. A distributed task scheduling method, wherein the method comprises:

After being converted from a secondary node device to a target master node device, scanning a preset directory node in a Zookeeper system;

If at least one task node exists under the directory node, reading task information from the at least one task node, wherein the at least one task node was created by the original master node device in the process of executing a first target task, and the task information is information related to execution of the first target task written by the original master node device;

Restoring the first target task according to the task information, and executing the first target task.

2. The method according to claim 1, wherein the method further comprises:

After receiving a task request, establishing a second target task according to the task request, and executing the second target task;

In the process of executing the second target task, establishing a task node corresponding to the second target task under the preset directory node in the Zookeeper system, and writing task information corresponding to the second target task into the task node corresponding to the second target task.

3. The method according to claim 2, wherein establishing the task node corresponding to the second target task under the preset directory node in the Zookeeper system, and writing the task information corresponding to the second target task into the task node corresponding to the second target task, comprises:

Dividing the second target task into a plurality of target subtasks, each of the target subtasks being a different stage of the second target task;

Establishing, under the directory node and in the execution order of the target subtasks, a task node corresponding to each target subtask, and writing the task information corresponding to each target subtask into the task node corresponding to that target subtask.

4. The method according to claim 3, wherein, in the process of executing the second target task, establishing the task node corresponding to the second target task under the preset directory node in the Zookeeper system, and writing the task information corresponding to the second target task into the task node corresponding to the second target task, comprises:

Dividing the second target task into a plurality of target subtasks arranged in order of execution, each of the target subtasks being a different stage of the second target task;

Saving the task information corresponding to each of the target subtasks in the target task node;

For each target subtask, when executing the target subtask, establishing a subtask node corresponding to the target subtask in the Zookeeper system based on the task information for that target subtask stored in the target task node, and deleting the subtask node corresponding to the preceding target subtask.

5. The method according to claim 2, wherein the method further comprises:

In the process of executing the second target task, acquiring task status information of the second target task until execution of the second target task ends, the task status information indicating the execution status of the second target task;

Writing the task status information into the task node corresponding to the second target task, so that, after the target master node device goes down, it can be determined whether execution of the second target task has ended.

6. The method according to claim 5, wherein writing the task status information into the task node corresponding to the second target task comprises:

Establishing a transaction marker under the directory node, the transaction marker including the identifiers of at least two task nodes into which the task status information is to be written, and the transaction marker indicating that the write operations on the at least two task nodes corresponding to those identifiers are to be performed synchronously;

Backing up the task information in the at least two task nodes to obtain backup data;

Writing the task status information into the at least two task nodes corresponding to the identifiers of the at least two task nodes.

7. The method according to claim 6, wherein the method further comprises:

If the writing succeeds, deleting the backup data and the transaction marker;

If the writing fails, restoring the task information in the at least two task nodes corresponding to the identifiers of the at least two task nodes according to the backup data, and deleting the transaction marker.

8. A distributed task scheduling apparatus, wherein the apparatus comprises:

A scanning module, configured to scan a preset directory node in a Zookeeper system after conversion from a secondary node device to a target master node device;

A reading module, configured to read task information from at least one task node if the at least one task node exists under the directory node, wherein the at least one task node was created by the original master node device in the process of executing a first target task, and the task information is information related to execution of the first target task written by the original master node device;

A restoring module, configured to restore the first target task according to the task information, and execute the first target task.

9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.

10. A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 7 are implemented.