
CN112162841B - Big data processing oriented distributed scheduling system, method and storage medium - Google Patents


Info

Publication number
CN112162841B
CN112162841B (application number CN202011069582.0A)
Authority
CN
China
Prior art keywords
workflow
task
leader
follower
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011069582.0A
Other languages
Chinese (zh)
Other versions
CN112162841A (en)
Inventor
黄立
蔡春茂
段朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Changan Automobile Co Ltd
Original Assignee
Chongqing Changan Automobile Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Changan Automobile Co Ltd
Priority to CN202011069582.0A
Publication of CN112162841A
Application granted
Publication of CN112162841B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24568Data stream processing; Continuous queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/546Message passing systems or structures, e.g. queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/54Indexing scheme relating to G06F9/54
    • G06F2209/548Queue
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a distributed scheduling system, method, and storage medium for big data processing. The system comprises a scheduling center module, responsible for the dependency configuration and job development of workflows; a leader module, serving as the task-flow segmentation and distribution node in the cluster, which splits the workflow configured by the scheduling center according to its dependencies and sends the resulting task nodes to follower nodes; a follower module, used for executing the specific computing tasks distributed by the leader module, submitting task results, and storing task execution logs; a coordinator module, used for periodically fetching tasks to be executed from the database and load-balancing them across leader modules with a Round-Robin algorithm according to the current load of each leader module; a task queue module; and a metadata module. The invention takes task dependencies into account, avoiding empty runs of downstream tasks caused by upstream tasks timing out or running empty, which benefits the overall data flow.

Description

Big data processing oriented distributed scheduling system, method and storage medium
Technical Field
The invention belongs to the technical field of big data computing task scheduling, and particularly relates to a distributed scheduling system, method, and storage medium for big data processing.
Background
With the rapid development of data technology, modern enterprises are moving from the IT age to the DT age, and whether they choose public clouds or self-built data centers, the big data platform has become part of the modern enterprise's infrastructure. Big data platforms have iterated from the initial single execution engine, MapReduce, to an era of multiple execution engines such as MapReduce, Spark, and Flink. In the process of mining data value, an enterprise generates thousands of data computing tasks that form an intricate dependency network, so how to schedule these tasks is particularly important.
Patent document CN107506381A discloses a big data distributed scheduling analysis method, system device, and storage medium: a self-built big data distributed scheduling and analysis system whose core functions are to encapsulate big data processing workflows and to provide a partial built-in scheduling capability. However, it proposes no method for handling the dependencies and orchestration of intricate task flows in a big data scenario, and the system as a whole has single points of failure with no high-availability fault-tolerance strategy, so it faces the following problems:
(1) The manner in which tasks are distributed does not take their dependencies into account. Once an upstream task times out or runs empty, the downstream task is very likely to run empty as well, which hinders the overall data flow and increases the burden on developers.
(2) The server that submits or executes computing tasks is a single point of failure; once that server goes down, computing tasks cannot be triggered and the computing logic is affected.
Therefore, there is a need to develop a distributed scheduling system, method and storage medium for big data processing.
Disclosure of Invention
In order to solve the problems, the invention provides a distributed scheduling system, a distributed scheduling method and a storage medium for big data processing.
In a first aspect, the present invention provides a distributed scheduling system for big data processing, including:
the scheduling center module, responsible for the dependency configuration and job development of workflows, and for persisting the configured workflow through an API to the to-be-executed workflow table of the relational database;
the leader module, serving as the task-flow segmentation and distribution node in the cluster, which splits the workflow configured by the scheduling center according to its dependencies and sends the resulting task nodes to follower nodes;
the follower module, also called the executor, used for executing the specific computing tasks distributed by the leader module, submitting task results, and storing task execution logs;
the coordinator module, used for periodically fetching tasks to be executed from the database and load-balancing them across leader modules with a Round-Robin algorithm according to the current load of each leader module;
the task queue module, a message queue comprising a workflow topic, a task topic, and a task result topic, used for realizing task dependencies among workflows;
the metadata module, comprising two databases: a relational database, used for persistently storing workflow execution records, and a distributed in-memory database, used for loading workflow-related metadata from the relational database into memory.
In a second aspect, the big data processing oriented distributed scheduling method of the present invention uses the big data processing oriented distributed scheduling system of the present invention, and comprises the following steps:
receiving the dependency configuration and job development of the workflow, and persisting the configured workflow through an API (application programming interface) to the to-be-executed workflow table of a relational database;
splitting the workflow configured by the scheduling center according to its dependency relationships, and sending the resulting task nodes to follower nodes;
executing the specific computing tasks distributed by the leader module, submitting task results, and storing task execution logs;
periodically fetching tasks to be executed from the database, and load-balancing them across leader modules with a Round-Robin algorithm according to the current load of each leader module.
Further, the coordinator service periodically scans the to-be-executed workflow table in the relational database to obtain to-be-executed commands, periodically requests the load information of the Leader cluster from the ZooKeeper cluster, and distributes each workflow to the corresponding Leader with a Round-Robin algorithm according to the current CPU and memory headroom of each Leader machine; finally, it sends the workflow, marked with its Leader, to the process_instance topic in the message queue and waits for that Leader to consume the topic and execute the workflow.
Further, the Leader consumes the process_instance topic in the message queue and judges from the leader_host_name field in the message whether it should execute the workflow; if so, it splits the workflow into a plurality of computing tasks according to the workflow's dependency relationships and estimates the computing resources each task requires; it obtains the load information of each machine in the Follower cluster from the ZooKeeper cluster and uses a Round-Robin algorithm to load-balance the choice of executor for each task; it attaches the corresponding executor's follower_host_name to each independent computing task after splitting and sends the tasks to the task_instance topic in the message queue; while the workflow's execution thread is suspended, the Leader consumes the Follower data returned on the task_instance_result topic of the message queue and updates the workflow's execution results; and when the execution state of the whole workflow reaches a final state, the workflow execution results are persisted to the relational database.
Further, the Follower consumes the data on the task_instance topic in the message queue and, matching on follower_host_name, executes tasks with the corresponding executor; after a task finishes, its execution result is written back to the task_instance_result topic in the message queue, where it waits for the Leader to consume it; once the Leader has written the completed task results back to the relational database, execution of the whole task flow is complete.
Further, when the system starts, the Leaders and Followers register themselves under the leader_mechs and follower_mechs znodes on ZooKeeper, publish the local machine's CPU and memory information, and maintain heartbeats; every Leader and Follower watches these znodes; once a Leader or Follower is found to be down, a workflow fault-tolerance flow is entered, comprising a Leader fault-tolerance flow and a Follower fault-tolerance flow.
Further, the Leader fault-tolerance flow is specifically:
each machine of the Leader cluster watches the Leader znode on the ZooKeeper cluster; once a Leader is found to be down, a distributed lock mechanism based on the ZooKeeper cluster is triggered: one of the surviving Leaders acquires the distributed lock, triggers the workflow fault-tolerance logic, and inserts the information of the workflows needing fault tolerance into the fault-tolerance command table in the relational database; the Leader holding the distributed lock then takes over those workflows, completing the Leader's distributed fault-tolerance process.
Further, the Follower fault-tolerance flow is specifically:
each machine in the Follower cluster registers itself under a znode on the ZooKeeper cluster; if a Follower that is executing tasks goes down, the Leader's watch mechanism is triggered, all tasks running on the downed Followers are terminated, the Leader marks the affected workflows as needing fault tolerance, and surviving Followers are reselected as the executors of the workflows' remaining tasks.
In a third aspect, the storage medium of the present invention stores a computer readable program which, when invoked by an executor, can perform the steps of the big data processing oriented distributed scheduling method of the present invention.
The invention has the following advantages:
(1) Task distribution takes task dependencies into account, avoiding empty runs of downstream tasks caused by upstream tasks timing out or running empty; this benefits the overall data flow and reduces the burden on developers.
(2) Fault-tolerance strategies are designed so that when a single point of failure occurs anywhere in the system, the corresponding workflows can be taken over and execution can continue.
(3) The whole scheduling cluster supports linear scale-out of the Leader and Follower (worker) nodes.
Drawings
FIG. 1 is an architecture diagram of a Follower executor node in this embodiment;
FIG. 2 is a general architecture diagram of the present embodiment;
FIG. 3 is a workflow execution flow chart of the present embodiment;
FIG. 4 is a diagram showing the fault tolerance principle of the Leader according to the present embodiment;
FIG. 5 is a Follower fault-tolerance schematic diagram of this embodiment.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
In this embodiment, a distributed scheduling system for big data processing includes:
The scheduling center module provides a scheduling center Web interface that gives users a simple, convenient window for visual task configuration, along with monitoring and operations functions for the scheduling platform. The scheduling center module is responsible for the dependency configuration and job development of workflows, and persists the configured workflow through an API to the to-be-executed workflow table of a relational database (DB for short). Workflow dependencies are described in Json: the Json data of each job stores the ids of its predecessor and successor jobs.
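As an illustration only, the Json description of a small three-job workflow might look like the following sketch; the field names (pre_job_ids, post_job_ids, and so on) are assumptions chosen for clarity, not names taken from the patent:

```python
import json

# A hypothetical three-job workflow: each job records its predecessor and
# successor job ids, as the patent describes.
workflow = {
    "workflow_id": "wf_001",
    "jobs": [
        {"job_id": "j1", "engine": "sqoop", "pre_job_ids": [],     "post_job_ids": ["j2"]},
        {"job_id": "j2", "engine": "spark", "pre_job_ids": ["j1"], "post_job_ids": ["j3"]},
        {"job_id": "j3", "engine": "hive",  "pre_job_ids": ["j2"], "post_job_ids": []},
    ],
}

# The scheduling center would persist this Json into the to-be-executed
# workflow table of the relational database via the API.
print(json.dumps(workflow, indent=2))
```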
The leader module serves as the task-flow segmentation and distribution node in the cluster; it splits the workflow configured by the scheduling center according to its dependencies and sends the resulting task nodes to follower nodes.
The follower module, also called the executor, is used for executing the specific computing tasks distributed by the leader module, submitting task results, and storing task execution logs.
The coordinator module is used for periodically fetching tasks to be executed from the database and load-balancing them across leader modules with a Round-Robin algorithm according to the current load of each leader module.
The task queue module is a message queue (MQ for short) comprising a workflow topic, a task topic, and a task result topic.
The metadata module comprises two databases: a relational database, used for persistently storing workflow execution records, and a distributed in-memory database, which loads workflow-related metadata from the relational database into memory, reducing latency during workflow execution and improving efficiency.
The system is a distributed, multi-execution-engine task scheduling system aimed at the intricate computing-task scenarios of big data platforms. Following a decentralized design, it can split user-defined computing-task workflows and distribute the resulting computing tasks to the Followers in the cluster for execution. Task dependencies between workflows are implemented with message queues, giving the system the expressive power of complex dependency DAGs. High availability of the leader and follower modules is built on the distributed coordination service ZooKeeper, so the whole system can scale linearly while running.
In a data platform, a complete data processing task comprises four stages: data access, data cleaning, data mining, and storage of analysis results; that is, a complete data processing workflow involves several computing engines. As shown in FIG. 1, a Follower is not the node that actually runs a computing task; rather, a gateway node of the big data platform, i.e. the node from which computing tasks are submitted, serves as the Follower node. The Follower node hosts the clients of computing engines such as Sqoop, Spark, Flink, and Hive. Because it is not itself the execution node of the computing tasks, the system gains the ability to schedule tasks across multiple execution engines, decouples the scheduling platform from the computing platform, and avoids resource contention.
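A minimal sketch of this gateway idea, assuming each engine is reachable through its command-line client installed on the Follower node; the command templates below are illustrative guesses, not taken from the patent:

```python
import subprocess

# Hypothetical command templates per engine client on the gateway node.
ENGINE_CLIENTS = {
    "sqoop": ["sqoop", "job", "--exec"],
    "spark": ["spark-submit"],
    "flink": ["flink", "run"],
    "hive":  ["hive", "-f"],
}

def submit(engine: str, artifact: str) -> int:
    """Submit a task through the matching engine client and return its exit code."""
    cmd = ENGINE_CLIENTS[engine] + [artifact]
    return subprocess.run(cmd).returncode
```

Because the Follower only shells out to engine clients, adding a new execution engine amounts to installing its client and extending the mapping, which is what decouples scheduling from computation.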
In this embodiment, the big data processing oriented distributed scheduling method uses the big data processing oriented distributed scheduling system described in this embodiment, and comprises the following steps:
receiving the dependency configuration and job development of the workflow, and persisting the configured workflow through an API (application programming interface) to the to-be-executed workflow table of a relational database;
splitting the workflow configured by the scheduling center according to its dependency relationships, and sending the resulting task nodes to follower nodes;
executing the specific computing tasks distributed by the leader module, submitting task results, and storing task execution logs;
periodically fetching tasks to be executed from the database, and load-balancing them across leader modules with a Round-Robin algorithm according to the current load of each leader module.
As shown in fig. 2 and 3, the specific flow of the method is as follows:
The scheduling center module is responsible for the dependency configuration and job development of workflows, and persists the configured workflow through an API to the to-be-executed workflow table of the relational database; workflow dependencies are described in Json, and the Json data of each job stores the ids of its predecessor and successor jobs.
The coordinator service periodically scans the to-be-executed workflow table in the relational database to obtain to-be-executed commands, and periodically requests the load information of the Leader cluster from ZooKeeper (ZK for short); using a Round-Robin algorithm, it distributes each workflow to the corresponding Leader according to the current CPU and memory headroom of each Leader machine. Finally, it sends the workflow, marked with its Leader, to the process_instance topic in the message queue and waits for that Leader to consume the topic and execute the workflow.
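The dispatch step might be sketched as below, assuming the load of each Leader is read from ZooKeeper as free CPU and memory fractions; the 0.2 headroom floor and all names are illustrative assumptions:

```python
from itertools import cycle

def pick_leader(leaders, rr):
    """Round-Robin over the Leaders that currently have CPU/memory headroom."""
    eligible = {host for host, (cpu_free, mem_free) in leaders.items()
                if cpu_free > 0.2 and mem_free > 0.2}   # arbitrary headroom floor
    if not eligible:
        raise RuntimeError("no Leader has headroom")
    for host in rr:                                     # fixed rotation order
        if host in eligible:
            return host

# Load figures as they might be read from ZooKeeper: host -> (cpu_free, mem_free).
leaders = {"leader-1": (0.5, 0.6), "leader-2": (0.1, 0.9), "leader-3": (0.4, 0.4)}
rr = cycle(sorted(leaders))
msg = {"workflow_id": "wf_001", "leader_host_name": pick_leader(leaders, rr)}
# msg would now be sent to the process_instance topic for that Leader to consume.
print(msg)
```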
The Leader consumes the process_instance topic in the message queue and judges from the leader_host_name field in the message whether it should execute the workflow. If so, it splits the workflow into a plurality of computing tasks according to the workflow's dependency relationships and estimates the computing resources each task requires. It obtains the load information of each machine in the Follower cluster from the ZooKeeper cluster and uses a Round-Robin algorithm to load-balance the choice of executor for each task; it attaches the corresponding executor's follower_host_name to each independent computing task after splitting and sends the tasks to the task_instance topic in the message queue. While the workflow's execution thread is suspended, the Leader consumes the Follower data returned on the task_instance_result topic of the message queue and updates the workflow's execution results. When the execution state of the whole workflow reaches a final state, the workflow execution results are persisted to the relational database.
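A sketch of the Leader-side logic under the message fields described above; leader_host_name and pre_job_ids carry over from the earlier illustrative sketches, and `done` stands for the set of job ids already finished:

```python
import socket
from itertools import cycle

MY_HOST = socket.gethostname()

def on_process_instance(msg, followers, done):
    """Dispatch the tasks of one workflow message whose predecessors are all done."""
    if msg["leader_host_name"] != MY_HOST:
        return []                       # the workflow was assigned to another Leader
    rr = cycle(followers)               # Round-Robin over the Follower cluster
    tasks = []
    for job in msg["jobs"]:
        if set(job["pre_job_ids"]) <= done:
            task = dict(job, follower_host_name=next(rr))
            tasks.append(task)          # would be sent to the task_instance topic
    return tasks
```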
The Followers consume the data on the task_instance topic in the message queue, and the executor whose follower_host_name matches runs each task. After a task finishes, its execution result is written back to the task_instance_result topic in the message queue, where it waits for the Leader to consume it. Once the Leader has written the completed task results back to the relational database, execution of the whole task flow is complete.
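The Follower side might look like the following sketch, with run_task as a stand-in for invoking the engine clients and produce_result as a stand-in for the message-queue producer; both names are assumptions:

```python
import socket
import subprocess

MY_HOST = socket.gethostname()

def run_task(task):
    # Placeholder executor: the real system would call the matching engine
    # client (Sqoop/Spark/Flink/Hive) installed on this gateway node.
    return subprocess.run(task["command"]).returncode

def on_task_instance(task, produce_result):
    """Handle one message from the task_instance topic (a sketch)."""
    if task["follower_host_name"] != MY_HOST:
        return                          # the task belongs to another Follower
    state = "SUCCESS" if run_task(task) == 0 else "FAILED"
    # Written back to the task_instance_result topic for the Leader to consume.
    produce_result({"job_id": task["job_id"], "state": state})
```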
As a scheduling system with distributed capabilities, fault-tolerant design is a core concern of the whole system, because distributed systems are inherently unreliable. Distributed fault tolerance of the whole scheduling system is built on ZooKeeper. When the system starts, the Leaders and Followers register themselves under the leader_mechs and follower_mechs znodes on ZooKeeper, publish the local machine's CPU and memory information, and maintain heartbeats; every Leader and Follower watches these znodes. Once a Leader or Follower is found to be down, a workflow fault-tolerance flow is entered, comprising a Leader fault-tolerance flow and a Follower fault-tolerance flow.
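A registration sketch using the kazoo ZooKeeper client and psutil for load readings; the patent names neither library, so both are assumptions, but the ephemeral-znode-as-heartbeat pattern matches the text:

```python
import json
import socket

import psutil                       # assumed available for CPU/memory readings
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

me = socket.gethostname()
vm = psutil.virtual_memory()
info = json.dumps({
    "host": me,
    "cpu_free": 1 - psutil.cpu_percent(interval=1) / 100,
    "mem_free": vm.available / vm.total,
}).encode()

# An ephemeral znode doubles as the heartbeat: it vanishes when the session dies.
zk.create(f"/leader_mechs/{me}", info, ephemeral=True, makepath=True)

@zk.ChildrenWatch("/leader_mechs")
def on_leaders_changed(children):
    # A shrinking child list means some Leader went down: enter fault tolerance.
    print("live leaders:", children)
```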
As shown in FIG. 4, in this embodiment the Leader fault-tolerance flow is specifically:
Each machine of the Leader cluster watches the Leader znode on ZooKeeper; once a Leader is found to be down, a ZooKeeper-based distributed lock mechanism is triggered: one of the surviving Leaders acquires the distributed lock, triggers the workflow fault-tolerance logic, and inserts the information of the workflows needing fault tolerance into the fault-tolerance command table in the relational database; the Leader holding the distributed lock then takes over those workflows, completing the Leader's distributed fault-tolerance process. As shown in FIG. 4, after Leader1 goes down, Leader2 acquires the distributed lock, triggers the workflow fault-tolerance logic, inserts the information of the workflows needing fault tolerance into the fault-tolerance command table in the relational database, and then takes over the workflows, completing the Leader's distributed fault-tolerance process.
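The take-over step might be sketched with kazoo's distributed lock recipe as follows; the table and column names (t_command_fault_tolerance, t_workflow_running) are invented for illustration, and db stands for any DB-API style handle:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181")
zk.start()

def on_leader_down(dead_leader, db):
    """Race for the failover lock; the winner records the fault-tolerance command."""
    lock = zk.Lock("/locks/leader_failover", identifier=dead_leader)
    with lock:                           # exactly one surviving Leader enters
        db.execute(
            "INSERT INTO t_command_fault_tolerance (workflow_id, dead_host) "
            "SELECT workflow_id, %s FROM t_workflow_running WHERE leader_host = %s",
            (dead_leader, dead_leader),
        )
        # The lock holder then takes over the affected workflows and resumes them.
```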
In this embodiment, the Follower fault-tolerance flow is specifically:
each machine in the Follower cluster registers itself under a znode on the ZooKeeper cluster; if a Follower that is executing tasks goes down, the Leader's watch mechanism is triggered, all tasks running on the downed Followers are terminated, the Leader marks the affected workflows as needing fault tolerance, and surviving Followers are reselected as the executors of the workflows' remaining tasks.
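Seen from the Leader, the Follower watch might be sketched as below, again with kazoo; the bookkeeping structures are assumptions introduced only to make the flow concrete:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181")
zk.start()

known = set()                            # Followers seen alive so far
running = {}                             # follower host -> list of its running tasks

@zk.ChildrenWatch("/follower_mechs")
def on_followers_changed(children):
    global known
    dead, known = known - set(children), set(children)
    for host in dead:
        for task in running.pop(host, []):
            task["state"] = "NEED_FAULT_TOLERANCE"     # terminate & mark the workflow
            if known:                                   # reselect a surviving Follower
                task["follower_host_name"] = sorted(known)[0]
                # the task would then be re-sent to the task_instance topic
```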
As shown in FIG. 5, when Follower1 goes down, its tasks are re-executed by Follower2.
In this embodiment, a storage medium stores a computer readable program which, when invoked by an executor, can perform the steps of the big data processing oriented distributed scheduling method described in this embodiment.

Claims (5)

1. A distributed scheduling system for big data processing, comprising:
the scheduling center module, responsible for the dependency configuration and job development of workflows, and for persisting the configured workflow through an API to the to-be-executed workflow table of the relational database;
the leader module, serving as the task-flow segmentation and distribution node in the cluster, which splits the workflow configured by the scheduling center according to its dependencies and sends the resulting task nodes to follower nodes;
the follower module, also called the executor, used for executing the specific computing tasks distributed by the leader module, submitting task results, and storing task execution logs;
the coordinator module, used for periodically fetching tasks to be executed from the database and load-balancing them across leader modules with a Round-Robin algorithm according to the current load of each leader module;
the task queue module, a message queue comprising a workflow topic, a task topic, and a task result topic, used for realizing task dependencies among workflows;
the metadata module, comprising two databases: a relational database, used for persistently storing workflow execution records, and a distributed in-memory database, used for loading workflow-related metadata from the relational database into memory;
when the system starts, the Leaders and Followers register themselves under the leader_mechs and follower_mechs znodes on ZooKeeper, publish the local machine's CPU and memory information, and maintain heartbeats; every Leader and Follower watches these znodes; once a Leader or Follower is found to be down, a workflow fault-tolerance flow is entered, comprising a Leader fault-tolerance flow and a Follower fault-tolerance flow;
the Leader fault-tolerance flow is specifically:
each machine of the Leader cluster watches the Leader znode on ZooKeeper; once a Leader is found to be down, a ZooKeeper-based distributed lock mechanism is triggered: one of the surviving Leaders acquires the distributed lock, triggers the workflow fault-tolerance logic, and inserts the information of the workflows needing fault tolerance into the fault-tolerance command table in the relational database; the Leader holding the distributed lock then takes over those workflows, completing the Leader's distributed fault-tolerance process;
the Follower fault-tolerance flow is specifically:
each machine in the Follower cluster registers itself under a znode on ZooKeeper; if a Follower that is executing tasks goes down, the Leader's watch mechanism is triggered, all tasks running on the downed Followers are terminated, the Leader marks the affected workflows as needing fault tolerance, and surviving Followers are reselected as the executors of the workflows' remaining tasks;
the Leader consumes the process_instance topic in the message queue and judges from the leader_host_name field in the message whether it should execute the workflow; if so, it splits the workflow into a plurality of computing tasks according to the workflow's dependency relationships and estimates the computing resources each task requires; it obtains the load information of each machine in the Follower cluster from the ZooKeeper cluster and uses a Round-Robin algorithm to load-balance the choice of executor for each task; it attaches the corresponding executor's follower_host_name to each independent computing task after splitting and sends the tasks to the task_instance topic in the message queue; while the workflow's execution thread is suspended, the Leader consumes the Follower data returned on the task_instance_result topic of the message queue and updates the workflow's execution results; and when the execution state of the whole workflow reaches a final state, the workflow execution result state is persisted to the relational database.
2. A distributed scheduling method for big data processing, characterized in that it uses the distributed scheduling system for big data processing according to claim 1, the method comprising the following steps:
receiving the dependency configuration and job development of the workflow, and persisting the configured workflow through an API (application programming interface) to the to-be-executed workflow table of a relational database;
splitting the workflow configured by the scheduling center according to its dependency relationships, and sending the resulting task nodes to follower nodes;
executing the specific computing tasks distributed by the leader module, submitting task results, and storing task execution logs;
periodically fetching tasks to be executed from the database, and load-balancing them across leader modules with a Round-Robin algorithm according to the current load of each leader module.
3. The distributed scheduling method for big data processing according to claim 2, characterized in that: the coordinator service periodically scans the to-be-executed workflow table in the relational database to obtain to-be-executed commands, periodically requests the load information of the Leader cluster from ZooKeeper, and distributes each workflow to the corresponding Leader with a Round-Robin algorithm according to the current CPU and memory headroom of each Leader machine; finally, it sends the workflow, marked with its Leader, to the process_instance topic in the message queue and waits for that Leader to consume the topic and execute the workflow.
4. The distributed scheduling method for big data processing according to claim 3, characterized in that: the Follower consumes the data on the task_instance topic in the message queue and, matching on follower_host_name, executes tasks with the corresponding executor; after a task finishes, its execution result is written back to the task_instance_result topic in the message queue, where it waits for the Leader to consume it; once the Leader has written the completed task results back to the relational database, execution of the whole task flow is complete.
5. A storage medium having a computer readable program stored therein, characterized in that: the computer readable program, when invoked by an executor, is capable of performing the steps of the big data processing oriented distributed scheduling method of any of claims 2 to 4.
CN202011069582.0A 2020-09-30 2020-09-30 Big data processing oriented distributed scheduling system, method and storage medium Active CN112162841B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011069582.0A CN112162841B (en) 2020-09-30 2020-09-30 Big data processing oriented distributed scheduling system, method and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011069582.0A CN112162841B (en) 2020-09-30 2020-09-30 Big data processing oriented distributed scheduling system, method and storage medium

Publications (2)

Publication Number Publication Date
CN112162841A (en) 2021-01-01
CN112162841B (en) 2024-09-06

Family

ID=73861133

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011069582.0A Active CN112162841B (en) 2020-09-30 2020-09-30 Big data processing oriented distributed scheduling system, method and storage medium

Country Status (1)

Country Link
CN (1) CN112162841B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113010295B (en) * 2021-03-30 2024-06-11 中信银行股份有限公司 Stream computing method, device, equipment and storage medium
CN113438281B (en) * 2021-06-05 2023-02-28 济南浪潮数据技术有限公司 Storage method, device, equipment and readable medium of distributed message queue
CN113535362B (en) * 2021-07-26 2023-07-28 北京计算机技术及应用研究所 Distributed scheduling system architecture and micro-service workflow scheduling method
CN113821322B (en) * 2021-09-10 2024-08-06 浙江数新网络有限公司 Loosely coupled distributed workflow coordination system and method
CN115131130B (en) * 2022-07-01 2024-11-26 江苏苏商银行股份有限公司 A configurable accounting core automatic batch running method, device and storage medium
CN115840631B (en) * 2023-01-04 2023-05-16 中科金瑞(北京)大数据科技有限公司 RAFT-based high-availability distributed task scheduling method and equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110888719A (en) * 2019-09-18 2020-03-17 广州市巨硅信息科技有限公司 Distributed task scheduling system and method based on web service
CN111400017A (en) * 2020-03-26 2020-07-10 华泰证券股份有限公司 Distributed complex task scheduling method

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7403945B2 (en) * 2004-11-01 2008-07-22 Sybase, Inc. Distributed database system providing data and space management methodology
US7979859B2 (en) * 2005-05-03 2011-07-12 International Business Machines Corporation Managing automated resource provisioning with a workload scheduler
US20150067028A1 (en) * 2013-08-30 2015-03-05 Indian Space Research Organisation Message driven method and system for optimal management of dynamic production workflows in a distributed environment
CN104683422B (en) * 2013-12-03 2019-01-29 腾讯科技(深圳)有限公司 Data transmission method and device
US9560119B2 (en) * 2014-12-23 2017-01-31 Cisco Technology, Inc. Elastic scale out policy service
CN107766147A (en) * 2016-08-23 2018-03-06 上海宝信软件股份有限公司 Distributed data analysis task scheduling system
CN111209301A (en) * 2019-12-29 2020-05-29 南京云帐房网络科技有限公司 Method and system for improving computing performance based on dependency tree splitting
CN111338774B (en) * 2020-02-21 2023-09-19 华云数据有限公司 Distributed timing task scheduling system and computing device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110888719A (en) * 2019-09-18 2020-03-17 广州市巨硅信息科技有限公司 Distributed task scheduling system and method based on web service
CN111400017A (en) * 2020-03-26 2020-07-10 华泰证券股份有限公司 Distributed complex task scheduling method

Also Published As

Publication number Publication date
CN112162841A (en) 2021-01-01

Similar Documents

Publication Publication Date Title
CN112162841B (en) Big data processing oriented distributed scheduling system, method and storage medium
US8914805B2 (en) Rescheduling workload in a hybrid computing environment
US10042886B2 (en) Distributed resource-aware task scheduling with replicated data placement in parallel database clusters
US8024529B2 (en) Providing shared memory in a distributed computing system
US8739171B2 (en) High-throughput-computing in a hybrid computing environment
CN105808334B A resource-reuse-based short-job optimization system and method for MapReduce
JP7203102B2 (en) manage the computer cluster interface
CN112860393B (en) Distributed task scheduling method and system
CN102147755B (en) Multi-core system fault tolerance method based on memory caching technology
CN112114973B (en) Data processing method and device
Mei et al. Fault-tolerant dynamic rescheduling for heterogeneous computing systems
CN110471777B (en) An implementation method and system for sharing and using Spark cluster among multiple users in a Python-Web environment
US11748164B2 (en) FAAS distributed computing method and apparatus
CN102999317A (en) Multi-tenant oriented elastic multi-process service processing method
CN112948096A (en) Batch scheduling method, device and equipment
CN111459622A (en) Method and device for scheduling virtual CPU, computer equipment and storage medium
Chen et al. Pisces: optimizing multi-job application execution in mapreduce
CN115617480A (en) Task scheduling method, device and system and storage medium
CN118331708A (en) Dynamic queue scheduling method and system
Tang et al. A network load perception based task scheduler for parallel distributed data processing systems
CN114968553A (en) Heterogeneous server automatic scheduling system and method for massive machine learning tasks
CN102915257A (en) TORQUE(tera-scale open-source resource and queue manager)-based parallel checkpoint execution method
JPH11353284A (en) Job re-execution method
Cheng et al. Permanent fault-tolerant scheduling in heterogeneous multi-core real-time systems
Zhai et al. Scalability and Fault Tolerance for Real-Time Edge Applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant