
CN112162841B - Big data processing oriented distributed scheduling system, method and storage medium - Google Patents


Info

Publication number
CN112162841B
CN112162841B (application number CN202011069582.0A)
Authority
CN
China
Prior art keywords
workflow
task
leader
follower
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011069582.0A
Other languages
Chinese (zh)
Other versions
CN112162841A (en)
Inventor
黄立
蔡春茂
段朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Changan Automobile Co Ltd
Original Assignee
Chongqing Changan Automobile Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Changan Automobile Co Ltd
Priority to CN202011069582.0A
Publication of CN112162841A
Application granted
Publication of CN112162841B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24568Data stream processing; Continuous queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/546Message passing systems or structures, e.g. queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/54Indexing scheme relating to G06F9/54
    • G06F2209/548Queue
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a distributed scheduling system, method, and storage medium for big data processing. The system comprises a scheduling center module, responsible for the dependency configuration and job development of workflows; a leader module, serving as the task-flow segmentation and distribution node in the cluster, which splits the workflow configured by the scheduling center according to its dependencies and sends the resulting task nodes to follower nodes; a follower module, used for executing the specific computing tasks distributed by the leader module, submitting task results, and storing task execution logs; a coordinator module, used for periodically fetching tasks to be executed from the database and load-balancing them across leader modules with a Round-Robin algorithm according to the current load of each leader module; a task queue module; and a metadata module. The invention takes task dependencies into account, avoiding empty runs of downstream tasks caused by upstream tasks timing out or running empty, which benefits the overall data flow.

Description

Big data processing oriented distributed scheduling system, method and storage medium
Technical Field
The invention belongs to the technical field of big data computing task scheduling, and particularly relates to a distributed scheduling system, method, and storage medium for big data processing.
Background
With the rapid development of data technology, modern enterprises are moving from the IT age to the DT age, and whether they choose public clouds or self-built data centers, the big data platform has become part of the modern enterprise's infrastructure. Big data platforms have iterated from the initial single execution engine, MapReduce, to an era of multiple execution engines such as MapReduce, Spark, and Flink. In the process of mining data value, an enterprise generates thousands of data computing tasks that form an intricate dependency network, so how to schedule these tasks is particularly important.
Patent document CN107506381A discloses a big data distributed scheduling analysis method, system device, and storage medium: a self-built big data distributed scheduling and analysis system whose core functions are to encapsulate big data processing workflows and to provide a partial built-in scheduling capability. However, it proposes no method for handling the dependencies and orchestration of intricate task flows in a big data scenario, and the system as a whole has single points of failure with no high-availability fault-tolerance strategy, so it faces the following problems:
(1) The manner in which tasks are distributed does not take their dependencies into account. Once an upstream task times out or runs empty, the downstream task is very likely to run empty as well, which hinders the overall data flow and increases the burden on developers.
(2) The server that submits or executes computing tasks is a single point of failure; once that server goes down, computing tasks cannot be triggered and the computing logic is affected.
Therefore, there is a need to develop a distributed scheduling system, method and storage medium for big data processing.
Disclosure of Invention
In order to solve the problems, the invention provides a distributed scheduling system, a distributed scheduling method and a storage medium for big data processing.
In a first aspect, the present invention provides a distributed scheduling system for big data processing, including:
the scheduling center module, responsible for the dependency configuration and job development of workflows, and for persisting the configured workflow through an API to the to-be-executed workflow table of the relational database;
the leader module, serving as the task-flow segmentation and distribution node in the cluster, which splits the workflow configured by the scheduling center according to its dependencies and sends the resulting task nodes to follower nodes;
the follower module, also called the executor, used for executing the specific computing tasks distributed by the leader module, submitting task results, and storing task execution logs;
the coordinator module, used for periodically fetching tasks to be executed from the database and load-balancing them across leader modules with a Round-Robin algorithm according to the current load of each leader module;
the task queue module, a message queue comprising a workflow topic, a task topic, and a task result topic, used for realizing task dependencies among workflows;
the metadata module, comprising two databases: a relational database, used for persistently storing workflow execution records, and a distributed in-memory database, used for loading workflow-related metadata from the relational database into memory.
In a second aspect, the big data processing oriented distributed scheduling method of the present invention uses the big data processing oriented distributed scheduling system of the present invention, and comprises the following steps:
receiving the dependency configuration and job development of the workflow, and persisting the configured workflow through an API (application programming interface) to the to-be-executed workflow table of a relational database;
splitting the workflow configured by the scheduling center according to its dependency relationships, and sending the resulting task nodes to follower nodes;
executing the specific computing tasks distributed by the leader module, submitting task results, and storing task execution logs;
periodically fetching tasks to be executed from the database, and load-balancing them across leader modules with a Round-Robin algorithm according to the current load of each leader module.
Further, the coordinator service periodically scans the to-be-executed workflow table in the relational database to obtain to-be-executed commands, periodically requests the load information of the Leader cluster from the ZooKeeper cluster, and distributes each workflow to the corresponding Leader with a Round-Robin algorithm according to the current CPU and memory headroom of each Leader machine; finally, it sends the workflow, marked with its Leader, to the process_instance topic in the message queue and waits for that Leader to consume the topic and execute the workflow.
Further, the Leader consumes the process_instance topic in the message queue and judges from the leader_host_name field in the message whether it should execute the workflow; if so, it splits the workflow into a plurality of computing tasks according to the workflow's dependency relationships and estimates the computing resources each task requires; it obtains the load information of each machine in the Follower cluster from the ZooKeeper cluster and uses a Round-Robin algorithm to load-balance the choice of executor for each task; it attaches the corresponding executor's follower_host_name to each independent computing task after splitting and sends the tasks to the task_instance topic in the message queue; while the workflow's execution thread is suspended, the Leader consumes the Follower data returned on the task_instance_result topic of the message queue and updates the workflow's execution results; and when the execution state of the whole workflow reaches a final state, the workflow execution results are persisted to the relational database.
Further, the Follower consumes the data on the task_instance topic in the message queue and, matching on follower_host_name, executes tasks with the corresponding executor; after a task finishes, its execution result is written back to the task_instance_result topic in the message queue, where it waits for the Leader to consume it; once the Leader has written the completed task results back to the relational database, execution of the whole task flow is complete.
Further, when the system starts, the Leaders and Followers register themselves under the leader_mechs and follower_mechs znodes on ZooKeeper, publish the local machine's CPU and memory information, and maintain heartbeats; every Leader and Follower watches these znodes; once a Leader or Follower is found to be down, a workflow fault-tolerance flow is entered, comprising a Leader fault-tolerance flow and a Follower fault-tolerance flow.
Further, the Leader fault-tolerance flow is specifically:
each machine of the Leader cluster watches the Leader znode on the ZooKeeper cluster; once a Leader is found to be down, a distributed lock mechanism based on the ZooKeeper cluster is triggered: one of the surviving Leaders acquires the distributed lock, triggers the workflow fault-tolerance logic, and inserts the information of the workflows needing fault tolerance into the fault-tolerance command table in the relational database; the Leader holding the distributed lock then takes over those workflows, completing the Leader's distributed fault-tolerance process.
Further, the Follower fault-tolerance flow is specifically:
each machine in the Follower cluster registers itself under a znode on the ZooKeeper cluster; if a Follower that is executing tasks goes down, the Leader's watch mechanism is triggered, all tasks running on the downed Followers are terminated, the Leader marks the affected workflows as needing fault tolerance, and surviving Followers are reselected as the executors of the workflows' remaining tasks.
In a third aspect, the storage medium of the present invention stores a computer readable program which, when invoked by an executor, can perform the steps of the big data processing oriented distributed scheduling method of the present invention.
The invention has the following advantages:
(1) Task distribution takes task dependencies into account, avoiding empty runs of downstream tasks caused by upstream tasks timing out or running empty; this benefits the overall data flow and reduces the burden on developers.
(2) Fault-tolerance strategies are designed so that when a single point of failure occurs anywhere in the system, the corresponding workflows can be taken over and execution can continue.
(3) The whole scheduling cluster supports linear scale-out of the Leader and Follower (worker) nodes.
Drawings
FIG. 1 is an architecture diagram of a Follower executor node in this embodiment;
FIG. 2 is a general architecture diagram of the present embodiment;
FIG. 3 is a workflow execution flow chart of the present embodiment;
FIG. 4 is a diagram showing the fault tolerance principle of the Leader according to the present embodiment;
FIG. 5 is a Follower fault-tolerance schematic diagram of this embodiment.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
In this embodiment, a distributed scheduling system for big data processing includes:
The scheduling center module provides a scheduling center Web interface that gives users a simple, convenient window for visual task configuration, along with monitoring and operations functions for the scheduling platform. The scheduling center module is responsible for the dependency configuration and job development of workflows, and persists the configured workflow through an API to the to-be-executed workflow table of a relational database (DB for short). Workflow dependencies are described in Json: the Json data of each job stores the ids of its predecessor and successor jobs.
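As an illustration only, the Json description of a small three-job workflow might look like the following sketch; the field names (pre_job_ids, post_job_ids, and so on) are assumptions chosen for clarity, not names taken from the patent:

```python
import json

# A hypothetical three-job workflow: each job records its predecessor and
# successor job ids, as the patent describes.
workflow = {
    "workflow_id": "wf_001",
    "jobs": [
        {"job_id": "j1", "engine": "sqoop", "pre_job_ids": [],     "post_job_ids": ["j2"]},
        {"job_id": "j2", "engine": "spark", "pre_job_ids": ["j1"], "post_job_ids": ["j3"]},
        {"job_id": "j3", "engine": "hive",  "pre_job_ids": ["j2"], "post_job_ids": []},
    ],
}

# The scheduling center would persist this Json into the to-be-executed
# workflow table of the relational database via the API.
print(json.dumps(workflow, indent=2))
```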
The leader module serves as the task-flow segmentation and distribution node in the cluster; it splits the workflow configured by the scheduling center according to its dependencies and sends the resulting task nodes to follower nodes.
The follower module, also called the executor, is used for executing the specific computing tasks distributed by the leader module, submitting task results, and storing task execution logs.
The coordinator module is used for periodically fetching tasks to be executed from the database and load-balancing them across leader modules with a Round-Robin algorithm according to the current load of each leader module.
The task queue module is a message queue (MQ for short) comprising a workflow topic, a task topic, and a task result topic.
The metadata module comprises two databases: a relational database, used for persistently storing workflow execution records, and a distributed in-memory database, which loads workflow-related metadata from the relational database into memory, reducing latency during workflow execution and improving efficiency.
The system is a distributed, multi-execution-engine task scheduling system aimed at the intricate computing-task scenarios of big data platforms. Following a decentralized design, it can split user-defined computing-task workflows and distribute the resulting computing tasks to the Followers in the cluster for execution. Task dependencies between workflows are implemented with message queues, giving the system the expressive power of complex dependency DAGs. High availability of the leader and follower modules is built on the distributed coordination service ZooKeeper, so the whole system can scale linearly while running.
In a data platform, a complete data processing task comprises four stages: data access, data cleaning, data mining, and storage of analysis results; that is, a complete data processing workflow involves several computing engines. As shown in FIG. 1, a Follower is not the node that actually runs a computing task; rather, a gateway node of the big data platform, i.e. the node from which computing tasks are submitted, serves as the Follower node. The Follower node hosts the clients of computing engines such as Sqoop, Spark, Flink, and Hive. Because it is not itself the execution node of the computing tasks, the system gains the ability to schedule tasks across multiple execution engines, decouples the scheduling platform from the computing platform, and avoids resource contention.
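A minimal sketch of this gateway idea, assuming each engine is reachable through its command-line client installed on the Follower node; the command templates below are illustrative guesses, not taken from the patent:

```python
import subprocess

# Hypothetical command templates per engine client on the gateway node.
ENGINE_CLIENTS = {
    "sqoop": ["sqoop", "job", "--exec"],
    "spark": ["spark-submit"],
    "flink": ["flink", "run"],
    "hive":  ["hive", "-f"],
}

def submit(engine: str, artifact: str) -> int:
    """Submit a task through the matching engine client and return its exit code."""
    cmd = ENGINE_CLIENTS[engine] + [artifact]
    return subprocess.run(cmd).returncode
```

Because the Follower only shells out to engine clients, adding a new execution engine amounts to installing its client and extending the mapping, which is what decouples scheduling from computation.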
In this embodiment, the big data processing oriented distributed scheduling method uses the big data processing oriented distributed scheduling system described in this embodiment, and comprises the following steps:
receiving the dependency configuration and job development of the workflow, and persisting the configured workflow through an API (application programming interface) to the to-be-executed workflow table of a relational database;
splitting the workflow configured by the scheduling center according to its dependency relationships, and sending the resulting task nodes to follower nodes;
executing the specific computing tasks distributed by the leader module, submitting task results, and storing task execution logs;
periodically fetching tasks to be executed from the database, and load-balancing them across leader modules with a Round-Robin algorithm according to the current load of each leader module.
As shown in fig. 2 and 3, the specific flow of the method is as follows:
The scheduling center module is responsible for the dependency configuration and job development of workflows, and persists the configured workflow through an API to the to-be-executed workflow table of the relational database; workflow dependencies are described in Json, and the Json data of each job stores the ids of its predecessor and successor jobs.
The coordinator service periodically scans the to-be-executed workflow table in the relational database to obtain to-be-executed commands, and periodically requests the load information of the Leader cluster from ZooKeeper (ZK for short); using a Round-Robin algorithm, it distributes each workflow to the corresponding Leader according to the current CPU and memory headroom of each Leader machine. Finally, it sends the workflow, marked with its Leader, to the process_instance topic in the message queue and waits for that Leader to consume the topic and execute the workflow.
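The dispatch step might be sketched as below, assuming the load of each Leader is read from ZooKeeper as free CPU and memory fractions; the 0.2 headroom floor and all names are illustrative assumptions:

```python
from itertools import cycle

def pick_leader(leaders, rr):
    """Round-Robin over the Leaders that currently have CPU/memory headroom."""
    eligible = {host for host, (cpu_free, mem_free) in leaders.items()
                if cpu_free > 0.2 and mem_free > 0.2}   # arbitrary headroom floor
    if not eligible:
        raise RuntimeError("no Leader has headroom")
    for host in rr:                                     # fixed rotation order
        if host in eligible:
            return host

# Load figures as they might be read from ZooKeeper: host -> (cpu_free, mem_free).
leaders = {"leader-1": (0.5, 0.6), "leader-2": (0.1, 0.9), "leader-3": (0.4, 0.4)}
rr = cycle(sorted(leaders))
msg = {"workflow_id": "wf_001", "leader_host_name": pick_leader(leaders, rr)}
# msg would now be sent to the process_instance topic for that Leader to consume.
print(msg)
```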
The Leader consumes the process_instance topic in the message queue and judges from the leader_host_name field in the message whether it should execute the workflow. If so, it splits the workflow into a plurality of computing tasks according to the workflow's dependency relationships and estimates the computing resources each task requires. It obtains the load information of each machine in the Follower cluster from the ZooKeeper cluster and uses a Round-Robin algorithm to load-balance the choice of executor for each task; it attaches the corresponding executor's follower_host_name to each independent computing task after splitting and sends the tasks to the task_instance topic in the message queue. While the workflow's execution thread is suspended, the Leader consumes the Follower data returned on the task_instance_result topic of the message queue and updates the workflow's execution results. When the execution state of the whole workflow reaches a final state, the workflow execution results are persisted to the relational database.
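A sketch of the Leader-side logic under the message fields described above; leader_host_name and pre_job_ids carry over from the earlier illustrative sketches, and `done` stands for the set of job ids already finished:

```python
import socket
from itertools import cycle

MY_HOST = socket.gethostname()

def on_process_instance(msg, followers, done):
    """Dispatch the tasks of one workflow message whose predecessors are all done."""
    if msg["leader_host_name"] != MY_HOST:
        return []                       # the workflow was assigned to another Leader
    rr = cycle(followers)               # Round-Robin over the Follower cluster
    tasks = []
    for job in msg["jobs"]:
        if set(job["pre_job_ids"]) <= done:
            task = dict(job, follower_host_name=next(rr))
            tasks.append(task)          # would be sent to the task_instance topic
    return tasks
```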
The Followers consume the data on the task_instance topic in the message queue, and the executor whose follower_host_name matches runs each task. After a task finishes, its execution result is written back to the task_instance_result topic in the message queue, where it waits for the Leader to consume it. Once the Leader has written the completed task results back to the relational database, execution of the whole task flow is complete.
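The Follower side might look like the following sketch, with run_task as a stand-in for invoking the engine clients and produce_result as a stand-in for the message-queue producer; both names are assumptions:

```python
import socket
import subprocess

MY_HOST = socket.gethostname()

def run_task(task):
    # Placeholder executor: the real system would call the matching engine
    # client (Sqoop/Spark/Flink/Hive) installed on this gateway node.
    return subprocess.run(task["command"]).returncode

def on_task_instance(task, produce_result):
    """Handle one message from the task_instance topic (a sketch)."""
    if task["follower_host_name"] != MY_HOST:
        return                          # the task belongs to another Follower
    state = "SUCCESS" if run_task(task) == 0 else "FAILED"
    # Written back to the task_instance_result topic for the Leader to consume.
    produce_result({"job_id": task["job_id"], "state": state})
```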
As a scheduling system with distributed capabilities, fault-tolerant design is a core concern of the whole system, because distributed systems are inherently unreliable. Distributed fault tolerance of the whole scheduling system is built on ZooKeeper. When the system starts, the Leaders and Followers register themselves under the leader_mechs and follower_mechs znodes on ZooKeeper, publish the local machine's CPU and memory information, and maintain heartbeats; every Leader and Follower watches these znodes. Once a Leader or Follower is found to be down, a workflow fault-tolerance flow is entered, comprising a Leader fault-tolerance flow and a Follower fault-tolerance flow.
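A registration sketch using the kazoo ZooKeeper client and psutil for load readings; the patent names neither library, so both are assumptions, but the ephemeral-znode-as-heartbeat pattern matches the text:

```python
import json
import socket

import psutil                       # assumed available for CPU/memory readings
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

me = socket.gethostname()
vm = psutil.virtual_memory()
info = json.dumps({
    "host": me,
    "cpu_free": 1 - psutil.cpu_percent(interval=1) / 100,
    "mem_free": vm.available / vm.total,
}).encode()

# An ephemeral znode doubles as the heartbeat: it vanishes when the session dies.
zk.create(f"/leader_mechs/{me}", info, ephemeral=True, makepath=True)

@zk.ChildrenWatch("/leader_mechs")
def on_leaders_changed(children):
    # A shrinking child list means some Leader went down: enter fault tolerance.
    print("live leaders:", children)
```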
As shown in FIG. 4, in this embodiment the Leader fault-tolerance flow is specifically:
Each machine of the Leader cluster watches the Leader znode on ZooKeeper; once a Leader is found to be down, a ZooKeeper-based distributed lock mechanism is triggered: one of the surviving Leaders acquires the distributed lock, triggers the workflow fault-tolerance logic, and inserts the information of the workflows needing fault tolerance into the fault-tolerance command table in the relational database; the Leader holding the distributed lock then takes over those workflows, completing the Leader's distributed fault-tolerance process. As shown in FIG. 4, after Leader1 goes down, Leader2 acquires the distributed lock, triggers the workflow fault-tolerance logic, inserts the information of the workflows needing fault tolerance into the fault-tolerance command table in the relational database, and then takes over the workflows, completing the Leader's distributed fault-tolerance process.
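The take-over step might be sketched with kazoo's distributed lock recipe as follows; the table and column names (t_command_fault_tolerance, t_workflow_running) are invented for illustration, and db stands for any DB-API style handle:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181")
zk.start()

def on_leader_down(dead_leader, db):
    """Race for the failover lock; the winner records the fault-tolerance command."""
    lock = zk.Lock("/locks/leader_failover", identifier=dead_leader)
    with lock:                           # exactly one surviving Leader enters
        db.execute(
            "INSERT INTO t_command_fault_tolerance (workflow_id, dead_host) "
            "SELECT workflow_id, %s FROM t_workflow_running WHERE leader_host = %s",
            (dead_leader, dead_leader),
        )
        # The lock holder then takes over the affected workflows and resumes them.
```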
In this embodiment, the Follower fault-tolerance flow is specifically:
each machine in the Follower cluster registers itself under a znode on the ZooKeeper cluster; if a Follower that is executing tasks goes down, the Leader's watch mechanism is triggered, all tasks running on the downed Followers are terminated, the Leader marks the affected workflows as needing fault tolerance, and surviving Followers are reselected as the executors of the workflows' remaining tasks.
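Seen from the Leader, the Follower watch might be sketched as below, again with kazoo; the bookkeeping structures are assumptions introduced only to make the flow concrete:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181")
zk.start()

known = set()                            # Followers seen alive so far
running = {}                             # follower host -> list of its running tasks

@zk.ChildrenWatch("/follower_mechs")
def on_followers_changed(children):
    global known
    dead, known = known - set(children), set(children)
    for host in dead:
        for task in running.pop(host, []):
            task["state"] = "NEED_FAULT_TOLERANCE"     # terminate & mark the workflow
            if known:                                   # reselect a surviving Follower
                task["follower_host_name"] = sorted(known)[0]
                # the task would then be re-sent to the task_instance topic
```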
As shown in FIG. 5, when Follower1 goes down, its tasks are re-executed by Follower2.
In this embodiment, a storage medium stores a computer readable program which, when invoked by an executor, can perform the steps of the big data processing oriented distributed scheduling method described in this embodiment.

Claims (5)

1. A distributed scheduling system for big data processing, comprising:
the scheduling center module, responsible for the dependency configuration and job development of workflows, and for persisting the configured workflow through an API to the to-be-executed workflow table of the relational database;
the leader module, serving as the task-flow segmentation and distribution node in the cluster, which splits the workflow configured by the scheduling center according to its dependencies and sends the resulting task nodes to follower nodes;
the follower module, also called the executor, used for executing the specific computing tasks distributed by the leader module, submitting task results, and storing task execution logs;
the coordinator module, used for periodically fetching tasks to be executed from the database and load-balancing them across leader modules with a Round-Robin algorithm according to the current load of each leader module;
the task queue module, a message queue comprising a workflow topic, a task topic, and a task result topic, used for realizing task dependencies among workflows;
the metadata module, comprising two databases: a relational database, used for persistently storing workflow execution records, and a distributed in-memory database, used for loading workflow-related metadata from the relational database into memory;
when the system starts, the Leaders and Followers register themselves under the leader_mechs and follower_mechs znodes on ZooKeeper, publish the local machine's CPU and memory information, and maintain heartbeats; every Leader and Follower watches these znodes; once a Leader or Follower is found to be down, a workflow fault-tolerance flow is entered, comprising a Leader fault-tolerance flow and a Follower fault-tolerance flow;
the Leader fault-tolerance flow is specifically:
each machine of the Leader cluster watches the Leader znode on ZooKeeper; once a Leader is found to be down, a ZooKeeper-based distributed lock mechanism is triggered: one of the surviving Leaders acquires the distributed lock, triggers the workflow fault-tolerance logic, and inserts the information of the workflows needing fault tolerance into the fault-tolerance command table in the relational database; the Leader holding the distributed lock then takes over those workflows, completing the Leader's distributed fault-tolerance process;
the Follower fault-tolerance flow is specifically:
each machine in the Follower cluster registers itself under a znode on ZooKeeper; if a Follower that is executing tasks goes down, the Leader's watch mechanism is triggered, all tasks running on the downed Followers are terminated, the Leader marks the affected workflows as needing fault tolerance, and surviving Followers are reselected as the executors of the workflows' remaining tasks;
the Leader consumes the process_instance topic in the message queue and judges from the leader_host_name field in the message whether it should execute the workflow; if so, it splits the workflow into a plurality of computing tasks according to the workflow's dependency relationships and estimates the computing resources each task requires; it obtains the load information of each machine in the Follower cluster from the ZooKeeper cluster and uses a Round-Robin algorithm to load-balance the choice of executor for each task; it attaches the corresponding executor's follower_host_name to each independent computing task after splitting and sends the tasks to the task_instance topic in the message queue; while the workflow's execution thread is suspended, the Leader consumes the Follower data returned on the task_instance_result topic of the message queue and updates the workflow's execution results; and when the execution state of the whole workflow reaches a final state, the workflow execution result state is persisted to the relational database.
2. A distributed scheduling method for big data processing, characterized in that it uses the distributed scheduling system for big data processing according to claim 1, the method comprising the following steps:
receiving the dependency configuration and job development of the workflow, and persisting the configured workflow through an API (application programming interface) to the to-be-executed workflow table of a relational database;
splitting the workflow configured by the scheduling center according to its dependency relationships, and sending the resulting task nodes to follower nodes;
executing the specific computing tasks distributed by the leader module, submitting task results, and storing task execution logs;
periodically fetching tasks to be executed from the database, and load-balancing them across leader modules with a Round-Robin algorithm according to the current load of each leader module.
3. The distributed scheduling method for big data processing according to claim 2, characterized in that: the coordinator service periodically scans the to-be-executed workflow table in the relational database to obtain to-be-executed commands, periodically requests the load information of the Leader cluster from ZooKeeper, and distributes each workflow to the corresponding Leader with a Round-Robin algorithm according to the current CPU and memory headroom of each Leader machine; finally, it sends the workflow, marked with its Leader, to the process_instance topic in the message queue and waits for that Leader to consume the topic and execute the workflow.
4. The distributed scheduling method for big data processing according to claim 3, characterized in that: the Follower consumes the data on the task_instance topic in the message queue and, matching on follower_host_name, executes tasks with the corresponding executor; after a task finishes, its execution result is written back to the task_instance_result topic in the message queue, where it waits for the Leader to consume it; once the Leader has written the completed task results back to the relational database, execution of the whole task flow is complete.
5. A storage medium having a computer readable program stored therein, characterized in that: the computer readable program, when invoked by an executor, is capable of performing the steps of the big data processing oriented distributed scheduling method of any of claims 2 to 4.
CN202011069582.0A 2020-09-30 2020-09-30 Big data processing oriented distributed scheduling system, method and storage medium Active CN112162841B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011069582.0A CN112162841B (en) 2020-09-30 2020-09-30 Big data processing oriented distributed scheduling system, method and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011069582.0A CN112162841B (en) 2020-09-30 2020-09-30 Big data processing oriented distributed scheduling system, method and storage medium

Publications (2)

Publication Number Publication Date
CN112162841A (en) 2021-01-01
CN112162841B (en) 2024-09-06

Family

ID=73861133

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011069582.0A Active CN112162841B (en) 2020-09-30 2020-09-30 Big data processing oriented distributed scheduling system, method and storage medium

Country Status (1)

Country Link
CN (1) CN112162841B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113010295B (en) * 2021-03-30 2024-06-11 中信银行股份有限公司 Stream computing method, device, equipment and storage medium
CN113438281B (en) * 2021-06-05 2023-02-28 济南浪潮数据技术有限公司 Storage method, device, equipment and readable medium of distributed message queue
CN113535362B (en) * 2021-07-26 2023-07-28 北京计算机技术及应用研究所 Distributed scheduling system architecture and micro-service workflow scheduling method
CN113821322B (en) * 2021-09-10 2024-08-06 浙江数新网络有限公司 Loosely coupled distributed workflow coordination system and method
CN115131130B (en) * 2022-07-01 2024-11-26 江苏苏商银行股份有限公司 A configurable accounting core automatic batch running method, device and storage medium
CN115840631B (en) * 2023-01-04 2023-05-16 中科金瑞(北京)大数据科技有限公司 RAFT-based high-availability distributed task scheduling method and equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110888719A (en) * 2019-09-18 2020-03-17 广州市巨硅信息科技有限公司 Distributed task scheduling system and method based on web service
CN111400017A (en) * 2020-03-26 2020-07-10 华泰证券股份有限公司 Distributed complex task scheduling method

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7403945B2 (en) * 2004-11-01 2008-07-22 Sybase, Inc. Distributed database system providing data and space management methodology
US7979859B2 (en) * 2005-05-03 2011-07-12 International Business Machines Corporation Managing automated resource provisioning with a workload scheduler
US20150067028A1 (en) * 2013-08-30 2015-03-05 Indian Space Research Organisation Message driven method and system for optimal management of dynamic production workflows in a distributed environment
CN104683422B (en) * 2013-12-03 2019-01-29 腾讯科技(深圳)有限公司 Data transmission method and device
US9560119B2 (en) * 2014-12-23 2017-01-31 Cisco Technology, Inc. Elastic scale out policy service
CN107766147A (en) * 2016-08-23 2018-03-06 上海宝信软件股份有限公司 Distributed data analysis task scheduling system
CN111209301A (en) * 2019-12-29 2020-05-29 南京云帐房网络科技有限公司 Method and system for improving computing performance based on dependency tree splitting
CN111338774B (en) * 2020-02-21 2023-09-19 华云数据有限公司 Distributed timing task scheduling system and computing device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110888719A (en) * 2019-09-18 2020-03-17 广州市巨硅信息科技有限公司 Distributed task scheduling system and method based on web service
CN111400017A (en) * 2020-03-26 2020-07-10 华泰证券股份有限公司 Distributed complex task scheduling method

Also Published As

Publication number Publication date
CN112162841A (en) 2021-01-01

Similar Documents

Publication Publication Date Title
CN112162841B (en) Big data processing oriented distributed scheduling system, method and storage medium
US8914805B2 (en) Rescheduling workload in a hybrid computing environment
US10042886B2 (en) Distributed resource-aware task scheduling with replicated data placement in parallel database clusters
US8024529B2 (en) Providing shared memory in a distributed computing system
US8739171B2 (en) High-throughput-computing in a hybrid computing environment
CN105808334B A resource-reuse-based short-job optimization system and method for MapReduce
JP7203102B2 (en) manage the computer cluster interface
CN112860393B (en) Distributed task scheduling method and system
CN102147755B (en) Multi-core system fault tolerance method based on memory caching technology
CN112114973B (en) Data processing method and device
Mei et al. Fault-tolerant dynamic rescheduling for heterogeneous computing systems
CN110471777B (en) An implementation method and system for sharing and using Spark cluster among multiple users in a Python-Web environment
US11748164B2 (en) FAAS distributed computing method and apparatus
CN102999317A (en) Multi-tenant oriented elastic multi-process service processing method
CN112948096A (en) Batch scheduling method, device and equipment
CN111459622A (en) Method and device for scheduling virtual CPU, computer equipment and storage medium
Chen et al. Pisces: optimizing multi-job application execution in mapreduce
CN115617480A (en) Task scheduling method, device and system and storage medium
CN118331708A (en) Dynamic queue scheduling method and system
Tang et al. A network load perception based task scheduler for parallel distributed data processing systems
CN114968553A (en) Heterogeneous server automatic scheduling system and method for massive machine learning tasks
CN102915257A (en) TORQUE(tera-scale open-source resource and queue manager)-based parallel checkpoint execution method
JPH11353284A (en) Job re-execution method
Cheng et al. Permanent fault-tolerant scheduling in heterogeneous multi-core real-time systems
Zhai et al. Scalability and Fault Tolerance for Real-Time Edge Applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant