CN118331708A - Dynamic queue scheduling method and system - Google Patents
- Publication number
- CN118331708A (application number CN202410520671.4A)
- Authority
- CN
- China
- Prior art keywords
- task
- script
- execution
- scheduling
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F9/4887—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues involving deadlines, e.g. rate based, periodic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5038—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The application belongs to the technical field of information and provides a dynamic queue scheduling method and system. The method comprises the following steps: setting a management interface for inputting configuration information of a script and storing the configuration information into a database; adopting the task scheduling framework Quartz to generate a scheduling plan according to the configuration information and trigger the script to start and execute at fixed times according to the scheduling plan; deploying the scheduling system on a plurality of nodes, and, if one node runs abnormally, scheduling the tasks not completed by that node to another normally running node for continued execution; and, for tasks requiring high execution precision, dynamically adjusting the execution order of each task. The application solves the technical problems of realizing centralized management of scripts, monitoring and automatically handling abnormal sub-processes in a timely manner, and ensuring that timed tasks are executed at a finer time granularity.
Description
Technical Field
The application belongs to the technical field of information, and particularly relates to a dynamic queue scheduling method and a system.
Background
Modern business scenarios place ever-increasing demands on service continuity and response timeliness, and the core of meeting these demands is building an efficient queue scheduling system.
However, current queue scheduling systems have problems in several respects and struggle to keep up with growing service demands. First, in conventional queue scheduling systems, script management is scattered and manual: an administrator must configure and start many scripts one by one, a process that is cumbersome and error-prone, and that becomes increasingly inefficient as the number of scripts grows. Second, sub-processes may be interrupted unexpectedly during operation, and existing systems are weak at monitoring and handling such anomalies; they cannot quickly and accurately identify and respond to them, which affects service continuity, user experience, and the overall stability of the system. Third, the execution granularity of timed tasks is too coarse: the minute-level scheduling commonly used today cannot meet modern business requirements for fine-grained, second-level or even millisecond-level scheduling. Finally, current systems lack elasticity in the face of fluctuating queue traffic and cannot dynamically adjust system resources according to the actual data flow, so task execution lags under high traffic while system resources sit idle under low traffic, seriously affecting system performance.
In summary, the main technical problem at present is how to realize centralized management of scripts, monitor and automatically handle sub-process exceptions in a timely manner, ensure that timed tasks are executed at a finer time granularity, and flexibly adjust system resources according to queue information.
Disclosure of Invention
The embodiment of the application provides a dynamic queue scheduling method and a system, which can solve at least one of the problems in the prior art.
In a first aspect, an embodiment of the present application provides a dynamic queue scheduling method, where the method includes:
setting a management interface for inputting configuration information of a script, where the configuration information includes a script path, parameters, and execution period metadata;
adopting the task scheduling framework Quartz, generating a scheduling plan according to the configuration information, and triggering the script to start and execute at fixed times according to the scheduling plan;
deploying the scheduling system on a plurality of nodes, and, if one node runs abnormally, scheduling the tasks not completed by that node to another normally running node for continued execution;
for tasks requiring high execution precision, dynamically adjusting the execution order of each task;
for a complex task, splitting the task into a plurality of subtasks, determining the execution order of each subtask, and executing the subtasks sequentially in that order;
dynamically adjusting system resources according to queue information of the script, where the queue information includes a production rate and a consumption rate;
analyzing and modeling historical task data volume, predicting future task data volume and the demand for system resources, and scheduling tasks and allocating system resources in advance;
and monitoring the execution of the script, periodically detecting the running state of sub-processes based on a heartbeat detection mechanism, and, if a sub-process runs abnormally, triggering an alarm mechanism, restarting the sub-process, and recording abnormal log information.
In a second aspect, an embodiment of the present application further provides a dynamic queue scheduling system, where the system includes:
a first processing module, configured to set a management interface for inputting configuration information of a script, where the configuration information includes a script path, parameters, and execution period metadata;
a second processing module, configured to adopt the task scheduling framework Quartz, generate a scheduling plan according to the configuration information, and trigger the script to start and execute at fixed times according to the scheduling plan;
a third processing module, configured to deploy the scheduling system on a plurality of nodes and, if one node runs abnormally, schedule the tasks not completed by that node to another normally running node for continued execution;
a fourth processing module, configured to dynamically adjust the execution order of each task for tasks requiring high execution precision;
a fifth processing module, configured to split a complex task into a plurality of subtasks, determine the execution order of each subtask, and execute the subtasks sequentially in that order;
a sixth processing module, configured to dynamically adjust system resources according to queue information of the script, where the queue information includes a production rate and a consumption rate;
a seventh processing module, configured to analyze and model historical queue information, predict future queue information and system demand, and schedule tasks and allocate system resources in advance;
an eighth processing module, configured to monitor the execution of the script, periodically detect the running state of sub-processes based on a heartbeat detection mechanism, and, if a sub-process runs abnormally, trigger an alarm mechanism, restart the sub-process, and record abnormal log information.
Compared with the prior art, the embodiment of the application has the beneficial effects that:
The invention discloses a dynamic queue scheduling method that improves system maintenance efficiency through centralized management and configuration of scripts, enhances system reliability through fault self-recovery, improves task execution efficiency and resource utilization through dynamic resource allocation, and ensures system continuity and stability through process-level monitoring and an exception handling mechanism.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a dynamic queue scheduling method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a dynamic queue scheduling system according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in the present description and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to a determination", or "in response to detection". Similarly, the phrases "if it is determined" or "if [ a described condition or event ] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [ the described condition or event ]", or "in response to detecting [ the described condition or event ]".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Referring to FIG. 1, the present invention provides a dynamic queue scheduling method, comprising the following steps:
S100, setting a management interface, where the management interface is used for inputting configuration information of a script, and the configuration information includes a script path, parameters, and execution period metadata;
In this embodiment, the configuration information of the script entered by the user is obtained through the management interface, realizing centralized configuration and management of scripts.
S200, adopting the task scheduling framework Quartz, generating a scheduling plan according to the configuration information, and triggering the script to start and execute at fixed times according to the scheduling plan;
In one embodiment, adopting the task scheduling framework Quartz, generating a scheduling plan according to the configuration information, and triggering the script to start and execute at fixed times according to the scheduling plan specifically includes:
Storing the configuration information into a database, and converting the configuration information into a scheduling plan which can be identified and processed by Quartz;
Loading the scheduling plan into a task scheduling engine of a Quartz, and enabling a script to execute tasks according to the scheduling plan;
In the execution process of the script, acquiring operation data of the script, storing the operation data into a database, and associating the operation data with the configuration information, wherein the operation data comprises an operation state and an output result of the script;
after script execution is completed, storing the execution result and the script state of the script into a database, and triggering a corresponding notification mechanism.
In this embodiment, a database table structure is created, the configuration information is stored in it, and an association between the script and its configuration information is established, which facilitates subsequent data queries, task scheduling, and management. The task scheduling framework Quartz then generates a corresponding scheduling plan from the script configuration information stored in the database: information such as the execution time and frequency of the script is converted into task scheduling rules that Quartz can recognize and process, i.e., the scheduling plan. The scheduling rules are loaded into Quartz's task scheduling engine, which automatically triggers execution of the script according to the plan, so that the script runs at the designated time without manual intervention.
During script execution, the running state and output of the script are collected, recorded in the database, and associated with the corresponding script configuration, which facilitates subsequent queries and analysis; this information also serves as a history of script execution for auditing and problem tracing. When script execution completes, the execution result and script state are updated in the database and a corresponding notification mechanism is triggered, i.e., the relevant administrators are notified of the result so they can learn the script's running status in time and make follow-up decisions based on it, ensuring closed-loop, traceable script management.
Specifically, the script configuration information is stored in a MySQL database. For example, a table named script_config is created, containing fields such as id (primary key), script_path, parameters, and execution_cycle. Through the management interface, the user may enter a new script configuration, such as one indicating execution at 12 noon every day; this is inserted into the script_config table as a new record whose id is an auto-incrementing unique identifier. A corresponding scheduling plan is then generated from the script configuration in the database. For the configuration above, Quartz generates a CronTrigger with the cron expression "0 0 12 * * ?", meaning a trigger at 12 noon every day. The CronTrigger is associated with a JobDetail, which contains the configuration of the script to be executed. Once the scheduling plan is generated, Quartz loads it into the task scheduling engine. When the designated execution time arrives, Quartz automatically triggers the corresponding Job to execute the script: when the system time reaches 12 noon, Quartz starts a new thread, calls the /path/to/script.sh script, and passes in the "-a -b -c" parameters. During script execution, the running state and output of the script can be obtained by monitoring its standard output and error streams. If the script writes "Success" to standard output, execution is considered successful; if it writes "Error: xxx" to the error stream, execution is considered failed and the corresponding error information is recorded. This information is recorded in a script_execution_log table in the database, containing fields such as id (primary key), script_id (the associated script configuration id), start_time, end_time, status, and output.
After script execution completes, the database record is updated according to the execution state and output, and the corresponding notification mechanism is triggered. If the script executed successfully, the status field of the corresponding record in the script_execution_log table is updated to "SUCCESS" and a mail is sent to notify the relevant personnel; if execution failed, the status field is updated to "FAILURE" and an alarm mail is sent so that the relevant personnel can handle it in time.
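The storage and logging flow above can be sketched in Python, with an in-memory SQLite database standing in for MySQL. The table and column names (script_config, script_execution_log, status) follow the description above; the success/failure rule based on the output streams is a simplified illustration:

```python
import sqlite3

def init_db(conn):
    # Tables mirroring the script_config / script_execution_log layout described above
    conn.execute("""CREATE TABLE script_config (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        script_path TEXT, parameters TEXT, execution_cycle TEXT)""")
    conn.execute("""CREATE TABLE script_execution_log (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        script_id INTEGER, start_time TEXT, end_time TEXT,
        status TEXT, output TEXT)""")

def add_script(conn, path, params, cycle):
    # Insert a new configuration; id is the auto-incrementing unique identifier
    cur = conn.execute(
        "INSERT INTO script_config (script_path, parameters, execution_cycle)"
        " VALUES (?, ?, ?)", (path, params, cycle))
    return cur.lastrowid

def record_execution(conn, script_id, start, end, stdout, stderr):
    # Infer success/failure from the output streams, as described in the text
    status = "FAILURE" if "Error" in stderr else "SUCCESS"
    conn.execute(
        "INSERT INTO script_execution_log (script_id, start_time, end_time, status, output)"
        " VALUES (?, ?, ?, ?, ?)", (script_id, start, end, status, stdout))
    return status
```

A real deployment would also wire the "FAILURE" branch to the mail/alarm notification mechanism described above.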
S300, deploying the scheduling system on a plurality of nodes, and, if one node runs abnormally, scheduling the tasks not completed by that node to another normally running node for continued execution;
In this embodiment, by deploying the scheduling system on a plurality of nodes, a single-node failure is prevented from becoming a performance bottleneck and affecting the service.
In one embodiment, the step S300 includes:
based on containerization technology, splitting each component of the scheduling system into a plurality of independent microservices, packaging the microservices into container images, and deploying those images on a plurality of nodes;
based on a load balancing strategy, distributing tasks evenly to all nodes;
adding a node affinity attribute to the metadata of each task;
designating the node that preferentially executes the task according to the affinity attribute and the load of the nodes;
if the designated node is unavailable, selecting a suboptimal node to execute the task according to an affinity degradation strategy;
detecting the running state of the tasks on each node;
and, if a task on a node fails or times out, triggering a failover mechanism and scheduling that node's tasks to another normally running node for re-execution.
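The affinity-then-degrade selection in the steps above can be sketched as follows. The `healthy`/`load` fields and the least-loaded fallback rule are illustrative assumptions, since the text does not fix a specific degradation metric:

```python
def pick_node(task, nodes):
    """Select an executor node for a task.

    `task` carries an 'affinity' attribute naming its preferred node;
    `nodes` maps node name -> dict with a 'healthy' flag and a 'load' value.
    If the preferred node is unavailable, degrade to the least-loaded
    healthy node, per the affinity degradation strategy above.
    """
    preferred = task.get("affinity")
    if preferred in nodes and nodes[preferred]["healthy"]:
        return preferred
    healthy = [n for n, info in nodes.items() if info["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy node available")
    # Degradation strategy: pick the suboptimal node with the lowest load
    return min(healthy, key=lambda n: nodes[n]["load"])
```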
In this embodiment, each component of the scheduling system, such as the task scheduler, task executor, and configuration management, is split into independent microservices using containerization technology such as Docker or Kubernetes, packaged into container images, and deployed on a plurality of nodes, realizing distributed deployment of the scheduling system; a node may be a physical node or a virtual node. A load balancing strategy, such as a consistent hashing algorithm or round robin, then distributes task requests evenly across the nodes so that no single node becomes a performance bottleneck. Meanwhile, the health of each node is checked periodically through a heartbeat detection mechanism, and tasks on failed or faulty nodes are reassigned to other available nodes.
In other embodiments, a distributed lock or a distributed coordination service such as ZooKeeper is used to lock and synchronize shared resources, so that multiple nodes do not consume the same task at the same time, which would cause repeated execution or inconsistent data; atomicity of tasks and data integrity are ensured through distributed transactions or an eventual consistency algorithm. In addition, a distributed log collection and monitoring system such as ELK or Prometheus captures real-time logs and metrics during task execution, enabling real-time monitoring and anomaly alerts. If a task on a node fails or times out, a failover mechanism is triggered and the task is retried on another available node. Meanwhile, a distributed tracing system such as Zipkin or Jaeger performs link tracing and performance analysis on each component of the scheduling system to identify performance bottlenecks and points of failure. Based on the tracing results, the system is optimized and improved, e.g., by adding nodes, adjusting the task allocation policy, or optimizing task execution logic, continuously improving the availability and performance of the system.
Specifically, when deploying the scheduling system in Docker containers, components such as the scheduler, executor, and configuration management can each be packaged into independent images, and the lifecycle and load balancing of the containers can be managed through Kubernetes Deployment and Service resources. For example, the scheduler image may be implemented on Spring Boot and the Quartz framework, with the configuration file mounted inside the container through a ConfigMap and the address and port of ZooKeeper specified by environment variables. The executor image may be implemented on the Python and Celery frameworks, receiving tasks issued by the scheduler through a task queue and storing task execution state and results in Redis. In Kubernetes, automatic scaling of the scheduler and executor can be realized through an HPA (Horizontal Pod Autoscaler), which dynamically adjusts the number of Pod replicas according to CPU and memory utilization, ensuring the availability and performance of the system. During task scheduling, the scheduler selects an executor node through a consistent hashing algorithm and routes the task request to the corresponding Pod. The core idea of consistent hashing is to map both nodes and tasks onto a ring and, moving clockwise, find the first node whose hash value is greater than or equal to the task's hash value as the execution node. Suppose there are 3 executor nodes A, B, C with hash values 10, 20, 30, and 2 tasks T1 and T2 with hash values 15 and 25; then T1 is assigned to node B and T2 to node C. In this way, tasks are distributed evenly among the nodes, and when the number of nodes changes, only a small fraction of task assignments are affected.
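The consistent hashing example above (nodes A, B, C at ring positions 10, 20, 30; tasks T1 and T2 hashing to 15 and 25) can be sketched as a toy ring. The hash positions are supplied directly rather than computed, since the text does not specify a hash function:

```python
import bisect

class ConsistentHashRing:
    def __init__(self, node_hashes):
        # node_hashes: dict of node name -> position on the ring
        self._points = sorted((h, n) for n, h in node_hashes.items())
        self._hashes = [h for h, _ in self._points]

    def route(self, task_hash):
        # First node clockwise with position >= task hash; wrap to the start
        i = bisect.bisect_left(self._hashes, task_hash)
        if i == len(self._points):
            i = 0
        return self._points[i][1]

ring = ConsistentHashRing({"A": 10, "B": 20, "C": 30})
assert ring.route(15) == "B"   # T1, as in the example above
assert ring.route(25) == "C"   # T2
assert ring.route(35) == "A"   # wraps around the ring
```

A production ring would also place multiple virtual nodes per physical node to even out the distribution.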
During task execution, the executor sends the task's execution logs and metrics to Elasticsearch and Prometheus, with log queries and metric monitoring provided through Kibana and Grafana. Key fields in the task log, such as task ID, execution time, and execution result, can be parsed by a Logstash Grok filter, and fast retrieval is enabled by Elasticsearch's inverted index. Likewise, metrics such as task execution time, execution count, and failure rate can be collected through Prometheus exporters, with aggregation and alerting rules expressed in PromQL. During failover, node state changes can be watched through ZooKeeper's Watcher mechanism, and a new scheduler node can be determined through Leader election. When a scheduler node A goes offline, the other scheduler nodes receive a notification from ZooKeeper and trigger a Leader election. Suppose there are 3 scheduler nodes A, B, C whose ZooKeeper session IDs are 1, 2, 3 respectively, and the current Leader is node A. When node A goes offline, nodes B and C compare their session IDs; since node B's session ID is the smallest, it is elected as the new Leader and takes over node A's task scheduling and allocation. At the same time, node B reassigns node A's unfinished tasks to node C and ensures their continued execution.
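The Leader election just described, where the surviving node with the smallest session ID takes over, can be sketched as a minimal function; a real deployment would rely on ZooKeeper primitives rather than this toy:

```python
def elect_leader(session_ids, offline):
    """Pick the new Leader after node failures.

    session_ids: dict of node name -> ZooKeeper session ID;
    offline: set of node names that have gone offline.
    Returns the surviving node with the smallest session ID,
    or None if no node survives.
    """
    alive = {n: s for n, s in session_ids.items() if n not in offline}
    if not alive:
        return None
    # Smallest session ID wins, as in the A/B/C example above
    return min(alive, key=alive.get)
```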
S400, for tasks requiring high execution precision, dynamically adjusting the execution order of each task;
in some of these embodiments, the step S400 includes:
modifying the configuration of the task scheduling framework Quartz and adjusting the time granularity of tasks;
setting a priority attribute for each task according to how time-sensitive the task is;
determining the execution order of the tasks according to the priority attribute;
for tasks with the same priority attribute, executing them in the order in which they were submitted to the system;
triggering the script to execute the corresponding task using a time wheel algorithm;
detecting the execution of the task, where the execution conditions include the number of executions and the execution time;
if the execution time of a task exceeds a preset time threshold, triggering an alarm mechanism;
and periodically checking the execution state of the task, and, if execution fails, resuming execution from the most recent checkpoint.
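The priority-then-FIFO ordering in the steps above can be sketched with a binary heap. The numeric convention (a lower number means a higher priority) is an assumption for illustration:

```python
import heapq
import itertools

class TaskQueue:
    """Priority queue for tasks: lower priority number runs first;
    ties are broken by submission order (FIFO), matching the steps above."""
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # monotonically increasing submission counter

    def submit(self, priority, name):
        # The sequence number makes equal-priority tasks pop in FIFO order
        heapq.heappush(self._heap, (priority, next(self._seq), name))

    def pop(self):
        return heapq.heappop(self._heap)[2]
```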
In this embodiment, for timed tasks, the original time granularity is adjusted from minute level to second level by modifying the configuration of the task scheduling framework Quartz, so that the scheduler supports finer-grained task scheduling and the execution precision of timed tasks is improved.
Specifically, in the Quartz task scheduling framework, the time granularity of tasks is adjusted by configuring the org.quartz.jobStore.misfireThreshold parameter. If the parameter is set to 5000, the scheduler considers a task on time if it fires within 5 seconds of its scheduled time and treats anything beyond 5 seconds as a misfire to be rescheduled; this allows the original minute-level scheduling to be changed to second level.
For tasks requiring millisecond-level triggering, a time wheel algorithm may be employed to implement an efficient timer. The core idea of the time wheel algorithm is to divide time into a number of slots, each representing a fixed interval, e.g., 1 millisecond. The algorithm uses a circular array to represent the time wheel; each element of the array is called a bucket, and each bucket stores the tasks that need to be executed during that interval. When the current system time reaches the interval represented by a bucket, all tasks in that bucket are triggered. When the current time is t and the execution time of the next task is t+Δt, the task is placed into bucket ((t+Δt) / 1 ms) mod N, where N is the number of buckets. Taking the modulus makes it possible to quickly locate the bucket into which a task should be placed.
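A minimal sketch of the time wheel just described, using the bucket index formula ((t + Δt) / tick) mod N. This single-level sketch assumes delays shorter than one full revolution of the wheel; multi-level wheels or per-task round counters handle longer delays:

```python
class TimingWheel:
    """Fixed-interval time wheel: bucket i holds the tasks due in the
    i-th tick interval; the index is ((t + delta_t) // tick) % size."""
    def __init__(self, size, tick_ms=1):
        self.size = size
        self.tick = tick_ms
        self.buckets = [[] for _ in range(size)]

    def add(self, now_ms, delay_ms, task):
        # Place the task into bucket ((t + Δt) / tick) mod N.
        # Assumes delay_ms < size * tick (one wheel revolution).
        idx = ((now_ms + delay_ms) // self.tick) % self.size
        self.buckets[idx].append(task)

    def expire(self, time_ms):
        # Fire and clear everything in the bucket for the current tick
        idx = (time_ms // self.tick) % self.size
        due, self.buckets[idx] = self.buckets[idx], []
        return due
```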
For time-sensitive tasks, timely execution can be guaranteed by setting priorities. In Quartz, a task's priority can be set dynamically before it is triggered by implementing the TriggerListener interface. For example, for a task scheduled to run at 09:15:00 every day, the triggerFired method of the TriggerListener can determine whether the current time falls within a narrow window around 09:15:00, such as 30 milliseconds, and if so set the task's priority to the highest. Furthermore, since the system clock may drift and affect the accuracy of timed tasks, periodic synchronization with a time server is necessary. Clock synchronization can be realized with the NTP (Network Time Protocol); Linux systems provide the ntpdate command, and millisecond-level precision can be achieved by calling it from a crontab schedule to synchronize time with an NTP server.
In addition, to monitor the execution of timed tasks, logs may be recorded before and after each task runs, including the task ID, execution time, execution result, and similar information. If the deviation between the actual execution time and the expected time exceeds a threshold, such as 1 second, an alarm is triggered and an administrator is notified by mail or SMS. Meanwhile, metrics such as execution counts and execution durations can be collected through monitoring tools such as Prometheus, and visual dashboards and reports generated, so that an administrator can follow the running state of the system in real time and discover and resolve problems promptly.
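The log-and-alarm idea above can be sketched as follows; the 1-second threshold comes from the text, while the task IDs and the alert hook (a list standing in for a mail/SMS notification) are assumptions for illustration:

```python
# Sketch: log around each task run and alarm when actual vs. expected
# execution time deviates by more than a threshold.
import logging
from datetime import datetime, timedelta

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("task-monitor")

DEVIATION_THRESHOLD = timedelta(seconds=1)
alerts = []  # stand-in for a mail/SMS notification channel

def run_with_logging(task_id, expected_time, actual_time, task_fn):
    logger.info("task %s starting at %s", task_id, actual_time)
    result = task_fn()
    logger.info("task %s finished, result=%s", task_id, result)
    if abs(actual_time - expected_time) > DEVIATION_THRESHOLD:
        alerts.append(task_id)  # deviation too large: trigger an alarm
    return result

expected = datetime(2024, 1, 1, 9, 15, 0)
run_with_logging("job-1", expected, expected + timedelta(seconds=2), lambda: "ok")
run_with_logging("job-2", expected, expected + timedelta(milliseconds=200), lambda: "ok")
```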
For long-running tasks, a task monitoring and fault-tolerance mechanism is adopted: the execution state of the task is periodically checked and saved, and when the task fails it can be recovered from the last checkpoint, reducing the overhead of repeated computation. Meanwhile, tasks that time out or fail are automatically retried or raise an alarm, ensuring task reliability. During task scheduling, metadata such as the submission time, start time, end time, execution state, and resource consumption of each task is recorded, and the execution progress and statistics of tasks, such as average waiting time, average execution time, and resource utilization, are displayed through a visual task monitoring interface, making it convenient for an administrator to understand the running condition and bottlenecks of the system and perform optimization and adjustment.
S500, aiming at a complex task, splitting the task into a plurality of subtasks, determining the execution sequence of each subtask, and sequentially executing the subtasks according to the execution sequence;
in this embodiment, for a complex task, a task splitting manner is adopted to split the task into a plurality of subtasks, and each subtask is responsible for processing a part of data, so that waiting time of the task and waste of resources are reduced to the greatest extent.
In one embodiment, the determining the execution sequence of each subtask includes:
generating a DAG (directed acyclic graph) according to the dependency relationship between each subtask;
and determining the execution sequence among each subtask by adopting a topological sorting algorithm according to the DAG.
Specifically, a directed acyclic graph (DAG) is used to represent the dependencies between subtasks: each node represents a task, and each directed edge represents an ordering constraint. The execution order of the subtasks is obtained through a topological sorting algorithm, avoiding circular dependencies and deadlock. In other embodiments, the execution order among subtasks may also be determined by setting a priority attribute for each task, where the priority may be numerical or a level such as high, medium, or low. When tasks are scheduled, the execution order is determined by priority: high-priority tasks are executed first, and tasks with equal priority are executed in order of submission time.
Further specifically, the usage of resources such as CPU, memory, and disk on each node in the cluster is monitored in real time through a YARN or Mesos resource management system. When a task is scheduled, the optimal node is dynamically selected to execute it according to the task's resource demand and the nodes' idle resources, achieving load balancing and maximizing resource utilization. Meanwhile, subtasks can be executed in parallel, reducing the total execution time of the task. In addition, the subtasks communicate and cooperate through shared memory, message queues, or the like.
Specifically, when optimizing the task scheduling algorithm, a DAG (directed acyclic graph) may be employed to represent dependencies between tasks. For example, a data processing flow can be split into tasks such as data acquisition, data cleaning, data conversion, and data analysis, whose precedence dependencies form a DAG. Through a topological sorting algorithm, the execution order of the tasks can be obtained, e.g. data acquisition -> data cleaning -> data conversion -> data analysis. The time complexity of topological sorting is O(V+E), where V is the number of task nodes and E is the number of dependency edges. In addition, a priority may be set for each task, for example setting the data analysis task to high priority and the data acquisition task to low priority, so that the scheduler executes high-priority tasks first.
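The topological sort described above can be sketched with Kahn's algorithm, which runs in the stated O(V+E) time; the task names follow the data-processing example in the text:

```python
# Kahn's topological sort over a task-dependency DAG.
from collections import deque

def topo_sort(nodes, edges):
    indegree = {n: 0 for n in nodes}
    adj = {n: [] for n in nodes}
    for u, v in edges:          # edge u -> v means "u must run before v"
        adj[u].append(v)
        indegree[v] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:        # releasing u unblocks its dependents
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("cycle detected: dependencies do not form a DAG")
    return order

nodes = ["collect", "clean", "convert", "analyze"]
edges = [("collect", "clean"), ("clean", "convert"), ("convert", "analyze")]
order = topo_sort(nodes, edges)
```

The cycle check doubles as the deadlock guard mentioned above: a dependency cycle leaves some node with nonzero in-degree, so the output is shorter than the node list.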
Specifically, task scheduling needs to take the cluster's resource utilization into account. When a task consumes a lot of memory, the scheduler should select a node with more free memory to execute it. The resource usage of each node can be obtained in real time through a resource management system such as YARN or Mesos, and the optimal node dynamically selected according to the task's resource requirements and the nodes' idle resources. If a task needs a 2-core CPU and 4GB of memory, the scheduler selects a node with at least 2 idle CPU cores and at least 4GB of free memory to execute it.
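A minimal sketch of that resource-aware selection follows; the node inventory is hypothetical (a real scheduler would query YARN/Mesos for it), and preferring the largest free memory is one assumed tie-breaking policy among several reasonable ones:

```python
# Pick a node that satisfies the task's CPU/memory demand, preferring the
# node with the most free memory (simple load-balancing heuristic).
def pick_node(nodes, need_cores, need_mem_gb):
    candidates = [n for n in nodes
                  if n["free_cores"] >= need_cores and n["free_mem_gb"] >= need_mem_gb]
    if not candidates:
        return None  # no node can currently host the task
    return max(candidates, key=lambda n: n["free_mem_gb"])["name"]

cluster = [
    {"name": "node-1", "free_cores": 1, "free_mem_gb": 8},
    {"name": "node-2", "free_cores": 4, "free_mem_gb": 6},
    {"name": "node-3", "free_cores": 2, "free_mem_gb": 3},
]
chosen = pick_node(cluster, need_cores=2, need_mem_gb=4)  # 2-core, 4GB task
```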
Specifically, for a complex big-data processing task, the task is split into multiple Map and Reduce stages, each responsible for processing part of the data. For example, a word-frequency statistics task may be split into a Map stage and a Reduce stage: the Map stage splits text data into words on spaces and outputs key-value pairs of the form <word, 1>, while the Reduce stage adds up the count values of identical words and finally outputs <word, count> results. By executing Map and Reduce tasks in parallel, task execution efficiency can be greatly improved. Data exchange and sorting between the Map and Reduce stages are carried out through a shuffle process.
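The word-frequency split above can be shown in-process; this is an illustrative single-machine sketch in which a dictionary grouping stands in for the distributed shuffle:

```python
# Word-frequency Map/Reduce sketch: map emits <word, 1>, shuffle groups by
# key, reduce sums the counts per word.
from collections import defaultdict

def map_phase(text):
    return [(word, 1) for word in text.split()]  # split on spaces

def shuffle(pairs):
    groups = defaultdict(list)                   # stand-in for the shuffle
    for word, count in pairs:
        groups[word].append(count)
    return groups

def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

counts = reduce_phase(shuffle(map_phase("to be or not to be")))
```

In a real framework the map and reduce calls would run as parallel tasks on different nodes, which is exactly where the efficiency gain described above comes from.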
Specifically, a monitoring and fault-tolerance mechanism also needs to be added during task execution. For a long-running machine learning training task, the model parameters and intermediate results can be saved every epoch, so that when the task terminates abnormally it can be recovered from the latest checkpoint, reducing the overhead of repeated computation. Meanwhile, a maximum retry count and a timeout can be set for tasks: if a task has failed more than 3 retries or its execution time exceeds 2 hours, it is automatically terminated and an alarm notification is sent to an administrator so that the abnormal situation can be handled promptly.
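The checkpoint-and-retry behavior can be sketched as follows; the in-memory checkpoint store and the deliberately failing epoch are assumptions for illustration (real training would persist checkpoints to disk):

```python
# Checkpoint every epoch; on failure, resume from the last saved epoch
# instead of recomputing from scratch.
checkpoint = {"epoch": 0, "state": None}
MAX_RETRIES = 3

def train(total_epochs, fail_at=None):
    for epoch in range(checkpoint["epoch"] + 1, total_epochs + 1):
        if fail_at is not None and epoch == fail_at:
            raise RuntimeError("abnormal termination")
        checkpoint["epoch"] = epoch                 # save after every epoch
        checkpoint["state"] = f"weights@{epoch}"

def run_with_retries(total_epochs):
    failures = 0
    while True:
        try:
            # Fail once at epoch 3 to demonstrate recovery from epoch 2.
            train(total_epochs, fail_at=3 if failures == 0 else None)
            return failures
        except RuntimeError:
            failures += 1
            if failures > MAX_RETRIES:
                raise  # give up; an alarm notification would be sent here

failures = run_with_retries(5)
```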
In addition, to monitor and optimize task scheduling performance in real time, various task metrics need to be collected and analyzed. For each task, metadata such as submission time, start time, end time, execution state, and resource consumption can be recorded, and task progress and statistics displayed through visualization tools such as Grafana, for example an average task waiting time of 10 s, an average task execution time of 5 min, CPU utilization of 60%, and memory utilization of 40%.
S600, dynamically adjusting system resources according to queue information of the script, wherein the queue information comprises a production rate and a consumption rate;
In one embodiment, the step S600 includes:
acquiring real-time queue information of a script queue, wherein the queue information comprises a production rate and a consumption rate;
Presetting an upper limit threshold and a lower limit threshold of a queue length;
Calculating the real-time queue length of the task scheduling queue based on the production rate and the consumption rate;
If the length of the real-time queue is greater than or equal to the upper threshold, triggering capacity expansion operation;
And if the real-time queue length is smaller than the lower threshold value, triggering the capacity-shrinking operation.
Specifically, kafka is introduced as a middleware of the queue, and the state of the queue is monitored by monitoring indexes such as MESSAGESINPERSEC (message writing amount per second) and BytesOutPerSec (byte output amount per second) carried by Kafka, so that real-time queue information of the queue is obtained. If MESSAGESINPERSEC of the current queue reaches 10000/s and BytesOutPerSec is only 2MB/s, the current consumption rate is about 200 bars/s through calculation, the production rate is 10000 bars/s, and therefore the queue stacking speed is 9800 bars/s.
In particular, the upper and lower thresholds may be determined based on historical data and traffic demand. If the upper threshold of the queue length is 100000 and the lower threshold is 50000, an alarm event is triggered when the queue length exceeds 100000. The alarm can be configured in a monitoring system such as Zabbix or Prometheus: when the monitored metric exceeds the threshold, an alarm notification is sent automatically and a capacity-expansion operation is triggered, with the system determining the number of consumers to add according to a predefined expansion strategy. The predefined expansion strategy is to divide the difference between the production rate and the current consumption rate by the average consumption rate of an individual consumer to obtain the number of consumers to add. Specifically, if the average consumption rate of an individual consumer is 100 messages/s, (10000-200)/100=98 consumers need to be added. The deployment module then automatically creates and configures new consumer instances through an Ansible playbook and registers them with the Kafka consumer group. The newly added consumer instances share the messages in the queue evenly, increasing the overall consumption rate. With 100 consumers in total after the increase, each consumer handles 100000/100=1000 messages, so the queue can theoretically be drained within 10 seconds.
When the queue length falls below the lower threshold of 50000, a capacity-shrinking operation is triggered: a subset of consumer instances is selected, stopped, and removed according to a predefined shrinking strategy. For example, the first 10 consumer instances to have started may be selected for removal until the total number of consumers drops to 90. Throughout the scaling process, every expansion and shrinking operation is recorded in an audit table in a MySQL database, including the operation time, operation type, target, parameters, and result. Meanwhile, to avoid frequent scaling operations, a cooldown time, such as 30 seconds, can be set, during which scaling cannot be repeated; the next scaling operation must wait until the cooldown expires. In addition, a limit on concurrent scaling can be set, for example at most 2 expansion operations and 1 shrinking operation at the same time, so that the system is not impacted excessively. If an anomaly occurs during scaling, for example the Ansible playbook fails to execute, a rollback operation is triggered automatically, the system is restored to its pre-scaling state, and an alarm notification is sent to an administrator for timely manual handling.
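The expansion arithmetic above reduces to a few lines; the figures (10000 msg/s production, 200 msg/s consumption, 100 msg/s per consumer, 100000-message backlog) are the example values from the text, and the assumption of 2 pre-existing consumers follows from 200 msg/s at 100 msg/s each:

```python
# Expansion policy: consumers to add = (production - consumption) / rate
# of a single consumer, rounded up.
import math

def consumers_to_add(production_rate, consumption_rate, per_consumer_rate):
    backlog_rate = production_rate - consumption_rate
    if backlog_rate <= 0:
        return 0  # queue is draining; no expansion needed
    return math.ceil(backlog_rate / per_consumer_rate)

added = consumers_to_add(production_rate=10000,
                         consumption_rate=200,
                         per_consumer_rate=100)

# With 2 existing consumers plus the new ones, a 100000-message backlog is
# shared evenly across the consumer group.
per_consumer_backlog = 100000 // (added + 2)
```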
S700, analyzing and modeling historical task data quantity, predicting future task data quantity and system demand quantity, and scheduling tasks and allocating system resources in advance;
in one embodiment, the step S700 includes:
collecting historical task data volume, and analyzing the historical task data volume;
modeling the analyzed historical task data amount to obtain a predictive model of the queue;
Based on the prediction model, combining the execution time of the task and the consumption of system resources to obtain a prediction result, wherein the prediction result comprises the task data amount and the demand of the system resources in the future preset time;
According to the prediction result, carrying out dynamic capacity expansion operation or capacity shrinkage operation on the system;
In the task scheduling process, the task data volume of the task and the consumption of system resources are monitored in real time, and if one of the real-time task data volume and the consumption of the system resources deviates greatly from a predicted result, an early warning mechanism is triggered.
Specifically, machine learning and predictive models are introduced, and a time-series analysis algorithm such as ARIMA is selected to model the historical queue information. Taking the order queue of an e-commerce platform as an example, assuming the platform generates 100,000 to 500,000 orders per day, the order data of the last year can be collected to construct a time-series dataset in which each data point comprises a date and that day's order volume. The parameters p, d, q of the ARIMA model are determined by performing stationarity tests, white-noise tests, and the like on the dataset; for example, ARIMA(7,1,2) indicates that the model includes 7 autoregressive terms, a 1st-order difference, and 2 moving-average terms. The model is trained on the training set and its predictive performance evaluated on the test set; if the RMSE (root mean square error) is less than 5000, the model's prediction accuracy is considered acceptable. In addition, factors affecting order volume are introduced as features, such as holidays and promotional campaigns. A holiday feature can be defined that takes the value 1 if the day is a holiday and 0 otherwise, and a promotion feature that takes the value 1 if there is a promotion that day and 0 otherwise. These features are combined with the historical data to construct a multivariate time-series dataset, which is then trained and predicted with an ARIMAX model. With the trained prediction model, the order volume for the coming week can be predicted, for example 120,000, 150,000, 180,000, 200,000, 250,000, 400,000, and 300,000 orders for the next 7 days respectively.
Combined with an average order processing time of 0.5 seconds, an average CPU occupation of 0.2 cores per order, and a memory occupation of 100MB per order, the resource requirements for the coming week can be estimated. The first day has a CPU demand of 120,000 × 0.2 cores = 24,000 cores and a memory demand of 120,000 × 100MB = 12TB. Based on the predicted resource demand, capacity can be expanded in advance, for example increasing the cluster from 100 machines to 200 machines and upgrading each machine's configuration from 8 cores and 16GB to 16 cores and 32GB, so that the system has sufficient resource processing capability when the order peak arrives.
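The back-of-envelope estimate above can be expressed directly; the per-order coefficients (0.2 cores, 100 MB) and the 7-day forecast are the example figures from the text:

```python
# Estimate daily resource demand from the predicted order volume.
CPU_PER_ORDER = 0.2        # cores per order (example figure)
MEM_PER_ORDER_MB = 100     # MB per order (example figure)

def estimate(orders):
    return {
        "cpu_cores": orders * CPU_PER_ORDER,
        "mem_tb": orders * MEM_PER_ORDER_MB / 1_000_000,  # MB -> TB (decimal)
    }

forecast = [120_000, 150_000, 180_000, 200_000, 250_000, 400_000, 300_000]
day1 = estimate(forecast[0])     # first day of the week
peak = estimate(max(forecast))   # size capacity for the peak day
```

Sizing the cluster for `peak` rather than `day1` is what justifies expanding capacity ahead of the promotional spike.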
In addition, for some periodic tasks, such as early-morning data statistics and report generation, the execution records of historical tasks can be analyzed; suppose the average execution time is found to be 30 minutes, with a resource consumption of 10 CPU cores and 50GB of memory. Since the early morning is the system's off-peak period, the task's scheduled time can be moved up to midnight, avoiding contention for system resources during peak hours. The running state of the system must also be monitored in real time: if the actual order volume or resource usage deviates significantly from the predicted value, for example the actual order volume is 30% higher than predicted, an early warning is triggered and operations staff are notified for emergency handling. Meanwhile, the actual data is fed back into the prediction model, which is updated and optimized in real time with an online learning algorithm such as the Passive-Aggressive algorithm, continuously improving prediction accuracy.
S800, monitoring the execution condition of the script, periodically detecting the running state of the subprocess based on a heartbeat detection mechanism, triggering an alarm mechanism and restarting the subprocess if the subprocess runs abnormally, and recording abnormal log information.
In one embodiment, the step S800 includes:
Creating an independent monitoring process for each executed script by adopting a multi-process mode;
Sending a heartbeat packet to the monitored subprocess through the monitoring process, detecting the response time and the running state of the subprocess, and judging whether the subprocess is in a normal running state or not;
if the heartbeat packet does not receive the response of the subprocess for a plurality of times, judging that the subprocess is abnormal in operation, triggering an alarm mechanism, and simultaneously regenerating a new subprocess according to the configuration information of the script to resume script execution;
If the same sub-process has abnormal operation for a plurality of times within the preset time, triggering an alarm upgrading mechanism;
during the restart of the sub-process, exception log information is recorded, wherein the exception log information includes the time at which the exception occurred, the script path, and the error stack.
Specifically, a separate monitoring process is created for each executed script using Python's multiprocessing library. For example, when a new script starts executing, heartbeat packets may be sent to the sub-process periodically (e.g. every 30 seconds) and the sub-process's response detected. The heartbeat packet may be a simple string, such as "PING", sent to the sub-process via a pipe or socket. If no reply is received from the sub-process 3 times in a row (i.e. for 90 seconds), the sub-process is judged to be running abnormally, an alarm mechanism is triggered, and the sub-process is restarted automatically. The alert may be implemented by sending mail or calling the API of Enterprise WeChat or DingTalk. When restarting, a new sub-process is started according to the script's configuration information, and the anomaly information is recorded into a log file for subsequent analysis and optimization; logging can be performed with Python's logging library. An anomaly counter is also maintained: if the same script has 3 or more anomalies within 30 minutes, the script is considered to have a serious problem and an alarm escalation mechanism is triggered, after which a senior administrator intervenes to maintain the script.
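The decision rules above (3 missed heartbeats → alarm and restart; 3 anomalies → escalate) can be sketched in isolation. Real code would send "PING" over a multiprocessing pipe or socket; here the replies are simulated as booleans, and the 30-minute escalation window is simplified away so only the counting logic is shown:

```python
# Heartbeat policy sketch: restart after 3 consecutive missed heartbeats,
# escalate after 3 anomalies of the same script.
MAX_MISSES = 3        # 3 missed heartbeats = 90 s at 30 s intervals
ESCALATE_AFTER = 3    # anomalies before senior-admin escalation

def monitor(responses):
    misses, anomalies, events = 0, 0, []
    for responded in responses:
        misses = 0 if responded else misses + 1
        if misses >= MAX_MISSES:
            anomalies += 1
            events.append("alarm+restart")  # alarm, then respawn the child
            misses = 0                      # restarted child starts clean
            if anomalies >= ESCALATE_AFTER:
                events.append("escalate")   # hand off to a senior admin
    return events

# Child misses 3 heartbeats, is restarted, replies once, then misses 6 more.
events = monitor([True, False, False, False, True,
                  False, False, False, False, False, False])
```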
In some embodiments, key system metrics such as CPU, memory, disk, and network are collected and analyzed in real time, and reasonable alarm rules and metric thresholds are set; in particular, when a key metric becomes abnormal, operations staff are notified promptly for handling.
Specifically, key system metrics such as CPU, memory, disk, and network are collected and analyzed in real time. A monitoring data collection agent such as Telegraf or Metricbeat is deployed on each node to periodically collect the node's performance metrics, such as CPU utilization, memory usage, disk IO rate, and network bandwidth, and send the collected data to a time-series database for storage. A suitable time-series database, such as InfluxDB or OpenTSDB, is selected to store and manage the monitoring data. Time-series databases are optimized for time-series data, support high-concurrency writes and fast queries, and are suitable for processing large-scale monitoring data. In addition, data storage costs are reduced through data retention policies and data compression algorithms.
Specifically, telegraf is an index collection agent driven by a plug-in developed by Go language, which supports multiple operating systems and data sources, such as Linux, windows, docker, and can collect multiple types of data, such as system index, application index, log, and the like. For example, on the Linux node, indexes such as CPU utilization rate, idle rate, load and the like can be collected by configuring CPUInput plug-ins; by configuring MemInput plug-ins, indexes such as total memory, use amount, idle amount, swap use amount and the like are collected; by configuring DiskInput plug-ins, indexes such as IO read-write rate, IO waiting time, IOPS and the like of the disk are collected; by configuring NetInput plug-ins, indexes such as the sending amount, the receiving amount, the error number, the packet loss number and the like of the network interface are collected. Telegraf the collected data can be sent to an InfluxDB time sequence database for storage, and InfluxDB provides a query language similar to SQL and an HTTPAPI interface, so that the data can be conveniently queried and aggregated. Meanwhile, influxDB supports a data retention strategy and continuous inquiry, can automatically downsample and compress old data, and saves storage space.
Further specifically, a data visualization platform is connected to the time-series database to create monitoring dashboards and charts that display the real-time trends and historical data of the various metrics. Operations staff can check the overall running condition of the system through a Web interface and quickly locate abnormal problems.
Specifically, a system-monitoring dashboard is created in Grafana to display each node's key metrics such as CPU, memory, disk, and network. CPU utilization is shown with a Gauge chart whose color ranges are set to green for 0-60%, yellow for 60-80%, and red for 80-100%; memory usage and swap usage are shown with an Area chart, giving an intuitive view of memory trends and swap-in/swap-out behavior; disk utilization is shown with a Stat chart with thresholds, displaying a yellow warning above 80% and a red warning above 90%; and network traffic send and receive rates are shown with a Graph chart, with direction and volume distinguished by area stacking.
Further specifically, reasonable alarm rules and thresholds are formulated according to business requirements and system characteristics. For example, when CPU usage exceeds 80% for 5 consecutive minutes, or disk space usage exceeds 90%, a warning-level alarm is triggered; when memory usage exceeds 95% for 10 consecutive minutes, or the network packet-loss rate exceeds 1%, a critical-level alarm is triggered. The alarm rules are configured into an alarm engine, such as Alertmanager or Kapacitor, which periodically evaluates the monitoring data to determine whether an alarm condition is met. If it is, the alarm message is sent automatically to the designated notification channels, such as mail, SMS, WeChat, or Slack, according to the alarm notification template. After the alarm message is sent, operations staff respond to and handle the alarm event promptly, analyze the root cause in depth through the monitoring dashboard and a log analysis tool such as ELK or Graylog, formulate a solution, and record the handling process and result to form a case library for subsequent fault diagnosis and optimization.
Specifically, alarm rules and notification channels are configured through Alertmanager. For example, for CPU usage, an alarm rule may be set that triggers a warning-level alarm and sends a notification when a node's average CPU idle rate is below 20% for 5 minutes. The alert notification may be sent via Email, Slack, webhook, or the like; taking Email as an example, when an alarm is triggered, Alertmanager sends the notification to the corresponding Email address based on the labels matched by the alarm route. After receiving the alarm, operations staff log in to Grafana to view the trend graph of the abnormal metric and analyze the cause, and search the related error logs and exception stacks with the ELK log analysis platform to locate the root cause of the problem. For example, if a node's CPU utilization is found to be persistently high, the process with the highest CPU usage can be identified with the top command, its thread stacks dumped with the jstack command, the code's performance bottleneck analyzed, and the problem code optimized or repaired, reducing CPU utilization and avoiding impact on the service.
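The sustained-threshold rule above (idle rate below 20% for 5 consecutive minutes) can be sketched as follows; samples are assumed to arrive once per minute, and the returned string stands in for the notification Alertmanager would route:

```python
# Sustained-threshold alarm: fire only when the idle rate stays below the
# threshold for `sustain` consecutive samples.
def evaluate(idle_samples, threshold=20.0, sustain=5):
    streak = 0
    for sample in idle_samples:
        streak = streak + 1 if sample < threshold else 0
        if streak >= sustain:
            return "warning"   # Alertmanager would route this to Email
    return "ok"

# Five consecutive samples below 20% -> warning fires on the fifth.
state = evaluate([25, 18, 15, 19, 12, 10])
```

Requiring a sustained streak rather than a single low sample is what keeps short CPU spikes from paging operations staff.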
Referring to fig. 2, the present invention further provides a dynamic queue scheduling system, which includes:
a first processing module: used for setting a management interface for inputting configuration information of a script, wherein the configuration information comprises a script path, parameters, and execution period metadata;
And a second processing module: adopting a task scheduling frame Quartz, generating a scheduling plan according to the configuration information, and triggering a script to start and execute at fixed time according to the scheduling plan;
And a third processing module: the scheduling system is used for being deployed on a plurality of nodes, and if one node runs abnormally, tasks which are not completed by the node are scheduled to another node which runs normally to continue to be executed;
A fourth processing module: the method is used for dynamically adjusting the execution sequence of each task aiming at the task with high execution precision;
and a fifth processing module: the method comprises the steps of dividing a task into a plurality of subtasks aiming at a complex task, determining the execution sequence of each subtask, and sequentially executing the subtasks according to the execution sequence;
And a sixth processing module: the system resource management method comprises the steps of dynamically adjusting system resources according to queue information of a script, wherein the queue information comprises a production rate and a consumption rate;
Seventh processing module: the system is used for analyzing and modeling historical queue information, predicting future queue information and system demand, and scheduling tasks and allocating system resources in advance;
an eighth processing module: the method is used for monitoring the execution condition of the script, detecting the running state of the subprocess based on the heartbeat detection mechanism at regular intervals, triggering the alarm mechanism and restarting the subprocess if the subprocess runs abnormally, and recording abnormal log information.
It can be understood that the contents of the embodiment of the dynamic queue scheduling method shown in fig. 1 are applicable to this embodiment of the dynamic queue scheduling system; the functions implemented by the system embodiment are the same as those of the method embodiment shown in fig. 1, and the beneficial effects achieved are the same as those achieved by that method embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.
Claims (9)
1. A method for dynamic queue scheduling, the method comprising:
setting a management interface for inputting configuration information of a script, wherein the configuration information comprises a script path, parameters and execution period metadata;
Adopting a task scheduling frame Quartz, generating a scheduling plan according to the configuration information, and triggering a script to start and execute at fixed time according to the scheduling plan;
the scheduling system is deployed on a plurality of nodes, and if one node runs abnormally, the task which is not finished by the node is scheduled to another node which runs normally to continue to execute;
Aiming at tasks with high execution precision, dynamically adjusting the execution sequence of each task;
Aiming at a complex task, splitting the task into a plurality of subtasks, determining the execution sequence of each subtask, and sequentially executing the subtasks according to the execution sequence;
dynamically adjusting system resources according to queue information of the script, wherein the queue information comprises a production rate and a consumption rate;
Analyzing and modeling historical task data quantity, predicting future task data quantity and demand quantity of system resources, and scheduling tasks and allocating the system resources in advance;
and monitoring the execution condition of the script, periodically detecting the running state of the subprocess based on a heartbeat detection mechanism, triggering an alarm mechanism and restarting the subprocess if the subprocess runs abnormally, and simultaneously recording abnormal log information.
2. The method according to claim 1, characterized in that using the task scheduling framework Quartz, generating a scheduling plan according to the configuration information, and triggering the script to start and execute at fixed times according to the scheduling plan comprises:
storing the configuration information in a database, and converting the configuration information into a scheduling plan that Quartz can recognize and process;
loading the scheduling plan into the Quartz task scheduling engine, so that the script executes tasks according to the scheduling plan;
during execution of the script, collecting run data of the script, storing the run data in the database, and associating the run data with the configuration information, wherein the run data comprises the running state and output results of the script;
after the script finishes executing, storing the execution result and script state in the database, and triggering the corresponding notification mechanism.
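Quartz itself is a Java framework; the bookkeeping flow of claim 2 — persist the configuration, derive a schedule from it, then record run data keyed back to the configuration row — can be sketched language-neutrally. The table layout, period-based plan, and run record below are all assumptions for illustration.

```python
import json
import sqlite3

# Hypothetical schema: one table for script configuration, one for run data
# associated back to the configuration (as claim 2 requires).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE config (id INTEGER PRIMARY KEY, script_path TEXT, period_s INTEGER)")
db.execute("CREATE TABLE run (config_id INTEGER, state TEXT, output TEXT)")

config_id = db.execute(
    "INSERT INTO config (script_path, period_s) VALUES (?, ?)",
    ("/jobs/report.py", 300),
).lastrowid

# "Scheduling plan": the next few fire times derived from the stored period.
period = db.execute("SELECT period_s FROM config WHERE id=?", (config_id,)).fetchone()[0]
plan = [i * period for i in range(3)]

# After an execution, store run data linked to the originating configuration.
db.execute("INSERT INTO run VALUES (?, ?, ?)", (config_id, "SUCCESS", json.dumps({"rows": 42})))
state = db.execute("SELECT state FROM run WHERE config_id=?", (config_id,)).fetchone()[0]
```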
3. The method according to claim 1, characterized in that deploying the scheduling system on a plurality of nodes and, if one of the nodes runs abnormally, rescheduling the unfinished tasks of that node to another normally running node for continued execution comprises:
splitting the components of the scheduling system into a plurality of independent microservices based on containerization, packaging the microservices into container images, and deploying the container images across a plurality of nodes;
distributing tasks evenly across the nodes based on a load balancing strategy;
adding a node affinity attribute to the metadata of each task;
designating the node that preferentially executes a task according to its affinity attribute and the load of the nodes;
if the designated node is unavailable, selecting a suboptimal node to execute the task according to an affinity degradation strategy;
detecting the running state of the tasks on each node;
and if a task on a node fails or times out, triggering a failover mechanism and rescheduling the tasks of that node to another normally running node for re-execution.
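The node-selection rule of claim 3 — prefer the task's affinity node, and degrade to the least-loaded healthy node when it is unavailable — can be sketched as below. The data shapes and the least-load tiebreak are assumptions; the patent only states that a "suboptimal" node is chosen by a degradation strategy.

```python
# Hypothetical affinity-with-degradation node selection.
def pick_node(task, nodes):
    # nodes: {name: {"healthy": bool, "load": float}}
    preferred = task.get("affinity")
    if preferred in nodes and nodes[preferred]["healthy"]:
        return preferred
    # Degradation strategy (assumed): fall back to the least-loaded healthy node.
    healthy = {n: s for n, s in nodes.items() if s["healthy"]}
    if not healthy:
        raise RuntimeError("no healthy node available")
    return min(healthy, key=lambda n: healthy[n]["load"])

nodes = {"a": {"healthy": False, "load": 0.2},
         "b": {"healthy": True, "load": 0.7},
         "c": {"healthy": True, "load": 0.3}}
chosen = pick_node({"affinity": "a"}, nodes)  # node "a" is down -> failover
```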
4. The method according to claim 1, characterized in that dynamically adjusting the execution order of tasks requiring high timing precision comprises:
modifying the configuration of the task scheduling framework Quartz to adjust the time granularity of task scheduling;
setting a priority attribute for each task according to how time-sensitive the task is;
determining the execution order of the tasks according to their priority attributes;
executing tasks with the same priority attribute in the order in which they were submitted to the system;
triggering the script to execute the corresponding task using a time wheel algorithm;
detecting the execution of each task, including its execution count and execution time;
if the execution time of a task exceeds a preset time threshold, triggering an alarm mechanism;
and periodically detecting the execution state of each task, and if a task fails, resuming execution from the most recent checkpoint.
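The ordering rule of claim 4 — higher priority first, ties broken by submission order — maps naturally onto a priority queue. This sketch uses a heap with a monotonically increasing submission counter as the tiebreaker; the names are illustrative and the time wheel trigger is not modeled here.

```python
import heapq
import itertools

counter = itertools.count()  # submission sequence number, breaks priority ties

def submit(queue, name, priority):
    # heapq is a min-heap, so negate the priority to pop the highest first.
    heapq.heappush(queue, (-priority, next(counter), name))

q = []
submit(q, "low", 1)
submit(q, "high-1", 5)
submit(q, "high-2", 5)

# Equal-priority tasks come out in submission order.
order = [heapq.heappop(q)[2] for _ in range(len(q))]
```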
5. The method of claim 1, wherein said determining the execution order of the subtasks comprises:
generating a directed acyclic graph (DAG) from the dependency relationships between the subtasks;
and determining the execution order of the subtasks from the DAG using a topological sorting algorithm.
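Claim 5 does not name a particular topological sort; Kahn's algorithm is one standard choice and is sketched below, with a cycle check since a valid order only exists for an acyclic graph. The `deps` mapping shape is an assumption.

```python
from collections import deque

def topo_order(deps):
    # deps: {task: [tasks it depends on]} -- must form a DAG.
    indeg = {t: len(d) for t, d in deps.items()}
    dependents = {t: [] for t in deps}
    for t, d in deps.items():
        for p in d:
            dependents[p].append(t)
    ready = deque(sorted(t for t, n in indeg.items() if n == 0))
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for nxt in dependents[t]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order

order = topo_order({"extract": [], "transform": ["extract"], "load": ["transform"]})
```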
6. The method of claim 1, wherein dynamically adjusting system resources according to the queue information of the script comprises:
acquiring real-time queue information of the script queue, wherein the queue information comprises a production rate and a consumption rate;
presetting an upper threshold and a lower threshold for the queue length;
calculating the real-time queue length based on the production rate and the consumption rate;
if the real-time queue length is greater than or equal to the upper threshold, triggering a capacity expansion operation;
and if the real-time queue length is less than the lower threshold, triggering a capacity reduction operation.
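One plausible reading of claim 6's length calculation is to project the queue length forward from the production and consumption rates and compare it against the preset thresholds. The projection formula, interval, and threshold values below are assumptions for illustration.

```python
# Hypothetical scaling decision from production/consumption rates.
def scaling_action(length, produce_rate, consume_rate, dt, upper, lower):
    # Projected length after dt seconds; a queue cannot go negative.
    projected = max(0, length + (produce_rate - consume_rate) * dt)
    if projected >= upper:
        return projected, "scale_up"      # capacity expansion
    if projected < lower:
        return projected, "scale_down"    # capacity reduction
    return projected, "hold"

projected, action = scaling_action(100, 50, 10, 5, upper=300, lower=20)  # backlog grows
```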
7. The method of claim 1, wherein analyzing and modeling the historical task data volume, predicting the future task data volume and system resource demand, and scheduling tasks and allocating system resources in advance comprises:
collecting and analyzing the historical task data volume;
modeling the analyzed historical task data volume to obtain a prediction model of the queue;
based on the prediction model, combined with task execution times and system resource consumption, obtaining a prediction result that comprises the task data volume and system resource demand within a preset future period;
performing a dynamic capacity expansion or capacity reduction operation on the system according to the prediction result;
and during task scheduling, monitoring the real-time task data volume and system resource consumption, and if either deviates significantly from the prediction result, triggering an early warning mechanism.
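The patent does not specify the prediction model. As a deliberately simple stand-in, the sketch below uses a moving average over recent history plus a relative-deviation check that would stand in for the early-warning trigger; the window size and tolerance are assumed values.

```python
# Assumed model: moving average over the last `window` samples.
def forecast(history, window=3):
    recent = history[-window:]
    return sum(recent) / len(recent)

# Assumed deviation rule: alarm when actual differs from the prediction
# by more than `tolerance` (a fraction of the predicted value).
def deviates(actual, predicted, tolerance=0.5):
    return abs(actual - predicted) > tolerance * predicted

history = [100, 120, 110, 130, 140]   # historical task data volume
predicted = forecast(history)          # mean of the last 3 samples
alarm = deviates(300, predicted)       # large spike -> early warning
```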
8. The method according to claim 1, wherein monitoring the execution of the script, periodically detecting the running state of each subprocess based on a heartbeat detection mechanism, and if a subprocess runs abnormally, triggering an alarm mechanism, restarting the subprocess, and recording exception log information comprises:
creating an independent monitoring process for each executing script in a multi-process mode;
sending heartbeat packets from the monitoring process to the monitored subprocess, measuring the response time and running state of the subprocess, and judging whether the subprocess is running normally;
if several consecutive heartbeat packets receive no response from the subprocess, judging that the subprocess is running abnormally, triggering the alarm mechanism, and spawning a new subprocess from the configuration information of the script to resume execution;
if the same subprocess runs abnormally several times within a preset period, triggering an alarm escalation mechanism;
and during the restart of the subprocess, recording exception log information, wherein the exception log information comprises the time the exception occurred, the script path, and the error stack.
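The heartbeat logic of claim 8 — count consecutive missed heartbeats, restart after a threshold, and escalate when the same subprocess fails repeatedly inside a time window — can be sketched as a small state machine. Thresholds, the window, and the return values are illustrative; actual process spawning and logging are only modeled as state.

```python
class HeartbeatMonitor:
    def __init__(self, max_misses=3, escalate_after=2, window=3600):
        self.max_misses = max_misses        # consecutive misses before a restart
        self.escalate_after = escalate_after  # abnormal runs before escalation
        self.window = window                # escalation window, seconds
        self.misses = 0
        self.failures = []                  # timestamps of abnormal-run events
        self.log = []                       # stand-in for exception log records

    def on_heartbeat(self, responded, now):
        if responded:
            self.misses = 0
            return "ok"
        self.misses += 1
        if self.misses < self.max_misses:
            return "waiting"
        # Abnormal run: alarm, record log, and (conceptually) restart the
        # subprocess from the script's configuration information.
        self.misses = 0
        self.failures = [t for t in self.failures if now - t <= self.window]
        self.failures.append(now)
        self.log.append({"time": now, "event": "restart"})
        if len(self.failures) >= self.escalate_after:
            return "escalate"
        return "alarm_and_restart"

m = HeartbeatMonitor()
events = [m.on_heartbeat(False, t) for t in range(7)]  # subprocess never responds
```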
9. A dynamic queue scheduling system, the system comprising:
a first processing module, configured to provide a management interface for inputting configuration information of a script, wherein the configuration information comprises a script path, parameters, and execution period metadata;
a second processing module, configured to use the task scheduling framework Quartz, generate a scheduling plan according to the configuration information, and trigger the script to start and execute at fixed times according to the scheduling plan;
a third processing module, configured to deploy the scheduling system on a plurality of nodes and, if one of the nodes runs abnormally, reschedule the unfinished tasks of that node to another normally running node for continued execution;
a fourth processing module, configured to dynamically adjust the execution order of tasks requiring high timing precision;
a fifth processing module, configured to split a complex task into a plurality of subtasks, determine the execution order of the subtasks, and execute the subtasks in that order;
a sixth processing module, configured to dynamically adjust system resources according to queue information of the script, wherein the queue information comprises a production rate and a consumption rate;
a seventh processing module, configured to analyze and model historical queue information, predict future queue information and system resource demand, and schedule tasks and allocate system resources in advance;
and an eighth processing module, configured to monitor the execution of the script, periodically detect the running state of each subprocess based on a heartbeat detection mechanism, and if a subprocess runs abnormally, trigger an alarm mechanism, restart the subprocess, and record exception log information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410520671.4A CN118331708A (en) | 2024-04-28 | 2024-04-28 | Dynamic queue scheduling method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410520671.4A CN118331708A (en) | 2024-04-28 | 2024-04-28 | Dynamic queue scheduling method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118331708A true CN118331708A (en) | 2024-07-12 |
Family
ID=91772489
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410520671.4A Pending CN118331708A (en) | 2024-04-28 | 2024-04-28 | Dynamic queue scheduling method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118331708A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118963974A (en) * | 2024-10-17 | 2024-11-15 | 厦门她趣信息技术有限公司 | Multi-dimensional distributed task scheduling method, device, equipment and storage medium |
CN119052085A (en) * | 2024-09-10 | 2024-11-29 | 固安聚龙自动化设备有限公司 | A multi-network parallel acceleration system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6820215B2 (en) | System and method for performing automatic rejuvenation at the optimal time based on work load history in a distributed data processing environment | |
US8938421B2 (en) | Method and a system for synchronizing data | |
CN113946499B (en) | A microservice link tracking and performance analysis method, system, device and application | |
CN118331708A (en) | Dynamic queue scheduling method and system | |
CN111125444A (en) | Big data task scheduling management method, device, device and storage medium | |
CN111400011B (en) | Real-time task scheduling method, system, equipment and readable storage medium | |
CN118838735B (en) | Distributed automatic expansion method in high-performance calculation | |
CN114675956B (en) | A method based on Pod configuration and scheduling between Kubernetes clusters | |
CN118963974A (en) | Multi-dimensional distributed task scheduling method, device, equipment and storage medium | |
CN114896121A (en) | Monitoring method and device for distributed processing system | |
CN117971384A (en) | Automatic operation and maintenance method based on container and big data | |
US20250045101A1 (en) | Slow node detection method during task running, apparatus, electronic device, and medium | |
CN117112180A (en) | Task-based cluster automation control method | |
CN118917262A (en) | Parallel simulation regression method and system based on cluster server | |
CN116109112B (en) | Service data processing method, device, medium and equipment based on aggregation interface | |
CN115658246A (en) | Scheduling engine middleware data processing method | |
CN113824601A (en) | Electric power marketing monitored control system based on service log | |
CN118377768A (en) | Data ETL method, device, equipment and medium based on service flow | |
CN115858499A (en) | Database partition processing method and device, computer equipment and storage medium | |
CN116109260A (en) | TOC multi-project management method and system based on key chain method | |
CN115480924A (en) | Method and device for processing job data, storage medium and electronic equipment | |
CN114418342A (en) | A business data processing method, device and readable storage medium | |
CN119201617A (en) | Fault prediction method and device, storage medium and electronic device | |
CN118585297A (en) | A task execution method and related equipment | |
CN109032809A (en) | Heterogeneous parallel scheduling system based on remote sensing image storage position |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||