
CN113032125B - Job scheduling method, job scheduling device, computer system and computer readable storage medium - Google Patents

Job scheduling method, job scheduling device, computer system and computer readable storage medium

Info

Publication number
CN113032125B
Authority
CN
China
Prior art keywords
executed
job
scheduler
tasks
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110364946.6A
Other languages
Chinese (zh)
Other versions
CN113032125A (en)
Inventor
裴伟斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Holding Co Ltd
Original Assignee
Jingdong Technology Holding Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Holding Co Ltd filed Critical Jingdong Technology Holding Co Ltd
Priority to CN202110364946.6A priority Critical patent/CN113032125B/en
Publication of CN113032125A publication Critical patent/CN113032125A/en
Application granted granted Critical
Publication of CN113032125B publication Critical patent/CN113032125B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure provides a job scheduling method, a job scheduling apparatus, a job scheduling system, a computer-readable storage medium, and a computer program product. The job scheduling method comprises the following steps: determining a target scheduler instance from a scheduler cluster, wherein the scheduler cluster comprises one or more scheduler instances; acquiring, based on the target scheduler instance, a plurality of tasks to be executed which constitute a job to be executed; evenly distributing the plurality of tasks to be executed to a plurality of executor instances of an executor cluster, so that the plurality of tasks to be executed are executed concurrently by the plurality of executor instances, and, when there is a target task to be executed that has finished executing, updating the execution state of the target task to be executed by the executor instance that executed it; acquiring the execution states of the plurality of tasks to be executed; and determining that scheduling of the job to be executed is completed when the execution states of the plurality of tasks to be executed are all completed.

Description

Job scheduling method, job scheduling device, computer system and computer readable storage medium
Technical Field
The present disclosure relates to the field of computer technology, and more particularly, to a job scheduling method, a job scheduling apparatus, a job scheduling system, a computer-readable storage medium, and a computer program product.
Background
With the rapid development of computer technology, business processing is becoming increasingly intelligent, and job scheduling is unavoidable in intelligent business processing. Job scheduling typically selects certain jobs from a backup queue in external storage, loads them into memory, creates processes for them, allocates the necessary resources, and then inserts the newly created processes into a ready queue for execution.
In the course of realizing the concept of the present disclosure, the inventor found that the related art has at least the following problem: a job scheduling method designed for one service scenario cannot be applied generally to other service scenarios.
Disclosure of Invention
In view of this, the present disclosure provides a job scheduling method, a job scheduling apparatus, a job scheduling system, a computer-readable storage medium, and a computer program product.
One aspect of the present disclosure provides a job scheduling method, including: determining a target scheduler instance from a scheduler cluster, wherein the scheduler cluster comprises one or more scheduler instances; acquiring, based on the target scheduler instance, a plurality of tasks to be executed which constitute a job to be executed; evenly distributing the plurality of tasks to be executed to a plurality of executor instances of an executor cluster, so that the plurality of tasks to be executed are executed concurrently by the plurality of executor instances, and, when there is a target task to be executed that has finished executing, the execution state of the target task to be executed is updated by the executor instance that executed it; acquiring the execution states of the plurality of tasks to be executed; and determining that scheduling of the job to be executed is completed when the execution states of the plurality of tasks to be executed are all completed.
Another aspect of the present disclosure provides a job scheduling method applied to a plurality of executor instances in an executor cluster, the method including: acquiring a plurality of tasks to be executed which constitute a job to be executed and are sent by a target scheduler instance in a scheduler cluster, wherein the scheduler cluster comprises one or more scheduler instances and the plurality of tasks to be executed are evenly distributed to the plurality of executor instances; concurrently executing the plurality of tasks to be executed by using the plurality of executor instances; and, when there is a target task to be executed that has finished executing, updating the execution state of the target task to be executed by using the executor instance that executed it, so that the target scheduler instance determines, according to the execution states of the plurality of tasks to be executed, whether scheduling of the job to be executed is completed.
Another aspect of the present disclosure provides a job scheduling apparatus, including: a first determining module, configured to determine a target scheduler instance from a scheduler cluster, where the scheduler cluster includes one or more scheduler instances; a first acquisition module, configured to acquire, based on the target scheduler instance, a plurality of tasks to be executed which constitute a job to be executed; a first dispatch module, configured to evenly dispatch the plurality of tasks to be executed to a plurality of executor instances of an executor cluster so as to concurrently execute the plurality of tasks to be executed by using the plurality of executor instances, and to update, when there is a target task to be executed that has finished executing, the execution state of the target task to be executed by using the executor instance that executed it; a second acquisition module, configured to acquire the execution states of the plurality of tasks to be executed; and a second determining module, configured to determine that scheduling of the job to be executed is completed when the execution states of the plurality of tasks to be executed are all completed.
Another aspect of the present disclosure provides a job scheduling apparatus, including: a fifth acquisition module, configured to acquire a plurality of tasks to be executed which constitute a job to be executed and are sent by a target scheduler instance in a scheduler cluster, where the scheduler cluster includes one or more scheduler instances and the plurality of tasks to be executed are evenly distributed to the plurality of executor instances; an execution module, configured to concurrently execute the plurality of tasks to be executed by using the plurality of executor instances; and a first updating module, configured to update, when there is a target task to be executed that has finished executing, the execution state of the target task to be executed by using the target executor instance that executed it, so that the target scheduler instance determines, according to the execution states of the plurality of tasks to be executed, whether scheduling of the job to be executed is completed.
Another aspect of the present disclosure provides a job scheduling system, comprising: a scheduler module configured to: determine a target scheduler instance from a scheduler cluster, wherein the scheduler cluster comprises one or more scheduler instances; acquire, based on the target scheduler instance, a plurality of tasks to be executed which constitute a job to be executed; evenly distribute the plurality of tasks to be executed to a plurality of executor instances of an executor cluster, so that the plurality of tasks to be executed are executed concurrently by the plurality of executor instances and, when there is a target task to be executed that has finished executing, the execution state of the target task to be executed is updated by the executor instance that executed it; acquire the execution states of the plurality of tasks to be executed; and determine that scheduling of the job to be executed is completed when the execution states of the plurality of tasks to be executed are all completed; and an executor module configured to: acquire the plurality of tasks to be executed which constitute the job to be executed and are sent by the target scheduler instance in the scheduler cluster; concurrently execute the plurality of tasks to be executed by using the plurality of executor instances; and, when there is a target task to be executed that has finished executing, update the execution state of the target task to be executed by using the executor instance that executed it, so that the target scheduler instance determines, according to the execution states of the plurality of tasks to be executed, whether scheduling of the job to be executed is completed.
Another aspect of the present disclosure provides a computer system comprising: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the job scheduling method as described above.
Another aspect of the present disclosure provides a computer-readable storage medium having stored thereon computer-executable instructions that, when executed, are configured to implement a job scheduling method as described above.
Another aspect of the present disclosure provides a computer program product comprising computer executable instructions which, when executed, are for implementing a job scheduling method as described above.
According to embodiments of the present disclosure, the technical means adopted are: determining a target scheduler instance from a scheduler cluster, wherein the scheduler cluster includes one or more scheduler instances; acquiring, based on the target scheduler instance, a plurality of tasks to be executed which constitute a job to be executed; evenly distributing the plurality of tasks to be executed to a plurality of executor instances of an executor cluster, concurrently executing the plurality of tasks to be executed by using the plurality of executor instances, and updating, when there is a target task to be executed that has finished executing, the execution state of the target task to be executed by using the executor instance that executed it; acquiring the execution states of the plurality of tasks to be executed; and determining that scheduling of the job to be executed is completed when the execution states of the plurality of tasks to be executed are all completed. Because the job scheduling process is split, by means of the scheduler and the executor, into steps of finer granularity that can be applied to various service scenarios, the technical problem that job scheduling methods lack generality is at least partially overcome, and the technical effect of a job scheduling method that can be flexibly designed for different service scenarios is achieved.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments thereof with reference to the accompanying drawings in which:
FIG. 1 schematically illustrates an exemplary system architecture in which a job scheduling method may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a job scheduling method applied to a scheduler cluster, in accordance with an embodiment of the present disclosure;
FIG. 3 schematically illustrates an association between jobs and tasks in a database according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of a job scheduling method applied to an executor cluster in accordance with an embodiment of the present disclosure;
FIG. 5 schematically illustrates an overall architecture diagram of a job scheduling system in accordance with an embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow chart of a heartbeat model of a scheduler in accordance with an embodiment of the present disclosure;
FIG. 7 schematically illustrates an example diagram of scheduling jobs using a polling approach in accordance with an embodiment of the present disclosure;
FIG. 8 schematically illustrates a block diagram of a job scheduling apparatus applied to a scheduler cluster, in accordance with an embodiment of the present disclosure;
FIG. 9 schematically illustrates a block diagram of a job scheduling apparatus applied to an executor cluster, in accordance with an embodiment of the present disclosure; and
FIG. 10 schematically illustrates a block diagram of a computer system suitable for implementing the above-described method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where a convention analogous to "at least one of A, B and C, etc." is used, such a convention should generally be interpreted in accordance with the meaning commonly understood by one of skill in the art (e.g., "a system having at least one of A, B and C" would include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). Where a formulation similar to "at least one of A, B or C, etc." is used, such a formulation should generally be interpreted in accordance with the ordinary understanding of one skilled in the art (e.g., "a system with at least one of A, B or C" would include, but not be limited to, systems with A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
Currently existing job scheduling frameworks include: Quartz, the most popular scheduling framework in the Java open-source community, which supports various timed and recurring tasks as well as distributed deployment; and XXL-Job, a scheduling framework developed by Chinese engineers that covers the functions required by many scheduling systems and is a very popular domestic scheduling framework.
In the course of realizing the concept of the present disclosure, the inventor found that, beyond the general basic scheduling logic of a job, job scheduling needs to control the fair allocation and use of critical system resources among the multiple tenants of a SaaS platform, large jobs need to be decomposed into small tasks for distributed concurrent scheduling to improve efficiency, and the relations among the system modules need to be decoupled so that the system can scale horizontally more easily. At the same time, as a distributed scheduling system, robustness must be considered: the failure of any node must not affect other nodes or the overall service. None of the frameworks described above meet these requirements.
For example, Quartz only controls well the future run time of a job and lets users customize various timed and recurring jobs, but it cannot perceive the multiple tenants of a SaaS (Software-as-a-Service) platform, nor does it provide interface extensions for a SaaS platform; splitting a large job into small tasks for scheduling is not supported. Quartz is merely a toolkit for Java job scheduling, not a complete scheduling system.
For another example, XXL-Job likewise cannot perceive the multiple tenants of a SaaS platform, nor does it provide an interface extension that can be used with a SaaS platform; splitting a large job into small tasks for scheduling is not supported; the two modules of the system, the dispatch center and the executors, are tightly coupled: the executors need to register with the dispatch center and the dispatch center needs to manage the executors, so distributed deployment and horizontal scaling of the whole system are troublesome and complex. Robustness of the system is also not fully considered.
In summary, in the course of realizing the concept of the present disclosure, the inventor found that there is currently no general open-source job scheduling framework that supports distributed deployment and highly concurrent, efficient execution of jobs, supports SaaS platforms, and at the same time guarantees that limited critical resources are allocated fairly and effectively among multiple SaaS tenants and multiple jobs.
Embodiments of the present disclosure provide a job scheduling method, a job scheduling apparatus, a job scheduling system, a computer-readable storage medium, and a computer program product. The method includes determining a target scheduler instance from a scheduler cluster, wherein the scheduler cluster includes one or more scheduler instances; acquiring, based on the target scheduler instance, a plurality of tasks to be executed which constitute a job to be executed; evenly distributing the plurality of tasks to be executed to a plurality of executor instances of an executor cluster, concurrently executing the plurality of tasks to be executed by using the plurality of executor instances, and updating, when there is a target task to be executed that has finished executing, the execution state of the target task to be executed by using the executor instance that executed it; acquiring the execution states of the plurality of tasks to be executed; and determining that scheduling of the job to be executed is completed when the execution states of the plurality of tasks to be executed are all completed.
Fig. 1 schematically illustrates an exemplary system architecture 100 in which a job scheduling method may be applied according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios.
As shown in fig. 1, a system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, and the like.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients and/or social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that, the job scheduling method provided by the embodiments of the present disclosure may be generally performed by the server 105. Accordingly, the job scheduling apparatus provided by the embodiments of the present disclosure may be generally provided in the server 105. The job scheduling method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the job scheduling apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Or the job scheduling method provided by the embodiment of the present disclosure may be performed by the terminal apparatus 101, 102, or 103, or may be performed by another terminal apparatus other than the terminal apparatus 101, 102, or 103. Accordingly, the job scheduling apparatus provided by the embodiments of the present disclosure may also be provided in the terminal device 101, 102, or 103, or in another terminal device different from the terminal device 101, 102, or 103.
For example, a plurality of tasks to be performed may be originally stored in any one of the terminal devices 101, 102, or 103 (for example, but not limited to, the terminal device 101), or stored on an external storage device and may be imported into the terminal device 101. Then, the terminal device 101 may locally execute the job scheduling method provided by the embodiment of the present disclosure, or send a plurality of tasks to be executed to other terminal devices, servers, or server clusters, and execute the job scheduling method provided by the embodiment of the present disclosure by the other terminal devices, servers, or server clusters that receive the plurality of tasks to be executed.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically illustrates a flow chart of a job scheduling method applied to a scheduler cluster according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S201 to S205.
In operation S201, a target scheduler instance is determined from a scheduler cluster, wherein the scheduler cluster includes one or more scheduler instances.
In operation S202, a plurality of tasks to be executed constituting a job to be executed are acquired based on the target scheduler instance.
According to embodiments of the present disclosure, since a job (job to be executed) may be large or small and its running time may be long or short, the system may be designed so that, for example, a job is split into a plurality of finer-grained tasks (i.e., the plurality of tasks to be executed that constitute the job to be executed) for execution, and the job and the tasks obtained by splitting it may be stored in a database in advance, for example, for the target scheduler instance to acquire.
Fig. 3 schematically illustrates an association relationship between jobs and tasks in a database according to an embodiment of the present disclosure.
As shown in fig. 3, the relevant parameters of a Job may include, for example, the id (primary key) and name (job name) of the Job, and the relevant parameters of a Task may include, for example, its id (primary key) and job_id (the id of the Job to which the Task belongs). The job_id field of the Task table corresponds to the primary key of the Job table, indicating that one Job may be composed of multiple Tasks.
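As an illustrative sketch only, the Job/Task association described above could be modeled roughly as follows; the status fields and class names are assumptions for illustration and are not part of the disclosed schema:

```java
// Minimal sketch of the Job/Task association from fig. 3.
// Only id, name and job_id come from the description; the status fields are assumptions.
class Job {
    private long id;          // primary key of the Job table
    private String name;      // job name
    private String status;    // assumed execution-state field, e.g. "RUNNING", "COMPLETED"
    // getters/setters omitted for brevity
}

class Task {
    private long id;          // primary key of the Task table
    private long jobId;       // job_id: references Job.id, so one Job maps to many Tasks
    private String status;    // assumed execution-state field of the task
    // getters/setters omitted for brevity
}
```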
In operation S203, the plurality of tasks to be executed are evenly allocated to the multiple executor instances of the executor cluster, so that the multiple executor instances are used to concurrently execute the tasks to be executed, and, if there is a target task to be executed that has finished executing, the execution state of that target task is updated by the executor instance that executed it.
In operation S204, execution states of a plurality of tasks to be executed are acquired.
In operation S205, when the execution states of the plurality of tasks to be executed are all completed, it is determined that scheduling of the job to be executed is completed.
According to embodiments of the present disclosure, each job and task has its own execution state, which may be stored in the database as a parameter of the corresponding job or task, for example. The target scheduler instance may obtain the execution status of the task from the database, and when the execution status of all tasks of a certain job is "completed", the execution status of the job is set to "completed", for example.
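A minimal sketch of this completion check is given below; the repository interface and method names are hypothetical and introduced only to make the step concrete:

```java
import java.util.List;

// Hypothetical data-access interface; the real persistence layer is not specified by the disclosure.
interface TaskRepository {
    List<String> findTaskStatesByJobId(long jobId);
}

class JobCompletionChecker {
    private final TaskRepository taskRepository;

    JobCompletionChecker(TaskRepository taskRepository) {
        this.taskRepository = taskRepository;
    }

    // Operations S204/S205: the job is considered scheduled (completed) only when
    // the execution states of all of its tasks are "COMPLETED".
    boolean isJobCompleted(long jobId) {
        List<String> states = taskRepository.findTaskStatesByJobId(jobId);
        return !states.isEmpty() && states.stream().allMatch("COMPLETED"::equals);
    }
}
```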
According to the embodiments of the present disclosure, splitting the job scheduling process, by means of the scheduler and the executor, into steps of finer granularity that can be applied to various service scenarios solves the technical problem that job scheduling methods lack generality, and achieves the technical effect of a job scheduling method that can be flexibly designed for different service scenarios. Meanwhile, by decomposing one large job into small tasks executed in parallel on multiple nodes, the execution efficiency of the job can be improved.
According to an embodiment of the present disclosure, the above-described operation S201 includes: determining a first global lock related to the job to be executed; acquiring a scheduler instance with normal heartbeat response in a scheduler cluster; and acquiring the scheduler instance with the first global lock from the scheduler instances with normal heartbeat response as the target scheduler instance.
According to an embodiment of the disclosure, the first global lock may be, for example, a global lock that is related to the job to be executed and includes a scheduler identifier, and a normal heartbeat response may indicate, for example, that the corresponding scheduler instance is working normally.
According to an embodiment of the present disclosure, the above-described operation S202 includes: acquiring the heartbeat response of the target scheduler instance; and, in the case of an abnormal heartbeat response, re-determining the target scheduler instance from the other scheduler instances in the scheduler cluster.
According to the embodiments of the present disclosure, an abnormal heartbeat response may indicate, for example, that the corresponding scheduler instance has gone down or cannot work normally. When the target scheduler instance goes down, in order not to affect the jobs already allocated to it, i.e., so that allocated jobs are not suspended because of the node failure, a heartbeat model may be introduced for the scheduler cluster. In this way, when the original target scheduler instance goes down, a new target scheduler instance can take over in time and continue the scheduling of the tasks to be executed that were acquired by the original target scheduler instance.
Through the embodiments of the present disclosure, by configuring the heartbeat model, the jobs a node was responsible for scheduling are taken over by other nodes after that node goes down, which ensures the robustness of the scheduling system (i.e., the scheduler cluster).
According to an embodiment of the present disclosure, the critical resources of the system may be allocated in units of tasks, and the job scheduling method applied to the scheduler cluster may further include: acquiring a first target task to be executed whose execution state is completed; and releasing the critical resources used for executing the first target task to be executed.
Through the above embodiments of the present disclosure, finer-grained control of the fair allocation of critical resources among individual jobs can be achieved.
According to an embodiment of the present disclosure, the job scheduling method applied to the scheduler cluster may further include: in the case that the scheduling process of the job to be executed is interrupted, acquiring second target tasks to be executed whose execution states are incomplete among the tasks of the job to be executed; and evenly reassigning the second target tasks to be executed to the plurality of executor instances for re-execution.
According to the embodiments of the present disclosure, by combining the execution state parameters of the job to be executed and of its tasks, the running status of each task constituting the job can be accurately recorded, and when the job scheduling system fails and then recovers, execution can continue from where the job last stopped instead of restarting the whole job.
Through the embodiments of the present disclosure, repeated execution can be reduced, execution efficiency improved, and critical resources saved. Meanwhile, for jobs that are not idempotent and therefore cannot be executed repeatedly, failures caused by repeated execution can be eliminated.
According to an embodiment of the present disclosure, the job scheduling method applied to the scheduler cluster may further include: sending the execution process information of the job to be executed to a management end for visual display.
Through the embodiments of the present disclosure, the real-time running status of a job can be aggregated and displayed at a finer granularity, and real-time control and adjustment can be performed as needed.
Fig. 4 schematically illustrates a flow chart of a job scheduling method applied to an executor cluster according to an embodiment of the present disclosure.
As shown in fig. 4, the method includes operations S401 to S403.
In operation S401, a plurality of tasks to be executed which constitute a job to be executed and are sent by a target scheduler instance in a scheduler cluster are acquired, where the scheduler cluster includes one or more scheduler instances and the plurality of tasks to be executed are evenly distributed to a plurality of executor instances.
In operation S402, a plurality of tasks to be performed are concurrently performed using a plurality of executor instances.
In operation S403, when there is a target task to be executed that has finished executing, the execution state of the target task to be executed is updated with the executor instance that executed it, so that the target scheduler instance determines, according to the execution states of the plurality of tasks to be executed, whether scheduling of the job to be executed is completed.
According to an embodiment of the present disclosure, the above-described operation S402 includes: in the case that a task to be executed is not executed successfully, judging whether the job to be executed is configured with a retry identifier; in the case that the job to be executed is configured with a retry identifier, putting the task that was not successfully executed into a retry queue, where the retry queue is configured with a retry waiting period; and re-executing the task to be executed in the retry queue once the retry waiting period has elapsed.
According to embodiments of the present disclosure, since some tasks may fail on a first attempt, the system may provide a mechanism to retry the tasks that did not execute successfully. For a job configured with a retry identifier, a failed task in the job that needs to be retried may, for example, be put into a retry queue and, according to the configured retry waiting period, be dispatched and executed again after that period has elapsed. While waiting for the retry, the task may also release its critical resources, for example, so that they can be used by other waiting tasks, thereby ensuring maximum utilization of the critical resources.
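A minimal sketch of such a retry queue is given below, assuming a ScheduledExecutorService as one possible way to model the retry waiting period; the class and method names are illustrative only:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of the retry mechanism: a failed task of a job configured with a retry identifier
// is re-dispatched after the configured retry waiting period. The disclosure does not
// prescribe a concrete queue implementation; a scheduled executor is used here for brevity.
class TaskRetryQueue {
    private final ScheduledExecutorService retryScheduler =
            Executors.newSingleThreadScheduledExecutor();

    void submitForRetry(Runnable failedTask, long retryWaitSeconds, Runnable releaseCriticalResource) {
        // While waiting for the retry, the task's critical resources are released
        // so that other waiting tasks can use them.
        releaseCriticalResource.run();
        retryScheduler.schedule(failedTask, retryWaitSeconds, TimeUnit.SECONDS);
    }
}
```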
According to an embodiment of the present disclosure, the job scheduling method applied to the executor cluster may further include: acquiring a second global lock related to the job to be executed when there is a target task to be executed that has finished executing; and updating the global statistics of the job to be executed by using the executor instance that holds the second global lock.
According to an embodiment of the disclosure, the second global lock may be, for example, a global lock that is related to the job to be executed and includes an executor identifier, and the global statistics may include, for example, statistical indicators such as the execution progress, execution success rate and failure rate of the job to be executed, as well as detailed information about the success or failure of each task to be executed (which may include, for example, the cause of a task's failure).
According to an embodiment of the present disclosure, there are N jobs to be executed, N ≥ 1, and the job scheduling method applied to the scheduler cluster and the executor cluster may further include: scheduling the N jobs to be executed in a polling manner, where each polling round includes scheduling one task to be executed from each of the N jobs to be executed.
According to an embodiment of the present disclosure, the scheduling of the job to be executed may include, for example, the obtaining of the task to be executed by the target scheduler instance and the execution of the task to be executed by the executor instance. The first job to be executed and the second job to be executed may be, for example, the same or different jobs started for different tenants in the SaaS platform.
Note that, the job scheduling methods corresponding to fig. 2 and fig. 4 may be applied to a system to implement job scheduling independently, or may be combined with each other to implement job scheduling in the same system.
According to an embodiment of the present disclosure, by combining the job scheduling methods shown in fig. 2 and 4, for example, a job scheduling system including a scheduler module and an executor module may be constructed.
According to an embodiment of the present disclosure, the scheduler module may be configured to determine a target scheduler instance from a scheduler cluster, where the scheduler cluster includes one or more scheduler instances; acquire, based on the target scheduler instance, a plurality of tasks to be executed which constitute a job to be executed; evenly distribute the plurality of tasks to be executed to a plurality of executor instances of an executor cluster, concurrently execute the plurality of tasks to be executed by using the plurality of executor instances, and update, when there is a target task to be executed that has finished executing, the execution state of the target task to be executed by using the executor instance that executed it; acquire the execution states of the plurality of tasks to be executed; and determine that scheduling of the job to be executed is completed when the execution states of the plurality of tasks to be executed are all completed.
According to an embodiment of the present disclosure, the executor module may be used, for example, to acquire the plurality of tasks to be executed which constitute the job to be executed and are sent by the target scheduler instance in the scheduler cluster, where the scheduler cluster includes one or more scheduler instances and the plurality of tasks to be executed are evenly distributed to the plurality of executor instances; concurrently execute the plurality of tasks to be executed by using the plurality of executor instances; and update, when there is a target task to be executed that has finished executing, the execution state of the target task to be executed by using the executor instance that executed it, so that the target scheduler instance determines, according to the execution states of the plurality of tasks to be executed, whether scheduling of the job to be executed is completed.
It should be noted that, for example, the scheduler module may correspondingly execute other methods in the job scheduling method applied to the scheduler cluster, and the executor module may correspondingly execute other methods in the job scheduling method applied to the executor cluster, which are not described herein.
Taking the job scheduling of a SaaS platform as an example, a job scheduling system that uses the job scheduling methods shown in fig. 2 and 4 described above may be divided into, for example, three modules: a management end, a scheduler module and an executor module. The management end may be responsible for managing jobs, including creating, modifying, deleting, copying, starting, suspending, searching, viewing and exporting jobs; the scheduler module may be responsible for splitting jobs into tasks for fine-grained scheduling and for balancing the fair and reasonable allocation of critical resources among all users and jobs of each organization of the SaaS platform; and the executor module may be responsible for the specific execution of the tasks.
Fig. 5 schematically illustrates an overall architecture diagram of a job scheduling system according to an embodiment of the present disclosure.
As shown in fig. 5, in the actual scheduling process, the three modules, i.e., the management end (not shown in the figure), the scheduler module and the executor module, all communicate unidirectionally, for example through a message queue, so as to decouple the modules from one another. The management end may notify each scheduler instance in the scheduler module of a job operation through a message broadcast; the scheduler instance splits a job it is responsible for into tasks and then sends the tasks through a message queue to the executor instances in the executor module; each executor instance starts to execute a specific task after receiving the task information and updates the state of that task in the database after execution is completed.
According to an embodiment of the present disclosure, the flow of job scheduling based on the job scheduling system shown in fig. 5 may include operations S501 to S509, for example.
In operation S501, a job to be executed is determined.
According to the embodiment of the disclosure, after the management end creates the job (job to be executed), when the job is started, the job may be broadcast to each scheduler instance in the scheduler cluster through a message queue, such as the scheduler instance 1, the scheduler instance 2, and the like, so that the scheduler instance determines the job to be executed that needs to be acquired.
In operation S502, a scheduler instance contends for a global lock.
According to embodiments of the present disclosure, only one of the scheduler instances in the scheduler cluster can schedule the job, and this qualification can be determined, for example, by competing for a global lock (i.e., the first global lock described above). In this embodiment, the first global lock may be implemented, for example, with Redis, and the job is scheduled by the scheduler instance that first obtains the first global lock.
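A minimal sketch of this lock contention is given below; the Redis abstraction and the key naming are assumptions for illustration, corresponding to the Redis SET ... NX EX semantics rather than any specific client library:

```java
// Hypothetical minimal Redis abstraction; setIfAbsent stands in for SET key value NX EX ttl.
interface RedisLockClient {
    boolean setIfAbsent(String key, String value, long ttlSeconds);
}

class SchedulerLockContender {
    private final RedisLockClient redis;
    private final String schedulerIp;   // identifier of this scheduler instance

    SchedulerLockContender(RedisLockClient redis, String schedulerIp) {
        this.redis = redis;
        this.schedulerIp = schedulerIp;
    }

    // Operation S502: every scheduler instance competes for the job's first global lock;
    // only the instance that sets the key first wins and becomes the target scheduler instance.
    boolean tryAcquireFirstGlobalLock(long jobId, long ttlSeconds) {
        return redis.setIfAbsent("job:lock:" + jobId, schedulerIp, ttlSeconds);  // key name is assumed
    }
}
```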
In operation S503, the scheduling relationship is updated and the job information is read.
According to an embodiment of the present disclosure, the scheduler instance that obtains the scheduling qualification (i.e., the above-mentioned target scheduler instance, for example scheduler instance 1 in the present embodiment) starts scheduling the job, writes the matching information between scheduler instance 1 and the job into a database (such as MySql in fig. 5), and reads from the MySql database the information needed to schedule the job.
In operation S504, scheduler instance 1 converts all the tasks that make up the job into MQ messages and sends them to the task dispatch queue.
In operation S505, the tasks in the task dispatch queue are evenly distributed to all the executor instances of the executor cluster (e.g., executor instance 1, executor instance 2, ..., executor instance n in fig. 5) for concurrent execution.
In operation S506, the executor instance contends for the global lock.
In operation S507, the running and finished states of the task are updated.
According to the embodiments of the present disclosure, after a task has been executed, the executor instance needs to update the execution state of the task and, at the same time, update the global statistics of the job to which the task belongs.
It should be noted that, before updating the global statistics of the job, the global lock (i.e., the second global lock) must be obtained first; only then can the global statistics of the job in the MySql database be updated. In this embodiment, introducing the second global lock prevents errors in the statistical indicators that would otherwise be caused by multiple executor instances updating the job's global statistics at the same time.
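A minimal sketch of this update-under-lock flow is given below; the lock and repository interfaces, key names and state strings are assumptions introduced only for illustration:

```java
// Sketch of operations S506-S507: after finishing a task, the executor instance updates the
// task state and then updates the job's global statistics while holding the second global lock.
interface GlobalLock {
    boolean tryLock(String key);
    void unlock(String key);
}

interface JobStatsRepository {
    void updateTaskState(long taskId, String state);
    void updateJobStatistics(long jobId);   // e.g. progress, success rate, failure rate
}

class TaskCompletionHandler {
    private final GlobalLock lock;
    private final JobStatsRepository repository;

    TaskCompletionHandler(GlobalLock lock, JobStatsRepository repository) {
        this.lock = lock;
        this.repository = repository;
    }

    void onTaskFinished(long jobId, long taskId, boolean success) {
        // Update the execution state of the finished task (operation S507).
        repository.updateTaskState(taskId, success ? "COMPLETED" : "FAILED");

        // The second global lock serializes updates to the job's global statistics so that
        // concurrent executor instances do not corrupt the statistical indicators.
        String lockKey = "job:stats:lock:" + jobId;            // assumed key name
        while (!lock.tryLock(lockKey)) {
            Thread.onSpinWait();  // trivial spin for the sketch; a real implementation would back off or block
        }
        try {
            repository.updateJobStatistics(jobId);
        } finally {
            lock.unlock(lockKey);
        }
    }
}
```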
In operation S508, the job execution information is pulled and updated.
According to the embodiments of the present disclosure, a daemon thread in the scheduler instance can periodically pull the execution states of jobs and tasks from the corresponding tables of the MySql database, release the critical resources occupied by completed tasks, and set the running state of a job to completed after all of its tasks have been executed.
In operation S509, the management end displays the job and task information in real time.
According to the embodiments of the present disclosure, the management end can display job and task running information to the user in real time.
According to an embodiment of the present disclosure, the three modules may be deployed as distributed clusters, and each module may be scaled horizontally as required by the service.
Through the embodiments of the present disclosure, a decoupled, distributed system architecture is adopted, so that the modules of the system do not depend on each other, which facilitates distributed deployment and supports horizontal scaling; meanwhile, the distributed structure supports concurrent execution of a job on multiple nodes (such as multiple executor instances), which can effectively improve operating efficiency and system throughput.
According to embodiments of the present disclosure, a heartbeat model may be configured for example for a scheduler in the job scheduling system shown in fig. 5.
Fig. 6 schematically shows a flow chart of a heartbeat model of a scheduler according to an embodiment of the disclosure.
According to an embodiment of the present disclosure, referring to fig. 5, the above heartbeat model requires, for example, that each scheduler instance periodically send heartbeat information to the Redis cluster, where the key of the heartbeat information is, for example, a specific prefix added to the IP address of the corresponding scheduler instance (e.g., scheduler instance 1), the value may be the IP address of scheduler instance 1 itself, and the expiration time may be, for example, 2 seconds or a configurable parameter.
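A minimal sketch of this heartbeat publication is given below; the Redis abstraction and the key prefix are assumptions, and only the key/value/expiration scheme described above is taken from the disclosure:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical Redis abstraction; setWithTtl stands in for SET key value EX ttl.
interface RedisHeartbeatClient {
    void setWithTtl(String key, String value, long ttlSeconds);
}

class SchedulerHeartbeat {
    private static final String HEARTBEAT_PREFIX = "scheduler:heartbeat:"; // assumed prefix
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final RedisHeartbeatClient redis;
    private final String schedulerIp;

    SchedulerHeartbeat(RedisHeartbeatClient redis, String schedulerIp) {
        this.redis = redis;
        this.schedulerIp = schedulerIp;
    }

    void start(long ttlSeconds) {
        // Refresh the key more often than it expires (e.g. every second for a 2-second TTL),
        // so that a healthy scheduler instance never lets its heartbeat lapse.
        long period = Math.max(1, ttlSeconds / 2);
        timer.scheduleAtFixedRate(
                () -> redis.setWithTtl(HEARTBEAT_PREFIX + schedulerIp, schedulerIp, ttlSeconds),
                0, period, TimeUnit.SECONDS);
    }
}
```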
According to an embodiment of the present disclosure, referring to fig. 6, the flow of the heartbeat model of the scheduler may include operations S601 to S609, for example.
In operation S601, job operation information is received.
In operation S602, it is determined whether the job information exists in memory.
In operation S603, the corresponding operation is performed and the flow ends.
According to the embodiments of the present disclosure, corresponding to operations S601 to S602, when the scheduler instance receives the broadcast message of a job operation sent by the management end, it first queries its own memory to check whether the job exists in the list of jobs it is responsible for scheduling; if so, operation S603 is performed, otherwise the flow proceeds to operation S604.
In operation S604, the database is queried for the scheduler IP to which the job corresponds.
In operation S605, it is determined whether there is a corresponding scheduler IP.
In operation S606, it is determined whether the queried IP address is the instance's own IP.
According to the embodiments of the present disclosure, corresponding to operations S605 to S606, the above-mentioned scheduler instance queries the database according to the job id to obtain the IP address of the scheduler instance responsible for scheduling the job, and if that IP address equals its own IP address, operation S603 is performed.
In operation S607, the global lock is preempted to contend for the scheduling of the job, and the flow ends.
According to an embodiment of the present disclosure, corresponding to operation S605, if the IP address of the scheduler instance responsible for the job is not found in the database, which indicates that the job has not yet been allocated to any scheduler instance, operation S607 is performed.
In operation S608, it is queried whether the heartbeat of the scheduler corresponding to that IP is normal.
According to an embodiment of the present disclosure, corresponding to operation S606, if the queried scheduler IP responsible for the job is not the instance's own, Redis is queried according to that IP address to determine whether the scheduler instance responsible for the job still has a heartbeat.
In operation S609, the process returns directly.
According to an embodiment of the present disclosure, corresponding to operation S608, if the heartbeat of the scheduler instance responsible for the job is normal, operation S609 is performed; otherwise operation S607 is performed.
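A minimal sketch of the takeover decision in the flow of fig. 6 is given below; the lookup interfaces are hypothetical, and only the decision logic follows the operations described above:

```java
// Hypothetical lookups: the scheduler IP recorded in the database and the heartbeat key in Redis.
interface SchedulerRegistry {
    String findSchedulerIpForJob(long jobId);     // returns null if the job has no scheduler yet
    boolean hasHeartbeat(String schedulerIp);     // queries Redis for the heartbeat key of that IP
}

class JobTakeoverDecider {
    private final SchedulerRegistry registry;
    private final String ownIp;

    JobTakeoverDecider(SchedulerRegistry registry, String ownIp) {
        this.registry = registry;
        this.ownIp = ownIp;
    }

    // Returns true when this scheduler instance should contend for the job's global lock (S607).
    boolean shouldContendForJob(long jobId, boolean alreadyInOwnJobList) {
        if (alreadyInOwnJobList) {
            return false;                         // S602/S603: already scheduling it, just operate on it
        }
        String responsibleIp = registry.findSchedulerIpForJob(jobId);
        if (responsibleIp == null) {
            return true;                          // S605/S607: not yet assigned to any scheduler
        }
        if (responsibleIp.equals(ownIp)) {
            return false;                         // S606/S603: this instance is already responsible
        }
        // S608: another instance is responsible; take over only if its heartbeat is gone.
        return !registry.hasHeartbeat(responsibleIp);
    }
}
```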
According to embodiments of the present disclosure, for an executor, after any node goes down it no longer receives any messages sent by the MQ, and therefore no tasks are distributed to it. For a scheduler, after any node goes down it no longer receives the job operation instructions sent by the management end, and new jobs subsequently created at the management end are no longer scheduled by that node, so the failure of any node in the scheduler cluster does not affect other nodes. For a scheduler node that has already been assigned jobs, the heartbeat model allows the unfinished tasks of the downed scheduler instance to be reassigned to other scheduler instances, so job scheduling is not affected.
By adopting the heartbeat model in the scheduler cluster, the embodiments of the present disclosure ensure the robustness of the system: when any machine in the scheduling system goes down, the other machines are not affected and the whole system continues to operate normally, and the restart of any machine does not affect the continuity of the service (job scheduling).
According to the embodiments of the present disclosure, based on the job scheduling system shown in fig. 5, the multi-tenant data of the SaaS platform may be isolated in a logically isolated manner, for example by setting a field organization_id (organization id) on all entity tables of the multi-tenant data to indicate to which organization the data belongs (i.e., the tenant; hereinafter "organization" is used in place of "tenant"), so that the data of all organizations can reside in one database, which facilitates unified operation, management and deployment. For job scheduling, a polling manner may be used, for example, to ensure fair scheduling, with each scheduler instance having, for example, one and only one thread responsible for dispatching tasks for all the jobs that the scheduler instance is responsible for.
Fig. 7 schematically illustrates an example diagram of scheduling jobs using a polling approach in accordance with an embodiment of the present disclosure.
In accordance with an embodiment of the present disclosure, referring to fig. 5, each scheduler instance maintains, for example, a list of executing jobs (which may include, for example, the jobs and tasks shown in fig. 7). The list is not grouped by organization (e.g., job1, job2, job3, ..., jobN in fig. 7 correspond to different organizations); that is, the jobs of all organizations (where each job in fig. 7 may contain multiple tasks) are stored in the job list in a mixed order, and a newly created job may also be added to the job list at any time. On this basis, referring to fig. 7, scheduling jobs in a polling manner may be expressed, for example, as the following procedure (a simplified sketch is given after this list):
a. The pointer jobIndex points to the currently scheduled job (e.g., job 2). Each time, one task (e.g., task1 or task 2) is taken from the currently pointed-to job for dispatch; after the task is taken, jobIndex moves to the next job (e.g., job 3), and after the tail of the queue (e.g., jobN) is reached the loop starts again from the beginning (e.g., job 1).
b. Each job has a list of the tasks that make up the job, and the scheduler maintains a taskIndex for each job, pointing to the next task of that job to be dispatched (e.g., task1 or task 2). After a task (e.g., task 1) of the job is dispatched, taskIndex advances to the next task (e.g., task 2); when taskIndex is greater than or equal to the length of the task list, all tasks of the job have been dispatched, the execution state of the job may be marked as "completed", and the related resources occupied by the job may be reclaimed.
c. For a job that requires critical resources, the maximum number of critical resources that the job may statically occupy while running may be calculated first when the job is loaded. This value may be, for example, the minimum of the following values: the number of critical resources allocated to the organization on the management platform, the maximum number of tasks that a scheduler instance of the system configuration may run concurrently, the number of tasks that each job of the system configuration is allowed to execute concurrently, and so on, without being limited thereto.
d. On the basis of item c, the maximum number of critical resources the job is allowed to occupy can be determined when the job is loaded. When the job is actually dispatched, the current occupancy of the critical resource must be calculated dynamically to judge whether there are still idle critical resources available for the task to run; if so, the task can be dispatched and the number of available critical resources is reduced by one, otherwise jobIndex moves on to the next job to be dispatched.
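A simplified sketch of the polling procedure in items a to d is given below; the class names and the reduced critical-resource bookkeeping are assumptions for illustration only:

```java
import java.util.List;

// Simplified sketch of the jobIndex/taskIndex polling dispatch described above.
// Finished jobs would in practice be removed from the list and their resources reclaimed.
class PollingDispatcher {
    private final List<PolledJob> jobs;   // mixed list of jobs of all organizations
    private int jobIndex = 0;             // item a: points at the job currently being scheduled

    PollingDispatcher(List<PolledJob> jobs) {
        this.jobs = jobs;
    }

    // One polling step: take at most one task from the current job, then move to the next job.
    void dispatchOne() {
        if (jobs.isEmpty()) {
            return;
        }
        PolledJob job = jobs.get(jobIndex);
        if (!job.isFinished() && job.hasFreeCriticalResource()) {
            dispatch(job.nextTask());     // send the task to the executor cluster
            job.occupyCriticalResource(); // item d: one more critical resource in use
        }
        jobIndex = (jobIndex + 1) % jobs.size();   // item a: move on, wrap around at the tail
    }

    private void dispatch(String task) {
        // placeholder: in the described system the task would be converted to an MQ message (operation S504)
    }
}

class PolledJob {
    private final List<String> tasks;
    private int taskIndex = 0;            // item b: next task of this job to dispatch
    private int freeCriticalResources;    // item c: static upper bound computed at load time

    PolledJob(List<String> tasks, int maxCriticalResources) {
        this.tasks = tasks;
        this.freeCriticalResources = maxCriticalResources;
    }

    boolean isFinished() { return taskIndex >= tasks.size(); }
    boolean hasFreeCriticalResource() { return freeCriticalResources > 0; }
    String nextTask() { return tasks.get(taskIndex++); }
    void occupyCriticalResource() { freeCriticalResources--; }
}
```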
It should be noted that "job", "task" and "critical resource" need to be given concrete meanings in combination with the actual service scenario. Taking an intelligent outbound-calling SaaS product as an example, the job list may correspond to outbound-call lists for specific groups of people, the task list may correspond to each called person/number in an outbound list, and a limited number of telephone lines may constitute the critical resource, for example.
According to the embodiments of the present disclosure, splitting one large job into a plurality of small tasks for fine-grained scheduling allows the critical resources to be allocated fairly and efficiently among jobs; further, by introducing the polling manner, data isolation among the multiple tenants of the SaaS platform and fair, efficient allocation of the system's critical resources among the multiple tenants are also achieved.
According to the embodiment of the disclosure, for example, for the management end in the job scheduling system, functions of supporting various starting modes of the job, supporting failed task retry, executing process display and the like can be set.
The various ways of supporting the job may include, for example, manual initiation by a user, timed initiation, and periodic initiation. Wherein, the manual starting of the user means that the user starts by manually clicking a start button after the management end creates the job; the timing start is to designate that the job is automatically started at a certain time in the future when the job is created; cycle initiation is the period and number of times that a job is specified at the time of creation of the job, the first initiation time of the job, and subsequent cycle initiation.
Support for failed-task retry may be expressed, for example, as follows: a task may be limited by internal and external operating environments and conditions when executed and fail to run successfully, and may therefore require a retry. By providing the conditions, retry count and retry waiting time that define task retries when the job is created at the management end, the user can define whether and how to retry according to the service requirements.
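For illustration only, the start modes and retry options described above could be captured in a small configuration structure such as the following sketch; the field names (start_mode, max_retries, wait_seconds, and so on) are assumptions and do not reflect the management end's actual data model.

# Illustrative only: how a job's start mode and retry policy might be declared.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class RetryPolicy:
    retry_on: str = "any_failure"   # condition under which a failed task is retried
    max_retries: int = 3            # retry times
    wait_seconds: int = 60          # retry waiting time before re-execution


@dataclass
class JobDefinition:
    name: str
    start_mode: str = "manual"             # "manual", "timed" or "periodic"
    start_at: Optional[datetime] = None    # first start time for timed/periodic jobs
    period_seconds: Optional[int] = None   # cycle length for periodic jobs
    repeat_count: Optional[int] = None     # number of periodic runs
    retry: Optional[RetryPolicy] = None    # omit to disable failed-task retries


if __name__ == "__main__":
    job = JobDefinition(name="outbound-list-1", start_mode="periodic",
                        start_at=datetime(2021, 4, 2, 9, 0),
                        period_seconds=24 * 3600, repeat_count=7,
                        retry=RetryPolicy(max_retries=2, wait_seconds=30))
    print(job)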
The execution process display is used, for example, to present data in real time while a job runs and to provide data statistics and presentation after the run. For example, the running state, progress and various index data of the tasks can be displayed in real time while the job runs; after the job finishes, a dedicated data dashboard can provide various indexes, data and logs for the user to search, view and analyze, and export of the related data can also be supported.
Based on the job scheduling system, for example, the allocation of critical resources among organizations can be configured at the management end, and then the job scheduling method described in fig. 2 and fig. 4 can be utilized to perform fair and reasonable scheduling according to the configurations.
Through the above embodiments of the present disclosure, a job scheduling method applied to a scheduler cluster and an executor cluster, and a job scheduling system, are provided. The dependencies between the modules of the job scheduling system are decoupled, which facilitates distributed deployment and supports horizontal scaling; concurrent execution of jobs is supported, improving operating efficiency and system throughput; data isolation among the multiple tenants of the SaaS platform and fair, efficient allocation of the system's critical resources among multiple organizations and multiple jobs are supported; and the robustness of the system is ensured, so that if any machine in the scheduling system goes down, the other machines are not affected and the system as a whole keeps running normally, and restarting any machine does not affect the continuity of service.
Fig. 8 schematically illustrates a block diagram of a job scheduling apparatus applied to a scheduler cluster according to an embodiment of the present disclosure.
As shown in fig. 8, the job scheduling apparatus 800 includes a first determining module 810, a first obtaining module 820, a first dispatch module 830, a second obtaining module 840, and a second determining module 850.
A first determining module 810 is configured to determine a target scheduler instance from a scheduler cluster, where the scheduler cluster includes one or more scheduler instances.
The first obtaining module 820 is configured to obtain a plurality of tasks to be executed that constitute a job to be executed based on the target scheduler instance.
The first dispatch module 830 is configured to dispatch the plurality of tasks to be executed evenly to a plurality of executor instances of the executor cluster, so as to concurrently execute the plurality of tasks to be executed by using the plurality of executor instances, and to update, in the case that there is a target task to be executed whose execution has been completed, the execution state of the target task to be executed by using the executor instance that executed it.
The second obtaining module 840 is configured to obtain execution states of a plurality of tasks to be executed.
The second determining module 850 is configured to determine that scheduling for the job to be executed is completed when the execution states of the plurality of tasks to be executed are all completed.
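As a rough illustration of how these modules could cooperate, the sketch below dispatches a job's tasks evenly across executor instances and then checks whether every task's execution state reads "completed". Every name in it (dispatch_evenly, schedule_job, get_state) is an assumption made for the illustration rather than part of the disclosed system.

# Every name here is illustrative; none is taken from the actual product code.
import itertools


def dispatch_evenly(tasks, executors):
    """Round-robin the tasks over the executor instances (even distribution)."""
    assignment = {executor: [] for executor in executors}
    for task, executor in zip(tasks, itertools.cycle(executors)):
        assignment[executor].append(task)
    return assignment


def schedule_job(tasks, executors, get_state):
    """Return True once every task's execution state reads 'completed'."""
    dispatch_evenly(tasks, executors)       # hand the tasks to the executor cluster
    return all(get_state(task) == "completed" for task in tasks)


if __name__ == "__main__":
    tasks = [f"task{i}" for i in range(1, 6)]
    executors = ["executor-a", "executor-b"]
    print(dispatch_evenly(tasks, executors))
    print(schedule_job(tasks, executors, get_state=lambda task: "completed"))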
According to the embodiments of the disclosure, splitting the job scheduling process into finer-grained steps by means of the scheduler and the executor can be applied to various service scenarios, so that the technical problem that a job scheduling method lacks universality is at least partially overcome, and the technical effect of a job scheduling method that can be flexibly designed for different service scenarios is achieved.
According to an embodiment of the present disclosure, the first determining module includes a first determining unit, a first acquiring unit, and a second acquiring unit.
And the first determining unit is used for determining a first global lock related to the job to be executed.
And the first acquisition unit is used for acquiring the scheduler instance with normal heartbeat response in the scheduler cluster.
And the second acquisition unit is used for acquiring the scheduler instance with the first global lock from the scheduler instances with normal heartbeat response as a target scheduler instance.
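A hedged sketch of this selection step follows: filter the scheduler cluster down to instances whose heartbeat response is normal, then take the one holding the first global lock as the target scheduler instance. The callbacks heartbeat_ok and holds_lock are hypothetical stand-ins for the real heartbeat and distributed-lock facilities.

# heartbeat_ok and holds_lock are hypothetical callbacks, not a real API.
def pick_target_scheduler(instances, heartbeat_ok, holds_lock):
    """Keep instances with a normal heartbeat, then return the lock holder."""
    healthy = [inst for inst in instances if heartbeat_ok(inst)]
    for inst in healthy:
        if holds_lock(inst):      # the first global lock bound to the job
            return inst
    return None                   # no healthy instance holds the lock yet


if __name__ == "__main__":
    target = pick_target_scheduler(
        ["scheduler-1", "scheduler-2", "scheduler-3"],
        heartbeat_ok=lambda inst: inst != "scheduler-2",   # pretend one instance is down
        holds_lock=lambda inst: inst == "scheduler-3",
    )
    print(target)   # -> scheduler-3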
According to an embodiment of the present disclosure, the first acquisition module includes a third acquisition unit and a second determination unit.
And a third obtaining unit, configured to obtain a heartbeat response of the target scheduler instance.
And the second determining unit is used for redefining the target scheduler instance according to other scheduler instances except the target scheduler instance in the scheduler cluster under the condition of abnormal heartbeat response.
According to an embodiment of the present disclosure, the job scheduling device 800 further includes a third acquisition module and a release module.
And the third acquisition module is used for acquiring the first target task to be executed with the execution state being completed.
And the releasing module is used for releasing the critical resources for executing the task to be executed of the first target.
According to an embodiment of the present disclosure, the job scheduling device 800 further includes a fourth obtaining module and a second assigning module.
And the fourth acquisition module is used for acquiring, in the case that the scheduling process of the job to be executed is interrupted, a second target task to be executed whose execution state is incomplete among the plurality of tasks to be executed.
And the second dispatch module is used for averagely dispatching the second target task to be executed to a plurality of executor examples for re-execution.
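As a sketch of this recovery path, the snippet below reads each task's recorded execution state and evenly re-dispatches only the unfinished ones; the plain dict used as a state store is an assumption made purely for the demonstration.

# The dict used as a state store is a stand-in purely for demonstration.
import itertools


def redispatch_incomplete(tasks, states, executors):
    """Evenly re-dispatch only the tasks whose execution state is not completed."""
    incomplete = [task for task in tasks if states.get(task) != "completed"]
    assignment = {executor: [] for executor in executors}
    for task, executor in zip(incomplete, itertools.cycle(executors)):
        assignment[executor].append(task)
    return assignment


if __name__ == "__main__":
    states = {"task1": "completed", "task2": "running", "task3": "pending"}
    print(redispatch_incomplete(["task1", "task2", "task3"], states,
                                ["executor-a", "executor-b"]))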
According to an embodiment of the present disclosure, there are N jobs to be executed, N being greater than or equal to 1, and the job scheduling apparatus 800 further includes a first polling module.
The first polling module is configured to schedule the N jobs to be executed in a polling manner, where each round of polling includes: scheduling one task to be executed from each of the N jobs to be executed.
According to an embodiment of the present disclosure, the job scheduling apparatus 800 further includes:
The sending module is used for sending the execution process information of the job to be executed to the management end for visual display.
Fig. 9 schematically illustrates a block diagram of a job scheduling apparatus applied to an executor cluster according to an embodiment of the present disclosure.
As shown in fig. 9, the job scheduling device 900 includes a fifth acquisition module 910, an execution module 920, and a first update module 930.
A fifth obtaining module 910, configured to obtain a plurality of tasks to be executed that are sent by a target scheduler instance in a scheduler cluster and constitute a job to be executed, where the scheduler cluster includes one or more scheduler instances, and the plurality of tasks to be executed are equally distributed to a plurality of executor instances.
The execution module 920 is configured to concurrently execute a plurality of tasks to be executed by using a plurality of executor instances.
The first updating module 930 is configured to update, when there is a target task to be executed that has already been executed, an execution state of the target task to be executed with an executor instance that executes the target task to be executed, so that the target scheduler instance determines whether scheduling for the job to be executed is completed according to the execution states of the plurality of tasks to be executed.
According to the embodiments of the disclosure, splitting the job scheduling process into finer-grained steps by means of the scheduler and the executor can be applied to various service scenarios, so that the technical problem that a job scheduling method lacks universality is at least partially overcome, and the technical effect of a job scheduling method that can be flexibly designed for different service scenarios is achieved.
According to an embodiment of the present disclosure, the execution module includes a determination unit, a retry unit, and an execution unit.
The judging unit is used for judging whether the task to be executed is configured with a retry mark or not under the condition that the task to be executed is not executed successfully;
and the retry unit is used for placing the task to be executed which is not successfully executed into a retry queue under the condition that the task to be executed is configured with a retry mark, wherein the retry queue is configured with a retry waiting period.
And the execution unit is used for executing the task to be executed in the retry queue again under the condition that the retry waiting period is met.
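A minimal sketch of this retry behaviour follows, under the assumption that each task carries a retry flag and that run_task returns True on success; a real implementation would use the system's own retry queue rather than the in-process deque used here.

# Assumes each task carries a retry flag and run_task returns True on success.
import time
from collections import deque


def execute_with_retry(tasks, run_task, wait_seconds=1.0):
    retry_queue = deque()
    for task, retry_flag in tasks:
        if not run_task(task) and retry_flag:          # failed and marked retryable
            retry_queue.append((task, time.time() + wait_seconds))

    while retry_queue:
        task, ready_at = retry_queue.popleft()
        if time.time() < ready_at:                     # waiting period not yet satisfied
            retry_queue.append((task, ready_at))
            time.sleep(0.05)
            continue
        run_task(task)                                 # re-execute the parked task


if __name__ == "__main__":
    attempts = {}

    def run_task(task):
        attempts[task] = attempts.get(task, 0) + 1
        return attempts[task] > 1                      # fail only on the first attempt

    execute_with_retry([("task1", True), ("task2", False)], run_task, wait_seconds=0.2)
    print(attempts)                                    # task1 retried once, task2 not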
According to an embodiment of the present disclosure, the job scheduling device 900 further includes a sixth obtaining module and a second updating module.
And a sixth acquisition module, configured to acquire a second global lock related to the job to be executed when there is a target task to be executed that has already been executed.
And the second updating module is used for updating the global statistical information of the job to be executed by using the executor instance with the second global lock.
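To illustrate why a per-job lock matters here, the sketch below guards the job's global statistics with a lock so that concurrent executor instances do not lose updates; threading.Lock is only a stand-in assumption for the second global lock, which in a real deployment would be a distributed lock shared by the executor instances.

# threading.Lock stands in for the second global lock of the description above.
import threading


class JobStatistics:
    def __init__(self):
        self._lock = threading.Lock()    # placeholder for the second global lock
        self.completed_tasks = 0

    def record_completion(self):
        with self._lock:                 # only the lock holder touches the statistics
            self.completed_tasks += 1


if __name__ == "__main__":
    stats = JobStatistics()
    workers = [threading.Thread(target=stats.record_completion) for _ in range(8)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    print(stats.completed_tasks)         # -> 8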
According to an embodiment of the disclosure, there are N jobs to be executed, N being greater than or equal to 1, and the job scheduling apparatus 900 further includes a second polling module.
The second polling module is configured to schedule the N jobs to be executed in a polling manner, where each round of polling includes: scheduling one task to be executed from each of the N jobs to be executed.
Any number of the modules and units according to embodiments of the present disclosure, or at least part of the functionality of any number of them, may be implemented in one module. Any one or more of the modules and units according to embodiments of the present disclosure may be split into multiple modules for implementation. Any one or more of the modules and units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC), or in hardware or firmware by any other reasonable manner of integrating or packaging circuits, or in any one of, or an appropriate combination of, the three implementation manners of software, hardware, and firmware. Alternatively, one or more of the modules and units according to embodiments of the present disclosure may be at least partially implemented as computer program modules which, when executed, may perform the corresponding functions.
For example, any of the first determining module 810, the first obtaining module 820, the first dispatch module 830, the second obtaining module 840, and the second determining module 850, or of the fifth obtaining module 910, the execution module 920, and the first updating module 930, may be combined into one module/unit for implementation, or any one of these modules/units may be split into a plurality of modules/units. Alternatively, at least part of the functionality of one or more of these modules/units may be combined with at least part of the functionality of other modules/units and implemented in one module/unit. According to embodiments of the present disclosure, at least one of the first determining module 810, the first obtaining module 820, the first dispatch module 830, the second obtaining module 840, and the second determining module 850, or of the fifth obtaining module 910, the execution module 920, and the first updating module 930, may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware by any other reasonable manner of integrating or packaging circuits, or in any one of, or an appropriate combination of, the three implementation manners of software, hardware, and firmware. Alternatively, at least one of the first determining module 810, the first obtaining module 820, the first dispatch module 830, the second obtaining module 840, and the second determining module 850, or of the fifth obtaining module 910, the execution module 920, and the first updating module 930, may be at least partially implemented as a computer program module which, when executed, may perform the corresponding functions.
It should be noted that, in the embodiment of the present disclosure, the job scheduling device portion corresponds to the job scheduling method portion in the embodiment of the present disclosure, and the description of the job scheduling device portion specifically refers to the job scheduling method portion, which is not described herein.
Fig. 10 schematically illustrates a block diagram of a computer system suitable for implementing the above-described method according to an embodiment of the present disclosure. The computer system illustrated in fig. 10 is merely an example and should not be construed as limiting the functionality and scope of use of the disclosed embodiments.
As shown in fig. 10, a computer system 1000 according to an embodiment of the present disclosure includes a processor 1001 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. The processor 1001 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 1001 may also include on-board memory for caching purposes. The processor 1001 may include a single processing unit or multiple processing units for performing different actions of the method flows according to embodiments of the present disclosure.
In the RAM 1003, various programs and data required for the operation of the system 1000 are stored. The processor 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. The processor 1001 performs various operations of the method flow according to the embodiment of the present disclosure by executing programs in the ROM 1002 and/or the RAM 1003. Note that the program may be stored in one or more memories other than the ROM 1002 and the RAM 1003. The processor 1001 may also perform various operations of the method flow according to the embodiments of the present disclosure by executing programs stored in the one or more memories.
According to embodiments of the present disclosure, system 1000 may also include an input/output (I/O) interface 1005, with input/output (I/O) interface 1005 also connected to bus 1004. The system 1000 may also include one or more of the following components connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output portion 1007 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), etc., and a speaker, etc.; a storage portion 1008 including a hard disk or the like; and a communication section 1009 including a network interface card such as a LAN card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. The drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is installed as needed in the drive 1010, so that a computer program read out therefrom is installed as needed in the storage section 1008.
According to embodiments of the present disclosure, the method flow according to embodiments of the present disclosure may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 1009, and/or installed from the removable medium 1011. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 1001. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples may include, but are not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 1002 and/or RAM 1003 and/or one or more memories other than ROM 1002 and RAM 1003 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program comprising program code for performing the methods provided by the embodiments of the present disclosure, the program code for causing an electronic device to implement the job scheduling method provided by the embodiments of the present disclosure when the computer program product is run on the electronic device.
The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 1001. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal over a network medium, and downloaded and installed via the communication section 1009, and/or installed from the removable medium 1011. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
According to embodiments of the present disclosure, the program code of the computer programs provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, the "C" language, or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or integrated in various ways, even if such combinations or integrations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be combined and/or integrated in various ways without departing from the spirit and teachings of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. These examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (17)

1. A job scheduling method, comprising:
determining a target scheduler instance from a scheduler cluster, wherein the scheduler cluster comprises one or more scheduler instances;
acquiring a plurality of tasks to be executed which form a job to be executed based on the target scheduler instance;
distributing the plurality of tasks to be executed evenly to a plurality of executor instances of an executor cluster, so that the plurality of tasks to be executed are concurrently executed by the plurality of executor instances, and the execution state of a target task to be executed is updated by the executor instance executing the target task to be executed in a case where there is a target task to be executed whose execution has been completed;
acquiring execution states of the plurality of tasks to be executed; and
Determining that the scheduling of the job to be executed is completed under the condition that the execution states of the plurality of tasks to be executed are all completed;
wherein, in a case where the job to be executed needs to use critical resources: determining, when the job to be executed is loaded, the maximum number of critical resources that the job to be executed is allowed to occupy; dynamically calculating the occupation of the relevant critical resource when a task to be executed is actually dispatched; dispatching the task to be executed and decrementing the number of free critical resources by one in a case where it is determined that a free critical resource is available for the task to be executed to run; and scheduling a next job to be executed in a case where it is determined that no free critical resource is available for the task to be executed to run.
2. The method of claim 1, wherein determining a target scheduler instance from a scheduler cluster comprises:
determining a first global lock related to the job to be executed;
Acquiring a scheduler instance with normal heartbeat response in the scheduler cluster; and
And acquiring the scheduler instance with the first global lock from the scheduler instances with normal heartbeat response as the target scheduler instance.
3. The method of claim 1, wherein obtaining a plurality of tasks to be performed that make up a job to be performed based on the target scheduler instance comprises:
acquiring a heartbeat response of the target scheduler instance; and
And in the case of abnormal heartbeat response, redefining the target scheduler instance according to other scheduler instances except the target scheduler instance in the scheduler cluster.
4. The method of claim 1, further comprising:
acquiring a first target task to be executed, the execution state of which is completed; and
And releasing critical resources for executing the task to be executed of the first target.
5. The method of claim 1, further comprising:
in a case where an interruption occurs in the scheduling process of the job to be executed, acquiring a second target task to be executed, the execution state of which is incomplete, among the plurality of tasks to be executed; and
And averagely distributing the second target task to be executed to the plurality of executor instances for re-execution.
6. The method of claim 1, wherein there are N jobs to be executed, N being greater than or equal to 1, the method further comprising:
scheduling the N jobs to be executed in a polling manner, wherein each round of polling comprises:
scheduling one task to be executed from each of the N jobs to be executed.
7. The method of claim 1, further comprising:
and sending the execution process information of the job to be executed to a management end for visual display.
8. A job scheduling method applied to a plurality of executor instances in an executor cluster, the method comprising:
Acquiring a plurality of tasks to be executed, which are transmitted by target scheduler instances in a scheduler cluster and form a job to be executed, wherein the scheduler cluster comprises one or more scheduler instances, and the tasks to be executed are averagely distributed to the plurality of executor instances;
concurrently executing the plurality of tasks to be executed by utilizing the plurality of executor instances; and
Updating the execution state of the target task to be executed by using an executor instance for executing the target task to be executed under the condition that the target task to be executed is executed, so that the target scheduler instance determines whether scheduling of the job to be executed is completed or not according to the execution states of the plurality of tasks to be executed;
wherein, in a case where the job to be executed needs to use critical resources: determining, when the job to be executed is loaded, the maximum number of critical resources that the job to be executed is allowed to occupy; dynamically calculating the occupation of the relevant critical resource when a task to be executed is actually dispatched; dispatching the task to be executed and decrementing the number of free critical resources by one in a case where it is determined that a free critical resource is available for the task to be executed to run; and scheduling a next job to be executed in a case where it is determined that no free critical resource is available for the task to be executed to run.
9. The method of claim 8, wherein concurrently executing the plurality of tasks to be performed with the plurality of executor instances comprises:
judging whether the task to be executed is configured with a retry mark or not under the condition that the task to be executed is not executed successfully;
placing, in a case where the task to be executed is configured with a retry mark, the task to be executed that was not successfully executed into a retry queue, wherein the retry queue is configured with a retry waiting period; and
And re-executing the task to be executed in the retry queue under the condition that the retry waiting period is satisfied.
10. The method of claim 8, further comprising:
Acquiring a second global lock related to the job to be executed under the condition that a target task to be executed which is completed by execution exists; and
And updating global statistical information of the job to be executed by using the executor instance with the second global lock.
11. The method of claim 8, wherein there are N jobs to be executed, N being greater than or equal to 1, the method further comprising:
scheduling the N jobs to be executed in a polling manner, wherein each round of polling comprises:
scheduling one task to be executed from each of the N jobs to be executed.
12. A job scheduling device, comprising:
A first determining module, configured to determine a target scheduler instance from a scheduler cluster, where the scheduler cluster includes one or more scheduler instances;
The first acquisition module is used for acquiring a plurality of tasks to be executed, which form a job to be executed, based on the target scheduler instance;
The first dispatch module is used for averagely dispatching the plurality of tasks to be executed to a plurality of executor examples of the executor cluster so as to concurrently execute the plurality of tasks to be executed by using the plurality of executor examples, and updating the execution state of the target task to be executed by using the executor example for executing the target task to be executed under the condition that the target task to be executed already executed exists;
the second acquisition module is used for acquiring the execution states of the plurality of tasks to be executed; and
The second determining module is used for determining that the scheduling of the job to be executed is completed under the condition that the execution states of the plurality of tasks to be executed are all completed;
wherein, in a case where the job to be executed needs to use critical resources: determining, when the job to be executed is loaded, the maximum number of critical resources that the job to be executed is allowed to occupy; dynamically calculating the occupation of the relevant critical resource when a task to be executed is actually dispatched; dispatching the task to be executed and decrementing the number of free critical resources by one in a case where it is determined that a free critical resource is available for the task to be executed to run; and scheduling a next job to be executed in a case where it is determined that no free critical resource is available for the task to be executed to run.
13. A job scheduling device for use with a plurality of executor instances in an executor cluster, the device comprising:
A fifth obtaining module, configured to obtain a plurality of tasks to be executed that are sent by a target scheduler instance in a scheduler cluster and form a job to be executed, where the scheduler cluster includes one or more scheduler instances, and the plurality of tasks to be executed are averagely allocated to the plurality of executor instances;
the execution module is used for concurrently executing the tasks to be executed by utilizing the executor instances; and
The first updating module is used for updating the execution state of the target task to be executed by using a target executor instance for executing the target task to be executed under the condition that the target task to be executed is executed, so that the target scheduler instance determines whether the scheduling of the job to be executed is completed or not according to the execution states of the plurality of tasks to be executed;
wherein, in a case where the job to be executed needs to use critical resources: determining, when the job to be executed is loaded, the maximum number of critical resources that the job to be executed is allowed to occupy; dynamically calculating the occupation of the relevant critical resource when a task to be executed is actually dispatched; dispatching the task to be executed and decrementing the number of free critical resources by one in a case where it is determined that a free critical resource is available for the task to be executed to run; and scheduling a next job to be executed in a case where it is determined that no free critical resource is available for the task to be executed to run.
14. A job scheduling system, comprising:
a scheduler module for:
determining a target scheduler instance from a scheduler cluster, wherein the scheduler cluster comprises one or more scheduler instances;
acquiring a plurality of tasks to be executed which form a job to be executed based on the target scheduler instance;
distributing the plurality of tasks to be executed evenly to a plurality of executor instances of an executor cluster, so that the plurality of tasks to be executed are concurrently executed by the plurality of executor instances, and the execution state of a target task to be executed is updated by the executor instance executing the target task to be executed in a case where there is a target task to be executed whose execution has been completed;
acquiring execution states of the plurality of tasks to be executed; and
Determining that the scheduling of the job to be executed is completed under the condition that the execution states of the plurality of tasks to be executed are all completed;
An executor module for:
Acquiring a plurality of tasks to be executed, which are transmitted by target scheduler instances in a scheduler cluster and form a job to be executed, wherein the scheduler cluster comprises one or more scheduler instances, and the tasks to be executed are averagely distributed to the plurality of executor instances;
concurrently executing the plurality of tasks to be executed by utilizing the plurality of executor instances; and
Updating the execution state of the target task to be executed by using an executor instance for executing the target task to be executed under the condition that the target task to be executed is executed, so that the target scheduler instance determines whether scheduling of the job to be executed is completed or not according to the execution states of the plurality of tasks to be executed;
wherein, in a case where the job to be executed needs to use critical resources: determining, when the job to be executed is loaded, the maximum number of critical resources that the job to be executed is allowed to occupy; dynamically calculating the occupation of the relevant critical resource when a task to be executed is actually dispatched; dispatching the task to be executed and decrementing the number of free critical resources by one in a case where it is determined that a free critical resource is available for the task to be executed to run; and scheduling a next job to be executed in a case where it is determined that no free critical resource is available for the task to be executed to run.
15. A computer system, comprising:
One or more processors;
A memory for storing one or more programs,
Wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1 to 7 and/or 8 to 11.
16. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to implement the method of any one of claims 1 to 7 and/or 8 to 11.
17. A computer program product comprising computer executable instructions for implementing the method of any one of claims 1 to 7 and/or 8 to 11 when executed.
CN202110364946.6A 2021-04-02 2021-04-02 Job scheduling method, job scheduling device, computer system and computer readable storage medium Active CN113032125B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110364946.6A CN113032125B (en) 2021-04-02 2021-04-02 Job scheduling method, job scheduling device, computer system and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113032125A CN113032125A (en) 2021-06-25
CN113032125B true CN113032125B (en) 2024-08-16

Family

ID=76453807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110364946.6A Active CN113032125B (en) 2021-04-02 2021-04-02 Job scheduling method, job scheduling device, computer system and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113032125B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113419835A (en) * 2021-07-02 2021-09-21 中国工商银行股份有限公司 Job scheduling method, device, equipment and medium
CN113946417A (en) * 2021-09-18 2022-01-18 广州虎牙科技有限公司 Distributed task execution method, related device and equipment
CN113778652A (en) * 2021-09-22 2021-12-10 武汉悦学帮网络技术有限公司 Task scheduling method and device, electronic equipment and storage medium
CN115061814B (en) * 2022-06-14 2025-07-18 北京恒泰实达科技股份有限公司 Distributed high concurrency scheduling system based on decentralized job tasks
CN115129454B (en) * 2022-07-19 2025-07-15 北京恒泰实达科技股份有限公司 A data subscription registration and deregistration system and method based on distributed screen
CN116233258A (en) * 2023-01-18 2023-06-06 新浪技术(中国)有限公司 Method, apparatus, electronic device and computer readable medium for traffic scheduling
CN117539642B (en) * 2024-01-09 2024-04-02 上海晨钦信息科技服务有限公司 Credit card distributed scheduling platform and scheduling method
CN118963975A (en) * 2024-10-18 2024-11-15 江苏金智教育信息股份有限公司 A method, device, medium, equipment and system for scheduling data consumption tasks

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105159769A (en) * 2015-09-11 2015-12-16 国电南瑞科技股份有限公司 Distributed job scheduling method suitable for heterogeneous computational capability cluster
CN112486648A (en) * 2020-11-30 2021-03-12 北京百度网讯科技有限公司 Task scheduling method, device, system, electronic equipment and storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101706788B (en) * 2009-11-25 2012-11-14 惠州Tcl移动通信有限公司 Cross-area access method for embedded file system
US8789054B2 (en) * 2010-10-29 2014-07-22 Fujitsu Limited Scheduling policy for efficient parallelization of software analysis in a distributed computing environment
CN104636204B (en) * 2014-12-04 2018-06-01 中国联合网络通信集团有限公司 A kind of method for scheduling task and device
US10223164B2 (en) * 2016-10-24 2019-03-05 International Business Machines Corporation Execution of critical tasks based on the number of available processing entities
CN110134505A (en) * 2019-05-15 2019-08-16 湖南麒麟信安科技有限公司 A kind of distributed computing method of group system, system and medium
CN110209488B (en) * 2019-06-10 2021-12-07 北京达佳互联信息技术有限公司 Task execution method, device, equipment, system and storage medium
CN110825535B (en) * 2019-10-12 2022-06-28 中国建设银行股份有限公司 Job scheduling method and system
CN111930487B (en) * 2020-08-28 2024-05-24 北京百度网讯科技有限公司 Job stream scheduling method and device, electronic equipment and storage medium
CN112579267A (en) * 2020-09-28 2021-03-30 京信数据科技有限公司 Decentralized big data job flow scheduling method and device
CN112561326A (en) * 2020-12-15 2021-03-26 青岛海尔科技有限公司 Task execution method and device, storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: Jingdong Digital Technology Holding Co.,Ltd.

GR01 Patent grant