
CN117632390A - Job scheduling method, device, scheduler and system - Google Patents

Info

Publication number
CN117632390A
CN117632390A CN202210963659.1A CN202210963659A CN117632390A CN 117632390 A CN117632390 A CN 117632390A CN 202210963659 A CN202210963659 A CN 202210963659A CN 117632390 A CN117632390 A CN 117632390A
Authority
CN
China
Prior art keywords
tenant
scheduling
job
jobs
scheduler
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210963659.1A
Other languages
Chinese (zh)
Inventor
孔凡斌
张晓东
张鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202210963659.1A
Publication of CN117632390A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/48: Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806: Task transfer initiation or dispatching
    • G06F9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005: Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00: Indexing scheme relating to G06F9/00
    • G06F2209/50: Indexing scheme relating to G06F9/50
    • G06F2209/5011: Pool
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A job scheduling method, device, scheduler and system are disclosed, relating to the field of data processing. In the method, a scheduler obtains a pending job, determines the tenant associated with the pending job and a scheduling result according to a two-level scheduling policy, and performs an operation on the pending job according to the scheduling result. First-level scheduling implements tenant-level scheduling, and second-level scheduling implements job-level scheduling. When the users of all tenants share the shared resource pool provided by a supercomputing center, jobs of users in different tenants may be processed on the same resources, which is a security risk. In the solution provided by this application, the scheduler schedules the jobs of different tenants with two-level isolation by tenant, and the tenant resources provided by the system are isolated by tenant, so that tenants are isolated from one another. Processing the jobs of different tenants in isolation improves data processing security, and scheduling jobs from each tenant's own tenant resource pool improves data processing efficiency.

Description

Job scheduling method, device, scheduler and system

Technical field

The present application relates to the field of data processing, and in particular to a job scheduling method, device, scheduler and system.

Background

With the rapid development of high-performance computing (HPC) applications, demand for high-performance computing is growing quickly in fields such as scientific research, manufacturing and life sciences, and more and more enterprises use supercomputing resources for scientific computing. A supercomputing center, as one deployment form of HPC applications, usually processes HPC jobs using a shared resource pool. As the number of users keeps increasing, however, scheduling user-submitted jobs on a shared resource pool often cannot guarantee the security of user data. A more secure job scheduling method is therefore urgently needed.

Summary of the invention

This application provides a job scheduling method, device, scheduler and system that improve data security when user jobs are processed in the HPC field.

In a first aspect, a job scheduling method is provided, and the method is performed by a scheduler. The method includes: the scheduler obtains a pending job, determines the tenant associated with the pending job and a scheduling result according to a two-level scheduling policy, and performs an operation on the pending job according to the scheduling result. The two-level scheduling policy indicates a job scheduling mode in which resources are managed with multi-tenant isolation: the tenant resources provided by the system are isolated by tenant, and tenants' jobs are scheduled with two-level isolation. First-level scheduling determines that a tenant associated with the pending job exists in the system, which implements tenant-level scheduling. Second-level scheduling schedules the pending job based on the tenant resource pool of the associated tenant, which implements job-level scheduling.

When the users of all tenants share the shared resource pool provided by a supercomputing center, jobs of users in different tenants may be processed on the same resources, which is a security risk. In the solution provided by this application, by contrast, the scheduler schedules the jobs of users in different tenants with two-level isolation by tenant, and the tenant resources provided by the system are isolated by tenant. Tenants are thereby isolated from one another and the scheduler processes the jobs of different tenants in isolation, which improves data processing security. In addition, scheduling jobs from each tenant's own tenant resource pool effectively improves data processing efficiency.

The tenant resource pools of the multiple tenants are isolated from one another, and a tenant resource pool includes at least one of computing resources, storage resources and network resources. The tenant resource pool thus provides ample resources for processing the jobs of users within the tenant.
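The two levels described in the first aspect can be illustrated with a minimal Python sketch. It is not the claimed implementation: the class names `TenantPool` and `TwoLevelScheduler`, the use of CPU cores as the only resource, and the sample user and tenant names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TenantPool:
    """Hypothetical per-tenant resource pool; only CPU cores are modelled."""
    free_cores: int


@dataclass
class TwoLevelScheduler:
    user_to_tenant: dict   # tenant-user association, keyed by user ID
    pools: dict            # tenant name -> TenantPool, one isolated pool each

    def schedule(self, user_id: str, cores: int) -> str:
        # Level 1 (tenant level): find the tenant associated with the job.
        tenant = self.user_to_tenant.get(user_id)
        if tenant is None:
            raise LookupError(f"no tenant associated with user {user_id!r}")
        # Level 2 (job level): allocate only from that tenant's own pool.
        pool = self.pools[tenant]
        if pool.free_cores < cores:
            raise RuntimeError(f"pool of {tenant!r} cannot supply {cores} cores")
        pool.free_cores -= cores
        return tenant


scheduler = TwoLevelScheduler(
    user_to_tenant={"alice": "tenant1", "bob": "tenant2"},
    pools={"tenant1": TenantPool(4), "tenant2": TenantPool(2)},
)
print(scheduler.schedule("alice", 3))  # allocated from tenant1's pool only
```

Because level two only ever touches `self.pools[tenant]`, a job from one tenant cannot consume resources that belong to another tenant in this sketch.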

In a possible implementation, determining the tenant associated with the pending job according to the two-level scheduling policy includes: determining, according to the association between tenants and users, the tenant associated with the user who submitted the pending job. First-level scheduling is thus implemented based on the tenant-user association, and it is determined that a tenant associated with the pending job exists in the system, so that the scheduler can process the jobs of different tenants in isolation, which improves data processing security.

In one example, determining the tenant associated with the user who submitted the pending job according to the association between tenants and users includes: determining that tenant according to the user identifier and the association. This lets the scheduler use the user identifier to find the tenant associated with the submitting user as quickly as possible and so carry out first-level scheduling.

In some embodiments, determining the tenant associated with the user who submitted the pending job according to the association between tenants and users includes: the scheduler determines, according to the association, the scheduling resources within the scheduler of the tenant associated with the pending job. The scheduling resources are used to schedule the pending job based on the tenant resource pool of the associated tenant. The scheduling resources that the scheduler uses to schedule the jobs of different tenants are isolated from one another.
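One way to make the lookup by user identifier fast is to invert the tenant-user association into a dictionary keyed by user ID. The sketch below is an illustration only; the function name `build_user_index`, the sample identifiers, and the assumption that each user belongs to exactly one tenant are not taken from the application.

```python
def build_user_index(tenant_users):
    """Invert a tenant -> [user IDs] association into user ID -> tenant."""
    index = {}
    for tenant, user_ids in tenant_users.items():
        for user_id in user_ids:
            if user_id in index:
                # Assumption: one user belongs to exactly one tenant.
                raise ValueError(f"user {user_id!r} is bound to two tenants")
            index[user_id] = tenant
    return index


association = {"tenant1": ["u100", "u101"], "tenant2": ["u200"]}
user_index = build_user_index(association)
print(user_index["u101"])  # constant-time lookup of the associated tenant
```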

In another possible implementation, determining the tenant associated with the pending job and the scheduling result according to the two-level scheduling policy includes: scheduling the jobs of users in the associated tenant based on the tenant resource pool of the associated tenant to obtain the scheduling result. The jobs of users in the associated tenant include the pending job. The scheduling result indicates the resources in the associated tenant's tenant resource pool that will process the pending job. Because the tenant resource pools of different tenants are isolated from one another, performing second-level scheduling on the associated tenant's resource pool means that the scheduler processes the jobs of different tenants in isolation, which improves data processing security.

In one example, the scheduler may configure different scheduling policies for different tenants according to the tenants' business characteristics. Scheduling the jobs of users in the associated tenant based on the tenant resource pool of the associated tenant to obtain the scheduling result includes: scheduling those jobs according to the associated tenant's scheduling policy, based on the associated tenant's tenant resource pool, to obtain the scheduling result. Combining each tenant's own tenant resource pool with a scheduling policy in this way further improves data processing efficiency.

In another example, the scheduler may configure multiple queues for one tenant, with each queue bound to different users of the same tenant. Scheduling the jobs of users in the associated tenant based on the tenant resource pool of the associated tenant to obtain the scheduling result includes: scheduling, according to the associated tenant's scheduling policy and based on the associated tenant's tenant resource pool, the jobs of users in the associated tenant that are in a first queue, to obtain the scheduling result, where the first queue is the queue among the multiple queues that contains the pending job. Scheduling the jobs of users within a tenant by queue further improves data processing efficiency.
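The per-tenant queues can be pictured with the following sketch. The class `TenantQueues`, its method names, the queue names and the first-come, first-served order inside each queue are assumptions for illustration; the application does not prescribe a data structure.

```python
from collections import deque


class TenantQueues:
    """Hypothetical queue set for one tenant; each user is bound to one queue."""

    def __init__(self, user_to_queue):
        self.user_to_queue = dict(user_to_queue)
        self.queues = {name: deque() for name in set(self.user_to_queue.values())}

    def submit(self, user_id, job):
        """Enqueue the job at the tail of the queue bound to its user."""
        name = self.user_to_queue[user_id]
        self.queues[name].append(job)
        return name

    def queue_of(self, job):
        """Return the 'first queue': the one that contains the pending job."""
        for name, jobs in self.queues.items():
            if job in jobs:
                return name
        return None

    def pop_next(self, name):
        """Dequeue from the head (first come, first served within a queue)."""
        return self.queues[name].popleft() if self.queues[name] else None


tq = TenantQueues({"alice": "cfd", "bob": "cfd", "carol": "genomics"})
tq.submit("alice", "job11")
tq.submit("carol", "job12")
tq.submit("bob", "job13")
first_queue = tq.queue_of("job13")
```

Here `first_queue` names the queue that holds `job13`, and only the jobs in that queue would be considered when the pending job is scheduled.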

In another possible implementation, performing an operation on the pending job according to the scheduling result includes: performing a job processing operation and a job management operation on the pending job according to the scheduling result.

In one example, performing a job processing operation on the pending job according to the scheduling result includes: when the tenant resource pool of the associated tenant meets the resource requirements of the pending job, scheduling the jobs of users in the associated tenant to obtain the scheduling result.

In another example, performing a job management operation on the pending job according to the scheduling result includes: when the job management operation matches the job status, scheduling the jobs of users in the associated tenant to obtain the scheduling result.
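The two conditions above (the pool meets the job's resource requirements, and the management operation matches the job status) can be sketched as simple guards. The application does not define what "matches" means, so the transition table below, the state names `pending` and `running`, and the function names are assumptions.

```python
def can_run(free, required):
    """True if the tenant pool's free resources cover every requirement."""
    return all(free.get(kind, 0) >= amount for kind, amount in required.items())


# Assumed transition table: which job states each management operation accepts.
ACCEPTS = {
    "pause": {"running"},
    "resume": {"paused"},
    "terminate": {"pending", "running", "paused"},
}
# Assumed state that results from each operation.
RESULT = {"pause": "paused", "resume": "running", "terminate": "terminated"}


def apply_operation(state, operation):
    """Return the new job state, or raise if the operation does not match."""
    if state not in ACCEPTS[operation]:
        raise ValueError(f"cannot {operation} a job that is {state}")
    return RESULT[operation]
```

For example, `apply_operation("running", "pause")` yields `"paused"`, while resuming a job that is still running is rejected.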

In a second aspect, a job scheduling device is provided. The device includes modules for performing the job scheduling method of the first aspect or of any possible design of the first aspect.

In a third aspect, a scheduler is provided. The scheduler includes at least one processor and a memory, and the memory is configured to store a set of computer instructions. When the processor, acting as the scheduler of the first aspect or of any possible implementation of the first aspect, executes the set of computer instructions, it performs the operation steps of the job scheduling method of the first aspect or of any possible implementation of the first aspect.

In a fourth aspect, a computer device is provided. The computer device includes a scheduler, and the scheduler is configured to perform the operation steps of the job scheduling method of the first aspect or of any possible implementation of the first aspect.

In a fifth aspect, a job scheduling system is provided. The job scheduling system includes a scheduler and a resource pool. The resource pool includes storage nodes, computing nodes and a network, and is used to provide the tenant resource pools of multiple tenants. The scheduler is configured to perform the operation steps of the job scheduling method of the first aspect or of any possible implementation of the first aspect.

In a sixth aspect, a computer-readable storage medium is provided that includes computer software instructions. When the computer software instructions run in a scheduler, they cause the scheduler to perform the operation steps of the method described in the first aspect or in any possible implementation of the first aspect.

In a seventh aspect, a computer program product is provided. When the computer program product runs on a computer, it causes the computer to perform the operation steps of the method described in the first aspect or in any possible implementation of the first aspect.

On the basis of the implementations provided in the above aspects, this application may combine them further to provide more implementations.

Description of the drawings

Figure 1 is a schematic architectural diagram of a job scheduling system provided by an embodiment of this application;

Figure 2 is a schematic flowchart of scheduler initialization provided by an embodiment of this application;

Figure 3 is a schematic flowchart of a job scheduling method provided by an embodiment of this application;

Figure 4 is a schematic flowchart of scheduling jobs in a queue provided by an embodiment of this application;

Figure 5 is a schematic flowchart of a job management method provided by an embodiment of this application;

Figure 6 is a schematic structural diagram of a job scheduling device provided by an embodiment of this application;

Figure 7 is a schematic structural diagram of a scheduler provided by an embodiment of this application.

Detailed description

A high-performance computing (HPC) cluster is a computer cluster system. An HPC cluster contains multiple computers connected together by various interconnect technologies, for example InfiniBand (IB), Remote Direct Memory Access over Converged Ethernet (RoCE), or Transmission Control Protocol (TCP). HPC provides very high floating-point computing capability and can meet the computing needs of compute-intensive and massive data processing workloads. The combined computing power of the connected computers can be used to handle large computing problems, such as those that arise in scientific research, weather forecasting, finance, simulation experiments, biopharmaceuticals, gene sequencing and image processing. Processing large computing problems on an HPC cluster can effectively shorten computation time and improve computational accuracy. Typically, a management node in the HPC cluster decomposes a job and assigns the parts to multiple computing nodes, which complete the job in parallel.

A resource is at least one of the computing resources, storage resources and network resources required to process a job, for example a central processing unit (CPU), a graphics processing unit (GPU), a data processing unit (DPU), a neural processing unit (NPU), an embedded neural-network processing unit (NPU), memory, or input/output (I/O).

A tenant is an enterprise or organization that has been granted the right to use Software-as-a-Service (SaaS).

A user is an entity that has the right to use the SaaS. One tenant includes multiple users.

A job is an executable command or script that a user submits to the system.

A queue is a linear list with restricted operations. The data elements of a queue are also called queue elements. Inserting a queue element is called enqueuing, and removing a queue element is called dequeuing. The end at which insertion takes place is the tail of the queue, and the end at which removal takes place is the head. A queue with no elements is an empty queue. In the embodiments of this application, a queue may be a job container that holds the jobs of multiple users of one tenant.

A scheduling policy is a job scheduling algorithm. Common scheduling policies in the HPC field include at least one of a first-come, first-served policy, a fair-share policy and a preemptive policy.
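As a rough illustration of how two of these policies order the same jobs, consider the sketch below. The `Job` fields, the logical timestamps and the `usage` map are assumptions, and the fair-share function is a deliberately simplified stand-in (least past usage first) for what production schedulers compute.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    user: str
    submit_time: int  # hypothetical logical timestamp


def first_come_first_served(jobs):
    """Order jobs by submission time only."""
    return sorted(jobs, key=lambda j: j.submit_time)


def fair_share(jobs, usage):
    """Order jobs so that users with less past usage go first.

    `usage` maps user -> resources already consumed (assumed unit: core-hours).
    Ties fall back to submission time.
    """
    return sorted(jobs, key=lambda j: (usage.get(j.user, 0), j.submit_time))


jobs = [Job("a1", "alice", 1), Job("a2", "alice", 2), Job("b1", "bob", 3)]
fcfs_order = [j.name for j in first_come_first_served(jobs)]
fair_order = [j.name for j in fair_share(jobs, {"alice": 50, "bob": 5})]
```

Under first-come, first-served the order is a1, a2, b1; under this fair-share sketch bob's job moves to the front because bob has consumed less so far.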

To address the data security problem caused by scheduling user-submitted jobs on a shared resource pool in an HPC cluster, this application provides a job scheduling method, and in particular a multi-tenant two-level scheduling method for supercomputing scenarios. The resources provided by the system are isolated by tenant, so the system includes the mutually isolated tenant resource pools of multiple tenants. After the scheduler in the system obtains a pending job, it determines the tenant associated with the pending job and a scheduling result according to the two-level scheduling policy, and performs an operation on the pending job according to the scheduling result. That is, it first determines that a tenant associated with the pending job exists in the system, and then schedules the pending job based on the tenant resource pool of that tenant. When the users of all tenants share the shared resource pool provided by a supercomputing center, jobs of users in different tenants may run on the same resources, which is a security risk. In the solution provided by this application, first-level scheduling implements tenant-level scheduling and second-level scheduling implements job-level scheduling. The scheduling resources that the scheduler uses for the jobs of different tenants are isolated from one another, and the tenant resources provided by the system are isolated by tenant. Tenants are thereby isolated from one another and the scheduler processes the jobs of different tenants with two-level isolation, which improves data processing security. Scheduling jobs on each tenant's own resources also effectively improves data processing efficiency.

Figure 1 is a schematic architectural diagram of a job scheduling system provided by an embodiment of this application. As shown in Figure 1, the job scheduling system 100 is an entity that provides high-performance computing. The job scheduling system 100 includes a resource pool 110. The resource pool 110 includes computing resources 111, storage resources 112 and network resources 113. The resource pool 110 can be divided by tenant into multiple tenant resource pools (the N tenant resource pools shown in Figure 1, where N is greater than or equal to 2). The tenant resource pools are isolated from one another, and each includes at least one of computing resources, storage resources and network resources. The tenant resource pools of different tenants may contain the same or different resources. Each tenant resource pool provides resource support for processing the jobs of the tenant to which it belongs.
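Dividing resource pool 110 into mutually isolated tenant resource pools can be sketched as a partition of node names into disjoint sets. The functions `partition_nodes` and `is_isolated`, the node names and the node-count demand format are assumptions for illustration; the application does not specify how the division is carried out.

```python
def partition_nodes(nodes, demand):
    """Split a list of node names into disjoint per-tenant sets.

    `demand` maps tenant -> number of nodes requested.
    """
    if sum(demand.values()) > len(nodes):
        raise ValueError("resource pool is too small for all tenants")
    remaining = iter(nodes)
    return {
        tenant: {next(remaining) for _ in range(count)}
        for tenant, count in demand.items()
    }


def is_isolated(pools):
    """True if no node appears in more than one tenant resource pool."""
    seen = set()
    for members in pools.values():
        if seen & members:
            return False
        seen |= members
    return True


pools = partition_nodes([f"node{i}" for i in range(6)], {"tenant1": 3, "tenant2": 2})
```

In this example `node5` stays unassigned and remains available for a tenant created later.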

The computing resources 111 may be provided by a computing cluster in the system. The computing cluster contains at least two computing nodes, and the computing nodes can communicate with one another. A computing node is a computing device, such as a server, a desktop computer, or the controller of a storage array.

The storage resources 112 may be provided by a storage cluster in the system. The storage cluster contains at least two storage nodes. A storage node includes one or more controllers, a network card and multiple hard disks. The hard disks store data. A hard disk may be a magnetic disk or another type of storage medium, such as a solid-state drive or a shingled magnetic recording drive. The network card is used to communicate with the computing nodes of the computing cluster. The controller writes data to or reads data from the hard disks according to read/write requests sent by the computing nodes. While reading and writing data, the controller needs to convert the address carried in a read/write request into an address that the hard disk can recognize.

The network resources 113 may be provided by network devices such as switches and routers. The network resources 113 support data transmission between computing nodes and storage nodes.

The job scheduling system 100 may include a tenant management platform, through which a tenant can purchase a tenant resource pool. For example, the tenant management platform can receive a creation request from a tenant, where the request includes the resource requirements for the tenant resource pool. When the tenant is granted the right to use the SaaS, a tenant resource pool is built for the tenant according to the tenant's resource requirements.

The job scheduling system 100 also includes a scheduler 120. The scheduler may be an independent physical device. The scheduler 120 performs two-level isolated scheduling on the pending jobs it obtains. The pending jobs may be jobs of different users under the N tenants obtained by the scheduler 120, as shown in Figure 1. For example, the jobs of tenant 1 include job 11, job 12 and job 13; the jobs of tenant 2 include job 21, job 22 and job 23; and the jobs of tenant 3 include job 31, job 32 and job 33. In some embodiments, a user can submit a job through a user interface (UI) displayed by a client. The scheduler 120 obtains jobs over a network. The network may be an enterprise's internal network (such as a local area network (LAN)) or the Internet.

First-level scheduling implements tenant-level scheduling, that is, it determines the tenant space in the scheduler 120 of the tenant associated with the pending job. A tenant space may refer to the scheduling resources within the scheduler that are used to schedule jobs. First-level scheduling is also used for tenant management and job management.

Tenant management manages the tenants that have the right to use the SaaS in the system, so that the scheduler 120 can perform first-level scheduling on the jobs it obtains according to the relationship between tenants and users.

Job management manages the job status of jobs. Job statuses include paused, resumed and terminated.

Second-level scheduling implements job-level scheduling, that is, it schedules the jobs of users in a tenant space based on the tenant's tenant resource pool. In some embodiments, the scheduler 120 may configure different scheduling policies for different tenant spaces and schedule the jobs of users in a tenant space according to that tenant's scheduling policy. The scheduler 120 may also perform second-level scheduling on users' jobs in at least one queue. Second-level scheduling is also used for resource management, scheduling policy management, queue management and job management.

Resource management monitors the total, available and used resources in a tenant resource pool.
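The three quantities that resource management monitors are related by available = total - used. A minimal bookkeeping sketch, assuming a single resource kind and the hypothetical class name `PoolUsage`, might look like this:

```python
from dataclasses import dataclass


@dataclass
class PoolUsage:
    total: int      # total units in the tenant resource pool (e.g. cores)
    used: int = 0   # units currently allocated to running jobs

    @property
    def available(self):
        """Units that can still be handed to new jobs."""
        return self.total - self.used

    def allocate(self, amount):
        if amount > self.available:
            raise RuntimeError("insufficient resources in tenant pool")
        self.used += amount

    def release(self, amount):
        # Never drop below zero, even if a release is reported twice.
        self.used = max(0, self.used - amount)


usage = PoolUsage(total=64)
usage.allocate(16)
```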

Scheduling policy management configures scheduling policies according to a tenant's service characteristics. Service characteristics include the computing, storage, and data transmission characteristics of a service. For example, if the service is compute-intensive, the scheduling policy indicates that the service is preferentially scheduled to nodes rich in computing resources. As another example, if the service is storage-intensive, the scheduling policy indicates that the service is preferentially scheduled to nodes rich in storage resources.
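As a rough illustration of such a policy, the sketch below picks the node with the most free capacity in the dimension that matters for the service type. The function `pick_node`, the service-type strings, and the node fields `free_cpu` and `free_storage_gb` are hypothetical names, not taken from the application.

```python
# Hypothetical mapping from a service characteristic to the node attribute
# that the scheduling policy should favor.
PREFERRED_ATTRIBUTE = {
    "compute_intensive": "free_cpu",
    "storage_intensive": "free_storage_gb",
}


def pick_node(nodes, service_type):
    """Return the name of the node richest in the resource the service needs.

    `nodes` is a list of dicts with 'name', 'free_cpu', and 'free_storage_gb'.
    """
    attribute = PREFERRED_ATTRIBUTE[service_type]
    best = max(nodes, key=lambda node: node[attribute])
    return best["name"]
```

A real policy would likely weigh several attributes at once; a single-attribute maximum is used here only to keep the example short.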

Queue management includes queue permission management and the creation, deletion, and modification of queues.

Next, implementations of the job scheduling method provided by the embodiments of this application are described in detail with reference to the accompanying drawings.

Figure 2 is a schematic flowchart of scheduler initialization provided by an embodiment of this application.

Step 210: The scheduler 120 obtains initialization information.

The scheduler 120 may obtain the initialization information from a local hard disk or from a storage node in the system and store it in local memory. The initialization information includes a system configuration file and a tenant configuration file. The system configuration file includes the cluster name, the number of computing nodes, the number of storage nodes, and the amount of resources. The tenant configuration file includes the association between tenants and users, the first-level scheduling application, the second-level scheduling application, and the state of the tenant resource pools.

Step 220: The scheduler 120 starts first-level scheduling.

The scheduler 120 reads the initialization information from memory, runs the first-level scheduling application, and starts first-level scheduling. In some embodiments, the scheduler 120 allocates a process or thread to run the first-level scheduling application, which starts first-level scheduling and implements tenant-level scheduling.

Step 230: The scheduler 120 starts second-level scheduling.

The scheduler 120 runs the second-level scheduling application to start second-level scheduling for each tenant. The scheduler 120 allocates a thread or sub-thread to each tenant to run the second-level scheduling application, which starts that tenant's second-level scheduling and implements job-level scheduling for the tenant.

The scheduler 120 may start the second-level scheduling of each tenant in sequence. For example, step 230 includes the following detailed steps 231 and 232.

Step 231: The scheduler 120 traverses the tenants and determines whether there is a next tenant whose second-level scheduling has not been started. If there is none, second-level scheduling initialization is complete. If there is one, step 232 is executed.

Step 232: The scheduler 120 runs the tenant's second-level scheduling application and starts the tenant's second-level scheduling. In some embodiments, after the scheduler 120 has started the second-level scheduling application and obtains a pending job, it performs first-level scheduling on the pending job and then performs second-level scheduling on the jobs of the users in the tenant, which include the pending job.
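Steps 231 and 232 can be sketched as a loop that starts one worker per tenant. The sketch assumes the thread-per-tenant variant; `start_secondary_schedulers` and the `run_secondary` callback are hypothetical names standing in for the second-level scheduling application.

```python
import threading


def start_secondary_schedulers(tenant_ids, run_secondary):
    """Start one thread per tenant whose second-level scheduling is not running.

    `run_secondary(tenant_id)` is a placeholder for the tenant's second-level
    scheduling application. Returns a dict mapping tenant id to its thread.
    """
    started = {}
    for tenant_id in tenant_ids:          # step 231: traverse the tenants
        if tenant_id in started:          # already started, move to the next one
            continue
        worker = threading.Thread(
            target=run_secondary,
            args=(tenant_id,),
            name=f"secondary-{tenant_id}",
            daemon=True,
        )
        worker.start()                    # step 232: start this tenant's scheduling
        started[tenant_id] = worker
    return started                        # no tenant left: initialization complete
```

Giving each tenant its own thread is one way to keep the tenants' scheduling resources separate; a process per tenant would give stronger isolation at a higher cost.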

Optionally, the scheduler 120 may instead determine the tenant associated with a pending job after obtaining the job, and only then start the second-level scheduling of that tenant. Starting and executing second-level scheduling only after a pending job is obtained saves computing resources of the scheduler 120.

Figure 3 is a schematic flowchart of a job scheduling method provided by an embodiment of this application. The scheduler 120 is used as an example in the description. As shown in Figure 3, the method includes the following steps.

Step 310: The scheduler 120 obtains a pending job.

The scheduler 120 receives, through an application interface, a job submitted by any user of a tenant. The pending job is an HPC-related processing request. The application interface may include an application programming interface (API), a command-line interface (CLI), or a graphical user interface (GUI).

After obtaining the pending job, the scheduler 120 performs two-level isolated scheduling on it, that is, it determines the tenant associated with the pending job and the scheduling result according to a two-level scheduling policy. The two-level scheduling policy indicates a job scheduling approach that implements resource management with multi-tenant isolation. The tenant resource pools that the system provides for multiple tenants are isolated from one another. The two-level scheduling is described in steps 320 and 330 below.

Step 320: The scheduler 120 determines the tenant associated with the pending job.

The scheduler 120 may be preconfigured with the association between tenants and users. The scheduler 120 uses this association to determine the tenant associated with the user who submitted the pending job.

In some embodiments, the association between tenants and users indicates the correspondence between tenant identifiers and user identifiers. The processing request submitted by a user contains a user identifier, which uniquely identifies a user. The scheduler 120 looks up the user identifier in the association between tenants and users to determine the tenant associated with the pending job. Optionally, the processing request contains a tenant identifier, and the scheduler 120 looks up the tenant identifier in the association between tenants and users to determine the tenant associated with the pending job. A tenant identifier uniquely identifies a tenant.

In other embodiments, the association between tenants and users indicates the correspondence among tenant identifiers, user identifiers, and scheduling resources. A scheduling resource is used to schedule the jobs of the users in a tenant; for example, it may be the thread or sub-thread in the scheduler 120 that runs the second-level scheduling application. The scheduling resources of different tenants are isolated from one another. When the scheduler 120 determines the tenant associated with the pending job, it thereby determines that tenant's scheduling resource within the scheduler 120, which is used to schedule the jobs of the users in the associated tenant based on that tenant's resource pool.

In one example, the association between tenants and users may be presented as a table, as shown in Table 1.

Table 1

As Table 1 shows, the users associated with tenant 1 are user 11, user 12, and user 13, and tenant 1 is also associated with scheduling resource 1. The users associated with tenant 2 are user 21, user 22, and user 23, and tenant 2 is also associated with scheduling resource 2. The users associated with tenant 3 are user 31, user 32, and user 33, and tenant 3 is also associated with scheduling resource 3.

Suppose the scheduler 120 receives a job submitted by user 11 of tenant 1. The scheduler 120 obtains the identifier of user 11 from the processing request and looks it up in Table 1, finding that the job submitted by user 11 is associated with tenant 1 and that the scheduling resource of tenant 1 is scheduling resource 1. The scheduler then uses scheduling resource 1 to schedule the jobs of the users in tenant 1 based on the tenant resource pool of tenant 1.

Table 1 merely illustrates, in tabular form, how the association may be stored in a storage device; it does not limit the storage form. The association may also be stored in other forms, which this embodiment does not restrict.

Suppose a user of a first tenant submits a pending job. If the association between tenants and users does not contain the first tenant, the first tenant does not exist in the system and is an invalid tenant, so its users are not allowed to access the system. The scheduler 120 may also return an access failure response to the user.

If the association between tenants and users contains the first tenant, the first tenant exists in the system and is a valid tenant, so its users are allowed to access the system and second-level scheduling is performed on the pending job, that is, step 330 is executed. The scheduler 120 may also return an access success response to the user.
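The lookup and validity check of step 320 can be sketched as follows. The dictionary mirrors the associations the description attributes to Table 1; the identifier strings, the `AccessDenied` exception, and the function `first_level_schedule` are hypothetical names chosen for the example.

```python
# Associations as described for Table 1 (identifier spellings are assumed).
TENANT_TABLE = {
    "tenant1": {"users": {"user11", "user12", "user13"}, "resource": "scheduling_resource_1"},
    "tenant2": {"users": {"user21", "user22", "user23"}, "resource": "scheduling_resource_2"},
    "tenant3": {"users": {"user31", "user32", "user33"}, "resource": "scheduling_resource_3"},
}


class AccessDenied(Exception):
    """Raised when the submitting user belongs to no known tenant."""


def first_level_schedule(user_id):
    """Return (tenant_id, scheduling_resource) for the submitting user.

    Raises AccessDenied when no tenant contains the user, which corresponds
    to returning an access failure response.
    """
    for tenant_id, entry in TENANT_TABLE.items():
        if user_id in entry["users"]:
            return tenant_id, entry["resource"]
    raise AccessDenied(f"user {user_id!r} is not associated with any tenant")
```

For example, `first_level_schedule("user11")` resolves to tenant 1 and scheduling resource 1, matching the walkthrough above. A production table would more likely be indexed by user identifier to avoid the linear scan.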

Step 330: The scheduler 120 schedules the jobs of the users in the associated tenant based on the associated tenant's resource pool and obtains a scheduling result.

When performing second-level scheduling, the scheduler 120 schedules the jobs of multiple users of the associated tenant together, that is, it determines the execution order of those jobs, which include the pending job. After determining which job to process, the scheduler 120 allocates resources from the tenant resource pool to that job. For example, the scheduler 120 schedules a job of a user of tenant 1 to at least one computing node in tenant resource pool 1 of tenant 1, and the at least one computing node provides the computing resources to process the job. The scheduler 120 may schedule the pending job first and allocate resources from the tenant resource pool to it, or it may schedule other users' jobs first and allocate resources from the tenant resource pool to those jobs.

In some embodiments, the scheduler 120 may configure tenant-level scheduling policies, that is, one scheduling policy per tenant. When the thread or sub-thread running a tenant's second-level scheduling schedules the tenant's jobs, it may do so according to the tenant's scheduling policy. The scheduling policies configured for different tenants may be the same or different.

For example, if the scheduling policy is first come, first served, the scheduler 120 schedules the user jobs it obtained earliest first when performing second-level scheduling.

In other embodiments, the scheduler 120 may also configure tenant-level queues, that is, at least one queue per tenant. If a tenant is configured with one queue, all users in the tenant are bound to that queue, and the scheduler 120 performs second-level scheduling on the jobs in that queue. If a tenant is configured with at least two queues, the users in the tenant may be distributed among the queues, that is, each queue is bound to a subset of the tenant's users. For example, if a tenant has two queues and six users, each queue may be bound to three of the users. In addition, each queue may be configured with its own scheduling policy and with the resources of the tenant resource pool that the queue may use. Each queue is bound to different users, and a given user's jobs belong to one queue.

When performing second-level scheduling, the scheduler 120 may poll the tenant's at least two queues and perform second-level scheduling on the jobs in each queue. Alternatively, the scheduler 120 may schedule jobs according to queue priority, so that jobs in higher-priority queues obtain resources from the tenant resource pool first.
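The two queue-visiting strategies can be contrasted in a short sketch. It assumes each queue is served first come, first served, that polling takes one job per visit, and that a larger number means higher priority; none of these details are fixed by the application, and both function names are hypothetical.

```python
from collections import deque


def poll_round_robin(queues):
    """Yield jobs by visiting each queue in turn, one job per visit."""
    pending = [deque(queue) for queue in queues]   # copy; each queue is FCFS
    while any(pending):                            # an empty deque is falsy
        for queue in pending:
            if queue:
                yield queue.popleft()


def drain_by_priority(queues_with_priority):
    """Yield every job of the highest-priority queue before any lower one.

    `queues_with_priority` is a list of (priority, jobs) pairs.
    """
    ordered = sorted(queues_with_priority, key=lambda item: item[0], reverse=True)
    for _, jobs in ordered:
        yield from jobs
```

Polling keeps any one queue from starving the others, while strict priority favors latency for the important queue at the risk of delaying low-priority jobs indefinitely.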

For the process by which the scheduler 120 schedules the jobs in each queue, refer to the method steps shown in Figure 4.

Step 410: The scheduler 120 determines whether there are jobs in the queue.

If no jobs are waiting to be scheduled in the queue, this round of scheduling ends. If jobs are waiting, the scheduler traverses the jobs in the queue and executes step 420.

Step 420: The scheduler 120 determines the resources for processing the job.

The scheduler 120 determines whether the tenant resource pool meets the resource requirement. If it does, the scheduler determines the resources needed to process the job from the tenant resource pool and executes step 430. If it does not, step 440 is executed to return a submission failure response to the user, and step 410 is executed again to determine whether there are jobs in the queue.

In some embodiments, the processing request includes the resource requirement for processing the job. For example, the resource requirement indicates at least one of computing resources, storage resources, and network resources. Computing resources include the number of processes, the number of threads, and the number of computing nodes, as well as at least one type of XPU used for job processing, such as a CPU, GPU, or NPU. Storage resources include the storage capacity required to process the job. Network resources include the bandwidth requirement.

Step 430: The scheduler 120 dispatches the job.

After determining the associated tenant of the pending job and the scheduling result according to the two-level scheduling policy, the scheduler 120 performs the operation on the pending job according to the scheduling result. For example, the scheduler 120 sends an instruction to a computing node that meets the resource requirement, instructing the computing node to process the pending job. After dispatching the job, the scheduler 120 may execute step 410 again.

After the scheduler 120 finishes scheduling the jobs in the queue, it may also return a submission success response to the user.
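The loop of steps 410 to 440 can be sketched as below. The resource requirement is reduced to a single count of computing nodes, and the function `schedule_queue` with its return shape is hypothetical; the application describes richer requirements (processes, threads, storage, bandwidth).

```python
from collections import deque


def schedule_queue(queue, free_nodes):
    """Run one scheduling round over a queue of (job_id, nodes_needed) pairs.

    Returns (dispatched, rejected) lists of job ids.
    """
    dispatched, rejected = [], []
    while queue:                          # step 410: any job left in the queue?
        job_id, nodes_needed = queue.popleft()
        if nodes_needed <= free_nodes:    # step 420: does the pool satisfy the need?
            free_nodes -= nodes_needed
            dispatched.append(job_id)     # step 430: dispatch the job
        else:
            rejected.append(job_id)       # step 440: submission failure response
    return dispatched, rejected           # queue empty: this round ends
```

A usage example under these assumptions: with four free nodes and a queue of jobs needing 2, 5, and 1 nodes, the first and third jobs are dispatched and the second is rejected.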

In this way, after users submit jobs, the scheduler uses first-level scheduling to route the jobs submitted by different users to the scheduling resource of the tenant each user belongs to, so that the jobs enter second-level scheduling on a per-tenant basis. Second-level scheduling then orders the jobs of the different users within the same tenant and allocates resources from the tenant resource pool to them. The scheduler thus applies two-level isolation to the jobs of users in different tenants, which improves data processing security, and scheduling jobs on each tenant's own resources improves data processing efficiency.

The above embodiments describe the flow in which a user submits a job. In other embodiments, the processing request submitted by a user may instead indicate a job management operation. Figure 5 is a flowchart of a job management method provided by an embodiment of this application.

Step 510: The scheduler 120 determines whether the job management operation matches the job status.

The scheduler 120 receives the pending job and performs first-level scheduling. For example, the pending job is an HPC-related processing request. The processing request includes a job number, which uniquely identifies a job. The scheduler 120 uses the job number to determine whether the job exists in the job scheduling system and determines whether the job management operation matches the job status. If they match, step 520 is executed, followed by step 530, in which the scheduler 120 returns an operation success response to the user. If they do not match, step 540 is executed, in which the scheduler 120 returns an operation failure response to the user.

Step 520: The scheduler 120 operates on the job indicated by the job number according to the job management operation.

After determining the associated tenant of the pending job and the scheduling result according to the two-level scheduling policy, the scheduler 120 performs the operation on the pending job according to the scheduling result. For example, the scheduler 120 may send an instruction to the computing node processing the job, instructing it to adjust the status of the job indicated by the job number according to the job management operation. Job statuses include running, paused, resumed, and terminated. The computing node may periodically report the running status of the job to the scheduler 120 so that the scheduler 120 can manage the job status.

For example, if a job processed by a computing node is in the running state and the job management operation indicates that the job should be terminated, the computing node changes the job's state from running to terminated according to the instruction.
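The match check of step 510 and the state change of step 520 can be sketched as a small transition table. The set of allowed transitions is an assumption: the application names the statuses but does not enumerate which operations are valid in which state, and this sketch treats a resumed job as returning to the running state. The names `ALLOWED_TRANSITIONS` and `apply_operation` are hypothetical.

```python
# Assumed (state, operation) -> new state transitions; not taken from the source.
ALLOWED_TRANSITIONS = {
    ("running", "pause"): "paused",
    ("paused", "resume"): "running",
    ("running", "terminate"): "terminated",
    ("paused", "terminate"): "terminated",
}


def apply_operation(state, operation):
    """Return (matched, new_state) for a job management operation.

    `matched` is False when the operation does not fit the current state,
    which corresponds to the operation failure response; the state is then
    left unchanged.
    """
    new_state = ALLOWED_TRANSITIONS.get((state, operation))
    if new_state is None:
        return False, state
    return True, new_state
```

Under these assumptions, terminating a running job yields the terminated state, as in the example above, while resuming a job that is already running is reported as a mismatch.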

Optionally, after adjusting the job status according to the job management operation, the computing node may also return an operation failure response or an operation success response to the user.

In this way, when users manage jobs, the scheduler uses first-level scheduling to route the job management operations of different users, once it has determined that each operation matches the job status, to the scheduling resource of the tenant each user belongs to, so that the operations enter second-level scheduling on a per-tenant basis. Second-level scheduling then orders the job management operations on the jobs of different users within the same tenant and dispatches them to the computing nodes processing the jobs. The scheduler thus applies two-level isolation to the job management operations on the jobs of users in different tenants, which improves data processing security, and scheduling jobs on each tenant's own resources improves data processing efficiency.

It can be understood that, to implement the functions in the above embodiments, the scheduler includes hardware structures and/or software modules corresponding to each function. Those skilled in the art will readily appreciate that the units and method steps of the examples described in connection with the embodiments disclosed in this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application scenario and design constraints of the technical solution.

The job scheduling method provided by this embodiment has been described in detail above with reference to Figures 1 to 5. The job scheduling device provided by this embodiment is described below with reference to Figure 6.

Figure 6 is a schematic structural diagram of a possible job scheduling device provided by this embodiment. Such a job scheduling device can implement the functions of the scheduler in the above method embodiments and can therefore also achieve their beneficial effects. In this embodiment, the job scheduling device may be the scheduler 120 shown in Figure 1, or a module (such as a chip) applied to a server.

As shown in Figure 6, the job scheduling device 600 includes a communication module 610, a scheduling module 620, and a storage module 630. The job scheduling device 600 implements the functions of the scheduler 120 in the method embodiments shown in Figures 3, 4, and 5.

The communication module 610 is configured to obtain a pending job, which is an HPC-related processing request.

The scheduling module 620 is configured to determine the associated tenant of the pending job and the scheduling result according to a two-level scheduling policy, which indicates a job scheduling approach that implements resource management with multi-tenant isolation. For example, the scheduling module 620 performs steps 320 and 330 in Figure 3.

The storage module 630 is configured to store tenant information, scheduling policies, and queues, so that the scheduling module 620 can schedule the jobs of the users in the associated tenant according to the associated tenant's scheduling policy, based on the associated tenant's resource pool, to obtain the scheduling result.

The scheduling module 620 is specifically configured to schedule, based on the associated tenant's resource pool and according to the associated tenant's scheduling policy, the jobs of the users in the associated tenant that are in a first queue, to obtain the scheduling result. The first queue is the queue, among multiple queues, that contains the pending job.

The scheduling module 620 is further configured to perform job processing operations and job management operations on the pending job according to the scheduling result.

The scheduling module 620 is further configured to determine, according to the association between tenants and users, the tenant associated with the user who submitted the pending job.

It should be understood that the job scheduling device 600 of this embodiment may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. When the job scheduling methods shown in Figures 3, 4, and 5 are implemented in software, the job scheduling device 600 and its modules may also be software modules.

The job scheduling device 600 according to this embodiment may correspondingly perform the methods described in the embodiments of this application, and the above and other operations and/or functions of the units in the job scheduling device 600 implement the corresponding flows of the methods in Figures 3, 4, and 5. For brevity, they are not repeated here.

Figure 7 is a schematic structural diagram of a scheduler 700 provided by this embodiment. As shown, the scheduler 700 includes a processor 710, a bus 720, a storage 730, a communication interface 740, and a memory unit 750 (which may also be called a main memory unit). The processor 710, the storage 730, the memory unit 750, and the communication interface 740 are connected through the bus 720.

It should be understood that, in this embodiment, the processor 710 may be a CPU, or it may be another general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.

The processor may also be a graphics processing unit (GPU), a neural network processing unit (NPU), a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the programs of this application.

The communication interface 740 implements communication between the scheduler 700 and external devices or components. In this embodiment, when the scheduler 700 implements the functions of the scheduler 120 shown in Figures 3, 4, and 5, the communication interface 740 obtains the pending job, which is a high-performance computing (HPC) related processing request, and the processor 710 determines the associated tenant of the pending job and the scheduling result according to the two-level scheduling policy.

The bus 720 may include a path for transferring information between the above components (such as the processor 710, the memory unit 750, and the storage 730). In addition to a data bus, the bus 720 may include a power bus, a control bus, a status signal bus, and so on. For clarity, however, all of these buses are labeled as the bus 720 in the figure. The bus 720 may be a Peripheral Component Interconnect Express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a Compute Express Link (CXL), a cache coherent interconnect for accelerators (CCIX), or the like. The bus 720 may be divided into an address bus, a data bus, a control bus, and so on.

As an example, the scheduler 700 may include multiple processors. The processor may be a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or computing units for processing data (for example, computer program instructions).

It is worth noting that FIG. 7 only uses, as an example, the case in which the scheduler 700 includes one processor 710 and one memory 730. Here, the processor 710 and the memory 730 each denote a type of component or device; in a specific embodiment, the quantity of each type of component or device may be determined according to service requirements.

The memory unit 750 may be a volatile memory pool or a non-volatile memory pool, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).

The memory 730 may correspond to the storage medium used in the foregoing method embodiments to store information such as scheduling policies and queues, for example, a disk such as a mechanical hard disk or a solid-state drive.

The scheduler 700 may be a general-purpose device or a special-purpose device. For example, the scheduler 700 may be an edge device (for example, a box carrying a chip with processing capabilities). Optionally, the scheduler 700 may also be a server or another device with computing capabilities.

It should be understood that the scheduler 700 according to this embodiment may correspond to the job scheduling device 600 in this embodiment, and may correspond to the entity that performs any of the methods shown in FIG. 3, FIG. 4, or FIG. 5. The foregoing and other operations and/or functions of the modules in the job scheduling device 600 respectively implement the corresponding procedures of the methods in FIG. 3, FIG. 4, or FIG. 5; for brevity, details are not repeated here.

An embodiment of the present application provides a computer device. The computer device includes a scheduler, and the scheduler is configured to perform the operation steps of the job scheduling method described in the foregoing method embodiments.

An embodiment of the present application provides a job scheduling system. The job scheduling system includes a scheduler and a resource pool. The resource pool includes storage nodes, computing nodes, and a network, and is used to provide tenant resource pools for multiple tenants. The scheduler is configured to perform the operation steps of the job scheduling method described in the foregoing method embodiments.
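One way to picture such a system is a shared pool carved into per-tenant pools, with a check that no resource is handed to two tenants. The sketch below is only an assumption about how this could be modeled; `ResourcePool` and `pools_are_isolated` are hypothetical names, and resources are represented by plain strings for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """A tenant's share of the system pool (hypothetical model)."""
    compute_nodes: set[str] = field(default_factory=set)
    storage_nodes: set[str] = field(default_factory=set)
    networks: set[str] = field(default_factory=set)

    def all_resources(self) -> set[str]:
        # Union of every resource name held by this pool.
        return self.compute_nodes | self.storage_nodes | self.networks


def pools_are_isolated(tenant_pools: dict[str, ResourcePool]) -> bool:
    """Return True if no resource appears in more than one tenant pool."""
    seen: set[str] = set()
    for pool in tenant_pools.values():
        resources = pool.all_resources()
        if seen & resources:
            return False
        seen |= resources
    return True
```

A scheduler built on this model could run the check whenever tenant pools are created or resized, before any job is placed.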

The method steps in this embodiment may be implemented by hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, and the software modules may be stored in a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor, so that the processor can read information from the storage medium and write information to the storage medium. The storage medium may also be a component of the processor. The processor and the storage medium may be located in an ASIC, and the ASIC may be located in a computing device. Alternatively, the processor and the storage medium may exist as discrete components in a computing device.

The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer programs or instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the present application are performed in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, user equipment, or another programmable apparatus. The computer programs or instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer programs or instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium, for example, a floppy disk, a hard disk, or a magnetic tape; an optical medium, for example, a digital video disc (DVD); or a semiconductor medium, for example, a solid-state drive (SSD).

The foregoing is merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present application, and such modifications or replacements shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A job scheduling method, wherein the method is performed by a scheduler and comprises: obtaining a to-be-processed job, wherein the to-be-processed job is a processing request related to high-performance computing (HPC); determining an associated tenant and a scheduling result of the to-be-processed job according to a two-level scheduling policy, wherein the two-level scheduling policy indicates a job scheduling manner in which resource management is implemented in a multi-tenant isolation manner; and performing an operation of the to-be-processed job according to the scheduling result.

2. The method according to claim 1, wherein determining the associated tenant of the to-be-processed job according to the two-level scheduling policy comprises: determining, according to an association between tenants and users, the tenant associated with the user who submitted the to-be-processed job.

3. The method according to claim 2, wherein determining, according to the association between tenants and users, the tenant associated with the user who submitted the to-be-processed job comprises: determining, according to a user identifier and the association, the tenant associated with the user who submitted the to-be-processed job.

4. The method according to claim 2 or 3, wherein determining the associated tenant and the scheduling result of the to-be-processed job according to the two-level scheduling policy comprises: scheduling, based on the tenant resource pool of the associated tenant, jobs of users in the associated tenant to obtain the scheduling result, wherein the jobs of the users in the associated tenant include the to-be-processed job, and the scheduling result indicates the resource, in the tenant resource pool of the associated tenant, that processes the to-be-processed job.

5. The method according to claim 4, wherein scheduling, based on the tenant resource pool of the associated tenant, the jobs of the users in the associated tenant to obtain the scheduling result comprises: scheduling, based on the tenant resource pool of the associated tenant and according to a scheduling policy of the associated tenant, the jobs of the users in the associated tenant to obtain the scheduling result.

6. The method according to claim 4 or 5, wherein scheduling, based on the tenant resource pool of the associated tenant, the jobs of the users in the associated tenant to obtain the scheduling result comprises: scheduling, based on the tenant resource pool of the associated tenant and according to the scheduling policy of the associated tenant, the jobs of the users in the associated tenant in a first queue to obtain the scheduling result, wherein the first queue is the queue, among multiple queues, that contains the to-be-processed job.

7. The method according to any one of claims 1 to 6, wherein the tenant resource pools of multiple tenants are isolated from each other, and each tenant resource pool includes at least one of computing resources, storage resources, and network resources.

8. The method according to any one of claims 1 to 7, wherein performing the operation of the to-be-processed job according to the scheduling result comprises: performing a job processing operation and a job management operation on the to-be-processed job according to the scheduling result.

9. A job scheduling device, wherein the device comprises: a communication module, configured to obtain a to-be-processed job, wherein the to-be-processed job is a processing request related to high-performance computing (HPC); and a scheduling module, configured to determine an associated tenant and a scheduling result of the to-be-processed job according to a two-level scheduling policy, wherein the two-level scheduling policy indicates a job scheduling manner in which resource management is implemented in a multi-tenant isolation manner; the scheduling module being further configured to perform an operation of the to-be-processed job according to the scheduling result.

10. A scheduler, wherein the scheduler comprises a memory and at least one processor, the memory is configured to store a set of computer instructions, and when the processor executes the set of computer instructions, the operation steps of the method according to any one of claims 1 to 8 are performed.

11. A job scheduling system, wherein the job scheduling system comprises a scheduler and a resource pool, the resource pool includes storage nodes, computing nodes, and a network, the resource pool is used to provide tenant resource pools for multiple tenants, and the scheduler is configured to perform the operation steps of the method according to any one of claims 1 to 8.
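Claims 5 and 6 describe scheduling inside a tenant: the tenant's own policy is applied to the queue that contains the to-be-processed job. The sketch below is one possible reading, not the claimed implementation. `QueuedJob`, `fifo`, `by_priority`, and `schedule_queue` are hypothetical names, both policies are illustrative, and the convention that a larger `priority` value is more urgent is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class QueuedJob:
    job_id: str
    priority: int      # assumed convention: larger means more urgent
    submit_order: int  # increasing counter assigned at submission time


# A tenant policy orders the jobs of one queue (illustrative type alias).
Policy = Callable[[list], list]


def fifo(jobs: list[QueuedJob]) -> list[QueuedJob]:
    """Order jobs by submission time."""
    return sorted(jobs, key=lambda j: j.submit_order)


def by_priority(jobs: list[QueuedJob]) -> list[QueuedJob]:
    """Order jobs by descending priority, ties broken by submission time."""
    return sorted(jobs, key=lambda j: (-j.priority, j.submit_order))


def schedule_queue(
    queues: dict[str, list[QueuedJob]],
    pending_job_id: str,
    policy: Policy,
    free_nodes: list[str],
) -> dict[str, str]:
    """Map job IDs to nodes for the queue that holds the pending job."""
    # Locate the "first queue": the one containing the pending job.
    first_queue = next(
        (q for q in queues.values()
         if any(j.job_id == pending_job_id for j in q)),
        None,
    )
    if first_queue is None:
        raise LookupError(f"job {pending_job_id!r} is not in any queue")

    # Apply the tenant's policy, then place as many jobs as there are nodes.
    ordered = policy(first_queue)
    return dict((job.job_id, node) for job, node in zip(ordered, free_nodes))
```

Passing the policy as a function lets each tenant choose its own ordering without changing the scheduler; jobs that do not receive a node in this pass simply stay in their queue.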
CN202210963659.1A 2022-08-11 2022-08-11 Job scheduling method, device, scheduler and system Pending CN117632390A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210963659.1A CN117632390A (en) 2022-08-11 2022-08-11 Job scheduling method, device, scheduler and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210963659.1A CN117632390A (en) 2022-08-11 2022-08-11 Job scheduling method, device, scheduler and system

Publications (1)

Publication Number Publication Date
CN117632390A true CN117632390A (en) 2024-03-01

Family

ID=90018599

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210963659.1A Pending CN117632390A (en) 2022-08-11 2022-08-11 Job scheduling method, device, scheduler and system

Country Status (1)

Country Link
CN (1) CN117632390A (en)

Similar Documents

Publication Publication Date Title
US11204807B2 (en) Multi-layer QOS management in a distributed computing environment
US11630704B2 (en) System and method for a workload management and scheduling module to manage access to a compute environment according to local and non-local user identity information
WO2017133351A1 (en) Resource allocation method and resource manager
WO2018006864A1 (en) Method, apparatus and system for creating virtual machine, control device and storage medium
US20140250440A1 (en) System and method for managing storage input/output for a compute environment
US11586392B2 (en) Multi-stream SSD QoS management
US10810143B2 (en) Distributed storage system and method for managing storage access bandwidth for multiple clients
WO2023082560A1 (en) Task processing method and apparatus, device, and medium
US11995016B2 (en) Input/output command rebalancing in a virtualized computer system
US11311722B2 (en) Cross-platform workload processing
US20080229319A1 (en) Global Resource Allocation Control
CN109271236A (en) A kind of method, apparatus of traffic scheduling, computer storage medium and terminal
WO2013091219A1 (en) Method and apparatus for processing concurrent tasks
CN105677467A (en) Yarn resource scheduler based on quantified labels
Dimopoulos et al. Big data framework interference in restricted private cloud settings
TWI554945B (en) Routine task allocating method and multicore computer using the same
CN117632390A (en) Job scheduling method, device, scheduler and system
Jordan et al. Wrangler's user environment: A software framework for management of data-intensive computing system
US12210521B2 (en) Short query prioritization for data processing service
CN120336034A (en) Task scheduling method, device, equipment and storage medium
CN117171095A (en) A data transmission and processing method, device, computer equipment, and storage medium
JP2015170270A (en) Information processing apparatus, resource access method thereof and resource access program
Yao Resource management in cluster computing platforms for large scale data processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination