
CN112698911B - Cloud job scheduling method based on deep reinforcement learning - Google Patents

Cloud job scheduling method based on deep reinforcement learning

Info

Publication number
CN112698911B
CN112698911B (application CN202011578884.0A)
Authority
CN
China
Prior art keywords
job
virtual machine
scheduling
mentioned
ready
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011578884.0A
Other languages
Chinese (zh)
Other versions
CN112698911A (en)
Inventor
李启锐
彭志平
崔得龙
林建鹏
何杰光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Petrochemical Technology
Original Assignee
Guangdong University of Petrochemical Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Petrochemical Technology filed Critical Guangdong University of Petrochemical Technology
Priority to CN202011578884.0A priority Critical patent/CN112698911B/en
Publication of CN112698911A publication Critical patent/CN112698911A/en
Application granted granted Critical
Publication of CN112698911B publication Critical patent/CN112698911B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the field of cloud computing resource scheduling, and in particular to a cloud job scheduling method based on deep reinforcement learning, comprising the following steps: receiving user jobs sent by a user; decoupling the user jobs to obtain a ready job set; scheduling the ready job set through a job scheduler, where scheduling means taking an action according to a scheduling policy and deploying the jobs in the ready job set onto the corresponding virtual machines; executing the jobs through the virtual machines and returning the execution results; collecting training samples and establishing an experience pool, where a training sample stores the ready job set state, the virtual machine state, the action and the reward value, and the reward value is the reward obtained by taking the action; judging whether the number of training samples in the experience pool is smaller than a threshold value, and if so, receiving user jobs sent by the user again, otherwise optimizing the job scheduler with the training samples in the experience pool; and scheduling with the optimized job scheduler. The invention can shorten the completion time of user jobs.

Description

A cloud job scheduling method based on deep reinforcement learning

Technical Field

The invention relates to the field of cloud computing resource scheduling, and more particularly to a cloud job scheduling method based on deep reinforcement learning.

Background

Cloud computing is essentially a service provision model that pools IT resources such as computing, storage and applications through technology integration and supplies them efficiently and at low cost. Professor Liu Peng, secretary-general of the China Cloud Computing Expert Advisory Committee, has given both a long and a short definition of cloud computing. The long definition: cloud computing is a commercial computing model that distributes computing tasks over a resource pool composed of a large number of computers, enabling various application systems to obtain computing power, storage space and information services as needed. The short definition: cloud computing is the on-demand provision of cheap, dynamically scalable computing services over the network.

Today, cloud computing has developed into an integrated software and hardware technology resource platform that provides all kinds of cloud services and components, and it is a comprehensive carrier with a clear business model. Driven by the value of social demand, cloud computing, as a low-cost and flexibly configurable service provision model, will see richer application scenarios and broader room for development. Although cloud job scheduling is strongly goal-directed, its policy is continually adjusted as the state of the environment changes, so it has a degree of randomness; at the same time, a job scheduling decision uses only the current system state and is independent of past and future system states, so the job scheduling process has a clear Markov property (a concept in probability theory named after the Russian mathematician Andrey Markov: a stochastic process has the Markov property if, given the present state, it is conditionally independent of past states, i.e. of the historical path of the process). In general, the goal pursued by both the suppliers and the consumers of cloud resources is to receive a response in the shortest possible time after a job is submitted. With the rapid development of artificial intelligence, many researchers have begun to apply machine learning algorithms such as reinforcement learning to the many problems of cloud job scheduling in pursuit of this goal. At present, reinforcement-learning-based cloud job scheduling algorithms have achieved many good results in the field of cloud computing resource scheduling, but many problems remain. A cloud computing platform is a huge, rapidly changing system, so its state space is also huge and keeps growing; this makes reinforcement-learning-based cloud job scheduling algorithms severely limited when applied to complex, large-scale cloud computing systems such as distributed data centers, and this limitation seriously degrades the performance of the reinforcement learning algorithms. Degraded algorithm performance in turn leads to overly long job completion times and a poor user experience. In summary, there is an urgent need for a cloud job scheduling method based on deep reinforcement learning that can shorten the completion time of user jobs.

Summary of the Invention

To solve the above problems, the present invention provides a cloud job scheduling method based on deep reinforcement learning, which can shorten the completion time of user jobs.

The technical scheme adopted by the present invention is as follows:

A cloud job scheduling method based on deep reinforcement learning, comprising:

receiving user jobs sent by a user;

decoupling the user jobs to obtain a ready job set;

scheduling the ready job set through a job scheduler; the scheduling is to take an action according to a scheduling policy and deploy the jobs in the ready job set onto the corresponding virtual machines; the action is the virtual machine allocation mode of the jobs in the ready job set;

executing the jobs through the virtual machines and returning the execution results;

collecting training samples and establishing an experience pool; the training samples store the ready job set state, the virtual machine state, the action and the reward value; the reward value is the reward obtained by taking the action;

judging whether the number of training samples in the experience pool is smaller than a threshold value; if so, receiving user jobs sent by the user again, otherwise optimizing the job scheduler with the training samples in the experience pool;

scheduling with the optimized job scheduler.

Specifically, the job scheduling process is as follows. First, after an enterprise deploys virtual machine servers in multiple data centers, enterprise staff submit user jobs of various kinds to a job decoupler, which decouples the user jobs. A user job generally contains atomic jobs and dependent sub-jobs; decoupling here means splitting the user job into different sub-jobs according to the priorities and dependencies of the sub-jobs. After a user job has been decoupled into sub-jobs, the sub-jobs are stored in the ready job set. Then the ready job set is scheduled by the job scheduler, which assigns the sub-jobs in the ready job set to different data centers. The virtual machine servers deployed in the different data centers create virtual machines to process the sub-jobs assigned to their center. Finally, after a virtual machine executes a sub-job, it returns the execution result to the enterprise. Each virtual machine server is at a different distance from the enterprise, the servers differ in processing performance, and the type and number of sub-jobs each server has to process differ, all of which cause the virtual machines created by each server to be in different states. Given these differences between virtual machine states, assigning sub-jobs to virtual machines for execution according to a suitable scheduling policy (i.e. job scheduling) yields different scheduling effects, the most direct of which is a shorter machine response time. To shorten the machine response time and improve the user experience, this scheme uses a deep reinforcement learning algorithm to optimize the job scheduler. The deep reinforcement learning process is: the agent continually interacts with and explores the cloud environment and, through the reward-and-penalty mechanism and the experience replay mechanism, accumulates learning experience in order to find the optimal scheduling policy. The steps corresponding to this process are: the cloud computing platform continually receives user jobs from the enterprise and schedules and executes them; an experience pool is built by collecting the results of the reward function together with the data from job scheduling; and the experience pool is used to search for the optimal scheduling policy and optimize the job scheduler.
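The interaction loop described above can be sketched in Python as follows. This is only an illustration of the control flow; the names (CloudEnv-style environment object, JobScheduler, the threshold and batch-size values) are hypothetical placeholders and not taken from the patent, and the scheduler-optimization step is left abstract.

```python
from collections import deque
import random

class ExperiencePool:
    """Stores training samples (s_t, a_t, r_t, s_{t+1})."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, sample):
        self.buffer.append(sample)

    def sample(self, m):
        return random.sample(self.buffer, m)

    def __len__(self):
        return len(self.buffer)

def run_scheduling(env, scheduler, pool, threshold=1000, batch_size=32):
    """env and scheduler are hypothetical objects standing in for the cloud
    environment and the job scheduler described in the text."""
    while True:
        jobs = env.receive_user_jobs()            # user jobs sent by users
        ready_set = env.decouple(jobs)            # split into ready sub-jobs
        state = env.observe(ready_set)            # ready-job-set state + VM state
        action = scheduler.act(state)             # VM allocation per scheduling policy
        reward, next_state = env.execute(action)  # VMs execute jobs, return results
        pool.add((state, action, reward, next_state))
        # Start optimizing only once the pool holds enough samples;
        # otherwise keep collecting user jobs.
        if len(pool) >= threshold:
            scheduler.optimize(pool.sample(batch_size))
```

In this sketch, the `optimize` call corresponds to the DQN update that the text describes later.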

Further, the objective function of the scheduling is:

$$\min_{\pi}\ \sum_{k}\sum_{i} T_i^k$$

where $J$ is a user job, $\pi$ is the scheduling policy, $J_i^k$ is the i-th job of the k-th user, and $T_i^k$ is the completion time of the i-th job of the k-th user.

Specifically, the objective function shows that the goal of job scheduling is to minimize the completion time of the jobs while allocating each job to a virtual machine server in a suitable data center.

Further, the completion time is

$$T_i^k = t_{i,e}^k + t_{i,tr}^k + t_{i,w}^k$$

where $d_{i,in}^k$ is the amount of data that job $J_i^k$ transfers to the virtual machine, $L_k(i)$ is the length of job $J_i^k$, $d_{i,out}^k$ is the amount of data of the execution result returned after job $J_i^k$ is executed, $t_{i,e}^k$ is the execution time of job $J_i^k$, $t_{i,tr}^k$ is the transmission time of job $J_i^k$, and $t_{i,w}^k$ is the waiting time of job $J_i^k$. The waiting time is the time during which, after the ready job set has been scheduled by the job scheduler and before the job is executed by the virtual machine and the execution result is returned, the scheduled job waits to be executed in the waiting queue of the virtual machine because the computing capacity of the virtual machine is insufficient.

Specifically, the completion time of a job is the sum of its execution time, transmission time and waiting time. $L_k(i)$ is the length of job $J_i^k$, i.e. the file length of job $J_i^k$.

Further,

$$t_{i,e}^k = \frac{L_k(i)\,c\,p}{f_i^k}, \qquad t_{i,tr}^k = t_{i,in}^k + t_{i,out}^k, \qquad t_{i,w}^k = \sum_{J_j \in q} t_{j,e}$$

where $f_i^k$ is the MIPS allocated to job $J_i^k$, $c$ is the megabyte-to-byte conversion coefficient, $p$ is the number of CPU cycles the virtual machine needs to complete a job of unit length, $t_{i,in}^k$ is the time for job $J_i^k$ to transfer its data to the virtual machine, $t_{i,out}^k$ is the transmission time of the processing result returned after job $J_i^k$ is executed, $J_j$ is the j-th job, $q$ is the set of all jobs ahead of job $J_i^k$ in the waiting queue, and $t_{j,e}$ is the execution time of the j-th job.

Further,

$$t_{i,in}^k = \frac{d_{i,in}^k}{b_i^k}, \qquad t_{i,out}^k = \frac{d_{i,out}^k}{b_i^k}$$

the amount of data transferred by job $J_i^k$ is $d_i^k = d_{i,in}^k + d_{i,out}^k$, and $b_i^k$ is the bandwidth resource the virtual machine allocates to each job.

Specifically, it follows from the above that $t_{i,tr}^k = (d_{i,in}^k + d_{i,out}^k)/b_i^k$.

Further, $b_i^k = b / N_T$, where $b$ is the bandwidth resource of the virtual machine and $N_T$ is the number of jobs transferred to the virtual machine in time slot T.
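As a concrete illustration, the completion-time model above can be written as one short Python function. This is a sketch of the formulas as reconstructed from the definitions in the text; the exact arrangement of the execution-time expression is an assumption, and all argument names are chosen here for readability rather than taken from the patent.

```python
def completion_time(L, d_in, d_out, f, c, p, b, n_slot, queue_exec_times):
    """Completion time T of a job.

    L                 -- job length L_k(i)
    d_in, d_out       -- data sent to / returned from the virtual machine
    f                 -- MIPS allocated to the job
    c                 -- megabyte-to-byte conversion coefficient
    p                 -- CPU cycles per unit job length
    b                 -- total bandwidth of the virtual machine
    n_slot            -- number of jobs transferred to the VM in the time slot
    queue_exec_times  -- execution times of jobs ahead of this one in the queue
    """
    t_exec = L * c * p / f               # execution time (assumed arrangement)
    b_job = b / n_slot                   # bandwidth allocated per job
    t_trans = (d_in + d_out) / b_job     # transmission time (inbound + outbound)
    t_wait = sum(queue_exec_times)       # waiting time behind queued jobs
    return t_exec + t_trans + t_wait
```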

Further, a training sample is $(s_t, \alpha_t, r_t, s_{t+1})$. The ready job set state is $s_J=\{t_1,d_1,t_2,d_2,\ldots,t_n,d_n\}$; the virtual machine state $s_{VM}$ records, for each of the $m$ virtual machines, its remaining computing capacity at the current time step and the number of jobs waiting to be executed on it. The actions are stored in the action space A, $A=\{\alpha_1,\alpha_2,\ldots,\alpha_n\}$; the reward value is calculated by the reward function R. Here $s_t$ and $s_{t+1}$ are the states at time step t and time step t+1 respectively; the states are stored in the state space S, $S=\{s_J,s_{VM}\}$; $\alpha_t$ is the action selected from the action space A at time step t; $r_t$ is the reward value calculated by the reward function R at time step t. In the ready job set state $s_J$, $t_i$ and $d_i$ denote the execution time of the i-th job in the ready job set and the amount of data it transfers to the virtual machine; n is the number of jobs in the ready job set; m is the number of virtual machines. The action $\alpha_i$ in the action space A denotes the virtual machine allocation mode of the i-th job in the ready job set, and each action $\alpha_i$ has m+1 options. For the x-th virtual machine, the number of jobs it has already executed and the number of jobs waiting to be executed on it are also recorded.

Specifically, in reinforcement learning a training sample is generally a four-tuple $(S, \alpha, r, S')$, where $S$ is the state at the current time step, $\alpha$ is the action taken at the current time step, $r$ is the reward obtained by taking action $\alpha$ at the current time step, and $S'$ is the state at the next time step. Correspondingly, the training samples collected in this scheme are $(s_t, \alpha_t, r_t, s_{t+1})$: $s_t$ is the state at time step t, $\alpha_t$ is the action at time step t, $r_t$ is the reward obtained by taking action $\alpha_t$ at time step t, and $s_{t+1}$ is the state at the following time step t+1. The state of a training sample is stored in the state space S, which consists of the ready job set state and the virtual machine state, $S=\{s_J, s_{VM}\}$. The action $\alpha$ of a training sample is stored in the action space A; each action $\alpha_i$ in A has m+1 options, the first option being the empty action, the second option assigning the job to the first-ranked virtual machine, and so on. For example, $\alpha_1=(0,0,1,0)$ assigns job 1 to virtual machine 2, while $\alpha_2=(1,0,0,0)$ is the empty action, meaning that job 2 is not assigned to any virtual machine at the current time step. The reward value r is computed by the reward function, which gives a positive reward for each job completed at the current time step and a negative reward for each job still waiting. Reward function design is an extremely important part of deep reinforcement learning: whether the design matches the target requirement determines whether the machine can learn the desired policy, and it directly affects the convergence speed and final performance of the algorithm. Since the optimization objective is to minimize the makespan of job scheduling, giving a positive reward to jobs completed at the current time step and a negative reward to jobs still waiting encourages jobs to be completed as soon as possible.

Further, the objective function for optimizing the job scheduler is:

$$\max\ \mathbb{E}\Big[\sum_{t}\gamma^{t} r_t\Big]$$

where $\gamma$ is the discount factor, $\gamma \in [0, 1]$.

Specifically, once the number of collected training samples exceeds the experience pool threshold, batches of samples are drawn from the pool to optimize the job scheduler. The goal is to maximize the expected cumulative discounted return.

Further, the loss function for optimizing the job scheduler is:

$$L(\theta_z)=\mathbb{E}_{(s_t,\alpha_t,r_t,s_{t+1})\sim D(M)}\Big[\big(r_t+\gamma\max_{\alpha_{t+1}}Q(s_{t+1},\alpha_{t+1};\theta_z^{-})-Q(s_t,\alpha_t;\theta_z)\big)^{2}\Big]$$

where $\theta_z$ are the job scheduler parameters after the z-th iteration, $s_{t+1}$ is the state at the time step following $s_t$, D(M) means that M samples are drawn from the experience pool D each time, $\alpha_{t+1}$ is the action with the maximum Q value for $s_{t+1}$, and $\theta_z^{-}$ are the target network parameters used when optimizing the job scheduler after the z-th iteration.

Specifically, the job scheduler adopts mini-batch training: in each iteration, M samples $(s_t, \alpha_t, r_t, s_{t+1})$ are randomly selected from the experience pool, the state $s_t$ is fed to the online network to obtain the current Q value of action $\alpha_t$, and the next state $s_{t+1}$ is fed to the target network to obtain the maximum Q value over all actions of the target network.

Further, the gradient of the loss function with respect to the parameters $\theta$ is:

$$\nabla_{\theta_z}L(\theta_z)=\mathbb{E}_{(s_t,\alpha_t,r_t,s_{t+1})\sim D(M)}\Big[\big(r_t+\gamma\max_{\alpha_{t+1}}Q(s_{t+1},\alpha_{t+1};\theta_z^{-})-Q(s_t,\alpha_t;\theta_z)\big)\,\nabla_{\theta_z}Q(s_t,\alpha_t;\theta_z)\Big]$$

Compared with the prior art, the beneficial effects of the present invention are:

(1) The scheduling policy effectively shortens the response time of the machines and improves the user experience.

(2) The job scheduler is optimized with training samples, which avoids the degradation of algorithm performance caused by the growth of the state space and reduces the completion time of jobs.

Description of the Drawings

FIG. 1 is a schematic diagram of the cloud platform of the present invention;

FIG. 2 is simulation experiment data plot (a) of the present invention;

FIG. 3 is simulation experiment data plot (b) of the present invention;

FIG. 4 is simulation experiment data plot (c) of the present invention.

Detailed Description

The accompanying drawings of the present invention are only for illustrative purposes and should not be construed as limiting the present invention. In order to better illustrate the following embodiments, some parts in the drawings may be omitted, enlarged or reduced and do not represent the dimensions of the actual product; those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings.

Embodiment

This embodiment provides a cloud job scheduling method based on deep reinforcement learning, comprising:

receiving user jobs sent by a user;

decoupling the user jobs to obtain a ready job set;

scheduling the ready job set through a job scheduler; the scheduling is to take an action according to a scheduling policy and deploy the jobs in the ready job set onto the corresponding virtual machines; the action is the virtual machine allocation mode of the jobs in the ready job set;

executing the jobs through the virtual machines and returning the execution results;

collecting training samples and establishing an experience pool; the training samples store the ready job set state, the virtual machine state, the action and the reward value; the reward value is the reward obtained by taking the action;

judging whether the number of training samples in the experience pool is smaller than a threshold value; if so, receiving user jobs sent by the user again, otherwise optimizing the job scheduler with the training samples in the experience pool;

scheduling with the optimized job scheduler.

FIG. 1 is a schematic diagram of the cloud platform of the present invention; as shown in FIG. 1, the above method is used on the cloud platform in the figure. The job scheduling process is as follows. First, after an enterprise deploys virtual machine servers in multiple data centers, enterprise staff submit user jobs of various kinds to a job decoupler, which decouples the user jobs. A user job generally contains atomic jobs and dependent sub-jobs; decoupling here means splitting the user job into different sub-jobs according to the priorities and dependencies of the sub-jobs. After a user job has been decoupled into sub-jobs, the sub-jobs are stored in the ready job set. Then the ready job set is scheduled by the job scheduler, which assigns the sub-jobs in the ready job set to different data centers. The virtual machine servers deployed in the different data centers create virtual machines to process the sub-jobs assigned to their center. Finally, after a virtual machine executes a sub-job, it returns the execution result to the enterprise. Each virtual machine server is at a different distance from the enterprise, the servers differ in processing performance, and the type and number of sub-jobs each server has to process differ, all of which cause the virtual machines created by each server to be in different states. Given these differences between virtual machine states, assigning sub-jobs to virtual machines for execution according to a suitable scheduling policy (i.e. job scheduling) yields different scheduling effects, the most direct of which is a shorter machine response time. To shorten the machine response time and improve the user experience, this scheme uses a deep reinforcement learning algorithm to optimize the job scheduler. The deep reinforcement learning process is: the agent continually interacts with and explores the cloud environment and, through the reward-and-penalty mechanism and the experience replay mechanism, accumulates learning experience in order to find the optimal scheduling policy. The steps corresponding to this process are: the cloud computing platform continually receives user jobs from the enterprise and schedules and executes them; an experience pool is built by collecting the results of the reward function together with the data from job scheduling; and the experience pool is used to search for the optimal scheduling policy and optimize the job scheduler.

Further, the objective function of the scheduling is:

$$\min_{\pi}\ \sum_{k}\sum_{i} T_i^k$$

where $J$ is a user job, $\pi$ is the scheduling policy, $J_i^k$ is the i-th job of the k-th user, and $T_i^k$ is the completion time of the i-th job of the k-th user.

Specifically, the objective function shows that the goal of job scheduling is to minimize the completion time of the jobs while allocating each job to a virtual machine server in a suitable data center.

Further, the completion time is

$$T_i^k = t_{i,e}^k + t_{i,tr}^k + t_{i,w}^k$$

where $d_{i,in}^k$ is the amount of data that job $J_i^k$ transfers to the virtual machine, $L_k(i)$ is the length of job $J_i^k$, $d_{i,out}^k$ is the amount of data of the execution result returned after job $J_i^k$ is executed, $t_{i,e}^k$ is the execution time of job $J_i^k$, $t_{i,tr}^k$ is the transmission time of job $J_i^k$, and $t_{i,w}^k$ is the waiting time of job $J_i^k$. The waiting time is the time during which, after the ready job set has been scheduled by the job scheduler and before the job is executed by the virtual machine and the execution result is returned, the scheduled job waits to be executed in the waiting queue of the virtual machine because the computing capacity of the virtual machine is insufficient.

Specifically, the completion time of a job is the sum of its execution time, transmission time and waiting time. $L_k(i)$ is the length of job $J_i^k$, i.e. the file length of job $J_i^k$.

Further,

$$t_{i,e}^k = \frac{L_k(i)\,c\,p}{f_i^k}, \qquad t_{i,tr}^k = t_{i,in}^k + t_{i,out}^k, \qquad t_{i,w}^k = \sum_{J_j \in q} t_{j,e}$$

where $f_i^k$ is the MIPS allocated to job $J_i^k$, $c$ is the megabyte-to-byte conversion coefficient, $p$ is the number of CPU cycles the virtual machine needs to complete a job of unit length, $t_{i,in}^k$ is the time for job $J_i^k$ to transfer its data to the virtual machine, $t_{i,out}^k$ is the transmission time of the processing result returned after job $J_i^k$ is executed, $J_j$ is the j-th job, $q$ is the set of all jobs ahead of job $J_i^k$ in the waiting queue, and $t_{j,e}$ is the execution time of the j-th job.

Further,

$$t_{i,in}^k = \frac{d_{i,in}^k}{b_i^k}, \qquad t_{i,out}^k = \frac{d_{i,out}^k}{b_i^k}$$

the amount of data transferred by job $J_i^k$ is $d_i^k = d_{i,in}^k + d_{i,out}^k$, and $b_i^k$ is the bandwidth resource the virtual machine allocates to each job.

Specifically, it follows from the above that $t_{i,tr}^k = (d_{i,in}^k + d_{i,out}^k)/b_i^k$.

Further, $b_i^k = b / N_T$, where $b$ is the bandwidth resource of the virtual machine and $N_T$ is the number of jobs transferred to the virtual machine in time slot T.

Further, a training sample is $(s_t, \alpha_t, r_t, s_{t+1})$. The ready job set state is $s_J=\{t_1,d_1,t_2,d_2,\ldots,t_n,d_n\}$; the virtual machine state $s_{VM}$ records, for each of the $m$ virtual machines, its remaining computing capacity at the current time step and the number of jobs waiting to be executed on it. The actions are stored in the action space A, $A=\{\alpha_1,\alpha_2,\ldots,\alpha_n\}$; the reward value is calculated by the reward function R. Here $s_t$ and $s_{t+1}$ are the states at time step t and time step t+1 respectively; the states are stored in the state space S, $S=\{s_J,s_{VM}\}$; $\alpha_t$ is the action selected from the action space A at time step t; $r_t$ is the reward value calculated by the reward function R at time step t. In the ready job set state $s_J$, $t_i$ and $d_i$ denote the execution time of the i-th job in the ready job set and the amount of data it transfers to the virtual machine; n is the number of jobs in the ready job set; m is the number of virtual machines. The action $\alpha_i$ in the action space A denotes the virtual machine allocation mode of the i-th job in the ready job set, and each action $\alpha_i$ has m+1 options. For the x-th virtual machine, the number of jobs it has already executed and the number of jobs waiting to be executed on it are also recorded.

Specifically, in reinforcement learning a training sample is generally a four-tuple $(S, \alpha, r, S')$, where $S$ is the state at the current time step, $\alpha$ is the action taken at the current time step, $r$ is the reward obtained by taking action $\alpha$ at the current time step, and $S'$ is the state at the next time step. Correspondingly, the training samples collected in this scheme are $(s_t, \alpha_t, r_t, s_{t+1})$: $s_t$ is the state at time step t, $\alpha_t$ is the action at time step t, $r_t$ is the reward obtained by taking action $\alpha_t$ at time step t, and $s_{t+1}$ is the state at the following time step t+1. The state of a training sample is stored in the state space S, which consists of the ready job set state and the virtual machine state, $S=\{s_J, s_{VM}\}$. The action $\alpha$ of a training sample is stored in the action space A; each action $\alpha_i$ in A has m+1 options, the first option being the empty action, the second option assigning the job to the first-ranked virtual machine, and so on. For example, $\alpha_1=(0,0,1,0)$ assigns job 1 to virtual machine 2, while $\alpha_2=(1,0,0,0)$ is the empty action, meaning that job 2 is not assigned to any virtual machine at the current time step. The reward value r is computed by the reward function, which gives a positive reward for each job completed at the current time step and a negative reward for each job still waiting. Reward function design is an extremely important part of deep reinforcement learning: whether the design matches the target requirement determines whether the machine can learn the desired policy, and it directly affects the convergence speed and final performance of the algorithm. Since the optimization objective is to minimize the makespan of job scheduling, giving a positive reward to jobs completed at the current time step and a negative reward to jobs still waiting encourages jobs to be completed as soon as possible.

Further, the objective function for optimizing the job scheduler is:

$$\max\ \mathbb{E}\Big[\sum_{t}\gamma^{t} r_t\Big]$$

where $\gamma$ is the discount factor, $\gamma \in [0, 1]$.

Specifically, once the number of collected training samples exceeds the experience pool threshold, batches of samples are drawn from the pool to optimize the job scheduler. The goal is to maximize the expected cumulative discounted return.

Further, the loss function for optimizing the job scheduler is:

$$L(\theta_z)=\mathbb{E}_{(s_t,\alpha_t,r_t,s_{t+1})\sim D(M)}\Big[\big(r_t+\gamma\max_{\alpha_{t+1}}Q(s_{t+1},\alpha_{t+1};\theta_z^{-})-Q(s_t,\alpha_t;\theta_z)\big)^{2}\Big]$$

where $\theta_z$ are the job scheduler parameters after the z-th iteration, $s_{t+1}$ is the state at the time step following $s_t$, D(M) means that M samples are drawn from the experience pool D each time, $\alpha_{t+1}$ is the action with the maximum Q value for $s_{t+1}$, and $\theta_z^{-}$ are the target network parameters used when optimizing the job scheduler after the z-th iteration.

Specifically, the job scheduler adopts mini-batch training: in each iteration, M samples $(s_t, \alpha_t, r_t, s_{t+1})$ are randomly selected from the experience pool, the state $s_t$ is fed to the online network to obtain the current Q value of action $\alpha_t$, and the next state $s_{t+1}$ is fed to the target network to obtain the maximum Q value over all actions of the target network.

Further, the gradient of the loss function with respect to the parameters $\theta$ is:

$$\nabla_{\theta_z}L(\theta_z)=\mathbb{E}_{(s_t,\alpha_t,r_t,s_{t+1})\sim D(M)}\Big[\big(r_t+\gamma\max_{\alpha_{t+1}}Q(s_{t+1},\alpha_{t+1};\theta_z^{-})-Q(s_t,\alpha_t;\theta_z)\big)\,\nabla_{\theta_z}Q(s_t,\alpha_t;\theta_z)\Big]$$

This embodiment also carried out a simulation experiment; the goal of the experiment is to test the job scheduling method based on deep reinforcement learning.

A simulation platform was built in Python. The specific system parameters of the platform are: the number of users is 4; the number of user job queues is 4; at each time step each job queue delivers 3 jobs to the ready job set; the ready job set holds 12 jobs; and the resource utilization threshold is 0.6. The user jobs used in the experiment comprise four job types; the ratio of data transfer volume to computation volume for each job type is shown in Table 1.

Table 1. Ratio of data transfer volume to computation volume for each job type


The transfer data volume of each job is randomly generated between 10 and 20 MB, the dependencies between sub-jobs are randomly generated, and the number of jobs is 200. The computing capacities of the three virtual machines in the cloud platform are 650 MIPS, 850 MIPS and 1500 MIPS, their bandwidths are 200 M, 300 M and 500 M, and their numbers of computing cores are 4, 8 and 12 respectively. The key parameters of the DQN network are shown in Table 2. The following experiments were carried out in this environment.
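For reference, the experimental setup described above can be collected into a single configuration structure. The Python dictionary below simply restates the values listed in the text; the field names are chosen here for readability and are not taken from the paper's code.

```python
# Simulation configuration restating the parameters given above.
SIM_CONFIG = {
    "num_users": 4,
    "num_job_queues": 4,
    "jobs_per_queue_per_step": 3,
    "ready_set_size": 12,
    "resource_utilization_threshold": 0.6,
    "job_data_size_mb": (10, 20),        # transfer volume drawn uniformly at random
    "num_jobs": 200,
    "vms": [
        {"mips": 650,  "bandwidth_mb": 200, "cores": 4},
        {"mips": 850,  "bandwidth_mb": 300, "cores": 8},
        {"mips": 1500, "bandwidth_mb": 500, "cores": 12},
    ],
}
```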

Table 2. Parameters of the DQN model in the virtual machine usage stage


First, the convergence and convergence speed of the DQN algorithm during training at this stage are verified. FIG. 2 is simulation experiment data plot (a) of the present invention and shows the change of the reward value during training. It can be seen that as training proceeds, the total reward the agent obtains from the environment increases and begins to converge after about 1300 training episodes, showing that through continual training the model learns a policy that achieves the optimization objective.

Next, the optimization effect of the present invention on the global makespan is compared with the performance of other algorithms. The baseline algorithms are the random algorithm (Random), the round-robin algorithm (RR), and HDDL, an intelligent scheduling algorithm with learning ability. The HDDL algorithm coordinates multiple heterogeneous deep learning models as an intelligent scheduler and learns to explore optimal or near-optimal scheduling policies from historical experience. FIG. 3 is simulation experiment data plot (b) of the present invention; the experimental results are shown in FIG. 3. The results show that as the number of training iterations increases, the makespan curves of DQN and HDDL decrease and converge stably. They also show that both the DQN and HDDL agents can learn optimization policies from historical experience, achieve the system optimization objective and reduce the global makespan, but the makespan of DQN is better than that of HDDL.

Finally, the optimization effect of each algorithm is verified under different numbers of virtual machines, i.e. different amounts of system resources. FIG. 4 is simulation experiment data plot (c) of the present invention; the experimental results are shown in FIG. 4.

In the figure, because of the volatility of the DQN algorithm, the job makespan of the proposed algorithm is taken as the average of the last 100 episodes to make the experimental results more general. It can be clearly observed from FIG. 4 that, under different numbers of virtual machines, the job makespan of the proposed DQN-based selection algorithm is smaller than that of the other baseline algorithms. In addition, as the number of virtual machines increases, the job makespan of every algorithm gradually decreases and the gap narrows. These results show that when competition for cloud resources is heavy, the intelligent scheduler can adjust the task scheduling policy dynamically according to task attributes and the state of system resources, thereby reducing the global job makespan.

In a cloud computing environment, when scheduling jobs, this experiment uses the deep reinforcement learning based algorithm to obtain an optimized scheduling policy for the jobs and submits the jobs to the optimal virtual machines for execution according to this policy, solving the difficulty of user job scheduling caused by the dynamic changes of user job type, job size, virtual machine state and so on. By comprehensively considering quality-of-service factors such as job execution time and job waiting time, the overall completion time of the jobs is effectively reduced.

Obviously, the above embodiments of the present invention are merely examples given to illustrate the technical solution of the present invention clearly, and are not intended to limit the specific embodiments of the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the claims of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (7)

1. A cloud job scheduling method based on deep reinforcement learning, characterized by comprising the following steps:
receiving user jobs sent by a user;
decoupling the user jobs to obtain a ready job set;
scheduling the ready job set through a job scheduler; the scheduling is to take an action according to a scheduling policy and deploy the jobs in the ready job set onto the corresponding virtual machines; the action is the virtual machine allocation mode of the jobs in the ready job set;
executing the jobs through the virtual machines and returning execution results;
collecting training samples and establishing an experience pool; the training samples store the ready job set state, the virtual machine state, the action and the reward value; the reward value is the reward obtained by taking the action;
judging whether the number of training samples in the experience pool is smaller than a threshold value; if so, receiving user jobs sent by the user again, and otherwise optimizing the job scheduler with the training samples in the experience pool;
scheduling with the optimized job scheduler;
the objective function of the scheduling is:
$$\min_{\pi}\ \sum_{k}\sum_{i} T_i^k$$
wherein $J$ is a user job; $\pi$ is the scheduling policy; $J_i^k$ is the i-th job of the k-th user; $T_i^k$ is the completion time of the i-th job of the k-th user, and
$$T_i^k = t_{i,e}^k + t_{i,tr}^k + t_{i,w}^k$$
wherein $d_{i,in}^k$ is the amount of data that job $J_i^k$ transfers to the virtual machine; $L_k(i)$ is the length of job $J_i^k$; $d_{i,out}^k$ is the amount of data of the execution result returned after job $J_i^k$ is executed; $t_{i,e}^k$ is the execution time of job $J_i^k$; $t_{i,tr}^k$ is the transmission time of job $J_i^k$; $t_{i,w}^k$ is the waiting time of job $J_i^k$; the waiting time is the time during which, after the ready job set has been scheduled by the job scheduler and before the job is executed by the virtual machine and the execution result is returned, the scheduled job waits to be executed in the waiting queue of the virtual machine because the computing capacity of the virtual machine is insufficient;
$$t_{i,e}^k = \frac{L_k(i)\,c\,p}{f_i^k}, \qquad t_{i,tr}^k = t_{i,in}^k + t_{i,out}^k, \qquad t_{i,w}^k = \sum_{J_j \in q} t_{j,e}$$
wherein $f_i^k$ is the MIPS allocated to job $J_i^k$; $c$ is the megabyte-to-byte conversion coefficient; $p$ is the number of CPU cycles the virtual machine needs to complete a job of unit length; $t_{i,in}^k$ is the time for job $J_i^k$ to transfer its data to the virtual machine; $t_{i,out}^k$ is the transmission time of the processing result returned after job $J_i^k$ is executed; $J_j$ is the j-th job; $q$ is the set of all jobs ahead of job $J_i^k$ in the waiting queue; $t_{j,e}$ is the execution time of the j-th job.
2. The cloud job scheduling method based on deep reinforcement learning according to claim 1, wherein
$$t_{i,in}^k = \frac{d_{i,in}^k}{b_i^k}, \qquad t_{i,out}^k = \frac{d_{i,out}^k}{b_i^k},$$
the amount of data transferred by job $J_i^k$ is $d_i^k = d_{i,in}^k + d_{i,out}^k$, and $b_i^k$ is the bandwidth resource the virtual machine allocates to each job.
3. The cloud job scheduling method based on deep reinforcement learning according to claim 2, wherein $b_i^k = b / N_T$, where $b$ is the bandwidth resource of the virtual machine and $N_T$ is the number of jobs transferred to the virtual machine in time slot T.
4. The cloud job scheduling method based on deep reinforcement learning according to claim 3, wherein the training sample is (s_t, α_t, r_t, s_{t+1}); the ready job set state is s_J = {t_1, d_1, t_2, d_2, ..., t_n, d_n}; the virtual machine state is [formula]; actions are stored in an action space A, A = {α_1, α_2, ..., α_n}; the reward value is calculated by a reward function R, [formula]; s_t and s_{t+1} are the states at time step t and time step t+1, respectively; states are stored in a state space S, S = {s_J, s_VM}; α_t is the action selected from the action space A at time step t; r_t is the reward value calculated by the function R at time step t; t_i and d_i in the ready job set state s_J denote, respectively, the execution time of the i-th job in the ready job set and the amount of data it transfers to the virtual machine; n is the number of jobs in the ready job set; [formula] and [formula] in the virtual machine state s_VM denote, respectively, the remaining computing capacity of the x-th virtual machine at the current time step and the number of its jobs waiting to be executed; m is the number of virtual machines; the action α_i in the action space A indicates the virtual machine assignment of the i-th job in the ready job set; the action α_i has m + 1 possible values; [formula] and [formula] are, respectively, the number of executed jobs and the number of jobs waiting to be executed on the x-th virtual machine.
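Claim 4 fixes the learning interface of the scheduler: a state built from the ready-job-set vector s_J = {t_1, d_1, ..., t_n, d_n} and the virtual machine vector s_VM, one action α_i per ready job with m + 1 possible values, and a reward computed by a function R whose formula is an image in this record. The sketch below shows one way such a transition sample (s_t, α_t, r_t, s_{t+1}) could be assembled; the class and field names, and the interpretation of the extra (m+1)-th action value as "do not schedule yet", are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ReadyJob:
    exec_time: float   # t_i: execution time of the i-th job in the ready job set
    data_in: float     # d_i: amount of data it transfers to the virtual machine

@dataclass
class VirtualMachine:
    remaining_capacity: float  # remaining computing capacity at the current time step
    waiting_jobs: int          # number of jobs waiting to be executed on this VM

def build_state(ready: List[ReadyJob], vms: List[VirtualMachine]) -> Tuple[float, ...]:
    # Flatten s_J = {t_1, d_1, ..., t_n, d_n} and s_VM into one observation vector,
    # i.e. the state space S = {s_J, s_VM} named in claim 4.
    s_j = [v for job in ready for v in (job.exec_time, job.data_in)]
    s_vm = [v for vm in vms for v in (vm.remaining_capacity, float(vm.waiting_jobs))]
    return tuple(s_j + s_vm)

def action_values(num_vms: int) -> range:
    # Each action α_i has m + 1 possible values: indices 0..m-1 select a virtual
    # machine; index m is assumed here to mean "leave the job in the ready set".
    return range(num_vms + 1)
```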
5. The cloud job scheduling method based on deep reinforcement learning according to claim 4, wherein the objective function of the optimized job scheduler is: [formula]; γ is the discount factor, γ ∈ [0, 1].
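The objective function of claim 5 is likewise an image in this record; with a discount factor γ ∈ [0, 1], a standard discounted cumulative-reward objective for such a scheduler would take the following form. This is a generic reconstruction, not the patent's exact expression.

```latex
% Generic discounted-return objective (assumed form, not the claimed formula):
\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\,\sum_{t=0}^{T} \gamma^{t}\, r_{t}\right],
\qquad \gamma \in [0,1]
```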
6. The cloud job scheduling method based on deep reinforcement learning according to claim 5, wherein the loss function of the optimized job scheduler is: [formula]; θ_z is the job scheduler parameter at the z-th iteration; s_{t+1} is the s_t of the next time step; D(M) denotes the M samples drawn from the experience pool D at each update; α_{t+1} is the action corresponding to the maximum Q value in state s_{t+1}; [formula] is the parameter of the optimized job scheduler after the z-th iteration.
7. The cloud job scheduling method based on deep reinforcement learning according to claim 6, wherein the gradient of the loss function with respect to the parameter θ is: [formula].
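The loss function of claim 6 and its gradient in claim 7 also appear only as images. Given the quantities named in those claims (the mini-batch D(M) drawn from the experience pool D, the current parameters θ_z, the parameters of the optimized scheduler after the z-th iteration used as a target, and α_{t+1} as the action with the maximum Q value in s_{t+1}), a standard deep Q-network loss and gradient would read as follows; this is an assumed reconstruction, and the patent's exact expressions may differ.

```latex
% Assumed DQN-style loss and its gradient (not the patent's exact formulas):
L(\theta_z) = \frac{1}{M} \sum_{(s_t,\alpha_t,r_t,s_{t+1}) \in D(M)}
  \Bigl( r_t + \gamma \max_{\alpha_{t+1}} Q\bigl(s_{t+1},\alpha_{t+1};\theta_z^{-}\bigr)
        - Q\bigl(s_t,\alpha_t;\theta_z\bigr) \Bigr)^{2}

\nabla_{\theta_z} L(\theta_z) = -\frac{2}{M} \sum_{(s_t,\alpha_t,r_t,s_{t+1}) \in D(M)}
  \Bigl( r_t + \gamma \max_{\alpha_{t+1}} Q\bigl(s_{t+1},\alpha_{t+1};\theta_z^{-}\bigr)
        - Q\bigl(s_t,\alpha_t;\theta_z\bigr) \Bigr)\,
  \nabla_{\theta_z} Q\bigl(s_t,\alpha_t;\theta_z\bigr)
```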
CN202011578884.0A 2020-12-28 2020-12-28 Cloud job scheduling method based on deep reinforcement learning Active CN112698911B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011578884.0A CN112698911B (en) 2020-12-28 2020-12-28 Cloud job scheduling method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN112698911A CN112698911A (en) 2021-04-23
CN112698911B true CN112698911B (en) 2022-05-17

Family

ID=75511311

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011578884.0A Active CN112698911B (en) 2020-12-28 2020-12-28 Cloud job scheduling method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN112698911B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant