
CN103309748B - Adaptive scheduling host system and scheduling method of GPU virtual resources in cloud game - Google Patents


Info

Publication number
CN103309748B
CN103309748B
Authority
CN
China
Prior art keywords
virtual machine
gpu
module
control module
scheduling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310244765.5A
Other languages
Chinese (zh)
Other versions
CN103309748A (en)
Inventor
王润泽
张超
钟贤明
戚正伟
管海兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiao Tong University
Original Assignee
Shanghai Jiao Tong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiao Tong University
Priority to CN201310244765.5A
Publication of CN103309748A
Application granted
Publication of CN103309748B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Stored Programmes (AREA)

Abstract

A host system for adaptive scheduling of GPU virtual resources in cloud gaming comprises a scheduling control module together with an agent module, a graphics application programming interface analysis module, and a virtual machine list, each connected to the scheduling control module. The agent module consists of a scheduling module and a monitoring module, and all modules are deployed in the host. The scheduling control module is responsible for communication among the other modules: it obtains information on all running virtual machines from the virtual machine list and forwards it to whichever module needs it. The invention achieves systematic management of GPU resources in virtual machines and improves GPU resource utilization; the fairness-based QoS adaptive scheduling method allocates sufficient GPU resources to all running virtual machines through fair allocation, guarantees that QoS requirements are met, and maximizes overall GPU utilization.

Description

Adaptive scheduling host system and scheduling method for GPU virtual resources in cloud games

Technical Field

The present invention relates to a resource-adaptive scheduling host system in the field of GPU virtualization, applied to cloud gaming platforms; specifically, it is a scheduling system that intervenes at the operating system's graphics API.

Background Art

The gradual maturation of GPU virtualization technology has driven the development of cloud gaming, making cloud gaming a popular cloud service. However, because the current default GPU resource sharing mechanism performs poorly, the cloud gaming user experience is inevitably degraded by real-time uncertainties, such as the rendering of complex game scenes.

In video stream quality analysis, FPS (Frames Per Second) is defined as the number of frames transmitted per second; it measures the amount of information used to store and display dynamic video, i.e., the number of pictures in an animation or video.

In the cloud gaming user experience, the QoS requirement refers to the minimum FPS and maximum latency that guarantee normal use for the user.

The main factors affecting game execution in a virtualized environment are CPU execution time, GPU synchronous rendering time, unpredictable game scene changes, and interference between virtual machines. These factors cause a game's FPS to fall short of the QoS requirement and degrade the user experience.

Existing virtualization solutions have poorly performing resource sharing mechanisms, mainly reflected in low resource utilization and failure to meet quality-of-service (QoS) requirements. Most current virtual machine resource scheduling mechanisms use a first-in-first-out policy, which causes some virtual machines to miss QoS requirements; moreover, running multiple virtual machines on a single server introduces performance fluctuations for the game applications in each virtual machine, which is especially pronounced on cloud gaming platforms.

A search of the prior art shows that Carnegie Mellon University raised this problem in the context of GPU-accelerated windowing systems. IBM Research Tokyo developed an automatic resource scheduling system to accelerate stencil applications on general-purpose GPU clusters. The University of North Carolina at Chapel Hill proposed two approaches for integrating GPUs into soft real-time multiprocessor systems to improve overall system performance. Stony Brook University proposed GERM, which provides a fair GPU resource allocation algorithm and also uses fixed-frame-rate methods such as vertical synchronization to keep games from over-relying on hardware resources. These approaches all have shortcomings: GERM does not consider QoS requirements, and fixed-frame-rate methods cannot use hardware resources efficiently. TimeGraph cannot guarantee that all virtual machines meet QoS requirements, especially when the high-priority load is light.

Summary of the Invention

Addressing the shortcomings of existing GPU resource scheduling methods, the present invention proposes a host system and method for adaptive scheduling of GPU virtual resources in cloud gaming. Using GPU paravirtualization and library injection, lightweight scheduling control is implanted in the host, a non-intrusive approach that requires no changes to the host's graphics driver, the guest operating system, or applications. The scheduling algorithm mitigates the impact of real-time uncertainties and guarantees high utilization of system resources. Specifically, the fairness-based QoS adaptive scheduling algorithm not only ensures that every virtual machine meets the basic QoS requirement, but also redistributes GPU resources, transferring GPU resources from virtual machines with high FPS to those that fail to meet the QoS requirement. The algorithm therefore not only satisfies QoS requirements but also achieves fair scheduling and significantly improves GPU utilization.

The technical solution of the present invention is as follows:

A host system for adaptive scheduling of GPU virtual resources in cloud gaming, characterized in that it comprises a scheduling control module, together with an agent module, a graphics application programming interface analysis module, and a virtual machine list each connected to the scheduling control module; the agent module consists of a scheduling module and a monitoring module, and all modules are deployed in the host;

The scheduling control module is responsible for the communication of information among the other modules: it obtains information on all running virtual machines from the virtual machine list and sends it to whichever module needs it;

The scheduling control module also automatically adjusts the FPS parameter of the configured scheduling method so that the scheduling method runs well;

The monitoring module obtains real-time runtime information about the virtual machines from the scheduling control module and sends it to the scheduling module;

The scheduling module receives commands from the upper-layer GPU command dispatcher, processes them according to the real-time virtual machine information sent by the monitoring module, and finally forwards them to the lower-layer host GPU driver;

The agent module, together with the scheduling control module, decides how many GPU resources to allocate to each virtual machine;

The graphics application programming interface analysis module is responsible for computing the overhead incurred by running the GPU commands generated by the graphics API, and sends the result to the scheduling control module so that other modules that need this information can obtain it.

The virtual machine list records all virtual machines running in the host system; each virtual machine can be indexed by the system. When a new virtual machine joins and starts running, the system automatically adds it to the virtual machine list and reschedules GPU resources.
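A minimal sketch of this virtual machine list in Python (the field names, the rescheduling callback, and the dictionary-based indexing are illustrative assumptions; the text only specifies that each VM is indexable and that adding one triggers rescheduling):

```python
from dataclasses import dataclass


@dataclass
class VMEntry:
    """One running virtual machine as recorded in the VM list (hypothetical fields)."""
    vm_id: int
    fps: float = 0.0        # most recently measured frames per second
    gpu_share: float = 0.0  # fraction of GPU resources currently held


class VMList:
    """Indexable registry of running VMs; adding a VM triggers rescheduling."""

    def __init__(self, reschedule_cb):
        self._vms = {}                # vm_id -> VMEntry
        self._reschedule = reschedule_cb

    def add(self, vm_id):
        self._vms[vm_id] = VMEntry(vm_id)
        # the host reschedules GPU resources whenever a new VM starts running
        self._reschedule(list(self._vms.values()))

    def __getitem__(self, vm_id):     # "each virtual machine can be indexed"
        return self._vms[vm_id]
```

The callback stands in for the scheduling control module, which in the real system would re-divide GPU resources over the updated list.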

A scheduling method using the above host system for adaptive scheduling of GPU virtual resources in cloud gaming, characterized in that the method comprises the following steps:

① The user sets the target frames per second in the scheduling control module; the virtual machine list records real-time information about running virtual machines, and the scheduling control module obtains information on all running virtual machines from the list;

② The graphics application programming interface analysis module computes the overhead incurred by running GPU commands and sends it to the scheduling control module;

③ The monitoring module obtains, via the scheduling control module, the real-time virtual machine information recorded in the virtual machine list as well as the GPU command overhead computed by the graphics application programming interface analysis module, and sends both to the scheduling module;

④ The scheduling module receives commands from the GPU command dispatcher, computes the sleep-function duration from the real-time virtual machine information and the GPU command overhead obtained from the monitoring module, processes the received commands, and finally sends them to the lower-layer host GPU driver.

The specific sub-steps of step ④ are as follows:

If the real-time virtual machine information obtained from the monitoring module shows an FPS greater than 30 fps, the scheduling module inserts a sleep function before invoking the frame-rendering function, and then invokes the frame-rendering function to render the frame;

If the information shows an FPS less than 30 fps, the scheduling module allocates more GPU resources to that virtual machine and then invokes the frame-rendering function to render the frame;

If the FPS equals 30 fps, the scheduling module invokes the frame-rendering function directly.
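The three-way decision in step ④ can be sketched as follows. This is only a sketch: the callbacks `insert_sleep`, `grant_more_gpu`, and `render` are hypothetical stand-ins for the host-side primitives the text describes.

```python
TARGET_FPS = 30.0  # the QoS threshold used throughout the description


def schedule_frame(fps, sleep_ms, insert_sleep, grant_more_gpu, render):
    """Branch on the measured FPS before rendering one frame."""
    if fps > TARGET_FPS:
        insert_sleep(sleep_ms)  # lengthen the frame, releasing GPU time
        render()
    elif fps < TARGET_FPS:
        grant_more_gpu()        # give this VM a larger GPU share
        render()
    else:
        render()                # exactly on target: render immediately
```

In the real system the sleep duration would come from step ④'s computation rather than being passed in directly.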

The principle of the present invention is as follows:

(1) A paravirtualization-based system architecture; (2) an adaptive scheduling algorithm. The paravirtualization-based system architecture is a lightweight scheduling module built on host-side library injection; the host's graphics API and the guest's operating system and applications require no modification. The fairness-based QoS adaptive scheduling algorithm satisfies the minimum FPS and maximum latency of the cloud gaming user experience while emphasizing fairness in resource scheduling.

The computation time of a Present call is very stable during virtual machine execution; even when it fluctuates, it changes gradually and smoothly. Each virtual machine's agent can therefore predict the cost of the next Present call from that virtual machine's own history; this algorithm uses the arithmetic mean of the 20 most recent Present call times as the prediction for the next frame. To keep each virtual machine's frame length smoothly stabilized at the configured FPS value, each frame is extended by delaying its preceding Present call. The delay is achieved by inserting a Sleep call before each Present call, with the sleep amount given by the formula. We found that when there is heavy contention for GPU resources the Present time changes markedly; inserting a flush command before the Sleep call yields a more accurate GPU computation time.

Realization of fairness: 1. The fairness-based QoS adaptive scheduling algorithm maintains a virtual machine list that records, in detail, each running virtual machine's share of GPU resources and the FPS of the game it is running. 2. The graphics API analyzer analyzes, in real time, the image information of the game running in each virtual machine to compute its FPS and predict its running overhead. 3. The scheduling control module writes the game's image information, such as FPS, into the virtual machine list via the graphics API analyzer; using the runtime information in the list, it decides to release GPU resources from virtual machines with excessive FPS and reallocate them to virtual machines below the QoS standard. 4. The scheduling module implements the release and reallocation described in 3: it releases GPU resources by inserting Sleep() calls to lengthen the frame, and allocates the idle GPU resources thus obtained to the virtual machines below the QoS standard. 5. In the release and reallocation process above, the virtual machines with excessive FPS bear the reallocated share of GPU resources in equal proportion. The virtual machine holding the most GPU resources ranks first in GPU utilization, so this GPU resource reallocation algorithm is relatively fair.

Compared with the prior art, the beneficial effect of the present invention is that it achieves systematic management of GPU resources in virtual machines and improves GPU resource utilization; the fairness-based QoS adaptive scheduling method allocates sufficient GPU resources to all running virtual machines through fair allocation while guaranteeing QoS compliance and maximizing overall GPU utilization.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of the architecture of a virtual resource scheduling system in the prior art.

FIG. 2 is a schematic diagram of the host system for adaptive scheduling of GPU virtual resources in a cloud game according to the present invention.

FIG. 3 shows the composition of a frame and frame delay.

FIG. 4 is a flowchart of the scheduling method of the host system for adaptive scheduling of GPU virtual resources.

FIG. 5 shows the effect of different control system parameters on performance; (a) and (b) are two different test programs.

FIG. 6 shows GPU resource reallocation between two virtual machines.

FIG. 7 shows the performance evaluation results of the fairness-based QoS adaptive scheduling method.

Detailed Description of the Embodiments

The embodiments of the present invention are described in detail below. This embodiment is implemented on the premise of the technical solution of the present invention, and detailed implementation methods and specific operation processes are given; the applicable platforms of the present invention are not limited to the following embodiments.

FIG. 1 is a schematic diagram of the architecture of a virtual resource scheduling system in the prior art, and FIG. 2 is a schematic diagram of the host system for adaptive scheduling of GPU virtual resources in a cloud game according to the present invention. All modules of the system of the present invention are deployed in the host; it is a scheduling system sitting between the GPU and the host's GPU application programming interface, comprising a scheduling control module 1, together with an agent module, a graphics application programming interface analysis module 4, and a virtual machine list 5 each connected to the scheduling control module 1; the agent module consists of a scheduling module 2 and a monitoring module 3.

The scheduling control module 1 receives performance feedback from all running virtual machines and automatically adjusts the parameters of the configured scheduling method so that it runs well. Every running virtual machine has an agent module composed of a scheduling module 2 and a monitoring module 3; the monitoring module sends the corresponding virtual machine's real-time performance information to the scheduling control module 1, and the scheduling module receives commands from the driver and schedules GPU computation tasks. The agent module usually also decides, together with the scheduling control module 1, how many GPU resources to allocate to the virtual machine. The graphics application programming interface analysis module 4 computes the overhead incurred by running GPU commands generated by the graphics API, such as the Present command. The virtual machine list includes all virtual machines running in the host system; each can be indexed by the system, and when a new virtual machine starts running, the host system automatically adds it to the list and reschedules GPU resources. The paravirtualization-based system architecture described here uses Type II virtualization. When a guest application invokes the standard GPU rendering API, the guest GPU computation library prepares the corresponding GPU buffers in main memory and issues GPU command packets; these packets are pushed into a virtual GPU I/O queue and processed by the host in turn, and finally the dispatch layer sends the commands asynchronously to the host driver. Direct memory access is used to transfer buffers from guest memory to GPU memory.

As shown in FIG. 3, frame delay in this method is defined as the time difference between the returns of two consecutive Present calls. Each frame is composed of GPU rendering, Present(), Sleep(), target computation, and image drawing; because the GPU rendering, target computation, and image drawing portions vary, different frames have different lengths. The algorithm adjusts the duration of one of these components, Sleep(), to stabilize the size of each frame, ultimately keeping all frames at a similar length; the logic flow of the method is shown in FIG. 4. Specifically, the fairness-based QoS adaptive scheduling algorithm comprises two aspects: the realization of scheduling adaptivity and the realization of scheduling fairness.
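Using this definition, frame delay can be computed directly from consecutive Present return timestamps; a minimal sketch (integer millisecond timestamps are an assumption for illustration):

```python
def frame_delays(present_returns_ms):
    """Frame delay as defined above: the difference between the return
    times of two consecutive Present calls (milliseconds)."""
    return [b - a for a, b in zip(present_returns_ms, present_returns_ms[1:])]
```

For a VM presenting at a steady 30 fps, every delay would sit near 33 ms; the algorithm's goal is to keep these values stable.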

Realization of the adaptive scheduling part: 1. First, the user sets the FPS (frames per second) the system should achieve in the scheduling control module; the virtual machine list records real-time information about running virtual machines in detail, and the scheduling control module receives the virtual machine runtime information from the list. 2. The graphics application programming interface analysis module computes the overhead incurred by running GPU commands (such as frame-rendering commands) and sends it to the scheduling control module. 3. The monitoring module obtains the virtual machines' real-time runtime information and GPU command overhead from the scheduling control module and sends both to the scheduling module. The scheduling module receives commands from the GPU command dispatcher in the upper layer and, combining the virtual machine runtime information obtained from the monitoring module with the GPU command overhead, computes the duration of the sleep function, processes the received commands, and finally sends them to the underlying host GPU driver: if the information shows an FPS greater than 30, a sleep function is inserted before calling the frame-rendering function, thereby lengthening the frame, and then the frame-rendering function is called to render the frame; if less than 30, the scheduling module allocates more GPU resources to that virtual machine and then calls the frame-rendering function; if equal to 30, the frame-rendering function is called directly.

The computation time of a Present call is very stable during virtual machine execution; even when it fluctuates, it changes gradually and smoothly. Each virtual machine's agent can therefore predict the cost of the next Present call from that virtual machine's own history; this algorithm uses the arithmetic mean of the 20 most recent Present call times as the prediction for the next frame. To keep each virtual machine's frame length smoothly stabilized at the configured FPS value, each frame is extended by delaying its preceding Present call. The delay is achieved by inserting a Sleep call before each Present call, with the sleep amount given by the formula. We found that when there is heavy contention for GPU resources the Present time changes markedly; inserting a flush command before the Sleep call yields a more accurate GPU computation time.
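The moving-average prediction and the sleep computation described above can be sketched as follows. The exact sleep formula is not reproduced in the text, so the expression below, the target frame length minus the predicted Present cost minus the frame's other work, is an assumption consistent with the stated goal of stabilizing the frame length at the configured FPS.

```python
from collections import deque

WINDOW = 20  # the 20 most recent Present call times are averaged


class PresentPredictor:
    """Predict the next Present cost from history and derive how long to
    Sleep before the next Present so the frame hits the target length."""

    def __init__(self, target_fps=30.0):
        self.history = deque(maxlen=WINDOW)   # sliding window of call times
        self.target_frame = 1.0 / target_fps  # target frame length, seconds

    def record(self, present_seconds):
        self.history.append(present_seconds)

    def predict(self):
        # arithmetic mean of the recorded Present times (call record() first)
        return sum(self.history) / len(self.history)

    def sleep_amount(self, other_work_seconds):
        # assumed formula; never negative: an overrunning frame gets no sleep
        return max(0.0, self.target_frame - self.predict() - other_work_seconds)
```

A flush before the Sleep call, as the text notes, would make the recorded Present times more accurate under heavy GPU contention.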

A simple example of the present invention is described here. Assume the standard FPS is 30 and three virtual machines V1, V2, and V3 run on the system, with historical GPU-usage-to-FPS ratios t1, t2, and t3 respectively. If initially V1 runs at 20 FPS, V2 at 40 FPS, and V3 at 60 FPS, the scheduling driver module first receives the performance parameters sent by the monitor. Once it detects that V1 does not meet the QoS requirement, the fairness-based QoS adaptive scheduling algorithm starts working: it first computes the GPU resources that should be reallocated to V1, namely 10·t1, and the other two virtual machines V2 and V3 each prepare to release 5·t1 of resources.
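The arithmetic of this example can be sketched as follows. The equal-share split among donor VMs is taken from the 5·t1 + 5·t1 figure above; the function name and dictionary interface are illustrative.

```python
def fair_reallocation(fps, t, target=30.0):
    """Each VM i has measured FPS fps[i] and a historical GPU-time-per-FPS
    ratio t[i]. A VM below target needs (target - fps[i]) * t[i] extra GPU
    resource; VMs above target donate that total deficit in equal shares."""
    deficits = {i: (target - f) * t[i] for i, f in fps.items() if f < target}
    donors = [i for i, f in fps.items() if f > target]
    need = sum(deficits.values())
    share = need / len(donors) if donors else 0.0
    return deficits, {i: share for i in donors}
```

With the numbers above (and t1 = t2 = t3 = 1 for simplicity), V1's deficit is 10·t1 and V2 and V3 each release 5·t1, matching the worked example.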

The specific platform of this embodiment is an i7-2600K 3.4 GHz CPU, 16 GB RAM, and an AMD HD 6750 graphics card with 2 GB of video memory; both the host and guest operating systems are Windows 7 x64, and the VMware version is 4.0. Each guest operating system is dual-core with 2 GB of RAM, and the screen resolution is 1280*720 (HD quality). To simplify performance comparison, this embodiment disables swap space and the GPU-accelerated window system on the host.

This embodiment uses two types of workloads: the first is an "ideal model game" and the other is a "realistic model game"; the three games are designated A, B, and C. We first tested and evaluated the effect of a control system using PI control. FIG. 5 shows the performance of the control system under kp=0.5, kp=0.25, kp=0.1 and ki=0.5, ki=0.1, ki=0.05; it can be concluded that the QoS adaptive scheduling algorithm improves system performance.
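The text names only the tested PI gains (kp, ki), not the control law, so the following is a sketch under the assumption that the controller adjusts a per-frame sleep correction proportionally to the FPS error and its running sum:

```python
class PIController:
    """Minimal PI controller sketch for steering a VM toward the target FPS.

    The output is interpreted here as a correction to the per-frame sleep
    time in milliseconds (an assumption): positive when the VM runs too
    fast and should sleep more, negative when it runs too slow.
    """

    def __init__(self, kp, ki):
        self.kp, self.ki = kp, ki
        self.integral = 0.0  # running sum of the error

    def update(self, target_fps, measured_fps):
        error = measured_fps - target_fps  # positive: VM is above target
        self.integral += error
        return self.kp * error + self.ki * self.integral
```

With kp=0.5 and ki=0.1, one of the parameter pairs tested in FIG. 5, a VM steady at 40 FPS against a 30 FPS target accumulates an ever larger sleep correction until its FPS comes down.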

Analysis of the Sleep effect and GPU reallocation: the system developed a library-injection technique to insert a sleep function when the GPU API is called, and we evaluated the sleep function's effect on controlling FPS and GPU resource utilization. In this test, only one virtual machine ran under the control system, the display was set to 1920*1200 resolution, the initial sleep time was set to 300 ms per frame, and it then decreased by 1 ms every second. As shown in FIG. 6, the experimental data show that the GPU and CPU resources of one virtual machine can be acquired by other virtual machines, i.e., the system architecture and algorithm can effectively schedule GPU resources across multiple virtual machines.

Finally, the scheduling algorithm performance of the system was tested and evaluated. Under the fairness-based QoS adaptive scheduling algorithm, at first C has the highest FPS while B's FPS falls below 30 due to GPU resource contention. The FSA scheduling policy then takes effect: at the second second, the system detects that B's FPS is around 28 while the other two workloads are above 30. The system releases GPU resources from A and C and reallocates them to B, so at the third second B's FPS rises to 33 while the other two workloads drop only slightly; throughout the scheduling process, C keeps the highest FPS and A's FPS never falls below the set standard. As shown in FIG. 7, GPU resource utilization has a maximum of 99.1%, a minimum of 85.2%, and an average of 92.7%; although a small amount of GPU resources is still wasted, the fairness-based QoS adaptive scheduling algorithm keeps GPU utilization at a very high level most of the time.

Testing shows that the GPU virtual resource adaptive scheduling host system and adaptive scheduling method for cloud games of the present invention achieve sound management of GPU resources across virtual machines and improve GPU resource utilization. The fairness-based QoS adaptive scheduling algorithm allocates sufficient GPU resources to all running virtual machines through fair distribution, guarantees that QoS requirements are met, and maximizes overall GPU utilization. Tests show that the algorithm achieves these goals under a variety of workloads while limiting overhead to 5-10%.

Claims (3)

1. A GPU virtual resource adaptive scheduling host system for a cloud game, characterized in that it comprises a scheduling control module (1) and, each connected to the scheduling control module (1), a proxy module, a graphics application programming interface (API) analysis module (4), and a virtual machine list (5); the proxy module consists of a scheduling module (2) and a monitoring module (3), and all modules are deployed in the host;
the scheduling control module (1) is responsible for communicating information between the other modules: it obtains the information of all running virtual machines from the virtual machine list (5) and sends it to the modules that need it;
the monitoring module (3) obtains the real-time operation information of the virtual machines from the scheduling control module (1) and sends it to the scheduling module (2);
the scheduling module (2) receives commands from the upper-layer GPU command dispatcher, processes the received commands according to the real-time virtual machine operation information sent by the monitoring module (3), and finally sends them to the lower-layer host GPU driver;
the proxy module, together with the scheduling control module (1), determines how many GPU resources need to be allocated to each virtual machine;
the graphics API analysis module (4) is responsible for calculating the overhead incurred by running the GPU commands generated through the graphics API, and sends the result to the scheduling control module (1) so that other modules needing this information can obtain it;
the virtual machine list (5) stores all virtual machines running in the host system, each of which can be indexed by the system; when a new virtual machine joins and begins to run, the system automatically adds it to the virtual machine list (5) and reschedules GPU resources.
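The data flow of claim 1 (virtual machine list feeding the scheduling control module, which also collects API-analysis overhead and hands a combined snapshot to the monitor/scheduler) can be sketched as below. Class and method names are illustrative stand-ins, not the patent's actual interfaces.

```python
class VirtualMachineList:
    """Holds all running VMs, indexed by id; new VMs are added here so a
    reschedule can be triggered (corresponding to virtual machine list (5))."""
    def __init__(self):
        self.vms = {}

    def add(self, vm_id, info):
        self.vms[vm_id] = info  # new VM joins and is recorded

    def all_vms(self):
        return dict(self.vms)

class SchedulingControl:
    """Brokers information between modules (scheduling control module (1)):
    reads the VM list, receives per-VM overhead from the graphics API
    analysis module, and exposes a combined snapshot for the monitor."""
    def __init__(self, vm_list):
        self.vm_list = vm_list
        self.api_overhead = {}  # vm_id -> overhead (ms) reported by API analysis

    def report_overhead(self, vm_id, overhead_ms):
        self.api_overhead[vm_id] = overhead_ms

    def snapshot(self):
        """Per-VM (run info, GPU command overhead) handed to the monitoring
        module, which forwards it to the scheduling module."""
        return {vm: (info, self.api_overhead.get(vm, 0.0))
                for vm, info in self.vm_list.all_vms().items()}
```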
2. A scheduling method using the GPU virtual resource adaptive scheduling host system for a cloud game according to claim 1, characterized in that the method comprises the following steps:
(1) the user sets the target frames per second (FPS) in the scheduling control module; the virtual machine list records the real-time operation information of the virtual machines, and the scheduling control module obtains the information of all running virtual machines from the virtual machine list (5);
(2) the graphics API analysis module (4) calculates the overhead incurred by running GPU commands and sends it to the scheduling control module (1);
(3) the monitoring module (3) obtains, through the scheduling control module (1), the real-time virtual machine operation information recorded in the virtual machine list together with the GPU command overhead calculated by the graphics API analysis module, and sends them together to the scheduling module (2);
(4) the scheduling module (2) receives commands from the GPU command dispatcher, computes the running time of the sleep function from the real-time virtual machine operation information obtained from the monitoring module and the GPU command overhead, processes the received commands, and finally sends them to the lower-layer host GPU driver.
3. The scheduling method according to claim 2, characterized in that step (4) specifically comprises:
when the frames per second shown in the real-time operation information that the scheduling module (2) obtains from the monitoring module is greater than 30 fps, the scheduling module inserts a sleep function before calling the frame-rendering function, then calls the frame-rendering function to render the frame;
when the frames per second shown in the real-time operation information that the scheduling module (2) obtains from the monitoring module is less than 30 fps, the scheduling module allocates more GPU resources to that virtual machine, then calls the frame-rendering function to render the frame;
when the frames per second shown in the real-time operation information that the scheduling module (2) obtains from the monitoring module equals 30 fps, the scheduling module directly calls the frame-rendering function to render the frame.
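The three-way decision of claim 3 reduces to a small dispatch on measured FPS. The sketch below uses injected callbacks so the branch logic can be shown in isolation; the function names are illustrative, not the patent's interfaces.

```python
def schedule_frame(fps, render, insert_sleep, grant_more_gpu, target=30):
    """Per-frame decision from claim 3:
    fps > target  -> throttle with a sleep, then render;
    fps < target  -> grant the VM more GPU share, then render;
    fps == target -> render directly."""
    if fps > target:
        insert_sleep()    # sleep inserted before the rendering call
        render()
    elif fps < target:
        grant_more_gpu()  # allocate more GPU resources to this VM
        render()
    else:
        render()          # exactly at target: no adjustment needed
```

Each branch always ends in a render call; only the preparatory action differs.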
CN201310244765.5A 2013-06-19 2013-06-19 Adaptive scheduling host system and scheduling method of GPU virtual resources in cloud game Expired - Fee Related CN103309748B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310244765.5A CN103309748B (en) 2013-06-19 2013-06-19 Adaptive scheduling host system and scheduling method of GPU virtual resources in cloud game


Publications (2)

Publication Number Publication Date
CN103309748A CN103309748A (en) 2013-09-18
CN103309748B true CN103309748B (en) 2015-04-29

Family

ID=49135005

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310244765.5A Expired - Fee Related CN103309748B (en) 2013-06-19 2013-06-19 Adaptive scheduling host system and scheduling method of GPU virtual resources in cloud game

Country Status (1)

Country Link
CN (1) CN103309748B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334409A (en) * 2018-01-15 2018-07-27 北京大学 A kind of fine-grained high-performance cloud resource management dispatching method

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105025061B (en) * 2014-04-29 2018-09-21 中国电信股份有限公司 Build method and server that scene of game is shared in high in the clouds
CN104216783B (en) * 2014-08-20 2017-07-11 上海交通大学 Virtual GPU resource autonomous management and control method in cloud game
CN104598292B (en) * 2014-12-15 2017-10-03 中山大学 A kind of self adaptation stream adaptation and method for optimizing resources applied to cloud game system
CN104539716A (en) * 2015-01-04 2015-04-22 国网四川省电力公司信息通信公司 Cloud desktop management system desktop virtual machine dispatching control system and method
CN105338372B (en) * 2015-10-30 2019-04-26 中山大学 An adaptive video streaming transcoding method applied to a game live broadcast platform
CN105521603B (en) * 2015-12-11 2019-05-31 北京奇虎科技有限公司 The method, apparatus and system of virtual input control are carried out for the game of cool run class
WO2017107055A1 (en) * 2015-12-22 2017-06-29 Intel Corporation Apparatus and method for cloud-based graphics validation
CN105933727B (en) * 2016-05-20 2019-05-31 中山大学 A kind of video stream transcoding and distribution method applied to game live streaming platform
CN106776022B (en) * 2016-12-09 2020-06-12 武汉斗鱼网络科技有限公司 A system and method for optimizing CPU usage of game process
CN114968478B (en) 2018-03-06 2025-03-14 华为技术有限公司 A data processing method, device, server and system
CN108733490A (en) * 2018-05-14 2018-11-02 上海交通大学 A kind of GPU vitualization QoS control system and method based on resource-sharing adaptive configuration
CN110162397B (en) * 2018-05-28 2022-08-23 腾讯科技(深圳)有限公司 Resource allocation method, device and system
CN108829516B (en) * 2018-05-31 2021-08-10 安徽四创电子股份有限公司 Resource virtualization scheduling method for graphic processor
CN109324903B (en) * 2018-09-21 2021-03-02 深圳前海达闼云端智能科技有限公司 Display resource scheduling method and device for embedded system
CN111111163B (en) * 2019-12-24 2022-08-30 腾讯科技(深圳)有限公司 Method and device for managing computing resources and electronic device
CN111913799B (en) * 2020-07-14 2024-04-19 北京华夏启信科技有限公司 Video stream online analysis task scheduling method and computer equipment
CN113674131B (en) * 2021-07-21 2024-09-13 山东海量信息技术研究院 Hardware accelerator device management method and device, electronic device and storage medium
CN115941581B (en) * 2022-09-30 2025-08-22 咪咕互动娱乐有限公司 Cloud gaming routing scheduling method, device, storage medium and apparatus
CN117971513B (en) * 2024-04-01 2024-05-31 北京麟卓信息科技有限公司 GPU virtual synchronization optimization method based on kernel structure dynamic reconstruction

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102541618A (en) * 2010-12-29 2012-07-04 中国移动通信集团公司 Implementation method, system and device for virtualization of universal graphic processor
CN102650950A (en) * 2012-04-10 2012-08-29 南京航空航天大学 Platform architecture supporting multi-GPU (Graphics Processing Unit) virtualization and work method of platform architecture

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8274518B2 (en) * 2004-12-30 2012-09-25 Microsoft Corporation Systems and methods for virtualizing graphics subsystems

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102541618A (en) * 2010-12-29 2012-07-04 中国移动通信集团公司 Implementation method, system and device for virtualization of universal graphic processor
CN102650950A (en) * 2012-04-10 2012-08-29 南京航空航天大学 Platform architecture supporting multi-GPU (Graphics Processing Unit) virtualization and work method of platform architecture


Also Published As

Publication number Publication date
CN103309748A (en) 2013-09-18

Similar Documents

Publication Publication Date Title
CN103309748B (en) Adaptive scheduling host system and scheduling method of GPU virtual resources in cloud game
US8754904B2 (en) Virtualization method of vertical-synchronization in graphics systems
CN109409513B (en) Task processing method based on neural network and related equipment
US20180329742A1 (en) Timer-assisted frame running time estimation
CN115136564B (en) Preloading of applications and in-application content on user devices
CN103077015B (en) A kind of method of Dynamic controlling frame rate of game
US12026553B2 (en) Maintaining distribution of processing time for multi-tenant implementation of graphics processing unit
Calandrino et al. Cache-aware real-time scheduling on multicore platforms: Heuristics and a case study
CN111597042A (en) Service thread running method and device, storage medium and electronic equipment
CN112905326A (en) Task processing method and device
US20150177822A1 (en) Application-transparent resolution control by way of command stream interception
CN109966739B (en) Method and system for optimizing game operation
US20150301864A1 (en) Resource allocation method
US20230342207A1 (en) Graphics processing unit resource management method, apparatus, and device, storage medium, and program product
US20230289223A1 (en) Task scheduling method, game engine, device and storage medium
CN116468597B (en) Multi-GPU based image rendering method, device and readable storage medium
Choi et al. Optimizing energy consumption of mobile games
CN115048209A (en) CPU scheduling method, device, terminal equipment and storage medium
CN111737015A (en) A method for increasing the real-time layers of large-format nonlinear editing based on multiple GPUs
CN113610699B (en) A hardware layer rendering scheduling method, device, equipment and storage medium
Usui et al. Squash: Simple qos-aware high-performance memory scheduler for heterogeneous systems with hardware accelerators
CN102857534B (en) Remote interaction method on basis of cloud computing
CN116546228B (en) Plug flow method, device, equipment and storage medium for virtual scene
CN109448092B (en) Load balancing cluster rendering method based on dynamic task granularity
Wang et al. A user mode CPU–GPU scheduling framework for hybrid workloads

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150429