CN102650950B - Platform architecture supporting multi-GPU (Graphics Processing Unit) virtualization and work method of platform architecture - Google Patents
- Publication number
- CN102650950B (Application CN201210102989.8A)
- Authority
- CN
- China
- Prior art keywords
- gpu
- module
- virtual machine
- server
- gpu server
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The present invention provides a platform architecture supporting multi-GPU virtualization and a working method thereof. By deploying middleware on both the GPU server side and the virtual machine side, and using transports such as sockets or InfiniBand as the transmission medium, the architecture overcomes the inability of existing virtual machine platforms to exploit GPU acceleration. GPU resources are managed by one or more centrally controlled management nodes and partitioned at fine granularity, so that multiple tasks can run in parallel. A virtual machine requests GPU resources from a management node through the middleware and uses them to accelerate its computation; a GPU server registers its GPU resources with the management node through the middleware and serves requests with them. The invention brings the parallel processing capability of GPUs into virtual machines and, combined with the management mechanism, maximizes GPU utilization. The architecture effectively reduces energy consumption and improves computing efficiency.
Description
Technical Field
The invention relates to a platform architecture, and a working method thereof, by which virtual machines in a virtualized environment accelerate computation with multiple GPUs in a redirection manner, and belongs to the technical field of virtualization.
Background
Virtualization is the core technical foundation of cloud computing; its advantages, such as cost savings and enhanced security, have gradually gained recognition, and it is a research hotspot in computer science. By virtualizing hardware resources, virtualization technology can emulate multiple identical computer hardware platforms on one machine, so that several operating systems run simultaneously in isolation from one another. This improves server utilization and has found wide application in server consolidation, network security, computational data protection, high-performance computing, and trusted computing.
In recent years, the performance and capabilities of graphics processing units (GPUs) have increased significantly. The GPU is no longer limited to image processing; it has developed into a highly parallel processor with high peak compute and memory bandwidth. With the introduction of technologies supporting GPGPU (general-purpose GPU) computing, such as CUDA, GPGPU applications have become increasingly widespread, and the powerful parallel computing capability of GPUs has brought the heterogeneous CPU+GPU mode into more and more high-performance computing. However, GPUs consume considerable power: if every node in a cluster is equipped with a GPU, the cluster's power consumption may rise sharply. Moreover, in most computations the GPU acts only as a coprocessor that accelerates the parallel portions of the code, so its utilization remains low. Finally, because of the closed nature of the GPU, a virtual machine cannot use it directly to accelerate computation. Together, these factors greatly limit the use of GPUs in virtual machines.
Summary of the Invention
Technical Problem
To solve the problem that virtualized environments cannot exploit GPU acceleration, the present invention proposes a multi-GPU virtualization platform architecture suitable for clusters, together with its working method. Through the cooperation of a management component, a client component, and a server component, a virtual machine gains access to the powerful parallel computing capability of GPUs; the architecture supports acquisition of GPU resources and fine-grained resource allocation, and, by load balancing across GPUs, improves GPU utilization and reduces energy consumption. The invention enhances the processing capability of virtual machines by means of GPUs: a component inside the virtual machine intercepts an application's GPU calls and redirects them to the local privileged domain or to a remote GPU server, where the calls are executed; when execution completes, the results are returned to the virtual machine.
Technical Solution
To solve the above technical problem, the present invention adopts the following technical scheme:
A platform architecture supporting multi-GPU virtualization comprises a GPU resource management module, a virtual machine client module, and a GPU server module. The GPU resource management module is deployed on a GPU resource management node, the virtual machine client module on the virtual machine client, and the GPU server module on the GPU server. The GPU resource management module is responsible for registering GPU servers and handling GPU resource requests. The virtual machine client module exchanges data and interacts with the GPU server module: it intercepts the virtual machine's GPU calls and redirects them to the GPU server module, while the GPU server module accepts the intercepted GPU call information, executes the calls on a GPU, and returns the execution results.
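The division of responsibilities among the three modules can be sketched as a minimal in-process model. All class names, method names, and the least-loaded allocation policy here are illustrative assumptions: the patent defines the modules' responsibilities, not an API.

```python
# Minimal in-process model of the three modules. Names and the
# least-loaded allocation policy are illustrative assumptions.

class ResourceManager:
    """GPU resource management module: registers servers, answers requests."""
    def __init__(self):
        self.servers = {}                       # status table: server name -> load

    def register(self, name):
        self.servers[name] = 0                  # a newly registered server is idle
        return "success"                        # registration returns success

    def request(self):
        # Assign the "best-matching" server -- here, the least loaded one.
        name = min(self.servers, key=self.servers.get)
        self.servers[name] += 1
        return name


class GPUServerModule:
    """GPU server module: registers itself, then executes redirected calls."""
    def __init__(self, name, manager):
        self.name = name
        manager.register(name)                  # register with the manager on startup

    def execute(self, call):
        # Stand-in for dispatching the call to a physical GPU.
        return f"{self.name} executed {call}"


class ClientModule:
    """Virtual machine client module: intercepts a GPU call and redirects it."""
    def __init__(self, manager, servers):
        self.manager = manager
        self.servers = servers                  # name -> GPUServerModule

    def gpu_call(self, call):
        target = self.manager.request()         # ask the manager for a server
        return self.servers[target].execute(call)  # redirect and collect the result
```

With two idle servers registered, a client's first two calls land on different servers, reflecting the manager's balancing role.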
The working method of the above platform architecture supporting multi-GPU virtualization includes the following steps:
Step 1: Start the GPU resource management module, listen for registration requests from GPU server modules and resource requests from virtual machine client modules, and maintain a status table of registered GPU servers.
Step 2: Start the GPU server module and send a registration request to the GPU resource management module.
Step 3: The GPU resource management module receives the registration request, creates a table entry tracking the state of the current GPU server, and returns success upon completion.
Step 4: On receiving the registration-success message, the GPU server module immediately listens on the designated port.
Step 5: Start the virtual machine client module and monitor GPU calls. When a GPU call occurs in the virtual machine, send a resource request to the GPU resource management module.
Step 6: The GPU resource management module receives the resource request, obtains the working status of the registered GPU servers, and, according to a given algorithm, assigns the best-matching GPU server to the virtual machine client module.
Step 7: The virtual machine client module receives the assigned GPU server and establishes a data transmission connection with that server's GPU server module; it encapsulates the intercepted call and sends it over the connection to the GPU server module.
Step 8: The GPU server module receives the encapsulated data, combines the load information of each GPU to select the best-matching GPU, executes the call, and returns the result until execution ends.
Step 9: The virtual machine client module receives the result and returns it to the application until execution ends.
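Step 7 encapsulates the intercepted call before sending it over the data connection. A minimal sketch of such packaging follows; the JSON wire format and its field names are assumptions, since the patent does not specify a serialization scheme.

```python
import json

# Hypothetical wire format for an intercepted GPU call; JSON is used
# purely for illustration, as the patent does not define one.

def encapsulate(func_name, args):
    # Client side (step 7): package the intercepted call for transmission.
    return json.dumps({"call": func_name, "args": args}).encode()

def unpack(payload):
    # Server side (step 8): recover the call name and arguments.
    msg = json.loads(payload.decode())
    return msg["call"], msg["args"]
```

In practice the encapsulated bytes would travel over the socket or InfiniBand connection established in step 7, and the unpacked call would be dispatched to the GPU selected in step 8.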
Beneficial Effects
Using existing virtualization technology, the present invention brings the parallel processing capability of GPUs into virtual machines and combines it with a management mechanism, using common transports (sockets, InfiniBand, or the communication channel specific to each virtualization platform) as the carrier for data transmission. The existing virtualization platform is not modified; components are only added on top of it, so that virtual machines can compute with GPUs, and the approach applies to all virtualization platforms. The invention is easy to use, requiring only simple setup and configuration, and easy to port, working on any virtualization platform without modification. It is designed for virtualization clusters, where GPUs assist in raising virtualized processing capability; it suits teachers and students who need GPUs to demonstrate programming techniques in teaching, and it also suits virtualization clusters seeking to raise virtual machine processing capability while increasing GPU utilization. The invention has a wide range of application scenarios and good practicability and feasibility.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the functional modules of the present invention;
Fig. 2 is a flowchart of the transmission program of the present invention;
Fig. 3 is a flowchart of the real-time control program of the present invention.
Detailed Description
Specific embodiments of the present invention are described below with reference to the accompanying drawings:
As shown in Fig. 1, the platform architecture supporting multi-GPU virtualization provided by the present invention includes a GPU resource management module, a virtual machine client module, and a GPU server module. The GPU resource management module is deployed on the GPU resource management node, the virtual machine client module on the virtual machine client, and the GPU server module on the GPU server. The virtual machine client communicates with the GPU server and the GPU resource management module through sockets, InfiniBand, or a channel specific to the virtualization platform. The GPU resource management module is responsible for registering GPU servers and handling GPU resource requests; the virtual machine client module exchanges data and interacts with the GPU server module, intercepting the virtual machine's GPU calls and redirecting them to the GPU server module, which accepts the intercepted call information, executes it on a GPU, and returns the result. The GPU resource management module handles requests from both GPU server modules and virtual machine client modules: it responds to resource registration by GPU server modules, monitors the load of each GPU server in real time, adjusts each server's tasks and manages its computing resources, and responds to requests from virtual machine client modules according to load, assigning them the best-matching GPU server. The virtual machine client module and the GPU server module interact through sockets, InfiniBand, or platform-specific communication channels, completing GPU acceleration for the virtual machine by interception and redirection. In addition, the GPU server module manages the resources of the multiple GPUs inside a GPU server; each GPU supports several tasks running in parallel, and the GPU server module tracks the current task load of every GPU and balances load between GPUs according to that load.
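The per-GPU balancing just described, with the server tracking the running-task count of every GPU, might be sketched as follows. The fewest-running-tasks selection policy is an assumption; the text only says that tasks are balanced according to current load.

```python
class GPUServerLoadBalancer:
    """Tracks the running-task count of each GPU on one GPU server.

    The fewest-tasks selection policy is an illustrative assumption;
    the patent only states that load is balanced among GPUs.
    """

    def __init__(self, num_gpus):
        self.load = [0] * num_gpus              # tasks currently on each GPU

    def assign(self):
        # Pick the GPU with the fewest running tasks for the next call.
        gpu = min(range(len(self.load)), key=self.load.__getitem__)
        self.load[gpu] += 1
        return gpu

    def finish(self, gpu):
        self.load[gpu] -= 1                     # a task on this GPU completed
```

Because each GPU supports multiple parallel tasks, `assign` may place several tasks on the same GPU once all GPUs are busy, which matches the multi-task parallelism the architecture claims.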
Fig. 2 shows the workflow of the GPU resource management node. The virtual machine client module requests resources from the GPU resource management module, and the GPU server module registers resources with it. The GPU resource management module parses each request and responds according to the current state.
Fig. 3 shows the workflow of the interaction between the virtual machine client module and the GPU server module, which together perform the computing work. After the virtual machine client module establishes a connection with the GPU server module, the two sides exchange data. The virtual machine client module intercepts GPU calls made by applications running on its machine and sends the data to the GPU server module. The GPU server module receives the data from the virtual machine client, parses it, selects a GPU, computes the result on that GPU, and returns the execution result. The virtual machine client module receives the data and returns the execution result to the application. The GPU server module then checks whether the data sent by the client is a stop-transmission command: if not, the above process repeats; if so, the connection is closed.
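The connection lifecycle of Fig. 3, serving redirected calls until a stop-transmission command arrives, can be sketched transport-agnostically. The `recv`/`send` callables stand in for the socket or InfiniBand channel, the `STOP` token is a hypothetical stand-in for the stop command, and `handle` is a placeholder for the parse/select-GPU/execute step.

```python
STOP = b"STOP"   # hypothetical stop-transmission command

def handle(data):
    # Placeholder for "parse the call, select a GPU, execute, collect result".
    return b"result:" + data

def server_loop(recv, send):
    # One connection's lifetime on the GPU server module (Fig. 3):
    # keep serving redirected calls until the client sends the stop
    # command, then close the connection.
    while True:
        data = recv()
        if data == STOP:
            return "closed"
        send(handle(data))
```

Driving the loop with an in-memory message queue instead of a real socket makes the repeat-until-stop behavior easy to observe.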
The working method of the platform architecture supporting multi-GPU virtualization comprises steps 1 through 9 as set out above in the Technical Solution.
Claims (3)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210102989.8A CN102650950B (en) | 2012-04-10 | 2012-04-10 | Platform architecture supporting multi-GPU (Graphics Processing Unit) virtualization and work method of platform architecture |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102650950A CN102650950A (en) | 2012-08-29 |
CN102650950B true CN102650950B (en) | 2015-04-15 |
Family
ID=46692958
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210102989.8A Active CN102650950B (en) | 2012-04-10 | 2012-04-10 | Platform architecture supporting multi-GPU (Graphics Processing Unit) virtualization and work method of platform architecture |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102650950B (en) |
Families Citing this family (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103593246B (en) * | 2012-08-15 | 2017-07-11 | 中国电信股份有限公司 | Communication means, host and dummy machine system between virtual machine and host |
CN103257884A (en) * | 2013-05-20 | 2013-08-21 | 深圳市京华科讯科技有限公司 | Virtualization processing method for equipment |
CN103257885A (en) * | 2013-05-20 | 2013-08-21 | 深圳市京华科讯科技有限公司 | Media virtualization processing method |
CN103309748B (en) * | 2013-06-19 | 2015-04-29 | 上海交通大学 | Adaptive scheduling host system and scheduling method of GPU virtual resources in cloud game |
CN103324505B (en) * | 2013-06-24 | 2016-12-28 | 曙光信息产业(北京)有限公司 | The method disposing GPU development environment in group system and cloud computing system |
CN103713938A (en) * | 2013-12-17 | 2014-04-09 | 江苏名通信息科技有限公司 | Multi-graphics-processing-unit (GPU) cooperative computing method based on Open MP under virtual environment |
CN104754464A (en) * | 2013-12-31 | 2015-07-01 | 华为技术有限公司 | Audio playing method, terminal and system |
CN105122210B (en) | 2013-12-31 | 2020-02-21 | 华为技术有限公司 | GPU virtualization implementation method and related device and system |
US9584594B2 (en) | 2014-04-11 | 2017-02-28 | Maxeler Technologies Ltd. | Dynamic provisioning of processing resources in a virtualized computational architecture |
US10715587B2 (en) | 2014-04-11 | 2020-07-14 | Maxeler Technologies Ltd. | System and method for load balancing computer resources |
US9501325B2 (en) | 2014-04-11 | 2016-11-22 | Maxeler Technologies Ltd. | System and method for shared utilization of virtualized computing resources |
CN105988874B (en) * | 2015-02-10 | 2020-08-28 | 阿里巴巴集团控股有限公司 | Resource processing method and device |
WO2016145632A1 (en) * | 2015-03-18 | 2016-09-22 | Intel Corporation | Apparatus and method for software-agnostic multi-gpu processing |
CN106155804A (en) * | 2015-04-12 | 2016-11-23 | 北京典赞科技有限公司 | Method and system to the unified management service of GPU cloud computing resources |
CN105159753B (en) * | 2015-09-25 | 2018-09-28 | 华为技术有限公司 | The method, apparatus and pooling of resources manager of accelerator virtualization |
CN111865657B (en) * | 2015-09-28 | 2022-01-11 | 华为技术有限公司 | Acceleration management node, acceleration node, client and method |
GB2545170B (en) * | 2015-12-02 | 2020-01-08 | Imagination Tech Ltd | GPU virtualisation |
CN105528249B (en) * | 2015-12-06 | 2019-04-05 | 北京天云融创软件技术有限公司 | A kind of dispatching method of multiple users share GPU resource |
CN105959404A (en) * | 2016-06-27 | 2016-09-21 | 江苏易乐网络科技有限公司 | GPU virtualization platform based on cloud computing |
CN108121312B (en) * | 2017-11-29 | 2020-10-30 | 南瑞集团有限公司 | ARV load balancing system and method based on integrated hydropower management and control platform |
US10846138B2 (en) | 2018-08-23 | 2020-11-24 | Hewlett Packard Enterprise Development Lp | Allocating resources of a memory fabric |
CN109376011B (en) * | 2018-09-26 | 2021-01-15 | 郑州云海信息技术有限公司 | Method and device for managing resources in virtualization system |
CN109388496A (en) * | 2018-11-01 | 2019-02-26 | 北京视甄智能科技有限公司 | A kind of image concurrent processing method, apparatus and system based on more GPU cards |
CN109656714B (en) * | 2018-12-04 | 2022-10-28 | 成都雨云科技有限公司 | GPU resource scheduling method of virtualized graphics card |
CN109582425B (en) * | 2018-12-04 | 2020-04-14 | 中山大学 | A GPU service redirection system and method based on cloud and terminal GPU fusion |
CN109598250B (en) * | 2018-12-10 | 2021-06-25 | 北京旷视科技有限公司 | Feature extraction method, device, electronic equipment and computer readable medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101938368A (en) * | 2009-06-30 | 2011-01-05 | 国际商业机器公司 | Virtual machine manager and virtual machine processing method in blade server system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8776050B2 (en) * | 2003-08-20 | 2014-07-08 | Oracle International Corporation | Distributed virtual machine monitor for managing multiple virtual resources across multiple physical nodes |
- 2012-04-10: application CN201210102989.8A filed; granted as patent CN102650950B (en), status Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101938368A (en) * | 2009-06-30 | 2011-01-05 | 国际商业机器公司 | Virtual machine manager and virtual machine processing method in blade server system |
Non-Patent Citations (1)
Title |
---|
Lin Shi et al. vCUDA: GPU accelerated high performance computing in virtual machines. IEEE International Symposium on Parallel & Distributed Processing (IPDPS 2009). 2009, pp. 1-11. *
Also Published As
Publication number | Publication date |
---|---|
CN102650950A (en) | 2012-08-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102650950B (en) | Platform architecture supporting multi-GPU (Graphics Processing Unit) virtualization and work method of platform architecture | |
CN103533086B (en) | Uniform resource scheduling method in cloud computing system | |
CN109582425B (en) | A GPU service redirection system and method based on cloud and terminal GPU fusion | |
US10698717B2 (en) | Accelerator virtualization method and apparatus, and centralized resource manager | |
CN102946409B (en) | Single terminal user experience is delivered from multiple servers to client computer | |
WO2015096656A1 (en) | Thread creation method, service request processing method and related device | |
Shiraz et al. | Mobile cloud computing: critical analysis of application deployment in virtual machines | |
CN102427475B (en) | Load balance scheduling system in cloud computing environment | |
Raghava et al. | Comparative study on load balancing techniques in cloud computing | |
CN107087019A (en) | A Device-Cloud Collaborative Computing Architecture and Task Scheduling Device and Method | |
CN103634364B (en) | A kind of system for realizing remote desktop, method, client and service centre | |
CN102646062A (en) | A cloud computing platform application cluster elastic expansion method | |
CN102394929A (en) | A session-oriented cloud computing load balancing system and method thereof | |
CN102937911A (en) | Management method and system for virtual machine sources | |
CN103747107A (en) | Compatible cloud operating platform and realizing method thereof | |
CN109522114A (en) | Radar data high-speed communication processing module of virtualization framework | |
CN111399976A (en) | GPU virtualization implementation system and method based on API redirection technology | |
Wadhwa et al. | Green cloud computing-A greener approach to IT | |
CN103067486B (en) | Based on the large data processing method of PaaS platform | |
CN103501295B (en) | A kind of remote access method based on virtual machine (vm) migration and equipment | |
CN203135901U (en) | Encryption equipment management device | |
CN116680035A (en) | GPU (graphics processing unit) method and device for realizing remote scheduling of kubernetes container | |
Kumar | P2BED-C: A novel peer to peer load balancing and energy efficient technique for data-centers over cloud | |
Hassan et al. | Efficient virtual machine resource management for media cloud computing | |
CN107493574B (en) | Wireless controller equipment, parallel authentication processing method, system and networking device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
EE01 | Entry into force of recordation of patent licensing contract |
Application publication date: 20120829 Assignee: Jiangsu Wisedu Information Technology Co., Ltd. Assignor: Nanjing University of Aeronautics and Astronautics Contract record no.: 2013320000314 Denomination of invention: Platform architecture supporting multi-GPU (Graphics Processing Unit) virtualization and work method of platform architecture License type: Exclusive License Record date: 20130410 |
|
LICC | Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
EM01 | Change of recordation of patent licensing contract |
Change date: 20150421 Contract record no.: 2013320000314 Assignee after: JIANGSU WISEDU EDUCATION INFORMATION TECHNOLOGY CO., LTD. Assignee before: Jiangsu Wisedu Information Technology Co., Ltd. |
|
LICC | Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model | ||
EC01 | Cancellation of recordation of patent licensing contract |
Assignee: JIANGSU WISEDU EDUCATION INFORMATION TECHNOLOGY CO., LTD. Assignor: Nanjing University of Aeronautics and Astronautics Contract record no.: 2013320000314 Date of cancellation: 20150430 |
|
LICC | Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model |