CN117593170A - A video memory allocation method, device, equipment and readable storage medium - Google Patents
- Publication number: CN117593170A
- Application number: CN202311601081.6A
- Authority
- CN
- China
- Prior art keywords
- video memory
- container
- gpu
- task
- host
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/4557—Distribution of virtual machine instances; Migration and load balancing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Technical Field
The present invention relates to the field of computer application technology, and in particular to a video memory allocation method, apparatus, device and readable storage medium.
Background
In a CPU+GPU computing architecture, the GPU (graphics processing unit) mainly carries compute-intensive workloads in order to accelerate computation. In such an architecture, multiple tasks running on the same GPU compete for GPU resources; if the remaining resources are insufficient to run a task, the task fails and exits. In other words, the lack of effective, orderly management of GPU resources makes multi-task GPU sharing chaotic and disorderly, increasing the task failure rate.
Currently, GPU resources are usually managed with k8s (Kubernetes, an application for managing cluster resources) or a container management system, combined with a video memory pre-allocation technique: the video memory allocation API of the CUDA driver (CUDA: Compute Unified Device Architecture, used to run parallel computation on general-purpose computing devices, i.e., GPUs) is hijacked, and at allocation time it is checked whether the free video memory in the container can satisfy the size of the requested allocation, on which basis video memory is allocated.
However, this approach has several problems. When counting the video memory usage of the GPU processes in a container, implicitly allocated video memory is often missed, so the processes in the container may actually use more video memory than their quota, affecting task execution. In multi-card, multi-task scenarios, managing the video memory of each GPU card in the container becomes more complicated; in particular, if a user sets the CUDA_VISIBLE_DEVICES parameter when running a GPU task, the task changes the GPU index, confusing the video memory control logic. Furthermore, the nvidia-smi command shipped with the NVIDIA driver cannot display GPU process information inside a container; to check on their own tasks, users have to log in to the host, run the command there, and then work out which of all the listed tasks belong to their own container. When many containers and GPU tasks share a GPU card, this is very cumbersome, and it also requires granting ordinary users host permissions, which lets ordinary users see each other's tasks and poses a security risk.
In summary, how to effectively solve problems such as video memory allocation is a technical problem that those skilled in the art urgently need to solve.
Summary of the Invention
The object of the present invention is to provide a video memory allocation method, apparatus, device and readable storage medium that can obtain accurate container video memory usage, avoid granting host permissions to users, and effectively manage multi-card, multi-task scenarios through a process identifier relationship table, so that video memory can be allocated effectively and in an orderly manner.
To solve the above technical problems, the present invention provides the following technical solutions:
A video memory allocation method, including:
obtaining a task for which video memory is to be allocated;
obtaining the container process identifier of each process in a container, and querying the host process identifier of each process in the container according to a process identifier relationship table;
obtaining, based on the host process identifiers, the video memory usage of each process in the container;
summing the video memory usage of all processes in the container to obtain the total video memory usage of the container;
determining the remaining video memory of the container from the total video memory usage and the video memory quota of the container, and allocating video memory to the task based on the remaining video memory.
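The claimed steps can be sketched as follows. This is a minimal illustration only: the pid relationship table and the per-host-pid usage figures are hypothetical stand-ins for what the custom character device and the host-side management library would provide, not the patent's actual implementation.

```python
# Minimal sketch of the claimed allocation flow. pid_table and
# host_usage_mb are hypothetical stand-ins for the process identifier
# relationship table and host-side (e.g. NVML-reported) per-pid usage.

def allocate(request_mb, quota_mb, pid_table, host_usage_mb):
    """Decide whether a task's video memory request fits the container quota.

    pid_table:     {container_pid (pid1): host_pid (pid2)}
    host_usage_mb: {host_pid: used_mb} as seen on the host
    """
    # Map container pids to host pids and look up each process's usage.
    usages = [host_usage_mb.get(pid2, 0) for pid2 in pid_table.values()]
    # Total usage of the container is the sum over all its processes.
    total_used = sum(usages)
    # Remaining = quota - total used; allocate only if the request fits.
    remaining = quota_mb - total_used
    if request_mb <= remaining:
        return True, remaining - request_mb   # allocation succeeds
    return False, remaining                   # OOM: task returns with an error

# Example: 8192 MB quota, two processes already using 3000 + 2000 MB.
ok, left = allocate(2000, 8192, {2: 14456, 3: 14457},
                    {14456: 3000, 14457: 2000})
```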
Preferably, obtaining the process identifier relationship table includes:
obtaining the container process identifier of each process in the container;
calling the open function to open a mounted custom character device;
calling the invocation interface function of the custom character device with input parameters to obtain a return result, where the parameters are a predefined instruction and the container process identifier of each process in the container, and the return result is the host process identifier of each process;
writing the container process identifier and host process identifier of each process into the process identifier relationship table as a pair.
Preferably, obtaining the container process identifier of each process in the container includes:
obtaining the GPU card information in the container;
using the GPU card information, obtaining the container process identifiers of all processes running on the GPU card.
Preferably, obtaining, based on the host process identifiers, the video memory usage of each process in the container includes:
obtaining the GPU card information in the container;
using the GPU card information, obtaining the video memory information of all processes running on the GPU card;
finding, from the video memory information, the video memory usage corresponding to each host process identifier.
Preferably, the method further includes:
reading the process identifier relationship table with a GPU process display tool;
obtaining the video memory quota;
obtaining the number of GPUs and the GPU unique identifiers in the container;
iterating, based on the number of GPUs and the GPU unique identifiers, over each GPU in the container to obtain the information of each process on it;
matching the process identifier relationship table against the GPU process information returned by calling the management library function, and recording the video memory usage of each host process identifier;
performing, based on the process identifier relationship table, parameter replacement on the result returned by the computer running information viewing function;
obtaining, after the parameter replacement is completed, the video memory usage information of the container, where the video memory usage information includes the video memory usage of each process, the video memory quota of the container, and the remaining capacity of the container.
Performing, based on the process identifier relationship table, parameter replacement on the result returned by the computer running information viewing function includes:
replacing, based on the process identifier relationship table, each host process identifier in the returned result with the corresponding container process identifier;
replacing the in-container video memory usage in the returned result with the sum of the video memory used by all processes in the container;
replacing the total video memory amount in the returned result with the video memory quota.
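The parameter replacement described above can be sketched as below. The dictionary layout of the parsed host-side result is a hypothetical stand-in for parsed nvidia-smi/management-library output; only the three substitutions named in the claim (pid, in-container usage, total memory) are illustrated.

```python
# Sketch of the "parameter replacement" step: rewrite a parsed host-side
# GPU status result so that it shows container-local values. The result
# structure is a hypothetical stand-in for parsed tool output.

def rewrite_result(result, pid_table, quota_mb):
    host_to_container = {pid2: pid1 for pid1, pid2 in pid_table.items()}
    # Keep only this container's processes and swap each pid2 for pid1.
    procs = [{"pid": host_to_container[p["pid"]], "used_mb": p["used_mb"]}
             for p in result["processes"] if p["pid"] in host_to_container]
    return {
        "processes": procs,
        # In-container usage becomes the sum over the container's processes.
        "used_mb": sum(p["used_mb"] for p in procs),
        # The total video memory is replaced by the container's quota.
        "total_mb": quota_mb,
    }

host_result = {"processes": [{"pid": 14456, "used_mb": 3000},
                             {"pid": 14457, "used_mb": 2000},
                             {"pid": 20001, "used_mb": 512}],  # other container
               "used_mb": 5512, "total_mb": 40960}
view = rewrite_result(host_result, {2: 14456, 3: 14457}, 8192)
```

Note how the other container's process (pid 20001) disappears from the rewritten view, which is what lets users inspect only their own tasks without host permissions.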
Preferably, obtaining the task for which video memory is to be allocated includes:
hijacking the video memory allocation function to obtain the task.
Preferably, allocating video memory to the task based on the remaining video memory includes:
parsing the task to obtain the requested video memory size;
determining whether the remaining video memory is greater than or equal to the requested size;
if so, calling the video memory allocation function to allocate video memory to the task;
if not, determining that the memory has overflowed, and returning from the task with an error.
A video memory allocation apparatus, including:
a task acquisition unit, configured to obtain a task for which video memory is to be allocated;
an identifier conversion unit, configured to obtain the container process identifier of each process in a container, and to query the host process identifier of each process in the container according to a process identifier relationship table;
a video memory usage acquisition unit, configured to obtain, based on the host process identifiers, the video memory usage of each process in the container; sum the video memory usage of all processes in the container to obtain the total video memory usage of the container; and determine the remaining video memory of the container from the total video memory usage and the video memory quota of the container;
a video memory allocation unit, configured to allocate video memory to the task based on the remaining video memory.
An electronic device, including:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the above video memory allocation method when executing the computer program.
A readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the above video memory allocation method.
By applying the method provided by the embodiments of the present invention: a task for which video memory is to be allocated is obtained; the container process identifier of each process in the container is obtained, and the host process identifier of each process in the container is queried according to the process identifier relationship table; the video memory usage of each process in the container is obtained based on the host process identifiers; the video memory usage of all processes in the container is summed to obtain the total video memory usage of the container; the remaining video memory available to the container is determined from the total video memory usage and the container's video memory quota, and video memory is allocated to the task based on the remaining video memory.
After the task to be allocated video memory is obtained, the container process identifier of each process in the container is first obtained, and then the host process identifier of each process is looked up in the identifier relationship table. Using the host process identifiers, the video memory usage of each process can be obtained. By accumulating the video memory usage of all processes in the container, the actual total video memory usage of the container is obtained. From the total video memory usage and the video memory quota, the remaining video memory available to the container can be determined, and video memory can then be allocated to the task based on it.
Technical effects: by obtaining the host process identifier of every process, the video memory usage of each process can be obtained, and summing the usage of all processes yields an accurate container video memory usage. In the process of obtaining the video memory usage, users do not need to log in to the host, so host permissions need not be granted to users, avoiding security risks; and multi-card, multi-task scenarios can be managed effectively through the process identifier relationship table. In other words, the present invention can achieve effective and orderly allocation of video memory.
Correspondingly, embodiments of the present invention also provide a video memory allocation apparatus, device and readable storage medium corresponding to the above video memory allocation method, which have the above technical effects and are not described again here.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present invention or in the related art more clearly, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Figure 1 is an implementation flow chart of a video memory allocation method in an embodiment of the present invention;
Figure 2 is a schematic diagram of a host+GPU software system;
Figure 3 is a schematic diagram of a host+GPU software system in an embodiment of the present invention;
Figure 4 is a timing diagram of a specific implementation of a video memory allocation method in an embodiment of the present invention;
Figure 5 is a schematic structural diagram of a video memory allocation apparatus in an embodiment of the present invention;
Figure 6 is a schematic structural diagram of an electronic device in an embodiment of the present invention;
Figure 7 is a schematic diagram of a specific structure of an electronic device in an embodiment of the present invention.
Detailed Description
To enable those skilled in the art to better understand the solution of the present invention, the present invention is further described in detail below with reference to the drawings and specific embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present invention.
Please refer to Figure 1, which is a flow chart of a video memory allocation method in an embodiment of the present invention. The method can be applied in a container in a GPU+CPU computing architecture and includes the following steps:
S101. Obtain a task for which video memory is to be allocated.
When a user submits a task that needs to use the GPU, it can be determined that a task to be allocated video memory has been obtained. For example, when the user submits a GPU task to the container, the task to be allocated video memory is obtained.
In a specific implementation of the present invention, obtaining the task to be allocated video memory includes: hijacking the video memory allocation function to obtain the task. That is, the task can be obtained by hijacking the video memory allocation function. For the details of how function hijacking is implemented, refer to the relevant hijacking methods, which are not described one by one here.
S102. Obtain the container process identifier of each process in the container, and query the host process identifier of each process in the container according to the process identifier relationship table.
Here, the container is the one that has received the task for which video memory is to be allocated. In a GPU+CPU computing architecture, when a container receives a task that requires video memory allocation, it can obtain the container process identifier of each of its processes and then consult the process identifier relationship table to find the host process identifiers of those processes on the host.
It should be noted that the same process has a unique identifier in the container, namely the container process identifier, denoted pid1, and a unique identifier on the host, namely the host process identifier, denoted pid2 (i.e., the host pid).
The container process identifier and host process identifier of each process can be stored in the process identifier relationship table in advance.
By querying the process identifier relationship table, the host process identifier of each process in the container can be obtained quickly. Compared with having the container communicate with the host, determining each process's host process identifier this way is more convenient and faster.
Specifically, the process identifier relationship table can be built by having the container communicate with the host and storing the host process identifiers once obtained. Alternatively, a custom character device can be developed and mounted; by calling the corresponding interface, the host process identifiers can be obtained and stored.
In a specific implementation of the present invention, obtaining the process identifier relationship table includes:
obtaining the container process identifier of each process in the container;
calling the open function to open the mounted custom character device;
calling the invocation interface function of the custom character device with input parameters to obtain a return result, where the parameters are a predefined instruction and the container process identifier of each process in the container, and the return result is the host process identifier of each process;
writing the container process identifier and host process identifier of each process into the process identifier relationship table as a pair.
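The table-building steps can be sketched as follows. In the described embodiment the lookup would go through an ioctl-style call on the mounted custom character device; here a plain callable stands in for that call, and the mocked answers are illustrative only.

```python
# Sketch of building the process identifier relationship table. The
# query_host_pid callable is a hypothetical stand-in for opening the
# custom character device and invoking its interface function with a
# predefined instruction and a container pid.

def build_pid_table(container_pids, query_host_pid):
    """Pair each container pid (pid1) with its host pid (pid2)."""
    table = {}
    for pid1 in container_pids:
        pid2 = query_host_pid(pid1)   # stands in for the char-device call
        if pid2 is not None:
            table[pid1] = pid2        # write the (pid1, pid2) pair
    return table

# Mocked device answers, in the shape the real device would return them.
answers = {2: 14456, 3: 14457}
table = build_pid_table([2, 3], answers.get)
```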
For ease of description, the above steps are described together below.
Generally, because a namespace isolation mechanism (Linux pid namespace) is used inside a container or pod (the smallest unit of execution managed by k8s), the pid2 (pid: process ID, process identifier) of a process (application) on the host cannot be seen from inside the container. Based on this, in an embodiment of the present invention, a character device can be customized whose main function is, given an input instruction and a GPU process's pid1 inside the container, to return that process's pid2 on the host. Obtaining the host process identifier via a custom character device is simple to implement and use and can conveniently be integrated into the video memory allocation control function library; compared with real-time communication between the host and the container, it is easy to deploy and performs well.
Here, obtaining the container process identifier of each process in the container includes:
obtaining the GPU card information in the container;
using the GPU card information, obtaining the container process identifiers of all processes running on the GPU card.
Specifically, the GPU card information can be obtained through the cuda context (the context of the Compute Unified Device Architecture), the cuda uuid and the GPU uuid; based on the GPU card information, the container process identifiers of all processes running on the GPU card can then be obtained.
In practice, when a new task is created in the container, the process correspondence for this new task has not yet been established in the relationship table, so the custom character device needs to be called to create the correspondence and update the table. When a task process in the container ends, that process's entry must be deleted from the relationship table. In this way, the relationship information in the table is guaranteed to correspond to the current processes in real time.
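Keeping the table current, as just described, amounts to two operations; the sketch below uses the same hypothetical query callable as before for the device lookup.

```python
# Sketch of relationship-table maintenance: add a pair when a new task
# process starts in the container, drop it when the process exits. The
# query_host_pid callable is a hypothetical stand-in for the char device.

def on_task_start(table, pid1, query_host_pid):
    table[pid1] = query_host_pid(pid1)   # create the new correspondence

def on_task_exit(table, pid1):
    table.pop(pid1, None)                # remove the finished process's entry

table = {2: 14456}
on_task_start(table, 3, {3: 14457}.get)  # new task appears
on_task_exit(table, 2)                   # old task finishes
```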
S103. Obtain, based on the host process identifiers, the video memory usage of each process in the container.
After the host process identifier is obtained, the actual video memory usage of the corresponding process can be obtained from it. That is, based on the host process identifiers, the video memory usage of all processes in the container can be obtained.
In a specific implementation of the present invention, obtaining, based on the host process identifiers, the video memory usage of each process in the container includes:
obtaining the GPU card information in the container;
using the GPU card information, obtaining the video memory information of all processes running on the GPU card;
finding, from the video memory information, the video memory usage corresponding to each host process identifier.
Here, the video memory information may specifically be how much video memory is used; for example, a process uses 10 GB of video memory.
Specifically, when the container uses one or more GPU cards, the GPU card information can first be obtained; then, based on the GPU card information, the video memory information of all processes running on the GPU cards is obtained, and the video memory usage corresponding to each host process identifier is found from it. In other words, the video memory usage can be obtained through the relationship between the container and the GPU, together with each process's pid2.
For example, suppose there are two application processes in container A, whose pid1 inside the container are 2 and 3 respectively, and whose corresponding pid2 on the host are 14456 and 14457. This relationship can be recorded in a two-dimensional table such as:

pid1  pid2
2     14456
3     14457

Since NVML runs on the host, only the video memory information used by 14456 and 14457 can be obtained there. Inside container A, to find out how much video memory container A's processes have already used, first query pid1=2 to obtain its process's pid2=14456, then obtain via NVML the video memory value used by 14456 (i.e., its video memory usage). Querying the video memory used by the process with pid1=3 works in the same way. In this way, the total video memory used by all processes in container A can be obtained.
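The container A example can be worked through in code. The usage figures are illustrative (the source does not give them), and the per-pid2 usage dictionary stands in for what NVML would report on the host.

```python
# Container A's example in code: look up each container pid's host pid
# in the relationship table, fetch that host pid's usage (as NVML on
# the host would report it; figures are illustrative), and sum.

pid_table = {2: 14456, 3: 14457}            # pid1 -> pid2
nvml_usage_mb = {14456: 3000, 14457: 2000}  # illustrative per-pid2 usage

# pid1=2 -> pid2=14456 -> 3000 MB; pid1=3 -> pid2=14457 -> 2000 MB.
total_mb = sum(nvml_usage_mb[pid_table[pid1]] for pid1 in pid_table)
```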
S104. Sum the video memory usage of all processes in the container to obtain the total video memory usage of the container.
The sum of the video memory usage of all processes in the container is the total video memory actually used by the container.
It should be noted that because the video memory usage of all processes in the container is summed, the result is the actual total video memory usage of the container, rather than a figure for the container's GPU process usage that ignores implicitly allocated video memory.
S105. Determine the remaining video memory of the container from the total video memory usage and the container's video memory quota, and allocate video memory to the task based on the remaining video memory.
Here, the video memory quota may be the amount of video memory allocated to the container when it was created; for example, if a container's video memory quota is 8 GB, the processes in the container can use at most 8 GB of video memory in total.
After the total video memory usage is obtained, subtracting it from the container's video memory quota gives the remaining video memory available to the container.
Once the container's remaining video memory is known, video memory can be allocated to the task based on it.
In a specific embodiment of the present invention, allocating video memory to the task based on the remaining video memory includes:
parsing the task to obtain the requested video memory size;
determining whether the remaining video memory is greater than or equal to the requested size;
if so, calling the video memory allocation function to allocate video memory to the task;
if not, determining that memory has overflowed, and returning an error for the task.
That is, the video memory size requested by the task is determined first. The remaining video memory is then compared with that size. If the remaining video memory is greater than or equal to the requested size, the current remaining video memory satisfies the allocation, and the video memory allocation function can be called to allocate video memory to the task. If the remaining video memory is less than the requested size, the current remaining video memory cannot satisfy the allocation, a memory overflow is determined, and OOM (out of memory, here a video memory overflow) is returned.
By applying the method provided by this embodiment of the present invention: a task awaiting video memory allocation is obtained; the container process identifier of each process in the container is obtained, and the host process identifier of each process is looked up in the process identifier relationship table; the video memory usage of each process in the container is obtained based on its host process identifier; the usage of all processes in the container is summed to obtain the container's total video memory usage; the remaining video memory available to the container is determined from the total usage and the container's video memory quota, and video memory is allocated to the task based on the remaining video memory.
After the task awaiting video memory allocation is obtained, the container process identifier of each process in the container is first obtained, and the host process identifier of each process is then found from the identifier relationship table. With the host process identifiers, the video memory usage of each process can be obtained. By accumulating the usage of all processes in the container, the container's actual total video memory usage is obtained. From the total usage and the quota, the container's remaining video memory is determined. Based on that remaining video memory, video memory can be allocated to the task.
Technical effects: obtaining each process's video memory usage via the host process identifiers and summing the usage of all processes yields an accurate figure for the container's video memory usage; the usage is obtained without requiring users to log in to the host, so host permissions need not be granted to users, avoiding security risks; and the process identifier relationship table effectively manages multi-card, multi-task scenarios. In other words, the present invention enables effective and orderly video memory allocation.
It should be noted that, based on the above embodiments, the embodiments of the present invention further provide corresponding improvements. Steps in the preferred/improved embodiments that are the same as or correspond to steps in the above embodiments may be cross-referenced, as may the corresponding beneficial effects; they are not repeated one by one in the preferred/improved embodiments herein.
In a specific embodiment of the present invention, the user may also be provided with a function for viewing the container's video memory usage. The specific implementation process includes:
reading the process identifier relationship table with the GPU process display tool;
obtaining the video memory quota;
obtaining the number of GPUs in the container and the GPU unique identifiers;
based on the number of GPUs and the GPU unique identifiers, iterating over each GPU in the container to obtain the information of each of its processes;
matching the process identifier relationship table against the GPU process information returned by the management library functions, and recording the video memory usage of each host process identifier;
based on the process identifier relationship table, performing parameter replacement on the result returned by the machine running-information viewing function;
after the parameter replacement is completed, obtaining the container's video memory usage information, where the usage information includes the video memory usage of every process, the container's video memory quota, and the container's remaining capacity.
Performing parameter replacement on the result returned by the machine running-information viewing function based on the process identifier relationship table includes:
based on the process identifier relationship table, replacing the host process identifiers in the returned result with the corresponding container process identifiers;
replacing the in-container video memory usage in the returned result with the sum of the video memory used by all processes in the container;
replacing the total video memory in the returned result with the video memory quota.
For ease of description, the above steps are described together below.
In this embodiment, by analyzing the implementation logic and output format of nvidia-smi, a custom in-container GPU process display tool (inais-smi) can be defined. It reads the process identifier relationship table of GPU processes from the storage module, replaces each process pid2 returned by the nvml functions with the corresponding pid1 inside the container, replaces the video memory already used with the sum of the video memory used by all GPU processes in the container, and replaces the total video memory with the container's video memory quota. This provides statistics on GPU process information within the container, so users can see their own GPU process information inside the container without being granted host permissions. The method does not depend on changes to the nvidia-smi command, is more stable than directly hijacking the nvidia-smi command, and allows the output information to be customized.
Here, nvml is the NVIDIA Management Library, used to manage NVIDIA GPU devices; it is also referred to herein as the management library functions.
To help those skilled in the art better understand and implement the video memory allocation method provided by the embodiments of the present invention, the method is described in detail below with reference to specific application scenarios.
Please refer to Figures 2 and 3. Figure 2 is a schematic diagram of a host + GPU software system; Figure 3 is a schematic diagram of a host + GPU software system in an embodiment of the present invention. That is, Figure 2 shows a related host + GPU software system, and Figure 3 shows a host + GPU software system that applies the video memory allocation method provided by the embodiment of the present invention. Compared with the system shown in Figure 2, the system of Figure 3 adds the following modules:
Custom character device module: because containers and pods use a namespace isolation mechanism, a process's pid2 cannot be seen inside the container. The main function of the custom character device module is to return, inside the container, a process's pid2 on the host, given an input instruction and the GPU process's pid1.
Regarding the isolation mechanism: Linux uses a process identifier (pid) namespace isolation mechanism. Therefore, when an application/process is started inside a container, the application's pid1 is visible inside the container (for example, 1), but its pid2 on the host cannot be seen from inside the container.
In-container GPU process pid storage module: inside the container, the pid relationship of a GPU process can be obtained by calling the interface of the custom character device module, and the relationship table is saved in the storage module. The in-container GPU process pid storage module is readable and writable by all GPU processes within the container but cannot be accessed across containers, so it can be implemented with container-shared memory or files. The storage module avoids repeatedly parsing the container's GPU processes.
Here, the pid relationship stores, in the form of a relationship table, the host process pid2 corresponding to the pid1 of each application process in the container. That is, the pid relationship may concretely be a table. For example, if a process's pid1 in the container is 1 and the corresponding process pid2 on the host is 14456, the table entry may be: 1->14456. In use, the pid2 corresponding to a pid1 can then be obtained conveniently by table lookup, without parsing it separately each time.
In-container GPU process video memory control module: its main function is to ensure that the video memory of all GPU processes in the container does not exceed the container's video memory quota. Specifically, it hijacks the video memory allocation function in the cuda driver. When the allocation function is called, it first uses the cuda context, cuda uuid, and GPU uuid to identify the GPU card on which video memory is to be allocated; the GPU information obtained this way is accurate, and the approach suits containers holding multiple GPU cards. It then uses the corresponding nvml functions to obtain the video memory information of all tasks running on that GPU card. Finally, using the process identifier relationship table in the storage module together with the card's task information, it obtains the video memory information of the GPU processes in this container. Adding up the video memory used by all GPU processes in the container gives the container's GPU usage precisely; this figure includes implicitly allocated video memory such as each GPU process's cuda context. Combining the container's video memory quota with the size of the requested allocation, it judges whether the remaining video memory suffices: if so, the real cuda driver function is called to complete the allocation; if not, OOM is returned.
In-container GPU process statistics module: by analyzing the implementation logic and output format of nvidia-smi, a custom in-container GPU process display tool, inais-smi, is defined. It reads the GPU process pid relationship table (the process identifier relationship table, also abbreviated herein as the relationship table) from the storage module, replaces each process pid2 returned by the nvml functions with the corresponding pid1 inside the container, replaces the video memory already used in the container with the sum of the video memory used by all GPU processes in the container, and replaces the total video memory with the container's video memory quota. This realizes statistics on in-container GPU process information, so users can see their own GPU process information inside the container without being granted host permissions. The method does not depend on changes to the nvidia-smi command, is more stable than directly hijacking nvidia-smi, and allows the output information to be customized.
In a specific implementation, deployment may proceed according to the following steps.
First, the custom character device module: write the character device module's processing logic, chiefly defining the unlocked_ioctl interface (the custom character device's call interface function), and implement the logic for obtaining the host pid inside the container through kernel functions. When deploying the resource management platform, the module can be compiled against the specific system kernel version to produce the corresponding ko file (a Linux kernel module file, e.g., vcuda_dev.ko), which is installed with the insmod command (the Linux kernel module installation command).
Mount the vcuda_dev character device: specifically, when the user creates a container or POD through the resource management platform, the character device is mounted into the corresponding container; the corresponding docker parameter is --device (a docker command, for example: docker run -it --device /dev/vcuda_dev <container image>).
Mount the in-container GPU process statistics tool: the implementation logic and output format of nvidia-smi can be analyzed to define a custom in-container GPU process display tool, compiled into an executable such as inais-smi. When the user creates a container or POD through the resource management platform, the inais-smi executable is mounted under /usr/bin (a directory inside the container), after which the user can run the inais-smi command from any directory inside the container.
Mount the video memory allocation control function library and set its preferential loading logic: redefine the video-memory-allocation-related functions of the cuda driver, insert the video memory control logic at the start of each allocation function, and compile the whole allocation control logic into an so dynamic library, e.g., libvcuda.so (the name of the allocation control dynamic library). When the user creates a container or POD through the resource management platform, the library is mounted into the container, and parameters such as /etc/ld.so.preload or LD_PRELOAD (Linux dynamic loader mechanisms) are used to make the allocation control function library load preferentially.
Please refer to Figure 4, a timing diagram of a specific implementation of a video memory allocation method in an embodiment of the present invention: task startup, and the acquisition and storage of the GPU process pid table. When the user starts a GPU task inside a container or POD, the preferential loading logic set for the allocation control function library triggers the custom GPU process pid acquisition logic. First, the open function is called to open the mounted vcuda_dev character device; next, the ioctl function is called with the predefined instruction and the GPU process's pid inside the container as parameters, and the result returned is that process's host pid; finally, the GPU process pid relationship table is stored in the shared module.
In-container GPU process video memory control: when a GPU task runs and calls the GPU video memory allocation function, the preferential loading logic of libvcuda.so causes the allocation function in libvcuda.so to be called first, triggering the video memory control logic.
That is to say, in the present invention, the custom character device module makes it possible to obtain a GPU process's host pid inside the container by calling a custom ioctl instruction, which is simple and efficient. By hijacking the cuda driver's allocation function and using the conversion relationships among cuda context, cuda uuid, and GPU uuid, the video memory control problem for multiple GPU cards and multi-GPU tasks inside a container is solved. And based on the GPU process pid relationship table in the container's storage module, hijacking the output of the nvml functions together with the host-to-container pid conversion strategy solves the problem that users cannot obtain per-GPU-process statistics inside the container.
In addition, there is a container GPU process pid relationship table storage and filtering module. The storage module is readable and writable by every GPU process in the container and can be implemented with shared memory, files, and the like; the storage unit holds the container's GPU process pid relationship table, avoiding repeated parsing of the container's GPU processes. The filtering module compares the obtained pid information and deletes the pid entries of processes no longer running in the container. In other words, the filtering module removes invalid pid information, ensuring that the storage unit stays consistent with the programs actually running. For example, one may check whether each host pid of the container appears in nvml_table: if it does, the GPU information of the corresponding process in nvml_table is retained; if not, the task/process has ended and its entry is deleted from the storage module.
GPU card identification strategy during video memory allocation: allocation first requires determining the GPU card for the current allocation. When the container holds multiple GPU cards, a task may run on several of them; in another case, if the user sets the CUDA_VISIBLE_DEVICES parameter when starting the task, a plain GPU index may become invalid. With the method and strategy provided by the embodiments of the present invention, the cuda context of the current allocation is obtained, the cuda uuid is obtained from the cuda context, and finally the GPU uuid of this allocation is obtained via the conversion relationship between cuda uuid and GPU uuid; with the GPU uuid, the nvml functions can be called to obtain further GPU card information.
Video memory usage statistics strategy: allocation requires knowing the usage of video memory on the corresponding GPU card within the container. The size of implicitly allocated video memory such as the cuda context cannot be known directly, yet ignoring that portion easily causes the container quota to be exceeded. Therefore, when controlling allocation, the method and strategy provided by the embodiments of the present invention first obtain the GPU uuid through the card identification strategy, then call the nvmlDeviceGetHandleByUUID function to obtain the nvidia device, and finally call the nvmlDeviceGetComputeRunningProcesses function (a function for viewing process information) on that device to obtain the GPU process information of all users on the card. This information is maintained by the nvidia driver, so the video memory sizes are very accurate. Combined with the GPU process pid relationship table in the container's storage module, matching the host pids of the two yields precisely the total video memory used by all GPU processes in the container.
Container GPU process statistics and refresh strategy: the GPU information returned by the nvmlDeviceGetComputeRunningProcesses function is recorded as nvml_table; the GPU process pid relationship table is obtained from the storage module in the container, and each of the container's host pids is checked against nvml_table. If present, the GPU information of the corresponding process in nvml_table is retained; if absent, the task has ended and its entry is deleted from the storage module; finally the storage module is written back and refreshed.
In other words, the method and strategy provided by the embodiments of the present invention use the custom character device module to obtain GPU processes' host pids inside the container, and during allocation use the relationships among cuda context, cuda uuid, and GPU uuid to obtain the information of the real GPU card, solving the problems of multi-GPU-card tasks and the CUDA_VISIBLE_DEVICES special variable inside containers. At the same time, the pid relationship table in the storage module and the nvml functions are used to obtain the video memory occupancy of the corresponding GPU card in the container, achieving precise control over in-container video memory. Finally, hijacking the output of the nvml functions together with the host-to-container pid conversion strategy solves the problem that GPU process information cannot be counted and displayed inside the container.
Corresponding to the above method embodiments, the embodiments of the present invention further provide a video memory allocation apparatus; the apparatus described below and the method described above may be cross-referenced.
As shown in Figure 5, the apparatus includes the following modules:
a task acquisition unit 101, configured to obtain a task awaiting video memory allocation;
an identifier conversion unit 102, configured to obtain the container process identifier of each process in the container, and to look up the host process identifier of each process in the container according to the process identifier relationship table;
a video memory usage acquisition unit 103, configured to obtain the video memory usage of each process in the container based on the host process identifiers, sum the usage of all processes in the container to obtain the container's total video memory usage, and determine the container's remaining video memory from the total usage and the container's video memory quota;
a video memory allocation unit 104, configured to allocate video memory to the task based on the remaining video memory.
By applying the apparatus provided by this embodiment of the present invention: a task awaiting video memory allocation is obtained; the container process identifier of each process in the container is obtained, and the host process identifier of each process is looked up in the process identifier relationship table; the video memory usage of each process in the container is obtained based on its host process identifier; the usage of all processes in the container is summed to obtain the container's total video memory usage; the remaining video memory available to the container is determined from the total usage and the container's video memory quota, and video memory is allocated to the task based on the remaining video memory.
After the task awaiting video memory allocation is obtained, the container process identifier of each process in the container is first obtained, and the host process identifier of each process is then found from the identifier relationship table. With the host process identifiers, the video memory usage of each process can be obtained. By accumulating the usage of all processes in the container, the container's actual total video memory usage is obtained. From the total usage and the quota, the container's remaining video memory is determined. Based on that remaining video memory, video memory can be allocated to the task.
Technical effects: obtaining each process's video memory usage via the host process identifiers and summing the usage of all processes yields an accurate figure for the container's video memory usage; the usage is obtained without requiring users to log in to the host, so host permissions need not be granted to users, avoiding security risks; and the pid relationship table effectively manages multi-card, multi-task scenarios. In other words, the present invention enables effective and orderly video memory allocation.
在本发明的一种具体实施方式中,表建立单元,用于获取进程标识关系表,包括:获取容器内各个进程的容器进程标识符;调用打开函数打开挂载的自定义字符设备;调用自定义字符设备的调用接口函数,并输入参数,得到返回结果;其中,参数为预定义指令和容器内各个进程的容器进程标识符,返回结果为各个进程的主机进程标识符;将同一个进程的容器进程标识符和主机进程标识符成对写入进程标识关系表。In a specific implementation of the present invention, the table creation unit is used to obtain the process identification relationship table, including: obtaining the container process identifier of each process in the container; calling the open function to open the mounted custom character device; calling Define the calling interface function of the character device, and enter the parameters to get the return result; among them, the parameters are the predefined instructions and the container process identifier of each process in the container, and the return result is the host process identifier of each process; the same process's The container process identifier and the host process identifier are written into the process identification relationship table in pairs.
In a specific implementation of the present invention, the table creation unit is specifically used to obtain the GPU card information inside the container;
and to use the GPU card information to obtain the container process identifiers of all processes running on the GPU cards.
In a specific implementation of the present invention, the video memory usage acquisition unit is used to obtain the GPU card information inside the container;
use the GPU card information to obtain the video memory information of all processes running on the GPU cards;
and find, in that video memory information, the video memory usage corresponding to each host process identifier.
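The lookup step can be sketched as filtering the per-process records that a GPU management library (for example, NVML's per-device compute-process query) would return, keeping only this container's host pids. The `(host_pid, used_bytes)` record layout is an assumption for illustration.

```python
def usage_for_host_pids(gpu_process_info, host_pids):
    """Pick out, per host pid, its video memory usage from raw
    per-GPU process records (one record per process per GPU card)."""
    usage = {}
    for host_pid, used in gpu_process_info:
        if host_pid in host_pids:
            # A process may appear on several cards; sum across them.
            usage[host_pid] = usage.get(host_pid, 0) + used
    return usage

# Records from two cards; pid 9999 belongs to some other container.
records = [(9001, 100), (9002, 200), (9001, 50), (9999, 999)]
print(usage_for_host_pids(records, {9001, 9002}))
```

Filtering by host pid is what keeps one container's accounting from counting another container's processes on the same shared card.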
In a specific implementation of the present invention, the video memory viewing unit is used, with a GPU process display tool, to read the process identifier relationship table;
obtain the video memory quota;
obtain the number of GPUs in the container and their unique GPU identifiers;
based on the GPU count and the unique GPU identifiers, loop over every GPU in the container and obtain the information of every process on it;
match the process identifier relationship table against the GPU process information returned by the management library function, and record the video memory usage of each host process identifier;
based on the process identifier relationship table, perform parameter replacement on the result returned by the computer running-information viewing function;
after the parameter replacement is complete, obtain the container's video memory usage information, which includes the video memory usage of each process, the container's video memory quota, and the container's remaining capacity.
Performing parameter replacement on the result returned by the computer running-information viewing function, based on the process identifier relationship table, includes:
replacing each host process identifier in the returned result with the corresponding container process identifier;
replacing the in-container video memory usage in the returned result with the sum of the video memory used by all processes in the container;
and replacing the total video memory amount in the returned result with the video memory quota.
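The three replacements can be sketched over a dictionary standing in for the parsed output of the viewing tool. The field names (`total`, `processes`, `used`, `free`) are assumptions for illustration, not the real tool's output format.

```python
def rewrite_view(raw, pid_table, quota):
    """Rewrite a host-side memory view into a container-side view.

    raw:       {"total": bytes, "processes": [{"pid": host_pid, "used": bytes}, ...]}
    pid_table: {container_pid: host_pid} (the identifier relationship table)
    """
    host_to_container = {h: c for c, h in pid_table.items()}
    procs = [
        # Replacement 1: host pid -> container pid.
        {"pid": host_to_container[p["pid"]], "used": p["used"]}
        for p in raw["processes"]
        if p["pid"] in host_to_container  # keep only this container's processes
    ]
    used = sum(p["used"] for p in procs)  # Replacement 2: sum of container processes.
    return {"total": quota,               # Replacement 3: quota shown as the total.
            "used": used,
            "free": quota - used,
            "processes": procs}

raw = {"total": 24_000, "processes": [{"pid": 9001, "used": 300},
                                      {"pid": 9999, "used": 500}]}
print(rewrite_view(raw, {101: 9001}, quota=1_000))
```

The user inside the container thus sees container pids and the quota as the apparent card capacity, with other containers' processes filtered out.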
In a specific implementation of the present invention, the task acquisition unit is specifically used to hijack the video memory allocation function in order to obtain the task.
In a specific implementation of the present invention, the video memory allocation unit is specifically used to parse the task and obtain the requested video memory size;
determine whether the remaining video memory is greater than or equal to that size;
if so, call the video memory allocation function to allocate video memory to the task;
if not, determine that a memory overflow has occurred and return an error for the task.
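The decision above reduces to a guard in front of the real allocator. The wrapper below is a sketch, not the patent's implementation: `real_alloc` stands in for the hijacked allocation function (in a CUDA setting this would be the intercepted driver allocation call), and the bookkeeping is a plain byte counter.

```python
class QuotaAllocator:
    """Guard a real allocator with a per-container video memory quota."""

    def __init__(self, quota, real_alloc):
        self.quota = quota
        self.used = 0
        self.real_alloc = real_alloc  # the hijacked allocation function

    def alloc(self, size):
        if self.quota - self.used >= size:  # remaining >= requested size?
            handle = self.real_alloc(size)  # forward to the real allocator
            self.used += size
            return handle
        # Otherwise the request overflows the quota: report an error.
        raise MemoryError("video memory quota exceeded")

alloc = QuotaAllocator(quota=1024, real_alloc=lambda size: object())
alloc.alloc(512)   # fits
alloc.alloc(512)   # fits exactly, quota now exhausted
try:
    alloc.alloc(1)
except MemoryError as exc:
    print("denied:", exc)
```

Raising the error before touching the real allocator is what turns a driver-level out-of-memory crash into a clean, early task failure.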
Corresponding to the method embodiments above, an embodiment of the present invention further provides an electronic device; the electronic device described below and the video memory allocation method described above may be cross-referenced.
As shown in Figure 6, the electronic device includes:
a memory 332 for storing a computer program;
a processor 322 configured, when executing the computer program, to implement the steps of the video memory allocation method of the method embodiments above.
In a specific implementation, the electronic device may include a processor and a memory, with a GPU connected to the device (e.g., a plugged-in GPU).
In another specific implementation, the electronic device may include a processor, a memory, and a GPU (not shown in Figure 6). Specifically, refer to Figure 7, a structural diagram of an electronic device provided by this embodiment. The electronic device may vary considerably with configuration and performance, and may include one or more processors (central processing units, CPUs) 322, a memory 332, and a GPU (not drawn in Figure 7), the memory 332 storing one or more computer programs 342 or data 344. The memory 332 may be transient or persistent storage. The program stored in the memory 332 may include one or more modules (not shown in the figure), each of which may include a series of instruction operations on the data processing device. Further, the processor 322 may be configured to communicate with the memory 332 and execute, on the electronic device 301, the series of instruction operations in the memory 332.
The electronic device 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341.
The steps of the video memory allocation method described above can be implemented by this electronic device structure.
Corresponding to the method embodiments above, an embodiment of the present invention further provides a readable storage medium; the readable storage medium described below and the video memory allocation method described above may be cross-referenced.
A readable storage medium stores a computer program which, when executed by a processor, implements the steps of the video memory allocation method of the method embodiments above.
The readable storage medium may specifically be any readable storage medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be cross-referenced. Since the apparatus disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief; for relevant details, refer to the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms comprise, include, or any variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device comprising a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device.
Specific examples have been used herein to explain the principles and implementations of the present invention; the description of the above embodiments is only intended to help in understanding the method of the present invention and its core idea. At the same time, those of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be understood as limiting the present invention.
Claims (10)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202311601081.6A | 2023-11-28 | 2023-11-28 | A video memory allocation method, device, equipment and readable storage medium |
Publications (1)

| Publication Number | Publication Date |
| --- | --- |
| CN117593170A | 2024-02-23 |
Family
ID=89911272
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202311601081.6A | A video memory allocation method, device, equipment and readable storage medium | 2023-11-28 | 2023-11-28 |
Legal Events

| Date | Code | Title | Description |
| --- | --- | --- | --- |
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |