CN119003098B - Method, computing device, medium and program product for building a trusted execution environment - Google Patents
Method, computing device, medium and program product for building a trusted execution environment
- Publication number: CN119003098B (application CN202411487427.9A)
- Authority: CN (China)
- Prior art keywords: gpu, memory, hardware, execution environment, trusted execution
- Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45562—Creating, deleting, cloning virtual machine instances
- G06F2009/45587—Isolation or security of virtual machine instances
Abstract
The present invention relates to the field of artificial intelligence, and to a method, computing device, medium and program product for constructing a trusted execution environment. The method comprises: configuring a plurality of virtual machines on a host side; isolating the plurality of virtual machines; and configuring a hardware-isolated GPU instance on a device side so as to form an independent hardware access path with at least one virtual machine of the configured plurality of virtual machines, for constructing a trusted execution environment supporting confidential computing. The invention can effectively construct a trusted execution environment supporting confidential computing.
Description
Technical Field
Embodiments of the present invention relate generally to the field of artificial intelligence and, more particularly, relate to a method, computing device, computer readable storage medium and computer program product for building a trusted execution environment.
Background
A trusted execution environment (Trusted Execution Environment, TEE) is a key element of confidential computing. Demand for confidential computing is growing, so it is becoming increasingly important to build trusted execution environments that can effectively support it.
While some encryption techniques exist today to protect data at rest in storage and data transmitted over a network, conventional computing devices (e.g., computing devices used for artificial intelligence workloads) often have difficulty building trusted execution environments sufficient to support confidential computing. The main reason is that confidential computing must protect data and applications while they are in use, whereas a traditional computing device mainly comprises a host (e.g., a central processing unit, Central Processing Unit, or "CPU") and a device (e.g., a graphics processor, Graphics Processing Unit, or "GPU"); meeting this requirement demands both that the device-side hardware satisfy stringent security isolation requirements and that specific code in the application be enabled, which is difficult to achieve.
In summary, conventional computing devices suffer from the disadvantage that it is difficult to build a trusted execution environment that supports confidential computing.
Disclosure of Invention
The present invention provides a method, computing device, computer-readable storage medium and computer program product for building a trusted execution environment that is capable of efficiently building a trusted execution environment that supports confidential computations.
According to a first aspect of the present invention, a method for building a trusted execution environment is provided. The method comprises: configuring a plurality of virtual machines on a host side; isolating the plurality of virtual machines; and configuring, on a device side, hardware-isolated GPU instances so as to form an independent hardware access path with at least one virtual machine of the configured plurality of virtual machines, for constructing a trusted execution environment supporting confidential computing.
In some embodiments, configuring the hardware-isolated GPU instance includes isolating an address space of hardware on the device side and isolating memory, command processors, and compute cores associated with the configured GPU instance from each other.
In some embodiments, building a trusted execution environment supporting confidential computing includes configuring a plurality of GPU instances that are hardware isolated from one another, building the trusted execution environment based on independent hardware access paths formed by at least one of the plurality of GPU instances and a corresponding virtual machine of a plurality of virtual machines, and managing data flow and memory access within at least one GPU instance used to build the trusted execution environment.
In some embodiments, managing data flow and memory access within at least one GPU instance used to build the trusted execution environment includes: in a process run by the GPU instance, sending a command with an address space identifier to a command queue so that a master module in the GPU instance obtains the command; the master module processing the command to initiate an access request to memory; a system memory management unit looking up a context match table via the address space identifier to obtain a valid context descriptor, and thereby a page table entry, the valid context descriptor indicating the GPU instance; the system memory management unit sending a memory request to a physical address generator, the memory request indicating a system memory management unit address; and converting the system memory management unit address into a target physical address of the memory via the physical address generator.
In some embodiments, managing data flow and memory access within at least one GPU instance used to build the trusted execution environment includes: sending, via a host, an access request for a device-side memory, the access request indicating a device-side PCIe address; translating, via a device-side host portal, the PCIe address into a system memory management unit address to provide the system memory management unit address to a physical address generator; and converting, via the physical address generator, the system memory management unit address into a target physical address of the device-side memory.
In some embodiments, isolating the address space of the device-side hardware includes partitioning the system address space, PCIe address space, high-speed bus addresses, and low-speed bus addresses so that they map to the configured GPU instances.
In some embodiments, building the trusted execution environment based on independent hardware access paths formed by at least one of the plurality of GPU instances and a corresponding virtual machine of the plurality of virtual machines includes: forming one hardware access path from each of the plurality of GPU instances and a corresponding virtual machine of the plurality of virtual machines, thereby forming a plurality of hardware access paths; or forming, via multiplexing, a plurality of hardware access paths from one of the plurality of GPU instances and a plurality of corresponding virtual machines.
In some embodiments, building the trusted execution environment based on independent hardware access paths formed by at least one of the plurality of GPU instances and corresponding virtual machines of the plurality of virtual machines includes enabling the corresponding virtual machine forming a hardware access path to directly access the at least one GPU instance, and configuring separate memory space, interrupts, and direct memory access data transfers for the formed hardware access path.
In some embodiments, partitioning the system address space, PCIe address space, high-speed bus addresses, and low-speed bus addresses to map to the configured GPU instances includes virtualizing the PCIe address space into a plurality of blocks and mapping the virtualized blocks to the configured GPU instances, respectively.
According to a second aspect of the present invention, there is also provided a computing device. The computing device includes at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the computing device to perform the method of the first aspect of the invention.
According to a third aspect of the present invention, there is also provided a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a machine, performs the method of the first aspect of the invention.
According to a fourth aspect of the present invention there is also provided a computer program product comprising a computer program which when executed by a machine performs the method of the first aspect of the present invention.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
The above and other features, advantages and aspects of embodiments of the present invention will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements.
FIG. 1 schematically illustrates a schematic diagram of a computing device for implementing a method of building a trusted execution environment according to an embodiment of the present invention.
FIG. 2 illustrates a flow chart of a method for building a trusted execution environment according to an embodiment of the present invention.
FIG. 3 illustrates a schematic diagram of a trusted execution environment according to some embodiments of the present invention.
FIG. 4 illustrates a flowchart of a method for managing data flow and memory access within a GPU instance according to some embodiments of the present invention.
FIG. 5 illustrates a schematic diagram of a method for managing data flow and memory access within a GPU instance according to some embodiments of the present invention.
FIG. 6 illustrates a flowchart of a method for managing data flow and memory access within a GPU instance according to further embodiments of the present invention.
FIG. 7 schematically illustrates a schematic diagram of a PCIe SR-IOV device in accordance with some embodiments of the invention.
Like or corresponding reference characters indicate like or corresponding parts throughout the several views.
Detailed Description
Preferred embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present invention are illustrated in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
The term "comprising" and variations thereof as used herein means open ended, i.e., "including but not limited to. The term "or" means "and/or" unless specifically stated otherwise. The term "based on" means "based at least in part on". The terms "one example embodiment" and "one embodiment" mean "at least one example embodiment. The term "another embodiment" means "at least one additional embodiment". The terms "first," "second," and the like, may refer to different or the same object.
As described above, conventional computing devices suffer from the disadvantage that it is difficult to build a trusted execution environment that supports confidential computing.
To at least partially address one or more of the above problems, as well as other potential problems, example embodiments of the present invention provide a method for building a trusted execution environment. In the method, a plurality of virtual machines are configured on a host side; the plurality of virtual machines are isolated; and hardware-isolated GPU instances are configured on the device side so as to form independent hardware access paths with at least one of the configured virtual machines, for constructing a trusted execution environment supporting confidential computing. According to the invention, the virtual machines configured on the host and the hardware-isolated GPU instances configured on the device side cooperatively form independent hardware access paths, so that hardware-isolated GPU instances can be provided to multiple users sharing a single device. This preserves device sharing while significantly improving security isolation, and effectively protects data and applications in use. The invention thus enables efficient construction of a trusted execution environment supporting confidential computing.
FIG. 1 schematically illustrates a schematic diagram of a computing device 100 for implementing a method of building a trusted execution environment according to an embodiment of the present invention. As shown in FIG. 1, computing device 100 may have one or more processing units, including special-purpose processing units such as a graphics processor (Graphics Processing Unit, GPU), a field-programmable gate array (Field Programmable Gate Array, FPGA), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or a general-purpose graphics processor (GPGPU), and general-purpose processing units such as a CPU. In some embodiments, the computing device 100 is further configured with at least a virtual machine configuration module 102, a virtual machine isolation module 104, and a hardware access path formation module 106. In some embodiments, the virtual machine configuration module 102, the virtual machine isolation module 104, and the hardware access path formation module 106 are, for example, program modules running on the computing device 100.
Regarding the virtual machine configuration module 102, it is used to configure a plurality of virtual machines on the host side.
With respect to the virtual machine isolation module 104, it is used to isolate the plurality of virtual machines.
With respect to the hardware access path formation module 106, it is used to configure, on the device side, hardware-isolated GPU instances so as to form an independent hardware access path with at least one virtual machine of the configured plurality of virtual machines, for building a trusted execution environment supporting confidential computing.
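By way of illustration only, the minimal Python sketch below models how these three modules might cooperate. All class names, fields, and resource parameters are assumptions invented for the sketch, not the actual implementation of computing device 100.

```python
from dataclasses import dataclass

# Hypothetical sketch of the three program modules on computing device 100;
# names and resource fields are illustrative assumptions.

@dataclass
class VirtualMachine:
    vm_id: int
    cpu_cores: int
    memory_mb: int
    isolated: bool = False

@dataclass
class GpuInstance:
    vf_id: int  # SR-IOV virtual function index

def configure_virtual_machines(count: int, cores_per_vm: int, mem_per_vm: int):
    """Virtual machine configuration module: carve host resources into VMs."""
    return [VirtualMachine(i, cores_per_vm, mem_per_vm) for i in range(count)]

def isolate_virtual_machines(vms):
    """Virtual machine isolation module: mark each VM's resources private."""
    for vm in vms:
        vm.isolated = True

def form_hardware_access_paths(vms, vf_count: int):
    """Hardware access path formation module: pair VMs with GPU instances."""
    vfs = [GpuInstance(i) for i in range(vf_count)]
    # One independent hardware access path per (VM, GPU instance) pair.
    return list(zip(vms, vfs))
```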
A method 200 for constructing a trusted execution environment according to an embodiment of the present invention will be described below in conjunction with fig. 2 and 3. FIG. 2 illustrates a flow chart of a method 200 for building a trusted execution environment according to an embodiment of the present invention. FIG. 3 illustrates a schematic diagram of a trusted execution environment 300, according to some embodiments of the present invention. It should be appreciated that the method 200 may be performed, for example, at the computing device 100 depicted in fig. 1. Method 200 may also include additional acts not shown and/or may omit acts shown, the scope of the present invention being not limited in this respect.
At step 202, the computing device 100 configures a plurality of virtual machines on a host side.
Regarding the host, it is, for example, a CPU, as indicated by reference numeral 310 in FIG. 3. In some embodiments, at least a software stack is built on CPU 310. For example, an operating system (OS)/hypervisor, such as the OS/hypervisor indicated by reference numeral 312 in FIG. 3, is configured on the CPU 310. The OS/hypervisor 312 is used, for example, to manage processes, storage, devices, and the like. In some embodiments, the OS/hypervisor 312 further includes a physical function (Physical Function, PF) driver, e.g., the PF driver indicated by reference numeral 311 in FIG. 3.
With respect to virtual machines (VMs), they include, for example, hardware and software. A virtual machine runs on the host. Each virtual machine is configured, for example, to be allocated physical computing resources (e.g., processors, memory, and storage). For example, the physical computing resources of the host are partitioned into blocks to form containers for the hardware.
Regarding the number of virtual machines configured, it is typically determined based on the application and the physical computing resources of the host. For example, the present invention may configure 4, 8, 16, or other numbers of virtual machines.
For example, as shown in fig. 3, a plurality of virtual machines are arranged on the CPU 310. For example, 4 virtual machines are configured, including, for example, a first virtual machine 322 (i.e., VM 0), a second virtual machine 324 (i.e., VM 1), a third virtual machine 326 (i.e., VM 2), and a fourth virtual machine 328 (i.e., VM 3).
At step 204, the computing device 100 isolates the plurality of virtual machines.
For example, isolation ensures that there is no information interaction or access between the first virtual machine 322 through the fourth virtual machine 328, and that the physical computing resources (e.g., processors, memory, and storage) allocated to each of the first virtual machine 322 through the fourth virtual machine 328 are independent of, and isolated from, each other.
In some embodiments, the computing device 100 configures one virtual function (VF) driver in each virtual machine. In some embodiments, a secure driver is configured in at least one of the plurality of virtual machines on the host side for supporting a confidential virtual machine; the secure driver is, for example, the VF driver 321 configured in the first virtual machine 322 in FIG. 3.
At step 206, the computing apparatus 100 configures, on the device side, the hardware-isolated GPU instance to form an independent hardware access path with at least one virtual machine of the configured plurality of virtual machines for use in constructing a trusted execution environment that supports confidential computations.
With respect to the device, it is, for example, a GPU (e.g., indicated by reference numeral 340 in FIG. 3). It should be appreciated that the hardware resources on the GPU include, for example, registers, memory, and the like. In some embodiments, GPU 340 also includes a Peripheral Component Interconnect Express (PCIe) physical function (referred to simply as a "PCIe PF"), e.g., PCIe PF 342. PCIe PF 342 further includes a graphics subsystem processor (Graphics Subsystem Processor, GSP).
With respect to GPU instances, they are, for example, virtual functions (or virtual function modules) of the GPU. In some embodiments, each GPU instance configured on the device side is provided with a Peripheral Component Interconnect Express (PCIe) virtual function module (simply a "PCIe VF"), a memory instance, a system DMA engine, a stream processor cluster, or the like. In some embodiments, a CP/SDMA/SPC/VD/VE is configured in each GPU instance, where CP stands for command processor (Command Processor), SDMA stands for system direct memory access (System Direct Memory Access), SPC stands for stream processor cluster (Stream Processor Cluster), VD stands for video decoder (Video Decoder), and VE stands for video encoder (Video Encoder).
Regarding the hardware access path, it may be formed by at least one of the plurality of GPU instances together with a corresponding virtual machine of the plurality of virtual machines. Specifically, forming independent hardware access paths includes, for example: the computing device 100 forming one hardware access path from each of the plurality of GPU instances and a corresponding virtual machine of the plurality of virtual machines, thereby forming a plurality of hardware access paths; or forming, via multiplexing, a plurality of hardware access paths from one of the plurality of GPU instances and a plurality of corresponding virtual machines. A sketch of both modes follows below.
It should be appreciated that one GPU may be virtualized, for example, into multiple GPU instances. For example, as shown in FIG. 3, GPU 340 is virtualized into 4 GPU instances that are mutually independent in hardware: a first GPU instance (i.e., VF 0), a second GPU instance (i.e., VF 1), a third GPU instance (i.e., VF 2), and a fourth GPU instance (i.e., VF 3). In some embodiments, the 4 hardware-independent GPU instances are used independently by the 4 virtual machines configured on the CPU side, respectively. In hardware terms, physical isolation between GPU instances includes, for example, memory isolation, command processor isolation, compute core isolation, and separate keys for individual virtual machines. It should be appreciated that, in this way, a relatively independent, mutually isolated environment from the CPU side to the GPU side can be provided for multiple users.
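The sketch below illustrates the two path-formation modes just described, one-to-one pairing and multiplexing of a single GPU instance across several virtual machines; the function names are hypothetical and the VM/VF arguments can be any objects representing virtual machines and GPU instances.

```python
# Hypothetical sketch of the two path-formation modes; not the patent's code.

def one_to_one_paths(vms, vfs):
    # Mode 1: each VM pairs with its own hardware-isolated GPU instance,
    # yielding one independent hardware access path per pair.
    assert len(vms) == len(vfs)
    return list(zip(vms, vfs))

def multiplexed_paths(vms, vf):
    # Mode 2: one GPU instance is multiplexed across several VMs, still
    # presenting one logical hardware access path to each VM.
    return [(vm, vf) for vm in vms]
```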
Regarding a method of configuring hardware-isolated GPU instances, it includes, for example, the computing device 100 isolating the address space of the hardware on the device side, and isolating the memory, command processors, and compute cores associated with the configured GPU instances from each other. For example, each of the plurality of GPU instances configured on the device side is allocated a corresponding compute core.
With respect to the method of isolating the address space of the hardware on the device side, it includes, for example, the computing device 100 partitioning the system address space, PCIe address space, high-speed bus addresses, and low-speed bus addresses so that they map to the configured GPU instances. Specifically, GPU hardware address space isolation may be built based on single-root I/O virtualization (SR-IOV) technology. It should be appreciated that SR-IOV is a PCIe extension capability that makes one physical device appear as multiple virtual devices. FIG. 7 schematically illustrates a schematic diagram of a PCIe SR-IOV device 700 in accordance with some embodiments of the invention. The physical device is referred to as a physical function (PF), as indicated by reference numeral 714 in FIG. 7, while a virtual device is referred to as a virtual function (VF), as indicated by reference numeral 716. It should be appreciated that the PCIe SR-IOV device 700 may be configured with multiple virtual functions (VFs), whose allocation may be dynamically controlled by the physical function (PF) through registers encapsulated in that function. The PCI configuration space of each VF is accessible through its bus, device, and function identification (routing ID), and each VF also has its own PCI memory space.
It should be appreciated that SR-IOV technology provides a way to share the physical functions of PCIe I/O devices, allowing devices to access their resources independently across the various PCIe physical functions. As can be seen in FIG. 7, an access arriving via the PCIe port 710 may be routed to the corresponding virtual function (e.g., indicated by reference numeral 716) via internal routing 712. In addition, SR-IOV enables separate allocation and use of devices, enables a virtual machine configured on the CPU to directly access the hardware of the GPU, and can provide separate memory space, interrupts, DMA data transfers, and the like.
It should be appreciated that if the high-bandwidth memory (HBM) on the GPU side is accessed from the CPU, for example, four rounds of address translation are required to complete the addressing. First, addresses on the CPU side are addresses in the system address space; when accessing PCIe, a system address space address must be translated into a PCIe address space address. After entering through the PCIe path, e.g., for transmission on the low-speed bus, the address must be translated into a low-speed bus address akin to the system memory management unit address (e.g., SMMU_PA). Finally, since the HBM is connected to the high-speed bus, the address must be translated into a high-speed bus address. By building GPU hardware address space isolation based on single-root I/O virtualization (SR-IOV), the invention can remarkably improve the performance and security of the device-side hardware.
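The chain of translations can be pictured as a sequence of address-space hops. The toy model below uses flat offsets purely for illustration; real hardware uses page tables and bus fabrics, and every constant here is an invented assumption.

```python
# Toy model of the four translation rounds for a CPU access to GPU-side HBM:
# system address -> PCIe address -> low-speed bus (SMMU_PA) -> high-speed bus
# -> HBM physical address. All offsets are invented for illustration.

SYS_TO_PCIE  = 0x4000_0000  # round 1: system address space -> PCIe address space
PCIE_TO_SMMU = 0x1000_0000  # round 2: PCIe address -> low-speed bus (SMMU_PA)
SMMU_TO_HSB  = 0x0800_0000  # round 3: SMMU_PA -> high-speed bus address
HSB_TO_HBM   = 0x0000_0000  # round 4: high-speed bus address -> HBM_PA

def cpu_to_hbm(system_addr: int) -> int:
    pcie_pa = system_addr - SYS_TO_PCIE
    smmu_pa = pcie_pa - PCIE_TO_SMMU
    hsb_pa = smmu_pa - SMMU_TO_HSB
    return hsb_pa - HSB_TO_HBM
```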
It should be appreciated that accesses from the CPU to the GPU may be reads/writes to memory on the GPU side or reads/writes to registers on the GPU side. For register accesses under SR-IOV virtualization, the GPU hardware must provide corresponding protection and address remapping. Register accesses typically do not need high bandwidth and therefore use a relatively low-speed bus, so the address space of the low-speed bus must be divided. For example, suppose the low-speed bus address space has 4 gigabytes; taking 4 virtual functions (VFs) as an example, each VF may correspond to a 1-gigabyte address space on the low-speed bus. Regarding protection of register accesses under SR-IOV, one VF cannot access the dedicated memory-mapped I/O (MMIO) space of the physical function (PF) or the register space of another VF. Regarding the address remapping involved in register accesses under SR-IOV, for register spaces that can be programmed by a VF, the hardware must translate such programming from the logical register space inside the VF to the device physical space. It should be appreciated that the address space indicated by an access from the host (e.g., the CPU side) carries a logical identification, such as VF1 or VF3; the hardware entity on the device side must perform the corresponding translation, e.g., converting the logical identification VF1 or VF3 into a target physical address in the physical space of the corresponding device.
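A minimal sketch of this register-window split and logical-to-physical remap follows (4 GiB of low-speed bus space, 1 GiB per VF, as in the example above); the constants and the error handling are assumptions made for illustration.

```python
# Sketch of the low-speed bus register window split and the VF logical ->
# device physical remap with the protection checks described in the text.

GIB = 1 << 30
VF_WINDOW = 1 * GIB                  # per-VF register window on the low-speed bus
NUM_VFS = 4
PF_MMIO_BASE = NUM_VFS * VF_WINDOW   # assumed: PF-private MMIO above the VF windows

def remap_register_access(vf_id: int, logical_offset: int) -> int:
    """Translate a VF-logical register offset to a device physical address."""
    if not (0 <= vf_id < NUM_VFS):
        raise PermissionError("unknown virtual function")
    if logical_offset >= VF_WINDOW:
        # A VF may not reach outside its own window, e.g. into the PF's
        # dedicated MMIO space or another VF's register space.
        raise PermissionError("access outside VF register window")
    return vf_id * VF_WINDOW + logical_offset
```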
Regarding a method of building a trusted execution environment supporting confidential computing, it includes, for example, the computing device 100 configuring a plurality of GPU instances that are hardware-isolated from one another, building the trusted execution environment based on independent hardware access paths formed by at least one GPU instance of the plurality of GPU instances and a corresponding virtual machine of the plurality of virtual machines, and managing data flow and memory access within at least one GPU instance used to build the trusted execution environment. Specific embodiments of managing data flow and memory access within a GPU instance are described in detail below with reference to FIGS. 4 to 6 and are not repeated here.
For example, as shown in FIG. 3, computing device 100 configures 4 GPU instances, hardware-isolated from each other, on GPU 340. Four independent hardware access paths are then formed based on the 4 GPU instances configured in GPU 340 and the 4 corresponding virtual machines configured on CPU 310: the first hardware access path 314, the second hardware access path 316, the third hardware access path 318, and the fourth hardware access path 320. In some embodiments, the computing device 100 builds the trusted execution environment for supporting confidential computing based on the first hardware access path 314 formed by the first virtual machine 322 (i.e., VM 0) configured on the CPU side and the first GPU instance (i.e., VF 0) configured on the GPU side. Ordinary applications, by contrast, run on the second hardware access path 316 formed from the second GPU instance (i.e., VF 1) and the corresponding second virtual machine 324 (i.e., VM 1), and on the third hardware access path 318 and fourth hardware access path 320 formed by the third GPU instance (i.e., VF 2) and the fourth GPU instance (i.e., VF 3) with their corresponding virtual machines. It should be appreciated that the invention may also construct the trusted execution environment for supporting confidential computing based on both the first hardware access path 314 and the second hardware access path 316.
In some embodiments, with respect to a method of building a trusted execution environment supporting confidential computing, in addition to configuring hardware-isolated GPU instances on the GPU side, the method further includes the computing device 100 building a confidential virtual machine on the CPU side, building a confidential GPU instance on the GPU side, and performing encrypted transfer of data between the confidential virtual machine and the confidential GPU instance. Everything running in the confidential virtual machine, e.g., drivers and libraries, should be confidential. It should be appreciated that isolating the confidential virtual machine from the virtual machine manager and from other users requires not only isolation between the corresponding GPU instances but also security functions in the CPU software, such as secure access control, paging control, address translation, and data encryption. By these means, the invention further isolates the confidential virtual machine from other virtual machines and from a potentially malicious virtual machine manager, thereby further improving security.
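The patent does not fix a cipher for the encrypted transfer. As one plausible choice only, the sketch below assumes a pre-negotiated shared key and uses AES-GCM from the Python cryptography library; the function names and framing are assumptions.

```python
# Minimal sketch of encrypted data transfer between a confidential VM and a
# confidential GPU instance. Assumes a pre-negotiated 16/24/32-byte shared
# key; AES-GCM is an assumed choice, not specified by the patent.

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_for_gpu(shared_key: bytes, payload: bytes) -> bytes:
    nonce = os.urandom(12)  # fresh nonce per transfer
    return nonce + AESGCM(shared_key).encrypt(nonce, payload, None)

def decrypt_on_gpu(shared_key: bytes, blob: bytes) -> bytes:
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(shared_key).decrypt(nonce, ciphertext, None)
```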
In this scheme, the virtual machines configured on the host and the hardware-isolated GPU instances configured on the device side cooperatively form independent hardware access paths, so that hardware-isolated GPU instances can be provided to multiple users sharing a single device; device sharing is preserved while security isolation is significantly improved, and data and applications in use are effectively protected. The invention thus enables efficient construction of trusted execution environments that support confidential computing, and prevents unauthorized user access and modification.
A method 400 of managing data flow and memory access within a GPU instance according to an embodiment of the present invention is described below in conjunction with fig. 4 and 5. FIG. 4 illustrates a flowchart of a method 400 for managing data flow and memory access within a GPU instance according to some embodiments of the present invention. FIG. 5 illustrates a schematic diagram of a method for managing data flow and memory access within a GPU instance according to some embodiments of the present invention. It should be appreciated that the method 400 may be performed, for example, at the computing device 100 depicted in fig. 1. Method 400 may also include additional acts not shown and/or may omit acts shown, the scope of the present invention being not limited in this respect.
It should be appreciated that memory addressing in a device (e.g., a GPU) may occur over multiple paths. An access received by the device may come from a load or store of a direct instruction, from a direct memory access (Direct Memory Access, DMA), from a command packet access, or from a register access. Even an access to device memory (e.g., to high-bandwidth memory) may take multiple paths: the access request may come from a device memory access between the host (e.g., CPU) and the device (e.g., GPU), or from a master module within the GPU (the master module including, for example, a command processor, a compute core, a DMA processing unit, and the like).
Two exemplary paths for access to high-bandwidth memory are illustrated in FIG. 5. In one path, the CPU 510 sends a command with an address space identifier (Address Space ID, ASID) to the master module 520 (or master IP module) of one GPU instance 530 of the GPU, and the master module 520 then initiates an access to the high-bandwidth memory (High Bandwidth Memory, HBM). In the other path, the CPU 510 directly accesses the high-bandwidth memory 528 via the PCIe switch 512, the host portal 514 (HA), and the physical address generator 524. The following describes a method for managing data flow and memory access within a GPU instance in the context of an application accessing device memory.
At step 402, in a process run by a GPU instance, a command with an address space identification is sent to a command queue so that a master module in the GPU instance for processing the command obtains the command.
Regarding the master module 520 of the GPU instance 530, it is a processing unit for processing commands. In some embodiments, master module 520 is, for example, a command processor, a compute core, a DMA processing unit, or the like.
With respect to the address space identifier (Address Space ID, ASID), it includes, for example, a process identification (e.g., a process ID). In some embodiments, the command also indicates a virtual address and an identification of the GPU instance.
For example, when an application on the CPU accesses the HBM, the GPU may launch a process. In the context of the present invention, with virtualized, isolated GPU instances, a process generally runs in one GPU instance configured in the GPU, e.g., on the GPU instance 530 shown in FIG. 5, rather than across the entire GPU. While the process runs, a command packet carrying software instructions sent by the CPU's driver is transmitted. The command packet includes a command with an address space identifier (ASID); the command also includes, for example, a virtual address and an identification of the GPU instance. The command is sent, for example, to the master module 520 (e.g., a command processor) for processing the associated command.
At step 404, the master module processes the command to initiate an access request for memory.
As for the memory, it is memory on the device side and includes, for example, high-bandwidth memory or a second-level cache. High-bandwidth memory is known as "HBM"; the second-level cache may also be referred to as an "L2 cache".
For example, the command processor processes the received command and, for a memory read/write targeting the HBM, initiates an access request to the HBM and invokes the system memory management unit 522 (System Memory Management Unit, SMMU).
At step 406, the system memory management unit looks up the context match table through the address space identification to obtain a valid context descriptor to obtain a page table entry. The valid context descriptor indicates the GPU instance.
For example, via the system memory management unit 522, a context match table is looked up based on the ASID indicated by the access request to obtain a context descriptor indicating the GPU instance, thereby obtaining a page table entry (Page Table Entry, PTE).
At step 408, the system memory management unit sends a memory request to the physical address generator, the memory request indicating a system memory management unit address.
For example, via system memory management unit 522, a memory request indicating a system memory management unit address (i.e., smmu_pa) is sent to physical address generator 524.
At step 410, the system memory management unit address is translated to a target physical address of the memory via the physical address generator.
For example, the system memory management unit address (e.g., SMMU_PA) is converted to a high-speed bus address via the physical address generator 524, and addressing is performed for the target physical address (i.e., HBM_PA) of the high-bandwidth memory 528 based on the high-speed bus address via the high-speed bus 526.
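Putting steps 402 through 410 together, the lookup chain can be modeled as below. The table contents, address widths, and arithmetic are invented for the sketch and are not the actual hardware layout.

```python
# Illustrative model of method 400: an ASID-tagged access leads to a context
# match table lookup, a page table entry, an SMMU address, and finally an HBM
# physical address. All values and the layout are assumptions.

CONTEXT_MATCH_TABLE = {  # ASID -> context descriptor (assumed structure)
    0x11: {"gpu_instance": 0, "page_table": {0x1000: 0x8000_1000}},
}

HBM_BASE = 0x10_0000_0000  # assumed HBM base on the high-speed bus

def physical_address_generator(smmu_pa: int) -> int:
    # Final hop: SMMU address -> high-speed bus / HBM physical address.
    return HBM_BASE + smmu_pa

def smmu_translate(asid: int, virtual_addr: int) -> int:
    ctx = CONTEXT_MATCH_TABLE.get(asid)
    if ctx is None:
        raise PermissionError("no valid context descriptor for ASID")
    page = virtual_addr & ~0xFFF                # page-align the lookup
    pte = ctx["page_table"].get(page)
    if pte is None:
        raise LookupError("no page table entry")
    smmu_pa = pte | (virtual_addr & 0xFFF)      # keep the page offset
    return physical_address_generator(smmu_pa)  # SMMU_PA -> HBM_PA
```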
With this scheme, the invention can manage data flow and memory access within a GPU instance, thereby achieving hardware-level isolation.
A method 600 for managing data flow and memory access within a GPU instance according to further embodiments of the present invention is described below in conjunction with FIGS. 5 and 6. FIG. 6 illustrates a flowchart of the method 600 for managing data flow and memory access within a GPU instance according to further embodiments of the present invention. It should be appreciated that the method 600 may be performed, for example, at the computing device 100 depicted in FIG. 1. Method 600 may also include additional acts not shown and/or may omit acts shown, the scope of the present invention being not limited in this respect.
At step 602, an access request is sent via a host for a memory on a device side, the access request indicating a PCIe address on the device side.
For example, an access request for a device-side memory (e.g., the HBM) is sent via the CPU 510 to the PCIe switch 512, the access request indicating a device-side PCIe address, i.e., a physical address in the PCIe address space ("PCIe_PA"); the access request is then provided via the PCIe switch 512 to the host portal 514.
Regarding the PCIe switch 512, it is, for example, a device for extending PCIe bus connection capability, allowing multiple PCIe devices to be connected to the same PCIe bus and thereby managing communication and data transmission among those devices.
At step 604, the PCIe address is translated into a system memory management unit address via the device-side host portal, so as to provide the system memory management unit address to a physical address generator.
With respect to the host portal (HA) on the device side, it has the capability of low-speed bus address translation.
At step 606, the system memory management unit address is translated to a target physical address of the device side memory via a physical address generator.
As shown in FIG. 5, the system memory management unit address (i.e., SMMU_PA) is converted to a high-speed bus address via the physical address generator 524, and the target physical address (i.e., HBM_PA) of the high-bandwidth memory 528 is addressed based on the high-speed bus address via the high-speed bus 526.
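The host-initiated path of method 600 thus reduces to two translation hops, sketched below with invented constants; the window base and HBM base are assumptions for illustration only.

```python
# Sketch of method 600's host path: PCIe_PA -> (host portal) SMMU_PA ->
# (physical address generator) HBM_PA. All constants are assumptions.

PCIE_BAR_BASE = 0x2000_0000     # assumed base of the device's PCIe BAR window
HBM_BASE = 0x10_0000_0000       # assumed HBM base on the high-speed bus

def host_portal_translate(pcie_pa: int) -> int:
    # Host portal (HA): PCIe address -> low-speed bus / SMMU address.
    return pcie_pa - PCIE_BAR_BASE

def physical_address_generator(smmu_pa: int) -> int:
    # SMMU address -> high-speed bus / HBM physical address.
    return HBM_BASE + smmu_pa

def host_access_hbm(pcie_pa: int) -> int:
    return physical_address_generator(host_portal_translate(pcie_pa))
```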
In the scheme, the CPU can directly access the high-bandwidth memory through the PCIe switch, the host portal and the physical address generator.
The various processes and methods described above, such as methods 200, 400, and 600, may be performed at a computing device. The computing device includes, for example, at least one processor (e.g., at least one graphics processor and at least one central processing unit) and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor. In some embodiments, the methods 200, 400, and 600 may be implemented as a computer software program or program product tangibly embodied on a machine-readable medium. In some embodiments, part or all of the computer program may be loaded and/or installed onto the computing device via read-only memory (ROM) and/or a communication unit. One or more of the acts of the methods 200, 400, and 600 described above may be performed when the computer program is loaded into random-access memory (RAM) and executed by the GPU and the CPU.
The present invention may be a method, apparatus, system, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing various aspects of the present invention. The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a central processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the central processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution disclosed in the present application can be achieved, and are not limited herein.
The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors.
Claims (11)
1. A method for building a trusted execution environment, the method comprising:
configuring a plurality of virtual machines on a host side;
isolating the plurality of virtual machines; and
configuring, on a device side, hardware-isolated GPU instances to form independent hardware access paths with at least one virtual machine of the configured plurality of virtual machines for building a trusted execution environment supporting confidential computing, wherein building the trusted execution environment supporting confidential computing includes configuring the plurality of GPU instances that are hardware-isolated from each other, building the trusted execution environment based on the independent hardware access paths formed by the at least one GPU instance of the plurality of GPU instances and a corresponding virtual machine of the plurality of virtual machines, and managing data flow and memory access within the at least one GPU instance used to build the trusted execution environment.
2. The method of claim 1, wherein configuring the hardware isolated GPU instance comprises:
Isolating address space of hardware on the device side, and
The memory, command processor, and compute cores associated with the configured GPU instance are isolated from each other.
3. The method of claim 1, wherein managing data flow and memory access within at least one GPU instance for building the trusted execution environment comprises:
in a process run by the GPU instance, sending a command with an address space identifier to a command queue so that a master module in the GPU instance for processing the command obtains the command;
the master module processing the command to initiate an access request for the memory;
the system memory management unit searches a context matching table through the address space identification to acquire an effective context descriptor, so as to acquire a page table item, wherein the effective context descriptor indicates the GPU instance;
The system memory management unit sending a memory request to the physical address generator, the memory request indicating a system memory management unit address, and
The system memory management unit address is converted to a target physical address of the memory via a physical address generator.
4. The method of claim 1, wherein managing data flow and memory access within at least one GPU instance for building the trusted execution environment comprises:
sending an access request for a memory of a device side via a host, the access request indicating a PCIe address of the device side;
Translating the PCIe address into a system memory management unit address via a device-side host portal to provide the system memory management unit address to a physical address generator, and
The system memory management unit address is converted to a target physical address of the device side memory via a physical address generator.
5. The method of claim 2, wherein isolating an address space of hardware on the device side comprises:
the system address space, PCIe address space, high speed bus addresses, and low speed bus addresses are partitioned to map with configured GPU instances.
6. The method of claim 1, wherein building the trusted execution environment based on independent hardware access paths formed by at least one of the plurality of GPU instances and a corresponding one of the plurality of virtual machines comprises any one of:
forming a hardware access path based on each GPU instance of the plurality of GPU instances and a corresponding virtual machine of the plurality of virtual machines, respectively, thereby forming a plurality of hardware access paths, or
Based on one of the plurality of GPU instances, a plurality of hardware access paths are formed with a plurality of corresponding virtual machines of the plurality of virtual machines, respectively, via multiplexing.
7. The method of claim 1, wherein building the trusted execution environment based on independent hardware access paths formed by at least one of the plurality of GPU instances and a corresponding virtual machine of the plurality of virtual machines comprises:
Enabling a corresponding virtual machine for forming a hardware access path to directly access the at least one GPU instance, and
Individual memory space, interrupts, and direct memory access data transfers are configured for the formed hardware access path.
8. The method of claim 5, wherein partitioning for the system address space, PCIe address space, high speed bus addresses, and low speed bus addresses to map with the configured GPU instance comprises:
the PCIe address space is virtualized into a plurality of blocks so that the virtualized plurality of blocks are mapped with the configured plurality of GPU instances, respectively.
9. A computing device, comprising:
at least one processor, and
A memory communicatively coupled to the at least one processor, wherein
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
10. A computer readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a machine, performs the method according to any of claims 1-8.
11. A computer program product comprising a computer program which, when executed by a machine, performs the method according to any of claims 1-8.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411487427.9A | 2024-10-23 | 2024-10-23 | Method, computing device, medium and program product for building a trusted execution environment |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN119003098A | 2024-11-22 |
| CN119003098B | 2025-02-25 |
Family ID: 93487416

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411487427.9A (CN119003098B, active) | Method, computing device, medium and program product for building a trusted execution environment | 2024-10-23 | 2024-10-23 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN119003098B (en) |
Citations (1)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115904617A (en) * | 2022-11-14 | 2023-04-04 | 武汉凌久微电子有限公司 | GPU virtualization implementation method based on SR-IOV technology |
Family Cites Families (4)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8200796B1 (en) * | 2005-05-05 | 2012-06-12 | Digital Display Innovations, Llc | Graphics display system for multiple remote terminals |
| CN105786589A (en) * | 2016-02-26 | 2016-07-20 | 成都赫尔墨斯科技有限公司 | Cloud rendering system, server and method |
| KR102105760B1 (en) * | 2018-06-19 | 2020-04-29 | 한국과학기술원 | Heterogeneous isolated execution for commodity gpus |
| CN116702128A (en) * | 2023-05-31 | 2023-09-05 | 阿里云计算有限公司 | Host protection system, method, device and storage medium |
Also Published As

| Publication Number | Publication Date |
|---|---|
| CN119003098A | 2024-11-22 |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |