
CN119537051A - Method for processing shared memory among threads based on SPIR-V - Google Patents


Info

Publication number
CN119537051A
Authority
CN
China
Prior art keywords
instruction
shared memory
variable
spir
instructions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411590422.9A
Other languages
Chinese (zh)
Inventor
吴正豪
许世文
张抗
彭获然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Lingjiu Microelectronics Co ltd
Original Assignee
Wuhan Lingjiu Microelectronics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Lingjiu Microelectronics Co ltd filed Critical Wuhan Lingjiu Microelectronics Co ltd
Priority to CN202411590422.9A
Publication of CN119537051A
Legal status: Pending


Landscapes

  • Devices For Executing Special Programs (AREA)

Abstract

The present invention is applicable to the field of GPU technology and provides a method for processing inter-thread shared memory based on SPIR-V, comprising the following steps: S1, reading high-level language source code into a character buffer, then converting the source code in the buffer into intermediate code in SPIR-V form; S2, traversing the instruction sequence of the intermediate code and, for each variable declaration, accumulating the total space occupied by all shared memory variables; S3, calculating the one-dimensional index and the shared memory address of the thread group; S4, traversing the instruction sequence of the intermediate code and, for instructions in function declarations and function definitions that involve shared memory, calculating the real address and generating different GPU instructions according to the instruction type. The present invention proposes a general method for translating the code in SPIR-V that handles workgroup shared memory into basic instructions present in all GPU architectures; because the instructions used are basic instructions found in every GPU instruction set, the method is portable across hardware platforms.

Description

Method for processing shared memory among threads based on SPIR-V
Technical Field
The invention belongs to the technical field of GPUs (Graphics Processing Units), and particularly relates to a method for processing inter-thread shared memory based on SPIR-V.
Background
SPIR-V (Standard Portable Intermediate Representation - Vulkan) is an intermediate language developed by the Khronos Group for parallel computing and graphics. SPIR-V can be generated from a variety of high-level languages, such as code written against the standards of OpenCL (Open Computing Language), OpenGL (Open Graphics Library), Vulkan (a cross-platform 2D and 3D graphics API), and the like.
Thanks to the introduction of SPIR-V, a compiler developer no longer needs to provide a parser for each source language: once high-level language code has been converted into SPIR-V code, all subsequent processing operates on the SPIR-V code, which makes cross-language compiler development convenient. The SPIR-V standard has become an increasingly indispensable part of the modern GPU software ecosystem.
SPIR-V supports basic and advanced features of different high-level languages, including compute shaders, parallel computing, and memory management operations. To support parallel computing features across languages, the SPIR-V standard introduced the concept of a workgroup: a set of threads that perform a computing task together. Threads in a workgroup may share a specific memory region, called shared memory, to ease data exchange and communication; further, to ensure inter-thread cooperation and data consistency, the SPIR-V instruction set provides the OpControlBarrier instruction for inter-thread synchronization. To fully support the SPIR-V standard, the SPIR-V code that handles workgroup shared memory must be translated into object code supported by the GPU hardware while preserving the same semantics.
SPIR-V is a standardized intermediate form that cannot execute directly on a GPU; it must first be converted into a GPU instruction sequence for a specific hardware architecture. For the converted GPU instruction sequence to fully support SPIR-V's shared memory feature for parallel computing, shared memory space must be allocated and the SPIR-V instructions that use shared memory variables must be mapped to GPU instructions. No related technical scheme has been disclosed to date.
Disclosure of Invention
In view of the above problems, the present invention provides a method for processing inter-thread shared memory based on SPIR-V, aiming to solve the technical problem of how to translate the shared-memory-handling instructions in SPIR-V code, obtained by converting high-level language code, into GPU instruction sequences with the same semantics.
The invention adopts the following technical scheme:
A method for processing inter-thread shared memory based on SPIR-V comprises the following steps:
S1, reading the source code of high-level language code into a character buffer, and converting the source code in the buffer into intermediate code in SPIR-V form;
S2, traversing the instruction sequence of the intermediate code; for each variable declaration, if the storage class of the variable is Workgroup, identifying the variable as a shared memory variable, calculating the space it occupies, and recording its offset; finally accumulating the total space occupied by all shared memory variables;
S3, calculating the size of the shared memory space to be allocated from the total occupied space and obtaining a base address, and, by computing the one-dimensional index of the thread group, calculating the thread group's shared memory address from the one-dimensional index, the total occupied space, and the base address;
S4, traversing the instruction sequence of the intermediate code; for instructions in function declarations and function definitions that involve shared memory, calculating the real address and generating different GPU instructions according to the instruction type.
Further, the specific process of step S2 is as follows:
S21, traversing the instruction sequence of the intermediate code, where the OpVariable instruction is used for variable declarations; for each variable declaration, judging whether the storage class of the OpVariable instruction is Workgroup, and if so, the declared variable is a shared memory variable;
S22, calculating the space size S_i of the declared variable from the type operand of the OpVariable instruction: if the type is OpTypeVoid, no space is occupied and S_i is 0; if the type is a primitive type, S_i is the size of that primitive type; if the type is a composite data type, S_i is the size of the whole composite type;
S23, calculating the total space S_total = S_1 + S_2 + … + S_N occupied by all shared memory variables, where N is the number of shared memory variables; in addition, synchronously recording the offset offset_k of the k-th shared memory variable, offset_k = S_1 + … + S_(k-1), i.e. the sum of the space occupied by all variables preceding it.
Further, the specific process of step S3 is as follows:
S31, calculating the size of the shared memory space to be allocated, SmTotalSize = S_total × GroupNum, where GroupNum is the number of all thread groups, and allocating a region of size SmTotalSize in GPU video memory as the shared memory space; the base address of the region is SmBaseAddr;
S32, calculating the one-dimensional index GroupIndex of a thread group by generating a segment of GPU instructions; if the thread group's index in each dimension is i_1, i_2, …, i_d and the maximum number of thread groups in each dimension is n_1, n_2, …, n_d, then the one-dimensional index GroupIndex of the thread group is:
GroupIndex = i_1 + n_1 × (i_2 + n_2 × (i_3 + … + n_(d-2) × (i_(d-1) + n_(d-1) × i_d) … ));
S33, calculating the shared memory address of the thread group in the shared memory space, i.e. the start address SmAddr = GroupIndex × S_total + SmBaseAddr.
Further, in step S4, the SPIR-V instructions involving shared memory include the OpLoad instruction, the OpStore instruction, and the OpControlBarrier instruction, where OpLoad is used for memory reads, OpStore for memory writes, and OpControlBarrier for controlling the synchronization and memory barriers of memory accesses:
For the OpLoad instruction, before generation, the instruction's real address realAddr = offset_k + SmAddr is calculated; if the pointer operand of the OpLoad instruction comes directly from the return value of an OpVariable instruction, the address read by the OpLoad instruction is realAddr; if the pointer operand comes from the return value of a SPIR-V chained-access instruction, the address offset is computed recursively from the chained instruction's base address and indices, and realAddr plus that address offset is used as the address read by the OpLoad instruction;
For the OpStore instruction, whether it writes data into the shared memory space is judged by whether the storage class of its pointer operand is Workgroup; the real address is calculated in the same way as for the OpLoad instruction;
For the OpControlBarrier instruction, a corresponding GPU barrier instruction (Barrier) is generated according to the execution scope and memory scope of the synchronization, to ensure correct synchronization of the thread group.
The method has the following advantages: by converting high-level language source code into SPIR-V code, traversing the variable declaration instructions, and precisely calculating the total shared memory space from the data type of each variable, memory waste from over-allocation is reduced; and by prepending a short segment of GPU instructions to the instruction sequence so that each thread computes its thread group's shared memory address before executing operations, real addresses can be derived cheaply during subsequent instruction translation, reducing repeated computation. Because the instructions used are basic instructions present in all GPU instruction sets, the method is portable across hardware platforms.
Drawings
FIG. 1 is a flowchart of a method for processing shared memory between threads based on SPIR-V according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The invention addresses the problem of converting the shared-memory-handling SPIR-V code, obtained after high-level language code has been converted into SPIR-V intermediate code, into a GPU instruction sequence with the same function, and provides a method for processing inter-thread shared memory based on SPIR-V. The method first analyzes the declarations of the shared memory variables used in the SPIR-V code, then calculates the occupied space and allocates shared memory space in hardware, generates GPU instructions placed at the head of the sequence to compute the shared memory address, and finally converts the instructions that read, write, and synchronize shared memory in the SPIR-V code into GPU instructions with the same semantics. The generated GPU instructions are basic instructions found in all GPU instruction sets, including MUL (multiply), ADD (add), LOAD (memory read), and STORE (memory write). The technical scheme of the invention is illustrated below through specific examples.
As shown in fig. 1, the present embodiment provides a method for processing shared memory between threads based on SPIR-V, comprising the following steps:
and S1, reading source codes of the high-level language codes into a character buffer area, and converting the source codes in the character buffer area into intermediate codes in the form of SPIR-V.
The host program first reads the high-level language code from a file or another source into a character buffer, then converts the source code in the buffer into intermediate code in SPIR-V form. SPIR-V instructions are arranged in a specified order, so converting the source code into SPIR-V intermediate code yields a SPIR-V instruction sequence.
And S2, traversing an instruction sequence of the intermediate code, regarding variable declaration, if the storage class of the variable is a working group, recognizing the current variable as a shared memory variable, calculating the occupied space of the variable, recording the variable offset, and finally accumulating the sum of the occupied space of all the shared memory variables.
A SPIR-V module consists of a precisely arranged binary stream of 32-bit words in little-endian byte order; each instruction consists of one or more words. Each SPIR-V instruction contains an operation code (opcode) and several operands. The opcode, a 16-bit integer that uniquely identifies the operation, determines the type and function of the instruction, and every SPIR-V instruction begins with it; the operands provide the specific data needed for execution. Operands may be immediate values, values produced by other instructions, or references to other resources such as buffers or textures.
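As an illustration of the word layout just described (a minimal sketch, not part of the patent), the first word of every SPIR-V instruction packs the word count in its upper 16 bits and the opcode in its lower 16 bits:

```python
def decode_instruction_header(first_word: int):
    """Split the first 32-bit word of a SPIR-V instruction into
    (word_count, opcode): upper 16 bits = word count, lower 16 bits = opcode."""
    word_count = (first_word >> 16) & 0xFFFF
    opcode = first_word & 0xFFFF
    return word_count, opcode

# OpVariable has opcode 59 (0x3B); a 4-word OpVariable encodes as 0x0004003B.
print(decode_instruction_header(0x0004003B))  # (4, 59)
```

Walking the instruction stream then amounts to reading a header word, consuming `word_count` words, and repeating until the module ends.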
This step traverses the SPIR-V instruction sequence, calculating and accumulating the sum of the sizes of the space occupied by the shared memory variables. The specific process is as follows:
S21, traversing the instruction sequence of the intermediate code, where the OpVariable instruction is used for variable declarations; for each variable declaration, judging whether the storage class of the OpVariable instruction is Workgroup, and if so, the declared variable is a shared memory variable.
In intermediate code in SPIR-V form, the OpVariable instruction is used for variable declarations, and its Storage Class specifies the storage type of the variable. A storage class in SPIR-V defines a variable's storage type and access scope. Common storage classes include UniformConstant, Input, Uniform, Output, Workgroup, CrossWorkgroup, Private, Function, and Generic. UniformConstant, for example, is used for externally shared read-only variables, accessible from all functions in all invocations; it is commonly used for graphics uniform memory and OpenCL constant memory and may carry an initializer, per the client API specification. In this step, therefore, it is determined whether the storage class of an OpVariable instruction in the sequence is Workgroup; if so, the variable it declares is a shared memory variable and step S22 is executed to calculate the space it occupies; otherwise the next OpVariable instruction is traversed.
S22, calculating the space size S_i of the declared variable from the type operand of the OpVariable instruction: if the type is OpTypeVoid, no space is occupied and S_i is 0; if the type is a primitive type, S_i is the size of that primitive type; if the type is a composite data type, S_i is the size of the whole composite type.
The Result Type in the operands of the OpVariable instruction is the return value of an OpTypePointer instruction; OpTypePointer declares a pointer type, and the Type operand of OpTypePointer is the type of object the pointer points to. The declared variable's footprint S_i is calculated from this type.
SPIR-V has various instructions for declaring different types. In this embodiment, if the type declaration is OpTypeVoid, no memory is occupied and S_i is 0; if the type declaration is a primitive type such as OpTypeBool, OpTypeFloat, or OpTypeInt, S_i is the size of that primitive type; and if the type declaration is a composite data type such as OpTypeVector, OpTypeMatrix, OpTypeArray, or OpTypeStruct, S_i is the size of the entire composite type.
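The size computation of step S22 can be sketched as follows. This is an illustrative model only: the type table, the tuple encoding, and the assumption of 32-bit primitives and tightly packed composites are simplifications, not the patent's implementation (real SPIR-V layouts also involve alignment and matrix strides):

```python
# Illustrative sketch: compute the byte size S_i of a declared type
# from a simplified SPIR-V-like type table keyed by result id.
PRIMITIVE_SIZES = {"OpTypeBool": 4, "OpTypeInt": 4, "OpTypeFloat": 4}  # assumed 32-bit

def type_size(type_decl, types):
    kind = type_decl[0]
    if kind == "OpTypeVoid":
        return 0                                   # S_i = 0: occupies no space
    if kind in PRIMITIVE_SIZES:
        return PRIMITIVE_SIZES[kind]               # primitive type
    if kind in ("OpTypeVector", "OpTypeArray"):    # (kind, element_type_id, count)
        return type_size(types[type_decl[1]], types) * type_decl[2]
    if kind == "OpTypeStruct":                     # (kind, [member_type_ids])
        return sum(type_size(types[m], types) for m in type_decl[1])
    raise ValueError(f"unhandled type: {kind}")

types = {1: ("OpTypeFloat",), 2: ("OpTypeVector", 1, 4), 3: ("OpTypeArray", 2, 8)}
print(type_size(types[3], types))  # float4[8] -> 4 * 4 * 8 = 128 bytes
```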
S23, calculating the total space S_total = S_1 + S_2 + … + S_N occupied by all shared memory variables, where N is the number of shared memory variables; in addition, synchronously recording the offset offset_k of the k-th shared memory variable, offset_k = S_1 + … + S_(k-1), i.e. the sum of the space occupied by all variables preceding the current shared memory variable.
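The accumulation of S_total and the per-variable offsets in step S23 amounts to a prefix sum over the variable sizes; a minimal sketch, with assumed sizes:

```python
# Sketch of step S23: accumulate S_total and record each shared-memory
# variable's offset as the prefix sum of the preceding sizes.
def assign_offsets(sizes):
    offsets, total = [], 0
    for s in sizes:            # sizes S_1..S_N of the Workgroup-class variables
        offsets.append(total)  # offset_k = S_1 + ... + S_(k-1)
        total += s             # S_total accumulates all N sizes
    return offsets, total

offsets, s_total = assign_offsets([16, 128, 4])  # three assumed variable sizes
print(offsets, s_total)  # [0, 16, 144] 148
```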
And S3, calculating the size of the shared memory space to be allocated and acquiring a base address according to the sum of the sizes of the occupied spaces, and calculating the shared memory address of the thread group according to the one-dimensional index, the sum of the sizes of the occupied spaces and the base address by calculating the one-dimensional index of the thread group.
This step computes the one-dimensional thread-group index and, from it, the shared memory address. The process is as follows:
S31, calculating the size of the shared memory space to be allocated, SmTotalSize = S_total × GroupNum, where GroupNum is the number of all thread groups, and allocating a region of size SmTotalSize in GPU video memory as the shared memory space; the base address of the region is SmBaseAddr.
GroupNum is the number of all thread groups; its value is supplied by the user when the driver API is called. A region of size SmTotalSize is allocated in GPU video memory as the shared memory space, whose base address is SmBaseAddr; this space stores the data of the shared memory variables of the different thread groups.
S32, calculating the one-dimensional index GroupIndex of a thread group by generating a segment of GPU instructions; if the thread group's index in each dimension is i_1, i_2, …, i_d and the maximum number of thread groups in each dimension is n_1, n_2, …, n_d, then the one-dimensional index GroupIndex of the thread group is:
GroupIndex = i_1 + n_1 × (i_2 + n_2 × (i_3 + … + n_(d-2) × (i_(d-1) + n_(d-1) × i_d) … )).
The invention converts source code into SPIR-V intermediate code, i.e. a SPIR-V instruction sequence, and then translates that sequence into GPU instructions; the translation result is likewise an instruction sequence. The instructions available vary with the GPU's architecture and implementation, but typically include basic instructions for computation, data management, and control flow: arithmetic instructions such as add, sub, mul, and div, and load and store instructions for memory access, where load reads data from a memory address and store writes data to one. In embodiments of the present invention, the translation of SPIR-V instructions relies mainly on these instructions.
Before translation, a segment of GPU instructions that computes the one-dimensional thread-group index GroupIndex is generated and placed at the head of the GPU instruction sequence, so that it executes first. Let i_1, i_2, …, i_d be the thread group's index in each dimension and n_1, n_2, …, n_d the maximum number of thread groups in each dimension, where d is the number of dimensions. The one-dimensional index GroupIndex of the thread group is:
GroupIndex = i_1 + n_1 × (i_2 + n_2 × (i_3 + … + n_(d-2) × (i_(d-1) + n_(d-1) × i_d) … )).
For example, for common standards such as OpenCL and Vulkan, the maximum supported thread-group dimension is three, so GroupIndex = i_1 + i_2×n_1 + i_3×n_1×n_2. Expressed in three-address code:
R1 = i2 × n1
R2 = i1 + R1          // computes i1 + i2*n1
R3 = n1 × n2
R4 = R3 × i3          // computes i3*n1*n2
GroupIndex = R4 + R2  // GroupIndex = i1 + i2*n1 + i3*n1*n2
Three-address code is a representation used for intermediate code generation and is widely used in compiler design. Each three-address instruction typically involves three addresses: two operands and one result.
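The nested flattening formula can be evaluated Horner-style; the following sketch (hypothetical helper, not part of the patent) reproduces the three-dimensional case above:

```python
# Sketch of GroupIndex = i1 + n1*(i2 + n2*(i3 + ...)), evaluated Horner-style.
def group_index(idx, dims):
    """idx = (i1..id) thread-group index; dims = (n1..nd) group counts per dimension."""
    gi = 0
    for i, n in zip(reversed(idx), reversed(dims)):
        gi = i + n * gi  # peel one level of the nested formula per iteration
    return gi

# Three-dimensional case reduces to i1 + i2*n1 + i3*n1*n2:
print(group_index((2, 3, 1), (4, 5, 6)))  # 2 + 3*4 + 1*4*5 = 34
```

Note that n_d never multiplies anything, matching the formula: the size of the last dimension is not needed to flatten the index.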
S33, calculating the shared memory address of the thread group in the shared memory space, i.e. the start address SmAddr = GroupIndex × S_total + SmBaseAddr.
Each thread must compute the start address SmAddr of its thread group within the shared memory space before using it; in this embodiment the regions used by the different thread groups are arranged linearly within the shared memory space, so SmAddr = GroupIndex × S_total + SmBaseAddr.
Thus, two instructions are appended after the instructions that compute GroupIndex, expressed in three-address code as follows:
AddrOffset = GroupIndex × S_total
SmAddr = AddrOffset + SmBaseAddr
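Emitting this two-instruction prologue can be sketched as follows; the tuple encoding and the MUL/ADD mnemonics are illustrative stand-ins, not a real GPU ISA:

```python
# Hypothetical emitter for the prologue that computes
# SmAddr = GroupIndex * S_total + SmBaseAddr, as basic MUL/ADD instructions.
def emit_prologue(s_total, sm_base_addr):
    return [
        ("MUL", "AddrOffset", "GroupIndex", s_total),   # AddrOffset = GroupIndex * S_total
        ("ADD", "SmAddr", "AddrOffset", sm_base_addr),  # SmAddr = AddrOffset + SmBaseAddr
    ]

for instr in emit_prologue(148, 0x10000):  # assumed S_total and base address
    print(instr)
```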
Step S4, traversing the instruction sequence of the intermediate code, and for the instructions in the function statement and the function definition, if the processing of the shared memory is involved, calculating the real address and generating different GPU instructions according to the instruction types.
The GPU instructions generated in this step correspond to the OpLoad, OpStore, and OpControlBarrier instructions of SPIR-V, where OpLoad is used for memory reads, OpStore for memory writes, and OpControlBarrier for controlling the synchronization and memory barriers of memory accesses.
For the OpLoad instruction: OpLoad in SPIR-V is used for memory reads. Among the operands of the OpLoad instruction is its result type; if the associated storage class is Workgroup, the instruction reads data from shared memory. The pointer operand of OpLoad is the memory address; tracing it back to the originating OpVariable instruction yields the previously recorded variable offset offset_k. An add instruction is then emitted before the generated OpLoad in the GPU instruction sequence to compute the instruction's real address realAddr, expressed in three-address code as follows:
realAddr = offset_k + SmAddr
If the pointer operand of the OpLoad instruction comes directly from the return value of the OpVariable instruction, meaning no intermediate memory operation occurs, the address read by the OpLoad instruction is realAddr; if the pointer operand comes from the return value of a SPIR-V chained-access instruction, the address offset is computed recursively from the chained instruction's base address and indices, and realAddr plus that offset is used as the read address.
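For arrays, the recursive offset of such a chained access reduces to a sum of index-times-stride terms; a simplified sketch (arrays only — a full implementation must also walk struct member offsets):

```python
# Simplified sketch: extra byte offset contributed by a chain of indices,
# as when an OpLoad pointer comes from a chained-access instruction rather
# than directly from OpVariable. The final read address is then
# realAddr + chain_offset(...).
def chain_offset(indices, strides):
    """indices[j] indexes a nesting level whose element stride is strides[j]."""
    off = 0
    for idx, stride in zip(indices, strides):
        off += idx * stride
    return off

# float arr[8][4]: strides are 16 bytes (inner row of 4 floats) and 4 bytes.
print(chain_offset([2, 3], [16, 4]))  # 2*16 + 3*4 = 44
```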
For the OpStore instruction: OpStore in SPIR-V is used for memory writes. Whether the OpStore instruction writes data into the shared memory space is judged by whether the storage class of its pointer operand is Workgroup; the real address is calculated in the same way as for the OpLoad instruction.
For the OpControlBarrier instruction: OpControlBarrier in SPIR-V controls the synchronization and memory barriers of memory accesses. A corresponding GPU barrier instruction (Barrier) is generated according to the execution scope and memory scope of the OpControlBarrier synchronization, to ensure correct synchronization of the thread group.
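The per-opcode translation of step S4 can be sketched as a small dispatch; the LOAD, STORE, and BARRIER mnemonics are illustrative stand-ins for whatever basic instructions the target instruction set provides:

```python
# Hypothetical dispatch mapping shared-memory SPIR-V instructions
# (opcode names from the SPIR-V spec) onto basic GPU instructions.
def translate(op, real_addr=None, value_reg=None, dst_reg=None):
    if op == "OpLoad":
        return ("LOAD", dst_reg, real_addr)     # read shared memory at realAddr
    if op == "OpStore":
        return ("STORE", real_addr, value_reg)  # write value to realAddr
    if op == "OpControlBarrier":
        return ("BARRIER",)                     # thread-group synchronization
    raise ValueError(f"not a shared-memory instruction: {op}")

print(translate("OpLoad", real_addr=44, dst_reg="R0"))  # ('LOAD', 'R0', 44)
```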
All instructions are traversed according to steps S2-S4 until translation is complete. As the above steps show, the invention is a method for allocating and computing shared memory and mapping the shared-memory-handling instructions in SPIR-V code onto the GPU instruction set: it records the sizes and offsets of all shared memory variables in the SPIR-V code, allocates a shared memory space of corresponding size, generates instructions at the head of the GPU instruction sequence so that each thread computes its thread group's shared memory address before executing operations, and translates the remaining shared-memory-handling SPIR-V instructions into GPU instructions with the same function.
In summary, the technical scheme of the invention has the following characteristics:
First, the invention traverses the variable declaration instructions in the SPIR-V code to precisely calculate the shared memory space required by each thread group, records the offset of each variable within the shared memory, and finally computes the total shared memory space required by all thread groups. Second, after the shared memory space has been allocated according to that total, a segment of basic instructions is generated at the head of the GPU instruction sequence to compute the one-dimensional index of each thread group and, from it, the thread group's shared memory address, which is used to compute real addresses when subsequent instructions are translated. Third, when translating SPIR-V memory read/write instructions, if a shared memory variable is read or written, an add instruction is inserted to compute the real address, and the SPIR-V memory barrier instruction OpControlBarrier is converted into a GPU instruction with the same semantics to ensure correct synchronization of the thread groups.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (4)

1. A method for processing inter-thread shared memory based on SPIR-V, characterized by comprising the following steps:
S1, reading the source code of high-level language code into a character buffer, and converting the source code in the buffer into intermediate code in SPIR-V form;
S2, traversing the instruction sequence of the intermediate code; for each variable declaration, if the storage class of the variable is Workgroup, identifying the variable as a shared memory variable, calculating the space it occupies, and recording its offset; finally accumulating the total space occupied by all shared memory variables;
S3, calculating the size of the shared memory space to be allocated from the total occupied space and obtaining a base address, and, by computing the one-dimensional index of the thread group, calculating the thread group's shared memory address from the one-dimensional index, the total occupied space, and the base address;
S4, traversing the instruction sequence of the intermediate code; for instructions in function declarations and function definitions that involve shared memory, calculating the real address and generating different GPU instructions according to the instruction type.
2. The method for processing shared memory between threads based on SPIR-V as recited in claim 1, wherein the specific process of step S2 is as follows:
S21, traversing the instruction sequence of the intermediate code, where the OpVariable instruction is used for variable declarations; for each variable declaration, judging whether the storage class of the OpVariable instruction is Workgroup, and if so, the declared variable is a shared memory variable;
S22, calculating the space size S_i of the declared variable from the type operand of the OpVariable instruction: if the type is OpTypeVoid, no space is occupied and S_i is 0; if the type is a primitive type, S_i is the size of that primitive type; if the type is a composite data type, S_i is the size of the whole composite type;
S23, calculating the total space S_total = S_1 + S_2 + … + S_N occupied by all shared memory variables, where N is the number of shared memory variables; in addition, synchronously recording the offset offset_k of the k-th shared memory variable, offset_k = S_1 + … + S_(k-1), i.e. the sum of the space occupied by all variables preceding it.
3. The method for processing shared memory between threads based on SPIR-V as recited in claim 2, wherein the step S3 comprises the following specific procedures:
S31, calculating the size SmTotalSize, smTotalSize =S total × GroupNum of the shared memory space to be allocated, wherein GroupNum is the number of all thread groups, and allocating a region with the size SmTotalSize in the GPU video memory for the shared memory space, wherein the base address of the region is SmBaseAddr;
S32, calculating a one-dimensional index GroupIndex of the thread group by generating a segment of GPU instructions, wherein if the index of the thread group in each dimension is i_1, i_2, …, i_d and the maximum number of thread groups in each dimension is n_1, n_2, …, n_d, the one-dimensional index GroupIndex of the thread group is:
GroupIndex = i_1 + n_1 × (i_2 + n_2 × (i_3 + … + n_{d-2} × (i_{d-1} + n_{d-1} × i_d) … ));
S33, calculating the shared memory address of the thread group in the shared memory space, i.e. the start address SmAddr = GroupIndex × S_total + SmBaseAddr.
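Steps S32–S33 amount to a Horner-style flattening of the d-dimensional group index followed by a linear address computation. A minimal sketch under those formulas (function names are illustrative):

```python
# Sketch of steps S32-S33: flatten a d-dimensional thread-group index into
# GroupIndex, then compute that group's start address in the shared region.
def group_index(idx, dims):
    """idx = [i_1..i_d], dims = [n_1..n_d]; Horner-style evaluation of
    i_1 + n_1*(i_2 + n_2*(... + n_{d-1}*i_d ...))."""
    gi = 0
    for i, n in zip(reversed(idx), reversed(dims)):
        gi = i + n * gi   # n_d multiplies 0, so it never affects the result
    return gi

def shared_addr(idx, dims, s_total, sm_base):
    # SmAddr = GroupIndex * S_total + SmBaseAddr
    return group_index(idx, dims) * s_total + sm_base
```

For idx = (1, 2, 3) and dims = (4, 5, 6), GroupIndex = 1 + 4 × (2 + 5 × 3) = 69, so with S_total = 28 and SmBaseAddr = 1000 the group's region starts at 2932.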
4. The SPIR-V based inter-thread shared memory processing method as recited in claim 3, wherein in step S4, different GPU instructions are generated according to the instruction type, covering the OpLoad, OpStore, and OpControlBarrier instructions, wherein the OpLoad instruction is used for memory reads, the OpStore instruction is used for memory writes, and the OpControlBarrier instruction is used for controlling the synchronization and memory barriers of memory accesses:
For an OpLoad instruction, before generation, calculating the real address of the instruction, realAddr = offset_k + SmAddr; if the pointer operand of the OpLoad instruction comes directly from the return value of an OpVariable instruction, the address read by the OpLoad instruction is realAddr; if the pointer operand of the OpLoad instruction comes from the return value of a SPIR-V chained access instruction, recursively calculating the address offset from the base address and indexes of the chained access instruction, and adding that offset to realAddr as the address read by the OpLoad instruction;
For an OpStore instruction, judging whether the storage class of the pointer operand of the OpStore instruction is Workgroup to determine whether data are written into the shared memory space, the real address being calculated in the same way as for the OpLoad instruction;
For an OpControlBarrier instruction, generating a corresponding GPU barrier instruction according to the execution scope and memory scope synchronized by the OpControlBarrier instruction, so as to ensure correct synchronization of the thread groups.
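The recursive offset computation for chained accesses described in claim 4 can be sketched as follows. The type representation (nested tuples for arrays and structs) is a simplifying assumption for illustration; in SPIR-V the chained access would be an OpAccessChain over the declared types:

```python
# Sketch of the OpLoad/OpStore address computation: realAddr = offset_k + SmAddr,
# plus a recursively accumulated offset when the pointer comes from a chained
# access. The type model below is a simplifying assumption.
def chain_offset(type_info, indexes):
    """type_info: ("scalar", size), ("array", elem_size, elem_type),
    or ("struct", [field_offsets], [field_types]).
    Recursively accumulates the byte offset for the given index chain."""
    if not indexes:
        return 0
    kind = type_info[0]
    idx = indexes[0]
    if kind == "array":
        elem_size, elem_type = type_info[1], type_info[2]
        return idx * elem_size + chain_offset(elem_type, indexes[1:])
    if kind == "struct":
        field_offsets, field_types = type_info[1], type_info[2]
        return field_offsets[idx] + chain_offset(field_types[idx], indexes[1:])
    return 0  # scalar leaf: nothing further to index into

def real_address(offset_k, sm_addr, type_info=None, indexes=()):
    base = offset_k + sm_addr       # realAddr for a direct OpVariable pointer
    if indexes:                     # pointer produced by a chained access
        base += chain_offset(type_info, list(indexes))
    return base
```

For instance, indexing field 1 (an array of 4-byte elements at byte offset 16) and then element 3 adds 16 + 12 = 28 bytes on top of realAddr.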
CN202411590422.9A 2024-11-08 2024-11-08 Method for processing shared memory among threads based on SPIR-V Pending CN119537051A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411590422.9A CN119537051A (en) 2024-11-08 2024-11-08 Method for processing shared memory among threads based on SPIR-V


Publications (1)

Publication Number Publication Date
CN119537051A true CN119537051A (en) 2025-02-28

Family

ID=94707454

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411590422.9A Pending CN119537051A (en) 2024-11-08 2024-11-08 Method for processing shared memory among threads based on SPIR-V

Country Status (1)

Country Link
CN (1) CN119537051A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination