CN116051710A - A shader architecture with texture unit and cache multiplexing

Info

Publication number: CN116051710A
Application number: CN202211553464.6A
Authority: CN
Inventors: 查道路; 王攀
Original assignee: Suzhou Graphichina Electronic Technology Co ltd
Current assignee: Suzhou Graphichina Electronic Technology Co ltd
Priority date: 2022-12-06
Filing date: 2022-12-06
Publication date: 2023-05-02

Abstract

The invention discloses a shader architecture in which texture units and caches are multiplexed. In the present invention, the multi-port cache internally maps the data of the entire memory space to different single-port caches according to the low-order bits of the address, as shown in FIG. 3, thereby providing multiple read interfaces. Requests from the multiple ports are merged by the address merging unit, and each port's request is mapped to one of the internal single-port caches. After the requested data for all ports has been returned, the multi-port cache outputs the data. The present invention proposes a unified shader architecture in which texture units and caches are multiplexed. The advantages of this architecture are: the vertex shader and the pixel shader share the same architecture; no additional units are required to process the different request types, which saves on-chip resources; and the multi-port cache design improves request efficiency for multiple parallel processing units.

Description

A shader architecture with texture unit and cache multiplexing

Technical field

The invention belongs to the technical field of computer graphics, and specifically relates to a shader architecture in which a texture unit and a cache are multiplexed.

Background

A shader is a programmable program used to implement image rendering in place of the fixed rendering pipeline. The vertex shader (VertexShader) is mainly responsible for computations such as the geometric relationships of vertices and needs to access vertex information; the pixel shader (PixelShader) is mainly responsible for computations such as fragment color and needs to access texture information. Vertex data is usually accessed through a cache, whereas texture information requires an additional texture unit to manage requests. In a typical architecture, the cache and the texture unit are two independent modules.

US patent application publication US202217583151A proposes a graphics processing system comprising geometry processing logic, rasterization, and a shader (texture). The geometry processing logic uses a cache to hold vertex data, while the texture unit fetches texture data directly from memory.

However, although this scheme provides a graphics processing system architecture, only the pixel shader is programmable, the geometry stage uses a fixed pipeline, and the cache of the geometry stage is not shared with the pixel shader.

Summary of the invention

The object of the present invention is to solve the problems described above by providing a shader architecture in which a texture unit and a cache are multiplexed.

The technical solution adopted by the present invention is as follows: a shader architecture in which a texture unit and a cache are multiplexed, the architecture comprising a texture unit and a shader architecture, wherein the texture unit and the shader architecture are connected by a private bus.

In a preferred embodiment, the operation flow of the shader architecture in which the texture unit and the cache are multiplexed includes the following steps:

S1: In the vertex shading stage, vertex data is requested through the vertex bus;

S2: In the pixel shading stage, texture data is requested through the texture bus.

In a preferred embodiment, in step S1, in the vertex shading stage, the shader requests vertex data directly by address through the vertex private bus. After the vertex request is sent to the texture unit, it is passed through directly to the cache. The cache returns the data to the shader directly through the vertex private bus.

In a preferred embodiment, in step S2, in the pixel shading stage, the shader requests texture data through the pixel private bus using texture coordinates and a texture id. After the texture request is sent to the texture unit, the texture unit first obtains the texture information from the texture lookup table according to the texture id; it then converts the request into an address according to the texture information and the texture coordinates, and retrieves the data from the cache; after the data is retrieved, it is decoded according to the texture information and finally returned to the shader through the bus.
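The conversion from a texture id and texture coordinates to a cache address can be sketched as follows. This is only an illustration under stated assumptions: the source does not specify the fields of the texture lookup table, the texel format, the memory layout, or the filtering mode, so `TextureInfo`, `TEXTURE_TABLE`, the row-major RGBA8 layout, and nearest-texel addressing are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextureInfo:
    """One entry of the texture lookup table (all fields are assumed)."""
    base_addr: int        # byte address of texel (0, 0)
    width: int            # texels per row
    height: int           # number of rows
    bytes_per_texel: int  # 4 for the RGBA8 format assumed here


# Hypothetical lookup table indexed by texture id.
TEXTURE_TABLE = {
    0: TextureInfo(base_addr=0x1000, width=256, height=256, bytes_per_texel=4),
}


def texel_address(tex_id: int, u: float, v: float) -> int:
    """Convert normalized coordinates (u, v) to a byte address (nearest texel)."""
    info = TEXTURE_TABLE[tex_id]
    # Scale to texel indices and clamp to the texture edges.
    x = max(0, min(int(u * info.width), info.width - 1))
    y = max(0, min(int(v * info.height), info.height - 1))
    # Row-major layout is an assumption, not something the source states.
    return info.base_addr + (y * info.width + x) * info.bytes_per_texel


def decode_rgba8(raw: bytes) -> tuple:
    """Decode four raw bytes into normalized (r, g, b, a) floats."""
    return tuple(b / 255.0 for b in raw)
```

In this sketch the address returned by `texel_address` is what the texture unit would send to the cache, and `decode_rgba8` stands in for the format-dependent decoding step applied to the returned bytes.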

In a preferred embodiment, the multi-port cache internally maps the data of the entire memory space to different single-port caches according to the low-order bits of the address, as shown in FIG. 3, thereby providing multiple read interfaces. Requests from the multiple ports are merged by the address merging unit, and each port's request is mapped to one of the internal single-port caches. After the requested data for all ports has been returned, the multi-port cache outputs the data.
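A minimal sketch of the low-order-bit mapping, assuming four single-port caches and 64-byte cache lines. Neither number appears in the source, and it is also an assumption that the selecting bits sit just above the in-line byte offset rather than at the very bottom of the address.

```python
NUM_BANKS = 4     # assumed number of single-port caches; must be a power of two
LINE_BYTES = 64   # assumed cache-line size in bytes


def bank_index(addr: int) -> int:
    """Select a single-port cache from the low-order bits of the line address."""
    line = addr // LINE_BYTES          # drop the byte offset within the line
    return line & (NUM_BANKS - 1)      # low bits of the line number pick the bank


# Consecutive cache lines rotate through the banks.
banks = [bank_index(i * LINE_BYTES) for i in range(6)]
```

With this kind of interleaving, requests that touch neighboring lines tend to land in different single-port caches and can be served in parallel.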

In summary, by adopting the above technical solution, the beneficial effects of the present invention are:

The present invention proposes a unified shader architecture in which a texture unit and a cache are multiplexed. The advantages of this architecture are: the vertex shader and the pixel shader share the same architecture; no additional units are required to process the different request types, which saves on-chip resources; and the multi-port cache design improves request efficiency for multiple parallel processing units.

Description of drawings

FIG. 1 is a system framework diagram of the present invention;

FIG. 2 is an architecture diagram of the multi-port cache in the present invention;

FIG. 3 is a cache memory mapping diagram in the present invention.

Detailed description

In order to make the object, technical solution, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention, not to limit it.

Referring to FIGS. 1 to 3,

Embodiment:

A shader architecture in which a texture unit and a cache are multiplexed, the architecture comprising a texture unit and a shader architecture, wherein the texture unit and the shader architecture are connected by a private bus.

The operation flow of the shader architecture in which the texture unit and the cache are multiplexed includes the following steps:

In step S1, in the vertex shading stage, the shader requests vertex data directly by address through the vertex private bus. After the vertex request is sent to the texture unit, it is passed through directly to the cache. The cache returns the data to the shader directly through the vertex private bus.

In step S2, in the pixel shading stage, the shader requests texture data through the pixel private bus using texture coordinates and a texture id. After the texture request is sent to the texture unit, the texture unit first obtains the texture information from the texture lookup table according to the texture id; it then converts the request into an address according to the texture information and the texture coordinates, and retrieves the data from the cache; after the data is retrieved, it is decoded according to the texture information and finally returned to the shader through the bus.
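The two request paths of steps S1 and S2 can be modeled as one texture unit in front of one shared cache. The sketch below is a behavioral illustration only: the class and method names, the `(base_addr, width)` table entry, integer texel coordinates, and the 8-bit single-channel texel format are assumptions chosen to keep the example short, not details taken from the source.

```python
class SharedCache:
    """Stand-in for the shared cache: serves byte reads from a flat memory image."""

    def __init__(self, memory: bytes):
        self.memory = memory
        self.reads = 0  # number of read requests served, from either path

    def read(self, addr: int, size: int) -> bytes:
        self.reads += 1
        return self.memory[addr:addr + size]


class TextureUnit:
    """Front end for both request types; the table layout is an assumption."""

    def __init__(self, cache: SharedCache, texture_table: dict):
        self.cache = cache
        self.texture_table = texture_table  # tex_id -> (base_addr, width_in_texels)

    def vertex_request(self, addr: int, size: int) -> bytes:
        # S1: the request already carries an address, so it passes straight through.
        return self.cache.read(addr, size)

    def texture_request(self, tex_id: int, x: int, y: int) -> int:
        # S2: look up the texture, turn (x, y) into an address, fetch, decode.
        base_addr, width = self.texture_table[tex_id]
        raw = self.cache.read(base_addr + y * width + x, 1)
        return raw[0]  # 8-bit single-channel texel assumed, so decoding is trivial


memory = bytes(range(256))
unit = TextureUnit(SharedCache(memory), {7: (128, 16)})
vertex_bytes = unit.vertex_request(addr=4, size=4)
texel = unit.texture_request(tex_id=7, x=3, y=2)
```

Both calls end in the same `SharedCache.read`, which is the point of the multiplexed design: the vertex path adds no cache of its own, and the texture path differs only in the translation and decoding done around the fetch.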

The multi-port cache internally maps the data of the entire memory space to different single-port caches according to the low-order bits of the address, as shown in FIG. 3, thereby providing multiple read interfaces. Requests from the multiple ports are merged by the address merging unit, and each port's request is mapped to one of the internal single-port caches. After the requested data for all ports has been returned, the multi-port cache outputs the data.
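One possible reading of the address merging unit is sketched below: requests from all ports are grouped by the single-port cache they map to, and ports that ask for the same cache line share one access. The source does not describe the merging policy, so the coalescing rule, the one-line-per-cycle bank model, and the constants `NUM_BANKS` and `LINE_BYTES` are all assumptions.

```python
from collections import defaultdict

NUM_BANKS = 4     # assumed number of single-port caches
LINE_BYTES = 64   # assumed cache-line size in bytes


def merge_requests(port_addrs):
    """Group per-port addresses by bank, merging ports that hit the same line.

    Returns {bank: {line: [ports sharing that line]}}.
    """
    per_bank = defaultdict(dict)
    for port, addr in enumerate(port_addrs):
        line = addr // LINE_BYTES
        bank = line % NUM_BANKS  # low-order bits of the line number
        per_bank[bank].setdefault(line, []).append(port)
    return dict(per_bank)


def cycles_needed(port_addrs):
    """Cycles until every port has its data, if each bank serves one line per cycle."""
    merged = merge_requests(port_addrs)
    return max((len(lines) for lines in merged.values()), default=0)


spread = cycles_needed([0, 64, 128, 192])     # four ports, four different banks
conflict = cycles_needed([0, 8, 256, 64])     # lines 0, 0, 4 in bank 0; line 1 in bank 1
```

Under this model the multi-port cache outputs its data once the slowest bank has finished, which matches the statement that output happens after the data for all ports has been returned.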

It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprises", "includes", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent in such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus comprising that element.

The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements for some of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A shader architecture in which a texture unit and a cache are multiplexed, characterized in that: the shader architecture in which the texture unit and the cache are multiplexed comprises a texture unit and a shader architecture, wherein the texture unit and the shader architecture are connected by a private bus.

2. The shader architecture in which a texture unit and a cache are multiplexed as recited in claim 1, wherein: the operation flow of the shader architecture in which the texture unit and the cache are multiplexed comprises the following steps:

S1: in the vertex shading stage, vertex data is requested through a vertex bus;

S2: in the pixel shading stage, texture data is requested through a texture bus.

3. The shader architecture in which a texture unit and a cache are multiplexed as recited in claim 2, wherein: in step S1, in the vertex shading stage, the shader requests the vertex data directly by address through the vertex private bus; after the vertex request is sent to the texture unit, the vertex request is passed through directly to the cache; the cache returns the data to the shader directly through the vertex private bus.

4. The shader architecture in which a texture unit and a cache are multiplexed as recited in claim 2, wherein: in step S2, in the pixel shading stage, the shader requests texture data through a pixel private bus using texture coordinates and a texture id; after the texture request is sent to the texture unit, texture information is first obtained from a texture lookup table according to the texture id; the request is then converted into an address according to the texture information and the texture coordinates, and data is retrieved from the cache; after the data is retrieved, it is decoded according to the texture information and finally returned to the shader through the bus.

5. The shader architecture in which a texture unit and a cache are multiplexed as recited in claim 2, wherein: the multi-port cache internally maps the data of the entire memory space into different single-port caches according to the low-order bits of the address, as shown in FIG. 3, so that a plurality of read interfaces are provided; the requests of the plurality of ports are merged by an address merging unit, and the request of each port is mapped to one of the internal single-port caches; after the requested data for all ports has been returned, the multi-port cache outputs the data.