CN101216932A

CN101216932A - Graphics processing apparatus, unit and method for performing triangle configuration and attribute configuration

Info

Publication number: CN101216932A
Application number: CNA2008100018156A
Authority: CN
Inventors: 焦阳; 洪洲; 尹莉; 许云杰
Original assignee: Via Technologies Inc
Current assignee: Via Technologies Inc
Priority date: 2008-01-03
Filing date: 2008-01-03
Publication date: 2008-07-09
Anticipated expiration: 2028-01-03
Also published as: CN101216932B

Abstract

The invention discloses a graphics processing apparatus, unit, and method for performing triangle configuration and attribute configuration. One embodiment includes at least one execution unit operable for multi-threaded operation. The execution unit can execute at least one thread for triangle configuration operations and attribute configuration operations, as well as threads for pixel shader, geometry shader, and vertex shader operations. The apparatus, unit, and method can remove at least some hardware components, which reduces the number of gates in the system and yields a more efficient graphics pipeline. They are also flexible and extensible when fixing program errors, adding new features, or adjusting algorithms.

Description

Graphics processing device, unit, and method for executing triangle configuration and attribute configuration

Technical Field

The present invention relates to computer graphics systems and, more particularly, to systems and methods for the triangle configuration and attribute configuration stages of a graphics pipeline.

Background

As is well known, the art and science of three-dimensional ("3-D") computer graphics concerns the generation, or rendering, of two-dimensional ("2-D") images of 3-D objects for display on a display device or monitor, such as a cathode ray tube (CRT) or a liquid crystal display (LCD). An object may be a simple geometric primitive such as a point, a line segment, a triangle, or a polygon. More complex objects can be rendered on a display device by representing them as a series of connected planar polygons, for example as a series of connected planar triangles. All geometric primitives can ultimately be described in terms of one vertex or a set of vertices, for example the coordinates (X, Y, Z) that define a point such as the endpoint of a line segment or the corner of a polygon.

To produce a data set that is displayed on a computer monitor or other display device as a 2-D projection representing a 3-D primitive, the vertices of the primitive are processed through a series of operations, or processing stages, in the graphics rendering pipeline. A graphics pipeline is simply a series of processing units, or stages, in which the output of a previous stage serves as the input to a subsequent stage. In a graphics processor, for example, these stages include per-vertex operations, primitive assembly operations, pixel operations, texture assembly operations, rasterization operations, and fragment operations.

In a typical graphics display system, an image database (e.g., a command list) may store descriptors for the objects of a scene. Objects are described by a number of small polygons that cover the surface of the object in the same way that small tiles cover a wall or other surface. Each polygon is described as a list of vertex coordinates (X, Y, Z in "model" coordinates), some specification of material surface properties (i.e., color, texture, shininess, etc.), and the normal vector to the surface at each vertex. For 3-D objects with complex curved surfaces, the polygons generally must be triangles or quadrilaterals, and the latter can always be decomposed into pairs of triangles.
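The decomposition of a quadrilateral into a pair of triangles can be illustrated with a short sketch. It assumes a convex quadrilateral whose vertices are listed in order around its boundary; the function name `quad_to_triangles` is hypothetical and does not come from the patent.

```python
def quad_to_triangles(quad):
    """Split a quadrilateral (a, b, c, d) along the diagonal a-c.

    Assumption: the quad is convex and its vertices are given in order
    around the boundary, so both triangles keep the original winding.
    """
    a, b, c, d = quad
    return [(a, b, c), (a, c, d)]


unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
triangles = quad_to_triangles(unit_square)
```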

A transformation engine transforms the object coordinates in response to the viewing angle that the user selects through user input. In addition, the user can specify the field of view, the size of the image to be produced, and the back end of the viewing volume, so as to include or eliminate background as desired.

Once the viewing area has been selected, clipping logic eliminates the polygons (i.e., triangles) that lie outside the viewing area and "clips" the polygons that lie partly inside and partly outside it. Each clipped polygon corresponds to the portion of the original polygon inside the viewing area, with new edges that coincide with the edges of the viewing area. The polygon vertices are then passed to the next stage in coordinates corresponding to the viewing screen (X, Y coordinates), together with a depth value (Z coordinate) for each vertex. In a typical system, a lighting model is next applied according to the light sources, and the polygons with their color values are then passed to the rasterizer.

For each polygon, the rasterizer determines which pixel positions the polygon covers and attempts to write the associated color values and depth values (Z values) into the frame buffer. The rasterizer compares the depth value (Z) of the polygon being processed with the depth value of a pixel that may already have been written into the frame buffer. If the depth value of the new polygon pixel is smaller, indicating that it lies in front of the polygon already written to the frame buffer, its value replaces the value in the frame buffer, because the new polygon obscures the polygon that was previously processed and written. This process is repeated until all polygons have been rasterized. At that point, the video controller displays the contents of the frame buffer on the display, one scan line at a time, in raster order.
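The depth comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: a smaller Z means nearer to the viewer, the depth buffer is cleared to infinity, and the names `make_buffers` and `depth_test_write` are hypothetical rather than taken from the patent.

```python
import math


def make_buffers(width, height, clear_color=(0, 0, 0)):
    """Create a color buffer and a depth buffer cleared to 'infinitely far'."""
    color = [[clear_color for _ in range(width)] for _ in range(height)]
    depth = [[math.inf for _ in range(width)] for _ in range(height)]
    return color, depth


def depth_test_write(color, depth, x, y, z, rgb):
    """Write a fragment only if it is nearer than the stored one.

    Assumption: smaller z is nearer. Returns True if the pixel was updated.
    """
    if z < depth[y][x]:
        depth[y][x] = z
        color[y][x] = rgb
        return True
    return False


color, depth = make_buffers(4, 4)
depth_test_write(color, depth, 1, 1, 0.8, (255, 0, 0))  # far red fragment is stored
depth_test_write(color, depth, 1, 1, 0.3, (0, 255, 0))  # nearer green fragment replaces it
depth_test_write(color, depth, 1, 1, 0.5, (0, 0, 255))  # blue fragment is hidden, rejected
```

The order in which polygons arrive does not change the final image in this sketch, because each pixel keeps only the nearest fragment seen so far.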

The default method for real-time rendering typically displays polygons as pixels that lie either inside or outside the polygon. As a result, the edges that define the polygon may look jagged in a static display and may appear to crawl in an animated display. The underlying problem that produces this effect is called aliasing, and the methods applied to reduce or eliminate it are called anti-aliasing techniques.

Screen-based anti-aliasing methods do not need to know about the objects being rendered, because they use only the pipeline output samples. One typical anti-aliasing method uses a linear anti-aliasing technique known as multi-sample anti-aliasing (MSAA), which takes more than one sample per pixel in a single pass. The number of samples, or sub-pixels, required for each pixel is called the sampling rate and, in theory, the amount of associated memory data increases as the sampling rate increases.
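A back-of-the-envelope calculation shows how the sample storage grows with the sampling rate. The figures below are assumptions for illustration only: 4 bytes of color and 4 bytes of depth per sample, stored uncompressed. Real hardware may compress or share sample data, and the helper name `msaa_buffer_bytes` is hypothetical.

```python
def msaa_buffer_bytes(width, height, samples_per_pixel,
                      bytes_color=4, bytes_depth=4):
    """Uncompressed storage for color + depth at a given sampling rate.

    Assumption: every sample stores its own color and depth value.
    """
    return width * height * samples_per_pixel * (bytes_color + bytes_depth)


for rate in (1, 2, 4, 8):
    mib = msaa_buffer_bytes(1920, 1080, rate) / 2**20
    print(f"{rate}x sampling: {mib:.1f} MiB")
```

Under these assumptions the storage grows linearly with the sampling rate.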

While the foregoing briefly summarizes the operation of the various processing components, those skilled in the art will recognize that the processing of graphics data is quite intensive. Improvements in processing, design, and manufacturing efficiency are therefore desired wherever possible. Fixed-function stages of the graphics pipeline, such as triangle configuration and attribute configuration, are necessary for processing the geometric primitives and pixels in the pipeline. In known graphics processing units, these fixed-function stages are executed in fixed-function or dedicated hardware components. The separate triangle configuration and attribute configuration units that are generally used require a considerable number of gates and communication lines, with the corresponding hardware cost. In addition, changing the triangle configuration and attribute configuration stages of the graphics pipeline requires changes to these expensive hardware components. Accordingly, a heretofore unaddressed need exists to overcome the deficiencies of the prior art.

Summary of the Invention

The present invention relates to systems and methods for implementing the triangle configuration and attribute configuration stages of a graphics pipeline. Briefly described, one embodiment of a system can be implemented as follows. The system includes at least one execution unit operable for multi-threaded operation, wherein the execution unit can execute at least one thread for triangle configuration operations and attribute configuration operations. The execution unit is programmable to execute at least one thread for at least one of the following: vertex shader operations, pixel shader operations, and geometry shader operations. The execution unit can further suspend the at least one thread created for the triangle configuration and attribute configuration operations. The execution unit can further output data from the programmable triangle configuration operations of the at least one thread to at least one hardware component outside the execution unit. The execution unit can further resume the suspended thread when data corresponding to the at least one thread is received. Finally, the execution unit can store result data from the thread in a buffer within the at least one execution unit, for use by subsequent threads created by the execution unit.
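The suspend, output, resume, and store sequence can be mimicked in software as a rough sketch. A Python generator stands in for a hardware thread here, which is purely an analogy and not the mechanism the patent describes. The names `setup_thread`, `fake_span_tile_generator`, and `eu_buffer` are hypothetical.

```python
def setup_thread(vertices, eu_buffer):
    """One thread: triangle configuration, suspend, then finish on resume."""
    (x0, y0), (x1, y1), (x2, y2) = vertices
    # Triangle configuration step: determinant (twice the signed area).
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    # Suspend: hand data to a component outside the execution unit and
    # wait until data for this thread comes back.
    covered = yield {"det": det, "vertices": vertices}
    # Resumed: store results in the execution unit buffer for later threads.
    if covered:
        eu_buffer.append({"det": det, "pixels": covered})


def fake_span_tile_generator(request):
    """Stand-in for external fixed-function hardware (assumption):
    rejects zero-area triangles, otherwise reports one covered pixel."""
    return [] if request["det"] == 0 else [request["vertices"][0]]


eu_buffer = []
thread = setup_thread(((0, 0), (4, 0), (0, 4)), eu_buffer)
request = next(thread)                 # thread runs until it suspends
reply = fake_span_tile_generator(request)
try:
    thread.send(reply)                 # resume; thread runs to completion
except StopIteration:
    pass                               # thread terminated normally
```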

The present invention further provides a graphics processing unit comprising: at least one execution unit operable for multi-threaded operation, wherein the at least one execution unit can execute at least one thread for triangle configuration operations and attribute configuration operations, and the execution unit is operable to perform programmable shader operations; and an execution unit pool control system for scheduling and managing the at least one thread of the at least one execution unit; wherein the execution unit pool control system can simultaneously initiate the at least one thread for the triangle configuration operations, the attribute configuration operations, and a programmable shader operation.

One embodiment of the method of the present invention includes the step of receiving vertex data corresponding to a geometric primitive. The embodiment further includes creating a thread within an execution unit operable for multi-threaded operation, wherein the execution unit can perform programmable shader operations. The embodiment further includes performing triangle configuration operations on the vertex data within the thread. Finally, the embodiment includes performing attribute configuration operations within the thread to generate attributes of the pixels identified by the vertex data, and terminating the thread.

The graphics processing apparatus, unit, and method for performing triangle configuration and attribute configuration according to the present invention can remove at least some hardware components, which reduces the number of gates in the system and yields a more efficient graphics pipeline. They are also flexible and extensible when fixing program errors, adding new features, or adjusting algorithms.

Brief Description of the Drawings

FIG. 1 depicts a functional flow diagram of certain components within a graphics pipeline in a computer graphics system.

FIG. 2 depicts a block diagram illustrating the fixed-function and programmable components of a graphics system.

FIG. 3 depicts a functional block diagram illustrating a graphics processing unit and certain internal components of the graphics processing unit.

FIG. 4 depicts a block diagram illustrating certain fixed-function and programmable components of a graphics system.

FIG. 5 depicts a functional block diagram illustrating a graphics processing unit and certain internal components of the graphics processing unit.

FIG. 6 depicts a flowchart of a method according to an embodiment of the present disclosure.

Detailed Description

Various embodiments of the invention, as illustrated in the drawings, are described in detail below. While the drawings depict several embodiments, they are not intended to limit the invention to the one or more embodiments disclosed herein. On the contrary, the scope of the invention covers all alternatives, modifications, and equivalents.

As noted above, the present invention relates to a system and method for integrating triangle configuration and attribute configuration operations into a programmable execution unit. Before discussing the implementation details of the various embodiments, reference is first made to FIG. 1, which is a block diagram of certain components in a graphics pipeline 100 that may be utilized by or used in embodiments of the present invention. The main components shown in FIG. 1 are a vertex shader 110, a geometry shader 120, a triangle configuration unit 130, a span and tile generator 140, an attribute configuration unit 150, a pixel shader 160, and a frame buffer 170. Those skilled in the art will know and understand the general function and operation of these components, so they need not be described in detail herein. In brief, however, a graphics primitive may be defined by position data (e.g., X, Y, Z, and W coordinates) together with lighting and texture information. All of this information may be passed to the vertex shader 110. As is known, the vertex shader 110 may perform various transformations on the graphics data received from the command list. In this regard, the data may be transformed from world coordinates to model view coordinates, then to projection coordinates, and finally to screen coordinates. The functional processing performed by the vertex shader 110 is known to those skilled in the art and requires no further description herein. The vertex shader 110 outputs geometric primitives to the geometry shader 120.

The geometry data generated by the geometry shader 120, along with other graphics data, is passed to the triangle configuration unit 130, which performs triangle configuration operations. The specific functions and implementation details of the triangle configuration unit 130 may vary from one embodiment to another. In general, the vertex information associated with triangle primitives may be passed to the triangle configuration unit 130, and operations may be performed on the various primitives defined by the graphics data passed to it. Among other operations, certain geometric transformations may be performed within the triangle configuration unit 130.

For a given vertex, geometry data such as x, y, z, and w information may be provided (where x, y, and z are geometric coordinates and w is the homogeneous coordinate). As is known to those skilled in the art, various transformations may be performed, for example from model space to world space, to eye space, to projection space, to homogeneous space, to normalized device coordinates (NDC), and finally to screen space (performed by the viewport transformation). It should be understood that this description omits certain components of the graphics pipeline for ease of description and clarity, although those skilled in the art will be aware of them. As a non-limiting example, certain stages of the rasterization pipeline of the graphics pipeline have been omitted for clarity, but those of ordinary skill in the art will appreciate that the graphics pipeline may include other stages.
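The last two steps of that chain, from homogeneous coordinates to NDC and then to screen space, can be sketched as follows. The conventions used are assumptions that the patent does not specify: an NDC cube spanning [-1, 1] on each axis, a screen origin at the top-left corner, and depth remapped to [0, 1]. The function name `clip_to_screen` is hypothetical.

```python
def clip_to_screen(x, y, z, w, width, height):
    """Perspective divide followed by a viewport transformation.

    Assumptions: NDC spans [-1, 1] on every axis, screen origin is the
    top-left corner, and depth is remapped from [-1, 1] to [0, 1].
    """
    if w == 0:
        raise ValueError("w must be non-zero for the perspective divide")
    ndc_x, ndc_y, ndc_z = x / w, y / w, z / w      # homogeneous -> NDC
    screen_x = (ndc_x + 1.0) * 0.5 * width         # NDC -> pixel columns
    screen_y = (1.0 - ndc_y) * 0.5 * height        # flip Y for top-left origin
    screen_z = (ndc_z + 1.0) * 0.5                 # depth in [0, 1]
    return screen_x, screen_y, screen_z


center = clip_to_screen(0.0, 0.0, 0.0, 1.0, 640, 480)
corner = clip_to_screen(2.0, 2.0, 2.0, 2.0, 640, 480)
```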

Referring now to FIG. 2, a block diagram of certain components or stages of a graphics pipeline 200 is illustrated. The first component is a command stream processor 252, which essentially receives or reads vertices from memory 250; these vertices are used to form geometric primitives and to create work items for the pipeline. In this regard, the command stream processor 252 reads data from memory and generates from that data the triangles, lines, points, or other primitives to be introduced into the pipeline. Once assembled, this geometry information is passed to a vertex shader 254. The vertex shader 254 is drawn with rounded edges; in this disclosure, rounded edges denote stages of the graphics pipeline that are implemented by executing instructions in a programmable execution unit or pool of execution units (as depicted in FIG. 3). As is known, the vertex shader 254 processes vertices by performing operations such as transformation, skinning, and lighting. The vertex shader 254 then passes the data to a geometry shader 256. The geometry shader 256 receives the vertices of a complete primitive as input and can output multiple vertices that form a single topology (such as a triangle strip, a line strip, a point list, etc.). The geometry shader 256 may also perform various algorithms, such as tessellation and shadow volume generation.

The geometry shader 256 outputs information to a triangle configuration unit 257 which, as is known, performs operations such as trivial triangle rejection, determinant calculation, culling, pre-attribute configuration KLMN, edge function calculation, and guard band clipping. Those of ordinary skill in the art will understand the operations required of a triangle configuration unit, so they need not be elaborated further. The triangle configuration unit 257 outputs information to a span and tile generator 258. This stage of the graphics pipeline is known in the art and need not be discussed in further detail. In summary, however, the span and tile generator 258 rejects a triangle if the triangle does not need to be rendered to the screen. It should be appreciated that other elements of the rasterization pipeline may also operate, such as the Z test or other fixed-function elements of the graphics pipeline. For example, a Z test may be performed to determine the depth of a triangle and thereby determine whether the triangle should be rejected because it need not be rendered to the screen. These elements are not discussed further herein, because they will be understood by those of ordinary skill in the art.

If a triangle processed by the triangle configuration unit 257 is not rejected by the span and tile generator 258 or by another stage of the graphics pipeline, an attribute configuration unit 259 of the graphics pipeline performs attribute configuration operations. The attribute configuration unit 259 generates a list of interpolation variables for the known and required attributes that are to be determined in subsequent stages of the pipeline. In addition, as is known, the attribute configuration unit 259 processes various attributes associated with the geometric primitives being processed by the graphics pipeline.

Each pixel covered by a primitive output by the attribute configuration unit 259 must be processed by a pixel shader 260. As is well known, the pixel shader 260 performs interpolation and other operations that determine the color of the pixels output to a frame buffer 262. The operation of the various components illustrated in FIG. 2 is well known to those skilled in the art, so the specific implementation and internal operation of these units need not be described herein.

Referring now to FIG. 3, a graphics processing unit (GPU) 300 of one embodiment is depicted. This graphics system is able to create programmable shaders such as geometry shaders, pixel shaders, vertex shaders, or other known shaders. The shaders are created by a program and can be executed by at least one of a plurality of programmable execution unit pools 306 (hereinafter, execution unit pool 306). It should be appreciated that the execution unit pool 306 may include processing cores capable of multi-threaded operation. Accordingly, the execution unit pool 306 can launch more than one thread assigned to a particular type of shader. For example, the execution unit pool 306 can launch and execute a thread for a geometry shader 310 on one set of data while simultaneously launching another thread for a vertex shader 308 on another set. For an example of the structure and operation of the execution unit pool, see co-pending U.S. application Ser. No. 11/406,543, filed April 19, 2006.

To summarize the above architecture, each execution unit in the execution unit pool 306 can process multiple instructions within a single clock cycle. Each execution unit can therefore process multiple threads simultaneously. For example, as mentioned above, an execution unit can simultaneously process a thread for geometry shader operations and a thread for pixel shader operations. A scheduler receives incoming tasks from multiple shader stages that need shader-related computations performed, and assigns them to execution units with available capacity. The threads within the execution units of the execution unit pool 306 are individually scheduled to perform shader-related computations, so that a given thread can be scheduled over time to perform shader operations for different shader stages. Furthermore, within a given execution unit, some threads may be assigned to the tasks of one shader while other threads are assigned to the tasks of other shader units. In this way, the load can be balanced among the execution units in the system to optimize throughput. Similarly, the load can be balanced among the available threads within the execution unit pool to maximize the throughput of the system. Because prior-art graphics systems use dedicated shader hardware, robust and dynamic thread management such as that of the above architecture is not available to them. Such systems therefore cannot achieve the flexibility and extensibility of this architecture.
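The load-balancing idea can be illustrated with a toy scheduler that always hands the next task to the least-loaded execution unit. This is a simplified sketch and not the scheduling policy of the patent or of the referenced application; the function name `schedule`, the task labels, and the fixed thread limit per unit are all assumptions.

```python
def schedule(tasks, num_units, threads_per_unit):
    """Assign each task to the execution unit with the fewest active threads.

    Assumption: every task occupies exactly one thread slot. Tasks that
    find no free slot are returned as pending.
    """
    units = [[] for _ in range(num_units)]
    pending = []
    for task in tasks:
        target = min(units, key=len)       # least-loaded unit, first wins ties
        if len(target) < threads_per_unit:
            target.append(task)
        else:
            pending.append(task)           # no free thread slot anywhere
    return units, pending


# Hypothetical labels: VS/GS/PS shader tasks plus a combined triangle and
# attribute configuration task ("TS") sharing the same pool.
units, pending = schedule(["VS", "GS", "PS", "TS", "PS"],
                          num_units=2, threads_per_unit=2)
```

In this sketch a single unit ends up holding threads for different stages at the same time, which is the property the paragraph above relies on.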

An execution unit pool control and cache subsystem 304 contains a level 2 (L2) cache for use by the execution unit pool 306 and a system (not shown) for scheduling the execution unit pool 306. In this graphics processing unit, communication between the execution unit pool 306 and its external components takes place through the execution unit pool control and cache subsystem 304, although, as is known, other lines and/or communication links may also be connected directly to the execution unit pool to facilitate execution of the graphics pipeline. In particular, a triangle configuration unit 314, an attribute configuration unit 316, and a span and tile generator 318 are fixed-function hardware logic components that can communicate with the execution unit pool 306 through the execution unit pool control and cache subsystem 304.

As mentioned above with reference to FIG. 2, certain components of the graphics pipeline have been omitted from the drawings for clarity. Similarly, FIG. 3 omits certain components of the graphics processing unit 300 for clarity; those of ordinary skill in the art will appreciate that other components may be required. The operations used for triangle configuration, attribute configuration, and the span and tile generator are known to those of ordinary skill in the art and need not be discussed in further detail. In one embodiment, the triangle configuration unit 314 performs operations such as the following: trivial triangle rejection, determinant calculation, bounding box calculation, culling, pre-attribute configuration KLMN, edge function generation, clipping, and guard band clipping. Similarly, the attribute configuration unit 316 performs operations such as processing the attributes of pixels in preparation for the pixel shader and for pixel shader operations.

Referring now to FIG. 4, a graphics pipeline 400 of one embodiment of the present invention is depicted. The graphics pipeline 400 depicted in FIG. 4 contains innovations that distinguish it from prior-art graphics pipelines. Data generally moves down the pipeline from a command stream processor 452. As mentioned above, a vertex shader 454 is drawn with rounded edges, which indicates that it is a stage of the graphics pipeline implemented by executing instructions in a programmable execution unit or pool of execution units. Similarly, a geometry shader 456 is a programmable stage of the graphics pipeline and is therefore also implemented by executing instructions in a programmable execution unit or pool of execution units.

As mentioned above, the triangle configuration 457 stage of a graphics pipeline is typically a fixed-function stage, meaning that it is not user-programmable. The triangle configuration 457 stage accepts data, performs predetermined operations on the data, and outputs the results. Previous implementations of the triangle configuration 457 stage typically used separate hardware components, distinct from the programmable execution units that serve the programmable stages of the graphics pipeline 400 (such as the geometry shader 456 or the vertex shader 454). According to an embodiment of the invention, the triangle configuration 457 stage may be implemented within a programmable execution unit or pool of execution units, even though the triangle configuration 457 stage is not normally a user-programmable stage of the graphics pipeline. As mentioned above, triangle configuration operations may include trivial triangle rejection, determinant calculation, bounding box calculation, culling, pre-attribute configuration KLMN, edge function generation, clipping, and guard band clipping.
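Several of the listed operations can be expressed as ordinary instructions, which is what makes a software implementation plausible. The sketch below covers only determinant calculation, culling, bounding box calculation, trivial rejection, and edge function generation; guard band clipping and the KLMN step are left out. It assumes screen-space 2-D vertices and treats a positive determinant as front-facing, which is a convention chosen here and not stated in the patent. The names `triangle_setup` and `covers` are hypothetical.

```python
def triangle_setup(v0, v1, v2, width, height, cull_backfaces=True):
    """Return setup data for a screen-space triangle, or None if rejected."""
    (x0, y0), (x1, y1), (x2, y2) = v0, v1, v2
    # Determinant = twice the signed area of the triangle.
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        return None                        # degenerate (zero-area) triangle
    if cull_backfaces and det < 0:
        return None                        # culled (assumed back-facing)
    # Bounding box, clamped to the screen.
    min_x, max_x = max(min(x0, x1, x2), 0), min(max(x0, x1, x2), width - 1)
    min_y, max_y = max(min(y0, y1, y2), 0), min(max(y0, y1, y2), height - 1)
    if min_x > max_x or min_y > max_y:
        return None                        # trivially rejected: off-screen

    def edge(xa, ya, xb, yb):
        # E(x, y) = a*x + b*y + c is the cross product (B - A) x (P - A).
        return (ya - yb, xb - xa, xa * yb - xb * ya)

    edges = [edge(x0, y0, x1, y1), edge(x1, y1, x2, y2), edge(x2, y2, x0, y0)]
    return {"det": det, "bbox": (min_x, min_y, max_x, max_y), "edges": edges}


def covers(setup, x, y):
    """True if point (x, y) is inside or on the boundary of the triangle."""
    sign = 1 if setup["det"] > 0 else -1
    return all(sign * (a * x + b * y + c) >= 0 for a, b, c in setup["edges"])


tri = triangle_setup((0, 0), (4, 0), (0, 4), width=8, height=8)
```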

Similarly, according to this embodiment, the attribute configuration 459 stage may also be implemented within a programmable execution unit, even though it is typically not a user-programmable stage of the graphics pipeline 400. Attribute configuration operations may include processing attributes corresponding to pixels in preparation for the pixel shader and pixel shader operations. According to the invention, the operations of the triangle configuration 457 stage and the attribute configuration 459 stage may be implemented in software rather than in fixed-function hardware components. In other words, software interacting with the execution unit pool can issue a set of instructions that operate on a set of data to perform triangle configuration or attribute configuration operations.

In FIG. 4, the span and tile generator 458 is a fixed-function hardware component rather than a stage of the graphics pipeline implemented within a programmable execution unit. However, those of ordinary skill in the art will appreciate that the span and tile generator or other stages of the graphics pipeline (including, but not limited to, fixed-function stages of the rendering pipeline that are not shown) may also be implemented by executing software instructions in a programmable execution unit.

Referring now to FIG. 5, a graphics processing unit 500 according to one embodiment of the invention is depicted. As noted above, certain components of the graphics processing unit 500 are omitted for clarity; however, those of ordinary skill in the art will appreciate that other hardware and logic components not depicted may be present in the graphics processing unit 500. The graphics processing unit 500 includes a pool 506 of multiple programmable execution units (hereinafter, execution unit pool 506) and an execution unit pool control and cache subsystem 504. The execution unit pool control and cache subsystem 504 may control thread management for the processing cores of the execution unit pool 506, as well as communication between users of the system and other components within the graphics processing unit 500. A cache subsystem comprising one or more cache memories used by the execution unit pool may also reside in the execution unit pool control and cache subsystem 504. For example, the cache subsystem may be used by a vertex shader thread 508 to store data for use by a subsequent thread that performs triangle configuration operations, or for typical memory transfers. Alternatively, each execution unit in the execution unit pool 506 may include an execution unit buffer for storing data to be used by subsequent threads executing within the same execution unit.

As noted above, user-programmable stages of the graphics pipeline, such as the geometry shader 510, the vertex shader 508, or the pixel shader 512, may execute within the execution unit pool 506. Because the execution unit pool 506 typically consists of processing cores capable of multi-threaded operation, the execution unit pool control and cache subsystem 504 is typically responsible for scheduling threads within the execution unit pool 506. When the execution unit pool control and cache subsystem 504 receives a request to execute a programmable shader, it instructs an execution unit in the execution unit pool 506 to create a new thread for executing that shader. The execution unit pool control and cache subsystem 504 can manage the load on the execution unit pool 506 and shift resources from one type of shader to another in order to manage the throughput of the graphics pipeline efficiently. Such thread management techniques are known and need not be discussed in further detail here. By way of example, however, if the pixel shader 512 is the source of a bottleneck (in terms of the throughput of the GPU 500), the execution unit pool control and cache subsystem 504 may allocate more execution unit resources to the pixel shader 512 to improve throughput.
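The text leaves the scheduling policy to known techniques. Purely as an illustration of shifting execution units toward a bottlenecked stage, the sketch below splits a fixed number of execution units across stages in proportion to their pending work. The function name `allocate_units`, the use of pending work counts as the load metric, and the largest-remainder rounding are all assumptions made for this example, not details taken from the patent.

```python
def allocate_units(pending, total_units):
    """Split total_units across stages in proportion to pending work.

    pending maps a stage name to its number of outstanding work items.
    Stages with no pending work receive no execution units. Fractional
    shares are rounded with the largest-remainder method so that the
    allocation always sums to total_units when any stage has work.
    """
    active = {stage: n for stage, n in pending.items() if n > 0}
    if not active or total_units <= 0:
        return {stage: 0 for stage in pending}
    total_work = sum(active.values())
    shares = {s: total_units * n / total_work for s, n in active.items()}
    alloc = {s: int(share) for s, share in shares.items()}
    leftover = total_units - sum(alloc.values())
    # Hand the remaining units to the stages with the largest fractional part.
    by_remainder = sorted(active, key=lambda s: shares[s] - alloc[s], reverse=True)
    for stage in by_remainder[:leftover]:
        alloc[stage] += 1
    return {stage: alloc.get(stage, 0) for stage in pending}
```

With `pending = {"vertex": 10, "geometry": 0, "pixel": 30}` and eight execution units, the pixel shader stage receives six units and the vertex shader stage two, which mirrors the pixel shader bottleneck example in the paragraph above.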

According to an embodiment of the invention, when execution of the graphics pipeline requires triangle configuration 520 or attribute configuration 522 operations, additional threads may be created to perform them. In contrast to the graphics processing unit of FIG. 3, in which the triangle configuration unit and the attribute configuration unit are separate hardware components within the GPU, the triangle configuration 520 and attribute configuration 522 stages of this embodiment may be implemented in software executed within the execution unit pool 506. In other words, in addition to the threads that perform the programmable shader operations mentioned above, the execution unit pool 506 can be made to perform triangle configuration and attribute configuration operations by creating, within the execution units, threads capable of performing those operations.

The software instructions that perform the triangle configuration and attribute configuration operations may be stored in, and may originate from, the execution unit itself or the execution unit pool control and cache subsystem 504. Alternatively, the software instructions that implement the triangle configuration and attribute configuration operations may originate from a software device driver or from another location that would be understood by one of ordinary skill in the art.

To perform the triangle configuration 520 and attribute configuration 522 operations, a thread may be created within the execution unit pool 506. The triangle configuration 520 and attribute configuration 522 operations may then execute within that thread rather than within a hardware component separate from the execution unit pool 506. Because the execution unit pool 506 is capable of multi-threaded operation, a thread for the triangle configuration 520 and attribute configuration 522 operations may be created while additional threads concurrently perform other shader operations, or even further triangle and attribute configuration operations.
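The following sketch illustrates the idea of one multi-threaded pool servicing both triangle configuration work and shader work at the same time. It is only an analogy: Python's `concurrent.futures.ThreadPoolExecutor` stands in for the hardware execution unit pool, and the functions `triangle_configuration`, `pixel_shader`, and `run_mixed_workload` are hypothetical placeholders whose bodies are not taken from the patent.

```python
from concurrent.futures import ThreadPoolExecutor


def triangle_configuration(tri):
    """Placeholder setup work: compute the triangle's determinant."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    return ("triangle_configuration", det)


def pixel_shader(pixel):
    """Placeholder shader work: a checkerboard value for the pixel."""
    x, y = pixel
    return ("pixel_shader", (x + y) % 2)


def run_mixed_workload(triangles, pixels, workers=4):
    """Submit both kinds of work to one shared pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(triangle_configuration, t) for t in triangles]
        futures += [pool.submit(pixel_shader, p) for p in pixels]
        # Results are collected in submission order, not completion order.
        return [f.result() for f in futures]
```

The point of the sketch is that no worker is reserved for a single stage: any free worker can pick up either kind of task, which is the property the embodiment relies on when it replaces dedicated setup hardware with threads.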

In the graphics processing unit 500 of this embodiment, the span and tile generator 518 may be implemented as a hardware component external to the execution unit pool 506. As is known, once the triangle configuration 520 operation completes, at least some of its resulting data (including edge functions, computed determinants, bounding boxes, and Z difference values) may be output to the span and tile generator 518 and possibly to other stages of the graphics pipeline that are not shown (such as a Z test). After the triangle configuration 520 operation completes, and while the span and tile generator 518 performs its operations, the thread that performed the triangle configuration 520 operation may be suspended. After the span and tile generator 518 or other graphics pipeline operations complete, the thread may be terminated if the geometric primitive being processed by the graphics pipeline has been rejected.

In other words, if a geometric primitive need not be rendered to the screen, for example because it is covered by other primitives, it may be unnecessary to continue processing that primitive in the graphics pipeline. If the geometric primitive is not rejected in this portion of the graphics pipeline, the thread may continue by performing the attribute configuration 522 operation. As is known, an attribute configuration 522 operation in a graphics pipeline may include processing, before a user-programmable pixel shader 512 thread executes, a plurality of attributes corresponding to a plurality of pixels, each of the pixels comprising a portion of those attributes. Once the attribute configuration 522 operation completes within the thread, the resulting data may be stored in a level-two (L2) cache memory within the execution unit pool control and cache subsystem 504 for use by subsequent threads, including pixel shader threads. Alternatively, the resulting data from the thread may be stored in a buffer within the individual execution unit and made available to the next thread created within that execution unit, if that thread needs the data. For example, after the thread that performs the triangle configuration 520 and attribute configuration 522 operations terminates, a pixel shader 512 thread corresponding to the pixel attributes processed by the attribute configuration 522 stage may be created within the execution unit, with the pixel attributes and other data required by the pixel shader thread already residing in the buffer after execution of the previous thread. Other embodiments may include dedicated logic modules within the execution unit to improve the performance of certain triangle configuration or attribute configuration operations. For example, specific logic circuitry may be incorporated within the execution unit to perform tasks of the triangle configuration stage, such as trivial triangle rejection.
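The patent does not specify how attributes are processed for the pixel shader. One common approach, shown here strictly as an assumed example, is to fit a plane through the three per-vertex values of an attribute so that a later pixel shader thread can evaluate the attribute at any pixel with two multiplies and two adds. The function names `attribute_setup` and `evaluate` and the returned dictionary layout are hypothetical.

```python
def attribute_setup(v0, v1, v2, a0, a1, a2):
    """Fit a plane a(x, y) through three vertices carrying values a0, a1, a2.

    Returns a dict holding the reference vertex, its attribute value, and
    the screen-space gradients da/dx and da/dy. Returns None for a
    zero-area triangle, where the plane is undefined.
    """
    (x0, y0), (x1, y1), (x2, y2) = v0, v1, v2
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        return None
    dadx = ((a1 - a0) * (y2 - y0) - (a2 - a0) * (y1 - y0)) / det
    dady = ((a2 - a0) * (x1 - x0) - (a1 - a0) * (x2 - x0)) / det
    return {"origin": (x0, y0), "base": a0, "dadx": dadx, "dady": dady}


def evaluate(plane, x, y):
    """Evaluate the fitted attribute plane at pixel position (x, y)."""
    x0, y0 = plane["origin"]
    return plane["base"] + plane["dadx"] * (x - x0) + plane["dady"] * (y - y0)
```

In the arrangement described above, the values returned by `attribute_setup` would be the kind of data left in the execution unit buffer or the L2 cache for the subsequent pixel shader thread. Perspective correction is omitted from this sketch.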

Embodiments of the invention provide advantages over graphics processing units that implement the triangle configuration and attribute configuration stages as separate hardware components. Specifically, compared with a triangle configuration unit 520 and/or an attribute configuration 522 unit implemented as hardware components separate from the execution unit pool, implementing the triangle configuration 520 and attribute configuration 522 stages of the graphics pipeline in software instructions executed within the execution unit pool can reduce the gate count of the graphics processing unit 500. As is known, graphics application programming interfaces require the execution unit pool 506 so that the GPU can execute the various programmable stages of the graphics pipeline, such as the geometry shader, vertex shader, or pixel shader. Implementing at least the triangle configuration and attribute configuration stages within the execution unit pool 506 that already exists in the GPU makes it possible to remove at least those hardware components, thereby reducing the number of gates in the system. It should be appreciated that reducing the gate count of a graphics processing unit according to embodiments of the invention can lower the cost of designing and/or producing the GPU. In addition, system cost may be reduced by eliminating the need for hardware lines in the GPU that pass data to and/or from a triangle configuration unit or attribute configuration unit implemented as a separate hardware component. This is especially useful in low-end graphics processing units or computer systems, where cost is an important consideration in the design and manufacture of hardware components.

In addition, embodiments of the invention may result in a more efficient graphics pipeline, because the triangle configuration 520 and attribute configuration 522 operations execute within the execution unit pool 506, which is capable of multi-threaded operation. It should be appreciated that efficient execution of the graphics pipeline can be achieved through thread control and scheduling in the execution unit pool. For example, if triangle configuration operations are the cause of a bottleneck in the graphics pipeline, more resources from the execution unit pool can be allocated to triangle configuration operations to relieve the bottleneck or mitigate the reduced performance. Alternatively, if another stage of the graphics pipeline, such as the pixel shader, is the cause of a bottleneck in the GPU, more resources from the execution unit pool can be allocated to pixel shader threads to increase system throughput. Furthermore, a design that implements the attribute configuration and triangle configuration operations in threads within the execution unit pool 506 yields a system that is less dependent on any single bottleneck point. By using thread management and scheduling protocols known in the art to manage the load on the execution unit pool 506, the graphics pipeline can be made more efficient.

Another advantage provided by embodiments of the invention is the flexibility and extensibility that result from eliminating separate hardware components for the triangle configuration and attribute configuration operations. For example, embodiments of the invention allow the triangle configuration 520 or attribute configuration 522 stage of a graphics processing unit to be changed by modifying the software instructions that perform the triangle configuration or attribute configuration operations within the execution unit. In contrast, triangle configuration and attribute configuration hardware components that are separate from the execution unit pool may require new hardware components in order to change the triangle configuration or attribute configuration stage of the graphics pipeline. This flexibility can be useful for fixing program errors, adding new features, or adjusting the algorithms used to implement the triangle configuration 520 or attribute configuration 522 stage.

Referring now to FIG. 6, a flowchart of a method embodiment 600 of the invention is depicted. In step 602, vertex data representing a geometric primitive is received for processing by the triangle configuration and attribute configuration stages of the graphics pipeline. The vertex data of a geometric primitive being processed by the graphics pipeline is typically output from the geometry shader for processing by the triangle configuration stage. In step 604, a thread is created within an execution unit via software instructions in order to perform the triangle configuration operations (step 606). As noted above, triangle configuration operations in a graphics pipeline may include (but are not limited to) trivial triangle rejection, determinant calculation, bounding box calculation, culling, pre-attribute configuration KLMN, edge function generation, clipping, and guard-band clipping.

In step 608, after the triangle configuration operations complete, the bounding box is output to the span and tile generator. Z difference values are also output to the Z test stages (ZL1, ZL2) of the graphics pipeline. Other elements of the graphics pipeline that are linked to the output of the triangle configuration stage are not discussed here but are known to those of ordinary skill in the art. For example, the triangle configuration stage may output data to other elements of the rendering pipeline for processing. After the triangle configuration operations complete and at least the above outputs have been produced, the thread is suspended until data is returned to the execution unit. For example, if the thread outputs data to the span and tile generator, the Z test, or another stage of the rendering pipeline, the thread must wait until the operations in that stage have completed before it continues with the attribute configuration operations. In step 610, the thread is suspended.

In step 612, if the triangle or geometric primitive has not been rejected by the span and tile generator or the Z test, the thread is resumed (step 614), and in step 616 the attribute configuration operations are performed within the thread to produce pixel attributes associated with the vertex data. A triangle or geometric primitive may be rejected if, for example, another element of the graphics pipeline (such as the Z test) determines that the triangle need not be output to the frame buffer in a later stage of the graphics pipeline. In that case, the attribute configuration operations are unnecessary. After the attribute configuration operations have been performed, the data from the thread is stored in step 618. As mentioned above with reference to the embodiment of FIG. 6, the data from the thread may be stored in a buffer within the execution unit for use by a subsequent thread created by that execution unit. Alternatively, the data may be stored in a cache subsystem accessible to other execution units, for use by threads created in those other execution units. The subsequent thread is at least one selected from the following: a pixel shader thread, a vertex shader thread, and a thread that can perform the triangle configuration operations and the attribute configuration operations. In step 620, the thread is terminated, and the execution unit may then be allocated to threads dedicated to other stages of the graphics pipeline.
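The control flow of steps 604 through 620 can be sketched with a Python generator, whose `yield` models the suspension in step 610 and whose `send` models the resumption in step 614. This is a toy model under stated assumptions: `setup_thread`, `run_method_600`, and the `external_stage` callback (standing in for the span and tile generator or Z test) are hypothetical names, and the "attribute configuration" result is a placeholder rather than real interpolation data.

```python
def setup_thread(vertices, log):
    """Generator standing in for one execution-unit thread (steps 604-620)."""
    log.append("604: thread created")
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    bbox = (min(xs), min(ys), max(xs), max(ys))
    log.append("606: triangle configuration done")
    # Steps 608/610: output the bounding box, then suspend until the
    # external stage reports whether the primitive survived.
    log.append("608/610: output sent, thread suspended")
    accepted = yield bbox
    if not accepted:
        log.append("612: primitive rejected, thread terminated")
        return None
    log.append("614: thread resumed")
    attributes = {"bounding_box": bbox}  # placeholder for step 616 output
    log.append("616: attribute configuration done")
    log.append("618: results stored")
    log.append("620: thread terminated")
    return attributes


def run_method_600(vertices, external_stage):
    """Drive the generator; external_stage(bbox) returns True to accept."""
    log = []
    thread = setup_thread(vertices, log)
    bbox = next(thread)  # runs until the suspension point
    try:
        thread.send(external_stage(bbox))  # resume with the verdict
    except StopIteration as finished:
        return finished.value, log
    raise RuntimeError("thread suspended more than once")
```

Calling `run_method_600([(0, 0), (4, 0), (0, 4)], lambda bbox: True)` walks the accepted path through step 620, while a callback that returns `False` ends the thread at step 612 without performing attribute configuration.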

Embodiments of the invention may be implemented in hardware, software, firmware, or a combination thereof. In some embodiments, the compression of color data may be implemented in software or firmware that is stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in an alternative embodiment, the triangle configuration and attribute configuration stages may be implemented with any one or a combination of the following known technologies: discrete logic circuits having logic gates for implementing logic functions on data signals, an application-specific integrated circuit (ASIC) having appropriately combined logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and the like.

As those reasonably skilled in the art of the invention will understand, any process description or block in the flowcharts should be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps in the process. Alternative implementations are included within the scope of the preferred embodiments of the invention, in which functions may be executed in an order different from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved.

The foregoing describes only preferred embodiments of the invention and is not intended to limit its scope. Anyone familiar with this technology may make further improvements and changes on this basis without departing from the spirit and scope of the invention; the scope of protection of the invention is therefore defined by the claims of this application.

A brief description of the reference numerals in the drawings follows:

110: Vertex shader

120: Geometry shader

130: Triangle configuration unit

140: Span and tile generator

150: Attribute configuration unit

160: Pixel shader

170: Frame buffer

200: Graphics pipeline

250: Memory

252: Command stream processor

254: Vertex shader

256: Geometry shader

257: Triangle configuration unit

258: Span and tile generator

259: Attribute configuration unit

260: Pixel shader

262: Frame buffer

300: Graphics processing unit (GPU)

304: Execution unit pool control and cache subsystem

306: Pool of multiple programmable execution units

310: Geometry shader

312: Pixel shader

314: Triangle configuration unit

316: Attribute configuration unit

318: Span and tile generator

400: Graphics pipeline

450: Memory

452: Command stream processor

454: Vertex shader

456: Geometry shader

457: Triangle configuration

458: Span and tile generator

460: Pixel shader

462: Frame buffer

500: Graphics processing unit

504: Execution unit pool control and cache subsystem

506: Pool of multiple programmable execution units

508: Vertex shader

510: Geometry shader

512: Pixel shader

518: Span and tile generator

520: Triangle configuration

522: Attribute configuration.

Claims

1. A graphics processing apparatus, characterized by comprising:

at least one execution unit operable for multi-threaded operation, wherein the at least one execution unit can execute, by software instructions, at least one thread for a programmable triangle configuration operation and a programmable attribute configuration operation; wherein

the at least one execution unit is programmable to execute at least one thread selected from the following: a vertex shader operation, a pixel shader operation, and a geometry shader operation;

the at least one execution unit can suspend the at least one thread for the programmable triangle configuration operation and the programmable attribute configuration operation;

the at least one execution unit can output data from the programmable triangle configuration operation, which originates from the at least one thread, to at least one hardware component external to the execution unit;

when data corresponding to the at least one thread is received, the at least one execution unit can resume the suspended at least one thread; and

the at least one execution unit can store result data from the at least one thread in a buffer within the at least one execution unit, for use by a subsequent thread created by the at least one execution unit.

2. A graphics processing unit, characterized by comprising:

at least one execution unit operable for multi-threaded operation, wherein the at least one execution unit can execute at least one thread for a triangle configuration operation and an attribute configuration operation, and the execution unit is operable to perform programmable shader operations; and

an execution unit pool control system for scheduling the at least one thread and managing the at least one execution unit;

wherein the execution unit pool control system can simultaneously initiate the at least one thread for the triangle configuration operation, the attribute configuration operation, and a programmable shader operation.

3. The graphics processing unit according to claim 2, characterized in that the attribute configuration operation comprises processing a plurality of attributes corresponding to a plurality of pixels, wherein each of the plurality of pixels comprises a portion of the plurality of attributes.

4. The graphics processing unit according to claim 2, characterized in that the at least one execution unit is programmable to perform the triangle configuration operation and the attribute configuration operation via software instructions.

5. The graphics processing unit according to claim 2, characterized in that:

the at least one execution unit is operable to suspend the at least one thread for the triangle configuration operation and the attribute configuration operation;

the at least one execution unit can output data from the triangle configuration operation, which originates from the at least one thread, to at least one hardware component external to the at least one execution unit; and

when data corresponding to the at least one thread is received, the at least one execution unit can resume the suspended at least one thread.

6. The graphics processing unit according to claim 2, characterized in that the at least one execution unit further comprises:

a buffer for storing the results of the at least one thread that performs the triangle configuration operation and the attribute configuration operation.

7. A method for performing triangle configuration and attribute configuration in a graphics processing system, characterized by comprising the following steps:

receiving vertex data, the vertex data corresponding to a geometric primitive;

creating a thread in an execution unit operable for multi-threaded operation, the execution unit being capable of performing programmable shader operations;

performing, in the thread, a triangle configuration operation on the vertex data;

performing, in the thread, an attribute configuration operation to produce pixel attributes associated with the vertex data; and

terminating the thread.

8. The method for performing triangle configuration and attribute configuration in a graphics processing system according to claim 7, characterized by further comprising the following steps:

suspending the thread;

outputting the result of the triangle configuration operation to a span and tile generator;

receiving processed data from the span and tile generator;

performing the attribute configuration operation to produce the pixel attributes from the processed data; and

resuming the thread.

9. The method for performing triangle configuration and attribute configuration in a graphics processing system according to claim 7, characterized in that the execution unit can perform pixel shader, geometry shader, and vertex shader operations by software instructions.

10. The method for performing triangle configuration and attribute configuration in a graphics processing system according to claim 7, characterized by further comprising:

creating another thread in another execution unit operable for multi-threaded operation, the other thread being operable to perform the triangle configuration operation in parallel with the thread of the execution unit;

wherein the other thread can execute simultaneously with the thread.