CN112116715A

CN112116715A - Method and apparatus for efficient interpolation

Info

Publication number: CN112116715A
Application number: CN202010572432.5A
Authority: CN
Inventors: R.库马尔; F.古鲁帕德; D.塔宁鲍姆
Original assignee: Samsung Electronics Co Ltd
Current assignee: Samsung Electronics Co Ltd
Priority date: 2019-06-20
Filing date: 2020-06-22
Publication date: 2020-12-22

Abstract

A method for interpolating attribute values of an image grid may include: determining a root value of an attribute at a root node located at the center of the image grid; precomputing metadata for a plurality of child nodes at one or more levels based on one or more gradients of the attribute; and deriving an attribute value for each child node at each level based on the corresponding root value and the metadata for that child node's level, wherein each child node may serve as a root node for the next level. The image grid may have a plurality of outer cells arranged radially around a center cell, and the root node may be located in the center cell.

Description

Method and apparatus for efficient interpolation

Technical Field

The present disclosure relates generally to interpolation and, more particularly, to methods and apparatus for efficient interpolation of nodes in a grid, such as an image grid.

Background

Interpolation is the process of determining attribute values for unknown data points located between known data points. In image processing, for example, interpolation may be used to find intermediate values of an attribute for a set of pixels that are spatially adjacent to one another and lie inside a primitive, which is a basic geometric unit used to build up larger images. Interpolators consume resources such as power and integrated-circuit area. The high cost and power consumption of conventional interpolators can limit throughput, that is, the rate at which pixels are rendered. Meanwhile, displays continue to offer higher resolutions, use more primitives, and/or operate on larger image domains than the conventional tile domain, which is a subset of the image. In addition, downstream image-processing devices that use and/or transform data from the interpolator, such as execution units, continue to improve. This may create a mismatch between the bandwidth of the interpolator and that of the downstream units, which may result in low utilization of the downstream units and/or a low frame rate on the rendering device. There is therefore a need for interpolators that can operate with higher efficiency and/or throughput.

Summary of the Invention

A method for interpolating attribute values of an image grid may include: determining a first-level root value of an attribute at a first-level root node located at the center of the image grid; computing first-level metadata based on a first gradient of the attribute in a first direction and a second gradient of the attribute in a second direction; and deriving, based on the first-level root value and the first-level metadata, first-level child values of the attribute for two or more first-level child nodes of the image grid, the two or more first-level child nodes being arranged radially around the first-level root node in the image grid. The method may further include: using one of the first-level child nodes and its corresponding first-level child value as a second-level root node and a second-level root value for a cell of the image grid, wherein the root node of the cell is located at the center of the cell; computing second-level metadata based on the first gradient and the second gradient; and deriving, based on the second-level root value and the second-level metadata, second-level child values of the attribute for two or more second-level child nodes in the cell, the two or more second-level child nodes being arranged radially around the second-level root node in the cell.

Each first-level child node may be offset symmetrically from the first-level root node in the first direction and the second direction. Each first-level child node may be offset from the first-level root node by substantially zero or substantially the same distance in the first direction and the second direction. The image grid may include a 3x3 array of cells having a center cell and eight outer cells, the two or more first-level child nodes may include eight first-level child nodes, the first-level root node may be located at the center of the center cell, and each first-level child node may be located at the center of one of the outer cells. The first-level metadata may include delta values of the attribute for offsets in the first direction and the second direction. The value of a first parameter A may be based on the first gradient, and the value of a second parameter B may be based on the second gradient. The first-level metadata may include the values A, B, A+B, and A-B. Where the value of the first parameter A is based on the first gradient, the value of the second parameter B is based on the second gradient, and the image grid includes a 3x3 array of cells, the first-level metadata may include the values 3A, 3B, 3(A+B), and 3(A-B), and the second-level metadata may include the values A, B, A+B, and A-B. The first-level metadata may be computed based on a plane equation. The plane equation may have the form P(x,y) = A*(x-Seed_X) + B*(y-Seed_Y) + C, where P may be a parameter of a two-dimensional surface that can be interpolated at each location (x,y), x may be the distance in the x direction, y may be the distance in the y direction, A may be the gradient per pixel (or other cell) in the x direction, B may be the gradient per pixel (or other cell) in the y direction, and C may be the value of P at location (Seed_X, Seed_Y). Deriving the first-level child values may include adding one or more of the first-level metadata values to the first-level root value. The first-level root node and each first-level child node may correspond to a pixel. The first-level root node and each first-level child node may correspond to a sample. The method may further include rasterizing an image in response to the attribute values. The attribute may include a first value indicating that a node may be inside a primitive and a second value indicating that the node may be outside the primitive.

A method for interpolating attribute values of an image grid may include: determining a root value of an attribute at a root node located at the center of the image grid; precomputing metadata for a plurality of child nodes at one or more levels based on one or more gradients of the attribute; and deriving an attribute value for each child node at each level based on the corresponding root value and the metadata for that child node's level, wherein each child node may serve as a root node for the next level. The image grid may have a plurality of outer cells arranged radially around a center cell, and the root node may be located in the center cell. The root node may be located in a first cell that has one or more additional nodes, and the method may further include: determining attribute values of the one or more additional nodes in the first cell; and deriving, at each level, attribute values of additional child nodes corresponding to each additional node in the first cell, wherein the attribute value of each additional child node may be derived based on the attribute value of the corresponding additional node in the first cell and the metadata for the corresponding level. The attribute values of the additional child nodes may be derived through a separate hierarchical tree for each node in the first cell. The first cell may be a pixel, and each node in the first cell may be a sample. The samples in the pixel may be used for multi-sample anti-aliasing (MSAA).

A system for interpolating attribute values of an image grid may include: a root unit configured to determine a root value of an attribute at a root node located at the center of the image grid; a metadata unit configured to precompute metadata for a plurality of child nodes at one or more levels based on one or more parameters of the attribute; and a tree of one or more logic stages coupled to the root unit and the metadata unit and configured to derive an attribute value for each child node at each level based on the corresponding root value and the metadata for that child node's level. One or more of the logic stages may include combinational logic with two-input adders arranged to add the root value of the attribute to the metadata for the plurality of child nodes. The system may further include a redirection unit coupled between the root unit and the tree and configured to rearrange, based on an operating mode, the manner in which samples are directed from the root unit to the tree. The logic stages may be configured to process multiple samples in a multi-sample operating mode. The image grid may be a first sub-grid of a larger image grid, and the system may further include: a second root unit configured to determine a second root value of a second attribute at a second root node located at the center of a second sub-grid of the larger image grid; a second metadata unit configured to precompute second metadata for a plurality of second child nodes at one or more levels based on one or more parameters of the second attribute; and a second tree of one or more logic stages coupled to the second root unit and the second metadata unit and configured to derive a second attribute value for each second child node at each level based on the corresponding second root value and the second metadata for that second child node's level. The second root unit, the second tree, and the second metadata unit may be configured to selectively use the attribute of the first sub-grid as the second attribute of the second sub-grid. The attributes of the first sub-grid and the second sub-grid may be for different primitives. The system may further include one or more additional logic stages coupled between the root unit and the tree in a serial hybrid configuration, wherein the one or more additional logic stages use an interpolation technique substantially different from that of the tree of one or more logic stages. The system may be implemented in hardware, software, or a combination thereof. The hardware may include an integrated circuit.

An apparatus for interpolating attribute values of an image grid may include a tree of one or more logic stages configured to derive, at each of one or more levels, attribute values of a plurality of child nodes around a centrally located root node, based on the corresponding attribute value at the root node and the metadata for each of the one or more levels. The apparatus may further include a metadata unit coupled to the tree of one or more logic stages and configured to precompute the metadata for the plurality of child nodes at each of the one or more levels based on one or more parameters of the attribute. The apparatus may further include a root unit coupled to the tree of one or more logic stages and configured to determine a root value of the attribute at a root node in the image grid. One or more of the logic stages may include combinational logic with two-input adders arranged to add the root value of the attribute to the metadata for the plurality of child nodes. The apparatus may further include one or more additional logic stages coupled to the tree in a serial hybrid configuration, wherein the one or more additional logic stages use an interpolation technique substantially different from that of the tree of one or more logic stages. The tree of one or more logic stages may be implemented in an integrated circuit. Other and/or additional configurations are contemplated.

Brief Description of the Drawings

The drawings are not necessarily drawn to scale, and elements of similar structure or function are generally denoted by like reference numerals throughout the drawings for illustrative purposes. The drawings are intended only to facilitate the description of the various embodiments described herein. They do not depict every aspect of the teachings disclosed herein and do not limit the scope of the claims. Together with the description, the drawings illustrate example embodiments of the present disclosure and serve to explain its principles.

FIG. 1 illustrates a conventional technique for interpolating attribute values of a pixel grid.

FIG. 2 illustrates an embodiment of a method for interpolating attribute values of a pixel grid according to the principles of this disclosure.

FIG. 3 is a flowchart of a method for interpolating attribute values of a pixel grid according to the principles of this disclosure.

FIG. 4 illustrates an example embodiment of a method for interpolating attribute values using a plane equation according to the principles of this disclosure.

FIG. 5 illustrates an example embodiment of a method for interpolation for rasterization using an edge equation according to the principles of this disclosure.

FIG. 6 illustrates an example embodiment of a method for interpolation in which the metadata of adjacent levels are related by a scaling factor.

FIG. 7 is a block diagram of a microarchitecture showing the structure and data flow of an embodiment of a hierarchical interpolation system according to the principles of this disclosure.

FIG. 8 is a block diagram of a microarchitecture showing the structure and data flow of an embodiment of a multi-sample hierarchical interpolation system according to the principles of this disclosure.

FIGS. 9 through 11 illustrate embodiments of grids that may be subdivided according to the principles of this disclosure.

FIG. 12 is a block diagram of a microarchitecture showing the structure and data flow of an embodiment of a multi-attribute hierarchical interpolation system according to the principles of this disclosure.

FIG. 13 illustrates an embodiment of an imaging device 204 into which any of the methods or apparatus described in this disclosure may be integrated.

FIG. 14 is a block diagram of another microarchitecture showing the structure and data flow of an embodiment of a hierarchical interpolation system according to the principles of this disclosure.

FIG. 15 illustrates an embodiment of a computing system according to this disclosure.

Detailed Description

FIG. 1 illustrates a conventional technique for interpolating attribute values over a 9-pixel by 9-pixel grid 100. An attribute value may first be determined for the lower-left pixel at sample point 102. The attribute values at the sample points of the other pixels may then be found by traversing the grid pixel by pixel in the x and y directions, as indicated by the arrows in FIG. 1. For each pixel, an attribute increment is added to the accumulated attribute value of the previous pixel. (The increment is sometimes called the "delta" of the attribute.) The increment for each pixel may be based on that pixel's x offset and y offset relative to the previous pixel and on a function that can be used to determine the attribute value at any pixel.
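The traversal described above can be sketched as follows. This is a simplified illustration, not the circuit of FIG. 1: it assumes uniform per-pixel steps `a` (x direction) and `b` (y direction), and the function name and grid layout (row 0 at the bottom) are hypothetical.

```python
def interpolate_conventional(seed_value, a, b, n):
    """Fill an n x n grid by stepping outward from the lower-left pixel.

    grid[row][col] holds the attribute value; row 0 is the bottom row.
    Assumes a constant step `a` per pixel in x and `b` per pixel in y.
    """
    grid = [[0.0] * n for _ in range(n)]
    grid[0][0] = seed_value
    # Walk along the bottom row, adding the x increment at each step.
    for col in range(1, n):
        grid[0][col] = grid[0][col - 1] + a
    # Walk up each column, adding the y increment at each step.
    for row in range(1, n):
        for col in range(n):
            grid[row][col] = grid[row - 1][col] + b
    return grid


grid = interpolate_conventional(1.0, 0.5, 0.25, 9)
print(grid[8][8])  # value at the upper-right pixel
```

In this sketch every value depends on its predecessor, so the pixel farthest from the seed is reached only after 2(n-1) dependent additions.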

However, the x offset and y offset of each pixel may vary from row to row and/or column to column, so the attribute computation for each node may require a 3-input adder, which may be relatively expensive to implement in hardware or software. In addition, a hardware implementation for an n-pixel by n-pixel grid may have 2n logic levels. Another drawback may be the area required by hardware implementing the technique of FIG. 1, which may grow geometrically as n increases. Moreover, a hardware implementation may result in large fan-outs, which may require larger and/or more expensive drivers to avoid additional delay in the computation.

FIG. 2 illustrates an embodiment of a method for interpolating attribute values of an image grid according to the principles of this disclosure. The image grid may initially be divided into nine cells C1, C2, ..., C9, which may be the highest level of the hierarchy shown in grid 110A. Each cell may have a corresponding node N1, N2, ..., N9. The center cell C5 and the center node N5 may be designated the root cell and the root node, respectively. The remaining cells C1-C4 and C6-C9, which may be arranged radially around the center cell, may be designated child cells. The remaining nodes N1-N4 and N6-N9 may be designated child nodes.

The method may begin by determining the attribute value of the root node (the root attribute). This may be accomplished in any suitable manner. For example, if the root node N5 happens to be a known sample point, the sample value at that point may be used as the root value for the highest level. Otherwise, the root value may be computed by interpolating from other nodes outside image grid 110A, for example using general-purpose multipliers, adders, and the like. The method may then compute metadata, which may include, for example, attribute increments for the x and y offsets between the root node N5 and the child nodes. This may be done, for example, by applying a plane equation to the attribute. The attribute value of each child node N1-N4 and N6-N9 may then be derived from the root node N5 by combining the root value with the metadata, as indicated by the arrows in FIG. 2. For example, the attribute value of each child node may be computed by adding one or more of the metadata values to the root value through a streamlined addition process, as described below. The process of deriving attribute values at the child cells is referred to as Stage 1 in FIG. 2.

Each of the nine highest-level cells C1, C2, ..., C9 may be subdivided into smaller child cells at the next level, as shown in grid 110B, which is another view of grid 110A after subdivision. For example, cell C1 may be subdivided into second-level cells, or child cells, C1-1, C1-2, ..., C1-9. Each second-level cell may have a corresponding node N1-1, N1-2, ..., N1-9. (To avoid cluttering the drawing, not all of the subdivided cells and nodes of image grid 110B are labeled in FIG. 2, but the designation of each cell is apparent from the regular labeling pattern.) The center cell C1-5 and the center node N1-5 may be designated the second-level root cell and root node, respectively. Thus, the first-level child node N1 may be used as the second-level root node N1-5. Similarly, the attribute value derived at the first-level child node N1 may be used as the second-level root value at the second-level root node N1-5. The remaining cells C1-1 through C1-4 and C1-6 through C1-9, which may be arranged radially around the center cell C1-5, may be designated second-level child cells. The remaining nodes N1-1 through N1-4 and N1-6 through N1-9 may be designated second-level child nodes.

The method may compute metadata for the second-level child nodes, which may include attribute increments for the x and y offsets between the second-level root node N1-5 and the second-level child nodes N1-1 through N1-4 and N1-6 through N1-9. This may be done, for example, using a plane equation for the attribute. The attribute value of each second-level child node may then be derived from the second-level root value of the attribute at the second-level root node N1-5 by combining that root value with the second-level metadata, as indicated by the arrows in FIG. 2. For example, the attribute value of each second-level child node may be computed by adding one or more of the second-level metadata values to the second-level root value through a streamlined addition process, as described below.
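As a worked illustration of the two stages, consider the 3×3-of-3×3 arrangement of FIG. 2 with the plane-equation gradients A and B, first-level metadata scaled by 3 and second-level metadata unscaled, as in the summary. The orientation is an assumption made here: N1 is taken to lie one first-level cell to the left of and above the root, and N1-1 one second-level cell to the left of and above N1. The value at N1-1 then follows from two additions:

```latex
\begin{aligned}
P(N1)          &= P(N5) - 3(A - B) \\
P(N1\text{-}1) &= P(N1) - (A - B) \;=\; P(N5) - 4A + 4B
\end{aligned}
```

Under these assumptions the result agrees with evaluating the plane directly at an offset of -4 cells in x and +4 cells in y from the root.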

Similarly, each of the other first-level cells C2 through C9 may be subdivided into smaller cells, each with its own second-level node, as shown in grid 110B. In the case of the highest-level center cell C5 at the center of the grid, the root node N5 may be used as the second-level root node N5-5 of the second-level cell C5-5. The process of deriving attribute values at the second-level child cells is referred to as Stage 2 in FIG. 2.

The process of subdividing cells, creating child nodes, and deriving the attribute value at each child node may be repeated for any number of levels, creating a hierarchical tree structure and grids of progressively finer resolution. Interpolation may thus begin at the root node in the highest-level center cell and ripple down to more and more nodes at each successively lower level. Moreover, although the principles of this disclosure are not limited to the 3×3 cell arrangement of FIG. 2, this particular topology, which may be described as a diagonal hierarchical 3×3 topology, may provide a number of benefits, as described below.

FIG. 3 is a flowchart of a method for interpolating attribute values of an image grid according to the principles of this disclosure. The method may begin at start 112 at the highest level of the hierarchical tree. At step 114, the method may determine the attribute value at the root node, which at the highest level is located at the center of the image. At step 116, metadata may be precomputed for a plurality of child nodes at one or more levels based on parameters of the attribute, such as gradients. At step 118, the method may derive the attribute value of each child node at the current level based on the corresponding root value and the metadata for the current level. At step 120, if the current level is not the lowest level, then at step 122 the attribute value of each child node is used as the root value for the next level down and step 118 is repeated; otherwise, the process ends at 124.
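The flow of FIG. 3 can be sketched in software as follows. This is a minimal illustration under stated assumptions rather than the disclosed hardware: it assumes a 3×3 arrangement at every level, a planar attribute with per-cell gradients `a` and `b`, and metadata at each level scaled by that level's stride in finest-level cells (3 and then 1 for two levels). The function name and the dictionary representation are hypothetical.

```python
def interpolate_hierarchical(root_value, a, b, levels):
    """Return {(x, y): value}, with (x, y) in finest-level cells from the center."""
    # Step 116: precompute the metadata for every level up front.
    # With a 3x3 arrangement each delta is +/-A, +/-B, +/-(A+B) or +/-(A-B),
    # scaled by the level's stride; the (0, 0) entry carries the root through.
    metadata = []
    for level in range(levels):
        stride = 3 ** (levels - 1 - level)
        deltas = {
            (dx, dy): stride * (dx * a + dy * b)
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
        }
        metadata.append((stride, deltas))

    # Step 114: the root value at the center of the grid.
    roots = {(0, 0): root_value}

    for stride, deltas in metadata:
        children = {}
        for (rx, ry), value in roots.items():
            # Step 118: one addition per child node.
            for (dx, dy), delta in deltas.items():
                children[(rx + dx * stride, ry + dy * stride)] = value + delta
        # Step 122: the children become the roots of the next level down.
        roots = children
    return roots


values = interpolate_hierarchical(1.0, 0.5, 0.25, levels=2)
print(len(values))  # number of nodes in the finest grid
```

Every child in this sketch is produced by a single two-operand addition, and the number of dependent additions from the root to any node equals the number of levels.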

The method described with respect to FIG. 3 may be modified in various ways and configured and adapted for use in countless applications. For example, although some embodiments are shown with a 3×3 array, that is, a geometric ratio of N = 9, any number of nodes N may be used. As additional levels are added, the number of nodes in the tree may then grow geometrically according to the series 1, N, N², N³, and so on. In some embodiments, however, the tree may not grow by the same ratio of nodes at every level.
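For the 3×3 case the growth can be written out explicitly. The symbols k (number of stages completed), n_k (node count after k stages), and s_k (nodes along one side of the grid) are introduced here for illustration only:

```latex
n_k = N^k, \qquad N = 9:\quad 1,\ 9,\ 81,\ 729,\ \dots
\qquad\qquad
s_k = 3^k \;\Rightarrow\; k = \log_3 s_k
```

On this reading, each stage contributes one addition to the dependency chain from the root, so a grid with s nodes per side would be reached after log_3 s dependent additions.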

As another example, in addition to interpolating continuous attribute values, the method may be used in rasterization to determine the pixel coverage of a primitive through edge evaluation (that is, point classification or point-to-line distance evaluation with respect to an edge), thereby determining whether a particular sample lies inside, outside, or on the edge of the primitive, as well as in other applications. In such applications, the attribute may be, for example, a binary in/out determination, a ternary in/out/on-the-line determination, and so on. The cells and nodes at the various levels of the hierarchy may be used to implement any combination of pixels and/or samples. For example, in some embodiments, each of the highest-level cells C1, C2, ..., C9 may implement a pixel, with each of the highest-level nodes N1 through N9 serving as the primary sample of one of the pixels. Additional nodes may then implement additional samples for oversampling, supersampling, multi-sample anti-aliasing (MSAA), and the like. In some other embodiments, the smallest subdivided cells at the lowest level of the hierarchy may implement the pixels of a relatively high-resolution image.

The grid may have any number of cells in each direction, giving a general n-cell by m-cell grid, which may involve trade-offs among, for example, the cost of generating and storing the precomputed metadata, the depth of the tree, and so on. In the case of a general asymmetric n-by-m grid (that is, n not equal to m), the metadata that may be needed to derive the attribute values of the child nodes may be assumed to be expressed as n*A, m*B, n*A+m*B, and n*A-m*B.

Depending on the topology of the grid, there may not be any second-level child nodes derived from one or more of the highest-level nodes. This may occur, for example, if the highest-level root node is aligned with a line dividing two cells or with the intersection of four cells, as may happen with a grid that has an even number of cells on one or both sides. The root node N5 is shown at the center of grids 110A and 110B of FIG. 2, and every other node is shown at the center of its respective cell, but it may not be necessary to place the nodes at these center positions. In some embodiments, however, it may be beneficial to place the root node or other nodes at the center, in the sense of being close enough to the center to enable efficient creation and interpolation of the child nodes and the other levels of the hierarchical tree.

One example application of the method illustrated in FIG. 2 is interpolating attribute values at sample locations using a plane equation. Equation 1 is an example plane equation for a parameter P of a two-dimensional surface, which can be interpolated at each location (x, y) using the parameters A, B, and C that define the plane:

P(x, y) = A*(x - Seed_X) + B*(y - Seed_Y) + C    (Equation 1)

where A is the gradient per pixel (or other cell) in the x direction, B is the gradient per pixel (or other cell) in the y direction, and C is the value of P at location (Seed_X, Seed_Y).
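Equation 1 translates directly into code. The sketch below is only a restatement of the formula; the function and argument names are hypothetical, and the numeric values are arbitrary.

```python
def plane_value(x, y, a, b, c, seed_x, seed_y):
    """Evaluate P(x, y) = A*(x - Seed_X) + B*(y - Seed_Y) + C."""
    return a * (x - seed_x) + b * (y - seed_y) + c


# At the seed location the result is simply C.
print(plane_value(4, 4, a=0.5, b=0.25, c=1.0, seed_x=4, seed_y=4))
# Two cells to the right of and one cell above the seed.
print(plane_value(6, 5, a=0.5, b=0.25, c=1.0, seed_x=4, seed_y=4))
```

Evaluating the plane this way at an arbitrary location costs two multiplications and several additions, which is the general-purpose computation that the hierarchical scheme reserves for the root value.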

FIG. 4 illustrates an example embodiment of a method for interpolating attribute values using a plane equation. The embodiment of FIG. 4 is described in the context of implementing a sample at each node, but the principles also apply to pixels or any other type of node. The embodiment of FIG. 4 uses a 3-sample by 3-sample grid 130 of spatially adjacent samples S1 through S9 because it may provide computational benefits, as described below. The method uses a plane equation such as Equation 1, with the parameter P serving as the attribute to be interpolated. The method precomputes the metadata values A, B, A+B, and A-B for the particular cell size in FIG. 4. The attribute value at the root node may be determined in any suitable manner. For example, the attribute value at the root sample may be computed by interpolating from other samples or nodes outside image grid 130 using general-purpose multipliers, adders, and the like.

Once the attribute value at the root sample S5 is known and the metadata values A, B, A+B, and A-B have been precomputed, the attribute values at the child samples S1-S4 and S6-S9 may be derived by simply adding the following metadata values to the attribute value at the root sample S5, as shown in FIG. 4: sample S1: -(A-B); sample S2: B; sample S3: A+B; sample S4: -A; sample S6: A; sample S7: -(A+B); sample S8: -B; and sample S9: A-B. The simplicity of these calculations may be enabled by the symmetry of the topology. That is, each child sample is located at an x or y offset of zero or one common unit from the root sample. In this embodiment, the common unit is equal to the size of a grid cell. For example, sample S6 has an x offset of 1 unit and a y offset of 0, while sample S3 has an x offset of 1 unit and a y offset of 1 unit. This arrangement, in which every sample has a zero or unit offset, even samples placed diagonally from the root, may enable each child sample to be derived with a single addition. This in turn may enable the addition for each child sample to be implemented with a 2-input adder, which may reduce cost and area compared to the 3-input adder required for the asymmetric deltas in the conventional technique of FIG. 1.
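The single-addition derivation described above can be sketched in software as follows. This is a minimal illustration only, not the disclosed hardware: the function names `plane` and `derive_3x3`, the `OFFSETS` table, and the convention that S1-S3 sit one unit above the root in y (inferred from the signs of the listed metadata) are assumptions introduced here.

```python
def plane(x, y, A, B, C, seed_x=0.0, seed_y=0.0):
    """Equation 1: direct evaluation of the attribute P at (x, y)."""
    return A * (x - seed_x) + B * (y - seed_y) + C


def derive_3x3(root_value, A, B):
    """Derive samples S1..S9 from the root S5 using one addition each."""
    a_plus_b = A + B    # precomputed metadata
    a_minus_b = A - B   # precomputed metadata
    return {
        "S1": root_value - a_minus_b,
        "S2": root_value + B,
        "S3": root_value + a_plus_b,
        "S4": root_value - A,
        "S5": root_value,
        "S6": root_value + A,
        "S7": root_value - a_plus_b,
        "S8": root_value - B,
        "S9": root_value + a_minus_b,
    }


# Assumed (x, y) offsets of each sample from the root, in cell units.
OFFSETS = {
    "S1": (-1, 1), "S2": (0, 1), "S3": (1, 1),
    "S4": (-1, 0), "S5": (0, 0), "S6": (1, 0),
    "S7": (-1, -1), "S8": (0, -1), "S9": (1, -1),
}
```

Each derived value may be checked against a direct evaluation of Equation 1 at the root position plus the corresponding offset; the two should agree whenever the arithmetic is exact.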

As shown in FIG. 5, the diagonal hierarchical 3×3 topology of FIG. 4 may be adapted for rasterization. When used for rasterization, the interpolation may be based on an edge equation rather than a plane equation, in which case the metadata may be precomputed as dx, dy, dx+dy, and dx-dy, where dx may be the delta in the x direction and dy may be the delta in the y direction. The process may start with an evaluation of the edge equation for a location against an edge, which serves as the root value of the tree, and the samples may be pixel centers. The values of the child pixels shown in grid 132 of FIG. 5 may then be calculated by adding the following metadata to the starting value at pixel P5: pixel P1: -(dx-dy); pixel P2: dy; pixel P3: dx+dy; pixel P4: -dx; pixel P6: dx; pixel P7: -(dx+dy); pixel P8: -dy; and pixel P9: dx-dy.

Equation 1 may be adapted for use with an edge equation, for example, by substituting dx for A, dy for B, and "start" (the edge evaluation of the location against the edge) for C in the plane equation.

The methods of FIGS. 4 and 5 may be extended to additional levels, as shown in FIG. 6, in which case the metadata at each level may be related to the metadata at adjacent levels by a simple scaling factor. For example, in the embodiment of FIG. 6, the 3×3 grid 132 of FIG. 5 may be used as a second-level subdivided cell of the first-level grid 134A, and the precomputed metadata dx, dy, dx+dy, and dx-dy may be used to derive the second-level child pixels. (Pixels P1 to P9 may be redesignated as P1-1 to P1-9.) The precomputed metadata for the first-level grid 134A may be calculated as 3dx, 3dy, 3(dx+dy), and 3(dx-dy). These first-level metadata may be used to derive the first-level child pixels P1 to P4 and P6 to P9 by adding them to the starting value at root pixel P5, as follows: pixel P1: -3(dx-dy); pixel P2: 3dy; pixel P3: 3(dx+dy); pixel P4: -3dx; pixel P6: 3dx; pixel P7: -3(dx+dy); pixel P8: -3dy; and pixel P9: 3(dx-dy).
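A two-level version of this scheme can be sketched as follows, assuming a 9×9 pixel region whose root sits at coordinate (0, 0). The helper names `children` and `interpolate_9x9` and the coordinate convention are hypothetical. For brevity the scaled metadata is recomputed on each call, whereas a hardware implementation would presumably precompute it once per level.

```python
def children(value, dx, dy, scale):
    """Return {(x_offset, y_offset): value} for the 3x3 block around a node.

    `scale` is the spacing between nodes at this level, in pixels
    (3 for the first level, 1 for the second level).
    """
    sx, sy = scale * dx, scale * dy  # e.g. 3dx and 3dy at the first level
    meta = {(1, 0): sx, (0, 1): sy, (1, 1): sx + sy, (1, -1): sx - sy}
    out = {(0, 0): value}
    for (ox, oy), m in meta.items():
        out[(ox * scale, oy * scale)] = value + m    # e.g. P6, P2, P3, P9
        out[(-ox * scale, -oy * scale)] = value - m  # e.g. P4, P8, P7, P1
    return out


def interpolate_9x9(start, dx, dy):
    """Derive all 81 pixel values from the root value `start`."""
    grid = {}
    for (cx, cy), v in children(start, dx, dy, scale=3).items():  # first level
        for (ox, oy), w in children(v, dx, dy, scale=1).items():  # second level
            grid[(cx + ox, cy + oy)] = w
    return grid
```

Every pixel is reached through exactly two additions from the root, one per level, regardless of its distance from the center.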

The factor of 3 associated with the diagonal hierarchical 3×3 topology may be particularly easy to implement in digital logic because a multiplication by 3 may be implemented with a 2-input adder. For example, 3*x may be implemented as x + 2*x, and 2*x may be implemented cheaply in floating point by incrementing the exponent of x by 1. Similarly, if the 3×3 topology of FIG. 4 is extended to another level, the precomputed metadata may be calculated as 3A, 3B, 3(A+B), and 3(A-B).
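The x + 2*x decomposition can be illustrated in software with Python's standard `math.ldexp`, which scales a float by a power of two (equivalent to adjusting the exponent for normal, non-overflowing values). The function names below are hypothetical and the snippet is only a sketch of the arithmetic, not of the adder circuit itself.

```python
import math


def times_two(x: float) -> float:
    """Double x by scaling with 2**1, i.e. an exponent increment."""
    return math.ldexp(x, 1)


def times_three(x: float) -> float:
    """Compute 3*x with one doubling and one two-input addition."""
    return x + times_two(x)
```

Because the doubling is exact (barring overflow), the only rounding occurs in the single addition.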

FIG. 7 is a block diagram of a microarchitecture illustrating the structure and data flow of an embodiment of a hierarchical interpolation system in accordance with the principles of this disclosure. The system of FIG. 7 may be used to implement any of the methods and processes disclosed herein, but is not limited to any implementation details described in this disclosure. The system 150 includes a root unit 154 configured to calculate the attribute value of the root node (for example, a center sample location) at the highest level of the hierarchical tree topology. The root unit 154 may calculate the value in response to inputs 152, which may depend on the particular application of the system, for example, by interpolation using general-purpose multipliers, adders, and the like. For example, when used to interpolate samples based on a plane equation, the inputs 152 may include the parameters of Equation 1, including the parameters A, B, and C that define the plane, where A is the gradient per pixel (or other cell) in the x direction, B is the gradient per pixel (or other cell) in the y direction, and C is the value of P at the location (Seed_X, Seed_Y). The inputs 152 may also include the coordinates of the root location (X_root, Y_root).

The metadata unit 156 may be configured to precompute, in response to the inputs 152, the metadata used to derive the attribute values of child nodes, such as A, B, A+B, and A-B in the case of a plane equation, and dx, dy, dx+dy, and dx-dy in the case of an edge equation. The metadata unit 156 may be configured to use one metadata set for each level of the hierarchical tree. For example, if the metadata precomputed for the lowest level includes the set M = {A, B, A+B, A-B}, then the set precomputed for the next level up may be M' = 3*M, the set precomputed for the level above that may be M'' = 9*M, and so on.
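The per-level metadata sets M, M' = 3*M, M'' = 9*M can be sketched as follows. The function name `metadata_levels` and the ordering of the returned list (lowest level first) are assumptions made for illustration. Each scaling by 3 is written as m + 2*m to mirror the two-input-adder formulation.

```python
def metadata_levels(A, B, num_levels):
    """Return one metadata tuple per level, lowest level first.

    Each tuple is (A, B, A+B, A-B) scaled by 3**level.
    """
    levels = [(A, B, A + B, A - B)]
    for _ in range(num_levels - 1):
        prev = levels[-1]
        levels.append(tuple(m + 2 * m for m in prev))  # 3*m with a single add
    return levels
```

For A = 1 and B = 2 with three levels, this yields (1, 2, 3, -1), (3, 6, 9, -3), and (9, 18, 27, -9).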

A tree 158 of one or more logic stages, in this example three stages 158A, 158B, and 158C, may be configured to perform the calculations that derive the attribute values of the child nodes at each level of the hierarchical topology. In this embodiment, a 3×3 topology is assumed. Thus, the first stage 158A may be constructed to accommodate 9 nodes, the second stage 158B to accommodate 81 nodes, and the third stage 158C to accommodate 729 nodes.

The output 160 may be in the form of an N-pixel by M-pixel interpolated output, but in other embodiments the output may be one or more arrays having different dimensions, node types, and the like. The desired bandwidth, for example, the number of samples or pixels per clock cycle or other unit of time, may be N pixels in the x direction by M pixels in the y direction, so as to match the throughput of downstream processing or execution units that may use and/or transform the output data.

For purposes of illustration, the embodiment of FIG. 7 is shown as a 3×3 topology with three levels, but other topologies and numbers of levels (stages) may be used. Thus, the dashed line between stage 158C and output 160 indicates additional stages that may be added. The system 150 of FIG. 7 may be implemented in hardware, software, or any combination thereof. In a hardware implementation, the tree 158 of logic stages and the metadata unit 156 may be implemented as combinational logic with simple two-input adders, which may interpolate the entire tree hierarchy, that is, all nodes down to the lowest level, in a single clock cycle. This may result in reduced power and/or energy consumption and/or circuit area requirements. The root unit 154 may be implemented with combinational and synchronous logic so that it can be integrated into the clocking of a larger image processing system. In some hardware implementations, the system 150 may be integrated into a graphics processing unit (GPU) on an integrated circuit (IC), where it may enable improved rendering frame rates.

In a software implementation, the methods and architectures disclosed herein may reduce the constant scratch space required for addition and/or subtraction operations. In some hybrid embodiments, a series of hierarchical tree stages may be implemented in hardware and fed with root values and/or metadata provided by software.

Some other potential benefits of the system of FIG. 7 and other embodiments disclosed herein are as follows. If N is the number of samples or other nodes to be interpolated per clock cycle, that is, the number of nodes in the lowest level of the grid tree, then the logic depth of the tree may be given by the base-9 logarithm of N+1, that is, log₉(N+1). This may compare favorably with the conventional technique of FIG. 1, in which the number of logic levels may grow geometrically as the value of N increases. In addition, the diagonal hierarchy, especially in a 3×3 implementation, may reduce the logic levels and/or critical path of the addition process. It may therefore be beneficial for high-frequency design synthesis and for reducing latency caused by propagation delay through one or more stages of the hierarchical tree. Moreover, because relatively few metadata values may need to be stored, for example, A, B, A+B, and A-B, or scaled versions thereof, for each level, the cost per calculated value may be reduced.

The embodiment of FIG. 7, as well as other embodiments disclosed herein, may be implemented in a serial hybrid configuration in which one or more higher levels are implemented with conventional interpolation techniques, for example, the sequentially traversed x and y paths shown in FIG. 1. A hybrid configuration may simplify the implementation of one or more higher levels while still using the hierarchical tree topology at the lower levels, that is, near or at the bottom leaf nodes, where the cost savings of the hierarchical tree topology may be greatest. In addition, when a grid with dimensions other than 3×3 is used, synthesis tools in electronic design automation (EDA) platforms may be able to automatically optimize away unused leaf nodes.

FIG. 14 is a block diagram of another microarchitecture illustrating the structure and data flow of an embodiment of a hierarchical interpolation system in accordance with the principles of this disclosure. The system 151 of FIG. 14 may be architecturally similar to the system 150 of FIG. 7, but it may include a generalized tree 159 having stages 159A, 159B, 159C, ..., which have I nodes, I² nodes, I³ nodes, ..., respectively. "I" may be given by I = W*Z, where W and Z may represent the number of nodes in the x direction and the y direction, respectively. Thus, each node may branch to I nodes in the next stage. As the tree grows from the root node to the leaf nodes, the number of nodes in each stage may grow according to the following pattern or series: 1, I, I², I³, I⁴, ....

The numbers I, W, and Z may be chosen as constants during the design process. The embodiment of FIG. 7 may be regarded as a special case of the embodiment of FIG. 14 in which W = 3 and Z = 3, and therefore I = 9.

The embodiment of FIG. 14 may be characterized as having a general number of logic levels, or tree depth, given by O(log N), where N may be the number of nodes in the lowest level of the hierarchical tree, and O may denote the general complexity notation relating to asymptotic bounds. For example, in an embodiment in which each node may branch to "I" nodes in the next stage, the number of logic levels, and thus the latency caused by propagation delay through the stages of the tree, may be given by the base-I logarithm of N, that is, O(log_I N). Depending on the implementation details, this may compare favorably with the conventional technique of FIG. 1, in which the number of logic levels may be given by O(N) as N increases. Thus, in some embodiments, a system with a general tree topology as shown in FIG. 14 may reduce the logic levels and/or propagation delay/interpolation latency from O(N) to O(log N).
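The logarithmic relationship between leaf count and tree depth can be illustrated with a small helper. The function name `tree_depth` is hypothetical, and integer arithmetic is used instead of a floating-point logarithm to avoid rounding at exact powers.

```python
def tree_depth(n_leaf_nodes: int, branching: int) -> int:
    """Smallest number of stages d such that branching**d >= n_leaf_nodes."""
    depth, nodes = 0, 1
    while nodes < n_leaf_nodes:
        nodes *= branching  # each stage multiplies the node count by I
        depth += 1
    return depth
```

With I = 9, a lowest level of 729 nodes needs only 3 stages, whereas a chain in which each value is derived from its predecessor would need on the order of 729 sequential steps.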

In some embodiments, the metadata may be regarded as having three general components: an X component, a Y component, and an XY component. For example, when an edge equation is used, the X, Y, and XY components may be dx, dy, and dx+/-dy, respectively. The X, Y, and XY components may be designated META_X, META_Y, and META_X+/-META_Y, respectively. Just as the number of nodes in each stage grows when moving from the root node to the leaf nodes, the metadata may grow in the opposite direction, moving from the leaf nodes to the root node, as follows: {META_X, META_Y, META_X+/-META_Y, ...}, {W*META_X, Z*META_Y, W*META_X+/-Z*META_Y, ...}, {W²*META_X, Z²*META_Y, W²*META_X+/-Z²*META_Y, ...}, {W³*META_X, Z³*META_Y, W³*META_X+/-Z³*META_Y, ...}, ....
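One way the generalized W-by-Z tree might be modeled in software is sketched below. Several assumptions are introduced for illustration: W and Z are odd so that each parent sits at the center of its block of children, levels are numbered from 0 at the leaves, and the per-child deltas for each level (including multiples of the scaled X and Y components when W or Z exceeds 3) are precomputed into a table so that each child is still derived with one addition. The names `level_deltas` and `interpolate` are hypothetical.

```python
def level_deltas(meta_x, meta_y, W, Z, level):
    """Precompute {(x_offset, y_offset): delta} for one level (0 = leaves)."""
    step_x, step_y = W ** level, Z ** level           # node spacing at this level
    sx, sy = step_x * meta_x, step_y * meta_y          # W^level*META_X, Z^level*META_Y
    return {
        (i * step_x, j * step_y): i * sx + j * sy
        for i in range(-(W // 2), W // 2 + 1)
        for j in range(-(Z // 2), Z // 2 + 1)
    }


def interpolate(root_value, meta_x, meta_y, W, Z, num_stages):
    """Expand the root through num_stages stages; returns {(x, y): value}."""
    nodes = {(0, 0): root_value}
    for level in reversed(range(num_stages)):  # coarsest level first
        deltas = level_deltas(meta_x, meta_y, W, Z, level)
        nodes = {
            (x + ox, y + oy): v + d  # one addition per derived node
            for (x, y), v in nodes.items()
            for (ox, oy), d in deltas.items()
        }
    return nodes
```

With W = Z = 3 and three stages this produces 729 nodes, matching the stage sizes of the 3×3 embodiment; with W = 5 and Z = 3 and two stages it produces (5*3)² = 225 nodes.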

The metadata unit 157 may be configured to use one metadata set for each level of the hierarchical tree. For example, if the metadata precomputed for the lowest level includes the set M = {META_X, META_Y, META_X+META_Y, META_X-META_Y}, then the sets precomputed for successive higher levels may be M' = I*M, M'' = I²*M, and so on.

As with the embodiment of FIG. 7, the embodiment of FIG. 14 may be implemented in hardware, software, or any combination thereof. Any number of stages may be used, and any N-pixel by M-pixel interpolated output may be generated.

The embodiments of FIGS. 7 and 14, as well as other embodiments disclosed herein, may be configured for use with multi-sample anti-aliasing (MSAA), which uses multiple samples per pixel to improve image quality. A common arrangement for MSAA uses four samples per pixel, arranged in a rotated 2×2 grid within the pixel. This may be referred to as 4× or 4-to-1 MSAA, but 2×, 8×, and other variants of MSAA may be used.

For example, to operate with MSAA, the embodiment of FIG. 7 may be modified by duplicating or forking the tree structure so that it can process the additional samples for each pixel, as shown in FIG. 8. The system 170 of FIG. 8 may be generally similar to the system 150 of FIG. 7, but with the addition of a redirection unit 162, which may operate in response to a mode selection input 164 to rearrange the manner in which samples are directed from the root unit 155 to the tree 166. The mode selection input 164 may enable the system to switch between an MSAA mode and a non-MSAA mode. The root unit 155 may also be modified to rearrange the manner in which samples are directed to the tree 166 in response to the mode selection input 164. The tree 166, which includes stages 166A, 166B, and 166C, may also be modified to process additional nodes at each stage of the hierarchy. The number of nodes in each stage may be multiplied, for example, by the number of samples per pixel in MSAA mode. For example, stage 166A may process (9 × NUM_SAMPLES) nodes, which may be 36 nodes in the case of NUM_SAMPLES = 4. Increasing the size of the stages by the same multiple as the number of samples per pixel in MSAA mode may make it easier to switch the tree between MSAA mode and non-MSAA mode. Depending on the implementation details, this may provide substantial improvements in simplicity and performance, which may outweigh any potential increase in hardware cost.

In some implementations, in MSAA mode (that is, when the mode selection input 164 is active), the modified root unit 155 may begin by finding/selecting one of the multiple samples in the center pixel of the grid to use as the root sample. (For example, the sample in the left corner of the center pixel may be selected as the root sample.) The root unit 155 may then expand from the root sample and interpolate to the number of samples in the center pixel, assumed in this example to be four (NUM_SAMPLES = 4). The root unit 155 and the redirection unit 162 may then direct the four samples from the center pixel to the tree 166, which may then apply the diagonal hierarchical 3×3 topology to calculate the values of the other samples of the other pixels adjacent to the center pixel, and so on. Thus, the hierarchical tree structure may be implemented independently for each of the multiple samples.
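The MSAA flow described above can be sketched for a single 3×3 stage as follows. The sample positions in `SAMPLE_OFFSETS` are a hypothetical rotated-grid pattern chosen only for illustration, sample 0 is arbitrarily treated as the root sample, and the function names are assumptions. The sketch first expands the root sample to all samples of the center pixel and then runs an independent 3×3 derivation for each sample.

```python
# Hypothetical rotated-grid sample positions within a pixel, in pixel units.
SAMPLE_OFFSETS = [(-0.125, -0.375), (0.375, -0.125),
                  (-0.375, 0.125), (0.125, 0.375)]
NUM_SAMPLES = len(SAMPLE_OFFSETS)


def center_pixel_samples(root_value, A, B):
    """Expand from the root sample (sample 0) to every sample of the center pixel."""
    x0, y0 = SAMPLE_OFFSETS[0]
    return [root_value + A * (sx - x0) + B * (sy - y0)
            for sx, sy in SAMPLE_OFFSETS]


def msaa_3x3(root_value, A, B):
    """Return {(pixel_x, pixel_y, sample_index): value} for a 3x3 pixel block."""
    meta = {(1, 0): A, (0, 1): B, (1, 1): A + B, (1, -1): A - B}
    result = {}
    for s, v in enumerate(center_pixel_samples(root_value, A, B)):
        result[(0, 0, s)] = v
        for (px, py), m in meta.items():   # same metadata reused for every sample
            result[(px, py, s)] = v + m
            result[(-px, -py, s)] = v - m
    return result
```

The result holds 9 × NUM_SAMPLES = 36 values, which corresponds to the node count given for stage 166A when NUM_SAMPLES = 4.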

In MSAA mode, the layout of the sample output 160 may need to be rotated to accommodate the expectations of downstream processing units, for example, to compensate for the arrangement of the samples within each pixel. The redirection unit 162 may add one logic level to the architecture of FIG. 8, but it may be relatively cost-effective.

In non-MSAA mode (that is, when the mode selection input 164 is inactive), the root unit 155 and the redirection unit 162 may reconfigure the inputs of the tree 166 for reuse by making some modifications at the root and the nodes connected to it, adding values corresponding to the original pixel grid divided by NUM_SAMPLES.

In some alternative embodiments, the tree 166 may be configured to fork the center sample of the 3×3 grid from the previous stage so as to span the samples within a pixel. In such an embodiment, a hybrid tree may be used in which one or more stages are implemented in a conventional configuration to simplify the design.

Although the embodiment of FIG. 8 is shown with a 3×3 topology, any topology may be used, including the generalized form of the embodiment of FIG. 14. Each node may branch to "I" nodes at the next level, where I = W*Z, and W and Z may represent the number of nodes in the x direction and the y direction.

In some embodiments, an N by M grid may be subdivided into smaller sub-grids so that a different attribute can be interpolated for each sub-grid. This may be accomplished, for example, by starting at the center of each sub-grid and determining, at that center, the root value of the attribute to be interpolated for that sub-grid. After the root value for each sub-grid has been found, the attribute values for the whole of each sub-grid may be interpolated using a hierarchical tree topology on that sub-grid.

In some embodiments, a system may be configured with multiple trees and/or root units, where each tree and/or root unit may be used to interpolate the values for one of the sub-grids. For example, if the grid is subdivided into k sub-grids, the system may include k hierarchical trees to interpolate the child nodes of each root node, and k root units to determine the starting root value at the center of each sub-grid.

FIGS. 9 to 11 illustrate embodiments of grids that may be subdivided in accordance with the principles of this disclosure. In FIG. 9, the grid 180 is not subdivided (k = 1), and the entire grid may be interpolated by a single hierarchical tree that starts from the root node, identified as ROOT, at the center of the entire grid and expands outward as indicated by the arrows. In FIG. 10, the grid 180 is subdivided into two sub-grids 182 and 184 (k = 2). Each sub-grid may be interpolated by a different hierarchical tree using a different attribute, starting from one of the root nodes ROOT1 and ROOT2 at the centers of the two sub-grids. In FIG. 11, the grid 180 is subdivided into four sub-grids 186, 188, 190, and 192 (k = 4). Each sub-grid may be interpolated by a different hierarchical tree using a different attribute, starting from one of the root nodes ROOT1, ROOT2, ROOT3, and ROOT4 at the centers of the four sub-grids. Regardless of the level of subdivision, the different sub-grids may be used to interpolate attributes of the same or different primitives.
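The k = 2 case can be sketched in software for a small 6×3 grid split into two side-by-side 3×3 sub-grids, each with its own plane equation and root. The grid size, the job tuple layout, and the function names are assumptions made for illustration only.

```python
def root_value(plane, seed, root):
    """Evaluate Equation 1 at the root position of a sub-grid."""
    A, B, C = plane
    return A * (root[0] - seed[0]) + B * (root[1] - seed[1]) + C


def interpolate_subgrid(plane, seed, root):
    """Derive the 3x3 block around `root` with one addition per child node."""
    A, B, _ = plane
    v = root_value(plane, seed, root)
    meta = {(1, 0): A, (0, 1): B, (1, 1): A + B, (1, -1): A - B}
    out = {root: v}
    for (ox, oy), m in meta.items():
        out[(root[0] + ox, root[1] + oy)] = v + m
        out[(root[0] - ox, root[1] - oy)] = v - m
    return out


def interpolate_grid(jobs):
    """`jobs` holds one (plane, seed, root) tuple per sub-grid, so k = len(jobs)."""
    grid = {}
    for plane, seed, root in jobs:
        grid.update(interpolate_subgrid(plane, seed, root))
    return grid
```

In this sketch the sub-grids are processed in a loop; a hardware system with k trees and k root units could evaluate them concurrently.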

FIG. 12 is a block diagram of a microarchitecture illustrating the structure and data flow of an embodiment of a multi-attribute hierarchical interpolation system in accordance with the principles of this disclosure. The example system 194 of FIG. 12 illustrates an embodiment with two trees (k = 2), but the principles may be extended to embodiments with any number of trees for any number of sub-grids. The system 194 is shown with functionality for supporting MSAA, but the functionality related to multi-attribute interpolation is independent of the MSAA functionality, and the MSAA functionality may be omitted.

The system 194 of FIG. 12 includes a first hierarchical tree 166A, root unit 155A, and metadata unit 156A, which are capable of operating essentially independently of a parallel second hierarchical tree 166B, root unit 155B, and metadata unit 156B. However, it may be beneficial to operate the two halves from the same clock and/or to have them work together in certain modes. Each half of the system may receive separate root location and plane equation inputs 152A and 152B. A scheduler 196 may be configured to provide the different inputs 152A and 152B to the two halves of the system 194. For example, the scheduler 196 may provide different root locations and plane equations to enable the system to perform the parallel interpolation shown in FIG. 10.

The system of FIG. 12 may be used to independently interpolate, in each clock cycle, two different attributes for two different sub-grids based on two different plane equations and two different root locations. Each half of the system may use its respective root location and plane equation to find the attribute value at the center of its respective sub-grid, and then implement a hierarchical tree, such as tree 166A or 166B, to evaluate the attribute values across the remainder of its sub-grid. For example, the first root unit 155A may determine the value of ROOT1 in FIG. 10, while the root unit 155B may determine the value of ROOT2 in FIG. 10. In this example, the trees implement the diagonal 3×3 topology, but other topologies may be used. For example, any general topology may be used, as shown in the embodiment of FIG. 14. Each node may branch to "I" nodes at the next level, where I = W*Z, and W and Z may represent the number of nodes in the x direction and the y direction.

As with the other embodiments described above, the embodiment of FIG. 12 may be implemented in hardware, software, or any suitable combination thereof. If implemented in hardware using combinational logic for the trees 166A and 166B, the system may be able to interpolate two sub-grids, such as those shown in FIG. 10, in a single clock cycle. The system may also be scaled to include any number of trees for interpolating any number of sub-grids simultaneously. As shown in FIGS. 9 to 11, it may be beneficial to subdivide the entire grid into equally sized sub-grids because this may keep the trees balanced and reduce their logic levels. The hardware configuration of the system may also be adapted to balance various factors such as cost, power and energy consumption, performance, and the like. For example, each tree in the multi-attribute embodiment of FIG. 12 may be implemented with half the amount of hardware of a single-attribute embodiment, which may cause each half to run at about half the speed of the single-attribute version while still maintaining the same N×M sample throughput per clock cycle in the combined output. Alternatively, each half may be implemented with the same amount of hardware as a single-attribute embodiment. This may effectively double the amount of hardware and double the sample throughput of the combined N×M sample output.

In some embodiments, for example, where fewer than k attributes need to be interpolated per clock, the multi-attribute hierarchical trees may be configured to share resources. This may be accomplished, for example, by including multiplexers and/or adders near the head of the trees. In some embodiments, this may enable the system to maintain the same N×M sample throughput per clock even while sharing resources.

The embodiment of FIG. 12 includes features that may enable it to be reconfigured to share resources. For example, if only one attribute needs to be interpolated across the entire grid, the system may be reconfigured so that each of the two trees interpolates half of the grid using the same root location and plane equation inputs. In this mode of operation, the first tree 166A may operate in the normal manner, using the first root unit 155A to determine the center sample at ROOT1 from the first root location and plane equation of input 152A. In this resource-sharing mode, however, a multiplexer 202 may select the first root location and plane equation from input 152A as the input to the second metadata unit 156B of the second tree. In addition, in this resource-sharing mode, another multiplexer 200 may select the output of the first root unit 155A, but with an offset added by an adder 198, so as to place the root location of the second tree at an offset from ROOT1. That is, the adder 198 and the multiplexer 200 may essentially replace ROOT2 with an appropriate value to enable the second tree to interpolate its sub-grid using the first attribute. Thus, the two halves of the system may operate in parallel to interpolate one attribute across the entire grid 180.
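The resource-sharing mode can be sketched as follows. The source states only that an offset is added to the output of the first root unit; the particular form of the offset used here, namely the plane gradients multiplied by the displacement between the two root locations, is an assumption, as are the function names.

```python
def root_unit(A, B, C, seed, root):
    """Evaluate Equation 1 at a root location (models a root unit)."""
    return A * (root[0] - seed[0]) + B * (root[1] - seed[1]) + C


def shared_mode_roots(plane, seed, root1, root2):
    """Return the root values fed to both trees when one attribute is shared."""
    A, B, C = plane
    v1 = root_unit(A, B, C, seed, root1)  # first root unit, normal operation
    # Assumed offset: attribute change between ROOT1 and the second root location.
    offset = A * (root2[0] - root1[0]) + B * (root2[1] - root1[1])
    v2 = v1 + offset  # models the adder and multiplexer replacing ROOT2
    return v1, v2
```

Under this assumption the second tree starts from the same value it would obtain by evaluating the shared plane equation directly at its own root location, so the second root unit is not needed in this mode.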

The principles illustrated with respect to FIG. 12 may help in configuring an interpolation system to accommodate various system requirements. For example, as the demand for system sample throughput increases, for example, because of an increasing number of samples per grid, simultaneous interpolation of multiple attributes (from the same or different primitives) may increasingly be needed to ensure that enough samples are interpolated. The ability to interpolate multiple attributes simultaneously may be useful where primitives are small and/or when interpolating at the corners of primitives that have only partial sample coverage, and may therefore help improve utilization. The principles illustrated with respect to FIG. 12 may be adapted to help improve system performance, efficiency, and the like in any of these situations.

FIG. 13 illustrates an embodiment of an imaging device 204 into which any of the methods or apparatus described in this disclosure may be integrated. The imaging device 204 may take any form, such as a panel display for a PC, laptop, mobile device, etc., a projector, VR goggles, etc., and may be based on any imaging technology, such as cathode ray tube (CRT), digital light projector (DLP), light emitting diode (LED), liquid crystal display (LCD), organic LED (OLED), quantum dot, etc., for displaying a rasterized image 206 with pixels. An image processor 210, such as a graphics processing unit (GPU), and/or a driver circuit 212 may process and/or convert the image into a form that can be displayed on or by the imaging device 204. A portion of the image 206 is shown enlarged so that the pixels 208 are visible. Any of the methods or apparatus described in this disclosure may be integrated into the imaging device 204, the processor 210, and/or the driver circuit 212 to interpolate any of the pixels 208 shown in FIG. 13. In some embodiments, the image processor 210 may include any of the hierarchical tree topologies described above, implemented, for example, on an integrated circuit 211. In some embodiments, the integrated circuit 211 may also include the driver circuit 212 and/or any other components that may implement any other functionality of the imaging device 204.

In addition to those mentioned above, and depending on the implementation details and circumstances, the principles of this disclosure may provide any or all of the following advantages and/or features: methods and/or apparatus that may be scaled to various pixel grid dimensions; hierarchical topologies, including the diagonal 3×3 topology, may reduce area, energy, and/or power consumption and may be applicable to any sample/pixel interpolation unit/module; hierarchical topologies, including the diagonal 3×3 topology, may be applied to interpolation based on boundary equations, which may be useful for efficient rasterization; hybrid tree topologies, including the diagonal 3×3 topology, combined with conventional designs may save cost and reduce complexity; sample interpolation with hierarchical topologies, including the diagonal 3×3 topology, may be implemented in conjunction with MSAA mode operation; hierarchical topologies, including the diagonal 3×3 topology, may be applied to any interpolation throughput for integer arrays of adjacent samples in the x and y directions; hierarchical topologies, including the diagonal 3×3 topology, may be scaled to any other set of samples/pixels for use with tree interpolation; the methods and apparatus disclosed herein may be used with any attribute data format; and, to support interpolating multiple attributes for multiple blocks (e.g., k blocks), an interpolation tree may be constructed with a bifurcation point having k leaf nodes near the head of the tree.
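To make the diagonal 3×3 topology concrete, the following sketch models a two-level tree over a 9×9 grid in software. It assumes the plane equation form P(x,y)=A*(x-Seed_X)+B*(y-Seed_Y)+C and the per-level metadata values (3A, 3B, 3(A+B), 3(A-B) at the first level and A, B, A+B, A-B at the second) recited in the claims of this publication. The function names and the dictionary layout are hypothetical, and a hardware tree would derive the children in parallel rather than in loops.

```python
from itertools import product


def metadata(a, b, scale):
    """Precompute the four deltas shared by every node of one level."""
    return {
        "A": scale * a,
        "B": scale * b,
        "A+B": scale * (a + b),
        "A-B": scale * (a - b),
    }


def child_value(root_value, dx, dy, md):
    """Derive one child by adding or subtracting a single metadata value."""
    if dx == 0 and dy == 0:
        return root_value                    # the root doubles as the center node
    if dy == 0:
        return root_value + dx * md["A"]     # east / west neighbor
    if dx == 0:
        return root_value + dy * md["B"]     # north / south neighbor
    if dx == dy:
        return root_value + dx * md["A+B"]   # diagonal with matching signs
    return root_value + dx * md["A-B"]       # diagonal with opposite signs


def interpolate_9x9(a, b, c, seed_x, seed_y):
    """Two-level 3x3 tree: 9 cell roots, then 9 samples inside each cell."""
    center = 4
    root = a * (center - seed_x) + b * (center - seed_y) + c
    level1 = metadata(a, b, 3)   # cell centers are 3 samples apart
    level2 = metadata(a, b, 1)   # samples within a cell are 1 apart
    grid = {}
    for cx, cy in product((-1, 0, 1), repeat=2):
        cell_root = child_value(root, cx, cy, level1)
        for dx, dy in product((-1, 0, 1), repeat=2):
            x = center + 3 * cx + dx
            y = center + 3 * cy + dy
            grid[(x, y)] = child_value(cell_root, dx, dy, level2)
    return grid
```

Only the root requires a full plane evaluation with multiplications. Every other sample is reached with at most two additions or subtractions of precomputed values, which is the property the area estimates in this description rely on.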

In some embodiments, the number of nodes at each stage, i.e., at each level, may follow a geometric progression. Additionally, in the case of a 3×3 topology, the cost of each stage may be approximately nine times the cost of the previous stage. Therefore, if the area of the final stage is A, the total area TA may be given by: TA = A + A/9 + A/81 + A/729 + … = A × (9/8). Using this approximation, Table 1 provides an example area-based cost summary for a rasterization implementation based on the following assumptions: (1) the approximation is based on aliased mode (i.e., not multi-sample anti-aliasing); (2) the area estimate uses fixed-point arithmetic based on the rasterizer dx, dy, and starting point; and (3) a symmetric grid with the same number of samples in the x and y directions is used. The values shown in Table 1 are for illustration purposes only and may not represent actual values in a physical or simulated implementation.
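The geometric series can be checked numerically. The sketch below is illustrative only: it assumes a fan-out of nine per level, as in the 3×3 topology, and the function names are hypothetical.

```python
def total_area(final_stage_area, levels, fanout=9):
    """Sum the stage areas, each earlier stage costing 1/fanout of the next."""
    return sum(final_stage_area / fanout**k for k in range(levels))


def limit_area(final_stage_area, fanout=9):
    """Closed form of the infinite series: A * fanout / (fanout - 1)."""
    return final_stage_area * fanout / (fanout - 1)
```

For a tree with a finite number of levels the sum stays below the closed-form limit, so A × (9/8) serves as an upper bound on the total area under this cost model.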

Table 1

Pixels    Conventional technique    Diagonal hierarchical 3×3 topology    % improvement
4×4       9                         6                                     33
8×8       44                        19                                    57
12×12     100                       41                                    59
16×16     200                       74                                    63
24×24     500                       160                                   68
32×32     1000                      260                                   74
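The improvement column of Table 1 follows from the two cost columns. The short sketch below recomputes it, assuming the improvement is the relative reduction in cost rounded to the nearest percent; the dictionary and function names are hypothetical.

```python
# (conventional cost, diagonal hierarchical 3x3 cost) per grid size, from Table 1.
TABLE_1 = {
    "4x4": (9, 6),
    "8x8": (44, 19),
    "12x12": (100, 41),
    "16x16": (200, 74),
    "24x24": (500, 160),
    "32x32": (1000, 260),
}


def improvement_percent(conventional, hierarchical):
    """Relative cost reduction, rounded to the nearest whole percent."""
    return round(100 * (conventional - hierarchical) / conventional)


improvements = {size: improvement_percent(*costs) for size, costs in TABLE_1.items()}
```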

FIG. 15 illustrates an embodiment of a computing system according to this disclosure. The system 300 of FIG. 15 may be used to implement any or all of the methods and/or apparatus described in this disclosure. The system 300 may include a central processing unit (CPU) 302, memory 304, a storage device 306, a graphics processing unit (GPU) 307, a user interface 308, a network interface 310, and a power supply 312. A full hardware implementation of a hierarchical tree structure according to this disclosure may be realized in the GPU 307, whereas a full software implementation may be realized entirely within the CPU 302. In other embodiments, a full hardware implementation of the hierarchical tree structure may be realized in the CPU 302 as an integrated graphics processing unit (IGPU). In other embodiments, the GPU 307 may be used to implement a serial hybrid configuration in which the upper levels of the hierarchical tree structure are implemented with conventional hardware in the GPU 307, while the lower levels are implemented with a hierarchical tree topology in hardware and/or software in the GPU 307 and/or the CPU 302. In other embodiments, a hierarchical tree structure according to this disclosure may be distributed among any suitable combination of hardware and/or software using any components of the system 300. Moreover, the principles of this disclosure are not limited to implementation with any of the components shown in FIG. 15, but may be implemented with any suitable hardware, software, or combination thereof.

In different embodiments, the system may omit any of these components, or may include duplicates or any additional number of any of these components, as well as any other types of components, to implement any of the methods and/or apparatus described in this disclosure.

The CPU 302 may include any number of cores, caches, buses and/or interconnect interfaces, and/or controllers. The memory 304 may include any arrangement of dynamic and/or static RAM, non-volatile memory (e.g., flash memory), and the like. The storage device 306 may include a hard disk drive (HDD), a solid state drive (SSD), and/or any other type of data storage device, or any combination thereof. The user interface 308 may include any type of human interface device, such as a keyboard, mouse, monitor, video capture or transmission device, microphone, speaker, touch screen, etc., as well as any virtual or remote versions of such devices. The network interface 310 may include one or more adapters or other apparatus for communicating over Ethernet, Wi-Fi, Bluetooth, or any other computer networking arrangement, enabling the components to communicate over physical and/or logical networks such as an intranet, the Internet, a local area network, a wide area network, etc. The power supply 312 may include a battery and/or a power supply capable of receiving power from an AC or DC power source and converting it to any form suitable for the components of the system 300.

Any or all of the components of the system 300 may be interconnected through a system bus 301, which may refer collectively to various interfaces, including power buses, address and data buses, high-speed interconnects such as Serial AT Attachment (SATA), Peripheral Component Interconnect (PCI), Peripheral Component Interconnect Express (PCI-e), System Management Bus (SMB), and any other types of interfaces that enable the components to work together, either locally at one location and/or distributed between different locations.

The system 300 may also include various chipsets, interfaces, adapters, glue logic, embedded controllers such as programmable or non-programmable logic devices or arrays, application specific integrated circuits (ASICs), embedded computers, smart cards, and the like, arranged to enable the various components of the system 300 to work together to implement any or all of the methods and/or apparatus described in this disclosure. Any component of the system 300 may be implemented in hardware, software, firmware, or any combination thereof. In some embodiments, any or all of the components may be realized in virtualized form and/or in a cloud-based implementation with flexible provisioning of resources, for example, within a data center or distributed across multiple data centers.

The blocks or steps of a method or algorithm and the functions described in connection with the embodiments disclosed herein may be embodied directly in hardware, in one or more software modules executed by a processor, or in a combination of the two, including in the system 300. If implemented in software, the functions may be stored on, or transmitted over, a tangible, non-transitory computer-readable medium as one or more instructions or code. A software module may reside in random access memory (RAM), flash memory, read-only memory (ROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium. Any system disclosed herein, or any component or portion thereof, may be implemented as part of the software stack of a larger system (e.g., a graphics processing unit (GPU) or other larger system). Any system disclosed herein, or any component or portion thereof, may also be implemented as its own software stack.

The embodiments disclosed above have been described in the context of various implementation details, but the principles of this disclosure are not limited to these or any other specific details. For example, some functionality has been described as being implemented by certain components, but in other embodiments the functionality may be distributed among different systems and components in different locations and having various user interfaces. Certain embodiments have been described as having specific processes, steps, etc., but these terms also encompass embodiments in which a specific process, step, etc. may be implemented with multiple processes, steps, etc., or in which multiple processes, steps, etc. may be integrated into a single process, step, etc. A reference to a component or element may refer to only a portion of the component or element. For example, a reference to an integrated circuit may refer to all or only a portion of the integrated circuit, and a reference to a block may refer to the entire block or to one or more sub-blocks. Although the principles of this disclosure have been described in the context of certain applications, the principles may be applied to any attribute interpolation and/or rasterizer processing, and they may be useful in any mathematical calculation that uses boundary equations, plane equations, or any other equations to interpolate or extrapolate one or more values. In some embodiments, calculations may be performed for positions in the lowest level of the hierarchy, and, depending on the resolution of the grid or other array, the positions may correspond to various things such as pixels, samples, centroids, and the like. In some embodiments, interpolation may work at any spatial sampling frequency of a planar primitive. In some embodiments, a zero offset may refer to a substantially zero offset, that is, an offset small enough that the value may be ignored for computational purposes without significantly degrading the result.

Terms such as "first" and "second" are used in this disclosure and the claims only to distinguish the things they modify and may not indicate any spatial or temporal order unless otherwise apparent from context. A reference to a first thing may not imply the existence of a second thing.

The various details and embodiments described above may be combined to produce additional embodiments according to the inventive principles of this patent disclosure. Since the inventive principles of this patent disclosure may be modified in arrangement and detail without departing from the inventive concepts, such changes and modifications are considered to fall within the scope of the following claims.

Claims

1. A method for interpolating an attribute value of an image grid, the method comprising:

determining a first-level root value of an attribute at a first-level root node located at the center of the image grid;

computing first-level metadata based on a first gradient of the attribute in a first direction and a second gradient of the attribute in a second direction; and

deriving, based on the first-level root value and the first-level metadata, first-level child values of the attribute for two or more first-level child nodes arranged radially around the first-level root node in the image grid.

2. The method of claim 1, further comprising:

using one of the first-level child nodes and its corresponding first-level child value as a second-level root node and a second-level root value for a cell of the image grid, wherein the second-level root node is located at the center of the cell;

computing second-level metadata based on the first gradient and the second gradient; and

deriving, based on the second-level root value and the second-level metadata, second-level child values of the attribute for two or more second-level child nodes arranged radially around the second-level root node in the cell.

3. The method of claim 1, wherein each first-level child node is symmetrically offset from the first-level root node in the first direction and the second direction.

4. The method of claim 3, wherein each first-level child node is offset from the first-level root node by substantially zero or substantially the same distance in the first direction and the second direction.

5. The method of claim 1, wherein:

the image grid includes a 3×3 array of cells having a center cell and eight outer cells;

the two or more first-level child nodes include eight first-level child nodes;

the first-level root node is located at the center of the center cell; and

each first-level child node is located at the center of one of the outer cells.

6. The method of claim 1, wherein the first-level metadata includes attribute delta values for offsets in the first direction and the second direction.

7. The method of claim 1, wherein:

a value of a first parameter A is based on the first gradient; and

a value of a second parameter B is based on the second gradient.

8. The method of claim 7, wherein the first-level metadata includes the values A, B, A+B, and A-B.

9. The method of claim 2, wherein:

a value of a first parameter A is based on the first gradient;

a value of a second parameter B is based on the second gradient;

the image grid includes a 3×3 array of cells;

the first-level metadata includes the values 3A, 3B, 3(A+B), and 3(A-B); and

the second-level metadata includes the values A, B, A+B, and A-B.

10. The method of claim 1, wherein the first-level metadata is computed based on a plane equation.

11. The method of claim 10, wherein:

the plane equation has the form P(x,y)=A*(x-Seed_X)+B*(y-Seed_Y)+C;

P is a parameter of a two-dimensional surface interpolated at each position (x,y), where x is the distance in the x direction and y is the distance in the y direction;

A is the gradient per pixel or other cell in the x direction;

B is the gradient per pixel or other cell in the y direction; and

C is the value of P at position (Seed_X, Seed_Y).

12. The method of claim 1, wherein deriving the first-level child values comprises adding one or more values of the first-level metadata to the first-level root value.

13. The method of claim 1, wherein the first-level root node and each first-level child node correspond to a pixel.

14. The method of claim 1, wherein the first-level root node and each first-level child node correspond to a sample.

15. The method of claim 1, further comprising rasterizing the image in response to the attribute value.

16. The method of claim 15, wherein the attribute includes a first value indicating that the node is inside the primitive and a second value indicating that the node is outside the primitive.

17. A method for interpolating attribute values of an image grid, the method comprising:

determining a root value of an attribute at a root node located at the center of the image grid;

precomputing metadata for a plurality of child nodes at one or more levels based on one or more gradients of the attribute; and

deriving an attribute value for each child node at each level based on the corresponding root value and the metadata for the level of that child node;

wherein each child node serves as a root node for a next level.

18. The method of claim 17, wherein:

the image grid has a plurality of outer cells arranged radially around a center cell; and

the root node is located in the center cell.

19. The method of claim 17, wherein the root node is located in a first cell having one or more additional nodes, the method further comprising:

determining attribute values for the one or more additional nodes in the first cell; and

deriving, at each level, attribute values for additional child nodes corresponding to each additional node in the first cell, wherein the attribute value of each additional child node is derived based on the attribute value of the corresponding additional node in the first cell and the metadata for the corresponding level.

20. An apparatus for interpolating attribute values of an image grid, the apparatus comprising:

a tree of one or more logic stages configured to derive, for each of one or more levels, attribute values of a plurality of child nodes arranged around a centrally located root node in the level, based on the corresponding attribute value at the root node and metadata for the level.