CN105957134A

CN105957134A - Systems and methods for 3-d scene acceleration structure creation and updating

Info

Publication number: CN105957134A
Application number: CN201610262778.9A
Authority: CN
Inventors: J·A·麦库姆; L·T·皮得森; A·德怀尔; N·内塞
Original assignee: Imagination Technologies Ltd
Current assignee: Imagination Technologies Ltd
Priority date: 2011-08-05
Filing date: 2012-08-04
Publication date: 2016-09-21
Anticipated expiration: 2032-08-04
Also published as: US20210183130A1; CN103765481B; US20130113800A1; US10930052B2; US20190164331A1; US11481954B2; US10217267B2; US8717357B2; CN103765481A; US9430864B2; US12380626B2; GB2505608B; US20140333610A1; GB201322078D0; US20160371876A1; CN105957134B; WO2013022804A1; DE112012003243T5; US20230016561A1; GB2505608A

Abstract

Systems and methods for generating acceleration structures propose subdividing a 3-D scene into multiple volume portions, the volume portions having different sizes, each volume portion can be addressed using a multi-part address, the multi-part The addresses indicate the location and relative size of each volume portion. A stream of primitives is processed by characterizing each primitive according to one or more criteria, selecting a relative size of volume portions for enclosing the primitive, and finding the , a collection of volume parts of this relative size. A primitive ID is stored in each location of a cache associated with each volume part in the set of volume parts. In response to each cache eviction decision made during the process, a cache location is selected for eviction. In response to the evicted cache location, an element of an acceleration structure is generated based on the contents of the evicted cache location.

Description

Systems and methods for accelerated structure creation and updating of 3-D scenes

本申请是国际申请日为2012年08月04日、国家申请号为201280037833.4、名称为“用于3-D场景加速结构创建和更新的系统和方法”的发明专利申请的分案申请。This application is a divisional application of an invention patent application with an international filing date of August 4, 2012, a national application number of 201280037833.4, and a title of “System and Method for Creating and Updating 3-D Scene Acceleration Structures”.

相关申请的交叉引用Cross References to Related Applications

本申请要求2011年8月5日提交的标题为“加速结构创建的系统和方法(Systemsand methods of Acceleration Structure Creation)”的美国临时申请号61/515,801的优先权，出于所有目的，该申请通过引用以其全文结合于此。This application claims priority to U.S. Provisional Application No. 61/515,801, filed August 5, 2011, entitled "Systems and methods of Acceleration Structure Creation," which for all purposes is adopted by The citations are hereby incorporated in their entirety.

技术领域technical field

本主题的一个方面涉及用于有待渲染的3-D场景的场景加速结构的创建，并且在一个更具体的方面涉及对这种加速结构进行创建和更新，以便使用光线追踪用于对来自3-D场景描述的2-D图像进行渲染。One aspect of this subject matter relates to the creation of a scene acceleration structure for a 3-D scene to be rendered and, in a more specific aspect, to the creation and updating of such an acceleration structure for use with ray tracing for processing images from 3-D A 2-D image of a 3D scene description is rendered.

背景技术Background technique

用光线追踪对来自3-D场景描述的逼真2-D图像进行渲染在计算机图形学技术领域内是众所周知的。光线追踪通常涉及到获得由多种几何形状组成的场景描述，这些几何形状对场景中的结构的表面进行描述。如果这些几何形状是能够被有待使用渲染系统处理的一种形状类型的形状，则它们经常被称为图元；否则，通常对这些几何形状进行处理以便基于这些几何形状产生图元。例如，可以对贴片(patches)进行处理以便产生三角形图元。Rendering realistic 2-D images from 3-D scene descriptions with ray tracing is well known in the art of computer graphics. Ray tracing typically involves obtaining a scene description composed of various geometric shapes that describe the surfaces of structures in the scene. These geometries are often referred to as primitives if they are shapes of a shape type that can be processed using a rendering system; otherwise, these geometries are typically processed to generate primitives based on these geometries. For example, patches can be processed to generate triangle primitives.

图元可以与纹理和其他指令计算机图元的质量应如何影响碰撞该图元的光的信息相关联。光线追踪可以如实地对复杂光照、光透射、反射、折射等进行渲染，因为光线追踪可以对与场景的元素交互的光的物理行为进行建模。A primitive may be associated with textures and other information that instructs the computer how the quality of the primitive should affect light hitting the primitive. Ray tracing can faithfully render complex lighting, light transmission, reflection, refraction, etc., because ray tracing can model the physical behavior of light interacting with the elements of the scene.

光线追踪中常见的操作是确定光线与场景中的一个或多个图元之间的相交。在对光线追踪系统的物体进行定义时所使用的图元的示例是由位于3-D场景空间内的一个顶点集合组成的三角形；本说明书以此熟悉的示例继续下去，但三角形图元的使用是为了清晰性，而不是限制性的。A common operation in ray tracing is to determine the intersection between a ray and one or more primitives in the scene. An example of a primitive used in defining objects for a ray-tracing system is a triangle composed of a collection of vertices in 3-D scene space; this specification continues with this familiar example, but the use of triangle primitives It is for clarity, not limitation.

光线的定义可以由一个原点、和一个方向以及沿着光线的当前裁剪(clipping)距离组成，对于光线而言，裁剪距离可以被标识为“t”。当前裁剪距离对为光线检测到的当前最近的相交进行标识，或者在没有检测到的相交的情况下，标识已经在超过该距离之外的距离上测试了光线的相交，并且没有发现相交。当光线完成相交测试时，可以返回最近的检测到的相交，并且确定用于该相交的信息，如与该光线相交的图元的标识。A ray definition may consist of an origin, and a direction and the current clipping distance along the ray, which for a ray may be denoted as "t". The current clipping distance identifies the current closest intersection detected for the ray, or in the case of no detected intersection, the ray's intersection has been tested at a distance beyond this distance and no intersection was found. When a ray completes the intersection test, the most recent detected intersection may be returned and information for that intersection determined, such as the identity of the primitive that the ray intersected.

质朴地，通过反复地测试场景中的每一个三角形可以为给定的光线确定这些结果，以便对最近的相交三角形进行标识。虽然这种质朴的方法对带有少量三角形的场景而言以可令人接受的方式工作，但此方法对复杂的场景和对商业产品而言是难解决的，其中，对于每个有待渲染的帧而言，可能需要被测试数百万甚至数千万或数亿条光线与数百万个三角形的相交。Naively, these results can be determined for a given ray by iteratively testing every triangle in the scene to identify the closest intersecting triangle. While this naive approach works acceptably for scenes with a small number of triangles, it is intractable for complex scenes and for commercial products where, for each Frame-wise, the intersection of millions or even tens or hundreds of millions of rays with millions of triangles may need to be tested.

为了加速这种相交测试，使用3-D空间加速结构。这些加速结构通常将3D空间细分成多个不同区域来工作，其中，这些区域各自可以包围(bound)多个图元。然后，可以首先对这些区域中的每个区域进行相交测试，以便确定是否需要单独测试该区域中的图元。在一些情况下，加速结构可以是分层级的，从而使得执行加速结构内的不同区域的多次测试，以便对有待测试与给定光线相交的图元的相对小的集合进行标识。To speed up this intersection test, a 3-D spatial acceleration structure is used. These acceleration structures typically work by subdividing the 3D space into a number of different regions, where each of these regions can bound multiple primitives. Each of these regions can then be first tested for intersection in order to determine whether primitives in that region need to be tested individually. In some cases, the acceleration structure may be hierarchical such that multiple tests of different regions within the acceleration structure are performed in order to identify a relatively small set of primitives to be tested for intersection with a given ray.

对于一个适当组装的加速结构而言，光线相交测试的总数应实质上小于在每条光线与场景中的每个三角形之间已经执行的光线三角形测试。For a properly assembled acceleration structure, the total number of ray intersection tests should be substantially less than the ray triangle tests already performed between each ray and each triangle in the scene.

发明内容Contents of the invention

本概述描述了一种系统和方法的概览，其中，可以实践或实施各种更多的特定方面。然后介绍了这些特定方面中的一些方面。This overview describes an overview of systems and methods in which various more specific aspects can be practiced or implemented. Some of these specific aspects are then introduced.

在一个方面中，披露了一种用于为图元流产生加速结构的方法和装置。这些图元位于3-D场景中。该加速结构可以具有一种选定的类型，如包围体积层次，或kD树。包围体积层次可以使用一种选定类型的形状，如球体或轴对准包围盒。In one aspect, a method and apparatus for generating an acceleration structure for a stream of primitives is disclosed. These primitives are located in the 3-D scene. The acceleration structure can be of a selected type, such as a bounding volume hierarchy, or a kD tree. A bounding volume hierarchy can use a selected type of shape, such as a sphere or an axis-aligned bounding box.

为了产生加速结构，为3-D场景形成层级空间细分并且该细分包括不同粒度等级下的多个元素集合。对每个流式图元进行分类以选择一个粒度等级，在该粒度等级下，图元有待被来自所选定的粒度等级的元素的集合包围。作为一个具体的示例，较大数量的较小空间细分可以用于共同地包围一个给定的图元，或者可以使用较小数量的较大空间细分。较大数量的较小空间细分可以在该层级空间细分的更细粒度层内。To generate the acceleration structure, a hierarchical spatial subdivision is formed for the 3-D scene and includes multiple sets of elements at different levels of granularity. Each streaming primitive is classified to select a level of granularity under which the primitive is to be surrounded by a set of elements from the selected level of granularity. As a specific example, a larger number of smaller spatial subdivisions may be used to collectively enclose a given primitive, or a smaller number of larger spatial subdivisions may be used. A larger number of smaller spatial subdivisions can be within a finer-grained layer of the hierarchical spatial subdivision.

对于所选定的粒度等级而言，对层级空间细分的一个或多个元素进行标识，这些元素共同地包围着图元。将图元ID添加到该层级空间细分的每个所标识的元素的高速缓存条目中。对进一步的图元可以类似处理，从而使得可以将多个图元收集到该层级空间细分的一个给定元素中。For the selected level of granularity, one or more elements of the spatial subdivision of the hierarchy are identified, which collectively enclose the primitive. The primitive ID is added to the cache entry for each identified element of this hierarchical spatial subdivision. Further primitives can be treated similarly, so that multiple primitives can be collected into a given element of the hierarchical spatial subdivision.

在该层级空间细分的的每个元素内，对该元素内收集的那些图元的覆盖范围进行追踪。在一个示例中，通过在用该元素包围这些图元(或部分图元)的元素内确定一个子体积来对覆盖范围进行追踪。在这种方面中，对于包围体积的一种给定形状而言(如立方体或球体)，覆盖范围可以被视为多个体积的并集。Within each element of the spatial subdivision of the hierarchy, the coverage of those primitives collected within that element is tracked. In one example, coverage is tracked by determining a subvolume within the element that surrounds the primitives (or portions of primitives) with that element. In this aspect, for a given shape of the bounding volume (such as a cube or a sphere), coverage can be viewed as the union of volumes.

最后，可以从高速缓存中逐出层级空间细分的一个给定元素的一个高速缓存条目。该元素的覆盖范围用于形成、定义或选择将用于光线相交测试的加速结构的一个元素。还使该元素的覆盖范围聚合，其中，该层级空间细分的其他元素的覆盖范围聚合成该层级空间细分的一个更大的元素。一个或多个图元可以与该更大的元素(和该粒度等级下的其他元素)直接相关联，并且覆盖范围可以被保持为所有这些元素的并集。还可以高速缓存此信息(对这些包围元素和/或图元进行标识)并且该信息可以最终用于定义该加速结构的另一个元素，并且可能被聚合成该层级空间细分的一个仍然更大的元素等，直到已经确定用于该加速结构的一个或多个根节点。Finally, a cache entry for a given element of the hierarchical space subdivision may be evicted from the cache. The coverage of this element is used to form, define or select an element of the acceleration structure that will be used for ray intersection testing. The coverage of the element is also aggregated, wherein the coverages of other elements of the spatial subdivision of the level are aggregated into one larger element of the spatial subdivision of the level. One or more primitives can be directly associated with this larger element (and other elements at that level of granularity), and coverage can be maintained as the union of all these elements. This information can also be cached (identifying these enclosing elements and/or primitives) and this information can eventually be used to define another element of the acceleration structure, and possibly be aggregated into a still larger one of the hierarchical space subdivisions elements, etc., until one or more root nodes for the acceleration structure have been identified.

在一个特定方面中，根据本披露的系统和方法使图元与包围盒元素凝聚，以便形成一个加速结构。使用工作空间细分层次确定该加速结构的元素，其中，元素是可容易地寻址或标识的。在一个示例中，可以使用3-D位置和粒度等级对元素进行唯一地定址；换言之，每个3-D空间位置被多个粒度等级中的每个粒度等级下的该细分层次的恰好仅一个元素包含。可以将所构建的一种类型的加速结构(形状和/或元素的相互关系)与如何展现该工作空间细分层次解耦。In a particular aspect, systems and methods according to the present disclosure coalesce primitives and bounding box elements to form an acceleration structure. The elements of the acceleration structure are determined using a workspace subdivision hierarchy, where the elements are easily addressable or identifiable. In one example, elements can be uniquely addressed using a 3-D location and a level of granularity; in other words, each 3-D spatial location is identified by exactly only An element contains. One type of acceleration structure built (shapes and/or interrelationships of elements) can be decoupled from how the workspace subdivision hierarchy is represented.

在另一个方面中，一种用于产生加速结构以便在3-D渲染中使用的方法包括根据一种包括与3-D坐标系的一个或多个轴对准、图元的绝对大小、以及图元的长宽比中的一个或多个的启示法来表征图元流中的每个图元。该方法提供用于为该图元所在的3-D场景确定一个工作细分结构的元素的一个集合。该元素集合可以具有相同形式和/或相同大小，可以从多种形式中选择，并且每个形式具有多个大小确定的细分元素。该图元流中的不同图元可以被不同形式和大小的元素包围。每个元素还可以具有由该元素定义的一个体积内的多个图元的一部分(或全部)。基于该工作细分结构的元素形成一个加速结构的叶节点。每个叶节点包括一个对应的包围体积，该包围体积包围用于该叶节点的工作细分结构的元素内的那个或那些图元。该方法还包括递归地使包围体积凝聚以产生越来越大的包围体积。在每次递归凝聚中，该加速结构的一个元素包括一个包围体积，该包围体积是被凝聚成该元素的每个包围体积的并集。In another aspect, a method for generating an acceleration structure for use in 3-D rendering includes aligning with one or more axes of a 3-D coordinate system, an absolute size of a primitive, and One or more heuristics of the aspect ratio of the primitive to characterize each primitive in the primitive stream. This method provides a set of elements used to determine a work subdivision structure for the 3-D scene in which the primitive resides. The set of elements may have the same form and/or the same size, may be selected from a plurality of forms, and each form has a plurality of sized subdivision elements. Different primitives in this primitive stream can be surrounded by elements of different shapes and sizes. Each element can also have part (or all) of multiple primitives within a volume defined by the element. Elements based on the work breakdown structure form a leaf node of the acceleration structure. Each leaf node includes a corresponding bounding volume that surrounds the primitive or primitives within the elements of the work subdivision structure for that leaf node. The method also includes recursively condensing the bounding volume to produce larger and larger bounding volumes. In each recursive condensation, an element of the acceleration structure includes a bounding volume which is the union of each bounding volume condensed into that element.

另一个方面涉及光子映射和形成加速结构以便在查询映射的光子时使用。在一个示例中，一种方法包括正向地追踪来自3-D场景中的光源的光线以对3-D空间中的对应位置进行标识，一个或多个光子有待沉积在该位置处。在体元层次内选择一个粒度等级，在该粒度等级下存储每个光子。基于该光子的有效半径的指示和距离该光源的距离来选择该粒度等级。在一个示例中，光线微分用于表示该有效半径；执行光子发射的硬件或软件还可以提供关于这种有效半径的元数据。每个光子位于基于该有效半径选择的一个粒度等级的一个体元内，并且该体元包括3-D空间中的用于该光子的标识位置。该光子相对于该体元的数据被高速缓存。响应于逐出一个高速缓存位置的指示，与该位置相关联的光子数据被写出并且用于产生表示所写出的光子数据的一个加速结构的一个元素。在这种情况下，在对光子流进行处理的过程中，可以混合或以另外的方式将光子组合成一个分布、函数或参数化的表示。光子映射的其他方法包括将所有光子映射到层次中的每一级上并且然后选择性地对该层次中的节点进行修剪以产生具有均衡的光子数量的节点，和/或产生一个以适当的精确度表示光子的加速结构。在一些示例中，例如，为一个加速结构中的更大的节点混合光子数据并且针对这些节点的面收集或在一个节点的中心处收集光子数据。Another aspect involves photon mapping and forming acceleration structures for use when interrogating mapped photons. In one example, a method includes forward tracing rays from light sources in a 3-D scene to identify corresponding locations in 3-D space at which one or more photons are to be deposited. Select a level of granularity within the voxel hierarchy at which to store each photon. The granularity level is selected based on an indication of the photon's effective radius and distance from the light source. In one example, a ray differential is used to represent this effective radius; the hardware or software performing the photon emission can also provide metadata about this effective radius. Each photon is located within a voxel of a granularity level selected based on the effective radius, and the voxel includes an identified location in 3-D space for the photon. The data of the photon relative to the voxel is cached. In response to an indication to evict a cache location, photon data associated with the location is written out and used to generate an element of an acceleration structure representing the written out photon data. In this case, during processing of the stream of photons, the photons may be mixed or otherwise combined into a distribution, function or parametric representation. Other methods of photon mapping include mapping all photons to each level in the hierarchy and then selectively pruning the nodes in the hierarchy to produce nodes with a balanced number of photons, and/or to produce a The degree represents the accelerated structure of the photon. In some examples, photon data is mixed for larger nodes in an acceleration structure and collected for the faces of these nodes or collected at the center of a node, for example.

其他需注意的方面在于实践所披露方面的系统可以保持流数据到或离开外部存储器的高带宽链路，而如上所述与处理元素形成一体的相对小的存储器提供用于高速缓存的空间。因此，使图元和加速结构数据流式化，并且不在外部存储器中保持大量的加速结构状态。通常，根据本披露构建的系统可以在O(n log n)的最坏情况构建时间中产生加速结构，其中n表示所处理的图元的数量。在一些实现方式中，可以用最小浮点计算来处理图元，最小浮点计算允许相对小的固定功能硬件实现方式并且因此可以硬件方式成本有效地实现。可以调谐系统和方法以形成特殊种类的加速结构。Another aspect of note is that systems practicing the disclosed aspects can maintain high bandwidth links of streaming data to and from external memory, while the relatively small memory integrated with the processing elements provides room for caching as described above. Thus, primitive and acceleration structure data are streamed, and large amounts of acceleration structure state are not maintained in external memory. In general, systems built according to the present disclosure can produce accelerated structures in a worst-case build time of O(n log n), where n represents the number of primitives processed. In some implementations, primitives can be processed with minimal floating-point computations that allow relatively small fixed-function hardware implementations and thus can be cost-effectively implemented in hardware. Systems and methods can be tuned to form special kinds of accelerating structures.

附图说明Description of drawings

为了清晰性，附图展现了按2-D形状在3-D空间中定义的图元和其他体积元素。然而，这些元素仍然被称为并且作为在3-D空间中定义的元素对待。本领域的普通技术人员从这些披露中将理解到如何在3-D渲染中实施这些技术。For clarity, the figures represent primitives and other volume elements defined by 2-D shapes in 3-D space. However, these elements are still called and treated as elements defined in 3-D space. Those of ordinary skill in the art will understand from these disclosures how to implement these techniques in 3-D rendering.

图1描绘了在其中可以实现本披露的多个方面的示例系统的概览；FIG. 1 depicts an overview of an example system in which aspects of the present disclosure can be implemented;

图2描绘了根据本披露的一种示例方法；Figure 2 depicts an example method according to the present disclosure;

图3描绘了可以用于支持图2中描绘的方法的示例分量方法；FIG. 3 depicts an example component method that may be used to support the method depicted in FIG. 2;

图4A至图4F描绘了3-D场景的多级细分的示例的各个方面，以便用于根据本披露的方法和系统中；4A-4F depict various aspects of an example of multi-level subdivision of a 3-D scene for use in methods and systems according to the present disclosure;

图5描绘了有待为其创建加速结构的图元的示例微型集合；Figure 5 depicts an example micro-collection of primitives for which an acceleration structure is to be created;

图6描绘了映射到选定粒度等级(LOG)下的体元上的图元中的一些图元；Figure 6 depicts some of the primitives mapped to voxels at a selected level of granularity (LOG);

图7描绘了映射到与图6的图元不同的LOG下的体元上的相对大且规则的图元；Figure 7 depicts relatively large and regular primitives mapped to voxels under a different LOG than the primitives of Figure 6;

图8描绘了一个示例，其中图6中出现的图元映射到比图6中的更细粒度LOG下的体元上；Figure 8 depicts an example where primitives appearing in Figure 6 are mapped to voxels under a finer-grained LOG than in Figure 6;

图9针对映射到具体体元上的图元的一部分描绘了一个立方体覆盖体积Figure 9 depicts a cubic coverage volume for a portion of primitives mapped onto concrete voxels

图10针对映射到图9的体元上的图元的一部分描绘了一个轴对准的覆盖体积；FIG. 10 depicts an axis-aligned coverage volume for a portion of a primitive mapped onto the voxel of FIG. 9;

图11描绘了一个示例，其中图10的轴对准的覆盖体积被延长以包括另一个图元；Figure 11 depicts an example where the axis-aligned coverage volume of Figure 10 is extended to include another primitive;

图12描绘了包括来自更细粒度LOG的和来自直接在较粗粒度LOG下的体元中的图元的覆盖体积；Figure 12 depicts a coverage volume including primitives from the finer-grained LOG and from voxels directly under the coarser-grained LOG;

图13描绘了根据本披露的高速缓存的操作的各方面；Figure 13 depicts aspects of the operation of a cache according to the present disclosure;

图14描绘了一个加速结构的逻辑视图，可以使用本披露的实现方式产生该加速结构；Figure 14 depicts a logical view of an acceleration structure that can be generated using implementations of the present disclosure;

图15描绘了可以使用本披露的实现方式产生的加速结构的封装存储器展现；Figure 15 depicts a packaged memory representation of an acceleration structure that can be produced using implementations of the present disclosure;

图16描绘了使用带有光线微分的光线追踪使光子沉积在3-D场景中的示例；Figure 16 depicts an example of photon deposition in a 3-D scene using ray tracing with ray differentiation;

图17描绘了确定用于光子的加速层次的过程；Figure 17 depicts the process of determining the acceleration level for photons;

图18描绘了确定用于光子的加速层次的另一个过程；以及Figure 18 depicts another process for determining the acceleration level for photons; and

图19描绘了一个示例系统，在其中可以提供本披露的实现方式。FIG. 19 depicts an example system in which implementations of the present disclosure may be provided.

具体实施方式detailed description

以下信息涉及一种用于生成加速结构的算法的示例，该加速结构包括包围3-D场景中的表面的元素。在一个特定的示例中，由图元定义这些表面，并且该加速结构用于光线追踪。在另一个示例中，可以创建该加速结构用于在光子查询时使用。The following information relates to an example of an algorithm for generating an acceleration structure comprising elements surrounding a surface in a 3-D scene. In one particular example, the surfaces are defined by primitives, and the acceleration structure is used for ray tracing. In another example, the acceleration structure can be created for use in photon interrogation.

可以根据各种方法创建加速结构。尽管预期加速结构加速光线相交的测试，并且确实达到了该目标，但加速结构的使用在整体渲染系统内产生其他开销。例如，存在为给定场景构建加速结构的开销。Acceleration structures can be created according to various methods. Although the acceleration structure was expected to speed up the testing of ray intersections, and did achieve that goal, the use of the acceleration structure created additional overhead within the overall rendering system. For example, there is the overhead of building an acceleration structure for a given scenario.

可以为静态3-D场景(即，不是动画的3-D场景)设计一些加速结构，并且相反，可以可变地放置视点和该场景中的光。在这种情况下，加速结构的相对仔细且耗时的构建会是合乎情理的。如果在长时期中将不变地使用静态场景(例如，从一个视点上观察有待构建的建筑模型)，则出于这种目的，在该加速结构的质量整体判断中可以忽略构造该加速结构所需的时间。Some acceleration structures can be designed for static 3-D scenes (ie, not animated 3-D scenes), and instead, the viewpoints and lights in the scene can be variably placed. In this case, it would be reasonable to speed up the relatively careful and time-consuming construction of the structure. If a static scene is to be used invariably over a long period of time (for example, a model of a building to be constructed is observed from one point of view), then for this purpose the costs involved in constructing the accelerated structure can be ignored in the overall judgment of the quality of the accelerated structure. required time.

然而，在动态场景中，对象可以进入、离开或在该场景中移动，改变其几何形状等。在这些种类的应用中，构建或更新加速结构的开销的量值可以成为对来自该场景的图像进行渲染所需的计算的一个实质性部分。对于这些更频繁更新的场景(如在动画场景中)而言，或者当可以对该场景作出实时变化时，加速结构的重建或构建时间(以及实现这种构建时间所需的计算机资源的量值)可以是对构建加速结构的给定方法的可用性进行判断一个因素。因此，对于动态应用而言，构建和使用加速结构的方法应将构建(或重建/修改)加速结构所需的时间和资源以及加速结构帮助加速场景的实际光线追踪的程度两者作为因素计入在内。However, in a dynamic scene, objects can enter, leave or move within the scene, change their geometry, etc. In these kinds of applications, the magnitude of the overhead of building or updating the acceleration structures can become a substantial portion of the computation required to render images from the scene. For these more frequently updated scenes (as in animated scenes), or when real-time changes can be made to the scene, speeding up the reconstruction or build time of the structure (and the amount of computer resources required to achieve this build time ) can be a factor in judging the usability of a given method for building an acceleration structure. Therefore, for dynamic applications, the approach to building and using acceleration structures should factor in both the time and resources required to build (or rebuild/modify) the acceleration structure, and the extent to which the acceleration structure helps accelerate the actual ray tracing of the scene inside.

这种情况还考虑了以下一般事实：对于构建加速结构的给定方法而言，就每条光线将需要比较少的相交测试的意义而言，花费更多的计算资源以构建给定加速结构产生了“更好”的加速结构。在光线追踪的背景下，使用加速结构的一个目标是,随着场景变得越来越复杂,对光线相交测试获得更接近恒定的分辨时间。This case also takes into account the general fact that, for a given method of building an acceleration structure, more computational resources are expended to build a given acceleration structure in the sense that each ray will require fewer intersection tests to produce A "better" acceleration structure. In the context of ray tracing, one goal of using an accelerated structure is to obtain a closer to constant resolution time for ray intersection tests as the scene becomes more complex.

可以针对给定的三角形计数以及解析每个光线相交花费的平均时间(在此称为“遍历时间”)基于对结构进行组装所需的时间来判断创建加速结构的途径。测量的示例单位可以分别是“每秒三角形数量”和“每秒光线数量”(可以为特定架构或计算资源的固定和特定集合对此进行分析)。可以有用的是将这些因素进一步表达为“每光线的节点测试数量”和“每光线的三角形测试数量”，这可以使得度量标准更加独立于具体的计算资源。The approach to creating an accelerated structure can be judged based on the time required to assemble the structure for a given triangle count and the average time it takes to resolve each ray intersection (referred to herein as "traversal time"). Example units of measurement could be "triangles per second" and "rays per second" respectively (this could be analyzed for a specific architecture or a fixed and specific set of computing resources). It may be useful to further express these factors as "number of node tests per ray" and "number of triangle tests per ray", which may make the metric more independent of specific computing resources.

给定加速结构构建算法的普通性还会是有意义的。例如，一些种类的加速结构构建算法可以比一些种类的3-D场景的其他算法表现得更好。此外，如果在加速结构中使用特殊种类的形状或程序，一些种类的算法可以产生更好的加速结构。例如，在用于加速结构的许多典型的包围体积中可以不良地包围(例如，太松散)“细长(long-skinny)”和/或离轴的几何形状。对此特征不敏感的形状在遍历过程中与光线相交成本更高并且难于在加速结构组装过程中产生。It would also be interesting given the generality of accelerated structure building algorithms. For example, some kinds of accelerated structure building algorithms may perform better than others for some kinds of 3-D scenes. Also, some kinds of algorithms can produce better accelerated structures if special kinds of shapes or programs are used in the accelerated structure. For example, "long-skinny" and/or off-axis geometries may be poorly enclosed (eg, too loosely) in many typical enclosing volumes for acceleration structures. Shapes that are not sensitive to this feature are more expensive to intersect with rays during traversal and are difficult to generate during accelerated structure assembly.

如果希望将一种给定的加速结构构建方法用在多变的并且经常未知的场景构建条件下，则可以令人希望的是为各种可以具有不同种类的特征的样本场景收集这种度量标准，这些特征可能用不同的方式挑战该加速结构构建算法。更确切地说，一些加速结构方法可以具有有利的遍历时间，但以缓慢的构建时间(许多秒或分钟的量级)、对商业上可行实现方式的不切实际要求，或两者为代价。还可以使场景特定效应变得明显。If it is desired to apply a given accelerated structure-building method to variable and often unknown scene-building conditions, it may be desirable to collect such metrics for a variety of sample scenes, which may have different kinds of characteristics , these features may challenge the accelerated structure-building algorithm in different ways. Rather, some accelerated structural methods may have favorable traversal times at the expense of slow build times (on the order of many seconds or minutes), unrealistic requirements for a commercially viable implementation, or both. Scene-specific effects can also be made apparent.

在一个方面中，本披露涉及多种加速结构构建方法，这些方法在动态场景(如在动画中)或每秒需要更新3-D场景许多次的其他情况中是有用的，但其中计算要求是现实且均衡的(然而，不要求与动态场景一起使用)。在一些方面中，本披露涉及可以基于目标可得计算资源或给定性能目标或两者进行缩放的多种方法。因此，在一些方面中，本披露涉及用于实现均衡系统的方法，其中考虑了加速结构构建时间和光线遍历时间。In one aspect, the present disclosure relates to accelerated structure building methods that are useful in dynamic scenes (such as in animation) or other situations where the 3-D scene needs to be updated many times per second, but where the computational requirements are Realistic and well-balanced (however, not required for use with dynamic scenes). In some aspects, the present disclosure relates to methods that can be scaled based on target available computing resources or given performance goals, or both. Thus, in some aspects, the present disclosure relates to methods for implementing a balanced system that takes into account accelerated structure build time and ray traversal time.

根据本披露的加速结构构建算法的示例特征可以包括以下各项中的任一项：避免递归的流式管线、O(nlogn)最坏情况构建时间，其中n＝三角形计数(triangle_count)。简单且高效的方法确定自底向上构建中的空间邻接。使用与使用简单整数运算并且避免/减少浮点计算的体元化/3D扫描转换类似的方法处理三角形。中间数据传送/带宽使用会受到约束以保持在片上本地存储器的特定量值内。可以确定结果加速结构的质量与可用本地存储器存储之间的折中。可以流式化片外数据传送/带宽使用，如流式化几何形状的读出和流式化加速结构定义数据的写入。Example features of accelerated structure construction algorithms according to the present disclosure may include any of the following: streamlined pipeline avoiding recursion, O(nlogn) worst case construction time, where n=triangle count (triangle_count). Simple and efficient method for determining spatial adjacency in bottom-up construction. Triangles are handled in a similar way to voxelization/3D scan conversion which uses simple integer arithmetic and avoids/reduces floating point calculations. Intermediate data transfers/bandwidth usage can be constrained to stay within a certain amount of on-chip local memory. A tradeoff between the quality of the resulting acceleration structure and the available local memory storage can be determined. Off-chip data transfers/bandwidth usage can be streamed, such as streaming geometry reads and streaming acceleration structure definition data writes.

本披露提供了可以跨细长几何形状放置多个包围体积以收紧几何形状的包围以便减少浪费的光线节点测试的算法。Kd树提供规则结构的示例，并且包围体积层次(BVH)是可能不规则的加速结构的一种分类。加速结构还可以是同类的(其中，给定的元素仅直接包围其他的元素或图元，并且叶节点直接包围所有图元)或者不同类的，其中，任何给定的元素可以直接包围其他元素和图元的组合。The present disclosure provides algorithms that can place multiple bounding volumes across elongated geometries to tighten the geometry's bounding so as to reduce wasted ray node tests. Kd-trees provide examples of regular structures, and bounding volume hierarchies (BVHs) are a classification of possibly irregular acceleration structures. Acceleration structures can also be homogeneous (where a given element directly surrounds only other elements or primitives, and leaf nodes directly surround all primitives) or dissimilar, where any given element can directly surround other elements Combination with primitives.

在以下披露中，可以考虑用顶点定义3-D空间中的一个点，并且该顶点可以用多个值的元组表示。可以通过3个顶点定义一个三角形图元。在三角形网格中，可以在不同图元之间共享多个顶点，而不是通过3个值的独特集合定义每个图元。一个网格还可以包括一个解释规则，如缠绕顺序。例如，三角形条带网格使用一个附加顶点结合前一个三角形的两个之前的顶点来定义多个三角形。因此，对于n个顶点，描述n-2个三角形。“三角形”网格使用3个单独的顶点定义一个三角形。如此，对于n个顶点，描述n/3个三角形(n可被3整除)。可以实现三角形图元的各种其他展现。并且，在各种实现方式中，图元不一定是三角形的而可以由其他种类的表面形成。如此，三角形图元的示例不是限制性的，而是用于清楚且简洁地阐述本披露的各方面。In the following disclosure, a vertex may be considered to define a point in 3-D space, and the vertex may be represented by a tuple of multiple values. A triangle primitive can be defined by 3 vertices. In a triangular mesh, multiple vertices can be shared between different primitives, rather than each primitive being defined by a unique set of 3 values. A mesh can also include an interpretation rule, such as a winding order. For example, a triangle strip mesh defines multiple triangles using an additional vertex combined with two previous vertices of a previous triangle. Thus, for n vertices, n-2 triangles are described. A "triangle" mesh uses 3 individual vertices to define a triangle. Thus, for n vertices, n/3 triangles are described (n is divisible by 3). Various other representations of triangle primitives may be implemented. Also, in various implementations, primitives need not be triangular but can be formed from other kinds of surfaces. As such, the examples of triangle primitives are not limiting, but serve to clearly and concisely illustrate aspects of the present disclosure.

系统和方法可以处理图元的网格(如三角形条带)并且可以输入或以另外的方式访问针对有待处理的网格的顶点连接性规则。可以使用标识符对网格进行标识；指针可以参照对三角形网格进行定义的顶点数据的阵列。如总三角形计数的信息。The systems and methods can process meshes of primitives (eg, triangle strips) and can input or otherwise access vertex connectivity rules for the mesh to be processed. A mesh can be identified using an identifier; a pointer can refer to an array of vertex data that defines the triangle mesh. Information such as the total triangle count.

本披露主要涉及一个示例，其中，固定3-D场景细分(例如，体元格栅)用作工作场景细分，以便将图元流映射到3-D空间的多个部分上。将一个图元集合映射到3-D场景的一部分上之后，从该映射中创建有待在渲染过程中使用的一个加速结构的一个或多个元素。因此，在本披露中，可以产生一个最终加速结构，该最终加速结构具有与用于将图元映射到3-D空间上的场景细分不同的元素。然而，该最终加速结构不一定在特征上与用于映射的3-D场景细分不同。The present disclosure is primarily concerned with an example where a fixed 3-D scene subdivision (eg, voxel grid) is used as a working scene subdivision to map primitive streams onto portions of 3-D space. After mapping a collection of primitives onto a portion of the 3-D scene, one or more elements of an acceleration structure to be used in the rendering process are created from the mapping. Thus, in the present disclosure, a final acceleration structure can be produced that has elements different from the scene subdivision used to map primitives onto 3-D space. However, this final acceleration structure is not necessarily characteristically different from the 3-D scene subdivision used for mapping.

在一个示例中，关于3-D场景中的位置来限定几何形状，如使用3-D世界空间坐标，经常从(0，0，0)位置进行参照。这样，所有几何形状将位于3-D空间中的某个地方，在场景的所有维度中共同地定义该几何形状的范围。3-D场景细分可以与被几何形状占用的空间进行匹配(例如，该场景细分的顶层部分的位置可以与该几何形状对准)。然而，在此处的一种方法中，3-D场景细分的全范围在大小上被四舍五入成2的下一个更高次幂。然后，根据2的幂确定所有场景细分的大小并对其进行放置，并且使用整数为其编索引。In one example, the geometry is defined with respect to a position in the 3-D scene, such as using 3-D world space coordinates, often referenced from a (0,0,0) position. This way, all geometry will lie somewhere in 3-D space, collectively defining the extent of that geometry in all dimensions of the scene. The 3-D scene subdivision can be matched to the space occupied by the geometric shape (eg, the position of the top level part of the scene subdivision can be aligned with the geometric shape). However, in one approach here, the full range of 3-D scene subdivisions is rounded to the next higher power of 2 in size. All scene subdivisions are then sized and placed according to powers of 2 and indexed using integers.

在一个示例中，该3-D场景细分为轴对准立方体积的一个多级集合，为了简洁性，在此每个立方体积被称为一个体元。可以在3-D中将体元类推到像元。多级包括3-D场景中的任一给定点被不同大小的若干体元包围(封闭)，其中不同大小的体元处于多级集合的不同层级中。在此通过粒度等级(LOG)对层次中的每个层级进行标识，在多级集合内引起从较大到较小立方体的渐进。例如，3-D场景的整体可以标识为LOG＝0，并且其可以包含一个单个立方体。在一个示例中，可以发现给定LOG下的多个立方体为(LOG+1)^3，作出1，8，27，64等的渐进。每个LOG下的体元共同地覆盖该3-D场景。在一个示例中，有32个粒度等级，这将允许LOG＝31下的32,768个体元。除非另有说明，否则将图元映射到其上的3-D场景细分可以将空间表示为3-D场景内各分辨率下的分立体元的一个3D格栅集合。In one example, the 3-D scene is subdivided into a multilevel collection of axis-aligned cubic volumes, each of which is referred to herein as a voxel for simplicity. A voxel can be analogized to a pixel in 3-D. Multilevel involves any given point in the 3-D scene being surrounded (enclosed) by several voxels of different sizes, where the voxels of different sizes are in different levels of the multilevel set. Here each level in the hierarchy is identified by a level of granularity (LOG), causing a progression from larger to smaller cubes within the multi-level set. For example, the entirety of a 3-D scene may be identified as LOG=0, and it may contain a single cube. In one example, the number of cubes under a given LOG can be found to be (LOG+1)^3, making progressions of 1, 8, 27, 64, etc. The voxels under each LOG collectively cover the 3-D scene. In one example, there are 32 levels of granularity, which would allow 32,768 voxels at LOG=31. Unless otherwise stated, the 3-D scene subdivision onto which primitives are mapped may represent the space as a 3D grid set of discrete voxels at various resolutions within the 3-D scene.

如将详细解释的，可以使用3-D整数坐标[xi，yi，zi]对给定LOG下的每个体元单独地进行定址。为了充分地限定该多级集合中的任何体元，还必须限定该体元的LOG。限定3-D整数坐标限定了具有所限定整数坐标的所有LOG的一个体元集合。因此，本披露提供了一种3-D场景细分，允许容易地对该3-D场景细分的任何成员进行定址或唯一地进行标识。体元为示例途径，而不是排他性的。更普遍地，可以使用允许用从小到大变化的邻域对3-D空间的细分部分进行定址的任何种类的途径。仍然更普遍地，可以使用允许对3-D空间的细分部分进行定址的任何途径，该3-D空间限定以下各项中的一个或多个：细分部分的形式、细分部分相对于场景坐标轴的定向、以及细分部分的相对大小。例如，如以上举例说明的，3-D空间中的点和一个偏移量集合可以用于描述一个包围体积集合，而不是描述每个体元的一个明确整数坐标集合。鉴于以上内容，关于附图，以下介绍了各种细节和示例。As will be explained in detail, each voxel under a given LOG can be individually addressed using 3-D integer coordinates [xi, yi, zi]. In order to adequately define any voxel in the multilevel set, the LOG of that voxel must also be defined. Defining the 3-D integer coordinates defines a voxel set of all LOGs with the defined integer coordinates. Accordingly, the present disclosure provides a 3-D scene subdivision that allows any member of the 3-D scene subdivision to be easily addressed or uniquely identified. Voxels are exemplary approaches, not exclusive. More generally, any kind of approach that allows subdivisions of 3-D space to be addressed with neighborhoods that vary from small to large can be used. Still more generally, any approach that allows addressing a subdivision of a 3-D space that defines one or more of the following: the form of the subdivision, the relative The orientation of the scene axes, and the relative size of the subdivisions. For example, as exemplified above, points in 3-D space and a set of offsets may be used to describe a set of bounding volumes rather than an explicit set of integer coordinates for each voxel. In view of the above, various details and examples are set forth below with respect to the accompanying drawings.

图1描绘了一个示例系统50。系统50具有一个输入图元流51的分类器52。分类器52为每个处理的图元产生一个体元坐标[XYZ]和一个LOG([XYZ]LOG 59作为示例)。分类器52耦合到扫描转换器54的输出端上，该扫描转换器可以访问表示一个体元坐标系统的数据53。扫描转换器54输出到一个体元高速缓存56。扫描转换器54可以作为一个数字微分分析仪来工作以用于沿着图元的边缘步进并且确定遇到的每个像元是否在该图元内。在一些实现方式中，扫描转换器54可以使用整数来操作以用于定义细分元素的边界；在一些实现方式中，通过增加空间的大小使所有细分元素与整数边界对准，包围元素被放置在该空间中以允许实现上述对准。与传统扫描转换算法(其可以要求该边缘至少横穿多边形的中心)相比之下，此处的扫描转换对落入该图元内的体元的任何部分进行追踪。可以在固定功能硬件中有益地实施扫描转换。可以实施各种途径以提供扫描转换，以便用在本披露的实现方式中。更普遍地，多种实现方式可以提供一种产生完全包含该图元的一个或多个3-D形状的一个集合的描述，并且可以分开地对每个形状进行定址，如以便为每个这种形状建立一个高速缓存条目(以下所述)。FIG. 1 depicts an example system 50 . The system 50 has a classifier 52 that inputs a stream 51 of primitives. Classifier 52 generates a voxel coordinate [XYZ] and a LOG ([XYZ]LOG 59 as an example) for each primitive processed. Classifier 52 is coupled to the output of scan converter 54, which has access to data 53 representing a voxel coordinate system. Scan converter 54 outputs to a voxel cache 56 . Scan converter 54 may operate as a digital differential analyzer for stepping along the edges of the primitives and determining whether each pixel encountered is within the primitive. In some implementations, the scan converter 54 can operate using integers for defining the boundaries of the subdivision elements; in some implementations, by increasing the size of the space to align all subdivision elements with integer boundaries, the surrounding elements are Placed in this space to allow the alignment described above. In contrast to traditional scan conversion algorithms, which may require the edge to traverse at least the center of the polygon, scan conversion here tracks any portion of a voxel that falls within the primitive. Scan conversion can be beneficially implemented in fixed-function hardware. Various approaches can be implemented to provide scan conversion for use in implementations of the present disclosure. More generally, implementations may provide a description that generates a collection of one or more 3-D shapes that completely contain the primitive, and may address each shape separately, such as for each such Create a cache entry for each shape (described below).

体元高速缓存56通过体元高速缓存选择器/输出端58将高速缓存条目输出。体元高速缓存选择器/输出端58可以回收来自一个所逐出的高速缓存条目的数据(如以下解释，被称为Vbnode数据55)。这种数据还转到格式转换器60，该格式转换器产生存储在存储器62中的加速结构元素。Vbnode数据55也可以存储在存储器62中。如果体元高速缓存选择器58不回收，格式转换器可以回收Vbnode数据55。以下更加详细地披露了这些组件的示例操作和使用。Voxel cache 56 outputs cache entries through voxel cache selector/output 58 . Voxel cache selector/output 58 may reclaim data from an evicted cache entry (referred to as Vbnode data 55 as explained below). This data also goes to a format converter 60 which produces accelerated structure elements stored in memory 62 . Vbnode data 55 may also be stored in memory 62 . If the voxel cache selector 58 does not evict, the format converter may evict the Vbnode data 55 . Example operations and uses of these components are disclosed in more detail below.

Vbnode数据定义一个临时数据结构，该临时数据结构表示图元所映射到的空间细分的一个元素。因此，每个Vbnode表示一个空间3D体积。每个Vbnode可以包含指示对其他Vbnode关系的数据。例如，Vbnode可以具有两个链路指针，分别指向一个同层级Vbnode和指向或者一个第一子Vbnode或者一个顶点。就其不描述最后层次中的最终元素的意义而言，Vbnode是瞬态的。相反，在将图元映射到工作空间细分上的过程中创建它们、将其传递到拷贝出阶段以便转换成加速结构的最终元素并且然后不再需要它们。在一些实现方式中，不管总体场景层次最终有多大，对Vbnode进行管理从而使得它们将全部保持在本地存储存储器内。在一个示例中，Vbnode旨在尽可能合理地小以便允许其足够的数量在本地存储中的层次构造过程中是活跃的。在一种实现方式中，每个需要32个字节，允许每64高速缓存行保留2个。The Vbnode data defines a temporary data structure representing one element of the spatial subdivision to which the primitive is mapped. Therefore, each Vbnode represents a spatial 3D volume. Each Vbnode may contain data indicating relationships to other Vbnodes. For example, a Vbnode may have two link pointers, respectively pointing to a same-level Vbnode and pointing to either a first sub-Vbnode or a vertex. A Vbnode is transient in the sense that it does not describe the final element in the final hierarchy. Instead, they are created during the mapping of primitives onto workspace subdivisions, passed to the copy-out stage for conversion into final elements of the accelerated structure and then no longer needed. In some implementations, no matter how large the overall scene hierarchy ends up, the Vbnodes are managed so that they will all remain in local storage memory. In one example, Vbnodes are designed to be as small as reasonably possible to allow a sufficient number of them to be active during hierarchy construction in local storage. In one implementation, each requires 32 bytes, allowing 2 to be reserved for every 64 cache lines.

可以在Vbnode中提供的示例信息包括对该节点是否为一个叶节点的一个指示、对该节点的引用的计数。如果该节点具有零引用，则不需要保持该节点。指示该节点相对于父节点所在的地方(例如，在八叉树实现方式中，Vbnode可以指示该父节点的卦限)。Vbnode还可以将数据携带通过该管线，从而使得下游功能或稍后时间中进行的功能可获得该数据。Example information that may be provided in a Vbnode includes an indication of whether the node is a leaf node, a count of references to the node. If the node has zero references, there is no need to keep the node. Indicates where the node is relative to the parent node (eg, in an octree implementation, Vbnode may indicate the hexagram limit of the parent node). Vbnodes can also carry data through the pipeline, making the data available to downstream functions or functions performed at a later time.

图2描绘了一种示例方法的概览。在120，将图元数据输入到分类器21。在122，确定该图元的特征；在124，使用或分析所确定的特征以便选择该图元被体元包围时所在的粒度等级(LOG)。简言之，更细粒度的LOG致使使用更小的体元，而较粗粒度的LOG致使使用更少的更大体元。分类器21可以根据使用不同种类数据输入的一种或多种启发法来工作。关于图3提供了与图元分类的示例方法相关的进一步细节。Figure 2 depicts an overview of an example method. At 120 , the primitive data is input to the classifier 21 . At 122, characteristics of the primitive are determined; at 124, the determined characteristics are used or analyzed to select a level of granularity (LOG) at which the primitive is surrounded by voxels. In short, a finer-grained LOG results in the use of smaller voxels, while a coarse-grained LOG results in the use of fewer larger voxels. Classifier 21 may work according to one or more heuristics using different kinds of data inputs. Further details related to an example method of primitive classification are provided with respect to FIG. 3 .

在126处，对用于包含所选定LOG下的图元的一部分的每个体元的空间定位器进行标识；本质上，对包含该图元的任一部分的体元的集合进行标识。用于每个体元的空间定位器和LOG确定高速缓存56中将与该图元相关联的条目的集合(见图13)。At 126, a spatial locator is identified for each voxel that contains a portion of the primitive under the selected LOG; essentially, the set of voxels that contain any portion of the primitive is identified. The spatial locator and LOG for each voxel determine the set of entries in cache 56 that will be associated with that primitive (see FIG. 13 ).

在128，并且针对在126处确定的每个体元，计算一个覆盖范围。在一个示例中，用预先选择的形状表示覆盖范围，在所选定的体元内具有多个维度和一个位置，从而使得该覆盖范围仅是一个包围该体元内的所有图元的所有部分的单个连续形状。在一个示例中，覆盖范围可以是立方体形状，或者为轴对准的包围盒，但其不要求所有边具有相等的维度。覆盖形状的其他示例包括面体和球体。可以基于将在最终产生的加速结构中使用的一种形状来选择用于表示覆盖范围的形状。At 128, and for each voxel determined at 126, a coverage is calculated. In one example, the coverage is represented by a pre-selected shape, with multiple dimensions and a position within the selected voxel, such that the coverage is only one that encloses all parts of all primitives within the voxel A single continuous shape of . In one example, the footprint may be cube-shaped, or an axis-aligned bounding box, but it does not require all sides to be of equal dimensions. Other examples of covering shapes include polygons and spheres. The shape used to represent the coverage can be chosen based on the one that will be used in the final resulting acceleration structure.

在130，在128处计算的覆盖范围存储在高速缓存56中。如上所述，针对每个被标识为包含被处理图元的一部分的每个体元，执行126、128和130中的每一个。因此，在处理图元之后，可以用至所处理的图元的链路对一个或多个体元高速缓存条目进行更新，并且将相应地更新与这些体元高速缓存条目中的每个相关联的一个或多个对应覆盖范围。At 130 , the coverage computed at 128 is stored in cache 56 . As described above, each of 126, 128, and 130 is performed for each voxel identified as containing a portion of the primitive being processed. Thus, after a primitive is processed, one or more voxel cache entries may be updated with a link to the processed primitive, and the voxel cache entries associated with each of these voxel cache entries will be updated accordingly. One or more corresponding coverage areas.

在130之后，可以处理进一步的图元。最终，将逐出该高速缓存中的一个体元，因为在大多数实现方式中，被分配用于高速缓存的存储器的量值将小于存储场景的所有图元的数据所需的存储器的量值，从而使得必须实时地创建加速结构的元素并且其在执行所披露方法的系统中是瞬态的。After 130, further primitives can be processed. Eventually, a voxel in the cache will be evicted, because in most implementations, the amount of memory allocated for the cache will be less than the amount of memory required to store data for all primitives of the scene , such that the elements of the acceleration structure must be created in real-time and transient in the system performing the disclosed method.

在142，从所逐出的高速缓存内容中确定加速结构的一个或多个元素。在创建后不能修定或编辑加速结构元素的情况下，更大的高速缓存大小允许创建更好的加速结构。当仅瞬态地保持几何形状和其他内容时，出现这种情况，从而可能使得编辑加速结构节点所需的状态是不可获得的。为了最小优点，针对实际考虑(如引起执行过多计算)，也可以避免这样。At 142, one or more elements of the acceleration structure are determined from the evicted cache contents. A larger cache size allows for the creation of better acceleration structures where the acceleration structure elements cannot be modified or edited after creation. This occurs when geometry and other content is only maintained transiently, potentially making the state required to edit acceleration structure nodes unavailable. This can also be avoided for minimal benefit, for practical considerations such as causing too many calculations to be performed.

向128提供从该高速缓存中逐出的内容的覆盖范围信息，其中，基于所提供的覆盖范围信息和现有的覆盖范围信息为更高层级体元计算覆盖范围。在其他实现方式中，覆盖范围信息也可以继续经历步骤120至126。Coverage information for content evicted from the cache is provided to 128, wherein coverage is calculated for higher level voxels based on the provided coverage information and existing coverage information. In other implementation manners, the coverage information may also continue to go through steps 120 to 126 .

实际上，可以将图元流化，并且当高速缓存已满时，可以从高速缓存中逐出体元，从而使得较粗粒度的体元开始从更细粒度的体元接收覆盖范围。在同类的实现方式中，所有图元被叶节点包围，从而使得较粗粒度体元仅包围叶节点的覆盖范围，而不直接包围图元本身。在多种实现方式中，可以刷新高速缓存56或者该管线以另外的方式运行以在处理覆盖盒之前完成所有图元的处理。在一些实现方式中，高速缓存56可以具有两个独立可控的部分，这些部分分别被分配至叶节点和非叶节点。In practice, primitives can be streamed, and when the cache is full, voxels can be evicted from the cache so that coarser-grained voxels start receiving coverage from finer-grained voxels. In a similar implementation, all primitives are surrounded by leaf nodes, so that the coarser-grained voxels only surround the coverage of the leaf nodes, but do not directly surround the primitives themselves. In various implementations, the cache 56 may be flushed or the pipeline otherwise run to complete processing of all primitives before processing the coverage box. In some implementations, cache 56 may have two independently controllable portions that are allocated to leaf nodes and non-leaf nodes, respectively.

图3描绘了一种对图元进行分类的示例方法，以便选择有待包围的图元所在的LOG。关于图4至图12披露了图2与图3如何与不同种类图元一起表现的示例。FIG. 3 depicts an example method of classifying primitives in order to select the LOG in which the primitives to be surrounded reside. Examples of how FIGS. 2 and 3 behave with different kinds of primitives are disclosed with respect to FIGS. 4 to 12 .

在152，对包围被表征的图元的轴对准盒的表面积进行计算。在154，可以发现被分类图元的绝对大小，并且在156，可以确定该图元的长宽比。该图元的长宽比为长度对宽度比率的计算结果；细三角形具有相对高的长宽比。例如，高的长宽比用作一个指示物，指示该图元可能无法适当利用单个大的包围体积。At 152, the surface area of the axis-aligned box enclosing the primitive being characterized is calculated. At 154, the absolute size of the classified primitive can be found, and at 156, the aspect ratio of the primitive can be determined. The aspect ratio of this primitive is a calculation of the ratio of length to width; thin triangles have relatively tall aspect ratios. For example, a tall aspect ratio is used as an indicator that the primitive may not be able to properly utilize a single large bounding volume.

在160，针对该图元，计算图元表面积对该包围体积的表面积的比率。这种比率用作一种指示，指示该图元关于该包围盒的一个或多个轴(除非变换，否则对应于场景轴)对准的良好程度。例如，三角形通常可以竖直走向，并且因此与包围体积的一个竖直面对准，该竖直面可以是更小的厚度、具有更小的表面积，从而使得图元表面积对这种包围体积的表面积的比率将更大。本计算是确定这种特征的一种示例方法；可以提供其他方式。对于所有这些确定结果或计算结果而言，可以缩放或以另外的方式改变数字，并且它们的确需要将可以外部使用的或可获得的任何数量表示为一个绝对比较。例如，对于轴对准比率，该图元的面积的两倍可以用于计算该比率，同时仍然保持一致的度量标准。类似地，在适当情况下可以清除常量以简化计算。At 160, for the primitive, a ratio of the surface area of the primitive to the surface area of the bounding volume is calculated. This ratio is used as an indication of how well the primitive is aligned with respect to one or more axes of the bounding box (corresponding to scene axes unless transformed). For example, a triangle may typically run vertically, and thus align with a vertical face of the bounding volume, which may be of lesser thickness and have a smaller surface area, making the The surface area ratio will be greater. This calculation is one example method of determining such characteristics; other methods may be provided. As with all such determinations or calculations, the numbers may be scaled or otherwise changed, and they do require expressing any quantity externally usable or available as an absolute comparison. For example, for axis alignment ratios, twice the area of the primitive can be used to calculate the ratio while still maintaining a consistent metric. Similarly, constants can be cleared to simplify calculations where appropriate.

图3的剩余部分提供了对这些数据进行求值以便实现LOG选择的示例。如将变得明显的是，这种求值不需要具有坚实的决定或硬性的决定，而相反可以是启发式的。图3中描绘的求值被显示为全都被并行执行，但那是一个实现方式细节。可以通过测试参数或条件的不同集合来执行调谐。在162，相对高的长宽比在165处引起更细粒度的趋势。在163，具有低的长宽比的相对大(绝对大小)的图元在166处引起更粗粒度的趋势。在164，指示相对高的轴对准的轴对准得分在167处提供更粗粒度的趋势。在170处，基于这些输入中的一个或多个进行粒度等级的选择。如以上介绍的，对该选择的输入通常在多个值的连续统上，并且可以具有相当多的粒度等级(在一个示例中为32)。The remainder of Figure 3 provides an example of evaluating these data to achieve LOG selection. As will become apparent, such evaluations need not have firm or hard decisions, but instead can be heuristic. The evaluations depicted in Figure 3 are shown as all being performed in parallel, but that is an implementation detail. Tuning can be performed by testing different sets of parameters or conditions. At 162, the relatively high aspect ratio induces a more fine-grained tendency at 165. Relatively large (absolute size) primitives with low aspect ratios at 163 cause a coarser grained trend at 166 . At 164 , an axis alignment score indicating a relatively high axis alignment provides a more coarse-grained trend at 167 . At 170, selection of a granularity level is made based on one or more of these inputs. As introduced above, the input to this selection is typically on a continuum of values, and can have quite a few levels of granularity (32 in one example).

进一步地，某些启发法对一些情况而言比其他情况更相关。例如，以上已经介绍了给定体元内的覆盖范围的概念，并且以下将更加详细地对其进行描述。如果用AABB追踪覆盖范围，则轴对准可以是一个被更高加权的输入。如果对球形覆盖范围进行追踪，则可以对长宽比进行更高加权。可以执行自动化调谐算法，该算法将搜索输入空间并且针对不同的加速构建设置确定什么选择标准是合适的；可以执行手动调谐，或者以上的组合。可以向这种过程提供其他输入，并且这些是示例性的。在所有情况下，不需要提供所有的输入。Further, certain heuristics are more relevant for some situations than others. For example, the concept of coverage within a given voxel has been introduced above and will be described in more detail below. If AABB is used to track coverage, axis alignment can be a more heavily weighted input. Aspect ratios can be weighted higher if tracking is done for spherical coverage. An automated tuning algorithm can be performed that will search the input space and determine what selection criteria are appropriate for different accelerated build settings; manual tuning can be performed, or a combination of the above. Other inputs may be provided to such a process, and these are exemplary. In all cases, not all inputs need to be provided.

图4A至图4F开始一个应用以上披露的简单示例。图4A描绘了体元5至9，被提供作为3-D空间细分以便用于构建加速结构。图4A的示例示出了多个体元具有相同的形式并且彼此嵌套的情况。此处，术语“形式”用于描述空间细分中所用的一个或多个体元形状的各方面。例如，如图4A中的2-D中所示，立方体形式在所有三维中提供相等的长度边，而矩形形式具有一个细长的维度。形式还指非立方体形状的长宽比，因为矩形形状可以具有低长宽比的表面(更立方的)和/或具有高长宽比(细长并且没那么立方的)。由于实际上体元在3-D空间中，所以矩形的一些面可以是立方体的，而其他面高度细长。并且，因为可以用不同大小(例如，作为嵌套体元结构，像图4A中所示的那样)提供给定形式的体元，形式还可以指不同的缩放特性，其指示给定体元放大或缩小的程度。在一种方法中，体元的每一维均匀地缩放(例如，缩放2的幂)。在另一种方法中，一维或两维均匀地缩放，并且另一维可以有区别地缩放或根本不缩放。Figures 4A-4F begin a simple example of applying the above disclosure. Figure 4A depicts voxels 5 to 9, provided as 3-D spatial subdivisions for use in building acceleration structures. The example of FIG. 4A shows a case where a plurality of voxels have the same form and are nested within each other. Here, the term "form" is used to describe aspects of the shape of one or more voxels used in spatial subdivision. For example, as shown in 2-D in Figure 4A, the cubic form provides equal length sides in all three dimensions, while the rectangular form has one elongated dimension. Form also refers to the aspect ratio of a non-cubic shape, since a rectangular shape can have a low aspect ratio surface (more cubic) and/or a high aspect ratio (elongated and less cubic). Due to the fact that voxels are in 3-D space, some faces of the rectangle may be cubic, while other faces are highly elongated. And, because voxels of a given form can be provided in different sizes (e.g., as a nested voxel structure, like that shown in FIG. or degree of reduction. In one approach, each dimension of the voxel is scaled uniformly (eg, scaled by a power of 2). In another approach, one or both dimensions are scaled uniformly, and the other dimension may be scaled differentially or not at all.

术语“形式”还可以指3-D空间中的体元的定向差(例如，与3-D坐标空间中的一个平面或另一个平面的对准)。如关于图4B至图4F所示，可以提供对不同维而言具有不同相对大小、定向差等的各种其他形式。总而言之，图4A描绘了一种途径，其中所有体元具有一个立方体体元形式。图4B至图4F描绘了其他形式的体元的多方面，其可以在细分中单独或共同使用。在一个具体方法中，在任何给定空间细分内，可以提供多种体元形式的体元，并且来自可获得体元形式的适当体元的集合可以被选择用于包围一个具体图元。The term "form" may also refer to a difference in orientation of a voxel in 3-D space (eg, alignment to one plane or another in 3-D coordinate space). As shown with respect to FIGS. 4B-4F , various other forms having different relative sizes, orientation differences, etc. for different dimensions may be provided. In summary, Figure 4A depicts an approach in which all voxels have a cubic voxel form. Figures 4B-4F depict aspects of other forms of voxels that may be used individually or together in subdivision. In one particular approach, within any given subdivision of space, voxels in multiple voxel forms may be provided, and a suitable set of voxels from the available voxel forms may be selected for enclosing a particular primitive.

在图4B，示例体元14a和14b在横截面上是矩形的，在横截面平面中具有非立方体横截面。在图4B中，没有描绘深度，并且普通技术人员将从本披露中理解到体元14a和14b的深度为这些体元的形式内的另一个可变参数。例如，该深度可以大致等于、小于或大于该高度。图4B还描绘了矩形体元15a和15b(具有一种形式，其中横截面平面的长宽比高于体元14a和14b)，并且还描绘了体元15a和15b将嵌套像体元14b一样的体元所定义的体积(在这些体积内)，因为将会跨过由该虚线框标识的空间来重复体元14b，在该虚线框中定义体元14ab和和15ab。体元14a和14b之间的定向差示出了给定维度形式的体元还可以具有多个定向差。为了清晰性，在所有这些示例中，对于所描绘的每个体元形式而言(例如体元14b)，正在使用的整个3-D空间可以填充有那种形式的体元，并且再次填充有相同形式的但更小的体元。多种不同大小和定向下的每种体元形式可以填充该3-D空间。In FIG. 4B, example voxels 14a and 14b are rectangular in cross-section, having a non-cubic cross-section in the cross-sectional plane. In Figure 4B, depth is not depicted, and one of ordinary skill will appreciate from this disclosure that the depth of voxels 14a and 14b is another variable parameter within the form of these voxels. For example, the depth can be approximately equal to, less than or greater than the height. Figure 4B also depicts rectangular voxels 15a and 15b (having a form in which the aspect ratio of the cross-sectional plane is higher than voxels 14a and 14b), and also depicts that voxels 15a and 15b will nest like voxels 14b Volumes defined by the same voxels (within these volumes), since voxel 14b will be repeated across the space identified by the dashed box in which voxels 14ab and 15ab are defined. The orientation difference between voxels 14a and 14b shows that a voxel in a given dimensional form can also have multiple orientation differences. For clarity, in all these examples, for each voxel form depicted (e.g., voxel 14b), the entire 3-D space being used can be filled with voxels of that form, and again with the same Formal but smaller voxels. Each voxel form in a variety of different sizes and orientations can fill this 3-D space.

图4C描绘了为3-D细长体元的体元16a和16b，并且虚线横跨体元16a的面，描绘了该面的维度内具有形式16a和16b的体元的缩放，但不是该深度。图4D至图4F描绘了体元17至19，这些体元在3-D空间中具有不同的缩放度和定向。在一种方法中，体元为轴对准的包围盒，与图4A至图4F的示例一致。FIG. 4C depicts voxels 16a and 16b as 3-D elongated voxels, and the dashed line spans the face of voxel 16a, depicting the scaling of voxels of form 16a and 16b within the dimension of the face, but not the voxel. depth. Figures 4D-4F depict voxels 17-19 with different scales and orientations in 3-D space. In one approach, the voxels are axis-aligned bounding boxes, consistent with the examples of FIGS. 4A-4F .

从以上披露中，在一些实施例中应明显的是将3-D场景进行细分从而使得3-D场景中的每个位置(定位)在多个体元内，这些体元可以具有不同形式和/或不同大小中的一个或多个。可以单独地对这些体元中的每个体元进行定址。可以根据预先确定的惯例来构造地址。例如，可以选择体元的一个预先确定的形式集合(例如，立方体、正方形末端的细长矩形、薄而大面积的矩形等等)。这些形式中的每种形式可以被安排成用于在不止一个定向上填充该3-D场景，例如可以平行于XY平面以递增的Z并且在YZ平面中以递增的X堆叠多个薄而大面积的矩形。这样，一个地址包括对体元的形式进行标识、对该形式的定向和对体元的大小以及还有位置进行标识的多个分量。因此，这些披露示出了一个完整的体元集合(可以从该集合中选择一个或多个体元用于包围一个具体的图元)可以由许多不同形式、大小和位置的体元组成，从而允许更好地选择体元来实现更好的包围体积。例如，灯柱和电话线柱可以被相似形式但不同大小的体元包围，而与该灯柱大小大致相同的壁可以被具有相似高度但总体上不同形式的体元包围。From the above disclosure, it should be apparent in some embodiments that a 3-D scene is subdivided such that each position (location) in the 3-D scene is within a plurality of voxels, which may have different forms and /or one or more of different sizes. Each of these voxels can be addressed individually. Addresses may be constructed according to predetermined conventions. For example, a predetermined set of forms of voxels may be selected (eg, cubes, elongated rectangles with square ends, thin rectangles with large areas, etc.). Each of these forms can be arranged to fill the 3-D scene in more than one orientation, for example multiple thin and large can be stacked parallel to the XY plane with increasing Z and in the YZ plane with increasing X area of the rectangle. Thus, an address includes components identifying the form of the voxel, the orientation of the form and identifying the size and also the location of the voxel. Thus, these disclosures show that a complete set of voxels from which one or more voxels can be selected to enclose a particular primitive can be composed of voxels of many different forms, sizes and positions, allowing Better selection of voxels for better bounding volumes. For example, a lamppost and a telephone pole may be surrounded by voxels of similar form but different sizes, while a wall approximately the same size as the lamppost may be surrounded by voxels of similar height but generally different form.

图5描绘了位于3-D空间中的图元10至13，包括将为其创建加速结构的图元。图6描绘了在选定LOG下的体元20至23，其中图元10至12在体元20至23上重叠。图7描绘了较粗粒度LOG下的体元9，在该体元中包含了图元13。图6和图7描绘了多个图元可以完全被包含在一个体元中(如图7中)或者可以被一系列体元包围。图6和图7还描绘了图元可以具有非常不同的形状，具有不同长宽比和轴对准。图6描绘了图元10比图元11更加轴向地对准，并且描绘了图元10和11两者与图元13相比具有相对高的长宽比。根据图3的启发法，图元13可以映射到比图10和11更粗的粒度LOG上。Figure 5 depicts the primitives 10 to 13 in 3-D space, including the primitives for which the acceleration structure will be created. FIG. 6 depicts voxels 20-23 under a selected LOG, where voxels 10-12 are overlaid on voxels 20-23. FIG. 7 depicts a voxel 9 under a coarser-grained LOG, in which a graphic element 13 is included. Figures 6 and 7 illustrate that multiple primitives can be completely contained within a voxel (as in Figure 7) or can be surrounded by a series of voxels. Figures 6 and 7 also depict that primitives can have very different shapes, with different aspect ratios and axis alignments. FIG. 6 depicts that primitive 10 is more axially aligned than primitive 11 , and that both primitives 10 and 11 have a relatively high aspect ratio compared to primitive 13 . According to the heuristic of FIG. 3 , primitive 13 can be mapped to a coarser-grained LOG than those in FIGS. 10 and 11 .

图8描绘了还仍然在更细粒度LOG下的体元25至31，并且这些体元共同地包围图元11。图元11没有图元10那么轴对准，并且因此，可以有益地使用体元25至31的一个更紧的拟合集合，从这些体元中产生加速结构元素。当然，取决于各种情况，还可以选择仍然更细粒度的LOG，这将仍然提供更紧的包围，但以更多的体元元素为代价，这通常与在最终加速结构中具有更多元素相关联，因此在避免对图元而言太松散的包围盒、但不产生太多的全都需要被测试和存储的加速元素之间给出折中的示例。FIG. 8 depicts voxels 25 to 31 , still under the finer-grained LOG, and these voxels collectively surround primitive 11 . Primitive 11 is less axis-aligned than primitive 10 and, therefore, a tighter fitting set of voxels 25 to 31 from which acceleration structural elements can be generated can be beneficially used. Of course, depending on the circumstances, it is also possible to choose a still finer-grained LOG, which will still provide tighter enclosing, but at the cost of more voxel elements, which is usually the same as having more elements in the final acceleration structure associated, thus giving an example of a compromise between avoiding bounding boxes that are too loose for primitives, but not generating too many accelerated elements that all need to be tested and stored.

图9描绘了体元20，其中图元11的一部分在该体元内。图9描绘了一个覆盖范围32，在本示例中其本身是一个立方体(要求所有边缘长度相等)。对于给定的体元而言，“体元覆盖范围”或简单地“覆盖范围”对体元的一部分进行定义，该部分至少部分地包围该体元中的所有几何形状(或者直接地或者并且被更细粒度的体元包围)。在一种示例实现方式中，该体元内的轴对准包围盒可以表示该覆盖范围并且与该体元的缩放成比例地定义。由于可以在一个体元位置内定义覆盖盒，所以可以用大幅度降低的精度展现该覆盖盒。在一个示例中，可以使用每轴8比特。在另一个示例中，覆盖体积为立方体。可以使用其他展现形式，如位置和在相对于该位置的一个或多个方向上的范围；在一些实现方式中，可以用任意复杂形状表示覆盖范围；然而，实现方式从使表现覆盖范围所需的数据量最小化中受益，从而使得可以在固定大小的高速缓存中同时表现更多的体元。Figure 9 depicts a voxel 20 within which a portion of the primitive 11 is located. Figure 9 depicts a coverage area 32, which in this example is itself a cube (requiring all edges to be of equal length). For a given voxel, a "voxel coverage" or simply "coverage" defines a portion of a voxel that at least partially encloses all geometry in that voxel (either directly or and surrounded by finer-grained voxels). In an example implementation, an axis-aligned bounding box within the voxel may represent the coverage and be defined proportionally to the voxel's scaling. Since the coverage box can be defined within a voxel position, it can be rendered with greatly reduced precision. In one example, 8 bits per axis may be used. In another example, the coverage volume is a cube. Other representations may be used, such as a position and extent in one or more directions relative to that position; in some implementations, coverage may be represented by arbitrarily complex shapes; however, implementations may require benefit from the minimization of the amount of data in the voxels, allowing more voxels to be represented simultaneously in a fixed-size cache.

图10描绘了根据图9的一个示例，其中除了用AABB形状而不是立方体来追踪覆盖范围33之外，再次描绘了体元20和图元11。图11描绘了覆盖范围为空间的并集，该空间并集包括一个体元内存在的图元的所有部分；具体地，图元11现在添加到体元20上，并且覆盖范围34被生成作为覆盖范围33的并集，并且也是包围图元10所要求的。Fig. 10 depicts an example according to Fig. 9, again depicting voxel 20 and primitive 11, except that coverage 33 is traced with an AABB shape instead of a cube. Figure 11 depicts the coverage as the spatial union that includes all parts of the primitives present within a voxel; specifically, primitive 11 is now added to voxel 20, and the coverage 34 is generated as The union of coverages 33 and is also required to enclose primitives 10.

图12描绘了覆盖范围和图元如何可以聚合成较粗粒度LOG下的体元。还描绘了覆盖范围32和体元21以及图元10的覆盖范围35。如所描绘的，覆盖范围35和覆盖范围32被添加到体元38上。图元36也直接添加到体元36上。因此，体元38的覆盖范围在有待用于表示该覆盖范围的形状种类所确定的约束下(例如，体元、AABB、球体或其他形状)是覆盖范围35、覆盖范围32和图元36的并集。在本示例中，基于更细粒度体元的覆盖范围并且针对被体元(例如，体元38)直接包围的任何图元(例如，图元36)计算覆盖范围。因此，不直接从本示例中的图元重新计算较粗粒度体元的覆盖范围。Figure 12 depicts how coverage and primitives can be aggregated into voxels under a coarser-grained LOG. Coverage 32 and coverage 35 of voxels 21 and primitives 10 are also depicted. Coverage 35 and coverage 32 are added to voxels 38 as depicted. Primitives 36 are also added directly to voxels 36 . Thus, the coverage of voxel 38 is that of coverage 35, coverage 32, and primitive 36 under constraints determined by the kind of shape to be used to represent the coverage (e.g., voxel, AABB, sphere, or other shape). union. In this example, the coverage is based on a finer-grained voxel coverage and is calculated for any primitive (eg, primitive 36 ) that is directly surrounded by a voxel (eg, voxel 38 ). Therefore, coverage for coarser-grained voxels is not recalculated directly from the primitives in this example.

图13描绘了高速缓存56的示例及其操作。如上所述，高速缓存56存储每个体元的空间位置ID和指示该体元的LOG的数据。图13中描绘了这种情况的示例，如空间位置ID205、207和209。LOG206、208和210分别与空间位置205、207和209相关联。图13描绘了每个空间位置ID还具有一个保持的覆盖范围，该覆盖范围指示该体元内(直接地或间接地被包围)所有图元和图元的各部分的并集。此外，还提供了每个覆盖范围的和正在相对于该空间ID被高速缓存的形状(图元)的标识符。这种信息至少应对这些覆盖范围和/或形状进行标识，但不需要包括它们的定义数据。FIG. 13 depicts an example of cache 56 and its operation. As described above, the cache 56 stores the spatial position ID of each voxel and data indicating the LOG of the voxel. An example of this is depicted in FIG. 13 as spatial location IDs 205 , 207 and 209 . LOGs 206, 208, and 210 are associated with spatial locations 205, 207, and 209, respectively. Figure 13 depicts that each spatial location ID also has a maintained coverage indicating the union of all primitives and parts of primitives within (directly or indirectly surrounded by) the voxel. Additionally, an identifier for each footprint and the shape (primitive) being cached against that space ID is also provided. Such information should at least identify these coverages and/or shapes, but need not include their defining data.

在操作中，使用被输入到高速缓存位置映射器220的空间坐标和LOG(例如，[X，Y，Z]LOG 218)对高速缓存56进行定址。这可以是为了向现有体元高速缓存条目添加覆盖范围或图元和创建新的体元高速缓存条目两者。在该示例中，映射器220使[X，Y，Z]LOG218散列(231)以产生一个用于对候选高速缓存位置(Loc 237-239)进行标识(233)的散列值235；本方面与一个交错因数相关，并且如果该高速缓存允许任何位置存储任何体元，则不需要对候选位置进行标识，并且相反，可以保持一个空闲位置列表。如果该操作是添加体元(240)操作，则确定(245)是否有空闲位置；如果有，则在那里添加该体元；如果没有，则逐出(249)一个现有体元位置以使一个位置空闲。对于向位置添加的操作(241)而言，对该位置中是否仍然有空间做出确定(242)，如果有，则添加图元或覆盖范围，而如果没有，则执行逐出(249)。可以根据启发法确定逐出哪个位置，如使用度量标准、或该位置中的元素的数目、或该位置中的体元的粒度等级与其他位置的平均值相比的差异、或这些的组合，或者另一种认为合适的启发法。与覆盖范围相关联的信息可以包括一个产生该覆盖范围的线程标识符、该覆盖范围从属的体元的地址以及一个至同层级和子节点(如果有的话)的链路。出于应用特定目的，可以提供其他字段用于将数据传送通过高速缓存。In operation, cache 56 is addressed using spatial coordinates and a LOG (eg, [X,Y,Z] LOG 218 ) that are input to cache location mapper 220 . This can be for both adding coverage or primitives to existing voxel cache entries and creating new voxel cache entries. In this example, the mapper 220 hashes (231) the [X, Y, Z] LOG 218 to produce a hash value 235 for identifying (233) the candidate cache locations (Loc 237-239); Aspects are associated with an interleaving factor, and if the cache allows any voxel to be stored at any location, no candidate locations need be identified, and instead a list of free locations can be maintained. If the operation is an add voxel (240) operation, determine (245) whether there is a free location; if so, add the voxel there; if not, evict (249) an existing voxel location so that One slot is free. For adding to a location (241), a determination is made (242) whether there is still space in the location (242), and if so, the primitive or coverage is added, and if not, an eviction is performed (249). Which location to evict can be determined based on heuristics, such as using a metric, or the number of elements in that location, or the difference in the granularity level of voxels in that location compared to the average of other locations, or a combination of these, Or another heuristic as you see fit. Information associated with a coverage may include a thread identifier that generated the coverage, the address of the voxel to which the coverage depends, and a link to peers and child nodes (if any). Additional fields may be provided for passing data through the cache for application specific purposes.

应理解到可以在执行根据本披露的方法(和系统操作)的过程中进行各种变换。没有披露所有可能的变换，因为这些是实现方式的问题并且鉴于这些披露当进行实现时普通技术人员将能够确定是否和何时使用某种变换。以下提供了这种变化的示例。It should be understood that various changes may be made in implementing the methods (and system operations) in accordance with the present disclosure. Not all possible permutations are disclosed, as these are a matter of implementation and one of ordinary skill will be able to determine if and when to use a certain permutation when implementing it given these disclosures. Examples of such variations are provided below.

如在以上示例中介绍的，3-D场景空间被细分成多个轴对准立方体体积的多级集合(如体元层次)。可以使用世界空间中的实数描述输入几何形状；因此，所有输入几何形状可以变换成一种具体LOG(或多种LOG)下的体元格栅内的坐标。此变换可以是缩放和平移成完全包含所有输入场景几何形状的包围立方体。在层次构造开始时，可以假设已知世界空间的轴对准的包围立方体包含所有这种几何形状(如来自渲染器内的在先顶点处理管线级的输出，耦合到此加速结构构建器上。变换之后，可以在本地坐标系中执行所有内部操作。当发生拷贝出时，使用逆变换在来自变换3-D空间中的展现的场景层次中产生对应的世界空间包围体积。为了执行这种逆变，该体元层次可以与可以包括每种LOG的特定数据的数据相关联。在一个示例中，这种数据可以包括：(1)本层次的包围立方体的最小世界空间坐标，(2)本层次的世界空间包围立方体的范围，(3)每LOG缩放信息，(4)场景体元到此LOG下的体元的缩放因数，(5)给定LOG下的体元的世界空间大小，(6)此LOG下的体元覆盖单元的世界空间大小，以及(7)此LOG下的最大可寻址体元。As introduced in the examples above, the 3-D scene space is subdivided into a multi-level collection (eg voxel hierarchy) of axis-aligned cubic volumes. The input geometries can be described using real numbers in world space; thus, all input geometries can be transformed into coordinates within a voxel grid under a particular LOG (or LOGs). This transformation can be scaled and translated to a bounding cube that completely encompasses all input scene geometry. At the beginning of the hierarchy, it can be assumed that the axis-aligned bounding cube of known world space contains all such geometry (as output from previous vertex processing pipeline stages within the renderer, coupled to this acceleration structure builder. After the transformation, all internal operations can be performed in the local coordinate system. When a copy-out occurs, the inverse transformation is used to produce a corresponding world-space bounding volume in the scene hierarchy from the representation in the transformed 3-D space. To perform this inverse Change, the voxel level can be associated with data that can include specific data for each type of LOG. In one example, such data can include: (1) the minimum world space coordinates of the bounding cube at this level, (2) this The range of the cube surrounded by the hierarchical world space, (3) the scaling information per LOG, (4) the scaling factor from the scene voxel to the voxel under this LOG, (5) the world space size of the voxel under the given LOG, ( 6) The world space size of the voxel coverage unit under this LOG, and (7) the largest addressable voxel under this LOG.

一些实现方式也可以提供或允许实现内部变换，这些内部变换为加速结构元素的一个定义的子集提供增强的轴对准，这些元素典型地将包围一个物体。例如，该场景中的物体可以被加速结构层次内的自含式子树包围。可以通过允许任意方式转动该场景物体来将本物体及其子树转动成相对于一个内部有意义的轴集合而轴对准，这意味着不再相对于这些场景轴一致地定义该物体。在加速层次和/或场景几何形状的光线追踪或其他使用过程中，当光线将此子树输入到该内部轴集合中时可以变换该光线(原点、方向和一个或多个裁剪距离)并且测试其在该变化条件下的相交，这允许在更高效的层次中遍历。可以将相交点变换回场景空间，如为了比较碰撞距离，或在着色过程中。Some implementations may also provide or enable internal transformations that provide enhanced axis alignment for a defined subset of acceleration structure elements that will typically surround an object. For example, objects in the scene may be surrounded by self-contained subtrees within the acceleration structure hierarchy. This object and its subtrees can be rotated to be axis-aligned relative to an internally meaningful set of axes by allowing the scene object to be rotated in any way, which means that the object is no longer consistently defined relative to these scene axes. During raytracing or other use of accelerated hierarchies and/or scene geometry, a ray can be transformed (origin, direction, and one or more clipping distances) as it enters this subtree into the set of internal axes and tested Its intersection under that varying condition, which allows for more efficient traversal in the hierarchy. Intersection points can be transformed back to scene space, e.g. for comparing collision distances, or during shading.

图14和图15描绘了从所逐出的体元高速缓存位置的内容中导出的加速结构的多个方面。图14描绘了叶元素277和278，其分别包围图元279-280和图元281-282。这些节点元素进而被较粗粒度元素274和275包围。元素2725包围元素275和图元273。275与274之间的虚线指示同层级关系。图15描绘了来自图14的元素的打包，这可以被提供用于存储在存储器中。加速结构可以存储图元的定义数据的标识符或指针、和/或该加速结构的元素。然而，鉴于顶点可以具有与其相关联的属性的不同量值，图元可能需要更多的存储，并且可能需要大小可变的存储。14 and 15 depict aspects of acceleration structures derived from the contents of evicted voxel cache locations. Figure 14 depicts leaf elements 277 and 278, which surround primitives 279-280 and primitives 281-282, respectively. These node elements are in turn surrounded by coarser grained elements 274 and 275 . Element 2725 surrounds element 275 and primitive 273. The dashed line between 275 and 274 indicates a sibling relationship. Figure 15 depicts packing of elements from Figure 14, which may be provided for storage in memory. An acceleration structure may store identifiers or pointers to definition data for primitives, and/or elements of the acceleration structure. However, primitives may require more storage, and possibly variable-sized storage, given that vertices may have different magnitudes of attributes associated with them.

在一个方面中，以上披露描述了用于为包围3-D场景中的所有几何形状的叶节点的集合确定包围体积的示例系统和方法。这些叶节点可以大小不同、根据图元的一个或多个特征以及可能基于有待在加速结构中使用的一种节点进行选择。可以用递增的抽象等级将包围体积递归地凝聚成越来越大的包围体积。In one aspect, the above disclosure describes example systems and methods for determining a bounding volume for a set of leaf nodes that bounds all geometric shapes in a 3-D scene. These leaf nodes may be of different sizes, selected according to one or more characteristics of the primitive, and possibly based on a type of node to be used in the acceleration structure. Bounding volumes can be recursively condensed into larger and larger bounding volumes with increasing levels of abstraction.

图16开始介绍用于构建用于光子映射的加速结构的途径。图16示出了一个示例3-D场景，其中，灯350和灯349将光发射到3-D空间中。在光子映射中，通过正向追踪来自多个光的光线，光子首先沉积到一个场景中。光子展现出从可从一个空间位置或体积中接收的位置中发出的光的特征，从而使得从一个点取样光子可以提供关于该点的光照信息。Figure 16 begins with an introduction to the approach used to build accelerating structures for photon mapping. Fig. 16 shows an example 3-D scene where lights 350 and 349 emit light into 3-D space. In photon mapping, photons are first deposited into a scene by forward tracing rays from multiple lights. Photons exhibit characteristics of light emitted from a location that can be received from a spatial location or volume, such that sampling photons from a point can provide lighting information about that point.

在本示例中，根据该光的特征向从灯上正向追踪的光线分配一个微分。例如，光线355从灯350发出并且被分配一个微分351。光线357从灯349发出并且被分配一个微分352。发现光线355在相交点360与一个形状相交，而发现光线357在相交点361相交。为了简明性，描绘了一个单个形状，但在实际场景中，相交点经常是宽散开的，并且将从个光投射出许多光线。图16描绘了一条也被分配有一个微分353的反射光线356。这种反射光线356还可以用于确定光子沉积的特征。In this example, the ray traced forward from the light is assigned a differential based on the characteristics of that light. For example, ray 355 emanates from lamp 350 and is assigned a differential 351 . Ray 357 is emitted from lamp 349 and is assigned a differential 352 . Ray 355 was found to intersect a shape at intersection point 360 , while ray 357 was found to intersect at intersection point 361 . For simplicity, a single shape is depicted, but in real scenes the intersections are often widely spread out and many rays will be cast from a single light. FIG. 16 depicts a reflected ray 356 that is also assigned a differential 353 . This reflected light ray 356 can also be used to characterize the photon deposition.

如将描述的，光线微分可以用于确定或以另外的方式追踪光子应如何展现在加速结构中。图17提供了自底向上的光子加速结构构建的示例。在402，光源被标识为光子源(对场景光源进行标识)，并且为了确定每条光的光子，在404，产生多条光线和多个微分。可以通过光源的特征确定每条光线的微分。例如，聚焦光源将接收比散射光源更小的微分。As will be described, ray differentiation can be used to determine or otherwise track how photons should appear in the accelerating structure. Figure 17 provides an example of bottom-up construction of photon-accelerating structures. At 402, a light source is identified as a photon source (identifying a scene light source), and to determine the photons for each light, at 404, multiple rays and multiple derivatives are generated. The differentiation of each ray can be determined by the characteristics of the light source. For example, a focused light source will receive a smaller differential than a diffuse light source.

在406，对所产生的光线进行追踪以对相交进行标识(此处，追踪包括对反射、折射等进行建模的次级光线进行追踪)。在408，针对所检测到的相交点，光子沉积在该3-D场景中的多个位置处。在410，每条光线的微分(现在与特定光子有关联)用于选择将直接存储该光子的体元的粒度等级。如在以上讨论中，存在多个嵌套的体元，每个体元可以包围3-D空间中的一个具体位置。光线微分和到该光线的相交点的距离可以用于确定该光线围绕一个相交点的宽度。所以，例如，在某一距离外相交的宽微分在相交点提供宽光线，而在相同相交距离处具有较窄微分的光线将较狭窄。在相交点具有较宽的光线驱向于使用较粗粒度等级的体元(较大的体元)，而相反地，较小的光线驱向于较细粒度等级(较小的体元)。较小的体元允许比较大的体元更精确地定位光子，但如果该光线不管怎样都是宽的，则该精度是不必要的。因此，在412，选择每个光子的LOG，并且在414，每个光子与所选定LOG确定的体元的高速缓存位置以及相交点的位置相关联。在416，可以按照上述内容管理高速缓存逐出。格式转换也可以类似地进行，在该格式转换中在该加速结构中使用的形状可以不是体元；可以使用上述形状。在一些实现方式中，光线可以与一个提示或其他元数据一起发射，该提示或其他元数据有待用于为由于与该光线相交而沉积的光子选择LOG。这种元数据可以被视为无方向光子规格的效果的半径。At 406, the resulting rays are traced to identify intersections (here, tracing includes tracing secondary rays that model reflection, refraction, etc.). At 408, photons are deposited at locations in the 3-D scene for the detected intersection points. At 410, the differentiation of each ray (now associated with a particular photon) is used to select the level of granularity of the voxel that will directly store that photon. As in the discussion above, there are multiple nested voxels, each of which may enclose a specific location in 3-D space. The ray differential and the distance to the ray's intersection point can be used to determine the ray's width around an intersection point. So, for example, a wide differential that intersects some distance away gives a wide ray at the point of intersection, while a ray with a narrower differential at the same intersection distance will be narrower. Rays with wider points at intersections are driven to use voxels at a coarser grain level (larger voxels), while conversely, rays with smaller grains are driven to use finer grain levels (smaller voxels). Smaller voxels allow more precise positioning of photons than larger voxels, but this precision is unnecessary if the ray is wide anyway. Thus, at 412, a LOG for each photon is selected, and at 414, each photon is associated with the voxel's cache location determined by the selected LOG and the location of the intersection point. At 416, cache evictions can be managed as described above. Format conversion can also be performed similarly, in which the shapes used in the acceleration structure need not be voxels; the above-mentioned shapes can be used. In some implementations, a ray may be fired with a hint or other metadata to be used to select the LOG for the photons deposited due to the intersection with the ray. This metadata can be thought of as the radius of the effect of the directionless photon specification.

光子沉积可以包括使光颜色和强度与一个点或多个点的分布、或多个参数相关联，通过在计算中解释或使用这些参数，这些参数可以用于确定光颜色和/或强度。通过直接指定有待定位于3-D场景中的光子来完成光子沉积，并且提示和其他元数据可以用于选择体元的LOG以包含这些光子。Photon deposition can include correlating light color and intensity with a distribution of a point or points, or parameters that, by interpreting or using these parameters in a calculation, can be used to determine light color and/or intensity. Photon deposition is done by directly specifying the photons to be localized in the 3-D scene, and hints and other metadata can be used to select the voxel's LOG to contain these photons.

在另一个方面中，可以将给定体元内的光子混合在一起(颜色混合)，并且它们的位置全都用该体元内的单个点表示，如中心点。还可以将光子的贡献合并成一个分布，如体元的每个面的高斯瓣分布。本质上，通过将这些光子混合在一起并且不试图区分每个体元内的每个光子的3-D位置，可以节省存储空间。这种节省作为关于精确度的折中，但可以调整实现方式，从而使得实现这些因素的适当均衡。可以与确定向特定体元添加一个给定光子相结合来完成这种混合或合并。可以创建体元的稀疏层次，该稀疏层次包含具有多个光子并且可以用于光子查询的体元。In another aspect, the photons within a given voxel can be mixed together (color mixing) and their positions all represented by a single point within that voxel, such as a center point. It is also possible to combine photon contributions into a distribution, such as a Gaussian lobe distribution for each facet of a voxel. Essentially, storage space is saved by mixing these photons together and not trying to distinguish the 3-D position of each photon within each voxel. This savings comes as a tradeoff with respect to accuracy, but the implementation can be tuned so that an appropriate balance of these factors is achieved. This mixing or merging can be done in conjunction with determining the addition of a given photon to a particular voxel. A sparse hierarchy of voxels can be created that contains voxels that have multiple photons and can be used for photon interrogation.

图18描绘了一个实施光子映射的不同途径的过程。图17描绘了以自底向上的方式处理光子，其中，在多个道次中创建使3-D空间的较大区域抽象化的越来越大的体积。图18描绘了一种不同途径。具体地，在418，对光线进行追踪，并且在420，光子沉积在相交点上。在422，对于定义的全体元结构而言，将光子添加到所有粒度等级下的每个体元中(例如，对于32级体元层次，将相同的光子添加到该体元层次32次，其中，空间坐标相同，但LOG不同)。因此，产生出包含所有粒度等级下的所有光子的体元结构。在这种产生之后，在428，自顶向下遍历该体元结构一系列道次。Figure 18 depicts a process for implementing different approaches to photon mapping. FIG. 17 depicts processing photons in a bottom-up manner, in which increasingly larger volumes abstracting larger regions of 3-D space are created in multiple passes. Figure 18 depicts a different approach. Specifically, at 418, rays are traced, and at 420, photons are deposited on intersection points. At 422, for the defined overall cell structure, photons are added to each voxel at all levels of granularity (e.g., for a voxel level of 32, the same photon is added to the voxel level 32 times, where, same spatial coordinates, but different LOG). Thus, a voxel structure containing all photons at all granularity levels is generated. Following this generation, at 428, the voxel structure is traversed a series of passes top-down.

在430，当在给定LOG下遇到一个体元(“当前体元”)时，访问或产生该体元内存在的光子的计数。如果该计数大于一个阈值，则从该层次清除当前体元，并且当前(现在已清除)体元的子节点直接连接到所清除的体元的父节点上。在432，对于该当前体元的每个子节点而言，并且如果没有清除掉该当前体元(因为其具有比阈值数量光子更少的光子)，则检查该当前体元的每个子体元。对于具有少于(或等于)该阈值数量光子的每个子体元而言，清除该子体元。除非另有说明，否则当子体元具有小于或等于阈值数量的光子时，则可以将该体元的光子推入较低粒度等级的体元(较大体元)内。换言之，图18描绘了一种根据沉积到该层次的不同部分中的光子的数量对一个完整体元层次进行修剪的途径。在一定意义上，这种修剪是针对在不同体元中的多个加速元素与多个光子之间保持平衡。该方法允许在有益的情况下使用较小的体元保持精度(在它们具有相对较高数量的光子的情况下)。例如，透镜可以使光集中并且该集中点将具有一个大的光子密度。因此，由于执行所描述的过程的实现方式，将自然地为那些光子保持小体元。At 430, when a voxel ("current voxel") is encountered at a given LOG, a count of photons present within that voxel is accessed or generated. If the count is greater than a threshold, the current voxel is cleared from the hierarchy, and the child nodes of the current (now cleared) voxel are directly connected to the parent node of the cleared voxel. At 432, for each child node of the current voxel, and if the current voxel is not cleared because it has fewer photons than a threshold number of photons, each child voxel of the current voxel is checked. For each sub-voxel that has less than (or equal to) the threshold number of photons, the sub-voxel is purged. Unless otherwise stated, when a child voxel has less than or equal to a threshold number of photons, then the photons of that voxel may be pushed into voxels of lower granularity levels (larger voxels). In other words, Figure 18 depicts an approach for pruning a complete voxel hierarchy according to the number of photons deposited into different parts of the hierarchy. In a sense, this pruning is aimed at maintaining a balance between multiple accelerating elements and multiple photons in different voxels. This approach allows the use of smaller voxels to preserve accuracy (in cases where they have a relatively high number of photons) where it is beneficial. For example, a lens can concentrate light and that concentrated point will have a large photon density. Therefore, due to the implementation in which the described process is carried out, voxels will naturally be kept small for those photons.

图19描绘了一个示例系统的多个方面，其中，可以实践这些披露的实现方式。图19描绘了一个集群阵列511，该阵列包括核心512-518，其可以执行计算，如图形计算。数据管理504-510的集合可以设置有待在集群511的阵列上执行的计算。集群阵列511可以具有被一个或多个核心512-518使用的纹理管线520和522。某些种类的计算可以使用包单元524，该包单元具有一个就绪堆栈526、一个收集定义/体元高速缓存528、一个空堆栈530和一个打包机532。扫描转换器534可以耦合到包单元524和格式器536上。格式器536进而可以与高速缓存层次542通信，该高速缓存层次与系统内存接口538通信。主机接口502可以通过总线540提供向另一个计算系统的通信能力。Figure 19 depicts aspects of an example system in which the disclosed implementations may be practiced. Figure 19 depicts a cluster array 511 that includes cores 512-518 that can perform computations, such as graphics computations. A collection of data management 504 - 510 may set up computations to be performed on the arrays of cluster 511 . Cluster array 511 may have texture pipelines 520 and 522 used by one or more cores 512-518. Certain kinds of computations can use the pack unit 524 , which has a ready stack 526 , a collection definition/voxel cache 528 , an empty stack 530 and a packer 532 . Scan converter 534 may be coupled to packet unit 524 and formatter 536 . Formatter 536 may in turn communicate with cache hierarchy 542 , which communicates with system memory interface 538 . Host interface 502 may provide communication capabilities to another computing system via bus 540 .

具体关于当前加速结构构建算法，扫描转换器534可以使用收集定义内存528来存储体元(例如，起到图1的高速缓存56的作用)。收集定义内存528以另外的方式起到针对一个或多个密钥来收集数据集合的作用，并且可以将该数据集合分发在多个包内。可以从空堆栈530中检索有待用来自收集内存的数据填充的包槽并将其输入到就绪堆栈526中，该就绪堆栈由调度器540读取，并且这种包的内容分布在该集群阵列511上以便执行。在一个示例中，在包中处理光线，并且收集定义内存528存储多个收集中的光线标识符，这些收集每个与一种形状或多种形状相关联，该一种或多种形状有待测试与每个收集中标识的光线的相交。With particular regard to the current accelerated structure construction algorithm, scan converter 534 may use collection definition memory 528 to store voxels (eg, functioning as cache 56 of FIG. 1 ). Collection definition memory 528 otherwise functions to collect a collection of data for one or more keys, and may distribute this collection of data within multiple packages. Packet slots to be filled with data from collection memory can be retrieved from the empty stack 530 and entered into the ready stack 526, which is read by the scheduler 540 and the contents of such packets distributed across the cluster array 511 up for execution. In one example, rays are processed in packages, and collection definition memory 528 stores ray identifiers in a plurality of collections each associated with a shape or shapes to be tested The intersection with the rays identified in each collection.

加速结构的元素可以用于使更大数量的图元抽象化。示例中没有任何事物应被认为局限于遵循在此概述的操作原则和技术的实现方式、可以解决、应解决的问题、或者对这些披露在将它们外推、抽象或将它们应用在一种不同背景下的有用性的限制，以便解决一种不同的问题或者以另外的方式实现它们。相反，特殊性提供应轻易理解的具体示例，并且普通技术人员将能够采用和从这些披露中学习以便在具体设置中实施它们。以上各披露描述了确定某些条件、特征或值；普通技术人员将从这些披露中理解到这种确定不需要是恰好的，而是对这些情况足够的精确程度。Elements of the acceleration structure can be used to abstract a larger number of primitives. Nothing in the examples should be considered limited to implementations following the principles of operation and techniques outlined herein, the problems that can be solved, should be solved, or the extent to which these disclosures extrapolate, abstract, or apply them in a different Limitations of usefulness in the context of solving a different problem or implementing them in another way. Rather, the specificity provides specific examples that should be readily understood, and one of ordinary skill will be able to adopt and learn from the disclosure to implement them in specific settings. Each of the above disclosures describes determining certain conditions, characteristics or values; those of ordinary skill will appreciate from these disclosures that such determination need not be exact, but with a degree of precision sufficient for the circumstances.

可以提供计算机代码和相关联数据用于实施在此描述的过程的某些部分和其他方面，通过将处理器配置成用于在这种过程及其部分的执行中执行指令来实施。计算机代码可以包括计算机可执行指令，这些计算机可执行指令例如可以是二进制指令、中间格式指令(如汇编语言、固件或源代码)。该代码可以配置或者以另外的方式致使被配置通用计算机、专用计算机或专用处理设备来执行某一功能或一组功能。任何这样的代码可以存储在有形机器可读介质(如固态驱动器、硬盘驱动器、CD-ROM和其他光学存储装置)中，瞬时地存储在易失性存储器(例如DRAM)中，或非瞬时地存储在SRAM中。Computer code and associated data may be provided for implementing certain portions of the processes and other aspects described herein by a processor configured to execute instructions in the execution of such processes and portions thereof. Computer code may include computer-executable instructions, which may be, for example, binary instructions, instructions in an intermediate format such as assembly language, firmware, or source code. The code may configure, or otherwise cause to be configured, a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Any such code may be stored on a tangible machine-readable medium (such as a solid-state drive, hard disk drive, CD-ROM, and other optical storage devices), transiently in volatile memory (such as DRAM), or non-transitory in SRAM.

可以提供各种实现方式，这些实现可方式可以包括互操作硬件、固件和/或软件，该互操作硬件、固件和/或软件也可以体现在各种形状因数和设备的任意一种中，包括膝上型计算机、智能电话、小型个人计算机、个人数字助理等。此处描述的功能性也可以体现在外围设备或附加卡中。通过进一步的示例，这种功能性还可以实现在在单个设备上执行的不同芯片或不同进程之间的电路板上。Various implementations may be provided, which may include interoperable hardware, firmware, and/or software, which may also be embodied in any of a variety of form factors and devices, including Laptops, smart phones, small personal computers, personal digital assistants, and more. The functionality described here may also be embodied in peripheral devices or add-in cards. By way of further example, such functionality may also be implemented on a circuit board between different chips or different processes executing on a single device.

例如，针对根据这些示例的机器可以包括一个使用固定用途的3-D扫描转换器、用于分析图元的多个可编程元件、以及用于高速缓存的多个内存。进一步的机器组件包括多条通信链路。在此描述的系统的实现方式可以是一个更大系统的组件，该更大系统包括其他输入和输出设备，如一个或多个应用处理器、网络接口、磁盘驱动器或固态驱动器、显示器等。For example, machines according to these examples may include a fixed-purpose 3-D scan converter, programmable elements for analyzing primitives, and memories for cache memory. Further machine components include multiple communication links. Implementations of the systems described herein may be components of a larger system including other input and output devices, such as one or more application processors, network interfaces, disk drives or solid state drives, displays, and the like.

在以上示例中，这些加速结构构建组件是产生输出的系统或机器的一部分，这些输出可以用于各种目的，如用于对视频游戏中的图像、或对动画进行渲染，或者用于存储或向用户传输等。In the examples above, these accelerated structure building components are part of a system or machine that produces output that can be used for various purposes, such as for rendering images in a video game, or for animation, or for storing or transmission to users, etc.

尽管使用了各种示例和其他信息来解释所附权利要求书的范围内的多个方面，但不应基于这种示例中的具体特征或安排而隐含对权利要求书的限制，因为普通技术人员将能够使用这些示例导出很多种实现方式。进一步地，并且尽管以结构特征和/或方法步骤的示例专用的语言描述了某个主题，但应理解，所附权利要求书中限定的主题不一定局限于这些描述的特征或动作。可以用线性流程方式描述示例过程，并且描述了多个过程部分的序列，但这种描述是为了方便，并且可以并行地或以不同的顺序执行某些这种部分。此外，多个线程可以执行每个过程部分，并且不同线程可以被分配用于执行所描述过程的不同部分。此外，功能性可以分布在与除了在此标识的组件以外的组件、附加的组件或更少的组件中，或在与除了在此标识的组件以外的组件、附加的组件或更少的组件中执行。相反，公开了所述特征和步骤作为所附权利要求书的范围内的系统和方法的组件的例子。While various examples and other information have been used to explain aspects within the scope of the appended claims, no limitations should be implied to the claims based on specific features or arrangements in such examples because ordinary skill One will be able to derive many implementations using these examples. Further, and although certain subject matter has been described in language specific to examples of structural features and/or methodological steps, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to these described features or acts. Example processes may be described in a linear flow fashion, and sequences of various process portions are described, but such description is for convenience and some such portions may be performed in parallel or in a different order. Furthermore, multiple threads may execute each process portion, and different threads may be assigned to execute different portions of the described process. Furthermore, functionality may be distributed in, or in, other than, additional, or fewer components than those identified herein implement. Rather, the described features and steps are disclosed as example components of systems and methods within the scope of the appended claims.

Claims

1. A machine-implemented method for generating an accelerated structure for use in rendering a computer graphics image from a three-dimensional 3-D scene, comprising:

mapping a set of primitives to nodes of a workspace subdivision of the 3-D scene, each node of the workspace subdivision surrounding a 3-D volume within the 3-D scene;

Each of the nodes is represented by a corresponding temporary data structure stored on a machine-readable medium, the data in the temporary data structure including data identifying one or more of the following: a parent node, a child node, and a primitive in the set of primitives mapped to the node;

Defining elements of a hierarchical acceleration structure from an input comprising data from a selection made from the temporary data structure, the definition comprising subdivisions from the workspace corresponding to the selected temporary data structure the 3-D volume within the node determines a respective 3-D volume for each of the elements of the hierarchical acceleration structure; and

A hierarchical acceleration structure is generated in a machine-readable medium using the determined 3-D volume.

2. The machine-implemented method of claim 1, wherein said representing each of said nodes comprises providing data describing a subsection of said 3-D volume, said 3-D volume surrounding a map mapped to all parts in said primitive of the node, and said defining includes aggregating the respective subparts of each of the 3-D volumes in the 3-D volume surrounded by said node corresponding to the selected temporary data structure , the aggregation defines the respective 3-D volumes of the hierarchical acceleration structure defined for the selection of the temporary data structure.

3. The machine-implemented method of claim 1 , wherein the defining is performed based on a subset of the nodes of the temporary data structure for which data is stored in a cache .

4. The machine-implemented method of claim 1, wherein each of the nodes of the workspace subdivision is represented by a set of planes defining a bounding box.

5. The machine-implemented method of claim 4, wherein the 3-D volume within the node represented by the selected temporary data structure is represented by data locating a corresponding subvolume within each of the nodes to define.

6. The machine-implemented method of claim 5 , wherein locating the data of a subvolume within each of the nodes comprises locating the node relative to the node's position in the 3-D scene. The subvolume's data.

7. The machine-implemented method of claim 1 , wherein said defining comprises defining each element of said hierarchical acceleration structure as an axis-aligned bounding box, said axis-aligned bounding box being defined as being A union of axis-aligned bounding boxes positioned within nodes corresponding to said selection of said temporary data structure.

8. A machine-implemented method for generating an accelerated structure for use in rendering a computer graphics image from a three-dimensional 3-D scene, comprising:

mapping a primitive of the plurality of primitives for which a hierarchical acceleration structure is to be created into a workspace subdivision of the 3-D scene, the mapping comprising determining to collectively enclose the primitive in 3-D space A corresponding collection of one or more 3-D volumes of ;

forming, in the machine-readable medium, a hierarchical acceleration structure from the set of 3-D volumes determined for the mapped primitives, wherein the forming includes:

determining a set of leaf elements, each leaf element having corresponding data describing a bounding volume that is a 3-D volume within a selection of one or more 3-D volumes from the corresponding set of determined 3-D volumes Union of D spaces;

determining a hierarchy of non-leaf elements that respectively define a bounding volume including a 3-D volume within each leaf element that is a child of the non-leaf element, and that the non-leaf elements to is contained within the bounding volume of the non-leaf element.

9. The machine-implemented method of claim 8, further comprising determining respective relative sizes of the one or more 3-D volumes in the set of 3-D volumes based on characteristics of the primitives and location.

10. The machine-implemented method of claim 8 , wherein the heuristic comprises evaluating a surface area indicative of the primitive compared to a corresponding surface area of a candidate set of 3-D elements to which the primitive can be mapped. The ratio.

11. The machine-implemented method of claim 8, wherein each primitive is mapped to one or more volume elements, each volume element identifiable by a unique address indicating the location of the volume element.

12. The machine-implemented method of claim 8, wherein the heuristic comprises evaluating a size of the primitive relative to a dimension of the 3-D scene.

13. The machine-implemented method of claim 8, wherein the heuristic includes evaluating an aspect ratio of the primitive.

14. The machine-implemented method of claim 8, wherein the mapping includes determining a granularity of segmentation to be performed on the primitive, and then determining the set of 3-D volumes at the granularity of segmentation.

15. The machine-implemented method of claim 8 , further comprising tracking a 3-D overlay defined as a bounding volume of a predetermined shape, the predetermined shape containing the 3-D overlay dependent, positioned in the map Part of all primitives within the node determined during the period.

16. The machine-implemented method of claim 15, wherein the predetermined shape is selected according to a type of shape for an element of the hierarchical acceleration structure.

17. The machine-implemented method of claim 8, wherein said determining said hierarchy of non-leaf elements comprises maintaining 3-D coverage for multiple regions within said 3-D scene, and completing and start processing another region of the 3-D overlay hierarchically comprising the completed region of the hierarchical acceleration structure.

18. An apparatus for generating an acceleration structure for use in rendering a 3-D scene, comprising:

a streaming input configured to receive a definition of primitives, each primitive being located in the 3-D scene, and to generate a hierarchical acceleration structure containing the primitives for the 3-D scene;

a mapper configured to map each primitive to a corresponding set of temporary 3-D scene subdivision elements determined by applying heuristics to different candidate scene subdivisions to which the primitive can be mapped;

a cache coupled to the mapper and capable of storing data associated with the primitive, the data identifying fewer than all temporary 3-D scene tessellation elements of the temporary 3-D scene tessellation element a subset of sub-elements; and

a hierarchy builder configured to operate on the subset of cached temporary 3-D scene subdivision elements to produce a portion of the hierarchy acceleration structure, and write to system memory the portion defining the hierarchy acceleration structure and wherein the cache is configured to evict at least a portion of the data defining the subset of cached temporary 3-D scene subdivision elements, all of the cached temporary 3-D scene subdivision elements The subset is used in defining the portion of the hierarchical acceleration structure written to the system memory.

19. The system of claim 18 , further comprising a coverage module configured to determine a subvolume within each temporary scene subdivision element for each corresponding set of primitives, each subvolume enclosing the temporary scene subdivision The part of this primitive in the element.

20. The system of claim 18, wherein the overlay module is further configured to generate data defining each subvolume, the data positioning the subvolume relative to the position of a temporary 3-D scene subdivision element.

21. The system of claim 18, wherein the heuristic comprises evaluating a ratio indicating a surface area of the primitive compared to a surface area of a candidate set of bounding boxes to which the primitive may be mapped.

22. The system of claim 20, wherein the subvolume is defined according to a predetermined shape, the predetermined shape being the shape of an element used in the hierarchical acceleration structure, and the predetermined shape can be independent of is determined by defining the shape of the 3-D volume in the workspace subdivision.