CN115298686B - System and method for efficient multi-GPU rendering of geometry by pre-testing for interleaved screen areas prior to rendering

Info

Publication number: CN115298686B
Application number: CN202180023019.6A
Authority: CN
Inventors: M.E.塞尔尼; F.斯特劳斯; T.伯格霍夫
Original assignee: Sony Interactive Entertainment Inc
Current assignee: Sony Interactive Entertainment Inc
Priority date: 2020-02-03
Filing date: 2021-02-01
Publication date: 2023-10-17
Anticipated expiration: 2041-02-01
Also published as: JP2023144060A; WO2021158483A8; JP2024091921A; CN115298686A; WO2021158483A1; JP7481556B2; JP7564399B2; JP7334358B2; JP2023505607A; JP7702056B2; EP4100922A1; JP2024178419A

Abstract

A method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for rendering geometry of the graphics between the plurality of GPUs based on a plurality of interleaved screen regions, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The method includes assigning geometry of an image frame generated by the application to a GPU for geometry pre-testing. The method includes performing the geometry pre-test at the GPU to generate information about the geometry and its relationship to each of the plurality of screen regions. The method includes using the information at each of the plurality of GPUs when rendering the image frame.

Description

System and method for efficient multi-GPU rendering of geometry by pre-testing against interleaved screen areas before rendering

Technical field

The present disclosure relates to graphics processing, and more specifically to multi-GPU collaboration when rendering an image for an application.

Background

In recent years there has been a continual push for online services that allow online or cloud gaming in a streaming format between a cloud gaming server and a client connected through a network. The streaming format is increasingly popular because of the on-demand availability of game titles, the ability to execute more complex games, the ability to network between players for multi-player gaming, sharing of assets between players, sharing of instant experiences between players and/or spectators, allowing friends to watch a friend play a video game, having a friend join a friend's ongoing game play, and the like.

A cloud gaming server may be configured to provide resources to one or more clients and/or applications. That is, the cloud gaming server may be configured with resources capable of high throughput. For example, there is a limit to the performance that a single graphics processing unit (GPU) can attain. To render even more complex scenes, or to use even more complex algorithms (e.g., materials, lighting, etc.) when generating a scene, it may be desirable to use multiple GPUs to render a single image. However, equal usage of those GPUs is difficult to achieve. Further, even when multiple GPUs process an image for an application using traditional techniques, they cannot support a corresponding increase in both screen pixel count and density of geometry (e.g., four GPUs cannot write four times the pixels and/or process four times the vertices or primitives for an image).

It is in this context that embodiments of the present disclosure arise.

Summary of the invention

Embodiments of the present disclosure relate to using multiple GPUs in collaboration to render a single image, such as multi-GPU rendering of geometry for an application by pre-testing the geometry against potentially interleaved screen areas before rendering.

Embodiments of the present disclosure disclose a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for rendering geometry of the graphics between the plurality of GPUs based on a plurality of screen areas, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The screen areas are interleaved. The method includes assigning a plurality of pieces of geometry of an image frame to the plurality of GPUs for geometry testing. The method includes assigning geometry of an image frame generated by the application to a GPU for geometry testing. The method includes performing the geometry test at the GPU to generate information about the geometry and its relationship to each of the plurality of screen areas. The method includes using the information at each of the plurality of GPUs to render the geometry, where using the information may include, for example, skipping rendering entirely if it has been determined that the geometry does not overlap any screen area assigned to a given GPU.

In another embodiment, a non-transitory computer-readable medium for performing a method is disclosed. The computer-readable medium includes program instructions for rendering graphics for an application using a plurality of graphics processing units (GPUs). The computer-readable medium includes program instructions for dividing responsibility for rendering geometry of the graphics between the plurality of GPUs based on a plurality of screen areas, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs, wherein the screen areas of the plurality of screen areas are interleaved. The computer-readable medium includes program instructions for assigning geometry of an image frame generated by the application to a GPU for geometry pre-testing. The computer-readable medium includes program instructions for performing the geometry pre-test at the GPU to generate information about the geometry and its relationship to each of the plurality of screen areas. The computer-readable medium includes program instructions for using the information at each of the plurality of GPUs when rendering the image frame.

In another embodiment, a computer system is disclosed, including a processor and memory coupled to the processor and having stored therein instructions that, if executed by the computer system, cause the computer system to execute a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for rendering geometry of the graphics between the plurality of GPUs based on a plurality of screen areas, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs, wherein the screen areas of the plurality of screen areas are interleaved. The method includes assigning geometry of an image frame generated by the application to a GPU for geometry pre-testing. The method includes performing the geometry pre-test at the GPU to generate information about the geometry and its relationship to each of the plurality of screen areas. The method includes using the information at each of the plurality of GPUs when rendering the image frame.
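By way of non-limiting illustration, the pre-test against interleaved screen areas described in the preceding embodiments might be sketched in Python as follows. This is only a sketch under stated assumptions: the screen is taken to be tiled into square regions of `region_size` pixels, ownership is assumed to follow the pattern `(col + row) % num_gpus`, and geometry is approximated by an axis-aligned screen-space bounding box. The names `Rect`, `region_owner`, `gpus_overlapped`, and `should_render` are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Screen-space bounding box in pixels; x1 and y1 are exclusive."""
    x0: int
    y0: int
    x1: int
    y1: int


def region_owner(col: int, row: int, num_gpus: int) -> int:
    # Assumed interleaving pattern (not taken from the disclosure):
    # ownership cycles through the GPUs along a row and shifts by one per row.
    return (col + row) % num_gpus


def gpus_overlapped(bbox: Rect, region_size: int, num_gpus: int) -> set:
    """Pre-test: return the ids of every GPU owning a region the box touches."""
    owners = set()
    first_row, last_row = bbox.y0 // region_size, (bbox.y1 - 1) // region_size
    first_col, last_col = bbox.x0 // region_size, (bbox.x1 - 1) // region_size
    for row in range(first_row, last_row + 1):
        for col in range(first_col, last_col + 1):
            owners.add(region_owner(col, row, num_gpus))
    return owners


def should_render(gpu_id: int, owners: set) -> bool:
    # A GPU may skip the geometry entirely when none of its regions overlap it.
    return gpu_id in owners
```

Under these assumptions, a small object that falls inside a single region is rendered by one GPU and skipped by the others, while an object spanning many regions is rendered by every GPU.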

Embodiments of the present disclosure disclose a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for rendering geometry of the graphics between the plurality of GPUs based on a plurality of screen areas, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The method includes performing, at a pre-test GPU, geometry testing on a plurality of pieces of geometry of an image frame generated by the application, to generate information about each piece of geometry and its relationship to each of the plurality of screen areas. The method includes rendering the plurality of pieces of geometry at each of the plurality of GPUs using the information generated for each of them, where using the information includes, for example, skipping rendering entirely if it has been determined that the geometry does not overlap any screen area assigned to a given GPU.

In another embodiment, a non-transitory computer-readable medium for performing a method is disclosed. The computer-readable medium includes program instructions for rendering graphics for an application using a plurality of graphics processing units (GPUs). The computer-readable medium includes program instructions for dividing responsibility for rendering geometry of the graphics between the plurality of GPUs based on a plurality of screen areas, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The computer-readable medium includes program instructions for performing, at a pre-test GPU, geometry testing on a plurality of pieces of geometry of an image frame generated by the application, to generate information about each piece of geometry and its relationship to each of the plurality of screen areas. The computer-readable medium includes program instructions for rendering the plurality of pieces of geometry at each of the plurality of GPUs using the information generated for each of them, where using the information includes, for example, skipping rendering entirely if it has been determined that the geometry does not overlap any screen area assigned to a given GPU.

In another embodiment, a computer system is disclosed, including a processor and memory coupled to the processor and having stored therein instructions that, if executed by the computer system, cause the computer system to execute a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for rendering geometry of the graphics between the plurality of GPUs based on a plurality of screen areas, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The method includes performing, at a pre-test GPU, geometry testing on a plurality of pieces of geometry of an image frame generated by the application, to generate information about each piece of geometry and its relationship to each of the plurality of screen areas. The method includes rendering the plurality of pieces of geometry at each of the plurality of GPUs using the information generated for each of them, where using the information includes, for example, skipping rendering entirely if it has been determined that the geometry does not overlap any screen area assigned to a given GPU.
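The pre-test GPU variant above can be pictured as a single pass that produces a shared table, which every rendering GPU then consults. The Python sketch below is illustrative only and rests on assumptions that are not part of the disclosure: the information is encoded as one bitmask per piece of geometry (bit `g` set means GPU `g` owns an overlapped region), regions are squares of `region_size` pixels, and ownership follows `(col + row) % num_gpus`. The function names `build_overlap_table` and `render_list` are hypothetical.

```python
def build_overlap_table(geometries: dict, region_size: int, num_gpus: int) -> dict:
    """Run on the (hypothetical) pre-test GPU.

    geometries maps a name to a screen-space box (x0, y0, x1, y1) with
    exclusive x1/y1. Returns name -> bitmask of GPUs whose regions overlap.
    """
    table = {}
    for name, (x0, y0, x1, y1) in geometries.items():
        mask = 0
        for row in range(y0 // region_size, (y1 - 1) // region_size + 1):
            for col in range(x0 // region_size, (x1 - 1) // region_size + 1):
                # Assumed interleaving pattern; sets the owning GPU's bit.
                mask |= 1 << ((col + row) % num_gpus)
        table[name] = mask
    return table


def render_list(table: dict, gpu_id: int) -> list:
    """Run on each rendering GPU: keep only geometry whose bit is set."""
    return [name for name, mask in table.items() if mask & (1 << gpu_id)]
```

In this sketch each rendering GPU does no overlap work of its own; it only reads the table, so geometry with a clear bit is skipped without being processed.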

Embodiments of the present disclosure disclose a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for rendering geometry of the graphics between the plurality of GPUs based on a plurality of screen areas, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The method includes rendering a first plurality of pieces of geometry at the plurality of GPUs during a rendering phase of a previous image frame generated by the application. The method includes generating statistics for the rendering of the previous image frame. The method includes assigning, based on the statistics, a second plurality of pieces of geometry of a current image frame generated by the application to the plurality of GPUs for geometry testing. The method includes performing the geometry testing on the second plurality of pieces of geometry of the current image frame to generate information about each piece of the second plurality and its relationship to each of the plurality of screen areas, wherein the geometry testing is performed at each of the plurality of GPUs based on the assignment. The method includes rendering the second plurality of pieces of geometry at each of the plurality of GPUs using the information generated for each of them, where using the information may include, for example, skipping rendering entirely if it has been determined that the geometry does not overlap any screen area assigned to a given GPU.

In another embodiment, a non-transitory computer-readable medium for performing a method is disclosed. The computer-readable medium includes program instructions for rendering graphics for an application using a plurality of graphics processing units (GPUs). The computer-readable medium includes program instructions for dividing responsibility for rendering geometry of the graphics between the plurality of GPUs based on a plurality of screen areas, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The computer-readable medium includes program instructions for rendering a first plurality of pieces of geometry at the plurality of GPUs during a rendering phase of a previous image frame generated by the application. The computer-readable medium includes program instructions for generating statistics for the rendering of the previous image frame. The computer-readable medium includes program instructions for assigning, based on the statistics, a second plurality of pieces of geometry of a current image frame generated by the application to the plurality of GPUs for geometry testing. The computer-readable medium includes program instructions for performing the geometry testing on the second plurality of pieces of geometry of the current image frame to generate information about each piece of the second plurality and its relationship to each of the plurality of screen areas, wherein the geometry testing is performed at each of the plurality of GPUs based on the assignment. The computer-readable medium includes program instructions for rendering the second plurality of pieces of geometry at each of the plurality of GPUs using the information generated for each of them, where using the information may include, for example, skipping rendering entirely if it has been determined that the geometry does not overlap any screen area assigned to a given GPU.

In another embodiment, a computer system is disclosed, including a processor and memory coupled to the processor and having stored therein instructions that, if executed by the computer system, cause the computer system to execute a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for rendering geometry of the graphics between the plurality of GPUs based on a plurality of screen areas, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The method includes rendering a first plurality of pieces of geometry at the plurality of GPUs during a rendering phase of a previous image frame generated by the application. The method includes generating statistics for the rendering of the previous image frame. The method includes assigning, based on the statistics, a second plurality of pieces of geometry of a current image frame generated by the application to the plurality of GPUs for geometry testing. The method includes performing the geometry testing on the second plurality of pieces of geometry of the current image frame to generate information about each piece of the second plurality and its relationship to each of the plurality of screen areas, wherein the geometry testing is performed at each of the plurality of GPUs based on the assignment. The method includes rendering the second plurality of pieces of geometry at each of the plurality of GPUs using the information generated for each of them, where using the information may include, for example, skipping rendering entirely if it has been determined that the geometry does not overlap any screen area assigned to a given GPU.
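The embodiments above do not state which statistics are gathered or how they drive the assignment, so the following Python sketch makes its own assumptions purely for illustration: the statistic is taken to be each GPU's render time for the previous frame, each piece of geometry is given an estimated pre-test cost, and a greedy least-loaded policy hands each pre-test to whichever GPU currently has the lowest projected load. The names `assign_pretests`, `prev_render_ms`, and `test_cost_ms` are hypothetical.

```python
import heapq


def assign_pretests(prev_render_ms: list, test_cost_ms: dict) -> dict:
    """Assign each geometry's pre-test to a GPU using previous-frame statistics.

    prev_render_ms[g] is GPU g's render time for the previous frame (assumed
    statistic). test_cost_ms maps a geometry name to an estimated test cost.
    Returns geometry name -> GPU id.
    """
    # Min-heap of (projected load, gpu id); the least-loaded GPU pops first.
    heap = [(load, gpu) for gpu, load in enumerate(prev_render_ms)]
    heapq.heapify(heap)
    assignment = {}
    # Place the most expensive tests first (a common greedy heuristic).
    for name, cost in sorted(test_cost_ms.items(), key=lambda kv: -kv[1]):
        load, gpu = heapq.heappop(heap)
        assignment[name] = gpu
        heapq.heappush(heap, (load + cost, gpu))
    return assignment
```

With this policy a GPU that finished the previous frame early receives more of the pre-testing work, which is one plausible way to even out the load between GPUs; other statistics or policies would fit the same description.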

Embodiments of the present disclosure disclose a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for rendering geometry of the graphics between the plurality of GPUs based on a plurality of screen areas, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The method includes assigning a plurality of pieces of geometry of an image frame to the plurality of GPUs for geometry testing. The method includes setting a first state that configures one or more shaders to perform the geometry testing. The method includes performing the geometry testing on the plurality of pieces of geometry at the plurality of GPUs to generate information about each piece of geometry and its relationship to each of the plurality of screen areas. The method includes setting a second state that configures the one or more shaders to perform rendering. The method includes rendering the plurality of pieces of geometry at each of the plurality of GPUs using the information generated for each of them, where using the information includes, for example, skipping rendering entirely if it has been determined that the geometry does not overlap any screen area assigned to a given GPU.

In another embodiment, a non-transitory computer-readable medium for performing a method is disclosed. The computer-readable medium includes program instructions for rendering graphics for an application using a plurality of graphics processing units (GPUs). The computer-readable medium includes program instructions for dividing responsibility for rendering geometry of the graphics between the plurality of GPUs based on a plurality of screen areas, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The computer-readable medium includes program instructions for assigning a plurality of pieces of geometry of an image frame to the plurality of GPUs for geometry testing. The computer-readable medium includes program instructions for setting a first state that configures one or more shaders to perform the geometry testing. The computer-readable medium includes program instructions for performing the geometry testing on the plurality of pieces of geometry at the plurality of GPUs to generate information about each piece of geometry and its relationship to each of the plurality of screen areas. The computer-readable medium includes program instructions for setting a second state that configures the one or more shaders to perform rendering. The computer-readable medium includes program instructions for rendering the plurality of pieces of geometry at each of the plurality of GPUs using the information generated for each of them, where using the information includes, for example, skipping rendering entirely if it has been determined that the geometry does not overlap any screen area assigned to a given GPU.

In another embodiment, a computer system is disclosed, including a processor and memory coupled to the processor and having stored therein instructions that, if executed by the computer system, cause the computer system to execute a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for rendering geometry of the graphics between the plurality of GPUs based on a plurality of screen areas, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The method includes assigning a plurality of pieces of geometry of an image frame to the plurality of GPUs for geometry testing. The method includes setting a first state that configures one or more shaders to perform the geometry testing. The method includes performing the geometry testing on the plurality of pieces of geometry at the plurality of GPUs to generate information about each piece of geometry and its relationship to each of the plurality of screen areas. The method includes setting a second state that configures the one or more shaders to perform rendering. The method includes rendering the plurality of pieces of geometry at each of the plurality of GPUs using the information generated for each of them, where using the information includes, for example, skipping rendering entirely if it has been determined that the geometry does not overlap any screen area assigned to a given GPU.
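The two-state sequence above (a first state for geometry testing, then a second state for rendering) can be modeled as a command list in which the same draw commands appear twice and a state command decides how they are interpreted. The Python sketch below is a toy model, not the disclosed command format: the tuple-based commands, the state names `"pretest"` and `"render"`, and the functions `build_commands` and `execute` are all hypothetical, and the real overlap test is replaced by a lookup in a supplied `overlaps` mapping. As a further simplification, each GPU here records information for every piece of geometry rather than sharing results produced by other GPUs.

```python
def build_commands(geometries: list) -> list:
    """Emit: set first state, draw all; set second state, draw all again."""
    commands = [("set_state", "pretest")]
    commands += [("draw", name) for name in geometries]
    commands.append(("set_state", "render"))
    commands += [("draw", name) for name in geometries]
    return commands


def execute(commands: list, gpu_id: int, overlaps: dict) -> list:
    """Interpret the command list on one GPU and return what it rendered.

    overlaps maps a geometry name to the set of GPU ids whose screen areas
    it touches; it stands in for the actual geometry test.
    """
    state = None
    info = {}       # information generated while in the pre-test state
    rendered = []
    for op, arg in commands:
        if op == "set_state":
            state = arg
        elif state == "pretest":
            info[arg] = overlaps[arg]
        elif state == "render":
            # Render conservatively when no information exists for a piece.
            if arg not in info or gpu_id in info[arg]:
                rendered.append(arg)
    return rendered
```

The point of the sketch is that the draw commands themselves are unchanged between the two passes; only the state that precedes them differs.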

Embodiments of the present disclosure disclose a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for rendering geometry of the graphics between the plurality of GPUs based on a plurality of screen areas, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The method includes assigning a plurality of pieces of geometry of an image frame to the plurality of GPUs for geometry testing. The method includes interleaving a first set of shaders performing geometry testing and rendering on a first set of pieces of geometry with a second set of shaders performing geometry testing and rendering on a second set of pieces of geometry. The geometry testing generates corresponding information about each piece of geometry in the first set or the second set and its relationship to each of the plurality of screen areas. The plurality of GPUs use the corresponding information to render each piece of geometry in the first set or the second set, where using the information includes, for example, skipping rendering entirely if it has been determined that the geometry does not overlap any screen area assigned to a given GPU.

In another embodiment, a non-transitory computer-readable medium for performing a method is disclosed. The computer-readable medium includes program instructions for rendering graphics for an application using a plurality of graphics processing units (GPUs). The computer-readable medium includes program instructions for dividing responsibility for rendering geometry of the graphics between the plurality of GPUs based on a plurality of screen areas, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The computer-readable medium includes program instructions for assigning a plurality of pieces of geometry of an image frame to the plurality of GPUs for geometry testing. The computer-readable medium includes program instructions for interleaving a first set of shaders performing geometry testing and rendering on a first set of pieces of geometry with a second set of shaders performing geometry testing and rendering on a second set of pieces of geometry. The geometry testing generates corresponding information about each piece of geometry in the first set or the second set and its relationship to each of the plurality of screen areas. The plurality of GPUs use the corresponding information to render each piece of geometry in the first set or the second set, where using the information includes, for example, skipping rendering entirely if it has been determined that the geometry does not overlap any screen area assigned to a given GPU.

In another embodiment, a computer system is disclosed, including a processor and memory coupled to the processor and having stored therein instructions that, if executed by the computer system, cause the computer system to execute a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for rendering geometry of the graphics between the plurality of GPUs based on a plurality of screen areas, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The method includes assigning a plurality of pieces of geometry of an image frame to the plurality of GPUs for geometry testing. The method includes interleaving a first set of shaders performing geometry testing and rendering on a first set of pieces of geometry with a second set of shaders performing geometry testing and rendering on a second set of pieces of geometry. The geometry testing generates corresponding information about each piece of geometry in the first set or the second set and its relationship to each of the plurality of screen areas. The plurality of GPUs use the corresponding information to render each piece of geometry in the first set or the second set, where using the information includes, for example, skipping rendering entirely if it has been determined that the geometry does not overlap any screen area assigned to a given GPU.
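One possible reading of the interleaved embodiments above is that geometry is processed in sets, with each set tested and then rendered before the next set is tested. The Python sketch below shows only that ordering and is an assumption about granularity, since the text does not fix how finely the two kinds of work are interleaved. The generator name `interleaved_schedule` and the tuple layout are hypothetical.

```python
def interleaved_schedule(geometry_sets: list):
    """Yield (phase, set index, geometry name) in interleaved order.

    For each set in turn: every piece is pre-tested, then every piece is
    rendered, before moving on to the next set.
    """
    for index, geometry_set in enumerate(geometry_sets):
        for name in geometry_set:
            yield ("pretest", index, name)
        for name in geometry_set:
            yield ("render", index, name)
```

Compared with testing all geometry up front, this ordering lets rendering of the first set begin before the second set has been tested.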

Other aspects of the disclosure will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, which illustrate by way of example the principles of the disclosure.

Brief description of the drawings

The disclosure may best be understood by reference to the following description taken in conjunction with the accompanying drawings, in which:

Figure 1 is a schematic diagram of a system for providing gaming over a network between one or more cloud gaming servers configured to have multiple GPUs cooperate to render a single image, including multi-GPU (graphics processing unit) rendering of geometry for an application by pre-testing the geometry against potentially interleaved screen regions, according to embodiments of the present disclosure.

Figure 2 is a schematic diagram of a multi-GPU architecture in which multiple GPUs cooperate to render a single image, according to one embodiment of the present disclosure.

Figure 3 is a schematic diagram of multiple graphics processing unit resources configured for multi-GPU rendering of geometry for an application by pre-testing the geometry against potentially interleaved screen regions, according to one embodiment of the present disclosure.

Figure 4 is a schematic diagram of a rendering architecture implementing a graphics pipeline that is configured for multi-GPU processing, such that multiple GPUs cooperate to render a single image, according to one embodiment of the present disclosure.

Figure 5 is a flowchart illustrating a method for graphics processing that includes multi-GPU rendering of geometry for an application by pre-testing against interleaved screen regions before rendering, according to one embodiment of the present disclosure.

Figure 6A is a schematic diagram of a screen that is subdivided into quadrants when performing multi-GPU rendering, according to one embodiment of the present disclosure.

Figure 6B is a schematic diagram of a screen that is subdivided into a plurality of interleaved regions when performing multi-GPU rendering, according to one embodiment of the present disclosure.

Figure 7A is a schematic diagram of a rendering command buffer shared by multiple GPUs that cooperate to render a single image frame, the rendering command buffer including a geometry pre-test portion and a rendering portion, according to one embodiment of the present disclosure.

Figure 7B-1 illustrates an image including four objects rendered by multiple GPUs, and shows the screen region responsibilities of each GPU when rendering the objects of the image, according to one embodiment of the present disclosure.

Figure 7B-2 is a table illustrating the rendering performed by each GPU when rendering the four objects of Figure 7B-1, according to one embodiment of the present disclosure.

Figure 7C is a schematic diagram illustrating the geometry pre-testing and geometry rendering performed by one or more GPUs when an image frame (e.g., the image of Figure 7B-1) is rendered through the cooperation of multiple GPUs, according to one embodiment of the present disclosure.

Figure 8A illustrates testing of objects against screen regions when multiple GPUs cooperate to render a single image, according to one embodiment of the present disclosure.

Figure 8B illustrates testing of portions of an object against screen regions when multiple GPUs cooperate to render a single image, according to one embodiment of the present disclosure.

Figures 9A-9C illustrate various strategies for assigning screen regions to corresponding GPUs when multiple GPUs cooperate to render a single image, according to one embodiment of the present disclosure.

Figure 10 is a schematic diagram illustrating various distributions of GPU assignments for performing geometry pre-testing on a plurality of pieces of geometry, according to embodiments of the present disclosure.

Figure 11A is a schematic diagram illustrating geometry pre-testing and rendering of a previous image frame by multiple GPUs, and the use, in a current image frame, of statistics collected during that rendering to influence how pre-testing of the geometry of the current image frame is assigned to the multiple GPUs, according to one embodiment of the present disclosure.

Figure 11B is a flowchart illustrating a method for graphics processing that includes geometry pre-testing and rendering of a previous image frame by multiple GPUs, and the use, in a current image frame, of statistics collected during that rendering to influence how pre-testing of the geometry of the current image frame is assigned to the multiple GPUs, according to one embodiment of the present disclosure.

Figure 12A is a schematic diagram illustrating the use of shaders configured to perform both pre-testing and rendering of the geometry of an image frame in two passes through a portion of a command buffer, according to one embodiment of the present disclosure.

Figure 12B is a flowchart illustrating a method for graphics processing that includes performing both pre-testing and rendering of the geometry of an image frame using the same set of shaders in two passes through a portion of a command buffer, according to one embodiment of the present disclosure.

Figure 13A is a schematic diagram illustrating the use of shaders configured to perform both geometry testing and rendering, where the geometry testing and rendering performed for different sets of pieces of geometry are interleaved using separate portions of a corresponding command buffer, according to one embodiment of the present disclosure.

Figure 13B is a flowchart illustrating a method for graphics processing that includes interleaving pre-testing and rendering of the geometry of an image frame for different sets of pieces of geometry using separate portions of a corresponding command buffer, according to one embodiment of the present disclosure.

Figure 14 illustrates components of an example device that can be used to perform aspects of the various embodiments of the present disclosure.

Detailed description

Although the following detailed description contains many specific details for purposes of illustration, one of ordinary skill in the art will appreciate that many variations and alterations of the following details are within the scope of the present disclosure. Accordingly, the aspects of the present disclosure described below are set forth without any loss of generality to, and without imposing limitations upon, the claims that follow this description.

Generally speaking, there are limits to the performance that a single GPU can attain, deriving for example from limits on how large a GPU can be. To render more complex scenes or to use more complex algorithms (e.g., materials, lighting, etc.), it is desirable in embodiments of the present disclosure to use multiple GPUs to render a single image. In particular, various embodiments of the present disclosure describe methods and systems configured for multi-GPU rendering of geometry for an application by pre-testing the geometry against potentially interleaved screen regions. Multiple GPUs cooperate to generate an image. Responsibility for rendering is divided among the GPUs based on screen regions. Before rendering the geometry, the GPUs generate information about the geometry and its relationship to the screen regions. This allows each GPU to render the geometry more efficiently or to avoid rendering it altogether. As an advantage, for example, this allows the multiple GPUs to render more complex scenes and/or images in the same amount of time.
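By way of illustration only, the pre-test described above may be sketched as follows. The sketch is not taken from the disclosure: it assumes square screen regions of a fixed size, a hypothetical diagonal interleaving rule `(col + row) % num_gpus`, and an axis-aligned screen-space bounding box as the description of a piece of geometry. The names `Rect`, `region_owner`, `gpus_overlapped`, and `should_render` are likewise hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Screen-space bounding box in pixels; x1 and y1 are exclusive."""
    x0: int
    y0: int
    x1: int
    y1: int


def region_owner(col: int, row: int, num_gpus: int) -> int:
    # Hypothetical interleaving rule (an assumption, not the patent's):
    # ownership cycles through the GPUs along each row, shifted by one
    # on each successive row.
    return (col + row) % num_gpus


def gpus_overlapped(bbox: Rect, region_size: int, num_gpus: int) -> set[int]:
    """Pre-test: return the GPUs owning at least one region the box touches."""
    first_col, last_col = bbox.x0 // region_size, (bbox.x1 - 1) // region_size
    first_row, last_row = bbox.y0 // region_size, (bbox.y1 - 1) // region_size
    owners: set[int] = set()
    for row in range(first_row, last_row + 1):
        for col in range(first_col, last_col + 1):
            owners.add(region_owner(col, row, num_gpus))
    return owners


def should_render(bbox: Rect, gpu: int, region_size: int, num_gpus: int) -> bool:
    """A GPU may skip a piece of geometry that touches none of its regions."""
    return gpu in gpus_overlapped(bbox, region_size, num_gpus)
```

Under these assumptions, a small object that falls entirely inside one region is rendered by a single GPU and skipped by the others, whereas an object spanning many regions is rendered by every GPU. A bounding-box test of this kind is conservative: it may report overlap where the actual geometry has none, but it does not miss a real overlap.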

With the above general understanding of the various embodiments, example details of the embodiments will now be described with reference to the various drawings.

Throughout the specification, references to "application", "game", "video game", or "gaming application" are meant to represent any type of interactive application that is directed through execution of input commands. For illustration purposes only, an interactive application includes applications for gaming, word processing, video processing, video game processing, etc. Further, the terms introduced above are interchangeable.

Throughout the specification, various embodiments of the present disclosure are described for multi-GPU processing or rendering of geometry for an application using an exemplary architecture having four GPUs. However, it is understood that any number of GPUs (e.g., two or more GPUs) may cooperate when rendering geometry for an application.

Figure 1 is a schematic diagram of a system for performing multi-GPU processing when rendering an image (e.g., an image frame) for an application, according to one embodiment of the present disclosure. According to embodiments of the present disclosure, the system is configured to provide gaming over a network between one or more cloud gaming servers, and more specifically is configured for the cooperation of multiple GPUs to render a single image of an application. Cloud gaming includes executing a video game at a server to generate game-rendered video frames, which are then sent to a client for display. In particular, system 100 is configured for efficient multi-GPU rendering of geometry for an application by pre-testing against potentially interleaved screen regions before rendering.

Although Figure 1 illustrates the implementation of multi-GPU rendering of geometry between one or more cloud gaming servers of a cloud gaming system, other embodiments of the present disclosure provide efficient multi-GPU rendering of geometry for an application by performing region testing while rendering within a stand-alone system, such as a personal computer or game console that includes a high-end graphics card having multiple GPUs.

It should also be understood that, in various embodiments (e.g., in a cloud gaming environment or in a stand-alone system), the multi-GPU rendering of geometry may be performed using physical GPUs, virtual GPUs, or a combination of both. For example, virtual machines (e.g., instances) may be created using a hypervisor of host hardware (e.g., located at a data center) that utilizes one or more components of a hardware layer, such as multiple CPUs, memory modules, GPUs, network interfaces, communication components, etc. These physical resources may be arranged in racks, such as racks of CPUs, racks of GPUs, racks of memory, etc., where the physical resources in the racks may be accessed using top-of-rack switches that facilitate a fabric for assembling and accessing the components used for an instance (e.g., when building the virtualized components of the instance). Generally, a hypervisor can present multiple guest operating systems of multiple instances that are configured with virtual resources. That is, each of the operating systems may be configured with a corresponding set of virtualized resources supported by one or more hardware resources (e.g., located at a corresponding data center). For instance, each operating system may be supported by a virtual CPU, multiple virtual GPUs, virtual memory, virtualized communication components, etc. In addition, the configuration of an instance may be transferred from one data center to another data center to reduce latency. GPU utilization defined for the user or game can be utilized when saving a user's gaming session. The GPU utilization can include any number of the configurations described herein to optimize the fast rendering of video frames for a gaming session. In one embodiment, the GPU utilization defined for the game or the user can be transferred between data centers as a configurable setting. The ability to transfer the GPU utilization setting enables efficient migration of game play from data center to data center when the user connects to play games from different geographic locations.

According to one embodiment of the present disclosure, system 100 provides gaming via a cloud gaming network 190, where the game is executed remotely from the client device 110 (e.g., a thin client) of a corresponding user who is playing the game. System 100 may provide gaming control to one or more users playing one or more games through the cloud gaming network 190 via network 150, in either single-player or multi-player modes. In some embodiments, the cloud gaming network 190 may include a plurality of virtual machines (VMs) running on a hypervisor of a host machine, with one or more virtual machines configured to execute a game processor module utilizing the hardware resources available to the hypervisor of the host. Network 150 may include one or more communication technologies. In some embodiments, network 150 may include fifth generation (5G) network technology having advanced wireless communication systems.

In some embodiments, communication may be facilitated using wireless technologies. Such technologies may include, for example, 5G wireless communication technologies. 5G is the fifth generation of cellular network technology. 5G networks are digital cellular networks in which the service area covered by providers is divided into small geographical areas called cells. Analog signals representing sounds and images are digitized in the telephone, converted by an analog-to-digital converter, and transmitted as a stream of bits. All the 5G wireless devices in a cell communicate by radio waves with a local antenna array and a low-power automated transceiver (transmitter and receiver) in the cell, over frequency channels assigned by the transceiver from a pool of frequencies that are reused in other cells. The local antennas are connected with the telephone network and the Internet by a high-bandwidth optical fiber or wireless backhaul connection. As in other cell networks, a mobile device crossing from one cell to another is automatically transferred to the new cell. It should be understood that 5G networks are just an example type of communication network, and embodiments of the present disclosure may utilize earlier generations of wireless or wired communication, as well as later generations of wired or wireless technologies that come after 5G.

As shown, the cloud gaming network 190 includes a game server 160 that provides access to a plurality of video games. Game server 160 may be any type of server computing device available in the cloud, and may be configured as one or more virtual machines executing on one or more hosts. For example, game server 160 may manage a virtual machine supporting a game processor that instantiates an instance of a game for a user. As such, a plurality of game processors of game server 160 associated with a plurality of virtual machines is configured to execute multiple instances of one or more games associated with the game plays of a plurality of users. In that manner, back-end server support provides streaming of media (e.g., video, audio, etc.) of the game plays of a plurality of gaming applications to a plurality of corresponding users. That is, game server 160 is configured to stream data (e.g., rendered images and/or frames of a corresponding game play) back to a corresponding client device 110 through network 150. In that manner, a computationally complex gaming application may execute at the back-end server in response to controller inputs received and forwarded by client device 110. Each server is able to render images and/or frames that are then encoded (e.g., compressed) and streamed to the corresponding client device for display.

For example, a plurality of users may access the cloud gaming network 190 via communication network 150 using corresponding client devices 110 configured for receiving streaming media. In one embodiment, client device 110 may be configured as a thin client providing an interface with a back-end server (e.g., of the cloud gaming network 190) that is configured to provide computational functionality (e.g., including game title processing engine 111). In another embodiment, client device 110 may be configured with a game title processing engine and game logic for at least some local processing of a video game, and may further be utilized for receiving streaming content generated by the video game executing at a back-end server, or other content provided by back-end server support. For local processing, the game title processing engine includes basic processor-based functions for executing a video game and services associated with the video game. In that case, the game logic may be stored on the local client device 110 and used for executing the video game.

Each of the client devices 110 may request access to different games from the cloud gaming network. For example, the cloud gaming network 190 may execute one or more game logics that are built upon a game title processing engine 111, as executed using the CPU resources 163 and GPU resources 365 of game server 160. For instance, game logic 115a in cooperation with game title processing engine 111 may execute on game server 160 for one client, game logic 115b in cooperation with game title processing engine 111 may execute on game server 160 for a second client, and so on, with game logic 115n in cooperation with game title processing engine 111 executing on game server 160 for an Nth client.

In particular, client device 110 of a corresponding user (not shown) is configured for requesting access to games over a communication network 150, such as the Internet, and for rendering display images (e.g., image frames) generated by a video game executed by game server 160, where encoded images are delivered to client device 110 for display in association with the corresponding user. For example, the user may interact through client device 110 with an instance of a video game executing on a game processor of game server 160. More particularly, the instance of the video game is executed by game title processing engine 111. Corresponding game logic (e.g., executable code) 115 implementing the video game is stored and accessible through a data store (not shown), and is used to execute the video game. Game title processing engine 111 is able to support a plurality of video games using a plurality of game logics (e.g., gaming applications), each of which is selectable by the user.

For example, client device 110 is configured to interact with the game title processing engine 111 in association with the game play of a corresponding user, such as through input commands that are used to drive the game play. In particular, client device 110 may receive input from various types of input devices, such as game controllers, tablet computers, keyboards, gestures captured by video cameras, mice, touch pads, etc. Client device 110 can be any type of computing device having at least a memory and a processor module that is capable of connecting to game server 160 over network 150. The back-end game title processing engine 111 is configured for generating rendered images, which are delivered over network 150 for display at a corresponding display associated with client device 110. For example, through cloud-based services the game-rendered images may be delivered by an instance of a corresponding game (e.g., game logic) executing on game execution engine 111 of game server 160. That is, client device 110 is configured for receiving encoded images (e.g., encoded from game-rendered images generated through execution of a video game), and for displaying the rendered images on display 11. In one embodiment, display 11 includes an HMD (e.g., displaying VR content). In some embodiments, the rendered images may be streamed to a smartphone or tablet, wirelessly or by wire, either directly from the cloud-based services or via client device 110 (e.g., Remote Play).

In one embodiment, game server 160 and/or game title processing engine 111 includes basic processor-based functions for executing the game and services associated with the gaming application. For example, game server 160 includes central processing unit (CPU) resources 163 and graphics processing unit (GPU) resources 365 that are configured for performing processor-based functions including 2D or 3D rendering, physics simulation, scripting, audio, animation, graphics processing, lighting, shading, rasterization, ray tracing, shadowing, culling, transformation, artificial intelligence, etc. In addition, the CPU and GPU group may implement services for the gaming application, including, in part, memory management, multi-thread management, quality of service (QoS), bandwidth testing, social networking, management of social friends, communication with social networks of friends, communication channels, texting, instant messaging, chat support, etc. In one embodiment, one or more applications share a particular GPU resource. In one embodiment, multiple GPU devices may be combined to perform graphics processing for a single application that is executing on a corresponding CPU.

In one embodiment, the cloud gaming network 190 is a distributed game server system and/or architecture. In particular, a distributed game engine executing game logic is configured as a corresponding instance of a corresponding game. In general, the distributed game engine takes each of the functions of a game engine and distributes those functions for execution by a multitude of processing entities. Individual functions can be further distributed across one or more processing entities. The processing entities may be configured in different ways, including as physical hardware, and/or as virtual components or virtual machines, and/or as virtual containers, where a container differs from a virtual machine in that it virtualizes an instance of the gaming application running on a virtualized operating system. The processing entities may utilize and/or rely on servers and their underlying hardware on one or more servers (compute nodes) of the cloud gaming network 190, where the servers may be located on one or more racks. The coordination, assignment, and management of the execution of those functions by the various processing entities are performed by a distribution synchronization layer. In that manner, execution of those functions is controlled by the distribution synchronization layer to enable generation of media (e.g., video frames, audio, etc.) for the gaming application in response to controller input by a player. The distribution synchronization layer is able to execute those functions efficiently across the distributed processing entities (e.g., through load balancing), such that critical game engine components/functions are distributed and reassembled for more efficient processing.

Figure 2 is a schematic diagram of an exemplary multi-GPU architecture 200 in which multiple GPUs cooperate to render a single image of a corresponding application, according to one embodiment of the present disclosure. It is understood that many architectures in which multiple GPUs cooperate to render a single image are possible in various embodiments of the present disclosure, though not explicitly described or shown. For example, multi-GPU rendering of geometry for an application by performing region testing while rendering may be implemented between one or more cloud gaming servers of a cloud gaming system, or may be implemented within a stand-alone system, such as a personal computer or game console that includes a high-end graphics card having multiple GPUs, etc.

The multi-GPU architecture 200 includes a CPU 163 and multiple GPUs configured for multi-GPU rendering of a single image for an application, and/or of each image in a sequence of images for the application. In particular, CPU 163 and GPU resources 365 are configured for performing processor-based functions including 2D or 3D rendering, physics simulation, scripting, audio, animation, graphics processing, lighting, shading, rasterization, ray tracing, shadowing, culling, transformation, artificial intelligence, etc., as previously described.

For example, four GPUs are shown in GPU resources 365 of the multi-GPU architecture 200, though any number of GPUs may be utilized when rendering images for an application. Each GPU is connected via a high-speed bus 220 to a corresponding dedicated memory, such as random access memory (RAM). In particular, GPU-A is connected to memory 210A (e.g., RAM) via bus 220, GPU-B is connected to memory 210B (e.g., RAM) via bus 220, GPU-C is connected to memory 210C (e.g., RAM) via bus 220, and GPU-D is connected to memory 210D (e.g., RAM) via bus 220.

Further, the GPUs are connected to one another via bus 240 which, depending on the architecture, may be approximately equal in speed to, or slower than, the bus 220 used for communication between a corresponding GPU and its corresponding memory. For example, GPU-A is connected to each of GPU-B, GPU-C, and GPU-D via bus 240. Likewise, GPU-B is connected to each of GPU-A, GPU-C, and GPU-D via bus 240. In addition, GPU-C is connected to each of GPU-A, GPU-B, and GPU-D via bus 240. Further, GPU-D is connected to each of GPU-A, GPU-B, and GPU-C via bus 240.

CPU 163 is connected to each of the GPUs via a lower-speed bus 230 (e.g., bus 230 is slower than the bus 220 used for communication between a corresponding GPU and its corresponding memory). In particular, CPU 163 is connected to each of GPU-A, GPU-B, GPU-C, and GPU-D.
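For illustration purposes only, the connectivity described above for multi-GPU architecture 200 may be enumerated as a list of links. The representation below is an assumption made for this sketch (the function name `build_links` and the tuple layout are hypothetical); only the component labels and bus numbers come from the description.

```python
from itertools import combinations

GPUS = ["GPU-A", "GPU-B", "GPU-C", "GPU-D"]


def build_links(gpus):
    """Return (endpoint, endpoint, bus) tuples for the described topology."""
    links = []
    for gpu in gpus:
        # Each GPU reaches its own dedicated memory over high-speed bus 220.
        links.append((gpu, f"memory 210{gpu[-1]}", "bus 220"))
        # The CPU reaches every GPU over the lower-speed bus 230.
        links.append(("CPU 163", gpu, "bus 230"))
    # Every pair of GPUs is connected to each other over bus 240.
    for a, b in combinations(gpus, 2):
        links.append((a, b, "bus 240"))
    return links


links = build_links(GPUS)
```

With four GPUs this yields four memory links, four CPU links, and six GPU-to-GPU links, one for each pair of GPUs.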

Figure 3 is a schematic diagram of graphics processing unit resources 365 configured for multi-GPU rendering of geometry for an image frame generated by an application, by pre-testing against potentially interleaved screen regions before rendering, according to one embodiment of the present disclosure. For example, game server 160 may be configured to include the GPU resources 365 in the cloud gaming network 190 of Figure 1. As shown, GPU resources 365 include multiple GPUs, such as GPU 365a, GPU 365b ... GPU 365n. As previously described, various architectures may include multiple GPUs cooperating to render a single image by performing multi-GPU rendering of geometry for an application through region testing while rendering, such as implementing multi-GPU rendering of geometry between one or more cloud gaming servers of a cloud gaming system, or within a stand-alone system, such as a personal computer or game console that includes a high-end graphics card having multiple GPUs, etc.

In particular, in one embodiment, game server 160 is configured to perform multi-GPU processing when rendering a single image of an application, such that multiple GPUs cooperate to render the single image, and/or to render each of one or more images of a sequence of images when executing the application. For example, in one embodiment, game server 160 may include a CPU and GPU group that is configured to perform multi-GPU rendering of each of one or more images in a sequence of images of the application, where one CPU and GPU group may implement a graphics and/or rendering pipeline for the application. The CPU and GPU group may be configured as one or more processing devices. As previously described, the CPU and GPU group may include CPU 163 and GPU resources 365, which are configured for performing processor-based functions including 2D or 3D rendering, physics simulation, scripting, audio, animation, graphics processing, lighting, shading, rasterization, ray tracing, shadowing, culling, transformation, artificial intelligence, etc.

GPU resources 365 are responsible for and/or configured for rendering objects (e.g., writing color or normal vector values for the pixels of an object to multiple render targets, or MRTs) and for executing synchronous compute kernels (e.g., full-screen effects on the resulting MRTs). The synchronous compute to perform and the objects to render are specified by commands contained in the rendering command buffers 325 that the GPUs execute. In particular, GPU resources 365 are configured to render objects and perform synchronous compute (e.g., during execution of synchronous compute kernels) when executing commands from the rendering command buffers 325, where commands and/or operations may depend on other operations such that they are performed in sequence.

For example, GPU resources 365 are configured to perform synchronous compute and/or rendering of objects using one or more rendering command buffers 325 (e.g., rendering command buffer 325a, rendering command buffer 325b ... rendering command buffer 325n). In one embodiment, each GPU in GPU resources 365 may have its own command buffers. Alternatively, when each GPU is rendering substantially the same set of objects (e.g., due to the small size of the regions), the GPUs in GPU resources 365 may use the same command buffer or the same set of command buffers. Further, each GPU in GPU resources 365 may support the ability for a command to be executed by one GPU but not by another. For example, flags on a draw command, or predication in the rendering command buffer, allow a single GPU to execute one or more commands in the corresponding command buffer while the other GPUs ignore those commands. For example, rendering command buffer 325a may support flags 330a, rendering command buffer 325b may support flags 330b ... rendering command buffer 325n may support flags 330n.
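The per-GPU execution of commands in a shared command buffer can be sketched as follows. This is a minimal illustration, assuming a hypothetical bitmask flag on each command (one bit per GPU); the names `Command`, `gpu_mask`, and `commands_for_gpu` are invented for the example and do not come from the disclosure.

```python
from dataclasses import dataclass

ALL_GPUS = 0b1111  # hypothetical mask: four GPUs, one bit per GPU


@dataclass
class Command:
    name: str
    gpu_mask: int = ALL_GPUS  # bit i set means GPU i executes this command


def commands_for_gpu(command_buffer, gpu_index):
    """Return the names of the commands that the given GPU executes."""
    bit = 1 << gpu_index
    return [cmd.name for cmd in command_buffer if cmd.gpu_mask & bit]


# One shared command buffer; the last draw is flagged for GPU 0 only,
# so the other GPUs ignore it.
shared_buffer = [
    Command("set_render_targets"),
    Command("draw_object_0"),
    Command("draw_object_1", gpu_mask=0b0001),
]
```

With this layout, every GPU walks the same buffer, and the flag alone decides whether a command is executed or ignored.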

Performing synchronous compute (e.g., executing synchronous compute kernels) and rendering objects are both part of the overall rendering. For example, if a video game runs at 60 Hz (i.e., 60 frames per second), then all object rendering and execution of synchronous compute kernels for an image frame typically must complete within approximately 16.67 ms (i.e., one frame at 60 Hz). As previously described, the operations performed when rendering objects and/or executing synchronous compute kernels are ordered, such that operations may depend on other operations (e.g., commands in a rendering command buffer may need to complete execution before other commands in that rendering command buffer can execute).
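As a quick check of the frame-time figure, the per-frame budget is the reciprocal of the refresh rate. The helper name below is illustrative only.

```python
def frame_budget_ms(refresh_hz):
    """Time available per frame, in milliseconds, at the given refresh rate."""
    return 1000.0 / refresh_hz


budget_60hz = frame_budget_ms(60)  # roughly 16.67 ms
```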

In particular, each of the rendering command buffers 325 contains commands of various types, including commands that affect the corresponding GPU configuration (e.g., commands that specify the location and format of a render target), as well as commands to render objects and/or execute synchronous compute kernels. For purposes of illustration, the synchronous compute performed when executing synchronous compute kernels may include performing full-screen effects once the objects have all been rendered to one or more corresponding multiple render targets (MRTs).

In addition, when GPU resources 365 render objects for an image frame and/or execute synchronous compute kernels when generating the image frame, GPU resources 365 are configured via the registers of each GPU 365a, 365b ... 365n. For example, GPU 365a is configured via its registers 340 (e.g., register 340a, register 340b ... register 340n) to perform that rendering or compute kernel execution in a certain way. That is, the values stored in registers 340 define the hardware context (e.g., GPU configuration or GPU state) for GPU 365a when executing commands in the rendering command buffers 325 used for rendering objects and/or executing synchronous compute kernels for an image frame. Each GPU in GPU resources 365 may be similarly configured, such that GPU 365b is configured via its registers 350 (e.g., register 350a, register 350b ... register 350n) to perform that rendering or compute kernel execution in a certain way, ... and GPU 365n is configured via its registers 370 (e.g., register 370a, register 370b ... register 370n) to perform that rendering or compute kernel execution in a certain way.

Some examples of GPU configuration include the location and format of render targets (e.g., MRTs). Other examples of GPU configuration include operating procedures. For instance, when rendering an object, the Z value of each pixel of the object can be compared to the Z buffer in various ways. For example, the object pixel is written only if the object Z value matches the value in the Z buffer. Alternatively, the object pixel may be written only if the object Z value is equal to or less than the value in the Z buffer. The type of test being performed is defined within the GPU configuration.
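The two Z-test variants described above can be sketched as a configurable comparison. This is an illustration under assumptions: the mode names `"equal"` and `"less_equal"`, the function `write_pixel`, and the choice to update the Z buffer on a successful write are all hypothetical, not taken from the disclosure.

```python
import operator

# Hypothetical names for the depth-test modes selected by the GPU configuration.
DEPTH_TESTS = {
    "equal": operator.eq,        # write only if object Z matches the Z buffer
    "less_equal": operator.le,   # write only if object Z is equal or less
}


def write_pixel(color_buffer, z_buffer, x, y, z, color, depth_test):
    """Write the pixel if it passes the configured depth test; return True if written."""
    compare = DEPTH_TESTS[depth_test]
    if compare(z, z_buffer[y][x]):
        color_buffer[y][x] = color
        z_buffer[y][x] = z  # assumption: depth writes are enabled
        return True
    return False
```

The same pixel can pass under one configuration and fail under another, which is why the test type belongs to the GPU state rather than to the draw data.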

FIG. 4 is a simplified diagram of a rendering architecture implementing a graphics pipeline 400 that is configured for multi-GPU processing, such that multiple GPUs collaborate to render a single image, in accordance with one embodiment of the present disclosure. The graphics pipeline 400 is illustrative of the general process for rendering images using 3D (three-dimensional) polygon rendering. The graphics pipeline 400 for a rendered image outputs corresponding color information for each pixel in a display, where the color information may represent texture and shading (e.g., color, shadowing, etc.). The graphics pipeline 400 may be implemented within the client device 110, game server 160, game title processing engine 111, and/or GPU resources 365 of FIGS. 1 and 3. That is, various architectures may include multiple GPUs collaborating to render a single image by performing multi-GPU rendering of geometry for an application through region testing while rendering, such as implementing multi-GPU rendering of geometry between one or more cloud gaming servers of a cloud gaming system, or within a stand-alone system (such as a personal computer or gaming console that includes a high-end graphics card having multiple GPUs), and so on.

As shown, the graphics pipeline receives input geometry 405. For example, the geometry processing stage 410 receives the input geometry 405, which may include vertices within a 3D gaming world and information corresponding to each of the vertices. A given object within the gaming world can be represented using polygons (e.g., triangles) defined by vertices, where the surface of each corresponding polygon is then processed through the graphics pipeline 400 to achieve a final effect (e.g., color, texture, etc.). Vertex attributes may include the normal (e.g., the direction perpendicular to the geometry at that location), color (e.g., an RGB triple of red, green, and blue), and texture coordinate/mapping information.

The geometry processing stage 410 is responsible for (and capable of) both vertex processing (e.g., via a vertex shader) and primitive processing. In particular, the geometry processing stage 410 may output sets of vertices that define primitives and deliver them to the next stage of the graphics pipeline 400, along with the positions (to be precise, homogeneous coordinates) and various other parameters for those vertices. The positions are placed in the position cache 450 for access by later shader stages. The other parameters are placed in the parameter cache 460, again for access by later shader stages.

Various operations may be performed by the geometry processing stage 410, such as lighting and shadowing calculations for the primitives and/or polygons. In one embodiment, because the geometry stage is capable of processing primitives, it can perform backface culling and/or clipping (e.g., testing against the view frustum), thereby reducing the load on downstream stages (e.g., rasterization stage 420). In another embodiment, the geometry stage may generate primitives (e.g., with functionality equivalent to a traditional geometry shader).

The primitives output by the geometry processing stage 410 are fed into the rasterization stage 420, which converts the primitives into a raster image composed of pixels. In particular, the rasterization stage 420 is configured to project objects in the scene onto a two-dimensional (2D) image plane defined by the viewing location in the 3D gaming world (e.g., camera location, user eye location, etc.). At a simplistic level, the rasterization stage 420 looks at each primitive and determines which pixels are affected by it. In particular, the rasterization stage 420 partitions the primitives into pixel-sized fragments, where each fragment corresponds to a pixel in the display. Note that one or more fragments may contribute to the color of a corresponding pixel when the image is displayed.
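One common way to decide which pixels a triangle affects is to evaluate edge functions at each pixel center. The sketch below uses that technique as an illustration; the disclosure does not say how rasterization stage 420 is implemented, and the function names and the pixel-center sampling rule are assumptions.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area term: positive when (px, py) lies on one side of edge a->b."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def covered_pixels(v0, v1, v2, width, height):
    """Return the (x, y) pixels whose centers lie inside triangle v0-v1-v2."""
    area = edge(*v0, *v1, *v2)
    if area == 0:
        return []  # degenerate triangle covers nothing
    pixels = []
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(*v1, *v2, px, py)
            w1 = edge(*v2, *v0, px, py)
            w2 = edge(*v0, *v1, px, py)
            # Inside (or on an edge) when every term shares the sign of the area.
            if all(w * area >= 0 for w in (w0, w1, w2)):
                pixels.append((x, y))
    return pixels
```

Each returned pixel corresponds to one fragment of the primitive; a production rasterizer would restrict the loops to the triangle's bounding box and apply a tie-breaking rule on shared edges.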

As previously mentioned, the rasterization stage 420 may also perform additional operations relative to the viewing location, such as clipping (identifying and disregarding fragments that are outside the view frustum) and culling (disregarding fragments that are occluded by closer objects). With regard to clipping, the geometry processing stage 410 and/or the rasterization stage 420 may be configured to identify and disregard primitives that are outside the view frustum defined by the viewing location in the gaming world.

The pixel processing stage 430 uses the parameters created by the geometry processing stage, as well as other data, to generate values such as the resulting color of a pixel. In particular, the pixel processing stage 430 at its core performs shading operations on the fragments to determine how the color and brightness of a primitive vary with the available lighting. For example, the pixel processing stage 430 may determine the depth, color, normal, and texture coordinates (e.g., texture details) for each fragment, and may further determine appropriate levels of light, darkness, and color for the fragments. In particular, the pixel processing stage 430 calculates the traits of each fragment, including color and other attributes (e.g., z-depth for the distance from the viewing location, and alpha values for transparency). In addition, the pixel processing stage 430 applies lighting effects to the fragments based on the available lighting affecting the corresponding fragments, and may apply shadowing effects to each fragment.

The output of the pixel processing stage 430 includes processed fragments (e.g., texture and shading information) and is delivered to the next stage in the graphics pipeline 400, the output merger stage 440. The output merger stage 440 generates a final color for each pixel using the output of the pixel processing stage 430 as well as other data, such as values already in memory. For example, the output merger stage 440 may optionally blend values between the fragments and/or pixels determined by the pixel processing stage 430 and the values already written to the MRT for that pixel.
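As one concrete instance of the optional blend, the sketch below applies the standard "source over" alpha blend between an incoming fragment color and the value already stored for the pixel. The choice of this particular blend equation is an assumption made for illustration; the disclosure does not fix one.

```python
def blend_over(src_rgb, src_alpha, dst_rgb):
    """Blend an incoming fragment color over the color already in the render target."""
    return tuple(
        src_alpha * s + (1.0 - src_alpha) * d
        for s, d in zip(src_rgb, dst_rgb)
    )


# A half-transparent red fragment over an opaque blue pixel.
blended = blend_over((1.0, 0.0, 0.0), 0.5, (0.0, 0.0, 1.0))
```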

Color values for each pixel in the display may be stored in a frame buffer (not shown). These values are scanned out to the corresponding pixels when the corresponding image of the scene is displayed. In particular, the display reads the color value for each pixel from the frame buffer row by row, from left to right or right to left, top to bottom or bottom to top, or in any other pattern, and illuminates the pixels using those values when displaying the image.

With the detailed description of the cloud gaming network 190 (e.g., in the game server 160) and the GPU resources 365 of FIGS. 1-3, flow diagram 500 of FIG. 5 illustrates a method for graphics processing when implementing multi-GPU rendering of geometry for an image frame generated by an application by pre-testing the geometry against interleaved screen regions prior to rendering, in accordance with one embodiment of the present disclosure. In this manner, multiple GPU resources are used to perform the rendering of objects efficiently when executing an application. As previously described, various architectures may include multiple GPUs collaborating to render a single image by performing multi-GPU rendering of geometry for an application through region testing while rendering, such as within one or more cloud gaming servers of a cloud gaming system, or within a stand-alone system (such as a personal computer or gaming console that includes a high-end graphics card having multiple GPUs), and so on.

At 510, the method includes rendering graphics for an application using a plurality of graphics processing units (GPUs) that collaborate to generate an image. In particular, multi-GPU processing is performed when rendering a single image frame and/or each of one or more image frames in a sequence of image frames for a real-time application.

At 520, the method includes dividing responsibility for rendering the geometry of the graphics among the plurality of GPUs based on a plurality of screen regions. That is, each GPU has a corresponding division of the responsibility (e.g., a corresponding set of screen regions) that is known to all of the GPUs. More specifically, each GPU is responsible for rendering geometry in a corresponding set of screen regions of the plurality of screen regions, where the corresponding set includes one or more screen regions. For example, a first GPU has a first division of responsibility for rendering objects in a first set of screen regions. Likewise, a second GPU has a second division of responsibility for rendering objects in a second set of screen regions. The same holds for each of the remaining GPUs.
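A division of responsibility known to every GPU can be represented as a single shared table from GPU index to the set of screen regions it owns. The sketch below is illustrative: the round-robin assignment and the name `build_responsibility` are assumptions, and any other partition of the regions would serve equally well.

```python
def build_responsibility(num_regions, num_gpus):
    """Map each GPU index to the set of screen-region indices it renders."""
    table = {gpu: set() for gpu in range(num_gpus)}
    for region in range(num_regions):
        table[region % num_gpus].add(region)  # assumption: round-robin split
    return table


# Eight screen regions shared among four GPUs; every GPU holds the same table.
responsibility = build_responsibility(num_regions=8, num_gpus=4)
```

Because the sets are disjoint and together cover every region, each region has exactly one responsible GPU.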

At 530, the method includes assigning a first piece of geometry of an image frame generated during execution of the application to a first GPU for geometry testing. For example, an image frame may include one or more objects, where each object may be defined by one or more pieces of geometry. That is, in one embodiment, geometry pre-testing and rendering are performed on a piece of geometry that is an entire object. In other embodiments, geometry pre-testing and rendering are performed on a piece of geometry that is a portion of an entire object.

For example, each of the plurality of GPUs is assigned a corresponding portion of the geometry associated with the image frame. In particular, every portion of the geometry is assigned to a corresponding GPU for geometry pre-testing. In one embodiment, the geometry may be assigned evenly among the plurality of GPUs. For example, if there are four GPUs, each GPU may process one quarter of the geometry in the image frame. In other embodiments, the geometry may be assigned unevenly among the plurality of GPUs. For example, when four GPUs perform multi-GPU rendering of an image frame, one GPU may process more of the geometry of the image frame than another GPU.
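The even case can be sketched by dealing the pieces of geometry out to the GPUs in turn. The function name and the strided split are illustrative assumptions; the disclosure only requires that each piece be assigned to some GPU for pre-testing.

```python
def assign_for_pretest(pieces, num_gpus):
    """Split the frame's pieces of geometry into one pre-test share per GPU."""
    return [pieces[gpu::num_gpus] for gpu in range(num_gpus)]


# Eight pieces of geometry and four GPUs: each GPU pre-tests one quarter.
shares = assign_for_pretest(list(range(8)), 4)
```

An uneven assignment would simply give some GPUs longer shares, for example to account for differences in available time.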

At 540, the method includes performing geometry pre-testing at the first GPU to generate information about how the piece of geometry relates to the plurality of screen regions. In particular, the first GPU generates information about the piece of geometry and how it relates to each of the plurality of screen regions. For example, geometry pre-testing by the first GPU may determine whether the piece of geometry overlaps a particular screen region assigned to a corresponding GPU for object rendering. The first piece of geometry may overlap screen regions for which other GPUs are responsible for object rendering, and/or may overlap screen regions for which the first GPU is responsible. In one embodiment, the geometry testing is performed by a shader in a corresponding command buffer executed by the first GPU, before any of the plurality of GPUs renders the geometry. In other embodiments, the geometry testing is performed by hardware, such as in the rasterization stage 420 of the graphics pipeline 400.
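A pre-test of this kind can be sketched as an overlap check between a piece's screen-space bounding box and each GPU's regions, producing one bit per GPU. This is an illustration under assumptions: the bounding-box test, the half-open rectangle convention, and the names `overlaps` and `pretest` are hypothetical, and a real test could be more precise than a bounding box.

```python
def overlaps(a, b):
    """True when two rectangles (x0, y0, x1, y1), half-open, share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def pretest(piece_bbox, regions_by_gpu):
    """Return a bitmask with bit i set when the piece overlaps a region of GPU i."""
    mask = 0
    for gpu, regions in enumerate(regions_by_gpu):
        if any(overlaps(piece_bbox, region) for region in regions):
            mask |= 1 << gpu
    return mask


# Hypothetical 100x100 screen split into quadrants, one per GPU (0..3).
quadrants = [
    [(0, 0, 50, 50)],      # GPU 0: top-left
    [(50, 0, 100, 50)],    # GPU 1: top-right
    [(0, 50, 50, 100)],    # GPU 2: bottom-left
    [(50, 50, 100, 100)],  # GPU 3: bottom-right
]
```

The resulting mask is the "information" in its simplest form: it records, for every GPU, whether that GPU needs to render the piece.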

Geometry pre-testing is typically, in embodiments, performed simultaneously by the plurality of GPUs for all the geometry of the corresponding image frame. That is, each GPU performs geometry pre-testing for its portion of the geometry of the image frame. In this way, geometry pre-testing by the GPUs lets each GPU know which pieces of geometry to render and which to skip. In particular, when a GPU performs geometry pre-testing, it tests its portion of the geometry against the screen regions of each of the plurality of GPUs used for rendering the image frame. For example, with four GPUs, each GPU may perform geometry testing on one quarter of the geometry of the image frame, particularly if the geometry is assigned evenly to the GPUs for testing. As such, even though each GPU pre-tests only its own portion of the geometry, because pre-testing is typically performed simultaneously across the plurality of GPUs for all the geometry of the image frame, the generated information indicates how every piece of geometry in the image frame relates to the screen regions of all the GPUs, where each screen region is assigned to a corresponding GPU for object rendering, and where rendering may be performed on pieces of geometry (e.g., an entire object or a portion of an object).

At 550, the method includes using the information at each of the plurality of GPUs when rendering the piece of geometry (e.g., to include fully rendering the piece of geometry or skipping its rendering). That is, the information is used at each of the plurality of GPUs to render the piece of geometry, where the test results (i.e., the information) for the piece of geometry are sent to the other GPUs so that the information is known to every GPU. For example, the geometry (e.g., the pieces of geometry) in the image frame is typically, in embodiments, rendered simultaneously by the plurality of GPUs. In particular, when a piece of geometry overlaps any screen region assigned to a corresponding GPU for object rendering, that GPU renders the piece of geometry based on the information. On the other hand, when the piece of geometry does not overlap any screen region assigned to the corresponding GPU, that GPU can skip rendering the piece of geometry based on the information. As such, the information allows every GPU to render the geometry in the image frame more efficiently and/or to avoid rendering a piece of geometry altogether. For example, the rendering may be performed by shaders in corresponding command buffers executed by the plurality of GPUs. As described more fully below with reference to FIGS. 7A, 12A, and 13A, a shader may be configured to perform one or both of geometry testing and rendering, based on the corresponding GPU configuration.
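Once the pre-test results are known to every GPU, each GPU can filter the frame's geometry down to the pieces it must render. The sketch below assumes the results take the form of a hypothetical per-piece bitmask (bit i set when the piece overlaps a region of GPU i); the names and data are invented for the example.

```python
def render_list(gpu_index, overlap_masks):
    """Pieces this GPU renders; it skips every piece whose mask lacks its bit."""
    bit = 1 << gpu_index
    return [piece for piece, mask in overlap_masks.items() if mask & bit]


# Hypothetical pre-test results for three pieces of geometry and four GPUs.
overlap_masks = {"piece_0": 0b0001, "piece_1": 0b1100, "piece_2": 0b1010}
```

Here GPU 0 renders only `piece_0`, while GPU 3 renders `piece_1` and `piece_2` and skips `piece_0` without processing it.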

In accordance with one embodiment of the present disclosure, in some architectures, if a rendering GPU receives the corresponding information in time to use it, that GPU uses the information when deciding which geometry to render within the corresponding image. That is, the information may be treated as a hint. Otherwise, the rendering GPU processes the piece of geometry as it ordinarily would. Take the example in which the information indicates whether a piece of geometry overlaps any screen region assigned to a rendering GPU (e.g., a second GPU). If the information indicates that there is no overlap, the rendering GPU may skip rendering the piece of geometry entirely. Likewise, if only some pieces of geometry do not overlap, the second GPU may skip rendering at least those pieces that do not overlap any screen region assigned to the second GPU for object rendering. On the other hand, the information may indicate that the piece of geometry does overlap, in which case the second GPU (the rendering GPU) renders it. Similarly, the information may indicate that certain pieces of geometry overlap a screen region assigned to the second GPU for object rendering, in which case the second GPU renders only those overlapping pieces. In yet another embodiment, if there is no information, or if the information is not generated or received in time, the second GPU performs rendering normally (e.g., renders the piece of geometry). As such, information provided as a hint can increase the overall efficiency of the graphics processing system if it is received in time, and the system still operates correctly without it if it is not.
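The hint behavior reduces to a simple rule: skip only when a hint is present and says there is no overlap; otherwise render as usual. A minimal sketch, assuming hints are stored as a hypothetical per-piece bitmask and that a missing entry stands for information that was never generated or arrived too late:

```python
def should_render(gpu_index, piece_id, hints):
    """Decide whether this GPU renders the piece, treating pre-test results as hints."""
    mask = hints.get(piece_id)  # None: hint missing or not received in time
    if mask is None:
        return True  # fall back to normal processing of the piece
    return bool(mask & (1 << gpu_index))


hints = {"piece_0": 0b0001}  # piece_0 overlaps only GPU 0's regions
```

Because the fallback always renders, a late or missing hint costs efficiency but never correctness.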

In one embodiment, one GPU (e.g., a pre-test GPU) is dedicated to performing geometry pre-testing to generate the information. That is, the dedicated GPU is not used for rendering objects (e.g., pieces of geometry) in the corresponding image frame. Specifically, as previously described, graphics for an application are rendered using a plurality of GPUs. Responsibility for rendering the geometry of the graphics is divided among the plurality of GPUs based on a plurality of screen regions, which may be interleaved, where each GPU has a corresponding division of the responsibility that is known to the plurality of GPUs. Geometry testing is performed at the pre-test GPU on a plurality of pieces of geometry of an image frame generated by the application, to generate information about each piece of geometry and its relation to each of the plurality of screen regions. The plurality of pieces of geometry are rendered at each of the plurality of GPUs using the information generated for each piece. That is, the information is used when each piece of geometry is rendered by a corresponding rendering GPU among the GPUs used for rendering the image frame.

FIGS. 6A-6B show renderings to screens that are subdivided into regions and sub-regions, purely for purposes of illustration. It is understood that the number of subdivided regions and/or sub-regions is selectable for efficient multi-GPU processing of an image and/or each of one or more images of a sequence of images. That is, the screen may be subdivided into two or more regions, where each region may be further divided into sub-regions. In one embodiment of the present disclosure, the screen is subdivided into four quadrants, as shown in FIG. 6A. In another embodiment, the screen is subdivided into a larger number of interleaved regions, as shown in FIG. 6B. The discussion of FIGS. 6A-6B below is intended to illustrate the inefficiencies that arise when performing multi-GPU rendering to a plurality of screen regions assigned to a plurality of GPUs; FIGS. 7A-7C and FIGS. 8A-8B show more efficient rendering, according to some embodiments of the invention.

In particular, FIG. 6A is a diagram of a screen 610A that is subdivided into quadrants (i.e., four regions) when performing multi-GPU rendering. As shown, screen 610A is subdivided into four quadrants (A, B, C, and D). Each quadrant is assigned to one of the four GPUs (GPU-A, GPU-B, GPU-C, and GPU-D) in a one-to-one relationship: GPU-A is assigned to quadrant A, GPU-B to quadrant B, GPU-C to quadrant C, and GPU-D to quadrant D.

The geometry can be culled. For example, CPU 163 can check an object's bounding box against each quadrant's frustum, and request each GPU to render only the objects that overlap its corresponding frustum. The result is that each GPU is responsible for rendering only a portion of the geometry. For purposes of illustration, screen 610A shows pieces of geometry, each of which is a corresponding object: objects 611-617. GPU-A renders no objects, as no objects overlap quadrant A. GPU-B renders objects 615 and 616 (because a portion of object 615 is present in quadrant B, the CPU's culling test correctly concludes that GPU-B must render it). GPU-C renders objects 611 and 612. GPU-D renders objects 612, 613, 614, 615, and 617.

In FIG. 6A, when screen 610A is divided into quadrants A-D, the amount of work that each GPU must perform may differ greatly, because a disproportionate amount of the geometry may fall within one quadrant. For example, quadrant A contains no geometry, whereas quadrant D contains five pieces of geometry, or at least portions of five pieces. As such, GPU-A, assigned to quadrant A, would be idle, while GPU-D, assigned to quadrant D, would be disproportionately busy when rendering objects in the corresponding image.
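The imbalance can be tallied directly from the render lists stated for FIG. 6A. The sketch below only counts those lists; treating the number of objects as a proxy for workload is a simplifying assumption.

```python
# Render lists for the quadrant split of FIG. 6A, as stated in the description.
render_lists = {
    "GPU-A": [],
    "GPU-B": [615, 616],
    "GPU-C": [611, 612],
    "GPU-D": [612, 613, 614, 615, 617],
}

# Objects per GPU: a rough proxy for how busy each GPU is.
workload = {gpu: len(objects) for gpu, objects in render_lists.items()}

# Objects 612 and 615 straddle quadrants, so they are rendered twice.
total_draws = sum(workload.values())
unique_objects = len({obj for objs in render_lists.values() for obj in objs})
```

Seven objects result in nine object renders, five of which land on GPU-D while GPU-A has none.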

FIG. 6B illustrates another technique for subdividing a screen into regions. In particular, rather than being subdivided into quadrants, screen 610B is subdivided into a plurality of interleaved regions when performing multi-GPU rendering of a single image or of each of one or more images in a sequence of images. In that case, screen 610B is subdivided into a larger number of interleaved regions (e.g., more than the four quadrants) while the same number of GPUs (e.g., four) is used for rendering. The objects (611-617) shown in screen 610A are also shown in screen 610B in the same corresponding locations.

In particular, four GPUs (GPU-A, GPU-B, GPU-C, and GPU-D) are used to render an image for a corresponding application. Each GPU is responsible for rendering the geometry that overlaps its corresponding regions. That is, each GPU is assigned a corresponding set of regions: GPU-A is responsible for every region labeled A, GPU-B for every region labeled B, GPU-C for every region labeled C, and GPU-D for every region labeled D.

In addition, the regions are interleaved in a particular pattern. Because of the interleaving (and the larger number of regions), the amount of work each GPU must perform may be much better balanced. For example, the interleaving pattern of screen 610B consists of alternating rows: rows of regions A-B-A-B and so on, and rows of regions C-D-C-D and so on. Other patterns for interleaving regions are supported in embodiments of the present disclosure. For example, a pattern may include repeating sequences of regions, evenly distributed regions, unevenly distributed regions, repeatable rows of region sequences, random sequences of regions, random rows of region sequences, etc.
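The alternating-row pattern described for screen 610B can be sketched as follows. This is a minimal illustration only: the function names and the (row, column) indexing of regions are assumptions introduced here, not part of the disclosure.

```python
# Hypothetical helpers: map a region's (row, col) index to the GPU that owns it,
# following the A-B-A-B / C-D-C-D alternating rows described for screen 610B.
GPUS = ("A", "B", "C", "D")

def owner_of_region(row: int, col: int) -> str:
    """Return the label of the GPU responsible for the region at (row, col)."""
    # Even rows alternate A, B; odd rows alternate C, D.
    return GPUS[2 * (row % 2) + (col % 2)]

def region_grid(rows: int, cols: int) -> list:
    """Build the full ownership grid for a screen of rows x cols regions."""
    return [[owner_of_region(r, c) for c in range(cols)] for r in range(rows)]

if __name__ == "__main__":
    for line in region_grid(4, 4):
        print("-".join(line))
```

Other patterns mentioned above (random sequences, unevenly distributed regions) would replace only the body of owner_of_region in this sketch.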

Choosing the number of regions is important. For example, if the distribution of regions is too fine (e.g., the number of regions is too large to be optimal), each GPU must still process most or all of the geometry. For example, it may be difficult to check object bounding boxes against all of the regions for which a GPU is responsible. Moreover, even if the bounding boxes can be checked in a timely manner, because of the small region size each GPU will likely have to process most of the geometry, since every object in the image overlaps at least one region of each GPU (e.g., a GPU processes an entire object even if only a portion of the object overlaps at least one region in the set of regions assigned to that GPU).

Accordingly, the choice of the number of regions, the interleaving pattern, and so on is important. Choosing too few or too many regions, interleaving too few or too many regions, or choosing an inefficient interleaving pattern may lead to inefficient GPU processing (e.g., each GPU processing most or all of the geometry). In those cases, even though multiple GPUs are used to render the image, the GPU inefficiency means that a corresponding increase in both screen pixel count and geometry density cannot be supported (i.e., four GPUs cannot write four times the pixels and process four times the vertices or primitives). The following embodiments are directed to improvements in culling strategy (FIGS. 7A-7C) and culling granularity (FIGS. 8A-8B), among other advances.

FIGS. 7A-7C are diagrams illustrating the use of multiple GPUs to render a single image, and/or each of one or more images in a sequence of images, in embodiments of the present disclosure. Four GPUs were selected purely for ease of illustrating multi-GPU rendering of images when executing an application; it is understood that any number of GPUs may be used for multi-GPU rendering in various embodiments.

In particular, FIG. 7A is a diagram of a rendering command buffer 700A that is shared by multiple GPUs cooperating to render a single image frame, in accordance with one embodiment of the present disclosure. That is, in the present example the multiple GPUs each use the same rendering command buffer (e.g., buffer 700A), and each GPU executes all of the commands in the rendering command buffer. A plurality of commands (a complete set) is loaded into rendering command buffer 700A and is used for rendering a corresponding image frame. It is understood that one or more rendering command buffers may be used to generate a corresponding image frame. In one example, the CPU generates one or more draw calls for an image frame, where the draw calls include commands placed in one or more rendering command buffers for execution by one or more GPUs of GPU resources 365 of FIG. 3 when performing multi-GPU rendering of the corresponding image. In some implementations, CPU 163 may request that one or more GPUs generate all or some of the draw calls used to render the corresponding image. Further, FIG. 7A may show the entire set of commands contained in rendering command buffer 700A, or may show only a portion of that set.

When performing multi-GPU rendering of an image, or of each of one or more images in a sequence of images, the GPUs in embodiments typically render simultaneously. Rendering of an image can be broken into multiple phases. In each phase the GPUs need to be synchronized, so a faster GPU must wait until the slower GPUs complete. The commands shown in FIG. 7A for rendering command buffer 700A represent one phase. Although FIG. 7A shows commands for only one phase, rendering command buffer 700A may include commands for one or more phases of rendering an image; FIG. 7A shows only a portion of all the commands, so commands for the other phases are not shown. In the phase illustrated by rendering command buffer 700A of FIG. 7A, there are four objects to be rendered (e.g., object 0, object 1, object 2, and object 3), as shown in FIG. 7B-1.

As shown, rendering command buffer 700A of FIG. 7A includes commands for geometry testing, commands for rendering objects (e.g., pieces of geometry), and commands for configuring the state of the one or more rendering GPUs that are executing commands from rendering command buffer 700A. For purposes of illustration only, rendering command buffer 700A shown in FIG. 7A includes commands (710-728) for geometry pre-testing, for rendering objects, and/or for executing synchronous compute kernels when rendering a corresponding image for a corresponding application. In some implementations, the geometry pre-testing, the rendering of the objects of that image, and/or the execution of the synchronous compute kernels must be performed within one frame period. Two processing sections are shown in rendering command buffer 700A. In particular, section 1 includes the pre-testing, or geometry testing 701, and section 2 includes rendering 702.

Section 1 includes performing geometry testing 701 on the objects in the image frame, where each object may be defined by one or more pieces of geometry. The pre-testing, or geometry testing 701, may be performed by one or more shaders. For example, in one embodiment each GPU used in the multi-GPU rendering of the corresponding image frame is assigned a portion of the geometry of the image frame on which to perform geometry testing, so that every portion is assigned for pre-testing. An assigned portion may include one or more pieces of geometry, where each piece of geometry may be an entire object or a portion of an object (e.g., vertices, primitives, etc.). In particular, geometry testing is performed on a piece of geometry to generate information about how that piece of geometry relates to each of the plurality of screen regions. For example, the geometry test may determine whether a piece of geometry overlaps a particular screen region that is assigned to a corresponding GPU for object rendering.
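One way such an overlap test could look is sketched below as a CPU-side stand-in for the shader-based pre-test. The sketch assumes a screen-space bounding box per piece of geometry, a uniform region size of 64 x 64 pixels, and the A-B / C-D alternating-row assignment; all three are assumptions for illustration. A bounding-box test is conservative, so it may report an overlap that the actual primitives do not have.

```python
# Assumed region size in pixels (hypothetical; the text does not specify one).
REGION_W, REGION_H = 64, 64
GPUS = ("A", "B", "C", "D")

def owner_of_region(row: int, col: int) -> str:
    # Assumed interleaving: even rows alternate A, B; odd rows alternate C, D.
    return GPUS[2 * (row % 2) + (col % 2)]

def pretest(bbox: tuple) -> set:
    """Return the set of GPUs whose regions the bounding box overlaps.

    bbox is (x_min, y_min, x_max, y_max) in screen pixels, inclusive.
    """
    x0, y0, x1, y1 = bbox
    owners = set()
    for row in range(y0 // REGION_H, y1 // REGION_H + 1):
        for col in range(x0 // REGION_W, x1 // REGION_W + 1):
            owners.add(owner_of_region(row, col))
    return owners

if __name__ == "__main__":
    # A small box inside a single region touches only that region's owner.
    print(sorted(pretest((70, 10, 100, 50))))
```

The returned set is the kind of per-geometry information that can then be shared with every rendering GPU.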

As shown in FIG. 7A, geometry testing 701 of section 1 (e.g., the pre-testing of geometry) includes commands for configuring the state of the one or more GPUs executing commands from rendering command buffer 700A, and commands for performing the geometry testing. In particular, the GPU state of each GPU is configured before that GPU performs geometry testing on the corresponding objects. For example, commands 710, 713, and 715 are each used to configure the GPU state of the one or more GPUs for the purpose of executing the geometry testing commands. As shown, command 710 configures the GPU state so that geometry testing commands 711-712 can be executed properly, where command 711 performs geometry testing on object 0 and command 712 performs geometry testing on object 1. Similarly, command 713 configures the GPU state so that geometry testing command 714 can perform geometry testing on object 2, and command 715 configures the GPU state so that geometry testing command 716 can perform geometry testing on object 3. It is understood that a GPU state may be configured for one or more geometry testing commands (e.g., testing commands 711 and 712).

As previously described, values stored in registers define the hardware context (e.g., GPU configuration) of the corresponding GPU when it executes commands in rendering command buffer 700A for geometry testing, rendering objects, and/or executing synchronous compute kernels for the corresponding image. As shown, the GPU state may be modified throughout the processing of the commands in rendering command buffer 700A, with each subsequent section of commands able to configure the GPU state. As applied to FIG. 7A, and wherever setting GPU state is mentioned throughout the specification, the GPU state can be set in a variety of ways. For example, the CPU or a GPU could set a value in random access memory (RAM), and the GPU would check that value in RAM. In another example, the state could be internal to the GPU, such as when a command buffer is called twice as a subroutine and the internal GPU state differs between the two subroutine calls.

Section 2 includes performing rendering 702 of the objects in the image frame, in which the pieces of geometry are rendered. Rendering 702 may be performed by one or more shaders invoked from command buffer 700A. As shown in FIG. 7A, rendering 702 of section 2 includes commands for configuring the state of the one or more GPUs executing commands from rendering command buffer 700A, and commands for performing the rendering. In particular, the GPU state of each GPU is configured before that GPU renders the corresponding objects (e.g., pieces of geometry). For example, commands 721, 723, 725, and 727 are each used to configure the GPU state of the one or more GPUs for the purpose of executing the rendering commands. As shown, command 721 configures the GPU state so that rendering command 722 can render object 0; command 723 configures the GPU state so that rendering command 724 can render object 1; command 725 configures the GPU state so that rendering command 726 can render object 2; and command 727 configures the GPU state so that rendering command 728 can render object 3. Although FIG. 7A shows the GPU state being configured for each rendering command (e.g., render object 0, etc.), it is understood that a GPU state may be configured for one or more rendering commands.
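The two sections of rendering command buffer 700A can be modeled as plain data to make the ordering concrete. The reference numerals and object numbers below come from the description of FIG. 7A; the tuple layout (numeral, kind, object) and the helper function are assumptions made only for this sketch.

```python
# Section 1 (geometry testing 701): state commands 710, 713, 715 precede tests.
SECTION_1 = [
    (710, "set_state", None),
    (711, "geometry_test", 0),
    (712, "geometry_test", 1),
    (713, "set_state", None),
    (714, "geometry_test", 2),
    (715, "set_state", None),
    (716, "geometry_test", 3),
]

# Section 2 (rendering 702): each render command follows its own state command.
SECTION_2 = [
    (721, "set_state", None),
    (722, "render", 0),
    (723, "set_state", None),
    (724, "render", 1),
    (725, "set_state", None),
    (726, "render", 2),
    (727, "set_state", None),
    (728, "render", 3),
]

COMMAND_BUFFER_700A = SECTION_1 + SECTION_2

def objects_for(commands: list, kind: str) -> list:
    """List, in buffer order, the objects touched by commands of one kind."""
    return [obj for _, k, obj in commands if k == kind]

if __name__ == "__main__":
    print(objects_for(COMMAND_BUFFER_700A, "geometry_test"))
    print(objects_for(COMMAND_BUFFER_700A, "render"))
```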

As previously described, each GPU used in the multi-GPU rendering of the corresponding image frame renders the corresponding pieces of geometry based on the information generated during geometry pre-testing. Specifically, the information, which is known to each GPU, provides the relationships between objects and screen regions. When rendering the corresponding pieces of geometry, a GPU can use the information to render them efficiently, provided the information is received in time. Specifically, a GPU renders a piece of geometry when the information indicates that the piece of geometry overlaps any one or more of the screen regions assigned to that GPU for object rendering. On the other hand, the information may indicate that a first GPU should skip rendering a piece of geometry entirely (e.g., the piece of geometry does not overlap any screen region for which the first GPU is responsible for object rendering). In this manner, each GPU renders only the geometry that overlaps the one or more screen regions for which it is responsible. The information is thus provided to each GPU as a hint: a GPU that is rendering a piece of geometry takes the information into account if it is received before that rendering begins. In one embodiment, if the information is not received in time, rendering proceeds normally, that is, the corresponding piece of geometry is processed in full by the corresponding GPU regardless of whether it overlaps any screen region assigned to that GPU for object rendering.
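The hint semantics described above, in which a missing hint falls back to normal processing, can be sketched as a single decision function. The function name and the representation of a hint as a set of GPU labels (or None when the information has not arrived) are assumptions for illustration.

```python
from typing import Optional

def should_process(gpu: str, hint: Optional[set]) -> bool:
    """Decide whether a GPU submits a piece of geometry to its pipeline.

    hint is the set of GPUs whose regions the geometry overlaps, or None
    if the pre-test information was not received before rendering began.
    """
    if hint is None:
        # No information in time: proceed normally and process the geometry.
        return True
    # Information available: skip entirely unless this GPU's regions overlap.
    return gpu in hint

if __name__ == "__main__":
    print(should_process("A", {"B"}))   # hint says only GPU-B overlaps
    print(should_process("A", None))    # hint missing, fall back
```

Because a missing hint only costs efficiency and never correctness, no synchronization is needed between producing the hint and consuming it.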

For purposes of illustration only, four GPUs divide the corresponding screen into regions among themselves. As previously described, each GPU is responsible for rendering objects in a corresponding set of regions, where the corresponding set includes one or more regions. In one embodiment, rendering command buffer 700A is shared by the multiple GPUs that cooperate to render a single image. That is, the GPUs used for multi-GPU rendering of a single image, or of each of one or more images in a sequence of images, share a common command buffer. In another embodiment, each GPU may have its own command buffer.

Alternatively, in still another embodiment, each GPU may be rendering a slightly different set of objects. This may be the case when it can be determined that a particular GPU does not need to render a particular object because the object does not overlap that GPU's corresponding screen regions (e.g., those in its corresponding set). The multiple GPUs can still use the same command buffer (e.g., share one command buffer), as long as the command buffer supports the ability for a command to be executed by one GPU but not by another, as previously described. For example, execution of a command in shared rendering command buffer 700A may be limited to one of the rendering GPUs. This can be accomplished in several ways. In one example, flags on a command may indicate which GPUs should execute it. Likewise, predication may be implemented in the rendering command buffer using bits that state which GPU does what under which condition. An example of predication is: "If this is GPU-A, then skip the following X commands."
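Both mechanisms named above, per-command flags and predication of the form "if this is GPU-A, skip the following X commands", can be sketched with a small interpreter over a shared buffer. The Command and SkipIf classes, the bit assignments, and the command names are all hypothetical and do not correspond to any real GPU command format.

```python
from dataclasses import dataclass

# Hypothetical one-bit-per-GPU encoding.
GPU_BIT = {"A": 0b0001, "B": 0b0010, "C": 0b0100, "D": 0b1000}
ALL_GPUS = 0b1111

@dataclass
class Command:
    name: str
    gpu_mask: int = ALL_GPUS  # flag bits: which GPUs execute this command

@dataclass
class SkipIf:
    gpu: str    # predicate: "if this is GPU-<gpu>..."
    count: int  # "...skip the following <count> commands"

def execute(buffer: list, gpu: str) -> list:
    """Return the names of the commands this GPU would execute, in order."""
    executed, i = [], 0
    while i < len(buffer):
        entry = buffer[i]
        i += 1
        if isinstance(entry, SkipIf):
            if entry.gpu == gpu:
                i += entry.count  # predicated skip
        elif entry.gpu_mask & GPU_BIT[gpu]:
            executed.append(entry.name)
    return executed

SHARED_BUFFER = [
    Command("set state"),
    Command("render object 0", GPU_BIT["B"]),  # flagged for GPU-B only
    SkipIf("A", 2),                            # GPU-A skips the next two
    Command("set state for object 1"),
    Command("render object 1"),
    Command("end of phase"),
]
```

With this buffer, GPU-A executes only the first and last commands, while GPU-B executes every command.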

In yet another embodiment, the multiple GPUs can still use the same command buffer because each GPU is rendering substantially the same set of objects. For example, when the regions are relatively small, each GPU may still render all of the objects, as previously described.

FIG. 7B-1 illustrates a screen 700B showing an image that includes four objects rendered by multiple GPUs using rendering command buffer 700A of FIG. 7A, in accordance with one embodiment of the present disclosure. Multi-GPU rendering of geometry is performed for an application by pre-testing the geometry against screen regions, which may be interleaved, before rendering the pieces of geometry that correspond to the objects in the image frame, in accordance with one embodiment of the present disclosure.

In particular, responsibility for rendering geometry is divided among the multiple GPUs by screen region, where the plurality of screen regions is configured to reduce imbalance in rendering time among the GPUs. For example, screen 700B shows the screen region responsibilities of each GPU when rendering the objects of the image. Four GPUs (GPU-A, GPU-B, GPU-C, and GPU-D) are used to render the objects in the image shown in screen 700B. Screen 700B is divided more finely than the quadrant division shown in FIG. 6A, in an effort to balance pixel and vertex load among the GPUs. In addition, screen 700B is divided into regions that may be interleaved. For example, the interleaving includes multiple rows of regions. Each of rows 731 and 733 includes regions A alternating with regions B. Each of rows 732 and 734 includes regions C alternating with regions D. More specifically, rows including regions A and B alternate with rows including regions C and D in a pattern.

As previously described, to achieve GPU processing efficiency, various techniques may be used when dividing the screen into regions, such as increasing or decreasing the number of regions (e.g., choosing the correct number of regions), interleaving regions, increasing or decreasing the number of regions used for interleaving, selecting a particular pattern when interleaving regions and/or sub-regions, etc. In one embodiment, the screen regions of the plurality are uniform in size. In another embodiment, the screen regions of the plurality are not uniform in size. In still another embodiment, the number and sizing of the screen regions change dynamically.

Each of the GPUs is responsible for rendering objects in a corresponding set of regions, where each set may include one or more regions. As such, GPU-A is responsible for rendering objects in each of the A regions in its corresponding set, GPU-B is responsible for rendering objects in each of the B regions in its corresponding set, GPU-C is responsible for rendering objects in each of the C regions in its corresponding set, and GPU-D is responsible for rendering objects in each of the D regions in its corresponding set. There may also be GPUs that have other responsibilities and therefore may not perform rendering (e.g., GPUs that execute asynchronous compute kernels running over multiple frame periods, perform culling for the rendering GPUs, etc.).

The amount of rendering to be performed differs from GPU to GPU. FIG. 7B-2 illustrates a table showing the rendering performed by each GPU when rendering the four objects of FIG. 7B-1, in accordance with one embodiment of the present disclosure. As shown in the table, after geometry pre-testing it can be determined that object 0 is rendered by GPU-B; that object 1 is rendered by GPU-C and GPU-D; that object 2 is rendered by GPU-A, GPU-B, and GPU-D; and that object 3 is rendered by GPU-B, GPU-C, and GPU-D. Some imbalance in rendering may remain, since GPU-A needs to render only object 2 whereas GPU-D needs to render objects 1, 2, and 3. Overall, however, with interleaving of the screen regions, the rendering of the objects within an image is reasonably balanced across the multiple GPUs used for multi-GPU rendering of an image, or of each of one or more images in a sequence of images.
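The per-object results of the table in FIG. 7B-2 can be inverted into a per-GPU work list to make the remaining imbalance visible. The object-to-GPU data below is taken from the paragraph above; the dictionary layout and function name are assumptions for illustration.

```python
# Object -> GPUs that render it, as stated for the table of FIG. 7B-2.
RENDERED_BY = {
    0: ["B"],
    1: ["C", "D"],
    2: ["A", "B", "D"],
    3: ["B", "C", "D"],
}

def objects_per_gpu(rendered_by: dict) -> dict:
    """Invert the table: GPU -> list of objects it must render."""
    per_gpu = {gpu: [] for gpu in "ABCD"}
    for obj in sorted(rendered_by):
        for gpu in rendered_by[obj]:
            per_gpu[gpu].append(obj)
    return per_gpu

if __name__ == "__main__":
    for gpu, objs in objects_per_gpu(RENDERED_BY).items():
        print(f"GPU-{gpu}: objects {objs}")
```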

FIG. 7C is a diagram illustrating the rendering of each object as performed by each GPU when multiple GPUs collaborate to render a single image frame (such as image frame 700B shown in FIG. 7B-1), in accordance with one embodiment of the present disclosure. In particular, FIG. 7C shows the rendering process of objects 0-3 as performed by each of the four GPUs (e.g., GPU-A, GPU-B, GPU-C, and GPU-D) using shared rendering command buffer 700A of FIG. 7A.

In particular, two rendering timing diagrams are shown with respect to a timeline 740. Rendering timing diagram 700C-1 shows multi-GPU rendering of objects 0-3 of a corresponding image in one rendering phase, where each GPU performs rendering without any information about the overlap between objects 0-3 and the screen regions. Rendering timing diagram 700C-2 shows multi-GPU rendering of objects 0-3 of the corresponding image in the same rendering phase, where information generated during geometry testing against the screen regions (e.g., performed before rendering) is shared with each of the GPUs used to render objects 0-3 through a corresponding GPU pipeline. Each of rendering timing diagrams 700C-1 and 700C-2 shows the time taken by each GPU to process each piece of geometry (e.g., to perform geometry testing and rendering). In one embodiment, a piece of geometry is an entire object. In another embodiment, a piece of geometry may be a portion of an object. For purposes of illustration, the example of FIG. 7C shows the rendering of pieces of geometry in which each piece corresponds to an object (e.g., in its entirety). In each of rendering timing diagrams 700C-1 and 700C-2, an object (e.g., a piece of geometry) that has no geometry (e.g., no primitive of the object) overlapping at least one screen region of the corresponding GPU (e.g., in its corresponding set of regions) is represented by a box drawn with dashed lines. On the other hand, an object that has geometry overlapping at least one screen region of the corresponding GPU (e.g., in its corresponding set of regions) is represented by a box drawn with solid lines.

Rendering timing diagram 700C-1 shows the rendering of objects 0-3 using the four GPUs (e.g., GPU-A, GPU-B, GPU-C, and GPU-D). Vertical line 755a indicates the start of the rendering phase for the objects, and vertical line 755b indicates the end of the rendering phase for the objects in rendering timing diagram 700C-1. The start and end points of the illustrated rendering phase along timeline 740 represent synchronization points, at which each of the four GPUs is synchronized when executing a corresponding GPU pipeline. For example, at vertical line 755b indicating the end of the rendering phase, all GPUs must wait for the slowest GPU (e.g., GPU-B) to finish rendering objects 0-3 through the corresponding graphics pipeline before moving on to the next rendering phase.

Geometry pre-testing is not performed in rendering timing diagram 700C-1. As such, each GPU must process every object through the corresponding graphics pipeline. If an object has no pixels to be drawn in any of the regions assigned to the corresponding GPU for object rendering (e.g., in its corresponding set), the GPU may not fully render that object through the graphics pipeline. For example, when an object does not overlap any of those regions, only the geometry processing stage of the graphics pipeline is performed. However, this still takes some time to process.

In particular, GPU-A does not fully render objects 0, 1, and 3, because they do not overlap any screen region assigned to GPU-A for object rendering (e.g., in its corresponding set). The rendering of these three objects is shown in boxes with dashed lines, indicating that at least the geometry processing stage is performed but that the graphics pipeline is not fully performed. GPU-A fully renders object 2, because that object overlaps at least one screen region assigned to GPU-A for rendering. The rendering of object 2 is shown in a box with solid lines, indicating that all stages of the corresponding graphics pipeline are performed. Similarly, GPU-B does not fully render object 1 (shown in a dashed box; at least the geometry processing stage is performed), but fully renders objects 0, 2, and 3 (shown in solid boxes), because those objects overlap at least one screen region assigned to GPU-B for rendering (e.g., in its corresponding set). Likewise, GPU-C does not fully render objects 0 and 2 (shown in dashed boxes; at least the geometry processing stage is performed), but fully renders objects 1 and 3 (shown in solid boxes), because those objects overlap at least one screen region assigned to GPU-C for rendering (e.g., in its corresponding set). Further, GPU-D does not fully render object 0 (shown in a dashed box; at least the geometry processing stage is performed), but fully renders objects 1, 2, and 3 (shown in solid boxes), because those objects overlap at least one screen region assigned to GPU-D for rendering (e.g., in its corresponding set).

Rendering timing diagram 700C-2 shows geometry pre-testing 701' and rendering 702' of objects 0-3 using multiple GPUs. Vertical line 750a indicates the start of the rendering phase for the objects (e.g., including geometry pre-testing and rendering), and vertical line 750b indicates the end of the rendering phase for the objects in rendering timing diagram 700C-2. The start and end points along timeline 740 of the rendering phase shown in timing diagram 700C-2 represent synchronization points, at which each of the four GPUs is synchronized when executing a corresponding GPU pipeline, as previously described. For example, at vertical line 750b indicating the end of the rendering phase, all GPUs must wait for the slowest GPU (e.g., GPU-B) to finish rendering objects 0-3 through the corresponding graphics pipeline before moving on to the next rendering phase.

First, geometry pre-testing 701' is performed by the GPUs, where each GPU performs geometry pre-testing for a subset of the geometry of the image frame against all of the screen regions, and where each screen region is assigned to a corresponding GPU for object rendering. As previously described, each GPU is assigned a corresponding portion of the geometry associated with the image frame. Geometry pre-testing generates information about how a particular piece of geometry relates to each of the screen regions, such as whether the piece of geometry overlaps any screen region assigned to a corresponding GPU for object rendering (e.g., in its corresponding set). That information is shared with each of the GPUs used to render the image frame. For example, geometry pre-testing 701' shown in FIG. 7C includes having GPU-A perform geometry pre-testing on object 0, GPU-B perform geometry pre-testing on object 1, GPU-C perform geometry pre-testing on object 2, and GPU-D perform geometry pre-testing on object 3. The time taken to perform geometry pre-testing may vary with the object being tested. For example, geometry pre-testing of object 0 takes less time than geometry pre-testing of object 1. This may be due to object size, the number of screen regions overlapped, etc.
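The division of pre-test work among the GPUs can be sketched as below. The text states only that each GPU is assigned a portion of the geometry; the round-robin policy used here is an assumption chosen because it reproduces the assignment shown in FIG. 7C (object 0 to GPU-A, object 1 to GPU-B, and so on), and the function name is hypothetical.

```python
GPUS = ["A", "B", "C", "D"]

def assign_pretests(num_geometries: int, gpus: list) -> dict:
    """Assign each piece of geometry to one GPU for pre-testing.

    Round-robin is an assumed policy; any partition of the geometry
    among the GPUs would fit the description.
    """
    work = {gpu: [] for gpu in gpus}
    for geom in range(num_geometries):
        work[gpus[geom % len(gpus)]].append(geom)
    return work

if __name__ == "__main__":
    print(assign_pretests(4, GPUS))
```

Each GPU tests only its own share, but tests it against every screen region, so the combined results cover all geometry for all GPUs.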

After geometry pre-testing, each GPU renders all of the objects or pieces of geometry that intersect its screen regions. In one embodiment, each GPU begins rendering its geometry as soon as its geometry testing is complete. That is, there is no synchronization point between geometry testing and rendering. This is possible because the geometry testing information being generated is treated as a hint rather than a hard dependency. For example, GPU-A begins rendering object 2 before GPU-B has completed geometry pre-testing of object 1, and therefore before GPU-B begins rendering objects 0, 2, and 3.

Vertical line 750a is aligned with vertical line 755a, so that rendering timing diagrams 700C-1 and 700C-2 each begin rendering objects 0-1 at the same time. However, the rendering of objects 0-3 shown in timing diagram 700C-2 is performed in less time than the rendering shown in timing diagram 700C-1. That is, vertical line 750b, which marks the end of the rendering phase in the lower timing diagram 700C-2, occurs earlier than vertical line 755b, which marks the end of the rendering phase in the upper timing diagram 700C-1. Specifically, a speed increase 745 in rendering objects 0-3 is realized when multi-GPU rendering of the geometry of an image for an application includes pre-testing the geometry against screen regions before rendering and providing the results of the geometry pre-test as information (e.g., hints). As shown, speed increase 745 is the time difference between vertical line 750b of timing diagram 700C-2 and vertical line 755b of timing diagram 700C-1.

The speed increase is achieved by generating and sharing information during the geometry pre-test. For example, during the geometry pre-test, GPU-A generates information indicating that object 0 needs to be rendered only by GPU-B. GPU-B is therefore informed that it should render object 0, while the other GPUs (e.g., GPU-A, GPU-C, and GPU-D) can skip rendering of object 0 entirely, because object 0 does not overlap any region (e.g., in the corresponding set) assigned to those GPUs for object rendering. For example, those GPUs do not need to execute the geometry processing stage for object 0, whereas without the geometry pre-test they would execute that stage even though they would not fully render object 0, as shown in timing diagram 700C-1. Likewise, during the geometry pre-test, GPU-B generates information indicating that object 1 should be rendered by GPU-C and GPU-D, and that GPU-A and GPU-B can skip rendering of object 1 entirely, because object 1 does not overlap any region (e.g., in the respective corresponding sets) assigned to GPU-A or GPU-B for object rendering. Likewise, during the geometry pre-test, GPU-C generates information indicating that object 2 should be rendered by GPU-A, GPU-B, and GPU-D, and that GPU-C can skip rendering of object 2 entirely, because object 2 does not overlap any region (e.g., in the corresponding set) assigned to GPU-C for object rendering. Further, during the geometry pre-test, GPU-D generates information indicating that object 3 should be rendered by GPU-B, GPU-C, and GPU-D, and that GPU-A can skip rendering of object 3 entirely, because object 3 does not overlap any region (e.g., in the corresponding set) assigned to GPU-A for object rendering.

由于几何图形预测试生成的信息在GPU之间共享，因此每个GPU都可以确定要渲染哪些对象。因此，在执行几何图形预测试并将测试结果与所有GPU共享之后，每个GPU都具有关于哪些对象或几何图形需要由对应GPU渲染的信息。例如，GPU-A渲染对象2；GPU-B渲染对象0、2和3；GPU-C渲染对象1和3；和GPU-D渲染对象1、2和3。Because the information generated by the geometry pretest is shared between GPUs, each GPU can determine which objects to render. Therefore, after performing a geometry pretest and sharing the test results with all GPUs, each GPU has information about which objects or geometries need to be rendered by the corresponding GPU. For example, GPU-A renders object 2; GPU-B renders objects 0, 2, and 3; GPU-C renders objects 1 and 3; and GPU-D renders objects 1, 2, and 3.
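The sharing step described above can be sketched as a small table inversion. This is a minimal illustration only: the `HINTS` layout and the `render_lists` helper are hypothetical names introduced here, not structures defined by the disclosure, and the values simply mirror the object 0-3 example in the text.

```python
from collections import defaultdict

# Hypothetical hint table: object id -> set of GPUs whose regions it overlaps.
# Values mirror the example in the text; the data layout is an assumption.
HINTS = {
    0: {"B"},
    1: {"C", "D"},
    2: {"A", "B", "D"},
    3: {"B", "C", "D"},
}

def render_lists(hints):
    """Invert object->GPUs hints into a per-GPU list of objects to render."""
    per_gpu = defaultdict(list)
    for obj in sorted(hints):
        for gpu in sorted(hints[obj]):
            per_gpu[gpu].append(obj)
    return dict(per_gpu)
```

Every GPU can run the same inversion on the shared hints and read off only its own entry, so no GPU needs to ask another which objects to skip.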

In particular, GPU-A performs geometry processing on object 1 and determines that object 1 can be skipped by GPU-B, because object 1 does not overlap any region (e.g., in the corresponding set) assigned to GPU-B for object rendering. In addition, object 1 is not fully rendered by GPU-A, because it does not overlap any region (e.g., in the corresponding set) assigned to GPU-A for object rendering. Because the determination that object 1 does not overlap any region assigned to GPU-B is made before GPU-B begins geometry processing of object 1, GPU-B skips rendering of object 1.

Figures 8A-8B illustrate object testing against screen regions 820A and 820B, where the screen regions may be interleaved (e.g., screen regions 820A and 820B show a portion of a display). In particular, multi-GPU rendering of objects is performed for a single image frame, or for each of one or more image frames in a sequence of image frames, by performing geometry testing before the objects are rendered to the screen. As shown, GPU-A is assigned responsibility for rendering objects in screen region 820A, and GPU-B is assigned responsibility for rendering objects in screen region 820B. Information is generated for each "piece of geometry", where a piece of geometry may be an entire object or a portion of an object. For example, a piece of geometry may be object 810 or a portion of object 810.

图8A图示了根据本公开的一个实施方案的当多个GPU协作渲染单个图像时针对屏幕区域的对象测试。如前所述，几何图形可以是对象，使得几何图形对应于由对应的绘制调用使用或生成的几何图形。在几何图形预测试期间，可以确定对象810与区域820A重叠。也就是说，对象810的部分810A与区域820A重叠。在这种情况下，GPU-A的任务是渲染对象810。同样，在几何图形预测试期间，可以确定对象810与区域820B重叠。也就是说，对象810的部分810B与区域820B重叠。在这种情况下，GPU-B的任务也是渲染对象810。Figure 8A illustrates object testing for screen area when multiple GPUs collaborate to render a single image, according to one embodiment of the present disclosure. As mentioned before, the geometry can be an object such that the geometry corresponds to the geometry used or generated by the corresponding draw call. During the geometry pre-test, it may be determined that object 810 overlaps area 820A. That is, portion 810A of object 810 overlaps area 820A. In this case, GPU-A is tasked with rendering object 810 . Likewise, during geometry pre-testing, it may be determined that object 810 overlaps region 820B. That is, portion 810B of object 810 overlaps area 820B. In this case, GPU-B is also tasked with rendering object 810.

Figure 8B illustrates testing of portions of an object against screen regions and/or screen sub-regions when multiple GPUs collaborate to render a single image frame, in accordance with one embodiment of the present disclosure. That is, a piece of geometry may be a portion of an object. For example, object 810 may be split into pieces, such that the geometry used or generated by a draw call is subdivided into smaller pieces of geometry. In one embodiment, each piece of geometry is approximately the size for which the position cache and/or parameter cache are allocated. In that case, information (e.g., one or more hints) is generated for those smaller pieces of geometry during geometry testing, and that information is used by the rendering GPUs, as described previously.

例如，对象810被分割成更小的对象，使得用于区域测试的几何图形对应于这些更小的对象。如图所示，对象810被分割成几何图形“a”、“b”、“c”、“d”、“e”和“f”。在几何图形预测试之后，GPU-A只渲染几何图形“a”、“b”、“c”、“d”和“e”。也就是说，GPU-A可以跳过渲染几何图形“f”。此外，在几何图形预测试之后，GPU-B仅渲染几何图形“d”、“e”和“f”。也就是说，GPU-B可以跳过渲染几何图形“a”、“b”和“c”。For example, object 810 is segmented into smaller objects such that the geometry used for area testing corresponds to these smaller objects. As shown, object 810 is segmented into geometric shapes "a", "b", "c", "d", "e" and "f". After the geometry pretest, GPU-A only renders geometries "a", "b", "c", "d" and "e". That is, GPU-A can skip rendering geometry "f". Furthermore, after the geometry pre-test, GPU-B only renders geometries "d", "e" and "f". That is, GPU-B can skip rendering geometries "a", "b", and "c".

在一个实施方案中，由于几何图形处理级被配置为执行顶点处理和图元处理两者，因此可能在几何图形处理级中使用着色器对一个几何图形执行几何图形预测试。例如，几何图形处理级生成信息(例如提示)，诸如通过针对GPU屏幕区域测试几何图形的边界平截头体，这可以由软件着色器操作执行。在一个实施方案中，该测试通过使用专用指令或通过硬件实现的指令来加速，从而实现软件/硬件解决方案。也就是说，一个或多个专用指令用于加速关于几何图形及其与屏幕区域的关系的信息的生成。例如，一个几何图形的图元的顶点的齐次坐标作为输入提供给几何图形处理级的几何图形预测试指令。该测试可以为每个GPU生成布尔返回值，该布尔返回值指示图元是否与分配给该GPU用于对象渲染的任何屏幕区域(例如，在对应组中)重叠。这样，在几何图形预测试期间生成的关于对应几何图形及其与屏幕区域的关系的信息(例如提示)在几何图形处理级中由着色器生成。In one embodiment, since the geometry processing stage is configured to perform both vertex processing and primitive processing, it is possible to perform geometry pretesting on one geometry using a shader in the geometry processing stage. For example, the geometry processing stage generates information (eg, hints), such as by testing the bounding frustum of the geometry against the GPU screen area, which may be performed by a software shader operation. In one embodiment, the test is accelerated through the use of dedicated instructions or instructions implemented through hardware, thereby implementing a software/hardware solution. That is, one or more specialized instructions are used to accelerate the generation of information about the geometry and its relationship to the screen area. For example, the homogeneous coordinates of the vertices of a geometry's primitives are provided as input to the geometry pretest instructions of the geometry processing stage. This test can generate a Boolean return value for each GPU that indicates whether the primitive overlaps any screen area assigned to that GPU for object rendering (for example, in the corresponding group). In this way, the information (such as hints) generated during geometry pretesting about the corresponding geometry and its relationship to the screen area is generated by the shader in the geometry processing stage.
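The per-GPU Boolean result described above can be illustrated with a simplified software test. The sketch below is an assumption-laden stand-in: it tests a screen-space axis-aligned bounding box rather than a bounding frustum in homogeneous coordinates, and the function name `gpus_overlapped`, the tiled `pattern` layout, and the 256-pixel region size are hypothetical choices made for illustration.

```python
def gpus_overlapped(bbox, region_size, pattern):
    """Return {gpu: bool} for a screen-space bounding box.

    bbox: (x_min, y_min, x_max, y_max) in pixels, non-negative, max edges exclusive.
    pattern: 2D list of GPU names, tiled across the screen (an assumed layout).
    """
    x0, y0, x1, y1 = bbox
    rows, cols = len(pattern), len(pattern[0])
    result = {gpu: False for row in pattern for gpu in row}
    # Visit every region the box touches and mark the GPU that owns it.
    for ry in range(y0 // region_size, (y1 - 1) // region_size + 1):
        for rx in range(x0 // region_size, (x1 - 1) // region_size + 1):
            result[pattern[ry % rows][rx % cols]] = True
    return result
```

A bounding-box test of this kind is conservative: it may report an overlap that the actual triangles do not have, which is acceptable when the result is treated as a hint rather than a hard dependency.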

在另一个实施方案中，可以在硬件光栅化级执行对一个几何图形的几何图形预测试。例如，硬件扫描转换器可以被配置为执行几何图形预测试，使得扫描转换器生成关于分配给多个GPU用于对应图像帧的对象渲染的所有屏幕区域的信息。In another embodiment, geometry pretesting of a geometry may be performed at the hardware rasterization level. For example, a hardware scan converter may be configured to perform geometry pretesting such that the scan converter generates information about all screen areas allocated to multiple GPUs for object rendering of corresponding image frames.

In yet another embodiment, a piece of geometry may be a primitive. That is, the portion of an object used for the geometry pre-test may be a primitive. Thus, the information (e.g., hints) generated by one GPU during the geometry pre-test indicates whether individual triangles (e.g., representing primitives) need to be rendered by another rendering GPU.

In one embodiment, the information generated during the geometry pre-test and shared among the GPUs used for rendering includes the number of primitives (e.g., a surviving primitive count) that overlap any screen region (e.g., in the corresponding set) assigned to a corresponding GPU for object rendering. The information may also include the number of vertices used to build or define those primitives, that is, a surviving vertex count. When rendering, the corresponding rendering GPU can therefore use the supplied vertex count to allocate space in the position cache and parameter cache. For example, in one embodiment, no space is allocated for vertices that are not needed, which can improve rendering efficiency.
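As a rough illustration of the surviving counts, the sketch below counts the primitives that overlap one GPU's regions and the distinct vertices those primitives reference. The indexed-triangle representation and the name `surviving_counts` are assumptions made for this example; the disclosure does not prescribe a data format.

```python
def surviving_counts(triangles, overlaps):
    """Count surviving primitives for one GPU and the distinct vertices they use.

    triangles: list of 3-tuples of vertex indices.
    overlaps: list of bools, one per triangle (True if it overlaps the GPU's regions).
    """
    kept = [tri for tri, hit in zip(triangles, overlaps) if hit]
    vertices = {v for tri in kept for v in tri}
    return len(kept), len(vertices)
```

The second value is the figure a rendering GPU could use to size its position-cache and parameter-cache allocations, since shared vertices are counted once.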

In other embodiments, the information (e.g., hints) generated during the geometry pre-test includes the specific primitives (e.g., the surviving primitives as an exact match) that overlap any screen region (e.g., in the corresponding set) assigned to a corresponding GPU for object rendering. That is, the information generated for a rendering GPU includes a specific set of primitives to render. The information may also include the specific vertices used to build or define those primitives. That is, the information generated for a rendering GPU includes a specific set of vertices to render. For example, this information can save the other rendering GPUs time in their geometry processing stages when they render the piece of geometry.

在其他实施方案中，可能存在与在几何图形测试期间生成信息相关联的处理开销(软件或硬件)。在这种情况下，跳过为某些几何图形生成信息可能是有益的。也就是说，作为提示提供的信息是为某些对象生成的，而不是为其他对象生成的。例如，表示天空盒或大片地形的几何图形(例如，对象或对象的一部分)可以包括大的三角形。在这种情况下，用于图像帧或图像帧序列中的一个或多个图像帧中的每一个的多GPU渲染的每个GPU都可能需要渲染这些几何图形。也就是说，可以根据对应的几何图形的性质来生成或不生成信息。In other embodiments, there may be processing overhead (software or hardware) associated with generating information during geometry testing. In this case, it may be beneficial to skip generating information for certain geometries. That is, the information provided as a hint is generated for some objects but not for others. For example, geometry (eg, an object or a portion of an object) that represents a skybox or a large swath of terrain may include large triangles. In this case, each GPU used for multi-GPU rendering of the image frame or each of one or more image frames in the sequence of image frames may need to render these geometries. That is, information may or may not be generated depending on the properties of the corresponding geometric figure.

Figures 9A-9C illustrate various strategies for assigning screen regions to corresponding GPUs when multiple GPUs collaborate to render a single image, in accordance with one embodiment of the present disclosure. To achieve GPU processing efficiency, various techniques can be used when dividing the screen into regions, such as increasing or decreasing the number of regions (e.g., choosing the right number of regions), interleaving regions, increasing or decreasing the number of regions used for interleaving, and selecting a particular pattern when interleaving regions. For example, multiple GPUs are configured to perform multi-GPU rendering of geometry for image frames generated by an application by pre-testing the geometry against interleaved screen regions before rendering objects in the corresponding image. The configurations of screen regions in Figures 9A-9C are designed to reduce any rendering time imbalance between the GPUs. The complexity of the test (e.g., for overlap with the corresponding screen regions) varies with how the screen regions are assigned to the GPUs. In the diagrams of Figures 9A-9C, bold box 910 is the outline of the corresponding screen or display used when rendering an image.

在一个实施方案中，多个屏幕区域或多个区域中的每一个都具有统一的大小。在一个实施方案中，多个屏幕区域中的每一个在大小上是不统一的。在又一个实施方案中，多个屏幕区域中的屏幕区域的数量和大小动态地改变。In one embodiment, the multiple screen areas or each of the multiple areas have a uniform size. In one embodiment, each of the plurality of screen areas is not uniform in size. In yet another embodiment, the number and size of the screen areas in the plurality of screen areas dynamically change.

In particular, Figure 9A illustrates a simple pattern 900A for screen 910. Each screen region is uniform in size. For example, each region may be a rectangle whose dimensions in pixels are powers of 2, such as 256x256 pixels. As shown, the region assignment is a checkerboard pattern in which one row of A and B regions alternates with another row of B and C regions. Pattern 900A is easy to test against during the geometry pre-test. However, it may lead to some rendering inefficiency. For example, the screen area assigned to each GPU differs substantially (i.e., regions C and D have less coverage in screen 910), which may cause an imbalance in rendering time between the GPUs.

Figure 9B illustrates a pattern 900B of screen regions for screen 910. Each screen region or sub-region is uniform in size. The screen regions are assigned and distributed so as to reduce the imbalance in rendering time between the GPUs. For example, assigning GPUs to screen regions according to pattern 900B results in a nearly equal number of screen pixels being assigned to each GPU across screen 910. That is, the screen regions are assigned to the GPUs so as to equalize screen area, or coverage, in screen 910. For example, if each region is 256x256 pixels in size, each set of regions has approximately the same coverage in screen 910. In particular, the set of screen regions A covers an area of 6x256x256 pixels, the set of screen regions B covers an area of 5.75x256x256 pixels, the set of screen regions C covers an area of 5.5x256x256 pixels, and the set of screen regions D covers an area of 5.5x256x256 pixels.
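The coverage comparison between patterns can be made concrete by summing, per GPU, the pixels of each region after clipping it to the screen. The sketch below is illustrative only: the `coverage` function and the tiled 2x2 `pattern` used in the usage note are assumptions, and they do not reproduce the exact layouts of Figures 9A or 9B.

```python
from collections import Counter

def coverage(width, height, region_size, pattern):
    """Sum the on-screen pixels assigned to each GPU for a tiled region pattern.

    pattern: 2D list of GPU names tiled across the screen (an assumed layout).
    Regions that extend past the screen edge are clipped to the screen.
    """
    rows, cols = len(pattern), len(pattern[0])
    pixels = Counter()
    for ry, y in enumerate(range(0, height, region_size)):
        h = min(region_size, height - y)
        for rx, x in enumerate(range(0, width, region_size)):
            w = min(region_size, width - x)
            pixels[pattern[ry % rows][rx % cols]] += w * h
    return dict(pixels)
```

For instance, `coverage(512, 384, 256, [["A", "B"], ["C", "D"]])` gives GPUs A and B full 256x256 regions while C and D receive only clipped 256x128 regions, the kind of imbalance that a redistributed pattern is meant to reduce.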

Figure 9C illustrates a pattern 900C of screen regions for screen 910. The screen regions are not uniform in size. That is, the screen regions assigned to the GPUs for responsibility in rendering objects may differ in size. In particular, screen 910 is divided so that each GPU is assigned the same number of pixels. For example, if a 4K display (3840x2160) is divided vertically into four equal regions, each region is 540 pixels tall. However, GPUs typically perform many operations in blocks of 32x32 pixels, and 540 pixels is not a multiple of 32 pixels. Thus, in one embodiment, pattern 900C may include blocks that are 512 pixels tall (a multiple of 32) and other blocks that are 544 pixels tall (also a multiple of 32). Other embodiments may use blocks of different sizes. Pattern 900C shows an equal number of screen pixels assigned to each GPU through the use of non-uniform screen regions.

在又一个实施方案中，应用在执行图像渲染时的需求随时间变化，并且屏幕区域是动态选择的。例如，如果已知大部分渲染时间都花在屏幕的下半部分，那么以这样一种方式分配区域是有利的，即显示器下半部分的几乎等量的屏幕像素被分配给用于渲染对应图像的每个GPU。也就是说，分配给用于渲染对应图像的每个GPU的区域可以动态地改变。例如，可以基于游戏模式、不同的游戏、屏幕大小、为区域选择的模式等应用改变。In yet another embodiment, the application's needs in performing image rendering change over time, and the screen area is dynamically selected. For example, if it is known that most of the rendering time is spent in the lower half of the screen, it is advantageous to allocate the area in such a way that an almost equal amount of screen pixels in the lower half of the display is allocated for rendering the corresponding image of each GPU. That is, the area allocated to each GPU for rendering the corresponding image can dynamically change. For example, changes can be applied based on game mode, different games, screen size, mode selected for the region, etc.

Figure 10 is a diagram illustrating various distributions of the assignment of GPUs to pieces of geometry for performing the geometry pre-test, in accordance with one embodiment of the present disclosure. That is, Figure 10 shows how responsibility for generating information during the geometry pre-test is distributed among multiple GPUs. As described previously, each GPU is assigned a corresponding portion of the geometry of an image frame, where that portion may be further divided into objects, portions of objects, a piece of geometry, several pieces of geometry, and so on. The geometry pre-test includes determining whether a particular piece of geometry overlaps any screen region or regions assigned to a corresponding GPU for object rendering. In embodiments, the geometry pre-test is typically performed by the GPUs simultaneously for all of the geometry (e.g., all pieces of geometry) of the corresponding image frame. In this way, geometry testing is performed cooperatively by the GPUs, allowing each GPU to know which pieces of geometry to render and which to skip, as described previously.

如图10所示，每个几何图形可以是对象、对象的一部分等。例如，几何图形可以是对象的部分，诸如几何图形的大小大致是被分配的位置高速缓存和/或参数高速缓存的大小，如前所述。纯粹为了说明，对象0(例如，由渲染命令缓冲区700A中的命令722指定渲染)被分割成几何图形“a”、“b”、“c”、“d”、“e”和“f”，诸如图8B中的对象810。同样，对象1(例如，由渲染命令缓冲区700A中的命令724指定渲染)被分割成几何图形“g”、“h”和“i”。此外，对象2(例如，由渲染命令缓冲区700A中的命令724指定渲染)被分割成几何图形“j”、“k”、“l”、“m”、“n”和“o”。为了将几何图形测试的责任分布给GPU，可以对这些几何图形进行排序(例如，a-o)。As shown in Figure 10, each geometric figure can be an object, a part of an object, etc. For example, the geometry may be part of an object, such that the size of the geometry is approximately the size of the allocated location cache and/or parameter cache, as previously described. Purely for illustration, object 0 (eg, specified for rendering by command 722 in render command buffer 700A) is split into geometries "a", "b", "c", "d", "e", and "f" , such as object 810 in Figure 8B. Likewise, object 1 (eg, specified for rendering by command 724 in render command buffer 700A) is split into geometries "g", "h", and "i". Additionally, object 2 (eg, specified for rendering by command 724 in render command buffer 700A) is split into geometries "j", "k", "l", "m", "n", and "o". To distribute the responsibility for geometry testing to the GPU, the geometries can be ordered (e.g., a-o).

Distribution 1010 (e.g., the row ABCDABCDABCD...) shows an even distribution of responsibility for performing geometry testing among the GPUs. In particular, rather than having one GPU take the first quarter of the geometry as a block (e.g., GPU-A taking the first four of approximately sixteen pieces, "a", "b", "c", and "d", for geometry testing), a second GPU take the second quarter, and so on, the assignment to GPUs is interleaved. That is, successive pieces of geometry are assigned to different GPUs. For example, piece "a" is assigned to GPU-A, piece "b" to GPU-B, piece "c" to GPU-C, piece "d" to GPU-D, piece "e" to GPU-A, piece "f" to GPU-B, piece "g" to GPU-C, and so on. As a result, the processing for geometry testing is roughly balanced among the GPUs (e.g., GPU-A, GPU-B, GPU-C, and GPU-D).

Distribution 1020 (e.g., the row ABBCDABBCDABBCD...) shows an asymmetric distribution of responsibility for performing geometry testing among the GPUs. An asymmetric distribution may be advantageous when some GPUs have more time than others to perform geometry testing while rendering the corresponding image frame. For example, one GPU may have finished rendering the objects of the previous frame or frames of a scene earlier than the other GPUs, and so (because it is expected to finish earlier in this frame as well) it can be assigned more pieces of geometry for geometry testing. Again, the assignment to GPUs is interleaved. As shown, GPU-B is assigned more pieces of geometry for the geometry pre-test than the other GPUs. For example, piece "a" is assigned to GPU-A, piece "b" to GPU-B, piece "c" also to GPU-B, piece "d" to GPU-C, piece "e" to GPU-D, piece "f" to GPU-A, piece "g" to GPU-B, piece "h" also to GPU-B, piece "i" to GPU-C, and so on. Although the assignment of geometry testing to the GPUs may be unbalanced, the combined processing of the complete phase (e.g., geometry pre-test plus rendering of geometry) may end up roughly balanced (e.g., each GPU spends approximately the same amount of time performing the geometry pre-test and rendering).
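Both distributions amount to cycling through a repeating pattern of GPU names while walking the ordered pieces of geometry. The following is a minimal sketch under that reading; `assign_pretests` is a hypothetical helper name, and representing a distribution as a string such as "ABCD" or "ABBCD" is an assumption made for brevity.

```python
from itertools import cycle

def assign_pretests(pieces, pattern):
    """Map each piece of geometry to a GPU by cycling through the pattern.

    pieces: ordered iterable of piece identifiers (e.g., "a" through "o").
    pattern: repeating sequence of GPU names, e.g., "ABCD" or "ABBCD".
    """
    return dict(zip(pieces, cycle(pattern)))
```

With the pattern "ABBCD", GPU-B appears twice per cycle and so receives twice as many pieces as each of the other GPUs, matching the asymmetric row described above.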

Figures 11A-11B illustrate the use of statistics from one or more image frames when assigning responsibility for performing geometry testing among multiple GPUs. For example, based on the statistics, some GPUs may process more or fewer pieces of geometry during geometry testing to generate the information that is useful when rendering.

In particular, Figure 11A is a diagram illustrating geometry pre-testing and rendering of a previous image frame by multiple GPUs, and the use of statistics collected during that rendering to influence the assignment of pre-testing of the current image frame's geometry to the multiple GPUs, in accordance with one embodiment of the present disclosure. Purely for illustration, in the second frame 1100B of Figure 11A, GPU-B processes twice as many pieces of geometry (e.g., during pre-testing) as each of the other GPUs (e.g., GPU-A, GPU-C, and GPU-D). The assignment of more pieces of geometry to GPU-B for the geometry pre-test in the current image frame is based on statistics collected during rendering of the previous image frame or of several previous image frames.

For example, timing diagram 1100A shows geometry pre-test 701A and rendering 702A for the previous image frame, with four GPUs (e.g., GPU-A, GPU-B, GPU-C, and GPU-D) used for both processes. The pieces of geometry of the previous image frame are assigned evenly among the GPUs, as shown by the roughly balanced execution of geometry pre-test 701A by each GPU.

Rendering statistics collected from one or more image frames can be used to determine how to perform geometry testing and rendering for the current image frame. That is, the statistics may be provided as information for use when performing geometry testing and rendering of a subsequent image frame (e.g., the current image frame). For example, statistics collected during rendering of the objects (e.g., pieces of geometry) of the previous image frame may indicate that GPU-B finished rendering earlier than the other GPUs. In particular, GPU-B has idle time 1130A after rendering its portions of the geometry that overlap any screen region (e.g., in the corresponding set) assigned to GPU-B for object rendering. Each of the other GPUs (GPU-A, GPU-C, and GPU-D) performs rendering until approximately the end 710 of the corresponding frame period of the previous image frame.

当执行应用时，可以为特定场景生成前一图像帧和当前图像帧。因此，从一个场景到另一个场景的对象在数量和位置上可能大致相似。在这种情况下，在图像帧序列中的多个图像帧之间，对于GPU执行几何图形预测试和渲染的时间将是相似的。也就是说，基于统计数据合理推测，GPU-B在执行当前图像帧中的几何图形测试和渲染时也会有空闲时间。因此，GPU-B可以被分配更多的几何图形用于当前帧中的几何图形预测试。例如，通过在几何图形预测试期间让GPU-B处理更多几何图形，结果是GPU-B在渲染当前图像帧中的对象后与其他GPU以大致相同的时间完成。也就是说，GPU-A、GPU-B、GPU-C和GPU-D中的每一个执行渲染大约直到当前图像帧的对应帧周期的结束711。在一个实施方案中，减少了渲染当前图像帧的总时间，使得在使用渲染统计数据时渲染当前图像帧花费更少的时间。因此，前一帧和/或前多帧的渲染的统计数据可用于调整几何图形预测试，诸如在当前图像帧中在GPU之间分配几何图形(例如多个几何图形)的分布。When the application is executed, the previous image frame and the current image frame can be generated for a specific scene. Therefore, objects may be roughly similar in number and location from one scene to another. In this case, the time it takes for the GPU to perform geometry pretesting and rendering will be similar between multiple image frames in the image frame sequence. That is, it is reasonable to speculate based on statistical data that GPU-B will also have idle time while performing geometry testing and rendering in the current image frame. Therefore, GPU-B can be allocated more geometries for geometry pre-testing in the current frame. For example, by having GPU-B process more geometry during the geometry pretest, the result is that GPU-B finishes rendering the objects in the current image frame in roughly the same time as the other GPUs. That is, each of GPU-A, GPU-B, GPU-C, and GPU-D performs rendering approximately until the end 711 of the corresponding frame period of the current image frame. In one embodiment, the total time to render the current image frame is reduced such that it takes less time to render the current image frame when rendering statistics are used. Accordingly, the statistics of the rendering of the previous frame and/or multiple previous frames may be used to adjust the geometry pretest, such as the distribution of geometry (eg, multiple geometries) across GPUs in the current image frame.

Figure 11B is a flowchart 1100B illustrating a method for graphics processing, including geometry pre-testing and rendering of a previous image frame by multiple GPUs, and the use of statistics collected during that rendering to influence the assignment of pre-testing of the current image frame's geometry to the multiple GPUs, in accordance with one embodiment of the present disclosure. The diagram of Figure 11A illustrates the use of statistics in the method of flowchart 1100B to determine how pieces of geometry are distributed among the GPUs for an image frame. As described previously, various architectures may include multiple GPUs collaborating to render a single image by performing multi-GPU rendering of geometry for an application, such as within one or more cloud gaming servers of a cloud gaming system, or within a stand-alone system (such as a personal computer or gaming console that includes a high-end graphics card having multiple GPUs), and so on.

In particular, at 1110, the method includes rendering graphics for an application using a plurality of GPUs, as previously described. At 1120, the method includes dividing responsibility for rendering the geometry of the graphics among the plurality of GPUs based on a plurality of screen regions. Each GPU has a corresponding division of responsibility that is known to the plurality of GPUs. More specifically, each GPU is responsible for rendering geometry in a corresponding set of screen regions of the plurality of screen regions, where the corresponding set includes one or more screen regions, as previously described. In one embodiment, the screen regions are interleaved (e.g., when the display is divided into sets of screen regions for geometry pre-testing and rendering).
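
The division of responsibility at 1120 can be pictured with a minimal sketch. It assumes four GPUs and a repeating 2x2 interleave of regions; the pattern and the names `owner_gpu` and `regions_for_gpu` are hypothetical illustrations, not taken from the disclosure, which does not fix a particular interleaving.

```python
# Hypothetical interleaving: a repeating 2x2 pattern of regions over four GPUs.
# GPU indices 0..3 stand in for GPU-A..GPU-D.

def owner_gpu(region_x: int, region_y: int) -> int:
    """Return the index of the GPU responsible for the region at grid
    position (region_x, region_y)."""
    return (region_x % 2) + 2 * (region_y % 2)


def regions_for_gpu(gpu: int, cols: int, rows: int) -> list[tuple[int, int]]:
    """List every region of a cols x rows grid that belongs to one GPU.
    Because the mapping is a pure function, every GPU can compute the
    division of responsibility for all GPUs, not only its own."""
    return [(x, y) for y in range(rows) for x in range(cols)
            if owner_gpu(x, y) == gpu]


if __name__ == "__main__":
    for gpu in range(4):
        print(gpu, regions_for_gpu(gpu, cols=4, rows=4))
```

Making the mapping a pure function of the region coordinates is one simple way to make the division "known to the plurality of GPUs" without any communication between them.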

At 1130, the method includes rendering, at the plurality of GPUs, a first plurality of geometries of a previous image frame generated by the application. For example, timing diagram 1100A illustrates the timing of performing geometry testing of the geometry in the previous image frame and rendering of the objects (e.g., geometry). At 1140, the method includes generating statistics for the rendering of the previous image frame. That is, the statistics may be collected while the previous image frame is being rendered.

At 1150, the method includes assigning, based on the statistics, a second plurality of geometries of the current image frame generated by the application to the plurality of GPUs for geometry testing. That is, when rendering the next or current image frame, the statistics may be used to assign the same amount of geometry, less geometry, or more geometry to a particular GPU for geometry testing. In some cases, the statistics may indicate that the geometries in the second plurality of geometries should be assigned evenly to the plurality of GPUs when performing geometry testing.

In other cases, the statistics may indicate that the geometries in the second plurality of geometries should be assigned non-uniformly to the plurality of GPUs when performing geometry testing. For example, as shown in timeline 1100A, the statistics may indicate that GPU-B finished rendering before any of the other GPUs in the previous image frame. In particular, it may be determined that a first GPU (e.g., GPU-B) finished rendering the first plurality of geometries before a second GPU (e.g., GPU-A) finished rendering the first plurality of geometries (e.g., a portion of the geometry). As previously described, the first GPU (e.g., GPU-B) renders one or more geometries of the first plurality of geometries that overlap any screen region assigned to the first GPU for object rendering, and the second GPU (e.g., GPU-A) renders one or more geometries of the first plurality of geometries that overlap any screen region assigned to the second GPU for object rendering. Therefore, because the statistics suggest that the first GPU (e.g., GPU-B) will need less time than the second GPU (e.g., GPU-A) to render the second plurality of geometries, more geometries can be assigned to the first GPU for geometry pre-testing when rendering the current image frame. For example, a first number of the second plurality of geometries may be assigned to the first GPU (e.g., GPU-B) for geometry testing, and a second number of the second plurality of geometries may be assigned to the second GPU (e.g., GPU-A) for geometry testing, where the first number is higher than the second number (if the time imbalance is large enough, GPU-A may not be assigned any geometry at all). In this manner, GPU-B processes more geometry than GPU-A during geometry testing. For example, timing diagram 1100B shows that GPU-B has been assigned more geometry and spends more time performing geometry testing than the other GPUs.
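
One way to turn previous-frame statistics into a non-uniform assignment is sketched below. The heuristic is an assumption made for illustration only: each GPU receives pre-test work in proportion to how much earlier it finished rendering than the slowest GPU in the previous frame, so the slowest GPU may receive none. The function name `allocate_pretest_counts` and the sample timings are hypothetical.

```python
def allocate_pretest_counts(prev_render_ms: list[float],
                            num_geometries: int) -> list[int]:
    """Split num_geometries pre-test jobs across GPUs.

    prev_render_ms[i] is the time GPU i spent rendering the previous frame.
    Assumed heuristic: weight each GPU by its spare time relative to the
    slowest GPU; fall back to an even split when all GPUs took equally long.
    """
    n = len(prev_render_ms)
    slowest = max(prev_render_ms)
    spare = [slowest - t for t in prev_render_ms]
    total_spare = sum(spare)

    if total_spare == 0:
        base, extra = divmod(num_geometries, n)
        return [base + (1 if i < extra else 0) for i in range(n)]

    # Largest-remainder rounding so the counts sum exactly to num_geometries.
    exact = [num_geometries * s / total_spare for s in spare]
    counts = [int(e) for e in exact]
    leftover = num_geometries - sum(counts)
    by_fraction = sorted(range(n), key=lambda i: exact[i] - counts[i],
                         reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # GPU-A was slowest and GPU-B fastest in the previous frame.
    print(allocate_pretest_counts([10.0, 6.0, 9.0, 9.0], 12))
```

With these sample timings the GPU that finished earliest receives most of the pre-test work and the slowest GPU receives none, matching the case in which GPU-A is not assigned any geometry at all. A production scheme would likely smooth the statistics over several frames.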

At 1160, the method includes performing geometry pre-testing on the second plurality of geometries of the current image frame to generate information about each of the second plurality of geometries and its relationship to each of the plurality of screen regions. Geometry pre-testing is performed at each of the plurality of GPUs based on the assignment. That is, geometry pre-testing is performed at a pre-testing GPU on a plurality of geometries of the image frame generated by the application, to generate information about each geometry and its relationship to each of the plurality of screen regions.
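
As a rough sketch of what the pre-test at 1160 might compute, the code below tests a screen-space bounding box against a grid of regions and records which GPUs own at least one overlapped region. The bounding-box test, the 64-pixel region size, the 2x2 interleave, and the names `REGION_SIZE`, `owner_gpu`, and `pretest` are all assumptions; the disclosure does not prescribe this particular test.

```python
REGION_SIZE = 64  # hypothetical region edge length in pixels


def owner_gpu(region_x: int, region_y: int) -> int:
    """Hypothetical 2x2 interleave of regions over four GPUs."""
    return (region_x % 2) + 2 * (region_y % 2)


def pretest(bbox: tuple[int, int, int, int]) -> set[int]:
    """Return the set of GPU indices whose regions the geometry overlaps.

    bbox is (x0, y0, x1, y1) in pixels with inclusive bounds; it is a
    conservative stand-in for the geometry's true screen coverage.
    """
    x0, y0, x1, y1 = bbox
    gpus = set()
    for ry in range(y0 // REGION_SIZE, y1 // REGION_SIZE + 1):
        for rx in range(x0 // REGION_SIZE, x1 // REGION_SIZE + 1):
            gpus.add(owner_gpu(rx, ry))
    return gpus


if __name__ == "__main__":
    print(pretest((0, 0, 10, 10)))     # small object inside one region
    print(pretest((60, 0, 70, 10)))    # object straddling two regions
    print(pretest((0, 0, 127, 127)))   # object covering a full 2x2 block
```

The result for one geometry is valid for every GPU, which is why a single GPU can pre-test a geometry on behalf of all of them.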

At 1170, the method includes using the information generated for each of the second plurality of geometries to render the plurality of geometries during a rendering phase (e.g., including fully rendering a geometry at the corresponding GPU or skipping rendering of that geometry). In embodiments, rendering is typically performed simultaneously at each of the GPUs. In particular, the plurality of geometries of the current image frame is rendered at each of the plurality of GPUs using the information generated for each of the geometries.
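
The render-or-skip decision at 1170 can be sketched as follows, assuming the pre-test information takes the form of a mapping from a geometry identifier to the set of GPU indices whose regions it overlaps. That representation and the name `render_lists` are hypothetical.

```python
def render_lists(pretest_info: dict[str, set[int]],
                 num_gpus: int) -> dict[int, list[str]]:
    """Build, for each GPU, the list of geometries it must render.

    A geometry that does not appear in a GPU's list is skipped by that GPU.
    Submission order is preserved so that each GPU renders in frame order.
    """
    lists: dict[int, list[str]] = {gpu: [] for gpu in range(num_gpus)}
    for geometry_id, gpus in pretest_info.items():
        for gpu in sorted(gpus):
            lists[gpu].append(geometry_id)
    return lists


if __name__ == "__main__":
    info = {"obj0": {0}, "obj1": {0, 1}, "obj2": {3}}
    for gpu, geometries in render_lists(info, num_gpus=4).items():
        print(f"GPU {gpu} renders {geometries}")
```

In this example GPU 2 renders nothing, since no geometry overlaps any of its regions.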

In other embodiments, the distribution of geometry to the GPUs for generating the information is adjusted dynamically. That is, the assignment of the geometry of the current image frame for geometry pre-testing can be adjusted dynamically during rendering of the current image frame. For example, in the example of timing diagram 1100B, it may be determined that GPU-A is performing geometry pre-testing on its assigned geometry at a slower rate than expected. Geometry assigned to GPU-A for geometry pre-testing can therefore be reassigned on the fly, such as by reassigning a geometry from GPU-A to GPU-B, so that GPU-B is now tasked with performing geometry pre-testing on that geometry during the frame period used for rendering the current image frame.
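
A minimal sketch of such on-the-fly reassignment is shown below, assuming each GPU has a queue of geometries still awaiting pre-testing and that some external monitor has already flagged one GPU as slow. The queue model, the choice of the least-loaded GPU as the target, and the name `rebalance` are illustrative assumptions.

```python
from collections import deque


def rebalance(queues: dict[str, deque], slow_gpu: str):
    """Move one pending geometry from slow_gpu to the GPU with the fewest
    pending pre-tests. Requires at least two GPUs in queues.

    Returns (geometry, target_gpu), or None if slow_gpu has nothing pending.
    """
    if not queues[slow_gpu]:
        return None
    target = min((gpu for gpu in queues if gpu != slow_gpu),
                 key=lambda gpu: len(queues[gpu]))
    geometry = queues[slow_gpu].pop()  # take work not yet started
    queues[target].append(geometry)
    return geometry, target


if __name__ == "__main__":
    pending = {
        "GPU-A": deque(["g0", "g1", "g2"]),
        "GPU-B": deque(),
        "GPU-C": deque(["g3"]),
    }
    print(rebalance(pending, "GPU-A"))
    print({gpu: list(q) for gpu, q in pending.items()})
```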

FIGS. 12A-12B illustrate another strategy for processing a rendering command buffer. Previously, a strategy was described with reference to FIGS. 7A-7C in which a command buffer contains commands for geometry pre-testing of objects (e.g., geometry), followed by commands for rendering the objects (e.g., geometry). FIGS. 12A-12B show a geometry pre-testing and rendering strategy that uses shaders capable of performing either operation depending on the GPU configuration.

In particular, FIG. 12A is a diagram illustrating the use of shaders configured to perform both pre-testing and rendering of the geometry of an image frame in two passes through a portion of command buffer 1200A, in accordance with one embodiment of the present disclosure. That is, the shaders used to execute the commands in command buffer 1200A may be configured to perform geometry pre-testing when configured accordingly, or to perform rendering when configured accordingly.

As shown, the portion of command buffer 1200A shown in FIG. 12A is executed twice, with each execution resulting in a different action: the first execution results in geometry pre-testing being performed, and the second execution results in rendering of the geometry being performed. This can be accomplished in a variety of ways. For example, the portion of the command buffer depicted in 1200A may be explicitly called twice as a subroutine, with different state (e.g., register settings or values in RAM) explicitly set to different values before each call. Alternatively, the portion of the command buffer depicted in 1200A may be implicitly executed twice, e.g., by using special commands to mark the beginning and end of the portion to be executed twice, and by implicitly setting a different configuration (e.g., register settings) for the first and second executions of the portion of the command buffer. When commands in the portion of command buffer 1200A are executed (e.g., commands that set state or commands that execute shaders), the results of the commands differ based on the GPU state (e.g., resulting in geometry pre-testing being performed versus rendering being performed). That is, the commands in command buffer 1200A can be configured for geometry pre-testing or for rendering. In particular, the portion of command buffer 1200A includes commands for configuring the state of the one or more GPUs executing commands from rendering command buffer 1200A, and commands for executing shaders that perform geometry pre-testing or rendering depending on the state. For example, commands 1210, 1212, 1214, and 1216 are each used to configure the state of the one or more GPUs for executing a shader that performs geometry pre-testing or rendering depending on the state. As shown, command 1210 configures GPU state so that shader 0 can be executed via command 1211 to perform geometry pre-testing or rendering. Likewise, command 1212 configures GPU state so that shader 1 can be executed via command 1213 to perform geometry pre-testing or rendering. In addition, command 1214 configures GPU state so that shader 2 can be executed via command 1215 to perform geometry pre-testing or rendering. Finally, command 1216 configures GPU state so that shader 3 can be executed via command 1217 to perform geometry pre-testing or rendering.

On the first pass 1291 through command buffer 1200A, based on the GPU state set explicitly or implicitly as described above, and on the GPU state configured by commands 1210, 1212, 1214, and 1216, the corresponding shaders perform geometry pre-testing. For example, shader 0 is configured to perform geometry pre-testing on object 0 (e.g., one geometry) (e.g., based on the objects shown in FIG. 7B-1), shader 1 is configured to perform geometry pre-testing on object 1, shader 2 is configured to perform geometry pre-testing on object 2, and shader 3 is configured to perform geometry pre-testing on object 3.

In one embodiment, commands may be skipped or interpreted differently based on the GPU state. For example, certain commands that set state (portions of 1210, 1212, 1214, and 1216) may be skipped based on the GPU state set explicitly or implicitly as described above. If shader 0, executed via command 1211 and configured by command 1210, requires less GPU state when configured for geometry pre-testing than when configured for rendering of the geometry, it may be beneficial to skip the unnecessary portions of the GPU state setup, because setting GPU state incurs overhead. As another example, certain commands that set state (portions of 1210, 1212, 1214, and 1216) may be interpreted differently based on the GPU state set explicitly or implicitly as described above, for example if shader 0 requires GPU state configured differently for geometry pre-testing than for rendering of the geometry, or if shader 0 requires different inputs for geometry pre-testing than for rendering of the geometry.
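
Skipping unneeded state setup can be sketched by tagging each state-setting command with the modes that need it. The command names (`vertex_buffers`, `pixel_shader`, `render_targets`) and the tagging scheme are hypothetical; the disclosure does not say which pieces of state a pre-test pass can omit.

```python
# Each state-setting command is tagged with the passes that need it.
# The entries are hypothetical examples of state a pre-test might not need.
STATE_COMMANDS = [
    ("vertex_buffers", {"pretest", "render"}),
    ("pixel_shader", {"render"}),
    ("render_targets", {"render"}),
]


def state_commands_to_apply(state_commands, mode: str) -> list[str]:
    """Return only the state-setting commands needed in the given mode;
    the rest are skipped to avoid their setup overhead."""
    return [name for name, needed_in in state_commands if mode in needed_in]


if __name__ == "__main__":
    print(state_commands_to_apply(STATE_COMMANDS, "pretest"))
    print(state_commands_to_apply(STATE_COMMANDS, "render"))
```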

In one embodiment, shaders configured for geometry pre-testing do not allocate space in the position cache and parameter cache, as previously described. In another embodiment, a single shader is used to perform either pre-testing or rendering. This can be accomplished in a variety of ways, such as via external hardware state that the shader can check (e.g., set explicitly or implicitly as described above), or via an input to the shader (e.g., set by a command that is interpreted differently in the first and second passes through the command buffer).

On the second pass 1292 through command buffer 1200A, based on the GPU state set explicitly or implicitly as described above, and on the GPU state configured by commands 1210, 1212, 1214, and 1216, the corresponding shaders perform rendering of the geometry of the corresponding image frame. For example, shader 0 is configured to perform rendering of object 0 (e.g., one geometry) (e.g., based on the objects shown in FIG. 7B-1). Likewise, shader 1 is configured to perform rendering of object 1, shader 2 is configured to perform rendering of object 2, and shader 3 is configured to perform rendering of object 3.
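
The two passes can be mimicked in software by running the same command list twice under a mode flag that is set outside the list, as in the explicit subroutine variant. This is a simulation under assumed names (`Mode`, `run_section`, and the tuple encoding of commands); it does not model any real GPU command format.

```python
from enum import Enum


class Mode(Enum):
    PRETEST = 0
    RENDER = 1


def run_section(commands, mode: Mode, log: list) -> None:
    """Execute one pass over the shared section of the command buffer.
    The same commands produce different actions depending on mode."""
    for kind, shader_id in commands:
        if kind == "set_state":
            log.append((mode.name, "configure", shader_id))
        elif kind == "run_shader":
            action = "pretest" if mode is Mode.PRETEST else "render"
            log.append((mode.name, action, shader_id))


# Stand-in for commands 1210/1211 and 1212/1213: configure, then run a shader.
section = [("set_state", 0), ("run_shader", 0),
           ("set_state", 1), ("run_shader", 1)]

log: list = []
for mode in (Mode.PRETEST, Mode.RENDER):  # first pass, then second pass
    run_section(section, mode, log)

if __name__ == "__main__":
    for entry in log:
        print(entry)
```

The command list itself never changes between passes; only the externally set mode does, which is the property the two-pass strategy relies on.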

FIG. 12B is a flowchart 1200B illustrating a method for graphics processing in accordance with one embodiment of the present disclosure, including performing both pre-testing and rendering of the geometry of an image frame using the same set of shaders in two passes through a portion of a command buffer. As previously described, various architectures may include multiple GPUs collaborating to render a single image by performing multi-GPU rendering of geometry for an application, such as within one or more cloud gaming servers of a cloud gaming system, or within a stand-alone system (such as a personal computer or gaming console that includes a high-end graphics card having multiple GPUs), etc.

In particular, at 1210, the method includes rendering graphics for an application using a plurality of GPUs, as previously described. At 1220, the method includes dividing responsibility for rendering the geometry of the graphics among the plurality of GPUs based on a plurality of screen regions. Each GPU has a corresponding division of responsibility that is known to the plurality of GPUs. More specifically, each GPU is responsible for rendering geometry in a corresponding set of screen regions of the plurality of screen regions, where the corresponding set includes one or more screen regions, as previously described. In one embodiment, the screen regions are interleaved (e.g., when the display is divided into sets of screen regions for geometry pre-testing and rendering).

At 1230, the method includes assigning a plurality of geometries of an image frame to the plurality of GPUs for geometry testing. In particular, each of the plurality of GPUs is assigned a corresponding portion of the geometry associated with the image frame for geometry testing. As previously described, in embodiments the assignment of geometry may be distributed evenly or unevenly, where each portion includes one or more geometries, or possibly no geometry at all.

At 1240, the method includes loading a first GPU state that configures one or more shaders to perform geometry pre-testing. For example, depending on the GPU state, the corresponding shader can be configured to perform different operations. The first GPU state therefore configures the corresponding shader to perform geometry pre-testing. In the example of FIG. 12A, this state can be set in a variety of ways, such as by setting state explicitly or implicitly outside the portion of the command buffer depicted in 1200A, as described above. For example, the CPU or the GPU may set a value in random access memory (RAM), and the GPU then checks that value in RAM. In another example, the state may be internal to the GPU, such as when the command buffer is called twice as a subroutine and the internal GPU state differs between the two subroutine calls. Alternatively, command 1210 in FIG. 12A may be interpreted differently or skipped based on the state set explicitly or implicitly as described above. Based on this first GPU state, shader 0, executed by command 1211, is configured to perform geometry pre-testing.

At 1250, the method includes performing geometry pre-testing on the plurality of geometries at the plurality of GPUs to generate information about each geometry and its relationship to each of the plurality of screen regions. As previously described, geometry pre-testing can determine whether a geometry overlaps any screen region (e.g., in the corresponding set) assigned to the corresponding GPU for object rendering. Because, in embodiments, geometry pre-testing is typically performed by the GPUs simultaneously for all geometries of the corresponding image frame, each GPU is able to know which geometries to render and which geometries to skip. This concludes the first pass through the command buffer, in which the shaders can be configured to perform each of geometry pre-testing and/or rendering depending on the GPU state.

At 1260, the method includes loading a second GPU state that configures the one or more shaders to perform rendering. As previously described, depending on the GPU state, the corresponding shader can be configured to perform different operations. The second GPU state therefore configures the corresponding shader (the same shader previously used to perform geometry pre-testing) to perform rendering. In the example of FIG. 12A, based on this second GPU state, shader 0, executed by command 1211, is configured to perform rendering.

At 1270, the method includes using, at each of the plurality of GPUs, the information generated for each of the plurality of geometries when rendering the plurality of geometries (e.g., including fully rendering a geometry at the corresponding GPU or skipping rendering of that geometry). As previously described, the information may indicate whether a geometry overlaps any screen region (e.g., in the corresponding set) assigned to the corresponding GPU for object rendering. The information can be used when rendering each of the plurality of geometries at each of the plurality of GPUs, such that each GPU efficiently renders only the geometries that overlap at least one screen region (e.g., in the corresponding set) assigned to that GPU for object rendering. This concludes the second pass through the command buffer, in which the shaders can be configured to perform each of geometry pre-testing and/or rendering depending on the GPU state.

FIGS. 13A-13B illustrate another strategy for processing a rendering command buffer. Previously, a strategy was described with reference to FIGS. 7A-7C in which a command buffer contains commands for geometry pre-testing of objects (e.g., geometry), followed by commands for rendering the objects (e.g., geometry), and another strategy, which uses shaders capable of performing either operation depending on the GPU configuration, was described with reference to FIGS. 12A-12B. FIGS. 13A-13B show a geometry testing and rendering strategy, in accordance with embodiments of the present disclosure, that uses shaders capable of performing either geometry pre-testing or rendering, and in which the processes of geometry pre-testing and rendering are interleaved for different sets of geometries.

In particular, FIG. 13A is a diagram illustrating the use of shaders configured to perform both geometry pre-testing and rendering, in accordance with one embodiment of the present disclosure, where geometry pre-testing and rendering performed for different sets of geometries are interleaved using separate portions of the corresponding command buffer 1300A. That is, rather than executing a portion of command buffer 1300A from beginning to end, command buffer 1300A is dynamically configured and executed such that geometry pre-testing and rendering are interleaved for different sets of geometries. For example, in the command buffer, some shaders (e.g., executed via commands 1311 and 1313) are configured to perform geometry pre-testing on a first set of geometries, and after geometry testing has been performed, those same shaders (e.g., executed via commands 1311 and 1313) are then configured to perform rendering. After rendering has been performed on the first set of geometries, other shaders in the command buffer (e.g., executed via commands 1315 and 1317) are configured to perform geometry pre-testing on a second set of geometries, and after geometry pre-testing has been performed, those same shaders (e.g., executed via commands 1315 and 1317) are then configured to perform rendering, and rendering is performed on the second set of geometries using those commands. The benefit of this strategy is that imbalance between the GPUs can be addressed dynamically, such as by using asymmetric interleaving of geometry testing throughout the rendering process. An example of asymmetric interleaving of geometry testing was previously introduced in distribution 102 of FIG. 10.

Because the interleaving of geometry pre-testing and rendering occurs dynamically, configuration of the GPU (e.g., via register settings or values in RAM) occurs implicitly; that is, one aspect of the GPU configuration occurs outside the command buffer. For example, a GPU register may be set to 0 (indicating that geometry pre-testing should occur) or 1 (indicating that rendering should occur); the interleaved traversal of the command buffer and the setting of this register may be controlled by the GPU based on the number of objects processed, the primitives processed, imbalance between the GPUs, and so on. Alternatively, a value in RAM may be used. As a result of this external configuration (meaning that it is set outside the command buffer), when commands in the portion of command buffer 1300A are executed (e.g., commands that set state or commands that execute shaders), the results of the commands differ based on the GPU state (e.g., resulting in geometry pre-testing being performed versus rendering being performed). That is, the commands in command buffer 1300A can be configured for geometry pre-testing 1391 or for rendering 1392. In particular, the portion of command buffer 1300A includes commands for configuring the state of the one or more GPUs executing commands from rendering command buffer 1300A, and commands for executing shaders that perform geometry pre-testing or rendering depending on the state. For example, commands 1310, 1312, 1314, and 1316 are each used to configure the state of the GPU for executing a shader that performs geometry pre-testing or rendering depending on the state. As shown, command 1310 configures GPU state so that shader 0 can be executed via command 1311 to perform geometry pre-testing or rendering of object 0. Likewise, command 1312 configures GPU state so that shader 1 can be executed via command 1313 to perform geometry pre-testing or rendering of object 1. Likewise, command 1314 configures GPU state so that shader 2 can be executed via command 1315 to perform geometry pre-testing or rendering of object 2. Further, command 1316 configures GPU state so that shader 3 can be executed via command 1317 to perform geometry pre-testing or rendering of object 3.

Geometry pre-testing and rendering may be interleaved for different sets of geometries. For purposes of illustration only, command buffer 1300A may be configured to first perform geometry pre-testing and rendering of objects 0 and 1, and command buffer 1300A is then configured to perform geometry pre-testing and rendering of objects 2 and 3. It is understood that different numbers of geometries may be interleaved in different segments. For example, segment 1 shows a first traversal through command buffer 1300A. Based on the GPU state set implicitly as described above, and on the GPU state configured by commands 1310 and 1312, the corresponding shaders perform geometry pre-testing. For example, shader 0 is configured to perform geometry pre-testing on object 0 (e.g., one geometry) (e.g., based on the objects shown in FIG. 7B-1), and shader 1 is configured to perform geometry pre-testing on object 1. Segment 2 shows a second traversal through command buffer 1300A. Based on the GPU state set implicitly as described above, and on the GPU state configured by commands 1310 and 1312, the corresponding shaders perform rendering. For example, shader 0 is now configured to perform rendering of object 0, and shader 1 is now configured to perform rendering of object 1.

The interleaving of the performance of geometry pre-testing and rendering on different sets of geometries is shown in FIG. 13A. In particular, segment 3 shows a third partial traversal through command buffer 1300A. Based on the GPU state set implicitly as described above, and on the GPU state configured by commands 1314 and 1316, the corresponding shaders perform geometry pre-testing. For example, shader 2 (executed via command 1315) performs geometry testing on object 2 (e.g., one geometry) (e.g., based on the objects shown in FIG. 7B-1), and shader 3 (executed via command 1317) performs geometry testing on object 3. Segment 4 shows a fourth partial traversal through command buffer 1300A. Based on the GPU state set implicitly as described above, and on the GPU state configured by commands 1314 and 1316, the corresponding shaders perform rendering. For example, shader 2 (executed via command 1315) performs rendering of object 2, and shader 3 (executed via command 1317) performs rendering of object 3.

Note that hardware contexts are preserved, or saved and restored. For example, the geometry pre-testing GPU context at the end of segment 1 is needed at the beginning of segment 3 in order to perform geometry pre-testing. Likewise, the rendering GPU context at the end of segment 2 is needed at the beginning of segment 4 in order to perform rendering.
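
The interleaved segments and the need to carry a separate context for each mode can be simulated as below. The context is modeled as a plain dictionary holding the objects processed so far in that mode; this model and the name `run_interleaved` are assumptions for illustration and say nothing about what a real hardware context contains.

```python
def run_interleaved(segments):
    """Run alternating pre-test and render segments.

    segments is a list of (mode, objects) pairs. One context is kept per
    mode, so the pre-test context left by segment 1 is what segment 3
    resumes from, and the render context left by segment 2 is what
    segment 4 resumes from.
    """
    contexts = {"pretest": {"processed": []}, "render": {"processed": []}}
    trace = []
    for mode, objects in segments:
        context = contexts[mode]       # restore the context for this mode
        for obj in objects:
            context["processed"].append(obj)
            trace.append((mode, obj))
        contexts[mode] = context       # save it before switching modes
    return trace, contexts


# Segments 1-4 of the illustration: objects 0 and 1, then objects 2 and 3.
segments = [("pretest", [0, 1]), ("render", [0, 1]),
            ("pretest", [2, 3]), ("render", [2, 3])]

if __name__ == "__main__":
    trace, contexts = run_interleaved(segments)
    print(trace)
    print(contexts)
```

Keeping one context per mode, rather than a single shared one, is what lets segment 3 pick up where segment 1 stopped even though a render segment ran in between.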

In one embodiment, commands may be skipped or interpreted differently based on the GPU state. For example, certain commands that set state (e.g., portions of 1310, 1312, 1314, and 1316) may be skipped based on the GPU state implicitly set as described above. If shader 0, executed via command 1310, requires less GPU state to be configured for geometry pretesting than for geometry rendering, then skipping the unnecessary portions of the state setup may be beneficial, because setting GPU state incurs overhead. As another example, certain commands that set state (e.g., portions of 1310, 1312, 1314, and 1316) may be interpreted differently based on the GPU state implicitly set as described above. This applies, for example, when shader 0, executed via command 1310, requires GPU state that is configured differently for geometry pretesting than for geometry rendering, or requires different inputs for geometry pretesting than for geometry rendering.

In one embodiment, shaders configured for geometry pretesting do not allocate space in the position cache and parameter cache, as previously described. In another embodiment, a single shader is used to perform either pretesting or rendering. This can be done in a variety of ways, such as via external hardware state that the shader can check (e.g., set implicitly as described above), or via an input to the shader (e.g., set by a command that is interpreted differently in the first and second passes through the command buffer).
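The idea of skipping state-setting commands and of a single shader that branches on GPU state can be illustrated with a small Python model. Everything here is hypothetical and chosen for the example only: the `(kind, payload)` command encoding, the command kinds `set_render_only_state` and `run_shader`, and the function `traverse` are not taken from the disclosure.

```python
from typing import List, Tuple

# Hypothetical command encoding: (kind, payload). Not the disclosure's format.
Command = Tuple[str, str]

COMMANDS: List[Command] = [
    ("set_render_only_state", "blend state"),
    ("run_shader", "object 0"),
    ("set_render_only_state", "depth state"),
    ("run_shader", "object 1"),
]


def traverse(commands: List[Command], mode: str) -> List[str]:
    """Walk the command buffer once with the GPU state set to `mode`."""
    if mode not in ("pretest", "render"):
        raise ValueError(f"unknown mode: {mode}")
    log = []
    for kind, payload in commands:
        if kind == "set_render_only_state":
            if mode == "pretest":
                continue  # state not needed for pretesting, so skip the setup
            log.append(f"set {payload}")
        elif kind == "run_shader":
            # A single shader observes the mode and does one job or the other.
            log.append(f"{mode} {payload}")
    return log


# The same command buffer is traversed twice with different GPU state.
pretest_log = traverse(COMMANDS, "pretest")
render_log = traverse(COMMANDS, "render")
```

The pretest traversal runs only the two shader commands, while the rendering traversal also executes both state-setting commands.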

Figure 13B is a flowchart illustrating a graphics processing method, according to one embodiment of the present disclosure, that includes using separate portions of a corresponding command buffer to interleave the pretesting and rendering of geometry of an image frame for different groups of geometries. As mentioned previously, various architectures may include multiple GPUs cooperating to render a single image by performing multi-GPU rendering of geometry for an application, such as within one or more cloud gaming servers of a cloud gaming system, or within a stand-alone system (such as a personal computer or gaming console that includes a high-end graphics card having multiple GPUs).

In particular, at 1310, the method includes rendering graphics for an application using a plurality of GPUs, as previously described. At 1320, the method includes dividing responsibility for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen areas. Each GPU has a corresponding division of the responsibility that is known to the plurality of GPUs. More specifically, each GPU is responsible for rendering geometry in a corresponding set of screen areas of the plurality of screen areas, where the corresponding set includes one or more screen areas, as previously described. In one embodiment, the screen areas are interleaved (e.g., when the display is divided into sets of screen areas for geometry pretesting and rendering).
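One way to picture a division of responsibility over interleaved screen areas is the Python sketch below. The diagonal pattern `(col + row) % num_gpus` is an assumption chosen for illustration; the disclosure does not prescribe this particular interleaving, and the names `owner_of_area` and `build_assignment` are hypothetical.

```python
from typing import Dict, List, Tuple


def owner_of_area(col: int, row: int, num_gpus: int) -> int:
    """Return the GPU responsible for the screen area at (col, row).

    Assumed pattern: neighbouring areas in a row or column belong to
    different GPUs, so each GPU's areas are spread across the screen.
    """
    return (col + row) % num_gpus


def build_assignment(cols: int, rows: int, num_gpus: int) -> Dict[int, List[Tuple[int, int]]]:
    """Map every GPU to the list of screen areas it is responsible for."""
    assignment: Dict[int, List[Tuple[int, int]]] = {gpu: [] for gpu in range(num_gpus)}
    for row in range(rows):
        for col in range(cols):
            assignment[owner_of_area(col, row, num_gpus)].append((col, row))
    return assignment


# A 4x4 grid of screen areas shared by four GPUs.
assignment = build_assignment(cols=4, rows=4, num_gpus=4)
for gpu, areas in assignment.items():
    print(f"GPU {gpu}: {areas}")
```

Because the mapping is a pure function of the area coordinates, every GPU can compute the full division locally, which is one simple way for the division to be known to all of the GPUs.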

At 1330, the method includes assigning a plurality of geometries of an image frame to the plurality of GPUs for geometry pretesting. In particular, each of the plurality of GPUs is assigned a corresponding portion of the geometry associated with the image frame for geometry pretesting. As mentioned previously, the geometry may be distributed evenly or unevenly, where each portion includes one or more geometries, or possibly no geometry at all.
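As a sketch of one possible assignment policy, the Python below hands out geometries round-robin so that consecutive geometries go to different GPUs. Round-robin is an assumption made for the example; the text above allows any even or uneven distribution. The function name `assign_for_pretest` is hypothetical.

```python
from typing import Dict, List, Sequence


def assign_for_pretest(geometry_ids: Sequence, num_gpus: int) -> Dict[int, List]:
    """Distribute geometries across GPUs for pretesting, round-robin.

    The result may be uneven, and a GPU may receive no geometry at all
    when there are fewer geometries than GPUs.
    """
    portions: Dict[int, List] = {gpu: [] for gpu in range(num_gpus)}
    for index, geometry in enumerate(geometry_ids):
        portions[index % num_gpus].append(geometry)
    return portions


# Six geometries over four GPUs: GPUs 0 and 1 receive two each, GPUs 2 and 3 one each.
print(assign_for_pretest([0, 1, 2, 3, 4, 5], num_gpus=4))
```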

At 1340, the method includes interleaving a first set of shaders with a second set of shaders in the command buffer, where the shaders are configured to perform both geometry pretesting and rendering. In particular, the first set of shaders is configured to perform geometry pretesting and rendering on a first set of geometries. Thereafter, the second set of shaders is configured to perform geometry pretesting and rendering on a second set of geometries. As previously described, geometry pretesting generates corresponding information about each geometry in the first or second set and its relationship to each of the plurality of screen areas. The plurality of GPUs use the corresponding information to render each geometry in the first or second set. As previously described, the GPU state can be set in a variety of ways in order to perform either geometry pretesting or rendering. For example, the CPU or a GPU may set a value in random access memory (RAM), which the GPU then checks. In another example, the state may be internal to the GPU, such as when the command buffer is called twice as a subroutine, with the internal GPU state differing between the two subroutine calls.

The interleaving process is described further below. In particular, the first set of shaders in the command buffer is configured to perform geometry pretesting on the first set of geometries, as previously described. Geometry pretesting is performed on the first set of geometries at the plurality of GPUs to generate first information about each geometry in the first set and its relationship to each of the plurality of screen areas. The first set of shaders is then configured to render the first set of geometries, as previously described. Thereafter, the first information is used when rendering the plurality of geometries at each of the plurality of GPUs (e.g., including fully rendering the first set of geometries, or skipping rendering of the first set of geometries, at a corresponding GPU). As previously described, the information indicates which geometries overlap the screen areas assigned to a corresponding GPU for object rendering. For example, the information can be used to skip rendering a geometry at a GPU when the information indicates that the geometry does not overlap any screen area (e.g., in the corresponding set) assigned to that GPU for object rendering.
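A minimal Python sketch of how pretest information could drive the skip decision follows. It rests on several assumptions that are not taken from the disclosure: each geometry is approximated by an axis-aligned bounding box in pixels with non-negative coordinates, screen areas are square tiles of `area_size` pixels, and areas are assigned to GPUs by the illustrative pattern `(col + row) % num_gpus`. The function names are hypothetical.

```python
from typing import Set, Tuple

# (x_min, y_min, x_max, y_max) in pixels, inclusive, non-negative.
Box = Tuple[int, int, int, int]


def areas_overlapped(box: Box, area_size: int) -> Set[Tuple[int, int]]:
    """Return the (col, row) screen areas touched by the bounding box."""
    x_min, y_min, x_max, y_max = box
    return {
        (col, row)
        for row in range(y_min // area_size, y_max // area_size + 1)
        for col in range(x_min // area_size, x_max // area_size + 1)
    }


def pretest(box: Box, area_size: int, num_gpus: int) -> Set[int]:
    """Return the GPUs whose assigned screen areas the geometry overlaps."""
    return {(col + row) % num_gpus for col, row in areas_overlapped(box, area_size)}


def should_render(gpu: int, overlapping_gpus: Set[int]) -> bool:
    """A GPU renders the geometry only if it overlaps one of its areas."""
    return gpu in overlapping_gpus


info = pretest((0, 0, 100, 10), area_size=64, num_gpus=4)
for gpu in range(4):
    action = "render" if should_render(gpu, info) else "skip"
    print(f"GPU {gpu}: {action}")
```

A bounding-box test is conservative: it may report an overlap for a GPU that the actual primitives never touch, which costs some unnecessary work but never causes geometry to be skipped wrongly.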

The second set of shaders is then used for geometry pretesting and rendering of the second set of geometries. In particular, the second set of shaders in the command buffer is configured to perform geometry pretesting on the second set of geometries, as previously described. Geometry pretesting is then performed on the second set of geometries at the plurality of GPUs to generate second information about each geometry in the second set and its relationship to each of the plurality of screen areas. The second set of shaders is then configured to render the second set of geometries, as previously described. Thereafter, the second set of geometries is rendered at each of the plurality of GPUs using the second information. As previously described, the information indicates which geometries overlap the screen areas (e.g., of the corresponding set) assigned to a corresponding GPU for object rendering.

Although the foregoing describes the multiple GPUs as processing geometry in lockstep (i.e., all of the GPUs perform geometry pretesting, and then all of the GPUs perform rendering), in some embodiments the GPUs are not explicitly synchronized with one another. For example, one GPU may be rendering the first set of geometries while a second GPU is performing geometry pretesting on the second set of geometries.

Figure 14 illustrates components of an example device 1400 that can be used to perform aspects of the various embodiments of the present disclosure. For example, Figure 14 illustrates an exemplary hardware system suitable for multi-GPU rendering of geometry for an application by pretesting the geometry against potentially interleaved screen areas before rendering objects for an image frame, in accordance with embodiments of the present disclosure. The block diagram shows device 1400, which can incorporate or can be a personal computer, a server computer, a gaming console, a mobile device, or another digital device, each of which is suitable for practicing an embodiment of the invention. Device 1400 includes a central processing unit (CPU) 1402 for running software applications and optionally an operating system. CPU 1402 may be composed of one or more homogeneous or heterogeneous processing cores.

According to various embodiments, CPU 1402 is one or more general-purpose microprocessors having one or more processing cores. Further embodiments can be implemented using one or more CPUs with microprocessor architectures specifically adapted for highly parallel and computationally intensive applications, such as media and interactive entertainment applications, where the applications are configured for graphics processing during execution of a game.

Memory 1404 stores applications and data for use by CPU 1402 and GPU 1416. Storage 1406 provides non-volatile storage and other computer-readable media for applications and data, and may include fixed disk drives, removable disk drives, flash memory devices, CD-ROM, DVD-ROM, Blu-ray, HD-DVD, or other optical storage devices, as well as signal transmission and storage media. User input devices 1408 communicate user inputs from one or more users to device 1400; examples include keyboards, mice, joysticks, touch pads, touch screens, still or video recorders/cameras, and/or microphones. Network interface 1409 allows device 1400 to communicate with other computer systems via an electronic communications network, and may include wired or wireless communication over local area networks and wide area networks such as the Internet. Audio processor 1412 is adapted to generate analog or digital audio output from instructions and/or data provided by CPU 1402, memory 1404, and/or storage 1406. The components of device 1400, including CPU 1402, the graphics subsystem including GPU 1416, memory 1404, data storage 1406, user input devices 1408, network interface 1409, and audio processor 1412, are connected via one or more data buses 1422.

Graphics subsystem 1414 is further connected with data bus 1422 and the components of device 1400. Graphics subsystem 1414 includes at least one graphics processing unit (GPU) 1416 and graphics memory 1418. Graphics memory 1418 includes a display memory (e.g., a frame buffer) used for storing pixel data for each pixel of an output image. Graphics memory 1418 can be integrated in the same device as GPU 1416, connected as a separate device with GPU 1416, and/or implemented within memory 1404. Pixel data can be provided to graphics memory 1418 directly from CPU 1402. Alternatively, CPU 1402 can provide GPU 1416 with data and/or instructions defining the desired output images, from which GPU 1416 generates the pixel data of one or more output images. The data and/or instructions defining the desired output images can be stored in memory 1404 and/or graphics memory 1418. In an embodiment, GPU 1416 includes 3D rendering capabilities for generating pixel data for output images from instructions and data defining the geometry, lighting, shading, texturing, motion, and/or camera parameters for a scene. GPU 1416 can further include one or more programmable execution units capable of executing shader programs.

Graphics subsystem 1414 periodically outputs pixel data for an image from graphics memory 1418 to be displayed on display device 1410 or to be projected by a projection system (not shown). Display device 1410 can be any device capable of displaying visual information in response to a signal from device 1400, including CRT, LCD, plasma, and OLED displays. Device 1400 can provide display device 1410 with an analog or digital signal, for example.

Other embodiments for optimizing graphics subsystem 1414 could include multi-GPU rendering of geometry for an application by pretesting the geometry against potentially interleaved screen areas before rendering objects for an image frame. Graphics subsystem 1414 could be configured as one or more processing devices.

For example, in one embodiment, graphics subsystem 1414 may be configured to perform multi-GPU rendering of geometry for an application, where multiple graphics subsystems could implement graphics and/or rendering pipelines for a single application. That is, graphics subsystem 1414 includes multiple GPUs used for rendering an image, or each of one or more images of a sequence of images, when executing an application.

In other embodiments, graphics subsystem 1414 includes multiple GPU devices, which are combined to perform graphics processing for a single application executing on a corresponding CPU. For example, the multiple GPUs can perform multi-GPU rendering of geometry for an application by pretesting the geometry against potentially interleaved screen areas before rendering objects for an image frame. In other examples, the multiple GPUs can perform alternate forms of frame rendering, in which, in sequential frame periods, GPU 1 renders a first frame and GPU 2 renders a second frame, and so on until the last GPU is reached, whereupon the initial GPU renders the next video frame (e.g., if there are only two GPUs, GPU 1 renders the third frame). That is, the GPUs rotate when rendering frames. The rendering operations can overlap, in that GPU 2 may begin rendering the second frame before GPU 1 finishes rendering the first frame. In another implementation, the multiple GPU devices can be assigned different shader operations in the rendering and/or graphics pipeline, with a master GPU performing the main rendering and compositing. For example, in a group of three GPUs, master GPU 1 could perform the main rendering (e.g., a first shader operation) and composite the outputs from slave GPU 2 and slave GPU 3, where slave GPU 2 could perform a second shader operation (e.g., fluid effects, such as a river) and slave GPU 3 could perform a third shader operation (e.g., particle smoke), with master GPU 1 compositing the results from each of GPU 1, GPU 2, and GPU 3. In this way, different GPUs can be assigned to perform different shader operations (e.g., flag waving, wind, smoke generation, fire, etc.) to render a video frame. In still another embodiment, each of the three GPUs could be assigned to different objects and/or parts of a scene corresponding to a video frame. In the above embodiments and implementations, these operations could be performed in the same frame period (simultaneously in parallel) or in different frame periods (sequentially in parallel).
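The frame rotation described for alternate frame rendering reduces to modular arithmetic. The following Python sketch uses the 1-based frame and GPU numbering of the text; the function name `gpu_for_frame` is a hypothetical helper, not part of the disclosure.

```python
def gpu_for_frame(frame_number: int, num_gpus: int) -> int:
    """Return the 1-based GPU that renders the given 1-based frame.

    Frames are handed out in rotation: after the last GPU has taken a
    frame, the first GPU takes the next one.
    """
    if frame_number < 1 or num_gpus < 1:
        raise ValueError("frame_number and num_gpus must be positive")
    return (frame_number - 1) % num_gpus + 1


# With two GPUs, GPU 1 renders the third frame, as in the example in the text.
for frame in range(1, 5):
    print(f"frame {frame}: GPU {gpu_for_frame(frame, num_gpus=2)}")
```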

Accordingly, the present disclosure describes methods and systems configured for multi-GPU rendering of geometry for an application by pretesting the geometry against potentially interleaved screen areas before rendering objects for an image frame, or for each of one or more image frames in a sequence of image frames, when executing the application.

It should be understood that the various embodiments defined herein may be combined or assembled into specific implementations using the various features disclosed herein. Thus, the examples provided are just some possible examples, without limitation to the various implementations that are possible by combining the various elements to define many more implementations. In some examples, some implementations may include fewer elements, without departing from the spirit of the disclosed or equivalent implementations.

Embodiments of the present disclosure may be practiced with various computer system configurations, including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like. Embodiments of the present disclosure can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a wired or wireless network.

With the above embodiments in mind, it should be understood that embodiments of the present disclosure can employ various computer-implemented operations involving data stored in computer systems. These operations are those requiring physical manipulation of physical quantities. Any of the operations described herein that form part of embodiments of the present disclosure are useful machine operations. Embodiments of the disclosure also relate to a device or an apparatus for performing these operations. The apparatus can be specially constructed for the required purpose, or it can be a general-purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general-purpose machines can be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.

The disclosure can also be embodied as computer-readable code on a computer-readable medium. The computer-readable medium is any data storage device that can store data, which can thereafter be read by a computer system. Examples of the computer-readable medium include hard drives, network-attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, magnetic tapes, and other optical and non-optical data storage devices. The computer-readable medium can include computer-readable tangible media distributed over network-coupled computer systems, so that the computer-readable code is stored and executed in a distributed fashion.

Although the method operations were described in a specific order, it should be understood that other housekeeping operations may be performed between operations, or the operations may be adjusted so that they occur at slightly different times, or they may be distributed in a system that allows the processing operations to occur at various intervals associated with the processing, as long as the processing of the overlay operations is performed in the desired way.

Although the foregoing disclosure has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications can be practiced within the scope of the appended claims. Accordingly, the present embodiments are to be considered illustrative and not restrictive, and embodiments of the present disclosure are not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.

Claims

1. A method for graphics processing, comprising:

rendering graphics for an application using a plurality of graphics processing units (GPUs);

dividing responsibility for rendering geometry of the graphics between the plurality of GPUs based on a plurality of screen areas, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs, wherein screen areas of the plurality of screen areas are interleaved;

allocating the geometry of the image frames generated by the application to the GPU for geometry pre-testing;

performing the geometry pretest at the GPU to generate information regarding the geometry and its relationship to each of the plurality of screen areas;

using the information at each of the plurality of GPUs when rendering the image frame;

providing said information as a hint to a rendering GPU, wherein said rendering GPU is one of said plurality of GPUs,

wherein said rendering GPU considers said information if said information is received before said rendering GPU begins rendering said geometry,

wherein the geometry is fully rendered at the rendering GPU when the information is received after rendering of the geometry has started.

2. The method of claim 1, further comprising:

skipping rendering of the geometry at the rendering GPU when the information indicates that the geometry does not overlap any screen area allocated to the rendering GPU for object rendering, wherein the rendering GPU is one of the plurality of GPUs.

3. The method of claim 1, further comprising:

assigning a plurality of geometries of the image frame to the plurality of GPUs for the geometry pre-testing,

wherein the geometries of the plurality of geometries are evenly or unevenly distributed among the plurality of GPUs.

4. The method of claim 3, wherein the plurality of geometries are allocated such that consecutive geometries are processed by different GPUs.

5. The method of claim 4, wherein a first GPU performs the geometry pretest on more geometries than a second GPU, or the first GPU performs the geometry pretest while the second GPU does not perform the geometry pretest at all.

6. The method of claim 1, wherein the plurality of screen areas are configured to reduce imbalances in rendering times between the plurality of GPUs.

7. The method of claim 1, wherein each of the plurality of screen areas is not uniform in size.

8. The method of claim 1, wherein the plurality of screen areas change dynamically.

9. The method of claim 1, wherein the geometry corresponds to geometry used or generated by a draw call.

10. The method of claim 1,

wherein geometry used or generated by a draw call of the application is subdivided into a plurality of geometries, including the geometry for which the GPU generates the information.

11. The method of claim 1, wherein the geometries are individual primitives.

12. The method of claim 1, wherein the information about the geometry includes a vertex count or a primitive count.

13. The method of claim 1, wherein the information about the geometry includes a specific set of primitives for rendering or a specific set of vertices for rendering.

14. The method of claim 1, further comprising:

using a common rendering command buffer for the plurality of GPUs; and

limiting execution of a command in the common rendering command buffer to one or more of the plurality of GPUs.

15. The method of claim 1,

wherein the information may or may not be generated, depending on a property of the geometry.

16. The method of claim 1, further comprising:

generating the information using a scan converter in a rasterization stage.

17. The method of claim 1, further comprising:

generating the information using one or more shaders in a geometry processing stage.

18. The method of claim 17, wherein the one or more shaders use one or more specialized instructions to accelerate generation of information.

19. The method of claim 17, wherein the one or more shaders do not perform allocation of position caches or parameter caches.

20. The method of claim 1, further comprising:

partitioning a plurality of geometries among the plurality of GPUs for said geometry pretest,

where consecutive geometries are processed by different GPUs.

21. The method of claim 20, further comprising:

dynamically adjusting the partitioning of the plurality of geometries during the geometry pretest based on the performance of each of the plurality of GPUs.

22. The method of claim 1,

wherein one or more of the plurality of GPUs are part of a larger GPU configured as a plurality of virtual GPUs.

23. A computer system comprising:

a processor; and

a memory coupled to the processor and having instructions stored therein that, if executed by the computer system, cause the computer system to perform a method for implementing a graphics pipeline, the method comprising:

allocating the geometry of the image frames generated by the application to the GPU for geometry pre-testing;

24. The computer system of claim 23, further comprising:

skipping rendering of the geometry at a rendering GPU when the information indicates that the geometry does not overlap any screen area allocated to the rendering GPU for object rendering, wherein the rendering GPU is one of the plurality of GPUs.