CN117274030A - A mobile-side Vulkan rendering process optimization method

Info

Publication number: CN117274030A
Application number: CN202311284471.5A
Authority: CN
Inventors: 胡思超; 米楠
Original assignee: Faceunity Technology Co ltd; Zhejiang University ZJU
Current assignee: Faceunity Technology Co ltd; Zhejiang University ZJU
Priority date: 2023-10-07
Filing date: 2023-10-07
Publication date: 2023-12-22

Abstract

The invention discloses a method for optimizing the Vulkan rendering flow on mobile devices, designed around the characteristics of the Vulkan rendering API and of mobile GPUs. The method reduces the number of times a render pass is begun and discards frame buffer contents that are not needed, thereby reducing the frame buffer read and write overhead incurred when using render passes. The aim is to reduce the memory bandwidth required for drawing on mobile devices; to that end, the invention designs a set of wrappers around the Vulkan graphics API that improve mobile performance without increasing the complexity of upper-layer calls.

Description

A mobile-side Vulkan rendering process optimization method

Technical field

The invention belongs to the field of rendering in computer graphics, and in particular relates to a method for optimizing the Vulkan drawing flow on mobile devices.

Background

The IMR (Immediate Mode Rendering) pipeline used on PCs reads and writes the frame buffer and depth buffer for every triangle during rendering, so the architecture needs either high memory bandwidth or a large cache. High bandwidth means high power consumption and demanding heat dissipation, and the size constraints of mobile devices rule out a large cache, so IMR is clearly impractical on mobile devices. The TBR (Tile-Based Rendering) architecture was therefore developed for mobile GPUs, with the goal of reducing the external memory accesses the GPU needs during shading. TBR divides the screen into small tiles, for example 16x16 or 32x32 pixels each. Primitives are assigned to the tiles they cover, and each tile has its own fast on-chip cache; once all computation for a tile has finished locally, the tile's portion of the frame buffer is written back to main memory, and the final frame buffer is complete once all tiles have been processed. It follows that mobile rendering should minimize reads and writes between the GPU and external memory across the whole rendering flow.
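To make the bandwidth argument concrete, the following sketch estimates the external memory traffic caused by storing one attachment at the end of a render pass and loading it again at the start of the next. The resolution, pixel size, sample count, and frame rate are illustrative assumptions, not figures from the patent, and the function names are hypothetical.

```cpp
#include <cstdint>

// Bytes moved when one attachment is written to, or read from, external
// memory once. All parameters are illustrative assumptions.
constexpr std::uint64_t attachmentBytes(std::uint64_t width, std::uint64_t height,
                                        std::uint64_t bytesPerPixel,
                                        std::uint64_t samples) {
    return width * height * bytesPerPixel * samples;
}

// Traffic per second for one avoidable store + load round trip per frame.
constexpr std::uint64_t roundTripBytesPerSecond(std::uint64_t bytesPerTransfer,
                                                std::uint64_t framesPerSecond) {
    return 2 * bytesPerTransfer * framesPerSecond;
}

// 1920x1080 at 4 bytes per pixel, single-sampled: 8,294,400 bytes per transfer.
static_assert(attachmentBytes(1920, 1080, 4, 1) == 8294400ULL,
              "single-sampled 1080p target");
// The same target with 4x MSAA is four times larger.
static_assert(attachmentBytes(1920, 1080, 4, 4) == 33177600ULL,
              "4x MSAA 1080p target");
// One store + load of the single-sampled target per frame at 60 frames per second.
static_assert(roundTripBytesPerSecond(8294400ULL, 60) == 995328000ULL,
              "round trip traffic at 60 fps");
```

Under these assumptions, a single unnecessary store and reload of one color target costs close to 1 GB/s of external bandwidth, which is the kind of traffic the method aims to remove.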

Frame Graph is a newer rendering framework: a technique that gathers all the information for a complete frame and analyzes the dependencies between render pass nodes in order to carry out optimizations.

Summary of the invention

To address the shortcomings of the prior art, the present invention provides a method for optimizing the Vulkan drawing flow on mobile devices. First, the method reduces the number of render passes (Render Pass) in a frame by automatically merging compatible render passes into one, so that a render target is less often written back to GPU memory only to be loaded into tile memory again by the next render pass. Second, for the texture behind each render target of a render pass, it automatically determines whether the results need to be stored and whether previous contents need to be loaded, and marks textures whose load or store is unnecessary accordingly, saving bandwidth. Finally, for render targets that need MSAA, it automatically determines whether the multisampled texture is needed as a render target later; if not, the render pass is set up to resolve in place to the single-sampled texture when it ends.

This eliminates the multiplied storage overhead that MSAA brings, and addresses the poor performance that frame buffer (Frame Buffer) read and write overhead causes in the Vulkan drawing flow on mobile devices.

The main technical solution of the present invention is as follows:

A method for optimizing the Vulkan drawing flow on mobile devices wraps the VkRenderPass-related calls provided by Vulkan and, inside the wrapper functions, optimizes their frame buffer read and write operations. The method includes the following steps:

(1) Wrap three function interfaces: the begin-render-pass function BeginRenderPass, the drawing function Draw, and the end-render-pass function EndRenderPass;

BeginRenderPass takes a set of render targets (RenderTarget) and the bound depth image, obtains the corresponding Vulkan render pass (VkRenderPass) and frame buffer object (VkFramebuffer), and saves them. It then checks whether the objects saved by the previous call to BeginRenderPass are compatible with the render pass and frame buffer obtained by the current call;

If they are not compatible, the wrapper calls Vulkan's vkCmdEndRenderPass function to end the previously begun render pass, and saves the new objects until they are actually used;

If they are compatible, the render pass already in progress does not need to be ended, and the render passes are merged automatically;

After BeginRenderPass has obtained and saved the render pass and frame buffer, Draw determines whether a Vulkan API call is needed to begin a new render pass and, if so, begins it. The EndRenderPass interface is used to check that the upper layer's calls to this set of functions are semantically well formed; it does not call Vulkan's end-render-pass function;

(2) The compatibility check performed by BeginRenderPass in step (1) can lead to a merge only if, before BeginRenderPass is called, the homogeneous render pass begun by the previous draw has not been ended for some other reason. Specifically:

A group of render passes that write to the same frame buffer object can be merged. If the render passes need different resources for their draws, those resources must be updated before the render pass begins;

Therefore, within a group of homogeneous render passes, the update operations for the resources needed by later render passes are moved ahead of the whole group, so that they do not interrupt the homogeneous render pass already in progress. Frame Graph is used to achieve this;

(3) When the resource dependencies between render passes are determined in step (2), resources that nothing depends on can be discarded at the end of the render pass to prevent redundant write-backs. Specifically, Frame Graph abstracts each render pass as a render node and treats the render passes merged in step (1) as a single render node. For each render node, Frame Graph determines whether each render target in its bound frame buffer is depended on by later nodes or depends on earlier nodes. For render targets with no such dependency, the corresponding flags are set when the render pass is created, to avoid redundant reads and writes.

(4) A render pass with multisample anti-aliasing (MSAA) enabled must be resolved into an ordinary, non-multisampled texture to produce the final result. Writing the multisampled target back to GPU memory at the end of the render pass and only then performing the resolve is too expensive, so the render target determination in step (3) needs special handling to avoid redundant reads and writes. Specifically, when the render pass is created, a resolve target is specified for each multisampled target; that is, an additional render target is bound to the render pass so that the resolve completes in place when the render pass ends. The multisampled target can then be discarded, and only the resolved target is written back to GPU memory.

Further, step (1) includes the following sub-steps:

(1.1) Maintain caches of render passes and frame buffers. On each call to the begin-render-pass function, look up the render pass and frame buffer that match the render targets and depth image passed in; if the lookup fails, create a new render pass and frame buffer and store them in their respective caches;

(1.2) In the begin-render-pass function, determine render pass compatibility. The keys used to look up and create render passes and frame buffers depend only on the parameters of the render targets and the depth-stencil image, not on the operations of the render pass itself;

(1.3) In the drawing function, determine whether a Vulkan API call is needed to begin a new render pass and, if so, begin it;

(1.4) In the end-render-pass function, preserve the integrity of the rendering hardware interface semantics by ensuring that every begin call has a matching end call.
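The cache lookup in sub-steps (1.1) and (1.2) can be sketched as follows. This is a minimal illustration, not the patent's implementation: TargetDesc, PassKey, PassCache, and the integer ids are hypothetical stand-ins for Vulkan handles, and a single map holds the render pass and frame buffer together, whereas the text describes separate caches. What it shows is that load, store, and clear operations are deliberately left out of the key, so two passes that differ only in those operations resolve to the same cached objects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical description of one render target as seen by the wrapper.
struct TargetDesc {
    std::uint64_t imageId;  // stand-in for an image view handle; 0 means "none"
    std::uint32_t format;   // stand-in for a VkFormat value
    std::uint32_t samples;  // stand-in for a sample count
    bool operator==(const TargetDesc& o) const {
        return imageId == o.imageId && format == o.format && samples == o.samples;
    }
};

// Cache key: color targets plus the depth-stencil target only.
// Load, store, and clear operations are NOT part of the key.
struct PassKey {
    std::vector<TargetDesc> colors;
    TargetDesc depthStencil;
    bool operator==(const PassKey& o) const {
        return colors == o.colors && depthStencil == o.depthStencil;
    }
};

struct PassKeyHash {
    static void mix(std::size_t& seed, std::uint64_t v) {
        seed ^= std::hash<std::uint64_t>()(v) + 0x9e3779b9u + (seed << 6) + (seed >> 2);
    }
    static void mixTarget(std::size_t& seed, const TargetDesc& t) {
        mix(seed, t.imageId);
        mix(seed, t.format);
        mix(seed, t.samples);
    }
    std::size_t operator()(const PassKey& k) const {
        std::size_t seed = 0;
        for (const TargetDesc& t : k.colors) mixTarget(seed, t);
        mixTarget(seed, k.depthStencil);
        return seed;
    }
};

// Stand-in for a created render pass and frame buffer pair.
struct PassObjects {
    std::uint32_t id;
};

class PassCache {
public:
    // Returns the cached objects for `key`, creating them on a miss.
    PassObjects getOrCreate(const PassKey& key) {
        // Assumption of this sketch: a pass has at least one attachment.
        assert((!key.colors.empty() || key.depthStencil.imageId != 0) &&
               "pass without attachments");
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;
        // A real wrapper would create the Vulkan objects here.
        PassObjects created = {nextId_++};
        cache_.emplace(key, created);
        return created;
    }
    std::size_t size() const { return cache_.size(); }

private:
    std::unordered_map<PassKey, PassObjects, PassKeyHash> cache_;
    std::uint32_t nextId_ = 1;
};
```

Because the key ignores per-pass operations, comparing the objects returned by two consecutive lookups is enough to decide whether the second pass can be merged into the first.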

Further, step (2) includes the following sub-steps:

(2.1) Using the Frame Graph architecture, place most of the resource update operations that render passes need in the Frame Graph build phase; the actual drawing happens in the execution phase that follows the build;

(2.2) When the result of a render node is depended on by later render nodes, its state transition must not fall between later homogeneous render passes, where it would interrupt their merging. The Frame Graph analysis therefore records in advance the state transitions each resource needs, and schedules each transition to execute after Vulkan's end-render-pass function is actually called, that is, between non-homogeneous render nodes. This keeps the result, used as a resource, from changing state between homogeneous render passes that should be merged.
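A minimal sketch of the deferral in sub-step (2.2), under stated assumptions: TransitionScheduler and its methods are hypothetical names, and commands are recorded as strings rather than issued through Vulkan (a real implementation would typically record a vkCmdPipelineBarrier where the sketch logs a transition). A transition requested while a render pass is in progress is queued and emitted only once that pass has really ended.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Records commands as strings instead of issuing Vulkan calls.
class TransitionScheduler {
public:
    // Marks the point where a (possibly merged) render pass really begins.
    void beginPass(const std::string& name) {
        assert(!inPass_ && "previous pass still in progress");
        inPass_ = true;
        log_.push_back("begin " + name);
    }

    // Called when the frame graph learns that `resource` must change state.
    void requestTransition(const std::string& resource) {
        if (inPass_) {
            // Deferred: emitting it now would split the pass in progress.
            pending_.push_back(resource);
        } else {
            log_.push_back("transition " + resource);
        }
    }

    // Marks the point where the render pass really ends, then flushes
    // every transition that was deferred while it was in progress.
    void endPass() {
        assert(inPass_ && "no pass in progress");
        inPass_ = false;
        log_.push_back("end");
        for (std::size_t i = 0; i < pending_.size(); ++i) {
            log_.push_back("transition " + pending_[i]);
        }
        pending_.clear();
    }

    const std::vector<std::string>& log() const { return log_; }

private:
    bool inPass_ = false;
    std::vector<std::string> pending_;
    std::vector<std::string> log_;
};
```

With this ordering, the recorded stream for a pass that produces a shadow map is begin, end, then the transition, so the transition lands between non-homogeneous nodes rather than inside a merged run.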

The beneficial effects of the present invention are as follows:

The present invention provides a method for optimizing the Vulkan drawing flow on mobile devices. Decision logic added at the Frame Graph layer orders the render pass nodes; based on that order, it determines the load and store operations of every color and depth attachment in every render pass node, with special handling for texture objects that have MSAA enabled. This saves frame buffer read and write overhead when drawing with Vulkan on mobile devices, eases to some extent the performance bottleneck caused by limited mobile bandwidth, and improves overall performance.

It should be understood that the general description above and the detailed description below are only exemplary and explanatory, and do not limit the present application.

Description of the drawings

Figure 1 shows the low-level implementation of the Render Pass merging logic;

Figure 2 is a flow chart of the logic Frame Graph uses to compute the Load Store Flag;

Figure 3 is an example of a Frame Graph after the Load Store Flag computation.

Detailed description

Exemplary embodiments are described in detail here, with examples illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application.

The purpose of the embodiments of this application is to provide a method that addresses the poor performance caused by the overhead of reading and writing the Frame Buffer in the Vulkan drawing flow on mobile devices.

Step 1: Wrap the function interfaces for beginning a render pass (BeginRenderPass), drawing (Draw), and ending a render pass (EndRenderPass). BeginRenderPass takes a set of render targets together with the bound depth image (the Render Targets) and obtains the corresponding VkRenderPass and VkFramebuffer objects.

As shown in Figure 1, BeginRenderPass then checks whether the objects set by the previous call to BeginRenderPass are compatible with the VkRenderPass and VkFramebuffer just obtained. If they are not, it calls vkCmdEndRenderPass to end the previously begun VkRenderPass and saves the new objects until they are actually used. If they are, the VkRenderPass already in progress does not need to be ended, which reduces the number of calls to vkCmdBeginRenderPass and vkCmdEndRenderPass and has the effect of merging VkRenderPass instances automatically.

(a) Maintain caches of VkFramebuffer and VkRenderPass objects. On each call to BeginRenderPass, look up the VkFramebuffer and VkRenderPass that match the Render Targets passed in; if the lookup fails, create a new VkFramebuffer and VkRenderPass and store them in their respective caches;

(b) The VkRenderPass obtained in BeginRenderPass is not begun immediately. Instead, the wrapped Draw function calls vkCmdBeginRenderPass just before it calls vkCmdDraw;

(c) EndRenderPass does not call vkCmdEndRenderPass to end the render pass. So that render passes can be merged as often as possible, it only sets a marker; the actual vkCmdEndRenderPass call is issued when BeginRenderPass detects that the render targets have changed;

(d) The keys used to look up and create VkFramebuffer and VkRenderPass objects depend only on the parameters of the Render Targets, not on the operations of the Render Pass itself (such as whether the Render Targets are loaded or cleared). This is what makes the VkRenderPass compatibility check reliable.
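The deferred begin and end behaviour in items (a) to (d) can be sketched as below. This is an illustration under assumptions rather than the patented code: PassObjects stands in for a cached VkRenderPass and VkFramebuffer pair, Vulkan commands are logged as strings instead of being recorded into a command buffer, and Flush is a hypothetical helper for closing the last pass when command recording finishes, which the text does not describe.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for a cached render pass and frame buffer pair.
struct PassObjects {
    std::uint32_t id;
    bool operator==(const PassObjects& o) const { return id == o.id; }
};

class RenderPassWrapper {
public:
    // Remembers the requested pass; ends the active one only if the
    // targets changed (the objects are not compatible).
    void BeginRenderPass(PassObjects requested) {
        assert(!open_ && "BeginRenderPass without matching EndRenderPass");
        open_ = true;
        if (hasActive_ && !(active_ == requested)) {
            log_.push_back("vkCmdEndRenderPass");
            hasActive_ = false;
        }
        pending_ = requested;
    }

    // Begins the Vulkan render pass lazily, just before the first draw.
    void Draw() {
        assert(open_ && "Draw outside BeginRenderPass/EndRenderPass");
        if (!hasActive_) {
            log_.push_back("vkCmdBeginRenderPass");
            active_ = pending_;
            hasActive_ = true;
        }
        log_.push_back("vkCmdDraw");
    }

    // Only a marker: checks pairing, never ends the Vulkan render pass.
    void EndRenderPass() {
        assert(open_ && "EndRenderPass without BeginRenderPass");
        open_ = false;
    }

    // Hypothetical helper: close whatever is still active at the end
    // of command recording.
    void Flush() {
        if (hasActive_) {
            log_.push_back("vkCmdEndRenderPass");
            hasActive_ = false;
        }
    }

    const std::vector<std::string>& log() const { return log_; }

private:
    PassObjects active_{0};
    PassObjects pending_{0};
    bool hasActive_ = false;
    bool open_ = false;
    std::vector<std::string> log_;
};
```

Three upper-layer passes, the first two on the same targets, produce only two begin and end pairs in the log: the second pass reuses the render pass that is already in progress.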

Step 2: Beyond the automatic merging logic of step 1, some API calls must be made outside the scope of a VkRenderPass, and they therefore interrupt VkRenderPass instances that could otherwise be merged. There are two main cases:

(a) Updates to the data or state of resources needed for rendering (for example vkCmdCopyBuffer, vkCmdCopyImage, and VkImageLayout transitions). These operations should be moved as far as possible ahead of the start of all VkRenderPass instances, and unnecessary state updates should be avoided. This can be done with Frame Graph, which determines the dependencies between resources such as buffers and images and the Render Pass nodes. For resources that do not depend on any Render Pass in the current frame, the update is moved to the start of the frame. For resources that do have dependencies (for example, a resource output by one node is used as an input by a later node), the update is inserted between the two nodes at a point where the VkRenderPass is switched anyway;

(b) Compute Pass operations necessarily interrupt a VkRenderPass, so Frame Graph must use the dependencies between Compute Pass and Render Pass nodes to merge as many mergeable Render Pass nodes as possible. For example, if render pass C depends on both compute pass A and render pass B, but A and B are independent of each other, Frame Graph can execute compute pass A before render pass B.
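The reordering in case (b) can be illustrated with a small greedy list scheduler. The heuristic and all names (Node, targetKey, scheduleNodes) are assumptions made for this sketch; the patent does not specify how Frame Graph orders its nodes. Among the nodes whose dependencies are satisfied, the scheduler prefers to continue a run of render nodes with the same targets, then to get compute nodes out of the way, and otherwise falls back to submission order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class NodeKind { Render, Compute };

struct Node {
    NodeKind kind;
    int targetKey;                  // identifies the render target set; unused for compute
    std::vector<std::size_t> deps;  // indices of nodes that must run first
};

// Returns one valid execution order, or an empty vector if the graph has a cycle.
std::vector<std::size_t> scheduleNodes(const std::vector<Node>& nodes) {
    const std::size_t n = nodes.size();
    std::vector<bool> done(n, false);
    std::vector<std::size_t> order;
    order.reserve(n);

    auto ready = [&](std::size_t i) -> bool {
        if (done[i]) return false;
        for (std::size_t d : nodes[i].deps) {
            if (!done[d]) return false;
        }
        return true;
    };

    while (order.size() < n) {
        std::size_t pick = n;  // n means "nothing chosen yet"
        int bestRank = 3;
        for (std::size_t i = 0; i < n; ++i) {
            if (!ready(i)) continue;
            int rank = 2;  // default: any ready node, in submission order
            if (nodes[i].kind == NodeKind::Render && !order.empty() &&
                nodes[order.back()].kind == NodeKind::Render &&
                nodes[order.back()].targetKey == nodes[i].targetKey) {
                rank = 0;  // continues a mergeable run of render nodes
            } else if (nodes[i].kind == NodeKind::Compute) {
                rank = 1;  // run compute before a render run starts
            }
            if (rank < bestRank) {
                bestRank = rank;
                pick = i;
            }
        }
        assert(pick != n && "dependency cycle in frame graph");
        if (pick == n) return std::vector<std::size_t>();
        done[pick] = true;
        order.push_back(pick);
    }
    return order;
}
```

For the example in the text, submitted as B (render), A (compute), C (render on the same targets as B, depending on A and B), the sketch schedules A first, leaving B and C adjacent so that the lower layer can merge them.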

Step 3: In Frame Graph, determine whether each rendering result of each Render node is depended on by a later Render node or Compute node. One point needs care: the actual merging of Render nodes happens in the lower layer and is invisible to Frame Graph, so Frame Graph must decide for itself whether adjacent Render nodes will be merged. It does so by checking whether the two nodes have the same outputs (that is, the same render targets); nodes that will be merged are treated as a single node. The determination is made for every output resource associated with every Render node (that is, every Attachment in the Frame Buffer). The decision logic is shown in Figure 2, and the flow is as follows:

(a) For each Render node and each of its Attachments, traverse the later nodes (excluding Render nodes that will be merged with the current one) and determine whether any of them depends on the result. If no later node needs the Attachment, the result can be discarded when the current Render node ends, with no write-back to GPU memory; this is done by setting the VkAttachmentStoreOp in VkAttachmentDescription to VK_ATTACHMENT_STORE_OP_DONT_CARE. Otherwise it is set to VK_ATTACHMENT_STORE_OP_STORE.

(b) Likewise, for each bound Attachment, each Render node determines whether an earlier Render node (excluding Render nodes that will be merged with the current one) wrote to it. If an earlier Render node wrote to the Attachment and stored the result, the VkAttachmentLoadOp in VkAttachmentDescription is set to VK_ATTACHMENT_LOAD_OP_LOAD; otherwise it is set to VK_ATTACHMENT_LOAD_OP_DONT_CARE.
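The two rules above can be sketched as a pass over the nodes in execution order. The types here are stand-ins rather than the Vulkan enums, resources are identified by plain integers, and the keepAfterFrame parameter (resources that must survive the frame, such as the image to be presented) is an assumption of this sketch that the text does not mention. The nodes are assumed to be already merged, so each entry corresponds to one real render pass. Clearing on load, which Vulkan also supports through VK_ATTACHMENT_LOAD_OP_CLEAR, is left out for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for the load and store operation values.
enum class LoadOp { Load, DontCare };
enum class StoreOp { Store, DontCare };

struct RenderNode {
    std::vector<int> attachments;  // resources this (merged) node renders to
    std::vector<int> inputs;       // resources this node reads, e.g. sampled textures
};

struct AttachmentOps {
    LoadOp load;
    StoreOp store;
};

inline bool contains(const std::vector<int>& v, int id) {
    return std::find(v.begin(), v.end(), id) != v.end();
}

// Returns ops[i][j] for attachment j of node i; `nodes` is in execution order.
std::vector<std::vector<AttachmentOps>> computeOps(
        const std::vector<RenderNode>& nodes,
        const std::vector<int>& keepAfterFrame) {
    std::vector<std::vector<AttachmentOps>> ops(nodes.size());
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        for (int a : nodes[i].attachments) {
            // Rule (b): load only if an earlier node rendered to this attachment.
            bool writtenBefore = false;
            for (std::size_t p = 0; p < i; ++p) {
                writtenBefore = writtenBefore || contains(nodes[p].attachments, a);
            }
            // Rule (a): store only if a later node reads or renders to it,
            // or it must outlive the frame.
            bool neededLater = contains(keepAfterFrame, a);
            for (std::size_t s = i + 1; s < nodes.size(); ++s) {
                neededLater = neededLater || contains(nodes[s].inputs, a) ||
                              contains(nodes[s].attachments, a);
            }
            ops[i].push_back({writtenBefore ? LoadOp::Load : LoadOp::DontCare,
                              neededLater ? StoreOp::Store : StoreOp::DontCare});
        }
        assert(ops[i].size() == nodes[i].attachments.size());
    }
    return ops;
}
```

In a two-node frame where the second node samples the first node's color output, the first node's depth attachment ends up with both operations set to don't-care, so it never leaves tile memory.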

Step 4: A texture resource with MSAA enabled and its corresponding Resolve texture are managed as a single object. When Frame Graph evaluates the render targets of a Render node in step 3, it distinguishes whether multisample anti-aliasing (MSAA) is enabled for each render target object. For an MSAA object, the default is that a node using it as an input resource needs its resolved (Resolve) result, while a node using it as an output needs the multisampled (Multiple Sample) object. It follows that if no later node uses a given MSAA render target as a render target again, and later nodes need it only as an input resource, the multisampled object can be discarded, which greatly reduces storage overhead. Figure 3 shows a simple example of the effect of this logic.

(a) When Frame Graph evaluates an MSAA texture in step 3, the write-back decision gets special handling: if no later node renders to the texture again, the VkAttachmentStoreOp of the MSAA object can be set to VK_ATTACHMENT_STORE_OP_DONT_CARE.

(b) The load and store operations of the corresponding Resolve texture are fixed at VK_ATTACHMENT_LOAD_OP_DONT_CARE and VK_ATTACHMENT_STORE_OP_STORE.

(c) For an MSAA Depth Stencil Attachment, the in-place Resolve can likewise be done with the VkSubpassDescriptionDepthStencilResolve structure provided by the VK_KHR_depth_stencil_resolve extension.
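Items (a) and (b) amount to a fixed pairing of attachment settings, sketched below with stand-in types. AttachmentDesc mirrors only the fields of VkAttachmentDescription that matter here, and MsaaColorSetup and makeMsaaColorSetup are hypothetical names. In real Vulkan code the single-sampled attachment would additionally be referenced through pResolveAttachments in VkSubpassDescription, which this sketch does not model.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the load and store operation values.
enum class LoadOp { Load, DontCare };
enum class StoreOp { Store, DontCare };

// Stand-in for the relevant fields of an attachment description.
struct AttachmentDesc {
    std::uint32_t samples;
    LoadOp loadOp;
    StoreOp storeOp;
};

struct MsaaColorSetup {
    AttachmentDesc multisampled;  // rendered into during the pass
    AttachmentDesc resolve;       // single-sampled result written to memory
};

// `loadPrevious`: step 3 found an earlier stored write to the multisampled image.
// `reusedAsRenderTarget`: a later node renders into the multisampled image again.
MsaaColorSetup makeMsaaColorSetup(std::uint32_t samples, bool loadPrevious,
                                  bool reusedAsRenderTarget) {
    assert(samples > 1 && "only meaningful for multisampled targets");
    MsaaColorSetup s{};
    // Item (a): keep the multisampled data only if it is rendered to again.
    s.multisampled = {samples,
                      loadPrevious ? LoadOp::Load : LoadOp::DontCare,
                      reusedAsRenderTarget ? StoreOp::Store : StoreOp::DontCare};
    // Item (b): the resolve target never loads and always stores.
    s.resolve = {1, LoadOp::DontCare, StoreOp::Store};
    return s;
}
```

In the common case where later nodes only sample the result, the multisampled attachment is configured with both operations set to don't-care, and only the single-sampled resolve target is written back.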

Other embodiments of the present application will readily occur to those skilled in the art after considering the specification and practicing what is disclosed here. This application is intended to cover any variations, uses, or adaptations of this application that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed in this application.

It should be understood that the present application is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope.

Claims

1. A method for optimizing the Vulkan drawing flow on mobile devices, characterized in that the VkRenderPass-related calls provided by Vulkan are wrapped and their frame buffer read and write operations are optimized inside the wrapper functions, the method including the following steps:

(1) Wrap the three functions BeginRenderPass, Draw, and EndRenderPass for the upper layer to call, and make the calls to the relevant Vulkan APIs inside the wrapper functions;

The begin-render-pass function BeginRenderPass takes a set of render targets (Render Target) and the bound depth image, obtains the corresponding Vulkan render pass VkRenderPass and frame buffer object VkFramebuffer, and saves them; it then checks whether the objects saved by the previous call to BeginRenderPass are compatible with the render pass and frame buffer obtained by the current call;

If it is not compatible, call the corresponding vkCmdEndRenderPass function provided by Vulkan to terminate the last enabled rendering pass, and save the new object for actual use;

If compatible, there is no need to terminate the enabled rendering pass, and the rendering passes will be automatically merged;

After BeginRenderPass has obtained and saved the render pass and frame buffer, the drawing function Draw determines whether a Vulkan API call is needed to begin a new render pass and, if so, begins it; the interface of the end-render-pass function EndRenderPass is used to check that the upper layer's calls to this set of functions are semantically well formed, and does not call the end-render-pass function provided by Vulkan;

(2) The compatibility check performed by the begin-render-pass function in step (1) can lead to a merge only if, before that function is called, the homogeneous render pass begun by the previous draw has not been ended for some other reason; specifically:

If the frame buffer objects written by a group of rendering passes are the same, they can be merged. If each rendering pass requires different resources when performing drawing, the resources need to be updated before the rendering pass starts;

Therefore, within a group of homogeneous render passes, the update operations for the resources needed by later render passes are moved ahead of the whole group, so that they do not interrupt the homogeneous render pass already in progress; Frame Graph is used to achieve this;

(3) When the resource dependencies between render passes are determined in step (2), resources that nothing depends on can be discarded at the end of the render pass to prevent redundant write-backs; specifically, Frame Graph is used to abstract each render pass as a render node, and the render passes merged in step (1) are treated as a single render node; for each render node, Frame Graph is used to determine whether each render target in its bound frame buffer is depended on by later nodes or depends on earlier nodes; for render targets with no such dependency, the corresponding flags are set when the render pass is created, to avoid redundant reads and writes;

(4) A render pass with multisample anti-aliasing (MSAA) enabled must be resolved into an ordinary, non-multisampled texture to produce the final result; writing the multisampled target back to GPU memory at the end of the render pass and only then performing the resolve is too expensive, so the render target determination in step (3) needs special handling to avoid redundant reads and writes; specifically, when the render pass is created, a resolve target is specified for each multisampled target, that is, an additional render target is bound to the render pass so that the resolve completes in place when the render pass ends, whereby the multisampled target is discarded and only the resolved target is written back to GPU memory.

2. A method for optimizing the Vulkan rendering process of a mobile terminal according to claim 1, characterized in that the step (1) includes the following sub-steps:

(1.1) Maintain caches of render passes and frame buffers. On each call to the begin-render-pass function, look up the render pass and frame buffer that match the render targets and depth image passed in; if the lookup fails, create a new render pass and frame buffer and store them in their respective caches;

(1.2) In the begin-render-pass function, determine render pass compatibility. The keys used to look up and create render passes and frame buffers depend only on the parameters of the render targets and the depth-stencil image, not on the operations of the render pass itself;

(1.3) In the drawing function, determine whether a Vulkan API call is needed to begin a new render pass and, if so, begin it;

(1.4) In the end-render-pass function, preserve the integrity of the rendering hardware interface semantics by ensuring that every begin call has a matching end call.

3. A method for optimizing the Vulkan rendering process of a mobile terminal according to claim 1, characterized in that the step (2) includes the following sub-steps:

(2.1) Using the Frame Graph architecture, place most of the resource update operations that render passes need in the Frame Graph build phase; the actual drawing happens in the execution phase that follows the build;

(2.2) When the result of a render node is depended on by later render nodes, its state transition must not fall between later homogeneous render passes, where it would interrupt their merging; the Frame Graph analysis is therefore used to record in advance the state transitions each resource needs, and each transition is scheduled to execute after Vulkan's end-render-pass function is actually called, that is, between non-homogeneous render nodes, which keeps the result, used as a resource, from changing state between homogeneous render passes that should be merged.