Render graph

The Vulkan backend uses a render graph for sending commands to the GPU.

Multiple threads can send commands to the GPU. Every thread has a GPU context. A GPU context can only be active on a single thread at a time.

To make sure that commands from different threads are not interleaved, they are added to a render graph. When the context is flushed, the render graph is submitted to the GPU device.

Features

  • Users of the GPU module don't need to order their commands in a way that the backend supports. Data transfer, compute and drawing commands can be sent in any sequential order. The render graph reorders the commands so that data transfer and compute commands are performed outside a rendering scope. Rendering is suspended and resumed when needed.
  • Reordering commands also requires generating specific synchronization barriers. These barriers make sure that we don't read from resources that are still being written to, and ensure that the pixels of images are ordered optimally for the next access (the image layout).
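The reordering described in the first bullet can be illustrated with a minimal sketch. This is not the VKScheduler implementation: the `NodeType` enum and the `hoist_out_of_rendering` function are hypothetical names, and the sketch assumes that no hoisted command depends on a draw inside the same scope. When such a dependency exists, the backend suspends and resumes rendering instead.

```cpp
#include <cassert>
#include <vector>

/* Hypothetical node categories. The real backend has one node class per command. */
enum class NodeType { BeginRendering, Draw, EndRendering, Dispatch, Transfer };

/* Move compute and transfer nodes found inside a rendering scope to just before
 * the scope. Assumes they do not depend on draws inside that scope. */
inline std::vector<NodeType> hoist_out_of_rendering(const std::vector<NodeType> &nodes)
{
  std::vector<NodeType> result;
  std::vector<NodeType> scope; /* Nodes that stay inside the current rendering scope. */
  bool in_scope = false;
  for (NodeType node : nodes) {
    if (node == NodeType::BeginRendering) {
      assert(!in_scope && "Rendering scopes cannot be nested.");
      in_scope = true;
      scope.push_back(node);
    }
    else if (node == NodeType::EndRendering) {
      assert(in_scope && "EndRendering without BeginRendering.");
      scope.push_back(node);
      result.insert(result.end(), scope.begin(), scope.end());
      scope.clear();
      in_scope = false;
    }
    else if (in_scope && node == NodeType::Draw) {
      scope.push_back(node);
    }
    else {
      /* Outside a scope every node is emitted directly. Inside a scope, compute
       * and transfer nodes are emitted ahead of the buffered scope. */
      result.push_back(node);
    }
  }
  /* Emit a scope that was begun but not ended yet. */
  result.insert(result.end(), scope.begin(), scope.end());
  return result;
}
```

For the input `BeginRendering, Draw, Transfer, Draw, Dispatch, EndRendering` the sketch returns `Transfer, Dispatch, BeginRendering, Draw, Draw, EndRendering`, so the caller's order is kept within each category.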

Overview

  • When a context is activated on a thread, the context asks for a render graph it can use by calling VKDevice::render_graph_new.
  • Parts of the GPU backend that require GPU commands add a specific render graph node to the render graph. Each node also contains a reference to every resource it needs, including the required access and image layout.
  • When the context is flushed, the render graph is submitted to the device by calling VKDevice::render_graph_submit.
  • The device puts the render graph in VKDevice::submission_pool.
  • A single background thread (VKDevice::submission_runner) gets the next render graph to send to the GPU and processes it in the following steps.
    • Reorder the commands of the render graph to comply with Vulkan-specific command order rules and to reduce possible bottlenecks (VKScheduler).
    • Generate the required barriers (VKCommandBuilder::groups_extract_barriers). This is a separate step to reduce resource locking, giving other threads access to the resource states while they are building render graph nodes.
    • Record the GPU commands and pipeline barriers to a VkCommandBuffer (VKCommandBuilder::record_commands).
    • When recording is complete, submit the command buffer to the device queue (vkQueueSubmit).
    • Render graphs that have been submitted can be reused by another thread. This is done by pushing the render graph to the VKDevice::unused_render_graphs queue.
sequenceDiagram
    box Application thread
        participant VKContext
        participant VKDevice
    end
    participant submission_pool
    participant unused_render_graphs

    box Submission thread
        participant submission_runner
        participant VKScheduler
        participant VKCommandBuilder
    end

    box Vulkan API
        participant VkQueue
    end

    activate submission_runner

    Note over VKContext, VKDevice: Request render graph from device
    VKContext ->> VKDevice: render_graph_new()
    activate VKDevice
    VKDevice ->> unused_render_graphs: pop
    activate unused_render_graphs
    unused_render_graphs -->> VKDevice: VKRenderGraph
    deactivate unused_render_graphs
    VKDevice -->> VKContext: VKRenderGraph
    deactivate VKDevice

    Note over VKContext: Fill render graph
    VKContext ->> VKContext: Add nodes to render graph

    Note over VKContext, VKDevice: Submit render graph to device
    VKContext ->> VKDevice : render_graph_submit
    activate VKDevice
    VKDevice -->> submission_pool: push
    VKDevice -->> VKContext: TimelineValue
    deactivate VKDevice

    submission_runner ->> submission_pool: pop
    activate submission_pool
    submission_pool -->> submission_runner: VKRenderGraph
    deactivate submission_pool

    Note over submission_runner, VKScheduler: Determine recording order
    submission_runner ->> VKScheduler: select_nodes
    activate VKScheduler
    VKScheduler -->> submission_runner: NodeHandles
    deactivate VKScheduler

    Note over submission_runner, VKCommandBuilder: Extract pipeline barriers
    critical Locks resource state
    submission_runner ->> VKCommandBuilder: build_nodes
    activate VKCommandBuilder
    deactivate VKCommandBuilder
    end

    Note over submission_runner, VKCommandBuilder: Record primary command buffer
    critical Locking driver logic
    submission_runner ->> VKCommandBuilder: record_commands
    activate VKCommandBuilder
    deactivate VKCommandBuilder
    end

    Note over submission_runner, VkQueue: Submit command buffer to device queue
    submission_runner ->> VkQueue: vkQueueSubmit
    activate VkQueue
    deactivate VkQueue

    Note over unused_render_graphs, submission_runner: Mark render graph for reuse
    submission_runner ->> unused_render_graphs: push

    deactivate submission_runner

When multiple contexts request a render graph, each receives its own instance, so render graphs can be built without blocking. Submitted render graphs are processed in submission order, which ensures the resource state is correct when pipeline barriers are extracted. Recording the actual commands is separate from building the render graph, so application threads are not blocked when creating new resources or adding nodes to the render graph. Tests showed that, for our workload, it is better to record commands in a primary command buffer. Using multiple threads or secondary command buffers quickly blocks inside the driver due to mutexes or PCI bus performance.
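Why submission order matters for barrier extraction can be shown with a minimal sketch. It is not the VKCommandBuilder implementation: all names (`Access`, `ResourceUse`, `Node`, `Barrier`, `extract_barriers`) are hypothetical, resources are plain indices, and image layouts and pipeline stages are ignored. The sketch walks the nodes in order, tracks the last access of each resource and emits a barrier whenever a write is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Access { Read, Write };

/* One resource used by a node, with the access the node needs. */
struct ResourceUse {
  std::uint32_t resource;
  Access access;
};

struct Node {
  std::vector<ResourceUse> uses;
};

/* A barrier that must be recorded before the node at index `before_node`. */
struct Barrier {
  std::size_t before_node;
  std::uint32_t resource;
};

/* Walk the nodes in submission order and emit a barrier for every
 * read-after-write, write-after-read and write-after-write. */
inline std::vector<Barrier> extract_barriers(const std::vector<Node> &nodes,
                                             std::size_t resource_count)
{
  enum class State { Unused, Read, Written };
  std::vector<State> states(resource_count, State::Unused);
  std::vector<Barrier> barriers;
  for (std::size_t i = 0; i < nodes.size(); i++) {
    for (const ResourceUse &use : nodes[i].uses) {
      assert(use.resource < resource_count);
      State &state = states[use.resource];
      const bool hazard = state == State::Written ||
                          (state == State::Read && use.access == Access::Write);
      if (hazard) {
        barriers.push_back({i, use.resource});
      }
      state = use.access == Access::Write ? State::Written : State::Read;
    }
  }
  return barriers;
}
```

Because the tracked state is only valid when nodes are visited in the order they will execute, processing render graphs out of submission order would produce missing or superfluous barriers.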

Note

Most of the time in this process is spent in the driver when calling VKCommandBuilder::record_commands. Using more threads and secondary command buffers didn't improve the performance.
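The hand-over between application threads and the submission thread can be sketched as follows. This is a simplified model, not the VKDevice code: `Device`, `RenderGraph`, `wait` and `nodes_processed` are hypothetical stand-ins, the actual scheduling, recording and vkQueueSubmit call are replaced by a placeholder, and the returned timeline value only signals that the graph was processed by the background thread, not that the GPU finished.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

/* Hypothetical stand-in for VKRenderGraph: only stores opaque node ids. */
struct RenderGraph {
  std::vector<int> nodes;
};

/* Hypothetical stand-in for VKDevice. */
class Device {
 public:
  Device() : runner_([this] { run(); }) {}

  ~Device()
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    wake_runner_.notify_all();
    runner_.join();
  }

  /* Hand out a recycled render graph when one is available, else create one. */
  std::unique_ptr<RenderGraph> render_graph_new()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    if (unused_render_graphs_.empty()) {
      return std::make_unique<RenderGraph>();
    }
    std::unique_ptr<RenderGraph> graph = std::move(unused_render_graphs_.back());
    unused_render_graphs_.pop_back();
    return graph;
  }

  /* Queue the graph for the background thread and return its timeline value. */
  std::uint64_t render_graph_submit(std::unique_ptr<RenderGraph> graph)
  {
    assert(graph != nullptr);
    std::uint64_t timeline;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      timeline = ++last_submitted_;
      submission_pool_.push_back(std::move(graph));
    }
    wake_runner_.notify_all();
    return timeline;
  }

  /* Block until the graph with the given timeline value has been processed. */
  void wait(std::uint64_t timeline)
  {
    std::unique_lock<std::mutex> lock(mutex_);
    assert(timeline <= last_submitted_);
    completed_.wait(lock, [&] { return last_completed_ >= timeline; });
  }

  std::size_t nodes_processed()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return nodes_processed_;
  }

 private:
  /* Body of the single submission thread. Graphs are processed in FIFO order. */
  void run()
  {
    std::unique_lock<std::mutex> lock(mutex_);
    while (true) {
      wake_runner_.wait(lock, [this] { return stop_ || !submission_pool_.empty(); });
      if (submission_pool_.empty()) {
        return; /* Stop requested and nothing left to process. */
      }
      std::unique_ptr<RenderGraph> graph = std::move(submission_pool_.front());
      submission_pool_.pop_front();
      lock.unlock();
      /* Placeholder for scheduling, barrier extraction, recording and submission. */
      const std::size_t count = graph->nodes.size();
      graph->nodes.clear();
      lock.lock();
      nodes_processed_ += count;
      last_completed_++;
      unused_render_graphs_.push_back(std::move(graph));
      completed_.notify_all();
    }
  }

  std::mutex mutex_;
  std::condition_variable wake_runner_, completed_;
  std::deque<std::unique_ptr<RenderGraph>> submission_pool_;
  std::vector<std::unique_ptr<RenderGraph>> unused_render_graphs_;
  std::uint64_t last_submitted_ = 0, last_completed_ = 0;
  std::size_t nodes_processed_ = 0;
  bool stop_ = false;
  std::thread runner_; /* Declared last so it starts after the other members exist. */
};
```

Each caller owns its render graph between `render_graph_new` and `render_graph_submit`, so filling a graph needs no lock. The mutex is only held while moving graphs between the queues.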

Nodes

Commands in the render graph are recorded as nodes. Unlike in other render graphs, the render scope is split into multiple nodes. This is because Blender currently doesn't have the concept of a render scope.

| Group | Node | Vulkan API | Optional commands |
|---|---|---|---|
| Draw | VKBeginRenderingNode | vkCmdBeginRendering/vkCmdBeginRenderPass | |
| Draw | VKClearAttachmentsNode | vkCmdClearAttachments | |
| Draw | VKDrawIndexedIndirectNode | vkCmdDrawIndexedIndirect | vkCmdBindPipeline/vkCmdBindDescriptorSets/vkCmdPushConstants/vkCmdBindVertexBuffers/vkCmdBindIndexBuffer |
| Draw | VKDrawIndexedNode | vkCmdDrawIndexed | vkCmdBindPipeline/vkCmdBindDescriptorSets/vkCmdPushConstants/vkCmdBindVertexBuffers/vkCmdBindIndexBuffer |
| Draw | VKDrawIndirectNode | vkCmdDrawIndirect | vkCmdBindPipeline/vkCmdBindDescriptorSets/vkCmdPushConstants/vkCmdBindVertexBuffers |
| Draw | VKDrawNode | vkCmdDraw | vkCmdBindPipeline/vkCmdBindDescriptorSets/vkCmdPushConstants/vkCmdBindVertexBuffers |
| Draw | VKEndRenderingNode | vkCmdEndRendering/vkCmdEndRenderPass | |
| Compute | VKDispatchIndirectNode | vkCmdDispatchIndirect | vkCmdBindPipeline/vkCmdBindDescriptorSets/vkCmdPushConstants |
| Compute | VKDispatchNode | vkCmdDispatch | vkCmdBindPipeline/vkCmdBindDescriptorSets/vkCmdPushConstants |
| Data transfer | VKBlitImageNode | vkCmdBlitImage | |
| Data transfer | VKClearColorImageNode | vkCmdClearColorImage | |
| Data transfer | VKClearDepthStencilImageNode | vkCmdClearDepthStencilImage | |
| Data transfer | VKCopyBufferNode | vkCmdCopyBuffer | |
| Data transfer | VKCopyBufferToImageNode | vkCmdCopyBufferToImage | |
| Data transfer | VKCopyImageNode | vkCmdCopyImage | |
| Data transfer | VKCopyImageToBufferNode | vkCmdCopyImageToBuffer | |
| Data transfer | VKFillBufferNode | vkCmdFillBuffer | |
| Data transfer | VKSynchronizationNode | vkCmdPipelineBarrier (only used for swapchain barriers) | |
| Data transfer | VKUpdateBufferNode | vkCmdUpdateBuffer | |
| Data transfer | VKUpdateMipmapsNode | vkCmdPipelineBarrier/vkCmdBlitImage | |
| GPU selection | VKBeginQueryNode | vkCmdBeginQuery | |
| GPU selection | VKEndQueryNode | vkCmdEndQuery | |
| GPU selection | VKResetQueryPoolNode | vkCmdResetQueryPool | |

Note

Draw (vkCmdDraw*) and dispatch (vkCmdDispatch*) nodes only send binding commands when the bindings have changed. When the same resources are attached again, no binding commands are generated.
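The note above describes a common state-tracking pattern, which the following minimal sketch illustrates. The `Bindings` struct and `BindingTracker` class are hypothetical and not part of the backend. Handles are plain integers where 0 means "nothing bound" (an assumption of this sketch), and commands are recorded as strings instead of real Vulkan calls.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

/* Hypothetical subset of the state a draw depends on. 0 means nothing is bound. */
struct Bindings {
  std::uint64_t pipeline = 0;
  std::uint64_t descriptor_set = 0;
  std::uint64_t vertex_buffer = 0;
};

/* Remembers what is currently bound and only records binding commands for
 * state that differs from it. */
class BindingTracker {
 public:
  void record_draw(const Bindings &wanted, std::vector<std::string> &r_commands)
  {
    assert(wanted.pipeline != 0 && "A draw needs a pipeline in this sketch.");
    if (wanted.pipeline != bound_.pipeline) {
      r_commands.push_back("vkCmdBindPipeline");
    }
    if (wanted.descriptor_set != bound_.descriptor_set) {
      r_commands.push_back("vkCmdBindDescriptorSets");
    }
    if (wanted.vertex_buffer != bound_.vertex_buffer) {
      r_commands.push_back("vkCmdBindVertexBuffers");
    }
    bound_ = wanted;
    r_commands.push_back("vkCmdDraw");
  }

 private:
  Bindings bound_;
};
```

Two consecutive draws with identical bindings record the three binding commands once, followed by two vkCmdDraw entries. A third draw that only changes the vertex buffer adds a single vkCmdBindVertexBuffers before its draw.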

Creating a node

In development: The next section describes work that is in development (2025Q1). It is already documented here because the current implementation performs too many memory copies.

Nodes are first allocated. During allocation, memory is reserved that the caller can use to store the data that the node requires. The caller fills the data and calls finalize_node.

Node data can be quite large (VKBeginRenderingNode::Data is just under 1 KB). A node therefore stores its data in an array (render_graph.storage.begin_rendering) and points to the item via VKRenderGraphNode::storage_index.

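/* The render graph of the active context. */
VKRenderGraph &render_graph;
/* Reserve storage for the node. The caller fills the returned data. */
VKBeginRenderingNode::Data &node_data = render_graph.allocate_node<VKBeginRenderingNode>();
VKBeginRenderingNode::CreateInfo create_info = {};

/* Fill data. */
...

/* Fill create info. */
...

/* Must be called before another node is allocated. */
render_graph.finalize_node<VKBeginRenderingNode>(create_info);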

During finalization, the node extracts its resource dependencies together with the required access flags and image layouts.

  • After calling allocate_node, finalize_node must be called before allocating a new node.
  • For readability we could wrap the node_handle, node_data and create_info in a wrapper.
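The allocate and finalize pattern, with node data kept in a separate array and referenced by index, can be sketched as below. This is a simplified model under stated assumptions, not the planned implementation: `BeginRenderingData`, `Node`, `RenderGraph` and the non-templated function names are hypothetical, only one node type exists, and resource dependency extraction is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

/* Hypothetical stand-in for VKBeginRenderingNode::Data. */
struct BeginRenderingData {
  std::uint32_t width = 0;
  std::uint32_t height = 0;
};

enum class NodeType { BeginRendering };

/* Small node record that points into the per-type storage array. */
struct Node {
  NodeType type;
  std::size_t storage_index;
};

class RenderGraph {
 public:
  /* Reserve storage for a new node. The reference is only valid until the
   * next allocation, as the storage array may grow. */
  BeginRenderingData &allocate_begin_rendering()
  {
    assert(!node_pending_ && "Finalize the previous node before allocating a new one.");
    node_pending_ = true;
    begin_rendering_storage_.emplace_back();
    return begin_rendering_storage_.back();
  }

  /* Add the node that refers to the most recently allocated data.
   * Returns the node handle. */
  std::size_t finalize_begin_rendering()
  {
    assert(node_pending_ && "No node was allocated.");
    node_pending_ = false;
    nodes_.push_back({NodeType::BeginRendering, begin_rendering_storage_.size() - 1});
    return nodes_.size() - 1;
  }

  const Node &node(std::size_t node_handle) const
  {
    return nodes_[node_handle];
  }

  const BeginRenderingData &begin_rendering_data(std::size_t node_handle) const
  {
    return begin_rendering_storage_[nodes_[node_handle].storage_index];
  }

 private:
  std::vector<BeginRenderingData> begin_rendering_storage_;
  std::vector<Node> nodes_;
  bool node_pending_ = false;
};
```

Writing directly into the reserved storage avoids copying the large data struct into the graph afterwards. The assertion in `allocate_begin_rendering` enforces the rule that a node is finalized before the next one is allocated.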