Render graph¶
The Vulkan backend uses a render graph for sending commands to the GPU.
Multiple threads can send commands to the GPU. Every thread has a GPU context. A GPU context can only be active on a single thread.
To keep commands from different threads from being interleaved, each context adds its commands to a render graph. When the context is flushed, the render graph is submitted to the GPU device.
Features¶
- Users of the GPU module don't need to order their commands in a way the backend supports. Data transfer, compute and drawing commands can be sent in any sequential order. The render graph reorders the commands so that data transfer and compute commands are performed outside a rendering scope. Rendering is suspended and resumed when needed.
- Reordering commands also requires generating specific synchronization barriers. These barriers make sure that we don't read from resources that are still being written to, and that images are in the optimal layout (the ordering of their pixels in memory) for each access.
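The reordering described above can be illustrated with a minimal sketch. It is not the actual VKScheduler logic: the `Node` struct, the `reads_scope_output` flag and the `reorder` function are hypothetical names introduced only for this example, and real dependency tracking is reduced to a single boolean.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class NodeType { BeginRendering, EndRendering, Draw, Dispatch, Copy };

struct Node {
  NodeType type;
  /* Hypothetical flag: the node reads output of an earlier draw in the active scope. */
  bool reads_scope_output = false;
};

/* Reorder recorded nodes so no transfer or compute node stays inside a rendering scope. */
std::vector<Node> reorder(const std::vector<Node> &recorded)
{
  std::vector<Node> result;
  bool in_scope = false;
  std::size_t scope_begin = 0; /* Index in `result` of the active BeginRendering. */

  for (const Node &node : recorded) {
    switch (node.type) {
      case NodeType::BeginRendering:
        assert(!in_scope);
        in_scope = true;
        scope_begin = result.size();
        result.push_back(node);
        break;
      case NodeType::EndRendering:
        assert(in_scope);
        in_scope = false;
        result.push_back(node);
        break;
      case NodeType::Draw:
        result.push_back(node);
        break;
      case NodeType::Dispatch:
      case NodeType::Copy:
        if (!in_scope) {
          result.push_back(node);
        }
        else if (!node.reads_scope_output) {
          /* Independent of the scope: hoist in front of the BeginRendering. */
          result.insert(result.begin() + scope_begin, node);
          scope_begin++;
        }
        else {
          /* Depends on earlier draws: suspend rendering, run the node, resume. */
          result.push_back({NodeType::EndRendering});
          result.push_back(node);
          scope_begin = result.size();
          result.push_back({NodeType::BeginRendering});
        }
        break;
    }
  }
  return result;
}

/* One letter per node (B, E, D, X for dispatch, C for copy), for printing and testing. */
std::string to_string(const std::vector<Node> &nodes)
{
  std::string text;
  for (const Node &node : nodes) {
    text += "BEDXC"[static_cast<int>(node.type)];
  }
  return text;
}
```

With this sketch, a recorded sequence of begin, draw, independent copy, draw, dependent dispatch, draw, end comes out as copy, begin, draw, draw, end, dispatch, begin, draw, end: the copy is hoisted and rendering is suspended only around the dispatch.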
Overview¶
- When a context is activated on a thread, it requests a render graph from the device by calling VKDevice::render_graph_new.
- Parts of the GPU backend that require GPU commands add a specific render graph node to the render graph. Each node also contains a reference to every resource it needs, including the required access and image layout.
- When the context is flushed, the render graph is submitted to the device by calling VKDevice::render_graph_submit.
- The device puts the render graph in VKDevice::submission_pool.
- A single background thread (VKDevice::submission_runner) takes the next render graph from the pool and sends it to the GPU:
  - The commands of the render graph are reordered to comply with Vulkan-specific command order rules and to reduce possible bottlenecks (VKScheduler).
  - The required barriers are generated (VKCommandBuilder::groups_extract_barriers). This is a separate step to reduce resource locking, giving other threads access to the resource states while they are building render graph nodes.
  - GPU commands and pipeline barriers are recorded to a VkCommandBuffer (VKCommandBuilder::record_commands).
  - When recording has completed, the command buffer is submitted to the device queue (vkQueueSubmit).
  - Render graphs that have been submitted can be reused by another thread. This is done by pushing the render graph to the VKDevice::unused_render_graphs queue.
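The life cycle in the list above can be sketched as follows. This is a simplified, single-threaded model rather than the real VKDevice: the `Device` struct and its members are stand-ins invented for illustration, the background thread is replaced by an explicit `submission_runner_step` call so the example stays deterministic, all locking is omitted, and the returned timeline value is a plain counter.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct RenderGraph {
  std::vector<std::string> nodes;
};

struct Device {
  std::deque<std::unique_ptr<RenderGraph>> submission_pool;
  std::vector<std::unique_ptr<RenderGraph>> unused_render_graphs;
  std::vector<std::string> recorded_commands; /* Stands in for the VkCommandBuffer. */
  std::uint64_t timeline = 0;

  /* Hand out a recycled render graph when one is available. */
  std::unique_ptr<RenderGraph> render_graph_new()
  {
    if (unused_render_graphs.empty()) {
      return std::make_unique<RenderGraph>();
    }
    std::unique_ptr<RenderGraph> graph = std::move(unused_render_graphs.back());
    unused_render_graphs.pop_back();
    return graph;
  }

  /* Queue the render graph; the caller continues without waiting for recording. */
  std::uint64_t render_graph_submit(std::unique_ptr<RenderGraph> graph)
  {
    assert(graph != nullptr);
    submission_pool.push_back(std::move(graph));
    return ++timeline;
  }

  /* One iteration of the submission runner. Returns false when the pool is empty. */
  bool submission_runner_step()
  {
    if (submission_pool.empty()) {
      return false;
    }
    std::unique_ptr<RenderGraph> graph = std::move(submission_pool.front());
    submission_pool.pop_front();
    /* Scheduling and barrier extraction would happen here; this sketch only records. */
    for (const std::string &node : graph->nodes) {
      recorded_commands.push_back(node);
    }
    /* The queue submission would happen here. Afterwards the graph is recycled. */
    graph->nodes.clear();
    unused_render_graphs.push_back(std::move(graph));
    return true;
  }
};
```

Two contexts can each call `render_graph_new`, fill their graph independently and submit. Repeated calls to `submission_runner_step` then record the graphs in submission order and return them to `unused_render_graphs`.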
sequenceDiagram
box Application thread
participant VKContext
participant VKDevice
end
participant submission_pool
participant unused_render_graphs
box Submission thread
participant submission_runner
participant VKScheduler
participant VKCommandBuilder
end
box Vulkan API
participant VkQueue
end
activate submission_runner
Note over VKContext, VKDevice: Request render graph from device
VKContext ->> VKDevice: render_graph_new()
activate VKDevice
VKDevice ->> unused_render_graphs: pop
activate unused_render_graphs
unused_render_graphs -->> VKDevice: VKRenderGraph
deactivate unused_render_graphs
VKDevice -->> VKContext: VKRenderGraph
deactivate VKDevice
Note over VKContext: Fill render graph
VKContext ->> VKContext: Add nodes to render graph
Note over VKContext, VKDevice: Submit render graph to device
VKContext ->> VKDevice : render_graph_submit
activate VKDevice
VKDevice -->> submission_pool: push
VKDevice -->> VKContext: TimelineValue
deactivate VKDevice
submission_runner ->> submission_pool: pop
activate submission_pool
submission_pool -->> submission_runner: VKRenderGraph
deactivate submission_pool
Note over submission_runner, VKScheduler: Determine recording order
submission_runner ->> VKScheduler: select_nodes
activate VKScheduler
VKScheduler -->> submission_runner: NodeHandles
deactivate VKScheduler
Note over submission_runner, VKCommandBuilder: Extract pipeline barriers
critical Locks resource state
submission_runner ->> VKCommandBuilder: build_nodes
activate VKCommandBuilder
deactivate VKCommandBuilder
end
Note over submission_runner, VKCommandBuilder: Record primary command buffer
critical Locking driver logic
submission_runner ->> VKCommandBuilder: record_commands
activate VKCommandBuilder
deactivate VKCommandBuilder
end
Note over submission_runner, VkQueue: Submit command buffer to device queue
submission_runner ->> VkQueue: vkQueueSubmit
activate VkQueue
deactivate VkQueue
Note over unused_render_graphs, submission_runner: Mark render graph for reuse
submission_runner ->> unused_render_graphs: push
deactivate submission_runner
When multiple contexts request a render graph, each receives its own instance, so render graphs can be built without blocking. Submitted render graphs are processed in submission order, which ensures that the resource state is correct when pipeline barriers are extracted. Recording the actual commands is a separate step from command building, so application threads are not blocked when creating new resources or adding nodes to the render graph. Tests showed that, for our workload, it is better to record commands into a single primary command buffer. Using multiple threads or secondary command buffers quickly blocks inside the driver due to mutexes or PCI bus performance.
Note
Most of the time in this process is spent in the driver when calling VKCommandBuilder::record_commands. Using more threads and secondary command builders didn't improve the performance.
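Why submission order matters for barrier extraction can be shown with a small sketch. It is an illustration under simplifying assumptions, not the VKCommandBuilder implementation: `BarrierExtractor`, `ResourceUse`, `ResourceState` and the reduced `Access` and `Layout` enums are hypothetical, and pipeline stages and access masks are left out entirely.

```cpp
#include <cassert>
#include <map>
#include <vector>

enum class Access { Read, Write };
enum class Layout { Undefined, TransferDst, ShaderRead, ColorAttachment };

/* What a node declares about one resource it uses. */
struct ResourceUse {
  int resource;
  Access access;
  Layout layout;
};

struct Barrier {
  int resource;
  Layout old_layout;
  Layout new_layout;
};

/* State that persists between render graphs, keyed by resource. */
struct ResourceState {
  bool used = false;
  Access last_access = Access::Read;
  Layout layout = Layout::Undefined;
};

class BarrierExtractor {
 public:
  /* Walk the uses of one render graph in order and collect the barriers they need. */
  std::vector<Barrier> extract(const std::vector<ResourceUse> &uses)
  {
    std::vector<Barrier> barriers;
    for (const ResourceUse &use : uses) {
      assert(use.layout != Layout::Undefined); /* A node requests a concrete layout. */
      ResourceState &state = states_[use.resource];
      const bool layout_change = state.layout != use.layout;
      /* Only read-after-read needs no synchronization. */
      const bool hazard = state.used && (state.last_access == Access::Write ||
                                         use.access == Access::Write);
      if (layout_change || hazard) {
        barriers.push_back({use.resource, state.layout, use.layout});
      }
      state.used = true;
      state.last_access = use.access;
      state.layout = use.layout;
    }
    return barriers;
  }

 private:
  std::map<int, ResourceState> states_;
};
```

Because `states_` outlives a single call, extracting a graph that uploads to an image and then a graph that samples it yields a transition from the transfer layout to the shader-read layout in the second graph. Processing the two graphs in the opposite order would start from the wrong state.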
Nodes¶
Commands in the render graph are recorded as nodes. Compared to other render graphs, the render scope is split into multiple nodes. This is because Blender currently doesn't have the concept of a render scope.
Node | Vulkan API |
---|---|
Draw | |
VKBeginRenderingNode | vkCmdBeginRendering / vkCmdBeginRenderPass |
VKClearAttachmentsNode | vkCmdClearAttachments |
VKDrawIndexedIndirectNode | vkCmdDrawIndexedIndirect; optional: [vkCmdBindPipeline / vkCmdBindDescriptorSets / vkCmdPushConstants / vkCmdBindVertexBuffers / vkCmdBindIndexBuffer] |
VKDrawIndexedNode | vkCmdDrawIndexed; optional: [vkCmdBindPipeline / vkCmdBindDescriptorSets / vkCmdPushConstants / vkCmdBindVertexBuffers / vkCmdBindIndexBuffer] |
VKDrawIndirectNode | vkCmdDrawIndirect; optional: [vkCmdBindPipeline / vkCmdBindDescriptorSets / vkCmdPushConstants / vkCmdBindVertexBuffers] |
VKDrawNode | vkCmdDraw; optional: [vkCmdBindPipeline / vkCmdBindDescriptorSets / vkCmdPushConstants / vkCmdBindVertexBuffers] |
VKEndRenderingNode | vkCmdEndRendering / vkCmdEndRenderPass |
Compute | |
VKDispatchIndirectNode | vkCmdDispatchIndirect; optional: [vkCmdBindPipeline / vkCmdBindDescriptorSets / vkCmdPushConstants] |
VKDispatchNode | vkCmdDispatch; optional: [vkCmdBindPipeline / vkCmdBindDescriptorSets / vkCmdPushConstants] |
Data transfer | |
VKBlitImageNode | vkCmdBlitImage |
VKClearColorImageNode | vkCmdClearColorImage |
VKClearDepthStencilImageNode | vkCmdClearDepthStencilImage |
VKCopyBufferNode | vkCmdCopyBuffer |
VKCopyBufferToImageNode | vkCmdCopyBufferToImage |
VKCopyImageNode | vkCmdCopyImage |
VKCopyImageToBufferNode | vkCmdCopyImageToBuffer |
VKFillBufferNode | vkCmdFillBuffer |
VKSynchronizationNode | vkCmdPipelineBarrier (only used for swapchain barriers) |
VKUpdateBufferNode | vkCmdUpdateBuffer |
VKUpdateMipmapsNode | vkCmdPipelineBarrier / vkCmdBlitImage |
GPU selection | |
VKBeginQueryNode | vkCmdBeginQuery |
VKEndQueryNode | vkCmdEndQuery |
VKResetQueryPoolNode | vkCmdResetQueryPool |
Note
Nodes that map to vkCmdDraw* and vkCmdDispatch* only send binding commands when the bindings differ from what is currently bound. When the same resources are attached again, no binding commands are generated.
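A minimal sketch of this kind of redundant-binding elimination is shown below. It assumes a much simpler state model than the backend uses: `CommandRecorder` and `Bindings` are hypothetical, handles are plain integers, commands are recorded as strings, and only the pipeline and one descriptor set are tracked.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

/* Stand-ins for Vulkan handles; 0 plays the role of VK_NULL_HANDLE. */
struct Bindings {
  std::uint64_t pipeline = 0;
  std::uint64_t descriptor_set = 0;
};

class CommandRecorder {
 public:
  std::vector<std::string> commands;

  /* Record a draw, preceded by only those bind commands that change state. */
  void draw(const Bindings &wanted)
  {
    assert(wanted.pipeline != 0); /* A draw always needs a pipeline. */
    if (wanted.pipeline != bound_.pipeline) {
      commands.push_back("vkCmdBindPipeline");
    }
    if (wanted.descriptor_set != bound_.descriptor_set) {
      commands.push_back("vkCmdBindDescriptorSets");
    }
    bound_ = wanted;
    commands.push_back("vkCmdDraw");
  }

 private:
  Bindings bound_; /* What the command buffer currently has bound. */
};
```

Drawing twice with identical bindings records both bind commands once, followed by two draws. Changing only the descriptor set adds a single vkCmdBindDescriptorSets before the next draw.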
Creating a node¶
In development: The next section is in development (2025Q1). It is already added here as the current implementation performs too many memory copies.
Nodes are first allocated. During allocation, memory is reserved that the caller can use to store the data the node requires. The caller fills in the data and then calls finalize_node.
Node data can be quite large (VKBeginRenderingNode::Data is just under 1 KB), so it is stored in an array (render_graph.storage.begin_rendering) and the node points to the item via VKRenderGraphNode::storage_index.
VKRenderGraph &render_graph;
/* Reserve storage for the node data; the returned reference is filled by the caller. */
VKBeginRenderingNode::Data &node_data = render_graph.allocate_node<VKBeginRenderingNode>();
VKBeginRenderingNode::CreateInfo create_info = {};
/* Fill data. */
...
/* Fill create info. */
...
/* Must be called before another node is allocated. */
render_graph.finalize_node<VKBeginRenderingNode>(create_info);
During finalization, the node extracts its resource dependencies together with the required access flags and image layouts.
- After calling allocate_node, finalize_node must be called before a new node is allocated.
- For readability we could wrap the node_handle, node_data and create_info in a wrapper.
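The allocate and finalize pattern, with node data kept in a separate array and referenced by index, could look roughly like the sketch below. It is a guess at the shape of the idea, not the in-development API: the class is reduced to one node type, the method names `allocate_begin_rendering` and `finalize_begin_rendering` are hypothetical, and the create info and resource extraction are left out. The assertion reflects one plausible reason for the ordering rule in this sketch: a later allocation may grow the array and invalidate the reference handed out earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

/* Stand-in for VKBeginRenderingNode::Data, reduced to two fields. */
struct BeginRenderingData {
  std::uint32_t width = 0;
  std::uint32_t height = 0;
};

enum class NodeType { BeginRendering };

/* A node is small: it only stores its type and where its data lives. */
struct Node {
  NodeType type;
  std::size_t storage_index;
};

class RenderGraph {
 public:
  /* Reserve storage and hand it to the caller to fill in place (no extra copy). */
  BeginRenderingData &allocate_begin_rendering()
  {
    assert(!pending_ && "finalize the previous node before allocating a new one");
    pending_ = true;
    begin_rendering_.emplace_back();
    return begin_rendering_.back();
  }

  /* Turn the most recent allocation into a node and return its handle. */
  std::size_t finalize_begin_rendering()
  {
    assert(pending_);
    pending_ = false;
    nodes_.push_back({NodeType::BeginRendering, begin_rendering_.size() - 1});
    return nodes_.size() - 1;
  }

  const BeginRenderingData &data(std::size_t node_handle) const
  {
    return begin_rendering_[nodes_[node_handle].storage_index];
  }

  std::size_t node_count() const
  {
    return nodes_.size();
  }

 private:
  std::vector<Node> nodes_;
  std::vector<BeginRenderingData> begin_rendering_; /* Typed storage array. */
  bool pending_ = false;
};
```

A caller allocates, writes the fields through the returned reference, and finalizes to obtain a handle. Data is then looked up through the handle rather than through the original reference.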