API Without Secrets: Introduction to Vulkan
Preface
Written by Pawel Lapinski
Recently, with our team here at Intel, I was involved in preparing validation tools for our graphics driver's support for the emerging API called Vulkan. This graphics programming interface and the approach it represents are new to me. The idea came to me that while I'm learning about it I can, at the same time, prepare a tutorial for writing applications that use Vulkan. I can share my thoughts and experiences as someone who knows OpenGL and would like to "migrate" to its successor.
About Vulkan
Vulkan is seen as OpenGL's successor. It is a multiplatform API that allows developers to prepare high-performance graphics applications like games, CAD tools, and benchmarks. It can be used on different operating systems like Windows*, Linux*, or Android*. The Khronos consortium created and maintains Vulkan. Vulkan also shares some other similarities with OpenGL, including graphics pipeline stages, GLSL shaders (sort of), and nomenclature.
But there are many differences that confirm the need for the new API. OpenGL has been changing for over 20 years, and many things have changed in the computer industry since the early 90s, especially in graphics card architecture. OpenGL is a good library, but not everything can be done by only adding new functionalities that match the abilities of new graphics cards. Sometimes a huge redesign has to be made. And that's why Vulkan was created.
Vulkan was based on Mantle*, the first in a series of new low-level graphics APIs. Mantle was developed by AMD and designed only for the architecture of Radeon cards. And despite being the first publicly available API of its kind, games and benchmarks that used Mantle saw some impressive performance gains. Then other low-level APIs started appearing, such as Microsoft's DirectX* 12, Apple's Metal*, and now Vulkan.
What is the difference between traditional graphics APIs and new low-level APIs? High-level APIs like OpenGL are quite
easy to use. The developer declares what they want to do and how they want to do it, and the driver handles the rest. The
driver checks whether the developer uses API calls in the proper way, whether the correct parameters are passed, and
whether the state is adequately prepared. If problems occur, feedback is provided. For ease of use, many tasks have to be
done “behind the scenes” by the driver.
In low-level APIs the developer is the one who must take care of most things. They are required to adhere to strict
programming and usage rules and also must write much more code. But this approach is reasonable. The developer knows
what they want to do and what they want to achieve. The driver does not, so with traditional APIs the driver has to make
additional effort for the program to work properly. With APIs like Vulkan this additional effort can be avoided. That’s why
DirectX 12, Metal, or Vulkan are called thin-drivers/thin-APIs. Mostly they only communicate user requests to the
hardware, providing only a thin abstraction layer of the hardware itself. The driver does as little as possible for the sake
of much higher performance.
Low-level APIs require additional work on the application side. But this work can't be avoided: someone or something has to do it. So it is much more reasonable for the developer to do it, as they know how to divide work into separate threads, when an image will be a render target (color attachment) or be used as a texture/sampler, and so on. The developer knows what pipeline state or which vertex attributes change more often. All that leads to far more effective use of the graphics card hardware. And the best part is that it works. An impressive performance boost can be observed.
But the word “can” is important. It requires additional effort but also a proper approach. There are scenarios in which
no difference in performance between OpenGL and Vulkan will be observed. If someone doesn’t need multithreading or
if the application isn’t CPU bound (rendered scenes aren’t too complex), OpenGL is enough and using Vulkan will not give
any performance boost (but it may lower power consumption, which is important on mobile devices). But if we want to
squeeze every last bit from our graphics hardware, Vulkan is the way to go.
Sooner or later all major graphics engines will support some, if not all, of the new low-level APIs. So if we want to use
Vulkan or other APIs, we won’t have to write everything from scratch. But it is always good to know what is going on
“under the hood”, and that’s the reason I have prepared this tutorial.
https://github.com/GameTechDev/IntroductionToVulkan
I have tried to write code samples that are as simple as possible and to not clutter the code with unnecessary “#ifdefs”.
Sometimes this can’t be avoided (like in window creation and management) so I decided to divide the code into small
parts:
Tutorial files are the most important here. They are the ones where all the exciting Vulkan-related code is placed.
Each lesson is placed in one header/source pair.
OperatingSystem header and source files contain OS-dependent parts of code like window creation, message
processing, and rendering loops. These files contain code for both Linux and Windows, but I tried to unify them
as much as possible.
main.cpp file is a starting point for each lesson. As it uses my custom Window class it doesn’t contain any OS-
specific code.
VulkanCommon header/source files contain the base class for all tutorials starting from tutorial 3. This class
basically replicates tutorials 1 and 2—creation of a Vulkan instance and all other resources necessary for the
rendered image to appear on the screen. I’ve extracted this preparation code so the code of all the other chapters
could focus on only the presented topics.
Tools contain some additional utility functions and classes like a function that reads the contents of a binary file
or a wrapper class for automatic object destruction.
The code for each chapter is placed in a separate folder. Sometimes it may contain an additional Data directory in
which resources like shaders or textures for a given chapter are placed. This Data folder should be copied to the same
directory in which executables will be held. By default executables are compiled into a build folder.
Right, compilation and the build folder. As the sample project should be easily maintainable on both Windows and Linux, I've decided to use a CMakeLists.txt file and the CMake tool. On Windows there is a build.bat file that creates a Visual Studio* solution; Microsoft Visual Studio 2013 is required to compile the code on Windows (by default). On Linux I've provided a build.sh script that compiles the code using make, but CMakeLists.txt can also be easily opened with tools like QtCreator. CMake is of course also required.
Solution and project files are generated and executables are compiled into the build folder. This folder is also the
default working directory, so the Data folders should be copied into it for the lessons to work properly. During execution,
in case of any problems, additional information is “printed” in cmd/terminal. So if there is something wrong, run the lesson
from the command line/terminal or look into the console/terminal window to see if any messages are displayed.
I hope these notes will help you understand and follow my Vulkan tutorial. Now let’s focus on learning Vulkan itself!
There are three ways we can use Vulkan in our application:
1. We can dynamically load the driver's library that provides the Vulkan API implementation and acquire function pointers from it by ourselves.
2. We can use the Vulkan SDK and link with the provided Vulkan Runtime (Vulkan Loader) static library.
3. We can use the Vulkan SDK, dynamically load Vulkan Loader library at runtime, and load function pointers from
it.
The first approach is not recommended. Hardware vendors can modify their drivers in any way, and it may affect compatibility with a given application. It may even break the application and require developers writing a Vulkan-enabled application to rewrite some parts of the code. That's why it's better to use some level of abstraction.
The recommended solution is to use the Vulkan Loader from the Vulkan SDK. It provides more configuration abilities
and more flexibility without the need to modify Vulkan application source code. One example of the flexibility is Layers.
The Vulkan API requires developers to create applications that strictly follow API usage rules. In case of any errors, the driver provides us with little feedback; only some severe and important errors are reported (for example, out of memory).
This approach is used so the API itself can be as small (thin) and as fast as possible. But if we want to obtain more
information about what we are doing wrong we have to enable debug/validation layers. There are different layers for
different purposes such as memory usage, proper parameter passing, object life-time checking, and so on. These layers
all slow down the application’s performance but provide us with much more information.
We also need to choose whether we want to statically link with a Vulkan Loader or whether we will load it dynamically
and acquire function pointers by ourselves at runtime. This choice is just a matter of personal preference. This paper
focuses on the third way of using Vulkan: dynamically loading function pointers from the Vulkan Runtime library. This
approach is similar to what we had to do when we wanted to use OpenGL* on a Windows* system, in which only some basic functions were provided by the default implementation. The remaining functions had to be loaded dynamically using the wglGetProcAddress() or the standard Windows GetProcAddress() function. This is what wrangler libraries such as GLEW or GL3W were created for.
From now on, I refer to the first tutorial’s source code, focusing on the Tutorial01.cpp file. So in the initialization code
of our application we have to load the Vulkan library with something like this:
#if defined(VK_USE_PLATFORM_WIN32_KHR)
VulkanLibrary = LoadLibrary( "vulkan-1.dll" );
#elif defined(VK_USE_PLATFORM_XCB_KHR) || defined(VK_USE_PLATFORM_XLIB_KHR)
VulkanLibrary = dlopen( "libvulkan.so", RTLD_NOW );
#endif

1. Tutorial01.cpp, function LoadVulkanLibrary()
The vkGetInstanceProcAddr() function, the only function exported by this library, is used to load all other Vulkan functions. To ease our work of obtaining the addresses of all Vulkan API functions, it is very convenient to place their names inside a macro. This way we won't have to duplicate function names in multiple places (like definition, declaration, or loading) and can keep them in only one header file. This single file will be used later for different purposes with an #include directive. We can declare our exported function like this:
#if !defined(VK_EXPORTED_FUNCTION)
#define VK_EXPORTED_FUNCTION( fun )
#endif
VK_EXPORTED_FUNCTION( vkGetInstanceProcAddr )
#undef VK_EXPORTED_FUNCTION
2. ListOfFunctions.inl
Now we define the variables that will represent functions from the Vulkan API. This can be done with something like
this:
#include "vulkan.h"
#include "ListOfFunctions.inl"
3. VulkanFunctions.cpp
Here we first include the vulkan.h file, which is officially provided for developers who want to use the Vulkan API in their applications. This file is similar to the gl.h file in the OpenGL library. It defines all enumerations, structures, types, and function types that are necessary for Vulkan application development. Next we define the macros for functions from each "level" (I will describe these levels soon). The function definition requires providing a function type and a function name. Fortunately, function types in Vulkan can be easily derived from function names. For example, the definition of the vkGetInstanceProcAddr() function's type looks like this:

typedef PFN_vkVoidFunction (VKAPI_PTR *PFN_vkGetInstanceProcAddr)(VkInstance instance, const char* pName);
4. vulkan.h
The definition of a variable that represents this function would then look like this:
PFN_vkGetInstanceProcAddr vkGetInstanceProcAddr;
This is what the macros from the VulkanFunctions.cpp file expand to. They take the function name (hidden in the "fun" parameter) and add "PFN_" at the beginning to form the type. The macro then places a space after the type, followed by the function name and a semicolon. The functions are "pasted" into the file at the line with the #include "ListOfFunctions.inl" directive.
But we must remember that when we want to define the Vulkan function prototypes by ourselves, we must define the VK_NO_PROTOTYPES preprocessor macro. By default the vulkan.h header file contains declarations (prototypes) of all functions, which is useful when we are statically linking with the Vulkan Runtime. So when we add our own definitions, there would be a compilation error claiming that the given names (our function-pointer variables) are defined more than once (we would break the One Definition Rule). We can disable the prototypes from the vulkan.h file using the mentioned preprocessor macro.
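A minimal sketch of what this looks like in practice (assuming vulkan.h is on the include path):

// Disable the function prototypes declared in vulkan.h so that our own
// function-pointer variables of the same names don't clash with them.
#define VK_NO_PROTOTYPES
#include "vulkan.h"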
Similarly, we need to declare the variables defined in the VulkanFunctions.cpp file so they are visible in all other parts of our code. This is done in the same way, but with the word "extern" placed before each definition; compare with the VulkanFunctions.h file.
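A sketch of how such a header might look, assuming it reuses the same ListOfFunctions.inl list:

// VulkanFunctions.h (sketch): declare the same variables as extern so every
// translation unit sees them; VulkanFunctions.cpp holds the actual definitions.
#include "vulkan.h"

#define VK_EXPORTED_FUNCTION( fun ) extern PFN_##fun fun;
#define VK_GLOBAL_LEVEL_FUNCTION( fun ) extern PFN_##fun fun;
#define VK_INSTANCE_LEVEL_FUNCTION( fun ) extern PFN_##fun fun;
#define VK_DEVICE_LEVEL_FUNCTION( fun ) extern PFN_##fun fun;

#include "ListOfFunctions.inl"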
Now we have variables in which we can store the addresses of functions acquired from the Vulkan library. To load the only function exported directly by the library, we can use the following code:

#if defined(VK_USE_PLATFORM_WIN32_KHR)
#define LoadProcAddress GetProcAddress
#elif defined(VK_USE_PLATFORM_XCB_KHR) || defined(VK_USE_PLATFORM_XLIB_KHR)
#define LoadProcAddress dlsym
#endif

#define VK_EXPORTED_FUNCTION( fun )                                 \
if( !(fun = (PFN_##fun)LoadProcAddress( VulkanLibrary, #fun )) ) {  \
  printf( "Could not load exported function: " #fun "!\n" );        \
  return false;                                                     \
}

#include "ListOfFunctions.inl"

return true;
5. Tutorial01.cpp, function LoadExportedEntryPoints()
This macro takes the function name from the "fun" parameter, converts it into a string (with #), and obtains its address from VulkanLibrary. The address is acquired using the GetProcAddress() (on Windows) or dlsym() (on Linux) function and is stored in the variable represented by fun. If this operation fails and the function is not exposed by the library, we report the problem by printing the proper information and returning false. The macro operates on lines included from ListOfFunctions.inl, so we don't have to write the names of functions multiple times.
Now that we have our main function-loading procedure, we can load the rest of the Vulkan API procedures. These can be divided into three types: global-level functions, which allow us to create a Vulkan instance; instance-level functions, which let us check the available hardware and its capabilities and create a logical device; and device-level functions, which perform the actual work (rendering or computations).
We will start with acquiring instance creation functions from the global level.
vkCreateInstance
vkEnumerateInstanceExtensionProperties
vkEnumerateInstanceLayerProperties
The most important function is vkCreateInstance(), which allows us to create a "Vulkan instance." From the application's point of view, a Vulkan instance can be thought of as an equivalent of OpenGL's rendering context. It stores per-application state (there is no global state in Vulkan), like enabled instance-level layers and extensions. The other two functions allow us to check which instance layers and which instance extensions are available. Validation layers are divided into instance and device levels depending on what functionality they debug. Extensions in Vulkan are similar to OpenGL's extensions: they expose additional functionality that is not required by the core specification, and not all hardware vendors may implement them. Extensions, like layers, are also divided into instance and device levels, and extensions from different levels must be enabled separately. In OpenGL, all extensions are (usually) available in created contexts; in Vulkan, we have to enable them before the functionality they expose can be used.
We call the vkGetInstanceProcAddr() function to acquire the addresses of instance-level procedures. It takes two parameters: an instance and a function name. We don't have an instance yet, so we provide "null" for the first parameter. That's why these functions are sometimes called null-instance or no-instance level functions. The second parameter required by the vkGetInstanceProcAddr() function is the name of the procedure whose address we want to acquire. Without an instance, we can only load global-level functions; it is not possible to load any other function without an instance handle provided in the first parameter.
The code that loads global-level functions may look like this:
#define VK_GLOBAL_LEVEL_FUNCTION( fun ) \
if( !(fun = (PFN_##fun)vkGetInstanceProcAddr( nullptr, #fun )) ) { \
printf( "Could not load global level function: " #fun "!\n" ); \
return false; \
}
#include "ListOfFunctions.inl"
return true;
6. Tutorial01.cpp, function LoadGlobalLevelEntryPoints()
The only difference between this code and the code used for loading the exported function (vkGetInstanceProcAddr(), exposed by the library) is that we don't use a function provided by the OS, like GetProcAddress(); instead we call vkGetInstanceProcAddr() with the first parameter set to null.
If you follow this tutorial and write the code yourself, make sure you add global-level functions, wrapped in a properly named macro, to the ListOfFunctions.inl header file:
#if !defined(VK_GLOBAL_LEVEL_FUNCTION)
#define VK_GLOBAL_LEVEL_FUNCTION( fun )
#endif
VK_GLOBAL_LEVEL_FUNCTION( vkCreateInstance )
VK_GLOBAL_LEVEL_FUNCTION( vkEnumerateInstanceExtensionProperties )
VK_GLOBAL_LEVEL_FUNCTION( vkEnumerateInstanceLayerProperties )
#undef VK_GLOBAL_LEVEL_FUNCTION
7. ListOfFunctions.inl
Now we can create a Vulkan instance. To do that, we fill a variable of type VkInstanceCreateInfo and call the vkCreateInstance() function:

VkInstanceCreateInfo instance_create_info = {
  VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,           // VkStructureType            sType
  nullptr,                                          // const void                *pNext
  0,                                                // VkInstanceCreateFlags      flags
  &application_info,                                // const VkApplicationInfo   *pApplicationInfo
  0,                                                // uint32_t                   enabledLayerCount
  nullptr,                                          // const char * const        *ppEnabledLayerNames
  0,                                                // uint32_t                   enabledExtensionCount
  nullptr                                           // const char * const        *ppEnabledExtensionNames
};

8. Tutorial01.cpp, function CreateInstance()
Most of the Vulkan structures begin with a field describing the type of the structure. Parameters are provided to functions by pointers to avoid copying big memory chunks. Sometimes, inside structures, pointers to other structures are also provided. For the driver to know how many bytes it should read and how members are aligned, the type of the structure is always provided. So what exactly do all these parameters mean?
sType – The type of the structure. In this case it informs the driver that we are providing information for instance creation by providing the value VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO.
pNext – Additional information for instance creation may be provided in future versions of Vulkan API and this
parameter will be used for that purpose. For now, it is reserved for future use.
flags – Another parameter reserved for future use; for now it must be set to 0.
pApplicationInfo – An address of another structure with information about our application (like name, version,
required Vulkan API version, and so on).
enabledLayerCount – Defines the number of instance-level validation layers we want to enable.
ppEnabledLayerNames – This is an array of enabledLayerCount elements with the names of the layers we
would like to enable.
enabledExtensionCount – The number of instance-level extensions we want to enable.
ppEnabledExtensionNames – As with layers, this parameter should point to an array of at least
enabledExtensionCount elements containing names of instance-level extensions we want to use.
Most of the parameters can be nulls or zeros. The most important one (apart from the structure type information) is the parameter pointing to a variable of type VkApplicationInfo. So before specifying instance creation information, we also have to prepare an additional variable describing our application. It contains the name of our application, the name of the engine we are using, and the Vulkan API version we require (which is similar to the OpenGL version; if the driver doesn't support this version, the instance will not be created). This information may be very useful for the driver. Remember that some graphics card vendors provide drivers that are specialized for a specific title, such as a specific game. If a graphics card vendor knows what graphics engine a game uses, it can optimize the driver's behavior so the game performs faster. This application information structure can be used for that purpose. The parameters of the VkApplicationInfo structure are the structure type, the application name and version, the engine name and version, and the required API version.
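For illustration, filling this structure might look like the sketch below; the application and engine names and the version values are placeholders, not something the API mandates:

VkApplicationInfo application_info = {
  VK_STRUCTURE_TYPE_APPLICATION_INFO,               // VkStructureType            sType
  nullptr,                                          // const void                *pNext
  "Introduction to Vulkan",                         // const char                *pApplicationName
  VK_MAKE_VERSION( 1, 0, 0 ),                       // uint32_t                   applicationVersion
  "Tutorial Engine",                                // const char                *pEngineName
  VK_MAKE_VERSION( 1, 0, 0 ),                       // uint32_t                   engineVersion
  VK_MAKE_VERSION( 1, 0, 0 )                        // uint32_t                   apiVersion
};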
So now that we have defined these two structures, we can call the vkCreateInstance() function and check whether an instance was created. If successful, the instance handle will be stored in the variable we provided the address of, and VK_SUCCESS (which is zero!) is returned.
Loading every Vulkan procedure using the vkGetInstanceProcAddr() function and the Vulkan instance handle comes with some trade-offs. When we use Vulkan for data processing, we must create a logical device and acquire device-level functions. But on the computer that runs our application there may be many devices that support Vulkan, and our choice of device is expressed through the logical device we create. vkGetInstanceProcAddr() doesn't recognize a logical device, as there is no parameter for it. When we acquire device-level procedures using this function, we in fact acquire the addresses of simple "jump" functions that take the handle of a logical device and jump to the proper implementation (the function implemented for the specific device). The overhead of this jump can be avoided: the recommended behavior is to load the procedures of each device separately using another function. But we still have to use the vkGetInstanceProcAddr() function to load the functions that allow us to create such a logical device. In this tutorial we use the following instance-level functions:
vkEnumeratePhysicalDevices
vkGetPhysicalDeviceProperties
vkGetPhysicalDeviceFeatures
vkGetPhysicalDeviceQueueFamilyProperties
vkCreateDevice
vkGetDeviceProcAddr
vkDestroyInstance
These are the functions required and used in this tutorial to create a logical device. But there are other instance-level functions, for example, those introduced by extensions; the list in the header file from the example solution's source code will expand. The source code used to load all these functions is:
#define VK_INSTANCE_LEVEL_FUNCTION( fun ) \
if( !(fun = (PFN_##fun)vkGetInstanceProcAddr( Vulkan.Instance, #fun )) ) { \
printf( "Could not load instance level function: " #fun "\n" ); \
return false; \
}
#include "ListOfFunctions.inl"
return true;
9. Tutorial01.cpp, function LoadInstanceLevelEntryPoints()
The code for loading instance-level functions is almost identical to the code for loading global-level functions. We just change the first parameter of the vkGetInstanceProcAddr() function from null to the created Vulkan instance handle. Of course, we now operate on instance-level functions, so we redefine the VK_INSTANCE_LEVEL_FUNCTION() macro instead of the VK_GLOBAL_LEVEL_FUNCTION() macro. We also need to declare the functions from the instance level. As before, this is best done with a list of macro-wrapped names collected in a shared header, for example:
#if !defined(VK_INSTANCE_LEVEL_FUNCTION)
#define VK_INSTANCE_LEVEL_FUNCTION( fun )
#endif
VK_INSTANCE_LEVEL_FUNCTION( vkDestroyInstance )
VK_INSTANCE_LEVEL_FUNCTION( vkEnumeratePhysicalDevices )
VK_INSTANCE_LEVEL_FUNCTION( vkGetPhysicalDeviceProperties )
VK_INSTANCE_LEVEL_FUNCTION( vkGetPhysicalDeviceFeatures )
VK_INSTANCE_LEVEL_FUNCTION( vkGetPhysicalDeviceQueueFamilyProperties )
VK_INSTANCE_LEVEL_FUNCTION( vkCreateDevice )
VK_INSTANCE_LEVEL_FUNCTION( vkGetDeviceProcAddr )
VK_INSTANCE_LEVEL_FUNCTION( vkEnumerateDeviceExtensionProperties )
#undef VK_INSTANCE_LEVEL_FUNCTION
10. ListOfFunctions.inl
Instance-level functions operate on physical devices. In Vulkan there are "physical devices" and "logical devices" (simply called devices). As the name suggests, a physical device refers to any physical graphics card (or any other hardware component) installed in a computer running a Vulkan-enabled application that is capable of executing Vulkan commands. As mentioned earlier, such a device may expose and implement different (optional) Vulkan features, may have different capabilities (like total memory or the ability to work on buffer objects of different sizes), and may provide different extensions. Such hardware may be a dedicated (discrete) graphics card or an additional chip built (integrated) into the main processor. It may even be the CPU itself. Instance-level functions allow us to check all these parameters. After we check them, we must decide (based on our findings and our needs) which physical device we want to use. Maybe we even want to use more than one device, which is also possible, but this scenario is too advanced for now. So if we want to harness the power of a physical device, we must create a logical device that represents our choice in the application (along with the enabled layers, extensions, features, and so on). After creating a device (and acquiring queues) we are prepared to use Vulkan, much as we are prepared to use OpenGL after creating a rendering context.
To check how many devices are available, we call the vkEnumeratePhysicalDevices() function. We call it twice: first with the last parameter set to null. This way the driver knows that we are asking only for the number of available physical devices, which is stored in the variable we provided the address of in the second parameter.
Now that we know how many physical devices are available, we can prepare storage for their handles. I use a vector so I don't need to worry about memory allocation and deallocation. When we call vkEnumeratePhysicalDevices() again, this time with all parameters not equal to null, we acquire the handles of the physical devices in the array whose address we provided in the last parameter. This array may not be the same size as the number returned by the first call, but it must hold at least as many elements as specified in the second parameter.
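Put together, the two calls described above might look like this sketch (error handling is abbreviated, and the instance handle is assumed to be stored in Vulkan.Instance, as elsewhere in the tutorial's code):

uint32_t num_devices = 0;
// First call: ask only for the number of available physical devices.
if( (vkEnumeratePhysicalDevices( Vulkan.Instance, &num_devices, nullptr ) != VK_SUCCESS) ||
    (num_devices == 0) ) {
  printf( "Error occurred during physical devices enumeration!\n" );
  return false;
}

// Second call: fill a vector of that size with the device handles.
std::vector<VkPhysicalDevice> physical_devices( num_devices );
if( vkEnumeratePhysicalDevices( Vulkan.Instance, &num_devices, &physical_devices[0] ) != VK_SUCCESS ) {
  printf( "Error occurred during physical devices enumeration!\n" );
  return false;
}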
Example: we may have four physical devices available, but we are interested only in the first one. After the first call, a value of four is stored in num_devices. This way we know that there is at least one Vulkan-compatible device and that we can proceed. We overwrite this value with one, as we only want to use one (any) such device, no matter which. And we will get only one physical device handle after the second call.
The number of devices we provided is replaced by the actual number of enumerated physical devices (which of course will not be greater than the value we provided). Another example: say we don't want to call this function twice. Our application supports up to 10 devices, and we provide this value along with a pointer to a static, 10-element array. The driver always returns the number of devices it actually enumerated. If there are none, zero is stored in the variable whose address we provided. If there is any such device, we will also know that. But we will not be able to tell whether there are more than 10 devices.
Now that we have the handles of all the Vulkan-compatible physical devices, we can check the properties of each device. In the sample code, this is done inside a loop:
VkPhysicalDevice selected_physical_device = VK_NULL_HANDLE;
uint32_t selected_queue_family_index = UINT32_MAX;

for( uint32_t i = 0; i < num_devices; ++i ) {
  if( CheckPhysicalDeviceProperties( physical_devices[i], selected_queue_family_index ) ) {
    selected_physical_device = physical_devices[i];
  }
}
12. Tutorial01.cpp, function CreateDevice()
Device Properties
I created the CheckPhysicalDeviceProperties() function. It takes the handle of a physical device and checks whether
the capabilities of a given device are enough for our application to work properly. If so, it returns true and stores the queue
family index in the variable provided in the second parameter. Queues and queue families are discussed in a later section.
At the beginning of this function, the physical device is queried for its properties and features. Properties contain fields
such as supported Vulkan API version, device name and type (integrated or dedicated/discrete GPU), Vendor ID, and limits.
Limits describe how big textures can be created, how many samples in anti-aliasing are supported, or how many buffers
in a given shader stage can be used.
Device Features
Features are additional hardware capabilities that are similar to extensions. They may not necessarily be supported by the driver and by default are not enabled. Features include items such as geometry and tessellation shaders, multiple viewports, logical operations, and additional texture compression formats. If a given physical device supports any feature, we can enable it during logical device creation. Features are not enabled by default in Vulkan, and the Vulkan spec points out that some features may have a performance impact (like robustness).
After querying for hardware info and capabilities, I have provided a small example of how these queries can be used. I "reversed" the VK_MAKE_VERSION macro and retrieved the major, minor, and patch versions from the apiVersion field of the device properties. I check whether it is above some version I want to use, and I also check whether I can create 2D textures of a given size. In this example I'm not using features at all, but if we want to use any feature (for example, geometry shaders), we must check whether it is supported and we must (explicitly) enable it later, during logical device creation. This is the reason we need to create a logical device and not use the physical device directly: a logical device represents a physical device along with all the features and extensions we enabled for it.
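A sketch of such a check, assuming device_properties was filled by vkGetPhysicalDeviceProperties() and that we require API version 1.0 and 4096-pixel 2D textures (both thresholds are arbitrary choices for this example):

// Extract the major version from the packed apiVersion value and test the
// device against our (example) requirements.
uint32_t major_version = VK_VERSION_MAJOR( device_properties.apiVersion );
if( (major_version < 1) ||
    (device_properties.limits.maxImageDimension2D < 4096) ) {
  printf( "Physical device %p doesn't support required parameters!\n", physical_device );
  return false;
}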
Queue Families
Command buffers (as whole objects) are passed to the hardware for execution through queues. However, these buffers may contain different types of operations, such as graphics commands (used for generating and displaying images like in typical 3D games) or compute commands (used for processing data). Specific types of commands may be processed by dedicated hardware, and that's why queues are also divided into different types. In Vulkan these queue types are called families. Each queue family may support different types of operations. That's why we also have to check whether a given physical device supports the types of operations we want to perform. We can also perform one type of operation on one device and another type of operation on another device, but we have to check whether we can. This check is done in the second half of the CheckPhysicalDeviceProperties() function:
uint32_t queue_families_count = 0;
vkGetPhysicalDeviceQueueFamilyProperties( physical_device, &queue_families_count, nullptr );
if( queue_families_count == 0 ) {
  printf( "Physical device %p doesn't have any queue families!\n", physical_device );
  return false;
}

std::vector<VkQueueFamilyProperties> queue_family_properties( queue_families_count );
vkGetPhysicalDeviceQueueFamilyProperties( physical_device, &queue_families_count, &queue_family_properties[0] );

for( uint32_t i = 0; i < queue_families_count; ++i ) {
  if( (queue_family_properties[i].queueCount > 0) &&
      (queue_family_properties[i].queueFlags & VK_QUEUE_GRAPHICS_BIT) ) {
    queue_family_index = i;
    return true;
  }
}

printf( "Could not find queue family with required properties on physical device %p!\n", physical_device );
return false;

14. Tutorial01.cpp, function CheckPhysicalDeviceProperties()
We must first check how many different queue families are available for a given physical device. This is done in a similar way to enumerating physical devices: first we call vkGetPhysicalDeviceQueueFamilyProperties() with the last parameter set to null, and the number of available queue families is stored in the queue_families_count variable. Next we can prepare a place for that many queue family properties (if we want to; the mechanism is similar to enumerating physical devices). Then we call the function again, and the properties of each queue family are stored in the provided array.
The properties of each queue family contain queue flags, the number of available queues in this family, timestamp support, and image transfer granularity. Right now, the most important parts are the number of queues in the family and the flags. The flags (a bitfield) define which types of operations are supported by a given queue family (more than one may be supported): graphics, compute, transfer (memory operations like copying), and sparse binding (for sparse resources like mega-textures). Other types may appear in the future.
In our example we check for graphics operations support, and if we find it we can use the given physical device. Note that we also have to remember the selected family index. After we choose the physical device, we can create a logical device that will represent it in the rest of our application, as shown in the example:
if( selected_physical_device == VK_NULL_HANDLE ) {
  printf( "Could not select physical device based on the chosen properties!\n" );
  return false;
}

std::vector<float> queue_priorities = { 1.0f };

VkDeviceQueueCreateInfo queue_create_info = {
  VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,       // VkStructureType                sType
  nullptr,                                          // const void                    *pNext
  0,                                                // VkDeviceQueueCreateFlags       flags
  selected_queue_family_index,                      // uint32_t                       queueFamilyIndex
  static_cast<uint32_t>(queue_priorities.size()),   // uint32_t                       queueCount
  &queue_priorities[0]                              // const float                   *pQueuePriorities
};

VkDeviceCreateInfo device_create_info = {
  VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO,             // VkStructureType                sType
  nullptr,                                          // const void                    *pNext
  0,                                                // VkDeviceCreateFlags            flags
  1,                                                // uint32_t                       queueCreateInfoCount
  &queue_create_info,                               // const VkDeviceQueueCreateInfo *pQueueCreateInfos
  0,                                                // uint32_t                       enabledLayerCount
  nullptr,                                          // const char * const            *ppEnabledLayerNames
  0,                                                // uint32_t                       enabledExtensionCount
  nullptr,                                          // const char * const            *ppEnabledExtensionNames
  nullptr                                           // const VkPhysicalDeviceFeatures *pEnabledFeatures
};

if( vkCreateDevice( selected_physical_device, &device_create_info, nullptr, &Vulkan.Device ) != VK_SUCCESS ) {
  printf( "Could not create Vulkan device!\n" );
  return false;
}

Vulkan.QueueFamilyIndex = selected_queue_family_index;
return true;
15. Tutorial01.cpp, function CreateDevice()
First we make sure that, after exiting the device enumeration loop, we have found a device that suits our needs. Next we can create a logical device, which is done by calling vkCreateDevice(). It takes the handle of a physical device and the address of a structure that contains the information necessary for device creation. This structure is of type VkDeviceCreateInfo; apart from the usual structure type, pNext, and flags fields, it holds an array of queue creation descriptions (pQueueCreateInfos with queueCreateInfoCount elements), the counts and names of layers and extensions to enable, and a pointer to the set of features to enable (pEnabledFeatures).
Features (as I have described earlier) are additional hardware capabilities that are disabled by default. If we want to
enable all available features, we can’t simply fill this structure with ones. If some feature is not supported, the device
creation will fail. Instead, we should pass a structure that was filled when we called vkGetPhysicalDeviceFeatures(). This
is the easiest way to enable all supported features. If we are interested only in some specific features, we query the driver
for available features and clear all unwanted fields. If we don’t want any of the additional features we can clear this
structure (fill it with zeros) or pass a null pointer for this parameter (like in this example).
Queues are created automatically along with the device. To specify what types of queues we want to enable, we
provide an array of additional VkDeviceQueueCreateInfo structures. This array must contain queueCreateInfoCount
elements. Each element in this array must refer to a different queue family; we refer to a specific queue family only once.
As I mentioned previously, each element of the array of VkDeviceQueueCreateInfo elements must describe a different queue family. Its index must be smaller than the value returned by the vkGetPhysicalDeviceQueueFamilyProperties() function (that is, smaller than the number of available queue families). In our example we are interested only in one queue from one queue family, and that's why we must remember the queue family index; it is used right here. If we want to prepare a more complicated scenario, we should also remember the number of queues in each family, as each family may support a different number of queues. And we can't create more queues than are available in a given family!
It is also worth noting that different queue families may have similar (or even identical) properties, meaning they may support similar types of operations; for example, there may be more than one queue family that supports graphics operations. And each family may contain a different number of queues.
We must also assign a floating-point value (from 0.0 to 1.0, both inclusive) to each queue. The higher the value we provide for a given queue (relative to the values assigned to other queues), the more time that queue may get for processing commands (relative to other queues). But this relation is not guaranteed. Priorities also don't influence execution order. They are just hints.
Priorities are relative only within a single device. If operations are performed on multiple devices, priorities may impact processing time within each of these devices but not between them. A queue with a given priority may be more important only than queues with lower priorities on the same device; queues from different devices are treated independently. Once we fill these structures and call vkCreateDevice(), upon success the created logical device is stored in the variable we provided the address of (in our example it is called Vulkan.Device). If this function fails, it returns a value other than VK_SUCCESS.
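As a purely hypothetical illustration of the priorities described above, requesting two queues from one family could look like this:

// The first queue is hinted to be favored over the second; both values
// are only relative hints, not guarantees.
std::vector<float> queue_priorities = { 1.0f, 0.5f };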
So what should we do with device-level functions if there can be so many devices? We can load universal procedures.
This is done with the vkGetInstanceProcAddr() function. It returns the addresses of dispatch functions that perform jumps
to proper implementations based on a provided logical device handle. But we can avoid this overhead by loading functions
for each logical device separately. With this method, we must remember that we can call the given function only with the
device we loaded this function from. So if we are using more devices in our application we must load functions from each
of these devices. It’s not that difficult. And despite this leading to storing more functions (and grouping them based on a
device they were loaded from), we can avoid one level of abstraction and save some processor time. We can load functions
similarly to how we have loaded exported, global-, and instance-level functions:
#define VK_DEVICE_LEVEL_FUNCTION( fun ) \
if( !(fun = (PFN_##fun)vkGetDeviceProcAddr( Vulkan.Device, #fun )) ) { \
printf( "Could not load device level function: " #fun "!\n" ); \
return false; \
}
#include "ListOfFunctions.inl"
return true;
16. Tutorial01.cpp, function LoadDeviceLevelEntryPoints()
This time we used the vkGetDeviceProcAddr() function along with a logical device handle. Functions from device level
are placed in a shared header. This time they are wrapped in a VK_DEVICE_LEVEL_FUNCTION() macro like this:
#if !defined(VK_DEVICE_LEVEL_FUNCTION)
#define VK_DEVICE_LEVEL_FUNCTION( fun )
#endif
VK_DEVICE_LEVEL_FUNCTION( vkGetDeviceQueue )
VK_DEVICE_LEVEL_FUNCTION( vkDestroyDevice )
VK_DEVICE_LEVEL_FUNCTION( vkDeviceWaitIdle )
#undef VK_DEVICE_LEVEL_FUNCTION
17. ListOfFunctions.inl
All functions that are not from the exported, global, or instance levels are from the device level. Another distinction can be made based on the first parameter: for device-level functions, the first parameter may only be of type VkDevice, VkQueue, or VkCommandBuffer. In the rest of the tutorial, whenever a new function appears it must be added to ListOfFunctions.inl in the VK_DEVICE_LEVEL_FUNCTION portion (with a few noted exceptions, like extensions).
Retrieving Queues
Now that we have created a device, we need a queue that we can submit some commands to for processing. Queues
are automatically created with a logical device, but in order to use them we must specifically ask for a queue handle. This
is done with vkGetDeviceQueue() like this:
vkGetDeviceQueue( Vulkan.Device, Vulkan.QueueFamilyIndex, 0, &Vulkan.Queue );

18. Tutorial01.cpp, function GetDeviceQueue()
To retrieve the queue handle we must provide the logical device we want to get the queue from. The queue family index is also needed, and it must be one of the indices we provided during logical device creation (we cannot create additional queues or use queues from families we didn't request). The last parameter is a queue index within the given family; it must be smaller than the total number of queues we requested from that family. For example, if the device supports five queues in family number 3 and we want two queues from that family, the index of a queue must be smaller than 2. For each queue we want to retrieve, we have to call this function and make a separate query. If the call succeeds, it stores the handle of the requested queue in the variable we provided the address of in the final parameter.
From now on, all the work we want to perform (using command buffers) can be submitted for processing to the acquired
queue.
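To make the example above concrete, here is a hypothetical sketch: if two queues were requested from family number 3 during device creation, only queue indices 0 and 1 are valid:

VkQueue first_queue = VK_NULL_HANDLE;
VkQueue second_queue = VK_NULL_HANDLE;
vkGetDeviceQueue( Vulkan.Device, 3, 0, &first_queue );  // first queue of family 3
vkGetDeviceQueue( Vulkan.Device, 3, 1, &second_queue ); // second queue of family 3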
Tutorial01 Execution
As I have mentioned, the example provided with this tutorial doesn’t display anything. But we have learned enough
information for one lesson. So how do we know if everything went fine? If the normal application window appears and
nothing is printed in the console/terminal, this means the Vulkan setup was successful. Starting with the next tutorial, the
results of our operations will be displayed on the screen.
Cleaning Up
There is one more thing we need to remember: cleaning up and freeing resources. Cleanup must be done in a specific
order that is (in general) a reversal of the order of creation.
After the application is closed, the OS should release memory and all other resources associated with it. This should include Vulkan; the driver usually cleans up unreferenced resources. Unfortunately, this cleaning may not be performed in a proper order, which might lead to an application crash during the closing process. It is always good practice to do the cleaning ourselves. Here is the sample code required to release the resources we created during this first tutorial:
if( Vulkan.Device != VK_NULL_HANDLE ) {
vkDeviceWaitIdle( Vulkan.Device );
vkDestroyDevice( Vulkan.Device, nullptr );
}
if( Vulkan.Instance != VK_NULL_HANDLE ) {
vkDestroyInstance( Vulkan.Instance, nullptr );
}
if( VulkanLibrary ) {
#if defined(VK_USE_PLATFORM_WIN32_KHR)
FreeLibrary( VulkanLibrary );
#elif defined(VK_USE_PLATFORM_XCB_KHR) || defined(VK_USE_PLATFORM_XLIB_KHR)
dlclose( VulkanLibrary );
#endif
}
19. Tutorial01.cpp, destructor
We should always check whether a given resource was created. Without a logical device there are no device-level function pointers, so we are unable to call even the proper resource-cleaning functions. Similarly, without an instance we are unable to acquire a pointer to the vkDestroyInstance() function. In general, we should not release resources that weren't created.
We must ensure that before deleting any object, it is not being used by a device. That’s why there is a wait function,
which will block until all processing on all queues of a given device is finished. Next, we destroy the logical device using
the vkDestroyDevice() function. All queues associated with it are destroyed automatically, then the instance is destroyed.
After that we can free (unload or release) a Vulkan library from which all these functions were acquired.
Conclusion
This tutorial explained how to prepare to use Vulkan in our application. First we "connect" with the Vulkan Runtime library and load global-level functions from it. Then we create a Vulkan instance and load instance-level functions. After that we can check which physical devices are available and what their features, properties, and capabilities are. Next we create a logical device, describing which queues, and how many, must be created along with the device. After that we can retrieve device-level functions using the newly created logical device handle. One additional thing to do is to retrieve the queues through which we can submit work for execution.
Swap Chain
So how do you integrate Vulkan with the application's window? What are the differences compared to OpenGL? In OpenGL (on Microsoft Windows*) we acquire a Device Context that is associated with the application's window. Using it we then have to define "how" to present images on the screen, "what" the format of the application's window we will be drawing on is, and what capabilities it should support. This is done through a pixel format. Most of the time we create a 32-bit color surface with a 24-bit depth buffer and support for double buffering (this way we can draw to a "hidden" back buffer, and after we're finished we can present it on the screen by swapping the front and back buffers). Only after these preparations can we create a Rendering Context and activate it. In OpenGL, all rendering is directed to the default back buffer.
In Vulkan there is no default frame buffer. We can create an application that displays nothing at all; this is a valid approach. But if we want to display something, we can create a set of buffers to which we can render. These buffers, along with their properties, are called a swap chain (similar to Direct3D*). A swap chain can contain many images. To display any of them we don't "swap" them, as the name suggests, but we present them, which means that we give them back to the presentation engine. So in OpenGL we first have to define the surface format and associate it with a window (at least on Windows), and after that we create a Rendering Context. In Vulkan, we first create an instance and a device, and then we create a swap chain. But, what's interesting is that there will be situations where we will have to destroy this swap chain and recreate it. In the middle of a working application. From scratch!
But why is a swap chain an extension rather than part of the core API? Isn't it obvious that we want to display an image on the screen? Well, it's not so obvious. Vulkan can be used for many different purposes, including performing mathematical operations, boosting physics calculations, and processing a video stream. The results of these actions may not necessarily be displayed on a typical monitor, which is why the core API is OS-agnostic, similar to OpenGL.
If you want to create a game and display rendered images on a monitor, you can (and should) use a swap chain. But
here is the second reason why a swap chain is an extension. Every OS displays images in a different way. The surface on
which you can render may be implemented differently, can have a different format, and can be differently represented in
the OS—there is no one universal way to do it. So in Vulkan a swap chain must also depend on the OS your application is
written for.
These are the reasons a swap chain in Vulkan is treated as an extension: it provides render targets (buffers or images, like FBOs in OpenGL) that integrate with OS-specific code. It's something that core Vulkan (which is platform independent) can't do. So if swap chain creation and usage is an extension, we have to ask for the extension during both instance and device creation. The ability to create and use a swap chain requires us to enable extensions at two levels (at least on most operating systems, Windows and Linux* among them). This means that we have to go back to the first tutorial and change it to request the proper swap-chain-related extensions. If a given instance or device doesn't support these extensions, the instance and/or device creation will fail. There are of course other ways to display an image, like acquiring a pointer to a buffer's (texture's) memory (mapping it) and copying data from it to the OS-acquired window surface pointer. This process is out of the scope of this tutorial (though not really that hard). Fortunately, it seems that the swap chain extensions will be similar to OpenGL's core extensions: something that is not in the core spec and is not required to be implemented, but that every hardware vendor will implement anyway. I think all hardware vendors would like to show that they support Vulkan and that it gives an impressive performance boost in games displayed on screen. What backs this theory is that the swap chain extensions are declared in the main, "core" vulkan.h header.
In the case of swap-chain support, there are actually three extensions involved: two from an instance level and one
from a device level. These extensions logically separate different functionalities. The first is the VK_KHR_surface extension
defined at the instance level. It describes a “surface” object, which is a logical representation of an application’s window.
This extension allows us to check different parameters (that is, capabilities, supported formats, size) of a surface and to
query whether the given physical device supports a swap chain (more precisely, whether the given queue family supports
presenting an image to a given surface). This is useful information because we don’t want to choose a physical device and
try to create a logical device from it only to find out that it doesn’t support swap chains. This extension also defines
methods to destroy any such surface.
The second instance-level extension is OS-dependent: in the Windows OS family it is called VK_KHR_win32_surface
and in Linux it is called VK_KHR_xlib_surface or VK_KHR_xcb_surface. This extension allows us to create a surface that
represents the application’s window in a given OS (and uses OS-specific parameters).
Before enabling an extension at instance creation, we should check that it is supported. This is done with the vkEnumerateInstanceExtensionProperties() function, which, like most Vulkan enumeration functions, is called twice: first with the last parameter set to null to get the number of available extensions, and a second time to fill an array with their properties. We can prepare a place for a smaller number of extensions, but then vkEnumerateInstanceExtensionProperties() will return VK_INCOMPLETE to let us know we didn't acquire all the extensions.
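A sketch of this two-call pattern, with error handling abbreviated:

uint32_t extensions_count = 0;
// First call: ask only for the number of available instance extensions.
if( (vkEnumerateInstanceExtensionProperties( nullptr, &extensions_count, nullptr ) != VK_SUCCESS) ||
    (extensions_count == 0) ) {
  printf( "Error occurred during instance extensions enumeration!\n" );
  return false;
}

// Second call: fill an array of that size with the extension properties.
std::vector<VkExtensionProperties> available_extensions( extensions_count );
if( vkEnumerateInstanceExtensionProperties( nullptr, &extensions_count, &available_extensions[0] ) != VK_SUCCESS ) {
  printf( "Error occurred during instance extensions enumeration!\n" );
  return false;
}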
Our array is now filled with all the available (supported) instance-level extensions. Each element contains the name of an extension and its version. The version probably won't be checked too often, but it may be useful to verify whether the hardware supports the given version of an extension. For example, we might be interested in some specific extension, and we downloaded an SDK for it that contains a set of header files, each with its own version corresponding to the value returned by this query. If the hardware our application runs on supports an older version of the extension (not the one we downloaded the SDK for), it may not support all the functions we are using from that extension. So sometimes it may be useful to also verify the version, but for a swap chain it doesn't matter, at least for now.
We can now search through all of the returned extensions and see whether the list contains the extensions we are looking for. Here I'm using two convenient defines named VK_KHR_SURFACE_EXTENSION_NAME and VK_KHR_????_SURFACE_EXTENSION_NAME. They are defined inside a Vulkan header file and contain the names of the extensions, so we don't have to copy or remember them; we can just use the defines in our code, and if we make a mistake the compiler will tell us. I hope all extensions will come with a similar definition.
With the second definition comes a small trap. The two mentioned defines are placed in the vulkan.h header file. But isn't the second define specific to a given OS, and isn't the vulkan.h header OS-independent? Both observations are true and the questions quite valid. The vulkan.h file is OS-independent, yet it contains the definitions of OS-specific extensions. These are enclosed inside #ifdef … #endif preprocessor directives: if we want to "enable" them, we need to add a proper preprocessor definition somewhere in our project. For a Windows system, we need to add a VK_USE_PLATFORM_WIN32_KHR definition. On Linux, we need to add VK_USE_PLATFORM_XCB_KHR or VK_USE_PLATFORM_XLIB_KHR, depending on whether we want to use the XCB or X11 (Xlib) libraries. In the provided example project, these definitions are added by default through the CMakeLists.txt file.
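If we didn't use the build system for this, one way to provide such a definition directly in code might look like this sketch (shown for Windows; it is only an illustration, since the example project defines it through CMake instead):

// Must be defined before vulkan.h is included, in every translation unit.
#define VK_USE_PLATFORM_WIN32_KHR
#include "vulkan.h"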
But back to our source code. What does the CheckExtensionAvailability() function do? It loops over all available
extensions and compares their names with the name of the provided extension. If a match is found, it just returns true.
bool CheckExtensionAvailability( const char *extension_name, const std::vector<VkExtensionProperties> &available_extensions ) {
  for( size_t i = 0; i < available_extensions.size(); ++i ) {
    if( strcmp( available_extensions[i].extensionName, extension_name ) == 0 ) {
      return true;
    }
  }
  return false;
}
2. Tutorial02.cpp, function CheckExtensionAvailability()
Once we know the required extensions are available, we can prepare the instance creation information:

std::vector<const char*> extensions = {
  VK_KHR_SURFACE_EXTENSION_NAME,
#if defined(VK_USE_PLATFORM_WIN32_KHR)
  VK_KHR_WIN32_SURFACE_EXTENSION_NAME
#elif defined(VK_USE_PLATFORM_XCB_KHR)
  VK_KHR_XCB_SURFACE_EXTENSION_NAME
#elif defined(VK_USE_PLATFORM_XLIB_KHR)
  VK_KHR_XLIB_SURFACE_EXTENSION_NAME
#endif
};

VkInstanceCreateInfo instance_create_info = {
  VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,           // VkStructureType            sType
  nullptr,                                          // const void                *pNext
  0,                                                // VkInstanceCreateFlags      flags
  &application_info,                                // const VkApplicationInfo   *pApplicationInfo
  0,                                                // uint32_t                   enabledLayerCount
  nullptr,                                          // const char * const        *ppEnabledLayerNames
  static_cast<uint32_t>(extensions.size()),         // uint32_t                   enabledExtensionCount
  &extensions[0]                                    // const char * const        *ppEnabledExtensionNames
};

3. Tutorial02.cpp, function CreateInstance()
This code is similar to the CreateInstance() function in the Tutorial01.cpp file. To request instance-level extensions we have to prepare an array with the names of all the extensions we want to enable. Here I have used a standard vector with "const char*" elements and the mentioned extension names in the form of defines.
In Tutorial 1 we requested zero extensions and placed a nullptr for the address of the array in the VkInstanceCreateInfo structure. This time we must provide the address of the first element of an array filled with the names of the requested extensions, and we must also specify how many elements the array contains (that's why I chose a vector: if I add or remove extensions in future tutorials, the vector's size will change accordingly). Next we call the vkCreateInstance() function. If it doesn't return VK_SUCCESS, it means (in the case of this tutorial) that the extensions are not supported. If it returns successfully, we can load instance-level functions as before, but this time also with some additional, extension-specific functions.
With these extensions come additional functions. And, as these are instance-level extensions, we must add the functions to our set of instance-level functions (so they are loaded at the proper moment and with the proper function). In this case we must add the following functions to ListOfFunctions.inl, wrapped in the VK_INSTANCE_LEVEL_FUNCTION() macro like this:
// From extensions
#if defined(USE_SWAPCHAIN_EXTENSIONS)
VK_INSTANCE_LEVEL_FUNCTION( vkDestroySurfaceKHR )
VK_INSTANCE_LEVEL_FUNCTION( vkGetPhysicalDeviceSurfaceSupportKHR )
VK_INSTANCE_LEVEL_FUNCTION( vkGetPhysicalDeviceSurfaceCapabilitiesKHR )
VK_INSTANCE_LEVEL_FUNCTION( vkGetPhysicalDeviceSurfaceFormatsKHR )
VK_INSTANCE_LEVEL_FUNCTION( vkGetPhysicalDeviceSurfacePresentModesKHR )
#if defined(VK_USE_PLATFORM_WIN32_KHR)
VK_INSTANCE_LEVEL_FUNCTION( vkCreateWin32SurfaceKHR )
#elif defined(VK_USE_PLATFORM_XCB_KHR)
VK_INSTANCE_LEVEL_FUNCTION( vkCreateXcbSurfaceKHR )
#elif defined(VK_USE_PLATFORM_XLIB_KHR)
VK_INSTANCE_LEVEL_FUNCTION( vkCreateXlibSurfaceKHR )
#endif
#endif
4. ListOfFunctions.inl
One more thing: I’ve wrapped all these swap-chain-related functions inside another #ifdef … #endif pair, which
requires a USE_SWAPCHAIN_EXTENSIONS preprocessor directive to be defined. I’ve done this so that Tutorial 1 keeps
working properly. Without it, our first application (as it uses the same header files) would try to load all these functions.
But we don’t enable the swap chain extensions in the first tutorial, so this operation would fail and the application would
close without fully initializing Vulkan. If a given extension isn’t enabled, functions from it may not be available.
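The next listing shows how a presentation surface is created. The fragment below covers only the X11 paths; for
completeness, here is a hedged sketch of how the Win32 branch might begin (Window.Instance and Window.Handle stand
for the application’s window parameters and are assumed names):

#if defined(VK_USE_PLATFORM_WIN32_KHR)
VkWin32SurfaceCreateInfoKHR surface_create_info = {
  VK_STRUCTURE_TYPE_WIN32_SURFACE_CREATE_INFO_KHR,  // VkStructureType               sType
  nullptr,                                          // const void                   *pNext
  0,                                                // VkWin32SurfaceCreateFlagsKHR  flags
  Window.Instance,                                  // HINSTANCE                     hinstance
  Window.Handle                                     // HWND                          hwnd
};

if( vkCreateWin32SurfaceKHR( Vulkan.Instance, &surface_create_info, nullptr,
    &Vulkan.PresentationSurface ) == VK_SUCCESS ) {
  return true;
}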
#elif defined(VK_USE_PLATFORM_XCB_KHR)
VkXcbSurfaceCreateInfoKHR surface_create_info = {
  VK_STRUCTURE_TYPE_XCB_SURFACE_CREATE_INFO_KHR,    // VkStructureType               sType
  nullptr,                                          // const void                   *pNext
  0,                                                // VkXcbSurfaceCreateFlagsKHR    flags
  Window.Connection,                                // xcb_connection_t*             connection
  Window.Handle                                     // xcb_window_t                  window
};

if( vkCreateXcbSurfaceKHR( Vulkan.Instance, &surface_create_info, nullptr,
    &Vulkan.PresentationSurface ) == VK_SUCCESS ) {
  return true;
}

#elif defined(VK_USE_PLATFORM_XLIB_KHR)
VkXlibSurfaceCreateInfoKHR surface_create_info = {
  VK_STRUCTURE_TYPE_XLIB_SURFACE_CREATE_INFO_KHR,   // VkStructureType               sType
  nullptr,                                          // const void                   *pNext
  0,                                                // VkXlibSurfaceCreateFlagsKHR   flags
  Window.DisplayPtr,                                // Display                      *dpy
  Window.Handle                                     // Window                        window
};

if( vkCreateXlibSurfaceKHR( Vulkan.Instance, &surface_create_info, nullptr,
    &Vulkan.PresentationSurface ) == VK_SUCCESS ) {
  return true;
}

#endif
To create a presentation surface, we call the vkCreate????SurfaceKHR() function, which accepts the Vulkan instance (with
the surface extensions enabled), a pointer to an OS-specific structure, a pointer to optional memory allocation handling
functions, and a pointer to a variable in which the handle of the created surface will be stored.
This OS-specific structure is called Vk????SurfaceCreateInfoKHR and it contains the following fields:
sType – Standard type of structure, here set to the value matching the surface type (for example
VK_STRUCTURE_TYPE_XCB_SURFACE_CREATE_INFO_KHR).
pNext – Standard pointer reserved for extensions; nullptr here.
flags – Parameter reserved for future use; must be zero.
The remaining fields are OS-specific: a handle of the application’s window and, on Linux, a connection to the X server
(an xcb_connection_t* or Display* pointer).
To check which extensions a given physical device supports, we write code similar to the code prepared for
instance-level extensions. This time we use the vkEnumerateDeviceExtensionProperties() function. It behaves
identically to the function that queries instance extensions. The only difference is that it takes a physical device
handle as an additional first argument. The code may look similar to the example below. It is a part of the
CheckPhysicalDeviceProperties() function in our example source code.
uint32_t extensions_count = 0;
if( (vkEnumerateDeviceExtensionProperties( physical_device, nullptr,
&extensions_count, nullptr ) != VK_SUCCESS) ||
(extensions_count == 0) ) {
printf( "Error occurred during physical device %p extensions enumeration!\n",
physical_device );
return false;
}
We first ask for the number of all extensions available on a given physical device. Next we get their names and look
for the device-level swap-chain extension. If there is none, there is no point in checking the device’s remaining properties,
features, and queue families’ properties, as the device doesn’t support swap chains at all.
The only change is that I’ve added another variable that will contain the index of a queue family that supports a swap
chain (more precisely, image presentation). Unfortunately, just checking whether the swap-chain extension is supported
is not enough, because presentation support is a queue family property. A physical device may support swap chains, but
that doesn’t mean that all its queue families also support them. And do we really need another queue or queue family for
displaying images? Can’t we just use the graphics queue we selected in the first tutorial? Most of the time, one queue
family will probably be enough for our needs. This means that the selected queue family will support both graphics
operations and presentation. But, unfortunately, it is also possible that there will be devices that won’t support graphics
and presenting within a single queue family. In Vulkan we have to be flexible and prepared for any situation.
The vkGetPhysicalDeviceSurfaceSupportKHR() function is used to check whether a given queue family of a given
physical device supports a swap chain or, to be more precise, whether it supports presenting images to a given surface.
That’s why we needed to create a surface earlier.
So assume we have already checked that a given physical device exposes the swap-chain extension and that we
have already queried the number of queue families supported by the device. We have also requested the properties
of all queue families. Now we can check whether a given queue family supports presentation to our surface (window).
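A sketch of such a query might look like this; the queue_present_support vector (used in the code below) holds one
VkBool32 value per queue family:

std::vector<VkBool32> queue_present_support( queue_families_count );
for( uint32_t i = 0; i < queue_families_count; ++i ) {
  // Ask whether queue family "i" can present images to our surface
  vkGetPhysicalDeviceSurfaceSupportKHR( physical_device, i, presentation_surface,
    &queue_present_support[i] );
}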
uint32_t graphics_queue_family_index = UINT32_MAX;
uint32_t present_queue_family_index = UINT32_MAX;

// Select the first family that supports graphics; prefer a family that supports both
// graphics and present (queue_family_properties is an assumed name for the vector
// filled earlier with vkGetPhysicalDeviceQueueFamilyProperties())
for( uint32_t i = 0; i < queue_families_count; ++i ) {
  if( (queue_family_properties[i].queueCount > 0) &&
      (queue_family_properties[i].queueFlags & VK_QUEUE_GRAPHICS_BIT) ) {
    if( graphics_queue_family_index == UINT32_MAX ) {
      graphics_queue_family_index = i;
    }
    if( queue_present_support[i] ) {
      selected_graphics_queue_family_index = i;
      selected_present_queue_family_index = i;
      return true;
    }
  }
}

// We don't have queue that supports both graphics and present so we have to use separate queues
for( uint32_t i = 0; i < queue_families_count; ++i ) {
  if( queue_present_support[i] ) {
    present_queue_family_index = i;
    break;
  }
}

// If this device doesn't support queues with graphics and present capabilities don't use it
if( (graphics_queue_family_index == UINT32_MAX) ||
    (present_queue_family_index == UINT32_MAX) ) {
  printf( "Could not find queue families with required properties on physical device %p!\n",
    physical_device );
  return false;
}

selected_graphics_queue_family_index = graphics_queue_family_index;
selected_present_queue_family_index = present_queue_family_index;
return true;
8. Tutorial02.cpp, function CheckPhysicalDeviceProperties()
Here we are iterating over all available queue families. In each loop iteration, we call a function responsible for
checking whether a given queue family supports presentation. The vkGetPhysicalDeviceSurfaceSupportKHR() function
requires us to provide a physical device handle, the index of the queue family we want to check, and the handle of the
surface we want to render into (present an image to). If support is available, VK_TRUE is stored at the given address;
otherwise VK_FALSE is stored.
Now we have the properties of all available queue families. We know which queue families support graphics operations
and which support presentation. In our tutorial example I prefer families that support both. If I find one, I store the family
index and exit immediately from the CheckPhysicalDeviceProperties() function. If there is no such queue family, I use the
first queue family that supports graphics and the first family that supports presenting. Only then can I leave the function
with a “success” return code.
A more advanced scenario may search through all available devices and try to find one with a queue family that
supports both graphics and presentation operations. But I can also imagine situations in which no single device
supports both. Then we are forced to use one device for graphics calculations (maybe like the old “graphics
accelerator”) and another device for presenting results on the screen (connected with the “accelerator” and a monitor).
Unfortunately, in such a case we must use the “general” Vulkan functions from the Vulkan Runtime, or we need to store
device-level functions for each used device (each device may have a different implementation of Vulkan functions). But,
hopefully, such situations will be uncommon.
queue_create_infos.push_back( {
  VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,     // VkStructureType                  sType
  nullptr,                                        // const void                      *pNext
  0,                                              // VkDeviceQueueCreateFlags         flags
  selected_graphics_queue_family_index,           // uint32_t                         queueFamilyIndex
  static_cast<uint32_t>(queue_priorities.size()), // uint32_t                         queueCount
  &queue_priorities[0]                            // const float                     *pQueuePriorities
} );

VkDeviceCreateInfo device_create_info = {
  VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO,           // VkStructureType                  sType
  nullptr,                                        // const void                      *pNext
  0,                                              // VkDeviceCreateFlags              flags
  1,                                              // uint32_t                         queueCreateInfoCount
  &queue_create_infos[0],                         // const VkDeviceQueueCreateInfo   *pQueueCreateInfos
  0,                                              // uint32_t                         enabledLayerCount
  nullptr,                                        // const char * const              *ppEnabledLayerNames
  static_cast<uint32_t>(extensions.size()),       // uint32_t                         enabledExtensionCount
  &extensions[0],                                 // const char * const              *ppEnabledExtensionNames
  nullptr                                         // const VkPhysicalDeviceFeatures  *pEnabledFeatures
};
Vulkan.GraphicsQueueFamilyIndex = selected_graphics_queue_family_index;
Vulkan.PresentQueueFamilyIndex = selected_present_queue_family_index;
return true;
9. Tutorial02.cpp, function CreateDevice()
As before, we need to fill a variable of VkDeviceCreateInfo type. To do this, we need to declare the queue families and
how many queues from each we want to enable. We do this through a pointer to a separate array of
VkDeviceQueueCreateInfo elements. Here I declare a vector and add one element, which defines one queue from the
queue family that supports graphics operations. We use a vector because, if graphics and presenting aren’t supported by
a single family, we will need to define two separate families. If a single family supports both, we just define one member
and declare that only one family is needed. If the indices of the graphics and presentation families differ, we need to
declare additional members for our vector of VkDeviceQueueCreateInfo elements. In this case the VkDeviceCreateInfo
structure must provide info about two different families. That’s why a vector once again comes in handy (with its size()
member function).
But we are not finished with device creation yet. We have to ask for the third extension related to a swap chain—the
device-level “VK_KHR_swapchain” extension. As mentioned earlier, this extension defines the actual support,
implementation, and usage of a swap chain.
To ask for this extension, similarly to the instance level, we define an array (or a vector) which contains the names
of all device-level extensions we want to enable. We provide the address of the first element of this array and the number
of extensions we want to use. This extension also contains a definition of its name in the form of a
VK_KHR_SWAPCHAIN_EXTENSION_NAME #define. We can use it inside our array (vector), and we don’t have to worry
about any typos.
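A sketch of such a vector might look like this:

std::vector<const char*> extensions = {
  VK_KHR_SWAPCHAIN_EXTENSION_NAME
};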
This third extension introduces additional functions used to actually create, destroy, or in general manage swap chains.
Before we can use them, we of course need to load pointers to these functions. They are from the device level so we will
place them in a ListOfFunctions.inl file using VK_DEVICE_LEVEL_FUNCTION() macro:
// From extensions
#if defined(USE_SWAPCHAIN_EXTENSIONS)
VK_DEVICE_LEVEL_FUNCTION( vkCreateSwapchainKHR )
VK_DEVICE_LEVEL_FUNCTION( vkDestroySwapchainKHR )
VK_DEVICE_LEVEL_FUNCTION( vkGetSwapchainImagesKHR )
VK_DEVICE_LEVEL_FUNCTION( vkAcquireNextImageKHR )
VK_DEVICE_LEVEL_FUNCTION( vkQueuePresentKHR )
#endif
10. ListOfFunctions.inl
You can once again see that I’m checking whether a USE_SWAPCHAIN_EXTENSIONS preprocessor directive is defined.
I define it only in projects that enable swap-chain extensions.
Now that we have created a logical device, we need to retrieve the handles of a graphics queue and (if separate) a
presentation queue. I’m using two separate queue variables for convenience, but they may both contain the same handle.
After loading the device-level functions we can read the requested queue handles. Here’s the code for it:
vkGetDeviceQueue( Vulkan.Device, Vulkan.GraphicsQueueFamilyIndex, 0,
&Vulkan.GraphicsQueue );
vkGetDeviceQueue( Vulkan.Device, Vulkan.PresentQueueFamilyIndex, 0,
&Vulkan.PresentQueue );
return true;
11. Tutorial02.cpp, function GetDeviceQueue()
Creating a Semaphore
One last step before we can move to swap chain creation and usage is to create a semaphore. Semaphores are objects
used for queue synchronization. They may be signaled or unsignaled. One queue may signal a semaphore (change its state
from unsignaled to signaled) when some operations are finished, and another queue may wait on the semaphore until it
becomes signaled. After that, the queue resumes performing operations submitted through command buffers.
VkSemaphoreCreateInfo semaphore_create_info = {
  VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO,      // VkStructureType          sType
  nullptr,                                      // const void*              pNext
  0                                             // VkSemaphoreCreateFlags   flags
};

// Create the two semaphores used later during drawing and presentation
if( (vkCreateSemaphore( Vulkan.Device, &semaphore_create_info, nullptr,
      &Vulkan.ImageAvailableSemaphore ) != VK_SUCCESS) ||
    (vkCreateSemaphore( Vulkan.Device, &semaphore_create_info, nullptr,
      &Vulkan.RenderingFinishedSemaphore ) != VK_SUCCESS) ) {
  printf( "Could not create semaphores!\n" );
  return false;
}
return true;
12. Tutorial02.cpp, function CreateSemaphores()
To create a semaphore we call the vkCreateSemaphore() function. It requires us to provide create information with
three fields:
sType – Standard type of structure, here set to VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO.
pNext – Standard pointer reserved for extensions; nullptr here.
flags – Parameter reserved for future use; must be zero.
Semaphores are used during drawing (or during presentation, if we want to be more precise). I will describe the details
later.
Creating a Swap Chain
We have enabled support for a swap chain, but before we can render anything on screen we must first create a swap
chain from which we can acquire images on which we can render (or to which we can copy anything if we have rendered
something into another image).
To create a swap chain, we call the vkCreateSwapchainKHR() function. It requires us to provide an address of a
variable of type VkSwapchainCreateInfoKHR, which informs the driver about the properties of a swap chain that is being
created. To fill this structure with the proper values, we must determine what is possible on a given hardware and
platform. To do this we query the platform’s or window’s properties about the availability of and compatibility with several
different features, that is, supported image formats or present modes (how images are presented on screen). So before
we can create a swap chain we must check what is possible with a given platform and how we can create a swap chain.
We start by acquiring the surface’s capabilities with the vkGetPhysicalDeviceSurfaceCapabilitiesKHR() function. The
acquired capabilities contain important information about the ranges (limits) supported by the swap chain, that is, the
minimal and maximal numbers of images, the minimal and maximal dimensions of images, and the supported transforms
(some platforms may require transformations applied to images before these images may be presented).
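Such a query might look like this (a sketch following the pattern of the other queries in this tutorial):

VkSurfaceCapabilitiesKHR surface_capabilities;
if( vkGetPhysicalDeviceSurfaceCapabilitiesKHR( Vulkan.PhysicalDevice,
    Vulkan.PresentationSurface, &surface_capabilities ) != VK_SUCCESS ) {
  printf( "Could not check presentation surface capabilities!\n" );
  return false;
}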
To query for surface formats, we must call the vkGetPhysicalDeviceSurfaceFormatsKHR() function. We can do it, as
usual, twice: the first time to acquire the number of supported formats and a second time to acquire supported formats
in an array prepared for this purpose. It can be done like this:
uint32_t formats_count;
if( (vkGetPhysicalDeviceSurfaceFormatsKHR( Vulkan.PhysicalDevice,
Vulkan.PresentationSurface, &formats_count, nullptr ) != VK_SUCCESS) ||
(formats_count == 0) ) {
printf( "Error occurred during presentation surface formats enumeration!\n" );
return false;
}
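The second call, which fills an array prepared for this purpose, might then look like this (a sketch):

std::vector<VkSurfaceFormatKHR> surface_formats( formats_count );
if( vkGetPhysicalDeviceSurfaceFormatsKHR( Vulkan.PhysicalDevice,
    Vulkan.PresentationSurface, &formats_count, &surface_formats[0] ) != VK_SUCCESS ) {
  printf( "Error occurred during presentation surface formats enumeration!\n" );
  return false;
}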
To query for present modes that are supported on a given platform, we call the
vkGetPhysicalDeviceSurfacePresentModesKHR() function. We can create code similar to this one:
uint32_t present_modes_count;
if( (vkGetPhysicalDeviceSurfacePresentModesKHR( Vulkan.PhysicalDevice,
Vulkan.PresentationSurface, &present_modes_count, nullptr ) != VK_SUCCESS) ||
(present_modes_count == 0) ) {
printf( "Error occurred during presentation surface present modes enumeration!\n"
);
return false;
}
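And, similarly, the second call for present modes (a sketch):

std::vector<VkPresentModeKHR> present_modes( present_modes_count );
if( vkGetPhysicalDeviceSurfacePresentModesKHR( Vulkan.PhysicalDevice,
    Vulkan.PresentationSurface, &present_modes_count, &present_modes[0] ) != VK_SUCCESS ) {
  printf( "Error occurred during presentation surface present modes enumeration!\n" );
  return false;
}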
We now have acquired all the data that will help us prepare the proper values for a swap chain creation.
An application may request more images. If it wants to use multiple images at once it may do so, for example, when
encoding a video stream where every fourth image is a key frame and the application needs the key frame to prepare the
remaining three frames. Two factors determine the number of images that will be created in a swap chain: how many
images the application requires at once for processing and how many images the presentation engine requires to
function properly.
But we must ensure that the requested number of swap chain images is not smaller than the minimal required number
of images and not greater than the maximal supported number of images (if there is such a limitation). Too many
images require much more memory; on the other hand, too few images may cause stalls in the application (more about
this later).
The number of images that are required for a swap chain to work properly and for an application to be able to render
to is defined in the surface capabilities. Here is some code that checks whether the number of images is between the
allowable min and max values:
// Set of images defined in a swap chain may not always be available for application to render to:
// One may be displayed and one may wait in a queue to be presented
// If application wants to use more images at the same time it must ask for more images
uint32_t image_count = surface_capabilities.minImageCount + 1;
if( (surface_capabilities.maxImageCount > 0) &&
(image_count > surface_capabilities.maxImageCount) ) {
image_count = surface_capabilities.maxImageCount;
}
return image_count;
16. Tutorial02.cpp, function GetSwapChainNumImages()
The minImageCount value in the surface capabilities structure gives the required minimum number of images for the
swap chain to work properly. Here I’m selecting one more image than is required, and I also check whether I’m asking for
too much. One more image may be useful for a triple-buffering-like presentation mode (if it is available). In more advanced
scenarios we would also be required to store the number of images we want to use at the same time (at once). Let’s say
we want to encode the video stream mentioned earlier: we need a key frame (every fourth image frame) and the other
three images. But the swap chain doesn’t allow the application to operate on four images at once—only on three. We need
to know that because we can only prepare two frames from a key frame, then we need to release them (give them back to
the presentation engine), and only then can we acquire the last, third, non-key frame. This will become clearer later.
Each platform may support a different number of format-colorspace pairs. If we want to use specific ones we must
make sure that they are available.
// If the list contains only one entry with undefined format
// it means that there are no preferred surface formats and any can be chosen
if( (surface_formats.size() == 1) &&
    (surface_formats[0].format == VK_FORMAT_UNDEFINED) ) {
  return { VK_FORMAT_R8G8B8A8_UNORM, VK_COLORSPACE_SRGB_NONLINEAR_KHR };
}
Earlier we requested the supported formats, which were placed in an array (a vector in our case). If this array contains
only one value with an undefined format, the platform doesn’t have any preferences. We can use any image format we
want. In other cases, we can use only one of the available formats.
Here I’m looking for any (linear or not) 32-bit RGBA format. If it is available, I choose it. If there is no such format, I use
the first format from the list (hoping that the first is also the best and contains the format with the most precision).
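A sketch of this search might look like the following; it is the natural continuation of the fragment above:

for( VkSurfaceFormatKHR &surface_format : surface_formats ) {
  if( surface_format.format == VK_FORMAT_R8G8B8A8_UNORM ) {
    return surface_format;
  }
}
// Return the first format from the list if the preferred one isn't available
return surface_formats[0];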
Selecting the swap chain size may (and probably usually will) look like this:
// Special value of surface extent is width == height == -1
// If this is so we define the size by ourselves but it must fit within defined confines
if( surface_capabilities.currentExtent.width == -1 ) {
VkExtent2D swap_chain_extent = { 640, 480 };
if( swap_chain_extent.width < surface_capabilities.minImageExtent.width ) {
swap_chain_extent.width = surface_capabilities.minImageExtent.width;
}
if( swap_chain_extent.height < surface_capabilities.minImageExtent.height ) {
swap_chain_extent.height = surface_capabilities.minImageExtent.height;
}
if( swap_chain_extent.width > surface_capabilities.maxImageExtent.width ) {
swap_chain_extent.width = surface_capabilities.maxImageExtent.width;
}
if( swap_chain_extent.height > surface_capabilities.maxImageExtent.height ) {
swap_chain_extent.height = surface_capabilities.maxImageExtent.height;
}
return swap_chain_extent;
}
// Most of the cases we define size of the swap_chain images equal to current window's size
return surface_capabilities.currentExtent;
18. Tutorial02.cpp, function GetSwapChainExtent()
For a swap chain we want (in most cases) to render into its images (use them as render targets), so we must specify the
“color attachment” usage with the VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT enum. In Vulkan this usage is always
available for swap chains, so we can always set it without any additional checking. But for any other usage we must ensure
it is supported; we can do this through the “supportedUsageFlags” member of the surface capabilities structure.
// Color attachment flag must always be supported
// We can define other usage flags but we always need to check if they are supported
if( surface_capabilities.supportedUsageFlags & VK_IMAGE_USAGE_TRANSFER_DST_BIT ) {
return VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT | VK_IMAGE_USAGE_TRANSFER_DST_BIT;
}
return 0;
19. Tutorial02.cpp, function GetSwapChainUsageFlags()
In this example we request an additional “transfer destination” usage, which is required for the image clear operation.
Selecting Pre-Transformations
On some platforms we may want our image to be transformed. This is usually the case on tablets when they are
oriented in a way other than their default orientation. During swap chain creation we must specify what transformations
should be applied to images prior to presenting. We can, of course, use only the supported transforms, which can be found
in a “supportedTransforms” member of acquired surface capabilities.
If the selected pre-transform is other than the current transformation (also found in surface capabilities) the
presentation engine will apply the selected transformation. On some platforms this may cause performance degradation
(probably not noticeable but worth mentioning). In the sample code, I don’t want any transformations but, of course, I
must check whether it is supported. If not, I’m just using the same transformation that is currently used.
// Sometimes images must be transformed before they are presented (i.e. due to device's orientation
// being other than default orientation)
// If the specified transform is other than current transform, presentation engine will transform image
// during presentation operation; this operation may hit performance on some platforms
// Here we don't want any transformations to occur so if the identity transform is supported use it
// otherwise just use the same transform as current transform
if( surface_capabilities.supportedTransforms & VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR ) {
  return VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR;
} else {
  return surface_capabilities.currentTransform;
}
20. Tutorial02.cpp, function GetSwapChainTransform()
Double buffering was introduced to keep the process of drawing from being visible on screen: one image was displayed
while the second was used to render into. During presentation, the contents of the second image were copied into the
first image (in early implementations) or (later) the images were swapped (remember the SwapBuffers() function used in
OpenGL applications?), which means that their pointers were exchanged.
Tearing was another issue with displaying images, so the ability to wait for the vertical blanking signal was introduced
for those who wanted to avoid it. But waiting introduced another problem: input lag. So double buffering was extended to
triple buffering, in which we draw into two back buffers interchangeably and, during v-sync, the most recently finished
one is used for presentation.
This is exactly what presentation modes are for: they define how to deal with all these issues, how to present images on
the screen, and whether we want to use v-sync.
IMMEDIATE. Present requests are applied immediately and tearing may be observed (depending on the
frames per second). Internally the presentation engine doesn’t use any queue for holding swap chain images.
FIFO. This mode is the most similar to OpenGL’s buffer swapping with a swap interval set to 1. The image is
displayed (replaces the currently displayed image) only during vertical blanking periods, so no tearing should be
visible. Internally, the presentation engine uses a FIFO queue with “numSwapchainImages – 1” elements.
Present requests are appended to the end of this queue. During blanking periods, the image from the
beginning of the queue replaces the currently displayed image, which may then become available to the
application. If all images are in the queue, the application has to wait until v-sync releases the currently
displayed image. Only after that does it become available to the application and the program may render into
it. This mode must always be available in all Vulkan implementations supporting the swap chain extension.
FIFO RELAXED. This mode is similar to FIFO, but when an image is displayed for longer than one blanking
period it may be released immediately, without waiting for another v-sync signal (so if we are rendering frames
with a lower frequency than the screen’s refresh rate, tearing may be visible).
MAILBOX. In my opinion, this mode is the most similar to the mentioned triple buffering. The image is
displayed only during vertical blanking periods and no tearing should be visible. But internally, the presentation
engine uses a queue with only a single element. One image is displayed and one waits in the queue. If the
application wants to present another image, it is not appended to the end of the queue but replaces the one
that waits. So the queue always holds the most recently generated image. This behavior is available if
there are more than two images. With only two images, MAILBOX mode behaves similarly to FIFO (as we have
to wait for the displayed image to be released, we don’t have a “spare” image that can be exchanged with the
one waiting in the queue).
Deciding on which presentation mode to use depends on the type of operations we want to do. If we want to decode
and display movies we want all frames to be displayed in a proper order. So the FIFO mode is in my opinion the best
choice. But if we are creating a game, we usually want to display the most recently generated frame. In this case I suggest
using MAILBOX because there is no tearing and input lag is minimized. The most recently generated image is displayed
and the application doesn’t need to wait for v-sync. But to achieve this behavior, at least three images must be created
and this mode may not always be supported.
FIFO mode is always available and requires at least two images, but it causes the application to wait for v-sync (no matter
how many swap chain images were requested). Immediate mode is the fastest: as I understand it, it also requires two
images, but it doesn’t make the application wait for the monitor’s refresh. On the downside, it may cause image tearing.
The choice is yours but, as always, we must make sure that the chosen presentation mode is supported.
Earlier we queried for available present modes, so now we must look for the one that best suits our needs. Here is the
code in which I’m looking for MAILBOX mode:
// FIFO present mode is always available
// MAILBOX is the lowest latency V-Sync enabled mode (something like triple-buffering) so use it if available
for( VkPresentModeKHR &present_mode : present_modes ) {
if( present_mode == VK_PRESENT_MODE_MAILBOX_KHR ) {
return present_mode;
}
}
return VK_PRESENT_MODE_FIFO_KHR;
21. Tutorial02.cpp, function GetSwapChainPresentMode()
Creating a Swap Chain
Now we have all the data necessary to create a swap chain. We have defined all the required values, and we are sure
they fit into the given platform’s constraints.
uint32_t                      desired_number_of_images = GetSwapChainNumImages( surface_capabilities );
VkSurfaceFormatKHR            desired_format           = GetSwapChainFormat( surface_formats );
VkExtent2D                    desired_extent           = GetSwapChainExtent( surface_capabilities );
VkImageUsageFlags             desired_usage            = GetSwapChainUsageFlags( surface_capabilities );
VkSurfaceTransformFlagBitsKHR desired_transform        = GetSwapChainTransform( surface_capabilities );
VkPresentModeKHR              desired_present_mode     = GetSwapChainPresentMode( present_modes );
VkSwapchainKHR old_swap_chain = Vulkan.SwapChain;
if( static_cast<int>(desired_usage) == 0 ) {
printf( "TRANSFER_DST image usage is not supported by the swap chain!" );
return false;
}
VkSwapchainCreateInfoKHR swap_chain_create_info = {
  VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR,  // VkStructureType                sType
  nullptr,                                      // const void                    *pNext
  0,                                            // VkSwapchainCreateFlagsKHR      flags
  Vulkan.PresentationSurface,                   // VkSurfaceKHR                   surface
  desired_number_of_images,                     // uint32_t                       minImageCount
  desired_format.format,                        // VkFormat                       imageFormat
  desired_format.colorSpace,                    // VkColorSpaceKHR                imageColorSpace
  desired_extent,                               // VkExtent2D                     imageExtent
  1,                                            // uint32_t                       imageArrayLayers
  desired_usage,                                // VkImageUsageFlags              imageUsage
  VK_SHARING_MODE_EXCLUSIVE,                    // VkSharingMode                  imageSharingMode
  0,                                            // uint32_t                       queueFamilyIndexCount
  nullptr,                                      // const uint32_t                *pQueueFamilyIndices
  desired_transform,                            // VkSurfaceTransformFlagBitsKHR  preTransform
  VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR,            // VkCompositeAlphaFlagBitsKHR    compositeAlpha
  desired_present_mode,                         // VkPresentModeKHR               presentMode
  VK_TRUE,                                      // VkBool32                       clipped
  old_swap_chain                                // VkSwapchainKHR                 oldSwapchain
};
if( vkCreateSwapchainKHR( Vulkan.Device, &swap_chain_create_info, nullptr,
&Vulkan.SwapChain ) != VK_SUCCESS ) {
printf( "Could not create swap chain!\n" );
return false;
}
if( old_swap_chain != VK_NULL_HANDLE ) {
vkDestroySwapchainKHR( Vulkan.Device, old_swap_chain, nullptr );
}
return true;
22. Tutorial02.cpp, function CreateSwapChain()
In this code example, at the beginning we gather all the necessary data described earlier. Next we create a variable
of type VkSwapchainCreateInfoKHR. It consists of the following members:
sType – Standard type of structure; here VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR.
pNext – Pointer reserved for extensions; nullptr here.
flags – Parameter reserved for future use; must be zero.
surface – Handle of the created surface the swap chain will present to.
minImageCount – Minimal number of images the application requests for the swap chain.
imageFormat – Format of the swap chain images.
imageColorSpace – Color space of the swap chain images.
imageExtent – Size (dimensions) of the swap chain images.
imageArrayLayers – Number of layers in each image (more than one for stereoscopic rendering).
imageUsage – Ways in which the swap chain images will be used.
imageSharingMode – Whether the images are referenced by queues from one family at a time (exclusive) or from
many families (concurrent).
queueFamilyIndexCount – Number of queue families referencing the images (used in concurrent sharing mode).
pQueueFamilyIndices – Indices of all queue families referencing the images (used in concurrent sharing mode).
preTransform – Transformation applied to images before presentation.
compositeAlpha – How the image’s alpha component should be treated when the window is composited with other
windows on some platforms.
presentMode – Presentation mode used by the swap chain.
clipped – Whether rendering may be skipped for the parts of the surface that are not visible.
oldSwapchain – Handle of a previously created swap chain that the new one replaces (if any).
So what’s the matter with this sharing mode? Images in Vulkan can be referenced by queues. This means that we can
create commands that use these images. These commands are stored in command buffers, and these command buffers
are submitted to queues. Queues belong to different queue families. And Vulkan requires us to state how many different
queue families and which of them are referencing these images through commands submitted with command buffers.
If we want to reference images from many different queue families at a time, we can do so. In this case we must
provide the “concurrent” sharing mode. But this (probably) requires us to manage image data coherency by ourselves,
that is, we must synchronize different queues in such a way that the data in the images is correct and no hazards
occur—for example, some queues reading from images that other queues haven’t finished writing to yet.
Alternatively, we can skip specifying these queue families and just tell Vulkan that only one queue family (queues from
one family) will be referencing an image at a time. This doesn’t mean other queues can’t reference these images. It just
means they can’t do it all at once, at the same time. So if we want to reference images first from one family and then from
another, we must specifically tell Vulkan: “My image was used inside this queue family, but from now on another family,
this one, will be referencing it.” Such a transition is done with an image memory barrier. When only one queue family uses
a given image at a time, use the “exclusive” sharing mode.
If any of these requirements are not fulfilled, undefined behavior will probably occur and we may not rely on the
image contents.
In this example we are using only one queue, so we don’t have to specify the “concurrent” sharing mode, and we leave
the related parameters (queueFamilyIndexCount and pQueueFamilyIndices) nulled or zeroed.
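For illustration, if we did need to share swap chain images between two families (for example, the graphics and
presentation families), the relevant members might be filled like this sketch (not used in this tutorial):

uint32_t queue_family_indices[] = {
  Vulkan.GraphicsQueueFamilyIndex,
  Vulkan.PresentQueueFamilyIndex
};
// Inside VkSwapchainCreateInfoKHR:
//   VK_SHARING_MODE_CONCURRENT,   // VkSharingMode    imageSharingMode
//   2,                            // uint32_t         queueFamilyIndexCount
//   queue_family_indices,         // const uint32_t  *pQueueFamilyIndices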
So now we can call the vkCreateSwapchainKHR() function to create a swap chain and check whether this operation
succeeded. After that (if we are recreating the swap chain, meaning this isn’t the first time we are creating one) we should
destroy the previous swap chain. I’ll discuss this later.
Image Presentation
We now have a working swap chain that contains several images. To use these images as render targets, we can get
handles to all images created with a swap chain, but we are not allowed to use them just like that. Swap chain images
belong to and are owned by the swap chain. This means that the application cannot use these images until it asks for
them. This also means that images are created and destroyed by the platform along with a swap chain (not by the
application).
So when the application wants to render into a swap chain image or use it in any other way, it must first get access to
it by asking a swap chain for it. If the swap chain makes us wait, we have to wait. And after the application finishes using
the image it should “return” it by presenting it. If we forget about returning images to a swap chain, we will soon run out
of images and nothing will display on the screen.
The application may also request access to more images at once but they must be available. Acquiring access may
require waiting. In corner cases, when there are too few images in a swap chain and the application wants to access too
many of them, or if we forget about returning images to a swap chain, the application may even wait an infinite amount
of time.
Given that there are (usually) at least two images, it may sound strange that we have to wait, but it is quite reasonable.
Not all images are available for the application because they are used by the presentation engine. Usually one image is
displayed. Additional images may also be required for the presentation engine to work properly. So we can’t use them
because it could block the presentation engine in some way. We don’t know its internal mechanisms and algorithms or
the requirements of the OS the application is executed on. So the availability of images may depend on many factors:
internal implementation, OS, number of created images, number of images the application wants to use at a single time
and on the selected presentation mode, which is the most important factor from the perspective of this tutorial.
In immediate mode, one image is always presented. Other images (at least one) are available to the application. When
the application posts a presentation request (“returns” an image), the image that was displayed is replaced with the new
one. So if two images are created, only one image is available to the application at a time. When the application
asks for another image, it must “return” the previous one. If it wants two images at a time, it must create a swap chain
with more images or it will wait forever. In general, in immediate mode the application can ask for (own)
“imageCount – 1” images at a time.
In FIFO mode one image is displayed, and the rest are placed in a FIFO queue. The length of this queue is always equal
to “imageCount – 1.” At first, all images may be available to the application (because the queue is empty and no image is
presented). When the application presents an image (“returns” it to a swap chain), it is appended to the end of the queue.
So as soon as the queue fills, the application has to wait for another image until the displayed image is released during the
vertical blanking period. Images are always displayed in the same order they were presented in by the application. When
the v-sync signal appears, the first image from the queue replaces the image that was displayed. The previously displayed
image (the released one) may become available to the application as it becomes unused (isn’t presented and is not waiting
in the queue). If all images are in the queue, the application will wait for the next blanking period to access another image.
If rendering takes longer than the refresh period, the application will not have to wait at all. This behavior doesn’t change
when there are more images. The internal swap chain queue always has “imageCount – 1” elements.
The last mode available for the time being is MAILBOX. As previously mentioned, this mode is most similar to the
“traditional” triple buffering. One image is always displayed. The second image waits in a single-element queue (it has
room for only one element). The rest of the images may be available to the application. When the application presents
an image, the new image replaces the one waiting in the queue. The image in the queue gets displayed only during
blanking periods, but the application doesn’t need to wait for the next image (when there are more than two images).
MAILBOX mode with only two images behaves identically to FIFO mode—the application must wait for the v-sync signal to
acquire the next image. But with at least three images it may immediately acquire the image that was replaced by the
“presented” image (the one waiting in the queue). That’s why I requested one more image than the minimal number. If
MAILBOX mode is available I want to use it in a manner similar to triple buffering (maybe the first thing to do is to check
what mode is available and only after that choose the number of swap chain images based on the selected presentation
mode).
I hope these examples help you understand why the application must ask for an image if it wants to use any. In Vulkan
we can only do what is allowed and required—not less and usually not too much more.
uint32_t image_index;
VkResult result = vkAcquireNextImageKHR( Vulkan.Device, Vulkan.SwapChain, UINT64_MAX,
Vulkan.ImageAvailableSemaphore, VK_NULL_HANDLE, &image_index );
switch( result ) {
case VK_SUCCESS:
case VK_SUBOPTIMAL_KHR:
break;
case VK_ERROR_OUT_OF_DATE_KHR:
return OnWindowSizeChanged();
default:
printf( "Problem occurred during swap chain image acquisition!\n" );
return false;
}
23. Tutorial02.cpp, function Draw()
To access an image, we must call the vkAcquireNextImageKHR() function. During the call we must specify (apart from
the device handle, like in almost all other functions) the swap chain from which we want to use an image, a timeout, a
semaphore, and a fence object. On success, the function stores the image index in the variable whose address we
provided. Why an index and not a (handle to the) image itself? Such behavior may be convenient (for example, during a
“preprocessing” phase, when we want to prepare as much of the data needed for rendering as possible so as not to waste
time during typical frame rendering), but I will describe it later. Just remember that we can check what images were
created in a swap chain if we want (we just can’t use them until we are allowed to). An array of images is provided upon
such a query. And the vkAcquireNextImageKHR() function returns an index into this very array.
We have to specify a timeout because sometimes images may not be immediately available. Trying to use an image
before we are allowed to causes undefined behavior. Specifying a timeout gives the presentation engine time to
react: if it needs to wait for the next vertical blanking period, it can do so. The function may therefore block, but for no
longer than the given time. We can provide the maximal available value, in which case the function may even block
indefinitely. If we provide 0 for the timeout, the function returns immediately: if any image was available at the time of
the call, it is provided immediately; if there was no available image, an error is returned stating that the image was not
yet ready.
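A non-blocking poll might therefore look like this sketch (with a zero timeout, VK_NOT_READY is returned when no
image is available):

uint32_t image_index;
VkResult result = vkAcquireNextImageKHR( Vulkan.Device, Vulkan.SwapChain, 0,
  Vulkan.ImageAvailableSemaphore, VK_NULL_HANDLE, &image_index );
if( result == VK_NOT_READY ) {
  // No image is available yet; do some other work and try again later
}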
Once we have our image we can use it however we want. Images are processed or referenced by commands stored
in command buffers. We can prepare command buffers earlier (to save as much processing time for rendering as we can)
and use or submit them here. Or we can prepare the commands now and submit them when we’re done. In Vulkan,
creating command buffers and submitting them to queues is the only way to cause operations to be performed by the
device.
When command buffers are submitted to queues, all their commands start being processed. But a queue cannot use
an image until it is allowed to, and the semaphore we created earlier is for internal queue synchronization—before the
queue starts processing commands that reference a given image, it should wait on this semaphore (until it gets signaled).
But this wait doesn’t block an application. There are two synchronization mechanisms for accessing swap chain images:
(1) a timeout, which may block an application but doesn’t stop queue processing, and (2) a semaphore, which doesn’t
block the application but blocks selected queues.
We now know (theoretically) how to render anything (through command buffers). So let’s now imagine that the
command buffer we are submitting contains some rendering operations. But before this processing starts, we should
tell the queue (on which the rendering will occur) to wait. This is all done within one submit operation.
VkPipelineStageFlags wait_dst_stage_mask = VK_PIPELINE_STAGE_TRANSFER_BIT;
VkSubmitInfo submit_info = {
  VK_STRUCTURE_TYPE_SUBMIT_INFO,                // VkStructureType              sType
  nullptr,                                      // const void                  *pNext
  1,                                            // uint32_t                     waitSemaphoreCount
  &Vulkan.ImageAvailableSemaphore,              // const VkSemaphore           *pWaitSemaphores
  &wait_dst_stage_mask,                         // const VkPipelineStageFlags  *pWaitDstStageMask
  1,                                            // uint32_t                     commandBufferCount
  &Vulkan.PresentQueueCmdBuffers[image_index],  // const VkCommandBuffer       *pCommandBuffers
  1,                                            // uint32_t                     signalSemaphoreCount
  &Vulkan.RenderingFinishedSemaphore            // const VkSemaphore           *pSignalSemaphores
};

if( vkQueueSubmit( Vulkan.PresentQueue, 1, &submit_info, VK_NULL_HANDLE ) != VK_SUCCESS ) {
  return false;
}
24. Tutorial02.cpp, function Draw()
In this example we are telling the queue to wait only on one semaphore, which will be signaled by the presentation
engine when the queue can safely start processing commands referencing the swap chain image.
We also submit just one simple command buffer. It was prepared earlier (I will describe how to do it later). It only
clears the acquired image. But this is enough for us to see the selected color in our application’s window and to see that
the swap chain is working properly.
In the code above, the command buffers are arranged in an array (a vector, to be more precise). To make it easier to
submit the proper command buffer—the one that references the currently acquired image—I prepared a separate
command buffer for each swap chain image. The index of an image that the vkAcquireNextImageKHR() function provides
can be used right here. Using image handles (in similar scenarios) would require creating maps that would translate the
handle into a specific command buffer or index. On the other hand, normal numbers can be used to just select a specific
array element. This is why this function gives us indices and not image handles.
After we have submitted a command buffer, all the processing starts in the background, on “hardware.” Next, we
want to present a rendered image. Presenting means that we want our image to be displayed and that we are “giving it
back” to the swap chain. The code to do this might look like this:
VkPresentInfoKHR present_info = {
  VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,           // VkStructureType          sType
  nullptr,                                      // const void              *pNext
  1,                                            // uint32_t                 waitSemaphoreCount
  &Vulkan.RenderingFinishedSemaphore,           // const VkSemaphore       *pWaitSemaphores
  1,                                            // uint32_t                 swapchainCount
  &Vulkan.SwapChain,                            // const VkSwapchainKHR    *pSwapchains
  &image_index,                                 // const uint32_t          *pImageIndices
  nullptr                                       // VkResult                *pResults
};
result = vkQueuePresentKHR( Vulkan.PresentQueue, &present_info );
switch( result ) {
case VK_SUCCESS:
break;
case VK_ERROR_OUT_OF_DATE_KHR:
case VK_SUBOPTIMAL_KHR:
return OnWindowSizeChanged();
default:
printf( "Problem occurred during image presentation!\n" );
return false;
}
return true;
25. Tutorial02.cpp, function Draw()
An image (or images) is presented by calling the vkQueuePresentKHR() function. It may be perceived as submitting a
command buffer with only one operation: presentation.
To present an image we must specify which images should be presented, how many, and from which swap chains.
We can present many images from many swap chains at once (that is, to multiple windows), but only one image from
each swap chain. We provide this information through the VkPresentInfoKHR structure, which contains the following
fields:
sType – Standard type of structure; here VK_STRUCTURE_TYPE_PRESENT_INFO_KHR.
pNext – Pointer reserved for extensions; nullptr here.
waitSemaphoreCount – Number of semaphores on which the presentation should wait.
pWaitSemaphores – Array of semaphores on which the present operation waits before it starts.
swapchainCount – Number of swap chains to which images are presented.
pSwapchains – Array of handles of all the swap chains presented to.
pImageIndices – Array of indices of the images to present (one index for each swap chain).
pResults – Optional array in which the result of each separate presentation is stored (may be nullptr).
Now that we have prepared this structure, we can use it to present an image. In this example I’m just presenting a
single image from a single swap chain.
Each operation that is performed (or submitted) by calling vkQueue…() functions (this includes presenting) is
appended to the end of the queue for processing. Operations are processed in the order in which they were submitted.
For a presentation, we are presenting an image after submitting other command buffers. So the present queue will start
presenting an image after the processing of all the command buffers is done. This ensures that the image will be presented
after we are done using it (rendering into it) and an image with correct contents will be displayed on the screen. But in
this example we submit drawing (clearing) operations and a present operation to the same queue: the PresentQueue. We
are doing only simple operations that are allowed to be done on a present queue.
If we want to perform drawing operations on a queue that is different than the present operation, we need to
synchronize the queues. This is done, again, with semaphores, which is the reason why we created two semaphores (the
second one may not be necessary in this example, as we render and present using the same queue, but I wanted to show
how it should be done in the correct way).
The first semaphore is for presentation engine to tell the queue that it can safely use (reference/render into) an image.
The second semaphore is for us. It is signaled when the operations on the image (rendering into it) are done. The submit
info structure has a field called pSignalSemaphores. It is an array of semaphore handles that will be signaled after
processing of all of the submitted command buffers is finished. So we need to tell the second queue to wait on this second
semaphore. We store the handle of our second semaphore in the pWaitSemaphores field of a VkPresentInfoKHR structure.
And the queue to which we are submitting the present operation will wait, thanks to this second semaphore, until we are
done rendering into a given image.
And that’s it. We have displayed our first image using Vulkan!
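One more thing worth examining is how to check what images were created inside a swap chain. A minimal sketch,
using the traditional “double call”:

uint32_t image_count = 0;
if( (vkGetSwapchainImagesKHR( Vulkan.Device, Vulkan.SwapChain, &image_count,
      nullptr ) != VK_SUCCESS) ||
    (image_count == 0) ) {
  return false;
}

std::vector<VkImage> swap_chain_images( image_count );
if( vkGetSwapchainImagesKHR( Vulkan.Device, Vulkan.SwapChain, &image_count,
    &swap_chain_images[0] ) != VK_SUCCESS ) {
  return false;
}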
This code sample is a fragment of an imaginary function that checks how many and what images were created inside
a swap chain. It is done with the traditional “double call,” this time using the vkGetSwapchainImagesKHR() function. First
we call it with the last parameter set to null. This way the number of all images created in the swap chain is stored in the
“image_count” variable and we know how much storage we need to prepare for the handles of all the images. The second
time we call this function, we receive the handles in the array whose address we provided through the last parameter.
Now we know all the images the swap chain is using. For the vkAcquireNextImageKHR() function and the
VkPresentInfoKHR structure, the indices I referred to are indices into this array, an array “returned” by the
vkGetSwapchainImagesKHR() function. It is called the array of a swap chain’s presentable images. If any swap-chain-related
function wants us to provide an index or returns an index, it is the index of an image in this very array.
Sometimes a swap chain becomes out of date. This means that the properties of the surface, platform, or application
window changed in such a way that the current swap chain cannot be used any more. The most obvious (and
unfortunately not so good) example is a change of the window’s size. We cannot create a swap chain image nor can
we change its size; the only possibility is to destroy and recreate the swap chain. There are also situations in which we
can still use a swap chain, but it may no longer be optimal for the surface it was created for.
These situations are notified by the return codes of the vkAcquireNextImageKHR() and vkQueuePresentKHR()
functions.
When the VK_SUBOPTIMAL_KHR value is returned, we can still use the current swap chain for presentation. It will still
work, but not optimally (for example, color precision may be worse). It is advised to recreate the swap chain when there
is an opportunity. A good example is when we have performed performance-heavy rendering and, after acquiring the
image, we are informed that the image is suboptimal. We don’t want to waste all this processing and make the user wait
much longer for another frame. We just present the image and recreate the swap chain as soon as there is an opportunity.
When VK_ERROR_OUT_OF_DATE_KHR is returned, we cannot use the current swap chain at all. Presenting with it will
fail, so we have to recreate the swap chain as soon as possible.
I have mentioned that changing the window size is the most obvious, but not so good, example of a change of surface
properties after which we should recreate a swap chain. It is not so good because we may not be notified about it with
the mentioned return codes; we should monitor window size changes ourselves using OS-specific code. That’s why the
name of this function in our source is OnWindowSizeChanged: it is called every time the window’s size has changed. But
as this function only recreates the swap chain (and command buffers), the same function can also be called here.
Recreation is done the same way as creation. There is a structure member in which we provide the swap chain that the
new one should replace. But we must explicitly destroy the old swap chain after we create the new one.
In the first tutorial, I described queues and queue families. If we want to execute commands on a device we submit
them to queues through command buffers. To put it in other words: commands are encapsulated inside command buffers.
Submitting such buffers to queues causes devices to start processing commands that were recorded in them. Do you
remember OpenGL’s drawing lists? We could prepare lists of commands that cause the geometry to be drawn in a form
of a list of, well, drawing commands. The situation in Vulkan is similar, but far more flexible and advanced.
Remember that command buffers can be submitted only to the proper queue families, and only the types of operations
compatible with a given family can be submitted to a given queue. Also, the command buffer itself is not connected with
any queue or queue family, but the memory pool from which the buffer allocates its memory is. So each command buffer
that takes memory from a given pool can only be submitted to a queue from the proper queue family—the family for
which the memory pool was created. If there are more queues created from a given family, we can submit a command
buffer to any one of them; the family index is the most important thing here.
VkCommandPoolCreateInfo cmd_pool_create_info = {
  VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,   // VkStructureType            sType
  nullptr,                                      // const void*                pNext
  0,                                            // VkCommandPoolCreateFlags   flags
  Vulkan.PresentQueueFamilyIndex                // uint32_t                   queueFamilyIndex
};
To create a pool for command buffer(s) we call the vkCreateCommandPool() function. It requires us to provide (the
address of) a variable of type VkCommandPoolCreateInfo, which contains the following members:
sType – Standard type of structure; here VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO.
pNext – Pointer reserved for extensions; nullptr here.
flags – Flags describing the usage of the pool and of command buffers allocated from it; zero here.
queueFamilyIndex – Index of the queue family for which command buffers from this pool will be recorded.
For our test application, we use only one queue, from the presentation family, so we use its index here. Now we can
call the vkCreateCommandPool() function and check whether it succeeded. If it did, the handle of the command pool is
stored in the variable whose address we provided.
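The call itself might look like this sketch (Vulkan.PresentQueueCmdPool is the member in which the pool’s handle
is kept):

if( vkCreateCommandPool( Vulkan.Device, &cmd_pool_create_info, nullptr,
    &Vulkan.PresentQueueCmdPool ) != VK_SUCCESS ) {
  printf( "Could not create a command pool!\n" );
  return false;
}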
As described earlier, I allocate more than one command buffer—one for each swap chain image that will be referenced
by the drawing commands. So each time we acquire an image from a swap chain we can submit/use the proper command
buffer.
uint32_t image_count = 0;
if( (vkGetSwapchainImagesKHR( Vulkan.Device, Vulkan.SwapChain, &image_count, nullptr
) != VK_SUCCESS) ||
(image_count == 0) ) {
printf( "Could not get the number of swap chain images!\n" );
return false;
}
Vulkan.PresentQueueCmdBuffers.resize( image_count );
VkCommandBufferAllocateInfo cmd_buffer_allocate_info = {
  VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO, // VkStructureType        sType
  nullptr,                                        // const void*            pNext
  Vulkan.PresentQueueCmdPool,                     // VkCommandPool          commandPool
  VK_COMMAND_BUFFER_LEVEL_PRIMARY,                // VkCommandBufferLevel   level
  image_count                                     // uint32_t               bufferCount
};
if( vkAllocateCommandBuffers( Vulkan.Device, &cmd_buffer_allocate_info,
&Vulkan.PresentQueueCmdBuffers[0] ) != VK_SUCCESS ) {
printf( "Could not allocate command buffers!\n" );
return false;
}
if( !RecordCommandBuffers() ) {
printf( "Could not record command buffers!\n" );
return false;
}
return true;
28. Tutorial02.cpp, function CreateCommandBuffers()
First we need to know how many swap chain images were created (a swap chain may create more images than we
specified). This was explained in an earlier section. We call the vkGetSwapchainImagesKHR() function with the last
parameter set to null; right now we don’t need the handles of the images, only their total number. After that we prepare
an array (vector) for the proper number of command buffers and allocate them by calling the
vkAllocateCommandBuffers() function. It requires us to prepare a structured variable of type
VkCommandBufferAllocateInfo, which contains the following fields:
sType – Standard type of structure; here VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO.
pNext – Pointer reserved for extensions; nullptr here.
commandPool – Pool from which the command buffers will take their memory.
level – Level (type) of the allocated command buffers: primary or secondary.
bufferCount – Number of command buffers to allocate.
After calling the vkAllocateCommandBuffers() function, we need to check whether the allocation succeeded. If it did,
we are done allocating command buffers and we are ready to record some (simple) commands.
Here is a set of variables required (in this tutorial) to record command buffers:
uint32_t image_count = static_cast<uint32_t>(Vulkan.PresentQueueCmdBuffers.size());
VkCommandBufferBeginInfo cmd_buffer_begin_info = {
  VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,  // VkStructureType                        sType
  nullptr,                                      // const void                            *pNext
  VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT, // VkCommandBufferUsageFlags              flags
  nullptr                                       // const VkCommandBufferInheritanceInfo  *pInheritanceInfo
};

VkClearColorValue clear_color = {
  { 1.0f, 0.8f, 0.4f, 0.0f }
};

VkImageSubresourceRange image_subresource_range = {
  VK_IMAGE_ASPECT_COLOR_BIT,                    // VkImageAspectFlags   aspectMask
  0,                                            // uint32_t             baseMipLevel
  1,                                            // uint32_t             levelCount
  0,                                            // uint32_t             baseArrayLayer
  1                                             // uint32_t             layerCount
};
29. Tutorial02.cpp, function RecordCommandBuffers()
First we get the handles of all the swap chain images, which will be used in the drawing commands (we will just clear
them to a single color, but nevertheless we will use them). We already know the number of images, so we don’t have to
ask for it again. The handles of the images are stored in a vector after calling the vkGetSwapchainImagesKHR() function.
Next, we need to prepare a variable of structured type VkCommandBufferBeginInfo. It contains the information
necessary in more typical rendering scenarios (like render passes). We won’t be doing such operations here, and that’s
why we can set almost all parameters to zeros or nulls. But, for clarity, the structure contains the following fields:
sType – Standard type of structure; here VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO.
pNext – Pointer reserved for extensions; nullptr here.
flags – Flags describing how the command buffer will be used (here
VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT, which allows the buffer to be submitted again while it is
still pending execution).
pInheritanceInfo – Parameters used only for secondary command buffers; nullptr here.
Command buffers gather commands. To store commands in command buffers, we record them. The above structure
provides some necessary information for the driver to prepare for and optimize the recording process.
In Vulkan, command buffers are divided into primary and secondary. Primary command buffers are typical command
buffers similar to drawing lists. They are independent, individual “beings” and they (and only they) may be submitted to
queues. Secondary command buffers can also store commands (we also record them), but they may only be referenced
from within primary command buffers (we can call secondary command buffers from within primary command buffers
like calling OpenGL’s drawing lists from other drawing lists). We can’t submit secondary command buffers directly to
queues.
In this simple example we want to clear our images with one single value. So next we set up a color that will be used
for clearing. You can pick any value you like. I used a light orange color.
The last variable in the code above specifies the parts of the image on which our operations will be performed. Our
image consists of only one mipmap level and one array layer (no stereoscopic buffers, and so on). We set the values in the
VkImageSubresourceRange structure accordingly. This structure contains the following fields:
aspectMask – Depends on the image format; as we are using the images as color render targets (they have a “color”
format), we specify the “color aspect” here.
baseMipLevel – First mipmap level that will be accessed (modified).
levelCount – Number of mipmap levels on which operations will be performed (including the base level).
baseArrayLayer – First array layer that will be accessed (modified).
layerCount – Number of layers the operations will be performed on (including the base layer).
If we create an image with different usages in mind and we want to perform different operations on it, we must
change the image’s current layout before we can perform each type of operation. To do this, we must transition from the
current layout to another layout that is compatible with the operations we are about to execute.
Each image is (generally) created with an undefined layout, and we must transition from it to another layout
if we want to use the image. But the images created by a swap chain have the VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
layout. This layout, as the name suggests, is designed for the image to be used (presented) by the presentation engine
(that is, displayed on the screen). So if we want to perform some operations on swap chain images, we need to change
their layouts to ones compatible with the desired operations. And after we have finished processing the images (that is,
rendering into them) we need to transition their layouts back to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR.
Otherwise, the presentation engine will not be able to use these images and undefined behavior may occur.
To transition from one layout to another one, image memory barriers are used. With them we can specify the old
layout (current) we are transitioning from and the new layout we are transitioning to. The old layout must always be equal
to the current or undefined layout. When we specify the old layout as undefined, image contents may be discarded during
transition. This allows the driver to perform some optimizations. If we want to preserve image contents we must specify
a layout that is equal to the current layout.
The last variable of type VkImageSubresourceRange in the code example above is also used for image transitions. It
defines what “parts” of the image are changing their layout and is required when preparing an image memory barrier.
VkImageMemoryBarrier barrier_from_clear_to_present = {
  VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,   // VkStructureType            sType
  nullptr,                                  // const void                *pNext
  VK_ACCESS_TRANSFER_WRITE_BIT,             // VkAccessFlags              srcAccessMask
  VK_ACCESS_MEMORY_READ_BIT,                // VkAccessFlags              dstAccessMask
  VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,     // VkImageLayout              oldLayout
  VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,          // VkImageLayout              newLayout
  Vulkan.PresentQueueFamilyIndex,           // uint32_t                   srcQueueFamilyIndex
  Vulkan.PresentQueueFamilyIndex,           // uint32_t                   dstQueueFamilyIndex
  swap_chain_images[i],                     // VkImage                    image
  image_subresource_range                   // VkImageSubresourceRange    subresourceRange
};
vkCmdPipelineBarrier( Vulkan.PresentQueueCmdBuffers[i], VK_PIPELINE_STAGE_TRANSFER_BIT,
                      VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, 0, 0, nullptr, 0, nullptr, 1,
                      &barrier_from_clear_to_present );

if( vkEndCommandBuffer( Vulkan.PresentQueueCmdBuffers[i] ) != VK_SUCCESS ) {
    printf( "Could not record command buffers!\n" );
    return false;
  }
}
return true;
30. Tutorial02.cpp, function RecordCommandBuffers()
This code is placed inside a loop. We are recording a command buffer for each swap chain image; that's why we
needed the number of images. The image handles are also needed here: we must specify them for the image memory barriers
and during image clearing. But recall that I said we can't use swap chain images until we are allowed to, that is, until we acquire
an image from the swap chain. That's true, but we aren't using them here. We are only preparing commands. The usage
itself is performed when we submit the operations (a command buffer) to the queue for execution. Here we are just telling
Vulkan: in the future, take this image and do this with it, then that, and after that something more. This way we can
prepare as much work as possible before we start the main rendering loop, and we avoid switches, ifs, jumps, and other
branches during the real rendering. This scenario won't be so simple in real life, but I hope the example is clear.
In the code above, we first prepare two image memory barriers. In the case of images, memory barriers can change
three different things: the memory access type, the image layout, and queue family ownership. From the tutorial's point of view,
only the layouts are interesting right now, but we need to properly set all fields. To set up a memory barrier we prepare
a variable of type VkImageMemoryBarrier, whose fields can be seen in the listing above.
Some notes are necessary regarding access masks and family indices. In this example, before the first barrier and after
the second barrier, only the presentation engine has access to the image. The presentation engine only reads from the
image (it doesn't modify it), so we set srcAccessMask in the first barrier and dstAccessMask in the second barrier to
VK_ACCESS_MEMORY_READ_BIT. This indicates that the memory associated with the image is read-only (image contents
are not modified before the first barrier or after the second barrier). In our command buffer we will only clear the image.
This operation belongs to the so-called "transfer" operations, which is why I've set VK_ACCESS_TRANSFER_WRITE_BIT
in the dstAccessMask field of the first barrier and in the srcAccessMask field of the second barrier.
I won't go into more detail about queue family indices, but if the queue used for graphics operations and the one used
for presentation are the same, srcQueueFamilyIndex and dstQueueFamilyIndex will be equal, and the hardware won't make any
modifications regarding image access from the queues. But remember that we have specified that only one queue at a
time will access/use the image. So if these queues are different, we inform the hardware here about the "ownership"
change: a different queue will now access the image. And this is all the information you need right now to properly set
up barriers.
We need to create two barriers: one that changes the layout from the "present source" (or undefined) to "transfer
dst". This barrier is used at the beginning of a command buffer, when the presentation engine has previously used the image
and now we want to use and modify it. The second barrier is used to change the layout back into "present source"
when we are done using the images and can give them back to the swap chain. This barrier is set at the end of a command
buffer.
Now we are ready to start recording our commands by calling the vkBeginCommandBuffer() function. We provide a
handle to a command buffer and an address of a variable of type VkCommandBufferBeginInfo, and we are ready to go.
Next we set up a barrier to change the image layout. We call the vkCmdPipelineBarrier() function, which takes quite a
few parameters, but in this example the only relevant ones are the first (the command buffer handle) and the last two: the
number of elements (barriers) in an array and a pointer to the first element of an array of VkImageMemoryBarrier
structures. Elements of this array describe images, their parts, and the types of transitions that
should occur. After the barrier we can safely perform any operations on the swap chain image that are compatible with
the layout we have transitioned the images to. The general layout is compatible with all operations, but at a (probable)
cost in performance.
In the example we are only clearing images, so we call the vkCmdClearColorImage() function. It takes a handle to a
command buffer, a handle to an image, the current layout of the image, a pointer to a variable with the clear color value, the number of
subresource ranges (the number of elements in the array from the last parameter), and an array of VkImageSubresourceRange
structures. Elements of the last array specify what parts of the image we want to clear (we don't have to
clear all mipmap levels or array layers of an image if we don't want to). A call to this function is sketched below.
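Here is a sketch of such a call for this tutorial's case (the clear_color and image_subresource_range variables are the ones prepared earlier):

vkCmdClearColorImage( Vulkan.PresentQueueCmdBuffers[i], swap_chain_images[i],
                      VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, &clear_color, 1,
                      &image_subresource_range );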
And at the end of our recording session we set up another barrier that transitions the image layout back to a “present
source” layout. It is the only layout that is compatible with the present operations performed by the presentation engine.
Now we can call the vkEndCommandBuffer() function to inform the driver that we have ended recording the command buffer. If
something went wrong during recording, we will be informed about it through the value returned by this function. If there
were errors, we cannot use the command buffer, and we'll need to record it once again. If everything is fine, we can later use
the command buffer to tell our device to perform the operations stored in it just by submitting the buffer to a queue.
Tutorial 2 Execution
In this example, if everything went fine, we should see a window with a light-orange color displayed inside it.
Cleaning Up
Now you know how to create a swap chain, display images in a window and perform simple operations that are
executed on a device. We have created command buffers, recorded them, and presented on the screen. Before we close
the application, we need to clean up the resources we were using. In this tutorial I have divided cleaning into two functions.
The first function clears (destroys) only those resources that should be recreated when the swap chain is recreated (that
is, after the size of an application’s window has changed).
if( Vulkan.Device != VK_NULL_HANDLE ) {
  vkDeviceWaitIdle( Vulkan.Device );
  // ...the rest of the function frees the command buffers and destroys
  // the command pool, as described below...
}
First we must be sure that no operations are executed on the device’s queues (we can’t destroy a resource that is
used by the currently processed commands). We can check it by calling vkDeviceWaitIdle() function. It will block until all
operations are finished.
Next we free all the allocated command buffers. In fact, this operation is not necessary here: destroying a command
pool implicitly frees all command buffers allocated from it. But I want to show you how to explicitly free
command buffers. Next we destroy the command pool itself.
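A sketch of this explicit cleanup; the Vulkan.PresentQueueCmdPool member name is an assumption following this tutorial's naming convention:

if( Vulkan.PresentQueueCmdBuffers.size() > 0 ) {
  // Freeing the buffers explicitly, although destroying the pool below
  // would free them implicitly anyway.
  vkFreeCommandBuffers( Vulkan.Device, Vulkan.PresentQueueCmdPool,
                        static_cast<uint32_t>(Vulkan.PresentQueueCmdBuffers.size()),
                        &Vulkan.PresentQueueCmdBuffers[0] );
  Vulkan.PresentQueueCmdBuffers.clear();
}
if( Vulkan.PresentQueueCmdPool != VK_NULL_HANDLE ) {
  vkDestroyCommandPool( Vulkan.Device, Vulkan.PresentQueueCmdPool, nullptr );
  Vulkan.PresentQueueCmdPool = VK_NULL_HANDLE;
}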
Here is the code that is responsible for destroying all of the resources created in this lesson:
Clear();
if( VulkanLibrary ) {
#if defined(VK_USE_PLATFORM_WIN32_KHR)
FreeLibrary( VulkanLibrary );
#elif defined(VK_USE_PLATFORM_XCB_KHR) || defined(VK_USE_PLATFORM_XLIB_KHR)
dlclose( VulkanLibrary );
#endif
}
32. Tutorial02.cpp, destructor
First we destroy the semaphores (remember they cannot be destroyed when they are in use, that is, when a queue is
waiting on a given semaphore). After that we destroy the swap chain. Images that were created along with it are
automatically destroyed, and we don't need to do it by ourselves (we are not even allowed to). Next the device is
destroyed. We also need to destroy the surface that represents our application's window. At the end, the Vulkan instance
is destroyed and the graphics driver's dynamic library is unloaded. Before we perform each step, we also check
whether the given resource was properly created. We can't destroy resources that weren't properly created.
Conclusion
In this tutorial you learned how to display on the screen anything that was created with the Vulkan API. To briefly review
the steps: First we enabled the proper instance-level extensions. Next we created the application window's Vulkan
representation, called a surface. Then we chose a device with a queue family that supported presentation and created a
logical device (don't forget about enabling device-level extensions!).
After that we created a swap chain. To do that we first acquired a set of parameters describing our surface and then
chose values for proper swap chain creation. Those values had to fit into a surface’s supported constraints.
To draw something on the screen we learned how to create and record command buffers, which also included image
layout transitions, for which image memory barriers (pipeline barriers) were used. We cleared images so we could see the
selected color being displayed on the screen.
And we also learned how to present a given image on the screen, which included acquiring an image, submitting a
command buffer, and the presentation process itself.
Notices
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability,
fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course
of dealing, or usage in trade.
This document contains information on products, services and/or processes in development. All information provided
here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule,
specifications and roadmaps.
The products and services described may contain defects or errors known as errata which may cause deviations from
published specifications. Current characterized errata are available on request.
Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-
548-4725 or by visiting www.intel.com/design/literature.htm.
This sample source code is released under the Intel Sample Source Code License Agreement.
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
The graphics pipeline and drawing in general require lots of preparations in Vulkan (in the form of filling many
structures with even more different fields). There are potentially many places where we can make mistakes, and in Vulkan,
even simple mistakes may lead to the application not working as expected, displaying just a blank screen, and leaving us
wondering what went wrong. In such situations validation layers can help us a lot. But I didn’t want to dive into too many
different aspects and the specifics of the Vulkan API. So I prepared the code to be as small and as simple as possible.
This led me to create an application that is working properly and displays a simple triangle the way I expected, but it
also uses mechanics that are not recommended, not flexible, and also probably not too efficient (though correct). I don’t
want to teach solutions that aren’t recommended, but here it simplifies the tutorial quite considerably and allows us to
focus only on the minimal required set of API usage. I will point out the “disputable” functionality as soon as we get to it.
And in the next tutorial, I will show the recommended way of drawing triangles.
To draw our first simple triangle, we need to create a render pass, a framebuffer, and a graphics pipeline. Command
buffers are of course also needed, but we already know something about them. We will create simple GLSL shaders and
compile them into Khronos's SPIR-V language, the only (at this time) form of shaders that Vulkan (officially) understands.
If nothing displays on your computer's screen, try to simplify the code as much as possible or even go back to the
second tutorial. Check whether the command buffer that just clears the image behaves as expected, and that the color the image
was cleared to is properly displayed on the screen. If so, modify the code and add the parts from this tutorial. Check that
every return value is VK_SUCCESS. If these ideas don't help, wait for the tutorial about validation layers.
I've also added a separate set of files for some utility functions. Here we will be reading SPIR-V shaders from binary
files, so I've added a function for loading the contents of a binary file. It can be found in the Tools.cpp and Tools.h files.
What is a render pass? A general picture can be drawn from the "logical" render pass found in many well-known
rendering techniques like deferred shading. This technique consists of many subpasses. The first subpass draws the
geometry with shaders that fill the G-Buffer: it stores diffuse color in one texture, normal vectors in another, shininess in
another, and depth (position) in yet another. Next, for each light source, drawing is performed that reads some of the data
(normal vectors, shininess, depth/position), calculates lighting, and stores it in another texture. A final pass aggregates the
lighting data with the diffuse color. This is a (very rough) explanation of deferred shading, but it illustrates the render pass as a set
of data required to perform some drawing operations: storing data in textures and reading data from other textures.
In Vulkan, a render pass represents (or describes) a set of framebuffer attachments (images) required for drawing
operations and a collection of subpasses that drawing operations will be ordered into. It is a construct that collects all
color, depth, and stencil attachments and the operations modifying them, so that the driver does not have to deduce
this information by itself, which may give substantial optimization opportunities on some GPUs. A subpass consists of
drawing operations that use (more or less) the same attachments. Each of these drawing operations may read from some
input attachments and render data into some other (color, depth, stencil) attachments. A render pass also describes the
dependencies between these attachments: in one subpass we perform rendering into a texture, but in another this
texture will be used as a source of data (that is, it will be sampled from). All this data helps the graphics hardware optimize
drawing operations.
To create a render pass in Vulkan, we call the vkCreateRenderPass() function, which requires a pointer to a structure
describing all the attachments involved in rendering and all the subpasses forming the render pass. As usual, the more
attachments and subpasses we use, the more array elements containing properly filled structures we need. In our simple
example, we will be drawing only into a single texture (color attachment) with just a single subpass.
To create a render pass, first we prepare an array with elements describing each attachment, regardless of the type
of attachment and how it will be used inside the render pass. Each array element is of type VkAttachmentDescription, which
contains the following fields (a sketch of such an array is shown after the list):
flags – Describes additional properties of an attachment. Currently, only an aliasing flag is available, which
informs the driver that the attachment shares the same physical memory with another attachment; it is not
the case here so we set this parameter to zero.
format – Format of an image used for the attachment; here we are rendering directly into a swap chain so we
need to take its format.
samples – Number of samples of the image; we are not using any multisampling here so we just use one
sample.
loadOp – Specifies what to do with the image’s contents at the beginning of a render pass, whether we want
them to be cleared, preserved, or we don’t care about them (as we will overwrite them all). Here we want to
clear the image to the specified value. This parameter also refers to depth part of depth/stencil images.
storeOp – Informs the driver what to do with the image’s contents after the render pass (after a subpass in
which the image was used for the last time). Here we want the contents of the image to be preserved after
the render pass as we intend to display them on screen. This parameter also refers to the depth part of
depth/stencil images.
stencilLoadOp – The same as loadOp but for the stencil part of depth/stencil images; for color attachments it
is ignored.
stencilStoreOp – The same as storeOp but for the stencil part of depth/stencil images; for color attachments
this parameter is ignored.
initialLayout – The layout the given attachment will have when the render pass starts (what the layout image
is provided with by the application).
finalLayout – The layout the driver will automatically transition the given image into at the end of a render
pass.
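For our single color attachment, such an array may look as follows (a sketch; the way the swap chain format is queried here is an assumption, and any means of obtaining the format chosen during swap chain creation will do):

VkAttachmentDescription attachment_descriptions[] = {
  {
    0,                                 // VkAttachmentDescriptionFlags  flags
    GetSwapChain().Format,             // VkFormat                      format
    VK_SAMPLE_COUNT_1_BIT,             // VkSampleCountFlagBits         samples
    VK_ATTACHMENT_LOAD_OP_CLEAR,       // VkAttachmentLoadOp            loadOp
    VK_ATTACHMENT_STORE_OP_STORE,      // VkAttachmentStoreOp           storeOp
    VK_ATTACHMENT_LOAD_OP_DONT_CARE,   // VkAttachmentLoadOp            stencilLoadOp
    VK_ATTACHMENT_STORE_OP_DONT_CARE,  // VkAttachmentStoreOp           stencilStoreOp
    VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,   // VkImageLayout                 initialLayout
    VK_IMAGE_LAYOUT_PRESENT_SRC_KHR    // VkImageLayout                 finalLayout
  }
};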
Some additional information is required for load and store operations and initial and final layouts.
Load op refers to the attachment's contents at the beginning of a render pass. This operation describes what the
graphics hardware should do with the attachment: clear it, operate on its existing contents (leave them untouched),
or not care about them because the application intends to overwrite them. This gives the hardware an
opportunity to optimize memory operations. For example, if we intend to overwrite all of the contents, the hardware
won't bother with them and, if it is faster, may allocate totally new memory for the attachment.
Store op, as the name suggests, is used at the end of a render pass and informs the hardware whether we want to use
the contents of the attachment after the render pass or if we don’t care about it and the contents may be discarded. In
some scenarios (when contents are discarded) this creates the ability for the hardware to create an image in temporary,
fast memory as the image will “live” only during the render pass and the implementations may save some memory
bandwidth avoiding writing back data that is not needed anymore.
When an attachment has a depth format (and potentially also a stencil component) load and store ops refer only to
the depth component. If a stencil is present, stencil values are treated the way stencil load and store ops describe. For
color attachments, stencil ops are not relevant.
Layout, as I described in the swap chain tutorial, is an internal memory arrangement of an image. Image data may be
organized in such a way that neighboring "image pixels" are also neighbors in memory, which can increase cache hits
(faster memory reading) when the image is used as a source of data (that is, during texture sampling). But caching is not
necessary when the image is used as a target for drawing operations, and the memory for that image may be organized
in a totally different way. An image may have a linear layout (which gives the CPU the ability to read or populate the image's
memory contents) or an optimal layout (which is optimized for performance but is also hardware/vendor dependent). So some
hardware may have special memory organization for some types of operations; other hardware may be operations-
agnostic. Some memory layouts may be better suited for some intended image "usages"; conversely,
some usages may require specific memory layouts. There is also a general layout that is compatible with all types of
operations. But from the performance point of view, it is always best to set the layout appropriate for the intended image
usage, and it is the application's responsibility to inform the driver about transitions.
Image layouts may be changed using image memory barriers. We did this in the swap chain tutorial when we first
changed the layout from the presentation source (image was used by the presentation engine) to transfer destination (we
wanted to clear the image with a given color). But layouts, apart from image memory barriers, may also be changed
automatically by the hardware inside a render pass. If we specify a different initial layout, subpass layouts (described
later), and final layout, the hardware does the transition automatically at the appropriate time.
Initial layout informs the hardware about the layout the application “provides” (or “leaves”) the given attachment
with. This is the layout the image starts with at the beginning of a render pass (in our example we acquire the image from
the presentation engine so the image has a “presentation source” layout set). Each subpass of a render pass may use a
different layout, and the transition will be done automatically by the hardware between subpasses. The final layout is the
layout the given attachment will be transitioned into (automatically) at the end of a render pass (after a render pass is
finished).
This information must be prepared for each attachment that will be used in a render pass. When graphics hardware
receives this information a priori, it may optimize operations and memory during the render pass to achieve the best
possible performance.
Subpass Description
VkAttachmentReference color_attachment_references[] = {
  {
    0,                                        // uint32_t       attachment
    VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL  // VkImageLayout  layout
  }
};

VkSubpassDescription subpass_descriptions[] = {
  {
    0,                                // VkSubpassDescriptionFlags      flags
    VK_PIPELINE_BIND_POINT_GRAPHICS,  // VkPipelineBindPoint            pipelineBindPoint
    0,                                // uint32_t                       inputAttachmentCount
    nullptr,                          // const VkAttachmentReference   *pInputAttachments
    1,                                // uint32_t                       colorAttachmentCount
    color_attachment_references,      // const VkAttachmentReference   *pColorAttachments
    nullptr,                          // const VkAttachmentReference   *pResolveAttachments
    nullptr,                          // const VkAttachmentReference   *pDepthStencilAttachment
    0,                                // uint32_t                       preserveAttachmentCount
    nullptr                           // const uint32_t                *pPreserveAttachments
  }
};
2. Tutorial03.cpp, function CreateRenderPass()
Next we specify the description of each subpass our render pass will include. This is done using the VkSubpassDescription
structure, whose fields can be seen in the listing above.
This structure contains references (indices) into the attachment_descriptions array of VkRenderPassCreateInfo. When
we create a render pass we must provide a description of all attachments used during a render pass. We’ve prepared this
description earlier in “Render pass attachment description” when we created the attachment_descriptions array. Right
now it contains only one element, but in more advanced scenarios there will be multiple attachments. So this “general”
collection of all render pass attachments is used as a reference point. In the subpass description, when we fill
pColorAttachments or pDepthStencilAttachment members, we provide indices into this very “general” collection, like this:
take the first attachment from all render pass attachments and use it as a color attachment. The second attachment from
that array will be used for depth data.
There is a separation between a whole render pass and its subpasses because each subpass may use multiple
attachments in a different way, that is, in one subpass we are rendering into one color attachment but in the next subpass
we are reading from this attachment. In this way, we can prepare a list of all attachments used in the whole render pass,
and at the same time we can specify how each attachment will be used in each subpass. And as each subpass may use a
given attachment in its own way, we must also specify each image’s layout for each subpass.
So before we can specify a description of all subpasses (an array with elements of type VkSubpassDescription) we
must create references for each attachment used in each subpass. And this is what the color_attachment_references
variable was created for. When I write a tutorial for rendering into a texture, this usage will be more apparent.
return true;
3. Tutorial03.cpp, function CreateRenderPass()
We start by filling the VkRenderPassCreateInfo structure; a sketch of this step is shown below.
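For our single-attachment, single-subpass case, it may look along these lines (a sketch; the Vulkan.RenderPass member name follows this tutorial's conventions and is an assumption):

VkRenderPassCreateInfo render_pass_create_info = {
  VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO,  // VkStructureType                 sType
  nullptr,                                    // const void                     *pNext
  0,                                          // VkRenderPassCreateFlags         flags
  1,                                          // uint32_t                        attachmentCount
  attachment_descriptions,                    // const VkAttachmentDescription  *pAttachments
  1,                                          // uint32_t                        subpassCount
  subpass_descriptions,                       // const VkSubpassDescription     *pSubpasses
  0,                                          // uint32_t                        dependencyCount
  nullptr                                     // const VkSubpassDependency      *pDependencies
};

if( vkCreateRenderPass( GetDevice(), &render_pass_create_info, nullptr,
                        &Vulkan.RenderPass ) != VK_SUCCESS ) {
  printf( "Could not create render pass!\n" );
  return false;
}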
Dependencies describe how different parts of the graphics pipeline use memory resources. Each subpass may use
resources in a different way, and the layout of a resource does not by itself fully define that usage. Some subpasses may
render into images or store data through shader images. Others may not use images at all, or may read from them at
different pipeline stages (that is, vertex or fragment).
This information helps the driver optimize automatic layout transitions and, more generally, optimize barriers
between subpasses. When we are writing into images only in a vertex shader there is no point waiting until the fragment
shader executes (of course in terms of used images). After all the vertex operations are done, images may immediately
change their layouts and memory access type, and even some parts of graphics hardware may start executing the next
operations (that are referencing or reading the given images) without the need to wait for the rest of the commands from
the given subpass to finish. For now, just remember that dependencies are important from a performance point of view.
So now that we have prepared all the information required to create a render pass, we can safely call the
vkCreateRenderPass() function.
Creating a Framebuffer
We have created a render pass. It describes all attachments and all subpasses used during the render pass. But this
description is quite abstract. We have specified formats of all attachments (just one image in this example) and described
how attachments will be used by each subpass (also just one here). But we didn’t specify WHAT attachments we will be
using or, in other words, what images will be used as these attachments. This is done through a framebuffer.
A framebuffer describes specific images that the render pass operates on. In OpenGL*, a framebuffer is a set of
textures (attachments) we are rendering into. In Vulkan, this term is much broader. It describes all the textures
(attachments) used during the render pass, not only the images we are rendering into (color and depth/stencil
attachments) but also images used as a source of data (input attachments).
This separation of render pass and framebuffer gives us some additional flexibility. We can use the given render pass
with different framebuffers and a given framebuffer with different render passes, if they are compatible, meaning that
they operate in a similar fashion on images of similar types and usages.
Before we can create a framebuffer, we must create image views for each image used as a framebuffer and render
pass attachment. In Vulkan, not only in the case of framebuffers, but in general, we don’t operate on images themselves.
Images are not accessed directly. For this purpose, image views are used. Image views represent images, they “wrap”
images and provide additional (meta)data for them.
As you can see here, we acquire handles to all swap chain images, and we reference them inside a loop. This
way we fill the structure required for image view creation, which we pass to the vkCreateImageView() function (a sketch of
this step is shown below). We do this for each image that was created along with the swap chain.
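For one swap chain image, this step may look along these lines (a sketch; where the created image view is stored, here an image_views vector, is my assumption):

VkImageViewCreateInfo image_view_create_info = {
  VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,  // VkStructureType           sType
  nullptr,                                   // const void               *pNext
  0,                                         // VkImageViewCreateFlags    flags
  swap_chain_images[i],                      // VkImage                   image
  VK_IMAGE_VIEW_TYPE_2D,                     // VkImageViewType           viewType
  GetSwapChain().Format,                     // VkFormat                  format
  {                                          // VkComponentMapping        components
    VK_COMPONENT_SWIZZLE_IDENTITY,           // VkComponentSwizzle        r
    VK_COMPONENT_SWIZZLE_IDENTITY,           // VkComponentSwizzle        g
    VK_COMPONENT_SWIZZLE_IDENTITY,           // VkComponentSwizzle        b
    VK_COMPONENT_SWIZZLE_IDENTITY            // VkComponentSwizzle        a
  },
  {                                          // VkImageSubresourceRange   subresourceRange
    VK_IMAGE_ASPECT_COLOR_BIT,               // VkImageAspectFlags        aspectMask
    0,                                       // uint32_t                  baseMipLevel
    1,                                       // uint32_t                  levelCount
    0,                                       // uint32_t                  baseArrayLayer
    1                                        // uint32_t                  layerCount
  }
};

if( vkCreateImageView( GetDevice(), &image_view_create_info, nullptr,
                       &image_views[i] ) != VK_SUCCESS ) {
  return false;
}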
The framebuffer specifies what images are used as the attachments on which the render pass operates. We can say that
it translates an image (image view) into a given attachment. The number of images specified for a framebuffer must be the
same as the number of attachments in the render pass for which we are creating the framebuffer. Also, each element of the
pAttachments array corresponds directly to an attachment in the render pass description structure. Render passes and framebuffers
are closely connected, and that's why we must also specify a render pass during framebuffer creation. But we may use a
framebuffer not only with the specified render pass but also with all render passes that are compatible with it.
Compatible render passes, in general, must have the same number of attachments, and corresponding
attachments must have the same format and number of samples. But image layouts (initial, final, and for each subpass)
may differ and do not affect render pass compatibility.
After we have finished filling the VkFramebufferCreateInfo structure, we call the vkCreateFramebuffer()
function.
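A sketch of the structure for a 300 x 300 framebuffer built around one of the image views created above (the dimensions match the viewport used later in this tutorial; the image_views variable is the assumption from the previous sketch):

VkFramebufferCreateInfo framebuffer_create_info = {
  VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO,  // VkStructureType            sType
  nullptr,                                    // const void                *pNext
  0,                                          // VkFramebufferCreateFlags   flags
  Vulkan.RenderPass,                          // VkRenderPass               renderPass
  1,                                          // uint32_t                   attachmentCount
  &image_views[i],                            // const VkImageView         *pAttachments
  300,                                        // uint32_t                   width
  300,                                        // uint32_t                   height
  1                                           // uint32_t                   layers
};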
The above code executes in a loop. A framebuffer references image views. Here an image view is created for each
swap chain image, so for each swap chain image and its view, we create a framebuffer. We do this in order
to simplify the code called in the rendering loop. In a normal, real-life scenario we probably wouldn't create a framebuffer
for each swap chain image. I assume that a better solution would be to render into a single image (texture) and afterwards
use command buffers that copy the rendering results from that image into a given swap chain image. This way we would
have only three simple command buffers connected with the swap chain. All other rendering commands would be
independent of the swap chain, making it easier to maintain.
In OpenGL there are multiple programmable stages (vertex, tessellation, fragment shaders, and so on) and some fixed-
function stages (rasterizer, depth test, blending, and so on). In Vulkan, the situation is similar. There are similar (if not
identical) stages. But the whole pipeline's state is gathered in one monolithic object. OpenGL allows us to change the state
that influences rendering operations anytime we want; we can change parameters for each stage (mostly) independently.
We can set up shader programs, depth tests, blending, and whatever state we want, and then we can render some objects.
Next we can change just some small part of the state and render another object. In Vulkan, such operations can't be done
(we say that pipelines are "immutable"). We must prepare the whole state, set up the parameters for the pipeline stages, and
group them in a pipeline object. At the beginning this was one of the most startling pieces of information for me. I'm not
able to change a shader program anytime I want? Why?
The easiest and most valid explanation is the performance implications of such state changes. Changing
just one single state of the whole pipeline may cause the graphics hardware to perform many background operations like
state and error checking. Different hardware vendors may implement (and usually do implement) such functionality
differently. This may cause applications to perform differently (meaning unpredictably, performance-wise) when executed
on different graphics hardware. So the ability to change anything at any time is convenient for developers. But,
unfortunately, it is not so convenient for the hardware.
That's why in Vulkan the state of the whole pipeline is gathered in one single object. All the relevant state and error
checking is performed when the pipeline object is created. When there are problems (like different parts of the pipeline being
set up in an incompatible way), pipeline object creation fails. But we know that upfront. The driver doesn't have to guess
for us and do whatever it can to properly use a broken pipeline; it can immediately tell us about the problem. And
during real usage, in performance-critical parts of the application, everything is already set up correctly and can be used
as is.
The downside of this methodology is that we have to create multiple pipeline objects, multiple variations of pipeline
objects, when we are drawing many objects in different ways (some opaque, some semi-transparent, some with the depth
test enabled, others without). Unfortunately, even different shaders make us create different pipeline objects. If we want
to draw objects using different shaders, we also have to create multiple pipeline objects, one for each combination of
shader programs. Shaders are also connected with the whole pipeline state. They use different resources (like textures
and buffers), render into different color attachments, and read from different attachments (possibly ones that were rendered
into before). These connections must also be initialized, prepared, and set up correctly. We know what we want to do;
the driver does not. So it is better and far more logical that we do it, not the driver. In general this approach makes sense.
In OpenGL, we write shaders in GLSL. They are compiled and then linked into shader programs directly in our
application. We can use or stop using a shader program anytime we want in our application.
Vulkan, on the other hand, accepts only a binary representation of shaders in an intermediate language called SPIR-V.
We can't provide GLSL code like we did in OpenGL. But there is an official, separate compiler that can transform shaders
written in GLSL into binary SPIR-V, and we have to use it offline. After we prepare the SPIR-V assembly, we
can create a shader module from it. Such modules are then composed into an array of VkPipelineShaderStageCreateInfo
structures, which are used, among other parameters, to create a graphics pipeline.
Here’s the code that creates a shader module from a specified file that contains a binary SPIR-V.
const std::vector<char> code = Tools::GetBinaryFileContents( filename );
if( code.size() == 0 ) {
  return Tools::AutoDeleter<VkShaderModule, PFN_vkDestroyShaderModule>();
}

VkShaderModuleCreateInfo shader_module_create_info = {
  VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO,  // VkStructureType            sType
  nullptr,                                      // const void                *pNext
  0,                                            // VkShaderModuleCreateFlags  flags
  code.size(),                                  // size_t                     codeSize
  reinterpret_cast<const uint32_t*>(&code[0])   // const uint32_t            *pCode
};

VkShaderModule shader_module;
if( vkCreateShaderModule( GetDevice(), &shader_module_create_info, nullptr,
                          &shader_module ) != VK_SUCCESS ) {
  printf( "Could not create shader module from a %s file!\n", filename );
  return Tools::AutoDeleter<VkShaderModule, PFN_vkDestroyShaderModule>();
}

// Wrap the module so it is destroyed automatically when it goes out of scope.
return Tools::AutoDeleter<VkShaderModule, PFN_vkDestroyShaderModule>(
  shader_module, vkDestroyShaderModule, GetDevice() );
To acquire the contents of the file, I have prepared a simple utility function GetBinaryFileContents() that reads the
entire contents of a specified file. It returns the content in a vector of chars.
After we prepare a structure, we can call the vkCreateShaderModule() function and check whether everything went
fine.
The AutoDeleter<> class from the Tools namespace is a helper class that wraps a given Vulkan object handle and takes a
function that is called to delete that object. This class is similar to smart pointers, which delete the allocated memory
when the object (the smart pointer) goes out of scope. The AutoDeleter<> class takes the handle of a given object and deletes
it with the provided function when the object of this class's type goes out of scope.
template<class T, class F>
class AutoDeleter {
public:
  AutoDeleter() :
    Object( VK_NULL_HANDLE ),
    Deleter( nullptr ),
    Device( VK_NULL_HANDLE ) {
  }

  // Takes ownership of a Vulkan object handle along with the function
  // (and the device) needed to destroy it later.
  AutoDeleter( T object, F deleter, VkDevice device ) :
    Object( object ),
    Deleter( deleter ),
    Device( device ) {
  }

  // Moving transfers ownership; the source is left empty so the wrapped
  // object is destroyed exactly once.
  AutoDeleter( AutoDeleter&& other ) :
    Object( other.Object ),
    Deleter( other.Deleter ),
    Device( other.Device ) {
    other.Object = VK_NULL_HANDLE;
  }

  ~AutoDeleter() {
    if( (Object != VK_NULL_HANDLE) && (Deleter != nullptr) && (Device != VK_NULL_HANDLE) ) {
      Deleter( Device, Object, nullptr );
    }
  }

  AutoDeleter& operator=( AutoDeleter&& other ) {
    if( this != &other ) {
      Object = other.Object;
      Deleter = other.Deleter;
      Device = other.Device;
      other.Object = VK_NULL_HANDLE;
    }
    return *this;
  }

  T Get() {
    return Object;
  }

private:
  // Copying is forbidden; only moves are allowed.
  AutoDeleter( const AutoDeleter& );
  AutoDeleter& operator=( const AutoDeleter& );

  T Object;
  F Deleter;
  VkDevice Device;
};
7. Tools.h, -
Why so much effort for one simple object? Shader modules are one of the objects required to create the graphics
pipeline. But after the pipeline is created, we don’t need these shader modules anymore. Sometimes it is convenient to
keep them as we may need to create additional, similar pipelines. But in this example they may be safely destroyed after
we create a graphics pipeline. Shader modules are destroyed by calling the vkDestroyShaderModule() function. But in the
example, we would need to call this function in many places: inside multiple “ifs” and at the end of the whole function.
Because I don’t want to remember where I need to call this function and, at the same time, I don’t want any memory leaks
to occur, I have prepared this simple class just for convenience. Now, I don’t have to remember to delete the created
shader module because it will be deleted automatically.
std::vector<VkPipelineShaderStageCreateInfo> shader_stage_create_infos = {
  // Vertex shader
  {
    VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,  // VkStructureType                    sType
    nullptr,                                              // const void                        *pNext
    0,                                                    // VkPipelineShaderStageCreateFlags   flags
    VK_SHADER_STAGE_VERTEX_BIT,                           // VkShaderStageFlagBits              stage
    vertex_shader_module.Get(),                           // VkShaderModule                     module
    "main",                                               // const char                        *pName
    nullptr                                               // const VkSpecializationInfo        *pSpecializationInfo
  },
  // Fragment shader
  {
    VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,  // VkStructureType                    sType
    nullptr,                                              // const void                        *pNext
    0,                                                    // VkPipelineShaderStageCreateFlags   flags
    VK_SHADER_STAGE_FRAGMENT_BIT,                         // VkShaderStageFlagBits              stage
    fragment_shader_module.Get(),                         // VkShaderModule                     module
    "main",                                               // const char                        *pName
    nullptr                                               // const VkSpecializationInfo        *pSpecializationInfo
  }
};
8. Tutorial03.cpp, function CreatePipeline()
At the beginning we are creating two shader modules for vertex and fragment stages. They are created with the
function presented earlier. When any error occurs and we return from the CreatePipeline() function, any created module
is deleted automatically by a wrapper class with a provided deleter function.
The code for the shader modules is read from files that contain the binary SPIR-V assembly. These files are generated
with an application called “glslangValidator”. This is a tool distributed officially with the Vulkan SDK and is designed to
validate GLSL shaders. But “glslangValidator” also has the capability to compile or rather transform GLSL shaders into SPIR-
V binary files. A full explanation of the command line for its usage can be found at the official SDK site. I’ve used the
following commands to generate SPIR-V shaders for this tutorial:
glslangValidator.exe -V -H shader.vert > vert.spv.txt
glslangValidator.exe -V -H shader.frag > frag.spv.txt
“glslangValidator” takes a specified file and generates SPIR-V file from it. The type of shader stage is automatically
detected by the input file’s extension (“.vert” for vertex shaders, “.geom” for geometry shaders, and so on). The name of
the generated file can be specified, but by default it takes a form “<stage>.spv”. So in our example “vert.spv” and
“frag.spv” files will be generated.
SPIR-V files have a binary format, so they may be hard to read and analyze, but not impossible. When the "-H"
option is used, "glslangValidator" outputs SPIR-V in a form that can be more easily read. This form is printed on standard
output, and that's why I'm using the "> *.spv.txt" redirection operator.
Here are the contents of a “shader.vert” file from which SPIR-V assembly was generated for the vertex stage:
#version 400
void main() {
vec2 pos[3] = vec2[3]( vec2(-0.7, 0.7), vec2(0.7, 0.7), vec2(0.0, -0.7) );
gl_Position = vec4( pos[gl_VertexIndex], 0.0, 1.0 );
}
9. shader.vert, -
As you can see I have hardcoded the positions of all vertices used to render the triangle. They are indexed using the
Vulkan-specific “gl_VertexIndex” built-in variable. In the simplest scenario, when using non-indexed drawing commands
(which takes place here) this value starts from the value of the “firstVertex” parameter of a drawing command (zero in
the provided example).
This is the disputable part I wrote about earlier—this approach is acceptable and valid but not quite convenient to
maintain and also allows us to skip some of the “structure filling” needed to create the graphics pipeline. I’ve chosen it in
order to shorten and simplify this tutorial as much as possible. In the next tutorial, I will present a more typical way of
drawing any number of vertices, similar to using vertex arrays and indices in OpenGL.
Below is the source code of a fragment shader from the "shader.frag" file that was used to generate the SPIR-V
assembly for the fragment stage:
#version 400

layout(location = 0) out vec4 out_Color;

void main() {
  out_Color = vec4( 0.0, 0.4, 1.0, 1.0 );
}
10. shader.frag, -
In Vulkan's shaders (when transforming from GLSL to SPIR-V) layout qualifiers are required. Here we specify into which
output (color) attachment we want to store the color values generated by the fragment shader. Because we are using only
one attachment, we must specify the first available location (zero).
Now that you know how to prepare shaders for applications using Vulkan, we can move on to the next step. After we
have created two shader modules, we check whether these operations succeeded. If they did we can start preparing a
description of all shader stages that will constitute our graphics pipeline.
For each enabled shader stage we need to prepare an instance of VkPipelineShaderStageCreateInfo structure. Arrays
of these structures along with the number of its elements are together used in a graphics pipeline create info structure
(provided to the function that creates the graphics pipeline). VkPipelineShaderStageCreateInfo structure has the following
fields:
sType – Type of structure that we are preparing, which in this case must be equal to
VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO.
pNext – Pointer reserved for extensions.
flags – Parameter reserved for future use.
stage – Type of shader stage we are describing (like vertex, tessellation control, and so on).
module – Handle to a shader module that contains the shader for a given stage.
pName – Name of the entry point of the provided shader.
pSpecializationInfo – Pointer to a VkSpecializationInfo structure, which we will leave for now and set to null.
When we are creating a graphics pipeline, we don't create too many (Vulkan) objects. Most of the data is provided in
the form of just such structures.
Preparing Description of a Vertex Input
Now we must provide a description of the input data used for drawing. This is similar to OpenGL's vertex data:
attributes, number of components, buffers from which to take the data, the data's stride, or the step rate. In Vulkan this data is of
course prepared in a different way, but in general the meaning is the same. Fortunately, because the vertex
data is hardcoded into the vertex shader in this tutorial, we can almost entirely skip this step and fill the
VkPipelineVertexInputStateCreateInfo structure with nulls and zeros:
VkPipelineVertexInputStateCreateInfo vertex_input_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,  // VkStructureType                           sType
  nullptr,                                                    // const void                               *pNext
  0,                                                          // VkPipelineVertexInputStateCreateFlags     flags
  0,                                                          // uint32_t                                  vertexBindingDescriptionCount
  nullptr,                                                    // const VkVertexInputBindingDescription    *pVertexBindingDescriptions
  0,                                                          // uint32_t                                  vertexAttributeDescriptionCount
  nullptr                                                     // const VkVertexInputAttributeDescription  *pVertexAttributeDescriptions
};
11. Tutorial03.cpp, function CreatePipeline()
For clarity, the members of the VkPipelineVertexInputStateCreateInfo structure can be seen in the listing above.
Next we must describe how vertices are assembled into primitives. We do that through the
VkPipelineInputAssemblyStateCreateInfo structure; in this tutorial we want to draw a list of independent triangles.
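A sketch of this structure for our case (the variable name matches the one referenced later in the graphics pipeline create info):

VkPipelineInputAssemblyStateCreateInfo input_assembly_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO,  // VkStructureType                          sType
  nullptr,                                                      // const void                              *pNext
  0,                                                            // VkPipelineInputAssemblyStateCreateFlags  flags
  VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST,                          // VkPrimitiveTopology                      topology
  VK_FALSE                                                      // VkBool32                                 primitiveRestartEnable
};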
VkRect2D scissor = {
  {       // VkOffset2D  offset
    0,    // int32_t     x
    0     // int32_t     y
  },
  {       // VkExtent2D  extent
    300,  // uint32_t    width
    300   // uint32_t    height
  }
};

VkPipelineViewportStateCreateInfo viewport_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO,  // VkStructureType                     sType
  nullptr,                                                // const void                         *pNext
  0,                                                      // VkPipelineViewportStateCreateFlags  flags
  1,                                                      // uint32_t                            viewportCount
  &viewport,                                              // const VkViewport                   *pViewports
  1,                                                      // uint32_t                            scissorCount
  &scissor                                                // const VkRect2D                     *pScissors
};
13. Tutorial03.cpp, function CreatePipeline()
In this example, the usage is simple: we just set the viewport coordinates to some predefined values. I don’t check the
size of the swap chain image we are rendering into. But remember that in real-life production applications this has to be
done because the specification states that dimensions of the viewport cannot exceed the dimensions of the attachments
that we are rendering into.
To specify the viewport's parameters, we fill the VkViewport structure; a sketch with the values used in this tutorial is shown after the notes below.
When specifying viewport coordinates, remember that the origin is different than in OpenGL. Here we specify the
upper-left corner of the viewport (not the lower left).
Also worth noting is that the minDepth and maxDepth values must be between 0.0 and 1.0 (inclusive) but maxDepth
can be lower than minDepth. This will cause the depth to be calculated in “reverse.”
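Here is the viewport sketch referenced above, matching the 300 x 300 scissor rectangle from the listing (this is the variable pointed to by pViewports):

VkViewport viewport = {
  0.0f,    // float  x
  0.0f,    // float  y
  300.0f,  // float  width
  300.0f,  // float  height
  0.0f,    // float  minDepth
  1.0f     // float  maxDepth
};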
Next we must specify the parameters for the scissor test. The scissor test, similarly to OpenGL, restricts generation of
fragments only to the specified rectangular area. But in Vulkan, the scissor test is always enabled and can’t be turned off.
We can just provide the values identical to the ones provided for viewport. Try changing these values and see how it
influences the generated image.
The scissor test doesn't have a dedicated structure. To provide data for it we fill the VkRect2D structure, which contains
two members: a VkOffset2D holding the x and y coordinates of the rectangle's corner, and a VkExtent2D holding its width
and height, as can be seen in the listing above.
In general, the meaning of the data we provide for the scissor test through the VkRect2D structure is similar to the
data prepared for the viewport.
After we have finished preparing the data for the viewport and the scissor test, we can finally fill the structure that is used
during pipeline creation. The structure is called VkPipelineViewportStateCreateInfo, and its fields can be seen in the listing above.
Remember that the viewportCount and scissorCount parameters must be equal. We are also allowed to specify more
viewports, but then the multiViewport feature must also be enabled.
Preparing the Rasterization State’s Description
The next part of graphics pipeline creation applies to the rasterization state. We must specify how polygons are
going to be rasterized (changed into fragments), which means whether we want fragments to be generated for whole
polygons or just their edges (polygon mode), and whether we want to see the front or back side or maybe both sides of the
polygon (face culling). We can also provide depth bias parameters or indicate whether we want to enable depth clamping.
This whole state is encapsulated in the VkPipelineRasterizationStateCreateInfo structure, whose members can be seen in the code below.
Here is the source code responsible for setting rasterization state in our example:
VkPipelineRasterizationStateCreateInfo rasterization_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO,  // VkStructureType                          sType
  nullptr,                                                     // const void                              *pNext
  0,                                                           // VkPipelineRasterizationStateCreateFlags  flags
  VK_FALSE,                                                    // VkBool32                                 depthClampEnable
  VK_FALSE,                                                    // VkBool32                                 rasterizerDiscardEnable
  VK_POLYGON_MODE_FILL,                                        // VkPolygonMode                            polygonMode
  VK_CULL_MODE_BACK_BIT,                                       // VkCullModeFlags                          cullMode
  VK_FRONT_FACE_COUNTER_CLOCKWISE,                             // VkFrontFace                              frontFace
  VK_FALSE,                                                    // VkBool32                                 depthBiasEnable
  0.0f,                                                        // float                                    depthBiasConstantFactor
  0.0f,                                                        // float                                    depthBiasClamp
  0.0f,                                                        // float                                    depthBiasSlopeFactor
  1.0f                                                         // float                                    lineWidth
};
14. Tutorial03.cpp, function CreatePipeline()
In the tutorial we are disabling as many parameters as possible to simplify the process, the code itself, and the
rendering operations. The parameters that matter here set up (typical) fill mode for polygon rasterization, back face
culling, and similar to OpenGL’s counterclockwise front faces. Depth biasing and clamping are also disabled (to enable
depth clamping, we first need to enable a dedicated feature during logical device creation; similarly we must do the same
for polygon modes other than “fill”).
In this example, I wanted to minimize possible problems, so I've set parameters to values that generally disable
multisampling: just one sample per pixel, with the other parameters turned off. Remember that if we want to enable
sample shading or alpha-to-one, we also need to enable the two respective features. Here is the source code that prepares the
VkPipelineMultisampleStateCreateInfo structure:
VkPipelineMultisampleStateCreateInfo multisample_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO,  // VkStructureType                         sType
  nullptr,                                                   // const void                             *pNext
  0,                                                         // VkPipelineMultisampleStateCreateFlags   flags
  VK_SAMPLE_COUNT_1_BIT,                                     // VkSampleCountFlagBits                   rasterizationSamples
  VK_FALSE,                                                  // VkBool32                                sampleShadingEnable
  1.0f,                                                      // float                                   minSampleShading
  nullptr,                                                   // const VkSampleMask                     *pSampleMask
  VK_FALSE,                                                  // VkBool32                                alphaToCoverageEnable
  VK_FALSE                                                   // VkBool32                                alphaToOneEnable
};
15. Tutorial03.cpp, function CreatePipeline()
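The structure below references a color_blend_attachment_state variable. A sketch of it, matching the description that follows (blending disabled, all color components enabled for writing), may look like this:

VkPipelineColorBlendAttachmentState color_blend_attachment_state = {
  VK_FALSE,                                             // VkBool32               blendEnable
  VK_BLEND_FACTOR_ONE,                                  // VkBlendFactor          srcColorBlendFactor
  VK_BLEND_FACTOR_ZERO,                                 // VkBlendFactor          dstColorBlendFactor
  VK_BLEND_OP_ADD,                                      // VkBlendOp              colorBlendOp
  VK_BLEND_FACTOR_ONE,                                  // VkBlendFactor          srcAlphaBlendFactor
  VK_BLEND_FACTOR_ZERO,                                 // VkBlendFactor          dstAlphaBlendFactor
  VK_BLEND_OP_ADD,                                      // VkBlendOp              alphaBlendOp
  VK_COLOR_COMPONENT_R_BIT | VK_COLOR_COMPONENT_G_BIT |
  VK_COLOR_COMPONENT_B_BIT | VK_COLOR_COMPONENT_A_BIT   // VkColorComponentFlags  colorWriteMask
};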
VkPipelineColorBlendStateCreateInfo color_blend_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_COLOR_BLEND_STATE_CREATE_INFO,  // VkStructureType                             sType
  nullptr,                                                   // const void                                 *pNext
  0,                                                         // VkPipelineColorBlendStateCreateFlags        flags
  VK_FALSE,                                                  // VkBool32                                    logicOpEnable
  VK_LOGIC_OP_COPY,                                          // VkLogicOp                                   logicOp
  1,                                                         // uint32_t                                    attachmentCount
  &color_blend_attachment_state,                             // const VkPipelineColorBlendAttachmentState  *pAttachments
  { 0.0f, 0.0f, 0.0f, 0.0f }                                 // float                                       blendConstants[4]
};
16. Tutorial03.cpp, function CreatePipeline()
Final color operations are set up through the VkPipelineColorBlendStateCreateInfo structure, whose fields can be seen
in the listing above.
More information is needed for the attachmentCount and pAttachments parameters. When we want to perform
drawing operations we set up parameters, the most important of which are graphics pipeline, render pass, and
framebuffer. The graphics card needs to know how to draw (graphics pipeline which describes rendering state, shaders,
test, and so on) and where to draw (the render pass gives general setup; the framebuffer specifies exactly what images
are used). As I have already mentioned, the render pass specifies how operations are ordered, what the dependencies
are, when we are rendering into a given attachment, and when we are reading from the same attachment. These stages
take the form of subpasses. And for each drawing operation we can (but don’t have to) enable/use a different pipeline.
But when we are drawing, we must remember that we are drawing into a set of attachments. This set is defined in a
render pass, which describes all color, input, depth attachments (the framebuffer just specifies what images are used for
each of them). For the blending state, we can specify whether we want to enable blending at all. This is done through the
pAttachments array. Each of its elements must correspond to a color attachment defined in the render pass. So the value
of attachmentCount (and the number of elements in the pAttachments array) must equal the number of color attachments
defined in the render pass.
There is one more restriction: by default, all elements of the pAttachments array must contain the same values; they must be
specified in the same way and must be identical. In other words, by default, blending (and color masks) is done in the same way for all
attachments. So why is it an array? Why can't we just specify one value? Because there is a feature that allows us to
perform independent, distinct blending for each active color attachment. When we enable the independent blending
feature during device creation, we can provide different values for each color attachment.
In this example, we disable blending, which causes all other parameters to be irrelevant except for colorWriteMask.
We select all components for writing, but you can freely check what happens when this parameter is changed to some
other combination of R, G, B, and A.
Shaders also access external resources like buffers and textures. With Vulkan, we describe such an arrangement up front.
We create some form of a memory layout: first there are two buffers, next we
have three textures and an image. This memory "structure" is called a set, and a collection of these sets is provided to the
pipeline. In shaders, we access the specified resources using specific memory "locations" from within these sets (layouts). This
is done through a layout(set = X, binding = Y) specifier, which can be translated as: take the resource from memory
location Y in set X.
A pipeline layout can be thought of as an interface between the shader stages and shader resources, as it takes these
groups of resources, describes how they are gathered, and provides them to the pipeline.
This process is complex and I plan to devote a tutorial to it. Here we are not using any additional resources so I present
an example for creating an “empty” pipeline layout:
VkPipelineLayoutCreateInfo layout_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,  // VkStructureType               sType
  nullptr,                                        // const void                   *pNext
  0,                                              // VkPipelineLayoutCreateFlags   flags
  0,                                              // uint32_t                      setLayoutCount
  nullptr,                                        // const VkDescriptorSetLayout  *pSetLayouts
  0,                                              // uint32_t                      pushConstantRangeCount
  nullptr                                         // const VkPushConstantRange    *pPushConstantRanges
};

VkPipelineLayout pipeline_layout;
if( vkCreatePipelineLayout( GetDevice(), &layout_create_info, nullptr,
                            &pipeline_layout ) != VK_SUCCESS ) {
  printf( "Could not create pipeline layout!\n" );
  return Tools::AutoDeleter<VkPipelineLayout, PFN_vkDestroyPipelineLayout>();
}

// Wrap the layout so it is destroyed automatically when it goes out of scope.
return Tools::AutoDeleter<VkPipelineLayout, PFN_vkDestroyPipelineLayout>(
  pipeline_layout, vkDestroyPipelineLayout, GetDevice() );
To create a pipeline layout we must first prepare a variable of type VkPipelineLayoutCreateInfo, whose fields can be
seen in the listing above.
In this example we create “empty” layout so almost all the fields are set to null or zero.
We are not using push constants here, but they deserve some explanation. Push constants in Vulkan allow us to modify
the data of constant variables used in shaders. There is a special, small amount of memory reserved for push constants.
We update their values through Vulkan commands, not through memory updates, and it is expected that updates of push
constants’ values are faster than normal memory writes.
As shown in the above example, I'm also wrapping the pipeline layout in an "AutoDeleter" object. Pipeline layouts are
required during pipeline creation, descriptor set binding (enabling/activating this interface between shaders and shader
resources), and push constant setting. None of these operations, except for pipeline creation, takes place in this tutorial.
So here, after we create the pipeline, we don't need the layout anymore. To avoid memory leaks, I have used this helper
class to destroy the layout as soon as we leave the function in which the graphics pipeline is created.
Creating a Graphics Pipeline
Now we have all the resources required to properly create a graphics pipeline. Here is the code that does that:
Tools::AutoDeleter<VkPipelineLayout, PFN_vkDestroyPipelineLayout> pipeline_layout =
  CreatePipelineLayout();
if( pipeline_layout.Get() == VK_NULL_HANDLE ) {
  return false;
}

VkGraphicsPipelineCreateInfo pipeline_create_info = {
  VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO,          // VkStructureType                               sType
  nullptr,                                                  // const void                                   *pNext
  0,                                                        // VkPipelineCreateFlags                         flags
  static_cast<uint32_t>(shader_stage_create_infos.size()),  // uint32_t                                      stageCount
  &shader_stage_create_infos[0],                            // const VkPipelineShaderStageCreateInfo        *pStages
  &vertex_input_state_create_info,                          // const VkPipelineVertexInputStateCreateInfo   *pVertexInputState
  &input_assembly_state_create_info,                        // const VkPipelineInputAssemblyStateCreateInfo *pInputAssemblyState
  nullptr,                                                  // const VkPipelineTessellationStateCreateInfo  *pTessellationState
  &viewport_state_create_info,                              // const VkPipelineViewportStateCreateInfo      *pViewportState
  &rasterization_state_create_info,                         // const VkPipelineRasterizationStateCreateInfo *pRasterizationState
  &multisample_state_create_info,                           // const VkPipelineMultisampleStateCreateInfo   *pMultisampleState
  nullptr,                                                  // const VkPipelineDepthStencilStateCreateInfo  *pDepthStencilState
  &color_blend_state_create_info,                           // const VkPipelineColorBlendStateCreateInfo    *pColorBlendState
  nullptr,                                                  // const VkPipelineDynamicStateCreateInfo       *pDynamicState
  pipeline_layout.Get(),                                    // VkPipelineLayout                              layout
  Vulkan.RenderPass,                                        // VkRenderPass                                  renderPass
  0,                                                        // uint32_t                                      subpass
  VK_NULL_HANDLE,                                           // VkPipeline                                    basePipelineHandle
  -1                                                        // int32_t                                       basePipelineIndex
};
First we create a pipeline layout wrapped in an object of type “AutoDeleter”. Next we fill the structure of type
VkGraphicsPipelineCreateInfo. It contains many fields. Here is a brief description of them:
sType – Type of structure, VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO here.
pNext – Parameter reserved for future, extension-related use.
flags – This time this parameter is not reserved for future use but controls how the pipeline should be created:
if we are creating a derivative pipeline (if we are inheriting from another pipeline) or if we allow creating
derivative pipelines from this one. We can also disable optimizations, which should shorten the time needed
to create a pipeline.
stageCount – Number of stages described in the pStages parameter; must be greater than zero.
pStages – Array with descriptions of active shader stages (the ones created using shader modules); each stage
must be unique (we can’t specify a given stage more than once). There also must be a vertex stage present.
pVertexInputState – Pointer to a variable containing the description of the vertex input state.
pInputAssemblyState – Pointer to a variable with input assembly description.
pTessellationState – Pointer to a description of the tessellation stages; can be null if tessellation is disabled.
pViewportState – Pointer to a variable specifying viewport parameters; can be null if rasterization is disabled.
pRasterizationState – Pointer to a variable specifying rasterization behavior.
pMultisampleState – Pointer to a variable defining multisampling; can be null if rasterization is disabled.
pDepthStencilState – Pointer to a description of depth/stencil parameters; this can be null in two situations:
when rasterization is disabled or we’re not using depth/stencil attachments in a render pass.
pColorBlendState – Pointer to a variable with color blending/write masks state; can be null also in two
situations: when rasterization is disabled or when we’re not using any color attachments inside the render
pass.
pDynamicState – Pointer to a variable specifying which parts of the graphics pipeline can be set dynamically;
can be null if the whole state is considered static (defined only through this create info structure).
layout – Handle to a pipeline layout object that describes resources accessed inside shaders.
renderPass – Handle to a render pass object; pipeline can be used with any render pass compatible with the
provided one.
subpass – Number (index) of a subpass in which the pipeline will be used.
basePipelineHandle – Handle to a pipeline this one should derive from.
basePipelineIndex – Index of a pipeline this one should derive from.
When we are creating a new pipeline, we can inherit some of the parameters from another one. This means that both
pipelines should have much in common. A good example is shader code. We don’t specify which fields are the same, but
the general indication that one pipeline inherits from another may substantially accelerate pipeline creation. But why
are there two fields to indicate a “parent” pipeline? We can’t use them both, only one of them at a time. When we use
a handle, it means that the “parent” pipeline is already created and we are deriving from the one whose handle we have
provided. But the pipeline creation function allows us to create many pipelines at once. Using the second parameter,
the “parent” pipeline index, we can create both “parent” and “child” pipelines in the same call. We just specify an array of
graphics pipeline create info structures, and this array is provided to the pipeline creation function. The
“basePipelineIndex” is then the index of a pipeline create info in this very array. We just have to remember that the
“parent” pipeline must appear earlier in the array (must have a smaller index) and it must be created with the “allow
derivatives” flag set.
In this example we are creating a pipeline with the state being entirely static (null for the “pDynamicState” parameter).
But what is a dynamic state? To allow for some flexibility and to lower the number of created pipeline objects, the dynamic
state was introduced. Through the “pDynamicState” parameter we can define which parts of the graphics pipeline can be
set dynamically through additional Vulkan commands and which parts remain static, set once during pipeline creation.
The dynamic state includes parameters such as viewports, line widths, blend constants, or some stencil parameters. If we
specify that a given state is dynamic, the parameters in the pipeline create info structure that are related to that state are
ignored. We must set the given state using the proper Vulkan commands during rendering, because the initial values of
such state may be undefined.
So after these quite overwhelming preparations we can create a graphics pipeline. This is done by calling the
vkCreateGraphicsPipelines() function which, among other parameters, takes an array of pipeline create info
structures. When everything goes well, VK_SUCCESS should be returned by this function and a handle of a graphics
pipeline should be stored in a variable we’ve provided the address of. Now we are ready to start drawing.
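A sketch of the call itself, following this chapter’s conventions (VK_NULL_HANDLE is passed for the optional pipeline
cache):
if( vkCreateGraphicsPipelines( GetDevice(), VK_NULL_HANDLE, 1, &pipeline_create_info,
    nullptr, &Vulkan.GraphicsPipeline ) != VK_SUCCESS ) {
  printf( "Could not create graphics pipeline!\n" );
  return false;
}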
Command buffers are containers for GPU commands. If we want to execute some job on a device, we do it through
command buffers. This means that we must prepare a set of commands that process data (that is, draw something on the
screen) and record these commands in command buffers. Then we can submit whole buffers to device’s queues. This
submit operation tells the device: here is a bunch of things I want you to do for me and do them now.
To record commands, we must first allocate command buffers. These are allocated from command pools, which can
be thought of as memory chunks. If a command buffer needs to be larger (as we record many complicated commands in
it), it can grow and use additional memory from the pool it was allocated from. So first we must create a command pool.
Remember that command buffers allocated from a given pool can only be submitted to a queue from a queue family
specified during pool creation.
To allocate command buffers we prepare a variable of type VkCommandBufferAllocateInfo and then call the
vkAllocateCommandBuffers() function, checking whether it succeeded. We can allocate many buffers at once with one
function call.
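A sketch of such an allocation, assuming image_count holds the number of swap chain images and
Vulkan.GraphicsCommandBuffers is a vector already resized to that count:
VkCommandBufferAllocateInfo command_buffer_allocate_info = {
  VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO,  // VkStructureType       sType
  nullptr,                                         // const void           *pNext
  Vulkan.GraphicsCommandPool,                      // VkCommandPool         commandPool
  VK_COMMAND_BUFFER_LEVEL_PRIMARY,                 // VkCommandBufferLevel  level
  image_count                                      // uint32_t              commandBufferCount
};
if( vkAllocateCommandBuffers( GetDevice(), &command_buffer_allocate_info,
    &Vulkan.GraphicsCommandBuffers[0] ) != VK_SUCCESS ) {
  printf( "Could not allocate command buffers!\n" );
  return false;
}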
I’ve prepared a simple buffer allocating function to show you how some Vulkan functions can be wrapped for easier
use. Here is a usage of two such wrapper functions that create command pools and allocate command buffers from them.
if( !CreateCommandPool( GetGraphicsQueue().FamilyIndex, &Vulkan.GraphicsCommandPool )
) {
printf( "Could not create command pool!\n" );
return false;
}
As you can see, we are creating a command pool for a graphics queue family index. All image state transitions and
drawing operations will be performed on a graphics queue. Presentation is done on another queue (if the presentation
queue is different from the graphics queue) but we don’t need a command buffer for this operation.
And we are also allocating command buffers for each swap chain image. Here we take the number of swap chain images
and provide it to this simple “wrapper” function for command buffer allocation.
VkImageSubresourceRange image_subresource_range = {
  VK_IMAGE_ASPECT_COLOR_BIT,  // VkImageAspectFlags  aspectMask
  0,                          // uint32_t            baseMipLevel
  1,                          // uint32_t            levelCount
  0,                          // uint32_t            baseArrayLayer
  1                           // uint32_t            layerCount
};

VkClearValue clear_value = {
  { 1.0f, 0.8f, 0.4f, 0.0f },  // VkClearColorValue  color
};
Performing command buffer recording is similar to OpenGL’s display lists, where we start recording a list by calling
the glNewList() function, prepare a set of drawing commands, and then close the list or stop recording it (glEndList()).
So the first thing we need to do is to prepare a variable of type VkCommandBufferBeginInfo. It is used when we start
recording a command buffer, and it tells the driver about the type, contents, and desired usage of the command buffer.
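A sketch of how this variable may be filled in this tutorial, where the prerecorded command buffers are resubmitted
every frame (hence the “simultaneous use” flag, an assumption consistent with the recording loop below):
VkCommandBufferBeginInfo graphics_command_buffer_begin_info = {
  VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,   // VkStructureType                        sType
  nullptr,                                       // const void                            *pNext
  VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT,  // VkCommandBufferUsageFlags              flags
  nullptr                                        // const VkCommandBufferInheritanceInfo  *pInheritanceInfo
};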
Next we describe the areas or parts of our images that we will set up image memory barriers for. Here we set up
barriers to specify that queues from different families will reference a given image. This is done through a variable of type
VkImageSubresourceRange with the following members:
aspectMask – Describes the “type” of image: whether it is for color, depth, or stencil data.
baseMipLevel – Number of the first mipmap level our operations will be performed on.
levelCount – Number of mipmap levels (including the base level) we will be operating on.
baseArrayLayer – Number of the first array layer of an image that will take part in the operations.
layerCount – Number of layers (including the base layer) that will be modified.
Next we set up a clear value for our images. Before drawing we need to clear images. In previous tutorials, we
performed this operation explicitly by ourselves. Here images are cleared as a part of a render pass attachment load
operation. We set the load operation to “clear,” so now we must specify the color to which an image must be cleared.
This is done using a variable of type VkClearValue in which we provide R, G, B, A values.
Variables we have created thus far are independent of an image itself, and that’s why we have specified them before
a loop. Now we can start recording command buffers:
for( size_t i = 0; i < Vulkan.GraphicsCommandBuffers.size(); ++i ) {
  vkBeginCommandBuffer( Vulkan.GraphicsCommandBuffers[i], &graphics_command_buffer_begin_info );

  VkRenderPassBeginInfo render_pass_begin_info = {
    VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,  // VkStructureType      sType
    nullptr,                                   // const void          *pNext
    Vulkan.RenderPass,                         // VkRenderPass         renderPass
    Vulkan.FramebufferObjects[i].Handle,       // VkFramebuffer        framebuffer
    {                                          // VkRect2D             renderArea
      {                                        // VkOffset2D           offset
        0,                                     // int32_t              x
        0                                      // int32_t              y
      },
      {                                        // VkExtent2D           extent
        300,                                   // uint32_t             width
        300                                    // uint32_t             height
      }
    },
    1,                                         // uint32_t             clearValueCount
    &clear_value                               // const VkClearValue  *pClearValues
  };

  vkCmdBeginRenderPass( Vulkan.GraphicsCommandBuffers[i], &render_pass_begin_info,
    VK_SUBPASS_CONTENTS_INLINE );
  vkCmdBindPipeline( Vulkan.GraphicsCommandBuffers[i],
    VK_PIPELINE_BIND_POINT_GRAPHICS, Vulkan.GraphicsPipeline );
  vkCmdDraw( Vulkan.GraphicsCommandBuffers[i], 3, 1, 0, 0 );
  vkCmdEndRenderPass( Vulkan.GraphicsCommandBuffers[i] );
Recording a command buffer is started by calling the vkBeginCommandBuffer() function. At the beginning we set up
a barrier that tells the driver that previously queues from one family referenced a given image but now queues from a
different family will be referencing it (we need to do this because during swap chain creation we specified exclusive sharing
mode). The barrier is set only when the graphics queue is different than the present queue. This is done by calling the
vkCmdPipelineBarrier() function. We must specify when in the pipeline the barrier should be placed
(VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT) and how the barrier should be set up. Barrier parameters are
prepared through the VkImageMemoryBarrier structure:
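A sketch of such a barrier for this tutorial (swap_chain_images[i] stands for a handle of the i-th swap chain image; the
exact variable name is an assumption). Note that the old and new layouts are the same, for the reasons explained below:
VkImageMemoryBarrier barrier_from_present_to_draw = {
  VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,  // VkStructureType          sType
  nullptr,                                 // const void              *pNext
  VK_ACCESS_MEMORY_READ_BIT,               // VkAccessFlags            srcAccessMask
  VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,    // VkAccessFlags            dstAccessMask
  VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,         // VkImageLayout            oldLayout
  VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,         // VkImageLayout            newLayout
  GetPresentQueue().FamilyIndex,           // uint32_t                 srcQueueFamilyIndex
  GetGraphicsQueue().FamilyIndex,          // uint32_t                 dstQueueFamilyIndex
  swap_chain_images[i],                    // VkImage                  image
  image_subresource_range                  // VkImageSubresourceRange  subresourceRange
};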
In this example we don’t change the layout of an image, for two reasons: (1) The barrier may not be set at all (if the
graphics and present queues are the same), and (2) the layout transition will be performed automatically as a render pass
operation (at the beginning of the first—and only—subpass).
Next we start a render pass. We call the vkCmdBeginRenderPass() function for which we must provide a pointer to a
variable of VkRenderPassBeginInfo type. It contains the following members:
When we specify a render area for the render pass, we must make sure that the rendering operations won’t modify
pixels outside this area. This is just a hint for the driver so it can optimize its behavior. If we don’t confine operations to
the provided area by using a proper scissor test, pixels outside this area may become undefined (we can’t rely on their
contents). We also can’t specify a render area greater than the framebuffer’s dimensions (one that falls outside the
framebuffer).
The pClearValues array must contain an element for each render pass attachment. Each of its members specifies the
color to which the given attachment must be cleared when its loadOp is set to clear. For attachments whose loadOp is
not clear, the values provided for them are ignored. But we can’t provide an array with a smaller number of elements.
We have begun a command buffer, set a barrier (if necessary), and started a render pass. When we start a render pass
we are also starting its first subpass. We can switch to the next subpass by calling the vkCmdNextSubpass() function.
During these operations, layout transitions and clear operations may occur. Clears are done in a subpass in which the
image is first used (referenced). Layout transitions occur each time a subpass layout is different than the layout in a
previous subpass or (in the case of a first subpass or when the image is first referenced) different than the initial layout
(layout before the render pass). So in our example when we start a render pass, the swap chain image’s layout is changed
automatically from “presentation source” to a “color attachment optimal” layout.
Now we bind a graphics pipeline. This is done by calling the vkCmdBindPipeline() function. This “activates” all shader
programs (similar to the glUseProgram() function) and sets desired tests, blending operations, and so on.
After the pipeline is bound, we can finally draw something by calling the vkCmdDraw() function. In this function we
specify the number of vertices we want to draw (three), the number of instances that should be drawn (just one), and
the indices of a first vertex and a first instance (both zero).
Next the vkCmdEndRenderPass() function is called which, as the name suggests, ends the given render pass. Here all
final layout transitions occur if the final layout specified for a render pass is different from the layout used in the last
subpass the given image was referenced in.
After that, the barrier may be set in which we tell the driver that the graphics queue finished using a given image and
from now on the present queue will be using it. This is done, once again, only when the graphics and present queues are
different. And after the barrier, we stop recording a command buffer for a given image. All these operations are repeated
for each swap chain image.
Drawing
The drawing function is the same as the Draw() function presented in Tutorial 2. We acquire the image’s index, submit
a proper command buffer, and present an image. We are using semaphores the same way they were used previously: one
semaphore is used for acquiring an image and it tells the graphics queue to wait when the image is not yet available for
use. The second semaphore is used to indicate whether drawing on a graphics queue is finished; the present queue
waits on this semaphore before it can present an image. Here is the source code of the Draw() function:
VkSemaphore image_available_semaphore = GetImageAvailableSemaphore();
VkSemaphore rendering_finished_semaphore = GetRenderingFinishedSemaphore();
VkSwapchainKHR swap_chain = GetSwapChain().Handle;
uint32_t image_index;

VkPresentInfoKHR present_info = {
  VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,  // VkStructureType        sType
  nullptr,                             // const void            *pNext
  1,                                   // uint32_t               waitSemaphoreCount
  &rendering_finished_semaphore,       // const VkSemaphore     *pWaitSemaphores
  1,                                   // uint32_t               swapchainCount
  &swap_chain,                         // const VkSwapchainKHR  *pSwapchains
  &image_index,                        // const uint32_t        *pImageIndices
  nullptr                              // VkResult              *pResults
};
result = vkQueuePresentKHR( GetPresentQueue().Handle, &present_info );
switch( result ) {
case VK_SUCCESS:
break;
case VK_ERROR_OUT_OF_DATE_KHR:
case VK_SUBOPTIMAL_KHR:
return OnWindowSizeChanged();
default:
printf( "Problem occurred during image presentation!\n" );
return false;
}
return true;
24. Tutorial03.cpp, function Draw()
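The snippet above shows only the presentation part. The acquire and submit steps that precede it (and that declare the
result variable) might look roughly like this, a sketch based on the semaphore usage described above:
VkResult result = vkAcquireNextImageKHR( GetDevice(), swap_chain, UINT64_MAX,
  image_available_semaphore, VK_NULL_HANDLE, &image_index );
if( (result != VK_SUCCESS) && (result != VK_SUBOPTIMAL_KHR) ) {
  return false;
}

VkPipelineStageFlags wait_dst_stage_mask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
VkSubmitInfo submit_info = {
  VK_STRUCTURE_TYPE_SUBMIT_INFO,                // VkStructureType              sType
  nullptr,                                      // const void                  *pNext
  1,                                            // uint32_t                     waitSemaphoreCount
  &image_available_semaphore,                   // const VkSemaphore           *pWaitSemaphores
  &wait_dst_stage_mask,                         // const VkPipelineStageFlags  *pWaitDstStageMask
  1,                                            // uint32_t                     commandBufferCount
  &Vulkan.GraphicsCommandBuffers[image_index],  // const VkCommandBuffer       *pCommandBuffers
  1,                                            // uint32_t                     signalSemaphoreCount
  &rendering_finished_semaphore                 // const VkSemaphore           *pSignalSemaphores
};
if( vkQueueSubmit( GetGraphicsQueue().Handle, 1, &submit_info, VK_NULL_HANDLE ) != VK_SUCCESS ) {
  return false;
}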
Tutorial 3 Execution
In this tutorial we performed “real” drawing operations. A simple triangle may not sound too convincing, but it is a
good starting point for a first Vulkan-created image. Here is what the triangle should look like:
If you’re wondering why there are black parts in the image, here is the explanation: To simplify the whole code, we
created a framebuffer with a fixed size (width and height of 300 pixels). But the window’s size (and the size of the swap
chain images) may be greater than these 300 x 300 pixels. The parts of an image that lie outside of the framebuffer’s
dimensions are uncleared and unmodified by our application. They may even contain some “artifacts,” because the
memory from which the driver allocates the swap chain images may have been previously used for other purposes and
could contain some data. The correct behavior is to create a framebuffer with the same size as the swap chain images and
to recreate it when the window’s size changes. But as long as the blue triangle is rendered on an orange/gold background,
the code works correctly.
Cleaning Up
One last thing to learn before this tutorial ends is how to release resources created during this lesson. I won’t repeat
the code needed to release resources created in the previous chapter. Just look at the VulkanCommon.cpp file. Here is
the code needed to destroy resources specific to this chapter:
if( GetDevice() != VK_NULL_HANDLE ) {
vkDeviceWaitIdle( GetDevice() );
As usual, we first check whether there is any device. If we don’t have a device, we don’t have any resources either. Next
we wait until the device is free and then delete all the created resources. We start by freeing the command buffers with
the vkFreeCommandBuffers() function. Next we destroy the command pool through the vkDestroyCommandPool()
function, and after that the graphics pipeline is destroyed with a vkDestroyPipeline() call. Next we call the
vkDestroyRenderPass() function, which releases the handle to the render pass. Finally, all framebuffers and image views
associated with each swap chain image are deleted.
Each object destruction is preceded by a check of whether the given resource was properly created. If it wasn’t, we skip
the destruction of that resource.
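A sketch of that destruction sequence, using the member names seen elsewhere in this chapter (the exact members of
the Vulkan structure may differ in the full source):
if( (Vulkan.GraphicsCommandBuffers.size() > 0) && (Vulkan.GraphicsCommandBuffers[0] != VK_NULL_HANDLE) ) {
  vkFreeCommandBuffers( GetDevice(), Vulkan.GraphicsCommandPool,
    static_cast<uint32_t>(Vulkan.GraphicsCommandBuffers.size()), &Vulkan.GraphicsCommandBuffers[0] );
  Vulkan.GraphicsCommandBuffers.clear();
}
if( Vulkan.GraphicsCommandPool != VK_NULL_HANDLE ) {
  vkDestroyCommandPool( GetDevice(), Vulkan.GraphicsCommandPool, nullptr );
  Vulkan.GraphicsCommandPool = VK_NULL_HANDLE;
}
if( Vulkan.GraphicsPipeline != VK_NULL_HANDLE ) {
  vkDestroyPipeline( GetDevice(), Vulkan.GraphicsPipeline, nullptr );
  Vulkan.GraphicsPipeline = VK_NULL_HANDLE;
}
if( Vulkan.RenderPass != VK_NULL_HANDLE ) {
  vkDestroyRenderPass( GetDevice(), Vulkan.RenderPass, nullptr );
  Vulkan.RenderPass = VK_NULL_HANDLE;
}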
Conclusion
In this tutorial, we created a render pass with one subpass. Next we created image views and framebuffers for each
swap chain image. One of the most difficult parts was to create a graphics pipeline, because it required us to prepare lots
of data. We had to create shader modules and describe all the shader stages that should be active when a given graphics
pipeline is bound. We had to prepare information about input vertices, their layout, and assembling them into polygons.
Viewport, rasterization, multisampling, and color blending information was also necessary. Then we created a simple
pipeline layout and after that we could create the pipeline itself. Next we created a command pool and allocated command
buffers for each swap chain image. Operations recorded in each command buffer involved setting up an image memory
barrier, beginning a render pass, binding a graphics pipeline, and drawing. Next we ended a render pass and set up another
image memory barrier. The drawing itself was performed the same way as in the previous tutorial (2).
In the next tutorial, we will learn about vertex attributes, images, and buffers.
Part 4: Vertex Attributes
But now we can build on these foundations. So the tutorials will be shorter and focus on smaller topics related to the
Vulkan API. In this part I present the recommended way of drawing arbitrary geometry by providing vertex attributes
through vertex buffers. As the code of this lesson is similar to the code from the “03 – First Triangle” tutorial, I focus on
and describe only the parts that are different.
I also show a different way of organizing the rendering code. Previously we recorded command buffers before the
main rendering loop. But in real-life situations, every frame of animation is different, so we can’t prerecord all the
rendering commands. We should record and submit a command buffer as late as possible to minimize input lag and to
use input data that is as recent as possible. We will record the command buffer just before it is submitted to the queue.
But a single command buffer isn’t enough. We should not rerecord a command buffer until the graphics card finishes
processing it after it was submitted. This moment is signaled through a fence. But waiting on a fence every frame is a
waste of time, so we need more command buffers used interchangeably. With more command buffers, more fences are
also needed, and the situation gets more complicated. This tutorial shows how to organize the code so it is easily
maintained, flexible, and as fast as possible.
In this part of the tutorial, just as in the previous part, we specify a “present src” initial and final image layout and a
“color attachment optimal” subpass layout for our render pass. But previous tutorials lacked important, additional
information, specifically how the image was used (that is, what types of operations occurred in connection with an image)
and when it was used (which parts of a rendering pipeline were using an image). This information can be specified both in
the image memory barrier and in the render pass description. When we create an image memory barrier, we specify the
types of operations which concern the given image (memory access types before and after the barrier), and we also specify
when this barrier should be placed (pipeline stages in which the image was used before and after the barrier).
When we create a render pass and provide a description for it, the same information is specified through subpass
dependencies. This additional data is crucial for a driver to optimally prepare an implicit barrier. Below is the source code
that creates a render pass and prepares subpass dependencies.
std::vector<VkSubpassDependency> dependencies = {
  {
    VK_SUBPASS_EXTERNAL,                            // uint32_t              srcSubpass
    0,                                              // uint32_t              dstSubpass
    VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,           // VkPipelineStageFlags  srcStageMask
    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,  // VkPipelineStageFlags  dstStageMask
    VK_ACCESS_MEMORY_READ_BIT,                      // VkAccessFlags         srcAccessMask
    VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,           // VkAccessFlags         dstAccessMask
    VK_DEPENDENCY_BY_REGION_BIT                     // VkDependencyFlags     dependencyFlags
  },
  {
    0,                                              // uint32_t              srcSubpass
    VK_SUBPASS_EXTERNAL,                            // uint32_t              dstSubpass
    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,  // VkPipelineStageFlags  srcStageMask
    VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,           // VkPipelineStageFlags  dstStageMask
    VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,           // VkAccessFlags         srcAccessMask
    VK_ACCESS_MEMORY_READ_BIT,                      // VkAccessFlags         dstAccessMask
    VK_DEPENDENCY_BY_REGION_BIT                     // VkDependencyFlags     dependencyFlags
  }
};
VkRenderPassCreateInfo render_pass_create_info = {
  VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO,   // VkStructureType                sType
  nullptr,                                     // const void                    *pNext
  0,                                           // VkRenderPassCreateFlags        flags
  1,                                           // uint32_t                       attachmentCount
  attachment_descriptions,                     // const VkAttachmentDescription *pAttachments
  1,                                           // uint32_t                       subpassCount
  subpass_descriptions,                        // const VkSubpassDescription    *pSubpasses
  static_cast<uint32_t>(dependencies.size()),  // uint32_t                       dependencyCount
  &dependencies[0]                             // const VkSubpassDependency     *pDependencies
};
Subpass dependencies describe dependencies between different subpasses. When an attachment is used in one
specific way in a given subpass (for example, rendering into it), but in another way in another subpass (sampling from it),
we can create a memory barrier or we can provide a subpass dependency that describes the intended usage of an
attachment in these two subpasses. Of course, the latter option is recommended, as the driver can (usually) prepare the
barriers in a more optimal way. And the code itself is improved—everything required to understand the code is gathered
in one place, one object.
In our simple example, we have only one subpass, but we specify two dependencies. This is because we can (and
should) specify dependencies between the operations inside a render pass (by providing the index of a given subpass)
and the operations outside of it (by providing the VK_SUBPASS_EXTERNAL value). Here we provide one dependency for
the color attachment between the operations occurring before the render pass and its only subpass. The second
dependency is defined for the operations occurring inside the subpass and after the render pass.
What operations are we talking about? We are using only one attachment, which is an image acquired from a
presentation engine (swapchain). The presentation engine uses the image as a source of presentable data; it only displays
the image. So the only operation that involves this image is a “memory read” on an image with the “present src” layout.
This operation doesn’t occur in any normal pipeline stage, but it can be represented in the “bottom of pipeline” stage.
Inside our render pass, in its only subpass (with index 0), we are rendering into an image used as a color attachment.
So the operation that occurs on this image is “color attachment write”, which is performed in the “color attachment
output” pipeline stage (after a fragment shader). After that the image is presented and returned to a presentation engine,
which again uses this image as a source of data. So, in our example, the operation after the render pass is the same as
before it: “memory read”.
We specify this data through an array of VkSubpassDependency elements. When we create a render pass, in the
VkRenderPassCreateInfo structure we specify the number of elements in the dependencies array (through the
dependencyCount member) and provide the address of its first element (through pDependencies). In the previous part of
the tutorial we provided 0 and nullptr for these two fields. The VkSubpassDependency structure contains the fields named
in the comments of the code above: srcSubpass and dstSubpass (indices of the dependent subpasses, or
VK_SUBPASS_EXTERNAL for operations outside the render pass), srcStageMask and dstStageMask (the pipeline stages
involved before and after the dependency), srcAccessMask and dstAccessMask (the memory access types involved before
and after the dependency), and dependencyFlags.
Writing Shaders
First have a look at the vertex shader written in GLSL code:
#version 450

layout(location = 0) in vec4 i_Position;
layout(location = 1) in vec4 i_Color;

out gl_PerVertex
{
  vec4 gl_Position;
};

layout(location = 0) out vec4 v_Color;

void main() {
    gl_Position = i_Position;
    v_Color = i_Color;
}
2. shader.vert
This shader is quite simple, though more complicated than the one from Tutorial 03.
We specify two input attributes (named i_Position and i_Color). In Vulkan, all attributes must have a location layout
qualifier. When we specify a description of the vertex attributes in Vulkan API, the names of these attributes don’t matter,
only their indices/locations. In OpenGL* we could ask for a location of an attribute with a given name. In Vulkan we can’t
do this. Location layout qualifiers are the only way to go.
Next, we redeclare the gl_PerVertex block in the shader. Vulkan uses shader I/O blocks, and we should redeclare a
gl_PerVertex block to specify exactly what members of this block to use. When we don’t, the default definition is used.
But we must remember that the default definition contains gl_ClipDistance[], which requires us to enable a feature named
shaderClipDistance (and in Vulkan we can’t use features that are not enabled during device creation or our application
may not work correctly). Here we are using only a gl_Position member so the feature is not required.
We then specify an additional output varying variable called v_Color in which we store vertices’ colors. Inside a main
function we copy values provided by an application to proper output variables: position to gl_Position and color to v_Color.
#version 450
layout(location = 0) in vec4 v_Color;
layout(location = 0) out vec4 o_Color;
void main() {
  o_Color = v_Color;
}
3. shader.frag
In a fragment shader, the input varying variable v_Color is copied to the only output variable called o_Color. Both
variables have location layout specifiers. The v_Color variable has the same location as the output variable in the vertex
shader, so it will contain color values interpolated between vertices.
These shaders can be converted to SPIR-V the same way as previously.
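For example, using the glslangValidator tool distributed with the Vulkan SDK (the output file names here are only
illustrative):
glslangValidator -V shader.vert -o vert.spv
glslangValidator -V shader.frag -o frag.spv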
So now, when we know what attributes we want to use in our shaders, we can create the appropriate graphics
pipeline.
We want to use two attributes: vertex positions, which are composed of four float components, and vertex colors,
which are also composed of four float values. We will lay all of our vertex data out in one buffer using an interleaved
attributes layout. This means that the position of the first vertex is placed first, then the color of that vertex, then the
position of the second vertex, followed by its color, then the position and color of the third vertex, and so on. All this
specification is performed with the following code:
std::vector<VkVertexInputBindingDescription> vertex_binding_descriptions = {
  {
    0,                           // uint32_t           binding
    sizeof(VertexData),          // uint32_t           stride
    VK_VERTEX_INPUT_RATE_VERTEX  // VkVertexInputRate  inputRate
  }
};

std::vector<VkVertexInputAttributeDescription> vertex_attribute_descriptions = {
  {
    0,                                       // uint32_t  location
    vertex_binding_descriptions[0].binding,  // uint32_t  binding
    VK_FORMAT_R32G32B32A32_SFLOAT,           // VkFormat  format
    offsetof(struct VertexData, x)           // uint32_t  offset
  },
  {
    1,                                       // uint32_t  location
    vertex_binding_descriptions[0].binding,  // uint32_t  binding
    VK_FORMAT_R32G32B32A32_SFLOAT,           // VkFormat  format
    offsetof( struct VertexData, r )         // uint32_t  offset
  }
};
VkPipelineVertexInputStateCreateInfo vertex_input_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,    // VkStructureType                          sType
  nullptr,                                                      // const void                              *pNext
  0,                                                            // VkPipelineVertexInputStateCreateFlags    flags
  static_cast<uint32_t>(vertex_binding_descriptions.size()),    // uint32_t                                 vertexBindingDescriptionCount
  &vertex_binding_descriptions[0],                              // const VkVertexInputBindingDescription   *pVertexBindingDescriptions
  static_cast<uint32_t>(vertex_attribute_descriptions.size()),  // uint32_t                                 vertexAttributeDescriptionCount
  &vertex_attribute_descriptions[0]                             // const VkVertexInputAttributeDescription *pVertexAttributeDescriptions
};
4. Tutorial04.cpp, function CreatePipeline()
First we specify the binding (general memory information) of vertex data through a VkVertexInputBindingDescription;
its fields are named in the comments in the code above. The stride and inputRate fields are quite self-explanatory. The
binding member may require additional explanation. When we create a vertex buffer, we bind it to a chosen slot before
rendering operations. The slot number (an index) is this binding, and here we describe how data in this slot is aligned in
memory and how it should be consumed (per vertex or per instance). Different vertex buffers can be bound to different
bindings, and each binding may be differently positioned in memory.
The next step is to define all vertex attributes. We must specify a location (index) for each attribute (the same as in the
shader source code, in the location layout qualifier), a source of data (the binding from which data will be read), a format
(data type and number of components), and an offset at which data for this specific attribute can be found (an offset from
the beginning of the data for a given vertex, not from the beginning of all vertex data). The situation here is exactly the
same as in OpenGL, where we created Vertex Buffer Objects (a VBO can be thought of as an equivalent of a “binding”)
and defined attributes using the glVertexAttribPointer() function, through which we specified an attribute’s index
(location), size and type (number of components and format), stride, and offset. This information is provided through the
VkVertexInputAttributeDescription structure. It contains these fields:
location – Index of an attribute, the same as defined by the location layout specifier in a shader source code.
binding – The number of the slot from which data should be read (source of data like VBO in OpenGL), the
same binding as in a VkVertexInputBindingDescription structure and vkCmdBindVertexBuffers() function
(described later).
format – Data type and number of components per attribute.
offset – Beginning of data for a given attribute.
When we are ready, we can prepare the vertex input state description by filling a variable of type
VkPipelineVertexInputStateCreateInfo, which consists of the fields named in the comments of the code above.
This concludes the vertex attribute specification at pipeline creation. But to use the attributes, we must create a vertex
buffer and bind it to a command buffer before we issue a rendering command.
The structure that defines the viewport parameters is presented below. When we want to define viewport and scissor
parameters through a dynamic state, we don’t have to fill the pViewports and pScissors members; that’s why they are set
to null in the example below. But we always have to define the number of viewports and scissor test rectangles. These
values are always specified through the VkPipelineViewportStateCreateInfo structure, no matter whether we want to use
a dynamic or static viewport and scissor state.
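A sketch of both structures under these assumptions; the dynamic_states vector is the one referenced by the dynamic
state create info shown below:
std::vector<VkDynamicState> dynamic_states = {
  VK_DYNAMIC_STATE_VIEWPORT,
  VK_DYNAMIC_STATE_SCISSOR,
};

VkPipelineViewportStateCreateInfo viewport_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO,  // VkStructureType                     sType
  nullptr,                                                // const void                         *pNext
  0,                                                      // VkPipelineViewportStateCreateFlags  flags
  1,                                                      // uint32_t                            viewportCount
  nullptr,                                                // const VkViewport                   *pViewports
  1,                                                      // uint32_t                            scissorCount
  nullptr                                                 // const VkRect2D                     *pScissors
};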
VkPipelineDynamicStateCreateInfo dynamic_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO,  // VkStructureType                    sType
  nullptr,                                               // const void                        *pNext
  0,                                                     // VkPipelineDynamicStateCreateFlags  flags
  static_cast<uint32_t>(dynamic_states.size()),          // uint32_t                           dynamicStateCount
  &dynamic_states[0]                                     // const VkDynamicState              *pDynamicStates
};
7. Tutorial04.cpp, function CreatePipeline()
It is done by using a structure of type VkPipelineDynamicStateCreateInfo, whose fields are named in the code above.
The most important variable, which contains references to all pipeline parameters, is of type
VkGraphicsPipelineCreateInfo. The only change from the previous tutorial is the addition of the pDynamicState parameter,
which points to a structure of type VkPipelineDynamicStateCreateInfo, described above. Every pipeline state that is
specified as dynamic must be set through a proper function call during command buffer recording.
In Vulkan, buffer and image creation consists of at least two stages. First, we create the object itself. Next, we need
to create a memory object, which is then bound to the buffer (or image). From this memory object, the buffer will
take its storage space. This approach allows us to specify additional parameters for the memory and control it in more
detail.
To create a (general) buffer object we call vkCreateBuffer(). It accepts, among other parameters, a pointer to a
variable of type VkBufferCreateInfo, which defines the parameters of the created buffer. Here is the code responsible for
creating a buffer used as a source of data for vertex attributes:
VertexData vertex_data[] = {
{
-0.7f, -0.7f, 0.0f, 1.0f,
1.0f, 0.0f, 0.0f, 0.0f
},
{
-0.7f, 0.7f, 0.0f, 1.0f,
0.0f, 1.0f, 0.0f, 0.0f
},
{
0.7f, -0.7f, 0.0f, 1.0f,
0.0f, 0.0f, 1.0f, 0.0f
},
{
0.7f, 0.7f, 0.0f, 1.0f,
0.3f, 0.3f, 0.3f, 0.0f
}
};
Vulkan.VertexBuffer.Size = sizeof(vertex_data);
VkBufferCreateInfo buffer_create_info = {
  VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO,  // VkStructureType      sType
  nullptr,                               // const void          *pNext
  0,                                     // VkBufferCreateFlags  flags
  Vulkan.VertexBuffer.Size,              // VkDeviceSize         size
  VK_BUFFER_USAGE_VERTEX_BUFFER_BIT,     // VkBufferUsageFlags   usage
  VK_SHARING_MODE_EXCLUSIVE,             // VkSharingMode        sharingMode
  0,                                     // uint32_t             queueFamilyIndexCount
  nullptr                                // const uint32_t      *pQueueFamilyIndices
};
At the beginning of the CreateVertexBuffer() function we define a set of values for position and color attributes. First,
four position components are defined for the first vertex, then four color components for the same vertex; after that,
four components of a position attribute for the second vertex are specified, followed by the color values for the same
vertex, and then the position and color for the third and fourth vertices. The size of this array is used to define the size of
the buffer. Remember, though, that internally the graphics driver may require more storage for a buffer than the size
requested by an application. Next we define a variable of type VkBufferCreateInfo. It is a structure with the fields named
in the comments above.
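The structure is then passed to vkCreateBuffer(); a sketch of the call, storing the handle in the Vulkan.VertexBuffer.Handle
member used later during rendering:
if( vkCreateBuffer( GetDevice(), &buffer_create_info, nullptr, &Vulkan.VertexBuffer.Handle ) != VK_SUCCESS ) {
  printf( "Could not create a vertex buffer!\n" );
  return false;
}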
VkPhysicalDeviceMemoryProperties memory_properties;
vkGetPhysicalDeviceMemoryProperties( GetPhysicalDevice(), &memory_properties );
VkMemoryAllocateInfo memory_allocate_info = {
  VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,  // VkStructureType  sType
  nullptr,                                 // const void      *pNext
  buffer_memory_requirements.size,         // VkDeviceSize     allocationSize
  i                                        // uint32_t         memoryTypeIndex
};
First we must check what the memory requirements for the created buffer are. We do this by calling the
vkGetBufferMemoryRequirements() function. It stores parameters for memory creation in a variable whose address we
provide in the last parameter. This variable must be of type VkMemoryRequirements, and it contains information
about the required size, memory alignment, and supported memory types. What are memory types?
Each device may have and expose different memory types—heaps of various sizes that have different properties. One
memory type may be a device’s local memory located on the GDDR chips (thus very, very fast). Another may be a shared
memory that is visible both for a graphics card and a CPU. Both the graphics card and application may have access to this
memory, but such memory type is slower than the device local-only memory (which is accessible only to a graphics card).
To check what memory heaps and types are available, we need to call the vkGetPhysicalDeviceMemoryProperties()
function, which stores information about memory in a variable of type VkPhysicalDeviceMemoryProperties. It contains
the following information:
Before we can allocate memory for a given buffer, we need to check which memory type fulfills the buffer’s memory
requirements. If we have additional, specific needs, we can check for them too. For all of this, we iterate over all available
memory types. Buffer memory requirements have a field called memoryTypeBits; if a bit at a given index is set in this
field, it means that for the given buffer we can allocate memory of the type represented by that index. But we must
remember that while there must always be a memory type that fulfills the buffer’s memory requirements, it may not
support some other, specific needs. In this case we need to look for another memory type or change our additional
requirements.
Here, our additional requirement is that the memory needs to be host visible. This means that the application can map
this memory and get access to it, that is, read it or write data to it. Such memory is usually slower than device local-only
memory, but this way we can easily upload data for our vertex attributes. The next tutorial will show how to use device
local-only memory for better performance.
Fortunately, the host visible requirement is popular, and it should be easy to find a memory type that supports both
the buffer’s memory requirements and the host visible property. We then prepare a variable of type
VkMemoryAllocateInfo and fill all its fields, as shown in the code above.
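A sketch of the whole selection-and-allocation flow inside AllocateBufferMemory(), under the assumptions above (the
buffer_memory_requirements query is included for completeness):
VkMemoryRequirements buffer_memory_requirements;
vkGetBufferMemoryRequirements( GetDevice(), Vulkan.VertexBuffer.Handle, &buffer_memory_requirements );

for( uint32_t i = 0; i < memory_properties.memoryTypeCount; ++i ) {
  if( (buffer_memory_requirements.memoryTypeBits & (1 << i)) &&
      (memory_properties.memoryTypes[i].propertyFlags & VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT) ) {
    // The memory_allocate_info variable shown earlier is filled here,
    // with i used as its memoryTypeIndex.
    if( vkAllocateMemory( GetDevice(), &memory_allocate_info, nullptr, &Vulkan.VertexBuffer.Memory ) == VK_SUCCESS ) {
      return true;
    }
  }
}
return false;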
After we fill such a structure, we call vkAllocateMemory() and check whether the memory object allocation succeeded.
AllocateBufferMemory() is the function that allocates a memory object; it was presented earlier. When a memory object
is created, we bind it to the buffer by calling the vkBindBufferMemory() function. During the call we must specify a handle
to the buffer, a handle to the memory object, and an offset. The offset is very important and requires some additional
explanation.
When we queried for buffer memory requirements, we acquired information about the required size, memory type, and
alignment. Different buffer usages may require different memory alignment. The beginning of a memory object (an offset
of 0) satisfies all alignments: all memory objects are created at addresses that fulfill the requirements of all
different usages. So when we specify a zero offset, we don’t have to worry about anything.
But we can create a larger memory object and use it as storage space for multiple buffers (or images). This, in fact, is
the recommended behavior. Creating larger memory objects means we are creating fewer memory objects, which allows
the driver to track fewer objects in general. Memory objects must be tracked by a driver because of OS requirements and
security measures. Larger memory objects also don’t cause big problems with memory fragmentation. Finally, we should
allocate larger amounts of memory and keep similar objects in them to increase cache hits and thus improve the
performance of our application.
But when we allocate larger memory objects and bind them to multiple buffers (or images), not all of them can be
bound at offset zero. Only one can be bound at this offset; the others must be bound further away, after the space used
by the first buffer (or image). So the offset for the second and every subsequent buffer bound to the same memory object
must meet the alignment requirements reported by the query. That’s why the alignment member is important.
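A sketch of the binding call for our single vertex buffer, bound at offset zero:
if( vkBindBufferMemory( GetDevice(), Vulkan.VertexBuffer.Handle, Vulkan.VertexBuffer.Memory, 0 ) != VK_SUCCESS ) {
  printf( "Could not bind memory to a vertex buffer!\n" );
  return false;
}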
When our buffer is created and memory for it is allocated and bound, we can fill the buffer with data for vertex
attributes.
VkMappedMemoryRange flush_range = {
VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE, // VkStructureType sType
nullptr, // const void *pNext
Vulkan.VertexBuffer.Memory, // VkDeviceMemory memory
0, // VkDeviceSize offset
VK_WHOLE_SIZE // VkDeviceSize size
};
vkFlushMappedMemoryRanges( GetDevice(), 1, &flush_range );
return true;
12. Tutorial04.cpp, function CreateVertexBuffer()
To map memory, we call the vkMapMemory() function. In the call we must specify which memory object we want to
map and the region to access. The region is defined by an offset from the beginning of the memory object’s storage and
a size. After a successful call we acquire a pointer, which we can use to copy data from our application to the provided
memory address. Here we copy vertex data from an array with vertex positions and colors.
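A sketch of the mapping and copy that precede the flush shown above:
void *vertex_buffer_memory_pointer;
if( vkMapMemory( GetDevice(), Vulkan.VertexBuffer.Memory, 0, Vulkan.VertexBuffer.Size, 0,
    &vertex_buffer_memory_pointer ) != VK_SUCCESS ) {
  printf( "Could not map memory and upload data to a vertex buffer!\n" );
  return false;
}
memcpy( vertex_buffer_memory_pointer, vertex_data, Vulkan.VertexBuffer.Size );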
After the memory copy operation and before we unmap the memory (we don’t need to unmap it; we can keep the
pointer, and this shouldn’t impact performance), we need to tell the driver which parts of the memory were modified by
our operations. This operation is called flushing. Through it we specify all memory ranges that our application copied data
to. Ranges don’t have to be continuous. Ranges are defined by an array of VkMappedMemoryRange elements which
contain these fields:
When we have defined all memory ranges that should be flushed, we can call the vkFlushMappedMemoryRanges()
function. After that, the driver knows which parts were modified and will reload them (that is, refresh its caches).
Reloading usually occurs on barriers. After modifying a buffer, we should set a buffer memory barrier, which tells the
driver that some operations influenced the buffer and it should be refreshed. But, fortunately, in this case such a barrier
is placed implicitly by the driver on submission of a command buffer that references the given buffer, and no additional
operations are required. Now we can use this buffer when recording rendering commands.
To record command buffers and submit them to a queue in an efficient way, we need four types of resources: command
buffers, semaphores, fences, and framebuffers. Semaphores, as we already discussed, are used for internal queue
synchronization. Fences, on the other hand, allow the application to check whether some specific situation occurred,
e.g., whether a command buffer’s execution has finished after it was submitted to a queue. If necessary, the application
can wait on a fence until it is signaled. In general, semaphores are used to synchronize queues (GPU) and fences are used
to synchronize the application (CPU).
To render a single frame of animation we need (at least) one command buffer, two semaphores (one for swapchain
image acquisition, called the image available semaphore, and one to signal that presentation may occur, called the
rendering finished semaphore), a fence, and a framebuffer. The fence is used later to check whether we can rerecord a
given command buffer. We will keep several sets of such rendering resources, each of which we can call a virtual frame.
The number of these virtual frames (each consisting of a command buffer, two semaphores, a fence, and a framebuffer)
should be independent of the number of swapchain images.
The rendering algorithm progresses like this: we record rendering commands to the first virtual frame and then submit
it to a queue. Next we record another frame (command buffer) and submit it to the queue. We do this until we run out of
virtual frames. At this point we start reusing frames by taking the oldest (least recently submitted) command buffer
and rerecording it. Then we use another command buffer, and so on.
This is where the fences come in. We are not allowed to record a command buffer that has been submitted to a queue
until its execution in the queue is finished. During command buffer recording we can use the “simultaneous use” flag,
which allows us to record or resubmit a command buffer that has already been submitted, but this may impact
performance. A better way is to use fences and check whether a command buffer is no longer in use. If a graphics card is
still processing a command buffer, we can wait on the fence associated with the given command buffer, or use this
additional time for other purposes, like improved AI calculations, and check again after some time whether the fence is
signaled.
How many virtual frames should we have? One is not enough. When we record and submit a single command buffer,
we immediately wait until we can rerecord it. This is a waste of time for both the CPU and the GPU. The GPU is usually
faster, so waiting on the CPU causes more waiting on the GPU. We should keep the GPU as busy as possible; that is why
thin APIs like Vulkan were created. Using two virtual frames gives a huge performance gain, as there is much less waiting
both on the CPU and the GPU. Adding a third virtual frame gives an additional performance gain, but the increase isn’t as
big. Using four or more sets of rendering resources doesn’t make sense, as the performance gain is negligible (of course,
this may depend on the complexity of the rendered scene and the calculations performed by the CPU, like physics or AI).
When we increase the number of virtual frames we also increase the input lag, as we present a frame that’s one to three
frames behind the CPU. So two or three virtual frames seem to be the most reasonable compromise between
performance, memory usage, and input lag.
You may wonder why the number of virtual frames shouldn’t be tied to the number of swapchain images. Such a
connection may influence the behavior of our application. When we create a swapchain, we ask for the minimal
required number of images, but the driver is allowed to create more. So different hardware vendors may implement
drivers that offer different numbers of swapchain images, even for the same requirements (present mode and minimal
number of images). When we connect the number of virtual frames with the number of swapchain images, our application
will use only two virtual frames on one graphics card, but four virtual frames on another. This may influence both
performance and the input lag mentioned earlier; it’s not desired behavior. By keeping the number of virtual frames
fixed, we can control our rendering algorithm and fine-tune it to our needs, that is, balance the time spent on rendering
and on AI or physics calculations.
The command pool is created by calling vkCreateCommandPool(), which requires us to provide a pointer to a variable
of type VkCommandPoolCreateInfo. The code remains mostly unchanged compared to previous tutorials, but this time
two additional flags are added for command pool creation.
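A sketch of the create info with both flags; the reset flag is the one referenced later during command buffer recording,
while the transient flag (an assumption here) hints to the driver that the pool’s buffers will be short-lived and rerecorded
often:
VkCommandPoolCreateInfo cmd_pool_create_info = {
  VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,        // VkStructureType           sType
  nullptr,                                           // const void               *pNext
  VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT |  // VkCommandPoolCreateFlags  flags
    VK_COMMAND_POOL_CREATE_TRANSIENT_BIT,
  GetGraphicsQueue().FamilyIndex                     // uint32_t                  queueFamilyIndex
};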
The only change is that command buffers are gathered into a vector of rendering resources. Each rendering resource
structure contains a command buffer, image available semaphore, rendering finished semaphore, a fence and a
framebuffer. Command buffers are allocated in a loop. The number of elements in a rendering resources vector is chosen
arbitrarily. For this tutorial it is equal to three.
Semaphore Creation
The code responsible for creating a semaphore is simple and the same as previously shown:
VkSemaphoreCreateInfo semaphore_create_info = {
VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO, // VkStructureType sType
nullptr, // const void* pNext
0 // VkSemaphoreCreateFlags flags
};
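The structure is then passed to vkCreateSemaphore(); in this sketch, semaphore stands for either of a virtual frame’s
two semaphores:
VkSemaphore semaphore = VK_NULL_HANDLE;
if( vkCreateSemaphore( GetDevice(), &semaphore_create_info, nullptr, &semaphore ) != VK_SUCCESS ) {
  printf( "Could not create a semaphore!\n" );
  return false;
}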
Fence Creation
Here is the code responsible for creating fence objects:
VkFenceCreateInfo fence_create_info = {
  VK_STRUCTURE_TYPE_FENCE_CREATE_INFO,  // VkStructureType     sType
  nullptr,                              // const void         *pNext
  VK_FENCE_CREATE_SIGNALED_BIT          // VkFenceCreateFlags  flags
};
To create a fence object we call the vkCreateFence() function. It accepts, among other parameters, a pointer to a
variable of type VkFenceCreateInfo like the one prepared above.
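A sketch of the call, where fence stands for the fence member of a given virtual frame:
VkFence fence = VK_NULL_HANDLE;
if( vkCreateFence( GetDevice(), &fence_create_info, nullptr, &fence ) != VK_SUCCESS ) {
  printf( "Could not create a fence!\n" );
  return false;
}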
Why create a fence that is already signaled? Our rendering algorithm will record commands to the first command
buffer, then to the second command buffer, after that to the third, and then once again to the first (after its execution in
a queue has ended). We use fences to check whether we can record a given command buffer once again. But what about
the first recording? We don’t want to keep separate code paths for the first command buffer recording and for the
following recording operations. So when we issue a command buffer recording for the first time, we also check whether
a fence is already signaled. But because we didn’t submit a given command buffer, the fence associated with it can’t
become signaled as a result of the finished execution. So the fence needs to be created in an already signaled state. This
way, for the first time, we won’t have to wait for it to become signaled (as it is already signaled), but after the check we
will reset it and immediately go to the recording code. After that we submit a command buffer and provide the same
fence, which will get signaled by the queue when operations are done. The next time, when we want to rerecord rendering
commands to the same command buffer, we can do the same operations: wait on the fence, reset it, and then start
command buffer recording.
Drawing
Now we are nearly ready to record rendering operations. We record each command buffer just before it is
submitted to the queue. We record one command buffer and submit it, then the next command buffer and submit it, then
yet another one. After that we take the first command buffer, check whether we can use it again, and then record it and
submit it to the queue.
static size_t resource_index = 0;
RenderingResourcesData &current_rendering_resource = Vulkan.RenderingResources[resource_index];
VkSwapchainKHR swap_chain = GetSwapChain().Handle;
uint32_t image_index;

VkPresentInfoKHR present_info = {
  VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,                      // VkStructureType        sType
  nullptr,                                                 // const void            *pNext
  1,                                                       // uint32_t               waitSemaphoreCount
  &current_rendering_resource.FinishedRenderingSemaphore,  // const VkSemaphore     *pWaitSemaphores
  1,                                                       // uint32_t               swapchainCount
  &swap_chain,                                             // const VkSwapchainKHR  *pSwapchains
  &image_index,                                            // const uint32_t        *pImageIndices
  nullptr                                                  // VkResult              *pResults
};
result = vkQueuePresentKHR( GetPresentQueue().Handle, &present_info );
switch( result ) {
case VK_SUCCESS:
break;
case VK_ERROR_OUT_OF_DATE_KHR:
case VK_SUBOPTIMAL_KHR:
return OnWindowSizeChanged();
default:
std::cout << "Problem occurred during image presentation!" << std::endl;
return false;
}
return true;
17. Tutorial04.cpp, function Draw()
So first we take the least recently used rendering resource. Then we wait until the fence associated with this group is
signaled. If it is, we can safely take the command buffer and record it. It also means that we can take the semaphores
used to acquire and present the image that was referenced in the given command buffer. We shouldn’t use the same
semaphore for different purposes or in two different submit operations until the previous submission is finished. The
fences prevent us from altering command buffers and semaphores that are still in use and, as you will soon see,
framebuffers too.
When the wait on a fence is finished, we reset the fence and perform the normal drawing-related operations: we acquire
an image, record operations that render into the acquired image, submit the command buffer, and present the image.
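The wait-and-reset part might look like this (a sketch; the Fence member name follows the rendering resources structure
described earlier, and the one-second timeout is arbitrary):
if( vkWaitForFences( GetDevice(), 1, &current_rendering_resource.Fence, VK_FALSE, 1000000000 ) != VK_SUCCESS ) {
  printf( "Waiting for fence takes too long!\n" );
  return false;
}
vkResetFences( GetDevice(), 1, &current_rendering_resource.Fence );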
After that we take another set of rendering resources and perform these same operations. Thanks to keeping three
groups of rendering resources, three virtual frames, we lower the time wasted on waiting for a fence to be signaled.
Framebuffer creation is simple and fast. Keeping framebuffer objects along with a swapchain means that we need to
recreate them whenever the swapchain is recreated. If our rendering algorithm is complicated, we have multiple
images and framebuffers associated with them. If those images need to have the same size as the swapchain images, we
need to recreate all of them (to account for a potential size change). So it is better and more convenient to create
framebuffers on demand; this way, they always have the desired size. Framebuffers operate on image views, which are
created for a given, specific image. When a swapchain is recreated, the old images become invalid (nonexistent), so we
must recreate the image views and the framebuffers as well.
In the “03 – First Triangle” tutorial, we had framebuffers of a fixed size and they had to be recreated along with the
swapchain. Now we have a framebuffer object in each of our virtual frame resource groups. Before we record a
command buffer, we create a framebuffer for the image we will be rendering into, of the same size as that image.
This way, when the swapchain is recreated, the size of the next frame is immediately adjusted, and the handle of the new
swapchain image and its image view are used to create the framebuffer.
When we record a command buffer that uses a render pass and framebuffer objects, the framebuffer must remain
valid for the whole time the command buffer is processed by the queue. When we create a new framebuffer, we can’t
destroy it until commands submitted to a queue are finished. But as we are using fences, and we have already waited on
a fence associated with a given command buffer, we are sure that the framebuffer can be safely destroyed. We then
create a new framebuffer to include potential size and image handle changes.
if( framebuffer != VK_NULL_HANDLE ) {
  vkDestroyFramebuffer( GetDevice(), framebuffer, nullptr );
}

VkFramebufferCreateInfo framebuffer_create_info = {
  VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO,  // VkStructureType           sType
  nullptr,                                    // const void               *pNext
  0,                                          // VkFramebufferCreateFlags  flags
  Vulkan.RenderPass,                          // VkRenderPass              renderPass
  1,                                          // uint32_t                  attachmentCount
  &image_view,                                // const VkImageView        *pAttachments
  GetSwapChain().Extent.width,                // uint32_t                  width
  GetSwapChain().Extent.height,               // uint32_t                  height
  1                                           // uint32_t                  layers
};

if( vkCreateFramebuffer( GetDevice(), &framebuffer_create_info, nullptr, &framebuffer ) != VK_SUCCESS ) {
  printf( "Could not create a framebuffer!\n" );
  return false;
}

return true;
18. Tutorial04.cpp, function CreateFramebuffer()
When we create a framebuffer, we take current swapchain extents and image view for an acquired swapchain image.
VkCommandBufferBeginInfo command_buffer_begin_info = {
  VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,  // VkStructureType                        sType
  nullptr,                                      // const void                            *pNext
  VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT,  // VkCommandBufferUsageFlags              flags
  nullptr                                       // const VkCommandBufferInheritanceInfo  *pInheritanceInfo
};

VkImageSubresourceRange image_subresource_range = {
  VK_IMAGE_ASPECT_COLOR_BIT,  // VkImageAspectFlags  aspectMask
  0,                          // uint32_t            baseMipLevel
  1,                          // uint32_t            levelCount
  0,                          // uint32_t            baseArrayLayer
  1                           // uint32_t            layerCount
};

VkClearValue clear_value = {
  { 1.0f, 0.8f, 0.4f, 0.0f },  // VkClearColorValue  color
};

VkRenderPassBeginInfo render_pass_begin_info = {
  VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,  // VkStructureType      sType
  nullptr,                                   // const void          *pNext
  Vulkan.RenderPass,                         // VkRenderPass         renderPass
  framebuffer,                               // VkFramebuffer        framebuffer
  {                                          // VkRect2D             renderArea
    {                                        // VkOffset2D           offset
      0,                                     // int32_t              x
      0                                      // int32_t              y
    },
    GetSwapChain().Extent,                   // VkExtent2D           extent
  },
  1,                                         // uint32_t             clearValueCount
  &clear_value                               // const VkClearValue  *pClearValues
};
First we define a variable of type VkCommandBufferBeginInfo and specify that the command buffer will be submitted
only once. When we specify the VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT flag, we can't submit a given
command buffer more than once; after each submission it must be reset. But the recording operation resets it implicitly, thanks to the
VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT flag used during command pool creation.
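So after filling this structure, starting the recording is a single call; the implicit reset is legal only because of that pool creation flag:

// vkBeginCommandBuffer() implicitly resets the command buffer, which is allowed
// because the pool was created with VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT
vkBeginCommandBuffer( command_buffer, &command_buffer_begin_info );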
Next we define subresource ranges for image memory barriers. Layout transitions of the swapchain images are
performed implicitly inside a render pass, but if the graphics and presentation queues are different, the queue family
ownership transition must be performed manually.
After that we begin a render pass with the temporary framebuffer object.
vkCmdBindPipeline( command_buffer, VK_PIPELINE_BIND_POINT_GRAPHICS, Vulkan.GraphicsPipeline );

VkViewport viewport = {
  0.0f,                                             // float                    x
  0.0f,                                             // float                    y
  static_cast<float>(GetSwapChain().Extent.width),  // float                    width
  static_cast<float>(GetSwapChain().Extent.height), // float                    height
  0.0f,                                             // float                    minDepth
  1.0f                                              // float                    maxDepth
};

VkRect2D scissor = {
  {                                                 // VkOffset2D               offset
    0,                                              // int32_t                  x
    0                                               // int32_t                  y
  },
  {                                                 // VkExtent2D               extent
    GetSwapChain().Extent.width,                    // uint32_t                 width
    GetSwapChain().Extent.height                    // uint32_t                 height
  }
};

vkCmdSetViewport( command_buffer, 0, 1, &viewport );
vkCmdSetScissor( command_buffer, 0, 1, &scissor );

VkDeviceSize offset = 0;
vkCmdBindVertexBuffers( command_buffer, 0, 1, &Vulkan.VertexBuffer.Handle, &offset );

vkCmdDraw( command_buffer, 4, 1, 0, 0 );

vkCmdEndRenderPass( command_buffer );
Next we bind a graphics pipeline. It has two states marked as dynamic: viewport and scissor test. So we prepare
structures that define viewport and scissor test parameters. The dynamic viewport state is set by calling the
vkCmdSetViewport() function. The dynamic scissor test is set by calling the vkCmdSetScissor() function. This way, our
graphics pipeline can be used for rendering into images of different sizes.
The last thing before we can draw anything is to bind the appropriate vertex buffer, providing buffer data for vertex
attributes. We do this through the vkCmdBindVertexBuffers() function call. We specify a binding number (which set of
vertex attributes should take data from this buffer), a pointer to a buffer handle (or more handles if we want to bind
buffers for multiple bindings), and an offset. A non-zero offset means that data for vertex attributes is taken from further
parts of the buffer. The offset can't be larger than the size of the corresponding buffer (the buffer, not the memory object
bound to it).
Now we have specified all the required elements: framebuffer, viewport and scissor test, and a vertex buffer. We can
draw the geometry, finish the render pass, and end the command buffer.
Tutorial04 Execution
Here is the result of rendering operations:
We are rendering a quad that has different colors in each corner. Try resizing the window; previously, the triangle was
always the same size, only the black frame on the right and bottom sides of an application window grew larger or smaller.
Now, thanks to the dynamic viewport state, the quad is growing or shrinking along with the window.
Cleaning Up
After rendering and before closing the application, we should destroy all resources. Here is a code responsible for this
operation:
if( GetDevice() != VK_NULL_HANDLE ) {
  vkDeviceWaitIdle( GetDevice() );

  // ...here we destroy all resources, in the order described below...
}
We destroy all resources after the device completes processing of all commands submitted to all its queues. We destroy
resources in reverse order. First we destroy all rendering resources: framebuffers, command buffers, semaphores, and
fences. Fences are destroyed by calling the vkDestroyFence() function. Then the command pool is destroyed. After that
we destroy the buffer by calling the vkDestroyBuffer() function and free the memory object by calling the vkFreeMemory()
function. Finally, the pipeline object and the render pass are destroyed.
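A sketch of such a shutdown sequence is shown below; it follows the order described above, though the exact member names are assumptions based on this tutorial's conventions:

for( size_t i = 0; i < Vulkan.RenderingResources.size(); ++i ) {
  vkDestroyFramebuffer( GetDevice(), Vulkan.RenderingResources[i].Framebuffer, nullptr );
  vkDestroySemaphore( GetDevice(), Vulkan.RenderingResources[i].ImageAvailableSemaphore, nullptr );
  vkDestroySemaphore( GetDevice(), Vulkan.RenderingResources[i].FinishedRenderingSemaphore, nullptr );
  vkDestroyFence( GetDevice(), Vulkan.RenderingResources[i].Fence, nullptr );
}
// Command buffers are freed together with their pool
vkDestroyCommandPool( GetDevice(), Vulkan.CommandPool, nullptr );
vkDestroyBuffer( GetDevice(), Vulkan.VertexBuffer.Handle, nullptr );
vkFreeMemory( GetDevice(), Vulkan.VertexBuffer.Memory, nullptr );
vkDestroyPipeline( GetDevice(), Vulkan.GraphicsPipeline, nullptr );
vkDestroyRenderPass( GetDevice(), Vulkan.RenderPass, nullptr );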
Conclusion
This tutorial is based on the “03 – First Triangle” tutorial. We improved rendering by using vertex attributes in a
graphics pipeline and vertex buffers bound during command buffer recording. We described the number and layout of
vertex attributes. We introduced dynamic pipeline states for the viewport and scissor test. We learned how to create
buffers and memory objects and how to bind one to another. We also mapped memory and uploaded data from the CPU to
the GPU.
We have created a set of rendering resources that allow us to efficiently record and issue rendering commands. These
resources consisted of command buffers, semaphores, fences, and framebuffers. We learned how to use fences, how to
set up values of dynamic pipeline states, and how to bind vertex buffers (source of vertex attribute data) during command
buffer recording.
The next tutorial will present staging resources. These are intermediate buffers used to copy data between the CPU
and GPU. This way, buffers (or images) used for rendering don’t have to be mapped by an application and can be bound
to a device’s local (very fast) memory.
Notices
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware,
software or service activation. Performance varies depending on system configuration. Check with your system
manufacturer or retailer or learn more at intel.com.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this
document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of
merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of
performance, course of dealing, or usage in trade.
This document contains information on products, services and/or processes in development. All information provided
here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule,
specifications and roadmaps.
The products and services described may contain defects or errors known as errata which may cause deviations from
published specifications. Current characterized errata are available on request.
Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-
800-548-4725 or by visiting www.intel.com/design/literature.htm.
This sample source code is released under the Intel Sample Source Code License Agreement.
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
Part 5: Staging Resources
What are “staging resources” or “staging buffers”? They are intermediate or temporary resources used to transfer
data from an application (CPU) to a graphics card’s memory (GPU). We need them to increase our application’s
performance.
In Part 4 of the tutorial we learned how to use buffers, bind them to a host-visible memory, map this memory, and
transfer data from the CPU to the GPU. This approach is easy and convenient for us, but we need to know that host-visible
parts of a graphics card’s memory aren’t the most efficient. Typically, they are much slower than the parts of the memory
that are not directly accessible to the application (cannot be mapped by an application). This causes our application to
execute in a sub-optimal way.
One solution to this problem is to always use device-local memory for all resources involved in a rendering process.
But as device-local memory isn’t accessible for an application, we cannot directly transfer any data from the CPU to such
memory. That’s why we need intermediate, or staging, resources.
In this part of the tutorial we will bind the buffer with vertex attribute data to the device-local memory. And we will
use the staging buffer to mediate the transfer of data from the CPU to the vertex buffer.
Again, only the differences between this tutorial and the previous tutorial (Part 4) are described.
First we create a command pool for which we indicate that command buffers allocated from this pool will be short
lived. In our case, all command buffers will be submitted only once before rerecording.
Next we iterate over an arbitrarily chosen number of virtual frames. In this code example, the number of virtual frames
is three. Inside the loop, for each virtual frame, we allocate one command buffer, create two semaphores (one for image
acquisition and a second to indicate that frame rendering is done) and a fence. Framebuffer creation is done inside a
drawing function, just before command buffer recording.
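For reference, a command pool with such short-lived, individually resettable command buffers could be created like the following sketch (the queue family getter is an assumption, not necessarily the tutorial's actual helper):

VkCommandPoolCreateInfo cmd_pool_create_info = {
  VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,       // VkStructureType            sType
  nullptr,                                          // const void                *pNext
  VK_COMMAND_POOL_CREATE_TRANSIENT_BIT |            // VkCommandPoolCreateFlags   flags
  VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT,
  GetGraphicsQueue().FamilyIndex                    // uint32_t                   queueFamilyIndex
};

if( vkCreateCommandPool( GetDevice(), &cmd_pool_create_info, nullptr, &Vulkan.CommandPool ) != VK_SUCCESS ) {
  return false;
}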
This is the same set of rendering resources used in Part 4, where you can find a more thorough explanation of what is
going on in the code. I will also skip render pass and graphics pipeline creation. They are created in exactly the same way
they were created previously. Since nothing has changed here, we will jump directly to buffer creation.
Buffer creation
Here is our general code used for buffer creation:
VkBufferCreateInfo buffer_create_info = {
  VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO,           // VkStructureType            sType
  nullptr,                                        // const void                *pNext
  0,                                              // VkBufferCreateFlags        flags
  buffer.Size,                                    // VkDeviceSize               size
  usage,                                          // VkBufferUsageFlags         usage
  VK_SHARING_MODE_EXCLUSIVE,                      // VkSharingMode              sharingMode
  0,                                              // uint32_t                   queueFamilyIndexCount
  nullptr                                         // const uint32_t            *pQueueFamilyIndices
};

if( vkCreateBuffer( GetDevice(), &buffer_create_info, nullptr, &buffer.Handle ) != VK_SUCCESS ) {
  std::cout << "Could not create buffer!" << std::endl;
  return false;
}

// Memory allocation and binding for the buffer are shown below

return true;
2. Tutorial05.cpp, function CreateBuffer()
The code is wrapped into a CreateBuffer() function, which accepts the buffer’s usage, size, and requested memory
properties. To create a buffer we need to prepare a variable of type VkBufferCreateInfo; its members are annotated in the
code above.
Right now we are not interested in binding a sparse memory. We do not want to share the buffer between different
device queues, so sharingMode, queueFamilyIndexCount, and pQueueFamilyIndices parameters are irrelevant. The most
important parameters are size and usage. We are not allowed to use a buffer in a way that is not specified during buffer
creation. Finally, we need to create a buffer that is large enough to contain our data.
To create a buffer we call the vkCreateBuffer() function, which when successful stores the buffer handle in a variable
we provided the address of. But creating a buffer is not enough. A buffer, after creation, doesn’t have any storage. We
need to bind a memory object (or part of it) to the buffer to back its storage. Or, if we don’t have any memory objects, we
need to allocate one.
Each buffer’s usage may have a different memory requirement, which is relevant when we want to allocate a memory
object and bind it to the buffer. Here is a code sample that allocates a memory object for a given buffer:
VkMemoryRequirements buffer_memory_requirements;
vkGetBufferMemoryRequirements( GetDevice(), buffer, &buffer_memory_requirements );

VkPhysicalDeviceMemoryProperties memory_properties;
vkGetPhysicalDeviceMemoryProperties( GetPhysicalDevice(), &memory_properties );

// Iterate over the available memory types and pick one that is compatible with the
// buffer's requirements and with our additional, requested properties
for( uint32_t i = 0; i < memory_properties.memoryTypeCount; ++i ) {
  if( (buffer_memory_requirements.memoryTypeBits & (1 << i)) &&
      ((memory_properties.memoryTypes[i].propertyFlags & property) == property) ) {

    VkMemoryAllocateInfo memory_allocate_info = {
      VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,     // VkStructureType            sType
      nullptr,                                    // const void                *pNext
      buffer_memory_requirements.size,            // VkDeviceSize               allocationSize
      i                                           // uint32_t                   memoryTypeIndex
    };

    if( vkAllocateMemory( GetDevice(), &memory_allocate_info, nullptr, memory ) == VK_SUCCESS ) {
      return true;
    }
  }
}
return false;
Similarly to the code in Part 4, we first check what the memory requirements for a given buffer are. After that we
query the properties of the memory available in a given physical device. These contain information about the available
memory heaps, memory types, and their capabilities.
Next we iterate over each available memory type and check if it is compatible with the requirement queried for a
given buffer. We also check if a given memory type supports our additional, requested properties, for example, whether
a given memory type is host-visible. When we find a match, we fill in a VkMemoryAllocateInfo structure and call a
vkAllocateMemory() function.
The allocated memory object is then bound to our buffer, and from now on we can safely use this buffer in our
application.
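The binding itself is a single call; a sketch, using the buffer wrapper from the listings above:

// Bind the allocated memory object to the buffer, starting at offset zero
if( vkBindBufferMemory( GetDevice(), buffer.Handle, buffer.Memory, 0 ) != VK_SUCCESS ) {
  std::cout << "Could not bind memory to a buffer!" << std::endl;
  return false;
}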
We also need to specify two different usages for this buffer. The first is a vertex buffer usage, which means that we
want to use the given buffer as a vertex buffer from which data for the vertex attributes will be fetched. The second is
transfer dst usage, which means that we will copy data to this buffer. It will be used as a destination of any transfer (copy)
operation.
The code that creates a buffer with all these requirements looks like this:
const std::vector<float>& vertex_data = GetVertexData();

Vulkan.VertexBuffer.Size = static_cast<uint32_t>(vertex_data.size() * sizeof(vertex_data[0]));
if( !CreateBuffer( VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, Vulkan.VertexBuffer ) ) {
  std::cout << "Could not create vertex buffer!" << std::endl;
  return false;
}

return true;
4. Tutorial05.cpp, function CreateVertexBuffer()
At the beginning we get the vertex data (hard-coded in a GetVertexData() function) to check how much space we need
to hold values for all our vertices. After that we call a CreateBuffer() function presented earlier to create a vertex buffer
and bind a device-local memory to it.
Vulkan.StagingBuffer.Size = 4000; // chosen arbitrarily, large enough for our vertex data

if( !CreateBuffer( VK_BUFFER_USAGE_TRANSFER_SRC_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT, Vulkan.StagingBuffer ) ) {
  std::cout << "Could not create staging buffer!" << std::endl;
  return false;
}

return true;
5. Tutorial05.cpp, function CreateStagingBuffer()
We will copy data from this buffer to other resources, so we must specify a transfer src usage for it (it will be used as
a source for transfer operations). We would also like to map it to be able to directly copy any data from the application.
For this we need to use a host-visible memory and that’s why we specify this memory property. The buffer’s size is chosen
arbitrarily, but it should be large enough to hold the vertex data. In real-life scenarios we should try to reuse the
staging buffer as many times as possible, so its size should be big enough to cover most of the data transfer
operations in our application. Of course, if we want to perform many transfer operations at the same time, we have to create
multiple staging buffers.
First let’s see what our data for vertex attributes looks like:
static const std::vector<float> vertex_data = {
-0.7f, -0.7f, 0.0f, 1.0f,
1.0f, 0.0f, 0.0f, 0.0f,
//
-0.7f, 0.7f, 0.0f, 1.0f,
0.0f, 1.0f, 0.0f, 0.0f,
//
0.7f, -0.7f, 0.0f, 1.0f,
0.0f, 0.0f, 1.0f, 0.0f,
//
0.7f, 0.7f, 0.0f, 1.0f,
0.3f, 0.3f, 0.3f, 0.0f
};
return vertex_data;
6. Tutorial05.cpp, function GetVertexData()
It is a simple, hard-coded array of floating point values. Data for each vertex contains four components for position
attribute and four components for color attribute. As we render a quad, we have four pairs of such attributes.
Here is the code that copies data from the application to the staging buffer and after that from the staging buffer to
the vertex buffer:
// Prepare data in a staging buffer
const std::vector<float>& vertex_data = GetVertexData();

void *staging_buffer_memory_pointer;
if( vkMapMemory( GetDevice(), Vulkan.StagingBuffer.Memory, 0, Vulkan.VertexBuffer.Size, 0, &staging_buffer_memory_pointer ) != VK_SUCCESS ) {
  std::cout << "Could not map memory and upload data to a staging buffer!" << std::endl;
  return false;
}

memcpy( staging_buffer_memory_pointer, &vertex_data[0], Vulkan.VertexBuffer.Size );

VkMappedMemoryRange flush_range = {
  VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE,          // VkStructureType            sType
  nullptr,                                        // const void                *pNext
  Vulkan.StagingBuffer.Memory,                    // VkDeviceMemory             memory
  0,                                              // VkDeviceSize               offset
  Vulkan.VertexBuffer.Size                        // VkDeviceSize               size
};
vkFlushMappedMemoryRanges( GetDevice(), 1, &flush_range );

vkUnmapMemory( GetDevice(), Vulkan.StagingBuffer.Memory );

// Prepare command buffer to copy data from staging buffer to a vertex buffer
VkCommandBufferBeginInfo command_buffer_begin_info = {
  VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,    // VkStructureType                        sType
  nullptr,                                        // const void                            *pNext
  VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT,    // VkCommandBufferUsageFlags              flags
  nullptr                                         // const VkCommandBufferInheritanceInfo  *pInheritanceInfo
};

vkBeginCommandBuffer( command_buffer, &command_buffer_begin_info );

VkBufferCopy buffer_copy_info = {
  0,                                              // VkDeviceSize               srcOffset
  0,                                              // VkDeviceSize               dstOffset
  Vulkan.VertexBuffer.Size                        // VkDeviceSize               size
};
vkCmdCopyBuffer( command_buffer, Vulkan.StagingBuffer.Handle, Vulkan.VertexBuffer.Handle, 1, &buffer_copy_info );

VkBufferMemoryBarrier buffer_memory_barrier = {
  VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER,        // VkStructureType            sType
  nullptr,                                        // const void                *pNext
  VK_ACCESS_MEMORY_WRITE_BIT,                     // VkAccessFlags              srcAccessMask
  VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT,            // VkAccessFlags              dstAccessMask
  VK_QUEUE_FAMILY_IGNORED,                        // uint32_t                   srcQueueFamilyIndex
  VK_QUEUE_FAMILY_IGNORED,                        // uint32_t                   dstQueueFamilyIndex
  Vulkan.VertexBuffer.Handle,                     // VkBuffer                   buffer
  0,                                              // VkDeviceSize               offset
  VK_WHOLE_SIZE                                   // VkDeviceSize               size
};
vkCmdPipelineBarrier( command_buffer, VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_VERTEX_INPUT_BIT, 0, 0, nullptr, 1, &buffer_memory_barrier, 0, nullptr );

vkEndCommandBuffer( command_buffer );

// Submit command buffer and copy data from staging buffer to a vertex buffer
VkSubmitInfo submit_info = {
  VK_STRUCTURE_TYPE_SUBMIT_INFO,                  // VkStructureType            sType
  nullptr,                                        // const void                *pNext
  0,                                              // uint32_t                   waitSemaphoreCount
  nullptr,                                        // const VkSemaphore         *pWaitSemaphores
  nullptr,                                        // const VkPipelineStageFlags *pWaitDstStageMask
  1,                                              // uint32_t                   commandBufferCount
  &command_buffer,                                // const VkCommandBuffer     *pCommandBuffers
  0,                                              // uint32_t                   signalSemaphoreCount
  nullptr                                         // const VkSemaphore         *pSignalSemaphores
};

// GetGraphicsQueue().Handle follows this tutorial's naming conventions
if( vkQueueSubmit( GetGraphicsQueue().Handle, 1, &submit_info, VK_NULL_HANDLE ) != VK_SUCCESS ) {
  return false;
}

vkDeviceWaitIdle( GetDevice() );
return true;
7. Tutorial05.cpp, function CopyVertexData()
At the beginning, we get vertex data and map the staging buffer’s memory by calling the vkMapMemory() function.
During the call, we specify the handle of the memory bound to the staging buffer and the size of the data. This gives us a pointer
that we can use in an ordinary memcpy() function to copy data from our application to graphics hardware.
Next we flush the mapped memory to tell the driver which parts of a memory object were modified. We can specify
multiple ranges of memory if needed. We have one memory area that should be flushed and we specify it by creating a
variable of type VkMappedMemoryRange and by calling a vkFlushMappedMemoryRanges() function. After that we
unmap the memory, but we don’t have to do this. We can keep a pointer for later use and this should not affect the
performance of our application.
Next we start preparing a command buffer. We specify that it will be submitted only once before it will be reset. We
fill a VkCommandBufferBeginInfo structure and provide it to a vkBeginCommandBuffer() function.
Now we perform the copy operation. First a variable of type VkBufferCopy is created. It contains the following fields:
srcOffset – Offset in bytes in a source buffer from which we want to copy data.
dstOffset – Offset in bytes in a destination buffer into which we want to copy data.
size – Size of the data (in bytes) we want to copy.
We copy data from the beginning of a staging buffer and to the beginning of a vertex buffer, so we specify zero for
both offsets. The size of the vertex buffer was calculated based on the hard-coded vertex data, so we copy the same
number of bytes. To copy data from one buffer to another, we call a vkCmdCopyBuffer() function.
First we prepare a variable of type VkBufferMemoryBarrier; its members are annotated in the code above.
As you can see, we can set up a barrier for a specific range of a buffer’s memory. But here we do it for the whole buffer,
so we specify an offset of 0 and the VK_WHOLE_SIZE value for the size. We don’t want to transfer ownership between
different queue families, so we use the VK_QUEUE_FAMILY_IGNORED value both for srcQueueFamilyIndex and
dstQueueFamilyIndex.
The most important parameters are srcAccessMask and dstAccessMask. We have copied data from the staging buffer
to the vertex buffer. So before the barrier, the vertex buffer was used as a destination for transfer operations and its memory
was written to. That’s why we specified VK_ACCESS_MEMORY_WRITE_BIT for the srcAccessMask field. But after the
barrier, the buffer will be used only as a source of data for vertex attributes. So for the dstAccessMask field we specify
VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT.
To set up a barrier we call a vkCmdPipelineBarrier() function. And to finish command buffer recording, we call
vkEndCommandBuffer(). Next, for all of the above operations to execute, we submit the command buffer by calling the
vkQueueSubmit() function.
Normally during command buffer submission we should provide a fence, which is signaled once all transfer operations
(and the whole command buffer) are finished. But here, for the sake of simplicity, we call vkDeviceWaitIdle() and wait for all
operations executed on a given device to finish. Once all operations complete, we have successfully transferred data to
the device-local memory and we can use the vertex buffer without worrying about performance loss.
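For completeness, the fence-based variant mentioned above could look like this sketch (copy_fence is a hypothetical fence created beforehand; the queue getter is also an assumption):

if( vkQueueSubmit( GetGraphicsQueue().Handle, 1, &submit_info, copy_fence ) != VK_SUCCESS ) {
  return false;
}
// Block only until this particular submission is finished,
// instead of waiting for the whole device to become idle
vkWaitForFences( GetDevice(), 1, &copy_fence, VK_FALSE, 1000000000 );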
Tutorial05 Execution
The results of the rendering operations are exactly the same as in Part 4:
We render a quad that has different colors in each corner: red, green, dark gray, and blue. The quad should adjust its
size (and aspect ratio) to match the window’s size and shape.
Cleaning Up
In this part of the tutorial, I have also refactored the cleaning code. We have created two buffers, each with a separate
memory object. To avoid code redundancy, I prepared a buffer cleaning function:
if( buffer.Handle != VK_NULL_HANDLE ) {
  vkDestroyBuffer( GetDevice(), buffer.Handle, nullptr );
  buffer.Handle = VK_NULL_HANDLE;
}

if( buffer.Memory != VK_NULL_HANDLE ) {
  vkFreeMemory( GetDevice(), buffer.Memory, nullptr );
  buffer.Memory = VK_NULL_HANDLE;
}

DestroyBuffer( Vulkan.VertexBuffer );
DestroyBuffer( Vulkan.StagingBuffer );
First we wait for all the operations performed by the device to finish. Next we destroy the vertex and staging buffers.
After that we destroy all other resources in the order opposite to their creation: the graphics pipeline, the render pass, and
the resources for each virtual frame, which consist of a framebuffer, a command buffer, two semaphores, and a fence.
Finally we destroy the command pool from which the command buffers were allocated.
Conclusion
In this tutorial we used the recommended technique for transferring data from the application to the graphics
hardware. It gives the best performance for the resources involved in the rendering process while still letting us map and
copy data from the application into the staging buffer. We only need to prepare an additional command buffer recording
and submission to transfer data from one buffer to another.
Using staging buffers is recommended for more than just copying data between buffers. We can use the same
approach to copy data from a buffer to images. And the next part of the tutorial will show how to do this by presenting
the descriptors, descriptor sets, and descriptor layouts, which are another big part of the Vulkan API.
Part 6: Descriptor Sets
In this tutorial, we will focus on a functionality that is similar to OpenGL* textures. But in Vulkan* there are no such
objects. We have only two resource types in which we can store data: buffers and images (there are also push constants,
but we will cover them in a dedicated tutorial). Each of them can be provided to shaders, in which case we call such
resources descriptors, but we can’t provide them to shaders directly. Instead, they are aggregated in wrapper or container
objects called descriptor sets. We can place multiple resources in a single descriptor set but we need to do it according to
a predefined structure of such set. This structure defines the contents of a single descriptor set—types of resources that
are placed inside it, number of each of these resource types, and their order. This description is specified inside objects
named descriptor set layouts. Similar descriptions need to be specified when we write shader programs. Together they
form an interface between API (our application) and the programmable pipeline (shaders).
When we have prepared a layout, and created a descriptor set, we can fill it; in this way we define specific objects
(buffers and/or images) that we want to use in shaders. After that, before issuing drawing commands inside a command
buffer, we need to bind such a set to the command buffer. This allows us to use the resources from inside the shader
source code; for example, fetch data from a sampled image (a texture), or read a value of a uniform variable stored in a
uniform buffer.
In this part of the tutorial, we will see how to create descriptor set layouts and descriptor sets themselves. We will
also prepare a sampler and an image so we can make them available as a texture inside shaders. We will also learn how
we can use them inside shaders.
As mentioned previously, this tutorial is based on the knowledge presented in all the previous parts of the API without
Secrets: Introduction to Vulkan tutorials, and only the differences and parts important for the described topics are
presented.
Creating an Image
We start by creating an image that will act as our texture. Images represent a continuous area of memory, which is
interpreted according to the rules defined during image creation. In Vulkan, we have only three basic image types: 1D, 2D,
and 3D. Images may have mipmaps (levels of detail), many array layers (at least one is required), or multiple samples per pixel.
All these parameters are specified during image creation. In the code sample, we create the most commonly used
two-dimensional image, with one sample per pixel and the four RGBA components.
VkImageCreateInfo image_create_info = {
VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO, // VkStructureType sType;
nullptr, // const void *pNext
0, // VkImageCreateFlags flags
VK_IMAGE_TYPE_2D, // VkImageType imageType
VK_FORMAT_R8G8B8A8_UNORM, // VkFormat format
{ // VkExtent3D extent
width, // uint32_t width
height, // uint32_t height
1 // uint32_t depth
},
1, // uint32_t mipLevels
1, // uint32_t arrayLayers
VK_SAMPLE_COUNT_1_BIT, // VkSampleCountFlagBits samples
VK_IMAGE_TILING_OPTIMAL, // VkImageTiling tiling
VK_IMAGE_USAGE_TRANSFER_DST_BIT | // VkImageUsageFlags usage
VK_IMAGE_USAGE_SAMPLED_BIT,
VK_SHARING_MODE_EXCLUSIVE, // VkSharingMode sharingMode
0, // uint32_t queueFamilyIndexCount
nullptr, // const uint32_t *pQueueFamilyIndices
VK_IMAGE_LAYOUT_UNDEFINED // VkImageLayout initialLayout
};
To create an image we need to prepare a structure of type VkImageCreateInfo. This structure contains the basic set
of parameters required to create an image; they are annotated in the code above.
Most of the parameters defined during image creation are quite self-explanatory or similar to parameters used during
creation of other resources. But three parameters require additional explanation.
Tiling defines the inner memory structure of an image (but don’t confuse it with a layout). Images may have linear or
optimal tiling (buffers always have linear tiling). Images with linear tiling have their texels laid out linearly, one texel after
another, one row after another, and so on. We can query for all the relevant image’s memory parameters (offset and size,
row, array, and depth stride). This way we know how the image’s contents are kept in memory. Such tiling can be used to
copy data to an image directly (by mapping the image’s memory). Unfortunately, there are severe restrictions on images
with linear tiling. For example, the Vulkan specification only requires 2D images to support linear tiling. Hardware
vendors may implement support for linear tiling in other image types, but this is not obligatory, and we can’t rely on such
support. And, what’s more important, linearly tiled images may have worse performance than their optimally tiled counterparts.
When we specify an optimal tiling for images, it means that we don’t know how their memory is structured. Each
platform we execute our application on may keep an image’s contents in a totally different way, so it’s practically
impossible to map an image’s memory and copy it to or from the CPU directly (we need to use a staging resource, a buffer
or an image). But this way we can create whatever images we want (there are no restrictions similar to linearly tiled
images) and our application will have better performance. That’s why it is strongly suggested to always specify optimal
tiling for images.
Now let’s focus on an initialLayout parameter. Layout, as it was described in a tutorial about swapchains, defines an
image’s memory layout and is strictly connected with the way in which we want to use an image. Each specific usage has
its own memory layout. Before we can use an image in a given way we need to perform a layout transition. For example,
swapchain images can be displayed on screen only in VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout. When we want to
render into an image, we need to set its memory layout to VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL. There
is also a general layout that allows us to use images in any way we want, but as it may impact performance, its use is strongly
discouraged (use it only when really necessary).
Now, when we want to change the way in which an image is used, we need to perform the above-mentioned layout
transition. We must specify a current (old) layout and a new one. The old layout can have one of two values: current image
layout or an undefined layout. When we specify the value of a current image’s layout, the image contents are preserved
during transition. But when we don’t need an image’s contents, we can provide an undefined layout. In this way layout
transition may be performed faster.
And this is when the initialLayout parameter comes in. We can specify only two values for it—undefined or
preinitialized. The preinitialized layout value allows us to preserve an image’s contents during the image’s first layout
transition. This way we can copy data to an image with memory mapping; but this is quite impractical. We can only copy
data directly (through memory mapping) to images with linear tiling, which have restrictions as mentioned above.
Practically speaking, these images can only be used as staging resources—for transferring data between GPU and CPU.
But for this purpose we can also use buffers; that’s why it is much easier to copy data using a buffer than using an image
with linear tiling.
All this leads to the conclusion that, in most cases, an undefined layout can be used for an initialLayout parameter. In
such a case, an image’s contents cannot be initialized directly (by mapping its memory). But if we want to, we can copy
data to such an image by using a staging buffer. That approach is presented in this tutorial.
One last thing we need to remember is the usage. Similar to buffers, when we create an image we need to designate
ALL the ways in which we intend to use the image. We can’t change it later and we can’t use the image in a way that
wasn’t specified during its creation. Here, we want to use an image as a texture inside shaders. For this purpose we specify
the VK_IMAGE_USAGE_SAMPLED_BIT usage. We also need a way to upload data to the image. We are going to read it
from an image file and copy it to the image object. This can be done by transferring data using a staging resource. In such
a case, the image will be a target of a transfer operation; that’s why we also specify the
VK_IMAGE_USAGE_TRANSFER_DST_BIT usage.
Now, when we have provided values for all the parameters, we can create an image. This is done by calling the
vkCreateImage() function for which we need to provide a handle of a logical device, a pointer to the structure described
above, and a pointer to a variable of type VkImage in which the handle of the created image will be stored.
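A sketch of that call, storing the handle in the image wrapper used later in this tutorial:

if( vkCreateImage( GetDevice(), &image_create_info, nullptr, &Vulkan.Image.Handle ) != VK_SUCCESS ) {
  std::cout << "Could not create image!" << std::endl;
  return false;
}

Like buffers, a freshly created image has no storage, so next we allocate a memory object for it.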
VkMemoryAllocateInfo memory_allocate_info = {
VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO, // VkStructureType sType
nullptr, // const void *pNext
image_memory_requirements.size, // VkDeviceSize allocationSize
i // uint32_t memoryTypeIndex
};
Each memory type has a specific set of properties. When we want to bind memory to an image, we can have our own
specific requirements too. For example, we may need to access memory directly, by mapping it, so such memory must be
host-visible. If we have additional requirements we can compare them with the properties of each available memory type.
When we find the match, we can use a given memory type and allocate a memory object from it by calling the
vkAllocateMemory() function.
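The requirements query that feeds the structure above is analogous to the buffer case from Part 5; a sketch:

VkMemoryRequirements image_memory_requirements;
vkGetImageMemoryRequirements( GetDevice(), Vulkan.Image.Handle, &image_memory_requirements );
// The loop matching image_memory_requirements.memoryTypeBits against the
// physical device's memory types is the same as for buffers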
After that, we need to bind such memory to our image. We do this by calling the vkBindImageMemory() function and
providing the handle of an image to which we want to bind memory, a handle of a memory object, and an offset from the
beginning of the memory object, like this:
if( vkBindImageMemory( GetDevice(), Vulkan.Image.Handle, Vulkan.Image.Memory, 0 ) != VK_SUCCESS ) {
  std::cout << "Could not bind memory to an image!" << std::endl;
  return false;
}
4. Tutorial06.cpp, function CreateTexture()
The offset value is very important when we bind memory to an object. Resources in Vulkan have specific requirements
for memory offset alignment. Information about these requirements is also available in the image_memory_requirements
variable. The offset we provide when binding memory must be a multiple of the variable’s alignment member. Zero
is always a valid offset value.
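Since the required alignment is always a power of two, a suitable offset can be computed with a small helper like this hypothetical one:

// Round an offset up to the next multiple of a power-of-two alignment
VkDeviceSize AlignUp( VkDeviceSize offset, VkDeviceSize alignment ) {
  return (offset + alignment - 1) & ~(alignment - 1);
}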
Of course, when we want to bind a memory to an image, we don’t need to create a new memory object each time. It
is more optimal to create a small number of larger memory objects and bind parts of them by providing a proper offset
value.
Creating Image View
When we want to use an image in our application we rarely provide the image’s handle. Image views are usually used
instead. They provide an additional layer that interprets the contents of an image for the purpose of using it in a specific
context. For example, we may have a multilayer image (2D array) and we want to render only to a specific array layer. To
do this we create an image view in which we define the layer we want to use. Another example is an image with six array
layers. Using image views, we can interpret it as a cubemap.
Creation of image views was described in Introduction to Vulkan Part 3: First Triangle, so I will provide only the source
code used in this part.
VkImageViewCreateInfo image_view_create_info = {
VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO, // VkStructureType sType
nullptr, // const void *pNext
0, // VkImageViewCreateFlags flags
image_parameters.Handle, // VkImage image
VK_IMAGE_VIEW_TYPE_2D, // VkImageViewType viewType
VK_FORMAT_R8G8B8A8_UNORM, // VkFormat format
{ // VkComponentMapping components
VK_COMPONENT_SWIZZLE_IDENTITY, // VkComponentSwizzle r
VK_COMPONENT_SWIZZLE_IDENTITY, // VkComponentSwizzle g
VK_COMPONENT_SWIZZLE_IDENTITY, // VkComponentSwizzle b
VK_COMPONENT_SWIZZLE_IDENTITY // VkComponentSwizzle a
},
{ // VkImageSubresourceRange subresourceRange
VK_IMAGE_ASPECT_COLOR_BIT, // VkImageAspectFlags aspectMask
0, // uint32_t baseMipLevel
1, // uint32_t levelCount
0, // uint32_t baseArrayLayer
1 // uint32_t layerCount
}
};
void *staging_buffer_memory_pointer;
if( vkMapMemory( GetDevice(), Vulkan.StagingBuffer.Memory, 0, data_size, 0, &staging_buffer_memory_pointer ) != VK_SUCCESS ) {
  std::cout << "Could not map memory and upload texture data to a staging buffer!" << std::endl;
  return false;
}

// texture_data holds the image's pixels read from a file
memcpy( staging_buffer_memory_pointer, &texture_data[0], data_size );

VkMappedMemoryRange flush_range = {
  VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE, // VkStructureType sType
  nullptr,                               // const void *pNext
  Vulkan.StagingBuffer.Memory,           // VkDeviceMemory memory
  0,                                     // VkDeviceSize offset
  data_size                              // VkDeviceSize size
};
vkFlushMappedMemoryRanges( GetDevice(), 1, &flush_range );

vkUnmapMemory( GetDevice(), Vulkan.StagingBuffer.Memory );
We map the buffer’s memory. This operation gives us a pointer that can be used like any other C++ pointer.
We copy texture data to it and inform the driver which parts of the buffer’s memory were changed during this
operation (we flush the memory). At the end, we unmap the memory, though this is not strictly necessary.
For the purpose of this tutorial we will use the following image as a texture:
The operation of copying data from a buffer to an image requires recording a command buffer and submitting it to a
queue. Calling the vkBeginCommandBuffer() function starts the recording operation:
// Prepare command buffer to copy data from the staging buffer to the image
VkCommandBufferBeginInfo command_buffer_begin_info = {
  VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,  // VkStructureType sType
  nullptr,                                      // const void *pNext
  VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT,  // VkCommandBufferUsageFlags flags
  nullptr                                       // const VkCommandBufferInheritanceInfo *pInheritanceInfo
};

vkBeginCommandBuffer( command_buffer, &command_buffer_begin_info );
At the beginning of the command buffer recording we need to perform a layout transition on our image. We want to
copy data to the image so we need to change its layout to a VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL. We need to
do this explicitly using an image memory barrier and calling the vkCmdPipelineBarrier() function:
VkImageSubresourceRange image_subresource_range = {
VK_IMAGE_ASPECT_COLOR_BIT, // VkImageAspectFlags aspectMask
0, // uint32_t baseMipLevel
1, // uint32_t levelCount
0, // uint32_t baseArrayLayer
1 // uint32_t layerCount
};
VkImageMemoryBarrier image_memory_barrier_from_undefined_to_transfer_dst = {
VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER, // VkStructureType sType
nullptr, // const void *pNext
0, // VkAccessFlags srcAccessMask
VK_ACCESS_TRANSFER_WRITE_BIT, // VkAccessFlags dstAccessMask
VK_IMAGE_LAYOUT_UNDEFINED, // VkImageLayout oldLayout
VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, // VkImageLayout newLayout
VK_QUEUE_FAMILY_IGNORED, // uint32_t srcQueueFamilyIndex
VK_QUEUE_FAMILY_IGNORED, // uint32_t dstQueueFamilyIndex
Vulkan.Image.Handle, // VkImage image
image_subresource_range // VkImageSubresourceRange subresourceRange
};
vkCmdPipelineBarrier( command_buffer, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
VK_PIPELINE_STAGE_TRANSFER_BIT, 0, 0, nullptr, 0, nullptr, 1,
&image_memory_barrier_from_undefined_to_transfer_dst);
9. Tutorial06.cpp, function CopyTextureData()
Next, we can copy the data itself. To do this we need to provide parameters describing both a source and a destination
for the data: which parts of the image we want to update (imageSubresource member), a specific region within the
provided part (imageOffset), and the total size of the image. For the source of the data we need to provide an offset from
the beginning of a buffer’s memory where the data starts, and how this data is structured, and the size of an imaginary
image inside the buffer (the size of its rows and columns). Fortunately, we can store our data in such a way that it fits our
image. This allows us to set a zero value for both parameters (bufferRowLength and bufferImageHeight), specifying that
the data is tightly packed according to the image size.
VkBufferImageCopy buffer_image_copy_info = {
0, // VkDeviceSize bufferOffset
0, // uint32_t bufferRowLength
0, // uint32_t bufferImageHeight
{ // VkImageSubresourceLayers imageSubresource
VK_IMAGE_ASPECT_COLOR_BIT, // VkImageAspectFlags aspectMask
0, // uint32_t mipLevel
0, // uint32_t baseArrayLayer
1 // uint32_t layerCount
},
{ // VkOffset3D imageOffset
0, // int32_t x
0, // int32_t y
0 // int32_t z
},
{ // VkExtent3D imageExtent
width, // uint32_t width
height, // uint32_t height
1 // uint32_t depth
}
};
vkCmdCopyBufferToImage( command_buffer, Vulkan.StagingBuffer.Handle,
Vulkan.Image.Handle, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &buffer_image_copy_info );
10. Tutorial06.cpp, function CopyTextureData()
One last thing is to perform another layout transition. Our image will be used as a texture inside shaders, so we need
to transition it to a VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL layout. After that, we can end our command
buffer, submit it to a queue, and wait for the transfer to complete (in a real-life application, we should skip waiting and
synchronize operations in some other way; for example, using semaphores, to avoid unnecessary pipeline stalls).
VkImageMemoryBarrier image_memory_barrier_from_transfer_to_shader_read = {
  VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,    // VkStructureType sType
  nullptr,                                   // const void *pNext
  VK_ACCESS_TRANSFER_WRITE_BIT,              // VkAccessFlags srcAccessMask
  VK_ACCESS_SHADER_READ_BIT,                 // VkAccessFlags dstAccessMask
  VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,      // VkImageLayout oldLayout
  VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL,  // VkImageLayout newLayout
  VK_QUEUE_FAMILY_IGNORED,                   // uint32_t srcQueueFamilyIndex
  VK_QUEUE_FAMILY_IGNORED,                   // uint32_t dstQueueFamilyIndex
  Vulkan.Image.Handle,                       // VkImage image
  image_subresource_range                    // VkImageSubresourceRange subresourceRange
};
vkCmdPipelineBarrier( command_buffer, VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, 0, 0, nullptr, 0, nullptr, 1, &image_memory_barrier_from_transfer_to_shader_read );

vkEndCommandBuffer( command_buffer );

// Submit command buffer and copy data from the staging buffer to the image
VkSubmitInfo submit_info = {
  VK_STRUCTURE_TYPE_SUBMIT_INFO,             // VkStructureType sType
  nullptr,                                   // const void *pNext
  0,                                         // uint32_t waitSemaphoreCount
  nullptr,                                   // const VkSemaphore *pWaitSemaphores
  nullptr,                                   // const VkPipelineStageFlags *pWaitDstStageMask
  1,                                         // uint32_t commandBufferCount
  &command_buffer,                           // const VkCommandBuffer *pCommandBuffers
  0,                                         // uint32_t signalSemaphoreCount
  nullptr                                    // const VkSemaphore *pSignalSemaphores
};

// GetGraphicsQueue().Handle follows this tutorial's naming conventions
if( vkQueueSubmit( GetGraphicsQueue().Handle, 1, &submit_info, VK_NULL_HANDLE ) != VK_SUCCESS ) {
  return false;
}

vkDeviceWaitIdle( GetDevice() );
11. Tutorial06.cpp, function CopyTextureData()
Now our image is created and fully initialized (contains proper data). But we are not yet done preparing our texture.
Creating a Sampler
In OpenGL, when we created a texture, both the image and its sampling parameters had to be specified. In later
versions of OpenGL we could also create separate sampler objects. Inside a shader, we usually created variables of type
sampler2D, which also combined both images and their sampling parameters (samplers). In Vulkan, we need to create
images and samplers separately.
Samplers define the way in which image data is read inside shaders: whether filtering is enabled, whether we want to
use mipmaps (or maybe a specific subrange of mipmaps), or what kind of addressing mode we want to use (clamping or
wrapping).
VkSamplerCreateInfo sampler_create_info = {
VK_STRUCTURE_TYPE_SAMPLER_CREATE_INFO, // VkStructureType sType
nullptr, // const void* pNext
0, // VkSamplerCreateFlags flags
VK_FILTER_LINEAR, // VkFilter magFilter
VK_FILTER_LINEAR, // VkFilter minFilter
VK_SAMPLER_MIPMAP_MODE_NEAREST, // VkSamplerMipmapMode mipmapMode
VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE, // VkSamplerAddressMode addressModeU
VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE, // VkSamplerAddressMode addressModeV
VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE, // VkSamplerAddressMode addressModeW
0.0f, // float mipLodBias
VK_FALSE, // VkBool32 anisotropyEnable
1.0f, // float maxAnisotropy
VK_FALSE, // VkBool32 compareEnable
VK_COMPARE_OP_ALWAYS, // VkCompareOp compareOp
0.0f, // float minLod
0.0f, // float maxLod
VK_BORDER_COLOR_FLOAT_TRANSPARENT_BLACK,// VkBorderColor borderColor
VK_FALSE // VkBool32 unnormalizedCoordinates
};
All the above parameters are defined through a variable of type VkSamplerCreateInfo; its many members are annotated
in the code above. The sampler object is created by calling the vkCreateSampler() function, to which we provide a pointer
to the structure described above.
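That call might look as follows, with the handle stored next to the image it will be used with (the wrapper name follows the listings later in this tutorial):

if( vkCreateSampler( GetDevice(), &sampler_create_info, nullptr, &Vulkan.Image.Sampler ) != VK_SUCCESS ) {
  std::cout << "Could not create sampler!" << std::endl;
  return false;
}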
As mentioned at the beginning, resources used inside shaders are called descriptors. In Vulkan we have 11 types of
descriptors:
• Samplers – Define the way image data is read. Inside shaders, samplers can be used with multiple images.
• Sampled images – Define images from which we can read data inside shaders. We can read data from a single
image using different samplers.
• Combined image samplers – These descriptors combine both sampler and sampled image as one object. From
the API perspective (our application), we still need to create both a sampler and an image, but inside the
shader they appear as a single object. Using them may be more optimal (may have better performance) than
using separate samplers and sampled images.
• Storage images – This descriptor allows us to both read and store data inside an image.
• Input attachments – This is a specific usage of a render pass’s attachments. When we want to read data from an
image that is used as an attachment inside the same render pass, we can only do it through an input
attachment. This way we do not need to end a render pass and start another one, but we are restricted to
fragment shaders only, and a given fragment shader instance can read data only from the location associated
with its own fragment coordinates.
• Uniform buffers (and their dynamic variation) – Uniform buffers allow us to read data from uniform variables.
In Vulkan, such variables cannot be placed inside the global scope; we need to use uniform buffers.
• Storage buffers (and their dynamic variation) – Storage buffers allow us to both read and store data inside
variables.
• Uniform texel buffers – These allow the contents of buffers to be treated as if they contained texture data: they
are interpreted as texels with a selected number of components and a format. In this way, we can access very
large arrays of data (much larger than uniform buffers).
• Storage texel buffers – These are similar to uniform texel buffers. Not only can they be used for reading, but
they can also be used for storing data.
All of the above descriptors are created from samplers, images, or buffers. The difference is in the way that we use
and access them inside shaders, and these access patterns may have performance implications. For
example, from uniform buffers we can only read data, but such reads are probably much faster than reading from or
storing data inside storage buffers. Similarly, texel buffers allow us to access more elements than uniform buffers do,
but this may also come at the cost of worse performance. We should remember to select a descriptor type that fits our needs.
In this tutorial we want to use a texture. For this purpose we created an image and a sampler. We will use both to
prepare a combined image sampler descriptor.
Descriptor set layout creation starts by defining the parameters of all descriptors available in a given set. This is done
by filling a structure variable of type VkDescriptorSetLayoutBinding:
VkDescriptorSetLayoutBinding layout_binding = {
0, // uint32_t binding
VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, // VkDescriptorType descriptorType
1, // uint32_t descriptorCount
VK_SHADER_STAGE_FRAGMENT_BIT, // VkShaderStageFlags stageFlags
nullptr // const VkSampler *pImmutableSamplers
};
13. Tutorial06.cpp, function CreateDescriptorSetLayout()
The above description contains the following members:
• binding – Index of a descriptor within a given set. All descriptors from a single layout (and set) must have a
unique binding. This same binding is also used inside shaders to access a descriptor.
• descriptorType – The type of descriptor (sampler, uniform buffer, and so on).
• descriptorCount – Number of descriptors of a selected type accessed as an array. For a single descriptor, a
value of 1 should be used.
• stageFlags – Set of flags defining all shader stages that will have access to a given descriptor. For better
performance, we should specify only those stages that will access the given resource.
• pImmutableSamplers – Affects only samplers that should be permanently bound into the layout (and cannot
be changed later). But we don’t have to worry about this parameter, and we can bind samplers as any other
descriptors by setting this parameter to null.
In our example, we want to use only one descriptor of a combined image sampler, which will be accessed only by a
fragment shader. It will be the first (binding zero) descriptor in the given layout. To avoid wasting memory, we should keep
bindings as compact as possible (as close to zero as possible), because drivers may allocate memory for descriptor slots
even if they are not used.
We can prepare similar parameters for other descriptors accessed from a single set. Then, pointers to such variables
are provided to a variable of type VkDescriptorSetLayoutCreateInfo:
VkDescriptorSetLayoutCreateInfo descriptor_set_layout_create_info = {
  VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO, // VkStructureType sType
  nullptr,                                             // const void *pNext
  0,                                                   // VkDescriptorSetLayoutCreateFlags flags
  1,                                                   // uint32_t bindingCount
  &layout_binding                                      // const VkDescriptorSetLayoutBinding *pBindings
};
After we have filled in the structure, we can call the vkCreateDescriptorSetLayout() function to create a descriptor
set layout. We will need this layout later, multiple times.
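A sketch of that call, storing the layout handle for later use (the wrapper member name follows the listings below):

if( vkCreateDescriptorSetLayout( GetDevice(), &descriptor_set_layout_create_info, nullptr, &Vulkan.DescriptorSet.Layout ) != VK_SUCCESS ) {
  std::cout << "Could not create descriptor set layout!" << std::endl;
  return false;
}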
Creating a Descriptor Pool
Next step is to prepare a descriptor set. Descriptor sets, similar to command buffers, are not created directly; they are
instead allocated from pools. Before we can allocate a descriptor set, we need to create a descriptor pool.
VkDescriptorPoolSize pool_size = {
  VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER,     // VkDescriptorType type
  1                                              // uint32_t descriptorCount
};

VkDescriptorPoolCreateInfo descriptor_pool_create_info = {
  VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO, // VkStructureType sType
  nullptr,                                       // const void *pNext
  0,                                             // VkDescriptorPoolCreateFlags flags
  1,                                             // uint32_t maxSets
  1,                                             // uint32_t poolSizeCount
  &pool_size                                     // const VkDescriptorPoolSize *pPoolSizes
};
Creating a descriptor pool involves specifying how many descriptor sets can be allocated from it. At the same time,
we also need to specify what types of descriptors, and how many of them, can be allocated from the pool across all sets.
For example, imagine that a given pool holds one sampled image and one storage buffer, and that two descriptor sets can
be allocated from it. If we allocate one descriptor set containing the sampled image, the second set can contain only the
storage buffer. And if a single descriptor set allocated from that pool contains both resources, we can’t allocate another
set, because it would have to be empty. During descriptor pool creation we define the total number of descriptors and the
total number of sets that can be allocated from it. This is done in two steps. First, we prepare variables of type
VkDescriptorPoolSize that specify the type of a descriptor and the total number of descriptors of the selected type that can
be allocated from the pool. Next, we provide an array of such variables in a variable of type VkDescriptorPoolCreateInfo,
whose members are annotated in the code above.
In our example we want to allocate only a single descriptor set with only one descriptor of a combined image sampler
type. We prepare parameters according to our example and create a descriptor pool by calling the
vkCreateDescriptorPool() function.
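The creation call itself could look like this sketch (the name of the stored handle is an assumption):

if( vkCreateDescriptorPool( GetDevice(), &descriptor_pool_create_info, nullptr, &Vulkan.DescriptorPool ) != VK_SUCCESS ) {
  std::cout << "Could not create descriptor pool!" << std::endl;
  return false;
}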
To allocate a descriptor set we need to prepare a variable of VkDescriptorSetAllocateInfo type, which has the following
members:
• sType – Standard type of the structure. For the purpose of descriptor set allocation we need to set this
member to a value of VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO.
• pNext – Pointer reserved for extensions.
• descriptorPool – Handle of a descriptor pool from which the descriptor sets should be allocated.
• descriptorSetCount – Number of descriptor sets we want to allocate (and number of elements in the
pSetLayouts member).
• pSetLayouts – Pointer to an array with at least descriptorSetCount elements. Each element of this array must
contain a descriptor set layout that defines the inner structure of the allocated descriptor set (elements may
repeat; for example, we can allocate five descriptor sets at once, all with the same layout).
As we can see in the above structure, we need to provide descriptor set layouts. That’s why we needed to create them
earlier. To allocate a selected number of descriptor sets from a provided pool we need to provide a pointer to the above
structure to the vkAllocateDescriptorSets() function.
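Putting the above together, the allocation might look like this sketch (handle names follow this tutorial's conventions):

VkDescriptorSetAllocateInfo descriptor_set_allocate_info = {
  VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO, // VkStructureType sType
  nullptr,                                        // const void *pNext
  Vulkan.DescriptorPool,                          // VkDescriptorPool descriptorPool
  1,                                              // uint32_t descriptorSetCount
  &Vulkan.DescriptorSet.Layout                    // const VkDescriptorSetLayout *pSetLayouts
};

if( vkAllocateDescriptorSets( GetDevice(), &descriptor_set_allocate_info, &Vulkan.DescriptorSet.Handle ) != VK_SUCCESS ) {
  std::cout << "Could not allocate descriptor set!" << std::endl;
  return false;
}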
Descriptor sets can be updated by writing to them directly or by copying data from other descriptor sets. As we don’t
have another descriptor set to copy from, we need to write to our single descriptor set directly. For each descriptor
type we need to prepare two structures. One, common for all descriptor types, is the VkWriteDescriptorSet structure;
its members can be seen in the code below.
Depending on the type of descriptor we want to update, we need to prepare a variable (or an array of variables) of
type VkDescriptorImageInfo, VkDescriptorBufferInfo, or VkBufferView. Here, we want to update a combined image
sampler descriptor, so we need to prepare a variable of type VkDescriptorImageInfo, whose members can also be seen
in the code below.
In this structure we provide parameters of specific resources; we point to created and valid resources that we want
to use inside shaders. Members of this structure are initialized based on the descriptor type. For example, if we update a
sampler, we need to provide only the handle of a sampler. If we want to update a sampled image, we need to provide an
image view’s handle and an image’s layout. But the image won’t be transitioned to this layout automatically (as it is in render
passes). We need to perform the transition to this layout ourselves, explicitly, through pipeline barriers or, in the case of input
attachments, through render passes. What’s more, we need to provide a layout that corresponds to the given usage.
In our example we want to use a texture. We can do this either by using separate sampler and sampled image
descriptors or by using a combined image sampler descriptor (as in typical OpenGL applications). The latter approach can
be more optimal (some hardware platforms may sample data from combined image samplers faster than from separate
samplers and sampled images), and we present that approach here. When we want to update a combined image sampler,
we need to provide all three members of the VkDescriptorImageInfo structure:
VkDescriptorImageInfo image_info = {
  Vulkan.Image.Sampler,                      // VkSampler sampler
  Vulkan.Image.View,                         // VkImageView imageView
  VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL   // VkImageLayout imageLayout
};

VkWriteDescriptorSet descriptor_writes = {
  VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,    // VkStructureType sType
  nullptr,                                   // const void *pNext
  Vulkan.DescriptorSet.Handle,               // VkDescriptorSet dstSet
  0,                                         // uint32_t dstBinding
  0,                                         // uint32_t dstArrayElement
  1,                                         // uint32_t descriptorCount
  VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, // VkDescriptorType descriptorType
  &image_info,                               // const VkDescriptorImageInfo *pImageInfo
  nullptr,                                   // const VkDescriptorBufferInfo *pBufferInfo
  nullptr                                    // const VkBufferView *pTexelBufferView
};

vkUpdateDescriptorSets( GetDevice(), 1, &descriptor_writes, 0, nullptr );
The pipeline layout stores information about resource types that the given pipeline has access to. These resources
involve descriptors and push constant ranges. For now we can skip push constants and focus only on descriptors.
To create a pipeline layout and prepare information about the types of resources accessed by the pipeline, we need
to provide an array of descriptor set layouts. This is done through the following members of a variable of type
VkPipelineLayoutCreateInfo:
And this is when descriptor set layouts are used again. The single descriptor set layout defines resource types
contained within a single descriptor set. And an array of these layouts defines resource types that the given pipeline needs
access to.
To create a pipeline layout we just call the vkCreatePipelineLayout() function. We did this in Introduction to Vulkan
Part 3: First Triangle. But there we created an empty layout (with no push constants and with no access to descriptor
resources). Here, we create a more typical pipeline layout.
VkPipelineLayoutCreateInfo layout_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,  // VkStructureType                sType
  nullptr,                                        // const void                    *pNext
  0,                                              // VkPipelineLayoutCreateFlags    flags
  1,                                              // uint32_t                       setLayoutCount
  &Vulkan.DescriptorSet.Layout,                   // const VkDescriptorSetLayout   *pSetLayouts
  0,                                              // uint32_t                       pushConstantRangeCount
  nullptr                                         // const VkPushConstantRange     *pPushConstantRanges
};
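The layout itself is then created with a call like the following; the Vulkan.PipelineLayout member is an assumption that follows this tutorial's naming:
if( vkCreatePipelineLayout( GetDevice(), &layout_create_info, nullptr, &Vulkan.PipelineLayout ) != VK_SUCCESS ) {
  // could not create the pipeline layout
  return false;
}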
Such a layout is then provided during pipeline creation. We also need this layout when we bind descriptor sets
during command buffer recording, so we need to store the pipeline layout handle.
Drawing operations require us to use render passes and pipelines. If a pipeline uses descriptor resources (when
shaders access images or buffers), we need to bind descriptor sets by calling the vkCmdBindDescriptorSets() function. For
this function we must provide a handle of the pipeline layout and an array of descriptor set handles. We bind descriptor
sets to specific indices; the index we bind a descriptor set to must correspond to the index of its layout in the array
provided during pipeline layout creation.
vkCmdBeginRenderPass( command_buffer, &render_pass_begin_info, VK_SUBPASS_CONTENTS_INLINE );
// ...
vkCmdDraw( command_buffer, 4, 1, 0, 0 );
vkCmdEndRenderPass( command_buffer );
19. Tutorial06.cpp, function PrepareFrame()
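The "// ..." in the listing above stands for the binding calls. Between beginning the render pass and drawing, the pipeline and the descriptor set would be bound roughly like this (a sketch; the handle names follow this tutorial's conventions and are assumed here):
vkCmdBindPipeline( command_buffer, VK_PIPELINE_BIND_POINT_GRAPHICS, Vulkan.GraphicsPipeline );
vkCmdBindDescriptorSets( command_buffer, VK_PIPELINE_BIND_POINT_GRAPHICS, Vulkan.PipelineLayout,
                         0, 1, &Vulkan.DescriptorSet.Handle, 0, nullptr );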
From the beginning of this tutorial we have been referring to descriptor sets, to bindings within descriptor sets, and
to binding descriptor sets themselves. We may have multiple descriptor sets bound to a command
buffer at the same time, and each descriptor set may contain multiple resources. Together, these values form a specific address that we use to access resources inside
shaders. This address is defined through a layout() specifier like this:
layout(set=S, binding=B) uniform <variable type> <variable name>
Set defines an index that the given descriptor set was bound to through the vkCmdBindDescriptorSets() function.
Binding specifies the index of a resource within the provided set and corresponds to the binding defined during descriptor
set layout creation. In our case, we have only one descriptor set provided at index zero, with only one combined image
sampler at binding zero. Combined image samplers are accessed inside shaders through sampler1D, sampler2D, or
sampler3D variables. So our fragment shader's source code looks like this:
#version 450
// declarations reconstructed: the combined image sampler sits at set 0, binding 0, as described above; in/out locations are assumed
layout(set=0, binding=0) uniform sampler2D u_Texture;
layout(location=0) in vec2 v_Texcoord;
layout(location=0) out vec4 o_Color;
void main() {
  o_Color = texture( u_Texture, v_Texcoord );
}
20. shader.frag
Tutorial06 Execution
We can see below how the final image generated by the sample program should look:
We render a quad with a texture applied to its surface. The quad adjusts its size (and aspect ratio) to match the
window's size and shape; if we stretch the window, the quad and the image are stretched too.
Cleaning Up
Before we can end our application, we should perform a cleanup.
We destroy both the pipeline and its layout by calling the vkDestroyPipeline() and vkDestroyPipelineLayout() functions.
Next, we destroy the descriptor pool with the vkDestroyDescriptorPool() function and the descriptor set layout with the
vkDestroyDescriptorSetLayout() function. We of course destroy other resources too, but we already know how to do this.
You may notice that we don't free the descriptor set. We can free each descriptor set separately if a proper flag
(VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT) was provided during descriptor pool creation. But we don't have to: when we destroy a descriptor pool, all sets allocated from this pool
are also freed.
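A rough sketch of this cleanup, assuming the handle names used throughout this part, might look like the following; as always, the device should be idle before we destroy anything:
vkDeviceWaitIdle( GetDevice() );

vkDestroyPipeline( GetDevice(), Vulkan.GraphicsPipeline, nullptr );
vkDestroyPipelineLayout( GetDevice(), Vulkan.PipelineLayout, nullptr );
// destroying the pool also frees all descriptor sets allocated from it
vkDestroyDescriptorPool( GetDevice(), Vulkan.DescriptorSet.Pool, nullptr );
vkDestroyDescriptorSetLayout( GetDevice(), Vulkan.DescriptorSet.Layout, nullptr );
vkDestroySampler( GetDevice(), Vulkan.Image.Sampler, nullptr );
vkDestroyImageView( GetDevice(), Vulkan.Image.View, nullptr );
vkDestroyImage( GetDevice(), Vulkan.Image.Handle, nullptr );
vkFreeMemory( GetDevice(), Vulkan.Image.Memory, nullptr );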
Conclusion
This part of the tutorial presented a way to use textures (combined image samplers, in fact) inside shaders. To do this
we created an image and allocated and bound memory to it. We also created an image view. Next, we copied data from
a staging buffer to the image to initialize its contents. We also created a sampler object that defined a way in which image
data was read inside shaders.
Next, we prepared a descriptor set. First, we created a descriptor set layout. After that, a descriptor pool was created
from which a single descriptor set was allocated. We updated this set with the sampler and the image view handles.
The descriptor set layout was also used to define resources to which our graphics pipeline had access. This was done
during pipeline layout creation. This layout was then used when we bound the descriptor sets to a command buffer.
We also learned how to prepare shader code that accesses the combined image sampler to read (sample) its data
as a texture. This was done inside a fragment shader used during rendering of our simple geometry; this way we
applied a texture to the surface of this geometry.
In the next tutorial we will see how we can use uniform buffers inside shaders.
Notices
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware,
software or service activation. Performance varies depending on system configuration. Check with your system
manufacturer or retailer or learn more at intel.com.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this
document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of
merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of
performance, course of dealing, or usage in trade.
This document contains information on products, services and/or processes in development. All information provided
here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule,
specifications and roadmaps.
The products and services described may contain defects or errors known as errata which may cause deviations from
published specifications. Current characterized errata are available on request.
Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-
800-548-4725 or by visiting www.intel.com/design/literature.htm.
This sample source code is released under the Intel Sample Source Code License Agreement.
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.