
API without Secrets: Introduction to Vulkan*

Preface
Written by Pawel Lapinski

About the Author


I have been a software developer for over 9 years. My main area of interest is graphics programming, and most of my
professional career has involved 3D graphics. I have a lot of experience with OpenGL* and shading languages (mainly
GLSL and Cg), and for about 3 years I also worked with Unity* software. I have also had opportunities to work on some VR
projects involving head-mounted displays like the Oculus Rift* and even CAVE-like systems.

Recently, with our team here at Intel, I was involved in preparing validation tools for our graphics driver’s support for
the emerging API called Vulkan. This graphics programming interface and the approach it represents is new to me. The
idea came to me that while I’m learning about it I can, at the same time, prepare a tutorial for writing applications using
Vulkan. I can share my thoughts and experiences as someone who knows OpenGL and would like to “migrate” to its
successor.

About Vulkan
Vulkan is seen as OpenGL's successor. It is a multiplatform API that allows developers to prepare high-performance
graphics applications like games, CAD tools, benchmarks, and so forth. It can be used on different operating systems like
Windows*, Linux*, or Android*. The Khronos consortium created and maintains Vulkan. Vulkan also shares some other
similarities with OpenGL, including graphics pipeline stages, GLSL shaders (sort of), and nomenclature.

But there are many differences that confirm the need for the new API. OpenGL has been evolving for over 20 years. Many
things have changed in the computer industry since the early 90s, especially in graphics card architecture. OpenGL is a
good library, but not everything can be done by only adding new functionality that matches the abilities of new graphics
cards. Sometimes a huge redesign is needed. And that's why Vulkan was created.

Vulkan was based on Mantle*, the first in a series of new low-level graphics APIs. Mantle was developed by AMD and
designed only for the architecture of Radeon cards. As the first publicly available API of its kind, games and
benchmarks that used Mantle saw some impressive performance gains. Other low-level APIs then started appearing, such
as Microsoft's DirectX* 12, Apple's Metal*, and now Vulkan.

What is the difference between traditional graphics APIs and new low-level APIs? High-level APIs like OpenGL are quite
easy to use. The developer declares what they want to do and how they want to do it, and the driver handles the rest. The
driver checks whether the developer uses API calls in the proper way, whether the correct parameters are passed, and
whether the state is adequately prepared. If problems occur, feedback is provided. For ease of use, many tasks have to be
done “behind the scenes” by the driver.

In low-level APIs the developer is the one who must take care of most things. They are required to adhere to strict
programming and usage rules and must also write much more code. But this approach is reasonable. The developer knows
what they want to do and what they want to achieve. The driver does not, so with traditional APIs the driver has to make
additional effort for the program to work properly. With APIs like Vulkan this additional effort can be avoided. That's why
DirectX 12, Metal, and Vulkan are called thin drivers/thin APIs. For the most part they just communicate user requests to the
hardware, providing only a thin abstraction layer over the hardware itself. The driver does as little as possible for the sake
of much higher performance.

Low-level APIs require additional work on the application side. But this work can't be avoided. Someone or something
has to do it. So it is much more reasonable for the developer to do it, as they know how to divide work into separate
threads, when an image will be a render target (color attachment) or used as a texture/sampler, and so on. The
developer knows which pipeline states or vertex attributes change more often. All that leads to far more effective
use of the graphics card hardware. And the best part is that it works. An impressive performance boost can be observed.

But the word “can” is important. It requires additional effort but also a proper approach. There are scenarios in which
no difference in performance between OpenGL and Vulkan will be observed. If someone doesn’t need multithreading or
if the application isn’t CPU bound (rendered scenes aren’t too complex), OpenGL is enough and using Vulkan will not give
any performance boost (but it may lower power consumption, which is important on mobile devices). But if we want to
squeeze every last bit from our graphics hardware, Vulkan is the way to go.

Sooner or later all major graphics engines will support some, if not all, of the new low-level APIs. So if we want to use
Vulkan or other APIs, we won’t have to write everything from scratch. But it is always good to know what is going on
“under the hood”, and that’s the reason I have prepared this tutorial.

A Note about the Source Code


I’m a Windows developer. When given a choice I write applications for Windows. That’s because I don’t have
experience with other operating systems. But Vulkan is a multiplatform API and I want to show that it can be used on
different operating systems. That’s why I’ve prepared a sample project that can be compiled and executed both on
Windows and Linux.

Source code for this tutorial can be found here:

https://github.com/GameTechDev/IntroductionToVulkan

I have tried to write code samples that are as simple as possible and to not clutter the code with unnecessary “#ifdefs”.
Sometimes this can’t be avoided (like in window creation and management) so I decided to divide the code into small
parts:

- Tutorial files are the most important here. They are the ones where all the exciting Vulkan-related code is placed. Each lesson is placed in one header/source pair.
- OperatingSystem header and source files contain OS-dependent parts of code like window creation, message processing, and rendering loops. These files contain code for both Linux and Windows, but I tried to unify them as much as possible.
- main.cpp file is the starting point for each lesson. As it uses my custom Window class it doesn't contain any OS-specific code.
- VulkanCommon header/source files contain the base class for all tutorials starting from tutorial 3. This class basically replicates tutorials 1 and 2: creation of a Vulkan instance and all the other resources necessary for the rendered image to appear on the screen. I've extracted this preparation code so the code of all the other chapters could focus on only the presented topics.
- Tools contain some additional utility functions and classes, like a function that reads the contents of a binary file or a wrapper class for automatic object destruction.

The code for each chapter is placed in a separate folder. Sometimes it may contain an additional Data directory in
which resources like shaders or textures for a given chapter are placed. This Data folder should be copied to the same
directory in which executables will be held. By default executables are compiled into a build folder.

Right. Compilation and the build folder. As the sample project should be easy to maintain both on Windows and Linux,
I've decided to use a CMakeLists.txt file and the CMake tool. On Windows there is a build.bat file that creates a Visual Studio*
solution; Microsoft Visual Studio 2013 is required to compile the code on Windows (by default). On Linux I've provided a
build.sh script that compiles the code using make, but CMakeLists.txt can also be easily opened with tools like Qt. CMake
is of course also required.

Solution and project files are generated and executables are compiled into the build folder. This folder is also the
default working directory, so the Data folders should be copied into it for the lessons to work properly. During execution,
in case of any problems, additional information is “printed” in cmd/terminal. So if there is something wrong, run the lesson
from the command line/terminal or look into the console/terminal window to see if any messages are displayed.

I hope these notes will help you understand and follow my Vulkan tutorial. Now let’s focus on learning Vulkan itself!

Notices

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability,
fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course
of dealing, or usage in trade.

This document contains information on products, services and/or processes in development. All information provided
here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule,
specifications and roadmaps.

The products and services described may contain defects or errors known as errata which may cause deviations from
published specifications. Current characterized errata are available on request.

Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-
548-4725 or by visiting www.intel.com/design/literature.htm.
This sample source code is released under the Intel Sample Source Code License Agreement.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

© 2016 Intel Corporation.


API without Secrets: Introduction to Vulkan*
Part 1
Table of Contents
Tutorial 1: Vulkan* – The Beginning
    Loading Vulkan Runtime Library and Acquiring Pointer to an Exported Function
    Acquiring Pointers to Global-Level Functions
    Creating a Vulkan Instance
    Acquiring Pointers to Instance-Level Functions
    Creating a Logical Device
        Device Properties
        Device Features
        Queues, Queue Families, and Command Buffers
    Acquiring Pointers to Device-Level Functions
    Retrieving Queues
    Tutorial01 Execution
    Cleaning Up
    Conclusion
Tutorial 1: Vulkan* – The Beginning
We start with a simple application that unfortunately doesn’t display anything. I won’t present the full source code
(with windowing, rendering loop, and so on) here in the text as the tutorial would be too long. The entire sample project
with full source code is available in a provided example that can be found at
https://github.com/gametechdev/IntroductionToVulkan. Here I show only the parts of the code that are relevant to
Vulkan itself. There are several ways to use the Vulkan API in our application:

1. We can dynamically load the driver's library that provides the Vulkan API implementation and acquire function pointers from it ourselves.
2. We can use the Vulkan SDK and link with the provided Vulkan Runtime (Vulkan Loader) static library.
3. We can use the Vulkan SDK, dynamically load the Vulkan Loader library at runtime, and load function pointers from it.

The first approach is not recommended. Hardware vendors can modify their drivers in any way, which may affect
compatibility with a given application. It may even break the application and require developers writing a Vulkan-enabled
application to rewrite some parts of the code. That's why it's better to use some level of abstraction.

The recommended solution is to use the Vulkan Loader from the Vulkan SDK. It provides more configuration abilities
and more flexibility without the need to modify Vulkan application source code. One example of this flexibility is layers.
The Vulkan API requires developers to create applications that strictly follow API usage rules. In case of any errors, the
driver provides us with little feedback; only some severe and important errors are reported (for example, out of memory).
This approach is used so the API itself can be as small (thin) and as fast as possible. But if we want to obtain more
information about what we are doing wrong we have to enable debug/validation layers. There are different layers for
different purposes such as memory usage, proper parameter passing, object lifetime checking, and so on. These layers
all slow down the application's performance but provide us with much more information.

We also need to choose whether we want to statically link with the Vulkan Loader or load it dynamically
and acquire function pointers ourselves at runtime. This choice is just a matter of personal preference. This paper
focuses on the third way of using Vulkan: dynamically loading function pointers from the Vulkan Runtime library. This
approach is similar to what we had to do when we wanted to use OpenGL* on a Windows* system, in which only some
basic functions were provided by the default implementation. The remaining functions had to be loaded dynamically using
the wglGetProcAddress() or standard Windows GetProcAddress() functions. This is what wrangler libraries such as GLEW or
GL3W were created for.

Loading Vulkan Runtime Library and Acquiring Pointer to an Exported Function


In this tutorial we go through the process of acquiring Vulkan function pointers ourselves. We load them from the
Vulkan Runtime library (Vulkan Loader), which should be installed along with a graphics driver that supports Vulkan. The
dynamic library for Vulkan (Vulkan Loader) is named vulkan-1.dll on Windows* and libvulkan.so on Linux*.

From now on, I refer to the first tutorial’s source code, focusing on the Tutorial01.cpp file. So in the initialization code
of our application we have to load the Vulkan library with something like this:
#if defined(VK_USE_PLATFORM_WIN32_KHR)
VulkanLibrary = LoadLibrary( "vulkan-1.dll" );
#elif defined(VK_USE_PLATFORM_XCB_KHR) || defined(VK_USE_PLATFORM_XLIB_KHR)
VulkanLibrary = dlopen( "libvulkan.so", RTLD_NOW );
#endif

if( VulkanLibrary == nullptr ) {
  printf( "Could not load Vulkan library!\n" );
  return false;
}
return true;
1. Tutorial01.cpp, function LoadVulkanLibrary()
VulkanLibrary is a variable of type HMODULE on Windows or just void* on Linux. If the value returned by the library-
loading function is not 0, we can load all exported functions. The Vulkan library, as well as Vulkan implementations (every
driver from every vendor), is required to expose only one function that can be loaded with the standard techniques our
OS provides (like the previously mentioned GetProcAddress() on Windows or dlsym() on Linux). Other functions from the
Vulkan API may also be available for acquiring this way, but it is not guaranteed (and not even recommended).
The only function that must be exported is vkGetInstanceProcAddr().

This function is used to load all other Vulkan functions. To ease our work of obtaining addresses of all Vulkan API
functions it is very convenient to place their names inside a macro. This way we won’t have to duplicate function names
in multiple places (like definition, declaration, or loading) and can keep them in only one header file. This single file will be
used later for different purposes with an #include directive. We can declare our exported function like this:
#if !defined(VK_EXPORTED_FUNCTION)
#define VK_EXPORTED_FUNCTION( fun )
#endif

VK_EXPORTED_FUNCTION( vkGetInstanceProcAddr )

#undef VK_EXPORTED_FUNCTION
2. ListOfFunctions.inl

Now we define the variables that will represent functions from the Vulkan API. This can be done with something like
this:
#include "vulkan.h"

#define VK_EXPORTED_FUNCTION( fun ) PFN_##fun fun;
#define VK_GLOBAL_LEVEL_FUNCTION( fun ) PFN_##fun fun;
#define VK_INSTANCE_LEVEL_FUNCTION( fun ) PFN_##fun fun;
#define VK_DEVICE_LEVEL_FUNCTION( fun ) PFN_##fun fun;

#include "ListOfFunctions.inl"
3. VulkanFunctions.cpp

Here we first include the vulkan.h file, which is officially provided for developers who want to use the Vulkan API in their
applications. This file is similar to the gl.h file in the OpenGL library. It defines all the enumerations, structures, types, and
function types that are necessary for Vulkan application development. Next we define the macros for functions from each
"level" (I will describe these levels soon). The function definition requires providing a function type and a function name.
Fortunately, function types in Vulkan can be easily derived from function names. For example, the definition of the
vkGetInstanceProcAddr() function's type looks like this:
typedef PFN_vkVoidFunction (VKAPI_PTR *PFN_vkGetInstanceProcAddr)( VkInstance instance, const char* pName );
4. Vulkan.h

The definition of a variable that represents this function would then look like this:
PFN_vkGetInstanceProcAddr vkGetInstanceProcAddr;
This is what the macros from VulkanFunctions.cpp file expand to. They take the function name (hidden in a “fun”
parameter) and add “PFN_” at the beginning. Then the macro places a space after the type, and adds a function name and
a semicolon after that. Functions are “pasted” into the file in the line with the #include “ListOfFunctions.inl” directive.

But we must remember that when we want to define Vulkan function prototypes ourselves, we must define the
VK_NO_PROTOTYPES preprocessor macro. By default the vulkan.h header file contains prototypes of all functions, which
is useful when we are statically linking with the Vulkan Runtime. If we then add our own definitions, there will be a
compilation error claiming that the given names (for function pointers) are defined more than once (since we would
break the One Definition Rule). We can disable the prototypes from the vulkan.h file using the mentioned preprocessor macro.

Similarly, we need to declare the variables defined in the VulkanFunctions.cpp file so they are visible in all other parts
of our code. This is done in the same way, but with the word "extern" placed before each declaration. Compare with the
VulkanFunctions.h file.

Now we have variables in which we can store the addresses of functions acquired from the Vulkan library. To load the
one and only exported function, we can use the following code:
#if defined(VK_USE_PLATFORM_WIN32_KHR)
#define LoadProcAddress GetProcAddress
#elif defined(VK_USE_PLATFORM_XCB_KHR) || defined(VK_USE_PLATFORM_XLIB_KHR)
#define LoadProcAddress dlsym
#endif

#define VK_EXPORTED_FUNCTION( fun )                                     \
  if( !(fun = (PFN_##fun)LoadProcAddress( VulkanLibrary, #fun )) ) {    \
    printf( "Could not load exported function: " #fun "!\n" );          \
    return false;                                                       \
  }

#include "ListOfFunctions.inl"

return true;
5. Tutorial01.cpp, function LoadExportedEntryPoints()

This macro takes the function name from the "fun" parameter, converts it into a string (with the # operator), and obtains its address
from VulkanLibrary. The address is acquired using the GetProcAddress() (on Windows) or dlsym() (on Linux) function and
is stored in the variable represented by fun. If this operation fails and the function is not exposed from the library, we
report the problem by printing the proper information and returning false. The macro operates on the lines included from
ListOfFunctions.inl. This way we don't have to write the names of functions multiple times.

Now that we have our main function-loading procedure, we can load the rest of the Vulkan API procedures. These can
be divided into three types:

- Global-level functions. Allow us to create a Vulkan instance.
- Instance-level functions. Check what Vulkan-capable hardware is available and what Vulkan features are exposed.
- Device-level functions. Responsible for performing jobs typically done in a 3D application (like drawing).

We will start with acquiring instance creation functions from the global level.

Acquiring Pointers to Global-Level Functions


Before we can create a Vulkan instance we must acquire the addresses of functions that will allow us to do it. Here is
a list of these functions:

- vkCreateInstance
- vkEnumerateInstanceExtensionProperties
- vkEnumerateInstanceLayerProperties

The most important function is vkCreateInstance(), which allows us to create a "Vulkan instance." From the application's
point of view, a Vulkan instance can be thought of as an equivalent of OpenGL's rendering context. It stores per-application
state (there is no global state in Vulkan), like enabled instance-level layers and extensions. The other two functions allow
us to check which instance layers and which instance extensions are available. Validation layers are divided
into instance and device levels depending on what functionality they debug. Extensions in Vulkan are similar to OpenGL's
extensions: they expose additional functionality that is not required by the core specification, and not all hardware vendors
may implement them. Extensions, like layers, are also divided into instance and device levels, and extensions from
different levels must be enabled separately. In OpenGL, all extensions are (usually) available in created contexts; in
Vulkan we have to enable them before the functionality they expose can be used.

We call the vkGetInstanceProcAddr() function to acquire the addresses of instance-level procedures. It takes two
parameters: an instance and a function name. We don't have an instance yet, so we provide "null" for the first parameter.
That's why these functions are sometimes called null-instance or no-instance level functions. The second parameter
required by the vkGetInstanceProcAddr() function is the name of the procedure whose address we want to acquire. Without
an instance we can only load global-level functions; it is not possible to load any other function without an instance handle
provided in the first parameter.

The code that loads global-level functions may look like this:
#define VK_GLOBAL_LEVEL_FUNCTION( fun ) \
if( !(fun = (PFN_##fun)vkGetInstanceProcAddr( nullptr, #fun )) ) { \
printf( "Could not load global level function: " #fun "!\n" ); \
return false; \
}

#include "ListOfFunctions.inl"

return true;
6. Tutorial01.cpp, function LoadGlobalLevelEntryPoints()

The only difference between this code and the code used for loading the exported function (vkGetInstanceProcAddr()
exposed by the library) is that we don't use a function provided by the OS, like GetProcAddress(); instead we call
vkGetInstanceProcAddr() with the first parameter set to null.

If you follow this tutorial and write the code yourself, make sure you add global-level functions wrapped in a properly
named macro to ListOfFunctions.inl header file:
#if !defined(VK_GLOBAL_LEVEL_FUNCTION)
#define VK_GLOBAL_LEVEL_FUNCTION( fun )
#endif

VK_GLOBAL_LEVEL_FUNCTION( vkCreateInstance )
VK_GLOBAL_LEVEL_FUNCTION( vkEnumerateInstanceExtensionProperties )
VK_GLOBAL_LEVEL_FUNCTION( vkEnumerateInstanceLayerProperties )

#undef VK_GLOBAL_LEVEL_FUNCTION
7. ListOfFunctions.inl

Creating a Vulkan Instance


Now that we have loaded global-level functions, we can create a Vulkan instance. This is done by calling the
vkCreateInstance() function, which takes three parameters.
- The first parameter has information about our application, the requested Vulkan version, and the instance-level layers and extensions we want to enable. This is all done with structures (structures are very common in Vulkan).
- The second parameter provides a pointer to a structure with a list of different functions related to memory allocation. They can be used for debugging purposes, but this feature is optional and we can rely on the built-in memory allocation methods.
- The third parameter is the address of a variable in which we want to store the Vulkan instance handle. In the Vulkan API it is common that the results of operations are stored in variables whose addresses we provide; return values are used only for some pass/fail notifications.

Here is the full source code for instance creation:
VkApplicationInfo application_info = {
  VK_STRUCTURE_TYPE_APPLICATION_INFO,             // VkStructureType            sType
  nullptr,                                        // const void                *pNext
  "API without Secrets: Introduction to Vulkan",  // const char                *pApplicationName
  VK_MAKE_VERSION( 1, 0, 0 ),                     // uint32_t                   applicationVersion
  "Vulkan Tutorial by Intel",                     // const char                *pEngineName
  VK_MAKE_VERSION( 1, 0, 0 ),                     // uint32_t                   engineVersion
  VK_API_VERSION                                  // uint32_t                   apiVersion
};

VkInstanceCreateInfo instance_create_info = {
  VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,         // VkStructureType            sType
  nullptr,                                        // const void                *pNext
  0,                                              // VkInstanceCreateFlags      flags
  &application_info,                              // const VkApplicationInfo   *pApplicationInfo
  0,                                              // uint32_t                   enabledLayerCount
  nullptr,                                        // const char * const        *ppEnabledLayerNames
  0,                                              // uint32_t                   enabledExtensionCount
  nullptr                                         // const char * const        *ppEnabledExtensionNames
};

if( vkCreateInstance( &instance_create_info, nullptr, &Vulkan.Instance ) != VK_SUCCESS ) {
  printf( "Could not create Vulkan instance!\n" );
  return false;
}
return true;
8. Tutorial01.cpp, function CreateInstance()

Most Vulkan structures begin with a field describing the type of the structure. Parameters are provided to
functions by pointers to avoid copying big memory chunks. Sometimes pointers to other structures are also provided
inside structures. For the driver to know how many bytes it should read and how the members are aligned, the type of the
structure is always provided. So what exactly do all these parameters mean?

- sType – Type of the structure. In this case it informs the driver that we are providing information for instance creation by providing a value of VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO.
- pNext – Additional information for instance creation may be provided in future versions of the Vulkan API, and this parameter will be used for that purpose. For now, it is reserved for future use.
- flags – Another parameter reserved for future use; for now it must be set to 0.
- pApplicationInfo – An address of another structure with information about our application (like name, version, required Vulkan API version, and so on).
- enabledLayerCount – Defines the number of instance-level validation layers we want to enable.
- ppEnabledLayerNames – An array of enabledLayerCount elements with the names of the layers we would like to enable.
- enabledExtensionCount – The number of instance-level extensions we want to enable.
- ppEnabledExtensionNames – As with layers, this parameter should point to an array of at least enabledExtensionCount elements containing the names of instance-level extensions we want to use.

Most of the parameters can be nulls or zeros. The most important one (apart from the structure type information) is
the parameter pointing to a variable of type VkApplicationInfo. So before specifying instance creation information, we also
have to specify an additional variable describing our application. This variable contains the name of our application, the
name of the engine we are using, and the Vulkan API version we require (which is similar to the OpenGL version; if the driver
doesn't support this version, the instance will not be created). This information may be very useful for the driver.
Remember that some graphics card vendors provide drivers that can be specialized for a specific title, such as a specific
game. If a graphics card vendor knows which graphics engine a game uses, it can optimize the driver's behavior so the
game performs faster. This application information structure can be used for this purpose. The parameters of the
VkApplicationInfo structure include:

- sType – Type of the structure, here VK_STRUCTURE_TYPE_APPLICATION_INFO: information about the application.
- pNext – Reserved for future use.
- pApplicationName – Name of our application.
- applicationVersion – Version of our application; it is quite convenient to use the Vulkan macro for version creation. It packs the major, minor, and patch numbers into one 32-bit value.
- pEngineName – Name of the engine our application uses.
- engineVersion – Version of the engine we are using in our application.
- apiVersion – Version of the Vulkan API we want to use. It is best to provide the version defined in the Vulkan header we are including, which is why we use VK_API_VERSION found in the vulkan.h header file.

Now that we have defined these two structures, we can call the vkCreateInstance() function and check whether an
instance was created. If successful, the instance handle will be stored in the variable we provided the address of, and VK_SUCCESS
(which is zero!) is returned.

Acquiring Pointers to Instance-Level Functions


We have created a Vulkan instance. Next we can acquire pointers to functions that allow us to create a logical device,
which can be seen as a user view on a physical device. There may be many different devices installed on a computer that
support Vulkan. Each of these devices may have different features and capabilities and different performance, or may
support different functionalities. When we want to use Vulkan, we must specify which device to perform the operations
on. We may use many devices for different purposes (such as one for rendering 3D graphics, one for physics calculations,
and one for media decoding). We must check what devices and how many of them are available, what their capabilities
are, and what operations they support. This is all done with instance-level functions. We get the addresses of these
functions using the vkGetInstanceProcAddr() function used earlier, but this time we provide the handle of the created
Vulkan instance.

Loading every Vulkan procedure using the vkGetInstanceProcAddr() function and a Vulkan instance handle comes with
a trade-off. When we use Vulkan for data processing, we must create a logical device and acquire device-level functions.
But the computer running our application may contain many devices that support Vulkan, and vkGetInstanceProcAddr()
cannot tell which logical device we mean, as there is no parameter for it. When we acquire device-level procedures
using this function, we in fact acquire addresses of simple “jump” functions. Such a function takes the handle of a
logical device and jumps to the proper implementation (the function implemented for that specific device). The
overhead of this jump can be avoided: the recommended approach is to load procedures for each device separately
using another function. But we still have to use the vkGetInstanceProcAddr() function to load the functions that allow
us to create such a logical device.

Some of the instance-level functions include:

 vkEnumeratePhysicalDevices
 vkGetPhysicalDeviceProperties
 vkGetPhysicalDeviceFeatures
 vkGetPhysicalDeviceQueueFamilyProperties
 vkCreateDevice
 vkGetDeviceProcAddr
 vkDestroyInstance

These are the functions that are required and are used in this tutorial to create a logical device. But there are other
instance-level functions as well, for example, functions introduced by extensions, so the list in the header file from the
example solution’s source code will expand. The source code used to load all these functions is:
#define VK_INSTANCE_LEVEL_FUNCTION( fun )                                         \
    if( !(fun = (PFN_##fun)vkGetInstanceProcAddr( Vulkan.Instance, #fun )) ) {    \
      printf( "Could not load instance level function: " #fun "\n" );             \
      return false;                                                               \
    }

#include "ListOfFunctions.inl"

return true;
9. Tutorial01.cpp, function LoadInstanceLevelEntryPoints()

The code for loading instance-level functions is almost identical to the code loading global-level functions. We just
change the first parameter of the vkGetInstanceProcAddr() function from null to the created Vulkan instance handle. Of
course, we now operate on instance-level functions, so we redefine the VK_INSTANCE_LEVEL_FUNCTION() macro instead
of the VK_GLOBAL_LEVEL_FUNCTION() macro. We also need to define the functions from the instance level. As before,
this is best done with a list of macro-wrapped names collected in a shared header, for example:
#if !defined(VK_INSTANCE_LEVEL_FUNCTION)
#define VK_INSTANCE_LEVEL_FUNCTION( fun )
#endif

VK_INSTANCE_LEVEL_FUNCTION( vkDestroyInstance )
VK_INSTANCE_LEVEL_FUNCTION( vkEnumeratePhysicalDevices )
VK_INSTANCE_LEVEL_FUNCTION( vkGetPhysicalDeviceProperties )
VK_INSTANCE_LEVEL_FUNCTION( vkGetPhysicalDeviceFeatures )
VK_INSTANCE_LEVEL_FUNCTION( vkGetPhysicalDeviceQueueFamilyProperties )
VK_INSTANCE_LEVEL_FUNCTION( vkCreateDevice )
VK_INSTANCE_LEVEL_FUNCTION( vkGetDeviceProcAddr )
VK_INSTANCE_LEVEL_FUNCTION( vkEnumerateDeviceExtensionProperties )

#undef VK_INSTANCE_LEVEL_FUNCTION
10. ListOfFunctions.inl
Instance-level functions operate on physical devices. In Vulkan we can see “physical devices” and “logical devices”
(simply called devices). As the name suggests, a physical device refers to any physical graphics card (or any other hardware
component) that is installed on a computer running a Vulkan-enabled application that is capable of executing Vulkan
commands. As mentioned earlier, such a device may expose and implement different (optional) Vulkan features, may have
different capabilities (like total memory or ability to work on buffer objects of different sizes), or may provide different
extensions. Such hardware may be a dedicated (discrete) graphics card or an additional chip built (integrated) into a main
processor. It may even be the CPU itself. Instance-level functions allow us to check all these parameters. After we check
them, we must decide (based on our findings and our needs) which physical device we want to use. Maybe we even want
to use more than one device, which is also possible, but this scenario is too advanced for now. So if we want to harness
the power of any physical device we must create a logical device that represents our choice in the application (along with
enabled layers, extensions, features, and so on). After creating a device (and acquiring queues) we are ready to use
Vulkan, much as we are ready to use OpenGL after creating a rendering context.

Creating a Logical Device


Before we can create a logical device, we must first check to see how many physical devices are available in the system
we execute our application on. Next we can get handles to all available physical devices:
uint32_t num_devices = 0;
if( (vkEnumeratePhysicalDevices( Vulkan.Instance, &num_devices, nullptr ) != VK_SUCCESS) ||
    (num_devices == 0) ) {
  printf( "Error occurred during physical devices enumeration!\n" );
  return false;
}

std::vector<VkPhysicalDevice> physical_devices( num_devices );

if( vkEnumeratePhysicalDevices( Vulkan.Instance, &num_devices, &physical_devices[0] ) != VK_SUCCESS ) {
  printf( "Error occurred during physical devices enumeration!\n" );
  return false;
}
11. Tutorial01.cpp, function CreateDevice()

To check how many devices are available, we call the vkEnumeratePhysicalDevices() function. We call it twice: first
with the last parameter set to null. This way the driver knows that we are asking only for the number of available physical
devices, which is stored in the variable whose address we provided in the second parameter.

Now that we know how many physical devices are available, we can prepare storage for their handles. I use a vector
so I don’t need to worry about memory allocation and deallocation. When we call vkEnumeratePhysicalDevices() again,
this time with all parameters non-null, the handles of the physical devices are stored in the array whose address we
provided in the last parameter. This array doesn’t have to be the same size as the number returned by the first call, but
it must be able to hold at least as many elements as specified in the second parameter.

Example: say four physical devices are available, but we are interested only in the first one. After the first call, the
value four is stored in num_devices; now we know that at least one Vulkan-compatible device exists and we can proceed.
We overwrite this value with one because we only want to use one (any) such device, no matter which. The second call
will then return only one physical device handle.

The number of devices we provide will be replaced by the actual number of enumerated physical devices (which of
course will not be greater than the value we provided). Second example: suppose we don’t want to call this function
twice. Our application supports up to 10 devices, and we provide this value along with a pointer to a static, 10-element
array. The driver always returns the number of actually enumerated devices: if there are none, zero is stored in the
variable whose address we provided; if there are any, we will know that too. We just won’t be able to tell whether there
are more than 10 devices.
Now that we have handles of all the Vulkan compatible physical devices we can check the properties of each device.
In the sample code, this is done inside a loop:
VkPhysicalDevice selected_physical_device = VK_NULL_HANDLE;
uint32_t selected_queue_family_index = UINT32_MAX;
for( uint32_t i = 0; i < num_devices; ++i ) {
  if( CheckPhysicalDeviceProperties( physical_devices[i], selected_queue_family_index ) ) {
    selected_physical_device = physical_devices[i];
  }
}
12. Tutorial01.cpp, function CreateDevice()

Device Properties
I created the CheckPhysicalDeviceProperties() function. It takes the handle of a physical device and checks whether
the capabilities of a given device are enough for our application to work properly. If so, it returns true and stores the queue
family index in the variable provided in the second parameter. Queues and queue families are discussed in a later section.

Here is the first half of a CheckPhysicalDeviceProperties() function:


VkPhysicalDeviceProperties device_properties;
VkPhysicalDeviceFeatures   device_features;

vkGetPhysicalDeviceProperties( physical_device, &device_properties );
vkGetPhysicalDeviceFeatures( physical_device, &device_features );

uint32_t major_version = VK_VERSION_MAJOR( device_properties.apiVersion );
uint32_t minor_version = VK_VERSION_MINOR( device_properties.apiVersion );
uint32_t patch_version = VK_VERSION_PATCH( device_properties.apiVersion );

if( (major_version < 1) ||
    (device_properties.limits.maxImageDimension2D < 4096) ) {
  printf( "Physical device %p doesn't support required parameters!\n", physical_device );
  return false;
}
13. Tutorial01.cpp, function CheckPhysicalDeviceProperties()

At the beginning of this function, the physical device is queried for its properties and features. Properties contain fields
such as the supported Vulkan API version, device name and type (integrated or dedicated/discrete GPU), vendor ID, and
limits. Limits describe how big created textures can be, how many samples are supported in anti-aliasing, or how many
buffers can be used in a given shader stage.

Device Features
Features are additional hardware capabilities that are similar to extensions. They may not necessarily be supported
by the driver and are not enabled by default. Features include items such as geometry and tessellation shaders, multiple
viewports, logical operations, or additional texture compression formats. If a given physical device supports a feature,
we can enable it during logical device creation. Features are never enabled by default in Vulkan, and the Vulkan spec
points out that some of them may have a performance impact (like robustness).

After querying for hardware info and capabilities, I have provided a small example of how these queries can be used.
I “reversed” the VK_MAKE_VERSION macro and retrieved the major, minor, and patch versions from the apiVersion field
of the device properties. I check whether the version is above the one I want to use, and also whether I can create 2D
textures of a given size. In this example I’m not using features at all, but if we want to use any feature (for example,
geometry shaders) we must check whether it is supported and we must (explicitly) enable it later, during logical device
creation. This is why we need to create a logical device rather than use a physical device directly: a logical device
represents a physical device along with all the features and extensions we enabled for it.

The next part of checking a physical device’s capabilities, queues, requires additional explanation.

Queues, Queue Families, and Command Buffers


When we want to process any data (for example, draw a 3D scene from vertex data and vertex attributes) we call Vulkan
functions that are passed to the driver. These functions are not passed directly, as sending each request separately down
through a communication bus is inefficient. It is better to aggregate them and pass them in groups. In OpenGL this was
done automatically by the driver and was hidden from the user: OpenGL API calls were queued in a buffer, and when this
buffer was full (or when we requested a flush) the whole buffer was passed to the hardware for processing. In Vulkan this
mechanism is directly visible to the user and, more importantly, the user must explicitly create and manage the buffers
for commands. These are called (conveniently) command buffers.

Command buffers (as whole objects) are passed to the hardware for execution through queues. However, these
buffers may contain different types of operations, such as graphics commands (used for generating and displaying images
like in typical 3D games) or compute commands (used for processing data). Specific types of commands may be processed
by dedicated hardware, and that’s why queues are also divided into different types. In Vulkan these queue types are called
families. Each queue family may support different types of operations. That’s why we also have to check if a given physical
device supports the type of operations we want to perform. We can also perform one type of operation on one device
and another type of operation on another device, but we have to check if we can. This check is done in the second half of
CheckPhysicalDeviceProperties() function:
uint32_t queue_families_count = 0;
vkGetPhysicalDeviceQueueFamilyProperties( physical_device, &queue_families_count, nullptr );
if( queue_families_count == 0 ) {
  printf( "Physical device %p doesn't have any queue families!\n", physical_device );
  return false;
}

std::vector<VkQueueFamilyProperties> queue_family_properties( queue_families_count );

vkGetPhysicalDeviceQueueFamilyProperties( physical_device, &queue_families_count,
                                          &queue_family_properties[0] );
for( uint32_t i = 0; i < queue_families_count; ++i ) {
  if( (queue_family_properties[i].queueCount > 0) &&
      (queue_family_properties[i].queueFlags & VK_QUEUE_GRAPHICS_BIT) ) {
    queue_family_index = i;
    return true;
  }
}

printf( "Could not find queue family with required properties on physical device %p!\n",
        physical_device );
return false;
14. Tutorial01.cpp, function CheckPhysicalDeviceProperties()

We must first check how many different queue families are available in a given physical device. This is done in a similar
way to enumerating physical devices: first we call vkGetPhysicalDeviceQueueFamilyProperties() with the last parameter
set to null. This way, the number of available queue families is stored in the queue_families_count variable. Next we can
prepare storage for that many queue family properties (if we want to; the mechanism is similar to enumerating physical
devices). Then we call the function again, and the properties of each queue family are stored in the provided array.

The properties of each queue family contain queue flags, the number of available queues in this family, timestamp
support, and image transfer granularity. Right now, the most important parts are the number of queues in the family and
the flags. Flags (a bitfield) define which types of operations are supported by a given queue family (more than one type
may be supported): graphics, compute, transfer (memory operations like copying), and sparse binding (for sparse
resources like mega-textures). Other types may appear in the future.

In our example we check for graphics operations support, and if we find a family that provides it we can use the given
physical device. Note that we also have to store the selected family index. After we choose the physical device, we can
create a logical device that will represent it in the rest of our application, as shown in the example:
if( selected_physical_device == VK_NULL_HANDLE ) {
  printf( "Could not select physical device based on the chosen properties!\n" );
  return false;
}

std::vector<float> queue_priorities = { 1.0f };

VkDeviceQueueCreateInfo queue_create_info = {
  VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,     // VkStructureType            sType
  nullptr,                                        // const void                *pNext
  0,                                              // VkDeviceQueueCreateFlags   flags
  selected_queue_family_index,                    // uint32_t                   queueFamilyIndex
  static_cast<uint32_t>(queue_priorities.size()), // uint32_t                   queueCount
  &queue_priorities[0]                            // const float               *pQueuePriorities
};

VkDeviceCreateInfo device_create_info = {
  VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO,           // VkStructureType                  sType
  nullptr,                                        // const void                      *pNext
  0,                                              // VkDeviceCreateFlags              flags
  1,                                              // uint32_t                         queueCreateInfoCount
  &queue_create_info,                             // const VkDeviceQueueCreateInfo   *pQueueCreateInfos
  0,                                              // uint32_t                         enabledLayerCount
  nullptr,                                        // const char * const              *ppEnabledLayerNames
  0,                                              // uint32_t                         enabledExtensionCount
  nullptr,                                        // const char * const              *ppEnabledExtensionNames
  nullptr                                         // const VkPhysicalDeviceFeatures  *pEnabledFeatures
};

if( vkCreateDevice( selected_physical_device, &device_create_info, nullptr, &Vulkan.Device ) != VK_SUCCESS ) {
  printf( "Could not create Vulkan device!\n" );
  return false;
}

Vulkan.QueueFamilyIndex = selected_queue_family_index;
return true;
15. Tutorial01.cpp, function CreateDevice()

First we make sure that, after exiting the device-checking loop, we have found a device that supports our needs.
Next we can create a logical device, which is done by calling vkCreateDevice(). It takes the handle of a physical device and
the address of a structure that contains the information necessary for device creation. This structure is of type
VkDeviceCreateInfo and contains the following fields:

 sType – Standard type of the provided structure; here VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO, which means we are providing parameters for device creation.
 pNext – Parameter pointing to an extension-specific structure; here we set it to null.
 flags – Another parameter reserved for future use which must be zero.
 queueCreateInfoCount – Number of different queue families from which we create queues along with the device.
 pQueueCreateInfos – Pointer to an array of queueCreateInfoCount elements specifying the queues we want to create.
 enabledLayerCount – Number of device-level validation layers to enable.
 ppEnabledLayerNames – Pointer to an array with enabledLayerCount names of device-level layers to enable.
 enabledExtensionCount – Number of extensions to enable for the device.
 ppEnabledExtensionNames – Pointer to an array with enabledExtensionCount elements; each element must contain the name of an extension that should be enabled.
 pEnabledFeatures – Pointer to a structure indicating additional features to enable for this device (see the “Device Features” section).

Features (as I have described earlier) are additional hardware capabilities that are disabled by default. If we want to
enable all available features, we can’t simply fill this structure with ones. If some feature is not supported, the device
creation will fail. Instead, we should pass a structure that was filled when we called vkGetPhysicalDeviceFeatures(). This
is the easiest way to enable all supported features. If we are interested only in some specific features, we query the driver
for available features and clear all unwanted fields. If we don’t want any of the additional features we can clear this
structure (fill it with zeros) or pass a null pointer for this parameter (like in this example).

Queues are created automatically along with the device. To specify what types of queues we want to enable, we
provide an array of additional VkDeviceQueueCreateInfo structures. This array must contain queueCreateInfoCount
elements. Each element in this array must refer to a different queue family; we refer to a specific queue family only once.

The VkDeviceQueueCreateInfo structure contains the following fields:

 sType – Type of the structure; here VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO, indicating queue creation information.
 pNext – Pointer reserved for extensions.
 flags – Value reserved for future use.
 queueFamilyIndex – Index of the queue family from which queues should be created.
 queueCount – Number of queues we want to enable in this specific queue family (the number of queues we want to use from this family) and the number of elements in the pQueuePriorities array.
 pQueuePriorities – Array of floating-point values describing the priorities of operations performed in each queue from this family.

As I mentioned previously, each element of the VkDeviceQueueCreateInfo array must describe a different queue
family. Its index must be smaller than the value returned by the vkGetPhysicalDeviceQueueFamilyProperties() function
(that is, smaller than the number of available queue families). In our example we are interested in only one queue from
one queue family, and that’s why we must remember the queue family index; it is used right here. If we want to prepare
a more complicated scenario, we should also remember the number of queues in each family, as each family may support
a different number of queues. And we can’t create more queues than are available in a given family!

It is also worth noting that different queue families may have similar (or even identical) properties, meaning they may
support similar types of operations; for example, there may be more than one queue family that supports graphics
operations. And each family may contain a different number of queues.

We must also assign a floating-point value (from 0.0 to 1.0, both inclusive) to each queue. The higher the value we
provide for a given queue (relative to the values assigned to other queues), the more time that queue may get for
processing commands (relative to the other queues). However, this relation is not guaranteed, and priorities don’t
influence execution order. They are just a hint.

Priorities are relative only within a single device. If operations are performed on multiple devices, priorities may impact
processing time within each of these devices but not between them. A queue with a given value may be more important
only than queues with lower priorities on the same device; queues from different devices are treated independently.
Once we fill these structures and call vkCreateDevice(), upon success the created logical device is stored in the variable
whose address we provided (in our example it is called Vulkan.Device). If this function fails, it returns a value other than
VK_SUCCESS.

Acquiring Pointers to Device-Level Functions


We have created a logical device. We can now use it to load functions from the device level. As I mentioned earlier, in
real-life scenarios there will be situations where more than one hardware vendor on a single computer provides a Vulkan
implementation. With OpenGL this is already happening: many computers have a dedicated/discrete graphics card used
mainly for gaming, but they also have Intel’s graphics built into the processor (which of course can also be used for
games). So there will be more than one device supporting Vulkan, and with Vulkan we can divide processing across
whatever hardware we want. Remember when there were extension cards dedicated to physics processing? Or, going
farther into the past, a normal “2D” card paired with an additional graphics “accelerator” (do you remember Voodoo
cards)? Vulkan is ready for any such scenario.

So what should we do about device-level functions when there can be so many devices? We can load universal
procedures. This is done with the vkGetInstanceProcAddr() function: it returns the addresses of dispatch functions that
jump to the proper implementation based on the provided logical device handle. But we can avoid this overhead by
loading functions for each logical device separately. With this method, we must remember that we can call a given
function only with the device we loaded it from. So if we are using more devices in our application, we must load
functions from each of them. It’s not that difficult, and despite having to store more functions (grouped by the device
they were loaded from), we avoid one level of abstraction and save some processor time. We can load these functions
similarly to how we loaded the exported, global-, and instance-level functions:
#define VK_DEVICE_LEVEL_FUNCTION( fun )                                       \
    if( !(fun = (PFN_##fun)vkGetDeviceProcAddr( Vulkan.Device, #fun )) ) {    \
      printf( "Could not load device level function: " #fun "!\n" );          \
      return false;                                                           \
    }

#include "ListOfFunctions.inl"

return true;
16. Tutorial01.cpp, function LoadDeviceLevelEntryPoints()

This time we used the vkGetDeviceProcAddr() function along with a logical device handle. Functions from device level
are placed in a shared header. This time they are wrapped in a VK_DEVICE_LEVEL_FUNCTION() macro like this:
#if !defined(VK_DEVICE_LEVEL_FUNCTION)
#define VK_DEVICE_LEVEL_FUNCTION( fun )
#endif

VK_DEVICE_LEVEL_FUNCTION( vkGetDeviceQueue )
VK_DEVICE_LEVEL_FUNCTION( vkDestroyDevice )
VK_DEVICE_LEVEL_FUNCTION( vkDeviceWaitIdle )

#undef VK_DEVICE_LEVEL_FUNCTION
17. ListOfFunctions.inl

All functions that are not from the exported, global, or instance levels are from the device level. Another distinction
can be made based on the first parameter: for device-level functions, the first parameter may only be of type VkDevice,
VkQueue, or VkCommandBuffer. In the rest of the tutorial, any new function that appears must be added to
ListOfFunctions.inl, in the VK_DEVICE_LEVEL_FUNCTION portion (with a few noted exceptions, such as extensions).

Retrieving Queues
Now that we have created a device, we need a queue that we can submit some commands to for processing. Queues
are automatically created with a logical device, but in order to use them we must specifically ask for a queue handle. This
is done with vkGetDeviceQueue() like this:
vkGetDeviceQueue( Vulkan.Device, Vulkan.QueueFamilyIndex, 0, &Vulkan.Queue );

18. Tutorial01.cpp, function GetDeviceQueue()

To retrieve the queue handle we must provide the logical device we want to get the queue from. The queue family
index is also needed, and it must be one of the indices we provided during logical device creation (we cannot create
additional queues or use queues from families we didn’t request). The last parameter is a queue index within the given
family; it must be smaller than the total number of queues we requested from that family. For example, if the device
supports five queues in family number 3 and we want two queues from that family, the queue index must be smaller
than two. For each queue we want to retrieve, we have to call this function and make a separate query. If the function
call succeeds, it will store the handle of the requested queue in the variable whose address we provided in the final
parameter. From now on, all the work we want to perform (using command buffers) can be submitted for processing to
the acquired queue.

Tutorial01 Execution
As I have mentioned, the example provided with this tutorial doesn’t display anything. But we have learned enough
information for one lesson. So how do we know if everything went fine? If the normal application window appears and
nothing is printed in the console/terminal, this means the Vulkan setup was successful. Starting with the next tutorial, the
results of our operations will be displayed on the screen.

Cleaning Up
There is one more thing we need to remember: cleaning up and freeing resources. Cleanup must be done in a specific
order that is (in general) a reversal of the order of creation.

After the application closes, the OS should release the memory and all other resources associated with it, including
those belonging to Vulkan; the driver usually cleans up unreferenced resources. Unfortunately, this cleanup may not be
performed in the proper order, which might lead to an application crash during the closing process. It is always good
practice to do the cleanup ourselves. Here is the sample code required to release the resources we created during this
first tutorial:
if( Vulkan.Device != VK_NULL_HANDLE ) {
  vkDeviceWaitIdle( Vulkan.Device );
  vkDestroyDevice( Vulkan.Device, nullptr );
}

if( Vulkan.Instance != VK_NULL_HANDLE ) {
  vkDestroyInstance( Vulkan.Instance, nullptr );
}

if( VulkanLibrary ) {
#if defined(VK_USE_PLATFORM_WIN32_KHR)
  FreeLibrary( VulkanLibrary );
#elif defined(VK_USE_PLATFORM_XCB_KHR) || defined(VK_USE_PLATFORM_XLIB_KHR)
  dlclose( VulkanLibrary );
#endif
}
19. Tutorial01.cpp, destructor

We should always check whether a given resource was actually created. Without a logical device there are no device-
level function pointers, so we are unable to call even the proper resource-cleaning functions. Similarly, without an
instance we are unable to acquire a pointer to the vkDestroyInstance() function. In general, we should not release
resources that weren’t created.

We must ensure that no object is still in use by the device before we delete it. That’s why there is a wait function,
which blocks until all processing on all queues of a given device is finished. Next, we destroy the logical device using
the vkDestroyDevice() function; all queues associated with it are destroyed automatically. Then the instance is destroyed.
After that we can free (unload or release) the Vulkan library from which all these functions were acquired.

Conclusion
This tutorial explained how to prepare to use Vulkan in our application. First we “connect” with the Vulkan Runtime
library and load global-level functions from it. Then we create a Vulkan instance and load instance-level functions. After
that we can check which physical devices are available and what their features, properties, and capabilities are. Next we
create a logical device, describing which queues, and how many, must be created along with it. After that we can retrieve
device-level functions using the newly created logical device handle. One additional thing to do is to retrieve the queues
through which we can submit work for execution.

Notices

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability,
fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course
of dealing, or usage in trade.

This document contains information on products, services and/or processes in development. All information provided
here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule,
specifications and roadmaps.

The products and services described may contain defects or errors known as errata which may cause deviations from
published specifications. Current characterized errata are available on request.

Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-
548-4725 or by visiting www.intel.com/design/literature.htm.
This sample source code is released under the Intel Sample Source Code License Agreement.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.


© 2016 Intel Corporation.
API without Secrets: Introduction to Vulkan*
Part 2
Table of Contents
Tutorial 2: Swap Chain – Integrating Vulkan with the OS
    Asking for a Swap Chain Extension
        Checking Whether an Instance Extension Is Supported
        Enabling an Instance-Level Extension
        Creating a Presentation Surface
        Checking Whether a Device Extension is Supported
        Checking Whether Presentation to a Given Surface Is Supported
        Creating a Device with a Swap Chain Extension Enabled
    Creating a Semaphore
    Creating a Swap Chain
        Acquiring Surface Capabilities
        Acquiring Supported Surface Formats
        Acquiring Supported Present Modes
        Selecting the Number of Swap Chain Images
        Selecting a Format for Swap Chain Images
        Selecting the Size of the Swap Chain Images
        Selecting Swap Chain Usage Flags
        Selecting Pre-Transformations
        Selecting Presentation Mode
        Creating a Swap Chain
    Image Presentation
    Checking What Images Were Created in a Swap Chain
    Recreating a Swap Chain
    Quick Dive into Command Buffers
        Creating Command Buffer Memory Pool
        Allocating Command Buffers
        Recording Command Buffers
        Image Layouts and Layout Transitions
        Recording Command Buffers
    Tutorial 2 Execution
    Cleaning Up
    Conclusion
Tutorial 2: Swap Chain – Integrating Vulkan with the OS
Welcome to the second Vulkan tutorial. In the first tutorial, I discussed basic Vulkan setup: function loading, instance
creation, choosing a physical device and queues, and logical device creation. I’m sure you now want to draw something!
Unfortunately we must wait until the next part. Why? Because if we draw something we’ll want to see it. Similar to
OpenGL*, we must integrate the Vulkan pipeline with the application and API that the OS provides. However, with Vulkan,
this task unfortunately isn’t simple and obvious. And as with all other thin APIs, it is done this way on purpose—for the
sake of high performance and flexibility.

So how do you integrate Vulkan with the application’s window? What are the differences compared to OpenGL? In
OpenGL (on Microsoft Windows*) we acquire a Device Context that is associated with the application’s window. Using it
we then have to define “how” to present images on the screen, “what” the format is of the application’s window we will
be drawing on, and what capabilities it should support. This is done through a pixel format. Most of the time we create a
32-bit color surface with a 24-bit depth buffer and support for double buffering (this way we can draw to a
“hidden” back buffer, and after we’re finished we can present it on the screen—swap the front and back buffers). Only after
these preparations can we create a Rendering Context and activate it. In OpenGL, all rendering is directed to the
default back buffer.

In Vulkan there is no default frame buffer. We can create an application that displays nothing at all. This is a valid
approach. But if we want to display something we can create a set of buffers to which we can render. These buffers along
with their properties, similar to Direct3D*, are called a swap chain. A swap chain can contain many images. To display any
of them we don’t “swap” them—as the name suggests—but we present them, which means that we give them back to a
presentation engine. So in OpenGL we first have to define the surface format and associate it with a window (at least on
Windows) and after that we create Rendering Context. In Vulkan, we first create an instance, a device, and then we create
a swap chain. But, what’s interesting is that there will be situations where we will have to destroy this swap chain and
recreate it. In the middle of a working application. From scratch!

Asking for a Swap Chain Extension


In Vulkan, a swap chain is an extension. Why? Isn’t it obvious we want to display an image on the screen in our
application’s window?

Well, it’s not so obvious. Vulkan can be used for many different purposes, including performing mathematical
operations, boosting physics calculations, and processing a video stream. The results of these actions may not necessarily
be displayed on a typical monitor, which is why the core API is OS-agnostic, similar to OpenGL.

If you want to create a game and display rendered images on a monitor, you can (and should) use a swap chain. But
here is the second reason why a swap chain is an extension. Every OS displays images in a different way. The surface on
which you can render may be implemented differently, can have a different format, and can be differently represented in
the OS—there is no one universal way to do it. So in Vulkan a swap chain must also depend on the OS your application is
written for.

These are the reasons a swap chain in Vulkan is treated as an extension: it provides render targets (buffers or images,
like FBOs in OpenGL) that integrate with OS-specific code. It’s something that core Vulkan (which is platform-independent)
can’t do. So if swap chain creation and usage is an extension, we have to ask for the extension during both instance and
device creation. The ability to create and use a swap chain requires us to enable extensions at two levels (at least on most
operating systems, with Windows and Linux* among them). This means that we have to go back to the first tutorial and
change it to request the proper swap-chain-related extensions. If a given instance or device doesn’t support these
extensions, the instance and/or device creation will fail. There are of course other ways to display an
image, like acquiring a pointer to a buffer’s (texture’s) memory (mapping it) and copying data from it to the OS-acquired
window surface pointer. This process is outside the scope of this tutorial (though not really that hard). But fortunately it seems
that the swap chain extensions will be similar to OpenGL’s core extensions: something that’s not in the core spec
and not required to be implemented, but that every hardware vendor will implement
anyway. I think all hardware vendors would like to show that they support Vulkan and that it gives an impressive performance
boost in games displayed on screen. And, backing this theory, the swap chain extensions are integrated into the
main, “core” vulkan.h header.

In the case of swap-chain support, there are actually three extensions involved: two from an instance level and one
from a device level. These extensions logically separate different functionalities. The first is the VK_KHR_surface extension
defined at the instance level. It describes a “surface” object, which is a logical representation of an application’s window.
This extension allows us to check different parameters (that is, capabilities, supported formats, size) of a surface and to
query whether the given physical device supports a swap chain (more precisely, whether the given queue family supports
presenting an image to a given surface). This is useful information because we don’t want to choose a physical device and
try to create a logical device from it only to find out that it doesn’t support swap chains. This extension also defines
methods to destroy any such surface.

The second instance-level extension is OS-dependent: in the Windows OS family it is called VK_KHR_win32_surface
and in Linux it is called VK_KHR_xlib_surface or VK_KHR_xcb_surface. This extension allows us to create a surface that
represents the application’s window in a given OS (and uses OS-specific parameters).

Checking Whether an Instance Extension Is Supported


Before we can enable the two instance-level extensions, we need to check whether they are available or supported.
We are talking about instance extensions, but we haven’t created any instance yet. To determine whether our Vulkan
instance supports these extensions, we use a global-level function called vkEnumerateInstanceExtensionProperties(). It
enumerates all available general instance extensions, if its first parameter is null, or instance layer extensions (layers
can also have extensions), if we set the first parameter to the name of a given layer. We aren’t interested in
layers, so we leave the first parameter set to null. Again we call this function twice. For the first call, we want to acquire
the total number of supported extensions, so we leave the third argument set to null. Next we prepare storage for all these
extensions and call the function once again with the third parameter pointing to the allocated storage.
uint32_t extensions_count = 0;
if( (vkEnumerateInstanceExtensionProperties( nullptr, &extensions_count, nullptr ) != VK_SUCCESS) ||
    (extensions_count == 0) ) {
  printf( "Error occurred during instance extensions enumeration!\n" );
  return false;
}

std::vector<VkExtensionProperties> available_extensions( extensions_count );
if( vkEnumerateInstanceExtensionProperties( nullptr, &extensions_count, &available_extensions[0] ) != VK_SUCCESS ) {
  printf( "Error occurred during instance extensions enumeration!\n" );
  return false;
}

std::vector<const char*> extensions = {
  VK_KHR_SURFACE_EXTENSION_NAME,
#if defined(VK_USE_PLATFORM_WIN32_KHR)
  VK_KHR_WIN32_SURFACE_EXTENSION_NAME
#elif defined(VK_USE_PLATFORM_XCB_KHR)
  VK_KHR_XCB_SURFACE_EXTENSION_NAME
#elif defined(VK_USE_PLATFORM_XLIB_KHR)
  VK_KHR_XLIB_SURFACE_EXTENSION_NAME
#endif
};

for( size_t i = 0; i < extensions.size(); ++i ) {
  if( !CheckExtensionAvailability( extensions[i], available_extensions ) ) {
    printf( "Could not find instance extension named \"%s\"!\n", extensions[i] );
    return false;
  }
}
1. Tutorial02.cpp, function CreateInstance()

We can prepare storage for a smaller number of extensions, but then vkEnumerateInstanceExtensionProperties() will
return VK_INCOMPLETE to let us know we didn’t acquire all the extensions.

Our array is now filled with all available (supported) instance-level extensions. Each element of our allocated space
contains the name of the extension and its version. The version field probably won’t be used too often, but it may
be useful to check whether the hardware supports a given version of an extension. For example, we might be
interested in some specific extension, and we downloaded an SDK for it that contains a set of header files. Each header
file has its own version corresponding to the value returned by this query. If the hardware our application is executed on
supports an older version of the extension (not the one we downloaded the SDK for), it may not support all the functions
we are using from this specific extension. So sometimes it may be useful to also verify the version, but for a swap chain it
doesn’t matter—at least for now.

We can now search through all of the returned extensions and see whether the list contains the extensions we are
looking for. Here I’m using two convenient defines named VK_KHR_SURFACE_EXTENSION_NAME and
VK_KHR_????_SURFACE_EXTENSION_NAME. They are defined inside a Vulkan header file and contain the names of the
extensions, so we don’t have to copy or remember them. We can just use the defines in our code, and if we make a
mistake the compiler will tell us. I hope all extensions will come with similar defines.

With the second define comes a small trap. Both of these defines are placed in the vulkan.h header file. But
isn’t the second define specific to a given OS, and isn’t the vulkan.h header OS-independent? Both observations are
correct. The vulkan.h file is OS-independent, yet it contains the definitions of OS-specific extensions. These are
enclosed inside #ifdef … #endif preprocessor directives. If we want to “enable” them we need to add a proper preprocessor
definition somewhere in our project. For a Windows system, we need to add a VK_USE_PLATFORM_WIN32_KHR definition. On
Linux, we need to add VK_USE_PLATFORM_XCB_KHR or VK_USE_PLATFORM_XLIB_KHR depending on whether we want
to use the XCB or Xlib libraries. In the provided example project, these definitions are added by default through the
CMakeLists.txt file.

But back to our source code. What does the CheckExtensionAvailability() function do? It loops over all available
extensions and compares their names with the name of the provided extension. If a match is found, it just returns true.
for( size_t i = 0; i < available_extensions.size(); ++i ) {
  if( strcmp( available_extensions[i].extensionName, extension_name ) == 0 ) {
    return true;
  }
}
return false;
2. Tutorial02.cpp, function CheckExtensionAvailability()

Enabling an Instance-Level Extension


Let’s say we have verified that both extensions are supported. Instance-level extensions are requested (enabled)
during instance creation—we create an instance with a list of extensions that should be enabled. Here’s the code
responsible for doing it:
VkApplicationInfo application_info = {
  VK_STRUCTURE_TYPE_APPLICATION_INFO,             // VkStructureType            sType
  nullptr,                                        // const void                *pNext
  "API without Secrets: Introduction to Vulkan",  // const char                *pApplicationName
  VK_MAKE_VERSION( 1, 0, 0 ),                     // uint32_t                   applicationVersion
  "Vulkan Tutorial by Intel",                     // const char                *pEngineName
  VK_MAKE_VERSION( 1, 0, 0 ),                     // uint32_t                   engineVersion
  VK_API_VERSION                                  // uint32_t                   apiVersion
};

VkInstanceCreateInfo instance_create_info = {
  VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,         // VkStructureType            sType
  nullptr,                                        // const void                *pNext
  0,                                              // VkInstanceCreateFlags      flags
  &application_info,                              // const VkApplicationInfo   *pApplicationInfo
  0,                                              // uint32_t                   enabledLayerCount
  nullptr,                                        // const char * const        *ppEnabledLayerNames
  static_cast<uint32_t>(extensions.size()),       // uint32_t                   enabledExtensionCount
  &extensions[0]                                  // const char * const        *ppEnabledExtensionNames
};

if( vkCreateInstance( &instance_create_info, nullptr, &Vulkan.Instance ) != VK_SUCCESS ) {
  printf( "Could not create Vulkan instance!\n" );
  return false;
}
return true;
3. Tutorial02.cpp, function CreateInstance()

This code is similar to the CreateInstance() function in the Tutorial01.cpp file. To request instance-level extensions we
have to prepare an array with the names of all the extensions we want to enable. Here I have used a standard vector of
“const char*” elements and the extension names in the form of the defines mentioned earlier.

In Tutorial 1 we declared zero extensions and placed a nullptr for the address of the array in the VkInstanceCreateInfo
structure. This time we must provide the address of the first element of an array filled with the names of the requested
extensions. And we must also specify how many elements the array contains (that’s why I chose a vector: if I add or remove
extensions in future tutorials, the vector’s size will also change accordingly). Next we call the vkCreateInstance() function.
If it doesn’t return VK_SUCCESS it means that (in the case of this tutorial) the extensions are not supported. If it does return
successfully, we can load instance-level functions as previously, but this time also with some additional, extension-specific
functions.

With these extensions come additional functions. And, as they are instance-level extensions, we must add those functions to our
set of instance-level functions (so they will also be loaded at the proper moment and with the proper loading function). In this case
we must add the following entries to ListOfFunctions.inl, wrapped in the VK_INSTANCE_LEVEL_FUNCTION() macro
like this:
// From extensions
#if defined(USE_SWAPCHAIN_EXTENSIONS)
VK_INSTANCE_LEVEL_FUNCTION( vkDestroySurfaceKHR )
VK_INSTANCE_LEVEL_FUNCTION( vkGetPhysicalDeviceSurfaceSupportKHR )
VK_INSTANCE_LEVEL_FUNCTION( vkGetPhysicalDeviceSurfaceCapabilitiesKHR )
VK_INSTANCE_LEVEL_FUNCTION( vkGetPhysicalDeviceSurfaceFormatsKHR )
VK_INSTANCE_LEVEL_FUNCTION( vkGetPhysicalDeviceSurfacePresentModesKHR )
#if defined(VK_USE_PLATFORM_WIN32_KHR)
VK_INSTANCE_LEVEL_FUNCTION( vkCreateWin32SurfaceKHR )
#elif defined(VK_USE_PLATFORM_XCB_KHR)
VK_INSTANCE_LEVEL_FUNCTION( vkCreateXcbSurfaceKHR )
#elif defined(VK_USE_PLATFORM_XLIB_KHR)
VK_INSTANCE_LEVEL_FUNCTION( vkCreateXlibSurfaceKHR )
#endif
#endif
4. ListOfFunctions.inl

One more thing: I’ve wrapped all these swap-chain-related functions inside another #ifdef … #endif pair, which
requires a USE_SWAPCHAIN_EXTENSIONS preprocessor directive to be defined. I’ve done this so that Tutorial 1 works
properly. Without it, our first application (which uses the same header files) would try to load all these functions. But we don’t
enable the swap chain extensions in the first tutorial, so this operation would fail and the application would close without fully
initializing Vulkan. If a given extension isn’t enabled, its functions may not be available.

Creating a Presentation Surface


We have created a Vulkan instance with two extensions enabled. We have loaded instance-level functions from a core
Vulkan spec and from enabled extensions (this is done automatically thanks to our macros). To create a surface, we write
code similar to the following:
#if defined(VK_USE_PLATFORM_WIN32_KHR)
VkWin32SurfaceCreateInfoKHR surface_create_info = {
  VK_STRUCTURE_TYPE_WIN32_SURFACE_CREATE_INFO_KHR,  // VkStructureType                 sType
  nullptr,                                          // const void                     *pNext
  0,                                                // VkWin32SurfaceCreateFlagsKHR    flags
  Window.Instance,                                  // HINSTANCE                       hinstance
  Window.Handle                                     // HWND                            hwnd
};

if( vkCreateWin32SurfaceKHR( Vulkan.Instance, &surface_create_info, nullptr, &Vulkan.PresentationSurface ) == VK_SUCCESS ) {
  return true;
}

#elif defined(VK_USE_PLATFORM_XCB_KHR)
VkXcbSurfaceCreateInfoKHR surface_create_info = {
  VK_STRUCTURE_TYPE_XCB_SURFACE_CREATE_INFO_KHR,    // VkStructureType                 sType
  nullptr,                                          // const void                     *pNext
  0,                                                // VkXcbSurfaceCreateFlagsKHR      flags
  Window.Connection,                                // xcb_connection_t*               connection
  Window.Handle                                     // xcb_window_t                    window
};

if( vkCreateXcbSurfaceKHR( Vulkan.Instance, &surface_create_info, nullptr, &Vulkan.PresentationSurface ) == VK_SUCCESS ) {
  return true;
}

#elif defined(VK_USE_PLATFORM_XLIB_KHR)
VkXlibSurfaceCreateInfoKHR surface_create_info = {
  VK_STRUCTURE_TYPE_XLIB_SURFACE_CREATE_INFO_KHR,   // VkStructureType                 sType
  nullptr,                                          // const void                     *pNext
  0,                                                // VkXlibSurfaceCreateFlagsKHR     flags
  Window.DisplayPtr,                                // Display                        *dpy
  Window.Handle                                     // Window                          window
};

if( vkCreateXlibSurfaceKHR( Vulkan.Instance, &surface_create_info, nullptr, &Vulkan.PresentationSurface ) == VK_SUCCESS ) {
  return true;
}

#endif

printf( "Could not create presentation surface!\n" );
return false;
5. Tutorial02.cpp, function CreatePresentationSurface()

To create a presentation surface, we call the vkCreate????SurfaceKHR() function, which accepts the Vulkan instance (with
the surface extensions enabled), a pointer to an OS-specific structure, a pointer to optional memory allocation handling
functions, and a pointer to a variable in which the handle to the created surface will be stored.

This OS-specific structure is called Vk????SurfaceCreateInfoKHR and it contains the following fields:

 sType – Standard type of structure that here should be equal to VK_STRUCTURE_TYPE_????_SURFACE_CREATE_INFO_KHR (where ???? can be WIN32, XCB, XLIB, or other)
 pNext – Standard pointer to some other structure
 flags – Parameter reserved for future use
 hinstance/connection/dpy – First OS-specific parameter
 hwnd/window – Handle to our application’s window (also OS specific)

Checking Whether a Device Extension is Supported


We have created an instance and a surface. The next step is to create a logical device. But we want to create a device
that supports a swap chain. So we also need to check whether a given physical device supports a swap chain extension, a
device-level extension. This extension is called VK_KHR_swapchain, and it defines the actual support, implementation,
and usage of a swap chain.

To check which extensions a given physical device supports, we create code similar to the code prepared for
instance-level extensions. This time we just use the vkEnumerateDeviceExtensionProperties() function. It behaves
identically to the function querying instance extensions. The only difference is that it takes an additional physical device
handle as its first argument. The code for this may look similar to the example below. It is a part of the
CheckPhysicalDeviceProperties() function in our example source code.
uint32_t extensions_count = 0;
if( (vkEnumerateDeviceExtensionProperties( physical_device, nullptr, &extensions_count, nullptr ) != VK_SUCCESS) ||
    (extensions_count == 0) ) {
  printf( "Error occurred during physical device %p extensions enumeration!\n", physical_device );
  return false;
}

std::vector<VkExtensionProperties> available_extensions( extensions_count );
if( vkEnumerateDeviceExtensionProperties( physical_device, nullptr, &extensions_count, &available_extensions[0] ) != VK_SUCCESS ) {
  printf( "Error occurred during physical device %p extensions enumeration!\n", physical_device );
  return false;
}

std::vector<const char*> device_extensions = {
  VK_KHR_SWAPCHAIN_EXTENSION_NAME
};

for( size_t i = 0; i < device_extensions.size(); ++i ) {
  if( !CheckExtensionAvailability( device_extensions[i], available_extensions ) ) {
    printf( "Physical device %p doesn't support extension named \"%s\"!\n", physical_device, device_extensions[i] );
    return false;
  }
}
6. Tutorial02.cpp, function CheckPhysicalDeviceProperties()

We first ask for the number of all extensions available on a given physical device. Next we get their names and look
for the device-level swap chain extension. If it is missing, there is no point in further checking the device’s properties,
features, and queue family properties, as the device doesn’t support a swap chain at all.

Checking Whether Presentation to a Given Surface Is Supported


Let’s go back to the CreateDevice() function. After creating an instance, in the first tutorial we looped through all
available physical devices and queried their properties. Based on these properties we selected which device we want to
use and which queue families we want to request. This query is done in a loop over all available physical devices. Now that
we want to use a swap chain, I have to modify the CheckPhysicalDeviceProperties() function, which is called inside that
loop from the CreateDevice() function, like this:
uint32_t selected_graphics_queue_family_index = UINT32_MAX;
uint32_t selected_present_queue_family_index = UINT32_MAX;

for( uint32_t i = 0; i < num_devices; ++i ) {
  if( CheckPhysicalDeviceProperties( physical_devices[i], selected_graphics_queue_family_index, selected_present_queue_family_index ) ) {
    Vulkan.PhysicalDevice = physical_devices[i];
  }
}
7. Tutorial02.cpp, function CreateDevice()

The only change is that I’ve added another variable that will contain the index of a queue family that supports a swap
chain (more precisely, image presentation). Unfortunately, just checking whether the swap chain extension is supported is not
enough, because presentation support is a queue family property. A physical device may support swap chains, but that
doesn’t mean that all its queue families also support it. And do we really need another queue or queue family for displaying
images? Can’t we just use the graphics queue we selected in the first tutorial? Most of the time one queue family will
probably be enough for our needs. This means that the selected queue family supports both graphics operations and
presentation. But, unfortunately, it is also possible that there will be devices that won’t support graphics and presentation
within a single queue family. In Vulkan we have to be flexible and prepared for any situation.

The vkGetPhysicalDeviceSurfaceSupportKHR() function is used to check whether a given queue family from a given
physical device supports a swap chain or, to be more precise, whether it supports presenting images to a given surface.
That’s why we needed to create the surface earlier.

So assume we have already checked that a given physical device exposes the swap chain extension and that we
have already queried the number of different queue families supported by that physical device. We have also
requested the properties of all queue families. Now we can check whether a given queue family supports presentation to
our surface (window).
uint32_t graphics_queue_family_index = UINT32_MAX;
uint32_t present_queue_family_index = UINT32_MAX;

std::vector<VkBool32> queue_present_support( queue_families_count );

for( uint32_t i = 0; i < queue_families_count; ++i ) {
  vkGetPhysicalDeviceSurfaceSupportKHR( physical_device, i, Vulkan.PresentationSurface, &queue_present_support[i] );

  if( (queue_family_properties[i].queueCount > 0) &&
      (queue_family_properties[i].queueFlags & VK_QUEUE_GRAPHICS_BIT) ) {
    // Select first queue that supports graphics
    if( graphics_queue_family_index == UINT32_MAX ) {
      graphics_queue_family_index = i;
    }

    // If there is queue that supports both graphics and present - prefer it
    if( queue_present_support[i] ) {
      selected_graphics_queue_family_index = i;
      selected_present_queue_family_index = i;
      return true;
    }
  }
}

// We don't have queue that supports both graphics and present so we have to use separate queues
for( uint32_t i = 0; i < queue_families_count; ++i ) {
  if( queue_present_support[i] ) {
    present_queue_family_index = i;
    break;
  }
}

// If this device doesn't support queues with graphics and present capabilities don't use it
if( (graphics_queue_family_index == UINT32_MAX) ||
    (present_queue_family_index == UINT32_MAX) ) {
  printf( "Could not find queue families with required properties on physical device %p!\n", physical_device );
  return false;
}

selected_graphics_queue_family_index = graphics_queue_family_index;
selected_present_queue_family_index = present_queue_family_index;
return true;
8. Tutorial02.cpp, function CheckPhysicalDeviceProperties()

Here we are iterating over all available queue families. In each loop iteration, we are calling a function responsible for
checking whether a given queue family supports presentation. vkGetPhysicalDeviceSurfaceSupportKHR() function
requires us to provide a physical device handle, the queue family index we want to check, and the surface handle we want
to render into (present an image). If support is available, VK_TRUE will be stored at a given address; otherwise VK_FALSE
is stored.

Now we have the properties of all available queue families. We know which queue families support graphics operations
and which support presentation. In our tutorial example I prefer families that support both. If I find one, I store the family
index and exit immediately from the CheckPhysicalDeviceProperties() function. If there is no such queue family, I use the first
queue family that supports graphics and the first family that supports presentation. Only then can I leave the function with a
“success” return code.

A more advanced scenario might search through all available devices and try to find one with a queue family that
supports both graphics and presentation operations. But I can also imagine situations where no single device
supports both. We would then be forced to use one device for graphics calculations (much like the old “graphics
accelerators”) and another device for presenting the results on the screen (connected to the “accelerator” and a monitor).
Unfortunately, in such a case we must use “general” Vulkan functions from the Vulkan Runtime, or we need to store
device-level functions for each device used (each device may have a different implementation of Vulkan functions). But,
hopefully, such situations will be uncommon.

Creating a Device with a Swap Chain Extension Enabled


Now we can return to the CreateDevice() function. We have found the physical device that supports both graphics
and presenting but not necessarily in a single queue family. We now need to create a logical device.
std::vector<VkDeviceQueueCreateInfo> queue_create_infos;
std::vector<float> queue_priorities = { 1.0f };

queue_create_infos.push_back( {
  VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,     // VkStructureType              sType
  nullptr,                                        // const void                  *pNext
  0,                                              // VkDeviceQueueCreateFlags     flags
  selected_graphics_queue_family_index,           // uint32_t                     queueFamilyIndex
  static_cast<uint32_t>(queue_priorities.size()), // uint32_t                     queueCount
  &queue_priorities[0]                            // const float                 *pQueuePriorities
} );

if( selected_graphics_queue_family_index != selected_present_queue_family_index ) {
  queue_create_infos.push_back( {
    VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,     // VkStructureType              sType
    nullptr,                                        // const void                  *pNext
    0,                                              // VkDeviceQueueCreateFlags     flags
    selected_present_queue_family_index,            // uint32_t                     queueFamilyIndex
    static_cast<uint32_t>(queue_priorities.size()), // uint32_t                     queueCount
    &queue_priorities[0]                            // const float                 *pQueuePriorities
  } );
}

std::vector<const char*> extensions = {
  VK_KHR_SWAPCHAIN_EXTENSION_NAME
};

VkDeviceCreateInfo device_create_info = {
  VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO,             // VkStructureType                    sType
  nullptr,                                          // const void                        *pNext
  0,                                                // VkDeviceCreateFlags                flags
  static_cast<uint32_t>(queue_create_infos.size()), // uint32_t                           queueCreateInfoCount
  &queue_create_infos[0],                           // const VkDeviceQueueCreateInfo     *pQueueCreateInfos
  0,                                                // uint32_t                           enabledLayerCount
  nullptr,                                          // const char * const                *ppEnabledLayerNames
  static_cast<uint32_t>(extensions.size()),         // uint32_t                           enabledExtensionCount
  &extensions[0],                                   // const char * const                *ppEnabledExtensionNames
  nullptr                                           // const VkPhysicalDeviceFeatures    *pEnabledFeatures
};

if( vkCreateDevice( Vulkan.PhysicalDevice, &device_create_info, nullptr, &Vulkan.Device ) != VK_SUCCESS ) {
  printf( "Could not create Vulkan device!\n" );
  return false;
}

Vulkan.GraphicsQueueFamilyIndex = selected_graphics_queue_family_index;
Vulkan.PresentQueueFamilyIndex = selected_present_queue_family_index;
return true;
9. Tutorial02.cpp, function CreateDevice()

As before, we need to fill a variable of type VkDeviceCreateInfo. To do this, we need to declare the queue families and
how many queues we want to enable from each. We do this through a pointer to a separate array with
VkDeviceQueueCreateInfo elements. Here I declare a vector and add one element, which defines one queue from the
queue family that supports graphics operations. We use a vector because if graphics and presenting aren't supported by
a single family, we will need to define two separate families. If a single family supports both, we just define one member
and declare that only one family is needed. If the indices of the graphics and presentation families are different, we need
to declare additional members for our vector of VkDeviceQueueCreateInfo elements. In this case the VkDeviceCreateInfo
structure must provide info about two different families. That's why a vector once again comes in handy (with its size()
member function).

But we are not finished with device creation yet. We have to ask for the third extension related to a swap chain: the
device-level "VK_KHR_swapchain" extension. As mentioned earlier, this extension defines the actual support,
implementation, and usage of a swap chain.

To ask for this extension, similarly to the instance level, we define an array (or a vector) which contains the names of all
the device-level extensions we want to enable. We provide an address of the first element of this array and the number of
extensions we want to use. This extension also contains a definition of its name in the form of a
VK_KHR_SWAPCHAIN_EXTENSION_NAME #define. We can use it inside our array (vector), and we don't have to worry
about any typos.

This third extension introduces additional functions used to actually create, destroy, or in general manage swap chains.
Before we can use them, we of course need to load pointers to these functions. They are from the device level so we will
place them in a ListOfFunctions.inl file using VK_DEVICE_LEVEL_FUNCTION() macro:
// From extensions
#if defined(USE_SWAPCHAIN_EXTENSIONS)
VK_DEVICE_LEVEL_FUNCTION( vkCreateSwapchainKHR )
VK_DEVICE_LEVEL_FUNCTION( vkDestroySwapchainKHR )
VK_DEVICE_LEVEL_FUNCTION( vkGetSwapchainImagesKHR )
VK_DEVICE_LEVEL_FUNCTION( vkAcquireNextImageKHR )
VK_DEVICE_LEVEL_FUNCTION( vkQueuePresentKHR )
#endif
10. ListOfFunctions.inl

You can once again see that I’m checking whether a USE_SWAPCHAIN_EXTENSIONS preprocessor directive is defined.
I define it only in projects that enable swap-chain extensions.

Now that we have created a logical device, we need to retrieve the handles of a graphics queue and (if separate) a
presentation queue. I'm using two separate queue variables for convenience, but they both may contain the same handle.

After loading the device-level functions we can read requested queue handles. Here’s the code for it:
vkGetDeviceQueue( Vulkan.Device, Vulkan.GraphicsQueueFamilyIndex, 0, &Vulkan.GraphicsQueue );
vkGetDeviceQueue( Vulkan.Device, Vulkan.PresentQueueFamilyIndex, 0, &Vulkan.PresentQueue );
return true;
11. Tutorial02.cpp, function GetDeviceQueue()

Creating a Semaphore
One last step before we can move to swap chain creation and usage is to create a semaphore. Semaphores are objects
used for queue synchronization. They may be signaled or unsignaled. One queue may signal a semaphore (change its state
from unsignaled to signaled) when some operations are finished, and another queue may wait on the semaphore until it
becomes signaled. After that, the queue resumes performing operations submitted through command buffers.
VkSemaphoreCreateInfo semaphore_create_info = {
VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO, // VkStructureType sType
nullptr, // const void* pNext
0 // VkSemaphoreCreateFlags flags
};

if( (vkCreateSemaphore( Vulkan.Device, &semaphore_create_info, nullptr,
                        &Vulkan.ImageAvailableSemaphore ) != VK_SUCCESS) ||
    (vkCreateSemaphore( Vulkan.Device, &semaphore_create_info, nullptr,
                        &Vulkan.RenderingFinishedSemaphore ) != VK_SUCCESS) ) {
  printf( "Could not create semaphores!\n" );
  return false;
}

return true;
12. Tutorial02.cpp, function CreateSemaphores()

To create a semaphore we call the vkCreateSemaphore() function. It requires us to provide create information with
three fields:

 sType – Standard structure type that must be set to VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO in this
example.
 pNext – Standard parameter reserved for future use.
 flags – Another parameter that is reserved for future use and must equal zero.

Semaphores are used during drawing (or during presentation if we want to be more precise). I will describe the details
later.
Creating a Swap Chain
We have enabled support for a swap chain, but before we can render anything on screen we must first create a swap
chain from which we can acquire images on which we can render (or to which we can copy anything if we have rendered
something into another image).

To create a swap chain, we call the vkCreateSwapchainKHR() function. It requires us to provide an address of a
variable of type VkSwapchainCreateInfoKHR, which informs the driver about the properties of a swap chain that is being
created. To fill this structure with the proper values, we must determine what is possible on a given hardware and
platform. To do this we query the platform’s or window’s properties about the availability of and compatibility with several
different features, that is, supported image formats or present modes (how images are presented on screen). So before
we can create a swap chain we must check what is possible with a given platform and how we can create a swap chain.

Acquiring Surface Capabilities


First we must query for surface capabilities. To do this, we call the vkGetPhysicalDeviceSurfaceCapabilitiesKHR()
function like this:
VkSurfaceCapabilitiesKHR surface_capabilities;
if( vkGetPhysicalDeviceSurfaceCapabilitiesKHR( Vulkan.PhysicalDevice,
Vulkan.PresentationSurface, &surface_capabilities ) != VK_SUCCESS ) {
printf( "Could not check presentation surface capabilities!\n" );
return false;
}
13. Tutorial02.cpp, function CreateSwapChain()

Acquired capabilities contain important information about ranges (limits) that are supported by the swap chain, that
is, minimal and maximal number of images, minimal and maximal dimensions of images, or supported transforms (some
platforms may require transformations applied to images before these images may be presented).

Acquiring Supported Surface Formats


Next, we need to query for supported surface formats. Not all platforms are compatible with typical image formats
like non-linear 32-bit RGBA. Some platforms don't have any preferences, but others may support only a small range of
formats. We can only select one of the available formats for a swap chain or its creation will fail.

To query for surface formats, we must call the vkGetPhysicalDeviceSurfaceFormatsKHR() function. We can do it, as
usual, twice: the first time to acquire the number of supported formats and a second time to acquire supported formats
in an array prepared for this purpose. It can be done like this:
uint32_t formats_count;
if( (vkGetPhysicalDeviceSurfaceFormatsKHR( Vulkan.PhysicalDevice,
     Vulkan.PresentationSurface, &formats_count, nullptr ) != VK_SUCCESS) ||
    (formats_count == 0) ) {
  printf( "Error occurred during presentation surface formats enumeration!\n" );
  return false;
}

std::vector<VkSurfaceFormatKHR> surface_formats( formats_count );
if( vkGetPhysicalDeviceSurfaceFormatsKHR( Vulkan.PhysicalDevice,
    Vulkan.PresentationSurface, &formats_count, &surface_formats[0] ) != VK_SUCCESS ) {
  printf( "Error occurred during presentation surface formats enumeration!\n" );
  return false;
}
14. Tutorial02.cpp, function CreateSwapChain()
Acquiring Supported Present Modes
We should also ask for the available present modes, which tell us how images are presented (displayed) on the screen.
The present mode defines whether an application will wait for v-sync or whether it will display an image immediately
when it is available (which will probably lead to image tearing). I describe different present modes later.

To query for present modes that are supported on a given platform, we call the
vkGetPhysicalDeviceSurfacePresentModesKHR() function. We can create code similar to this one:
uint32_t present_modes_count;
if( (vkGetPhysicalDeviceSurfacePresentModesKHR( Vulkan.PhysicalDevice,
     Vulkan.PresentationSurface, &present_modes_count, nullptr ) != VK_SUCCESS) ||
    (present_modes_count == 0) ) {
  printf( "Error occurred during presentation surface present modes enumeration!\n" );
  return false;
}

std::vector<VkPresentModeKHR> present_modes( present_modes_count );
if( vkGetPhysicalDeviceSurfacePresentModesKHR( Vulkan.PhysicalDevice,
    Vulkan.PresentationSurface, &present_modes_count, &present_modes[0] ) != VK_SUCCESS ) {
  printf( "Error occurred during presentation surface present modes enumeration!\n" );
  return false;
}
15. Tutorial02.cpp, function CreateSwapChain()

We now have acquired all the data that will help us prepare the proper values for a swap chain creation.

Selecting the Number of Swap Chain Images


A swap chain consists of multiple images. Several images (typically more than one) are required for the presentation
engine to work properly, that is, one image is presented on the screen, another image waits in a queue for the next v-
sync, and a third image is available for the application to render into.

An application may request more images. If it wants to use multiple images at once it may do so, for example, when
encoding a video stream where every fourth image is a key frame and the application needs it to prepare the remaining
three frames. Two factors determine the number of images that will be created in a swap chain: how many images the
application requires at once for processing, and how many images the presentation engine requires to function
properly.

But we must ensure that the requested number of swap chain images is not smaller than the minimal required number
of images and not greater than the maximal supported number of images (if there is such a limitation). And too many
images will require much more memory. On the other hand, too small a number of images may cause stalls in the
application (more about this later).

The number of images that are required for a swap chain to work properly and for an application to be able to render
to is defined in the surface capabilities. Here is some code that checks whether the number of images is between the
allowable min and max values:
// Set of images defined in a swap chain may not always be available for application to render to:
// One may be displayed and one may wait in a queue to be presented
// If application wants to use more images at the same time it must ask for more images
uint32_t image_count = surface_capabilities.minImageCount + 1;
if( (surface_capabilities.maxImageCount > 0) &&
    (image_count > surface_capabilities.maxImageCount) ) {
  image_count = surface_capabilities.maxImageCount;
}
return image_count;
16. Tutorial02.cpp, function GetSwapChainNumImages()

The minImageCount value in the surface capabilities structure gives the minimum number of images required for the
swap chain to work properly. Here I'm selecting one more image than is required, and I also check whether I'm asking for
too many. One additional image may be useful for a triple-buffering-like presentation mode (if it is available). In more
advanced scenarios we would also be required to store the number of images we want to use at the same time (at once).
Let's say we want to encode the video stream mentioned earlier, and we need a key frame (every fourth image frame)
plus the other three images. But the swap chain doesn't allow the application to operate on four images at once, only on
three. We need to know that because we can only prepare two frames from a key frame; then we need to release them
(give them back to the presentation engine), and only then can we acquire the last, third, non-key frame. This will
become clearer later.

Selecting a Format for Swap Chain Images


Choosing a format for the images depends on the type of processing/rendering we want to do; for example, if we want
to blend the application window with the desktop contents, an alpha value may be required. We must also know what
color space is available and whether we operate on a linear or sRGB color space.

Each platform may support a different number of format-color space pairs. If we want to use specific ones, we must
make sure that they are available.
// If the list contains only one entry with undefined format
// it means that there are no preferred surface formats and any can be chosen
if( (surface_formats.size() == 1) &&
    (surface_formats[0].format == VK_FORMAT_UNDEFINED) ) {
  return{ VK_FORMAT_R8G8B8A8_UNORM, VK_COLORSPACE_SRGB_NONLINEAR_KHR };
}

// Check if list contains most widely used R8 G8 B8 A8 format
// with nonlinear color space
for( VkSurfaceFormatKHR &surface_format : surface_formats ) {
  if( surface_format.format == VK_FORMAT_R8G8B8A8_UNORM ) {
    return surface_format;
  }
}

// Return the first format from the list
return surface_formats[0];
17. Tutorial02.cpp, function GetSwapChainFormat()

Earlier we queried for the supported formats, which were placed in an array (a vector in our case). If this array contains
only one value with an undefined format, the platform doesn't have any preferences, and we can use any image format
we want.

In other cases, we can use only one of the available formats. Here I'm looking for any (linear or not) 32-bit RGBA
format. If it is available, I choose it. If there is no such format, I use the first format from the list (hoping that the first is
also the best and contains the format with the most precision).

Selecting the Size of the Swap Chain Images


Typically the size of swap chain images is identical to the window size. We can choose other sizes, but we must fit
within the image size constraints. The size of an image that fits the current application window is available in
the surface capabilities structure, in the "currentExtent" member.

One thing worth noting is that a special value of "-1" indicates that the application's window size will be determined
by the swap chain size, so we can choose whatever dimensions we want. But we must still make sure that the selected
size is not smaller and not greater than the defined minimum and maximum constraints.

Selecting the swap chain size may (and probably usually will) look like this:
// Special value of surface extent is width == height == -1
// If this is so we define the size by ourselves but it must fit within defined confines
if( surface_capabilities.currentExtent.width == -1 ) {
  VkExtent2D swap_chain_extent = { 640, 480 };
  if( swap_chain_extent.width < surface_capabilities.minImageExtent.width ) {
    swap_chain_extent.width = surface_capabilities.minImageExtent.width;
  }
  if( swap_chain_extent.height < surface_capabilities.minImageExtent.height ) {
    swap_chain_extent.height = surface_capabilities.minImageExtent.height;
  }
  if( swap_chain_extent.width > surface_capabilities.maxImageExtent.width ) {
    swap_chain_extent.width = surface_capabilities.maxImageExtent.width;
  }
  if( swap_chain_extent.height > surface_capabilities.maxImageExtent.height ) {
    swap_chain_extent.height = surface_capabilities.maxImageExtent.height;
  }
  return swap_chain_extent;
}

// In most cases the size of the swap chain images will be equal to the current window's size
return surface_capabilities.currentExtent;
18. Tutorial02.cpp, function GetSwapChainExtent()

Selecting Swap Chain Usage Flags


Usage flags define how a given image may be used in Vulkan. If we want an image to be sampled (used inside shaders)
it must be created with “sampled” usage. If the image should be used as a depth render target, it must be created with
“depth and stencil” usage. An image without proper usage “enabled” cannot be used for a given purpose or the results of
such operations will be undefined.

For a swap chain we want to render (in most cases) into the image (use it as a render target), so we must specify “color
attachment” usage with VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT enum. In Vulkan this usage is always available for
swap chains, so we can always set it without any additional checking. But for any other usage we must ensure it is
supported – we can do this through a “supportedUsageFlags” member of surface capabilities structure.
// Color attachment flag must always be supported
// We can define other usage flags but we always need to check if they are supported
if( surface_capabilities.supportedUsageFlags & VK_IMAGE_USAGE_TRANSFER_DST_BIT ) {
return VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT | VK_IMAGE_USAGE_TRANSFER_DST_BIT;
}
return 0;
19. Tutorial02.cpp, function GetSwapChainUsageFlags()

In this example we define an additional "transfer destination" usage, which is required for the image clear operation.

Selecting Pre-Transformations
On some platforms we may want our image to be transformed. This is usually the case on tablets when they are
oriented in a way other than their default orientation. During swap chain creation we must specify what transformations
should be applied to images prior to presenting. We can, of course, use only the supported transforms, which can be found
in a “supportedTransforms” member of acquired surface capabilities.

If the selected pre-transform is other than the current transformation (also found in surface capabilities) the
presentation engine will apply the selected transformation. On some platforms this may cause performance degradation
(probably not noticeable but worth mentioning). In the sample code, I don’t want any transformations but, of course, I
must check whether it is supported. If not, I’m just using the same transformation that is currently used.
// Sometimes images must be transformed before they are presented (i.e. due to
// device's orientation being other than default orientation)
// If the specified transform is other than current transform, presentation engine
// will transform image during presentation operation; this operation may hit
// performance on some platforms
// Here we don't want any transformations to occur so if the identity transform is
// supported use it; otherwise just use the same transform as current transform
if( surface_capabilities.supportedTransforms & VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR ) {
  return VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR;
} else {
  return surface_capabilities.currentTransform;
}
20. Tutorial02.cpp, function GetSwapChainTransform()

Selecting Presentation Mode


Present modes determine the way images will be processed internally by the presentation engine and displayed on
the screen. In the past, there was just a single buffer that was displayed all the time. If we were drawing anything on it the
draw operations (whole process of image creation) were visible.

Double buffering was introduced to prevent the visibility of drawing operations: one image was displayed and the
second was used to render into. During presentation, the contents of the second image were copied into the first image
(earlier) or (later) the images were swapped (remember SwapBuffers() function used in OpenGL applications?) which
means that their pointers were exchanged.

Tearing was another issue with displaying images, so the ability to wait for the vertical blank signal was introduced if
we wanted to avoid it. But waiting introduced another problem: input lag. So double buffering was changed into triple
buffering in which we were drawing into two back buffers interchangeably and during v-sync the most recent one was
used for presentation.

This is exactly what presentation modes are for: how to deal with all these issues, how to present images on the screen
and whether we want to use v-sync.

Currently there are four presentation modes:

 IMMEDIATE. Present requests are applied immediately and tearing may be observed (depending on the
frames per second). Internally the presentation engine doesn't use any queue for holding swap chain images.
 FIFO. This mode is the most similar to OpenGL's buffer swapping with a swap interval set to 1. The image is
displayed (replaces the currently displayed image) only during vertical blanking periods, so no tearing should be
visible. Internally, the presentation engine uses a FIFO queue with "numSwapchainImages – 1" elements.
Present requests are appended to the end of this queue. During blanking periods, the image from the
beginning of the queue replaces the currently displayed image, which may then become available to the
application. If all images are in the queue, the application has to wait until v-sync releases the currently
displayed image. Only after that does it become available to the application and the program may render into
it. This mode must always be available in all Vulkan implementations supporting the swap chain extension.
 FIFO RELAXED. This mode is similar to FIFO, but when an image is displayed for longer than one blanking
period it may be released immediately, without waiting for another v-sync signal (so if we are rendering
frames at a lower frequency than the screen's refresh rate, tearing may be visible).

 MAILBOX. In my opinion, this mode is the most similar to the mentioned triple buffering. The image is
displayed only during vertical blanking periods and no tearing should be visible. But internally, the presentation
engine uses a queue with only a single element. One image is displayed and one waits in the queue. If the
application wants to present another image, it is not appended to the end of the queue but replaces the one
that waits. So the queue always holds the most recently generated image. This behavior is available if
there are more than two images. With two images, MAILBOX mode behaves similarly to FIFO (as we have to
wait for the displayed image to be released, we don't have a "spare" image that can be exchanged with the
one that waits in the queue).
Deciding which presentation mode to use depends on the type of operations we want to do. If we want to decode
and display movies, we want all frames to be displayed in the proper order, so the FIFO mode is, in my opinion, the best
choice. But if we are creating a game, we usually want to display the most recently generated frame. In this case I suggest
using MAILBOX because there is no tearing and input lag is minimized: the most recently generated image is displayed
and the application doesn't need to wait for v-sync. But to achieve this behavior, at least three images must be created,
and this mode may not always be supported.

FIFO mode is always available and requires at least two images, but it causes the application to wait for v-sync (no
matter how many swap chain images were requested). Immediate mode is the fastest. As I understand it, it also requires
two images, but it doesn't make the application wait for the monitor's refresh; on the downside, it may cause image
tearing. The choice is yours, but, as always, we must make sure that the chosen presentation mode is supported.

Earlier we queried for available present modes, so now we must look for the one that best suits our needs. Here is the
code in which I’m looking for MAILBOX mode:
// FIFO present mode is always available
// MAILBOX is the lowest latency V-Sync enabled mode (something like triple-buffering) so use it if available
for( VkPresentModeKHR &present_mode : present_modes ) {
  if( present_mode == VK_PRESENT_MODE_MAILBOX_KHR ) {
    return present_mode;
  }
}
return VK_PRESENT_MODE_FIFO_KHR;
21. Tutorial02.cpp, function GetSwapChainPresentMode()
Creating a Swap Chain
Now we have all the data necessary to create a swap chain. We have defined all the required values, and we are sure
they fit into the given platform’s constraints.
uint32_t                      desired_number_of_images = GetSwapChainNumImages( surface_capabilities );
VkSurfaceFormatKHR            desired_format           = GetSwapChainFormat( surface_formats );
VkExtent2D                    desired_extent           = GetSwapChainExtent( surface_capabilities );
VkImageUsageFlags             desired_usage            = GetSwapChainUsageFlags( surface_capabilities );
VkSurfaceTransformFlagBitsKHR desired_transform        = GetSwapChainTransform( surface_capabilities );
VkPresentModeKHR              desired_present_mode     = GetSwapChainPresentMode( present_modes );
VkSwapchainKHR                old_swap_chain           = Vulkan.SwapChain;

if( static_cast<int>(desired_usage) == 0 ) {
  printf( "TRANSFER_DST image usage is not supported by the swap chain!" );
  return false;
}

VkSwapchainCreateInfoKHR swap_chain_create_info = {
  VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR,  // VkStructureType                sType
  nullptr,                                      // const void                    *pNext
  0,                                            // VkSwapchainCreateFlagsKHR      flags
  Vulkan.PresentationSurface,                   // VkSurfaceKHR                   surface
  desired_number_of_images,                     // uint32_t                       minImageCount
  desired_format.format,                        // VkFormat                       imageFormat
  desired_format.colorSpace,                    // VkColorSpaceKHR                imageColorSpace
  desired_extent,                               // VkExtent2D                     imageExtent
  1,                                            // uint32_t                       imageArrayLayers
  desired_usage,                                // VkImageUsageFlags              imageUsage
  VK_SHARING_MODE_EXCLUSIVE,                    // VkSharingMode                  imageSharingMode
  0,                                            // uint32_t                       queueFamilyIndexCount
  nullptr,                                      // const uint32_t                *pQueueFamilyIndices
  desired_transform,                            // VkSurfaceTransformFlagBitsKHR  preTransform
  VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR,            // VkCompositeAlphaFlagBitsKHR    compositeAlpha
  desired_present_mode,                         // VkPresentModeKHR               presentMode
  VK_TRUE,                                      // VkBool32                       clipped
  old_swap_chain                                // VkSwapchainKHR                 oldSwapchain
};

if( vkCreateSwapchainKHR( Vulkan.Device, &swap_chain_create_info, nullptr,
                          &Vulkan.SwapChain ) != VK_SUCCESS ) {
  printf( "Could not create swap chain!\n" );
  return false;
}
if( old_swap_chain != VK_NULL_HANDLE ) {
  vkDestroySwapchainKHR( Vulkan.Device, old_swap_chain, nullptr );
}

return true;
22. Tutorial02.cpp, function CreateSwapChain()

In this code example, at the beginning we gathered all the necessary data described earlier. Next we create a variable
of type VkSwapchainCreateInfoKHR. It consists of the following members:

 sType – Normal structure type, which here must be VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR.
 pNext – Pointer reserved for future use (for some extensions to this extension).
 flags – Value reserved for future use; currently must be set to zero.
 surface – A handle of a created surface that represents the windowing system (our application's window).
 minImageCount – Minimal number of images the application requests for a swap chain (must fit into available
constraints).
 imageFormat – Application-selected format for swap chain images; must be one of the supported surface
formats.
 imageColorSpace – Colorspace for swap chain images; only enumerated values of format-colorspace pairs
may be used for imageFormat and imageColorSpace (we can't use the format from one pair and the colorspace
from another pair).
 imageExtent – Size (dimensions) of swap chain images defined in pixels; must fit into available constraints.
 imageArrayLayers – Defines the number of layers in swap chain images (that is, views); typically this value
will be one, but if we want to create multiview or stereo (stereoscopic 3D) images, we can set it to some higher
value.
 imageUsage – Defines how the application wants to use images; it may contain only values of supported
usages; color attachment usage is always supported.
 imageSharingMode – Describes the image-sharing mode when multiple queues are referencing images (I will
describe this in more detail later).
 queueFamilyIndexCount – The number of different queue families from which swap chain images will be
referenced; this parameter matters only when the VK_SHARING_MODE_CONCURRENT sharing mode is used.
 pQueueFamilyIndices – An array containing all the indices of queue families that will be referencing swap
chain images; must contain at least queueFamilyIndexCount elements, and as with queueFamilyIndexCount,
this parameter matters only when the VK_SHARING_MODE_CONCURRENT sharing mode is used.
 preTransform – Transformations applied to the swap chain image before it can be presented; must be one of
the supported values.
 compositeAlpha – This parameter is used to indicate how the surface (image) should be composited
(blended?) with other surfaces on some windowing systems; this value must also be one of the possible values
(bits) returned in surface capabilities, but it looks like opaque composition (no blending, alpha ignored) will
always be supported (as most games will want to use this mode).
 presentMode – Presentation mode that will be used by the swap chain; only a supported mode may be
selected.
 clipped – Connected with ownership of pixels; in general it should be set to VK_TRUE if the application doesn't
want to read from swap chain images (like ReadPixels()), as this will allow some platforms to use more optimal
presentation methods; the VK_FALSE value is used in some specific scenarios (if I learn more about these
scenarios I will write about them).
 oldSwapchain – If we are recreating a swap chain, this parameter defines the old swap chain that will be
replaced by the newly created one.

So what’s the matter with this sharing mode? Images in Vulkan can be referenced by queues. This means that we can
create commands that use these images. These commands are stored in command buffers, and these command buffers
are submitted to queues. Queues belong to different queue families. And Vulkan requires us to state how many different
queue families and which of them are referencing these images through commands submitted with command buffers.

If we want to reference images from many different queue families at a time we can do so. In this case we must
provide “concurrent” sharing mode. But this (probably) requires us to manage image data coherency by ourselves, that
is, we must synchronize different queues in such a way that data in the images is proper and no hazards occur—some
queues are reading data from images, but other queues haven’t finished writing to them yet.

We may also not specify these queue families and instead just tell Vulkan that only one queue family (queues from one
family) will be referencing an image at a time. This doesn't mean other queues can't reference these images. It just means
they can't do it all at once, at the same time. So if we want to reference images from one family and then from another,
we must specifically tell Vulkan: "My image was used inside this queue family, but from now on another family, this one,
will be referencing it." Such a transition is done using an image memory barrier. When only one queue family uses a given
image at a time, use the "exclusive" sharing mode.

If any of these requirements are not fulfilled, undefined behavior will probably occur and we may not rely on the
image contents.

In this example we are using only one queue, so we don't have to specify the "concurrent" sharing mode, and we leave
the related parameters (queueFamilyIndexCount and pQueueFamilyIndices) blank (or nulled, or zeroed).

So now we can call the vkCreateSwapchainKHR() function to create a swap chain and check whether this operation
succeeded. After that (if we are recreating the swap chain, meaning this isn’t the first time we are creating one) we should
destroy the previous swap chain. I’ll discuss this later.

Image Presentation
We now have a working swap chain that contains several images. To use these images as render targets, we can get
handles to all images created with a swap chain, but we are not allowed to use them just like that. Swap chain images
belong to and are owned by the swap chain. This means that the application cannot use these images until it asks for
them. This also means that images are created and destroyed by the platform along with a swap chain (not by the
application).

So when the application wants to render into a swap chain image or use it in any other way, it must first get access to
it by asking a swap chain for it. If the swap chain makes us wait, we have to wait. And after the application finishes using
the image it should “return” it by presenting it. If we forget about returning images to a swap chain, we will soon run out
of images and nothing will display on the screen.

The application may also request access to more images at once but they must be available. Acquiring access may
require waiting. In corner cases, when there are too few images in a swap chain and the application wants to access too
many of them, or if we forget about returning images to a swap chain, the application may even wait an infinite amount
of time.

Given that there are (usually) at least two images, it may sound strange that we have to wait, but it is quite reasonable.
Not all images are available for the application because they are used by the presentation engine. Usually one image is
displayed. Additional images may also be required for the presentation engine to work properly. So we can’t use them
because it could block the presentation engine in some way. We don’t know its internal mechanisms and algorithms or
the requirements of the OS the application is executed on. So the availability of images may depend on many factors:
internal implementation, OS, number of created images, number of images the application wants to use at a single time
and on the selected presentation mode, which is the most important factor from the perspective of this tutorial.

In immediate mode, one image is always presented. The other images (at least one) are available to the application. When
the application posts a presentation request (“returns” an image), the image that was displayed is replaced with the new
one. So if two images are created, only one image may be available to the application at a single time. When the application
asks for another image, it must “return” the previous one. If it wants two images at a time, it must create a swap chain
with more images or it will wait forever. In general, in immediate mode the application can ask for (own) “imageCount – 1”
images at a time.

In FIFO mode one image is displayed, and the rest are placed in a FIFO queue. The length of this queue is always equal
to “imageCount – 1.” At first, all images may be available to the application (because the queue is empty and no image is
presented). When the application presents an image (“returns” it to the swap chain), it is appended to the end of the queue.
So as soon as the queue fills, the application has to wait for another image until the displayed image is released during the
vertical blanking period. Images are always displayed in the same order they were presented in by the application. When
the v-sync signal appears, the first image from the queue replaces the image that was displayed. The previously displayed
image (the released one) may then become available to the application, as it is no longer used (it isn’t presented and is not
waiting in the queue). If all images are in the queue, the application has to wait for the next blanking period to access another
image. If rendering takes longer than the refresh period, the application will not have to wait at all. This behavior doesn’t
change when there are more images: the internal swap chain queue always has “imageCount – 1” elements.
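The FIFO behavior described above can be captured in a toy model (plain C++, no Vulkan calls; this is an illustration of the rules, not the tutorial’s code). `FifoModel` tracks how many images are available to the application, how many wait in the internal queue, and whether one is displayed; `VBlank()` models the vertical blanking period:

```cpp
#include <cassert>
#include <cstdint>

// Toy model of FIFO presentation mode: one image may be displayed, up to
// imageCount - 1 images wait in the internal queue, the rest are available.
class FifoModel {
public:
  explicit FifoModel( uint32_t image_count )
    : m_Available( image_count ), m_Queued( 0 ), m_Displayed( false ) {}

  // True if the application could acquire an image without waiting.
  bool CanAcquire() const { return m_Available > 0; }

  // Caller must check CanAcquire() first (the real API would block instead).
  void Acquire() { --m_Available; }

  // "Return" an image to the swap chain: it is appended to the queue.
  void Present() { ++m_Queued; }

  // Vertical blanking period: the first queued image replaces the displayed
  // one, and the previously displayed image becomes available again.
  void VBlank() {
    if( m_Queued > 0 ) {
      --m_Queued;
      if( m_Displayed ) {
        ++m_Available;
      }
      m_Displayed = true;
    }
  }

private:
  uint32_t m_Available;
  uint32_t m_Queued;
  bool     m_Displayed;
};
```

With three images, for example, the application can acquire and present two of them and acquire the third, but must then wait for two blanking periods before the first presented image is released back to it.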

The last mode available for the time being is MAILBOX. As previously mentioned, this mode is most similar to
“traditional” triple buffering. One image is always displayed. A second image waits in a single-element queue (it has room
for only one element). The rest of the images may be available to the application. When the application presents an
image, the new image replaces the one waiting in the queue. The image in the queue gets displayed only during blanking
periods, but the application doesn’t need to wait for the next image (when there are more than two images). MAILBOX
mode with only two images behaves identically to FIFO mode—the application must wait for the v-sync signal to acquire
the next image. But with at least three images it may immediately acquire the image that was replaced by the “presented”
image (the one waiting in the queue). That’s why I requested one more image than the minimal number: if MAILBOX mode
is available, I want to use it in a manner similar to triple buffering (perhaps the first thing to do is to check which mode is
available and then choose the number of swap chain images based on the selected presentation mode).
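The selection hinted at in the parentheses above could be sketched like this. It is illustrative only: a plain enum stands in for VkPresentModeKHR, and the real code would work with the modes returned by a surface capabilities query rather than this hand-built vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for VkPresentModeKHR (illustrative only).
enum class PresentMode { Immediate, Mailbox, Fifo };

struct SwapChainChoice {
  PresentMode mode;
  uint32_t    imageCount;
};

// Prefer MAILBOX with one extra image (triple-buffering-like behavior);
// otherwise fall back to FIFO, which is always available, with the minimum.
SwapChainChoice ChoosePresentModeAndImageCount(
    const std::vector<PresentMode> &available_modes, uint32_t min_image_count ) {
  if( std::find( available_modes.begin(), available_modes.end(),
                 PresentMode::Mailbox ) != available_modes.end() ) {
    return { PresentMode::Mailbox, min_image_count + 1 };
  }
  return { PresentMode::Fifo, min_image_count };
}
```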

I hope these examples help you understand why the application must ask for an image if it wants to use any. In Vulkan
we can only do what is allowed and required—not less and usually not too much more.
uint32_t image_index;
VkResult result = vkAcquireNextImageKHR( Vulkan.Device, Vulkan.SwapChain, UINT64_MAX,
                                         Vulkan.ImageAvailableSemaphore, VK_NULL_HANDLE, &image_index );
switch( result ) {
  case VK_SUCCESS:
  case VK_SUBOPTIMAL_KHR:
    break;
  case VK_ERROR_OUT_OF_DATE_KHR:
    return OnWindowSizeChanged();
  default:
    printf( "Problem occurred during swap chain image acquisition!\n" );
    return false;
}
23. Tutorial02.cpp, function Draw()

To access an image, we must call the vkAcquireNextImageKHR() function. During the call we must specify (apart from
the device handle, as in almost all other functions) the swap chain from which we want to use an image, a timeout, a
semaphore, and a fence object. On success, the function stores the image index in the variable whose address we provided.
Why an index and not the (handle to the) image itself? Such behavior can be convenient (for example, during a
“preprocessing” phase, when we want to prepare as much of the data needed for rendering as possible so we don’t waste
time during typical frame rendering), but I will describe it later. Just remember that we can check what images were created
in a swap chain if we want (we just can’t use them until we are allowed to). Such a query returns an array of images, and
the vkAcquireNextImageKHR() function stores an index into this very array.

We have to specify a timeout because sometimes images may not be immediately available. Trying to use an image
before we are allowed to causes undefined behavior. Specifying a timeout gives the presentation engine time to react: if it
needs to wait for the next vertical blanking period, it can do so. The function blocks until an image becomes available or
until the given time has passed. We can provide the maximum available value, in which case the function may even block
indefinitely. If we provide 0 for the timeout, the function returns immediately: if an image was available at the time of the
call, it is provided immediately; if not, an error is returned stating that the image was not yet ready.

Once we have our image we can use it however we want. Images are processed or referenced by commands stored
in command buffers. We can prepare command buffers earlier (to save as much processing time for rendering as we can)
and use or submit them here. Or we can prepare the commands now and submit them when we’re done. In Vulkan,
creating command buffers and submitting them to queues is the only way to cause operations to be performed by the
device.

When command buffers are submitted to queues, all their commands start being processed. But a queue cannot use
an image until it is allowed to, and the semaphore we created earlier is for internal queue synchronization—before the
queue starts processing commands that reference a given image, it should wait on this semaphore (until it gets signaled).
But this wait doesn’t block an application. There are two synchronization mechanisms for accessing swap chain images:
(1) a timeout, which may block an application but doesn’t stop queue processing, and (2) a semaphore, which doesn’t
block the application but blocks selected queues.

We now know (theoretically) how to render anything (through command buffers). So let’s now imagine that the
command buffer we are submitting contains some rendering operations. Before the processing starts, we should tell the
queue (on which this rendering will occur) to wait. This is all done within one submit operation.
VkPipelineStageFlags wait_dst_stage_mask = VK_PIPELINE_STAGE_TRANSFER_BIT;
VkSubmitInfo submit_info = {
  VK_STRUCTURE_TYPE_SUBMIT_INFO,                // VkStructureType              sType
  nullptr,                                      // const void                  *pNext
  1,                                            // uint32_t                     waitSemaphoreCount
  &Vulkan.ImageAvailableSemaphore,              // const VkSemaphore           *pWaitSemaphores
  &wait_dst_stage_mask,                         // const VkPipelineStageFlags  *pWaitDstStageMask
  1,                                            // uint32_t                     commandBufferCount
  &Vulkan.PresentQueueCmdBuffers[image_index],  // const VkCommandBuffer       *pCommandBuffers
  1,                                            // uint32_t                     signalSemaphoreCount
  &Vulkan.RenderingFinishedSemaphore            // const VkSemaphore           *pSignalSemaphores
};

if( vkQueueSubmit( Vulkan.PresentQueue, 1, &submit_info, VK_NULL_HANDLE ) != VK_SUCCESS ) {
  return false;
}
24. Tutorial02.cpp, function Draw()
First we prepare a structure with information about the types of operations we want to submit to the queue. This is
done through the VkSubmitInfo structure. It contains the following fields:

 sType – Standard structure type; here it must be set to VK_STRUCTURE_TYPE_SUBMIT_INFO.
 pNext – Standard pointer reserved for future use.
 waitSemaphoreCount – Number of semaphores we want the queue to wait on before it starts processing
commands from command buffers.
 pWaitSemaphores – Pointer to an array with semaphore handles on which the queue should wait; this array must
contain at least waitSemaphoreCount elements.
 pWaitDstStageMask – Pointer to an array with the same number of elements as the pWaitSemaphores array; it
describes the pipeline stage at which each (corresponding) semaphore wait will occur. In our example, the
queue may perform some operations before it starts using the image from the swap chain, so there is no
reason to block all of the operations: the queue may start processing some drawing commands, and only
when the pipeline gets to the stage in which the image is used will it wait.
 commandBufferCount – Number of command buffers we are submitting for execution.
 pCommandBuffers – Pointer to an array with command buffer handles, which must contain at least
commandBufferCount elements.
 signalSemaphoreCount – Number of semaphores we want the queue to signal after processing all the
submitted command buffers.
 pSignalSemaphores – Pointer to an array of at least signalSemaphoreCount elements with semaphore
handles; these semaphores will be signaled after the queue has finished processing the commands submitted
within this submit information.

In this example we are telling the queue to wait only on one semaphore, which will be signaled by the presentation
engine when the queue can safely start processing commands referencing the swap chain image.

We also submit just one simple command buffer. It was prepared earlier (I will describe how to do it later). It only
clears the acquired image. But this is enough for us to see the selected color in our application’s window and to see that
the swap chain is working properly.

In the code above, the command buffers are arranged in an array (a vector, to be more precise). To make it easier to
submit the proper command buffer—the one that references the currently acquired image—I prepared a separate
command buffer for each swap chain image. The index of an image that the vkAcquireNextImageKHR() function provides
can be used right here. Using image handles (in similar scenarios) would require creating maps that would translate the
handle into a specific command buffer or index. On the other hand, normal numbers can be used to just select a specific
array element. This is why this function gives us indices and not image handles.
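To make the point concrete, here is a minimal illustration (plain ints stand in for the real command buffer handles; this is not the tutorial’s code): because the per-image command buffers are stored in the same order as the swap chain’s images, the acquired index selects the right buffer directly, with no map needed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for VkCommandBuffer (illustrative only).
using FakeCommandBuffer = int;

// One command buffer was recorded per swap chain image, in the same order
// as the swap chain's image array, so the acquired index selects it.
FakeCommandBuffer SelectCommandBuffer(
    const std::vector<FakeCommandBuffer> &per_image_cmd_buffers,
    uint32_t acquired_image_index ) {
  return per_image_cmd_buffers[acquired_image_index];
}
```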

After we have submitted a command buffer, all the processing starts in the background, on “hardware.” Next, we
want to present a rendered image. Presenting means that we want our image to be displayed and that we are “giving it
back” to the swap chain. The code to do this might look like this:
VkPresentInfoKHR present_info = {
  VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,           // VkStructureType         sType
  nullptr,                                      // const void             *pNext
  1,                                            // uint32_t                waitSemaphoreCount
  &Vulkan.RenderingFinishedSemaphore,           // const VkSemaphore      *pWaitSemaphores
  1,                                            // uint32_t                swapchainCount
  &Vulkan.SwapChain,                            // const VkSwapchainKHR   *pSwapchains
  &image_index,                                 // const uint32_t         *pImageIndices
  nullptr                                       // VkResult               *pResults
};
result = vkQueuePresentKHR( Vulkan.PresentQueue, &present_info );

switch( result ) {
  case VK_SUCCESS:
    break;
  case VK_ERROR_OUT_OF_DATE_KHR:
  case VK_SUBOPTIMAL_KHR:
    return OnWindowSizeChanged();
  default:
    printf( "Problem occurred during image presentation!\n" );
    return false;
}

return true;
25. Tutorial02.cpp, function Draw()

An image (or images) is presented by calling the vkQueuePresentKHR() function. It may be thought of as submitting a
command buffer with only one operation: presentation.

To present images we must specify how many swap chains we want to present to, which ones, and which image from
each of them should be presented. We can present images to many swap chains at once (that is, to multiple windows),
but only one image from each swap chain. We provide this information through the VkPresentInfoKHR structure, which
contains the following fields:

 sType – Standard structure type; it must be VK_STRUCTURE_TYPE_PRESENT_INFO_KHR here.
 pNext – Parameter reserved for future use.
 waitSemaphoreCount – The number of semaphores we want the queue to wait on before it presents images.
 pWaitSemaphores – Pointer to an array with semaphore handles on which the queue should wait; this array
must contain at least waitSemaphoreCount elements.
 swapchainCount – The number of swap chains to which we would like to present images.
 pSwapchains – An array with swapchainCount elements that contains handles of all the swap chains that we
want to present images to; any single swap chain may only appear once in this array.
 pImageIndices – An array with swapchainCount elements that contains the indices of images that we want to
present; each element of this array corresponds to a swap chain in the pSwapchains array; the image index is
the index into the array of that swap chain’s images (see the next section).
 pResults – A pointer to an array of at least swapchainCount elements; this parameter is optional and can be
set to null, but if we provide such an array, the result of the presenting operation will be stored in each of its
elements, one per swap chain; the single value returned by the whole function is the same as the worst result
value from all swap chains.

Now that we have prepared this structure, we can use it to present an image. In this example I’m just presenting a
single image from a single swap chain.

Each operation that is performed (or submitted) by calling vkQueue…() functions (this includes presenting) is
appended to the end of the queue for processing. Operations are processed in the order in which they were submitted.
For a presentation, we are presenting an image after submitting other command buffers. So the present queue will start
presenting an image after the processing of all the command buffers is done. This ensures that the image will be presented
after we are done using it (rendering into it) and an image with correct contents will be displayed on the screen. But in
this example we submit drawing (clearing) operations and a present operation to the same queue: the PresentQueue. We
are doing only simple operations that are allowed to be done on a present queue.
If we want to perform drawing operations on a queue different from the one used for presentation, we need to
synchronize the queues. This is done, again, with semaphores, which is the reason we created two semaphores (the
second one may not be necessary in this example, as we render and present using the same queue, but I wanted to show
how it should be done correctly).

The first semaphore is for the presentation engine to tell the queue that it can safely use (reference/render into) an image.
The second semaphore is for us: it is signaled when the operations on the image (rendering into it) are done. The submit
info structure has a field called pSignalSemaphores, an array of semaphore handles that will be signaled after processing
of all of the submitted command buffers is finished. So we need to tell the present queue to wait on this second
semaphore: we store the handle of our second semaphore in the pWaitSemaphores field of the VkPresentInfoKHR structure,
and the queue to which we submit the present operation will wait, thanks to this second semaphore, until we are
done rendering into a given image.

And that’s it. We have displayed our first image using Vulkan!

Checking What Images Were Created in a Swap Chain


Previously I mentioned a swap chain’s image indices. The code sample here shows more specifically what I was
talking about.
uint32_t image_count = 0;
if( (vkGetSwapchainImagesKHR( Vulkan.Device, Vulkan.SwapChain, &image_count, nullptr ) != VK_SUCCESS) ||
    (image_count == 0) ) {
  printf( "Could not get the number of swap chain images!\n" );
  return false;
}

std::vector<VkImage> swap_chain_images( image_count );
if( vkGetSwapchainImagesKHR( Vulkan.Device, Vulkan.SwapChain, &image_count, &swap_chain_images[0] ) != VK_SUCCESS ) {
  printf( "Could not get swap chain images!\n" );
  return false;
}
26. -

This code sample is a fragment of an imaginary function that checks how many and which images were created inside
a swap chain. It is done with the traditional “double-call,” this time using the vkGetSwapchainImagesKHR() function. First we
call it with the last parameter set to null. This way the number of all images created in the swap chain is stored in the
“image_count” variable and we know how much storage to prepare for the handles of all images. The second
time we call this function, we receive the handles in the array whose address we provided through the last parameter.

Now we know all the images that the swap chain is using. For the vkAcquireNextImageKHR() function and
VkPresentInfoKHR structure, the indices I referred to are the indices into this array, an array “returned” by the
vkGetSwapchainImagesKHR() function. It is called an array of a swap chain’s presentable images. And if any function, in
the case of a swap chain, wants us to provide an index or returns an index, it is the index of an image in this very array.
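This “double-call” pattern appears all over Vulkan (enumerating physical devices, surface formats, swap chain images, and so on), so it can be factored into a generic helper. The sketch below is not from the tutorial’s code; the callback stands in for functions such as vkGetSwapchainImagesKHR(), following the same convention: when the output pointer is null, only the count is written, otherwise the array is filled.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Generic "double-call" enumeration: first call queries the count,
// second call fills an appropriately sized vector.
template<typename T>
std::vector<T> EnumerateTwice(
    const std::function<bool(uint32_t *count, T *out)> &enumerate ) {
  uint32_t count = 0;
  if( !enumerate( &count, nullptr ) || count == 0 ) {
    return {};
  }
  std::vector<T> items( count );
  if( !enumerate( &count, items.data() ) ) {
    return {};
  }
  return items;
}
```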

Recreating a Swap Chain


Previously, I mentioned that sometimes we must recreate a swap chain, and I also said that the old swap chain must
be destroyed. The vkAcquireNextImageKHR() and vkQueuePresentKHR() functions return a result that sometimes causes
the OnWindowSizeChanged() function to be called. This function recreates the swap chain.

Sometimes a swap chain becomes outdated. This means that the properties of the surface, platform, or application
window changed in such a way that the current swap chain cannot be used any more. The most obvious (and
unfortunately not so good) example is when the window’s size changes. We cannot create new swap chain images on the
fly, nor can we change their size; the only possibility is to destroy and recreate the swap chain. There are also situations in
which we can still use a swap chain, but it may no longer be optimal for the surface it was created for.

These situations are notified by the return codes of the vkAcquireNextImageKHR() and vkQueuePresentKHR()
functions.

When the VK_SUBOPTIMAL_KHR value is returned, we can still use the current swap chain for presentation. It will still
work, though not optimally (for example, color precision may be worse). It is advised to recreate the swap chain when there
is an opportunity. A good example is when we have performed performance-heavy rendering and, after acquiring the image,
we are informed that our image is suboptimal. We don’t want to waste all this processing and make the user wait much
longer for another frame. We just present the image and recreate the swap chain as soon as there is an opportunity.

When VK_ERROR_OUT_OF_DATE_KHR is returned, we cannot use the current swap chain and we must recreate it
immediately. Presenting with the current swap chain will fail, so we have to recreate it as soon as possible.

I have mentioned that changing the window size is the most obvious, but not the best, example of a change in surface
properties after which we should recreate the swap chain. In this situation we should recreate the swap chain, but we may
not be notified about it through the mentioned return codes. We should monitor window size changes ourselves using OS-
specific code. That’s why this function in our source is named OnWindowSizeChanged: it is called every time the window’s
size changes. And because this function only recreates the swap chain (and command buffers), the same function can be
called here as well.

Recreation is done the same way as creation. There is a structure member in which we provide the swap chain that the
new one should replace. But we must still explicitly destroy the old swap chain ourselves after we create the new one.

Quick Dive into Command Buffers


You now know a lot about swap chains, but there is still one important thing you need to know. To explain it, I will
briefly show you how to prepare drawing commands. That one last important thing about swap chains is connected with
drawing and preparing command buffers. I will present only information about how to clear images, but it is enough to
check whether our swap chain is working as it should.

In the first tutorial, I described queues and queue families. If we want to execute commands on a device we submit
them to queues through command buffers. To put it in other words: commands are encapsulated inside command buffers.
Submitting such buffers to queues causes devices to start processing commands that were recorded in them. Do you
remember OpenGL’s drawing lists? We could prepare lists of commands that cause the geometry to be drawn in a form
of a list of, well, drawing commands. The situation in Vulkan is similar, but far more flexible and advanced.

Creating Command Buffer Memory Pool


To store commands, a command buffer needs some storage. To prepare space for commands we create a pool from
which the buffer can allocate its memory. We don’t specify the amount of space—it is allocated dynamically when the
buffer is built (recorded).

Remember that command buffers can be submitted only to the proper queue families, and only the types of operations
compatible with a given family can be submitted to a given queue. The command buffer itself is not connected with
any queue or queue family, but the memory pool from which the buffer allocates its memory is. So each command buffer
that takes memory from a given pool can only be submitted to a queue from the proper queue family—the family for
which the memory pool was created. If more queues were created from a given family, we can submit the command
buffer to any one of them; the family index is what matters here.
VkCommandPoolCreateInfo cmd_pool_create_info = {
  VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,   // VkStructureType              sType
  nullptr,                                      // const void                  *pNext
  0,                                            // VkCommandPoolCreateFlags     flags
  Vulkan.PresentQueueFamilyIndex                // uint32_t                     queueFamilyIndex
};

if( vkCreateCommandPool( Vulkan.Device, &cmd_pool_create_info, nullptr, &Vulkan.PresentQueueCmdPool ) != VK_SUCCESS ) {
  printf( "Could not create a command pool!\n" );
  return false;
}
27. Tutorial02.cpp, function CreateCommandBuffers()

To create a pool for command buffer(s), we call the vkCreateCommandPool() function. It requires us to provide (an
address of) a variable of structure type VkCommandPoolCreateInfo. It contains the following members:

 sType – Usual type of structure, which here must be equal to VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO.
 pNext – Pointer reserved for future use.
 flags – Value reserved for future use.
 queueFamilyIndex – Index of the queue family for which this pool is created.

For our test application, we use only one queue from a presentation family, so we should use its index. Now we can
call the vkCreateCommandPool() function and check whether it succeeded. If yes, the handle to the command pool will
be stored in a variable we have provided the address of.

Allocating Command Buffers


Next, we need to allocate the command buffers themselves. Command buffers are not created in a typical way; they are
allocated from pools. Other objects that take their memory from pool objects are also allocated, while the pools themselves
are created. That’s the reason for the distinction between the vkCreate…() and vkAllocate…() function names.

As described earlier, I allocate more than one command buffer—one for each swap chain image that will be referenced
by the drawing commands. So each time we acquire an image from a swap chain we can submit/use the proper command
buffer.
uint32_t image_count = 0;
if( (vkGetSwapchainImagesKHR( Vulkan.Device, Vulkan.SwapChain, &image_count, nullptr ) != VK_SUCCESS) ||
    (image_count == 0) ) {
  printf( "Could not get the number of swap chain images!\n" );
  return false;
}

Vulkan.PresentQueueCmdBuffers.resize( image_count );

VkCommandBufferAllocateInfo cmd_buffer_allocate_info = {
  VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO, // VkStructureType         sType
  nullptr,                                        // const void             *pNext
  Vulkan.PresentQueueCmdPool,                     // VkCommandPool           commandPool
  VK_COMMAND_BUFFER_LEVEL_PRIMARY,                // VkCommandBufferLevel    level
  image_count                                     // uint32_t                commandBufferCount
};
if( vkAllocateCommandBuffers( Vulkan.Device, &cmd_buffer_allocate_info, &Vulkan.PresentQueueCmdBuffers[0] ) != VK_SUCCESS ) {
  printf( "Could not allocate command buffers!\n" );
  return false;
}

if( !RecordCommandBuffers() ) {
  printf( "Could not record command buffers!\n" );
  return false;
}
return true;
28. Tutorial02.cpp, function CreateCommandBuffers()

First we need to know how many swap chain images were created (a swap chain may create more images than we
specified). This was explained in an earlier section: we call the vkGetSwapchainImagesKHR() function with the last
parameter set to null. Right now we don’t need the handles of the images, only their total number. After that we prepare
an array (vector) for the command buffers and allocate them by calling the vkAllocateCommandBuffers() function. It
requires us to prepare a structured variable of type VkCommandBufferAllocateInfo, which contains the following fields:

 sType – Type of a structure, this time equal to VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO.
 pNext – Normal parameter reserved for future use.
 commandPool – Command pool from which the buffer will be allocating its memory during command
recording.
 level – Type (level) of command buffer. There are two levels: primary and secondary. Secondary command
buffers may only be referenced (used) from primary command buffers. Because we don’t have any other
buffers, we need to create primary buffers here.
 commandBufferCount – The number of command buffers we want to allocate at once.

After calling the vkAllocateCommandBuffers() function, we need to check whether the allocation succeeded. If it did,
we are done allocating command buffers and we are ready to record some (simple) commands.

Recording Command Buffers


Command recording is the most important operation we will be doing in Vulkan. The recording itself requires us
to provide a lot of information—the more complicated the drawing commands, the more information is needed.

Here is a set of variables required (in this tutorial) to record command buffers:
uint32_t image_count = static_cast<uint32_t>(Vulkan.PresentQueueCmdBuffers.size());

std::vector<VkImage> swap_chain_images( image_count );
if( vkGetSwapchainImagesKHR( Vulkan.Device, Vulkan.SwapChain, &image_count, &swap_chain_images[0] ) != VK_SUCCESS ) {
  printf( "Could not get swap chain images!\n" );
  return false;
}

VkCommandBufferBeginInfo cmd_buffer_begin_info = {
  VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,    // VkStructureType                        sType
  nullptr,                                        // const void                            *pNext
  VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT,   // VkCommandBufferUsageFlags              flags
  nullptr                                         // const VkCommandBufferInheritanceInfo  *pInheritanceInfo
};

VkClearColorValue clear_color = {
  { 1.0f, 0.8f, 0.4f, 0.0f }
};

VkImageSubresourceRange image_subresource_range = {
  VK_IMAGE_ASPECT_COLOR_BIT,                      // VkImageAspectFlags      aspectMask
  0,                                              // uint32_t                baseMipLevel
  1,                                              // uint32_t                levelCount
  0,                                              // uint32_t                baseArrayLayer
  1                                               // uint32_t                layerCount
};
29. Tutorial02.cpp, function RecordCommandBuffers()

First we get the handles of all the swap chain images, which will be used in drawing commands (we will just clear them
to one single color but nevertheless we will use them). We already know the number of images, so we don’t have to ask
for it again. The handles of images are stored in a vector after calling the vkGetSwapchainImagesKHR() function.

Next, we need to prepare a variable of structured type VkCommandBufferBeginInfo. It contains the information
necessary in more typical rendering scenarios (like render passes). We won’t be doing such operations here and that’s
why we can set almost all parameters to zeros or nulls. But, for clarity, the structure contains the following fields:

 sType – Structure type, this time it must be set to VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO.
 pNext – Pointer reserved for future use, leave it null.
 flags – Parameter defining the preferred usage of a command buffer.
 pInheritanceInfo – Parameter pointing to another structure that is used in more typical rendering scenarios.

Command buffers gather commands. To store commands in command buffers, we record them. The above structure
provides some necessary information for the driver to prepare for and optimize the recording process.

In Vulkan, command buffers are divided into primary and secondary. Primary command buffers are typical command
buffers similar to drawing lists. They are independent, individual “beings” and they (and only they) may be submitted to
queues. Secondary command buffers can also store commands (we also record them), but they may only be referenced
from within primary command buffers (we can call secondary command buffers from within primary command buffers
like calling OpenGL’s drawing lists from another drawing lists). We can’t submit secondary command buffers directly to
queues.

All of this information will be described in more detail in a forthcoming tutorial.

In this simple example we want to clear our images with one single value. So next we set up a color that will be used
for clearing. You can pick any value you like. I used a light orange color.

The last variable in the code above specifies the parts of the image that our operations will be performed on. Our
image consists of only one mipmap level and one array layer (no stereoscopic buffers, and so on). We set the values in the
VkImageSubresourceRange structure accordingly. This structure contains the following fields:

 aspectMask – Depends on the image format; we are using the images as color render targets (they have a
“color” format), so we specify the “color aspect” here.
 baseMipLevel – First mipmap level that will be accessed (modified).
 levelCount – Number of mipmap levels on which operations will be performed (including the base level).
 baseArrayLayer – First array layer that will be accessed (modified).
 layerCount – Number of layers the operations will be performed on (including the base layer).

We are almost ready to record some buffers.
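As a quick reference, the three variables that the recording loop in the next section relies on (cmd_buffer_begin_info, clear_color, and image_subresource_range) can be set up as in the following self-contained sketch. The reduced type and enum definitions are stand-ins so the snippet compiles on its own; in the real code they come from vulkan.h, and the exact clear-color values are an assumption (any color works):

```cpp
#include <cstdint>
#include <cassert>

// Stand-in definitions mirroring the real ones from vulkan.h, reduced so
// this sketch is self-contained. Do not use these in real code.
enum VkStructureType { VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO = 42 };
enum VkImageAspectFlagBits : uint32_t { VK_IMAGE_ASPECT_COLOR_BIT = 0x00000001 };
const uint32_t VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT = 0x00000004;

struct VkCommandBufferBeginInfo {
  VkStructureType sType;
  const void     *pNext;
  uint32_t        flags;            // really VkCommandBufferUsageFlags
  const void     *pInheritanceInfo; // really const VkCommandBufferInheritanceInfo*
};

struct VkImageSubresourceRange {
  uint32_t aspectMask;
  uint32_t baseMipLevel;
  uint32_t levelCount;
  uint32_t baseArrayLayer;
  uint32_t layerCount;
};

union VkClearColorValue {
  float    float32[4];
  int32_t  int32[4];
  uint32_t uint32[4];
};

// The same command buffers may be resubmitted while still pending execution.
VkCommandBufferBeginInfo cmd_buffer_begin_info = {
  VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,  // sType
  nullptr,                                      // pNext
  VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT, // flags
  nullptr                                       // pInheritanceInfo
};

// A light-orange clear color (RGBA floats in [0, 1]); pick any value you like.
VkClearColorValue clear_color = {
  { 1.0f, 0.8f, 0.4f, 0.0f }
};

// One mipmap level and one array layer: the whole swap chain image.
VkImageSubresourceRange image_subresource_range = {
  VK_IMAGE_ASPECT_COLOR_BIT, // aspectMask
  0,                         // baseMipLevel
  1,                         // levelCount
  0,                         // baseArrayLayer
  1                          // layerCount
};
```

The same image_subresource_range is reused both for image clearing and for the image memory barriers described below.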

Image Layouts and Layout Transitions


The last variable required in the above code example (of type VkImageSubresourceRange) specifies the parts of the
image that operations will be performed on. In this lesson we only clear an image. But we also need to perform resource
transitions. Remember the code when we selected a use for a swap chain image before the swap chain itself was created?
Images may be used for different purposes. They may be used as render targets, as textures that can be sampled from
inside the shaders, or as a data source for copy/blit operations (data transfers). We must specify different usage flags
during image creation for the different types of operations we want to perform with or on images. We can specify more
usage flags if we want (if they are supported; “color attachment” usage is always available for swap chains). But image
usage specification is not the only thing we need to do. Depending on the type of operation, images may be differently
allocated or may have a different layout in memory. Each type of image operation may be connected with a different
“image layout.” We can use a general layout that is supported by all operations, but it may not provide the best
performance. For specific usages we should always use dedicated layouts.

If we create an image with different usages in mind and we want to perform different operations on it, we must
change the image’s current layout before we can perform each type of operation. To do this, we must transition from the
current layout to another layout that is compatible with the operations we are about to execute.

Each image is (generally) created with an undefined layout, and we must transition it to another layout if we want
to use the image. But swap-chain-created images have the VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout. This
layout, as the name suggests, is designed for the image to be used (presented) by the presentation engine (that is,
displayed on the screen). So if we want to perform some operations on swap chain images, we need to change their
layouts to ones compatible with the desired operations. And after we have finished processing the images (that is,
rendering into them) we need to transition their layouts back to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR.
Otherwise, the presentation engine will not be able to use these images and undefined behavior may occur.

To transition from one layout to another one, image memory barriers are used. With them we can specify the old
layout (current) we are transitioning from and the new layout we are transitioning to. The old layout must always be equal
to the current or undefined layout. When we specify the old layout as undefined, image contents may be discarded during
transition. This allows the driver to perform some optimizations. If we want to preserve image contents we must specify
a layout that is equal to the current layout.

The last variable of type VkImageSubresourceRange in the code example above is also used for image transitions. It
defines what “parts” of the image are changing their layout and is required when preparing an image memory barrier.

Recording Command Buffers


The last step is to record a command buffer for each swap chain image. We want to clear the image to some arbitrary
color. But first we need to change the image layout and change it back after we are done. Here is the code that does that:
for( uint32_t i = 0; i < image_count; ++i ) {
  VkImageMemoryBarrier barrier_from_present_to_clear = {
    VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,   // VkStructureType            sType
    nullptr,                                  // const void                *pNext
    VK_ACCESS_MEMORY_READ_BIT,                // VkAccessFlags              srcAccessMask
    VK_ACCESS_TRANSFER_WRITE_BIT,             // VkAccessFlags              dstAccessMask
    VK_IMAGE_LAYOUT_UNDEFINED,                // VkImageLayout              oldLayout
    VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,     // VkImageLayout              newLayout
    Vulkan.PresentQueueFamilyIndex,           // uint32_t                   srcQueueFamilyIndex
    Vulkan.PresentQueueFamilyIndex,           // uint32_t                   dstQueueFamilyIndex
    swap_chain_images[i],                     // VkImage                    image
    image_subresource_range                   // VkImageSubresourceRange    subresourceRange
  };

  VkImageMemoryBarrier barrier_from_clear_to_present = {
    VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,   // VkStructureType            sType
    nullptr,                                  // const void                *pNext
    VK_ACCESS_TRANSFER_WRITE_BIT,             // VkAccessFlags              srcAccessMask
    VK_ACCESS_MEMORY_READ_BIT,                // VkAccessFlags              dstAccessMask
    VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,     // VkImageLayout              oldLayout
    VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,          // VkImageLayout              newLayout
    Vulkan.PresentQueueFamilyIndex,           // uint32_t                   srcQueueFamilyIndex
    Vulkan.PresentQueueFamilyIndex,           // uint32_t                   dstQueueFamilyIndex
    swap_chain_images[i],                     // VkImage                    image
    image_subresource_range                   // VkImageSubresourceRange    subresourceRange
  };

  vkBeginCommandBuffer( Vulkan.PresentQueueCmdBuffers[i], &cmd_buffer_begin_info );

  vkCmdPipelineBarrier( Vulkan.PresentQueueCmdBuffers[i], VK_PIPELINE_STAGE_TRANSFER_BIT,
    VK_PIPELINE_STAGE_TRANSFER_BIT, 0, 0, nullptr, 0, nullptr, 1,
    &barrier_from_present_to_clear );

  vkCmdClearColorImage( Vulkan.PresentQueueCmdBuffers[i], swap_chain_images[i],
    VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, &clear_color, 1, &image_subresource_range );

  vkCmdPipelineBarrier( Vulkan.PresentQueueCmdBuffers[i], VK_PIPELINE_STAGE_TRANSFER_BIT,
    VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, 0, 0, nullptr, 0, nullptr, 1,
    &barrier_from_clear_to_present );

  if( vkEndCommandBuffer( Vulkan.PresentQueueCmdBuffers[i] ) != VK_SUCCESS ) {
    printf( "Could not record command buffers!\n" );
    return false;
  }
}

return true;
30. Tutorial02.cpp, function RecordCommandBuffers()

This code is placed inside a loop: we are recording one command buffer for each swap chain image. That’s why we
needed the number of images. The image handles are also needed here: we must specify them for the image memory
barriers and during image clearing. But recall that I said we can’t use swap chain images until we are allowed to, that is,
until we acquire an image from the swap chain. That’s true, but we aren’t using them here. We are only preparing
commands. The usage itself is performed when we submit the operations (a command buffer) to a queue for execution.
Here we are just telling Vulkan that, in the future, it should take this image, do this with it, then that, and after that
something more. This way we can prepare as much work as possible before we start the main rendering loop, and we
avoid switches, ifs, jumps, and other branches during the real rendering. This scenario won’t be so simple in real life, but
I hope the example is clear.

In the code above, we first prepare two image memory barriers. In the case of images, memory barriers are used to
change three different things: the types of memory access, the queue family that owns the image, and the image layout.
From the tutorial’s point of view, only the layouts are interesting right now, but we need to properly set all fields. To set
up a memory barrier we prepare a variable of type VkImageMemoryBarrier, which contains the following fields:

 sType – Structure type, which here must be set to VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER.
 pNext – Set it to null; this pointer is not used right now.
 srcAccessMask – Types of memory operations done on the image before the barrier.
 dstAccessMask – Types of memory operations that will take place after the barrier.
 oldLayout – Layout from which we are transitioning; it should always be equal to the current layout (which in
this example, for the first barrier, would be VK_IMAGE_LAYOUT_PRESENT_SRC_KHR). Or we can use an
undefined layout, which lets the driver perform some optimizations, but the contents of the image may be
discarded. Since we don’t need the contents, we can use an undefined layout here.
 newLayout – A layout that is compatible with the operations we will be performing after the barrier; we want
to clear the image, and to do that we need to specify the VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL
layout. We should always use a specific, dedicated layout.
 srcQueueFamilyIndex – A queue family index that was referencing the image previously.
 dstQueueFamilyIndex – A family index from which queues will be referencing the image after the barrier (this
refers to the swap chain sharing mode I described earlier).
 image – Handle to the image itself.
 subresourceRange – A structure describing the parts of the image we want to perform transitions on; this is
the last variable from the previous code example.

Some notes are necessary regarding access masks and family indices. In this example, before the first barrier and after
the second barrier, only the presentation engine has access to the image. The presentation engine only reads from the
image (it doesn’t modify it), so we set srcAccessMask in the first barrier and dstAccessMask in the second barrier to
VK_ACCESS_MEMORY_READ_BIT. This indicates that the memory associated with the image is only read (image contents
are not modified before the first barrier or after the second barrier). In our command buffer we will only clear the image.
This operation belongs to the so-called “transfer” operations, which is why I’ve set VK_ACCESS_TRANSFER_WRITE_BIT in
the dstAccessMask field of the first barrier and in the srcAccessMask field of the second barrier.

I won’t go into more detail about queue family indices here, but if the queue used for graphics operations and the
queue used for presentation are the same, srcQueueFamilyIndex and dstQueueFamilyIndex will be equal, and the
hardware won’t make any modifications regarding image access from the queues. But remember that we have specified
that only one queue at a time will access/use the image. So if these queues are different, we inform the hardware here
about the “ownership” change: a different queue will now access the image. And this is all the information you need right
now to properly set up barriers.

We need two barriers: one that changes the layout from the “present source” (or undefined) to “transfer dst”. This
barrier is used at the beginning of a command buffer, when the presentation engine previously used the image and now
we want to use and modify it. The second barrier changes the layout back into the “present source” when we are done
using the images and can give them back to the swap chain. This barrier is set at the end of a command buffer.

Now we are ready to start recording our commands by calling the vkBeginCommandBuffer() function. We provide a
handle to a command buffer and the address of a variable of type VkCommandBufferBeginInfo, and we are ready to go.
Next we set up a barrier to change the image layout. We call the vkCmdPipelineBarrier() function, which takes quite a
few parameters, but in this example the only relevant ones are the first (the command buffer handle) and the last two:
the number of barriers and a pointer to the first element of an array of VkImageMemoryBarrier structures. The elements
of this array describe images, their parts, and the types of transitions that should occur. After the barrier we can safely
perform any operations on the swap chain image that are compatible with the layout we have transitioned the image to.
The general layout is compatible with all operations, but (probably) at reduced performance.

In this example we are only clearing images, so we call the vkCmdClearColorImage() function. It takes a handle to a
command buffer, a handle to an image, the current layout of the image, a pointer to a variable with the clear color value,
the number of subresources (the number of elements in the array from the last parameter), and a pointer to an array of
VkImageSubresourceRange structures. The elements of the last array specify what parts of the image we want to clear
(we don’t have to clear all mipmap levels or array layers of an image if we don’t want to).

And at the end of our recording session we set up another barrier that transitions the image layout back to a “present
source” layout. It is the only layout that is compatible with the present operations performed by the presentation engine.

Now we can call the vkEndCommandBuffer() function to inform Vulkan that we have finished recording the command
buffer. If something went wrong during recording, the value returned by this function will tell us. In that case we cannot
use the command buffer, and we’ll need to record it once again. If everything is fine, we can later use the command
buffer to tell our device to perform the operations stored in it, just by submitting the buffer to a queue.

Tutorial 2 Execution
In this example, if everything went fine, we should see a window filled with a light-orange color.

Cleaning Up
Now you know how to create a swap chain, display images in a window, and perform simple operations that are
executed on a device. We have created command buffers, recorded them, and presented the results on the screen. Before
we close the application, we need to clean up the resources we were using. In this tutorial I have divided cleanup into two
functions. The first destroys only those resources that should be recreated whenever the swap chain is recreated (that
is, after the size of the application’s window has changed).
if( Vulkan.Device != VK_NULL_HANDLE ) {
  vkDeviceWaitIdle( Vulkan.Device );

  if( (Vulkan.PresentQueueCmdBuffers.size() > 0) &&
      (Vulkan.PresentQueueCmdBuffers[0] != VK_NULL_HANDLE) ) {
    vkFreeCommandBuffers( Vulkan.Device, Vulkan.PresentQueueCmdPool,
      static_cast<uint32_t>(Vulkan.PresentQueueCmdBuffers.size()),
      &Vulkan.PresentQueueCmdBuffers[0] );
    Vulkan.PresentQueueCmdBuffers.clear();
  }

  if( Vulkan.PresentQueueCmdPool != VK_NULL_HANDLE ) {
    vkDestroyCommandPool( Vulkan.Device, Vulkan.PresentQueueCmdPool, nullptr );
    Vulkan.PresentQueueCmdPool = VK_NULL_HANDLE;
  }
}
31. Tutorial02.cpp, Clear()

First we must be sure that no operations are being executed on the device’s queues (we can’t destroy a resource that
is used by currently processed commands). We ensure this by calling the vkDeviceWaitIdle() function, which blocks until
all operations are finished.

Next we free all the allocated command buffers. In fact this operation is not necessary here. Destroying a command
pool implicitly frees all command buffers allocated from a given pool. But I want to show you how to explicitly free
command buffers. Next we destroy the command pool itself.

Here is the code that is responsible for destroying all of the resources created in this lesson:
Clear();

if( Vulkan.Device != VK_NULL_HANDLE ) {
  vkDeviceWaitIdle( Vulkan.Device );

  if( Vulkan.ImageAvailableSemaphore != VK_NULL_HANDLE ) {
    vkDestroySemaphore( Vulkan.Device, Vulkan.ImageAvailableSemaphore, nullptr );
  }
  if( Vulkan.RenderingFinishedSemaphore != VK_NULL_HANDLE ) {
    vkDestroySemaphore( Vulkan.Device, Vulkan.RenderingFinishedSemaphore, nullptr );
  }
  if( Vulkan.SwapChain != VK_NULL_HANDLE ) {
    vkDestroySwapchainKHR( Vulkan.Device, Vulkan.SwapChain, nullptr );
  }
  vkDestroyDevice( Vulkan.Device, nullptr );
}

if( Vulkan.PresentationSurface != VK_NULL_HANDLE ) {
  vkDestroySurfaceKHR( Vulkan.Instance, Vulkan.PresentationSurface, nullptr );
}

if( Vulkan.Instance != VK_NULL_HANDLE ) {
  vkDestroyInstance( Vulkan.Instance, nullptr );
}

if( VulkanLibrary ) {
#if defined(VK_USE_PLATFORM_WIN32_KHR)
  FreeLibrary( VulkanLibrary );
#elif defined(VK_USE_PLATFORM_XCB_KHR) || defined(VK_USE_PLATFORM_XLIB_KHR)
  dlclose( VulkanLibrary );
#endif
}
32. Tutorial02.cpp, destructor
First we destroy the semaphores (remember they cannot be destroyed while they are in use, that is, while a queue is
waiting on a given semaphore). After that we destroy the swap chain. Images that were created along with it are
destroyed automatically, and we don’t need to do it ourselves (we are not even allowed to). Next the device is
destroyed. We also need to destroy the surface that represents our application’s window. At the end, the Vulkan instance
is destroyed and the graphics driver’s dynamic library is unloaded. Before each step we also check whether the given
resource was properly created; we can’t destroy resources that weren’t.

Conclusion
In this tutorial you learned how to display on the screen anything that was created with the Vulkan API. To briefly
review the steps: first we enabled the proper instance-level extensions. Next we created a Vulkan representation of the
application’s window, called a surface. Then we chose a device with a queue family that supported presentation and
created a logical device (don’t forget about enabling device-level extensions!).

After that we created a swap chain. To do that we first acquired a set of parameters describing our surface and then
chose values for proper swap chain creation. Those values had to fit into a surface’s supported constraints.

To draw something on the screen we learned how to create and record command buffers, which also included image
layout transitions, for which image memory barriers (pipeline barriers) were used. We cleared the images so we could see
the selected color displayed on screen.

And we also learned how to present a given image on the screen, which included acquiring an image, submitting a
command buffer, and the presentation process itself.

Notices

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability,
fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course
of dealing, or usage in trade.

This document contains information on products, services and/or processes in development. All information provided
here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule,
specifications and roadmaps.

The products and services described may contain defects or errors known as errata which may cause deviations from
published specifications. Current characterized errata are available on request.

Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-
548-4725 or by visiting www.intel.com/design/literature.htm.

This sample source code is released under the Intel Sample Source Code License Agreement.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

© 2016 Intel Corporation.


API without Secrets: Introduction to Vulkan*
Part 3
Table of Contents
Tutorial 3: First Triangle – Graphics Pipeline and Drawing
    About the Source Code Example
    Creating a Render Pass
        Render Pass Attachment Description
        Subpass Description
        Render Pass Creation
    Creating a Framebuffer
        Creating Image Views
        Specifying Framebuffer Parameters
    Creating a Graphics Pipeline
        Creating a Shader Module
        Preparing a Description of the Shader Stages
        Preparing Description of a Vertex Input
        Preparing the Description of an Input Assembly
        Preparing the Viewport’s Description
        Preparing the Rasterization State’s Description
        Setting the Multisampling State’s Description
        Setting the Blending State’s Description
        Creating a Pipeline Layout
        Creating a Graphics Pipeline
    Preparing Drawing Commands
        Creating a Command Pool
        Allocating Command Buffers
        Recording Command Buffers
    Drawing
    Tutorial 3 Execution
    Cleaning Up
    Conclusion
Tutorial 3: First Triangle – Graphics Pipeline and Drawing
In this tutorial we will finally draw something on the screen. One single triangle should be just fine for our first Vulkan-
generated “image.”

The graphics pipeline and drawing in general require lots of preparations in Vulkan (in the form of filling many
structures with even more different fields). There are potentially many places where we can make mistakes, and in Vulkan,
even simple mistakes may lead to the application not working as expected, displaying just a blank screen, and leaving us
wondering what went wrong. In such situations validation layers can help us a lot. But I didn’t want to dive into too many
different aspects and the specifics of the Vulkan API. So I prepared the code to be as small and as simple as possible.

This led me to create an application that is working properly and displays a simple triangle the way I expected, but it
also uses mechanics that are not recommended, not flexible, and also probably not too efficient (though correct). I don’t
want to teach solutions that aren’t recommended, but here it simplifies the tutorial quite considerably and allows us to
focus only on the minimal required set of API usage. I will point out the “disputable” functionality as soon as we get to it.
And in the next tutorial, I will show the recommended way of drawing triangles.

To draw our first simple triangle, we need to create a render pass, a framebuffer, and a graphics pipeline. Command
buffers are of course also needed, but we already know something about them. We will create simple GLSL shaders and
compile them into Khronos’s SPIR*-V language—the only (at this time) form of shaders that Vulkan (officially) understands.

If nothing displays on your computer’s screen, try to simplify the code as much as possible or even go back to the
second tutorial. Check whether a command buffer that just clears the image behaves as expected, and whether the color
the image was cleared to is properly displayed on the screen. If so, modify the code and add the parts from this tutorial.
Check whether every return value is VK_SUCCESS. If these ideas don’t help, wait for the tutorial about validation layers.

About the Source Code Example


For this and succeeding tutorials, I’ve changed the sample project. The Vulkan preparation phases that were described
in the previous tutorials were placed in a “VulkanCommon” class found in separate files (header and source). The class for
a given tutorial, responsible for presenting the topics of that tutorial, inherits from the “VulkanCommon” class and has
access to some (required) Vulkan variables like the device or the swap chain. This way I can reuse the Vulkan creation
code and prepare smaller classes focusing only on the presented topics. The code from the earlier chapters works properly,
so it should also be easier to find potential mistakes.

I’ve also added a separate set of files for some utility functions. Here we will be reading SPIR-V shaders from binary
files, so I’ve added a function for checking and loading the contents of a binary file. It can be found in the Tools.cpp and
Tools.h files.
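Such a loader is small enough to sketch here. Note that this is only a plausible standalone version: the function name GetBinaryFileContents and its exact behavior are assumptions, and the article’s actual Tools.cpp may differ in details:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Reads a whole file in binary mode and returns its bytes.
// Returns an empty vector on any failure.
std::vector<char> GetBinaryFileContents( const std::string &filename ) {
  std::ifstream file( filename, std::ios::binary | std::ios::ate );
  if( file.fail() ) {
    return std::vector<char>();
  }

  // With std::ios::ate the initial position is the end of the file,
  // so tellg() gives us the file size.
  std::streamsize size = file.tellg();
  file.seekg( 0, std::ios::beg );

  std::vector<char> contents( static_cast<size_t>( size ) );
  if( size > 0 && !file.read( contents.data(), size ) ) {
    return std::vector<char>();
  }
  return contents;
}
```

SPIR-V binaries must be read exactly as stored on disk (opening the file in text mode could mangle bytes that happen to look like line endings), which is why the binary flag matters here.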

Creating a Render Pass


To draw anything on the screen, we need a graphics pipeline. But creating it now will require pointers to other
structures, which will probably also need pointers to yet other structures. So we’ll start with a render pass.

What is a render pass? A general picture can be found in many well-known rendering techniques, like deferred shading,
which can be described as a “logical” render pass consisting of many subpasses. The first subpass draws the
geometry with shaders that fill the G-Buffer: it stores diffuse color in one texture, normal vectors in another, shininess in
another, and depth (position) in yet another. Next, for each light source, drawing is performed that reads some of this data
(normal vectors, shininess, depth/position), calculates the lighting, and stores it in another texture. A final pass aggregates
the lighting data with the diffuse color. This is a (very rough) explanation of deferred shading, but it describes the idea of a
render pass: a set of data required to perform some drawing operations, storing data in textures and reading data from
other textures.

In Vulkan, a render pass represents (or describes) a set of framebuffer attachments (images) required for drawing
operations and a collection of subpasses that the drawing operations will be ordered into. It is a construct that collects all
color, depth, and stencil attachments and the operations modifying them, so that the driver does not have to deduce
this information by itself; this may give substantial optimization opportunities on some GPUs. A subpass consists of
drawing operations that use (more or less) the same attachments. Each of these drawing operations may read from some
input attachments and render data into some other (color, depth, stencil) attachments. A render pass also describes the
dependencies between these attachments: in one subpass we perform rendering into a texture, but in another subpass
this texture is used as a source of data (that is, it is sampled from). All this data helps the graphics hardware optimize
drawing operations.

To create a render pass in Vulkan, we call the vkCreateRenderPass() function, which requires a pointer to a structure
describing all the attachments involved in rendering and all the subpasses forming the render pass. As usual, the more
attachments and subpasses we use, the more array elements containing properly filled structures we need. In our simple
example, we will be drawing into only a single texture (color attachment) with just a single subpass.

Render Pass Attachment Description


VkAttachmentDescription attachment_descriptions[] = {
  {
    0,                                  // VkAttachmentDescriptionFlags   flags
    GetSwapChain().Format,              // VkFormat                       format
    VK_SAMPLE_COUNT_1_BIT,              // VkSampleCountFlagBits          samples
    VK_ATTACHMENT_LOAD_OP_CLEAR,        // VkAttachmentLoadOp             loadOp
    VK_ATTACHMENT_STORE_OP_STORE,       // VkAttachmentStoreOp            storeOp
    VK_ATTACHMENT_LOAD_OP_DONT_CARE,    // VkAttachmentLoadOp             stencilLoadOp
    VK_ATTACHMENT_STORE_OP_DONT_CARE,   // VkAttachmentStoreOp            stencilStoreOp
    VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,    // VkImageLayout                  initialLayout
    VK_IMAGE_LAYOUT_PRESENT_SRC_KHR     // VkImageLayout                  finalLayout
  }
};
1. Tutorial03.cpp, function CreateRenderPass()

To create a render pass, first we prepare an array with elements describing each attachment, regardless of the type
of attachment and how it will be used inside a render pass. Each array element is of type VkAttachmentDescription, which
contains the following fields:

 flags – Describes additional properties of an attachment. Currently, only an aliasing flag is available, which
informs the driver that the attachment shares the same physical memory with another attachment; it is not
the case here so we set this parameter to zero.
 format – Format of an image used for the attachment; here we are rendering directly into a swap chain so we
need to take its format.
 samples – Number of samples of the image; we are not using any multisampling here so we just use one
sample.
 loadOp – Specifies what to do with the image’s contents at the beginning of a render pass, whether we want
them to be cleared, preserved, or we don’t care about them (as we will overwrite them all). Here we want to
clear the image to the specified value. This parameter also refers to depth part of depth/stencil images.
 storeOp – Informs the driver what to do with the image’s contents after the render pass (after a subpass in
which the image was used for the last time). Here we want the contents of the image to be preserved after
the render pass as we intend to display them on screen. This parameter also refers to the depth part of
depth/stencil images.
 stencilLoadOp – The same as loadOp but for the stencil part of depth/stencil images; for color attachments it
is ignored.
 stencilStoreOp – The same as storeOp but for the stencil part of depth/stencil images; for color attachments
this parameter is ignored.
 initialLayout – The layout the given attachment will have when the render pass starts (the layout the image is
provided in by the application).
 finalLayout – The layout the driver will automatically transition the given image into at the end of the render
pass.

Some additional information is required for load and store operations and initial and final layouts.

The load op refers to the attachment’s contents at the beginning of a render pass. It describes what the graphics
hardware should do with the attachment: clear it, operate on its existing contents (leave them untouched), or ignore
them because the application intends to overwrite them. This gives the hardware an opportunity to optimize memory
operations. For example, if we intend to overwrite all of the contents, the hardware won’t bother with them and, if it is
faster, may allocate totally new memory for the attachment.

Store op, as the name suggests, is used at the end of a render pass and informs the hardware whether we want to use
the contents of the attachment after the render pass or if we don’t care about it and the contents may be discarded. In
some scenarios (when contents are discarded) this creates the ability for the hardware to create an image in temporary,
fast memory as the image will “live” only during the render pass and the implementations may save some memory
bandwidth avoiding writing back data that is not needed anymore.

When an attachment has a depth format (and potentially also a stencil component) load and store ops refer only to
the depth component. If a stencil is present, stencil values are treated the way stencil load and store ops describe. For
color attachments, stencil ops are not relevant.
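
To make the depth/stencil case concrete, here is a hedged sketch of what a description for a depth attachment could look like. This is not part of this tutorial's code; VK_FORMAT_D24_UNORM_S8_UINT is an assumed format, and a real application should query which depth formats the hardware supports:

```cpp
// Hypothetical depth/stencil attachment description (sketch, not tutorial code).
// loadOp/storeOp apply to the depth aspect; stencil has its own pair of ops.
VkAttachmentDescription depth_attachment_description = {
  0,                                                // VkAttachmentDescriptionFlags   flags
  VK_FORMAT_D24_UNORM_S8_UINT,                      // VkFormat                       format (assumed)
  VK_SAMPLE_COUNT_1_BIT,                            // VkSampleCountFlagBits          samples
  VK_ATTACHMENT_LOAD_OP_CLEAR,                      // VkAttachmentLoadOp             loadOp (depth)
  VK_ATTACHMENT_STORE_OP_DONT_CARE,                 // VkAttachmentStoreOp            storeOp (depth)
  VK_ATTACHMENT_LOAD_OP_DONT_CARE,                  // VkAttachmentLoadOp             stencilLoadOp
  VK_ATTACHMENT_STORE_OP_DONT_CARE,                 // VkAttachmentStoreOp            stencilStoreOp
  VK_IMAGE_LAYOUT_UNDEFINED,                        // VkImageLayout                  initialLayout
  VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL  // VkImageLayout                  finalLayout
};
```

With both store ops set to "don't care," the driver is free to discard depth and stencil data after the render pass, which may save memory bandwidth.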

Layout, as I described in the swap chain tutorial, is an internal memory arrangement of an image. Image data may be
organized in such a way that neighboring “image pixels” are also neighbors in memory, which can increase cache hits
(faster memory reading) when the image is used as a source of data (that is, during texture sampling). But caching is not
necessary when the image is used as a target for drawing operations, and the memory for such an image may be organized
in a totally different way. An image may have a linear layout (which gives the CPU the ability to read or populate the
image’s memory contents) or an optimal layout (which is optimized for performance but is also hardware/vendor
dependent). Some hardware may have a special memory organization for some types of operations; other hardware may
be operations-agnostic. Some memory layouts may be better suited for some intended image “usages,” or, seen from the
other side, some usages may require specific memory layouts. There is also a general layout that is compatible with all
types of operations. But from the performance point of view, it is always best to set the layout appropriate for the
intended image usage, and it is the application’s responsibility to inform the driver about transitions.

Image layouts may be changed using image memory barriers. We did this in the swap chain tutorial when we first
changed the layout from the presentation source (image was used by the presentation engine) to transfer destination (we
wanted to clear the image with a given color). But layouts, apart from image memory barriers, may also be changed
automatically by the hardware inside a render pass. If we specify a different initial layout, subpass layouts (described
later), and final layout, the hardware does the transition automatically at the appropriate time.
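
For comparison, a manual transition performed with an image memory barrier (like the one from the swap chain tutorial) looks roughly like the sketch below. The access masks, pipeline stages, and the swap_chain_image and command_buffer variables are assumptions chosen for illustration, not this tutorial's code:

```cpp
// Hypothetical manual layout transition: presentation source -> color attachment.
// Inside a render pass, the same transition can instead be expressed
// declaratively through initial/subpass/final layouts.
VkImageSubresourceRange subresource_range = {
  VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1
};
VkImageMemoryBarrier barrier = {
  VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,    // VkStructureType            sType
  nullptr,                                   // const void                *pNext
  VK_ACCESS_MEMORY_READ_BIT,                 // VkAccessFlags              srcAccessMask
  VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,      // VkAccessFlags              dstAccessMask
  VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,           // VkImageLayout              oldLayout
  VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,  // VkImageLayout              newLayout
  VK_QUEUE_FAMILY_IGNORED,                   // uint32_t                   srcQueueFamilyIndex
  VK_QUEUE_FAMILY_IGNORED,                   // uint32_t                   dstQueueFamilyIndex
  swap_chain_image,                          // VkImage                    image (assumed)
  subresource_range                          // VkImageSubresourceRange    subresourceRange
};
vkCmdPipelineBarrier( command_buffer, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
                      VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, 0,
                      0, nullptr, 0, nullptr, 1, &barrier );
```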

Initial layout informs the hardware about the layout the application “provides” (or “leaves”) the given attachment
with. This is the layout the image starts with at the beginning of a render pass (in our example we acquire the image from
the presentation engine so the image has a “presentation source” layout set). Each subpass of a render pass may use a
different layout, and the transition will be done automatically by the hardware between subpasses. The final layout is the
layout the given attachment will be transitioned into (automatically) at the end of a render pass (after a render pass is
finished).
This information must be prepared for each attachment that will be used in a render pass. When graphics hardware
receives this information a priori, it may optimize operations and memory during the render pass to achieve the best
possible performance.

Subpass Description
VkAttachmentReference color_attachment_references[] = {
  {
    0,                                        // uint32_t                       attachment
    VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL  // VkImageLayout                  layout
  }
};

VkSubpassDescription subpass_descriptions[] = {
  {
    0,                                        // VkSubpassDescriptionFlags      flags
    VK_PIPELINE_BIND_POINT_GRAPHICS,          // VkPipelineBindPoint            pipelineBindPoint
    0,                                        // uint32_t                       inputAttachmentCount
    nullptr,                                  // const VkAttachmentReference   *pInputAttachments
    1,                                        // uint32_t                       colorAttachmentCount
    color_attachment_references,              // const VkAttachmentReference   *pColorAttachments
    nullptr,                                  // const VkAttachmentReference   *pResolveAttachments
    nullptr,                                  // const VkAttachmentReference   *pDepthStencilAttachment
    0,                                        // uint32_t                       preserveAttachmentCount
    nullptr                                   // const uint32_t                *pPreserveAttachments
  }
};
2. Tutorial03.cpp, function CreateRenderPass()

Next we specify the description of each subpass our render pass will include. This is done using VkSubpassDescription
structure, which contains the following fields:

 flags – Parameter reserved for future use.


 pipelineBindPoint – Type of pipeline in which this subpass will be used (graphics or compute). Our example,
of course, uses a graphics pipeline.
 inputAttachmentCount – Number of elements in the pInputAttachments array.
 pInputAttachments – Array with elements describing which attachments are used as an input and can be read
from inside shaders. We are not using any input attachments here, so we set this pointer to null (and the
count to zero).
 colorAttachmentCount – Number of elements in pColorAttachments and pResolveAttachments arrays.
 pColorAttachments – Array describing (pointing to) attachments which will be used as color render targets
(that image will be rendered into).
 pResolveAttachments – Array closely connected with color attachments. Each element from this array
corresponds to an element from a color attachments array; any such color attachment will be resolved to a
given resolve attachment (if a resolve attachment at the same index is not null or if the whole pointer is not
null). This is optional and can be set to null.
 pDepthStencilAttachment – Description of an attachment that will be used for depth (and/or stencil) data.
We don’t use depth information here so we can set it to null.
 preserveAttachmentCount – Number of elements in pPreserveAttachments array.
 pPreserveAttachments – Array describing attachments that should be preserved. When we have multiple
subpasses not all of them will use all attachments. If a subpass doesn’t use some of the attachments but we
need their contents in the later subpasses, we must specify these attachments here.

The pInputAttachments, pColorAttachments, pResolveAttachments, pPreserveAttachments, and


pDepthStencilAttachment parameters are all of type VkAttachmentReference. This structure contains only these two
fields:

 attachment – Index into an attachment_descriptions array of VkRenderPassCreateInfo.


 layout – Requested (required) layout the attachment will use during a given subpass. The hardware will
perform an automatic transition into a provided layout just before a given subpass.

This structure contains references (indices) into the attachment_descriptions array of VkRenderPassCreateInfo. When
we create a render pass we must provide a description of all attachments used during a render pass. We’ve prepared this
description earlier in “Render pass attachment description” when we created the attachment_descriptions array. Right
now it contains only one element, but in more advanced scenarios there will be multiple attachments. So this “general”
collection of all render pass attachments is used as a reference point. In the subpass description, when we fill
pColorAttachments or pDepthStencilAttachment members, we provide indices into this very “general” collection, like this:
take the first attachment from all render pass attachments and use it as a color attachment. The second attachment from
that array will be used for depth data.
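
To illustrate, in a hypothetical render pass with two attachments (not this tutorial's code, which has only one), the references could look like this:

```cpp
// Sketch: attachment 0 (color) and attachment 1 (depth) both index into the
// attachment_descriptions array given to VkRenderPassCreateInfo.
VkAttachmentReference color_reference = {
  0,                                               // uint32_t      attachment
  VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL         // VkImageLayout layout
};
VkAttachmentReference depth_reference = {
  1,                                               // uint32_t      attachment
  VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL // VkImageLayout layout
};
// In the subpass description we would then set:
//   pColorAttachments       = &color_reference
//   pDepthStencilAttachment = &depth_reference
```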

There is a separation between a whole render pass and its subpasses because each subpass may use multiple
attachments in a different way, that is, in one subpass we are rendering into one color attachment but in the next subpass
we are reading from this attachment. In this way, we can prepare a list of all attachments used in the whole render pass,
and at the same time we can specify how each attachment will be used in each subpass. And as each subpass may use a
given attachment in its own way, we must also specify each image’s layout for each subpass.

So before we can specify a description of all subpasses (an array with elements of type VkSubpassDescription) we
must create references for each attachment used in each subpass. And this is what the color_attachment_references
variable was created for. When I write a tutorial for rendering into a texture, this usage will be more apparent.

Render Pass Creation


We now have all the data we need to create a render pass.
VkRenderPassCreateInfo render_pass_create_info = {
  VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO,  // VkStructureType                sType
  nullptr,                                    // const void                    *pNext
  0,                                          // VkRenderPassCreateFlags        flags
  1,                                          // uint32_t                       attachmentCount
  attachment_descriptions,                    // const VkAttachmentDescription *pAttachments
  1,                                          // uint32_t                       subpassCount
  subpass_descriptions,                       // const VkSubpassDescription    *pSubpasses
  0,                                          // uint32_t                       dependencyCount
  nullptr                                     // const VkSubpassDependency     *pDependencies
};

if( vkCreateRenderPass( GetDevice(), &render_pass_create_info, nullptr,
                        &Vulkan.RenderPass ) != VK_SUCCESS ) {
  printf( "Could not create render pass!\n" );
  return false;
}

return true;
3. Tutorial03.cpp, function CreateRenderPass()

We start by filling the VkRenderPassCreateInfo structure, which contains the following fields:

 sType – Type of structure (VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO here).


 pNext – Parameter not currently used.
 flags – Parameter reserved for future use.
 attachmentCount – Number of all different attachments (elements in pAttachments array) used during whole
render pass (here just one).
 pAttachments – Array specifying all attachments used in a render pass.
 subpassCount – Number of subpasses a render pass consists of (and number of elements in pSubpasses array
– just one in our simple example).
 pSubpasses – Array with descriptions of all subpasses.
 dependencyCount – Number of elements in pDependencies array (zero here).
 pDependencies – Array describing dependencies between pairs of subpasses. We don’t have many subpasses
so we don’t have dependencies here (set it to null here).

Dependencies describe which parts of the graphics pipeline use a given memory resource, and in what way. Each subpass
may use resources differently, and the layout of a resource alone does not fully define how it is used. Some subpasses
may render into images or store data through shader image stores. Others may not use images at all, or may read from
them at different pipeline stages (that is, in the vertex or fragment stage).

This information helps the driver optimize automatic layout transitions and, more generally, barriers between
subpasses. When we write into images only in a vertex shader, there is no point in waiting until the fragment shader
executes (in terms of the images used, of course). After all the vertex operations are done, images may immediately
change their layouts and memory access type, and some parts of the graphics hardware may even start executing the next
operations (those referencing or reading the given images) without waiting for the rest of the commands from the given
subpass to finish. For now, just remember that dependencies are important from a performance point of view.
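
As an illustration, a single external dependency could look like the sketch below. It makes color attachment writes in subpass 0 wait until the presentation engine has finished reading the image; the chosen stage and access masks are a common pattern, not part of this tutorial's code:

```cpp
// Hypothetical dependency between operations outside the render pass
// (VK_SUBPASS_EXTERNAL) and our only subpass (index 0).
VkSubpassDependency dependencies[] = {
  {
    VK_SUBPASS_EXTERNAL,                            // uint32_t             srcSubpass
    0,                                              // uint32_t             dstSubpass
    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,  // VkPipelineStageFlags srcStageMask
    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,  // VkPipelineStageFlags dstStageMask
    0,                                              // VkAccessFlags        srcAccessMask
    VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,           // VkAccessFlags        dstAccessMask
    0                                               // VkDependencyFlags    dependencyFlags
  }
};
// These would be passed via dependencyCount/pDependencies in VkRenderPassCreateInfo.
```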

So now that we have prepared all the information required to create a render pass, we can safely call the
vkCreateRenderPass() function.

Creating a Framebuffer
We have created a render pass. It describes all attachments and all subpasses used during the render pass. But this
description is quite abstract. We have specified formats of all attachments (just one image in this example) and described
how attachments will be used by each subpass (also just one here). But we didn’t specify WHAT attachments we will be
using or, in other words, what images will be used as these attachments. This is done through a framebuffer.

A framebuffer describes specific images that the render pass operates on. In OpenGL*, a framebuffer is a set of
textures (attachments) we are rendering into. In Vulkan, this term is much broader. It describes all the textures
(attachments) used during the render pass, not only the images we are rendering into (color and depth/stencil
attachments) but also images used as a source of data (input attachments).
This separation of render pass and framebuffer gives us some additional flexibility. We can use the given render pass
with different framebuffers and a given framebuffer with different render passes, if they are compatible, meaning that
they operate in a similar fashion on images of similar types and usages.

Before we can create a framebuffer, we must create image views for each image used as a framebuffer and render
pass attachment. In Vulkan, not only in the case of framebuffers, but in general, we don’t operate on images themselves.
Images are not accessed directly; image views are used for this purpose. Image views represent images: they “wrap”
images and provide additional (meta)data for them.

Creating Image Views


In this simple application, we want to render directly into swap chain images. We have created a swap chain with
multiple images, so we must create an image view for each of them.
const std::vector<VkImage> &swap_chain_images = GetSwapChain().Images;
Vulkan.FramebufferObjects.resize( swap_chain_images.size() );

for( size_t i = 0; i < swap_chain_images.size(); ++i ) {
  VkImageViewCreateInfo image_view_create_info = {
    VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,  // VkStructureType          sType
    nullptr,                                   // const void              *pNext
    0,                                         // VkImageViewCreateFlags   flags
    swap_chain_images[i],                      // VkImage                  image
    VK_IMAGE_VIEW_TYPE_2D,                     // VkImageViewType          viewType
    GetSwapChain().Format,                     // VkFormat                 format
    {                                          // VkComponentMapping       components
      VK_COMPONENT_SWIZZLE_IDENTITY,           // VkComponentSwizzle       r
      VK_COMPONENT_SWIZZLE_IDENTITY,           // VkComponentSwizzle       g
      VK_COMPONENT_SWIZZLE_IDENTITY,           // VkComponentSwizzle       b
      VK_COMPONENT_SWIZZLE_IDENTITY            // VkComponentSwizzle       a
    },
    {                                          // VkImageSubresourceRange  subresourceRange
      VK_IMAGE_ASPECT_COLOR_BIT,               // VkImageAspectFlags       aspectMask
      0,                                       // uint32_t                 baseMipLevel
      1,                                       // uint32_t                 levelCount
      0,                                       // uint32_t                 baseArrayLayer
      1                                        // uint32_t                 layerCount
    }
  };

  if( vkCreateImageView( GetDevice(), &image_view_create_info, nullptr,
                         &Vulkan.FramebufferObjects[i].ImageView ) != VK_SUCCESS ) {
    printf( "Could not create image view for framebuffer!\n" );
    return false;
  }
4. Tutorial03.cpp, function CreateFramebuffers()
To create an image view, we must first create a variable of type VkImageViewCreateInfo. It contains the following
fields:

 sType – Structure type, in this case it should be set to VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO.


 pNext – Parameter typically set to null.
 flags – Parameter reserved for future use.
 image – Handle to an image for which view will be created.
 viewType – Type of view we want to create. The view type must be compatible with the image it is created for
(for example, we can create a 2D view for an image that has multiple array layers, or we can create a CUBE
view for a 2D image with six layers).
 format – Format of an image view; it must be compatible with the image’s format but doesn’t have to be
identical (that is, it may be a different format as long as it has the same number of bits per pixel).
 components – Mapping of an image components into a vector returned in the shader by texturing operations.
This applies only to read operations (sampling), but since we are using an image as a color attachment (we
are rendering into an image) we must set the so-called identity mapping (R component into R, G -> G, and so
on) or just use “identity” value (VK_COMPONENT_SWIZZLE_IDENTITY).
 subresourceRange – Describes the set of mipmap levels and array layers that will be accessible to a view. If
our image is mipmapped, we may specify the specific mipmap level we want to render to (and in case of
render targets we must specify exactly one mipmap level of one array layer).

As you can see here, we acquire handles to all swap chain images, and we are referencing them inside a loop. This
way we fill the structure required for image view creation, which we pass to a vkCreateImageView() function. We do this
for each image that was created along with a swap chain.

Specifying Framebuffer Parameters


Now we can create a framebuffer. To do this we call the vkCreateFramebuffer() function.
  VkFramebufferCreateInfo framebuffer_create_info = {
    VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO,  // VkStructureType           sType
    nullptr,                                    // const void               *pNext
    0,                                          // VkFramebufferCreateFlags  flags
    Vulkan.RenderPass,                          // VkRenderPass              renderPass
    1,                                          // uint32_t                  attachmentCount
    &Vulkan.FramebufferObjects[i].ImageView,    // const VkImageView        *pAttachments
    300,                                        // uint32_t                  width
    300,                                        // uint32_t                  height
    1                                           // uint32_t                  layers
  };

  if( vkCreateFramebuffer( GetDevice(), &framebuffer_create_info, nullptr,
                           &Vulkan.FramebufferObjects[i].Handle ) != VK_SUCCESS ) {
    printf( "Could not create a framebuffer!\n" );
    return false;
  }
}
return true;
5. Tutorial03.cpp, function CreateFramebuffers()
vkCreateFramebuffer() function requires us to provide a pointer to a variable of type VkFramebufferCreateInfo so we
must first prepare it. It contains the following fields:

 sType – Structure type set to VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO in this situation.


 pNext – Parameter most of the time set to null.
 flags – Parameter reserved for future use.
 renderPass – Render pass this framebuffer will be compatible with.
 attachmentCount – Number of attachments in a framebuffer (elements in pAttachments array).
 pAttachments – Array of image views representing all attachments used in a framebuffer and render pass.
Each element in this array (each image view) corresponds to each attachment in a render pass.
 width – Width of a framebuffer.
 height – Height of a framebuffer.
 layers – Number of layers in a framebuffer (OpenGL’s layered rendering with geometry shaders, which could
specify the layer into which fragments rasterized from a given polygon will be rendered).

The framebuffer specifies which images are used as attachments on which the render pass operates. We can say that
it maps an image (image view) onto a given attachment. The number of images specified for a framebuffer must be the
same as the number of attachments in a render pass for which we are creating a framebuffer. Also, each pAttachments
array’s element corresponds directly to an attachment in a render pass description structure. Render pass and framebuffer
are closely connected, and that’s why we must also specify a render pass during framebuffer creation. But we may use a
framebuffer not only with the specified render pass but also with any render pass that is compatible with it. Compatible
render passes must, in general, have the same number of attachments, and corresponding attachments must have the
same format and number of samples. Image layouts (initial, final, and for each subpass), however, may differ and do not
affect render pass compatibility.

After we have finished creating and filling the VkFramebufferCreateInfo structure, we call the vkCreateFramebuffer()
function.

The above code executes in a loop. A framebuffer references image views. Here the image view is created for each
swap chain image. So for each swap chain image and its view, we are creating a framebuffer. We are doing this in order
to simplify the code called in a rendering loop. In a normal, real-life scenario we wouldn’t (probably) create a framebuffer
for each swap chain image. I assume that a better solution would be to render into a single image (texture) and after that
use command buffers that would copy rendering results from that image into a given swap chain image. This way we will
have only three simple command buffers that are connected with a swap chain. All other rendering commands would be
independent of a swap chain, making it easier to maintain.

Creating a Graphics Pipeline


Now we are ready to create a graphics pipeline. A pipeline is a collection of stages that process data one stage after
another. In Vulkan there is currently a compute pipeline and a graphics pipeline. The compute pipeline allows us to
perform some computational work, such as performing physics calculations for objects in games. The graphics pipeline is
used for drawing operations.

In OpenGL there are multiple programmable stages (vertex, tessellation, fragment shaders, and so on) and some fixed
function stages (rasterizer, depth test, blending, and so on). In Vulkan, the situation is similar. There are similar (if not
identical) stages. But the whole pipeline’s state is gathered in one monolithic object. OpenGL allows us to change the state
that influences rendering operations anytime we want; we can change parameters for each stage (mostly) independently.
We can set up shader programs, depths test, blending, and whatever state we want, and then we can render some objects.
Next we can change just some small part of the state and render another object. In Vulkan, such operations can’t be done
(we say that pipelines are “immutable”). We must prepare the whole state and set up parameters for pipeline stages and
group them into a pipeline object. At the beginning, this was one of the most startling pieces of information for me. I’m
not able to change a shader program anytime I want? Why?
The easiest and most convincing explanation is the performance implication of such state changes. Changing
just one single state of the whole pipeline may cause the graphics hardware to perform many background operations, like
state and error checking. Different hardware vendors may implement (and usually do implement) such functionality
differently. This may cause applications to perform differently (meaning unpredictably, performance-wise) when executed
on different graphics hardware. So the ability to change anything at any time is convenient for developers but,
unfortunately, not so convenient for the hardware.

That’s why in Vulkan the state of the whole pipeline is gathered into one single object. All the relevant state and error
checking is performed when the pipeline object is created. When there are problems (like different parts of the pipeline
being set up in an incompatible way), pipeline object creation fails, but we know that up front. The driver doesn’t have to
worry for us and do whatever it can to properly use such a broken pipeline; it can immediately tell us about the problem.
And during real usage, in performance-critical parts of the application, everything is already set up correctly and can be
used as is.

The downside of this methodology is that we have to create multiple variations of pipeline objects when we draw many
objects in different ways (some opaque, some semi-transparent, some with the depth test enabled, others without).
Unfortunately, even different shaders require different pipeline objects: if we want to draw objects using different
shaders, we have to create one pipeline object for each combination of shader programs. Shaders are also connected with
the whole pipeline state. They use different resources (like textures and buffers), render into different color attachments,
and read from different attachments (possibly ones that were rendered into before). These connections must also be
initialized, prepared, and set up correctly. We know what we want to do; the driver does not. So it is better and far more
logical that we do it, not the driver. In general, this approach makes sense.

To begin the pipeline creation process, let’s start with shaders.

Creating a Shader Module


Creating a graphics pipeline requires us to prepare lots of data in the form of structures or even arrays of structures.
The first such data is a collection of all shader stages and shader programs that will be used during rendering with a given
graphics pipeline bound.

In OpenGL, we write shaders in GLSL. They are compiled and then linked into shader programs directly in our
application. We can use or stop using a shader program anytime we want in our application.

Vulkan, on the other hand, accepts only a binary representation of shaders, in an intermediate language called SPIR-V.
We can’t provide GLSL code like we did in OpenGL. But there is an official, separate compiler that can transform shaders
written in GLSL into binary SPIR-V, and we use it offline, before the application runs. After we prepare the SPIR-V assembly
we can create a shader module from it. Such modules are then composed into an array of VkPipelineShaderStageCreateInfo
structures, which are used, among other parameters, to create a graphics pipeline.

Here’s the code that creates a shader module from a specified file that contains a binary SPIR-V.
const std::vector<char> code = Tools::GetBinaryFileContents( filename );
if( code.size() == 0 ) {
  return Tools::AutoDeleter<VkShaderModule, PFN_vkDestroyShaderModule>();
}

VkShaderModuleCreateInfo shader_module_create_info = {
  VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO,  // VkStructureType            sType
  nullptr,                                      // const void                *pNext
  0,                                            // VkShaderModuleCreateFlags  flags
  code.size(),                                  // size_t                     codeSize
  reinterpret_cast<const uint32_t*>(&code[0])   // const uint32_t            *pCode
};

VkShaderModule shader_module;
if( vkCreateShaderModule( GetDevice(), &shader_module_create_info, nullptr,
                          &shader_module ) != VK_SUCCESS ) {
  printf( "Could not create shader module from a %s file!\n", filename );
  return Tools::AutoDeleter<VkShaderModule, PFN_vkDestroyShaderModule>();
}

return Tools::AutoDeleter<VkShaderModule, PFN_vkDestroyShaderModule>(
  shader_module, vkDestroyShaderModule, GetDevice() );
6. Tutorial03.cpp, function CreateShaderModule()

First we prepare a VkShaderModuleCreateInfo structure that contains the following fields:

 sType – Type of structure, in this example set to VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO.


 pNext – Pointer not yet used.
 flags – Parameter reserved for future use.
 codeSize – Size in bytes of the code passed in pCode parameter.
 pCode – Pointer to an array with source code (binary SPIR-V assembly).

To acquire the contents of the file, I have prepared a simple utility function GetBinaryFileContents() that reads the
entire contents of a specified file. It returns the content in a vector of chars.
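
The tutorial does not list GetBinaryFileContents() itself, so here is a minimal, hedged sketch of what such a helper could look like using only the standard library (the actual implementation in the tutorial's Tools namespace may differ); like the tutorial's version, it signals failure with an empty vector:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Minimal sketch: read a file's entire contents into a vector of chars.
// Returns an empty vector on failure, matching how the tutorial checks
// for errors (code.size() == 0).
std::vector<char> GetBinaryFileContents( const std::string &filename ) {
  // Open in binary mode, positioned at the end so tellg() gives the size.
  std::ifstream file( filename, std::ios::binary | std::ios::ate );
  if( !file ) {
    return {};
  }
  const std::streamsize size = file.tellg();
  std::vector<char> contents( static_cast<size_t>( size ) );
  file.seekg( 0, std::ios::beg );
  if( size > 0 && !file.read( contents.data(), size ) ) {
    return {};
  }
  return contents;
}
```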

After we prepare a structure, we can call the vkCreateShaderModule() function and check whether everything went
fine.

The AutoDeleter<> class from Tools namespace is a helper class that wraps a given Vulkan object handle and takes a
function that is called to delete that object. This class is similar to smart pointers, which delete the allocated memory
when the object (the smart pointer) goes out of scope. AutoDeleter<> class takes the handle of a given object and deletes
it with a provided function when the object of this class’s type goes out of scope.
template<class T, class F>
class AutoDeleter {
public:
  AutoDeleter() :
    Object( VK_NULL_HANDLE ),
    Deleter( nullptr ),
    Device( VK_NULL_HANDLE ) {
  }

  AutoDeleter( T object, F deleter, VkDevice device ) :
    Object( object ),
    Deleter( deleter ),
    Device( device ) {
  }

  AutoDeleter( AutoDeleter&& other ) {
    *this = std::move( other );
  }

  ~AutoDeleter() {
    if( (Object != VK_NULL_HANDLE) && (Deleter != nullptr) && (Device != VK_NULL_HANDLE) ) {
      Deleter( Device, Object, nullptr );
    }
  }

  AutoDeleter& operator=( AutoDeleter&& other ) {
    if( this != &other ) {
      Object = other.Object;
      Deleter = other.Deleter;
      Device = other.Device;
      other.Object = VK_NULL_HANDLE;
    }
    return *this;
  }

  T Get() {
    return Object;
  }

  bool operator !() const {
    return Object == VK_NULL_HANDLE;
  }

private:
  AutoDeleter( const AutoDeleter& );
  AutoDeleter& operator=( const AutoDeleter& );

  T Object;
  F Deleter;
  VkDevice Device;
};
7. Tools.h, -

Why so much effort for one simple object? Shader modules are one of the objects required to create the graphics
pipeline. But after the pipeline is created, we don’t need these shader modules anymore. Sometimes it is convenient to
keep them as we may need to create additional, similar pipelines. But in this example they may be safely destroyed after
we create a graphics pipeline. Shader modules are destroyed by calling the vkDestroyShaderModule() function. But in the
example, we would need to call this function in many places: inside multiple “ifs” and at the end of the whole function.
Because I don’t want to remember where I need to call this function and, at the same time, I don’t want any memory leaks
to occur, I have prepared this simple class just for convenience. Now, I don’t have to remember to delete the created
shader module because it will be deleted automatically.
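
The same RAII idea can be demonstrated without Vulkan at all. The following self-contained sketch uses stand-in types (a "FakeDevice" alias, an int handle, and a counting deleter, all hypothetical and not part of the tutorial's code) to show the key property: even after a move, the wrapped object is deleted exactly once, when the last owning wrapper leaves its scope:

```cpp
#include <utility>

// Stand-in types: this sketch mirrors the ownership pattern of
// Tools::AutoDeleter<> without requiring Vulkan headers or a device.
using FakeDevice = int;
using FakeDeleterFn = void (*)( FakeDevice, int, const void* );

static int g_deletions = 0;

void FakeDestroy( FakeDevice /*device*/, int /*object*/, const void* /*allocator*/ ) {
  ++g_deletions;  // count how many times the "object" is destroyed
}

template<class T, class F>
class ScopedDeleter {
public:
  ScopedDeleter() : Object(), Deleter( nullptr ), Device() {}

  ScopedDeleter( T object, F deleter, FakeDevice device ) :
    Object( object ), Deleter( deleter ), Device( device ) {}

  ScopedDeleter( ScopedDeleter&& other ) : ScopedDeleter() {
    *this = std::move( other );
  }

  ~ScopedDeleter() {
    // The wrapped object is destroyed exactly once, by the last owner.
    if( Deleter != nullptr ) {
      Deleter( Device, Object, nullptr );
    }
  }

  ScopedDeleter& operator=( ScopedDeleter&& other ) {
    if( this != &other ) {
      Object = other.Object;
      Deleter = other.Deleter;
      Device = other.Device;
      other.Deleter = nullptr;  // a moved-from wrapper must not delete
    }
    return *this;
  }

  T Get() const { return Object; }

private:
  ScopedDeleter( const ScopedDeleter& ) = delete;
  ScopedDeleter& operator=( const ScopedDeleter& ) = delete;

  T Object;
  F Deleter;
  FakeDevice Device;
};
```

Note one small difference from the listing above: this sketch marks a moved-from wrapper by nulling the deleter, while the tutorial's class nulls the object handle; both choices prevent a double delete.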

Preparing a Description of the Shader Stages


Now that we know how to create and destroy shader modules, we can create data for shader stages compositing our
graphics pipeline. As I have written, the data that describes what shader stages should be active when a given graphics
pipeline is bound has a form of an array with elements of type VkPipelineShaderStageCreateInfo. Here is the code that
creates shader modules and prepares such an array:
Tools::AutoDeleter<VkShaderModule, PFN_vkDestroyShaderModule> vertex_shader_module =
  CreateShaderModule( "Data03/vert.spv" );
Tools::AutoDeleter<VkShaderModule, PFN_vkDestroyShaderModule> fragment_shader_module =
  CreateShaderModule( "Data03/frag.spv" );

if( !vertex_shader_module || !fragment_shader_module ) {
  return false;
}

std::vector<VkPipelineShaderStageCreateInfo> shader_stage_create_infos = {
  // Vertex shader
  {
    VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,  // VkStructureType                   sType
    nullptr,                                              // const void                       *pNext
    0,                                                    // VkPipelineShaderStageCreateFlags  flags
    VK_SHADER_STAGE_VERTEX_BIT,                           // VkShaderStageFlagBits             stage
    vertex_shader_module.Get(),                           // VkShaderModule                    module
    "main",                                               // const char                       *pName
    nullptr                                               // const VkSpecializationInfo       *pSpecializationInfo
  },
  // Fragment shader
  {
    VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,  // VkStructureType                   sType
    nullptr,                                              // const void                       *pNext
    0,                                                    // VkPipelineShaderStageCreateFlags  flags
    VK_SHADER_STAGE_FRAGMENT_BIT,                         // VkShaderStageFlagBits             stage
    fragment_shader_module.Get(),                         // VkShaderModule                    module
    "main",                                               // const char                       *pName
    nullptr                                               // const VkSpecializationInfo       *pSpecializationInfo
  }
};
8. Tutorial03.cpp, function CreatePipeline()

At the beginning we are creating two shader modules for vertex and fragment stages. They are created with the
function presented earlier. When any error occurs and we return from the CreatePipeline() function, any created module
is deleted automatically by a wrapper class with a provided deleter function.

The code for the shader modules is read from files that contain the binary SPIR-V assembly. These files are generated
with an application called “glslangValidator”. This is a tool distributed officially with the Vulkan SDK and is designed to
validate GLSL shaders. But “glslangValidator” also has the capability to compile or rather transform GLSL shaders into SPIR-
V binary files. A full explanation of the command line for its usage can be found at the official SDK site. I’ve used the
following commands to generate SPIR-V shaders for this tutorial:
glslangValidator.exe -V -H shader.vert > vert.spv.txt
glslangValidator.exe -V -H shader.frag > frag.spv.txt

“glslangValidator” takes a specified file and generates a SPIR-V file from it. The type of shader stage is automatically
detected from the input file’s extension (“.vert” for vertex shaders, “.geom” for geometry shaders, and so on). The name of
the generated file can be specified, but by default it takes the form “<stage>.spv”. So in our example “vert.spv” and
“frag.spv” files will be generated.

SPIR-V files have a binary format so it may be hard to read and analyze them—but not impossible. When the “-H”
option is used, “glslangValidator” outputs SPIR-V in a form that can be more easily read. This form is printed on standard
output and that’s why I’m using the “> *.spv.txt” redirection operator.

Here are the contents of a “shader.vert” file from which SPIR-V assembly was generated for the vertex stage:
#version 400
void main() {
vec2 pos[3] = vec2[3]( vec2(-0.7, 0.7), vec2(0.7, 0.7), vec2(0.0, -0.7) );
gl_Position = vec4( pos[gl_VertexIndex], 0.0, 1.0 );
}
9. shader.vert, -

As you can see I have hardcoded the positions of all vertices used to render the triangle. They are indexed using the
Vulkan-specific “gl_VertexIndex” built-in variable. In the simplest scenario, when using non-indexed drawing commands
(which takes place here) this value starts from the value of the “firstVertex” parameter of a drawing command (zero in
the provided example).
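The relationship between a non-indexed draw call and the gl_VertexIndex values seen by the vertex shader can be sketched as a tiny model in plain C++ (illustrative only; the helper name is made up and this is not part of the tutorial’s code):

```cpp
#include <cstdint>
#include <vector>

// For a non-indexed draw, the vertex shader runs once per vertex and
// gl_VertexIndex takes the values firstVertex, firstVertex + 1, ...,
// firstVertex + vertexCount - 1. This helper just enumerates them.
std::vector<uint32_t> VertexIndicesForDraw( uint32_t vertexCount, uint32_t firstVertex ) {
  std::vector<uint32_t> indices;
  for( uint32_t i = 0; i < vertexCount; ++i ) {
    indices.push_back( firstVertex + i );
  }
  return indices;
}
```

For the draw in this tutorial (3 vertices, firstVertex equal to zero) the shader sees the indices 0, 1, and 2, which select the three hardcoded positions.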

This is the disputable part I wrote about earlier—this approach is acceptable and valid but not quite convenient to
maintain and also allows us to skip some of the “structure filling” needed to create the graphics pipeline. I’ve chosen it in
order to shorten and simplify this tutorial as much as possible. In the next tutorial, I will present a more typical way of
drawing any number of vertices, similar to using vertex arrays and indices in OpenGL.

Below is the source code of a fragment shader from the “shader.frag” file that was used to generate the SPIR-V
assembly for the fragment stage:
#version 400

layout(location = 0) out vec4 out_Color;

void main() {
out_Color = vec4( 0.0, 0.4, 1.0, 1.0 );
}
10. shader.frag, -

In Vulkan’s shaders (when transforming from GLSL to SPIR-V) layout qualifiers are required. Here we specify to what
output (color) attachment we want to store the color values generated by the fragment shader. Because we are using only
one attachment, we must specify the first available location (zero).

Now that you know how to prepare shaders for applications using Vulkan, we can move on to the next step. After we
have created two shader modules, we check whether these operations succeeded. If they did we can start preparing a
description of all shader stages that will constitute our graphics pipeline.

For each enabled shader stage we need to prepare an instance of the VkPipelineShaderStageCreateInfo structure. An array
of these structures, along with the number of its elements, is then provided in the graphics pipeline create info structure
(passed to the function that creates the graphics pipeline). The VkPipelineShaderStageCreateInfo structure has the following
fields:

 sType – Type of structure that we are preparing, which in this case must be equal to
VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO.
 pNext – Pointer reserved for extensions.
 flags – Parameter reserved for future use.
 stage – Type of shader stage we are describing (like vertex, tessellation control, and so on).
 module – Handle to a shader module that contains the shader for a given stage.
 pName – Name of the entry point of the provided shader.
 pSpecializationInfo – Pointer to a VkSpecializationInfo structure, which we will leave for now and set to null.

When we are creating a graphics pipeline, we don’t create many (Vulkan) objects. Most of the data is provided in
the form of just such structures.
Preparing Description of a Vertex Input
Now we must provide a description of the input data used for drawing. This is similar to OpenGL’s vertex data:
attributes, number of components, buffers from which to take data, the data’s stride, or the step rate. In Vulkan this data
is of course prepared in a different way, but in general the meaning is the same. Fortunately, because the vertex
data is hardcoded into the vertex shader in this tutorial, we can almost entirely skip this step and fill the
VkPipelineVertexInputStateCreateInfo structure almost entirely with nulls and zeros:
VkPipelineVertexInputStateCreateInfo vertex_input_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,  // VkStructureType                                sType
  nullptr,                                                    // const void                                    *pNext
  0,                                                          // VkPipelineVertexInputStateCreateFlags          flags
  0,                                                          // uint32_t                                       vertexBindingDescriptionCount
  nullptr,                                                    // const VkVertexInputBindingDescription         *pVertexBindingDescriptions
  0,                                                          // uint32_t                                       vertexAttributeDescriptionCount
  nullptr                                                     // const VkVertexInputAttributeDescription       *pVertexAttributeDescriptions
};
11. Tutorial03.cpp, function CreatePipeline()

But for clarity here is a description of the members of the VkPipelineVertexInputStateCreateInfo structure:

 sType – Type of structure, VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO here.
 pNext – Pointer to an extension-specific structure.
 flags – Parameter reserved for future use.
 vertexBindingDescriptionCount – Number of elements in the pVertexBindingDescriptions array.
 pVertexBindingDescriptions – Array with elements describing input vertex data (stride and stepping rate).
 vertexAttributeDescriptionCount – Number of elements in the pVertexAttributeDescriptions array.
 pVertexAttributeDescriptions – Array with elements describing vertex attributes (location, format, offset).

Preparing the Description of an Input Assembly


The next step requires us to describe how vertices should be assembled into primitives. As with OpenGL, we must
specify what topology we want to use: points, lines, triangles, triangle fan, and so on.
VkPipelineInputAssemblyStateCreateInfo input_assembly_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO,  // VkStructureType                                sType
  nullptr,                                                      // const void                                    *pNext
  0,                                                            // VkPipelineInputAssemblyStateCreateFlags        flags
  VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST,                          // VkPrimitiveTopology                            topology
  VK_FALSE                                                      // VkBool32                                       primitiveRestartEnable
};
12. Tutorial03.cpp, function CreatePipeline()

We do that through the VkPipelineInputAssemblyStateCreateInfo structure, which contains the following members:

 sType – Structure type set here to VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO.
 pNext – Pointer not yet used.
 flags – Parameter reserved for future use.
 topology – Parameter describing how vertices will be organized to form a primitive.
 primitiveRestartEnable – Parameter that tells whether a special index value (when indexed drawing is
performed) restarts assembly of a given primitive.

Preparing the Viewport’s Description


We have finished dealing with input data. Now we must specify the form of the output data, that is, all the parts of the
graphics pipeline that are connected with fragments, like rasterization, window (viewport), depth tests, and so on. The first
set of data we must prepare here is the state of the viewport, which specifies to what part of the image (or texture, or
window) we want to draw.
VkViewport viewport = {
  0.0f,                                                       // float                                          x
  0.0f,                                                       // float                                          y
  300.0f,                                                     // float                                          width
  300.0f,                                                     // float                                          height
  0.0f,                                                       // float                                          minDepth
  1.0f                                                        // float                                          maxDepth
};

VkRect2D scissor = {
  {                                                           // VkOffset2D                                     offset
    0,                                                        // int32_t                                        x
    0                                                         // int32_t                                        y
  },
  {                                                           // VkExtent2D                                     extent
    300,                                                      // uint32_t                                       width
    300                                                       // uint32_t                                       height
  }
};

VkPipelineViewportStateCreateInfo viewport_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO,      // VkStructureType                                sType
  nullptr,                                                    // const void                                    *pNext
  0,                                                          // VkPipelineViewportStateCreateFlags             flags
  1,                                                          // uint32_t                                       viewportCount
  &viewport,                                                  // const VkViewport                              *pViewports
  1,                                                          // uint32_t                                       scissorCount
  &scissor                                                    // const VkRect2D                                *pScissors
};
13. Tutorial03.cpp, function CreatePipeline()

In this example, the usage is simple: we just set the viewport coordinates to some predefined values. I don’t check the
size of the swap chain image we are rendering into. But remember that in real-life production applications this has to be
done because the specification states that dimensions of the viewport cannot exceed the dimensions of the attachments
that we are rendering into.

To specify the viewport’s parameters, we fill the VkViewport structure that contains these fields:

 x – Left side of the viewport.
 y – Upper side of the viewport.
 width – Width of the viewport.
 height – Height of the viewport.
 minDepth – Minimal depth value used for depth calculations.
 maxDepth – Maximal depth value used for depth calculations.

When specifying viewport coordinates, remember that the origin is different than in OpenGL. Here we specify the
upper-left corner of the viewport (not the lower left).

Also worth noting is that the minDepth and maxDepth values must be between 0.0 and 1.0 (inclusive) but maxDepth
can be lower than minDepth. This will cause the depth to be calculated in “reverse.”
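The way minDepth and maxDepth remap depth can be illustrated with the viewport depth transform: a normalized depth value z in the [0, 1] range is mapped to minDepth + z * (maxDepth - minDepth). Here is a small sketch of that arithmetic (plain C++, illustrative only, not part of the tutorial’s code):

```cpp
// Maps a normalized depth value (0.0 .. 1.0) through the viewport's
// depth range, following the viewport depth transform:
// result = minDepth + z * (maxDepth - minDepth)
float ViewportDepth( float z, float minDepth, float maxDepth ) {
  return minDepth + z * ( maxDepth - minDepth );
}
```

With the usual range (0.0, 1.0) depth passes through unchanged; with a “reversed” range such as (1.0, 0.0), the near end maps to 1.0 and the far end to 0.0.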

Next we must specify the parameters for the scissor test. The scissor test, similarly to OpenGL, restricts generation of
fragments only to the specified rectangular area. But in Vulkan, the scissor test is always enabled and can’t be turned off.
We can just provide the values identical to the ones provided for viewport. Try changing these values and see how it
influences the generated image.

The scissor test doesn’t have a dedicated structure. To provide data for it we fill the VkRect2D structure which contains
two similar structure members. First is VkOffset2D with the following members:

 x – Left side of the rectangular area used for the scissor test
 y – Upper side of the scissor area

The second member is of type VkExtent2D, which contains the following fields:

 width – Width of the scissor rectangular area
 height – Height of the scissor area

In general, the meaning of the data we provide for the scissor test through the VkRect2D structure is similar to the
data prepared for viewport.
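The effect of the scissor rectangle can be modeled as a simple containment test: a fragment at integer coordinates (x, y) survives only if it lies inside the rectangle defined by the offset and extent. The following sketch uses minimal stand-in structs (defined here only so the example compiles without the Vulkan headers):

```cpp
#include <cstdint>

// Minimal stand-ins for VkOffset2D / VkExtent2D, defined locally
// so this sketch does not require the Vulkan headers.
struct Offset2D { int32_t x, y; };
struct Extent2D { uint32_t width, height; };

// A fragment passes the scissor test when it lies inside the rectangle
// [offset.x, offset.x + extent.width) x [offset.y, offset.y + extent.height).
bool PassesScissorTest( int32_t x, int32_t y, Offset2D offset, Extent2D extent ) {
  return x >= offset.x && x < offset.x + static_cast<int32_t>( extent.width ) &&
         y >= offset.y && y < offset.y + static_cast<int32_t>( extent.height );
}
```

With the values used in this tutorial (offset 0,0 and extent 300x300), fragments outside the 300x300 square are discarded, which is why shrinking the scissor rectangle clips the triangle.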

After we have finished preparing data for viewport and the scissor test, we can finally fill the structure that is used
during pipeline creation. The structure is called VkPipelineViewportStateCreateInfo, and it contains the following fields:

 sType – Type of the structure, VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO here.
 pNext – Pointer reserved for extensions.
 flags – Parameter reserved for future use.
 viewportCount – Number of elements in the pViewports array.
 pViewports – Array with elements describing parameters of viewports used when the given pipeline is bound.
 scissorCount – Number of elements in the pScissors array.
 pScissors – Array with elements describing parameters of the scissor test for each viewport.

Remember that the viewportCount and scissorCount parameters must be equal. We are also allowed to specify more
viewports, but then the multiViewport feature must also be enabled.
Preparing the Rasterization State’s Description
The next part of the graphics pipeline creation applies to the rasterization state. We must specify how polygons are
going to be rasterized (changed into fragments), which means whether we want fragments to be generated for whole
polygons or just their edges (polygon mode) or whether we want to see the front or back side or maybe both sides of the
polygon (face culling). We can also provide depth bias parameters or indicate whether we want to enable depth clamp.
This whole state is encapsulated into VkPipelineRasterizationStateCreateInfo. It contains the following members:

 sType – Structure type, VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO in this example.
 pNext – Pointer reserved for extensions.
 flags – Parameter reserved for future use.
 depthClampEnable – Parameter describing whether we want to clamp depth values of the rasterized primitive
to the frustum (when true) or if we want normal clipping to occur (false).
 rasterizerDiscardEnable – Deactivates fragment generation (discards primitives before rasterization, which
also turns off the fragment shader).
 polygonMode – Controls how the fragments are generated for a given primitive (triangle mode): whether they
are generated for the whole triangle, only its edges, or just its vertices.
 cullMode – Chooses the triangle’s face used for culling (if enabled).
 frontFace – Chooses which side of a triangle should be considered the front (depending on the winding order).
 depthBiasEnable – Enables or disables biasing of fragments’ depth values.
 depthBiasConstantFactor – Constant factor added to each fragment’s depth value when biasing is enabled.
 depthBiasClamp – Maximum (or minimum) value of bias that can be applied to a fragment’s depth.
 depthBiasSlopeFactor – Factor applied to a fragment’s slope during depth calculations when biasing is enabled.
 lineWidth – Width of rasterized lines.

Here is the source code responsible for setting rasterization state in our example:
VkPipelineRasterizationStateCreateInfo rasterization_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO, // VkStructureType                                sType
  nullptr,                                                    // const void                                    *pNext
  0,                                                          // VkPipelineRasterizationStateCreateFlags        flags
  VK_FALSE,                                                   // VkBool32                                       depthClampEnable
  VK_FALSE,                                                   // VkBool32                                       rasterizerDiscardEnable
  VK_POLYGON_MODE_FILL,                                       // VkPolygonMode                                  polygonMode
  VK_CULL_MODE_BACK_BIT,                                      // VkCullModeFlags                                cullMode
  VK_FRONT_FACE_COUNTER_CLOCKWISE,                            // VkFrontFace                                    frontFace
  VK_FALSE,                                                   // VkBool32                                       depthBiasEnable
  0.0f,                                                       // float                                          depthBiasConstantFactor
  0.0f,                                                       // float                                          depthBiasClamp
  0.0f,                                                       // float                                          depthBiasSlopeFactor
  1.0f                                                        // float                                          lineWidth
};
14. Tutorial03.cpp, function CreatePipeline()
In the tutorial we disable as many parameters as possible to simplify the process, the code itself, and the
rendering operations. The parameters that matter here set up the (typical) fill mode for polygon rasterization, back-face
culling, and counterclockwise front faces, similar to OpenGL’s defaults. Depth biasing and clamping are also disabled (to
enable depth clamping, we first need to enable a dedicated feature during logical device creation; the same applies to
polygon modes other than “fill”).

Setting the Multisampling State’s Description


In Vulkan, when we are creating a graphics pipeline, we must also specify the state relevant to multisampling. This is
done using the VkPipelineMultisampleStateCreateInfo structure. Here are its members:

 sType – Type of structure, VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO here.
 pNext – Pointer reserved for extensions.
 flags – Parameter reserved for future use.
 rasterizationSamples – Number of per pixel samples used in rasterization.
 sampleShadingEnable – Parameter specifying that shading should occur per sample (when enabled) instead
of per fragment (when disabled).
 minSampleShading – Specifies the minimum number of unique sample locations that should be used during
the given fragment’s shading.
 pSampleMask – Pointer to an array of static coverage sample masks; this can be null.
 alphaToCoverageEnable – Controls whether the fragment’s alpha value should be used for coverage
calculations.
 alphaToOneEnable – Controls whether the fragment’s alpha value should be replaced with one.

In this example, I wanted to minimize possible problems so I’ve set parameters to values that generally disable
multisampling—just one sample per given pixel with the other parameters turned off. Remember that if we want to enable
sample shading or alpha to one, we also need to enable two respective features. Here is a source code that prepares the
VkPipelineMultisampleStateCreateInfo structure:
VkPipelineMultisampleStateCreateInfo multisample_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO,   // VkStructureType                                sType
  nullptr,                                                    // const void                                    *pNext
  0,                                                          // VkPipelineMultisampleStateCreateFlags          flags
  VK_SAMPLE_COUNT_1_BIT,                                      // VkSampleCountFlagBits                          rasterizationSamples
  VK_FALSE,                                                   // VkBool32                                       sampleShadingEnable
  1.0f,                                                       // float                                          minSampleShading
  nullptr,                                                    // const VkSampleMask                            *pSampleMask
  VK_FALSE,                                                   // VkBool32                                       alphaToCoverageEnable
  VK_FALSE                                                    // VkBool32                                       alphaToOneEnable
};
15. Tutorial03.cpp, function CreatePipeline()

Setting the Blending State’s Description


Another thing we need to prepare when creating a graphics pipeline is a blending state (which also includes logical
operations).
VkPipelineColorBlendAttachmentState color_blend_attachment_state = {
  VK_FALSE,                                                   // VkBool32                                       blendEnable
  VK_BLEND_FACTOR_ONE,                                        // VkBlendFactor                                  srcColorBlendFactor
  VK_BLEND_FACTOR_ZERO,                                       // VkBlendFactor                                  dstColorBlendFactor
  VK_BLEND_OP_ADD,                                            // VkBlendOp                                      colorBlendOp
  VK_BLEND_FACTOR_ONE,                                        // VkBlendFactor                                  srcAlphaBlendFactor
  VK_BLEND_FACTOR_ZERO,                                       // VkBlendFactor                                  dstAlphaBlendFactor
  VK_BLEND_OP_ADD,                                            // VkBlendOp                                      alphaBlendOp
  VK_COLOR_COMPONENT_R_BIT | VK_COLOR_COMPONENT_G_BIT |       // VkColorComponentFlags                          colorWriteMask
  VK_COLOR_COMPONENT_B_BIT | VK_COLOR_COMPONENT_A_BIT
};

VkPipelineColorBlendStateCreateInfo color_blend_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_COLOR_BLEND_STATE_CREATE_INFO,   // VkStructureType                                sType
  nullptr,                                                    // const void                                    *pNext
  0,                                                          // VkPipelineColorBlendStateCreateFlags           flags
  VK_FALSE,                                                   // VkBool32                                       logicOpEnable
  VK_LOGIC_OP_COPY,                                           // VkLogicOp                                      logicOp
  1,                                                          // uint32_t                                       attachmentCount
  &color_blend_attachment_state,                              // const VkPipelineColorBlendAttachmentState     *pAttachments
  { 0.0f, 0.0f, 0.0f, 0.0f }                                  // float                                          blendConstants[4]
};
16. Tutorial03.cpp, function CreatePipeline()

Final color operations are set up through the VkPipelineColorBlendStateCreateInfo structure. It contains the following
fields:

 sType – Type of the structure, set to VK_STRUCTURE_TYPE_PIPELINE_COLOR_BLEND_STATE_CREATE_INFO in this example.
 pNext – Pointer reserved for future, extension-specific use.
 flags – Parameter also reserved for future use.
 logicOpEnable – Indicates whether we want to enable logical operations on pixels.
 logicOp – Type of the logical operation we want to perform (like copy, clear, and so on)
 attachmentCount – Number of elements in the pAttachments array.
 pAttachments – Array containing state parameters for each color attachment used in a subpass for which the
given graphics pipeline is bound.
 blendConstants – Four-element array with color value used in blending operation (when a dedicated blend
factor is used).

More information is needed for the attachmentCount and pAttachments parameters. When we want to perform
drawing operations, we set up parameters, the most important of which are the graphics pipeline, render pass, and
framebuffer. The graphics card needs to know how to draw (the graphics pipeline, which describes rendering state, shaders,
tests, and so on) and where to draw (the render pass gives the general setup; the framebuffer specifies exactly what images
are used). As I have already mentioned, the render pass specifies how operations are ordered, what the dependencies
are, when we are rendering into a given attachment, and when we are reading from the same attachment. These stages
take the form of subpasses. And for each drawing operation we can (but don’t have to) use a different pipeline.
But when we are drawing, we must remember that we are drawing into a set of attachments. This set is defined in a
render pass, which describes all color, input, and depth attachments (the framebuffer just specifies what images are used
for each of them). For the blending state, we can specify whether we want to enable blending at all. This is done through
the pAttachments array. Each of its elements must correspond to a color attachment defined in the render pass, so
attachmentCount (the number of elements in the pAttachments array) must equal the number of color attachments
defined in the render pass.

There is one more restriction. By default, all elements in the pAttachments array must be specified identically:
blending (and color masks) is done in the same way for all attachments. So why is it an array? Why can’t we just
specify one value? Because there is a feature that allows us to perform independent, distinct blending for each active
color attachment. When we enable the independent blending feature during device creation, we can provide different
values for each color attachment.

Each pAttachments array’s element is of type VkPipelineColorBlendAttachmentState. It is a structure with the following members:

 blendEnable – Indicates whether we want to enable blending at all.
 srcColorBlendFactor – Blending factor for the color of the source (incoming) fragment.
 dstColorBlendFactor – Blending factor for the destination color (stored already in the framebuffer at the same
location as the incoming fragment).
 colorBlendOp – Type of operation to perform (multiplication, addition, and so on).
 srcAlphaBlendFactor – Blending factor for the alpha value of the source (incoming) fragment.
 dstAlphaBlendFactor – Blending factor for the destination alpha value (already stored in the framebuffer).
 alphaBlendOp – Type of operation to perform for alpha blending.
 colorWriteMask – Bitmask selecting which of the RGBA components are selected (enabled) for writing.

In this example, we disable blending, which makes most of the other parameters irrelevant. The exception is
colorWriteMask: we select all components for writing, but you can freely check what happens when this parameter is
changed to some other combination of R, G, B, and A.
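The arithmetic that the blend factors and blend op configure can be shown for a single color channel. With the values above (ONE, ZERO, ADD), the equation result = src * 1 + dst * 0 simply passes the incoming color through, which matches blending being disabled. A sketch of the per-channel math (illustrative only, not the driver’s actual implementation):

```cpp
// Per-channel blend with the ADD operation:
// result = src * srcFactor + dst * dstFactor
float BlendChannelAdd( float src, float srcFactor, float dst, float dstFactor ) {
  return src * srcFactor + dst * dstFactor;
}
```

For example, classic alpha blending of a half-transparent source over a destination would use srcFactor = alpha and dstFactor = 1 - alpha for the color channels.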

Creating a Pipeline Layout


The final thing we must do before pipeline creation is create a proper pipeline layout. A pipeline layout describes all
the resources that can be accessed by the pipeline. In this example we must specify how many textures can be used by
shaders and which shader stages will have access to them. There are of course other resources involved. Apart from shader
stages, we must also describe the types of resources (textures, buffers), their total numbers, and layout. This layout can
be compared to OpenGL’s active textures and shader uniforms. In OpenGL we bind textures to the desired texture image
units and for shader uniforms we don’t provide texture handles but IDs of the texture image units to which actual textures
are bound (we provide the number of the unit which the given texture was associated with).

With Vulkan, the situation is similar. We create some form of a memory layout: first there are two buffers, next we
have three textures and an image. This memory “structure” is called a set and a collection of these sets is provided for the
pipeline. In shaders, we access specified resources using specific memory “locations” from within these sets (layouts). This
is done through a layout (set = X, binding = Y) specifier, which can be translated to: take the resource from the Y memory
location from the X set.

A pipeline layout can be thought of as an interface between shader stages and shader resources, as it takes these
groups of resources, describes how they are gathered, and provides them to the pipeline.
This process is complex, and I plan to devote a separate tutorial to it. Here we are not using any additional resources, so I
present an example of creating an “empty” pipeline layout:
VkPipelineLayoutCreateInfo layout_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,              // VkStructureType                                sType
  nullptr,                                                    // const void                                    *pNext
  0,                                                          // VkPipelineLayoutCreateFlags                    flags
  0,                                                          // uint32_t                                       setLayoutCount
  nullptr,                                                    // const VkDescriptorSetLayout                   *pSetLayouts
  0,                                                          // uint32_t                                       pushConstantRangeCount
  nullptr                                                     // const VkPushConstantRange                     *pPushConstantRanges
};

VkPipelineLayout pipeline_layout;
if( vkCreatePipelineLayout( GetDevice(), &layout_create_info, nullptr,
    &pipeline_layout ) != VK_SUCCESS ) {
  printf( "Could not create pipeline layout!\n" );
  return Tools::AutoDeleter<VkPipelineLayout, PFN_vkDestroyPipelineLayout>();
}

return Tools::AutoDeleter<VkPipelineLayout, PFN_vkDestroyPipelineLayout>(
  pipeline_layout, vkDestroyPipelineLayout, GetDevice() );
17. Tutorial03.cpp, function CreatePipelineLayout()

To create a pipeline layout we must first prepare a variable of type VkPipelineLayoutCreateInfo. It contains the
following fields:

 sType – Type of structure, VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO in this example.
 pNext – Parameter reserved for extensions.
 flags – Parameter reserved for future use.
 setLayoutCount – Number of descriptor sets included in this layout.
 pSetLayouts – Pointer to an array containing descriptions of descriptor layouts.
 pushConstantRangeCount – Number of push constant ranges (I will describe it in a later tutorial).
 pPushConstantRanges – Array describing all push constant ranges used inside shaders (in a given pipeline).

In this example we create an “empty” layout, so almost all the fields are set to null or zero.

We are not using push constants here, but they deserve some explanation. Push constants in Vulkan allow us to modify
the data of constant variables used in shaders. There is a special, small amount of memory reserved for push constants.
We update their values through Vulkan commands, not through memory updates, and it is expected that updates of push
constants’ values are faster than normal memory writes.

As shown in the above example, I’m also wrapping the pipeline layout in an “AutoDeleter” object. Pipeline layouts are
required during pipeline creation, descriptor set binding (enabling/activating this interface between shaders and shader
resources), and push constant setting. None of these operations, except for pipeline creation, takes place in this tutorial.
So here, after we create a pipeline, we don’t need the layout anymore. To avoid memory leaks, I have used this helper
class to destroy the layout as soon as we leave the function in which the graphics pipeline is created.
Creating a Graphics Pipeline
Now we have all the resources required to properly create graphics pipeline. Here is the code that does that:
Tools::AutoDeleter<VkPipelineLayout, PFN_vkDestroyPipelineLayout> pipeline_layout = CreatePipelineLayout();
if( !pipeline_layout ) {
  return false;
}

VkGraphicsPipelineCreateInfo pipeline_create_info = {
  VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO,            // VkStructureType                                sType
  nullptr,                                                    // const void                                    *pNext
  0,                                                          // VkPipelineCreateFlags                          flags
  static_cast<uint32_t>(shader_stage_create_infos.size()),    // uint32_t                                       stageCount
  &shader_stage_create_infos[0],                              // const VkPipelineShaderStageCreateInfo         *pStages
  &vertex_input_state_create_info,                            // const VkPipelineVertexInputStateCreateInfo    *pVertexInputState
  &input_assembly_state_create_info,                          // const VkPipelineInputAssemblyStateCreateInfo  *pInputAssemblyState
  nullptr,                                                    // const VkPipelineTessellationStateCreateInfo   *pTessellationState
  &viewport_state_create_info,                                // const VkPipelineViewportStateCreateInfo       *pViewportState
  &rasterization_state_create_info,                           // const VkPipelineRasterizationStateCreateInfo  *pRasterizationState
  &multisample_state_create_info,                             // const VkPipelineMultisampleStateCreateInfo    *pMultisampleState
  nullptr,                                                    // const VkPipelineDepthStencilStateCreateInfo   *pDepthStencilState
  &color_blend_state_create_info,                             // const VkPipelineColorBlendStateCreateInfo     *pColorBlendState
  nullptr,                                                    // const VkPipelineDynamicStateCreateInfo        *pDynamicState
  pipeline_layout.Get(),                                      // VkPipelineLayout                               layout
  Vulkan.RenderPass,                                          // VkRenderPass                                   renderPass
  0,                                                          // uint32_t                                       subpass
  VK_NULL_HANDLE,                                             // VkPipeline                                     basePipelineHandle
  -1                                                          // int32_t                                        basePipelineIndex
};

if( vkCreateGraphicsPipelines( GetDevice(), VK_NULL_HANDLE, 1, &pipeline_create_info,
    nullptr, &Vulkan.GraphicsPipeline ) != VK_SUCCESS ) {
  printf( "Could not create graphics pipeline!\n" );
  return false;
}
return true;
18. Tutorial03.cpp, function CreatePipeline()

First we create a pipeline layout wrapped in an object of type “AutoDeleter”. Next we fill the structure of type
VkGraphicsPipelineCreateInfo. It contains many fields. Here is a brief description of them:
 sType – Type of structure, VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO here.
 pNext – Parameter reserved for future, extension-related use.
 flags – This time this parameter is not reserved for future use but controls how the pipeline should be created:
if we are creating a derivative pipeline (if we are inheriting from another pipeline) or if we allow creating
derivative pipelines from this one. We can also disable optimizations, which should shorten the time needed
to create a pipeline.
 stageCount – Number of stages described in the pStages parameter; must be greater than zero.
 pStages – Array with descriptions of active shader stages (the ones created using shader modules); each stage
must be unique (we can’t specify a given stage more than once). There also must be a vertex stage present.
 pVertexInputState – Pointer to a variable containing the description of the vertex input’s state.
 pInputAssemblyState – Pointer to a variable with input assembly description.
 pTessellationState – Pointer to a description of the tessellation stages; can be null if tessellation is disabled.
 pViewportState – Pointer to a variable specifying viewport parameters; can be null if rasterization is disabled.
 pRasterizationState – Pointer to a variable specifying rasterization behavior.
 pMultisampleState – Pointer to a variable defining multisampling; can be null if rasterization is disabled.
 pDepthStencilState – Pointer to a description of depth/stencil parameters; this can be null in two situations:
when rasterization is disabled or we’re not using depth/stencil attachments in a render pass.
 pColorBlendState – Pointer to a variable with color blending/write masks state; can be null also in two
situations: when rasterization is disabled or when we’re not using any color attachments inside the render
pass.
 pDynamicState – Pointer to a variable specifying which parts of the graphics pipeline can be set dynamically;
can be null if the whole state is considered static (defined only through this create info structure).
 layout – Handle to a pipeline layout object that describes resources accessed inside shaders.
 renderPass – Handle to a render pass object; pipeline can be used with any render pass compatible with the
provided one.
 subpass – Number (index) of a subpass in which the pipeline will be used.
 basePipelineHandle – Handle to a pipeline this one should derive from.
 basePipelineIndex – Index of a pipeline this one should derive from.

When we are creating a new pipeline, we can inherit some of the parameters from another one. This means that both
pipelines should have much in common. A good example is shader code. We don’t specify which fields are the same, but
merely indicating that the pipeline inherits from another one may substantially accelerate pipeline creation. But why
are there two fields to indicate a “parent” pipeline? We can’t use both, only one of them at a time. When we use
a handle, the “parent” pipeline is already created and we derive from the one whose handle we have provided. But the
pipeline creation function allows us to create many pipelines at once. Using the second parameter, the “parent” pipeline
index, we can create both “parent” and “child” pipelines in the same call. We just specify an array of graphics pipeline
creation info structures, and this array is provided to the pipeline creation function. So the “basePipelineIndex” is the
index of the pipeline creation info in this very array. We just have to remember that the “parent” pipeline must appear
earlier (must have a smaller index) in this array and must be created with the “allow derivatives” flag set.
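The rules above can be captured in a small validity check. The following helper is illustrative only (it is not part of the Vulkan API or the tutorial’s code): a “child” entry may only point at an earlier entry in the same array, and that entry must allow derivatives.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified view of one create-info entry in a batched
// vkCreateGraphicsPipelines() call: its basePipelineIndex (-1 means
// "no parent") and whether it allows derivative pipelines.
struct PipelineEntry {
  int32_t basePipelineIndex;
  bool    allowsDerivatives;
};

// Checks the "parent must come earlier and must allow derivatives"
// rule for every entry of the batch.
bool ValidDerivativeIndices( const std::vector<PipelineEntry> &entries ) {
  for( size_t i = 0; i < entries.size(); ++i ) {
    int32_t parent = entries[i].basePipelineIndex;
    if( parent < 0 ) {
      continue;                       // no parent, nothing to check
    }
    if( static_cast<size_t>( parent ) >= i ) {
      return false;                   // parent must appear earlier in the array
    }
    if( !entries[static_cast<size_t>( parent )].allowsDerivatives ) {
      return false;                   // parent must allow derivatives
    }
  }
  return true;
}
```

A batch of { parent-with-allow-derivatives, child-pointing-at-index-0 } passes this check; a child pointing forward, or at a parent created without the flag, does not.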

In this example we are creating a pipeline with the state being entirely static (null for the “pDynamicState” parameter).
But what is a dynamic state? To allow for some flexibility and to lower the number of created pipeline objects, the dynamic
state was introduced. We can define through the “pDynamicState” parameter what parts of the graphics pipeline can be
set dynamically through additional Vulkan commands and what parts are being static, set once during pipeline creation.
The dynamic state includes parameters such as viewports, line widths, blend constants, or some stencil parameters. If we
specify that a given state is dynamic, parameters in a pipeline creation info structure that are related to that state are
ignored. We must set the given state using the proper Vulkan commands during rendering because initial values of such
state may be undefined.
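As a hedged sketch (this code is not part of the tutorial), marking the viewport and scissor as dynamic might look like this; the pViewports and pScissors members of the viewport state create info would then be ignored, and the values would have to be supplied while recording the command buffer:

```cpp
VkDynamicState dynamic_states[] = {
  VK_DYNAMIC_STATE_VIEWPORT,
  VK_DYNAMIC_STATE_SCISSOR
};

VkPipelineDynamicStateCreateInfo dynamic_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO, // VkStructureType                     sType
  nullptr,                                              // const void                         *pNext
  0,                                                    // VkPipelineDynamicStateCreateFlags   flags
  2,                                                    // uint32_t                            dynamicStateCount
  dynamic_states                                        // const VkDynamicState               *pDynamicStates
};

// Later, while recording the command buffer, the dynamic state must be set explicitly:
// vkCmdSetViewport( command_buffer, 0, 1, &viewport );
// vkCmdSetScissor( command_buffer, 0, 1, &scissor );
```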
So after these quite overwhelming preparations we can create a graphics pipeline. This is done by calling the
vkCreateGraphicsPipelines() function which, among others, takes an array of pipeline create info structures. When
everything goes well, VK_SUCCESS should be returned by this function and a handle of a graphics pipeline should be
stored in the variable we've provided the address of. Now we are ready to start drawing.

Preparing Drawing Commands


I introduced you to the concept of command buffers in the previous tutorial. Here I will briefly explain what they are
and how to use them.

Command buffers are containers for GPU commands. If we want to execute some job on a device, we do it through
command buffers. This means that we must prepare a set of commands that process data (that is, draw something on the
screen) and record these commands in command buffers. Then we can submit whole buffers to device’s queues. This
submit operation tells the device: here is a bunch of things I want you to do for me and do them now.

To record commands, we must first allocate command buffers. These are allocated from command pools, which can
be thought of as memory chunks. If a command buffer needs more memory (because we record many complicated
commands in it), it can grow and use additional memory from the pool it was allocated from. So first we must create a
command pool.

Creating a Command Pool


Command pool creation is simple and looks like this:
VkCommandPoolCreateInfo cmd_pool_create_info = {
  VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,     // VkStructureType                sType
  nullptr,                                        // const void                    *pNext
  0,                                              // VkCommandPoolCreateFlags       flags
  queue_family_index                              // uint32_t                       queueFamilyIndex
};

if( vkCreateCommandPool( GetDevice(), &cmd_pool_create_info, nullptr, pool ) != VK_SUCCESS ) {
  return false;
}
return true;

19. Tutorial03.cpp, function CreateCommandPool()

First we prepare a variable of type VkCommandPoolCreateInfo. It contains the following fields:

 sType – Standard type of structure, set to VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO here.


 pNext – Pointer reserved for extensions.
 flags – Indicates usage scenarios for command pool and command buffers allocated from it; that is, we can
tell the driver that command buffers allocated from this pool will live for a short time; for no specific usage
we can set it to zero.
 queueFamilyIndex – Index of a queue family for which we are creating a command pool.

Remember that command buffers allocated from a given pool can only be submitted to a queue from a queue family
specified during pool creation.

To create a command pool, we just call the vkCreateCommandPool() function.

Allocating Command Buffers


Now that we have the command pool ready, we can allocate command buffers from it.
VkCommandBufferAllocateInfo command_buffer_allocate_info = {
  VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO, // VkStructureType                sType
  nullptr,                                        // const void                    *pNext
  pool,                                           // VkCommandPool                  commandPool
  VK_COMMAND_BUFFER_LEVEL_PRIMARY,                // VkCommandBufferLevel           level
  count                                           // uint32_t                       bufferCount
};

if( vkAllocateCommandBuffers( GetDevice(), &command_buffer_allocate_info, command_buffers ) != VK_SUCCESS ) {
  return false;
}
return true;

20. Tutorial03.cpp, function AllocateCommandBuffers()

To allocate command buffers, we fill a variable of a structure type. This time its type is
VkCommandBufferAllocateInfo, which contains these members:

 sType – Type of the structure; VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO for this purpose.


 pNext – Pointer reserved for extensions.
 commandPool – Pool from which we want our command buffers to take their memory.
 level – Command buffer level; there are two levels: primary and secondary; right now we are only interested
in primary command buffers.
 bufferCount – Number of command buffers we want to allocate.

To allocate command buffers, call the vkAllocateCommandBuffers() function and check whether it succeeded. We
can allocate many buffers at once with one function call.

I’ve prepared a simple buffer allocating function to show you how some Vulkan functions can be wrapped for easier
use. Here is a usage of two such wrapper functions that create command pools and allocate command buffers from them.
if( !CreateCommandPool( GetGraphicsQueue().FamilyIndex, &Vulkan.GraphicsCommandPool ) ) {
  printf( "Could not create command pool!\n" );
  return false;
}

uint32_t image_count = static_cast<uint32_t>(GetSwapChain().Images.size());
Vulkan.GraphicsCommandBuffers.resize( image_count, VK_NULL_HANDLE );

if( !AllocateCommandBuffers( Vulkan.GraphicsCommandPool, image_count,
    &Vulkan.GraphicsCommandBuffers[0] ) ) {
  printf( "Could not allocate command buffers!\n" );
  return false;
}
return true;

21. Tutorial03.cpp, function CreateCommandBuffers()

As you can see, we are creating a command pool for the graphics queue family index. All image state transitions and
drawing operations will be performed on a graphics queue. Presentation is done on another queue (if the presentation
queue is different from the graphics queue), but we don't need a command buffer for this operation.
We are also allocating a command buffer for each swap chain image. Here we take the number of images and provide
it to a simple "wrapper" function for command buffer allocation.

Recording Command Buffers


Now that we have command buffers allocated from the command pool, we can finally record operations that will draw
something on the screen. First we must prepare a set of data needed for the recording operation. Some of this data is
identical for all command buffers, but some references a specific swap chain image. Here is the code that is independent
of swap chain images:
VkCommandBufferBeginInfo graphics_commandd_buffer_begin_info = {
  VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,    // VkStructureType                        sType
  nullptr,                                        // const void                            *pNext
  VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT,   // VkCommandBufferUsageFlags              flags
  nullptr                                         // const VkCommandBufferInheritanceInfo  *pInheritanceInfo
};

VkImageSubresourceRange image_subresource_range = {
  VK_IMAGE_ASPECT_COLOR_BIT,                      // VkImageAspectFlags             aspectMask
  0,                                              // uint32_t                       baseMipLevel
  1,                                              // uint32_t                       levelCount
  0,                                              // uint32_t                       baseArrayLayer
  1                                               // uint32_t                       layerCount
};

VkClearValue clear_value = {
  { 1.0f, 0.8f, 0.4f, 0.0f },                     // VkClearColorValue              color
};

const std::vector<VkImage>& swap_chain_images = GetSwapChain().Images;

22. Tutorial03.cpp, function RecordCommandBuffers()

Performing command buffer recording is similar to OpenGL's display lists, where we start recording a list by calling
the glNewList() function, then prepare a set of drawing commands, and finally close the list or stop recording it
(glEndList()). So the first thing we need to do is to prepare a variable of type VkCommandBufferBeginInfo. It is used when
we start recording a command buffer, and it tells the driver about the type, contents, and desired usage of a command
buffer. Variables of this type contain the following members:

 sType – Standard structure type, here set to VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO.


 pNext – Pointer reserved for extensions.
 flags – Parameters describing the desired usage (that is, whether we want to submit this command buffer only
once and then destroy/reset it, or whether the buffer may be submitted again before the processing of its
previous submission has finished).
 pInheritanceInfo – Parameter used only when we want to record a secondary command buffer.

Next we describe the areas or parts of our images that we will set up image memory barriers for. Here we set up
barriers to specify that queues from different families will reference a given image. This is done through a variable of type
VkImageSubresourceRange with the following members:
 aspectMask – Describes the "type" of an image, whether it is used for color, depth, or stencil data.
 baseMipLevel – First mipmap level our operations will be performed on.
 levelCount – Number of mipmap levels (including the base level) we will be operating on.
 baseArrayLayer – First array layer of an image that will take part in the operations.
 layerCount – Number of layers (including the base layer) that will be modified.

Next we set up a clear value for our images. Before drawing, we need to clear the images. In previous tutorials, we
performed this operation explicitly by ourselves. Here the images are cleared as a part of a render pass attachment load
operation. We set this load operation to "clear", so now we must specify the color to which an image should be cleared.
This is done using a variable of type VkClearValue in which we provide R, G, B, A values.

Variables we have created thus far are independent of an image itself, and that’s why we have specified them before
a loop. Now we can start recording command buffers:
for( size_t i = 0; i < Vulkan.GraphicsCommandBuffers.size(); ++i ) {
  vkBeginCommandBuffer( Vulkan.GraphicsCommandBuffers[i], &graphics_commandd_buffer_begin_info );

  if( GetPresentQueue().Handle != GetGraphicsQueue().Handle ) {
    VkImageMemoryBarrier barrier_from_present_to_draw = {
      VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,     // VkStructureType                sType
      nullptr,                                    // const void                    *pNext
      VK_ACCESS_MEMORY_READ_BIT,                  // VkAccessFlags                  srcAccessMask
      VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,       // VkAccessFlags                  dstAccessMask
      VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,            // VkImageLayout                  oldLayout
      VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,            // VkImageLayout                  newLayout
      GetPresentQueue().FamilyIndex,              // uint32_t                       srcQueueFamilyIndex
      GetGraphicsQueue().FamilyIndex,             // uint32_t                       dstQueueFamilyIndex
      swap_chain_images[i],                       // VkImage                        image
      image_subresource_range                     // VkImageSubresourceRange        subresourceRange
    };
    vkCmdPipelineBarrier( Vulkan.GraphicsCommandBuffers[i],
      VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
      VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
      0, 0, nullptr, 0, nullptr, 1, &barrier_from_present_to_draw );
  }

  VkRenderPassBeginInfo render_pass_begin_info = {
    VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,     // VkStructureType                sType
    nullptr,                                      // const void                    *pNext
    Vulkan.RenderPass,                            // VkRenderPass                   renderPass
    Vulkan.FramebufferObjects[i].Handle,          // VkFramebuffer                  framebuffer
    {                                             // VkRect2D                       renderArea
      {                                           // VkOffset2D                     offset
        0,                                        // int32_t                        x
        0                                         // int32_t                        y
      },
      {                                           // VkExtent2D                     extent
        300,                                      // uint32_t                       width
        300                                       // uint32_t                       height
      }
    },
    1,                                            // uint32_t                       clearValueCount
    &clear_value                                  // const VkClearValue            *pClearValues
  };

  vkCmdBeginRenderPass( Vulkan.GraphicsCommandBuffers[i], &render_pass_begin_info,
    VK_SUBPASS_CONTENTS_INLINE );

  vkCmdBindPipeline( Vulkan.GraphicsCommandBuffers[i],
    VK_PIPELINE_BIND_POINT_GRAPHICS, Vulkan.GraphicsPipeline );

  vkCmdDraw( Vulkan.GraphicsCommandBuffers[i], 3, 1, 0, 0 );

  vkCmdEndRenderPass( Vulkan.GraphicsCommandBuffers[i] );

  if( GetGraphicsQueue().Handle != GetPresentQueue().Handle ) {
    VkImageMemoryBarrier barrier_from_draw_to_present = {
      VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,     // VkStructureType                sType
      nullptr,                                    // const void                    *pNext
      VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,       // VkAccessFlags                  srcAccessMask
      VK_ACCESS_MEMORY_READ_BIT,                  // VkAccessFlags                  dstAccessMask
      VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,            // VkImageLayout                  oldLayout
      VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,            // VkImageLayout                  newLayout
      GetGraphicsQueue().FamilyIndex,             // uint32_t                       srcQueueFamilyIndex
      GetPresentQueue().FamilyIndex,              // uint32_t                       dstQueueFamilyIndex
      swap_chain_images[i],                       // VkImage                        image
      image_subresource_range                     // VkImageSubresourceRange        subresourceRange
    };
    vkCmdPipelineBarrier( Vulkan.GraphicsCommandBuffers[i],
      VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
      VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
      0, 0, nullptr, 0, nullptr, 1, &barrier_from_draw_to_present );
  }

  if( vkEndCommandBuffer( Vulkan.GraphicsCommandBuffers[i] ) != VK_SUCCESS ) {
    printf( "Could not record command buffer!\n" );
    return false;
  }
}
return true;

23. Tutorial03.cpp, function RecordCommandBuffers()

Recording a command buffer is started by calling the vkBeginCommandBuffer() function. At the beginning we set up
a barrier that tells the driver that previously queues from one family referenced the given image, but from now on queues
from a different family will be referencing it (we need to do this because during swap chain creation we specified the
exclusive sharing mode). The barrier is set only when the graphics queue is different from the present queue. This is done
by calling the vkCmdPipelineBarrier() function. We must specify when in the pipeline the barrier should be placed
(VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT) and how the barrier should be set up. Barrier parameters are
prepared through the VkImageMemoryBarrier structure:

 sType – Type of the structure, here set to VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER.


 pNext – Pointer reserved for extensions.
 srcAccessMask – Types of memory operations that took place in regard to a given image before the barrier.
 dstAccessMask – Types of memory operations connected with a given image that will take place after the
barrier.
 oldLayout – Current image memory layout.
 newLayout – Memory layout the image should have after the barrier.
 srcQueueFamilyIndex – Index of the queue family whose queues were referencing the image before the barrier.
 dstQueueFamilyIndex – Index of the queue family whose queues will be referencing the image after the barrier.
 image – Handle to the image itself.
 subresourceRange – Parts of an image for which we want the transition to occur.

In this example we don’t change the layout of an image, for two reasons: (1) The barrier may not be set at all (if the
graphics and present queues are the same), and (2) the layout transition will be performed automatically as a render pass
operation (at the beginning of the first—and only—subpass).

Next we start a render pass. We call the vkCmdBeginRenderPass() function for which we must provide a pointer to a
variable of VkRenderPassBeginInfo type. It contains the following members:

 sType – Standard type of structure. In this case we must set it to a value of


VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO.
 pNext – Pointer reserved for future use.
 renderPass – Handle of a render pass we want to start.
 framebuffer – Handle of a framebuffer, which specifies images used as attachments for this render pass.
 renderArea – Area of all images that will be affected by the operations that take place in this render pass. It
specifies the upper-left corner (through the x and y parameters of the offset member) and the width and height
(through the extent member) of the render area.
 clearValueCount – Number of elements in pClearValues array.
 pClearValues – Array with clear values for each attachment.

When we specify a render area for the render pass, we must make sure that the rendering operations won't modify
pixels outside this area. This is just a hint for the driver so it can optimize its behavior. If we don't confine operations to
the provided area by using a proper scissor test, pixels outside this area may become undefined (we can't rely on their
contents). We also can't specify a render area that is greater than the framebuffer's dimensions (that falls outside the
framebuffer).

The pClearValues array must contain an element for each render pass attachment. Each of its members specifies the
color to which the given attachment should be cleared when its loadOp is set to clear. For attachments whose loadOp is
not clear, the values provided for them are ignored. But we can't provide an array with a smaller number of elements.
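As a hedged illustration (our tutorial uses only one color attachment, so this exact code does not appear in it), a render pass with a color attachment at index 0 and a depth/stencil attachment at index 1 might prepare its clear values like this:

```cpp
// Sketch: clear values for a hypothetical two-attachment render pass.
VkClearValue clear_values[2] = {};
clear_values[0].color        = { { 1.0f, 0.8f, 0.4f, 0.0f } };  // color attachment
clear_values[1].depthStencil = { 1.0f, 0 };                     // depth/stencil attachment

// render_pass_begin_info.clearValueCount = 2;
// render_pass_begin_info.pClearValues    = clear_values;
```

VkClearValue is a union, so for each attachment only the member matching that attachment's format (color or depthStencil) is read.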
We have begun a command buffer, set a barrier (if necessary), and started a render pass. When we start a render pass,
we are also starting its first subpass. We can switch to the next subpass by calling the vkCmdNextSubpass() function.
During these operations, layout transitions and clear operations may occur. Clears are done in the subpass in which an
image is first used (referenced). Layout transitions occur each time a subpass layout is different from the layout in the
previous subpass or (in the case of the first subpass, or when the image is first referenced) different from the initial layout
(the layout before the render pass). So in our example, when we start the render pass, the swap chain image's layout is
changed automatically from the "presentation source" layout to the "color attachment optimal" layout.

Now we bind a graphics pipeline. This is done by calling the vkCmdBindPipeline() function. This “activates” all shader
programs (similar to the glUseProgram() function) and sets desired tests, blending operations, and so on.

After the pipeline is bound, we can finally draw something by calling the vkCmdDraw() function. In this function we
specify the number of vertices we want to draw (three), the number of instances that should be drawn (just one), and the
indices of the first vertex and first instance (both zero).

Next the vkCmdEndRenderPass() function is called which, as the name suggests, ends the given render pass. Here all
final layout transitions occur if the final layout specified for a render pass is different from the layout used in the last
subpass the given image was referenced in.

After that, the barrier may be set in which we tell the driver that the graphics queue finished using a given image and
from now on the present queue will be using it. This is done, once again, only when the graphics and present queues are
different. And after the barrier, we stop recording a command buffer for a given image. All these operations are repeated
for each swap chain image.

Drawing
The drawing function is the same as the Draw() function presented in Tutorial 2. We acquire an image's index, submit
a proper command buffer, and present the image. We are using semaphores the same way they were used previously: one
semaphore is used for acquiring an image, and it tells the graphics queue to wait while the image is not yet available for
use. The second semaphore is used to indicate whether drawing on the graphics queue has finished. The present queue
waits on this semaphore before it can present the image. Here is the source code of the Draw() function:
VkSemaphore image_available_semaphore = GetImageAvailableSemaphore();
VkSemaphore rendering_finished_semaphore = GetRenderingFinishedSemaphore();
VkSwapchainKHR swap_chain = GetSwapChain().Handle;
uint32_t image_index;

VkResult result = vkAcquireNextImageKHR( GetDevice(), swap_chain, UINT64_MAX,
  image_available_semaphore, VK_NULL_HANDLE, &image_index );
switch( result ) {
  case VK_SUCCESS:
  case VK_SUBOPTIMAL_KHR:
    break;
  case VK_ERROR_OUT_OF_DATE_KHR:
    return OnWindowSizeChanged();
  default:
    printf( "Problem occurred during swap chain image acquisition!\n" );
    return false;
}

VkPipelineStageFlags wait_dst_stage_mask = VK_PIPELINE_STAGE_TRANSFER_BIT;

VkSubmitInfo submit_info = {
  VK_STRUCTURE_TYPE_SUBMIT_INFO,                  // VkStructureType                sType
  nullptr,                                        // const void                    *pNext
  1,                                              // uint32_t                       waitSemaphoreCount
  &image_available_semaphore,                     // const VkSemaphore             *pWaitSemaphores
  &wait_dst_stage_mask,                           // const VkPipelineStageFlags    *pWaitDstStageMask
  1,                                              // uint32_t                       commandBufferCount
  &Vulkan.GraphicsCommandBuffers[image_index],    // const VkCommandBuffer         *pCommandBuffers
  1,                                              // uint32_t                       signalSemaphoreCount
  &rendering_finished_semaphore                   // const VkSemaphore             *pSignalSemaphores
};

if( vkQueueSubmit( GetGraphicsQueue().Handle, 1, &submit_info, VK_NULL_HANDLE ) != VK_SUCCESS ) {
  return false;
}

VkPresentInfoKHR present_info = {
  VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,             // VkStructureType                sType
  nullptr,                                        // const void                    *pNext
  1,                                              // uint32_t                       waitSemaphoreCount
  &rendering_finished_semaphore,                  // const VkSemaphore             *pWaitSemaphores
  1,                                              // uint32_t                       swapchainCount
  &swap_chain,                                    // const VkSwapchainKHR          *pSwapchains
  &image_index,                                   // const uint32_t                *pImageIndices
  nullptr                                         // VkResult                      *pResults
};
result = vkQueuePresentKHR( GetPresentQueue().Handle, &present_info );

switch( result ) {
  case VK_SUCCESS:
    break;
  case VK_ERROR_OUT_OF_DATE_KHR:
  case VK_SUBOPTIMAL_KHR:
    return OnWindowSizeChanged();
  default:
    printf( "Problem occurred during image presentation!\n" );
    return false;
}

return true;

24. Tutorial03.cpp, function Draw()

Tutorial 3 Execution
In this tutorial we performed “real” drawing operations. A simple triangle may not sound too convincing, but it is a
good starting point for a first Vulkan-created image. Here is what the triangle should look like:
If you’re wondering why there are black parts in the image, here is an explanation: To simplify the whole code, we
created a framebuffer with a fixed size (width and height of 300 pixels). But the window’s size (and the size of the swap
chain images) may be greater than these 300 x 300 pixels. The parts of an image that lie outside the framebuffer’s
dimensions are uncleared and unmodified by our application. They may even contain some “artifacts,” because the
memory from which the driver allocates the swap chain images may have been previously used for other purposes and
could contain some data. The correct behavior is to create a framebuffer with the same size as the swap chain images and
to recreate it when the window’s size changes. But as long as the blue triangle is rendered on an orange/gold background,
the code works correctly.

Cleaning Up
One last thing to learn before this tutorial ends is how to release resources created during this lesson. I won’t repeat
the code needed to release resources created in the previous chapter. Just look at the VulkanCommon.cpp file. Here is
the code needed to destroy resources specific to this chapter:
if( GetDevice() != VK_NULL_HANDLE ) {
  vkDeviceWaitIdle( GetDevice() );

  if( (Vulkan.GraphicsCommandBuffers.size() > 0) && (Vulkan.GraphicsCommandBuffers[0] != VK_NULL_HANDLE) ) {
    vkFreeCommandBuffers( GetDevice(), Vulkan.GraphicsCommandPool,
      static_cast<uint32_t>(Vulkan.GraphicsCommandBuffers.size()),
      &Vulkan.GraphicsCommandBuffers[0] );
    Vulkan.GraphicsCommandBuffers.clear();
  }

  if( Vulkan.GraphicsCommandPool != VK_NULL_HANDLE ) {
    vkDestroyCommandPool( GetDevice(), Vulkan.GraphicsCommandPool, nullptr );
    Vulkan.GraphicsCommandPool = VK_NULL_HANDLE;
  }

  if( Vulkan.GraphicsPipeline != VK_NULL_HANDLE ) {
    vkDestroyPipeline( GetDevice(), Vulkan.GraphicsPipeline, nullptr );
    Vulkan.GraphicsPipeline = VK_NULL_HANDLE;
  }

  if( Vulkan.RenderPass != VK_NULL_HANDLE ) {
    vkDestroyRenderPass( GetDevice(), Vulkan.RenderPass, nullptr );
    Vulkan.RenderPass = VK_NULL_HANDLE;
  }

  for( size_t i = 0; i < Vulkan.FramebufferObjects.size(); ++i ) {
    if( Vulkan.FramebufferObjects[i].Handle != VK_NULL_HANDLE ) {
      vkDestroyFramebuffer( GetDevice(), Vulkan.FramebufferObjects[i].Handle, nullptr );
      Vulkan.FramebufferObjects[i].Handle = VK_NULL_HANDLE;
    }

    if( Vulkan.FramebufferObjects[i].ImageView != VK_NULL_HANDLE ) {
      vkDestroyImageView( GetDevice(), Vulkan.FramebufferObjects[i].ImageView, nullptr );
      Vulkan.FramebufferObjects[i].ImageView = VK_NULL_HANDLE;
    }
  }
  Vulkan.FramebufferObjects.clear();
}

25. Tutorial03.cpp, function ChildClear()

As usual, we first check whether there is any device. If we don't have a device, we don't have any resources. Next we
wait until the device is free, and then we delete all the created resources. We start by deleting the command buffers,
calling the vkFreeCommandBuffers() function. Next we destroy the command pool through the vkDestroyCommandPool()
function, and after that the graphics pipeline is destroyed. This is achieved through a vkDestroyPipeline() function call.
Next we call the vkDestroyRenderPass() function, which releases the handle to the render pass. Finally, all framebuffers
and image views associated with each swap chain image are deleted.

Each object destruction is preceded by a check whether the given resource was properly created. If it wasn't, we skip
destroying that resource.

Conclusion
In this tutorial, we created a render pass with one subpass. Next we created image views and framebuffers for each
swap chain image. One of the most difficult parts was to create a graphics pipeline, because it required us to prepare lots
of data. We had to create shader modules and describe all the shader stages that should be active when a given graphics
pipeline is bound. We had to prepare information about input vertices, their layout, and assembling them into polygons.
Viewport, rasterization, multisampling, and color blending information was also necessary. Then we created a simple
pipeline layout and after that we could create the pipeline itself. Next we created a command pool and allocated command
buffers for each swap chain image. Operations recorded in each command buffer involved setting up an image memory
barrier, beginning a render pass, binding a graphics pipeline, and drawing. Next we ended a render pass and set up another
image memory barrier. The drawing itself was performed the same way as in the previous tutorial (2).

In the next tutorial, we will learn about vertex attributes, images, and buffers.

Notices

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software
or service activation. Performance varies depending on system configuration. Check with your system manufacturer or
retailer or learn more at intel.com.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability,
fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course
of dealing, or usage in trade.

This document contains information on products, services and/or processes in development. All information provided
here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule,
specifications and roadmaps.

The products and services described may contain defects or errors known as errata which may cause deviations from
published specifications. Current characterized errata are available on request.

Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-
548-4725 or by visiting www.intel.com/design/literature.htm.

This sample source code is released under the Intel Sample Source Code License Agreement.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

© 2016 Intel Corporation.


API without Secrets: Introduction to Vulkan*
Part 4
Table of Contents
Tutorial 4: Vertex Attributes – Buffers, Images, and Fences
    Specifying Render Pass Dependencies
    Graphics Pipeline Creation
        Writing Shaders
        Vertex Attributes Specification
        Input Assembly State Specification
        Viewport State Specification
        Dynamic State Specification
        Pipeline Object Creation
    Vertex Buffer Creation
        Buffer Memory Allocation
        Binding a Buffer’s Memory
        Uploading Vertex Data
    Rendering Resources Creation
        Command Pool Creation
        Command Buffer Allocation
        Semaphore Creation
        Fence Creation
    Drawing
        Recording a Command Buffer
    Tutorial04 Execution
    Cleaning Up
    Conclusion
Tutorial 4: Vertex Attributes – Buffers, Images, and Fences
In previous tutorials we learned the basics. The tutorials themselves were long and (I hope) detailed enough. This is
because the learning curve of the Vulkan API is quite steep. And, as you can see, a considerable amount of knowledge is
necessary to prepare even the simplest application.

But now we can build on these foundations, so the tutorials will be shorter and focus on smaller topics related to the
Vulkan API. In this part I present the recommended way of drawing arbitrary geometry by providing vertex attributes
through vertex buffers. As the code of this lesson is similar to the code from the "03 – First Triangle" tutorial, I focus on
and describe only the parts that are different.

I also show a different way of organizing the rendering code. Previously we recorded command buffers before the
main rendering loop. But in real-life situations, every frame of animation is different, so we can't prerecord all the
rendering commands. We should record and submit a command buffer as late as possible, to minimize input lag and to
use the most recent input data possible. We will record the command buffer just before it is submitted to the queue. But
a single command buffer isn't enough. We should not re-record the same command buffer until the graphics card has
finished processing it after it was submitted. This moment is signaled through a fence. But waiting on a fence every frame
is a waste of time, so we need more command buffers used interchangeably. With more command buffers, more fences
are also needed, and the situation gets more complicated. This tutorial shows how to organize the code so it is easily
maintained, flexible, and as fast as possible.

Specifying Render Pass Dependencies


We start by creating a render pass in the same way as in the previous tutorial, but this time we will provide additional
information. A render pass describes the internal organization of rendering resources (images/attachments): how they are
used and how they change during the rendering process. Image layout changes can be performed explicitly, by creating
image memory barriers, but they can also be performed implicitly, when a proper render pass description is specified (initial,
subpass, and final image layouts). The implicit transition is preferred, as drivers can perform such transitions more optimally.

In this part of the tutorial, just as in the previous one, we specify “transfer src” for the initial and final image layouts,
and a “color attachment optimal” subpass layout for our render pass. But the previous tutorial lacked important additional
information, specifically how the image was used (that is, what types of operations occurred in connection with the image),
and when it was used (which parts of the rendering pipeline were using the image). This information can be specified both in
an image memory barrier and in the render pass description. When we create an image memory barrier, we specify the
types of operations that concern the given image (memory access types before and after the barrier), and we also specify
when this barrier should be placed (pipeline stages in which the image was used before and after the barrier).

When we create a render pass and provide a description for it, the same information is specified through subpass
dependencies. This additional data is crucial for a driver to optimally prepare an implicit barrier. Below is the source code
that creates a render pass and prepares subpass dependencies.
std::vector<VkSubpassDependency> dependencies = {
  {
    VK_SUBPASS_EXTERNAL,                            // uint32_t             srcSubpass
    0,                                              // uint32_t             dstSubpass
    VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,           // VkPipelineStageFlags srcStageMask
    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,  // VkPipelineStageFlags dstStageMask
    VK_ACCESS_MEMORY_READ_BIT,                      // VkAccessFlags        srcAccessMask
    VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,           // VkAccessFlags        dstAccessMask
    VK_DEPENDENCY_BY_REGION_BIT                     // VkDependencyFlags    dependencyFlags
  },
  {
    0,                                              // uint32_t             srcSubpass
    VK_SUBPASS_EXTERNAL,                            // uint32_t             dstSubpass
    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,  // VkPipelineStageFlags srcStageMask
    VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,           // VkPipelineStageFlags dstStageMask
    VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,           // VkAccessFlags        srcAccessMask
    VK_ACCESS_MEMORY_READ_BIT,                      // VkAccessFlags        dstAccessMask
    VK_DEPENDENCY_BY_REGION_BIT                     // VkDependencyFlags    dependencyFlags
  }
};

VkRenderPassCreateInfo render_pass_create_info = {
  VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO,    // VkStructureType                sType
  nullptr,                                      // const void                    *pNext
  0,                                            // VkRenderPassCreateFlags        flags
  1,                                            // uint32_t                       attachmentCount
  attachment_descriptions,                      // const VkAttachmentDescription *pAttachments
  1,                                            // uint32_t                       subpassCount
  subpass_descriptions,                         // const VkSubpassDescription    *pSubpasses
  static_cast<uint32_t>(dependencies.size()),   // uint32_t                       dependencyCount
  &dependencies[0]                              // const VkSubpassDependency     *pDependencies
};

if( vkCreateRenderPass( GetDevice(), &render_pass_create_info, nullptr,
                        &Vulkan.RenderPass ) != VK_SUCCESS ) {
  std::cout << "Could not create render pass!" << std::endl;
  return false;
}
1. Tutorial04.cpp, function CreateRenderPass()

Subpass dependencies describe dependencies between different subpasses. When an attachment is used in one
specific way in a given subpass (for example, rendering into it), but in another way in another subpass (sampling from it),
we can create a memory barrier or we can provide a subpass dependency that describes the intended usage of an
attachment in these two subpasses. Of course, the latter option is recommended, as the driver can (usually) prepare the
barriers in a more optimal way. And the code itself is improved—everything required to understand the code is gathered
in one place, one object.

In our simple example, we have only one subpass, but we specify two dependencies. This is because we can (and
should) specify dependencies between a given subpass (referenced by its index) and operations outside of the render
pass (referenced by the VK_SUBPASS_EXTERNAL value). Here we provide one dependency for the color attachment
between operations occurring before the render pass and its only subpass. The second dependency is defined between
operations occurring inside the subpass and after the render pass.
What operations are we talking about? We are using only one attachment, which is an image acquired from a
presentation engine (swapchain). The presentation engine uses the image as a source of presentable data; it only displays
the image. So the only operation that involves this image is a “memory read” on the image with a “present src” layout. This
operation doesn’t occur in any normal pipeline stage, but it can be represented in the “bottom of pipeline” stage.

Inside our render pass, in its only subpass (with index 0), we are rendering into an image used as a color attachment.
So the operation that occurs on this image is “color attachment write”, which is performed in the “color attachment
output” pipeline stage (after a fragment shader). After that the image is presented and returned to a presentation engine,
which again uses this image as a source of data. So, in our example, the operation after the render pass is the same as
before it: “memory read”.

We specify this data through an array of VkSubpassDependency elements. When we create a render pass and fill a
VkRenderPassCreateInfo structure, we specify the number of elements in the dependencies array (through the
dependencyCount member) and provide the address of its first element (through pDependencies). In the previous part of
the tutorial we provided 0 and nullptr for these two fields. The VkSubpassDependency structure contains the following
fields:

 srcSubpass – Index of a first (previous) subpass or VK_SUBPASS_EXTERNAL if we want to indicate a dependency between a subpass and operations outside of a render pass.
 dstSubpass – Index of a second (later) subpass (or VK_SUBPASS_EXTERNAL).
 srcStageMask – Pipeline stage during which a given attachment was used before (in a src subpass).
 dstStageMask – Pipeline stage during which a given attachment will be used later (in a dst subpass).
 srcAccessMask – Types of memory operations that occurred in a src subpass or before a render pass.
 dstAccessMask – Types of memory operations that occurred in a dst subpass or after a render pass.
 dependencyFlags – Flag describing the type (region) of dependency.

Graphics Pipeline Creation


Now we will create a graphics pipeline object. (We should create framebuffers for our swapchain images, but we will
do that during command buffer recording.) We don’t want to render geometry that is hardcoded into a shader. We want
to draw any number of vertices, and we also want to provide additional attributes, not only vertex positions. What should
we do first?

Writing Shaders
First have a look at the vertex shader written in GLSL code:
#version 450

layout(location = 0) in vec4 i_Position;
layout(location = 1) in vec4 i_Color;

out gl_PerVertex
{
  vec4 gl_Position;
};

layout(location = 0) out vec4 v_Color;

void main() {
  gl_Position = i_Position;
  v_Color = i_Color;
}
2. shader.vert

This shader is quite simple, though more complicated than the one from Tutorial 03.
We specify two input attributes (named i_Position and i_Color). In Vulkan, all attributes must have a location layout
qualifier. When we specify a description of the vertex attributes in Vulkan API, the names of these attributes don’t matter,
only their indices/locations. In OpenGL* we could ask for a location of an attribute with a given name. In Vulkan we can’t
do this. Location layout qualifiers are the only way to go.

Next, we redeclare the gl_PerVertex block in the shader. Vulkan uses shader I/O blocks, and we should redeclare a
gl_PerVertex block to specify exactly what members of this block to use. When we don’t, the default definition is used.
But we must remember that the default definition contains gl_ClipDistance[], which requires us to enable a feature named
shaderClipDistance (and in Vulkan we can’t use features that are not enabled during device creation or our application
may not work correctly). Here we are using only a gl_Position member so the feature is not required.

We then specify an additional output varying variable called v_Color in which we store vertices’ colors. Inside a main
function we copy values provided by an application to proper output variables: position to gl_Position and color to v_Color.

Now look at a fragment shader to see how attributes are consumed.


#version 450

layout(location = 0) in vec4 v_Color;

layout(location = 0) out vec4 o_Color;

void main() {
  o_Color = v_Color;
}
3. shader.frag

In a fragment shader, the input varying variable v_Color is copied to the only output variable called o_Color. Both
variables have location layout specifiers. The v_Color variable has the same location as the output variable in the vertex
shader, so it will contain color values interpolated between vertices.

These shaders can be converted to a SPIR-V assembly the same way as previously. The following commands do this:

glslangValidator.exe -V -H shader.vert > vert.spv.txt

glslangValidator.exe -V -H shader.frag > frag.spv.txt

So now, when we know what attributes we want to use in our shaders, we can create the appropriate graphics
pipeline.

Vertex Attributes Specification


The most important change in this tutorial is in the vertex input state creation, for which we fill a variable of type
VkPipelineVertexInputStateCreateInfo. In this variable we provide pointers to structures that define the type of vertex
input data and the number and layout of our attributes.

We want to use two attributes: vertex positions, composed of four float components, and vertex colors, also
composed of four float values. We will lay all of our vertex data in one buffer using an interleaved attribute layout. This
means that the data is placed as follows: the position of the first vertex, then the color of the first vertex, then the
position of the second vertex, followed by its color, and so on for the remaining vertices. All this specification is
performed with the following code:
std::vector<VkVertexInputBindingDescription> vertex_binding_descriptions = {
  {
    0,                                        // uint32_t          binding
    sizeof(VertexData),                       // uint32_t          stride
    VK_VERTEX_INPUT_RATE_VERTEX               // VkVertexInputRate inputRate
  }
};

std::vector<VkVertexInputAttributeDescription> vertex_attribute_descriptions = {
  {
    0,                                        // uint32_t location
    vertex_binding_descriptions[0].binding,   // uint32_t binding
    VK_FORMAT_R32G32B32A32_SFLOAT,            // VkFormat format
    offsetof(struct VertexData, x)            // uint32_t offset
  },
  {
    1,                                        // uint32_t location
    vertex_binding_descriptions[0].binding,   // uint32_t binding
    VK_FORMAT_R32G32B32A32_SFLOAT,            // VkFormat format
    offsetof(struct VertexData, r)            // uint32_t offset
  }
};

VkPipelineVertexInputStateCreateInfo vertex_input_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,    // VkStructureType                          sType
  nullptr,                                                      // const void                              *pNext
  0,                                                            // VkPipelineVertexInputStateCreateFlags    flags
  static_cast<uint32_t>(vertex_binding_descriptions.size()),    // uint32_t                                 vertexBindingDescriptionCount
  &vertex_binding_descriptions[0],                              // const VkVertexInputBindingDescription   *pVertexBindingDescriptions
  static_cast<uint32_t>(vertex_attribute_descriptions.size()),  // uint32_t                                 vertexAttributeDescriptionCount
  &vertex_attribute_descriptions[0]                             // const VkVertexInputAttributeDescription *pVertexAttributeDescriptions
};
4. Tutorial04.cpp, function CreatePipeline()
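The VertexData type used above in sizeof() and offsetof() is not shown in this excerpt. Based on the interleaved layout described earlier (four position floats followed by four color floats), its definition should look along these lines; the member names x and r match the offsetof() calls, while the remaining names are assumptions:

```cpp
#include <cstddef>   // offsetof

// Interleaved per-vertex data: position first, color second.
struct VertexData {
  float x, y, z, w;   // position, read at offset 0
  float r, g, b, a;   // color, read at offset 16
};

// The binding stride is then sizeof(VertexData) == 32 bytes, and the two
// attribute offsets are offsetof(VertexData, x) == 0 and offsetof(VertexData, r) == 16.
```

With such a struct, the stride in the binding description and the offsets in the attribute descriptions are derived directly from the type, so the pipeline description stays in sync with the buffer contents.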

First specify the binding (general memory information) of vertex data through VkVertexInputBindingDescription. It
contains the following fields:

 binding – Index of a binding with which vertex data will be associated.
 stride – The distance in bytes between two consecutive elements (the same attribute for two neighbor vertices).
 inputRate – Defines how data should be consumed, per vertex or per instance.

The stride and inputRate fields are quite self-explanatory. Additional information may be required for a binding
member. When we create a vertex buffer, we bind it to a chosen slot before rendering operations. The slot number (an
index) is this binding and here we describe how data in this slot is aligned in memory and how it should be consumed (per
vertex or per instance). Different vertex buffers can be bound to different bindings. And each binding may be differently
positioned in memory.
The next step is to define all vertex attributes. We must specify a location (index) for each attribute (the same as in a
shader source code, in location layout qualifier), source of data (binding from which data will be read), format (data type
and number of components), and offset at which data for this specific attribute can be found (offset from the beginning
of a data for a given vertex, not from the beginning of all vertex data). The situation here is exactly the same as in OpenGL
where we created Vertex Buffer Objects (VBO, which can be thought of as an equivalent of “binding”) and defined
attributes using glVertexAttribPointer() function through which we specified an index of an attribute (location), size and
type (number of components and format), stride and offset. This information is provided through the
VkVertexInputAttributeDescription structure. It contains these fields:

 location – Index of an attribute, the same as defined by the location layout specifier in a shader source code.
 binding – The number of the slot from which data should be read (source of data like VBO in OpenGL), the
same binding as in a VkVertexInputBindingDescription structure and vkCmdBindVertexBuffers() function
(described later).
 format – Data type and number of components per attribute.
 offset – Beginning of data for a given attribute.

When we are ready, we can prepare the vertex input state description by filling a variable of type
VkPipelineVertexInputStateCreateInfo, which consists of the following fields:

 sType – Type of structure, here it should be equal to VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO.
 pNext – Pointer reserved for extensions. Right now set this value to null.
 flags – Parameter reserved for future use.
 vertexBindingDescriptionCount – Number of elements in the pVertexBindingDescriptions array.
 pVertexBindingDescriptions – Array describing all bindings defined for a given pipeline (buffers from which
values of all attributes are read).
 vertexAttributeDescriptionCount – Number of elements in the pVertexAttributeDescriptions array.
 pVertexAttributeDescriptions – Array with elements specifying all vertex attributes.

This concludes the vertex attributes specification at pipeline creation. But to use them, we must create a vertex buffer
and bind it to a command buffer before we issue a rendering command.

Input Assembly State Specification


Previously we drew a single triangle using a triangle list topology. Now we will draw a quad, which is more
convenient to define with just four vertices in a strip than with two separate triangles and six vertices. To do this, we must
use the triangle strip topology. We define it through the VkPipelineInputAssemblyStateCreateInfo structure, which has the
following members:

 sType – Structure type, here equal to VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO.
 pNext – Pointer reserved for extensions.
 flags – Parameter reserved for future use.
 topology – Topology used for drawing vertices (like triangle fan, strip, list).
 primitiveRestartEnable – Parameter defining whether we want to restart assembling a primitive by using a
special value of vertex index.

Here is the code sample used to define triangle strip topology:


VkPipelineInputAssemblyStateCreateInfo input_assembly_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO,  // VkStructureType                         sType
  nullptr,                                                      // const void                             *pNext
  0,                                                            // VkPipelineInputAssemblyStateCreateFlags flags
  VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP,                         // VkPrimitiveTopology                     topology
  VK_FALSE                                                      // VkBool32                                primitiveRestartEnable
};

5. Tutorial04.cpp, function CreatePipeline()
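To see why four vertices are enough for a quad, recall how the input assembly stage consumes a strip: every consecutive triple of vertices forms a triangle. The small standalone helper below (not part of the tutorial code) enumerates the triangles a strip produces:

```cpp
#include <array>
#include <vector>

// For a triangle strip, triangle i is built from vertices {i, i+1, i+2}.
// (The hardware also flips the winding of every odd triangle; ignored here.)
std::vector<std::array<int, 3>> StripTriangles( int vertex_count ) {
  std::vector<std::array<int, 3>> triangles;
  for( int i = 0; i + 2 < vertex_count; ++i ) {
    triangles.push_back( { i, i + 1, i + 2 } );
  }
  return triangles;
}

// Four strip vertices yield two triangles: {0,1,2} and {1,2,3} — a quad.
```

With a triangle list we would instead have to submit six vertices (two of them duplicated) for the same two triangles.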

Viewport State Specification


In this tutorial we introduce another change. Previously, for the sake of simplicity, we hardcoded the viewport
and scissor test parameters, which unfortunately caused our image to always be the same size, no matter how big the
application window was. This time, we won’t specify these values through the VkPipelineViewportStateCreateInfo
structure. We will use a dynamic state for them instead. Here is the code that fills the viewport state structure:
VkPipelineViewportStateCreateInfo viewport_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO,  // VkStructureType                    sType
  nullptr,                                                // const void                        *pNext
  0,                                                      // VkPipelineViewportStateCreateFlags flags
  1,                                                      // uint32_t                           viewportCount
  nullptr,                                                // const VkViewport                  *pViewports
  1,                                                      // uint32_t                           scissorCount
  nullptr                                                 // const VkRect2D                    *pScissors
};
6. Tutorial04.cpp, function CreatePipeline()

The structure that defines static viewport parameters has the following members:

 sType – Type of the structure, VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO here.
 pNext – Pointer reserved for extension-specific parameters.
 flags – Parameter reserved for future use.
 viewportCount – Number of viewports.
 pViewports – Pointer to a structure defining static viewport parameters.
 scissorCount – Number of scissor rectangles (must have the same value as viewportCount parameter).
 pScissors – Pointer to an array of 2D rectangles defining static scissor test parameters for each viewport.

When we want to define viewport and scissor parameters through a dynamic state, we don’t have to fill pViewports
and pScissors members. That’s why they are set to null in the example above. But, we always have to define the number
of viewports and scissor test rectangles. These values are always specified through the VkPipelineViewportStateCreateInfo
structure, no matter if we want to use dynamic or static viewport and scissor state.

Dynamic State Specification


When we create a pipeline, we can specify which parts of it are always static (defined through structures at pipeline
creation) and which are dynamic (specified by proper function calls during command buffer recording). This allows us to
lower the number of pipeline objects that differ only in small details like line widths, blend constants, stencil
parameters, or the aforementioned viewport size. Here is the code used to define the parts of the pipeline that should be dynamic:
std::vector<VkDynamicState> dynamic_states = {
  VK_DYNAMIC_STATE_VIEWPORT,
  VK_DYNAMIC_STATE_SCISSOR,
};

VkPipelineDynamicStateCreateInfo dynamic_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO,  // VkStructureType                   sType
  nullptr,                                               // const void                       *pNext
  0,                                                     // VkPipelineDynamicStateCreateFlags flags
  static_cast<uint32_t>(dynamic_states.size()),          // uint32_t                          dynamicStateCount
  &dynamic_states[0]                                     // const VkDynamicState             *pDynamicStates
};
7. Tutorial04.cpp, function CreatePipeline()

It is done by using a structure of type VkPipelineDynamicStateCreateInfo, which contains the following fields:

 sType – Parameter defining the type of a given structure, here equal to VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO.
 pNext – Parameter reserved for extensions.
 flags – Parameter reserved for future use.
 dynamicStateCount – Number of elements in pDynamicStates array.
 pDynamicStates – Array containing enums, specifying which parts of a pipeline should be marked as dynamic. Each element of this array is of type VkDynamicState.
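Because the viewport and scissor are now dynamic, they must be supplied during command buffer recording with vkCmdSetViewport() and vkCmdSetScissor(). The sketch below fills the structures those functions take from the current window size; note it uses local stand-in definitions of VkViewport and VkRect2D (normally they come from the vulkan.h header), so it is only an illustration of the values involved:

```cpp
#include <cstdint>

// Local stand-ins for the Vulkan types; real definitions come from vulkan.h.
struct VkViewport { float x, y, width, height, minDepth, maxDepth; };
struct VkOffset2D { int32_t x, y; };
struct VkExtent2D { uint32_t width, height; };
struct VkRect2D   { VkOffset2D offset; VkExtent2D extent; };

// Builds a viewport covering the whole window with the standard [0, 1] depth range.
VkViewport FullWindowViewport( uint32_t width, uint32_t height ) {
  return { 0.0f, 0.0f, static_cast<float>(width), static_cast<float>(height), 0.0f, 1.0f };
}

// Builds a matching scissor rectangle covering the whole window.
VkRect2D FullWindowScissor( uint32_t width, uint32_t height ) {
  return { { 0, 0 }, { width, height } };
}

// During command buffer recording one would then call:
//   vkCmdSetViewport( command_buffer, 0, 1, &viewport );
//   vkCmdSetScissor( command_buffer, 0, 1, &scissor );
```

This way the same pipeline object renders correctly after a window resize; only the values passed at recording time change.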

Pipeline Object Creation


We now have defined all the necessary parameters of a graphics pipeline, so we can create a pipeline object. Here is
the code that does it:
VkGraphicsPipelineCreateInfo pipeline_create_info = {
  VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO,          // VkStructureType                                sType
  nullptr,                                                  // const void                                    *pNext
  0,                                                        // VkPipelineCreateFlags                          flags
  static_cast<uint32_t>(shader_stage_create_infos.size()),  // uint32_t                                       stageCount
  &shader_stage_create_infos[0],                            // const VkPipelineShaderStageCreateInfo         *pStages
  &vertex_input_state_create_info,                          // const VkPipelineVertexInputStateCreateInfo    *pVertexInputState
  &input_assembly_state_create_info,                        // const VkPipelineInputAssemblyStateCreateInfo  *pInputAssemblyState
  nullptr,                                                  // const VkPipelineTessellationStateCreateInfo   *pTessellationState
  &viewport_state_create_info,                              // const VkPipelineViewportStateCreateInfo       *pViewportState
  &rasterization_state_create_info,                         // const VkPipelineRasterizationStateCreateInfo  *pRasterizationState
  &multisample_state_create_info,                           // const VkPipelineMultisampleStateCreateInfo    *pMultisampleState
  nullptr,                                                  // const VkPipelineDepthStencilStateCreateInfo   *pDepthStencilState
  &color_blend_state_create_info,                           // const VkPipelineColorBlendStateCreateInfo     *pColorBlendState
  &dynamic_state_create_info,                               // const VkPipelineDynamicStateCreateInfo        *pDynamicState
  pipeline_layout.Get(),                                    // VkPipelineLayout                               layout
  Vulkan.RenderPass,                                        // VkRenderPass                                   renderPass
  0,                                                        // uint32_t                                       subpass
  VK_NULL_HANDLE,                                           // VkPipeline                                     basePipelineHandle
  -1                                                        // int32_t                                        basePipelineIndex
};

if( vkCreateGraphicsPipelines( GetDevice(), VK_NULL_HANDLE, 1, &pipeline_create_info,
                               nullptr, &Vulkan.GraphicsPipeline ) != VK_SUCCESS ) {
  std::cout << "Could not create graphics pipeline!" << std::endl;
  return false;
}
return true;
8. Tutorial04.cpp, function CreatePipeline()

The most important variable, which contains references to all pipeline parameters, is of type
VkGraphicsPipelineCreateInfo. The only change from the previous tutorial is the addition of the pDynamicState parameter,
which points to a structure of type VkPipelineDynamicStateCreateInfo, described above. Every pipeline state that is
specified as dynamic must be set through a proper function call during command buffer recording.

The pipeline object itself is created by calling the vkCreateGraphicsPipelines() function.

Vertex Buffer Creation


To use vertex attributes, apart from specifying them during pipeline creation, we need to prepare a buffer that will
contain all the data for these attributes. From this buffer, the values for attributes will be read and provided to the vertex
shader.

In Vulkan, buffer and image creation consists of at least two stages. First, we create the object itself. Next, we need
to create a memory object, which is then bound to the buffer (or image); from this memory object the buffer will
take its storage space. This approach allows us to specify additional parameters for the memory and control it in more
detail.

To create a (general) buffer object we call vkCreateBuffer(). It accepts, among other parameters, a pointer to a
variable of type VkBufferCreateInfo, which defines parameters of created buffer. Here is the code responsible for creating
a buffer used as a source of data for vertex attributes:
VertexData vertex_data[] = {
  {
    -0.7f, -0.7f, 0.0f, 1.0f,
    1.0f, 0.0f, 0.0f, 0.0f
  },
  {
    -0.7f, 0.7f, 0.0f, 1.0f,
    0.0f, 1.0f, 0.0f, 0.0f
  },
  {
    0.7f, -0.7f, 0.0f, 1.0f,
    0.0f, 0.0f, 1.0f, 0.0f
  },
  {
    0.7f, 0.7f, 0.0f, 1.0f,
    0.3f, 0.3f, 0.3f, 0.0f
  }
};

Vulkan.VertexBuffer.Size = sizeof(vertex_data);

VkBufferCreateInfo buffer_create_info = {
  VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO,  // VkStructureType     sType
  nullptr,                               // const void         *pNext
  0,                                     // VkBufferCreateFlags flags
  Vulkan.VertexBuffer.Size,              // VkDeviceSize        size
  VK_BUFFER_USAGE_VERTEX_BUFFER_BIT,     // VkBufferUsageFlags  usage
  VK_SHARING_MODE_EXCLUSIVE,             // VkSharingMode       sharingMode
  0,                                     // uint32_t            queueFamilyIndexCount
  nullptr                                // const uint32_t     *pQueueFamilyIndices
};

if( vkCreateBuffer( GetDevice(), &buffer_create_info, nullptr,
                    &Vulkan.VertexBuffer.Handle ) != VK_SUCCESS ) {
  std::cout << "Could not create a vertex buffer!" << std::endl;
  return false;
}
9. Tutorial04.cpp, function CreateVertexBuffer()

At the beginning of the CreateVertexBuffer() function we define values for the position and color attributes: four
position components for the first vertex, then four color components for the same vertex, then the position and color of
the second vertex, and so on for the third and fourth vertices. The size of this array determines the size of the buffer.
Remember, though, that internally the graphics driver may require more storage for a buffer than the size requested by
the application.

Next we define a variable of VkBufferCreateInfo type. It is a structure with the following fields:

 sType – Type of the structure, which should be set to VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO value.


 pNext – Parameter reserved for extensions.
 flags – Parameter defining additional creation parameters. Right now it allows creation of a buffer backed by
a sparse memory (something similar to a mega texture). As we don’t want to use sparse memory, we can set
this parameter to zero.
 size – Size, in bytes, of a buffer.
 usage – This parameter defines how we intend to use this buffer in future. We can specify that we want to
use buffer as a uniform buffer, index buffer, source of data for transfer (copy) operations, and so on. Here we
intend to use this buffer as a vertex buffer. Remember that we can’t use a buffer for a purpose that is not
defined during buffer creation.
 sharingMode – Sharing mode, similarly to swapchain images, defines whether a given buffer can be accessed
by multiple queues at the same time (concurrent sharing mode) or by just a single queue (exclusive sharing
mode). If a concurrent sharing mode is specified, we must provide indices of all queues that will have access
to a buffer. If we want to define an exclusive sharing mode, we can still reference this buffer in different
queues, but only in one at a time. If we want to use a buffer in a different queue (submit commands that
reference this buffer to another queue), we need to specify buffer memory barrier that transitions buffer’s
ownership from one queue to another.
 queueFamilyIndexCount – Number of queue indices in pQueueFamilyIndices array (only when concurrent
sharing mode is specified).
 pQueueFamilyIndices – Array with indices of all queues that will reference buffer (only when concurrent
sharing mode is specified).

To create a buffer we must call vkCreateBuffer() function.

Buffer Memory Allocation


We next create a memory object that will back the buffer’s storage.
VkMemoryRequirements buffer_memory_requirements;
vkGetBufferMemoryRequirements( GetDevice(), buffer, &buffer_memory_requirements );

VkPhysicalDeviceMemoryProperties memory_properties;
vkGetPhysicalDeviceMemoryProperties( GetPhysicalDevice(), &memory_properties );

for( uint32_t i = 0; i < memory_properties.memoryTypeCount; ++i ) {
  if( (buffer_memory_requirements.memoryTypeBits & (1 << i)) &&
      (memory_properties.memoryTypes[i].propertyFlags & VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT) ) {

    VkMemoryAllocateInfo memory_allocate_info = {
      VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,  // VkStructureType sType
      nullptr,                                 // const void     *pNext
      buffer_memory_requirements.size,         // VkDeviceSize    allocationSize
      i                                        // uint32_t        memoryTypeIndex
    };

    if( vkAllocateMemory( GetDevice(), &memory_allocate_info, nullptr, memory ) == VK_SUCCESS ) {
      return true;
    }
  }
}
return false;
10. Tutorial04.cpp, function AllocateBufferMemory()

First we must check what the memory requirements for a created buffer are. We do this by calling the
vkGetBufferMemoryRequirements() function. It stores parameters for memory creation in a variable that we provided
the address of in the last parameter. This variable must be of type VkMemoryRequirements and it contains information
about required size, memory alignment, and supported memory types. What are memory types?

Each device may have and expose different memory types—heaps of various sizes that have different properties. One
memory type may be a device’s local memory located on the GDDR chips (thus very, very fast). Another may be a shared
memory that is visible both for a graphics card and a CPU. Both the graphics card and application may have access to this
memory, but such memory type is slower than the device local-only memory (which is accessible only to a graphics card).

To check what memory heaps and types are available, we need to call the vkGetPhysicalDeviceMemoryProperties()
function, which stores information about memory in a variable of type VkPhysicalDeviceMemoryProperties. It contains
the following information:

 memoryHeapCount – Number of memory heaps exposed by a given device.


 memoryHeaps – An array of memory heaps. Each heap represents a memory of different size and properties.
 memoryTypeCount – Number of different memory types exposed by a given device.
 memoryTypes – An array of memory types. Each element describes specific memory properties and contains
an index of a heap that has these particular properties.

Before we can allocate a memory for a given buffer, we need to check which memory type fulfills a buffer’s memory
requirements. If we have additional, specific needs, we can also check them. For all of this, we iterate over all available
memory types. Buffer memory requirements have a field called memoryTypeBits and if a bit on a given index is set in this
field, it means that for a given buffer we can allocate a memory of the type represented by that index. But we must
remember that while there must always be a memory type that fulfills buffer’s memory requirements, it may not support
some other, specific needs. In this case we need to look for another memory type or change our additional requirements.
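The iteration just described can be expressed as a small, self-contained helper. The sketch below mirrors the tutorial's selection loop using plain integers instead of the Vulkan structures; the 0x2 flag value matches VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT, everything else is a simplified stand-in:

```cpp
#include <cstdint>
#include <vector>

constexpr uint32_t HOST_VISIBLE_BIT = 0x00000002;  // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT

// Returns the index of the first memory type that is both allowed by the
// resource's memoryTypeBits mask and has all the requested property flags,
// or -1 if no such type exists.
int FindMemoryType( uint32_t memory_type_bits,
                    const std::vector<uint32_t> &type_property_flags,
                    uint32_t required_properties ) {
  for( uint32_t i = 0; i < type_property_flags.size(); ++i ) {
    if( (memory_type_bits & (1u << i)) &&
        ((type_property_flags[i] & required_properties) == required_properties) ) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

In real code the flags would come from memory_properties.memoryTypes[i].propertyFlags and the mask from the VkMemoryRequirements query, exactly as in the AllocateBufferMemory() listing above.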

Here, our additional requirement is that the memory needs to be host-visible. This means that the application can map
this memory and access it—read from it or write to it. Such memory is usually slower than device local-only memory,
but this way we can easily upload the data for our vertex attributes. The next tutorial will show how to use device local-only
memory for better performance.

Fortunately, the host visible requirement is popular, and it should be easy to find a memory type that supports both
the buffer’s memory requirements and the host visible property. We then prepare a variable of type
VkMemoryAllocateInfo and fill all its fields:

 sType – Type of the structure, here set to VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO.


 pNext – Pointer reserved for extensions.
 allocationSize – Minimum required memory size that should be allocated.
 memoryTypeIndex – Index of a memory type we want to use for the created memory object. It is the index of one of the bits set in the buffer’s memoryTypeBits requirement.

After we fill such a structure we call vkAllocateMemory() and check whether the memory object allocation succeeded.

Binding a Buffer’s Memory


When we are done creating a memory object, we must bind it to our buffer. Without it, there will be no storage space
in a buffer and we won’t be able to store any data in it.
if( !AllocateBufferMemory( Vulkan.VertexBuffer.Handle, &Vulkan.VertexBuffer.Memory ) ) {
  std::cout << "Could not allocate memory for a vertex buffer!" << std::endl;
  return false;
}

if( vkBindBufferMemory( GetDevice(), Vulkan.VertexBuffer.Handle,
                        Vulkan.VertexBuffer.Memory, 0 ) != VK_SUCCESS ) {
  std::cout << "Could not bind memory for a vertex buffer!" << std::endl;
  return false;
}
11. Tutorial04.cpp, function CreateVertexBuffer()

AllocateBufferMemory() is a function that allocates a memory object. It was presented earlier. When the memory object
is created, we bind it to the buffer by calling the vkBindBufferMemory() function. During the call we must specify a handle
to the buffer, a handle to the memory object, and an offset. The offset is very important and requires some additional
explanation.

When we queried for the buffer's memory requirements, we acquired information about the required size, memory types,
and alignment. Different buffer usages may require different memory alignment. The beginning of a memory object (offset of
0) satisfies all alignments. This means that all memory objects are created at addresses that fulfill the requirements of all
different usages. So when we specify a zero offset, we don't have to worry about anything.

But we can create a larger memory object and use it as storage space for multiple buffers (or images). This, in fact, is
the recommended behavior. Creating larger memory objects means we are creating fewer memory objects, which allows
the driver to track fewer objects in general. Memory objects must be tracked by a driver because of OS requirements and
security measures. Fewer, larger memory objects also cause fewer problems with memory fragmentation. Finally, we should
allocate larger memory amounts and keep similar objects in them to increase cache hits and thus improve the performance
of our application.

But when we allocate a larger memory object and bind it to multiple buffers (or images), not all of them can be
bound at offset zero. Only one can be bound at this offset; the others must be bound further away, after the space used by
the first buffer (or image). So the offset for the second buffer, and for all other buffers bound to the same memory object,
must meet the alignment requirements reported by the query. That's why the alignment member is important, and we must
remember it.

When our buffer is created and memory for it is allocated and bound, we can fill the buffer with data for vertex
attributes.

Uploading Vertex Data


We have created a buffer and we have bound a memory that is host visible. This means we can map this memory,
acquire a pointer to this memory, and use this pointer to copy data from our application to the buffer itself (similar to the
OpenGL’s glBufferData() function):
void *vertex_buffer_memory_pointer;
if( vkMapMemory( GetDevice(), Vulkan.VertexBuffer.Memory, 0,
    Vulkan.VertexBuffer.Size, 0, &vertex_buffer_memory_pointer ) != VK_SUCCESS ) {
  std::cout << "Could not map memory and upload data to a vertex buffer!" << std::endl;
  return false;
}

memcpy( vertex_buffer_memory_pointer, vertex_data, Vulkan.VertexBuffer.Size );

VkMappedMemoryRange flush_range = {
VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE, // VkStructureType sType
nullptr, // const void *pNext
Vulkan.VertexBuffer.Memory, // VkDeviceMemory memory
0, // VkDeviceSize offset
VK_WHOLE_SIZE // VkDeviceSize size
};
vkFlushMappedMemoryRanges( GetDevice(), 1, &flush_range );

vkUnmapMemory( GetDevice(), Vulkan.VertexBuffer.Memory );

return true;
12. Tutorial04.cpp, function CreateVertexBuffer()

To map memory, we call the vkMapMemory() function. In the call we must specify which memory object we want to
map and which region to access. The region is defined by an offset from the beginning of the memory object's storage and
by its size. After the successful call we acquire a pointer, which we can use to copy data from our application to the provided
memory address. Here we copy vertex data from an array with vertex positions and colors.

After the memory copy operation and before we unmap the memory (we don't need to unmap it; we can keep the pointer,
and this shouldn't impact performance), we need to tell the driver which parts of the memory were modified by our
operations. This operation is called flushing. Through it we specify all memory ranges that our application copied data to.
Ranges don't have to be contiguous. They are defined by an array of VkMappedMemoryRange elements, which contain
these fields:

 sType – Structure type, here equal to VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE.
 pNext – Pointer reserved for extensions.
 memory – Handle of a mapped and modified memory object.
 offset – Offset (from the beginning of a given memory object's storage) at which a given range starts.
 size – Size, in bytes, of an affected region. If the whole memory, from an offset to the end, was modified, we
can use the special value of VK_WHOLE_SIZE.

When we define all memory ranges that should be flushed, we can call the vkFlushMappedMemoryRanges() function.
After that, the driver will know which parts were modified and will reload them (that is, refresh the cache). Reloading usually
occurs on barriers. After modifying a buffer, we should set a buffer memory barrier, which tells the driver that some
operations influenced the buffer and it should be refreshed. But, fortunately, in this case such a barrier is placed implicitly
by the driver on submission of a command buffer that references the given buffer, and no additional operations are
required. Now we can use this buffer while recording rendering commands.

Rendering Resources Creation


We now must prepare the resources required for command buffer recording. In previous tutorials we recorded
one static command buffer for each swapchain image. Here we will reorganize the rendering code. We will still display a
simple, static scene, but the approach presented here is useful in real-life scenarios, where displayed scenes are dynamic.

To record command buffers and submit them to a queue in an efficient way, we need four types of resources: command
buffers, semaphores, fences, and framebuffers. Semaphores, as we already discussed, are used for internal queue
synchronization. Fences, on the other hand, allow the application to check whether some specific situation occurred, for
example, whether execution of a command buffer has finished after it was submitted to a queue. If necessary, the
application can wait on a fence until it is signaled. In general, semaphores are used to synchronize queues (GPU) and fences
are used to synchronize the application (CPU).

To render a single frame of animation we need (at least) one command buffer, two semaphores (one for swapchain
image acquisition, the "image available" semaphore, and one to signal that presentation may occur, the "rendering
finished" semaphore), a fence, and a framebuffer. The fence is used later to check whether we can rerecord a given
command buffer. We will keep several sets of such rendering resources; each set can be called a virtual frame. The number
of these virtual frames (each consisting of a command buffer, two semaphores, a fence, and a framebuffer) should be
independent of the number of swapchain images.

The rendering algorithm progresses like this: we record rendering commands into the first virtual frame's command
buffer and then submit it to a queue. Next we record another frame (command buffer) and submit it to the queue. We do
this until we run out of virtual frames. At this point we start reusing frames by taking the oldest (least recently submitted)
command buffer and rerecording it. Then we use another command buffer, and so on.

This is where the fences come in. We are not allowed to record a command buffer that has been submitted to a queue
until its execution in the queue is finished. During command buffer recording we could use the "simultaneous use" flag,
which allows us to record or resubmit a command buffer that has already been submitted, but this may impact
performance. A better way is to use fences and check whether a command buffer is no longer in use. If a graphics card is
still processing a command buffer, we can wait on the fence associated with it, or use this additional time for other
purposes, like improved AI calculations, and check again after some time to see whether the fence is signaled.

How many virtual frames should we have? One is not enough. When we record and submit a single command buffer,
we immediately wait until we can rerecord it. This wastes time on both the CPU and the GPU. The GPU is usually faster,
so waiting on the CPU causes more waiting on the GPU. We should keep the GPU as busy as possible; that is why thin APIs
like Vulkan were created. Using two virtual frames gives a huge performance gain, as there is much less waiting on both
the CPU and the GPU. Adding a third virtual frame gives an additional performance gain, but the increase isn't as big.
Using four or more groups of rendering resources doesn't make sense, as the performance gain is negligible (of course,
this may depend on the complexity of the rendered scene and on calculations performed by the CPU, like physics or AI).
When we increase the number of virtual frames we also increase the input lag, as we present a frame that's one to three
frames behind the CPU. So two or three virtual frames seem to be the most reasonable compromise between performance,
memory usage, and input lag.
You may wonder why the number of virtual frames shouldn't be tied to the number of swapchain images. Such a
connection may influence the behavior of our application. When we create a swapchain, we ask for a minimal required
number of images, but the driver is allowed to create more. So different hardware vendors may implement drivers that
offer different numbers of swapchain images, even for the same requirements (present mode and minimal number of
images). When we connect the number of virtual frames with the number of swapchain images, our application will use
only two virtual frames on one graphics card but four virtual frames on another. This may influence both performance
and the input lag mentioned earlier, which is not desired behavior. By keeping the number of virtual frames fixed, we
can control our rendering algorithm and fine-tune it to our needs, that is, balance the time spent on rendering against AI
or physics calculations.

Command Pool Creation


Before we can allocate a command buffer, we first need to create a command pool.
VkCommandPoolCreateInfo cmd_pool_create_info = {
  VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,        // VkStructureType            sType
  nullptr,                                           // const void                *pNext
  VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT |  // VkCommandPoolCreateFlags   flags
  VK_COMMAND_POOL_CREATE_TRANSIENT_BIT,
  queue_family_index                                 // uint32_t                   queueFamilyIndex
};

if( vkCreateCommandPool( GetDevice(), &cmd_pool_create_info, nullptr, pool ) != VK_SUCCESS ) {
  return false;
}
return true;
13. Tutorial04.cpp, function CreateCommandPool()

The command pool is created by calling vkCreateCommandPool(), which requires us to provide a pointer to a variable
of type VkCommandPoolCreateInfo. The code remains mostly unchanged compared to previous tutorials, but this time
two additional flags are used for command pool creation:

 VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT – Indicates that command buffers allocated
from this pool may be reset individually. Normally, without this flag, we can't rerecord the same command
buffer multiple times; it must be reset first, and command buffers allocated from one pool may be reset only
all at once. Specifying this flag allows us to reset command buffers individually, and (even better) this is done
implicitly by calling the vkBeginCommandBuffer() function.
 VK_COMMAND_POOL_CREATE_TRANSIENT_BIT – Tells the driver that command buffers allocated from this
pool will live for a short amount of time; they will be often recorded and reset (rerecorded). This information
helps the driver optimize command buffer allocation.

Command Buffer Allocation


Allocating command buffers remains the same as previously.
for( size_t i = 0; i < Vulkan.RenderingResources.size(); ++i ) {
  if( !AllocateCommandBuffers( Vulkan.CommandPool, 1,
      &Vulkan.RenderingResources[i].CommandBuffer ) ) {
    std::cout << "Could not allocate command buffer!" << std::endl;
    return false;
  }
}
return true;
14. Tutorial04.cpp, function CreateCommandBuffers()

The only change is that command buffers are gathered into a vector of rendering resources. Each rendering resource
structure contains a command buffer, image available semaphore, rendering finished semaphore, a fence and a
framebuffer. Command buffers are allocated in a loop. The number of elements in a rendering resources vector is chosen
arbitrarily. For this tutorial it is equal to three.

Semaphore Creation
The code responsible for creating a semaphore is simple and the same as previously shown:
VkSemaphoreCreateInfo semaphore_create_info = {
VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO, // VkStructureType sType
nullptr, // const void* pNext
0 // VkSemaphoreCreateFlags flags
};

for( size_t i = 0; i < Vulkan.RenderingResources.size(); ++i ) {
  if( (vkCreateSemaphore( GetDevice(), &semaphore_create_info, nullptr,
       &Vulkan.RenderingResources[i].ImageAvailableSemaphore ) != VK_SUCCESS) ||
      (vkCreateSemaphore( GetDevice(), &semaphore_create_info, nullptr,
       &Vulkan.RenderingResources[i].FinishedRenderingSemaphore ) != VK_SUCCESS) ) {
    std::cout << "Could not create semaphores!" << std::endl;
    return false;
  }
}
return true;
15. Tutorial04.cpp, function CreateSemaphores()

Fence Creation
Here is the code responsible for creating fence objects:
VkFenceCreateInfo fence_create_info = {
  VK_STRUCTURE_TYPE_FENCE_CREATE_INFO,  // VkStructureType     sType
  nullptr,                              // const void         *pNext
  VK_FENCE_CREATE_SIGNALED_BIT          // VkFenceCreateFlags  flags
};

for( size_t i = 0; i < Vulkan.RenderingResources.size(); ++i ) {
  if( vkCreateFence( GetDevice(), &fence_create_info, nullptr,
      &Vulkan.RenderingResources[i].Fence ) != VK_SUCCESS ) {
    std::cout << "Could not create a fence!" << std::endl;
    return false;
  }
}
return true;
16. Tutorial04.cpp, function CreateFences()

To create a fence object we call the vkCreateFence() function. It accepts, among other parameters, a pointer to a
variable of type VkFenceCreateInfo, which has the following members:

 sType – Type of the structure. Here it should be set to VK_STRUCTURE_TYPE_FENCE_CREATE_INFO.
 pNext – Pointer reserved for extensions.
 flags – Right now the only available flag allows us to create a fence that is already signaled.
A fence may have two states: signaled and unsignaled. The application checks whether a given fence is in a signaled
state, or it may wait on a fence until the fence gets signaled. Signaling is done by the GPU after all operations submitted
to the queue are processed. When we submit command buffers, we can provide a fence that will be signaled when a
queue has finished executing all commands that were issued in this one submit operation. After the fence is signaled, it is
the application’s responsibility to reset it to an unsignaled state.

Why create a fence that is already signaled? Our rendering algorithm will record commands to the first command
buffer, then to the second command buffer, after that to the third, and then once again to the first (after its execution in
a queue has ended). We use fences to check whether we can record a given command buffer once again. But what about
the first recording? We don’t want to keep separate code paths for the first command buffer recording and for the
following recording operations. So when we issue a command buffer recording for the first time, we also check whether
a fence is already signaled. But because we didn’t submit a given command buffer, the fence associated with it can’t
become signaled as a result of the finished execution. So the fence needs to be created in an already signaled state. This
way, for the first time, we won’t have to wait for it to become signaled (as it is already signaled), but after the check we
will reset it and immediately go to the recording code. After that we submit a command buffer and provide the same
fence, which will get signaled by the queue when operations are done. The next time, when we want to rerecord rendering
commands to the same command buffer, we can do the same operations: wait on the fence, reset it, and then start
command buffer recording.

Drawing
Now we are nearly ready to record rendering operations. We record each command buffer just before it is
submitted to the queue. We record one command buffer and submit it, then the next command buffer and submit it, then
yet another one. After that we take the first command buffer, check whether we can use it again, rerecord it, and submit
it to the queue.
static size_t resource_index = 0;
RenderingResourcesData &current_rendering_resource =
  Vulkan.RenderingResources[resource_index];
VkSwapchainKHR swap_chain = GetSwapChain().Handle;
uint32_t image_index;

resource_index = (resource_index + 1) % VulkanTutorial04Parameters::ResourcesCount;

if( vkWaitForFences( GetDevice(), 1, &current_rendering_resource.Fence, VK_FALSE,
    1000000000 ) != VK_SUCCESS ) {
  std::cout << "Waiting for fence takes too long!" << std::endl;
  return false;
}
vkResetFences( GetDevice(), 1, &current_rendering_resource.Fence );

VkResult result = vkAcquireNextImageKHR( GetDevice(), swap_chain, UINT64_MAX,
  current_rendering_resource.ImageAvailableSemaphore, VK_NULL_HANDLE, &image_index );
switch( result ) {
  case VK_SUCCESS:
  case VK_SUBOPTIMAL_KHR:
    break;
  case VK_ERROR_OUT_OF_DATE_KHR:
    return OnWindowSizeChanged();
  default:
    std::cout << "Problem occurred during swap chain image acquisition!" << std::endl;
    return false;
}

if( !PrepareFrame( current_rendering_resource.CommandBuffer,
    GetSwapChain().Images[image_index], current_rendering_resource.Framebuffer ) ) {
  return false;
}

VkPipelineStageFlags wait_dst_stage_mask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
VkSubmitInfo submit_info = {
  VK_STRUCTURE_TYPE_SUBMIT_INFO,                          // VkStructureType             sType
  nullptr,                                                // const void                 *pNext
  1,                                                      // uint32_t                    waitSemaphoreCount
  &current_rendering_resource.ImageAvailableSemaphore,    // const VkSemaphore          *pWaitSemaphores
  &wait_dst_stage_mask,                                   // const VkPipelineStageFlags *pWaitDstStageMask
  1,                                                      // uint32_t                    commandBufferCount
  &current_rendering_resource.CommandBuffer,              // const VkCommandBuffer      *pCommandBuffers
  1,                                                      // uint32_t                    signalSemaphoreCount
  &current_rendering_resource.FinishedRenderingSemaphore  // const VkSemaphore          *pSignalSemaphores
};

if( vkQueueSubmit( GetGraphicsQueue().Handle, 1, &submit_info,
    current_rendering_resource.Fence ) != VK_SUCCESS ) {
  return false;
}

VkPresentInfoKHR present_info = {
  VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,                     // VkStructureType             sType
  nullptr,                                                // const void                 *pNext
  1,                                                      // uint32_t                    waitSemaphoreCount
  &current_rendering_resource.FinishedRenderingSemaphore, // const VkSemaphore          *pWaitSemaphores
  1,                                                      // uint32_t                    swapchainCount
  &swap_chain,                                            // const VkSwapchainKHR       *pSwapchains
  &image_index,                                           // const uint32_t             *pImageIndices
  nullptr                                                 // VkResult                   *pResults
};
result = vkQueuePresentKHR( GetPresentQueue().Handle, &present_info );

switch( result ) {
  case VK_SUCCESS:
    break;
  case VK_ERROR_OUT_OF_DATE_KHR:
  case VK_SUBOPTIMAL_KHR:
    return OnWindowSizeChanged();
  default:
    std::cout << "Problem occurred during image presentation!" << std::endl;
    return false;
}

return true;
17. Tutorial04.cpp, function Draw()
So first we take the least recently used set of rendering resources. Then we wait until the fence associated with this set
is signaled. If it is, this means that we can safely take the command buffer and rerecord it. It also means that we can reuse
the semaphores used to acquire and present the image that was referenced in the given command buffer. We shouldn't
use the same semaphore for different purposes or in two different submit operations until the previous submission is
finished. The fences prevent us from altering command buffers and semaphores that are still in use, and, as you will soon
see, framebuffers too.

When the fence is signaled, we reset it and perform the normal drawing-related operations: we acquire an image,
record operations that render into the acquired image, submit the command buffer, and present the image.

After that we take the next set of rendering resources and perform the same operations. Thanks to keeping three
groups of rendering resources, three virtual frames, we lower the time wasted waiting for a fence to be signaled.

Recording a Command Buffer


The function responsible for recording a command buffer is quite long. This time it is even longer, because we use a
vertex buffer and a dynamic viewport and scissor test. And we also create temporary framebuffers!

Framebuffer creation is simple and fast. Keeping framebuffer objects along with a swapchain means that we need to
recreate them whenever the swapchain is recreated. If our rendering algorithm is complicated, we may have multiple
images and framebuffers associated with them. If those images need to have the same size as the swapchain images, we
need to recreate all of them (to account for a potential size change). So it is better and more convenient to create
framebuffers on demand. This way, they always have the desired size. Framebuffers operate on image views, which are
created for a given, specific image. When a swapchain is recreated, the old images become invalid and no longer exist,
so we must recreate both the image views and the framebuffers.

In the “03 – First Triangle” tutorial, we had framebuffers of a fixed size, and they had to be recreated along with the
swapchain. Now each of our virtual frames' resource groups contains a framebuffer object. Before we record a
command buffer, we create a framebuffer for the image into which we will be rendering, with the same size as that image.
This way, when the swapchain is recreated, the size of the next frame is immediately adjusted, and the handle of the new
swapchain image and its image view are used to create the framebuffer.

When we record a command buffer that uses a render pass and framebuffer objects, the framebuffer must remain
valid for the whole time the command buffer is processed by the queue. So when we create a new framebuffer, we can't
destroy the old one until the commands submitted to the queue are finished. But because we are using fences, and we
have already waited on the fence associated with the given command buffer, we are sure that the framebuffer can be
safely destroyed. We then create a new framebuffer to account for potential size and image handle changes.
if( framebuffer != VK_NULL_HANDLE ) {
  vkDestroyFramebuffer( GetDevice(), framebuffer, nullptr );
}

VkFramebufferCreateInfo framebuffer_create_info = {
  VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO,  // VkStructureType            sType
  nullptr,                                    // const void                *pNext
  0,                                          // VkFramebufferCreateFlags   flags
  Vulkan.RenderPass,                          // VkRenderPass               renderPass
  1,                                          // uint32_t                   attachmentCount
  &image_view,                                // const VkImageView         *pAttachments
  GetSwapChain().Extent.width,                // uint32_t                   width
  GetSwapChain().Extent.height,               // uint32_t                   height
  1                                           // uint32_t                   layers
};

if( vkCreateFramebuffer( GetDevice(), &framebuffer_create_info, nullptr,
    &framebuffer ) != VK_SUCCESS ) {
  std::cout << "Could not create a framebuffer!" << std::endl;
  return false;
}

return true;
18. Tutorial04.cpp, function CreateFramebuffer()

When we create a framebuffer, we take the current swapchain extent and the image view of the acquired swapchain image.

Next we start recording a command buffer:


if( !CreateFramebuffer( framebuffer, image_parameters.View ) ) {
  return false;
}

VkCommandBufferBeginInfo command_buffer_begin_info = {
  VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,  // VkStructureType                       sType
  nullptr,                                      // const void                           *pNext
  VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT,  // VkCommandBufferUsageFlags             flags
  nullptr                                       // const VkCommandBufferInheritanceInfo *pInheritanceInfo
};

vkBeginCommandBuffer( command_buffer, &command_buffer_begin_info );

VkImageSubresourceRange image_subresource_range = {
  VK_IMAGE_ASPECT_COLOR_BIT,  // VkImageAspectFlags  aspectMask
  0,                          // uint32_t            baseMipLevel
  1,                          // uint32_t            levelCount
  0,                          // uint32_t            baseArrayLayer
  1                           // uint32_t            layerCount
};

if( GetPresentQueue().Handle != GetGraphicsQueue().Handle ) {
  VkImageMemoryBarrier barrier_from_present_to_draw = {
    VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,  // VkStructureType          sType
    nullptr,                                 // const void              *pNext
    VK_ACCESS_MEMORY_READ_BIT,               // VkAccessFlags            srcAccessMask
    VK_ACCESS_MEMORY_READ_BIT,               // VkAccessFlags            dstAccessMask
    VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,         // VkImageLayout            oldLayout
    VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,         // VkImageLayout            newLayout
    GetPresentQueue().FamilyIndex,           // uint32_t                 srcQueueFamilyIndex
    GetGraphicsQueue().FamilyIndex,          // uint32_t                 dstQueueFamilyIndex
    image_parameters.Handle,                 // VkImage                  image
    image_subresource_range                  // VkImageSubresourceRange  subresourceRange
  };
  vkCmdPipelineBarrier( command_buffer,
    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, 0, 0, nullptr, 0, nullptr, 1,
    &barrier_from_present_to_draw );
}

VkClearValue clear_value = {
  { 1.0f, 0.8f, 0.4f, 0.0f },  // VkClearColorValue  color
};

VkRenderPassBeginInfo render_pass_begin_info = {
  VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,  // VkStructureType     sType
  nullptr,                                   // const void         *pNext
  Vulkan.RenderPass,                         // VkRenderPass        renderPass
  framebuffer,                               // VkFramebuffer       framebuffer
  {                                          // VkRect2D            renderArea
    {                                        // VkOffset2D          offset
      0,                                     // int32_t             x
      0                                      // int32_t             y
    },
    GetSwapChain().Extent                    // VkExtent2D          extent
  },
  1,                                         // uint32_t            clearValueCount
  &clear_value                               // const VkClearValue *pClearValues
};

vkCmdBeginRenderPass( command_buffer, &render_pass_begin_info, VK_SUBPASS_CONTENTS_INLINE );
19. Tutorial04.cpp, function PrepareFrame()

First we define a variable of type VkCommandBufferBeginInfo and specify that the command buffer will be submitted
only once. When we specify the VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT flag, we can't submit the given
command buffer more than once; after each submission it must be reset. But the recording operation resets it implicitly,
thanks to the VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT flag used during command pool creation.
Next we define the subresource range for image memory barriers. The layout transitions of the swapchain images are
performed implicitly inside the render pass, but if the graphics and presentation queues are different, the queue ownership
transfer must be performed manually.

After that we begin a render pass with the temporary framebuffer object.
vkCmdBindPipeline( command_buffer, VK_PIPELINE_BIND_POINT_GRAPHICS,
  Vulkan.GraphicsPipeline );

VkViewport viewport = {
  0.0f,                                              // float  x
  0.0f,                                              // float  y
  static_cast<float>(GetSwapChain().Extent.width),   // float  width
  static_cast<float>(GetSwapChain().Extent.height),  // float  height
  0.0f,                                              // float  minDepth
  1.0f                                               // float  maxDepth
};

VkRect2D scissor = {
  {                                                  // VkOffset2D  offset
    0,                                               // int32_t     x
    0                                                // int32_t     y
  },
  {                                                  // VkExtent2D  extent
    GetSwapChain().Extent.width,                     // uint32_t    width
    GetSwapChain().Extent.height                     // uint32_t    height
  }
};

vkCmdSetViewport( command_buffer, 0, 1, &viewport );
vkCmdSetScissor( command_buffer, 0, 1, &scissor );

VkDeviceSize offset = 0;
vkCmdBindVertexBuffers( command_buffer, 0, 1, &Vulkan.VertexBuffer.Handle, &offset );

vkCmdDraw( command_buffer, 4, 1, 0, 0 );

vkCmdEndRenderPass( command_buffer );

if( GetGraphicsQueue().Handle != GetPresentQueue().Handle ) {
  VkImageMemoryBarrier barrier_from_draw_to_present = {
    VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,  // VkStructureType          sType
    nullptr,                                 // const void              *pNext
    VK_ACCESS_MEMORY_READ_BIT,               // VkAccessFlags            srcAccessMask
    VK_ACCESS_MEMORY_READ_BIT,               // VkAccessFlags            dstAccessMask
    VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,         // VkImageLayout            oldLayout
    VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,         // VkImageLayout            newLayout
    GetGraphicsQueue().FamilyIndex,          // uint32_t                 srcQueueFamilyIndex
    GetPresentQueue().FamilyIndex,           // uint32_t                 dstQueueFamilyIndex
    image_parameters.Handle,                 // VkImage                  image
    image_subresource_range                  // VkImageSubresourceRange  subresourceRange
  };
  vkCmdPipelineBarrier( command_buffer,
    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, 0,
    0, nullptr, 0, nullptr, 1, &barrier_from_draw_to_present );
}

if( vkEndCommandBuffer( command_buffer ) != VK_SUCCESS ) {
  std::cout << "Could not record command buffer!" << std::endl;
  return false;
}
return true;
20. Tutorial04.cpp, function PrepareFrame()

Next we bind the graphics pipeline. It has two states marked as dynamic: the viewport and the scissor test. So we prepare
structures that define the viewport and scissor test parameters. The dynamic viewport state is set by calling the
vkCmdSetViewport() function, and the dynamic scissor test by calling the vkCmdSetScissor() function. This way, our
graphics pipeline can be used for rendering into images of different sizes.

The last thing before we can draw anything is to bind the appropriate vertex buffer, providing the data for vertex
attributes. We do this through the vkCmdBindVertexBuffers() function call. We specify a binding number (which set of
vertex attributes should take data from this buffer), a pointer to a buffer handle (or more handles if we want to bind
buffers for multiple bindings), and an offset. A nonzero offset means that data for vertex attributes should be taken from
further into the buffer. The offset can't be larger than the size of the corresponding buffer (the buffer, not the memory
object bound to it).

Now we have specified all the required elements: framebuffer, viewport and scissor test, and a vertex buffer. We can
draw the geometry, finish the render pass, and end the command buffer.

Tutorial04 Execution
Here is the result of the rendering operations:
We are rendering a quad with a different color in each corner. Try resizing the window; previously, the triangle always
stayed the same size, and only the black frame on the right and bottom sides of the application window grew larger or
smaller. Now, thanks to the dynamic viewport state, the quad grows or shrinks along with the window.

Cleaning Up
After rendering and before closing the application, we should destroy all resources. Here is the code responsible for
this operation:
if( GetDevice() != VK_NULL_HANDLE ) {
  vkDeviceWaitIdle( GetDevice() );

  for( size_t i = 0; i < Vulkan.RenderingResources.size(); ++i ) {
    if( Vulkan.RenderingResources[i].Framebuffer != VK_NULL_HANDLE ) {
      vkDestroyFramebuffer( GetDevice(), Vulkan.RenderingResources[i].Framebuffer, nullptr );
    }
    if( Vulkan.RenderingResources[i].CommandBuffer != VK_NULL_HANDLE ) {
      vkFreeCommandBuffers( GetDevice(), Vulkan.CommandPool, 1,
        &Vulkan.RenderingResources[i].CommandBuffer );
    }
    if( Vulkan.RenderingResources[i].ImageAvailableSemaphore != VK_NULL_HANDLE ) {
      vkDestroySemaphore( GetDevice(),
        Vulkan.RenderingResources[i].ImageAvailableSemaphore, nullptr );
    }
    if( Vulkan.RenderingResources[i].FinishedRenderingSemaphore != VK_NULL_HANDLE ) {
      vkDestroySemaphore( GetDevice(),
        Vulkan.RenderingResources[i].FinishedRenderingSemaphore, nullptr );
    }
    if( Vulkan.RenderingResources[i].Fence != VK_NULL_HANDLE ) {
      vkDestroyFence( GetDevice(), Vulkan.RenderingResources[i].Fence, nullptr );
    }
  }

  if( Vulkan.CommandPool != VK_NULL_HANDLE ) {
    vkDestroyCommandPool( GetDevice(), Vulkan.CommandPool, nullptr );
    Vulkan.CommandPool = VK_NULL_HANDLE;
  }

  if( Vulkan.VertexBuffer.Handle != VK_NULL_HANDLE ) {
    vkDestroyBuffer( GetDevice(), Vulkan.VertexBuffer.Handle, nullptr );
    Vulkan.VertexBuffer.Handle = VK_NULL_HANDLE;
  }

  if( Vulkan.VertexBuffer.Memory != VK_NULL_HANDLE ) {
    vkFreeMemory( GetDevice(), Vulkan.VertexBuffer.Memory, nullptr );
    Vulkan.VertexBuffer.Memory = VK_NULL_HANDLE;
  }

  if( Vulkan.GraphicsPipeline != VK_NULL_HANDLE ) {
    vkDestroyPipeline( GetDevice(), Vulkan.GraphicsPipeline, nullptr );
    Vulkan.GraphicsPipeline = VK_NULL_HANDLE;
  }

  if( Vulkan.RenderPass != VK_NULL_HANDLE ) {
    vkDestroyRenderPass( GetDevice(), Vulkan.RenderPass, nullptr );
    Vulkan.RenderPass = VK_NULL_HANDLE;
  }
}
21. Tutorial04.cpp, function ChildClear()

We destroy all resources after the device completes processing all commands submitted to all its queues. We destroy
resources in reverse order. First we destroy all rendering resources: framebuffers, command buffers, semaphores, and
fences. Fences are destroyed by calling the vkDestroyFence() function. Then the command pool is destroyed. After that
we destroy the buffer by calling the vkDestroyBuffer() function and free the memory object by calling the vkFreeMemory()
function. Finally, the pipeline object and the render pass are destroyed.

Conclusion
This tutorial is based on the “03 – First Triangle” tutorial. We improved rendering by using vertex attributes in a
graphics pipeline and vertex buffers bound during command buffer recording. We described the number and layout of
vertex attributes. We introduced dynamic pipeline states for the viewport and scissor test. We learned how to create
buffers and memory objects and how to bind one to another. We also mapped memory and uploaded data from the CPU to
the GPU.

We have created a set of rendering resources that allow us to efficiently record and issue rendering commands. These
resources consisted of command buffers, semaphores, fences, and framebuffers. We learned how to use fences, how to
set up values of dynamic pipeline states, and how to bind vertex buffers (source of vertex attribute data) during command
buffer recording.

The next tutorial will present staging resources. These are intermediate buffers used to copy data between the CPU
and GPU. This way, buffers (or images) used for rendering don’t have to be mapped by an application and can be bound
to a device’s local (very fast) memory.

Notices

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware,
software or service activation. Performance varies depending on system configuration. Check with your system
manufacturer or retailer or learn more at intel.com.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this
document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of
merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of
performance, course of dealing, or usage in trade.

This document contains information on products, services and/or processes in development. All information provided
here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule,
specifications and roadmaps.

The products and services described may contain defects or errors known as errata which may cause deviations from
published specifications. Current characterized errata are available on request.

Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-
800-548-4725 or by visiting www.intel.com/design/literature.htm.

This sample source code is released under the Intel Sample Source Code License Agreement.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

© 2016 Intel Corporation


API without Secrets: Introduction to Vulkan*
Part 5
Table of Contents
Tutorial 5: Staging Resources – Copying Data Between Buffers
Creating Rendering Resources
Buffer Creation
Vertex Buffer Creation
Staging Buffer Creation
Copying Data Between Buffers
Setting a Buffer Memory Barrier
Tutorial05 Execution
Cleaning Up
Conclusion
Tutorial 5: Staging Resources – Copying Data between Buffers
In this part of the tutorial we will focus on improving performance. At the same time, we will prepare for the next
tutorial, in which we introduce images and descriptors (shader resources). Using the knowledge we gather here, it will be
easier for us to follow the next part and squeeze as much performance as possible from our graphics hardware.

What are “staging resources” or “staging buffers”? They are intermediate or temporary resources used to transfer
data from an application (CPU) to a graphics card’s memory (GPU). We need them to increase our application’s
performance.

In Part 4 of the tutorial we learned how to use buffers, bind them to a host-visible memory, map this memory, and
transfer data from the CPU to the GPU. This approach is easy and convenient for us, but we need to know that host-visible
parts of a graphics card’s memory aren’t the most efficient. Typically, they are much slower than the parts of the memory
that are not directly accessible to the application (cannot be mapped by an application). This causes our application to
execute in a sub-optimal way.

One solution to this problem is to always use device-local memory for all resources involved in a rendering process.
But as device-local memory isn’t accessible for an application, we cannot directly transfer any data from the CPU to such
memory. That’s why we need intermediate, or staging, resources.

In this part of the tutorial we will bind the buffer with vertex attribute data to the device-local memory. And we will
use the staging buffer to mediate the transfer of data from the CPU to the vertex buffer.

Again, only the differences between this tutorial and the previous tutorial (Part 4) are described.

Creating Rendering Resources


This time I have moved rendering resource creation to the beginning of our code. Later we will need to record and
submit a command buffer to transfer data from the staging resource to the vertex buffer. I have also refactored the
rendering resource creation code to eliminate multiple loops and replace them with a single loop, in which we create all
resources that compose a virtual frame.
bool Tutorial05::CreateRenderingResources() {
  if( !CreateCommandPool( GetGraphicsQueue().FamilyIndex, &Vulkan.CommandPool ) ) {
    return false;
  }

  for( size_t i = 0; i < Vulkan.RenderingResources.size(); ++i ) {
    if( !AllocateCommandBuffers( Vulkan.CommandPool, 1, &Vulkan.RenderingResources[i].CommandBuffer ) ) {
      return false;
    }

    if( !CreateSemaphore( &Vulkan.RenderingResources[i].ImageAvailableSemaphore ) ) {
      return false;
    }

    if( !CreateSemaphore( &Vulkan.RenderingResources[i].FinishedRenderingSemaphore ) ) {
      return false;
    }

    if( !CreateFence( VK_FENCE_CREATE_SIGNALED_BIT, &Vulkan.RenderingResources[i].Fence ) ) {
      return false;
    }
  }
  return true;
}

bool Tutorial05::CreateCommandPool( uint32_t queue_family_index, VkCommandPool *pool ) {
  VkCommandPoolCreateInfo cmd_pool_create_info = {
    VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,       // VkStructureType                sType
    nullptr,                                          // const void                    *pNext
    VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT | // VkCommandPoolCreateFlags       flags
    VK_COMMAND_POOL_CREATE_TRANSIENT_BIT,
    queue_family_index                                // uint32_t                       queueFamilyIndex
  };

  if( vkCreateCommandPool( GetDevice(), &cmd_pool_create_info, nullptr, pool ) != VK_SUCCESS ) {
    std::cout << "Could not create command pool!" << std::endl;
    return false;
  }
  return true;
}

bool Tutorial05::AllocateCommandBuffers( VkCommandPool pool, uint32_t count, VkCommandBuffer *command_buffers ) {
  VkCommandBufferAllocateInfo command_buffer_allocate_info = {
    VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO,   // VkStructureType                sType
    nullptr,                                          // const void                    *pNext
    pool,                                             // VkCommandPool                  commandPool
    VK_COMMAND_BUFFER_LEVEL_PRIMARY,                  // VkCommandBufferLevel           level
    count                                             // uint32_t                       commandBufferCount
  };

  if( vkAllocateCommandBuffers( GetDevice(), &command_buffer_allocate_info, command_buffers ) != VK_SUCCESS ) {
    std::cout << "Could not allocate command buffer!" << std::endl;
    return false;
  }
  return true;
}

bool Tutorial05::CreateSemaphore( VkSemaphore *semaphore ) {
  VkSemaphoreCreateInfo semaphore_create_info = {
    VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO,          // VkStructureType                sType
    nullptr,                                          // const void                    *pNext
    0                                                 // VkSemaphoreCreateFlags         flags
  };

  if( vkCreateSemaphore( GetDevice(), &semaphore_create_info, nullptr, semaphore ) != VK_SUCCESS ) {
    std::cout << "Could not create semaphore!" << std::endl;
    return false;
  }
  return true;
}

bool Tutorial05::CreateFence( VkFenceCreateFlags flags, VkFence *fence ) {
  VkFenceCreateInfo fence_create_info = {
    VK_STRUCTURE_TYPE_FENCE_CREATE_INFO,              // VkStructureType                sType
    nullptr,                                          // const void                    *pNext
    flags                                             // VkFenceCreateFlags             flags
  };

  if( vkCreateFence( GetDevice(), &fence_create_info, nullptr, fence ) != VK_SUCCESS ) {
    std::cout << "Could not create a fence!" << std::endl;
    return false;
  }
  return true;
}
1. Tutorial05.cpp

First we create a command pool, for which we indicate that command buffers allocated from it will be short-lived. In
our case, all command buffers will be submitted only once before rerecording.

Next we iterate over an arbitrarily chosen number of virtual frames. In this code example, the number of virtual frames
is three. Inside the loop, for each virtual frame, we allocate one command buffer and create two semaphores (one for
image acquisition and a second to indicate that frame rendering is done) and a fence. Framebuffer creation is done inside
a drawing function, just before command buffer recording.

This is the same set of rendering resources used in Part 4, where you can find a more thorough explanation of what is
going on in the code. I will also skip render pass and graphics pipeline creation. They are created in exactly the same way
they were created previously. Since nothing has changed here, we will jump directly to buffer creation.

Buffer creation
Here is our general code used for buffer creation:
VkBufferCreateInfo buffer_create_info = {
  VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO,             // VkStructureType                sType
  nullptr,                                          // const void                    *pNext
  0,                                                // VkBufferCreateFlags            flags
  buffer.Size,                                      // VkDeviceSize                   size
  usage,                                            // VkBufferUsageFlags             usage
  VK_SHARING_MODE_EXCLUSIVE,                        // VkSharingMode                  sharingMode
  0,                                                // uint32_t                       queueFamilyIndexCount
  nullptr                                           // const uint32_t                *pQueueFamilyIndices
};

if( vkCreateBuffer( GetDevice(), &buffer_create_info, nullptr, &buffer.Handle ) != VK_SUCCESS ) {
  std::cout << "Could not create buffer!" << std::endl;
  return false;
}

if( !AllocateBufferMemory( buffer.Handle, memoryProperty, &buffer.Memory ) ) {
  std::cout << "Could not allocate memory for a buffer!" << std::endl;
  return false;
}

if( vkBindBufferMemory( GetDevice(), buffer.Handle, buffer.Memory, 0 ) != VK_SUCCESS ) {
  std::cout << "Could not bind memory to a buffer!" << std::endl;
  return false;
}

return true;
2. Tutorial05.cpp, function CreateBuffer()

The code is wrapped into a CreateBuffer() function, which accepts the buffer’s usage, size, and requested memory
properties. To create a buffer we need to prepare a variable of type VkBufferCreateInfo. It is a structure that contains the
following members:

 sType – Standard type of the structure. Here it should be equal to VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO.
 pNext – Pointer reserved for extensions.
 flags – Parameter describing additional properties of the buffer. Right now we can only specify that the buffer
can be backed by a sparse memory.
 size – Size of the buffer (in bytes).
 usage – Bitfield indicating intended usages of the buffer.
 sharingMode – Queue sharing mode.
 queueFamilyIndexCount – Number of different queue families that will access the buffer in case of a
concurrent sharing mode.
 pQueueFamilyIndices – Array with indices of all queue families that will access the buffer when concurrent
sharing mode is used.

Right now we are not interested in binding a sparse memory. We do not want to share the buffer between different
device queues, so sharingMode, queueFamilyIndexCount, and pQueueFamilyIndices parameters are irrelevant. The most
important parameters are size and usage. We are not allowed to use a buffer in a way that is not specified during buffer
creation. Finally, we need to create a buffer that is large enough to contain our data.

To create a buffer we call the vkCreateBuffer() function, which when successful stores the buffer handle in a variable
we provided the address of. But creating a buffer is not enough. A buffer, after creation, doesn’t have any storage. We
need to bind a memory object (or part of it) to the buffer to back its storage. Or, if we don’t have any memory objects, we
need to allocate one.

Each buffer’s usage may have a different memory requirement, which is relevant when we want to allocate a memory
object and bind it to the buffer. Here is a code sample that allocates a memory object for a given buffer:
VkMemoryRequirements buffer_memory_requirements;
vkGetBufferMemoryRequirements( GetDevice(), buffer, &buffer_memory_requirements );

VkPhysicalDeviceMemoryProperties memory_properties;
vkGetPhysicalDeviceMemoryProperties( GetPhysicalDevice(), &memory_properties );

for( uint32_t i = 0; i < memory_properties.memoryTypeCount; ++i ) {
  if( (buffer_memory_requirements.memoryTypeBits & (1 << i)) &&
      ((memory_properties.memoryTypes[i].propertyFlags & property) == property) ) {

    VkMemoryAllocateInfo memory_allocate_info = {
      VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,       // VkStructureType                sType
      nullptr,                                      // const void                    *pNext
      buffer_memory_requirements.size,              // VkDeviceSize                   allocationSize
      i                                             // uint32_t                       memoryTypeIndex
    };

    if( vkAllocateMemory( GetDevice(), &memory_allocate_info, nullptr, memory ) == VK_SUCCESS ) {
      return true;
    }
  }
}
return false;
3. Tutorial05.cpp, function AllocateBufferMemory()

As in Part 4, we first check the memory requirements of the given buffer. After that we check the properties of the
memory available in a given physical device. This includes information about the number of memory heaps and their
capabilities.

Next we iterate over each available memory type and check whether it is compatible with the requirements queried for
the given buffer. We also check if a given memory type supports our additional, requested properties, for example, whether
a given memory type is host-visible. When we find a match, we fill in a VkMemoryAllocateInfo structure and call the
vkAllocateMemory() function.

The allocated memory object is then bound to our buffer, and from now on we can safely use this buffer in our
application.

Vertex Buffer Creation


The first buffer we want to create is a vertex buffer. It stores data for vertex attributes that are used during rendering.
In this example we store position and color for four vertices of a quad. The most important change from the previous
tutorial is the use of a device-local memory instead of a host-visible memory. Device-local memory is much faster, but we
can’t copy any data directly from the application to device-local memory. We need to use a staging buffer, from which we
copy data to the vertex buffer.

We also need to specify two different usages for this buffer. The first is a vertex buffer usage, which means that we
want to use the given buffer as a vertex buffer from which data for the vertex attributes will be fetched. The second is
transfer dst usage, which means that we will copy data to this buffer. It will be used as a destination of any transfer (copy)
operation.

The code that creates a buffer with all these requirements looks like this:
const std::vector<float>& vertex_data = GetVertexData();

Vulkan.VertexBuffer.Size = static_cast<uint32_t>(vertex_data.size() * sizeof(vertex_data[0]));
if( !CreateBuffer( VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT,
                   VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, Vulkan.VertexBuffer ) ) {
  std::cout << "Could not create vertex buffer!" << std::endl;
  return false;
}

return true;
4. Tutorial05.cpp, function CreateVertexBuffer()

At the beginning we get the vertex data (hard-coded in a GetVertexData() function) to check how much space we need
to hold values for all our vertices. After that we call a CreateBuffer() function presented earlier to create a vertex buffer
and bind a device-local memory to it.

Staging Buffer Creation


Next we can create an intermediate staging buffer. This buffer is not used during the rendering process so it can be
bound to a slower, host-visible memory. This way we can map it and copy data directly from the application. After that
we can copy data from the staging buffer to any other buffer (or even image) that is bound to device-local memory. This
way all resources that are used for rendering purposes are bound to the fastest available memory. We just need additional
operations for the data transfer.

Here is the code that creates a staging buffer:
Vulkan.StagingBuffer.Size = 4000;
if( !CreateBuffer( VK_BUFFER_USAGE_TRANSFER_SRC_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT,
                   Vulkan.StagingBuffer ) ) {
  std::cout << "Could not create staging buffer!" << std::endl;
  return false;
}

return true;
5. Tutorial05.cpp, function CreateStagingBuffer()

We will copy data from this buffer to other resources, so we must specify the transfer src usage for it (it will be used as
a source for transfer operations). We would also like to map it and copy data to it directly from the application. For this we
need to use a host-visible memory, which is why we specify this memory property. The buffer's size is chosen arbitrarily,
but it must be large enough to hold the vertex data. In real-life scenarios we should try to reuse the staging buffer as many
times as possible, so its size should be big enough to cover most of the data transfer operations in our application. Of
course, if we want to perform many transfer operations at the same time, we have to create multiple staging buffers.

Copying Data between Buffers


We have created two buffers: one for the vertex attribute data and the other to act as an intermediate buffer. Now
we need to copy data from the CPU to the GPU. To do this we need to map the staging buffer and acquire a pointer that
we can use to upload data to the graphics hardware's memory. After that we need to record and submit a command
buffer that will copy the vertex data from the staging buffer to the vertex buffer. And as all of our command buffers used
for virtual frames and rendering are marked as short-lived, we can safely use one of them for this operation.

First let’s see what our data for vertex attributes looks like:
static const std::vector<float> vertex_data = {
  -0.7f, -0.7f, 0.0f, 1.0f,   // position
   1.0f,  0.0f, 0.0f, 0.0f,   // color
  //
  -0.7f,  0.7f, 0.0f, 1.0f,
   0.0f,  1.0f, 0.0f, 0.0f,
  //
   0.7f, -0.7f, 0.0f, 1.0f,
   0.0f,  0.0f, 1.0f, 0.0f,
  //
   0.7f,  0.7f, 0.0f, 1.0f,
   0.3f,  0.3f, 0.3f, 0.0f
};
return vertex_data;
6. Tutorial05.cpp, function GetVertexData()

It is a simple, hard-coded array of floating point values. Data for each vertex contains four components for position
attribute and four components for color attribute. As we render a quad, we have four pairs of such attributes.

Here is the code that copies data from the application to the staging buffer and after that from the staging buffer to
the vertex buffer:
// Prepare data in a staging buffer
const std::vector<float>& vertex_data = GetVertexData();

void *staging_buffer_memory_pointer;
if( vkMapMemory( GetDevice(), Vulkan.StagingBuffer.Memory, 0, Vulkan.VertexBuffer.Size, 0,
    &staging_buffer_memory_pointer ) != VK_SUCCESS ) {
  std::cout << "Could not map memory and upload data to a staging buffer!" << std::endl;
  return false;
}

memcpy( staging_buffer_memory_pointer, &vertex_data[0], Vulkan.VertexBuffer.Size );

VkMappedMemoryRange flush_range = {
  VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE,            // VkStructureType                sType
  nullptr,                                          // const void                    *pNext
  Vulkan.StagingBuffer.Memory,                      // VkDeviceMemory                 memory
  0,                                                // VkDeviceSize                   offset
  Vulkan.VertexBuffer.Size                          // VkDeviceSize                   size
};
vkFlushMappedMemoryRanges( GetDevice(), 1, &flush_range );

vkUnmapMemory( GetDevice(), Vulkan.StagingBuffer.Memory );

// Prepare command buffer to copy data from staging buffer to a vertex buffer
VkCommandBufferBeginInfo command_buffer_begin_info = {
  VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,      // VkStructureType                sType
  nullptr,                                          // const void                    *pNext
  VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT,      // VkCommandBufferUsageFlags      flags
  nullptr                                           // const VkCommandBufferInheritanceInfo *pInheritanceInfo
};

VkCommandBuffer command_buffer = Vulkan.RenderingResources[0].CommandBuffer;

vkBeginCommandBuffer( command_buffer, &command_buffer_begin_info );

VkBufferCopy buffer_copy_info = {
  0,                                                // VkDeviceSize                   srcOffset
  0,                                                // VkDeviceSize                   dstOffset
  Vulkan.VertexBuffer.Size                          // VkDeviceSize                   size
};
vkCmdCopyBuffer( command_buffer, Vulkan.StagingBuffer.Handle, Vulkan.VertexBuffer.Handle, 1,
  &buffer_copy_info );

VkBufferMemoryBarrier buffer_memory_barrier = {
  VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER,          // VkStructureType                sType
  nullptr,                                          // const void                    *pNext
  VK_ACCESS_MEMORY_WRITE_BIT,                       // VkAccessFlags                  srcAccessMask
  VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT,              // VkAccessFlags                  dstAccessMask
  VK_QUEUE_FAMILY_IGNORED,                          // uint32_t                       srcQueueFamilyIndex
  VK_QUEUE_FAMILY_IGNORED,                          // uint32_t                       dstQueueFamilyIndex
  Vulkan.VertexBuffer.Handle,                       // VkBuffer                       buffer
  0,                                                // VkDeviceSize                   offset
  VK_WHOLE_SIZE                                     // VkDeviceSize                   size
};
vkCmdPipelineBarrier( command_buffer, VK_PIPELINE_STAGE_TRANSFER_BIT,
  VK_PIPELINE_STAGE_VERTEX_INPUT_BIT, 0, 0, nullptr, 1, &buffer_memory_barrier, 0, nullptr );

vkEndCommandBuffer( command_buffer );

// Submit command buffer and copy data from staging buffer to a vertex buffer
VkSubmitInfo submit_info = {
  VK_STRUCTURE_TYPE_SUBMIT_INFO,                    // VkStructureType                sType
  nullptr,                                          // const void                    *pNext
  0,                                                // uint32_t                       waitSemaphoreCount
  nullptr,                                          // const VkSemaphore             *pWaitSemaphores
  nullptr,                                          // const VkPipelineStageFlags    *pWaitDstStageMask
  1,                                                // uint32_t                       commandBufferCount
  &command_buffer,                                  // const VkCommandBuffer         *pCommandBuffers
  0,                                                // uint32_t                       signalSemaphoreCount
  nullptr                                           // const VkSemaphore             *pSignalSemaphores
};

if( vkQueueSubmit( GetGraphicsQueue().Handle, 1, &submit_info, VK_NULL_HANDLE ) != VK_SUCCESS ) {
  return false;
}

vkDeviceWaitIdle( GetDevice() );

return true;
7. Tutorial05.cpp, function CopyVertexData()

At the beginning, we get the vertex data and map the staging buffer's memory by calling the vkMapMemory() function.
During the call, we specify the handle of the memory that is bound to the staging buffer and the size of the data. This gives
us a pointer that we can use in an ordinary memcpy() function to copy data from our application to the graphics hardware.

Next we flush the mapped memory to tell the driver which parts of a memory object were modified. We can specify
multiple ranges of memory if needed. We have one memory area that should be flushed, and we specify it by creating a
variable of type VkMappedMemoryRange and by calling the vkFlushMappedMemoryRanges() function. After that we
unmap the memory, but we don't have to do this. We can keep the pointer for later use, and this should not affect the
performance of our application.

Next we start preparing a command buffer. We specify that it will be submitted only once before it will be reset. We
fill a VkCommandBufferBeginInfo structure and provide it to a vkBeginCommandBuffer() function.

Now we perform the copy operation. First a variable of type VkBufferCopy is created. It contains the following fields:

 srcOffset – Offset in bytes in a source buffer from which we want to copy data.
 dstOffset – Offset in bytes in a destination buffer into which we want to copy data.
 size – Size of the data (in bytes) we want to copy.

We copy data from the beginning of a staging buffer and to the beginning of a vertex buffer, so we specify zero for
both offsets. The size of the vertex buffer was calculated based on the hard-coded vertex data, so we copy the same
number of bytes. To copy data from one buffer to another, we call a vkCmdCopyBuffer() function.

Setting a Buffer Memory Barrier


We have recorded a copy operation but that’s not all. From now on we will not use the buffer as a target for transfer
operations but as a vertex buffer. We need to tell the driver that the type of buffer’s memory access will change and from
now on it will serve as a source of data for vertex attributes. To do this we set a memory barrier similarly to what we have
done earlier in case of swapchain images.

First we prepare a variable of type VkBufferMemoryBarrier, which contains the following members:

 sType – Standard structure type, here set to VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER.
 pNext – Parameter reserved for extensions.
 srcAccessMask – Types of memory operations that were performed on this buffer before the barrier.
 dstAccessMask – Memory operations that will be performed on a given buffer after the barrier.
 srcQueueFamilyIndex – Index of a queue family that accessed the buffer before.
 dstQueueFamilyIndex – Queue family that will access the buffer from now on.
 buffer – Handle to the buffer for which we set up a barrier.
 offset – Memory offset from the start of the buffer (from the memory’s base offset bound to the buffer).
 size – Size of the buffer’s memory area for which we want to set up a barrier.

As you can see, we can set up a barrier for a specific range of a buffer's memory. But here we do it for the whole buffer,
so we specify an offset of 0 and the VK_WHOLE_SIZE enum for the size. We don't want to transfer ownership between
different queue families, so we use the VK_QUEUE_FAMILY_IGNORED enum for both the srcQueueFamilyIndex and
dstQueueFamilyIndex fields.

The most important parameters are srcAccessMask and dstAccessMask. We have copied data from the staging buffer
to the vertex buffer. So before the barrier, the vertex buffer was used as a destination for transfer operations and its memory
was written to. That's why we have specified VK_ACCESS_MEMORY_WRITE_BIT for the srcAccessMask field. But after the
barrier, the buffer will be used only as a source of data for vertex attributes. So for the dstAccessMask field we specify
VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT.

To set up the barrier we call the vkCmdPipelineBarrier() function. And to finish command buffer recording, we call
vkEndCommandBuffer(). Next, for all of the above operations to execute, we submit the command buffer by calling the
vkQueueSubmit() function.

Normally, during command buffer submission we should provide a fence. It is signaled once all transfer operations
(and the whole command buffer) are finished. But here, for the sake of simplicity, we call vkDeviceWaitIdle() and wait for
all operations executed on a given device to finish. Once all operations complete, we have successfully transferred data to
the device-local memory and we can use the vertex buffer without worrying about performance loss.

Tutorial05 Execution
The results of the rendering operations are exactly the same as in Part 4:

We render a quad that has different colors in each corner: red, green, dark gray, and blue. The quad should adjust its
size (and aspect) to match the window's size and shape.

Cleaning Up
In this part of the tutorial, I have also refactored the cleaning code. We have created two buffers, each with a separate
memory object. To avoid code redundancy, I prepared a buffer cleaning function:
if( buffer.Handle != VK_NULL_HANDLE ) {
  vkDestroyBuffer( GetDevice(), buffer.Handle, nullptr );
  buffer.Handle = VK_NULL_HANDLE;
}

if( buffer.Memory != VK_NULL_HANDLE ) {
  vkFreeMemory( GetDevice(), buffer.Memory, nullptr );
  buffer.Memory = VK_NULL_HANDLE;
}
8. Tutorial05.cpp, function DestroyBuffer()

This function checks whether a given buffer was successfully created, and if so it calls the vkDestroyBuffer() function. It
also frees the memory associated with the given buffer through a vkFreeMemory() function call. The DestroyBuffer()
function is called in a destructor that also releases all other resources relevant to this part of the tutorial:
if( GetDevice() != VK_NULL_HANDLE ) {
  vkDeviceWaitIdle( GetDevice() );

  DestroyBuffer( Vulkan.VertexBuffer );

  DestroyBuffer( Vulkan.StagingBuffer );

  if( Vulkan.GraphicsPipeline != VK_NULL_HANDLE ) {
    vkDestroyPipeline( GetDevice(), Vulkan.GraphicsPipeline, nullptr );
    Vulkan.GraphicsPipeline = VK_NULL_HANDLE;
  }

  if( Vulkan.RenderPass != VK_NULL_HANDLE ) {
    vkDestroyRenderPass( GetDevice(), Vulkan.RenderPass, nullptr );
    Vulkan.RenderPass = VK_NULL_HANDLE;
  }

  for( size_t i = 0; i < Vulkan.RenderingResources.size(); ++i ) {
    if( Vulkan.RenderingResources[i].Framebuffer != VK_NULL_HANDLE ) {
      vkDestroyFramebuffer( GetDevice(), Vulkan.RenderingResources[i].Framebuffer, nullptr );
    }
    if( Vulkan.RenderingResources[i].CommandBuffer != VK_NULL_HANDLE ) {
      vkFreeCommandBuffers( GetDevice(), Vulkan.CommandPool, 1, &Vulkan.RenderingResources[i].CommandBuffer );
    }
    if( Vulkan.RenderingResources[i].ImageAvailableSemaphore != VK_NULL_HANDLE ) {
      vkDestroySemaphore( GetDevice(), Vulkan.RenderingResources[i].ImageAvailableSemaphore, nullptr );
    }
    if( Vulkan.RenderingResources[i].FinishedRenderingSemaphore != VK_NULL_HANDLE ) {
      vkDestroySemaphore( GetDevice(), Vulkan.RenderingResources[i].FinishedRenderingSemaphore, nullptr );
    }
    if( Vulkan.RenderingResources[i].Fence != VK_NULL_HANDLE ) {
      vkDestroyFence( GetDevice(), Vulkan.RenderingResources[i].Fence, nullptr );
    }
  }

  if( Vulkan.CommandPool != VK_NULL_HANDLE ) {
    vkDestroyCommandPool( GetDevice(), Vulkan.CommandPool, nullptr );
    Vulkan.CommandPool = VK_NULL_HANDLE;
  }
}
9. Tutorial05.cpp, destructor

First we wait for all operations performed by the device to finish. Next we destroy the vertex and staging buffers.
After that we destroy all other resources in the order opposite to their creation: the graphics pipeline, the render pass, and
the resources for each virtual frame, which consist of a framebuffer, a command buffer, two semaphores, and a fence.
Finally we destroy the command pool from which the command buffers were allocated.

Conclusion
In this tutorial we used the recommended technique for transferring data from the application to the graphics
hardware. It gives the best performance for the resources involved in the rendering process while retaining the ability to
map and copy data from the application to the staging buffer. We only need an additional command buffer recording and
submission to transfer data from one buffer to another.

Using staging buffers is recommended for more than just copying data between buffers. We can use the same
approach to copy data from a buffer to images. And the next part of the tutorial will show how to do this by presenting
the descriptors, descriptor sets, and descriptor layouts, which are another big part of the Vulkan API.



API without Secrets: Introduction to Vulkan*
Part 6
Table of Contents
Tutorial 6: Descriptor Sets—Using Textures in Shaders
Creating an Image
Allocating Image Memory
Creating Image View
Copying Data to an Image
Creating a Sampler
Using Descriptor Sets
Creating a Descriptor Set Layout
Creating a Descriptor Pool
Allocating Descriptor Sets
Updating Descriptor Sets
Creating a Pipeline Layout
Binding Descriptor Sets
Accessing Descriptors in Shaders
Tutorial06 Execution
Cleaning Up
Conclusion
Tutorial 6: Descriptor Sets—Using Textures in Shaders
We know how to create a graphics pipeline and how to use shaders to draw geometry on screen. We have also learned
how to create buffers and use them as a source of vertex data (vertex buffers). Now we need to know how to provide data
to shaders—we will see how to use resources like samplers and images inside shader source code and how to set up an
interface between the application and the programmable shader stages.

In this tutorial, we will focus on a functionality that is similar to OpenGL* textures. But in Vulkan* there are no such
objects. We have only two resource types in which we can store data: buffers and images (there are also push constants,
but we will cover them in a dedicated tutorial). Each of them can be provided to shaders, in which case we call such
resources descriptors, but we can’t provide them to shaders directly. Instead, they are aggregated in wrapper or container
objects called descriptor sets. We can place multiple resources in a single descriptor set but we need to do it according to
a predefined structure of such set. This structure defines the contents of a single descriptor set—types of resources that
are placed inside it, number of each of these resource types, and their order. This description is specified inside objects
named descriptor set layouts. Similar descriptions need to be specified when we write shader programs. Together they
form an interface between API (our application) and the programmable pipeline (shaders).

When we have prepared a layout, and created a descriptor set, we can fill it; in this way we define specific objects
(buffers and/or images) that we want to use in shaders. After that, before issuing drawing commands inside a command
buffer, we need to bind such a set to the command buffer. This allows us to use the resources from inside the shader
source code; for example, fetch data from a sampled image (a texture), or read a value of a uniform variable stored in a
uniform buffer.

In this part of the tutorial, we will see how to create descriptor set layouts and descriptor sets themselves. We will
also prepare a sampler and an image so we can make them available as a texture inside shaders. We will also learn how
we can use them inside shaders.

As mentioned previously, this tutorial is based on the knowledge presented in all the previous parts of the API without
Secrets: Introduction to Vulkan tutorials, and only the differences and parts important for the described topics are
presented.

Creating an Image
We start by creating an image that will act as our texture. Images represent a continuous area of memory, which is
interpreted according to the rules defined during image creation. In Vulkan, we have only three basic image types: 1D, 2D,
and 3D. Images may have mipmaps (levels of detail), multiple array layers (at least one is required), and multiple samples
per texel. All these parameters are specified during image creation. In the code sample, we create the most commonly used
two-dimensional image, with one sample per texel and four RGBA components.
VkImageCreateInfo image_create_info = {
VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO, // VkStructureType sType
nullptr, // const void *pNext
0, // VkImageCreateFlags flags
VK_IMAGE_TYPE_2D, // VkImageType imageType
VK_FORMAT_R8G8B8A8_UNORM, // VkFormat format
{ // VkExtent3D extent
width, // uint32_t width
height, // uint32_t height
1 // uint32_t depth
},
1, // uint32_t mipLevels
1, // uint32_t arrayLayers
VK_SAMPLE_COUNT_1_BIT, // VkSampleCountFlagBits samples
VK_IMAGE_TILING_OPTIMAL, // VkImageTiling tiling
VK_IMAGE_USAGE_TRANSFER_DST_BIT | // VkImageUsageFlags usage
VK_IMAGE_USAGE_SAMPLED_BIT,
VK_SHARING_MODE_EXCLUSIVE, // VkSharingMode sharingMode
0, // uint32_t queueFamilyIndexCount
nullptr, // const uint32_t *pQueueFamilyIndices
VK_IMAGE_LAYOUT_UNDEFINED // VkImageLayout initialLayout
};

return vkCreateImage( GetDevice(), &image_create_info, nullptr, image ) == VK_SUCCESS;
1. Tutorial06.cpp, function CreateImage()

To create an image we need to prepare a structure of type VkImageCreateInfo. This structure contains the basic set
of parameters required to create an image. These parameters are specified through the following members:

• sType – Type of the structure. It must be equal to the VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO value.
• pNext – Pointer reserved for extensions.
• flags – Parameter that describes additional properties of an image. Through this parameter we can specify
that the image can be backed by a sparse memory. But a more interesting value is a
VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT, which allows us to use the image as a cubemap. If we don’t
have additional requirements, we can set this parameter to 0.
• imageType – Basic type (number of dimensions) of the image: 1D, 2D, or 3D.
• format – Format of the image: number of its components, number of bits for each component, and a data
type.
• extent – Size of the image (number of texels/pixels) in each dimension.
• mipLevels – Number of levels of detail (mipmaps).
• arrayLayers – Number of array layers.
• samples – Number of per texel samples (one for normal images and more than one for multisampled images).
• tiling – Defines the inner memory structure of the image: linear or optimal.
• usage – Defines all the ways in which we want to use an image during its overall lifetime.
• sharingMode – Specifies whether an image will be accessed by queues from multiple families at a time (the
same as the sharingMode parameter used during swapchain or buffer creation).
• queueFamilyIndexCount – Number of elements in a pQueueFamilyIndices array (used only when concurrent
sharing mode is specified).
• pQueueFamilyIndices – Array with indices of all queue families from which queues will access an image (used
only when concurrent sharing mode is specified).
• initialLayout – Memory layout image will be created with. We can only provide an undefined or preinitialized
layout. We also need to perform a layout transition before we can use an image inside command buffers.

Most of the parameters defined during image creation are quite self-explanatory or similar to parameters used during
creation of other resources. But three parameters require additional explanation.

Tiling defines the inner memory structure of an image (but don’t confuse it with a layout). Images may have linear or
optimal tiling (buffers always have linear tiling). Images with linear tiling have their texels laid out linearly, one texel after
another, one row after another, and so on. We can query for all the relevant image’s memory parameters (offset and size,
row, array, and depth stride). This way we know how the image’s contents are kept in memory. Such tiling can be used to
copy data to an image directly (by mapping the image’s memory). Unfortunately, there are severe restrictions on images
with linear tiling. For example, the Vulkan specification says that only 2D images must support linear tiling. Hardware
vendors may implement support for linear tiling in other image types, but this is not obligatory, and we can’t rely on such
support. But, what’s more important, linearly tiled images may have worse performance than their optimal counterparts.

When we specify an optimal tiling for images, it means that we don’t know how their memory is structured. Each
platform we execute our application on may keep an image’s contents in a totally different way, so it’s practically
impossible to map an image’s memory and copy it to or from the CPU directly (we need to use a staging resource, a buffer
or an image). But this way we can create whatever images we want (there are no restrictions similar to linearly tiled
images) and our application will have better performance. That’s why it is strongly suggested to always specify optimal
tiling for images.

Now let’s focus on an initialLayout parameter. Layout, as it was described in a tutorial about swapchains, defines an
image’s memory layout and is strictly connected with the way in which we want to use an image. Each specific usage has
its own memory layout. Before we can use an image in a given way we need to perform a layout transition. For example,
swapchain images can be displayed on screen only in VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout. When we want to
render into an image, we need to set its memory layout to VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL. There
is also a general layout that allows us to use images in any way we want, but as it may impact performance, its use is
strongly discouraged (use it only when really necessary).

Now, when we want to change the way in which an image is used, we need to perform the above-mentioned layout
transition. We must specify a current (old) layout and a new one. The old layout can have one of two values: current image
layout or an undefined layout. When we specify the value of a current image’s layout, the image contents are preserved
during transition. But when we don’t need an image’s contents, we can provide an undefined layout. In this way layout
transition may be performed faster.

And this is when the initialLayout parameter comes in. We can specify only two values for it—undefined or
preinitialized. The preinitialized layout value allows us to preserve an image’s contents during the image’s first layout
transition. This way we can copy data to an image with memory mapping; but this is quite impractical. We can only copy
data directly (through memory mapping) to images with linear tiling, which have restrictions as mentioned above.
Practically speaking, these images can only be used as staging resources—for transferring data between GPU and CPU.
But for this purpose we can also use buffers; that’s why it is much easier to copy data using a buffer than using an image
with linear tiling.

All this leads to the conclusion that, in most cases, an undefined layout can be used for an initialLayout parameter. In
such a case, an image’s contents cannot be initialized directly (by mapping its memory). But if we want to, we can copy
data to such an image by using a staging buffer. That approach is presented in this tutorial.

One last thing we need to remember is the usage. Similar to buffers, when we create an image we need to designate
ALL the ways in which we intend to use the image. We can’t change it later and we can’t use the image in a way that
wasn’t specified during its creation. Here, we want to use an image as a texture inside shaders. For this purpose we specify
the VK_IMAGE_USAGE_SAMPLED_BIT usage. We also need a way to upload data to the image. We are going to read it
from an image file and copy it to the image object. This can be done by transferring data using a staging resource. In such
a case, the image will be a target of a transfer operation; that’s why we also specify the
VK_IMAGE_USAGE_TRANSFER_DST_BIT usage.

Now, when we have provided values for all the parameters, we can create an image. This is done by calling the
vkCreateImage() function for which we need to provide a handle of a logical device, a pointer to the structure described
above, and a pointer to a variable of type VkImage in which the handle of the created image will be stored.

Allocating Image Memory


Similar to buffers, images don’t have their own memory, so before we can use images we need to bind memory to
them. To do this, we first need to know the properties of the memory that can be bound to an image. We learn them by
calling the vkGetImageMemoryRequirements() function.
VkMemoryRequirements image_memory_requirements;
vkGetImageMemoryRequirements( GetDevice(), Vulkan.Image.Handle,
&image_memory_requirements );
2. Tutorial06.cpp, function AllocateImageMemory()
The above call stores the required memory parameters in an image_memory_requirements variable. This tells us how
much memory we need and which memory type supported by a given physical device can be used for an image’s memory
allocation. If we don’t know what memory types are supported by a given physical device we can learn about them by
calling the vkGetPhysicalDeviceMemoryProperties() function. This was covered in a previous tutorial, when we were
allocating memory for a buffer. Next, we can iterate over available memory types and check which are compatible with
our image.
for( uint32_t i = 0; i < memory_properties.memoryTypeCount; ++i ) {
if( (image_memory_requirements.memoryTypeBits & (1 << i)) &&
(memory_properties.memoryTypes[i].propertyFlags & property) ) {

VkMemoryAllocateInfo memory_allocate_info = {
VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO, // VkStructureType sType
nullptr, // const void *pNext
image_memory_requirements.size, // VkDeviceSize allocationSize
i // uint32_t memoryTypeIndex
};

if( vkAllocateMemory( GetDevice(), &memory_allocate_info, nullptr, memory ) == VK_SUCCESS ) {
return true;
}
}
}
return false;
3. Tutorial06.cpp, function AllocateImageMemory()

Each memory type has a specific set of properties. When we want to bind memory to an image, we can have our own
specific requirements too. For example, we may need to access memory directly, by mapping it, so such memory must be
host-visible. If we have additional requirements we can compare them with the properties of each available memory type.
When we find the match, we can use a given memory type and allocate a memory object from it by calling the
vkAllocateMemory() function.

After that, we need to bind such memory to our image. We do this by calling the vkBindImageMemory() function and
providing the handle of an image to which we want to bind memory, a handle of a memory object, and an offset from the
beginning of the memory object, like this:
if( vkBindImageMemory( GetDevice(), Vulkan.Image.Handle, Vulkan.Image.Memory, 0 ) !=
VK_SUCCESS ) {
std::cout << "Could not bind memory to an image!" << std::endl;
return false;
}
4. Tutorial06.cpp, function CreateTexture()

Offset value is very important when we bind memory to an object. Resources in Vulkan have specific requirements
for memory offset alignment. Information about the requirements is also available in the image_memory_requirements
variable. The offset that we provide when we bind a memory must be a multiple of the variable’s alignment member. Zero
is always a valid offset value.

Of course, when we want to bind a memory to an image, we don’t need to create a new memory object each time. It
is more optimal to create a small number of larger memory objects and bind parts of them by providing a proper offset
value.
Creating Image View
When we want to use an image in our application we rarely provide the image’s handle. Image views are usually used
instead. They provide an additional layer that interprets the contents of an image for the purpose of using it in a specific
context. For example, we may have a multilayer image (2D array) and we want to render only to a specific array layer. To
do this we create an image view in which we define the layer we want to use. Another example is an image with six array
layers. Using image views, we can interpret it as a cubemap.

Creation of image views was described in Introduction to Vulkan Part 3: First Triangle, so I will provide only the source
code used in this part.
VkImageViewCreateInfo image_view_create_info = {
VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO, // VkStructureType sType
nullptr, // const void *pNext
0, // VkImageViewCreateFlags flags
image_parameters.Handle, // VkImage image
VK_IMAGE_VIEW_TYPE_2D, // VkImageViewType viewType
VK_FORMAT_R8G8B8A8_UNORM, // VkFormat format
{ // VkComponentMapping components
VK_COMPONENT_SWIZZLE_IDENTITY, // VkComponentSwizzle r
VK_COMPONENT_SWIZZLE_IDENTITY, // VkComponentSwizzle g
VK_COMPONENT_SWIZZLE_IDENTITY, // VkComponentSwizzle b
VK_COMPONENT_SWIZZLE_IDENTITY // VkComponentSwizzle a
},
{ // VkImageSubresourceRange subresourceRange
VK_IMAGE_ASPECT_COLOR_BIT, // VkImageAspectFlags aspectMask
0, // uint32_t baseMipLevel
1, // uint32_t levelCount
0, // uint32_t baseArrayLayer
1 // uint32_t layerCount
}
};

return vkCreateImageView( GetDevice(), &image_view_create_info, nullptr, &image_parameters.View ) == VK_SUCCESS;
5. Tutorial06.cpp, function CreateImageView()

Copying Data to an Image


Now we need to copy data to our image. We do this by using a staging buffer. We first create a buffer big enough to
hold our image data. Next, we allocate memory that is host-visible (that can be mapped), and bind it to the buffer. Then,
we copy data to the buffer’s memory like this:
// Prepare data in staging buffer
void *staging_buffer_memory_pointer;
if( vkMapMemory( GetDevice(), Vulkan.StagingBuffer.Memory, 0, data_size, 0,
&staging_buffer_memory_pointer ) != VK_SUCCESS ) {
std::cout << "Could not map memory and upload texture data to a staging buffer!" <<
std::endl;
return false;
}

memcpy( staging_buffer_memory_pointer, texture_data, data_size );

VkMappedMemoryRange flush_range = {
VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE, // VkStructureType sType
nullptr, // const void *pNext
Vulkan.StagingBuffer.Memory, // VkDeviceMemory memory
0, // VkDeviceSize offset
data_size // VkDeviceSize size
};
vkFlushMappedMemoryRanges( GetDevice(), 1, &flush_range );

vkUnmapMemory( GetDevice(), Vulkan.StagingBuffer.Memory );


6. Tutorial06.cpp, function CopyTextureData()

We map the buffer’s memory. This operation gives us a pointer that can be used just like any other C++ pointer.
We copy texture data to it and inform the driver which parts of the buffer’s memory were changed during this
operation (we flush the memory). At the end, we unmap the memory, though this is not strictly necessary; in Vulkan,
memory may stay mapped for the whole lifetime of the application.

Image data is read from a file with the following code:


int width = 0, height = 0, data_size = 0;
std::vector<char> texture_data = Tools::GetImageData( "Data06/texture.png", 4,
&width, &height, nullptr, &data_size );
if( texture_data.size() == 0 ) {
return false;
}

if( !CopyTextureData( &texture_data[0], data_size, width, height ) ) {
std::cout << "Could not upload texture data to device memory!" << std::endl;
return false;
}
7. Tutorial06.cpp, function CreateTexture()

For the purpose of this tutorial we will use the following image as a texture:

The operation of copying data from a buffer to an image requires recording a command buffer and submitting it to a
queue. Calling the vkBeginCommandBuffer() function starts the recording operation:
// Prepare command buffer to copy data from the staging buffer to the image
VkCommandBufferBeginInfo command_buffer_begin_info = {
VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO, // VkStructureType sType
nullptr, // const void *pNext
VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT, // VkCommandBufferUsageFlags flags
nullptr // const VkCommandBufferInheritanceInfo *pInheritanceInfo
};

VkCommandBuffer command_buffer = Vulkan.RenderingResources[0].CommandBuffer;

vkBeginCommandBuffer( command_buffer, &command_buffer_begin_info );


8. Tutorial06.cpp, function CopyTextureData()

At the beginning of the command buffer recording we need to perform a layout transition on our image. We want to
copy data to the image so we need to change its layout to a VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL. We need to
do this explicitly using an image memory barrier and calling the vkCmdPipelineBarrier() function:
VkImageSubresourceRange image_subresource_range = {
VK_IMAGE_ASPECT_COLOR_BIT, // VkImageAspectFlags aspectMask
0, // uint32_t baseMipLevel
1, // uint32_t levelCount
0, // uint32_t baseArrayLayer
1 // uint32_t layerCount
};

VkImageMemoryBarrier image_memory_barrier_from_undefined_to_transfer_dst = {
VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER, // VkStructureType sType
nullptr, // const void *pNext
0, // VkAccessFlags srcAccessMask
VK_ACCESS_TRANSFER_WRITE_BIT, // VkAccessFlags dstAccessMask
VK_IMAGE_LAYOUT_UNDEFINED, // VkImageLayout oldLayout
VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, // VkImageLayout newLayout
VK_QUEUE_FAMILY_IGNORED, // uint32_t srcQueueFamilyIndex
VK_QUEUE_FAMILY_IGNORED, // uint32_t dstQueueFamilyIndex
Vulkan.Image.Handle, // VkImage image
image_subresource_range // VkImageSubresourceRange subresourceRange
};
vkCmdPipelineBarrier( command_buffer, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
VK_PIPELINE_STAGE_TRANSFER_BIT, 0, 0, nullptr, 0, nullptr, 1,
&image_memory_barrier_from_undefined_to_transfer_dst );
9. Tutorial06.cpp, function CopyTextureData()

Next, we can copy the data itself. To do this we need to provide parameters describing both the source and the
destination of the data. For the destination we specify which part of the image we want to update (the imageSubresource
member), a specific region within that part (imageOffset), and the size of the updated area (imageExtent). For the source
we provide an offset from the beginning of the buffer’s memory at which the data starts, along with a description of how
the data is structured: the row and column sizes of an imaginary image laid out inside the buffer. Fortunately, we can
store our data in such a way that it exactly matches the image. This allows us to set a zero value for both parameters
(bufferRowLength and bufferImageHeight), specifying that the data is tightly packed according to the image size.
VkBufferImageCopy buffer_image_copy_info = {
0, // VkDeviceSize bufferOffset
0, // uint32_t bufferRowLength
0, // uint32_t bufferImageHeight
{ // VkImageSubresourceLayers imageSubresource
VK_IMAGE_ASPECT_COLOR_BIT, // VkImageAspectFlags aspectMask
0, // uint32_t mipLevel
0, // uint32_t baseArrayLayer
1 // uint32_t layerCount
},
{ // VkOffset3D imageOffset
0, // int32_t x
0, // int32_t y
0 // int32_t z
},
{ // VkExtent3D imageExtent
width, // uint32_t width
height, // uint32_t height
1 // uint32_t depth
}
};
vkCmdCopyBufferToImage( command_buffer, Vulkan.StagingBuffer.Handle,
Vulkan.Image.Handle, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &buffer_image_copy_info );
10. Tutorial06.cpp, function CopyTextureData()

One last thing is to perform another layout transition. Our image will be used as a texture inside shaders, so we need
to transition it to a VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL layout. After that, we can end our command
buffer, submit it to a queue, and wait for the transfer to complete (in a real-life application, we should skip waiting and
synchronize operations in some other way; for example, using semaphores, to avoid unnecessary pipeline stalls).
VkImageMemoryBarrier image_memory_barrier_from_transfer_to_shader_read = {
VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER, // VkStructureType sType
nullptr, // const void *pNext
VK_ACCESS_TRANSFER_WRITE_BIT, // VkAccessFlags srcAccessMask
VK_ACCESS_SHADER_READ_BIT, // VkAccessFlags dstAccessMask
VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, // VkImageLayout oldLayout
VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL, // VkImageLayout newLayout
VK_QUEUE_FAMILY_IGNORED, // uint32_t srcQueueFamilyIndex
VK_QUEUE_FAMILY_IGNORED, // uint32_t dstQueueFamilyIndex
Vulkan.Image.Handle, // VkImage image
image_subresource_range // VkImageSubresourceRange subresourceRange
};
vkCmdPipelineBarrier( command_buffer, VK_PIPELINE_STAGE_TRANSFER_BIT,
VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, 0, 0, nullptr, 0, nullptr, 1,
&image_memory_barrier_from_transfer_to_shader_read );

vkEndCommandBuffer( command_buffer );

// Submit command buffer and copy data from the staging buffer to the image
VkSubmitInfo submit_info = {
VK_STRUCTURE_TYPE_SUBMIT_INFO, // VkStructureType sType
nullptr, // const void *pNext
0, // uint32_t waitSemaphoreCount
nullptr, // const VkSemaphore *pWaitSemaphores
nullptr, // const VkPipelineStageFlags *pWaitDstStageMask
1, // uint32_t commandBufferCount
&command_buffer, // const VkCommandBuffer *pCommandBuffers
0, // uint32_t signalSemaphoreCount
nullptr // const VkSemaphore *pSignalSemaphores
};

if( vkQueueSubmit( GetGraphicsQueue().Handle, 1, &submit_info, VK_NULL_HANDLE ) != VK_SUCCESS ) {
return false;
}

vkDeviceWaitIdle( GetDevice() );
11. Tutorial06.cpp, function CopyTextureData()

Now our image is created and fully initialized (contains proper data). But we are not yet done preparing our texture.

Creating a Sampler
In OpenGL, when we created a texture, both the image and its sampling parameters had to be specified. In later
versions of OpenGL we could also create separate sampler objects. Inside a shader, we usually created variables of type
sampler2D, which also combined both images and their sampling parameters (samplers). In Vulkan, we need to create
images and samplers separately.

Samplers define the way in which image data is read inside shaders: whether filtering is enabled, whether we want to
use mipmaps (or maybe a specific subrange of mipmaps), or what kind of addressing mode we want to use (clamping or
wrapping).
VkSamplerCreateInfo sampler_create_info = {
VK_STRUCTURE_TYPE_SAMPLER_CREATE_INFO, // VkStructureType sType
nullptr, // const void *pNext
0, // VkSamplerCreateFlags flags
VK_FILTER_LINEAR, // VkFilter magFilter
VK_FILTER_LINEAR, // VkFilter minFilter
VK_SAMPLER_MIPMAP_MODE_NEAREST, // VkSamplerMipmapMode mipmapMode
VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE, // VkSamplerAddressMode addressModeU
VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE, // VkSamplerAddressMode addressModeV
VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE, // VkSamplerAddressMode addressModeW
0.0f, // float mipLodBias
VK_FALSE, // VkBool32 anisotropyEnable
1.0f, // float maxAnisotropy
VK_FALSE, // VkBool32 compareEnable
VK_COMPARE_OP_ALWAYS, // VkCompareOp compareOp
0.0f, // float minLod
0.0f, // float maxLod
VK_BORDER_COLOR_FLOAT_TRANSPARENT_BLACK, // VkBorderColor borderColor
VK_FALSE // VkBool32 unnormalizedCoordinates
};

return vkCreateSampler( GetDevice(), &sampler_create_info, nullptr, sampler ) == VK_SUCCESS;
12. Tutorial06.cpp, function CreateSampler()

All the above parameters are defined through variables of type VkSamplerCreateInfo. It has many members:

• sType – Type of the structure. It should be equal to the VK_STRUCTURE_TYPE_SAMPLER_CREATE_INFO value.
• pNext – Pointer reserved for extensions.
• flags – Must be set to zero. This parameter is reserved for future use.
• magFilter – Type of filtering (nearest or linear) used for magnification.
• minFilter – Type of filtering (nearest or linear) used for minification.
• mipmapMode – Type of filtering (nearest or linear) used for mipmap lookup.
• addressModeU – Addressing mode for U coordinates that are outside of a <0.0; 1.0> range.
• addressModeV – Addressing mode for V coordinates that are outside of a <0.0; 1.0> range.
• addressModeW – Addressing mode for W coordinates that are outside of a <0.0; 1.0> range.
• mipLodBias – Value of bias added to mipmap’s level of detail calculations. If we want to offset fetching data
from a specific mipmap, we can provide a value other than 0.0.
• anisotropyEnable – Parameter defining whether anisotropic filtering should be used.
• maxAnisotropy – Maximal allowed value used for anisotropic filtering (clamping value).
• compareEnable – Enables comparison against a reference value during texture lookups.
• compareOp – Type of comparison performed during lookups if the compareEnable parameter is set to true.
• minLod – Minimal allowed level of detail used during data fetching. If calculated level of detail (mipmap level)
is lower than this value, it will be clamped.
• maxLod – Maximal allowed level of detail used during data fetching. If the calculated level of detail (mipmap
level) is greater than this value, it will be clamped.
• borderColor – Specifies predefined color of border pixels. Border color is used when address mode includes
clamping to border colors.
• unnormalizedCoordinates – Usually (when this parameter is set to false) we provide texture coordinates using
a normalized <0.0; 1.0> range. When set to true, this parameter allows us to specify that we want to use
unnormalized coordinates and address texture using texels (in a <0; texture dimension> range, similar to
OpenGL’s rectangle textures).

The sampler object is created by calling the vkCreateSampler() function, to which we provide a pointer to the structure described above.
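The interaction of mipLodBias, minLod, and maxLod described above amounts to a simple add-then-clamp. The following sketch (plain C++, not from the tutorial's source; the function name computeLod and the float-based model are illustrative assumptions) shows how the final mipmap level of detail is conceptually resolved:

```cpp
#include <algorithm>
#include <cassert>

// Conceptual model only: the implementation computes a base level of detail
// from screen-space derivatives, then applies the sampler's bias and clamp.
// computeLod is a hypothetical name, not a Vulkan function.
float computeLod( float base_lod, float mip_lod_bias, float min_lod, float max_lod ) {
  // mipLodBias is added to the calculated level of detail...
  float lod = base_lod + mip_lod_bias;
  // ...and the result is clamped to the <minLod; maxLod> range.
  return std::min( std::max( lod, min_lod ), max_lod );
}
```

For example, with minLod = 0.0 and maxLod = 4.0, a base level of 3.5 biased by 1.0 is clamped down to 4.0.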

Using Descriptor Sets


We created an image, bound memory to it, and even uploaded data to it. We also created a sampler to set up sampling parameters for our texture. Now we want to use the texture. How can we do this? Through descriptor sets.

As mentioned at the beginning, resources used inside shaders are called descriptors. In Vulkan we have 11 types of
descriptors:

• Samplers – Define the way image data is read. Inside shaders, samplers can be used with multiple images.
• Sampled images – Define images from which we can read data inside shaders. We can read data from a single
image using different samplers.
• Combined image samplers – These descriptors combine both sampler and sampled image as one object. From
the API perspective (our application), we still need to create both a sampler and an image, but inside the
shader they appear as a single object. Using them may be more optimal (may have better performance) than
using separate samplers and sampled images.
• Storage images – This descriptor allows us to both read and store data inside an image.
• Input attachments – This is a specific usage of a render pass's attachments. When we want to read data from an image that is used as an attachment inside the same render pass, we can only do it through an input attachment. This way we don't need to end one render pass and start another, but we are restricted to fragment shaders only, and to a single location per fragment shader invocation (a given invocation of a fragment shader can only read input attachment data at the coordinates associated with that fragment).
• Uniform buffers (and their dynamic variation) – Uniform buffers allow us to read data from uniform variables.
In Vulkan, such variables cannot be placed inside the global scope; we need to use uniform buffers.
• Storage buffers (and their dynamic variation) – Storage buffers allow us to both read and store data inside
variables.
• Uniform texel buffers – These allow the contents of buffers to be treated as if they contained texture data: they are interpreted as texels with a selected number of components and format. This way we can access very large arrays of data (much larger than uniform buffers allow).
• Storage texel buffers – These are similar to uniform texel buffers. Not only can they be used for reading, but
they can also be used for storing data.

All of the above descriptors are created from samplers, images, or buffers; the difference lies in the way we use and access them inside shaders. These additional access capabilities may have performance implications. For example, uniform buffers only allow reading data, but reading from them is probably much faster than reading from storage buffers, which additionally allow storing data. Similarly, texel buffers allow us to access many more elements than uniform buffers, but this may come at the cost of worse performance. We should select the descriptor type that best fits our needs.

In this tutorial we want to use a texture. For this purpose we created an image and a sampler. We will use both to
prepare a combined image sampler descriptor.

Creating a Descriptor Set Layout


Preparing resources to be used by shaders should begin with creating a descriptor set layout. Descriptor sets are
opaque objects in which we store handles of resources. Layouts define the structure of descriptor sets—what types of
descriptors they contain, how many descriptors of each type there are, and what their order is.

Descriptor set layout creation starts by defining the parameters of all descriptors available in a given set. This is done
by filling a structure variable of type VkDescriptorSetLayoutBinding:
VkDescriptorSetLayoutBinding layout_binding = {
  0,                                          // uint32_t             binding
  VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER,  // VkDescriptorType     descriptorType
  1,                                          // uint32_t             descriptorCount
  VK_SHADER_STAGE_FRAGMENT_BIT,               // VkShaderStageFlags   stageFlags
  nullptr                                     // const VkSampler     *pImmutableSamplers
};
13. Tutorial06.cpp, function CreateDescriptorSetLayout()
The above description contains the following members:

• binding – Index of a descriptor within a given set. All descriptors from a single layout (and set) must have a
unique binding. This same binding is also used inside shaders to access a descriptor.
• descriptorType – The type of the descriptor (sampler, uniform buffer, and so on).
• descriptorCount – Number of descriptors of the selected type accessed as an array. For a single (non-array) descriptor, a value of 1 should be used.
• stageFlags – Set of flags defining all shader stages that will have access to a given descriptor. For better
performance, we should specify only those stages that will access the given resource.
• pImmutableSamplers – Affects only samplers that should be permanently bound into the layout (and cannot
be changed later). But we don’t have to worry about this parameter, and we can bind samplers as any other
descriptors by setting this parameter to null.

In our example, we want to use only one descriptor, a combined image sampler, which will be accessed only by a fragment shader. It will be the first (binding zero) descriptor in the given layout. To avoid wasting memory, we should keep bindings as compact as possible (as close to zero as possible), because drivers may allocate memory for descriptor slots even if they are not used.

We can prepare similar parameters for other descriptors accessed from a single set. Then, pointers to such variables
are provided to a variable of type VkDescriptorSetLayoutCreateInfo:
VkDescriptorSetLayoutCreateInfo descriptor_set_layout_create_info = {
  VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO,  // VkStructureType                     sType
  nullptr,                                              // const void                         *pNext
  0,                                                    // VkDescriptorSetLayoutCreateFlags    flags
  1,                                                    // uint32_t                            bindingCount
  &layout_binding                                       // const VkDescriptorSetLayoutBinding *pBindings
};

if( vkCreateDescriptorSetLayout( GetDevice(), &descriptor_set_layout_create_info, nullptr, &Vulkan.DescriptorSet.Layout ) != VK_SUCCESS ) {
  std::cout << "Could not create descriptor set layout!" << std::endl;
  return false;
}
14. Tutorial06.cpp, function CreateDescriptorSetLayout()

This structure contains just a few members:

• sType – The type of the structure. It should be equal to


VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO.
• pNext – Pointer reserved for extensions.
• flags – This parameter allows us to provide additional options for layout creation. But as they are connected
with using extensions, we can set this parameter to zero.
• bindingCount – The number of bindings, elements in the pBindings array.
• pBindings – A pointer to an array with descriptions of all resources in a given layout. This array must be no
smaller than the value of the bindingCount parameter.

After we have filled in the structure, we can call the vkCreateDescriptorSetLayout() function to create a descriptor
set layout. We will need this layout later, multiple times.
Creating a Descriptor Pool
The next step is to prepare a descriptor set. Descriptor sets, similar to command buffers, are not created directly; instead they are allocated from pools. Before we can allocate a descriptor set, we need to create a descriptor pool.
VkDescriptorPoolSize pool_size = {
  VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER,      // VkDescriptorType               type
  1                                               // uint32_t                       descriptorCount
};

VkDescriptorPoolCreateInfo descriptor_pool_create_info = {
  VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO,  // VkStructureType                sType
  nullptr,                                        // const void                    *pNext
  0,                                              // VkDescriptorPoolCreateFlags    flags
  1,                                              // uint32_t                       maxSets
  1,                                              // uint32_t                       poolSizeCount
  &pool_size                                      // const VkDescriptorPoolSize    *pPoolSizes
};

if( vkCreateDescriptorPool( GetDevice(), &descriptor_pool_create_info, nullptr, &Vulkan.DescriptorSet.Pool ) != VK_SUCCESS ) {
  std::cout << "Could not create descriptor pool!" << std::endl;
  return false;
}
15. Tutorial06.cpp, function CreateDescriptorPool()

Creating a descriptor pool involves specifying how many descriptor sets can be allocated from it. At the same time,
we also need to specify what types of descriptors, and how many of them can be allocated from the pool across all sets.
For example, let’s imagine that a given pool allows allocating a single sampled image and a single storage buffer, and that up to two descriptor sets can be allocated from it. If we allocate one descriptor set containing the sampled image, the second set can contain only the storage buffer. And if a single descriptor set allocated from that pool contains both resources, we can’t allocate another set, because it would have to be empty. During descriptor pool creation we define the total number of descriptors and the total number of sets that can be allocated from it. This is done in two steps.

First, we prepare variables of type VkDescriptorPoolSize that specify the type of a descriptor and the total number of
descriptors of a selected type that can be allocated from the pool. Next, we provide an array of such variables to a variable
of type VkDescriptorPoolCreateInfo. It contains the following members:

• sType – The type of the structure. In this case it should be set to


VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO.
• pNext – Pointer reserved for extensions.
• flags – This parameter defines (when using a VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT
flag) whether individual sets allocated from the pool can be freed or reset separately. If this parameter is set
to zero, all descriptor sets allocated from the pool can only be reset at once (in bulk) by resetting the whole
pool.
• maxSets – The total number of sets that can be allocated from the pool.
• poolSizeCount – The number of elements in the pPoolSizes array.
• pPoolSizes – A pointer to an array containing no less than poolSizeCount elements, each specifying a descriptor type and the total number of descriptors of that type that can be allocated from the pool.

In our example we want to allocate only a single descriptor set with only one descriptor of a combined image sampler
type. We prepare parameters according to our example and create a descriptor pool by calling the
vkCreateDescriptorPool() function.
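The pool accounting rules described above can be mimicked with a small bookkeeping sketch. This is plain C++ for illustration only; DescriptorPoolBudget is a made-up helper, not a Vulkan object:

```cpp
#include <cassert>
#include <map>

// Hypothetical helper that mirrors descriptor pool accounting: a pool is
// created with a maximum number of sets and a per-type descriptor budget.
struct DescriptorPoolBudget {
  int sets_left;
  std::map<int, int> descriptors_left;  // key: descriptor type, value: count

  // Returns true if a set with the requested descriptors fits in the budget,
  // deducting the request from the remaining budget on success.
  bool Allocate( const std::map<int, int> &request ) {
    if( sets_left == 0 ) return false;
    for( auto &r : request ) {
      auto it = descriptors_left.find( r.first );
      if( it == descriptors_left.end() || it->second < r.second ) return false;
    }
    for( auto &r : request ) descriptors_left[r.first] -= r.second;
    --sets_left;
    return true;
  }
};
```

With a budget of two sets, one sampled image (say, type 0) and one storage buffer (type 1), allocating a set containing both resources succeeds once, after which no further non-empty set fits.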

Allocating Descriptor Sets


Now we are ready to allocate the descriptor set itself. Code that does this is quite short:
VkDescriptorSetAllocateInfo descriptor_set_allocate_info = {
  VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO, // VkStructureType                sType
  nullptr,                                        // const void                    *pNext
  Vulkan.DescriptorSet.Pool,                      // VkDescriptorPool               descriptorPool
  1,                                              // uint32_t                       descriptorSetCount
  &Vulkan.DescriptorSet.Layout                    // const VkDescriptorSetLayout   *pSetLayouts
};

if( vkAllocateDescriptorSets( GetDevice(), &descriptor_set_allocate_info, &Vulkan.DescriptorSet.Handle ) != VK_SUCCESS ) {
  std::cout << "Could not allocate descriptor set!" << std::endl;
  return false;
}
16. Tutorial06.cpp, function AllocateDescriptorSet()

To allocate a descriptor set we need to prepare a variable of VkDescriptorSetAllocateInfo type, which has the following
members:

• sType – Standard type of the structure. For the purpose of descriptor set allocation we need to set this
member to a value of VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO.
• pNext – Pointer reserved for extensions.
• descriptorPool – Handle of a descriptor pool from which the descriptor sets should be allocated.
• descriptorSetCount – Number of descriptor sets we want to allocate (and number of elements in the
pSetLayouts member).
• pSetLayouts – Pointer to an array with at least descriptorSetCount elements. Each element of this array must
contain a descriptor set layout that defines the inner structure of the allocated descriptor set (elements may
repeat; for example, we can allocate five descriptor sets at once, all with the same layout).

As we can see in the above structure, we need to provide descriptor set layouts. That’s why we needed to create them
earlier. To allocate a selected number of descriptor sets from a provided pool we need to provide a pointer to the above
structure to the vkAllocateDescriptorSets() function.

Updating Descriptor Sets


We prepared a descriptor set, but it is empty; it’s uninitialized. Now we need to fill it or update it. This means that we
tell the driver which resources should be used for descriptors inside the set.

We can update a descriptor set in two ways:

• By writing to the descriptor set—this way we provide new resources.


• By copying data from another descriptor set—if we have a previously updated descriptor set and we want to reuse some of its descriptors in another set, we can copy them; this approach can be much faster than writing descriptor sets directly from the CPU.
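In this tutorial we take the first path, but for illustration, the copy path could look as follows. This is only a non-compilable sketch in the style of the tutorial's listings, assuming a second, already-allocated set with a compatible layout stored in a hypothetical Vulkan.DescriptorSet.SecondHandle member; copies are passed through the last two parameters of the vkUpdateDescriptorSets() function:

```cpp
// Sketch only: copies binding 0 from an already-updated descriptor set into
// another set with a compatible layout. SecondHandle is a hypothetical member.
VkCopyDescriptorSet copy_descriptor_set = {
  VK_STRUCTURE_TYPE_COPY_DESCRIPTOR_SET,          // VkStructureType    sType
  nullptr,                                        // const void        *pNext
  Vulkan.DescriptorSet.Handle,                    // VkDescriptorSet    srcSet
  0,                                              // uint32_t           srcBinding
  0,                                              // uint32_t           srcArrayElement
  Vulkan.DescriptorSet.SecondHandle,              // VkDescriptorSet    dstSet
  0,                                              // uint32_t           dstBinding
  0,                                              // uint32_t           dstArrayElement
  1                                               // uint32_t           descriptorCount
};

// No writes this time: zero write entries, one copy entry.
vkUpdateDescriptorSets( GetDevice(), 0, nullptr, 1, &copy_descriptor_set );
```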

As we don’t have another descriptor set, we need to write to our single descriptor set directly. For each descriptor
type we need to prepare two structures. One, common for all descriptor types, is the VkWriteDescriptorSet structure. It
contains the following members:

• sType – Type of the structure. We need to use a VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET value.


• pNext – Pointer reserved for extensions.
• dstSet – Handle of a descriptor set that we want to update (fill with specific resources).
• dstBinding – Index within the descriptor set that we want to update. We must provide one of the bindings
specified during descriptor set layout creation. What’s more, the selected binding must correspond to a
provided type of the descriptor.
• dstArrayElement – Specifies the first array index we want to update. Using a single VkWriteDescriptorSet
structure we can update multiple elements of a single array. Let’s say we have a four-element array of
samplers and we want to update the last two (with indices 2 and 3); we can provide two samplers and update
the array starting from index 2.
• descriptorCount – Number of descriptors we want to update (number of elements in pImageInfo or
pBufferInfo, or pTexelBufferView array). For ordinary descriptors we set the value to one. But for arrays we
can provide larger values.
• descriptorType – Type of the descriptor we are going to update. It must be the same as the descriptor type
provided during descriptor set layout creation with the same binding (index within a descriptor set).
• pImageInfo – Pointer to an array with at least descriptorCount elements of type VkDescriptorImageInfo. Each
such element must contain handles of specific resources when we want to update
VK_DESCRIPTOR_TYPE_SAMPLER, VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE,
VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, VK_DESCRIPTOR_TYPE_STORAGE_IMAGE, or
VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT descriptors.
• pBufferInfo – Pointer to an array with at least descriptorCount elements of type VkDescriptorBufferInfo. Each
such element must contain handles of specific resources when we want to update
VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER, VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC, or
VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC descriptors.
• pTexelBufferView – Pointer to an array with at least descriptorCount VkBufferView handles. This array is used when we want to update VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER or VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER descriptors.

Depending on the type of descriptor we want to update, we need to prepare a variable (or an array of variables) of
type VkDescriptorImageInfo, VkDescriptorBufferInfo, or VkBufferView. Here, we want to update a combined image
sampler descriptor, so we need to prepare a variable of type VkDescriptorImageInfo. It contains the following members:

• sampler – Handle of a sampler object.


• imageView – Handle of an image view.
• imageLayout – Here we provide a layout that the image will have when the descriptor is accessed inside
shaders.

In this structure we provide parameters of specific resources; we point to created and valid resources that we want to use inside shaders. Members of this structure are initialized based on the descriptor type. For example, if we update a sampler, we need to provide only the handle of a sampler. If we want to update a sampled image, we need to provide an image view’s handle and an image’s layout. But the image won’t be transitioned to this layout automatically (as it is in render passes). We need to perform the transition to this layout ourselves, explicitly, through pipeline barriers or, in the case of input attachments, through render passes. What’s more, we need to provide a layout that corresponds to the given usage.
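For example, a transition of our texture image to the VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL layout after filling it from a staging buffer can be recorded with a pipeline barrier. The sketch below follows the style of the tutorial's listings and assumes the transfer has just been recorded into command_buffer; the access masks and pipeline stages are assumptions matching that scenario:

```cpp
VkImageMemoryBarrier image_memory_barrier = {
  VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,         // VkStructureType          sType
  nullptr,                                        // const void              *pNext
  VK_ACCESS_TRANSFER_WRITE_BIT,                   // VkAccessFlags            srcAccessMask
  VK_ACCESS_SHADER_READ_BIT,                      // VkAccessFlags            dstAccessMask
  VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,           // VkImageLayout            oldLayout
  VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL,       // VkImageLayout            newLayout
  VK_QUEUE_FAMILY_IGNORED,                        // uint32_t                 srcQueueFamilyIndex
  VK_QUEUE_FAMILY_IGNORED,                        // uint32_t                 dstQueueFamilyIndex
  Vulkan.Image.Handle,                            // VkImage                  image
  { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 }       // VkImageSubresourceRange  subresourceRange
};

vkCmdPipelineBarrier( command_buffer, VK_PIPELINE_STAGE_TRANSFER_BIT,
  VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, 0, 0, nullptr, 0, nullptr,
  1, &image_memory_barrier );
```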

In our example we want to use a texture. We can do this either by using separate sampler and sampled image
descriptors or by using a combined image sampler descriptor (as in typical OpenGL applications). The latter approach can
be more optimal (some hardware platforms may sample data from combined image samplers faster than from separate
samplers and sampled images), and we present that approach here. When we want to update a combined image sampler,
we need to provide all three members of the VkDescriptorImageInfo structure:
VkDescriptorImageInfo image_info = {
  Vulkan.Image.Sampler,                           // VkSampler                      sampler
  Vulkan.Image.View,                              // VkImageView                    imageView
  VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL        // VkImageLayout                  imageLayout
};

VkWriteDescriptorSet descriptor_writes = {
  VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,         // VkStructureType                sType
  nullptr,                                        // const void                    *pNext
  Vulkan.DescriptorSet.Handle,                    // VkDescriptorSet                dstSet
  0,                                              // uint32_t                       dstBinding
  0,                                              // uint32_t                       dstArrayElement
  1,                                              // uint32_t                       descriptorCount
  VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER,      // VkDescriptorType               descriptorType
  &image_info,                                    // const VkDescriptorImageInfo   *pImageInfo
  nullptr,                                        // const VkDescriptorBufferInfo  *pBufferInfo
  nullptr                                         // const VkBufferView            *pTexelBufferView
};

vkUpdateDescriptorSets( GetDevice(), 1, &descriptor_writes, 0, nullptr );


17. Tutorial06.cpp, function UpdateDescriptorSet()

A pointer to a variable of type VkDescriptorImageInfo is then provided in a variable of type VkWriteDescriptorSet. As we update only one descriptor, we need only one instance of both structures. But of course we can update more descriptors at a time, in which case we need to prepare more such variables and provide them to the vkUpdateDescriptorSets() function.

Creating a Pipeline Layout


We are not done yet. Allocating and updating descriptor sets is not the only job we need to perform when we want to use descriptors. Descriptor sets store handles of specific resources, and these handles are provided during command buffer recording. But we also need to prepare information for the other side of the barricade: the driver needs to know what types of resources a given pipeline will access. This information is crucial during pipeline creation, as it may impact the pipeline’s internal structure or even shader compilation. It is provided in a so-called pipeline layout.

The pipeline layout stores information about resource types that the given pipeline has access to. These resources
involve descriptors and push constant ranges. For now we can skip push constants and focus only on descriptors.
To create a pipeline layout and prepare information about the types of resources accessed by the pipeline, we need
to provide an array of descriptor set layouts. This is done through the following members of a variable of type
VkPipelineLayoutCreateInfo:

• sType – Type of the structure. A VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO value should be used


in this case.
• pNext – Pointer reserved for extensions.
• flags – This parameter is reserved for future use.
• setLayoutCount – Number of elements in the pSetLayouts member and number of separate descriptor sets
that can be used with this pipeline.
• pSetLayouts – Array with descriptor set layouts.
• pushConstantRangeCount – Number of separate push constant ranges.
• pPushConstantRanges – Array with elements describing push constant ranges.

And this is when descriptor set layouts are used again. A single descriptor set layout defines the resource types contained within one descriptor set, and an array of these layouts defines the resource types that the given pipeline needs access to.

To create a pipeline layout we just call the vkCreatePipelineLayout() function. We did this in Introduction to Vulkan
Part 3: First Triangle. But there we created an empty layout (with no push constants and with no access to descriptor
resources). Here, we create a more typical pipeline layout.
VkPipelineLayoutCreateInfo layout_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,  // VkStructureType                sType
  nullptr,                                        // const void                    *pNext
  0,                                              // VkPipelineLayoutCreateFlags    flags
  1,                                              // uint32_t                       setLayoutCount
  &Vulkan.DescriptorSet.Layout,                   // const VkDescriptorSetLayout   *pSetLayouts
  0,                                              // uint32_t                       pushConstantRangeCount
  nullptr                                         // const VkPushConstantRange     *pPushConstantRanges
};

if( vkCreatePipelineLayout( GetDevice(), &layout_create_info, nullptr, &Vulkan.PipelineLayout ) != VK_SUCCESS ) {
  std::cout << "Could not create pipeline layout!" << std::endl;
  return false;
}
return true;
18. Tutorial06.cpp, function CreatePipelineLayout()

Such a layout is then provided during pipeline creation. We also need to use it when we bind descriptor sets during command buffer recording, so we need to store the pipeline layout handle.

Binding Descriptor Sets


One last thing is to bind descriptor sets to the command buffer during recording. We can have multiple different descriptor sets, or multiple similar descriptor sets (with the same layouts) that contain different resource handles. Which of these descriptor sets are used during rendering is defined during command buffer recording. Before we can draw anything, we need to set up a valid state according to the drawing parameters, and we need to do it from scratch for each command buffer we record.

Drawing operations require us to use render passes and pipelines. If a pipeline uses descriptor resources (when shaders access images or buffers), we need to bind descriptor sets by calling the vkCmdBindDescriptorSets() function. For this function we must provide a handle of the pipeline layout and an array of descriptor set handles. We bind descriptor sets to specific indices; the index we bind a descriptor set to must correspond to the layout provided at the same index during pipeline layout creation.
vkCmdBeginRenderPass( command_buffer, &render_pass_begin_info, VK_SUBPASS_CONTENTS_INLINE );

vkCmdBindPipeline( command_buffer, VK_PIPELINE_BIND_POINT_GRAPHICS, Vulkan.GraphicsPipeline );

// ...

vkCmdBindDescriptorSets( command_buffer, VK_PIPELINE_BIND_POINT_GRAPHICS, Vulkan.PipelineLayout, 0, 1, &Vulkan.DescriptorSet.Handle, 0, nullptr );

vkCmdDraw( command_buffer, 4, 1, 0, 0 );

vkCmdEndRenderPass( command_buffer );
19. Tutorial06.cpp, function PrepareFrame()

Accessing Descriptors in Shaders


One more thing. We need to write proper shaders. In this example, we access a texture inside a fragment shader only,
so only the fragment shader will be presented.

From the beginning of this tutorial we have been talking about descriptor sets, bindings within descriptor sets, and binding the descriptor sets themselves. We may have multiple descriptor sets bound to a command buffer at the same time, and each descriptor set may contain multiple resources. Each resource therefore has a specific address that we use to refer to it inside shaders. This address is defined through a layout() qualifier like this:
layout(set=S, binding=B) uniform <variable type> <variable name>

Set defines the index that the given descriptor set was bound to through the vkCmdBindDescriptorSets() function. Binding specifies the index of a resource within the provided set and corresponds to the binding defined during descriptor set layout creation. In our case, we have only one descriptor set provided at index zero, with only one combined image sampler at binding zero. Combined image samplers are accessed inside shaders through sampler1D, sampler2D, or sampler3D variables. So our fragment shader’s source code looks like this:
#version 450

layout(set=0, binding=0) uniform sampler2D u_Texture;

layout(location = 0) in vec2 v_Texcoord;

layout(location = 0) out vec4 o_Color;

void main() {
o_Color = texture( u_Texture, v_Texcoord );
}
20. shader.frag
Tutorial06 Execution
We can see below how the final image generated by the sample program should look:

We render a quad that has a texture applied to its surface. The quad should adjust its size (and aspect) to match the
window’s size and shape (if we stretch the window, the quad and the image will be stretched too).

Cleaning Up
Before we can end our application, we should perform a cleanup.
// ...

if( Vulkan.GraphicsPipeline != VK_NULL_HANDLE ) {
  vkDestroyPipeline( GetDevice(), Vulkan.GraphicsPipeline, nullptr );
  Vulkan.GraphicsPipeline = VK_NULL_HANDLE;
}

if( Vulkan.PipelineLayout != VK_NULL_HANDLE ) {
  vkDestroyPipelineLayout( GetDevice(), Vulkan.PipelineLayout, nullptr );
  Vulkan.PipelineLayout = VK_NULL_HANDLE;
}

// ...

if( Vulkan.DescriptorSet.Pool != VK_NULL_HANDLE ) {
  vkDestroyDescriptorPool( GetDevice(), Vulkan.DescriptorSet.Pool, nullptr );
  Vulkan.DescriptorSet.Pool = VK_NULL_HANDLE;
}

if( Vulkan.DescriptorSet.Layout != VK_NULL_HANDLE ) {
  vkDestroyDescriptorSetLayout( GetDevice(), Vulkan.DescriptorSet.Layout, nullptr );
  Vulkan.DescriptorSet.Layout = VK_NULL_HANDLE;
}

if( Vulkan.Image.Sampler != VK_NULL_HANDLE ) {
  vkDestroySampler( GetDevice(), Vulkan.Image.Sampler, nullptr );
  Vulkan.Image.Sampler = VK_NULL_HANDLE;
}

if( Vulkan.Image.View != VK_NULL_HANDLE ) {
  vkDestroyImageView( GetDevice(), Vulkan.Image.View, nullptr );
  Vulkan.Image.View = VK_NULL_HANDLE;
}

if( Vulkan.Image.Handle != VK_NULL_HANDLE ) {
  vkDestroyImage( GetDevice(), Vulkan.Image.Handle, nullptr );
  Vulkan.Image.Handle = VK_NULL_HANDLE;
}

if( Vulkan.Image.Memory != VK_NULL_HANDLE ) {
  vkFreeMemory( GetDevice(), Vulkan.Image.Memory, nullptr );
  Vulkan.Image.Memory = VK_NULL_HANDLE;
}
21. Tutorial06.cpp, function destructor

We destroy both the pipeline and its layout by calling the vkDestroyPipeline() and vkDestroyPipelineLayout() functions. Next, we destroy the descriptor pool with the vkDestroyDescriptorPool() function and the descriptor set layout with the vkDestroyDescriptorSetLayout() function. We of course destroy other resources too, but we already know how to do that. You may notice that we don’t free the descriptor set itself. We could free each descriptor set separately if the proper flag was provided during descriptor pool creation, but we don’t have to—when we destroy a descriptor pool, all sets allocated from it are freed along with it.

Conclusion
This part of the tutorial presented a way to use textures (combined image samplers, in fact) inside shaders. To do this
we created an image and allocated and bound a memory to it. We also created an image view. Next, we copied data from
a staging buffer to the image to initialize its contents. We also created a sampler object that defined a way in which image
data was read inside shaders.

Next, we prepared a descriptor set. First, we created a descriptor set layout. After that, a descriptor pool was created
from which a single descriptor set was allocated. We updated this set with the sampler and the image view handles.

The descriptor set layout was also used to define resources to which our graphics pipeline had access. This was done
during pipeline layout creation. This layout was then used when we bound the descriptor sets to a command buffer.

We also learned how to prepare a shader code that accessed the combined image sampler to read its data (to sample
it as a texture). It was done inside a fragment shader that was used during rendering of our simple geometry. This way we
applied a texture to the surface of this geometry.

In the next tutorial we will see how we can use uniform buffers inside shaders.

Notices
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware,
software or service activation. Performance varies depending on system configuration. Check with your system
manufacturer or retailer or learn more at intel.com.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this
document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of
merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of
performance, course of dealing, or usage in trade.
This document contains information on products, services and/or processes in development. All information provided
here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule,
specifications and roadmaps.

The products and services described may contain defects or errors known as errata which may cause deviations from
published specifications. Current characterized errata are available on request.

Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-
800-548-4725 or by visiting www.intel.com/design/literature.htm.

This sample source code is released under the Intel Sample Source Code License Agreement.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

© 2017 Intel Corporation
