mirror of
https://github.com/vulkano-rs/vulkano.git
synced 2024-11-25 16:25:31 +00:00
271 lines
18 KiB
Markdown
271 lines
18 KiB
Markdown
This document contains the global design decisions made by the vulkano library. It can also be a
|
|
good start if you want to contribute to some internal parts of vulkano and don't know how it works.
|
|
|
|
This document assumes that you're already familiar with Vulkan and does not introduce the various
|
|
concepts. However it can still be a good read if you are not so familiar.
|
|
|
|
If you notice any mistake, feel free to open a PR. If you want to suggest something, feel free to
|
|
open a PR as well.
|
|
|
|
# The three kinds of objects
|
|
|
|
Vulkano provides wrappers around all objects of the Vulkan API. However these objects are split in
|
|
three categories, depending on their access pattern:
|
|
|
|
- Objects that are not created often and in very small numbers.
|
|
- Objects that are typically created at initialization and which are often accessed without mutation
|
|
by performance-critical code.
|
|
- Objects that are created, destroyed or modified during performance-critical code, and that
|
|
usually require a synchronization strategy to avoid race conditions.
|
|
|
|
The first category are objects that are not created often and created in very small numbers:
|
|
Instances, Devices, Surfaces, Swapchains. In a typical application each of these objects is only
|
|
created once and destroyed when the application exits. Vulkano's API provides a struct that
|
|
corresponds to each of these objects, and this struct is typically wrapped in an `Arc`.
|
|
Their `new` method in fact returns an `Arc<T>` instead of just a `T` in order to encourage users to
|
|
use `Arc`s. You use these objects by cloning them around like you would use objects in a
|
|
garbage-collected language such as Java.
|
|
|
|
The second category are objects like the GraphicsPipeline, ComputePipeline, PipelineLayout,
|
|
RenderPass and Framebuffer. They are usually created at initialization and don't perform any
|
|
operations themselves, but they describe to the Vulkan implementation operations that we are going
|
|
to perform and are thus frequently accessed in order to determine whether the operation that the
|
|
vulkano user requested is compliant to what was described. Just like the first category, each of
|
|
these objects has a struct that corresponds to them, but in order to make these checks as fast as
|
|
possible these structs have a template parameter that describes in a strongly-typed fashion the
|
|
operation on the CPU side. This makes it possible to move many checks to compile-time instead of
|
|
runtime. More information in another section of this document.
|
|
|
|
The third category are objects like CommandBuffers, CommandPools, DescriptorSets, DescriptorPools,
|
|
Buffers, Images, and memory pools (although not technically a Vulkan object). The way they are
|
|
implemented has a huge impact on the performance of the application. Contrary to the first two
|
|
categories, each of these objects is represented in vulkano by an unsafe trait (and not by a
|
|
struct) that can be freely implemented by the user if they wish. Vulkano provides unsafe structs
|
|
such as `UnsafeBuffer`, `UnsafeImage`, etc. which have zero overhead and do not perform any safety
|
|
checks, and are the tools used by the safe implementations of the traits. Vulkano also provides
|
|
some safe implementations for convenience such as `CpuAccessibleBuffer` or `AttachmentImage`.
|
|
|
|
# Runtime vs compile-time checks
|
|
|
|
The second category of objects described above are objects that describe to the Vulkan
|
|
implementation an operation that we are going to perform later. For example a `ComputePipeline`
|
|
object describes to the Vulkan implementation a compute operation and contains the shader's code
|
|
and the list of resources that we are going to bind and that are going to be accessed by the shader.
|
|
|
|
Since vulkano is a safe library, it needs to check whether the operation the user requests (eg.
|
|
executing a compute operation) matches the corresponding `ComputePipeline` (for example, check
|
|
that the list of resources passed by the user matches what the compute pipeline expects).
|
|
These checks can be expensive. For example when it comes to buffers, vulkano needs to check whether
|
|
the layout of the buffers passed by the user is the same as what is expected, by looping through all
|
|
the members and following several indirections. If you multiply this by several dozens or hundreds
|
|
of operations, it can become very expensive.
|
|
|
|
In order to reduce the stress caused by these checks, structs such as `ComputePipeline` have a
|
|
template parameter which describes the operation. Whenever vulkano performs a check, it queries
|
|
the templated object through a trait, and each safety check has its own trait. This means
|
|
that we can build strongly-typed objects at compile-time that describe a very precise operation and
|
|
whose method implementations are trivial. For example, we can create a `MyComputeOpDesc` type which
|
|
implements the `ResourcesListMatch<MyResourcesList>` trait (which was made up for the sake of the
|
|
example), and the user will only be able to pass a `MyResourcesList` object for the list of
|
|
resources. This moves the check to compile-time and totally eliminates any runtime check. The
|
|
compute pipeline is then expressed as `ComputePipeline<MyComputeOpDesc>`.
|
|
|
|
However this design has a drawback, which is that is can be difficult to explicitly express such a
|
|
type. A compute pipeline in the example above could be expressed as
|
|
`ComputePipeline<MyComputeOpDesc>`, but in practice these types (like `MyComputeOpDesc`) would be
|
|
built by builders and can become extremely long and annoying to put in a struct (just like for
|
|
example the type of `(10..).filter(|n| n*2).skip(3).take(5)` can be very long and annoying to put
|
|
in a struct). This is especially problematic as it concerns objects that are usually created at
|
|
initialization and stay alive for a long time, in other words the kind of objects that you would
|
|
put in a struct.
|
|
|
|
In order to solve this naming problem, all the traits that are used to describe operations must be
|
|
boxable so that we can turn `ComputePipeline<Very<Long<And<Complicated, Type>>>>` into
|
|
`ComputePipeline<Box<ComputePipelineDesc>>`. This means that we can't use associated types and
|
|
templates for any of the trait methods. Ideologically it is a bit annoying to have to restrict
|
|
ourselves in what we can do just because the user needs to be able to write out the precise type,
|
|
but it's the only pragmatic solution for now.
|
|
|
|
# Submissions
|
|
|
|
Any object that can be submitted to a GPU queue (for example a command buffer) implements
|
|
the `Submit` trait.
|
|
|
|
The `Submit` trait provides a function named `build` which returns a `Submission<Self>` object
|
|
(where `Self` is the type that implements the `Submit` trait). The `Submission` object must be kept
|
|
alive by the user for as long as the GPU hasn't finished executing the submission. Trying to
|
|
destroy a `Submission` will block until it is the case. Since the `Submission` holds the object
|
|
that was submitted, this object is also kept alive for as long as the GPU hasn't finished executing
|
|
it.
|
|
|
|
For the moment submitting an object always creates a fence, which is how the `Submission` knows
|
|
whether the GPU has finished executing it. Eventually this will need to be modified for the sake of
|
|
performance.
|
|
|
|
In order to make the `Submit` trait safer to implement, the method that actually needs to be
|
|
implemented is not `build` but `append_submission`. This method uses a API/lifetime trick to
|
|
guarantee that the GPU only executes command buffers that outlive the struct that implements
|
|
`Submit`.
|
|
|
|
SAFETY ISSUE HERE HOWEVER: the user can use mem::forget on the Submission and then drop the
|
|
objects referenced by it. There are two solutions to this: either store a bunch of Arc<Fence> in
|
|
every single object referenced by submissions (eg. pipeline objects), or force the user to use
|
|
either Arcs or give ownership of the object. The latter is preferred but not yet implemented.
|
|
|
|
# Pools
|
|
|
|
There are three kinds of pools in vulkano: memory pools, descriptor pools, and command pools. Only
|
|
the last two are technically Vulkan concepts, but using a memory pool is also a very common
|
|
pattern that you are strongly encouraged to embrace when you write a Vulkan application.
|
|
|
|
These three kinds of pools are each represented in vulkano by a trait. When you use the Vulkan API,
|
|
you are expected to create multiple command pools and multiple descriptor pools for maximum
|
|
performance. In vulkano however, it is the implementation of the pool trait that is responsible
|
|
for managing multiple actual pool objects. In other words a pool in vulkano is just a trait that
|
|
provides a method to allocate or free some resource, and the advanced functionality of Vulkan
|
|
pools (like resetting a command buffer, resetting a pool, or managing the descriptor pool's
|
|
capacity) is handled internally by the implementation of the trait. For example freeing a
|
|
command buffer can be implemented by resetting it and reusing it, instead of actually freeing it.
|
|
|
|
One of the goals of vulkano is to be easy to use by default. Therefore vulkano provides a default
|
|
implementation for each of these pools, and the `new` constructors of types that need a pool (ie.
|
|
buffers, images, descriptor sets, and command buffers) will use the default implementation. It is
|
|
possible for the user to use an alternative implementation of a pool by using an alternative
|
|
constructor, but the default implementations should be good for most usages. This is similar to
|
|
memory allocators in languages such as C++ and Rust, in the sense that some users want to be able
|
|
to use a custom allocator but most of the time it's not worth bothering with that.
|
|
|
|
# Command buffers
|
|
|
|
Command buffer objects belong to the last category of objects that were described above. They are
|
|
represented by an unsafe trait and can be implemented manually by the user if they wish.
|
|
|
|
However this poses a practical problem, which is that creating a command buffer in a safe way
|
|
is really complicated. There are tons of commands to implement, and each command has a ton of
|
|
safety requirements. If a user wants to create a custom command buffer type, it is just not an
|
|
option to ask them to reimplement these safety checks themselves.
|
|
|
|
The reason why users may want to create their own command buffer types is to implement
|
|
synchronization themselves. Vulkano's default implementation (which is `AutobarriersCommandBuffer`)
|
|
will automatically place pipeline barriers in order to handle cache flushes and image layout
|
|
transitions and avoid data races, but this automatic computation can be seen as expensive.
|
|
|
|
In order to make it possible to customize the synchronization story of command buffers, vulkano has
|
|
split the command buffer building process in two steps. First the user builds a list of commands
|
|
through an iterator-like API (and vulkano will check their validity), and then they are turned into
|
|
a command buffer through a trait. This means that the user can customize the synchronization
|
|
strategy (by customizing the second step) while still using the same command-building process
|
|
(the first step). Commands are not opinionated towards one strategy or another. The
|
|
command-building code is totally isolated from the synchronization strategy and only checks
|
|
whether the commands themselves are valid.
|
|
|
|
The fact that all the commands are added at once can be a little surprising for a user coming from
|
|
Vulkan. Vulkano's API looks very similar to Vulkan's API, but there is a major difference: in
|
|
Vulkan the cost of creating a command buffer is distributed between each function call, but in
|
|
vulkano it is done all at once. For example creating a command buffer with 6 commands with Vulkan
|
|
requires 8 function calls that take say 5µs each, while creating the same command buffer with
|
|
vulkano requires 8 function calls, but the first 7 are almost free and the last one takes 40µs.
|
|
After some thinking, it was considered to not be a problem.
|
|
|
|
Creating a list of commands with an iterator-like API has the problem that the type of the list of
|
|
commands changes every time you add a new command to the list
|
|
(just like for example `let iterator = iterator.skip(1)` changes the type of `iterator`). This is
|
|
a problem in situations where we don't know at compile-time the number of commands that we are
|
|
going to add. In order to solve this, it is required that the `CommandsList` trait be boxable,
|
|
so that the user can use a `Box<CommandsList>`. This is unfortunately not optimal as you will need
|
|
a memory allocation for each command that is added to the list. The situation here could still be
|
|
improved.
|
|
|
|
# The auto-barriers builder
|
|
|
|
As explained above, the default implementation of a command buffer provided by vulkano
|
|
automatically places pipeline barriers to avoid issues such as caches not being flushed, commands
|
|
being executed simultaneously when they shouldn't, or images having the wrong layout.
|
|
|
|
This is not an easy job, because Vulkan allows lots of weird access patterns that we want to make
|
|
available in vulkano. You can for example create a buffer object split into multiple sub-buffer
|
|
objects, or make some images and buffers share the same memory.
|
|
|
|
In order to make it possible to handle everything properly, the `Buffer` and `Image` traits need to
|
|
help us with the `conflicts` methods. Each buffer and image can be queried to know whether it
|
|
potentially uses the same memory as any other buffer or image. When two resources conflict, this
|
|
means that you can't write to one and read from the other one simultaneously or write to both
|
|
simultaneously.
|
|
|
|
But we don't want to check every single combination of buffer and image every time to check whether
|
|
they conflict. So in order to improve performance, buffers and images also need to provide a key
|
|
that identifies them. Two resources that can potentially conflict must always return the same key.
|
|
The regular `conflict` functions are still necessary to handle the situation where buffers or
|
|
images accidentally return the same key but don't actually conflict.
|
|
|
|
This conflict system is also used to make sure that the attachments of a framebuffer don't conflict
|
|
with each other or that the resources in a descriptor set don't conflict with each other (both
|
|
situations are forbidden).
|
|
|
|
# Image layouts
|
|
|
|
Tracking image layouts can be tedious. Vulkano uses a simple solution, which is that images must
|
|
always be in a specific layout at the beginning and the end of a command buffer. If a transition
|
|
is performed during a command buffer, the image must be transitioned back before the end of the
|
|
command buffer. The layout in question is queried with a method on the `Image` trait.
|
|
|
|
For example an `AttachmentImage` must always be in the `ColorAttachmentOptimal` layout for color
|
|
attachment, and the `DepthStencilAttachmentOptimal` layout for depth-stencil attachments. If any
|
|
command switches the image to another layout, then it will need to be switched back before the end
|
|
of the command buffer.
|
|
|
|
This system works very nicely in practice, and unnecessary layout transitions almost never happen.
|
|
The only situation where unnecessary transitions tend to happen in practice is for swapchain images
|
|
that are transitioned from `PresentSrc` to `ColorAttachmentOptimal` before the start of the
|
|
render pass, because the initial layout of the render pass attachment is `ColorAttachmentOptimal`
|
|
by default for color attachments. Vulkano should make it clear in the documentation of render
|
|
passes that the user is encouraged to specify when an attachment is expected to be in the
|
|
`PresentSrc` layout.
|
|
|
|
The only problematic area concerns the first usage of an image, where it must be transitioned from
|
|
the `Undefined` or `Preinitialized` layout. This is done by making the user pass a command buffer
|
|
builder in the constructor of images, and the constructor adds a transition command to it. The
|
|
image implementation is responsible for making sure that the transition command has been submitted
|
|
before any further command that uses the image.
|
|
|
|
# Inter-queue synchronization
|
|
|
|
When users submit two command buffers to two different queues, they expect the two command buffers
|
|
to execute in parallel. However this is forbidden if doing so could result in a data race,
|
|
like for example if one command buffer writes to an image and the other one reads from that same
|
|
image.
|
|
In this situation, the only possible technical solution is to make the execution of the second
|
|
command buffer block until the first command buffer has finished executing.
|
|
This case is similar to spawning two threads that each access the same resource protected by
|
|
a `RwLock` or a `Mutex`. One of the two threads will need to block until the first one is finished.
|
|
|
|
This raises the question: should vulkano implicitly block command buffers to avoid data races,
|
|
or should it force the user to explicitly add wait operations? By comparing a CPU-side
|
|
multithreaded program and a GPU-side multithreaded program, then the answer is to make it implicit,
|
|
as a CPU will also implicitly block when calling a function that happens to lock a `Mutex` or
|
|
a `RwLock`. In CPU code, these locking problems are always "fixed" by properly documenting the
|
|
behavior of the functions you call. Similarly, vulkano should precisely document its behavior.
|
|
|
|
More generally users are encouraged to avoid sharing resources between multiple queues unless these
|
|
resources are read-only, and in practice in a video game it is indeed rarely needed to share
|
|
resources between multiple queues. Just like for CPU-side multithreading, users are encouraged to
|
|
have a graph of the ways queues interact with each other.
|
|
|
|
However another problem arises. In order to make a command buffer wait for another, you need to
|
|
make the queue of the first command buffer submit a semaphore after execution, and the queue of
|
|
the second command buffer wait on that same semaphore before execution. Semaphores can only be used
|
|
once. This means that when you submit a command buffer to a queue, you must already know if any
|
|
other command buffers are going to wait on the one you are submitting, and if so how many. This is not
|
|
something that vulkano can automatically determine. The fact that there is therefore no optimal
|
|
algorithm for implicit synchronization would be a good point in favor of explicit synchronization.
|
|
|
|
The decision was taken to encourage users to explicitly handle synchronization between multiple
|
|
queues, but if they forget to do so then vulkano will automatically fall back to a dumb
|
|
worst-case-scenario but safe behavior. Whenever this dumb behavior is triggered, a debug message
|
|
is outputted by vulkano with the `vkDebugReportMessageEXT` function. This message can easily be
|
|
caught by the user by registering a callback, or with a debugger.
|
|
|
|
It is yet to be determined what exactly the user needs to handle. The user will at least need to
|
|
specify an optional list of semaphores to signal at each submission, but maybe not the list of
|
|
semaphores to wait upon if these can be determined automatically. This has yet to be seen.
|