* Add an optional system for counting and reporting internal resources and events
* Count API objects in wgpu-hal
* Expose internal counters in wgpu-core and wgpu.
- Improve logging in `StatelessTracker::remove_abandoned` to show the
outcome of the call.
- Add similar logging to `BufferTracker` and `TextureTracker`.
- Let `device_create_buffer`'s log only the new buffer's label, id,
and whether it's mapped at creation. It used to show the whole
descriptor, which is too much detail.
- Have `queue_submit` log the submission id, and have `device_poll`
log what it was waiting for, and what actually got done.
- Have `Device::drop` log the destruction of the raw device when it
actually happens, so it's properly ordered with respect to logging
from other parts of the device, like `Device::command_allocator`.
Add the following flags to `wgpu_types::Features`:
- `SHADER_INT64_ATOMIC_ALL_OPS` enables all atomic operations on `atomic<i64>` and
`atomic<u64>` values.
- `SHADER_INT64_ATOMIC_MIN_MAX` is a subset of the above, enabling only
`AtomicFunction::Min` and `AtomicFunction::Max` operations on `atomic<i64>` and
`atomic<u64>` values in the `Storage` address space. These are the only 64-bit
atomic operations available on Metal as of 3.1.
Add corresponding flags to `naga::valid::Capabilities`. These are supported by the
WGSL front end, and all Naga backends.
Platform support:
- On Direct3d 12, in `D3D12_FEATURE_DATA_D3D12_OPTIONS9`, if
`AtomicInt64OnTypedResourceSupported` and `AtomicInt64OnGroupSharedSupported` are
both available, then both wgpu features described above are available.
- On Metal, `SHADER_INT64_ATOMIC_MIN_MAX` is available on Apple9 hardware, and on
hardware that advertises both Apple8 and Mac2 support. This also requires Metal
Shading Language 2.4 or later. Metal does not yet support the more general
`SHADER_INT64_ATOMIC_ALL_OPS`.
- On Vulkan, if the `VK_KHR_shader_atomic_int64` extension is available with both the
`shader_buffer_int64_atomics` and `shader_shared_int64_atomics` features, then both
wgpu features described above are available.
* add new tests for checking on query set lifetime
* Fix ownership management of query sets on compute passes for write_timestamp, timestamp_writes (on desc) and pipeline statistic queries
* changelog entry
Fix two major synchronization issues in `wgpu_val::vulkan`:
- Properly order queue command buffer submissions. Due to Mesa bugs, two semaphores are required even though the Vulkan spec says that only one should be necessary.
- Properly manage surface texture acquisition and presentation:
- Acquiring a surface texture can return while the presentation engine is still displaying the texture. Applications must wait for a semaphore to be signaled before using the acquired texture.
- Presenting a surface texture requires a semaphore to ensure that drawing is complete before presentation occurs.
Co-authored-by: Jim Blandy <jimb@red-bean.com>
This proves a flag in msl::PipelineOptions that attempts to write all
Metal vertex entry points to use a vertex pulling technique. It does
this by:
1) Forcing the _buffer_sizes structure to be generated for all vertex
entry points. The structure has additional buffer_size members that
contain the byte sizes of the vertex buffers.
2) Adding new args to vertex entry points for the vertex id and/or
the instance id and for the bound buffers. If there is an existing
@builtin(vertex_index) or @builtin(instance_index) param, then no
duplicate arg is created.
3) Adding code at the beginning of the function for vertex entry points
to compare the vertex id or instance id against the lengths of all the
bound buffers, and force an early-exit if the bounds are violated.
4) Extracting the raw bytes from the vertex buffer(s) and unpacking
those bytes into the bound attributes with the expected types.
5) Replacing the varyings input and instead using the unpacked
attributes to fill any structs-as-args that are rebuilt in the entry
point.
A new naga test is added which exercises this flag and demonstrates the
effect of the transform. The msl generated by this test passes
validation.
Eventually this transformation will be the default, always-on behavior
for Metal pipelines, though the flag may remain so that naga
translation tests can be run with and without the tranformation.
* lift encoder->computepass lifetime constraint and add now failing test
* compute passes now take an arc to their parent command encoder, thus removing compile time dependency to it
* Command encoder goes now into locked state while compute pass is open
* changelog entry
* share most of the code between get_encoder and lock_encoder
These are being deprecated in the future in favor of the associated
constants (which are already being used in some code here), so this
consistently uses the preferred forms.
* rename `command_encoder_run_*_pass` to `*_pass_end` and make it a method of compute/render pass instead of encoder
* executing a compute pass consumes it now such that it can't be executed again
* use handle_error instead of handle_error_nolabel for wgpu compute pass
* use handle_error instead of handle_error_nolabel for render_pass_end
* changelog addition
* feat: `compute_pass_set_push_constant`: move panics to error variants
Co-Authored-By: Erich Gubler <erichdongubler@gmail.com>
---------
Co-authored-by: Erich Gubler <erichdongubler@gmail.com>
This was previously added in #2230 but I don't think it was necessary. #901 already implemented the buffer <-> texture validation for those formats. It's also not a requirement in the spec.
* basic test setup
* remove lifetime and drop resources on test - test fails now just as expected
* compute pass recording is now hub dependent (needs gfx_select)
* compute pass recording now bumps reference count of uses resources directly on recording
TODO:
* bind groups don't work because the Binder gets an id only
* wgpu level error handling is missing
* simplify compute pass state flush, compute pass execution no longer needs to lock bind_group storage
* wgpu sided error handling
* make ComputePass hal dependent, removing command cast hack. Introduce DynComputePass on wgpu side
* remove stray repr(C)
* changelog entry
* fix deno issues -> move DynComputePass into wgc
* split out resources setup from test
A `for` loop is less noisy than a `drain`, which requires:
- a `mut` qualifier for a variable whose modified value we never
consult
- a method name appearing mid-line instead of a control structure name
at the front of the line
- a range which is always `..`, establishing no restriction at all
- a closure instead of a block
Structured control flow syntax has a fine pedigree, originating in,
among other places, Dijkstrsa's efforts at designing languages in a
way that made it easier to formally verify programs written in
them (see "A Discipline Of Programming"). There is nothing "more
mathematical" about a method call that takes a closure than a `for`
loop. Since `for_each` is useless unless the closure has side effects,
there's nothing "more functional" about `for_each` here, either.
Obsessive use of `for_each` suggests that the author loves Haskell
without understanding it.
Rename `LifetimeTracker::triage_resources`'s `resources_map` argument
to `suspected_resources`, since this always points to a field of
`LifetimeTracker::suspected_resources`.
In the various `triage_suspected_foo` functions, name the map
`suspected_foos`.
Check whether the resource is abandoned first, since none of the rest
of the work is necessary otherwise.
Rename `non_referenced_resources` to `last_resources`. This function
copes with various senses in which the resource might be referenced or
not. Instead, `last_resources` is the name of the `ActiveSubmission`
member this may point to, which is more specific.
Move the use of `last_resources` immediately after its production.
* Avoid introducing spurious features for optional dependencies
If a feature depends on an optional dependency without using the dep:
prefix, a feature with the same name as the optional dependency is
introduced. This feature almost certainly won't have any effect when
enabled other than increasing compile times and polutes the feature list
shown by cargo add. Consistently use dep: for all optional dependencies
to avoid this problem.
* Add changelog entry
* Clean up weak references to texture views
* add change to CHANGELOG.md
* drop texture view before clean up
* cleanup weak ref to bind groups
* update changelog
* Trim weak backlinks in their holders' triage functions.
---------
Co-authored-by: Jim Blandy <jimb@red-bean.com>
Change `Device::untrack` to properly reuse the `ResourceMap` allocated
for prior calls. The prior code tries to do this but always leaves
`Device::temp_suspected` set to a new empty `ResourceMap`, leaving the
previous value to be dropped by `ResourceMap::extend`.
Change `ResourceMap::extend` to take `other` by reference, rather than
taking it by value and dropping it.
Remove unreachable code from `Global::queue_submit` that checks
whether the resources used by the command buffer have a reference
count of one, and adds them to `Device::temp_suspected` if so.
When `queue_submit` is called, all the `Arc`s processed by this code
have a reference count of at least three, even when the user has
dropped the resource:
- `Device::trackers` holds strong references to all the device's
resources.
- `CommandBufferMutable::trackers` holds strong references to all
resources used by the command buffer.
- The `used_resources` methods of the various members of
`CommandBufferMutable::trackers` all return iterators of owned
`Arc`s.
Fortunately, since the `Global::device_drop_foo` methods all add the
`foo` being dropped to `Device::suspected_resources`, and
`LifetimeTracker::triage_suspected` does an adequate job of accounting
for the uninteresting `Arc`s and leaves the resources there until
they're actually dead, things do get cleaned up without the checks in
`Global::queue_submit`.
This allows `Device::temp_suspected` to be private to
`device::resource`, with a sole remaining use in `Device::untrack`.
Fixes#5647.
The lock analyzers in the `wgpu_core::lock` module can be a bit
simpler if they can assume that locks are acquired and released in a
stack-like order: that a guard is only dropped when it is the most
recently acquired lock guard still held. So:
- Change `Device::maintain` to take a `RwLockReadGuard` for the device's
hal fence, rather than just a reference to it.
- Adjust the order in which guards are dropped in `Device::maintain`
and `Queue::submit`.
Fixes#5610.
Rather than implementing `Drop` for all three lock guard types to
restore the lock analysis' per-thread state, let lock guards own
values of a new type, `LockStateGuard`, with the appropriate `Drop`
implementation. This is cleaner and shorter, and helps us implement
`RwLock::downgrade` in a later commit.
* Fix cts_runner command invocation in readme
* Remove assertDeviceMatch from deno_webgpu in createBindGroup
This should be done as verification in wgpu-core.
* Add device mismatched check to create_buffer_binding
* Extract common logic to create_sampler_binding
* Move common logic to create_texture_binding and add device mismatch check
Introduce two new private functions, `acquire` and `release`, to the
`lock::ranked` module, to perform validation for acquiring and
releasing locks. Change `Mutex::lock` and `MutexGuard::drop` to use
those functions, rather than writing out their contents.
* move out compute command to separate module
* introduce ArcComputeCommand
* stateless tracker now returns reference to arc upon insertion
* add insert_merge_single to buffer tracker
* compute pass execution now works internally with an ArcComputeCommand
* compute pass execution now translates Command to ArcCommand ahead of time
* don't clone commands in compute pass execution
* remove doc hiding
* use option insert
* clippy fix
* fix private doc issue
* remove unnecessary copied over doc hide
If `debug_assertions` or the `"validate-locks"` feature are enabled,
change `wgpu-core` to use a wrapper around `parking_lot::Mutex` that
checks for potential deadlocks.
At the moment, `wgpu-core` does contain deadlocks, so the ranking in
the `lock::rank` module is incomplete, in the interests of keeping it
acyclic. #5572 tracks the work needed to complete the ranking.
The derivation is only effective if the generic type parameter `A`
also implements `Default`, which `HalApi` implementations generally
don't, so this derivation never actually took place. (This is why
`ResourceMaps::new` is written out the way it is.)
Move the `Mutex` in `Device::command_allocator` inside the
`CommandAllocator` type itself, allowing it to be passed by shared
reference instead of mutable reference.
Passing `CommandAllocator` to functions like
`PendingWrites::post_submit` by mutable reference requires the caller
to acquire and hold the mutex for the entire time the callee runs, but
`CommandAllocator` is just a recycling pool, with very simple
invariants; there's no reason to hold the lock for a long time.
Flesh out the documentation for `wgpu_core`'s `CommandBuffer`,
`CommandEncoder`, and associated types.
Allow doc links to private items. `wgpu-core` isn't entirely
user-facing, so it's useful to document internal items.
Replace the `wgpu_core:🆔:Id::transmute` method, the `transmute`
private module, and the `Transmute` sealed trait with some associated
functions with obvious names.
* [wgpu-core] pass resources as Arcs when adding them to the registry (fix gfx-rs#5493)
* [wgpu-core] also add `Arc::new` to `#[cfg(dx12)]` blocks
* [wgpu-core] allow `clippy::arc_with_non_send_sync`
* pool tracker vecs
* pool
* ci
* move pool to device
* use pool ref, cleanup and comment
* suspect all the future suspects (#5413)
* suspect all the future suspects
* changelog
* changelog
* review feedback
---------
Co-authored-by: Andreas Reich <r_andreas2@web.de>
Invoke a DeviceLostClosure immediately if set on an invalid device.
To make the device invalid, this defines an explicit, test-only method
make_invalid. It also modifies calls that expect to always retrieve a
valid device.
Co-authored-by: Erich Gubler <erichdongubler@gmail.com>
Rust would have made this operation either an overflow in release mode,
or a panic in debug mode. Neither seem appropriate for this context,
where I suspect an error should be returned instead. Web browsers, for
instance, shouldn't crash simply because of an issue of this nature.
Users may, quite reasonably, have bad arguments to this in early stages
of development!
Fuzz testing in Firefox encountered crashes for calls of
`Global::command_encoder_clear_buffer` where:
* `offset` is greater than `buffer.size`, but…
* `size` is `None`.
Oops! We should _always_ check this (i.e., even when `size` is `None`),
because we have no guarantee that `offset` and the fallback value of
`size` is in bounds. 😅 So, we change validation here to unconditionally
compute `size` and run checks we previously gated behind `if let
Some(size) = size { … }`.
For convenience, the spec. link for this method:
<https://gpuweb.github.io/gpuweb/#dom-gpucommandencoder-clearbuffer>
* split out TIMESTAMP_QUERY_INSIDE_ENCODERS from TIMESTAMP_QUERY
* changelog entry
* update changelog change number
* fix web warnings
* single line changelog
* note on followup issue
When no work is submitted for a frame, presenting the surface results
in a timeout due to no work having been submitted.
Fixes#3189.
This flag was added in #1892 with a note that it was going to be
temporary until #1688 landed.
* docs: sync. `wgpu/Cargo.toml` feature comments with `lib.rs`
* Revert "docs: inline `document-features` usage, remove dep."
This reverts commit 3d5bec659b9cf19f1c64274de0d11808d771cc66, with an
update to `document-features`, and preferring to keep new `feature`
content. To be clear, the only difference I have observed is the
addition of the `serde` feature.
In case it shortens anyone's search, the specific issue resolved is
[`slint-ui/document-features`#20](https://github.com/slint-ui/document-features/issues/20).
* [wgpu-core] Add tests for minimum binding size validation.
* [wgpu-core] Compute minimum binding size correctly for arrays.
In early versions of WGSL, `storage` or `uniform` global variables had
to be either structs or runtime-sized arrays. This rule was relaxed,
and now globals can have any type; Naga automatically wraps such
variables in structs when required by the backend shading language.
Under the old rules, whenever wgpu-core saw a `storage` or `uniform`
global variable with an array type, it could assume it was a
runtime-sized array, and take the stride as the minimum binding size.
Under the new rules, wgpu-core must consider fixed-sized and
runtime-sized arrays separately.
It's risky to get write access through the snatchlock from a drop implementation since the snatch lock is typically held for large scopes. This commit makes it so we deffer snatching some resources to when the device is polled and we know the snatch lock is not held.
Co-authored-by: Erich Gubler <erichdongubler@gmail.com>
This fixes two cases where a DeviceLostClosureC might not be consumed
before it is dropped, which is a requirement:
1) When the closure is replaced, this ensures the to-be-dropped closure
is invoked.
2) When the global is dropped, this ensures that the closure is invoked
before it is dropped.
The first of these two cases is tested in a new test,
DEVICE_LOST_REPLACED_CALLBACK. The second case has a stub,
always-skipped test, DROPPED_GLOBAL_THEN_DEVICE_LOST. The test is
always-skipped because there does not appear to be a way to drop the
global from within a test. Nor is there any other way to reach
Device.prepare_to_die without having first dropping the device.
* Add serde, serialize, deserialize features to wgpu and wgpu-core
Remove trace, replay features from wgpu-types
* Do not use trace, replay in wgpu-types anymore
* Make use of deserialize, serialize features in wgpu-core
* Make use of serialize, deserialize features in wgpu
* Run cargo fmt
* Use serde(default) for deserialize only
* Fix serial-pass feature
* Add a comment for new features
* Add CHANGELOG entry
* Run cargo fmt
* serial-pass also needs serde features for Id<T>
* Add feature documentation to lib.rs docs
* wgpu-types implicit serde feature
* wgpu-core explicit serde feature
* wgpu explicit serde feature
* Update CHANGELOG.md
* Fix compilation with default features
* Address review comments