Add the following flags to `wgpu_types::Features`:
- `SHADER_INT64_ATOMIC_ALL_OPS` enables all atomic operations on `atomic<i64>` and
`atomic<u64>` values.
- `SHADER_INT64_ATOMIC_MIN_MAX` is a subset of the above, enabling only
`AtomicFunction::Min` and `AtomicFunction::Max` operations on `atomic<i64>` and
`atomic<u64>` values in the `Storage` address space. These are the only 64-bit
atomic operations available on Metal as of 3.1.
Add corresponding flags to `naga::valid::Capabilities`. These are supported by the
WGSL front end, and all Naga backends.
Platform support:
- On Direct3D 12, in `D3D12_FEATURE_DATA_D3D12_OPTIONS9`, if
`AtomicInt64OnTypedResourceSupported` and `AtomicInt64OnGroupSharedSupported` are
both available, then both wgpu features described above are available.
- On Metal, `SHADER_INT64_ATOMIC_MIN_MAX` is available on Apple9 hardware, and on
hardware that advertises both Apple8 and Mac2 support. This also requires Metal
Shading Language 2.4 or later. Metal does not yet support the more general
`SHADER_INT64_ATOMIC_ALL_OPS`.
- On Vulkan, if the `VK_KHR_shader_atomic_int64` extension is available with both the
`shader_buffer_int64_atomics` and `shader_shared_int64_atomics` features, then both
wgpu features described above are available.
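As a minimal sketch (assuming an already-acquired `wgpu::Adapter`; this is not the exact device-creation call), an application would opt into the narrower feature only when the adapter advertises it:

```rust
// Sketch: request the narrower feature only when the adapter reports it.
// `adapter` is assumed to be an already-acquired wgpu::Adapter.
let supported = adapter.features();
let mut required = wgpu::Features::empty();
if supported.contains(wgpu::Features::SHADER_INT64_ATOMIC_MIN_MAX) {
    required |= wgpu::Features::SHADER_INT64_ATOMIC_MIN_MAX;
}
// `required` is then passed as the requested feature set at device creation;
// once enabled, shaders may use 64-bit atomic min/max on storage variables.
```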
* add new tests for checking query set lifetime
* Fix ownership management of query sets on compute passes for write_timestamp, timestamp_writes (on desc) and pipeline statistic queries
* changelog entry
Fix two major synchronization issues in `wgpu_hal::vulkan`:
- Properly order queue command buffer submissions. Due to Mesa bugs, two semaphores are required even though the Vulkan spec says that only one should be necessary.
- Properly manage surface texture acquisition and presentation:
- Acquiring a surface texture can return while the presentation engine is still displaying the texture. Applications must wait for a semaphore to be signaled before using the acquired texture.
- Presenting a surface texture requires a semaphore to ensure that drawing is complete before presentation occurs.
Co-authored-by: Jim Blandy <jimb@red-bean.com>
This provides a flag in msl::PipelineOptions that attempts to rewrite all
Metal vertex entry points to use a vertex pulling technique. It does
this by:
1) Forcing the _buffer_sizes structure to be generated for all vertex
entry points. The structure has additional buffer_size members that
contain the byte sizes of the vertex buffers.
2) Adding new args to vertex entry points for the vertex id and/or
the instance id and for the bound buffers. If there is an existing
@builtin(vertex_index) or @builtin(instance_index) param, then no
duplicate arg is created.
3) Adding code at the beginning of the function for vertex entry points
to compare the vertex id or instance id against the lengths of all the
bound buffers, and force an early-exit if the bounds are violated.
4) Extracting the raw bytes from the vertex buffer(s) and unpacking
those bytes into the bound attributes with the expected types.
5) Replacing the varyings input and instead using the unpacked
attributes to fill any structs-as-args that are rebuilt in the entry
point.
A new naga test is added which exercises this flag and demonstrates the
effect of the transform. The MSL generated by this test passes
validation.
Eventually this transformation will be the default, always-on behavior
for Metal pipelines, though the flag may remain so that naga
translation tests can be run with and without the transformation.
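As a rough usage sketch (the `vertex_pulling_transform` field name is an assumption based on the description above; `module` and `module_info` are assumed to be a validated naga module and its info):

```rust
// Hypothetical sketch of enabling the transform when translating to MSL.
use naga::back::msl;

let options = msl::Options::default();
let pipeline_options = msl::PipelineOptions {
    // Assumed flag name for the vertex pulling transform described above.
    vertex_pulling_transform: true,
    ..Default::default()
};
let (msl_source, _translation_info) =
    msl::write_string(&module, &module_info, &options, &pipeline_options)?;
```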
* lift the encoder->compute-pass lifetime constraint and add a now-failing test
* compute passes now take an `Arc` to their parent command encoder, removing the compile-time lifetime dependency on it
* the command encoder now goes into a locked state while a compute pass is open
* changelog entry
* share most of the code between get_encoder and lock_encoder
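A rough sketch of the intended user-facing behavior once the lifetime is lifted (details of the public API may differ):

```rust
// The pass no longer borrows the encoder at compile time; instead the encoder
// is "locked" while the pass is alive, and misuse surfaces as a validation
// error rather than a borrow-checker error.
let mut encoder =
    device.create_command_encoder(&wgpu::CommandEncoderDescriptor::default());
let pass = encoder.begin_compute_pass(&wgpu::ComputePassDescriptor::default());
// ... record dispatches on `pass` ...
drop(pass); // ends the pass and unlocks the encoder
let _command_buffer = encoder.finish();
```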
These are slated for deprecation in favor of the associated
constants (which some of the code here already uses), so this change
consistently uses the preferred forms.
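Assuming these are the integer `MIN`/`MAX` module-level constants, the preferred form looks like:

```rust
// Module-level constant, slated for deprecation:
let old = std::u32::MAX;
// Preferred associated constant, used consistently after this change:
let new = u32::MAX;
assert_eq!(old, new);
```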
* rename `command_encoder_run_*_pass` to `*_pass_end` and make it a method of compute/render pass instead of encoder
* executing a compute pass now consumes it, so it can't be executed again
* use handle_error instead of handle_error_nolabel for wgpu compute pass
* use handle_error instead of handle_error_nolabel for render_pass_end
* changelog addition
* feat: `compute_pass_set_push_constant`: move panics to error variants
Co-Authored-By: Erich Gubler <erichdongubler@gmail.com>
---------
Co-authored-by: Erich Gubler <erichdongubler@gmail.com>
This was previously added in #2230 but I don't think it was necessary. #901 already implemented the buffer <-> texture validation for those formats. It's also not a requirement in the spec.
* basic test setup
* remove lifetime and drop resources in the test - the test now fails just as expected
* compute pass recording is now hub dependent (needs gfx_select)
* compute pass recording now bumps the reference count of used resources directly during recording
TODO:
* bind groups don't work because the Binder only gets an id
* wgpu-level error handling is missing
* simplify compute pass state flush, compute pass execution no longer needs to lock bind_group storage
* wgpu-side error handling
* make ComputePass hal dependent, removing command cast hack. Introduce DynComputePass on wgpu side
* remove stray repr(C)
* changelog entry
* fix deno issues -> move DynComputePass into wgc
* split out resources setup from test
A `for` loop is less noisy than a `drain` (see the sketch after this list), which requires:
- a `mut` qualifier for a variable whose modified value we never
consult
- a method name appearing mid-line instead of a control structure name
at the front of the line
- a range which is always `..`, establishing no restriction at all
- a closure instead of a block
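For illustration, using a hypothetical `suspected_buffers` vector, the two forms compare like this:

```rust
fn process(id: usize) {
    println!("triaging {id}");
}

// The `drain`-based form: a `mut` binding we never consult again, a method
// name mid-line, a `..` range, and a closure.
fn with_drain(mut suspected_buffers: Vec<usize>) {
    suspected_buffers.drain(..).for_each(|id| process(id));
}

// The `for`-loop form: the control structure leads the line and the body is a block.
fn with_for(suspected_buffers: Vec<usize>) {
    for id in suspected_buffers {
        process(id);
    }
}
```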
Structured control flow syntax has a fine pedigree, originating in,
among other places, Dijkstra's efforts at designing languages in a
way that made it easier to formally verify programs written in
them (see "A Discipline Of Programming"). There is nothing "more
mathematical" about a method call that takes a closure than a `for`
loop. Since `for_each` is useless unless the closure has side effects,
there's nothing "more functional" about `for_each` here, either.
Obsessive use of `for_each` suggests that the author loves Haskell
without understanding it.
Rename `LifetimeTracker::triage_resources`'s `resources_map` argument
to `suspected_resources`, since this always points to a field of
`LifetimeTracker::suspected_resources`.
In the various `triage_suspected_foo` functions, name the map
`suspected_foos`.
Check whether the resource is abandoned first, since none of the rest
of the work is necessary otherwise.
Rename `non_referenced_resources` to `last_resources`. The old name was
ambiguous, since this function copes with various senses in which a
resource might be referenced or not; `last_resources` matches the name of
the `ActiveSubmission` member this argument may point to, which is more
specific.
Move the use of `last_resources` immediately after its production.
* Avoid introducing spurious features for optional dependencies
If a feature depends on an optional dependency without using the `dep:`
prefix, a feature with the same name as the optional dependency is
introduced. That implicit feature almost certainly has no effect when
enabled other than increasing compile times, and it pollutes the feature
list shown by `cargo add`. Consistently use `dep:` for all optional
dependencies to avoid this problem.
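A minimal Cargo manifest sketch of the difference (crate and feature names are illustrative only):

```toml
[dependencies]
serde = { version = "1", optional = true }

[features]
# Without `dep:`, enabling this feature would also expose an implicit `serde`
# feature of the same name as the optional dependency:
# serialize = ["serde"]
# With `dep:`, only the optional dependency is enabled and no extra feature
# shows up in `cargo add`'s feature list:
serialize = ["dep:serde"]
```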
* Add changelog entry
* Clean up weak references to texture views
* add change to CHANGELOG.md
* drop texture view before clean up
* cleanup weak ref to bind groups
* update changelog
* Trim weak backlinks in their holders' triage functions.
---------
Co-authored-by: Jim Blandy <jimb@red-bean.com>
Change `Device::untrack` to properly reuse the `ResourceMap` allocated
for prior calls. The prior code tries to do this but always leaves
`Device::temp_suspected` set to a new empty `ResourceMap`, leaving the
previous value to be dropped by `ResourceMap::extend`.
Change `ResourceMap::extend` to take `other` by reference, rather than
taking it by value and dropping it.
Remove unreachable code from `Global::queue_submit` that checks
whether the resources used by the command buffer have a reference
count of one, and adds them to `Device::temp_suspected` if so.
When `queue_submit` is called, all the `Arc`s processed by this code
have a reference count of at least three, even when the user has
dropped the resource:
- `Device::trackers` holds strong references to all the device's
resources.
- `CommandBufferMutable::trackers` holds strong references to all
resources used by the command buffer.
- The `used_resources` methods of the various members of
`CommandBufferMutable::trackers` all return iterators of owned
`Arc`s.
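To illustrate why the count can never be one here, a standalone sketch using `Arc::strong_count` (the string is a stand-in for a tracked resource):

```rust
use std::sync::Arc;

fn main() {
    // A stand-in for a buffer or texture tracked by the device and by a
    // command buffer's trackers.
    let user_handle = Arc::new("resource");
    let in_device_trackers = Arc::clone(&user_handle);
    let in_command_buffer_trackers = Arc::clone(&user_handle);

    // Even after the user drops their handle, the tracker clones keep the
    // count above one, so a `strong_count == 1` check never fires at submit.
    drop(user_handle);
    assert_eq!(Arc::strong_count(&in_device_trackers), 2);
    let _ = in_command_buffer_trackers;
}
```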
Fortunately, since the `Global::device_drop_foo` methods all add the
`foo` being dropped to `Device::suspected_resources`, and
`LifetimeTracker::triage_suspected` does an adequate job of accounting
for the uninteresting `Arc`s and leaves the resources there until
they're actually dead, things do get cleaned up without the checks in
`Global::queue_submit`.
This allows `Device::temp_suspected` to be private to
`device::resource`, with a sole remaining use in `Device::untrack`.
Fixes #5647.
The lock analyzers in the `wgpu_core::lock` module can be a bit
simpler if they can assume that locks are acquired and released in a
stack-like order: that a guard is only dropped when it is the most
recently acquired lock guard still held. So:
- Change `Device::maintain` to take a `RwLockReadGuard` for the device's
hal fence, rather than just a reference to it.
- Adjust the order in which guards are dropped in `Device::maintain`
and `Queue::submit`.
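A small illustration of the stack-like discipline the analyzers can now assume, using plain std locks as stand-ins:

```rust
use std::sync::{Mutex, RwLock};

fn main() {
    let fence = RwLock::new(0u32);       // stand-in for the device's hal fence lock
    let pending_writes = Mutex::new(0u32); // stand-in for a later-acquired lock

    let fence_guard = fence.read().unwrap();          // acquired first
    let writes_guard = pending_writes.lock().unwrap(); // acquired second

    // Stack-like release: drop the most recently acquired guard first.
    drop(writes_guard);
    drop(fence_guard);
}
```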
Fixes #5610.
Rather than implementing `Drop` for all three lock guard types to
restore the lock analysis' per-thread state, let lock guards own
values of a new type, `LockStateGuard`, with the appropriate `Drop`
implementation. This is cleaner and shorter, and helps us implement
`RwLock::downgrade` in a later commit.
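A hypothetical sketch of the ownership pattern (names follow the description above; the real per-thread analysis state is more involved):

```rust
// Each guard owns a LockStateGuard; its single Drop impl restores the
// analysis' per-thread state, so the guard types need no Drop impls of their own.
struct LockStateGuard {
    saved_state: u32, // whatever per-thread state was saved at acquisition
}

impl Drop for LockStateGuard {
    fn drop(&mut self) {
        // restore the per-thread lock-analysis state from `self.saved_state`
        let _ = self.saved_state;
    }
}

struct MutexGuard<'a, T> {
    inner: std::sync::MutexGuard<'a, T>,
    _state: LockStateGuard, // dropped alongside `inner`, restoring the state
}

struct RwLockReadGuard<'a, T> {
    inner: std::sync::RwLockReadGuard<'a, T>,
    _state: LockStateGuard,
}
```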
* Fix cts_runner command invocation in readme
* Remove assertDeviceMatch from deno_webgpu in createBindGroup
This should be done as verification in wgpu-core.
* Add device mismatched check to create_buffer_binding
* Extract common logic to create_sampler_binding
* Move common logic to create_texture_binding and add device mismatch check
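An illustrative sketch (hypothetical types, not wgpu-core's exact signatures) of the kind of check these helpers share:

```rust
use std::sync::Arc;

struct Device; // stand-in for wgpu-core's device type

// A binding's resource is acceptable only if it was created from the same
// device that owns the bind group being built.
fn same_device(bind_group_device: &Arc<Device>, resource_device: &Arc<Device>) -> bool {
    Arc::ptr_eq(bind_group_device, resource_device)
}

fn main() {
    let device_a = Arc::new(Device);
    let device_b = Arc::new(Device);
    let buffer_device = Arc::clone(&device_a);
    assert!(same_device(&device_a, &buffer_device));
    assert!(!same_device(&device_b, &buffer_device));
}
```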
Introduce two new private functions, `acquire` and `release`, to the
`lock::ranked` module, to perform validation for acquiring and
releasing locks. Change `Mutex::lock` and `MutexGuard::drop` to use
those functions, rather than writing out their contents.
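A minimal self-contained model of what such shared helpers might do (the real `ranked` module tracks a richer rank relation; the names and single-rank state here are simplifications):

```rust
use std::cell::Cell;

thread_local! {
    // Rank of the most recently acquired, still-held lock on this thread.
    static LAST_RANK: Cell<Option<u32>> = Cell::new(None);
}

// Called on lock acquisition: validate the new rank against the previous one
// and record it, returning the prior state so the guard can restore it on drop.
fn acquire(rank: u32) -> Option<u32> {
    let previous = LAST_RANK.with(|r| r.replace(Some(rank)));
    if let Some(prev) = previous {
        assert!(prev < rank, "locks acquired out of rank order");
    }
    previous
}

// Called when the guard drops: restore the state saved at acquisition.
fn release(previous: Option<u32>) {
    LAST_RANK.with(|r| r.set(previous));
}
```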