This fixes two cases where a DeviceLostClosureC might not be consumed
before it is dropped, which is a requirement:
1) When the closure is replaced, this ensures the to-be-dropped closure
is invoked.
2) When the global is dropped, this ensures that the closure is invoked
before it is dropped.
The first of these two cases is tested in a new test,
DEVICE_LOST_REPLACED_CALLBACK. The second case has a stub,
always-skipped test, DROPPED_GLOBAL_THEN_DEVICE_LOST. The test is
always-skipped because there does not appear to be a way to drop the
global from within a test. Nor is there any other way to reach
Device.prepare_to_die without having first dropping the device.
* Add serde, serialize, deserialize features to wgpu and wgpu-core
Remove trace, replay features from wgpu-types
* Do not use trace, replay in wgpu-types anymore
* Make use of deserialize, serialize features in wgpu-core
* Make use of serialize, deserialize features in wgpu
* Run cargo fmt
* Use serde(default) for deserialize only
* Fix serial-pass feature
* Add a comment for new features
* Add CHANGELOG entry
* Run cargo fmt
* serial-pass also needs serde features for Id<T>
* Add feature documentation to lib.rs docs
* wgpu-types implicit serde feature
* wgpu-core explicit serde feature
* wgpu explicit serde feature
* Update CHANGELOG.md
* Fix compilation with default features
* Address review comments
* naga: glsl parser should return singular ParseError similar to wgsl
* wgpu: treat glsl the same as wgsl when creating ShaderModule
* naga: update glsl parser tests to use new ParseError type
* naga: glsl ParseError errors field should be public
* wgpu-core: add 'glsl' feature
* fix some minor bugs in glsl parse error refactor
* naga/wgpu/wgpu-core: improve spirv parse error handling
* wgpu-core: feature gate use of glsl and spv naga modules
* wgpu: enable wgpu-core glsl and spirv features when appropriate
* obey clippy
* naga: derive Clone in Type
* naga: don't feature gate Clone derivation for Type
* obey cargo fmt
* wgpu-core: use bytemuck instead of zerocopy
* wgpu-core: apply suggested edit
* wgpu-core: no need to borrow spirv code
* Update wgpu/src/backend/wgpu_core.rs
Co-authored-by: Alphyr <47725341+a1phyr@users.noreply.github.com>
---------
Co-authored-by: Alphyr <47725341+a1phyr@users.noreply.github.com>
This clarifies that the Rust and C-style callbacks/closures need to be
consumed (not called) before they are dropped. It also makes the from_c
function consume the param closure so that it can be dropped without
panicking.
It also relaxes the restriction that the callback/closure can only be
called once.
* Remove the Destroyed state from Storage
It used to be how we handled destroying buffers and textures but we moved to different approach.
* Explicit check for destroyed textures/buffers in a few entry points
This used to be checked automatically when getting the resource from the registry, but has to be done manually now that we track we track the destroyed state in the resources.
Re-implements https://github.com/gfx-rs/wgpu/pull/4886 (CC @Wumpf)
without the `document-features` crate, which has issues integrating into
Firefox builds after being `cargo vendor`ed into its repository. This
issue is being tracked against
https://github.com/slint-ui/document-features/issues/20. Once resolved,
I expect that we will want to revert this PR in its entirety, since
`document-features` is still a good addition to `wgpu`'s documentation
story.
Internally, consensus has already been achieved for this change.
Firefox's ability to build unfortunately take priority over this
particular convenience. Hopefully, we won't have to compromise shortly!
I tested this by ensuring that the HTML output of our existing
`document_features::document_features!(…)` usage was exactly the same.
There should be exactly zero regressions in the current state of
documentation for users. For maintainers, I have added a disclaimer that
one needs to keep changes in sync. with the relevant `Cargo.toml`
manifests.
* Ensure device lost closure is called exactly once before being dropped.
This requires a change to the Rust callback signature, which is now Fn
instead of FnOnce. When the Rust callback or the C closure are dropped,
they will panic if they haven't been called. `device_drop` is changed
to call the closure with a message of "Device dropped." A test is added.
* Remove some locks in BindGroup
These are only written to clear the vectors when triaging bindgroups for destruction, which is not necessary. We can let the reference counts drop when the bind group is dropped.
* Make the mem_leak test pass again
We allocate a String every time we want to get a label for logging. The string is also allocated when logging is disabled. Either way, the allocation is unnecessary. This commit replaces the String with a dyn Debug reference which does not need any allocation.
* Remove the abstractions in resource maps
ResourceMaps had a rather convoluted system for erasing types that isn't needed anywhere except in one place in the triage_resource function. Everywhere else we are always dealing with specific types so using a member of the resource maps is simpler than going through an abstraction. More importantly there was a constraint that all contents of the resource maps implement the Resource trait which got in the way of some of the ongoing buffer snatching changes.
This commit simplifies this by removing the abstraction. Each resource type has its hash map directly in ResourceMaps and it is easier to have different requirements and behaviors depending on the type of the each resource.
* Introduce `dx12` and `metal` crate features to `wgpu`
* Implement dummy `Context` to allow compilation with `--no-default-features`
* Address review
* Remove `dummy::Context` in favor of `hal::api::Empty`
* Add changelog entry
* Panic early in `Instance::new()` if no backend is enabled
Co-Authored-By: Andreas Reich <1220815+Wumpf@users.noreply.github.com>
---------
Co-authored-by: Andreas Reich <1220815+Wumpf@users.noreply.github.com>
The general idea is to register postpone reigistering the buffer until towards the end of the function so that our unique reference to it lets us easily snatch the raw buffer if an error happens.
* Downgrade resource lifetime management log level to trace.
Allow promoting it back to info via an feature flag.
* Don't filter out info and warning log in the examples.
* Changelog entry.
* Downgrade storage log level from info to trace
* Downgrade tracking log level from info to trace
* Demote present log from info to debug
* Downgrade device/life.rs log from info to debug and trace
* Downgrade more log from info to trace
* Conditionally lift API logging from trace to info level
Most of this logging used to be info level until I demoted it to trace. Unfortunately this gets in my way because:
- Most of the logging I currently need is these API entry points, there is is a fair amount of more verbose logging in wgpu at higher levels than trace
- Firefox disable all trace and debug logging for optimized builds, which means I miss the this API logging where I need most.
This patch lifts the api logging back to info level.
* Move the api logging behind the api_log macro
* Clean up the trace-level logging for devices
* Log the descriptors for create_buffer and create_texture.
* Make logged ids more concise
Logged ids go from 'Id { index: 204, epoch: 1, backend: Vulkan }' to 'Id(204,1,vk)'.
* Log errors in more places.
Let `naga::TypeInner::Matrix` hold a full `Scalar`, with a kind and
byte width, not merely a byte width, to make it possible to represent
matrices of AbstractFloats for WGSL.
* Keep the value in its storage after destroy
in #4657 the destroy implementation was made to remove the value from the storage and leave an error variant in its place.
Unfortunately this causes some issues with the tracking code which expects the ID to be unregistered after the value has been fully destroyed, even if the latter is not in storage anymore.
To work around that, this commit adds a `Destroyed` variant in storage which keeps the value so that the tracking behavior is preserved while
still making sure that most accesses to the destroyed resource lead to validation errors.
... Except for submitted command buffers that need to be consumed right away. These are replaced with `Element::Error` like before this commit.
Co-authored-by: Teodor Tanasoaia <28601907+teoxoy@users.noreply.github.com>
Introduce a new struct type, `Scalar`, combining a `ScalarKind` and a
`Bytes` width, and use this whenever such pairs of values are passed
around.
In particular, use `Scalar` in `TypeInner` variants `Scalar`, `Vector`,
`Atomic`, and `ValuePointer`.
Introduce associated `Scalar` constants `I32`, `U32`, `F32`, `BOOL`
and `F64`, for common cases.
Introduce a helper function `Scalar::float` for constructing `Float`
scalars of a given width, for dealing with `TypeInner::Matrix`, which
only supplies the scalar width of its elements, not a kind.
Introduce helper functions on `Literal` and `TypeInner`, to produce
the `Scalar` describing elements' values.
Use `Scalar` in `wgpu_core::validation::NumericType` as well.
* Better handle destroying textures and buffers
Before this commit, explicitly destroying a texture or a buffer (without dropping it)
schedules the asynchronous destruction of its raw resources but does not actually mark
it as destroyed. This can cause some incorrect behavior, for example mapping a buffer
after destroying it does not cause a validation error (and later panics due to the
map callback being dropped without being called).
This Commit adds `Storage::take_and_mark_destroyed` for use in `destroy` methods.
Since it puts the resource in the error state, other methods properly catch that
the resource is no longer usable when attempting to access it and raise validation
errors.
There are other resource types that require similar treatment and will be addressed
in followup work.
* Add a changelog entry
* More complete implementation of "lose the device".
This provides a way for wgpu-core to specify a callback on "lose the
device". It ensures this callback is called at the appropriate times:
either after device.destroy has empty queues, or on demand from
device.lose.
A test has been added to device.rs.
* Updated CHANGELOG.md.
* Fix conversion to *const c_char.
* Use an allow lint to permit trivial_casts.
* rustfmt changes.
Since the pipeline id is provided by the caller, the caller may presume
that an implicit pipeline layout id is also created, even in error
conditions such as with an invalid device. Since our registry system
will panic if asked to retrieve a pipeline layout id that has never been
registered, it's dangerous to leave that id unregistered. This ensures
that the layout ids also get error values when the pipeline creation
returns an error.
* Add `BGRA8UNORM_STORAGE` extension
* Leave a comment in the backends that don't support bgra8unorm-storage
* Pass the appropriate storage format to naga
* Check for bgra8unorm storage support in the vulkan backend
* Add a test
Co-authored-by: Jinlei Li <jinleili0@outlook.com>
Co-authored-by: Teodor Tanasoaia <28601907+teoxoy@users.noreply.github.com>
* Make buffer_unmap check for device validity, add tests.
This patch makes buffer_unmap check for a valid device, and corrects
buffer_map to return the appropriate error for an invalid device.
Tests are added for both operations.
It also adds device validity checks to device_maintain_ids and to the
functions that get and set buffer sub data.
* Update changelog and test comment.
* Run rustfmt.
* Update test device_lose_then_more to specify more buffer usages to keep Vulkan happy.
* Don't create a raw bind group layout for duplicates.
* Fully support bind group layout deduplication in Pipeline::get_bind_group_layout
---------
Co-authored-by: Connor Fitzgerald <connorwadefitzgerald@gmail.com>
* wgpu-core: Only produce StageError::InputNotConsumed on DX11/DX12
This error only exists due to an issue with naga's HLSL support:
https://github.com/gfx-rs/naga/issues/1945
The WGPU spec itself allows vertex shader outputs that are
not consumed by the fragment shader.
Until the issue is fixed, we can allow unconsumed outputs on
all platforms other than DX11/DX12.
* Add Features::SHADER_UNUSED_VERTEX_OUTPUT to allow disabling check
* Pick an unused feature id
* Remove generic parameter in compat::Manager.
The code is specific to bind group layout ids and the generic parameter gets in the way.
* Add a test.
The test actually covers wgpu's configuration where deduplication does not require the indirection rather than the new code. I used it to debug the new code with the configuration hard-coded. It's tedious to add a test to cover dedpuplication of bind group layouts for users of wgpu_core to provide their IDs, we can rely on the CTS which has test for that.
* Implement bind group layout deduplication for all configurations
Currently wgpu-core implement bind group layout deduplication only when it creates its own resource IDs. In other words it works for wgpu but not in Firefox.
This PR bridges the gap by allowing an optional indirection in bind group layouts: each BGL may store an ID referring to its "deduplicated" BGL.
When referring to a BGL the rest of the code must make sure to follow the indirection. The exception is command buffer processing which is considered hot code and where we first validate against the provided BGL ID and only follow the indirection if the initial check failed.
The main pain point with this approach is the various places where wgpu-core manually updates reference counts: we have to be careful about following the indirection to track the right BGL.
* Avoid making decisions based on the size of some generic type.
* Add validation in accordance with WebGPU `setViewport` valid usage for `x`, `y` and `this.[[attachment_size]]`.
`x` and `y` must not be negative, and the rect must be contained in the render target.
* Add changelog entry.
* Make crate features no-op on incompatible targets
* Remove `portable_features`
* Test `--all-features` in CI
* Make `renderdoc` no-op on WASM
* Address review
* remove redundant flag
* set the `MULTISAMPLED_SHADING` downlevel flag for gles and dx11
* set the right naga capabilities
add `Features::SHADER_EARLY_DEPTH_TEST`
* add changelog entry
* remove `@early_depth_test` from water example
* clippy --fix
* elide lifetimes
* fmt and more fixes
* disable clippy::needless_borrowed_reference as it clashes with clippy::pattern_type_mismatch
* missed flags for target=wasm32-unknown-unknown
wgpu currently checks if the `write_mask` is 0 to determine wether a
stencil is used as readonly or not. However Webgpu contains a more
complex ruleset that also checks the cull mode and face operations to
determine if the stencil is readonly or not.
This commit brings these new rules to wgpu.
* Add validation in accordance with WebGPU `GPUSamplerDescriptor` valid usage for `lodMinClamp` and `lodMaxClamp`.
`lodMinClamp` must not be negative, and `lodMaxClamp` must be >= `lodMinClamp`
* Add changelog entry.
* Run `cargo fmt`.
Somewhere after 1.64 but by 1.66, `rustdoc` started checking for
unmatched HTML tags in doc strings, meaning that text like
/// Convenience function turning Option<Selector> into this enum.
causes problems, since `rustdoc` will pass through `<Selector>` as
HTML tag, which browsers displaying the output will misunderstand.
* Low-level error handling in canvas context creation (for WebGPU and WebGL 2).
Part of fixing #3017. This commit changes the handling of `web_sys`
errors and nulls from `expect()` to returning `Err`, but it doesn't
actually affect the public API — that will be done in the next commit.
* Breaking: Change type of `create_surface()` functions to expose canvas errors.
Part of fixing #3017.
* enable doc_auto_cfg for docs.rs
This should expose more feature labels in the generated documentation and removes the needs for the manually labeling the features for a type, function or enum variants.
* enable docsrs cfg when building docs for master
* Split Blendability and Filterability into Two Different TextureFormatFeatureFlags
* Update CHANGELOG.md
* Split out individual booleans to improve readability
* Make sure blendablity is correctly set for guaranteed format features of WebGPU
* Cargo fmt
Co-authored-by: Xander Warnez <xander.warnez@materialise.be>
Co-authored-by: Connor Fitzgerald <connorwadefitzgerald@gmail.com>
This change is to help with an attempt to allow the Context type in wgpu to be swappable at runtime. In order to do that, the functions provided by Context and it's associated types need to be object safe. Instead of passing a impl trait that combines both HasRawWindowHandle and HasRawDisplayHandle, we seperate the types into their RawDisplayHandle and RawWindowHandle parts internally to reduce some of the hal implementation code mess.
Change `Instance::as_hal<A>` to simply return an
`Option<&A::Instance>`, rather than passing it to a callback. We're
just borrowing a reference to some `wgpu_hal::Api::Instance` owned by
the `wgpu_core::instance::Instance`, so there's no special scoping or
locking magic required here.
* Use () instead of PhantomData as IdentityManager's Input type.
PhantomData suggests that there's some sort of persuasion required
for lifetime variance inference or other sorts of arcana, but it
doesn't seem to be necessary at all. `()` works just fine.
* Have `prepare_staging_buffer` take a raw hal Device.
This helps the borrow checker understand that we can borrow
`pending_writes`'s encoder and the raw Device at the same time.
* Always free staging buffers.
We must ensure that the staging buffer is not leaked when an error
occurs, so allocate it as late as possible, and free it explicitly when
those fallible operations we can't move it past go awry.
Fixes#2959.
* Some tests for texture and buffer copies.
* Add CHANGELOG.md entry.
Since `wgpu-core`'s public functions are supposed to validate their
parameters, the internal `track` module skips many of Rust's usual
run-time checks in release builds. However, some `wgpu-core` users are
happy to pay the performance cost in exchange for more safety. The
`"strict_asserts"` feature causes `wgpu_core` to perform the same
checks in release builds as it does in debug builds.
* Validate that map_async's range is not negative.
map_async already checks that the range's end is within the bounds of the buffer, so this also ensures the range start is within bounds.
* Add an entry in the changelog.
* Record that the buffer is mapped when its size is zero.
* Avoid internally trying to map a zero-sized buffer if mapped_at_creation is true.
* Add an entry in the changelog.
* Ensure the BufferMapAsyncCallback is always called.
This solves two issues on the Gecko side:
- The callback cleans up after itself (the user data is deleted at the end of the callback), so dropping the callback without calling it is a memory leak. I can't think of a better way to implement this on the C++ side since there can be any number of callback at any time living for an unspecified amount of time.
- This makes it easier to implement the error reporting of the WebGPU spec.
* Update the changelog.
* Make the color attachments `Option`-al in render pipelines, render passes, and render bundles
* vk: `Option`-al color attachments support
* dx12: sparse color_attachments support
* Only non-hole attachments is supported on wasm target and gl backend
* deno_webgpu: `Option`-al color attachments support
* Follow all suggestions
* Add Limit::max_buffer_size.
* Prevent very large buffer with some drivers.
Some drivers run into issues when buffer sizes and ranges are larger than what fits signed 32 bit integer. Adapt the maximum buffer size accordingly.
in the thiserror error format string, `{0:?}` ends up referring to the
first named argument, not the first struct field or a compile error, so
the error was incorrectly
Dimension 32768 value 32768 exceeds the limit of 16384
instead of
Dimension X value 32768 exceeds the limit of 16384
* Change all the functions
* Return the set of supported TextureFormat specified by WebGPU
* Remove redundant filtering and use list directly
* Replace pops with first
* Remove now unused function
* Fix doc and clarify preffered format
* Dereference enums
* Remove unused list
* Remove fancy coode
* Update wgpu-core/src/device/mod.rs
Co-authored-by: Connor Fitzgerald <connorwadefitzgerald@gmail.com>
Co-authored-by: Connor Fitzgerald <connorwadefitzgerald@gmail.com>
Refactor `wgpu_core::command::bundle::State` to more closely resemble
the internal slots of a WebGPU `GPURenderBundleEncoder`, and add
validation required by the WebGPU specification.
Use `Option` to represent state that may be left unset on the encoder:
specifically, the pipeline and vertex buffers. (Previous commits have
already addressed index buffers and bind groups.) Use `.ok_or`, etc.
for unwrapping, to ensure that encoding state errors are reported.
Consolidate state derived from the pipeline in a new `PipelineState`
struct.
Remove `wgpu_core::command::bundle::PushConstantState::is_dirty`; just
represent push constant state as a vector of `PushConstantRange`
values. It's sufficient to simply zero the push constants whenever the
vector is non-empty.
Rename `bundle::State::flush_push_constants` to `zero_push_constants`a.
This is not a "flush pending state changes" function like all the
others; it just ensures that each pipeline's push constant state is
initialized.
This is used in various places around render pipelines, passes, and
bundles.
The public `wgpu_core::pipeline::VertexBufferLayout` could use
`VertexStep` as well, but I think it's best to let that continue to
resemble `GPUVertexBufferLayout`.
* Remove unused field `bundle::IndexState::pipeline_format`.
* Clean up render bundle index buffer tracking.
Put all state associated with an established index buffer within an
`Option`, so that the Rust types accurately represent value liveness.
Generate `MissingIndexBuffer` errors as needed for `DrawIndexed` and
indexed `MultiDrawIndirect` commands.
`wgpu_core::command::bundle::State::set_pipeline` marks a vertex
buffer slot dirty if the pipeline's stride or step mode for that
vertex buffer slot differs from what had been previously established.
The effect of marking the slot dirty is to ensure that a new
`SetVertexBuffer` command is inserted before the next draw command
that uses that vertex buffer. However, this is unnecessary:
`wgpu_hal::CommandEncoder::set_vertex_buffer` does not need to be
called simply because the stride or rate has changed.
Put some plumbing in place to accomodate the latest definition of
`GPURenderBundleEncoderDescriptor` in the WebGPU spec, which now has
separate `depthReadOnly` and `stencilReadOnly` members.
Rename `RenderPassDepthStencilAttachment::is_read_only` to
`depth_stencil_read_only`, and don't skip validation steps due to
early returns.
* First attempt of exposing create_surface_from_canvas for webgl
* Test Fix Compile For WebGL Offscreen Canvas
* Only specify web-sys feature version in wgpu-core, so that patch version is taken from workspace
* Reuse already existing fn create_surface_from_canvas
* Remove unnecessary unsafe
* Remove unsafe prefix also from top-level create_surface_from_canvas
* Add create_surface_from_offscreen_canvas() for webgl
* Cargo fmt
* Store webgl2_context instead of canvas, which works also for OffscreenCanvas
* Copy skybox example for OffscreenCanvas example
* Use offscreen_canvas instead in newly created example
* Update skypbox_offscreen readme.md
* Allow enabling OffscreenCanvas in examples via http://localhost:8000/?offscreen_canvas=true
* Fix cargo fmt
* [fix warning] Only import FromStr for wasm32
* [fix warning] Exclude offscreen_canvas_setup from non-wasm32
* [fix warning] Add ImageBitmap feature as well so that all related methods can be used
* Fix cargo fmt
* Fix emscripten build
* Remove `webgl` feature from wgpu-core as webgl is the only wasm32 backend
Co-authored-by: Zicklag <zicklag@katharostech.com>
* Implement submission indexes
* Write some unit tests for poll
* Update wgpu/src/lib.rs
Co-authored-by: Jim Blandy <jimb@red-bean.com>
* Unify Maintain in both wgc and wgpu
Co-authored-by: Jim Blandy <jimb@red-bean.com>
* surface.acquire_texture: pass Option<Duration> for timeout
A std::time::Duration allows for timeouts to be specified more clearly in
Rust using whatever units are convenient for the caller, and an Option also
makes it clearer in case no timeout is wanted, as opposed to passing a
bitwise !0 as special timeout value.
Notably there was an impedance mismatch with the Vulkan backend that
takes a 64bit timeout in nanoseconds and uses u64::MAX to indicate no
timeout and the backend was not mapping a given u32::MAX into a u64::MAX
* surface.acquire_texture: ignore timeout for Android < 11
Prior to Android 11 then Android's vkAcquireNextImageKHR implementation was
non-conformant and didn't support timeouts and additionally would log a
verbose warning if a timeout was requested.
For reference this version of AcquireNextImageKHR doesn't support timeouts:
https://android.googlesource.com/platform/frameworks/native/+/refs/tags/android-mainline-10.0.0_r13/vulkan/libvulkan/swapchain.cpp#1426
and this version does:
https://android.googlesource.com/platform/frameworks/native/+/refs/tags/android-mainline-11.0.0_r45/vulkan/libvulkan/swapchain.cpp#1438
(i.e. timeout support was added in Android 11)
This patch adds a dependency on the `android-properties` crate that provides
a simple wrapper for the `__system_property_set` syscall so that the
platform version can be read via the `ro.build.version.sdk` property
and then for versions < 30 (corresponds to Android 11) any timeout
given to `acquire_texture` will be ignored (and `u64::MAX` will be
passed to Vulkan)
This vector's contents always ended up identical to the
`RenderBundleEncoder`'s `BasePass`'s `dynamic_offsets` vector, so
we can just take values from there instead of copying them.
The `dynamic_offsets` and `is_dirty` flags only make sense when the
slot is occupied, so they should be inside the `Option`. This makes
`State::bind` into an `ArrayVec<Option<BindState>>`, and cleans up
various other bits.
In parking_lot 0.12 and parking_lot_core 0.9.0, those crates switched
from the winapi crate to the official Microsoft windows-sys crate.
This is fine, except that windows-sys and its dependencies are even
larger than winapi. Some users may wish to stick with winapi for the
time being; this change allows wgpu to accommodate them.
It's very odd to have almost all the render pass and compute pass ffi
functions in `wgpu` except for the `set_index_buffer` functions, which
live in Firefox. I'd like to remove these from Firefox and put them
back next to their companions.
These functions were originally removed from wgpu in #1077, because
wgpu-native has its own incompatible version of IndexFormat (see that
PR for details). However, with wgpu-native#85, that code was removed,
so having these functions in `wgpu` should be no longer be a problem
for wgpu-native.
* fix: don't panic on invalid id in Storage::get
* formatting
* removed double matches
* more match removal
* fix formatting
* add fix to Storage::label_for_invalid_id
This is intended to help developers new to wgpu or to graphics debugging
quickly recognize in a debugging tool which items are wgpu-generated, as
opposed to either part of their program or part of the platform graphics
system.
I also removed existing marker-like text such as leading underscores and
angle brackets.
Without this change, `LifetimeTracker::triage_suspected` never notices
that compute or render pipelines are free, and they stick around until
the hub is destroyed.
Fixes#2564.
Go ahead and call `global.device_drop` from `direct::Context::device_drop`.
Let `Global::device_drop` simply drop the life guard's `RefCount`, and
put off everything else entailed in freeing a device until
`Device::maintain` says its queue is empty and there are no more
references to it. (The user can reach that function, even after
dropping their `Device`, by calling `wgpu::Instance::poll_all`.)
Fixes#2563.
`RefCount::rich_drop_inner` is no longer used by anything other than `RefCount::drop`. It's simpler to just handle it directly in `drop`.
`MultiRefCount` has no need to heap-allocate the count, since it's
never cloned.