The `macro_rules!` implementation was becomng excessively complicated,
and difficult to modify. The new proc macro implementation should make
it much easier to add new features (e.g. skipping certain `#[derive]`s)
This reduces peak memory usage significantly for some programs with very
large functions, such as:
- `keccak`, `unicode_normalization`, and `match-stress-enum`, from
the `rustc-perf` benchmark suite;
- `http-0.2.6` from crates.io.
The new type is used in the analyses where the bitsets can get huge
(e.g. 10s of thousands of bits): `MaybeInitializedPlaces`,
`MaybeUninitializedPlaces`, and `EverInitializedPlaces`.
Some refactoring was required in `rustc_mir_dataflow`. All existing
analysis domains are either `BitSet` or a trivial wrapper around
`BitSet`, and access in a few places is done via `Borrow<BitSet>` or
`BorrowMut<BitSet>`. Now that some of these domains are `ClusterBitSet`,
that no longer works. So this commit replaces the `Borrow`/`BorrowMut`
usage with a new trait `BitSetExt` containing the needed bitset
operations. The impls just forward these to the underlying bitset type.
This required fiddling with trait bounds in a few places.
The commit also:
- Moves `static_assert_size` from `rustc_data_structures` to
`rustc_index` so it can be used in the latter; the former now
re-exports it so existing users are unaffected.
- Factors out some common "clear excess bits in the final word"
functionality in `bit_set.rs`.
- Uses `fill` in a few places instead of loops.
`Decoder` has two impls:
- opaque: this impl is already partly infallible, i.e. in some places it
currently panics on failure (e.g. if the input is too short, or on a
bad `Result` discriminant), and in some places it returns an error
(e.g. on a bad `Option` discriminant). The number of places where
either happens is surprisingly small, just because the binary
representation has very little redundancy and a lot of input reading
can occur even on malformed data.
- json: this impl is fully fallible, but it's only used (a) for the
`.rlink` file production, and there's a `FIXME` comment suggesting it
should change to a binary format, and (b) in a few tests in
non-fundamental ways. Indeed #85993 is open to remove it entirely.
And the top-level places in the compiler that call into decoding just
abort on error anyway. So the fallibility is providing little value, and
getting rid of it leads to some non-trivial performance improvements.
Much of this commit is pretty boring and mechanical. Some notes about
a few interesting parts:
- The commit removes `Decoder::{Error,error}`.
- `InternIteratorElement::intern_with`: the impl for `T` now has the same
optimization for small counts that the impl for `Result<T, E>` has,
because it's now much hotter.
- Decodable impls for SmallVec, LinkedList, VecDeque now all use
`collect`, which is nice; the one for `Vec` uses unsafe code, because
that gave better perf on some benchmarks.
This is a compact, fast storage for variable-sized sets, typically consisting of
larger ranges. It is less efficient than a bitset if ranges are both small and
the domain size is small, but will still perform acceptably. With enormous
domain sizes and large ranges, the interval set performs much better, as it can
be much more densely packed in memory than the uncompressed bit set alternative.
Optimize live point computation
This refactors the live-point computation to lower per-MIR-instruction costs by operating on a largely per-block level. This doesn't fundamentally change the number of operations necessary, but it greatly improves the practical performance by aggregating bit manipulation into ranges rather than single-bit; this scales much better with larger blocks.
On the benchmark provided in #90445, with 100,000 array elements, walltime for a check build is improved from 143 seconds to 15.
I consider the tiny losses here acceptable given the many small wins on real world benchmarks and large wins on stress tests. The new code scales much better, but on some subset of inputs the slightly higher constant overheads decrease performance somewhat. Overall though, this is expected to be a big win for pathological cases (as illustrated by the test case motivating this work) and largely not material for non-pathological cases. I consider the new code somewhat easier to follow, too.
This is just replicating the previous algorithm, but taking advantage of the
bitset structures to optimize into tighter and better optimized loops.
Particularly advantageous on enormous MIR blocks, which are relatively rare in
practice.
The PR had some unforseen perf regressions that are not as easy to find.
Revert the PR for now.
This reverts commit 6ae8912a3e, reversing
changes made to 86d6d2b738.
Fix inherent impl overlap check.
The current implementation of the overlap check was slightly buggy, and unified the wrong connected component in the `ids.len() <= 1` case. This became visible in another PR which changed the iteration order of items.
r? ``@matthewjasper`` since you reviewed the other PR.
Stabilize `const_panic`
Closes#51999
FCP completed in #89006
```@rustbot``` label +A-const-eval +A-const-fn +T-lang
cc ```@oli-obk``` for review (not `r?`'ing as not on lang team)
For some reason unboxed_closures supresses the feature gate for
min_specialization when implementing TrustedStep. min_specialization is
the true feature that is used.
Since RFC 3052 soft deprecated the authors field anyway, hiding it from
crates.io, docs.rs, and making Cargo not add it by default, and it is
not generally up to date/useful information, we should remove it from
crates in this repo.
While stdlib implementations of the unchecked methods require unchecked
math, there is no reason to gate it behind this for external users. The
reasoning for a separate `step_trait_ext` feature is unclear, and as
such has been merged as well.
further split up const_fn feature flag
This continues the work on splitting up `const_fn` into separate feature flags:
* `const_fn_trait_bound` for `const fn` with trait bounds
* `const_fn_unsize` for unsizing coercions in `const fn` (looks like only `dyn` unsizing is still guarded here)
I don't know if there are even any things left that `const_fn` guards... at least libcore and liballoc do not need it any more.
`@oli-obk` are you currently able to do reviews?
With the arrival of min const generics, many alt-vec libraries have
updated to use it in some way and arrayvec is no exception. Use the
latest with minor refactoring.
Also, rustc_workspace_hack is the only user of smallvec 0.6 in the
entire tree, so drop it.
Apply workaround from #72003 for #56935 to allow for cross-compilation of `rustc_index` crate
This patch applies the same workaround as #72003 to the `rustc_index` crate. This allows recent versions of rustfmt to compile to wasm again.
Related: #72017.
Add additional bitset benchmarks
Add additional benchmarks for operations in bitset, I realize that it was a bit lacking when I intended to optimize it earlier, so I was hoping to put some in so I can verify my work later.
A few small cleanups to `BitSet` and friends:
- Overload `clone_from` for `BitSet`.
- Improve `Debug` represenation of `HybridBitSet`.
- Make `HybridBitSet::domain_size` public.
- Don't require `T: Idx` at the type level. The `Idx` bound is still on
most `BitSet` methods, but like `HashMap`, it doesn't need to be
satisfied for the type to exist.