This adds panicking Hash impls for several resolver types that don't
actually satisfy this condition. It's not obvious to me that
rustc_resolve actually upholds the Interned guarantees but fixing that
seems pretty hard (the structures have at minimum some interior
mutability, so it's not really recursively hashable in place...).
tree-wide: parallel: Fully removed all `Lrc`, replaced with `Arc`
tree-wide: parallel: Fully removed all `Lrc`, replaced with `Arc`
This is continuation of https://github.com/rust-lang/rust/pull/132282 .
I'm pretty sure I did everything right. In particular, I searched all occurrences of `Lrc` in submodules and made sure that they don't need replacement.
There are other possibilities, through.
We can define `enum Lrc<T> { Rc(Rc<T>), Arc(Arc<T>) }`. Or we can make `Lrc` a union and on every clone we can read from special thread-local variable. Or we can add a generic parameter to `Lrc` and, yes, this parameter will be everywhere across all codebase.
So, if you think we should take some alternative approach, then don't merge this PR. But if it is decided to stick with `Arc`, then, please, merge.
cc "Parallel Rustc Front-end" ( https://github.com/rust-lang/rust/issues/113349 )
r? SparrowLii
`@rustbot` label WG-compiler-parallel
When the word "cache" appears in the context of the query system, it often
isn't obvious whether that is referring to the in-memory query cache or the
on-disk incremental cache.
For these types, we can assure the reader that they are for in-memory caching.
It's a function that prints numbers with underscores inserted for
readability (e.g. "1_234_567"), used by `-Zmeta-stats` and
`-Zinput-stats`. It's the only thing in `rustc_middle::util::common`,
which is a bizarre location for it.
This commit:
- moves it to `rustc_data_structures`, a more logical crate for it;
- puts it in a module `thousands`, like the similar crates.io crate;
- renames it `format_with_underscores`, which is a clearer name;
- rewrites it to be more concise;
- slightly improves the testing.
This assumes that the set of valid node IDs is exactly `0..num_nodes`.
In practice, we have a lot of graph-algorithm code that already assumes that
nodes are densely numbered, by using `num_nodes` to allocate per-node indexed
data structures.
[cfg_match] Adjust syntax
A year has passed since the creation of #115585 and the feature, as expected, is not moving forward. Let's change that.
This PR proposes changing the arm's syntax from `cfg(SOME_CONDITION) => { ... }` to `SOME_CODITION => {}`.
```rust
match_cfg! {
unix => {
fn foo() { /* unix specific functionality */ }
}
target_pointer_width = "32" => {
fn foo() { /* non-unix, 32-bit functionality */ }
}
_ => {
fn foo() { /* fallback implementation */ }
}
}
```
Why? Because after several manual migrations in https://github.com/rust-lang/rust/pull/116342 it became clear, at least for me, that `cfg` prefixes are unnecessary, verbose and redundant.
Again, everything is just a proposal to move things forward. If the shown syntax isn't ideal, feel free to close this PR or suggest other alternatives.
Improve VecCache under parallel frontend
This replaces the single Vec allocation with a series of progressively larger buckets. With the cfg for parallel enabled but with -Zthreads=1, this looks like a slight regression in i-count and cycle counts (~1%).
With the parallel frontend at -Zthreads=4, this is an improvement (-5% wall-time from 5.788 to 5.4688 on libcore) than our current Lock-based approach, likely due to reducing the bouncing of the cache line holding the lock. At -Zthreads=32 it's a huge improvement (-46%: 8.829 -> 4.7319 seconds).
try-job: i686-gnu-nopt
try-job: dist-x86_64-linux
This replaces the single Vec allocation with a series of progressively
larger buckets. With the cfg for parallel enabled but with -Zthreads=1,
this looks like a slight regression in i-count and cycle counts (<0.1%).
With the parallel frontend at -Zthreads=4, this is an improvement (-5%
wall-time from 5.788 to 5.4688 on libcore) than our current Lock-based
approach, likely due to reducing the bouncing of the cache line holding
the lock. At -Zthreads=32 it's a huge improvement (-46%: 8.829 -> 4.7319
seconds).
Since it's inception a long time ago, the parallel compiler and its cfgs
have been a maintenance burden. This was a necessary evil the allow
iteration while not degrading performance because of synchronization
overhead.
But this time is over. Thanks to the amazing work by the parallel
working group (and the dyn sync crimes), the parallel compiler has now
been fast enough to be shipped by default in nightly for quite a while
now.
Stable and beta have still been on the serial compiler, because they
can't use `-Zthreads` anyways.
But this is quite suboptimal:
- the maintenance burden still sucks
- we're not testing the serial compiler in nightly
Because of these reasons, it's time to end it. The serial compiler has
served us well in the years since it was split from the parallel one,
but it's over now.
Let the knight slay one head of the two-headed dragon!
Dominator-order information is only needed for coverage graphs, and is easy
enough to collect by just traversing the graph again.
This avoids wasted work when computing graph dominators for any other purpose.
Use `ThinVec` for PredicateObligation storage
~~I noticed while profiling clippy on a project that a large amount of time is being spent allocating `Vec`s for `PredicateObligation`, and the `Vec`s are often quite small. This is an attempt to optimise this by using SmallVec to avoid heap allocations for these common small Vecs.~~
This PR turns all the `Vec<PredicateObligation>` into a single type alias while avoiding referring to `Vec` around it, then swaps the type over to `ThinVec<PredicateObligation>` and fixes the fallout. This also contains an implementation of `ThinVec::extract_if`, copied from `Vec::extract_if` and currently being upstreamed to https://github.com/Gankra/thin-vec/pull/66.
This leads to a small (0.2-0.7%) performance gain in the latest perf run.
Like #130865 did for the standard library, we can use `&raw` in the
compiler now that stage0 supports it. Also like the other issue, I did
not make any doc or test changes at this time.