Avoid query cache sharding code in single-threaded mode
In non-parallel compilers, the sharding code just adds needless overhead at compilation time (since there is statically only one shard anyway). Removing it amounts to a roughly 10 second reduction in bootstrap time, with overall neutral (some wins, some losses) performance results.
Parallel compiler performance should be largely unaffected by this PR; sharding is kept there.
This was largely just caching the shard value at this point, which is not
particularly useful -- at the use sites the key was being hashed nearby anyway.
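For illustration only (this is not the compiler's actual `Sharded` type), the idea is that shard selection can collapse to a constant when the build is non-parallel:
```
// Hypothetical sketch: with a single shard, picking a shard by hash is
// pure overhead, so the non-parallel build can skip the hashing entirely.
#[cfg(parallel_compiler)]
const SHARDS: usize = 32;
#[cfg(not(parallel_compiler))]
const SHARDS: usize = 1;

pub struct Sharded<T> {
    shards: Vec<T>, // always exactly SHARDS entries
}

impl<T> Sharded<T> {
    pub fn get_shard_by_hash(&mut self, hash: u64) -> &mut T {
        if SHARDS == 1 {
            // Single-threaded case: no shard index to compute.
            &mut self.shards[0]
        } else {
            let index = (hash as usize) % SHARDS;
            &mut self.shards[index]
        }
    }
}
```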
Specifically, rename the `Const` struct as `ConstS` and re-introduce `Const` as
this:
```
pub struct Const<'tcx>(Interned<'tcx, ConstS<'tcx>>);
```
This now matches `Ty` and `Predicate` more closely, including using
pointer-based `eq` and `hash`.
Notable changes:
- `mk_const` now takes a `ConstS`.
- `Const` was `Copy`, despite being 48 bytes. Now `ConstS` is not `Copy`, so we
need a separate arena for it, because we can't use the `Dropless` one any
more.
- Many `&'tcx Const<'tcx>`/`&Const<'tcx>` to `Const<'tcx>` changes.
- Many `ct.ty` to `ct.ty()` and `ct.val` to `ct.val()` changes (see the sketch
after this list).
- Lots of tedious sigil fiddling.
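As a rough sketch of that accessor change (the stand-in types below are simplified placeholders, not the real `rustc_middle` definitions):
```
// Illustrative stand-ins only.
#[derive(Clone, Copy)]
pub struct Ty<'tcx>(&'tcx str);
#[derive(Clone, Copy)]
pub struct ConstKind;

pub struct ConstS<'tcx> {
    ty: Ty<'tcx>,
    val: ConstKind,
}

// The new `Const` is a thin Copy wrapper around an interned `ConstS`,
// so field accesses become method calls: `ct.ty` -> `ct.ty()`.
#[derive(Clone, Copy)]
pub struct Const<'tcx>(&'tcx ConstS<'tcx>);

impl<'tcx> Const<'tcx> {
    pub fn ty(self) -> Ty<'tcx> {
        self.0.ty
    }

    pub fn val(self) -> ConstKind {
        self.0.val
    }
}
```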
Specifically, change `Ty` from this:
```
pub type Ty<'tcx> = &'tcx TyS<'tcx>;
```
to this:
```
pub struct Ty<'tcx>(Interned<'tcx, TyS<'tcx>>);
```
There are two benefits to this.
- It's now a first class type, so we can define methods on it. This
means we can move a lot of methods away from `TyS`, leaving `TyS` as a
barely-used type, which is appropriate given that it's not meant to
be used directly.
- The uniqueness requirement is now explicit, via the `Interned` type.
E.g. the pointer-based `Eq` and `Hash` come from `Interned`, rather
than via `TyS`, which wasn't obvious at all. (A minimal sketch of this
follows the list.)
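For illustration, a minimal sketch of what such a pointer-based `Interned` wrapper can look like (simplified; not the compiler's actual definition):
```
use std::hash::{Hash, Hasher};

/// A reference to a value that the interner guarantees is unique, so
/// pointer identity can stand in for structural equality.
#[derive(Clone, Copy)]
pub struct Interned<'a, T>(pub &'a T);

impl<'a, T> PartialEq for Interned<'a, T> {
    fn eq(&self, other: &Self) -> bool {
        // Equal contents are interned to the same allocation, so
        // comparing addresses suffices.
        std::ptr::eq(self.0, other.0)
    }
}

impl<'a, T> Eq for Interned<'a, T> {}

impl<'a, T> Hash for Interned<'a, T> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash the address rather than the (possibly large) contents.
        (self.0 as *const T).hash(state);
    }
}
```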
Much of this commit is boring churn. The interesting changes are in
these files:
- compiler/rustc_middle/src/arena.rs
- compiler/rustc_middle/src/mir/visit.rs
- compiler/rustc_middle/src/ty/context.rs
- compiler/rustc_middle/src/ty/mod.rs
Specifically:
- Most mentions of `TyS` are removed. It's very much a dumb struct now;
`Ty` has all the smarts.
- `TyS` now has `crate` visibility instead of `pub`.
- `TyS::make_for_test` is removed in favour of the static `BOOL_TY`,
which just works better with the new structure.
- The `Eq`/`Ord`/`Hash` impls are removed from `TyS`. `Interned`'s impls
of `Eq`/`Hash` now suffice. `Ord` is now partly on `Interned`
(pointer-based, for the `Equal` case) and partly on `TyS`
(contents-based, for the other cases); see the sketch after this list.
- There are many tedious sigil adjustments, i.e. adding or removing `*`
or `&`. They seem to be unavoidable.
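And a sketch of that split `Ord` (illustrative only, not the real impls):
```
use std::cmp::Ordering;

// How an ordering can be split between pointer identity (the `Equal`
// case) and the contents (every other case).
fn cmp_interned<T: Ord>(a: &T, b: &T) -> Ordering {
    if std::ptr::eq(a, b) {
        // Interned values with equal contents share an allocation, so a
        // pointer match means `Equal` without inspecting the contents.
        Ordering::Equal
    } else {
        // Different allocations: fall back to content-based ordering
        // (the `TyS` side of the split).
        a.cmp(b)
    }
}
```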
Refactor query system to maintain a global job id counter
This replaces the per-shard counters with a single global counter, simplifying
the JobId struct down to just a u64 and removing the need to pipe a DepKind
generic through a bunch of code. The performance implications on non-parallel
compilers are likely minimal (this switches to `Cell<u64>` as the backing
storage over a `u64`, but the latter was already inside a `RefCell`, so it's not
really a significant divergence). On parallel compilers, the cost of a single
global u64 counter may be more significant: in theory it adds a serialization
point. On the other hand, we can imagine changing the counter to have a
thread-local component, or some similar structure, if that becomes worrisome.
The new design is sufficiently simpler that it seems worth the potential need
for slight changes down the line if/when parallel compilation becomes more of a
default.
A u64 counter, instead of u32 (the old per-shard width), is chosen to avoid
possibly overflowing it and causing problems; it is effectively impossible that
we would overflow a u64 counter in this context.
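As a sketch of the shape this takes (hypothetical code, not the actual query-system types):
```
use std::cell::Cell;
use std::sync::atomic::{AtomicU64, Ordering};

// Non-parallel: a plain Cell is enough, since only one thread allocates ids.
#[cfg(not(parallel_compiler))]
pub struct JobIdCounter(Cell<u64>);

#[cfg(not(parallel_compiler))]
impl JobIdCounter {
    pub fn next(&self) -> u64 {
        let id = self.0.get() + 1;
        self.0.set(id);
        id
    }
}

// Parallel: a single atomic increment; this is the serialization point
// mentioned above, and could later gain a thread-local component.
#[cfg(parallel_compiler)]
pub struct JobIdCounter(AtomicU64);

#[cfg(parallel_compiler)]
impl JobIdCounter {
    pub fn next(&self) -> u64 {
        self.0.fetch_add(1, Ordering::Relaxed) + 1
    }
}
```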
Delete -Zquery-stats infrastructure
These statistics are computable from the self-profile data and/or ad-hoc collectable as needed, and in the meantime contribute to rustc bootstrap times -- locally, this PR shaves ~2.5% from rustc_query_impl builds in instruction counts.
If this does lose some functionality we want to keep, I think we should migrate it to self-profile (or a similar interface) rather than this ad-hoc reporting.
`Decoder` has two impls:
- opaque: this impl is already partly infallible, i.e. in some places it
currently panics on failure (e.g. if the input is too short, or on a
bad `Result` discriminant), and in some places it returns an error
(e.g. on a bad `Option` discriminant). The number of places where
either happens is surprisingly small, just because the binary
representation has very little redundancy and a lot of input reading
can occur even on malformed data.
- json: this impl is fully fallible, but it's only used (a) for the
`.rlink` file production, and there's a `FIXME` comment suggesting it
should change to a binary format, and (b) in a few tests in
non-fundamental ways. Indeed #85993 is open to remove it entirely.
And the top-level places in the compiler that call into decoding just
abort on error anyway. So the fallibility is providing little value, and
getting rid of it leads to some non-trivial performance improvements.
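To illustrate the flavour of the change (a simplified sketch; the real `Decoder` trait has many more methods):
```
use std::convert::TryInto;

// Before (roughly): every read returned a Result, which top-level callers
// mostly just aborted on anyway.
pub trait FallibleDecoder {
    type Error;
    fn read_u32(&mut self) -> Result<u32, Self::Error>;
}

// After (roughly): reads are infallible from the caller's point of view;
// malformed input panics inside the decoder instead.
pub trait InfallibleDecoder {
    fn read_u32(&mut self) -> u32;
}

pub struct OpaqueDecoder<'a> {
    data: &'a [u8],
    position: usize,
}

impl<'a> InfallibleDecoder for OpaqueDecoder<'a> {
    fn read_u32(&mut self) -> u32 {
        // Panics (rather than returning Err) if the input is too short.
        let bytes: [u8; 4] = self.data[self.position..self.position + 4]
            .try_into()
            .unwrap();
        self.position += 4;
        u32::from_le_bytes(bytes)
    }
}
```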
Much of this commit is pretty boring and mechanical. Some notes about
a few interesting parts:
- The commit removes `Decoder::{Error,error}`.
- `InternIteratorElement::intern_with`: the impl for `T` now has the same
optimization for small counts that the impl for `Result<T, E>` has,
because it's now much hotter.
- The `Decodable` impls for `SmallVec`, `LinkedList`, and `VecDeque` now all use
`collect`, which is nice; the one for `Vec` uses unsafe code, because
that gave better perf on some benchmarks.
Fixes #92163
Fixes #92014
When writing to the incremental cache, we encode all `Span`s
we encounter, regardless of whether or not their `SourceFile`
comes from the local crate, or from a foreign crate.
When we decode a `Span`, we use the `StableSourceFileId` we encoded
to locate the matching `SourceFile` in the current session. If this
id corresponds to a `SourceFile` from another crate, then we need to
have already imported that `SourceFile` into our current session.
This usually happens automatically during resolution / macro expansion,
when we try to resolve definitions from other crates. In certain cases,
however, we may try to load a `Span` from a transitive dependency
without having ever imported the `SourceFile`s from that crate, leading
to an ICE.
This PR fixes the issue by calling `imported_source_files()`
when we encounter a `SourceFile` with a foreign `CrateNum`.
This ensures that all `SourceFile`s from that crate are imported
into the current session.
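Schematically, the fix looks like this (all the types and methods below are illustrative stand-ins, not the compiler's real API):
```
use std::collections::{HashMap, HashSet};

// Illustrative stand-ins for the session's crate store and source map.
type CrateNum = u32;
type StableSourceFileId = u64;
const LOCAL_CRATE: CrateNum = 0;

struct Session {
    imported_crates: HashSet<CrateNum>,
    source_files: HashMap<StableSourceFileId, String>,
}

impl Session {
    // Import every SourceFile of `cnum` into this session (stubbed out).
    fn imported_source_files(&mut self, cnum: CrateNum) {
        self.imported_crates.insert(cnum);
        // ... insert that crate's files into `self.source_files` ...
    }

    // The fix: before looking up a foreign SourceFile by its stable id,
    // make sure its crate's files have been imported into this session.
    fn source_file_for_span(&mut self, cnum: CrateNum, id: StableSourceFileId) -> &str {
        if cnum != LOCAL_CRATE && !self.imported_crates.contains(&cnum) {
            self.imported_source_files(cnum);
        }
        self.source_files
            .get(&id)
            .expect("SourceFile should have been imported")
    }
}
```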
Remove `SymbolStr`
This was originally proposed in https://github.com/rust-lang/rust/pull/74554#discussion_r466203544. As well as removing the icky `SymbolStr` type, it allows the removal of a lot of `&` and `*` occurrences.
Best reviewed one commit at a time.
r? `@oli-obk`
By changing `as_str()` to take `&self` instead of `self`, we can just
return `&str`. We're still lying about lifetimes, but it's a smaller lie
than before, where `SymbolStr` contained a (fake) `&'static str`!
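A tiny sketch of the shape of the change (not the real `rustc_span` implementation, which uses a global interner):
```
// Illustrative only: a Symbol is an index into an interner, and `as_str`
// borrows from `&self`, so callers get a plain `&str`.
static STRINGS: &[&str] = &["bool", "char", "u32"];

#[derive(Clone, Copy)]
pub struct Symbol(u32);

impl Symbol {
    // Taking `&self` rather than `self` lets the signature return `&str`
    // with a lifetime tied to the borrow, instead of a wrapper type
    // pretending to hold a `&'static str`.
    pub fn as_str(&self) -> &str {
        STRINGS[self.0 as usize]
    }
}

fn main() {
    let sym = Symbol(0);
    assert_eq!(sym.as_str(), "bool");
}
```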
Build the query vtable directly.
Continuation of https://github.com/rust-lang/rust/pull/89978.
This shrinks the query interface and attempts to reduce the number of function pointer calls.
This was already only enabled in debug_assertions builds. Generally, it seems
like most use cases that would use this could also use the -Zself-profile flag,
which also tracks cache hits (in all builds), so the extra cfgs and such
are not really necessary.
This is largely just a small cleanup though, which primarily is intended to make
other changes easier by avoiding the need to deal with this field.
Refactor fingerprint reconstruction
This PR replaces `can_reconstruct_query_key` with `fingerprint_style`, which returns the style of the fingerprint for that query. This allows us to avoid trying to extract a `DefId` (or equivalent) from keys which *are* reconstructible (because they're `()`) but not as `DefId`s.
This is done with the goal of fixing -Zdump-dep-graph, which seems to have broken a while ago (I didn't try to bisect). Currently, even on a `fn main() {}` file it'll ICE (you also need to pass -Zquery-dep-graph for it to work at all), and this patch indirectly fixes the cause of that ICE. This PR also adds a test to ensure it keeps working.
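Schematically, instead of a yes/no property, each query key now reports how its fingerprint can be turned back into a key (a sketch of the idea; the comments are illustrative):
```
// Before (roughly): a single boolean conflated "reconstructible as a
// DefId" with "reconstructible at all".
//   fn can_reconstruct_query_key() -> bool;

// After (roughly): the style says *how* the key can be rebuilt from its
// fingerprint, so `()` keys are no longer treated as DefIds.
pub enum FingerprintStyle {
    // The key is a DefId (or equivalent) and can be recovered from a
    // DefPathHash fingerprint.
    DefPathHash,
    // The key is `()`: trivially reconstructible, but not as a DefId.
    Unit,
    // The key cannot be recovered from its fingerprint.
    Opaque,
}
```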
Turn vtable_allocation() into a query
This PR removes the untracked vtable-const-allocation cache from the `tcx` and turns the `vtable_allocation()` method into a query.
The change is pretty straightforward and should be backportable without too much effort.
Fixes https://github.com/rust-lang/rust/issues/89598.