Remove `DiagCtxt` API duplication
`DiagCtxt` defines the internal API for creating and emitting diagnostics: methods like `struct_err`, `struct_span_warn`, `note`, `create_fatal`, `emit_bug`. There are over 50 methods.
Some of these methods are then duplicated across several other types: `Session`, `ParseSess`, `Parser`, `ExtCtxt`, and `MirBorrowckCtxt`. `Session` duplicates the most, though half the ones it does are unused. Each duplicated method just calls forward to the corresponding method in `DiagCtxt`. So this duplication exists to (in the best case) shorten chains like `ecx.tcx.sess.parse_sess.dcx.emit_err()` to `ecx.emit_err()`.
This API duplication is ugly and has been bugging me for a while. And it's inconsistent: there's no real logic about which methods are duplicated, and the use of `#[rustc_lint_diagnostic]` and `#[track_caller]` attributes vary across the duplicates.
This PR removes the duplicated API methods and makes all diagnostic creation and emission go through `DiagCtxt`. It also adds `dcx` getter methods to several types to shorten chains. This approach scales *much* better than API duplication; indeed, the PR adds `dcx()` to numerous types that didn't have API duplication: `TyCtxt`, `LoweringCtxt`, `ConstCx`, `FnCtxt`, `TypeErrCtxt`, `InferCtxt`, `CrateLoader`, `CheckAttrVisitor`, and `Resolver`. These result in a lot of changes from `foo.tcx.sess.emit_err()` to `foo.dcx().emit_err()`. (You could do this with more types, but it gets into diminishing returns territory for types that don't emit many diagnostics.)
After all these changes, some call sites are more verbose, some are less verbose, and many are the same. The total number of lines is reduced, mostly because of the removed API duplication. And consistency is increased, because calls to `emit_err` and friends are always preceded with `.dcx()` or `.dcx`.
r? `@compiler-errors`
Unify SourceFile::name_hash and StableSourceFileId
This PR adapts the existing `StableSourceFileId` type so that it can be used instead of the `name_hash` field of `SourceFile`. This simplifies a few things that were kind of duplicated before.
The PR should also fix issues https://github.com/rust-lang/rust/issues/112700 and https://github.com/rust-lang/rust/issues/115835, but I was not able to reproduce these issues in a regression test. As far as I can tell, the root cause of these issues is that the id of the originating crate is not hashed in the `HashStable` impl of `Span` and thus cache entries that should have been considered invalidated were loaded. After this PR, the `stable_id` field of `SourceFile` includes information about the originating crate, so that ICE should not occur anymore.
`IntoDiagnostic` defaults to `ErrorGuaranteed`, because errors are the
most common diagnostic level. It makes sense to do likewise for the
closely-related (and much more widely used) `DiagnosticBuilder` type,
letting us write `DiagnosticBuilder<'a, ErrorGuaranteed>` as just
`DiagnosticBuilder<'a>`. This cuts over 200 lines of code due to many
multi-line things becoming single line things.
This commit replaces this pattern:
```
err.into_diagnostic(dcx)
```
with this pattern:
```
dcx.create_err(err)
```
in a lot of places.
It's a little shorter, makes the error level explicit, avoids some
`IntoDiagnostic` imports, and is a necessary prerequisite for the next
commit which will add a `level` arg to `into_diagnostic`.
This requires adding `track_caller` on `create_err` to avoid mucking up
the output of `tests/ui/track-diagnostics/track4.rs`. It probably should
have been there already.
Currently, `emit_diagnostic` takes `&mut self`.
This commit changes it so `emit_diagnostic` takes `self` and the new
`emit_diagnostic_without_consuming` function takes `&mut self`.
I find the distinction useful. The former case is much more common, and
avoids a bunch of `mut` and `&mut` occurrences. We can also restrict the
latter with `pub(crate)` which is nice.
detects redundant imports that can be eliminated.
for #117772 :
In order to facilitate review and modification, split the checking code and
removing redundant imports code into two PR.
Add support for making lib features internal
We have the notion of an "internal" lang feature: a feature that is never intended to be stabilized, and using which can cause ICEs and other issues without that being considered a bug.
This extends that idea to lib features as well. It is an alternative to https://github.com/rust-lang/rust/pull/115623: instead of using an attribute to declare lib features internal, we simply do this based on the name. Everything ending in `_internals` or `_internal` is considered internal.
Then we rename `core_intrinsics` to `core_intrinsics_internal`, which fixes https://github.com/rust-lang/rust/issues/115597.
Cut code size for feature hashing
This locally cuts ~32 kB of .text instructions.
This isn't really a clear win in terms of readability. IMO the code size benefits are worth it (even if they're not necessarily present in the x86_64 hyperoptimized build, I expect them to translate similarly to other platforms). Ultimately there's lots of "small ish" low hanging fruit like this that I'm seeing that seems worth tackling to me, and could translate into larger wins in aggregate.
Call FileEncoder::finish in rmeta encoding
Fixes https://github.com/rust-lang/rust/issues/117254
The bug here was that rmeta encoding never called FileEncoder::finish. Now it does. Most of the changes here are needed to support that, since rmeta encoding wants to finish _then_ access the File in the encoder, so finish can't move out.
I tried adding a `cfg(debug_assertions)` exploding Drop impl to FileEncoder that checked for finish being called before dropping, but fatal errors cause unwinding so this isn't really possible. If we encounter a fatal error with a dirty FileEncoder, the Drop impl ICEs even though the implementation is correct. If we try to paper over that by wrapping FileEncoder in ManuallyDrop then that just erases the fact that Drop automatically checks that we call finish on all paths.
I also changed the name of DepGraph::encode to DepGraph::finish_encoding, because that's what it does and it makes the fact that it is the path to FileEncoder::finish less confusing.
r? `@WaffleLapkin`
Currently we always do this:
```
use rustc_fluent_macro::fluent_messages;
...
fluent_messages! { "./example.ftl" }
```
But there is no need, we can just do this everywhere:
```
rustc_fluent_macro::fluent_messages! { "./example.ftl" }
```
which is shorter.
The `fluent_messages!` macro produces uses of
`crate::{D,Subd}iagnosticMessage`, which means that every crate using
the macro must have this import:
```
use rustc_errors::{DiagnosticMessage, SubdiagnosticMessage};
```
This commit changes the macro to instead use
`rustc_errors::{D,Subd}iagnosticMessage`, which avoids the need for the
imports.
print query map for deadlock when using parallel front end
print query map for deadlock when using parallel front end, so that we can analyze where and why deadlock occurs
By default, `newtype_index!` types get a default `Encodable`/`Decodable`
impl. You can opt out of this with `custom_encodable`. Opting out is the
opposite to how Rust normally works with autogenerated (derived) impls.
This commit inverts the behaviour, replacing `custom_encodable` with
`encodable` which opts into the default `Encodable`/`Decodable` impl.
Only 23 of the 59 `newtype_index!` occurrences need `encodable`.
Even better, there were eight crates with a dependency on
`rustc_serialize` just from unused default `Encodable`/`Decodable`
impls. This commit removes that dependency from those eight crates.
- Sort dependencies and features sections.
- Add `tidy` markers to the sorted sections so they stay sorted.
- Remove empty `[lib`] sections.
- Remove "See more keys..." comments.
Excluded files:
- rustc_codegen_{cranelift,gcc}, because they're external.
- rustc_lexer, because it has external use.
- stable_mir, because it has external use.
Don't store lazyness in `DefKind::TyAlias`
1. Don't store lazyness of a type alias in its `DefKind`, but instead via a query.
2. This allows us to treat type aliases as lazy if `#[feature(lazy_type_alias)]` *OR* if the alias contains a TAIT, rather than having checks for both in separate parts of the codebase.
r? `@oli-obk` cc `@fmease`
Simplify/Optimize FileEncoder
FileEncoder is basically a BufWriter except that it exposes access to the not-written-to-yet region of the buffer so that some users can write directly to the buffer. This strategy is awesome because it lets us avoid calling memcpy for small copies, but the previous strategy was based on the writer accessing a `&mut [MaybeUninit<u8>; N]` and returning a `&[u8]` which is an API which currently mandates the use of unsafe code, making that interface in general not that appealing.
So this PR cleans up the FileEncoder implementation and builds on that general idea of direct buffer access in order to prevent `memcpy` calls in a few key places when encoding the dep graph and rmeta tables. The interface used here is now 100% safe, but with the caveat that internally we need to avoid trusting the number of bytes that the provided function claims to have written.
The original primary objective of this PR was to clean up the FileEncoder implementation so that the fix for the following issues would be easy to implement. The fix for these issues is to correctly update self.buffered even when writes fail, which I think it's easy to verify manually is now done, because all the FileEncoder methods are small.
Fixes https://github.com/rust-lang/rust/issues/115298
Fixes https://github.com/rust-lang/rust/issues/114671
Fixes https://github.com/rust-lang/rust/issues/114045
Fixes https://github.com/rust-lang/rust/issues/108100
Fixes https://github.com/rust-lang/rust/issues/106787
Add optimized lock methods for `Sharded` and refactor `Lock`
This adds methods to `Sharded` which pick a shard and also locks it. These branch on parallelism just once instead of twice, improving performance.
Benchmark for `cfg(parallel_compiler)` and 1 thread:
<table><tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th></tr><tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</th></tr><tr><td>🟣 <b>clap</b>:check</td><td align="right">1.6461s</td><td align="right">1.6345s</td><td align="right"> -0.70%</td></tr><tr><td>🟣 <b>hyper</b>:check</td><td align="right">0.2414s</td><td align="right">0.2394s</td><td align="right"> -0.83%</td></tr><tr><td>🟣 <b>regex</b>:check</td><td align="right">0.9205s</td><td align="right">0.9143s</td><td align="right"> -0.67%</td></tr><tr><td>🟣 <b>syn</b>:check</td><td align="right">1.4981s</td><td align="right">1.4869s</td><td align="right"> -0.75%</td></tr><tr><td>🟣 <b>syntex_syntax</b>:check</td><td align="right">5.7629s</td><td align="right">5.7256s</td><td align="right"> -0.65%</td></tr><tr><td>Total</td><td align="right">10.0690s</td><td align="right">10.0008s</td><td align="right"> -0.68%</td></tr><tr><td>Summary</td><td align="right">1.0000s</td><td align="right">0.9928s</td><td align="right"> -0.72%</td></tr></table>
cc `@SparrowLii`
Use a specialized varint + bitpacking scheme for DepGraph encoding
The previous scheme here uses leb128 to encode the edge tables that represent the incr comp dependency graph. The problem with that scheme is that leb128 has overhead for larger values, and generally relies on the distribution of encoded values being heavily skewed towards smaller values. That is definitely not the case for a dep node index, since they are handed out sequentially and the whole range is covered, the distribution is actually biased in the opposite direction: Most dep nodes are large.
This PR implements a different varint encoding scheme. Instead of applying varint encoding to individual dep node indices (which is extremely branchy) we now apply it per node.
While being built, each node now stores its edges in a `SmallVec` with a bit of extra logic to track the max value of each edge. Then we varint encode the whole batch. This is a gamble: We save on space by only claiming 2 bits per node instead of ~3 bits per edge which is a nice savings but needs to balance out with the space overhead that a single large index in a node with a lot of edges will encode unnecessary bytes in each of that node's edge indices.
Then, to keep the runtime overhead of this encoding scheme down we deserialize our indices by loading 4 bytes for each then masking off the bytes that are't ours. This is much less code and branches than leb128, but relies on having some readable bytes past the end of each edge list. We explicitly add such padding to the in-memory data during decoding. And we also do this decoding lazily, turning a dense on-disk encoding into a peak memory reduction.
Then we apply a bit-packing scheme; since in https://github.com/rust-lang/rust/pull/115391 we now have unused bits on `DepKind`, we use those unused bits (currently there are 7!) to store the 2 bits that we need for the byte width of the edges in each node, then use the remaining bits to store the length of the edge list, if it fits.
r? `@nnethercote`
Remove conditional use of `Sharded` from query state
`Sharded` is already a zero cost abstraction, so it shouldn't affect the performance of the single thread compiler if LLVM does its job.
r? `@cjgillot`
Make `Sharded` an enum and specialize it for the single thread case
This changes `Sharded` to use a single shard by an enum, reducing the size of `Sharded` for greater cache efficiency.
Performance improvement with 1 thread and `cfg(parallel_compiler)`:
<table><tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th></tr><tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</th></tr><tr><td>🟣 <b>clap</b>:check</td><td align="right">1.7009s</td><td align="right">1.6748s</td><td align="right">💚 -1.53%</td></tr><tr><td>🟣 <b>hyper</b>:check</td><td align="right">0.2525s</td><td align="right">0.2451s</td><td align="right">💚 -2.90%</td></tr><tr><td>🟣 <b>regex</b>:check</td><td align="right">0.9519s</td><td align="right">0.9353s</td><td align="right">💚 -1.74%</td></tr><tr><td>🟣 <b>syn</b>:check</td><td align="right">1.5504s</td><td align="right">1.5280s</td><td align="right">💚 -1.45%</td></tr><tr><td>🟣 <b>syntex_syntax</b>:check</td><td align="right">5.9536s</td><td align="right">5.8873s</td><td align="right">💚 -1.11%</td></tr><tr><td>Total</td><td align="right">10.4092s</td><td align="right">10.2706s</td><td align="right">💚 -1.33%</td></tr><tr><td>Summary</td><td align="right">1.0000s</td><td align="right">0.9825s</td><td align="right">💚 -1.75%</td></tr></table>
I did see an unexpected 0.23% change for the serial compiler, so this could use a perf run to see if that reproduces.
cc `@SparrowLii`
Store the laziness of type aliases in their `DefKind`
Previously, we would treat paths referring to type aliases as *lazy* type aliases if the current crate had lazy type aliases enabled independently of whether the crate which the alias was defined in had the feature enabled or not.
With this PR, the laziness of a type alias depends on the crate it is defined in. This generally makes more sense to me especially if / once lazy type aliases become the default in a new edition and we need to think about *edition interoperability*:
Consider the hypothetical case where the dependency crate has an older edition (and thus eager type aliases), it exports a type alias with bounds & a where-clause (which are void but technically valid), the dependent crate has the latest edition (and thus lazy type aliases) and it uses that type alias. Arguably, the bounds should *not* be checked since at any time, the dependency crate should be allowed to change the bounds at will with a *non*-major version bump & without negatively affecting downstream crates.
As for the reverse case (dependency: lazy type aliases, dependent: eager type aliases), I guess it rules out anything from slight confusion to mild annoyance from upstream crate authors that would be caused by the compiler ignoring the bounds of their type aliases in downstream crates with older editions.
---
This fixes#114468 since before, my assumption that the type alias associated with a given weak projection was lazy (and therefore had its variances computed) did not necessarily hold in cross-crate scenarios (which [I kinda had a hunch about](https://github.com/rust-lang/rust/pull/114253#discussion_r1278608099)) as outlined above. Now it does hold.
`@rustbot` label F-lazy_type_alias
r? `@oli-obk`
Implement rust-lang/compiler-team#578.
When an ICE is encountered on nightly releases, the new rustc panic
handler will also write the contents of the backtrace to disk. If any
`delay_span_bug`s are encountered, their backtrace is also added to the
file. The platform and rustc version will also be collected.
Don't hold the active queries lock while calling `make_query`
This moves the call to `make_query` outside the parts that holds the active queries lock in `try_collect_active_jobs`. This should help removed the deadlock and borrow panic that has been observed when printing the query stack during an ICE.
cc `@SparrowLii`
r? `@cjgillot`
Don't leak the function that is called on drop
It probably wasn't causing problems anyway, but still, a `// this leaks, please don't pass anything that owns memory` is not sustainable.
I could implement a version which does not require `Option`, but it would require `unsafe`, at which point it's probably not worth it.
Use dynamic dispatch for queries
This replaces most concrete query values `V` with `MaybeUninit<[u8; { size_of::<V>() }]>` reducing the code instantiated by queries. The compile time of `rustc_query_impl` is reduced by 27%. It is an alternative to https://github.com/rust-lang/rust/pull/107937 which uses unstable const generics while this uses a `EraseType` trait which maps query values to their erased variant.
This is achieved by introducing an `Erased` type which does sanity check with `cfg(debug_assertions)`. The query caches gets instantiated with these erased types leaving the code in `rustc_query_system` unaware of them. `rustc_query_system` is changed to use instances of `QueryConfig` so that `rustc_query_impl` can pass in `DynamicConfig` which holds a pointer to a virtual table.
<table><tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th></tr><tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</th></tr><tr><td>🟣 <b>clap</b>:check</td><td align="right">1.7055s</td><td align="right">1.6949s</td><td align="right"> -0.62%</td></tr><tr><td>🟣 <b>hyper</b>:check</td><td align="right">0.2547s</td><td align="right">0.2528s</td><td align="right"> -0.73%</td></tr><tr><td>🟣 <b>regex</b>:check</td><td align="right">0.9590s</td><td align="right">0.9553s</td><td align="right"> -0.39%</td></tr><tr><td>🟣 <b>syn</b>:check</td><td align="right">1.5457s</td><td align="right">1.5440s</td><td align="right"> -0.11%</td></tr><tr><td>🟣 <b>syntex_syntax</b>:check</td><td align="right">5.9092s</td><td align="right">5.9009s</td><td align="right"> -0.14%</td></tr><tr><td>Total</td><td align="right">10.3741s</td><td align="right">10.3479s</td><td align="right"> -0.25%</td></tr><tr><td>Summary</td><td align="right">1.0000s</td><td align="right">0.9960s</td><td align="right"> -0.40%</td></tr></table>
<table><tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th></tr><tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</th></tr><tr><td>🟣 <b>clap</b>:check:initial</td><td align="right">2.0605s</td><td align="right">2.0575s</td><td align="right"> -0.15%</td></tr><tr><td>🟣 <b>hyper</b>:check:initial</td><td align="right">0.3218s</td><td align="right">0.3216s</td><td align="right"> -0.07%</td></tr><tr><td>🟣 <b>regex</b>:check:initial</td><td align="right">1.1848s</td><td align="right">1.1839s</td><td align="right"> -0.07%</td></tr><tr><td>🟣 <b>syn</b>:check:initial</td><td align="right">1.9409s</td><td align="right">1.9376s</td><td align="right"> -0.17%</td></tr><tr><td>🟣 <b>syntex_syntax</b>:check:initial</td><td align="right">7.3105s</td><td align="right">7.2928s</td><td align="right"> -0.24%</td></tr><tr><td>Total</td><td align="right">12.8185s</td><td align="right">12.7935s</td><td align="right"> -0.20%</td></tr><tr><td>Summary</td><td align="right">1.0000s</td><td align="right">0.9986s</td><td align="right"> -0.14%</td></tr></table>
<table><tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th></tr><tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</th></tr><tr><td>🟣 <b>clap</b>:check:unchanged</td><td align="right">0.4606s</td><td align="right">0.4617s</td><td align="right"> 0.24%</td></tr><tr><td>🟣 <b>hyper</b>:check:unchanged</td><td align="right">0.1335s</td><td align="right">0.1336s</td><td align="right"> 0.08%</td></tr><tr><td>🟣 <b>regex</b>:check:unchanged</td><td align="right">0.3324s</td><td align="right">0.3346s</td><td align="right"> 0.65%</td></tr><tr><td>🟣 <b>syn</b>:check:unchanged</td><td align="right">0.6268s</td><td align="right">0.6307s</td><td align="right"> 0.64%</td></tr><tr><td>🟣 <b>syntex_syntax</b>:check:unchanged</td><td align="right">1.8248s</td><td align="right">1.8508s</td><td align="right">💔 1.43%</td></tr><tr><td>Total</td><td align="right">3.3779s</td><td align="right">3.4113s</td><td align="right"> 0.99%</td></tr><tr><td>Summary</td><td align="right">1.0000s</td><td align="right">1.0061s</td><td align="right"> 0.61%</td></tr></table>
It's based on https://github.com/rust-lang/rust/pull/108167.
r? `@cjgillot`