Don't hold the active queries lock while calling `make_query`
This moves the call to `make_query` outside the parts that holds the active queries lock in `try_collect_active_jobs`. This should help removed the deadlock and borrow panic that has been observed when printing the query stack during an ICE.
cc `@SparrowLii`
r? `@cjgillot`
Don't leak the function that is called on drop
It probably wasn't causing problems anyway, but still, a `// this leaks, please don't pass anything that owns memory` is not sustainable.
I could implement a version which does not require `Option`, but it would require `unsafe`, at which point it's probably not worth it.
Use dynamic dispatch for queries
This replaces most concrete query values `V` with `MaybeUninit<[u8; { size_of::<V>() }]>` reducing the code instantiated by queries. The compile time of `rustc_query_impl` is reduced by 27%. It is an alternative to https://github.com/rust-lang/rust/pull/107937 which uses unstable const generics while this uses a `EraseType` trait which maps query values to their erased variant.
This is achieved by introducing an `Erased` type which does sanity check with `cfg(debug_assertions)`. The query caches gets instantiated with these erased types leaving the code in `rustc_query_system` unaware of them. `rustc_query_system` is changed to use instances of `QueryConfig` so that `rustc_query_impl` can pass in `DynamicConfig` which holds a pointer to a virtual table.
<table><tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th></tr><tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</th></tr><tr><td>🟣 <b>clap</b>:check</td><td align="right">1.7055s</td><td align="right">1.6949s</td><td align="right"> -0.62%</td></tr><tr><td>🟣 <b>hyper</b>:check</td><td align="right">0.2547s</td><td align="right">0.2528s</td><td align="right"> -0.73%</td></tr><tr><td>🟣 <b>regex</b>:check</td><td align="right">0.9590s</td><td align="right">0.9553s</td><td align="right"> -0.39%</td></tr><tr><td>🟣 <b>syn</b>:check</td><td align="right">1.5457s</td><td align="right">1.5440s</td><td align="right"> -0.11%</td></tr><tr><td>🟣 <b>syntex_syntax</b>:check</td><td align="right">5.9092s</td><td align="right">5.9009s</td><td align="right"> -0.14%</td></tr><tr><td>Total</td><td align="right">10.3741s</td><td align="right">10.3479s</td><td align="right"> -0.25%</td></tr><tr><td>Summary</td><td align="right">1.0000s</td><td align="right">0.9960s</td><td align="right"> -0.40%</td></tr></table>
<table><tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th></tr><tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</th></tr><tr><td>🟣 <b>clap</b>:check:initial</td><td align="right">2.0605s</td><td align="right">2.0575s</td><td align="right"> -0.15%</td></tr><tr><td>🟣 <b>hyper</b>:check:initial</td><td align="right">0.3218s</td><td align="right">0.3216s</td><td align="right"> -0.07%</td></tr><tr><td>🟣 <b>regex</b>:check:initial</td><td align="right">1.1848s</td><td align="right">1.1839s</td><td align="right"> -0.07%</td></tr><tr><td>🟣 <b>syn</b>:check:initial</td><td align="right">1.9409s</td><td align="right">1.9376s</td><td align="right"> -0.17%</td></tr><tr><td>🟣 <b>syntex_syntax</b>:check:initial</td><td align="right">7.3105s</td><td align="right">7.2928s</td><td align="right"> -0.24%</td></tr><tr><td>Total</td><td align="right">12.8185s</td><td align="right">12.7935s</td><td align="right"> -0.20%</td></tr><tr><td>Summary</td><td align="right">1.0000s</td><td align="right">0.9986s</td><td align="right"> -0.14%</td></tr></table>
<table><tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th></tr><tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</th></tr><tr><td>🟣 <b>clap</b>:check:unchanged</td><td align="right">0.4606s</td><td align="right">0.4617s</td><td align="right"> 0.24%</td></tr><tr><td>🟣 <b>hyper</b>:check:unchanged</td><td align="right">0.1335s</td><td align="right">0.1336s</td><td align="right"> 0.08%</td></tr><tr><td>🟣 <b>regex</b>:check:unchanged</td><td align="right">0.3324s</td><td align="right">0.3346s</td><td align="right"> 0.65%</td></tr><tr><td>🟣 <b>syn</b>:check:unchanged</td><td align="right">0.6268s</td><td align="right">0.6307s</td><td align="right"> 0.64%</td></tr><tr><td>🟣 <b>syntex_syntax</b>:check:unchanged</td><td align="right">1.8248s</td><td align="right">1.8508s</td><td align="right">💔 1.43%</td></tr><tr><td>Total</td><td align="right">3.3779s</td><td align="right">3.4113s</td><td align="right"> 0.99%</td></tr><tr><td>Summary</td><td align="right">1.0000s</td><td align="right">1.0061s</td><td align="right"> 0.61%</td></tr></table>
It's based on https://github.com/rust-lang/rust/pull/108167.
r? `@cjgillot`
Currently a `{D,Subd}iagnosticMessage` can be created from any type that
impls `Into<String>`. That includes `&str`, `String`, and `Cow<'static,
str>`, which are reasonable. It also includes `&String`, which is pretty
weird, and results in many places making unnecessary allocations for
patterns like this:
```
self.fatal(&format!(...))
```
This creates a string with `format!`, takes a reference, passes the
reference to `fatal`, which does an `into()`, which clones the
reference, doing a second allocation. Two allocations for a single
string, bleh.
This commit changes the `From` impls so that you can only create a
`{D,Subd}iagnosticMessage` from `&str`, `String`, or `Cow<'static,
str>`. This requires changing all the places that currently create one
from a `&String`. Most of these are of the `&format!(...)` form
described above; each one removes an unnecessary static `&`, plus an
allocation when executed. There are also a few places where the existing
use of `&String` was more reasonable; these now just use `clone()` at
the call site.
As well as making the code nicer and more efficient, this is a step
towards possibly using `Cow<'static, str>` in
`{D,Subd}iagnosticMessage::{Str,Eager}`. That would require changing
the `From<&'a str>` impls to `From<&'static str>`, which is doable, but
I'm not yet sure if it's worthwhile.
Remove `QueryEngine` trait
This removes the `QueryEngine` trait and `Queries` from `rustc_query_impl` and replaced them with function pointers and fields in `QuerySystem`. As a side effect `OnDiskCache` is moved back into `rustc_middle` and the `OnDiskCache` trait is also removed.
This has a couple of benefits.
- `TyCtxt` is used in the query system instead of the removed `QueryCtxt` which is larger.
- Function pointers are more flexible to work with. A variant of https://github.com/rust-lang/rust/pull/107802 is included which avoids the double indirection. For https://github.com/rust-lang/rust/pull/108938 we can name entry point `__rust_end_short_backtrace` to avoid some overhead. For https://github.com/rust-lang/rust/pull/108062 it avoids the duplicate `QueryEngine` structs.
- `QueryContext` now implements `DepContext` which avoids many `dep_context()` calls in `rustc_query_system`.
- The `rustc_driver` size is reduced by 0.33%, hopefully that means some bootstrap improvements.
- This avoids the unsafe code around the `QueryEngine` trait.
r? `@cjgillot`
Currently it creates an `Option` and then does `map`/`unwrap_or` and
`map_or_else` on it, which is hard to read.
This commit simplifies things by moving more code into the two arms of
the if/else.
Rewrite MemDecoder around pointers not a slice
This is basically https://github.com/rust-lang/rust/pull/109910 but I'm being a lot more aggressive. The pointer-based structure means that it makes a lot more sense to absorb more complexity into `MemDecoder`, most of the diff is just complexity moving from one place to another.
The primary argument for this structure is that we only incur a single bounds check when doing multi-byte reads from a `MemDecoder`. With the slice-based implementation we need to do those with `data[position..position + len]` , which needs to account for `position + len` wrapping. It would be possible to dodge the first bounds check if we stored a slice that starts at `position`, but that would require updating the pointer and length on every read.
This PR also embeds the failure path in a separate function, which means that this PR should subsume all the perf wins observed in https://github.com/rust-lang/rust/pull/109867.
Add `rustc_fluent_macro` to decouple fluent from `rustc_macros`
Fluent, with all the icu4x it brings in, takes quite some time to compile. `fluent_messages!` is only needed in further downstream rustc crates, but is blocking more upstream crates like `rustc_index`. By splitting it out, we allow `rustc_macros` to be compiled earlier, which speeds up `x check compiler` by about 5 seconds (and even more after the needless dependency on `serde_json` is removed from `rustc_data_structures`).
Encode hashes as bytes, not varint
In a few places, we store hashes as `u64` or `u128` and then apply `derive(Decodable, Encodable)` to the enclosing struct/enum. It is more efficient to encode hashes directly than try to apply some varint encoding. This PR adds two new types `Hash64` and `Hash128` which are produced by `StableHasher` and replace every use of storing a `u64` or `u128` that represents a hash.
Distribution of the byte lengths of leb128 encodings, from `x build --stage 2` with `incremental = true`
Before:
```
( 1) 373418203 (53.7%, 53.7%): 1
( 2) 196240113 (28.2%, 81.9%): 3
( 3) 108157958 (15.6%, 97.5%): 2
( 4) 17213120 ( 2.5%, 99.9%): 4
( 5) 223614 ( 0.0%,100.0%): 9
( 6) 216262 ( 0.0%,100.0%): 10
( 7) 15447 ( 0.0%,100.0%): 5
( 8) 3633 ( 0.0%,100.0%): 19
( 9) 3030 ( 0.0%,100.0%): 8
( 10) 1167 ( 0.0%,100.0%): 18
( 11) 1032 ( 0.0%,100.0%): 7
( 12) 1003 ( 0.0%,100.0%): 6
( 13) 10 ( 0.0%,100.0%): 16
( 14) 10 ( 0.0%,100.0%): 17
( 15) 5 ( 0.0%,100.0%): 12
( 16) 4 ( 0.0%,100.0%): 14
```
After:
```
( 1) 372939136 (53.7%, 53.7%): 1
( 2) 196240140 (28.3%, 82.0%): 3
( 3) 108014969 (15.6%, 97.5%): 2
( 4) 17192375 ( 2.5%,100.0%): 4
( 5) 435 ( 0.0%,100.0%): 5
( 6) 83 ( 0.0%,100.0%): 18
( 7) 79 ( 0.0%,100.0%): 10
( 8) 50 ( 0.0%,100.0%): 9
( 9) 6 ( 0.0%,100.0%): 19
```
The remaining 9 or 10 and 18 or 19 are `u64` and `u128` respectively that have the high bits set. As far as I can tell these are coming primarily from `SwitchTargets`.
Fluent, with all the icu4x it brings in, takes quite some time to
compile. `fluent_messages!` is only needed in further downstream rustc
crates, but is blocking more upstream crates like `rustc_index`. By
splitting it out, we allow `rustc_macros` to be compiled earlier, which
speeds up `x check compiler` by about 5 seconds (and even more after the
needless dependency on `serde_json` is removed from
`rustc_data_structures`).
Various minor Idx-related tweaks
Nothing particularly exciting here, but a couple of things I noticed as I was looking for more index conversions to simplify.
cc https://github.com/rust-lang/compiler-team/issues/606
r? `@WaffleLapkin`
incr.comp.: Make sure dependencies are recorded when feeding queries during eval-always queries.
This PR makes sure we don't drop dependency edges when feeding queries during an eval-always query.
Background: During eval-always queries, no dependencies are recorded because the system knows to unconditionally re-evaluate them regardless of any actual dependencies. This works fine for these queries themselves but leads to a problem when feeding other queries: When queries are fed, we set up their dependency edges by copying the current set of dependencies of the feeding query. But because this set is empty for eval-always queries, we record no edges at all -- which has the effect that the fed query instances always look "green" to the system, although they should always be "red".
The fix is to explicitly add a dependency on the artificial "always red" dep-node when feeding during eval-always queries.
Fixes https://github.com/rust-lang/rust/issues/108481
Maybe also fixes issue https://github.com/rust-lang/rust/issues/88488.
cc `@jyn514`
r? `@cjgillot` or `@oli-obk`
Use Rayon's TLV directly
This accesses Rayon's `TLV` thread local directly avoiding wrapper functions. This makes rustc work with https://github.com/rust-lang/rustc-rayon/pull/10.
r? `@cuviper`
Refactor `try_execute_query`
This merges `JobOwner::try_start` into `try_execute_query`, removing `TryGetJob` in the processes. 3 new functions are extracted from `try_execute_query`: `execute_job`, `cycle_error` and `wait_for_query`. This makes the control flow a bit clearer and improves performance.
Based on https://github.com/rust-lang/rust/pull/109046.
<table><tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th></tr><tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</th></tr><tr><td>🟣 <b>clap</b>:check</td><td align="right">1.7134s</td><td align="right">1.7061s</td><td align="right"> -0.43%</td></tr><tr><td>🟣 <b>hyper</b>:check</td><td align="right">0.2519s</td><td align="right">0.2510s</td><td align="right"> -0.35%</td></tr><tr><td>🟣 <b>regex</b>:check</td><td align="right">0.9517s</td><td align="right">0.9481s</td><td align="right"> -0.38%</td></tr><tr><td>🟣 <b>syn</b>:check</td><td align="right">1.5389s</td><td align="right">1.5338s</td><td align="right"> -0.33%</td></tr><tr><td>🟣 <b>syntex_syntax</b>:check</td><td align="right">5.9488s</td><td align="right">5.9258s</td><td align="right"> -0.39%</td></tr><tr><td>Total</td><td align="right">10.4048s</td><td align="right">10.3647s</td><td align="right"> -0.38%</td></tr><tr><td>Summary</td><td align="right">1.0000s</td><td align="right">0.9962s</td><td align="right"> -0.38%</td></tr></table>
r? `@cjgillot`