`SourceFile::lines` is a big part of metadata. It's stored in a compressed form
(a difference list) to save disk space. Decoding it is a big fraction of
compile time for very small crates/programs.
This commit introduces a new type `SourceFileLines` which has a `Lines`
form and a `Diffs` form. The latter is used when the metadata is first
read, and it is only decoded into the `Lines` form when line data is
actually needed. This avoids the decoding cost for many files,
especially in `std`. It's a performance win of up to 15% for tiny
crates/programs where metadata decoding is a high part of compilation
costs.
A `Lock` is needed because the methods that access lines data (which can
trigger decoding) take `&self` rather than `&mut self`. To allow for this,
`SourceFile::lines` now takes a `FnMut` that operates on the lines slice rather
than returning the lines slice.
Begin fixing all the broken doctests in `compiler/`
Begins to fix#95994.
All of them pass now but 24 of them I've marked with `ignore HELP (<explanation>)` (asking for help) as I'm unsure how to get them to work / if we should leave them as they are.
There are also a few that I marked `ignore` that could maybe be made to work but seem less important.
Each `ignore` has a rough "reason" for ignoring after it parentheses, with
- `(pseudo-rust)` meaning "mostly rust-like but contains foreign syntax"
- `(illustrative)` a somewhat catchall for either a fragment of rust that doesn't stand on its own (like a lone type), or abbreviated rust with ellipses and undeclared types that would get too cluttered if made compile-worthy.
- `(not-rust)` stuff that isn't rust but benefits from the syntax highlighting, like MIR.
- `(internal)` uses `rustc_*` code which would be difficult to make work with the testing setup.
Those reason notes are a bit inconsistently applied and messy though. If that's important I can go through them again and try a more principled approach. When I run `rg '```ignore \(' .` on the repo, there look to be lots of different conventions other people have used for this sort of thing. I could try unifying them all if that would be helpful.
I'm not sure if there was a better existing way to do this but I wrote my own script to help me run all the doctests and wade through the output. If that would be useful to anyone else, I put it here: https://github.com/Elliot-Roberts/rust_doctest_fixing_tool
Enable tracing for all queries
This allows you to log everything within a specific query, e.g.
```
env RUSTC_LOG=[mir_borrowck]
```
dumping all borrowck queries may be a bit verbose, so you can also restrict it to just an item of your choice:
```
env RUSTC_LOG=[mir_borrowck{key=\.\*name_of_item\.\*}]
```
the regex `.*` in the key name are because the key is a debug printed DefId, so you'd get all kinds of things like hashes in there. The tracing logs will show you the key, so you can restrict it further if you want.
Cached stable hash cleanups
r? `@nnethercote`
Add a sanity assertion in debug mode to check that the cached hashes are actually the ones we get if we compute the hash each time.
Add a new data structure that bundles all the hash-caching work to make it easier to re-use it for different interned data structures
This commit updates the signatures of all diagnostic functions to accept
types that can be converted into a `DiagnosticMessage`. This enables
existing diagnostic calls to continue to work as before and Fluent
identifiers to be provided. The `SessionDiagnostic` derive just
generates normal diagnostic calls, so these APIs had to be modified to
accept Fluent identifiers.
In addition, loading of the "fallback" Fluent bundle, which contains the
built-in English messages, has been implemented.
Each diagnostic now has "arguments" which correspond to variables in the
Fluent messages (necessary to render a Fluent message) but no API for
adding arguments has been added yet. Therefore, diagnostics (that do not
require interpolation) can be converted to use Fluent identifiers and
will be output as before.
Lazy type-alias-impl-trait take two
### user visible change 1: RPIT inference from recursive call sites
Lazy TAIT has an insta-stable change. The following snippet now compiles, because opaque types can now have their hidden type set from wherever the opaque type is mentioned.
```rust
fn bar(b: bool) -> impl std::fmt::Debug {
if b {
return 42
}
let x: u32 = bar(false); // this errors on stable
99
}
```
The return type of `bar` stays opaque, you can't do `bar(false) + 42`, you need to actually mention the hidden type.
### user visible change 2: divergence between RPIT and TAIT in return statements
Note that `return` statements and the trailing return expression are special with RPIT (but not TAIT). So
```rust
#![feature(type_alias_impl_trait)]
type Foo = impl std::fmt::Debug;
fn foo(b: bool) -> Foo {
if b {
return vec![42];
}
std::iter::empty().collect() //~ ERROR `Foo` cannot be built from an iterator
}
fn bar(b: bool) -> impl std::fmt::Debug {
if b {
return vec![42]
}
std::iter::empty().collect() // Works, magic (accidentally stabilized, not intended)
}
```
But when we are working with the return value of a recursive call, the behavior of RPIT and TAIT is the same:
```rust
type Foo = impl std::fmt::Debug;
fn foo(b: bool) -> Foo {
if b {
return vec![];
}
let mut x = foo(false);
x = std::iter::empty().collect(); //~ ERROR `Foo` cannot be built from an iterator
vec![]
}
fn bar(b: bool) -> impl std::fmt::Debug {
if b {
return vec![];
}
let mut x = bar(false);
x = std::iter::empty().collect(); //~ ERROR `impl Debug` cannot be built from an iterator
vec![]
}
```
### user visible change 3: TAIT does not merge types across branches
In contrast to RPIT, TAIT does not merge types across branches, so the following does not compile.
```rust
type Foo = impl std::fmt::Debug;
fn foo(b: bool) -> Foo {
if b {
vec![42_i32]
} else {
std::iter::empty().collect()
//~^ ERROR `Foo` cannot be built from an iterator over elements of type `_`
}
}
```
It is easy to support, but we should make an explicit decision to include the additional complexity in the implementation (it's not much, see a721052457cf513487fb4266e3ade65c29b272d2 which needs to be reverted to enable this).
### PR formalities
previous attempt: #92007
This PR also includes #92306 and #93783, as they were reverted along with #92007 in #93893fixes#93411fixes#88236fixes#89312fixes#87340fixes#86800fixes#86719fixes#84073fixes#83919fixes#82139fixes#77987fixes#74282fixes#67830fixes#62742fixes#54895
Avoid query cache sharding code in single-threaded mode
In non-parallel compilers, this is just adding needless overhead at compilation time (since there is only one shard statically anyway). This amounts to roughly ~10 seconds reduction in bootstrap time, with overall neutral (some wins, some losses) performance results.
Parallel compiler performance should be largely unaffected by this PR; sharding is kept there.
Simplify rustc_serialize by dropping support for decoding into JSON
This PR currently bundles two (somewhat separate) tasks.
First, it removes the JSON Decoder trait impl, which permitted going from JSON to Rust structs. For now, we keep supporting JSON deserialization, but only to `Json` (an equivalent of serde_json::Value). The primary hard to remove user there is for custom targets -- which need some form of JSON deserialization -- but they already have a custom ad-hoc pass for moving from Json to a Rust struct.
A [comment](e7aca89598/compiler/rustc_target/src/spec/mod.rs (L1653)) there suggests that it would be impractical to move them to a Decodable-based impl, at least without backwards compatibility concerns. I suspect that if we were widely breaking compat there, it would make sense to use serde_json at this point which would produce better error messages; the types in rustc_target are relatively isolated so we would not particularly suffer from using serde_derive.
The second part of the PR (all but the first commit) is to simplify the Decoder API by removing the non-primitive `read_*` functions. These primarily add indirection (through a closure), which doesn't directly cause a performance issue (the unique closure types essentially guarantee monomorphization), but does increase the amount of work rustc and LLVM need to do. This could be split out to a separate PR, but is included here in part to help motivate the first part.
Future work might consist of:
* Specializing enum discriminant encoding to avoid leb128 for small enums (since we know the variant count, we can directly use read/write u8 in almost all cases)
* Adding new methods to support faster deserialization (e.g., access to the underlying byte stream)
* Currently these are somewhat ad-hoc supported by specializations for e.g. `Vec<u8>`, but other types which could benefit don't today.
* Removing the Decoder trait entirely in favor of a concrete type -- today, we only really have one impl of it modulo wrappers used for specialization-based dispatch.
Highly recommend review with whitespace changes off, as the removal of closures frequently causes things to be de-indented.
This was largely just caching the shard value at this point, which is not
particularly useful -- in the use sites the key was being hashed nearby anyway.
Refactor query system to maintain a global job id counter
This replaces the per-shard counters with a single global counter, simplifying
the JobId struct down to just a u64 and removing the need to pipe a DepKind
generic through a bunch of code. The performance implications on non-parallel
compilers are likely minimal (this switches to `Cell<u64>` as the backing
storage over a `u64`, but the latter was already inside a `RefCell` so it's not
really a significance divergence). On parallel compilers, the cost of a single
global u64 counter may be more significant: it adds a serialization point in
theory. On the other hand, we can imagine changing the counter to have a
thread-local component if it becomes worrisome or some similar structure.
The new design is sufficiently simpler that it warrants the potential for slight
changes down the line if/when we get parallel compilation to be more of a
default.
A u64 counter, instead of u32 (the old per-shard width), is chosen to avoid
possibly overflowing it and causing problems; it is effectively impossible that
we would overflow a u64 counter in this context.
This replaces the per-shard counters with a single global counter, simplifying
the JobId struct down to just a u64 and removing the need to pipe a DepKind
generic through a bunch of code. The performance implications on non-parallel
compilers are likely minimal (this switches to `Cell<u64>` as the backing
storage over a `u64`, but the latter was already inside a `RefCell` so it's not
really a significance divergence). On parallel compilers, the cost of a single
global u64 counter may be more significant: it adds a serialization point in
theory. On the other hand, we can imagine changing the counter to have a
thread-local component if it becomes worrisome or some similar structure.
The new design is sufficiently simpler that it warrants the potential for slight
changes down the line if/when we get parallel compilation to be more of a
default.
A u64 counter, instead of u32 (the old per-shard width), is chosen to avoid
possibly overflowing it and causing problems; it is effectively impossible that
we would overflow a u64 counter in this context.
Lazy type-alias-impl-trait
Previously opaque types were processed by
1. replacing all mentions of them with inference variables
2. memorizing these inference variables in a side-table
3. at the end of typeck, resolve the inference variables in the side table and use the resolved type as the hidden type of the opaque type
This worked okayish for `impl Trait` in return position, but required lots of roundabout type inference hacks and processing.
This PR instead stops this process of replacing opaque types with inference variables, and just keeps the opaque types around.
Whenever an opaque type `O` is compared with another type `T`, we make the comparison succeed and record `T` as the hidden type. If `O` is compared to `U` while there is a recorded hidden type for it, we grab the recorded type (`T`) and compare that against `U`. This makes implementing
* https://github.com/rust-lang/rfcs/pull/2515
much simpler (previous attempts on the inference based scheme were very prone to ICEs and general misbehaviour that was not explainable except by random implementation defined oddities).
r? `@nikomatsakis`
fixes#93411fixes#88236
by using an opaque type obligation to bubble up comparisons between opaque types and other types
Also uses proper obligation causes so that the body id works, because out of some reason nll uses body ids for logic instead of just diagnostics.
`Decoder` has two impls:
- opaque: this impl is already partly infallible, i.e. in some places it
currently panics on failure (e.g. if the input is too short, or on a
bad `Result` discriminant), and in some places it returns an error
(e.g. on a bad `Option` discriminant). The number of places where
either happens is surprisingly small, just because the binary
representation has very little redundancy and a lot of input reading
can occur even on malformed data.
- json: this impl is fully fallible, but it's only used (a) for the
`.rlink` file production, and there's a `FIXME` comment suggesting it
should change to a binary format, and (b) in a few tests in
non-fundamental ways. Indeed #85993 is open to remove it entirely.
And the top-level places in the compiler that call into decoding just
abort on error anyway. So the fallibility is providing little value, and
getting rid of it leads to some non-trivial performance improvements.
Much of this commit is pretty boring and mechanical. Some notes about
a few interesting parts:
- The commit removes `Decoder::{Error,error}`.
- `InternIteratorElement::intern_with`: the impl for `T` now has the same
optimization for small counts that the impl for `Result<T, E>` has,
because it's now much hotter.
- Decodable impls for SmallVec, LinkedList, VecDeque now all use
`collect`, which is nice; the one for `Vec` uses unsafe code, because
that gave better perf on some benchmarks.
Update rayon and rustc-rayon
This updates rayon for various tools and rustc-rayon for the compiler's parallel mode.
- rayon v1.3.1 -> v1.5.1
- rayon-core v1.7.1 -> v1.9.1
- rustc-rayon v0.3.1 -> v0.3.2
- rustc-rayon-core v0.3.1 -> v0.3.2
... and indirectly, this updates all of crossbeam-* to their latest versions.
Fixes#92677 by removing crossbeam-queue, but there's still a lingering question about how tidy discovers "runtime" dependencies. None of this is truly in the standard library's dependency tree at all.
Ensure that `Fingerprint` caching respects hashing configuration
Fixes#92266
In some `HashStable` impls, we use a cache to avoid re-computing
the same `Fingerprint` from the same structure (e.g. an `AdtDef`).
However, the `StableHashingContext` used can be configured to
perform hashing in different ways (e.g. skipping `Span`s). This
configuration information is not included in the cache key,
which will cause an incorrect `Fingerprint` to be used if
we hash the same structure with different `StableHashingContext`
settings.
To fix this, the configuration settings of `StableHashingContext`
are split out into a separate `HashingControls` struct. This
struct is used as part of the cache key, ensuring that our caches
always produce the correct result for the given settings.
With this in place, we now turn off `Span` hashing during the
entire process of computing the hash included in legacy symbols.
This current has no effect, but will matter when a future PR
starts hashing more `Span`s that we currently skip.