Unsupported query error now specifies if its unsupported for local or external crate
Fixes#101666.
I had to move `keys.rs` from `rustc_query_impl` to `rustc_middle`. I don't know if that is problematic. I couldn't think of any other way to get the needed information inside `rustc_middle`.
r? ```@jyn514```
Initial pass at expr/abstract const/s
Address comments
Switch to using a list instead of &[ty::Const], rm `AbstractConst`
Remove try_unify_abstract_consts
Update comments
Add edits
Recurse more
More edits
Prevent equating associated consts
Move failing test to ui
Changes this test from incremental to ui, and mark it as failing and a known bug.
Does not cause the compiler to ICE, so should be ok.
indirect immutable freeze by-value function parameters.
Right now, `rustc` only examines function signatures and the platform ABI when
determining the LLVM attributes to apply to parameters. This results in missed
optimizations, because there are some attributes that can be determined via
analysis of the MIR making up the function body. In particular, `readonly`
could be applied to most indirectly-passed by-value function arguments
(specifically, those that are freeze and are observed not to be mutated), but
it currently is not.
This patch introduces the machinery that allows `rustc` to determine those
attributes. It consists of a query, `deduced_param_attrs`, that, when
evaluated, analyzes the MIR of the function to determine supplementary
attributes. The results of this query for each function are written into the
crate metadata so that the deduced parameter attributes can be applied to
cross-crate functions. In this patch, we simply check the parameter for
mutations to determine whether the `readonly` attribute should be applied to
parameters that are indirect immutable freeze by-value. More attributes could
conceivably be deduced in the future: `nocapture` and `noalias` come to mind.
Adding `readonly` to indirect function parameters where applicable enables some
potential optimizations in LLVM that are discussed in [issue 103103] and [PR
103070] around avoiding stack-to-stack memory copies that appear in functions
like `core::fmt::Write::write_fmt` and `core::panicking::assert_failed`. These
functions pass a large structure unchanged by value to a subfunction that also
doesn't mutate it. Since the structure in this case is passed as an indirect
parameter, it's a pointer from LLVM's perspective. As a result, the
intermediate copy of the structure that our codegen emits could be optimized
away by LLVM's MemCpyOptimizer if it knew that the pointer is `readonly
nocapture noalias` in both the caller and callee. We already pass `nocapture
noalias`, but we're missing `readonly`, as we can't determine whether a
by-value parameter is mutated by examining the signature in Rust. I didn't have
much success with having LLVM infer the `readonly` attribute, even with fat
LTO; it seems that deducing it at the MIR level is necessary.
No large benefits should be expected from this optimization *now*; LLVM needs
some changes (discussed in [PR 103070]) to more aggressively use the `noalias
nocapture readonly` combination in its alias analysis. I have some LLVM patches
for these optimizations and have had them looked over. With all the patches
applied locally, I enabled LLVM to remove all the `memcpy`s from the following
code:
```rust
fn main() {
println!("Hello {}", 3);
}
```
which is a significant codegen improvement over the status quo. I expect that
if this optimization kicks in in multiple places even for such a simple
program, then it will apply to Rust code all over the place.
[issue 103103]: https://github.com/rust-lang/rust/issues/103103
[PR 103070]: https://github.com/rust-lang/rust/pull/103070
Queries can provide an arbitrary expression for their description and
their caching behavior. Before, these expressions where stored in a
`rustc_query_description` macro emitted by the `rustc_queries` macro,
and then used in `rustc_query_impl` to fill out the methods for the
`QueryDescription` trait.
Instead, we now emit two new modules from `rustc_queries` containing the
functions with the expressions. `rustc_query_impl` calls these functions
now instead of invoking the macro.
Since we are now defining some of the functions in
`rustc_middle::query`, we now need all the imports for the key types
there as well.
The compiler currently has `-Ztime` and `-Ztime-passes`. I've used
`-Ztime-passes` for years but only recently learned about `-Ztime`.
What's the difference? Let's look at the `-Zhelp` output:
```
-Z time=val -- measure time of rustc processes (default: no)
-Z time-passes=val -- measure time of each rustc pass (default: no)
```
The `-Ztime-passes` description is clear, but the `-Ztime` one is less so.
Sounds like it measures the time for the entire process?
No. The real difference is that `-Ztime-passes` prints out info about passes,
and `-Ztime` does the same, but only for a subset of those passes. More
specifically, there is a distinction in the profiling code between a "verbose
generic activity" and an "extra verbose generic activity". `-Ztime-passes`
prints both kinds, while `-Ztime` only prints the first one. (It took me
a close reading of the source code to determine this difference.)
In practice this distinction has low value. Perhaps in the past the "extra
verbose" output was more voluminous, but now that we only print stats for a
pass if it exceeds 5ms or alters the RSS, `-Ztime-passes` is less spammy. Also,
a lot of the "extra verbose" cases are for individual lint passes, and you need
to also use `-Zno-interleave-lints` to see those anyway.
Therefore, this commit removes `-Ztime` and the associated machinery. One thing
to note is that the existing "extra verbose" activities all have an extra
string argument, so the commit adds the ability to accept an extra argument to
the "verbose" activities.
Use function pointers instead of macro-unrolled loops in rustc_query_impl
By making these standalone functions, we
a) allow making them extensible in the future with a new `QueryStruct`
b) greatly decrease the amount of code in each individual function, avoiding exponential blowup in llvm
Helps with https://github.com/rust-lang/rust/issues/96524. Based on https://github.com/rust-lang/rust/pull/101173; only the last commit is relevant.
r? `@cjgillot`
By making these standalone functions, we
a) allow making them extensible in the future with a new `QueryStruct`
b) greatly decrease the amount of code in each individual function, avoiding exponential blowup in llvm
fix a ui test
use `into`
fix clippy ui test
fix a run-make-fulldeps test
implement `IntoQueryParam<DefId>` for `OwnerId`
use `OwnerId` for more queries
change the type of `ParentOwnerIterator::Item` to `(OwnerId, OwnerNode)`
add note for `layout_of` when query depth overflows
Fixes#101747
Added `try_find_layout_root` function to add a note for `layout_of` when query depth overflows. This would make the error in #101747 look like this:
```
error: queries overflow the depth limit!
|
note: Query depth increased by 66 when computing layout of `core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<core::option::Option<alloc::boxed::Box<alloc::string::String>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>`!
--> D:\rust-backup\parallel_rust\query_depth.rs:40:1
|
40 | fn main() {
| ^^^^^^^^^
error: aborting due to previous error
```
cc ``@semicoleon``
Further simplify the macros generated by `rustc_queries`
This doesn't actually move anything outside the macros, but it makes them simpler to read.
- Add a new `rustc_query_names` macro. This allows a much simpler syntax for the matchers in the macros passed to it as a callback.
- Convert `define_dep_nodes` and `alloc_once` to use `rustc_query_names`. This is possible because they only use the names
(despite the quite complicated matchers in `define_dep_nodes`, none of the other arguments are used).
- Get rid of `rustc_dep_node_append`.
r? `@cjgillot`
Simplify caching and storage for queries
I highly recommend reviewing commit-by-commit; each individual commit is quite small but it can be hard to see looking at the overall diff that the behavior is the same. Each commit depends on the previous.
r? `@cjgillot`
We want to refer to `crate::plumbing::try_load_from_disk` in the const, but hard-coding it in
rustc_queries, where we don't yet know the crate this macro will be called in, seems kind of hacky.
Do it in query_impl instead.
- Add a new `rustc_query_names` macro. This allows a much simpler syntax for the matchers in the macros passed to it as a callback.
- Convert `define_dep_nodes` and `alloc_once` to use `rustc_query_names`. This is possible because they only use the names
(despite the quite complicated matchers in `define_dep_nodes`, none of the other arguments are used).
- Get rid of `rustc_dep_node_append`.
- Add a `HandleCycleError` enum to rustc_query_system, along with a `handle_cycle_error` function
- Move `Value` to rustc_query_system, so `handle_cycle_error` can use it
- Move the `Value` impls from rustc_query_impl to rustc_middle. This is necessary due to orphan rules.
- Parameterize DepKindStruct over `'tcx`
This allows passing in an invariant function pointer in `query_callback`,
rather than having to try and make it work for any lifetime.
- Add a new `execute_query` function to `QueryDescription` so we can call `tcx.$name` without needing to be in a macro context
`rustc_data_structures::thin_vec::ThinVec` looks like this:
```
pub struct ThinVec<T>(Option<Box<Vec<T>>>);
```
It's just a zero word if the vector is empty, but requires two
allocations if it is non-empty. So it's only usable in cases where the
vector is empty most of the time.
This commit removes it in favour of `thin_vec::ThinVec`, which is also
word-sized, but stores the length and capacity in the same allocation as
the elements. It's good in a wider variety of situation, e.g. in enum
variants where the vector is usually/always non-empty.
The commit also:
- Sorts some `Cargo.toml` dependency lists, to make additions easier.
- Sorts some `use` item lists, to make additions easier.
- Changes `clean_trait_ref_with_bindings` to take a
`ThinVec<TypeBinding>` rather than a `&[TypeBinding]`, because this
avoid some unnecessary allocations.
Simplify the arguments to macros generated by the `rustc_queries` proc macro
Very small cleanup. Based on https://github.com/rust-lang/rust/pull/100436 which modifies some of the same code.
r? `@cjgillot`
add `depth_limit` in `QueryVTable` to avoid entering a new tcx in `layout_of`
Fixes#49735
Updates #48685
The `layout_of` query needs to check whether it overflows the depth limit, and the current implementation needs to create a new `ImplicitCtxt` inside `layout_of`. However, `start_query` will already create a new `ImplicitCtxt`, so we can check the depth limit in `start_query`.
We can tell whether we need to check the depth limit simply by whether the return value of `to_debug_str` of the query is `layout_of`. But I think adding the `depth_limit` field in `QueryVTable` may be more elegant and more scalable.
- Disallow multiple macros callbacks in the same invocation. In practice, this was never used.
- Remove the `[]` brackets around the macro name
- Require an `ident`, not an arbitrary `tt`
This should both make the code easier to read and also greatly reduce the amount of codegen
the compiler has to do, since it only needs to monomorphize `create_query_frame` for each
new key and not for each query.
Rustdoc documents these with the name of the type alias instead of normalizing them to the underlying type.
Use associated types instead so that the generated docs for nightly-rustc are easier to read.
Add the diagnostic translation lints to crates that don't emit them
Some of these have a note saying that they should build on a stable compiler, does that mean they shouldn't get these lints? Or can we cfg them out on those?
rustc_metadata: dedupe strings to prevent multiple copies in rmeta/query cache blow file size
r? `@cjgillot`
Encodes strings in rmeta/query cache so duplicated ones will be encoded as offsets to first strings, reducing file size.
Do not report cycle error when inferring return type for suggestion
The UI test is a good example of a case where this happens. The cycle is due to needing the value of the return type `-> _` to compute the variances of items in the crate, but then needing the variances of the items in the crate to do typechecking to infer what `-> _`'s real type is.
Since we're already gonna emit an error in astconv, just delay the cycle bug as an error.
This simplifies things, but requires making `CacheEncoder` non-generic.
(This was previously merged as commit 4 in #94732 and then was reverted
in #97905 because it caused a perf regression.)
Rename rustc_serialize::opaque::Encoder as MemEncoder.
This avoids the name clash with `rustc_serialize::Encoder` (a trait),
and allows lots qualifiers to be removed and imports to be simplified
(e.g. fewer `as` imports).
(This was previously merged as commit 5 in #94732 and then was reverted
in #97905 because of a perf regression caused by commit 4 in #94732.)
r? ```@bjorn3```
This avoids the name clash with `rustc_serialize::Encoder` (a trait),
and allows lots qualifiers to be removed and imports to be simplified
(e.g. fewer `as` imports).
(This was previously merged as commit 5 in #94732 and then was reverted
in #97905 because of a perf regression caused by commit 4 in #94732.)
This avoids the name clash with `rustc_serialize::Encoder` (a trait),
and allows lots qualifiers to be removed and imports to be simplified
(e.g. fewer `as` imports).
There are two impls of the `Encoder` trait: `opaque::Encoder` and
`opaque::FileEncoder`. The former encodes into memory and is infallible, the
latter writes to file and is fallible.
Currently, standard `Result`/`?`/`unwrap` error handling is used, but this is a
bit verbose and has non-trivial cost, which is annoying given how rare failures
are (especially in the infallible `opaque::Encoder` case).
This commit changes how `Encoder` fallibility is handled. All the `emit_*`
methods are now infallible. `opaque::Encoder` requires no great changes for
this. `opaque::FileEncoder` now implements a delayed error handling strategy.
If a failure occurs, it records this via the `res` field, and all subsequent
encoding operations are skipped if `res` indicates an error has occurred. Once
encoding is complete, the new `finish` method is called, which returns a
`Result`. In other words, there is now a single `Result`-producing method
instead of many of them.
This has very little effect on how any file errors are reported if
`opaque::FileEncoder` has any failures.
Much of this commit is boring mechanical changes, removing `Result` return
values and `?` or `unwrap` from expressions. The more interesting parts are as
follows.
- serialize.rs: The `Encoder` trait gains an `Ok` associated type. The
`into_inner` method is changed into `finish`, which returns
`Result<Vec<u8>, !>`.
- opaque.rs: The `FileEncoder` adopts the delayed error handling
strategy. Its `Ok` type is a `usize`, returning the number of bytes
written, replacing previous uses of `FileEncoder::position`.
- Various methods that take an encoder now consume it, rather than being
passed a mutable reference, e.g. `serialize_query_result_cache`.
`SourceFile::lines` is a big part of metadata. It's stored in a compressed form
(a difference list) to save disk space. Decoding it is a big fraction of
compile time for very small crates/programs.
This commit introduces a new type `SourceFileLines` which has a `Lines`
form and a `Diffs` form. The latter is used when the metadata is first
read, and it is only decoded into the `Lines` form when line data is
actually needed. This avoids the decoding cost for many files,
especially in `std`. It's a performance win of up to 15% for tiny
crates/programs where metadata decoding is a high part of compilation
costs.
A `Lock` is needed because the methods that access lines data (which can
trigger decoding) take `&self` rather than `&mut self`. To allow for this,
`SourceFile::lines` now takes a `FnMut` that operates on the lines slice rather
than returning the lines slice.
Remove `crate` visibility modifier
FCP to remove this syntax is just about complete in #53120. Once it completes, this should be merged ASAP to avoid merge conflicts.
The first two commits remove usage of the feature in this repository, while the last removes the feature itself.
Avoid query cache sharding code in single-threaded mode
In non-parallel compilers, this is just adding needless overhead at compilation time (since there is only one shard statically anyway). This amounts to roughly ~10 seconds reduction in bootstrap time, with overall neutral (some wins, some losses) performance results.
Parallel compiler performance should be largely unaffected by this PR; sharding is kept there.
This was largely just caching the shard value at this point, which is not
particularly useful -- in the use sites the key was being hashed nearby anyway.
Specifically, rename the `Const` struct as `ConstS` and re-introduce `Const` as
this:
```
pub struct Const<'tcx>(&'tcx Interned<ConstS>);
```
This now matches `Ty` and `Predicate` more closely, including using
pointer-based `eq` and `hash`.
Notable changes:
- `mk_const` now takes a `ConstS`.
- `Const` was copy, despite being 48 bytes. Now `ConstS` is not, so need a
we need separate arena for it, because we can't use the `Dropless` one any
more.
- Many `&'tcx Const<'tcx>`/`&Const<'tcx>` to `Const<'tcx>` changes
- Many `ct.ty` to `ct.ty()` and `ct.val` to `ct.val()` changes.
- Lots of tedious sigil fiddling.
Specifically, change `Ty` from this:
```
pub type Ty<'tcx> = &'tcx TyS<'tcx>;
```
to this
```
pub struct Ty<'tcx>(Interned<'tcx, TyS<'tcx>>);
```
There are two benefits to this.
- It's now a first class type, so we can define methods on it. This
means we can move a lot of methods away from `TyS`, leaving `TyS` as a
barely-used type, which is appropriate given that it's not meant to
be used directly.
- The uniqueness requirement is now explicit, via the `Interned` type.
E.g. the pointer-based `Eq` and `Hash` comes from `Interned`, rather
than via `TyS`, which wasn't obvious at all.
Much of this commit is boring churn. The interesting changes are in
these files:
- compiler/rustc_middle/src/arena.rs
- compiler/rustc_middle/src/mir/visit.rs
- compiler/rustc_middle/src/ty/context.rs
- compiler/rustc_middle/src/ty/mod.rs
Specifically:
- Most mentions of `TyS` are removed. It's very much a dumb struct now;
`Ty` has all the smarts.
- `TyS` now has `crate` visibility instead of `pub`.
- `TyS::make_for_test` is removed in favour of the static `BOOL_TY`,
which just works better with the new structure.
- The `Eq`/`Ord`/`Hash` impls are removed from `TyS`. `Interned`s impls
of `Eq`/`Hash` now suffice. `Ord` is now partly on `Interned`
(pointer-based, for the `Equal` case) and partly on `TyS`
(contents-based, for the other cases).
- There are many tedious sigil adjustments, i.e. adding or removing `*`
or `&`. They seem to be unavoidable.
Refactor query system to maintain a global job id counter
This replaces the per-shard counters with a single global counter, simplifying
the JobId struct down to just a u64 and removing the need to pipe a DepKind
generic through a bunch of code. The performance implications on non-parallel
compilers are likely minimal (this switches to `Cell<u64>` as the backing
storage over a `u64`, but the latter was already inside a `RefCell` so it's not
really a significance divergence). On parallel compilers, the cost of a single
global u64 counter may be more significant: it adds a serialization point in
theory. On the other hand, we can imagine changing the counter to have a
thread-local component if it becomes worrisome or some similar structure.
The new design is sufficiently simpler that it warrants the potential for slight
changes down the line if/when we get parallel compilation to be more of a
default.
A u64 counter, instead of u32 (the old per-shard width), is chosen to avoid
possibly overflowing it and causing problems; it is effectively impossible that
we would overflow a u64 counter in this context.
Delete -Zquery-stats infrastructure
These statistics are computable from the self-profile data and/or ad-hoc collectable as needed, and in the meantime contribute to rustc bootstrap times -- locally, this PR shaves ~2.5% from rustc_query_impl builds in instruction counts.
If this does lose some functionality we want to keep, I think we should migrate it to self-profile (or a similar interface) rather than this ad-hoc reporting.
This replaces the per-shard counters with a single global counter, simplifying
the JobId struct down to just a u64 and removing the need to pipe a DepKind
generic through a bunch of code. The performance implications on non-parallel
compilers are likely minimal (this switches to `Cell<u64>` as the backing
storage over a `u64`, but the latter was already inside a `RefCell` so it's not
really a significance divergence). On parallel compilers, the cost of a single
global u64 counter may be more significant: it adds a serialization point in
theory. On the other hand, we can imagine changing the counter to have a
thread-local component if it becomes worrisome or some similar structure.
The new design is sufficiently simpler that it warrants the potential for slight
changes down the line if/when we get parallel compilation to be more of a
default.
A u64 counter, instead of u32 (the old per-shard width), is chosen to avoid
possibly overflowing it and causing problems; it is effectively impossible that
we would overflow a u64 counter in this context.
These statistics are computable from the self-profile data and/or ad-hoc
collectable as needed, and in the meantime contribute to rustc bootstrap times.
`Decoder` has two impls:
- opaque: this impl is already partly infallible, i.e. in some places it
currently panics on failure (e.g. if the input is too short, or on a
bad `Result` discriminant), and in some places it returns an error
(e.g. on a bad `Option` discriminant). The number of places where
either happens is surprisingly small, just because the binary
representation has very little redundancy and a lot of input reading
can occur even on malformed data.
- json: this impl is fully fallible, but it's only used (a) for the
`.rlink` file production, and there's a `FIXME` comment suggesting it
should change to a binary format, and (b) in a few tests in
non-fundamental ways. Indeed #85993 is open to remove it entirely.
And the top-level places in the compiler that call into decoding just
abort on error anyway. So the fallibility is providing little value, and
getting rid of it leads to some non-trivial performance improvements.
Much of this commit is pretty boring and mechanical. Some notes about
a few interesting parts:
- The commit removes `Decoder::{Error,error}`.
- `InternIteratorElement::intern_with`: the impl for `T` now has the same
optimization for small counts that the impl for `Result<T, E>` has,
because it's now much hotter.
- Decodable impls for SmallVec, LinkedList, VecDeque now all use
`collect`, which is nice; the one for `Vec` uses unsafe code, because
that gave better perf on some benchmarks.
Fixes#92163Fixes#92014
When writing to the incremental cache, we encode all `Span`s
we encounter, regardless of whether or not their `SourceFile`
comes from the local crate, or from a foreign crate.
When we decode a `Span`, we use the `StableSourceFileId` we encoded
to locate the matching `SourceFile` in the current session. If this
id corresponds to a `SourceFile` from another crate, then we need to
have already imported that `SourceFile` into our current session.
This usually happens automatically during resolution / macro expansion,
when we try to resolve definitions from other crates. In certain cases,
however, we may try to load a `Span` from a transitive dependency
without having ever imported the `SourceFile`s from that crate, leading
to an ICE.
This PR fixes the issue by calling `imported_source_files()`
when we encounter a `SourceFile` with a foreign `CrateNum`.
This ensures that all `SourceFile`s from that crate are imported
into the current session.
Remove `SymbolStr`
This was originally proposed in https://github.com/rust-lang/rust/pull/74554#discussion_r466203544. As well as removing the icky `SymbolStr` type, it allows the removal of a lot of `&` and `*` occurrences.
Best reviewed one commit at a time.
r? `@oli-obk`
By changing `as_str()` to take `&self` instead of `self`, we can just
return `&str`. We're still lying about lifetimes, but it's a smaller lie
than before, where `SymbolStr` contained a (fake) `&'static str`!
Build the query vtable directly.
Continuation of https://github.com/rust-lang/rust/pull/89978.
This shrinks the query interface and attempts to reduce the amount of function pointer calls.
Add support for artifact size profiling
This adds support for profiling artifact file sizes (incremental compilation artifacts and query cache to begin with).
Eventually we want to track this in perf.rlo so we can ensure that file sizes do not change dramatically on each pull request.
This relies on support in measureme: https://github.com/rust-lang/measureme/pull/169. Once that lands we can update this PR to not point to a git dependency.
This was worked on together with `@michaelwoerister.`
r? `@wesleywiser`