rustc_metadata: Do not encode unnecessary module children
This should remove the syntax context shift and the special case for `ExternCrate` in decoder in https://github.com/rust-lang/rust/pull/95880.
This PR also shifts some work from decoding to encoding, which is typically useful for performance (but probably not much in this case).
r? `@cjgillot`
Allow self-profiler to only record potentially costly arguments when argument recording is turned on
As discussed [on zulip](https://rust-lang.zulipchat.com/#narrow/stream/247081-t-compiler.2Fperformance/topic/Identifying.20proc-macro.20slowdowns/near/277304909) with `@wesleywiser,` I'd like to record proc-macro expansions in the self-profiler, with some detailed data (per-expansion spans for example, to follow #95473).
At the same time, I'd also like to avoid doing expensive things when tracking a generic activity's arguments, if they were not specifically opted into the event filter mask, to allow the self-profiler to be used in hotter contexts.
This PR tries to offer:
- a way to ensure a closure to record arguments will only be called in that situation, so that potentially costly arguments can still be recorded when needed. With the additional requirement that, if possible, it would offer a way to record non-owned data without adding many `generic_activity_with_arg_{...}`-style methods. This lead to the `generic_activity_with_arg_recorder` single entry-point, and the closure parameter would offer the new methods, able to be executed in a context where costly argument could be created without disturbing the profiled piece of code.
- some facilities/patterns allowing to record more rustc specific data in this situation, without making `rustc_data_structures` where the self-profiler is defined, depend on other rustc crates (causing circular dependencies): in particular, spans. They are quite tricky to turn into strings (if the default `Debug` impl output does not match the context one needs them for), and since I'd also like to avoid the allocation there when arg recording is turned off today, that has turned into another flexibility requirement for the API in this PR (separating the span-specific recording into an extension trait). **edit**: I've removed this from the PR so that it's easier to review, and opened https://github.com/rust-lang/rust/pull/95739.
- allow for extensibility in the future: other ways to record arguments, or additional data attached to them could be added in the future (e.g. recording the argument's name as well as its data).
Some areas where I'd love feedback:
- the API and names: the `EventArgRecorder` and its method for example. As well as the verbosity that comes from the increased flexibility.
- if I should convert the existing `generic_activity_with_arg{s}` to just forward to `generic_activity_with_arg_recorder` + `recorder.record_arg` (or remove them altogether ? Probably not): I've used the new API in the simple case I could find of allocating for an arg that may not be recorded, and the rest don't seem costly.
- [x] whether this API should panic if no arguments were recorded by the user-provided closure (like this PR currently does: it seems like an error to use an API dedicated to record arguments but not call the methods to then do so) or if this should just record a generic activity without arguments ?
- whether the `record_arg` function should be `#[inline(always)]`, like the `generic_activity_*` functions ?
As mentioned, r? `@wesleywiser` following our recent discussion.
make unaligned_references lint deny-by-default
This lint has been warn-by-default for a year now (since https://github.com/rust-lang/rust/pull/82525), so I think it is time to crank it up a bit. Code that triggers the lint causes UB (without `unsafe`) when executed, so we really don't want people to write code like this.
Cached stable hash cleanups
r? `@nnethercote`
Add a sanity assertion in debug mode to check that the cached hashes are actually the ones we get if we compute the hash each time.
Add a new data structure that bundles all the hash-caching work to make it easier to re-use it for different interned data structures
Enforce well formedness for type alias impl trait's hidden type
fixes#84657
This was not an issue with return-position-impl-trait because the generic bounds of the function are the same as those of the opaque type, and the hidden type must already be well formed within the function.
With type-alias-impl-trait the hidden type could be defined in a function that has *more* lifetime bounds than the type alias. This is fine, but the hidden type must still be well formed without those additional bounds.
This allows profiling costly arguments to be recorded only when `-Zself-profile-events=args` is on: using a closure that takes an `EventArgRecorder` and call its `record_arg` or `record_args` methods.
Add debug assertions to some unsafe functions
As suggested by https://github.com/rust-lang/rust/issues/51713
~~Some similar code calls `abort()` instead of `panic!()` but aborting doesn't work in a `const fn`, and the intrinsic for doing dispatch based on whether execution is in a const is unstable.~~
This picked up some invalid uses of `get_unchecked` in the compiler, and fixes them.
I can confirm that they do in fact pick up invalid uses of `get_unchecked` in the wild, though the user experience is less-than-awesome:
```
Running unittests (target/x86_64-unknown-linux-gnu/debug/deps/rle_decode_fast-04b7918da2001b50)
running 6 tests
error: test failed, to rerun pass '--lib'
Caused by:
process didn't exit successfully: `/home/ben/rle-decode-helper/target/x86_64-unknown-linux-gnu/debug/deps/rle_decode_fast-04b7918da2001b50` (signal: 4, SIGILL: illegal instruction)
```
~~As best I can tell these changes produce a 6% regression in the runtime of `./x.py test` when `[rust] debug = true` is set.~~
Latest commit (6894d559bd) brings the additional overhead from this PR down to 0.5%, while also adding a few more assertions. I think this actually covers all the places in `core` that it is reasonable to check for safety requirements at runtime.
Thoughts?
Rollup of 5 pull requests
Successful merges:
- #95294 (Document Linux kernel handoff in std::io::copy and std::fs::copy)
- #95443 (Clarify how `src/tools/x` searches for python)
- #95452 (fix since field version for termination stabilization)
- #95460 (Spellchecking compiler code)
- #95461 (Spellchecking some comments)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Lazy type-alias-impl-trait take two
### user visible change 1: RPIT inference from recursive call sites
Lazy TAIT has an insta-stable change. The following snippet now compiles, because opaque types can now have their hidden type set from wherever the opaque type is mentioned.
```rust
fn bar(b: bool) -> impl std::fmt::Debug {
if b {
return 42
}
let x: u32 = bar(false); // this errors on stable
99
}
```
The return type of `bar` stays opaque, you can't do `bar(false) + 42`, you need to actually mention the hidden type.
### user visible change 2: divergence between RPIT and TAIT in return statements
Note that `return` statements and the trailing return expression are special with RPIT (but not TAIT). So
```rust
#![feature(type_alias_impl_trait)]
type Foo = impl std::fmt::Debug;
fn foo(b: bool) -> Foo {
if b {
return vec![42];
}
std::iter::empty().collect() //~ ERROR `Foo` cannot be built from an iterator
}
fn bar(b: bool) -> impl std::fmt::Debug {
if b {
return vec![42]
}
std::iter::empty().collect() // Works, magic (accidentally stabilized, not intended)
}
```
But when we are working with the return value of a recursive call, the behavior of RPIT and TAIT is the same:
```rust
type Foo = impl std::fmt::Debug;
fn foo(b: bool) -> Foo {
if b {
return vec![];
}
let mut x = foo(false);
x = std::iter::empty().collect(); //~ ERROR `Foo` cannot be built from an iterator
vec![]
}
fn bar(b: bool) -> impl std::fmt::Debug {
if b {
return vec![];
}
let mut x = bar(false);
x = std::iter::empty().collect(); //~ ERROR `impl Debug` cannot be built from an iterator
vec![]
}
```
### user visible change 3: TAIT does not merge types across branches
In contrast to RPIT, TAIT does not merge types across branches, so the following does not compile.
```rust
type Foo = impl std::fmt::Debug;
fn foo(b: bool) -> Foo {
if b {
vec![42_i32]
} else {
std::iter::empty().collect()
//~^ ERROR `Foo` cannot be built from an iterator over elements of type `_`
}
}
```
It is easy to support, but we should make an explicit decision to include the additional complexity in the implementation (it's not much, see a721052457cf513487fb4266e3ade65c29b272d2 which needs to be reverted to enable this).
### PR formalities
previous attempt: #92007
This PR also includes #92306 and #93783, as they were reverted along with #92007 in #93893fixes#93411fixes#88236fixes#89312fixes#87340fixes#86800fixes#86719fixes#84073fixes#83919fixes#82139fixes#77987fixes#74282fixes#67830fixes#62742fixes#54895
These debug assertions are all implemented only at runtime using
`const_eval_select`, and in the error path they execute
`intrinsics::abort` instead of being a normal debug assertion to
minimize the impact of these assertions on code size, when enabled.
Of all these changes, the bounds checks for unchecked indexing are
expected to be most impactful (case in point, they found a problem in
rustc).
`Layout` is another type that is sometimes interned, sometimes not, and
we always use references to refer to it so we can't take any advantage
of the uniqueness properties for hashing or equality checks.
This commit renames `Layout` as `LayoutS`, and then introduces a new
`Layout` that is a newtype around an `Interned<LayoutS>`. It also
interns more layouts than before. Previously layouts within layouts
(via the `variants` field) were never interned, but now they are. Hence
the lifetime on the new `Layout` type.
Unlike other interned types, these ones are in `rustc_target` instead of
`rustc_middle`. This reflects the existing structure of the code, which
does layout-specific stuff in `rustc_target` while `TyAndLayout` is
generic over the `Ty`, allowing the type-specific stuff to occur in
`rustc_middle`.
The commit also adds a `HashStable` impl for `Interned`, which was
needed. It hashes the contents, unlike the `Hash` impl which hashes the
pointer.
Currently some `Allocation`s are interned, some are not, and it's very
hard to tell at a use point which is which.
This commit introduces `ConstAllocation` for the known-interned ones,
which makes the division much clearer. `ConstAllocation::inner()` is
used to get the underlying `Allocation`.
In some places it's natural to use an `Allocation`, in some it's natural
to use a `ConstAllocation`, and in some places there's no clear choice.
I've tried to make things look as nice as possible, while generally
favouring `ConstAllocation`, which is the type that embodies more
information. This does require quite a few calls to `inner()`.
The commit also tweaks how `PartialOrd` works for `Interned`. The
previous code was too clever by half, building on `T: Ord` to make the
code shorter. That caused problems with deriving `PartialOrd` and `Ord`
for `ConstAllocation`, so I changed it to build on `T: PartialOrd`,
which is slightly more verbose but much more standard and avoided the
problems.
Always include global target features in function attributes
This ensures that information about target features configured with
`-C target-feature=...` or detected with `-C target-cpu=native` is
retained for subsequent consumers of LLVM bitcode.
This is crucial for linker plugin LTO, since this information is not
conveyed to the plugin otherwise.
<details><summary>Additional test case demonstrating the issue</summary>
```rust
extern crate core;
#[inline]
#[target_feature(enable = "aes")]
unsafe fn f(a: u128, b: u128) -> u128 {
use core::arch::x86_64::*;
use core::mem::transmute;
transmute(_mm_aesenc_si128(transmute(a), transmute(b)))
}
pub fn g(a: u128, b: u128) -> u128 {
unsafe { f(a, b) }
}
fn main() {
let mut args = std::env::args();
let _ = args.next().unwrap();
let a: u128 = args.next().unwrap().parse().unwrap();
let b: u128 = args.next().unwrap().parse().unwrap();
println!("{}", g(a, b));
}
```
```console
$ rustc --edition=2021 a.rs -Clinker-plugin-lto -Clink-arg=-fuse-ld=lld -Ctarget-feature=+aes -O
...
= note: LLVM ERROR: Cannot select: intrinsic %llvm.x86.aesni.aesenc
```
</details>
r? `@nagisa`
Rollup of 9 pull requests
Successful merges:
- #94464 (Suggest adding a new lifetime parameter when two elided lifetimes should match up for traits and impls.)
- #94476 (7 - Make more use of `let_chains`)
- #94478 (Fix panic when handling intra doc links generated from macro)
- #94482 (compiler: fix some typos)
- #94490 (Update books)
- #94496 (tests: accept llvm intrinsic in align-checking test)
- #94498 (9 - Make more use of `let_chains`)
- #94503 (Provide C FFI types via core::ffi, not just in std)
- #94513 (update Miri)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Avoid query cache sharding code in single-threaded mode
In non-parallel compilers, this is just adding needless overhead at compilation time (since there is only one shard statically anyway). This amounts to roughly ~10 seconds reduction in bootstrap time, with overall neutral (some wins, some losses) performance results.
Parallel compiler performance should be largely unaffected by this PR; sharding is kept there.
Avoid exhausting stack space in dominator compression
Doesn't add a test case -- I ended up running into this while playing with the generated example from #43578, which we could do with a run-make test (to avoid checking a large code snippet into tree), but I suspect we don't want to wait for it to compile (locally it takes ~14s -- not terrible, but doesn't seem worth it to me). In practice stack space exhaustion is difficult to test for, too, since if we set the bound too low a different call structure above us (e.g., a nearer ensure_sufficient_stack call) would let the test pass even with the old impl, most likely.
Locally it seems like this manages to perform approximately equivalently to the recursion, but will run perf to confirm.
Introduce `ChunkedBitSet` and use it for some dataflow analyses.
This reduces peak memory usage significantly for some programs with very
large functions.
r? `@ghost`
This reduces peak memory usage significantly for some programs with very
large functions, such as:
- `keccak`, `unicode_normalization`, and `match-stress-enum`, from
the `rustc-perf` benchmark suite;
- `http-0.2.6` from crates.io.
The new type is used in the analyses where the bitsets can get huge
(e.g. 10s of thousands of bits): `MaybeInitializedPlaces`,
`MaybeUninitializedPlaces`, and `EverInitializedPlaces`.
Some refactoring was required in `rustc_mir_dataflow`. All existing
analysis domains are either `BitSet` or a trivial wrapper around
`BitSet`, and access in a few places is done via `Borrow<BitSet>` or
`BorrowMut<BitSet>`. Now that some of these domains are `ClusterBitSet`,
that no longer works. So this commit replaces the `Borrow`/`BorrowMut`
usage with a new trait `BitSetExt` containing the needed bitset
operations. The impls just forward these to the underlying bitset type.
This required fiddling with trait bounds in a few places.
The commit also:
- Moves `static_assert_size` from `rustc_data_structures` to
`rustc_index` so it can be used in the latter; the former now
re-exports it so existing users are unaffected.
- Factors out some common "clear excess bits in the final word"
functionality in `bit_set.rs`.
- Uses `fill` in a few places instead of loops.
Allow inlining of `ensure_sufficient_stack()`
This functions is monomorphized a lot and allowing the compiler to inline it improves instructions count and max RSS significantly in my local tests.
In particular, there's now more protection against incorrect usage,
because you can only create one via `Interned::new_unchecked`, which
makes it more obvious that you must be careful.
There are also some tests.