Use a specialized varint + bitpacking scheme for DepGraph encoding
The previous scheme here uses leb128 to encode the edge tables that represent the incr comp dependency graph. The problem with that scheme is that leb128 has overhead for larger values, and generally relies on the distribution of encoded values being heavily skewed towards smaller values. That is definitely not the case for a dep node index, since they are handed out sequentially and the whole range is covered, the distribution is actually biased in the opposite direction: Most dep nodes are large.
This PR implements a different varint encoding scheme. Instead of applying varint encoding to individual dep node indices (which is extremely branchy) we now apply it per node.
While being built, each node now stores its edges in a `SmallVec` with a bit of extra logic to track the max value of each edge. Then we varint encode the whole batch. This is a gamble: We save on space by only claiming 2 bits per node instead of ~3 bits per edge which is a nice savings but needs to balance out with the space overhead that a single large index in a node with a lot of edges will encode unnecessary bytes in each of that node's edge indices.
Then, to keep the runtime overhead of this encoding scheme down we deserialize our indices by loading 4 bytes for each then masking off the bytes that are't ours. This is much less code and branches than leb128, but relies on having some readable bytes past the end of each edge list. We explicitly add such padding to the in-memory data during decoding. And we also do this decoding lazily, turning a dense on-disk encoding into a peak memory reduction.
Then we apply a bit-packing scheme; since in https://github.com/rust-lang/rust/pull/115391 we now have unused bits on `DepKind`, we use those unused bits (currently there are 7!) to store the 2 bits that we need for the byte width of the edges in each node, then use the remaining bits to store the length of the edge list, if it fits.
r? `@nnethercote`
Represent MIR composite debuginfo as projections instead of aggregates
Composite debuginfo for MIR is currently represented as
```
debug name => Type { projection1 => place1, projection2 => place2 };
```
ie. a single `VarDebugInfo` object with that name, and its value a `VarDebugInfoContents::Composite`.
This PR proposes to reverse the representation to be
```
debug name.projection1 => place1;
debug name.projection2 => place2;
```
ie. multiple `VarDebugInfo` objects with each their projection.
This simplifies the handling of composite debuginfo by the compiler by avoiding weird nesting.
Based on https://github.com/rust-lang/rust/pull/115139
Add `FreezeLock` type and use it to store `Definitions`
This adds a `FreezeLock` type which allows mutation using a lock until the value is frozen where it can be accessed lock-free. It's used to store `Definitions` in `Untracked` instead of a `RwLock`. Unlike the current scheme of leaking read guards this doesn't deadlock if definitions is written to after no mutation are expected.
Use relative positions inside a SourceFile.
This allows to remove the normalization of start positions for hashing, and simplify allocation of global address space.
cc `@Zoxc`
Encode DepKind as u16
The derived Encodable/Decodable impls serialize/deserialize as a varint, which results in a lot of code size around the encoding/decoding of these types which isn't justified: The full range of values here is rather small but doesn't quite fit in to a `u8`. Growing _all_ serialized `DepKind` to 2 bytes costs us on average 1% size in the incr comp dep graph, which I plan to recoup in https://github.com/rust-lang/rust/pull/110050 by taking advantage of the unused bits in all the serialized `DepKind`.
r? `@nnethercote`
Make `get_return_block()` return `Some` only for HIR nodes in body
Fixes#114918
The issue occurred while compiling the following input:
```rust
fn uwu() -> [(); { () }] {
loop {}
}
```
It was caused by the code below trying to suggest a missing return type which resulted in a const eval cycle: 1bd043098e/compiler/rustc_hir_typeck/src/fn_ctxt/suggestions.rs (L68-L75)
The root cause was `get_return_block()` returning an `Fn` node for a node in the return type (i.e. the second `()` in the return type `[(); { () }]` of the input) although it is supposed to do so only for nodes that lie in the body of the function and return `None` otherwise (at least as per my understanding).
The PR fixes the issue by fixing this behaviour of `get_return_block()`.
Remove conditional use of `Sharded` from query state
`Sharded` is already a zero cost abstraction, so it shouldn't affect the performance of the single thread compiler if LLVM does its job.
r? `@cjgillot`
Allow explicit `#[repr(Rust)]`
This is identical to no `repr()` at all. For `Rust, packed` and `Rust, align(x)`, it should be the same as no `Rust` at all (as, afaik, `#[repr(align(16))]` uses the Rust ABI.)
The main use case for this is being able to explicitly say "I want to use the Rust ABI" in very very rare circumstances where the first obvious choice would be the C ABI yet is undesirable, which is already possible with functions as `extern "Rust"`. This would be useful for silencing https://github.com/rust-lang/rust-clippy/pull/11253. It's also more consistent with `extern`.
The lack of this also tripped me up a bit when I was new to Rust, as I expected this to be possible.
Point at return type when it influences non-first `match` arm
When encountering code like
```rust
fn foo() -> i32 {
match 0 {
1 => return 0,
2 => "",
_ => 1,
}
}
```
Point at the return type and not at the prior arm, as that arm has type `!` which isn't influencing the arm corresponding to arm `2`.
Fix#78124.
Couple of global state and driver refactors
* Remove some unused global mutable state
* Remove a couple of unnecessary queries (both driver and `TyCtxt` queries)
* Remove an unnecessary use of `FxIndexMap`
When encountering code like
```rust
fn foo() -> i32 {
match 0 {
1 => return 0,
2 => "",
_ => 1,
}
}
```
Point at the return type and not at the prior arm, as that arm has type
`!` which isn't influencing the arm corresponding to arm `2`.
Fix#78124.
feat: `riscv-interrupt-{m,s}` calling conventions
Similar to prior support added for the mips430, avr, and x86 targets this change implements the rough equivalent of clang's [`__attribute__((interrupt))`][clang-attr] for riscv targets, enabling e.g.
```rust
static mut CNT: usize = 0;
pub extern "riscv-interrupt-m" fn isr_m() {
unsafe {
CNT += 1;
}
}
```
to produce highly effective assembly like:
```asm
pub extern "riscv-interrupt-m" fn isr_m() {
420003a0: 1141 addi sp,sp,-16
unsafe {
CNT += 1;
420003a2: c62a sw a0,12(sp)
420003a4: c42e sw a1,8(sp)
420003a6: 3fc80537 lui a0,0x3fc80
420003aa: 63c52583 lw a1,1596(a0) # 3fc8063c <_ZN12esp_riscv_rt3CNT17hcec3e3a214887d53E.0>
420003ae: 0585 addi a1,a1,1
420003b0: 62b52e23 sw a1,1596(a0)
}
}
420003b4: 4532 lw a0,12(sp)
420003b6: 45a2 lw a1,8(sp)
420003b8: 0141 addi sp,sp,16
420003ba: 30200073 mret
```
(disassembly via `riscv64-unknown-elf-objdump -C -S --disassemble ./esp32c3-hal/target/riscv32imc-unknown-none-elf/release/examples/gpio_interrupt`)
This outcome is superior to hand-coded interrupt routines which, lacking visibility into any non-assembly body of the interrupt handler, have to be very conservative and save the [entire CPU state to the stack frame][full-frame-save]. By instead asking LLVM to only save the registers that it uses, we defer the decision to the tool with the best context: it can more accurately account for the cost of spills if it knows that every additional register used is already at the cost of an implicit spill.
At the LLVM level, this is apparently [implemented by] marking every register as "[callee-save]," matching the semantics of an interrupt handler nicely (it has to leave the CPU state just as it found it after its `{m|s}ret`).
This approach is not suitable for every interrupt handler, as it makes no attempt to e.g. save the state in a user-accessible stack frame. For a full discussion of those challenges and tradeoffs, please refer to [the interrupt calling conventions RFC][rfc].
Inside rustc, this implementation differs from prior art because LLVM does not expose the "all-saved" function flavor as a calling convention directly, instead preferring to use an attribute that allows for differentiating between "machine-mode" and "superivsor-mode" interrupts.
Finally, some effort has been made to guide those who may not yet be aware of the differences between machine-mode and supervisor-mode interrupts as to why no `riscv-interrupt` calling convention is exposed through rustc, and similarly for why `riscv-interrupt-u` makes no appearance (as it would complicate future LLVM upgrades).
[clang-attr]: https://clang.llvm.org/docs/AttributeReference.html#interrupt-risc-v
[full-frame-save]: 9281af2ecf/src/lib.rs (L440-L469)
[implemented by]: b7fb2a3fec/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp (L61-L67)
[callee-save]: 973f1fe7a8/llvm/lib/Target/RISCV/RISCVCallingConv.td (L30-L37)
[rfc]: https://github.com/rust-lang/rfcs/pull/3246
Similar to prior support added for the mips430, avr, and x86 targets
this change implements the rough equivalent of clang's
[`__attribute__((interrupt))`][clang-attr] for riscv targets, enabling
e.g.
```rust
static mut CNT: usize = 0;
pub extern "riscv-interrupt-m" fn isr_m() {
unsafe {
CNT += 1;
}
}
```
to produce highly effective assembly like:
```asm
pub extern "riscv-interrupt-m" fn isr_m() {
420003a0: 1141 addi sp,sp,-16
unsafe {
CNT += 1;
420003a2: c62a sw a0,12(sp)
420003a4: c42e sw a1,8(sp)
420003a6: 3fc80537 lui a0,0x3fc80
420003aa: 63c52583 lw a1,1596(a0) # 3fc8063c <_ZN12esp_riscv_rt3CNT17hcec3e3a214887d53E.0>
420003ae: 0585 addi a1,a1,1
420003b0: 62b52e23 sw a1,1596(a0)
}
}
420003b4: 4532 lw a0,12(sp)
420003b6: 45a2 lw a1,8(sp)
420003b8: 0141 addi sp,sp,16
420003ba: 30200073 mret
```
(disassembly via `riscv64-unknown-elf-objdump -C -S --disassemble ./esp32c3-hal/target/riscv32imc-unknown-none-elf/release/examples/gpio_interrupt`)
This outcome is superior to hand-coded interrupt routines which, lacking
visibility into any non-assembly body of the interrupt handler, have to
be very conservative and save the [entire CPU state to the stack
frame][full-frame-save]. By instead asking LLVM to only save the
registers that it uses, we defer the decision to the tool with the best
context: it can more accurately account for the cost of spills if it
knows that every additional register used is already at the cost of an
implicit spill.
At the LLVM level, this is apparently [implemented by] marking every
register as "[callee-save]," matching the semantics of an interrupt
handler nicely (it has to leave the CPU state just as it found it after
its `{m|s}ret`).
This approach is not suitable for every interrupt handler, as it makes
no attempt to e.g. save the state in a user-accessible stack frame. For
a full discussion of those challenges and tradeoffs, please refer to
[the interrupt calling conventions RFC][rfc].
Inside rustc, this implementation differs from prior art because LLVM
does not expose the "all-saved" function flavor as a calling convention
directly, instead preferring to use an attribute that allows for
differentiating between "machine-mode" and "superivsor-mode" interrupts.
Finally, some effort has been made to guide those who may not yet be
aware of the differences between machine-mode and supervisor-mode
interrupts as to why no `riscv-interrupt` calling convention is exposed
through rustc, and similarly for why `riscv-interrupt-u` makes no
appearance (as it would complicate future LLVM upgrades).
[clang-attr]: https://clang.llvm.org/docs/AttributeReference.html#interrupt-risc-v
[full-frame-save]: 9281af2ecf/src/lib.rs (L440-L469)
[implemented by]: b7fb2a3fec/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp (L61-L67)
[callee-save]: 973f1fe7a8/llvm/lib/Target/RISCV/RISCVCallingConv.td (L30-L37)
[rfc]: https://github.com/rust-lang/rfcs/pull/3246
Map RPIT duplicated lifetimes back to fn captured lifetimes
Use the [`lifetime_mapping`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/struct.OpaqueTy.html#structfield.lifetime_mapping) to map an RPIT's captured lifetimes back to the early- or late-bound lifetimes from its parent function. We may be going thru several layers of mapping, since opaques can be nested, so we introduce `TyCtxt::map_rpit_lifetime_to_fn_lifetime` to loop through several opaques worth of mapping, and handle turning it into a `ty::Region` as well.
We can then use this instead of the identity substs for RPITs in `check_opaque_meets_bounds` to address #114285.
We can then also use `map_rpit_lifetime_to_fn_lifetime` to properly install bidirectional-outlives predicates for both RPITs and RPITITs. This addresses #114601.
I based this on #114574, but I don't actually know how much of that PR we still need, so some code may be redundant now... 🤷
---
Fixes#114597Fixes#114579Fixes#114285
Also fixes#114601, since it turns out we had other bugs with RPITITs and their duplicated lifetime params 😅.
Supersedes #114574
r? `@oli-obk`
Store the laziness of type aliases in their `DefKind`
Previously, we would treat paths referring to type aliases as *lazy* type aliases if the current crate had lazy type aliases enabled independently of whether the crate which the alias was defined in had the feature enabled or not.
With this PR, the laziness of a type alias depends on the crate it is defined in. This generally makes more sense to me especially if / once lazy type aliases become the default in a new edition and we need to think about *edition interoperability*:
Consider the hypothetical case where the dependency crate has an older edition (and thus eager type aliases), it exports a type alias with bounds & a where-clause (which are void but technically valid), the dependent crate has the latest edition (and thus lazy type aliases) and it uses that type alias. Arguably, the bounds should *not* be checked since at any time, the dependency crate should be allowed to change the bounds at will with a *non*-major version bump & without negatively affecting downstream crates.
As for the reverse case (dependency: lazy type aliases, dependent: eager type aliases), I guess it rules out anything from slight confusion to mild annoyance from upstream crate authors that would be caused by the compiler ignoring the bounds of their type aliases in downstream crates with older editions.
---
This fixes#114468 since before, my assumption that the type alias associated with a given weak projection was lazy (and therefore had its variances computed) did not necessarily hold in cross-crate scenarios (which [I kinda had a hunch about](https://github.com/rust-lang/rust/pull/114253#discussion_r1278608099)) as outlined above. Now it does hold.
`@rustbot` label F-lazy_type_alias
r? `@oli-obk`
Add documentation to has_deref
Documentation of `has_deref` needed some polish to be more clear about where it should be used and what's it's purpose.
cc https://github.com/rust-lang/rust/issues/114401
r? `@RalfJung`
Convert builtin "global" late lints to run per module
The compiler currently has 4 non-incremental lints:
1. `clashing_extern_declarations`;
2. `missing_debug_implementations`;
3. ~`unnameable_test_items`;~ changed by https://github.com/rust-lang/rust/pull/114414
4. `missing_docs`.
Non-incremental lints get reexecuted for each compilation, which is slow. Moreover, those lints are allow-by-default, so run for nothing most of the time. This PR attempts to make them more incremental-friendly.
`clashing_extern_declarations` is moved to a standalone query.
`missing_debug_implementation` can use `non_blanket_impls_for_ty` instead of recomputing it.
`missing_docs` is harder as it needs to track if there is a `doc(hidden)` module surrounding. I hack around this using the lint level engine. That's easy to implement and allows to re-enable the lint for a re-exported module, while a more proper solution would reuse the same device as `unnameable_test_items`.
update overflow handling in the new trait solver
implements https://hackmd.io/QY0dfEOgSNWwU4oiGnVRLw?view. I want to clean up this doc and add it to the rustc-dev-guide, but I think this PR is ready for merge as is, even without the dev-guide entry.
r? `@compiler-errors`
Rework upcasting confirmation to support upcasting to fewer projections in target bounds
This PR implements a modified trait upcasting algorithm that is resilient to changes in the number of associated types in the bounds of the source and target trait objects.
It does this by equating each bound of the target trait ref individually against the bounds of the source trait ref, rather than doing them all together by constructing a new trait object.
#### The new way we do trait upcasting confirmation
1. Equate the target trait object's principal trait ref with one of the supertraits of the source trait object's principal.
fdcab310b2/compiler/rustc_trait_selection/src/traits/select/mod.rs (L2509-L2525)
2. Make sure that every auto trait in the *target* trait object is present in the source trait ref's bounds.
fdcab310b2/compiler/rustc_trait_selection/src/traits/select/mod.rs (L2559-L2562)
3. For each projection in the *target* trait object, make sure there is exactly one projection that equates with it in the source trait ref's bound. If there is more than one, bail with ambiguity.
fdcab310b2/compiler/rustc_trait_selection/src/traits/select/mod.rs (L2526-L2557)
* Since there may be more than one that applies, we probe first to check that there is exactly one, then we equate it outside of a probe once we know that it's unique.
4. Make sure the lifetime of the source trait object outlives the lifetime of the target.
<details>
<summary>Meanwhile, this is how we used to do upcasting:</summary>
1. For each supertrait of the source trait object, take that supertrait, append the source object's projection bounds, and the *target* trait object's auto trait bounds, and make this into a new object type:
d12c6e947c/compiler/rustc_trait_selection/src/traits/select/confirmation.rs (L915-L929)
2. Then equate it with the target trait object:
d12c6e947c/compiler/rustc_trait_selection/src/traits/select/confirmation.rs (L936)
This will be a type mismatch if the target trait object has fewer projection bounds, since we compare the bounds structurally in relate:
d12c6e947c/compiler/rustc_middle/src/ty/relate.rs (L696-L698)
</details>
Fixes#114035
Also fixes#114113, because I added a normalize call in the old solver.
r? types
resolve before canonicalization in new solver, ICE if unresolved
Fold the values with a resolver before canonicalization instead of making it happen within canonicalization.
This allows us to filter trivial region constraints from the external constraints.
r? ``@lcnr``