Add `Ord::cmp` for primitives as a `BinOp` in MIR
Update: most of this OP was written months ago. See https://github.com/rust-lang/rust/pull/118310#issuecomment-2016940014 below for where we got to recently that made it ready for review.
---
There are dozens of reasonable ways to implement `Ord::cmp` for integers using comparison, bit-ops, and branches. Those differences are irrelevant at the rust level, however, so we can make things better by adding `BinOp::Cmp` at the MIR level:
1. Exactly how to implement it is left up to the backends, so LLVM can use whatever pattern its optimizer best recognizes and cranelift can use whichever pattern codegens the fastest.
2. By not inlining those details for every use of `cmp`, we drastically reduce the amount of MIR generated for `derive`d `PartialOrd`, while also making it more amenable to MIR-level optimizations.
Having extremely careful `if` ordering to μoptimize resource usage on broadwell (#63767) is great, but it really feels to me like libcore is the wrong place to put that logic. Similarly, using subtraction [tricks](https://graphics.stanford.edu/~seander/bithacks.html#CopyIntegerSign) (#105840) is arguably even nicer, but depends on the optimizer understanding it (https://github.com/llvm/llvm-project/issues/73417) to be practical. Or maybe [bitor is better than add](https://discourse.llvm.org/t/representing-in-ir/67369/2?u=scottmcm)? But maybe only on a future version that [has `or disjoint` support](https://discourse.llvm.org/t/rfc-add-or-disjoint-flag/75036?u=scottmcm)? And just because one of those forms happens to be good for LLVM, there's no guarantee that it'd be the same form that GCC or Cranelift would rather see -- especially given their very different optimizers. Not to mention that if LLVM gets a spaceship intrinsic -- [which it should](https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Suboptimal.20inlining.20in.20std.20function.20.60binary_search.60/near/404250586) -- we'll need at least a rustc intrinsic to be able to call it.
As for simplifying it in Rust, we now regularly inline `{integer}::partial_cmp`, but it's quite a large amount of IR. The best way to see that is with 8811efa88b (diff-d134c32d028fbe2bf835fef2df9aca9d13332dd82284ff21ee7ebf717bfa4765R113) -- I added a new pre-codegen MIR test for a simple 3-tuple struct, and this PR change it from 36 locals and 26 basic blocks down to 24 locals and 8 basic blocks. Even better, as soon as the construct-`Some`-then-match-it-in-same-BB noise is cleaned up, this'll expose the `Cmp == 0` branches clearly in MIR, so that an InstCombine (#105808) can simplify that to just a `BinOp::Eq` and thus fix some of our generated code perf issues. (Tracking that through today's `if a < b { Less } else if a == b { Equal } else { Greater }` would be *much* harder.)
---
r? `@ghost`
But first I should check that perf is ok with this
~~...and my true nemesis, tidy.~~
Make `TyCtxt::coroutine_layout` take coroutine's kind parameter
For coroutines that come from coroutine-closures (i.e. async closures), we may have two kinds of bodies stored in the coroutine; one that takes the closure's captures by reference, and one that takes the captures by move.
These currently have identical layouts, but if we do any optimization for these layouts that are related to the upvars, then they will diverge -- e.g. https://github.com/rust-lang/rust/pull/120168#discussion_r1536943728.
This PR relaxes the assertion I added in #121122, and instead make the `TyCtxt::coroutine_layout` method take the `coroutine_kind_ty` argument from the coroutine, which will allow us to differentiate these by-move and by-ref bodies.
This should assist comprehending the size of coroutines.
In particular, whenever a future is suspended while awaiting another
future, the latter is given the special name `__awaitee`, and now the
type of the awaited future will be printed, allowing identifying
caller/callee — er, I mean, poller/pollee — relationships.
It would be possible to include the type name in more cases, but I
thought that that might be overly verbose (`print-type-sizes` is already
a lot of text) and ordinary named fields or variables are easier for
readers to discover the types of.
clean up `Sized` checking
This PR cleans up `sized_constraint` and related functions to make them simpler and faster. This should not make more or less code compile, but it can change error output in some rare cases.
## enums and unions are `Sized`, even if they are not WF
The previous code has some special handling for enums, which made them sized if and only if the last field of each variant is sized. For example given this definition (which is not WF)
```rust
enum E<T1: ?Sized, T2: ?Sized, U1: ?Sized, U2: ?Sized> {
A(T1, T2),
B(U1, U2),
}
```
the enum was sized if and only if `T2` and `U2` are sized, while `T1` and `T2` were ignored for `Sized` checking. After this PR this enum will always be sized.
Unsized enums are not a thing in Rust and removing this special case allows us to return an `Option<Ty>` from `sized_constraint`, rather than a `List<Ty>`.
Similarly, the old code made an union defined like this
```rust
union Union<T: ?Sized, U: ?Sized> {
head: T,
tail: U,
}
```
sized if and only if `U` is sized, completely ignoring `T`. This just makes no sense at all and now this union is always sized.
## apply the "perf hack" to all (non-error) types, instead of just type parameters
This "perf hack" skips evaluating `sized_constraint(adt): Sized` if `sized_constraint(adt): Sized` exactly matches a predicate defined on `adt`, for example:
```rust
// `Foo<T>: Sized` iff `T: Sized`, but we know `T: Sized` from a predicate of `Foo`
struct Foo<T /*: Sized */>(T);
```
Previously this was only applied to type parameters and now it is applied to every type. This means that for example this type is now always sized:
```rust
// Note that this definition is WF, but the type `S<T>` not WF in the global/empty ParamEnv
struct S<T>([T]) where [T]: Sized;
```
I don't anticipate this to affect compile time of any real-world program, but it makes the code a bit nicer and it also makes error messages a bit more consistent if someone does write such a cursed type.
## tuples are sized if the last type is sized
The old solver already has this behavior and this PR also implements it for the new solver and `is_trivially_sized`. This makes it so that tuples work more like a struct defined like this:
```rust
struct TupleN<T1, T2, /* ... */ Tn: ?Sized>(T1, T2, /* ... */ Tn);
```
This might improve the compile time of programs with large tuples a little, but is mostly also a consistency fix.
## `is_trivially_sized` for more types
This function is used post-typeck code (borrowck, const eval, codegen) to skip evaluating `T: Sized` in some cases. It will now return `true` in more cases, most notably `UnsafeCell<T>` and `ManuallyDrop<T>` where `T.is_trivially_sized`.
I'm anticipating that this change will improve compile time for some real world programs.
hir: Remove `opt_local_def_id_to_hir_id` and `opt_hir_node_by_def_id`
Also replace a few `hir_node()` calls with `hir_node_by_def_id()`.
Follow up to https://github.com/rust-lang/rust/pull/120943.
Remove `Ord` from `ClosureKind`
Using `Ord` to accomplish a meaning of subset relationship can be hard to read. The existing uses for that are easily replaced with a `match`, and in my opinion, more readable without needing to resorting to comments to explain the intention.
cc `@compiler-errors`
Using `Ord` to accomplish a meaning of subset relationship
can be hard to read. The existing uses for that are easily
replaced with a `match`, and in my opinion, more readable
without needing to resorting to comments to explain the
intention.
Uplift some feeding out of `associated_type_for_impl_trait_in_impl` and into queries
This PR moves the `type_of` and `generics_of` query feeding out of `associated_type_for_impl_trait_in_impl`, since eagerly feeding results in query cycles due to a subtle interaction with `resolve_bound_vars`.
Fixes#122019
r? spastorino
stricter hidden type wf-check [based on #115008]
Original work by `@aliemjay` in #115008. A huge thanks to them for originally figuring out this approach ❤️
Fixes https://github.com/rust-lang/rust/issues/114728
Fixes https://github.com/rust-lang/rust/issues/114572
Instead of adding the `WellFormed` obligations when relating opaque types, we now always emit such an obligation when defining the hidden type.
This causes nested opaque types which aren't wf to error, see the comment below for the described impact. I believe this change to be desirable as it significantly reduces complexity by removing special-cases.
It also caused an issue with RPITIT: in defaulted trait methods, we add a `Projection(synthetic_assoc, rpit_of_trait_method)` clause to the `param_env`. This clause is not added to the `ParamEnv` of the nested coroutines. This caused a normalization failure in `fn check_coroutine_obligations` with the new solver. I fixed that by using the env of the typeck root instead.
r? `@oli-obk`
Rollup of 9 pull requests
Successful merges:
- #121065 (Add basic i18n guidance for `Display`)
- #121744 (Stop using Bubble in coherence and instead emulate it with an intercrate check)
- #121829 (Dummy tweaks (attempt 2))
- #121857 (Implement async closure signature deduction)
- #121894 (const_eval_select: make it safe but be careful with what we expose on stable for now)
- #122014 (Change some attributes to only_local.)
- #122016 (will_wake tests fail on Miri and that is expected)
- #122018 (only set noalias on Box with the global allocator)
- #122028 (Remove some dead code)
r? `@ghost`
`@rustbot` modify labels: rollup
Rollup of 8 pull requests
Successful merges:
- #121202 (Limit the number of names and values in check-cfg diagnostics)
- #121301 (errors: share `SilentEmitter` between rustc and rustfmt)
- #121658 (Hint user to update nightly on ICEs produced from outdated nightly)
- #121846 (only compare ambiguity item that have hard error)
- #121961 (add test for #78894#71450)
- #121975 (hir_analysis: enums return `None` in `find_field`)
- #121978 (Fix duplicated path in the "not found dylib" error)
- #121991 (Merge impl_trait_in_assoc_types_defined_by query back into `opaque_types_defined_by`)
r? `@ghost`
`@rustbot` modify labels: rollup
only set noalias on Box with the global allocator
As discovered in https://github.com/rust-lang/miri/issues/3341, `noalias` and custom allocators don't go well together.
rustc can now check whether a Box uses the global allocator. This replaces the previous ad-hoc and rather unprincipled check for a zero-sized allocator.
This is the rustc part of fixing that; Miri will also need a patch.
Instead, when we're collecting opaques for associated items, we choose the right collection mode depending on whether we're collecting for an associated item of a trait impl or not.
Use the correct logic for nested impl trait in assoc types
Previously we accidentally continued with the TAIT visitor, which allowed more than we wanted to.
r? ```@compiler-errors```