For operations with easily pre-checked overflow conditions -- shifts and unsigned subtraction -- write the checked methods in such a way that we stop emitting wrapping versions of them.
For example, today <https://rust.godbolt.org/z/qM9YK8Txb> neither
```rust
a.checked_sub(b).unwrap()
```
nor
```rust
a.checked_sub(b).unwrap_unchecked()
```
actually optimizes to `sub nuw`. After this PR they do.
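Sketching the idea (illustrative, not the exact libcore code): writing the checked method as an explicit pre-check around the unchecked operation gives LLVM a shape it reliably folds into `sub nuw`:
```rust
// Sketch only: the "pre-check, then unchecked op" shape this PR moves the
// library toward. (`unchecked_sub` was gated behind `unchecked_math` at the
// time; the real implementation lives in core.)
pub fn checked_sub_sketch(a: u32, b: u32) -> Option<u32> {
    if a >= b {
        // SAFETY: `a >= b`, so the subtraction cannot wrap.
        Some(unsafe { a.unchecked_sub(b) })
    } else {
        None
    }
}
```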
Add the missing inttoptr when we ptrtoint in ptr atomics
Ralf noticed this here: https://github.com/rust-lang/rust/pull/122220#discussion_r1535172094
Our previous codegen forgot to add the cast back from integer to pointer type. The code compiles anyway because, of course, all locals are in-memory to start with, so the previous codegen would do the integer atomic, store the integer to a local, then load a pointer from that local. Which is definitely _not_ what we wanted: that's an integer-to-pointer transmute, so all pointers returned by these `AtomicPtr` methods didn't have provenance. Yikes.
Here's the IR for `AtomicPtr::fetch_byte_add` on 1.76: https://godbolt.org/z/8qTEjeraY
```llvm
define noundef ptr @atomicptr_fetch_byte_add(ptr noundef nonnull align 8 %a, i64 noundef %v) unnamed_addr #0 !dbg !7 {
start:
  %0 = alloca ptr, align 8, !dbg !12
  %val = inttoptr i64 %v to ptr, !dbg !12
  call void @llvm.lifetime.start.p0(i64 8, ptr %0), !dbg !28
  %1 = ptrtoint ptr %val to i64, !dbg !28
  %2 = atomicrmw add ptr %a, i64 %1 monotonic, align 8, !dbg !28
  store i64 %2, ptr %0, align 8, !dbg !28
  %self = load ptr, ptr %0, align 8, !dbg !28
  call void @llvm.lifetime.end.p0(i64 8, ptr %0), !dbg !28
  ret ptr %self, !dbg !33
}
```
r? `@RalfJung`
cc `@nikic`
llvm/llvm-project#87910 infers `nuw` and `nsw` on some `trunc`
instructions we're doing `FileCheck` on. Tolerate, but don't require, those
flags so the tests pass on both release and head LLVM.
`f16` and `f128` step 4: basic library support
This is the next step after https://github.com/rust-lang/rust/pull/121926, another portion of https://github.com/rust-lang/rust/pull/114607
Tracking issue: https://github.com/rust-lang/rust/issues/116909
This PR adds the most basic operations to `f16` and `f128` that get lowered as LLVM intrinsics. This is a very small step but it seemed reasonable enough to add unopinionated basic operations before the larger modules that are built on top of them.
r? `@Amanieu` since you were pretty involved in the RFC
cc `@compiler-errors`
`@rustbot` label +T-libs-api +S-blocked +F-f16_and_f128
I added this back in #111999, but I no longer think it's a good idea:
- It had to be scaled back to only power-of-two cases to avoid breaking a bunch of targets
- LLVM seems to be getting better at memcpy removal anyway
- Introducing vector instructions has sometimes (#115515) made autovectorization worse
So this removes it from the codegen crates entirely, and instead just tries to use <https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/traits/builder/trait.BuilderMethods.html#method.typed_place_copy> instead of direct `memcpy` so things will still use load/store for immediates.
Re-enable the early otherwise branch optimization
Closes #95162. Fixes #119014.
This is the first part of #121397.
An invalid enum discriminant can come from anywhere, so we have to check that all successors contain the discriminant statement. Ideally, hoisting these instructions would be handled by a dedicated pass.
r? cjgillot
Don't emit divide-by-zero panic paths in `StepBy::len`
I happened to notice today that there's actually two such calls emitted in the assembly: <https://rust.godbolt.org/z/1Wbbd3Ts6>
Since they're impossible, hopefully telling LLVM that will also help optimizations elsewhere.
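A minimal sketch of the idea, assuming `StepBy`'s invariant that the step is stored as `step - 1` and so reconstructs to a nonzero divisor (names here are illustrative, not the actual iterator internals):
```rust
// Hypothetical sketch: because StepBy stores `step - 1`, the divisor below
// can never be zero; telling LLVM that removes the divide-by-zero panic path.
fn step_by_len_sketch(remaining: usize, step_minus_one: usize) -> usize {
    let step = step_minus_one + 1;
    // SAFETY: by construction the original step was nonzero, so
    // `step_minus_one + 1` cannot wrap to 0 here.
    unsafe { core::hint::assert_unchecked(step != 0) };
    remaining / step
}
```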
Fix argument ABI for overaligned structs on ppc64le
When passing a 16 (or higher) aligned struct by value on ppc64le, it needs to be passed as an array of `i128` rather than an array of `i64`. This will force the use of an even starting doubleword.
For the case of a 16 byte struct with alignment 16 it is important that `[1 x i128]` is used instead of `i128` -- apparently, the latter will get treated similarly to `[2 x i64]`, not exhibiting the correct ABI. Add a `force_array` flag to `Uniform` to support this.
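For illustration, a hypothetical Rust type that hits this case:
```rust
// A 16-byte struct with 16-byte alignment: per the ppc64le ABI it must start
// at an even doubleword, so after this fix it's classified as `[1 x i128]`
// rather than `[2 x i64]` (or a bare `i128`).
#[repr(C, align(16))]
struct Overaligned {
    a: u64,
    b: u64,
}

extern "C" {
    // Passed by value: the fix changes how `x` is classified.
    fn takes_overaligned(x: Overaligned);
}
```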
The relevant clang code can be found here:
fe2119a7b0/clang/lib/CodeGen/Targets/PPC.cpp (L878-L884)
fe2119a7b0/clang/lib/CodeGen/Targets/PPC.cpp (L780-L784)
I think the corresponding psABI wording is this:
> Fixed size aggregates and unions passed by value are mapped to as
> many doublewords of the parameter save area as the value uses in
> memory. Aggregates and unions are aligned according to their
> alignment requirements. This may result in doublewords being
> skipped for alignment.
In particular the last sentence. Though I didn't find any wording for Clang's behavior of clamping the alignment to 16.
Fixes https://github.com/rust-lang/rust/issues/122767.
r? `@cuviper`
Implement minimal, internal-only pattern types in the type system
rebase of https://github.com/rust-lang/rust/pull/107606
You can create pattern types with `std::pat::pattern_type!(ty is pat)`. The feature is incomplete and will panic on you if you use any pattern other than integral range patterns. The only way to create or deconstruct a pattern type is via `transmute`.
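A usage sketch based on the description above (feature-gate names and conversion details are illustrative):
```rust
// Hypothetical sketch of the surface syntax described above.
#![feature(core_pattern_type, core_pattern_types)] // gate names are a guess
use std::pat::pattern_type;

// A u32 statically restricted to an integral range pattern.
type Percent = pattern_type!(u32 is 0..=100);

fn percent(x: u32) -> Option<Percent> {
    // Per the PR, transmute is currently the only way to create one.
    (x <= 100).then(|| unsafe { std::mem::transmute::<u32, Percent>(x) })
}
```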
This PR's implementation differs from the MCP's text. Specifically
> This means you could implement different traits for different pattern types with the same base type. Thus, we just forbid implementing any traits for pattern types.
is violated in this PR. The reason is that we do need impls after all in order to make pattern types usable as fields: constants of the `std::time::Nanoseconds` struct type are used in patterns, so the type must be structural-eq, which it can only be if you derive several traits on it. It doesn't need to be structural-eq recursively, so we can just manually implement the relevant traits on the pattern type and use the pattern type as a private field.
Waiting on:
* [x] move all unrelated commits into their own PRs.
* [x] fix niche computation (see 2db07f94f44f078daffe5823680d07d4fded883f)
* [x] add lots more tests
* [x] T-types MCP https://github.com/rust-lang/types-team/issues/126 to finish
* [x] some commit cleanup
* [x] full self-review
* [x] remove 61bd325da19a918cc3e02bbbdce97281a389c648, it's not necessary anymore I think.
* [ ] ~~make sure we never accidentally leak pattern types to user code (add stability checks or feature gate checks and appropriate tests)~~ we don't even do this for the new float primitives
* [x] get approval that [the scope expansion to trait impls](https://rust-lang.zulipchat.com/#narrow/stream/326866-t-types.2Fnominated/topic/Pattern.20types.20types-team.23126/near/427670099) is ok
r? `@BoxyUwU`
Use unchecked_sub in str indexing
https://github.com/rust-lang/rust/pull/108763 applied this logic to indexing for slices, but of course `str` has its own separate impl.
Found this by skimming over the codegen for https://github.com/oxidecomputer/hubris/; their dist builds enable overflow checks so the lack of `unchecked_sub` was producing an impossible-to-hit overflow check and also inhibiting some inlining.
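A sketch of the shape of the fix (not the actual `str` impl):
```rust
// Once the range has been validated, `end - start` cannot overflow, so the
// overflow check (and its panic path) is dead code that we can avoid emitting.
unsafe fn range_len_sketch(start: usize, end: usize) -> usize {
    debug_assert!(start <= end);
    // SAFETY: caller has already validated `start <= end`.
    unsafe { end.unchecked_sub(start) }
}
```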
r? scottmcm
CFI: Don't rewrite ty::Dynamic directly
Now that we're using a type folder, the arguments in predicates are processed automatically - we don't need to descend manually.
We also want to keep projection clauses around, and this does so.
r? `@compiler-errors`
CFI: Restore typeid_for_instance default behavior
Restore `typeid_for_instance`'s default behavior of performing self type erasure, since that's the most common case and what it does most of the time. Using the concrete self (i.e., not performing self type erasure) is for assigning a secondary type id; secondary type ids are only assigned to methods, only when they're unique, and are only tested when methods are used as function pointers.
CFI: Support function pointers for trait methods
Adds support for both CFI and KCFI for function pointers to trait methods by attaching both concrete and abstract types to functions.
KCFI does this through generation of a `ReifyShim` on any function pointer for a method that could go into a vtable, keeping these separate from `ReifyShim`s that are *intended* for vtable use by setting a `ReifyReason` on them.
CFI does this by setting both the concrete and abstract type on every instance.
This should land after #123024 or a similar PR, as it diverges the implementation of CFI vs KCFI.
r? `@compiler-errors`
Rename `UninhabitedEnumBranching` to `UnreachableEnumBranching`
Per [#120268](https://github.com/rust-lang/rust/pull/120268#discussion_r1517492060), I renamed `UninhabitedEnumBranching` to `UnreachableEnumBranching`.
I addressed some nits and added some comments.
I adjusted the workaround restrictions. This should be useful for `a <= b` and `if let Some/Ok(v)`. For enums with few variants, `early-tailduplication` should not cause compile-time overhead.
r? RalfJung
Add `Ord::cmp` for primitives as a `BinOp` in MIR
Update: most of this OP was written months ago. See https://github.com/rust-lang/rust/pull/118310#issuecomment-2016940014 below for where we got to recently that made it ready for review.
---
There are dozens of reasonable ways to implement `Ord::cmp` for integers using comparison, bit-ops, and branches. Those differences are irrelevant at the Rust level, however, so we can make things better by adding `BinOp::Cmp` at the MIR level:
1. Exactly how to implement it is left up to the backends, so LLVM can use whatever pattern its optimizer best recognizes and cranelift can use whichever pattern codegens the fastest.
2. By not inlining those details for every use of `cmp`, we drastically reduce the amount of MIR generated for `derive`d `PartialOrd`, while also making it more amenable to MIR-level optimizations.
Having extremely careful `if` ordering to μoptimize resource usage on broadwell (#63767) is great, but it really feels to me like libcore is the wrong place to put that logic. Similarly, using subtraction [tricks](https://graphics.stanford.edu/~seander/bithacks.html#CopyIntegerSign) (#105840) is arguably even nicer, but depends on the optimizer understanding it (https://github.com/llvm/llvm-project/issues/73417) to be practical. Or maybe [bitor is better than add](https://discourse.llvm.org/t/representing-in-ir/67369/2?u=scottmcm)? But maybe only on a future version that [has `or disjoint` support](https://discourse.llvm.org/t/rfc-add-or-disjoint-flag/75036?u=scottmcm)? And just because one of those forms happens to be good for LLVM, there's no guarantee that it'd be the same form that GCC or Cranelift would rather see -- especially given their very different optimizers. Not to mention that if LLVM gets a spaceship intrinsic -- [which it should](https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Suboptimal.20inlining.20in.20std.20function.20.60binary_search.60/near/404250586) -- we'll need at least a rustc intrinsic to be able to call it.
As for simplifying it in Rust, we now regularly inline `{integer}::partial_cmp`, but it's quite a large amount of IR. The best way to see that is with 8811efa88b (diff-d134c32d028fbe2bf835fef2df9aca9d13332dd82284ff21ee7ebf717bfa4765R113) -- I added a new pre-codegen MIR test for a simple 3-tuple struct, and this PR change it from 36 locals and 26 basic blocks down to 24 locals and 8 basic blocks. Even better, as soon as the construct-`Some`-then-match-it-in-same-BB noise is cleaned up, this'll expose the `Cmp == 0` branches clearly in MIR, so that an InstCombine (#105808) can simplify that to just a `BinOp::Eq` and thus fix some of our generated code perf issues. (Tracking that through today's `if a < b { Less } else if a == b { Equal } else { Greater }` would be *much* harder.)
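Conceptually, a derived `Ord` over several fields now reduces to chained three-way compares, each a single MIR statement. A Rust-level sketch of that shape:
```rust
use std::cmp::Ordering;

// Each `.cmp()` below lowers to one `BinOp::Cmp` in MIR after this PR,
// instead of a small tree of <, ==, and branches per field.
fn cmp_triple(a: (u16, i32, u64), b: (u16, i32, u64)) -> Ordering {
    a.0.cmp(&b.0)
        .then_with(|| a.1.cmp(&b.1))
        .then_with(|| a.2.cmp(&b.2))
}
```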
---
r? `@ghost`
But first I should check that perf is ok with this
~~...and my true nemesis, tidy.~~
This is just one part of the MCP, but it's the one that IMHO removes the most noise from the standard library code.
Seems net simpler this way, since MIR already supported heterogeneous shifts anyway, and thus it's not more work for backends than before.
Remove len argument from RawVec::reserve_for_push
Removes `RawVec::reserve_for_push`'s `len` argument since it's always the same as capacity.
Also makes `Vec::insert` use `RawVec::reserve_for_push`.
CFI: Fix methods as function pointer cast
Fix casting between methods and function pointers by assigning a secondary type id to methods with their concrete self so they can be used as function pointers.
This was split off from #116404.
cc `@compiler-errors` `@workingjubilee`
Codegen const panic messages as function calls
This skips emitting extra arguments at every callsite (of which there
can be many). For a librustc_driver build with overflow checks enabled,
this cuts 0.7MB from the resulting shared library (see [perf]).
A sample improvement from nightly:
```
leaq str.0(%rip), %rdi
leaq .Lalloc_d6aeb8e2aa19de39a7f0e861c998af13(%rip), %rdx
movl $25, %esi
callq *_ZN4core9panicking5panic17h17cabb89c5bcc999E@GOTPCREL(%rip)
```
to this PR:
```
leaq .Lalloc_d6aeb8e2aa19de39a7f0e861c998af13(%rip), %rdi
callq *_RNvNtNtCsduqIKoij8JB_4core9panicking11panic_const23panic_const_div_by_zero@GOTPCREL(%rip)
```
[perf]: https://perf.rust-lang.org/compare.html?start=a7e4de13c1785819f4d61da41f6704ed69d5f203&end=64fbb4f0b2d621ff46d559d1e9f5ad89a8d7789b&stat=instructions:u
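For reference, a hand-written approximation of the shape of these generated helpers (the real ones live in `core::panicking` and are generated by the compiler):
```rust
// Approximation only: the message is baked into a dedicated #[cold] symbol,
// so each callsite shrinks to "load location, call" instead of also
// materializing the message pointer and length.
#[cold]
#[inline(never)]
fn panic_const_div_by_zero_sketch(location: &core::panic::Location<'_>) -> ! {
    panic!("attempt to divide by zero at {location}")
}
```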
CFI: Fix drop and drop_in_place
Fix `drop` and `drop_in_place` by transforming the self type of `drop` and `drop_in_place` methods into a `Drop` trait object.
This was split off from https://github.com/rust-lang/rust/pull/116404.
cc `@compiler-errors` `@workingjubilee`
CFI: Support self_cell-like recursion
Current `transform_ty` attempts to avoid cycles when normalizing `#[repr(transparent)]` types to their interior, but runs afoul of this pattern used in `self_cell`:
```rust
struct X<T> {
    x: u8,
    p: PhantomData<T>,
}
#[repr(transparent)]
struct Y(X<Y>);
```
When attempting to normalize `Y`, it will still cycle indefinitely. By using a types-visited list, this will instead get expanded exactly one layer deep to `X<Y>`, and then stop, not attempting to normalize `Y` any further.
This PR was split off from #121962 as part of fixing the larger vtable compatibility issues.
r? `@workingjubilee`
Let codegen decide when to `mem::swap` with immediates
Making `libcore` decide this is silly; the backend has so much better information about when it's a good idea.
Thus this PR introduces a new `typed_swap` intrinsic with a fallback body, and replaces that fallback implementation when swapping immediates or scalar pairs.
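A sketch of what the fallback body amounts to (illustrative; the real plumbing lives in `core::intrinsics`):
```rust
// Fallback shape: a plain untyped swap. Backends replace this with
// load/store of immediates or scalar pairs when the layout allows it.
unsafe fn typed_swap_sketch<T>(x: *mut T, y: *mut T) {
    // SAFETY: caller guarantees both pointers are valid, aligned, and
    // non-overlapping, as the intrinsic requires.
    unsafe { core::ptr::swap_nonoverlapping(x, y, 1) }
}
```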
r? oli-obk
Replaces #111744, and means we'll never need more libs PRs like #111803 or #107140
CFI: Skip non-passed arguments
Rust will occasionally rely on fn((), X) -> Y being compatible with fn(X) -> Y, since () is a non-passed argument. Relax CFI by choosing not to encode non-passed arguments.
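The kind of code this keeps working, sketched (hypothetical example, not from the PR):
```rust
// `()` is a non-passed (zero-sized) argument, so at the ABI level these two
// signatures are identical; relaxed CFI must give them compatible type ids.
fn with_unit(_: (), x: u32) -> u32 {
    x + 1
}

fn main() {
    // The pattern rustc itself occasionally relies on: treating
    // fn((), u32) -> u32 as fn(u32) -> u32.
    let f: fn(u32) -> u32 =
        unsafe { std::mem::transmute(with_unit as fn((), u32) -> u32) };
    assert_eq!(f(1), 2);
}
```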
This PR was split off from #121962 as part of fixing the larger vtable compatibility issues.
r? `@workingjubilee`
CFI: Break tests into smaller files
Break the type metadata identifiers tests into a smaller set of tests/files, and move the CFI (and KCFI) codegen tests into cfi (and kcfi) subdirectories.
Tests added in cast-target-abi.rs, covering the single element, array,
and prefix cases in `CastTarget::llvm_type`, and the Rust-is-larger/smaller
cases in the Rust<->ABI copying code.
ffi-out-of-bounds-loads.rs was overhauled to be runnable on any
platform. Its alignment also increases due to the removal of a `min` in
the previous commit; this was probably an insufficient workaround for
this issue or similar. The higher alignment is fine, since the alloca is
actually aligned to 8 bytes, as the test checks now confirm.
Stop walking the bodies of statics for reachability, and evaluate them instead
cc `@saethlin` `@RalfJung`
cc #119214
This reuses the `DefIdVisitor` from `rustc_privacy`, because they basically try to do the same thing.
This PR's changes can probably be extended to constants, too, but let's tackle that separately, it's likely more involved.
Represent `Result<usize, Box<T>>` as ScalarPair(i64, ptr)
This allows types like `Result<usize, std::io::Error>` (and integers of differing sign, e.g. `Result<u64, i64>`) to be passed in a pair of registers instead of through memory, like `Result<u64, u64>` or `Result<Box<T>, Box<U>>` are today.
Fixes #97540.
r? `@ghost`
Lower transmutes from int to pointer type as gep on null
I thought of this while looking at https://github.com/rust-lang/rust/pull/121242. See that PR's description for why this lowering is preferable.
The UI test that's being changed here crashes without changing the transmutes into casts. Based on that, this PR should not be merged without a crater build-and-test run.
Test wasm32-wasip1 in CI, not wasm32-unknown-unknown
This commit changes CI to no longer test the `wasm32-unknown-unknown` target and instead test the `wasm32-wasip1` target. There was some discussion of this in a [Zulip thread], and the motivations for this PR are:
* Runtime failures on `wasm32-unknown-unknown` print nothing, meaning all you get is "something failed". In contrast `wasm32-wasip1` can print to stdout/stderr.
* The unknown-unknown target is missing lots of pieces of libstd, and while `wasm32-wasip1` is also missing some pieces (e.g. threads) it's missing fewer pieces. This means that many more tests can be run.
Overall my hope is to improve the debuggability of wasm failures on CI and ideally be a bit less of a maintenance burden.
This commit specifically removes the testing of `wasm32-unknown-unknown` and replaces it with testing of `wasm32-wasip1`. Along the way there were a number of other architectural changes made as well, including:
* A new `target.*.runtool` option can now be specified in `config.toml` which is passed as `--runtool` to `compiletest`. This is used to reimplement execution of WebAssembly in a less-wasm-specific fashion.
* The default value for `runtool` is an ambiently located WebAssembly runtime found on the system, if any. I've implemented logic for Wasmtime.
* Existing testing support for `wasm32-unknown-unknown` and Emscripten has been removed. I'm not aware of Emscripten testing being run any time recently and otherwise `wasm32-wasip1` is in theory the focus now.
* I've added a new `//@ needs-threads` directive for `compiletest` and classified a bunch of wasm-ignored tests as needing threads. In theory these tests can run on `wasm32-wasi-preview1-threads`, for example.
* I've tried to audit all existing tests that are either `ignore-emscripten` or `ignore-wasm*`. Many now run on `wasm32-wasip1` due to being able to emit error messages, for example. Many are updated with comments as to why they can't run as well.
* The `compiletest` output matching for `wasm32-wasip1` automatically uses "match a subset" mode implemented in `compiletest`. This is because WebAssembly runtimes often add extra information on failure, such as the `unreachable` instruction in `panic!`, which isn't able to be matched against the golden output from native platforms.
* I've ported most existing `run-make` tests that use custom Node.js wrapper scripts to the new run-make-based-in-Rust infrastructure. To do this I added `wasmparser` as a dependency of `run-make-support` for the various wasm tests to use that parse wasm files. The one test that executed WebAssembly now uses `wasmtime`-the-CLI to execute the test instead. I have not ported over an exception-handling test as Wasmtime doesn't implement this yet.
* I've updated the `test` crate to print out timing information for WASI targets as it can do that (gets a previously ignored test now passing).
* The `test-various` image now builds a WASI sysroot for the WASI target and additionally downloads a fixed release of Wasmtime, currently the latest one at 18.0.2, and uses that for testing.
[Zulip thread]: https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Have.20wasm.20tests.20ever.20caused.20problems.20on.20CI.3F/near/424317944
* The WASI targets deal with the `main` symbol a bit differently than
native so some `codegen` and `assembly` tests have been ignored.
* All `ignore-emscripten` directives have been updated to
`ignore-wasm32` to be more clear that all wasm targets are ignored and
it's not just Emscripten.
* Most `ignore-wasm32-bare` directives are now gone.
* Some ignore directives for wasm were switched to `needs-unwind`
instead.
* Many `ignore-wasm32*` directives are removed as the tests work with
WASI as opposed to `wasm32-unknown-unknown`.
Use ptradd for vtable indexing
Extension of #121665.
After this, the only remaining usages of GEP are [this](cd81f5b27e/compiler/rustc_codegen_llvm/src/intrinsic.rs (L909-L920)) kinda janky Emscripten EH code, which I'll change in a future PR, and array indexing / pointer offsets, where there isn't yet a canonical `ptradd` form. (Out of curiosity I tried converting the latter to `ptradd(ptr, mul(size, index))`, but that causes codegen regressions right now.)
r? `@nikic`
Stop using LLVM struct types for byval/sret
For `byval` and `sret`, the type has no semantic meaning, only the size matters\*†. Using `[N x i8]` is a more direct way to specify that we want `N` bytes, and avoids relying on LLVM's struct layout.
\*: The alignment would matter, if we didn't explicitly specify it. From what I can tell, we always specified the alignment for `sret`; for `byval`, we didn't until #112157.
†: For `byval`, the hidden copy may be impacted by padding in the LLVM struct type, i.e. padding bytes may not be copied. (I'm not sure if this is done today, but I think it would be legal.) But we manually pad our LLVM struct types specifically to avoid there ever being LLVM-visible padding, so that shouldn't be an issue.
Split out from #121577.
r? `@nikic`
Update a test to support Symbol Mangling V0
Note that since this is a symbol from `std`, overriding the symbol mangling version via the `compile-flags` directive does not work.
Vec::try_with_capacity
Related to #91913
Implements try_with_capacity for `Vec`, `VecDeque`, and `String`. I can follow it up with more collections if desired.
`Vec::try_with_capacity()` is functionally equivalent to the current stable:
```rust
let mut v = Vec::new();
v.try_reserve_exact(n)?
```
However, `try_reserve` calls non-inlined `finish_grow`, which requires the old and new `Layout`, and is designed to reallocate memory. There is a benefit to using `try_with_capacity`, besides syntax convenience, because it generates much smaller code at the call site with a direct call to the allocator. There's a codegen test included.
It's also a very desirable functionality for users of `no_global_oom_handling` (Rust-for-Linux), since it makes a very commonly used function available in that environment (`with_capacity` is used much more frequently than all `(try_)reserve(_exact)`).
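A usage sketch (the `try_with_capacity` feature gate is still unstable here):
```rust
#![feature(try_with_capacity)] // unstable at the time of this PR
use std::collections::TryReserveError;

// Allocate up front without aborting on OOM; useful under
// no_global_oom_handling, where `with_capacity` is unavailable.
fn fallible_buffer(n: usize) -> Result<Vec<u8>, TryReserveError> {
    Vec::try_with_capacity(n)
}
```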
Add asm goto support to `asm!`
Tracking issue: #119364
This PR implements asm-goto support, using the syntax described in "future possibilities" section of [RFC2873](https://rust-lang.github.io/rfcs/2873-inline-asm.html#asm-goto).
Currently I have only implemented the `label` part, not the `fallthrough` part (i.e. fallthrough is implicit). This doesn't reduce expressiveness, though, since you can use label-break to get arbitrary control flow, or simply set a value and rely on the jump threading optimisation to get the desired control flow. I can add that later if deemed necessary.
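A sketch of the `label` syntax per the RFC's future-possibilities section (operand details may differ from the final implementation; x86-64 assumed):
```rust
#![feature(asm_goto)] // feature gate at the time of this PR
use std::arch::asm;

// Returns 0 if `x` is zero, 1 otherwise (x86-64 sketch).
fn classify(x: u64) -> u32 {
    unsafe {
        asm!(
            "test {0}, {0}",
            "jz {1}",
            in(reg) x,
            // A `label` operand binds a Rust block as a jump target;
            // falling through past the asm is implicit when no jump is taken.
            label {
                return 0;
            }
        );
    }
    1
}
```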
r? `@Amanieu`
cc `@ojeda`
It seems that LLVM 17 doesn't fully optimize out unwrap_unchecked.
We can just loosen the check lines to account for this, since we don't
really care about the exact instructions, we just want to make sure that
inttoptr/ptrtoint aren't used for Box.
This commit is extracted from #122036 and adds a new directive to the
`compiletest` test runner, `//@ needs-threads`. This is intended to
capture the need that a target must implement threading to execute a
specific test, typically one that uses `std::thread`. This is primarily
done for WebAssembly targets which currently do not have threads by
default. This enables transitioning a lot of `//@ ignore-wasm*`-style
ignores into a more self-documenting `//@ needs-threads` directive.
Additionally the `wasm32-wasi-preview1-threads` target, for example,
does actually have threads, but isn't tested in CI at this time. This
change enables running these tests for that target, but not other wasm
targets.
For the former, it's fine for `inbounds` offsets to be one-past-the-end,
so it's okay even if the ZST is the last field in the layout:
> The base pointer has an in bounds address of an allocated object,
> which means that it points into an allocated object, or to its end.
https://llvm.org/docs/LangRef.html#getelementptr-instruction
For the latter, even DST fields must always be inside the layout
(or to its end for ZSTs), so using inbounds is also fine there.
Adds initial support for DataFlowSanitizer to the Rust compiler. It
currently supports `-Zsanitizer-dataflow-abilist`. Additional options
for it can be passed to the LLVM command line argument processor via LLVM
arguments using the `llvm-args` codegen option (e.g.,
`-Cllvm-args=-dfsan-combine-pointer-labels-on-load=false`).
rename 'try' intrinsic to 'catch_unwind'
The intrinsic has nothing to do with `try` blocks, and corresponds to the stable `catch_unwind` function, so this makes a lot more sense IMO.
Also rename Miri's special function while we are at it, to reflect the level of abstraction it works on: it's an unwinding mechanism, on which Rust implements panics.
Allow tests to specify a `//@ filecheck-flags:` header
This allows individual codegen/assembly/mir-opt tests to pass extra flags to the LLVM `filecheck` tool as needed.
---
The original motivation was noticing that `tests/run-make/instrument-coverage` was very close to being an ordinary codegen test, except that it needs some extra logic to set up platform-specific variables to be passed into filecheck.
I then saw the comment in `verify_with_filecheck` indicating that a `filecheck-flags` header might be useful for other purposes as well.
This test was already very close to being an ordinary codegen test, except that
it needed some extra logic to set a few variables based on (target) platform
characteristics.
Now that we have support for `//@ filecheck-flags:`, we can instead set those
variables using the normal test revisions mechanism.
This makes room for migrating over `tests/run-make/instrument-coverage`,
without increasing the number of top-level items in the codegen test directory.
Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison
Setting all of LLVM's fast-math flags makes our fast-math intrinsics very dangerous, because some inputs are UB. This set of flags permits common algebraic transformations, but according to the [LangRef](https://llvm.org/docs/LangRef.html#fastmath), only the flags `nnan` (no nans) and `ninf` (no infs) can produce poison.
And this uses the algebraic float ops to fix https://github.com/rust-lang/rust/issues/120720
cc `@orlp`
Implement intrinsics with fallback bodies
fixes #93145 (though we can port many more intrinsics)
cc #63585
The way this works is that the backend logic for generating custom code for intrinsics has been made fallible. The only failure path is "this intrinsic is unknown". The `Instance` (that was `InstanceDef::Intrinsic`) then gets converted to `InstanceDef::Item`, which represents the fallback body. A regular function call to that body is then codegenned. This is currently implemented for
* codegen_ssa (so llvm and gcc)
* codegen_cranelift
other backends will need to adjust, but they can just keep doing what they were doing if they prefer (though adding new intrinsics to the compiler will then require them to implement them, instead of getting the fallback body).
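For illustration, the shape of such a fallback body (declaration plumbing omitted; this is a sketch, not an actual intrinsic from the PR):
```rust
// Sketch: if the backend recognizes the intrinsic it emits custom code;
// otherwise the Instance degrades to InstanceDef::Item and this body is
// codegenned and called like any other function.
pub const fn three_way_compare_sketch(a: i32, b: i32) -> i8 {
    // Fallback body: semantically correct, not necessarily optimal.
    if a < b {
        -1
    } else if a == b {
        0
    } else {
        1
    }
}
```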
cc `@scottmcm` `@WaffleLapkin`
### todo
* [ ] miri support
* [x] default intrinsic name to name of function instead of requiring it to be specified in attribute
* [x] make sure that the bodies are always available (must be collected for metadata)
This test started failing on LLVM 18 after change
61118ffd04. As far as I can tell, it's
just good fortune that LLVM is able to sniff out the new noalias here,
and it's correct.
Remove an unneeded helper from the tuple library code
Thanks to https://github.com/rust-lang/rust/pull/107022, this is just what `==` does, so we don't need the helper here anymore.
Rollup of 8 pull requests
Successful merges:
- #120484 (Avoid ICE when is_val_statically_known is not of a supported type)
- #120516 (pattern_analysis: cleanup manual impls)
- #120517 (never patterns: It is correct to lower `!` to `_`.)
- #120523 (Improve `io::Read::read_buf_exact` error case)
- #120528 (Store SHOULD_CAPTURE as AtomicU8)
- #120529 (Update data layouts in custom target tests for LLVM 18)
- #120531 (Remove a bunch of `has_errors` checks that have no meaningful or the wrong effect)
- #120533 (Correct paths for hexagon-unknown-none-elf platform doc)
r? `@ghost`
`@rustbot` modify labels: rollup
Avoid ICE when is_val_statically_known is not of a supported type
2 ICE with 1 stone!
1. Implement `llvm.is.constant.ptr` to avoid the first ICE in the linked issue.
2. Return `false` when the argument is not one of `i*`/`f*`/`ptr` to avoid the second ICE.
fixes #120480
Replacement of #114390: Add new intrinsic `is_val_statically_known` and optimize pow for powers of two
This adds a new intrinsic `is_val_statically_known` that lowers to [`@llvm.is.constant.*`](https://llvm.org/docs/LangRef.html#llvm-is-constant-intrinsic). It also applies the intrinsic in the int_pow methods to recognize and optimize the idiom `2isize.pow(x)`. See #114390 for more discussion.
While I have extended the scope of the power of two optimization from #114390, I haven't added any new uses for the intrinsic. That can be done in later pull requests.
Note: When testing or using the library, be sure to use `--stage 1` or higher. Otherwise, the intrinsic will be a noop and the doctests will be skipped. If you are trying out edits, you may be interested in [`--keep-stage 0`](https://rustc-dev-guide.rust-lang.org/building/suggested.html#faster-builds-with---keep-stage).
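The pow idiom, sketched and simplified (both arms must be observably equivalent for this to be sound; overflow behavior is glossed over here):
```rust
#![feature(core_intrinsics)]
use std::intrinsics::is_val_statically_known;

// Simplified sketch: when LLVM can prove `base` is the constant 2, use a
// shift; otherwise fall back to the generic pow. The intrinsic lowers to
// @llvm.is.constant.* and folds to false when unoptimized (e.g. stage 0).
fn pow_sketch(base: usize, exp: u32) -> usize {
    // SAFETY: the intrinsic has no preconditions (it was `unsafe` when added).
    if unsafe { is_val_statically_known(base) } && base == 2 {
        1usize << exp
    } else {
        base.pow(exp)
    }
}
```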
Fixes #47234. Resolves #114390.
`@Centri3`
Use `assert_unchecked` instead of `assume` intrinsic in the standard library
Now that a public wrapper for the `assume` intrinsic exists, we can use it in the standard library.
CC #119131
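A representative sketch of the mechanical change:
```rust
// Before: core::intrinsics::assume(!v.is_empty());
// After: the public wrapper, with the same effect on the optimizer.
unsafe fn first_unchecked(v: &[u8]) -> u8 {
    // SAFETY: caller guarantees `v` is non-empty.
    unsafe { core::hint::assert_unchecked(!v.is_empty()) };
    v[0] // bounds check optimized out thanks to the hint
}
```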
Fix overflow check
Make MIRI choose the path randomly and rename the intrinsic
Add back test
Add miri test and make it operate on `ptr`
Define `llvm.is.constant` for primitives
Update MIRI comment and fix test in stage2
Add const eval test
Clarify that both branches must have the same side effects
guaranteed non guarantee
use immediate type instead
Co-Authored-By: Ralf Jung <post@ralfj.de>
Tune the inlinability of `unwrap`
Fixes#115463
cc `@thomcc`
This tweaks `unwrap` on ~~`Option` &~~ `Result` to be two parts:
- `#[inline(always)]` for checking the discriminant
- `#[cold]` for actually panicking
The idea here is that checking the discriminant on a `Result` ~~or `Option`~~ should always be trivial enough to be worth inlining, even in `opt-level=z`, especially compared to passing it to a function.
As seen in the issue and codegen test, this will hopefully help particularly for things like `.try_into().unwrap()`s that are actually infallible, but in a way that's only visible with the inlining.
EDIT: I've restricted this to `Result` to avoid combining effects
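The resulting shape, sketched outside of core (names are illustrative):
```rust
use std::fmt::Debug;

// Sketch of the two-part split: a trivially inlinable discriminant check
// plus a #[cold] out-of-line panic, mirroring what `Result::unwrap` now does.
#[inline(always)]
fn unwrap_sketch<T, E: Debug>(r: Result<T, E>) -> T {
    match r {
        Ok(t) => t,
        Err(e) => unwrap_failed(&e),
    }
}

#[cold]
#[inline(never)]
fn unwrap_failed<E: Debug>(e: &E) -> ! {
    panic!("called `unwrap()` on an `Err` value: {e:?}")
}
```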
Separate immediate and in-memory ScalarPair representation
Currently, we assume that ScalarPair is always represented using a two-element struct, both as an immediate value and when stored in memory.
This currently works fairly well, but runs into problems with https://github.com/rust-lang/rust/pull/116672, where a ScalarPair involving an i128 type can no longer be represented as a two-element struct in memory. For example, the tuple `(i32, i128)` needs to be represented in-memory as `{ i32, [3 x i32], i128 }` to satisfy alignment requirements. Using `{ i32, i128 }` instead will result in the second element being stored at the wrong offset (prior to LLVM 18).
Resolve this issue by no longer requiring that the immediate and in-memory type for ScalarPair are the same. The in-memory type will now look the same as for normal struct types (and will include padding filler and similar), while the immediate type stays a simple two-element struct type. This also means that booleans in immediate ScalarPair are now represented as i1 rather than i8, just like we do everywhere else.
The core change here is to llvm_type (which now treats ScalarPair as a normal struct) and immediate_llvm_type (which returns the two-element struct that llvm_type used to produce). The rest is fixing things up to no longer assume these are the same. In particular, this switches places that try to get pointers to the ScalarPair elements to use byte-geps instead of struct-geps.
llvm: Allow `noundef` in codegen tests
LLVM 18 will automatically infer `noundef` in some situations. Adjust codegen tests to accept this.
See llvm/llvm-project#76553 for why `noundef` is being generated now.
`@rustbot` label: +llvm-main
On LLVM 18 we get slightly different arguments here, so it's easier to
just regex those away. The important details are all still asserted as I
understand things.
Fixes #119193.
@rustbot label: +llvm-main
add more niches to rawvec
Previously `RawVec` only had a single niche, in its `NonNull` pointer. With this change it now has `isize::MAX` niches, since half the value-space of the capacity field is never needed: we can't have a capacity larger than `isize::MAX`.
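A sketch of the mechanism (the attribute usage below illustrates how the standard library can declare such a restricted range; details may differ from the actual change):
```rust
#![feature(rustc_attrs)] // perma-unstable; usable inside the standard library

// Sketch: a capacity newtype whose valid range excludes values above
// isize::MAX (on 64-bit), giving layout isize::MAX extra niches beyond the
// NonNull pointer's single null niche. Constructing one requires unsafe.
#[rustc_layout_scalar_valid_range_end(0x7fff_ffff_ffff_ffff)]
struct Cap(usize);
```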
Sets the accessibility of types and fields in DWARF using
`DW_AT_accessibility` attribute.
`DW_AT_accessibility` (public/protected/private) isn't exactly right for
Rust, but neither is `DW_AT_visibility` (local/exported/qualified), and
there's no way to set `DW_AT_visibility` in LLVM's API.
Signed-off-by: David Wood <david@davidtw.co>
Enable stack probes on aarch64 for LLVM 18
I tested this on `aarch64-unknown-linux-gnu` with LLVM main (~18).
cc #77071, to be closed once we upgrade our LLVM submodule.
Add `-Zfunction-return={keep,thunk-extern}` option
This is intended to be used for Linux kernel RETHUNK builds.
With this commit (optionally backported to Rust 1.73.0), plus a patched Linux kernel to pass the flag, I get a RETHUNK build with Rust enabled that is `objtool`-warning-free and is able to boot in QEMU and load a sample Rust kernel module.
Issue: https://github.com/rust-lang/rust/issues/116853.
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Add thinlto support to codegen, assembly and coverage tests
Using `--emit=llvm-ir` with thinlto usually results in multiple IR files.
Resolves the test-case failure reported in #113923.
In the future Windows will enable Control-flow Enforcement Technology
(CET aka Shadow Stacks). To protect the path where the context is
updated during exception handling, the binary is required to enumerate
valid unwind entrypoints in a dedicated section which is validated when
the context is being set during exception handling.
The required support for EHCONT was merged into LLVM long ago. This
change adds the Rust codegen option to enable it.
Reference:
* https://reviews.llvm.org/D40223
This also adds a new `ehcont-guard` option to the bootstrap config which
enables EHCont Guard when building std.
Add -Z llvm_module_flag
Allow adding values to the `!llvm.module.flags` metadata for a generated module. The syntax is
`-Z llvm_module_flag=<name>:<type>:<value>:<behavior>`
Currently only u32 values are supported, but the type is required to be specified for forward compatibility. The `behavior` element must match one of the named LLVM metadata behaviors.
This flag is expected to be perma-unstable.
This was broken by upstream
llvm/llvm-project@dc6d077396. It's easy
enough to use a regex match to support both, so we do that.
r? @nikic
@rustbot label: +llvm-main
Hint optimizer about try-reserved capacity
This is #116568, but limited only to the less-common `try_reserve` functions, to reduce debug-info bloat in debug binaries while still addressing the main use-case, #116570.
Clean up unchecked_math, separate out unchecked_shifts
Tracking issue: #85122
Changes:
1. Remove `const_inherent_unchecked_arith` flag and make const-stability flags the same as the method feature flags. Given the number of other unsafe const fns already stabilised, it makes sense to just stabilise these in const context when they're stabilised.
2. Move `unchecked_shl` and `unchecked_shr` into a separate `unchecked_shifts` flag, since the semantics for them are unclear and they'll likely be stabilised separately as a result.
3. Add an `unchecked_neg` method exclusively to signed integers, under the `unchecked_neg` flag. This is because it's a new API and probably needs some time to marinate before it's stabilised, and while it *would* make sense to have a similar version for unsigned integers since `checked_neg` also exists for those there is absolutely no case where that would be a good idea, IMQHO.
The longer-term goal here is to prepare the `unchecked_math` methods for an FCP and stabilisation since they've existed for a while, their semantics are clear, and people seem in favour of stabilising them.
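The resulting API split, sketched:
```rust
#![feature(unchecked_shifts, unchecked_neg)] // separate gates after this change

fn demo(x: i32, n: u32) -> (i32, i32, i32) {
    // SAFETY: n < i32::BITS for the shifts; x != i32::MIN for the negation.
    unsafe { (x.unchecked_shl(n), x.unchecked_shr(n), x.unchecked_neg()) }
}
```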
Decompose single `matches!` invocations with or-patterns into individual
`matches!` statements to enable branchless code output. The following
functions were changed:
- `is_ascii_alphanumeric`
- `is_ascii_hexdigit`
- `is_ascii_punctuation`
Add codegen tests
Co-authored-by: George Bateman <george.bateman16@gmail.com>
Co-authored-by: scottmcm <scottmcm@users.noreply.github.com>
Copy 1-element arrays as scalars, not vectors
For `[T; 1]` it's silly to copy as `<1 x T>` when we can just copy as `T`.
Inspired by https://github.com/rust-lang/rust/issues/101210#issuecomment-1732470941, which pointed out that `Option<[u8; 1]>` was codegenning worse than `Option<u8>`.
(I'm not sure *why* LLVM doesn't optimize out `<1 x u8>`, but might as well just not emit it in the first place in this codepath.)
---
I think I bit off too much in #116479; let me try just the scalar case first.
r? `@ghost`
Raise minimum supported Apple OS versions
This implements the proposal to raise the minimum supported Apple OS versions as laid out in the now-completed MCP (https://github.com/rust-lang/compiler-team/issues/556).
As of this PR, rustc and the stdlib now support these versions as the baseline:
- macOS: 10.12 Sierra
- iOS: 10
- tvOS: 10
- watchOS: 5 (Unchanged)
In addition to everything this breaks indirectly, these changes also erase the `armv7-apple-ios` target (currently tier 3), because the oldest supported iOS device now uses ARMv7s. Not sure what the policy around tier 3 target removal is, but shimming it is not an option since the linker refuses.
[Per comment](https://github.com/rust-lang/compiler-team/issues/556#issuecomment-1297175073), this requires an FCP to merge. cc `@wesleywiser`.
Enable -Zdrop-tracking-mir by default
This PR enables the `drop-tracking-mir` flag by default. This flag was initially implemented in https://github.com/rust-lang/rust/pull/101692.
This flag computes auto-traits on generators based on their analysis MIR, instead of trying to compute on the HIR body. This removes the need for HIR-based drop-tracking, as we can now reuse the same code to compute generator witness types and to compute generator interior fields.
Add codegen test to guard against VecDeque optimization regression
Very small PR that adds a codegen test to guard against regression for the `VecDeque` optimization addressed in #80836. Ensures that Rustc optimizes away the panic when unwrapping the result of `.get(0)` because of the `!is_empty()` condition.
Use the same DISubprogram for each instance of the same inlined function within a caller
# Issue Details:
The call to `panic` within a function like `Option::unwrap` is translated to LLVM as a `tail call` (as it will never return). When multiple calls to the same function like this are inlined, LLVM will notice the common `tail call` block (i.e., loading the same panic string + location info and then calling `panic`) and merge them together.
When merging these instructions together, LLVM will also attempt to merge the debug locations as well, but this fails (i.e., debug info is dropped) as Rust emits a new `DISubprogram` at each inline site thus LLVM doesn't recognize that these are actually the same function and so thinks that there isn't a common debug location.
As an example of this, consider the following program:
```rust
#[no_mangle]
fn add_numbers(x: &Option<i32>, y: &Option<i32>) -> i32 {
let x1 = x.unwrap();
let y1 = y.unwrap();
x1 + y1
}
```
When building for x86_64 Windows using 1.72 it generates (note the lack of `.cv_loc` before the call to `panic`, so it will be attributed to the same line as the `addq` instruction):
```asm
.cv_loc 0 1 3 0 # src\lib.rs:3:0
addq $40, %rsp
retq
leaq .Lalloc_f570dea0a53168780ce9a91e67646421(%rip), %rcx
leaq .Lalloc_629ace53b7e5b76aaa810d549cc84ea3(%rip), %r8
movl $43, %edx
callq _ZN4core9panicking5panic17h12e60b9063f6dee8E
int3
```
# Fix Details:
Cache the `DISubprogram` emitted for each inlined function instance within a caller so that this can be reused if that instance is encountered again.
Ideally, we would also deduplicate child scopes and variables, however my attempt to do that with #114643 resulted in asserts when building for Linux (#115156) which would require some deep changes to Rust to fix (#115455).
Instead, when using an inlined function as a debug scope, we will also create a new child scope such that subsequent child scopes and variables do not collide (from LLVM's perspective).
After this change the above assembly (with <https://reviews.llvm.org/D159226> as well) now shows that the `panic!` was inlined from `unwrap` in `option.rs` at line 935 into the current function in `lib.rs` at line 0 (line 0 is emitted since it is ambiguous which line to use, as there were two inline sites that led to this same code):
```asm
.cv_loc 0 1 3 0 # src\lib.rs:3:0
addq $40, %rsp
retq
.cv_inline_site_id 6 within 0 inlined_at 1 0 0
.cv_loc 6 2 935 0 # library\core\src\option.rs:935:0
leaq .Lalloc_5f55955de67e57c79064b537689facea(%rip), %rcx
leaq .Lalloc_e741d4de8cb5801e1fd7a6c6795c1559(%rip), %r8
movl $43, %edx
callq _ZN4core9panicking5panic17hde1558f32d5b1c04E
int3
```
Use `preserve_mostcc` for `extern "rust-cold"`
As experimentation in #115242 has shown, `preserve_mostcc` looks better than `coldcc`. Notably, clang exposes `preserve_most` (https://clang.llvm.org/docs/AttributeReference.html#preserve-most) but not `cold`, so this change should put us on a better-supported path.
And *don't* use a different convention for cold on Windows, because that actually ends up making things worse. (See comment in the code.)
cc tracking issue #97544
Remove some wasm/emscripten ignores
I'm planning on landing a few PRs like this that remove ignores that aren't required. This just covers mir-opt and codegen tests.
Revert "Use the same DISubprogram for each instance of the same inline function within the caller"
This reverts commit 687bffa493.
Reverting to resolve ICEs reported on nightly.
cc `@dpaoliello`
Fixes #115156
Fix #115150 by encoding f32 and f64 correctly for cross-language CFI. I
missed changing the encoding for f32 and f64 when I introduced the
integer normalization option in #105452, as integer normalization does
not include floating point. `f32` and `f64` should always be encoded as
`f` and `d`, since they are both FFI-safe when their representations are
the same (i.e., IEEE 754) for both the Rust compiler and Clang.
Add regression test for not `memcpy`ing padding bytes
Closes #56297
See this comparison: https://rust.godbolt.org/z/jjzfonfcE
I don't have any experience with codegen tests, so I hope this is correct.
Use the same DISubprogram for each instance of the same inlined function within a caller
# Issue Details:
The call to `panic` within a function like `Option::unwrap` is translated to LLVM as a `tail call` (as it will never return). When multiple calls to the same function like this are inlined, LLVM will notice the common `tail call` block (i.e., loading the same panic string + location info and then calling `panic`) and merge them together.
When merging these instructions together, LLVM will also attempt to merge the debug locations as well, but this fails (i.e., debug info is dropped) as Rust emits a new `DISubprogram` at each inline site thus LLVM doesn't recognize that these are actually the same function and so thinks that there isn't a common debug location.
As an example, when building for x86_64 Windows it generates the following (note the lack of `.cv_loc` before the call to `panic`, so it will be attributed to the same line as the `addq` instruction):
```
.cv_loc 0 1 23 0 # src\lib.rs:23:0
addq $40, %rsp
retq
leaq .Lalloc_f570dea0a53168780ce9a91e67646421(%rip), %rcx
leaq .Lalloc_629ace53b7e5b76aaa810d549cc84ea3(%rip), %r8
movl $43, %edx
callq _ZN4core9panicking5panic17h12e60b9063f6dee8E
int3
```
# Fix Details:
Cache the `DISubprogram` emitted for each inlined function instance within a caller so that this can be reused if that instance is encountered again, this also requires caching the `DILexicalBlock` and `DIVariable` objects to avoid creating duplicates.
After this change the above assembly now looks like:
```
.cv_loc 0 1 23 0 # src\lib.rs:23:0
addq $40, %rsp
retq
.cv_inline_site_id 5 within 0 inlined_at 1 0 0
.cv_inline_site_id 6 within 5 inlined_at 1 12 0
.cv_loc 6 2 935 0 # library\core\src\option.rs:935:0
leaq .Lalloc_5f55955de67e57c79064b537689facea(%rip), %rcx
leaq .Lalloc_e741d4de8cb5801e1fd7a6c6795c1559(%rip), %r8
movl $43, %edx
callq _ZN4core9panicking5panic17hde1558f32d5b1c04E
int3
```
We've investigated one reason why debugging information often goes wrong at https://reviews.llvm.org/D152095.
> LLVM can't handle IR where subprogram definitions are nested within DICompositeType when doing LTO builds,
> because there's no good way to cross the CU boundary to insert a nested DISubprogram definition in one CU into a type defined in another CU.
Add a new `compare_bytes` intrinsic instead of calling `memcmp` directly
As discussed in #113435, this lets the backends be the place that can have the "don't call the function if n == 0" logic, if it's needed for the target. (I didn't actually *add* those checks, though, since as I understood it we didn't actually need them on known targets?)
Doing this also let me make it `const` (unstable), which I don't think `extern "C" fn memcmp` can be.
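For illustration, a sketch of the semantics the fallback needs (the signature here is illustrative; the real intrinsic lives in `core::intrinsics`):
```rust
// Illustrative: a const-evaluable three-way byte comparison that a backend
// is free to lower to a libc `memcmp` call, adding an `n == 0` guard itself
// if its target needs one.
const unsafe fn compare_bytes_sketch(a: *const u8, b: *const u8, n: usize) -> i32 {
    let mut i = 0;
    while i < n {
        // SAFETY: caller guarantees both pointers are valid for `n` bytes.
        let (x, y) = unsafe { (*a.add(i), *b.add(i)) };
        if x != y {
            return x as i32 - y as i32;
        }
        i += 1;
    }
    0
}
```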
cc `@RalfJung` `@Amanieu`