Add missing module flags for `-Zfunction-return=thunk-extern`
This fixes a bug in the `-Zfunction-return=thunk-extern` flag. The flag needs to be passed onto LLVM to ensure that functions such as `asan.module_ctor` and `asan.module_dtor` that are created internally in LLVM have the mitigation applied to them.
This was originally discovered [in the Linux kernel](https://lore.kernel.org/all/CANiq72myZL4_poCMuNFevtpYYc0V0embjSuKb7y=C+m3vVA_8g@mail.gmail.com/).
Original flag PR: #116892
PR for similar issue: #129373
Tracking issue: #116853
cc ``@ojeda``
r? ``@wesleywiser``
- fix for divergence
- fix error message
- fix another cranelift test
- fix some cranelift things
- don't set the NORETURN option for naked asm
- fix use of naked_asm! in doc comment
- fix use of naked_asm! in run-make test
- use `span_bug` in unreachable branch
Refactor the code in the `convert_while_ascii` helper function to make
it more suitable for auto-vectorization and also process the full ascii
prefix of the string. The generic case conversion logic will only be
invoked starting from the first non-ascii character.
The runtime on microbenchmarks with ascii-only inputs improves between
1.5x for short and 4x for long inputs on x86_64 and aarch64.
The new implementation also encapsulates all unsafe inside the
`convert_while_ascii` function.
Fixes#123712
Don't alloca for unused locals
We already have a concept of mono-unreachable basic blocks; this is primarily useful for ensuring that we do not compile code under an `if false`. But since we never gave locals the same analysis, a large local only used under an `if false` will still have stack space allocated for it.
There are 3 places we traverse MIR during monomorphization: Inside the collector, `non_ssa_locals`, and the walk to generate code. Unfortunately, https://github.com/rust-lang/rust/pull/129283#issuecomment-2297925578 indicates that we cannot afford the expense of tracking reachable locals during the collector's traversal, so we do need at least two mono-reachable traversals. And of course caching is of no help here because the benchmarks that regress are incr-unchanged; they don't do any codegen.
This fixes the second problem in https://github.com/rust-lang/rust/issues/129282, and brings us anther step toward `const if` at home.
Remove macOS 10.10 dynamic linker bug workaround
Rust's current minimum macOS version is 10.12, so the hack can be removed. This PR also updates the `remove_dir_all` docs to reflect that all supported macOS versions are protected against TOCTOU race conditions (the fallback implementation was already removed in #127683).
try-job: dist-x86_64-apple
try-job: dist-aarch64-apple
try-job: dist-apple-various
try-job: aarch64-apple
try-job: x86_64-apple-1
Update the minimum external LLVM to 18
With this change, we'll have stable support for LLVM 18 and 19.
For reference, the previous increase to LLVM 17 was #122649.
cc `@rust-lang/wg-llvm`
r? nikic
tests: add repr/transparent test for aarch64
Fixes#74396.
Moves `transparent-struct-ptr.rs` to `transparent-byval-struct-ptr.rs` and then adds a new `transparent-opaque-ptr.rs` for aarch64.
On LLVM 20, these instructions already get eliminated, which at least
partially satisfies a TODO. I'm not talented enough at using FileCheck
to try and constrain this further, but if we really want to we could
copy an LLVM 20 specific version of this test that would restore it to
being CHECK-NEXT: insertvalue ...
@rustbot label: +llvm-main
Do not request sanitizers for naked functions
Naked functions can only contain inline asm, so any instrumentation inserted by sanitizers is illegal. Don't request it.
Fixes https://github.com/rust-lang/rust/issues/129224.
Don't emit `expect`/`assume` in opt-level=0
LLVM does not make use of expect/assume calls in `opt-level=0`, so we can simplify IR by not emitting them in this case.
rustc_target: Add various aarch64 features
Add various aarch64 features already supported by LLVM and Linux.
Additionally include some comment fixes to ensure consistency of feature names with the Arm ARM.
Compiler support for features added to stdarch by https://github.com/rust-lang/stdarch/pull/1614.
Tracking issue for unstable aarch64 features is https://github.com/rust-lang/rust/issues/127764.
List of added features:
- FEAT_CSSC
- FEAT_ECV
- FEAT_FAMINMAX
- FEAT_FLAGM2
- FEAT_FP8
- FEAT_FP8DOT2
- FEAT_FP8DOT4
- FEAT_FP8FMA
- FEAT_HBC
- FEAT_LSE128
- FEAT_LSE2
- FEAT_LUT
- FEAT_MOPS
- FEAT_LRCPC3
- FEAT_SVE_B16B16
- FEAT_SVE2p1
- FEAT_WFxT
- FEAT_SME
- FEAT_SME_F16F16
- FEAT_SME_F64F64
- FEAT_SME_F8F16
- FEAT_SME_F8F32
- FEAT_SME_FA64
- FEAT_SME_I16I64
- FEAT_SME_LUTv2
- FEAT_SME2
- FEAT_SME2p1
- FEAT_SSVE_FP8DOT2
- FEAT_SSVE_FP8DOT4
- FEAT_SSVE_FP8FMA
FEAT_FPMR is added in the first commit and then removed in a separate one to highlight it being removed from upstream LLVM 19. The intention is for it to be detectable at runtime through stdarch but not have a corresponding Rust compile-time feature.
add repr to the allowlist for naked functions
Fixes#129412 (combining unstable features #90957 (`#![feature(naked_functions)]`) and #82232 (`#![feature(fn_align)]`)
Set the cfi-normalize-integers and kcfi-offset module flags when
Control-Flow Integrity sanitizers are used, so functions generated by
the LLVM backend use the same CFI/KCFI options as rustc.
cfi-normalize-integers tells LLVM to also use integer normalization
for generated functions when -Zsanitizer-cfi-normalize-integers is
used.
kcfi-offset specifies the number of prefix nops between the KCFI
type hash and the function entry when -Z patchable-function-entry is
used. Note that LLVM assumes all indirectly callable functions use the
same number of prefix NOPs with -Zsanitizer=kcfi.
Unconditionally allow shadow call-stack sanitizer for AArch64
It is possible to do so whenever `-Z fixed-x18` is applied.
cc ``@Darksonn`` for context
The reasoning is that, as soon as reservation on `x18` is forced through the flag `fixed-x18`, on AArch64 the option to instrument with [Shadow Call Stack sanitizer](https://clang.llvm.org/docs/ShadowCallStack.html) is then applicable regardless of the target configuration.
At the every least, we would like to relax the restriction on specifically `aarch64-unknonw-none`. For this option, we can include a documentation change saying that users of compiled objects need to ensure that they are linked to runtime with Shadow Call Stack instrumentation support.
Related: #121972
Rework MIR inlining debuginfo so function parameters show up in debuggers.
Line numbers of multiply-inlined functions were fixed in #114643 by using a single DISubprogram. That, however, triggered assertions because parameters weren't deduplicated. The "solution" to that in #115417 was to insert a DILexicalScope below the DISubprogram and parent all of the parameters to that scope. That fixed the assertion, but debuggers (including gdb and lldb) don't recognize variables that are not parented to the subprogram itself as parameters, even if they are emitted with DW_TAG_formal_parameter.
Consider the program:
```rust
use std::env;
#[inline(always)]
fn square(n: i32) -> i32 {
n * n
}
#[inline(never)]
fn square_no_inline(n: i32) -> i32 {
n * n
}
fn main() {
let x = square(env::vars().count() as i32);
let y = square_no_inline(env::vars().count() as i32);
println!("{x} == {y}");
}
```
When making a release build with debug=2 and rustc 1.82.0-nightly (8b3870784 2024-08-07)
```
(gdb) r
Starting program: /ephemeral/tmp/target/release/tmp [Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, tmp::square () at src/main.rs:5
5 n * n
(gdb) info args
No arguments.
(gdb) info locals
n = 31
(gdb) c
Continuing.
Breakpoint 2, tmp::square_no_inline (n=31) at src/main.rs:10
10 n * n
(gdb) info args
n = 31
(gdb) info locals
No locals.
```
This issue is particularly annoying because it removes arguments from stack traces.
The DWARF for the inlined function looks like this:
```
< 2><0x00002132 GOFF=0x00002132> DW_TAG_subprogram
DW_AT_linkage_name _ZN3tmp6square17hc507052ff3d2a488E
DW_AT_name square
DW_AT_decl_file 0x0000000f /ephemeral/tmp/src/main.rs
DW_AT_decl_line 0x00000004
DW_AT_type 0x00001a56<.debug_info+0x00001a56>
DW_AT_inline DW_INL_inlined
< 3><0x00002142 GOFF=0x00002142> DW_TAG_lexical_block
< 4><0x00002143 GOFF=0x00002143> DW_TAG_formal_parameter
DW_AT_name n
DW_AT_decl_file 0x0000000f /ephemeral/tmp/src/main.rs
DW_AT_decl_line 0x00000004
DW_AT_type 0x00001a56<.debug_info+0x00001a56>
< 4><0x0000214e GOFF=0x0000214e> DW_TAG_null
< 3><0x0000214f GOFF=0x0000214f> DW_TAG_null
```
That DW_TAG_lexical_block inhibits every debugger I've tested from recognizing 'n' as a parameter.
This patch removes the additional lexical scope. Parameters can be easily deduplicated by a tuple of their scope and the argument index, at the trivial cost of taking a Hash + Eq bound on DIScope.
Line numbers of multiply-inlined functions were fixed in #114643 by using a
single DISubprogram. That, however, triggered assertions because parameters
weren't deduplicated. The "solution" to that in #115417 was to insert a
DILexicalScope below the DISubprogram and parent all of the parameters to that
scope. That fixed the assertion, but debuggers (including gdb and lldb) don't
recognize variables that are not parented to the subprogram itself as parameters,
even if they are emitted with DW_TAG_formal_parameter.
Consider the program:
use std::env;
fn square(n: i32) -> i32 {
n * n
}
fn square_no_inline(n: i32) -> i32 {
n * n
}
fn main() {
let x = square(env::vars().count() as i32);
let y = square_no_inline(env::vars().count() as i32);
println!("{x} == {y}");
}
When making a release build with debug=2 and rustc 1.82.0-nightly (8b3870784 2024-08-07)
(gdb) r
Starting program: /ephemeral/tmp/target/release/tmp
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, tmp::square () at src/main.rs:5
5 n * n
(gdb) info args
No arguments.
(gdb) info locals
n = 31
(gdb) c
Continuing.
Breakpoint 2, tmp::square_no_inline (n=31) at src/main.rs:10
10 n * n
(gdb) info args
n = 31
(gdb) info locals
No locals.
This issue is particularly annoying because it removes arguments from stack traces.
The DWARF for the inlined function looks like this:
< 2><0x00002132 GOFF=0x00002132> DW_TAG_subprogram
DW_AT_linkage_name _ZN3tmp6square17hc507052ff3d2a488E
DW_AT_name square
DW_AT_decl_file 0x0000000f /ephemeral/tmp/src/main.rs
DW_AT_decl_line 0x00000004
DW_AT_type 0x00001a56<.debug_info+0x00001a56>
DW_AT_inline DW_INL_inlined
< 3><0x00002142 GOFF=0x00002142> DW_TAG_lexical_block
< 4><0x00002143 GOFF=0x00002143> DW_TAG_formal_parameter
DW_AT_name n
DW_AT_decl_file 0x0000000f /ephemeral/tmp/src/main.rs
DW_AT_decl_line 0x00000004
DW_AT_type 0x00001a56<.debug_info+0x00001a56>
< 4><0x0000214e GOFF=0x0000214e> DW_TAG_null
< 3><0x0000214f GOFF=0x0000214f> DW_TAG_null
That DW_TAG_lexical_block inhibits every debugger I've tested from recognizing
'n' as a parameter.
This patch removes the additional lexical scope. Parameters can be easily
deduplicated by a tuple of their scope and the argument index, at the trivial
cost of taking a Hash + Eq bound on DIScope.
const vector passed through to codegen
This allows constant vectors using a repr(simd) type to be propagated
through to the backend by reusing the functionality used to do a similar
thing for the simd_shuffle intrinsic
#118209
r? RalfJung
nontemporal_store: make sure that the intrinsic is truly just a hint
The `!nontemporal` flag for stores in LLVM *sounds* like it is just a hint, but actually, it is not -- at least on x86, non-temporal stores need very special treatment by the programmer or else the Rust memory model breaks down. LLVM still treats these stores as-if they were normal stores for optimizations, which is [highly dubious](https://github.com/llvm/llvm-project/issues/64521). Let's avoid all that dubiousness by making our own non-temporal stores be truly just a hint, which is possible on some targets (e.g. ARM). On all other targets, non-temporal stores become regular stores.
~~Blocked on https://github.com/rust-lang/stdarch/pull/1541 propagating to the rustc repo, to make sure the `_mm_stream` intrinsics are unaffected by this change.~~
Fixes https://github.com/rust-lang/rust/issues/114582
Cc `@Amanieu` `@workingjubilee`
fix: #128855 Ensure `Guard`'s `drop` method is removed at `opt-level=s` for `…
fix: #128855
…Copy` types
Added `#[inline]` to the `drop` method in the `Guard` implementation to ensure that the method is removed by the compiler at optimization level `opt-level=s` for `Copy` types. This change aims to align the method's behavior with optimization expectations and ensure it does not affect performance.
r? `@scottmcm`
This commit adds a new test file 'array-from_fn.rs' to the codegen test suite.
The test checks the behavior of std::array::from_fn under different optimization levels:
1. At opt-level=0 (debug build), it verifies that the core::array::Guard
is present in the generated code.
2. At opt-level=s (size optimization), it ensures that the Guard is
optimized out.
This test helps ensure that the compiler correctly optimizes array::from_fn
calls in release builds while maintaining safety checks in debug builds.
Rollup of 8 pull requests
Successful merges:
- #128221 (Add implied target features to target_feature attribute)
- #128261 (impl `Default` for collection iterators that don't already have it)
- #128353 (Change generate-copyright to generate HTML, with cargo dependencies included)
- #128679 (codegen: better centralize function declaration attribute computation)
- #128732 (make `import.vis` is immutable)
- #128755 (Integrate crlf directly into related test file instead via of .gitattributes)
- #128772 (rustc_codegen_ssa: Set architecture for object crate for 32-bit SPARC)
- #128782 (unused_parens: do not lint against parens around &raw)
r? `@ghost`
`@rustbot` modify labels: rollup
Rollup of 4 pull requests
Successful merges:
- #127574 (elaborate unknowable goals)
- #128141 (Set branch protection function attributes)
- #128315 (Fix vita build of std and forbid unsafe in unsafe in the os/vita module)
- #128339 ([rustdoc] Make the buttons remain when code example is clicked)
r? `@ghost`
`@rustbot` modify labels: rollup
Add `select_unpredictable` to force LLVM to use CMOV
Since https://reviews.llvm.org/D118118, LLVM will no longer turn CMOVs into branches if it comes from a `select` marked with an `unpredictable` metadata attribute.
This PR introduces `core::intrinsics::select_unpredictable` which emits such a `select` and uses it in the implementation of `binary_search_by`.
Set branch protection function attributes
Since LLVM 19, it is necessary to set not only module flags, but also function attributes for branch protection on aarch64. See e15d67cfc2 for the relevant LLVM change.
Fixes https://github.com/rust-lang/rust/issues/127829.
Since https://reviews.llvm.org/D118118, LLVM will no longer turn CMOVs
into branches if it comes from a `select` marked with an `unpredictable`
metadata attribute.
This PR introduces `core::intrinsics::select_unpredictable` which emits
such a `select` and uses it in the implementation of `binary_search_by`.
Delete `SimplifyArmIdentity` and `SimplifyBranchSame` tests
These two passes have already been deleted in #107256. I'm not sure why tidy didn't catch it.
As regression tests, I didn't delete `tests/ui/mir/issue-66851.rs` and `tests/ui/mir/simplify-branch-same.rs`.
r? compiler
Since LLVM 19, it is necessary to set not only module flags, but
also function attributes for branch protection on aarch64. See
e15d67cfc2
for the relevant LLVM change.
reenable some windows tests
Locally passing on `x86_64-pc-windows-msvc`, fingers crossed for `*-pc-windows-gnu`.
try-job: x86_64-msvc
try-job: x86_64-mingw
Ensure floats are returned losslessly by the Rust ABI on 32-bit x86
Solves #115567 for the (default) `"Rust"` ABI. When compiling for 32-bit x86, this PR changes the `"Rust"` ABI to return floats indirectly instead of in x87 registers (with the exception of single `f32`s, which this PR returns in general purpose registers as they are small enough to fit in one). No change is made to the `"C"` ABI as that ABI requires x87 register usage and therefore will need a different solution.
`-Z patchable-function-entry` works like `-fpatchable-function-entry`
on clang/gcc. The arguments are total nop count and function offset.
See MCP rust-lang/compiler-team#704
Account for things that optimize out in inlining costs
This updates the MIR inlining `CostChecker` to have both bonuses and penalties, rather than just penalties.
That lets us add bonuses for some things where we want to encourage inlining without risking wrapping into a gigantic cost. For example, `switchInt(const …)` we give an inlining bonus because codegen will actually eliminate the branch (and associated dead blocks) once it's monomorphized, so measuring both sides of the branch gives an unrealistically-high cost to it. Similarly, an `unreachable` terminator gets a small bonus, because whatever branch leads there doesn't actually exist post-codegen.
Stabilise `c_unwind`
Fix#74990Fix#115285 (that's also where FCP is happening)
Marking as draft PR for now due to `compiler_builtins` issues
r? `@Amanieu`
Don't build a broken/untested profiler runtime on mingw targets
Context: https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Why.20build.20a.20broken.2Funtested.20profiler.20runtime.20on.20mingw.3F#75872 added `--enable-profiler` to the `x86_64-mingw` job (to cause some additional tests to run), but had to also add `//@ ignore-windows-gnu` to all of the tests that rely on the profiler runtime actually *working*, because it's broken on that target.
We can achieve a similar outcome by going through all the `//@ needs-profiler-support` tests that don't actually need to produce/run a binary, and making them use `-Zno-profiler-runtime` instead, so that they can run even in configurations that don't have the profiler runtime available. Then we can remove `--enable-profiler` from `x86_64-mingw`, and still get the same amount of testing.
This PR also removes `--enable-profiler` from the mingw dist builds, since it is broken/untested on that target. Those builds have had that flag for a very long time.
For PGO/coverage tests that don't need to build or run an actual artifact, we
can use `-Zno-profiler-runtime` to run the test even when the profiler runtime
is not available.
Rollup of 16 pull requests
Successful merges:
- #123374 (DOC: Add FFI example for slice::from_raw_parts())
- #124514 (Recommend to never display zero disambiguators when demangling v0 symbols)
- #125978 (Cleanup: HIR ty lowering: Consolidate the places that do assoc item probing & access checking)
- #125980 (Nvptx remove direct passmode)
- #126187 (For E0277 suggest adding `Result` return type for function when using QuestionMark `?` in the body.)
- #126210 (docs(core): make more const_ptr doctests assert instead of printing)
- #126249 (Simplify `[T; N]::try_map` signature)
- #126256 (Add {{target}} substitution to compiletest)
- #126263 (Make issue-122805.rs big endian compatible)
- #126281 (set_env: State the conclusion upfront)
- #126286 (Make `storage-live.rs` robust against rustc internal changes.)
- #126287 (Update a cranelift patch file for formatting changes.)
- #126301 (Use `tidy` to sort crate attributes for all compiler crates.)
- #126305 (Make PathBuf less Ok with adding UTF-16 then `into_string`)
- #126310 (Migrate run make prefer rlib)
- #126314 (fix RELEASES: we do not support upcasting to auto traits)
r? `@ghost`
`@rustbot` modify labels: rollup
Instead of not generating the function at all on big endian (which
makes the CHECK lines fail), instead use to_le() on big endian,
so that we essentially perform a bswap for both endiannesses.
Remove hard-coded hashes from codegen tests
This removes hard-coded hashes from the codegen and assembly tests. These use FileCheck, which supports eliding part of the pattern being matched, including by capturing it as a pattern parameter for later matching-on. This is much more appropriate than asking contributors to engage with deliberately-opaque identifier schemes.
In order to reduce the likelihood of error, every hash-coded segment I've touched now expects a certain length. This correctly represents these cases, as our hash outputs have a predetermined amount of entropy attached to them.
This is not done for the UI test suite as those are comparatively easy to simply `--bless`, whereas that would be inappropriate for codegen tests. It is also not done for debuginfo tests as those tests do not support such elision in a correct and useful way.
Fix tests/codegen/riscv-abi/call-llvm-intrinsics.rs
Fix tests/codegen/riscv-abi/riscv64-lp64d-abi.rs
Fix tests/codegen/riscv-abi/riscv64-lp64f-lp64d-abi.rs
On riscv64gc ignore tests/ui/debuginfo/debuginfo-emit-llvm-ir-and-split-debuginfo.rs
Make tests/codegen/riscv-abi/riscv64-lp64d-abi.rs no_core
Make tests/codegen/riscv-abi/riscv64-lp64f-lp64d-abi.rs no_core
Set -O for tests/codegen/riscv-abi/riscv64-lp64d-abi.rs
Set -O for tests/codegen/riscv-abi/riscv64-lp64f-lp64d-abi.rs
When things like our internal hashing or representations change,
it is inappropriate for these tests to suddenly fail for no reason.
The chance of error is reduced if we instead pattern-match.
Make repr(packed) vectors work with SIMD intrinsics
In #117116 I fixed `#[repr(packed, simd)]` by doing the expected thing and removing padding from the layout. This should be the last step in providing a solution to rust-lang/portable-simd#319
Unroll first iteration of checked_ilog loop
This follows the optimization of #115913. As shown in https://github.com/rust-lang/rust/pull/115913#issuecomment-2066788006, the performance was improved in all important cases, but some regressions were introduced for the benchmarks `u32_log_random_small`, `u8_log_random` and `u8_log_random_small`.
Basically, #115913 changed the implementation from one division per iteration to one multiplication per iteration plus one division. When there are zero iterations, this is a regression from zero divisions to one division.
This PR avoids this by avoiding the division if we need zero iterations by returning `Some(0)` early. It also reduces the number of multiplications by one in all other cases.
Except for `simd-intrinsic/`, which has a lot of files containing
multiple types like `u8x64` which really are better when hand-formatted.
There is a surprising amount of two-space indenting in this directory.
Non-trivial changes:
- `rustfmt::skip` needed in `debug-column.rs` to preserve meaning of the
test.
- `rustfmt::skip` used in a few places where hand-formatting read more
nicely: `enum/enum-match.rs`
- Line number adjustments needed for the expected output of
`debug-column.rs` and `coroutine-debug.rs`.
Make more of the test suite run on Mac Catalyst
Combined with https://github.com/rust-lang/rust/pull/125225, the only failing parts of the test suite are in `tests/rustdoc-js`, `tests/rustdoc-js-std` and `tests/debuginfo`. Tested with:
```console
./x test --target=aarch64-apple-ios-macabi library/std
./x test --target=aarch64-apple-ios-macabi --skip=tests/rustdoc-js --skip=tests/rustdoc-js-std --skip=tests/debuginfo tests
```
Will probably put up a PR later to enable _running_ on (not just compiling for) Mac Catalyst in CI, though not sure where exactly I should do so? `src/ci/github-actions/jobs.yml`?
Note that I've deliberately _not_ enabled stack overflow handlers on iOS/tvOS/watchOS/visionOS (see https://github.com/rust-lang/rust/issues/25872), but rather just skipped those tests, as it uses quite a few APIs that I'd be weary about getting rejected by the App Store (note that Swift doesn't do it on those platforms either).
r? ``@workingjubilee``
CC ``@thomcc``
``@rustbot`` label O-ios O-apple
Omit non-needs_drop drop_in_place in vtables
This replaces the drop_in_place reference with null in vtables. On librustc_driver.so, this drops about ~17k (11%) dynamic relocations from the output, since many vtables can now be placed in read-only memory, rather than having a relocated pointer included.
This makes a tradeoff by adding a null check at vtable call sites. I'm not sure that's readily avoidable without changing the vtable format (e.g., so that we can use a pc-relative relocation instead of an absolute address, and avoid the dynamic relocation that way). But it seems likely that the check is cheap at runtime.
Accepted MCP: https://github.com/rust-lang/compiler-team/issues/730
This adds the `only-apple`/`ignore-apple` compiletest directive, and
uses that basically everywhere instead of `only-macos`/`ignore-macos`.
Some of the updates in `run-make` are a bit redundant, as they use
`ignore-cross-compile` and won't run on iOS - but using Apple in these
is still more correct, so I've made that change anyhow.
This replaces the drop_in_place reference with null in vtables. On
librustc_driver.so, this drops about ~17k dynamic relocations from the
output, since many vtables can now be placed in read-only memory, rather
than having a relocated pointer included.
This makes a tradeoff by adding a null check at vtable call sites.
That's hard to avoid without changing the vtable format (e.g., to use a
pc-relative relocation instead of an absolute address, and avoid the
dynamic relocation that way). But it seems likely that the check is
cheap at runtime.
These types are currently rejected for `as` casts by the compiler.
Remove this incorrect check and add codegen tests for all conversions
involving these types.
https://github.com/llvm/llvm-project/pull/89799 changes llvm.dbg.value/declare intrinsics to be in a different, out-of-instruction-line representation. For example
call void @llvm.dbg.declare(...)
becomes
#dbg_declare(...)
Update tests accordingly to work with both the old and new way.
There are a few tests that depend on some target features **not** being
enabled by default, and usually they are correct with the default x86-64
target CPU. However, in downstream builds we have modified the default
to fit our distros -- `x86-64-v2` in RHEL 9 and `x86-64-v3` in RHEL 10
-- and the latter especially trips tests that expect not to have AVX.
These cases are few enough that we can just set them back explicitly.
codegen tests: Tolerate `range()` qualifications in enum tests
Current LLVM can infer range bounds on the i8s involved with these tests, and annotates it. Accept these bounds if present.
`@rustbot` label: +llvm-main
cc `@durin42`
Set writable and dead_on_unwind attributes for sret arguments
Set the `writable` and `dead_on_unwind` attributes for `sret` arguments. This allows call slot optimization to remove more memcpy's.
See https://llvm.org/docs/LangRef.html#parameter-attributes for the specification of these attributes. In short, the statement we're making here is that:
* The return slot is writable.
* The return slot will not be read if the function unwinds.
Fixes https://github.com/rust-lang/rust/issues/90595.
Stop using LLVM struct types for alloca
The alloca type has no semantic meaning, only the size (and alignment, but we specify it explicitly) matter. Using `[N x i8]` is a more direct way to specify that we want `N` bytes, and avoids relying on LLVM's struct layout. It is likely that a future LLVM version will change to an untyped alloca representation.
Split out from #121577.
r? `@ghost`
Dellvmize some intrinsics (use `u32` instead of `Self` in some integer intrinsics)
This implements https://github.com/rust-lang/compiler-team/issues/693 minus what was implemented in #123226.
Note: I decided to _not_ change `shl`/... builder methods, as it just doesn't seem worth it.
r? ``@scottmcm``
For things with easily pre-checked overflow conditions -- shifts and unsigned subtraction -- write then checked methods in such a way that we stop emitting wrapping versions of them.
For example, today <https://rust.godbolt.org/z/qM9YK8Txb> neither
```rust
a.checked_sub(b).unwrap()
```
nor
```rust
a.checked_sub(b).unwrap_unchecked()
```
actually optimizes to `sub nuw`. After this PR they do.
Add the missing inttoptr when we ptrtoint in ptr atomics
Ralf noticed this here: https://github.com/rust-lang/rust/pull/122220#discussion_r1535172094
Our previous codegen forgot to add the cast back to integer type. The code compiles anyway, because of course all locals are in-memory to start with, so previous codegen would do the integer atomic, store the integer to a local, then load a pointer from that local. Which is definitely _not_ what we wanted: That's an integer-to-pointer transmute, so all pointers returned by these `AtomicPtr` methods didn't have provenance. Yikes.
Here's the IR for `AtomicPtr::fetch_byte_add` on 1.76: https://godbolt.org/z/8qTEjeraY
```llvm
define noundef ptr `@atomicptr_fetch_byte_add(ptr` noundef nonnull align 8 %a, i64 noundef %v) unnamed_addr #0 !dbg !7 {
start:
%0 = alloca ptr, align 8, !dbg !12
%val = inttoptr i64 %v to ptr, !dbg !12
call void `@llvm.lifetime.start.p0(i64` 8, ptr %0), !dbg !28
%1 = ptrtoint ptr %val to i64, !dbg !28
%2 = atomicrmw add ptr %a, i64 %1 monotonic, align 8, !dbg !28
store i64 %2, ptr %0, align 8, !dbg !28
%self = load ptr, ptr %0, align 8, !dbg !28
call void `@llvm.lifetime.end.p0(i64` 8, ptr %0), !dbg !28
ret ptr %self, !dbg !33
}
```
r? `@RalfJung`
cc `@nikic`
llvm/llvm-project#87910 infers `nuw` and `nsw` on some `trunc`
instructions we're doing `FileCheck` on. Tolerate but don't require them
to support both release and head LLVM.
`f16` and `f128` step 4: basic library support
This is the next step after https://github.com/rust-lang/rust/pull/121926, another portion of https://github.com/rust-lang/rust/pull/114607
Tracking issue: https://github.com/rust-lang/rust/issues/116909
This PR adds the most basic operations to `f16` and `f128` that get lowered as LLVM intrinsics. This is a very small step but it seemed reasonable enough to add unopinionated basic operations before the larger modules that are built on top of them.
r? ```@Amanieu``` since you were pretty involved in the RFC
cc ```@compiler-errors```
```@rustbot``` label +T-libs-api +S-blocked +F-f16_and_f128
I added this back in 111999, but I no longer think it's a good idea
- It had to get scaled back to only power-of-two things to not break a bunch of targets
- LLVM seems to be getting better at memcpy removal anyway
- Introducing vector instructions has seemed to sometimes (115515) make autovectorization worse
So this removes it from the codegen crates entirely, and instead just tries to use <https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/traits/builder/trait.BuilderMethods.html#method.typed_place_copy> instead of direct `memcpy` so things will still use load/store for immediates.
Re-enable the early otherwise branch optimization
Closes#95162. Fixes#119014.
This is the first part of #121397.
An invalid enum discriminant can come from anywhere. We have to check to see if all successors contain the discriminant statement. This should have a pass to hoist instructions.
r? cjgillot
Don't emit divide-by-zero panic paths in `StepBy::len`
I happened to notice today that there's actually two such calls emitted in the assembly: <https://rust.godbolt.org/z/1Wbbd3Ts6>
Since they're impossible, hopefully telling LLVM that will also help optimizations elsewhere.
Fix argument ABI for overaligned structs on ppc64le
When passing a 16 (or higher) aligned struct by value on ppc64le, it needs to be passed as an array of `i128` rather than an array of `i64`. This will force the use of an even starting doubleword.
For the case of a 16 byte struct with alignment 16 it is important that `[1 x i128]` is used instead of `i128` -- apparently, the latter will get treated similarly to `[2 x i64]`, not exhibiting the correct ABI. Add a `force_array` flag to `Uniform` to support this.
The relevant clang code can be found here:
fe2119a7b0/clang/lib/CodeGen/Targets/PPC.cpp (L878-L884)fe2119a7b0/clang/lib/CodeGen/Targets/PPC.cpp (L780-L784)
I think the corresponding psABI wording is this:
> Fixed size aggregates and unions passed by value are mapped to as
> many doublewords of the parameter save area as the value uses in
> memory. Aggregrates and unions are aligned according to their
> alignment requirements. This may result in doublewords being
> skipped for alignment.
In particular the last sentence. Though I didn't find any wording for Clang's behavior of clamping the alignment to 16.
Fixes https://github.com/rust-lang/rust/issues/122767.
r? `@cuviper`
Implement minimal, internal-only pattern types in the type system
rebase of https://github.com/rust-lang/rust/pull/107606
You can create pattern types with `std::pat::pattern_type!(ty is pat)`. The feature is incomplete and will panic on you if you use any pattern other than integral range patterns. The only way to create or deconstruct a pattern type is via `transmute`.
This PR's implementation differs from the MCP's text. Specifically
> This means you could implement different traits for different pattern types with the same base type. Thus, we just forbid implementing any traits for pattern types.
is violated in this PR. The reason is that we do need impls after all in order to make them usable as fields. constants of type `std::time::Nanoseconds` struct are used in patterns, so the type must be structural-eq, which it only can be if you derive several traits on it. It doesn't need to be structural-eq recursively, so we can just manually implement the relevant traits on the pattern type and use the pattern type as a private field.
Waiting on:
* [x] move all unrelated commits into their own PRs.
* [x] fix niche computation (see 2db07f94f44f078daffe5823680d07d4fded883f)
* [x] add lots more tests
* [x] T-types MCP https://github.com/rust-lang/types-team/issues/126 to finish
* [x] some commit cleanup
* [x] full self-review
* [x] remove 61bd325da19a918cc3e02bbbdce97281a389c648, it's not necessary anymore I think.
* [ ] ~~make sure we never accidentally leak pattern types to user code (add stability checks or feature gate checks and appopriate tests)~~ we don't even do this for the new float primitives
* [x] get approval that [the scope expansion to trait impls](https://rust-lang.zulipchat.com/#narrow/stream/326866-t-types.2Fnominated/topic/Pattern.20types.20types-team.23126/near/427670099) is ok
r? `@BoxyUwU`
When passing a 16 (or higher) aligned struct by value on ppc64le,
it needs to be passed as an array of `i128` rather than an array
of `i64`. This will force the use of an even starting register.
For the case of a 16 byte struct with alignment 16 it is important
that `[1 x i128]` is used instead of `i128` -- apparently, the
latter will get treated similarly to `[2 x i64]`, not exhibiting
the correct ABI. Add a `force_array` flag to `Uniform` to support
this.
The relevant clang code can be found here:
fe2119a7b0/clang/lib/CodeGen/Targets/PPC.cpp (L878-L884)fe2119a7b0/clang/lib/CodeGen/Targets/PPC.cpp (L780-L784)
I think the corresponding psABI wording is this:
> Fixed size aggregates and unions passed by value are mapped to as
> many doublewords of the parameter save area as the value uses in
> memory. Aggregrates and unions are aligned according to their
> alignment requirements. This may result in doublewords being
> skipped for alignment.
In particular the last sentence.
Fixes https://github.com/rust-lang/rust/issues/122767.
Use unchecked_sub in str indexing
https://github.com/rust-lang/rust/pull/108763 applied this logic to indexing for slices, but of course `str` has its own separate impl.
Found this by skimming over the codegen for https://github.com/oxidecomputer/hubris/; their dist builds enable overflow checks so the lack of `unchecked_sub` was producing an impossible-to-hit overflow check and also inhibiting some inlining.
r? scottmcm
I happened to notice today that there's actually two such calls emitted in the assembly: <https://rust.godbolt.org/z/1Wbbd3Ts6>
Since they're impossible, hopefully telling LLVM that will also help optimizations elsewhere.
CFI: Don't rewrite ty::Dynamic directly
Now that we're using a type folder, the arguments in predicates are processed automatically - we don't need to descend manually.
We also want to keep projection clauses around, and this does so.
r? `@compiler-errors`
Now that we're using a type folder, the arguments in predicates are
processed automatically - we don't need to descend manually.
We also want to keep projection clauses around, and this does so.
CFI: Restore typeid_for_instance default behavior
Restore typeid_for_instance default behavior of performing self type erasure, since it's the most common case and what it does most of the time. Using concrete self (or not performing self type erasure) is for assigning a secondary type id, and secondary type ids are only assigned when they're unique and to methods, and also are only tested for when methods are used as function pointers.
Restore typeid_for_instance default behavior of performing self type
erasure, since it's the most common case and what it does most of the
time. Using concrete self (or not performing self type erasure) is for
assigning a secondary type id, and secondary type ids are only assigned
when they're unique and to methods, and also are only tested for when
methods are used as function pointers.
CFI: Support function pointers for trait methods
Adds support for both CFI and KCFI for function pointers to trait methods by attaching both concrete and abstract types to functions.
KCFI does this through generation of a `ReifyShim` on any function pointer for a method that could go into a vtable, and keeping this separate from `ReifyShim`s that are *intended* for vtable us by setting a `ReifyReason` on them.
CFI does this by setting both the concrete and abstract type on every instance.
This should land after #123024 or a similar PR, as it diverges the implementation of CFI vs KCFI.
r? `@compiler-errors`
Rename `UninhabitedEnumBranching` to `UnreachableEnumBranching`
Per [#120268](https://github.com/rust-lang/rust/pull/120268#discussion_r1517492060), I rename `UninhabitedEnumBranching` to `UnreachableEnumBranching` .
I solved some nits to add some comments.
I adjusted the workaround restrictions. This should be useful for `a <= b` and `if let Some/Ok(v)`. For enum with few variants, `early-tailduplication` should not cause compile time overhead.
r? RalfJung
Add `Ord::cmp` for primitives as a `BinOp` in MIR
Update: most of this OP was written months ago. See https://github.com/rust-lang/rust/pull/118310#issuecomment-2016940014 below for where we got to recently that made it ready for review.
---
There are dozens of reasonable ways to implement `Ord::cmp` for integers using comparison, bit-ops, and branches. Those differences are irrelevant at the rust level, however, so we can make things better by adding `BinOp::Cmp` at the MIR level:
1. Exactly how to implement it is left up to the backends, so LLVM can use whatever pattern its optimizer best recognizes and cranelift can use whichever pattern codegens the fastest.
2. By not inlining those details for every use of `cmp`, we drastically reduce the amount of MIR generated for `derive`d `PartialOrd`, while also making it more amenable to MIR-level optimizations.
Having extremely careful `if` ordering to μoptimize resource usage on broadwell (#63767) is great, but it really feels to me like libcore is the wrong place to put that logic. Similarly, using subtraction [tricks](https://graphics.stanford.edu/~seander/bithacks.html#CopyIntegerSign) (#105840) is arguably even nicer, but depends on the optimizer understanding it (https://github.com/llvm/llvm-project/issues/73417) to be practical. Or maybe [bitor is better than add](https://discourse.llvm.org/t/representing-in-ir/67369/2?u=scottmcm)? But maybe only on a future version that [has `or disjoint` support](https://discourse.llvm.org/t/rfc-add-or-disjoint-flag/75036?u=scottmcm)? And just because one of those forms happens to be good for LLVM, there's no guarantee that it'd be the same form that GCC or Cranelift would rather see -- especially given their very different optimizers. Not to mention that if LLVM gets a spaceship intrinsic -- [which it should](https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Suboptimal.20inlining.20in.20std.20function.20.60binary_search.60/near/404250586) -- we'll need at least a rustc intrinsic to be able to call it.
As for simplifying it in Rust, we now regularly inline `{integer}::partial_cmp`, but it's quite a large amount of IR. The best way to see that is with 8811efa88b (diff-d134c32d028fbe2bf835fef2df9aca9d13332dd82284ff21ee7ebf717bfa4765R113) -- I added a new pre-codegen MIR test for a simple 3-tuple struct, and this PR change it from 36 locals and 26 basic blocks down to 24 locals and 8 basic blocks. Even better, as soon as the construct-`Some`-then-match-it-in-same-BB noise is cleaned up, this'll expose the `Cmp == 0` branches clearly in MIR, so that an InstCombine (#105808) can simplify that to just a `BinOp::Eq` and thus fix some of our generated code perf issues. (Tracking that through today's `if a < b { Less } else if a == b { Equal } else { Greater }` would be *much* harder.)
---
r? `@ghost`
But first I should check that perf is ok with this
~~...and my true nemesis, tidy.~~
This is just one part of the MCP, but it's the one that IMHO removes the most noise from the standard library code.
Seems net simpler this way, since MIR already supported heterogeneous shifts anyway, and thus it's not more work for backends than before.
Remove len argument from RawVec::reserve_for_push
Removes `RawVec::reserve_for_push`'s `len` argument since it's always the same as capacity.
Also makes `Vec::insert` use `RawVec::reserve_for_push`.
CFI: Fix methods as function pointer cast
Fix casting between methods and function pointers by assigning a secondary type id to methods with their concrete self so they can be used as function pointers.
This was split off from #116404.
cc `@compiler-errors` `@workingjubilee`
Codegen const panic messages as function calls
This skips emitting extra arguments at every callsite (of which there
can be many). For a librustc_driver build with overflow checks enabled,
this cuts 0.7MB from the resulting shared library (see [perf]).
A sample improvement from nightly:
```
leaq str.0(%rip), %rdi
leaq .Lalloc_d6aeb8e2aa19de39a7f0e861c998af13(%rip), %rdx
movl $25, %esi
callq *_ZN4core9panicking5panic17h17cabb89c5bcc999E@GOTPCREL(%rip)
```
to this PR:
```
leaq .Lalloc_d6aeb8e2aa19de39a7f0e861c998af13(%rip), %rdi
callq *_RNvNtNtCsduqIKoij8JB_4core9panicking11panic_const23panic_const_div_by_zero@GOTPCREL(%rip)
```
[perf]: https://perf.rust-lang.org/compare.html?start=a7e4de13c1785819f4d61da41f6704ed69d5f203&end=64fbb4f0b2d621ff46d559d1e9f5ad89a8d7789b&stat=instructions:u
Fix casting between methods and function pointers by assigning a
secondary type id to methods with their concrete self so they can be
used as function pointers.
CFI: Fix drop and drop_in_place
Fix drop and drop_in_place by transforming self of drop and drop_in_place methods into a Drop trait objects.
This was split off from https://github.com/rust-lang/rust/pull/116404.
cc `@compiler-errors` `@workingjubilee`
CFI: Support self_cell-like recursion
Current `transform_ty` attempts to avoid cycles when normalizing `#[repr(transparent)]` types to their interior, but runs afoul of this pattern used in `self_cell`:
```
struct X<T> {
x: u8,
p: PhantomData<T>,
}
#[repr(transparent)]
struct Y(X<Y>);
```
When attempting to normalize Y, it will still cycle indefinitely. By using a types-visited list, this will instead get expanded exactly one layer deep to X<Y>, and then stop, not attempting to normalize `Y` any further.
This PR was split off from #121962 as part of fixing the larger vtable compatibility issues.
r? ``````@workingjubilee``````
Let codegen decide when to `mem::swap` with immediates
Making `libcore` decide this is silly; the backend has so much better information about when it's a good idea.
Thus this PR introduces a new `typed_swap` intrinsic with a fallback body, and replaces that fallback implementation when swapping immediates or scalar pairs.
r? oli-obk
Replaces #111744, and means we'll never need more libs PRs like #111803 or #107140
Current `transform_ty` attempts to avoid cycles when normalizing
`#[repr(transparent)]` types to their interior, but runs afoul of this
pattern used in `self_cell`:
```
struct X<T> {
x: u8,
p: PhantomData<T>,
}
#[repr(transparent)]
struct Y(X<Y>);
```
When attempting to normalize Y, it will still cycle indefinitely. By
using a types-visited list, this will instead get expanded exactly
one layer deep to X<Y>, and then stop, not attempting to normalize `Y`
any further.
This skips emitting extra arguments at every callsite (of which there
can be many). For a librustc_driver build with overflow checks enabled,
this cuts 0.7MB from the resulting binary.