Add various aarch64 features already supported by LLVM and Linux.
The features are marked as unstable using a newly added symbol, i.e.
aarch64_unstable_target_feature.
Additionally include some comment fixes to ensure consistency of
feature names with the Arm ARM and support for architecture version
target features up to v9.5a.
This commit adds compiler support for the following features:
- FEAT_CSSC
- FEAT_ECV
- FEAT_FAMINMAX
- FEAT_FLAGM2
- FEAT_FP8
- FEAT_FP8DOT2
- FEAT_FP8DOT4
- FEAT_FP8FMA
- FEAT_FPMR
- FEAT_HBC
- FEAT_LSE128
- FEAT_LSE2
- FEAT_LUT
- FEAT_MOPS
- FEAT_LRCPC3
- FEAT_SVE_B16B16
- FEAT_SVE2p1
- FEAT_WFxT
Add `f16` and `f128` inline ASM support for `aarch64`
Adds `f16` and `f128` inline ASM support for `aarch64`. SIMD vector types are taken from [the ARM intrinsics list](https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:`@navigationhierarchiesreturnbasetype=[float]&f:@navigationhierarchieselementbitsize=[16]&f:@navigationhierarchiesarchitectures=[A64]).` Based on the work of `@lengrongfu` in #127043.
Relevant issue: #125398
Tracking issue: #116909
`@rustbot` label +F-f16_and_f128
try-job: aarch64-gnu
try-job: aarch64-apple
simd_shuffle intrinsic: allow argument to be passed as vector
See https://github.com/rust-lang/rust/issues/128738 for context.
I'd like to get rid of [this hack](6c0b89dfac/compiler/rustc_codegen_ssa/src/mir/block.rs (L922-L935)). https://github.com/rust-lang/rust/pull/128537 almost lets us do that since constant SIMD vectors will then be passed as immediate arguments. However, simd_shuffle for some reason actually takes an *array* as argument, not a vector, so the hack is still required to ensure that the array becomes an immediate (which then later stages of codegen convert into a vector, as that's what LLVM needs).
This PR prepares simd_shuffle to also support a vector as the `idx` argument. Once this lands, stdarch can hopefully be updated to pass `idx` as a vector, and then support for arrays can be removed, which finally lets us get rid of that hack.
Add `#[warn(unreachable_pub)]` to a bunch of compiler crates
By default `unreachable_pub` identifies things that need not be `pub` and tells you to make them `pub(crate)`. But sometimes those things don't need any kind of visibility. So they way I did these was to remove the visibility entirely for each thing the lint identifies, and then add `pub(crate)` back in everywhere the compiler said it was necessary. (Or occasionally `pub(super)` when context suggested that was appropriate.) Tedious, but results in more `pub` removal.
There are plenty more crates to do but this seems like enough for a first PR.
r? `@compiler-errors`
Set the cfi-normalize-integers and kcfi-offset module flags when
Control-Flow Integrity sanitizers are used, so functions generated by
the LLVM backend use the same CFI/KCFI options as rustc.
cfi-normalize-integers tells LLVM to also use integer normalization
for generated functions when -Zsanitizer-cfi-normalize-integers is
used.
kcfi-offset specifies the number of prefix nops between the KCFI
type hash and the function entry when -Z patchable-function-entry is
used. Note that LLVM assumes all indirectly callable functions use the
same number of prefix NOPs with -Zsanitizer=kcfi.
Avoid extra `cast()`s after `CStr::as_ptr()`
These used to be `&str` literals that did need a pointer cast, but that
became a no-op after switching to `c""` literals in #118566.
Special case DUMMY_SP to emit line 0/column 0 locations on DWARF platforms.
Line 0 has a special meaning in DWARF. From the version 5 spec:
The compiler may emit the value 0 in cases
where an instruction cannot be attributed to any
source line.
DUMMY_SP spans cannot be attributed to any line. However, because rustc internally stores line numbers starting at zero, lookup_debug_loc() adjusts every line number by one. Special casing DUMMY_SP to actually emit line 0 ensures rustc communicates to the debugger that there's no meaningful source code for this instruction, rather than telling the debugger to jump to line 1 randomly.
With the new resolver, a few dependencies get brought in twice with
different licenses. For example, all dependencies from `wasm-tools`
gained Apache-2.0 and MIT options, and with the v2 resolver we were
using one version from before and one version from after this change.
This made tidy's license check difficult.
Update some minimum versions to remove duplicate dependencies and smooth
out license checking.
Fix `is_val_statically_known` for floats
The LLVM intrinsic name for floats differs from the LLVM type name, so handle them explicitly. Also adds support for `f16` and `f128`.
`f16`/`f128` tracking issue: #116909
Rework MIR inlining debuginfo so function parameters show up in debuggers.
Line numbers of multiply-inlined functions were fixed in #114643 by using a single DISubprogram. That, however, triggered assertions because parameters weren't deduplicated. The "solution" to that in #115417 was to insert a DILexicalScope below the DISubprogram and parent all of the parameters to that scope. That fixed the assertion, but debuggers (including gdb and lldb) don't recognize variables that are not parented to the subprogram itself as parameters, even if they are emitted with DW_TAG_formal_parameter.
Consider the program:
```rust
use std::env;
#[inline(always)]
fn square(n: i32) -> i32 {
n * n
}
#[inline(never)]
fn square_no_inline(n: i32) -> i32 {
n * n
}
fn main() {
let x = square(env::vars().count() as i32);
let y = square_no_inline(env::vars().count() as i32);
println!("{x} == {y}");
}
```
When making a release build with debug=2 and rustc 1.82.0-nightly (8b3870784 2024-08-07)
```
(gdb) r
Starting program: /ephemeral/tmp/target/release/tmp [Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, tmp::square () at src/main.rs:5
5 n * n
(gdb) info args
No arguments.
(gdb) info locals
n = 31
(gdb) c
Continuing.
Breakpoint 2, tmp::square_no_inline (n=31) at src/main.rs:10
10 n * n
(gdb) info args
n = 31
(gdb) info locals
No locals.
```
This issue is particularly annoying because it removes arguments from stack traces.
The DWARF for the inlined function looks like this:
```
< 2><0x00002132 GOFF=0x00002132> DW_TAG_subprogram
DW_AT_linkage_name _ZN3tmp6square17hc507052ff3d2a488E
DW_AT_name square
DW_AT_decl_file 0x0000000f /ephemeral/tmp/src/main.rs
DW_AT_decl_line 0x00000004
DW_AT_type 0x00001a56<.debug_info+0x00001a56>
DW_AT_inline DW_INL_inlined
< 3><0x00002142 GOFF=0x00002142> DW_TAG_lexical_block
< 4><0x00002143 GOFF=0x00002143> DW_TAG_formal_parameter
DW_AT_name n
DW_AT_decl_file 0x0000000f /ephemeral/tmp/src/main.rs
DW_AT_decl_line 0x00000004
DW_AT_type 0x00001a56<.debug_info+0x00001a56>
< 4><0x0000214e GOFF=0x0000214e> DW_TAG_null
< 3><0x0000214f GOFF=0x0000214f> DW_TAG_null
```
That DW_TAG_lexical_block inhibits every debugger I've tested from recognizing 'n' as a parameter.
This patch removes the additional lexical scope. Parameters can be easily deduplicated by a tuple of their scope and the argument index, at the trivial cost of taking a Hash + Eq bound on DIScope.
Use the `enum2$` Natvis visualiser for repr128 C-style enums
Use the preexisting `enum2$` Natvis visualiser to allow PDB debuggers to display fieldless `#[repr(u128)]]`/`#[repr(i128)]]` enums correctly.
Tracking issue: #56071
try-job: x86_64-msvc
Shrink `TyKind::FnPtr`.
By splitting the `FnSig` within `TyKind::FnPtr` into `FnSigTys` and `FnHeader`, which can be packed more efficiently. This reduces the size of the hot `TyKind` type from 32 bytes to 24 bytes on 64-bit platforms. This reduces peak memory usage by a few percent on some benchmarks. It also reduces cache misses and page faults similarly, though this doesn't translate to clear cycles or wall-time improvements on CI.
r? `@compiler-errors`
Line numbers of multiply-inlined functions were fixed in #114643 by using a
single DISubprogram. That, however, triggered assertions because parameters
weren't deduplicated. The "solution" to that in #115417 was to insert a
DILexicalScope below the DISubprogram and parent all of the parameters to that
scope. That fixed the assertion, but debuggers (including gdb and lldb) don't
recognize variables that are not parented to the subprogram itself as parameters,
even if they are emitted with DW_TAG_formal_parameter.
Consider the program:
use std::env;
fn square(n: i32) -> i32 {
n * n
}
fn square_no_inline(n: i32) -> i32 {
n * n
}
fn main() {
let x = square(env::vars().count() as i32);
let y = square_no_inline(env::vars().count() as i32);
println!("{x} == {y}");
}
When making a release build with debug=2 and rustc 1.82.0-nightly (8b3870784 2024-08-07)
(gdb) r
Starting program: /ephemeral/tmp/target/release/tmp
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, tmp::square () at src/main.rs:5
5 n * n
(gdb) info args
No arguments.
(gdb) info locals
n = 31
(gdb) c
Continuing.
Breakpoint 2, tmp::square_no_inline (n=31) at src/main.rs:10
10 n * n
(gdb) info args
n = 31
(gdb) info locals
No locals.
This issue is particularly annoying because it removes arguments from stack traces.
The DWARF for the inlined function looks like this:
< 2><0x00002132 GOFF=0x00002132> DW_TAG_subprogram
DW_AT_linkage_name _ZN3tmp6square17hc507052ff3d2a488E
DW_AT_name square
DW_AT_decl_file 0x0000000f /ephemeral/tmp/src/main.rs
DW_AT_decl_line 0x00000004
DW_AT_type 0x00001a56<.debug_info+0x00001a56>
DW_AT_inline DW_INL_inlined
< 3><0x00002142 GOFF=0x00002142> DW_TAG_lexical_block
< 4><0x00002143 GOFF=0x00002143> DW_TAG_formal_parameter
DW_AT_name n
DW_AT_decl_file 0x0000000f /ephemeral/tmp/src/main.rs
DW_AT_decl_line 0x00000004
DW_AT_type 0x00001a56<.debug_info+0x00001a56>
< 4><0x0000214e GOFF=0x0000214e> DW_TAG_null
< 3><0x0000214f GOFF=0x0000214f> DW_TAG_null
That DW_TAG_lexical_block inhibits every debugger I've tested from recognizing
'n' as a parameter.
This patch removes the additional lexical scope. Parameters can be easily
deduplicated by a tuple of their scope and the argument index, at the trivial
cost of taking a Hash + Eq bound on DIScope.
const vector passed through to codegen
This allows constant vectors using a repr(simd) type to be propagated
through to the backend by reusing the functionality used to do a similar
thing for the simd_shuffle intrinsic
#118209
r? RalfJung
nontemporal_store: make sure that the intrinsic is truly just a hint
The `!nontemporal` flag for stores in LLVM *sounds* like it is just a hint, but actually, it is not -- at least on x86, non-temporal stores need very special treatment by the programmer or else the Rust memory model breaks down. LLVM still treats these stores as-if they were normal stores for optimizations, which is [highly dubious](https://github.com/llvm/llvm-project/issues/64521). Let's avoid all that dubiousness by making our own non-temporal stores be truly just a hint, which is possible on some targets (e.g. ARM). On all other targets, non-temporal stores become regular stores.
~~Blocked on https://github.com/rust-lang/stdarch/pull/1541 propagating to the rustc repo, to make sure the `_mm_stream` intrinsics are unaffected by this change.~~
Fixes https://github.com/rust-lang/rust/issues/114582
Cc `@Amanieu` `@workingjubilee`
By splitting the `FnSig` within `TyKind::FnPtr` into `FnSigTys` and
`FnHeader`, which can be packed more efficiently. This reduces the size
of the hot `TyKind` type from 32 bytes to 24 bytes on 64-bit platforms.
This reduces peak memory usage by a few percent on some benchmarks. It
also reduces cache misses and page faults similarly, though this doesn't
translate to clear cycles or wall-time improvements on CI.
codegen: better centralize function declaration attribute computation
For some reason, the codegen backend has two functions that compute which attributes a function declaration gets: `apply_attrs_llfn` and `attributes::from_fn_attrs`. They are called in different places, on entirely different layers of abstraction.
To me the code seems cleaner if we centralize this entirely in `apply_attrs_llfn`, so that's what this PR does.
Make create_dll_import_lib easier to implement
This will make it easier to implement raw-dylib support in cg_clif and cg_gcc. This PR doesn't yet include an create_dll_import_lib implementation for cg_clif as I need to correctly implement dllimport in cg_clif first before raw-dylib can work at all with cg_clif.
Required for https://github.com/rust-lang/rustc_codegen_cranelift/issues/1345