Add `f16` inline ASM support for 32-bit ARM
Adds `f16` inline ASM support for 32-bit ARM. SIMD vector types are taken from [here](https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:`@navigationhierarchiesreturnbasetype=[float]&f:@navigationhierarchieselementbitsize=[16]&f:@navigationhierarchiesarchitectures=[A32]).`
Relevant issue: #125398
Tracking issue: #116909
`@rustbot` label +F-f16_and_f128
Add `f16` inline ASM support for RISC-V
This PR adds `f16` inline ASM support for RISC-V. A `FIXME` is left for `f128` support as LLVM does not support the required `Q` (Quad-Precision Floating-Point) extension yet.
Relevant issue: #125398
Tracking issue: #116909
`@rustbot` label +F-f16_and_f128
Honor collapse_debuginfo for statics.
fixes#126363
<!--
If this PR is related to an unstable feature or an otherwise tracked effort,
please link to the relevant tracking issue here. If you don't know of a related
tracking issue or there are none, feel free to ignore this.
This PR will get automatically assigned to a reviewer. In case you would like
a specific user to review your work, you can assign it to them by using
r? <reviewer name>
-->
Also sort `crt-static` in `--print target-features` output
I didn't find `crt-static` at first (for `x86_64-unknown-linux-gnu`), because it was put at the bottom of the large and otherwise sorted list.
Fully sort the list before we print it.
Note that `llvm_target_features` starts out and remains sorted and does not need to be sorted an extra time.
On my machine the diff is just:
```diff
$ diff -u /tmp/before2.txt /tmp/after2.txt
--- /tmp/before2.txt 2024-06-13 20:40:27.091636592 +0200
+++ /tmp/after2.txt 2024-06-13 20:39:54.584894891 +0200
``@@`` -20,6 +20,7 ``@@``
bmi1 - Support BMI instructions.
bmi2 - Support BMI2 instructions.
cmpxchg16b - 64-bit with cmpxchg16b (this is true for most x86-64 chips, but not the first AMD chips).
+ crt-static - Enables C Run-time Libraries to be statically linked.
ermsb - REP MOVS/STOS are fast.
f16c - Support 16-bit floating point conversion instructions.
fma - Enable three-operand fused multiple-add.
``@@`` -49,7 +50,6 ``@@``
xsavec - Support xsavec instructions.
xsaveopt - Support xsaveopt instructions.
xsaves - Support xsaves instructions.
- crt-static - Enables C Run-time Libraries to be statically linked.
Code-generation features supported by LLVM for this target:
16bit-mode - 16-bit mode (i8086).
```
I couldn't find a ui test that tested this output. Let's see if CI finds a regression tests.
I didn't find `crt-static` at first (for `x86_64-unknown-linux-gnu`),
because it was put at the bottom the large and otherwise sorted list.
Fully sort the list before we print it.
Note that `llvm_target_features` starts out sorted and does not need to
be sorted an extra time.
We already do this for a number of crates, e.g. `rustc_middle`,
`rustc_span`, `rustc_metadata`, `rustc_span`, `rustc_errors`.
For the ones we don't, in many cases the attributes are a mess.
- There is no consistency about order of attribute kinds (e.g.
`allow`/`deny`/`feature`).
- Within attribute kind groups (e.g. the `feature` attributes),
sometimes the order is alphabetical, and sometimes there is no
particular order.
- Sometimes the attributes of a particular kind aren't even grouped
all together, e.g. there might be a `feature`, then an `allow`, then
another `feature`.
This commit extends the existing sorting to all compiler crates,
increasing consistency. If any new attribute line is added there is now
only one place it can go -- no need for arbitrary decisions.
Exceptions:
- `rustc_log`, `rustc_next_trait_solver` and `rustc_type_ir_macros`,
because they have no crate attributes.
- `rustc_codegen_gcc`, because it's quasi-external to rustc (e.g. it's
ignored in `rustfmt.toml`).
Directly add extension instead of using `Path::with_extension`
`Path::with_extension` has a nice footgun when the original path doesn't contain an extension: Anything after the last dot gets removed.
Make repr(packed) vectors work with SIMD intrinsics
In #117116 I fixed `#[repr(packed, simd)]` by doing the expected thing and removing padding from the layout. This should be the last step in providing a solution to rust-lang/portable-simd#319
Rollup of 6 pull requests
Successful merges:
- #125263 (rust-lld: fallback to rustc's sysroot if there's no path to the linker in the target sysroot)
- #125345 (rustc_codegen_llvm: add support for writing summary bitcode)
- #125362 (Actually use TAIT instead of emulating it)
- #125412 (Don't suggest adding the unexpected cfgs to the build-script it-self)
- #125445 (Migrate `run-make/rustdoc-with-short-out-dir-option` to `rmake.rs`)
- #125452 (Cleanup check-cfg handling in core and std)
r? `@ghost`
`@rustbot` modify labels: rollup
rustc_codegen_llvm: add support for writing summary bitcode
Typical uses of ThinLTO don't have any use for this as a standalone file, but distributed ThinLTO uses this to make the linker phase more efficient. With clang you'd do something like `clang -flto=thin -fthin-link-bitcode=foo.indexing.o -c foo.c` and then get both foo.o (full of bitcode) and foo.indexing.o (just the summary or index part of the bitcode). That's then usable by a two-stage linking process that's more friendly to distributed build systems like bazel, which is why I'm working on this area.
I talked some to `@teresajohnson` about naming in this area, as things seem to be a little confused between various blog posts and build systems. "bitcode index" and "bitcode summary" tend to be a little too ambiguous, and she tends to use "thin link bitcode" and "minimized bitcode" (which matches the descriptions in LLVM). Since the clang option is thin-link-bitcode, I went with that to try and not add a new spelling in the world.
Per `@dtolnay,` you can work around the lack of this by using `lld --thinlto-index-only` to do the indexing on regular .o files of bitcode, but that is a bit wasteful on actions when we already have all the information in rustc and could just write out the matching minimized bitcode. I didn't test that at all in our infrastructure, because by the time I learned that I already had this patch largely written.
If we don't do this, some versions of LLVM (at least 17, experimentally)
will double-emit some error messages, which is how I noticed this. Given
that it seems to be costing some extra work, let's only request the
summary bitcode production if we'll actually bother writing it down,
otherwise skip it.
Typical uses of ThinLTO don't have any use for this as a standalone
file, but distributed ThinLTO uses this to make the linker phase more
efficient. With clang you'd do something like `clang -flto=thin
-fthin-link-bitcode=foo.indexing.o -c foo.c` and then get both foo.o
(full of bitcode) and foo.indexing.o (just the summary or index part of
the bitcode). That's then usable by a two-stage linking process that's
more friendly to distributed build systems like bazel, which is why I'm
working on this area.
I talked some to @teresajohnson about naming in this area, as things
seem to be a little confused between various blog posts and build
systems. "bitcode index" and "bitcode summary" tend to be a little too
ambiguous, and she tends to use "thin link bitcode" and "minimized
bitcode" (which matches the descriptions in LLVM). Since the clang
option is thin-link-bitcode, I went with that to try and not add a new
spelling in the world.
Per @dtolnay, you can work around the lack of this by using `lld
--thinlto-index-only` to do the indexing on regular .o files of
bitcode, but that is a bit wasteful on actions when we already have all
the information in rustc and could just write out the matching minimized
bitcode. I didn't test that at all in our infrastructure, because by the
time I learned that I already had this patch largely written.
This code for recalculating `mcdc_bitmap_bytes` doesn't provide any benefit,
because its result won't have changed from the value in `FunctionCoverageInfo`
that was computed during the MIR instrumentation pass.
Rollup of 5 pull requests
Successful merges:
- #124615 (coverage: Further simplify extraction of mapping info from MIR)
- #124778 (Fix parse error message for meta items)
- #124797 (Refactor float `Primitive`s to a separate `Float` type)
- #124888 (Migrate `run-make/rustdoc-output-path` to rmake)
- #124957 (Make `Ty::builtin_deref` just return a `Ty`)
r? `@ghost`
`@rustbot` modify labels: rollup
Refactor float `Primitive`s to a separate `Float` type
Now there are 4 of them, it makes sense to refactor `F16`, `F32`, `F64` and `F128` out of `Primitive` and into a separate `Float` type (like integers already are). This allows patterns like `F16 | F32 | F64 | F128` to be simplified into `Float(_)`, and is consistent with `ty::FloatTy`.
As a side effect, this PR also makes the `Ty::primitive_size` method work with `f16` and `f128`.
Tracking issue: #116909
`@rustbot` label +F-f16_and_f128
codegen: memmove/memset cannot be non-temporal
non-temporal memset is not a thing.
And for memmove, since the LLVM backend doesn't support this, surely we don't need it in the GCC backend.