Support for `f16` and `f128` varies across targets, backends, and
backend versions. Eventually we would like to reach a point where all
backends support these approximately equally, but until then we have to
work around cases where the differences in support are observable.
Introduce the `cfg_target_has_reliable_f16_f128` internal feature, which
provides the following new configuration gates:
* `cfg(target_has_reliable_f16)`
* `cfg(target_has_reliable_f16_math)`
* `cfg(target_has_reliable_f128)`
* `cfg(target_has_reliable_f128_math)`
`reliable_f16` and `reliable_f128` indicate that basic arithmetic for
the type works correctly. The `_math` versions indicate that anything
relying on `libm` works correctly, since sometimes this hits a separate
class of codegen bugs.
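A rough sketch (not code from this change) of how a `std` test might use these gates; the test body is illustrative:

```rust
#![feature(f16, cfg_target_has_reliable_f16_f128)]

// Only compile this test where basic `f16` arithmetic is known to work; the
// `_math` gates would be used instead for anything that relies on libm.
#[cfg(target_has_reliable_f16)]
#[test]
fn f16_arithmetic_smoke_test() {
    let a: f16 = 1.5;
    assert!(a + a == 3.0);
}
```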
These options match the configuration set by the build script at [1]. The
logic for LLVM support is duplicated as-is from the same script. A few
possible updates will come as a follow-up.
The config introduced here is not planned to ever become stable; it is
only intended to replace the build scripts for `std` tests and
`compiler-builtins`, which otherwise have no way to configure based on
the codegen backend.
MCP: https://github.com/rust-lang/compiler-team/issues/866
Closes: https://github.com/rust-lang/compiler-team/issues/866
[1]: 555e1d0386/library/std/build.rs (L84-L186)
simd intrinsics with mask: accept unsigned integer masks, and fix some of the errors
It's not clear at all why the mask would have to be signed; it is interpreted bitwise anyway. The backend should just make sure that works no matter the surface-level type; our LLVM backend already does this correctly. The note that "the mask may be widened, which only has the correct behavior for signed integers" explains... nothing? Why can't the code do the widening correctly? If necessary, just cast to the signed type first...
Also, while we are at it, fix the errors: for `simd_masked_load`/`simd_masked_store`, the errors talked about the "third argument" when they meant the first one (the mask is the first argument there). They also used the wrong type for `expected_element`.
I have extremely low confidence in the GCC part of this PR.
See [discussion on Zulip](https://rust-lang.zulipchat.com/#narrow/channel/257879-project-portable-simd/topic/On.20the.20sign.20of.20masks)
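A hedged sketch (not code from this PR) of what a call looks like with an unsigned mask; the vector types here are made up, and as noted above the mask is the first argument:

```rust
#![feature(core_intrinsics, repr_simd)]
use std::intrinsics::simd::simd_masked_load;

#[repr(simd)]
#[derive(Copy, Clone)]
struct U32x4([u32; 4]);

#[repr(simd)]
#[derive(Copy, Clone)]
struct I32x4([i32; 4]);

fn main() {
    let data = [10i32, 20, 30, 40];
    let fallback = I32x4([0; 4]);
    // An unsigned all-ones/all-zeros mask: it is interpreted bitwise
    // regardless of the surface-level type.
    let mask = U32x4([u32::MAX, u32::MAX, 0, 0]);
    // Load only the first two lanes; the rest come from `fallback`.
    let v = unsafe { simd_masked_load(mask, data.as_ptr(), fallback) };
    let lanes: [i32; 4] = unsafe { std::mem::transmute(v) };
    assert_eq!(lanes, [10, 20, 0, 0]);
}
```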
Rename `is_like_osx` to `is_like_darwin`
Replace `is_like_osx` with `is_like_darwin`, which more closely describes reality (OS X is the pre-2016 name for macOS, and is by now quite outdated; Darwin is the overall name for the OS underlying Apple's macOS, iOS, etc.).
``@rustbot`` label O-apple
r? compiler
Avoid wrapping constant allocations in packed structs when not necessary
This way, LLVM can set the string-merging flag when the allocation is a NUL-terminated string, reducing binary sizes.
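As a hypothetical illustration of the kind of constant this affects:

```rust
fn main() {
    // A NUL-terminated constant like this can now be emitted as a plain byte
    // array rather than wrapped in a packed struct, letting LLVM place it in
    // a mergeable-strings section.
    static GREETING: &std::ffi::CStr = c"hello";
    println!("{GREETING:?}");
}
```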
try-job: armhf-gnu
Speed up target feature computation
The LLVM backend calls `LLVMRustHasFeature` twice for every feature. In short-running rustc invocations, this accounts for a surprising amount of work.
r? `@bjorn3`
compiler: Use `size_of` from the prelude instead of imported
Use `std::mem::{size_of, size_of_val, align_of, align_of_val}` from the prelude instead of importing or qualifying them. Apply this change across the compiler.
These functions were added to all preludes in Rust 1.80.
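For example, code like this now compiles with no `use std::mem::...` imports (the sizes and alignments shown are for typical targets):

```rust
fn main() {
    // All four functions come from the prelude as of Rust 1.80.
    assert_eq!(size_of::<u64>(), 8);
    assert_eq!(align_of::<u32>(), 4);
    let x = 1u16;
    assert_eq!(size_of_val(&x), 2);
    assert_eq!(align_of_val(&x), 2);
}
```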
r? ``@compiler-errors``
Clean up various LLVM FFI things in codegen_llvm
cc ```@ZuseZ4``` since I touched some autodiff parts
The major change of this PR is [bfd88ce](bfd88cead0), which makes `CodegenCx` generic just like `GenericBuilder`.
The other commits mostly take advantage of the new ability to mark extern functions safe, but they also use some wrappers that were already there and shrink unsafe blocks.
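For context, a minimal self-contained example of that language feature; the binding shown is libc's `abs`, not one of rustc's LLVM bindings:

```rust
// Rust 2024: items in an `unsafe extern` block can be declared `safe`, so
// call sites no longer need an `unsafe` block.
unsafe extern "C" {
    // `abs` has no preconditions, so it is sound to mark it safe.
    safe fn abs(input: i32) -> i32;
}

fn main() {
    assert_eq!(abs(-3), 3);
}
```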
best reviewed commit-by-commit
Currently `target_features_cfg` is called twice, once with
`allow_unstable` set to true and once with it set to false. This
results in some duplicated work. Most
notably, for the LLVM backend, `LLVMRustHasFeature` is called twice for
every feature, and it's moderately slow. For very short running
compilations on platforms with many features (e.g. a `check` build of
hello-world on x86) this is a significant fraction of runtime.
This commit changes `target_features_cfg` so it is only called once, and
it now returns a pair of feature sets. This halves the number of
`LLVMRustHasFeature` calls.
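A minimal sketch of the single-pass shape (signatures simplified; the real function works with `Session` and `Symbol`):

```rust
// Query the backend once per feature and build both sets as we go, instead
// of walking the feature list twice.
fn target_features_cfg(
    features: &[(&'static str, bool /* unstable? */)],
    backend_has_feature: impl Fn(&str) -> bool,
) -> (Vec<&'static str>, Vec<&'static str>) {
    let mut stable_set = Vec::new();
    let mut full_set = Vec::new();
    for &(name, is_unstable) in features {
        if backend_has_feature(name) {
            full_set.push(name); // the set with unstable features allowed
            if !is_unstable {
                stable_set.push(name);
            }
        }
    }
    (stable_set, full_set)
}
```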
Rollup of 12 pull requests
Successful merges:
- #135767 (Future incompatibility warning `unsupported_fn_ptr_calling_conventions`: Also warn in dependencies)
- #137852 (Remove layouting dead code for non-array SIMD types.)
- #137863 (Fix pretty printing of unsafe binders)
- #137882 (do not build additional stage on compiler paths)
- #137894 (Revert "store ScalarPair via memset when one side is undef and the other side can be memset")
- #137902 (Make `ast::TokenKind` more like `lexer::TokenKind`)
- #137921 (Subtree update of `rust-analyzer`)
- #137922 (A few cleanups after the removal of `cfg(not(parallel))`)
- #137939 (fix order on shl impl)
- #137946 (Fix docker run-local docs)
- #137955 (Always allow rustdoc-json tests to contain long lines)
- #137958 (triagebot.toml: Don't label `test/rustdoc-json` as A-rustdoc-search)
r? `@ghost`
`@rustbot` modify labels: rollup
Stop using `hash_raw_entry` in `CodegenCx::const_str`
That unstable feature (#56167) completed fcp-close, so the compiler needs to be
migrated away from it to allow its removal. In this case, `cg_llvm` and `cg_gcc`
were using raw entries to optimize their `const_str_cache` lookup and
insertion. We can change that to separate `get` and (on miss) `insert`
calls, so we still have the fast path avoiding string allocation when
the cache hits.
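A minimal sketch of that pattern, with the value type simplified (the real caches map strings to backend values):

```rust
use std::collections::HashMap;

fn const_str_cached(
    cache: &mut HashMap<String, usize>,
    s: &str,
    build: impl FnOnce() -> usize,
) -> usize {
    // `get` borrows the key as &str, so a cache hit allocates no String.
    if let Some(&v) = cache.get(s) {
        return v;
    }
    // Only a miss pays for building the value and the owned-key allocation.
    let v = build();
    cache.insert(s.to_owned(), v);
    v
}
```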
rename BackendRepr::Vector → SimdVector
For many Rustaceans, "vector" does not imply "SIMD", so let's be clearer in this type, which is used pervasively in the compiler.
r? `@workingjubilee`
The embedded bitcode should always be prepared for LTO/ThinLTO
Fixes #115344. Fixes #117220.
There are currently two methods for generating the bitcode that is used for LTO. One method involves using `-C linker-plugin-lto` to emit object files as bitcode, which is the typical setting used by cargo. The other method is through `-C embed-bitcode=yes`.
When using `-C embed-bitcode=yes -C lto=no`, we run a complete non-LTO LLVM pipeline to obtain the bitcode, and that bitcode is then used for LTO. As a result, we run the Call Graph Profile pass twice on the same module.
This PR is doing something similar to LLVM's `buildFatLTODefaultPipeline`, obtaining the bitcode for embedding after running `buildThinLTOPreLinkDefaultPipeline`.
r? nikic
remove `simd_fpow` and `simd_fpowi`
Discussed in https://github.com/rust-lang/rust/issues/137555
These functions are not exposed from `std::intrinsics::simd` and are not used anywhere outside the compiler. They also don't lower to particularly good code, at least on the major ISAs (I checked x86_64, aarch64, s390x, powerpc), where the vector is just spilled to the stack and scalar functions are used for the actual logic.
r? `@RalfJung`
intrinsics: unify rint, roundeven, nearbyint in a single round_ties_even intrinsic
LLVM has three intrinsics here that all do the same thing (when used in the default FP environment). There's no reason Rust needs to copy that historically-grown mess -- let's just have one intrinsic and leave it up to the LLVM backend to decide how to lower that.
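For reference, the shared semantics are round-to-nearest with ties going to the even value, shown here with the stable `f64::round_ties_even`:

```rust
fn main() {
    // Ties (exact .5 values) round toward the even neighbor.
    assert_eq!(0.5_f64.round_ties_even(), 0.0);
    assert_eq!(1.5_f64.round_ties_even(), 2.0);
    assert_eq!(2.5_f64.round_ties_even(), 2.0);
}
```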
Suggested by `@hanna-kruppe` in https://github.com/rust-lang/rust/issues/136459; Cc `@tgross35`
try-job: test-various
- For shifts this shrinks the IR by no longer needing an `assume` while still providing the UB information
- Having this on the `i8`→`i1` truncations will hopefully help with some places that have to load `i8`s or pass those in LLVM structs without range information
Set both `nuw` and `nsw` in slice size calculation
There's an old note in the code to do this, and now that [LLVM-C has an API for it](f0b8ff1251/llvm/include/llvm-c/Core.h (L4403-L4408)), we might as well. The API has been there since what looks like LLVM 17 (de9b6aa341), so it doesn't even need to be conditional.
(There's other places, like `RawVecInner` or `Layout`, that might want to do things like this too, but I'll leave those for a future PR.)
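A conceptual illustration (not compiler code) of why both flags are sound for slice sizes:

```rust
// A slice's size in bytes is guaranteed to be at most isize::MAX, so
// `len * size_of::<T>()` can wrap neither as unsigned (`nuw`) nor as
// signed (`nsw`) arithmetic.
fn slice_size_in_bytes<T>(s: &[T]) -> usize {
    let size = s.len() * size_of::<T>();
    debug_assert!(size <= isize::MAX as usize);
    size
}
```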
Parallel-compiler-related cleanup
I carefully split the changes into commits. The commit messages are self-explanatory. Squashing is not recommended.
cc "Parallel Rustc Front-end" https://github.com/rust-lang/rust/issues/113349
r? SparrowLii
``@rustbot`` label: +WG-compiler-parallel