Unconditionally emit the target-cpu LLVM attribute.
This PR makes `rustc` always emit the `target-cpu` LLVM attribute for functions. The goal is to allow for cross-language inlining of functions defined in `libstd`. So far `libstd` functions were the only function without a `target-cpu` attribute, so in whole-crate-graph cross-lang LTO scenarios they were not eligible for inlining into foreign code.
r? @alexcrichton
This was intended to land way back in 1.24, but it was backed out due to
breakage which has long since been fixed. An unstable `#[unwind]`
attribute can be used to tweak the behavior here, but this is currently
simply switching rustc's internal default to abort-by-default if an
`extern` function panics, making our codegen sound primarily (as
currently you can produce UB with safe code)
Closes#52652
Generalized operand.rs#nontemporal_store and fixed tidy issues
Generalized operand.rs#nontemporal_store's implem even more
With a BuilderMethod trait implemented by Builder for LLVM
Cleaned builder.rs : no more code duplication, no more ValueTrait
Full traitification of builder.rs
Prefer unwrap_or_else to unwrap_or in case of function calls/allocations
The contents of `unwrap_or` are evaluated eagerly, so it's not a good pick in case of function calls and allocations. This PR also changes a few `unwrap_or`s with `unwrap_or_default`.
An added bonus is that in some cases this change also reveals if the object it's called on is an `Option` or a `Result` (based on whether the closure takes an argument).
Support for disabling PLT for better function call performance
This PR gives `rustc` the ability to skip the PLT when generating function calls into shared libraries. This can improve performance by reducing branch indirection.
AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already [enables full relro for security](https://github.com/rust-lang/rust/pull/43170), lazy binding was disabled anyway.
This is a little known feature which is supported by [GCC](https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html) and [Clang](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fplt) as `-fno-plt` (some Linux distros [enable it by default](https://git.archlinux.org/svntogit/packages.git/tree/trunk/makepkg.conf?h=packages/pacman#n40) for all builds).
Implementation inspired by [this patch](https://reviews.llvm.org/D39079#change-YvkpNDlMs_LT) which adds `-fno-plt` support to Clang.
## Performance
I didn't run a lot of benchmarks, but these are the results on my machine for a `clap` [benchmark](https://github.com/clap-rs/clap/blob/master/benches/05_ripgrep.rs):
```
name control ns/iter no-plt ns/iter diff ns/iter diff % speedup
build_app_long 11,097 10,733 -364 -3.28% x 1.03
build_app_short 11,089 10,742 -347 -3.13% x 1.03
build_help_long 186,835 182,713 -4,122 -2.21% x 1.02
build_help_short 80,949 78,455 -2,494 -3.08% x 1.03
parse_clean 12,385 12,044 -341 -2.75% x 1.03
parse_complex 19,438 19,017 -421 -2.17% x 1.02
parse_lots 431,493 421,421 -10,072 -2.33% x 1.02
```
A small performance improvement across the board, with no downsides. It's likely binaries which make a lot of function calls into dynamic libraries could see even more improvements. [This comment](https://patchwork.ozlabs.org/patch/468993/#1028255) suggests that, in some cases, `-fno-plt` could improve PIC/PIE code performance by 10%.
## Security benefits
**Bonus**: some of the speculative execution attacks rely on the PLT, by disabling it we reduce a big attack surface and reduce the need for [`retpoline`](https://reviews.llvm.org/D41723).
## Remaining PLT calls
The compiled binaries still have plenty of PLT calls, coming from C/C++ libraries. Building dependencies with `CFLAGS=-fno-plt CXXFLAGS=-fno-plt` removes them.
Disable the PLT where possible to improve performance
for indirect calls into shared libraries.
This optimization is enabled by default where possible.
- Add the `NonLazyBind` attribute to `rustllvm`:
This attribute informs LLVM to skip PLT calls in codegen.
- Disable PLT unconditionally:
Apply the `NonLazyBind` attribute on every function.
- Only enable no-plt when full relro is enabled:
Ensures we only enable it when we have linker support.
- Add `-Z plt` as a compiler option
Fix a few AMDGPU related issues
* AMDGPU ignores `noinline` and sadly doesn't clear the attribute when it slaps `alwaysinline` on everything,
* an AMDGPU related load bit range metadata assertion,
* I didn't enable the `amdgpu` component in the `librustc_llvm` build script,
* Add AMDGPU call abi info.
This fixes a regression from #53031 where specifying `-C target-cpu=native` is
printing a lot of warnings from LLVM about `native` being an unknown CPU. It
turns out that `native` is indeed an unknown CPU and we have to perform a
mapping to an actual CPU name, but this mapping is only performed in one
location rather than all locations we inform LLVM about the target CPU.
This commit centralizes the mapping of `native` to LLVM's value of the native
CPU, ensuring that all locations we inform LLVM about the `target-cpu` it's
never `native`.
Closes#53322
This shim is generated elsewhere in the compiler so this commit adds support to
ensure it goes through similar paths as the rest of the compiler to set llvm
function attributes like target features.
cc #53372
This commit stabilizes the `#[wasm_import_module]` attribute as
`#[link(wasm_import_module = "...")]`. Tracked by #52090 this new directive in
the `#[link]` attribute is used to configured the module name that the imports
are listed with. The WebAssembly specification indicates two utf-8 names are
associated with all imported items, one for the module the item comes from and
one for the item itself. The item itself is configurable in Rust via its
identifier or `#[link_name = "..."]`, but the module name was previously not
configurable and defaulted to `"env"`. This commit ensures that this is also
configurable.
Closes#52090
This commit upgrades the main LLVM submodule to LLVM's current master branch.
The LLD submodule is updated in tandem as well as compiler-builtins.
Along the way support was also added for LLVM 7's new features. This primarily
includes the support for custom section concatenation natively in LLD so we now
add wasm custom sections in LLVM IR rather than having custom support in rustc
itself for doing so.
Some other miscellaneous changes are:
* We now pass `--gc-sections` to `wasm-ld`
* The optimization level is now passed to `wasm-ld`
* A `--stack-first` option is passed to LLD to have stack overflow always cause
a trap instead of corrupting static data
* The wasm target for LLVM switched to `wasm32-unknown-unknown`.
* The syntax for aligned pointers has changed in LLVM IR and tests are updated
to reflect this.
* The `thumbv6m-none-eabi` target is disabled due to an [LLVM bug][llbug]
Nowadays we've been mostly only upgrading whenever there's a major release of
LLVM but enough changes have been happening on the wasm target that there's been
growing motivation for quite some time now to upgrade out version of LLD. To
upgrade LLD, however, we need to upgrade LLVM to avoid needing to build yet
another version of LLVM on the builders.
The revision of LLVM in use here is arbitrarily chosen. We will likely need to
continue to update it over time if and when we discover bugs. Once LLVM 7 is
fully released we can switch to that channel as well.
[llbug]: https://bugs.llvm.org/show_bug.cgi?id=37382
OOM can't unwind today, and historically it's been optimized as if it can't
unwind. This accidentally regressed with recent changes to the OOM handler, so
this commit adds in a codegen test to assert that everything gets optimized away
after the OOM function is approrpiately classified as nounwind
Closes#50925