Mark drop calls in landing pads `cold` instead of `noinline`
Now that deferred inlining has been disabled in LLVM (#92110), this shouldn't cause catastrophic size blowup.
I confirmed that the test cases from https://github.com/rust-lang/rust/issues/41696#issuecomment-298696944 still compile quickly (<1s) after this change. ~Although note that I wasn't able to reproduce the original issue using a recent rustc/llvm with deferred inlining enabled, so those tests may no longer be representative. I was also unable to create a modified test case that reproduced the original issue.~ (edit: I reproduced it on CI by accident--the first commit timed out on the LLVM 12 builder, because I forgot to make it conditional on LLVM version)
r? `@nagisa`
cc `@arielb1` (this effectively reverts #42771 "mark calls in the unwind path as !noinline")
cc `@RalfJung` (fixes#46515)
edit: also fixes#87055
Allow loading LLVM plugins with both legacy and new pass manager
Opening a draft PR to get feedback and start discussion on this feature. There is already a codegen option `passes` which allow giving a list of LLVM pass names, however we currently can't use a LLVM pass plugin (as described here : https://llvm.org/docs/WritingAnLLVMPass.html), the only available passes are the LLVM built-in ones.
The proposed modification would be to add another codegen option `pass-plugins`, which can be set with a list of paths to shared library files. These libraries are loaded using the LLVM function `PassPlugin::Load`, which calls the expected symbol `lvmGetPassPluginInfo`, and register the pipeline parsing and optimization callbacks.
An example usage with a single plugin and 3 passes would look like this in the `.cargo/config`:
```toml
rustflags = [
"-C", "pass-plugins=/tmp/libLLVMPassPlugin",
"-C", "passes=pass1 pass2 pass3",
]
```
This would give the same functionality as the opt LLVM tool directly integrated in rust build system.
Additionally, we can also not specify the `passes` option, and use a plugin which inserts passes in the optimization pipeline, as one could do using clang.
The resulting profile will include the crate name and will be stored in
the `--out-dir` directory.
This implementation makes it convenient to use LLVM time trace together
with cargo, in the contrast to the previous implementation which would
overwrite profiles or store them in `.cargo/registry/..`.
This commit works around a crash in LLVM when the
`-generate-arange-section` argument is passed to LLVM. An LLVM bug is
opened for this and the code in question is also set to continue passing
this flag with LLVM 14, assuming that this is fixed by the time LLVM 14
comes out. Otherwise this should work around debuginfo crashes on LLVM
13.
Initialize LLVM time trace profiler on each code generation thread
In https://reviews.llvm.org/D71059 LLVM 11, the time trace profiler was
extended to support multiple threads.
`timeTraceProfilerInitialize` creates a thread local profiler instance.
When a thread finishes `timeTraceProfilerFinishThread` moves a thread
local instance into a global collection of instances. Finally when all
codegen work is complete `timeTraceProfilerWrite` writes data from the
current thread local instance and the instances in global collection
of instances.
Previously, the profiler was intialized on a single thread only. Since
this thread performs no code generation on its own, the resulting
profile was empty.
Update LLVM codegen to initialize & finish time trace profiler on each
code generation thread.
cc `@tmandry`
r? `@wesleywiser`
In https://reviews.llvm.org/D71059 LLVM 11, the time trace profiler was
extended to support multiple threads.
`timeTraceProfilerInitialize` creates a thread local profiler instance.
When a thread finishes `timeTraceProfilerFinishThread` moves a thread
local instance into a global collection of instances. Finally when all
codegen work is complete `timeTraceProfilerWrite` writes data from the
current thread local instance and the instances in global collection
of instances.
Previously, the profiler was intialized on a single thread only. Since
this thread performs no code generation on its own, the resulting
profile was empty.
Update LLVM codegen to initialize & finish time trace profiler on each
code generation thread.
Cleanup LLVM multi-threading checks
The support for runtime multi-threading was removed from LLVM. Calls to
`LLVMStartMultithreaded` became no-ops equivalent to checking if LLVM
was compiled with support for threads http://reviews.llvm.org/D4216.
The support for runtime multi-threading was removed from LLVM. Calls to
`LLVMStartMultithreaded` became no-ops equivalent to checking if LLVM
was compiled with support for threads http://reviews.llvm.org/D4216.
[aarch64] add target feature outline-atomics
Enable outline-atomics by default as enabled in clang by the following commit
https://reviews.llvm.org/rGc5e7e649d537067dec7111f3de1430d0fc8a4d11
Performance improves by several orders of magnitude when using the LSE instructions
instead of the ARMv8.0 compatible load/store exclusive instructions.
Tested on Graviton2 aarch64-linux with
x.py build && x.py install && x.py test
Enable outline-atomics by default as enabled in clang by the following commit
https://reviews.llvm.org/rGc5e7e649d537067dec7111f3de1430d0fc8a4d11
Performance improves by several orders of magnitude when using the LSE instructions
instead of the ARMv8.0 compatible load/store exclusive instructions.
Tested on Graviton2 aarch64-linux with
x.py build && x.py install && x.py test
This fixes compiling things like the `snap` crate after
https://reviews.llvm.org/D105462. I added a test that verifies the
additional attribute gets specified, and confirmed that I can build
cargo with both LLVM 13 and 14 with this change applied.
This addresses a codegen-issue that needs to be fixed upstream in LLVM.
While we wait for the fix, we can disable it.
Verified manually that the outliner is no longer run when
`-Copt-level=z` is specified, and also that you can override this with
`-Cllvm-args=-enable-machine-outliner` if you need it anyway.
A regression test is not really feasible in this instance, given that we
do not have any minimal reproducers.
Fixes#85351
Update list of allowed aarch64 features
I recently added these features to std_detect for aarch64 linux, pending [review](https://github.com/rust-lang/stdarch/pull/1146).
I have commented any features not supported by LLVM 9, the current minimum version for Rust. Some (PAuth at least) were renamed between 9 & 12 and I've left them disabled. TME, however, is not in LLVM 9 but I've left it enabled.
See https://github.com/rust-lang/stdarch/issues/993
Since the beginning of time the `-Ctarget-feature` flag on the command
line has largely been passed unmodified to LLVM. Afterwards, though, the
`#[target_feature]` attribute was stabilized and some of the names in
this attribute do not match the corresponding LLVM name. This is because
Rust doesn't always want to stabilize the exact feature name in LLVM for
the equivalent functionality in Rust. This creates a situation, however,
where in Rust you'd write:
#[target_feature(enable = "pclmulqdq")]
unsafe fn foo() {
// ...
}
but on the command line you would write:
RUSTFLAGS="-Ctarget-feature=+pclmul" cargo build --release
This difference is somewhat odd to deal with if you're a newcomer and
the situation may be made worse with upcoming features like [WebAssembly
SIMD](https://github.com/rust-lang/rust/issues/74372) which may be more
prevalent.
This commit implements a mapping to translate requests via
`-Ctarget-feature` through the same name-mapping functionality that's
present for attributes in Rust going to LLVM. This means that
`+pclmulqdq` will work on x86 targets where as previously it did not.
I've attempted to keep this backwards-compatible where the compiler will
just opportunistically attempt to remap features found in
`-Ctarget-feature`, but if there's something it doesn't understand it
gets passed unmodified to LLVM just as it was before.
Import small cold functions
The Rust code is often written under an assumption that for generic
methods inline attribute is mostly unnecessary, since for optimized
builds using ThinLTO, a method will be code generated in at least one
CGU and available for import.
For example, deref implementations for Box, Vec, MutexGuard, and
MutexGuard are not currently marked as inline, neither is identity
implementation of From trait.
In PGO builds, when functions are determined to be cold, the default
multiplier of zero will stop the import, no matter how trivial the
implementation.
Increase slightly the default multiplier from 0 to 0.1.
r? `@ghost`
When cg_llvm encounters the `-Ctarget-cpu=native` it computes an
explciit set of features that applies to the target in order to
correctly compile code for the host CPU (because e.g. `skylake` alone is
not sufficient to tell if some of the instructions are available or
not).
However there were a couple of issues with how we did this. Firstly, the
order in which features were overriden wasn't quite right – conceptually
you'd expect `-Ctarget-cpu=native` option to override the features that
are implicitly set by the target definition. However due to how other
`-Ctarget-cpu` values are handled we must adopt the following order
of priority:
* Features from -Ctarget-cpu=*; are overriden by
* Features implied by --target; are overriden by
* Features from -Ctarget-feature; are overriden by
* function specific features.
Another problem was in that the function level `target-features`
attribute would overwrite the entire set of the globally enabled
features, rather than just the features the
`#[target_feature(enable/disable)]` specified. With something like
`-Ctarget-cpu=native` we'd end up in a situation wherein a function
without `#[target_feature(enable)]` annotation would have a broader
set of features compared to a function with one such attribute. This
turned out to be a cause of heavy run-time regressions in some code
using these function-level attributes in conjunction with
`-Ctarget-cpu=native`, for example.
With this PR rustc is more careful about specifying the entire set of
features for functions that use `#[target_feature(enable/disable)]` or
`#[instruction_set]` attributes.
Sadly testing the original reproducer for this behaviour is quite
impossible – we cannot rely on `-Ctarget-cpu=native` to be anything in
particular on developer or CI machines.
The Rust code is often written under an assumption that for generic
methods inline attribute is mostly unnecessary, since for optimized
builds using ThinLTO, a method will be generated in at least one CGU and
available for import.
For example, deref implementations for Box, Vec, MutexGuard, and
MutexGuard are not currently marked as inline, neither is identity
implementation of From trait.
In PGO builds, when functions are determined to be cold, the default
multiplier of zero will stop the import, even for completely trivial
functions.
Increase slightly the default multiplier from 0 to 0.1 to import them
regardless.
Use Option::unwrap_or instead of open-coding it
r? ```@oli-obk``` Noticed this while we were talking about the other PR just now 😆
```@rustbot``` modify labels +C-cleanup +T-compiler
Updated the list of white-listed target features for x86
This PR both adds in-source documentation on what to look out for when adding a new (X86) feature set and [adds all that are detectable at run-time in Rust stable as of 1.27.0](https://github.com/rust-lang/stdarch/blob/master/crates/std_detect/src/detect/arch/x86.rs).
This should only enable the use of the corresponding LLVM intrinsics.
Actual intrinsics need to be added separately in rust-lang/stdarch.
It also re-orders the run-time-detect test statements to be more consistent
with the actual list of intrinsics whitelisted and removes underscores not present
in the actual names (which might be mistaken as being part of the name)
The reference for LLVM's feature names used is [this file](https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/Support/X86TargetParser.def).
This PR was motivated as the compiler end's part for allowing #67329 to be adressed over on rust-lang/stdarch
Allow making `RUSTC_BOOTSTRAP` conditional on the crate name
Motivation: This came up in the [Zulip stream](https://rust-lang.zulipchat.com/#narrow/stream/233931-t-compiler.2Fmajor-changes/topic/Require.20users.20to.20confirm.20they.20know.20RUSTC_.E2.80.A6.20compiler-team.23350/near/208403962) for https://github.com/rust-lang/compiler-team/issues/350.
See also https://github.com/rust-lang/cargo/pull/6608#issuecomment-458546258; this implements https://github.com/rust-lang/cargo/issues/6627.
The goal is for this to eventually allow prohibiting setting `RUSTC_BOOTSTRAP` in build.rs (https://github.com/rust-lang/cargo/issues/7088).
## User-facing changes
- `RUSTC_BOOTSTRAP=1` still works; there is no current plan to remove this.
- Things like `RUSTC_BOOTSTRAP=0` no longer activate nightly features. In practice this shouldn't be a big deal, since `RUSTC_BOOTSTRAP` is the opposite of stable and everyone uses `RUSTC_BOOTSTRAP=1` anyway.
- `RUSTC_BOOTSTRAP=x` will enable nightly features only for crate `x`.
- `RUSTC_BOOTSTRAP=x,y` will enable nightly features only for crates `x` and `y`.
## Implementation changes
The main change is that `UnstableOptions::from_environment` now requires
an (optional) crate name. If the crate name is unknown (`None`), then the new feature is not available and you still have to use `RUSTC_BOOTSTRAP=1`. In practice this means the feature is only available for `--crate-name`, not for `#![crate_name]`; I'm interested in supporting the second but I'm not sure how.
Other major changes:
- Added `Session::is_nightly_build()`, which uses the `crate_name` of
the session
- Added `nightly_options::match_is_nightly_build`, a convenience method
for looking up `--crate-name` from CLI arguments.
`Session::is_nightly_build()`should be preferred where possible, since
it will take into account `#![crate_name]` (I think).
- Added `unstable_features` to `rustdoc::RenderOptions`
I'm not sure whether this counts as T-compiler or T-lang; _technically_ RUSTC_BOOTSTRAP is an implementation detail, but it's been used so much it seems like this counts as a language change too.
r? `@joshtriplett`
cc `@Mark-Simulacrum` `@hsivonen`
This commit grepped for LLVM_VERSION_GE, LLVM_VERSION_LT, get_major_version and
min-llvm-version and statically evaluated every expression possible
(and sensible) assuming that the LLVM version is >=9 now