In https://reviews.llvm.org/D152095 we've investigated one reason why debugging information often goes wrong:
> LLVM can't handle IR where subprogram definitions are nested within DICompositeType when doing LTO builds,
> because there's no good way to cross the CU boundary to insert a nested DISubprogram definition in one CU into a type defined in another CU.
Add a new `compare_bytes` intrinsic instead of calling `memcmp` directly
As discussed in #113435, this lets the backends be the place that can have the "don't call the function if n == 0" logic, if it's needed for the target. (I didn't actually *add* those checks, though, since as I understood it we didn't actually need them on known targets?)
Doing this also let me make it `const` (unstable), which I don't think `extern "C" fn memcmp` can be.
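For illustration, a hedged sketch of how a caller might use such an intrinsic; the path and exact signature here are assumptions based on the description above, not the final API:
```rust
#![feature(core_intrinsics)]

// Assumed shape: memcmp-like ordering, with the "n == 0" question left to backends.
use std::intrinsics::compare_bytes;

pub fn bytes_equal(a: &[u8], b: &[u8]) -> bool {
    a.len() == b.len()
        // SAFETY: both pointers are valid for reads of `a.len()` bytes.
        && unsafe { compare_bytes(a.as_ptr(), b.as_ptr(), a.len()) } == 0
}
```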
cc `@RalfJung` `@Amanieu`
Nest other codegen test topics
This PR is like rust-lang/rust#114229 in that it mostly pushes codegen tests around, shoving them into their own directories, but because all of the changes are very simple cleanups I pulled them into a separate PR. The other PR might involve actually evaluating the correctness of the test after changes, but here it is mostly a matter of taste. The only "functional" change is deleting a few tests that... hinge on a version of LLVM that we don't support (as of rust-lang/rust#114148 anyway).
I considered a few different ways to group other topics, but the question of whether `tests/codegen/{vec,array,slice}` should exist is more subtle than these choices: it might be better to group such related tests by other topics, like bounds-check elision, so I avoided making that call here.
Nest tests/codegen/sanitizer*.rs tests in sanitizer dir
The sanitizer tests are the largest and most meticulously tested set of tests in tests/codegen. That's good! They all clearly belong to a subject and thus could go in a directory, but they are not; instead they sit directly in tests/codegen. That's bad! Fix this by placing them in their own directory and renaming them to be less repetitive after that move.
A few tests are brittle and embed their filename in the test's checks. This is acceptable for the ones where it is used only twice, but one test embeds its mangled filename in the checks *over 50 times*! This may have been one of the things discouraging anyone from moving it, and thus from moving the set. Fortunately, I have some knowledge of Itanium mangling (involuntarily), regex, and the FileCheck syntax. With a capturing variable, FileCheck now lets us move this test around without a ~50-line diff, while still guaranteeing that the mangled substring is the same each time.
This also clarifies why the substring is repeated a zillion times, instead of being cryptic. They don't call it mangling because the result is pretty and easy to understand, but now it is slightly easier! Yay descriptive variables!
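For illustration, a minimal sketch of the capture-and-reuse pattern (the symbols and test content are made up, not the actual sanitizer test):
```rust
// `[[SYM:regex]]` captures the mangled name the first time it matches, and a bare
// `[[SYM]]` later requires exactly the same string, so the mangled filename no
// longer has to be spelled out on every line.

// CHECK: call void @[[SYM:_ZN[a-zA-Z0-9_.$]+]]()
// CHECK: call void @[[SYM]]()
#[no_mangle]
pub fn entry() {
    helper();
    helper();
}

#[inline(never)]
fn helper() {
    std::hint::black_box(0u32);
}
```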
cleanup: remove pointee types
This can't be merged until the oldest LLVM version we support uses opaque pointers, which will be the case after #114148. (Also note `-Cllvm-args="-opaque-pointers=0"` can technically be used in LLVM 15, though I don't think we should support that configuration.)
I initially hoped this would provide some minor perf win, but in https://github.com/rust-lang/rust/pull/105412#issuecomment-1341224450 it had very little impact, so this is only valuable as a cleanup.
As a followup, this will enable #96242 to be resolved.
r? `@ghost`
`@rustbot` label S-blocked
In the basic case, simply do the string substitution.
For one case with many instances, capture the Itanium-
mangled filename and assert its reuse instead.
Combining revisions with only-arch allows specifying
that a test only applies to a handful of targets,
which removes a large amount of repetition
from tests that do not benefit from it.
Revisions are a poor fit for some tests,
so they aren't used in those cases.
Slightly more complicated: also give them appropriate names
that somewhat describe the cases they are trying to cover,
using information from PR chatter in rust-lang/rust#47158
Fix #[inline(always)] on closures with target feature 1.1
Fixes #108655. I think this is the most obvious solution that isn't overly complicated. The comment includes more justification, but I think this is likely better than demoting the `#[inline(always)]` to `#[inline]`, since existing code is unaffected.
Both GCC and Clang write a `.comment` section with compiler information by default:
```txt
$ gcc -c -xc /dev/null && readelf -p '.comment' null.o
String dump of section '.comment':
[ 1] GCC: (GNU) 11.2.0
$ clang -c -xc /dev/null && readelf -p '.comment' null.o
String dump of section '.comment':
[ 1] clang version 14.0.1 (https://github.com/llvm/llvm-project.git c62053979489ccb002efe411c3af059addcb5d7d)
```
They also implement the `-Qn` flag to avoid doing so:
```txt
$ gcc -Qn -c -xc /dev/null && readelf -p '.comment' null.o
readelf: Warning: Section '.comment' was not dumped because it does not exist!
$ clang -Qn -c -xc /dev/null && readelf -p '.comment' null.o
readelf: Warning: Section '.comment' was not dumped because it does not exist!
```
So far, `rustc` only does it for WebAssembly targets and only
when debug info is enabled:
```txt
$ echo 'fn main(){}' | rustc --target=wasm32-unknown-unknown --emit=llvm-ir -Cdebuginfo=2 - && grep llvm.ident rust_out.ll
!llvm.ident = !{!27}
```
In the RFC part of this PR it was decided to always add
the information, which gets us closer to other popular compilers.
An opt-out flag like GCC and Clang may be added later on if deemed
necessary.
Implementation-wise, this covers both `ModuleLlvm::new()` and
`ModuleLlvm::new_metadata()` cases by moving the addition to
`context::create_module` and adds a few test cases.
ThinLTO also sees the `llvm.ident` named metadata duplicated (in
temporary outputs), so this deduplicates it as is done for
`wasm.custom_sections`. The tests also check that this duplication does
not take place.
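A hedged sketch of what such a check might look like in a codegen test (the exact metadata string asserted here is an assumption):
```rust
// Capture the metadata node id once and require that same node to carry the ident.
// CHECK: !llvm.ident = !{![[ID:[0-9]+]]}
// CHECK: ![[ID]] = !{!"rustc version
fn main() {}
```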
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Turn copies into moves during DSE.
Dead store elimination computes whether removing a direct store to an unborrowed place is allowed.
Where removing a store is allowed, writing `uninit` is too.
This means that we can use this pass to transform `copy` operands into `move` operands. This is only interesting in call terminators, so we only handle those.
Special care is taken for the `use_both(_1, _1)` case (see the sketch after this list):
- moving the second argument is ok, as `_1` is not live after the call;
- moving the first argument is not, as the second argument reads `_1`.
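A minimal Rust sketch of that case (the type and names are stand-ins; the transformation itself happens on MIR operands):
```rust
#[derive(Clone, Copy)]
struct Big([u64; 64]);

fn use_both(_a: Big, _b: Big) {}

pub fn caller(x: Big) {
    // MIR (roughly): `use_both(copy _1, copy _1)`.
    // The second `copy _1` can become `move _1`, since `_1` is not live after the
    // call; the first cannot, because the second operand still reads `_1`.
    use_both(x, x);
}
```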
Fixes #75993
Fixes https://github.com/rust-lang/rust/issues/108068
r? `@RalfJung`
cc `@JakobDegen`
Resurrect: rustc_target: Add alignment to indirectly-passed by-value types, correcting the alignment of byval on x86 in the process.
Same as #111551, which I [accidentally closed](https://github.com/rust-lang/rust/pull/111551#issuecomment-1571222612) :/
---
This resurrects PR #103830, which has sat idle for a while.
Beyond #103830, this also:
- fixes byval alignment for types containing vectors on Darwin (see `tests/codegen/align-byval-vector.rs`)
- fixes byval alignment for overaligned types on x86 Windows (see `tests/codegen/align-byval.rs`)
- fixes ABI for types with 128bit requested alignment on ARM64 Linux (see `tests/codegen/aarch64-struct-align-128.rs`)
r? `@nikic`
---
`@pcwalton`'s original PR description is reproduced below:
Commit 88e4d2c291 from five years ago removed
support for alignment on indirectly-passed arguments because of problems with
the `i686-pc-windows-msvc` target. Unfortunately, the `memcpy` optimizations I
recently added to LLVM 16 depend on this to forward `memcpy`s. This commit
attempts to fix the problems with `byval` parameters on that target and now
correctly adds the `align` attribute.
The problem is summarized in [this comment] by `@eddyb`. Briefly, 32-bit x86 has
special alignment rules for `byval` parameters: for the most part, their
alignment is forced to 4. This is not well-documented anywhere but in the Clang
source. I looked at the logic in Clang `TargetInfo.cpp` and tried to replicate
it here. The relevant methods in that file are
`X86_32ABIInfo::getIndirectResult()` and
`X86_32ABIInfo::getTypeStackAlignInBytes()`. The `align` parameter attribute
for `byval` parameters in LLVM must match the platform ABI, or miscompilations
will occur. Note that this doesn't use the approach suggested by eddyb, because
I felt it was overkill to store the alignment in `on_stack` when special
handling is really only needed for 32-bit x86.
As a side effect, this should fix #80127, because it will make the `align`
parameter attribute for `byval` parameters match the platform ABI on LLVM
x86-64.
[this comment]: https://github.com/rust-lang/rust/pull/80822#issuecomment-829985417
Add Tests for native wasm exceptions
### Motivation
In PR #111322, I added support for native WASM exceptions. I was asked by `@davidtwco` to add some tests for it in a follow-up PR, which seems like a very good idea.
This PR adds three tests for this feature:
* codegen: ensure the correct LLVM instructions are used
* assembly: ensure the correct WASM instructions are used
* run-make: ensure the exception handling works; the WASM code is run using a small nodejs script which demonstrates the exception handling
### Complications
There are a few changes besides adding the tests which were necessary:
* Tests for the wasm32-unknown-unknown target are (as far as I know) only run on `test-various`. Its Docker image uses Node.js 15, which is very old. Experimental support for Wasm exceptions was added in Node.js 16; in Node.js 18.12 (LTS) they are stable.
- --> increase Node.js to 18.12 in `test-various`
* codegen/assembly tests are not performed for the wasm32-unknown-unknown target yet
- --> add those to `test-various` as well
Due to the last point, some tests are run which have not run before (assembly+codegen tests for wasm32-unknown-unknown). I added `// ignore-wasm32-bare` to those which failed.
### Local testing
I ran all tests locally using both `test-various` and `wasm32`. As far as I know, none of the other systems run any tests for wasm32 targets.
Various impl trait in assoc tys cleanups
r? `@compiler-errors`
All commits except for the last are pure refactorings. 274dab5bd658c97886a8987340bf50ae57900c39 allows struct fields to participate in deciding whether a function has an opaque in its signature.
best reviewed commit by commit
[libs] Simplify `unchecked_{shl,shr}`
There's no need for the `const_eval_select` dance here. And while I originally wrote the `.try_into().unwrap_unchecked()` implementation here, it's kinda a mess in MIR -- this new one is substantially simpler, as shown by the old one being above the inlining threshold but the new one being below it in the `mir-opt/inline/unchecked_shifts` tests.
We don't need `u32::checked_shl` doing a dance through both `Result` *and* `Option` 🙃
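A rough sketch of the simpler shape, assuming the intrinsic can be called directly (this is not the actual library code):
```rust
#![feature(core_intrinsics)]

// Instead of a const_eval_select between a checked and an unchecked path plus a
// `.try_into().unwrap_unchecked()`, just forward to the intrinsic.
pub unsafe fn unchecked_shl_sketch(x: u32, rhs: u32) -> u32 {
    // SAFETY: the caller guarantees `rhs < u32::BITS`.
    unsafe { std::intrinsics::unchecked_shl(x, rhs) }
}
```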
Fix codegen test suite for bare-metal-like targets
For Ferrocene I needed to run the test suite for a custom target with no unwinding and static relocations. Running the tests uncovered ~20 failures due to the test suite not accounting for these options. This PR fixes them by:
* Fixing `CHECK`s to account for functions having extra LLVM IR attributes (in this case `nounwind`).
* Fixing `CHECK`s to account for the `dso_local` LLVM IR modifier, which is [added to every item when relocation is static](f3d597b31c/compiler/rustc_codegen_llvm/src/mono_item.rs (L139-L142)).
* Fixing `CHECK`s to account for missing `uwtables` attributes.
* Adding the `needs-unwind` directive to tests that are designed to check unwinding.
There is no part of Rust CI that checks this unfortunately, and testing whether the PR works locally is kinda hard because you need a target with std enabled but no unwinding and static relocations. Still, this works in my local testing, and if future PRs accidentally break this Ferrocene will take care of sending followup PRs.
Enable ScalarReplacementOfAggregates in optimized builds
Like MatchBranchSimplification, this pass is known to produce significant runtime improvements in Cranelift artifacts, and I believe based on the perf runs here that the primary effect of this pass is to empower MatchBranchSimplification. ScalarReplacementOfAggregates on its own has little effect on anything, but when this was rebased up to include https://github.com/rust-lang/rust/pull/112001 we started seeing significant and majority-positive results.
Based on the fact that we see most of the regressions in debug builds (https://github.com/rust-lang/rust/pull/112002#issuecomment-1566270144) and some rather significant ones in cycles and wall time, I'm only enabling this in optimized builds at the moment.
These tend to have special handling in a bunch of places anyway, so the variant helps remember that. And I think it's easier to grok than non-Scalar Aggregates sometimes being `Immediate`s (like I got wrong and caused #109992). As a minor bonus, it means we don't need to generate poison LLVM values for them to pass around in `OperandValue::Immediate`s.
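A hedged sketch of the idea (this is not the actual rustc enum, just its shape):
```rust
// ZSTs get their own variant instead of being smuggled through `Immediate` with a
// poison value.
pub enum OperandValueSketch<V> {
    Immediate(V),
    Pair(V, V),
    Ref(V),
    ZeroSized, // no backing LLVM value needed
}
```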
Only rewrite valtree-constants to patterns and keep other constants opaque
Now that we can reliably fall back to comparing constants with `PartialEq::eq` to the match scrutinee, we can
1. eagerly try to convert constants to valtrees
2. then deeply convert the valtree to a pattern
3. if the to-valtree conversion fails, create an "opaque constant" pattern.
This PR specifically avoids any behavioral changes or major cleanups. What we can now do as follow ups is
* move the two remaining call sites to `destructure_mir_constant` off that query
* make valtree to pattern conversion infallible
* this needs to be done after careful analysis of the effects. There may be user visible changes from that.
based on https://github.com/rust-lang/rust/pull/111768
Optimize scalar and scalar pair representations loaded from ByRef in llvm
In https://github.com/rust-lang/rust/pull/105653 I noticed that we were generating suboptimal LLVM IR if we had a `ConstValue::ByRef` that could be represented by a `ScalarPair`. Before that PR this was probably rare, but after it every slice goes down this suboptimal code path, which requires LLVM to untangle a bunch of indirections and to translate static allocations that are only used once into a scalar pair read.
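An example of the kind of value this affects; assuming a slice constant is represented as a (pointer, length) `ScalarPair`, it can now be loaded directly instead of going through a by-ref static allocation:
```rust
#[no_mangle]
pub fn get() -> &'static [u8] {
    // A slice constant: conceptually a (pointer, length) scalar pair.
    b"hello"
}
```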
`[T; N]::zip` is "eager" but most zips are mapped.
This causes poor optimization in generated code.
This is a fundamental design issue and "zip" is
"prime real estate" in terms of function names,
so let's free it up again.
Adds support for LLVM [SafeStack] which provides backward edge control
flow protection by separating the stack into two parts: data which is
only accessed in provable safe ways is allocated on the normal stack
(the "safe stack") and all other data is placed in a separate allocation
(the "unsafe stack").
SafeStack support is enabled by passing `-Zsanitizer=safestack`.
[SafeStack]: https://clang.llvm.org/docs/SafeStack.html
Support #[global_allocator] without the allocator shim
This makes it possible to use liballoc/libstd in combination with `--emit obj` if you use `#[global_allocator]`. This is what rust-for-linux uses right now and systemd may use in the future. Currently they have to depend on the exact implementation of the allocator shim to create one themself as `--emit obj` doesn't create an allocator shim.
Note that currently the allocator shim also defines the oom error handler, which is normally required too. Once `#![feature(default_alloc_error_handler)]` becomes the only option, this can be avoided. In addition when using only fallible allocator methods and either `--cfg no_global_oom_handling` for liballoc (like rust-for-linux) or `--gc-sections` no references to the oom error handler will exist.
To keep this feature from being insta-stable, you will have to define `__rust_no_alloc_shim_is_unstable` yourself to avoid linker errors.
(Labeling this with both T-compiler and T-lang as it originally involved both an implementation detail and had an insta-stable user facing change. As noted above, the `__rust_no_alloc_shim_is_unstable` symbol requirement should prevent unintended dependence on this unstable feature.)
CFI: Fix encode_region: unexpected ReEarlyBound(0, 'a)
Fixes#111515 and complements #106547 by adding support for encoding early bound regions and also excluding projections when transforming trait objects' traits into their identities before emitting type checks.
Fix some issues with folded AArch64 features
In #91608 the `fp` feature was removed for AArch64 and folded into the `neon` feature; however, disabling the `neon` feature doesn't actually disable the `fp` feature. If my understanding of that thread is correct, it should.
While doing this, I also noticed that disabling some features would disable features that it shouldn't: for instance, enabling `sve` enables `neon`, but disabling `sve` would then also disable `neon`, which I wouldn't expect.
cc `@workingjubilee`
When compiling with panic=abort (or using a target that doesn't have
unwinding support), the compiler adds the "nounwind" attribute to
functions. This results in different LLVM IR, with a #NNN attribute group
reference added after the function call:
tail call void @bar() #13, !dbg !467
attributes #13 = { nounwind }
...instead of:
tail call void @bar(), !dbg !467
This commit changes the matchers to swallow the #NNN, as it's not needed
for these specific tests.
Rollup of 6 pull requests
Successful merges:
- #111461 (Fix symbol conflict diagnostic mistakenly being shown instead of missing crate diagnostic)
- #111579 (Also assume wrap-around discriminants in `as` MIR building)
- #111704 (Remove return type sized check hack from hir typeck)
- #111853 (Check opaques for mismatch during writeback)
- #111854 (rustdoc: clean up `settings.css`)
- #111860 (Don't ICE if method receiver fails to unify with `arbitrary_self_types`)
r? `@ghost`
`@rustbot` modify labels: rollup
[rustc_ty_utils] Treat `drop_in_place`'s *mut argument like &mut when adding LLVM attributes
This resurrects PR #103614, which has sat idle for a while.
This could probably use a new perf run, since we're on a new LLVM version now.
r? `@oli-obk`
cc `@RalfJung`
---
LLVM can make use of the `noalias` parameter attribute on the parameter to `drop_in_place` in areas like argument promotion. Because the Rust compiler fully controls the code for `drop_in_place`, it can soundly deduce parameter attributes on it.
In #103957, Miri was changed to retag `drop_in_place`'s argument as if it was `&mut`, matching this change.
Fix duplicate `arcinner_layout_for_value_layout` calls when using the uninit `Arc` constructors
What this fixes is the duplicate calls to `arcinner_layout_for_value_layout` seen here: https://godbolt.org/z/jr5Gxozhj
The issue was discovered alongside #111603 but is otherwise unrelated to the duplicate `alloca`s, which remain unsolved. Everything I tried to solve said main issue has failed.
As for the duplicate layout calculations, I also tried slapping `#[inline]` and `#[inline(always)]` on everything in sight but the only thing that worked in the end is to dedup the calls by hand.
Rather than returning an array of features from `to_llvm_features`, return a structure that
contains the dependencies. This also contains metadata on how the features depend on each
other, to allow for correct enabling and disabling.
Some features that are tied together only make sense to be folded
together when enabling the feature. For example on AArch64 sve and
neon are tied together, however it doesn't make sense to disable neon
when disabling sve.
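A hedged stand-in for the shape described in these two commits (the real rustc types and names differ):
```rust
pub struct LLVMFeature {
    pub name: &'static str,
    // Companions to enable together with this feature; they are intentionally NOT
    // disabled again when this feature is disabled (e.g. aarch64 "sve" -> "neon").
    pub implied_on_enable: &'static [&'static str],
}

pub fn to_llvm_features_sketch(rustc_feature: &str) -> LLVMFeature {
    match rustc_feature {
        "sve" => LLVMFeature { name: "sve", implied_on_enable: &["neon"] },
        other => LLVMFeature { name: other, implied_on_enable: &[] },
    }
}
```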
CFI: Fix encode_ty: unexpected Param(B/#1)
Fixes#111510 and complements #106547 by adding support for encoding type parameters and also by transforming trait objects' traits into their identities before emitting type checks.
The CHECK, -NOT, -SAME pattern ensures that the `CHECK-NOT: noalias`
is limited to only one line, and won't match unrelated lines further
down in the file.
Explicit drop call added to preserve the `foo` argument name, since
names of unused arguments are not preserved.
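A miniature of that directive interplay (the IR text and symbol below are made up, not the real test):
```rust
// Because the next positive directive is CHECK-SAME, the CHECK-NOT can only reject
// `noalias` on the single line matched by the preceding CHECK.

// CHECK: call void @take_ownership(ptr
// CHECK-NOT: noalias
// CHECK-SAME: align 4
```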
We've done measurements with Miri and have determined that `noalias` shouldn't
break code. The requirements that allow us to add dereferenceable and align
have been long documented in the standard library documentation.
LLVM can make use of the `noalias` parameter attribute on the parameter to
`drop_in_place` in areas like argument promotion. Because the Rust compiler
fully controls the code for `drop_in_place`, it can soundly deduce parameter
attributes on it. In the case of a value that has a programmer-defined Drop
implementation, we know that the first thing `drop_in_place` will do is pass a
pointer to the object to `Drop::drop`. `Drop::drop` takes `&mut`, so it must be
guaranteed that there are no pointers to the object upon entering that
function. Therefore, it should be safe to mark `noalias` there.
With this patch, we mark `noalias` only when the type is a value with a
programmer-defined Drop implementation. This is possibly overly conservative,
but I thought that proceeding cautiously was best in this instance.
Stop checking for the absence of something that doesn't exist
A couple of codegen tests are doing
```
// CHECK-NOT: slice_index_len_fail
```
However, that function no longer exists: [the only places](https://github.com/search?q=repo%3Arust-lang%2Frust+slice_index_len_fail&type=code) it occurs in the repo are in those tests.
So this PR updates the tests to check for the absence of the functions that are actually used today to panic for out-of-bounds indexing.
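A hedged sketch of the updated shape (assuming `-O` and that these are the entry points used today; the function itself is made up):
```rust
// compile-flags: -O

// CHECK-LABEL: @sum_prefix
// CHECK-NOT: panic_bounds_check
// CHECK-NOT: slice_start_index_len_fail
// CHECK-NOT: slice_end_index_len_fail
#[no_mangle]
pub fn sum_prefix(x: &[u32; 8]) -> u32 {
    // The range is a constant in bounds, so no panic path should be emitted.
    x[..4].iter().sum()
}
```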
allow mutating function args through `&raw const`
Fixes https://github.com/rust-lang/rust/issues/111502 by "turning off the sketchy optimization while we figure out if this is ok", like `@JakobDegen` said.
The first commit in this PR removes some suspicious looking logic from the same method, but should have no functional changes, since it doesn't modify the `context` outside of the method. Best reviewed commit by commit.
r? opsem
CFI: Fix SIGILL reached via trait objects
Fix #106547 by transforming the concrete self into a reference to a trait object before emitting type metadata identifiers for trait methods.
You will need to add the following as a replacement for the old `__rust_*`
definitions when not using the alloc shim:
#[no_mangle]
static __rust_no_alloc_shim_is_unstable: u8 = 0;
vec-shrink-panik: update expectations to work on LLVM 17
For some reason, the called function is `cleanup` on LLVM 17 instead of `filter`.
r? `@Amanieu`
Remove some `assume`s from slice iterators that don't do anything
Because the start pointer in slice iterators is already a `NonNull`, we emit the appropriate `!nonnull` metadata when loading the pointer, telling LLVM that it's non-null.
Probably the best way to see that it's the metadata that's important (and not the `assume`) is to observe that LLVM actually *removes* the `assume` from the optimized IR: <https://rust.godbolt.org/z/KhE6G963n>.
(I also checked that, yes, the if-not-ZST `assume` on `end` is still doing something: it's how there's a `!nonnull` metadata on its load, even though it's an ordinary raw pointer. The codegen test added in this PR fails if the other `assume` is removed.)
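A rough illustration of why the `assume` is redundant for the start pointer (the struct here is a stand-in, not the real iterator):
```rust
use std::ptr::NonNull;

pub struct Iterish<T> {
    pub ptr: NonNull<T>, // non-null by construction
    pub end: *const T,   // ordinary raw pointer; this one still benefits from the assume
}

pub fn start(it: &Iterish<u8>) -> *const u8 {
    // Loading `it.ptr` already carries `!nonnull` metadata from the type alone.
    it.ptr.as_ptr()
}
```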
Rollup of 6 pull requests
Successful merges:
- #104070 (Prevent aborting guard from aborting the process in a forced unwind)
- #109410 (Introduce `AliasKind::Inherent` for inherent associated types)
- #111004 (Migrate `mir_transform` to translatable diagnostics)
- #111118 (Suggest struct when we get colon in fileds in enum)
- #111170 (Diagnostic args are still args if they're documented)
- #111354 (Fix miscompilation when calling default methods on `Future`)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Add cross-language LLVM CFI support to the Rust compiler
This PR adds cross-language LLVM Control Flow Integrity (CFI) support to the Rust compiler by adding the `-Zsanitizer-cfi-normalize-integers` option to be used with Clang `-fsanitize-cfi-icall-normalize-integers` for normalizing integer types (see https://reviews.llvm.org/D139395).
It provides forward-edge control flow protection for C or C++ and Rust-compiled code "mixed binaries" (i.e., for when C or C++ and Rust-compiled code share the same virtual address space). For more information about LLVM CFI and cross-language LLVM CFI support for the Rust compiler, see the design document in the tracking issue #89653.
Cross-language LLVM CFI can be enabled with `-Zsanitizer=cfi` and `-Zsanitizer-cfi-normalize-integers`, and requires proper (i.e., non-rustc) LTO (i.e., `-Clinker-plugin-lto`).
Thank you again, `@bjorn3`, `@nikic`, `@samitolvanen`, and the Rust community for all the help!
Improve niche placement by trying two strategies and picking the better result
Fixes #104807. Fixes #105371.
Determining which sort order is better requires calculating the struct size (so we can calculate the niche offset). But that in turn depends on the field order, so happens after sorting. So the simple way to solve that is to run the whole thing twice and pick the better result.
1st commit is just code motion, the meat is in the later ones.
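A hedged sketch of the run-twice-and-pick approach, with stand-in types (a "layout" is just a (size, niche_size) pair here; the real code works on rustc's layout machinery):
```rust
use std::cmp::Reverse;

pub enum FieldOrder { NicheFirst, NicheLast }

pub fn pick_layout(compute: impl Fn(FieldOrder) -> (u64, u64)) -> (u64, u64) {
    let a = compute(FieldOrder::NicheFirst);
    let b = compute(FieldOrder::NicheLast);
    // Prefer the larger niche; tie-break on the smaller total size.
    std::cmp::max_by_key(a, b, |&(size, niche)| (niche, Reverse(size)))
}
```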
Add `intrinsics::transmute_unchecked`
This takes a whole 3 lines in `compiler/` since it lowers to `CastKind::Transmute` in MIR *exactly* the same as the existing `intrinsics::transmute` does, it just doesn't have the fancy checking in `hir_typeck`.
Added to enable experimenting with the request in <https://github.com/rust-lang/rust/pull/106281#issuecomment-1496648190> and because the portable-simd folks might be interested for dependently-sized array-vector conversions.
It also simplifies a couple places in `core`.
See also https://github.com/rust-lang/rust/pull/108442#issuecomment-1474777273, where `CastKind::Transmute` was added having exactly these semantics before the lang meeting (which I wasn't in) independently expressed interest.
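A hedged example of the dependently-sized case that motivated this, assuming `transmute_unchecked` is reachable via `core_intrinsics` (not the portable-simd code itself):
```rust
#![feature(core_intrinsics)]
use std::intrinsics::transmute_unchecked;
use std::mem::MaybeUninit;

// The sizes match for every `N`, but `mem::transmute` rejects generically sized
// types at type-check time; `transmute_unchecked` leaves that to the caller.
pub fn to_uninit<T, const N: usize>(a: [T; N]) -> [MaybeUninit<T>; N] {
    // SAFETY: `[T; N]` and `[MaybeUninit<T>; N]` have the same size and layout.
    unsafe { transmute_unchecked(a) }
}
```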
ci: add a runner for vanilla LLVM 16
Like #107044, this will let us track compatibility with LLVM 16 going
forward, especially after we eventually upgrade our own to the next.
This also drops `tidy` here and in `x86_64-gnu-llvm-15`, syncing with
that change in #106085.
tests: adapt for LLVM change 5b386b864c7619897c51a1da97d78f1cf6f3eff6
The above-mentioned change modified the output of thread-local.rs by changing some variable names. Rather than assume things get put in %0, we capture the variable so the test passes in both the old and new version.
Permit MIR inlining without #[inline]
I noticed that there are at least a handful of portable-simd functions that have no `#[inline]` but compile to an assign + return.
I locally benchmarked inlining thresholds between 0 and 50 in increments of 5, and 50 seems to be the best. Interesting. That didn't include check builds though, ~maybe perf will have something to say about that~.
Perf has little useful to say about this. We generally regress all the check builds, as best as I can tell, due to a number of small codegen changes in a particular hot function in the compiler. Probably this is because we've nudged the inlining outcomes all over, and uses of `#[inline(always)]`/`#[inline(never)]` might need to be adjusted.
Preserve argument indexes when inlining MIR
We store argument indexes on VarDebugInfo. Unlike the previous method of relying on the variable index to know whether a variable is an argument, this survives MIR inlining.
We also no longer check if var.source_info.scope is the outermost scope. When a function gets inlined, the arguments to the inner function will no longer be in the outermost scope. What we care about though is whether they were in the outermost scope prior to inlining, which we know by whether we assigned an argument index.
Fixes#83217
I considered using `Option<NonZeroU16>` instead of `Option<u16>` to store the index. I didn't because `TypeFoldable` isn't implemented for `NonZeroU16` and because, due to padding, it looks like it currently wouldn't make any difference. But I indexed from 1 anyway because (a) it'll make it easier if it later becomes worthwhile to use a `NonZeroU16`, and (b) the arguments were previously indexed from 1, so it made for a smaller change.
This is my first PR on rust-lang/rust, so apologies if I've gotten anything not quite right.
These don't optimize with debug assertions. For one of them, this
is due to the new alignment checks, for the other I'm not sure
what specifically blocks it.
Use SipHash-1-3 instead of SipHash-2-4 for StableHasher
Noticed this, and it seems easy and likely a perf win. IIUC we don't need DoS resistance (just collision resistance), so ideally we would have an even faster hash, but it's hard to beat this SipHash impl here, since it's been so highly tuned for the interface.
It wouldn't surprise me if there's some subtle reason changing this sucks, as it's so obvious it seems likely to have been done. Still, SipHash-1-3 seems to still have the guarantees StableHasher should need (and seemingly more), and is clearly less work. So it's worth a shot.
Not fully tested locally.
Validate `ignore` and `only` compiletest directive, and add human-readable ignore reasons
This PR adds strict validation for the `ignore` and `only` compiletest directives, failing if an unknown value is provided to them. Doing so uncovered 79 tests in `tests/ui` that had invalid directives, so this PR also fixes them.
Finally, this PR adds human-readable ignore reasons when tests are ignored due to `ignore` or `only` directives, like *"only executed when the architecture is aarch64"* or *"ignored when the operative system is windows"*. This was the original reason why I started working on this PR and #108659, as we need both of them for Ferrocene.
The PR is a draft because the code is extremely inefficient: it calls `rustc --print=cfg --target $target` for every rustc target (to gather the list of allowed ignore values), which on my system takes between 4s and 5s, and performs a lot of allocations of constant values. I'll fix both of them in the coming days.
r? `@ehuss`
Allow `transmute`s to produce `OperandValue`s instead of needing `alloca`s
LLVM can usually optimize these away, but especially for things like transmutes of newtypes it's silly to generate the `alloc`+`store`+`load` at all when it's actually a nop at LLVM level.
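An example of the newtype case mentioned above; with this change the value can stay in SSA form instead of taking an `alloca`+`store`+`load` round trip:
```rust
#[repr(transparent)]
pub struct Wrapper(pub u32);

#[no_mangle]
pub fn wrap(x: u32) -> Wrapper {
    // A no-op at the LLVM level.
    unsafe { std::mem::transmute(x) }
}
```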
`-Cdebuginfo=1` was never line-tables-only and
can't be, due to backwards-compatibility issues.
This was clarified, and an option for line tables only
was added. Additionally, an option for line directives
only was added, which is much needed for some targets.
The debug info options should now behave the same as
Clang's debug info options.
Insert alignment checks for pointer dereferences when debug assertions are enabled
Closes https://github.com/rust-lang/rust/issues/54915
- [x] Jake tells me this sounds like a place to use `MirPatch`, but I can't figure out how to insert a new basic block with a new terminator in the middle of an existing basic block, using `MirPatch`. (if nobody else backs up this point I'm checking this as "not actually a good idea" because the code looks pretty clean to me after rearranging it a bit)
- [x] Using `CastKind::PointerExposeAddress` is definitely wrong, we don't want to expose. Calling a function to get the pointer address seems quite excessive. ~I'll see if I can add a new `CastKind`.~ `CastKind::Transmute` to the rescue!
- [x] Implement a more helpful panic message like slice bounds checking.
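A user-level sketch of what the inserted check amounts to (the real check is emitted as MIR during a transform pass, not written in source like this):
```rust
pub unsafe fn read_u32(p: *const u32) -> u32 {
    // Debug-assertions-only: reject misaligned pointers before the dereference.
    debug_assert!(
        p as usize % std::mem::align_of::<u32>() == 0,
        "misaligned pointer dereference"
    );
    unsafe { *p }
}
```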
r? `@oli-obk`
Upgrade to LLVM 16, again
Relative to the previous attempt in https://github.com/rust-lang/rust/pull/107224:
* Update to GCC 8.5 on dist-x86_64-linux, to avoid std::optional ABI-incompatibility between libstdc++ 7 and 8.
* Cherry-pick 96df79af02.
* Cherry-pick 6fc670e5e3.
r? `@cuviper`
Use poison instead of undef
In cases where it is legal, we should prefer poison values over undef values.
This replaces undef with poison for aggregate construction and for uninhabited types. There are more places where we can likely use poison, but I wanted to stay conservative to start with.
In particular the aggregate case is important for newer LLVM versions, which are not able to handle an undef base value during early optimization due to poison-propagation concerns.
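A minimal example of the aggregate-construction case, assuming the pair is returned as an immediate scalar pair:
```rust
#[no_mangle]
pub fn make_pair(a: u32, b: u64) -> (u32, u64) {
    // Conceptually built as insertvalue(insertvalue(poison, a, 0), b, 1)
    // rather than starting from undef.
    (a, b)
}
```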
r? `@cuviper`
Updates `interpret`, `codegen_ssa`, and `codegen_cranelift` to consume the new cast instead of the intrinsic.
Includes `CastTransmute` for custom MIR building, to be able to test the extra UB.