Commit Graph

64 Commits

Author SHA1 Message Date
bors
e4b9f86054 Auto merge of #109035 - scottmcm:ptr-read-should-know-undef, r=WaffleLapkin,JakobDegen
Ensure `ptr::read` gets all the same LLVM `load` metadata that dereferencing does

I was looking into `array::IntoIter` optimization, and noticed that it wasn't annotating the loads with `noundef` for simple things like `array::IntoIter<i32, N>`.  Trying to narrow it down, it seems that was because `MaybeUninit::assume_init_read` isn't marking the load as initialized (<https://rust.godbolt.org/z/Mxd8TPTnv>), which is unfortunate since that's basically its reason to exist.

The root cause is that `ptr::read` is currently implemented via the *untyped* `copy_nonoverlapping`, and thus the `load` doesn't get any type-aware metadata: no `noundef`, no `!range`.  This PR solves that by lowering `ptr::read(p)` to `copy *p` in MIR, for which the backends already do the right thing.

Fortuitiously, this also improves the IR we give to LLVM for things like `mem::replace`, and fixes a couple of long-standing bugs where `ptr::read` on `Copy` types was worse than `*`ing them.

Zulip conversation: <https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/Move.20array.3A.3AIntoIter.20to.20ManuallyDrop/near/341189936>

cc `@erikdesjardins` `@JakobDegen` `@workingjubilee` `@the8472`

Fixes #106369
Fixes #73258
2023-03-15 11:44:12 +00:00
Scott McMurray
dfc3377954 Split the mem-replace codegen test
Apparently in CI it's getting generated in the opposite order, one function per file will make the test pass either way.
2023-03-15 00:57:08 -07:00
Scott McMurray
e7c6ad89cf Improved implementation and comments after code review feedback 2023-03-14 22:24:28 -07:00
Matthias Krüger
39e1f810a9
Rollup merge of #109081 - krasimirgg:llvm-17-simd-wide-sum, r=nikic
simd-wide-sum test: adapt for LLVM 17 codegen change

After 0d4a709bb8 LLVM becomes more clever and turns ```@wider_reduce_loop``` into an alias:

https://buildkite.com/llvm-project/rust-llvm-integrate-prototype/builds/17806#0186da6b-582c-46bf-a227-1565fa0859ac/743-766

This adapts the test to prevent this.
2023-03-13 21:55:38 +01:00
Krasimir Georgiev
ed8dc5d817 simd-wide-sum test: adapt for LLVM 17 codegen change
After 0d4a709bb8
LLVM becomes more clever and turns `@wider_reduce_loop` into an alias:

https://buildkite.com/llvm-project/rust-llvm-integrate-prototype/builds/17806#0186da6b-582c-46bf-a227-1565fa0859ac/743-766

This adapts the test to prevent this.
2023-03-13 15:07:16 +00:00
bors
cf8d98b227 Auto merge of #108623 - scottmcm:try-different-as-slice-impl, r=the8472
Move `Option::as_slice` to an always-sound implementation

This approach depends on CSE to not have any branches or selects when the guessed offset is correct -- which it always will be right now -- but to also be *sound* (just less efficient) if the layout algorithms change such that the guess is incorrect.

The codegen test confirms that CSE handles this as expected, leaving the optimal codegen.

cc JakobDegen #108545
2023-03-13 13:53:24 +00:00
Scott McMurray
1f70bb8c43 Add a codegen test to confirm this fixes 73258 2023-03-12 13:23:22 -07:00
Scott McMurray
0b96fee343 Add a codegen test to confirm this fixes 106369 2023-03-12 12:57:40 -07:00
Scott McMurray
f6a57c1955 Move Option::as_slice to an always-sound implementation
This approach depends on CSE to not have any branches or selects when the guessed offset is correct -- which it always will be right now -- but to also be *sound* (just less efficient) if the layout algorithms change such that the guess is incorrect.
2023-03-11 20:29:26 -08:00
Scott McMurray
b2c717fa33 MaybeUninit::assume_init_read should have noundef load metadata
I was looking into `array::IntoIter` optimization, and noticed that it wasn't annotating the loads with `noundef` for simple things like `array::IntoIter<i32, N>`.

Turned out to be a more general problem as `MaybeUninit::assume_init_read` isn't marking the load as initialized (<https://rust.godbolt.org/z/Mxd8TPTnv>), which is unfortunate since that's basically its reason to exist.

This PR lowers `ptr::read(p)` to `copy *p` in MIR, which fortuitiously also improves the IR we give to LLVM for things like `mem::replace`.
2023-03-11 17:44:43 -08:00
bors
160c2ebeca Auto merge of #108763 - scottmcm:indexing-nuw-lengths, r=cuviper
Use `nuw` when calculating slice lengths from `Range`s

An `assume` would definitely not be worth it, but since the flag is almost free we might as well tell LLVM this, especially on `_unchecked` calls where there's no obvious way for it to deduce it.

(Today neither safe nor unsafe indexing gets it: <https://rust.godbolt.org/z/G1jYT548s>)
2023-03-07 13:17:59 +00:00
Scott McMurray
3554036280 Use nuw when calculating slice lengths from Ranges
An `assume` would definitely not be worth it, but since the flag is almost free we might as well tell LLVM this, especially on `_unchecked` calls where there's no obvious way for it to deduce it.

(Today neither safe nor unsafe indexing gets it: <https://rust.godbolt.org/z/G1jYT548s>)
2023-03-05 15:15:22 -08:00
bors
816f958ac3 Auto merge of #108157 - scottmcm:tuple-gt-via-partialcmp, r=dtolnay
Use `partial_cmp` to implement tuple `lt`/`le`/`ge`/`gt`

In today's implementation, `(A, B)::gt` contains calls to *both* `A::eq` *and* `A::gt`.

That's fine for primitives, but for things like `String`s it's kinda weird -- `(String, usize)::gt` has a call to both `bcmp` and `memcmp` (<https://rust.godbolt.org/z/7jbbPMesf>) because when `bcmp` says the `String`s aren't equal, it turns around and calls `memcmp` to find out which one's bigger.

This PR changes the implementation to instead implement `(A, …, C, Z)::gt` using `A::partial_cmp`, `…::partial_cmp`, `C::partial_cmp`, and `Z::gt`.  (And analogously for `lt`, `le`, and `ge`.)  That way expensive comparisons don't need to be repeated.

Technically this is an observable change on stable, so I've marked it `needs-fcp` + `T-libs-api` and will
r? rust-lang/libs-api

I'm hoping that this will be non-controversial, however, since it's very similar to the observable changes that were made to the derives (#81384 #98655) -- like those, this only changes behaviour if a type overrode behaviour in a way inconsistent with the rules for the various traits involved.

(The first commit here is #108156, adding the codegen test, which I used to make sure this doesn't regress behaviour for primitives.)

Zulip conversation about this change: <https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/.60.3E.60.20on.20Tuples/near/328392927>.
2023-03-05 22:02:26 +00:00
bors
864b6258fc Auto merge of #106673 - flba-eb:add_qnx_nto_stdlib, r=workingjubilee
Add support for QNX Neutrino to standard library

This change:

- adds standard library support for QNX Neutrino (7.1).
- upgrades `libc` to version `0.2.139` which supports QNX Neutrino

`@gh-tr`

⚠️ Backtraces on QNX require https://github.com/rust-lang/backtrace-rs/pull/507 which is not yet merged! (But everything else works without these changes) ⚠️

Tested mainly with a x86_64 virtual machine (see qnx-nto.md) and partially with an aarch64 hardware (some tests fail due to constrained resources).
2023-03-02 02:41:42 +00:00
bors
0b4ba4cf0e Auto merge of #108483 - scottmcm:unify-bytewise-eq-traits, r=the8472
Merge two different equality specialization traits in `core`

Arrays and slices each had their own version of this, without a matching set of `impl`s.

Merge them into one (still-`pub(crate)`) `cmp::BytewiseEq` trait, so we can stop doing all these things twice.

And that means that the `[T]::eq` → `memcmp` specialization picks up a bunch of types where that previously only worked for arrays, so examples like <https://rust.godbolt.org/z/KjsG8MGGT> will use it now instead of emitting loops.

r? the8472
2023-03-01 23:34:37 +00:00
Scott McMurray
44eec1d9b0 Merge two different equality specialization traits in core 2023-03-01 14:42:06 -08:00
bors
609496eecf Auto merge of #108446 - Zoxc:named-allocs, r=oli-obk
Name LLVM anonymous constants by a hash of their contents

This makes the names stable between different versions of a crate unlike the `AllocId` naming, making LLVM IR comparisons with `llvm-diff` more practical.
2023-03-01 15:36:15 +00:00
Andre Bogus
41da875fae Add Option::as_slice(_mut)
This adds the following functions:

* `Option<T>::as_slice(&self) -> &[T]`
* `Option<T>::as_slice_mut(&mut self) -> &[T]`

The `as_slice` and `as_slice_mut` functions benefit from an
optimization that makes them completely branch-free.

Note that the optimization's soundness hinges on the fact that either
the niche optimization makes the offset of the `Some(_)` contents zero
or the mempory layout of `Option<T>` is equal to that of
`Option<MaybeUninit<T>>`.
2023-03-01 00:05:31 +01:00
Florian Bartels
3ce2cd059f
Add QNX Neutrino support to libstd
Co-authored-by: gh-tr <troach@qnx.com>
2023-02-28 15:59:47 +01:00
John Kåre Alsaker
b897b2d65c Update tests 2023-02-25 21:43:25 +01:00
bors
7aa413d592 Auto merge of #107921 - cjgillot:codegen-overflow-check, r=tmiasko
Make codegen choose whether to emit overflow checks

ConstProp and DataflowConstProp currently have a specific code path not to propagate constants when they overflow. This is meant to have the correct behaviour when inlining from a crate with overflow checks (like `core`) into a crate compiled without.

This PR shifts the behaviour change to the `Assert(Overflow*)` MIR terminators: if the crate is compiled without overflow checks, just skip emitting the assertions. This is already what happens with `OverflowNeg`.

This allows ConstProp and DataflowConstProp to transform `CheckedBinaryOp(Add, u8::MAX, 1)` into `const (0, true)`, and let codegen ignore the `true`.

 The interpreter is modified to conform to this behaviour.

Fixes #35310
2023-02-19 18:17:26 +00:00
Camille GILLOT
c107e0e945 Fix codegen test. 2023-02-18 21:35:02 +00:00
Camille GILLOT
86dbcb5390 Add codegen test. 2023-02-18 21:35:02 +00:00
Michael Goulet
e82cc656c8 Make dyn* have the same scalar pair ABI as corresponding fat pointer 2023-02-18 19:47:34 +00:00
Michael Goulet
1f11d841b5 Add codegen test 2023-02-18 19:47:34 +00:00
bors
fabfd1fd93 Auto merge of #99679 - repnop:kernel-address-sanitizer, r=cuviper
Add `kernel-address` sanitizer support for freestanding targets

This PR adds support for KASan (kernel address sanitizer) instrumentation in freestanding targets. I included the minimal set of `x86_64-unknown-none`, `riscv64{imac, gc}-unknown-none-elf`, and `aarch64-unknown-none` but there's likely other targets it can be added to. (`linux_kernel_base.rs`?) KASan uses the address sanitizer attributes but has the `CompileKernel` parameter set to `true` in the pass creation.
2023-02-18 03:05:11 +00:00
Scott McMurray
680e21687d Use partial_cmp to implement tuple lt/le/ge/gt 2023-02-16 23:59:13 -08:00
Scott McMurray
dc37e37329 Add a codegen test for comparisons of 2-tuples of primitives
The operators are all overridden in full for tuples, so those parts pass easily, but they're worth pinning.

Going via `Ord::cmp`, though, doesn't optimize away for anything but `cmp`+`is_le`.  So this leaves `FIXME`s in the tests for the others.
2023-02-16 21:36:14 -08:00
bors
639377ed73 Auto merge of #107449 - saethlin:enable-copyprop, r=oli-obk
Enable CopyProp

r? `@tmiasko`

`@rustbot` label +A-mir-opt
2023-02-16 03:44:37 +00:00
Wesley Norris
19714385e0 Add kernel-address sanitizer support for freestanding targets 2023-02-14 20:54:25 -05:00
Ben Kimock
37a875cbdb Try to fix codegen tests for ??? LLVM 14 ??? 2023-02-14 19:49:49 -05:00
Ben Kimock
a82adf0125 Fix codegen tests 2023-02-14 19:21:58 -05:00
Matthias Krüger
a1ba861190
Rollup merge of #107573 - cuviper:drop-llvm-13, r=nagisa
Update the minimum external LLVM to 14

With this change, we'll have stable support for LLVM 14 through 16 (pending release).
For reference, the previous increase to LLVM 13 was #100460.
2023-02-14 18:24:40 +01:00
bors
2d91939bb7 Auto merge of #107634 - scottmcm:array-drain, r=thomcc
Improve the `array::map` codegen

The `map` method on arrays [is documented as sometimes performing poorly](https://doc.rust-lang.org/std/primitive.array.html#note-on-performance-and-stack-usage), and after [a question on URLO](https://users.rust-lang.org/t/try-trait-residual-o-trait-and-try-collect-into-array/88510?u=scottmcm) prompted me to take another look at the core [`try_collect_into_array`](7c46fb2111/library/core/src/array/mod.rs (L865-L912)) function, I had some ideas that ended up working better than I'd expected.

There's three main ideas in here, split over three commits:
1. Don't use `array::IntoIter` when we can avoid it, since that seems to not get SRoA'd, meaning that every step writes things like loop counters into the stack unnecessarily
2. Don't return arrays in `Result`s unnecessarily, as that doesn't seem to optimize away even with `unwrap_unchecked` (perhaps because it needs to get moved into a new LLVM type to account for the discriminant)
3. Don't distract LLVM with all the `Option` dances when we know for sure we have enough items (like in `map` and `zip`).  This one's a larger commit as to do it I ended up adding a new `pub(crate)` trait, but hopefully those changes are still straight-forward.

(No libs-api changes; everything should be completely implementation-detail-internal.)

It's still not completely fixed -- I think it needs pcwalton's `memcpy` optimizations still (#103830) to get further -- but this seems to go much better than before.  And the remaining `memcpy`s are just `transmute`-equivalent (`[T; N] -> ManuallyDrop<[T; N]>` and `[MaybeUninit<T>; N] -> [T; N]`), so hopefully those will be easier to remove with LLVM16 than the previous subobject copies 🤞

r? `@thomcc`

As a simple example, this test
```rust
pub fn long_integer_map(x: [u32; 64]) -> [u32; 64] {
    x.map(|x| 13 * x + 7)
}
```
On nightly <https://rust.godbolt.org/z/xK7548TGj> takes `sub rsp, 808`
```llvm
start:
  %array.i.i.i.i = alloca [64 x i32], align 4
  %_3.sroa.5.i.i.i = alloca [65 x i32], align 4
  %_5.i = alloca %"core::iter::adapters::map::Map<core::array::iter::IntoIter<u32, 64>, [closure@/app/example.rs:2:11: 2:14]>", align 8
```
(and yes, that's a 6**5**-element array `alloca` despite 6**4**-element input and output)

But with this PR it's only `sub rsp, 520`
```llvm
start:
  %array.i.i.i.i.i.i = alloca [64 x i32], align 4
  %array1.i.i.i = alloca %"core::mem::manually_drop::ManuallyDrop<[u32; 64]>", align 4
```

Similarly, the loop it emits on nightly is scalar-only and horrifying
```nasm
.LBB0_1:
        mov     esi, 64
        mov     edi, 0
        cmp     rdx, 64
        je      .LBB0_3
        lea     rsi, [rdx + 1]
        mov     qword ptr [rsp + 784], rsi
        mov     r8d, dword ptr [rsp + 4*rdx + 528]
        mov     edi, 1
        lea     edx, [r8 + 2*r8]
        lea     r8d, [r8 + 4*rdx]
        add     r8d, 7
.LBB0_3:
        test    edi, edi
        je      .LBB0_11
        mov     dword ptr [rsp + 4*rcx + 272], r8d
        cmp     rsi, 64
        jne     .LBB0_6
        xor     r8d, r8d
        mov     edx, 64
        test    r8d, r8d
        jne     .LBB0_8
        jmp     .LBB0_11
.LBB0_6:
        lea     rdx, [rsi + 1]
        mov     qword ptr [rsp + 784], rdx
        mov     edi, dword ptr [rsp + 4*rsi + 528]
        mov     r8d, 1
        lea     esi, [rdi + 2*rdi]
        lea     edi, [rdi + 4*rsi]
        add     edi, 7
        test    r8d, r8d
        je      .LBB0_11
.LBB0_8:
        mov     dword ptr [rsp + 4*rcx + 276], edi
        add     rcx, 2
        cmp     rcx, 64
        jne     .LBB0_1
```

whereas with this PR it's unrolled and vectorized
```nasm
	vpmulld	ymm1, ymm0, ymmword ptr [rsp + 64]
	vpaddd	ymm1, ymm1, ymm2
	vmovdqu	ymmword ptr [rsp + 328], ymm1
	vpmulld	ymm1, ymm0, ymmword ptr [rsp + 96]
	vpaddd	ymm1, ymm1, ymm2
	vmovdqu	ymmword ptr [rsp + 360], ymm1
```
(though sadly still stack-to-stack)
2023-02-13 10:18:48 +00:00
Ben Kimock
640ede7b0a Enable CopyProp by default, tune the impl a bit 2023-02-12 13:23:53 -05:00
Josh Stone
a06aaa4a9e Update the minimum external LLVM to 14 2023-02-10 16:06:25 -08:00
Matthias Krüger
8fc9ed51f0
Rollup merge of #107043 - Nilstrieb:true-and-false-is-false, r=wesleywiser
Support `true` and `false` as boolean flag params

Implements [MCP 577](https://github.com/rust-lang/compiler-team/issues/577).
2023-02-10 06:09:56 +01:00
Oleksii Lozovskyi
54b26f49e6 Test XRay only for supported targets
Now that the compiler accepts "-Z instrument-xray" option only when
targeting one of the supported targets, make sure to not run the
codegen tests where the compiler will fail.

Like with other compiletests, we don't have access to internals,
so simply hardcode a list of supported architectures here.
2023-02-09 12:29:43 +09:00
Oleksii Lozovskyi
0fef658ffe Codegen tests for -Z instrument-xray
Let's add at least some tests to verify that this option is accepted
and produces expected LLVM attributes. More tests can be added later
with attribute support.
2023-02-09 12:28:00 +09:00
Ralf Jung
1ef16874b5 also do not add noalias on not-Unpin Box 2023-02-06 12:17:41 +01:00
Ralf Jung
ea541bc2ee make &mut !Unpin not dereferenceable
See https://github.com/rust-lang/unsafe-code-guidelines/issues/381 for discussion.
2023-02-06 11:46:37 +01:00
Ralf Jung
201ae73872 make PointerKind directly reflect pointer types
The code that consumes PointerKind (`adjust_for_rust_scalar` in rustc_ty_utils)
ended up using PointerKind variants to talk about Rust reference types (& and
&mut) anyway, making the old code structure quite confusing: one always had to
keep in mind which PointerKind corresponds to which type. So this changes
PointerKind to directly reflect the type.

This does not change behavior.
2023-02-06 11:46:32 +01:00
Scott McMurray
bb77860d9c Add another autovectorization codegen test using array zip-map 2023-02-04 16:44:53 -08:00
Scott McMurray
5bc328fdef Allow canonicalizing the array::map loop in trusted cases 2023-02-04 16:44:51 -08:00
Scott McMurray
52df0558ea Stop forcing array::map through an unnecessary Result 2023-02-04 16:41:35 -08:00
Scott McMurray
5a7342c3dd Stop using into_iter in array::map 2023-02-04 16:41:35 -08:00
Matthias Krüger
c89bb159f6
Rollup merge of #107373 - michaelwoerister:dont-merge-vtables-when-debuginfo, r=WaffleLapkin
Don't merge vtables when full debuginfo is enabled.

This PR makes the compiler not emit the `unnamed_addr` attribute for vtables when full debuginfo is enabled, so that they don't get merged even if they have the same contents. This allows debuggers to more reliably map from a dyn pointer to the self-type of a trait object by looking at the vtable's debuginfo.

The PR only changes the behavior of the LLVM backend as other backends don't emit vtable debuginfo (as far as I can tell).

The performance impact of this change should be small as [measured](https://github.com/rust-lang/rust/pull/103514#issuecomment-1290833854) in a previous PR.
2023-01-28 05:20:19 +01:00
Matthias Krüger
7b78b6a78d
Rollup merge of #107022 - scottmcm:ordering-option-eq, r=m-ou-se
Implement `SpecOptionPartialEq` for `cmp::Ordering`

Noticed as I continue to explore options for having code using `partial_cmp` optimize better.

Before:
```llvm
; Function Attrs: mustprogress nofree nosync nounwind willreturn uwtable
define noundef zeroext i1 `@ordering_eq(i8` noundef %0, i8 noundef %1) unnamed_addr #0 {
start:
  %2 = icmp eq i8 %0, 2
  br i1 %2, label %bb1.i, label %bb3.i

bb1.i:                                            ; preds = %start
  %3 = icmp eq i8 %1, 2
  br label %"_ZN55_$LT$T$u20$as$u20$core..option..SpecOptionPartialEq$GT$2eq17hb7e7beacecde585fE.exit"

bb3.i:                                            ; preds = %start
  %.not.i = icmp ne i8 %1, 2
  %4 = icmp eq i8 %0, %1
  %spec.select.i = and i1 %.not.i, %4
  br label %"_ZN55_$LT$T$u20$as$u20$core..option..SpecOptionPartialEq$GT$2eq17hb7e7beacecde585fE.exit"

"_ZN55_$LT$T$u20$as$u20$core..option..SpecOptionPartialEq$GT$2eq17hb7e7beacecde585fE.exit": ; preds = %bb1.i, %bb3.i
  %.0.i = phi i1 [ %3, %bb1.i ], [ %spec.select.i, %bb3.i ]
  ret i1 %.0.i
}
```

After:
```llvm
; Function Attrs: mustprogress nofree norecurse nosync nounwind readnone willreturn uwtable
define noundef zeroext i1 `@ordering_eq(i8` noundef %0, i8 noundef %1) unnamed_addr #1 {
start:
  %2 = icmp eq i8 %0, %1
  ret i1 %2
}
```

(Which <https://alive2.llvm.org/ce/z/-rop5r> says LLVM *could* just do itself, but there's probably an issue already open for that problem from when this was originally looked at for `Option<NonZeroU8>` and friends.)
2023-01-28 05:20:15 +01:00
Michael Woerister
e5995e6168 Don't merge vtables when full debuginfo is enabled. 2023-01-27 15:29:04 +00:00
Erik Desjardins
009192b01b abi: add AddressSpace field to Primitive::Pointer
...and remove it from `PointeeInfo`, which isn't meant for this.

There are still various places (marked with FIXMEs) that assume all pointers
have the same size and alignment. Fixing this requires parsing non-default
address spaces in the data layout string, which will be done in a followup.
2023-01-22 23:41:39 -05:00