Commit Graph

30 Commits

Author SHA1 Message Date
Scott McMurray
dc37e37329 Add a codegen test for comparisons of 2-tuples of primitives
The operators are all overridden in full for tuples, so those parts pass easily, but they're worth pinning.

Going via `Ord::cmp`, though, doesn't optimize away for anything but `cmp`+`is_le`.  So this leaves `FIXME`s in the tests for the others.
2023-02-16 21:36:14 -08:00
bors
2d91939bb7 Auto merge of #107634 - scottmcm:array-drain, r=thomcc
Improve the `array::map` codegen

The `map` method on arrays [is documented as sometimes performing poorly](https://doc.rust-lang.org/std/primitive.array.html#note-on-performance-and-stack-usage), and after [a question on URLO](https://users.rust-lang.org/t/try-trait-residual-o-trait-and-try-collect-into-array/88510?u=scottmcm) prompted me to take another look at the core [`try_collect_into_array`](7c46fb2111/library/core/src/array/mod.rs (L865-L912)) function, I had some ideas that ended up working better than I'd expected.

There's three main ideas in here, split over three commits:
1. Don't use `array::IntoIter` when we can avoid it, since that seems to not get SRoA'd, meaning that every step writes things like loop counters into the stack unnecessarily
2. Don't return arrays in `Result`s unnecessarily, as that doesn't seem to optimize away even with `unwrap_unchecked` (perhaps because it needs to get moved into a new LLVM type to account for the discriminant)
3. Don't distract LLVM with all the `Option` dances when we know for sure we have enough items (like in `map` and `zip`).  This one's a larger commit as to do it I ended up adding a new `pub(crate)` trait, but hopefully those changes are still straight-forward.

(No libs-api changes; everything should be completely implementation-detail-internal.)

It's still not completely fixed -- I think it needs pcwalton's `memcpy` optimizations still (#103830) to get further -- but this seems to go much better than before.  And the remaining `memcpy`s are just `transmute`-equivalent (`[T; N] -> ManuallyDrop<[T; N]>` and `[MaybeUninit<T>; N] -> [T; N]`), so hopefully those will be easier to remove with LLVM16 than the previous subobject copies 🤞

r? `@thomcc`

As a simple example, this test
```rust
pub fn long_integer_map(x: [u32; 64]) -> [u32; 64] {
    x.map(|x| 13 * x + 7)
}
```
On nightly <https://rust.godbolt.org/z/xK7548TGj> takes `sub rsp, 808`
```llvm
start:
  %array.i.i.i.i = alloca [64 x i32], align 4
  %_3.sroa.5.i.i.i = alloca [65 x i32], align 4
  %_5.i = alloca %"core::iter::adapters::map::Map<core::array::iter::IntoIter<u32, 64>, [closure@/app/example.rs:2:11: 2:14]>", align 8
```
(and yes, that's a 6**5**-element array `alloca` despite 6**4**-element input and output)

But with this PR it's only `sub rsp, 520`
```llvm
start:
  %array.i.i.i.i.i.i = alloca [64 x i32], align 4
  %array1.i.i.i = alloca %"core::mem::manually_drop::ManuallyDrop<[u32; 64]>", align 4
```

Similarly, the loop it emits on nightly is scalar-only and horrifying
```nasm
.LBB0_1:
        mov     esi, 64
        mov     edi, 0
        cmp     rdx, 64
        je      .LBB0_3
        lea     rsi, [rdx + 1]
        mov     qword ptr [rsp + 784], rsi
        mov     r8d, dword ptr [rsp + 4*rdx + 528]
        mov     edi, 1
        lea     edx, [r8 + 2*r8]
        lea     r8d, [r8 + 4*rdx]
        add     r8d, 7
.LBB0_3:
        test    edi, edi
        je      .LBB0_11
        mov     dword ptr [rsp + 4*rcx + 272], r8d
        cmp     rsi, 64
        jne     .LBB0_6
        xor     r8d, r8d
        mov     edx, 64
        test    r8d, r8d
        jne     .LBB0_8
        jmp     .LBB0_11
.LBB0_6:
        lea     rdx, [rsi + 1]
        mov     qword ptr [rsp + 784], rdx
        mov     edi, dword ptr [rsp + 4*rsi + 528]
        mov     r8d, 1
        lea     esi, [rdi + 2*rdi]
        lea     edi, [rdi + 4*rsi]
        add     edi, 7
        test    r8d, r8d
        je      .LBB0_11
.LBB0_8:
        mov     dword ptr [rsp + 4*rcx + 276], edi
        add     rcx, 2
        cmp     rcx, 64
        jne     .LBB0_1
```

whereas with this PR it's unrolled and vectorized
```nasm
	vpmulld	ymm1, ymm0, ymmword ptr [rsp + 64]
	vpaddd	ymm1, ymm1, ymm2
	vmovdqu	ymmword ptr [rsp + 328], ymm1
	vpmulld	ymm1, ymm0, ymmword ptr [rsp + 96]
	vpaddd	ymm1, ymm1, ymm2
	vmovdqu	ymmword ptr [rsp + 360], ymm1
```
(though sadly still stack-to-stack)
2023-02-13 10:18:48 +00:00
Matthias Krüger
8fc9ed51f0
Rollup merge of #107043 - Nilstrieb:true-and-false-is-false, r=wesleywiser
Support `true` and `false` as boolean flag params

Implements [MCP 577](https://github.com/rust-lang/compiler-team/issues/577).
2023-02-10 06:09:56 +01:00
Oleksii Lozovskyi
54b26f49e6 Test XRay only for supported targets
Now that the compiler accepts "-Z instrument-xray" option only when
targeting one of the supported targets, make sure to not run the
codegen tests where the compiler will fail.

Like with other compiletests, we don't have access to internals,
so simply hardcode a list of supported architectures here.
2023-02-09 12:29:43 +09:00
Oleksii Lozovskyi
0fef658ffe Codegen tests for -Z instrument-xray
Let's add at least some tests to verify that this option is accepted
and produces expected LLVM attributes. More tests can be added later
with attribute support.
2023-02-09 12:28:00 +09:00
Ralf Jung
1ef16874b5 also do not add noalias on not-Unpin Box 2023-02-06 12:17:41 +01:00
Ralf Jung
ea541bc2ee make &mut !Unpin not dereferenceable
See https://github.com/rust-lang/unsafe-code-guidelines/issues/381 for discussion.
2023-02-06 11:46:37 +01:00
Ralf Jung
201ae73872 make PointerKind directly reflect pointer types
The code that consumes PointerKind (`adjust_for_rust_scalar` in rustc_ty_utils)
ended up using PointerKind variants to talk about Rust reference types (& and
&mut) anyway, making the old code structure quite confusing: one always had to
keep in mind which PointerKind corresponds to which type. So this changes
PointerKind to directly reflect the type.

This does not change behavior.
2023-02-06 11:46:32 +01:00
Scott McMurray
bb77860d9c Add another autovectorization codegen test using array zip-map 2023-02-04 16:44:53 -08:00
Scott McMurray
5bc328fdef Allow canonicalizing the array::map loop in trusted cases 2023-02-04 16:44:51 -08:00
Scott McMurray
52df0558ea Stop forcing array::map through an unnecessary Result 2023-02-04 16:41:35 -08:00
Scott McMurray
5a7342c3dd Stop using into_iter in array::map 2023-02-04 16:41:35 -08:00
Matthias Krüger
c89bb159f6
Rollup merge of #107373 - michaelwoerister:dont-merge-vtables-when-debuginfo, r=WaffleLapkin
Don't merge vtables when full debuginfo is enabled.

This PR makes the compiler not emit the `unnamed_addr` attribute for vtables when full debuginfo is enabled, so that they don't get merged even if they have the same contents. This allows debuggers to more reliably map from a dyn pointer to the self-type of a trait object by looking at the vtable's debuginfo.

The PR only changes the behavior of the LLVM backend as other backends don't emit vtable debuginfo (as far as I can tell).

The performance impact of this change should be small as [measured](https://github.com/rust-lang/rust/pull/103514#issuecomment-1290833854) in a previous PR.
2023-01-28 05:20:19 +01:00
Matthias Krüger
7b78b6a78d
Rollup merge of #107022 - scottmcm:ordering-option-eq, r=m-ou-se
Implement `SpecOptionPartialEq` for `cmp::Ordering`

Noticed as I continue to explore options for having code using `partial_cmp` optimize better.

Before:
```llvm
; Function Attrs: mustprogress nofree nosync nounwind willreturn uwtable
define noundef zeroext i1 `@ordering_eq(i8` noundef %0, i8 noundef %1) unnamed_addr #0 {
start:
  %2 = icmp eq i8 %0, 2
  br i1 %2, label %bb1.i, label %bb3.i

bb1.i:                                            ; preds = %start
  %3 = icmp eq i8 %1, 2
  br label %"_ZN55_$LT$T$u20$as$u20$core..option..SpecOptionPartialEq$GT$2eq17hb7e7beacecde585fE.exit"

bb3.i:                                            ; preds = %start
  %.not.i = icmp ne i8 %1, 2
  %4 = icmp eq i8 %0, %1
  %spec.select.i = and i1 %.not.i, %4
  br label %"_ZN55_$LT$T$u20$as$u20$core..option..SpecOptionPartialEq$GT$2eq17hb7e7beacecde585fE.exit"

"_ZN55_$LT$T$u20$as$u20$core..option..SpecOptionPartialEq$GT$2eq17hb7e7beacecde585fE.exit": ; preds = %bb1.i, %bb3.i
  %.0.i = phi i1 [ %3, %bb1.i ], [ %spec.select.i, %bb3.i ]
  ret i1 %.0.i
}
```

After:
```llvm
; Function Attrs: mustprogress nofree norecurse nosync nounwind readnone willreturn uwtable
define noundef zeroext i1 `@ordering_eq(i8` noundef %0, i8 noundef %1) unnamed_addr #1 {
start:
  %2 = icmp eq i8 %0, %1
  ret i1 %2
}
```

(Which <https://alive2.llvm.org/ce/z/-rop5r> says LLVM *could* just do itself, but there's probably an issue already open for that problem from when this was originally looked at for `Option<NonZeroU8>` and friends.)
2023-01-28 05:20:15 +01:00
Michael Woerister
e5995e6168 Don't merge vtables when full debuginfo is enabled. 2023-01-27 15:29:04 +00:00
Erik Desjardins
009192b01b abi: add AddressSpace field to Primitive::Pointer
...and remove it from `PointeeInfo`, which isn't meant for this.

There are still various places (marked with FIXMEs) that assume all pointers
have the same size and alignment. Fixing this requires parsing non-default
address spaces in the data layout string, which will be done in a followup.
2023-01-22 23:41:39 -05:00
bors
705a96d39b Auto merge of #106989 - clubby789:is-zero-num, r=scottmcm
Implement `alloc::vec::IsZero` for `Option<$NUM>` types

Fixes #106911

Mirrors the `NonZero$NUM` implementations with an additional `assert_zero_valid`.
`None::<i32>` doesn't stricly satisfy `IsZero` but for the purpose of allocating we can produce more efficient codegen.
2023-01-19 08:04:26 +00:00
Scott McMurray
3122db7d03 Implement SpecOptionPartialEq for cmp::Ordering 2023-01-18 19:19:28 -08:00
Nilstrieb
a6fda3ee7f Support true and false as boolean flag params
Implements MCP 577.
2023-01-18 20:46:36 +01:00
clubby789
b94a29a25f Implement alloc::vec::IsZero for Option<$NUM> types 2023-01-18 15:15:15 +00:00
Matthias Krüger
c96dac16c3
Rollup merge of #106995 - lukas-code:align_offset_assembly_test, r=cuviper
bump failing assembly & codegen tests from LLVM 14 to LLVM 15

These tests need LLVM 15.

Found by ```@Robert-Cunningham``` in https://github.com/rust-lang/rust/pull/100601#issuecomment-1385400008

Passed tests at 006506e93fc80318ebfd7939fe1fd4dc19ecd8cb in https://github.com/rust-lang/rust/actions/runs/3942442730/jobs/6746104740.
2023-01-18 06:59:21 +01:00
Lukas Markeffsky
1216cc7f1c bump failing assembly & codegen tests from LLVM 14 to LLVM 15 2023-01-17 20:02:01 +01:00
Nilstrieb
f1255380ac Add more codegen tests 2023-01-17 16:23:22 +01:00
Nilstrieb
af23ad93cd Improve comments 2023-01-17 08:14:35 +01:00
Nilstrieb
645c0fddd2 Put noundef on all scalars that don't allow uninit
Previously, it was only put on scalars with range validity invariants
like bool, was uninit was obviously invalid for those.

Since then, we have normatively declared all uninit primitives to be
undefined behavior and can therefore put `noundef` on them.

The remaining concern was the `mem::uninitialized` function, which cause
quite a lot of UB in the older parts of the ecosystem. This function now
doesn't return uninit values anymore, making users of it safe from this
change.

The only real sources of UB where people could encounter uninit
primitives are `MaybeUninit::uninit().assume_init()`, which has always
be clear in the docs about being UB and from heap allocations (like
reading from the spare capacity of a vec. This is hopefully rare enough
to not break anything.
2023-01-17 08:14:35 +01:00
The 8472
9db0134018 replace manual ptr arithmetic with ptr_sub 2023-01-15 17:38:05 +01:00
Nicholas Bishop
46f9e878f6 Stabilize abi_efiapi feature
Tracking issue: https://github.com/rust-lang/rust/issues/65815
2023-01-11 20:42:13 -05:00
Ben Kimock
13eec69e1c Add a regression test for argument copies with DestinationPropagation 2023-01-11 10:27:06 -05:00
Albert Larsan
40ba0e84d5
Change src/test to tests in source files, fix tidy and tests 2023-01-11 09:32:13 +00:00
Albert Larsan
cf2dff2b1e
Move /src/test to /tests 2023-01-11 09:32:08 +00:00