rust/library/core/src
bors 2d91939bb7 Auto merge of #107634 - scottmcm:array-drain, r=thomcc
Improve the `array::map` codegen

The `map` method on arrays [is documented as sometimes performing poorly](https://doc.rust-lang.org/std/primitive.array.html#note-on-performance-and-stack-usage), and after [a question on URLO](https://users.rust-lang.org/t/try-trait-residual-o-trait-and-try-collect-into-array/88510?u=scottmcm) prompted me to take another look at the core [`try_collect_into_array`](7c46fb2111/library/core/src/array/mod.rs (L865-L912)) function, I had some ideas that ended up working better than I'd expected.

There are three main ideas in here, split over three commits:
1. Don't use `array::IntoIter` when we can avoid it, since that seems to not get SRoA'd, meaning that every step writes things like loop counters into the stack unnecessarily
2. Don't return arrays in `Result`s unnecessarily, as that doesn't seem to optimize away even with `unwrap_unchecked` (perhaps because it needs to get moved into a new LLVM type to account for the discriminant)
3. Don't distract LLVM with all the `Option` dances when we know for sure we have enough items (like in `map` and `zip`). This one's a larger commit, since doing it meant adding a new `pub(crate)` trait, but hopefully those changes are still straightforward.

(No libs-api changes; everything should be completely implementation-detail-internal.)
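
To make the three ideas concrete, here is a minimal sketch of an array `map` written along those lines: it reads items straight out of the source array instead of going through `array::IntoIter`, returns the array directly rather than inside a `Result`, and never produces an `Option` because the loop bound is exactly `N`. This is not the code in `core`; `map_sketch` and `Guard` are hypothetical names made up for illustration, and unlike the real implementation this version leaks the unread source items if the closure panics.

```rust
use std::mem::{ManuallyDrop, MaybeUninit};
use std::ptr;

/// Hypothetical drop guard: if the closure panics midway, drop the
/// outputs that were already written so they are not leaked.
struct Guard<'a, U, const N: usize> {
    dst: &'a mut [MaybeUninit<U>; N],
    initialized: usize,
}

impl<U, const N: usize> Drop for Guard<'_, U, N> {
    fn drop(&mut self) {
        for slot in &mut self.dst[..self.initialized] {
            // SAFETY: the first `initialized` slots have been written.
            unsafe { slot.assume_init_drop() };
        }
    }
}

/// Hypothetical stand-in for `<[T; N]>::map`, assuming nothing about
/// how `core` actually structures it.
pub fn map_sketch<T, U, const N: usize>(src: [T; N], mut f: impl FnMut(T) -> U) -> [U; N] {
    // Stop the source from being dropped as a whole; its items are
    // moved out one at a time below.
    let src = ManuallyDrop::new(src);

    // SAFETY: an array of `MaybeUninit` needs no initialization.
    let mut out: [MaybeUninit<U>; N] =
        unsafe { MaybeUninit::<[MaybeUninit<U>; N]>::uninit().assume_init() };

    let mut guard = Guard { dst: &mut out, initialized: 0 };
    for i in 0..N {
        // SAFETY: `i < N`, and each item is read exactly once.
        let item = unsafe { ptr::read(src.as_ptr().add(i)) };
        // If `f` panics here, `guard` drops the outputs written so far.
        let value = f(item);
        guard.dst[guard.initialized].write(value);
        guard.initialized += 1;
    }
    // Every slot is initialized, so the guard must not run.
    std::mem::forget(guard);

    // SAFETY: all `N` slots were written, and `[MaybeUninit<U>; N]`
    // has the same layout as `[U; N]`.
    unsafe { std::mem::transmute_copy::<[MaybeUninit<U>; N], [U; N]>(&out) }
}
```

Calling `map_sketch([1u32, 2, 3], |x| 13 * x + 7)` follows the same shape as the `long_integer_map` test in this description, just at a smaller size.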

It's still not completely fixed -- I think it needs pcwalton's `memcpy` optimizations still (#103830) to get further -- but this seems to go much better than before.  And the remaining `memcpy`s are just `transmute`-equivalent (`[T; N] -> ManuallyDrop<[T; N]>` and `[MaybeUninit<T>; N] -> [T; N]`), so hopefully those will be easier to remove with LLVM16 than the previous subobject copies 🤞
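
For readers unfamiliar with why those two moves are "`transmute`-equivalent", the sketch below spells them out as standalone functions. Both types involved have the same size and layout on each side, so the copy carries no real work and is only there because the value changes type. The function names `wrap_no_drop` and `assume_all_init` are hypothetical and exist only for this illustration.

```rust
use std::mem::{ManuallyDrop, MaybeUninit};

/// `[T; N] -> ManuallyDrop<[T; N]>`: same bytes, but the wrapper
/// suppresses the automatic drop of the array.
pub fn wrap_no_drop<T, const N: usize>(a: [T; N]) -> ManuallyDrop<[T; N]> {
    ManuallyDrop::new(a)
}

/// `[MaybeUninit<T>; N] -> [T; N]`: same bytes again.
///
/// # Safety
/// Every element of `a` must be initialized.
pub unsafe fn assume_all_init<T, const N: usize>(a: [MaybeUninit<T>; N]) -> [T; N] {
    // `transmute` rejects types whose size depends on generic parameters,
    // so this uses `transmute_copy`. Dropping `a` afterwards is a no-op
    // because `MaybeUninit` never drops its contents.
    unsafe { std::mem::transmute_copy::<[MaybeUninit<T>; N], [T; N]>(&a) }
}
```

Since `ManuallyDrop<[u32; 64]>` and `[u32; 64]` have identical size, an optimizer that sees through the type change could in principle drop the copy entirely.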

r? `@thomcc`

As a simple example, this test
```rust
pub fn long_integer_map(x: [u32; 64]) -> [u32; 64] {
    x.map(|x| 13 * x + 7)
}
```
On nightly <https://rust.godbolt.org/z/xK7548TGj> takes `sub rsp, 808`
```llvm
start:
  %array.i.i.i.i = alloca [64 x i32], align 4
  %_3.sroa.5.i.i.i = alloca [65 x i32], align 4
  %_5.i = alloca %"core::iter::adapters::map::Map<core::array::iter::IntoIter<u32, 64>, [closure@/app/example.rs:2:11: 2:14]>", align 8
```
(and yes, that's a 6**5**-element array `alloca` despite 6**4**-element input and output)

But with this PR it's only `sub rsp, 520`
```llvm
start:
  %array.i.i.i.i.i.i = alloca [64 x i32], align 4
  %array1.i.i.i = alloca %"core::mem::manually_drop::ManuallyDrop<[u32; 64]>", align 4
```

Similarly, the loop it emits on nightly is scalar-only and horrifying
```nasm
.LBB0_1:
        mov     esi, 64
        mov     edi, 0
        cmp     rdx, 64
        je      .LBB0_3
        lea     rsi, [rdx + 1]
        mov     qword ptr [rsp + 784], rsi
        mov     r8d, dword ptr [rsp + 4*rdx + 528]
        mov     edi, 1
        lea     edx, [r8 + 2*r8]
        lea     r8d, [r8 + 4*rdx]
        add     r8d, 7
.LBB0_3:
        test    edi, edi
        je      .LBB0_11
        mov     dword ptr [rsp + 4*rcx + 272], r8d
        cmp     rsi, 64
        jne     .LBB0_6
        xor     r8d, r8d
        mov     edx, 64
        test    r8d, r8d
        jne     .LBB0_8
        jmp     .LBB0_11
.LBB0_6:
        lea     rdx, [rsi + 1]
        mov     qword ptr [rsp + 784], rdx
        mov     edi, dword ptr [rsp + 4*rsi + 528]
        mov     r8d, 1
        lea     esi, [rdi + 2*rdi]
        lea     edi, [rdi + 4*rsi]
        add     edi, 7
        test    r8d, r8d
        je      .LBB0_11
.LBB0_8:
        mov     dword ptr [rsp + 4*rcx + 276], edi
        add     rcx, 2
        cmp     rcx, 64
        jne     .LBB0_1
```

whereas with this PR it's unrolled and vectorized
```nasm
	vpmulld	ymm1, ymm0, ymmword ptr [rsp + 64]
	vpaddd	ymm1, ymm1, ymm2
	vmovdqu	ymmword ptr [rsp + 328], ymm1
	vpmulld	ymm1, ymm0, ymmword ptr [rsp + 96]
	vpaddd	ymm1, ymm1, ymm2
	vmovdqu	ymmword ptr [rsp + 360], ymm1
```
(though sadly still stack-to-stack)
2023-02-13 10:18:48 +00:00
alloc Clarify new_size for realloc means bytes 2023-02-09 23:56:20 -08:00
array Auto merge of #107634 - scottmcm:array-drain, r=thomcc 2023-02-13 10:18:48 +00:00
async_iter use consistent terminology 2022-10-29 09:23:12 +02:00
cell Add OnceCell<T>: !Sync impl for diagnostics 2023-01-19 20:14:21 +01:00
char Auto merge of #105671 - lukas-code:depreciate-char, r=scottmcm 2023-02-12 11:09:06 +00:00
convert Set version placeholders to 1.68 2023-01-25 09:44:29 -05:00
ffi Remove a couple of #[doc(hidden)] pub fn and their #[feature] gates 2023-02-10 08:06:35 +01:00
fmt Auto merge of #106745 - m-ou-se:format-args-ast, r=oli-obk 2023-01-26 12:44:47 +00:00
future Remove GenFuture from core 2023-01-29 15:20:03 +01:00
hash Fix some ~const usage in libcore 2022-12-20 15:01:37 +00:00
intrinsics Auto merge of #107297 - Mark-Simulacrum:bump-bootstrap, r=pietroalbini 2023-01-31 19:24:29 +00:00
iter Auto merge of #107634 - scottmcm:array-drain, r=thomcc 2023-02-13 10:18:48 +00:00
macros Remove HTML tags around warning 2023-01-06 13:20:58 +01:00
mem stage-step cfgs 2023-01-30 13:09:09 -05:00
num Rollup merge of #107961 - scottmcm:unify-ilog-panics, r=Mark-Simulacrum 2023-02-13 11:12:50 +05:30
ops Auto merge of #107634 - scottmcm:array-drain, r=thomcc 2023-02-13 10:18:48 +00:00
panic Replace libstd, libcore, liballoc in line comments. 2022-12-30 14:00:42 +01:00
prelude Replace libstd, libcore, liballoc in docs. 2022-12-30 14:00:40 +01:00
ptr sub_ptr() is equivalent to usize::try_from().unwrap_unchecked(), not usize::from().unwrap_unchecked(). 2023-01-23 14:42:32 +02:00
slice Auto merge of #107634 - scottmcm:array-drain, r=thomcc 2023-02-13 10:18:48 +00:00
str Use associated items of char instead of freestanding items in core::char 2023-01-14 11:58:41 +01:00
sync Mark 'atomic_mut_ptr' methods const 2023-02-05 17:03:46 -05:00
task stage-step cfgs 2023-01-30 13:09:09 -05:00
unicode Replace libstd, libcore, liballoc in line comments. 2022-12-30 14:00:42 +01:00
any.rs Constify TypeId ordering impls 2023-01-16 21:26:03 +01:00
arch.rs move core::arch into separate file 2022-11-20 10:28:14 +01:00
ascii.rs Inline <EscapeDefault as Iterator>::next 2022-03-10 15:35:22 +01:00
asserting.rs [RFC 2011] Library code 2022-05-22 07:18:32 -03:00
bool.rs Add missing assertion 2022-09-22 02:12:06 -04:00
borrow.rs Minor grammar nit. 2022-12-12 16:22:01 -07:00
cell.rs impl DispatchFromDyn for Cell and UnsafeCell 2023-01-24 12:06:12 +01:00
clone.rs Make some trivial functions #[inline(always)] 2022-12-07 17:11:17 +01:00
cmp.rs Replace ConstFnMutClosure with const closures 2023-02-03 14:43:13 +00:00
default.rs cfg-step code 2022-11-06 17:21:21 -05:00
error.md Small round of typo fixes 2022-11-04 20:06:18 -07:00
error.rs Remove a couple of #[doc(hidden)] pub fn and their #[feature] gates 2023-02-10 08:06:35 +01:00
hint.rs Improve the documentation of black_box 2023-01-07 15:44:38 -05:00
internal_macros.rs ignore a doctest for the non-exported macro 2022-05-03 18:33:56 +09:00
intrinsics.rs stage-step cfgs 2023-01-30 13:09:09 -05:00
lib.rs Replace ConstFnMutClosure with const closures 2023-02-03 14:43:13 +00:00
marker.rs Document PointerLike 2023-02-12 01:23:02 +00:00
option.rs nit fixed 2023-02-03 13:57:53 -06:00
panic.rs Replace libstd, libcore, liballoc in docs. 2022-12-30 14:00:40 +01:00
panicking.rs stage-step cfgs 2023-01-30 13:09:09 -05:00
pin.rs Set version placeholders to 1.68 2023-01-25 09:44:29 -05:00
primitive_docs.rs disable strict-provenance-violating doctests in Miri 2022-11-22 11:49:02 +01:00
primitive.rs
result.rs docs: update fragment for Result impls 2023-02-03 19:03:17 -07:00
time.rs Bump version placeholders to release 2022-11-06 17:11:02 -05:00
tuple.rs const Compare Tuples 2022-11-09 09:52:04 +01:00
unit.rs Use implicit capture syntax in format_args 2022-03-10 10:23:40 -05:00