rust/tests at 318be2bee9259852bc95728269916a45f59fa5aa - rust

mirror of https://github.com/rust-lang/rust.git synced 2024-11-26 08:44:35 +00:00


bors 2d91939bb7 Auto merge of #107634 - scottmcm:array-drain, r=thomcc		2023-02-13 10:18:48 +00:00

Improve the `array::map` codegen

The `map` method on arrays [is documented as sometimes performing poorly](https://doc.rust-lang.org/std/primitive.array.html#note-on-performance-and-stack-usage), and after [a question on URLO](https://users.rust-lang.org/t/try-trait-residual-o-trait-and-try-collect-into-array/88510?u=scottmcm) prompted me to take another look at the core [`try_collect_into_array`](`7c46fb2111/library/core/src/array/mod.rs (L865-L912)`) function, I had some ideas that ended up working better than I'd expected.

There's three main ideas in here, split over three commits:

1. Don't use `array::IntoIter` when we can avoid it, since that seems to not get SRoA'd, meaning that every step writes things like loop counters into the stack unnecessarily
2. Don't return arrays in `Result`s unnecessarily, as that doesn't seem to optimize away even with `unwrap_unchecked` (perhaps because it needs to get moved into a new LLVM type to account for the discriminant)
3. Don't distract LLVM with all the `Option` dances when we know for sure we have enough items (like in `map` and `zip`). This one's a larger commit as to do it I ended up adding a new `pub(crate)` trait, but hopefully those changes are still straight-forward.

(No libs-api changes; everything should be completely implementation-detail-internal.)

It's still not completely fixed -- I think it needs pcwalton's `memcpy` optimizations still (#103830) to get further -- but this seems to go much better than before. And the remaining `memcpy`s are just `transmute`-equivalent (`[T; N] -> ManuallyDrop<[T; N]>` and `[MaybeUninit<T>; N] -> [T; N]`), so hopefully those will be easier to remove with LLVM16 than the previous subobject copies 🤞

r? `@thomcc`

As a simple example, this test

```rust
pub fn long_integer_map(x: [u32; 64]) -> [u32; 64] {
    x.map(|x| 13 * x + 7)
}
```

On nightly <https://rust.godbolt.org/z/xK7548TGj> takes `sub rsp, 808`

```llvm
start:
  %array.i.i.i.i = alloca [64 x i32], align 4
  %_3.sroa.5.i.i.i = alloca [65 x i32], align 4
  %_5.i = alloca %"core::iter::adapters::map::Map<core::array::iter::IntoIter<u32, 64>, [closure@/app/example.rs:2:11: 2:14]>", align 8
```

(and yes, that's a 65-element array `alloca` despite 64-element input and output)

But with this PR it's only `sub rsp, 520`

```llvm
start:
  %array.i.i.i.i.i.i = alloca [64 x i32], align 4
  %array1.i.i.i = alloca %"core::mem::manually_drop::ManuallyDrop<[u32; 64]>", align 4
```

Similarly, the loop it emits on nightly is scalar-only and horrifying

```nasm
.LBB0_1:
        mov     esi, 64
        mov     edi, 0
        cmp     rdx, 64
        je      .LBB0_3
        lea     rsi, [rdx + 1]
        mov     qword ptr [rsp + 784], rsi
        mov     r8d, dword ptr [rsp + 4*rdx + 528]
        mov     edi, 1
        lea     edx, [r8 + 2*r8]
        lea     r8d, [r8 + 4*rdx]
        add     r8d, 7
.LBB0_3:
        test    edi, edi
        je      .LBB0_11
        mov     dword ptr [rsp + 4*rcx + 272], r8d
        cmp     rsi, 64
        jne     .LBB0_6
        xor     r8d, r8d
        mov     edx, 64
        test    r8d, r8d
        jne     .LBB0_8
        jmp     .LBB0_11
.LBB0_6:
        lea     rdx, [rsi + 1]
        mov     qword ptr [rsp + 784], rdx
        mov     edi, dword ptr [rsp + 4*rsi + 528]
        mov     r8d, 1
        lea     esi, [rdi + 2*rdi]
        lea     edi, [rdi + 4*rsi]
        add     edi, 7
        test    r8d, r8d
        je      .LBB0_11
.LBB0_8:
        mov     dword ptr [rsp + 4*rcx + 276], edi
        add     rcx, 2
        cmp     rcx, 64
        jne     .LBB0_1
```

whereas with this PR it's unrolled and vectorized

```nasm
        vpmulld ymm1, ymm0, ymmword ptr [rsp + 64]
        vpaddd  ymm1, ymm1, ymm2
        vmovdqu ymmword ptr [rsp + 328], ymm1
        vpmulld ymm1, ymm0, ymmword ptr [rsp + 96]
        vpaddd  ymm1, ymm1, ymm2
        vmovdqu ymmword ptr [rsp + 360], ymm1
```

(though sadly still stack-to-stack)
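
To make the first and third ideas concrete, here is a minimal sketch of mapping an array without going through `array::IntoIter` or `Option`-returning iterator steps. It is not the code from this PR or from libcore: the name `map_without_into_iter` is hypothetical, and the sketch assumes it is acceptable to leak elements if the closure panics (the real implementation has to handle that case with a drop guard).

```rust
use std::mem::{ManuallyDrop, MaybeUninit};
use std::ptr;

/// Hypothetical helper (not a libcore API): maps `[T; N]` to `[U; N]` by
/// indexing the input storage directly, so there is no iterator state and
/// no `Option` check per element.
///
/// Simplification: if `f` panics, unread inputs and already-written outputs
/// are leaked rather than dropped. That is safe, but not what libcore does.
pub fn map_without_into_iter<T, U, const N: usize>(
    input: [T; N],
    mut f: impl FnMut(T) -> U,
) -> [U; N] {
    // `[T; N] -> ManuallyDrop<[T; N]>`: we take over responsibility for
    // the elements, moving each one out exactly once below.
    let input = ManuallyDrop::new(input);

    // An array of `MaybeUninit` needs no initialization, so this is sound.
    let mut out: [MaybeUninit<U>; N] = unsafe { MaybeUninit::uninit().assume_init() };

    for i in 0..N {
        // SAFETY: each index is read exactly once, and `input` is wrapped
        // in `ManuallyDrop`, so the element is never dropped a second time.
        let item = unsafe { ptr::read(&input[i]) };
        out[i].write(f(item));
    }

    // `[MaybeUninit<U>; N] -> [U; N]`.
    // SAFETY: all `N` slots were written above and the two array types have
    // the same layout. `out` holds `MaybeUninit`s, so it drops nothing.
    unsafe { std::mem::transmute_copy(&out) }
}
```

Called as `map_without_into_iter(x, |x| 13 * x + 7)`, this computes the same result as the `long_integer_map` test above. Whether it compiles to the same vectorized loop depends on the compiler and LLVM version, so that is not claimed here.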
fmt	Add tests for rounding of ties during float formatting	2022-10-20 22:09:24 +02:00
hash	Test const `Hash`, fix nits	2022-11-08 17:39:40 +01:00
iter	Auto merge of #107634 - scottmcm:array-drain, r=thomcc	2023-02-13 10:18:48 +00:00
num	Remove unnecessary `&format!`	2023-01-21 22:06:42 -05:00
ops	Expand the docs for ops::ControlFlow a bit	2021-02-06 22:36:05 -08:00
panic	Fix test (location_const_file)	2022-10-08 11:48:53 +00:00
alloc.rs	Re-optimize `Layout::array`	2022-07-13 17:07:41 -07:00
any.rs	Update bootstrap cfg	2022-12-28 09:18:43 -05:00
array.rs	Stop using `into_iter` in `array::map`	2023-02-04 16:41:35 -08:00
ascii.rs	introduce `{char, u8}::is_ascii_octdigit`	2022-09-27 11:55:13 +05:30
asserting.rs	[RFC 2011] Library code	2022-05-22 07:18:32 -03:00
atomic.rs	Make use of `[wrapping_]byte_{add,sub}`	2022-08-23 19:32:37 +04:00
bool.rs	Constify `bool::then{,_some}`	2021-12-15 00:11:23 +08:00
cell.rs	Fix `Display` for `cell::{Ref,RefMut}`	2022-05-20 11:16:30 -07:00
char.rs	char: µoptimise UTF-16 surrogates decoding	2022-12-23 14:15:33 +01:00
clone.rs	Use Box::new() instead of box syntax in core tests	2022-05-29 01:44:11 +02:00
cmp.rs	Add test for StructuralEq for std::cmp::Ordering.	2022-03-16 14:01:48 -05:00
const_ptr.rs	cleanup code w/ pointers in std a little	2022-08-05 16:47:49 +04:00
convert.rs	Revert "Auto merge of #89450 - usbalbin:const_try_revert, r=oli-obk"	2021-12-12 12:34:59 +08:00
future.rs	add tests	2022-02-02 23:07:02 +09:00
intrinsics.rs	Switch bootstrap cfgs	2022-02-25 08:00:52 -05:00
lazy.rs	More inference-friendly API for lazy	2022-10-29 09:56:20 +01:00
lib.rs	Stabilize `::{core,std}::pin::pin!`	2023-01-11 14:09:14 -08:00
macros.rs	Allow leading pipe in `matches!()` patterns.	2021-07-15 22:05:45 +03:00
manually_drop.rs	Test ManuallyDrop::clone_from.	2021-07-05 11:55:45 +00:00
mem.rs	Update bootstrap cfg	2022-12-28 09:18:43 -05:00
nonzero.rs	Make `From` impls of NonZero integer const.	2021-10-20 12:04:58 +09:00
ops.rs	Test not never	2021-11-21 19:10:39 -08:00
option.rs	cfg-step code	2022-11-06 17:21:21 -05:00
panic.rs	Add newlines	2022-09-27 19:23:52 +00:00
pattern.rs	mv std libs to library/	2020-07-27 19:51:13 -05:00
pin_macro.rs	Write {ui,} tests for `pin_macro` and `pin!`	2022-02-14 16:56:37 +01:00
pin.rs	Make some methods of `Pin<&mut T>` unstable const	2020-09-18 19:23:50 +02:00
ptr.rs	avoid mixing accesses of ptrs derived from a mutable ref and parent ptrs	2023-02-12 15:16:27 +01:00
result.rs	Remove unstable Result::into_ok_or_err	2022-08-17 17:20:42 -07:00
simd.rs	Introduce core::simd trait imports in tests	2022-07-20 18:08:20 -07:00
slice.rs	Remove various double spaces in source comments.	2023-01-14 17:22:04 +01:00
str_lossy.rs	Expose `Utf8Lossy` as `Utf8Chunks`	2022-08-20 12:49:20 -04:00
str.rs	Update paths in comments.	2022-12-30 14:00:42 +01:00
task.rs	Remove test of static Context	2023-01-02 10:33:23 -08:00
time.rs	add tests for div_duration_* functions	2023-01-07 11:05:33 -07:00
tuple.rs	mv std libs to library/	2020-07-27 19:51:13 -05:00
unicode.rs	revert changes to unicode stability	2022-07-08 21:18:15 +00:00
waker.rs	libcore tests: avoid int2ptr casts	2022-06-27 13:30:44 -04:00