nordic-dev.net/rust - rust

mirror of https://github.com/rust-lang/rust.git synced 2024-11-27 01:04:03 +00:00

Author	SHA1	Message	Date
bors	2d91939bb7	Auto merge of #107634 - scottmcm:array-drain, r=thomcc Improve the `array::map` codegen The `map` method on arrays [is documented as sometimes performing poorly](https://doc.rust-lang.org/std/primitive.array.html#note-on-performance-and-stack-usage), and after [a question on URLO](https://users.rust-lang.org/t/try-trait-residual-o-trait-and-try-collect-into-array/88510?u=scottmcm) prompted me to take another look at the core [`try_collect_into_array`](`7c46fb2111/library/core/src/array/mod.rs (L865-L912)`) function, I had some ideas that ended up working better than I'd expected. There's three main ideas in here, split over three commits: 1. Don't use `array::IntoIter` when we can avoid it, since that seems to not get SRoA'd, meaning that every step writes things like loop counters into the stack unnecessarily 2. Don't return arrays in `Result`s unnecessarily, as that doesn't seem to optimize away even with `unwrap_unchecked` (perhaps because it needs to get moved into a new LLVM type to account for the discriminant) 3. Don't distract LLVM with all the `Option` dances when we know for sure we have enough items (like in `map` and `zip`). This one's a larger commit as to do it I ended up adding a new `pub(crate)` trait, but hopefully those changes are still straight-forward. (No libs-api changes; everything should be completely implementation-detail-internal.) It's still not completely fixed -- I think it needs pcwalton's `memcpy` optimizations still (#103830) to get further -- but this seems to go much better than before. And the remaining `memcpy`s are just `transmute`-equivalent (`[T; N] -> ManuallyDrop<[T; N]>` and `[MaybeUninit<T>; N] -> [T; N]`), so hopefully those will be easier to remove with LLVM16 than the previous subobject copies 🤞 r? 
@thomcc As a simple example, this test ```rust pub fn long_integer_map(x: [u32; 64]) -> [u32; 64] { x.map(|x| 13 * x + 7) } ``` On nightly <https://rust.godbolt.org/z/xK7548TGj> takes `sub rsp, 808` ```llvm start: %array.i.i.i.i = alloca [64 x i32], align 4 %_3.sroa.5.i.i.i = alloca [65 x i32], align 4 %_5.i = alloca %"core::iter::adapters::map::Map<core::array::iter::IntoIter<u32, 64>, [closure@/app/example.rs:2:11: 2:14]>", align 8 ``` (and yes, that's a 65-element array `alloca` despite 64-element input and output) But with this PR it's only `sub rsp, 520` ```llvm start: %array.i.i.i.i.i.i = alloca [64 x i32], align 4 %array1.i.i.i = alloca %"core::mem::manually_drop::ManuallyDrop<[u32; 64]>", align 4 ``` Similarly, the loop it emits on nightly is scalar-only and horrifying ```nasm .LBB0_1: mov esi, 64 mov edi, 0 cmp rdx, 64 je .LBB0_3 lea rsi, [rdx + 1] mov qword ptr [rsp + 784], rsi mov r8d, dword ptr [rsp + 4*rdx + 528] mov edi, 1 lea edx, [r8 + 2*r8] lea r8d, [r8 + 4*rdx] add r8d, 7 .LBB0_3: test edi, edi je .LBB0_11 mov dword ptr [rsp + 4*rcx + 272], r8d cmp rsi, 64 jne .LBB0_6 xor r8d, r8d mov edx, 64 test r8d, r8d jne .LBB0_8 jmp .LBB0_11 .LBB0_6: lea rdx, [rsi + 1] mov qword ptr [rsp + 784], rdx mov edi, dword ptr [rsp + 4*rsi + 528] mov r8d, 1 lea esi, [rdi + 2*rdi] lea edi, [rdi + 4*rsi] add edi, 7 test r8d, r8d je .LBB0_11 .LBB0_8: mov dword ptr [rsp + 4*rcx + 276], edi add rcx, 2 cmp rcx, 64 jne .LBB0_1 ``` whereas with this PR it's unrolled and vectorized ```nasm vpmulld ymm1, ymm0, ymmword ptr [rsp + 64] vpaddd ymm1, ymm1, ymm2 vmovdqu ymmword ptr [rsp + 328], ymm1 vpmulld ymm1, ymm0, ymmword ptr [rsp + 96] vpaddd ymm1, ymm1, ymm2 vmovdqu ymmword ptr [rsp + 360], ymm1 ``` (though sadly still stack-to-stack)	2023-02-13 10:18:48 +00:00
Ben Kimock	640ede7b0a	Enable CopyProp by default, tune the impl a bit	2023-02-12 13:23:53 -05:00
Josh Stone	a06aaa4a9e	Update the minimum external LLVM to 14	2023-02-10 16:06:25 -08:00
Matthias Krüger	8fc9ed51f0	Rollup merge of #107043 - Nilstrieb:true-and-false-is-false, r=wesleywiser Support `true` and `false` as boolean flag params Implements [MCP 577](https://github.com/rust-lang/compiler-team/issues/577).	2023-02-10 06:09:56 +01:00
Oleksii Lozovskyi	54b26f49e6	Test XRay only for supported targets Now that the compiler accepts "-Z instrument-xray" option only when targeting one of the supported targets, make sure to not run the codegen tests where the compiler will fail. Like with other compiletests, we don't have access to internals, so simply hardcode a list of supported architectures here.	2023-02-09 12:29:43 +09:00
Oleksii Lozovskyi	0fef658ffe	Codegen tests for -Z instrument-xray Let's add at least some tests to verify that this option is accepted and produces expected LLVM attributes. More tests can be added later with attribute support.	2023-02-09 12:28:00 +09:00
Ralf Jung	1ef16874b5	also do not add noalias on not-Unpin Box	2023-02-06 12:17:41 +01:00
Ralf Jung	ea541bc2ee	make &mut !Unpin not dereferenceable See https://github.com/rust-lang/unsafe-code-guidelines/issues/381 for discussion.	2023-02-06 11:46:37 +01:00
Ralf Jung	201ae73872	make PointerKind directly reflect pointer types The code that consumes PointerKind (`adjust_for_rust_scalar` in rustc_ty_utils) ended up using PointerKind variants to talk about Rust reference types (& and &mut) anyway, making the old code structure quite confusing: one always had to keep in mind which PointerKind corresponds to which type. So this changes PointerKind to directly reflect the type. This does not change behavior.	2023-02-06 11:46:32 +01:00
Scott McMurray	bb77860d9c	Add another autovectorization codegen test using array zip-map	2023-02-04 16:44:53 -08:00
Scott McMurray	5bc328fdef	Allow canonicalizing the `array::map` loop in trusted cases	2023-02-04 16:44:51 -08:00
Scott McMurray	52df0558ea	Stop forcing `array::map` through an unnecessary `Result`	2023-02-04 16:41:35 -08:00
Scott McMurray	5a7342c3dd	Stop using `into_iter` in `array::map`	2023-02-04 16:41:35 -08:00
Matthias Krüger	c89bb159f6	Rollup merge of #107373 - michaelwoerister:dont-merge-vtables-when-debuginfo, r=WaffleLapkin Don't merge vtables when full debuginfo is enabled. This PR makes the compiler not emit the `unnamed_addr` attribute for vtables when full debuginfo is enabled, so that they don't get merged even if they have the same contents. This allows debuggers to more reliably map from a dyn pointer to the self-type of a trait object by looking at the vtable's debuginfo. The PR only changes the behavior of the LLVM backend as other backends don't emit vtable debuginfo (as far as I can tell). The performance impact of this change should be small as [measured](https://github.com/rust-lang/rust/pull/103514#issuecomment-1290833854) in a previous PR.	2023-01-28 05:20:19 +01:00
Matthias Krüger	7b78b6a78d	Rollup merge of #107022 - scottmcm:ordering-option-eq, r=m-ou-se Implement `SpecOptionPartialEq` for `cmp::Ordering` Noticed as I continue to explore options for having code using `partial_cmp` optimize better. Before: ```llvm ; Function Attrs: mustprogress nofree nosync nounwind willreturn uwtable define noundef zeroext i1 @ordering_eq(i8 noundef %0, i8 noundef %1) unnamed_addr #0 { start: %2 = icmp eq i8 %0, 2 br i1 %2, label %bb1.i, label %bb3.i bb1.i: ; preds = %start %3 = icmp eq i8 %1, 2 br label %"_ZN55_$LT$T$u20$as$u20$core..option..SpecOptionPartialEq$GT$2eq17hb7e7beacecde585fE.exit" bb3.i: ; preds = %start %.not.i = icmp ne i8 %1, 2 %4 = icmp eq i8 %0, %1 %spec.select.i = and i1 %.not.i, %4 br label %"_ZN55_$LT$T$u20$as$u20$core..option..SpecOptionPartialEq$GT$2eq17hb7e7beacecde585fE.exit" "_ZN55_$LT$T$u20$as$u20$core..option..SpecOptionPartialEq$GT$2eq17hb7e7beacecde585fE.exit": ; preds = %bb1.i, %bb3.i %.0.i = phi i1 [ %3, %bb1.i ], [ %spec.select.i, %bb3.i ] ret i1 %.0.i } ``` After: ```llvm ; Function Attrs: mustprogress nofree norecurse nosync nounwind readnone willreturn uwtable define noundef zeroext i1 @ordering_eq(i8 noundef %0, i8 noundef %1) unnamed_addr #1 { start: %2 = icmp eq i8 %0, %1 ret i1 %2 } ``` (Which <https://alive2.llvm.org/ce/z/-rop5r> says LLVM could just do itself, but there's probably an issue already open for that problem from when this was originally looked at for `Option<NonZeroU8>` and friends.)	2023-01-28 05:20:15 +01:00
Michael Woerister	e5995e6168	Don't merge vtables when full debuginfo is enabled.	2023-01-27 15:29:04 +00:00
Erik Desjardins	009192b01b	abi: add `AddressSpace` field to `Primitive::Pointer` ...and remove it from `PointeeInfo`, which isn't meant for this. There are still various places (marked with FIXMEs) that assume all pointers have the same size and alignment. Fixing this requires parsing non-default address spaces in the data layout string, which will be done in a followup.	2023-01-22 23:41:39 -05:00
bors	705a96d39b	Auto merge of #106989 - clubby789:is-zero-num, r=scottmcm Implement `alloc::vec::IsZero` for `Option<$NUM>` types Fixes #106911 Mirrors the `NonZero$NUM` implementations with an additional `assert_zero_valid`. `None::<i32>` doesn't strictly satisfy `IsZero` but for the purpose of allocating we can produce more efficient codegen.	2023-01-19 08:04:26 +00:00
Scott McMurray	3122db7d03	Implement `SpecOptionPartialEq` for `cmp::Ordering`	2023-01-18 19:19:28 -08:00
Nilstrieb	a6fda3ee7f	Support `true` and `false` as boolean flag params Implements MCP 577.	2023-01-18 20:46:36 +01:00
clubby789	b94a29a25f	Implement `alloc::vec::IsZero` for `Option<$NUM>` types	2023-01-18 15:15:15 +00:00
Matthias Krüger	c96dac16c3	Rollup merge of #106995 - lukas-code:align_offset_assembly_test, r=cuviper bump failing assembly & codegen tests from LLVM 14 to LLVM 15 These tests need LLVM 15. Found by @Robert-Cunningham in https://github.com/rust-lang/rust/pull/100601#issuecomment-1385400008 Passed tests at 006506e93fc80318ebfd7939fe1fd4dc19ecd8cb in https://github.com/rust-lang/rust/actions/runs/3942442730/jobs/6746104740.	2023-01-18 06:59:21 +01:00
Lukas Markeffsky	1216cc7f1c	bump failing assembly & codegen tests from LLVM 14 to LLVM 15	2023-01-17 20:02:01 +01:00
Nilstrieb	f1255380ac	Add more codegen tests	2023-01-17 16:23:22 +01:00
Nilstrieb	af23ad93cd	Improve comments	2023-01-17 08:14:35 +01:00
Nilstrieb	645c0fddd2	Put `noundef` on all scalars that don't allow uninit Previously, it was only put on scalars with range validity invariants like bool, as uninit was obviously invalid for those. Since then, we have normatively declared all uninit primitives to be undefined behavior and can therefore put `noundef` on them. The remaining concern was the `mem::uninitialized` function, which caused quite a lot of UB in the older parts of the ecosystem. This function now doesn't return uninit values anymore, making users of it safe from this change. The only real sources of UB where people could encounter uninit primitives are `MaybeUninit::uninit().assume_init()`, which has always been clear in the docs about being UB, and heap allocations (like reading from the spare capacity of a vec). This is hopefully rare enough to not break anything.	2023-01-17 08:14:35 +01:00
The 8472	9db0134018	replace manual ptr arithmetic with ptr_sub	2023-01-15 17:38:05 +01:00
Nicholas Bishop	46f9e878f6	Stabilize `abi_efiapi` feature Tracking issue: https://github.com/rust-lang/rust/issues/65815	2023-01-11 20:42:13 -05:00
Ben Kimock	13eec69e1c	Add a regression test for argument copies with DestinationPropagation	2023-01-11 10:27:06 -05:00
Albert Larsan	40ba0e84d5	Change `src/test` to `tests` in source files, fix tidy and tests	2023-01-11 09:32:13 +00:00
Albert Larsan	cf2dff2b1e	Move /src/test to /tests	2023-01-11 09:32:08 +00:00
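
For readers who want to reproduce the `array::map` comparison described in the #107634 entry, the test function quoted in that message can be wrapped in a small program. This is only a sketch: `long_integer_map` is taken from the commit message, while the `main` function and its sample input are assumptions added here for illustration and are not part of the upstream codegen test.

```rust
/// Same function as the one quoted in the #107634 commit message.
pub fn long_integer_map(x: [u32; 64]) -> [u32; 64] {
    x.map(|x| 13 * x + 7)
}

// Hypothetical driver, not from the commit: builds the input 0..64 and
// checks a few results so the function can be run as well as inspected.
fn main() {
    let input: [u32; 64] = std::array::from_fn(|i| i as u32);
    let output = long_integer_map(input);

    assert_eq!(output[0], 7);
    assert_eq!(output[1], 20);
    assert_eq!(output[63], 13 * 63 + 7);
    println!("first four results: {:?}", &output[..4]);
}
```

Compiling with optimizations and emitting assembly (for example `rustc -O --emit=asm`) should make it possible to compare the stack usage and loop shape against the listings quoted in the message; the exact output depends on the compiler and LLVM versions in use.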


181 Commits