Improve the `array::map` codegen
The `map` method on arrays [is documented as sometimes performing poorly](https://doc.rust-lang.org/std/primitive.array.html#note-on-performance-and-stack-usage), and after [a question on URLO](https://users.rust-lang.org/t/try-trait-residual-o-trait-and-try-collect-into-array/88510?u=scottmcm) prompted me to take another look at the core [`try_collect_into_array`](7c46fb2111/library/core/src/array/mod.rs (L865-L912)) function, I had some ideas that ended up working better than I'd expected.
There's three main ideas in here, split over three commits:
1. Don't use `array::IntoIter` when we can avoid it, since that seems to not get SRoA'd, meaning that every step writes things like loop counters into the stack unnecessarily
2. Don't return arrays in `Result`s unnecessarily, as that doesn't seem to optimize away even with `unwrap_unchecked` (perhaps because it needs to get moved into a new LLVM type to account for the discriminant)
3. Don't distract LLVM with all the `Option` dances when we know for sure we have enough items (like in `map` and `zip`). This one's a larger commit as to do it I ended up adding a new `pub(crate)` trait, but hopefully those changes are still straight-forward.
(No libs-api changes; everything should be completely implementation-detail-internal.)
It's still not completely fixed -- I think it needs pcwalton's `memcpy` optimizations still (#103830) to get further -- but this seems to go much better than before. And the remaining `memcpy`s are just `transmute`-equivalent (`[T; N] -> ManuallyDrop<[T; N]>` and `[MaybeUninit<T>; N] -> [T; N]`), so hopefully those will be easier to remove with LLVM16 than the previous subobject copies 🤞
r? `@thomcc`
As a simple example, this test
```rust
pub fn long_integer_map(x: [u32; 64]) -> [u32; 64] {
x.map(|x| 13 * x + 7)
}
```
On nightly <https://rust.godbolt.org/z/xK7548TGj> takes `sub rsp, 808`
```llvm
start:
%array.i.i.i.i = alloca [64 x i32], align 4
%_3.sroa.5.i.i.i = alloca [65 x i32], align 4
%_5.i = alloca %"core::iter::adapters::map::Map<core::array::iter::IntoIter<u32, 64>, [closure@/app/example.rs:2:11: 2:14]>", align 8
```
(and yes, that's a 6**5**-element array `alloca` despite 6**4**-element input and output)
But with this PR it's only `sub rsp, 520`
```llvm
start:
%array.i.i.i.i.i.i = alloca [64 x i32], align 4
%array1.i.i.i = alloca %"core::mem::manually_drop::ManuallyDrop<[u32; 64]>", align 4
```
Similarly, the loop it emits on nightly is scalar-only and horrifying
```nasm
.LBB0_1:
mov esi, 64
mov edi, 0
cmp rdx, 64
je .LBB0_3
lea rsi, [rdx + 1]
mov qword ptr [rsp + 784], rsi
mov r8d, dword ptr [rsp + 4*rdx + 528]
mov edi, 1
lea edx, [r8 + 2*r8]
lea r8d, [r8 + 4*rdx]
add r8d, 7
.LBB0_3:
test edi, edi
je .LBB0_11
mov dword ptr [rsp + 4*rcx + 272], r8d
cmp rsi, 64
jne .LBB0_6
xor r8d, r8d
mov edx, 64
test r8d, r8d
jne .LBB0_8
jmp .LBB0_11
.LBB0_6:
lea rdx, [rsi + 1]
mov qword ptr [rsp + 784], rdx
mov edi, dword ptr [rsp + 4*rsi + 528]
mov r8d, 1
lea esi, [rdi + 2*rdi]
lea edi, [rdi + 4*rsi]
add edi, 7
test r8d, r8d
je .LBB0_11
.LBB0_8:
mov dword ptr [rsp + 4*rcx + 276], edi
add rcx, 2
cmp rcx, 64
jne .LBB0_1
```
whereas with this PR it's unrolled and vectorized
```nasm
vpmulld ymm1, ymm0, ymmword ptr [rsp + 64]
vpaddd ymm1, ymm1, ymm2
vmovdqu ymmword ptr [rsp + 328], ymm1
vpmulld ymm1, ymm0, ymmword ptr [rsp + 96]
vpaddd ymm1, ymm1, ymm2
vmovdqu ymmword ptr [rsp + 360], ymm1
```
(though sadly still stack-to-stack)
a placeholder type is the same as a param as they
represent "this could be any type". A bound type
represents a type inside of a `for<T>` or `exists<T>`.
When entering a forall or exists `T` should be
instantiated as a existential (inference var) or universal
(placeholder). You should never observe a bound variable
without its binder.
Rollup of 7 pull requests
Successful merges:
- #107654 (reword descriptions of the deprecated int modules)
- #107915 (Add `array::map` benchmarks)
- #107961 (Avoid copy-pasting the `ilog` panic string in a bunch of places)
- #107962 (Add a doc note about why `Chain` is not `ExactSizeIterator`)
- #107966 (Update browser-ui-test version to 0.14.3)
- #107970 (Hermit: Remove floor symbol)
- #107973 (Fix unintentional UB in SIMD tests)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Avoid copy-pasting the `ilog` panic string in a bunch of places
I also ended up changing the implementations to `if let` because it doesn't work to
```rust
self.checked_ilog2().unwrap_or_else(panic_for_nonpositive_argument)
```
due to the `!`. But as a bonus that meant I could remove the `rustc_allow_const_fn_unstable` too.
Add `array::map` benchmarks
Since there were no previous benchmarks for `array::map`, and it is known to have mediocre/poor performance, add some simple benchmarks. These benchmarks vary the length of the array and size of each item.
Rollup of 8 pull requests
Successful merges:
- #107748 (refer to new home)
- #107842 (Patch `build/rustfmt/lib/*.so` for NixOS)
- #107930 (Improve JS function itemTypeFromName code a bit)
- #107934 (rustdoc: account for intra-doc links in `<meta name="description">`)
- #107943 (Document `PointerLike`)
- #107954 (avoid mixing accesses of ptrs derived from a mutable ref and parent ptrs)
- #107955 (fix UB in ancient test)
- #107964 (rustdoc: use tighter line height in h1 and h2)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
fix UB in ancient test
This seems to go back all the way to the [original version of this test](b9aa9def85/src/test/run-pass/regions-mock-trans.rs) from ten years ago... ``@nikomatsakis`` trip down memory lane? ;)
Clearly deallocation is a form of mutation so doing it to a (pointer derived from a) shared reference cannot be legal. Let's use mutable references instead.
avoid mixing accesses of ptrs derived from a mutable ref and parent ptrs
``@Vanille-N`` is working on a successor for Stacked Borrows. It will mostly accept strictly more code than Stacked Borrows did, with one exception: the following pattern no longer works.
```rust
let mut root = 6u8;
let mref = &mut root;
let ptr = mref as *mut u8;
*ptr = 0; // Write
assert_eq!(root, 0); // Parent Read
*ptr = 0; // Attempted Write
```
This worked in Stacked Borrows kind of by accident: when doing the "parent read", under SB we Disable `mref`, but the raw ptrs derived from it remain usable. The fact that we can still use the "children" of a reference that is no longer usable is quite nasty and leads to some undesirable effects (in particular it is the major blocker for resolving https://github.com/rust-lang/unsafe-code-guidelines/issues/257). So in Tree Borrows we no longer do that; instead, reading from `root` makes `mref` and all its children read-only.
Due to other improvements in Tree Borrows, the entire Miri test suite still passes with this new behavior, and even the entire libcore and liballoc test suite, except for these 2 cases this PR fixes. Both of these involve code where the programmer wrote `&mut` but then used pointers derived from that reference in ways that alias with the parent pointer, which arguably is violating uniqueness. They are fixed by properly using raw pointers throughout.
Document `PointerLike`
I forgot to document this, and even though it's currently more of an implementation detail, the old doc was kinda embarrassing 😅
rustdoc: account for intra-doc links in `<meta name="description">`
Similar to #86451, but for the SEO descriptions instead of the search descriptions.
Create a single value cache for the () query key
Since queries using `()` as the key can only store a single value, specialize for that case.
This looks like a minor performance improvement:
<table><tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th></tr><tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</th></tr><tr><td>🟣 <b>clap</b>:check</td><td align="right">1.8477s</td><td align="right">1.8415s</td><td align="right"> -0.33%</td></tr><tr><td>🟣 <b>hyper</b>:check</td><td align="right">0.2666s</td><td align="right">0.2655s</td><td align="right"> -0.40%</td></tr><tr><td>🟣 <b>syntex_syntax</b>:check</td><td align="right">6.3943s</td><td align="right">6.3686s</td><td align="right"> -0.40%</td></tr><tr><td>🟣 <b>syn</b>:check</td><td align="right">1.6413s</td><td align="right">1.6345s</td><td align="right"> -0.42%</td></tr><tr><td>🟣 <b>regex</b>:check</td><td align="right">1.0337s</td><td align="right">1.0313s</td><td align="right"> -0.24%</td></tr><tr><td>Total</td><td align="right">11.1836s</td><td align="right">11.1414s</td><td align="right"> -0.38%</td></tr><tr><td>Summary</td><td align="right">1.0000s</td><td align="right">0.9964s</td><td align="right"> -0.36%</td></tr></table>
Use associated items of `char` instead of freestanding items in `core::char`
The associated functions and constants on `char` have been stable since 1.52 and the freestanding items have soft-deprecated since 1.62 (https://github.com/rust-lang/rust/pull/95566). This PR ~~marks them as "deprecated in future", similar to the integer and floating point modules (`core::{i32, f32}` etc)~~ replaces all uses of `core::char::*` with `char::*` to prepare for future deprecation of `core::char::*`.
feat: Add clippy configuration section to the manual and update some old keys
Closes#14132
I don't think this is supposed to be under `Diagnostics`, but it does make sense in a way 🤷 (it's probably where someone might look).
minor: Add version placeholder to changelog template
Closes#13967
This isn't great because we need to fill it in manually, but getting the version number from GitHub Actions is a bit annoying.