Commit Graph

3668 Commits

Author SHA1 Message Date
bors
015d2bc3fe Auto merge of #83864 - Dylan-DPC:rollup-78an86n, r=Dylan-DPC
Rollup of 7 pull requests

Successful merges:

 - #80525 (wasm64 support)
 - #83019 (core: disable `ptr::swap_nonoverlapping_one`'s block optimization on SPIR-V.)
 - #83717 (rustdoc: Separate filter-empty-string out into its own function)
 - #83807 (Tests: Remove redundant `ignore-tidy-linelength` annotations)
 - #83815 (ptr::addr_of documentation improvements)
 - #83820 (Remove attribute `#[link_args]`)
 - #83841 (Allow clobbering unsupported registers in asm!)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
2021-04-05 01:26:57 +00:00
bors
35aa636159 Auto merge of #83530 - Mark-Simulacrum:bootstrap-bump, r=Mark-Simulacrum
Bump bootstrap to 1.52 beta

This includes the standard bump, but also a workaround for new cargo behavior around clearing out the doc directory when the rustdoc version changes.
2021-04-04 22:45:56 +00:00
Dylan DPC
3c2e4ff525
Rollup merge of #83820 - petrochenkov:nolinkargs, r=nagisa
Remove attribute `#[link_args]`

Closes https://github.com/rust-lang/rust/issues/29596

The attribute could always be replaced with `-C link-arg`, but cargo didn't provide a reasonable way to pass such flags to rustc.
Now cargo supports `cargo:rustc-link-arg*` directives in build scripts (https://doc.rust-lang.org/cargo/reference/unstable.html#extra-link-arg), so this attribute can be removed.
2021-04-05 00:24:33 +02:00
Dylan DPC
fbe89e20e8
Rollup merge of #83815 - RalfJung:addr_of, r=kennytm
ptr::addr_of documentation improvements

While writing https://github.com/rust-lang/reference/pull/1001 I figured I could also improve the docs here a bit.
2021-04-05 00:24:32 +02:00
Dylan DPC
4e3f471499
Rollup merge of #83019 - eddyb:spirv-no-block-swap, r=nagisa
core: disable `ptr::swap_nonoverlapping_one`'s block optimization on SPIR-V.

SPIR-V primarily supports what it calls the "Logical addressing model" (and AFAIK for graphical shaders it's the only option), and what that implies is that there is no "memory" to uniformly address at some byte/word level, and that you can't really talk about values having a "raw representation" in terms of sequences of bytes. Therefore, the "block"-wise swapping optimization employed by `ptr::swap_nonoverlapping_one` (where a "block" is 32 bytes, currently), is fundamentally incompatible with SPIR-V "memory".

As such, [Rust-GPU](https://github.com/EmbarkStudios/rust-gpu/)'s `rustc_codegen_spirv` backend cannot currently allow the use of `ptr::swap_nonoverlapping_one` - but that comes at a great price, since it's the building block of `mem::{swap,replace}`, and those in turn are used by e.g. `Option::take` and `Range`'s `Iterator` implementation (the latter blocking the use of `for i in 0..n` loops).

There's 4 options I can see in terms of supporting `ptr::swap_nonoverlapping_one` in `rustc_codegen_spirv`:
* legalize the block-wise swap loop back into swapping whole values, for SPIR-V
  * this is made borderline impossible by the fact that the size of the state "on the stack" is a block, and has to be expanded back to the appropriate size of the value being swapped, so in practice this would have to effectively pattern-match on the exact shape of the block-wise swapping algorithm, as a roundabout way of "patching `core::ptr` on the fly"
* (**this PR**) disable the block-wise swap optimization altogether when `#[cfg(target_arch = "spirv")`
  * I've tested it and it does in fact allow compiling `for i in 0..n` loops, which was my primary motivation
  * main downside IMO is the fact that `core` now acknowledges an out-of-tree backend
    * as a counterpoint, any attempt to compile Rust to SPIR-V would run into this problem, one way or another
* only enable the block-wise swap optimization on targets where it's been empirically proven to be an improvement
  * would avoid any surprises in terms of potentially-broken/inefficient codegen, in general
  * however, it may be universally applicable (thanks to caches), even if the optimal block size could differ
* move low-level swapping into an intrinsic, where the backend can choose any optimization approach it wants
  * this also has an impact on MIR optimizations (cc ``@rust-lang/wg-mir-opt)`` - which currently cannot hope to make sense of e.g. `Option::take` despite it being effectively `_0 = *_1;` `*_1 = None;` `return;`
  * long-term this is my preferred approach, and I can start working on it if that's desired, but I wanted to confirm that this swapping optimization is the final blocker for [Rust-GPU](https://github.com/EmbarkStudios/rust-gpu/) supporting e.g. range `for` loops

r? ``@nagisa`` cc ``@rust-lang/libs``
2021-04-05 00:24:29 +02:00
Eduard-Mihai Burtescu
bc6af97ed0 core: disable ptr::swap_nonoverlapping_one's block optimization on SPIR-V. 2021-04-04 22:26:27 +03:00
Eduard-Mihai Burtescu
3c3d3ddde9 core: rearrange ptr::swap_nonoverlapping_one's cases (no functional changes). 2021-04-04 22:26:00 +03:00
Mark Rousskov
b3a4f91b8d Bump cfgs 2021-04-04 14:57:05 -04:00
Ralf Jung
b577d7ef25
fix typo
Co-authored-by: kennytm <kennytm@gmail.com>
2021-04-04 19:32:54 +02:00
Dylan DPC
b943ea8cdc
Rollup merge of #83827 - the8472:fix-inplace-panic-on-drop, r=RalfJung
cleanup leak after test to make miri happy

Contains changes that were requested in #83629 but didn't make it into the rollup.

r? `````@RalfJung`````
2021-04-04 19:20:06 +02:00
Dylan DPC
6c13556183
Rollup merge of #82726 - ssomers:btree_node_rearange, r=Mark-Simulacrum
BTree: move blocks around in node.rs

Without changing any names or implementation, reorder some members:
- Move down the ones defined long ago on the demised `struct Root`, to below the definition of their current host `struct NodeRef`.
- Move up some defined on `struct NodeRef` that are interspersed with those defined on `struct Handle`.
- Move up the `correct_…` methods squeezed between the two flavours of `push`.
- Move the unchecked static downcasts (`cast_to_…`) after the upcasts (`forget_`) and the (weirdly named) dynamic downcasts (`force`).
r? ````@Mark-Simulacrum````
2021-04-04 19:20:00 +02:00
Dylan DPC
869726d335
Rollup merge of #81619 - SkiFire13:resultshunt-inplace, r=the8472
Implement `SourceIterator` and `InPlaceIterable` for `ResultShunt`
2021-04-04 19:19:59 +02:00
Ralf Jung
5d1747bf07
rely on intra-doc links
Co-authored-by: Yuki Okushi <jtitor@2k36.org>
2021-04-04 11:24:25 +02:00
bors
88e7862dd0 Auto merge of #83267 - ssomers:btree_prune_range_search_overlap, r=Mark-Simulacrum
BTree: no longer search arrays twice to check Ord

A possible addition to / partial replacement of #83147: no longer linearly search the upper bound of a range in the initial portion of the keys we already know are below the lower bound.
- Should be faster: fewer key comparisons at the cost of some instructions dealing with offsets
- Makes code a little more complicated.
- No longer detects ill-defined `Ord` implementations, but that wasn't a publicised feature, and was quite incomplete, and was only done in the `range` and `range_mut` methods.
r? `@Mark-Simulacrum`
2021-04-04 05:52:43 +00:00
the8472
572873fce0
suggestion from review
Co-authored-by: Ralf Jung <post@ralfj.de>
2021-04-04 01:38:58 +02:00
The8472
3bd241f95b cleanup leak after test to make miri happy 2021-04-04 01:37:05 +02:00
Vadim Petrochenkov
5839bff0ba Remove attribute #[link_args] 2021-04-03 21:25:53 +03:00
Ralf Jung
b93137a24e explain that even addr_of cannot deref a NULL ptr 2021-04-03 19:26:54 +02:00
Ralf Jung
a4a6bdd337 addr_of_mut: add example for creating a pointer to uninit data 2021-04-03 19:25:11 +02:00
Yuki Okushi
961fa632d6
Rollup merge of #83780 - matklad:doc-error-message, r=JohnTitor
Document "standard" conventions for error messages

These are currently documented in the API guidelines:

https://rust-lang.github.io/api-guidelines/interoperability.html#error-types-are-meaningful-and-well-behaved-c-good-err

I think it makes sense to uplift this guideline (in a milder form) into
std docs. Printing and producing errors is something that even
non-expert users do frequently, so it is useful to give at least some
indication of what a typical error message looks like.
2021-04-04 00:19:37 +09:00
Yuki Okushi
3b40d2c1f3
Rollup merge of #82487 - CDirkx:const-socketaddr, r=m-ou-se
Constify methods of `std::net::SocketAddr`, `SocketAddrV4` and `SocketAddrV6`

The following methods are made unstable const under the `const_socketaddr` feature (https://github.com/rust-lang/rust/issues/82485):

```rust
// std::net

impl SocketAddr {
    pub const fn ip(&self) -> IpAddr;
    pub const fn port(&self) -> u16;
    pub const fn is_ipv4(&self) -> bool;
    pub const fn is_ipv6(&self) -> bool;
}

impl SocketAddrV4 {
    pub const fn ip(&self) -> IpAddr;
    pub const fn port(&self) -> u16;
}

impl SocketAddrV6 {
    pub const fn ip(&self) -> IpAddr;
    pub const fn port(&self) -> u16;
    pub const fn flowinfo(&self) -> u32;
    pub const fn scope_id(&self) -> u32;
}
```

Note: `SocketAddrV4::ip` and `SocketAddrV6::ip` use pointer casting and depend on the unstable feature `const_raw_ptr_deref`
2021-04-04 00:19:30 +09:00
bors
621d4b7cbf Auto merge of #83506 - asomers:backtrace-0.3.56, r=Mark-Simulacrum
Update backtrace to 0.3.56

Fixes #78184
2021-04-03 01:52:36 +00:00
Dylan DPC
cb7133f693
Rollup merge of #83771 - asomers:stack_overflow_freebsd, r=dtolnay
Fix stack overflow detection on FreeBSD 11.1+

Beginning with FreeBSD 10.4 and 11.1, there is one guard page by
default.  And the stack autoresizes, so if Rust allocates its own guard
page, then FreeBSD's will simply move up one page.  The best solution is
to just use the OS's guard page.
2021-04-02 19:57:35 +02:00
Dylan DPC
542f441d44
Rollup merge of #83629 - the8472:fix-inplace-panic-on-drop, r=m-ou-se
Fix double-drop in `Vec::from_iter(vec.into_iter())` specialization when items drop during panic

This fixes the double-drop but it leaves a behavioral difference compared to the default implementation intact: In the default implementation the source and the destination vec are separate objects, so they get dropped separately. Here they share an allocation and the latter only exists as a pointer into the former. So if dropping the former panics then this fix will leak more items than the default implementation would. Is this acceptable or should the specialization also mimic the default implementation's drops-during-panic behavior?

Fixes #83618

`@rustbot` label T-libs-impl
2021-04-02 19:57:31 +02:00
Dylan DPC
48ebad58b2
Rollup merge of #83065 - CDirkx:win-alloc, r=dtolnay
Rework `std::sys::windows::alloc`

I came across https://github.com/rust-lang/rust/pull/76676#discussion_r488729990, which points out that there was unsound code in the Windows alloc code, creating a &mut to possibly uninitialized memory. I reworked the code so that that particular issue does not occur anymore, and started adding more documentation and safety comments.

Full list of changes:
 - moved and documented the relevant Windows Heap API functions
 - refactor `allocate_with_flags` to `allocate` (and remove the other helper functions), which now takes just a `bool` if the memory should be zeroed
 - add checks for if `GetProcessHeap` returned null
 - add a test that checks if the size and alignment of a `Header` are indeed <= `MIN_ALIGN`
 - add `#![deny(unsafe_op_in_unsafe_fn)]` and the necessary unsafe blocks with safety comments

I feel like I may have overdone the documenting, the unsoundness fix is the most important part; I could spit this PR up in separate parts.
2021-04-02 19:57:28 +02:00
Christiaan Dirkx
db1d003de1 Remove debug_assert 2021-04-02 17:50:23 +02:00
Christiaan Dirkx
c86e0985f9 Introduce get_process_heap and fix atomic ordering. 2021-04-02 17:37:52 +02:00
Yuki Okushi
417e6b1dd0
Rollup merge of #83740 - obi1kenobi:patch-1, r=joshtriplett
Fix comment typo in once.rs

I believe I came across a minor typo in a comment. I am not particularly familiar with this part of the codebase, but I have read the surrounding code as well as the referenced `park` and `unpark` functions, and I believe my proposed change is true to the intended meaning of the comment.

I intentionally tried to keep the change as minimal as possible. If I have the maintainers' permission, I'd also love to add a comma to improve readability as follows: `Luckily ``park`` comes with the guarantee that if it got an ``unpark`` just before on an unparked thread, it does not park.`
2021-04-02 21:28:23 +09:00
Aleksey Kladov
5547d92746 Document "standard" conventions for error messages
These are currently documented in the API guidelines:

https://rust-lang.github.io/api-guidelines/interoperability.html#error-types-are-meaningful-and-well-behaved-c-good-err

I think it makes sense to uplift this guideline (in a milder form) into
std docs. Printing and producing errors is something that even
non-expert users do frequently, so it is useful to give at least some
indication of what a typical error message looks like.
2021-04-02 15:11:49 +03:00
bors
5662d9343f Auto merge of #80965 - camelid:rename-doc-spotlight, r=jyn514
Rename `#[doc(spotlight)]` to `#[doc(notable_trait)]`

Fixes #80936.

"spotlight" is not a very specific or self-explaining name.
Additionally, the dialog that it triggers is called "Notable traits".
So, "notable trait" is a better name.

* Rename `#[doc(spotlight)]` to `#[doc(notable_trait)]`
* Rename `#![feature(doc_spotlight)]` to `#![feature(doc_notable_trait)]`
* Update documentation
* Improve documentation

r? `@Manishearth`
2021-04-02 07:04:58 +00:00
Alan Somers
ca14abbab1 Fix stack overflow detection on FreeBSD 11.1+
Beginning with FreeBSD 10.4 and 11.1, there is one guard page by
default.  And the stack autoresizes, so if Rust allocates its own guard
page, then FreeBSD's will simply move up one page.  The best solution is
to just use the OS's guard page.
2021-04-01 22:57:20 -06:00
bors
803ddb8359 Auto merge of #83726 - the8472:large-trustedlen-fail-fast, r=kennytm
panic early when `TrustedLen` indicates a `length > usize::MAX`

Changes `TrustedLen` specializations to immediately panic when `size_hint().1 == None`.

As far as I can tell this is ~not a change~ a minimal change in observable behavior for anything except ZSTs because the fallback path would go through `extend_desugared()` which tries to `reserve(lower_bound)` which already is `usize::MAX` and that would also lead to a panic. Before it might have popped somewhere between zero and a few elements from the iterator before panicking while it now panics immediately.

Overall this should reduce codegen by eliminating the fallback paths.

While looking into the `with_capacity()` behavior I also noticed that its documentation didn't have a *Panics* section, so I added that.
2021-04-01 07:55:00 +00:00
Predrag Gruevski
2e4215cb72
Fix minor typo in once.rs 2021-04-01 00:52:02 -04:00
The8472
ad3a791e2a panic early when TrustedLen indicates a length > usize::MAX 2021-03-31 23:09:28 +02:00
Frank Steffahn
7509aa108c Apply suggestions from code review
More links, one more occurrence of “a OsString”

Co-authored-by: Yuki Okushi <huyuumi.dev@gmail.com>
2021-03-31 16:09:25 +02:00
Frank Steffahn
f5e7dbb20a Add a few missing links, fix a typo 2021-03-31 16:02:59 +02:00
Frank Steffahn
e7821e5475 Fix documentation of conversion from String to OsString 2021-03-31 16:02:52 +02:00
Dylan DPC
2aa1bf8984
Rollup merge of #83680 - ibraheemdev:patch-2, r=Dylan-DPC
Update for loop desugaring docs

It looks like the documentation for `for` loops was not updated to match the new de-sugaring process.
2021-03-31 01:14:49 +02:00
Dylan DPC
d51fc973e4
Rollup merge of #83678 - GuillaumeGomez:hack-Self-keyword-conflict, r=jyn514
Fix Self keyword doc URL conflict on case insensitive file systems (until definitely fixed on rustdoc)

This is just a hack to allow rustup to work on macOS and windows again to distribute std documentation (hopefully once https://github.com/rust-lang/rfcs/pull/3097 or an equivalent is merged).

Fixes https://github.com/rust-lang/rust/issues/80504. Prevents https://github.com/rust-lang/rust/issues/83154 and https://github.com/rust-lang/rustup/issues/2694 in future releases.

cc ``@kinnison``
r? ``@jyn514``
2021-03-31 01:14:48 +02:00
Dylan DPC
7391124154
Rollup merge of #80720 - steffahn:prettify_prelude_imports, r=camelid,jyn514
Make documentation of which items the prelude exports more readable.

I recently figured out that rustdoc allows link inside of inline code blocks as long as they’re delimited with `<code> </code>` instead of `` ` ` ``. I think this applies nicely in the listing of prelude exports [in the docs](https://doc.rust-lang.org/std/prelude/index.html). There, currently unformatted `::` and `{ , }` is used in order to mimick import syntax while attatching links to individual identifiers.

## Rendered Comparison
### Currently (light)
![Screenshot_20210105_155801](https://user-images.githubusercontent.com/3986214/103661510-1a87be80-4f6f-11eb-8360-1dfb23f732e8.png)

### After this PR (light)
![Screenshot_20210105_155811](https://user-images.githubusercontent.com/3986214/103661533-1f4c7280-4f6f-11eb-89d4-874793937824.png)

### Currently (dark)
![Screenshot_20210105_155824](https://user-images.githubusercontent.com/3986214/103661571-2a9f9e00-4f6f-11eb-95f9-e291b5570b41.png)

### After this PR (dark)
![Screenshot_20210105_155836](https://user-images.githubusercontent.com/3986214/103661592-2ffce880-4f6f-11eb-977a-82afcb07d331.png)

### Currently (ayu)
![Screenshot_20210105_155917](https://user-images.githubusercontent.com/3986214/103661619-39865080-4f6f-11eb-9ca1-9045a107cddd.png)

### After this PR (ayu)
![Screenshot_20210105_155923](https://user-images.githubusercontent.com/3986214/103661652-3db26e00-4f6f-11eb-82b7-378e38f0c41f.png)

_Edit:_ I just noticed, the “current” screenshots are from stable, so there are a few more differences in the pictures than the ones from just this PR.
2021-03-31 01:14:40 +02:00
bors
74874a690b Auto merge of #83652 - xu-cheng:ipv4-octal, r=sfackler
Disallow octal format in Ipv4 string

In its original specification, leading zero in Ipv4 string is interpreted
as octal literals. So a IP address 0127.0.0.1 actually means 87.0.0.1.

This confusion can lead to many security vulnerabilities. Therefore, in
[IETF RFC 6943], it suggests to disallow octal/hexadecimal format in Ipv4
string all together.

Existing implementation already disallows hexadecimal numbers. This commit
makes Parser reject octal numbers.

Fixes #83648.

[IETF RFC 6943]: https://tools.ietf.org/html/rfc6943#section-3.1.1
2021-03-30 19:34:23 +00:00
Ibraheem Ahmed
29fe5930a3
update for loop desugaring docs 2021-03-30 12:03:58 -04:00
Guillaume Gomez
f35e587db4 Fix Self keyword doc URL conflict on case insensitive file systems 2021-03-30 16:37:13 +02:00
bors
7b6fc5a3dd Auto merge of #83170 - joshtriplett:spawn-cleanup, r=kennytm
Simplify Command::spawn (no semantic change)

This minimizes the size of an unsafe block, and allows outdenting some
complex code.
2021-03-30 14:26:01 +00:00
bors
16156fb278 Auto merge of #83674 - Dylan-DPC:rollup-bcuc1hl, r=Dylan-DPC
Rollup of 7 pull requests

Successful merges:

 - #83568 (update comment at MaybeUninit::uninit_array)
 - #83571 (Constantify some slice methods)
 - #83579 (Improve pointer arithmetic docs)
 - #83645 (Wrap non-pre code blocks)
 - #83656 (Add a regression test for issue-82865)
 - #83662 (Update books)
 - #83667 (Suggest box/pin/arc ing receiver on method calls)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
2021-03-30 11:44:36 +00:00
Dylan DPC
5b67543c98
Rollup merge of #83579 - RalfJung:ptr-arithmetic, r=dtolnay
Improve pointer arithmetic docs

* Add slightly more detailed definition of "allocated object" to the module docs, and link it from everywhere.
* Clarify the "remains attached" wording a bit (at least I hope this is clearer).
* Remove the sentence about using integer arithmetic; this seems to confuse people even if it is technically correct.

As usual, the edit needs to be done in a dozen places to remain consistent, I hope I got them all.
2021-03-30 11:34:26 +02:00
Dylan DPC
ad2a80e412
Rollup merge of #83571 - a1phyr:feature_const_slice_first_last, r=dtolnay
Constantify some slice methods

Tracking issue: #83570

This PR constantifies the following functions under feature `const_slice_first_last`:
- `slice::first`
- `slice::split_first`
- `slice::last`
- `slice::split_last`

Blocking on `#![feature(const_mut_refs)]`:
- `slice::first_mut`
- `slice::split_first_mut`
- `slice::last_mut`
- `slice::split_last_mut`
2021-03-30 11:34:25 +02:00
Dylan DPC
9ab5f7db30
Rollup merge of #83568 - RalfJung:uninit_array, r=dtolnay
update comment at MaybeUninit::uninit_array

https://github.com/rust-lang/rust/issues/49147 is closed; this now instead needs inline const expressions (#76001).
2021-03-30 11:34:23 +02:00
bors
689e8470ff Auto merge of #83458 - saethlin:improve-vec-benches, r=dtolnay
Clean up Vec's benchmarks

The Vec benchmarks need a lot of love. I sort of noticed this in https://github.com/rust-lang/rust/pull/83357 but the overall situation is much less awesome than I thought at the time. The first commit just removes a lot of asserts and does a touch of other cleanup.

A number of these benchmarks are poorly-named. For example, `bench_map_fast` is not in fact fast, `bench_rev_1` and `bench_rev_2` are vague, `bench_in_place_zip_iter_mut` doesn't call `zip`, `bench_in_place*` don't do anything in-place... Should I fix these, or is there tooling that depend on the names not changing?

I've also noticed that `bench_rev_1` and `bench_rev_2` are remarkably fragile. It looks like poking other code in `Vec` can cause the codegen of this benchmark to switch to a version that has almost exactly half its current throughput and I have absolutely no idea why.

Here's the fast version:
```asm
  0.69 │110:   movdqu -0x20(%rbx,%rdx,4),%xmm0
  1.76 │       movdqu -0x10(%rbx,%rdx,4),%xmm1
  0.71 │       pshufd $0x1b,%xmm1,%xmm1
  0.60 │       pshufd $0x1b,%xmm0,%xmm0
  3.68 │       movdqu %xmm1,-0x30(%rcx)
 14.36 │       movdqu %xmm0,-0x20(%rcx)
 13.88 │       movdqu -0x40(%rbx,%rdx,4),%xmm0
  6.64 │       movdqu -0x30(%rbx,%rdx,4),%xmm1
  0.76 │       pshufd $0x1b,%xmm1,%xmm1
  0.77 │       pshufd $0x1b,%xmm0,%xmm0
  1.87 │       movdqu %xmm1,-0x10(%rcx)
 13.01 │       movdqu %xmm0,(%rcx)
 38.81 │       add    $0x40,%rcx
  0.92 │       add    $0xfffffffffffffff0,%rdx
  1.22 │     ↑ jne    110
```
And the slow one:
```asm
  0.42 │9a880:   movdqa     %xmm2,%xmm1
  4.03 │9a884:   movq       -0x8(%rbx,%rsi,4),%xmm4
  8.49 │9a88a:   pshufd     $0xe1,%xmm4,%xmm4
  2.58 │9a88f:   movq       -0x10(%rbx,%rsi,4),%xmm5
  7.02 │9a895:   pshufd     $0xe1,%xmm5,%xmm5
  4.79 │9a89a:   punpcklqdq %xmm5,%xmm4
  5.77 │9a89e:   movdqu     %xmm4,-0x18(%rdx)
 15.74 │9a8a3:   movq       -0x18(%rbx,%rsi,4),%xmm4
  3.91 │9a8a9:   pshufd     $0xe1,%xmm4,%xmm4
  5.04 │9a8ae:   movq       -0x20(%rbx,%rsi,4),%xmm5
  5.29 │9a8b4:   pshufd     $0xe1,%xmm5,%xmm5
  4.60 │9a8b9:   punpcklqdq %xmm5,%xmm4
  9.81 │9a8bd:   movdqu     %xmm4,-0x8(%rdx)
 11.05 │9a8c2:   paddq      %xmm3,%xmm0
  0.86 │9a8c6:   paddq      %xmm3,%xmm2
  5.89 │9a8ca:   add        $0x20,%rdx
  0.12 │9a8ce:   add        $0xfffffffffffffff8,%rsi
  1.16 │9a8d2:   add        $0x2,%rdi
  2.96 │9a8d6: → jne        9a880 <<alloc::vec::Vec<T,A> as core::iter::traits::collect::Extend<&T>>::extend+0xd0>
```
2021-03-30 09:03:29 +00:00
bors
32d3276561 Auto merge of #83357 - saethlin:vec-reserve-inlining, r=dtolnay
Reduce the impact of Vec::reserve calls that do not cause any allocation

I think a lot of callers expect `Vec::reserve` to be nearly free when no resizing is required, but unfortunately that isn't the case. LLVM makes remarkably poor inlining choices (along the path from `Vec::reserve` to `RawVec::grow_amortized`), so depending on the surrounding context you either get a huge blob of `RawVec`'s resizing logic inlined into some seemingly-unrelated function, or not enough inlining happens and/or the actual check in `needs_to_grow` ends up behind a function call. My goal is to make the codegen for `Vec::reserve` match the mental that callers seem to have: It's reliably just a `sub cmp ja` if there is already sufficient capacity.

This patch has the following impact on the serde_json benchmarks: ca3efde8a5 run with `cargo +stage1 run --release -- -n 1024`

Before:
```
                                DOM                  STRUCT
======= serde_json ======= parse|stringify ===== parse|stringify ====
data/canada.json         340 MB/s   490 MB/s   630 MB/s   370 MB/s
data/citm_catalog.json   460 MB/s   540 MB/s  1010 MB/s   550 MB/s
data/twitter.json        330 MB/s   840 MB/s   640 MB/s   630 MB/s

======= json-rust ======== parse|stringify ===== parse|stringify ====
data/canada.json         580 MB/s   990 MB/s
data/citm_catalog.json   720 MB/s   660 MB/s
data/twitter.json        570 MB/s   960 MB/s
```

After:
```
                                DOM                  STRUCT
======= serde_json ======= parse|stringify ===== parse|stringify ====
data/canada.json         330 MB/s   510 MB/s   610 MB/s   380 MB/s
data/citm_catalog.json   450 MB/s   640 MB/s   970 MB/s   830 MB/s
data/twitter.json        330 MB/s   880 MB/s   670 MB/s   960 MB/s

======= json-rust ======== parse|stringify ===== parse|stringify ====
data/canada.json         560 MB/s  1130 MB/s
data/citm_catalog.json   710 MB/s   880 MB/s
data/twitter.json        530 MB/s  1230 MB/s

```

That's approximately a one-third increase in throughput on two of the benchmarks, and no effect on one (The benchmark suite has sufficient jitter that I could pick a run where there are no regressions, so I'm not convinced they're meaningful here).

This also produces perf increases on the order of 3-5% in a few other microbenchmarks that I'm tracking. It might be useful to see if this has a cascading effect on inlining choices in some large codebases.

Compiling this simple program demonstrates the change in codegen that causes the perf impact:
```rust
fn main() {
    reserve(&mut Vec::new());
}

#[inline(never)]
fn reserve(v: &mut Vec<u8>) {
    v.reserve(1234);
}
```

Before:
```rust
00000000000069b0 <scratch::reserve>:
    69b0:       53                      push   %rbx
    69b1:       48 83 ec 30             sub    $0x30,%rsp
    69b5:       48 8b 47 08             mov    0x8(%rdi),%rax
    69b9:       48 8b 4f 10             mov    0x10(%rdi),%rcx
    69bd:       48 89 c2                mov    %rax,%rdx
    69c0:       48 29 ca                sub    %rcx,%rdx
    69c3:       48 81 fa d1 04 00 00    cmp    $0x4d1,%rdx
    69ca:       77 73                   ja     6a3f <scratch::reserve+0x8f>
    69cc:       48 81 c1 d2 04 00 00    add    $0x4d2,%rcx
    69d3:       72 75                   jb     6a4a <scratch::reserve+0x9a>
    69d5:       48 89 fb                mov    %rdi,%rbx
    69d8:       48 8d 14 00             lea    (%rax,%rax,1),%rdx
    69dc:       48 39 ca                cmp    %rcx,%rdx
    69df:       48 0f 47 ca             cmova  %rdx,%rcx
    69e3:       48 83 f9 08             cmp    $0x8,%rcx
    69e7:       be 08 00 00 00          mov    $0x8,%esi
    69ec:       48 0f 47 f1             cmova  %rcx,%rsi
    69f0:       48 85 c0                test   %rax,%rax
    69f3:       74 17                   je     6a0c <scratch::reserve+0x5c>
    69f5:       48 8b 0b                mov    (%rbx),%rcx
    69f8:       48 89 0c 24             mov    %rcx,(%rsp)
    69fc:       48 89 44 24 08          mov    %rax,0x8(%rsp)
    6a01:       48 c7 44 24 10 01 00    movq   $0x1,0x10(%rsp)
    6a08:       00 00
    6a0a:       eb 08                   jmp    6a14 <scratch::reserve+0x64>
    6a0c:       48 c7 04 24 00 00 00    movq   $0x0,(%rsp)
    6a13:       00
    6a14:       48 8d 7c 24 18          lea    0x18(%rsp),%rdi
    6a19:       48 89 e1                mov    %rsp,%rcx
    6a1c:       ba 01 00 00 00          mov    $0x1,%edx
    6a21:       e8 9a fe ff ff          call   68c0 <alloc::raw_vec::finish_grow>
    6a26:       48 8b 7c 24 20          mov    0x20(%rsp),%rdi
    6a2b:       48 8b 74 24 28          mov    0x28(%rsp),%rsi
    6a30:       48 83 7c 24 18 01       cmpq   $0x1,0x18(%rsp)
    6a36:       74 0d                   je     6a45 <scratch::reserve+0x95>
    6a38:       48 89 3b                mov    %rdi,(%rbx)
    6a3b:       48 89 73 08             mov    %rsi,0x8(%rbx)
    6a3f:       48 83 c4 30             add    $0x30,%rsp
    6a43:       5b                      pop    %rbx
    6a44:       c3                      ret
    6a45:       48 85 f6                test   %rsi,%rsi
    6a48:       75 08                   jne    6a52 <scratch::reserve+0xa2>
    6a4a:       ff 15 38 c4 03 00       call   *0x3c438(%rip)        # 42e88 <_GLOBAL_OFFSET_TABLE_+0x490>
    6a50:       0f 0b                   ud2
    6a52:       ff 15 f0 c4 03 00       call   *0x3c4f0(%rip)        # 42f48 <_GLOBAL_OFFSET_TABLE_+0x550>
    6a58:       0f 0b                   ud2
    6a5a:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
```

After:
```asm
0000000000006910 <scratch::reserve>:
    6910:       48 8b 47 08             mov    0x8(%rdi),%rax
    6914:       48 8b 77 10             mov    0x10(%rdi),%rsi
    6918:       48 29 f0                sub    %rsi,%rax
    691b:       48 3d d1 04 00 00       cmp    $0x4d1,%rax
    6921:       77 05                   ja     6928 <scratch::reserve+0x18>
    6923:       e9 e8 fe ff ff          jmp    6810 <alloc::raw_vec::RawVec<T,A>::reserve::do_reserve_and_handle>
    6928:       c3                      ret
    6929:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)
```
2021-03-30 03:41:14 +00:00