nordic-dev.net/rust - rust

mirror of https://github.com/rust-lang/rust.git synced 2024-11-28 01:34:21 +00:00

Author	SHA1	Message	Date
bors	015d2bc3fe	Auto merge of #83864 - Dylan-DPC:rollup-78an86n, r=Dylan-DPC Rollup of 7 pull requests Successful merges: - #80525 (wasm64 support) - #83019 (core: disable `ptr::swap_nonoverlapping_one`'s block optimization on SPIR-V.) - #83717 (rustdoc: Separate filter-empty-string out into its own function) - #83807 (Tests: Remove redundant `ignore-tidy-linelength` annotations) - #83815 (ptr::addr_of documentation improvements) - #83820 (Remove attribute `#[link_args]`) - #83841 (Allow clobbering unsupported registers in asm!) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup	2021-04-05 01:26:57 +00:00
bors	35aa636159	Auto merge of #83530 - Mark-Simulacrum:bootstrap-bump, r=Mark-Simulacrum Bump bootstrap to 1.52 beta This includes the standard bump, but also a workaround for new cargo behavior around clearing out the doc directory when the rustdoc version changes.	2021-04-04 22:45:56 +00:00
Dylan DPC	3c2e4ff525	Rollup merge of #83820 - petrochenkov:nolinkargs, r=nagisa Remove attribute `#[link_args]` Closes https://github.com/rust-lang/rust/issues/29596 The attribute could always be replaced with `-C link-arg`, but cargo didn't provide a reasonable way to pass such flags to rustc. Now cargo supports `cargo:rustc-link-arg*` directives in build scripts (https://doc.rust-lang.org/cargo/reference/unstable.html#extra-link-arg), so this attribute can be removed.	2021-04-05 00:24:33 +02:00
Dylan DPC	fbe89e20e8	Rollup merge of #83815 - RalfJung:addr_of, r=kennytm ptr::addr_of documentation improvements While writing https://github.com/rust-lang/reference/pull/1001 I figured I could also improve the docs here a bit.	2021-04-05 00:24:32 +02:00
Dylan DPC	4e3f471499	Rollup merge of #83019 - eddyb:spirv-no-block-swap, r=nagisa core: disable `ptr::swap_nonoverlapping_one`'s block optimization on SPIR-V. SPIR-V primarily supports what it calls the "Logical addressing model" (and AFAIK for graphical shaders it's the only option), and what that implies is that there is no "memory" to uniformly address at some byte/word level, and that you can't really talk about values having a "raw representation" in terms of sequences of bytes. Therefore, the "block"-wise swapping optimization employed by `ptr::swap_nonoverlapping_one` (where a "block" is 32 bytes, currently), is fundamentally incompatible with SPIR-V "memory". As such, [Rust-GPU](https://github.com/EmbarkStudios/rust-gpu/)'s `rustc_codegen_spirv` backend cannot currently allow the use of `ptr::swap_nonoverlapping_one` - but that comes at a great price, since it's the building block of `mem::{swap,replace}`, and those in turn are used by e.g. `Option::take` and `Range`'s `Iterator` implementation (the latter blocking the use of `for i in 0..n` loops). 
There are 4 options I can see in terms of supporting `ptr::swap_nonoverlapping_one` in `rustc_codegen_spirv`: * legalize the block-wise swap loop back into swapping whole values, for SPIR-V * this is made borderline impossible by the fact that the size of the state "on the stack" is a block, and has to be expanded back to the appropriate size of the value being swapped, so in practice this would have to effectively pattern-match on the exact shape of the block-wise swapping algorithm, as a roundabout way of "patching `core::ptr` on the fly" * (this PR) disable the block-wise swap optimization altogether when `#[cfg(target_arch = "spirv")]` * I've tested it and it does in fact allow compiling `for i in 0..n` loops, which was my primary motivation * main downside IMO is the fact that `core` now acknowledges an out-of-tree backend * as a counterpoint, any attempt to compile Rust to SPIR-V would run into this problem, one way or another * only enable the block-wise swap optimization on targets where it's been empirically proven to be an improvement * would avoid any surprises in terms of potentially-broken/inefficient codegen, in general * however, it may be universally applicable (thanks to caches), even if the optimal block size could differ * move low-level swapping into an intrinsic, where the backend can choose any optimization approach it wants * this also has an impact on MIR optimizations (cc `@rust-lang/wg-mir-opt`) - which currently cannot hope to make sense of e.g. `Option::take` despite it being effectively `_0 = _1;` `_1 = None;` `return;` * long-term this is my preferred approach, and I can start working on it if that's desired, but I wanted to confirm that this swapping optimization is the final blocker for [Rust-GPU](https://github.com/EmbarkStudios/rust-gpu/) supporting e.g. range `for` loops r? `@nagisa` cc `@rust-lang/libs`	2021-04-05 00:24:29 +02:00
Eduard-Mihai Burtescu	bc6af97ed0	core: disable `ptr::swap_nonoverlapping_one`'s block optimization on SPIR-V.	2021-04-04 22:26:27 +03:00
Eduard-Mihai Burtescu	3c3d3ddde9	core: rearrange `ptr::swap_nonoverlapping_one`'s cases (no functional changes).	2021-04-04 22:26:00 +03:00
Mark Rousskov	b3a4f91b8d	Bump cfgs	2021-04-04 14:57:05 -04:00
Ralf Jung	b577d7ef25	fix typo Co-authored-by: kennytm <kennytm@gmail.com>	2021-04-04 19:32:54 +02:00
Dylan DPC	b943ea8cdc	Rollup merge of #83827 - the8472:fix-inplace-panic-on-drop, r=RalfJung cleanup leak after test to make miri happy Contains changes that were requested in #83629 but didn't make it into the rollup. r? `@RalfJung`	2021-04-04 19:20:06 +02:00
Dylan DPC	6c13556183	Rollup merge of #82726 - ssomers:btree_node_rearange, r=Mark-Simulacrum BTree: move blocks around in node.rs Without changing any names or implementation, reorder some members: - Move down the ones defined long ago on the demised `struct Root`, to below the definition of their current host `struct NodeRef`. - Move up some defined on `struct NodeRef` that are interspersed with those defined on `struct Handle`. - Move up the `correct_…` methods squeezed between the two flavours of `push`. - Move the unchecked static downcasts (`cast_to_…`) after the upcasts (`forget_`) and the (weirdly named) dynamic downcasts (`force`). r? `@Mark-Simulacrum`	2021-04-04 19:20:00 +02:00
Dylan DPC	869726d335	Rollup merge of #81619 - SkiFire13:resultshunt-inplace, r=the8472 Implement `SourceIterator` and `InPlaceIterable` for `ResultShunt`	2021-04-04 19:19:59 +02:00
Ralf Jung	5d1747bf07	rely on intra-doc links Co-authored-by: Yuki Okushi <jtitor@2k36.org>	2021-04-04 11:24:25 +02:00
bors	88e7862dd0	Auto merge of #83267 - ssomers:btree_prune_range_search_overlap, r=Mark-Simulacrum BTree: no longer search arrays twice to check Ord A possible addition to / partial replacement of #83147: no longer linearly search the upper bound of a range in the initial portion of the keys we already know are below the lower bound. - Should be faster: fewer key comparisons at the cost of some instructions dealing with offsets - Makes code a little more complicated. - No longer detects ill-defined `Ord` implementations, but that wasn't a publicised feature, and was quite incomplete, and was only done in the `range` and `range_mut` methods. r? `@Mark-Simulacrum`	2021-04-04 05:52:43 +00:00
the8472	572873fce0	suggestion from review Co-authored-by: Ralf Jung <post@ralfj.de>	2021-04-04 01:38:58 +02:00
The8472	3bd241f95b	cleanup leak after test to make miri happy	2021-04-04 01:37:05 +02:00
Vadim Petrochenkov	5839bff0ba	Remove attribute `#[link_args]`	2021-04-03 21:25:53 +03:00
Ralf Jung	b93137a24e	explain that even addr_of cannot deref a NULL ptr	2021-04-03 19:26:54 +02:00
Ralf Jung	a4a6bdd337	addr_of_mut: add example for creating a pointer to uninit data	2021-04-03 19:25:11 +02:00
Yuki Okushi	961fa632d6	Rollup merge of #83780 - matklad:doc-error-message, r=JohnTitor Document "standard" conventions for error messages These are currently documented in the API guidelines: https://rust-lang.github.io/api-guidelines/interoperability.html#error-types-are-meaningful-and-well-behaved-c-good-err I think it makes sense to uplift this guideline (in a milder form) into std docs. Printing and producing errors is something that even non-expert users do frequently, so it is useful to give at least some indication of what a typical error message looks like.	2021-04-04 00:19:37 +09:00
Yuki Okushi	3b40d2c1f3	Rollup merge of #82487 - CDirkx:const-socketaddr, r=m-ou-se Constify methods of `std::net::SocketAddr`, `SocketAddrV4` and `SocketAddrV6` The following methods are made unstable const under the `const_socketaddr` feature (https://github.com/rust-lang/rust/issues/82485): ```rust // std::net impl SocketAddr { pub const fn ip(&self) -> IpAddr; pub const fn port(&self) -> u16; pub const fn is_ipv4(&self) -> bool; pub const fn is_ipv6(&self) -> bool; } impl SocketAddrV4 { pub const fn ip(&self) -> IpAddr; pub const fn port(&self) -> u16; } impl SocketAddrV6 { pub const fn ip(&self) -> IpAddr; pub const fn port(&self) -> u16; pub const fn flowinfo(&self) -> u32; pub const fn scope_id(&self) -> u32; } ``` Note: `SocketAddrV4::ip` and `SocketAddrV6::ip` use pointer casting and depend on the unstable feature `const_raw_ptr_deref`	2021-04-04 00:19:30 +09:00
bors	621d4b7cbf	Auto merge of #83506 - asomers:backtrace-0.3.56, r=Mark-Simulacrum Update backtrace to 0.3.56 Fixes #78184	2021-04-03 01:52:36 +00:00
Dylan DPC	cb7133f693	Rollup merge of #83771 - asomers:stack_overflow_freebsd, r=dtolnay Fix stack overflow detection on FreeBSD 11.1+ Beginning with FreeBSD 10.4 and 11.1, there is one guard page by default. And the stack autoresizes, so if Rust allocates its own guard page, then FreeBSD's will simply move up one page. The best solution is to just use the OS's guard page.	2021-04-02 19:57:35 +02:00
Dylan DPC	542f441d44	Rollup merge of #83629 - the8472:fix-inplace-panic-on-drop, r=m-ou-se Fix double-drop in `Vec::from_iter(vec.into_iter())` specialization when items drop during panic This fixes the double-drop but it leaves a behavioral difference compared to the default implementation intact: In the default implementation the source and the destination vec are separate objects, so they get dropped separately. Here they share an allocation and the latter only exists as a pointer into the former. So if dropping the former panics then this fix will leak more items than the default implementation would. Is this acceptable or should the specialization also mimic the default implementation's drops-during-panic behavior? Fixes #83618 `@rustbot` label T-libs-impl	2021-04-02 19:57:31 +02:00
Dylan DPC	48ebad58b2	Rollup merge of #83065 - CDirkx:win-alloc, r=dtolnay Rework `std::sys::windows::alloc` I came across https://github.com/rust-lang/rust/pull/76676#discussion_r488729990, which points out that there was unsound code in the Windows alloc code, creating a &mut to possibly uninitialized memory. I reworked the code so that that particular issue does not occur anymore, and started adding more documentation and safety comments. Full list of changes: - moved and documented the relevant Windows Heap API functions - refactor `allocate_with_flags` to `allocate` (and remove the other helper functions), which now takes just a `bool` if the memory should be zeroed - add checks for whether `GetProcessHeap` returned null - add a test that checks if the size and alignment of a `Header` are indeed <= `MIN_ALIGN` - add `#![deny(unsafe_op_in_unsafe_fn)]` and the necessary unsafe blocks with safety comments I feel like I may have overdone the documenting, the unsoundness fix is the most important part; I could split this PR up in separate parts.	2021-04-02 19:57:28 +02:00
Christiaan Dirkx	db1d003de1	Remove `debug_assert`	2021-04-02 17:50:23 +02:00
Christiaan Dirkx	c86e0985f9	Introduce `get_process_heap` and fix atomic ordering.	2021-04-02 17:37:52 +02:00
Yuki Okushi	417e6b1dd0	Rollup merge of #83740 - obi1kenobi:patch-1, r=joshtriplett Fix comment typo in once.rs I believe I came across a minor typo in a comment. I am not particularly familiar with this part of the codebase, but I have read the surrounding code as well as the referenced `park` and `unpark` functions, and I believe my proposed change is true to the intended meaning of the comment. I intentionally tried to keep the change as minimal as possible. If I have the maintainers' permission, I'd also love to add a comma to improve readability as follows: `Luckily ``park`` comes with the guarantee that if it got an ``unpark`` just before on an unparked thread, it does not park.`	2021-04-02 21:28:23 +09:00
Aleksey Kladov	5547d92746	Document "standard" conventions for error messages These are currently documented in the API guidelines: https://rust-lang.github.io/api-guidelines/interoperability.html#error-types-are-meaningful-and-well-behaved-c-good-err I think it makes sense to uplift this guideline (in a milder form) into std docs. Printing and producing errors is something that even non-expert users do frequently, so it is useful to give at least some indication of what a typical error message looks like.	2021-04-02 15:11:49 +03:00
bors	5662d9343f	Auto merge of #80965 - camelid:rename-doc-spotlight, r=jyn514 Rename `#[doc(spotlight)]` to `#[doc(notable_trait)]` Fixes #80936. "spotlight" is not a very specific or self-explaining name. Additionally, the dialog that it triggers is called "Notable traits". So, "notable trait" is a better name. * Rename `#[doc(spotlight)]` to `#[doc(notable_trait)]` * Rename `#![feature(doc_spotlight)]` to `#![feature(doc_notable_trait)]` * Update documentation * Improve documentation r? `@Manishearth`	2021-04-02 07:04:58 +00:00
Alan Somers	ca14abbab1	Fix stack overflow detection on FreeBSD 11.1+ Beginning with FreeBSD 10.4 and 11.1, there is one guard page by default. And the stack autoresizes, so if Rust allocates its own guard page, then FreeBSD's will simply move up one page. The best solution is to just use the OS's guard page.	2021-04-01 22:57:20 -06:00
bors	803ddb8359	Auto merge of #83726 - the8472:large-trustedlen-fail-fast, r=kennytm panic early when `TrustedLen` indicates a `length > usize::MAX` Changes `TrustedLen` specializations to immediately panic when `size_hint().1 == None`. As far as I can tell this is ~not a change~ a minimal change in observable behavior for anything except ZSTs because the fallback path would go through `extend_desugared()` which tries to `reserve(lower_bound)` which already is `usize::MAX` and that would also lead to a panic. Before it might have popped somewhere between zero and a few elements from the iterator before panicking while it now panics immediately. Overall this should reduce codegen by eliminating the fallback paths. While looking into the `with_capacity()` behavior I also noticed that its documentation didn't have a Panics section, so I added that.	2021-04-01 07:55:00 +00:00
Predrag Gruevski	2e4215cb72	Fix minor typo in once.rs	2021-04-01 00:52:02 -04:00
The8472	ad3a791e2a	panic early when TrustedLen indicates a length > usize::MAX	2021-03-31 23:09:28 +02:00
Frank Steffahn	7509aa108c	Apply suggestions from code review More links, one more occurrence of “a OsString” Co-authored-by: Yuki Okushi <huyuumi.dev@gmail.com>	2021-03-31 16:09:25 +02:00
Frank Steffahn	f5e7dbb20a	Add a few missing links, fix a typo	2021-03-31 16:02:59 +02:00
Frank Steffahn	e7821e5475	Fix documentation of conversion from String to OsString	2021-03-31 16:02:52 +02:00
Dylan DPC	2aa1bf8984	Rollup merge of #83680 - ibraheemdev:patch-2, r=Dylan-DPC Update for loop desugaring docs It looks like the documentation for `for` loops was not updated to match the new de-sugaring process.	2021-03-31 01:14:49 +02:00
Dylan DPC	d51fc973e4	Rollup merge of #83678 - GuillaumeGomez:hack-Self-keyword-conflict, r=jyn514 Fix Self keyword doc URL conflict on case insensitive file systems (until definitely fixed on rustdoc) This is just a hack to allow rustup to work on macOS and windows again to distribute std documentation (hopefully once https://github.com/rust-lang/rfcs/pull/3097 or an equivalent is merged). Fixes https://github.com/rust-lang/rust/issues/80504. Prevents https://github.com/rust-lang/rust/issues/83154 and https://github.com/rust-lang/rustup/issues/2694 in future releases. cc `@kinnison` r? `@jyn514`	2021-03-31 01:14:48 +02:00
Dylan DPC	7391124154	Rollup merge of #80720 - steffahn:prettify_prelude_imports, r=camelid,jyn514 Make documentation of which items the prelude exports more readable. I recently figured out that rustdoc allows links inside of inline code blocks as long as they’re delimited with `<code> </code>` instead of `` ` ` ``. I think this applies nicely in the listing of prelude exports [in the docs](https://doc.rust-lang.org/std/prelude/index.html). There, currently unformatted `::` and `{ , }` are used in order to mimic import syntax while attaching links to individual identifiers. ## Rendered Comparison ### Currently (light) ![Screenshot_20210105_155801](https://user-images.githubusercontent.com/3986214/103661510-1a87be80-4f6f-11eb-8360-1dfb23f732e8.png) ### After this PR (light) ![Screenshot_20210105_155811](https://user-images.githubusercontent.com/3986214/103661533-1f4c7280-4f6f-11eb-89d4-874793937824.png) ### Currently (dark) ![Screenshot_20210105_155824](https://user-images.githubusercontent.com/3986214/103661571-2a9f9e00-4f6f-11eb-95f9-e291b5570b41.png) ### After this PR (dark) ![Screenshot_20210105_155836](https://user-images.githubusercontent.com/3986214/103661592-2ffce880-4f6f-11eb-977a-82afcb07d331.png) ### Currently (ayu) ![Screenshot_20210105_155917](https://user-images.githubusercontent.com/3986214/103661619-39865080-4f6f-11eb-9ca1-9045a107cddd.png) ### After this PR (ayu) ![Screenshot_20210105_155923](https://user-images.githubusercontent.com/3986214/103661652-3db26e00-4f6f-11eb-82b7-378e38f0c41f.png) _Edit:_ I just noticed, the “current” screenshots are from stable, so there are a few more differences in the pictures than the ones from just this PR.	2021-03-31 01:14:40 +02:00
bors	74874a690b	Auto merge of #83652 - xu-cheng:ipv4-octal, r=sfackler Disallow octal format in Ipv4 string In the original specification, a leading zero in an IPv4 string marks an octal literal, so the IP address 0127.0.0.1 actually means 87.0.0.1. This confusion can lead to security vulnerabilities. Therefore, [IETF RFC 6943] suggests disallowing octal/hexadecimal format in IPv4 strings altogether. The existing implementation already disallows hexadecimal numbers. This commit makes Parser reject octal numbers. Fixes #83648. [IETF RFC 6943]: https://tools.ietf.org/html/rfc6943#section-3.1.1	2021-03-30 19:34:23 +00:00
Ibraheem Ahmed	29fe5930a3	update for loop desugaring docs	2021-03-30 12:03:58 -04:00
Guillaume Gomez	f35e587db4	Fix Self keyword doc URL conflict on case insensitive file systems	2021-03-30 16:37:13 +02:00
bors	7b6fc5a3dd	Auto merge of #83170 - joshtriplett:spawn-cleanup, r=kennytm Simplify Command::spawn (no semantic change) This minimizes the size of an unsafe block, and allows outdenting some complex code.	2021-03-30 14:26:01 +00:00
bors	16156fb278	Auto merge of #83674 - Dylan-DPC:rollup-bcuc1hl, r=Dylan-DPC Rollup of 7 pull requests Successful merges: - #83568 (update comment at MaybeUninit::uninit_array) - #83571 (Constantify some slice methods) - #83579 (Improve pointer arithmetic docs) - #83645 (Wrap non-pre code blocks) - #83656 (Add a regression test for issue-82865) - #83662 (Update books) - #83667 (Suggest box/pin/arc ing receiver on method calls) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup	2021-03-30 11:44:36 +00:00
Dylan DPC	5b67543c98	Rollup merge of #83579 - RalfJung:ptr-arithmetic, r=dtolnay Improve pointer arithmetic docs * Add slightly more detailed definition of "allocated object" to the module docs, and link it from everywhere. * Clarify the "remains attached" wording a bit (at least I hope this is clearer). * Remove the sentence about using integer arithmetic; this seems to confuse people even if it is technically correct. As usual, the edit needs to be done in a dozen places to remain consistent, I hope I got them all.	2021-03-30 11:34:26 +02:00
Dylan DPC	ad2a80e412	Rollup merge of #83571 - a1phyr:feature_const_slice_first_last, r=dtolnay Constantify some slice methods Tracking issue: #83570 This PR constantifies the following functions under feature `const_slice_first_last`: - `slice::first` - `slice::split_first` - `slice::last` - `slice::split_last` Blocking on `#![feature(const_mut_refs)]`: - `slice::first_mut` - `slice::split_first_mut` - `slice::last_mut` - `slice::split_last_mut`	2021-03-30 11:34:25 +02:00
Dylan DPC	9ab5f7db30	Rollup merge of #83568 - RalfJung:uninit_array, r=dtolnay update comment at MaybeUninit::uninit_array https://github.com/rust-lang/rust/issues/49147 is closed; this now instead needs inline const expressions (#76001).	2021-03-30 11:34:23 +02:00
bors	689e8470ff	Auto merge of #83458 - saethlin:improve-vec-benches, r=dtolnay Clean up Vec's benchmarks The Vec benchmarks need a lot of love. I sort of noticed this in https://github.com/rust-lang/rust/pull/83357 but the overall situation is much less awesome than I thought at the time. The first commit just removes a lot of asserts and does a touch of other cleanup. A number of these benchmarks are poorly-named. For example, `bench_map_fast` is not in fact fast, `bench_rev_1` and `bench_rev_2` are vague, `bench_in_place_zip_iter_mut` doesn't call `zip`, `bench_in_place*` don't do anything in-place... Should I fix these, or is there tooling that depend on the names not changing? I've also noticed that `bench_rev_1` and `bench_rev_2` are remarkably fragile. It looks like poking other code in `Vec` can cause the codegen of this benchmark to switch to a version that has almost exactly half its current throughput and I have absolutely no idea why. Here's the fast version: ```asm 0.69 │110: movdqu -0x20(%rbx,%rdx,4),%xmm0 1.76 │ movdqu -0x10(%rbx,%rdx,4),%xmm1 0.71 │ pshufd $0x1b,%xmm1,%xmm1 0.60 │ pshufd $0x1b,%xmm0,%xmm0 3.68 │ movdqu %xmm1,-0x30(%rcx) 14.36 │ movdqu %xmm0,-0x20(%rcx) 13.88 │ movdqu -0x40(%rbx,%rdx,4),%xmm0 6.64 │ movdqu -0x30(%rbx,%rdx,4),%xmm1 0.76 │ pshufd $0x1b,%xmm1,%xmm1 0.77 │ pshufd $0x1b,%xmm0,%xmm0 1.87 │ movdqu %xmm1,-0x10(%rcx) 13.01 │ movdqu %xmm0,(%rcx) 38.81 │ add $0x40,%rcx 0.92 │ add $0xfffffffffffffff0,%rdx 1.22 │ ↑ jne 110 ``` And the slow one: ```asm 0.42 │9a880: movdqa %xmm2,%xmm1 4.03 │9a884: movq -0x8(%rbx,%rsi,4),%xmm4 8.49 │9a88a: pshufd $0xe1,%xmm4,%xmm4 2.58 │9a88f: movq -0x10(%rbx,%rsi,4),%xmm5 7.02 │9a895: pshufd $0xe1,%xmm5,%xmm5 4.79 │9a89a: punpcklqdq %xmm5,%xmm4 5.77 │9a89e: movdqu %xmm4,-0x18(%rdx) 15.74 │9a8a3: movq -0x18(%rbx,%rsi,4),%xmm4 3.91 │9a8a9: pshufd $0xe1,%xmm4,%xmm4 5.04 │9a8ae: movq -0x20(%rbx,%rsi,4),%xmm5 5.29 │9a8b4: pshufd $0xe1,%xmm5,%xmm5 4.60 │9a8b9: punpcklqdq %xmm5,%xmm4 9.81 │9a8bd: movdqu 
%xmm4,-0x8(%rdx) 11.05 │9a8c2: paddq %xmm3,%xmm0 0.86 │9a8c6: paddq %xmm3,%xmm2 5.89 │9a8ca: add $0x20,%rdx 0.12 │9a8ce: add $0xfffffffffffffff8,%rsi 1.16 │9a8d2: add $0x2,%rdi 2.96 │9a8d6: → jne 9a880 <<alloc::vec::Vec<T,A> as core::iter::traits::collect::Extend<&T>>::extend+0xd0> ```	2021-03-30 09:03:29 +00:00
bors	32d3276561	Auto merge of #83357 - saethlin:vec-reserve-inlining, r=dtolnay Reduce the impact of Vec::reserve calls that do not cause any allocation I think a lot of callers expect `Vec::reserve` to be nearly free when no resizing is required, but unfortunately that isn't the case. LLVM makes remarkably poor inlining choices (along the path from `Vec::reserve` to `RawVec::grow_amortized`), so depending on the surrounding context you either get a huge blob of `RawVec`'s resizing logic inlined into some seemingly-unrelated function, or not enough inlining happens and/or the actual check in `needs_to_grow` ends up behind a function call. My goal is to make the codegen for `Vec::reserve` match the mental model that callers seem to have: It's reliably just a `sub cmp ja` if there is already sufficient capacity. This patch has the following impact on the serde_json benchmarks: `ca3efde8a5` run with `cargo +stage1 run --release -- -n 1024` Before: ``` DOM STRUCT ======= serde_json ======= parse\|stringify ===== parse\|stringify ==== data/canada.json 340 MB/s 490 MB/s 630 MB/s 370 MB/s data/citm_catalog.json 460 MB/s 540 MB/s 1010 MB/s 550 MB/s data/twitter.json 330 MB/s 840 MB/s 640 MB/s 630 MB/s ======= json-rust ======== parse\|stringify ===== parse\|stringify ==== data/canada.json 580 MB/s 990 MB/s data/citm_catalog.json 720 MB/s 660 MB/s data/twitter.json 570 MB/s 960 MB/s ``` After: ``` DOM STRUCT ======= serde_json ======= parse\|stringify ===== parse\|stringify ==== data/canada.json 330 MB/s 510 MB/s 610 MB/s 380 MB/s data/citm_catalog.json 450 MB/s 640 MB/s 970 MB/s 830 MB/s data/twitter.json 330 MB/s 880 MB/s 670 MB/s 960 MB/s ======= json-rust ======== parse\|stringify ===== parse\|stringify ==== data/canada.json 560 MB/s 1130 MB/s data/citm_catalog.json 710 MB/s 880 MB/s data/twitter.json 530 MB/s 1230 MB/s ``` That's approximately a one-third increase in throughput on two of the benchmarks, and no effect on one (The benchmark suite has sufficient jitter that I 
could pick a run where there are no regressions, so I'm not convinced they're meaningful here). This also produces perf increases on the order of 3-5% in a few other microbenchmarks that I'm tracking. It might be useful to see if this has a cascading effect on inlining choices in some large codebases. Compiling this simple program demonstrates the change in codegen that causes the perf impact: ```rust fn main() { reserve(&mut Vec::new()); } #[inline(never)] fn reserve(v: &mut Vec<u8>) { v.reserve(1234); } ``` Before: ```rust 00000000000069b0 <scratch::reserve>: 69b0: 53 push %rbx 69b1: 48 83 ec 30 sub $0x30,%rsp 69b5: 48 8b 47 08 mov 0x8(%rdi),%rax 69b9: 48 8b 4f 10 mov 0x10(%rdi),%rcx 69bd: 48 89 c2 mov %rax,%rdx 69c0: 48 29 ca sub %rcx,%rdx 69c3: 48 81 fa d1 04 00 00 cmp $0x4d1,%rdx 69ca: 77 73 ja 6a3f <scratch::reserve+0x8f> 69cc: 48 81 c1 d2 04 00 00 add $0x4d2,%rcx 69d3: 72 75 jb 6a4a <scratch::reserve+0x9a> 69d5: 48 89 fb mov %rdi,%rbx 69d8: 48 8d 14 00 lea (%rax,%rax,1),%rdx 69dc: 48 39 ca cmp %rcx,%rdx 69df: 48 0f 47 ca cmova %rdx,%rcx 69e3: 48 83 f9 08 cmp $0x8,%rcx 69e7: be 08 00 00 00 mov $0x8,%esi 69ec: 48 0f 47 f1 cmova %rcx,%rsi 69f0: 48 85 c0 test %rax,%rax 69f3: 74 17 je 6a0c <scratch::reserve+0x5c> 69f5: 48 8b 0b mov (%rbx),%rcx 69f8: 48 89 0c 24 mov %rcx,(%rsp) 69fc: 48 89 44 24 08 mov %rax,0x8(%rsp) 6a01: 48 c7 44 24 10 01 00 movq $0x1,0x10(%rsp) 6a08: 00 00 6a0a: eb 08 jmp 6a14 <scratch::reserve+0x64> 6a0c: 48 c7 04 24 00 00 00 movq $0x0,(%rsp) 6a13: 00 6a14: 48 8d 7c 24 18 lea 0x18(%rsp),%rdi 6a19: 48 89 e1 mov %rsp,%rcx 6a1c: ba 01 00 00 00 mov $0x1,%edx 6a21: e8 9a fe ff ff call 68c0 <alloc::raw_vec::finish_grow> 6a26: 48 8b 7c 24 20 mov 0x20(%rsp),%rdi 6a2b: 48 8b 74 24 28 mov 0x28(%rsp),%rsi 6a30: 48 83 7c 24 18 01 cmpq $0x1,0x18(%rsp) 6a36: 74 0d je 6a45 <scratch::reserve+0x95> 6a38: 48 89 3b mov %rdi,(%rbx) 6a3b: 48 89 73 08 mov %rsi,0x8(%rbx) 6a3f: 48 83 c4 30 add $0x30,%rsp 6a43: 5b pop %rbx 6a44: c3 ret 6a45: 48 85 f6 test %rsi,%rsi 
6a48: 75 08 jne 6a52 <scratch::reserve+0xa2> 6a4a: ff 15 38 c4 03 00 call 0x3c438(%rip) # 42e88 <_GLOBAL_OFFSET_TABLE_+0x490> 6a50: 0f 0b ud2 6a52: ff 15 f0 c4 03 00 call 0x3c4f0(%rip) # 42f48 <_GLOBAL_OFFSET_TABLE_+0x550> 6a58: 0f 0b ud2 6a5a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) ``` After: ```asm 0000000000006910 <scratch::reserve>: 6910: 48 8b 47 08 mov 0x8(%rdi),%rax 6914: 48 8b 77 10 mov 0x10(%rdi),%rsi 6918: 48 29 f0 sub %rsi,%rax 691b: 48 3d d1 04 00 00 cmp $0x4d1,%rax 6921: 77 05 ja 6928 <scratch::reserve+0x18> 6923: e9 e8 fe ff ff jmp 6810 <alloc::raw_vec::RawVec<T,A>::reserve::do_reserve_and_handle> 6928: c3 ret 6929: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) ```	2021-03-30 03:41:14 +00:00
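Editorial note on #83357 above: the fast path that PR optimizes is the case where `Vec::reserve` has nothing to do. A minimal sketch of that case follows, relying only on the documented behavior that `reserve` does nothing when capacity is already sufficient. The helper name `reserve_is_noop` is hypothetical and does not come from the PR.

```rust
/// Hypothetical helper (not from the PR): returns true if `reserve`
/// left both the capacity and the buffer pointer unchanged.
fn reserve_is_noop(v: &mut Vec<u8>, additional: usize) -> bool {
    let cap_before = v.capacity();
    let ptr_before = v.as_ptr();
    v.reserve(additional);
    v.capacity() == cap_before && v.as_ptr() == ptr_before
}

fn main() {
    // Enough spare capacity: this is the `sub cmp ja` fast path from the PR text.
    let mut roomy: Vec<u8> = Vec::with_capacity(2048);
    assert!(reserve_is_noop(&mut roomy, 1234));

    // No capacity yet: `reserve` has to allocate, so the capacity grows.
    let mut empty: Vec<u8> = Vec::new();
    assert!(!reserve_is_noop(&mut empty, 1234));
}
```

This only observes behavior from the outside; it says nothing about how the check is inlined, which is what the disassembly in the commit message shows.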

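Editorial note on #83652 above (Disallow octal format in Ipv4 string): a minimal sketch of the resulting parsing behavior, assuming a toolchain that includes that change. The helper `parse_v4` is a hypothetical name used only for illustration, and the input `012.0.0.1` is an example chosen here rather than one taken from the PR.

```rust
use std::net::Ipv4Addr;

/// Hypothetical helper: parse a dotted-quad string, discarding the error detail.
fn parse_v4(s: &str) -> Option<Ipv4Addr> {
    s.parse().ok()
}

fn main() {
    // Plain decimal octets parse as usual.
    assert_eq!(parse_v4("127.0.0.1"), Some(Ipv4Addr::new(127, 0, 0, 1)));

    // An octet with a leading zero is ambiguous (decimal 12 or octal 10?),
    // so the parser rejects it instead of guessing.
    assert_eq!(parse_v4("012.0.0.1"), None);

    // The ambiguity in numbers: the same digits read as octal give other values.
    assert_eq!(u8::from_str_radix("12", 8), Ok(10));
    assert_eq!(u8::from_str_radix("127", 8), Ok(87));

    // A lone zero octet is still fine.
    assert_eq!(parse_v4("0.0.0.0"), Some(Ipv4Addr::UNSPECIFIED));
}
```

Rejecting the input outright, rather than choosing one interpretation, follows the reasoning the commit message attributes to RFC 6943.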

3668 Commits