Rollup of 2 pull requests
Successful merges:
- #90998 (Require const stability attribute on all stable functions that are `const`)
- #93489 (Mark the panic_no_unwind lang item as nounwind)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Mark the panic_no_unwind lang item as nounwind
This has 2 effects:
- It helps LLVM when inlining since it doesn't need to generate landing pads for `panic_no_unwind`.
- It makes it sound for a panic handler to unwind even if `PanicInfo::can_unwind` returns true. This will simply cause another panic once the unwind tries to go past the `panic_no_unwind` lang item. Eventually this will cause a stack overflow, which is safe.
Require const stability attribute on all stable functions that are `const`
This PR requires all stable functions (of all kinds) that are `const fn` to have a `#[rustc_const_stable]` or `#[rustc_const_unstable]` attribute. Stability was previously implied if omitted; a follow-up PR is planned to change the fallback to be unstable.
Optimize `core::str::Chars::count`
I wrote this a while ago after seeing this function as a bottleneck in a profile, but never got around to contributing it. I saw it again, and so here it is. The implementation is fairly complex, but I tried to explain what's happening at both a high level (in the header comment for the file), and in line comments in the impl. Hopefully it's clear enough.
This implementation (`case00_cur_libcore` in the benchmarks below) is somewhat consistently around 4x to 5x faster than the old implementation (`case01_old_libcore` in the benchmarks below), for a wide variety of workloads, without regressing performance on any of the workload sizes I've tried.
I also improved the benchmarks for this code, so that they explicitly check text in different languages and of different sizes (err, the cross product of language x size). The results of the benchmarks are here:
<details>
<summary>Benchmark results</summary>
<pre>
test str::char_count::emoji_huge::case00_cur_libcore ... bench: 20,216 ns/iter (+/- 3,673) = 17931 MB/s
test str::char_count::emoji_huge::case01_old_libcore ... bench: 108,851 ns/iter (+/- 12,777) = 3330 MB/s
test str::char_count::emoji_huge::case02_iter_increment ... bench: 329,502 ns/iter (+/- 4,163) = 1100 MB/s
test str::char_count::emoji_huge::case03_manual_char_len ... bench: 223,333 ns/iter (+/- 14,167) = 1623 MB/s
test str::char_count::emoji_large::case00_cur_libcore ... bench: 293 ns/iter (+/- 6) = 19331 MB/s
test str::char_count::emoji_large::case01_old_libcore ... bench: 1,681 ns/iter (+/- 28) = 3369 MB/s
test str::char_count::emoji_large::case02_iter_increment ... bench: 5,166 ns/iter (+/- 85) = 1096 MB/s
test str::char_count::emoji_large::case03_manual_char_len ... bench: 3,476 ns/iter (+/- 62) = 1629 MB/s
test str::char_count::emoji_medium::case00_cur_libcore ... bench: 48 ns/iter (+/- 0) = 14750 MB/s
test str::char_count::emoji_medium::case01_old_libcore ... bench: 217 ns/iter (+/- 4) = 3262 MB/s
test str::char_count::emoji_medium::case02_iter_increment ... bench: 642 ns/iter (+/- 7) = 1102 MB/s
test str::char_count::emoji_medium::case03_manual_char_len ... bench: 445 ns/iter (+/- 3) = 1591 MB/s
test str::char_count::emoji_small::case00_cur_libcore ... bench: 18 ns/iter (+/- 0) = 3777 MB/s
test str::char_count::emoji_small::case01_old_libcore ... bench: 23 ns/iter (+/- 0) = 2956 MB/s
test str::char_count::emoji_small::case02_iter_increment ... bench: 66 ns/iter (+/- 2) = 1030 MB/s
test str::char_count::emoji_small::case03_manual_char_len ... bench: 29 ns/iter (+/- 1) = 2344 MB/s
test str::char_count::en_huge::case00_cur_libcore ... bench: 25,909 ns/iter (+/- 39,260) = 13299 MB/s
test str::char_count::en_huge::case01_old_libcore ... bench: 102,887 ns/iter (+/- 3,257) = 3349 MB/s
test str::char_count::en_huge::case02_iter_increment ... bench: 166,370 ns/iter (+/- 12,439) = 2071 MB/s
test str::char_count::en_huge::case03_manual_char_len ... bench: 166,332 ns/iter (+/- 4,262) = 2071 MB/s
test str::char_count::en_large::case00_cur_libcore ... bench: 281 ns/iter (+/- 6) = 19160 MB/s
test str::char_count::en_large::case01_old_libcore ... bench: 1,598 ns/iter (+/- 19) = 3369 MB/s
test str::char_count::en_large::case02_iter_increment ... bench: 2,598 ns/iter (+/- 167) = 2072 MB/s
test str::char_count::en_large::case03_manual_char_len ... bench: 2,578 ns/iter (+/- 55) = 2088 MB/s
test str::char_count::en_medium::case00_cur_libcore ... bench: 44 ns/iter (+/- 1) = 15295 MB/s
test str::char_count::en_medium::case01_old_libcore ... bench: 201 ns/iter (+/- 51) = 3348 MB/s
test str::char_count::en_medium::case02_iter_increment ... bench: 322 ns/iter (+/- 40) = 2090 MB/s
test str::char_count::en_medium::case03_manual_char_len ... bench: 319 ns/iter (+/- 5) = 2109 MB/s
test str::char_count::en_small::case00_cur_libcore ... bench: 15 ns/iter (+/- 0) = 2333 MB/s
test str::char_count::en_small::case01_old_libcore ... bench: 14 ns/iter (+/- 0) = 2500 MB/s
test str::char_count::en_small::case02_iter_increment ... bench: 30 ns/iter (+/- 1) = 1166 MB/s
test str::char_count::en_small::case03_manual_char_len ... bench: 30 ns/iter (+/- 1) = 1166 MB/s
test str::char_count::ru_huge::case00_cur_libcore ... bench: 16,439 ns/iter (+/- 3,105) = 19777 MB/s
test str::char_count::ru_huge::case01_old_libcore ... bench: 89,480 ns/iter (+/- 2,555) = 3633 MB/s
test str::char_count::ru_huge::case02_iter_increment ... bench: 217,703 ns/iter (+/- 22,185) = 1493 MB/s
test str::char_count::ru_huge::case03_manual_char_len ... bench: 157,330 ns/iter (+/- 19,188) = 2066 MB/s
test str::char_count::ru_large::case00_cur_libcore ... bench: 243 ns/iter (+/- 6) = 20905 MB/s
test str::char_count::ru_large::case01_old_libcore ... bench: 1,384 ns/iter (+/- 51) = 3670 MB/s
test str::char_count::ru_large::case02_iter_increment ... bench: 3,381 ns/iter (+/- 543) = 1502 MB/s
test str::char_count::ru_large::case03_manual_char_len ... bench: 2,423 ns/iter (+/- 429) = 2096 MB/s
test str::char_count::ru_medium::case00_cur_libcore ... bench: 42 ns/iter (+/- 1) = 15119 MB/s
test str::char_count::ru_medium::case01_old_libcore ... bench: 180 ns/iter (+/- 4) = 3527 MB/s
test str::char_count::ru_medium::case02_iter_increment ... bench: 402 ns/iter (+/- 45) = 1579 MB/s
test str::char_count::ru_medium::case03_manual_char_len ... bench: 280 ns/iter (+/- 29) = 2267 MB/s
test str::char_count::ru_small::case00_cur_libcore ... bench: 12 ns/iter (+/- 0) = 2666 MB/s
test str::char_count::ru_small::case01_old_libcore ... bench: 12 ns/iter (+/- 0) = 2666 MB/s
test str::char_count::ru_small::case02_iter_increment ... bench: 19 ns/iter (+/- 0) = 1684 MB/s
test str::char_count::ru_small::case03_manual_char_len ... bench: 14 ns/iter (+/- 1) = 2285 MB/s
test str::char_count::zh_huge::case00_cur_libcore ... bench: 15,053 ns/iter (+/- 2,640) = 20067 MB/s
test str::char_count::zh_huge::case01_old_libcore ... bench: 82,622 ns/iter (+/- 3,602) = 3656 MB/s
test str::char_count::zh_huge::case02_iter_increment ... bench: 230,456 ns/iter (+/- 7,246) = 1310 MB/s
test str::char_count::zh_huge::case03_manual_char_len ... bench: 220,595 ns/iter (+/- 11,624) = 1369 MB/s
test str::char_count::zh_large::case00_cur_libcore ... bench: 227 ns/iter (+/- 65) = 20792 MB/s
test str::char_count::zh_large::case01_old_libcore ... bench: 1,136 ns/iter (+/- 144) = 4154 MB/s
test str::char_count::zh_large::case02_iter_increment ... bench: 3,147 ns/iter (+/- 253) = 1499 MB/s
test str::char_count::zh_large::case03_manual_char_len ... bench: 2,993 ns/iter (+/- 400) = 1577 MB/s
test str::char_count::zh_medium::case00_cur_libcore ... bench: 36 ns/iter (+/- 5) = 16388 MB/s
test str::char_count::zh_medium::case01_old_libcore ... bench: 142 ns/iter (+/- 18) = 4154 MB/s
test str::char_count::zh_medium::case02_iter_increment ... bench: 379 ns/iter (+/- 37) = 1556 MB/s
test str::char_count::zh_medium::case03_manual_char_len ... bench: 364 ns/iter (+/- 51) = 1620 MB/s
test str::char_count::zh_small::case00_cur_libcore ... bench: 11 ns/iter (+/- 1) = 3000 MB/s
test str::char_count::zh_small::case01_old_libcore ... bench: 11 ns/iter (+/- 1) = 3000 MB/s
test str::char_count::zh_small::case02_iter_increment ... bench: 20 ns/iter (+/- 3) = 1650 MB/s
</pre>
</details>
I also added fairly thorough tests for different sizes and alignments. This completes on my machine in 0.02s, which is surprising given how thorough they are, but it seems to detect bugs in the implementation. (I haven't run the tests on a 32 bit machine yet since before I reworked the code a little though, so... hopefully I'm not about to embarrass myself).
This uses similar SWAR-style techniques to the `is_ascii` impl I contributed in https://github.com/rust-lang/rust/pull/74066, so I'm going to request review from the same person who reviewed that one. That said am not particularly picky, and might not have the correct syntax for requesting a review from someone (so it goes).
r? `@nagisa`
Document valid values of the char type
As discussed at #93392, the current documentation on what constitutes a valid char isn't very detailed and is partly on the MAX constant rather than the type itself.
This PR expands on that information, stating the actual numerical range, giving examples of what won't work, and also mentions how a `char` might be a valid USV but still not be a defined character (terminology checked against [Unicode 14.0, table 2-3](https://www.unicode.org/versions/Unicode14.0.0/ch02.pdf#M9.61673.TableTitle.Table.22.Types.of.Code.Points)).
Implement `RawWaker` and `Waker` getters for underlying pointers
implement #87021
New APIs:
- `RawWaker::data(&self) -> *const ()`
- `RawWaker::vtable(&self) -> &'static RawWakerVTable`
- ~`Waker::as_raw_waker(&self) -> &RawWaker`~ `Waker::as_raw(&self) -> &RawWaker`
This third one is an auxiliary function to make the two APIs above more useful. Since we can only get `&Waker` in `Future::poll`, without this, we need to `transmute` it into `&RawWaker` (relying on `repr(transparent)`) in order to access its data/vtable pointers.
~Not sure if it should be named `as_raw` or `as_raw_waker`. Seems we always use `as_<something-raw>` instead of just `as_raw`. But `as_raw_waker` seems not quite consistent with `Waker::from_raw`.~ As suggested in https://github.com/rust-lang/rust/pull/91828#discussion_r770729837, use `as_raw`.
Carefully remove bounds checks from some chunk iterator functions
So, I was writing code that requires the equivalent of `rchunks(N).rev()` (which isn't the same as forward `chunks(N)` — in particular, if the buffer length is not a multiple of `N`, I must handle the "remainder" first).
I happened to look at the codegen output of the function (I was actually interested in whether or not a nested loop was being unrolled — it was), and noticed that in the outer `rchunks(n).rev()` loop, LLVM seemed to be unable to remove the bounds checks from the iteration: https://rust.godbolt.org/z/Tnz4MYY8f (this panic was from the split_at in `RChunks::next_back`).
After doing some experimentation, it seems all of the `next_back` in the non-exact chunk iterators have the issue: (`Chunks::next_back`, `RChunks::next_back`, `ChunksMut::next_back`, and `RChunksMut::next_back`)...
Even worse, the forward `rchunks` iterators sometimes have the issue as well (... but only sometimes). For example https://rust.godbolt.org/z/oGhbqv53r has bounds checks, but if I uncomment the loop body, it manages to remove the check (which is bizarre, since I'd expect the opposite...). I suspect it's highly dependent on the surrounding code, so I decided to remove the bounds checks from them anyway. Overall, this change includes:
- All `next_back` functions on the non-`Exact` iterators (e.g. `R?Chunks(Mut)?`).
- All `next` functions on the non-exact rchunks iterators (e.g. `RChunks(Mut)?`).
I wasn't able to catch any of the other chunk iterators failing to remove the bounds checks (I checked iterations over `r?chunks(_exact)?(_mut)?` with constant chunk sizes under `-O3`, `-Os`, and `-Oz`), which makes sense, since these were the cases where it was harder to prove the bounds check correct to remove...
In fact, it took quite a bit of thinking to convince myself that using unchecked_ here was valid — so I'm not really surprised that LLVM had trouble (although compilers are slightly better at this sort of reasoning than humans). A consequence of that is the fact that the `// SAFETY` comment for these are... kinda long...
---
I didn't do this for, or even think about it for, any of the other iteration methods; just `next` and `next_back` (where it mattered). If this PR is accepted, I'll file a follow up for someone (possibly me) to look at the others later (in particular, `nth`/`nth_back` looked like they had similar logic), but I wanted to do this now, as IMO `next`/`next_back` are the most important here, since they're what gets used by the iteration protocol.
---
Note: While I don't expect this to impact performance directly, the panic is a side effect, which would otherwise not exist in these loops. That is, this could prevent the compiler from being able to move/remove/otherwise rework a loop over these iterators (as an example, it could not delete the code for a loop whose body computes a value which doesn't get used).
Also, some like to be able to have confidence this code has no panicking branches in the optimized code, and "no bounds checks" is kinda part of the selling point of Rust's iterators anyway.
Remove deprecated and unstable slice_partition_at_index functions
They have been deprecated since commit 01ac5a97c9
which was part of the 1.49.0 release, so from the point of nightly,
11 releases ago.
review the total_cmp documentation
The documentation has been restructured to split out a brief summary
paragraph out from the following elaborating paragraphs.
I also attempted my hand at wording improvements and adding articles
where I felt them missing, but being non-native english speaker these
may need more thorough review.
cc https://github.com/rust-lang/rust/issues/72599
Clarify documentation on char::MAX
As mentioned in https://github.com/rust-lang/rust/issues/91836#issuecomment-994106874, the documentation on `char::MAX` is not quite correct – USVs are not "only ones within a certain range", they are code points _outside_ a certain range. I have corrected this and given the actual numbers as there is no reason to hide them.
Make `char::DecodeUtf16::size_hist` more precise
New implementation takes into account contents of `self.buf` and rounds lower bound up instead of down.
Fixes#88762
Revival of #88763
Create `core::fmt::ArgumentV1` with generics instead of fn pointer
Split from (and prerequisite of) #90488, as this seems to have perf implication.
`@rustbot` label: +T-libs
The documentation has been restructured to split out a brief summary
paragraph out from the following elaborating paragraphs.
I also attempted my hand at wording improvements and adding articles
where I felt them missing, but being non-native english speaker these
may need more thorough review.
Add `intrinsics::const_deallocate`
Tracking issue: #79597
Related: #91884
This allows deallocation of a memory allocated by `intrinsics::const_allocate`. At the moment, this can be only used to reduce memory usage, but in the future this may be useful to detect memory leaks (If an allocated memory remains after evaluation, raise an error...?).
Unimpl {Add,Sub,Mul,Div,Rem,BitXor,BitOr,BitAnd}<$t> for Saturating<$t>
Tracking issue #92354
Analog to 9648b313cc#93208 reduce `saturating_int_assign_impl` (#93208) to:
```rust
let mut value = Saturating(2u8);
value += 3u8;
value -= 1u8;
value *= 2u8;
value /= 2u8;
value %= 2u8;
value ^= 255u8;
value |= 123u8;
value &= 2u8;
```
See https://github.com/rust-lang/rust/pull/93208#issuecomment-1022564429
Add links to the reference and rust by example for asm! docs and lints
These were previously removed in #91728 due to broken links.
cc ``@ehuss`` since this updates the rust-by-example submodule
This makes `PartialOrd` consistent with the other three traits in this
module, which all include links to their respective mathematical concepts
on Wikipedia.