Optimize `escape_ascii` using a lookup table
Based upon my suggestion here: https://github.com/rust-lang/rust/pull/125340#issuecomment-2130441817
Effectively, we can take advantage of the fact that ASCII only needs 7 bits to make the eighth bit store whether the value should be escaped or not. This adds a 256-byte lookup table, but 256 bytes *should* be small enough that very few people will mind, according to my probably not incontrovertible opinion.
The generated assembly isn't clearly better (although has fewer branches), so, I decided to benchmark on three inputs: first on a random 200KiB, then on `/bin/cat`, then on `Cargo.toml` for this repo. In all cases, the generated code ran faster on my machine. (an old i7-8700)
But, if you want to try my benchmarking code for yourself:
<details><summary>Criterion code below. Replace <code>/home/ltdk/rustsrc</code> with the appropriate directory.</summary>
```rust
#![feature(ascii_char)]
#![feature(ascii_char_variants)]
#![feature(const_option)]
#![feature(let_chains)]
use core::ascii;
use core::ops::Range;
use criterion::{criterion_group, criterion_main, Criterion};
use rand::{thread_rng, Rng};
const HEX_DIGITS: [ascii::Char; 16] = *b"0123456789abcdef".as_ascii().unwrap();
#[inline]
const fn backslash<const N: usize>(a: ascii::Char) -> ([ascii::Char; N], Range<u8>) {
const { assert!(N >= 2) };
let mut output = [ascii::Char::Null; N];
output[0] = ascii::Char::ReverseSolidus;
output[1] = a;
(output, 0..2)
}
#[inline]
const fn hex_escape<const N: usize>(byte: u8) -> ([ascii::Char; N], Range<u8>) {
const { assert!(N >= 4) };
let mut output = [ascii::Char::Null; N];
let hi = HEX_DIGITS[(byte >> 4) as usize];
let lo = HEX_DIGITS[(byte & 0xf) as usize];
output[0] = ascii::Char::ReverseSolidus;
output[1] = ascii::Char::SmallX;
output[2] = hi;
output[3] = lo;
(output, 0..4)
}
#[inline]
const fn verbatim<const N: usize>(a: ascii::Char) -> ([ascii::Char; N], Range<u8>) {
const { assert!(N >= 1) };
let mut output = [ascii::Char::Null; N];
output[0] = a;
(output, 0..1)
}
/// Escapes an ASCII character.
///
/// Returns a buffer and the length of the escaped representation.
const fn escape_ascii_old<const N: usize>(byte: u8) -> ([ascii::Char; N], Range<u8>) {
const { assert!(N >= 4) };
match byte {
b'\t' => backslash(ascii::Char::SmallT),
b'\r' => backslash(ascii::Char::SmallR),
b'\n' => backslash(ascii::Char::SmallN),
b'\\' => backslash(ascii::Char::ReverseSolidus),
b'\'' => backslash(ascii::Char::Apostrophe),
b'\"' => backslash(ascii::Char::QuotationMark),
0x00..=0x1F => hex_escape(byte),
_ => match ascii::Char::from_u8(byte) {
Some(a) => verbatim(a),
None => hex_escape(byte),
},
}
}
/// Escapes an ASCII character.
///
/// Returns a buffer and the length of the escaped representation.
const fn escape_ascii_new<const N: usize>(byte: u8) -> ([ascii::Char; N], Range<u8>) {
/// Lookup table helps us determine how to display character.
///
/// Since ASCII characters will always be 7 bits, we can exploit this to store the 8th bit to
/// indicate whether the result is escaped or unescaped.
///
/// We additionally use 0x80 (escaped NUL character) to indicate hex-escaped bytes, since
/// escaped NUL will not occur.
const LOOKUP: [u8; 256] = {
let mut arr = [0; 256];
let mut idx = 0;
loop {
arr[idx as usize] = match idx {
// use 8th bit to indicate escaped
b'\t' => 0x80 | b't',
b'\r' => 0x80 | b'r',
b'\n' => 0x80 | b'n',
b'\\' => 0x80 | b'\\',
b'\'' => 0x80 | b'\'',
b'"' => 0x80 | b'"',
// use NUL to indicate hex-escaped
0x00..=0x1F | 0x7F..=0xFF => 0x80 | b'\0',
_ => idx,
};
if idx == 255 {
break;
}
idx += 1;
}
arr
};
let lookup = LOOKUP[byte as usize];
// 8th bit indicates escape
let lookup_escaped = lookup & 0x80 != 0;
// SAFETY: We explicitly mask out the eighth bit to get a 7-bit ASCII character.
let lookup_ascii = unsafe { ascii::Char::from_u8_unchecked(lookup & 0x7F) };
if lookup_escaped {
// NUL indicates hex-escaped
if matches!(lookup_ascii, ascii::Char::Null) {
hex_escape(byte)
} else {
backslash(lookup_ascii)
}
} else {
verbatim(lookup_ascii)
}
}
fn escape_bytes(bytes: &[u8], f: impl Fn(u8) -> ([ascii::Char; 4], Range<u8>)) -> Vec<ascii::Char> {
let mut vec = Vec::new();
for b in bytes {
let (buf, range) = f(*b);
vec.extend_from_slice(&buf[range.start as usize..range.end as usize]);
}
vec
}
pub fn criterion_benchmark(c: &mut Criterion) {
let mut group = c.benchmark_group("escape_ascii");
group.sample_size(1000);
let rand_200k = &mut [0; 200 * 1024];
thread_rng().fill(&mut rand_200k[..]);
let cat = include_bytes!("/bin/cat");
let cargo_toml = include_bytes!("/home/ltdk/rustsrc/Cargo.toml");
group.bench_function("old_rand", |b| {
b.iter(|| escape_bytes(rand_200k, escape_ascii_old));
});
group.bench_function("new_rand", |b| {
b.iter(|| escape_bytes(rand_200k, escape_ascii_new));
});
group.bench_function("old_bin", |b| {
b.iter(|| escape_bytes(cat, escape_ascii_old));
});
group.bench_function("new_bin", |b| {
b.iter(|| escape_bytes(cat, escape_ascii_new));
});
group.bench_function("old_cargo_toml", |b| {
b.iter(|| escape_bytes(cargo_toml, escape_ascii_old));
});
group.bench_function("new_cargo_toml", |b| {
b.iter(|| escape_bytes(cargo_toml, escape_ascii_new));
});
group.finish();
}
criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
```
</details>
My benchmark results:
```
escape_ascii/old_rand time: [1.6965 ms 1.7006 ms 1.7053 ms]
Found 22 outliers among 1000 measurements (2.20%)
4 (0.40%) high mild
18 (1.80%) high severe
escape_ascii/new_rand time: [1.6749 ms 1.6953 ms 1.7158 ms]
Found 38 outliers among 1000 measurements (3.80%)
38 (3.80%) high mild
escape_ascii/old_bin time: [224.59 µs 225.40 µs 226.33 µs]
Found 39 outliers among 1000 measurements (3.90%)
17 (1.70%) high mild
22 (2.20%) high severe
escape_ascii/new_bin time: [164.86 µs 165.63 µs 166.58 µs]
Found 107 outliers among 1000 measurements (10.70%)
43 (4.30%) high mild
64 (6.40%) high severe
escape_ascii/old_cargo_toml
time: [23.397 µs 23.699 µs 24.014 µs]
Found 204 outliers among 1000 measurements (20.40%)
21 (2.10%) high mild
183 (18.30%) high severe
escape_ascii/new_cargo_toml
time: [16.404 µs 16.438 µs 16.483 µs]
Found 88 outliers among 1000 measurements (8.80%)
56 (5.60%) high mild
32 (3.20%) high severe
```
Random: 1.7006ms => 1.6953ms (<1% speedup)
Binary: 225.40µs => 165.63µs (26% speedup)
Text: 23.699µs => 16.438µs (30% speedup)
Stabilize const `ptr::write*` and `mem::replace`
Since `const_mut_refs` and `const_refs_to_cell` have been stabilized, we may now also stabilize the ability to write to places during const evaluation inside our library API. So, we now propose the `const fn` version of `ptr::write` and its variants. This allows us to also stabilize `mem::replace` and `ptr::replace`.
- const `mem::replace`: https://github.com/rust-lang/rust/issues/83164#issuecomment-2338660862
- const `ptr::write{,_bytes,_unaligned}`: https://github.com/rust-lang/rust/issues/86302#issuecomment-2330275266
Their implementation requires an additional internal stabilization of `const_intrinsic_forget`, which is required for `*::write*` and thus `*::replace`. Thus we const-stabilize the internal intrinsics `forget`, `write_bytes`, and `write_via_move`.
intrinsics fmuladdf{32,64}: expose llvm.fmuladd.* semantics
Add intrinsics `fmuladd{f32,f64}`. This computes `(a * b) + c`, to be fused if the code generator determines that (i) the target instruction set has support for a fused operation, and (ii) that the fused operation is more efficient than the equivalent, separate pair of `mul` and `add` instructions.
https://llvm.org/docs/LangRef.html#llvm-fmuladd-intrinsic
The codegen_cranelift uses the `fma` function from libc, which is a correct implementation, but without the desired performance semantic. I think this requires an update to cranelift to expose a suitable instruction in its IR.
I have not tested with codegen_gcc, but it should behave the same way (using `fma` from libc).
---
This topic has been discussed a few times on Zulip and was suggested, for example, by `@workingjubilee` in [Effect of fma disabled](https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/Effect.20of.20fma.20disabled/near/274179331).
`IndexRange::len` is justified as an overall invariant, and
`take_prefix` and `take_suffix` are justified by local branch
conditions. A few more UB-checked calls remain in cases that are only
supported locally by `debug_assert!`, which won't do anything in
distributed builds, so those UB checks may still be useful.
We generally expect core's `#![rustc_preserve_ub_checks]` to optimize
away in user's release builds, but the mere presence of that extra code
can sometimes inhibit optimization, as seen in #131563.
Stabilise `const_char_encode_utf8`.
Closes: #130512
This PR stabilises the `const_char_encode_utf8` feature gate (i.e. support for `char::encode_utf8` in const scenarios).
Note that the linked tracking issue is currently awaiting FCP.
Port sort-research-rs test suite to Rust stdlib tests
This PR is a followup to https://github.com/rust-lang/rust/pull/124032. It replaces the tests that test the various sort functions in the standard library with a test-suite developed as part of https://github.com/Voultapher/sort-research-rs. The current tests suffer a couple of problems:
- They don't cover important real world patterns that the implementations take advantage of and execute special code for.
- The input lengths tested miss out on code paths. For example, important safety property tests never reach the quicksort part of the implementation.
- The miri side is often limited to `len <= 20` which means it very thoroughly tests the insertion sort, which accounts for 19 out of 1.5k LoC.
- They are split into to core and alloc, causing code duplication and uneven coverage.
- ~~The randomness is tied to a caller location, wasting the space exploration capabilities of randomized testing.~~ The randomness is not repeatable, as it relies on `std:#️⃣:RandomState::new().build_hasher()`.
Most of these issues existed before https://github.com/rust-lang/rust/pull/124032, but they are intensified by it. One thing that is new and requires additional testing, is that the new sort implementations specialize based on type properties. For example `Freeze` and non `Freeze` execute different code paths.
Effectively there are three dimensions that matter:
- Input type
- Input length
- Input pattern
The ported test-suite tests various properties along all three dimensions, greatly improving test coverage. It side-steps the miri issue by preferring sampled approaches. For example the test that checks if after a panic the set of elements is still the original one, doesn't do so for every single possible panic opportunity but rather it picks one at random, and performs this test across a range of input length, which varies the panic point across them. This allows regular execution to easily test inputs of length 10k, and miri execution up to 100 which covers significantly more code. The randomness used is tied to a fixed - but random per process execution - seed. This allows for fully repeatable tests and fuzzer like exploration across multiple runs.
Structure wise, the tests are previously found in the core integration tests for `sort_unstable` and alloc unit tests for `sort`. The new test-suite was developed to be a purely black-box approach, which makes integration testing the better place, because it can't accidentally rely on internal access. Because unwinding support is required the tests can't be in core, even if the implementation is, so they are now part of the alloc integration tests. Are there architectures that can only build and test core and not alloc? If so, do such platforms require sort testing? For what it's worth the current implementation state passes miri `--target mips64-unknown-linux-gnuabi64` which is big endian.
The test-suite also contains tests for properties that were and are given by the current and previous implementations, and likely relied upon by users but weren't tested. For example `self_cmp` tests that the two parameters `a` and `b` passed into the comparison function are never references to the same object, which if the user is sorting for example a `&mut [Mutex<i32>]` could lead to a deadlock.
Instead of using the hashed caller location as rand seed, it uses seconds since unix epoch / 10, which given timestamps in the CI should be reasonably easy to reproduce, but also allows fuzzer like space exploration.
---
Test run-time changes:
Setup:
```
Linux 6.10
rustc 1.83.0-nightly (f79a912d9 2024-09-18)
AMD Ryzen 9 5900X 12-Core Processor (Zen 3 micro-architecture)
CPU boost enabled.
```
master: e9df22f
Before core integration tests:
```
$ LD_LIBRARY_PATH=build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/ hyperfine build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/coretests-219cbd0308a49e2f
Time (mean ± σ): 869.6 ms ± 21.1 ms [User: 1327.6 ms, System: 95.1 ms]
Range (min … max): 845.4 ms … 917.0 ms 10 runs
# MIRIFLAGS="-Zmiri-disable-isolation" to get real time
$ MIRIFLAGS="-Zmiri-disable-isolation" ./x.py miri library/core
finished in 738.44s
```
After core integration tests:
```
$ LD_LIBRARY_PATH=build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/ hyperfine build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/coretests-219cbd0308a49e2f
Time (mean ± σ): 865.1 ms ± 14.7 ms [User: 1283.5 ms, System: 88.4 ms]
Range (min … max): 836.2 ms … 885.7 ms 10 runs
$ MIRIFLAGS="-Zmiri-disable-isolation" ./x.py miri library/core
finished in 752.35s
```
Before alloc unit tests:
```
LD_LIBRARY_PATH=build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/ hyperfine build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/alloc-19c15e6e8565aa54
Time (mean ± σ): 295.0 ms ± 9.9 ms [User: 719.6 ms, System: 35.3 ms]
Range (min … max): 284.9 ms … 319.3 ms 10 runs
$ MIRIFLAGS="-Zmiri-disable-isolation" ./x.py miri library/alloc
finished in 322.75s
```
After alloc unit tests:
```
LD_LIBRARY_PATH=build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/ hyperfine build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/alloc-19c15e6e8565aa54
Time (mean ± σ): 97.4 ms ± 4.1 ms [User: 297.7 ms, System: 28.6 ms]
Range (min … max): 92.3 ms … 109.2 ms 27 runs
$ MIRIFLAGS="-Zmiri-disable-isolation" ./x.py miri library/alloc
finished in 309.18s
```
Before alloc integration tests:
```
$ LD_LIBRARY_PATH=build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/ hyperfine build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/alloctests-439e7300c61a8046
Time (mean ± σ): 103.2 ms ± 1.7 ms [User: 135.7 ms, System: 39.4 ms]
Range (min … max): 99.7 ms … 107.3 ms 28 runs
$ MIRIFLAGS="-Zmiri-disable-isolation" ./x.py miri library/alloc
finished in 231.35s
```
After alloc integration tests:
```
$ LD_LIBRARY_PATH=build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/ hyperfine build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/alloctests-439e7300c61a8046
Time (mean ± σ): 379.8 ms ± 4.7 ms [User: 4620.5 ms, System: 1157.2 ms]
Range (min … max): 373.6 ms … 386.9 ms 10 runs
$ MIRIFLAGS="-Zmiri-disable-isolation" ./x.py miri library/alloc
finished in 449.24s
```
In my opinion the results don't change iterative library development or CI execution in meaningful ways. For example currently the library doc-tests take ~66s and incremental compilation takes 10+ seconds. However I only have limited knowledge of the various local development workflows that exist, and might be missing one that is significantly impacted by this change.
Add intrinsics `fmuladd{f16,f32,f64,f128}`. This computes `(a * b) +
c`, to be fused if the code generator determines that (i) the target
instruction set has support for a fused operation, and (ii) that the
fused operation is more efficient than the equivalent, separate pair
of `mul` and `add` instructions.
https://llvm.org/docs/LangRef.html#llvm-fmuladd-intrinsic
MIRI support is included for f32 and f64.
The codegen_cranelift uses the `fma` function from libc, which is a
correct implementation, but without the desired performance semantic. I
think this requires an update to cranelift to expose a suitable
instruction in its IR.
I have not tested with codegen_gcc, but it should behave the same
way (using `fma` from libc).
Stabilize const `{slice,array}::from_mut`
This PR stabilizes the following APIs as const stable as of rust `1.83`:
```rs
// core::array
pub const fn from_mut<T>(s: &mut T) -> &mut [T; 1];
// core::slice
pub const fn from_mut<T>(s: &mut T) -> &mut [T];
```
This is made possible by `const_mut_refs` being stabilized (yay).
Tracking issue: #90206
Rewrite these blobs to explicitly mention the case of a sized operand.
The previous made that seem wrong instead of emphasizing it is nothing
but a simple cast. Instead, the explanation now emphasizes that the
address portion of the argument, together with its provenance, is
discarded which previously had to be inferred by the reader. Then an
example demonstrates a simple line of incorrect usage based on this
idea of provenance.
make Cell unstably const
Now that we can do interior mutability in `const`, most of the Cell API can be `const fn`. :) The main exception is `set`, because it drops the old value. So from const context one has to use `replace`, which delegates the responsibility for dropping to the caller.
Tracking issue: https://github.com/rust-lang/rust/issues/131283
`as_array_of_cells` is itself still unstable to I added the const-ness to the feature gate for that function and not to `const_cell`, Cc #88248.
r? libs-api
Stabilize the `map`/`value` methods on `ControlFlow`
And fix the stability attribute on the `pub use` in `core::ops`.
libs-api in https://github.com/rust-lang/rust/issues/75744#issuecomment-2231214910 seemed reasonably happy with naming for these, so let's try for an FCP.
Summary:
```rust
impl<B, C> ControlFlow<B, C> {
pub fn break_value(self) -> Option<B>;
pub fn map_break<T>(self, f: impl FnOnce(B) -> T) -> ControlFlow<T, C>;
pub fn continue_value(self) -> Option<C>;
pub fn map_continue<T>(self, f: impl FnOnce(C) -> T) -> ControlFlow<B, T>;
}
```
Resolves#75744
``@rustbot`` label +needs-fcp +t-libs-api -t-libs
---
Aside, in case it keeps someone else from going down the same dead end: I looked at the `{break,continue}_value` methods and tried to make them `const` as part of this, but that's disallowed because of not having `const Drop`, so put it back to not even unstably-const.
Add `[Option<T>; N]::transpose`
This PR as a new unstable libs API, `[Option<T>; N]::transpose`, which permits going from `[Option<T>; N]` to `Option<[T; N]>`.
This new API doesn't have an ACP as it was directly asked by T-libs-api in https://github.com/rust-lang/rust/issues/97601#issuecomment-2372109119:
> [..] but it'd be trivial to provide a helper method `.transpose()` that turns array-of-Option into Option-of-array (**and we think that method should exist**; it already does for array-of-MaybeUninit).
r? libs
Update Unicode escapes in `/library/core/src/char/methods.rs`
`char::MAX` is inconsistent on how Unicode escapes should be formatted. This PR resolves that.
ptr::add/sub: do not claim equivalence with `offset(c as isize)`
In https://github.com/rust-lang/rust/pull/110837, the `offset` intrinsic got changed to also allow a `usize` offset parameter. The intention is that this will do an unsigned multiplication with the size, and we have UB if that overflows -- and we also have UB if the result is larger than `usize::MAX`, i.e., if a subsequent cast to `isize` would wrap. ~~The LLVM backend sets some attributes accordingly.~~
This updates the docs for `add`/`sub` to match that intent, in preparation for adjusting codegen to exploit this UB. We use this opportunity to clarify what the exact requirements are: we compute the offset using mathematical multiplication (so it's no problem to have an `isize * usize` multiplication, we just multiply integers), and the result must fit in an `isize`.
Cc `@rust-lang/opsem` `@nikic`
https://github.com/rust-lang/rust/pull/130239 updates Miri to detect this UB.
`sub` still has some cases of UB not reflected in the underlying intrinsic semantics (and Miri does not catch): when we subtract `usize::MAX`, then after casting to `isize` that's just `-1` so we end up adding one unit without noticing any UB, but actually the offset we gave does not fit in an `isize`. Miri will currently still not complain for such cases:
```rust
fn main() {
let x = &[0i32; 2];
let x = x.as_ptr();
// This should be UB, we are subtracting way too much.
unsafe { x.sub(usize::MAX).read() };
}
```
However, the LLVM IR we generate here also is UB-free. This is "just" library UB but not language UB.
Cc `@saethlin;` might be worth adding precondition checks against overflow on `offset`/`add`/`sub`?
Fixes https://github.com/rust-lang/rust/issues/130211
make ptr metadata functions callable from stable const fn
So far this was done with a bunch of `rustc_allow_const_fn_unstable`. But those should be the exception, not the norm. If we are confident we can expose the ptr metadata APIs *indirectly* in stable const fn, we should just mark them as `rustc_const_stable`. And we better be confident we can do that since it's already been done a while ago. ;)
In particular this marks two intrinsics as const-stable: `aggregate_raw_ptr`, `ptr_metadata`. This should be uncontroversial, they are trivial to implement in the interpreter.
Cc `@rust-lang/wg-const-eval` `@rust-lang/lang`
This commit is a followup to https://github.com/rust-lang/rust/pull/124032. It
replaces the tests that test the various sort functions in the standard library
with a test-suite developed as part of
https://github.com/Voultapher/sort-research-rs. The current tests suffer a
couple of problems:
- They don't cover important real world patterns that the implementations take
advantage of and execute special code for.
- The input lengths tested miss out on code paths. For example, important safety
property tests never reach the quicksort part of the implementation.
- The miri side is often limited to `len <= 20` which means it very thoroughly
tests the insertion sort, which accounts for 19 out of 1.5k LoC.
- They are split into to core and alloc, causing code duplication and uneven
coverage.
- The randomness is not repeatable, as it
relies on `std:#️⃣:RandomState::new().build_hasher()`.
Most of these issues existed before
https://github.com/rust-lang/rust/pull/124032, but they are intensified by it.
One thing that is new and requires additional testing, is that the new sort
implementations specialize based on type properties. For example `Freeze` and
non `Freeze` execute different code paths.
Effectively there are three dimensions that matter:
- Input type
- Input length
- Input pattern
The ported test-suite tests various properties along all three dimensions,
greatly improving test coverage. It side-steps the miri issue by preferring
sampled approaches. For example the test that checks if after a panic the set of
elements is still the original one, doesn't do so for every single possible
panic opportunity but rather it picks one at random, and performs this test
across a range of input length, which varies the panic point across them. This
allows regular execution to easily test inputs of length 10k, and miri execution
up to 100 which covers significantly more code. The randomness used is tied to a
fixed - but random per process execution - seed. This allows for fully
repeatable tests and fuzzer like exploration across multiple runs.
Structure wise, the tests are previously found in the core integration tests for
`sort_unstable` and alloc unit tests for `sort`. The new test-suite was
developed to be a purely black-box approach, which makes integration testing the
better place, because it can't accidentally rely on internal access. Because
unwinding support is required the tests can't be in core, even if the
implementation is, so they are now part of the alloc integration tests. Are
there architectures that can only build and test core and not alloc? If so, do
such platforms require sort testing? For what it's worth the current
implementation state passes miri `--target mips64-unknown-linux-gnuabi64` which
is big endian.
The test-suite also contains tests for properties that were and are given by the
current and previous implementations, and likely relied upon by users but
weren't tested. For example `self_cmp` tests that the two parameters `a` and `b`
passed into the comparison function are never references to the same object,
which if the user is sorting for example a `&mut [Mutex<i32>]` could lead to a
deadlock.
Instead of using the hashed caller location as rand seed, it uses seconds since
unix epoch / 10, which given timestamps in the CI should be reasonably easy to
reproduce, but also allows fuzzer like space exploration.
stabilize const_cell_into_inner
This const-stabilizes
- `UnsafeCell::into_inner`
- `Cell::into_inner`
- `RefCell::into_inner`
- `OnceCell::into_inner`
`@rust-lang/wg-const-eval` this uses `rustc_allow_const_fn_unstable(const_precise_live_drops)`, so we'd be comitting to always finding *some* way to accept this code. IMO that's fine -- what these functions do is to move out the only field of a struct, and that struct has no destructor itself. The field's destructor does not get run as it gets returned to the caller.
`@rust-lang/libs-api` this was FCP'd already [years ago](https://github.com/rust-lang/rust/issues/78729#issuecomment-811409860), except that `OnceCell::into_inner` was added to the same feature gate since then (Cc `@tgross35).` Does that mean we have to re-run the FCP? If yes, I'd honestly prefer to move `OnceCell` into its own feature gate to not risk missing the next release. (That's why it's not great to add new functions to an already FCP'd feature gate.) OTOH if this needs an FCP either way since the previous FCP was so long ago, then we might as well do it all at once.
Improve Ord docs
- Makes wording more clear and re-structures some sections that can be overwhelming for someone not already in the know.
- Adds examples of how *not* to implement Ord, inspired by various anti-patterns found in real world code.
Many of the wording changes are inspired directly by my personal experience of being confused by the `Ord` docs and seeing other people get it wrong as well, especially lately having looked at a number of `Ord` implementations as part of #128899.
Created with help by `@orlp.`
r? `@workingjubilee`
Update `catch_unwind` doc comments for `c_unwind`
Updates `catch_unwind` doc comments to indicate that catching a foreign exception _will no longer_ be UB. Instead, there are two possible behaviors, though it is not specified which one an implementation will choose.
Nominated for t-lang to confirm that they are okay with making such a promise based on t-opsem FCP, or whether they would like to be included in the FCP.
Related: https://github.com/rust-lang/rust/issues/74990, https://github.com/rust-lang/rust/issues/115285, https://github.com/rust-lang/reference/pull/1226
- Makes wording more clear and re-structures some
sections that can be overwhelming for some not
already in the know.
- Adds examples of how *not* to implement Ord,
inspired by various anti-patterns found in real
world code.
update `compiler-builtins` to 0.1.126
this requires the addition of a bootstrap variant of the new `naked_asm!` macro
r? `@tgross35`
extracted from https://github.com/rust-lang/rust/pull/128651
[`cfg_match`] Generalize inputs
cc #115585
Changes the input type from `item` to `tt`, which makes the macro have the same functionality of `cfg_if`.
Also adds a test to ensure that `stmt_expr_attributes` is not triggered.
Utf8Chunks: add link to Utf8Chunk
It is currently surprisingly non-trivial to go from the `utf8_chunks` method to the docs of the `valid`/`invalid` methods used in the example. This should help.
Since the stabilization in #127679 has reached stage0, 1.82-beta, we can
start using `&raw` freely, and even the soft-deprecated `ptr::addr_of!`
and `ptr::addr_of_mut!` can stop allowing the unstable feature.
I intentionally did not change any documentation or tests, but the rest
of those macro uses are all now using `&raw const` or `&raw mut` in the
standard library.
fix some cfg logic around optimize_for_size and 16-bit targets
Fixes https://github.com/rust-lang/rust/issues/130818.
Fixes https://github.com/rust-lang/rust/issues/129910.
There are still some warnings when building on a 16bit target:
```
warning: struct `AlignedStorage` is never constructed
--> /home/r/src/rust/rustc.2/library/core/src/slice/sort/stable/mod.rs:135:8
|
135 | struct AlignedStorage<T, const N: usize> {
| ^^^^^^^^^^^^^^
|
= note: `#[warn(dead_code)]` on by default
warning: associated items `new` and `as_uninit_slice_mut` are never used
--> /home/r/src/rust/rustc.2/library/core/src/slice/sort/stable/mod.rs:141:8
|
140 | impl<T, const N: usize> AlignedStorage<T, N> {
| -------------------------------------------- associated items in this implementation
141 | fn new() -> Self {
| ^^^
...
145 | fn as_uninit_slice_mut(&mut self) -> &mut [MaybeUninit<T>] {
| ^^^^^^^^^^^^^^^^^^^
warning: function `quicksort` is never used
--> /home/r/src/rust/rustc.2/library/core/src/slice/sort/unstable/quicksort.rs:19:15
|
19 | pub(crate) fn quicksort<'a, T, F>(
| ^^^^^^^^^
warning: `core` (lib) generated 3 warnings
```
However, the cfg stuff here is sufficiently messy that I didn't want to touch more of it. I think all `feature = "optimize_for_size"` should become `any(feature = "optimize_for_size", target_pointer_width = "16")` but I am not entirely certain. Warnings are fine, Miri will just ignore them.
Cc `@Voultapher`
Add `optimize_for_size` variants for stable and unstable sort as well as select_nth_unstable
- Stable sort uses a simple merge-sort that re-uses the existing - rather gnarly - merge function.
- Unstable sort jumps directly to the branchless heapsort fallback.
- select_nth_unstable jumps directly to the median_of_medians fallback, which is augmented with a custom tiny smallsort and partition impl.
Some code is duplicated but de-duplication would bring it's own problems. For example `swap_if_less` is critical for performance, if the sorting networks don't inline it perf drops drastically, however `#[inline(always)]` is also a poor fit, if the provided comparison function is huge, it gives the compiler an out to only instantiate `swap_if_less` once and call it. Another aspect that would suffer when making `swap_if_less` pub, is having to cfg out dozens of functions in in smallsort module.
Part of https://github.com/rust-lang/rust/issues/125612
r? `@Kobzol`
Mark `make_ascii_uppercase` and `make_ascii_lowercase` in `[u8]` and `str` as const.
Relevant tracking issue: #130698
This PR extends #130697 and #130713 to the similar methods in byte slices (`[u8]`) and string slices (`str`).
For the `str` methods, this simply requires adding the `const` specifier to the function signatures. The `[u8]` methods, however, require (at least a temporary) reimplementation due to the use of iterators and `for` loops.
Mark `u8::make_ascii_uppercase` and `u8::make_ascii_lowercase` as const.
Relevant tracking issue: #130698
This PR extends #130697 by also marking the `make_ascii_uppercase` and `make_ascii_lowercase` methods in `u8` as const.
The `const_char_make_ascii` feature gate is additionally renamed to `const_make_ascii`.
Support `char::encode_utf16` in const scenarios.
Relevant tracking issue: #130660
The method `char::encode_utf16` should be marked "const" to allow compile-time conversions.
This PR additionally rewrites the `encode_utf16_raw` function for better readability whilst also reducing the amount of unsafe code.
try-job: x86_64-msvc
Add str.as_str() for easy Deref to string slices
Working with `Box<str>` is cumbersome, because in places like `iter.filter()` it can end up being `&Box<str>` or even `&&Box<str>`, and such type doesn't always get auto-dereferenced as expected.
Dereferencing such box to `&str` requires ugly syntax like `&**boxed_str` or `&***boxed_str`, with the exact amount of `*`s.
`Box<str>` is [not easily comparable with other string types](https://github.com/rust-lang/rust/pull/129852) via `PartialEq`. `Box<str>` won't work for lookups in types like `HashSet<String>`, because `Borrow<String>` won't take types like `&Box<str>`. OTOH `set.contains(s.as_str())` works nicely regardless of levels of indirection.
`String` has a simple solution for this: the `as_str()` method, and `Box<str>` should too.
Mark `char::make_ascii_uppercase` and `char::make_ascii_lowercase` as const.
Relevant tracking issue: #130698
The `make_ascii_uppercase` and `make_ascii_lowercase` methods in `char` should be marked "const."
With the stabilisation of [`const_mut_refs`](https://github.com/rust-lang/rust/issues/57349/), this simply requires adding the `const` specifier to the function signatures.
Address diagnostics regression for `const_char_encode_utf8`.
Relevant tracking issue: #130512
This PR regains full diagnostics for non-const calls to `char::encode_utf8`.
[Clippy] Get rid of most `std` `match_def_path` usage, swap to diagnostic items.
Part of https://github.com/rust-lang/rust-clippy/issues/5393.
This was going to remove all `std` paths, but `SeekFrom` has issues being cleanly replaced with a diagnostic item as the paths are for variants, which currently cannot be diagnostic items.
This also, as a last step, categories the paths to help with future path removals.
Improve documentation for <integer>::from_str_radix
Two improvements to the documentation:
- Document `-` as a valid character for signed integer destinations
- Make the documentation even more clear that extra whitespace and non-digit characters is invalid. Many other languages, e.g. c++, are very permissive in string to integer routines and simply try to consume as much as they can, ignoring the rest. This is trying to make the transition for developers who are used to the conversion semantics in these languages a bit easier.
Pass `fmt::Arguments` by reference to `PanicInfo` and `PanicMessage`
Resolves#129330
For some reason after #115974 and #126732 optimizations applied to panic handler became worse and compiler stopped removing panic locations if they are not used in the panic message. This PR fixes that and maybe we can merge it into beta before rust 1.81 is released.
Note: optimization only works with `lto = "fat"`.
r? libs-api
Take more advantage of the `isize::MAX` limit in `Layout`
Things like `padding_needed_for` are current implemented being super careful to handle things like `Layout::size` potentially being `usize::MAX`.
But now that #95295 has happened, that's no longer a concern. It's possible to add two `Layout::size`s together without risking overflow now.
So take advantage of that to remove a bunch of checked math that's not actually needed. For example, the round-up-and-add-next-size in `extend` doesn't need any overflow checks at all, just the final check for compatibility with the alignment.
(And while I was doing that I made it all unstably const, because there's nothing in `Layout` that's fundamentally runtime-only.)
Things like `padding_needed_for` are current implemented being super careful to handle things like `Layout::size` potentially being `usize::MAX`.
But now that 95295 has happened, that's no longer a concern. It's possible to add two `Layout::size`s together without risking overflow now.
So take advantage of that to remove a bunch of checked math that's not actually needed. For example, the round-up-and-add-next-size in `extend` doesn't need any overflow checks at all, just the final check for compatibility with the alignment.
(And while I was doing that I made it all unstably const, because there's nothing in `Layout` that's fundamentally runtime-only.)
There is a Self: PartialOrd bound in Ord::clamp, but it is already
required by the trait itself. Likely a left-over from the const trait
deletion in 76dbe29104.
Reported-by: @noeensarguet
In the implementation of `force_mut`, I chose performance over safety.
For `LazyLock` this isn't really a choice; the code has to be unsafe.
But for `LazyCell`, we can have a full-safe implementation, but it will
be a bit less performant, so I went with the unsafe approach.
Document futility of printing temporary pointers
In the user forum I've seen a few people trying to understand how borrowing and moves are implemented by peppering their code with printing of `{:p}` of references to variables and expressions. This is a bad idea. It gives misleading and confusing results, because of autoderef magic, printing pointers of temporaries on the stack, and/or causes LLVM to optimize code differently when values had their address exposed.
simplify float::classify logic
I played around with the float-classify test in the hope of triggering x87 bugs by strategically adding `black_box`, and still the exact expression `@beetrees` suggested [here](https://github.com/rust-lang/rust/pull/129835#issuecomment-2325661597) remains the only case I found where we get the wrong result on x87. Curiously, this bug only occurs when MIR optimizations are enabled -- probably the extra inlining that does is required for LLVM to hit the right "bad" case in the backend. But even for that case, it makes no difference whether `classify` is implemented in the simple bit-pattern-based version or the more complicated version we had before.
Without even a single testcase that can distinguish our `classify` from the naive version, I suggest we switch to the naive version.
Add `core::panic::abort_unwind`
`abort_unwind` is like `catch_unwind` except that it aborts the process if it unwinds, using the `#[rustc_nounwind]` mechanism also used by `extern "C" fn` to abort unwinding. The docs attempt to make it clear when to (rarely) and when not to (usually) use the function.
Although usage of the function is discouraged, having it available will help to normalize the experience when abort_unwind shims are hit, as opposed to the current ecosystem where there exist multiple common patterns for converting unwinding into a process abort.
For further information and justification, see the linked ACP.
- Tracking issue: https://github.com/rust-lang/rust/issues/130338
- ACP: https://github.com/rust-lang/libs-team/issues/441
move Option::unwrap_unchecked into const_option feature gate
That's where `unwrap` and `expect` are so IMO it makes more sense to group them together.
Part of #91930, #67441
Stabilize `&mut` (and `*mut`) as well as `&Cell` (and `*const Cell`) in const
This stabilizes `const_mut_refs` and `const_refs_to_cell`. That allows a bunch of new things in const contexts:
- Mentioning `&mut` types
- Creating `&mut` and `*mut` values
- Creating `&T` and `*const T` values where `T` contains interior mutability
- Dereferencing `&mut` and `*mut` values (both for reads and writes)
The same rules as at runtime apply: mutating immutable data is UB. This includes mutation through pointers derived from shared references; the following is diagnosed with a hard error:
```rust
#[allow(invalid_reference_casting)]
const _: () = {
let mut val = 15;
let ptr = &val as *const i32 as *mut i32;
unsafe { *ptr = 16; }
};
```
The main limitation that is enforced is that the final value of a const (or non-`mut` static) may not contain `&mut` values nor interior mutable `&` values. This is necessary because the memory those references point to becomes *read-only* when the constant is done computing, so (interior) mutable references to such memory would be pretty dangerous. We take a multi-layered approach here to ensuring no mutable references escape the initializer expression:
- A static analysis rejects (interior) mutable references when the referee looks like it may outlive the current MIR body.
- To be extra sure, this static check is complemented by a "safety net" of dynamic checks. ("Dynamic" in the sense of "running during/after const-evaluation, e.g. at runtime of this code" -- in contrast to "static" which works entirely by looking at the MIR without evaluating it.)
- After the final value is computed, we do a type-driven traversal of the entire value, and if we find any `&mut` or interior-mutable `&` we error out.
- However, the type-driven traversal cannot traverse `union` or raw pointers, so there is a second dynamic check where if the final value of the const contains any pointer that was not derived from a shared reference, we complain. This is currently a future-compat lint, but will become an ICE in #128543. On the off-chance that it's actually possible to trigger this lint on stable, I'd prefer if we could make it an ICE before stabilizing const_mut_refs, but it's not a hard blocker. This part of the "safety net" is only active for mutable references since with shared references, it has false positives.
Altogether this should prevent people from leaking (interior) mutable references out of the const initializer.
While updating the tests I learned that surprisingly, this code gets rejected:
```rust
const _: Vec<i32> = {
let mut x = Vec::<i32>::new(); //~ ERROR destructor of `Vec<i32>` cannot be evaluated at compile-time
let r = &mut x;
let y = x;
y
};
```
The analysis that rejects destructors in `const` is very conservative when it sees an `&mut` being created to `x`, and then considers `x` to be always live. See [here](https://github.com/rust-lang/rust/issues/65394#issuecomment-541499219) for a longer explanation. `const_precise_live_drops` will solve this, so I consider this problem to be tracked by https://github.com/rust-lang/rust/issues/73255.
Cc `@rust-lang/wg-const-eval` `@rust-lang/lang`
Cc https://github.com/rust-lang/rust/issues/57349
Cc https://github.com/rust-lang/rust/issues/80384
simd_shuffle: require index argument to be a vector
Remove some codegen hacks by forcing the SIMD shuffle `index` argument to be a vector, which means (thanks to https://github.com/rust-lang/rust/pull/128537) that it will automatically be passed as an immediate in LLVM. The only special-casing we still have is for the extra sanity-checks we add that ensure that the indices are all in-bounds. (And the GCC backend needs to do a bunch of work since the Rust intrinsic is modeled after what LLVM expects, which seems to be quite different from what GCC expects.)
Fixes https://github.com/rust-lang/rust/issues/128738, see that issue for more context.
fix doc comments for Peekable::next_if(_eq)
Fix references to a nonexistent `consume` function in the doc comments for `Peekable::next_if` and `Peekable::next_if_eq`.
some const cleanup: remove unnecessary attributes, add const-hack indications
I learned that we use `FIXME(const-hack)` on top of the "const-hack" label. That seems much better since it marks the right place in the code and moves around with the code. So I went through the PRs with that label and added appropriate FIXMEs in the code. IMO this means we can then remove the label -- Cc ``@rust-lang/wg-const-eval.``
I also noticed some const stability attributes that don't do anything useful, and removed them.
r? ``@fee1-dead``
Rollup of 11 pull requests
Successful merges:
- #128316 (Stabilize most of `io_error_more`)
- #129473 (use `download-ci-llvm=true` in the default compiler config)
- #129529 (Add test to build crates used by r-a on stable)
- #129981 (Remove `serialized_bitcode` from `LtoModuleCodegen`.)
- #130094 (Inform the solver if evaluation is concurrent)
- #130132 ([illumos] enable SIGSEGV handler to detect stack overflows)
- #130146 (bootstrap `naked_asm!` for `compiler-builtins`)
- #130149 (Helper function for formatting with `LifetimeSuggestionPosition`)
- #130152 (adapt a test for llvm 20)
- #130162 (bump download-ci-llvm-stamp)
- #130164 (move some const fn out of the const_ptr_as_ref feature)
r? `@ghost`
`@rustbot` modify labels: rollup
move some const fn out of the const_ptr_as_ref feature
When a `const fn` is still `#[unstable]`, it should generally use the same feature to track its regular stability and const-stability. Then when that feature moves towards stabilization we can decide whether the const-ness can be stabilized as well, or whether it should be moved into a new feature.
Also, functions like `ptr::as_ref` (which returns an `Option<&mut T>`) require `is_null`, which is tricky and blocked on some design concerns (see #74939). So move those to the is_null feature gate, as they should be stabilized together with `ptr.is_null()`.
Affects #91822, #122034, #75402, https://github.com/rust-lang/rust/issues/74939
bootstrap `naked_asm!` for `compiler-builtins`
tracking issue: https://github.com/rust-lang/rust/issues/90957
parent PR: https://github.com/rust-lang/rust/pull/128651
in this PR, `naked_asm!` is added as an alias for `asm!` with one difference: `options(noreturn)` is always enabled by `naked_asm!`. That makes it future-compatible for when `naked_asm!` starts disallowing `options(noreturn)` later.
The `naked_asm!` macro must be introduced first so that we can upgrade `compiler-builtins` to use it, and can then change the implementation of `naked_asm!` in https://github.com/rust-lang/rust/pull/128651
I've added some usages for `naked_asm!` in the tests, so we can be confident that it works, but I've left upgrading the whole test suite to the parent PR.
r? ``@Amanieu``