std: fix stdout-before-main
Fixes#130210.
Since #124881, `ReentrantLock` uses `ThreadId` to identify threads. This has the unfortunate consequence of breaking uses of `Stdout` before main: Locking the `ReentrantLock` that synchronizes the output will initialize the thread ID before the handle for the main thread is set in `rt::init`. But since that would overwrite the current thread ID, `thread::set_current` triggers an abort.
This PR fixes the problem by using the already initialized thread ID for constructing the main thread handle and allowing `set_current` calls that do not change the thread's ID.
Stabilize const `ptr::write*` and `mem::replace`
Since `const_mut_refs` and `const_refs_to_cell` have been stabilized, we may now also stabilize the ability to write to places during const evaluation inside our library API. So, we now propose the `const fn` version of `ptr::write` and its variants. This allows us to also stabilize `mem::replace` and `ptr::replace`.
- const `mem::replace`: https://github.com/rust-lang/rust/issues/83164#issuecomment-2338660862
- const `ptr::write{,_bytes,_unaligned}`: https://github.com/rust-lang/rust/issues/86302#issuecomment-2330275266
Their implementation requires an additional internal stabilization of `const_intrinsic_forget`, which is required for `*::write*` and thus `*::replace`. Thus we const-stabilize the internal intrinsics `forget`, `write_bytes`, and `write_via_move`.
Fixes#130210.
Since #124881, `ReentrantLock` uses `ThreadId` to identify threads. This has the unfortunate consequence of breaking uses of `Stdout` before main: Locking the `ReentrantLock` that synchronizes the output will initialize the thread ID before the handle for the main thread is set in `rt::init`. But since that would overwrite the current thread ID, `thread::set_current` triggers an abort.
This PR fixes the problem by using the already initialized thread ID for constructing the main thread handle and allowing `set_current` calls that do not change the thread's ID.
Migrate lib's `&Option<T>` into `Option<&T>`
Trying out my new lint https://github.com/rust-lang/rust-clippy/pull/13336 - according to the [video](https://www.youtube.com/watch?v=6c7pZYP_iIE), this could lead to some performance and memory optimizations.
Basic thoughts expressed in the video that seem to make sense:
* `&Option<T>` in an API breaks encapsulation:
* caller must own T and move it into an Option to call with it
* if returned, the owner must store it as Option<T> internally in order to return it
* Performance is subject to compiler optimization, but at the basics, `&Option<T>` points to memory that has `presence` flag + value, whereas `Option<&T>` by specification is always optimized to a single pointer.
intrinsics fmuladdf{32,64}: expose llvm.fmuladd.* semantics
Add intrinsics `fmuladd{f32,f64}`. This computes `(a * b) + c`, to be fused if the code generator determines that (i) the target instruction set has support for a fused operation, and (ii) that the fused operation is more efficient than the equivalent, separate pair of `mul` and `add` instructions.
https://llvm.org/docs/LangRef.html#llvm-fmuladd-intrinsic
The codegen_cranelift uses the `fma` function from libc, which is a correct implementation, but without the desired performance semantic. I think this requires an update to cranelift to expose a suitable instruction in its IR.
I have not tested with codegen_gcc, but it should behave the same way (using `fma` from libc).
---
This topic has been discussed a few times on Zulip and was suggested, for example, by `@workingjubilee` in [Effect of fma disabled](https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/Effect.20of.20fma.20disabled/near/274179331).
Stabilise `const_char_encode_utf8`.
Closes: #130512
This PR stabilises the `const_char_encode_utf8` feature gate (i.e. support for `char::encode_utf8` in const scenarios).
Note that the linked tracking issue is currently awaiting FCP.
Port sort-research-rs test suite to Rust stdlib tests
This PR is a followup to https://github.com/rust-lang/rust/pull/124032. It replaces the tests that test the various sort functions in the standard library with a test-suite developed as part of https://github.com/Voultapher/sort-research-rs. The current tests suffer a couple of problems:
- They don't cover important real world patterns that the implementations take advantage of and execute special code for.
- The input lengths tested miss out on code paths. For example, important safety property tests never reach the quicksort part of the implementation.
- The miri side is often limited to `len <= 20` which means it very thoroughly tests the insertion sort, which accounts for 19 out of 1.5k LoC.
- They are split into to core and alloc, causing code duplication and uneven coverage.
- ~~The randomness is tied to a caller location, wasting the space exploration capabilities of randomized testing.~~ The randomness is not repeatable, as it relies on `std:#️⃣:RandomState::new().build_hasher()`.
Most of these issues existed before https://github.com/rust-lang/rust/pull/124032, but they are intensified by it. One thing that is new and requires additional testing, is that the new sort implementations specialize based on type properties. For example `Freeze` and non `Freeze` execute different code paths.
Effectively there are three dimensions that matter:
- Input type
- Input length
- Input pattern
The ported test-suite tests various properties along all three dimensions, greatly improving test coverage. It side-steps the miri issue by preferring sampled approaches. For example the test that checks if after a panic the set of elements is still the original one, doesn't do so for every single possible panic opportunity but rather it picks one at random, and performs this test across a range of input length, which varies the panic point across them. This allows regular execution to easily test inputs of length 10k, and miri execution up to 100 which covers significantly more code. The randomness used is tied to a fixed - but random per process execution - seed. This allows for fully repeatable tests and fuzzer like exploration across multiple runs.
Structure wise, the tests are previously found in the core integration tests for `sort_unstable` and alloc unit tests for `sort`. The new test-suite was developed to be a purely black-box approach, which makes integration testing the better place, because it can't accidentally rely on internal access. Because unwinding support is required the tests can't be in core, even if the implementation is, so they are now part of the alloc integration tests. Are there architectures that can only build and test core and not alloc? If so, do such platforms require sort testing? For what it's worth the current implementation state passes miri `--target mips64-unknown-linux-gnuabi64` which is big endian.
The test-suite also contains tests for properties that were and are given by the current and previous implementations, and likely relied upon by users but weren't tested. For example `self_cmp` tests that the two parameters `a` and `b` passed into the comparison function are never references to the same object, which if the user is sorting for example a `&mut [Mutex<i32>]` could lead to a deadlock.
Instead of using the hashed caller location as rand seed, it uses seconds since unix epoch / 10, which given timestamps in the CI should be reasonably easy to reproduce, but also allows fuzzer like space exploration.
---
Test run-time changes:
Setup:
```
Linux 6.10
rustc 1.83.0-nightly (f79a912d9 2024-09-18)
AMD Ryzen 9 5900X 12-Core Processor (Zen 3 micro-architecture)
CPU boost enabled.
```
master: e9df22f
Before core integration tests:
```
$ LD_LIBRARY_PATH=build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/ hyperfine build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/coretests-219cbd0308a49e2f
Time (mean ± σ): 869.6 ms ± 21.1 ms [User: 1327.6 ms, System: 95.1 ms]
Range (min … max): 845.4 ms … 917.0 ms 10 runs
# MIRIFLAGS="-Zmiri-disable-isolation" to get real time
$ MIRIFLAGS="-Zmiri-disable-isolation" ./x.py miri library/core
finished in 738.44s
```
After core integration tests:
```
$ LD_LIBRARY_PATH=build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/ hyperfine build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/coretests-219cbd0308a49e2f
Time (mean ± σ): 865.1 ms ± 14.7 ms [User: 1283.5 ms, System: 88.4 ms]
Range (min … max): 836.2 ms … 885.7 ms 10 runs
$ MIRIFLAGS="-Zmiri-disable-isolation" ./x.py miri library/core
finished in 752.35s
```
Before alloc unit tests:
```
LD_LIBRARY_PATH=build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/ hyperfine build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/alloc-19c15e6e8565aa54
Time (mean ± σ): 295.0 ms ± 9.9 ms [User: 719.6 ms, System: 35.3 ms]
Range (min … max): 284.9 ms … 319.3 ms 10 runs
$ MIRIFLAGS="-Zmiri-disable-isolation" ./x.py miri library/alloc
finished in 322.75s
```
After alloc unit tests:
```
LD_LIBRARY_PATH=build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/ hyperfine build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/alloc-19c15e6e8565aa54
Time (mean ± σ): 97.4 ms ± 4.1 ms [User: 297.7 ms, System: 28.6 ms]
Range (min … max): 92.3 ms … 109.2 ms 27 runs
$ MIRIFLAGS="-Zmiri-disable-isolation" ./x.py miri library/alloc
finished in 309.18s
```
Before alloc integration tests:
```
$ LD_LIBRARY_PATH=build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/ hyperfine build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/alloctests-439e7300c61a8046
Time (mean ± σ): 103.2 ms ± 1.7 ms [User: 135.7 ms, System: 39.4 ms]
Range (min … max): 99.7 ms … 107.3 ms 28 runs
$ MIRIFLAGS="-Zmiri-disable-isolation" ./x.py miri library/alloc
finished in 231.35s
```
After alloc integration tests:
```
$ LD_LIBRARY_PATH=build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/ hyperfine build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/alloctests-439e7300c61a8046
Time (mean ± σ): 379.8 ms ± 4.7 ms [User: 4620.5 ms, System: 1157.2 ms]
Range (min … max): 373.6 ms … 386.9 ms 10 runs
$ MIRIFLAGS="-Zmiri-disable-isolation" ./x.py miri library/alloc
finished in 449.24s
```
In my opinion the results don't change iterative library development or CI execution in meaningful ways. For example currently the library doc-tests take ~66s and incremental compilation takes 10+ seconds. However I only have limited knowledge of the various local development workflows that exist, and might be missing one that is significantly impacted by this change.
Add intrinsics `fmuladd{f16,f32,f64,f128}`. This computes `(a * b) +
c`, to be fused if the code generator determines that (i) the target
instruction set has support for a fused operation, and (ii) that the
fused operation is more efficient than the equivalent, separate pair
of `mul` and `add` instructions.
https://llvm.org/docs/LangRef.html#llvm-fmuladd-intrinsic
MIRI support is included for f32 and f64.
The codegen_cranelift uses the `fma` function from libc, which is a
correct implementation, but without the desired performance semantic. I
think this requires an update to cranelift to expose a suitable
instruction in its IR.
I have not tested with codegen_gcc, but it should behave the same
way (using `fma` from libc).
rustc_target: Add sme-b16b16 as an explicit aarch64 target feature
LLVM 20 split out what used to be called b16b16 and correspond to aarch64
FEAT_SVE_B16B16 into sve-b16b16 and sme-b16b16.
Add sme-b16b16 as an explicit feature and update the codegen accordingly.
Resolves https://github.com/rust-lang/rust/pull/129894.
Stabilize const `{slice,array}::from_mut`
This PR stabilizes the following APIs as const stable as of rust `1.83`:
```rs
// core::array
pub const fn from_mut<T>(s: &mut T) -> &mut [T; 1];
// core::slice
pub const fn from_mut<T>(s: &mut T) -> &mut [T];
```
This is made possible by `const_mut_refs` being stabilized (yay).
Tracking issue: #90206
LLVM 20 split out what used to be called b16b16 and correspond to aarch64
FEAT_SVE_B16B16 into sve-b16b16 and sme-b16b16.
Add sme-b16b16 as an explicit feature and update the codegen accordingly.
Decouple WASIp2 sockets from WasiFd
This is a follow up to #129638, decoupling WASIp2's socket implementation from WASIp1's `WasiFd` as discussed with `@alexcrichton.`
Quite a few trait implementations in `std::os::fd` rely on the fact that there is an additional layer of abstraction between `Socket` and `OwnedFd`. I thus had to add a thin `WasiSocket` wrapper struct that just "forwards" to `OwnedFd`. Alternatively, I could have added a lot of conditional compilation to `std::os::fd`, which feels even worse.
Since `WasiFd::sock_accept` is no longer accessible from `TcpListener` and since WASIp2 has proper support for accepting sockets through `Socket::accept`, the `std::os::wasi::net` module has been removed from WASIp2, which only contains a single `TcpListenerExt` trait with a `sock_accept` method as well as an implementation for `TcpListener`. Let me know if this is an acceptable solution.
liballoc: introduce String, Vec const-slicing
This change `const`-qualifies many methods on `Vec` and `String`, notably `as_slice`, `as_str`, `len`. These changes are made behind the unstable feature flag `const_vec_string_slice`.
## Motivation
This is to support simultaneous variance over ownership and constness. I have an enum type that may contain either `String` or `&str`, and I want to produce a `&str` from it in a possibly-`const` context.
```rust
enum StrOrString<'s> {
Str(&'s str),
String(String),
}
impl<'s> StrOrString<'s> {
const fn as_str(&self) -> &str {
match self {
// In a const-context, I really only expect to see this variant, but I can't switch the implementation
// in some mode like #[cfg(const)] -- there has to be a single body
Self::Str(s) => s,
// so this is a problem, since it's not `const`
Self::String(s) => s.as_str(),
}
}
}
```
Currently `String` and `Vec` don't support this, but can without functional changes. Similar logic applies for `len`, `capacity`, `is_empty`.
## Changes
The essential thing enabling this change is that `Unique::as_ptr` is `const`. This lets us convert `RawVec::ptr` -> `Vec::as_ptr` -> `Vec::as_slice` -> `String::as_str`.
I had to move the `Deref` implementations into `as_{str,slice}` because `Deref` isn't `#[const_trait]`, but I would expect this change to be invisible up to inlining. I moved the `DerefMut` implementations as well for uniformity.
This change `const`-qualifies many methods on Vec and String, notably
`as_slice`, `as_str`, `len`. These changes are made behind the unstable
feature flag `const_vec_string_slice` with the following tracking issue:
https://github.com/rust-lang/rust/issues/129041
Android: Debug assertion after setting thread name
While `prctl` cannot fail if it points to a valid buffer, it's still better to assert the result as it's done for other places.