nontemporal_store: make sure that the intrinsic is truly just a hint
The `!nontemporal` flag for stores in LLVM *sounds* like it is just a hint, but actually, it is not -- at least on x86, non-temporal stores need very special treatment by the programmer or else the Rust memory model breaks down. LLVM still treats these stores as-if they were normal stores for optimizations, which is [highly dubious](https://github.com/llvm/llvm-project/issues/64521). Let's avoid all that dubiousness by making our own non-temporal stores be truly just a hint, which is possible on some targets (e.g. ARM). On all other targets, non-temporal stores become regular stores.
~~Blocked on https://github.com/rust-lang/stdarch/pull/1541 propagating to the rustc repo, to make sure the `_mm_stream` intrinsics are unaffected by this change.~~
Fixes https://github.com/rust-lang/rust/issues/114582
Cc `@Amanieu` `@workingjubilee`
fix: #128855 Ensure `Guard`'s `drop` method is removed at `opt-level=s` for `…
fix: #128855
…Copy` types
Added `#[inline]` to the `drop` method in the `Guard` implementation to ensure that the method is removed by the compiler at optimization level `opt-level=s` for `Copy` types. This change aims to align the method's behavior with optimization expectations and ensure it does not affect performance.
r? `@scottmcm`
Apply "polymorphization at home" to RawVec
The idea here is to move all the logic in RawVec into functions with explicit size and alignment parameters. This should eliminate all the fussing about how tweaking RawVec code produces large swings in compile times.
This uncovered https://github.com/rust-lang/rust-clippy/issues/12979, so I've modified the relevant test in a way that tries to preserve the spirit of the test without tripping the ICE.
core: optimise Debug impl for ascii::Char
Rather than writing character at a time, optimise Debug implementation
for core::ascii::Char such that it writes the entire representation
with a single write_str call.
With that, add tests for Display and Debug.
Issue: https://github.com/rust-lang/rust/issues/110998
Improve `Ord` violation help
Recent experience in #128083 showed that the panic message when an Ord violation is detected by the new sort implementations can be confusing. So this PR aims to improve it, together with minor bug fixes in the doc comments for sort*, sort_unstable* and select_nth_unstable*.
Is it possible to get these changes into the 1.81 release? It doesn't change behavior and would greatly help when users encounter this panic for the first time, which they may after upgrading to 1.81.
Tagging `@orlp`
Rather than writing character at a time, optimise Debug implementation
for core::ascii::Char such that it writes the entire representation as
with a single write_str call.
With that, add tests for Display and Debug implementations.
Added `#[inline]` to the `drop` method in the `Guard` implementation to ensure that the method is removed by the compiler at optimization level `opt-level=s` for `Copy` types. This change aims to align the method's behavior with optimization expectations and ensure it does not affect performance.
Mark `{f32,f64}::{next_up,next_down,midpoint}` inline
Most float functions are marked `#[inline]` so any float symbols used by these functions only need to be provided if the function itself is used. RFL recently noticed that `next_up`, `next_down`, and `midpoint` for `f32` and `f64` are not inline, which causes linker errors when building with certain configurations <https://lore.kernel.org/all/20240806150619.192882-1-ojeda@kernel.org/>.
Add the missing attributes so the symbols should no longer be required.
Most float functions are marked `#[inline]` so any float symbols used by
these functions only need to be provided if the function itself is used.
RFL recently noticed that `next_up`, `next_down`, and `midpoint` for
`f32` and `f64` are not inline, which causes linker errors when building
with certain configurations [1].
Add the missing attributes so the symbols should no longer be required.
Cc: Gary Guo <gary@garyguo.net>
Reported-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/all/20240806150619.192882-1-ojeda@kernel.org/ [1]
Add `f16` and `f128` math functions
This adds intrinsics and math functions for `f16` and `f128` floating point types. Support is quite limited and some things are broken so tests don't run on many platforms, but this provides a starting point.
Correct the const stabilization of `<[T]>::last_chunk`
`<[T]>::first_chunk` became const stable in 1.77, but `<[T]>::last_chunk` was left out. This was fixed in 3488679768, which reached stable in 1.80, making `<[T]>::last_chunk` const stable as of that version, but it is documented as being const stable as 1.77. While this is what should have happened, the documentation should reflect what actually did happen.
Remove unnecessary constants from flt2dec dragon
The "dragon" `flt2dec` algorithm uses multi-precision multiplication by (sometimes large) powers of 10. It has precomputed some values to help with these calculations.
BUT:
* There is no need to store powers of 10 and 2 * powers of 10: it is trivial to compute the second from the first.
* We can save a chunk of memory by storing powers of 5 instead of powers of 10 for the large powers (and just shifting as appropriate).
* This also slightly speeds up the routines (by ~1-3%) since the intermediate products are smaller and the shift is cheap.
In this PR, we remove the unnecessary constants and do the necessary adjustments.
Relevant benchmarks before (on my Threadripper 3970X, x86_64-unknown-linux-gnu):
```
num::flt2dec::bench_big_shortest 137.92/iter +/- 2.24
num::flt2dec::strategy:🐉:bench_big_exact_12 2135.28/iter +/- 38.90
num::flt2dec::strategy:🐉:bench_big_exact_3 904.95/iter +/- 10.58
num::flt2dec::strategy:🐉:bench_big_exact_inf 47230.33/iter +/- 320.84
num::flt2dec::strategy:🐉:bench_big_shortest 3915.05/iter +/- 51.37
```
and after:
```
num::flt2dec::bench_big_shortest 137.40/iter +/- 2.03
num::flt2dec::strategy:🐉:bench_big_exact_12 2101.10/iter +/- 25.63
num::flt2dec::strategy:🐉:bench_big_exact_3 873.86/iter +/- 4.20
num::flt2dec::strategy:🐉:bench_big_exact_inf 47468.19/iter +/- 374.45
num::flt2dec::strategy:🐉:bench_big_shortest 3877.01/iter +/- 45.74
```
`<[T]>::first_chunk` became const stable in 1.77, but `<[T]>::last_chunk` was
left out. This was fixed in 3488679768, which reached stable in 1.80,
making `<[T]>::last_chunk` const stable as of that version, but it is
documented as being const stable as 1.77. While this is what should have
happened, the documentation should reflect what actually did happen.
Implement `UncheckedIterator` directly for `RepeatN`
This just pulls the code out of `next` into `next_unchecked`, rather than making the `Some` and `unwrap_unchecked`ing it.
And while I was touching it, I added a codegen test that `array::repeat` for something that's just `Clone`, not `Copy`, still ends up optimizing to the same thing as `[x; n]`: <https://rust.godbolt.org/z/YY3a5ajMW>.
The "dragon" `flt2dec` algorithm uses multi-precision multiplication by
(sometimes large) powers of 10. It has precomputed some values to help
with these calculations.
BUT:
* There is no need to store powers of 10 and 2 * powers of 10: it is
trivial to compute the second from the first.
* We can save a chunk of memory by storing powers of 5 instead of powers
of 10 for the large powers (and just shifting by 2 as appropriate).
* This also slightly speeds up the routines (by ~1-3%) since the
intermediate products are smaller and the shift is cheap.
In this PR, we remove the unnecessary constants and do the necessary
adjustments.
Relevant benchmarks before (on my Threadripper 3970X, x86_64-unknown-linux-gnu):
```
num::flt2dec::bench_big_shortest 137.92/iter +/- 2.24
num::flt2dec::strategy:🐉:bench_big_exact_12 2135.28/iter +/- 38.90
num::flt2dec::strategy:🐉:bench_big_exact_3 904.95/iter +/- 10.58
num::flt2dec::strategy:🐉:bench_big_exact_inf 47230.33/iter +/- 320.84
num::flt2dec::strategy:🐉:bench_big_shortest 3915.05/iter +/- 51.37
```
and after:
```
num::flt2dec::bench_big_shortest 137.40/iter +/- 2.03
num::flt2dec::strategy:🐉:bench_big_exact_12 2101.10/iter +/- 25.63
num::flt2dec::strategy:🐉:bench_big_exact_3 873.86/iter +/- 4.20
num::flt2dec::strategy:🐉:bench_big_exact_inf 47468.19/iter +/- 374.45
num::flt2dec::strategy:🐉:bench_big_shortest 3877.01/iter +/- 45.74
```
Revert recent changes to dead code analysis
This is a revert to recent changes to dead code analysis, namely:
* efdf219 Rollup merge of #128104 - mu001999-contrib:fix/128053, r=petrochenkov
* a70dc297a8 Rollup merge of #127017 - mu001999-contrib:dead/enhance, r=pnkfelix
* 31fe9628cf Rollup merge of #127107 - mu001999-contrib:dead/enhance-2, r=pnkfelix
* 2724aeaaeb Rollup merge of #126618 - mu001999-contrib:dead/enhance, r=pnkfelix
* 977c5fd419 Rollup merge of #126315 - mu001999-contrib:fix/126289, r=petrochenkov
* 13314df21b Rollup merge of #125572 - mu001999-contrib:dead/enhance, r=pnkfelix
There is an additional change stacked on top, which suppresses false-negatives that were masked by this work. I believe the functions that are touched in that code are legitimately unused functions and the types are not reachable since this `AnonPipe` type is not publically reachable -- please correct me if I'm wrong cc `@NobodyXu` who added these in ##127153.
Some of these reverts (#126315 and #126618) are only included because it makes the revert apply cleanly, and I think these changes were only done to fix follow-ups from the other PRs?
I apologize for the size of the PR and the churn that it has on the codebase (and for reverting `@mu001999's` work here), but I'm putting this PR up because I am concerned that we're making ad-hoc changes to fix bugs that are fallout of these PRs, and I'd like to see these changes reimplemented in a way that's more separable from the existing dead code pass. I am happy to review any code to reapply these changes in a more separable way.
cc `@mu001999`
r? `@pnkfelix`
Fixes#128272Fixes#126169
Added SHA512, SM3, SM4 target-features and `sha512_sm_x86` feature gate
This is an effort towards #126624. This adds support for these 3 target-features and introduces the feature flag `sha512_sm_x86`, which would gate these target-features and the yet-to-be-implemented detection and intrinsics in stdarch.
Rewrite binary search implementation
This PR builds on top of #128250, which should be merged first.
This restores the original binary search implementation from #45333 which has the nice property of having a loop count that only depends on the size of the slice. This, along with explicit conditional moves from #128250, means that the entire binary search loop can be perfectly predicted by the branch predictor.
Additionally, LLVM is able to unroll the loop when the slice length is known at compile-time. This results in a very compact code sequence of 3-4 instructions per binary search step and zero branches.
Fixes#53823Fixes#115271
raw_eq: using it on bytes with provenance is not UB (outside const-eval)
The current behavior of raw_eq violates provenance monotonicity. See https://github.com/rust-lang/rust/pull/124921 for an explanation of provenance monotonicity. It is violated in raw_eq because comparing bytes without provenance is well-defined, but adding provenance makes the operation UB.
So remove the no-provenance requirement from raw_eq. However, the requirement stays in-place for compile-time invocations of raw_eq, that indeed cannot deal with provenance.
Cc `@rust-lang/opsem`
Due to a LLVM bug, `f128` math functions link successfully but LLVM
chooses the wrong symbols (`long double` symbols rather than those for
binary128).
Since this is a notable problem that may surprise a number of users, add
a note about it.
Link: https://github.com/llvm/llvm-project/issues/44744
`min`, `max`, and similar functions require external math routines. Add
these under the same gates as `std` math functions (`reliable_f16_math`
and `reliable_f128_math`).
- Use if the implementation of [`Ord`] for `T`
language
- Link to total order wiki page
- Rework total order help and examples
- Improve language to be more precise and less
prone to misunderstandings.
- Fix usage of `sort_unstable_by` in `sort_by`
example
- Fix missing author mention
- Use more consistent example input for sort
- Use more idiomatic assert_eq! in examples
- Use more natural "comparison function" language
instead of "comparator function"