Revert "Forbid inlining `thread_local!`'s `__getit` function on Windows"
Revert of #101368, fixes #104852.
I'd rather not do this since that's a soundness fix and this is hitting some compiler bug, but I don't really know an alternative.
r? `@ChrisDenton`
Previously, async constructs would be lowered to "normal" generators,
with an additional `from_generator` / `GenFuture` shim in between to
convert from `Generator` to `Future`.
The compiler will now special-case these generators internally so that
async constructs will *directly* implement `Future` without the need
to go through the `from_generator` / `GenFuture` shim.
The primary motivation for this change was hiding this implementation
detail in stack traces and debuginfo, but it can in theory also help
the optimizer as there are fewer abstractions to see through.
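A minimal sketch of what "directly implements `Future`" looks like from the user's side: an `async` block can be pinned and polled by hand like any other future. The `NoopWake` type and the `poll_once` helper are hypothetical names introduced only for this illustration; they are not part of the change.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical waker that does nothing; enough for a future that never suspends.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` exactly once and returns whatever `poll` reports.
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    // `pin!` pins the future to the stack so that it can be polled.
    let mut fut = pin!(fut);
    fut.as_mut().poll(&mut cx)
}
```

Calling `poll_once(async { 1 + 2 })` polls the async block through its own `Future` implementation.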
Forbid inlining `thread_local!`'s `__getit` function on Windows
Sadly, this will make things slower to avoid UB in an edge case, but it seems hard to avoid... and really whenever I look at this code I can't help but think we're asking for trouble.
It's pretty dodgy for us to leave this as a normal function rather than `#[inline(never)]`, given that if it *does* get inlined into a dynamically linked component, it's extremely unsafe (you get some other thread local, or if you're lucky, crash). Given that it's pretty rare for people to use dylibs on Windows, the fact that we haven't gotten bug reports about it isn't really that convincing. Ideally we'd come up with some kind of compiler solution (that avoids paying for this cost when static linking, or *at least* for use within the same crate...), but it's not clear what that looks like.
Oh, and because all this is only needed when we're implementing `thread_local!` with `#[thread_local]`, this patch adjusts the `cfg_attr` to be `all(windows, target_thread_local)` as well.
r? `@ChrisDenton`
See also #84933, which is about improving the situation.
disable strict-provenance-violating doctests in Miri
Most of these are on deprecated unstable functions anyway. This lets us run the remaining doctests with `-Zmiri-strict-provenance`, which I think is a win.
r? `@thomcc`
Constify remaining `Layout` methods
Makes the methods on `Layout` that aren't yet unstably const into unstably const methods, under the same feature and issue, #67521. Most of them required no changes; the only non-trivial change is probably constifying `ValidAlignment`, which may affect #102072.
Deprecate the unstable `ptr_to_from_bits` feature
I propose that we deprecate the (unstable!) `to_bits` and `from_bits` methods on raw pointers. (With the intent to ~~remove them once `addr` has been around long enough to make the transition easy on people -- maybe another 6 weeks~~ remove them fairly soon after, as the strict and expose versions have been around for a while already.)
The APIs that came from the strict provenance explorations (#95228) are a more holistic version of these, and things like `.expose_addr()` work for the "that cast looks sketchy" case even if the full strict provenance stuff never happens. (As a bonus, `addr` is even shorter than `to_bits`, though it is only applicable if people can use full strict provenance! `addr` is *not* a direct replacement for `to_bits`.) So I think it's fine to move away from the `{to|from}_bits` methods, and encourage the others instead.
That also resolves the worry that was brought up (I forget where) that `q.to_bits()` and `(*q).to_bits()` both work if `q` is a pointer-to-floating-point, as they also have a `to_bits` method.
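To make the ambiguity concrete, here is a small sketch on stable Rust. It deliberately avoids the unstable pointer `to_bits` and uses a plain `as usize` cast for the address instead; `pointee_bits` and `pointer_address` are hypothetical helper names.

```rust
/// Returns the IEEE 754 bit pattern of the pointee, i.e. `f32::to_bits`.
///
/// # Safety
/// `q` must point to a live, properly aligned `f32`.
unsafe fn pointee_bits(q: *const f32) -> u32 {
    // SAFETY: guaranteed by the caller.
    unsafe { (*q).to_bits() }
}

/// Returns the address stored in the pointer itself (what the pointer
/// `to_bits` method would have produced), here via a plain cast.
fn pointer_address(q: *const f32) -> usize {
    q as usize
}
```

With a pointer-to-float `q`, the two results are unrelated numbers, which is why having both spelled `to_bits` invites mistakes.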
Tracking issue #91126
Code search: https://github.com/search?l=Rust&p=1&q=ptr_to_from_bits&type=Code
For potential pushback, some users in case they want to chime in
- `@RSSchermer` 365bb68541/arwa/src/html/custom_element.rs (L105)
- `@strax` 99616d1dbf/openexr/src/core/alloc.rs (L36)
- `@MiSawa` 577c622358/crates/kernel/src/timer.rs (L50)
Add slice methods for indexing via an array of indices.
Disclaimer: It's been a while since I contributed to the main Rust repo, apologies in advance if this is large enough already that it should've been an RFC.
---
# Update:
- Based on feedback, removed the `&[T]` variant of this API, and removed the requirements for the indices to be sorted.
# Description
This adds the following slice methods to `core`:
```rust
impl<T> [T] {
    pub unsafe fn get_many_unchecked_mut<const N: usize>(&mut self, indices: [usize; N]) -> [&mut T; N];
    pub fn get_many_mut<const N: usize>(&mut self, indices: [usize; N]) -> Option<[&mut T; N]>;
}
```
This allows creating multiple mutable references to disjoint positions in a slice, which previously required writing some awkward code with `split_at_mut()` or `iter_mut()`. For the bounds-checked variant, the indices are checked against each other and against the bounds of the slice, which requires `N * (N + 1) / 2` comparison operations.
This has a proof-of-concept standalone implementation here: https://crates.io/crates/index_many
Care has been taken that the implementation passes miri borrow checks, and generates straightforward assembly (though this was only checked on x86_64).
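For comparison, this is roughly what the `split_at_mut()` workaround looks like for just two indices. It is a sketch of the status quo, not code from this PR, and `get_two_mut` is a hypothetical name.

```rust
/// Returns mutable references to `slice[i]` and `slice[j]`, or `None` if the
/// indices are equal or out of bounds.
fn get_two_mut<T>(slice: &mut [T], i: usize, j: usize) -> Option<(&mut T, &mut T)> {
    if i == j || i >= slice.len() || j >= slice.len() {
        return None;
    }
    if i < j {
        // `left` covers `..j`, so `right[0]` is `slice[j]`.
        let (left, right) = slice.split_at_mut(j);
        Some((&mut left[i], &mut right[0]))
    } else {
        // `left` covers `..i`, so `right[0]` is `slice[i]`.
        let (left, right) = slice.split_at_mut(i);
        Some((&mut right[0], &mut left[j]))
    }
}
```

This already needs two branches for two indices; generalizing it to `N` indices is where it becomes awkward.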
# Example
```rust
let v = &mut [1, 2, 3, 4];
let [a, b] = v.get_many_mut([0, 2]).unwrap();
std::mem::swap(a, b);
*b += 100;
assert_eq!(v, &[3, 2, 101, 4]);
```
# Codegen Examples
<details>
<summary>Click to expand!</summary>
Disclaimer: Taken from local tests with the standalone implementation.
## Unchecked Indexing:
```rust
pub unsafe fn example_unchecked(slice: &mut [usize], indices: [usize; 3]) -> [&mut usize; 3] {
    slice.get_many_unchecked_mut(indices)
}
```
```nasm
example_unchecked:
    mov rcx, qword ptr [r9]
    mov r8, qword ptr [r9 + 8]
    mov r9, qword ptr [r9 + 16]
    lea rcx, [rdx + 8*rcx]
    lea r8, [rdx + 8*r8]
    lea rdx, [rdx + 8*r9]
    mov qword ptr [rax], rcx
    mov qword ptr [rax + 8], r8
    mov qword ptr [rax + 16], rdx
    ret
```
## Checked Indexing (Option):
```rust
pub fn example_option(slice: &mut [usize], indices: [usize; 3]) -> Option<[&mut usize; 3]> {
    slice.get_many_mut(indices)
}
```
```nasm
    mov r10, qword ptr [r9 + 8]
    mov rcx, qword ptr [r9 + 16]
    cmp rcx, r10
    je .LBB0_7
    mov r9, qword ptr [r9]
    cmp rcx, r9
    je .LBB0_7
    cmp rcx, r8
    jae .LBB0_7
    cmp r10, r9
    je .LBB0_7
    cmp r9, r8
    jae .LBB0_7
    cmp r10, r8
    jae .LBB0_7
    lea r8, [rdx + 8*r9]
    lea r9, [rdx + 8*r10]
    lea rcx, [rdx + 8*rcx]
    mov qword ptr [rax], r8
    mov qword ptr [rax + 8], r9
    mov qword ptr [rax + 16], rcx
    ret
.LBB0_7:
    mov qword ptr [rax], 0
    ret
```
## Checked Indexing (Panic):
```rust
pub fn example_panic(slice: &mut [usize], indices: [usize; 3]) -> [&mut usize; 3] {
    let len = slice.len();
    match slice.get_many_mut(indices) {
        Some(s) => s,
        None => {
            let tmp = indices;
            index_many::sorted_bound_check_failed(&tmp, len)
        }
    }
}
```
```nasm
example_panic:
    sub rsp, 56
    mov rax, qword ptr [r9]
    mov r10, qword ptr [r9 + 8]
    mov r9, qword ptr [r9 + 16]
    cmp r9, r10
    je .LBB0_6
    cmp r9, rax
    je .LBB0_6
    cmp r9, r8
    jae .LBB0_6
    cmp r10, rax
    je .LBB0_6
    cmp rax, r8
    jae .LBB0_6
    cmp r10, r8
    jae .LBB0_6
    lea rax, [rdx + 8*rax]
    lea r8, [rdx + 8*r10]
    lea rdx, [rdx + 8*r9]
    mov qword ptr [rcx], rax
    mov qword ptr [rcx + 8], r8
    mov qword ptr [rcx + 16], rdx
    mov rax, rcx
    add rsp, 56
    ret
.LBB0_6:
    mov qword ptr [rsp + 32], rax
    mov qword ptr [rsp + 40], r10
    mov qword ptr [rsp + 48], r9
    lea rcx, [rsp + 32]
    mov edx, 3
    call index_many::bound_check_failed
    ud2
```
</details>
# Extensions
There are multiple optional extensions to this.
## Indexing With Ranges
This could easily be expanded to allow indexing with `[I; N]` where `I: SliceIndex<Self>`. I wanted to keep the initial implementation simple, so I didn't include it yet.
## Panicking Variant
We could also add this method:
```rust
impl<T> [T] {
    fn index_many_mut<const N: usize>(&mut self, indices: [usize; N]) -> [&mut T; N];
}
```
This would work similarly to the regular index operator and panic on out-of-bounds indices. The advantage would be that we could more easily ensure good codegen with a useful panic message, which is non-trivial with the `Option` variant.
This is implemented in the standalone implementation, and is used as the basis for some of the codegen examples here.
Pin::new_unchecked: discuss pinning closure captures
Regardless of how the discussion in https://github.com/rust-lang/rust/pull/102737 turns out, pinning closure captures is super subtle business and probably worth discussing separately.
Fix doc example for `wrapping_abs`
The `max` variable is unused. This change introduces the `min_plus` variable, to make the example similar to the one from `saturating_abs`. An alternative would be to remove the unused variable.
add examples to chunks remainder methods.
My motivation for adding the examples was to make it very clear that the state of the iterator (in terms of where its cursor lies) has no effect on what `remainder` returns.
Also fixed some links to rchunk remainder methods.
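A small sketch of the property the new examples demonstrate, using `ChunksExact::remainder`. The helper name `remainder_before_and_after` is hypothetical.

```rust
/// Reads `remainder()` before and after advancing the iterator.
fn remainder_before_and_after(data: &[i32]) -> (&[i32], &[i32]) {
    let mut iter = data.chunks_exact(2);
    let before = iter.remainder();
    // Advance the cursor; this does not change what `remainder` returns.
    let _ = iter.next();
    let _ = iter.next();
    let after = iter.remainder();
    (before, after)
}
```

For a five-element input, both calls return the final element that does not fit into a chunk of two.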
Clarify and restrict when `{Arc,Rc}::get_unchecked_mut` is allowed.
(Tracking issue for `{Arc,Rc}::get_unchecked_mut`: #63292)
(I'm using `Rc` in this comment, but it applies for `Arc` all the same).
As currently documented, `Rc::get_unchecked_mut` can lead to unsoundness when multiple `Rc`/`Weak` pointers to the same allocation exist. The current documentation only requires that other `Rc`/`Weak` pointers to the same allocation "must not be dereferenced for the duration of the returned borrow". This can lead to unsoundness in (at least) two ways: variance, and `Rc<str>`/`Rc<[u8]>` aliasing. ([playground link](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=d7e2d091c389f463d121630ab0a37320)).
This PR changes the documentation of `Rc::get_unchecked_mut` to restrict usage to when all `Rc<T>`/`Weak<T>` have the exact same `T` (including lifetimes). I believe this is sufficient to prevent unsoundness, while still allowing `get_unchecked_mut` to be called on an aliased `Rc` as long as the safety contract is upheld by the caller.
## Alternatives
* A less strict, but still sound alternative would be to say that the caller must only write values which are valid for all aliased `Rc`/`Weak` inner types. (This was [mentioned](https://github.com/rust-lang/rust/issues/63292#issuecomment-568284090) in the tracking issue). This may be too complicated to clearly express in the documentation.
* A stricter alternative would be to say that there must not be any aliased `Rc`/`Weak` pointers, i.e. it is required that `get_mut` would return `Some(_)`. (This was also mentioned in the tracking issue). There is at least one codebase that this would cause to become unsound ([here](be5a164d77/src/memtable.rs (L166)), where additional locking is used to ensure unique access to an aliased `Rc<T>`; I saw this because it was linked on the tracking issue).
clarify that realloc refreshes pointer provenance even when the allocation remains in-place
This [matches what C does](https://en.cppreference.com/w/c/memory/realloc):
> The original pointer ptr is invalidated and any access to it is undefined behavior (even if reallocation was in-place).
Cc `@rust-lang/wg-allocators`
`VecDeque::resize` should re-use the buffer in the passed-in element
Today it always copies it for *every* appended element, but one of those clones is avoidable.
This adds `iter::repeat_n` (https://github.com/rust-lang/rust/issues/104434) as the primitive needed to do this. If this PR is acceptable, I'll also use this in `Vec` rather than its custom `ExtendElement` type & infrastructure that is harder to share between multiple different containers:
101e1822c3/library/alloc/src/vec/mod.rs (L2479-L2492)
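The idea of reusing the passed-in element can be sketched without `iter::repeat_n`: clone for all but the last slot, then move the original into the last one. This is only an illustration of the technique, not the PR's implementation; `push_back_n`, `CountsClones` and `clones_for` are hypothetical names.

```rust
use std::cell::Cell;
use std::collections::VecDeque;
use std::rc::Rc;

/// Hypothetical element type that counts how often it is cloned.
struct CountsClones(Rc<Cell<usize>>);

impl Clone for CountsClones {
    fn clone(&self) -> Self {
        self.0.set(self.0.get() + 1);
        CountsClones(Rc::clone(&self.0))
    }
}

/// Appends `n` copies of `value`, cloning only `n - 1` times.
fn push_back_n<T: Clone>(deque: &mut VecDeque<T>, n: usize, value: T) {
    if n == 0 {
        return;
    }
    for _ in 1..n {
        deque.push_back(value.clone());
    }
    // The last slot takes ownership of the original, saving one clone.
    deque.push_back(value);
}

/// Returns how many clones were needed to append `n` elements.
fn clones_for(n: usize) -> usize {
    let counter = Rc::new(Cell::new(0));
    let mut deque = VecDeque::new();
    push_back_n(&mut deque, n, CountsClones(Rc::clone(&counter)));
    assert_eq!(deque.len(), n);
    counter.get()
}
```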
* Fix doc examples for platforms with underaligned integer primitives.
* Mutable pointer doc examples use mutable pointers.
* Fill out tracking issue.
* Minor formatting changes.
Rollup of 10 pull requests
Successful merges:
- #103117 (Use `IsTerminal` in place of `atty`)
- #103969 (Partial support for running UI tests with `download-rustc`)
- #103989 (Fix build of std for thumbv7a-pc-windows-msvc)
- #104076 (fix sysroot issue which appears for ci downloaded rustc)
- #104469 (Make "long type" printing type aware and trim types in E0275)
- #104497 (detect () to avoid redundant <> suggestion for type)
- #104577 (Don't focus on notable trait parent when hiding it)
- #104587 (Update cargo)
- #104593 (Improve spans for RPITIT object-safety errors)
- #104604 (Migrate top buttons style to CSS variables)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Fix build of std for thumbv7a-pc-windows-msvc
Attempting to build std for the tier-3 target `thumbv7a-pc-windows-msvc` fails with the following error:
```
Building stage1 std artifacts (x86_64-pc-windows-msvc -> thumbv7a-pc-windows-msvc)
..
LLVM ERROR: WinEH not implemented for this target
error: could not compile `panic_unwind`
```
EH (unwinding) is not supported by LLVM for 32-bit ARM MSVC targets. This change makes `panic_unwind` use the dummy implementation for `thumbv7a-pc-windows-msvc`.
Revert Vec/Rc storage reuse opt
Remove the optimization for using storage added by #104205.
The perf wins were pretty small, and it relies on non-guaranteed behaviour. On platforms that don't implement shrinking in place, the performance will be significantly worse.
While it could be gated to platforms that do this (such as GNU), I don't think it's worth the overhead of maintaining it for very small gains. (#104565, #104563)
cc `@RalfJung` `@matthiaskrgr`
Fixes #104565
Fixes #104563
Improve accuracy of asinh and acosh
This PR addresses the inaccuracy of `asinh` and `acosh` identified by the [Herbie](http://herbie.uwplse.org/) tool, `@pavpanchekha`, and `@finnbear` in #104548. It also adds a couple of tests that failed with the existing implementations and now pass.
Closes #104548
r? rust-lang/libs
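To illustrate the kind of inaccuracy involved, here is a sketch comparing the textbook formula for `asinh` with an algebraically equivalent `ln_1p`-based form. It is not claimed to be the formula this PR uses; `asinh_naive` and `asinh_ln_1p` are hypothetical names, and the second function ignores overflow of `x * x` for very large inputs.

```rust
/// Textbook formula: ln(x + sqrt(x^2 + 1)).
/// For tiny `x`, `x + 1.0` rounds to `1.0` and the result collapses to 0.
fn asinh_naive(x: f64) -> f64 {
    (x + (x * x + 1.0).sqrt()).ln()
}

/// Uses |x| + sqrt(x^2 + 1) = 1 + |x| + x^2 / (1 + sqrt(x^2 + 1)),
/// so that `ln_1p` can keep the small part of the argument.
fn asinh_ln_1p(x: f64) -> f64 {
    let ax = x.abs();
    let t = ax + ax * ax / (1.0 + (ax * ax + 1.0).sqrt());
    t.ln_1p().copysign(x)
}
```

For an input such as `1e-20` the naive version returns exactly `0.0`, while the `ln_1p` version stays close to the input, as expected for `asinh` near zero.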
Rollup of 8 pull requests
Successful merges:
- #102977 (remove HRTB from `[T]::is_sorted_by{,_key}`)
- #103378 (Fix mod_inv termination for the last iteration)
- #103456 (`unchecked_{shl|shr}` should use `u32` as the RHS)
- #103701 (Simplify some pointer method implementations)
- #104047 (Diagnostics `icu4x` based list formatting.)
- #104338 (Enforce that `dyn*` coercions are actually pointer-sized)
- #104498 (Edit docs for `rustc_errors::Handler::stash_diagnostic`)
- #104556 (rustdoc: use `code-header` class to format enum variants)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Enforce that `dyn*` coercions are actually pointer-sized
Implement a perma-unstable, rudimentary `PointerSized` trait to enforce `dyn*` casts are `usize`-sized for now, at least to prevent ICEs and weird codegen issues from cropping up after monomorphization since currently we enforce *nothing*.
This probably can/should be removed in favor of a more sophisticated trait for handling `dyn*` conversions when we decide on one, but I just want to get something up for discussion and experimentation for now.
r? `@eholk` cc `@tmandry` (though feel free to claim/reassign)
Fixes #102141
Fixes #102173
Simplify some pointer method implementations
- Make `pointer::with_metadata_of` const (+simplify implementation) (cc #75091)
- Simplify implementation of various pointer methods
r? `@scottmcm`
----
`from_raw_parts::<T>(this, metadata(self))` was annoying me for a while and I've finally figured out how it should _actually_ be done.
Fix mod_inv termination for the last iteration
On usize=u64 platforms, the 4th iteration would overflow the `mod_gate` back to 0. Similarly for usize=u32 platforms, the 3rd iteration would overflow much the same way.
I tested various approaches to resolving this, including approaches with `saturating_mul` and `widening_mul` to a double usize. Turns out LLVM likes `mul_with_overflow` the best. In fact, now that LLVM can see the iteration count is limited, it will happily unroll the loop into a nice linear sequence.
You will also notice that the code around the loop got simplified somewhat. Now that LLVM is handling the loop nicely, there is no longer any reason to manually unroll the first iteration out of the loop (though looking at the code today I’m not sure all that complexity was necessary in the first place).
Fixes #103361
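For readers unfamiliar with this kind of loop, here is a sketch of a related technique: computing the inverse of an odd number modulo 2^64 by Newton iteration, where each step doubles the number of correct low bits. This is not the `mod_inv` from this PR (which works with a lookup table and a growing modulus); `inverse_mod_2_64` is a hypothetical name.

```rust
/// Returns `inv` such that `x.wrapping_mul(inv) == 1`, for odd `x`.
fn inverse_mod_2_64(x: u64) -> u64 {
    assert!(x % 2 == 1, "only odd numbers are invertible modulo 2^64");
    // For odd x, x * x == 1 (mod 8), so `x` is already correct to 3 bits.
    let mut inv = x;
    // Each step doubles the correct bits: 3 -> 6 -> 12 -> 24 -> 48 -> 96.
    for _ in 0..5 {
        inv = inv.wrapping_mul(2u64.wrapping_sub(x.wrapping_mul(inv)));
    }
    inv
}
```

Because the iteration count is a small constant, a compiler is free to unroll such a loop into a linear sequence.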
Fix non-associativity of `Instant` math on `aarch64-apple-darwin` targets
This is a duplicate of #94100 (since the original author is unresponsive), which resolves #91417.
On `aarch64-apple-darwin` targets, the internal resolution of `Instant` is lower than that of `Duration`, so math between them becomes non-associative with small-enough durations.
This PR makes this target use the standard Unix implementation (where `Instant` has 1ns resolution), but with `CLOCK_UPTIME_RAW` so it still returns the same values as `mach_absolute_time`[^1].
(Edit: I need someone to confirm that this still works, I do not have access to an M1 device.)
[^1]: https://www.manpagez.com/man/3/clock_gettime/
Support `#[track_caller]` on async fns
Adds `#[track_caller]` to the generator that is created when we desugar the async fn.
Fixes #78840
Open questions:
- What is the performance impact of adding `#[track_caller]` to every `GenFuture`'s `poll(...)` function, even if it's unused (i.e., the parent span does not set `#[track_caller]`)? We might need to set it only conditionally, if the indirection causes overhead we don't want.
Attempt to reuse `Vec<T>` backing storage for `Rc/Arc<[T]>`
If a `Vec<T>` has sufficient capacity to store the inner `RcBox<[T]>`, we can just reuse the existing allocation and shift the elements up, instead of making a new allocation.
x86_64 SSE2 fast-path for str.contains(&str) and short needles
Based on Wojciech Muła's [SIMD-friendly algorithms for substring searching](http://0x80.pl/articles/simd-strfind.html#sse-avx2)
The two-way algorithm is Big-O efficient but it needs to preprocess the needle
to find a "critical factorization" of it. This additional work is significant
for short needles. Additionally it mostly advances needle.len() bytes at a time.
The SIMD-based approach used here on the other hand can advance based on its
vector width, which can exceed the needle length. Pathological cases are the
exception, but because the approach is limited to small needles the worst-case blowup is also small.
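The core filtering idea from the linked article can be sketched in scalar code: test the first and last needle byte at each candidate position and only do the full comparison when both match. The real implementation performs these two byte tests on a whole vector of positions at once; this sketch is not the PR's code, and `contains_first_last` is a hypothetical name.

```rust
/// Scalar sketch of the first/last-byte filter used by SIMD substring search.
fn contains_first_last(haystack: &[u8], needle: &[u8]) -> bool {
    if needle.is_empty() {
        return true;
    }
    if needle.len() > haystack.len() {
        return false;
    }
    let last_offset = needle.len() - 1;
    let (first, last) = (needle[0], needle[last_offset]);
    (0..=haystack.len() - needle.len()).any(|i| {
        // Cheap filter first; the full comparison runs only on candidates.
        haystack[i] == first
            && haystack[i + last_offset] == last
            && &haystack[i..i + needle.len()] == needle
    })
}
```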
benchmarks taken on a Zen2, compiled with `-Ccodegen-units=1`:
```
OLD:
test str::bench_contains_16b_in_long ... bench: 504 ns/iter (+/- 14) = 5061 MB/s
test str::bench_contains_2b_repeated_long ... bench: 948 ns/iter (+/- 175) = 2690 MB/s
test str::bench_contains_32b_in_long ... bench: 445 ns/iter (+/- 6) = 5732 MB/s
test str::bench_contains_bad_naive ... bench: 130 ns/iter (+/- 1) = 569 MB/s
test str::bench_contains_bad_simd ... bench: 84 ns/iter (+/- 8) = 880 MB/s
test str::bench_contains_equal ... bench: 142 ns/iter (+/- 7) = 394 MB/s
test str::bench_contains_short_long ... bench: 677 ns/iter (+/- 25) = 3768 MB/s
test str::bench_contains_short_short ... bench: 27 ns/iter (+/- 2) = 2074 MB/s
NEW:
test str::bench_contains_16b_in_long ... bench: 82 ns/iter (+/- 0) = 31109 MB/s
test str::bench_contains_2b_repeated_long ... bench: 73 ns/iter (+/- 0) = 34945 MB/s
test str::bench_contains_32b_in_long ... bench: 71 ns/iter (+/- 1) = 35929 MB/s
test str::bench_contains_bad_naive ... bench: 7 ns/iter (+/- 0) = 10571 MB/s
test str::bench_contains_bad_simd ... bench: 97 ns/iter (+/- 41) = 762 MB/s
test str::bench_contains_equal ... bench: 4 ns/iter (+/- 0) = 14000 MB/s
test str::bench_contains_short_long ... bench: 73 ns/iter (+/- 0) = 34945 MB/s
test str::bench_contains_short_short ... bench: 12 ns/iter (+/- 0) = 4666 MB/s
```
There seem to be some scenarios where `cpu.cfs_period_us` can contain `0`.
This causes a panic when calling `std::thread::available_parallelism()`, as is done
from binaries built by `cargo test`, which was how the issue was
discovered. I don't feel like `0` is a good value for `cpu.cfs_period_us`, but I
also don't think applications should panic if this value is seen.
This case is handled by other projects which read this information:
- num_cpus: e437b9d908/src/linux.rs (L207-L210)
- ninja: https://github.com/ninja-build/ninja/pull/2174/files
- dotnet: c4341d45ac/src/coreclr/pal/src/misc/cgroup.cpp (L481-L483)
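A minimal sketch of the defensive pattern, assuming the quota and period have already been parsed into integers. `quota_cpus` is a hypothetical helper, not the function in `std`, and clamping the result to at least one CPU is an assumption made for this example.

```rust
/// Number of CPUs permitted by a CFS quota, or `None` if no usable limit
/// can be derived (for example when the period is `0`).
fn quota_cpus(quota_us: u64, period_us: u64) -> Option<usize> {
    // `checked_div` yields `None` for a zero divisor instead of panicking.
    let cpus = quota_us.checked_div(period_us)?;
    // Assumption for this sketch: never report fewer than one CPU.
    usize::try_from(cpus).ok().map(|n| n.max(1))
}
```

A caller can then fall back to another source of information when `None` is returned.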
Before this change, this panic could be seen in environments set up as described
above:
```
$ RUST_BACKTRACE=1 cargo test
Finished test [unoptimized + debuginfo] target(s) in 3.55s
Running unittests src/main.rs (target/debug/deps/x-9a42e145aca2934d)
thread 'main' panicked at 'attempt to divide by zero', library/std/src/sys/unix/thread.rs:546:70
stack backtrace:
0: rust_begin_unwind
1: core::panicking::panic_fmt
2: core::panicking::panic
3: std::sys::unix::thread::cgroups::quota
4: std::sys::unix::thread::available_parallelism
5: std::thread::available_parallelism
6: test::helpers::concurrency::get_concurrency
7: test::console::run_tests_console
8: test::test_main
9: test::test_main_static
10: x::main
at ./src/main.rs:1:1
11: core::ops::function::FnOnce::call_once
at /tmp/rust-1.64-1.64.0-1/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
error: test failed, to rerun pass '--bin local-rabmq-amqpprox'
```
I've tested this change in an environment which has the bad setup and
rebuilding the test executable against a fixed std library fixes the
panic.
Make `pointer::byte_offset_from` more generic
As suggested by https://github.com/rust-lang/rust/issues/96283#issuecomment-1288792955 (cc `@scottmcm`), make `pointer::byte_offset_from` work on pointers of different types. `byte_offset_from` really doesn't care about pointer types, so this is totally fine and, for example, allows patterns like this:
```rust
ptr::addr_of!(x.b).byte_offset_from(ptr::addr_of!(x))
```
The only possible downside is that this removes the `T` == `U` hint to inference, but I don't think this matters much. I don't think there are a lot of cases where you'd want to use `byte_offset_from` with a pointer of unbounded type (and in such cases you can just specify the type).
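A fuller version of the pattern above, as a sketch that assumes a toolchain where the generic `byte_offset_from` is available. The `Pair` struct and `offset_of_b` are hypothetical names used only for illustration.

```rust
use std::ptr;

/// `repr(C)` fixes the field order so the offset of `b` is predictable.
#[repr(C)]
struct Pair {
    a: u8,
    b: u32,
}

/// Byte offset of field `b` from the start of the struct.
fn offset_of_b(x: &Pair) -> isize {
    // SAFETY: both pointers are derived from the same live object `x`.
    unsafe { ptr::addr_of!(x.b).byte_offset_from(ptr::addr_of!(*x)) }
}
```

Here the receiver is a `*const u32` and the origin is a `*const Pair`, which is exactly the mixed-type case this change allows.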
`@rustbot` label +T-libs-api
Fix inconsistent rounding of 0.5 when formatted to 0 decimal places
As described in #70336, when displaying values to zero decimal places the value of 0.5 is rounded to 1, which is inconsistent with the display of other half-integer values which round to even.
From testing the flt2dec implementation, it looks like this comes down to the condition in the fixed-width Dragon implementation where an empty buffer is treated as a case to apply rounding up. I believe the change below fixes it and updates only the relevant tests.
Nevertheless I am aware this is very much a core piece of functionality, so please take a very careful look to make sure I haven't missed anything. I hope this change does not break anything in the wider ecosystem as having a consistent rounding behaviour in floating point formatting is in my opinion a useful feature to have.
Resolves #70336
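The intended behaviour with this change applied can be sketched as follows; `format_whole` is a hypothetical helper. All inputs used here are exactly representable, so each one is a genuine tie that should round to the even neighbour.

```rust
/// Formats `x` with zero decimal places.
fn format_whole(x: f64) -> String {
    format!("{:.0}", x)
}
```

With the fix, `format_whole(0.5)` yields `"0"`, consistent with `format_whole(2.5)` yielding `"2"` and `format_whole(1.5)` yielding `"2"`.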
interpret: support for per-byte provenance
Also factors the provenance map into its own module.
The third commit does the same for the init mask. I can move it in a separate PR if you prefer.
Fixes https://github.com/rust-lang/miri/issues/2181
r? `@oli-obk`
- bump simd compare to 32bytes
- import small slice compare code from memmem crate
- try a few different probe bytes to avoid degenerate cases
- but special-case 2-byte needles
Add `rustc_deny_explicit_impl`
Also adjust `E0322` error message to be more general, since it's used for `DiscriminantKind` and `Pointee` as well.
Also add `rustc_deny_explicit_impl` on the `Tuple` and `Destruct` marker traits.
Move most of unwind's build script to lib.rs
Only the android libunwind detection remains in the build script
* Reduces dependence on build scripts for building the standard library
* Reduces dependence on exact target names in favor of using semantic cfg(target_*) usage.
* Keeps almost all code related to linking of the unwinder in one file
Remove unused symbols and diagnostic items
As the title suggests, this removes unused symbols from `sym::` and `#[rustc_diagnostic_item]` annotations that weren't mentioned anywhere.
Originally I tried to use grep, to find symbols and item names that are never mentioned via `sym::name`, however this produced a lot of false positives (?), for example clippy matching on `Symbol::as_str` or macros "implicitly" adding `sym::`. I ended up fixing all these false positives (?) by hand, but tbh I'm not sure if it was worth it...
Update compiler-builtins
This was originally a part of https://github.com/rust-lang/rust/pull/100316. However, extracting it to a separate PR should help with any extra testing that might be needed.
Signed-off-by: Ayush Singh <ayushsingh1325@gmail.com>
Based on Wojciech Muła's "SIMD-friendly algorithms for substring searching"[0]
The two-way algorithm is Big-O efficient but it needs to preprocess the needle
to find a "critical factorization" of it. This additional work is significant
for short needles. Additionally it mostly advances needle.len() bytes at a time.
The SIMD-based approach used here on the other hand can advance based on its
vector width, which can exceed the needle length. Pathological cases are the
exception, but because the approach is limited to small needles the worst-case blowup is also small.
benchmarks taken on a Zen2:
```
16CGU, OLD:
test str::bench_contains_short_short ... bench: 27 ns/iter (+/- 1)
test str::bench_contains_short_long ... bench: 667 ns/iter (+/- 29)
test str::bench_contains_bad_naive ... bench: 131 ns/iter (+/- 2)
test str::bench_contains_bad_simd ... bench: 130 ns/iter (+/- 2)
test str::bench_contains_equal ... bench: 148 ns/iter (+/- 4)
16CGU, NEW:
test str::bench_contains_short_short ... bench: 8 ns/iter (+/- 0)
test str::bench_contains_short_long ... bench: 135 ns/iter (+/- 4)
test str::bench_contains_bad_naive ... bench: 130 ns/iter (+/- 2)
test str::bench_contains_bad_simd ... bench: 292 ns/iter (+/- 1)
test str::bench_contains_equal ... bench: 3 ns/iter (+/- 0)
1CGU, OLD:
test str::bench_contains_short_short ... bench: 30 ns/iter (+/- 0)
test str::bench_contains_short_long ... bench: 713 ns/iter (+/- 17)
test str::bench_contains_bad_naive ... bench: 131 ns/iter (+/- 3)
test str::bench_contains_bad_simd ... bench: 130 ns/iter (+/- 3)
test str::bench_contains_equal ... bench: 148 ns/iter (+/- 6)
1CGU, NEW:
test str::bench_contains_short_short ... bench: 10 ns/iter (+/- 0)
test str::bench_contains_short_long ... bench: 111 ns/iter (+/- 0)
test str::bench_contains_bad_naive ... bench: 135 ns/iter (+/- 3)
test str::bench_contains_bad_simd ... bench: 274 ns/iter (+/- 2)
test str::bench_contains_equal ... bench: 4 ns/iter (+/- 0)
```
[0] http://0x80.pl/articles/simd-strfind.html#sse-avx2
Fixed some `_i32` notation in `maybe_uninit`’s doc
This PR just changed two lines in the documentation for `MaybeUninit`:
```rs
let val = 0x12345678i32;
```
was changed to:
```rs
let val = 0x12345678_i32;
```
in two doctests, making the values a tad easier to read.
It does not seem like there are other literals needing this change in the file.
Stabilize const char convert
Split out `const_char_from_u32_unchecked` from `const_char_convert` and stabilize the rest, i.e. stabilize the following functions:
```Rust
impl char {
    pub const fn from_u32(i: u32) -> Option<char>;
    pub const fn from_digit(num: u32, radix: u32) -> Option<char>;
    pub const fn to_digit(self, radix: u32) -> Option<u32>;
}
// Available through core::char and std::char
mod char {
    pub const fn from_u32(i: u32) -> Option<char>;
    pub const fn from_digit(num: u32, radix: u32) -> Option<char>;
}
```
And put the following under the `const_char_from_u32_unchecked` const stability gate, as it needs `Option::unwrap`, which isn't const-stable (yet):
```Rust
impl char {
    pub const unsafe fn from_u32_unchecked(i: u32) -> char;
}
// Available through core::char and std::char
mod char {
    pub const unsafe fn from_u32_unchecked(i: u32) -> char;
}
```
cc the tracking issue #89259 (which I'd like to keep open for `const_char_from_u32_unchecked`).
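As a sketch of what the stabilization permits, assuming a toolchain that includes it, the conversions can be evaluated in constant context. The constant names are hypothetical.

```rust
// Each initializer is evaluated at compile time.
const LETTER_A: Option<char> = char::from_u32(0x41);
// 0xD800 is a surrogate code point, so this is `None`.
const INVALID: Option<char> = char::from_u32(0xD800);
const HEX_F: Option<char> = char::from_digit(15, 16);
const FIFTEEN: Option<u32> = 'f'.to_digit(16);
```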
Move `unix_socket_abstract` feature API to `SocketAddrExt`.
The pre-stabilized API for abstract socket addresses exposes methods on `SocketAddr` that are only enabled for `cfg(any(target_os = "android", target_os = "linux"))`. Per discussion in <https://github.com/rust-lang/rust/issues/85410>, moving these methods to an OS-specific extension trait is required before stabilization can be considered.
This PR makes four changes:
1. The internal module `std::os::net` contains logic for the unstable feature `tcp_quickack` (https://github.com/rust-lang/rust/issues/96256). I moved that code into `linux_ext/tcp.rs` and tried to adjust the module tree so it could accommodate a second unstable feature there.
2. Moves the public API out of `impl SocketAddr`, into `impl SocketAddrExt for SocketAddr` (the headline change).
3. The existing function names and docs for `unix_socket_abstract` refer to addresses as being created from abstract namespaces, but a more accurate description is that they create sockets in *the* abstract namespace. I adjusted the function signatures correspondingly and tried to update the docs to be clearer.
4. I also tweaked `from_abstract_name` so it takes an `AsRef<[u8]>` instead of `&[u8]`, allowing `b""` literals to be passed directly.
Issues:
1. The public module `std::os::linux::net` is marked as part of `tcp_quickack`. I couldn't figure out how to mark a module as being part of two unstable features, so I just left the existing attributes in place. My hope is that this will be fixed as a side-effect of stabilizing either feature.
Rollup of 9 pull requests
Successful merges:
- #103709 (ci: Upgrade dist-x86_64-netbsd to NetBSD 9.0)
- #103744 (Upgrade cc for working is_flag_supported on cross-compiles)
- #104105 (llvm: dwo only emitted when object code emitted)
- #104158 (Return .efi extension for EFI executable)
- #104181 (Add a few known-bug tests)
- #104266 (Regression test for coercion of mut-ref to dyn-star)
- #104300 (Document `Path::parent` behavior around relative paths)
- #104304 (Enable profiler in dist-s390x-linux)
- #104362 (Add `delay_span_bug` to `AttrWrapper::take_for_recovery`)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Document `Path::parent` behavior around relative paths
A relative path with just one component will return `Some("")` as its parent, which wasn't clear to me from the documentation.
The parent of `""` is `None`, which was missing from the documentation as well.
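The two cases described above, as a short sketch; `parent_of` is a hypothetical helper around `Path::parent`.

```rust
use std::path::Path;

/// Returns the parent of the path spelled by `p`, if any.
fn parent_of(p: &str) -> Option<&Path> {
    Path::new(p).parent()
}
```

A single relative component such as `"foo"` has the empty path as its parent, and the empty path itself has no parent.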
Change the way libunwind is linked for *-windows-gnullvm targets
I have no idea why the previous way works for `x86_64-fortanix-unknown-sgx` (assuming it actually works...) but not for `gnullvm`. It fails when linking libtest during the Rust build (unless somebody adds `RUSTFLAGS='-Clinkarg=-lunwind'`).
Also fixes exception handling on AArch64.
Use `derive_const` and rm manual StructuralEq impl
This does not change any semantics of the impl except for the const stability. It should be fine because trait methods and const bounds can never be used in stable without enabling `const_trait_impl`.
cc `@oli-obk`
Add small clarification around using pointers derived from references
r? `@RalfJung`
One question about your example from https://github.com/rust-lang/libs-team/issues/122: at what point does UB arise? If writing 0 does not cause UB and the reference `x` is never read or written to (explicitly or implicitly by being wrapped in another data structure) after the call to `foo`, does UB only arise when dropping the value? I don't really get that since I thought references were always supposed to point to valid data?
```rust
use std::num::NonZeroI32;

fn foo(x: &mut NonZeroI32) {
    let ptr = x as *mut NonZeroI32;
    unsafe { ptr.cast::<i32>().write(0); } // no UB here
    // What now? x is considered garbage when?
}
```
Merge crossbeam-channel into `std::sync::mpsc`
This PR imports the [`crossbeam-channel`](https://github.com/crossbeam-rs/crossbeam/tree/master/crossbeam-channel#crossbeam-channel) crate into the standard library as a private module, `sync::mpmc`. `sync::mpsc` is now implemented as a thin wrapper around `sync::mpmc`. The primary purpose of this PR is to resolve https://github.com/rust-lang/rust/issues/39364. The public API intentionally remains the same.
The reason https://github.com/rust-lang/rust/issues/39364 has not been fixed in over 5 years is that the current channel is *incredibly* complex. It was written many years ago and has sat mostly untouched since. `crossbeam-channel` has become the most popular alternative on crates.io, amassing over 30 million downloads. While crossbeam's channel is also complex, like all fast concurrent data structures, it avoids some of the major issues with the current implementation around dynamic flavor upgrades. The new implementation decides on the data structure to be used when the channel is created, and the channel retains that structure until it is dropped.
Replacing `sync::mpsc` with a simpler, less performant implementation has been discussed as an alternative. However, Rust touts itself as enabling *fearless concurrency*, and having the standard library feature a subpar implementation of a core concurrency primitive doesn't feel right. The argument is that slower is better than broken, but this PR shows that we can do better.
As mentioned before, the primary purpose of this PR is to fix https://github.com/rust-lang/rust/issues/39364, and so the public API intentionally remains the same. *After* that problem is fixed, the fact that `sync::mpmc` now exists makes it easier to fix the primary limitation of `mpsc`, the fact that it only supports a single consumer. spmc and mpmc are two other common concurrency patterns, and this change enables a path to deprecating `mpsc` and exposing a general `sync::channel` module that supports multiple consumers. It also implements other useful methods such as `send_timeout`. That said, exposing MPMC and other new functionality is mostly out of scope for this PR, and it would be helpful if discussion stays on topic :)
For what it's worth, the new implementation has also been shown to be more performant in [some basic benchmarks](https://github.com/crossbeam-rs/crossbeam/tree/master/crossbeam-channel/benchmarks#results).
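Since the public API is meant to stay the same, existing multi-producer code like the sketch below should keep working unchanged. The function name `sum_from_producers` and its parameters are made up for this example.

```rust
use std::sync::mpsc;
use std::thread;

/// Illustrative only: several producer threads send into one channel and a
/// single consumer sums everything it receives.
fn sum_from_producers(producers: u32, per_producer: u32) -> u32 {
    let (tx, rx) = mpsc::channel();

    let handles: Vec<_> = (0..producers)
        .map(|_| {
            // Each producer gets its own clone of the sender.
            let tx = tx.clone();
            thread::spawn(move || {
                for value in 1..=per_producer {
                    tx.send(value).expect("receiver is still alive");
                }
            })
        })
        .collect();

    // Drop the original sender so the receiver sees the channel close once
    // every producer has finished.
    drop(tx);

    let total: u32 = rx.iter().sum();
    for handle in handles {
        handle.join().expect("producer thread panicked");
    }
    total
}
```

Only the single-consumer pattern is shown, since exposing multiple consumers is out of scope here.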
cc `@taiki-e`
r? rust-lang/libs
Improve performance of `rem_euclid()` for signed integers
This code is copied from
https://github.com/rust-lang/rust/blob/master/library/std/src/f32.rs and
https://github.com/rust-lang/rust/blob/master/library/std/src/f64.rs
Using `r+rhs.abs()` is faster than calculating it with an if clause. Bench result:
```
$ cargo bench
Compiling div-euclid v0.1.0 (/me/div-euclid)
Finished bench [optimized] target(s) in 1.01s
Running unittests src/lib.rs (target/release/deps/div_euclid-7a4530ca7817d1ef)
running 7 tests
test tests::it_works ... ignored
test tests::bench_aaabs ... bench: 10,498,793 ns/iter (+/- 104,360)
test tests::bench_aadefault ... bench: 11,061,862 ns/iter (+/- 94,107)
test tests::bench_abs ... bench: 10,477,193 ns/iter (+/- 81,942)
test tests::bench_default ... bench: 10,622,983 ns/iter (+/- 25,119)
test tests::bench_zzabs ... bench: 10,481,971 ns/iter (+/- 43,787)
test tests::bench_zzdefault ... bench: 11,074,976 ns/iter (+/- 29,633)
test result: ok. 0 passed; 0 failed; 1 ignored; 6 measured; 0 filtered out; finished in 19.35s
```
It seems that the default `rem_euclid` benefits from branch prediction, so `bench_default` is faster than `bench_aadefault` and `bench_zzdefault`, which shuffle the order of calculations. But all of them are slower than the approach used in `f64`'s and `f32`'s `rem_euclid`, hence this PR.
bench code:
```rust
#![feature(test)]
extern crate test;
fn rem_euclid(a:i32,rhs:i32)->i32{
let r = a % rhs;
if r < 0 { r + rhs.abs() } else { r }
}
#[cfg(test)]
mod tests {
use super::*;
use test::Bencher;
use rand::prelude::*;
use rand::rngs::SmallRng;
const N:i32=1000;
#[test]
fn it_works() {
let a: i32 = 7; // or any other integer type
let b = 4;
let d:Vec<i32>=(-N..=N).collect();
let n:Vec<i32>=(-N..0).chain(1..=N).collect();
for i in &d {
for j in &n {
assert_eq!(i.rem_euclid(*j),rem_euclid(*i,*j));
}
}
assert_eq!(rem_euclid(a,b), 3);
assert_eq!(rem_euclid(-a,b), 1);
assert_eq!(rem_euclid(a,-b), 3);
assert_eq!(rem_euclid(-a,-b), 1);
}
#[bench]
fn bench_aaabs(b: &mut Bencher) {
let mut d:Vec<i32>=(-N..=N).collect();
let mut n:Vec<i32>=(-N..0).chain(1..=N).collect();
let mut rng=SmallRng::from_seed([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,21]);
n.shuffle(&mut rng);
d.shuffle(&mut rng);
n.shuffle(&mut rng);
b.iter(||{
let mut res=0;
for i in &d {
for j in &n {
res+=rem_euclid(*i,*j);
}
}
res
});
}
#[bench]
fn bench_aadefault(b: &mut Bencher) {
let mut d:Vec<i32>=(-N..=N).collect();
let mut n:Vec<i32>=(-N..0).chain(1..=N).collect();
let mut rng=SmallRng::from_seed([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,21]);
n.shuffle(&mut rng);
d.shuffle(&mut rng);
n.shuffle(&mut rng);
b.iter(||{
let mut res=0;
for i in &d {
for j in &n {
res+=i.rem_euclid(*j);
}
}
res
});
}
#[bench]
fn bench_abs(b: &mut Bencher) {
let d:Vec<i32>=(-N..=N).collect();
let n:Vec<i32>=(-N..0).chain(1..=N).collect();
b.iter(||{
let mut res=0;
for i in &d {
for j in &n {
res+=rem_euclid(*i,*j);
}
}
res
});
}
#[bench]
fn bench_default(b: &mut Bencher) {
let d:Vec<i32>=(-N..=N).collect();
let n:Vec<i32>=(-N..0).chain(1..=N).collect();
b.iter(||{
let mut res=0;
for i in &d {
for j in &n {
res+=i.rem_euclid(*j);
}
}
res
});
}
#[bench]
fn bench_zzabs(b: &mut Bencher) {
let mut d:Vec<i32>=(-N..=N).collect();
let mut n:Vec<i32>=(-N..0).chain(1..=N).collect();
let mut rng=SmallRng::from_seed([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,21]);
d.shuffle(&mut rng);
n.shuffle(&mut rng);
d.shuffle(&mut rng);
b.iter(||{
let mut res=0;
for i in &d {
for j in &n {
res+=rem_euclid(*i,*j);
}
}
res
});
}
#[bench]
fn bench_zzdefault(b: &mut Bencher) {
let mut d:Vec<i32>=(-N..=N).collect();
let mut n:Vec<i32>=(-N..0).chain(1..=N).collect();
let mut rng=SmallRng::from_seed([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,21]);
d.shuffle(&mut rng);
n.shuffle(&mut rng);
d.shuffle(&mut rng);
b.iter(||{
let mut res=0;
for i in &d {
for j in &n {
res+=i.rem_euclid(*j);
}
}
res
});
}
}
```
Add the `#[derive_const]` attribute
Closes #102371. This is a minimal patchset for the attribute to work. There are no restrictions on what traits this attribute applies to.
r? `@oli-obk`
Remove lock wrappers in `sys_common`
This moves the lazy allocation to `sys` (SGX and UNIX). While this leads to a bit more verbosity, it will simplify future improvements by making room in `sys_common` for platform-independent implementations.
This also removes the condvar check on SGX as it is not necessary for soundness and will be removed anyway once mutex has been made movable.
For simplicity's sake, `libunwind` also uses lazy allocation now on SGX. This will require an update to the C definitions before merging this (CC `@raoulstrackx`).
r? `@m-ou-se`
Make `Hash`, `Hasher` and `BuildHasher` `#[const_trait]` and make `Sip` const `Hasher`
This PR enables using Hashes in const context.
r? ``@fee1-dead``
This patch allows the usage of the `track_caller` annotation on
generators, as well as sets them conditionally if the parent also has
`track_caller` set.
Also add this annotation on the `GenFuture`'s `poll()` function.
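For readers unfamiliar with the attribute, here is a rough sketch of the propagation on plain functions, which is the behaviour this patch extends to generators. The function names are hypothetical, and the generator case itself is not shown.

```rust
use std::panic::Location;

/// Reports the line of whoever called this function.
#[track_caller]
fn caller_line() -> u32 {
    Location::caller().line()
}

/// Because this wrapper is also `#[track_caller]`, the location passes
/// through it: the reported line belongs to the wrapper's own caller.
#[track_caller]
fn forwarded_line() -> u32 {
    caller_line()
}
```

Without the attribute on `forwarded_line`, the reported line would be the one inside the wrapper body instead.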
Add support for custom mir
This implements rust-lang/compiler-team#564. Details about the design, motivation, etc. can be found in there.
r? `@oli-obk`
Add context to compiler error message
Changed `creates a temporary which is freed while still in use` to `creates a temporary value which is freed while still in use`.
Const Compare for Tuples
Makes the impls for Tuples of ~const `PartialEq` types also `PartialEq`, impls for Tuples of ~const `PartialOrd` types also `PartialOrd`, and impls for Tuples of ~const `Ord` types also `Ord`, all behind the `#![feature(const_cmp)]` gate.
~~Do not merge before #104113 is merged because I want to use this feature to clean up the new test that I added there.~~
r? ``@fee1-dead``
Added documentation for IPv6 Addresses `IN6ADDR_ANY_INIT` also known as
`in6addr_any` and `IN6ADDR_LOOPBACK_INIT` also known as
`in6addr_loopback` similar to `INADDR_ANY` for IPv4 Addresses.
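A quick sketch of how the two addresses look from Rust, assuming the documented items are the `Ipv6Addr::UNSPECIFIED` and `Ipv6Addr::LOCALHOST` associated constants. The `describe` helper is made up for this example.

```rust
use std::net::Ipv6Addr;

/// Hypothetical helper that names the two special addresses.
fn describe(addr: Ipv6Addr) -> &'static str {
    if addr == Ipv6Addr::UNSPECIFIED {
        // `::`, the counterpart of C's `in6addr_any`
        "unspecified"
    } else if addr == Ipv6Addr::LOCALHOST {
        // `::1`, the counterpart of C's `in6addr_loopback`
        "loopback"
    } else {
        "other"
    }
}
```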
Clarify licensing situation of MPSC and SPSC queue
Originally, these two files were licensed under the `BSD-2-Clause` license, as they were based off sample code on a blog licensing those snippets under that license:
* `library/std/src/sync/mpsc/mpsc_queue.rs`
* `library/std/src/sync/mpsc/spsc_queue.rs`
In 2017 though, the author of that blog agreed to relicense their code under the standard `MIT OR Apache-2.0` license in https://github.com/rust-lang/rust/pull/42149. This PR clarifies the situation in the files by expanding the comment at the top of the file.
r? ``@pnkfelix``
Fix `const_fn_trait_ref_impl`, add test for it
#99943 broke `#[feature(const_fn_trait_ref_impl)]`, this PR fixes this and adds a test for it.
r? ````@fee1-dead````
run alloc benchmarks in Miri and fix UB
Miri recently gained a "fake monotonic clock" that works even with isolation. Its measurements are not very meaningful but it means we can run these benches and check them for UB.
And that's a good thing since there was UB here: fixes https://github.com/rust-lang/rust/issues/104096.
r? ``@thomcc``
disable btree size tests on Miri
Seems fine not to run these in Miri, they can't have UB anyway. And this lets us do layout randomization in Miri.
r? ``@thomcc``
Specialize `iter::ArrayChunks::fold` for TrustedRandomAccess iterators
```
OLD:
test iter::bench_trusted_random_access_chunks ... bench: 368 ns/iter (+/- 4)
NEW:
test iter::bench_trusted_random_access_chunks ... bench: 30 ns/iter (+/- 0)
```
The resulting assembly is similar to #103166 but the specialization kicks in under different (partially overlapping) conditions compared to that PR. They're complementary.
In principle a TRA-based specialization could be applied to all `ArrayChunks` methods, including `next()` as we do for `Zip` but that would have all the same hazards as the Zip specialization. Only doing it for `fold` is far less hazardous. The downside is that it only helps with internal, exhaustive iteration. I.e. `for _ in` or `try_fold` will not benefit.
Note that the regular, `try_fold`-based and the specialized `fold()` impl have observably slightly different behavior. Namely the specialized variant does not fetch the remainder elements from the underlying iterator. We do have a few other places in the standard library where beyond-the-end-of-iteration side-effects are being elided under some circumstances but not others.
Inspired by https://old.reddit.com/r/rust/comments/yaft60/zerocost_iterator_abstractionsnot_so_zerocost/
The type is unsafe and now exposed to the whole crate.
Document it properly and add an unsafe method so the
caller can make it visible that something unsafe is happening.
Implement `std::marker::Tuple`, use it in `extern "rust-call"` and `Fn`-family traits
Implements rust-lang/compiler-team#537
I made a few opinionated decisions in this implementation, specifically:
1. Enforcing `extern "rust-call"` on fn items during wfcheck,
2. Enforcing this for all functions (not just ones that have bodies),
3. Gating this `Tuple` marker trait behind its own feature, instead of grouping it into (e.g.) `unboxed_closures`.
Still needing to be done:
1. Enforce that `extern "rust-call"` `fn`-ptrs are well-formed only if they have 1/2 args and the second one implements `Tuple`. (Doing this would fix ICE in #66696.)
2. Deny all explicit/user `impl`s of the `Tuple` trait, kinda like `Sized`.
3. Fixing `Tuple` trait built-in impl for chalk, so that chalkification tests are un-broken.
Open questions:
1. Does this need t-lang or t-libs signoff?
Fixes #99820
fix a comment in UnsafeCell::new
There are several safe methods that access the inner value: `into_inner` has existed since forever and `get_mut` was added more recently. So this comment seems just wrong. But `&self` methods return raw pointers and thus require unsafe code (though the methods themselves are still safe).
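To make the distinction concrete, a minimal sketch (the function name `demo` is arbitrary):

```rust
use std::cell::UnsafeCell;

/// Touches the inner value three ways: two fully safe, one via a raw pointer.
fn demo() -> (i32, i32) {
    let mut cell = UnsafeCell::new(1);

    // Safe: `&mut self` already proves exclusive access.
    *cell.get_mut() += 1;

    // `get` is a safe `&self` method, but it only hands out a raw pointer.
    let ptr: *mut i32 = cell.get();
    // SAFETY: no other reference to the contents exists at this point.
    let via_ptr = unsafe {
        *ptr += 1;
        *ptr
    };

    // Safe: consumes the cell and returns the value.
    let inner = cell.into_inner();
    (via_ptr, inner)
}
```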
libtest: run all tests in their own thread, if supported by the host
This reverts the threading changes of https://github.com/rust-lang/rust/pull/56243, which made it so that with `-j1`, the test harness does not spawn any threads. Those changes were done to enable Miri to run the test harness, but Miri supports threads nowadays, so this is no longer needed. Using a thread for each test is useful because the thread's name can be set to the test's name which makes panic messages consistent between `-j1` and `-j2` runs and also a bit more readable.
I did not revert the HashMap changes of https://github.com/rust-lang/rust/pull/56243; using a deterministic map seems fine for the test harness and the more deterministic testing is the better.
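As a rough illustration of why a per-test thread helps, the sketch below spawns a named thread and reads the name back, which is the same name a panic message would print. This is not libtest's actual code, and `run_named` is a hypothetical helper.

```rust
use std::thread;

/// Spawns a thread named after the test and returns the name that the
/// thread itself observes.
fn run_named(test_name: &str) -> Option<String> {
    thread::Builder::new()
        .name(test_name.to_string())
        .spawn(|| thread::current().name().map(str::to_owned))
        .expect("failed to spawn thread")
        .join()
        .expect("thread panicked")
}
```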
Fixes https://github.com/rust-lang/rust/issues/59122
Fixes https://github.com/rust-lang/rust/issues/70492
benchmark result:
```
$ cargo bench
Compiling div-euclid v0.1.0 (/me/div-euclid)
Finished bench [optimized] target(s) in 1.01s
Running unittests src/lib.rs (target/release/deps/div_euclid-7a4530ca7817d1ef)
running 7 tests
test tests::it_works ... ignored
test tests::bench_aaabs ... bench: 10,498,793 ns/iter (+/- 104,360)
test tests::bench_aadefault ... bench: 11,061,862 ns/iter (+/- 94,107)
test tests::bench_abs ... bench: 10,477,193 ns/iter (+/- 81,942)
test tests::bench_default ... bench: 10,622,983 ns/iter (+/- 25,119)
test tests::bench_zzabs ... bench: 10,481,971 ns/iter (+/- 43,787)
test tests::bench_zzdefault ... bench: 11,074,976 ns/iter (+/- 29,633)
test result: ok. 0 passed; 0 failed; 1 ignored; 6 measured; 0 filtered out; finished in 19.35s
```
benchmark code:
```rust
#![feature(test)]
extern crate test;
#[inline(always)]
fn rem_euclid(a:i32,rhs:i32)->i32{
let r = a % rhs;
if r < 0 {
// if rhs is `integer::MIN`, rhs.wrapping_abs() == rhs,
// thus r.wrapping_add(rhs.wrapping_abs()) == r.wrapping_add(rhs) == r - rhs,
// which suits our need.
// otherwise, rhs.wrapping_abs() == rhs.abs(), and adding it won't overflow since r is negative.
r.wrapping_add(rhs.wrapping_abs())
} else {
r
}
}
#[cfg(test)]
mod tests {
use super::*;
use test::Bencher;
use rand::prelude::*;
use rand::rngs::SmallRng;
const N:i32=1000;
#[test]
fn it_works() {
let a: i32 = 7; // or any other integer type
let b = 4;
let d:Vec<i32>=(-N..=N).collect();
let n:Vec<i32>=(-N..0).chain(1..=N).collect();
for i in &d {
for j in &n {
assert_eq!(i.rem_euclid(*j),rem_euclid(*i,*j));
}
}
assert_eq!(rem_euclid(a,b), 3);
assert_eq!(rem_euclid(-a,b), 1);
assert_eq!(rem_euclid(a,-b), 3);
assert_eq!(rem_euclid(-a,-b), 1);
}
#[bench]
fn bench_aaabs(b: &mut Bencher) {
let mut d:Vec<i32>=(-N..=N).collect();
let mut n:Vec<i32>=(-N..0).chain(1..=N).collect();
let mut rng=SmallRng::from_seed([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,21]);
n.shuffle(&mut rng);
d.shuffle(&mut rng);
n.shuffle(&mut rng);
b.iter(||{
let mut res=0;
for i in &d {
for j in &n {
res+=rem_euclid(*i,*j);
}
}
res
});
}
#[bench]
fn bench_aadefault(b: &mut Bencher) {
let mut d:Vec<i32>=(-N..=N).collect();
let mut n:Vec<i32>=(-N..0).chain(1..=N).collect();
let mut rng=SmallRng::from_seed([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,21]);
n.shuffle(&mut rng);
d.shuffle(&mut rng);
n.shuffle(&mut rng);
b.iter(||{
let mut res=0;
for i in &d {
for j in &n {
res+=i.rem_euclid(*j);
}
}
res
});
}
#[bench]
fn bench_abs(b: &mut Bencher) {
let d:Vec<i32>=(-N..=N).collect();
let n:Vec<i32>=(-N..0).chain(1..=N).collect();
b.iter(||{
let mut res=0;
for i in &d {
for j in &n {
res+=rem_euclid(*i,*j);
}
}
res
});
}
#[bench]
fn bench_default(b: &mut Bencher) {
let d:Vec<i32>=(-N..=N).collect();
let n:Vec<i32>=(-N..0).chain(1..=N).collect();
b.iter(||{
let mut res=0;
for i in &d {
for j in &n {
res+=i.rem_euclid(*j);
}
}
res
});
}
#[bench]
fn bench_zzabs(b: &mut Bencher) {
let mut d:Vec<i32>=(-N..=N).collect();
let mut n:Vec<i32>=(-N..0).chain(1..=N).collect();
let mut rng=SmallRng::from_seed([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,21]);
d.shuffle(&mut rng);
n.shuffle(&mut rng);
d.shuffle(&mut rng);
b.iter(||{
let mut res=0;
for i in &d {
for j in &n {
res+=rem_euclid(*i,*j);
}
}
res
});
}
#[bench]
fn bench_zzdefault(b: &mut Bencher) {
let mut d:Vec<i32>=(-N..=N).collect();
let mut n:Vec<i32>=(-N..0).chain(1..=N).collect();
let mut rng=SmallRng::from_seed([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,21]);
d.shuffle(&mut rng);
n.shuffle(&mut rng);
d.shuffle(&mut rng);
b.iter(||{
let mut res=0;
for i in &d {
for j in &n {
res+=i.rem_euclid(*j);
}
}
res
});
}
}
```
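The `MIN` edge case that motivates the wrapping operations can be checked in isolation. This is a condensed restatement of the function above under the hypothetical name `rem_euclid_wrapping`, not the final library code.

```rust
/// Same logic as the benchmarked `rem_euclid`, written for `i32` only.
fn rem_euclid_wrapping(a: i32, rhs: i32) -> i32 {
    let r = a % rhs;
    if r < 0 {
        // For `rhs == i32::MIN`, `wrapping_abs` returns `i32::MIN` itself,
        // and the wrapping add then yields `r - i32::MIN`, which fits in i32
        // because `r` is negative.
        r.wrapping_add(rhs.wrapping_abs())
    } else {
        r
    }
}
```

A plain `rhs.abs()` would overflow for `i32::MIN`, which is why `checked_abs` returns `None` there.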
After rust-lang/rust#101946 this completes the move to cfg-if 1.0 by:
* Updating getrandom 0.1.14->0.1.16
* Updating panic_abort, panic_unwind, and unwind to cfg-if 1.0
Rewrite implementation of `#[alloc_error_handler]`
The new implementation doesn't use weak lang items and instead changes `#[alloc_error_handler]` to an attribute macro just like `#[global_allocator]`.
The attribute will generate the `__rg_oom` function which is called by the compiler-generated `__rust_alloc_error_handler`. If no `__rg_oom` function is defined in any crate then the compiler shim will call `__rdl_oom` in the alloc crate which will simply panic.
This also fixes link errors with `-C link-dead-code` with `default_alloc_error_handler`: `__rg_oom` was previously defined in the alloc crate and would attempt to reference the `oom` lang item, even if it didn't exist. This worked as long as `__rg_oom` was excluded from linking since it was not called.
This is a prerequisite for the stabilization of `default_alloc_error_handler` (#102318).
Include both benchmarks and tests in the numbers given to `TeFiltered{,Out}`
Fixes #103794
`#[bench]` is broken on nightly without this, sadly. It apparently has no test coverage. In addition to manually testing, I've added a run-make smokecheck for this (which would have caught the issue), but it would be nice to have a better way to test, err, libtest. For now we should get this in ASAP IMO
Do fewer passes and generally be more efficient when filtering tests
Follow-on of the work I started with this PR: https://github.com/rust-lang/rust/pull/99939
Basically, the startup code for libtest is really inefficient, but that's not usually a problem because it is distributed in release and workloads are small. But under Miri which can be 100x slower than a debug build, these inefficiencies explode.
Most of the diff here is making test filtering single-pass. There are a few other small optimizations as well, but they are more straightforward.
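To give a feel for what "single-pass" means here, a simplified sketch follows. The `TestCase` and `Filter` types and the `filter_tests` function are hypothetical stand-ins, not libtest's real data structures.

```rust
/// Hypothetical, stripped-down test description.
struct TestCase {
    name: String,
    ignored: bool,
}

/// Hypothetical filter options, loosely modelled on the CLI flags.
struct Filter {
    substring: Option<String>,
    skip: Vec<String>,
    include_ignored: bool,
}

/// Applies every criterion in one traversal instead of building an
/// intermediate list per criterion.
fn filter_tests(tests: Vec<TestCase>, filter: &Filter) -> Vec<TestCase> {
    tests
        .into_iter()
        .filter(|t| {
            let matches = filter
                .substring
                .as_deref()
                .map_or(true, |s| t.name.contains(s));
            let skipped = filter.skip.iter().any(|s| t.name.contains(s.as_str()));
            let runnable = filter.include_ignored || !t.ignored;
            matches && !skipped && runnable
        })
        .collect()
}
```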
With this PR, the startup time of the `iced` tests with `--features=code_asm,mvex` drops from 17 to 2 minutes (I think Miri has gotten slower under this workload since #99939). The easiest way to try this out is to set `MIRI_LIB_SRC` to a checkout of this branch when running `cargo +nightly miri test --features=code_asm,mvex`.
r? `@thomcc`
Prevent foreign Rust exceptions from being caught
Fix #102715
Use the address of a static variable (which is guaranteed to be unique per copy of std) to tell apart if a Rust exception comes from local or foreign Rust code, and abort for the latter.
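The idea can be sketched in a few lines. Everything below (`CANARY`, `Exception`, `raise`, `is_local`) is a hypothetical simplification; the real exception layout and the abort path are not shown.

```rust
/// One copy of this static exists per copy of the code, so its address can
/// serve as an identity tag.
static CANARY: u8 = 0;

/// Hypothetical exception object carrying the tag of whoever created it.
struct Exception {
    canary: *const u8,
    payload: &'static str,
}

/// Creates an exception tagged with this copy's canary address.
fn raise(payload: &'static str) -> Exception {
    Exception { canary: &CANARY, payload }
}

/// An exception is "local" only if its tag points at our own static.
fn is_local(ex: &Exception) -> bool {
    std::ptr::eq(ex.canary, &CANARY)
}
```

An exception whose tag points anywhere else is treated as foreign, which is the case the fix aborts on.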
The signature for new was
```
fn new<F>(f: F) -> Lazy<T, F>
```
Notably, with `F` unconstrained, `T` can be literally anything, and just
`let _ = Lazy::new(|| 92)` would not typecheck.
This historically was a necessity -- `new` is a `const` function, it
couldn't have any bounds. Today though, we can move `new` under the `F:
FnOnce() -> T` bound, which gives the compiler enough data to infer the
type of T from closure.
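A stripped-down sketch of the effect, using a toy `Lazy` built on `OnceCell` rather than the real type. The field names and the `force` method are assumptions made for this example.

```rust
use std::cell::{Cell, OnceCell};

/// Toy stand-in for the real type; not the standard library implementation.
struct Lazy<T, F> {
    value: OnceCell<T>,
    init: Cell<Option<F>>,
}

// Placing `new` under the `F: FnOnce() -> T` bound ties `T` to the
// closure's return type, so the compiler can infer it.
impl<T, F: FnOnce() -> T> Lazy<T, F> {
    const fn new(f: F) -> Self {
        Lazy { value: OnceCell::new(), init: Cell::new(Some(f)) }
    }

    /// Runs the initializer on first access and caches the result.
    fn force(&self) -> &T {
        self.value.get_or_init(|| {
            let f = self.init.take().expect("initializer already consumed");
            f()
        })
    }
}
```

With this bound in place, `let lazy = Lazy::new(|| 92);` typechecks without any annotation.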
poll_fn and Unpin: fix pinning
See [IRLO](https://internals.rust-lang.org/t/surprising-soundness-trouble-around-pollfn/17484) for details: currently `poll_fn` is very subtle to use, since it does not pin the closure, so creating a pinned reference with `Pin::new_unchecked(&mut capture)` inside the closure is unsound. This leads to actual miscompilations with `futures::join!`.
IMO the proper fix is to pin the closure when the future is pinned, which is achieved by changing the `Unpin` implementation. This is a breaking change though. 1.64.0 was *just* released, so maybe this is still okay?
The alternative would be to add some strong comments to the docs saying that closure captures are *not pinned* and doing `Pin::new_unchecked` on them is unsound.
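For comparison, here is one pattern that stays sound whichever way this is resolved: pin the inner future outside the closure and capture the resulting `Pin<&mut _>`, so no unsafe pinning of captures is needed. This is only a sketch; `NoopWake` and `poll_wrapped_once` are names invented for the example.

```rust
use std::future::{poll_fn, Future};
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Minimal waker that does nothing, just enough to poll by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Wraps an async block in `poll_fn` and polls the wrapper once.
fn poll_wrapped_once() -> Poll<i32> {
    // Pin the inner future on the stack, outside the closure.
    let mut inner = pin!(async { 41 + 1 });

    // The closure captures a `Pin<&mut _>`; no `unsafe` is involved.
    let mut outer = pin!(poll_fn(move |cx| inner.as_mut().poll(cx)));

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    outer.as_mut().poll(&mut cx)
}
```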