This change was prompted by the stage1 compiler spending 4% of its time
when compiling the polymorphic-recursion MIR opt test in `unlikely`.
Intrinsic fallback bodies like `unlikely` should always be inlined, it's
very silly if they are not. To do this, we enable the fallback bodies to
be cross-crate inlineable. Not that this matters for our workloads since
the compiler never actually _uses_ the "fallback bodies", it just uses
whatever was cfg(bootstrap)ped, so I've also added `#[inline]` to those.
The current complexities in `assert_unsafe_precondition` are delicately
balancing several concerns, among them compile times for the cases where
there are no debug assertions. This comes at a large runtime cost when
the assertions are enabled, making the debug assertion compiler a lot
slower, which is very annoying.
To avoid this, we always inline the check when building with debug
assertions.
Numbers (compiling stage1 library after touching core):
- master: 80s
- just adding `#[inline(always)]` to the `cfg(bootstrap)`
`debug_assertions`: 67s
- this: 54s
So this seems like a good solution. I think we can still get
the same run-time perf improvements for other users too by
massaging this code further (see my other PR about adding
`#[rustc_no_mir_inline]`) but this is a simpler step that
solves the imminent problem of "holy shit my rustc is sooo slow".
Funny consequence: This now means compiling the standard library with
dbeug assertions makes it faster (than without, when using debug
assertions downstream)!
Store core::str::CharSearcher::utf8_size as u8
This is already relied on being smaller than u8 due to the `safety invariant: utf8_size must be less than 5`, so this helps LLVM optimize and maybe improve copies due to padding instead of unused bytes.
Specialize some methods of `io::Chain`
This PR specializes the implementation of some methods of `io::Chain`, which could bring performance improvements when using it.
Portable SIMD subtree update
Syncs nightly to the latest changes from rust-lang/portable-simd
r? `@rust-lang/libs`
Also, fixes#119904 which is now fixed upstream.
Reduce monomorphisation bloat in small_c_string
This is a code path usually next to an FFI call, so taking the `dyn` slowdown for the 1159 llvm-line (fat lto, codegen-units 1, release build) drop in my testing program [t2fanrd](https://github.com/GnomedDev/t2fanrd) is worth it imo.
Remove unnecessary unit binding
It appears that the unit binding is not necessary at this time. However, I am unsure of its importance in the past. Please let me know if it is unsafe to remove.
Move `OsStr::slice_encoded_bytes` validation to platform modules
This delegates OS string slicing (`OsStr::slice_encoded_bytes`) validation to the underlying platform implementation. For now that results in increased performance and better error messages on Windows without any changes to semantics. In the future we may want to provide different semantics for different platforms.
The existing implementation is still used on Unix and most other platforms and is now optimized a little better.
Tracking issue: https://github.com/rust-lang/rust/issues/118485
cc `@epage,` `@BurntSushi`
Tracking import use types for more accurate redundant import checking
fixes#117448
By tracking import use types to check whether it is scope uses or the other situations like module-relative uses, we can do more accurate redundant import checking.
For example unnecessary imports in std::prelude that can be eliminated:
```rust
use std::option::Option::Some;//~ WARNING the item `Some` is imported redundantly
use std::option::Option::None; //~ WARNING the item `None` is imported redundantly
```
fixes#117448
For example unnecessary imports in std::prelude that can be eliminated:
```rust
use std::option::Option::Some;//~ WARNING the item `Some` is imported redundantly
use std::option::Option::None; //~ WARNING the item `None` is imported redundantly
```
Fix typo in VecDeque::handle_capacity_increase() doc comment.
Strategies B and C both show a full buffer before the capacity increase, while strategy A had one empty element left. Filled the last element in.
Don't use mem::zeroed in vec::IntoIter
`mem::zeroed` is not a trivial function. Maybe it was once, but now it involves multiple locals, copies, and an intrinsic that gets monomorphized into a call to `panic_nounwind` for iterators of types like `Vec<&T>`. Of course all that complexity is trivially optimized out, but generating a bunch of IR where we don't need to just so we can optimize it away later is silly.
Add examples to document the return type of quickselect functions
Currently, `select_nth_unstable`, `select_nth_unstable_by`, and `select_nth_unstable_by_key`'s examples do not show how to use the return values of the functions in an example, so this PR adds that in.
Note: I didn't know what to call the parameters, so I settled on lesser, median, greater because the example is used for median finding so I retained that naming for the pivot, but lesser and greater are poor names for the example that sorts in descending order, because lesser and greater are then flipped.
I think it's common to say "lo" and "hi" for low and high respectively, but that's also not great when the comparator flips the elements. Otherwise, "left" and "right" are also commonly used but I think that's poor naming because some languages read right to left so those names are also unintuitive.
Lesser and greater are also not that great but I found a test that used `less`, `equal`, `greater` so I took that: dfa88b328f/library/core/tests/slice.rs (L1962)
Use a hardcoded constant instead of calling OpenProcessToken.
Now that Win 7 support is dropped, we can resurrect #90144.
GetCurrentProcessToken is defined in processthreadsapi.h as:
FORCEINLINE
HANDLE
GetCurrentProcessToken (
VOID
)
{
return (HANDLE)(LONG_PTR) -4;
}
Since it's very unlikely that this constant will ever change, let's just use it instead of making calls to get the same information.
Make `io::BorrowedCursor::advance` safe
This also keeps the old `advance` method under `advance_unchecked` name.
This makes pattern like `std::io::default_read_buf` safe to write.
Now that Win 7 support is dropped, we can resurrect #90144.
GetCurrentProcessToken is defined in processthreadsapi.h as:
FORCEINLINE
HANDLE
GetCurrentProcessToken (
VOID
)
{
return (HANDLE)(LONG_PTR) -4;
}
Since it's very unlikely that this constant will ever change, let's just use it instead of making calls to get the same information.
Rename MaybeUninit::write_slice
A step to push #79995 forward.
https://github.com/rust-lang/libs-team/issues/122 also suggested to make them inherent methods, but they can't be — they'd conflict with slice's regular methods.
Implement intrinsics with fallback bodies
fixes#93145 (though we can port many more intrinsics)
cc #63585
The way this works is that the backend logic for generating custom code for intrinsics has been made fallible. The only failure path is "this intrinsic is unknown". The `Instance` (that was `InstanceDef::Intrinsic`) then gets converted to `InstanceDef::Item`, which represents the fallback body. A regular function call to that body is then codegenned. This is currently implemented for
* codegen_ssa (so llvm and gcc)
* codegen_cranelift
other backends will need to adjust, but they can just keep doing what they were doing if they prefer (though adding new intrinsics to the compiler will then require them to implement them, instead of getting the fallback body).
cc `@scottmcm` `@WaffleLapkin`
### todo
* [ ] miri support
* [x] default intrinsic name to name of function instead of requiring it to be specified in attribute
* [x] make sure that the bodies are always available (must be collected for metadata)
doc: add note about panicking examples for strict_overflow_ops
The first commit adds a note before the panicking examples for strict_overflow_ops to make it clearer that the following examples should panic and why, without needing the reader to hover the mouse over the information icon.
The second commit adds panicking examples for division by zero operations for strict division operations on unsigned numbers. The signed numbers already have two panicking examples each: one for division by zero and one for overflowing division (`MIN/-1`); this commit includes the division by zero examples for the unsigned numbers.
Waker::will_wake: Compare vtable address instead of its content
Optimize will_wake implementation by comparing vtable address instead of its content.
The existing best practice to avoid false negatives from will_wake is to define a waker vtable as a static item. That approach continues to works with the new implementation.
While this potentially changes the observable behaviour, the function is documented to work on a best-effort basis. The PartialEq impl for RawWaker remains as it was.
Add `ErrorGuaranteed` to `ast::LitKind::Err`, `token::LitKind::Err`.
Similar to recent work doing the same for `ExprKind::Err` (#120586) and `TyKind::Err` (#121109).
r? `@oli-obk`
std::thread update freebsd stack guard handling.
up to now, it had been assumed the stack guard setting default is not touched in the field but some user might just want to disable it or increase it. checking it once at runtime should be enough.
Fix BTreeMap's Cursor::remove_{next,prev}
These would incorrectly leave `current` as `None` after a failed attempt to remove an element (due to the cursor already being at the start/end).
Clarified docs on non-atomic oprations on owned/mut refs to atomics
I originally misinterpreted the documentation to mean that the compiler can/will automatically optimise away atomic operations whenever the data is owned or mutably referenced.
On re-reading I think it is not technically incorrect, but specifically mentioning _how_ the atomic operations can be avoided also prevents this misunderstanding.
Make contributing to windows bindings easier
This PR does three things:
- Automatically sorts bindings so contributors don't have to. I should have done this to begin with but was lazy.
- Renames `windows_sys.lst` to `bindings.txt`. This [matches the windows-rs repository](8e71051ea8/crates/tools/sys/bindings.txt) (and repos that copy it). I believe consistency with other projects helps get people orientated.
- Adds a `README.md` file explaining what this is about and how to add bindings. This has the benefit of being directly editable and it's rendered when viewed online. Also people are understandably jumping right into the `windows_sys.rs` file via ripgrep or github search and so missing that it's generated. A `README.md` alongside it is at least slightly more obvious in that case. There is still a small note at the top of `windows_sys` in case people do read from the beginning.
None of this has any impact on the actual code generated. It's purely to make the new contributors workflow a bit nicer.
This mostly works well, and eliminates a couple of delayed bugs.
One annoying thing is that we should really also add an
`ErrorGuaranteed` to `proc_macro::bridge::LitKind::Err`. But that's
difficult because `proc_macro` doesn't have access to `ErrorGuaranteed`,
so we have to fake it.
Rollup of 13 pull requests
Successful merges:
- #116387 (Additional doc links and explanation of `Wake`.)
- #118738 (Netbsd10 update)
- #118890 (Clarify the lifetimes of allocations returned by the `Allocator` trait)
- #120498 (Uplift `TypeVisitableExt` into `rustc_type_ir`)
- #120530 (Be less confident when `dyn` suggestion is not checked for object safety)
- #120915 (Fix suggestion span for `?Sized` when param type has default)
- #121015 (Optimize `delayed_bug` handling.)
- #121024 (implement `Default` for `AsciiChar`)
- #121039 (Correctly compute adjustment casts in GVN)
- #121045 (Fix two UI tests with incorrect directive / invalid revision)
- #121049 (Do not point at `#[allow(_)]` as the reason for compat lint triggering)
- #121071 (Use fewer delayed bugs.)
- #121073 (Fix typos in `OneLock` doc)
r? `@ghost`
`@rustbot` modify labels: rollup
implement `Default` for `AsciiChar`
This implements `Default` for `AsciiChar` in order to match `char`'s implementation.
From all the different possible ways to do this I think the clearest one is to have both `char` and `AsciiChar` impls together.
I've also updated the doc-comment of the default variant since rustdoc doesn't seem to indicate it otherwise. Probably the text could be improved, though. I couldn't find any similar examples in the codebase and suggestions are welcomed.
r? `@scottmcm`
Clarify the lifetimes of allocations returned by the `Allocator` trait
The previous definition (accidentally) disallowed the implementation of stack-based allocators whose memory would become invalid once the lifetime of the allocator type ended.
This also ensures the validity of the following blanket implementation:
```rust
impl<A: Allocator> Allocator for &'_ A {}
```
Additional doc links and explanation of `Wake`.
This is intended to clarify:
* That `Wake` exists and can be used instead of `RawWaker`.
* How to construct a `Waker` when you are looking at `Wake` (which was previously only documented in the example).
Optimize away poison guards when std is built with panic=abort
> **Note**: To take advantage of this PR, you will have to use `-Zbuild-std` or build your own toolchain. rustup toolchains always link to a libstd that was compiled with `panic=unwind`, since it's compatible with `panic=abort` code.
When std is compiled with `panic=abort` we can remove a lot of the poison machinery from the locks. This changes the `Flag` and `Guard` types to be ZSTs. It also adds an uninhabited member to `PoisonError` so the compiler knows it can optimize away the `Result::Err` paths, and make `LockResult<T>` layout-equivalent to `T`.
### Is this a breaking change?
`PoisonError::new` now panics if invoked from a libstd built with `panic="abort"` (or any non-`unwind` strategy). It is unclear to me whether to consider this a breaking change.
In order to encounter this behavior, **both of the following must be true**:
#### Using a libstd with `panic="abort"`
This is pretty uncommon. We don't build libstd with that in rustup, except in (Tier 2-3) platforms that do not support unwinding, **most notably wasm**.
Most people who do this are using cargo's `-Z build-std` feature, which is unstable.
`panic="abort"` is not a supported option in Rust's build system. It is possible to configure it using `CARGO_TARGET_xxx_RUSTFLAGS`, but I believe this only works on **non-host** platforms.
#### Creating `PoisonError` manually
This is also unlikely. The only common use case I can think of is in tests, and you can't run tests with `panic="abort"` without the unstable `-Z panic_abort_tests` flag.
It's possible that someone is implementing their own locks using std's `PoisonError` **and** defining "thread failure" to mean something other than "panic". If this is the case then we would break their code if it was used with a `panic="abort"` libstd. The locking crates I know of don't replicate std's poison API, but I haven't done much research into this yet.
I've touched on a fair number of considerations here. Which ones do people consider relevant?
`compile_fail` should only be used when the code is meant to show
what *not* to do. In other words, there should be a fundamental flaw
in the code. However, in this case, the example is just incomplete,
so we should use `ignore` to avoid confusing readers.
Rollup of 10 pull requests
Successful merges:
- #120696 (Properly handle `async` block and `async fn` in `if` exprs without `else`)
- #120751 (Provide more suggestions on invalid equality where bounds)
- #120802 (Bail out of drop elaboration when encountering error types)
- #120967 (docs: mention round-to-even in precision formatting)
- #120973 (allow static_mut_ref in some tests that specifically test mutable statics)
- #120974 (llvm-wrapper: adapt for LLVM API change: Add support for EXPORTAS name types)
- #120986 (iterator.rs: remove "Basic usage" text)
- #120987 (remove redundant logic)
- #120988 (fix comment)
- #120995 (PassWrapper: adapt for llvm/llvm-project@93cdd1b5cf)
r? `@ghost`
`@rustbot` modify labels: rollup
docs: mention round-to-even in precision formatting
_Note_: Not quite sure exactly how to format this documentation.
Mentions round-to-even usage in precision formatting. (should this also be mentioned in `f64::round`?)
From https://github.com/rust-lang/rust/issues/70336
Implement sys/thread for UEFI
Since UEFI has no concept of threads, most of this module can be ignored. However, implementing parts that make sense.
- Implement sleep
- Implement available_parallelism
improve `btree_cursors` functions documentation
As suggested by ``@Amanieu`` (and others) in #107540 (https://github.com/rust-lang/rust/issues/107540#issuecomment-1937760547)
Improvements:
- Document exact behavior of `{upper/lower}_bound{,_mut}` with each of the three `Bound` types using unambigous words `{greatest,greater,smallest,smaller,before,after}`.
- Added another doc-example for the `Bound::Unbounded` for each of the methods
- Changed doc-example to use From<[T; N]> rather than lots of `insert()`s which requires a mutable map which clutters the example when `mut` may not be required for the method (such as for `{upper,lower}_bound`.
- Removed `# Panics` section from `insert_{before,after}` methods since they were changed to return an error instead a while ago.
- Reworded some phrases to be more consistent with the more regular `BTreeMap` methods such as calling entries "key-value" rather than "element"s.
The previous definition (accidentally) disallowed the implementation of
stack-based allocators whose memory would become invalid once the
lifetime of the allocator type ended.
This also ensures the validity of the following blanket implementation:
```rust
impl<A: Allocator> Allocator for &'_ A {}
```
Replace pthread `RwLock` with custom implementation
This is one of the last items in #93740. I'm doing `RwLock` first because it is more self-contained and has less tradeoffs to make. The motivation is explained in the documentation, but in short: the pthread rwlock is slow and buggy and `std` can do much better. I considered implementing a parking lot, as was discussed in the tracking issue, but settled for the queue-based version because writing self-balancing binary trees is not fun in Rust...
This is a rather complex change, so I have added quite a bit of documentation to help explain it. Please point out any part that could be explained better.
~~The read performance is really good, I'm getting 4x the throughput of the pthread version and about the same performance as usync/parking_lot on an Apple M1 Max in the usync benchmark suite, but the write performance still falls way behind what usync and parking_lot achieve. I tried using a separate queue lock like what usync uses, but that didn't help. I'll try to investigate further in the future, but I wanted to get some eyes on this first.~~ [Resolved](https://github.com/rust-lang/rust/pull/110211#issuecomment-1513682336)
r? `@m-ou-se`
CC `@kprotty`
assert_unsafe_precondition cleanup
I moved the polymorphic `is_nonoverlapping` into the `Cell` function that uses it and renamed `intrinsics::is_nonoverlapping_mono` to just `intrinsics::is_nonoverlapping`.
We now also have some docs for `intrinsics::debug_assertions`.
r? RalfJung
Make cmath.rs a single file
It makes sense to have this all in one file. There's essentially only one target that has missing symbols and that's easy enough to handle inline.
Note that the Windows definitions used to use `c_float` and `c_double` whereas the other platforms all used `f32` and `f64`. They've now been made consistent. However, `c_float` and `c_double` have the expected definitions on all Windows platforms we support.
Create try_new function for ThinBox
The `allocator_api` feature has proven very useful in my work in the FreeBSD kernel. I've found a few places where a `ThinBox` #92791 would be useful, but it must be able to be fallibly allocated for it to be used in the kernel.
This PR proposes a change to add such a constructor for ThinBox.
ACP: https://github.com/rust-lang/libs-team/issues/213
Since UEFI has no concept of threads, most of this module can be
ignored. However, implementing parts that make sense.
- Implement sleep
- Implement available_parallelism
Signed-off-by: Ayush Singh <ayushdevel1325@gmail.com>
Introducing a new config for this purpose as NetBSD 9 or 8 will be still around
for a good while. For now, we re finally enabling sys::unix::rand::getrandom.
core: add Duration constructors
Add more `Duration` constructors.
Tracking issue: #120301.
These match similar convenience constructors available on both `chrono::Duration` and `time::Duration`.
What's the best ordering for these with respect to the existing constructors?
Suggest less bug-prone construction of Duration in docs
std::time::Duration has a well-known quirk: Duration::as_nanos() returns u128 [1], but Duration::from_nanos() takes u64 [2]. So these methods cannot easily roundtrip [3]. It is not possible to simply accept u128 in from_nanos [4], because it requires breaking other API [5].
It seems to me that callers have basically only two options:
1. `Duration::from_nanos(d.as_nanos() as u64)`, which is the "obvious" and buggy approach.
2. `Duration::new(d.as_secs(), d.subsecs_nanos())`, which only becomes apparent after reading and digesting the entire Duration struct documentation.
I suggest that the documentation of `from_nanos` is changed to make option 2 more easily discoverable.
There are two major usecases for this:
- "Weird math" operations that should not be supported directly by `Duration`, like squaring.
- "Disconnected roundtrips", where the u128 value is passed through various other stack frames, and perhaps reconstructed into a Duration on a different machine.
In both cases, it seems like a good idea to not tempt people into thinking "Eh, u64 is good enough, what could possibly go wrong!". That's why I want to add a note that points out the similarly-easy and *safe* way to reconstruct a Duration.
[1] https://doc.rust-lang.org/stable/std/time/struct.Duration.html#method.as_nanos
[2] https://doc.rust-lang.org/stable/std/time/struct.Duration.html#method.from_nanos
[3] https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=fa6bab2b6b72f20c14b5243610ea1dde
[4] https://github.com/rust-lang/rust/issues/103332
[5] https://github.com/rust-lang/rust/issues/51107#issuecomment-392353166
Remove an unneeded helper from the tuple library code
Thanks to https://github.com/rust-lang/rust/pull/107022, this is just what `==` does, so we don't need the helper here anymore.
Add some links and minor explanatory comments to `std::fmt`
I thought the documentation for the `#` flag could do with a link to the explanation of the `?xXbo` flags, because at that point they haven't been explained yet and it's a bit confusing.
I also added that the `0` flag overrides the fill character and alignment flag, here's a [Rust Playgrond](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=0d580b7b78b8a2d8c08a2fc7a936ef17) that shows what I mean.
This is intended to clarify:
* That `Wake` exists and can be used instead of `RawWaker`.
* How to construct a `Waker` when you are looking at `Wake`
(which was previously only documented in the example).
Add support for custom JSON targets when using build-std.
Currently, when building with `build-std`, some library build scripts check properties of the target by inspecting the target triple at `env::TARGET`, which is simply set to the filename of the JSON file when using JSON target files.
This patch alters these build scripts to use `env::CARGO_CFG_*` to fetch target information instead, allowing JSON target files describing platforms without `restricted_std` to build correctly when using `-Z build-std`. There are some weak assertions here (for example, `nintendo && newlib`), however this seems at least a marginal improvement on the existing solution.
Fixes https://github.com/rust-lang/wg-cargo-std-aware/issues/60.
Clarify that atomic and regular integers can differ in alignment
The documentation for atomic integers says that they have the "same in-memory representation" as their underlying integers. This might be misconstrued as implying that they have the same layout. Therefore, clarify that atomic integers' alignment is equal to their size.
Harmonize `AsyncFn` implementations, make async closures conditionally impl `Fn*` traits
This PR implements several changes to the built-in and libcore-provided implementations of `Fn*` and `AsyncFn*` to address two problems:
1. async closures do not implement the `Fn*` family traits, leading to breakage: https://crater-reports.s3.amazonaws.com/pr-120361/index.html
2. *references* to async closures do not implement `AsyncFn*`, as a consequence of the existing blanket impls of the shape `AsyncFn for F where F: Fn, F::Output: Future`.
In order to fix (1.), we implement `Fn` traits appropriately for async closures. It turns out that async closures can:
* always implement `FnOnce`, meaning that they're drop-in compatible with `FnOnce`-bound combinators like `Option::map`.
* conditionally implement `Fn`/`FnMut` if they have no captures, which means that existing usages of async closures should *probably* work without breakage (crater checking this: https://github.com/rust-lang/rust/pull/120712#issuecomment-1930587805).
In order to fix (2.), we make all of the built-in callables implement `AsyncFn*` via built-in impls, and instead adjust the blanket impls for `AsyncFn*` provided by libcore to match the blanket impls for `Fn*`.
up to now, it had been assumed the stack guard setting default is not
touched in the field but some user might just want to disable it or
increase it. checking it once at runtime should be enough.
Improve `Option::inspect` docs
* Refer to the function as "a function" instead of "the provided closure" since it is not necessarily a closure.
* State that the original Option/Result is returned.
* Adjust the example for `Option::inspect` to use chaining.
core/time: avoid divisions in Duration::new
In our (decently large) code base, we use `SystemTime::UNIX_EPOCH.elapsed()` in a lot of places & often in a loop or in the hot path. On [Unix](https://github.com/rust-lang/rust/blob/1.75.0/library/std/src/sys/unix/time.rs#L153-L162) at least, it seems we do calculations before hand to ensure that nanos is within the valid range, yet `Duration::new()` still checks it again, using 2 divisions. It seems like adding a branch can make this function 33% faster on ARM64 in the cases where nanos is already in the valid range & seems to have no effect in the other case.
Benchmarks:
M1 Pro (14-inch base model):
```
duration/current/checked
time: [1.5945 ns 1.6167 ns 1.6407 ns]
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe
duration/current/unchecked
time: [1.5941 ns 1.6051 ns 1.6179 ns]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
duration/branched/checked
time: [1.1997 ns 1.2048 ns 1.2104 ns]
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
duration/branched/unchecked
time: [1.5881 ns 1.5957 ns 1.6039 ns]
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
```
EC2 c7gd.16xlarge (Graviton 3):
```
duration/current/checked
time: [2.7996 ns 2.8000 ns 2.8003 ns]
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) low severe
3 (3.00%) low mild
duration/current/unchecked
time: [2.9922 ns 2.9925 ns 2.9928 ns]
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) low severe
1 (1.00%) low mild
2 (2.00%) high mild
duration/branched/checked
time: [2.0830 ns 2.0843 ns 2.0857 ns]
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low severe
1 (1.00%) low mild
1 (1.00%) high mild
duration/branched/unchecked
time: [2.9879 ns 2.9886 ns 2.9893 ns]
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) low severe
2 (2.00%) low mild
```
EC2 r7iz.16xlarge (Intel Xeon Scalable-based (Sapphire Rapids)):
```
duration/current/checked
time: [980.60 ps 980.79 ps 980.99 ps]
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) low severe
2 (2.00%) low mild
3 (3.00%) high mild
1 (1.00%) high severe
duration/current/unchecked
time: [979.53 ps 979.74 ps 979.96 ps]
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) low severe
1 (1.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
duration/branched/checked
time: [938.72 ps 938.96 ps 939.22 ps]
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
1 (1.00%) high mild
2 (2.00%) high severe
duration/branched/unchecked
time: [1.0103 ns 1.0110 ns 1.0118 ns]
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) low mild
7 (7.00%) high mild
1 (1.00%) high severe
```
Bench code (ran using stable 1.75.0 & criterion latest 0.5.1):
I couldn't find any benches for `Duration` in this repo, so I just copied the relevant types & recreated it.
```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};
pub fn duration_bench(c: &mut Criterion) {
const NANOS_PER_SEC: u32 = 1_000_000_000;
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[repr(transparent)]
struct Nanoseconds(u32);
impl Default for Nanoseconds {
#[inline]
fn default() -> Self {
// SAFETY: 0 is within the valid range
unsafe { Nanoseconds(0) }
}
}
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Default)]
pub struct Duration {
secs: u64,
nanos: Nanoseconds, // Always 0 <= nanos < NANOS_PER_SEC
}
impl Duration {
#[inline]
pub const fn new_current(secs: u64, nanos: u32) -> Duration {
let secs = match secs.checked_add((nanos / NANOS_PER_SEC) as u64) {
Some(secs) => secs,
None => panic!("overflow in Duration::new"),
};
let nanos = nanos % NANOS_PER_SEC;
// SAFETY: nanos % NANOS_PER_SEC < NANOS_PER_SEC, therefore nanos is within the valid range
Duration { secs, nanos: unsafe { Nanoseconds(nanos) } }
}
#[inline]
pub const fn new_branched(secs: u64, nanos: u32) -> Duration {
if nanos < NANOS_PER_SEC {
// SAFETY: nanos < NANOS_PER_SEC, therefore nanos is within the valid range
Duration { secs, nanos: unsafe { Nanoseconds(nanos) } }
} else {
let secs = match secs.checked_add((nanos / NANOS_PER_SEC) as u64) {
Some(secs) => secs,
None => panic!("overflow in Duration::new"),
};
let nanos = nanos % NANOS_PER_SEC;
// SAFETY: nanos % NANOS_PER_SEC < NANOS_PER_SEC, therefore nanos is within the valid range
Duration { secs, nanos: unsafe { Nanoseconds(nanos) } }
}
}
}
let mut group = c.benchmark_group("duration/current");
group.bench_function("checked", |b| {
b.iter(|| black_box(Duration::new_current(black_box(1_000_000_000), black_box(1_000_000))));
});
group.bench_function("unchecked", |b| {
b.iter(|| {
black_box(Duration::new_current(black_box(1_000_000_000), black_box(2_000_000_000)))
});
});
drop(group);
let mut group = c.benchmark_group("duration/branched");
group.bench_function("checked", |b| {
b.iter(|| {
black_box(Duration::new_branched(black_box(1_000_000_000), black_box(1_000_000)))
});
});
group.bench_function("unchecked", |b| {
b.iter(|| {
black_box(Duration::new_branched(black_box(1_000_000_000), black_box(2_000_000_000)))
});
});
}
criterion_group!(duration_benches, duration_bench);
criterion_main!(duration_benches);
```
Toggle assert_unsafe_precondition in codegen instead of expansion
The goal of this PR is to make some of the unsafe precondition checks in the standard library available in debug builds. Some UI tests are included to verify that it does that.
The diff is large, but most of it is blessing mir-opt tests and I've also split up this PR so it can be reviewed commit-by-commit.
This PR:
1. Adds a new intrinsic, `debug_assertions` which is lowered to a new MIR NullOp, and only to a constant after monomorphization
2. Rewrites `assume_unsafe_precondition` to check the new intrinsic, and be monomorphic.
3. Skips codegen of the `assume` intrinsic in unoptimized builds, because that was silly before but with these checks it's *very* silly
4. The checks with the most overhead are `ptr::read`/`ptr::write` and `NonNull::new_unchecked`. I've simply added `#[cfg(debug_assertions)]` to the checks for `ptr::read`/`ptr::write` because I was unable to come up with any (good) ideas for decreasing their impact. But for `NonNull::new_unchecked` I found that the majority of callers can use a different function, often a safe one.
Yes, this PR slows down the compile time of some programs. But in our benchmark suite it's never more than 1% icount, and the average icount change in debug-full programs is 0.22%. I think that is acceptable for such an improvement in developer experience.
https://github.com/rust-lang/rust/issues/120539#issuecomment-1922687101
Always check the result of `pthread_mutex_lock`
Fixes#120147.
Instead of manually adding a list of "good" platforms, I've simply made the check unconditional. pthread's mutex is already quite slow on most platforms, so one single well-predictable branch shouldn't hurt performance too much.
The documentation for atomic integers says that they have the "same
in-memory representation" as their underlying integers. This might be
misconstrued as implying that they have the same layout. Therefore,
clarify that atomic integers' alignment is equal to their size.
Stop bailing out from compilation just because there were incoherent traits
fixes#120343
but also has a lot of "type annotations needed" fallout. Some are fixed in the second commit.
Make `NonZero` constructors generic.
This makes `NonZero` constructors generic, so that `NonZero::new` can be used without turbofish syntax.
Tracking issue: https://github.com/rust-lang/rust/issues/120257
~~I cannot figure out how to make this work with `const` traits. Not sure if I'm using it wrong or whether there's a bug:~~
```rust
101 | if n == T::ZERO {
| ^^^^^^^^^^^^ expected `host`, found `true`
|
= note: expected constant `host`
found constant `true`
```
r? `@dtolnay`
Reconstify `Add`
r? project-const-traits
I'm not happy with the ui test changes (or failures because I did not bless them and include the diffs in this PR). There is at least some bugs I need to look and try fix:
1. A third duplicated diagnostic when a consumer crate that does not have `effects` enabled has a trait selection error for an upstream const_trait trait. See tests/ui/ufcs/ufcs-qpath-self-mismatch.rs.
2. For some reason, making `Add` a const trait would stop us from suggesting `T: Add` when we try to add two `T`s without that bound. See tests/ui/suggestions/issue-97677.rs