Make intrinsic fallback bodies cross-crate inlineable
This change was prompted by the stage1 compiler spending 4% of its time when compiling the polymorphic-recursion MIR opt test in `unlikely`.
Intrinsic fallback bodies like `unlikely` should always be inlined, it's very silly if they are not. To do this, we enable the fallback bodies to be cross-crate inlineable. Not that this matters for our workloads since the compiler never actually _uses_ the "fallback bodies", it just uses whatever was cfg(bootstrap)ped, so I've also added `#[inline]` to those.
See the comments for more information.
r? oli-obk
intrinsics::simd: add missing functions, avoid UB-triggering fast-math
Turns out stdarch declares a bunch more SIMD intrinsics that are still missing from libcore.
I hope I got the docs and in particular the safety requirements right for these "unordered" and "nanless" intrinsics.
Many of these are unused even in stdarch, but they are implemented in the codegen backend, so we may as well list them here.
r? `@Amanieu`
Cc `@calebzulawski` `@workingjubilee`
Remove useless `'static` bounds on `Box` allocator
#79327 added `'static` bounds to the allocator parameter for various `Box` + `Pin` APIs to ensure soundness. But it was a bit overzealous, some of the bounds aren't actually needed.
Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison
Setting all of LLVM's fast-math flags makes our fast-math intrinsics very dangerous, because some inputs are UB. This set of flags permits common algebraic transformations, but according to the [LangRef](https://llvm.org/docs/LangRef.html#fastmath), only the flags `nnan` (no nans) and `ninf` (no infs) can produce poison.
And this uses the algebraic float ops to fix https://github.com/rust-lang/rust/issues/120720
cc `@orlp`
Refactor trait implementations in `core::convert::num`.
Tracking issue: https://github.com/rust-lang/rust/issues/120257
Implement conversion traits using generic `NonZero` type, and refactor all macros to use a consistent format/order of parameters.
r? `@dtolnay`
Correct the simd_masked_{load,store} intrinsic docs
Explains the uniform pointer being used for these two operations and how elements are offset from it.
Always inline check in `assert_unsafe_precondition` with cfg(debug_assertions)
The current complexities in `assert_unsafe_precondition` are delicately balancing several concerns, among them compile times for the cases where there are no debug assertions. This comes at a large runtime cost when the assertions are enabled, making the debug assertion compiler a lot slower, which is very annoying.
To avoid this, we always inline the check when building with debug assertions.
Numbers (compiling stage1 library after touching core):
- master: 80s
- just adding `#[inline(always)]` to the `cfg(bootstrap)` `debug_assertions` (equivalent to a bootstrap bump (uhh, i just realized that i was on a slightly outdated master so this bump might have happened already), (#121112)): 67s
- this: 54s
So this seems like a good solution. I think we can still get the same run-time perf improvements for other users too by massaging this code further (see my other PR about adding `#[rustc_no_mir_inline]` #121114) but this is a simpler step that solves the imminent problem of "holy shit my rustc is sooo slow".
Funny consequence: This now means compiling the standard library with dbeug assertions makes it faster (than without, when using debug assertions downstream)!
r? ```@saethlin``` (or anyone else if someone wants to review this)
fixes#121110, supposedly
Use intrinsics::debug_assertions in debug_assert_nounwind
This is the first item in https://github.com/rust-lang/rust/issues/120848.
Based on the benchmarking in this PR, it looks like, for the programs in our benchmark suite, enabling all these additional checks does not introduce significant compile-time overhead, with the single exception of `Alignment::new_unchecked`. Therefore, I've added `#[cfg(debug_assertions)]` to that one call site, so that it remains compiled out in the distributed standard library.
The trailing commas in the previous calls to `debug_assert_nounwind!` were causing the macro to expand to `panic_nouwnind_fmt`, which requires more work to set up its arguments, and that overhead alone is measured between this perf run and the next: https://github.com/rust-lang/rust/pull/120863#issuecomment-1937423502
This change was prompted by the stage1 compiler spending 4% of its time
when compiling the polymorphic-recursion MIR opt test in `unlikely`.
Intrinsic fallback bodies like `unlikely` should always be inlined, it's
very silly if they are not. To do this, we enable the fallback bodies to
be cross-crate inlineable. Not that this matters for our workloads since
the compiler never actually _uses_ the "fallback bodies", it just uses
whatever was cfg(bootstrap)ped, so I've also added `#[inline]` to those.
The current complexities in `assert_unsafe_precondition` are delicately
balancing several concerns, among them compile times for the cases where
there are no debug assertions. This comes at a large runtime cost when
the assertions are enabled, making the debug assertion compiler a lot
slower, which is very annoying.
To avoid this, we always inline the check when building with debug
assertions.
Numbers (compiling stage1 library after touching core):
- master: 80s
- just adding `#[inline(always)]` to the `cfg(bootstrap)`
`debug_assertions`: 67s
- this: 54s
So this seems like a good solution. I think we can still get
the same run-time perf improvements for other users too by
massaging this code further (see my other PR about adding
`#[rustc_no_mir_inline]`) but this is a simpler step that
solves the imminent problem of "holy shit my rustc is sooo slow".
Funny consequence: This now means compiling the standard library with
dbeug assertions makes it faster (than without, when using debug
assertions downstream)!
Store core::str::CharSearcher::utf8_size as u8
This is already relied on being smaller than u8 due to the `safety invariant: utf8_size must be less than 5`, so this helps LLVM optimize and maybe improve copies due to padding instead of unused bytes.
Specialize some methods of `io::Chain`
This PR specializes the implementation of some methods of `io::Chain`, which could bring performance improvements when using it.
Portable SIMD subtree update
Syncs nightly to the latest changes from rust-lang/portable-simd
r? `@rust-lang/libs`
Also, fixes#119904 which is now fixed upstream.
Reduce monomorphisation bloat in small_c_string
This is a code path usually next to an FFI call, so taking the `dyn` slowdown for the 1159 llvm-line (fat lto, codegen-units 1, release build) drop in my testing program [t2fanrd](https://github.com/GnomedDev/t2fanrd) is worth it imo.
Remove unnecessary unit binding
It appears that the unit binding is not necessary at this time. However, I am unsure of its importance in the past. Please let me know if it is unsafe to remove.
Move `OsStr::slice_encoded_bytes` validation to platform modules
This delegates OS string slicing (`OsStr::slice_encoded_bytes`) validation to the underlying platform implementation. For now that results in increased performance and better error messages on Windows without any changes to semantics. In the future we may want to provide different semantics for different platforms.
The existing implementation is still used on Unix and most other platforms and is now optimized a little better.
Tracking issue: https://github.com/rust-lang/rust/issues/118485
cc `@epage,` `@BurntSushi`