Optimise align_offset for stride=1 further
`stride == 1` case can be computed more efficiently through `-p (mod
a)`. That, then translates to a nice and short sequence of LLVM
instructions:
%address = ptrtoint i8* %p to i64
%negptr = sub i64 0, %address
%offset = and i64 %negptr, %a_minus_one
And produces pretty much ideal code-gen when this function is used in
isolation.
Typical use of this function will, however, involve use of
the result to offset a pointer, i.e.
%aligned = getelementptr inbounds i8, i8* %p, i64 %offset
This still looks very good, but LLVM does not really translate that to
what would be considered ideal machine code (on any target). For example
that's the codegen we obtain for an unknown alignment:
; x86_64
dec rsi
mov rax, rdi
neg rax
and rax, rsi
add rax, rdi
In particular negating a pointer is not something that’s going to be
optimised for in the design of CISC architectures like x86_64. They
are much better at offsetting pointers. And so we’d love to utilize this
ability and produce code that's more like this:
; x86_64
lea rax, [rsi + rdi - 1]
neg rsi
and rax, rsi
To achieve this we need to give LLVM an opportunity to apply its
various peep-hole optimisations that it does during DAG selection. In
particular, the `and` instruction appears to be a major inhibitor here.
We cannot, sadly, get rid of this load-bearing operation, but we can
reorder operations such that LLVM has more to work with around this
instruction.
One such ordering is proposed in #75579 and results in LLVM IR that
looks broadly like this:
; using add enables `lea` and similar CISCisms
%offset_ptr = add i64 %address, %a_minus_one
%mask = sub i64 0, %a
%masked = and i64 %offset_ptr, %mask
; can be folded with `gepi` that may follow
%offset = sub i64 %masked, %address
…and generates the intended x86_64 machine code.
One might also wonder how the increased amount of code would impact a
RISC target. Turns out not much:
; aarch64 previous ; aarch64 new
sub x8, x1, #1 add x8, x1, x0
neg x9, x0 sub x8, x8, #1
and x8, x9, x8 neg x9, x1
add x0, x0, x8 and x0, x8, x9
(and similarly for ppc, sparc, mips, riscv, etc)
The only target that seems to do worse is… wasm32.
Onto actual measurements – the best way to evaluate snipets like these
is to use llvm-mca. Much like Aarch64 assembly would allow to suspect,
there isn’t any performance difference to be found. Both snippets
execute in same number of cycles for the CPUs I tried. On x86_64,
we get throughput improvement of >50%!
Fixes#75579
Properly define va_arg and va_list for aarch64-apple-darwin
From [Apple][]:
> Because of these changes, the type `va_list` is an alias for `char*`,
> and not for the struct type in the generic procedure call standard.
With this change `/x.py test --stage 1 src/test/ui/abi/variadic-ffi`
passes.
Fixes#78092
[Apple]: https://developer.apple.com/documentation/xcode/writing_arm64_code_for_apple_platforms
transmute_copy: explain that alignment is handled correctly
The doc comment currently is somewhat misleading because if it actually transmuted `&T` to `&U`, a higher-aligned `U` would be problematic.
`#[deny(unsafe_op_in_unsafe_fn)]` in sys/wasm
This is part of #73904.
This encloses unsafe operations in unsafe fn in `libstd/sys/wasm`.
@rustbot modify labels: F-unsafe-block-in-unsafe-fn
BTreeMap: stop mistaking node::MIN_LEN for a node level constraint
Correcting #77612 that fell into the trap of assuming that node::MIN_LEN is an imposed minimum everywhere, and trying to make it much more clear it is an offered minimum at the node level.
r? @Mark-Simulacrum
Bump backtrace-rs to enable Mach-O support on iOS.
Related to rust-lang/backtrace-rs#378. Fixes backtraces on iOS that were missing in Rust v1.47.0 after switching to gimli because it only enabled Mach-O support on macOS.
replace `#[allow_internal_unstable]` with `#[rustc_allow_const_fn_unstable]` for `const fn`s
`#[allow_internal_unstable]` is currently used to side-step feature gate and stability checks.
While it was originally only meant to be used only on macros, its use was expanded to `const fn`s.
This pr adds stricter checks for the usage of `#[allow_internal_unstable]` (only on macros) and introduces the `#[rustc_allow_const_fn_unstable]` attribute for usage on `const fn`s.
This pr does not change any of the functionality associated with the use of `#[allow_internal_unstable]` on macros or the usage of `#[rustc_allow_const_fn_unstable]` (instead of `#[allow_internal_unstable]`) on `const fn`s (see https://github.com/rust-lang/rust/issues/69399#issuecomment-712911540).
Note: The check for `#[rustc_allow_const_fn_unstable]` currently only validates that the attribute is used on a function, because I don't know how I would check if the function is a `const fn` at the place of the check. I therefore openend this as a 'draft pull request'.
Closesrust-lang/rust#69399
r? @oli-obk
Throw core::panic!("message") as &str instead of String.
This makes `core::panic!("message")` consistent with `std::panic!("message")`, which throws a `&str` and not a `String`.
This also makes any other panics from `core::panicking::panic` result in a `&str` rather than a `String`, which includes compiler-generated panics such as the panics generated for `mem::zeroed()`.
---
Demonstration:
```rust
use std::panic;
use std::any::Any;
fn main() {
panic::set_hook(Box::new(|panic_info| check(panic_info.payload())));
check(&*panic::catch_unwind(|| core::panic!("core")).unwrap_err());
check(&*panic::catch_unwind(|| std::panic!("std")).unwrap_err());
}
fn check(msg: &(dyn Any + Send)) {
if let Some(s) = msg.downcast_ref::<String>() {
println!("Got a String: {:?}", s);
} else if let Some(s) = msg.downcast_ref::<&str>() {
println!("Got a &str: {:?}", s);
}
}
```
Before:
```
Got a String: "core"
Got a String: "core"
Got a &str: "std"
Got a &str: "std"
```
After:
```
Got a &str: "core"
Got a &str: "core"
Got a &str: "std"
Got a &str: "std"
```
Fix const core::panic!(non_literal_str).
Invocations of `core::panic!(x)` where `x` is not a string literal expand to `panic!("{}", x)`, which is not understood by the const panic logic right now. This adds `panic_str` as a lang item, and modifies the const eval implementation to hook into this item as well.
This fixes the issue mentioned here: https://github.com/rust-lang/rust/issues/51999#issuecomment-687604248
r? `@RalfJung`
`@rustbot` modify labels: +A-const-eval
revise Hermit's mutex interface to support the behaviour of StaticMutex
rust-lang/rust#77147 simplifies things by splitting this Mutex type into two types matching the two use cases: StaticMutex and MovableMutex. To support the new behavior of StaticMutex, we move part of the mutex implementation into libstd.
The interface to the OS changed. Consequently, I removed a few functions, which aren't longer needed.
change the order of type arguments on ControlFlow
This allows ControlFlow<BreakType> which is much more ergonomic for common iterator combinator use cases.
Addresses one component of #75744
Check for exhaustion in RangeInclusive::contains and slicing
When a range has finished iteration, `is_empty` returns true, so it
should also be the case that `contains` returns false.
Fixes#77941.
add `insert` to `Option`
This removes a cause of `unwrap` and code complexity.
This allows replacing
```
option_value = Some(build());
option_value.as_mut().unwrap()
```
with
```
option_value.insert(build())
```
It's also useful in contexts not requiring the mutability of the reference.
Here's a typical cache example:
```
let checked_cache = cache.as_ref().filter(|e| e.is_valid());
let content = match checked_cache {
Some(e) => &e.content,
None => {
cache = Some(compute_cache_entry());
// unwrap is OK because we just filled the option
&cache.as_ref().unwrap().content
}
};
```
It can be changed into
```
let checked_cache = cache.as_ref().filter(|e| e.is_valid());
let content = match checked_cache {
Some(e) => &e.content,
None => &cache.insert(compute_cache_entry()).content,
};
```
*(edited: I removed `insert_with`)*
This removes a cause of `unwrap` and code complexity.
This allows replacing
```
option_value = Some(build());
option_value.as_mut().unwrap()
```
with
```
option_value.insert(build())
```
or
```
option_value.insert_with(build)
```
It's also useful in contexts not requiring the mutability of the reference.
Here's a typical cache example:
```
let checked_cache = cache.as_ref().filter(|e| e.is_valid());
let content = match checked_cache {
Some(e) => &e.content,
None => {
cache = Some(compute_cache_entry());
// unwrap is OK because we just filled the option
&cache.as_ref().unwrap().content
}
};
```
It can be changed into
```
let checked_cache = cache.as_ref().filter(|e| e.is_valid());
let content = match checked_cache {
Some(e) => &e.content,
None => &cache.insert_with(compute_cache_entry).content,
};
```
Doc formating consistency between slice sort and sort_unstable, and big O notation consistency
Updated documentation for slice sorting methods to be consistent between stable and unstable versions, which just ended up being minor formatting differences.
I also went through and updated any doc comments with big O notation to be consistent with #74010 by italicizing them rather than having them in a code block.
Fixing escaping to ensure generation of welformed json.
doc tests' json name have a filename in them. When json test output is asked for on windows currently produces invalid json.
Tracking issue for json test output: #49359
Implement TryFrom between NonZero types.
This will instantly be stable, as trait implementations for stable types and traits can not be `#[unstable]`.
Closes#77258.
@rustbot modify labels: +T-libs