Commit Graph

271076 Commits

Author SHA1 Message Date
bors
3bc6916f4c Auto merge of #132965 - mati865:cfguard-gnullvm, r=wesleywiser
allow CFGuard on windows-gnullvm

No unit tests because of https://github.com/rust-lang/rust/issues/132278
2024-11-15 00:21:07 +00:00
liushuyu
0733ed77d1
tests/run-make/simd-ffi: use a generic LLVM intrinsics ...
... to do more comprehensive type checking
2024-11-14 15:49:51 -07:00
Trevor Gross
5d818914af Always inline function signatures containing f16 or f128
There are a handful of tier 2 and tier 3 targets that cause an LLVM crash
or linker error when generating code that contains `f16` or `f128`. The
cranelift backend also does not support these types. To work around
this, every function in `std` or `core` that contains these types must
be marked `#[inline]` in order to avoid sending any code to the backend
unless specifically requested.

However, this is inconvenient and easy to forget. Introduce a check for
these types in the frontend that automatically inlines any function
signatures that take or return `f16` or `f128`.

Note that this is not a perfect fix because it does not account for the
types being passed by reference or as members of aggregate types, but
this is sufficient for what is currently needed in the standard library.
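
As a rough illustration of the limitation described above, here is a toy model of a by-value signature check. The `Ty` and `FnSig` types and the `should_force_inline` function are hypothetical stand-ins invented for this sketch; they are not the compiler's actual data structures or logic.

```rust
/// Hypothetical, heavily simplified type representation (not rustc's).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    F16,
    F32,
    F64,
    F128,
    /// A reference to another type, e.g. `&f16`.
    Ref(Box<Ty>),
    /// An aggregate containing other types, e.g. `(f16, f32)`.
    Tuple(Vec<Ty>),
}

/// Hypothetical function signature: argument types plus a return type.
struct FnSig {
    inputs: Vec<Ty>,
    output: Ty,
}

/// Only a direct `f16` or `f128` counts; nothing is looked through.
fn is_f16_or_f128(ty: &Ty) -> bool {
    matches!(ty, Ty::F16 | Ty::F128)
}

/// Shallow check: true if any argument or the return type is
/// `f16`/`f128` passed by value.
fn should_force_inline(sig: &FnSig) -> bool {
    sig.inputs.iter().any(is_f16_or_f128) || is_f16_or_f128(&sig.output)
}
```

In this model a signature taking `Ty::F16` by value is flagged, while one taking `Ty::Ref(Box::new(Ty::F16))` or a tuple containing `Ty::F16` is not, which mirrors the "not a perfect fix" caveat.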

Fixes: https://github.com/rust-lang/rust/issues/133035
Closes: https://github.com/rust-lang/rust/pull/133037
2024-11-14 16:18:41 -06:00
Trevor Gross
b77dbbd4fd Pass f16 and f128 by value in const_assert!
These types are currently passed by reference, which does not avoid the
backend crashes. Change these back to being passed by value, which makes
the types easier to detect for automatic inlining.
2024-11-14 16:09:45 -06:00
bors
e84902d35a Auto merge of #133047 - matthiaskrgr:rollup-9se1vth, r=matthiaskrgr
Rollup of 4 pull requests

Successful merges:

 - #128197 (Skip locking span interner for some syntax context checks)
 - #133040 ([rustdoc] Fix handling of footnote reference in footnote definition)
 - #133043 (rustdoc-search: case-sensitive only when capitals are used)
 - #133046 (Clippy subtree update)

r? `@ghost`
`@rustbot` modify labels: rollup
2024-11-14 21:09:28 +00:00
liushuyu
ede8a74f1e tests/run-make/simd-ffi: fix test crashing on x86 targets ...
... that do not have SSE2 support (e.g. i586)
2024-11-14 14:07:47 -07:00
maxcabrajac
9fde49b338 Change visit_precise_capturing_arg so it returns a Self::Result 2024-11-14 17:07:46 -03:00
cyrgani
7711ba2d14 use &raw in {read, write}_unaligned documentation 2024-11-14 21:04:30 +01:00
Matthias Krüger
d6a9ded560
Rollup merge of #133046 - flip1995:clippy-subtree-update, r=Manishearth
Clippy subtree update

r? `@Manishearth`

Smaller sync today, as the last sync was delayed by a week.
2024-11-14 20:45:16 +01:00
Matthias Krüger
8912909b98
Rollup merge of #133043 - notriddle:master, r=fmease
rustdoc-search: case-sensitive only when capitals are used

This is the "smartcase" behavior, described by vim and dtolnay.
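
A minimal sketch of the smartcase rule, assuming plain substring matching for simplicity (rustdoc's real search is more involved, and `smartcase_matches` is a hypothetical name used only here):

```rust
/// Smartcase: an all-lowercase query matches case-insensitively, but a
/// query containing any uppercase character must match exactly.
fn smartcase_matches(query: &str, name: &str) -> bool {
    if query.chars().any(char::is_uppercase) {
        // The user typed a capital, so case is treated as significant.
        name.contains(query)
    } else {
        // No capitals in the query: ignore case on both sides.
        name.to_lowercase().contains(&query.to_lowercase())
    }
}
```

With this rule, the query `vec` finds `Vec`, while the query `Vec` does not find `vec`.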

Fixes https://github.com/rust-lang/rust/issues/133017
2024-11-14 20:45:15 +01:00
Matthias Krüger
dd61213be4
Rollup merge of #133040 - GuillaumeGomez:footnote-ref-in-def, r=notriddle
[rustdoc] Fix handling of footnote reference in footnote definition

Fixes https://github.com/rust-lang/rust/issues/131946.

We didn't check whether a footnote definition itself contained a footnote reference.

r? `@notriddle`
2024-11-14 20:45:13 +01:00
Matthias Krüger
e158303c09
Rollup merge of #128197 - Alexendoo:span-ctxt, r=davidtwco
Skip locking span interner for some syntax context checks

- `from_expansion` now never needs to consult the interner
- `eq_ctxt` now only needs the interner when both spans are fully interned
2024-11-14 20:45:12 +01:00
Philipp Krones
35c3b25321
Merge commit '786fbd6d683933cd0e567fdcd25d449a69b4320c' into clippy-subtree-update 2024-11-14 19:35:26 +01:00
Michael Howell
32500aa8e0 rustdoc-search: case-sensitive only when capitals are used
This is the "smartcase" behavior, described by vim and dtolnay.
2024-11-14 11:10:14 -07:00
Philipp Krones
786fbd6d68
Rustup (#13687)
r? @ghost

changelog: none
2024-11-14 17:32:56 +00:00
Philipp Krones
99ef36938a
Bump nightly version -> 2024-11-14 2024-11-14 18:27:46 +01:00
Philipp Krones
c166ee1fc8
Merge remote-tracking branch 'upstream/master' into rustup 2024-11-14 18:27:35 +01:00
bors
90ab8eaedd Auto merge of #133039 - GuillaumeGomez:rollup-i223onq, r=GuillaumeGomez
Rollup of 5 pull requests

Successful merges:

 - #132172 (borrowck diagnostics: suggest borrowing function inputs in generic positions)
 - #132649 (add ./x clippy ci)
 - #133005 (rustdoc: use a trie for name-based search)
 - #133034 (update download-rustc comments and default)
 - #133036 (add myself into `users_on_vacation` on triagebot)

r? `@ghost`
`@rustbot` modify labels: rollup
2024-11-14 17:22:10 +00:00
Guillaume Gomez
052d40adb3 Add regression test for #131946 2024-11-14 17:01:29 +01:00
Guillaume Gomez
1d2f9118fe Fix handling of footnote reference in footnote definition 2024-11-14 17:01:09 +01:00
Xing Xue
467ce2695a Include the "unwind" crate to link with libunwind instead of the "libc" crate. 2024-11-14 10:51:28 -05:00
Guillaume Gomez
35214ebf32
Rollup merge of #133036 - onur-ozkan:vacation, r=jieyouxu
add myself into `users_on_vacation` on triagebot

I will be on vacation for about 10 days.
2024-11-14 15:16:18 +01:00
Guillaume Gomez
19d0e60cf2
Rollup merge of #133034 - onur-ozkan:new-default, r=jieyouxu
update download-rustc comments and default

See https://github.com/rust-lang/rust/pull/132872#issuecomment-2476135053
2024-11-14 15:16:16 +01:00
Guillaume Gomez
fc7ca70013
Rollup merge of #133005 - notriddle:notriddle/trie-search, r=GuillaumeGomez
rustdoc: use a trie for name-based search

Potentially https://github.com/rust-lang/rust/issues/131156 — need to try reproducing the problem with `windows`

Preview and profiler results
----------------------------

Here's some quick profiling in Firefox done on the rust compiler docs:

- Before: https://share.firefox.dev/3UPm3M8
- After: https://share.firefox.dev/40LXvYb

Here are the results for the node.js profiler:

- https://notriddle.com/rustdoc-html-demo-15/trie-perf/index.html

Here's a copy that you can use to try it out. Compare it with [the nightly]. Try typing `typecheckercontext` one character at a time, slowly.

- https://notriddle.com/rustdoc-html-demo-15/compiler-doc-trie/index.html

[the nightly]: https://doc.rust-lang.org/nightly/nightly-rustc/

The fuzzy match algo is based on [Fast String Correction with Levenshtein-Automata] and the corresponding implementation code in [moman] and [Lucene]; the bit-packing representation comes from Lucene, but the actual matcher is more based on `fsc.py`. As suggested in the paper, a trie is used to represent the FSA dictionary.

The same trie is used for prefix matching. Substring matching is done with a side table of three-character[^1] windows that point into the trie.

[Fast String Correction with Levenshtein-Automata]: https://github.com/tpn/pdfs/blob/master/Fast%20String%20Correction%20with%20Levenshtein-Automata%20(2002)%20(10.1.1.16.652).pdf
[Lucene]: https://fossies.org/linux/lucene/lucene/core/src/java/org/apache/lucene/util/automaton/Lev1TParametricDescription.java
[moman]: https://gitlab.com/notriddle/moman-rustdoc

User-visible changes
--------------------

I don't expect anybody to notice anything, but it does cause a few changes:

- Substring matches, in the middle of a name, only apply if there are three or more characters in the search query.
- Levenshtein distance limit now maxes out at two. In the old version, the limit was w/3, so you could get looser matches for queries with 9 or more characters[^1] in them.
- It uses more RAM.
- It's faster (assuming you don't swap thrash).

[^1]: technically utf-16 code units
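
The two matching rules above can be sketched as follows. This is an illustrative reading of the description, not code from rustdoc: the function names are hypothetical, and capping the new limit as `min(w / 3, 2)` is an assumption drawn from "maxes out at two".

```rust
/// Query length measured in UTF-16 code units, per the footnote.
fn utf16_len(query: &str) -> usize {
    query.encode_utf16().count()
}

/// Old rule as described: the allowed edit distance is w / 3.
fn old_distance_limit(query: &str) -> usize {
    utf16_len(query) / 3
}

/// New rule (assumed shape): the same growth, capped at two.
fn new_distance_limit(query: &str) -> usize {
    (utf16_len(query) / 3).min(2)
}

/// Substring matching only applies to queries of three or more units.
fn substring_matching_applies(query: &str) -> bool {
    utf16_len(query) >= 3
}
```

The rules agree for queries shorter than nine units and diverge from there: a nine-unit query was allowed three edits under the old rule and is allowed two under the new one.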
2024-11-14 15:16:14 +01:00
Guillaume Gomez
1a1efafc64
Rollup merge of #132649 - klensy:pa-clippy-ci, r=onur-ozkan
add ./x clippy ci

This is a rebase of https://github.com/rust-lang/rust/pull/126321

also https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/enable.20more.20clippy.20lints.20for.20compiler.20.28and.5Cor.20std.29 for context
2024-11-14 15:16:11 +01:00
Guillaume Gomez
5ee347ece4
Rollup merge of #132172 - dianne:suggest-borrow-generic, r=matthewjasper
borrowck diagnostics: suggest borrowing function inputs in generic positions

# Summary
This generalizes borrowck's existing suggestions to borrow instead of moving when passing by-value to a function that's generic in that input. Previously, this was special-cased to `AsRef`/`Borrow`-like traits and `Fn`-like traits. This PR changes it to test if, for a moved place with type `T`, that the callee's signature and clauses don't break if you substitute in `&T` or `&mut T`. For instance, it now works with `Read`/`Write`-like traits.
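
For illustration, this is the kind of call the suggestion targets, shown in its already-fixed form. The helper names `read_prefix` and `demo` are hypothetical. Passing `cursor` by value to `read_prefix` would move it and make the later `read_to_end` call a use-after-move error; because `&mut R` implements `Read` whenever `R` does, borrowing keeps the callee's bounds satisfied.

```rust
use std::io::{Cursor, Read};

/// Generic over any reader, taken by value.
fn read_prefix<R: Read>(mut reader: R, len: usize) -> Vec<u8> {
    let mut buf = vec![0u8; len];
    reader
        .read_exact(&mut buf)
        .expect("the in-memory source is long enough");
    buf
}

fn demo() -> (Vec<u8>, Vec<u8>) {
    let mut cursor = Cursor::new(b"hello world".to_vec());
    // `&mut cursor` instead of `cursor`: the reader is borrowed, not moved.
    let first = read_prefix(&mut cursor, 5);
    // `cursor` is still usable here and continues where the read stopped.
    let mut rest = Vec::new();
    cursor
        .read_to_end(&mut rest)
        .expect("reading from memory does not fail");
    (first, rest)
}
```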

Fixes https://github.com/rust-lang/rust/issues/131413

# Incidental changes
- No longer spuriously suggests mutable borrows of closures in some situations (see e.g. the tests in [tests/ui/closures/2229_closure_analysis/](https://github.com/rust-lang/rust/compare/master...dianne:rust:suggest-borrow-generic?expand=1#diff-8dfb200c559f0995d0f2ffa2f23bc6f8041b263e264e5c329a1f4171769787c0)).
- No longer suggests cloning closures that implement `Fn`, since they can be borrowed (see e.g. [tests/ui/moves/borrow-closures-instead-of-move.stderr](https://github.com/rust-lang/rust/compare/master...dianne:rust:suggest-borrow-generic?expand=1#diff-5db268aac405eec56d099a72d8b58ac46dab523cf013e29008104840168577fb)).

This keeps the behavior to suppress suggestions of `fn_once.clone()()`. I think it might make sense to suggest it with a "but this might not be your desired behavior" caveat, as is done when something is used after being consumed as the receiver for a method call. That's probably out of the scope of this PR though.

# Limitations and possible improvements
- This doesn't work for receivers of method calls. This is a small change, and I have it implemented locally, but I'm not sure it's useful on its own. In most cases I've found borrowing the receiver would change the call's output type (see below). In other cases (e.g. `Iterator::sum`), borrowing the receiver isn't useful since it's consumed.
- This doesn't work when it would change the call's output type. In general, I figure inserting references into the output type is an unwanted change. However, this also means it doesn't work in cases where the new output type would be effectively the same as the old one. For example, from the rand crate, the iterator returned by [`Rng::sample_iter`](https://docs.rs/rand/latest/rand/trait.Rng.html#method.sample_iter) is effectively the same (modulo regions) whether you borrow or consume the receiver `Rng`, so common usage involves borrowing it. I'm not sure whether the best approach is to add a more complex check of approximate equivalence, to forego checking the call's output type and give spurious suggestions, or to leave it as-is.
- This doesn't work when it would change the call's other input types. Instead, it could suggest borrowing any others that have the same parameter type (but only when suggesting shared borrows). I think this would be a pretty easy change, but I don't think it's very useful so long as the normalized output type can't change.

I'm happy to implement any of these (or other potential improvements to this), but I'm not sure which are common enough patterns to justify the added complexity. Please let me know if any sound worthwhile.
2024-11-14 15:16:08 +01:00
bors
c82e0dff84 Auto merge of #132709 - programmerjake:optimize-charto_digit, r=joshtriplett
optimize char::to_digit and assert radix is at least 2

approved by t-libs: https://github.com/rust-lang/libs-team/issues/475#issuecomment-2457858458

let me know if this needs an assembly test or similar.
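
For context, a small usage sketch of `char::to_digit`. The `parse_digits` helper is hypothetical and only shows the API being called with in-range radices; it says nothing about the optimized implementation itself.

```rust
/// Parses `text` as an unsigned number in the given radix, returning
/// `None` on an invalid digit or overflow. An empty string yields 0.
/// The caller is expected to pass a radix in 2..=36; `to_digit` panics
/// for a radix above 36, and per this change also for a radix below 2.
fn parse_digits(text: &str, radix: u32) -> Option<u32> {
    let mut value: u32 = 0;
    for c in text.chars() {
        // `to_digit` returns `None` when `c` is not a digit in `radix`.
        let digit = c.to_digit(radix)?;
        value = value.checked_mul(radix)?.checked_add(digit)?;
    }
    Some(value)
}
```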
2024-11-14 14:14:40 +00:00
onur-ozkan
2b8c345393 add myself into users_on_vacation on triagebot
Signed-off-by: onur-ozkan <work@onurozkan.dev>
2024-11-14 15:06:12 +03:00
onur-ozkan
49f9b4b4de update download-rustc comments and default
Signed-off-by: onur-ozkan <work@onurozkan.dev>
2024-11-14 14:56:48 +03:00
bors
a4cedecc9e Auto merge of #133032 - GuillaumeGomez:rollup-vqakdmw, r=GuillaumeGomez
Rollup of 5 pull requests

Successful merges:

 - #132010 (ci: Enable full `debuginfo-level=2` in `DEPLOY_ALT`)
 - #132310 (compiletest: add `max-llvm-major-version` directive)
 - #132773 (PassWrapper: disable UseOdrIndicator for Asan Win32)
 - #133013 (compiletest: known-bug / crashes: allow for an "auxiliary" directory to contain files that do not have a "known-bug" directive)
 - #133027 (Fix a copy-paste issue in the NuttX raw type definition)

r? `@ghost`
`@rustbot` modify labels: rollup
2024-11-14 10:59:49 +00:00
Guillaume Gomez
e6cd8699ea
Rollup merge of #133027 - no1wudi:master, r=jhpratt
Fix a copy-paste issue in the NuttX raw type definition

This file was copied from the RTEMS one as the initial implementation, and the OS name in the comment was not updated.
2024-11-14 18:26:16 +08:00
Guillaume Gomez
6a783a4c5e
Rollup merge of #133013 - matthiaskrgr:crash_aux, r=onur-ozkan
compiletest: known-bug / crashes: allow for an "auxiliary" directory to contain files that do not have a "known-bug" directive

Fixes #133009

r? `@jieyouxu`
2024-11-14 18:26:16 +08:00
Guillaume Gomez
e3c76c5699
Rollup merge of #132773 - jakos-sec:fix-asan-win32, r=jieyouxu
PassWrapper: disable UseOdrIndicator for Asan Win32

As described in https://reviews.llvm.org/D137227 UseOdrIndicator should be disabled on Windows since link.exe does not support duplicate weak definitions.

Fixes https://github.com/rust-lang/rust/issues/124390.

Credit also goes to `@1c3t3a`, who worked with me on this.
We are currently testing this on a Windows machine.
2024-11-14 18:26:15 +08:00
Guillaume Gomez
475203f098
Rollup merge of #132310 - jieyouxu:max-llvm-version, r=onur-ozkan
compiletest: add `max-llvm-major-version` directive

To complement existing `min-llvm-version` so contributors don't have to use `ignore-llvm-version: 20 - 99` to emulate `max-llvm-major-version: 19`.
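
The equivalence between the two spellings can be modeled as below. This is a sketch of the intended semantics as described in this message, restricted to major versions; the function names are hypothetical and this is not compiletest's implementation.

```rust
/// Model of `ignore-llvm-version: lo - hi`: the test is ignored when
/// the LLVM major version falls inside the inclusive range.
fn ignored_by_range(llvm_major: u32, lo: u32, hi: u32) -> bool {
    (lo..=hi).contains(&llvm_major)
}

/// Model of `max-llvm-major-version: max`: the test is ignored when
/// the LLVM major version is above `max`.
fn ignored_by_max(llvm_major: u32, max: u32) -> bool {
    llvm_major > max
}
```

For every major version up to 99, `ignored_by_range(v, 20, 99)` and `ignored_by_max(v, 19)` agree; the new directive simply drops the arbitrary upper bound.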

Closes #132305.
cc `@workingjubilee` who suggested this.

### Implementation steps

- [x] 1. Implement the directive (this PR)
- [x] 2. Open an accompanying dev-guide PR to describe the directive (https://github.com/rust-lang/rustc-dev-guide/pull/2129)

r? bootstrap
2024-11-14 18:26:15 +08:00
Guillaume Gomez
bce5fa62ab
Rollup merge of #132010 - cuviper:alt-full-debuginfo, r=Mark-Simulacrum
ci: Enable full `debuginfo-level=2` in `DEPLOY_ALT`

It will be slower to build and produce larger artifacts, but hopefully
it will help catch debuginfo regressions sooner, especially for problems
that LLVM assertions would uncover.

try-job: dist-x86_64-linux
try-job: dist-x86_64-linux-alt
2024-11-14 18:26:14 +08:00
许杰友 Jieyou Xu (Joe)
91fa16b211 tests: use max-llvm-major-version instead of ignore-llvm-version range like N - 99
For tests that use `ignore-llvm-version: N - M`, replace that with
`max-llvm-major-version: N-1`.
2024-11-14 17:44:54 +08:00
许杰友 Jieyou Xu (Joe)
7eee9faea1 compiletest: add max-llvm-major-version directive
There's already `min-llvm-version`, and contributors were using
`ignore-llvm-version: 20 - 99` to emulate `max-llvm-major-version: 19`.
2024-11-14 17:44:04 +08:00
Luca Versari
3d3b515707 ABI checks: add support for some tier3 arches, warn on others. 2024-11-14 08:57:39 +01:00
bors
dae7ac133b Auto merge of #133026 - workingjubilee:rollup-q8ig6ah, r=workingjubilee
Rollup of 7 pull requests

Successful merges:

 - #131304 (float types: move copysign, abs, signum to libcore)
 - #132907 (Change intrinsic declarations to new style)
 - #132971 (Handle infer vars in anon consts on stable)
 - #133003 (Make `CloneToUninit` dyn-compatible)
 - #133004 (btree: simplify the backdoor between set and map)
 - #133008 (update outdated comment about test-float-parse)
 - #133012 (Add test cases for #125918)

r? `@ghost`
`@rustbot` modify labels: rollup
2024-11-14 07:07:53 +00:00
Huang Qi
79e2af285a Fix a copy-paste issue in the NuttX raw type definition
This file was copied from the RTEMS one as the initial implementation,
and the OS name in the comment was not updated.

Signed-off-by: Huang Qi <huangqi3@xiaomi.com>
2024-11-14 14:50:30 +08:00
Jubilee
18136cf0d8
Rollup merge of #133012 - Eclips4:issue-125670, r=compiler-errors
Add test cases for #125918

Closes #125670

r? `@jieyouxu`
2024-11-13 22:43:39 -08:00
Jubilee
d21a53d836
Rollup merge of #133008 - onur-ozkan:update-outdated-comment, r=jieyouxu
update outdated comment about test-float-parse

It's no longer a Python program since https://github.com/rust-lang/rust/pull/127510.
2024-11-13 22:43:38 -08:00
Jubilee
966b8930e3
Rollup merge of #133004 - cuviper:unrecover-btree, r=ibraheemdev
btree: simplify the backdoor between set and map

The internal `btree::Recover` trait acted as a private API between
`BTreeSet` and `BTreeMap`, but we can use `pub(_)` restrictions these
days, and some of the methods don't need special handling anymore.

* `BTreeSet::get` can use `BTreeMap::get_key_value`
* `BTreeSet::take` can use `BTreeMap::remove_entry`
* `BTreeSet::replace` does need help, but this now uses a `pub(super)`
  method on `BTreeMap` instead of the trait.
* `btree::Recover` is now removed.
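
A minimal sketch of the first two bullets, using a hypothetical `ToySet` wrapper around `BTreeMap<T, ()>`. It is not the standard library's `BTreeSet`, only an illustration of how the public map methods already return the stored key.

```rust
use std::borrow::Borrow;
use std::collections::BTreeMap;

/// Toy set storing its elements as the keys of a map.
struct ToySet<T> {
    map: BTreeMap<T, ()>,
}

impl<T: Ord> ToySet<T> {
    fn new() -> Self {
        ToySet { map: BTreeMap::new() }
    }

    /// Returns true if the value was not already present.
    fn insert(&mut self, value: T) -> bool {
        self.map.insert(value, ()).is_none()
    }

    /// `get` needs the stored key, which `get_key_value` provides.
    fn get<Q>(&self, value: &Q) -> Option<&T>
    where
        T: Borrow<Q>,
        Q: Ord + ?Sized,
    {
        self.map.get_key_value(value).map(|(key, _)| key)
    }

    /// `take` needs the owned key, which `remove_entry` provides.
    fn take<Q>(&mut self, value: &Q) -> Option<T>
    where
        T: Borrow<Q>,
        Q: Ord + ?Sized,
    {
        self.map.remove_entry(value).map(|(key, _)| key)
    }
}
```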
2024-11-13 22:43:38 -08:00
Jubilee
17dcadd587
Rollup merge of #133003 - zachs18:clonetouninit-dyn-compat-u8, r=dtolnay
Make `CloneToUninit` dyn-compatible

Make `CloneToUninit` dyn-compatible, by making `clone_to_uninit`'s `dst` parameter `*mut u8` instead of `*mut Self`, so the method does not reference `Self` except in the `self` parameter and is thus dispatchable from a trait object.

This allows, among other things, adding `CloneToUninit` as a supertrait bound for `trait Foo` to allow cloning `dyn Foo` in some containers. Currently, this means that `Rc::make_mut` and `Arc::make_mut` can work with `dyn Foo` where `trait Foo: CloneToUninit`.

<details><summary>Example</summary>

```rs
#![feature(clone_to_uninit)]
use std::clone::CloneToUninit;
use std::rc::Rc;
use std::fmt::Debug;
use std::borrow::BorrowMut;

trait Foo: BorrowMut<u32> + CloneToUninit + Debug {}

impl<T: BorrowMut<u32> + CloneToUninit + Debug> Foo for T {}

fn main() {
    let foo: Rc<dyn Foo> = Rc::new(42_u32);
    let mut bar = foo.clone();
    *Rc::make_mut(&mut bar).borrow_mut() = 37;
    dbg!(foo, bar); // 42, 37
}
```

</details>

Eventually, `Box::<T>::clone` is planned to be converted to use `T::clone_to_uninit`, which when combined with this change, will allow cloning `Box<dyn Foo>` where `trait Foo: CloneToUninit` without any additional `unsafe` code for the author of `trait Foo`.[^1]

This PR should have no stable side-effects, as `CloneToUninit` is unstable so cannot be mentioned on stable, and `CloneToUninit` is not used as a supertrait anywhere in the stdlib.

This change removes some length checks that could only fail if library UB was already hit (e.g. calling `<[T]>::clone_to_uninit` with a too-small-length `dst` is library UB and was previously detected[^2]; since `dst` does not have a length anymore, this now cannot be detected[^3]).

r? libs-api

-----

I chose to make the parameter `*mut u8` instead of `*mut ()` because that might make it simpler to pass the result of `alloc` to `clone_to_uninit`, but `*mut ()` would also make sense, and any `*mut ConcreteType` would *work*. The original motivation for [using specifically `*mut ()`](https://github.com/rust-lang/rust/pull/116113#discussion_r1335303908) appears to be `std::ptr::from_raw_parts_mut`, but that now [takes `*mut impl Thin`](https://doc.rust-lang.org/nightly/std/ptr/fn.from_raw_parts.html) instead of `*mut ()`. I have another branch where the parameter is `*mut ()`, if that is preferred.

It *could* also take something like `&mut [MaybeUninit<u8>]` to be dyn-compatible but still allow size-checking and in some cases safe writing, but this is already an `unsafe` API where misuse is UB, so I'm not sure how many guardrails it's worth adding here, and `&mut [MaybeUninit<u8>]` might be overly cumbersome to construct for callers compared to `*mut u8`.

[^1]:  Note that  `impl<T: CloneToUninit + ?Sized> Clone for Box` must be added before or at the same time as when `CloneToUninit` becomes stable, due to `Box` being `#[fundamental]`, as if there is any stable gap between the stabilization of `CloneToUninit` and `impl<T: CloneToUninit + ?Sized> Clone for Box`, then users could implement both `CloneToUninit for dyn LocalTrait` and separately `Clone for Box<dyn LocalTrait>` during that gap, and be broken by the introduction of  `impl<T: CloneToUninit + ?Sized> Clone for Box`.

[^2]: Using a `debug_assert_eq` in [`core::clone::uninit::CopySpec::clone_slice`](https://doc.rust-lang.org/nightly/src/core/clone/uninit.rs.html#28).

[^3]: This PR just uses [the metadata (length) from `self`](e0c1c8bc50/library/core/src/clone.rs (L286)) to construct the `*mut [T]` to pass to `CopySpec::clone_slice` in `<[T]>::clone_to_uninit`.
2024-11-13 22:43:37 -08:00
Jubilee
aa189460b8
Rollup merge of #132971 - BoxyUwU:handle_infers_in_anon_consts, r=compiler-errors
Handle infer vars in anon consts on stable

Fixes #132955

Diagnostics will sometimes try to replace generic parameters with inference variables in failing goals. This means that if we have some failing goal with an array repeat expr count anon const in it, we will wind up with some `ty::ConstKind::Unevaluated(anon_const_def, [?x])` during diagnostics which will then ICE if we do not handle inference variables correctly on stable when normalizing type system consts.

r? `@compiler-errors`
2024-11-13 22:43:37 -08:00
Jubilee
55e05f240b
Rollup merge of #132907 - BLANKatGITHUB:intrinsic, r=saethlin
Change intrinsic declarations to new style

This PR is for issue #132735.
This changes the first `extern "rust-intrinsic"` block to the new style.
r? `@RalfJung`
2024-11-13 22:43:36 -08:00
Jubilee
52913653dd
Rollup merge of #131304 - RalfJung:float-core, r=tgross35
float types: move copysign, abs, signum to libcore

These operations are explicitly specified to act "bitwise", i.e. they just act on the sign bit and do not even quiet signaling NaNs. We also list them as ["non-arithmetic operations"](https://doc.rust-lang.org/nightly/std/primitive.f32.html#nan-bit-patterns), and all the other non-arithmetic operations are in libcore. There's no reason to expect them to require any sort of runtime support, and from [these experiments](https://github.com/rust-lang/rust/issues/50145#issuecomment-997301250) it seems like LLVM indeed compiles them in a way that does not require any sort of runtime support.
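
To make the "bitwise" claim concrete, here is a sketch of `abs` and `copysign` for `f32` written purely as sign-bit manipulation. The helper names are hypothetical and this is not the libcore implementation; it only shows that no arithmetic or runtime support is involved.

```rust
/// The sign bit of an IEEE 754 binary32 value.
const SIGN_MASK: u32 = 0x8000_0000;

/// Absolute value: clear the sign bit and leave every other bit alone.
fn abs_bits(x: f32) -> f32 {
    f32::from_bits(x.to_bits() & !SIGN_MASK)
}

/// Copysign: take the magnitude bits from one value and the sign bit
/// from the other.
fn copysign_bits(magnitude: f32, sign: f32) -> f32 {
    f32::from_bits((magnitude.to_bits() & !SIGN_MASK) | (sign.to_bits() & SIGN_MASK))
}
```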

Nominating for `@rust-lang/libs-api` since this change takes immediate effect on stable.

Part of https://github.com/rust-lang/rust/issues/50145.
2024-11-13 22:43:35 -08:00
Maybe Lapkin
673bb5e3ff
Mark never_type_fallback_flowing_into_unsafe as a semantic change
...rather than a future error
2024-11-14 06:01:14 +01:00
dianne
2ab8480605 Suggest borrowing arguments in generic positions when trait bounds are satisfied
This subsumes the suggestions to borrow arguments with `AsRef`/`Borrow` bounds and those to borrow
arguments with `Fn` and `FnMut` bounds. It works for other traits implemented on references as well,
such as `std::io::Read`, `std::io::Write`, and `core::fmt::Write`.

Incidentally, by making the logic for suggesting borrowing closures general, this removes some
spurious suggestions to mutably borrow `FnMut` closures in assignments, as well as an unhelpful
suggestion to add a `Clone` constraint to an `impl Fn` argument.
2024-11-13 20:29:40 -08:00
bors
22bcb81c66 Auto merge of #122770 - iximeow:ixi/int-formatting-optimization, r=workingjubilee
improve codegen of fmt_num to delete unreachable panic

it seems LLVM doesn't realize that `curr` is always decremented at least once in either loop formatting characters of the input string by their appropriate radix, and so the later `&buf[curr..]` generates a check for out-of-bounds access and panic. this is unreachable in reality as even for `x == T::zero()` we'll produce at least the character `Self::digit(T::zero())`, yielding at least one character output, and `curr` will always be at least one below `buf.len()`.

adjust `fmt_int` to make this fact more obvious to the compiler, which fortunately (or unfortunately) results in a measurable performance improvement for workloads heavy on formatting integers.
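
a simplified model of the loop shape being discussed, assuming a fixed hex radix and a `u32` input. `to_lower_hex` is a hypothetical standalone function, not the `fmt_int` code in libcore; the point is only that the body runs before the exit test, so `curr` is decremented at least once and the final slice is never out of bounds.

```rust
/// Formats `x` as lowercase hex by filling a buffer from the end.
fn to_lower_hex(mut x: u32) -> String {
    // Eight hex digits are enough for any u32.
    let mut buf = [0u8; 8];
    let mut curr = buf.len();
    loop {
        let digit = (x & 0xf) as u8;
        // Runs at least once, even for x == 0, so curr < buf.len() below.
        curr -= 1;
        buf[curr] = if digit < 10 {
            b'0' + digit
        } else {
            b'a' + (digit - 10)
        };
        x >>= 4;
        if x == 0 {
            // No more digits left to accumulate.
            break;
        }
    }
    // Only ASCII digits were written, so the lossy conversion is exact.
    String::from_utf8_lossy(&buf[curr..]).into_owned()
}
```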

in the program i'd noticed this in, you can see the `cmp $0x80,%rdi; ja 7c` here, which branches to a slice index fail helper:
<img width="660" alt="before" src="https://github.com/rust-lang/rust/assets/4615790/ac482d54-21f8-494b-9c83-4beadc3ca0ef">

where after this change the function is broadly similar, but smaller, with one fewer register updated in each pass through the loop, in addition to the never-taken `cmp/ja` being gone:
<img width="646" alt="after" src="https://github.com/rust-lang/rust/assets/4615790/1bee1d76-b674-43ec-9b21-4587364563aa">

this represents a ~2-3% difference in runtime in my [admittedly comically i32-formatting-bound](https://github.com/athre0z/disas-bench/blob/master/bench/yaxpeax/src/main.rs#L58-L67) use case (printing x86 instructions, including i32 displacements and immediates) as measured on a ryzen 9 3950x.

the impact on `<impl LowerHex for i8>::fmt` is both more dramatic and less impactful: it continues to have a loop that is evaluated at most twice, though the compiler doesn't know that to unroll it. the generated code there is identical to the impl for `i32`. there, the smaller loop body has less effect on runtime, and removing the never-taken slice bounds check is offset by whatever address recalculation is happening with the `lea/add/neg` at the end of the loop. it behaves about the same before and after.

---

i initially measured slightly better outcomes using `unreachable_unchecked()` here instead, but that was hacking on std and rebuilding with `-Z build-std` on an older rustc (nightly 5b377cece, 2023-06-30). it does not yield better outcomes now, so i see no reason to proceed with that approach at all.

<details>
<summary>initial notes about that, seemingly irrelevant on modern rustc</summary>
i went through a few tries at getting llvm to understand the bounds check isn't necessary, but i should mention the _best_ i'd seen here was actually from the existing `fmt_int` with a diff like
```diff
        if x == zero {
            // No more digits left to accumulate.
            break;
        };
    }
}
+
+ if curr >= buf.len() {
+     unsafe { core::hint::unreachable_unchecked(); }
+ }
let buf = &buf[curr..];
```

posting a random PR to `rust-lang/rust` to do that without a really really compelling reason seemed a bit absurd, so i tried to work that into something that seems more palatable at a glance. but if you're interested, that certainly produced better (x86_64) code through LLVM. in that case with `buf.iter_mut().rev()` as the iterator, `<impl LowerHex for i8>::fmt` actually unrolls into something like

```
put_char(x & 0xf);
let mut len = 1;
if x > 0xf {
  put_char((x >> 4) & 0xf);
  len = 2;
}
pad_integral(buf[buf.len() - len..]);
```

it's pretty cool! `<impl LowerHex for i32>::fmt` also was slightly better. that all resulted in closer to a 6% difference in my use case.

</details>

---

i have not looked at formatters other than LowerHex/UpperHex with this change, though i'd be a bit shocked if any were _worse_.

(i have absolutely _no_ idea how you'd regression test this, but that might be just my not knowing what the right tool for that would be in rust-lang/rust. i'm of half a mind that this is small and fiddly enough to not be worth landing lest it quietly regress in the future anyway. but i didn't want to discard the idea without at least offering it upstream here)
2024-11-14 04:17:20 +00:00