Add `Ord::cmp` for primitives as a `BinOp` in MIR
Update: most of this OP was written months ago. See https://github.com/rust-lang/rust/pull/118310#issuecomment-2016940014 below for where we got to recently that made it ready for review.
---
There are dozens of reasonable ways to implement `Ord::cmp` for integers using comparison, bit-ops, and branches. Those differences are irrelevant at the rust level, however, so we can make things better by adding `BinOp::Cmp` at the MIR level:
1. Exactly how to implement it is left up to the backends, so LLVM can use whatever pattern its optimizer best recognizes and cranelift can use whichever pattern codegens the fastest.
2. By not inlining those details for every use of `cmp`, we drastically reduce the amount of MIR generated for `derive`d `PartialOrd`, while also making it more amenable to MIR-level optimizations.
Having extremely careful `if` ordering to μoptimize resource usage on broadwell (#63767) is great, but it really feels to me like libcore is the wrong place to put that logic. Similarly, using subtraction [tricks](https://graphics.stanford.edu/~seander/bithacks.html#CopyIntegerSign) (#105840) is arguably even nicer, but depends on the optimizer understanding it (https://github.com/llvm/llvm-project/issues/73417) to be practical. Or maybe [bitor is better than add](https://discourse.llvm.org/t/representing-in-ir/67369/2?u=scottmcm)? But maybe only on a future version that [has `or disjoint` support](https://discourse.llvm.org/t/rfc-add-or-disjoint-flag/75036?u=scottmcm)? And just because one of those forms happens to be good for LLVM, there's no guarantee that it'd be the same form that GCC or Cranelift would rather see -- especially given their very different optimizers. Not to mention that if LLVM gets a spaceship intrinsic -- [which it should](https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Suboptimal.20inlining.20in.20std.20function.20.60binary_search.60/near/404250586) -- we'll need at least a rustc intrinsic to be able to call it.
As for simplifying it in Rust, we now regularly inline `{integer}::partial_cmp`, but it's quite a large amount of IR. The best way to see that is with 8811efa88b (diff-d134c32d028fbe2bf835fef2df9aca9d13332dd82284ff21ee7ebf717bfa4765R113) -- I added a new pre-codegen MIR test for a simple 3-tuple struct, and this PR change it from 36 locals and 26 basic blocks down to 24 locals and 8 basic blocks. Even better, as soon as the construct-`Some`-then-match-it-in-same-BB noise is cleaned up, this'll expose the `Cmp == 0` branches clearly in MIR, so that an InstCombine (#105808) can simplify that to just a `BinOp::Eq` and thus fix some of our generated code perf issues. (Tracking that through today's `if a < b { Less } else if a == b { Equal } else { Greater }` would be *much* harder.)
---
r? `@ghost`
But first I should check that perf is ok with this
~~...and my true nemesis, tidy.~~
Update to new browser-ui-test version
This new version brings a lot of new internal improvements (mostly around validating the commands input).
It also improved some command names and arguments.
r? `@notriddle`
rustdoc-search: shard the search result descriptions
## Preview
This makes no visual changes to rustdoc search. It's a pure perf improvement.
<details><summary>old</summary>
Preview: <http://notriddle.com/rustdoc-html-demo-10/doc/std/index.html?search=vec>
WebPageTest Comparison with before branch on a sort of worst case (searching `vec`, winds up downloading most of the shards anyway): <https://www.webpagetest.org/video/compare.php?tests=240317_AiDc61_2EM,240317_AiDcM0_2EN>
Waterfall diagram:
![image](https://github.com/rust-lang/rust/assets/1593513/39548f0c-7ad6-411b-abf8-f6668ff4da18)
</details>
Preview: <http://notriddle.com/rustdoc-html-demo-10/doc2/std/index.html?search=vec>
WebPageTest Comparison with before branch on a sort of worst case (searching `vec`, winds up downloading most of the shards anyway): <https://www.webpagetest.org/video/compare.php?tests=240322_BiDcCH_13R,240322_AiDcJY_104>
![image](https://github.com/rust-lang/rust/assets/1593513/4be1f9ff-c3ff-4b96-8f5b-b264df2e662d)
## Description
r? `@GuillaumeGomez`
The descriptions are, on almost all crates[^1], the majority of the size of the search index, even though they aren't really used for searching. This makes it relatively easy to separate them into their own files.
Additionally, this PR pulls out information about whether there's a description into a bitmap. This allows us to sort, truncate, *then* download.
This PR also bumps us to ES8. Out of the browsers we support, all of them support async functions according to caniuse.
https://caniuse.com/async-functions
[^1]:
<https://microsoft.github.io/windows-docs-rs/>, a crate with
44MiB of pure names and no descriptions for them, is an outlier
and should not be counted. But this PR should improve it, by replacing a long line of empty strings with a compressed bitmap with a single Run section. Just not very much.
## Detailed sizes
```console
$ cat test.sh
set -ex
cp ../search-index*.js search-index.js
awk 'FNR==NR {a++;next} FNR<a-3' search-index.js{,} | awk 'NR>1 {gsub(/\],\\$/,""); gsub(/^\["[^"]+",/,""); print} {next}' | sed -E "s:\\\\':':g" > search-index.json
jq -c '.t' search-index.json > t.json
jq -c '.n' search-index.json > n.json
jq -c '.q' search-index.json > q.json
jq -c '.D' search-index.json > D.json
jq -c '.e' search-index.json > e.json
jq -c '.i' search-index.json > i.json
jq -c '.f' search-index.json > f.json
jq -c '.c' search-index.json > c.json
jq -c '.p' search-index.json > p.json
jq -c '.a' search-index.json > a.json
du -hs t.json n.json q.json D.json e.json i.json f.json c.json p.json a.json
$ bash test.sh
+ cp ../search-index1.78.0.js search-index.js
+ awk 'FNR==NR {a++;next} FNR<a-3' search-index.js search-index.js
+ awk 'NR>1 {gsub(/\],\\$/,""); gsub(/^\["[^"]+",/,""); print} {next}'
+ sed -E 's:\\'\'':'\'':g'
+ jq -c .t search-index.json
+ jq -c .n search-index.json
+ jq -c .q search-index.json
+ jq -c .D search-index.json
+ jq -c .e search-index.json
+ jq -c .i search-index.json
+ jq -c .f search-index.json
+ jq -c .c search-index.json
+ jq -c .p search-index.json
+ jq -c .a search-index.json
+ du -hs t.json n.json q.json D.json e.json i.json f.json c.json p.json a.json
64K t.json
800K n.json
8.0K q.json
4.0K D.json
16K e.json
192K i.json
544K f.json
4.0K c.json
36K p.json
20K a.json
```
These are, roughly, the size of each section in the standard library (this tool actually excludes libtest, for parsing-json-with-awk reasons, but libtest is tiny so it's probably not important).
t = item type, like "struct", "free fn", or "type alias". Since one byte is used for every item, this implies that there are approximately 64 thousand items in the standard library.
n = name, and that's now the largest section of the search index with the descriptions removed from it
q = parent *module* path, stored parallel to the items within
D = the size of each description shard, stored as vlq hex numbers
e = empty description bit flags, stored as a roaring bitmap
i = parent *type* index as a link into `p`, stored as decimal json numbers; used only for associated types; might want to switch to vlq hex, since that's shorter, but that would be a separate pr
f = function signature, stored as lists of lists that index into `p`
c = deprecation flag, stored as a roaring bitmap
p = parent *type*, stored separately and linked into from `i` and `f`
a = alias, as [[key, value]] pairs
## Search performance
http://notriddle.com/rustdoc-html-demo-11/perf-shard/index.html
For example, in stm32f4:
<table><thead><tr><th>before<th>after</tr></thead>
<tbody><tr><td>
```
Testing T -> U ... in_args = 0, returned = 0, others = 200
wall time = 617
Testing T, U ... in_args = 0, returned = 0, others = 200
wall time = 198
Testing T -> T ... in_args = 0, returned = 0, others = 200
wall time = 282
Testing crc32 ... in_args = 0, returned = 0, others = 0
wall time = 426
Testing spi::pac ... in_args = 0, returned = 0, others = 0
wall time = 673
```
</td><td>
```
Testing T -> U ... in_args = 0, returned = 0, others = 200
wall time = 716
Testing T, U ... in_args = 0, returned = 0, others = 200
wall time = 207
Testing T -> T ... in_args = 0, returned = 0, others = 200
wall time = 289
Testing crc32 ... in_args = 0, returned = 0, others = 0
wall time = 418
Testing spi::pac ... in_args = 0, returned = 0, others = 0
wall time = 687
```
</td></tr><tr><td>
```
user: 005.345 s
sys: 002.955 s
wall: 006.899 s
child_RSS_high: 583664 KiB
group_mem_high: 557876 KiB
```
</td><td>
```
user: 004.652 s
sys: 000.565 s
wall: 003.865 s
child_RSS_high: 538696 KiB
group_mem_high: 511724 KiB
```
</td></tr>
</table>
This perf tester is janky and unscientific enough that the apparent differences might just be noise. If it's not an order of magnitude, it's probably not real.
## Future possibilities
* Currently, results are not shown until the descriptions are downloaded. Theoretically, the description-less results could be shown. But actually doing that, and making sure it works properly, would require extra work (we have to be careful to avoid layout jumps).
* More than just descriptions can be sharded this way. But we have to be careful to make sure the size wins are worth the round trips. Ideally, data that’s needed only for display should be sharded while data needed for search isn’t.
* [Full text search](https://internals.rust-lang.org/t/full-text-search-for-rustdoc-and-doc-rs/20427) also needs this kind of infrastructure. A good implementation might store a compressed bloom filter in the search index, then download the full keyword in shards. But, we have to be careful not just of the amount readers have to download, but also of the amount that [publishers](https://gist.github.com/notriddle/c289e77f3ed469d1c0238d1d135d49e1) have to store.
rustdoc: heavily simplify the synthesis of auto trait impls
`gd --numstat HEAD~2 HEAD src/librustdoc/clean/auto_trait.rs`
**+315 -705** 🟩🟥🟥🟥⬛
---
As outlined in issue #113015, there are currently 3[^1] large separate routines that “clean” `rustc_middle::ty` data types related to generics & predicates to rustdoc data types. Every single one has their own kinds of bugs. While I've patched a lot of bugs in each of the routines in the past, it's about time to unify them. This PR is only the first in a series. It completely **yanks** the custom “bounds cleaning” of mod `auto_trait` and reuses the routines found in mod `simplify`. As alluded to, `simplify` is also flawed but it's still more complete than `auto_trait`'s routines. [See also my review comment over at `tests/rustdoc/synthetic_auto/bounds.rs`](https://github.com/rust-lang/rust/pull/123340#discussion_r1546900539).
This is preparatory work for rewriting “bounds cleaning” from scratch in follow-up PRs in order to finally [fix] #113015.
Apart from that, I've eliminated all potential sources of *instability* in the rendered output.
See also #119597. I'm pretty sure this fixes#119597.
This PR does not attempt to fix [any other issues related to synthetic auto trait impls](https://github.com/rust-lang/rust/issues?q=is%3Aissue+is%3Aopen+label%3AA-synthetic-impls%20label%3AA-auto-traits).
However, it's definitely meant to be a *stepping stone* by making `auto_trait` more contributor-friendly.
---
* Replace `FxHash{Map,Set}` with `FxIndex{Map,Set}` to guarantee a stable iteration order
* Or as a perf opt, `UnordSet` (a thin wrapper around `FxHashSet`) in cases where we never iterate over the set.
* Yes, we do make use of `swap_remove` but that shouldn't matter since all the callers are deterministic. It does make the output less “predictable” but it's still better than before. Ofc, I rely on `rustc_infer` being deterministic. I hope that holds.
* Utilizing `clean::simplify` over the custom “bounds cleaning” routines wipes out the last reference to `collect_referenced_late_bound_regions` in rustdoc (`simplify` uses `bound_vars`) which was a source of instability / unpredictability (cc #116388)
* Remove the types `RegionTarget` and `RegionDeps` from `librustdoc`. They were duplicates of the identical types found in `rustc`. Just import them from `rustc`. For some reason, they were duplicated when splitting `auto_trait` in two in #49711.
* Get rid of the useless “type namespace” `AutoTraitFinder` in `librustdoc`
* The struct only held a `DocContext`, it was over-engineered
* Turn the associated functions into free ones
* Eliminates rightward drift; increases legibility
* `rustc` also contains a useless `AutoTraitFinder` struct but I plan on removing that in a follow-up PR
* Rename a bunch of methods to be way more descriptive
* Eliminate `use super::*;`
* Lead to `clean/mod.rs` accumulating a lot of unnecessary imports
* Made `auto_traits` less modular
* Eliminate a custom `TypeFolder`: We can just use the rustc helper `fold_regions` which does that for us
I plan on adding extensive documentation to `librustdoc`'s `auto_trait` in follow-up PRs.
I don't want to do that in this PR because further refactoring & bug fix PRs may alter the overall structure of `librustdoc`'s & `rustc`'s `auto_trait` modules to a great degree. I'm slowly digging into the dark details of `rustc`'s `auto_trait` module again and once I have the full picture I will be able to provide proper docs.
---
While this PR does indeed touch `rustc`'s `auto_trait` — mostly tiny refactorings — I argue this PR doesn't need any compiler reviewers next to rustdoc ones since that module falls under the purview of rustdoc — it used to be part of `librustdoc` after all (#49711).
Sorry for not having split this into more commits. If you'd like me to I can try to split it into more atomic commits retroactively. However, I don't know if that would actually make reviewing easier. I think the best way to review this might just be to place the master version of `auto_trait` on the left of your screen and the patched one on the right, not joking.
r? `@GuillaumeGomez`
[^1]: Or even 4 depending on the way you're counting.
Refactor stack overflow handling
Currently, every platform must implement a `Guard` that protects a thread from stack overflow. However, UNIX is the only platform that actually does so. Windows has a different mechanism for detecting stack overflow, while the other platforms don't detect it at all. Also, the UNIX stack overflow handling is split between `sys::pal::unix::stack_overflow`, which implements the signal handler, and `sys::pal::unix::thread`, which detects/installs guard pages.
This PR cleans this by getting rid of `Guard` and unifying UNIX stack overflow handling inside `stack_overflow` (commit 1). Therefore we can get rid of `sys_common::thread_info`, which stores `Guard` and the current `Thread` handle and move the `thread::current` TLS variable into `thread` (commit 2).
The second commit is not strictly speaking necessary. To keep the implementation clean, I've included it here, but if it causes too much noise, I can split it out without any trouble.
Don't inherit codegen attrs from parent static
Putting this up partly for discussion and partly for review. Specifically, in #121644, `@oli-obk` designed a system that creates new static items for representing nested allocations in statics. However, in that PR, oli made it so that these statics inherited the codegen attrs from the parent.
This causes problems such as colliding symbols with `#[export_name]` and ICEs with `#[no_mangle]` since these synthetic statics have no `tcx.item_name(..)`.
So the question is, is there any case where we *do* want to inherit codegen attrs from the parent? The only one that seems a bit suspicious is the thread-local attribute. And there may be some interesting interactions with the coverage attributes as well...
Fixes (after backport) #123274. Fixes#123243. cc #121644.
r? `@oli-obk` cc `@nnethercote` `@RalfJung` (reviewers on that pr)
Fix error message for `env!` when env var is not valid Unicode
Currently (without this PR) the `env!` macro emits an ```environment variable `name` not defined at compile time``` error when the environment variable is defined, but not a valid Unicode string. This PR introduces a separate more accurate error message, and a test to verify this behaviour.
For reference, before this PR, the new test would have outputted:
```
error: environment variable `NON_UNICODE_VAR` not defined at compile time
--> non_unicode_env.rs:2:13
|
2 | let _ = env!("NON_UNICODE_VAR");
| ^^^^^^^^^^^^^^^^^^^^^^^
|
= help: use `std::env::var("NON_UNICODE_VAR")` to read the variable at run time
= note: this error originates in the macro `env` (in Nightly builds, run with -Z macro-backtrace for more info)
error: aborting due to 1 previous error
```
whereas with this PR, the test ouputs:
```
error: environment variable `NON_UNICODE_VAR` is not a valid Unicode string
--> non_unicode_env.rs:2:13
|
2 | let _ = env!("NON_UNICODE_VAR");
| ^^^^^^^^^^^^^^^^^^^^^^^
|
= note: this error originates in the macro `env` (in Nightly builds, run with -Z macro-backtrace for more info)
error: aborting due to 1 previous error
```
Use the `Align` type when parsing alignment attributes
Use the `Align` type in `rustc_attr::parse_alignment`, removing the need to call `Align::from_bytes(...).unwrap()` later in the compilation process.
Rewrite `core-no-fp-fmt-parse` test in Rust
Claiming the simple "core-no-fp-fmt-parse" test from #121876. `run_make_support` was altered with `arg_path` written in #121918 by `@abhay-51,` with additional doc comment.
Preliminary GSoC contribution for the project proposal mentored by `@jieyouxu.`
match lowering: sort `Eq` candidates in the failure case too
This is a slight tweak to MIR gen of matches. Take a match like:
```rust
match (s, flag) {
("a", _) if foo() => 1,
("b", true) => 2,
("a", false) => 3,
(_, true) => 4,
_ => 5,
}
```
If we switch on `s == "a"`, the first candidate matches, and we learn almost nothing about the second candidate. So there's a choice:
1. (what we do today) stop sorting candidates, keep the "b" case grouped with everything below. This could allow us to be clever here and test on `flag == true` next.
2. (what this PR does) sort "b" into the failure case. The "b" will be alone (fewer opportunities for picking a good test), but that means the two "a" cases require a single test.
Today, we aren't clever in which tests we pick, so this is an unambiguous win. In a future where we pick tests better, idk. Grouping tests as much as possible feels like a generally good strategy.
This was proposed in https://github.com/rust-lang/rust/issues/29623 (9 years ago :D)
CFI: Abstract Closures and Coroutines
This will abstract coroutines in a moment, it's just abstracting closures for now to show `@rcvalle`
This uses the same principal as the methods on traits - figure out the `dyn` type representing the fn trait, instantiate it, and attach that alias set. We're essentially just computing how we would be called in a dynamic context, and attaching that.
Similar to methods on a trait object, the most common way to indirectly
call a closure or coroutine is through the vtable on the appropriate
trait. This uses the same approach as we use for trait methods, after
backing out the trait arguments from the type.
Add support for `NonNull`s in the `ambiguous_wide_ptr_comparisions` lint
This PR add support for `NonNull` pointers in the `ambiguous_wide_ptr_comparisions` lint.
Fixes https://github.com/rust-lang/rust/issues/121264
r? `@Nadrieril` (since you just reviewed #121268, feel free to reassign)
KCFI: Require -C panic=abort
While the KCFI scheme is not incompatible with unwinding, LLVM's `invoke` instruction does not currently support KCFI bundles. While it likely will in the near future, we won't be able to assume that in Rust for a while.
We encountered this problem while [turning on closure support](https://github.com/rust-lang/rust/pull/123106#issuecomment-2027436640).
r? ``@workingjubilee``
Replace regions in const canonical vars' types with `'static` in next-solver canonicalizer
We shouldn't ever have non-static regions in consts on stable (or really any regions at all, lol).
The test I committed is less minimal than, e.g., https://github.com/rust-lang/rust/issues/123155?notification_referrer_id=NT_kwDOADgQyrMxMDAzNDU4MDI0OTozNjc0MzE0#issuecomment-2025472029 -- however, I believe that it actually portrays the underlying issue here a bit better than that one.
In the linked issue, we end up emitting a normalizes-to predicate for a const placeholder because we don't actually unify `false` and `""`. In the test I committed, we emit a normalizes-to predicate as a part of actually solving a negative coherence goal.
Fixes#123155Fixes#118783
r? lcnr
CFI: Support calling methods on supertraits
Automatically adjust `Virtual` calls to supertrait functions to use the supertrait's trait object type as the receiver rather than the child trait.
cc `@compiler-errors` - this is the next usage of `trait_object_ty` I intend to have, so I thought it might be relevant while reviewing the existing one.
Remove len argument from RawVec::reserve_for_push
Removes `RawVec::reserve_for_push`'s `len` argument since it's always the same as capacity.
Also makes `Vec::insert` use `RawVec::reserve_for_push`.
Stabilize `unchecked_{add,sub,mul}`
Tracking issue: #85122
I think we might as well just stabilize these basic three. They're the ones that have `nuw`/`nsw` flags in LLVM.
Notably, this doesn't include the potentially-more-complex or -more-situational things like `unchecked_neg` or `unchecked_shr` that are under different feature flags.
To quote Ralf https://github.com/rust-lang/rust/issues/85122#issuecomment-1681669646,
> Are there any objections to stabilizing at least `unchecked_{add,sub,mul}`? For those there shouldn't be any surprises about what their safety requirements are.
*Semantially* these are [already available on stable, even in `const`, via](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=bdb1ff889b61950897f1e9f56d0c9a36) `checked_*`+`unreachable_unchecked`. So IMHO we might as well just let people write them directly, rather than try to go through a `let Some(x) = x.checked_add(y) else { unsafe { hint::unreachable_unchecked() }};` dance.
I added additional text to each method to attempt to better describe the behaviour and encourage `wrapping_*` instead.
r? rust-lang/libs-api
Add detection of [Partial]Ord methods in the `ambiguous_wide_pointer_comparisons` lint
Partially addresses https://github.com/rust-lang/rust/issues/121264 by adding diagnostics items for PartialOrd and Ord methods, detecting such diagnostics items as "binary operation" and suggesting the correct replacement.
I also took the opportunity to change the suggestion to use new methods `.cast()` on `*mut T` an d `*const T`.