Dominator-order information is only needed for coverage graphs, and is easy
enough to collect by just traversing the graph again.
This avoids wasted work when computing graph dominators for any other purpose.
stabilize Strict Provenance and Exposed Provenance APIs
Given that [RFC 3559](https://rust-lang.github.io/rfcs/3559-rust-has-provenance.html) has been accepted, t-lang has approved of the concept of provenance existing in the language. So I think it's time that we stabilize the strict provenance and exposed provenance APIs, and discuss provenance explicitly in the docs:
```rust
// core::ptr
pub const fn without_provenance<T>(addr: usize) -> *const T;
pub const fn dangling<T>() -> *const T;
pub const fn without_provenance_mut<T>(addr: usize) -> *mut T;
pub const fn dangling_mut<T>() -> *mut T;
pub fn with_exposed_provenance<T>(addr: usize) -> *const T;
pub fn with_exposed_provenance_mut<T>(addr: usize) -> *mut T;
impl<T: ?Sized> *const T {
    pub fn addr(self) -> usize;
    pub fn expose_provenance(self) -> usize;
    pub fn with_addr(self, addr: usize) -> Self;
    pub fn map_addr(self, f: impl FnOnce(usize) -> usize) -> Self;
}
impl<T: ?Sized> *mut T {
    pub fn addr(self) -> usize;
    pub fn expose_provenance(self) -> usize;
    pub fn with_addr(self, addr: usize) -> Self;
    pub fn map_addr(self, f: impl FnOnce(usize) -> usize) -> Self;
}
impl<T: ?Sized> NonNull<T> {
    pub fn addr(self) -> NonZero<usize>;
    pub fn with_addr(self, addr: NonZero<usize>) -> Self;
    pub fn map_addr(self, f: impl FnOnce(NonZero<usize>) -> NonZero<usize>) -> Self;
}
```
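For illustration, here is a minimal sketch (not from the PR) of the Strict Provenance style: addresses are manipulated as plain `usize`s via `map_addr`, so the pointer's provenance is carried along rather than laundered through an integer cast. The tagging helpers are hypothetical, invented for this example.

```rust
use std::mem::align_of;

/// Hypothetical helper: store a 1-bit tag in the low bit of a pointer
/// whose alignment guarantees that bit is free.
fn tag<T>(p: *mut T, bit: usize) -> *mut T {
    assert!(align_of::<T>() >= 2, "need a spare low bit");
    p.map_addr(|a| a | (bit & 1))
}

/// Strip the tag again; the result keeps the provenance of `p`.
fn untag<T>(p: *mut T) -> (*mut T, usize) {
    (p.map_addr(|a| a & !1), p.addr() & 1)
}

fn main() {
    let mut x: u32 = 42;
    let tagged = tag(&mut x as *mut u32, 1);
    let (orig, t) = untag(tagged);
    assert_eq!(t, 1);
    // Sound: `orig` never lost its provenance, unlike a `usize` round-trip.
    unsafe { assert_eq!(*orig, 42) };
}
```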
I also did a pass over the docs to adjust them, because this is no longer an "experiment". The `ptr` docs now discuss the concept of provenance in general, and then they go into the two families of APIs for dealing with provenance: Strict Provenance and Exposed Provenance. I removed the discussion of how pointers also have an associated "address space" -- that is not actually tracked in the pointer value, it is tracked in the type, so IMO it just distracts from the core point of provenance. I also adjusted the docs for `with_exposed_provenance` to make it clear that we cannot guarantee much about this function; it's all best-effort.
There are two unstable lints associated with the strict_provenance feature gate; I moved them to a new [strict_provenance_lints](https://github.com/rust-lang/rust/issues/130351) feature since I didn't want this PR to have an even bigger FCP. ;)
`@rust-lang/opsem` Would be great to get some feedback on the docs here. :)
Nominating for `@rust-lang/libs-api.`
Part of https://github.com/rust-lang/rust/issues/95228.
[FCP comment](https://github.com/rust-lang/rust/pull/130350#issuecomment-2395114536)
Use `ThinVec` for PredicateObligation storage
~~I noticed while profiling clippy on a project that a large amount of time is being spent allocating `Vec`s for `PredicateObligation`, and the `Vec`s are often quite small. This is an attempt to optimise this by using SmallVec to avoid heap allocations for these common small Vecs.~~
This PR turns all the `Vec<PredicateObligation>` into a single type alias while avoiding referring to `Vec` around it, then swaps the type over to `ThinVec<PredicateObligation>` and fixes the fallout. This also contains an implementation of `ThinVec::extract_if`, copied from `Vec::extract_if` and currently being upstreamed to https://github.com/Gankra/thin-vec/pull/66.
This leads to a small (0.2-0.7%) performance gain in the latest perf run.
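As a rough illustration of why `ThinVec` helps here (types and names simplified, not the PR's actual code): the handle is a single pointer, and an empty `ThinVec` allocates nothing.

```rust
use thin_vec::{thin_vec, ThinVec};

struct PredicateObligation; // stand-in for the real rustc type

// Funnel every use through one alias so the backing container can be
// swapped in a single place.
type PredicateObligations = ThinVec<PredicateObligation>;

fn main() {
    // `ThinVec` keeps length and capacity on the heap, so the handle
    // itself is one pointer wide, and the empty case allocates nothing.
    let obligations: PredicateObligations = thin_vec![];
    assert_eq!(
        std::mem::size_of::<PredicateObligations>(),
        std::mem::size_of::<usize>()
    );
    drop(obligations);
}
```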
Like #130865 did for the standard library, we can use `&raw` in the
compiler now that stage0 supports it. Also like that PR, I did
not make any doc or test changes at this time.
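For reference, a small sketch of what the mechanical change looks like:

```rust
fn main() {
    let mut value = 0u8;

    // Before: macro-based raw borrow.
    let p_old = std::ptr::addr_of_mut!(value);
    // After: native `&raw` syntax, which stage0 now accepts.
    let p_new = &raw mut value;

    // Both produce a raw pointer without creating an intermediate reference.
    assert_eq!(p_old, p_new);
}
```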
Use more slice patterns inside the compiler
Nothing super noteworthy. Just replacing the common 'fragile' pattern of "length check followed by indexing or unwrap" with slice patterns for legibility and 'robustness'.
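A small sketch of the kind of rewrite involved (illustrative, not taken from the diff):

```rust
fn first_two(xs: &[u32]) -> Option<(u32, u32)> {
    // Fragile: a length check followed by indexing; the check and the
    // indices can silently drift apart under refactoring.
    if xs.len() >= 2 { Some((xs[0], xs[1])) } else { None }
}

fn first_two_slice_pattern(xs: &[u32]) -> Option<(u32, u32)> {
    // Robust: the pattern binds exactly the elements it names.
    match xs {
        [a, b, ..] => Some((*a, *b)),
        _ => None,
    }
}

fn main() {
    assert_eq!(first_two(&[1, 2, 3]), first_two_slice_pattern(&[1, 2, 3]));
}
```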
r? ghost
Optimize SipHash by reordering compress instructions
This PR optimizes hashing by changing the order of instructions in the sip.rs `compress` macro so the CPU can parallelize it better. The new order is taken directly from Fig 2.1 in [the SipHash paper](https://eprint.iacr.org/2012/351.pdf) (but with the xors moved, which makes it a little faster). I attempted to optimize it some more after this, but I think this might be the optimal instruction order. Note that this shouldn't change the behavior of hashing at all; only statements that don't depend on each other were reordered.
It appears that the current order hasn't changed since its [original implementation from 2012](fada46c421 (diff-b751133c229259d7099bbbc7835324e5504b91ab1aded9464f0c48cd22e5e420R35)), which doesn't look like it was written with data dependencies in mind.
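For context, here is a sketch of the idea (the PR's actual macro may differ in details): within each half of a SipRound, the two update chains touch disjoint state, so interleaving them exposes instruction-level parallelism.

```rust
/// One SipRound with the two independent dependency chains interleaved.
/// (Illustrative; rotation constants are from the SipHash paper.)
fn sip_round(v: &mut [u64; 4]) {
    // First half: the (v0, v1) and (v2, v3) chains are independent.
    v[0] = v[0].wrapping_add(v[1]);
    v[2] = v[2].wrapping_add(v[3]);
    v[1] = v[1].rotate_left(13);
    v[3] = v[3].rotate_left(16);
    v[1] ^= v[0];
    v[3] ^= v[2];
    v[0] = v[0].rotate_left(32);
    // Second half: now (v0, v3) and (v2, v1) are the independent pairs.
    v[0] = v[0].wrapping_add(v[3]);
    v[2] = v[2].wrapping_add(v[1]);
    v[3] = v[3].rotate_left(21);
    v[1] = v[1].rotate_left(17);
    v[3] ^= v[0];
    v[1] ^= v[2];
    v[2] = v[2].rotate_left(32);
}

fn main() {
    // The standard SipHash initialization constants, for demonstration.
    let mut v = [
        0x736f6d6570736575u64,
        0x646f72616e646f6d,
        0x6c7967656e657261,
        0x7465646279746573,
    ];
    sip_round(&mut v);
    println!("{v:x?}");
}
```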
Running `./x bench library/core --stage 0 --test-args hash` before and after this change shows the following results:
Before:
```
benchmarks:
hash::sip::bench_bytes_4 7.20/iter +/- 0.70
hash::sip::bench_bytes_7 9.01/iter +/- 0.35
hash::sip::bench_bytes_8 8.12/iter +/- 0.10
hash::sip::bench_bytes_a_16 10.07/iter +/- 0.44
hash::sip::bench_bytes_b_32 13.46/iter +/- 0.71
hash::sip::bench_bytes_c_128 37.75/iter +/- 0.48
hash::sip::bench_long_str 121.18/iter +/- 3.01
hash::sip::bench_str_of_8_bytes 11.20/iter +/- 0.25
hash::sip::bench_str_over_8_bytes 11.20/iter +/- 0.26
hash::sip::bench_str_under_8_bytes 9.89/iter +/- 0.59
hash::sip::bench_u32 9.57/iter +/- 0.44
hash::sip::bench_u32_keyed 6.97/iter +/- 0.10
hash::sip::bench_u64 8.63/iter +/- 0.07
```
After:
```
benchmarks:
hash::sip::bench_bytes_4 6.64/iter +/- 0.14
hash::sip::bench_bytes_7 8.19/iter +/- 0.07
hash::sip::bench_bytes_8 8.59/iter +/- 0.68
hash::sip::bench_bytes_a_16 9.73/iter +/- 0.49
hash::sip::bench_bytes_b_32 12.70/iter +/- 0.06
hash::sip::bench_bytes_c_128 32.38/iter +/- 0.20
hash::sip::bench_long_str 102.99/iter +/- 0.82
hash::sip::bench_str_of_8_bytes 10.71/iter +/- 0.21
hash::sip::bench_str_over_8_bytes 11.73/iter +/- 0.17
hash::sip::bench_str_under_8_bytes 10.33/iter +/- 0.41
hash::sip::bench_u32 10.41/iter +/- 0.29
hash::sip::bench_u32_keyed 9.50/iter +/- 0.30
hash::sip::bench_u64 8.44/iter +/- 1.09
```
I ran this on my computer so there's some noise, but you can tell at least `bench_long_str` is significantly faster (~18%).
Also, I noticed the same compress function from the library is used in the compiler as well, so I took the liberty of copy-pasting this change there too.
Thanks `@semisol` for porting SipHash for another project which led me to notice this issue in Rust, and for helping investigate. <3
Instead of keeping a list of architectures which have native support
for 64-bit atomics, just use `#[cfg(target_has_atomic = "64")]` and its
inverted counterpart to determine whether we need to use portable
`AtomicU64` on the target architecture.
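A sketch of the cfg-based selection (the fallback type shown is illustrative; `portable-atomic` is one such crate, not necessarily the one used here):

```rust
// Use the native type when the target supports 64-bit atomics...
#[cfg(target_has_atomic = "64")]
use std::sync::atomic::AtomicU64;

// ...and fall back to a portable stand-in otherwise.
#[cfg(not(target_has_atomic = "64"))]
use portable_atomic::AtomicU64;

fn bump(counter: &AtomicU64) -> u64 {
    use std::sync::atomic::Ordering;
    counter.fetch_add(1, Ordering::Relaxed)
}

fn main() {
    let c = AtomicU64::new(0);
    bump(&c);
    assert_eq!(c.load(std::sync::atomic::Ordering::Relaxed), 1);
}
```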
Fixes for 32-bit SPARC on Linux
This PR fixes a number of issues which previously prevented `rustc` from being built
successfully for 32-bit SPARC using the `sparc-unknown-linux-gnu` triplet.
In particular, it adds linking against `libatomic` where necessary, uses portable `AtomicU64`
for `rustc_data_structures`, rewrites the spec for `sparc_unknown_linux_gnu` to use
`TargetOptions`, and replaces the previously used `-mv8plus` with the more portable
`-mcpu=v9 -m32`.
To make `rustc` build successfully, support for 32-bit SPARC also needs to be added to the
`object` and `nix` crates; I will be sending those changes out separately.
r? nagisa
This is possible now that inline const blocks are stable; the idea was
even mentioned as an alternative when `uninit_array()` was added:
<https://github.com/rust-lang/rust/pull/65580#issuecomment-544200681>
> if it’s stabilized soon enough maybe it’s not worth having a
> standard library method that will be replaceable with
> `let buffer = [MaybeUninit::<T>::uninit(); $N];`
Const array repetition and inline const blocks are now stable (in the
next release), so that circumstance has come to pass, and we no longer
have reason to want `uninit_array()` other than convenience. Therefore,
let’s evaluate the inconvenience of not using `uninit_array()` in
the standard library, before potentially deleting it entirely.
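Concretely, the replacement looks like this (a sketch, not the exact diff):

```rust
use std::mem::MaybeUninit;

fn main() {
    // `MaybeUninit<String>` is not `Copy`, so plain array repetition is
    // rejected; an inline const block makes the repetition operand a
    // constant, which is always allowed:
    let buf: [MaybeUninit<String>; 8] = [const { MaybeUninit::uninit() }; 8];
    let _ = buf;
}
```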
Added an associated `const THIS_IMPLEMENTATION_HAS_BEEN_TRIPLE_CHECKED`
to the `StableOrd` trait to ensure that implementors carefully consider
whether the trait's contract is upheld, as incorrect implementations can
cause miscompilations.
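A simplified sketch of the mechanism (trait body reduced to the new item; not the actual `rustc_data_structures` definition):

```rust
/// Sketch: `Ord` types whose ordering is stable across compilation sessions.
trait StableOrd: Ord {
    /// Implementors must spell this out, forcing a deliberate review of
    /// whether the ordering really is session-stable; a wrong answer can
    /// cause miscompilations.
    const THIS_IMPLEMENTATION_HAS_BEEN_TRIPLE_CHECKED: ();
}

impl StableOrd for u32 {
    // Primitive integer ordering does not depend on session state.
    const THIS_IMPLEMENTATION_HAS_BEEN_TRIPLE_CHECKED: () = ();
}

fn main() {}
```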
This makes their intent and expected location clearer. We saw some
examples where these comments were not clearly separated from `use`
declarations, which made it hard to understand what the comment was
describing.
This version shaves off about 2% of the cycles in my experiments
and makes the control flow easier to follow for me and hopefully
others, including the compiler.
Someone gave me a working profiler and by God I'm using it.
This patch has been extracted from #123720. It specifically enhances
`Sccs` to allow tracking arbitrary commutative properties of SCCs, including
- reachable values (max/min)
- SCC-internal values (max/min)
This helps with, among other things, universe computation: we can now identify
SCC universes as a straightforward "find max/min" operation during SCC construction.
It's also more or less zero-cost: if you don't use the new features, you don't pay for them.
This commit also vastly extends the documentation of the SCCs module, which I had a very hard time following.
Update ena to 0.14.3
Includes https://github.com/rust-lang/ena/pull/53, which removes a trivial `Self: Sized` bound that prevents `ena` from building on the new solver.
It's a macro that just creates an enum with a `from_u32` method. It has
two arms. One is unused and the other has a single use.
This commit inlines that single use and removes the whole macro. This
increases readability because we don't have two different macros
interacting (`enum_from_u32` and `language_item_table`).
It provides a way to effectively embed a linked list within an
`IndexVec` and also iterate over that list. It's written in a very
generic way, involving two traits `Links` and `LinkElem`. But the
`Links` trait is only impl'd for `IndexVec` and `&IndexVec`, and the
whole thing is only used in one module within `rustc_borrowck`. So I
think it's over-engineered and hard to read. Plus it has no comments.
This commit removes it, and adds a (non-generic) local iterator for the
use within `rustc_borrowck`. Much simpler.
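A rough sketch of the simpler, non-generic shape (illustrative names, not the actual `rustc_borrowck` code):

```rust
struct Node {
    value: u32,
    next: Option<usize>, // index of the next node in the same vector
}

/// Walk a list that is threaded through `nodes` by index.
fn iter_list(nodes: &[Node], head: Option<usize>) -> impl Iterator<Item = &Node> + '_ {
    std::iter::successors(head.map(|i| &nodes[i]), move |n| n.next.map(|i| &nodes[i]))
}

fn main() {
    let nodes = vec![
        Node { value: 1, next: Some(2) },
        Node { value: 9, next: None }, // belongs to a different list
        Node { value: 2, next: None },
    ];
    let values: Vec<u32> = iter_list(&nodes, Some(0)).map(|n| n.value).collect();
    assert_eq!(values, [1, 2]);
}
```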
It is optimized for lists with a single element, avoiding the need for
an allocation in that case. But `SmallVec<[T; 1]>` also avoids the
allocation, and is better in general: more standard, log2 number of
allocations if the list exceeds one item, and a much more capable API.
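For reference, the inline-then-spill behavior looks like this (a minimal sketch using the `smallvec` crate):

```rust
use smallvec::SmallVec;

fn main() {
    let mut xs: SmallVec<[u32; 1]> = SmallVec::new();

    // The first element is stored inline: no heap allocation.
    xs.push(1);
    assert!(!xs.spilled());

    // The second push spills to the heap, after which growth is
    // amortized like a normal `Vec`.
    xs.push(2);
    assert!(xs.spilled());
}
```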
This commit removes `TinyList` and converts the two uses to
`SmallVec<[T; 1]>`. It also reorders the `use` items in the relevant
file so they are in just two sections (`pub` and non-`pub`), ordered
alphabetically, instead of many sections. (This is a relevant part of
the change because I had to decide where to add a `use` item for
`SmallVec`.)
And move the `repr` line after the `derive` line, where it's harder to
overlook. (I overlooked it initially, and didn't understand how this
type worked.)
Stabilize the size of incr comp object file names
The current implementation does not produce stable-length paths, and we create the paths in a way that makes our allocation behavior nondeterministic. I think `@eddyb` fixed a number of other cases like this in the past, and this PR fixes another one. Whether that actually matters I have no idea, but we still have bimodal behavior in rustc-perf and the non-uniformity in `find` and `ls` was bothering me.
I've also removed the truncation of the mangled CGU names. Before this PR incr comp paths look like this:
```
target/debug/incremental/scratch-38izrrq90cex7/s-gux6gz0ow8-1ph76gg-ewe1xj434l26w9up5bedsojpd/261xgo1oqnd90ry5.o
```
And after, they look like this:
```
target/debug/incremental/scratch-035omutqbfkbw/s-gux6borni0-16r3v1j-6n64tmwqzchtgqzwwim5amuga/55v2re42sztc8je9bva6g8ft3.o
```
On the one hand, I'm sure this will break some people's builds because they're on Windows and only a few bytes from the path length limit. But if we're that seriously worried about the length of our file names, I have some other ideas on how to make them smaller. And last time I deleted some hash truncations from the compiler, there was a huge drop in the number of incremental compilation ICEs that were reported: https://github.com/rust-lang/rust/pull/110367
---
Upon further reading, this PR actually fixes a bug. This comment says the CGU names are supposed to be a fixed-length hash, and before this PR they aren't: ca7d34efa9/compiler/rustc_monomorphize/src/partitioning.rs (L445-L448)
This is a workaround for #122758, but it's not clear why 1.79 requires
more extensive use of `no_inline` than the previous release. Seems like
there's something relatively subtle happening here.
Does not necessarily change much, but we never overwrite it, so I see no reason
for it to be in the `Successors` trait. (+we already have a similar `is_cyclic`)
Add add/sub methods that only panic with debug assertions to rustc
This mitigates the perf impact of enabling overflow checks on rustc. The change to use overflow checks will be done in a later PR.
For rust-lang/compiler-team#724, based on data gathered in #119440.
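A sketch of the shape such a method might take (names illustrative, not the PR's actual API):

```rust
/// Add two `usize`s, panicking on overflow only when debug assertions
/// are enabled; release builds wrap silently, avoiding the check's cost.
#[inline]
fn debug_checked_add(a: usize, b: usize) -> usize {
    if cfg!(debug_assertions) {
        a.checked_add(b).expect("attempt to add with overflow")
    } else {
        a.wrapping_add(b)
    }
}

fn main() {
    assert_eq!(debug_checked_add(2, 3), 5);
}
```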
recursively evaluate the constants in everything that is 'mentioned'
This is another attempt at fixing https://github.com/rust-lang/rust/issues/107503. The previous attempt at https://github.com/rust-lang/rust/pull/112879 seems stuck in figuring out where the [perf regression](https://perf.rust-lang.org/compare.html?start=c55d1ee8d4e3162187214692229a63c2cc5e0f31&end=ec8de1ebe0d698b109beeaaac83e60f4ef8bb7d1&stat=instructions:u) comes from. In https://github.com/rust-lang/rust/pull/122258 I learned some things, which informed the approach this PR is taking.
Quoting from the new collector docs, which explain the high-level idea:
```rust
//! One important role of collection is to evaluate all constants that are used by all the items
//! which are being collected. Codegen can then rely on only encountering constants that evaluate
//! successfully, and if a constant fails to evaluate, the collector has much better context to be
//! able to show where this constant comes up.
//!
//! However, the exact set of "used" items (collected as described above), and therefore the exact
//! set of used constants, can depend on optimizations. Optimizing away dead code may optimize away
//! a function call that uses a failing constant, so an unoptimized build may fail where an
//! optimized build succeeds. This is undesirable.
//!
//! To fix this, the collector has the concept of "mentioned" items. Some time during the MIR
//! pipeline, before any optimization-level-dependent optimizations, we compute a list of all items
//! that syntactically appear in the code. These are considered "mentioned", and even if they are in
//! dead code and get optimized away (which makes them no longer "used"), they are still
//! "mentioned". For every used item, the collector ensures that all mentioned items, recursively,
//! do not use a failing constant. This is reflected via the [`CollectionMode`], which determines
//! whether we are visiting a used item or merely a mentioned item.
//!
//! The collector and "mentioned items" gathering (which lives in `rustc_mir_transform::mentioned_items`)
//! need to stay in sync in the following sense:
//!
//! - For every item that the collector gathers that could eventually lead to build failure (most
//!   likely due to containing a constant that fails to evaluate), a corresponding mentioned item
//!   must be added. This should use the exact same strategy as the collector to make sure they are
//!   in sync. However, while the collector works on monomorphized types, mentioned items are
//!   collected on generic MIR -- so any time the collector checks for a particular type (such as
//!   `ty::FnDef`), we have to just unconditionally add this as a mentioned item.
//! - In `visit_mentioned_item`, we then do with that mentioned item exactly what the collector
//!   would have done during regular MIR visiting. Basically you can think of the collector having
//!   two stages, a pre-monomorphization stage and a post-monomorphization stage (usually quite
//!   literally separated by a call to `self.monomorphize`); the pre-monomorphization stage is
//!   duplicated in mentioned items gathering and the post-monomorphization stage is duplicated in
//!   `visit_mentioned_item`.
//! - Finally, as a performance optimization, the collector should fill `used_mentioned_item` during
//!   its MIR traversal with exactly what mentioned item gathering would have added in the same
//!   situation. This detects mentioned items that have *not* been optimized away and hence don't
//!   need a dedicated traversal.
enum CollectionMode {
    /// Collect items that are used, i.e., actually needed for codegen.
    ///
    /// Which items are used can depend on optimization levels, as MIR optimizations can remove
    /// uses.
    UsedItems,
    /// Collect items that are mentioned. The goal of this mode is that it is independent of
    /// optimizations: the set of "mentioned" items is computed before optimizations are run.
    ///
    /// The exact contents of this set are *not* a stable guarantee. (For instance, it is currently
    /// computed after drop-elaboration. If we ever do some optimizations even in debug builds, we
    /// might decide to run them before computing mentioned items.) The key property of this set is
    /// that it is optimization-independent.
    MentionedItems,
}
```
And the `mentioned_items` MIR body field docs:
```rust
/// Further items that were mentioned in this function and hence *may* become monomorphized,
/// depending on optimizations. We use this to avoid optimization-dependent compile errors: the
/// collector recursively traverses all "mentioned" items and evaluates all their
/// `required_consts`.
///
/// This is *not* soundness-critical and the contents of this list are *not* a stable guarantee.
/// All that's relevant is that this set is optimization-level-independent, and that it includes
/// everything that the collector would consider "used". (For example, we currently compute this
/// set after drop elaboration, so some drop calls that can never be reached are not considered
/// "mentioned".) See the documentation of `CollectionMode` in
/// `compiler/rustc_monomorphize/src/collector.rs` for more context.
pub mentioned_items: Vec<Spanned<MentionedItem<'tcx>>>,
```
Fixes #107503
Filed #122758 to track a proper fix, but this seems to solve the
problem in the meantime and is probably OK in terms of impact on
(internal) doc quality.