Use more slice patterns inside the compiler
Nothing super noteworthy. Just replacing the common 'fragile' pattern of "length check followed by indexing or unwrap" with slice patterns for legibility and 'robustness'.
r? ghost
Optimize SipHash by reordering compress instructions
This PR optimizes hashing by changing the order of instructions in the sip.rs `compress` macro so the CPU can parallelize it better. The new order is taken directly from Fig 2.1 in [the SipHash paper](https://eprint.iacr.org/2012/351.pdf) (but with the xors moved which makes it a little faster). I attempted to optimize it some more after this, but I think this might be the optimal instruction order. Note that this shouldn't change the behavior of hashing at all, only statements that don't depend on each other were reordered.
It appears like the current order hasn't changed since its [original implementation from 2012](fada46c421 (diff-b751133c229259d7099bbbc7835324e5504b91ab1aded9464f0c48cd22e5e420R35)) which doesn't look like it was written with data dependencies in mind.
Running `./x bench library/core --stage 0 --test-args hash` before and after this change shows the following results:
Before:
```
benchmarks:
hash::sip::bench_bytes_4 7.20/iter +/- 0.70
hash::sip::bench_bytes_7 9.01/iter +/- 0.35
hash::sip::bench_bytes_8 8.12/iter +/- 0.10
hash::sip::bench_bytes_a_16 10.07/iter +/- 0.44
hash::sip::bench_bytes_b_32 13.46/iter +/- 0.71
hash::sip::bench_bytes_c_128 37.75/iter +/- 0.48
hash::sip::bench_long_str 121.18/iter +/- 3.01
hash::sip::bench_str_of_8_bytes 11.20/iter +/- 0.25
hash::sip::bench_str_over_8_bytes 11.20/iter +/- 0.26
hash::sip::bench_str_under_8_bytes 9.89/iter +/- 0.59
hash::sip::bench_u32 9.57/iter +/- 0.44
hash::sip::bench_u32_keyed 6.97/iter +/- 0.10
hash::sip::bench_u64 8.63/iter +/- 0.07
```
After:
```
benchmarks:
hash::sip::bench_bytes_4 6.64/iter +/- 0.14
hash::sip::bench_bytes_7 8.19/iter +/- 0.07
hash::sip::bench_bytes_8 8.59/iter +/- 0.68
hash::sip::bench_bytes_a_16 9.73/iter +/- 0.49
hash::sip::bench_bytes_b_32 12.70/iter +/- 0.06
hash::sip::bench_bytes_c_128 32.38/iter +/- 0.20
hash::sip::bench_long_str 102.99/iter +/- 0.82
hash::sip::bench_str_of_8_bytes 10.71/iter +/- 0.21
hash::sip::bench_str_over_8_bytes 11.73/iter +/- 0.17
hash::sip::bench_str_under_8_bytes 10.33/iter +/- 0.41
hash::sip::bench_u32 10.41/iter +/- 0.29
hash::sip::bench_u32_keyed 9.50/iter +/- 0.30
hash::sip::bench_u64 8.44/iter +/- 1.09
```
I ran this on my computer so there's some noise, but you can tell at least `bench_long_str` is significantly faster (~18%).
Also, I noticed the same compress function from the library is used in the compiler as well, so I took the liberty of copy-pasting this change to there as well.
Thanks `@semisol` for porting SipHash for another project which led me to notice this issue in Rust, and for helping investigate. <3
Instead of keeping a list of architectures which have native support
for 64-bit atomics, just use #[cfg(target_has_atomic = "64")] and its
inverted counterpart to determine whether we need to use portable
AtomicU64 on the target architecture.
Fixes for 32-bit SPARC on Linux
This PR fixes a number of issues which previously prevented `rustc` from being built
successfully for 32-bit SPARC using the `sparc-unknown-linux-gnu` triplet.
In particular, it adds linking against `libatomic` where necessary, uses portable `AtomicU64`
for `rustc_data_structures` and rewrites the spec for `sparc_unknown_linux_gnu` to use
`TargetOptions` and replaces the previously used `-mv8plus` with the more portable
`-mcpu=v9 -m32`.
To make `rustc` build successfully, support for 32-bit SPARC needs to be added to the `object`
crate as well as the `nix` crate which I will be sending out later as well.
r? nagisa
This is possible now that inline const blocks are stable; the idea was
even mentioned as an alternative when `uninit_array()` was added:
<https://github.com/rust-lang/rust/pull/65580#issuecomment-544200681>
> if it’s stabilized soon enough maybe it’s not worth having a
> standard library method that will be replaceable with
> `let buffer = [MaybeUninit::<T>::uninit(); $N];`
Const array repetition and inline const blocks are now stable (in the
next release), so that circumstance has come to pass, and we no longer
have reason to want `uninit_array()` other than convenience. Therefore,
let’s evaluate the inconvenience by not using `uninit_array()` in
the standard library, before potentially deleting it entirely.
Added an associated `const THIS_IMPLEMENTATION_HAS_BEEN_TRIPLE_CHECKED`
to the `StableOrd` trait to ensure that implementors carefully consider
whether the trait's contract is upheld, as incorrect implementations can
cause miscompilations.
This makes their intent and expected location clearer. We see some
examples where these comments were not clearly separate from `use`
declarations, which made it hard to understand what the comment is
describing.
This version shaves off ca 2% of the cycles in my experiments
and makes the control flow easier to follow for me and hopefully
others, including the compiler.
Someone gave me a working profiler and by God I'm using it.
This patch has been extracted from #123720. It specifically enhances
`Sccs` to allow tracking arbitrary commutative properties of SCCs, including
- reachable values (max/min)
- SCC-internal values (max/min)
This helps with among other things universe computation: we can now identify
SCC universes as a straightforward "find max/min" operation during SCC construction.
It's also more or less zero-cost; don't use the new features, don't pay for them.
This commit also vastly extends the documentation of the SCCs module, which I had a very hard time following.
Update ena to 0.14.3
Includes https://github.com/rust-lang/ena/pull/53, which removes a trivial `Self: Sized` bound that prevents `ena` from building on the new solver.
It's a macro that just creates an enum with a `from_u32` method. It has
two arms. One is unused and the other has a single use.
This commit inlines that single use and removes the whole macro. This
increases readability because we don't have two different macros
interacting (`enum_from_u32` and `language_item_table`).
It provides a way to effectively embed a linked list within an
`IndexVec` and also iterate over that list. It's written in a very
generic way, involving two traits `Links` and `LinkElem`. But the
`Links` trait is only impl'd for `IndexVec` and `&IndexVec`, and the
whole thing is only used in one module within `rustc_borrowck`. So I
think it's over-engineered and hard to read. Plus it has no comments.
This commit removes it, and adds a (non-generic) local iterator for the
use within `rustc_borrowck`. Much simpler.
It is optimized for lists with a single element, avoiding the need for
an allocation in that case. But `SmallVec<[T; 1]>` also avoids the
allocation, and is better in general: more standard, log2 number of
allocations if the list exceeds one item, and a much more capable API.
This commit removes `TinyList` and converts the two uses to
`SmallVec<[T; 1]>`. It also reorders the `use` items in the relevant
file so they are in just two sections (`pub` and non-`pub`), ordered
alphabetically, instead of many sections. (This is a relevant part of
the change because I had to decide where to add a `use` item for
`SmallVec`.)
And move the `repr` line after the `derive` line, where it's harder to
overlook. (I overlooked it initially, and didn't understand how this
type worked.)
Stabilize the size of incr comp object file names
The current implementation does not produce stable-length paths, and we create the paths in a way that makes our allocation behavior is nondeterministic. I think `@eddyb` fixed a number of other cases like this in the past, and this PR fixes another one. Whether that actually matters I have no idea, but we still have bimodal behavior in rustc-perf and the non-uniformity in `find` and `ls` was bothering me.
I've also removed the truncation of the mangled CGU names. Before this PR incr comp paths look like this:
```
target/debug/incremental/scratch-38izrrq90cex7/s-gux6gz0ow8-1ph76gg-ewe1xj434l26w9up5bedsojpd/261xgo1oqnd90ry5.o
```
And after, they look like this:
```
target/debug/incremental/scratch-035omutqbfkbw/s-gux6borni0-16r3v1j-6n64tmwqzchtgqzwwim5amuga/55v2re42sztc8je9bva6g8ft3.o
```
On the one hand, I'm sure this will break some people's builds because they're on Windows and only a few bytes from the path length limit. But if we're that seriously worried about the length of our file names, I have some other ideas on how to make them smaller. And last time I deleted some hash truncations from the compiler, there was a huge drop in the number if incremental compilation ICEs that were reported: https://github.com/rust-lang/rust/pull/110367https://github.com/rust-lang/rust/pull/110367
---
Upon further reading, this PR actually fixes a bug. This comment says the CGU names are supposed to be a fixed-length hash, and before this PR they aren't: ca7d34efa9/compiler/rustc_monomorphize/src/partitioning.rs (L445-L448)
This is a workaround for #122758, but it's not clear why 1.79 requires a
more extensive amount of no_inline than the previous release. Seems like
there's something relatively subtle happening here.
Does not necessarily change much, but we never overwrite it, so I see no reason
for it to be in the `Successors` trait. (+we already have a similar `is_cyclic`)
Add add/sub methods that only panic with debug assertions to rustc
This mitigates the perf impact of enabling overflow checks on rustc. The change to use overflow checks will be done in a later PR.
For rust-lang/compiler-team#724, based on data gathered in #119440.
recursively evaluate the constants in everything that is 'mentioned'
This is another attempt at fixing https://github.com/rust-lang/rust/issues/107503. The previous attempt at https://github.com/rust-lang/rust/pull/112879 seems stuck in figuring out where the [perf regression](https://perf.rust-lang.org/compare.html?start=c55d1ee8d4e3162187214692229a63c2cc5e0f31&end=ec8de1ebe0d698b109beeaaac83e60f4ef8bb7d1&stat=instructions:u) comes from. In https://github.com/rust-lang/rust/pull/122258 I learned some things, which informed the approach this PR is taking.
Quoting from the new collector docs, which explain the high-level idea:
```rust
//! One important role of collection is to evaluate all constants that are used by all the items
//! which are being collected. Codegen can then rely on only encountering constants that evaluate
//! successfully, and if a constant fails to evaluate, the collector has much better context to be
//! able to show where this constant comes up.
//!
//! However, the exact set of "used" items (collected as described above), and therefore the exact
//! set of used constants, can depend on optimizations. Optimizing away dead code may optimize away
//! a function call that uses a failing constant, so an unoptimized build may fail where an
//! optimized build succeeds. This is undesirable.
//!
//! To fix this, the collector has the concept of "mentioned" items. Some time during the MIR
//! pipeline, before any optimization-level-dependent optimizations, we compute a list of all items
//! that syntactically appear in the code. These are considered "mentioned", and even if they are in
//! dead code and get optimized away (which makes them no longer "used"), they are still
//! "mentioned". For every used item, the collector ensures that all mentioned items, recursively,
//! do not use a failing constant. This is reflected via the [`CollectionMode`], which determines
//! whether we are visiting a used item or merely a mentioned item.
//!
//! The collector and "mentioned items" gathering (which lives in `rustc_mir_transform::mentioned_items`)
//! need to stay in sync in the following sense:
//!
//! - For every item that the collector gather that could eventually lead to build failure (most
//! likely due to containing a constant that fails to evaluate), a corresponding mentioned item
//! must be added. This should use the exact same strategy as the ecollector to make sure they are
//! in sync. However, while the collector works on monomorphized types, mentioned items are
//! collected on generic MIR -- so any time the collector checks for a particular type (such as
//! `ty::FnDef`), we have to just onconditionally add this as a mentioned item.
//! - In `visit_mentioned_item`, we then do with that mentioned item exactly what the collector
//! would have done during regular MIR visiting. Basically you can think of the collector having
//! two stages, a pre-monomorphization stage and a post-monomorphization stage (usually quite
//! literally separated by a call to `self.monomorphize`); the pre-monomorphizationn stage is
//! duplicated in mentioned items gathering and the post-monomorphization stage is duplicated in
//! `visit_mentioned_item`.
//! - Finally, as a performance optimization, the collector should fill `used_mentioned_item` during
//! its MIR traversal with exactly what mentioned item gathering would have added in the same
//! situation. This detects mentioned items that have *not* been optimized away and hence don't
//! need a dedicated traversal.
enum CollectionMode {
/// Collect items that are used, i.e., actually needed for codegen.
///
/// Which items are used can depend on optimization levels, as MIR optimizations can remove
/// uses.
UsedItems,
/// Collect items that are mentioned. The goal of this mode is that it is independent of
/// optimizations: the set of "mentioned" items is computed before optimizations are run.
///
/// The exact contents of this set are *not* a stable guarantee. (For instance, it is currently
/// computed after drop-elaboration. If we ever do some optimizations even in debug builds, we
/// might decide to run them before computing mentioned items.) The key property of this set is
/// that it is optimization-independent.
MentionedItems,
}
```
And the `mentioned_items` MIR body field docs:
```rust
/// Further items that were mentioned in this function and hence *may* become monomorphized,
/// depending on optimizations. We use this to avoid optimization-dependent compile errors: the
/// collector recursively traverses all "mentioned" items and evaluates all their
/// `required_consts`.
///
/// This is *not* soundness-critical and the contents of this list are *not* a stable guarantee.
/// All that's relevant is that this set is optimization-level-independent, and that it includes
/// everything that the collector would consider "used". (For example, we currently compute this
/// set after drop elaboration, so some drop calls that can never be reached are not considered
/// "mentioned".) See the documentation of `CollectionMode` in
/// `compiler/rustc_monomorphize/src/collector.rs` for more context.
pub mentioned_items: Vec<Spanned<MentionedItem<'tcx>>>,
```
Fixes#107503
Filed #122758 to track a proper fix, but this seems to solve the
problem in the meantime and is probably OK in terms of impact on
(internal) doc quality.
Adding support of quirky filesystems occuring in virtualised settings not
having full POSIX support for memory mapped files. Example: current virtiofs
with cache disabled, occuring in Incus/LXD or Kata Containers. Has been
hitting various virtualised filesystems since 2016, depending on their levels
of maturity at the time. The situation will perhaps improve when virtiofs DAX
support patches will have made it into the qemu mainline.
On a reliability level, using the MAP_PRIVATE sycall flag instead of the
MAP_SHARED syscall flag for the mmap() system call does have some undefined
behaviour when the caller update the memory mapping of the mmap()ed file, but
MAP_SHARED does allow not only the calling process but other processes to
modify the memory mapping. Thus, in the current context, using MAP_PRIVATE
copy-on-write is marginally more reliable than MAP_SHARED.
This discussion of reliability is orthogonal to the type system enforced safety
policy of rust, which does not claim to handle memory modification of memory
mapped files triggered through the operating system and not the running rust
process.
fixes#117448
For example unnecessary imports in std::prelude that can be eliminated:
```rust
use std::option::Option::Some;//~ WARNING the item `Some` is imported redundantly
use std::option::Option::None; //~ WARNING the item `None` is imported redundantly
```
Invert diagnostic lints.
That is, change `diagnostic_outside_of_impl` and `untranslatable_diagnostic` from `allow` to `deny`, because more than half of the compiler has been converted to use translated diagnostics.
This commit removes more `deny` attributes than it adds `allow` attributes, which proves that this change is warranted.
r? ````@davidtwco````
That is, change `diagnostic_outside_of_impl` and
`untranslatable_diagnostic` from `allow` to `deny`, because more than
half of the compiler has be converted to use translated diagnostics.
This commit removes more `deny` attributes than it adds `allow`
attributes, which proves that this change is warranted.
Pack u128 in the compiler to mitigate new alignment
This is based on #116672, adding a new `#[repr(packed(8))]` wrapper on `u128` to avoid changing any of the compiler's size assertions. This is needed in two places:
* `SwitchTargets`, otherwise its `SmallVec<[u128; 1]>` gets padded up to 32 bytes.
* `LitKind::Int`, so that entire `enum` can stay 24 bytes.
* This change definitely has far-reaching effects though, since it's public.
[rustc_data_structures] Use partition_point to find slice range end.
This PR uses approach introduced in https://github.com/rust-lang/rust/pull/114152 to find
the end of the range. It's much easier to understand and reason about invariants of such
implementation.
Technically it's possible to make it even shorter by returning `&[start..end]` unconditionally
because even if searched item is not present in the slice, `start` and `end` would point at
the same index, so the range would be empty. The reason I decided not to use this shorter
implementation is because it would involve more comparisons in case there are no elements
in the slice with key equal to `key`.
Also, not that it matters much, but this implementation also improves perf according to the
benchmark below:
https://gist.github.com/ttsugriy/63c0ed39ae132b131931fa1f8a3dea55
The results on my M1 macbook air are:
```
Running benches/bin_search_slice_benchmark.rs (target/release/deps/bin_search_slice_benchmark-90fa6d68c3bd1298)
Benchmarking multiply add/binary_search_slice: Collecting 100 samples in estimated 5.0002 s (1
multiply add/binary_search_slice
time: [44.719 ns 44.918 ns 45.158 ns]
No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
Benchmarking multiply add/binary_search_slice_new: Collecting 100 samples in estimated 5.0001
multiply add/binary_search_slice_new
time: [36.955 ns 37.060 ns 37.221 ns]
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
```
rustc_mir_transform: Make DestinationPropagation stable for queries
By using `FxIndexMap` instead of `FxHashMap`, so that the order of visiting of locals is deterministic.
We also need to bless
`copy_propagation_arg.foo.DestinationPropagation.panic*.diff`. Do not review the diff of the diff. Instead look at the diff files before and after this commit. Both before and after this commit, 3 statements are replaced with nop. It's just that due to change in ordering, different statements are replaced. But the net result is the same. In other words, compare this diff (before fix):
* 090d5eac72/tests/mir-opt/dest-prop/copy_propagation_arg.foo.DestinationPropagation.panic-unwind.diff
With this diff (after fix):
* f603babd63/tests/mir-opt/dest-prop/copy_propagation_arg.foo.DestinationPropagation.panic-unwind.diff
and you can see that both before and after the fix, we replace 3 statements with `nop`s.
I find it _slightly_ surprising that the test this PR affects did not previously fail spuriously due to the indeterminism of `FxHashMap`, but I guess in can be explained with the predictability of small `FxHashMap`s with `usize` (`Local`) keys, or something along those lines.
This should fix [this](https://github.com/rust-lang/rust/pull/119252#discussion_r1436101791) comment, but I wanted to make a separate PR for this fix for a simpler development and review process.
Part of https://github.com/rust-lang/rust/issues/84447 which is E-help-wanted.
r? `@cjgillot` who is reviewer for the highly related PR https://github.com/rust-lang/rust/pull/119252.
Avoid specialization in the metadata serialization code
With the exception of a perf-only specialization for byte slices and byte vectors.
This uses the same trick of introducing a new trait and having the Encodable and Decodable derives add a bound to it as used for TyEncoder/TyDecoder. The new code is clearer about which encoder/decoder uses which impl and it reduces the dependency of rustc on specialization, making it easier to remove support for specialization entirely or turn it into a construct that is only allowed for perf optimizations if we decide to do this.
By using FxIndexMap instead of FxHashMap, so that the order of visiting
of locals is deterministic.
We also need to bless
copy_propagation_arg.foo.DestinationPropagation.panic*.diff. Do not
review the diff of the diff. Instead look at the diff file before and
after this commit. Both before and after this commit, 3 statements are
replaced with nop. It's just that due to change in ordering, different
statements are replaced. But the net result is the same.
StableCompare is a companion trait to `StableOrd`. Some types like `Symbol` can be compared in a cross-session stable way, but their `Ord` implementation is not stable. In such cases, a `StableOrd` implementation can be provided to offer a lightweight way for stable sorting. (The more heavyweight option is to sort via `ToStableHashKey`, but then sorting needs to have access to a stable hashing context and `ToStableHashKey` can also be expensive as in the case of `Symbol` where it has to allocate a `String`.)
This involves lots of breaking changes. There are two big changes that
force changes. The first is that the bitflag types now don't
automatically implement normal derive traits, so we need to derive them
manually.
Additionally, bitflags now have a hidden inner type by default, which
breaks our custom derives. The bitflags docs recommend using the impl
form in these cases, which I did.
[`RFC 3086`] Attempt to try to resolve blocking concerns
Implements what is described at https://github.com/rust-lang/rust/issues/83527#issuecomment-1744822345 to hopefully make some progress.
It is unknown if such approach is or isn't desired due to the lack of further feedback, as such, it is probably best to nominate this PR to the official entities.
`@rustbot` labels +I-compiler-nominated
detects redundant imports that can be eliminated.
for #117772 :
In order to facilitate review and modification, split the checking code and
removing redundant imports code into two PR.
Make `FatalErrorMarker` lower priority than other panics
This makes `FatalErrorMarker` lower priority than other panics in a parallel sections. If any other panics occur, they will be unwound instead of `FatalErrorMarker`. This ensures `rustc` will exit with the correct error code on ICEs.
This fixes https://github.com/rust-lang/rust/issues/116659.
enable parallel rustc front end in nightly builds
Refers to the [MCP](https://github.com/rust-lang/compiler-team/issues/681), this pr does:
1. Enable the parallel front end in nightly builds, and keep the default number of threads as 1. Then users can use the parallel rustc front end via -Z threads=n option.
2. Set it up to serial front end for beta/stable builds via bootstrap.
3. Switch over the alt builders from parallel rustc to serial, so we have artifacts without parallel to test against the artifacts with parallel.
r? `@oli-obk`
cc `@cjgillot` `@nnethercote` `@bjorn3` `@Kobzol`
- Sort dependencies and features sections.
- Add `tidy` markers to the sorted sections so they stay sorted.
- Remove empty `[lib`] sections.
- Remove "See more keys..." comments.
Excluded files:
- rustc_codegen_{cranelift,gcc}, because they're external.
- rustc_lexer, because it has external use.
- stable_mir, because it has external use.
Avoid a `track_errors` by bubbling up most errors from `check_well_formed`
I believe `track_errors` is mostly papering over issues that a sufficiently convoluted query graph can hit. I made this change, while the actual change I want to do is to stop bailing out early on errors, and instead use this new `ErrorGuaranteed` to invoke `check_well_formed` for individual items before doing all the `typeck` logic on them.
This works towards resolving https://github.com/rust-lang/rust/issues/97477 and various other ICEs, as well as allowing us to use parallel rustc more (which is currently rather limited/bottlenecked due to the very sequential nature in which we do `rustc_hir_analysis::check_crate`)
cc `@SparrowLii` `@Zoxc` for the new `try_par_for_each_in` function
Remove `IdFunctor` trait.
It's defined in `rustc_data_structures` but is only used in
`rustc_type_ir`. The code is shorter and easier to read if we remove
this layer of abstraction and just do the things directly where they are
needed.
r? `@BoxyUwU`