Pack u128 in the compiler to mitigate its new 16-byte alignment
This is based on #116672, adding a new `#[repr(packed(8))]` wrapper around `u128` to avoid changing any of the compiler's size assertions. This is needed in two places (a sketch of the wrapper follows the list):
* `SwitchTargets`, otherwise its `SmallVec<[u128; 1]>` gets padded up to 32 bytes.
* `LitKind::Int`, so the entire `enum` can stay at 24 bytes. (This change has far-reaching effects though, since `LitKind` is public.)
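A minimal sketch of such a wrapper (assumed shape and name; the actual definition in the PR may differ in derives and helper impls):

```rust
/// `u128` with its alignment lowered from 16 to 8, so containing types
/// keep their old layout (e.g. a 24-byte enum stays 24 bytes).
#[derive(Copy, Clone, PartialEq, Eq)]
#[repr(packed(8))]
pub struct Pu128(pub u128);

// The size is unchanged; only the alignment drops.
const _: () = assert!(std::mem::size_of::<Pu128>() == 16);
const _: () = assert!(std::mem::align_of::<Pu128>() == 8);
```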
[rustc_data_structures] Use partition_point to find slice range end.
This PR uses the approach introduced in https://github.com/rust-lang/rust/pull/114152 to find
the end of the range. It is much easier to understand and to reason about the invariants of
this implementation.
Technically it's possible to make it even shorter by returning `&[start..end]` unconditionally,
because even if the searched-for key is not present in the slice, `start` and `end` would point
at the same index, so the range would be empty. The reason I decided not to use that shorter
implementation is that it would involve more comparisons in the case where no element in the
slice has a key equal to `key`.
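A hedged sketch of the resulting shape (standalone and simplified; the real `rustc_data_structures` code differs in detail):

```rust
/// Returns the subslice of `data` whose elements have keys equal to `key`,
/// assuming `data` is sorted by `key_fn`.
fn binary_search_slice<'d, E, K: Ord>(
    data: &'d [E],
    key_fn: impl Fn(&E) -> K,
    key: &K,
) -> &'d [E] {
    // First index whose key is >= `key`.
    let start = data.partition_point(|x| key_fn(x) < *key);
    // Early return when `key` is absent: one comparison here skips the
    // second binary search entirely.
    if start == data.len() || key_fn(&data[start]) != *key {
        return &[];
    }
    // Within the tail, elements equal to `key` come first, so the next
    // partition point marks the end of the run.
    let end = start + data[start..].partition_point(|x| key_fn(x) == *key);
    &data[start..end]
}
```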
Not that it matters much, but this implementation also improves perf according to the
benchmark below:
benchmark below:
https://gist.github.com/ttsugriy/63c0ed39ae132b131931fa1f8a3dea55
The results on my M1 macbook air are:
```
Running benches/bin_search_slice_benchmark.rs (target/release/deps/bin_search_slice_benchmark-90fa6d68c3bd1298)
Benchmarking multiply add/binary_search_slice: Collecting 100 samples in estimated 5.0002 s (1
multiply add/binary_search_slice
time: [44.719 ns 44.918 ns 45.158 ns]
No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
Benchmarking multiply add/binary_search_slice_new: Collecting 100 samples in estimated 5.0001
multiply add/binary_search_slice_new
time: [36.955 ns 37.060 ns 37.221 ns]
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
```
rustc_mir_transform: Make DestinationPropagation stable for queries
By using `FxIndexMap` instead of `FxHashMap`, so that the order in which locals are visited is deterministic.
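For context, this is the behavioral difference being relied on, sketched with essentially the alias rustc builds from the `indexmap` and `rustc-hash` crates:

```rust
use std::hash::BuildHasherDefault;

use indexmap::IndexMap;
use rustc_hash::FxHasher;

// Fx hashing for speed, insertion-order iteration for determinism.
type FxIndexMap<K, V> = IndexMap<K, V, BuildHasherDefault<FxHasher>>;

fn main() {
    let mut candidates: FxIndexMap<u32, &str> = FxIndexMap::default();
    for (local, name) in [(7, "_7"), (1, "_1"), (42, "_42")] {
        candidates.insert(local, name);
    }
    // Keys come back in insertion order, independent of hash layout, so
    // anything derived from this iteration is deterministic.
    let keys: Vec<u32> = candidates.keys().copied().collect();
    assert_eq!(keys, vec![7, 1, 42]);
}
```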
We also need to bless
`copy_propagation_arg.foo.DestinationPropagation.panic*.diff`. Do not review the diff of the diff. Instead, look at the diff files before and after this commit. Both before and after this commit, 3 statements are replaced with nops; it's just that, due to the change in ordering, different statements are replaced. But the net result is the same. In other words, compare this diff (before the fix):
* 090d5eac72/tests/mir-opt/dest-prop/copy_propagation_arg.foo.DestinationPropagation.panic-unwind.diff
With this diff (after fix):
* f603babd63/tests/mir-opt/dest-prop/copy_propagation_arg.foo.DestinationPropagation.panic-unwind.diff
and you can see that both before and after the fix, we replace 3 statements with `nop`s.
I find it _slightly_ surprising that the test this PR affects did not previously fail spuriously due to the nondeterminism of `FxHashMap`, but I guess it can be explained by the predictability of small `FxHashMap`s with `usize` (`Local`) keys, or something along those lines.
This should address [this](https://github.com/rust-lang/rust/pull/119252#discussion_r1436101791) comment, but I wanted to make a separate PR for this fix, for a simpler development and review process.
Part of https://github.com/rust-lang/rust/issues/84447 which is E-help-wanted.
r? `@cjgillot` who is reviewer for the highly related PR https://github.com/rust-lang/rust/pull/119252.
Avoid specialization in the metadata serialization code
With the exception of a perf-only specialization for byte slices and byte vectors.
This uses the same trick as `TyEncoder`/`TyDecoder`: introduce a new trait and have the `Encodable` and `Decodable` derives add a bound to it. The new code is clearer about which encoder/decoder uses which impl, and it reduces rustc's dependency on specialization, making it easier to remove support for specialization entirely, or to turn it into a construct that is only allowed for perf optimizations, should we decide to do so.
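A hedged sketch of the pattern (illustrative names, not the actual rustc traits):

```rust
pub struct Span(u32);

pub trait Encodable<E> {
    fn encode(&self, e: &mut E);
}

// Any encoder that knows how to handle spans implements this trait once...
pub trait SpanEncoder {
    fn encode_span(&mut self, span: &Span);
}

// ...and `Span` is encodable exactly for such encoders. A derive for a type
// containing a `Span` would emit a `where E: SpanEncoder` bound, so the
// right impl is chosen by ordinary trait resolution, not specialization.
impl<E: SpanEncoder> Encodable<E> for Span {
    fn encode(&self, e: &mut E) {
        e.encode_span(self);
    }
}

pub struct FileEncoder(Vec<u8>);

impl SpanEncoder for FileEncoder {
    fn encode_span(&mut self, span: &Span) {
        self.0.extend_from_slice(&span.0.to_le_bytes());
    }
}
```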
StableCompare is a companion trait to `StableOrd`. Some types like `Symbol` can be compared in a cross-session stable way, but their `Ord` implementation is not stable. In such cases, a `StableCompare` implementation can be provided to offer a lightweight way for stable sorting. (The more heavyweight option is to sort via `ToStableHashKey`, but then sorting needs access to a stable hashing context, and `ToStableHashKey` can also be expensive, as in the case of `Symbol`, where it has to allocate a `String`.)
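A hedged sketch of the idea with a `Symbol`-like type (assumed shape; the real trait may carry extra machinery):

```rust
use std::cmp::Ordering;

trait StableCompare {
    fn stable_cmp(&self, other: &Self) -> Ordering;
}

// Ordering by the interner index is cheap but session-dependent; ordering
// by the underlying string is stable across sessions.
struct Symbol {
    index: u32,         // not stable: depends on interning order
    text: &'static str, // stable
}

impl StableCompare for Symbol {
    fn stable_cmp(&self, other: &Self) -> Ordering {
        self.text.cmp(other.text)
    }
}
```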
This involves lots of breaking changes. There are two big changes that force us to update our code. The first is that the bitflags types no longer automatically implement the usual derive traits, so we need to derive them manually.
Additionally, bitflags now use a hidden inner type by default, which breaks our custom derives. The bitflags docs recommend using the `impl` form in these cases, which I did.
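A hedged sketch of what the `impl` form looks like (hypothetical flags type):

```rust
use bitflags::bitflags;

// The derives that bitflags 1.x generated automatically are now written
// out manually on our own struct...
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct AttrFlags(u8);

// ...and the macro's `impl` form adds the flag constants and operations to
// it, keeping the inner type visible to our custom derives.
bitflags! {
    impl AttrFlags: u8 {
        const USED = 1 << 0;
        const INLINE = 1 << 1;
    }
}
```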
[`RFC 3086`] Attempt to resolve blocking concerns
Implements what is described at https://github.com/rust-lang/rust/issues/83527#issuecomment-1744822345 to hopefully make some progress.
Due to the lack of further feedback it is unknown whether this approach is desired; as such, it is probably best to nominate this PR so the relevant team can weigh in.
`@rustbot` labels +I-compiler-nominated
Detects redundant imports that can be eliminated.
For #117772: in order to facilitate review and modification, the checking code and the code that removes redundant imports are split into two PRs.
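A small illustration of the kind of import that gets flagged:

```rust
// `Option` is already in scope via the prelude, so this import is
// redundant and can be eliminated.
use std::option::Option;

fn main() {
    let _: Option<u8> = None;
}
```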
Make `FatalErrorMarker` lower priority than other panics
This makes `FatalErrorMarker` lower priority than other panics in parallel sections. If any other panics occur, they will be unwound instead of `FatalErrorMarker`. This ensures `rustc` will exit with the correct error code on ICEs.
This fixes https://github.com/rust-lang/rust/issues/116659.
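A hedged sketch of the selection rule (hypothetical helper, not the actual rustc code):

```rust
use std::any::Any;

// Payload used to unwind for fatal, already-reported errors.
struct FatalErrorMarker;

// Given panic payloads collected from a parallel section, pick the one to
// re-raise: any non-marker panic (a likely ICE) takes priority.
fn select_panic(payloads: Vec<Box<dyn Any + Send>>) -> Option<Box<dyn Any + Send>> {
    let mut marker = None;
    for p in payloads {
        if p.is::<FatalErrorMarker>() {
            marker = Some(p); // keep only as a fallback
        } else {
            return Some(p); // unwound instead of the marker
        }
    }
    marker
}
```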
enable parallel rustc front end in nightly builds
Per the [MCP](https://github.com/rust-lang/compiler-team/issues/681), this PR does the following:
1. Enables the parallel front end in nightly builds, keeping the default number of threads at 1. Users can then opt into the parallel rustc front end via the `-Z threads=n` option.
2. Sets the front end back to serial for beta/stable builds via bootstrap.
3. Switches the alt builders from parallel rustc to serial, so we have artifacts without the parallel front end to test against the artifacts with it.
r? `@oli-obk`
cc `@cjgillot` `@nnethercote` `@bjorn3` `@Kobzol`
- Sort dependencies and features sections.
- Add `tidy` markers to the sorted sections so they stay sorted.
- Remove empty `[lib]` sections.
- Remove "See more keys..." comments.
Excluded files:
- rustc_codegen_{cranelift,gcc}, because they're external.
- rustc_lexer, because it has external use.
- stable_mir, because it has external use.
Avoid a `track_errors` by bubbling up most errors from `check_well_formed`
I believe `track_errors` is mostly papering over issues that a sufficiently convoluted query graph can hit. I made this change while the actual change I want to make is to stop bailing out early on errors, and instead use the new `ErrorGuaranteed` to invoke `check_well_formed` for individual items before doing all the `typeck` logic on them.
This works towards resolving https://github.com/rust-lang/rust/issues/97477 and various other ICEs, as well as allowing us to use parallel rustc more (which is currently rather limited/bottlenecked due to the very sequential nature in which we do `rustc_hir_analysis::check_crate`)
cc `@SparrowLii` `@Zoxc` for the new `try_par_for_each_in` function
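A hedged sketch of what such a helper provides, reduced to the serial case (assumed signature; the parallel variant would distribute the iterations across threads):

```rust
// Visit every item even after a failure, then surface one error at the
// end, so error reporting does not depend on bailing out early.
fn try_par_for_each_in<T, E>(
    items: impl IntoIterator<Item = T>,
    f: impl Fn(T) -> Result<(), E>,
) -> Result<(), E> {
    let mut first_err = None;
    for item in items {
        if let Err(e) = f(item) {
            first_err.get_or_insert(e);
        }
    }
    match first_err {
        Some(e) => Err(e),
        None => Ok(()),
    }
}
```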
Remove `IdFunctor` trait.
It's defined in `rustc_data_structures` but is only used in
`rustc_type_ir`. The code is shorter and easier to read if we remove
this layer of abstraction and just do the things directly where they are
needed.
r? `@BoxyUwU`
Extract parallel operations in `rustc_data_structures::sync` into a new `parallel` submodule
This extracts parallel operations in `rustc_data_structures::sync` into a new `parallel` submodule. This cuts down on the size of the large `cfg_if!` in `sync` and makes it easier to compare between serial and parallel variants.
Add optimized lock methods for `Sharded` and refactor `Lock`
This adds methods to `Sharded` which pick a shard and also lock it. These branch on parallelism just once instead of twice, improving performance.
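A heavily simplified sketch of the shape of such a method (illustrative; rustc's real `Sharded` and `Lock` types differ):

```rust
use std::sync::{Mutex, MutexGuard};

enum Sharded<T> {
    Single(Mutex<T>),        // serial session: one shard, no index math
    Shards(Box<[Mutex<T>]>), // parallel session: hash-indexed shards
}

impl<T> Sharded<T> {
    // Picking the shard and locking it in one method means the
    // "which kind of session is this?" branch happens once, not twice.
    fn lock_shard_by_hash(&self, hash: u64) -> MutexGuard<'_, T> {
        match self {
            Sharded::Single(lock) => lock.lock().unwrap(),
            Sharded::Shards(shards) => {
                // Use high hash bits so the choice stays independent of
                // the low bits a hash table uses internally.
                let i = (hash >> 57) as usize % shards.len();
                shards[i].lock().unwrap()
            }
        }
    }
}
```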
Benchmark for `cfg(parallel_compiler)` and 1 thread:
| Benchmark | Before | After | % |
|:---|---:|---:|---:|
| 🟣 **clap**:check | 1.6461s | 1.6345s | -0.70% |
| 🟣 **hyper**:check | 0.2414s | 0.2394s | -0.83% |
| 🟣 **regex**:check | 0.9205s | 0.9143s | -0.67% |
| 🟣 **syn**:check | 1.4981s | 1.4869s | -0.75% |
| 🟣 **syntex_syntax**:check | 5.7629s | 5.7256s | -0.65% |
| Total | 10.0690s | 10.0008s | -0.68% |
| Summary | 1.0000s | 0.9928s | -0.72% |
cc `@SparrowLii`
Remove conditional use of `Sharded` from query state
`Sharded` is already a zero-cost abstraction, so it shouldn't affect the performance of the single-threaded compiler if LLVM does its job.
r? `@cjgillot`
elaborate a bit on the (lack of) safety in `Mmap::map`
Sadly none of the callers of this function even consider it worth mentioning in their unsafe block that what they are doing is completely unsound.
Fix race conditions with `SyntaxContext` decoding
This changes `SyntaxContext` decoding to work with concurrent decoding. The `remapped_ctxts` field now only stores `SyntaxContext`s which have completed decoding, while the new `decoding` and `local_in_progress` fields keep track of `SyntaxContext`s which are in the process of being decoded, and on which threads.
This fixes 2 issues with the current implementation. It can return a `SyntaxContext` which contains dummy data if another thread starts decoding before the first one has completely finished. Multiple threads could also allocate multiple `SyntaxContext`s for the same `raw_id`.
[rustc_data_structures][base_n][perf] Remove unnecessary UTF-8 check.
Since all output characters taken from `BASE_64` are valid UTF-8, there is no need to waste cycles on validation.
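In code terms, the change boils down to this (illustrative sketch, not the literal rustc diff):

```rust
// Digits are only ever drawn from `BASE_64`, a table of ASCII bytes, so
// the buffer is valid UTF-8 by construction.
fn digits_to_str(buf: &[u8]) -> &str {
    // Before: std::str::from_utf8(buf).unwrap() — a linear validity scan.
    // SAFETY: every byte in `buf` comes from `BASE_64`, which is pure ASCII.
    unsafe { std::str::from_utf8_unchecked(buf) }
}
```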
Even though it's obviously a perf win, I've also run a [benchmark](https://gist.github.com/ttsugriy/e1e63c07927d8f31e71695a9c617bbf3) on an M1 MacBook Air, with the following results:
```
Running benches/base_n_benchmark.rs (target/release/deps/base_n_benchmark-825fe5895b5c2693)
push_str/old time: [14.670 µs 14.852 µs 15.074 µs]
Found 11 outliers among 100 measurements (11.00%)
4 (4.00%) high mild
7 (7.00%) high severe
push_str/new time: [12.573 µs 12.674 µs 12.801 µs]
Found 11 outliers among 100 measurements (11.00%)
7 (7.00%) high mild
4 (4.00%) high severe
```
It lints against features that are intended to be internal to the
compiler and standard library. Implements MCP #596.
We allow `internal_features` in the standard library and compiler, as those
use many such features, and they _are_ the compiler and standard library that
the features are internal to, after all.
Marking some features as internal wasn't exactly the most scientific approach;
I just marked the mostly obvious ones. While there is a categorization in the macro,
it's not very well upheld (that should probably be fixed in another PR).
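For illustration, the kind of code the lint now fires on (exact warning text may differ):

```rust
// Crate root of some nightly crate: the internal feature gate itself
// triggers the `internal_features` lint.
#![feature(core_intrinsics)] // warning: the feature `core_intrinsics` is internal

fn main() {}
```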
We always pass `-Ainternal_features` in the testsuite
About 400 UI tests and several other tests use internal features.
Instead of throwing the attribute on each one, just always allow them.
There's nothing wrong with testing internal features^^
[rustc_data_structures][perf] Simplify base_n::push_str.
This minor change removes the need to reverse the resulting digits. Since the reverse is O(number of digits), bounded by 128, it's unlikely to be noticeable in practice. At the same time, the code is also one line shorter, so combined with a tiny perf win, why not?
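A hedged sketch of the back-filling approach (close in spirit to the rustc code, simplified):

```rust
const MAX_BASE: usize = 64;
const BASE_64: &[u8; MAX_BASE] =
    b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@$";

pub fn push_str(mut n: u128, base: usize, output: &mut String) {
    debug_assert!(2 <= base && base <= MAX_BASE);
    // 128 bytes suffice even for the worst case (base 2, all bits set).
    let mut s = [0u8; 128];
    let mut index = s.len();
    loop {
        index -= 1;
        s[index] = BASE_64[(n % base as u128) as usize];
        n /= base as u128;
        if n == 0 {
            break;
        }
    }
    // Digits are written back-to-front, so no final `reverse()` is needed.
    output.push_str(std::str::from_utf8(&s[index..]).unwrap());
}
```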
I ran https://gist.github.com/ttsugriy/ed14860ef597ab315d4129d5f8adb191 on an M1 MacBook Air and got a small improvement:
```
Running benches/base_n_benchmark.rs (target/release/deps/base_n_benchmark-825fe5895b5c2693)
push_str/old time: [14.180 µs 14.313 µs 14.462 µs]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
push_str/new time: [13.741 µs 13.839 µs 13.973 µs]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
```
[rustc_data_structures] Simplify SortedMap::insert.
It looks like the current usage of `swap` is aimed at achieving what `std::mem::replace` does; `replace` is more concise and idiomatic.
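The idiom in miniature (illustrative types):

```rust
use std::mem;

// Updating an existing entry and returning the old value: `mem::replace`
// says it in one call...
fn replace_value<V>(slot: &mut V, value: V) -> Option<V> {
    Some(mem::replace(slot, value))
}

// ...whereas the `swap`-based version needs a named temporary.
fn replace_value_swap<V>(slot: &mut V, mut value: V) -> Option<V> {
    mem::swap(slot, &mut value);
    Some(value)
}
```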
It no longer has any uses. If it's needed in the future, it can be
easily reinstated. Or a crate such as `smallstr` can be used, much like
we use `smallvec`.
Removed unnecessary `&String` -> `&str` conversions, now that `&String` implements `StableOrd` as well
Applied a few nits suggested by lcnr to PR #110040 (nits can be found [here](https://github.com/rust-lang/rust/pull/110040#pullrequestreview-1469452191).)
Making a new PR because the old one was already merged, and given that this just applies changes that were already suggested, reviewing it should be fairly open-and-shut.
Don't leak the function that is called on drop
It probably wasn't causing problems anyway, but still, a `// this leaks, please don't pass anything that owns memory` comment is not sustainable.
I could implement a version which does not require `Option`, but it would require `unsafe`, at which point it's probably not worth it.
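A hedged sketch of the `Option`-based guard described here (illustrative, not the exact rustc type):

```rust
struct OnDrop<F: FnOnce()>(Option<F>);

impl<F: FnOnce()> Drop for OnDrop<F> {
    fn drop(&mut self) {
        // `take` moves the closure out so it can be called by value; after
        // the call, anything the closure owned is dropped normally instead
        // of being leaked.
        if let Some(f) = self.0.take() {
            f();
        }
    }
}

fn main() {
    let owned = String::from("freed on drop, not leaked");
    let _guard = OnDrop(Some(move || println!("{owned}")));
}
```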
Use `Option::is_some_and` and `Result::is_ok_and` in the compiler
`.is_some_and(..)`/`.is_ok_and(..)` replace `.map_or(false, ..)` and `.map(..).unwrap_or(false)`, making the code more readable.
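For example:

```rust
fn main() {
    let opt: Option<u32> = Some(3);
    // Before: opt.map_or(false, |v| v > 2)
    assert!(opt.is_some_and(|v| v > 2));

    let res: Result<u32, ()> = Ok(3);
    // Before: res.map(|v| v > 2).unwrap_or(false)
    assert!(res.is_ok_and(|v| v > 2));
}
```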
This PR is a sibling of https://github.com/rust-lang/rust/pull/111873#issuecomment-1561316515
Preprocess and cache dominator tree
Preprocessing dominators has a very strong effect for https://github.com/rust-lang/rust/pull/111344.
That pass repeatedly checks that assignments dominate their uses. Using the unprocessed dominator tree caused a quadratic runtime (number of basic blocks × depth of the dominator tree).
This PR also caches the dominator tree and the pre-processed dominators in the MIR cfg cache.
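A hedged sketch of why preprocessing helps (illustrative, not the rustc data structure):

```rust
// Number each node with a [pre, post] interval from one DFS over the
// dominator tree. "a dominates b" then becomes an O(1) nesting test
// instead of a walk up the tree, so repeated queries no longer multiply
// into quadratic time.
struct Dominators {
    pre: Vec<u32>,  // DFS entry time per node
    post: Vec<u32>, // DFS exit time per node
}

impl Dominators {
    fn dominates(&self, a: usize, b: usize) -> bool {
        self.pre[a] <= self.pre[b] && self.post[b] <= self.post[a]
    }
}
```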
Rebase of https://github.com/rust-lang/rust/pull/107157
cc `@tmiasko`
Process the current bucket instead of the parent's bucket when starting the loop for dominators.
The linked paper by Georgiadis suggests in §2.2.3 processing `bucket[w]` when beginning the loop iteration, instead of `bucket[parent[w]]` when finishing it.
In the test case, we correctly computed `idom[2] = 0` and `sdom[3] = 1`, but the algorithm returned `idom[3] = 1` instead of the correct value `0`, because of the path 0-7-2-3.
This provoked an LLVM ICE in https://github.com/rust-lang/rust/pull/111061#issuecomment-1546912112. LLVM checks that SSA assignments dominate their uses using its own implementation of Lengauer-Tarjan, and saw a case where rustc was breaking the dominance property.
r? `@Mark-Simulacrum`