[rustc_data_structures][perf] Simplify base_n::push_str.
This minor change removes the need to reverse the resulting digits. Since the reversal is O(number of digits) and bounded by 128, it's unlikely to be noticeable in practice. At the same time, the code is also one line shorter, so combined with the tiny perf win, why not?
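A minimal sketch of the reverse-free approach (simplified; `DIGITS` and the buffer size are assumptions, not the exact rustc code): digits are written from the back of a fixed buffer, so the resulting slice is already in the right order.
```rust
const DIGITS: &[u8] = b"0123456789abcdefghijklmnopqrstuvwxyz";

fn push_str(mut n: u128, base: usize, output: &mut String) {
    debug_assert!((2..=DIGITS.len()).contains(&base));
    // 128 bytes is enough for u128 even in base 2.
    let mut buf = [0u8; 128];
    let mut index = buf.len();
    let base = base as u128;
    loop {
        index -= 1;
        buf[index] = DIGITS[(n % base) as usize];
        n /= base;
        if n == 0 {
            break;
        }
    }
    // Digits were produced back-to-front, so no reverse pass is needed.
    output.push_str(std::str::from_utf8(&buf[index..]).unwrap());
}
```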
I ran https://gist.github.com/ttsugriy/ed14860ef597ab315d4129d5f8adb191 on an M1 MacBook Air and got a small improvement:
```
Running benches/base_n_benchmark.rs (target/release/deps/base_n_benchmark-825fe5895b5c2693)
push_str/old time: [14.180 µs 14.313 µs 14.462 µs]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
push_str/new time: [13.741 µs 13.839 µs 13.973 µs]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
```
[rustc_data_structures] Simplify SortedMap::insert.
The current usage of `swap` looks like an attempt to achieve what `std::mem::replace` does; `replace` expresses it more concisely and idiomatically.
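For illustration (a hypothetical helper, not the actual `SortedMap` code), the two spellings of "store a new value and hand back the old one":
```rust
use std::mem;

// Store `new` into `slot` and return the previous value.
fn replace_demo(slot: &mut Vec<u32>, new: Vec<u32>) -> Vec<u32> {
    // The `swap` spelling needs a mutable temporary:
    //     let mut tmp = new;
    //     mem::swap(slot, &mut tmp);
    //     tmp
    // `mem::replace` says the same thing in one call:
    mem::replace(slot, new)
}
```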
It no longer has any uses. If it's needed in the future, it can be
easily reinstated. Or a crate such as `smallstr` can be used, much like
we use `smallvec`.
Removed an unnecessary `&String -> &str` conversion, now that `&String` implements `StableOrd` as well.
Applied a few nits suggested by lcnr to PR #110040 (the nits can be found [here](https://github.com/rust-lang/rust/pull/110040#pullrequestreview-1469452191)).
Making a new PR because the old one was already merged; given that this just applies changes that were already suggested, reviewing it should be fairly open-and-shut.
Don't leak the function that is called on drop
It probably wasn't causing problems anyway, but still, a `// this leaks, please don't pass anything that owns memory` comment is not sustainable.
I could implement a version which does not require `Option`, but it would require `unsafe`, at which point it's probably not worth it.
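A minimal sketch of the `Option`-based shape (type and field names are assumptions, not the actual rustc code): `take()` moves the closure out in `drop`, so it runs exactly once and anything it owns is dropped normally instead of leaked.
```rust
struct OnDrop<F: FnOnce()>(Option<F>);

impl<F: FnOnce()> Drop for OnDrop<F> {
    fn drop(&mut self) {
        // `take` moves the closure out of the Option so it can be called
        // by value; no leak, and a double call is impossible.
        if let Some(f) = self.0.take() {
            f();
        }
    }
}

fn main() {
    let _guard = OnDrop(Some(|| println!("cleaned up")));
}
```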
Use `Option::is_some_and` and `Result::is_ok_and` in the compiler
`.is_some_and(..)`/`.is_ok_and(..)` replace `.map_or(false, ..)` and `.map(..).unwrap_or(false)`, making the code more readable.
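For example:
```rust
fn main() {
    let x: Option<u32> = Some(2);
    // Before:
    assert!(x.map_or(false, |n| n > 1));
    // After:
    assert!(x.is_some_and(|n| n > 1));

    let r: Result<u32, ()> = Ok(2);
    assert!(r.is_ok_and(|n| n > 1));
}
```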
This PR is a sibling of https://github.com/rust-lang/rust/pull/111873#issuecomment-1561316515
Preprocess and cache dominator tree
Preprocessing dominators has a very strong effect on https://github.com/rust-lang/rust/pull/111344.
That pass repeatedly checks that assignments dominate their uses. Using the unprocessed dominator tree caused quadratic runtime (number of basic blocks × depth of the dominator tree).
This PR also caches the dominator tree and the pre-processed dominators in the MIR cfg cache.
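In plain-Rust terms (a sketch, not rustc's actual types), a dominance query on a preprocessed tree walks from the node up through its immediate dominators, costing O(depth) per query; recomputing the tree for every query is what made the pass quadratic.
```rust
// idom[v] holds v's immediate dominator, if it has one.
struct Dominators {
    idom: Vec<Option<usize>>,
}

impl Dominators {
    /// Does `a` dominate `b`? Walk up from `b` through immediate dominators.
    fn dominates(&self, a: usize, mut b: usize) -> bool {
        loop {
            if a == b {
                return true;
            }
            match self.idom[b] {
                Some(parent) => b = parent,
                None => return false, // reached the start node
            }
        }
    }
}
```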
Rebase of https://github.com/rust-lang/rust/pull/107157
cc `@tmiasko`
Process current bucket instead of parent's bucket when starting loop for dominators.
The linked paper by Georgiadis suggests in §2.2.3 to process `bucket[w]` when beginning the loop, instead of `bucket[parent[w]]` when finishing it.
In the test case, we correctly computed `idom[2] = 0` and `sdom[3] = 1`, but the algorithm returned `idom[3] = 1`, instead of the correct value 0, because of the path 0-7-2-3.
This provoked an LLVM ICE in https://github.com/rust-lang/rust/pull/111061#issuecomment-1546912112. LLVM checks that SSA assignments dominate their uses using its own implementation of Lengauer-Tarjan, and saw a case where rustc was breaking the dominance property.
r? `@Mark-Simulacrum`
Change the `immediate_dominator` return type to `Option`, and use `None` to indicate that a node has no immediate dominator.
Also fix the issue where the start node would be returned as its own
immediate dominator.
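Sketched with simplified types (the real code uses rustc's graph node types), the new shape is:
```rust
struct Dominators {
    idom: Vec<Option<usize>>,
}

impl Dominators {
    // Previously this returned a plain node and reported the start node
    // as its own immediate dominator; `Option` makes the case explicit.
    fn immediate_dominator(&self, node: usize) -> Option<usize> {
        self.idom[node]
    }
}
```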
Introduce `DynSend` and `DynSync` auto trait for parallel compiler
part of parallel-rustc #101566
This PR introduces the `DynSend` / `DynSync` traits and the `FromDyn` / `IntoDyn` structures in `rustc_data_structures::marker`. `FromDyn` can dynamically check data structures for thread safety when switching to parallel environments (such as calling `par_for_each_in`). This happens only when `-Z threads > 1`, so it doesn't affect compile efficiency in single-threaded mode.
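A rough nightly-Rust sketch of the shape (the bodies and bounds here are simplified assumptions, not the actual `rustc_data_structures::marker` definitions):
```rust
#![feature(auto_traits)] // nightly-only

// Auto traits mirroring Send/Sync: implemented automatically when all
// fields implement them, with explicit opt-outs where needed.
unsafe auto trait DynSend {}
unsafe auto trait DynSync {}

// FromDyn re-asserts thread safety at the point where a value crosses
// into a parallel region such as par_for_each_in.
struct FromDyn<T>(T);

impl<T: DynSend> FromDyn<T> {
    fn from(value: T) -> Self {
        FromDyn(value)
    }
}

fn main() {}
```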
r? `@cjgillot`
bump windows crate 0.46 -> 0.48
This drops the duplicated version of the crate (0.46), shrinks `rustc_driver.dll` by ~800KB, and reduces the number of exported functions from 26k to 22k.
Also, while at it, added `tidy-alphabetical` sorting to the lists in tidy's allowed lists.
Min specialization improvements
- Don't allow specialization impls with no items; such implementations are probably not correct and only occur as mistakes in the compiler and standard library
- Fix a missing normalization call
- Add spans for lifetime errors from overly general specializations
Closes #79457. Closes #109815.
Such implementations are usually mistakes and are not used in the
compiler or standard library (after this commit) so forbid them with
`min_specialization`.
Move the WorkerLocal type from the rustc-rayon fork into rustc_data_structures
This PR moves the definition of the `WorkerLocal` type from `rustc-rayon` into `rustc_data_structures`. This is enabled by the introduction of the `Registry` type, which allows you to group up threads to be used by `WorkerLocal`, which is basically just an array with a per-thread index. The `Registry` type mirrors the one in Rayon, and each Rayon worker thread is also registered with the new `Registry`. Safety for `WorkerLocal` is ensured by having it keep a reference to the registry and checking on each access that we're still on the group of threads associated with the registry used to construct it.
Accessing a `WorkerLocal` is micro-optimized due to it being hot since it's used for most arena allocations.
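A rough sketch of the idea (much simplified; the real type also verifies registry membership on every access):
```rust
// One slot per registered worker thread, indexed by the thread's
// registry index.
struct WorkerLocal<T> {
    locals: Vec<T>,
}

impl<T> WorkerLocal<T> {
    fn new(threads: usize, initial: impl Fn(usize) -> T) -> Self {
        WorkerLocal { locals: (0..threads).map(initial).collect() }
    }

    fn get(&self, thread_index: usize) -> &T {
        // The real implementation checks that the calling thread belongs
        // to the registry this value was created with before indexing.
        &self.locals[thread_index]
    }
}
```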
Performance is slightly improved for the parallel compiler:
| Benchmark | Before (time) | After (time) | % |
|---|---:|---:|---:|
| 🟣 **clap**:check | 1.9992s | 1.9949s | -0.21% |
| 🟣 **hyper**:check | 0.2977s | 0.2970s | -0.22% |
| 🟣 **regex**:check | 1.1335s | 1.1315s | -0.18% |
| 🟣 **syn**:check | 1.8235s | 1.8171s | -0.35% |
| 🟣 **syntex_syntax**:check | 6.9047s | 6.8930s | -0.17% |
| **Total** | 12.1586s | 12.1336s | -0.21% |
| **Summary** | 1.0000s | 0.9977s | -0.23% |
cc `@SparrowLii`
Sprinkle some `#[inline]` in `rustc_data_structures::tagged_ptr`
This is based on `nm --demangle (rustc +a --print sysroot)/lib/librustc_driver-*.so | rg CopyTaggedPtr` which shows many methods that should probably be inlined. May fix the regression in https://github.com/rust-lang/rust/pull/110795.
r? `@Nilstrieb`
Add `impl_tag!` macro to implement `Tag` for tagged pointer easily
r? `@Nilstrieb`
This should also lift the need to think about safety from the callers (`impl_tag!` is robust (ish, see the macro issue)) and removes the possibility of making a "weird" `Tag` impl.
Encode hashes as bytes, not varint
In a few places, we store hashes as `u64` or `u128` and then apply `derive(Decodable, Encodable)` to the enclosing struct/enum. It is more efficient to encode hashes directly than to apply some varint encoding. This PR adds two new types, `Hash64` and `Hash128`, which are produced by `StableHasher` and replace every stored `u64` or `u128` that represents a hash.
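The intuition, sketched (simplified; not the exact `Encodable` impl): hash bits are uniformly distributed, so varint encoding almost never saves bytes on them and often adds one.
```rust
struct Hash64(u64);

impl Hash64 {
    // Fixed-width encoding: always exactly 8 bytes, no continuation bits.
    fn encode(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.0.to_le_bytes());
    }

    fn decode(bytes: [u8; 8]) -> Self {
        Hash64(u64::from_le_bytes(bytes))
    }
}
```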
Distribution of the byte lengths of leb128 encodings, from `x build --stage 2` with `incremental = true`
Before:
```
( 1) 373418203 (53.7%, 53.7%): 1
( 2) 196240113 (28.2%, 81.9%): 3
( 3) 108157958 (15.6%, 97.5%): 2
( 4) 17213120 ( 2.5%, 99.9%): 4
( 5) 223614 ( 0.0%,100.0%): 9
( 6) 216262 ( 0.0%,100.0%): 10
( 7) 15447 ( 0.0%,100.0%): 5
( 8) 3633 ( 0.0%,100.0%): 19
( 9) 3030 ( 0.0%,100.0%): 8
( 10) 1167 ( 0.0%,100.0%): 18
( 11) 1032 ( 0.0%,100.0%): 7
( 12) 1003 ( 0.0%,100.0%): 6
( 13) 10 ( 0.0%,100.0%): 16
( 14) 10 ( 0.0%,100.0%): 17
( 15) 5 ( 0.0%,100.0%): 12
( 16) 4 ( 0.0%,100.0%): 14
```
After:
```
( 1) 372939136 (53.7%, 53.7%): 1
( 2) 196240140 (28.3%, 82.0%): 3
( 3) 108014969 (15.6%, 97.5%): 2
( 4) 17192375 ( 2.5%,100.0%): 4
( 5) 435 ( 0.0%,100.0%): 5
( 6) 83 ( 0.0%,100.0%): 18
( 7) 79 ( 0.0%,100.0%): 10
( 8) 50 ( 0.0%,100.0%): 9
( 9) 6 ( 0.0%,100.0%): 19
```
The remaining 9- or 10-byte and 18- or 19-byte encodings are `u64` and `u128` values, respectively, that have their high bits set. As far as I can tell, these come primarily from `SwitchTargets`.
Spelling compiler
This is per https://github.com/rust-lang/rust/pull/110392#issuecomment-1510193656
I'm going to delay performing a squash, because I really don't expect people to be perfectly happy with my changes; I really am a human and I really do make mistakes.
r? Nilstrieb
I'm going to be flying this evening, but I should be able to squash / respond to reviews w/in a day or two.
I tried to be careful about dropping changes to `tests`; afaict only two files had changes that were likely related to the changes for a given commit (this is where not having eagerly squashed should have given me an advantage). That said, picking things apart can be error prone.
Rollup of 7 pull requests
Successful merges:
- #109981 (Set commit information environment variables when building tools)
- #110348 (Add list of supported disambiguators and suffixes for intra-doc links in the rustdoc book)
- #110409 (Don't use `serde_json` to serialize a simple JSON object)
- #110442 (Avoid including dry run steps in the build metrics)
- #110450 (rustdoc: Fix invalid handling of nested items with `--document-private-items`)
- #110461 (Use `Item::expect_*` and `ImplItem::expect_*` more)
- #110465 (Assure everyone that `has_type_flags` is fast)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Don't use `serde_json` to serialize a simple JSON object
This avoids `rustc_data_structures` depending on `serde_json` which allows it to be compiled much earlier, unlocking most of rustc.
This used to not matter, but after #110407 we're not blocked on fluent anymore, which means that it's now a blocking edge.
![image](https://user-images.githubusercontent.com/48135649/232313178-e0150420-3020-4eb6-98d3-fe5294a8f947.png)
This saves a few more seconds.
cc `@Zoxc` who added it recently
Implement StableHasher::write_u128 via write_u64
In https://github.com/rust-lang/rust/pull/110367#issuecomment-1510114777 the cachegrind diffs indicate that nearly all the regression is from this:
```
22,892,558 ???:<rustc_data_structures::sip128::SipHasher128>::slice_write_process_buffer
-9,502,262 ???:<rustc_data_structures::sip128::SipHasher128>::short_write_process_buffer::<8>
```
This happens because the diff for that perf run swaps a `Hash::hash` of a `u64` to a `u128`. But `slice_write_process_buffer` is a `#[cold]` function, meant for handling hashes of arbitrary-length byte arrays.
Using the much more optimizer-friendly `u64` path twice to hash a `u128` provides a nice perf boost in some benchmarks.
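The trick, sketched in terms of the standard `Hasher` trait (the actual change lives inside `StableHasher`):
```rust
use std::hash::Hasher;

// Hash a u128 as two u64 halves so both writes take the fixed-size fast
// path instead of the generic arbitrary-length (#[cold]) slice path.
fn write_u128_via_u64(hasher: &mut impl Hasher, n: u128) {
    hasher.write_u64(n as u64);
    hasher.write_u64((n >> 64) as u64);
}
```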
Tagged pointers, now with strict provenance!
This is a big refactor of tagged pointers in rustc, with three main goals:
1. Porting the code to the strict provenance
2. Cleanup the code
3. Document the code (and safety invariants) better
This PR has grown quite a bit (almost a complete rewrite at this point...), so I'm not sure what's the best way to review this, but reviewing commit-by-commit should be fine.
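The core strict-provenance move, sketched (simplified; the real code derives the mask from the pointee's alignment, and these APIs were still feature-gated at the time): the address is transformed through `map_addr`, preserving the pointer's provenance instead of round-tripping through `usize` casts.
```rust
// Pack `tag` into the low (alignment) bits of `ptr`.
fn tag_ptr<T>(ptr: *mut T, tag: usize, mask: usize) -> *mut T {
    debug_assert_eq!(tag & !mask, 0, "tag must fit in the mask bits");
    ptr.map_addr(|addr| addr | tag)
}

// Recover the untagged pointer and the tag.
fn untag_ptr<T>(ptr: *mut T, mask: usize) -> (*mut T, usize) {
    (ptr.map_addr(|addr| addr & !mask), ptr.addr() & mask)
}
```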
r? `@Nilstrieb`
Remove some suspicious cast truncations
These truncations were added a long time ago and, as best I can tell, without a perf justification. And with rust-lang/rust#110410 it has become perf-neutral to not truncate anymore. We worked hard for all these bits; let's use them.
Turns out
- `owning_ref` is unsound due to `Box` aliasing stuff
- `rustc` doesn't need 99% of the `owning_ref` API
- `rustc` can use a far simpler abstraction, `OwnedSlice`
Also, `MTRef<'a, T>` is a typedef for a reference to a `T`, but in
practice it's only used (and useful) in combination with `MTLock`, i.e.
`MTRef<'a, MTLock<T>>`. So this commit changes it to be a typedef for a
reference to an `MTLock<T>`, and renames it as `MTLockRef`. I think this
clarifies things, because I found `MTRef` quite puzzling at first.
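Concretely (with a stand-in for rustc's lock wrapper):
```rust
pub struct MTLock<T>(T); // stand-in for rustc's lock wrapper

// Before: a bare reference alias, in practice always instantiated as
// `MTRef<'a, MTLock<T>>`:
//     pub type MTRef<'a, T> = &'a T;
// After: the lock is part of the alias, and the name says so:
pub type MTLockRef<'a, T> = &'a MTLock<T>;
```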
Add `-Z time-passes-format` to allow specifying a JSON output for `-Z time-passes`
This adds back the `-Z time` option as that is useful for [my rustc benchmark tool](https://github.com/Zoxc/rcb), reverting https://github.com/rust-lang/rust/pull/102725. It now uses nanoseconds and bytes as the units so it is renamed to `time-precise`.
- only borrow the refcell once per loop (see the sketch after this list)
- avoid complex matches to reduce branch paths in the hot loop
- use a by-ref fast path that avoids mutations at the expense of having false negatives
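A sketch of the first point with a hypothetical hot loop (not the actual rustc code):
```rust
use std::cell::RefCell;

fn process(items: &[u32], out: &RefCell<Vec<u32>>) {
    // Instead of re-borrowing on every iteration:
    //     for &x in items { out.borrow_mut().push(x); }
    // take the borrow once, outside the hot loop:
    let mut sink = out.borrow_mut();
    for &x in items {
        sink.push(x);
    }
}
```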
Rename `MapInPlace` as `FlatMapInPlace`.
This follows the removal of the `map_in_place` method, which wasn't much use because modifying every element in a collection such as a `Vec` can be done trivially with iteration.
r? ``@lqd``
Do not implement HashStable for HashSet (MCP 533)
This PR removes all occurrences of `HashSet` in query results, replacing it either with `FxIndexSet` or with `UnordSet`, and then removes the `HashStable` implementation of `HashSet`. This is part of implementing [MCP 533](https://github.com/rust-lang/compiler-team/issues/533), that is, removing the `HashStable` implementations of all collection types with unstable iteration order.
The changes are mostly mechanical. The only place where additional sorting is happening is in Miri's override implementation of the `exported_symbols` query.
Use a lock-free datastructure for source_span
follow up to the perf regression in https://github.com/rust-lang/rust/pull/105462
The main regression is likely the CStore, but let's evaluate the perf impact of this on its own
Better debug logs for borrowck constraint graph
It's really cumbersome to work with `RegionVar`s when trying to debug borrowck code or to understand how the borrow checker works. This PR collects some region information (behind `cfg(debug_assertions)`) for created `RegionVar`s (NLL region vars; this PR doesn't touch canonicalization) and prints the nodes and edges of the strongly connected constraint graph, using representatives based on that region information (either lifetime names, locations in MIR, or spans).
Bump bootstrap compiler to 1.68
This also changes our stage0.json to include the rustc component for the rustfmt pinned nightly toolchain, which is currently necessary due to rustfmt dynamically linking to that toolchain's librustc_driver and libstd.
r? `@pietroalbini`
Use stable metric for const eval limit instead of current terminator-based logic
This patch adds a `MirPass` that inserts a new MIR instruction `ConstEvalCounter` to any loops and function calls in the CFG. This instruction is used during Const Eval to count against the `const_eval_limit`, and emit the `StepLimitReached` error, replacing the current logic which uses Terminators only.
The new method of counting loops and function calls should be more stable across compiler versions (i.e., not cause crates that compiled successfully before to stop compiling when changes are made to MIR generation/optimization).
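A sketch of the counting idea (simplified; not rustc's actual interpreter types): each `ConstEvalCounter` instruction the pass inserts bumps a counter during evaluation, and crossing the limit raises the error.
```rust
struct StepCounter {
    steps: u64,
    limit: u64, // const_eval_limit
}

impl StepCounter {
    // Called whenever the interpreter executes a ConstEvalCounter
    // instruction, i.e. once per loop iteration or function call.
    fn bump(&mut self) -> Result<(), &'static str> {
        self.steps += 1;
        if self.steps > self.limit {
            Err("StepLimitReached: exceeded const_eval_limit")
        } else {
            Ok(())
        }
    }
}
```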
Also see: #103877
Consistently use dominates instead of is_dominated_by
There are a number of APIs that answer dominance queries. Previously they were named either "dominates" or "is_dominated_by". Consistently use the "dominates" form.
No functional changes.
Use UnordMap and UnordSet for id collections (DefIdMap, LocalDefIdMap, etc)
This PR changes the `rustc_data_structures::define_id_collections!` macro to use `UnordMap` and `UnordSet` instead of `FxHashMap` and `FxHashSet`. This should account for a large portion of hash-maps being used in places where they can cause trouble.
The changes required are moderate but non-zero:
- In some places the collections are extracted into sorted vecs.
- There are a few instances where for-loops have been changed to extends.
~~Let's see what the performance impact is. With a bit more refactoring, we might be able to get rid of some of the additional sorting -- but the change set is already big enough. Unless there's a performance impact, I'd like to do further changes in subsequent PRs.~~
Performance does not seem to be negatively affected ([perf-run here](https://github.com/rust-lang/rust/pull/106977#issuecomment-1396776699)).
Part of [MCP 533](https://github.com/rust-lang/compiler-team/issues/533).
r? `@ghost`
Switch uses to `Break(())` and `Continue(())` instead.
libs-api would like to remove these constants, so stop using them in the compiler to make the later removal PR smaller.
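Assuming the constants in question are `ControlFlow`'s unstable `BREAK` / `CONTINUE` associated constants, the replacement spelling is just as short:
```rust
use std::ops::ControlFlow;

fn main() {
    // Formerly ControlFlow::BREAK / ControlFlow::CONTINUE:
    let stop: ControlFlow<()> = ControlFlow::Break(());
    let go: ControlFlow<()> = ControlFlow::Continue(());
    assert!(stop.is_break());
    assert!(go.is_continue());
}
```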
Convert all the crates that have had their diagnostic migration
completed (except save_analysis because that will be deleted soon and
apfloat because of the licensing problem).
HIR debug output is currently very verbose, especially when used with
the alternate (`#`) flag. This commit reduces the number of noisy
newlines by forcing a few small key types to stay on one line, which
makes the output easier to read and scroll by.
```
$ rustc +after hello_world.rs -Zunpretty=hir-tree | wc -l
582
$ rustc +before hello_world.rs -Zunpretty=hir-tree | wc -l
932
```
Remove the `..` from the body; only a few invocations used it, and it's inconsistent with Rust syntax.
Use `;` instead of `,` between consts, as the Rust syntax gods intended.
Put all cached values into a central struct instead of just the stable hash
cc `@nnethercote`
This allows reuse of the type for `Predicate` without duplicating all the logic for the non-hash cached fields.
Add StableOrd trait as proposed in MCP 533.
The `StableOrd` trait can be used to mark types as having a stable sort order across compilation sessions. Collections that sort their items in a stable way can safely implement HashStable by hashing items in sort order.
See https://github.com/rust-lang/compiler-team/issues/533 for more information.
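The shape of the idea, sketched (simplified; see the MCP for the real definition and its safety requirements):
```rust
// Marker: comparing values of this type gives the same order in every
// compilation session (nothing pointer-based or otherwise session-dependent).
trait StableOrd: Ord {}

impl StableOrd for u32 {}
impl StableOrd for String {}

// A sorted collection of such items, e.g. a BTreeSet<u32>, can then
// implement HashStable by hashing its items in iteration (= sort) order.
```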
Rollup of 9 pull requests
Successful merges:
- #104199 (Keep track of the start of the argument block of a closure)
- #105050 (Remove useless borrows and derefs)
- #105153 (Create a hacky fail-fast mode that stops tests at the first failure)
- #105164 (Restore `use` suggestion for `dyn` method call requiring `Sized`)
- #105193 (Disable coverage instrumentation for naked functions)
- #105200 (Remove useless filter in unused extern crate check.)
- #105201 (Do not call fn_sig on non-functions.)
- #105208 (Add AmbiguityError for inconsistent resolution for an import)
- #105214 (update Miri)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Remove useless borrows and derefs
They are nothing more than noise.
<sub>These are not all of them, but my clippy started crashing (stack overflow), so rip :(</sub>
Use liballoc's specialised in-place vec collection
liballoc already specialises in-place vector collection, so manually
reimplementing it in `IdFunctor::try_map_id` was superfluous.
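What the specialization buys, roughly: collecting a `Vec`'s `into_iter()` back into a `Vec` with a compatible element layout reuses the original allocation instead of allocating a new buffer.
```rust
fn main() {
    let v: Vec<u64> = (0..1024).collect();
    let before = v.as_ptr();
    // Same element size and alignment, so liballoc maps in place.
    let doubled: Vec<u64> = v.into_iter().map(|x| x * 2).collect();
    // Not guaranteed by the documentation, but this is the optimization
    // that try_map_id now benefits from:
    println!("buffer reused: {}", before == doubled.as_ptr());
}
```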
Use Set instead of Vec in transitive_relation
Helps with #103195. It doesn't fix the underlying quadratic behavior, but it makes things _a lot_ faster, to the extent that even doubling the number of nested references still takes less than two seconds (versus 50s on nightly).
I want to see whether this causes regressions (because the vec was usually quite small) or improvements (as lookup for bigger sets is now much faster) in real code.
Remove "execute" bit from lock file permissions
Previously, flock would set the "execute" bit on Rust lock files. That makes no sense.
This patch clears the "execute" bit on Rust lock files.
See issue #102531.
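A hedged Unix sketch (the path, mode, and use of `OpenOptions` are illustrative, not rustc's actual flock code):
```rust
use std::fs::OpenOptions;
use std::os::unix::fs::OpenOptionsExt;

fn main() -> std::io::Result<()> {
    // rw------- : no execute bits on a file that only ever gets locked.
    let _lock = OpenOptions::new()
        .create(true)
        .write(true)
        .mode(0o600)
        .open("/tmp/example.lock")?;
    Ok(())
}
```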
Remove `-Ztime`
Because it has a lot of overlap with `-Ztime-passes` but is generally less useful. Plus some related cleanups.
Best reviewed one commit at a time.
r? `@davidtwco`
`print_time_passes_entry` unconditionally prints data about a pass. The
most commonly used call site, in `VerboseTimingGuard::drop`, guards it
with a `should_print_passes` test. But there are a couple of other call
sites that don't do that test.
This commit moves the `should_print_passes` test within
`print_time_passes_entry` so that all passes are treated equally.
The compiler currently has `-Ztime` and `-Ztime-passes`. I've used
`-Ztime-passes` for years but only recently learned about `-Ztime`.
What's the difference? Let's look at the `-Zhelp` output:
```
-Z time=val -- measure time of rustc processes (default: no)
-Z time-passes=val -- measure time of each rustc pass (default: no)
```
The `-Ztime-passes` description is clear, but the `-Ztime` one is less so.
Sounds like it measures the time for the entire process?
No. The real difference is that `-Ztime-passes` prints out info about passes,
and `-Ztime` does the same, but only for a subset of those passes. More
specifically, there is a distinction in the profiling code between a "verbose
generic activity" and an "extra verbose generic activity". `-Ztime-passes`
prints both kinds, while `-Ztime` only prints the first one. (It took me
a close reading of the source code to determine this difference.)
In practice this distinction has low value. Perhaps in the past the "extra
verbose" output was more voluminous, but now that we only print stats for a
pass if it exceeds 5ms or alters the RSS, `-Ztime-passes` is less spammy. Also,
a lot of the "extra verbose" cases are for individual lint passes, and you need
to also use `-Zno-interleave-lints` to see those anyway.
Therefore, this commit removes `-Ztime` and the associated machinery. One thing
to note is that the existing "extra verbose" activities all have an extra
string argument, so the commit adds the ability to accept an extra argument to
the "verbose" activities.