Parallel-compiler-related cleanup
Parallel-compiler-related cleanup
I carefully split changes into commits. Commit messages are self-explanatory. Squashing is not recommended.
cc "Parallel Rustc Front-end" https://github.com/rust-lang/rust/issues/113349
r? SparrowLii
``@rustbot`` label: +WG-compiler-parallel
tree-wide: parallel: Fully removed all `Lrc`, replaced with `Arc`
tree-wide: parallel: Fully removed all `Lrc`, replaced with `Arc`
This is continuation of https://github.com/rust-lang/rust/pull/132282 .
I'm pretty sure I did everything right. In particular, I searched all occurrences of `Lrc` in submodules and made sure that they don't need replacement.
There are other possibilities, through.
We can define `enum Lrc<T> { Rc(Rc<T>), Arc(Arc<T>) }`. Or we can make `Lrc` a union and on every clone we can read from special thread-local variable. Or we can add a generic parameter to `Lrc` and, yes, this parameter will be everywhere across all codebase.
So, if you think we should take some alternative approach, then don't merge this PR. But if it is decided to stick with `Arc`, then, please, merge.
cc "Parallel Rustc Front-end" ( https://github.com/rust-lang/rust/issues/113349 )
r? SparrowLii
`@rustbot` label WG-compiler-parallel
`rustc_middle` and `rustc_query_system` both have a file called
`dep_node.rs` with a big comment at the top, and the comments are very
similar. The one in `rustc_query_system` looks like the original, and
the one in `rustc_middle` is a copy with some improvements.
This commit removes the comment from `rustc_middle` and updates the one
in `rustc_query_system` to include the improvements. I did it this way
because `rustc_query_system` is the crate that defines `DepNode`, and so
seems like the right place for the comment.
When the word "cache" appears in the context of the query system, it often
isn't obvious whether that is referring to the in-memory query cache or the
on-disk incremental cache.
For these types, we can assure the reader that they are for in-memory caching.
Refactored the duplicated code into a function.
`with_feed_task` currently passes the query key to `debug_assert!`.
This commit changes that, so it debug prints the `DepNode`, as in
`with_task`.
`rustc_span::symbol` defines some things that are re-exported from
`rustc_span`, such as `Symbol` and `sym`. But it doesn't re-export some
closely related things such as `Ident` and `kw`. So you can do `use
rustc_span::{Symbol, sym}` but you have to do `use
rustc_span::symbol::{Ident, kw}`, which is inconsistent for no good
reason.
This commit re-exports `Ident`, `kw`, and `MacroRulesNormalizedIdent`,
and changes many `rustc_span::symbol::` qualifiers in `compiler/` to
`rustc_span::`. This is a 200+ net line of code reduction, mostly
because many files with two `use rustc_span` items can be reduced to
one.
Improve VecCache under parallel frontend
This replaces the single Vec allocation with a series of progressively larger buckets. With the cfg for parallel enabled but with -Zthreads=1, this looks like a slight regression in i-count and cycle counts (~1%).
With the parallel frontend at -Zthreads=4, this is an improvement (-5% wall-time from 5.788 to 5.4688 on libcore) than our current Lock-based approach, likely due to reducing the bouncing of the cache line holding the lock. At -Zthreads=32 it's a huge improvement (-46%: 8.829 -> 4.7319 seconds).
try-job: i686-gnu-nopt
try-job: dist-x86_64-linux
This replaces the single Vec allocation with a series of progressively
larger buckets. With the cfg for parallel enabled but with -Zthreads=1,
this looks like a slight regression in i-count and cycle counts (<0.1%).
With the parallel frontend at -Zthreads=4, this is an improvement (-5%
wall-time from 5.788 to 5.4688 on libcore) than our current Lock-based
approach, likely due to reducing the bouncing of the cache line holding
the lock. At -Zthreads=32 it's a huge improvement (-46%: 8.829 -> 4.7319
seconds).
Delete the `cfg(not(parallel))` serial compiler
Since it's inception a long time ago, the parallel compiler and its cfgs have been a maintenance burden. This was a necessary evil the allow iteration while not degrading performance because of synchronization overhead.
But this time is over. Thanks to the amazing work by the parallel working group (and the dyn sync crimes), the parallel compiler has now been fast enough to be shipped by default in nightly for quite a while now.
Stable and beta have still been on the serial compiler, because they can't use `-Zthreads` anyways.
But this is quite suboptimal:
- the maintenance burden still sucks
- we're not testing the serial compiler in nightly
Because of these reasons, it's time to end it. The serial compiler has served us well in the years since it was split from the parallel one, but it's over now.
Let the knight slay one head of the two-headed dragon!
#113349
Note that the default is still 1 thread, as more than 1 thread is still fairly broken.
cc `@onur-ozkan` to see if i did the bootstrap field removal correctly, `@SparrowLii` on the sync parts
Since it's inception a long time ago, the parallel compiler and its cfgs
have been a maintenance burden. This was a necessary evil the allow
iteration while not degrading performance because of synchronization
overhead.
But this time is over. Thanks to the amazing work by the parallel
working group (and the dyn sync crimes), the parallel compiler has now
been fast enough to be shipped by default in nightly for quite a while
now.
Stable and beta have still been on the serial compiler, because they
can't use `-Zthreads` anyways.
But this is quite suboptimal:
- the maintenance burden still sucks
- we're not testing the serial compiler in nightly
Because of these reasons, it's time to end it. The serial compiler has
served us well in the years since it was split from the parallel one,
but it's over now.
Let the knight slay one head of the two-headed dragon!
No longer track "zero-width" chars in `SourceMap`, read directly from the line when calculating the `display_col` of a `BytePos`. Move `char_width` to `rustc_span` and use it from the emitter.
This change allows the following to properly align in terminals (depending on the font, the replaced control codepoints are rendered as 1 or 2 width, on my terminal they are rendered as 1, on VSCode text they are rendered as 2):
```
error: this file contains an unclosed delimiter
--> $DIR/issue-68629.rs:5:17
|
LL | ␜␟ts␀![{i
| -- unclosed delimiter
| |
| unclosed delimiter
LL | ␀␀ fn rݻoa>rݻm
| ^
```
Added an associated `const THIS_IMPLEMENTATION_HAS_BEEN_TRIPLE_CHECKED`
to the `StableOrd` trait to ensure that implementors carefully consider
whether the trait's contract is upheld, as incorrect implementations can
cause miscompilations.
The way it is implemented currently try_force_from_dep_node returns true
as long as there's a function to force the query. It wasn't this way
from the beginning, earlier version was producing forcing result and it
was changed in https://github.com/rust-lang/rust/pull/89978, I couldn't
find any comments addressing this change.
One way it can fail is by failing to recover the query in
DepNodeParams::recover - when we are trying to query something that no
longer exists in the current environment