Batch mark waiters as unblocked when resuming in the deadlock handler
This fixes a race when resuming multiple threads to resolve query cycles. This now marks all threads as unblocked before resuming any of them. Previously if one was resumed and marked as unblocked at a time. The first thread resumed could fall asleep then Rayon would detect a second false deadlock. Later the initial deadlock handler thread would resume further threads.
This also reverts the workaround added in https://github.com/rust-lang/rust/pull/137731.
cc `@SparrowLii` `@lqd`
Abort in deadlock handler if we fail to get a query map
Resolving query cycles requires the complete active query map, or it may miss query cycles. We did not check that the map is completely constructed before. If there is some error collecting the map, something has gone wrong already. This adds a check to abort/panic if we fail to construct the complete map.
This can help differentiate errors from the `deadlock detected` case if constructing query map has errors in practice.
An `Option` is not used for `collect_active_jobs` as the panic handler can still make use of a partial map.
The `hash_raw_entry` feature has finished fcp-close, so the compiler
should stop using it to allow its removal. Several `Sharded` maps were
using raw entries to avoid re-hashing between shard and map lookup, and
we can do that with `hashbrown::HashTable` instead.
Resume one waiter at once in deadlock handler
When multiple query loop errors occur in the code, only one waiter should be resumed at a time to avoid waking up multiple waiters at the same time and causing deadlock due to thread grabbing.
This fixes the UI failures in #132051
cc `@Zoxc` `@cjgillot` `@nnethercote` `@bjorn3` `@Kobzol`
Zulip discussion [here](https://rust-lang.zulipchat.com/#narrow/channel/187679-t-compiler.2Fwg-parallel-rustc/topic/Deadlocks.20and.20Rayon)
Edit: We can't reproduce these bugs with the existing test suits, so we keep them until we merge #132051
UPDATES #129912
UPDATES #120757
UPDATES #129911
When the word "cache" appears in the context of the query system, it often
isn't obvious whether that is referring to the in-memory query cache or the
on-disk incremental cache.
For these types, we can assure the reader that they are for in-memory caching.
Improve VecCache under parallel frontend
This replaces the single Vec allocation with a series of progressively larger buckets. With the cfg for parallel enabled but with -Zthreads=1, this looks like a slight regression in i-count and cycle counts (~1%).
With the parallel frontend at -Zthreads=4, this is an improvement (-5% wall-time from 5.788 to 5.4688 on libcore) than our current Lock-based approach, likely due to reducing the bouncing of the cache line holding the lock. At -Zthreads=32 it's a huge improvement (-46%: 8.829 -> 4.7319 seconds).
try-job: i686-gnu-nopt
try-job: dist-x86_64-linux
This replaces the single Vec allocation with a series of progressively
larger buckets. With the cfg for parallel enabled but with -Zthreads=1,
this looks like a slight regression in i-count and cycle counts (<0.1%).
With the parallel frontend at -Zthreads=4, this is an improvement (-5%
wall-time from 5.788 to 5.4688 on libcore) than our current Lock-based
approach, likely due to reducing the bouncing of the cache line holding
the lock. At -Zthreads=32 it's a huge improvement (-46%: 8.829 -> 4.7319
seconds).
Since it's inception a long time ago, the parallel compiler and its cfgs
have been a maintenance burden. This was a necessary evil the allow
iteration while not degrading performance because of synchronization
overhead.
But this time is over. Thanks to the amazing work by the parallel
working group (and the dyn sync crimes), the parallel compiler has now
been fast enough to be shipped by default in nightly for quite a while
now.
Stable and beta have still been on the serial compiler, because they
can't use `-Zthreads` anyways.
But this is quite suboptimal:
- the maintenance burden still sucks
- we're not testing the serial compiler in nightly
Because of these reasons, it's time to end it. The serial compiler has
served us well in the years since it was split from the parallel one,
but it's over now.
Let the knight slay one head of the two-headed dragon!
This sharding is never used (per the comment in code). If we re-add
sharding at some point in the future this is cheap to restore, but for
now no need for the extra complexity.
Stashed errors used to be counted as errors, but could then be
cancelled, leading to `ErrorGuaranteed` soundness holes. #120828 changed
that, closing the soundness hole. But it introduced other difficulties
because you sometimes have to account for pending stashed errors when
making decisions about whether errors have occured/will occur and it's
easy to overlook these.
This commit aims for a middle ground.
- Stashed errors (not warnings) are counted immediately as emitted
errors, avoiding the possibility of forgetting to consider them.
- The ability to cancel (or downgrade) stashed errors is eliminated, by
disallowing the use of `steal_diagnostic` with errors, and introducing
the more restrictive methods `try_steal_{modify,replace}_and_emit_err`
that can be used instead.
Other things:
- `DiagnosticBuilder::stash` and `DiagCtxt::stash_diagnostic` now both
return `Option<ErrorGuaranteed>`, which enables the removal of two
`delayed_bug` calls and one `Ty::new_error_with_message` call. This is
possible because we store error guarantees in
`DiagCtxt::stashed_diagnostics`.
- Storing the guarantees also saves us having to maintain a counter.
- Calls to the `stashed_err_count` method are no longer necessary
alongside calls to `has_errors`, which is a nice simplification, and
eliminates two more `span_delayed_bug` calls and one FIXME comment.
- Tests are added for three of the four fixed PRs mentioned below.
- `issue-121108.rs`'s output improved slightly, omitting a non-useful
error message.
Fixes#121451.
Fixes#121477.
Fixes#121504.
Fixes#121508.
For some cases where it's clear that an error has already occurred,
e.g.:
- there's a comment stating exactly that, or
- things like HIR lowering, where we are lowering an error kind
The commit also tweaks some comments around delayed bug sites.
There are a couple of places where we call
`inner.emitter.emit_diagnostic` directly rather than going through
`inner.emit_diagnostic`, to guarantee the diagnostic is printed. This
feels dubious to me, particularly the bypassing of `TRACK_DIAGNOSTIC`.
This commit removes those.
- In `print_error_count`, it uses `ForceWarning` instead of `Warning`.
- It removes `DiagCtxtInner::failure_note`, because it only has three
uses and direct use of `emit_diagnostic` is consistent with other
similar locations.
- It removes `force_print_diagnostic`, and adds `struct_failure_note`,
and updates `print_query_stack` accordingly, which makes it more
normal. That location doesn't seem to need forced printing anyway.
Foreign maps are used to cache external DefIds, typically backed by
metadata decoding. In the future we might skip caching `V` there (since
loading from metadata usually is already cheap enough), but for now this
cuts down on the impact to memory usage and time to None-init a bunch of
memory. Foreign data is usually much sparser, since we're not usually
loading *all* entries from the foreign crate(s).
We have `span_delayed_bug` and often pass it a `DUMMY_SP`. This commit
adds `delayed_bug`, which matches pairs like `err`/`span_err` and
`warn`/`span_warn`.