This commit updates the signatures of all diagnostic functions to accept
types that can be converted into a `DiagnosticMessage`. This enables
existing diagnostic calls to continue to work as before and Fluent
identifiers to be provided. The `SessionDiagnostic` derive just
generates normal diagnostic calls, so these APIs had to be modified to
accept Fluent identifiers.
In addition, loading of the "fallback" Fluent bundle, which contains the
built-in English messages, has been implemented.
Each diagnostic now has "arguments" which correspond to variables in the
Fluent messages (necessary to render a Fluent message) but no API for
adding arguments has been added yet. Therefore, diagnostics (that do not
require interpolation) can be converted to use Fluent identifiers and
will be output as before.
Avoid query cache sharding code in single-threaded mode
In non-parallel compilers, this is just adding needless overhead at compilation time (since there is only one shard statically anyway). This amounts to roughly ~10 seconds reduction in bootstrap time, with overall neutral (some wins, some losses) performance results.
Parallel compiler performance should be largely unaffected by this PR; sharding is kept there.
This was largely just caching the shard value at this point, which is not
particularly useful -- in the use sites the key was being hashed nearby anyway.
This replaces the per-shard counters with a single global counter, simplifying
the JobId struct down to just a u64 and removing the need to pipe a DepKind
generic through a bunch of code. The performance implications on non-parallel
compilers are likely minimal (this switches to `Cell<u64>` as the backing
storage over a `u64`, but the latter was already inside a `RefCell` so it's not
really a significance divergence). On parallel compilers, the cost of a single
global u64 counter may be more significant: it adds a serialization point in
theory. On the other hand, we can imagine changing the counter to have a
thread-local component if it becomes worrisome or some similar structure.
The new design is sufficiently simpler that it warrants the potential for slight
changes down the line if/when we get parallel compilation to be more of a
default.
A u64 counter, instead of u32 (the old per-shard width), is chosen to avoid
possibly overflowing it and causing problems; it is effectively impossible that
we would overflow a u64 counter in this context.
Don't perform any new queries while reading a query result on disk
In addition to being very confusing, this can cause us to add dep node edges between two queries that would not otherwise have an edge.
We now panic if any new dep node edges are created during the deserialization of a query result. This requires serializing the full `AdtDef` to disk, instead of just serializing the `DefId` and invoking the `adt_def` query during deserialization.
I'll probably split this up into several smaller PRs for perf runs.
Currently, you can use `#[rustc_clean]` to assert to that a particular
query (technically, a `DepNode`) is green or red. However, a green
`DepNode` does not mean that the query result was actually deserialized
from disk - we might have never re-run a query that needed the result.
Some incremental tests are written as regression tests for ICEs that
occured during query result decoding. Using
`#[rustc_clean(loaded_from_disk="typeck")]`, you can now assert
that the result of a particular query (e.g. `typeck`) was actually
loaded from disk, in addition to being green.
This performs a substitution of code following the pattern:
let <id> = if let <pat> = ... { identity } else { ... : ! };
To simplify it to:
let <pat> = ... { identity } else { ... : ! };
By adopting the let_else feature.
This was already only enabled in debug_assertions builds. Generally, it seems
like most use cases that would use this could also use the -Zself-profile flag
which also tracks cache hits (in all builds), and so the extra cfg's and such
are not really necessary.
This is largely just a small cleanup though, which primarily is intended to make
other changes easier by avoiding the need to deal with this field.
Refactor query forcing
The control flow in those functions was very complex, with several layers of continuations.
I tried to simplify the implementation, while keeping essentially the same logic.
Now, all code paths go through `try_execute_query` for the actual query execution.
Communication with the `dep_graph` and the live caches are the only difference between query getting/ensuring/forcing.
Previously, `QueryJobInfo` was composed of two parts: a `QueryInfo` and
a `QueryJob`. However, both `QueryInfo` and `QueryJob` have a `span`
field, which seem to be the same. So, the `span` was recorded twice.
Now, `QueryJobInfo` is composed of a `QueryStackFrame` (the other field
of `QueryInfo`) and a `QueryJob`. So, now, the `span` is only recorded
once.
try_execute_query is now able to centralize the path for query
get/ensure/force.
try_execute_query now takes the dep_node as a parameter, so it can
accommodate `force`. This dep_node is an Option to avoid computing it in
the `get` fast path.
try_execute_query now returns both the result and the dep_node_index to
allow the caller to handle the dep graph.
The caller is responsible for marking the dependency.
`with_taks_impl` is only called from `with_eval_always_task` and
`with_task` . The former is only used in query invocation, while the
latter is also used to start the `tcx` and to trigger codegen.
This move should not change significantly the number of calls to this
assertion.
When an incremental fingerprint mismatch occurs, we debug-print
our `DepNode` and query result. Unfortunately, the debug printing
process may cause us to run additional queries, which can result
in a re-entrant fingerprint mismatch error.
To avoid a double panic, this commit adds a thread-local variable
to detect re-entrant calls.