coverage bug fixes and optimization support
Adjusted LLVM codegen for code compiled with `-Zinstrument-coverage` to
address multiple, somewhat related issues.
Fixed a significant flaw in prior coverage solution: Every counter
generated a new counter variable, but there should have only been one
counter variable per function. This appears to have bloated .profraw
files significantly. (For a small program, it increased the size by
about 40%. I have not tested large programs, but there is anecdotal
evidence that profraw files were way too large. This is a good fix,
regardless, but hopefully it also addresses related issues.
Fixes: #82144
Invalid LLVM coverage data produced when compiled with -C opt-level=1
Existing tests now work up to at least `opt-level=3`. This required a
detailed analysis of the LLVM IR, comparisons with Clang C++ LLVM IR
when compiled with coverage, and a lot of trial and error with codegen
adjustments.
The biggest hurdle was figuring out how to continue to support coverage
results for unused functions and generics. Rust's coverage results have
three advantages over Clang's coverage results:
1. Rust's coverage map does not include any overlapping code regions,
making coverage counting unambiguous.
2. Rust generates coverage results (showing zero counts) for all unused
functions, including generics. (Clang does not generate coverage for
uninstantiated template functions.)
3. Rust's unused functions produce minimal stubbed functions in LLVM IR,
sufficient for including in the coverage results; while Clang must
generate the complete LLVM IR for each unused function, even though
it will never be called.
This PR removes the previous hack of attempting to inject coverage into
some other existing function instance, and generates dedicated instances
for each unused function. This change, and a few other adjustments
(similar to what is required for `-C link-dead-code`, but with lower
impact), makes it possible to support LLVM optimizations.
Fixes: #79651
Coverage report: "Unexecuted instantiation:..." for a generic function
from multiple crates
Fixed by removing the aforementioned hack. Some "Unexecuted
instantiation" notices are unavoidable, as explained in the
`used_crate.rs` test, but `-Zinstrument-coverage` has new options to
back off support for either unused generics, or all unused functions,
which avoids the notice, at the cost of less coverage of unused
functions.
Fixes: #82875
Invalid LLVM coverage data produced with crate brotli_decompressor
Fixed by disabling the LLVM function attribute that forces inlining, if
`-Z instrument-coverage` is enabled. This attribute is applied to
Rust functions with `#[inline(always)], and in some cases, the forced
inlining breaks coverage instrumentation and reports.
FYI: `@wesleywiser`
r? `@tmandry`
Adjusted LLVM codegen for code compiled with `-Zinstrument-coverage` to
address multiple, somewhat related issues.
Fixed a significant flaw in prior coverage solution: Every counter
generated a new counter variable, but there should have only been one
counter variable per function. This appears to have bloated .profraw
files significantly. (For a small program, it increased the size by
about 40%. I have not tested large programs, but there is anecdotal
evidence that profraw files were way too large. This is a good fix,
regardless, but hopefully it also addresses related issues.
Fixes: #82144
Invalid LLVM coverage data produced when compiled with -C opt-level=1
Existing tests now work up to at least `opt-level=3`. This required a
detailed analysis of the LLVM IR, comparisons with Clang C++ LLVM IR
when compiled with coverage, and a lot of trial and error with codegen
adjustments.
The biggest hurdle was figuring out how to continue to support coverage
results for unused functions and generics. Rust's coverage results have
three advantages over Clang's coverage results:
1. Rust's coverage map does not include any overlapping code regions,
making coverage counting unambiguous.
2. Rust generates coverage results (showing zero counts) for all unused
functions, including generics. (Clang does not generate coverage for
uninstantiated template functions.)
3. Rust's unused functions produce minimal stubbed functions in LLVM IR,
sufficient for including in the coverage results; while Clang must
generate the complete LLVM IR for each unused function, even though
it will never be called.
This PR removes the previous hack of attempting to inject coverage into
some other existing function instance, and generates dedicated instances
for each unused function. This change, and a few other adjustments
(similar to what is required for `-C link-dead-code`, but with lower
impact), makes it possible to support LLVM optimizations.
Fixes: #79651
Coverage report: "Unexecuted instantiation:..." for a generic function
from multiple crates
Fixed by removing the aforementioned hack. Some "Unexecuted
instantiation" notices are unavoidable, as explained in the
`used_crate.rs` test, but `-Zinstrument-coverage` has new options to
back off support for either unused generics, or all unused functions,
which avoids the notice, at the cost of less coverage of unused
functions.
Fixes: #82875
Invalid LLVM coverage data produced with crate brotli_decompressor
Fixed by disabling the LLVM function attribute that forces inlining, if
`-Z instrument-coverage` is enabled. This attribute is applied to
Rust functions with `#[inline(always)], and in some cases, the forced
inlining breaks coverage instrumentation and reports.
Let a portion of DefPathHash uniquely identify the DefPath's crate.
This allows to directly map from a `DefPathHash` to the crate it originates from, without constructing side tables to do that mapping -- something that is useful for incremental compilation where we deal with `DefPathHash` instead of `DefId` a lot.
It also allows to reliably and cheaply check for `DefPathHash` collisions which allows the compiler to gracefully abort compilation instead of running into a subsequent ICE at some random place in the code.
The following new piece of documentation describes the most interesting aspects of the changes:
```rust
/// A `DefPathHash` is a fixed-size representation of a `DefPath` that is
/// stable across crate and compilation session boundaries. It consists of two
/// separate 64-bit hashes. The first uniquely identifies the crate this
/// `DefPathHash` originates from (see [StableCrateId]), and the second
/// uniquely identifies the corresponding `DefPath` within that crate. Together
/// they form a unique identifier within an entire crate graph.
///
/// There is a very small chance of hash collisions, which would mean that two
/// different `DefPath`s map to the same `DefPathHash`. Proceeding compilation
/// with such a hash collision would very probably lead to an ICE and, in the
/// worst case, to a silent mis-compilation. The compiler therefore actively
/// and exhaustively checks for such hash collisions and aborts compilation if
/// it finds one.
///
/// `DefPathHash` uses 64-bit hashes for both the crate-id part and the
/// crate-internal part, even though it is likely that there are many more
/// `LocalDefId`s in a single crate than there are individual crates in a crate
/// graph. Since we use the same number of bits in both cases, the collision
/// probability for the crate-local part will be quite a bit higher (though
/// still very small).
///
/// This imbalance is not by accident: A hash collision in the
/// crate-local part of a `DefPathHash` will be detected and reported while
/// compiling the crate in question. Such a collision does not depend on
/// outside factors and can be easily fixed by the crate maintainer (e.g. by
/// renaming the item in question or by bumping the crate version in a harmless
/// way).
///
/// A collision between crate-id hashes on the other hand is harder to fix
/// because it depends on the set of crates in the entire crate graph of a
/// compilation session. Again, using the same crate with a different version
/// number would fix the issue with a high probability -- but that might be
/// easier said then done if the crates in questions are dependencies of
/// third-party crates.
///
/// That being said, given a high quality hash function, the collision
/// probabilities in question are very small. For example, for a big crate like
/// `rustc_middle` (with ~50000 `LocalDefId`s as of the time of writing) there
/// is a probability of roughly 1 in 14,750,000,000 of a crate-internal
/// collision occurring. For a big crate graph with 1000 crates in it, there is
/// a probability of 1 in 36,890,000,000,000 of a `StableCrateId` collision.
```
Given the probabilities involved I hope that no one will ever actually see the error messages. Nonetheless, I'd be glad about some feedback on how to improve them. Should we create a GH issue describing the problem and possible solutions to point to? Or a page in the rustc book?
r? `@pnkfelix` (feel free to re-assign)
Only store a LocalDefId in some HIR nodes
Some HIR nodes are guaranteed to be HIR owners: Item, TraitItem, ImplItem, ForeignItem and MacroDef.
As a consequence, we do not need to store the `HirId`'s `local_id`, and we can directly store a `LocalDefId`.
This allows to avoid a bit of the dance with `tcx.hir().local_def_id` and `tcx.hir().local_def_id_to_hir_id` mappings.
This allows a build system to indicate a location in its own dependency
specification files (eg Cargo's `Cargo.toml`) which can be reported
along side any unused crate dependency.
This supports several types of location:
- 'json' - provide some json-structured data, which is included in the json diagnostics
in a `tool_metadata` field
- 'raw' - emit the provided string into the output. This also appears as a json string in
`tool_metadata`.
If no `--extern-location` is explicitly provided then a default json entry of the form
`"tool_metadata":{"name":<cratename>,"path":<cratepath>}` is emitted.
Encode MIR metadata by iterating on DefId instead of traversing the HIR tree
Split out of https://github.com/rust-lang/rust/pull/80347.
This part only traverses `mir_keys` and encodes MIR according to the def kind.
r? `@oli-obk`
This allows to directly map from a DefPathHash to the crate it
originates from, without constructing side tables to do that mapping.
It also allows to reliably and cheaply check for DefPathHash collisions.
Remove DepKind::CrateMetadata and pre-allocation of DepNodes
Remove much of the special-case handling around crate metadata
dependency tracking by replacing `DepKind::CrateMetadata` and the
pre-allocation of corresponding `DepNodes` with on-demand invocation
of the `crate_hash` query.
Rework diagnostics for wrong number of generic args (fixes#66228 and #71924)
This PR reworks the `wrong number of {} arguments` message, so that it provides more details and contextual hints.
Consistently avoid constructing optimized MIR when not doing codegen
The optimized MIR for closures is being encoded unconditionally, while
being unnecessary for cargo check. This turns out to be especially
costly with MIR inlining enabled, since it triggers computation of
optimized MIR for all callees that are being examined for inlining
purposes https://github.com/rust-lang/rust/pull/77307#issuecomment-751915450.
Skip encoding of optimized MIR for closures, enum constructors, struct
constructors, and trait fns when not doing codegen, like it is already
done for other items since 49433.
Separate out a `hir::Impl` struct
This makes it possible to pass the `Impl` directly to functions, instead
of having to pass each of the many fields one at a time. It also
simplifies matches in many cases.
See `rustc_save_analysis::dump_visitor::process_impl` or `rustdoc::clean::clean_impl` for a good example of how this makes `impl`s easier to work with.
r? `@petrochenkov` maybe?
This makes it possible to pass the `Impl` directly to functions, instead
of having to pass each of the many fields one at a time. It also
simplifies matches in many cases.
The optimized MIR for closures is being encoded unconditionally, while
being unnecessary for cargo check. This turns out to be especially
costly with MIR inlining enabled, since it triggers computation of
optimized MIR for all callees that are being examined for inlining
purposes.
Skip encoding of optimized MIR for closures, enum constructors, struct
constructors, and trait fns when not doing codegen, like it is already
done for other items since 49433.
Remove much of the special-case handling around crate metadata
dependency tracking by replacing `DepKind::CrateMetadata` and the
pre-allocation of corresponding `DepNodes` with on-demand invocation
of the `crate_hash` query.
Make CTFE able to check for UB...
... by not doing any optimizations on the `const fn` MIR used in CTFE. This means we duplicate all `const fn`'s MIR now, once for CTFE, once for runtime. This PR is for checking the perf effect, so we have some data when talking about https://github.com/rust-lang/const-eval/blob/master/rfcs/0000-const-ub.md
To do this, we now have two queries for obtaining mir: `optimized_mir` and `mir_for_ctfe`. It is now illegal to invoke `optimized_mir` to obtain the MIR of a const/static item's initializer, an array length, an inline const expression or an enum discriminant initializer. For `const fn`, both `optimized_mir` and `mir_for_ctfe` work, the former returning the MIR that LLVM should use if the function is called at runtime. Similarly it is illegal to invoke `mir_for_ctfe` on regular functions.
This is all checked via appropriate assertions and I don't think it is easy to get wrong, as there should be no `mir_for_ctfe` calls outside the const evaluator or metadata encoding. Almost all rustc devs should keep using `optimized_mir` (or `instance_mir` for that matter).
Reduce a large memory spike that happens during serialization by writing
the incr comp structures to file by way of a fixed-size buffer, rather
than an unbounded vector.
Effort was made to keep the instruction count close to that of the
previous implementation. However, buffered writing to a file inherently
has more overhead than writing to a vector, because each write may
result in a handleable error. To reduce this overhead, arrangements are
made so that each LEB128-encoded integer can be written to the buffer
with only one capacity and error check. Higher-level optimizations in
which entire composite structures can be written with one capacity and
error check are possible, but would require much more work.
The performance is mostly on par with the previous implementation, with
small to moderate instruction count regressions. The memory reduction is
significant, however, so it seems like a worth-while trade-off.
Move variable into the only branch where it is relevant
At the `if` branch `filter` (the `let` binding) is `None` iff `filter` (the parameter) was `None`.
We can branch on the parameter, move the binding into the `if`, and the complexity of handling
`Option<Option<_>` largely dissolves.
`@rustbot` modify labels +C-cleanup +T-compiler
Note: I have no idea how hot this code is. If this method frequently gets called with a `None` filter, there might be a small perf improvement.
Clean up in `each_child_of_item`
This PR hopes to eliminate some of the surprising elements I encountered while reading the function.
- `macros_only` is checked against inside the loop body, but if it is `true`, the loop is skipped anyway
- only query `span` when relevant
- no need to allocate attribute vector
At the `if` branch `filter` (the `let` binding) is `None` iff `filter` (the parameter) was `None`.
We can branch on the parameter, move the binding into the `if`, and the complexity of handling
`Option<Option<_>` largely dissolves.
Adds checks for:
* `no_core` attribute
* explicitly-enabled `legacy` symbol mangling
* mir_opt_level > 1 (which enables inlining)
I removed code from the `Inline` MIR pass that forcibly disabled
inlining if `-Zinstrument-coverage` was set. The default `mir_opt_level`
does not enable inlining anyway. But if the level is explicitly set and
is greater than 1, I issue a warning.
The new warnings show up in tests, which is much better for diagnosing
potential option conflicts in these cases.
When encoding a proc-macro crate, there may be gaps in the table (since
we only encode the crate root and proc-macro items). Account for this by
checking if the entry is present, rather than using `unwrap()`
Implement lazy decoding of DefPathTable during incremental compilation
PR https://github.com/rust-lang/rust/pull/75813 implemented lazy decoding of the `DefPathTable` from crate metadata. However, it requires decoding the entire `DefPathTable` when incremental compilation is active, so that we can map a decoded `DefPathHash` to a `DefId` from an arbitrary crate.
This PR adds support for lazy decoding of dependency `DefPathTable`s when incremental compilation si active.
When we load the incremental cache and dep
graph, we need the ability to map a `DefPathHash` to a `DefId` in the
current compilation session (if the corresponding definition still
exists).
This is accomplished by storing the old `DefId` (that is, the `DefId`
from the previous compilation session) for each `DefPathHash` we need to
remap. Since a `DefPathHash` includes the owning crate, the old crate is
guaranteed to be the right one (if the definition still exists). We then
use the old `DefIndex` as an initial guess, which we validate by
comparing the expected and actual `DefPathHash`es. In most cases,
foreign crates will be completely unchanged, which means that we our
guess will be correct. If our guess is wrong, we fall back to decoding
the entire `DefPathTable` for the foreign crate. This still represents
an improvement over the status quo, since we can skip decoding the
entire `DefPathTable` for other crates (where all of our guesses were
correct).
The discussion seems to have resolved that this lint is a bit "noisy" in
that applying it in all places would result in a reduction in
readability.
A few of the trivial functions (like `Path::new`) are fine to leave
outside of closures.
The general rule seems to be that anything that is obviously an
allocation (`Box`, `Vec`, `vec![]`) should be in a closure, even if it
is a 0-sized allocation.
with an eye on merging `TargetOptions` into `Target`.
`TargetOptions` as a separate structure is mostly an implementation detail of `Target` construction, all its fields logically belong to `Target` and available from `Target` through `Deref` impls.
foreign_modules query hash table lookups
When compiling a large monolithic crate we're seeing huge times in the `foreign_modules` query due to repeated iteration over foreign modules (in order to find a module by its id). This implements hash table lookups so that which massively reduces time spent in that query in this particular case. We'll need to see if the overhead of creating the hash table has a negative impact on performance in more normal compilation scenarios.
I'm working with `@wesleywiser` on this.
$ touch empty.rs
$ env RUSTC_LOG=debug rustc +stage1 --crate-type=lib empty.rs
Fails with a `BorrowMutError` because source map files are already
borrowed while `features_query` attempts to format a log message
containing a span.
Release the borrow before the query to avoid the issue.
Fixes#75982
The direct parent of a module may not be a module
(e.g. `const _: () = { #[path = "foo.rs"] mod foo; };`).
To find the parent of a module for purposes of resolution, we need to
walk up the tree until we hit a module or a crate root.
Preparation for a subsequent change that replaces
rustc_target::config::Config with its wrapped Target.
On its own, this commit breaks the build. I don't like making
build-breaking commits, but in this instance I believe that it
makes review easier, as the "real" changes of this PR can be
seen much more easily.
Result of running:
find compiler/ -type f -exec sed -i -e 's/target\.target\([)\.,; ]\)/target\1/g' {} \;
find compiler/ -type f -exec sed -i -e 's/target\.target$/target/g' {} \;
find compiler/ -type f -exec sed -i -e 's/target.ptr_width/target.pointer_width/g' {} \;
./x.py fmt
Fixes#77523
Now that hygiene serialization is implemented, we also need to record
`expansion_that_defined` so that we properly handle a foreign
`SyntaxContext`.
Currently, we serialize the same crate metadata for proc-macro crates as
we do for normal crates. This is quite wasteful - almost none of this
metadata is ever used, and much of it can't even be deserialized (if it
contains a foreign `CrateNum`).
This PR changes metadata encoding to skip encoding the majority of crate
metadata for proc-macro crates. Most of the `Lazy<[T]>` fields are left
completetly empty, while the non-lazy fields are left as-is.
Additionally, proc-macros now have a def span that does not include
their body. This was done for normal functions in #75465, but was missed
for proc-macros.
As a result of this PR, we should only ever encode local `CrateNum`s
when encoding proc-macro crates. I've added a specialized serialization
impl for `CrateNum` to assert this.
emit errors during AbstractConst building
There changes are currently still untested, so I don't expect this to pass CI 😆
It seems to me like this is the direction we want to go in, though we didn't have too much of a discussion about this.
r? @oli-obk
Don't query stability data when `staged_api` is off
This data only needs to be encoded when `#![feature(staged_api)]` or `-Zforce-unstable-if-unmarked` is on. Running these queries takes measurable time on large crates with many items, so skip it when the unstable flags have not been enabled.
Previously, we would throw away the `SyntaxContext` of any span with a
dummy location during metadata encoding. This commit makes metadata Span
encoding consistent with incr-cache Span encoding - an 'invalid span'
tag is only used when it doesn't lose any information.