Most coverage metadata is encoded into two sections in the final executable.
The `__llvm_covmap` section mostly just contains a list of filenames, while the
`__llvm_covfun` section contains encoded coverage maps for each instrumented
function.
The catch is that each per-function record also needs to contain a hash of the
filenames list that it refers to. Historically this was handled by assembling
most of the per-function data into a temporary list, then assembling the
filenames buffer, then using the filenames hash to emit the per-function data,
and then finally emitting the filenames table itself.
However, now that we build the filenames table up-front (via a separate
traversal of the per-function data), we can hash and emit that part first, and
then emit each of the per-function records immediately after building. This
removes the awkwardness of having to temporarily store nearly-complete
per-function records.
The main change here is that `VirtualFileMapping` now uses an internal hashmap
to de-duplicate incoming global file IDs. That removes the need for
`encode_mappings_for_function` to re-sort its mappings by filename in order to
de-duplicate them.
(We still de-duplicate runs of identical filenames to save work, but this is
not load-bearing for correctness, so a sort is not necessary.)
The combined `get_expressions_and_counter_regions` method was an artifact of
having to prepare the expressions and mappings at the same time, to avoid
ownership/lifetime problems with temporary data used by both.
Now that we have an explicit transition from `FunctionCoverageCollector` to the
final `FunctionCoverage`, we can prepare any shared data during that step and
store it in the final struct.
This gives us a clearly-defined place to run code after the instance's MIR has
been traversed by codegen, but before we emit its `__llvm_covfun` record.
This query has a name that sounds general-purpose, but in fact it has
coverage-specific semantics, and (fortunately) is only used by coverage code.
Because it is only ever called once (from one designated CGU), it doesn't need
to be a query, and we can change it to a regular function instead.
Implement rustc part of RFC 3127 trim-paths
This PR implements (or at least tries to) [RFC 3127 trim-paths](https://github.com/rust-lang/rust/issues/111540), the rustc part. That is `-Zremap-path-scope` with all of it's components/scopes.
`@rustbot` label: +F-trim-paths
Even though expression details are now stored in the info structure, we still
need to inject `ExpressionUsed` statements into MIR, because if one is missing
during codegen then we know that it was optimized out and we can remap all of
its associated code regions to zero.
Previously, mappings were attached to individual coverage statements in MIR.
That necessitated special handling in MIR optimizations to avoid deleting those
statements, since otherwise codegen would be unable to reassemble the original
list of mappings.
With this change, a function's list of mappings is now attached to its MIR
body, and survives intact even if individual statements are deleted by
optimizations.
Instead of modifying the accumulated expressions in-place, we now build a set
of expressions that are known to be zero, and then consult that set on the fly
when converting the expression data for FFI.
This will be necessary when moving mappings and expression data into function
coverage info, which can't be mutated during codegen.
Coverage codegen can now allocate arrays based on the number of
counters/expressions originally used by the instrumentor.
The existing query that inspects coverage statements is still used for
determining the number of counters passed to `llvm.instrprof.increment`. If
some high-numbered counters were removed by MIR optimizations, the instrumented
binary can potentially use less memory and disk space at runtime.
This allows coverage information to be attached to the function as a whole when
appropriate, instead of being smuggled through coverage statements in the
function's basic blocks.
As an example, this patch moves the `function_source_hash` value out of
individual `CoverageKind::Counter` statements and into the per-function info.
When synthesizing unused functions for coverage purposes, the absence of this
info is taken to indicate that a function was not eligible for coverage and
should not be synthesized.
The LLVM API that we use to encode coverage mappings already has its own code
for removing unused coverage expressions and renumbering the rest.
This lets us get rid of our own complex renumbering code, making it easier to
change our coverage code in other ways.
After coverage instrumentation and MIR transformations, we can sometimes end up
with coverage expressions that always have a value of zero. Any expression
operand that refers to an always-zero expression can be replaced with a literal
`Operand::Zero`, making the emitted coverage mapping data smaller and simpler.
This simplification step is mostly redundant with the simplifications performed
inline in `expressions_with_regions`, except that it does a slightly more
thorough job in some cases (because it checks for always-zero expressions
*after* other simplifications).
However, adding this simplification step will then let us greatly simplify that
code, without affecting the quality of the emitted coverage maps.
Rework `no_coverage` to `coverage(off)`
As discussed at the tail of https://github.com/rust-lang/rust/issues/84605 this replaces the `no_coverage` attribute with a `coverage` attribute that takes sub-parameters (currently `off` and `on`) to control the coverage instrumentation.
Allows future-proofing for things like `coverage(off, reason="Tested live", issue="#12345")` or similar.
Instead of writing coverage mappings into a supplied `&RustString`, this
function can just create the buffer itself and return the resulting vector of
bytes.
If two or more mappings cover exactly the same region, their relative order
will now be preserved from `get_expressions_and_counter_regions`, rather than
being disturbed by implementation details of an unstable sort.
The current order is: counter mappings, expression mappings, zero mappings.
(LLVM will also perform its own stable sort on these mappings, but that sort
only compares file ID, start location, and `RegionKind`.)
This struct was only being used to hold the global file table, and one of its
methods didn't even use the table. Changing its methods to ordinary functions
makes it easier to see where the table is mutated.
Coverage FFI types were historically split across two modules, because some of
them were needed by code in `rustc_codegen_ssa`.
Now that all of the coverage codegen code has been moved into
`rustc_codegen_llvm` (#113355), it's possible to move all of the FFI types into
a single module, making it easier to see all of them at once.
Operand types are now tracked explicitly, so there is no need to reserve ID 0
for the special always-zero counter.
As part of the renumbering, this change fixes an off-by-one error in the way
counters were counted by the `coverageinfo` query. As a result, functions
should now have exactly the number of counters they actually need, instead of
always having an extra counter that is never used.
Operand types are now tracked explicitly, so there is no need for expression
IDs to avoid counter IDs by descending from `u32::MAX`. Instead they can just
count up from 0, and can be used directly as indices when necessary.
Because the three kinds of operand are now distinguished explicitly, we no
longer need fiddly code to disambiguate counter IDs and expression IDs based on
the total number of counters/expressions in a function.
This does increase the size of operands from 4 bytes to 8 bytes, but that
shouldn't be a big deal since they are mostly stored inside boxed structures,
and the current coverage code is not particularly size-optimized anyway.
This section name is always constant for a given target, but obtaining it from
LLVM requires a few intermediate allocations. There's no need to do so
repeatedly from inside a per-function loop.
Remove `LLVMRustCoverageHashCString`
Coverage has two FFI functions for computing the hash of a byte string. One takes a ptr/len pair (`LLVMRustCoverageHashByteArray`), and the other takes a NUL-terminated C string (`LLVMRustCoverageHashCString`).
But on closer inspection, the C string version is unnecessary. The calling-side code converts a Rust `&str` into a `CString`, and the C++ code then immediately turns it back into a ptr/len string before actually hashing it. So we can just call the ptr/len version directly instead.
---
This PR also fixes a bug in the C++ declaration of `LLVMRustCoverageHashByteArray`. It should be `size_t`, since that's what is declared and passed on the Rust side, and it's what `StrRef`'s constructor expects to receive on the callee side.
Coverage has two FFI functions for computing the hash of a byte string. One
takes a ptr/len pair, and the other takes a NUL-terminated C string.
But on closer inspection, the C string version is unnecessary. The calling-side
code converts a Rust `&str` into a C string, and the C++ code then immediately
turns it back into a ptr/len string before actually hashing it.
The function body immediately treats it as a slice anyway, so this just makes
it possible to call the hash function with arbitrary read-only byte slices.