Similar to how the alignment is already checked, this adds a check
for null pointer dereferences in debug mode. It is implemented similarly
to the alignment check as a MirPass.
This is related to a 2025H1 project goal for better UB checks in debug
mode: https://github.com/rust-lang/rust-project-goals/pull/177.
It's a function that does stuff with MIR and yet it weirdly has its own
module in `rustc_middle::util`. This commit moves it into
`rustc_middle::mir`, a more sensible home.
This also parameterize the "excluded pointee types" and exposes a
general method for inserting checks on pointers.
This is a preparation for adding a NullCheck that makes use of the same
code.
Lower index bounds checking to `PtrMetadata`, this time with the right fake borrow semantics 😸
Change `Rvalue::RawRef` to take a `RawRefKind` instead of just a `Mutability`. Then introduce `RawRefKind::FakeForPtrMetadata` and use that for lowering index bounds checking to a `PtrMetadata`. This new `RawRefKind::FakeForPtrMetadata` acts like a shallow fake borrow in borrowck, which mimics the semantics of the old `Rvalue::Len` operation we're replacing.
We can then use this `RawRefKind` instead of using a span desugaring hack in CTFE.
cc ``@scottmcm`` ``@RalfJung``
Use identifiers more in diagnostics code
This should make the diagnostics code slightly more correct when rendering idents in mixed crate edition situations. Kinda a no-op, but a cleanup regardless.
r? oli-obk or reassign
Incorporate `iter_nodes` into `graph::DirectedGraph`
This helper method iterates over all node IDs in the dense range `0..num_nodes`.
In practice, we have a lot of graph-algorithm code that already assumes that nodes are densely numbered, by using `num_nodes` to allocate per-node indexed data structures. So I don't think this is actually a substantial change to the de-facto semantics of `graph::DirectedGraph`.
---
Resolves a FIXME from #135481.
Get rid of `mir::Const::from_ty_const`
This function is strange, because it turns valtrees into `mir::Const::Value`, but the rest of the const variants stay as type system consts.
All of the callsites except for one in `instsimplify` (array length simplification of `ptr_metadata` call) just go through the valtree arm of the function, so it's easier to just create a `mir::Const` directly for those.
For the instsimplify case, if we have a type system const we should *keep* having a type system const, rather than turning it into a `mir::Const::Value`; it doesn't really matter in practice, though, bc `usize` has no padding, but it feels more principled.
This assumes that the set of valid node IDs is exactly `0..num_nodes`.
In practice, we have a lot of graph-algorithm code that already assumes that
nodes are densely numbered, by using `num_nodes` to allocate per-node indexed
data structures.
Add `#[optimize(none)]`
cc #54882
This extends the `optimize` attribute to add `none`, which corresponds to the LLVM `OptimizeNone` attribute.
Not sure if an MCP is required for this, happy to file one if so.
By removing all methods from this struct and treating it as a collection of
data fields, we make it easier for a future PR to store that data in a query
result, without having to move all of its methods into `rustc_middle`.
This dedicated type seemed like a good idea at the time, but if we want to
store this information in a query result then a plainer data type is more
convenient.
Using `SmallVec` here was fine when it was a module-private detail, but if we
want to pass these terms across query boundaries then it's not worth the extra
hassle.
Replacing a method call with direct field access is slightly simpler.
Using the name `counter_terms` is more consistent with other code that tries to
maintain a distinction between (physical) "counters" and "expressions".
This reflects the fact that we can't compute meaningful info for a function
that wasn't instrumented and therefore doesn't have `function_coverage_info`.
Making these separate types from `CovTerm` and `Expression` was historically
very helpful, but now that most of the counter-creation work is handled by
`node_flow` they are no longer needed.
- Move `make_bcb_counters` out of `CoverageCounters`
- Split out `make_node_counter_priority_list`
- Flatten `Transcriber` into the function `transcribe_counters`
Make MIR cleanup for functions with impossible predicates into a real MIR pass
It's a bit jarring to see the body of a function with an impossible-to-satisfy where clause suddenly go to a single `unreachable` terminator when looking at the MIR dump output in order, and I discovered it's because we manually replace the body outside of a MIR pass.
Let's make it into a fully flegded MIR pass so it's more clear what it's doing and when it's being applied.
Add an InstSimplify for repetitive array expressions
I noticed in https://github.com/rust-lang/rust/pull/135068#issuecomment-2569955426 that GVN's implementation of this same transform was quite profitable on the deep-vector benchmark. But of course GVN doesn't run in unoptimized builds, so this is my attempt to write a version of this transform that benefits the deep-vector case and is fast enough to run in InstSimplify.
The benchmark suite indicates that this is effective.
Adds `#[rustc_force_inline]` which is similar to always inlining but
reports an error if the inlining was not possible, and which always
attempts to inline annotated items, regardless of optimisation levels.
It can only be applied to free functions to guarantee that the MIR
inliner will be able to resolve calls.
We already did `Transmute`-then-`PtrToPtr`; this adds the nearly-identical `PtrToPtr`-then-`Transmute`.
It also adds `transmute(Foo(x))` → `transmute(x)`, when `Foo` is a single-field transparent type. That's useful for things like `NonNull { pointer: p }.as_ptr()`.
Found these as I was looking at MCP807-related changes.
rustc_intrinsic: support functions without body
We synthesize a HIR body `loop {}` but such bodyless intrinsics.
Most of the diff is due to turning `ItemKind::Fn` into a brace (named-field) enum variant, because it carries a `bool`-typed field now. This is to remember whether the function has a body. MIR building panics to avoid ever translating the fake `loop {}` body, and the intrinsic logic uses the lack of a body to implicitly mark that intrinsic as must-be-overridden.
I first tried actually having no body rather than generating the fake body, but there's a *lot* of code that assumes that all function items have HIR and MIR, so this didn't work very well. Then I noticed that even `rustc_intrinsic_must_be_overridden` intrinsics have MIR generated (they are filled with an `Unreachable` terminator) so I guess I am not the first to discover this. ;)
r? `@oli-obk`
Begin to implement type system layer of unsafe binders
Mostly TODOs, but there's a lot of match arms that are basically just noops so I wanted to split these out before I put up the MIR lowering/projection part of this logic.
r? oli-obk
Tracking:
- https://github.com/rust-lang/rust/issues/130516
During coverage instrumentation, this variable always holds the coverage graph,
which is a simplified view of the MIR control-flow graph. The new name is
clearer in context, and also shorter.
coverage: Store coverage source regions as `Span` until codegen (take 2)
This is an attempt to re-land #133418:
> Historically, coverage spans were converted into line/column coordinates during the MIR instrumentation pass.
> This PR moves that conversion step into codegen, so that coverage spans spend most of their time stored as Span instead.
> In addition to being conceptually nicer, this also reduces the size of coverage mappings in MIR, because Span is smaller than 4x u32.
That PR was reverted by #133608, because in some circumstances not covered by our test suite we were emitting coverage metadata that was causing `llvm-cov` to exit with an error (#133606).
---
The implementation here is *mostly* the same, but adapted for subsequent changes in the relevant code (e.g. #134163).
I believe that the changes in #134163 should be sufficient to prevent the problem that required the original PR to be reverted. But I haven't been able to reproduce the original breakage in a regression test, and the `llvm-cov` error message is extremely unhelpful, so I can't completely rule out the possibility of this breaking again.
r? jieyouxu (reviewer of the original PR)
Variants::Single: do not use invalid VariantIdx for uninhabited enums
~~Stacked on top of https://github.com/rust-lang/rust/pull/133681, only the last commit is new.~~
Currently, `Variants::Single` for an empty enum contains a `VariantIdx` of 0; looking that up in the enum variant list will ICE. That's quite confusing. So let's fix that by adding a new `Variants::Empty` case for types that have 0 variants.
try-job: i686-msvc
`rustc_span::symbol` defines some things that are re-exported from
`rustc_span`, such as `Symbol` and `sym`. But it doesn't re-export some
closely related things such as `Ident` and `kw`. So you can do `use
rustc_span::{Symbol, sym}` but you have to do `use
rustc_span::symbol::{Ident, kw}`, which is inconsistent for no good
reason.
This commit re-exports `Ident`, `kw`, and `MacroRulesNormalizedIdent`,
and changes many `rustc_span::symbol::` qualifiers in `compiler/` to
`rustc_span::`. This is a 200+ net line of code reduction, mostly
because many files with two `use rustc_span` items can be reduced to
one.
Bounds-check with PtrMetadata instead of Len in MIR
Rather than emitting `Len(*_n)` in array index bounds checks, emit `PtrMetadata(copy _n)` instead -- with some asterisks for arrays and `&mut` that need it to be done slightly differently.
We're getting pretty close to removing `Len` entirely, actually. I think just one more PR after this (for slice drop shims).
r? mir
We don't need `NonNull::as_ptr` debuginfo
In order to stop pessimizing the use of local variables in core, skip debug info for MIR temporaries in tiny (single-BB) functions.
For functions as simple as this -- `Pin::new`, etc -- nobody every actually wants debuginfo for them in the first place. They're more like intrinsics than real functions, and stepping over them is good.
Stop pessimizing the use of local variables in core by skipping debug info for MIR temporaries in tiny (single-BB) functions.
For functions as simple as this -- `Pin::new`, etc -- nobody every actually wants debuginfo for them in the first place. They're more like intrinsics than real functions, and stepping over them is good.
coverage: Use a query to find counters/expressions that must be zero
As of #133446, this query (`coverage_ids_info`) determines which counter/expression IDs are unused. So with only a little extra work, we can take the code that was using that information to determine which coverage counters/expressions must be zero, and move that inside the query as well.
There should be no change in compiler output.
coverage: Prefer to visit nodes whose predecessors have been visited
In coverage instrumentation, we need to traverse the control-flow graph and decide what kind of counter (physical counter or counter-expression) should be used for each node that needs a counter.
The existing traversal order is complex and hard to tweak. This new traversal order tries to be a bit more principled, by always preferring to visit nodes whose predecessors have already been visited, which is a good match for how the counter-creation code ends up dealing with a node's in-edges and out-edges.
For several of the coverage tests, this ends up being a strict improvement in reducing the size of the coverage metadata, and also reducing the number of physical counters needed.
(The new traversal should hopefully also allow some further code simplifications in the future.)
---
This is made possible by the separate simplification pass introduced by #133849. Without that, almost any change to the traversal order ends up increasing the size of the expression table or the number of physical counters.
The words "before" and "after" have an obvious temporal meaning, e.g.
`seek_before_primary_effect`,
`visit_statement_{before,after}_primary_effect`. But "before" is also
used to name the effect that occurs before the primary effect of a
statement/terminator; this is `Effect::Before`. This leads to the
confusing possibility of talking about things happening "before/after
the before event".
This commit removes this awkward overloading of "before" by renaming
`Effect::Before` as `Effect::Early`. It also renames some of the
`Analysis` and `ResultsVisitor` methods to be more consistent.
Here are the before and after names:
- `Effect::{Before,Primary}` -> `Effect::{Early,Primary}`
- `apply_before_statement_effect` -> `apply_early_statement_effect`
- `apply_statement_effect` -> `apply_primary_statement_effect`
- `visit_statement_before_primary_effect` -> `visit_after_early_statement_effect`
- `visit_statement_after_primary_effect` -> `visit_after_primary_statement_effect`
(And s/statement/terminator/ for all the terminator events.)
It uses `MaybeInitializedPlaces` and `MaybeUninitializedPlaces`, but
calls the results `live` and `dead`. This is very confusing given that
there are also analyses called `MaybeLiveLocals` and `MaybeStorageLive`
and `MaybeStorageDead`.
This commit changes it to use `maybe_init` and `maybe_uninit`.
Introduce `MixedBitSet`
`ChunkedBitSet` is good at avoiding excessive memory usage for programs with very large functgions where dataflow bitsets have very large domain sizes. But it's overly heavyweight for small bitsets, because any non-empty `ChunkedBitSet` takes up at least 256 bytes.
This PR introduces `MixedBitSet`, which is a simple bitset that uses `BitSet` for small/medium bitsets and `ChunkedBitSet` for large bitsets. It's a speed and memory usage win.
r? `@Mark-Simulacrum`
This query (`coverage_ids_info`) already determines which counter/expression
IDs are unused, so it only takes a little extra effort to also determine which
counters/expressions must have a value of zero.
It's a performance win because `MixedBitSet` is faster and uses less
memory than `ChunkedBitSet`.
Also reflow some overlong comment lines in
`lint_tail_expr_drop_order.rs`.