This adds the typeid and `vcall_visibility` metadata to vtables when the
-Cvirtual-function-elimination flag is set.
The typeid is generated in the same way as for the
`llvm.type.checked.load` intrinsic from the trait_ref.
The offset that is added to the typeid is always 0. This is because LLVM
assumes that vtables are constructed according to the definition in the
Itanium ABI. This includes an "address point" of the vtable. In C++ this
is the offset in the vtable where information for RTTI is placed. Since
there is no RTTI information in Rust's vtables, this "address point" is
always 0. This "address point" in combination with the offset passed to
the `llvm.type.checked.load` intrinsic determines the final function
that should be loaded from the vtable in the
`WholeProgramDevirtualization` pass in LLVM. That's why the
`llvm.type.checked.load` intrinsics are generated with the typeid of the
trait, rather than with that of the function that is called. This
matches what `clang` does for C++.
The vcall_visibility metadata depends on three factors:
1. LTO level: Currently this is always fat LTO, because LLVM only
supports this optimization with fat LTO.
2. Visibility of the trait: If the trait is publicly visible, VFE
can only act on its vtables after linking.
3. Number of CGUs: if there is more than one CGU, also vtables with
restricted visibility could be seen outside of the CGU, so VFE can
only act on them after linking.
To reflect this, there are three visibility levels: Public, LinkageUnit,
and TranslationUnit.
In DWARF, alignment of types is specified in bits, as is made clear by the
parameter name `AlignInBits`. However, `rustc` was incorrectly passing a byte
alignment. This commit fixes that.
This was noticed in upstream LLVM when I tried to check in a test consisting of
LLVM IR generated from `rustc` and it triggered assertions [1].
[1]: https://reviews.llvm.org/D126835
Before this fix, the debuginfo for the fields was generated from the
struct defintion of Box<T>, but (at least at the moment) the compiler
pretends that Box<T> is just a (fat) pointer, so the fields need to be
`pointer` and `vtable` instead of `__0: Unique<T>` and `__1: Allocator`.
This is meant as a temporary mitigation until we can make sure that
simply treating Box as a regular struct in debuginfo does not cause too
much breakage in the ecosystem.
This commit
- changes names to use di_node instead of metadata
- uniformly names all functions that build new debuginfo nodes build_xyz_di_node
- renames CrateDebugContext to CodegenUnitDebugContext (which is more accurate)
- moves TypeMap and functions that work directly work with it to a new type_map module
- moves and reimplements enum related builder functions to a new enums module
- splits enum debuginfo building for the native and cpp-like cases, since they are mostly separate
- uses SmallVec instead of Vec in many places
- removes the old infrastructure for dealing with recursion cycles (create_and_register_recursive_type_forward_declaration(), RecursiveTypeDescription, set_members_of_composite_type(), MemberDescription, MemberDescriptionFactory, prepare_xyz_metadata(), etc)
- adds type_map::build_type_with_children() as a replacement for dealing with recursion cycles
- adds many (doc-)comments explaining what's going on
- changes cpp-like naming for C-Style enums so they don't get a enum$<...> name (because the NatVis visualizer does not apply to them)
- fixes detection of what is a C-style enum because some enums where classified as C-style even though they have fields
- changes the position of discriminant debuginfo node so it is consistently nested inside the top-level union instead of, sometimes, next to it
Improve `AdtDef` interning.
This commit makes `AdtDef` use `Interned`. Much of the commit is tedious
changes to introduce getter functions. The interesting changes are in
`compiler/rustc_middle/src/ty/adt.rs`.
r? `@fee1-dead`
This commit makes `AdtDef` use `Interned`. Much the commit is tedious
changes to introduce getter functions. The interesting changes are in
`compiler/rustc_middle/src/ty/adt.rs`.
debuginfo: Simplify TypeMap used during LLVM debuginfo generation.
This PR simplifies the TypeMap that is used in `rustc_codegen_llvm::debuginfo::metadata`. It was unnecessarily complicated because it was originally implemented when types were not yet normalized before codegen. So it did it's own normalization and kept track of multiple unnormalized types being mapped to a single unique id.
This PR is based on https://github.com/rust-lang/rust/pull/93503, which is not merged yet.
The PR also removes the arena used for allocating string ids and instead uses `InlinableString` from the [inlinable_string](https://crates.io/crates/inlinable_string) crate. That might not be the best choice, since that crate does not seem to be very actively maintained. The [flexible-string](https://crates.io/crates/flexible-string) crate would be an alternative.
r? `@ghost`
properly handle fat pointers to uninhabitable types
Calculate the pointee metadata size by using `tcx.struct_tail_erasing_lifetimes` instead of duplicating the logic in `fat_pointer_kind`. Open to alternatively suggestions on how to fix this.
Fixes#94149
r? ````@michaelwoerister```` since you touched this code last, I think!
Rust previously encoded the `char` type as DW_ATE_unsigned_char. The more
appropriate encoding is DW_ATE_UTF.
Clang uses this same debug encoding for char32_t.
This fixes the display of `char` types in Windows debuggers as well as LLDB.
The previous implementation was written before types were properly
normalized for code generation and had to assume a more complicated
relationship between types and their debuginfo -- generating separate
identifiers for debuginfo nodes that were based on normalized types.
Since types are now already normalized, we can use them as identifiers
for debuginfo nodes.
Specifically, change `Ty` from this:
```
pub type Ty<'tcx> = &'tcx TyS<'tcx>;
```
to this
```
pub struct Ty<'tcx>(Interned<'tcx, TyS<'tcx>>);
```
There are two benefits to this.
- It's now a first class type, so we can define methods on it. This
means we can move a lot of methods away from `TyS`, leaving `TyS` as a
barely-used type, which is appropriate given that it's not meant to
be used directly.
- The uniqueness requirement is now explicit, via the `Interned` type.
E.g. the pointer-based `Eq` and `Hash` comes from `Interned`, rather
than via `TyS`, which wasn't obvious at all.
Much of this commit is boring churn. The interesting changes are in
these files:
- compiler/rustc_middle/src/arena.rs
- compiler/rustc_middle/src/mir/visit.rs
- compiler/rustc_middle/src/ty/context.rs
- compiler/rustc_middle/src/ty/mod.rs
Specifically:
- Most mentions of `TyS` are removed. It's very much a dumb struct now;
`Ty` has all the smarts.
- `TyS` now has `crate` visibility instead of `pub`.
- `TyS::make_for_test` is removed in favour of the static `BOOL_TY`,
which just works better with the new structure.
- The `Eq`/`Ord`/`Hash` impls are removed from `TyS`. `Interned`s impls
of `Eq`/`Hash` now suffice. `Ord` is now partly on `Interned`
(pointer-based, for the `Equal` case) and partly on `TyS`
(contents-based, for the other cases).
- There are many tedious sigil adjustments, i.e. adding or removing `*`
or `&`. They seem to be unavoidable.
LLDB does not seem to see fields if they are marked with DW_AT_artificial
which breaks pretty printers that use these fields for decoding fat pointers.
The field is also renamed from `ident` to `name. In most cases,
we don't actually need the `Span`. A new `ident` method is added
to `VariantDef` and `FieldDef`, which constructs the full `Ident`
using `tcx.def_ident_span()`. This method is used in the cases
where we actually need an `Ident`.
This makes incremental compilation properly track changes
to the `Span`, without all of the invalidations caused by storing
a `Span` directly via an `Ident`.
Consolidate checking for msvc when generating debuginfo
If the target we're generating code for is msvc, then we do two main
things differently: we generate type names in a C++ style instead of a
Rust style and we generate debuginfo for enums differently.
I've refactored the code so that there is one function
(`cpp_like_debuginfo`) which determines if we should use the C++ style
of naming types and other debuginfo generation or the regular Rust one.
r? ``@michaelwoerister``
This PR is not urgent so please don't let it interrupt your holidays! 🎄🎁
If the target we're generating code for is msvc, then we do two main
things differently: we generate type names in a C++ style instead of a
Rust style and we generate debuginfo for enums differently.
I've refactored the code so that there is one function
(`cpp_like_debuginfo`) which determines if we should use the C++ style
of naming types and other debuginfo generation or the regular Rust one.
In #79570, `-Z split-dwarf-kind={none,single,split}` was replaced by `-C
split-debuginfo={off,packed,unpacked}`. `-C split-debuginfo`'s packed
and unpacked aren't exact parallels to single and split, respectively.
On Unix, `-C split-debuginfo=packed` will put debuginfo into object
files and package debuginfo into a DWARF package file (`.dwp`) and
`-C split-debuginfo=unpacked` will put debuginfo into dwarf object files
and won't package it.
In the initial implementation of Split DWARF, split mode wrote sections
which did not require relocation into a DWARF object (`.dwo`) file which
was ignored by the linker and then packaged those DWARF objects into
DWARF packages (`.dwp`). In single mode, sections which did not require
relocation were written into object files but ignored by the linker and
were not packaged. However, both split and single modes could be
packaged or not, the primary difference in behaviour was where the
debuginfo sections that did not require link-time relocation were
written (in a DWARF object or the object file).
This commit re-introduces a `-Z split-dwarf-kind` flag, which can be
used to pick between split and single modes when `-C split-debuginfo` is
used to enable Split DWARF (either packed or unpacked).
Signed-off-by: David Wood <david.wood@huawei.com>
Remove `SymbolStr`
This was originally proposed in https://github.com/rust-lang/rust/pull/74554#discussion_r466203544. As well as removing the icky `SymbolStr` type, it allows the removal of a lot of `&` and `*` occurrences.
Best reviewed one commit at a time.
r? `@oli-obk`
rustc_codegen_llvm: Give each codegen unit a unique DWARF name on all platforms, not just Apple ones.
To avoid breaking split DWARF, we need to ensure that each codegen unit has a
unique `DW_AT_name`. This is because there's a remote chance that different
codegen units for the same module will have entirely identical DWARF entries
for the purpose of the DWO ID, which would violate Appendix F ("Split Dwarf
Object Files") of the DWARF 5 specification. LLVM uses the algorithm specified
in section 7.32 "Type Signature Computation" to compute the DWO ID, which does
not include any fields that would distinguish compilation units. So we must
embed the codegen unit name into the `DW_AT_name`.
Closes#88521.
Remove `in_band_lifetimes` from `rustc_codegen_llvm`
See #91867 for more information.
This one took a while. This crate has dozens of functions not associated with any type, and most of them were using in-band lifetimes for `'ll` and `'tcx`.
Apply path remapping to DW_AT_GNU_dwo_name when producing split DWARF
`--remap-path-prefix` doesn't apply to paths to `.o` (in case of packed) or `.dwo` (in case of unpacked) files in `DW_AT_GNU_dwo_name`. GCC also has this bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91888
platforms, not just Apple ones.
To avoid breaking split DWARF, we need to ensure that each codegen unit has a
unique `DW_AT_name`. This is because there's a remote chance that different
codegen units for the same module will have entirely identical DWARF entries
for the purpose of the DWO ID, which would violate Appendix F ("Split Dwarf
Object Files") of the DWARF 5 specification. LLVM uses the algorithm specified
in section 7.32 "Type Signature Computation" to compute the DWO ID, which does
not include any fields that would distinguish compilation units. So we must
embed the codegen unit name into the `DW_AT_name`.
Closes#88521.
By changing `as_str()` to take `&self` instead of `self`, we can just
return `&str`. We're still lying about lifetimes, but it's a smaller lie
than before, where `SymbolStr` contained a (fake) `&'static str`!