Some codegen_llvm cleanups
Using some more safe wrappers and thus being able to remove a large unsafe block.
As a next step, we should probably look into safe extern fns.
- For shifts this shrinks the IR by no longer needing an `assume` while still providing the UB information
- Having this on the `i8`→`i1` truncations will hopefully help with some places that have to load `i8`s or pass those in LLVM structs without range information
compiler: Stop reexporting stuff in cg_llvm::abi
The reexports confuse tooling like rustdoc into thinking cg_llvm is the source of key types that originate in rustc_target.
improve cold_path()
#120370 added a new intrinsic, `cold_path()`, and used it to fix `likely` and `unlikely`.
However, in order to limit scope, the information about cold code paths is only used in 2-target switch instructions. This is sufficient for `likely` and `unlikely`, but limits the usefulness of `cold_path` for idiomatic Rust. For example, code like this:
```
if let Some(x) = y { ... }
```
may generate a 3-target switch:
```
switch y.discriminator:
0 => true branch
1 => false branch
_ => unreachable
```
and therefore marking a branch as cold will have no effect.
This PR improves `cold_path()` to work with arbitrary switch instructions.
Note that for 2-target switches we can use `llvm.expect`, but for multiple targets we need to manually emit branch weights. I checked Clang, and it also emits weights in this situation. Clang's weight calculation is more complex than this PR's, which I believe is mainly because a `switch` in C/C++ can have multiple cases going to the same target.
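To make the improvement concrete, here is the kind of usage it enables (a nightly-only sketch; the function and values are invented for illustration):
```rust
#![feature(core_intrinsics)] // `cold_path` is an unstable intrinsic
use core::intrinsics::cold_path;

fn first_or_default(y: Option<u32>) -> u32 {
    match y {
        Some(x) => x,
        None => {
            // Hint that this arm is rarely taken. Before this PR the hint
            // only survived lowering to 2-target switches; now branch
            // weights are also emitted for multi-target switches like the
            // discriminant check shown above.
            cold_path();
            u32::default()
        }
    }
}
```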
Continuing the work started in #136466.
Every method gains a `hir_` prefix, though for the ones that already
have a `par_` or `try_par_` prefix I added the `hir_` after that.
Replace some u64 hashes with Hash64
I introduced the Hash64 and Hash128 types in https://github.com/rust-lang/rust/pull/110083, essentially as a mechanism to prevent hashes from landing in our leb128 encoding paths. If you have a u64 or u128 field in a struct and derive Encodable/Decodable, that number gets leb128-encoded. So if you need to store a hash, or some other value that behaves very much like a hash, don't store it as a u64.
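As a minimal, self-contained sketch of the idea (not the actual rustc_data_structures API):
```rust
// Wrapping the hash lets its encoding be a fixed 8 bytes, instead of the
// variable-length leb128 that a derived Encodable gives a bare u64.
#[derive(Clone, Copy, PartialEq, Eq)]
pub struct Hash64(u64);

impl Hash64 {
    // Hash bits are uniformly distributed, so leb128 would almost always
    // need the maximum 10 bytes for a u64; fixed-width is strictly better.
    pub fn write(self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.0.to_le_bytes());
    }

    pub fn read(bytes: [u8; 8]) -> Self {
        Hash64(u64::from_le_bytes(bytes))
    }
}
```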
This reverts part of https://github.com/rust-lang/rust/pull/117603, which turned an encoded Hash64 into a u64.
Based on https://github.com/rust-lang/rust/pull/110083, I don't expect this to be perf-sensitive on its own, though I expect that it may help stabilize some of the small rmeta size fluctuations we currently see in perf reports.
nvptx64: update default alignment to match LLVM 21
This changed in llvm/llvm-project@91cb8f5d32. The commit itself is mostly about some intrinsic instructions, but as an aside it also mentions something about addrspace for tensor memory, which I believe is what this string is telling us.
`@rustbot` label: +llvm-main
Set both `nuw` and `nsw` in slice size calculation
There's an old note in the code to do this, and now that [LLVM-C has an API for it](f0b8ff1251/llvm/include/llvm-c/Core.h (L4403-L4408)), we might as well. And it's been available since what looks like LLVM 17 (de9b6aa341), so it doesn't even need to be conditional.
(There's other places, like `RawVecInner` or `Layout`, that might want to do things like this too, but I'll leave those for a future PR.)
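For intuition, here is the underlying reasoning as a plain-Rust sketch (a hypothetical helper, not the actual codegen code):
```rust
// Rust allocations never exceed isize::MAX bytes, so len * size_of::<T>()
// fits in the positive signed range: the multiplication can wrap neither
// as unsigned (justifying `nuw`) nor as signed (justifying `nsw`).
fn slice_byte_size<T>(len: usize) -> usize {
    let elem = std::mem::size_of::<T>();
    debug_assert!(elem == 0 || len <= (isize::MAX as usize) / elem);
    len * elem
}
```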
debuginfo: Set bitwidth appropriately in enum variant tags
Previously, we unconditionally set the bitwidth to 128 bits, the largest an enum discriminant could possibly be. Then, LLVM would cut down the constant by chopping off leading zeroes before emitting the DWARF. LLVM only supported 64-bit enumerators, so this would also have occasionally resulted in truncated data.
LLVM added support for 128-bit enumerators in llvm/llvm-project#125578
That patchset trusts the constant to describe how wide the variant tag is, so the high 64-bits of zeros are considered potentially load-bearing.
As a result, we went from emitting tags that looked like:
DW_AT_discr_value (0xfe)
(because `dwarf::BestForm` selected `data1`)
to emitting tags that looked like:
DW_AT_discr_value (<0x10> fe ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 )
This makes the `DW_AT_discr_value` encode at the bitwidth of the tag, which:
1. Is probably closer to our intentions in terms of describing the data.
2. Doesn't invoke the 128-bit support which may not be supported by all debuggers / downstream tools.
3. Will result in smaller debug information.
cg_llvm: Reduce visibility of all functions in the llvm module
Next part of #135502
This reduces the visibility of all functions in the `llvm` module to `pub(crate)` and marks the `enzyme_ffi` modules with `#![expect(dead_code)]` (as previously discussed: <https://github.com/rust-lang/rust/pull/135502#discussion_r1915608085>).
r? ``@Zalathar``
Parallel-compiler-related cleanup
I carefully split changes into commits. Commit messages are self-explanatory. Squashing is not recommended.
cc "Parallel Rustc Front-end" https://github.com/rust-lang/rust/issues/113349
r? SparrowLii
``@rustbot`` label: +WG-compiler-parallel
Mark condition/carry bit as clobbered in C-SKY inline assembly
C-SKY's compare instructions and some arithmetic/logical instructions modify the condition/carry bit (C) in PSR, but there is currently no way to mark it as clobbered in `asm!`.
This PR marks it as clobbered except when [`options(preserves_flags)`](https://doc.rust-lang.org/reference/inline-assembly.html#r-asm.options.supported-options.preserves_flags) is used.
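As a sketch of what this looks like from the user side (nightly-only, and assuming the `cmpne`/`mvc` instruction spellings from the C-SKY manual; not code from this PR):
```rust
#![feature(asm_experimental_arch)] // C-SKY inline asm is experimental
use core::arch::asm;

// `cmpne` writes the condition/carry bit (C); `mvc` copies C into a register.
// With this PR, C is treated as clobbered by default, so this block is fine;
// adding `options(preserves_flags)` here would be incorrect.
fn ne(a: u32, b: u32) -> u32 {
    let c: u32;
    unsafe {
        asm!(
            "cmpne {a}, {b}",
            "mvc {c}",
            a = in(reg) a,
            b = in(reg) b,
            c = out(reg) c,
        );
    }
    c
}
```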
Refs:
- Section 1.3 "Programming model" and Section 1.3.5 "Condition/carry bit" in CSKY Architecture user_guide:
9f7121f7d4/CSKY%20Architecture%20user_guide.pdf
> Under user mode, condition/carry bit (C) is located in the lowest bit of PSR, and it can be
accessed and changed by common user instructions. It is the only data bit that can be visited
under user mode in PSR.
> Condition or carry bit represents the result after one operation. Condition/carry bit can be
clearly set according to the results of compare instructions or unclearly set as some
high-precision arithmetic or logical instructions. In addition, special instructions such as
DEC[GT,LT,NE] and XTRB[0-3] will influence the value of condition/carry bit.
- Register definition in LLVM:
https://github.com/llvm/llvm-project/blob/llvmorg-19.1.0/llvm/lib/Target/CSKY/CSKYRegisterInfo.td#L88
cc ```@Dirreke``` ([target maintainer](aa6f5ab18e/src/doc/rustc/src/platform-support/csky-unknown-linux-gnuabiv2.md (target-maintainers)))
r? ```@Amanieu```
```@rustbot``` label +O-csky +A-inline-assembly
Cast allocas to default address space
Pointers for variables all need to be in the same address space for correct compilation. Therefore ensure that even if an `alloca` is created in a different address space, it is cast to the default address space before its value is used.
This is necessary for the amdgpu target and others where the default address space for `alloca`s is not 0.
For example the following code compiles incorrectly when not casting the address space to the default one:
```rust
fn f(p: *const i8 /* addrspace(0) */) -> *const i8 /* addrspace(0) */ {
let local = 0i8; /* addrspace(5) */
let res = if cond { p } else { &raw const local };
res
}
```
results in
```llvm
%local = alloca addrspace(5) i8
%res = alloca addrspace(5) ptr
if:
; Store 64-bit flat pointer
store ptr %p, ptr addrspace(5) %res
else:
; Store 32-bit scratch pointer
store ptr addrspace(5) %local, ptr addrspace(5) %res
ret:
; Load and return 64-bit flat pointer
%res.load = load ptr, ptr addrspace(5) %res
ret ptr %res.load
```
For amdgpu, `addrspace(0)` pointers are 64-bit and `addrspace(5)` pointers are 32-bit.
The above code may store a 32-bit pointer and read it back as a 64-bit pointer, which is obviously wrong and cannot work. Instead, we need to `addrspacecast %local to ptr addrspace(0)`, then we store and load the correct type.
Tracking issue: #135024
Previously, we unconditionally set the bitwidth to 128 bits, the largest
a discriminant could possibly be. Then, LLVM would cut down the constant by
chopping off leading zeroes before emitting the DWARF. LLVM only
supported 64-bit discriminators, so this would also have occasionally
resulted in truncated data (or an assert) if more than 64 bits were
used.
LLVM added support for 128-bit enumerators in llvm/llvm-project#125578
That patchset also trusts the constant to describe how wide the variant tag is.
As a result, we went from emitting tags that looked like:
DW_AT_discr_value (0xfe)
(`data1`)
to emitting tags that looked like:
DW_AT_discr_value (<0x10> fe ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 )
This makes the `DW_AT_discr_value` encode at the bitwidth of the tag,
which:
1. Is probably closer to our intentions in terms of describing the data.
2. Doesn't invoke the 128-bit support which may not be supported by all
debuggers / downstream tools.
3. Will result in smaller debug information.
Document some safety constraints and use more safe wrappers
Lots of unsafe codegen_llvm code has safe wrappers already, so I used some of them and added some where applicable. I stopped here because this diff is large enough and should probably be reviewed independently of other changes.
cg_llvm: Reduce visibility of some items outside the `llvm` module
Next piece of #135502
This reduces the visibility of items (other than those in the `llvm` module) so that dead code analysis will correctly identify unused items.
adding autodiff tests
I'd like to get started with upstreaming some tests, even though I'm still waiting for an answer on how to best integrate the Enzyme pass. Can we therefore temporarily support `-Zllvm-plugins` here without too much effort? And in that case, how would that work? I saw you can do remapping, e.g. `rust-src-base`, but I don't think that will give me the path to libEnzyme.so. Do you have another suggestion?
Other than that this test simply checks that the derivative of `x*x` is `2.0 * x`, which in this case is computed as
`%0 = fadd fast double %x.0.val, %x.0.val`
(I'll add a few more tests and move it to an autodiff folder if we can use the -Z flag)
r? ``@jieyouxu``
Locally at least `-Zllvm-plugins=${PWD}/build/x86_64-unknown-linux-gnu/enzyme/build/Enzyme/libEnzyme-19.so` seems to work if I copy the command I get from x.py test and run it manually. However, running x.py test itself fails.
Tracking:
- https://github.com/rust-lang/rust/issues/124509
Zulip discussion: https://rust-lang.zulipchat.com/#narrow/channel/326414-t-infra.2Fbootstrap/topic/Enzyme.20build.20changes
coverage: Defer part of counter-creation until codegen
Follow-up to #135481 and #135873.
One of the pleasant properties of the new counter-assignment algorithm is that we can stop partway through the process, store the intermediate state in MIR, and then resume the rest of the algorithm during codegen. This lets it take into account which parts of the control-flow graph were eliminated by MIR opts, resulting in fewer physical counters and simpler counter expressions.
Those improvements end up completely obsoleting much larger chunks of code that were previously responsible for cleaning up the coverage metadata after MIR opts, while also doing a more thorough cleanup job.
(That change also unlocks some further simplifications that I've kept out of this PR to limit its scope.)
It is speculated that these two can be conceptually merged, and we can
start by ripping out rustc's notion of the PtxKernel call convention.
Leave the ExternAbi for now, but the nvptx target should now see it as
just a different way to spell Conv::GpuKernel.
Update bootstrap compiler and rustfmt
The rustfmt version we previously used formats things differently from what the latest nightly rustfmt does. This causes issues for subtrees that get formatted both in-tree and in their own repo. Updating the rustfmt used in-tree solves those issues. Also bumped the bootstrap compiler as the stage0 update command always updates both at the same
time.
Rollup of 5 pull requests
Successful merges:
- #134679 (Windows: remove readonly files)
- #136213 (Allow Rust to use a number of libc filesystem calls)
- #136530 (Implement `x perf` directly in bootstrap)
- #136601 (Detect (non-raw) borrows of null ZST pointers in CheckNull)
- #136659 (Pick the max DWARF version when LTO'ing modules with different versions )
r? `@ghost`
`@rustbot` modify labels: rollup
compiler: mostly-finish `rustc_abi` updates
This almost-finishes all the updates in the compiler to use `rustc_abi` and removes some of the reexports of `rustc_abi` items in `rustc_target` that were previously available.
r? ```@compiler-errors```
Pick the max DWARF version when LTO'ing modules with different versions
Currently, when rustc compiles code with `-Clto` enabled that was built
with different choices for `-Zdwarf-version`, a warning will be
reported. It's very easy to observe this by compiling most anything (e.g.,
"hello world") and specifying `-Clto -Zdwarf-version=5` since the
standard library is distributed with `-Zdwarf-version=4`.
This behavior isn't actually useful for a few reasons:
- From observation, LLVM chooses to pick the highest DWARF version
anyway after issuing the warning.
- Clang specifies that in this case, the max version should be picked
without a warning and as a general principle, we want to support
x-lang LTO with Clang which implies using the same module flag merge
behaviors.
- Debuggers need to be able to handle a variety of versions within the
same debugging session as you can easily have some parts of a binary
(or some dynamic libraries within an application) all compiled with
different DWARF versions.
This commit changes the module flag merge behavior to match Clang and
use the highest version of DWARF. It also adds a test to ensure this
behavior is respected in the case of two crates being LTO'd together and
adds a test to ensure no warning is printed.
Fixes #130041, which fails due to these warnings being printed
cc #103057
Currently, when rustc compiles code with `-Clto` enabled that was built
with different choices for `-Zdwarf-version`, a warning will be
reported. It's very easy to observe this by compiling most anything (e.g.,
"hello world") and specifying `-Clto -Zdwarf-version=5` since the
standard library is distributed with `-Zdwarf-version=4`.
This behavior isn't actually useful for a few reasons:
- from observation, LLVM chooses to pick the highest DWARF version
anyway after issuing the warning
- Clang specifies that in this case, the max version should be picked
without a warning and as a general principle, we want to support
x-lang LTO with Clang which implies using the same module flag merge
behaviors
- Debuggers need to be able to handle a variety of versions within the
same debugging session as you can easily have some parts of a binary
(or some dynamic libraries within an application) all compiled with
different DWARF versions
This commit changes the module flag merge behavior to match Clang and
use the highest version of DWARF. It also adds a test to ensure this
behavior is respected in the case of two crates being LTO'd together and
updates the test added in the previous commit to ensure no warning is
printed.
Debuginfo for function ZSTs should have alignment of 8 bits, not 1 bit
In #116096, function ZSTs were made to have debuginfo that gives them an alignment of “1”. But because alignment in LLVM debuginfo is denoted in *bits*, not bytes, this resulted in an alignment specification of 1 bit instead of 1 byte.
I don't know whether this has any practical consequences, but I noticed that a test started failing when I accidentally fixed the mistake while working on #136632, so I extracted the fix (and the test adjustment) to this PR.
tree-wide: parallel: Fully removed all `Lrc`, replaced with `Arc`
This is continuation of https://github.com/rust-lang/rust/pull/132282 .
I'm pretty sure I did everything right. In particular, I searched all occurrences of `Lrc` in submodules and made sure that they don't need replacement.
There are other possibilities, though.
We can define `enum Lrc<T> { Rc(Rc<T>), Arc(Arc<T>) }`. Or we can make `Lrc` a union and, on every clone, read from a special thread-local variable. Or we can add a generic parameter to `Lrc`, and, yes, this parameter will be everywhere across the codebase.
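For concreteness, the enum alternative might look like this (a sketch of the idea only, not code proposed in this PR):
```rust
use std::ops::Deref;
use std::rc::Rc;
use std::sync::Arc;

// One type with a runtime-chosen representation, at the cost of a branch
// on every clone and deref.
enum Lrc<T> {
    Rc(Rc<T>),
    Arc(Arc<T>),
}

impl<T> Clone for Lrc<T> {
    fn clone(&self) -> Self {
        match self {
            Lrc::Rc(rc) => Lrc::Rc(rc.clone()),
            Lrc::Arc(arc) => Lrc::Arc(arc.clone()),
        }
    }
}

impl<T> Deref for Lrc<T> {
    type Target = T;
    fn deref(&self) -> &T {
        match self {
            Lrc::Rc(rc) => rc,
            Lrc::Arc(arc) => arc,
        }
    }
}
```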
So, if you think we should take some alternative approach, then don't merge this PR. But if it is decided to stick with `Arc`, then please merge.
cc "Parallel Rustc Front-end" ( https://github.com/rust-lang/rust/issues/113349 )
r? SparrowLii
`@rustbot` label WG-compiler-parallel
cg_llvm: Replace some DIBuilder wrappers with LLVM-C API bindings (part 1)
Part of #134001, follow-up to #136326, extracted from #134009.
This PR performs an arbitrary subset of the LLVM-C binding migrations from #134009, which should make it less tedious to review. The remaining migrations can occur in one or more subsequent PRs.
Explain why we retroactively change a static initializer to have a different type
I keep getting confused about it, and in turn confused `@GuillaumeGomez` while trying to explain it badly.
Autodiff Upstreaming - rustc_codegen_ssa, rustc_middle
This PR should not be merged until the rustc_codegen_llvm part is merged.
I will also alter it a little based on what gets shaved off from the cg_llvm PR,
and address some of the feedback I received in the other PR (including cleanups).
I am already putting it up to
1) Discuss with `@jieyouxu` if there is more work needed to add tests to this and
2) Pray that there is someone reviewing who can tell me why some of my autodiff invocations get lost.
Re 1: My tests require fat-lto, and I also modify the compilation pipeline, so if there are any other llvm-ir tests in the same compilation unit I will likely break them. Luckily, there are two groups whose GPU code currently has the same fat-lto requirement as my autodiff code, and both have some plans to enable support for thin-lto. Once either of those efforts pans out, I'll copy it over for this feature. I will also work on not changing the optimization pipeline for functions that aren't differentiated, but that will require some thought and engineering, so I think it would be good to be able to run the autodiff tests isolated from the rest for now. Can you guide me here, please?
For context, here are some of my tests in the samples folder: https://github.com/EnzymeAD/rustbook
Re 2: This is a pretty serious issue, since it effectively prevents publishing libraries making use of autodiff: https://github.com/EnzymeAD/rust/issues/173. For some reason my dummy code persists till the end, so the code which calls autodiff, deletes the dummy, and inserts the code to compute the derivative never gets executed. To me it looks like the rustc_autodiff attribute just gets dropped, but I don't know why. Any help would be super appreciated, as rustc queries look a bit voodoo to me.
Tracking:
- https://github.com/rust-lang/rust/issues/124509
r? `@jieyouxu`
Fix deduplication mismatches in vtables leading to upcasting unsoundness
We currently have two cases where subtleties in supertraits can trigger disagreements in the vtable layout, e.g. leading to a different vtable layout being accessed at a callsite compared to what was prepared during unsizing. Namely:
### #135315
In this example, we were not normalizing supertraits when preparing vtables. In the example,
```rust
trait Supertrait<T> {
    fn _print_numbers(&self, mem: &[usize; 100]) {
        println!("{mem:?}");
    }
}
impl<T> Supertrait<T> for () {}
trait Identity {
    type Selff;
}
impl<Selff> Identity for Selff {
    type Selff = Selff;
}
trait Middle<T>: Supertrait<()> + Supertrait<T> {
    fn say_hello(&self, _: &usize) {
        println!("Hello!");
    }
}
impl<T> Middle<T> for () {}
trait Trait: Middle<<() as Identity>::Selff> {}
impl Trait for () {}
fn main() {
    (&() as &dyn Trait as &dyn Middle<()>).say_hello(&0);
}
```
When we prepare `dyn Trait`, we see a supertrait of `Middle<<() as Identity>::Selff>`, which itself has two supertraits `Supertrait<()>` and `Supertrait<<() as Identity>::Selff>`. These two supertraits are identical, but they are not deduplicated because we were using structural equality and *not* considering normalization. This leads to a vtable layout with two trait pointers.
When we upcast to `dyn Middle<()>`, those two supertraits are now the same, leading to a vtable layout with only one trait pointer. This leads to an offset error, and we call the wrong method.
### #135316
This one is a bit more interesting, and is the bulk of the changes in this PR. It's a bit similar, except it uses binder equality instead of normalization to make the compiler get confused about two vtable layouts. In the example,
```rust
trait Supertrait<T> {
    fn _print_numbers(&self, mem: &[usize; 100]) {
        println!("{mem:?}");
    }
}
impl<T> Supertrait<T> for () {}
trait Trait<T, U>: Supertrait<T> + Supertrait<U> {
    fn say_hello(&self, _: &usize) {
        println!("Hello!");
    }
}
impl<T, U> Trait<T, U> for () {}
fn main() {
    (&() as &'static dyn for<'a> Trait<&'static (), &'a ()>
        as &'static dyn Trait<&'static (), &'static ()>)
        .say_hello(&0);
}
```
When we prepare the vtable for `dyn for<'a> Trait<&'static (), &'a ()>`, we currently consider the PolyTraitRef of the vtable as the key for a supertrait. This leads to two supertraits -- `Supertrait<&'static ()>` and `for<'a> Supertrait<&'a ()>`.
However, we can upcast[^up] without offsetting the vtable from `dyn for<'a> Trait<&'static (), &'a ()>` to `dyn Trait<&'static (), &'static ()>`. This is just instantiating the principal trait ref for a specific `'a = 'static`. However, when considering those supertraits, we now have only one distinct supertrait -- `Supertrait<&'static ()>` (which is deduplicated since there are two supertraits with the same substitutions). This leads to similar offsetting issues, leading to the wrong method being called.
[^up]: I say upcast but this is a cast that is allowed on stable, since it's not changing the vtable at all, just instantiating the binder of the principal trait ref for some lifetime.
The solution here is to recognize that a vtable isn't really meaningfully higher ranked, and to just treat a vtable as corresponding to a `TraitRef` so we can do this deduplication more faithfully. That is to say, the vtable for `dyn for<'a> Tr<'a>` and `dyn Tr<'x>` are always identical, since they both would correspond to a set of free regions on an impl... Do note that `Tr<for<'a> fn(&'a ())>` and `Tr<fn(&'static ())>` are still distinct.
----
There's a bit more that can be cleaned up. In codegen, we can stop using `PolyExistentialTraitRef` basically everywhere. We can also fix SMIR to stop storing `PolyExistentialTraitRef` in its vtable allocations.
As for testing, it's difficult to actually turn this into something that can be tested with `rustc_dump_vtable`, since having multiple supertraits that are identical is a recipe for ambiguity errors. Maybe someone else is more creative with getting that attr to work, since the tests I added being run-pass tests is a bit unsatisfying. Miri also doesn't help here, since it doesn't really generate vtables that are offset by an index in the same way as codegen.
r? `@lcnr` for the vibe check? Or reassign, idk. Maybe let's talk about whether this makes sense.
<sup>(I guess an alternative would also be to not do any deduplication of vtable supertraits (or only a really conservative subset) rather than trying to normalize and deduplicate more faithfully here. Not sure if that works and is sufficient tho.)</sup>
cc `@steffahn` -- ty for the minimizations
cc `@WaffleLapkin` -- since you're overseeing the feature stabilization :3
Fixes #135315
Fixes #135316
Introduce a wrapper for "typed valtrees" and properly check the type before extracting the value
This PR adds a new wrapper type `ty::Value` to replace the tuple `(Ty, ty::ValTree)` and become the new canonical representation of type-level constant values.
The value extraction methods `try_to_bits`/`try_to_bool`/`try_to_target_usize` are moved to this new type. For `try_to_bits` in particular, this avoids some redundant matches on `ty::ConstKind::Value`. Furthermore, these methods will now properly check the type before extracting the value, which fixes some ICEs.
The name `ty::Value` was chosen to be consistent with `ty::Expr`.
Commit 1 should be non-functional and commit 2 adds the type check.
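As a toy, self-contained analogue of the wrapper (invented types, not rustc's actual `Ty`/`ValTree` definitions):
```rust
#[derive(Clone, Copy, PartialEq)]
enum Ty {
    Bool,
    U64,
}

// The type travels with the value, so extraction helpers can check it
// before reinterpreting the bits.
#[derive(Clone, Copy)]
struct Value {
    ty: Ty,
    bits: u64, // stands in for a valtree leaf
}

impl Value {
    fn try_to_bool(self) -> Option<bool> {
        // Checking the type first is what prevents the ICEs: previously the
        // bits could be extracted without confirming they describe a bool.
        match self.ty {
            Ty::Bool => Some(self.bits != 0),
            _ => None,
        }
    }
}
```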
---
fixes https://github.com/rust-lang/rust/issues/131102
supersedes https://github.com/rust-lang/rust/pull/136130
r? `@oli-obk`
cc `@FedericoBruzzone` `@BoxyUwU`
Cast global variables to default address space
Pointers for variables all need to be in the same address space for correct compilation. Therefore ensure that even if a global variable is created in a different address space, it is cast to the default address space before its value is used.
This is necessary for the amdgpu target and others where the default address space for global variables is not 0.
For example `core` does not compile in debug mode when not casting the address space to the default one because it tries to emit the following (simplified) LLVM IR, containing a type mismatch:
```llvm
@alloc_0 = addrspace(1) constant <{ [6 x i8] }> <{ [6 x i8] c"bit.rs" }>, align 1
@alloc_1 = addrspace(1) constant <{ ptr }> <{ ptr addrspace(1) @alloc_0 }>, align 8
; ^ here a struct containing a `ptr` is needed, but it is created using a `ptr addrspace(1)`
```
For this to compile, we need to insert a constant `addrspacecast` before we use a global variable:
```llvm
@alloc_0 = addrspace(1) constant <{ [6 x i8] }> <{ [6 x i8] c"bit.rs" }>, align 1
@alloc_1 = addrspace(1) constant <{ ptr }> <{ ptr addrspacecast (ptr addrspace(1) @alloc_0 to ptr) }>, align 8
```
As vtables are global variables as well, they are also created with an `addrspacecast`. In the SSA backend, after a vtable global is created, metadata is added to it. To add metadata, we need the non-casted global variable. Therefore we strip away an addrspacecast if there is one, to get the underlying global.
Tracking issue: #135024
ABI-required target features: warn when they are missing in base CPU
Part of https://github.com/rust-lang/rust/pull/135408:
instead of adding ABI-required features to the target we build for LLVM, check that they are already there. Crucially we check this after applying `-Ctarget-cpu` and `-Ctarget-feature`, by reading `sess.unstable_target_features`. This means we can tweak the ABI target feature check without changing the behavior for any existing user; they will get warnings but the target features behave as before.
The test changes here show that we are undoing the "add all required target features" part. Without the full #135408, there is no way to take away an ABI-required target feature with `-Ctarget-cpu`, so we cannot yet test that part.
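The shape of the check, as a hypothetical helper (the names and signature are invented; the real code reads `sess.unstable_target_features`):
```rust
// After -Ctarget-cpu/-Ctarget-feature are resolved, verify ABI-required
// features instead of force-enabling them, and warn on any that are missing.
fn check_abi_required_features(enabled: &[&str], abi_required: &[&str]) {
    for feature in abi_required {
        if !enabled.contains(feature) {
            eprintln!(
                "warning: target feature `{feature}` is required by the ABI \
                 but is not enabled"
            );
        }
    }
}
```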
Cc ``@workingjubilee``