The embedded bitcode should always be prepared for LTO/ThinLTO
Fixes#115344. Fixes#117220.
There are currently two methods for generating bitcode that used for LTO. One method involves using `-C linker-plugin-lto` to emit object files as bitcode, which is the typical setting used by cargo. The other method is through `-C embed-bitcode=yes`.
When using with `-C embed-bitcode=yes -C lto=no`, we run a complete non-LTO LLVM pipeline to obtain bitcode, then the bitcode is used for LTO. We run the Call Graph Profile Pass twice on the same module.
This PR is doing something similar to LLVM's `buildFatLTODefaultPipeline`, obtaining the bitcode for embedding after running `buildThinLTOPreLinkDefaultPipeline`.
r? nikic
Make it so that every structured error annotated with `#[derive(Diagnostic)]` that has a field of type `Ty<'_>`, the printing of that value into a `String` will look at the thread-local storage `TyCtxt` in order to shorten to a length appropriate with the terminal width. When this happen, the resulting error will have a note with the file where the full type name was written to.
```
error[E0618]: expected function, found `((..., ..., ..., ...), ..., ..., ...)``
--> long.rs:7:5
|
6 | fn foo(x: D) { //~ `x` has type `(...
| - `x` has type `((..., ..., ..., ...), ..., ..., ...)`
7 | x(); //~ ERROR expected function, found `(...
| ^--
| |
| call expression requires function
|
= note: the full name for the type has been written to 'long.long-type-14182675702747116984.txt'
= note: consider using `--verbose` to print the full type name to the console
```
Use StableHasher + Hash64 for dep_tracking_hash
This is similar to https://github.com/rust-lang/rust/pull/137095. We currently have a +/- 1 byte jitter in the size of dep graphs reported on perf.rust-lang.org. I think this fixes that jitter.
When I introduced `Hash64`, I wired it through most of the compiler by making it an output of `StableHasher::finalize` then fixing the compile errors. I missed this case because the `u64` hash in this function is being produced by `DefaultHasher` instead. That seems pretty sketchy because the code seems confident that the hash needs to be stable, and we have a mechanism for stable hashing that we weren't using here.
Rollup of 7 pull requests
Successful merges:
- #137095 (Replace some u64 hashes with Hash64)
- #137100 (HIR analysis: Remove unnecessary abstraction over list of clauses)
- #137105 (Restrict DerefPure for Cow<T> impl to T = impl Clone, [impl Clone], str.)
- #137120 (Enable `relative-path-include-bytes-132203` rustdoc-ui test on Windows)
- #137125 (Re-add missing empty lines in the releases notes)
- #137145 (use add-core-stubs / minicore for a few more tests)
- #137149 (Remove SSE ABI from i586-pc-windows-msvc)
r? `@ghost`
`@rustbot` modify labels: rollup
Replace some u64 hashes with Hash64
I introduced the Hash64 and Hash128 types in https://github.com/rust-lang/rust/pull/110083, essentially as a mechanism to prevent hashes from landing in our leb128 encoding paths. If you just have a u64 or u128 field in a struct then derive Encodable/Decodable, that number gets leb128 encoding. So if you need to store a hash or some other value which behaves very close to a hash, don't store it as a u64.
This reverts part of https://github.com/rust-lang/rust/pull/117603, which turned an encoded Hash64 into a u64.
Based on https://github.com/rust-lang/rust/pull/110083, I don't expect this to be perf-sensitive on its own, though I expect that it may help stabilize some of the small rmeta size fluctuations we currently see in perf reports.
Overhaul `rustc_middle::limits`
In particular, to make `pattern_complexity` work more like other limits, which then enables some other simplifications.
r? ``@Nadrieril``
It's similar to the other limits, e.g. obtained via `get_limit`. So it
makes sense to handle it consistently with the other limits. We now use
`Limit`/`usize` in most places instead of `Option<usize>`, so we use
`Limit::new(usize::MAX)`/`usize::MAX` to emulate how `None` used to work.
The commit also adds `Limit::unlimited`.
Load all builtin targets at once instead of one by one in check-cfg
This PR adds a method on `rustc_target::Target` to load all the builtin targets at once, and then uses that method when constructing the `target_*` values in check-cfg instead of load loading each target one by one by their name, which requires a lookup and was more of a hack anyway.
This may give us some performance improvements as we won't need to do the lookup for the _currently_ 287 targets we have.
DWARF 1 is very different than DWARF 2+ (see the commentary in
https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#index-gdwarf)
and LLVM does not really seem to support DWARF 1 as Clang does not offer
a `-gdwarf-1` flag and `llc` will just generate DWARF 2 with the version
set to 1: https://godbolt.org/z/s85d87n3a.
Since this isn't actually supported (and it's not clear it would be
useful anyway), report that DWARF 1 is not supported if it is requested.
Also add a help message to the error saying which versions are supported.
tree-wide: parallel: Fully removed all `Lrc`, replaced with `Arc`
tree-wide: parallel: Fully removed all `Lrc`, replaced with `Arc`
This is continuation of https://github.com/rust-lang/rust/pull/132282 .
I'm pretty sure I did everything right. In particular, I searched all occurrences of `Lrc` in submodules and made sure that they don't need replacement.
There are other possibilities, through.
We can define `enum Lrc<T> { Rc(Rc<T>), Arc(Arc<T>) }`. Or we can make `Lrc` a union and on every clone we can read from special thread-local variable. Or we can add a generic parameter to `Lrc` and, yes, this parameter will be everywhere across all codebase.
So, if you think we should take some alternative approach, then don't merge this PR. But if it is decided to stick with `Arc`, then, please, merge.
cc "Parallel Rustc Front-end" ( https://github.com/rust-lang/rust/issues/113349 )
r? SparrowLii
`@rustbot` label WG-compiler-parallel
The extended syntax for function signature that includes contract clauses
should never be user exposed versus the interface we want to ship
externally eventually.
Target modifiers (special marked options) are recorded in metainfo
Target modifiers (special marked options) are recorded in metainfo and compared to be equal in different linked crates.
PR for this RFC: https://github.com/rust-lang/rfcs/pull/3716
Option may be marked as `TARGET_MODIFIER`, example: `regparm: Option<u32> = (None, parse_opt_number, [TRACKED TARGET_MODIFIER]`.
If an TARGET_MODIFIER-marked option has non-default value, it will be recorded in crate metainfo as a `Vec<TargetModifier>`:
```
pub struct TargetModifier {
pub opt: OptionsTargetModifiers,
pub value_name: String,
}
```
OptionsTargetModifiers is a macro-generated enum.
Option value code (for comparison) is generated using `Debug` trait.
Error example:
```
error: mixing `-Zregparm` will cause an ABI mismatch in crate `incompatible_regparm`
--> $DIR/incompatible_regparm.rs:10:1
|
LL | #![crate_type = "lib"]
| ^
|
= help: the `-Zregparm` flag modifies the ABI so Rust crates compiled with different values of this flag cannot be used together safely
= note: `-Zregparm=1` in this crate is incompatible with `-Zregparm=2` in dependency `wrong_regparm`
= help: set `-Zregparm=2` in this crate or `-Zregparm=1` in `wrong_regparm`
= help: if you are sure this will not cause problems, use `-Cunsafe-allow-abi-mismatch=regparm` to silence this error
error: aborting due to 1 previous error
```
`-Cunsafe-allow-abi-mismatch=regparm,reg-struct-return` to disable list of flags.
Autodiff Upstreaming - rustc_codegen_ssa, rustc_middle
This PR should not be merged until the rustc_codegen_llvm part is merged.
I will also alter it a little based on what get's shaved off from the cg_llvm PR,
and address some of the feedback I received in the other PR (including cleanups).
I am putting it already up to
1) Discuss with `@jieyouxu` if there is more work needed to add tests to this and
2) Pray that there is someone reviewing who can tell me why some of my autodiff invocations get lost.
Re 1: My test require fat-lto. I also modify the compilation pipeline. So if there are any other llvm-ir tests in the same compilation unit then I will likely break them. Luckily there are two groups who currently have the same fat-lto requirement for their GPU code which I have for my autodiff code and both groups have some plans to enable support for thin-lto. Once either that work pans out, I'll copy it over for this feature. I will also work on not changing the optimization pipeline for functions not differentiated, but that will require some thoughts and engineering, so I think it would be good to be able to run the autodiff tests isolated from the rest for now. Can you guide me here please?
For context, here are some of my tests in the samples folder: https://github.com/EnzymeAD/rustbook
Re 2: This is a pretty serious issue, since it effectively prevents publishing libraries making use of autodiff: https://github.com/EnzymeAD/rust/issues/173. For some reason my dummy code persists till the end, so the code which calls autodiff, deletes the dummy, and inserts the code to compute the derivative never gets executed. To me it looks like the rustc_autodiff attribute just get's dropped, but I don't know WHY? Any help would be super appreciated, as rustc queries look a bit voodoo to me.
Tracking:
- https://github.com/rust-lang/rust/issues/124509
r? `@jieyouxu`
Fix deduplication mismatches in vtables leading to upcasting unsoundness
We currently have two cases where subtleties in supertraits can trigger disagreements in the vtable layout, e.g. leading to a different vtable layout being accessed at a callsite compared to what was prepared during unsizing. Namely:
### #135315
In this example, we were not normalizing supertraits when preparing vtables. In the example,
```
trait Supertrait<T> {
fn _print_numbers(&self, mem: &[usize; 100]) {
println!("{mem:?}");
}
}
impl<T> Supertrait<T> for () {}
trait Identity {
type Selff;
}
impl<Selff> Identity for Selff {
type Selff = Selff;
}
trait Middle<T>: Supertrait<()> + Supertrait<T> {
fn say_hello(&self, _: &usize) {
println!("Hello!");
}
}
impl<T> Middle<T> for () {}
trait Trait: Middle<<() as Identity>::Selff> {}
impl Trait for () {}
fn main() {
(&() as &dyn Trait as &dyn Middle<()>).say_hello(&0);
}
```
When we prepare `dyn Trait`, we see a supertrait of `Middle<<() as Identity>::Selff>`, which itself has two supertraits `Supertrait<()>` and `Supertrait<<() as Identity>::Selff>`. These two supertraits are identical, but they are not duplicated because we were using structural equality and *not* considering normalization. This leads to a vtable layout with two trait pointers.
When we upcast to `dyn Middle<()>`, those two supertraits are now the same, leading to a vtable layout with only one trait pointer. This leads to an offset error, and we call the wrong method.
### #135316
This one is a bit more interesting, and is the bulk of the changes in this PR. It's a bit similar, except it uses binder equality instead of normalization to make the compiler get confused about two vtable layouts. In the example,
```
trait Supertrait<T> {
fn _print_numbers(&self, mem: &[usize; 100]) {
println!("{mem:?}");
}
}
impl<T> Supertrait<T> for () {}
trait Trait<T, U>: Supertrait<T> + Supertrait<U> {
fn say_hello(&self, _: &usize) {
println!("Hello!");
}
}
impl<T, U> Trait<T, U> for () {}
fn main() {
(&() as &'static dyn for<'a> Trait<&'static (), &'a ()>
as &'static dyn Trait<&'static (), &'static ()>)
.say_hello(&0);
}
```
When we prepare the vtable for `dyn for<'a> Trait<&'static (), &'a ()>`, we currently consider the PolyTraitRef of the vtable as the key for a supertrait. This leads two two supertraits -- `Supertrait<&'static ()>` and `for<'a> Supertrait<&'a ()>`.
However, we can upcast[^up] without offsetting the vtable from `dyn for<'a> Trait<&'static (), &'a ()>` to `dyn Trait<&'static (), &'static ()>`. This is just instantiating the principal trait ref for a specific `'a = 'static`. However, when considering those supertraits, we now have only one distinct supertrait -- `Supertrait<&'static ()>` (which is deduplicated since there are two supertraits with the same substitutions). This leads to similar offsetting issues, leading to the wrong method being called.
[^up]: I say upcast but this is a cast that is allowed on stable, since it's not changing the vtable at all, just instantiating the binder of the principal trait ref for some lifetime.
The solution here is to recognize that a vtable isn't really meaningfully higher ranked, and to just treat a vtable as corresponding to a `TraitRef` so we can do this deduplication more faithfully. That is to say, the vtable for `dyn for<'a> Tr<'a>` and `dyn Tr<'x>` are always identical, since they both would correspond to a set of free regions on an impl... Do note that `Tr<for<'a> fn(&'a ())>` and `Tr<fn(&'static ())>` are still distinct.
----
There's a bit more that can be cleaned up. In codegen, we can stop using `PolyExistentialTraitRef` basically everywhere. We can also fix SMIR to stop storing `PolyExistentialTraitRef` in its vtable allocations.
As for testing, it's difficult to actually turn this into something that can be tested with `rustc_dump_vtable`, since having multiple supertraits that are identical is a recipe for ambiguity errors. Maybe someone else is more creative with getting that attr to work, since the tests I added being run-pass tests is a bit unsatisfying. Miri also doesn't help here, since it doesn't really generate vtables that are offset by an index in the same way as codegen.
r? `@lcnr` for the vibe check? Or reassign, idk. Maybe let's talk about whether this makes sense.
<sup>(I guess an alternative would also be to not do any deduplication of vtable supertraits (or only a really conservative subset) rather than trying to normalize and deduplicate more faithfully here. Not sure if that works and is sufficient tho.)</sup>
cc `@steffahn` -- ty for the minimizations
cc `@WaffleLapkin` -- since you're overseeing the feature stabilization :3
Fixes#135315Fixes#135316
Remove -Zinline-in-all-cgus and clean up tests/codegen-units/
Implementation of https://github.com/rust-lang/compiler-team/issues/814
I've taken some liberties with cleaning up the CGU partitioning tests, because that's the only place this flag was used and also mattered. I've often fought a lot with the contents of `tests/codegen-units` and it has never been clear to me when a test failure indicates a problem with my changes as opposed to a test just needing to be manually blessed. Hopefully the combination of the new README, new comments, and using `-Zprint-mono-items=lazy` in the partitioning tests improves that.
I've also deleted some of the `tests/run-make/sepcomp` tests. I think all the "sepcomp" tests have been obviated for years by better-designed (less flaky, clearer failures) test suites, but here I'm just deleting the ones I'm confident in.