Autodiff Upstreaming - rustc_codegen_ssa, rustc_middle
This PR should not be merged until the rustc_codegen_llvm part is merged.
I will also alter it a little based on what get's shaved off from the cg_llvm PR,
and address some of the feedback I received in the other PR (including cleanups).
I am putting it already up to
1) Discuss with `@jieyouxu` if there is more work needed to add tests to this and
2) Pray that there is someone reviewing who can tell me why some of my autodiff invocations get lost.
Re 1: My test require fat-lto. I also modify the compilation pipeline. So if there are any other llvm-ir tests in the same compilation unit then I will likely break them. Luckily there are two groups who currently have the same fat-lto requirement for their GPU code which I have for my autodiff code and both groups have some plans to enable support for thin-lto. Once either that work pans out, I'll copy it over for this feature. I will also work on not changing the optimization pipeline for functions not differentiated, but that will require some thoughts and engineering, so I think it would be good to be able to run the autodiff tests isolated from the rest for now. Can you guide me here please?
For context, here are some of my tests in the samples folder: https://github.com/EnzymeAD/rustbook
Re 2: This is a pretty serious issue, since it effectively prevents publishing libraries making use of autodiff: https://github.com/EnzymeAD/rust/issues/173. For some reason my dummy code persists till the end, so the code which calls autodiff, deletes the dummy, and inserts the code to compute the derivative never gets executed. To me it looks like the rustc_autodiff attribute just get's dropped, but I don't know WHY? Any help would be super appreciated, as rustc queries look a bit voodoo to me.
Tracking:
- https://github.com/rust-lang/rust/issues/124509
r? `@jieyouxu`
ABI-required target features: warn when they are missing in base CPU
Part of https://github.com/rust-lang/rust/pull/135408:
instead of adding ABI-required features to the target we build for LLVM, check that they are already there. Crucially we check this after applying `-Ctarget-cpu` and `-Ctarget-feature`, by reading `sess.unstable_target_features`. This means we can tweak the ABI target feature check without changing the behavior for any existing user; they will get warnings but the target features behave as before.
The test changes here show that we are un-doing the "add all required target features" part. Without the full #135408, there is no way to take a way an ABI-required target feature with `-Ctarget-cpu`, so we cannot yet test that part.
Cc ``@workingjubilee``
Remove -Zinline-in-all-cgus and clean up tests/codegen-units/
Implementation of https://github.com/rust-lang/compiler-team/issues/814
I've taken some liberties with cleaning up the CGU partitioning tests, because that's the only place this flag was used and also mattered. I've often fought a lot with the contents of `tests/codegen-units` and it has never been clear to me when a test failure indicates a problem with my changes as opposed to a test just needing to be manually blessed. Hopefully the combination of the new README, new comments, and using `-Zprint-mono-items=lazy` in the partitioning tests improves that.
I've also deleted some of the `tests/run-make/sepcomp` tests. I think all the "sepcomp" tests have been obviated for years by better-designed (less flaky, clearer failures) test suites, but here I'm just deleting the ones I'm confident in.
Get rid of RunCompiler
The various `set_*` methods that have been removed can be replaced by setting the respective fields in the `Callbacks::config` implementation. `set_using_internal_features` was often forgotten and it's equivalent is now done automatically.
Properly note when query stack is being cut off
cc #70953
also, i'm not certain whether we should even limit this at all. i don't see the problem with printing the full query stack, apparently it was limited b/c we used to ICE? but we're already printing the full stack to disk since #108714.
r? oli-obk
add `-Zmin-function-alignment`
tracking issue: https://github.com/rust-lang/rust/issues/82232
This PR adds the `-Zmin-function-alignment=<align>` flag, that specifies a minimum alignment for all* functions.
### Motivation
This feature is requested by RfL [here](https://github.com/rust-lang/rust/issues/128830):
> i.e. the equivalents of `-fmin-function-alignment` ([GCC](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-fmin-function-alignment_003dn), Clang does not support it) / `-falign-functions` ([GCC](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-falign-functions), [Clang](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang1-falign-functions)).
>
> For the Linux kernel, the behavior wanted is that of GCC's `-fmin-function-alignment` and Clang's `-falign-functions`, i.e. align all functions, including cold functions.
>
> There is [`feature(fn_align)`](https://github.com/rust-lang/rust/issues/82232), but we need to do it globally.
### Behavior
The `fn_align` feature does not have an RFC. It was decided at the time that it would not be necessary, but maybe we feel differently about that now? In any case, here are the semantics of this flag:
- `-Zmin-function-alignment=<align>` specifies the minimum alignment of all* functions
- the `#[repr(align(<align>))]` attribute can be used to override the function alignment on a per-function basis: when `-Zmin-function-alignment` is specified, the attribute's value is only used when it is higher than the value passed to `-Zmin-function-alignment`.
- the target may decide to use a higher value (e.g. on x86_64 the minimum that LLVM generates is 16)
- The highest supported alignment in rust is `2^29`: I checked a bunch of targets, and they all emit the `.p2align 29` directive for targets that align functions at all (some GPU stuff does not have function alignment).
*: Only with `build-std` would the minimum alignment also be applied to `std` functions.
---
cc `@ojeda`
r? `@workingjubilee` you were active on the tracking issue
mark deprecated option as deprecated in rustc_session to remove copypasta and small refactor
This marks deprecated options as deprecated via flag in options table in rustc_session, which removes copypasted deprecation text from rustc_driver_impl.
This also adds warning for deprecated `-C ar` option, which didn't emitted any warnings before.
Makes `inline_threshold` `[UNTRACKED]`, as it do nothing.
Adds few tests.
See individual commits.
inline_threshold mark deprecated
no-stack-check
print deprecation message for -Car too
inline_threshold deprecated and do nothing: make in untracked
make OptionDesc struct from tuple
Improve dependency_format a bit
* Make `DependencyList` an `IndexVec` rather than emulating one using a `Vec` (which was off-by-one as LOCAL_CRATE was intentionally skipped)
* Update some comments for the fact that we now use `#[global_allocator]` rather than `extern crate alloc_system;`/`extern crate alloc_jemalloc;` for specifying which allocator to use. We still use a similar mechanism for the panic runtime, so refer to the panic runtime in those comments instead.
* An unrelated refactor to `create_and_enter_global_ctxt` I forgot to include in https://github.com/rust-lang/rust/pull/134302. This refactor is too small to be worth it's own PR.
`rustc_span::symbol` defines some things that are re-exported from
`rustc_span`, such as `Symbol` and `sym`. But it doesn't re-export some
closely related things such as `Ident` and `kw`. So you can do `use
rustc_span::{Symbol, sym}` but you have to do `use
rustc_span::symbol::{Ident, kw}`, which is inconsistent for no good
reason.
This commit re-exports `Ident`, `kw`, and `MacroRulesNormalizedIdent`,
and changes many `rustc_span::symbol::` qualifiers in `compiler/` to
`rustc_span::`. This is a 200+ net line of code reduction, mostly
because many files with two `use rustc_span` items can be reduced to
one.
Split up attribute parsing code and move data types to `rustc_attr_data_structures`
This change renames `rustc_attr` to `rustc_attr_parsing`, and splits up the parsing code. At the same time, all the data types used move to `rustc_attr_data_structures`. This is in preparation of also having a third crate: `rustc_attr_validation`
I initially envisioned this as two separate PRs, but I think doing it in one go reduces the number of ways others would have to rebase their changes on this. However, I can still split them.
r? `@oli-obk` (we already discussed how this is a first step in a larger plan)
For a more detailed plan on how attributes are going to change, see https://github.com/rust-lang/rust/issues/131229
Edit: this looks like a giant PR, but the changes are actually rather trivial. Each commit is reviewable on its own, and mostly moves code around. No new logic is added.
Remove queries from the driver interface
All uses of driver queries in the public api of rustc_driver have been removed in https://github.com/rust-lang/rust/pull/134130 already. This removes driver queries from rustc_interface and does a couple of cleanups around TyCtxt construction and entering enabled by this removal.
Finishes the removal of driver queries started with https://github.com/rust-lang/rust/pull/126834.
A bunch of cleanups (part 2)
Just like https://github.com/rust-lang/rust/pull/133567 these were all found while looking at the respective code, but are not blocking any other changes I want to make in the short term.
Rollup of 7 pull requests
Successful merges:
- #133900 (Advent of `tests/ui` (misc cleanups and improvements) [1/N])
- #133937 (Keep track of parse errors in `mod`s and don't emit resolve errors for paths involving them)
- #133938 (`rustc_mir_dataflow` cleanups, including some renamings)
- #134058 (interpret: reduce usage of TypingEnv::fully_monomorphized)
- #134130 (Stop using driver queries in the public API)
- #134140 (Add AST support for unsafe binders)
- #134229 (Fix typos in docs on provenance)
r? `@ghost`
`@rustbot` modify labels: rollup
It only exists to pass some information from one part of the driver to
another part. We can directly pass this information to the function that
needs it to reduce the amount of mutation of the Session.
A bunch of cleanups
These are all extracted from a branch I have to get rid of driver queries. Most of the commits are not directly necessary for this, but were found in the process of implementing the removal of driver queries.
Previous PR: https://github.com/rust-lang/rust/pull/132410
It was inconsistently done (sometimes even within a single function) and
most of the rest of the compiler uses fatal errors instead, which need
to be caught using catch_with_exit_code anyway. Using fatal errors
instead of ErrorGuaranteed everywhere in the driver simplifies things a
bit.
rust_for_linux: -Zreg-struct-return commandline flag for X86 (#116973)
Command line flag `-Zreg-struct-return` for X86 (32-bit) for rust-for-linux.
This flag enables the same behavior as the `abi_return_struct_as_int` target spec key.
- Tracking issue: https://github.com/rust-lang/rust/issues/116973
Remove `-Zshow-span`.
It's very old (added in #12087). It's strange, and it's not clear what its use cases are. It only works with the crate root file because it runs before expansion. I suspect it won't be missed.
r? `@estebank`
It's very old (added in #12087). It's strange, and it's not clear what
its use cases are. It only works with the crate root file because it
runs before expansion. I suspect it won't be missed.
Get rid of HIR const checker
As far as I can tell, the HIR const checker was implemented in https://github.com/rust-lang/rust/pull/66170 because we were not able to issue useful const error messages in the MIR const checker.
This seems to have changed in the last 5 years, probably due to work like #90532. I've tweaked the diagnostics slightly and think the error messages have gotten *better* in fact.
Thus I think the HIR const checker has reached the end of its usefulness, and we can retire it.
cc `@RalfJung`
I was surprised to find that running with `-Zparse-only` only parses the
crate root file. Other files aren't parsed because that happens later
during expansion.
This commit renames the option and updates the help message to make this
clearer.
Some more refactorings towards removing driver queries
Follow up to https://github.com/rust-lang/rust/pull/127184
## Custom driver breaking change
The `after_analysis` callback is changed to accept `TyCtxt` instead of `Queries`. The only safe query in `Queries` to call at this point is `global_ctxt()` which allows you to enter the `TyCtxt` either way. To fix your custom driver, replace the `queries: &'tcx Queries<'tcx>` argument with `tcx: TyCtxt<'tcx>` and remove your `queries.global_ctxt().unwrap().enter(|tcx| { ... })` call and only keep the contents of the closure.
## Custom driver deprecation
The `after_crate_root_parsing` callback is now deprecated. Several custom drivers are incorrectly calling `queries.global_ctxt()` from inside of it, which causes some driver code to be skipped. As such I would like to either remove it in the future or if custom drivers still need it, change it to accept an `&rustc_ast::Crate` instead.
Merge `-Zhir-stats` into `-Zinput-stats`
Currently `-Z hir-stats` prints the size and count of various kinds of nodes, and the total size of all the nodes it counted, but not the total count of nodes. So, before this PR:
```
$ git clone https://github.com/BurntSushi/ripgrep
$ cd ripgrep
$ cargo +nightly rustc -- -Z hir-stats
ast-stats-1 PRE EXPANSION AST STATS
ast-stats-1 Name Accumulated Size Count Item Size
ast-stats-1 ----------------------------------------------------------------
ast-stats-1 ...
ast-stats-1 ----------------------------------------------------------------
ast-stats-1 Total 93_576
ast-stats-1
ast-stats-2 POST EXPANSION AST STATS
ast-stats-2 Name Accumulated Size Count Item Size
ast-stats-2 ----------------------------------------------------------------
ast-stats-2 ...
ast-stats-2 ----------------------------------------------------------------
ast-stats-2 Total 2_430_648
ast-stats-2
hir-stats HIR STATS
hir-stats Name Accumulated Size Count Item Size
hir-stats ----------------------------------------------------------------
hir-stats ...
hir-stats ----------------------------------------------------------------
hir-stats Total 3_678_512
hir-stats
```
For consistency, this PR adds a total for the count as well:
```
$ cargo +stage1 rustc -- -Z hir-stats
ast-stats-1 PRE EXPANSION AST STATS
ast-stats-1 Name Accumulated Size Count Item Size
ast-stats-1 ----------------------------------------------------------------
ast-stats-1 ...
ast-stats-1 ----------------------------------------------------------------
ast-stats-1 Total 93_576 1_877
ast-stats-1
ast-stats-2 POST EXPANSION AST STATS
ast-stats-2 Name Accumulated Size Count Item Size
ast-stats-2 ----------------------------------------------------------------
ast-stats-2 ...
ast-stats-2 ----------------------------------------------------------------
ast-stats-2 Total 2_430_648 48_625
ast-stats-2
hir-stats HIR STATS
hir-stats Name Accumulated Size Count Item Size
hir-stats ----------------------------------------------------------------
hir-stats ...
hir-stats ----------------------------------------------------------------
hir-stats Total 3_678_512 73_418
hir-stats
```
I wasn't sure if I was supposed to update `tests/ui/stats/hir-stats.stderr` to reflect this. I ran it locally, thinking it would fail, but it didn't:
```
$ ./x test tests/ui/stats
...
running 2 tests
i.
test result: ok. 1 passed; 0 failed; 1 ignored; 0 measured; 17949 filtered out
```
Also: is there a reason `-Z hir-stats` and `-Z input-stats` both exist? The former seems like it should completely supercede the latter. But strangely, the two give very different numbers for node counts:
```
$ cargo +nightly rustc -- -Z input-stats
...
Lines of code: 483
Pre-expansion node count: 2386
Post-expansion node count: 63844
```
That's a 30% difference in this case. Is it intentional that these numbers are so different? I see comments for both saying that they are merely approximations and should not be expected to be correct:
bd0826a452/compiler/rustc_ast_passes/src/node_count.rs (L1)bd0826a452/compiler/rustc_passes/src/hir_stats.rs (L1-L3)
Over in Zed we've noticed that loading crates for a large-ish workspace can take almost 200ms. We've pinned it down to how rustc searches for paths, as it performs a linear search over the list of candidate paths. In our case the candidate list had about 20k entries which we had to iterate over for each dependency being loaded.
This commit introduces a simple FilesIndex that's just a sorted Vec under the hood. Since crates are looked up by both prefix and suffix, we perform a range search on said Vec (which constraints the search space based on prefix) and follow up with a linear scan of entries with matching suffixes.
FilesIndex is also pre-filtered before any queries are performed using available target information; query prefixes/sufixes are based on the target we are compiling for, so we can remove entries that can never match up front.
Overall, this commit brings down build time for us in dev scenarios by about 6%.
100ms might not seem like much, but this is a constant cost that each of our workspace crates has to pay, even when said crate is miniscule.
Delete the `cfg(not(parallel))` serial compiler
Since it's inception a long time ago, the parallel compiler and its cfgs have been a maintenance burden. This was a necessary evil the allow iteration while not degrading performance because of synchronization overhead.
But this time is over. Thanks to the amazing work by the parallel working group (and the dyn sync crimes), the parallel compiler has now been fast enough to be shipped by default in nightly for quite a while now.
Stable and beta have still been on the serial compiler, because they can't use `-Zthreads` anyways.
But this is quite suboptimal:
- the maintenance burden still sucks
- we're not testing the serial compiler in nightly
Because of these reasons, it's time to end it. The serial compiler has served us well in the years since it was split from the parallel one, but it's over now.
Let the knight slay one head of the two-headed dragon!
#113349
Note that the default is still 1 thread, as more than 1 thread is still fairly broken.
cc `@onur-ozkan` to see if i did the bootstrap field removal correctly, `@SparrowLii` on the sync parts
Since it's inception a long time ago, the parallel compiler and its cfgs
have been a maintenance burden. This was a necessary evil the allow
iteration while not degrading performance because of synchronization
overhead.
But this time is over. Thanks to the amazing work by the parallel
working group (and the dyn sync crimes), the parallel compiler has now
been fast enough to be shipped by default in nightly for quite a while
now.
Stable and beta have still been on the serial compiler, because they
can't use `-Zthreads` anyways.
But this is quite suboptimal:
- the maintenance burden still sucks
- we're not testing the serial compiler in nightly
Because of these reasons, it's time to end it. The serial compiler has
served us well in the years since it was split from the parallel one,
but it's over now.
Let the knight slay one head of the two-headed dragon!
rustc_codegen_llvm: Add a new 'pc' option to branch-protection
Add a new 'pc' option to -Z branch-protection for aarch64 that enables the use of PC as a diversifier in PAC branch protection code.
When the pauth-lr target feature is enabled in combination with -Z branch-protection=pac-ret,pc, the new 9.5-a instructions (pacibsppc, retaasppc, etc) will be generated.
Add a new 'pc' option to -Z branch-protection for aarch64 that
enables the use of PC as a diversifier in PAC branch protection code.
When the pauth-lr target feature is enabled in combination
with -Z branch-protection=pac-ret,pc, the new 9.5-a instructions
(pacibsppc, retaasppc, etc) will be generated.
Reject leading unsafe in `cfg!(...)` and `--check-cfg`
This PR reject leading unsafe in `cfg!(...)` and `--check-cfg`.
Fixes (after-backport) https://github.com/rust-lang/rust/issues/131055
r? `@jieyouxu`
Use `Vec` in `rustc_interface::Config::locale_resources`
This allows a third-party tool to injects its own resources, when receiving the config via `rustc_driver::Callbacks::config`.
Don't call closure_by_move_body_def_id on FnOnce async closures in MIR validation
Refactors the check in #129847 to not unncessarily call the `closure_by_move_body_def_id` query for async closures that don't *need* a by-move body.
Fixes#130167
This flag allows specifying the threshold size above which LLVM should
not consider placing small objects in a .sdata or .sbss section.
Support is indicated in the target options via the
small-data-threshold-support target option, which can indicate either an
LLVM argument or an LLVM module flag. To avoid duplicate specifications
in a large number of targets, the default value for support is
DefaultForArch, which is translated to a concrete value according to the
target's architecture.