Because tiny CGUs make compilation less efficient *and* result in worse
generated code.
We don't do this when the number of CGUs is explicitly given, because
there are times when the requested number is very important, as
described in some comments within the commit. So the commit also
introduces a `CodegenUnits` type that distinguishes between default
values and user-specified values.
This change has a roughly neutral effect on walltimes across the
rustc-perf benchmarks; there are some speedups and some slowdowns. But
it has significant wins for most other metrics on numerous benchmarks,
including instruction counts, cycles, binary size, and max-rss. It also
reduces parallelism, which is good for reducing jobserver competition
when multiple rustc processes are running at the same time. It's smaller
benchmarks that benefit the most; larger benchmarks already have CGUs
that are all larger than the minimum size.
Here are some example before/after CGU sizes for opt builds.
- html5ever
- CGUs: 16, mean size: 1196.1, sizes: [3908, 2992, 1706, 1652, 1572,
1136, 1045, 948, 946, 938, 579, 471, 443, 327, 286, 189]
- CGUs: 4, mean size: 4396.0, sizes: [6706, 3908, 3490, 3480]
- libc
- CGUs: 12, mean size: 35.3, sizes: [163, 93, 58, 53, 37, 8, 2 (x6)]
- CGUs: 1, mean size: 424.0, sizes: [424]
- tt-muncher
- CGUs: 5, mean size: 1819.4, sizes: [8508, 350, 198, 34, 7]
- CGUs: 1, mean size: 9075.0, sizes: [9075]
Note that CGUs of size 100,000+ aren't unusual in larger programs.
Removed use of iteration through a HashMap/HashSet in rustc_incremental and replaced with IndexMap/IndexSet
This allows for the `#[allow(rustc::potential_query_instability)]` in rustc_incremental to be removed, moving towards fixing #84447 (although a LOT more modules have to be changed to fully resolve it). Only HashMaps/HashSets that are being iterated through have been modified (although many structs and traits outside of rustc_incremental had to be modified as well, as they had fields/methods that involved a HashMap/HashSet that would be iterated through)
I'm making a PR for just 1 module changed to test for performance regressions and such, for future changes I'll either edit this PR to reflect additional modules being converted, or batch multiple modules of changes together and make a PR for each group of modules.
use c literals in compiler and library
Use c literals #108801 in compiler and library
currently blocked on:
* <strike>rustfmt: don't know how to format c literals</strike> nope, nightly one works.
* <strike>bootstrap</strike>
r? `@ghost`
`@rustbot` blocked
These tend to have special handling in a bunch of places anyway, so the variant helps remember that. And I think it's easier to grok than non-Scalar Aggregates sometimes being `Immediates` (like I got wrong and caused 109992). As a minor bonus, it means we don't need to generate poison LLVM values for them to pass around in `OperandValue::Immediate`s.
Optimize scalar and scalar pair representations loaded from ByRef in llvm
in https://github.com/rust-lang/rust/pull/105653 I noticed that we were generating suboptimal LLVM IR if we had a `ConstValue::ByRef` that could be represented by a `ScalarPair`. Before https://github.com/rust-lang/rust/pull/105653 this is probably rare, but after it, every slice will go down this suboptimal code path that requires LLVM to untangle a bunch of indirections and translate static allocations that are only used once to read a scalar pair from.
Adds support for LLVM [SafeStack] which provides backward edge control
flow protection by separating the stack into two parts: data which is
only accessed in provable safe ways is allocated on the normal stack
(the "safe stack") and all other data is placed in a separate allocation
(the "unsafe stack").
SafeStack support is enabled by passing `-Zsanitizer=safestack`.
[SafeStack]: https://clang.llvm.org/docs/SafeStack.html
Support #[global_allocator] without the allocator shim
This makes it possible to use liballoc/libstd in combination with `--emit obj` if you use `#[global_allocator]`. This is what rust-for-linux uses right now and systemd may use in the future. Currently they have to depend on the exact implementation of the allocator shim to create one themself as `--emit obj` doesn't create an allocator shim.
Note that currently the allocator shim also defines the oom error handler, which is normally required too. Once `#![feature(default_alloc_error_handler)]` becomes the only option, this can be avoided. In addition when using only fallible allocator methods and either `--cfg no_global_oom_handling` for liballoc (like rust-for-linux) or `--gc-sections` no references to the oom error handler will exist.
To avoid this feature being insta-stable, you will have to define `__rust_no_alloc_shim_is_unstable` to avoid linker errors.
(Labeling this with both T-compiler and T-lang as it originally involved both an implementation detail and had an insta-stable user facing change. As noted above, the `__rust_no_alloc_shim_is_unstable` symbol requirement should prevent unintended dependence on this unstable feature.)
Rather than returning an array of features from to_llvm_features, return a structure that contains
the dependencies. This also contains metadata on how the features depend on each other to allow for
the correct enabling and disabling.
Some features that are tied together only make sense to be folded
together when enabling the feature. For example on AArch64 sve and
neon are tied together, however it doesn't make sense to disable neon
when disabling sve.
In #91608 the fp-armv8 feature was removed as it's tied to the neon
feature. However disabling neon didn't actually disable the use of
floating point registers and instructions, for this `-fp-armv8` is
required.
Fix dependency tracking for debugger visualizers
This PR fixes dependency tracking for debugger visualizer files by changing the `debugger_visualizers` query to an `eval_always` query that scans the AST while it is still available. This way the set of visualizer files is already available when dep-info is emitted. Since the query is turned into an `eval_always` query, dependency tracking will now reliably detect changes to the visualizer script files themselves.
TODO:
- [x] perf.rlo
- [x] Needs a bit more documentation in some places
- [x] Needs regression test for the incr. comp. case
Fixes https://github.com/rust-lang/rust/issues/111226
Fixes https://github.com/rust-lang/rust/issues/111227
Fixes https://github.com/rust-lang/rust/issues/111295
r? `@wesleywiser`
cc `@gibbyfree`
Only depend on CFG_VERSION in rustc_interface
This avoids having to rebuild the whole compiler on each commit when `omit-git-hash = false`.
cc https://github.com/rust-lang/rust/issues/76720 - this won't fix it, and I'm not suggesting we turn this on by default, but it will make it less painful for people who do have `omit-git-hash` on as a workaround.
Remove the ThinLTO CU hack
This reverts #46722, commit e0ab5d5feb.
Since #111167, commit 10b69dde3f, we are
generating DWARF subprograms in a way that is meant to be more compatible
with LLVM's expectations, so hopefully we don't need this workaround
rewriting CUs anymore.
Remove misleading target feature aliases
Fixes#100752. This is a follow up to #103750. These aliases could not be completely removed until rust-lang/stdarch#1355 landed.
cc `@Amanieu`
CFI: Fix SIGILL reached via trait objects
Fix#106547 by transforming the concrete self into a reference to a trait object before emitting type metadata identifiers for trait methods.
You will need to add the following as replacement for the old __rust_*
definitions when not using the alloc shim.
#[no_mangle]
static __rust_no_alloc_shim_is_unstable: u8 = 0;
This makes it possible to use liballoc/libstd in combination with
`--emit obj` if you use `#[global_allocator]`. Making it work for the
default libstd allocator would require weak functions, which are not
well supported on all systems.
This reverts #46722, commit e0ab5d5feb.
Since #111167, commit 10b69dde3f, we are
generating DWARF subprograms in a way that is meant to be more compatible
with LLVM's expectations, so hopefully we don't need this workaround
rewriting CUs anymore.
Mark s390x condition code register as clobbered in inline assembly
Various s390x instructions (arithmetic operations, logical operations, comparisons, etc. see also "Condition Codes" section in [z/Architecture Reference Summary](https://www.ibm.com/support/pages/zarchitecture-reference-summary)) modify condition code register `cc`, but AFAIK there is currently no way to mark it as clobbered in `asm!`.
`cc` register definition in LLVM:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/SystemZ/SystemZRegisterInfo.td#L320
This PR also updates asm_experimental_arch docs in the unstable-book to mention s390x registers.
cc `@uweigand`
r? `@Amanieu`
Output LLVM optimization remark kind in `-Cremark` output
Since https://github.com/rust-lang/rust/pull/90833, the optimization remark kind has not been printed. Therefore it wasn't possible to easily determine from the log (in a programmatic way) which remark kind was produced. I think that the most interesting remarks are the missed ones, which can lead users to some code optimization.
Maybe we could also change the format closer to the "old" one:
```
note: optimization remark for tailcallelim at /checkout/src/libcore/num/mod.rs:1:0: marked this call a tail call candidate
```
I wanted to programatically parse the remarks so that they could work e.g. with https://github.com/OfekShilon/optview2. However, now that I think about it, probably the proper solution is to tell rustc to output them to YAML and then use the YAML as input for the opt remark visualization tools. The flag for enabling this does not seem to work though (https://github.com/rust-lang/rust/issues/96705#issuecomment-1117632322).
Still I think that it's good to output the remark kind anyway, it's an important piece of information.
r? ```@tmiasko```
debuginfo: split method declaration and definition
When we're adding a method to a type DIE, we only want a DW_AT_declaration
there, because LLVM LTO can't unify type definitions when a child DIE is a
full subprogram definition. Now the subprogram definition gets added at the
CU level with a specification link back to the abstract declaration.
Both GCC and Clang write debuginfo this way for C++ class methods.
Fixes#109730.
Fixes#109934.