The default diagnostic handler considers all remarks to be disabled by
default unless configured otherwise through LLVM internal flags:
`-pass-remarks`, `-pass-remarks-missed`, and `-pass-remarks-analysis`.
This behaviour makes `-Cremark` ineffective on its own.
Fix this by configuring a custom diagnostic handler that enables
optimization remarks based on the value of the `-Cremark` option, with
`-Cremark=all` enabling all remarks.
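For illustration, a typical invocation might look like this (with debuginfo enabled so remarks can point at source locations):
```
rustc -O -Cdebuginfo=1 -Cremark=all main.rs
```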
Record more artifact sizes during self-profiling.
This PR adds artifact size recording for
- "linked artifacts" (executables, RLIBs, dylibs, static libs)
- object files
- dwo files
- assembly files
- crate metadata
- LLVM bitcode files
- LLVM IR files
- codegen unit size estimates
Currently the identifiers emitted for these are hard-coded as string literals. Is it worth adding constants to https://github.com/rust-lang/measureme/blob/master/measureme/src/rustc.rs instead? We don't do that for query names and the like -- but artifact kinds might be more stable than query names.
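As a rough sketch of the idea (not the actual self-profiling API, and with a made-up path), recording an artifact size boils down to measuring the emitted file and attaching the number to one of the identifiers above:
```rust
use std::fs;
use std::path::Path;

// Minimal sketch only: measure an emitted artifact on disk so its size can
// be recorded under an identifier such as "object_file" or "crate_metadata".
fn artifact_size_bytes(path: &Path) -> Option<u64> {
    fs::metadata(path).ok().map(|m| m.len())
}

fn main() {
    if let Some(size) = artifact_size_bytes(Path::new("target/debug/example.o")) {
        println!("object_file size: {} bytes", size);
    }
}
```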
The only reason to use `abort_if_errors` is when the program is so broken that either:
1. later passes get confused and ICE
2. any diagnostics from later passes would be noise
This is never the case for lints, because the compiler has to be able to deal with `allow`-ed lints.
So it can continue to lint and compile even if there are lint errors.
Add -Z no-unique-section-names to reduce ELF header bloat.
This change adds a new compiler flag that can help reduce the size of ELF binaries that contain many functions.
By default, when enabling function sections (which is the default for most targets), the LLVM backend will generate different section names for each function. For example, a function `func` would generate a section called `.text.func`. Normally this is fine because the linker will merge all those sections into a single one in the binary. However, starting with [LLVM 12](https://github.com/llvm/llvm-project/commit/ee5d1a04), the backend will also generate unique section names for exception handling, resulting in thousands of `.gcc_except_table.*` sections ending up in the final binary because some linkers like LLD don't currently merge or strip these EH sections (see discussion [here](https://reviews.llvm.org/D83655)). This can bloat the ELF headers and string table significantly in binaries that contain many functions.
The new option is analogous to Clang's `-fno-unique-section-names`, and instructs LLVM to generate the same `.text` and `.gcc_except_table` section for each function, resulting in a smaller final binary.
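As a hedged example, an invocation on a nightly toolchain might look like this:
```
rustc -O -Zno-unique-section-names main.rs
```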
The motivation to add this new option was because we have a binary that ended up with so many ELF sections (over 65,000) that it broke some existing ELF tools, which couldn't handle so many sections.
Here's our old binary:
```
$ readelf --sections old.elf | head -1
There are 71746 section headers, starting at offset 0x2a246508:
$ readelf --sections old.elf | grep shstrtab
[71742] .shstrtab STRTAB 0000000000000000 2977204c ad44bb 00 0 0 1
```
That's an 11MB+ string table. Here's the new binary using this option:
```
$ readelf --sections new.elf | head -1
There are 43 section headers, starting at offset 0x29143ca8:
$ readelf --sections new.elf | grep shstrtab
[40] .shstrtab STRTAB 0000000000000000 29143acc 0001db 00 0 0 1
```
The whole binary size went down by over 20MB, which is quite significant.
Implement `#[link_ordinal(n)]`
Allows the use of `#[link_ordinal(n)]` with `#[link(kind = "raw-dylib")]`, allowing Rust to link against DLLs that export symbols by ordinal rather than by name. As long as the ordinal matches, the name of the function in Rust is not required to match the name of the corresponding function in the exporting DLL.
Part of #58713.
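A minimal sketch of the intended usage, assuming a nightly toolchain with the `raw_dylib` feature gate; the DLL name and ordinal below are made up for illustration:
```rust
#![feature(raw_dylib)]

// Import slot 13 from the DLL by ordinal; the Rust-side name does not need
// to match anything exported by the DLL.
#[link(name = "some_vendor", kind = "raw-dylib")]
extern "C" {
    #[link_ordinal(13)]
    fn do_something();
}

fn main() {
    unsafe { do_something() };
}
```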
This largely involves implementing the options debug-info-for-profiling
and profile-sample-use and forwarding them on to LLVM.
AutoFDO can be used on x86-64 Linux like this:
```
rustc -O -Cdebug-info-for-profiling main.rs -o main
perf record -b ./main
create_llvm_prof --binary=main --out=code.prof
rustc -O -Cprofile-sample-use=code.prof main.rs -o main2
```
Now `main2` will have feedback directed optimization applied to it.
The create_llvm_prof tool can be obtained from this GitHub repository:
https://github.com/google/autofdo
Fixes#64892.
Fix clippy lints
I'm currently working on allowing clippy to run on librustdoc after a discussion I had with `@Mark-Simulacrum`. So in the meantime, I fixed a few lints on the compiler crates.
Fix use after drop in self-profile with llvm events
Self-profiling with `-Z self-profile-events=llvm` has been failing with a segmentation fault due to this use after drop.
These events can be more useful now that the new pass manager is the default.
The new pass manager is enabled by default in clang since
Clang/LLVM 13. While the discussion about this is still ongoing
(https://lists.llvm.org/pipermail/llvm-dev/2021-August/152305.html)
it's expected that support for the legacy pass manager will be
dropped either in LLVM 14 or 15.
This switches us to use the new pass manager if LLVM >= 13 is used.
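For illustration, a build that wants to stay on the legacy pass manager while it is still supported could opt out explicitly (assuming the `-Znew-llvm-pass-manager` flag is available on the toolchain in use):
```
rustc -O -Znew-llvm-pass-manager=no main.rs
```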
This does not yet support `#[link_name]` attributes on functions, the `#[link_ordinal]`
attribute, `#[link(kind = "raw-dylib")]` on extern blocks in bin crates, or
stdcall functions on 32-bit x86.
This doesn't seem to be necessary anymore, although I don't know
at which point or why that changed.
Forcing -O1 makes some tests fail under NewPM, because NewPM also
performs inlining at -O1, so it ends up performing much more
optimization in practice than before.
This commit implements both the native linking modifiers infrastructure
as well as an initial attempt at the individual modifiers from the RFC.
It also introduces a feature flag for the general syntax along with
individual feature flags for each modifier.
Use FromStr trait for number option parsing
Replace `parse_uint` with generic `parse_number` based on `FromStr`.
Use it for parsing the inlining threshold to avoid casting later.
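A minimal sketch of the idea (not the exact rustc signature): one generic parser covers any numeric option whose type implements `FromStr`:
```rust
use std::str::FromStr;

// Parse an optional string into any FromStr number and store it in `slot`;
// return false if the value is missing or malformed.
fn parse_number<T: FromStr>(slot: &mut Option<T>, value: Option<&str>) -> bool {
    match value.and_then(|s| s.parse::<T>().ok()) {
        Some(n) => {
            *slot = Some(n);
            true
        }
        None => false,
    }
}

fn main() {
    let mut inlining_threshold: Option<u32> = None;
    assert!(parse_number(&mut inlining_threshold, Some("275")));
    assert_eq!(inlining_threshold, Some(275));
}
```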
Run LLVM coverage instrumentation passes before optimization passes
This matches the behavior of Clang and allows us to remove several
hacks which were needed to ensure functions weren't optimized away
before reaching the instrumentation pass.
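For example, a coverage build using the then-unstable `-Zinstrument-coverage` flag on a nightly toolchain might look like this:
```
rustc -O -Zinstrument-coverage main.rs
```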
Fixes#83429
cc `@richkadel`
r? `@tmandry`
Adjust `-Ctarget-cpu=native` handling in cg_llvm
When cg_llvm encounters `-Ctarget-cpu=native`, it computes an
explicit set of features that applies to the target in order to
correctly compile code for the host CPU (because e.g. `skylake` alone is
not sufficient to tell whether some of the instructions are available or
not).
However, there were a couple of issues with how we did this. Firstly, the
order in which features were overridden wasn't quite right – conceptually
you'd expect the `-Ctarget-cpu=native` option to override the features that
are implicitly set by the target definition. However, due to how other
`-Ctarget-cpu` values are handled we must adopt the following order
of priority:
* Features from `-Ctarget-cpu=*`; are overridden by
* Features implied by `--target`; are overridden by
* Features from `-Ctarget-feature`; are overridden by
* function-specific features.
Another problem was that the function-level `target-features`
attribute would overwrite the entire set of globally enabled
features, rather than just the features that
`#[target_feature(enable/disable)]` specified. With something like
`-Ctarget-cpu=native` we'd end up in a situation where a function
without a `#[target_feature(enable)]` annotation would have a broader
set of features than a function with one such attribute. This
turned out to be a cause of heavy run-time regressions in some code
using these function-level attributes in conjunction with
`-Ctarget-cpu=native`, for example.
With this PR rustc is more careful about specifying the entire set of
features for functions that use `#[target_feature(enable/disable)]` or
`#[instruction_set]` attributes.
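As an illustration (not the original reproducer), the kind of code affected is a function that opts into a single feature while the rest of the crate is built with `-Ctarget-cpu=native`:
```rust
// Illustrative only: with this fix, `sum_avx2` keeps the host CPU's feature
// set plus the explicitly enabled feature instead of a narrower set.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

#[cfg(target_arch = "x86_64")]
fn main() {
    if is_x86_feature_detected!("avx2") {
        let total = unsafe { sum_avx2(&[1.0, 2.0, 3.0]) };
        println!("{}", total);
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```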
Sadly, testing the original reproducer for this behaviour is effectively
impossible – we cannot rely on `-Ctarget-cpu=native` meaning anything in
particular on developer or CI machines.
cc https://github.com/rust-lang/rust/issues/83027 `@BurntSushi`
Set path of the compile unit to the source directory
As part of the effort to implement split dwarf debug info, we ended up
setting the compile unit location to the output directory rather than
the source directory. Furthermore, it seems like we failed to remap the
prefixes for this as well!
The desired behaviour is to instead set the `DW_AT_GNU_dwo_name` to a
path relative to the compiler's working directory. This still allows
debuggers to find the split dwarf files, while not changing behaviour
when compiling with regular debug info, and not changing the compiler's
behaviour with regard to reproducibility.
Fixes#82074
cc `@alexcrichton` `@davidtwco`
In the backend we may want to remove certain temporary files, but in
some situations these files might not be produced in the first place.
We don't particularly care about that; the intent is really that
these files are gone after a certain point in the backend.
Here we unify the backend file removing calls to use `ensure_removed`
which will attempt to delete a file, but will not fail if it does not
exist (anymore).
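A minimal sketch of the helper's behaviour (not its exact signature in rustc):
```rust
use std::fs;
use std::io;
use std::path::Path;

// Delete a temporary file if it exists; treat "already gone" as success.
fn ensure_removed(path: &Path) -> io::Result<()> {
    match fs::remove_file(path) {
        Ok(()) => Ok(()),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    // Removing the same path twice is fine; the second call is a no-op.
    ensure_removed(Path::new("example.tmp.o"))?;
    ensure_removed(Path::new("example.tmp.o"))?;
    Ok(())
}
```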
The tradeoff to this approach is, of course, that we may miss instances
where we are attempting to remove files at the wrong paths due to some bug –
compilation would silently succeed but the temporary files would be left
behind somewhere.
This commit adds a new stable codegen option to rustc,
`-Csplit-debuginfo`. The old `-Zrun-dsymutil` flag is deleted and now
subsumed by this stable flag. Additionally `-Zsplit-dwarf` is also
subsumed by this flag but still requires `-Zunstable-options` to
actually activate. The `-Csplit-debuginfo` flag takes one of
three values:
* `off` - This indicates that split-debuginfo from the final artifact is
not desired. This is not supported on Windows and is the default on
Unix platforms except macOS. On macOS this means that `dsymutil` is
not executed.
* `packed` - This means that debuginfo is desired in one location
separate from the main executable. This is the default on Windows
(`*.pdb`) and macOS (`*.dSYM`). On other Unix platforms this subsumes
`-Zsplit-dwarf=single` and produces a `*.dwp` file.
* `unpacked` - This means that debuginfo will be kept in a form roughly
equivalent to object files, meaning that it's scattered throughout the
build directory rather than collected in one location (often the fastest
for local development). This is not the default on any platform and is
not supported on Windows.
Each target can indicate its own default preference for how debuginfo is
handled. Almost all platforms default to `off` except for Windows and
macOS which default to `packed` for historical reasons.
Some equivalencies for previous unstable flags with the new flags are:
* `-Zrun-dsymutil=yes` -> `-Csplit-debuginfo=packed`
* `-Zrun-dsymutil=no` -> `-Csplit-debuginfo=unpacked`
* `-Zsplit-dwarf=single` -> `-Csplit-debuginfo=packed`
* `-Zsplit-dwarf=split` -> `-Csplit-debuginfo=unpacked`
Note that `-Csplit-debuginfo` still requires `-Zunstable-options` for
non-macOS platforms since split-dwarf support was *just* implemented in
rustc.
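For example, a Linux build that wants a single packed debuginfo file next to the executable would look something like this (hedged; `-Zunstable-options` is needed on non-macOS platforms as noted above):
```
rustc -g -Zunstable-options -Csplit-debuginfo=packed main.rs
```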
There's some more rationale listed on #79361, but the main gist of the
motivation for this commit is that `dsymutil` can take quite a long time
to execute in debug builds and provides little benefit. This means that
incremental compile times appear that much worse on macOS because the
compiler is constantly running `dsymutil` over every single binary it
produces during `cargo build` (even build scripts!). Ideally rustc would
switch to not running `dsymutil` by default, but that's a problem left
to get tackled another day.
Closes#79361