Lots of vectors of messages called `message` or `msg`. This commit
pluralizes them.
Note that `emit_message_default` and `emit_messages_default` both
already existed, and both process a vector, so I renamed the former
`emit_messages_default_inner` because it's called by the latter.
And make all hand-written `IntoDiagnostic` impls generic, by using
`DiagnosticBuilder::new(dcx, level, ...)` instead of e.g.
`dcx.struct_err(...)`.
This means the `create_*` functions are the source of the error level.
This change will let us remove `struct_diagnostic`.
Note: `#[rustc_lint_diagnostics]` is added to `DiagnosticBuilder::new`;
it's necessary to pass diagnostics tests now that it's used in
`into_diagnostic` functions.
coverage: Skip instrumenting a function if no spans were extracted from MIR
The immediate symptoms of #118643 were fixed by #118666, but some users reported that their builds now encounter another coverage-related ICE:
```
error: internal compiler error: compiler/rustc_codegen_llvm/src/coverageinfo/mapgen.rs:98:17: A used function should have had coverage mapping data but did not: (...)
```
I was able to reproduce at least one cause of this error: if no relevant spans could be extracted from a function, but the function contains `CoverageKind::SpanMarker` statements, then codegen still thinks the function is instrumented and complains about the fact that it has no coverage spans.
This PR prevents that from happening in two ways:
- If we didn't extract any relevant spans from MIR, skip instrumenting the entire function and don't create a `FunctionCoverageInfo` for it.
- If coverage codegen sees a `CoverageKind::SpanMarker` statement, skip it early and avoid creating `func_coverage`.
---
Fixes #118850.
Coverage marker statements should have no effect on codegen, but in some cases
they could have the side-effect of creating a `func_coverage` entry for their
enclosing function. That can lead to an ICE for functions that don't actually
have any coverage spans.
Currently, we assume that ScalarPair is always represented using
a two-element struct, both as an immediate value and when stored
in memory.
This currently works fairly well, but runs into problems with
https://github.com/rust-lang/rust/pull/116672, where a ScalarPair
involving an i128 type can no longer be represented as a two-element
struct in memory. For example, the tuple `(i32, i128)` needs to be
represented in-memory as `{ i32, [3 x i32], i128 }` to satisfy
alignment requirements. Using `{ i32, i128 }` instead will result in
the second element being stored at the wrong offset (prior to
LLVM 18).
Resolve this issue by no longer requiring that the immediate and
in-memory type for ScalarPair are the same. The in-memory type
will now look the same as for normal struct types (and will include
padding filler and similar), while the immediate type stays a
simple two-element struct type. This also means that booleans in
immediate ScalarPair are now represented as i1 rather than i8,
just like we do everywhere else.
The core change here is to llvm_type (which now treats ScalarPair
as a normal struct) and immediate_llvm_type (which returns the
two-element struct that llvm_type used to produce). The rest is
fixing things up to no longer assume these are the same. In
particular, this switches places that try to get pointers to the
ScalarPair elements to use byte-geps instead of struct-geps.
Sets the accessibility of types and fields in DWARF using
`DW_AT_accessibility` attribute.
`DW_AT_accessibility` (public/protected/private) isn't exactly right for
Rust, but neither is `DW_AT_visibility` (local/exported/qualified), and
there's no way to set `DW_AT_visibility` in LLVM's API.
Signed-off-by: David Wood <david@davidtw.co>
It's necessary for `derive(Diagnostic)`, but is best avoided elsewhere
because there are clearer alternatives.
This required adding `Handler::struct_almost_fatal`.
Fix alignment passed down to LLVM for simd_masked_load
Follow-up to #117953.
The alignment for a masked load operation should be that of the element/lane, not the vector as a whole.
Using the vector's alignment can produce miscompilations after the LLVM optimizer notices the higher alignment and promotes this to an unmasked, aligned load followed by a blend/select - https://rust.godbolt.org/z/KEeGbevbb
Stop allowing `rustc::potential_query_instability` on all of
`rustc_codegen_llvm` and instead allow it on a case-by-case basis. In
this case, both instances are safe to allow.
`.debug_pubnames` and `.debug_pubtypes` are poorly designed and people
seldom use them. However, they take up a considerable portion of the
final binary's size. This tells LLVM to stop emitting those sections on
DWARFv4 or lower. DWARFv5 uses `.debug_names`, which is more compact
and faster for name lookup.
Implement repr(packed) for repr(simd)
This allows creating vectors with non-power-of-2 lengths that do not have padding. See rust-lang/portable-simd#319
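A minimal sketch of what this enables (nightly-only; the multi-field `repr(simd)` struct form and the exact resulting size are assumptions for illustration):
```rust
#![feature(repr_simd)]
#![allow(non_camel_case_types, dead_code)]

// A 3-lane vector; with `packed`, it is not padded up to a power-of-two size.
#[repr(simd, packed)]
struct f32x3(f32, f32, f32);

fn main() {
    // Without `packed`, a 3-lane f32 vector would be rounded up to 16 bytes.
    println!("size = {}", std::mem::size_of::<f32x3>()); // expected: 12
}
```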
Detects redundant imports that can be eliminated.
For #117772:
In order to facilitate review and modification, the checking code and the
code for removing redundant imports are split into two PRs.
Add more SIMD platform-intrinsics
- [x] simd_masked_load
- [x] LLVM codegen - llvm.masked.load
- [x] cranelift codegen - implemented but untested
- [ ] simd_masked_store
- [x] LLVM codegen - llvm.masked.store
- [ ] cranelift codegen
Also added a run-pass test exercising both intrinsics, plus additional build-fail & check-fail tests to cover validation for both intrinsics.
update target feature following LLVM API change
LLVM commit e817966718 renamed* the `unaligned-scalar-mem` target feature to `fast-unaligned-access`.
(*) Technically the commit folded two previous features into one, but there are no references to the other one in Rust.
coverage: Use `SpanMarker` to improve coverage spans for `if !` expressions
Coverage instrumentation works by extracting source code spans from MIR. However, some kinds of syntax are effectively erased during MIR building, so their spans don't necessarily exist anywhere in MIR, making them invisible to the coverage instrumentor (unless we resort to various heuristics and hacks to recover them).
This PR introduces `CoverageKind::SpanMarker`, which is a new variant of `StatementKind::Coverage`. Its sole purpose is to represent spans that would otherwise not appear in MIR, so that the coverage instrumentor can extract them.
When coverage is enabled, the MIR builder can insert these dummy statements as needed, to improve the accuracy of spans used by coverage mappings.
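As a hypothetical illustration of the kind of code affected: in a function like the one below, the `!` applied to the condition leaves no dedicated node in MIR (the branch is simply inverted), so without a marker statement its span is invisible to the instrumentor.
```rust
fn check(cond: bool) -> u32 {
    // The span of `!cond` is effectively erased during MIR building;
    // a `CoverageKind::SpanMarker` statement can carry it instead.
    if !cond {
        0
    } else {
        1
    }
}

fn main() {
    assert_eq!(check(false), 0);
    assert_eq!(check(true), 1);
}
```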
Fixes #115468.
---
`@rustbot` label +A-code-coverage
Add emulated TLS support
This is a reopen of https://github.com/rust-lang/rust/pull/96317. Many android devices still only use 128 pthread keys, so using emutls can be helpful.
Currently LLVM uses emutls by default for some targets (such as android, openbsd), but Rust does not use it, because `has_thread_local` is false.
This commit has some changes to allow users to enable emutls:
1. Add a `-Zhas-thread-local` flag to specify that std uses `#[thread_local]` instead of pthread keys.
2. When using emutls, decorate symbol names so that thread-local symbols are found correctly.
3. Change `-Zforce-emulated-tls` to `-Ztls-model=emulated` to explicitly specify whether to generate emutls.
r? `@Amanieu`
There are cases where coverage instrumentation wants to show a span for some
syntax element, but there is no MIR node that naturally carries that span, so
the instrumentor can't see it.
MIR building can now use this new kind of coverage statement to deliberately
include those spans in MIR, attached to a dummy statement that has no other
effect.
Avoid adding builtin functions to `symbols.o`
We found performance regressions in #113923. The problem seems to be that `--gc-sections` does not remove these symbols. I tested that lld removes these symbols, but ld and gold do not.
I found that `used` adds symbols to `symbols.o` at 3e202ead60/compiler/rustc_codegen_ssa/src/back/linker.rs (L1786-L1791).
This PR removes the builtin functions from `symbols.o`.
Note that under LTO, ld still preserves these symbols. (lld will still remove them.)
The first commit also fixes #118559. But I think the second commit also makes sense.
compile-time evaluation: detect writes through immutable pointers
This has two motivations:
- it unblocks https://github.com/rust-lang/rust/pull/116745 (and therefore takes a big step towards `const_mut_refs` stabilization), because we can now detect if the memory that we find in `const` can be interned as "immutable"
- it would detect the UB that was uncovered in https://github.com/rust-lang/rust/pull/117905, which was caused by accidental stabilization of `copy` functions in `const` that can only be called with UB
When UB is detected, we emit a future-compat warn-by-default lint. This is not a breaking change, so completely in line with [the const-UB RFC](https://rust-lang.github.io/rfcs/3016-const-ub.html), meaning we don't need t-lang FCP here. I made the lint immediately show up for dependencies since it is nearly impossible to even trigger this lint without `const_mut_refs` -- the accidentally stabilized `copy` functions are the only way this can happen, so the crates that popped up in #117905 are the only causes of such UB (in the code that crater covers), and the three cases of UB that we know about have all been fixed in their respective crates already.
The way this is implemented is by making use of the fact that our interpreter is already generic over the notion of provenance. For CTFE we now use the new `CtfeProvenance` type which is conceptually an `AllocId` plus a boolean `immutable` flag (but packed for a more efficient representation). This means we can mark a pointer as immutable when it is created as a shared reference. The flag will be propagated to all pointers derived from this one. We can then check the immutable flag on each write to reject writes through immutable pointers.
I just hope perf works out.
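A minimal standalone sketch of the packed provenance idea described above (the real `CtfeProvenance` in rustc differs in its details; the names and bit layout here are assumptions):
```rust
// Pack an allocation ID and an "immutable" flag into a single word by
// reserving the lowest bit for the flag.
#[derive(Copy, Clone)]
struct CtfeProvenance(u64); // low bit = "immutable" flag, rest = AllocId

impl CtfeProvenance {
    fn new(alloc_id: u64, immutable: bool) -> Self {
        CtfeProvenance((alloc_id << 1) | immutable as u64)
    }
    fn alloc_id(self) -> u64 {
        self.0 >> 1
    }
    fn is_immutable(self) -> bool {
        self.0 & 1 != 0
    }
    // Pointers derived from a shared reference keep the flag, so a write
    // check only needs to look at `is_immutable`.
    fn as_immutable(self) -> Self {
        CtfeProvenance(self.0 | 1)
    }
}

fn main() {
    let p = CtfeProvenance::new(42, false).as_immutable();
    assert_eq!(p.alloc_id(), 42);
    assert!(p.is_immutable());
}
```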
Currently LLVM uses emutls by default
for some targets (such as android, openbsd),
but Rust does not use it, because `has_thread_local` is false.
This commit has some changes to allow users to enable emutls:
1. Add a `-Zhas-thread-local` flag to specify
that std uses `#[thread_local]` instead of pthread keys.
2. When using emutls, decorate symbol names
so that thread-local symbols are found correctly.
3. Change `-Zforce-emulated-tls` to `-Ztls-model=emulated`
to explicitly specify whether to generate emutls.
These impls are all needed for just a single `IntoDiagnostic` type, not
a family of them.
Note that `ErrorGuaranteed` is the default type parameter for
`IntoDiagnostic`.
Restore `#![no_builtins]` crates' participation in LTO.
After #113716, we can make `#![no_builtins]` crates participate in LTO again.
`#![no_builtins]` with LTO no longer results in undefined reference errors. I believe this type of issue won't happen again.
Fixes #72140. Fixes #112245. Fixes #110606. Fixes #105734. Fixes #96486. Fixes #108853. Fixes #108893. Fixes #78744. Fixes #91158. Fixes https://github.com/rust-lang/cargo/issues/10118. Fixes https://github.com/rust-lang/compiler-builtins/issues/347.
The `nightly-2023-07-20` version does not always reproduce problems due to changes in compiler-builtins, core, and user code. That's why this issue recurs and disappears.
Some issues were not tested due to the difficulty of reproducing them.
r? pnkfelix
cc `@bjorn3` `@japaric` `@alexcrichton` `@Amanieu`
Add `-Zfunction-return={keep,thunk-extern}` option
This is intended to be used for Linux kernel RETHUNK builds.
With this commit (optionally backported to Rust 1.73.0), plus a patched Linux kernel to pass the flag, I get a RETHUNK build with Rust enabled that is `objtool`-warning-free and is able to boot in QEMU and load a sample Rust kernel module.
Issue: https://github.com/rust-lang/rust/issues/116853.
This is intended to be used for Linux kernel RETHUNK builds.
With this commit (optionally backported to Rust 1.73.0), plus a
patched Linux kernel to pass the flag, I get a RETHUNK build with
Rust enabled that is `objtool`-warning-free and is able to boot in
QEMU and load a sample Rust kernel module.
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
If the TargetMachine is disposed after the Context is disposed, it can
lead to use after frees in some cases.
I've observed this happening occasionally on code compiled for
aarch64-pc-windows-msvc using `-Zstack-protector=strong` but other users
have reported AVs from host aarch64-pc-windows-msvc compilers as well.
Currently we always do this:
```
use rustc_fluent_macro::fluent_messages;
...
fluent_messages! { "./example.ftl" }
```
But there is no need, we can just do this everywhere:
```
rustc_fluent_macro::fluent_messages! { "./example.ftl" }
```
which is shorter.
The `fluent_messages!` macro produces uses of
`crate::{D,Subd}iagnosticMessage`, which means that every crate using
the macro must have this import:
```
use rustc_errors::{DiagnosticMessage, SubdiagnosticMessage};
```
This commit changes the macro to instead use
`rustc_errors::{D,Subd}iagnosticMessage`, which avoids the need for the
imports.
Tighten up link attributes for llvm-wrapper bindings
Fixes https://github.com/rust-lang/rust/issues/118084 by moving all of the declarations of symbols from `llvm_rust` into a separate extern block with `#[link(name = "llvm-wrapper", kind = "static")]`.
This also renames `LLVMTimeTraceProfiler*` to `LLVMRustTimeTraceProfiler*` because those are functions from `llvm_rust`.
r? tmiasko
Enable Rust to use the EHCont security feature of Windows
In the future Windows will enable Control-flow Enforcement Technology (CET aka Shadow Stacks). To protect the path where the context is updated during exception handling, the binary is required to enumerate valid unwind entrypoints in a dedicated section which is validated when the context is being set during exception handling.
The required support for EHCONT Guard has already been merged into LLVM, long ago. This change simply adds the Rust codegen option to enable it.
Relevant LLVM change: https://reviews.llvm.org/D40223
This also adds a new `ehcont-guard` option to the bootstrap config which enables EHCont Guard when building std.
We at Microsoft have been using this feature for a significant period of time; we are confident that the LLVM feature, when enabled, generates well-formed code.
We currently enable EHCONT using a codegen feature, but I'm certainly open to refactoring this to be a target feature instead, or to use any appropriate mechanism to enable it.
By default, `newtype_index!` types get a default `Encodable`/`Decodable`
impl. You can opt out of this with `custom_encodable`. Opting out is the
opposite to how Rust normally works with autogenerated (derived) impls.
This commit inverts the behaviour, replacing `custom_encodable` with
`encodable` which opts into the default `Encodable`/`Decodable` impl.
Only 23 of the 59 `newtype_index!` occurrences need `encodable`.
Even better, there were eight crates with a dependency on
`rustc_serialize` just from unused default `Encodable`/`Decodable`
impls. This commit removes that dependency from those eight crates.
In the future Windows will enable Control-flow Enforcement Technology
(CET aka Shadow Stacks). To protect the path where the context is
updated during exception handling, the binary is required to enumerate
valid unwind entrypoints in a dedicated section which is validated when
the context is being set during exception handling.
The required support for EHCONT has already been merged into LLVM,
long ago. This change adds the Rust codegen option to enable it.
Reference:
* https://reviews.llvm.org/D40223
This also adds a new `ehcont-guard` option to the bootstrap config which
enables EHCont Guard when building std.
Ensure sanity of all computed ABIs
This moves the ABI sanity assertions from the codegen backend to the ABI computation logic. Sadly, due to past mistakes, we [have to](https://github.com/rust-lang/rust/pull/117351#issuecomment-1788495503) be able to compute a sane ABI for nonsensical function types like `extern "C" fn(str) -> str`. So to make the sanity check pass, we first need to make all ABI adjustments deal with unsized types... and we have no shared infrastructure for those adjustments, so that's a bunch of copy-paste. At least we have assertions failing loudly when one accidentally sets a different mode for an unsized argument.
To achieve this, this re-lands the parts of https://github.com/rust-lang/rust/pull/80594 that got reverted in https://github.com/rust-lang/rust/pull/81388. To avoid breaking the wasm ABI again, that ABI now explicitly opts in to the (wrong, broken) ABI that we currently keep for backwards compatibility. That's still better than having *every* ABI use the wrong broken default!
Cc `@bjorn3`
Fixes https://github.com/rust-lang/rust/issues/115845
Remove asmjs
Fulfills [MCP 668](https://github.com/rust-lang/compiler-team/issues/668).
`asmjs-unknown-emscripten` does not work as-specified, and lacks essential upstream support for generating asm.js, so it should not exist at all.
Ensure strings created with `const_str` get the `unnamed_addr` attribute
This function (`const_str`) is only used when we need to invent a string during codegen -- for example, for a panic message to pass when codegening some of the assert/panic/etc terminators (for stuff like divide by zero).
AFAICT all other consts, such as the user-defined ones from const eval, should already be getting this attribute (things that come from a ConstAllocation do, for example). Which means that the "unnamed" part is even more true than usual here, these aren't strings that even exist as far as the user can tell.
~~Setting this attribute allows LLVM to merge these constants, leading to significant binary size savings (much more than I would expect). On x86_64-unknown-linux-gnu, it takes a build of ripgrep (release without debug info) from 9.7MiB to 6.0MiB (a savings of over 30%!?), and a build of rustc_driver's shared object from 123MiB to 112MiB (less drastic, but still over 10% reduced).~~
~~The effect on ripgrep is substantially reduced on macOS for reasons beyond me (I may have fucked up the test), only saving around 0.2MiB, although rustc_driver is still around 10MB or smaller than it had been previously.~~
~~This raises some questions, such as "does that mean 1/3 of ripgrep was made of division by zero complaints?" I'm not sure, that may be the case. The output of `strings path/to/rg` is \~2MB smaller, so it seems like a lot of it was. Allowing these to be merged presumably also allow functions that contain them to be merged (if the addresses had semantic meaning, then it stands).~~
~~I intend to do some more analysis here, but I got this up as soon as I realized that this attribute was only missing for internal const strings, and all other ones already get it.~~
Edit: The wins are much more marginal, but there's some argument to do this for the sake of consistency.
Add -Z llvm_module_flag
Allow adding values to the `!llvm.module.flags` metadata for a generated module. The syntax is
`-Z llvm_module_flag=<name>:<type>:<value>:<behavior>`
Currently only u32 values are supported, but the type is required to be specified for forward compatibility. The `behavior` element must match one of the named LLVM metadata behaviors.
This flag is expected to be perma-unstable.
Allow adding values to the `!llvm.module.flags` metadata for a generated
module. The syntax is
`-Z llvm_module_flag=<name>:<type>:<value>:<behavior>`
Currently only u32 values are supported but the type is required to be
specified for forward compatibility. The `behavior` element must match
one of the named LLVM metadata behaviors.
This flag is expected to be perma-unstable.
consts: remove dead code around `i1` constant values
`LLVMConstZext` recently got deleted, and it turns out (thanks to `@nikic` for knowing!) that this is dead code. Tests all pass for me without this logic, and per nikic:
> We always generate constants in "relocatable bag of bytes"
> representation, so you're never going to get a plain bool.
So this should be a safe thing to do.
r? `@nikic`
`@rustbot` label: +llvm-main
`LLVMConstZext` recently got deleted, and it turns out (thanks to @nikic
for knowing!) that this is dead code. Tests all pass for me without this
logic, and per nikic:
> We always generate constants in "relocatable bag of bytes"
> representation, so you're never going to get a plain bool.
So this should be a safe thing to do.
r? @nikic
@rustbot label: +llvm-main
Most notably, this commit changes the `pub use crate::*;` in that file
to `use crate::*;`. This requires a lot of `use` items in other crates
to be adjusted, because everything defined within `rustc_span::*` was
also available via `rustc_span::source_map::*`, which is bizarre.
The commit also removes `SourceMap::span_to_relative_line_string`, which
is unused.
- Sort dependencies and features sections.
- Add `tidy` markers to the sorted sections so they stay sorted.
- Remove empty `[lib]` sections.
- Remove "See more keys..." comments.
Excluded files:
- rustc_codegen_{cranelift,gcc}, because they're external.
- rustc_lexer, because it has external use.
- stable_mir, because it has external use.
coverage: Consistently remove unused counter IDs from expressions/mappings
If some coverage counters were removed by MIR optimizations, we need to take care not to refer to those counter IDs in coverage mappings, and instead replace them with a constant zero value. If we don't, `llvm-cov` might see a too-large counter ID and silently discard the entire function from its coverage reports.
Fixes #117012.
report `unused_import` for empty reexports even if they are pub
Fixes #116032
An easy fix. r? `@petrochenkov`
(Discovered this issue while reviewing #115993.)
Most coverage metadata is encoded into two sections in the final executable.
The `__llvm_covmap` section mostly just contains a list of filenames, while the
`__llvm_covfun` section contains encoded coverage maps for each instrumented
function.
The catch is that each per-function record also needs to contain a hash of the
filenames list that it refers to. Historically this was handled by assembling
most of the per-function data into a temporary list, then assembling the
filenames buffer, then using the filenames hash to emit the per-function data,
and then finally emitting the filenames table itself.
However, now that we build the filenames table up-front (via a separate
traversal of the per-function data), we can hash and emit that part first, and
then emit each of the per-function records immediately after building. This
removes the awkwardness of having to temporarily store nearly-complete
per-function records.
The main change here is that `VirtualFileMapping` now uses an internal hashmap
to de-duplicate incoming global file IDs. That removes the need for
`encode_mappings_for_function` to re-sort its mappings by filename in order to
de-duplicate them.
(We still de-duplicate runs of identical filenames to save work, but this is
not load-bearing for correctness, so a sort is not necessary.)
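A rough sketch of that de-duplication, with stand-in types rather than the actual rustc ones:
```rust
use std::collections::HashMap;

#[derive(Default)]
struct VirtualFileMapping {
    local_to_global: Vec<u32>,            // local file ID -> global file ID
    global_to_local: HashMap<u32, usize>, // global file ID -> local file ID
}

impl VirtualFileMapping {
    fn local_id_for_global(&mut self, global_id: u32) -> usize {
        if let Some(&local) = self.global_to_local.get(&global_id) {
            return local; // already seen: reuse the existing local ID
        }
        let local = self.local_to_global.len();
        self.local_to_global.push(global_id);
        self.global_to_local.insert(global_id, local);
        local
    }
}

fn main() {
    let mut map = VirtualFileMapping::default();
    assert_eq!(map.local_id_for_global(7), 0);
    assert_eq!(map.local_id_for_global(3), 1);
    assert_eq!(map.local_id_for_global(7), 0); // duplicate collapses to ID 0
}
```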
The combined `get_expressions_and_counter_regions` method was an artifact of
having to prepare the expressions and mappings at the same time, to avoid
ownership/lifetime problems with temporary data used by both.
Now that we have an explicit transition from `FunctionCoverageCollector` to the
final `FunctionCoverage`, we can prepare any shared data during that step and
store it in the final struct.
This gives us a clearly-defined place to run code after the instance's MIR has
been traversed by codegen, but before we emit its `__llvm_covfun` record.
This query has a name that sounds general-purpose, but in fact it has
coverage-specific semantics, and (fortunately) is only used by coverage code.
Because it is only ever called once (from one designated CGU), it doesn't need
to be a query, and we can change it to a regular function instead.
Implement rustc part of RFC 3127 trim-paths
This PR implements (or at least tries to) [RFC 3127 trim-paths](https://github.com/rust-lang/rust/issues/111540), the rustc part. That is `-Zremap-path-scope` with all of its components/scopes.
`@rustbot` label: +F-trim-paths
Even though expression details are now stored in the info structure, we still
need to inject `ExpressionUsed` statements into MIR, because if one is missing
during codegen then we know that it was optimized out and we can remap all of
its associated code regions to zero.
Previously, mappings were attached to individual coverage statements in MIR.
That necessitated special handling in MIR optimizations to avoid deleting those
statements, since otherwise codegen would be unable to reassemble the original
list of mappings.
With this change, a function's list of mappings is now attached to its MIR
body, and survives intact even if individual statements are deleted by
optimizations.
Instead of modifying the accumulated expressions in-place, we now build a set
of expressions that are known to be zero, and then consult that set on the fly
when converting the expression data for FFI.
This will be necessary when moving mappings and expression data into function
coverage info, which can't be mutated during codegen.
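A small sketch of the approach, using illustrative types rather than rustc's actual coverage types:
```rust
use std::collections::HashSet;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct ExpressionId(u32);

#[derive(Clone, Copy, PartialEq, Debug)]
enum Operand {
    Zero,
    Counter(u32),
    Expression(ExpressionId),
}

// Consulted on the fly while converting data for FFI, instead of rewriting
// the stored expressions in place.
fn operand_for_ffi(op: Operand, zero_expressions: &HashSet<ExpressionId>) -> Operand {
    match op {
        Operand::Expression(id) if zero_expressions.contains(&id) => Operand::Zero,
        other => other,
    }
}

fn main() {
    let zeroes: HashSet<_> = [ExpressionId(2)].into_iter().collect();
    assert_eq!(operand_for_ffi(Operand::Expression(ExpressionId(2)), &zeroes), Operand::Zero);
    assert_eq!(operand_for_ffi(Operand::Counter(0), &zeroes), Operand::Counter(0));
}
```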
Coverage codegen can now allocate arrays based on the number of
counters/expressions originally used by the instrumentor.
The existing query that inspects coverage statements is still used for
determining the number of counters passed to `llvm.instrprof.increment`. If
some high-numbered counters were removed by MIR optimizations, the instrumented
binary can potentially use less memory and disk space at runtime.
This allows coverage information to be attached to the function as a whole when
appropriate, instead of being smuggled through coverage statements in the
function's basic blocks.
As an example, this patch moves the `function_source_hash` value out of
individual `CoverageKind::Counter` statements and into the per-function info.
When synthesizing unused functions for coverage purposes, the absence of this
info is taken to indicate that a function was not eligible for coverage and
should not be synthesized.
Remove cgu_reuse_tracker from Session
This removes a bit of global mutable state.
It will now miss post-LTO CGU reuse when ThinLTO determines that a CGU doesn't get changed, but there weren't any tests for this anyway, and a test for it would be fragile with respect to the exact implementation of ThinLTO in LLVM.
Copy 1-element arrays as scalars, not vectors
For `[T; 1]` it's silly to copy as `<1 x T>` when we can just copy as `T`.
Inspired by https://github.com/rust-lang/rust/issues/101210#issuecomment-1732470941, which pointed out that `Option<[u8; 1]>` was codegenning worse than `Option<u8>`.
(I'm not sure *why* LLVM doesn't optimize out `<1 x u8>`, but might as well just not emit it in the first place in this codepath.)
---
I think I bit off too much in #116479; let me try just the scalar case first.
r? `@ghost`
Prototype using const generic for simd_shuffle IDX array
cc https://github.com/rust-lang/rust/issues/85229
r? `@workingjubilee` on the design
TLDR: there is now a `fn simd_shuffle_generic<T, U, const IDX: &'static [u32]>(x: T, y: T) -> U;` intrinsic that allows replacing
```rust
simd_shuffle(a, b, const { stuff })
```
with
```rust
simd_shuffle_generic::<_, _, {&stuff}>(a, b)
```
which makes the compiler implementations much simpler, if we manage to at some point eliminate `simd_shuffle`.
There are some issues with this today though (can't do math without bubbling it up in the generic arguments). With this change, we can start porting the simple cases and get better data on the others.
subst -> instantiate
Continues #110793; there are still quite a few uses of `subst` and `substitute`, but changing them all in the same PR was a bit too much, so I've stopped here for now.
The LLVM API that we use to encode coverage mappings already has its own code
for removing unused coverage expressions and renumbering the rest.
This lets us get rid of our own complex renumbering code, making it easier to
change our coverage code in other ways.
After coverage instrumentation and MIR transformations, we can sometimes end up
with coverage expressions that always have a value of zero. Any expression
operand that refers to an always-zero expression can be replaced with a literal
`Operand::Zero`, making the emitted coverage mapping data smaller and simpler.
This simplification step is mostly redundant with the simplifications performed
inline in `expressions_with_regions`, except that it does a slightly more
thorough job in some cases (because it checks for always-zero expressions
*after* other simplifications).
However, adding this simplification step will then let us greatly simplify that
code, without affecting the quality of the emitted coverage maps.
treat host effect params as erased in codegen
This fixes the changes brought to codegen tests when effect params are added to libcore, by not attempting to monomorphize functions that get the host param by being `const fn`.
r? `@oli-obk`
This fixes the changes brought to codegen tests when effect params are
added to libcore, by not attempting to monomorphize functions that get
the host param by being `const fn`.
Rework `no_coverage` to `coverage(off)`
As discussed at the tail of https://github.com/rust-lang/rust/issues/84605 this replaces the `no_coverage` attribute with a `coverage` attribute that takes sub-parameters (currently `off` and `on`) to control the coverage instrumentation.
Allows future-proofing for things like `coverage(off, reason="Tested live", issue="#12345")` or similar.
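A usage sketch of the new attribute form (the feature-gate name below is an assumption):
```rust
#![feature(coverage_attribute)]

// Replaces the old `#[no_coverage]`; this function is excluded from
// coverage instrumentation.
#[coverage(off)]
fn not_instrumented() {}

fn main() {
    not_instrumented();
}
```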
Remove `verbose_generic_activity_with_arg`
This removes `verbose_generic_activity_with_arg` and changes users to `generic_activity_with_arg`. This keeps the output of `-Z time` readable while these repeated events are still available with the self profiling mechanism.
Instead of writing coverage mappings into a supplied `&RustString`, this
function can just create the buffer itself and return the resulting vector of
bytes.
If two or more mappings cover exactly the same region, their relative order
will now be preserved from `get_expressions_and_counter_regions`, rather than
being disturbed by implementation details of an unstable sort.
The current order is: counter mappings, expression mappings, zero mappings.
(LLVM will also perform its own stable sort on these mappings, but that sort
only compares file ID, start location, and `RegionKind`.)
This struct was only being used to hold the global file table, and one of its
methods didn't even use the table. Changing its methods to ordinary functions
makes it easier to see where the table is mutated.
debuginfo: add compiler option to allow compressed debuginfo sections
LLVM already supports emitting compressed debuginfo. In debuginfo=full builds, the debug section is often a large amount of data, and it typically compresses very well (3x is not unreasonable.) We add a new knob to allow debuginfo to be compressed when the matching LLVM functionality is present. Like clang, if a known-but-disabled compression mechanism is requested, we disable compression and emit uncompressed debuginfo sections.
The API is different enough on older LLVMs we just pretend the support
is missing on LLVM older than 16.
Use the same DISubprogram for each instance of the same inlined function within a caller
# Issue Details:
The call to `panic` within a function like `Option::unwrap` is translated to LLVM as a `tail call` (as it will never return); when multiple calls to the same function like this are inlined, LLVM will notice the common `tail call` block (i.e., loading the same panic string + location info and then calling `panic`) and merge them together.
When merging these instructions together, LLVM will also attempt to merge the debug locations, but this fails (i.e., debug info is dropped) because Rust emits a new `DISubprogram` at each inline site, so LLVM doesn't recognize that these are actually the same function and thinks that there isn't a common debug location.
As an example of this, consider the following program:
```rust
#[no_mangle]
fn add_numbers(x: &Option<i32>, y: &Option<i32>) -> i32 {
let x1 = x.unwrap();
let y1 = y.unwrap();
x1 + y1
}
```
When building for x86_64 Windows using 1.72 it generates (note the lack of `.cv_loc` before the call to `panic`, so it will be attributed to the same line as the `addq` instruction):
```asm
.cv_loc 0 1 3 0 # src\lib.rs:3:0
addq $40, %rsp
retq
leaq .Lalloc_f570dea0a53168780ce9a91e67646421(%rip), %rcx
leaq .Lalloc_629ace53b7e5b76aaa810d549cc84ea3(%rip), %r8
movl $43, %edx
callq _ZN4core9panicking5panic17h12e60b9063f6dee8E
int3
```
# Fix Details:
Cache the `DISubprogram` emitted for each inlined function instance within a caller so that this can be reused if that instance is encountered again.
Ideally, we would also deduplicate child scopes and variables, however my attempt to do that with #114643 resulted in asserts when building for Linux (#115156) which would require some deep changes to Rust to fix (#115455).
Instead, when using an inlined function as a debug scope, we will also create a new child scope such that subsequent child scopes and variables do not collide (from LLVM's perspective).
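A conceptual sketch of the caching, with stand-in types for the monomorphized instance and the LLVM `DISubprogram` node (not the actual rustc debuginfo code):
```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct InlinedInstance(u64); // stand-in for the inlined callee instance

#[derive(Clone, Copy)]
struct DISubprogramRef(usize); // stand-in for the LLVM metadata node

#[derive(Default)]
struct FunctionDebugContext {
    inlined_subprograms: HashMap<InlinedInstance, DISubprogramRef>,
    next: usize,
}

impl FunctionDebugContext {
    // Reuse the DISubprogram if this inlined instance was already seen in the
    // current caller; otherwise create (and remember) a new one.
    fn subprogram_for(&mut self, instance: InlinedInstance) -> DISubprogramRef {
        if let Some(&sp) = self.inlined_subprograms.get(&instance) {
            return sp;
        }
        let sp = DISubprogramRef(self.next);
        self.next += 1;
        self.inlined_subprograms.insert(instance, sp);
        sp
    }
}

fn main() {
    let mut ctx = FunctionDebugContext::default();
    let a = ctx.subprogram_for(InlinedInstance(1));
    let b = ctx.subprogram_for(InlinedInstance(1));
    assert_eq!(a.0, b.0); // the same DISubprogram is reused
}
```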
After this change the above assembly now (with <https://reviews.llvm.org/D159226> as well) shows the `panic!` was inlined from `unwrap` in `option.rs` at line 935 into the current function in `lib.rs` at line 0 (line 0 is emitted since it is ambiguous which line to use, as there were two inline sites that led to this same code):
```asm
.cv_loc 0 1 3 0 # src\lib.rs:3:0
addq $40, %rsp
retq
.cv_inline_site_id 6 within 0 inlined_at 1 0 0
.cv_loc 6 2 935 0 # library\core\src\option.rs:935:0
leaq .Lalloc_5f55955de67e57c79064b537689facea(%rip), %rcx
leaq .Lalloc_e741d4de8cb5801e1fd7a6c6795c1559(%rip), %r8
movl $43, %edx
callq _ZN4core9panicking5panic17hde1558f32d5b1c04E
int3
```
lto: load bitcode sections by name
Upstream change
llvm/llvm-project@6b539f5eb8 changed how `isSectionBitcode` works: it now only respects `.llvm.lto` sections instead of also `.llvmbc`, which it says was never intended to be used for LTO. We instead load sections by name, and sniff for raw bitcode by hand.
This is an alternative approach to #115136, where we tried the same thing using the `object` crate, but it got too fraught to continue.
r? `@nikic`
`@rustbot` label: +llvm-main
Use `Freeze` for `SourceFile`
This uses the `Freeze` type in `SourceFile` to let accessing `external_src` and `lines` be lock-free.
Behavior of `add_external_src` is changed to set `ExternalSourceKind::AbsentErr` on a hash mismatch which matches the documentation. `ExternalSourceKind::Unneeded` was removed as it's unused.
Based on https://github.com/rust-lang/rust/pull/115401.
LLVM already supports emitting compressed debuginfo. In debuginfo=full
builds, the debug section is often a large amount of data, and it
typically compresses very well (3x is not unreasonable.) We add a new
knob to allow debuginfo to be compressed when the matching LLVM
functionality is present. Like clang, if a known-but-disabled
compression mechanism is requested, we disable compression and emit
uncompressed debuginfo sections.
The API is different enough on older LLVMs we just pretend the support
is missing on LLVM older than 16.
Upstream change
llvm/llvm-project@6b539f5eb8 changed how
`isSectionBitcode` works: it now only respects `.llvm.lto` sections
instead of also `.llvmbc`, which it says was never intended to be used
for LTO. We instead load sections by name, and sniff for raw bitcode by
hand.
r? @nikic
@rustbot label: +llvm-main
Add CL and CMD info to PDB debug info
Partial fix for https://github.com/rust-lang/rust/issues/96475
The Arg0 and CommandLineArgs fields of the MCTargetOptions C++ class are not set within bb548f9645/compiler/rustc_llvm/llvm-wrapper/PassWrapper.cpp (L378)
This causes LLVM to output neither the compiler path (cl) nor the arguments that were used when invoking it (cmd) in the PDB file.
This fix adds the missing information to the target machine so LLVM can use it.
Inline functions called from `add_coverage`
This removes quite a bit of indirection and duplicated code related to getting the `FunctionCoverage`.
CC `@Zalathar`
Use `preserve_mostcc` for `extern "rust-cold"`
As experimentation in #115242 has shown, this looks better than `coldcc`. Notably, clang exposes `preserve_most` (https://clang.llvm.org/docs/AttributeReference.html#preserve-most) but not `cold`, so this change should put us on a better-supported path.
And *don't* use a different convention for cold on Windows, because that actually ends up making things worse. (See comment in the code.)
cc tracking issue #97544
codegen_llvm/llvm_type: avoid matching on the Rust type
This `match` is highly suspicious. Looking at `scalar_llvm_type_at` I think it makes no difference. But if it were to make a difference that would be a huge problem, since it doesn't look through `repr(transparent)`!
Cc `@eddyb` `@bjorn3`
As experimentation in #115242 has shown, this looks better than `coldcc`.
And *don't* use a different convention for cold on Windows, because that actually ends up making things worse.
cc tracking issue #97544
Revert "Use the same DISubprogram for each instance of the same inline function within the caller"
This reverts commit 687bffa493.
Reverting to resolve ICEs reported on nightly.
cc `@dpaoliello`
Fixes #115156
Use the same DISubprogram for each instance of the same inlined function within a caller
# Issue Details:
The call to `panic` within a function like `Option::unwrap` is translated to LLVM as a `tail call` (as it will never return); when multiple calls to the same function like this are inlined, LLVM will notice the common `tail call` block (i.e., loading the same panic string + location info and then calling `panic`) and merge them together.
When merging these instructions together, LLVM will also attempt to merge the debug locations, but this fails (i.e., debug info is dropped) because Rust emits a new `DISubprogram` at each inline site, so LLVM doesn't recognize that these are actually the same function and thinks that there isn't a common debug location.
As an example of this, when building for x86_64 Windows it generates the following (note the lack of `.cv_loc` before the call to `panic`, so it will be attributed to the same line as the `addq` instruction):
```
.cv_loc 0 1 23 0 # src\lib.rs:23:0
addq $40, %rsp
retq
leaq .Lalloc_f570dea0a53168780ce9a91e67646421(%rip), %rcx
leaq .Lalloc_629ace53b7e5b76aaa810d549cc84ea3(%rip), %r8
movl $43, %edx
callq _ZN4core9panicking5panic17h12e60b9063f6dee8E
int3
```
# Fix Details:
Cache the `DISubprogram` emitted for each inlined function instance within a caller so that this can be reused if that instance is encountered again, this also requires caching the `DILexicalBlock` and `DIVariable` objects to avoid creating duplicates.
After this change the above assembly now looks like:
```
.cv_loc 0 1 23 0 # src\lib.rs:23:0
addq $40, %rsp
retq
.cv_inline_site_id 5 within 0 inlined_at 1 0 0
.cv_inline_site_id 6 within 5 inlined_at 1 12 0
.cv_loc 6 2 935 0 # library\core\src\option.rs:935:0
leaq .Lalloc_5f55955de67e57c79064b537689facea(%rip), %rcx
leaq .Lalloc_e741d4de8cb5801e1fd7a6c6795c1559(%rip), %r8
movl $43, %edx
callq _ZN4core9panicking5panic17hde1558f32d5b1c04E
int3
```
Replace the \01__gnu_mcount_nc to LLVM intrinsic for ARM
Currently, `-Zinstrument-mcount` for ARM32 uses `\01__gnu_mcount_nc` directly as its instrumentation function.
However, LLVM does not use this mcount function directly; it wraps it in the intrinsic `llvm.arm.gnu.eabi.mcount`, and the transform pass only handles the intrinsic.
As a result, `-Zinstrument-mcount` currently does not work on ARM32. Refer to: https://github.com/namhyung/uftrace/issues/1764
This commit replaces the native mcount function name with the LLVM intrinsic so that the transform pass can handle it.
Currently, `-Zinstrument-mcount` for ARM32 uses `\01__gnu_mcount_nc`
directly as its instrumentation function.
However, LLVM does not use this mcount function directly; it wraps it
in the intrinsic `llvm.arm.gnu.eabi.mcount`, and the transform pass
only handles the intrinsic.
As a result, `-Zinstrument-mcount` currently does not work on ARM32.
Refer to: https://github.com/namhyung/uftrace/issues/1764
This commit replaces the native mcount function name with the
LLVM intrinsic so that the transform pass can handle it.
Signed-off-by: ChoKyuWon <kyuwoncho18@gmail.com>
Use `unstable_target_features` when checking inline assembly
This is necessary to properly validate register classes even when the relevant target feature name is still unstable.
Extract a create_wrapper_function for use in allocator shim writing
This deduplicates some logic and makes it easier to follow what wrappers are produced. In the future it may allow moving the code to determine which wrappers to create to cg_ssa.
coverage: Don't convert filename/symbol strings to `CString` for FFI
LLVM APIs are usually perfectly happy to accept pointer/length strings, as long as we supply a suitable length value when creating a `StringRef` or `std::string`.
This lets us avoid quite a few intermediate `CString` copies during coverage codegen. It also lets us use an `IndexSet<Symbol>` (instead of an `IndexSet<CString>`) when building the deduplicated filename table.
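A sketch of the pattern on the Rust side; the "LLVM wrapper" here is simulated by a local Rust function standing in for the real `extern "C"` declaration implemented in C++:
```rust
use std::os::raw::c_char;

// Stand-in for an LLVM wrapper function: the C++ side can build a
// StringRef/std::string from the pointer and explicit length, so no NUL
// terminator is required.
extern "C" fn llvm_take_string(ptr: *const c_char, len: usize) {
    let bytes = unsafe { std::slice::from_raw_parts(ptr as *const u8, len) };
    println!("received {:?}", std::str::from_utf8(bytes));
}

fn pass_filename(name: &str) {
    // No intermediate CString allocation or copy is needed.
    llvm_take_string(name.as_ptr() as *const c_char, name.len());
}

fn main() {
    pass_filename("src/lib.rs");
}
```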
feat: `riscv-interrupt-{m,s}` calling conventions
Similar to prior support added for the msp430, avr, and x86 targets, this change implements the rough equivalent of clang's [`__attribute__((interrupt))`][clang-attr] for riscv targets, enabling e.g.
```rust
static mut CNT: usize = 0;
pub extern "riscv-interrupt-m" fn isr_m() {
unsafe {
CNT += 1;
}
}
```
to produce highly effective assembly like:
```asm
pub extern "riscv-interrupt-m" fn isr_m() {
420003a0: 1141 addi sp,sp,-16
unsafe {
CNT += 1;
420003a2: c62a sw a0,12(sp)
420003a4: c42e sw a1,8(sp)
420003a6: 3fc80537 lui a0,0x3fc80
420003aa: 63c52583 lw a1,1596(a0) # 3fc8063c <_ZN12esp_riscv_rt3CNT17hcec3e3a214887d53E.0>
420003ae: 0585 addi a1,a1,1
420003b0: 62b52e23 sw a1,1596(a0)
}
}
420003b4: 4532 lw a0,12(sp)
420003b6: 45a2 lw a1,8(sp)
420003b8: 0141 addi sp,sp,16
420003ba: 30200073 mret
```
(disassembly via `riscv64-unknown-elf-objdump -C -S --disassemble ./esp32c3-hal/target/riscv32imc-unknown-none-elf/release/examples/gpio_interrupt`)
This outcome is superior to hand-coded interrupt routines which, lacking visibility into any non-assembly body of the interrupt handler, have to be very conservative and save the [entire CPU state to the stack frame][full-frame-save]. By instead asking LLVM to only save the registers that it uses, we defer the decision to the tool with the best context: it can more accurately account for the cost of spills if it knows that every additional register used is already at the cost of an implicit spill.
At the LLVM level, this is apparently [implemented by] marking every register as "[callee-save]," matching the semantics of an interrupt handler nicely (it has to leave the CPU state just as it found it after its `{m|s}ret`).
This approach is not suitable for every interrupt handler, as it makes no attempt to e.g. save the state in a user-accessible stack frame. For a full discussion of those challenges and tradeoffs, please refer to [the interrupt calling conventions RFC][rfc].
Inside rustc, this implementation differs from prior art because LLVM does not expose the "all-saved" function flavor as a calling convention directly, instead preferring to use an attribute that allows for differentiating between "machine-mode" and "supervisor-mode" interrupts.
Finally, some effort has been made to guide those who may not yet be aware of the differences between machine-mode and supervisor-mode interrupts as to why no `riscv-interrupt` calling convention is exposed through rustc, and similarly for why `riscv-interrupt-u` makes no appearance (as it would complicate future LLVM upgrades).
[clang-attr]: https://clang.llvm.org/docs/AttributeReference.html#interrupt-risc-v
[full-frame-save]: 9281af2ecf/src/lib.rs (L440-L469)
[implemented by]: b7fb2a3fec/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp (L61-L67)
[callee-save]: 973f1fe7a8/llvm/lib/Target/RISCV/RISCVCallingConv.td (L30-L37)
[rfc]: https://github.com/rust-lang/rfcs/pull/3246
Similar to prior support added for the msp430, avr, and x86 targets,
this change implements the rough equivalent of clang's
[`__attribute__((interrupt))`][clang-attr] for riscv targets, enabling
e.g.
```rust
static mut CNT: usize = 0;
pub extern "riscv-interrupt-m" fn isr_m() {
unsafe {
CNT += 1;
}
}
```
to produce highly effective assembly like:
```asm
pub extern "riscv-interrupt-m" fn isr_m() {
420003a0: 1141 addi sp,sp,-16
unsafe {
CNT += 1;
420003a2: c62a sw a0,12(sp)
420003a4: c42e sw a1,8(sp)
420003a6: 3fc80537 lui a0,0x3fc80
420003aa: 63c52583 lw a1,1596(a0) # 3fc8063c <_ZN12esp_riscv_rt3CNT17hcec3e3a214887d53E.0>
420003ae: 0585 addi a1,a1,1
420003b0: 62b52e23 sw a1,1596(a0)
}
}
420003b4: 4532 lw a0,12(sp)
420003b6: 45a2 lw a1,8(sp)
420003b8: 0141 addi sp,sp,16
420003ba: 30200073 mret
```
(disassembly via `riscv64-unknown-elf-objdump -C -S --disassemble ./esp32c3-hal/target/riscv32imc-unknown-none-elf/release/examples/gpio_interrupt`)
This outcome is superior to hand-coded interrupt routines which, lacking
visibility into any non-assembly body of the interrupt handler, have to
be very conservative and save the [entire CPU state to the stack
frame][full-frame-save]. By instead asking LLVM to only save the
registers that it uses, we defer the decision to the tool with the best
context: it can more accurately account for the cost of spills if it
knows that every additional register used is already at the cost of an
implicit spill.
At the LLVM level, this is apparently [implemented by] marking every
register as "[callee-save]," matching the semantics of an interrupt
handler nicely (it has to leave the CPU state just as it found it after
its `{m|s}ret`).
This approach is not suitable for every interrupt handler, as it makes
no attempt to e.g. save the state in a user-accessible stack frame. For
a full discussion of those challenges and tradeoffs, please refer to
[the interrupt calling conventions RFC][rfc].
Inside rustc, this implementation differs from prior art because LLVM
does not expose the "all-saved" function flavor as a calling convention
directly, instead preferring to use an attribute that allows for
differentiating between "machine-mode" and "supervisor-mode" interrupts.
Finally, some effort has been made to guide those who may not yet be
aware of the differences between machine-mode and supervisor-mode
interrupts as to why no `riscv-interrupt` calling convention is exposed
through rustc, and similarly for why `riscv-interrupt-u` makes no
appearance (as it would complicate future LLVM upgrades).
[clang-attr]: https://clang.llvm.org/docs/AttributeReference.html#interrupt-risc-v
[full-frame-save]: 9281af2ecf/src/lib.rs (L440-L469)
[implemented by]: b7fb2a3fec/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp (L61-L67)
[callee-save]: 973f1fe7a8/llvm/lib/Target/RISCV/RISCVCallingConv.td (L30-L37)
[rfc]: https://github.com/rust-lang/rfcs/pull/3246
CFI: Fix error compiling core with LLVM CFI enabled
Fix #90546 by filtering out global value function pointer types from the type tests, and adding the LowerTypeTests pass to the rustc LTO optimization pipelines.
Add hotness data to LLVM remarks
Slight improvement of https://github.com/rust-lang/rust/pull/113040. This makes sure that if PGO is used, remarks generated using `-Zremark-dir` will include the `Hotness` attribute.
r? `@tmiasko`
Fix #90546 by filtering out global value function pointer types from the
type tests, and adding the LowerTypeTests pass to the rustc LTO
optimization pipelines.
Add a new `compare_bytes` intrinsic instead of calling `memcmp` directly
As discussed in #113435, this lets the backends be the place that can have the "don't call the function if n == 0" logic, if it's needed for the target. (I didn't actually *add* those checks, though, since as I understood it we didn't actually need them on known targets?)
Doing this also let me make it `const` (unstable), which I don't think `extern "C" fn memcmp` can be.
cc `@RalfJung` `@Amanieu`
This deduplicates some logic and makes it easier to follow what wrappers
are produced. In the future it may allow moving the code to determine
which wrappers to create to cg_ssa.
cg_llvm: stop identifying ADTs in LLVM IR
This is an extension of https://github.com/rust-lang/rust/pull/94107. It may be a minor perf win.
Fixes #96242.
Now that we use opaque pointers, ADTs can no longer be recursive, so we
do not need to name them. Previously, this would be necessary if you had
a struct like
```rs
struct Foo(Box<Foo>, u64, u64);
```
which would be represented with something like
```ll
%Foo = type { %Foo*, i64, i64 }
```
which is now just
```ll
{ ptr, i64, i64 }
```
r? `@tmiasko`
Coverage FFI types were historically split across two modules, because some of
them were needed by code in `rustc_codegen_ssa`.
Now that all of the coverage codegen code has been moved into
`rustc_codegen_llvm` (#113355), it's possible to move all of the FFI types into
a single module, making it easier to see all of them at once.
Filter out short-lived LLVM diagnostics before they reach the rustc handler
During profiling I saw remark passes being unconditionally enabled: for example `Machine Optimization Remark Emitter`.
The diagnostic remarks enabled by default are [from missed optimizations and opt analyses](https://github.com/rust-lang/rust/pull/113339#discussion_r1259480303). They are created by LLVM, passed to the diagnostic handler on the C++ side, emitted to rust, where they are unpacked, C++ strings are converted to rust, etc.
Then they are discarded in the vast majority of the time (i.e. unless some kind of `-Cremark` has enabled some of these passes' output to be printed).
These unneeded allocations are very short-lived, basically only lasting between the LLVM pass emitting them and the Rust handler where they are discarded. So it doesn't hugely impact max-rss, and is only a slight reduction in instruction count (cachegrind reports a reduction between 0.3% and 0.5%) _on Linux_. It's possible that targets without `jemalloc` or with a worse allocator may optimize these less.
It is however significant in the aggregate, looking at the total number of allocated bytes:
- it's the biggest source of allocations according to dhat, on the benchmarks I've tried e.g. `syn` or `cargo`
- allocations on `syn` are reduced by 440MB, 17% (from 2440722647 bytes total, to 2030461328 bytes)
- allocations on `cargo` are reduced by 6.6GB, 19% (from 35371886402 bytes total, to 28723987743 bytes)
Some of these diagnostics objects [are allocated in LLVM](https://github.com/rust-lang/rust/pull/113339#discussion_r1252387484) *before* they're emitted to our diagnostic handler, where they'll be filtered out. So we could remove those in the future, but that will require changing a few LLVM call-sites upstream, so I left a FIXME.
Now that remarks are filtered before cg_llvm's diagnostic handler callback
is called, we don't need to do the filtering after the C++-to-Rust conversion
of the diagnostic.
cleanup: remove pointee types
This can't be merged until the oldest LLVM version we support uses opaque pointers, which will be the case after #114148. (Also note `-Cllvm-args="-opaque-pointers=0"` can technically be used in LLVM 15, though I don't think we should support that configuration.)
I initially hoped this would provide some minor perf win, but in https://github.com/rust-lang/rust/pull/105412#issuecomment-1341224450 it had very little impact, so this is only valuable as a cleanup.
As a followup, this will enable #96242 to be resolved.
r? `@ghost`
`@rustbot` label S-blocked
Operand types are now tracked explicitly, so there is no need to reserve ID 0
for the special always-zero counter.
As part of the renumbering, this change fixes an off-by-one error in the way
counters were counted by the `coverageinfo` query. As a result, functions
should now have exactly the number of counters they actually need, instead of
always having an extra counter that is never used.
Operand types are now tracked explicitly, so there is no need for expression
IDs to avoid counter IDs by descending from `u32::MAX`. Instead they can just
count up from 0, and can be used directly as indices when necessary.
Because the three kinds of operand are now distinguished explicitly, we no
longer need fiddly code to disambiguate counter IDs and expression IDs based on
the total number of counters/expressions in a function.
This does increase the size of operands from 4 bytes to 8 bytes, but that
shouldn't be a big deal since they are mostly stored inside boxed structures,
and the current coverage code is not particularly size-optimized anyway.
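An illustrative sketch of the explicitly-typed operands (names are assumptions, not rustc's exact definitions):
```rust
#![allow(dead_code)]

#[derive(Clone, Copy)]
struct CounterId(u32);    // counts up from 0; usable directly as an index
#[derive(Clone, Copy)]
struct ExpressionId(u32); // likewise counts up from 0

#[derive(Clone, Copy)]
enum Operand {
    Zero,
    Counter(CounterId),
    Expression(ExpressionId),
}

fn main() {
    // A tag plus a u32 payload: 8 bytes, versus the old 4-byte packed encoding.
    assert_eq!(std::mem::size_of::<Operand>(), 8);
}
```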
Now that we use opaque pointers, ADTs can no longer be recursive, so we
do not need to name them. Previously, this would be necessary if you had
a struct like
```rs
struct Foo(Box<Foo>, u64, u64);
```
which would be represented with something like
```ll
%Foo = type { %Foo*, i64, i64 }
```
which is now just
```ll
{ ptr, i64, i64 }
```
This section name is always constant for a given target, but obtaining it from
LLVM requires a few intermediate allocations. There's no need to do so
repeatedly from inside a per-function loop.
Both GCC and Clang write by default a `.comment` section with compiler
information:
```txt
$ gcc -c -xc /dev/null && readelf -p '.comment' null.o
String dump of section '.comment':
[ 1] GCC: (GNU) 11.2.0
$ clang -c -xc /dev/null && readelf -p '.comment' null.o
String dump of section '.comment':
[ 1] clang version 14.0.1 (https://github.com/llvm/llvm-project.git c62053979489ccb002efe411c3af059addcb5d7d)
```
They also implement the `-Qn` flag to avoid doing so:
```txt
$ gcc -Qn -c -xc /dev/null && readelf -p '.comment' null.o
readelf: Warning: Section '.comment' was not dumped because it does not exist!
$ clang -Qn -c -xc /dev/null && readelf -p '.comment' null.o
readelf: Warning: Section '.comment' was not dumped because it does not exist!
```
So far, `rustc` only does it for WebAssembly targets and only
when debug info is enabled:
```txt
$ echo 'fn main(){}' | rustc --target=wasm32-unknown-unknown --emit=llvm-ir -Cdebuginfo=2 - && grep llvm.ident rust_out.ll
!llvm.ident = !{!27}
```
In the RFC part of this PR it was decided to always add
the information, which gets us closer to other popular compilers.
An opt-out flag like GCC and Clang may be added later on if deemed
necessary.
Implementation-wise, this covers both `ModuleLlvm::new()` and
`ModuleLlvm::new_metadata()` cases by moving the addition to
`context::create_module` and adds a few test cases.
ThinLTO also sees the `llvm.ident` named metadata duplicated (in
temporary outputs), so this deduplicates it like it is done for
`wasm.custom_sections`. The tests also check this duplication does
not take place.
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Prototype: Add unstable `-Z reference-niches` option
MCP: rust-lang/compiler-team#641
Relevant RFC: rust-lang/rfcs#3204
This prototype adds a new `-Z reference-niches` option, controlling the range of valid bit-patterns for reference types (`&T` and `&mut T`), thereby enabling new enum niching opportunities. Like `-Z randomize-layout`, this setting is crate-local; as such, references to built-in types (primitives, tuples, ...) are not affected.
The possible settings are (here, `MAX` denotes the all-1 bit-pattern):
| `-Z reference-niches=` | Valid range |
|:---:|:---:|
| `null` (the default) | `1..=MAX` |
| `size` | `1..=(MAX - size)` |
| `align` | `align..=MAX.align_down_to(align)` |
| `size,align` | `align..=(MAX-size).align_down_to(align)` |
------
This is very WIP, and I'm not sure the approach I've taken here is the best one, but stage 1 tests pass locally; I believe this is in a good enough state to unleash this upon unsuspecting 3rd-party code, and see what breaks.
Support `--print KIND=PATH` command line syntax
As is already done for `--emit KIND=PATH` and `-L KIND=PATH`.
In the discussion of #110785, it was pointed out that `--print KIND=PATH` is nicer than trying to apply the single global `-o` path to `--print`'s output, because in general there can be multiple print requests within a single rustc invocation, and anyway `-o` would already be used for a different meaning in the case of `link-args` and `native-static-libs`.
I am interested in using `--print cfg=PATH` in Buck2. Currently Buck2 works around the lack of support for `--print KIND=PATH` by [indirecting through a Python wrapper script](d43cf3a51a/prelude/rust/tools/get_rustc_cfg.py) to redirect rustc's stdout into the location dictated by the build system.
From skimming Cargo's usages of `--print`, it definitely seems like it would benefit from `--print KIND=PATH` too. Currently it is working around the lack of this by inserting `--crate-name=___ --print=crate-name` so that it can look for a line containing `___` as a delimiter between the two other pieces of `--print` output it actually cares about. This is commented as a "HACK" and "abuse". 31eda6f7c3/src/cargo/core/compiler/build_context/target_info.rs (L242) (FYI `@weihanglo` as you dealt with this recently in https://github.com/rust-lang/cargo/pull/11633.)
Mentioning reviewers active in #110785: `@fee1-dead` `@jyn514` `@bjorn3`
Resurrect: rustc_llvm: Add a -Z `print-codegen-stats` option to expose LLVM statistics.
This resurrects PR https://github.com/rust-lang/rust/pull/104000, which has sat idle for a while. And I want to see the effect of stack-move optimizations in LLVM (like https://reviews.llvm.org/D153453) :).
I have applied the changes requested by `@oli-obk` and `@nagisa` https://github.com/rust-lang/rust/pull/104000#discussion_r1014625377 and https://github.com/rust-lang/rust/pull/104000#discussion_r1014642482 in the latest commits.
r? `@oli-obk`
-----
LLVM has a neat [statistics](https://llvm.org/docs/ProgrammersManual.html#the-statistic-class-stats-option) feature that tracks how often optimizations kick in. It's very handy for optimization work. Since we expose the LLVM pass timings, I thought it made sense to expose the LLVM statistics too.
-----
(Edit: fix broken link.)
(Edit 2: fix segmentation fault and use malloc.)
If `rustc` is built with
```toml
[llvm]
assertions = true
```
Then you can see output like
```
rustc +stage1 -Z print-codegen-stats -C opt-level=3 tmp.rs
===-------------------------------------------------------------------------===
... Statistics Collected ...
===-------------------------------------------------------------------------===
3 aa - Number of MayAlias results
193 aa - Number of MustAlias results
531 aa - Number of NoAlias results
...
```
And the current default build emits only
```
$ rustc +stage1 -Z print-codegen-stats -C opt-level=3 tmp.rs
===-------------------------------------------------------------------------===
... Statistics Collected ...
===-------------------------------------------------------------------------===
$
```
It might be better to emit a message explaining that LLVM assertions are required, but I haven't found a good way to do that yet...
Add the `no-builtins` attribute to functions when `no_builtins` is applied at the crate level.
**When `no_builtins` is applied at the crate level, we should add the `no-builtins` attribute to each function to ensure it takes effect in LTO.**
This is also the reason why `no_builtins` does not take effect in LTO, as mentioned in #35540.
Now, `#![no_builtins]` should be similar to `-fno-builtin` in clang/gcc, see https://clang.godbolt.org/z/z4j6Wsod5.
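As a hedged sketch of the scenario this is meant to cover (the function below is illustrative, not taken from the PR): a crate that implements the builtins itself must not have its loops pattern-matched back into calls to those builtins, and attaching the LLVM `no-builtins` attribute to each function preserves that intent through (Thin)LTO.
```rs
#![no_builtins]

// Without `no-builtins` surviving into LTO, the optimizer may recognize this
// loop and replace it with a call to memcpy, i.e. infinite recursion.
#[no_mangle]
pub unsafe extern "C" fn my_memcpy(dst: *mut u8, src: *const u8, n: usize) -> *mut u8 {
    for i in 0..n {
        *dst.add(i) = *src.add(i);
    }
    dst
}
```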
Next, we should make `#![no_builtins]` participate in LTO again. That makes sense, as LTO also takes into consideration function-level instruction optimizations, such as the MachineOutliner. More importantly, when a user writes a large `#![no_builtins]` crate, they would like this crate to participate in LTO as well.
We should also add a function-level no_builtins attribute to allow users to have more control over it. This is similar to Clang's `__attribute__((no_builtin))` feature, see https://clang.godbolt.org/z/Wod6KK6eq. Before implementing this feature, maybe we should discuss whether to support more fine-grained control, such as `__attribute__((no_builtin("memcpy")))`.
Related discussions:
- #109821
- #35540
Next (a separate pull request?):
- [ ] Revert #35637
- [ ] Add a function-level `no_builtin` attribute?
Better diagnostics for dlltool errors.
When dlltool fails, show the full command that was executed. In particular, llvm-dlltool is not very helpful, printing a generic usage message rather than what actually went wrong, so stdout and stderr aren't of much use when troubleshooting.
Remove `LLVMRustCoverageHashCString`
Coverage has two FFI functions for computing the hash of a byte string. One takes a ptr/len pair (`LLVMRustCoverageHashByteArray`), and the other takes a NUL-terminated C string (`LLVMRustCoverageHashCString`).
But on closer inspection, the C string version is unnecessary. The calling-side code converts a Rust `&str` into a `CString`, and the C++ code then immediately turns it back into a ptr/len string before actually hashing it. So we can just call the ptr/len version directly instead.
---
This PR also fixes a bug in the C++ declaration of `LLVMRustCoverageHashByteArray`: the length parameter should be `size_t`, since that's what is declared and passed on the Rust side, and it's what `StringRef`'s constructor expects to receive on the callee side.
Coverage has two FFI functions for computing the hash of a byte string. One
takes a ptr/len pair, and the other takes a NUL-terminated C string.
But on closer inspection, the C string version is unnecessary. The calling-side
code converts a Rust `&str` into a C string, and the C++ code then immediately
turns it back into a ptr/len string before actually hashing it.
The function body immediately treats it as a slice anyway, so this just makes
it possible to call the hash function with arbitrary read-only byte slices.
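A hedged sketch of the calling-side simplification (the hash body below is only a placeholder standing in for the FFI call to `LLVMRustCoverageHashByteArray`):
```rs
fn hash_byte_array(bytes: &[u8]) -> u64 {
    // Placeholder hash; the real code would instead do something like
    // unsafe { LLVMRustCoverageHashByteArray(bytes.as_ptr().cast(), bytes.len()) }
    bytes
        .iter()
        .fold(0xcbf2_9ce4_8422_2325_u64, |h, &b| (h ^ b as u64).wrapping_mul(0x100_0000_01b3))
}

fn hash_str(s: &str) -> u64 {
    // No CString allocation and no NUL terminator needed any more.
    hash_byte_array(s.as_bytes())
}

fn main() {
    println!("{:#x}", hash_str("hello"));
}
```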
Move `TyCtxt::mk_x` to `Ty::new_x` where applicable
Part of rust-lang/compiler-team#616
Turns out there are a lot of places where we construct `Ty`, so this is a ridiculously huge PR :S
r? `@oli-obk`
Revert the lexing of `c"…"` string literals
Fixes \[after beta-backport\] #113235.
Further progress is tracked in #113333.
This PR *manually* reverts parts of #108801 (since a git-revert would've been too coarse-grained & messy)
and git-reverts #111647.
CC `@fee1-dead` (#108801) `@klensy` (#111647)
r? `@compiler-errors`
`@rustbot` label F-c_str_literals beta-nominated
Add `-Zremark-dir` unstable flag to write LLVM optimization remarks to YAML
This PR adds an option for `rustc` to emit LLVM optimization remarks to a set of YAML files, which can then be digested by existing tools, like https://github.com/OfekShilon/optview2. When `-Zremark-dir` is passed, and remarks are enabled (`-Cremark=all`), the remarks will now be written to the specified directory, **instead** of being printed to standard error output. The files are named based on the CGU from which they are being generated.
Currently, the remarks are written using the LLVM streaming machinery, directly in the diagnostics handler. It seemed easier than going back to Rust and then from there back to C++ to use the streamer from the diagnostics handler. But there are many ways to implement this, of course, so I'm open to suggestions :)
I included some comments with questions into the code. Also, I'm not sure how to test this.
r? `@tmiasko`
Support for native WASM exceptions
### Motivation
Currently, rustc does not support native WASM exceptions. It does support JavaScript-based exceptions for the wasm32-emscripten target, but this requires going back and forth with JavaScript for many calls, which is very slow.
Native wasm support for exceptions is quite common: Clang+LLVM implemented them years ago, and all major browsers support them by now. They enable zero-cost exceptions, at least with regard to runtime performance. They may increase startup time and code size, though.
### Important: This PR does not change default behaviour
Exceptions usually add a lot of code in the form of unwinding blocks, increasing the binary size. Most users probably do not want that, especially with regard to web development.
Therefore, wasm exceptions play a similar role to WASM threads: rustc should support them, like clang does, but users who want them have to use some command-line magic like rustflags to opt in.
### What does this PR do?
As stated above, the default behaviour is not changed. It is already possible to opt in to wasm exceptions using the command line. Unfortunately, the LLVM IR is invalid and the LLVM backend crashes.
```
rustc <sourcefile> \
    --target wasm32-unknown-unknown \
    -C panic=unwind \
    -C llvm-args=-wasm-enable-eh \
    -C target-feature=+exception-handling
```
As it turns out, LLVM is quite picky when it comes to IR for exception handling. If the IR does not look exactly like it should, some LLVM-assertions fail and the code generation crashes.
This PR adds the necessary modifications to the code generator to make it work. It also adds `exception-handling` as a wasm target feature.
### What this PR does not / what is missing
This PR is not a full-fledged solution; it is the first step. A few parts are still missing; however, it is already usable (see the next section).
Currently missing:
* The std library has to be adapted. Currently, only `#![no_std]` crates work
* Usually, nested exceptions abort the program (i.e. a panic during the cleanup of another panic). This is not done yet.
- Currently, code inside cleanup handlers does not unwind
- To fix this requires a little more work: The code generator currently maintains a single terminate block per function for this. Unfortunately, WASM requires funclet based exception handling. Therefore, we need to create a terminate block per funclet. This is probably not a big problem, but I want to keep this PR simple.
### How to use the compiler given this PR?
This PR does not add any command line flags or features. It uses those which are already there. To compile with exceptions enabled, you need
* to set the panic strategy to unwind, i.e. `-C panic=unwind`
* to enable the exception-handling target feature, i.e. `-C target-feature=+exception-handling`
* to tell LLVM about the exception handling, i.e. `-C llvm-args=-wasm-enable-eh`
Since the standard library has not been adapted, you can only use it in `#![no_std]` crates as of now. The intrinsic `core::intrinsics::r#try` works. To throw exceptions, you need the `@llvm.wasm.throw` intrinsic.
I created a sample application which works for me: https://github.com/mirkootter/rust-wasm-demos
This example can be run at https://webassembly.sh
After the last commit, they contain `Option<&OperandBundleDef<'a>>` but
the values are always `Some(_)`. This commit removes the needless
`Option` wrapper. This also simplifies the type signatures of
`LLVMRustBuild{Invoke,Call}`, which were relying on the fact that the
representation of `Option<&T>` is the same as `&T` for non-`None` values.
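The layout guarantee the old signatures leaned on (the null-pointer optimization) is easy to demonstrate:
```rs
use std::mem::size_of;

fn main() {
    // Option<&T> occupies the same space as &T, with None represented as the
    // null pointer, which is why it could be passed across FFI as a plain
    // (possibly-null) pointer.
    assert_eq!(size_of::<Option<&u64>>(), size_of::<&u64>());
}
```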
They never have a length of more than two. So this commit changes them
to `SmallVec<[_; 2]>`.
Also, we possibly push `None` values and then filter those `None` values
out again with `retain`. So this commit removes the `retain` and instead
only pushes the values if they are `Some(_)`.
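A hedged sketch of the resulting pattern, with `u32` standing in for the actual bundle type:
```rs
use smallvec::SmallVec; // assumes the `smallvec` crate

// At most two bundles ever exist, so an inline capacity of 2 avoids heap
// allocation entirely; only the Some values are pushed, so no `retain` pass
// is needed afterwards.
fn collect_bundles(first: Option<u32>, second: Option<u32>) -> SmallVec<[u32; 2]> {
    let mut bundles = SmallVec::new();
    if let Some(b) = first {
        bundles.push(b);
    }
    if let Some(b) = second {
        bundles.push(b);
    }
    bundles
}

fn main() {
    assert_eq!(collect_bundles(Some(1), None).len(), 1);
}
```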
I don't know why `SmallStr` was used here; some ad hoc profiling showed
this code is not that hot, the string is usually empty, and when it's
not empty it's usually very short. However, the use of a
`SmallStr<1024>` does result in a 1024-byte `memcpy` call on each
execution, which shows up when I do `memcpy` profiling. So using a
normal string makes the code both simpler and very slightly faster.
`lookup_debug_loc` finds a file, line, and column, which requires two
binary searches. But this call site only needs the file.
This commit replaces the call with `lookup_source_file`, which does a
single binary search.
`lookup_debug_loc` calls `SourceMap::lookup_line`, which does a binary
search over the files, and then a binary search over the lines within
the found file. It then calls `SourceFile::line_begin_pos`, which redoes
the binary search over the lines within the found file.
This commit removes the second binary search over the lines, instead
getting the line starting pos directly using the result of the first
binary search over the lines.
(And likewise for `get_span_loc`, in the cranelift backend.)
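A generic sketch of the fix (using `partition_point` over a hypothetical `line_starts` array rather than the actual `SourceMap` code): the index found by the first binary search already yields the line's starting position, so no second search is needed.
```rs
fn line_and_start(line_starts: &[u32], pos: u32) -> (usize, u32) {
    // Number of line starts <= pos, minus one, is the containing line's index.
    let line = line_starts.partition_point(|&start| start <= pos) - 1;
    // Reuse that index to read the start offset directly.
    (line, line_starts[line])
}

fn main() {
    let line_starts = [0, 10, 25, 40];
    assert_eq!(line_and_start(&line_starts, 12), (1, 10));
}
```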
Because tiny CGUs make compilation less efficient *and* result in worse
generated code.
We don't do this when the number of CGUs is explicitly given, because
there are times when the requested number is very important, as
described in some comments within the commit. So the commit also
introduces a `CodegenUnits` type that distinguishes between default
values and user-specified values.
This change has a roughly neutral effect on walltimes across the
rustc-perf benchmarks; there are some speedups and some slowdowns. But
it has significant wins for most other metrics on numerous benchmarks,
including instruction counts, cycles, binary size, and max-rss. It also
reduces parallelism, which is good for reducing jobserver competition
when multiple rustc processes are running at the same time. It's smaller
benchmarks that benefit the most; larger benchmarks already have CGUs
that are all larger than the minimum size.
Here are some example before/after CGU sizes for opt builds.
- html5ever
- CGUs: 16, mean size: 1196.1, sizes: [3908, 2992, 1706, 1652, 1572,
1136, 1045, 948, 946, 938, 579, 471, 443, 327, 286, 189]
- CGUs: 4, mean size: 4396.0, sizes: [6706, 3908, 3490, 3480]
- libc
- CGUs: 12, mean size: 35.3, sizes: [163, 93, 58, 53, 37, 8, 2 (x6)]
- CGUs: 1, mean size: 424.0, sizes: [424]
- tt-muncher
- CGUs: 5, mean size: 1819.4, sizes: [8508, 350, 198, 34, 7]
- CGUs: 1, mean size: 9075.0, sizes: [9075]
Note that CGUs of size 100,000+ aren't unusual in larger programs.
Removed use of iteration through a HashMap/HashSet in rustc_incremental and replaced with IndexMap/IndexSet
This allows the `#[allow(rustc::potential_query_instability)]` in rustc_incremental to be removed, moving towards fixing #84447 (although a LOT more modules have to be changed to fully resolve it). Only HashMaps/HashSets that are being iterated through have been modified (although many structs and traits outside of rustc_incremental had to be modified as well, since they had fields/methods involving a HashMap/HashSet that would be iterated through).
I'm making a PR for just 1 module changed to test for performance regressions and such, for future changes I'll either edit this PR to reflect additional modules being converted, or batch multiple modules of changes together and make a PR for each group of modules.
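A hedged illustration of why this matters for query stability (using the `indexmap` crate directly; rustc's index-map wrappers behave the same way): iteration follows insertion order, so it is identical on every run.
```rs
use indexmap::IndexMap; // assumes the `indexmap` crate

fn main() {
    let mut deps = IndexMap::new();
    deps.insert("crate_b", 2);
    deps.insert("crate_a", 1);

    // Deterministic: always prints crate_b then crate_a, in insertion order.
    // Iterating a HashMap here could yield either order depending on the
    // random hash seed, which is what the instability lint guards against.
    for (name, id) in &deps {
        println!("{name} => {id}");
    }
}
```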
use c literals in compiler and library
Use c literals #108801 in compiler and library
currently blocked on:
* <strike>rustfmt: don't know how to format c literals</strike> nope, nightly one works.
* <strike>bootstrap</strike>
r? `@ghost`
`@rustbot` blocked
These tend to have special handling in a bunch of places anyway, so the variant helps us remember that. And I think it's easier to grok than non-Scalar Aggregates sometimes being `Immediates` (like I got wrong and caused #109992). As a minor bonus, it means we don't need to generate poison LLVM values for them to pass around in `OperandValue::Immediate`s.
Optimize scalar and scalar pair representations loaded from ByRef in llvm
In https://github.com/rust-lang/rust/pull/105653 I noticed that we were generating suboptimal LLVM IR if we had a `ConstValue::ByRef` that could be represented by a `ScalarPair`. Before https://github.com/rust-lang/rust/pull/105653 this was probably rare, but after it, every slice goes down this suboptimal code path, which requires LLVM to untangle a bunch of indirections and look through static allocations that are only used once just to read a scalar pair from them.
Adds support for LLVM [SafeStack], which provides backward-edge control-flow
protection by separating the stack into two parts: data that is only accessed
in provably safe ways is allocated on the normal stack (the "safe stack"), and
all other data is placed in a separate allocation (the "unsafe stack").
SafeStack support is enabled by passing `-Zsanitizer=safestack`.
[SafeStack]: https://clang.llvm.org/docs/SafeStack.html
Support #[global_allocator] without the allocator shim
This makes it possible to use liballoc/libstd in combination with `--emit obj` if you use `#[global_allocator]`. This is what rust-for-linux uses right now and systemd may use in the future. Currently they have to depend on the exact implementation of the allocator shim to create one themself as `--emit obj` doesn't create an allocator shim.
Note that currently the allocator shim also defines the oom error handler, which is normally required too. Once `#![feature(default_alloc_error_handler)]` becomes the only option, this can be avoided. In addition, when using only fallible allocator methods and either `--cfg no_global_oom_handling` for liballoc (like rust-for-linux) or `--gc-sections`, no references to the oom error handler will exist.
To keep this feature from being insta-stable, you will have to define `__rust_no_alloc_shim_is_unstable` yourself to avoid linker errors.
(Labeling this with both T-compiler and T-lang as it originally involved both an implementation detail and had an insta-stable user facing change. As noted above, the `__rust_no_alloc_shim_is_unstable` symbol requirement should prevent unintended dependence on this unstable feature.)
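For context, a minimal `#[global_allocator]` crate looks like this (ordinary stable Rust; the allocator shown just forwards to the system allocator). When such a crate is built with `--emit obj` and linked without the allocator shim, it additionally needs the `__rust_no_alloc_shim_is_unstable` definition shown in the commit message further below.
```rs
use std::alloc::{GlobalAlloc, Layout, System};

struct MyAlloc;

unsafe impl GlobalAlloc for MyAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: MyAlloc = MyAlloc;

fn main() {
    // Allocates through MyAlloc via the usual liballoc entry points.
    let v = vec![1, 2, 3];
    println!("{v:?}");
}
```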
Rather than returning an array of features from `to_llvm_features`, return a
structure that contains the dependencies. This structure also carries metadata
on how the features depend on each other, to allow for correct enabling and
disabling.
Some features that are tied together only make sense to be folded
together when enabling a feature. For example, on AArch64 `sve` and
`neon` are tied together; however, it doesn't make sense to disable
`neon` when disabling `sve`.
In #91608 the `fp-armv8` feature was removed as it's tied to the `neon`
feature. However, disabling `neon` didn't actually disable the use of
floating-point registers and instructions; for that, `-fp-armv8` is
required.
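A hedged sketch of the shape of such a structure (hypothetical types and field names, not the actual rustc definition): the metadata records what must be enabled together without implying anything about disabling.
```rs
struct LLVMFeature<'a> {
    llvm_name: &'a str,
    // Features that must also be enabled when this one is enabled, but that
    // should *not* automatically be disabled when this one is disabled.
    deps_on_enable: &'a [&'a str],
}

fn to_llvm_features(rust_feature: &str) -> LLVMFeature<'_> {
    match rust_feature {
        // Enabling sve requires neon; disabling sve must leave neon (and the
        // floating-point support it implies) untouched.
        "sve" => LLVMFeature { llvm_name: "sve", deps_on_enable: &["neon"] },
        other => LLVMFeature { llvm_name: other, deps_on_enable: &[] },
    }
}

fn main() {
    let f = to_llvm_features("sve");
    println!("enable {} together with {:?}", f.llvm_name, f.deps_on_enable);
}
```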
Fix dependency tracking for debugger visualizers
This PR fixes dependency tracking for debugger visualizer files by changing the `debugger_visualizers` query to an `eval_always` query that scans the AST while it is still available. This way the set of visualizer files is already available when dep-info is emitted. Since the query is turned into an `eval_always` query, dependency tracking will now reliably detect changes to the visualizer script files themselves.
TODO:
- [x] perf.rlo
- [x] Needs a bit more documentation in some places
- [x] Needs regression test for the incr. comp. case
Fixes https://github.com/rust-lang/rust/issues/111226
Fixes https://github.com/rust-lang/rust/issues/111227
Fixes https://github.com/rust-lang/rust/issues/111295
r? `@wesleywiser`
cc `@gibbyfree`
Only depend on CFG_VERSION in rustc_interface
This avoids having to rebuild the whole compiler on each commit when `omit-git-hash = false`.
cc https://github.com/rust-lang/rust/issues/76720 - this won't fix it, and I'm not suggesting we turn this on by default, but it will make it less painful for people who do have `omit-git-hash` on as a workaround.
Remove the ThinLTO CU hack
This reverts #46722, commit e0ab5d5feb.
Since #111167, commit 10b69dde3f, we are
generating DWARF subprograms in a way that is meant to be more compatible
with LLVM's expectations, so hopefully we don't need this workaround
rewriting CUs anymore.
Remove misleading target feature aliases
Fixes #100752. This is a follow-up to #103750. These aliases could not be completely removed until rust-lang/stdarch#1355 landed.
cc `@Amanieu`
CFI: Fix SIGILL reached via trait objects
Fix #106547 by transforming the concrete self into a reference to a trait object before emitting type metadata identifiers for trait methods.
You will need to add the following as a replacement for the old `__rust_*`
definitions when not using the alloc shim.
```rs
#[no_mangle]
static __rust_no_alloc_shim_is_unstable: u8 = 0;
```
This makes it possible to use liballoc/libstd in combination with
`--emit obj` if you use `#[global_allocator]`. Making it work for the
default libstd allocator would require weak functions, which are not
well supported on all systems.