Add asm goto support to `asm!`
Tracking issue: #119364
This PR implements asm-goto support, using the syntax described in "future possibilities" section of [RFC2873](https://rust-lang.github.io/rfcs/2873-inline-asm.html#asm-goto).
Currently I have only implemented the `label` part, not the `fallthrough` part (i.e. fallthrough is implicit). This doesn't reduce the expressive though, since you can use label-break to get arbitrary control flow or simply set a value and rely on jump threading optimisation to get the desired control flow. I can add that later if deemed necessary.
r? ``@Amanieu``
cc ``@ojeda``
Currently many diagnostic modifier methods are available on both
`Diagnostic` and `DiagnosticBuilder`. This commit removes most of them
from `Diagnostic`. To minimize the diff size, it keeps them within
`diagnostic.rs` but changes the surrounding `impl Diagnostic` block to
`impl DiagnosticBuilder`. (I intend to move things around later, to give
a more sensible code layout.)
`Diagnostic` keeps a few methods that it still needs, like `sub`,
`arg`, and `replace_args`.
The `forward!` macro, which defined two additional methods per call
(e.g. `note` and `with_note`), is replaced by the `with_fn!` macro,
which defines one additional method per call (e.g. `with_note`). It's
now also only used when necessary -- not all modifier methods currently
need a `with_*` form. (New ones can be easily added as necessary.)
All this also requires changing `trait AddToDiagnostic` so its methods
take `DiagnosticBuilder` instead of `Diagnostic`, which leads to many
mechanical changes. `SubdiagnosticMessageOp` gains a type parameter `G`.
There are three subdiagnostics -- `DelayedAtWithoutNewline`,
`DelayedAtWithNewline`, and `InvalidFlushedDelayedDiagnosticLevel` --
that are created within the diagnostics machinery and appended to
external diagnostics. These are handled at the `Diagnostic` level, which
means it's now hard to construct them via `derive(Diagnostic)`, so
instead we construct them by hand. This has no effect on what they look
like when printed.
There are lots of new `allow` markers for `untranslatable_diagnostics`
and `diagnostics_outside_of_impl`. This is because
`#[rustc_lint_diagnostics]` annotations were present on the `Diagnostic`
modifier methods, but missing from the `DiagnosticBuilder` modifier
methods. They're now present.
Implement intrinsics with fallback bodies
fixes#93145 (though we can port many more intrinsics)
cc #63585
The way this works is that the backend logic for generating custom code for intrinsics has been made fallible. The only failure path is "this intrinsic is unknown". The `Instance` (that was `InstanceDef::Intrinsic`) then gets converted to `InstanceDef::Item`, which represents the fallback body. A regular function call to that body is then codegenned. This is currently implemented for
* codegen_ssa (so llvm and gcc)
* codegen_cranelift
other backends will need to adjust, but they can just keep doing what they were doing if they prefer (though adding new intrinsics to the compiler will then require them to implement them, instead of getting the fallback body).
cc `@scottmcm` `@WaffleLapkin`
### todo
* [ ] miri support
* [x] default intrinsic name to name of function instead of requiring it to be specified in attribute
* [x] make sure that the bodies are always available (must be collected for metadata)
Merge `impl_polarity` and `impl_trait_ref` queries
Hopefully this is perf neutral. I want to finish https://github.com/rust-lang/rust/pull/120835 and stop using the HIR in `coherent_trait`, which should then give us a perf improvement.
Dejargonize `subst`
In favor of #110793, replace almost every occurence of `subst` and `substitution` from rustc codes, but they still remains in subtrees under `src/tools/` like clippy and test codes (I'd like to replace them after this)
Fix async closures in CTFE
First commit renames `is_coroutine_or_closure` into `is_closure_like`, because `is_coroutine_or_closure_or_coroutine_closure` seems confusing and long.
Second commit fixes some forgotten cases where we want to handle `TyKind::CoroutineClosure` the same as closures and coroutines.
The test exercises the change to `ValidityVisitor::aggregate_field_path_elem` which is the source of #120946, but not the change to `UsedParamsNeedSubstVisitor`, though I feel like it's not that big of a deal. Let me know if you'd like for me to look into constructing a test for the latter, though I have no idea what it'd look like (we can't assert against `TooGeneric` anywhere?).
Fixes#120946
r? oli-obk cc ``@RalfJung``
large_assignments: Allow moves into functions
Moves into functions are typically implemented with pointer passing
rather than memcpy's at the llvm-ir level, so allow moves into
functions.
Part of the "Differentiate between Operand::Move and Operand::Copy" step of https://github.com/rust-lang/rust/issues/83518.
r? `@oli-obk` (who I think is still E-mentor?)
Invert diagnostic lints.
That is, change `diagnostic_outside_of_impl` and `untranslatable_diagnostic` from `allow` to `deny`, because more than half of the compiler has been converted to use translated diagnostics.
This commit removes more `deny` attributes than it adds `allow` attributes, which proves that this change is warranted.
r? ````@davidtwco````
rustc_monomorphize: fix outdated comment in partition
`max_cgu_count` was removed in 51821515b3, but not comment (usage in `merge_codegen_units` was removed earlier).
r? `@nnethercote`
That is, change `diagnostic_outside_of_impl` and
`untranslatable_diagnostic` from `allow` to `deny`, because more than
half of the compiler has be converted to use translated diagnostics.
This commit removes more `deny` attributes than it adds `allow`
attributes, which proves that this change is warranted.
Remove `track_errors` entirely
follow up to https://github.com/rust-lang/rust/pull/119869
r? `@matthewjasper`
There are some diagnostic changes adding new diagnostics or not emitting some anymore. We can improve upon that in follow-up work imo.
Do not normalize closure signature when building `FnOnce` shim
It is not necessary to normalize the closure signature when building an `FnOnce` shim for an `Fn`/`FnMut` closure. That closure shim is just calling `FnMut::call_mut(&mut self)` anyways.
It's also somewhat sketchy that we were ever doing this to begin with, since we're normalizing with a `ParamEnv::reveal_all()` param-env, which is definitely not right with possibly polymorphic substs.
This cuts out a tiny bit of unnecessary work in `Instance::resolve` and simplifies the signature because now we can unconditionally return an `Instance`.
Use `bool` instead of `PartiolOrd` as return value of the comparison closure in `{slice,Iteraotr}::is_sorted_by`
Changes the function signature of the closure given to `{slice,Iteraotr}::is_sorted_by` to return a `bool` instead of a `PartiolOrd` as suggested by the libs-api team here: https://github.com/rust-lang/rust/issues/53485#issuecomment-1766411980.
This means these functions now return true if the closure returns true for all the pairs of values.
When writing a no_std binary, you'll be greeted with nonsensical errors
mentioning lang items like eh_personality and start. That's pretty bad
because it makes you think that you need to define them somewhere! But
oh no, now you're getting the `internal_features` lint telling you that
you shouldn't use them! But you need a no_std binary! What now?
No problem! Writing a no_std binary is super easy. Just use panic=abort
and supply your own platform specific entrypoint symbol (like `main`)
and you're good to go. Would be nice if the compiler told you that,
right?
This makes it so that it does do that.
`Diagnostic` has 40 methods that return `&mut Self` and could be
considered setters. Four of them have a `set_` prefix. This doesn't seem
necessary for a type that implements the builder pattern. This commit
removes the `set_` prefixes on those four methods.
And make all hand-written `IntoDiagnostic` impls generic, by using
`DiagnosticBuilder::new(dcx, level, ...)` instead of e.g.
`dcx.struct_err(...)`.
This means the `create_*` functions are the source of the error level.
This change will let us remove `struct_diagnostic`.
Note: `#[rustc_lint_diagnostics]` is added to `DiagnosticBuilder::new`,
it's necessary to pass diagnostics tests now that it's used in
`into_diagnostic` functions.
Fix cases where std accidentally relied on inline(never)
This PR increases the power of `-Zcross-crate-inline-threshold=always` so that it applies through `#[inline(never)]`. Note that though this is called "cross-crate-inlining" in this case especially it is _just_ lazy per-CGU codegen. The MIR inliner and LLVM still respect the attribute as much as they ever have.
Trying to bootstrap with the new `-Zcross-crate-inline-threshold=always` change revealed two bugs:
We have special intrinsics `assert_inhabited`, `assert_zero_valid`, and `assert_mem_uniniitalized_valid` which codegen backends will lower to nothing or a call to `panic_nounwind`. Since we may not have any call to `panic_nounwind` in MIR but emit one anyway, we need to specially tell `MirUsedCollector` about this situation.
`#[lang = "start"]` is special-cased already so that `MirUsedCollector` will collect it, but then when we make it cross-crate-inlinable it is only assigned to a CGU based on whether `MirUsedCollector` saw a call to it, which of course we didn't.
---
I started looking into this because https://github.com/rust-lang/rust/pull/118683 revealed a case where we were accidentally relying on a function being `#[inline(never)]`, and cranking up cross-crate-inlinability seems like a way to find other situations like that.
r? `@nnethercote` because I don't like what I'm doing to the CGU partitioning code here but I can't come up with something much better
Tell MirUsedCollector that the pointer alignment checks calls its panic symbol
Fixes https://github.com/rust-lang/rust/pull/118683 (not an issue, but that PR is a basically a bug report)
When we had `panic_immediate_abort` start adding `#[inline]` to this panic function, builds started breaking because we failed to write up the MIR assert terminator to the correct panic shim. Things happened to work before by pure luck because without this feature enabled, the function we're inserting calls to is `#[inline(never)]` so we always generated code for it.
r? bjorn3
Currently we always do this:
```
use rustc_fluent_macro::fluent_messages;
...
fluent_messages! { "./example.ftl" }
```
But there is no need, we can just do this everywhere:
```
rustc_fluent_macro::fluent_messages! { "./example.ftl" }
```
which is shorter.
The `fluent_messages!` macro produces uses of
`crate::{D,Subd}iagnosticMessage`, which means that every crate using
the macro must have this import:
```
use rustc_errors::{DiagnosticMessage, SubdiagnosticMessage};
```
This commit changes the macro to instead use
`rustc_errors::{D,Subd}iagnosticMessage`, which avoids the need for the
imports.
Most notably, this commit changes the `pub use crate::*;` in that file
to `use crate::*;`. This requires a lot of `use` items in other crates
to be adjusted, because everything defined within `rustc_span::*` was
also available via `rustc_span::source_map::*`, which is bizarre.
The commit also removes `SourceMap::span_to_relative_line_string`, which
is unused.
- Sort dependencies and features sections.
- Add `tidy` markers to the sorted sections so they stay sorted.
- Remove empty `[lib`] sections.
- Remove "See more keys..." comments.
Excluded files:
- rustc_codegen_{cranelift,gcc}, because they're external.
- rustc_lexer, because it has external use.
- stable_mir, because it has external use.
This query has a name that sounds general-purpose, but in fact it has
coverage-specific semantics, and (fortunately) is only used by coverage code.
Because it is only ever called once (from one designated CGU), it doesn't need
to be a query, and we can change it to a regular function instead.
Don't store lazyness in `DefKind::TyAlias`
1. Don't store lazyness of a type alias in its `DefKind`, but instead via a query.
2. This allows us to treat type aliases as lazy if `#[feature(lazy_type_alias)]` *OR* if the alias contains a TAIT, rather than having checks for both in separate parts of the codebase.
r? `@oli-obk` cc `@fmease`
rename mir::Constant -> mir::ConstOperand, mir::ConstKind -> mir::Const
Also, be more consistent with the `to/eval_bits` methods... we had some that take a type and some that take a size, and then sometimes the one that takes a type is called `bits_for_ty`.
Turns out that `ty::Const`/`mir::ConstKind` carry their type with them, so we don't need to even pass the type to those `eval_bits` functions at all.
However this is not properly consistent yet: in `ty` we have most of the methods on `ty::Const`, but in `mir` we have them on `mir::ConstKind`. And indeed those two types are the ones that correspond to each other. So `mir::ConstantKind` should actually be renamed to `mir::Const`. But what to do with `mir::Constant`? It carries around a span, that's really more like a constant operand that appears as a MIR operand... it's more suited for `syntax.rs` than `consts.rs`, but the bigger question is, which name should it get if we want to align the `mir` and `ty` types? `ConstOperand`? `ConstOp`? `Literal`? It's not a literal but it has a field called `literal` so it would at least be consistently wrong-ish...
``@oli-obk`` any ideas?
treat host effect params as erased in codegen
This fixes the changes brought to codegen tests when effect params are added to libcore, by not attempting to monomorphize functions that get the host param by being `const fn`.
r? `@oli-obk`
This fixes the changes brought to codegen tests when effect params are
added to libcore, by not attempting to monomorphize functions that get
the host param by being `const fn`.
Allow `large_assignments` for Box/Arc/Rc initialization
Does the `stop linting in box/arc initialization` task of #83518.
r? `@oli-obk` who is E-mentor.
Avoid duplicate `large_assignments` lints
By checking for overlapping spans.
This PR does the "reduce noisiness" task in #83518.
r? `@oli-obk` who added E-mentor and E-help-wanted and wrote the initial code.
(The fix itself is in dc82736677. The two commits before that are just small refactorings.)
Tweak CGU sorting in a couple of places.
In `base.rs`, tweak how the CGU size interleaving works. Since #113777, it's much more common to have multiple CGUs with identical sizes. With the existing code these same-sized items ended up in the opposite-to-desired order due to the stable sorting. The code now starts with a reverse sort (like is done in `partitioning.rs`) which gives the behaviour we want. This doesn't matter much for perf, but makes profiles in `samply` look more like what we expect.
In `partitioning.rs`, we can use `sort_by_key` instead of `sort_by_cached_key` because `CGU::size_estimate()` is cheap. (There is an identical CGU sort earlier in that function that already uses `sort_by_key`.)
r? `@pnkfelix`
In `base.rs`, tweak how the CGU size interleaving works. Since #113777,
it's much more common to have multiple CGUs with identical sizes. With
the existing code these same-sized items ended up in the
opposite-to-desired order due to the stable sorting. The code now starts
with a reverse sort (like is done in `partitioning.rs`) which gives the
behaviour we want. This doesn't matter much for perf, but makes profiles
in `samply` look more like what we expect.
In `partitioning.rs`, we can use `sort_by_key` instead of
`sort_by_cached_key` because `CGU::size_estimate()` is cheap. (There is
an identical CGU sort earlier in that function that already uses
`sort_by_key`.)
Instead of repeatedly merging the two smallest CGUs, we now use a
merging algorithm that aims to minimize the duplication of inlined
functions.
`exa-0.10.1` was one benchmark that saw particularly good results. The
old CGU stats:
```
INTERNALIZE
- unique items: 2774 (1216 root + 1558 inlined), unique size: 122065 (77219 root + 44846 inlined)
- placed items: 3834 (1216 root + 2618 inlined), placed size: 154552 (77219 root + 77333 inlined)
- placed/unique items ratio: 1.38, placed/unique size ratio: 1.27
- CGUs: 16, mean size: 9659.5, sizes: [11791, 11634, 11173, 10987, 10939, 10507, 9992, 9813, 9593, 9580, 9030, 8447, 7975, 7961, 7876, 7254]
```
The new CGU stats:
```
INTERNALIZE
- unique items: 2774 (1216 root + 1558 inlined), unique size: 122065 (77219 root + 44846 inlined)
- placed items: 3626 (1216 root + 2410 inlined), placed size: 147201 (77219 root + 69982 inlined)
- placed/unique items ratio: 1.31, placed/unique size ratio: 1.21
- CGUs: 16, mean size: 9200.1, sizes: [11634, 10939, 10227, 9555, 9178, 9167, 8879, 8804, 8604, 8603 (x3), 8602 (x2), 8601, 8600]
```
The difference is in the number of inlined items. There are 1558 unique
inlined items. With the old algorithm these were placed 2618 times,
resulting in 1060 duplicates. With the new algorithm these were placed
2410 times, resulting in 852 duplicates. Also, the mean CGU size dropped
from 9659.5 to 9200.1, and the CGU size distribution tightened, with the
biggest one a little smaller and the smallest ones a little bigger.
They're quite rare, and ignoring them simplifies things quite a bit, and
further reduces the number of calls to `MonoItem::size_estimate` to the
number of placed items (one per root item, and one or more per reachable
inlined item).
This means we call `MonoItem::size_estimate` (which involves a query)
less often: just once per mono item, and then once more per inline item
placement. After that we can reuse the stored value as necessary. This
means `CodegenUnit::compute_size_estimate` is cheaper.
Rollup of 7 pull requests
Successful merges:
- #112931 (Enable zlib in LLVM on aarch64-apple-darwin)
- #113158 (tests: unset `RUSTC_LOG_COLOR` in a test)
- #113173 (CI: include workflow name in concurrency group)
- #113335 (Reveal opaques in new solver)
- #113390 (CGU formation tweaks)
- #113399 (Structurally normalize again for byte string lit pat checking)
- #113412 (Add basic types to SMIR)
r? `@ghost`
`@rustbot` modify labels: rollup
It makes it sound like the `ExprKind` and `Rvalue` are supposed to represent all pointer related
casts, when in reality their just used to share a some enum variants. Make it clear there these
are only coercion to make it clear why only some pointer related "casts" are in the enum.
For non-incremental builds on Unix, currently all the thread names look
like `opt regex.f10ba03eb5ec7975-cgu.0`. But they are truncated by
`pthread_setname` to `opt regex.f10ba`, hiding the numeric suffix that
distinguishes them. This is really annoying when using a profiler like
Samply.
This commit changes these thread names to a form like `opt cgu.0`, which
is much better.
Currently there are two problems.
First, the CGUS don't end up in size order. The merging loop does sort
by size on each iteration, but we don't sort after the final merge, so
typically there is one CGU out of place. (And sometimes we don't enter
the merging loop at all, in which case they end up in random order.)
Second, we then assign names that differ only by a numeric suffix, and
then we sort them lexicographically by name, giving us an order like
this:
regex.f10ba03eb5ec7975-cgu.1
regex.f10ba03eb5ec7975-cgu.10
regex.f10ba03eb5ec7975-cgu.11
regex.f10ba03eb5ec7975-cgu.12
regex.f10ba03eb5ec7975-cgu.13
regex.f10ba03eb5ec7975-cgu.14
regex.f10ba03eb5ec7975-cgu.15
regex.f10ba03eb5ec7975-cgu.2
regex.f10ba03eb5ec7975-cgu.3
regex.f10ba03eb5ec7975-cgu.4
regex.f10ba03eb5ec7975-cgu.5
regex.f10ba03eb5ec7975-cgu.6
regex.f10ba03eb5ec7975-cgu.7
regex.f10ba03eb5ec7975-cgu.8
regex.f10ba03eb5ec7975-cgu.9
These two problems are really annoying when debugging and profiling the
CGUs.
This commit ensures CGUs are sorted by name *and* reverse sorted by
size. This involves (a) one extra sort by size operation, and (b)
padding the numeric indices with zeroes, e.g.
`regex.f10ba03eb5ec7975-cgu.01`.
(Note that none of this applies for incremental builds, where a
different hash-based CGU naming scheme is used.)
- Rename `create_size_estimate` as `compute_size_estimate`, because that
makes more sense for the second and subsequent calls for each CGU.
- Change `CodegenUnit::size_estimate` from `Option<usize>` to `usize`.
We can still assert that `compute_size_estimate` is called first.
- Move the size estimation for `place_mono_items` inside the function,
for consistency with `merge_codegen_units`.
Because CGU merging relies on CGU sizes, but the CGU sizes before
inlining aren't accurate.
This requires tweaking how the sizes are updated during merging: if CGU
A and B both have an inlined function F, then `size(A + B)` will be a
little less than `size(A) + size(B)`, because `A + B` will only have one
copy of F. Also, the minimum CGU size is increased because it now has to
account for inlined functions.
This change doesn't have much effect on compile perf, but it makes
follow-on changes that involve more sophisticated reasoning about CGU
sizes much easier.
Remove `box_free` lang item
This PR removes the `box_free` lang item, replacing it with `Box`'s `Drop` impl. Box dropping is still slightly magic because the contained value is still dropped by the compiler.
Always put the `create_size_estimate` calls and `debug_dump` calls
within a timed scopes. This makes the four main steps look more similar
to each other.
The comment says "Find the smallest CGU that has exported symbols and
put the dead function stubs in that CGU". But the code sorts the CGUs by
size (smallest first) and then searches them in reverse order, which
means it will find the *largest* CGU that has exported symbols.
The erroneous code was introduced in #92142.
This commit changes it to use a simpler search, avoiding the sort, and
fixes the bug in the process.
Because tiny CGUs make compilation less efficient *and* result in worse
generated code.
We don't do this when the number of CGUs is explicitly given, because
there are times when the requested number is very important, as
described in some comments within the commit. So the commit also
introduces a `CodegenUnits` type that distinguishes between default
values and user-specified values.
This change has a roughly neutral effect on walltimes across the
rustc-perf benchmarks; there are some speedups and some slowdowns. But
it has significant wins for most other metrics on numerous benchmarks,
including instruction counts, cycles, binary size, and max-rss. It also
reduces parallelism, which is good for reducing jobserver competition
when multiple rustc processes are running at the same time. It's smaller
benchmarks that benefit the most; larger benchmarks already have CGUs
that are all larger than the minimum size.
Here are some example before/after CGU sizes for opt builds.
- html5ever
- CGUs: 16, mean size: 1196.1, sizes: [3908, 2992, 1706, 1652, 1572,
1136, 1045, 948, 946, 938, 579, 471, 443, 327, 286, 189]
- CGUs: 4, mean size: 4396.0, sizes: [6706, 3908, 3490, 3480]
- libc
- CGUs: 12, mean size: 35.3, sizes: [163, 93, 58, 53, 37, 8, 2 (x6)]
- CGUs: 1, mean size: 424.0, sizes: [424]
- tt-muncher
- CGUs: 5, mean size: 1819.4, sizes: [8508, 350, 198, 34, 7]
- CGUs: 1, mean size: 9075.0, sizes: [9075]
Note that CGUs of size 100,000+ aren't unusual in larger programs.
This loop is doing two different things. For inlined items, it's adding
them to the CGU. For all items, it's recording them in
`mono_item_placements`.
This commit splits it into two separate loops. This avoids putting root
mono items into `reachable`, and removes the low-value check that
`roots` doesn't contain inlined mono items.
Currently it sorts by symbol name, which is a mangled name like
`_ZN1a4main17hb29587cdb6db5f42E`, which leads to non-obvious orderings.
This commit changes it to use the existing
`items_in_deterministic_order`, which iterates in source code order.
I found this confusing because it includes the root item, plus the
inlined items reachable from the root item. The new formulation
separates the two parts more clearly.