This replaces the drop_in_place reference with null in vtables. On
librustc_driver.so, this drops about ~17k dynamic relocations from the
output, since many vtables can now be placed in read-only memory, rather
than having a relocated pointer included.
This makes a tradeoff by adding a null check at vtable call sites.
That's hard to avoid without changing the vtable format (e.g., to use a
pc-relative relocation instead of an absolute address, and avoid the
dynamic relocation that way). But it seems likely that the check is
cheap at runtime.
rustc_codegen_llvm: add support for writing summary bitcode
Typical uses of ThinLTO don't have any use for this as a standalone file, but distributed ThinLTO uses this to make the linker phase more efficient. With clang you'd do something like `clang -flto=thin -fthin-link-bitcode=foo.indexing.o -c foo.c` and then get both foo.o (full of bitcode) and foo.indexing.o (just the summary or index part of the bitcode). That's then usable by a two-stage linking process that's more friendly to distributed build systems like bazel, which is why I'm working on this area.
I talked some to `@teresajohnson` about naming in this area, as things seem to be a little confused between various blog posts and build systems. "bitcode index" and "bitcode summary" tend to be a little too ambiguous, and she tends to use "thin link bitcode" and "minimized bitcode" (which matches the descriptions in LLVM). Since the clang option is thin-link-bitcode, I went with that to try and not add a new spelling in the world.
Per `@dtolnay,` you can work around the lack of this by using `lld --thinlto-index-only` to do the indexing on regular .o files of bitcode, but that is a bit wasteful on actions when we already have all the information in rustc and could just write out the matching minimized bitcode. I didn't test that at all in our infrastructure, because by the time I learned that I already had this patch largely written.
Typical uses of ThinLTO don't have any use for this as a standalone
file, but distributed ThinLTO uses this to make the linker phase more
efficient. With clang you'd do something like `clang -flto=thin
-fthin-link-bitcode=foo.indexing.o -c foo.c` and then get both foo.o
(full of bitcode) and foo.indexing.o (just the summary or index part of
the bitcode). That's then usable by a two-stage linking process that's
more friendly to distributed build systems like bazel, which is why I'm
working on this area.
I talked some to @teresajohnson about naming in this area, as things
seem to be a little confused between various blog posts and build
systems. "bitcode index" and "bitcode summary" tend to be a little too
ambiguous, and she tends to use "thin link bitcode" and "minimized
bitcode" (which matches the descriptions in LLVM). Since the clang
option is thin-link-bitcode, I went with that to try and not add a new
spelling in the world.
Per @dtolnay, you can work around the lack of this by using `lld
--thinlto-index-only` to do the indexing on regular .o files of
bitcode, but that is a bit wasteful on actions when we already have all
the information in rustc and could just write out the matching minimized
bitcode. I didn't test that at all in our infrastructure, because by the
time I learned that I already had this patch largely written.
`-Z debug-macros` is "stabilized" by enabling it by default and removing.
`-Z collapse-macro-debuginfo` is stabilized as `-C collapse-macro-debuginfo`.
It now supports all typical boolean values (`parse_opt_bool`) in addition to just yes/no.
Default value of `collapse_debuginfo` was changed from `false` to `external` (i.e. collapsed if external, not collapsed if local).
`#[collapse_debuginfo]` attribute without a value is no longer supported to avoid guessing the default.
Subtree sync for rustc_codegen_cranelift
This fixes a crash when compiling the standard library. In addition the Cranelift update fixes all the 128bit int abi incompatibility between cg_clif and cg_llvm.
r? ``@ghost``
``@rustbot`` label +A-codegen +A-cranelift +T-compiler
Dellvmize some intrinsics (use `u32` instead of `Self` in some integer intrinsics)
This implements https://github.com/rust-lang/compiler-team/issues/693 minus what was implemented in #123226.
Note: I decided to _not_ change `shl`/... builder methods, as it just doesn't seem worth it.
r? ``@scottmcm``
Save/restore more items in cache with incremental compilation
Right now they don't play very well together, consider a simple example:
```
$ export RUSTFLAGS="--emit asm"
$ cargo new --lib foo
Created library `foo` package
$ cargo build -q
$ touch src/lib.rs
$ cargo build
error: could not copy
"/path/to/foo/target/debug/deps/foo-e307cc7fa7b6d64f.4qbzn9k8mosu50a5.rcgu.s"
to "/path/to/foo/target/debug/deps/foo-e307cc7fa7b6d64f.s":
No such file or directory (os error 2)
```
Touch triggers the rebuild, incremental compilation detects no changes (yay) and everything explodes while trying to copy files were they should go.
This pull request fixes it by copying and restoring more files in the incremental compilation cache
Fixes https://github.com/rust-lang/rust/issues/89149
Fixes https://github.com/rust-lang/rust/issues/88829
Related: https://internals.rust-lang.org/t/interaction-between-incremental-compilation-and-emit/20551
rename ptr::from_exposed_addr -> ptr::with_exposed_provenance
As discussed on [Zulip](https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/To.20expose.20or.20not.20to.20expose/near/427757066).
The old name, `from_exposed_addr`, makes little sense as it's not the address that is exposed, it's the provenance. (`ptr.expose_addr()` stays unchanged as we haven't found a better option yet. The intended interpretation is "expose the provenance and return the address".)
The new name nicely matches `ptr::without_provenance`.
Add `Ord::cmp` for primitives as a `BinOp` in MIR
Update: most of this OP was written months ago. See https://github.com/rust-lang/rust/pull/118310#issuecomment-2016940014 below for where we got to recently that made it ready for review.
---
There are dozens of reasonable ways to implement `Ord::cmp` for integers using comparison, bit-ops, and branches. Those differences are irrelevant at the rust level, however, so we can make things better by adding `BinOp::Cmp` at the MIR level:
1. Exactly how to implement it is left up to the backends, so LLVM can use whatever pattern its optimizer best recognizes and cranelift can use whichever pattern codegens the fastest.
2. By not inlining those details for every use of `cmp`, we drastically reduce the amount of MIR generated for `derive`d `PartialOrd`, while also making it more amenable to MIR-level optimizations.
Having extremely careful `if` ordering to μoptimize resource usage on broadwell (#63767) is great, but it really feels to me like libcore is the wrong place to put that logic. Similarly, using subtraction [tricks](https://graphics.stanford.edu/~seander/bithacks.html#CopyIntegerSign) (#105840) is arguably even nicer, but depends on the optimizer understanding it (https://github.com/llvm/llvm-project/issues/73417) to be practical. Or maybe [bitor is better than add](https://discourse.llvm.org/t/representing-in-ir/67369/2?u=scottmcm)? But maybe only on a future version that [has `or disjoint` support](https://discourse.llvm.org/t/rfc-add-or-disjoint-flag/75036?u=scottmcm)? And just because one of those forms happens to be good for LLVM, there's no guarantee that it'd be the same form that GCC or Cranelift would rather see -- especially given their very different optimizers. Not to mention that if LLVM gets a spaceship intrinsic -- [which it should](https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Suboptimal.20inlining.20in.20std.20function.20.60binary_search.60/near/404250586) -- we'll need at least a rustc intrinsic to be able to call it.
As for simplifying it in Rust, we now regularly inline `{integer}::partial_cmp`, but it's quite a large amount of IR. The best way to see that is with 8811efa88b (diff-d134c32d028fbe2bf835fef2df9aca9d13332dd82284ff21ee7ebf717bfa4765R113) -- I added a new pre-codegen MIR test for a simple 3-tuple struct, and this PR change it from 36 locals and 26 basic blocks down to 24 locals and 8 basic blocks. Even better, as soon as the construct-`Some`-then-match-it-in-same-BB noise is cleaned up, this'll expose the `Cmp == 0` branches clearly in MIR, so that an InstCombine (#105808) can simplify that to just a `BinOp::Eq` and thus fix some of our generated code perf issues. (Tracking that through today's `if a < b { Less } else if a == b { Equal } else { Greater }` would be *much* harder.)
---
r? `@ghost`
But first I should check that perf is ok with this
~~...and my true nemesis, tidy.~~
Simplify trim-paths feature by merging all debuginfo options together
This PR simplifies the trim-paths feature by merging all debuginfo options together, as described in https://github.com/rust-lang/rust/issues/111540#issuecomment-1994010274.
And also do some correctness fixes found during the review.
cc `@weihanglo`
r? `@michaelwoerister`
Codegen const panic messages as function calls
This skips emitting extra arguments at every callsite (of which there
can be many). For a librustc_driver build with overflow checks enabled,
this cuts 0.7MB from the resulting shared library (see [perf]).
A sample improvement from nightly:
```
leaq str.0(%rip), %rdi
leaq .Lalloc_d6aeb8e2aa19de39a7f0e861c998af13(%rip), %rdx
movl $25, %esi
callq *_ZN4core9panicking5panic17h17cabb89c5bcc999E@GOTPCREL(%rip)
```
to this PR:
```
leaq .Lalloc_d6aeb8e2aa19de39a7f0e861c998af13(%rip), %rdi
callq *_RNvNtNtCsduqIKoij8JB_4core9panicking11panic_const23panic_const_div_by_zero@GOTPCREL(%rip)
```
[perf]: https://perf.rust-lang.org/compare.html?start=a7e4de13c1785819f4d61da41f6704ed69d5f203&end=64fbb4f0b2d621ff46d559d1e9f5ad89a8d7789b&stat=instructions:u
Remove `TypeAndMut` from `ty::RawPtr` variant, make it take `Ty` and `Mutability`
Pretty much mechanically converting `ty::RawPtr(ty::TypeAndMut { ty, mutbl })` to `ty::RawPtr(ty, mutbl)` and its fallout.
r? lcnr
cc rust-lang/types-team#124
"Handle" calls to upstream monomorphizations in compiler_builtins
This is pretty cooked, but I think it works.
compiler-builtins has a long-standing problem that at link time, its rlib cannot contain any calls to `core`. And yet, in codegen we _love_ inserting calls to symbols in `core`, generally from various panic entrypoints.
I intend this PR to attack that problem as completely as possible. When we generate a function call, we now check if we are generating a function call from `compiler_builtins` and whether the callee is a function which was not lowered in the current crate, meaning we will have to link to it.
If those conditions are met, actually generating the call is asking for a linker error. So we don't. If the callee diverges, we lower to an abort with the same behavior as `core::intrinsics::abort`. If the callee does not diverge, we produce an error. This means that compiler-builtins can contain panics, but they'll SIGILL instead of panicking. I made non-diverging calls a compile error because I'm guessing that they'd mostly get into compiler-builtins by someone making a mistake while working on the crate, and compile errors are better than linker errors. We could turn such calls into aborts as well if that's preferred.
This skips emitting extra arguments at every callsite (of which there
can be many). For a librustc_driver build with overflow checks enabled,
this cuts 0.7MB from the resulting binary.
add test ensuring simd codegen checks don't run when a static assertion failed
stdarch relies on this to ensure that SIMD indices are in bounds.
I would love to know why this works, but I can't figure out where codegen decides to not codegen a function if a required-const does not evaluate. `@oli-obk` `@bjorn3` do you have any idea?
Distinguish between library and lang UB in assert_unsafe_precondition
As described in https://github.com/rust-lang/rust/pull/121583#issuecomment-1963168186, `assert_unsafe_precondition` now explicitly distinguishes between language UB (conditions we explicitly optimize on) and library UB (things we document you shouldn't do, and maybe some library internals assume you don't do).
`debug_assert_nounwind` was originally added to avoid the "only at runtime" aspect of `assert_unsafe_precondition`. Since then the difference between the macros has gotten muddied. This totally revamps the situation.
Now _all_ preconditions shall be checked with `assert_unsafe_precondition`. If you have a precondition that's only checkable at runtime, do a `const_eval_select` hack, as done in this PR.
r? RalfJung
Add asm goto support to `asm!`
Tracking issue: #119364
This PR implements asm-goto support, using the syntax described in "future possibilities" section of [RFC2873](https://rust-lang.github.io/rfcs/2873-inline-asm.html#asm-goto).
Currently I have only implemented the `label` part, not the `fallthrough` part (i.e. fallthrough is implicit). This doesn't reduce the expressive though, since you can use label-break to get arbitrary control flow or simply set a value and rely on jump threading optimisation to get the desired control flow. I can add that later if deemed necessary.
r? ``@Amanieu``
cc ``@ojeda``
Remove useless lifetime of ArchiveBuilder
`trait ArchiveBuilder<'a>` has a seemingly useless lifetime a, so I remove it. If this is intentional, please reject this PR.
```rust
pub trait ArchiveBuilder<'a> {
fn add_file(&mut self, path: &Path);
fn add_archive(
&mut self,
archive: &Path,
skip: Box<dyn FnMut(&str) -> bool + 'static>,
) -> io::Result<()>;
fn build(self: Box<Self>, output: &Path) -> bool;
}
```
Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison
Setting all of LLVM's fast-math flags makes our fast-math intrinsics very dangerous, because some inputs are UB. This set of flags permits common algebraic transformations, but according to the [LangRef](https://llvm.org/docs/LangRef.html#fastmath), only the flags `nnan` (no nans) and `ninf` (no infs) can produce poison.
And this uses the algebraic float ops to fix https://github.com/rust-lang/rust/issues/120720
cc `@orlp`
That is, change `diagnostic_outside_of_impl` and
`untranslatable_diagnostic` from `allow` to `deny`, because more than
half of the compiler has be converted to use translated diagnostics.
This commit removes more `deny` attributes than it adds `allow`
attributes, which proves that this change is warranted.
Replacement of #114390: Add new intrinsic `is_var_statically_known` and optimize pow for powers of two
This adds a new intrinsic `is_val_statically_known` that lowers to [``@llvm.is.constant.*`](https://llvm.org/docs/LangRef.html#llvm-is-constant-intrinsic).` It also applies the intrinsic in the int_pow methods to recognize and optimize the idiom `2isize.pow(x)`. See #114390 for more discussion.
While I have extended the scope of the power of two optimization from #114390, I haven't added any new uses for the intrinsic. That can be done in later pull requests.
Note: When testing or using the library, be sure to use `--stage 1` or higher. Otherwise, the intrinsic will be a noop and the doctests will be skipped. If you are trying out edits, you may be interested in [`--keep-stage 0`](https://rustc-dev-guide.rust-lang.org/building/suggested.html#faster-builds-with---keep-stage).
Fixes#47234Resolves#114390
`@Centri3`
Improved support of collapse_debuginfo attribute for macros.
Added walk_chain_collapsed function to consider collapse_debuginfo attribute in parent macros in call chain.
Fixed collapse_debuginfo attribute processing for cranelift (there was if/else branches error swap).
cc https://github.com/rust-lang/rust/issues/100758
Remove `DiagCtxt` API duplication
`DiagCtxt` defines the internal API for creating and emitting diagnostics: methods like `struct_err`, `struct_span_warn`, `note`, `create_fatal`, `emit_bug`. There are over 50 methods.
Some of these methods are then duplicated across several other types: `Session`, `ParseSess`, `Parser`, `ExtCtxt`, and `MirBorrowckCtxt`. `Session` duplicates the most, though half the ones it does are unused. Each duplicated method just calls forward to the corresponding method in `DiagCtxt`. So this duplication exists to (in the best case) shorten chains like `ecx.tcx.sess.parse_sess.dcx.emit_err()` to `ecx.emit_err()`.
This API duplication is ugly and has been bugging me for a while. And it's inconsistent: there's no real logic about which methods are duplicated, and the use of `#[rustc_lint_diagnostic]` and `#[track_caller]` attributes vary across the duplicates.
This PR removes the duplicated API methods and makes all diagnostic creation and emission go through `DiagCtxt`. It also adds `dcx` getter methods to several types to shorten chains. This approach scales *much* better than API duplication; indeed, the PR adds `dcx()` to numerous types that didn't have API duplication: `TyCtxt`, `LoweringCtxt`, `ConstCx`, `FnCtxt`, `TypeErrCtxt`, `InferCtxt`, `CrateLoader`, `CheckAttrVisitor`, and `Resolver`. These result in a lot of changes from `foo.tcx.sess.emit_err()` to `foo.dcx().emit_err()`. (You could do this with more types, but it gets into diminishing returns territory for types that don't emit many diagnostics.)
After all these changes, some call sites are more verbose, some are less verbose, and many are the same. The total number of lines is reduced, mostly because of the removed API duplication. And consistency is increased, because calls to `emit_err` and friends are always preceded with `.dcx()` or `.dcx`.
r? `@compiler-errors`
detects redundant imports that can be eliminated.
for #117772 :
In order to facilitate review and modification, split the checking code and
removing redundant imports code into two PR.
compile-time evaluation: detect writes through immutable pointers
This has two motivations:
- it unblocks https://github.com/rust-lang/rust/pull/116745 (and therefore takes a big step towards `const_mut_refs` stabilization), because we can now detect if the memory that we find in `const` can be interned as "immutable"
- it would detect the UB that was uncovered in https://github.com/rust-lang/rust/pull/117905, which was caused by accidental stabilization of `copy` functions in `const` that can only be called with UB
When UB is detected, we emit a future-compat warn-by-default lint. This is not a breaking change, so completely in line with [the const-UB RFC](https://rust-lang.github.io/rfcs/3016-const-ub.html), meaning we don't need t-lang FCP here. I made the lint immediately show up for dependencies since it is nearly impossible to even trigger this lint without `const_mut_refs` -- the accidentally stabilized `copy` functions are the only way this can happen, so the crates that popped up in #117905 are the only causes of such UB (in the code that crater covers), and the three cases of UB that we know about have all been fixed in their respective crates already.
The way this is implemented is by making use of the fact that our interpreter is already generic over the notion of provenance. For CTFE we now use the new `CtfeProvenance` type which is conceptually an `AllocId` plus a boolean `immutable` flag (but packed for a more efficient representation). This means we can mark a pointer as immutable when it is created as a shared reference. The flag will be propagated to all pointers derived from this one. We can then check the immutable flag on each write to reject writes through immutable pointers.
I just hope perf works out.
Currently, `Handler::fatal` returns `FatalError`. But `Session::fatal`
returns `!`, because it calls `Handler::fatal` and then calls `raise` on
the result. This inconsistency is unfortunate.
This commit changes `Handler::fatal` to do the `raise` itself, changing
its return type to `!`. This is safe because there are only two calls to
`Handler::fatal`, one in `rustc_session` and one in
`rustc_codegen_cranelift`, and they both call `raise` on the result.
`HandlerInner::fatal` still returns `FatalError`, so I renamed it
`fatal_no_raise` to emphasise the return type difference.
share some track_caller logic between interpret and codegen
Also move the code that implements the track_caller intrinsics out of the core interpreter engine -- it's just a helper creating a const-allocation, doesn't need to be part of the interpreter core.
Bump stdarch submodule and remove special handling for LLVM intrinsics that are no longer needed
Bumps stdarch to pull https://github.com/rust-lang/stdarch/pull/1477, which reimplemented some functions with portable SIMD intrinsics instead of arch specific LLVM intrinsics.
Handling of those LLVM intrinsics is removed from cranelift codegen and miri.
cc `@RalfJung` `@bjorn3`
Prototype using const generic for simd_shuffle IDX array
cc https://github.com/rust-lang/rust/issues/85229
r? `@workingjubilee` on the design
TLDR: there is now a `fn simd_shuffle_generic<T, U, const IDX: &'static [u32]>(x: T, y: T) -> U;` intrinsic that allows replacing
```rust
simd_shuffle(a, b, const { stuff })
```
with
```rust
simd_shuffle_generic::<_, _, {&stuff}>(a, b)
```
which makes the compiler implementations much simpler, if we manage to at some point eliminate `simd_shuffle`.
There are some issues with this today though (can't do math without bubbling it up in the generic arguments). With this change, we can start porting the simple cases and get better data on the others.
rename mir::Constant -> mir::ConstOperand, mir::ConstKind -> mir::Const
Also, be more consistent with the `to/eval_bits` methods... we had some that take a type and some that take a size, and then sometimes the one that takes a type is called `bits_for_ty`.
Turns out that `ty::Const`/`mir::ConstKind` carry their type with them, so we don't need to even pass the type to those `eval_bits` functions at all.
However this is not properly consistent yet: in `ty` we have most of the methods on `ty::Const`, but in `mir` we have them on `mir::ConstKind`. And indeed those two types are the ones that correspond to each other. So `mir::ConstantKind` should actually be renamed to `mir::Const`. But what to do with `mir::Constant`? It carries around a span, that's really more like a constant operand that appears as a MIR operand... it's more suited for `syntax.rs` than `consts.rs`, but the bigger question is, which name should it get if we want to align the `mir` and `ty` types? `ConstOperand`? `ConstOp`? `Literal`? It's not a literal but it has a field called `literal` so it would at least be consistently wrong-ish...
``@oli-obk`` any ideas?
move required_consts check to general post-mono-check function
This factors some code that is common between the interpreter and the codegen backends into shared helper functions. Also as a side-effect the interpreter now uses the same `eval` functions as everyone else to get the evaluated MIR constants.
Also this is in preparation for another post-mono check that will be needed for (the current hackfix for) https://github.com/rust-lang/rust/issues/115709: ensuring that all locals are dynamically sized.
I didn't expect this to change diagnostics, but it's just cycle errors that change.
r? `@oli-obk`
Remove `verbose_generic_activity_with_arg`
This removes `verbose_generic_activity_with_arg` and changes users to `generic_activity_with_arg`. This keeps the output of `-Z time` readable while these repeated events are still available with the self profiling mechanism.
Update stdarch submodule and remove special handling in cranelift codegen for some AVX and SSE2 LLVM intrinsics
https://github.com/rust-lang/stdarch/pull/1463 reimplemented some x86 intrinsics to avoid using some x86-specific LLVM intrinsics:
* Store unaligned (`_mm*_storeu_*`) use `<*mut _>::write_unaligned` instead of `llvm.x86.*.storeu.*`.
* Shift by immediate (`_mm*_s{ll,rl,ra}i_epi*`) use `if` (srl, sll) or `min` (sra) to simulate the behaviour when the RHS is out of range. RHS is constant, so the `if`/`min` will be optimized away.
This PR updates the stdarch submodule to pull these changes and removes special handling for those LLVM intrinsics from cranelift codegen. I left gcc codegen untouched because there are some autogenerated lists.
As experimentation in 115242 has shown looks better than `coldcc`.
And *don't* use a different convention for cold on Windows, because that actually ends up making things worse.
cc tracking issue 97544
Rollup of 6 pull requests
Successful merges:
- #110435 (rustdoc-json: Add test for field ordering.)
- #111891 (feat: `riscv-interrupt-{m,s}` calling conventions)
- #114377 (test_get_dbpath_for_term(): handle non-utf8 paths (fix FIXME))
- #114469 (Detect method not found on arbitrary self type with different mutability)
- #114587 (Convert Const to Allocation in smir)
- #114670 (Don't use `type_of` to determine if item has intrinsic shim)
Failed merges:
- #114599 (Add impl trait declarations to SMIR)
r? `@ghost`
`@rustbot` modify labels: rollup
Similar to prior support added for the mips430, avr, and x86 targets
this change implements the rough equivalent of clang's
[`__attribute__((interrupt))`][clang-attr] for riscv targets, enabling
e.g.
```rust
static mut CNT: usize = 0;
pub extern "riscv-interrupt-m" fn isr_m() {
unsafe {
CNT += 1;
}
}
```
to produce highly effective assembly like:
```asm
pub extern "riscv-interrupt-m" fn isr_m() {
420003a0: 1141 addi sp,sp,-16
unsafe {
CNT += 1;
420003a2: c62a sw a0,12(sp)
420003a4: c42e sw a1,8(sp)
420003a6: 3fc80537 lui a0,0x3fc80
420003aa: 63c52583 lw a1,1596(a0) # 3fc8063c <_ZN12esp_riscv_rt3CNT17hcec3e3a214887d53E.0>
420003ae: 0585 addi a1,a1,1
420003b0: 62b52e23 sw a1,1596(a0)
}
}
420003b4: 4532 lw a0,12(sp)
420003b6: 45a2 lw a1,8(sp)
420003b8: 0141 addi sp,sp,16
420003ba: 30200073 mret
```
(disassembly via `riscv64-unknown-elf-objdump -C -S --disassemble ./esp32c3-hal/target/riscv32imc-unknown-none-elf/release/examples/gpio_interrupt`)
This outcome is superior to hand-coded interrupt routines which, lacking
visibility into any non-assembly body of the interrupt handler, have to
be very conservative and save the [entire CPU state to the stack
frame][full-frame-save]. By instead asking LLVM to only save the
registers that it uses, we defer the decision to the tool with the best
context: it can more accurately account for the cost of spills if it
knows that every additional register used is already at the cost of an
implicit spill.
At the LLVM level, this is apparently [implemented by] marking every
register as "[callee-save]," matching the semantics of an interrupt
handler nicely (it has to leave the CPU state just as it found it after
its `{m|s}ret`).
This approach is not suitable for every interrupt handler, as it makes
no attempt to e.g. save the state in a user-accessible stack frame. For
a full discussion of those challenges and tradeoffs, please refer to
[the interrupt calling conventions RFC][rfc].
Inside rustc, this implementation differs from prior art because LLVM
does not expose the "all-saved" function flavor as a calling convention
directly, instead preferring to use an attribute that allows for
differentiating between "machine-mode" and "superivsor-mode" interrupts.
Finally, some effort has been made to guide those who may not yet be
aware of the differences between machine-mode and supervisor-mode
interrupts as to why no `riscv-interrupt` calling convention is exposed
through rustc, and similarly for why `riscv-interrupt-u` makes no
appearance (as it would complicate future LLVM upgrades).
[clang-attr]: https://clang.llvm.org/docs/AttributeReference.html#interrupt-risc-v
[full-frame-save]: 9281af2ecf/src/lib.rs (L440-L469)
[implemented by]: b7fb2a3fec/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp (L61-L67)
[callee-save]: 973f1fe7a8/llvm/lib/Target/RISCV/RISCVCallingConv.td (L30-L37)
[rfc]: https://github.com/rust-lang/rfcs/pull/3246
Add a new `compare_bytes` intrinsic instead of calling `memcmp` directly
As discussed in #113435, this lets the backends be the place that can have the "don't call the function if n == 0" logic, if it's needed for the target. (I didn't actually *add* those checks, though, since as I understood it we didn't actually need them on known targets?)
Doing this also let me make it `const` (unstable), which I don't think `extern "C" fn memcmp` can be.
cc `@RalfJung` `@Amanieu`
Resurrect: rustc_target: Add alignment to indirectly-passed by-value types, correcting the alignment of byval on x86 in the process.
Same as #111551, which I [accidentally closed](https://github.com/rust-lang/rust/pull/111551#issuecomment-1571222612) :/
---
This resurrects PR #103830, which has sat idle for a while.
Beyond #103830, this also:
- fixes byval alignment for types containing vectors on Darwin (see `tests/codegen/align-byval-vector.rs`)
- fixes byval alignment for overaligned types on x86 Windows (see `tests/codegen/align-byval.rs`)
- fixes ABI for types with 128bit requested alignment on ARM64 Linux (see `tests/codegen/aarch64-struct-align-128.rs`)
r? `@nikic`
---
`@pcwalton's` original PR description is reproduced below:
Commit 88e4d2c from five years ago removed
support for alignment on indirectly-passed arguments because of problems with
the `i686-pc-windows-msvc` target. Unfortunately, the `memcpy` optimizations I
recently added to LLVM 16 depend on this to forward `memcpy`s. This commit
attempts to fix the problems with `byval` parameters on that target and now
correctly adds the `align` attribute.
The problem is summarized in [this comment] by `@eddyb.` Briefly, 32-bit x86 has
special alignment rules for `byval` parameters: for the most part, their
alignment is forced to 4. This is not well-documented anywhere but in the Clang
source. I looked at the logic in Clang `TargetInfo.cpp` and tried to replicate
it here. The relevant methods in that file are
`X86_32ABIInfo::getIndirectResult()` and
`X86_32ABIInfo::getTypeStackAlignInBytes()`. The `align` parameter attribute
for `byval` parameters in LLVM must match the platform ABI, or miscompilations
will occur. Note that this doesn't use the approach suggested by eddyb, because
I felt it was overkill to store the alignment in `on_stack` when special
handling is really only needed for 32-bit x86.
As a side effect, this should fix#80127, because it will make the `align`
parameter attribute for `byval` parameters match the platform ABI on LLVM
x86-64.
[this comment]: #80822 (comment)
It makes it sound like the `ExprKind` and `Rvalue` are supposed to represent all pointer related
casts, when in reality their just used to share a some enum variants. Make it clear there these
are only coercion to make it clear why only some pointer related "casts" are in the enum.
`lookup_debug_loc` calls `SourceMap::lookup_line`, which does a binary
search over the files, and then a binary search over the lines within
the found file. It then calls `SourceFile::line_begin_pos`, which redoes
the binary search over the lines within the found file.
This commit removes the second binary search over the lines, instead
getting the line starting pos directly using the result of the first
binary search over the lines.
(And likewise for `get_span_loc`, in the cranelift backend.)