Support for `f16` and `f128` is varied across targets, backends, and
backend versions. Eventually we would like to reach a point where all
backends support these approximately equally, but until then we have to
work around some of these nuances of support being observable.
Introduce the `cfg_target_has_reliable_f16_f128` internal feature, which
provides the following new configuration gates:
* `cfg(target_has_reliable_f16)`
* `cfg(target_has_reliable_f16_math)`
* `cfg(target_has_reliable_f128)`
* `cfg(target_has_reliable_f128_math)`
`reliable_f16` and `reliable_f128` indicate that basic arithmetic for
the type works correctly. The `_math` versions indicate that anything
relying on `libm` works correctly, since sometimes this hits a separate
class of codegen bugs.
These options match configuration set by the build script at [1]. The
logic for LLVM support is duplicated as-is from the same script. There
are a few possible updates that will come as a follow up.
The config introduced here is not planned to ever become stable, it is
only intended to replace the build scripts for `std` tests and
`compiler-builtins` that don't have any way to configure based on the
codegen backend.
MCP: https://github.com/rust-lang/compiler-team/issues/866
Closes: https://github.com/rust-lang/compiler-team/issues/866
[1]: 555e1d0386/library/std/build.rs (L84-L186)
Make #![feature(let_chains)] bootstrap conditional in compiler/
Let chains have been stabilized recently in #132833, so we can remove the gating from our uses in the compiler (as the compiler uses edition 2024).
Autodiff flags
Interestingly, it seems that some other projects have conflicts with exactly the same LLVM optimization passes as autodiff.
At least `LLVMRustOptimize` has exactly the flags that we need to disable problematic opt passes.
This PR enables us to compile code where users differentiate two identical functions in the same module. This has been especially common in test cases, but it's not impossible to encounter in the wild.
It also enables two new flags for testing/debugging. I consider writing an MCP to upgrade PrintPasses to be a standalone -Z flag, since it is *not* the same as `-Z print-llvm-passes`, which IMHO gives less useful output. A discussion can be found here: [#t-compiler/llvm > Print llvm passes. @ 💬](https://rust-lang.zulipchat.com/#narrow/channel/187780-t-compiler.2Fllvm/topic/Print.20llvm.20passes.2E/near/511533038)
Finally, it improves `PrintModBefore` and `PrintModAfter`. They used to work reliable, but now we just schedule enzyme as part of an existing ModulePassManager (MPM). Since Enzyme is last in the MPM scheduling, PrintModBefore became very inaccurate. It used to print the input module, which we gave to the Enzyme and was great to create llvm-ir reproducer. However, lately the MPM would run the whole `default<O3>` pipeline, which heavily modifies the llvm module, before we pass it to Enzyme. That made it impossible to use the flag to create llvm-ir reproducers for Enzyme bugs. We now schedule a PrintModule pass just before Enzyme, solving this problem.
Based on the PrintPass output, it also _seems_ like changing `registerEnzymeAndPassPipeline(PB, true);` to `registerEnzymeAndPassPipeline(PB, false);` has no effect. In theory, the bool should tell Enzyme to schedule some helpful passes in the PassBuilder. However, since it doesn't do anything and I'm not 100% sure anymore on whether we really need it, I'll just disable it for now and postpone investigations.
r? ``@oli-obk``
closes#139471
Tracking:
- https://github.com/rust-lang/rust/issues/124509
mitigate MSVC alignment issue on x86-32
This implements mitigation for https://github.com/rust-lang/rust/issues/112480 by stopping to emit `align` attributes on loads and function arguments when building for a win32 MSVC target. MSVC is known to not properly align `u64` and similar types, and claiming to LLVM that everything is properly aligned increases the chance that this will cause problems.
Of course, the misalignment is still a bug, but we can't fix that bug, only MSVC can.
Also add an errata note to the platform support page warning users about this known problem.
try-job: `i686-msvc*`
simd intrinsics with mask: accept unsigned integer masks, and fix some of the errors
It's not clear at all why the mask would have to be signed, it is anyway interpreted bitwise. The backend should just make sure that works no matter the surface-level type; our LLVM backend already does this correctly. The note of "the mask may be widened, which only has the correct behavior for signed integers" explains... nothing? Why can't the code do the widening correctly? If necessary, just cast to the signed type first...
Also while we are at it, fix the errors. For simd_masked_load/store, the errors talked about the "third argument" but they meant the first argument (the mask is the first argument there). They also used the wrong type for `expected_element`.
I have extremely low confidence in the GCC part of this PR.
See [discussion on Zulip](https://rust-lang.zulipchat.com/#narrow/channel/257879-project-portable-simd/topic/On.20the.20sign.20of.20masks)
avoid overflow when generating debuginfo for expanding recursive types
Fixes#135093Fixes#121538Fixes#107362Fixes#100618Fixes#115994
The overflow happens because expanding recursive types keep creating new nested types when recurring into sub fields.
I fixed that by returning an empty stub node when expanding recursion is detected.
Autodiff batching2
~I will rebase it once my first PR landed.~ done.
This autodiff batch mode is more similar to scalar autodiff, since it still only takes one shadow argument.
However, that argument is supposed to be `width` times larger.
r? `@oli-obk`
Tracking:
- https://github.com/rust-lang/rust/issues/124509
Prepend temp files with per-invocation random string to avoid temp filename conflicts
https://github.com/rust-lang/rust/issues/139407 uncovered a very subtle unsoundness with incremental codegen, failing compilation sessions (due to assembler errors), and the "prefer hard linking over copying files" strategy we use in the compiler for file management.
Specifically, imagine we're building a single file 3 times, all with `-Csave-temps -Cincremental=...`. Let's call the object file we're building for the codegen unit for `main` "`XXX.o`" just for clarity since it's probably some gigantic hash name:
```
#[inline(never)]
#[cfg(any(rpass1, rpass3))]
fn a() -> i32 {
0
}
#[cfg(any(cfail2))]
fn a() -> i32 {
1
}
fn main() {
evil::evil();
assert_eq!(a(), 0);
}
mod evil {
#[cfg(any(rpass1, rpass3))]
pub fn evil() {
unsafe {
std::arch::asm!("/* */");
}
}
#[cfg(any(cfail2))]
pub fn evil() {
unsafe {
std::arch::asm!("missing");
}
}
}
```
Session 1 (`rpass1`):
* Type-check, borrow-check, etc.
* Serialize the dep graph to the incremental working directory `.../s-...-working/`.
* Codegen object file to a temp file `XXX.rcgu.o` which is spit out in the cwd.
* Hard-link[^1] `XXX.rcgu.o` to the incremental working directory `.../s-...-working/XXX.o`.
* Save-temps option means we don't delete `XXX.rgcu.o`.
* Link the binary and stuff.
* Finalize[^2] the working incremental session by renaming `.../s-...-working` to ` s-...-asjkdhsjakd` (some other finalized incr comp session dir name).
Session 2 (`cfail2`):
* Load artifacts from the previous *finalized* incremental session, namely the dep graph.
* Type-check, borrow-check, etc. since the file has changed, so most dep graph nodes are red.
* Serialize the dep graph to the incremental working directory `.../s-...-working/`.
* Codegen object file to a temp file `XXX.rcgu.o`. **HERE IS THE PROBLEM**: The hard-link is still set up to point to the inode from `XXX.o` from the first session, so this also modifies the `XXX.o` in the previous finalized session directory.
* Codegen emits an error b/c `missing` is not an instruction, so we abort before finalizing the incremental session. Specifically, this means that the *previous* session is the last finalized session.
Session 3 (`rpass3`):
* Load artifacts from the previous *finalized* incremental session, namely the dep graph. NOTE that this is from session 1.
* All the dep graph nodes are green since we are basically replaying session 1.
* codegen object file `XXX.o`, which is detected as *reused* from session 1 since dep nodes were green. That means we **reuse** `XXX.o` which had been dirtied from session 2.
* Link the binary and stuff.
This results in a binary which reuses some of the build artifacts from session 2, but thinks it's from session 1.
At this point, I hope it's clear to see that the incremental results from session 1 were dirtied from session 2, but we reuse them as if session 1 was the previous (finalized) incremental session we ran. This is at best really buggy, and at worst **unsound**.
This isn't limited to `-C save-temps`, since there are other combinations of flags that may keep around temporary files (hard linked) in the working directory (like `-C debuginfo=1 -C split-debuginfo=unpacked` on darwin, for example).
---
This PR implements a fix which is to prepend temp filenames with a random string that is generated per invocation of rustc. This string is not *deterministic*, but temporary files are transient anyways, so I don't believe this is a problem.
That means that temp files are now something like... `{crate-name}.{cgu}.{invocation_temp}.rcgu.o`, where `{invocation_temp}` is the new temporary string we generate per invocation of rustc.
Fixes https://github.com/rust-lang/rust/issues/139407
[^1]: 175dcc7773/compiler/rustc_fs_util/src/lib.rs (L60)
[^2]: 175dcc7773/compiler/rustc_incremental/src/persist/fs.rs (L1-L40)
add `core::intrinsics::simd::{simd_extract_dyn, simd_insert_dyn}`
fixes https://github.com/rust-lang/rust/issues/137372
adds `core::intrinsics::simd::{simd_extract_dyn, simd_insert_dyn}`, which contrary to their non-dyn counterparts allow a non-const index. Many platforms (but notably not x86_64 or aarch64) have dedicated instructions for this operation, which stdarch can emit with this change.
Future work is to also make the `Index` operation on the `Simd` type emit this operation, but the intrinsic can't be used directly. We'll need some MIR shenanigans for that.
r? `@ghost`
add sret handling for scalar autodiff
r? `@oli-obk`
Fixing one of the todo's which I left in my previous batching PR.
This one handles sret for scalar autodiff. `sret` mostly shows up when we try to return a lot of scalar floats.
People often start testing autodiff which toy functions which just use a few scalars as inputs and outputs, and those were the most likely to be affected by this issue. So this fix should make learning/teaching hopefully a bit easier.
Tracking:
- https://github.com/rust-lang/rust/issues/124509
coverage: Build the CGU's global file table as late as possible
Embedding coverage metadata in the output binary is a delicate dance, because per-function records need to embed references to the per-CGU filename table, but we only want to include files in that table if they are successfully used by at least one function.
The way that we build the file tables has changed a few times over the last few years. This particular change is motivated by experimental work on properly supporting macro-expansion regions, which adds some additional constraints that our previous implementation wasn't equipped to deal with.
LLVM is very strict about not allowing unused entries in local file tables. Currently that's not much of an issue, because we assume one source file per function, but to support expansion regions we need the flexibility to avoid committing to the use of a file until we're completely sure that we are able and willing to produce at least one coverage mapping region for it. In particular, when preparing a function's covfun record, we need the flexibility to decide at a late stage that a particular file isn't needed/usable after all.
(It's OK for the *global* file table to contain unused entries, but we would still prefer to avoid that if possible, and this implementation also achieves that.)
Autodiff batching
Enzyme supports batching, which is especially known from the ML side when training neural networks.
There we would normally have a training loop, where in each iteration we would pass in some data (e.g. an image), and a target vector. Based on how close we are with our prediction we compute our loss, and then use backpropagation to compute the gradients and update our weights.
That's quite inefficient, so what you normally do is passing in a batch of 8/16/.. images and targets, and compute the gradients for those all at once, allowing better optimizations.
Enzyme supports batching in two ways, the first one (which I implemented here) just accepts a Batch size,
and then each Dual/Duplicated argument has not one, but N shadow arguments. So instead of
```rs
for i in 0..100 {
df(x[i], y[i], 1234);
}
```
You can now do
```rs
for i in 0..100.step_by(4) {
df(x[i+0],x[i+1],x[i+2],x[i+3], y[i+0], y[i+1], y[i+2], y[i+3], 1234);
}
```
which will give the same results, but allows better compiler optimizations. See the testcase for details.
There is a second variant, where we can mark certain arguments and instead of having to pass in N shadow arguments, Enzyme assumes that the argument is N times longer. I.e. instead of accepting 4 slices with 12 floats each, we would accept one slice with 48 floats. I'll implement this over the next days.
I will also add more tests for both modes.
For any one preferring some more interactive explanation, here's a video of Tim's llvm dev talk, where he presents his work. https://www.youtube.com/watch?v=edvaLAL5RqU
I'll also add some other docs to the dev guide and user docs in another PR.
r? ghost
Tracking:
- https://github.com/rust-lang/rust/issues/124509
- https://github.com/rust-lang/rust/issues/135283
Rename `is_like_osx` to `is_like_darwin`
Replace `is_like_osx` with `is_like_darwin`, which more closely describes reality (OS X is the pre-2016 name for macOS, and is by now quite outdated; Darwin is the overall name for the OS underlying Apple's macOS, iOS, etc.).
``@rustbot`` label O-apple
r? compiler
Add the new `amx` target features and the `movrs` target feature
Adds 5 new `amx` target features included in LLVM20. These are guarded under `x86_amx_intrinsics` (#126622)
- `amx-avx512`
- `amx-fp8`
- `amx-movrs`
- `amx-tf32`
- `amx-transpose`
Adds the `movrs` target feature (from #137976).
`@rustbot` label O-x86_64 O-x86_32 T-compiler A-target-feature
r? `@Amanieu`
Avoid wrapping constant allocations in packed structs when not necessary
This way LLVM will set the string merging flag if the alloc is a nul terminated string, reducing binary sizes.
try-job: armhf-gnu
cg_llvm: Reduce the visibility of types, modules and using declarations in `rustc_codegen_llvm`.
Final part of #135502
Reduces the visibility of types, modules and using declarations in the `rustc_codegen_llvm` to private or `pub(crate)` where possible, and marks unused fields and enum entries with `#[expect(dead_code)]`.
r? Zalathar
Lower BinOp::Cmp to llvm.{s,u}cmp.* intrinsics
Lowers `mir::BinOp::Cmp` (`three_way_compare` intrinsic) to the corresponding LLVM `llvm.{s,u}cmp.i8.*` intrinsics.
These are the intrinsics mentioned in https://github.com/rust-lang/rust/pull/118310, which are now available in LLVM 19.
I couldn't find any follow-up PRs/discussions about this, please let me know if I missed something.
r? `@scottmcm`
For expansion region support, we will want to be able to convert and check
spans before creating a corresponding local file ID.
If we create local file IDs eagerly, but some expansion turns out to have no
successfully-converted spans, LLVM will complain about that expansion's file ID
having no regions.
Mangle rustc_std_internal_symbols functions
This reduces the risk of issues when using a staticlib or rust dylib compiled with a different rustc version in a rust program. Currently this will either (in the case of staticlib) cause a linker error due to duplicate symbol definitions, or (in the case of rust dylibs) cause rustc_std_internal_symbols functions to be silently overridden. As rust gets more commonly used inside the implementation of libraries consumed with a C interface (like Spidermonkey, Ruby YJIT (curently has to do partial linking of all rust code to hide all symbols not part of the C api), the Rusticl OpenCL implementation in mesa) this is becoming much more of an issue. With this PR the only symbols remaining with an unmangled name are rust_eh_personality (LLVM doesn't allow renaming it) and `__rust_no_alloc_shim_is_unstable`.
Helps mitigate https://github.com/rust-lang/rust/issues/104707
try-job: aarch64-gnu-debug
try-job: aarch64-apple
try-job: x86_64-apple-1
try-job: x86_64-mingw-1
try-job: i686-mingw-1
try-job: x86_64-msvc-1
try-job: i686-msvc-1
try-job: test-various
try-job: armhf-gnu
Emit function declarations for functions with `#[linkage="extern_weak"]`
Currently, when declaring an extern weak function in Rust, we use the following syntax:
```rust
unsafe extern "C" {
#[linkage = "extern_weak"]
static FOO: Option<unsafe extern "C" fn() -> ()>;
}
```
This allows runtime-checking the extern weak symbol through the Option.
When emitting LLVM-IR, the Rust compiler currently emits this static as an i8, and a pointer that is initialized with the value of the global i8 and represents the nullabilty e.g.
```
`@FOO` = extern_weak global i8
`@_rust_extern_with_linkage_FOO` = internal global ptr `@FOO`
```
This approach does not work well with CFI, where we need to attach CFI metadata to a concrete function declaration, which was pointed out in https://github.com/rust-lang/rust/issues/115199.
This change switches to emitting a proper function declaration instead of a global i8. This allows CFI to work for extern_weak functions. Example:
```
`@_rust_extern_with_linkage_FOO` = internal global ptr `@FOO`
...
declare !type !61 !type !62 !type !63 !type !64 extern_weak void `@FOO(double)` unnamed_addr #6
```
We keep initializing the Rust internal symbol with the function declaration, which preserves the correct behavior for runtime checking the Option.
r? `@rcvalle`
cc `@jakos-sec`
try-job: test-various
Currently, when declaring an extern weak function in Rust, we use the
following syntax:
```rust
unsafe extern "C" {
#[linkage = "extern_weak"]
static FOO: Option<unsafe extern "C" fn() -> ()>;
}
```
This allows runtime-checking the extern weak symbol through the Option.
When emitting LLVM-IR, the Rust compiler currently emits this static
as an i8, and a pointer that is initialized with the value of the global
i8 and represents the nullabilty e.g.
```
@FOO = extern_weak global i8
@_rust_extern_with_linkage_FOO = internal global ptr @FOO
```
This approach does not work well with CFI, where we need to attach CFI
metadata to a concrete function declaration, which was pointed out in
https://github.com/rust-lang/rust/issues/115199.
This change switches to emitting a proper function declaration instead
of a global i8. This allows CFI to work for extern_weak functions.
We keep initializing the Rust internal symbol with the function
declaration, which preserves the correct behavior for runtime checking
the Option.
Co-authored-by: Jakob Koschel <jakobkoschel@google.com>
Speed up target feature computation
The LLVM backend calls `LLVMRustHasFeature` twice for every feature. In short-running rustc invocations, this accounts for a surprising amount of work.
r? `@bjorn3`
Revert <https://github.com/rust-lang/rust/pull/138084> to buy time to
consider options that avoids breaking downstream usages of cargo on
distributed `rustc-src` artifacts, where such cargo invocations fail due
to inability to inherit `lints` from workspace root manifest's
`workspace.lints` (this is only valid for the source rust-lang/rust
workspace, but not really the distributed `rustc-src` artifacts).
This breakage was reported in
<https://github.com/rust-lang/rust/issues/138304>.
This reverts commit 48caf81484, reversing
changes made to c6662879b2.
Apply dllimport in ThinLTO
This partially reverts https://github.com/rust-lang/rust/pull/103353 by properly applying `dllimport` if `-Z dylib-lto` is passed. That PR should probably fully be reverted as it looks quite sketchy. We don't know locally if the entire crate graph would be statically linked.
This should hopefully be sufficient to make ThinLTO work for rustc on Windows.
r? ``@wesleywiser``
---
Edit: This PR is changed to just generally revert https://github.com/rust-lang/rust/pull/103353.
By naming them in `[workspace.lints.rust]` in the top-level
`Cargo.toml`, and then making all `compiler/` crates inherit them with
`[lints] workspace = true`. (I omitted `rustc_codegen_{cranelift,gcc}`,
because they're a bit different.)
The advantages of this over the current approach:
- It uses a standard Cargo feature, rather than special handling in
bootstrap. So, easier to understand, and less likely to get
accidentally broken in the future.
- It works for proc macro crates.
It's a shame it doesn't work for rustc-specific lints, as the comments
explain.
Clean up various LLVM FFI things in codegen_llvm
cc ```@ZuseZ4``` I touched some autodiff parts
The major change of this PR is [bfd88ce](bfd88cead0) which makes `CodegenCx` generic just like `GenericBuilder`
The other commits mostly took advantage of the new feature of making extern functions safe, but also just used some wrappers that were already there and shrunk unsafe blocks.
best reviewed commit-by-commit
Currently it is called twice, once with `allow_unstable` set to true and
once with it set to false. This results in some duplicated work. Most
notably, for the LLVM backend, `LLVMRustHasFeature` is called twice for
every feature, and it's moderately slow. For very short running
compilations on platforms with many features (e.g. a `check` build of
hello-world on x86) this is a significant fraction of runtime.
This commit changes `target_features_cfg` so it is only called once, and
it now returns a pair of feature sets. This halves the number of
`LLVMRustHasFeature` calls.
Rollup of 12 pull requests
Successful merges:
- #135767 (Future incompatibility warning `unsupported_fn_ptr_calling_conventions`: Also warn in dependencies)
- #137852 (Remove layouting dead code for non-array SIMD types.)
- #137863 (Fix pretty printing of unsafe binders)
- #137882 (do not build additional stage on compiler paths)
- #137894 (Revert "store ScalarPair via memset when one side is undef and the other side can be memset")
- #137902 (Make `ast::TokenKind` more like `lexer::TokenKind`)
- #137921 (Subtree update of `rust-analyzer`)
- #137922 (A few cleanups after the removal of `cfg(not(parallel))`)
- #137939 (fix order on shl impl)
- #137946 (Fix docker run-local docs)
- #137955 (Always allow rustdoc-json tests to contain long lines)
- #137958 (triagebot.toml: Don't label `test/rustdoc-json` as A-rustdoc-search)
r? `@ghost`
`@rustbot` modify labels: rollup
Stop using `hash_raw_entry` in `CodegenCx::const_str`
That unstable feature (#56167) completed fcp-close, so the compiler needs to be
migrated away to allow its removal. In this case, `cg_llvm` and `cg_gcc`
were using raw entries to optimize their `const_str_cache` lookup and
insertion. We can change that to separate `get` and (on miss) `insert`
calls, so we still have the fast path avoiding string allocation when
the cache hits.
rename BackendRepr::Vector → SimdVector
For many Rustaceans, "vector" does not imply "SIMD", so let's be more clear in this type that is used pervasively in the compiler.
r? `@workingjubilee`
The embedded bitcode should always be prepared for LTO/ThinLTO
Fixes#115344. Fixes#117220.
There are currently two methods for generating bitcode that used for LTO. One method involves using `-C linker-plugin-lto` to emit object files as bitcode, which is the typical setting used by cargo. The other method is through `-C embed-bitcode=yes`.
When using with `-C embed-bitcode=yes -C lto=no`, we run a complete non-LTO LLVM pipeline to obtain bitcode, then the bitcode is used for LTO. We run the Call Graph Profile Pass twice on the same module.
This PR is doing something similar to LLVM's `buildFatLTODefaultPipeline`, obtaining the bitcode for embedding after running `buildThinLTOPreLinkDefaultPipeline`.
r? nikic
Fix enzyme build errors
After [this PR](https://github.com/rust-lang/rust/pull/136428) was merged, I switched to master and attempted building `./x.py build --stage 1 library` with the config mentioned in the enzyme rustbook but it resulted in some errors tho the config.example.toml build succeeded
The errors were re:
### 1. Use of ref in match patterns
The errors were related to match ergonomics in Rust 2024, where ref is no longer needed when matching on references. Examples:
```
error: binding modifiers may only be written when the default binding mode is `move`
--> compiler/rustc_builtin_macros/src/autodiff.rs:136:31
|
136 | Annotatable::Item(ref iitem) => {
| ^^^ binding modifier not allowed under `ref` default binding mode
|
= note: for more information, see <https://doc.rust-lang.org/nightly/edition-guide/rust-2024/match-ergonomics.html>
note: matching on a reference type with a non-reference pattern changes the default binding mode
--> compiler/rustc_builtin_macros/src/autodiff.rs:136:13
|
136 | Annotatable::Item(ref iitem) => {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this matches on type `&_`
help: remove the unnecessary binding modifier
|
136 - Annotatable::Item(ref iitem) => {
136 + Annotatable::Item(iitem) => {
|
error: binding modifiers may only be written when the default binding mode is `move`
--> compiler/rustc_builtin_macros/src/autodiff.rs:146:36
|
146 | Annotatable::AssocItem(ref assoc_item, _) => {
| ^^^ binding modifier not allowed under `ref` default binding mode
|
= note: for more information, see <https://doc.rust-lang.org/nightly/edition-guide/rust-2024/match-ergonomics.html>
note: matching on a reference type with a non-reference pattern changes the default binding mode
--> compiler/rustc_builtin_macros/src/autodiff.rs:146:13
|
146 | Annotatable::AssocItem(ref assoc_item, _) => {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this matches on type `&_`
help: remove the unnecessary binding modifier
|
146 - Annotatable::AssocItem(ref assoc_item, _) => {
146 + Annotatable::AssocItem(assoc_item, _) => {
|
error: binding modifiers may only be written when the default binding mode is `move`
--> compiler/rustc_builtin_macros/src/autodiff.rs:174:31
|
174 | ... Annotatable::Item(ref iitem) => (iitem.vis.clone(), iitem.ide...
| ^^^ binding modifier not allowed under `ref` default binding mode
|
= note: for more information, see <https://doc.rust-lang.org/nightly/edition-guide/rust-2024/match-ergonomics.html>
note: matching on a reference type with a non-reference pattern changes the default binding mode
--> compiler/rustc_builtin_macros/src/autodiff.rs:174:13
|
174 | ... Annotatable::Item(ref iitem) => (iitem.vis.clone(), iitem.ident.c...
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this matches on type `&_`
help: remove the unnecessary binding modifier
|
174 - Annotatable::Item(ref iitem) => (iitem.vis.clone(), iitem.ident.clone()),
174 + Annotatable::Item(iitem) => (iitem.vis.clone(), iitem.ident.clone()),
|
error: binding modifiers may only be written when the default binding mode is `move`
--> compiler/rustc_builtin_macros/src/autodiff.rs:175:36
|
175 | Annotatable::AssocItem(ref assoc_item, _) => {
| ^^^ binding modifier not allowed under `ref` default binding mode
|
= note: for more information, see <https://doc.rust-lang.org/nightly/edition-guide/rust-2024/match-ergonomics.html>
note: matching on a reference type with a non-reference pattern changes the default binding mode
--> compiler/rustc_builtin_macros/src/autodiff.rs:175:13
|
175 | Annotatable::AssocItem(ref assoc_item, _) => {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this matches on type `&_`
help: remove the unnecessary binding modifier
|
175 - Annotatable::AssocItem(ref assoc_item, _) => {
175 + Annotatable::AssocItem(assoc_item, _) => {
|
error: could not compile `rustc_builtin_macros` (lib) due to 4 previous errors
warning: build failed, waiting for other jobs to finish...
Build completed unsuccessfully in 0:19:39
```
### 2. the use of external C blocks without unsafe in compiler/rustc_codegen_llvm/src/llvm/enzyme_ffi.rs (I don't have the error message handy)
The first commit fixes the errors above
---
## Additional Improvement:
`@ZuseZ4` suggested we consolidate the variants under `#[cfg(llvm_enzyme)]` and `#[cfg(not(llvm_enzyme))]` by conditionally checking for `cfg!(llvm_enzyme)` instead. This way, the autodiff code is compiled but not executed avoiding such regressions
r? `@ZuseZ4`
cc: `@oli-obk`
That unstable feature completed fcp-close, so the compiler needs to be
migrated away to allow its removal. In this case, `cg_llvm` and `cg_gcc`
were using raw entries to optimize their `const_str_cache` lookup and
insertion. We can change that to separate `get` and (on miss) `insert`
calls, so we still have the fast path avoiding string allocation when
the cache hits.
codegen_llvm: avoid `Deref` impls w/ extern type
`rustc_codegen_llvm` relied on `Deref` impls where `Deref::Target` was or contained an extern type - in my experimental implementation of rust-lang/rfcs#3729, this isn't possible as the `Target` associated type's `?Sized` bound cannot be relaxed backwards compatibly (unless we come up with some way of doing this).
In later pull requests with the rust-lang/rfcs#3729 implementation, breakage like this could only occur for nightly users relying on the `extern_types` feature.
Upstreaming this to avoid needing to keep carrying this patch locally, and I think it'll necessarily need to change eventually.
remove `simd_fpow` and `simd_fpowi`
Discussed in https://github.com/rust-lang/rust/issues/137555
These functions are not exposed from `std::intrinsics::simd`, and not used anywhere outside of the compiler. They also don't lower to particularly good code at least on the major ISAs (I checked x86_64, aarch64, s390x, powerpc), where the vector is just spilled to the stack and scalar functions are used for the actual logic.
r? `@RalfJung`
`rustc_codegen_llvm` relied on `Deref` impls where `Deref::Target` was
or contained an extern type - in my experimental implementation of
rust-lang/rfcs#3729, this isn't possible as the `Target` associated
type's `?Sized` bound cannot be relaxed backwards compatibly (unless we
come up with some way of doing this).
In later pull requests with the rust-lang/rfcs#3729 implementation,
breakage like this could only occur for nightly users relying on the
`extern_types` feature.
Upstreaming this to avoid needing to keep carrying this patch locally,
and I think it'll necessarily need to change eventually.
Emit getelementptr inbounds nuw for pointer::add()
Lower pointer::add (via intrinsic::offset with unsigned offset) to getelementptr inbounds nuw on LLVM versions that support it. This lets LLVM make use of the pre-condition that the offset addition does not wrap in an unsigned sense. Together with inbounds, this also implies that the offset is non-negative.
Fixes https://github.com/rust-lang/rust/issues/137217.
intrinsics: unify rint, roundeven, nearbyint in a single round_ties_even intrinsic
LLVM has three intrinsics here that all do the same thing (when used in the default FP environment). There's no reason Rust needs to copy that historically-grown mess -- let's just have one intrinsic and leave it up to the LLVM backend to decide how to lower that.
Suggested by `@hanna-kruppe` in https://github.com/rust-lang/rust/issues/136459; Cc `@tgross35`
try-job: test-various
Some codegen_llvm cleanups
Using some more safe wrappers and thus being able to remove a large unsafe block.
As a next step we should probably look into safe extern fns
- For shifts this shrinks the IR by no longer needing an `assume` while still providing the UB information
- Having this on the `i8`→`i1` truncations will hopefully help with some places that have to load `i8`s or pass those in LLVM structs without range information