Prepend temp files with per-invocation random string to avoid temp filename conflicts
https://github.com/rust-lang/rust/issues/139407 uncovered a very subtle unsoundness with incremental codegen, failing compilation sessions (due to assembler errors), and the "prefer hard linking over copying files" strategy we use in the compiler for file management.
Specifically, imagine we're building a single file 3 times, all with `-Csave-temps -Cincremental=...`. Let's call the object file we're building for the codegen unit for `main` "`XXX.o`" just for clarity since it's probably some gigantic hash name:
```
#[inline(never)]
#[cfg(any(rpass1, rpass3))]
fn a() -> i32 {
0
}
#[cfg(any(cfail2))]
fn a() -> i32 {
1
}
fn main() {
evil::evil();
assert_eq!(a(), 0);
}
mod evil {
#[cfg(any(rpass1, rpass3))]
pub fn evil() {
unsafe {
std::arch::asm!("/* */");
}
}
#[cfg(any(cfail2))]
pub fn evil() {
unsafe {
std::arch::asm!("missing");
}
}
}
```
Session 1 (`rpass1`):
* Type-check, borrow-check, etc.
* Serialize the dep graph to the incremental working directory `.../s-...-working/`.
* Codegen object file to a temp file `XXX.rcgu.o` which is spit out in the cwd.
* Hard-link[^1] `XXX.rcgu.o` to the incremental working directory `.../s-...-working/XXX.o`.
* Save-temps option means we don't delete `XXX.rgcu.o`.
* Link the binary and stuff.
* Finalize[^2] the working incremental session by renaming `.../s-...-working` to ` s-...-asjkdhsjakd` (some other finalized incr comp session dir name).
Session 2 (`cfail2`):
* Load artifacts from the previous *finalized* incremental session, namely the dep graph.
* Type-check, borrow-check, etc. since the file has changed, so most dep graph nodes are red.
* Serialize the dep graph to the incremental working directory `.../s-...-working/`.
* Codegen object file to a temp file `XXX.rcgu.o`. **HERE IS THE PROBLEM**: The hard-link is still set up to point to the inode from `XXX.o` from the first session, so this also modifies the `XXX.o` in the previous finalized session directory.
* Codegen emits an error b/c `missing` is not an instruction, so we abort before finalizing the incremental session. Specifically, this means that the *previous* session is the last finalized session.
Session 3 (`rpass3`):
* Load artifacts from the previous *finalized* incremental session, namely the dep graph. NOTE that this is from session 1.
* All the dep graph nodes are green since we are basically replaying session 1.
* codegen object file `XXX.o`, which is detected as *reused* from session 1 since dep nodes were green. That means we **reuse** `XXX.o` which had been dirtied from session 2.
* Link the binary and stuff.
This results in a binary which reuses some of the build artifacts from session 2, but thinks it's from session 1.
At this point, I hope it's clear to see that the incremental results from session 1 were dirtied from session 2, but we reuse them as if session 1 was the previous (finalized) incremental session we ran. This is at best really buggy, and at worst **unsound**.
This isn't limited to `-C save-temps`, since there are other combinations of flags that may keep around temporary files (hard linked) in the working directory (like `-C debuginfo=1 -C split-debuginfo=unpacked` on darwin, for example).
---
This PR implements a fix which is to prepend temp filenames with a random string that is generated per invocation of rustc. This string is not *deterministic*, but temporary files are transient anyways, so I don't believe this is a problem.
That means that temp files are now something like... `{crate-name}.{cgu}.{invocation_temp}.rcgu.o`, where `{invocation_temp}` is the new temporary string we generate per invocation of rustc.
Fixes https://github.com/rust-lang/rust/issues/139407
[^1]: 175dcc7773/compiler/rustc_fs_util/src/lib.rs (L60)
[^2]: 175dcc7773/compiler/rustc_incremental/src/persist/fs.rs (L1-L40)
Tell LLVM about impossible niche tags
I was trying to find a better way of emitting discriminant calculations, but sadly had no luck.
So here's a fairly small PR with the bits that did seem worth bothering:
1. As the [`TagEncoding::Niche` docs](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_abi/enum.TagEncoding.html#variant.Niche) describe, it's possible to end up with a dead value in the input that's not already communicated via the range parameter attribute nor the range load metadata attribute. So this adds an `llvm.assume` in non-debug mode to tell LLVM about that. (That way it can tell that the sides of the `select` have disjoint possible values.)
2. I'd written a bunch more tests, or at least made them parameterized, in the process of trying things out, so this checks in those tests to hopefully help future people not trip on the same weird edge cases, like when the tag type is `i8` but yet there's still a variant index and discriminant of `258` which doesn't fit in that tag type because the enum is really weird.
Refactor Apple version handling in the compiler
Move various Apple version handling code in the compiler out `rustc_codegen_ssa` and into a place where it can be accessed by `rustc_attr_parsing`, which I found to be necessary when doing https://github.com/rust-lang/rust/pull/136867. Thought I'd split it out to make it easier to land, and to make further changes like https://github.com/rust-lang/rust/pull/131477 have fewer conflicts / PR dependencies.
There should be no functional changes in this PR.
`@rustbot` label O-apple
r? rust-lang/compiler
Autodiff batching
Enzyme supports batching, which is especially known from the ML side when training neural networks.
There we would normally have a training loop, where in each iteration we would pass in some data (e.g. an image), and a target vector. Based on how close we are with our prediction we compute our loss, and then use backpropagation to compute the gradients and update our weights.
That's quite inefficient, so what you normally do is passing in a batch of 8/16/.. images and targets, and compute the gradients for those all at once, allowing better optimizations.
Enzyme supports batching in two ways, the first one (which I implemented here) just accepts a Batch size,
and then each Dual/Duplicated argument has not one, but N shadow arguments. So instead of
```rs
for i in 0..100 {
df(x[i], y[i], 1234);
}
```
You can now do
```rs
for i in 0..100.step_by(4) {
df(x[i+0],x[i+1],x[i+2],x[i+3], y[i+0], y[i+1], y[i+2], y[i+3], 1234);
}
```
which will give the same results, but allows better compiler optimizations. See the testcase for details.
There is a second variant, where we can mark certain arguments and instead of having to pass in N shadow arguments, Enzyme assumes that the argument is N times longer. I.e. instead of accepting 4 slices with 12 floats each, we would accept one slice with 48 floats. I'll implement this over the next days.
I will also add more tests for both modes.
For any one preferring some more interactive explanation, here's a video of Tim's llvm dev talk, where he presents his work. https://www.youtube.com/watch?v=edvaLAL5RqU
I'll also add some other docs to the dev guide and user docs in another PR.
r? ghost
Tracking:
- https://github.com/rust-lang/rust/issues/124509
- https://github.com/rust-lang/rust/issues/135283
Rename `is_like_osx` to `is_like_darwin`
Replace `is_like_osx` with `is_like_darwin`, which more closely describes reality (OS X is the pre-2016 name for macOS, and is by now quite outdated; Darwin is the overall name for the OS underlying Apple's macOS, iOS, etc.).
``@rustbot`` label O-apple
r? compiler
Improve `xcrun` error handling
The compiler invokes `xcrun` on macOS when linking Apple targets, to find the Xcode SDK which contain all the necessary linker stubs. The error messages that `xcrun` outputs aren't always that great though, so this PR tries to improve that by providing extra context when an error occurs.
Fixes https://github.com/rust-lang/rust/issues/56829.
Fixes https://github.com/rust-lang/rust/issues/84534.
Part of https://github.com/rust-lang/rust/issues/129432.
See also the alternative https://github.com/rust-lang/rust/pull/131433.
Tested on:
- `x86_64-apple-darwin`, MacBook Pro running Mac OS X 10.12.6
- With no tooling installed
- With Xcode 9.2
- With Xcode 9.2 Commandline Tools
- `aarch64-apple-darwin`, MacBook M2 Pro running macOS 14.7.4
- With Xcode 13.4.1
- With Xcode 16.2
- Inside `nix-shell -p xcbuild` (nixpkgs' `xcrun` shim)
- `aarch64-apple-darwin`, VM running macOS 15.3.1
- With no tooling installed
- With Xcode 16.2 Commandline Tools
``@rustbot`` label O-apple
r? compiler
CC ``@BlackHoleFox`` ``@thomcc``
add FCW to warn about wasm ABI transition
See https://github.com/rust-lang/rust/issues/122532 for context: the "C" ABI on wasm32-unk-unk will change. The goal of this lint is to warn about any function definition and calls whose behavior will be affected by the change. My understanding is the following:
- scalar arguments are fine
- including 128 bit types, they get passed as two `i64` arguments in both ABIs
- `repr(C)` structs (recursively) wrapping a single scalar argument are fine (unless they have extra padding due to over-alignment attributes)
- all return values are fine
`@bjorn3` `@alexcrichton` `@Manishearth` is that correct?
I am making this a "show up in future compat reports" lint to maximize the chances people become aware of this. OTOH this likely means warnings for most users of Diplomat so maybe we shouldn't do this?
IIUC, wasm-bindgen should be unaffected by this lint as they only pass scalar types as arguments.
Tracking issue: https://github.com/rust-lang/rust/issues/138762
Transition plan blog post: https://github.com/rust-lang/blog.rust-lang.org/pull/1531
try-job: dist-various-2
Remove InstanceKind::generates_cgu_internal_copy
This PR should not contain any behavior changes. Before this PR, the logic for selecting instantiation mode is spread across all of
* `instantiation_mode`
* `cross_crate_inlinable`
* `generates_cgu_internal_copy`
* `requires_inline`
The last two of those functions are not well-designed. The function that actually decides if we generate a CGU-internal copy is `instantiation_mode`, _not_ `generates_cgu_internal_copy`. The function `requires_inline` documents that it is about the LLVM `inline` attribute and that it is a hint. The LLVM attribute is called `inlinehint`, this function is also used by other codegen backends, and since it is part of instantiation mode selection it is *not* a hint.
The goal of this PR is to start cleaning up the logic into a sequence of checks that have a more logical flow and are easier to customize in the future (to do things like improve incrementality or improve optimizations without causing obscure linker errors because you forgot to update another part of the compiler).
Lower to a memset(undef) when Rvalue::Repeat repeats uninit
Fixes https://github.com/rust-lang/rust/issues/138625.
It is technically correct to just do nothing. But if we actually do nothing, we may miss that this is de-initializing something, so instead we just lower to a single memset that writes undef. This is still superior to the memcpy loop, in both quality of code we hand to the backend and LLVM's final output.
Lower BinOp::Cmp to llvm.{s,u}cmp.* intrinsics
Lowers `mir::BinOp::Cmp` (`three_way_compare` intrinsic) to the corresponding LLVM `llvm.{s,u}cmp.i8.*` intrinsics.
These are the intrinsics mentioned in https://github.com/rust-lang/rust/pull/118310, which are now available in LLVM 19.
I couldn't find any follow-up PRs/discussions about this, please let me know if I missed something.
r? `@scottmcm`
Don't attempt to export compiler-builtins symbols from rust dylibs
They are marked with hidden visibility to prevent them from getting exported, so we shouldn't ask the linker to export them anyway. The only thing that does it cause a warning on macOS.
Part of https://github.com/rust-lang/rust/issues/136096
cc `@jyn514`
Avoid no-op unlink+link dances in incr comp
Incremental compilation scales quite poorly with the number of CGUs. This PR improves one reason for that.
The incr comp process hard-links all the files from an old session into a new one, then it runs the backend, which may just hard-link the new session files into the output directory. Then codegen hard-links all the output files back to the new session directory.
This PR (perhaps unimaginatively) fixes the silliness that ensues in the last step. The old `link_or_copy` implementation would be passed pairs of paths which are already the same inode, then it would blindly delete the destination and re-create the hard-link that it just deleted. This PR lets us skip both those operations. We don't skip the other two hard-links.
`cargo +stage1 b && touch crates/core/main.rs && strace -cfw -elink,linkat,unlink,unlinkat cargo +stage1 b` before and then after on `ripgrep-13.0.0`:
```
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
52.56 0.024950 25 978 485 unlink
34.38 0.016318 22 727 linkat
13.06 0.006200 24 249 unlinkat
------ ----------- ----------- --------- --------- ----------------
100.00 0.047467 24 1954 485 total
```
```
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
42.83 0.014521 57 252 unlink
38.41 0.013021 26 486 linkat
18.77 0.006362 25 249 unlinkat
------ ----------- ----------- --------- --------- ----------------
100.00 0.033904 34 987 total
```
This reduces the number of hard-links that are causing perf troubles, noted in https://github.com/rust-lang/rust/issues/64291 and https://github.com/rust-lang/rust/issues/137560
They are marked with hidden visibility to prevent them from getting
exported, so we shouldn't ask the linker to export them anyway. The only
thing that does it cause a warning on macOS.
It's very useful. There are some false positives involving integration
tests in `rustc_pattern_analysis` and `rustc_serialize`. There is also a
false positive involving `rustc_driver_impl`'s
`rustc_randomized_layouts` feature. And I removed a `rustc_span` mention
in a doc comment in `rustc_log` because it wasn't integral to the
comment but caused a dev-dependency.
Rollup of 7 pull requests
Successful merges:
- #138384 (Move `hir::Item::ident` into `hir::ItemKind`.)
- #138508 (Clarify "owned data" in E0515.md)
- #138531 (Store test diffs in job summaries and improve analysis formatting)
- #138533 (Only use `DIST_TRY_BUILD` for try jobs that were not selected explicitly)
- #138556 (Fix ICE: attempted to remap an already remapped filename)
- #138608 (rustc_target: Add target feature constraints for LoongArch)
- #138619 (Flatten `if`s in `rustc_codegen_ssa`)
r? `@ghost`
`@rustbot` modify labels: rollup
Mangle rustc_std_internal_symbols functions
This reduces the risk of issues when using a staticlib or rust dylib compiled with a different rustc version in a rust program. Currently this will either (in the case of staticlib) cause a linker error due to duplicate symbol definitions, or (in the case of rust dylibs) cause rustc_std_internal_symbols functions to be silently overridden. As rust gets more commonly used inside the implementation of libraries consumed with a C interface (like Spidermonkey, Ruby YJIT (curently has to do partial linking of all rust code to hide all symbols not part of the C api), the Rusticl OpenCL implementation in mesa) this is becoming much more of an issue. With this PR the only symbols remaining with an unmangled name are rust_eh_personality (LLVM doesn't allow renaming it) and `__rust_no_alloc_shim_is_unstable`.
Helps mitigate https://github.com/rust-lang/rust/issues/104707
try-job: aarch64-gnu-debug
try-job: aarch64-apple
try-job: x86_64-apple-1
try-job: x86_64-mingw-1
try-job: i686-mingw-1
try-job: x86_64-msvc-1
try-job: i686-msvc-1
try-job: test-various
try-job: armhf-gnu
Denote `ControlFlow` as `#[must_use]`
I've repeatedly hit bugs in the compiler due to `ControlFlow` not being marked `#[must_use]`. There seems to be an accepted ACP to make the type `#[must_use]` (https://github.com/rust-lang/libs-team/issues/444), so this PR implements that part of it.
Most of the usages in the compiler that trigger this new warning are "root" usages (calling into an API that uses control-flow internally, but for which the callee doesn't really care) and have been suppressed by `let _ = ...`, but I did legitimately find one instance of a missing `?` and one for a never-used `ControlFlow` value in #137448.
Presumably this needs an FCP too, so I'm opening this and nominating it for T-libs-api.
This PR also touches the tools (incl. rust-analyzer), but if this went into FCP, I'd split those out into separate PRs which can land before this one does.
r? libs-api
`@rustbot` label: T-libs-api I-libs-api-nominated
Rollup of 5 pull requests
Successful merges:
- #138283 (Enforce type of const param correctly in MIR typeck)
- #138439 (feat: check ARG_MAX on Unix platforms)
- #138502 (resolve: Avoid some unstable iteration)
- #138514 (Remove fake borrows of refs that are converted into non-refs in `MakeByMoveBody`)
- #138524 (Mark myself as unavailable for reviews temporarily)
r? `@ghost`
`@rustbot` modify labels: rollup
On Unix the limits can be gargantuan anyway so we're pretty
unlikely to hit them, but might still exceed it.
We consult ARG_MAX here to get an estimate.
fix: remove the check of lld not supporting @response-file
In LLVM v9, lld has supported `@response-file.`
LLVM v9 was released on 2019-09-19.
The check was added back to 2018-03-14 (1.26.0) via 04442af18b.
It has been more than five years, and we ship our own lld regardlessly.
This should be happily removed.
See also:
* <bb12396f91>
* <https://reviews.llvm.org/D63024>
atomic intrinsics: clarify which types are supported and (if applicable) what happens with provenance
The provenance semantics match what Miri implements and what the `AtomicPtr` API expects.
Don't `alloca` just to look at a discriminant
Today we're making LLVM do a bunch of extra work when you match on trivial stuff like `Option<bool>` or `ControlFlow<u8>`.
This PR changes that so that simple types like `Option<u32>` or `Result<(), Box<Error>>` can stay as `OperandValue::ScalarPair` and we can still read the discriminant from them, rather than needing to write them into memory to have a `PlaceValue` just to get the discriminant out.
Fixes#137503
attempt to support `BinaryFormat::Xcoff` in `naked_asm!`
Fixes https://github.com/rust-lang/rust/issues/137219
So, the inline assembly support for xcoff is extremely limited. The LLVM [XCOFFAsmParser](1b25c0c4da/llvm/lib/MC/MCParser/XCOFFAsmParser.cpp) does not support many of the attributes that LLVM itself emits, and that should exist based on [the assembler docs](https://www.ibm.com/docs/en/ssw_aix_71/assembler/assembler_pdf.pdf). It also does accept some that should not exist based on those docs.
So, I've tried to do the best I can given those limitations. At least it's better than emitting the directives for elf and having that fail somewhere deep in LLVM. Given that inline assembly for this target is incomplete (under `asm_experimental_arch`), I think that's OK (and again I don't see how we can do better given the limitations in LLVM).
r? ```@Noratrieb``` (given that you reviewed https://github.com/rust-lang/rust/pull/136637)
It seems reasonable to ping the [`powerpc64-ibm-aix` target maintainers](https://doc.rust-lang.org/rustc/platform-support/aix.html), hopefully they have thoughts too: ```@daltenty``` ```@gilamn5tr```
naked functions: on windows emit `.endef` without the symbol name
tracking issue: https://github.com/rust-lang/rust/issues/90957
fixes https://github.com/rust-lang/rust/issues/138320
The `.endef` directive does not take the name as an argument. Apparently the LLVM x86_64 parser does accept this, but on i686 it's rejected. In general `i686` does some special name mangling stuff, so it's good to include it in the naked function tests.
r? ````@ChrisDenton```` (because windows)
metadata: Ignore sysroot when doing the manual native lib search in rustc
This is the opposite alternative to https://github.com/rust-lang/rust/pull/138170 and another way to make native library search consistent between rustc and linker.
This way the directory list searched by rustc is still a prefix of the directory list considered by linker, but it's a shorter prefix than in #138170.
We can include the sysroot directories into rustc's search again later if the issues with #138170 are resolved, it will be a backward compatible change.
This may break some code doing weird things on unstable rustc, or tier 2-3 targets, like bundling `libunwind.a` or sanitizers into something.
Note that this doesn't affect shipped `libc.a`, because it lives in `self-contained` directories in sysroot, and `self-contained` sysroot is already not included into the rustc's search. All libunwind and sanitizer libs should be moved to `self-contained` sysroot too eventually.
With the consistent search directory list between rustc and linker we can make rustc own the native library search (at least for static libs) and use linker search only as a fallback (like in #123436). This will allow addressing issues like https://github.com/rust-lang/rust/pull/132394 once and for all on all targets.
r? ``@bjorn3``
In LLVM v9, lld has supported @response-file
LLVM v9 was released on 2019-09-19.
And the check was added back to 2018-03-14 (1.26.0) via 04442af18b.
It has been more than five years, and we ship our own lld regardlessly.
This should be happily removed.
See also:
* <bb12396f91>
* <https://reviews.llvm.org/D63024>
Continuing the work from #137350.
Removes the unused methods: `expect_variant`, `expect_field`,
`expect_foreign_item`.
Every method gains a `hir_` prefix.
Prevent ICE in autodiff validation by emitting user-friendly errors
This PR moves `valid_ret_activity` and `valid_input_activity` checks to the macro expansion phase in compiler/rustc_builtin_macros/src/autodiff.rs, replacing the following internal compiler error (ICE):
```
error: internal compiler error:
compiler/rustc_codegen_ssa/src/codegen_attrs.rs:935:13:
Invalid input activity Dual for Reverse mode
```
with a more user-friendly message.
The issue specifically affected the test file `tests/ui/autodiff/autodiff_illegal.rs`, impacting the functions `f5` and `f6`.
The ICE can be reproduced by following [Enzyme's Rustbook](https://enzymead.github.io/rustbook/installation.html) installation guide.
Additionally, this PR adds tests for invalid return activity in `autodiff_illegal.rs`, which previously triggered an unnoticed ICE before these fixes.
r? ``@oli-obk``
Speed up target feature computation
The LLVM backend calls `LLVMRustHasFeature` twice for every feature. In short-running rustc invocations, this accounts for a surprising amount of work.
r? `@bjorn3`
Revert <https://github.com/rust-lang/rust/pull/138084> to buy time to
consider options that avoids breaking downstream usages of cargo on
distributed `rustc-src` artifacts, where such cargo invocations fail due
to inability to inherit `lints` from workspace root manifest's
`workspace.lints` (this is only valid for the source rust-lang/rust
workspace, but not really the distributed `rustc-src` artifacts).
This breakage was reported in
<https://github.com/rust-lang/rust/issues/138304>.
This reverts commit 48caf81484, reversing
changes made to c6662879b2.
compiler: Use `size_of` from the prelude instead of imported
Use `std::mem::{size_of, size_of_val, align_of, align_of_val}` from the prelude instead of importing or qualifying them. Apply this change across the compiler.
These functions were added to all preludes in Rust 1.80.
r? ``@compiler-errors``
Don't re-`assume` in `transmute`s that don't change niches
I noticed in nightly 2025-02-21 that `transmute` is emitting way more `assume`s than necessary for newtypes.
For example, the three transmutes in <https://rust.godbolt.org/z/fW1KaTc4o> emits
```rust
define noundef range(i32 1, 0) i32 `@repeatedly_transparent_transmute(i32` noundef range(i32 1, 0) %_1) unnamed_addr {
start:
%0 = sub i32 %_1, 1
%1 = icmp ule i32 %0, -2
call void `@llvm.assume(i1` %1)
%2 = sub i32 %_1, 1
%3 = icmp ule i32 %2, -2
call void `@llvm.assume(i1` %3)
%4 = sub i32 %_1, 1
%5 = icmp ule i32 %4, -2
call void `@llvm.assume(i1` %5)
%6 = sub i32 %_1, 1
%7 = icmp ule i32 %6, -2
call void `@llvm.assume(i1` %7)
%8 = sub i32 %_1, 1
%9 = icmp ule i32 %8, -2
call void `@llvm.assume(i1` %9)
%10 = sub i32 %_1, 1
%11 = icmp ule i32 %10, -2
call void `@llvm.assume(i1` %11)
ret i32 %_1
}
```
But those are all just newtypes that don't change size or niches, so none of it's needed.
After this PR it's down to just
```rust
define noundef range(i32 1, 0) i32 `@repeatedly_transparent_transmute(i32` noundef range(i32 1, 0) %_1) unnamed_addr {
start:
ret i32 %_1
}
```
because none of those `assume`s in the original actually did anything.
(Transmuting to something with a difference niche, though, still has the assumes -- the other tests continue to pass checking that.)