In #31751, `impl AsRef<Path> for Cow<'_, OsStr>` was added, but the
converse was omitted and it wasn't mentioned in discussion. This seems
to have been a simple oversight, as it otherwise implemented traits
mutually between `OsStr` and `Path`.
Update the minimum external LLVM to 19
With this change, we'll have stable support for LLVM 19 and 20.
For reference, the previous increase to LLVM 18 was #130487.
cc `@rust-lang/wg-llvm`
r? nikic
StableMIR: Prepare for refactoring
Temporarily make `stable_mir` "parasitic" on the `rustc_smir` crate.
It aims to resolve the circular dependency that would arise if we directly invert the dependency order between `rustc_smir` and `stable_mir`.
Once the refactoring is complete (`rustc_smir` does not depend on `stable_mir`), we will migrate it back to the `stable_mir` crate. See more details: [here](https://hackmd.io/jBRkZLqAQL2EVgwIIeNMHg).
Rename internal module from `statik` to `no_threads`
This module is named in reference to the keyword, but the term is somewhat overloaded. Rename it to more clearly describe it and avoid the misspelling.
Fix `ProvenVia` for global where clauses
When we're merging one (or more) global where clauses in the presence of no other candidates, ensure that we return `TraitGoalProvenVia::ParamEnv` so that rigid projections work correctly. This fixes some tests with `feature(trivial_bounds)`.
Fixes#139408
Fix missing const for inherent pointer `replace` methods
`ptr::replace` (the free fn) is already const stable. However, there are inherent convenience methods on `*mut T` and `NonNull<T>`, allowing you to write eg. `unsafe { foo.replace(bar) }` where `foo` is `*mut T` or `NonNull<T>`.
It seems const was never added to the inherent method (likely oversight), so this PR adds it.
I don't believe this needs another[^1] FCP as the inherent methods are already stable and `ptr::replace` is already const stable, so this adds no new API.
Original tracking issue: #83164
`ptr::replace` constified in #83091
`ptr::replace` const stabilized in #130954
[^1]: `const_replace` FCP completed: https://github.com/rust-lang/rust/issues/83164#issuecomment-2385670050
Folder experiment: Micro-optimize RegionEraserVisitor
**NOTE:** This is one of a series of perf experiments that I've come up with while sick in bed. I'm assigning them to lqd b/c you're a good reviewer and you'll hopefully be awake when these experiments finish, lol.
r? lqd
The region eraser is very hot, so let's see if we can avoid erasing types (and visiting consts and preds that don't have region-ful types) unnecessarily.
Change notifications for Exploit Mitigations PG
Reduce the amount of notifications sent to all the Exploit Mitigations PG by removing it from some of the paths.
Move `fd` into `std::sys`
Move platform definitions of `fd` into `std::sys`, as part of https://github.com/rust-lang/rust/issues/117276.
Unlike other modules directly under `std::sys`, this is only available on some platforms and I have not provided a fallback abstraction for unsupported platforms. That is similar to how `std::os::fd` is gated to only supported platforms.
Also, fix the `unsafe_op_in_unsafe_fn` lint, which was allowed for the Unix fd impl. Since macro expansions from `std::sys::pal::unix::weak` trigger this lint, fix it there too.
cc `@joboet,` `@ChrisDenton`
try-job: x86_64-gnu-aux
Implement `SliceIndex` for `ByteStr`
Implement `Index` and `IndexMut` for `ByteStr` in terms of `SliceIndex`. Implement it for the same types that `&[u8]` supports (a superset of those supported for `&str`, which does not have `usize` and `ops::IndexRange`).
At the same time, move compare and index traits to a separate file in the `bstr` module, to give it more space to grow as more functionality is added (e.g., iterators and string-like ops). Order the items in `bstr/traits.rs` similarly to `str/traits.rs`.
cc `@joshtriplett`
`ByteStr`/`ByteString` tracking issue: https://github.com/rust-lang/rust/issues/134915
hygiene: Avoid recursion in syntax context decoding
#139241 has two components
- Avoiding recursion during syntax context decoding
- Encoding/decoding only the non-redundant data, and recalculating the redundant data again during decoding
Both of these parts may influence compilation times, possibly in opposite directions.
So this PR contains only the first part to evaluate its effect in isolation.
This module is named in reference to the keyword, but the term is
somewhat overloaded. Rename it to more clearly describe it and avoid the
misspelling.
Apply `Recovery::Forbidden` when reparsing pasted macro fragments.
Fixes#137874.
The changes to the output of `tests/ui/associated-consts/issue-93835.rs`
partly undo the changes seen when `NtTy` was removed in #133436, which
is good.
r? ``@petrochenkov``
Rustdoc: typecheck settings.js
This makes the file fully typechecked with no instances of ``````@ts-expect-error`````` and no type casts.
r? `````@notriddle`````
replace extra_filename with strict version hash in metrics file names
Should resolve the potential issue of overwriting metrics from the same crate when compiled with different features or flags.
r? `````@estebank`````
try-job: test-various
Add integer to string formatting tests
As discussed in https://github.com/rust-lang/rust/pull/136264, there doesn't seem to have tests to ensure that int to string conversion is performed correctly, only sporadic tests here and there. Now we have some basic tests. :)
r? `````@Mark-Simulacrum`````
Autodiff batching
Enzyme supports batching, which is especially known from the ML side when training neural networks.
There we would normally have a training loop, where in each iteration we would pass in some data (e.g. an image), and a target vector. Based on how close we are with our prediction we compute our loss, and then use backpropagation to compute the gradients and update our weights.
That's quite inefficient, so what you normally do is passing in a batch of 8/16/.. images and targets, and compute the gradients for those all at once, allowing better optimizations.
Enzyme supports batching in two ways, the first one (which I implemented here) just accepts a Batch size,
and then each Dual/Duplicated argument has not one, but N shadow arguments. So instead of
```rs
for i in 0..100 {
df(x[i], y[i], 1234);
}
```
You can now do
```rs
for i in 0..100.step_by(4) {
df(x[i+0],x[i+1],x[i+2],x[i+3], y[i+0], y[i+1], y[i+2], y[i+3], 1234);
}
```
which will give the same results, but allows better compiler optimizations. See the testcase for details.
There is a second variant, where we can mark certain arguments and instead of having to pass in N shadow arguments, Enzyme assumes that the argument is N times longer. I.e. instead of accepting 4 slices with 12 floats each, we would accept one slice with 48 floats. I'll implement this over the next days.
I will also add more tests for both modes.
For any one preferring some more interactive explanation, here's a video of Tim's llvm dev talk, where he presents his work. https://www.youtube.com/watch?v=edvaLAL5RqU
I'll also add some other docs to the dev guide and user docs in another PR.
r? ghost
Tracking:
- https://github.com/rust-lang/rust/issues/124509
- https://github.com/rust-lang/rust/issues/135283
Expose algebraic floating point intrinsics
# Problem
A stable Rust implementation of a simple dot product is 8x slower than C++ on modern x86-64 CPUs. The root cause is an inability to let the compiler reorder floating point operations for better vectorization.
See https://github.com/calder/dot-bench for benchmarks. Measurements below were performed on a i7-10875H.
### C++: 10us ✅
With Clang 18.1.3 and `-O2 -march=haswell`:
<table>
<tr>
<th>C++</th>
<th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="cc">
float dot(float *a, float *b, size_t len) {
#pragma clang fp reassociate(on)
float sum = 0.0;
for (size_t i = 0; i < len; ++i) {
sum += a[i] * b[i];
}
return sum;
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/739573c0-380a-4d84-9fd9-141343ce7e68" />
</td>
</tr>
</table>
### Nightly Rust: 10us ✅
With rustc 1.86.0-nightly (8239a37f9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
<th>Rust</th>
<th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
let mut sum = 0.0;
for i in 0..a.len() {
sum = fadd_algebraic(sum, fmul_algebraic(a[i], b[i]));
}
sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/9dcf953a-2cd7-42f3-bc34-7117de4c5fb9" />
</td>
</tr>
</table>
### Stable Rust: 84us ❌
With rustc 1.84.1 (e71f9a9a9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
<th>Rust</th>
<th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
let mut sum = 0.0;
for i in 0..a.len() {
sum += a[i] * b[i];
}
sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/936a1f7e-33e4-4ff8-a732-c3cdfe068dca" />
</td>
</tr>
</table>
# Proposed Change
Add `core::intrinsics::f*_algebraic` wrappers to `f16`, `f32`, `f64`, and `f128` gated on a new `float_algebraic` feature.
# Alternatives Considered
https://github.com/rust-lang/rust/issues/21690 has a lot of good discussion of various options for supporting fast math in Rust, but is still open a decade later because any choice that opts in more than individual operations is ultimately contrary to Rust's design principles.
In the mean time, processors have evolved and we're leaving major performance on the table by not supporting vectorization. We shouldn't make users choose between an unstable compiler and an 8x performance hit.
# References
* https://github.com/rust-lang/rust/issues/21690
* https://github.com/rust-lang/libs-team/issues/532
* https://github.com/rust-lang/rust/issues/136469
* https://github.com/calder/dot-bench
* https://www.felixcloutier.com/x86/vfmadd132ps:vfmadd213ps:vfmadd231ps
try-job: x86_64-gnu-nopt
try-job: x86_64-gnu-aux
Use the span of the whole bound when the diagnostic talks about a bound
While it makes sense that the host predicate only points to the `~const` part, as whether the actual trait bound is satisfied is checked separately, the user facing diagnostic is talking about the entire trait bound, at which point it makes more sense to just highlight the entire bound
r? `@compiler-errors` or `@fee1-dead`
ToSocketAddrs: fix typo
It's "a function", never "an function".
I noticed the same typo somewhere in the compiler sources so figured I'd fix it there as well.
Fix `Debug` impl for `LateParamRegionKind`.
It uses `Br` prefixes which are inappropriate and appear to have been incorrectly copy/pasted from the `Debug` impl for `BoundRegionKind`.
r? `@BoxyUwU`
AsyncDestructor: replace fields with impl_did
The future and ctor fields aren't actually used, and the way they are extracted is obviously wrong – swapping the order of the items in the source code will give wrong results.
Instead, store just the LocalDefId of the impl, which is enough for the only use of this data.
Fix 2024 edition doctest panic output
Fixes#137970.
The problem was that the output was actually displayed by rustc itself because we're exiting with `Result<(), String>`, and the display is really not great. So instead, we get the output, we print it and then we return an `ExitCode`.
r? ````@aDotInTheVoid````
Remove `rustc_middle::ty::util::ExplicitSelf`.
It's an old (2017 or earlier) type that describes a `self` receiver. It's only used in `rustc_hir_analysis` for two error messages, and much of the complexity isn't used. I suspect it used to be used for more things.
This commit removes it, and moves a greatly simplified version of the `determine` method into `rustc_hir_analysis`, renamed as `get_self_string`. The big comment on the method is removed because it no longer seems relevant.
r? `@BoxyUwU`
add `TypingMode::Borrowck`
Shares the first commit with #138499, doesn't really matter which PR to land first 😊😁
Introduces `TypingMode::Borrowck` which unlike `TypingMode::Analysis`, uses the hidden type computed by HIR typeck as the initial value of opaques instead of an unconstrained infer var. This is a part of https://github.com/rust-lang/types-team/issues/129.
Using this new `TypingMode` is unfortunately a breaking change for now, see tests/ui/impl-trait/non-defining-uses/as-projection-term.rs. Using an inference variable as the initial value results in non-defining uses in the defining scope. We therefore only enable it if with `-Znext-solver=globally` or `-Ztyping-mode-borrowck`
To do that the PR contains the following changes:
- `TypeckResults::concrete_opaque_type` are already mapped to the definition of the opaque type
- writeback now checks that the non-lifetime parameters of the opaque are universal
- for this, `fn check_opaque_type_parameter_valid` is moved from `rustc_borrowck` to `rustc_trait_selection`
- we add a new `query type_of_opaque_hir_typeck` which, using the same visitors as MIR typeck, attempts to merge the hidden types from HIR typeck from all defining scopes
- done by adding a `DefiningScopeKind` flag to toggle between using borrowck and HIR typeck
- the visitors stop checking that the MIR type matches the HIR type. This is trivial as the HIR type are now used as the initial hidden types of the opaque. This check is useful as a safeguard when not using `TypingMode::Borrowck`, but adding it to the new structure is annoying and it's not soundness critical, so I intend to not add it back.
- add a `TypingMode::Borrowck` which behaves just like `TypingMode::Analysis` except when normalizing opaque types
- it uses `type_of_opaque_hir_typeck(opaque)` as the initial value after replacing its regions with new inference vars
- it uses structural lookup in the new solver
fixes#112201, fixes#132335, fixes#137751
r? `@compiler-errors` `@oli-obk`