Clarify str::from_utf8_unchecked's invariants
Specifically, make it clear that it is immediately UB to pass ill-formed UTF-8 into the function. The previous wording left space to interpret that the UB only occurred when calling another function, which "assumes that `&str`s are valid UTF-8."
This does not change whether str being UTF-8 is a safety or a validity invariant. (As per previous discussion, it is a safety invariant, not a validity invariant.) It just makes it clear that valid UTF-8 is a precondition of str::from_utf8_unchecked, and that emitting an Abstract Machine fault (e.g. UB or a sanitizer error) on invalid UTF-8 is a valid thing to do.
If user code wants to create an unsafe `&str` pointing to ill-formed UTF-8, it must be done via transmutes. Also, just, don't.
Zulip discussion: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-lang.2Fwg-unsafe-code-guidelines/topic/str.3A.3Afrom_utf8_unchecked.20Safety.20requirement
Fix miscompilation of inline assembly with outputs in cases where we emit an invoke instead of call instruction.
We ran into this bug where rustc would segfault while trying to compile certain uses of inline assembly.
Here is a simple repro that demonstrates the issue:
```rust
#![feature(asm_unwind)]
fn main() {
let _x = String::from("string here just cause we need something with a non-trivial drop");
let foo: u64;
unsafe {
std::arch::asm!(
"mov {}, 1",
out(reg) foo,
options(may_unwind)
);
}
println!("{}", foo);
}
```
([playground link](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=7d6641e83370d2536a07234aca2498ff))
But crucially `feature(asm_unwind)` is not actually needed and this can be triggered on stable as a result of the way async functions/generators are handled in the compiler. e.g.:
```rust
extern crate futures; // 0.3.21
async fn bar() {
let foo: u64;
unsafe {
std::arch::asm!(
"mov {}, 1",
out(reg) foo,
);
}
println!("{}", foo);
}
fn main() {
futures::executor::block_on(bar());
}
```
([playground link](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=1c7781c34dd4a3e80ae4bd936a0c82fc))
An example of the incorrect LLVM generated:
```llvm
bb1: ; preds = %start
%1 = invoke i64 asm sideeffect alignstack inteldialect unwind "mov ${0:q}, 1", "=&r,~{dirflag},~{fpsr},~{flags},~{memory}"()
to label %bb2 unwind label %cleanup, !srcloc !9
store i64 %1, i64* %foo, align 8
bb2:
[...snip...]
```
The store should not be placed after the asm invoke but rather should be in the normal control flow basic block (`bb2` in this case).
[Here](https://gist.github.com/luqmana/be1af5b64d2cda5a533e3e23a7830b44) is a writeup of the investigation that lead to finding this.
Replace RwLock by a futex based one on Linux
This replaces the pthread-based RwLock on Linux by a futex based one.
This implementation is similar to [the algorithm](https://gist.github.com/kprotty/3042436aa55620d8ebcddf2bf25668bc) suggested by `@kprotty,` but modified to prefer writers and spin before sleeping. It uses two futexes: One for the readers to wait on, and one for the writers to wait on. The readers futex contains the state of the RwLock: The number of readers, a bit indicating whether writers are waiting, and a bit indicating whether readers are waiting. The writers futex is used as a simple condition variable and its contents are meaningless; it just needs to be changed on every notification.
Using two futexes rather than one has the obvious advantage of allowing a separate queue for readers and writers, but it also means we avoid the problem a single-futex RwLock would have of making it hard for a writer to go to sleep while the number of readers is rapidly changing up and down, as the writers futex is only changed when we actually want to wake up a writer.
It always prefers writers, as we decided [here](https://github.com/rust-lang/rust/issues/93740#issuecomment-1070696128).
To be able to prefer writers, it relies on futex_wake to return the number of awoken threads to be able to handle write-unlocking while both the readers-waiting and writers-waiting bits are set. Instead of waking both and letting them race, it first wakes writers and only continues to wake the readers too if futex_wake reported there were no writers to wake up.
r? `@Amanieu`
[`let_chains`] Forbid `let` inside parentheses
Parenthesizes are mostly a no-op in let chains, in other words, they are mostly ignored.
```rust
let opt = Some(Some(1i32));
if (let Some(a) = opt && (let Some(b) = a)) && b == 1 {
println!("`b` is declared inside but used outside");
}
```
As seen above, such behavior can lead to confusion.
A proper fix or nested encapsulation would probably require research, time and a modified MIR graph so in this PR I simply denied any `let` inside parentheses. Non-let stuff are still allowed.
```rust
fn main() {
let fun = || true;
if let true = (true && fun()) && (true) {
println!("Allowed");
}
}
```
It is worth noting that `let ...` is not an expression and the RFC did not mention this specific situation.
cc `@matthewjasper`
Rollup of 7 pull requests
Successful merges:
- #95743 (Update binary_search example to instead redirect to partition_point)
- #95771 (Update linker-plugin-lto.md to 1.60)
- #95861 (Note that CI tests Windows 10)
- #95875 (bootstrap: show available paths help text for aliased subcommands)
- #95876 (Add a note for unsatisfied `~const Drop` bounds)
- #95907 (address fixme for diagnostic variable name)
- #95917 (thin_box test: import from std, not alloc)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
bootstrap: show available paths help text for aliased subcommands
Running `./x.py build -h -v` shows a list of available build targets,
but the short alias `./x.py b -h -v` does not. Fix so that the aliases
behave the same as their spelled out counterparts.
Update binary_search example to instead redirect to partition_point
Inspired by discussion in the tracking issue for `Result::into_ok_or_err`: https://github.com/rust-lang/rust/issues/82223#issuecomment-1067098167
People are surprised by us not providing a `Result<T, T> -> T` conversion, and the main culprit for this confusion seems to be the `binary_search` API. We should instead redirect people to the equivalent API that implicitly does that `Result<T, T> -> T` conversion internally which should obviate the need for the `into_ok_or_err` function and give us time to work towards a more general solution that applies to all enums rather than just `Result` such as making or_patterns usable for situations like this via postfix `match`.
I choose to duplicate the example rather than simply moving it from `binary_search` to partition point because most of the confusion seems to arise when people are looking at `binary_search`. It makes sense to me to have the example presented immediately rather than requiring people to click through to even realize there is an example. If I had to put it in only one place I'd leave it in `binary_search` and remove it from `partition_point` but it seems pretty obviously relevant to `partition_point` so I figured the best option would be to duplicate it.
Only suggest removing semicolon when expression is compatible with `impl Trait`
https://github.com/rust-lang/rust/issues/54771#issuecomment-476423690
> It still needs checking that the last statement's expr can actually conform to the trait, but the naïve behavior is there.
Only suggest removing a semicolon when the type behind the semicolon actually implements the trait in an RPIT `-> impl Trait`. Also upgrade the label that suggests removing the semicolon to a suggestion (should it be verbose?).
cc #54771
Better error for `for<...>` on associated type bound
With GATs just around the corner, we'll probably see more people trying out `Trait<for<'a> Assoc<'a> = ..>`.
This PR improves the syntax error slightly, and also makes it slightly easier to make this into real syntax in the future.
Feel free to push back if the reviewer thinks this should have a suggestion on how to fix it (i.e. push the `for<'a>` outside of the angle brackets), but that can also be handled in a follow-up PR.
`s/compiler-flags/compile-flags` in compiletest
Also make compiletest panic so this doesn't happen in the future! I literally always forget which it's called, so I wanted to make my life easier in the future.
Also open to the possibility of parsing both.
When a `macro_rules! foo { ... }` invocation is compiled the name used
is `foo`, not `macro_rules!`. This is different to all other macro
invocations, and confused me when I was inserted debugging println
statements for macro evaluation.
This commit changes it to `macro_rules` (or just `macro`), which is what
I expected. There are no externally visible changes.
These paths (`_cranelift` and `_gcc`) are somewhat misleading, since they
actually tell bootstrap to build *all* codegen backends. But this seems like
a useful improvement in the meantime.
Bootstrap already allows selecting these in `PathSet::has`, which allows
any string that matches the end of a full path.
I found these by adding `assert!(path.exists())` in `StepDescription::paths`.
I think ideally we wouldn't have any aliases that aren't paths, but I've held
off on enforcing that here since it may be controversial, I'll open a separate PR.
Rollup of 7 pull requests
Successful merges:
- #95566 (Avoid duplication of doc comments in `std::char` constants and functions)
- #95784 (Suggest replacing `typeof(...)` with an actual type)
- #95807 (Suggest adding a local for vector to fix borrowck errors)
- #95849 (Check for git submodules in non-git source tree.)
- #95852 (Fix missing space in lossy provenance cast lint)
- #95857 (Allow multiple derefs to be splitted in deref_separator)
- #95868 (rustdoc: Reduce allocations in a `html::markdown` function)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Specifically, make it clear that it is immediately UB to pass ill-formed UTF-8 into the function. The previous wording left space to interpret that the UB only occurred when calling another function, which "assumes that `&str`s are valid UTF-8."
This does not change whether str being UTF-8 is a safety or a validity invariant. (As per previous discussion, it is a safety invariant, not a validity invariant.) It just makes it clear that valid UTF-8 is a precondition of str::from_utf8_unchecked, and that emitting an Abstract Machine fault (e.g. UB or a sanitizer error) on invalid UTF-8 is a valid thing to do.
If user code wants to create an unsafe `&str` pointing to ill-formed UTF-8, it must be done via transmutes. Also, just, don't.