Precedence improvements: closures and jumps
This PR fixes some cases where rustc's pretty printers would redundantly parenthesize expressions that didn't need it.
<table>
<tr><th>Before</th><th>After</th></tr>
<tr><td><code>return (|x: i32| x)</code></td><td><code>return |x: i32| x</code></td></tr>
<tr><td><code>(|| -> &mut () { std::process::abort() }).clone()</code></td><td><code>|| -> &mut () { std::process::abort() }.clone()</code></td></tr>
<tr><td><code>(continue) + 1</code></td><td><code>continue + 1</code></td></tr>
</table>
Tested by `echo "fn main() { let _ = $AFTER; }" | rustc -Zunpretty=expanded /dev/stdin`.
The pretty-printer aims to render the syntax tree as it actually exists in rustc, as faithfully as possible, in Rust syntax. It can insert parentheses where forced by Rust's grammar in order to preserve the meaning of a macro-generated syntax tree, for example in the case of `a * $rhs` where $rhs is `b + c`. But for any expression parsed from source code, without a macro involved, there should never be a reason for inserting additional parentheses not present in the original.
For closures and jumps (return, break, continue, yield, do yeet, become) the unneeded parentheses came from the precedence of some of these expressions being misidentified. In the same order as the table above:
- Jumps and closures are supposed to have equal precedence. The [Rust Reference](https://doc.rust-lang.org/1.83.0/reference/expressions.html#expression-precedence) says so, and in Syn they do. There is no Rust syntax that would require making a precedence distinction between jumps and closures. But in rustc these were previously 2 distinct levels with the closure being lower, hence the parentheses around a closure inside a jump (but not a jump inside a closure).
- When a closure is written with an explicit return type, the grammar [requires](https://doc.rust-lang.org/1.83.0/reference/expressions/closure-expr.html) that the closure body consists of exactly one block expression, not any other arbitrary expression as usual for closures. Parsing of the closure body does not continue after the block expression. So in `|| { 0 }.clone()` the clone is inside the closure body and applies to `{ 0 }`, whereas in `|| -> _ { 0 }.clone()` the clone is outside and applies to the closure as a whole.
- Continue never needs parentheses. It was previously marked as having the lowest possible precedence but it should have been the highest, next to paths and loops and function calls, not next to jumps.
`rustc_span::symbol` defines some things that are re-exported from
`rustc_span`, such as `Symbol` and `sym`. But it doesn't re-export some
closely related things such as `Ident` and `kw`. So you can do `use
rustc_span::{Symbol, sym}` but you have to do `use
rustc_span::symbol::{Ident, kw}`, which is inconsistent for no good
reason.
This commit re-exports `Ident`, `kw`, and `MacroRulesNormalizedIdent`,
and changes many `rustc_span::symbol::` qualifiers in `compiler/` to
`rustc_span::`. This is a 200+ net line of code reduction, mostly
because many files with two `use rustc_span` items can be reduced to
one.
Rollup of 9 pull requests
Successful merges:
- #122670 (Fix bug where `option_env!` would return `None` when env var is present but not valid Unicode)
- #131095 (Use environment variables instead of command line arguments for merged doctests)
- #131339 (Expand set_ptr_value / with_metadata_of docs)
- #131652 (Move polarity into `PolyTraitRef` rather than storing it on the side)
- #131675 (Update lint message for ABI not supported)
- #131681 (Fix up-to-date checking for run-make tests)
- #131702 (Suppress import errors for traits that couldve applied for method lookup error)
- #131703 (Resolved python deprecation warning in publish_toolstate.py)
- #131710 (Remove `'apostrophes'` from `rustc_parse_format`)
r? `@ghost`
`@rustbot` modify labels: rollup
Add `&pin (mut|const) T` type position sugar
This adds parser support for `&pin mut T` and `&pin const T` references. These are desugared to `Pin<&mut T>` and `Pin<&T>` in the AST lowering phases.
This PR currently includes #130526 since that one is in the commit queue. Only the most recent commits (bd450027eb4a94b814a7dd9c0fa29102e6361149 and following) are new.
Tracking:
- #130494
r? `@compiler-errors`
Parenthesize break values containing leading label
The AST pretty printer previously produced invalid syntax in the case of `break` expressions with a value that begins with a loop or block label.
```rust
macro_rules! expr {
($e:expr) => {
$e
};
}
fn main() {
loop {
break expr!('a: loop { break 'a 1; } + 1);
};
}
```
`rustc -Zunpretty=expanded main.rs `:
```console
#![feature(prelude_import)]
#![no_std]
#[prelude_import]
use ::std::prelude::rust_2015::*;
#[macro_use]
extern crate std;
macro_rules! expr { ($e:expr) => { $e }; }
fn main() { loop { break 'a: loop { break 'a 1; } + 1; }; }
```
The expanded code is not valid Rust syntax. Printing invalid syntax is bad because it blocks `cargo expand` from being able to format the output as Rust syntax using rustfmt.
```console
error: parentheses are required around this expression to avoid confusion with a labeled break expression
--> <anon>:9:26
|
9 | fn main() { loop { break 'a: loop { break 'a 1; } + 1; }; }
| ^^^^^^^^^^^^^^^^^^^^^^^^
|
help: wrap the expression in parentheses
|
9 | fn main() { loop { break ('a: loop { break 'a 1; }) + 1; }; }
| + +
```
This PR updates the AST pretty-printer to insert parentheses around the value of a `break` expression as required to avoid this edge case.
This commit by itself is supposed to have no effect on behavior. All of
the call sites are updated to preserve their previous behavior.
The behavior changes are in the commits that follow.
For each of these, we need to decide whether they need to be using
`expr_requires_semi_to_be_stmt`, or `expr_requires_comma_to_be_match_arm`,
which are supposed to be 2 different behaviors. Previously they were
conflated into one, causing either too much or too little
parenthesization.
This mostly works well, and eliminates a couple of delayed bugs.
One annoying thing is that we should really also add an
`ErrorGuaranteed` to `proc_macro::bridge::LitKind::Err`. But that's
difficult because `proc_macro` doesn't have access to `ErrorGuaranteed`,
so we have to fake it.
`cook_lexer_literal` can emit an error about an invalid int literal but
then return a non-`Err` token. And then `integer_lit` has to account for
this to avoid printing a redundant error message.
This commit changes `cook_lexer_literal` to return `Err` in that case.
Then `integer_lit` doesn't need the special case, and
`LitError::LexerError` can be removed.
`unescape_literal` becomes `unescape_unicode`, and `unescape_c_string`
becomes `unescape_mixed`. Because rfc3349 will mean that C string
literals will no longer be the only mixed utf8 literals.
- Rename it as `MixedUnit`, because it will soon be used in more than
just C string literals.
- Change the `Byte` variant to `HighByte` and use it only for
`\x80`..`\xff` cases. This fixes the old inexactness where ASCII chars
could be encoded with either `Byte` or `Char`.
- Add useful comments.
- Remove `is_ascii`, in favour of `u8::is_ascii`.
The parser already does a check-only unescaping which catches all
errors. So the checking done in `from_token_lit` never hits.
But literals causing warnings can still occur in `from_token_lit`. So
the commit changes `str-escape.rs` to use byte string literals and C
string literals as well, to give better coverage and ensure the new
assertions in `from_token_lit` are correct.
By making it an `EscapeError` instead of a `LitError`. This makes it
like the other errors produced when checking string literals contents,
e.g. for invalid escape sequences or bare CR chars.
NOTE: this means these errors are issued earlier, before expansion,
which changes behaviour. It will be possible to move the check back to
the later point if desired. If that happens, it's likely that all the
string literal contents checks will be delayed together.
One nice thing about this: the old approach had some code in
`report_lit_error` to calculate the span of the nul char from a range.
This code used a hardwired `+2` to account for the `c"` at the start of
a C string literal, but this should have changed to a `+3` for raw C
string literals to account for the `cr"`, which meant that the caret in
`cr"` nul error messages was one short of where it should have been. The
new approach doesn't need any of this and avoids the off-by-one error.