Some parser improvements
I was looking closely at attribute handling in the parser while debugging some issues relating to #124141, and found a few small improvements.
``@spastorino``
This commit does the following.
- Pulls the code out of `AttrTokenStream::to_token_trees` into a new
function `attrs_and_tokens_to_token_trees`.
- Simplifies `TokenStream::from_ast` by calling the new function. This
is nicer than the old way, which created a temporary
`AttrTokenStream` containing a single `AttrsTarget` (which required
some cloning) just to call `to_token_trees` on it. (It is good to
remove this use of `AttrsTarget` which isn't related to `cfg_attr`
expansion.)
The only place it is meaningfully used is in a panic message in
`TokenStream::from_ast`. But `node.span()` doesn't need to be printed
because `node` is also printed and it must contain the span.
Added dots at the sentence ends of rustc AST doc
Just a tiny improvement for the AST documentation by bringing consistency to sentence ends. I intentionally didn't terminate every sentence, there are still some members not having them, but at least there's no mixing style on the type level.
Match ergonomics 2024: Implement TC's match ergonomics proposal
Under gate `ref_pat_eat_one_layer_2024_structural`. Enabling `ref_pat_eat_one_layer_2024` at the same time allows the union of what the individual gates allow. `@traviscross`
r? `@Nadrieril`
cc https://github.com/rust-lang/rust/issues/123076
`@rustbot` label A-edition-2024 A-patterns
Parenthesize break values containing leading label
The AST pretty printer previously produced invalid syntax in the case of `break` expressions with a value that begins with a loop or block label.
```rust
macro_rules! expr {
($e:expr) => {
$e
};
}
fn main() {
loop {
break expr!('a: loop { break 'a 1; } + 1);
};
}
```
`rustc -Zunpretty=expanded main.rs `:
```console
#![feature(prelude_import)]
#![no_std]
#[prelude_import]
use ::std::prelude::rust_2015::*;
#[macro_use]
extern crate std;
macro_rules! expr { ($e:expr) => { $e }; }
fn main() { loop { break 'a: loop { break 'a 1; } + 1; }; }
```
The expanded code is not valid Rust syntax. Printing invalid syntax is bad because it blocks `cargo expand` from being able to format the output as Rust syntax using rustfmt.
```console
error: parentheses are required around this expression to avoid confusion with a labeled break expression
--> <anon>:9:26
|
9 | fn main() { loop { break 'a: loop { break 'a 1; } + 1; }; }
| ^^^^^^^^^^^^^^^^^^^^^^^^
|
help: wrap the expression in parentheses
|
9 | fn main() { loop { break ('a: loop { break 'a 1; }) + 1; }; }
| + +
```
This PR updates the AST pretty-printer to insert parentheses around the value of a `break` expression as required to avoid this edge case.
Currently it uses a mixture of functional style (`flat_map`) and
imperative style (`push`), which is a bit hard to read. This commit
converts it to fully imperative, which is more concise and avoids the
need for `smallvec`.
I.e. change the return type from `TokenStream` to `Vec<TokenTree>`.
Most of the callsites require a `TokenStream`, but the recursive call
used to create `target_tokens` requires a `Vec<TokenTree>`. It's easy
to convert a `Vec<TokenTree>` to a `TokenStream` (just call
`TokenStream::new`) but it's harder to convert a `TokenStream` to a
`Vec<TokenTree>` (either iterate/clone/collect, or use `Lrc::into_inner`
if appropriate).
So this commit changes the return value to simplify that `target_tokens`
call site.
ast: Standardize visiting order
Order: ID, attributes, inner nodes in source order if possible, tokens, span.
Also always use exhaustive matching in visiting infra, and visit some discovered missing nodes.
Unlike https://github.com/rust-lang/rust/pull/125741 this shouldn't affect anything serious like `macro_rules` scopes.
Under gate `ref_pat_eat_one_layer_2024_structural`.
Enabling `ref_pat_eat_one_layer_2024` at the same time allows the union
of what the individual gates allow.
Id, attributes, inner nodes in source order if possible, tokens, span.
Also always use exhaustive matching in visiting infra, and visit some missing nodes.
Fix a span in `parse_ty_bare_fn`.
It currently goes one token too far.
Example: line 259 of `tests/ui/abi/compatibility.rs`:
```
test_abi_compatible!(fn_fn, fn(), fn(i32) -> i32);
```
This commit changes the span for the second element from `fn(),` to `fn()`, i.e. removes the extraneous comma.
This doesn't affect any tests. I found it while debugging some other code. Not a big deal but an easy fix so I figure it worth doing.
r? ``@spastorino``
It currently goes one token too far.
Example: line 259 of `tests/ui/abi/compatibility.rs`:
```
test_abi_compatible!(fn_fn, fn(), fn(i32) -> i32);
```
This commit changes the span for the second element from `fn(),` to
`fn()`, i.e. removes the extraneous comma.
Eliminate the distinction between PREC_POSTFIX and PREC_PAREN precedence level
I have been tangling with precedence as part of porting some pretty-printer improvements from syn back to rustc (related to parenthesization of closures, returns, and breaks by the AST pretty-printer).
As far as I have been able to tell, there is no difference between the 2 different precedence levels that rustc identifies as `PREC_POSTFIX` (field access, square bracket index, question mark, method call) and `PREC_PAREN` (loops, if, paths, literals).
There are a bunch of places that look at either `prec < PREC_POSTFIX` or `prec >= PREC_POSTFIX`. But there is nothing that needs to distinguish PREC_POSTFIX and PREC_PAREN from one another.
d49994b060/compiler/rustc_ast/src/util/parser.rs (L236-L237)d49994b060/compiler/rustc_hir_typeck/src/fn_ctxt/suggestions.rs (L2829)d49994b060/compiler/rustc_hir_typeck/src/fn_ctxt/suggestions.rs (L1290)
In the interest of eliminating a distinction without a difference, this PR collapses these 2 levels down to 1.
There is exactly 1 case where an expression with PREC_POSTFIX precedence needs to be parenthesized in a location that an expression with PREC_PAREN would not, and that's when the receiver of ExprKind::MethodCall is ExprKind::Field. `x.f()` means a different thing than `(x.f)()`. But this does not justify having separate precedence levels because this special case in the grammar is not governed by precedence. Field access does not have "lower precedence than" method call syntax — you can tell because if it did, then `x.f[0].f()` wouldn't be able to have its unparenthesized field access in the receiver of a method call. Because this Field/MethodCall special case is not governed by precedence, it already requires special handling and is not affected by eliminating the PREC_POSTFIX precedence level.
d49994b060/compiler/rustc_ast_pretty/src/pprust/state/expr.rs (L217-L221)
Just some extra sanity checking, making explicit some values not
possible in code working with token trees -- we shouldn't be seeing
explicit delimiter tokens, because they should be represented as
`TokenTree::Delimited`.
Merge `PatParam`/`PatWithOr`, and `Expr`/`Expr2021`, for a few reasons.
- It's conceptually nice, because the two pattern kinds and the two
expression kinds are very similar.
- With expressions in particular, there are several places where both
expression kinds get the same treatment.
- It removes one unreachable match arm.
- Most importantly, for #124141 I will need to introduce a new type
`MetaVarKind` that is very similar to `NonterminalKind`, but records a
couple of extra fields for expression metavars. It's nicer to have a
single `MetaVarKind::Expr` expression variant to hold those extra
fields instead of duplicating them across two variants
`MetaVarKind::{Expr,Expr2021}`. And then it makes sense for patterns
to be treated the same way, and for `NonterminalKind` to also be
treated the same way.
I also clarified the comments, because I have long found them a little
hard to understand.
`StaticForeignItem` and `StaticItem` are the same
The struct `StaticItem` and `StaticForeignItem` are the same, so remove `StaticForeignItem`. Having them be separate is unique to `static` items -- unlike `ForeignItemKind::{Fn,TyAlias}`, which use the normal AST item.
r? ``@spastorino`` or ``@oli-obk``
Make edition dependent `:expr` macro fragment act like the edition-dependent `:pat` fragment does
Parse the `:expr` fragment as `:expr_2021` in editions <=2021, and as `:expr` in edition 2024. This is similar to how we parse `:pat` as `:pat_param` in edition <=2018 and `:pat_with_or` in >=2021, and means we can get rid of a span dependency from `nonterminal_may_begin_with`.
Specifically, this fixes a theoretical regression since the `expr_2021` macro fragment previously would allow `const {}` if the *caller* is edition 2024. This is inconsistent with the way that the `pat` macro fragment was upgraded, and also leads to surprising behavior when a macro *caller* crate upgrades to edtion 2024, since they may have parsing changes that they never asked for (with no way of opting out of it).
This PR also allows using `expr_2021` in all editions. Why was this was disallowed in the first place? It's purely additive, and also it's still feature gated?
r? ```@fmease``` ```@eholk``` cc ```@vincenzopalazzo```
cc #123865
Tracking:
- https://github.com/rust-lang/rust/issues/123742
We currently use `can_begin_literal_maybe_minus` in a couple of places
where only string literals are allowed. This commit introduces a
more specific function, which makes things clearer. It doesn't change
behaviour because the two functions affected (`is_unsafe_foreign_mod`
and `check_keyword_case`) are always followed by a call to `parse_abi`,
which checks again for a string literal.
It's clearer this way, because the `Interpolated` cases in
`can_begin_const_arg` and `is_pat_range_end_start` are more permissive
than the `Interpolated` cases in `can_begin_literal_maybe_minus`.
Fix duplicated attributes on nonterminal expressions
This PR fixes a long-standing bug (#86055) whereby expression attributes can be duplicated when expanded through declarative macros.
First, consider how items are parsed in declarative macros:
```
Items:
- parse_nonterminal
- parse_item(ForceCollect::Yes)
- parse_item_
- attrs = parse_outer_attributes
- parse_item_common(attrs)
- maybe_whole!
- collect_tokens_trailing_token
```
The important thing is that the parsing of outer attributes is outside token collection, so the item's tokens don't include the attributes. This is how it's supposed to be.
Now consider how expression are parsed in declarative macros:
```
Exprs:
- parse_nonterminal
- parse_expr_force_collect
- collect_tokens_no_attrs
- collect_tokens_trailing_token
- parse_expr
- parse_expr_res(None)
- parse_expr_assoc_with
- parse_expr_prefix
- parse_or_use_outer_attributes
- parse_expr_dot_or_call
```
The important thing is that the parsing of outer attributes is inside token collection, so the the expr's tokens do include the attributes, i.e. in `AttributesData::tokens`.
This PR fixes the bug by rearranging expression parsing to that outer attribute parsing happens outside of token collection. This requires a number of small refactorings because expression parsing is somewhat complicated. While doing so the PR makes the code a bit cleaner and simpler, by eliminating `parse_or_use_outer_attributes` and `Option<AttrWrapper>` arguments (in favour of the simpler `parse_outer_attributes` and `AttrWrapper` arguments), and simplifying `LhsExpr`.
r? `@petrochenkov`