Improve parser diagnostics
This PR fixes https://github.com/rust-lang/rust/issues/93867 and contains a couple of diagnostics-related changes to the parser.
Here is a short list of some of the changes:
- don't suggest a token that is the same as the current token
- suggest removing the current token if the following token is one of the suggestions (maybe incorrect)
- tell the user to put a type or lifetime after `where` if there is none (as a warning)
- reduce the number of tokens suggested (via the new `eat_noexpect` and `check_noexpect` methods; see the sketch below)
If any of these changes are undesirable, I can remove them, thanks!
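For illustration, here is a minimal sketch of the idea behind the `noexpect` variants (the `Parser` struct and bodies are stand-ins, not rustc's actual implementation): checking a token normally records it in the list of expected tokens, which later feeds the "expected one of ..." suggestions, while the `noexpect` variants skip that recording.

```rust
// Simplified stand-in for rustc's parser: `check` records the token as
// "expected" (so it may later be suggested), `check_noexpect` does not.
struct Parser {
    current: String,
    expected: Vec<String>,
}

impl Parser {
    fn check(&mut self, tok: &str) -> bool {
        self.expected.push(tok.to_string()); // may show up in diagnostics
        self.current == tok
    }

    fn check_noexpect(&self, tok: &str) -> bool {
        self.current == tok // tested, but never suggested to the user
    }
}

fn main() {
    let mut p = Parser { current: ";".into(), expected: Vec::new() };
    assert!(p.check(";")); // `;` is now a candidate suggestion
    assert!(p.check_noexpect(";")); // no new candidate recorded
    assert_eq!(p.expected, [";"]);
}
```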
Move conditions out of recover/report functions.
`Parser` has six recover/report functions that are passed a boolean, and
nothing is done if the boolean has a particular value.
This PR moves the tests outside the functions. This has the following effects:
- The number of lines of code goes down.
- Some `use` items become shorter.
- Avoids the strangeness whereby 11 out of 12 calls to
`maybe_recover_from_bad_qpath` pass `true` as the second argument.
- Makes it clear at the call site that only one of
`maybe_recover_from_bad_type_plus` and `maybe_report_ambiguous_plus` will be
run.
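A minimal sketch of the shape of this refactor, with hypothetical names standing in for the real recover/report functions:

```rust
// Hypothetical stand-ins for the real recover/report functions.
struct Parser {
    can_recover: bool,
}

impl Parser {
    // Before: almost every caller wrote `p.maybe_recover(true)`, and the
    // flag was checked inside the function.
    fn maybe_recover(&mut self, allow: bool) {
        if !allow {
            return;
        }
        // ...recovery logic...
    }

    // After: the function recovers unconditionally.
    fn recover(&mut self) {
        // ...recovery logic...
    }
}

fn main() {
    let mut p = Parser { can_recover: true };
    // The condition is now visible at the call site:
    if p.can_recover {
        p.recover();
    }
    p.maybe_recover(true); // the old style, for comparison
}
```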
r? `@estebank`
By heap-allocating the argument within `NtPath`, `NtVis`, and `NtStmt`.
This slightly reduces cumulative and peak allocation amounts, most
notably on `deep-vector`.
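The underlying technique: a Rust enum is as large as its largest variant, so boxing a large payload shrinks every value of the enum at the cost of one heap allocation. A self-contained illustration (these are not the actual `Nonterminal` definitions):

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum Inline {
    Big([u8; 128]), // inflates the whole enum to at least 128 bytes
    Small,
}

#[allow(dead_code)]
enum Boxed {
    Big(Box<[u8; 128]>), // pointer-sized; payload lives on the heap
    Small,
}

fn main() {
    assert!(size_of::<Boxed>() < size_of::<Inline>());
    println!(
        "inline: {} bytes, boxed: {} bytes",
        size_of::<Inline>(),
        size_of::<Boxed>()
    );
}
```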
* Recover from invalid `'label: ` before block.
* Make suggestion to enclose statements in a block multipart.
* Point at `match`, `while`, `loop` and `unsafe` keywords when failing
to parse their expression.
* Do not suggest `{ ; }`.
* Do not suggest `|` when very unlikely to be what was wanted (in `let`
statements).
Nicer error message if the user attempts to write `let...else if`
Gives a nice "conditional `else if` is not supported for `let...else`" error when encountering a `let...else if` pattern, as suggested in the [let...else tracking issue](https://github.com/rust-lang/rust/issues/87335#issuecomment-944846205).
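For illustration, a sketch of the rejected form and a working alternative; since the `else` block must diverge, the conditional can simply move inside it:

```rust
// Rejected (sketch):
//     let Some(x) = opt else if cond { ... };
//     // error: conditional `else if` is not supported for `let...else`
fn main() {
    let opt: Option<i32> = Some(1);
    let cond = false;
    // Accepted: a plain `else` with the `if` nested inside the
    // diverging block.
    let Some(x) = opt else {
        if cond { panic!("empty") } else { panic!("missing") }
    };
    println!("{x}");
}
```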
Accept `m!{ .. }.method()` and `m!{ .. }?` statements.
This PR fixes something that I keep running into when using `quote!{}.into()` in a proc macro to convert the `proc_macro2::TokenStream` to a `proc_macro::TokenStream`:
Before:
```
error: expected expression, found `.`
--> src/lib.rs:6:6
|
4 | quote! {
5 | ...
6 | }.into()
| ^ expected expression
```
After:
```
```
(No output, compiles fine.)
---
Context:
For expressions like `{ 1 }` and `if true { 1 } else { 2 }`, we accept them as full statements without a trailing `;`, which means the following is not accepted:
```rust
{ 1 } - 1 // error
```
since that is parsed as two statements: `{ 1 }` and `-1`. That is syntactically correct, but the block statement `{ 1 }` must then have type `()`, as there is no trailing `;`.
However, specifically for `.` and `?` after the `}`, we do [continue parsing it as an expression](13db8440bb/compiler/rustc_parse/src/parser/expr.rs (L864-L876)):
```rust
{ "abc" }.len(); // ok
```
For braced macro invocations, we do not do this:
```rust
vec![1, 2, 3].len(); // ok
vec!{1, 2, 3}.len(); // error
```
(It parses `vec!{1, 2, 3}` as a full statement, and then complains about `.len()` not being a valid expression.)
This PR changes this to also look for a `.` and `?` after a braced macro invocation. We can be sure the macro is an expression and not a full statement in those cases, since no statement can start with a `.` or `?`.
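A small before/after illustration of the statement-position cases this PR accepts:

```rust
fn main() {
    // Previously a parse error in statement position: `vec!{1, 2, 3}` was
    // taken as a complete statement, leaving `.len()` dangling.
    vec!{1, 2, 3}.len();

    // Bracketed invocations were always fine:
    vec![1, 2, 3].len();
}
```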
This PR modifies the macro expansion infrastructure to handle attributes
in a fully token-based manner. As a result:
* Derive macros no longer lose spans when their input is modified
by eager cfg-expansion. This is accomplished by performing eager
cfg-expansion on the token stream that we pass to the derive
proc-macro.
* Inner attributes now preserve spans in all cases, including when we
have multiple inner attributes in a row.
This is accomplished through the following changes:
* New structs `AttrAnnotatedTokenStream` and `AttrAnnotatedTokenTree` are introduced.
These are very similar to a normal `TokenTree`, but they also track
the position of attributes and attribute targets within the stream.
They are built when we collect tokens during parsing.
An `AttrAnnotatedTokenStream` is converted to a regular `TokenStream` when
we invoke a macro (a simplified sketch follows this list).
* Token capturing and `LazyTokenStream` are modified to work with
`AttrAnnotatedTokenStream`. A new `ReplaceRange` type is introduced, which
is created during the parsing of a nested AST node to make the 'outer'
AST node aware of the attributes and attribute target stored deeper in the token stream.
* When we need to perform eager cfg-expansion (either due to `#[derive]` or `#[cfg_eval]`),
we tokenize and reparse our target, capturing additional information about the locations of
`#[cfg]` and `#[cfg_attr]` attributes at any depth within the target.
This is a performance optimization, allowing us to perform less work
in the typical case where captured tokens never have eager cfg-expansion run.
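As a rough, self-contained sketch of the data structure described above (illustrative only; the real rustc types track spans, delimiters, and richer attribute data):

```rust
#[derive(Clone, Debug)]
enum Token {
    Ident(String),
    Punct(char),
}

#[derive(Clone, Debug)]
enum AttrTokenTree {
    // An ordinary token.
    Token(Token),
    // A delimited group, e.g. the contents of `{ ... }`.
    Delimited(Vec<AttrTokenTree>),
    // An attribute target: attribute tokens are kept apart from the
    // target's tokens so they can be cfg-stripped or re-spliced later.
    Attributes {
        attrs: Vec<Vec<Token>>,
        target: Vec<AttrTokenTree>,
    },
}

// Conversion back to a flat token list when a macro is invoked.
fn flatten(tree: &AttrTokenTree, out: &mut Vec<Token>) {
    match tree {
        AttrTokenTree::Token(t) => out.push(t.clone()),
        AttrTokenTree::Delimited(inner) => {
            for t in inner {
                flatten(t, out);
            }
        }
        AttrTokenTree::Attributes { attrs, target } => {
            out.extend(attrs.iter().flatten().cloned());
            for t in target {
                flatten(t, out);
            }
        }
    }
}

fn main() {
    let tree = AttrTokenTree::Attributes {
        attrs: vec![vec![Token::Punct('#'), Token::Ident("derive".into())]],
        target: vec![AttrTokenTree::Token(Token::Ident("struct".into()))],
    };
    let mut out = Vec::new();
    flatten(&tree, &mut out);
    println!("{out:?}");
}
```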
or-patterns: disallow in `let` bindings
~~Blocked on https://github.com/rust-lang/rust/pull/81869~~
Disallows top-level or-patterns before type ascription. We want to reserve this syntactic space for possible future generalized type ascription.
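For illustration, a hedged sketch of the now-rejected form and the parenthesized alternative that the error suggests:

```rust
fn main() {
    let r: Result<u8, u8> = Ok(1);
    // Rejected: top-level or-patterns are not allowed in `let` bindings.
    // let Ok(x) | Err(x) = r;
    // Accepted: wrapping the pattern in parentheses removes the ambiguity
    // with a future `pat: Type` generalized ascription syntax.
    let (Ok(x) | Err(x)) = r;
    println!("{x}");
}
```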
r? ``@petrochenkov``
When token-based attribute handling is implemented in #80689,
we will need to access tokens from `HasAttrs` (to perform
cfg-stripping) and to access attributes from `HasTokens` (to
construct a `PreexpTokenStream`).
This PR merges the `HasAttrs` and `HasTokens` traits into a new
`AstLike` trait. The previous `HasAttrs` impls for `Vec<Attribute>` and `AttrVec`
are removed; they aren't attribute targets, so the impls never really
made sense.
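A rough sketch of the merged trait's shape, with `String` standing in for the real attribute and token types (the actual `AstLike` signatures differ):

```rust
trait AstLike {
    // Formerly on `HasAttrs`:
    fn attrs(&self) -> &[String];
    // Formerly on `HasTokens`:
    fn tokens_mut(&mut self) -> Option<&mut Vec<String>>;
}

// Any attribute target (item, statement, expression, ...) implements both
// halves through the single trait.
struct Item {
    attrs: Vec<String>,
    tokens: Option<Vec<String>>,
}

impl AstLike for Item {
    fn attrs(&self) -> &[String] {
        &self.attrs
    }
    fn tokens_mut(&mut self) -> Option<&mut Vec<String>> {
        self.tokens.as_mut()
    }
}
```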
Along the way, we also implement a handful of diagnostics improvements
and fixes, particularly with respect to the special handling of `||` in
place of `|`, and to leading verts in function params, which
don't allow top-level or-patterns anyway.
This is a pure refactoring split out from #80689.
It represents the most invasive part of that PR, requiring changes in
every caller of `parse_outer_attributes`.
In order to eagerly expand `#[cfg]` attributes while preserving the
original `TokenStream`, we need to know the range of tokens that
corresponds to every attribute target. This is accomplished by making
`parse_outer_attributes` return an opaque `AttrWrapper` struct. An
`AttrWrapper` must be converted to a plain `AttrVec` by passing it to
`collect_tokens_trailing_token`. This makes it difficult to accidentally
construct an AST node with attributes without calling `collect_tokens_trailing_token`,
since AST nodes store an `AttrVec`, not an `AttrWrapper`.
As a result, we now call `collect_tokens_trailing_token` for attribute
targets which only support inert attributes, such as generic arguments
and struct fields. Currently, the constructed `LazyTokenStream` is
simply discarded. Future PRs will record the token range corresponding
to the attribute target, allowing those tokens to be removed from an
enclosing `collect_tokens_trailing_token` call if necessary.
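A minimal sketch of this opaque-wrapper idea, with stand-in types and names (not rustc's real signatures):

```rust
// Stand-in for ast::Attribute.
type Attr = String;

// Opaque: no public accessor for `attrs`, so the only way to get them out
// is to go through token collection below.
struct AttrWrapper {
    attrs: Vec<Attr>,
}

impl AttrWrapper {
    fn new(attrs: Vec<Attr>) -> Self {
        AttrWrapper { attrs }
    }
}

// Stand-in for `collect_tokens_trailing_token`: capture tokens around the
// parse (elided here), then release the attributes to the node constructor.
fn collect_tokens<T>(wrapper: AttrWrapper, parse: impl FnOnce(Vec<Attr>) -> T) -> T {
    // ...a LazyTokenStream for the parsed region would be recorded here...
    parse(wrapper.attrs)
}

fn main() {
    let attrs = AttrWrapper::new(vec!["#[inline]".into()]);
    let node = collect_tokens(attrs, |attrs| format!("ItemFn with {attrs:?}"));
    println!("{node}");
}
```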
Fixes #81007
Previously, we would fail to collect tokens in the proper place when
only builtin attributes were present. As a result, we would end up with
attribute tokens in the collected `TokenStream`, leading to duplication
when we attempted to prepend the attributes from the AST node.
We now explicitly track when token collection must be performed due to
nonterminal parsing.
A new `HasTokens` trait is introduced, which is used to move logic from
the callers of `collect_tokens` into the body of `collect_tokens`.
In addition to reducing duplication, this paves the way for PR #80689,
which needs to perform additional logic during token collection.
When parsing a statement (e.g. inside a function body),
we now consider `struct Foo {};` and `$stmt;` to each consist
of two statements: `struct Foo {}` and `;`, and `$stmt` and `;`.
As a result, an attribute macro invoked as
`fn foo() { #[attr] struct Bar{}; }` will see `struct Bar{}` as its
input. Additionally, the 'unused semicolon' lint now fires in more
places.
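For example, with `struct Foo {};` now parsed as two statements, the stray `;` is an empty statement that the `redundant_semicolons` lint can flag:

```rust
fn foo() {
    struct Foo {}; // warning: unnecessary trailing semicolon
}

fn main() {
    foo();
}
```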
We now collect tokens for the underlying node wrapped by `StmtKind`
instead of storing tokens directly in `Stmt`.
`LazyTokenStream` now supports capturing a trailing semicolon after it
is initially constructed. This allows us to avoid refactoring statement
parsing to wrap the parsing of the semicolon in `parse_tokens`.
Attributes on item statements
(e.g. `fn foo() { #[bar] struct MyStruct; }`) are now treated as
item attributes, not statement attributes, which is consistent with how
we handle attributes on other kinds of statements. The feature-gating
code is adjusted so that proc-macro attributes are still allowed on item
statements on stable.
Two built-in macros (`#[global_allocator]` and `#[test]`) needed to be
adjusted to support being passed `Annotatable::Stmt`.
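For illustration, an attribute on an item statement attaches to the item itself, just as it would at module level:

```rust
fn main() {
    // `#[derive(Debug)]` is an item attribute on `MyStruct`, not a
    // statement attribute on the enclosing statement:
    #[derive(Debug)]
    struct MyStruct;
    println!("{:?}", MyStruct);
}
```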