When encountering code like `f::<f::<f::<f::<f::<f::<f::<f::<...` with
unmatched closing angle brackets, add a linear check that avoids the
exponential behavior of the parse recovery mechanism.
Fix#117080.
```
error: expected one of `,`, `:`, or `}`, found `.`
--> $DIR/missing-fat-arrow.rs:25:14
|
LL | Some(a) if a.value == b {
| - while parsing this struct
LL | a.value = 1;
| -^ expected one of `,`, `:`, or `}`
| |
| while parsing this struct field
|
help: try naming a field
|
LL | a: a.value = 1;
| ++
help: you might have meant to start a match arm after the match guard
|
LL | Some(a) if a.value == b => {
| ++
```
Fix#78585.
There was an incomplete version of the check in parsing and a second
version in AST validation. This meant that some, but not all, invalid
uses were allowed inside macros/disabled cfgs. It also means that later
passes have a hard time knowing when the let expression is in a valid
location, sometimes causing ICEs.
- Add a field to ExprKind::Let in AST/HIR to mark whether it's in a
valid location.
- Suppress later errors and MIR construction for invalid let
expressions.
It's the same as `Delimiter`, minus the `Invisible` variant. I'm
generally in favour of using types to make impossible states
unrepresentable, but this one feels very low-value, and the conversions
between the two types are annoying and confusing.
Look at the change in `src/tools/rustfmt/src/expr.rs` for an example:
the old code converted from `MacDelimiter` to `Delimiter` and back
again, for no good reason. This suggests the author was confused about
the types.
Similar to the last commit, it's more of a `Parser`-level concern than a
`TokenCursor`-level concern. And the struct size reductions are nice.
After this change, `TokenCursor` is as minimal as possible (two fields
and two methods) which is nice.
It's more of a `Parser`-level concern than a `TokenCursor`-level
concern. Also, `num_bump_calls` is a more accurate name, because it's
incremented in `Parser::bump`.
Move doc comment desugaring out of `TokenCursor`.
It's awkward that `TokenCursor` sometimes desugars doc comments on the fly, but usually doesn't.
r? `@petrochenkov`
`TokenCursor` currently does doc comment desugaring on the fly, if the
`desugar_doc_comment` field is set. This requires also modifying the
token stream on the fly with `replace_prev_and_rewind`.
This commit moves the doc comment desugaring out of `TokenCursor`, by
introducing a new `TokenStream::desugar_doc_comment` method. This
separation of desugaring and iterating makes the code nicer.
It doesn't really matter what the `desugar_doc_comments` argument is
here, because in practice we never look ahead through doc comments.
Changing it to `cursor.desugar_doc_comments` will allow some follow-up
simplifications.
Currently a `{D,Subd}iagnosticMessage` can be created from any type that
impls `Into<String>`. That includes `&str`, `String`, and `Cow<'static,
str>`, which are reasonable. It also includes `&String`, which is pretty
weird, and results in many places making unnecessary allocations for
patterns like this:
```
self.fatal(&format!(...))
```
This creates a string with `format!`, takes a reference, passes the
reference to `fatal`, which does an `into()`, which clones the
reference, doing a second allocation. Two allocations for a single
string, bleh.
This commit changes the `From` impls so that you can only create a
`{D,Subd}iagnosticMessage` from `&str`, `String`, or `Cow<'static,
str>`. This requires changing all the places that currently create one
from a `&String`. Most of these are of the `&format!(...)` form
described above; each one removes an unnecessary static `&`, plus an
allocation when executed. There are also a few places where the existing
use of `&String` was more reasonable; these now just use `clone()` at
the call site.
As well as making the code nicer and more efficient, this is a step
towards possibly using `Cow<'static, str>` in
`{D,Subd}iagnosticMessage::{Str,Eager}`. That would require changing
the `From<&'a str>` impls to `From<&'static str>`, which is doable, but
I'm not yet sure if it's worthwhile.
My type ascription
Oh rip it out
Ah
If you think we live too much then
You can sacrifice diagnostics
Don't mix your garbage
Into my syntax
So many weird hacks keep diagnostics alive
Yet I don't even step outside
So many bad diagnostics keep tyasc alive
Yet tyasc doesn't even bother to survive!
We currently provide wrong suggestions and unhelpful errors on closure
bodies with braces missing. For example, given the following code:
```
fn main() {
let _x = Box::new(|x|x+1;);
}
```
the current output is like this:
```
error: expected expression, found `)`
--> ./main.rs:2:30
|
2 | let _x = Box::new(|x|x+1;);
| ^ expected expression
error: closure bodies that contain statements must be surrounded by braces
--> ./main.rs:2:25
|
2 | let _x = Box::new(|x|x+1;);
| ^
3 | }
| ^
|
...
help: try adding braces
|
2 ~ let _x = Box::new(|x| {x+1;);
3 ~ }}
...
error: expected `;`, found `}`
--> ./main.rs:2:32
|
2 | let _x = Box::new(|x|x+1;);
| ^ help: add `;` here
3 | }
| - unexpected token
error: aborting due to 3 previous errors
```
This commit allows outputting correct suggestions and errors. The above
code would output like this:
```
error: closure bodies that contain statements must be surrounded by braces
--> ./main.rs:2:25
|
2 | let _x = Box::new(|x|x+1;);
| ^ ^
|
note: statement found outside of a block
--> ./main.rs:2:29
|
2 | let _x = Box::new(|x|x+1;);
| ---^ this `;` turns the preceding closure into a statement
| |
| this expression is a statement because of the trailing semicolon
note: the closure body may be incorrectly delimited
--> ./main.rs:2:23
|
2 | let _x = Box::new(|x|x+1;);
| ^^^^^^ - ...but likely you meant the closure to end here
| |
| this is the parsed closure...
help: try adding braces
|
2 | let _x = Box::new(|x| {x+1;});
| + +
error: aborting due to previous error
```
The motivation here is to eliminate the `Option<(Delimiter,
DelimSpan)>`, which is `None` for the outermost token stream and `Some`
for all other token streams.
We are already treating the innermost frame specially -- this is the
`frame` vs `stack` distinction in `TokenCursor`. We can push that
further so that `frame` only contains the cursor, and `stack` elements
contain the delimiters for their children. When we are in the outermost
token stream `stack` is empty, so there are no stored delimiters, which
is what we want because the outermost token stream *has* no delimiters.
This change also shrinks `TokenCursor`, which shrinks `Parser` and
`LazyAttrTokenStreamImpl`, which is nice.
Sometimes the parser needs to desugar a doc comment into `#[doc =
r"foo"]`. Currently it does this in a hacky way: by pushing a "fake" new
frame (one without a delimiter) onto the `TokenCursor` stack.
This commit changes things so that the token stream itself is modified
in place. The nice thing about this is that it means
`TokenCursorFrame::delim_sp` is now only `None` for the outermost frame.
Properly calculate best failure in macro matching
Previously, we used spans. This was not good. Sometimes, the span of the token that failed to match may come from a position later in the file which has been transcribed into a token stream way earlier in the file. If precisely this token fails to match, we think that it was the best match because its span is so high, even though other arms might have gotten further in the token stream.
We now try to properly use the location in the token stream.
This needs a little cleanup as the `best_failure` field is getting out of hand but it should be mostly good to go. I hope I didn't violate too many abstraction boundaries..
Previously, we used spans. This was not good. Sometimes, the span of the
token that failed to match may come from a position later in the file
which has been transcribed into a token stream way earlier in the file.
If precisely this token fails to match, we think that it was the best
match because its span is so high, even though other arms might have
gotten further in the token stream.
We now try to properly use the location in the token stream.
Because the error message that emit out is from main error of parser
The information of enum variant disappears while parsing enum variant with error
We only check the syntax of expecting token, i.e, in case #103869
It will error it without telling the message that this error is from pasring enum variant.
Propagate the sub-error from parsing enum variant to the main error of parser by chaining it with map_err
Check the sub-error before emitting the main error of parser and attach it.
Fix#103869
`MacArgs` is an enum with three variants: `Empty`, `Delimited`, and `Eq`. It's
used in two ways:
- For representing attribute macro arguments (e.g. in `AttrItem`), where all
three variants are used.
- For representing function-like macros (e.g. in `MacCall` and `MacroDef`),
where only the `Delimited` variant is used.
In other words, `MacArgs` is used in two quite different places due to them
having partial overlap. I find this makes the code hard to read. It also leads
to various unreachable code paths, and allows invalid values (such as
accidentally using `MacArgs::Empty` in a `MacCall`).
This commit splits `MacArgs` in two:
- `DelimArgs` is a new struct just for the "delimited arguments" case. It is
now used in `MacCall` and `MacroDef`.
- `AttrArgs` is a renaming of the old `MacArgs` enum for the attribute macro
case. Its `Delimited` variant now contains a `DelimArgs`.
Various other related things are renamed as well.
These changes make the code clearer, avoids several unreachable paths, and
disallows the invalid values.
Recover wrong-cased keywords that start items
(_this pr was inspired by [this tweet](https://twitter.com/Azumanga/status/1552982326409367561)_)
r? `@estebank`
We've talked a bit about this recovery, but I just wanted to make sure that this is the right approach :)
For now I've only added the case insensitive recovery to `use`s, since most other items like `impl` blocks, modules, functions can start with multiple keywords which complicates the matter.
`To` is better than `Create` for indicating that this is a non-consuming
conversion, rather than creating something out of nothing.
And the addition of `Attr` is because the current names makes them sound
like they relate to `TokenStream`, but really they relate to
`AttrTokenStream`.
These two type names are long and have long matching prefixes. I find
them hard to read, especially in combinations like
`AttrAnnotatedTokenStream::new(vec![AttrAnnotatedTokenTree::Token(..)])`.
This commit renames them as `AttrToken{Stream,Tree}`.
This PR will fix some typos detected by [typos].
I only picked the ones I was sure were spelling errors to fix, mostly in
the comments.
[typos]: https://github.com/crate-ci/typos
`Parser::parse_bottom_expr` currently constructs an empty `attrs` and
then passes it to a large number of other functions. This makes the code
harder to read than it should be, because it's not clear that many
`attrs` arguments are always empty.
This commit removes `attrs` and the passing, simplifying a lot of
functions. The commit also renames `Parser::mk_expr` (which takes an
`attrs` argument) as `mk_expr_with_attrs`, and introduces a new
`mk_expr` which creates an expression with no attributes, which is the
more common case.
Use Parser's `restrictions` instead of `let_expr_allowed`
This also means that the `ALLOW_LET` flag is reset properly for subexpressions, so we can properly deny things like `a && (b && let c = d)`. Also the parser is a tiny bit smaller now.
It doesn't reject _all_ bad `let` expr usages, just a bit more.
cc `@c410-f3r`
A `TokenStream` contains a `Lrc<Vec<(TokenTree, Spacing)>>`. But this is
not quite right. `Spacing` makes sense for `TokenTree::Token`, but does
not make sense for `TokenTree::Delimited`, because a
`TokenTree::Delimited` cannot be joined with another `TokenTree`.
This commit fixes this problem, by adding `Spacing` to `TokenTree::Token`,
changing `TokenStream` to contain a `Lrc<Vec<TokenTree>>`, and removing the
`TreeAndSpacing` typedef.
The commit removes these two impls:
- `impl From<TokenTree> for TokenStream`
- `impl From<TokenTree> for TreeAndSpacing`
These were useful, but also resulted in code with many `.into()` calls
that was hard to read, particularly for anyone not highly familiar with
the relevant types. This commit makes some other changes to compensate:
- `TokenTree::token()` becomes `TokenTree::token_{alone,joint}()`.
- `TokenStream::token_{alone,joint}()` are added.
- `TokenStream::delimited` is added.
This results in things like this:
```rust
TokenTree::token(token::Semi, stmt.span).into()
```
changing to this:
```rust
TokenStream::token_alone(token::Semi, stmt.span)
```
This makes the type of the result, and its spacing, clearer.
These changes also simplifies `Cursor` and `CursorRef`, because they no longer
need to distinguish between `next` and `next_with_spacing`.
Avoid some `&str` to `String` conversions with `MultiSpan::push_span_label`
This patch removes some`&str` to `String` conversions with `MultiSpan::push_span_label`.
Improve parser diagnostics
This pr fixes https://github.com/rust-lang/rust/issues/93867 and contains a couple of diagnostics related changes to the parser.
Here is a short list with some of the changes:
- don't suggest the same thing that is the current token
- suggest removing the current token if the following token is one of the suggestions (maybe incorrect)
- tell the user to put a type or lifetime after where if there is none (as a warning)
- reduce the amount of tokens suggested (via the new eat_noexpect and check_noexpect methods)
If any of these changes are undesirable, i can remove them, thanks!
Overhaul `MacArgs`
Motivation:
- Clarify some code that I found hard to understand.
- Eliminate one use of three places where `TokenKind::Interpolated` values are created.
r? `@petrochenkov`
The value in `MacArgs::Eq` is currently represented as a `Token`.
Because of `TokenKind::Interpolated`, `Token` can be either a token or
an arbitrary AST fragment. In practice, a `MacArgs::Eq` starts out as a
literal or macro call AST fragment, and then is later lowered to a
literal token. But this is very non-obvious. `Token` is a much more
general type than what is needed.
This commit restricts things, by introducing a new type `MacArgsEqKind`
that is either an AST expression (pre-lowering) or an AST literal
(post-lowering). The downside is that the code is a bit more verbose in
a few places. The benefit is that makes it much clearer what the
possibilities are (though also shorter in some other places). Also, it
removes one use of `TokenKind::Interpolated`, taking us a step closer to
removing that variant, which will let us make `Token` impl `Copy` and
remove many "handle Interpolated" code paths in the parser.
Things to note:
- Error messages have improved. Messages like this:
```
unexpected token: `"bug" + "found"`
```
now say "unexpected expression", which makes more sense. Although
arbitrary expressions can exist within tokens thanks to
`TokenKind::Interpolated`, that's not obvious to anyone who doesn't
know compiler internals.
- In `parse_mac_args_common`, we no longer need to collect tokens for
the value expression.
Change `span_suggestion` (and variants) to take `impl ToString` rather
than `String` for the suggested code, as this simplifies the
requirements on the diagnostic derive.
Signed-off-by: David Wood <david.wood@huawei.com>