Currently a `{D,Subd}iagnosticMessage` can be created from any type that
impls `Into<String>`. That includes `&str`, `String`, and `Cow<'static,
str>`, which are reasonable. It also includes `&String`, which is pretty
weird, and results in many places making unnecessary allocations for
patterns like this:
```
self.fatal(&format!(...))
```
This creates a string with `format!`, takes a reference, passes the
reference to `fatal`, which does an `into()`, which clones the
reference, doing a second allocation. Two allocations for a single
string, bleh.
This commit changes the `From` impls so that you can only create a
`{D,Subd}iagnosticMessage` from `&str`, `String`, or `Cow<'static,
str>`. This requires changing all the places that currently create one
from a `&String`. Most of these are of the `&format!(...)` form
described above; each one removes an unnecessary static `&`, plus an
allocation when executed. There are also a few places where the existing
use of `&String` was more reasonable; these now just use `clone()` at
the call site.
As well as making the code nicer and more efficient, this is a step
towards possibly using `Cow<'static, str>` in
`{D,Subd}iagnosticMessage::{Str,Eager}`. That would require changing
the `From<&'a str>` impls to `From<&'static str>`, which is doable, but
I'm not yet sure if it's worthwhile.
My type ascription
Oh rip it out
Ah
If you think we live too much then
You can sacrifice diagnostics
Don't mix your garbage
Into my syntax
So many weird hacks keep diagnostics alive
Yet I don't even step outside
So many bad diagnostics keep tyasc alive
Yet tyasc doesn't even bother to survive!
We currently provide wrong suggestions and unhelpful errors on closure
bodies with braces missing. For example, given the following code:
```
fn main() {
let _x = Box::new(|x|x+1;);
}
```
the current output is like this:
```
error: expected expression, found `)`
--> ./main.rs:2:30
|
2 | let _x = Box::new(|x|x+1;);
| ^ expected expression
error: closure bodies that contain statements must be surrounded by braces
--> ./main.rs:2:25
|
2 | let _x = Box::new(|x|x+1;);
| ^
3 | }
| ^
|
...
help: try adding braces
|
2 ~ let _x = Box::new(|x| {x+1;);
3 ~ }}
...
error: expected `;`, found `}`
--> ./main.rs:2:32
|
2 | let _x = Box::new(|x|x+1;);
| ^ help: add `;` here
3 | }
| - unexpected token
error: aborting due to 3 previous errors
```
This commit allows outputting correct suggestions and errors. The above
code would output like this:
```
error: closure bodies that contain statements must be surrounded by braces
--> ./main.rs:2:25
|
2 | let _x = Box::new(|x|x+1;);
| ^ ^
|
note: statement found outside of a block
--> ./main.rs:2:29
|
2 | let _x = Box::new(|x|x+1;);
| ---^ this `;` turns the preceding closure into a statement
| |
| this expression is a statement because of the trailing semicolon
note: the closure body may be incorrectly delimited
--> ./main.rs:2:23
|
2 | let _x = Box::new(|x|x+1;);
| ^^^^^^ - ...but likely you meant the closure to end here
| |
| this is the parsed closure...
help: try adding braces
|
2 | let _x = Box::new(|x| {x+1;});
| + +
error: aborting due to previous error
```
The motivation here is to eliminate the `Option<(Delimiter,
DelimSpan)>`, which is `None` for the outermost token stream and `Some`
for all other token streams.
We are already treating the innermost frame specially -- this is the
`frame` vs `stack` distinction in `TokenCursor`. We can push that
further so that `frame` only contains the cursor, and `stack` elements
contain the delimiters for their children. When we are in the outermost
token stream `stack` is empty, so there are no stored delimiters, which
is what we want because the outermost token stream *has* no delimiters.
This change also shrinks `TokenCursor`, which shrinks `Parser` and
`LazyAttrTokenStreamImpl`, which is nice.
Sometimes the parser needs to desugar a doc comment into `#[doc =
r"foo"]`. Currently it does this in a hacky way: by pushing a "fake" new
frame (one without a delimiter) onto the `TokenCursor` stack.
This commit changes things so that the token stream itself is modified
in place. The nice thing about this is that it means
`TokenCursorFrame::delim_sp` is now only `None` for the outermost frame.
Properly calculate best failure in macro matching
Previously, we used spans. This was not good. Sometimes, the span of the token that failed to match may come from a position later in the file which has been transcribed into a token stream way earlier in the file. If precisely this token fails to match, we think that it was the best match because its span is so high, even though other arms might have gotten further in the token stream.
We now try to properly use the location in the token stream.
This needs a little cleanup as the `best_failure` field is getting out of hand but it should be mostly good to go. I hope I didn't violate too many abstraction boundaries..
Previously, we used spans. This was not good. Sometimes, the span of the
token that failed to match may come from a position later in the file
which has been transcribed into a token stream way earlier in the file.
If precisely this token fails to match, we think that it was the best
match because its span is so high, even though other arms might have
gotten further in the token stream.
We now try to properly use the location in the token stream.
Because the error message that emit out is from main error of parser
The information of enum variant disappears while parsing enum variant with error
We only check the syntax of expecting token, i.e, in case #103869
It will error it without telling the message that this error is from pasring enum variant.
Propagate the sub-error from parsing enum variant to the main error of parser by chaining it with map_err
Check the sub-error before emitting the main error of parser and attach it.
Fix#103869
`MacArgs` is an enum with three variants: `Empty`, `Delimited`, and `Eq`. It's
used in two ways:
- For representing attribute macro arguments (e.g. in `AttrItem`), where all
three variants are used.
- For representing function-like macros (e.g. in `MacCall` and `MacroDef`),
where only the `Delimited` variant is used.
In other words, `MacArgs` is used in two quite different places due to them
having partial overlap. I find this makes the code hard to read. It also leads
to various unreachable code paths, and allows invalid values (such as
accidentally using `MacArgs::Empty` in a `MacCall`).
This commit splits `MacArgs` in two:
- `DelimArgs` is a new struct just for the "delimited arguments" case. It is
now used in `MacCall` and `MacroDef`.
- `AttrArgs` is a renaming of the old `MacArgs` enum for the attribute macro
case. Its `Delimited` variant now contains a `DelimArgs`.
Various other related things are renamed as well.
These changes make the code clearer, avoids several unreachable paths, and
disallows the invalid values.
Recover wrong-cased keywords that start items
(_this pr was inspired by [this tweet](https://twitter.com/Azumanga/status/1552982326409367561)_)
r? `@estebank`
We've talked a bit about this recovery, but I just wanted to make sure that this is the right approach :)
For now I've only added the case insensitive recovery to `use`s, since most other items like `impl` blocks, modules, functions can start with multiple keywords which complicates the matter.
`To` is better than `Create` for indicating that this is a non-consuming
conversion, rather than creating something out of nothing.
And the addition of `Attr` is because the current names makes them sound
like they relate to `TokenStream`, but really they relate to
`AttrTokenStream`.
These two type names are long and have long matching prefixes. I find
them hard to read, especially in combinations like
`AttrAnnotatedTokenStream::new(vec![AttrAnnotatedTokenTree::Token(..)])`.
This commit renames them as `AttrToken{Stream,Tree}`.
This PR will fix some typos detected by [typos].
I only picked the ones I was sure were spelling errors to fix, mostly in
the comments.
[typos]: https://github.com/crate-ci/typos
`Parser::parse_bottom_expr` currently constructs an empty `attrs` and
then passes it to a large number of other functions. This makes the code
harder to read than it should be, because it's not clear that many
`attrs` arguments are always empty.
This commit removes `attrs` and the passing, simplifying a lot of
functions. The commit also renames `Parser::mk_expr` (which takes an
`attrs` argument) as `mk_expr_with_attrs`, and introduces a new
`mk_expr` which creates an expression with no attributes, which is the
more common case.
Use Parser's `restrictions` instead of `let_expr_allowed`
This also means that the `ALLOW_LET` flag is reset properly for subexpressions, so we can properly deny things like `a && (b && let c = d)`. Also the parser is a tiny bit smaller now.
It doesn't reject _all_ bad `let` expr usages, just a bit more.
cc `@c410-f3r`
A `TokenStream` contains a `Lrc<Vec<(TokenTree, Spacing)>>`. But this is
not quite right. `Spacing` makes sense for `TokenTree::Token`, but does
not make sense for `TokenTree::Delimited`, because a
`TokenTree::Delimited` cannot be joined with another `TokenTree`.
This commit fixes this problem, by adding `Spacing` to `TokenTree::Token`,
changing `TokenStream` to contain a `Lrc<Vec<TokenTree>>`, and removing the
`TreeAndSpacing` typedef.
The commit removes these two impls:
- `impl From<TokenTree> for TokenStream`
- `impl From<TokenTree> for TreeAndSpacing`
These were useful, but also resulted in code with many `.into()` calls
that was hard to read, particularly for anyone not highly familiar with
the relevant types. This commit makes some other changes to compensate:
- `TokenTree::token()` becomes `TokenTree::token_{alone,joint}()`.
- `TokenStream::token_{alone,joint}()` are added.
- `TokenStream::delimited` is added.
This results in things like this:
```rust
TokenTree::token(token::Semi, stmt.span).into()
```
changing to this:
```rust
TokenStream::token_alone(token::Semi, stmt.span)
```
This makes the type of the result, and its spacing, clearer.
These changes also simplifies `Cursor` and `CursorRef`, because they no longer
need to distinguish between `next` and `next_with_spacing`.
Avoid some `&str` to `String` conversions with `MultiSpan::push_span_label`
This patch removes some`&str` to `String` conversions with `MultiSpan::push_span_label`.
Improve parser diagnostics
This pr fixes https://github.com/rust-lang/rust/issues/93867 and contains a couple of diagnostics related changes to the parser.
Here is a short list with some of the changes:
- don't suggest the same thing that is the current token
- suggest removing the current token if the following token is one of the suggestions (maybe incorrect)
- tell the user to put a type or lifetime after where if there is none (as a warning)
- reduce the amount of tokens suggested (via the new eat_noexpect and check_noexpect methods)
If any of these changes are undesirable, i can remove them, thanks!
Overhaul `MacArgs`
Motivation:
- Clarify some code that I found hard to understand.
- Eliminate one use of three places where `TokenKind::Interpolated` values are created.
r? `@petrochenkov`
The value in `MacArgs::Eq` is currently represented as a `Token`.
Because of `TokenKind::Interpolated`, `Token` can be either a token or
an arbitrary AST fragment. In practice, a `MacArgs::Eq` starts out as a
literal or macro call AST fragment, and then is later lowered to a
literal token. But this is very non-obvious. `Token` is a much more
general type than what is needed.
This commit restricts things, by introducing a new type `MacArgsEqKind`
that is either an AST expression (pre-lowering) or an AST literal
(post-lowering). The downside is that the code is a bit more verbose in
a few places. The benefit is that makes it much clearer what the
possibilities are (though also shorter in some other places). Also, it
removes one use of `TokenKind::Interpolated`, taking us a step closer to
removing that variant, which will let us make `Token` impl `Copy` and
remove many "handle Interpolated" code paths in the parser.
Things to note:
- Error messages have improved. Messages like this:
```
unexpected token: `"bug" + "found"`
```
now say "unexpected expression", which makes more sense. Although
arbitrary expressions can exist within tokens thanks to
`TokenKind::Interpolated`, that's not obvious to anyone who doesn't
know compiler internals.
- In `parse_mac_args_common`, we no longer need to collect tokens for
the value expression.
Change `span_suggestion` (and variants) to take `impl ToString` rather
than `String` for the suggested code, as this simplifies the
requirements on the diagnostic derive.
Signed-off-by: David Wood <david.wood@huawei.com>
This lets us clone just the parts within a `TokenTree` that need
cloning, rather than the entire thing. This is a surprisingly large
performance win, up to 4% on `async-std-1.10.0`.
This makes `CloseDelim` handling more like `OpenDelim` handling, which
produces `OpenDelim` and pushes the stack at the same time. It requires
some adjustment to `parse_token_tree` now that we don't remain within
the frame after getting the `CloseDelim`.
A Google search of the error message fails to return any relevant
resuts, suggesting this has never occurred in practice. And removeing it
reduces instruction counts by up to 2% on some benchmarks.
The loop is there to handle a `NoDelim` open/close token. This commit
changes `TokenCursor::inlined_next` so it never returns such a token.
This is a performance win because the conditional test in `bump()` is
removed.
If the parser needs changing in the future to handle `NoDelim` tokens,
then `inlined_next()` can easily be changed to return them.
The `DelimToken` here is `NoDelim`, which means the returned delim
tokens will just be ignored by `Parser::bump()`. This commit changes
things so the delim tokens won't be returned.
This will facilitate the change in the next commit.
`boolean` arguments aren't great, but the function is only used in three
places within this one file.
In particular, avoid wrapping a token within `TokenTree::Token` and then
immediately matching it and returning the token within. Just return the
token immediately.
Parse inner attributes on inline const block
According to https://github.com/rust-lang/rust/pull/84414#issuecomment-826150936, inner attributes are intended to be supported *"in all containers for statements (or some subset of statements)"*.
This PR adds inner attribute parsing and pretty-printing for inline const blocks (https://github.com/rust-lang/rust/issues/76001), which contain statements just like an unsafe block or a loop body.
```rust
let _ = const {
#![allow(...)]
let x = ();
x
};
```
By heap allocating the argument within `NtPath`, `NtVis`, and `NtStmt`.
This slightly reduces cumulative and peak allocation amounts, most
notably on `deep-vector`.
`MultiSpan` contains labels, which are more complicated with the
introduction of diagnostic translation and will use types from
`rustc_errors` - however, `rustc_errors` depends on `rustc_span` so
`rustc_span` cannot use types like `DiagnosticMessage` without
dependency cycles. Introduce a new `rustc_error_messages` crate that can
contain `DiagnosticMessage` and `MultiSpan`.
Signed-off-by: David Wood <david.wood@huawei.com>
It's only needed for macro expansion, not as a general element in the
AST. This commit removes it, adds `NtOrTt` for the parser and macro
expansion cases, and renames the variants in `NamedMatch` to better
match the new type.
The call site within `Parser::bump` is hot.
Also add an inline annotation to `Parser::next_tok`. It was already
being inlined by the compiler; this just makes sure that continues.
* Recover from invalid `'label: ` before block.
* Make suggestion to enclose statements in a block multipart.
* Point at `match`, `while`, `loop` and `unsafe` keywords when failing
to parse their expression.
* Do not suggest `{ ; }`.
* Do not suggest `|` when very unlikely to be what was wanted (in `let`
statements).
This commit focuses on emitting clean errors for the following syntax
error:
```
Some(42).map(|a|
dbg!(a);
a
);
```
Previous implementation tried to recover after parsing the closure body
(the `dbg` expression) by replacing the next `;` with a `,`, which made
the next expression belong to the next function argument. As such, the
following errors were emitted (among others):
- the semicolon token was not expected,
- a is not in scope,
- Option::map is supposed to take one argument, not two.
This commit allows us to gracefully handle this situation by adding
giving the parser the ability to remember when it has just parsed a
closure body inside a function call. When this happens, we can treat the
unexpected `;` specifically and try to parse as much statements as
possible in order to eat the whole block. When we can't parse statements
anymore, we generate a clean error indicating that the braces are
missing, and return an ExprKind::Err.
Point at unclosed delimiters as part of the primary MultiSpan
Both the place where the parser encounters a needed closed delimiter and
the unclosed opening delimiter are important, so they should get the
same level of highlighting in the output.
_Context: https://twitter.com/mwk4/status/1430631546432675840_
Both the place where the parser encounters a needed closed delimiter and
the unclosed opening delimiter are important, so they should get the
same level of highlighting in the output.
# Stabilization report
## Summary
This stabilizes using macro expansion in key-value attributes, like so:
```rust
#[doc = include_str!("my_doc.md")]
struct S;
#[path = concat!(env!("OUT_DIR"), "/generated.rs")]
mod m;
```
See the changes to the reference for details on what macros are allowed;
see Petrochenkov's excellent blog post [on internals](https://internals.rust-lang.org/t/macro-expansion-points-in-attributes/11455)
for alternatives that were considered and rejected ("why accept no more
and no less?")
This has been available on nightly since 1.50 with no major issues.
## Notes
### Accepted syntax
The parser accepts arbitrary Rust expressions in this position, but any expression other than a macro invocation will ultimately lead to an error because it is not expected by the built-in expression forms (e.g., `#[doc]`). Note that decorators and the like may be able to observe other expression forms.
### Expansion ordering
Expansion of macro expressions in "inert" attributes occurs after decorators have executed, analogously to macro expressions appearing in the function body or other parts of decorator input.
There is currently no way for decorators to accept macros in key-value position if macro expansion must be performed before the decorator executes (if the macro can simply be copied into the output for later expansion, that can work).
## Test cases
- https://github.com/rust-lang/rust/blob/master/src/test/ui/attributes/key-value-expansion-on-mac.rs
- https://github.com/rust-lang/rust/blob/master/src/test/rustdoc/external-doc.rs
The feature has also been dogfooded extensively in the compiler and
standard library:
- https://github.com/rust-lang/rust/pull/83329
- https://github.com/rust-lang/rust/pull/83230
- https://github.com/rust-lang/rust/pull/82641
- https://github.com/rust-lang/rust/pull/80534
## Implementation history
- Initial proposal: https://github.com/rust-lang/rust/issues/55414#issuecomment-554005412
- Experiment to see how much code it would break: https://github.com/rust-lang/rust/pull/67121
- Preliminary work to restrict expansion that would conflict with this
feature: https://github.com/rust-lang/rust/pull/77271
- Initial implementation: https://github.com/rust-lang/rust/pull/78837
- Fix for an ICE: https://github.com/rust-lang/rust/pull/80563
## Unresolved Questions
~~https://github.com/rust-lang/rust/pull/83366#issuecomment-805180738 listed some concerns, but they have been resolved as of this final report.~~
## Additional Information
There are two workarounds that have a similar effect for `#[doc]`
attributes on nightly. One is to emulate this behavior by using a limited version of this feature that was stabilized for historical reasons:
```rust
macro_rules! forward_inner_docs {
($e:expr => $i:item) => {
#[doc = $e]
$i
};
}
forward_inner_docs!(include_str!("lib.rs") => struct S {});
```
This also works for other attributes (like `#[path = concat!(...)]`).
The other is to use `doc(include)`:
```rust
#![feature(external_doc)]
#[doc(include = "lib.rs")]
struct S {}
```
The first works, but is non-trivial for people to discover, and
difficult to read and maintain. The second is a strange special-case for
a particular use of the macro. This generalizes it to work for any use
case, not just including files.
I plan to remove `doc(include)` when this is stabilized. The
`forward_inner_docs` workaround will still compile without warnings, but
I expect it to be used less once it's no longer necessary.
This PR modifies the macro expansion infrastructure to handle attributes
in a fully token-based manner. As a result:
* Derives macros no longer lose spans when their input is modified
by eager cfg-expansion. This is accomplished by performing eager
cfg-expansion on the token stream that we pass to the derive
proc-macro
* Inner attributes now preserve spans in all cases, including when we
have multiple inner attributes in a row.
This is accomplished through the following changes:
* New structs `AttrAnnotatedTokenStream` and `AttrAnnotatedTokenTree` are introduced.
These are very similar to a normal `TokenTree`, but they also track
the position of attributes and attribute targets within the stream.
They are built when we collect tokens during parsing.
An `AttrAnnotatedTokenStream` is converted to a regular `TokenStream` when
we invoke a macro.
* Token capturing and `LazyTokenStream` are modified to work with
`AttrAnnotatedTokenStream`. A new `ReplaceRange` type is introduced, which
is created during the parsing of a nested AST node to make the 'outer'
AST node aware of the attributes and attribute target stored deeper in the token stream.
* When we need to perform eager cfg-expansion (either due to `#[derive]` or `#[cfg_eval]`),
we tokenize and reparse our target, capturing additional information about the locations of
`#[cfg]` and `#[cfg_attr]` attributes at any depth within the target.
This is a performance optimization, allowing us to perform less work
in the typical case where captured tokens never have eager cfg-expansion run.
Those two recovery attempts have a very bad interaction that causes too
unnecessary output. Add a simple gate to avoid interpreting a `;` as a
`,` when there are unclosed braces.
Previously, we would silently remove any `None`-delimiters when
capturing a `TokenStream`, 'flattenting' them to their inner tokens.
This was not normally visible, since we usually have
`TokenKind::Interpolated` (which gets converted to a `None`-delimited
group during macro invocation) instead of an actual `None`-delimited
group.
However, there are a couple of cases where this becomes visible to
proc-macros:
1. A cross-crate `macro_rules!` macro has a `None`-delimited group
stored in its body (as a result of being produced by another
`macro_rules!` macro). The cross-crate `macro_rules!` invocation
can then expand to an attribute macro invocation, which needs
to be able to see the `None`-delimited group.
2. A proc-macro can invoke an attribute proc-macro with its re-collected
input. If there are any nonterminals present in the input, they will
get re-collected to `None`-delimited groups, which will then get
captured as part of the attribute macro invocation.
Both of these cases are incredibly obscure, so there hopefully won't be
any breakage. This change will allow more agressive 'flattenting' of
nonterminals in #82608 without losing `None`-delimited groups.
When token-based attribute handling is implemeneted in #80689,
we will need to access tokens from `HasAttrs` (to perform
cfg-stripping), and we will to access attributes from `HasTokens` (to
construct a `PreexpTokenStream`).
This PR merges the `HasAttrs` and `HasTokens` traits into a new
`AstLike` trait. The previous `HasAttrs` impls from `Vec<Attribute>` and `AttrVec`
are removed - they aren't attribute targets, so the impls never really
made sense.
Improve suggestion for tuple struct pattern matching errors.
Closes#80174
This change allows numbers to be parsed as field names when pattern matching on structs, which allows us to provide better error messages when tuple structs are matched using a struct pattern.
r? ``@estebank``
Along the way, we also implement a handful of diagnostics improvements
and fixes, particularly with respect to the special handling of `||` in
place of `|` and when there are leading verts in function params, which
don't allow top-level or-patterns anyway.
This is a pure refactoring split out from #80689.
It represents the most invasive part of that PR, requiring changes in
every caller of `parse_outer_attributes`
In order to eagerly expand `#[cfg]` attributes while preserving the
original `TokenStream`, we need to know the range of tokens that
corresponds to every attribute target. This is accomplished by making
`parse_outer_attributes` return an opaque `AttrWrapper` struct. An
`AttrWrapper` must be converted to a plain `AttrVec` by passing it to
`collect_tokens_trailing_token`. This makes it difficult to accidentally
construct an AST node with attributes without calling `collect_tokens_trailing_token`,
since AST nodes store an `AttrVec`, not an `AttrWrapper`.
As a result, we now call `collect_tokens_trailing_token` for attribute
targets which only support inert attributes, such as generic arguments
and struct fields. Currently, the constructed `LazyTokenStream` is
simply discarded. Future PRs will record the token range corresponding
to the attribute target, allowing those tokens to be removed from an
enclosing `collect_tokens_trailing_token` call if necessary.
Reverts PR #80830Fixestaiki-e/pin-project#312
We can have an arbitrary number of `None`-delimited group frames pushed
on the stack due to proc-macro invocations, which can legally be exited.
Attempting to account for this would add a lot of complexity for a tiny
performance gain, so let's just use the original strategy.
Improve diagnostics when parsing angle args
https://github.com/rust-lang/rust/pull/79266 introduced parsing of generic arguments in associated type constraints, this however resulted in possibly very confusing error messages in cases in which closing angle brackets were missing such as in `Vec<(u32, _, _) = vec![]`, which outputs an incorrectly parsed equality constraint error, as noted by `@cynecx.`
This PR tries to provide better error messages in such cases.
r? `@petrochenkov`
Currently, when a user uses a struct pattern to pattern match on
a tuple struct, the errors we emit generally suggest adding fields
using their field names, which are numbers. However, numbers are
not valid identifiers, so the suggestions, which use the shorthand
notation, are not valid syntax. This commit changes those errors
to suggest using the actual tuple struct pattern syntax instead,
which is a more actionable suggestion.
Fixes#81007
Previously, we would fail to collect tokens in the proper place when
only builtin attributes were present. As a result, we would end up with
attribute tokens in the collected `TokenStream`, leading to duplication
when we attempted to prepend the attributes from the AST node.
We now explicitly track when token collection must be performed due to
nomterminal parsing.
A new `HasTokens` trait is introduced, which is used to move logic from
the callers of `collect_tokens` into the body of `collect_tokens`.
In addition to reducing duplication, this paves the way for PR #80689,
which needs to perform additional logic during token collection.
We will never need to pop past our starting frame during token
capturing. Using an empty stack allows us to avoid pointless heap
allocations/deallocations.
If we try to capture the `Vec<u8>` in `Option<Vec<u8>>`, we'll
need to capture a `>` token which was 'unglued' from a `>>` token.
The processing of unglueing a token for parsing purposes bypasses the
usual capturing infrastructure, so we currently lose the trailing `>`.
As a result, we fall back to the reparsed `TokenStream`, causing us to
lose spans.
This commit makes token capturing keep track of a trailing 'unglued'
token. Note that we don't need to care about unglueing except at the end
of the captured tokens - if we capture both the first and second unglued
tokens, then we'll end up capturing the full 'glued' token, which
already works correctly.
rustc_parse: fix ConstBlock expr span
The span for a ConstBlock expression should presumably run through the end of the block it contains and not stop at the keyword, just like is done with similar block-containing expression kinds, such as a TryBlock
We now collect tokens for the underlying node wrapped by `StmtKind`
instead of storing tokens directly in `Stmt`.
`LazyTokenStream` now supports capturing a trailing semicolon after it
is initially constructed. This allows us to avoid refactoring statement
parsing to wrap the parsing of the semicolon in `parse_tokens`.
Attributes on item statements
(e.g. `fn foo() { #[bar] struct MyStruct; }`) are now treated as
item attributes, not statement attributes, which is consistent with how
we handle attributes on other kinds of statements. The feature-gating
code is adjusted so that proc-macro attributes are still allowed on item
statements on stable.
Two built-in macros (`#[global_allocator]` and `#[test]`) needed to be
adjusted to support being passed `Annotatable::Stmt`.
The optimization conflates empty token streams with unknown token stream, which is at least suspicious, and doesn't affect performance because 0-length token streams are very rare.
Suggest that expressions that look like const generic arguments should be enclosed in brackets
I pulled out the changes for const expressions from https://github.com/rust-lang/rust/pull/71592 (without the trait object diagnostic changes) and made some small changes; the implementation is `@estebank's.`
We're also going to want to make some changes separately to account for trait objects (they result in poor diagnostics, as is evident from one of the test cases here), such as an adaption of https://github.com/rust-lang/rust/pull/72273.
Fixes https://github.com/rust-lang/rust/issues/70753.
r? `@petrochenkov`
Unconditionally capture tokens for attributes.
This allows us to avoid synthesizing tokens in `prepend_attr`, since we
have the original tokens available.
We still need to synthesize tokens when expanding `cfg_attr`,
but this is an unavoidable consequence of the syntax of `cfg_attr` -
the user does not supply the `#` and `[]` tokens that a `cfg_attr`
expands to.
This is based on PR https://github.com/rust-lang/rust/pull/77250 - this PR exposes a bug in the current `collect_tokens` implementation, which is fixed by the rewrite.
Rewrite `collect_tokens` implementations to use a flattened buffer
Instead of trying to collect tokens at each depth, we 'flatten' the
stream as we go allong, pushing open/close delimiters to our buffer
just like regular tokens. One capturing is complete, we reconstruct a
nested `TokenTree::Delimited` structure, producing a normal
`TokenStream`.
The reconstructed `TokenStream` is not created immediately - instead, it is
produced on-demand by a closure (wrapped in a new `LazyTokenStream` type). This
closure stores a clone of the original `TokenCursor`, plus a record of the
number of calls to `next()/next_desugared()`. This is sufficient to reconstruct
the tokenstream seen by the callback without storing any additional state. If
the tokenstream is never used (e.g. when a captured `macro_rules!` argument is
never passed to a proc macro), we never actually create a `TokenStream`.
This implementation has a number of advantages over the previous one:
* It is significantly simpler, with no edge cases around capturing the
start/end of a delimited group.
* It can be easily extended to allow replacing tokens an an arbitrary
'depth' by just using `Vec::splice` at the proper position. This is
important for PR #76130, which requires us to track information about
attributes along with tokens.
* The lazy approach to `TokenStream` construction allows us to easily
parse an AST struct, and then decide after the fact whether we need a
`TokenStream`. This will be useful when we start collecting tokens for
`Attribute` - we can discard the `LazyTokenStream` if the parsed
attribute doesn't need tokens (e.g. is a builtin attribute).
The performance impact seems to be neglibile (see
https://github.com/rust-lang/rust/pull/77250#issuecomment-703960604). There is a
small slowdown on a few benchmarks, but it only rises above 1% for incremental
builds, where it represents a larger fraction of the much smaller instruction
count. There a ~1% speedup on a few other incremental benchmarks - my guess is
that the speedups and slowdowns will usually cancel out in practice.
Instead of trying to collect tokens at each depth, we 'flatten' the
stream as we go allong, pushing open/close delimiters to our buffer
just like regular tokens. One capturing is complete, we reconstruct a
nested `TokenTree::Delimited` structure, producing a normal
`TokenStream`.
The reconstructed `TokenStream` is not created immediately - instead, it is
produced on-demand by a closure (wrapped in a new `LazyTokenStream` type). This
closure stores a clone of the original `TokenCursor`, plus a record of the
number of calls to `next()/next_desugared()`. This is sufficient to reconstruct
the tokenstream seen by the callback without storing any additional state. If
the tokenstream is never used (e.g. when a captured `macro_rules!` argument is
never passed to a proc macro), we never actually create a `TokenStream`.
This implementation has a number of advantages over the previous one:
* It is significantly simpler, with no edge cases around capturing the
start/end of a delimited group.
* It can be easily extended to allow replacing tokens an an arbitrary
'depth' by just using `Vec::splice` at the proper position. This is
important for PR #76130, which requires us to track information about
attributes along with tokens.
* The lazy approach to `TokenStream` construction allows us to easily
parse an AST struct, and then decide after the fact whether we need a
`TokenStream`. This will be useful when we start collecting tokens for
`Attribute` - we can discard the `LazyTokenStream` if the parsed
attribute doesn't need tokens (e.g. is a builtin attribute).
The performance impact seems to be neglibile (see
https://github.com/rust-lang/rust/pull/77250#issuecomment-703960604). There is a
small slowdown on a few benchmarks, but it only rises above 1% for incremental
builds, where it represents a larger fraction of the much smaller instruction
count. There a ~1% speedup on a few other incremental benchmarks - my guess is
that the speedups and slowdowns will usually cancel out in practice.
This approach lives exclusively in the parser, so struct expr bodies
that are syntactically correct on their own but are otherwise incorrect
will still emit confusing errors, like in the following case:
```rust
fn foo() -> Foo {
bar: Vec::new()
}
```
```
error[E0425]: cannot find value `bar` in this scope
--> src/file.rs:5:5
|
5 | bar: Vec::new()
| ^^^ expecting a type here because of type ascription
error[E0214]: parenthesized type parameters may only be used with a `Fn` trait
--> src/file.rs:5:15
|
5 | bar: Vec::new()
| ^^^^^ only `Fn` traits may use parentheses
error[E0107]: wrong number of type arguments: expected 1, found 0
--> src/file.rs:5:10
|
5 | bar: Vec::new()
| ^^^^^^^^^^ expected 1 type argument
```
If that field had a trailing comma, that would be a parse error and it
would trigger the new, more targetted, error:
```
error: struct literal body without path
--> file.rs:4:17
|
4 | fn foo() -> Foo {
| _________________^
5 | | bar: Vec::new(),
6 | | }
| |_^
|
help: you might have forgotten to add the struct literal inside the block
|
4 | fn foo() -> Foo { Path {
5 | bar: Vec::new(),
6 | } }
|
```
Partially address last part of #34255.
Improve recovery on malformed format call
The token following a format expression should be a comma. However, when it is replaced with a similar token (such as a dot), then the corresponding error is emitted, but the token is treated as a comma, and the parsing step continues.
r? @petrochenkov