Adding details, clarifying lots of little things, etc. In particular,
the commit adds details of an example. I find this very helpful, because
it's taken me a long time to understand how this code works.
Currently in `collect_tokens_trailing_token`, `start_pos` and `end_pos`
are 1-indexed by `replace_ranges` is 0-indexed, which is really
confusing. Making them both 0-indexed makes debugging much easier.
The `Option`s within the `ReplaceRange`s within the hashmap are always
`None`. This PR omits them and inserts them when they are extracted from
the hashmap.
It's used in `Parser::collect_tokens_trailing_token` to decide whether
to capture a trailing token. But the callers actually know whether to
capture a trailing token, so it's simpler for them to just pass in a
bool.
Also, the `TrailingToken::Gt` case was weird, because it didn't result
in a trailing token being captured. It could have been subsumed by the
`TrailingToken::MaybeComma` case, and it effectively is in the new code.
Fix `DebugParser`.
I tried using this and it didn't work at all. `prev_token` is never eof, so the accumulator is always false, which means the `then_some` always returns `None`, which means `scan` always returns `None`, and `tokens` always ends up an empty vec. I'm not sure how this code was supposed to work.
(An aside: I find `Iterator::scan` to be a pretty wretched function, that produces code which is very hard to understand. Probably why this is just one of two uses of it in the entire compiler.)
This commit changes it to a simpler imperative style that produces a valid `tokens` vec.
r? `@workingjubilee`
It currently doesn't work at all. This commit changes it to a simpler
imperative style that produces a valid `tokens` vec.
(An aside: I find `Iterator::scan` to be a pretty wretched function,
that produces code which is very hard to understand. Probably why this
is just one of two uses of it in the entire compiler.)
This new special case is simpler than the old special case because it
only is used when `dist == 1`. But that's still enough to cover ~98% of
cases. This results in equivalent performance to the old special case,
and identical behaviour as the general case.
The general case at the bottom of `look_ahead` is slow, because it
clones the token cursor. Above it there is a special case for
performance that is hit most of the time and avoids the cloning.
Unfortunately, its behaviour differs from the general case in two ways.
- When within a pair of delimiters, if you look any distance past the
closing delimiter you get the closing delimiter instead of what comes
after the closing delimiter.
- It uses `tree_cursor.look_ahead(dist - 1)` which totally confuses
tokens with token trees. This means that only the first token in a
token tree will be seen. E.g. in a sequence like `{ a }` the `a` and
`}` will be skipped over. Bad!
It's likely that these differences weren't noticed before now because
the use of `look_ahead` in the parser is limited to small distances and
relatively few contexts.
Removing the special case causes slowdowns up of to 2% on a range of
benchmarks. The next commit will add a new, correct special case to
regain that lost performance.
Currently the second element is a `Vec<(FlatToken, Spacing)>`. But the
vector always has zero or one elements, and the `FlatToken` is always
`FlatToken::AttrTarget` (which contains an `AttributesData`), and the
spacing is always `Alone`. So we can simplify it to
`Option<AttributesData>`.
An assertion in `to_attr_token_stream` can can also be removed, because
`new_tokens.len()` was always 0 or 1, which means than `range.len()`
is always greater than or equal to it, because `range.is_empty()` is
always false (as per the earlier assertion).
The number of source code bytes can't exceed a `u32`'s range, so a token
position also can't. This reduces the size of `Parser` and
`LazyAttrTokenStreamImpl` by eight bytes each.
Fix duplicated attributes on nonterminal expressions
This PR fixes a long-standing bug (#86055) whereby expression attributes can be duplicated when expanded through declarative macros.
First, consider how items are parsed in declarative macros:
```
Items:
- parse_nonterminal
- parse_item(ForceCollect::Yes)
- parse_item_
- attrs = parse_outer_attributes
- parse_item_common(attrs)
- maybe_whole!
- collect_tokens_trailing_token
```
The important thing is that the parsing of outer attributes is outside token collection, so the item's tokens don't include the attributes. This is how it's supposed to be.
Now consider how expression are parsed in declarative macros:
```
Exprs:
- parse_nonterminal
- parse_expr_force_collect
- collect_tokens_no_attrs
- collect_tokens_trailing_token
- parse_expr
- parse_expr_res(None)
- parse_expr_assoc_with
- parse_expr_prefix
- parse_or_use_outer_attributes
- parse_expr_dot_or_call
```
The important thing is that the parsing of outer attributes is inside token collection, so the the expr's tokens do include the attributes, i.e. in `AttributesData::tokens`.
This PR fixes the bug by rearranging expression parsing to that outer attribute parsing happens outside of token collection. This requires a number of small refactorings because expression parsing is somewhat complicated. While doing so the PR makes the code a bit cleaner and simpler, by eliminating `parse_or_use_outer_attributes` and `Option<AttrWrapper>` arguments (in favour of the simpler `parse_outer_attributes` and `AttrWrapper` arguments), and simplifying `LhsExpr`.
r? `@petrochenkov`
The extra span is now recorded in the new `TokenKind::NtIdent` and
`TokenKind::NtLifetime`. These both consist of a single token, and so
there's no operator precedence problems with inserting them directly
into the token stream.
The other way to do this would be to wrap the ident/lifetime in invisible
delimiters, but there's a lot of code that assumes an interpolated
ident/lifetime fits in a single token, and changing all that code to work with
invisible delimiters would have been a pain. (Maybe it could be done in a
follow-up.)
This change might not seem like much of a win, but it's a first step toward the
much bigger and long-desired removal of `Nonterminal` and
`TokenKind::Interpolated`. That change is big and complex enough that it's
worth doing this piece separately. (Indeed, this commit is based on part of a
late commit in #114647, a prior attempt at that big and complex change.)
This span records the declaration of the metavariable in the LHS of the macro.
It's used in a couple of error messages. Unfortunately, it gets in the way of
the long-term goal of removing `TokenKind::Interpolated`. So this commit
removes it, which degrades a couple of (obscure) error messages but makes
things simpler and enables the next commit.
The starting point for this was identical comments on two different
fields, in `ast::VariantData::Struct` and `hir::VariantData::Struct`:
```
// FIXME: investigate making this a `Option<ErrorGuaranteed>`
recovered: bool
```
I tried that, and then found that I needed to add an `ErrorGuaranteed`
to `Recovered::Yes`. Then I ended up using `Recovered` instead of
`Option<ErrorGuaranteed>` for these two places and elsewhere, which
required moving `ErrorGuaranteed` from `rustc_parse` to `rustc_ast`.
This makes things more consistent, because `Recovered` is used in more
places, and there are fewer uses of `bool` and
`Option<ErrorGuaranteed>`. And safer, because it's difficult/impossible
to set `recovered` to `Recovered::Yes` without having emitted an error.
Improve `rustc_parse::Parser`'s debuggability
The main event is the final commit where I add `Parser::debug_lookahead`. Everything else was basically cleaning up things that bugged me (debugging, as it were) until I felt comfortable enough to actually work on it.
The motivation is that it's annoying as hell to try to figure out how the debug infra works in rustc without having basic queries like `debug!(?parser);` come up "empty". However, Parser has a lot of fields that are mostly irrelevant for most debugging, like the entire ParseSess. I think `Parser::debug_lookahead` with a capped lookahead might be fine as a general-purpose Debug impl, but this adapter version was suggested to allow more choice, and admittedly, it's a refined version of what I was already handrolling just to get some insight going.
I tried debugging a parser-related issue but found it annoying to not be
able to easily peek into the Parser's token stream.
Add a convenience fn that offers an opinionated view into the parser,
but one that is useful for answering basic questions about parser state.
There are some test cases involving `parse` and `tokenstream` and
`mut_visit` that are located in `rustc_expand`. Because it used to be
the case that constructing a `ParseSess` required the involvement of
`rustc_expand`. However, since #64197 merged (a long time ago)
`rust_expand` no longer needs to be involved.
This commit moves the tests into `rustc_parse`. This is the optimal
place for the `parse` tests. It's not ideal for the `tokenstream` and
`mut_visit` tests -- they would be better in `rustc_ast` -- but they
still rely on parsing, which is not available in `rustc_ast`. But
`rustc_parse` is lower down in the crate graph and closer to `rustc_ast`
than `rust_expand`, so it's still an improvement for them.
The exact renaming is as follows:
- rustc_expand/src/mut_visit/tests.rs -> rustc_parse/src/parser/mut_visit/tests.rs
- rustc_expand/src/tokenstream/tests.rs -> rustc_parse/src/parser/tokenstream/tests.rs
- rustc_expand/src/tests.rs + rustc_expand/src/parse/tests.rs ->
compiler/rustc_parse/src/parser/tests.rs
The latter two test files are combined because there's no need for them
to be separate, and having a `rustc_parse::parser::parse` module would
be weird. This also means some `pub(crate)`s can be removed.
Don't fatal when calling `expect_one_of` when recovering arg in `parse_seq`
In `parse_seq`, when parsing a sequence of token-separated items, if we don't see a separator, we try to parse another item eagerly in order to give a good diagnostic and recover from a missing separator:
d1a0fa5ed3/compiler/rustc_parse/src/parser/mod.rs (L900-L901)
If parsing the item itself calls `expect_one_of`, then we will fatal because of #58903:
d1a0fa5ed3/compiler/rustc_parse/src/parser/mod.rs (L513-L516)
For `precise_capturing` feature I implemented, we do end up calling `expected_one_of`:
d1a0fa5ed3/compiler/rustc_parse/src/parser/ty.rs (L712-L714)
This leads the compiler to fatal *before* having emitted the first error, leading to absolutely no useful information for the user about what happened in the parser.
This PR makes it so that we stop doing that.
Fixes#124195
Cleanup: Rename `ModSep` to `PathSep`
`::` is usually referred to as the *path separator* (citation needed).
The existing name `ModSep` for *module separator* is a bit misleading since it in fact separates the segments of arbitrary path segments, not only ones resolving to modules. Let me just give a shout-out to associated items (`T::Assoc`, `<Ty as Trait>::function`) and enum variants (`Option::None`).
Motivation: Reduce friction for new contributors, prevent potential confusion.
cc `@petrochenkov`
r? nnethercote or compiler
Interpolated cleanups
Various cleanups I made while working on attempts to remove `Interpolated`, that are worth merging now. Best reviewed one commit at a time.
r? `@petrochenkov`
Inline a bunch of trivial conditions in parser
It is often the case that these small, conditional functions, when inlined, reveal notable optimization opportunities to LLVM. While saethlin has done a lot of good work on making these kinds of small functions not need `#[inline]` tags as much, being clearer about what we want inlined will get both the MIR opts and LLVM to pursue it more aggressively.
On local perf runs, this seems fruitful. Let's see what rust-timer says.
r? `@ghost`
This commit combines `MatchedTokenTree` and `MatchedNonterminal`, which
are often considered together, into a single `MatchedSingle`. It shares
a representation with the newly-parameterized `ParseNtResult`.
This will also make things much simpler if/when variants from
`Interpolated` start being moved to `ParseNtResult`.