Deduplicate mismatched delimiter errors
Delay unmatched delimiter errors until after the parser has run, so that they
can be deduplicated, and attempt to recover from them intelligently.
Second attempt at #54029, follow-up to #53949. Fixes #31528.
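A minimal sketch of the strategy, with illustrative names rather than the real
parser internals: unmatched delimiters are recorded while parsing and reported
once, deduplicated, after the parse has finished.
```
// Illustrative only: not the real rustc parser types.
#[derive(PartialEq)]
struct UnmatchedDelim {
    found: char,
    byte_pos: usize,
}

#[derive(Default)]
struct Parser {
    unclosed_delims: Vec<UnmatchedDelim>,
}

impl Parser {
    /// Record an unmatched delimiter instead of erroring immediately.
    fn note_unmatched(&mut self, found: char, byte_pos: usize) {
        let err = UnmatchedDelim { found, byte_pos };
        // The same unmatched delimiter can be rediscovered repeatedly
        // during recovery; only keep one copy.
        if !self.unclosed_delims.contains(&err) {
            self.unclosed_delims.push(err);
        }
    }

    /// Emit the delayed, deduplicated errors once parsing is done.
    fn report_unclosed_delims(&self) {
        for e in &self.unclosed_delims {
            eprintln!("error: unmatched `{}` at byte {}", e.found, e.byte_pos);
        }
    }
}
```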
This commit changes `syntax::fold::Folder` from a functional style
(where most methods take a `T` and produce a new `T`) to a more
imperative style (where most methods take and modify a `&mut T`), and
renames it `syntax::mut_visit::MutVisitor`.
The first benefit is speed. The functional style does not require any
reallocations, due to the use of `P::map` and
`MoveMap::move_{,flat_}map`. However, every field in the AST must be
overwritten; even those fields that are unchanged are overwritten with
the same value. This causes a lot of unnecessary memory writes. The
imperative style reduces instruction counts by 1--3% across a wide range
of workloads, particularly incremental workloads.
The second benefit is conciseness; the imperative style is usually more
concise. E.g. compare the old functional style:
```
fn fold_abc(&mut self, abc: ABC) -> ABC {
    ABC {
        a: fold_a(abc.a),
        b: fold_b(abc.b),
        c: abc.c,
    }
}
```
with the imperative style:
```
fn visit_abc(&mut self, ABC { a, b, c: _ }: &mut ABC) {
    visit_a(a);
    visit_b(b);
}
```
(The reductions get larger in more complex examples.)
Overall, the patch removes over 200 lines of code (even though the new code
has more comments), and a lot of the remaining lines have fewer characters.
Some notes:
- The old style used methods called `fold_*`. The new style mostly uses
methods called `visit_*`, but there are a few methods that map a `T`
to something other than a `T`, which are called `flat_map_*` (`T` maps
to multiple `T`s) or `filter_map_*` (`T` maps to 0 or 1 `T`s).
- `move_map.rs`/`MoveMap`/`move_map`/`move_flat_map` are renamed
`map_in_place.rs`/`MapInPlace`/`map_in_place`/`flat_map_in_place` to
reflect their slightly changed signatures; a sketch of the in-place
flat-map idea follows these notes.
- Although this commit renames the `fold` module as `mut_visit`, it
keeps it in the `fold.rs` file, so as not to confuse git. The next
commit will rename the file.
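Here is a minimal sketch of the in-place flat-map idea behind
`MapInPlace::flat_map_in_place`. The signature and implementation are
illustrative only (rustc's version is generic over the iterator type, uses
`SmallVec`, and avoids per-element shifting), not the actual code:
```
// Each element may expand to zero or more replacements; the vector is
// rewritten in place instead of building a second collection.
fn flat_map_in_place<T, F>(v: &mut Vec<T>, mut f: F)
where
    F: FnMut(T) -> Vec<T>, // hypothetical signature for the sketch
{
    let mut i = 0;
    while i < v.len() {
        let elem = v.remove(i);
        let replacements = f(elem);
        let n = replacements.len();
        for (j, r) in replacements.into_iter().enumerate() {
            v.insert(i + j, r);
        }
        i += n;
    }
}

fn main() {
    let mut v = vec![1, 2, 3];
    // Replace every even number with two copies and drop the rest.
    flat_map_in_place(&mut v, |x| if x % 2 == 0 { vec![x, x] } else { vec![] });
    assert_eq!(v, vec![2, 2]);
}
```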
Use a `newtype_index!` within `Symbol`.
This shrinks `Option<Symbol>` from 8 bytes to 4 bytes, which shrinks
`Token` from 24 bytes to 16 bytes. This reduces instruction counts by up
to 1% across a range of benchmarks.
r? @oli-obk
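As a rough illustration of why the index wrapper helps (this sketch uses
`NonZeroU32` directly; the real `newtype_index!` macro carves out its niche
differently, so treat it as an analogy rather than the actual definition):
```
use std::mem::size_of;
use std::num::NonZeroU32;

// Illustrative stand-in for a niche-carrying index type; not rustc's
// actual Symbol definition.
#[allow(dead_code)]
#[derive(Copy, Clone)]
struct SymbolIndex(NonZeroU32);

fn main() {
    assert_eq!(size_of::<SymbolIndex>(), 4);
    // The unused bit pattern (zero) stores `None`, so no tag byte is needed.
    assert_eq!(size_of::<Option<SymbolIndex>>(), 4);
    // Without a niche, `Option<u32>` needs a separate tag plus padding.
    assert_eq!(size_of::<Option<u32>>(), 8);
}
```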
This removes the `Delimited` type from `tokenstream`, because it's an extra
type layer that doesn't really help; in a couple of places it actively gets in
the way, and overall removing it makes the code nicer. It does, however, move
`tokenstream::TokenTree` further away from the `TokenTree` in `quote.rs`.
More importantly, this change reduces the size of `TokenStream` from 48
bytes to 40 bytes on x86-64, which is enough to slightly reduce
instruction counts on numerous benchmarks, the best by 1.5%.
Note that `open_tt` and `close_tt` have gone from being methods on
`Delimited` to associated methods of `TokenTree`.
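To illustrate that last point with heavily simplified stand-in types (these
are not rustc's real definitions or signatures), the delimiter constructors
now hang off the enum itself rather than off a separate wrapper:
```
// Simplified stand-ins; the real types carry spans and richer token data.
#[derive(Clone, Copy)]
enum Delim {
    Paren,
    Brace,
}

#[derive(Clone)]
enum TokenTree {
    Token(char),
    Delimited(Delim, Vec<TokenTree>),
}

impl TokenTree {
    // Previously these lived on the removed wrapper type; now they are
    // associated methods of `TokenTree`.
    fn open_tt(delim: Delim) -> TokenTree {
        TokenTree::Token(match delim {
            Delim::Paren => '(',
            Delim::Brace => '{',
        })
    }

    fn close_tt(delim: Delim) -> TokenTree {
        TokenTree::Token(match delim {
            Delim::Paren => ')',
            Delim::Brace => '}',
        })
    }
}
```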
This commit updates the tokenization of items which are subsequently passed to
`proc_macro` to ensure that span information is preserved on attributes as much
as possible. Previously this area of the code suffered from #43081, where we
hadn't actually implemented converting an attribute to a token tree yet, but a
local fix was possible here.
Closes #47941
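Concretely, converting an attribute means lowering something like `#[inline]`
back into the individual tokens `#`, `[`, `inline`, `]` before handing the item
to the macro. A toy illustration (not the compiler's code, which builds
span-carrying token trees rather than strings):
```
fn attr_to_tokens(path: &str) -> Vec<String> {
    let mut tokens = vec!["#".to_string(), "[".to_string()];
    tokens.push(path.to_string());
    tokens.push("]".to_string());
    tokens
}

fn main() {
    assert_eq!(attr_to_tokens("inline"), ["#", "[", "inline", "]"]);
}
```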
rustc: Fix procedural macros generating lifetime tokens
This commit fixes an accidental regression from #50473 where lifetime tokens
produced by procedural macros ended up getting lost in translation in the
compiler and not actually producing parseable code. The issue lies in the fact
that a lifetime's `Ident` is prefixed with `'`. The `glue` implementation for
gluing joint tokens together forgot to take this into account so the lifetime
inside of `Ident` was missing the leading tick!
The `glue` implementation here is updated to create a new `Symbol` in these
situations to manufacture a new `Ident` with a leading tick to ensure it parses
correctly.
Closes #50942
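A minimal sketch of the fix as described (this is not the actual `glue`
implementation): when a `'` token is glued to an identifier, the resulting
lifetime's symbol has to be manufactured with the tick re-attached, since the
identifier's own symbol does not carry it.
```
// Not rustc's code: illustrates why a fresh symbol with a leading tick is
// needed when gluing `'` + `foo` into the lifetime `'foo`.
fn glue_lifetime(ident_symbol: &str) -> String {
    // Reusing `ident_symbol` as-is would drop the tick (the bug); the fix
    // builds a new symbol that includes it.
    format!("'{}", ident_symbol)
}

fn main() {
    assert_eq!(glue_lifetime("foo"), "'foo");
}
```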
This commit fixes `StringReader`'s parsing of tokens which have been stringified
through procedural macros. Whether or not a token tree is joint is defined by
span information, but when working with procedural macros these spans are often
dummy and/or overridden, which means that all operators end up being considered
joint whenever possible!
The fix here is to track the raw source span as opposed to the overridden span.
With this information we can more accurately classify `Punct` structs as either
joint or not.
Closes #50700
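A small sketch of the jointness check this relies on, using raw byte positions
rather than the (possibly dummy or overridden) macro-expansion spans; the names
are illustrative, not rustc's:
```
// Two punctuation tokens are "joint" when they touch in the original
// source text, e.g. the `+` and `=` of `+=`.
#[derive(Copy, Clone)]
struct RawSpan {
    lo: u32,
    hi: u32,
}

fn is_joint(first: RawSpan, second: RawSpan) -> bool {
    first.hi == second.lo
}

fn main() {
    let plus = RawSpan { lo: 10, hi: 11 };
    let eq = RawSpan { lo: 11, hi: 12 };
    assert!(is_joint(plus, eq)); // `+=`
    let spaced_eq = RawSpan { lo: 12, hi: 13 };
    assert!(!is_joint(plus, spaced_eq)); // `+ =`
}
```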
Discovered in #50061, we're falling off the "happy path" of using a stringified
token stream more often than we should. This was due to the fact that a
user-written token like `0xf` does not compare equal to the stringified token
`15` (despite being semantically equivalent).
This patch replaces the call to `eq_unspanned` with an even more awful solution,
`probably_equal_for_proc_macro`, which ignores the value of each token and
basically only compares the structure of the token stream, assuming that the AST
doesn't change just one token at a time.
While this is a step towards fixing #50061, there is still one regression
from #49154 which needs to be fixed.
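A rough sketch of the comparison strategy that `probably_equal_for_proc_macro`
stands for, with illustrative types rather than rustc's token definitions:
```
// Compare the shape of two token streams while ignoring token "values",
// so `0xf` and `15` are both just `Literal`.
#[derive(PartialEq, Clone, Copy)]
enum TokenShape {
    Literal,
    Ident,
    Punct,
    Group,
}

fn probably_equal(a: &[TokenShape], b: &[TokenShape]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x == y)
}

fn main() {
    // `foo(0xf)` vs `foo(15)`: same structure, different literal values.
    let user_written = [TokenShape::Ident, TokenShape::Group];
    let stringified = [TokenShape::Ident, TokenShape::Group];
    assert!(probably_equal(&user_written, &stringified));
}
```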
proc_macro: Avoid cached TokenStream more often
This commit adds even more pessimization around using the cached `TokenStream` inside
of an AST node. As a reminder the `proc_macro` API requires taking an arbitrary
AST node and transforming it back into a `TokenStream` to hand off to a
procedural macro. Such functionality isn't actually implemented in rustc today,
so the way `proc_macro` works today is that it stringifies an AST node and then
reparses for a list of tokens.
This strategy unfortunately loses all span information, so we try to avoid it
whenever possible. As implemented in #43230, some AST nodes have a `TokenStream`
cache representing the tokens they were originally parsed from. This
`TokenStream` cache, however, has turned out to not always reflect the current
state of the item when it's being tokenized. For example `#[cfg]` processing or
macro expansion could modify the state of an item. Consequently we've seen a
number of bugs (#48644 and #49846) related to using this stale cache.
This commit tweaks the usage of the cached `TokenStream` to compare it to our
lossy stringification of the token stream. If the tokens that make up the cache
and the stringified token stream are the same then we return the cached version
(which has correct span information). If they differ, however, then we will
return the stringified version as the cache has been invalidated and we just
haven't figured that out.
Closes #48644. Closes #49846.
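A hedged sketch of the decision described above; the function and type names
here are placeholders, not the real rustc API (which works on token streams,
not strings, and uses the structural comparison sketched earlier):
```
fn tokens_for_proc_macro(cached: Option<String>, stringified: String) -> String {
    match cached {
        // The cache still matches the current state of the item: prefer it,
        // since it carries real span information.
        Some(c) if tokens_probably_equal(&c, &stringified) => c,
        // The cache is stale (e.g. after `#[cfg]` stripping or macro
        // expansion changed the item): fall back to the stringified form.
        _ => stringified,
    }
}

fn tokens_probably_equal(a: &str, b: &str) -> bool {
    // Stand-in for the lossy structural comparison.
    a == b
}
```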
Impressing confused Python users with magical diagnostics is perhaps
worth this not-grossly-unreasonable (only 40ish lines) extra complexity
in the parser?
Thanks to Vadim Petrochenkov for guidance.
This resolves #46836.