// rust/compiler/rustc_parse/src/parser/mod.rs

pub mod attr;
mod attr_wrapper;
mod diagnostics;
mod expr;
mod generics;
mod item;
mod nonterminal;
mod pat;
mod path;
mod stmt;
pub mod token_type;
mod ty;
use std::assert_matches::debug_assert_matches;
use std::ops::Range;
use std::sync::Arc;
use std::{fmt, mem, slice};
use attr_wrapper::{AttrWrapper, UsePreAttrPos};
pub use diagnostics::AttemptLocalParseRecovery;
pub(crate) use expr::ForbiddenLetReason;
pub(crate) use item::FnParseMode;
pub use pat::{CommaRecoveryMode, RecoverColon, RecoverComma};
use path::PathStyle;
use rustc_ast::ptr::P;
use rustc_ast::token::{
self, Delimiter, IdentIsRaw, InvisibleOrigin, MetaVarKind, Nonterminal, Token, TokenKind,
};
use rustc_ast::tokenstream::{AttrsTarget, Spacing, TokenStream, TokenTree};
use rustc_ast::util::case::Case;
use rustc_ast::{
self as ast, AnonConst, AttrArgs, AttrId, ByRef, Const, CoroutineKind, DUMMY_NODE_ID,
DelimArgs, Expr, ExprKind, Extern, HasAttrs, HasTokens, Mutability, Recovered, Safety, StrLit,
Visibility, VisibilityKind,
};
use rustc_ast_pretty::pprust;
use rustc_data_structures::fx::FxHashMap;
use rustc_errors::{Applicability, Diag, FatalError, MultiSpan, PResult};
use rustc_index::interval::IntervalSet;
use rustc_session::parse::ParseSess;
use rustc_span::{DUMMY_SP, Ident, Span, Symbol, kw, sym};
use thin_vec::ThinVec;
use token_type::TokenTypeSet;
pub use token_type::{ExpKeywordPair, ExpTokenPair, TokenType};
use tracing::debug;
use crate::errors::{
self, IncorrectVisibilityRestriction, MismatchedClosingDelimiter, NonStringAbiLiteral,
};
use crate::exp;
use crate::lexer::UnmatchedDelim;
#[cfg(test)]
mod tests;
// Ideally, these tests would be in `rustc_ast`. But they depend on having a
// parser, so they are here.
#[cfg(test)]
mod tokenstream {
mod tests;
}
#[cfg(test)]
mod mut_visit {
mod tests;
}
bitflags::bitflags! {
#[derive(Clone, Copy, Debug)]
struct Restrictions: u8 {
const STMT_EXPR = 1 << 0;
const NO_STRUCT_LITERAL = 1 << 1;
const CONST_EXPR = 1 << 2;
const ALLOW_LET = 1 << 3;
const IN_IF_GUARD = 1 << 4;
const IS_PAT = 1 << 5;
}
}
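// `Restrictions` combine as ordinary bitflags. A minimal sketch (not a real
// call site) of enabling several restrictions at once:
//
//     let r = Restrictions::STMT_EXPR | Restrictions::NO_STRUCT_LITERAL;
//     assert!(r.contains(Restrictions::STMT_EXPR));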
#[derive(Clone, Copy, PartialEq, Debug)]
enum SemiColonMode {
Break,
Ignore,
Comma,
}
#[derive(Clone, Copy, PartialEq, Debug)]
enum BlockMode {
Break,
Ignore,
}
/// Whether or not we should force collection of tokens for an AST node,
/// regardless of whether or not it has attributes.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ForceCollect {
Yes,
No,
}
#[macro_export]
macro_rules! maybe_whole {
($p:expr, $constructor:ident, |$x:ident| $e:expr) => {
if let token::Interpolated(nt) = &$p.token.kind
&& let token::$constructor(x) = &**nt
{
#[allow(unused_mut)]
let mut $x = x.clone();
$p.bump();
return Ok($e);
}
};
}
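// A minimal sketch of how a parse method might use `maybe_whole!`; the method
// shown is hypothetical, not a real call site:
//
//     fn parse_block_sketch(&mut self) -> PResult<'a, P<ast::Block>> {
//         // Fast path: the current token is an interpolated `NtBlock`
//         // produced by macro expansion, so return the captured AST
//         // fragment instead of reparsing it token by token.
//         maybe_whole!(self, NtBlock, |block| block);
//         // Slow path: parse the block from individual tokens.
//         /* ... */
//     }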
/// If the next tokens are ill-formed `$ty::` recover them as `<$ty>::`.
#[macro_export]
macro_rules! maybe_recover_from_interpolated_ty_qpath {
($self: expr, $allow_qpath_recovery: expr) => {
if $allow_qpath_recovery
&& $self.may_recover()
&& let Some(mv_kind) = $self.token.is_metavar_seq()
&& let token::MetaVarKind::Ty { .. } = mv_kind
&& $self.check_noexpect_past_close_delim(&token::PathSep)
{
// Reparse the type, then move to recovery.
let ty = $self
.eat_metavar_seq(mv_kind, |this| this.parse_ty_no_question_mark_recover())
.expect("metavar seq ty");
return $self.maybe_recover_from_bad_qpath_stage_2($self.prev_token.span, ty);
}
};
}
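// Illustrative example: in a macro expansion where `$ty` captured `Vec<u8>`,
// the expansion `$ty::clone(&v)` is ill-formed as written, so the recovery
// above reparses the type metavariable and continues as `<Vec<u8>>::clone(&v)`.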
#[derive(Clone, Copy, Debug)]
pub enum Recovery {
Allowed,
Forbidden,
}
#[derive(Clone)]
pub struct Parser<'a> {
pub psess: &'a ParseSess,
/// The current token.
pub token: Token,
/// The spacing for the current token.
token_spacing: Spacing,
/// The previous token.
pub prev_token: Token,
pub capture_cfg: bool,
restrictions: Restrictions,
expected_token_types: TokenTypeSet,
token_cursor: TokenCursor,
// The number of calls to `bump`, i.e. the position in the token stream.
num_bump_calls: u32,
// During parsing we may sometimes need to "unglue" a glued token into two
// or three component tokens (e.g. `>>` into `>` and `>`, or `>>=` into `>`
// and `>` and `=`), so the parser can consume them one at a time. This
// process bypasses the normal capturing mechanism (e.g. `num_bump_calls`
    // will not be incremented), since the "unglued" tokens do not exist in
// the original `TokenStream`.
//
// If we end up consuming all the component tokens, this is not an issue,
// because we'll end up capturing the single "glued" token.
//
// However, sometimes we may want to capture not all of the original
// token. For example, capturing the `Vec<u8>` in `Option<Vec<u8>>`
// requires us to unglue the trailing `>>` token. The `break_last_token`
// field is used to track these tokens. They get appended to the captured
// stream when we evaluate a `LazyAttrTokenStream`.
//
// This value is always 0, 1, or 2. It can only reach 2 when splitting
// `>>=` or `<<=`.
break_last_token: u32,
/// This field is used to keep track of how many left angle brackets we have seen. This is
/// required in order to detect extra leading left angle brackets (`<` characters) and error
/// appropriately.
///
/// See the comments in the `parse_path_segment` function for more details.
unmatched_angle_bracket_count: u16,
angle_bracket_nesting: u16,
last_unexpected_token_span: Option<Span>,
/// If present, this `Parser` is not parsing Rust code but rather a macro call.
subparser_name: Option<&'static str>,
capture_state: CaptureState,
/// This allows us to recover when the user forgets to add braces around
/// multiple statements in the closure body.
current_closure: Option<ClosureSpans>,
/// Whether the parser is allowed to do recovery.
/// This is disabled when parsing macro arguments, see #103534
recovery: Recovery,
}
// This type is used a lot, e.g. it's cloned when matching many declarative macro rules with
// nonterminals. Make sure it doesn't unintentionally get bigger. We only check a few arches
// though, because `TokenTypeSet(u128)` alignment varies on others, changing the total size.
#[cfg(all(target_pointer_width = "64", any(target_arch = "aarch64", target_arch = "x86_64")))]
rustc_data_structures::static_assert_size!(Parser<'_>, 288);
/// Stores span information about a closure.
#[derive(Clone, Debug)]
struct ClosureSpans {
whole_closure: Span,
closing_pipe: Span,
body: Span,
}
/// A token range within a `Parser`'s full token stream.
#[derive(Clone, Debug)]
struct ParserRange(Range<u32>);
/// A token range within an individual AST node's (lazy) token stream, i.e.
/// relative to that node's first token. Distinct from `ParserRange` so the two
/// kinds of range can't be mixed up.
#[derive(Clone, Debug)]
struct NodeRange(Range<u32>);
/// Indicates a range of tokens that should be replaced by an `AttrsTarget`
/// (replacement) or be replaced by nothing (deletion). This is used in two
/// places during token collection.
///
/// 1. Replacement. During the parsing of an AST node that may have a
/// `#[derive]` attribute, when we parse a nested AST node that has `#[cfg]`
/// or `#[cfg_attr]`, we replace the entire inner AST node with
/// `FlatToken::AttrsTarget`. This lets us perform eager cfg-expansion on an
/// `AttrTokenStream`.
///
/// 2. Deletion. We delete inner attributes from all collected token streams,
/// and instead track them through the `attrs` field on the AST node. This
/// lets us manipulate them similarly to outer attributes. When we create a
/// `TokenStream`, the inner attributes are inserted into the proper place
/// in the token stream.
///
/// Each replacement starts off in `ParserReplacement` form but is converted to
/// `NodeReplacement` form when it is attached to a single AST node, via
/// `LazyAttrTokenStreamImpl`.
type ParserReplacement = (ParserRange, Option<AttrsTarget>);
/// See the comment on `ParserReplacement`.
type NodeReplacement = (NodeRange, Option<AttrsTarget>);
impl NodeRange {
// Converts a range within a parser's tokens to a range within a
// node's tokens beginning at `start_pos`.
//
// For example, imagine a parser with 50 tokens in its token stream, a
// function that spans `ParserRange(20..40)` and an inner attribute within
// that function that spans `ParserRange(30..35)`. We would find the inner
// attribute's range within the function's tokens by subtracting 20, which
// is the position of the function's start token. This gives
// `NodeRange(10..15)`.
fn new(ParserRange(parser_range): ParserRange, start_pos: u32) -> NodeRange {
assert!(!parser_range.is_empty());
assert!(parser_range.start >= start_pos);
NodeRange((parser_range.start - start_pos)..(parser_range.end - start_pos))
}
}
/// Controls how we capture tokens. Capturing can be expensive,
/// so we try to avoid performing capturing in cases where
/// we will never need an `AttrTokenStream`.
#[derive(Copy, Clone, Debug)]
enum Capturing {
/// We aren't performing any capturing - this is the default mode.
No,
    /// We are capturing tokens.
Yes,
}
// This state is used by `Parser::collect_tokens`.
#[derive(Clone, Debug)]
struct CaptureState {
capturing: Capturing,
parser_replacements: Vec<ParserReplacement>,
inner_attr_parser_ranges: FxHashMap<AttrId, ParserRange>,
// `IntervalSet` is good for perf because attrs are mostly added to this
// set in contiguous ranges.
seen_attrs: IntervalSet<AttrId>,
}
#[derive(Clone, Debug)]
struct TokenTreeCursor {
stream: TokenStream,
/// Points to the current token tree in the stream. In `TokenCursor::curr`,
/// this can be any token tree. In `TokenCursor::stack`, this is always a
/// `TokenTree::Delimited`.
index: usize,
}
impl TokenTreeCursor {
#[inline]
fn new(stream: TokenStream) -> Self {
TokenTreeCursor { stream, index: 0 }
}
#[inline]
fn curr(&self) -> Option<&TokenTree> {
self.stream.get(self.index)
}
#[inline]
fn bump(&mut self) {
self.index += 1;
}
}
/// A `TokenStream` cursor that produces `Token`s. It's a bit odd that
/// we (a) lex tokens into a nice tree structure (`TokenStream`), and then (b)
/// use this type to emit them as a linear sequence. But a linear sequence is
/// what the parser expects, for the most part.
#[derive(Clone, Debug)]
struct TokenCursor {
// Cursor for the current (innermost) token stream. The index within the
// cursor can point to any token tree in the stream (or one past the end).
// The delimiters for this token stream are found in `self.stack.last()`;
// if that is `None` we are in the outermost token stream which never has
// delimiters.
curr: TokenTreeCursor,
// Token streams surrounding the current one. The index within each cursor
// always points to a `TokenTree::Delimited`.
stack: Vec<TokenTreeCursor>,
}
impl TokenCursor {
fn next(&mut self) -> (Token, Spacing) {
self.inlined_next()
}
/// This always-inlined version should only be used on hot code paths.
#[inline(always)]
fn inlined_next(&mut self) -> (Token, Spacing) {
loop {
// FIXME: we currently don't return `Delimiter::Invisible` open/close delims. To fix
// #67062 we will need to, whereupon the `delim != Delimiter::Invisible` conditions
// below can be removed.
if let Some(tree) = self.curr.curr() {
match tree {
&TokenTree::Token(ref token, spacing) => {
debug_assert!(!matches!(
token.kind,
token::OpenDelim(_) | token::CloseDelim(_)
));
let res = (token.clone(), spacing);
self.curr.bump();
return res;
}
&TokenTree::Delimited(sp, spacing, delim, ref tts) => {
let trees = TokenTreeCursor::new(tts.clone());
self.stack.push(mem::replace(&mut self.curr, trees));
if !delim.skip() {
return (Token::new(token::OpenDelim(delim), sp.open), spacing.open);
}
// No open delimiter to return; continue on to the next iteration.
}
};
} else if let Some(parent) = self.stack.pop() {
// We have exhausted this token stream. Move back to its parent token stream.
let Some(&TokenTree::Delimited(span, spacing, delim, _)) = parent.curr() else {
panic!("parent should be Delimited")
};
self.curr = parent;
self.curr.bump(); // move past the `Delimited`
if !delim.skip() {
return (Token::new(token::CloseDelim(delim), span.close), spacing.close);
}
// No close delimiter to return; continue on to the next iteration.
} else {
// We have exhausted the outermost token stream. The use of
// `Spacing::Alone` is arbitrary and immaterial, because the
// `Eof` token's spacing is never used.
return (Token::new(token::Eof, DUMMY_SP), Spacing::Alone);
}
}
}
}
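// A minimal sketch of the flattening `TokenCursor` performs (illustrative
// only): for a stream lexed from `a + (b)`, successive calls to `next` yield
// the tokens `a`, `+`, `(`, `b`, `)`, and then `Eof` once the outermost
// stream is exhausted.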
/// A sequence separator.
#[derive(Debug)]
struct SeqSep<'a> {
/// The separator token.
sep: Option<ExpTokenPair<'a>>,
/// `true` if a trailing separator is allowed.
trailing_sep_allowed: bool,
}
impl<'a> SeqSep<'a> {
fn trailing_allowed(sep: ExpTokenPair<'a>) -> SeqSep<'a> {
SeqSep { sep: Some(sep), trailing_sep_allowed: true }
}
fn none() -> SeqSep<'a> {
SeqSep { sep: None, trailing_sep_allowed: false }
}
}
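// A minimal sketch (hypothetical call site): a comma-separated sequence that
// may end with a trailing comma, e.g. `a, b, c,`, would be described as
//
//     let sep = SeqSep::trailing_allowed(exp!(Comma));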
#[derive(Debug)]
pub enum FollowedByType {
Yes,
No,
}
#[derive(Copy, Clone, Debug)]
enum Trailing {
No,
Yes,
}
impl From<bool> for Trailing {
fn from(b: bool) -> Trailing {
if b { Trailing::Yes } else { Trailing::No }
}
}
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub(super) enum TokenDescription {
ReservedIdentifier,
Keyword,
ReservedKeyword,
DocComment,
// Expanded metavariables are wrapped in invisible delimiters which aren't
// pretty-printed. In error messages we must handle these specially
// otherwise we get confusing things in messages like "expected `(`, found
// ``". It's better to say e.g. "expected `(`, found type metavariable".
MetaVar(MetaVarKind),
}
impl TokenDescription {
pub(super) fn from_token(token: &Token) -> Option<Self> {
match token.kind {
_ if token.is_special_ident() => Some(TokenDescription::ReservedIdentifier),
_ if token.is_used_keyword() => Some(TokenDescription::Keyword),
_ if token.is_unused_keyword() => Some(TokenDescription::ReservedKeyword),
token::DocComment(..) => Some(TokenDescription::DocComment),
token::OpenDelim(Delimiter::Invisible(InvisibleOrigin::MetaVar(kind))) => {
Some(TokenDescription::MetaVar(kind))
}
_ => None,
}
}
}
pub fn token_descr(token: &Token) -> String {
let s = pprust::token_to_string(token).to_string();
match (TokenDescription::from_token(token), &token.kind) {
(Some(TokenDescription::ReservedIdentifier), _) => format!("reserved identifier `{s}`"),
(Some(TokenDescription::Keyword), _) => format!("keyword `{s}`"),
(Some(TokenDescription::ReservedKeyword), _) => format!("reserved keyword `{s}`"),
(Some(TokenDescription::DocComment), _) => format!("doc comment `{s}`"),
// Deliberately doesn't print `s`, which is empty.
(Some(TokenDescription::MetaVar(kind)), _) => format!("`{kind}` metavariable"),
(None, TokenKind::NtIdent(..)) => format!("identifier `{s}`"),
(None, TokenKind::NtLifetime(..)) => format!("lifetime `{s}`"),
(None, TokenKind::Interpolated(node)) => format!("{} `{s}`", node.descr()),
(None, _) => format!("`{s}`"),
}
}
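// Illustrative descriptions produced by `token_descr` (assuming typical
// tokens): `fn` is described as "keyword `fn`", `_` as "reserved identifier
// `_`", and an ordinary `+` simply as "`+`".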
impl<'a> Parser<'a> {
pub fn new(
psess: &'a ParseSess,
stream: TokenStream,
subparser_name: Option<&'static str>,
) -> Self {
let mut parser = Parser {
psess,
token: Token::dummy(),
token_spacing: Spacing::Alone,
prev_token: Token::dummy(),
capture_cfg: false,
restrictions: Restrictions::empty(),
expected_token_types: TokenTypeSet::new(),
token_cursor: TokenCursor { curr: TokenTreeCursor::new(stream), stack: Vec::new() },
num_bump_calls: 0,
break_last_token: 0,
unmatched_angle_bracket_count: 0,
angle_bracket_nesting: 0,
last_unexpected_token_span: None,
subparser_name,
capture_state: CaptureState {
capturing: Capturing::No,
parser_replacements: Vec::new(),
inner_attr_parser_ranges: Default::default(),
seen_attrs: IntervalSet::new(u32::MAX as usize),
},
current_closure: None,
recovery: Recovery::Allowed,
};
// Make parser point to the first token.
parser.bump();
// Change this from 1 back to 0 after the bump. This eases debugging of
// `Parser::collect_tokens` because 0-indexed token positions are nicer
// than 1-indexed token positions.
parser.num_bump_calls = 0;
parser
}
#[inline]
pub fn recovery(mut self, recovery: Recovery) -> Self {
self.recovery = recovery;
self
}
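// Illustrative sketch (not part of the original source): this builder-style
// setter is typically chained right after construction; `make_parser` here is
// a hypothetical constructor standing in for the real one.
//
//     let parser = make_parser(psess, stream).recovery(Recovery::Forbidden);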
/// Whether the parser is allowed to recover from broken code.
///
/// If this returns false, recovering broken code into valid code (especially if this
/// recovery does lookahead) is not allowed. All recovery done by the parser must be
/// gated behind this check.
///
/// Technically, only eager recovery that looks ahead at further tokens needs to be
/// restricted. But that distinction is very subtle, so forbidding all recovery is a
/// lot simpler to uphold.
#[inline]
fn may_recover(&self) -> bool {
matches!(self.recovery, Recovery::Allowed)
}
/// Version of [`unexpected`](Parser::unexpected) that "returns" any type in the `Ok`
/// (neither of those functions ever actually returns `Ok`, so they can lie like that
/// in the type).
pub fn unexpected_any<T>(&mut self) -> PResult<'a, T> {
match self.expect_one_of(&[], &[]) {
Err(e) => Err(e),
// We can get `Ok(true)` from `recover_closing_delimiter`
// which is called in `expected_one_of_not_found`.
Ok(_) => FatalError.raise(),
}
}
pub fn unexpected(&mut self) -> PResult<'a, ()> {
self.unexpected_any()
}
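// Illustrative sketch (not part of the original source): `unexpected_any` is
// useful in match arms that must produce a value of some type `T` but can only
// fail; `unexpected` is simply the `T = ()` case. The arm shown is hypothetical.
//
//     let expr = match p.token.kind {
//         token::OpenDelim(Delimiter::Brace) => p.parse_block_expr()?, // hypothetical
//         _ => return p.unexpected_any(),
//     };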
/// Expects and consumes the token `exp.tok`. Signals an error if the next token is not
/// `exp.tok`.
pub fn expect(&mut self, exp: ExpTokenPair<'_>) -> PResult<'a, Recovered> {
if self.expected_token_types.is_empty() {
if self.token == *exp.tok {
self.bump();
Ok(Recovered::No)
} else {
self.unexpected_try_recover(exp.tok)
}
} else {
self.expect_one_of(slice::from_ref(&exp), &[])
}
}
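// Illustrative sketch (not part of the original source): a typical call site,
// assuming the `exp!` macro from `token_type.rs` is in scope.
//
//     p.expect(exp!(Semi))?; // consume `;`, or error (possibly recovering)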
/// Expects the next token to be an edible or inedible token. If edible,
/// consumes it; if inedible, returns without consuming
/// anything. Signals a fatal error if the next token is unexpected.
fn expect_one_of(
&mut self,
edible: &[ExpTokenPair<'_>],
inedible: &[ExpTokenPair<'_>],
) -> PResult<'a, Recovered> {
if edible.iter().any(|exp| exp.tok == &self.token.kind) {
self.bump();
Ok(Recovered::No)
} else if inedible.iter().any(|exp| exp.tok == &self.token.kind) {
// Leave it in the input.
Ok(Recovered::No)
} else if self.token != token::Eof
&& self.last_unexpected_token_span == Some(self.token.span)
{
FatalError.raise();
} else {
self.expected_one_of_not_found(edible, inedible)
.map(|error_guaranteed| Recovered::Yes(error_guaranteed))
}
}
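// Illustrative sketch (not part of the original source): "edible" tokens are
// consumed when matched, while "inedible" ones are accepted but left in place
// for the caller. A statement parser might, hypothetically, treat `;` as
// edible and `}` as inedible:
//
//     p.expect_one_of(&[exp!(Semi)], &[exp!(CloseBrace)])?;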
// Public for rustfmt usage.
pub fn parse_ident(&mut self) -> PResult<'a, Ident> {
self.parse_ident_common(true)
}
fn parse_ident_common(&mut self, recover: bool) -> PResult<'a, Ident> {
let (ident, is_raw) = self.ident_or_err(recover)?;
if matches!(is_raw, IdentIsRaw::No) && ident.is_reserved() {
let err = self.expected_ident_found_err();
if recover {
err.emit();
} else {
return Err(err);
}
}
self.bump();
Ok(ident)
}
fn ident_or_err(&mut self, recover: bool) -> PResult<'a, (Ident, IdentIsRaw)> {
match self.token.ident() {
Some(ident) => Ok(ident),
None => self.expected_ident_found(recover),
}
}
/// Checks if the next token is `exp.tok`, and returns `true` if so.
///
/// This method will automatically add `exp.token_type` to `expected_token_types` if the
/// token is not encountered.
#[inline]
fn check(&mut self, exp: ExpTokenPair<'_>) -> bool {
let is_present = self.token == *exp.tok;
if !is_present {
self.expected_token_types.insert(exp.token_type);
}
is_present
}
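// Illustrative sketch (not part of the original source): `check` records the
// expectation even when it fails, so a later "expected one of ..." diagnostic
// can mention the token.
//
//     if p.check(exp!(Comma)) {
//         p.bump(); // or use `eat`, which combines check-and-bump
//     }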
#[inline]
#[must_use]
fn check_noexpect(&self, tok: &TokenKind) -> bool {
self.token == *tok
}
// Check the first token after the delimiter that closes the current
// delimited sequence. (Panics if used in the outermost token stream, which
// has no delimiters.) It uses a clone of the relevant tree cursor to skip
// past the entire `TokenTree::Delimited` in a single step, avoiding the
// need for unbounded token lookahead.
//
// Primarily used when `self.token` matches
// `OpenDelim(Delimiter::Invisible(_))`, to look ahead through the current
// metavar expansion.
fn check_noexpect_past_close_delim(&self, tok: &TokenKind) -> bool {
let mut tree_cursor = self.token_cursor.stack.last().unwrap().clone();
tree_cursor.bump();
matches!(
tree_cursor.curr(),
Some(TokenTree::Token(token::Token { kind, .. }, _)) if kind == tok
)
}
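// Illustrative sketch (not part of the original source): given an expansion
// site like `$e.await`, where `$e` is an expression fragment surrounded by
// invisible delimiters, `self.token` sits on the invisible open delimiter and
// this method can inspect the `.` that follows the matching close delimiter:
//
//     // conceptually: (invisible open) $e (invisible close) . await
//     let dot_follows = p.check_noexpect_past_close_delim(&token::Dot);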
/// Consumes the token `tok` if it exists. Returns whether the given token was present.
///
/// The main purpose of this function is to reduce the clutter that the normal `eat`
/// method could introduce in the suggestions list in some cases.
#[inline]
#[must_use]
fn eat_noexpect(&mut self, tok: &TokenKind) -> bool {
let is_present = self.check_noexpect(tok);
if is_present {
self.bump()
}
is_present
}
/// Consumes the token `exp.tok` if it exists. Returns whether the given token was present.
#[inline]
#[must_use]
pub fn eat(&mut self, exp: ExpTokenPair<'_>) -> bool {
let is_present = self.check(exp);
if is_present {
self.bump()
}
is_present
}
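// Illustrative sketch (not part of the original source): `eat` is the usual
// "optional token" pattern, e.g. for a trailing comma.
//
//     let has_trailing_comma = p.eat(exp!(Comma));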
/// If the next token is the given keyword, returns `true` without eating it.
/// An expectation is also added for diagnostic purposes.
#[inline]
#[must_use]
fn check_keyword(&mut self, exp: ExpKeywordPair) -> bool {
let is_keyword = self.token.is_keyword(exp.kw);
if !is_keyword {
self.expected_token_types.insert(exp.token_type);
}
is_keyword
}
#[inline]
#[must_use]
fn check_keyword_case(&mut self, exp: ExpKeywordPair, case: Case) -> bool {
if self.check_keyword(exp) {
true
} else if case == Case::Insensitive
&& let Some((ident, IdentIsRaw::No)) = self.token.ident()
// Do an ASCII case-insensitive match, because all keywords are ASCII.
&& ident.as_str().eq_ignore_ascii_case(exp.kw.as_str())
{
true
} else {
false
}
}
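// Illustrative sketch (not part of the original source): with
// `Case::Insensitive`, a mis-cased keyword such as `FN` still matches here;
// reporting the bad case is left to callers such as `eat_keyword_case` below.
//
//     let at_fn = p.check_keyword_case(exp!(Fn), Case::Insensitive);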
/// If the next token is the given keyword, eats it and returns `true`.
/// Otherwise, returns `false`. An expectation is also added for diagnostic purposes.
// Public for rustc_builtin_macros and rustfmt usage.
#[inline]
#[must_use]
pub fn eat_keyword(&mut self, exp: ExpKeywordPair) -> bool {
let is_keyword = self.check_keyword(exp);
if is_keyword {
self.bump();
}
is_keyword
}
/// Eats a keyword, optionally ignoring its case.
/// If the case differs (and is ignored), an error is issued.
/// This is useful for recovery.
#[inline]
#[must_use]
fn eat_keyword_case(&mut self, exp: ExpKeywordPair, case: Case) -> bool {
if self.eat_keyword(exp) {
true
} else if case == Case::Insensitive
&& let Some((ident, IdentIsRaw::No)) = self.token.ident()
// Do an ASCII case-insensitive match, because all keywords are ASCII.
&& ident.as_str().eq_ignore_ascii_case(exp.kw.as_str())
{
self.dcx().emit_err(errors::KwBadCase { span: ident.span, kw: exp.kw.as_str() });
self.bump();
true
} else {
false
}
}
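// Illustrative sketch (not part of the original source): when parsing
// `CONST fn f() {}` with `Case::Insensitive`, the call below eats `CONST`,
// emits a `KwBadCase` error, and lets parsing continue as if `const` had
// been written.
//
//     let is_const = p.eat_keyword_case(exp!(Const), Case::Insensitive);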
/// If the next token is the given keyword, eats it and returns `true`.
/// Otherwise, returns `false`. No expectation is added.
// Public for rustc_builtin_macros usage.
#[inline]
#[must_use]
pub fn eat_keyword_noexpect(&mut self, kw: Symbol) -> bool {
let is_keyword = self.token.is_keyword(kw);
if is_keyword {
self.bump();
}
is_keyword
}
/// If the given word is not a keyword, signals an error.
/// If the next token is not the given word, signals an error.
/// Otherwise, eats it.
pub fn expect_keyword(&mut self, exp: ExpKeywordPair) -> PResult<'a, ()> {
if !self.eat_keyword(exp) { self.unexpected() } else { Ok(()) }
}
/// Consume a sequence produced by a metavar expansion, if present.
fn eat_metavar_seq<T>(
&mut self,
mv_kind: MetaVarKind,
f: impl FnMut(&mut Parser<'a>) -> PResult<'a, T>,
) -> Option<T> {
self.eat_metavar_seq_with_matcher(|mvk| mvk == mv_kind, f)
}
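// Illustrative sketch (not part of the original source): reparsing a `ty`
// fragment produced by a `macro_rules!` expansion; the closure runs on the
// tokens between the invisible delimiters. The exact `MetaVarKind` variant
// and parsing callback used here are assumptions.
//
//     let ty = p.eat_metavar_seq(MetaVarKind::Ty, |this| this.parse_ty());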
/// A slightly more general form of `eat_metavar_seq`, for use with the
/// `MetaVarKind` variants that have parameters, where an exact match isn't
/// desired.
fn eat_metavar_seq_with_matcher<T>(
&mut self,
match_mv_kind: impl Fn(MetaVarKind) -> bool,
mut f: impl FnMut(&mut Parser<'a>) -> PResult<'a, T>,
) -> Option<T> {
if let token::OpenDelim(delim) = self.token.kind
&& let Delimiter::Invisible(InvisibleOrigin::MetaVar(mv_kind)) = delim
&& match_mv_kind(mv_kind)
{
self.bump();
let res = f(self).unwrap_or_else(|_| panic!("failed to reparse {mv_kind:?}"));
if let token::CloseDelim(delim) = self.token.kind
&& let Delimiter::Invisible(InvisibleOrigin::MetaVar(mv_kind)) = delim
&& match_mv_kind(mv_kind)
{
self.bump();
Some(res)
} else {
panic!("no close delim when reparsing {mv_kind:?}");
}
} else {
None
}
}
/// Is the given keyword `kw` followed by a non-reserved identifier?
fn is_kw_followed_by_ident(&self, kw: Symbol) -> bool {
self.token.is_keyword(kw) && self.look_ahead(1, |t| t.is_ident() && !t.is_reserved_ident())
}
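// Illustrative sketch (not part of the original source): this helps
// disambiguate contextual keywords, e.g. `union U { .. }` (an item) from
// `union` used as a plain identifier.
//
//     let is_union_item = p.is_kw_followed_by_ident(kw::Union);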
#[inline]
fn check_or_expected(&mut self, ok: bool, token_type: TokenType) -> bool {
if !ok {
self.expected_token_types.insert(token_type);
}
ok
}
fn check_ident(&mut self) -> bool {
self.check_or_expected(self.token.is_ident(), TokenType::Ident)
}
fn check_path(&mut self) -> bool {
self.check_or_expected(self.token.is_path_start(), TokenType::Path)
}
fn check_type(&mut self) -> bool {
self.check_or_expected(self.token.can_begin_type(), TokenType::Type)
}
fn check_const_arg(&mut self) -> bool {
self.check_or_expected(self.token.can_begin_const_arg(), TokenType::Const)
}
fn check_const_closure(&self) -> bool {
self.is_keyword_ahead(0, &[kw::Const])
&& self.look_ahead(1, |t| match &t.kind {
// async closures do not work with const closures, so we do not parse that here.
token::Ident(kw::Move | kw::Static, IdentIsRaw::No)
| token::OrOr
| token::BinOp(token::Or) => true,
_ => false,
})
}
fn check_inline_const(&self, dist: usize) -> bool {
self.is_keyword_ahead(dist, &[kw::Const])
&& self.look_ahead(dist + 1, |t| match &t.kind {
token::Interpolated(nt) => matches!(&**nt, token::NtBlock(..)),
token::OpenDelim(Delimiter::Brace) => true,
_ => false,
})
}
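// Illustrative sketch (not part of the original source): `dist` is the
// lookahead distance, so `check_inline_const(0)` asks whether an inline
// `const { ... }` block starts at the current token.
//
//     if p.check_inline_const(0) { /* parse a `const { ... }` expression */ }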
/// Checks whether the next token is either `+` or `+=`.
/// Returns `false` otherwise.
#[inline]
fn check_plus(&mut self) -> bool {
self.check_or_expected(self.token.is_like_plus(), TokenType::Plus)
}
/// Eats the expected token if it's present, possibly breaking
/// compound tokens like multi-character operators in the process.
/// Returns `true` if the token was eaten.
fn break_and_eat(&mut self, exp: ExpTokenPair<'_>) -> bool {
if self.token == *exp.tok {
self.bump();
return true;
}
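// Illustrative note (not part of the original source): the classic case is
// closing nested generics, where the lexer produced a single `>>` token but
// the parser needs only one `>`:
//
//     let v: Vec<Vec<u8>> = vec![];
//     //                ^^ `break_and_eat(exp!(Gt))` splits `>>`, eats the
//     //                   first `>`, and leaves the second for the caller.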
match self.token.kind.break_two_token_op(1) {
Some((first, second)) if first == *exp.tok => {
let first_span = self.psess.source_map().start_point(self.token.span);
let second_span = self.token.span.with_lo(first_span.hi());
self.token = Token::new(first, first_span);
// Keep track of this token - if we end token capturing now,
// we'll want to append this token to the captured stream.
//
// If we consume any additional tokens, then this token
// is not needed (we'll capture the entire 'glued' token),
// and `bump` will set this field to 0.
self.break_last_token += 1;
// Use the spacing of the glued token as the spacing of the
// unglued second token.
self.bump_with((Token::new(second, second_span), self.token_spacing));
true
}
_ => {
self.expected_token_types.insert(exp.token_type);
false
}
}
}
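// For example, if the caller expects `+` and the current token is the
// compound token `+=`, the token is broken in two: the `+` half is
// consumed immediately and the `=` half (with an adjusted span) becomes
// the new current token via `bump_with`. `break_last_token` records the
// split so that token capturing can reconstruct the original `+=` later.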
/// Eats `+`, possibly breaking tokens like `+=` in the process.
fn eat_plus(&mut self) -> bool {
self.break_and_eat(exp!(Plus))
}
/// Eats `&`, possibly breaking tokens like `&&` in the process.
/// Signals an error if `&` was not eaten.
fn expect_and(&mut self) -> PResult<'a, ()> {
if self.break_and_eat(exp!(And)) { Ok(()) } else { self.unexpected() }
}
/// Eats `|`, possibly breaking tokens like `||` in the process.
/// Signals an error if `|` was not eaten.
fn expect_or(&mut self) -> PResult<'a, ()> {
if self.break_and_eat(exp!(Or)) { Ok(()) } else { self.unexpected() }
}
/// Eats `<`, possibly breaking tokens like `<<` in the process.
fn eat_lt(&mut self) -> bool {
let ate = self.break_and_eat(exp!(Lt));
if ate {
// See doc comment for `unmatched_angle_bracket_count`.
self.unmatched_angle_bracket_count += 1;
debug!("eat_lt: (increment) count={:?}", self.unmatched_angle_bracket_count);
}
ate
}
/// Eats `<`, possibly breaking tokens like `<<` in the process.
/// Signals an error if `<` was not eaten.
fn expect_lt(&mut self) -> PResult<'a, ()> {
if self.eat_lt() { Ok(()) } else { self.unexpected() }
}
/// Eats `>`, possibly breaking tokens like `>>` in the process.
/// Signals an error if `>` was not eaten.
fn expect_gt(&mut self) -> PResult<'a, ()> {
if self.break_and_eat(exp!(Gt)) {
// See doc comment for `unmatched_angle_bracket_count`.
if self.unmatched_angle_bracket_count > 0 {
self.unmatched_angle_bracket_count -= 1;
debug!("expect_gt: (decrement) count={:?}", self.unmatched_angle_bracket_count);
}
Ok(())
} else {
self.unexpected()
}
}
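// Taken together, `eat_lt` and `expect_gt` are what allow nested generics
// like `Vec<Vec<u32>>` to parse: the trailing `>>` arrives as a single
// compound token, and `break_and_eat(exp!(Gt))` splits it into two `>`
// tokens, one for each level tracked by `unmatched_angle_bracket_count`.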
/// Checks if the next token is contained in `closes_expected` or
/// `closes_not_expected`, and returns `true` if so.
fn expect_any_with_type(
&mut self,
closes_expected: &[ExpTokenPair<'_>],
closes_not_expected: &[&TokenKind],
) -> bool {
closes_expected.iter().any(|&close| self.check(close))
|| closes_not_expected.iter().any(|k| self.check_noexpect(k))
}
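// Note the asymmetry above: tokens in `closes_expected` go through
// `check`, which records them in `expected_token_types` so they can be
// suggested in error messages, whereas `closes_not_expected` uses
// `check_noexpect`, ending the sequence without ever being suggested.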
/// Parses a sequence until the specified delimiters. The function
/// `f` must consume tokens until reaching the next separator or
/// closing bracket.
fn parse_seq_to_before_tokens<T>(
&mut self,
closes_expected: &[ExpTokenPair<'_>],
closes_not_expected: &[&TokenKind],
sep: SeqSep<'_>,
mut f: impl FnMut(&mut Parser<'a>) -> PResult<'a, T>,
) -> PResult<'a, (ThinVec<T>, Trailing, Recovered)> {
let mut first = true;
let mut recovered = Recovered::No;
let mut trailing = Trailing::No;
let mut v = ThinVec::new();
while !self.expect_any_with_type(closes_expected, closes_not_expected) {
if let token::CloseDelim(..) | token::Eof = self.token.kind {
break;
}
if let Some(exp) = sep.sep {
if first {
// no separator for the first element
first = false;
} else {
// check for separator
match self.expect(exp) {
Ok(Recovered::No) => {
self.current_closure.take();
}
Ok(Recovered::Yes(guar)) => {
self.current_closure.take();
recovered = Recovered::Yes(guar);
break;
}
Err(mut expect_err) => {
let sp = self.prev_token.span.shrink_to_hi();
let token_str = pprust::token_kind_to_string(exp.tok);
match self.current_closure.take() {
Some(closure_spans) if self.token == TokenKind::Semi => {
// Finding a semicolon instead of a comma
// after a closure body indicates that the
// closure body may be a block but the user
// forgot to put braces around its
// statements.
self.recover_missing_braces_around_closure_body(
closure_spans,
expect_err,
)?;
continue;
}
_ => {
// Attempt to keep parsing if it was a similar separator.
if exp.tok.similar_tokens().contains(&self.token.kind) {
self.bump();
}
}
}
// If this looks like a missing `@` in a binding pattern, bail out
// with a suggestion.
// https://github.com/rust-lang/rust/issues/72373
if self.prev_token.is_ident() && self.token == token::DotDot {
let msg = format!(
"if you meant to bind the contents of the rest of the array \
pattern into `{}`, use `@`",
pprust::token_to_string(&self.prev_token)
);
expect_err
.with_span_suggestion_verbose(
self.prev_token.span.shrink_to_hi().until(self.token.span),
msg,
" @ ",
Applicability::MaybeIncorrect,
)
.emit();
break;
}
// Attempt to keep parsing if it was an omitted separator.
self.last_unexpected_token_span = None;
match f(self) {
Ok(t) => {
// Parsed successfully, so the code is most likely just
// missing a separator.
expect_err
.with_span_suggestion_short(
sp,
format!("missing `{token_str}`"),
token_str,
Applicability::MaybeIncorrect,
)
.emit();
v.push(t);
continue;
}
Err(e) => {
// Parsing failed, therefore it must be something more serious
// than just a missing separator.
for child in &e.children {
// Propagate the help message from sub error `e` to main
// error `expect_err`.
expect_err.children.push(child.clone());
}
e.cancel();
if self.token == token::Colon {
// We will try to recover in
// `maybe_recover_struct_lit_bad_delims`.
return Err(expect_err);
} else if let [exp] = closes_expected
&& exp.token_type == TokenType::CloseParen
{
return Err(expect_err);
} else {
expect_err.emit();
break;
}
}
}
}
}
}
}
if sep.trailing_sep_allowed
&& self.expect_any_with_type(closes_expected, closes_not_expected)
{
trailing = Trailing::Yes;
break;
}
let t = f(self)?;
v.push(t);
}
Ok((v, trailing, recovered))
}
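// A rough example of the missing-separator recovery above: when parsing
// the argument list in `foo(a b)`, the parser reaches `b` without seeing
// a `,`; since `b` still parses as a valid element, the loop emits a
// "missing `,`" suggestion after `a` and keeps going instead of
// abandoning the whole sequence.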
fn recover_missing_braces_around_closure_body(
&mut self,
closure_spans: ClosureSpans,
mut expect_err: Diag<'_>,
) -> PResult<'a, ()> {
let initial_semicolon = self.token.span;
while self.eat(exp!(Semi)) {
let _ = self.parse_stmt_without_recovery(false, ForceCollect::No).unwrap_or_else(|e| {
e.cancel();
None
});
}
expect_err
.primary_message("closure bodies that contain statements must be surrounded by braces");
let preceding_pipe_span = closure_spans.closing_pipe;
let following_token_span = self.token.span;
let mut first_note = MultiSpan::from(vec![initial_semicolon]);
first_note.push_span_label(
initial_semicolon,
"this `;` turns the preceding closure into a statement",
);
first_note.push_span_label(
closure_spans.body,
"this expression is a statement because of the trailing semicolon",
);
expect_err.span_note(first_note, "statement found outside of a block");
let mut second_note = MultiSpan::from(vec![closure_spans.whole_closure]);
second_note.push_span_label(closure_spans.whole_closure, "this is the parsed closure...");
second_note.push_span_label(
following_token_span,
"...but likely you meant the closure to end here",
);
expect_err.span_note(second_note, "the closure body may be incorrectly delimited");
expect_err.span(vec![preceding_pipe_span, following_token_span]);
let opening_suggestion_str = " {".to_string();
let closing_suggestion_str = "}".to_string();
expect_err.multipart_suggestion(
"try adding braces",
vec![
(preceding_pipe_span.shrink_to_hi(), opening_suggestion_str),
(following_token_span.shrink_to_lo(), closing_suggestion_str),
],
Applicability::MaybeIncorrect,
);
expect_err.emit();
Ok(())
}
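// A hypothetical input that reaches the recovery above:
//
//     v.iter().map(|x| x.len(); x);
//
// The `;` appears where a `,` was expected after the closure body, so the
// parser suggests wrapping the statements in braces: `|x| { x.len(); x }`.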
/// Parses a sequence, not including the delimiters. The function
/// `f` must consume tokens until reaching the next separator or
/// closing bracket.
fn parse_seq_to_before_end<T>(
&mut self,
close: ExpTokenPair<'_>,
sep: SeqSep<'_>,
f: impl FnMut(&mut Parser<'a>) -> PResult<'a, T>,
) -> PResult<'a, (ThinVec<T>, Trailing, Recovered)> {
self.parse_seq_to_before_tokens(&[close], &[], sep, f)
}
/// Parses a sequence, including only the closing delimiter. The function
/// `f` must consume tokens until reaching the next separator or
/// closing bracket.
fn parse_seq_to_end<T>(
&mut self,
close: ExpTokenPair<'_>,
sep: SeqSep<'_>,
f: impl FnMut(&mut Parser<'a>) -> PResult<'a, T>,
) -> PResult<'a, (ThinVec<T>, Trailing)> {
let (val, trailing, recovered) = self.parse_seq_to_before_end(close, sep, f)?;
if matches!(recovered, Recovered::No) && !self.eat(close) {
self.dcx().span_delayed_bug(
self.token.span,
"recovered but `parse_seq_to_before_end` did not give us the close token",
);
}
Ok((val, trailing))
}
/// Parses a sequence, including both delimiters. The function
/// `f` must consume tokens until reaching the next separator or
/// closing bracket.
fn parse_unspanned_seq<T>(
&mut self,
open: ExpTokenPair<'_>,
close: ExpTokenPair<'_>,
sep: SeqSep<'_>,
f: impl FnMut(&mut Parser<'a>) -> PResult<'a, T>,
) -> PResult<'a, (ThinVec<T>, Trailing)> {
self.expect(open)?;
self.parse_seq_to_end(close, sep, f)
}
/// Parses a comma-separated sequence, including both delimiters.
/// The function `f` must consume tokens until reaching the next separator or
/// closing bracket.
fn parse_delim_comma_seq<T>(
&mut self,
open: ExpTokenPair<'_>,
close: ExpTokenPair<'_>,
f: impl FnMut(&mut Parser<'a>) -> PResult<'a, T>,
) -> PResult<'a, (ThinVec<T>, Trailing)> {
self.parse_unspanned_seq(open, close, SeqSep::trailing_allowed(exp!(Comma)), f)
}
/// Parses a comma-separated sequence delimited by parentheses (e.g. `(x, y)`).
/// The function `f` must consume tokens until reaching the next separator or
/// closing bracket.
fn parse_paren_comma_seq<T>(
&mut self,
f: impl FnMut(&mut Parser<'a>) -> PResult<'a, T>,
) -> PResult<'a, (ThinVec<T>, Trailing)> {
self.parse_delim_comma_seq(exp!(OpenParen), exp!(CloseParen), f)
}
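// These helpers layer on top of one another: `parse_paren_comma_seq` is
// `parse_delim_comma_seq` specialized to parentheses, which in turn is
// `parse_unspanned_seq` with a comma separator that permits a trailing
// comma. A call like `self.parse_paren_comma_seq(|p| p.parse_expr())`
// would parse `(a, b, c)` into a `ThinVec` of expressions, with
// `Trailing` reporting whether a trailing comma was present.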
/// Advances the parser by one token, using the provided token as the next one.
fn bump_with(&mut self, next: (Token, Spacing)) {
self.inlined_bump_with(next)
}
/// This always-inlined version should only be used on hot code paths.
#[inline(always)]
fn inlined_bump_with(&mut self, (next_token, next_spacing): (Token, Spacing)) {
// Update the current and previous tokens.
self.prev_token = mem::replace(&mut self.token, next_token);
self.token_spacing = next_spacing;
// Diagnostics: the recorded expectations applied to the token we
// just moved past, so clear them.
self.expected_token_types.clear();
}
/// Advances the parser by one token.
pub fn bump(&mut self) {
// Note: destructuring here would give nicer code, but it was found in #96210 to be slower
// than `.0`/`.1` access.
let mut next = self.token_cursor.inlined_next();
self.num_bump_calls += 1;
// We got a token from the underlying cursor and no longer need to
// worry about an unglued token. See `break_and_eat` for more details.
self.break_last_token = 0;
if next.0.span.is_dummy() {
// Tweak the location for better diagnostics, but keep syntactic context intact.
let fallback_span = self.token.span;
next.0.span = fallback_span.with_ctxt(next.0.span.ctxt());
}
debug_assert!(!matches!(
next.0.kind,
token::OpenDelim(delim) | token::CloseDelim(delim) if delim.skip()
));
self.inlined_bump_with(next)
}
/// Looks ahead `dist` tokens past `self.token` and gives `looker` access to the
/// token there. When `dist == 0`, the current token itself is inspected. `Eof`
/// is used if the look-ahead reaches any distance past the end of the tokens.
pub fn look_ahead<R>(&self, dist: usize, looker: impl FnOnce(&Token) -> R) -> R {
if dist == 0 {
return looker(&self.token);
}
// Typically around 98% of the `dist > 0` cases have `dist == 1`, so we
// have a fast special case for that.
if dist == 1 {
// The index is zero because the tree cursor's index always points
// to the next token to be returned.
match self.token_cursor.curr.curr() {
Some(tree) => {
// Indexing stayed within the current token tree.
match tree {
TokenTree::Token(token, _) => return looker(token),
&TokenTree::Delimited(dspan, _, delim, _) => {
if !delim.skip() {
return looker(&Token::new(token::OpenDelim(delim), dspan.open));
}
}
}
}
None => {
// The tree cursor lookahead went (one) past the end of the
// current token tree. Try to return a close delimiter.
if let Some(last) = self.token_cursor.stack.last()
&& let Some(&TokenTree::Delimited(span, _, delim, _)) = last.curr()
&& !delim.skip()
{
// We are not in the outermost token stream, so we have
// delimiters. Also, those delimiters are not skipped.
return looker(&Token::new(token::CloseDelim(delim), span.close));
}
}
}
}
// Just clone the token cursor and use `next`, skipping delimiters as
// necessary. Slow but simple.
let mut cursor = self.token_cursor.clone();
let mut i = 0;
let mut token = Token::dummy();
while i < dist {
token = cursor.next().0;
if matches!(
token.kind,
token::OpenDelim(delim) | token::CloseDelim(delim) if delim.skip()
) {
continue;
}
i += 1;
}
looker(&token)
}
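// A minimal sketch (illustrative only; `demo_look_ahead` is a hypothetical
// helper over a simplified token model, not part of the parser): the slow
// path above amounts to cloning the cursor and stepping it `dist` times,
// leaving the real cursor untouched.
#[allow(dead_code)]
fn demo_look_ahead<'t>(tokens: &'t [&'t str], dist: usize) -> Option<&'t str> {
    // Cloning an iterator and advancing it models "peek without consuming".
    tokens.iter().copied().nth(dist)
}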
/// Returns whether any of the given keywords are `dist` tokens ahead of the current one.
pub(crate) fn is_keyword_ahead(&self, dist: usize, kws: &[Symbol]) -> bool {
self.look_ahead(dist, |t| kws.iter().any(|&kw| t.is_keyword(kw)))
}
/// Parses a coroutine kind: `async`, `gen`, `async gen`, or nothing.
fn parse_coroutine_kind(&mut self, case: Case) -> Option<CoroutineKind> {
let span = self.token.uninterpolated_span();
if self.eat_keyword_case(exp!(Async), case) {
// FIXME(gen_blocks): Do we want to unconditionally parse `gen` and then
// error if edition <= 2024, like we do with async and edition <= 2018?
if self.token.uninterpolated_span().at_least_rust_2024()
&& self.eat_keyword_case(exp!(Gen), case)
{
let gen_span = self.prev_token.uninterpolated_span();
Some(CoroutineKind::AsyncGen {
span: span.to(gen_span),
closure_id: DUMMY_NODE_ID,
return_impl_trait_id: DUMMY_NODE_ID,
})
} else {
Some(CoroutineKind::Async {
span,
closure_id: DUMMY_NODE_ID,
return_impl_trait_id: DUMMY_NODE_ID,
})
}
} else if self.token.uninterpolated_span().at_least_rust_2024()
&& self.eat_keyword_case(exp!(Gen), case)
{
Some(CoroutineKind::Gen {
span,
closure_id: DUMMY_NODE_ID,
return_impl_trait_id: DUMMY_NODE_ID,
})
} else {
None
}
}
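// A hedged sketch (hypothetical helper, returning plain strings rather than
// the real `CoroutineKind`) of the decision ladder above: `async` optionally
// followed by `gen` yields `async gen`, a bare `gen` yields `gen`, and
// anything else is no coroutine kind at all.
#[allow(dead_code)]
fn demo_coroutine_kind(saw_async: bool, saw_gen: bool) -> Option<&'static str> {
    match (saw_async, saw_gen) {
        (true, true) => Some("async gen"),
        (true, false) => Some("async"),
        (false, true) => Some("gen"),
        (false, false) => None,
    }
}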
/// Parses fn unsafety: `unsafe`, `safe` or nothing.
fn parse_safety(&mut self, case: Case) -> Safety {
if self.eat_keyword_case(exp!(Unsafe), case) {
Safety::Unsafe(self.prev_token.uninterpolated_span())
} else if self.eat_keyword_case(exp!(Safe), case) {
Safety::Safe(self.prev_token.uninterpolated_span())
} else {
Safety::Default
}
}
/// Parses constness: `const` or nothing.
fn parse_constness(&mut self, case: Case) -> Const {
self.parse_constness_(case, false)
}
/// Parses constness for closures (case-sensitive, feature-gated).
fn parse_closure_constness(&mut self) -> Const {
let constness = self.parse_constness_(Case::Sensitive, true);
if let Const::Yes(span) = constness {
self.psess.gated_spans.gate(sym::const_closures, span);
}
constness
}
fn parse_constness_(&mut self, case: Case, is_closure: bool) -> Const {
// Avoid parsing const blocks and const closures as const items.
if (self.check_const_closure() == is_closure)
&& !self
.look_ahead(1, |t| *t == token::OpenDelim(Delimiter::Brace) || t.is_whole_block())
&& self.eat_keyword_case(exp!(Const), case)
{
Const::Yes(self.prev_token.uninterpolated_span())
} else {
Const::No
}
}
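// A minimal sketch (hypothetical helper over a simplified token model) of the
// lookahead guard above: only treat a leading `const` as constness when the
// next token is not `{`, so `const { ... }` blocks are left for the
// const-block parser instead.
#[allow(dead_code)]
fn demo_eat_constness(tokens: &[&str]) -> bool {
    // Peek one token past `const` before committing.
    tokens.first() == Some(&"const") && tokens.get(1) != Some(&"{")
}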
/// Parses inline const expressions.
fn parse_const_block(&mut self, span: Span, pat: bool) -> PResult<'a, P<Expr>> {
if pat {
self.psess.gated_spans.gate(sym::inline_const_pat, span);
}
self.expect_keyword(exp!(Const))?;
let (attrs, blk) = self.parse_inner_attrs_and_block()?;
let anon_const = AnonConst {
id: DUMMY_NODE_ID,
value: self.mk_expr(blk.span, ExprKind::Block(blk, None)),
};
let blk_span = anon_const.value.span;
Ok(self.mk_expr_with_attrs(span.to(blk_span), ExprKind::ConstBlock(anon_const), attrs))
}
/// Parses mutability (`mut` or nothing).
fn parse_mutability(&mut self) -> Mutability {
if self.eat_keyword(exp!(Mut)) { Mutability::Mut } else { Mutability::Not }
}
/// Parses reference binding mode (`ref`, `ref mut`, or nothing).
fn parse_byref(&mut self) -> ByRef {
if self.eat_keyword(exp!(Ref)) { ByRef::Yes(self.parse_mutability()) } else { ByRef::No }
}
/// Possibly parses mutability (`const` or `mut`).
fn parse_const_or_mut(&mut self) -> Option<Mutability> {
if self.eat_keyword(exp!(Mut)) {
Some(Mutability::Mut)
} else if self.eat_keyword(exp!(Const)) {
Some(Mutability::Not)
} else {
None
}
}
fn parse_field_name(&mut self) -> PResult<'a, Ident> {
if let token::Literal(token::Lit { kind: token::Integer, symbol, suffix }) = self.token.kind
{
if let Some(suffix) = suffix {
self.expect_no_tuple_index_suffix(self.token.span, suffix);
}
self.bump();
Ok(Ident::new(symbol, self.prev_token.span))
} else {
self.parse_ident_common(true)
}
}
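// A hedged sketch (hypothetical helper) of the special case above: tuple
// fields such as `x.0` name their field with an integer literal, and a
// suffixed literal like `x.0usize` must be rejected with a dedicated error.
#[allow(dead_code)]
fn demo_is_tuple_index(lit: &str) -> bool {
    !lit.is_empty() && lit.bytes().all(|b| b.is_ascii_digit())
}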
fn parse_delim_args(&mut self) -> PResult<'a, P<DelimArgs>> {
if let Some(args) = self.parse_delim_args_inner() {
Ok(P(args))
} else {
self.unexpected_any()
}
}
fn parse_attr_args(&mut self) -> PResult<'a, AttrArgs> {
Ok(if let Some(args) = self.parse_delim_args_inner() {
AttrArgs::Delimited(args)
} else if self.eat(exp!(Eq)) {
let eq_span = self.prev_token.span;
AttrArgs::Eq { eq_span, expr: self.parse_expr_force_collect()? }
} else {
AttrArgs::Empty
})
}
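// A minimal sketch (hypothetical helper over simplified tokens) of the three
// attribute-argument shapes distinguished above: delimited args as in
// `#[attr(...)]`, `= expr` args as in `#[attr = "v"]`, and empty args as in
// `#[attr]`.
#[allow(dead_code)]
fn demo_attr_args_form(after_path: Option<&str>) -> &'static str {
    match after_path {
        Some("(" | "[" | "{") => "delimited",
        Some("=") => "eq",
        _ => "empty",
    }
}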
fn parse_delim_args_inner(&mut self) -> Option<DelimArgs> {
let delimited = self.check(exp!(OpenParen))
|| self.check(exp!(OpenBracket))
|| self.check(exp!(OpenBrace));
delimited.then(|| {
let TokenTree::Delimited(dspan, _, delim, tokens) = self.parse_token_tree() else {
unreachable!()
};
DelimArgs { dspan, delim, tokens }
})
}
/// Parses a single token tree from the input.
pub fn parse_token_tree(&mut self) -> TokenTree {
match self.token.kind {
token::OpenDelim(..) => {
// Clone the `TokenTree::Delimited` that we are currently
// within. That's what we are going to return.
let tree = self.token_cursor.stack.last().unwrap().curr().unwrap().clone();
debug_assert_matches!(tree, TokenTree::Delimited(..));
// Advance the token cursor through the entire delimited
// sequence. After getting the `OpenDelim` we are *within* the
// delimited sequence, i.e. at depth `d`. After getting the
// matching `CloseDelim` we are *after* the delimited sequence,
// i.e. at depth `d - 1`.
let target_depth = self.token_cursor.stack.len() - 1;
loop {
// Advance one token at a time, so `TokenCursor::next()`
// can capture these tokens if necessary.
self.bump();
if self.token_cursor.stack.len() == target_depth {
debug_assert_matches!(self.token.kind, token::CloseDelim(_));
break;
}
}
// Consume close delimiter
self.bump();
tree
}
token::CloseDelim(_) | token::Eof => unreachable!(),
_ => {
let prev_spacing = self.token_spacing;
self.bump();
TokenTree::Token(self.prev_token.clone(), prev_spacing)
}
}
}
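// A hedged sketch (hypothetical helper; delimiters reduced to `+1`/`-1`
// depth deltas) of the loop above: starting just inside an open delimiter,
// keep advancing until the matching close delimiter brings the depth back
// down, which marks the end of the delimited sequence.
#[allow(dead_code)]
fn demo_skip_delimited(depth_deltas: &[i32]) -> usize {
    let mut depth = 1; // the open delimiter has already been consumed
    let mut consumed = 0;
    for &delta in depth_deltas {
        depth += delta;
        consumed += 1;
        if depth == 0 {
            break; // matching close delimiter reached
        }
    }
    consumed
}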
pub fn parse_tokens(&mut self) -> TokenStream {
let mut result = Vec::new();
loop {
match self.token.kind {
token::Eof | token::CloseDelim(..) => break,
_ => result.push(self.parse_token_tree()),
}
}
TokenStream::new(result)
}
/// Evaluates the closure with restrictions in place.
///
/// After the closure is evaluated, the restrictions are reset.
fn with_res<T>(&mut self, res: Restrictions, f: impl FnOnce(&mut Self) -> T) -> T {
let old = self.restrictions;
self.restrictions = res;
let res = f(self);
self.restrictions = old;
res
}
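// A minimal sketch (hypothetical helper) of the save/restore pattern used by
// `with_res` above: stash the old value, run the closure, then restore, so a
// temporary change can never leak past the call.
#[allow(dead_code)]
fn demo_with_flag<T>(flag: &mut bool, new: bool, f: impl FnOnce(&mut bool) -> T) -> T {
    let old = *flag;
    *flag = new;
    let res = f(flag);
    *flag = old;
    res
}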
/// Parses `pub` and `pub(in path)` plus shortcuts `pub(crate)` for `pub(in crate)`, `pub(self)`
/// for `pub(in self)` and `pub(super)` for `pub(in super)`.
/// If the following element can't be a tuple (i.e., it's a function definition), then
/// it's not a tuple struct field, and the contents within the parentheses aren't valid,
/// so emit a proper diagnostic.
// Public for rustfmt usage.
pub fn parse_visibility(&mut self, fbt: FollowedByType) -> PResult<'a, Visibility> {
if let Some(vis) = self
.eat_metavar_seq(MetaVarKind::Vis, |this| this.parse_visibility(FollowedByType::Yes))
{
return Ok(vis);
}
if !self.eat_keyword(exp!(Pub)) {
// We need a span for our `Spanned<VisibilityKind>`, but there's inherently no
// keyword to grab a span from for inherited visibility; an empty span at the
// beginning of the current token would seem to be the "Schelling span".
return Ok(Visibility {
span: self.token.span.shrink_to_lo(),
kind: VisibilityKind::Inherited,
tokens: None,
});
}
let lo = self.prev_token.span;
if self.check(exp!(OpenParen)) {
// We don't `self.bump()` the `(` yet because this might be a struct definition where
// `()` or a tuple might be allowed. For example, `struct Struct(pub (), pub (usize));`.
// Because of this, we only `bump` the `(` if we're assured it is appropriate to do so
// by the following tokens.
if self.is_keyword_ahead(1, &[kw::In]) {
// Parse `pub(in path)`.
self.bump(); // `(`
self.bump(); // `in`
let path = self.parse_path(PathStyle::Mod)?; // `path`
self.expect(exp!(CloseParen))?; // `)`
let vis = VisibilityKind::Restricted {
path: P(path),
id: ast::DUMMY_NODE_ID,
shorthand: false,
};
return Ok(Visibility {
span: lo.to(self.prev_token.span),
kind: vis,
tokens: None,
});
} else if self.look_ahead(2, |t| t == &token::CloseDelim(Delimiter::Parenthesis))
&& self.is_keyword_ahead(1, &[kw::Crate, kw::Super, kw::SelfLower])
{
// Parse `pub(crate)`, `pub(self)`, or `pub(super)`.
self.bump(); // `(`
let path = self.parse_path(PathStyle::Mod)?; // `crate`/`super`/`self`
self.expect(exp!(CloseParen))?; // `)`
let vis = VisibilityKind::Restricted {
path: P(path),
id: ast::DUMMY_NODE_ID,
shorthand: true,
};
return Ok(Visibility {
span: lo.to(self.prev_token.span),
kind: vis,
tokens: None,
});
} else if let FollowedByType::No = fbt {
// Provide this diagnostic if a type cannot follow;
// in particular, if this is not a tuple struct.
self.recover_incorrect_vis_restriction()?;
// Emit diagnostic, but continue with public visibility.
}
}
Ok(Visibility { span: lo, kind: VisibilityKind::Public, tokens: None })
}
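// A hedged sketch (hypothetical helper over simplified tokens) of the
// disambiguation above: after `pub`, a `(` only begins a visibility
// restriction when it is followed by `in`, or by exactly one of
// `crate`/`super`/`self` and a closing `)`; otherwise it may open the type
// of a tuple field, as in `struct S(pub (usize));`, and must be left alone.
#[allow(dead_code)]
fn demo_is_vis_restriction(t1: Option<&str>, t2: Option<&str>) -> bool {
    t1 == Some("in") || (matches!(t1, Some("crate" | "super" | "self")) && t2 == Some(")"))
}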
/// Recovery for e.g. `pub(something) fn ...` or `struct X { pub(something) y: Z }`
fn recover_incorrect_vis_restriction(&mut self) -> PResult<'a, ()> {
self.bump(); // `(`
let path = self.parse_path(PathStyle::Mod)?;
self.expect(exp!(CloseParen))?; // `)`
let path_str = pprust::path_to_string(&path);
self.dcx()
.emit_err(IncorrectVisibilityRestriction { span: path.span, inner_str: path_str });
Ok(())
}
/// Parses `extern string_literal?`.
fn parse_extern(&mut self, case: Case) -> Extern {
if self.eat_keyword_case(exp!(Extern), case) {
let mut extern_span = self.prev_token.span;
let abi = self.parse_abi();
if let Some(abi) = abi {
extern_span = extern_span.to(abi.span);
}
Extern::from_abi(abi, extern_span)
} else {
Extern::None
}
}
/// Parses a string literal as an ABI spec.
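///
/// For example, for `extern "C"` this returns the `"C"` literal, while a
/// non-string literal such as `extern 7` emits `NonStringAbiLiteral` and
/// yields `None` (illustrative inputs).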
fn parse_abi(&mut self) -> Option<StrLit> {
match self.parse_str_lit() {
Ok(str_lit) => Some(str_lit),
Err(Some(lit)) => match lit.kind {
ast::LitKind::Err(_) => None,
_ => {
self.dcx().emit_err(NonStringAbiLiteral { span: lit.span });
None
}
},
Err(None) => None,
}
}
fn collect_tokens_no_attrs<R: HasAttrs + HasTokens>(
&mut self,
f: impl FnOnce(&mut Self) -> PResult<'a, R>,
) -> PResult<'a, R> {
// The only reason to call `collect_tokens_no_attrs` is if you want tokens, so use
// `ForceCollect::Yes`.
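// (Sketch of a hypothetical call site: `p.collect_tokens_no_attrs(|p| p.parse_expr())`,
// for a caller that needs the expression's tokens regardless of attributes.)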
self.collect_tokens(None, AttrWrapper::empty(), ForceCollect::Yes, |this, _attrs| {
Ok((f(this)?, Trailing::No, UsePreAttrPos::No))
})
}
/// Checks for `::` or, potentially, `:::` and then looks ahead after it.
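///
/// The `:::` case is error recovery for a typo'd path separator: e.g. in
/// `a:::b` (an illustrative input), the stray third colon is skipped and
/// `looker` is applied to the token after it, here `b`.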
fn check_path_sep_and_look_ahead(&mut self, looker: impl Fn(&Token) -> bool) -> bool {
if self.check(exp!(PathSep)) {
if self.may_recover() && self.look_ahead(1, |t| t.kind == token::Colon) {
debug_assert!(!self.look_ahead(1, &looker), "Looker must not match on colon");
self.look_ahead(2, looker)
} else {
self.look_ahead(1, looker)
}
} else {
false
}
}
/// `::{` or `::*`
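///
/// E.g. this returns `true` when the parser is at the `::` of
/// `use foo::{bar, baz};` or `use foo::*;` (illustrative inputs).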
fn is_import_coupler(&mut self) -> bool {
self.check_path_sep_and_look_ahead(|t| {
matches!(t.kind, token::OpenDelim(Delimiter::Brace) | token::BinOp(token::Star))
})
}
// Debug view of the parser's token stream, up to `{lookahead}` tokens.
// Only used when debugging.
#[allow(unused)]
pub(crate) fn debug_lookahead(&self, lookahead: usize) -> impl fmt::Debug {
fmt::from_fn(move |f| {
let mut dbg_fmt = f.debug_struct("Parser"); // or at least, one view of the parser
// we don't need N spans, but we want at least one, so print all of prev_token
dbg_fmt.field("prev_token", &self.prev_token);
let mut tokens = vec![];
for i in 0..lookahead {
let tok = self.look_ahead(i, |tok| tok.kind.clone());
let is_eof = tok == TokenKind::Eof;
tokens.push(tok);
if is_eof {
// Don't look ahead past EOF.
break;
}
}
dbg_fmt.field_with("tokens", |field| field.debug_list().entries(tokens).finish());
dbg_fmt.field("approx_token_stream_pos", &self.num_bump_calls);
// some fields are interesting for certain values, as they relate to macro parsing
if let Some(subparser) = self.subparser_name {
dbg_fmt.field("subparser_name", &subparser);
}
if let Recovery::Forbidden = self.recovery {
dbg_fmt.field("recovery", &self.recovery);
}
// imply there's "more to know" than this view
dbg_fmt.finish_non_exhaustive()
})
}
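// A hypothetical use of `debug_lookahead` while debugging (not a real call
// site in the compiler):
//
//     eprintln!("{:?}", self.debug_lookahead(4));
//
// This prints `prev_token`, up to 4 upcoming tokens (stopping at EOF), and
// the approximate token stream position.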
pub fn clear_expected_token_types(&mut self) {
self.expected_token_types.clear();
}
pub fn approx_token_stream_pos(&self) -> u32 {
self.num_bump_calls
}
}
pub(crate) fn make_unclosed_delims_error(
unmatched: UnmatchedDelim,
psess: &ParseSess,
) -> Option<Diag<'_>> {
// `None` here means an `Eof` was found. We already emit those errors elsewhere; we add them to
// `unmatched_delims` only for error recovery in the `Parser`.
let found_delim = unmatched.found_delim?;
let mut spans = vec![unmatched.found_span];
if let Some(sp) = unmatched.unclosed_span {
spans.push(sp);
}
let err = psess.dcx().create_err(MismatchedClosingDelimiter {
spans,
delimiter: pprust::token_kind_to_string(&token::CloseDelim(found_delim)).to_string(),
unmatched: unmatched.found_span,
opening_candidate: unmatched.candidate_span,
unclosed: unmatched.unclosed_span,
});
Some(err)
}
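// As a sketch of `make_unclosed_delims_error`'s effect: for input like
// `fn main() { (}` (an illustrative program), the `(` is never closed, so the
// diagnostic points at the mismatched `}` that was found, plus the span of
// the unclosed `(` when it is known.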
/// A helper enum used when building an `AttrTokenStream` from
/// a `LazyAttrTokenStream`. Both delimiter and non-delimiter tokens
/// are stored as `FlatToken::Token`. A vector of `FlatToken`s
/// is then 'parsed' to build up an `AttrTokenStream` with nested
/// `AttrTokenTree::Delimited` tokens.
#[derive(Debug, Clone)]
enum FlatToken {
/// A token. This holds both delimiter (e.g. `{` and `}`)
/// and non-delimiter tokens.
Token((Token, Spacing)),
/// Holds the `AttrsTarget` for an AST node. The `AttrsTarget` is inserted
/// directly into the constructed `AttrTokenStream` as an
/// `AttrTokenTree::AttrsTarget`.
AttrsTarget(AttrsTarget),
/// A special 'empty' token that is ignored during the conversion
/// to an `AttrTokenStream`. This is used to simplify the
/// handling of replace ranges.
Empty,
}
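// An illustrative flattening (a sketch): the tokens of `(x)` are collected as
// `[Token(('(', ..)), Token((x, ..)), Token((')', ..))]` and then re-nested
// into a single `AttrTokenTree::Delimited` subtree, while a
// `FlatToken::AttrsTarget` entry becomes an `AttrTokenTree::AttrsTarget` node
// directly.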
// Metavar captures of various kinds.
#[derive(Clone, Debug)]
pub enum ParseNtResult {
Tt(TokenTree),
Ident(Ident, IdentIsRaw),
Lifetime(Ident, IdentIsRaw),
Ty(P<ast::Ty>),
Vis(P<ast::Visibility>),
/// This variant will eventually be removed, along with `Token::Interpolated`.
Nt(Arc<Nonterminal>),
}
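// An illustrative mapping (a sketch): in a `macro_rules!` matcher, `$x:ident`
// is captured as `ParseNtResult::Ident(..)` and `$t:ty` as
// `ParseNtResult::Ty(..)`, while fragment kinds not yet migrated to
// token-based capture fall back to `ParseNtResult::Nt(..)`.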