rust/compiler/rustc_ast/src/tokenstream.rs


//! # Token Streams
//!
//! `TokenStream`s represent syntactic objects before they are converted into ASTs.
//! A `TokenStream` is, roughly speaking, a sequence of [`TokenTree`]s,
//! which are themselves a single [`Token`] or a `Delimited` subsequence of tokens.
//!
//! ## Ownership
//!
//! `TokenStream`s are persistent data structures constructed as ropes with
//! reference-counted children. In general, this means that calling an operation on a `TokenStream`
//! (such as `slice`) produces an entirely new `TokenStream` from the borrowed reference to
//! the original. This essentially coerces `TokenStream`s into "views" of their subparts,
//! and a borrowed `TokenStream` is sufficient to build an owned `TokenStream` without taking
//! ownership of the original.
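//!
//! As a small illustration of these "view" semantics (a hedged sketch, not a
//! doctest; `first_tree_only` is a hypothetical helper):
//!
//! ```ignore (illustrative)
//! // A borrowed `TokenStream` suffices to build an owned one: cloning a
//! // `TokenTree` only bumps the reference counts of its shared children.
//! fn first_tree_only(stream: &TokenStream) -> TokenStream {
//!     let tts: Vec<TokenTree> = stream.trees().take(1).cloned().collect();
//!     TokenStream::new(tts)
//! }
//! ```
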
use std::borrow::Cow;
use std::{cmp, fmt, iter};

use rustc_data_structures::stable_hasher::{HashStable, StableHasher};
use rustc_data_structures::sync::{self, Lrc};
use rustc_macros::{Decodable, Encodable, HashStable_Generic};
use rustc_serialize::{Decodable, Encodable};
use rustc_span::{sym, Span, SpanDecoder, SpanEncoder, Symbol, DUMMY_SP};

use crate::ast::{AttrStyle, StmtKind};
use crate::ast_traits::{HasAttrs, HasTokens};
use crate::token::{self, Delimiter, Nonterminal, Token, TokenKind};
use crate::{AttrVec, Attribute};

/// Part of a `TokenStream`.
#[derive(Debug, Clone, PartialEq, Encodable, Decodable, HashStable_Generic)]
pub enum TokenTree {
    /// A single token. Should never be `OpenDelim` or `CloseDelim`, because
    /// delimiters are implicitly represented by `Delimited`.
    Token(Token, Spacing),
    /// A delimited sequence of token trees.
    Delimited(DelimSpan, DelimSpacing, Delimiter, TokenStream),
}

// Ensure all fields of `TokenTree` are `DynSend` and `DynSync`.
#[cfg(parallel_compiler)]
fn _dummy()
where
    Token: sync::DynSend + sync::DynSync,
    Spacing: sync::DynSend + sync::DynSync,
    DelimSpan: sync::DynSend + sync::DynSync,
    Delimiter: sync::DynSend + sync::DynSync,
    TokenStream: sync::DynSend + sync::DynSync,
{
}

impl TokenTree {
    /// Checks if this `TokenTree` is equal to the other, regardless of span/spacing information.
    pub fn eq_unspanned(&self, other: &TokenTree) -> bool {
        match (self, other) {
            (TokenTree::Token(token, _), TokenTree::Token(token2, _)) => token.kind == token2.kind,
            (TokenTree::Delimited(.., delim, tts), TokenTree::Delimited(.., delim2, tts2)) => {
                delim == delim2 && tts.eq_unspanned(tts2)
            }
            _ => false,
        }
    }
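
    // E.g. (illustrative; `span_a`/`span_b` are assumed `Span` values): two
    // `+` tokens that differ only in span and spacing compare equal here,
    // because only `TokenKind`s and delimiter structure are examined:
    //
    //     let a = TokenTree::token_alone(token::BinOp(token::Plus), span_a);
    //     let b = TokenTree::token_joint(token::BinOp(token::Plus), span_b);
    //     assert!(a.eq_unspanned(&b));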

    /// Retrieves the `TokenTree`'s span.
    pub fn span(&self) -> Span {
        match self {
            TokenTree::Token(token, _) => token.span,
            TokenTree::Delimited(sp, ..) => sp.entire(),
        }
    }

    /// Create a `TokenTree::Token` with alone spacing.
    pub fn token_alone(kind: TokenKind, span: Span) -> TokenTree {
        TokenTree::Token(Token::new(kind, span), Spacing::Alone)
    }

    /// Create a `TokenTree::Token` with joint spacing.
    pub fn token_joint(kind: TokenKind, span: Span) -> TokenTree {
        TokenTree::Token(Token::new(kind, span), Spacing::Joint)
    }

    /// Create a `TokenTree::Token` with joint-hidden spacing.
    pub fn token_joint_hidden(kind: TokenKind, span: Span) -> TokenTree {
        TokenTree::Token(Token::new(kind, span), Spacing::JointHidden)
    }
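
    // A quick sketch of the three spacings (`sp` is an assumed `Span`). In
    // `Vec<u32>`, the `<` sits flush against `u32`, which is not a punct, so
    // `<` gets `JointHidden`; in `+=`, the `+` is immediately followed by the
    // punct `=`, so it gets `Joint`; the trailing `=` is `Alone`:
    //
    //     let lt = TokenTree::token_joint_hidden(token::Lt, sp);
    //     let plus = TokenTree::token_joint(token::BinOp(token::Plus), sp);
    //     let eq = TokenTree::token_alone(token::Eq, sp);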

    pub fn uninterpolate(&self) -> Cow<'_, TokenTree> {
        match self {
            TokenTree::Token(token, spacing) => match token.uninterpolate() {
                Cow::Owned(token) => Cow::Owned(TokenTree::Token(token, *spacing)),
                Cow::Borrowed(_) => Cow::Borrowed(self),
            },
            _ => Cow::Borrowed(self),
        }
    }
}

impl<CTX> HashStable<CTX> for TokenStream
where
    CTX: crate::HashStableContext,
{
    fn hash_stable(&self, hcx: &mut CTX, hasher: &mut StableHasher) {
        for sub_tt in self.trees() {
            sub_tt.hash_stable(hcx, hasher);
        }
    }
}

pub trait ToAttrTokenStream: sync::DynSend + sync::DynSync {
    fn to_attr_token_stream(&self) -> AttrTokenStream;
}

impl ToAttrTokenStream for AttrTokenStream {
    fn to_attr_token_stream(&self) -> AttrTokenStream {
        self.clone()
    }
}

/// A lazy version of [`TokenStream`], which defers creation
/// of an actual `TokenStream` until it is needed.
/// `Box` is here only to reduce the structure size.
#[derive(Clone)]
pub struct LazyAttrTokenStream(Lrc<Box<dyn ToAttrTokenStream>>);

impl LazyAttrTokenStream {
    pub fn new(inner: impl ToAttrTokenStream + 'static) -> LazyAttrTokenStream {
        LazyAttrTokenStream(Lrc::new(Box::new(inner)))
    }

    pub fn to_attr_token_stream(&self) -> AttrTokenStream {
        self.0.to_attr_token_stream()
    }
}
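
// A minimal sketch of the laziness (`CapturedTokens` is a hypothetical
// implementor; the real ones live in the parser and replay recorded cursor
// state on demand):
//
//     struct CapturedTokens(AttrTokenStream);
//
//     impl ToAttrTokenStream for CapturedTokens {
//         fn to_attr_token_stream(&self) -> AttrTokenStream {
//             // This work happens only when the stream is actually requested.
//             self.0.clone()
//         }
//     }
//
//     let lazy = LazyAttrTokenStream::new(CapturedTokens(AttrTokenStream::default()));
//     let _tokens = lazy.to_attr_token_stream(); // forced here, not at `new()`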

impl fmt::Debug for LazyAttrTokenStream {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "LazyAttrTokenStream({:?})", self.to_attr_token_stream())
    }
}

impl<S: SpanEncoder> Encodable<S> for LazyAttrTokenStream {
    fn encode(&self, _s: &mut S) {
        panic!("Attempted to encode LazyAttrTokenStream");
    }
}

impl<D: SpanDecoder> Decodable<D> for LazyAttrTokenStream {
    fn decode(_d: &mut D) -> Self {
        panic!("Attempted to decode LazyAttrTokenStream");
    }
}

impl<CTX> HashStable<CTX> for LazyAttrTokenStream {
    fn hash_stable(&self, _hcx: &mut CTX, _hasher: &mut StableHasher) {
        panic!("Attempted to compute stable hash for LazyAttrTokenStream");
    }
}

/// An `AttrTokenStream` is similar to a `TokenStream`, but with extra
/// information about the tokens for attribute targets. This is used
/// during expansion to perform early cfg-expansion, and to process attributes
/// during proc-macro invocations.
#[derive(Clone, Debug, Default, Encodable, Decodable)]
pub struct AttrTokenStream(pub Lrc<Vec<AttrTokenTree>>);

/// Like `TokenTree`, but for `AttrTokenStream`.
#[derive(Clone, Debug, Encodable, Decodable)]
pub enum AttrTokenTree {
    Token(Token, Spacing),
    Delimited(DelimSpan, DelimSpacing, Delimiter, AttrTokenStream),
    /// Stores the attributes for an attribute target,
    /// along with the tokens for that attribute target.
    /// See `AttrsTarget` for more information.
    AttrsTarget(AttrsTarget),
}

impl AttrTokenStream {
    pub fn new(tokens: Vec<AttrTokenTree>) -> AttrTokenStream {
        AttrTokenStream(Lrc::new(tokens))
    }

    /// Converts this `AttrTokenStream` to a plain `Vec<TokenTree>`. During
    /// conversion, any `AttrTokenTree::AttrsTarget` gets "flattened" back to a
    /// `TokenStream`, as described in the comment on
    /// `attrs_and_tokens_to_token_trees`.
    pub fn to_token_trees(&self) -> Vec<TokenTree> {
        let mut res = Vec::with_capacity(self.0.len());
        for tree in self.0.iter() {
            match tree {
                AttrTokenTree::Token(inner, spacing) => {
                    res.push(TokenTree::Token(inner.clone(), *spacing));
                }
                AttrTokenTree::Delimited(span, spacing, delim, stream) => {
                    res.push(TokenTree::Delimited(
                        *span,
                        *spacing,
                        *delim,
                        TokenStream::new(stream.to_token_trees()),
                    ))
                }
                AttrTokenTree::AttrsTarget(target) => {
                    attrs_and_tokens_to_token_trees(&target.attrs, &target.tokens, &mut res);
                }
            }
        }
        res
    }
}
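
// A small sketch of the flattening (`tok` is an assumed `Token` value):
//
//     let ats = AttrTokenStream::new(vec![AttrTokenTree::Token(tok, Spacing::Alone)]);
//     let tts: Vec<TokenTree> = ats.to_token_trees();
//     // `Token` and `Delimited` map one-to-one; `AttrsTarget` expands into
//     // the trees for its attributes plus its target.
//     assert_eq!(tts.len(), 1);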

// Converts multiple attributes and the tokens for a target AST node into token trees, and appends
// them to `res`.
//
// Example: if the AST node is "fn f() { blah(); }", then:
// - Simple if no attributes are present, e.g. "fn f() { blah(); }"
// - Simple if only outer attributes are present, e.g. "#[outer1] #[outer2] fn f() { blah(); }"
// - Trickier if inner attributes are present, because they must be moved within the AST node's
// tokens, e.g. "#[outer] fn f() { #![inner] blah() }"
fn attrs_and_tokens_to_token_trees(
    attrs: &[Attribute],
    target_tokens: &LazyAttrTokenStream,
    res: &mut Vec<TokenTree>,
) {
    let idx = attrs.partition_point(|attr| matches!(attr.style, crate::AttrStyle::Outer));
    let (outer_attrs, inner_attrs) = attrs.split_at(idx);

    // Add outer attribute tokens.
    for attr in outer_attrs {
        res.extend(attr.token_trees());
    }

    // Add target AST node tokens.
    res.extend(target_tokens.to_attr_token_stream().to_token_trees());

    // Insert inner attribute tokens.
    if !inner_attrs.is_empty() {
        let mut found = false;
        // Check the last two trees (to account for a trailing semi).
        for tree in res.iter_mut().rev().take(2) {
            if let TokenTree::Delimited(span, spacing, delim, delim_tokens) = tree {
                // Inner attributes are only supported on extern blocks, functions,
                // impls, and modules. All of these have their inner attributes
                // placed at the beginning of the rightmost outermost braced group:
                // e.g. fn foo() { #![my_attr] }
                //
                // Therefore, we can insert them back into the right location
                // without needing to do any extra position tracking.
                //
                // Note: Outline modules are an exception - they can
                // have attributes like `#![my_attr]` at the start of a file.
                // Support for custom attributes in this position is not
                // properly implemented - we always synthesize fake tokens,
                // so we never reach this code.
                let mut tts = vec![];
                for inner_attr in inner_attrs {
                    tts.extend(inner_attr.token_trees());
                }
                tts.extend(delim_tokens.0.iter().cloned());
                let stream = TokenStream::new(tts);
                *tree = TokenTree::Delimited(*span, *spacing, *delim, stream);
                found = true;
                break;
            }
        }
        assert!(found, "Failed to find trailing delimited group in: {res:?}");
    }
}
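
// For example (schematic): for `#[outer] fn f() { #![inner] blah() }`, the
// function above first emits the `#[outer]` tokens, then the target's tokens,
// and finally splices the `#![inner]` tokens to the front of the trailing
// `{ ... }` delimited group, reconstructing:
//
//     #[outer] fn f() { #![inner] blah() }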
/// Stores the tokens for an attribute target, along
/// with its attributes.
///
/// This is constructed during parsing when we need to capture
/// tokens, for `cfg` and `cfg_attr` attributes.
Implement token-based handling of attributes during expansion This PR modifies the macro expansion infrastructure to handle attributes in a fully token-based manner. As a result: * Derives macros no longer lose spans when their input is modified by eager cfg-expansion. This is accomplished by performing eager cfg-expansion on the token stream that we pass to the derive proc-macro * Inner attributes now preserve spans in all cases, including when we have multiple inner attributes in a row. This is accomplished through the following changes: * New structs `AttrAnnotatedTokenStream` and `AttrAnnotatedTokenTree` are introduced. These are very similar to a normal `TokenTree`, but they also track the position of attributes and attribute targets within the stream. They are built when we collect tokens during parsing. An `AttrAnnotatedTokenStream` is converted to a regular `TokenStream` when we invoke a macro. * Token capturing and `LazyTokenStream` are modified to work with `AttrAnnotatedTokenStream`. A new `ReplaceRange` type is introduced, which is created during the parsing of a nested AST node to make the 'outer' AST node aware of the attributes and attribute target stored deeper in the token stream. * When we need to perform eager cfg-expansion (either due to `#[derive]` or `#[cfg_eval]`), we tokenize and reparse our target, capturing additional information about the locations of `#[cfg]` and `#[cfg_attr]` attributes at any depth within the target. This is a performance optimization, allowing us to perform less work in the typical case where captured tokens never have eager cfg-expansion run.
2020-11-28 23:33:17 +00:00
///
/// For example, `#[cfg(FALSE)] struct Foo {}` would
2021-04-19 12:57:08 +00:00
/// have an `attrs` field containing the `#[cfg(FALSE)]` attr,
/// and a `tokens` field storing the (unparsed) tokens `struct Foo {}`
///
/// The `cfg`/`cfg_attr` processing occurs in
/// `StripUnconfigured::configure_tokens`.
Implement token-based handling of attributes during expansion This PR modifies the macro expansion infrastructure to handle attributes in a fully token-based manner. As a result: * Derives macros no longer lose spans when their input is modified by eager cfg-expansion. This is accomplished by performing eager cfg-expansion on the token stream that we pass to the derive proc-macro * Inner attributes now preserve spans in all cases, including when we have multiple inner attributes in a row. This is accomplished through the following changes: * New structs `AttrAnnotatedTokenStream` and `AttrAnnotatedTokenTree` are introduced. These are very similar to a normal `TokenTree`, but they also track the position of attributes and attribute targets within the stream. They are built when we collect tokens during parsing. An `AttrAnnotatedTokenStream` is converted to a regular `TokenStream` when we invoke a macro. * Token capturing and `LazyTokenStream` are modified to work with `AttrAnnotatedTokenStream`. A new `ReplaceRange` type is introduced, which is created during the parsing of a nested AST node to make the 'outer' AST node aware of the attributes and attribute target stored deeper in the token stream. * When we need to perform eager cfg-expansion (either due to `#[derive]` or `#[cfg_eval]`), we tokenize and reparse our target, capturing additional information about the locations of `#[cfg]` and `#[cfg_attr]` attributes at any depth within the target. This is a performance optimization, allowing us to perform less work in the typical case where captured tokens never have eager cfg-expansion run.
2020-11-28 23:33:17 +00:00
#[derive(Clone, Debug, Encodable, Decodable)]
pub struct AttrsTarget {
/// Attributes, both outer and inner.
/// These are stored in the original order that they were parsed in.
pub attrs: AttrVec,
/// The underlying tokens for the attribute target that `attrs`
/// are applied to.
pub tokens: LazyAttrTokenStream,
}
/// A `TokenStream` is an abstract sequence of tokens, organized into [`TokenTree`]s.
///
/// The goal is for procedural macros to work with `TokenStream`s and `TokenTree`s
/// instead of a representation of the abstract syntax tree.
/// Today's `TokenTree`s can still contain AST via `token::Interpolated` for
/// backwards compatibility.
#[derive(Clone, Debug, Default, Encodable, Decodable)]
pub struct TokenStream(pub(crate) Lrc<Vec<TokenTree>>);
/// Indicates whether a token can join with the following token to form a
/// compound token. Used for conversions to `proc_macro::Spacing`. Also used to
/// guide pretty-printing, which is where the `JointHidden` value (which isn't
/// part of `proc_macro::Spacing`) comes in useful.
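///
/// As an illustrative sketch (the authoritative rules are on the variants
/// below), parsing `a+=b` from source code is expected to assign:
///
/// ```text
/// a  Joint        (immediately followed by the punct `+`)
/// +  Joint        (immediately followed by the punct `=`)
/// =  JointHidden  (immediately followed by the non-punct ident `b`)
/// b  Alone        (last token in the stream)
/// ```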
#[derive(Clone, Copy, Debug, PartialEq, Encodable, Decodable, HashStable_Generic)]
pub enum Spacing {
/// The token cannot join with the following token to form a compound
/// token.
///
/// In token streams parsed from source code, the compiler will use `Alone`
/// for any token immediately followed by whitespace, a non-doc comment, or
/// EOF.
///
/// When constructing token streams within the compiler, use this for each
/// token that (a) should be pretty-printed with a space after it, or (b)
/// is the last token in the stream. (In the latter case the choice of
/// spacing doesn't matter because it is never used for the last token. We
/// arbitrarily use `Alone`.)
///
/// Converts to `proc_macro::Spacing::Alone`, and
/// `proc_macro::Spacing::Alone` converts back to this.
Alone,
/// The token can join with the following token to form a compound token.
///
/// In token streams parsed from source code, the compiler will use `Joint`
/// for any token immediately followed by punctuation (as determined by
/// `Token::is_punct`).
///
/// When constructing token streams within the compiler, use this for each
/// token that (a) should be pretty-printed without a space after it, and
/// (b) is followed by a punctuation token.
///
/// Converts to `proc_macro::Spacing::Joint`, and
/// `proc_macro::Spacing::Joint` converts back to this.
Joint,
/// The token can join with the following token to form a compound token,
/// but this will not be visible at the proc macro level. (This is what the
/// `Hidden` means; see below.)
///
/// In token streams parsed from source code, the compiler will use
/// `JointHidden` for any token immediately followed by anything not
/// covered by the `Alone` and `Joint` cases: an identifier, lifetime,
/// literal, delimiter, doc comment.
///
/// When constructing token streams, use this for each token that (a)
/// should be pretty-printed without a space after it, and (b) is followed
/// by a non-punctuation token.
///
/// Converts to `proc_macro::Spacing::Alone`, but
/// `proc_macro::Spacing::Alone` converts back to `token::Spacing::Alone`.
/// Because of that, pretty-printing of `TokenStream`s produced by proc
/// macros is unavoidably uglier (with more whitespace between tokens) than
/// pretty-printing of `TokenStream`s produced by other means (i.e. parsed
/// source code, internally constructed token streams, and token streams
/// produced by declarative macros).
JointHidden,
}
impl TokenStream {
/// Given a `TokenStream` whose stream contains only two arguments, return a
/// new `TokenStream` that separates the two arguments with a comma, for use
/// in diagnostic suggestions.
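///
/// For example (sketch): given the argument tokens of `foo!(a b)`, this
/// returns the tokens `a, b` together with the span of the inserted comma,
/// or `None` if no suitable insertion point is found.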
pub fn add_comma(&self) -> Option<(TokenStream, Span)> {
// Used to suggest if a user writes `foo!(a b);`
let mut suggestion = None;
let mut iter = self.0.iter().enumerate().peekable();
while let Some((pos, ts)) = iter.next() {
if let Some((_, next)) = iter.peek() {
let sp = match (&ts, &next) {
(_, TokenTree::Token(Token { kind: token::Comma, .. }, _)) => continue,
(
TokenTree::Token(token_left, Spacing::Alone),
TokenTree::Token(token_right, _),
) if ((token_left.is_ident() && !token_left.is_reserved_ident())
|| token_left.is_lit())
&& ((token_right.is_ident() && !token_right.is_reserved_ident())
|| token_right.is_lit()) =>
{
token_left.span
}
(TokenTree::Delimited(sp, ..), _) => sp.entire(),
_ => continue,
};
let sp = sp.shrink_to_hi();
let comma = TokenTree::token_alone(token::Comma, sp);
suggestion = Some((pos, comma, sp));
}
}
if let Some((pos, comma, sp)) = suggestion {
let mut new_stream = Vec::with_capacity(self.0.len() + 1);
let parts = self.0.split_at(pos + 1);
new_stream.extend_from_slice(parts.0);
new_stream.push(comma);
new_stream.extend_from_slice(parts.1);
return Some((TokenStream::new(new_stream), sp));
}
None
}
}
impl FromIterator<TokenTree> for TokenStream {
fn from_iter<I: IntoIterator<Item = TokenTree>>(iter: I) -> Self {
TokenStream::new(iter.into_iter().collect::<Vec<TokenTree>>())
}
}
impl Eq for TokenStream {}
impl PartialEq<TokenStream> for TokenStream {
fn eq(&self, other: &TokenStream) -> bool {
self.trees().eq(other.trees())
}
}
impl TokenStream {
pub fn new(tts: Vec<TokenTree>) -> TokenStream {
TokenStream(Lrc::new(tts))
}
pub fn is_empty(&self) -> bool {
self.0.is_empty()
}
pub fn len(&self) -> usize {
self.0.len()
}
pub fn trees(&self) -> RefTokenTreeCursor<'_> {
RefTokenTreeCursor::new(self)
}
pub fn into_trees(self) -> TokenTreeCursor {
TokenTreeCursor::new(self)
}
/// Compares two `TokenStream`s, checking equality without regarding span information.
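/// For example (sketch): the streams for `foo` parsed from two different
/// source locations compare equal here, even though their spans differ.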
pub fn eq_unspanned(&self, other: &TokenStream) -> bool {
let mut t1 = self.trees();
let mut t2 = other.trees();
for (t1, t2) in iter::zip(&mut t1, &mut t2) {
if !t1.eq_unspanned(t2) {
return false;
}
}
t1.next().is_none() && t2.next().is_none()
}
/// Create a token stream containing a single token with alone spacing. The
/// spacing used for the final token in a constructed stream doesn't matter
/// because it's never used. In practice we arbitrarily use
/// `Spacing::Alone`.
pub fn token_alone(kind: TokenKind, span: Span) -> TokenStream {
TokenStream::new(vec![TokenTree::token_alone(kind, span)])
}
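/// Reconstructs the `TokenStream` for an AST node from its captured
/// attributes and tokens.
///
/// Panics if the node has no captured tokens.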
pub fn from_ast(node: &(impl HasAttrs + HasTokens + fmt::Debug)) -> TokenStream {
let tokens = node.tokens().unwrap_or_else(|| panic!("missing tokens for node: {:?}", node));
let mut tts = vec![];
attrs_and_tokens_to_token_trees(node.attrs(), tokens, &mut tts);
TokenStream::new(tts)
}
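/// Converts an interpolated AST fragment (`Nonterminal`) back into a
/// `TokenStream`, using the tokens captured when the fragment was parsed.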
pub fn from_nonterminal_ast(nt: &Nonterminal) -> TokenStream {
match nt {
Nonterminal::NtItem(item) => TokenStream::from_ast(item),
Nonterminal::NtBlock(block) => TokenStream::from_ast(block),
Nonterminal::NtStmt(stmt) if let StmtKind::Empty = stmt.kind => {
// FIXME: Properly collect tokens for empty statements.
TokenStream::token_alone(token::Semi, stmt.span)
}
Nonterminal::NtStmt(stmt) => TokenStream::from_ast(stmt),
Nonterminal::NtPat(pat) => TokenStream::from_ast(pat),
Nonterminal::NtTy(ty) => TokenStream::from_ast(ty),
Nonterminal::NtMeta(attr) => TokenStream::from_ast(attr),
Nonterminal::NtPath(path) => TokenStream::from_ast(path),
Nonterminal::NtVis(vis) => TokenStream::from_ast(vis),
Nonterminal::NtExpr(expr) | Nonterminal::NtLiteral(expr) => TokenStream::from_ast(expr),
}
}
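// Flattens a single token: `NtIdent` becomes a plain identifier token,
// while `NtLifetime` and `Interpolated` are replaced by their underlying
// tokens wrapped in invisible delimiters (recursively flattened in the
// `Interpolated` case).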
fn flatten_token(token: &Token, spacing: Spacing) -> TokenTree {
match token.kind {
token::NtIdent(ident, is_raw) => {
TokenTree::Token(Token::new(token::Ident(ident.name, is_raw), ident.span), spacing)
}
token::NtLifetime(ident, is_raw) => TokenTree::Delimited(
DelimSpan::from_single(token.span),
DelimSpacing::new(Spacing::JointHidden, spacing),
Delimiter::Invisible,
TokenStream::token_alone(token::Lifetime(ident.name, is_raw), ident.span),
),
token::Interpolated(ref nt) => TokenTree::Delimited(
DelimSpan::from_single(token.span),
DelimSpacing::new(Spacing::JointHidden, spacing),
Delimiter::Invisible,
TokenStream::from_nonterminal_ast(&nt).flattened(),
),
_ => TokenTree::Token(token.clone(), spacing),
}
}
fn flatten_token_tree(tree: &TokenTree) -> TokenTree {
match tree {
TokenTree::Token(token, spacing) => TokenStream::flatten_token(token, *spacing),
TokenTree::Delimited(span, spacing, delim, tts) => {
TokenTree::Delimited(*span, *spacing, *delim, tts.flattened())
}
}
}
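/// Returns a copy of `self` with all `NtIdent`, `NtLifetime`, and
/// `Interpolated` tokens replaced by their flattened forms. Cheap when no
/// such tokens are present: the underlying storage is reused via `clone`.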
#[must_use]
pub fn flattened(&self) -> TokenStream {
fn can_skip(stream: &TokenStream) -> bool {
stream.trees().all(|tree| match tree {
TokenTree::Token(token, _) => !matches!(
token.kind,
token::NtIdent(..) | token::NtLifetime(..) | token::Interpolated(..)
),
TokenTree::Delimited(.., inner) => can_skip(inner),
})
}
if can_skip(self) {
return self.clone();
}
self.trees().map(|tree| TokenStream::flatten_token_tree(tree)).collect()
}
// If `vec` is not empty, try to glue `tt` onto its last token. The return
// value indicates if gluing took place.
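// For example (sketch): if `vec` ends with a `+` token marked `Joint` and
// `tt` is an `=` token, `Token::glue` combines them into a single `+=`
// token, which overwrites the `+` at the end of `vec`.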
fn try_glue_to_last(vec: &mut Vec<TokenTree>, tt: &TokenTree) -> bool {
if let Some(TokenTree::Token(last_tok, Spacing::Joint | Spacing::JointHidden)) = vec.last()
&& let TokenTree::Token(tok, spacing) = tt
&& let Some(glued_tok) = last_tok.glue(tok)
{
// ...then overwrite the last token tree in `vec` with the
// glued token, and skip the first token tree from `stream`.
*vec.last_mut().unwrap() = TokenTree::Token(glued_tok, *spacing);
true
} else {
false
}
}
/// Push `tt` onto the end of the stream, possibly gluing it to the last
/// token. Uses `make_mut` to maximize efficiency.
pub fn push_tree(&mut self, tt: TokenTree) {
let vec_mut = Lrc::make_mut(&mut self.0);
if Self::try_glue_to_last(vec_mut, &tt) {
// nothing else to do
} else {
vec_mut.push(tt);
}
}
/// Push `stream` onto the end of the stream, possibly gluing the first
/// token tree to the last token. (No other token trees will be glued.)
/// Uses `make_mut` to maximize efficiency.
pub fn push_stream(&mut self, stream: TokenStream) {
let vec_mut = Lrc::make_mut(&mut self.0);
let stream_iter = stream.0.iter().cloned();
if let Some(first) = stream.0.first()
&& Self::try_glue_to_last(vec_mut, first)
{
// Now skip the first token tree from `stream`.
vec_mut.extend(stream_iter.skip(1));
} else {
// Append all of `stream`.
vec_mut.extend(stream_iter);
}
}
pub fn chunks(&self, chunk_size: usize) -> core::slice::Chunks<'_, TokenTree> {
self.0.chunks(chunk_size)
}
/// Desugar doc comments like `/// foo` in the stream into `#[doc =
/// r"foo"]`. Modifies the `TokenStream` via `Lrc::make_mut`, but as little
/// as possible.
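///
/// For example (sketch): the outer doc comment `/// foo` desugars to the
/// tokens of `#[doc = r"foo"]`, and the inner form `//! foo` desugars to
/// the tokens of `#![doc = r"foo"]`.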
pub fn desugar_doc_comments(&mut self) {
if let Some(desugared_stream) = desugar_inner(self.clone()) {
*self = desugared_stream;
}
// The return value is `None` if nothing in `stream` changed.
fn desugar_inner(mut stream: TokenStream) -> Option<TokenStream> {
let mut i = 0;
let mut modified = false;
while let Some(tt) = stream.0.get(i) {
match tt {
&TokenTree::Token(
Token { kind: token::DocComment(_, attr_style, data), span },
_spacing,
) => {
let desugared = desugared_tts(attr_style, data, span);
let desugared_len = desugared.len();
Lrc::make_mut(&mut stream.0).splice(i..i + 1, desugared);
modified = true;
i += desugared_len;
}
&TokenTree::Token(..) => i += 1,
&TokenTree::Delimited(sp, spacing, delim, ref delim_stream) => {
if let Some(desugared_delim_stream) = desugar_inner(delim_stream.clone()) {
let new_tt =
TokenTree::Delimited(sp, spacing, delim, desugared_delim_stream);
Lrc::make_mut(&mut stream.0)[i] = new_tt;
modified = true;
}
i += 1;
}
}
}
if modified { Some(stream) } else { None }
}
fn desugared_tts(attr_style: AttrStyle, data: Symbol, span: Span) -> Vec<TokenTree> {
// Searches for the occurrences of `"#*` and returns the minimum number of `#`s
// required to wrap the text. E.g.
// - `abc d` is wrapped as `r"abc d"` (num_of_hashes = 0)
// - `abc "d"` is wrapped as `r#"abc "d""#` (num_of_hashes = 1)
// - `abc "##d##"` is wrapped as `r###"abc ##"d"##"###` (num_of_hashes = 3)
let mut num_of_hashes = 0;
let mut count = 0;
for ch in data.as_str().chars() {
count = match ch {
'"' => 1,
'#' if count > 0 => count + 1,
_ => 0,
};
num_of_hashes = cmp::max(num_of_hashes, count);
}
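// For example (sketch), scanning `abc "##d##"`: `count` becomes 1 at the
// first `"`, rises to 3 across the following `##`, and resets to 0 at `d`;
// the later `##` don't count (no preceding `"`) and the final `"` yields 1,
// so the maximum seen is 3 and num_of_hashes = 3.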
// `/// foo` becomes `[doc = r"foo"]`.
let delim_span = DelimSpan::from_single(span);
let body = TokenTree::Delimited(
delim_span,
DelimSpacing::new(Spacing::JointHidden, Spacing::Alone),
Delimiter::Bracket,
[
TokenTree::token_alone(token::Ident(sym::doc, token::IdentIsRaw::No), span),
TokenTree::token_alone(token::Eq, span),
TokenTree::token_alone(
TokenKind::lit(token::StrRaw(num_of_hashes), data, None),
span,
),
]
.into_iter()
.collect::<TokenStream>(),
);
if attr_style == AttrStyle::Inner {
vec![
TokenTree::token_joint(token::Pound, span),
TokenTree::token_joint_hidden(token::Not, span),
body,
]
} else {
vec![TokenTree::token_joint_hidden(token::Pound, span), body]
}
}
}
}
/// By-reference iterator over a [`TokenStream`] that produces `&TokenTree`
/// items.
#[derive(Clone)]
pub struct RefTokenTreeCursor<'t> {
stream: &'t TokenStream,
index: usize,
}
impl<'t> RefTokenTreeCursor<'t> {
fn new(stream: &'t TokenStream) -> Self {
RefTokenTreeCursor { stream, index: 0 }
}
pub fn look_ahead(&self, n: usize) -> Option<&TokenTree> {
self.stream.0.get(self.index + n)
}
}
impl<'t> Iterator for RefTokenTreeCursor<'t> {
type Item = &'t TokenTree;
fn next(&mut self) -> Option<&'t TokenTree> {
self.stream.0.get(self.index).map(|tree| {
self.index += 1;
tree
})
}
}
/// Owning, by-value iterator over a [`TokenStream`] that nonetheless produces
/// `&TokenTree` items.
///
/// Doesn't impl `Iterator` because Rust doesn't permit an owning iterator to
/// return `&T` from `next`; the need for an explicit lifetime in the `Item`
/// associated type gets in the way. Instead, use `next_ref` (which doesn't
/// involve associated types) for getting individual elements, or
/// `RefTokenTreeCursor` if you really want an `Iterator`, e.g. in a `for`
/// loop.
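///
/// A typical use (illustrative sketch):
///
/// ```ignore (illustrative)
/// let mut cursor = stream.into_trees();
/// while let Some(tree) = cursor.next_ref() {
///     // inspect `tree` here
/// }
/// ```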
#[derive(Clone, Debug)]
pub struct TokenTreeCursor {
pub stream: TokenStream,
index: usize,
}
impl TokenTreeCursor {
fn new(stream: TokenStream) -> Self {
TokenTreeCursor { stream, index: 0 }
}
#[inline]
pub fn next_ref(&mut self) -> Option<&TokenTree> {
self.stream.0.get(self.index).map(|tree| {
self.index += 1;
tree
})
}
pub fn look_ahead(&self, n: usize) -> Option<&TokenTree> {
self.stream.0.get(self.index + n)
}
}
#[derive(Debug, Copy, Clone, PartialEq, Encodable, Decodable, HashStable_Generic)]
pub struct DelimSpan {
pub open: Span,
pub close: Span,
}
impl DelimSpan {
pub fn from_single(sp: Span) -> Self {
DelimSpan { open: sp, close: sp }
}
pub fn from_pair(open: Span, close: Span) -> Self {
DelimSpan { open, close }
}
pub fn dummy() -> Self {
Self::from_single(DUMMY_SP)
}
pub fn entire(self) -> Span {
self.open.with_hi(self.close.hi())
}
}
#[derive(Copy, Clone, Debug, PartialEq, Encodable, Decodable, HashStable_Generic)]
pub struct DelimSpacing {
pub open: Spacing,
pub close: Spacing,
}
impl DelimSpacing {
pub fn new(open: Spacing, close: Spacing) -> DelimSpacing {
DelimSpacing { open, close }
}
}
// Some types are used a lot. Make sure they don't unintentionally get bigger.
#[cfg(target_pointer_width = "64")]
mod size_asserts {
use rustc_data_structures::static_assert_size;
use super::*;
// tidy-alphabetical-start
static_assert_size!(AttrTokenStream, 8);
static_assert_size!(AttrTokenTree, 32);
static_assert_size!(LazyAttrTokenStream, 8);
static_assert_size!(Option<LazyAttrTokenStream>, 8); // must be small, used in many AST nodes
static_assert_size!(TokenStream, 8);
static_assert_size!(TokenTree, 32);
// tidy-alphabetical-end
}