Auto merge of #106505 - Nilstrieb:format-args-string-literal-episode-2, r=petrochenkov

Properly allow macro expanded `format_args` invocations to use captures

Originally, this was kinda half-allowed. There were some primitive checks in place that looked at the span to see whether the input was likely a literal. These "source literal" checks are needed because the spans created during `format_args` parsing only make sense when it is indeed a literal that was written in the source code directly.
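
To illustrate why spans into the format string only make sense for a source literal, here is a minimal sketch. It is not the compiler's code: `inner_to_source` and `pointed_at` are hypothetical helpers, and the sketch assumes a plain string literal without escapes, so an offset inside the string maps to the source by skipping the opening quote.

```rust
use std::ops::Range;

/// Hypothetical helper: maps a byte range inside the format string to a byte
/// range in the source, assuming a plain, escape-free literal whose opening
/// quote sits at `lit_start`.
fn inner_to_source(lit_start: usize, inner: Range<usize>) -> Range<usize> {
    // +1 skips the opening `"`.
    (lit_start + 1 + inner.start)..(lit_start + 1 + inner.end)
}

/// Finds `{x}` in `fmt` and returns the source text the mapped span points at.
fn pointed_at<'a>(source: &'a str, fmt: &str) -> &'a str {
    let lit_start = source.find('"').unwrap();
    let start = fmt.find("{x}").unwrap();
    &source[inner_to_source(lit_start, start..start + 3)]
}

fn main() {
    let source = r#"format!("  value: {x}")"#;
    // Source literal: the parsed string is exactly what was written.
    assert_eq!(pointed_at(source, "  value: {x}"), "{x}");
    // A macro stripped the leading whitespace but kept the original span:
    // the same arithmetic now points two bytes too early.
    assert_eq!(pointed_at(source, "value: {x}"), ": {");
}
```

The second assertion shows the failure mode: once the string being parsed differs from the text under the span, inner offsets point at the wrong source text.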

This is orthogonal to the restriction that the first argument must be a "direct literal", i.e. not expanded from a macro. This restriction was imposed by [RFC 2795] on the basis that capturing from macro-expanded strings would be too confusing. But it was only concerned with the argument of the invocation being a literal, not with whether it was a source literal (maybe in spirit it meant a source literal; this is not clear to me).
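
A small sketch of the "direct literal" restriction as seen from user code. The function names are made up for illustration, and the rejected case is left as a comment so the example compiles.

```rust
/// Direct literal: the implicit capture of `a` is allowed.
fn direct() -> String {
    let a = 0;
    format!("{a}")
}

/// A macro-expanded first argument still works with explicit arguments.
fn via_concat() -> String {
    let a = 0;
    format!(concat!("{}", "!"), a)
}

fn main() {
    assert_eq!(direct(), "0");
    assert_eq!(via_concat(), "0!");

    // Implicit captures through a macro-expanded format string are rejected.
    // Uncommenting the next lines fails with "there is no argument named `a`".
    // let a = 0;
    // let _ = format!(concat!("{a}"));
}
```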

Since the original check only really cared about source literals (which is good enough to deny the `format_args!(concat!())` example), macros expanding to `format_args` invocations were able to use implicit captures if they spanned the string in a way that led back to a source string.

The "source literal" checks were not strict enough and caused ICEs in certain cases (see #106191). So I tightened them up in #106195 to really only work if it's a direct source literal.

This caused the `indoc` crate to break. `indoc` transformed the source literal by removing whitespace, which made it not a "source literal" anymore (which is required to fix the ICE). But since `indoc` spanned the literal in ways that made the old check think that it's a literal, it was able to use implicit captures (which is useful and nice for the users of `indoc`).
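
The idea behind the stricter "source literal" check can be sketched as follows. This is a simplified stand-in, not the compiler's implementation: `is_source_literal` and the toy `unescape` (which only understands `\n`, `\\` and `\"`) are hypothetical, and raw strings are ignored.

```rust
/// Hypothetical check: `snippet` is the source text under the span (quotes
/// included), `input` is the string that is actually being parsed.
fn is_source_literal(input: &str, snippet: &str) -> bool {
    // Strip the surrounding quotes; anything else is not a plain literal.
    let Some(inner) = snippet.strip_prefix('"').and_then(|s| s.strip_suffix('"')) else {
        return false;
    };
    let Some(unescaped) = unescape(inner) else {
        return false;
    };
    // `println!` appends a newline, so trailing newlines are ignored on both sides.
    unescaped.trim_end_matches('\n') == input.trim_end_matches('\n')
}

/// Toy unescaper that only knows `\n`, `\\` and `\"`.
fn unescape(s: &str) -> Option<String> {
    let mut out = String::new();
    let mut chars = s.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next()? {
            'n' => out.push('\n'),
            '\\' => out.push('\\'),
            '"' => out.push('"'),
            _ => return None,
        }
    }
    Some(out)
}

fn main() {
    // Written directly in the source.
    assert!(is_source_literal("{a}", r#""{a}""#));
    // A `println!`-style appended newline is tolerated.
    assert!(is_source_literal("{a}\n", r#""{a}""#));
    // Escapes in the source are unescaped before comparing.
    assert!(is_source_literal("a\nb", r#""a\nb""#));
    // indoc-style transformation: whitespace removed, original span kept.
    assert!(!is_source_literal("{a}", r#""    {a}""#));
}
```

The last case corresponds to what `indoc` does: the span still points at the user's literal, but the string handed to `format_args` no longer matches it.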

This commit properly separates the previously introduced concepts of "source literal" and "direct literal" and therefore allows `indoc` invocations, which don't create "source literals", to use implicit captures again.
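
As an illustration of a macro-expanded invocation whose format string is still a direct literal, the following sketch mirrors the `format_mbe` run-pass test added in this commit. The wrapper function `captured` is added here only to make the result easy to check.

```rust
/// Forwards its token to `format!`, so the format string is still a direct
/// literal even though the invocation comes from a macro expansion.
macro_rules! format_mbe {
    ($tt:tt) => {{
        // Hygiene: this `a` is not the one named by the caller's literal.
        #[allow(unused_variables)]
        let a = 123;
        format!($tt)
    }};
}

fn captured() -> String {
    let a = 0;
    format_mbe!("{a}")
}

fn main() {
    // The capture resolves to the caller's `a`, not the macro's.
    assert_eq!(captured(), "0");
}
```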

Fixes #106191

[RFC 2795]: https://rust-lang.github.io/rfcs/2795-format-args-implicit-identifiers.html#macro-hygiene
bors 2023-03-14 14:25:02 +00:00
commit 2e7034ebf7
11 changed files with 233 additions and 63 deletions

View File

@ -36,6 +36,21 @@ enum PositionUsedAs {
} }
use PositionUsedAs::*; use PositionUsedAs::*;
struct MacroInput {
fmtstr: P<Expr>,
args: FormatArguments,
/// Whether the first argument was a string literal or a result from eager macro expansion.
/// If it's not a string literal, we disallow implicit argument capturing.
///
/// This does not correspond to whether we can treat spans to the literal normally, as the whole
/// invocation might be the result of another macro expansion, in which case this flag may still be true.
///
/// See [RFC 2795] for more information.
///
/// [RFC 2795]: https://rust-lang.github.io/rfcs/2795-format-args-implicit-identifiers.html#macro-hygiene
is_direct_literal: bool,
}
/// Parses the arguments from the given list of tokens, returning the diagnostic /// Parses the arguments from the given list of tokens, returning the diagnostic
/// if there's a parse error so we can continue parsing other format! /// if there's a parse error so we can continue parsing other format!
/// expressions. /// expressions.
@ -45,11 +60,7 @@ use PositionUsedAs::*;
/// ```text /// ```text
/// Ok((fmtstr, parsed arguments)) /// Ok((fmtstr, parsed arguments))
/// ``` /// ```
fn parse_args<'a>( fn parse_args<'a>(ecx: &mut ExtCtxt<'a>, sp: Span, tts: TokenStream) -> PResult<'a, MacroInput> {
ecx: &mut ExtCtxt<'a>,
sp: Span,
tts: TokenStream,
) -> PResult<'a, (P<Expr>, FormatArguments)> {
let mut args = FormatArguments::new(); let mut args = FormatArguments::new();
let mut p = ecx.new_parser_from_tts(tts); let mut p = ecx.new_parser_from_tts(tts);
@ -59,25 +70,21 @@ fn parse_args<'a>(
} }
let first_token = &p.token; let first_token = &p.token;
let fmtstr = match first_token.kind {
token::TokenKind::Literal(token::Lit { let fmtstr = if let token::Literal(lit) = first_token.kind && matches!(lit.kind, token::Str | token::StrRaw(_)) {
kind: token::LitKind::Str | token::LitKind::StrRaw(_), // This allows us to properly handle cases when the first comma
.. // after the format string is mistakenly replaced with any operator,
}) => { // which cause the expression parser to eat too much tokens.
// If the first token is a string literal, then a format expression p.parse_literal_maybe_minus()?
// is constructed from it. } else {
// // Otherwise, we fall back to the expression parser.
// This allows us to properly handle cases when the first comma p.parse_expr()?
// after the format string is mistakenly replaced with any operator,
// which cause the expression parser to eat too much tokens.
p.parse_literal_maybe_minus()?
}
_ => {
// Otherwise, we fall back to the expression parser.
p.parse_expr()?
}
}; };
// Only allow implicit captures to be used when the argument is a direct literal
// instead of a macro expanding to one.
let is_direct_literal = matches!(fmtstr.kind, ExprKind::Lit(_));
let mut first = true; let mut first = true;
while p.token != token::Eof { while p.token != token::Eof {
@ -147,17 +154,19 @@ fn parse_args<'a>(
} }
} }
} }
Ok((fmtstr, args)) Ok(MacroInput { fmtstr, args, is_direct_literal })
} }
pub fn make_format_args( fn make_format_args(
ecx: &mut ExtCtxt<'_>, ecx: &mut ExtCtxt<'_>,
efmt: P<Expr>, input: MacroInput,
mut args: FormatArguments,
append_newline: bool, append_newline: bool,
) -> Result<FormatArgs, ()> { ) -> Result<FormatArgs, ()> {
let msg = "format argument must be a string literal"; let msg = "format argument must be a string literal";
let unexpanded_fmt_span = efmt.span; let unexpanded_fmt_span = input.fmtstr.span;
let MacroInput { fmtstr: efmt, mut args, is_direct_literal } = input;
let (fmt_str, fmt_style, fmt_span) = match expr_to_spanned_string(ecx, efmt, msg) { let (fmt_str, fmt_style, fmt_span) = match expr_to_spanned_string(ecx, efmt, msg) {
Ok(mut fmt) if append_newline => { Ok(mut fmt) if append_newline => {
fmt.0 = Symbol::intern(&format!("{}\n", fmt.0)); fmt.0 = Symbol::intern(&format!("{}\n", fmt.0));
@ -208,11 +217,11 @@ pub fn make_format_args(
} }
} }
let is_literal = parser.is_literal; let is_source_literal = parser.is_source_literal;
if !parser.errors.is_empty() { if !parser.errors.is_empty() {
let err = parser.errors.remove(0); let err = parser.errors.remove(0);
let sp = if is_literal { let sp = if is_source_literal {
fmt_span.from_inner(InnerSpan::new(err.span.start, err.span.end)) fmt_span.from_inner(InnerSpan::new(err.span.start, err.span.end))
} else { } else {
// The format string could be another macro invocation, e.g.: // The format string could be another macro invocation, e.g.:
@ -230,7 +239,7 @@ pub fn make_format_args(
if let Some(note) = err.note { if let Some(note) = err.note {
e.note(&note); e.note(&note);
} }
if let Some((label, span)) = err.secondary_label && is_literal { if let Some((label, span)) = err.secondary_label && is_source_literal {
e.span_label(fmt_span.from_inner(InnerSpan::new(span.start, span.end)), label); e.span_label(fmt_span.from_inner(InnerSpan::new(span.start, span.end)), label);
} }
if err.should_be_replaced_with_positional_argument { if err.should_be_replaced_with_positional_argument {
@ -256,7 +265,7 @@ pub fn make_format_args(
} }
let to_span = |inner_span: rustc_parse_format::InnerSpan| { let to_span = |inner_span: rustc_parse_format::InnerSpan| {
is_literal.then(|| { is_source_literal.then(|| {
fmt_span.from_inner(InnerSpan { start: inner_span.start, end: inner_span.end }) fmt_span.from_inner(InnerSpan { start: inner_span.start, end: inner_span.end })
}) })
}; };
@ -304,7 +313,7 @@ pub fn make_format_args(
// Name not found in `args`, so we add it as an implicitly captured argument. // Name not found in `args`, so we add it as an implicitly captured argument.
let span = span.unwrap_or(fmt_span); let span = span.unwrap_or(fmt_span);
let ident = Ident::new(name, span); let ident = Ident::new(name, span);
let expr = if is_literal { let expr = if is_direct_literal {
ecx.expr_ident(span, ident) ecx.expr_ident(span, ident)
} else { } else {
// For the moment capturing variables from format strings expanded from macros is // For the moment capturing variables from format strings expanded from macros is
@ -814,7 +823,7 @@ fn report_invalid_references(
// for `println!("{7:7$}", 1);` // for `println!("{7:7$}", 1);`
indexes.sort(); indexes.sort();
indexes.dedup(); indexes.dedup();
let span: MultiSpan = if !parser.is_literal || parser.arg_places.is_empty() { let span: MultiSpan = if !parser.is_source_literal || parser.arg_places.is_empty() {
MultiSpan::from_span(fmt_span) MultiSpan::from_span(fmt_span)
} else { } else {
MultiSpan::from_spans(invalid_refs.iter().filter_map(|&(_, span, _, _)| span).collect()) MultiSpan::from_spans(invalid_refs.iter().filter_map(|&(_, span, _, _)| span).collect())
@ -855,8 +864,8 @@ fn expand_format_args_impl<'cx>(
) -> Box<dyn base::MacResult + 'cx> { ) -> Box<dyn base::MacResult + 'cx> {
sp = ecx.with_def_site_ctxt(sp); sp = ecx.with_def_site_ctxt(sp);
match parse_args(ecx, sp, tts) { match parse_args(ecx, sp, tts) {
Ok((efmt, args)) => { Ok(input) => {
if let Ok(format_args) = make_format_args(ecx, efmt, args, nl) { if let Ok(format_args) = make_format_args(ecx, input, nl) {
MacEager::expr(ecx.expr(sp, ExprKind::FormatArgs(P(format_args)))) MacEager::expr(ecx.expr(sp, ExprKind::FormatArgs(P(format_args))))
} else { } else {
MacEager::expr(DummyResult::raw_expr(sp, true)) MacEager::expr(DummyResult::raw_expr(sp, true))

View File

@ -14,6 +14,7 @@
// We want to be able to build this crate with a stable compiler, so no // We want to be able to build this crate with a stable compiler, so no
// `#![feature]` attributes should be added. // `#![feature]` attributes should be added.
use rustc_lexer::unescape;
pub use Alignment::*; pub use Alignment::*;
pub use Count::*; pub use Count::*;
pub use Piece::*; pub use Piece::*;
@ -234,8 +235,10 @@ pub struct Parser<'a> {
last_opening_brace: Option<InnerSpan>, last_opening_brace: Option<InnerSpan>,
/// Whether the source string comes from `println!` as opposed to `format!` or `print!` /// Whether the source string comes from `println!` as opposed to `format!` or `print!`
append_newline: bool, append_newline: bool,
/// Whether this formatting string is a literal or it comes from a macro. /// Whether this formatting string was written directly in the source. This controls whether we
pub is_literal: bool, /// can use spans to refer into it and give better error messages.
/// N.B: This does _not_ control whether implicit argument captures can be used.
pub is_source_literal: bool,
/// Start position of the current line. /// Start position of the current line.
cur_line_start: usize, cur_line_start: usize,
/// Start and end byte offset of every line of the format string. Excludes /// Start and end byte offset of every line of the format string. Excludes
@ -262,7 +265,7 @@ impl<'a> Iterator for Parser<'a> {
} else { } else {
let arg = self.argument(lbrace_end); let arg = self.argument(lbrace_end);
if let Some(rbrace_pos) = self.must_consume('}') { if let Some(rbrace_pos) = self.must_consume('}') {
if self.is_literal { if self.is_source_literal {
let lbrace_byte_pos = self.to_span_index(pos); let lbrace_byte_pos = self.to_span_index(pos);
let rbrace_byte_pos = self.to_span_index(rbrace_pos); let rbrace_byte_pos = self.to_span_index(rbrace_pos);
@ -302,7 +305,7 @@ impl<'a> Iterator for Parser<'a> {
_ => Some(String(self.string(pos))), _ => Some(String(self.string(pos))),
} }
} else { } else {
if self.is_literal { if self.is_source_literal {
let span = self.span(self.cur_line_start, self.input.len()); let span = self.span(self.cur_line_start, self.input.len());
if self.line_spans.last() != Some(&span) { if self.line_spans.last() != Some(&span) {
self.line_spans.push(span); self.line_spans.push(span);
@ -322,8 +325,8 @@ impl<'a> Parser<'a> {
append_newline: bool, append_newline: bool,
mode: ParseMode, mode: ParseMode,
) -> Parser<'a> { ) -> Parser<'a> {
let input_string_kind = find_width_map_from_snippet(snippet, style); let input_string_kind = find_width_map_from_snippet(s, snippet, style);
let (width_map, is_literal) = match input_string_kind { let (width_map, is_source_literal) = match input_string_kind {
InputStringKind::Literal { width_mappings } => (width_mappings, true), InputStringKind::Literal { width_mappings } => (width_mappings, true),
InputStringKind::NotALiteral => (Vec::new(), false), InputStringKind::NotALiteral => (Vec::new(), false),
}; };
@ -339,7 +342,7 @@ impl<'a> Parser<'a> {
width_map, width_map,
last_opening_brace: None, last_opening_brace: None,
append_newline, append_newline,
is_literal, is_source_literal,
cur_line_start: 0, cur_line_start: 0,
line_spans: vec![], line_spans: vec![],
} }
@ -532,13 +535,13 @@ impl<'a> Parser<'a> {
'{' | '}' => { '{' | '}' => {
return &self.input[start..pos]; return &self.input[start..pos];
} }
'\n' if self.is_literal => { '\n' if self.is_source_literal => {
self.line_spans.push(self.span(self.cur_line_start, pos)); self.line_spans.push(self.span(self.cur_line_start, pos));
self.cur_line_start = pos + 1; self.cur_line_start = pos + 1;
self.cur.next(); self.cur.next();
} }
_ => { _ => {
if self.is_literal && pos == self.cur_line_start && c.is_whitespace() { if self.is_source_literal && pos == self.cur_line_start && c.is_whitespace() {
self.cur_line_start = pos + c.len_utf8(); self.cur_line_start = pos + c.len_utf8();
} }
self.cur.next(); self.cur.next();
@ -890,6 +893,7 @@ impl<'a> Parser<'a> {
/// written code (code snippet) and the `InternedString` that gets processed in the `Parser` /// written code (code snippet) and the `InternedString` that gets processed in the `Parser`
/// in order to properly synthesise the intra-string `Span`s for error diagnostics. /// in order to properly synthesise the intra-string `Span`s for error diagnostics.
fn find_width_map_from_snippet( fn find_width_map_from_snippet(
input: &str,
snippet: Option<string::String>, snippet: Option<string::String>,
str_style: Option<usize>, str_style: Option<usize>,
) -> InputStringKind { ) -> InputStringKind {
@ -902,8 +906,27 @@ fn find_width_map_from_snippet(
return InputStringKind::Literal { width_mappings: Vec::new() }; return InputStringKind::Literal { width_mappings: Vec::new() };
} }
// Strip quotes.
let snippet = &snippet[1..snippet.len() - 1]; let snippet = &snippet[1..snippet.len() - 1];
// Macros like `println` add a newline at the end. That technically doesn't make them "literals" anymore, but it's fine
// since we will never need to point our spans there, so we lie about it here by ignoring it.
// Since there might actually be newlines in the source code, we need to normalize away all trailing newlines.
// If we only trimmed it off the input, `format!("\n")` would cause a mismatch, as here they actually match up.
// Alternatively, we could just count the trailing newlines and only trim one from the input if they don't match up.
let input_no_nl = input.trim_end_matches('\n');
let Some(unescaped) = unescape_string(snippet) else {
return InputStringKind::NotALiteral;
};
let unescaped_no_nl = unescaped.trim_end_matches('\n');
if unescaped_no_nl != input_no_nl {
// The source string that we're pointing at isn't our input, so spans pointing at it will be incorrect.
// This can for example happen with proc macros that respan generated literals.
return InputStringKind::NotALiteral;
}
let mut s = snippet.char_indices(); let mut s = snippet.char_indices();
let mut width_mappings = vec![]; let mut width_mappings = vec![];
while let Some((pos, c)) = s.next() { while let Some((pos, c)) = s.next() {
@ -986,6 +1009,19 @@ fn find_width_map_from_snippet(
InputStringKind::Literal { width_mappings } InputStringKind::Literal { width_mappings }
} }
fn unescape_string(string: &str) -> Option<string::String> {
let mut buf = string::String::new();
let mut ok = true;
unescape::unescape_literal(string, unescape::Mode::Str, &mut |_, unescaped_char| {
match unescaped_char {
Ok(c) => buf.push(c),
Err(_) => ok = false,
}
});
ok.then_some(buf)
}
// Assert a reasonable size for `Piece` // Assert a reasonable size for `Piece`
#[cfg(all(target_arch = "x86_64", target_pointer_width = "64"))] #[cfg(all(target_arch = "x86_64", target_pointer_width = "64"))]
rustc_data_structures::static_assert_size!(Piece<'_>, 16); rustc_data_structures::static_assert_size!(Piece<'_>, 16);

View File

@ -28,25 +28,41 @@ pub fn err_with_input_span(input: TokenStream) -> TokenStream {
TokenStream::from(TokenTree::Literal(lit)) TokenStream::from(TokenTree::Literal(lit))
} }
fn build_format(args: impl Into<TokenStream>) -> TokenStream {
TokenStream::from_iter([
TokenTree::from(Ident::new("format", Span::call_site())),
TokenTree::from(Punct::new('!', Spacing::Alone)),
TokenTree::from(Group::new(Delimiter::Parenthesis, args.into())),
])
}
#[proc_macro] #[proc_macro]
pub fn respan_to_invalid_format_literal(input: TokenStream) -> TokenStream { pub fn respan_to_invalid_format_literal(input: TokenStream) -> TokenStream {
let mut s = Literal::string("{"); let mut s = Literal::string("{");
s.set_span(input.into_iter().next().unwrap().span()); s.set_span(input.into_iter().next().unwrap().span());
TokenStream::from_iter([
TokenTree::from(Ident::new("format", Span::call_site())), build_format(TokenTree::from(s))
TokenTree::from(Punct::new('!', Spacing::Alone)),
TokenTree::from(Group::new(Delimiter::Parenthesis, TokenTree::from(s).into())),
])
} }
#[proc_macro] #[proc_macro]
pub fn capture_a_with_prepended_space_preserve_span(input: TokenStream) -> TokenStream { pub fn capture_a_with_prepended_space_preserve_span(input: TokenStream) -> TokenStream {
let mut s = Literal::string(" {a}"); let mut s = Literal::string(" {a}");
s.set_span(input.into_iter().next().unwrap().span()); s.set_span(input.into_iter().next().unwrap().span());
TokenStream::from_iter([
TokenTree::from(Ident::new("format", Span::call_site())), build_format(TokenTree::from(s))
TokenTree::from(Punct::new('!', Spacing::Alone)), }
TokenTree::from(Group::new(Delimiter::Parenthesis, TokenTree::from(s).into())),
]) #[proc_macro]
pub fn format_args_captures(_: TokenStream) -> TokenStream {
r#"{ let x = 5; format!("{x}") }"#.parse().unwrap()
}
#[proc_macro]
pub fn bad_format_args_captures(_: TokenStream) -> TokenStream {
r#"{ let x = 5; format!(concat!("{x}")) }"#.parse().unwrap()
}
#[proc_macro]
pub fn identity_pm(input: TokenStream) -> TokenStream {
input
} }

View File

@ -0,0 +1,21 @@
// aux-build:format-string-proc-macro.rs
#[macro_use]
extern crate format_string_proc_macro;
macro_rules! identity_mbe {
($tt:tt) => {
$tt
//~^ ERROR there is no argument named `a`
};
}
fn main() {
let a = 0;
format!(identity_pm!("{a}"));
//~^ ERROR there is no argument named `a`
format!(identity_mbe!("{a}"));
format!(concat!("{a}"));
//~^ ERROR there is no argument named `a`
}

View File

@ -0,0 +1,30 @@
error: there is no argument named `a`
--> $DIR/format-args-capture-first-literal-is-macro.rs:16:26
|
LL | format!(identity_pm!("{a}"));
| ^^^^^
|
= note: did you intend to capture a variable `a` from the surrounding scope?
= note: to avoid ambiguity, `format_args!` cannot capture variables when the format string is expanded from a macro
error: there is no argument named `a`
--> $DIR/format-args-capture-first-literal-is-macro.rs:8:9
|
LL | $tt
| ^^^
|
= note: did you intend to capture a variable `a` from the surrounding scope?
= note: to avoid ambiguity, `format_args!` cannot capture variables when the format string is expanded from a macro
error: there is no argument named `a`
--> $DIR/format-args-capture-first-literal-is-macro.rs:19:13
|
LL | format!(concat!("{a}"));
| ^^^^^^^^^^^^^^
|
= note: did you intend to capture a variable `a` from the surrounding scope?
= note: to avoid ambiguity, `format_args!` cannot capture variables when the format string is expanded from a macro
= note: this error originates in the macro `concat` (in Nightly builds, run with -Z macro-backtrace for more info)
error: aborting due to 3 previous errors

View File

@ -0,0 +1,8 @@
// aux-build:format-string-proc-macro.rs
extern crate format_string_proc_macro;
fn main() {
format_string_proc_macro::bad_format_args_captures!();
//~^ ERROR there is no argument named `x`
}

View File

@ -0,0 +1,12 @@
error: there is no argument named `x`
--> $DIR/format-args-capture-from-pm-first-arg-macro.rs:6:5
|
LL | format_string_proc_macro::bad_format_args_captures!();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: did you intend to capture a variable `x` from the surrounding scope?
= note: to avoid ambiguity, `format_args!` cannot capture variables when the format string is expanded from a macro
= note: this error originates in the macro `concat` (in Nightly builds, run with -Z macro-backtrace for more info)
error: aborting due to previous error

View File

@ -0,0 +1,10 @@
// check-pass
// aux-build:format-string-proc-macro.rs
extern crate format_string_proc_macro;
fn main() {
// While literal macros like `format_args!(concat!())` are not supposed to work with implicit
// captures, it should work if the whole invocation comes from a macro expansion (#106408).
format_string_proc_macro::format_args_captures!();
}

View File

@ -0,0 +1,16 @@
// run-pass
macro_rules! format_mbe {
($tt:tt) => {
{
#[allow(unused_variables)]
let a = 123;
format!($tt)
}
};
}
fn main() {
let a = 0;
assert_eq!(format_mbe!("{a}"), "0");
}

View File

@ -1,15 +1,10 @@
// aux-build:format-string-proc-macro.rs // aux-build:format-string-proc-macro.rs
// check-fail
// known-bug: #106191
// unset-rustc-env:RUST_BACKTRACE
// had to be reverted
// error-pattern:unexpectedly panicked
// failure-status:101
// dont-check-compiler-stderr
extern crate format_string_proc_macro; extern crate format_string_proc_macro;
fn main() { fn main() {
format_string_proc_macro::respan_to_invalid_format_literal!("¡"); format_string_proc_macro::respan_to_invalid_format_literal!("¡");
//~^ ERROR invalid format string: expected `'}'` but string was terminated
format_args!(r#concat!("¡ {")); format_args!(r#concat!("¡ {"));
//~^ ERROR invalid format string: expected `'}'` but string was terminated
} }

View File

@ -1,2 +1,19 @@
query stack during panic: error: invalid format string: expected `'}'` but string was terminated
end of query stack --> $DIR/respanned-literal-issue-106191.rs:6:65
|
LL | format_string_proc_macro::respan_to_invalid_format_literal!("¡");
| ^^^ expected `'}'` in format string
|
= note: if you intended to print `{`, you can escape it using `{{`
error: invalid format string: expected `'}'` but string was terminated
--> $DIR/respanned-literal-issue-106191.rs:8:18
|
LL | format_args!(r#concat!("¡ {"));
| ^^^^^^^^^^^^^^^^^^^^^^^ expected `'}'` in format string
|
= note: if you intended to print `{`, you can escape it using `{{`
= note: this error originates in the macro `concat` (in Nightly builds, run with -Z macro-backtrace for more info)
error: aborting due to 2 previous errors