Add raw string literal ambiguity document
parent 19e1f5cdb6
commit 1a1a9d5445
29
src/grammar/raw-string-literal-ambiguity.md
Normal file
@@ -0,0 +1,29 @@
Rust's lexical grammar is not context-free. Raw string literals are the source
of the problem. Informally, a raw string literal is an `r`, followed by `N`
hashes (where `N` can be zero), a quote, any characters, then a quote followed
by `N` hashes. The following context-free grammar approximates this as well as
possible:

    R -> 'r' S
    S -> '"' B '"'
    S -> '#' S '#'
    B -> . B
    B -> ε

Here `.` represents any character and `ε` the empty string. Consider the
string `r#""#"#`. This string is not a valid raw string literal (the literal
`r#""#` ends at the first `"#`, leaving a stray `"#` behind), but the above
grammar accepts it as one, using the derivation:

    R : #""#"#
    S : ""#"
    S : "#
    B : #
    B : ε

(Here `T : U` means that the rule `T` is applied and `U` is the remainder of
the string.) The difficulty is that the language of raw string literals is
fundamentally context-sensitive. In particular, the context needed is the
number of hashes. I know of no way to resolve this, but I have also not come
up with a proof that it is not context-free. Such a proof would probably use
the pumping lemma for context-free languages, but I (cmr) could not come up
with one after spending a few hours on it, and decided my time was best spent
elsewhere. Pull requests welcome!
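
To make the role of the hash count concrete, here is a minimal Python sketch
that contrasts the grammar above with a hand-written check that remembers `N`.
The function names `grammar_accepts` and `lexer_accepts` are made up for this
illustration, and the second function is only an approximation of what a real
lexer does (it assumes the literal ends at the first quote followed by `N`
hashes and ignores any limit on `N`); it is not taken from the Rust compiler.

```python
def grammar_accepts(s: str) -> bool:
    """Recognizer for the context-free grammar R/S/B given above."""
    if not s.startswith("r"):  # R -> 'r' S
        return False
    rest = s[1:]
    # S -> '#' S '#': strip one hash from each end while the rule applies.
    while len(rest) >= 2 and rest[0] == "#" and rest[-1] == "#":
        rest = rest[1:-1]
    # S -> '"' B '"': B matches any characters, so only the outer quotes matter.
    return len(rest) >= 2 and rest[0] == '"' and rest[-1] == '"'


def lexer_accepts(s: str) -> bool:
    """Hypothetical hand-written check that keeps the hash count N as context."""
    if not s.startswith("r"):
        return False
    i = 1
    while i < len(s) and s[i] == "#":  # count the opening hashes
        i += 1
    n = i - 1
    if i >= len(s) or s[i] != '"':  # an opening quote must follow the hashes
        return False
    closer = '"' + "#" * n
    end = s.find(closer, i + 1)  # first quote followed by N hashes
    # Assumption: the literal ends there, so nothing may follow the closer.
    return end != -1 and end + len(closer) == len(s)


for text in ['r#""#"#', 'r#"a"b"#', 'r"abc"']:
    print(text, grammar_accepts(text), lexer_accepts(text))
```

On `r#""#"#` the two functions disagree: the grammar-based recognizer accepts
the string, while the counting check stops at the first `"#` and rejects it
because characters are left over.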