Describe numeric and textual literals better; clean up lexeme descriptions a bit.

Graydon Hoare 2010-07-01 09:00:47 -07:00
parent aa614d5280
commit 3aaff59dba


@@ -583,39 +583,42 @@ Unicode characters.
* Ref.Lex.Sym:: Special symbol tokens.
@end menu
@page
@node Ref.Lex.Ignore
@subsection Ref.Lex.Ignore
@c * Ref.Lex.Ignore:: Ignored tokens.
The classes of @emph{whitespace} and @emph{comment} is ignored, and are not
considered as tokens.
Characters considered to be @emph{whitespace} or @emph{comment} are ignored,
and are not considered as tokens. They serve only to delimit tokens. Rust is
otherwise a free-form language.
@dfn{Whitespace} is any of the following Unicode characters: U+0020 (space),
U+0009 (tab, @code{'\t'}), U+000A (LF, @code{'\n'}), U+000D (CR, @code{'\r'}).
@dfn{Comments} are any sequence of Unicode characters beginning with U+002F
U+002F (@code{//}) and extending to the next U+000a character,
U+002F (@code{"//"}) and extending to the next U+000A character,
@emph{excluding} cases in which such a sequence occurs within a string literal
token or a syntactic extension token.
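
The skipping rule above can be sketched in a few lines of Python. This is an illustration only, not the compiler's lexer: the name `skip_ignored` is hypothetical, and the sketch assumes it is called only between tokens, so the string-literal and syntactic-extension exclusions never arise.

```python
# The four whitespace characters listed above: space, tab, LF, CR.
WHITESPACE = {" ", "\t", "\n", "\r"}

def skip_ignored(src, pos=0):
    """Return the index of the first character at or after pos that is
    neither whitespace nor part of a // comment."""
    n = len(src)
    while pos < n:
        if src[pos] in WHITESPACE:
            pos += 1
        elif src.startswith("//", pos):
            # A comment runs to the next LF, or to end of input if none.
            end = src.find("\n", pos)
            pos = n if end == -1 else end + 1
        else:
            break
    return pos
```

A lexer built this way would call `skip_ignored` before reading each token, which is what makes the language free-form apart from token boundaries.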
@page
@node Ref.Lex.Ident
@subsection Ref.Lex.Ident
@c * Ref.Lex.Ident:: Identifier tokens.
Identifiers follow the pattern of C identifiers: they begin with a
@emph{letter} or underscore character @code{_} (Unicode character U+005f), and
continue with any combination of @emph{letters}, @emph{digits} and
underscores, and must not be equal to any keyword. @xref{Ref.Lex.Key}.
@emph{letter} or @emph{underscore}, and continue with any combination of
@emph{letters}, @emph{decimal digits} and underscores, and must not be equal
to any keyword. @xref{Ref.Lex.Key}.
A @emph{letter} is a Unicode character in the ranges U+0061-U+007A and
U+0041-U+005A (@code{a-z} and @code{A-Z}).
U+0041-U+005A (@code{'a'}-@code{'z'} and @code{'A'}-@code{'Z'}).
A @emph{digit} is a Unicode character in the range U+0030-U0039 (@code{0-9}).
An @dfn{underscore} is the character U+005F ('_').
A @dfn{decimal digit} is a character in the range U+0030-U+0039
(@code{'0'}-@code{'9'}).
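
As a rough illustration, the identifier rule maps onto a single regular expression. The sketch below is in Python and is not taken from the compiler; `is_identifier` and the empty `KEYWORDS` placeholder are hypothetical names, and the real keyword list is the one given in Ref.Lex.Key.

```python
import re

# Letters are ASCII only, per the definition above: a-z and A-Z.
# The digit class is spelled out because \d would also match
# non-ASCII Unicode digits.
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Hypothetical placeholder; substitute the keyword list from Ref.Lex.Key.
KEYWORDS = frozenset()

def is_identifier(text, keywords=KEYWORDS):
    """True if text matches the identifier pattern and is not a keyword."""
    return IDENT_RE.fullmatch(text) is not None and text not in keywords
```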
@page
@node Ref.Lex.Key
@subsection Ref.Lex.Key
@c * Ref.Lex.Key:: Keyword tokens.
@ -701,25 +704,91 @@ The keywords are:
@subsection Ref.Lex.Num
@c * Ref.Lex.Num:: Numeric tokens.
@emph{TODO: describe numeric literals}.
A @dfn{number literal} is either an @emph{integer literal} or a
@emph{floating-point literal}.
@sp 1
An @dfn{integer literal} has one of three forms:
@enumerate
@item A @dfn{decimal literal} starts with a @emph{decimal digit} and continues
with any mixture of @emph{decimal digits} and @emph{underscores}.
@item A @dfn{hex literal} starts with the character sequence U+0030
U+0078 (@code{"0x"}) and continues with any mixture of @emph{hex digits}
and @emph{underscores}.
@item A @dfn{binary literal} starts with the character sequence U+0030
U+0062 (@code{"0b"}) and continues with any mixture of @emph{binary digits}
and @emph{underscores}.
@end enumerate
@sp 1
A @dfn{floating-point literal} has one of two forms:
@enumerate
@item Two @emph{decimal literals} separated by a period
character U+002E ('.'), with an optional @emph{exponent} trailing after the
second @emph{decimal literal}.
@item A single @emph{decimal literal} followed by an @emph{exponent}.
@end enumerate
@sp 1
A @dfn{hex digit} is either a @emph{decimal digit} or else a character in the
ranges U+0061-U+0066 and U+0041-U+0046 (@code{'a'}-@code{'f'},
@code{'A'}-@code{'F'}).
A @dfn{binary digit} is either the character U+0030 or U+0031 (@code{'0'} or
@code{'1'}).
An @dfn{exponent} begins with either of the characters U+0065 or U+0045
(@code{'e'} or @code{'E'}), followed by an optional @emph{sign character},
followed by a trailing @emph{decimal literal}.
A @dfn{sign character} is either U+002B or U+002D (@code{'+'} or @code{'-'}).
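
The literal forms above translate fairly directly into regular expressions. The Python sketch below is illustrative only; `classify_number` is a hypothetical name. It also makes one assumption the text does not state: at least one character must follow the `0x` or `0b` prefix.

```python
import re

DEC = r"[0-9][0-9_]*"        # decimal literal
EXP = rf"[eE][+-]?{DEC}"     # exponent: e or E, optional sign, decimal literal

PATTERNS = [
    # Assumption: at least one character follows the prefix.
    ("hex", r"0x[0-9a-fA-F_]+"),
    ("binary", r"0b[01_]+"),
    # Either DEC '.' DEC with an optional exponent, or DEC with an exponent.
    ("float", rf"{DEC}\.{DEC}(?:{EXP})?|{DEC}{EXP}"),
    ("decimal", DEC),
]

def classify_number(text):
    """Return the name of the literal form that text matches, or None."""
    for kind, pattern in PATTERNS:
        if re.fullmatch(pattern, text):
            return kind
    return None
```

Under these rules `1.` and `.5` are not floating-point literals, since both sides of the period must be decimal literals.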
@page
@node Ref.Lex.Text
@subsection Ref.Lex.Text
@c * Ref.Lex.Text:: String and character tokens.
@emph{TODO: describe string and character literals}.
A @dfn{character literal} is a single Unicode character enclosed within two
U+0027 (single-quote) characters, with the exception of U+0027 itself, which
must be @emph{escaped} by a preceding U+005C character ('\').
A @dfn{string literal} is a sequence of any Unicode characters enclosed
within two U+0022 (double-quote) characters, with the exception of U+0022
itself, which must be @emph{escaped} by a preceding U+005C character
('\').
Some additional @emph{escapes} are available in either character or string
literals. An escape starts with a U+005C ('\') and continues with one
of the following forms:
@itemize
@item An @dfn{8-bit codepoint escape} starts with U+0078 ('x') and is
followed by exactly two @dfn{hex digits}. It denotes the Unicode codepoint
equal to the provided hex value.
@item A @dfn{16-bit codepoint escape} starts with U+0075 ('u') and is followed
by exactly four @dfn{hex digits}. It denotes the Unicode codepoint equal to
the provided hex value.
@item A @dfn{32-bit codepoint escape} starts with U+0055 ('U') and is followed
by exactly eight @dfn{hex digits}. It denotes the Unicode codepoint equal to
the provided hex value.
@item A @dfn{whitespace escape} is one of the characters U+006E ('n'),
U+0072 ('r'), or U+0074 ('t'), denoting the Unicode characters U+000A (LF),
U+000D (CR) or U+0009 (HT) respectively.
@item The @dfn{backslash escape} is the character U+005C ('\') which must be
escaped in order to denote @emph{itself}.
@end itemize
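
To make the escape forms concrete, here is a minimal Python sketch of a decoder for the body of a character or string literal, with the enclosing quotes already removed. It is an illustration under the rules listed above, not the compiler's implementation; `decode_escapes` and the table names are hypothetical.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")

# Codepoint escapes: the escape letter maps to the exact hex-digit count.
HEX_WIDTHS = {"x": 2, "u": 4, "U": 8}

# Whitespace escapes, the backslash escape, and the two quote escapes.
SIMPLE = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\", "'": "'", '"': '"'}

def decode_escapes(body):
    """Return body with every backslash escape replaced by its character."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash at end of literal")
        kind = body[i + 1]
        if kind in HEX_WIDTHS:
            width = HEX_WIDTHS[kind]
            digits = body[i + 2 : i + 2 + width]
            if len(digits) != width or any(d not in HEX_DIGITS for d in digits):
                raise ValueError(f"\\{kind} needs exactly {width} hex digits")
            # chr raises ValueError for values above U+10FFFF.
            out.append(chr(int(digits, 16)))
            i += 2 + width
        elif kind in SIMPLE:
            out.append(SIMPLE[kind])
            i += 2
        else:
            raise ValueError(f"unknown escape \\{kind}")
    return "".join(out)
```

The digit check is done by hand rather than left to `int`, because Python's `int` also accepts underscores, which the escape forms do not allow.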
@page
@node Ref.Lex.Syntax
@subsection Ref.Lex.Syntax
@c * Ref.Lex.Syntax:: Syntactic extension tokens.
Syntactic extensions are marked with the @emph{pound} sigil @code{#} (U+0023),
Syntactic extensions are marked with the @emph{pound} sigil U+0023 (@code{#}),
followed by a qualified name of a compile-time imported module item, an
optional parenthesized list of @emph{tokens}, and an optional brace-enclosed
region of free-form text (with brace-matching and brace-escaping used to
determine the limit of the region). @xref{Ref.Comp.Syntax}.
optional parenthesized list of @emph{parsed expressions}, and an optional
brace-enclosed region of free-form text (with brace-matching and
brace-escaping used to determine the limit of the
region). @xref{Ref.Comp.Syntax}.
@emph{TODO: formalize those terms more}.