rust/compiler/rustc_expand/src/module.rs

use std::iter::once;
use std::path::{self, Path, PathBuf};

use rustc_ast::ptr::P;
use rustc_ast::{AttrVec, Attribute, Inline, Item, ModSpans};
use rustc_errors::{Diag, ErrorGuaranteed};
use rustc_parse::{exp, new_parser_from_file, unwrap_or_emit_fatal, validate_attr};
use rustc_session::Session;
use rustc_session::parse::ParseSess;
use rustc_span::{Ident, Span, sym};
use thin_vec::ThinVec;

use crate::base::ModuleData;
use crate::errors::{
    ModuleCircular, ModuleFileNotFound, ModuleInBlock, ModuleInBlockName, ModuleMultipleCandidates,
};

#[derive(Copy, Clone)]
pub enum DirOwnership {
    Owned {
        // None if `mod.rs`, `Some("foo")` if we're in `foo.rs`.
        relative: Option<Ident>,
    },
    UnownedViaBlock,
}

// Public for rustfmt usage.
pub struct ModulePathSuccess {
    pub file_path: PathBuf,
    pub dir_ownership: DirOwnership,
}

pub(crate) struct ParsedExternalMod {
    pub items: ThinVec<P<Item>>,
    pub spans: ModSpans,
    pub file_path: PathBuf,
    pub dir_path: PathBuf,
    pub dir_ownership: DirOwnership,
    pub had_parse_error: Result<(), ErrorGuaranteed>,
}

pub enum ModError<'a> {
    CircularInclusion(Vec<PathBuf>),
    ModInBlock(Option<Ident>),
    FileNotFound(Ident, PathBuf, PathBuf),
    MultipleCandidates(Ident, PathBuf, PathBuf),
    ParserError(Diag<'a>),
}

pub(crate) fn parse_external_mod(
    sess: &Session,
    ident: Ident,
    span: Span, // The span to blame on errors.
    module: &ModuleData,
    mut dir_ownership: DirOwnership,
    attrs: &mut AttrVec,
) -> ParsedExternalMod {
    // We bail on the first error, but that error does not cause a fatal error... (1)
    let result: Result<_, ModError<'_>> = try {
        // Extract the file path and the new ownership.
        let mp = mod_file_path(sess, ident, attrs, &module.dir_path, dir_ownership)?;
        dir_ownership = mp.dir_ownership;

        // Ensure file paths are acyclic.
        if let Some(pos) = module.file_path_stack.iter().position(|p| p == &mp.file_path) {
            do yeet ModError::CircularInclusion(module.file_path_stack[pos..].to_vec());
        }

        // Actually parse the external file as a module.
        let mut parser =
            unwrap_or_emit_fatal(new_parser_from_file(&sess.psess, &mp.file_path, Some(span)));
        let (inner_attrs, items, inner_span) =
            parser.parse_mod(exp!(Eof)).map_err(|err| ModError::ParserError(err))?;
        attrs.extend(inner_attrs);
        (items, inner_span, mp.file_path)
    };

    // (1) ...instead, we return a dummy module.
    let ((items, spans, file_path), had_parse_error) = match result {
        Err(err) => (Default::default(), Err(err.report(sess, span))),
        Ok(result) => (result, Ok(())),
    };

    // Extract the directory path for submodules of the module.
    let dir_path = file_path.parent().unwrap_or(&file_path).to_owned();

    ParsedExternalMod { items, spans, file_path, dir_path, dir_ownership, had_parse_error }
}

pub(crate) fn mod_dir_path(
    sess: &Session,
    ident: Ident,
    attrs: &[Attribute],
    module: &ModuleData,
    mut dir_ownership: DirOwnership,
    inline: Inline,
) -> (PathBuf, DirOwnership) {
    match inline {
        Inline::Yes
            if let Some(file_path) = mod_file_path_from_attr(sess, attrs, &module.dir_path) =>
        {
            // For inline modules file path from `#[path]` is actually the directory path
            // for historical reasons, so we don't pop the last segment here.
            (file_path, DirOwnership::Owned { relative: None })
        }
        Inline::Yes => {
            // We have to push on the current module name in the case of relative
            // paths in order to ensure that any additional module paths from inline
            // `mod x { ... }` come after the relative extension.
            //
            // For example, a `mod z { ... }` inside `x/y.rs` should set the current
            // directory path to `/x/y/z`, not `/x/z` with a relative offset of `y`.
            let mut dir_path = module.dir_path.clone();
            if let DirOwnership::Owned { relative } = &mut dir_ownership {
                if let Some(ident) = relative.take() {
                    // Remove the relative offset.
                    dir_path.push(ident.as_str());
                }
            }
            dir_path.push(ident.as_str());

            (dir_path, dir_ownership)
        }
        Inline::No => {
            // FIXME: This is a subset of `parse_external_mod` without actual parsing,
            // check whether the logic for unloaded, loaded and inline modules can be unified.
            let file_path = mod_file_path(sess, ident, attrs, &module.dir_path, dir_ownership)
                .map(|mp| {
                    dir_ownership = mp.dir_ownership;
                    mp.file_path
                })
                .unwrap_or_default();

            // Extract the directory path for submodules of the module.
            let dir_path = file_path.parent().unwrap_or(&file_path).to_owned();

            (dir_path, dir_ownership)
        }
    }
}

fn mod_file_path<'a>(
    sess: &'a Session,
    ident: Ident,
    attrs: &[Attribute],
    dir_path: &Path,
    dir_ownership: DirOwnership,
) -> Result<ModulePathSuccess, ModError<'a>> {
    if let Some(file_path) = mod_file_path_from_attr(sess, attrs, dir_path) {
        // All `#[path]` files are treated as though they are a `mod.rs` file.
        // This means that `mod foo;` declarations inside `#[path]`-included
        // files are siblings.
        //
        // Note that this will produce weirdness when a file named `foo.rs` is
        // `#[path]` included and contains a `mod foo;` declaration.
        // If you encounter this, it's your own darn fault :P
        let dir_ownership = DirOwnership::Owned { relative: None };
        return Ok(ModulePathSuccess { file_path, dir_ownership });
    }

    let relative = match dir_ownership {
        DirOwnership::Owned { relative } => relative,
        DirOwnership::UnownedViaBlock => None,
    };
    let result = default_submod_path(&sess.psess, ident, relative, dir_path);
    match dir_ownership {
        DirOwnership::Owned { .. } => result,
        DirOwnership::UnownedViaBlock => Err(ModError::ModInBlock(match result {
            Ok(_) | Err(ModError::MultipleCandidates(..)) => Some(ident),
            _ => None,
        })),
    }
}

/// Derive a submodule path from the first found `#[path = "path_string"]`.
/// The provided `dir_path` is joined with the `path_string`.
pub(crate) fn mod_file_path_from_attr(
    sess: &Session,
    attrs: &[Attribute],
    dir_path: &Path,
) -> Option<PathBuf> {
    // Extract path string from first `#[path = "path_string"]` attribute.
    let first_path = attrs.iter().find(|at| at.has_name(sym::path))?;
    let Some(path_sym) = first_path.value_str() else {
        // This check is here mainly to catch attempting to use a macro,
        // such as `#[path = concat!(...)]`. This isn't supported because
        // otherwise the `InvocationCollector` would need to defer loading
        // a module until the `#[path]` attribute was expanded, and it
        // doesn't support that (and would likely add a bit of complexity).
        // Usually bad forms are checked during semantic analysis (via
        // `TyCtxt::check_mod_attrs`), but by the time that runs the macro
        // is expanded, and it doesn't give an error.
        validate_attr::emit_fatal_malformed_builtin_attribute(&sess.psess, first_path, sym::path);
    };

    let path_str = path_sym.as_str();

    // On Windows, the base path might have the form
    // `\\?\foo\bar` in which case it does not tolerate
    // mixed `/` and `\` separators, so canonicalize
    // `/` to `\`.
    #[cfg(windows)]
    let path_str = path_str.replace("/", "\\");

    Some(dir_path.join(path_str))
}

/// Returns a path to a module.
// Public for rustfmt usage.
pub fn default_submod_path<'a>(
    psess: &'a ParseSess,
    ident: Ident,
    relative: Option<Ident>,
    dir_path: &Path,
) -> Result<ModulePathSuccess, ModError<'a>> {
    // If we're in a foo.rs file instead of a mod.rs file,
    // we need to look for submodules in
    // `./foo/<ident>.rs` and `./foo/<ident>/mod.rs` rather than
    // `./<ident>.rs` and `./<ident>/mod.rs`.
    let relative_prefix_string;
    let relative_prefix = if let Some(ident) = relative {
        relative_prefix_string = format!("{}{}", ident.name, path::MAIN_SEPARATOR);
        &relative_prefix_string
    } else {
        ""
    };

    let default_path_str = format!("{}{}.rs", relative_prefix, ident.name);
    let secondary_path_str =
        format!("{}{}{}mod.rs", relative_prefix, ident.name, path::MAIN_SEPARATOR);
    let default_path = dir_path.join(&default_path_str);
    let secondary_path = dir_path.join(&secondary_path_str);
    let default_exists = psess.source_map().file_exists(&default_path);
    let secondary_exists = psess.source_map().file_exists(&secondary_path);

    match (default_exists, secondary_exists) {
        (true, false) => Ok(ModulePathSuccess {
            file_path: default_path,
            dir_ownership: DirOwnership::Owned { relative: Some(ident) },
        }),
        (false, true) => Ok(ModulePathSuccess {
            file_path: secondary_path,
            dir_ownership: DirOwnership::Owned { relative: None },
        }),
        (false, false) => Err(ModError::FileNotFound(ident, default_path, secondary_path)),
        (true, true) => Err(ModError::MultipleCandidates(ident, default_path, secondary_path)),
    }
}

impl ModError<'_> {
    fn report(self, sess: &Session, span: Span) -> ErrorGuaranteed {
        match self {
            ModError::CircularInclusion(file_paths) => {
                let path_to_string = |path: &PathBuf| path.display().to_string();

                let paths = file_paths
                    .iter()
                    .map(path_to_string)
                    .chain(once(path_to_string(&file_paths[0])))
                    .collect::<Vec<_>>();

                let modules = paths.join(" -> ");

                sess.dcx().emit_err(ModuleCircular { span, modules })
            }
            ModError::ModInBlock(ident) => sess.dcx().emit_err(ModuleInBlock {
                span,
                name: ident.map(|name| ModuleInBlockName { span, name }),
            }),
            ModError::FileNotFound(name, default_path, secondary_path) => {
                sess.dcx().emit_err(ModuleFileNotFound {
                    span,
                    name,
                    default_path: default_path.display().to_string(),
                    secondary_path: secondary_path.display().to_string(),
                })
            }
            ModError::MultipleCandidates(name, default_path, secondary_path) => {
                sess.dcx().emit_err(ModuleMultipleCandidates {
                    span,
                    name,
                    default_path: default_path.display().to_string(),
                    secondary_path: secondary_path.display().to_string(),
                })
            }
            ModError::ParserError(err) => err.emit(),
        }
    }
}