Implement jump threading MIR opt
This pass is an attempt to generalize `ConstGoto` and `SeparateConstSwitch` passes into a more complete jump threading pass.
This pass is rather heavy, as it performs a truncated backwards DFS on MIR starting from each `SwitchInt` terminator. This backwards DFS remains very limited, as it only walks through `Goto` terminators.
It is build to support constants and discriminants, and a propagating through a very limited set of operations.
The pass successfully manages to disentangle the `Some(x?)` use case and the DFA use case. It still needs a few tests before being ready.
Avoid a `track_errors` by bubbling up most errors from `check_well_formed`
I believe `track_errors` is mostly papering over issues that a sufficiently convoluted query graph can hit. I made this change, while the actual change I want to do is to stop bailing out early on errors, and instead use this new `ErrorGuaranteed` to invoke `check_well_formed` for individual items before doing all the `typeck` logic on them.
This works towards resolving https://github.com/rust-lang/rust/issues/97477 and various other ICEs, as well as allowing us to use parallel rustc more (which is currently rather limited/bottlenecked due to the very sequential nature in which we do `rustc_hir_analysis::check_crate`)
cc `@SparrowLii` `@Zoxc` for the new `try_par_for_each_in` function
coverage: Emit mappings for unused functions without generating stubs
For a while I've been annoyed by the fact that generating coverage maps for unused functions involves generating a stub function at the LLVM level.
As I suspected, generating that stub function isn't actually necessary, as long as we specifically tell LLVM about the symbol names of all the functions that have coverage mappings but weren't codegenned (due to being unused).
---
There is some helper code that gets moved around in the follow-up patches, so look at the first patch to see the most important functional changes.
---
`@rustbot` label +A-code-coverage
This simplifies the code by removing all the `self` assignments and
makes the flow of data clearer - always into the printer.
Especially in v0 mangling, which already used `&mut self` in some
places, it gets a lot more uniform.
This query has a name that sounds general-purpose, but in fact it has
coverage-specific semantics, and (fortunately) is only used by coverage code.
Because it is only ever called once (from one designated CGU), it doesn't need
to be a query, and we can change it to a regular function instead.
Uplift movability and mutability, the simple way
Just make type_ir a dependency of ast. This can be relaxed later if we want to make the dependency less heavy. Part of rust-lang/types-team#124.
r? `@lcnr` or `@jackh726`
coverage: Move most per-function coverage info into `mir::Body`
Currently, all of the coverage information collected by the `InstrumentCoverage` pass is smuggled through MIR in the form of individual `StatementKind::Coverage` statements, which must then be reassembled by coverage codegen.
That's awkward for a number of reasons:
- While some of the coverage statements do care about their specific position in the MIR control-flow graph, many of them don't, and are just tacked onto the function's first BB as metadata carriers.
- MIR inlining can result in coverage statements being duplicated, so coverage codegen has to jump through hoops to avoid emitting duplicate mappings.
- MIR optimizations that would delete coverage statements need to carefully copy them into the function's first BB so as not to omit them from coverage reports.
- The order in which coverage codegen sees coverage statements is dependent on MIR optimizations/inlining, which can cause unnecessary churn in the emitted coverage mappings.
- We don't have a good way to annotate MIR-level functions with extra coverage info that doesn't belong in a statement.
---
This PR therefore takes most of the per-function coverage info and stores it in a field in `mir::Body` as `Option<Box<FunctionCoverageInfo>>`.
(This adds one pointer to the size of `mir::Body`, even when coverage is not enabled.)
Coverage statements still need to be injected into MIR in some cases, but only when they actually affect codegen (counters) or are needed to detect code that has been optimized away as unreachable (counters/expressions).
---
By the end of this PR, the information stored in `FunctionCoverageInfo` is:
- A hash of the function's source code (needed by LLVM's coverage map format)
- The number of coverage counters added by coverage instrumentation
- A table of coverage expressions, associating each expression ID with its operator (add or subtract) and its two operands
- The list of mappings, associating each covered code region with a counter/expression/zero value
---
~~This is built on top of #115301, so I'll rebase and roll a reviewer once that lands.~~
r? `@ghost`
`@rustbot` label +A-code-coverage
This new description reflects the changes made in this PR, and should hopefully
be more useful to non-coverage developers who need to care about coverage
statements.
Even though expression details are now stored in the info structure, we still
need to inject `ExpressionUsed` statements into MIR, because if one is missing
during codegen then we know that it was optimized out and we can remap all of
its associated code regions to zero.
Previously, mappings were attached to individual coverage statements in MIR.
That necessitated special handling in MIR optimizations to avoid deleting those
statements, since otherwise codegen would be unable to reassemble the original
list of mappings.
With this change, a function's list of mappings is now attached to its MIR
body, and survives intact even if individual statements are deleted by
optimizations.
Don't compare host param by name
Seems sketchy to be searching for `sym::host` by name, especially when we can get the actual index with not very much work.
r? fee1-dead
Coverage codegen can now allocate arrays based on the number of
counters/expressions originally used by the instrumentor.
The existing query that inspects coverage statements is still used for
determining the number of counters passed to `llvm.instrprof.increment`. If
some high-numbered counters were removed by MIR optimizations, the instrumented
binary can potentially use less memory and disk space at runtime.
This allows coverage information to be attached to the function as a whole when
appropriate, instead of being smuggled through coverage statements in the
function's basic blocks.
As an example, this patch moves the `function_source_hash` value out of
individual `CoverageKind::Counter` statements and into the per-function info.
When synthesizing unused functions for coverage purposes, the absence of this
info is taken to indicate that a function was not eligible for coverage and
should not be synthesized.
Remove lots of generics from `ty::print`
All of these generics mostly resolve to the same thing, which means we can remove them, greatly simplifying the types involved in pretty printing and unlocking another simplification (that is not performed in this PR): Using `&mut self` instead of passing `self` through the return type.
cc `@eddyb` you probably know why it's like this, just checking in and making sure I didn't do anything bad
r? oli-obk
These are `Self` in almost all printers except one, which can just store
the state as a field instead. This simplifies the printer and allows for
further simplifications, for example using `&mut self` instead of
passing around the printer.
THIR unsafety checking was getting a cycle of
function unsafety checking
-> building THIR for the function
-> evaluating pattern inline constants in the function
-> building MIR for the inline constant
-> checking unsafety of functions (so that THIR can be stolen)
This is fixed by not stealing THIR when generating MIR but instead when
unsafety checking.
This leaves an issue with pattern inline constants not being unsafety
checked because they are evaluated away when generating THIR.
To fix that we now represent inline constants in THIR patterns and
visit them in THIR unsafety checking.
don't UB on dangling ptr deref, instead check inbounds on projections
This implements https://github.com/rust-lang/reference/pull/1387 in Miri. See that PR for what the change is about.
Detecting dangling references in `let x = &...;` is now done by validity checking only, so some tests need to have validity checking enabled. There is no longer inherently a "nodangle" check in evaluating the expression `&*ptr` (aside from the aliasing model).
r? `@oli-obk`
Based on:
- https://github.com/rust-lang/reference/pull/1387
- https://github.com/rust-lang/rust/pull/115524
interpret: clean up AllocBytes
Fixes https://github.com/rust-lang/miri/issues/2836
Nothing has moved here in half a year, so let's just remove these unused stubs -- they need a proper re-design anyway.
r? `@oli-obk`
Format all the let-chains in compiler crates
Since rust-lang/rustfmt#5910 has landed, soon we will have support for formatting let-chains (as soon as rustfmt syncs and beta gets bumped).
This PR applies the changes [from master rustfmt to rust-lang/rust eagerly](https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/out.20formatting.20of.20prs/near/374997516), so that the next beta bump does not have to deal with a 200+ file diff and can remain concerned with other things like `cfg(bootstrap)` -- #113637 was a pain to land, for example, because of let-else.
I will also add this commit to the ignore list after it has landed.
The commands that were run -- I'm not great at bash-foo, but this applies rustfmt to every compiler crate, and then reverts the two crates that should probably be formatted out-of-tree.
```
~/rustfmt $ ls -1d ~/rust/compiler/* | xargs -I@ cargo run --bin rustfmt -- `@/src/lib.rs` --config-path ~/rust --edition=2021 # format all of the compiler crates
~/rust $ git checkout HEAD -- compiler/rustc_codegen_{gcc,cranelift} # revert changes to cg-gcc and cg-clif
```
cc `@rust-lang/rustfmt`
r? `@WaffleLapkin` or `@Nilstrieb` who said they may be able to review this purely mechanical PR :>
cc `@Mark-Simulacrum` and `@petrochenkov,` who had some thoughts on the order of operations with big formatting changes in https://github.com/rust-lang/rust/pull/95262#issue-1178993801. I think the situation has changed since then, given that let-chains support exists on master rustfmt now, and I'm fairly confident that this formatting PR should land even if *bootstrap* rustfmt doesn't yet format let-chains in order to lessen the burden of the next beta bump.
Relate alias ty with variance
In the new solver, turns out that the subst-relate branch of the alias-relate predicate was relating args invariantly even for opaques, which have variance 💀.
This change is a bit more invasive, but I'd rather not special-case it [here](aeaa5c30e5/compiler/rustc_trait_selection/src/solve/alias_relate.rs (L171-L190)) and then have it break elsewhere. I'm doing a perf run to see if the extra call to `def_kind` is that expensive, if it is, I'll reconsider.
r? ``@lcnr``
Extend `impl`'s `def_span` to include its where clauses
Typically, we highlight the def-span of an impl in a diagnostic due to either:
1. coherence error
2. trait evaluation cycle
3. invalid implementation of built-in trait
I find that an impl's where clauses are very often required to understanding why these errors come about, which is unfortunate since where clauses may be located on different lines and don't show up in the error. This PR expands the def-span of impls to include these where clauses.
r? cjgillot since you've touched this code a while back to make some spans shorter, but you can also reassign to wg-diagnostics or compiler if you're busy or have no strong opinions.
improve the suggestion of `generic_bound_failure`
- Fixes#115375
- suggest the bound in the correct scope: trait or impl header vs assoc item. See `tests/ui/suggestions/lifetimes/type-param-bound-scope.rs`
- don't suggest a lifetime name that conflicts with the other late-bound regions of the function:
```rust
type Inv<'a> = *mut &'a ();
fn check_bound<'a, T: 'a>(_: T, _: Inv<'a>) {}
fn test<'a, T>(_: &'a str, t: T, lt: Inv<'_>) { // suggests a new name `'a`
check_bound(t, lt); //~ ERROR
}
```
Generalize small dominators optimization
* Use small dominators optimization from 640ede7b0a more generally.
* Merge `DefLocation` and `LocationExtended` since they serve the same purpose.
remove Key impls for types that involve an AllocId
I don't understand how but somehow that leads to issues like https://github.com/rust-lang/rust/issues/83085? Anyway removing unused impls doesn't seem like a bad idea. The concerning part is that of course nothing will stop us from having such impls again in the future, alongside re-introducing bugs like #83085.
r? `@oli-obk`
Bring back generic parameters for indices in rustc_abi and make it compile on stable
This effectively reverses https://github.com/rust-lang/rust/pull/107163, allowing rust-analyzer to depend on this crate again,
It also moves some glob imports / expands them in the first commit because they made it more difficult for me to reason about things.
The word "active" is currently used in two different and confusing ways:
- `ACTIVE_FEATURES` actually means "available unstable features"
- `Features::active_features` actually means "features declared in the
crate's code", which can include feature within `ACTIVE_FEATURES` but
also others.
(This is also distinct from "enabled" features which includes declared
features but also some edition-specific features automatically enabled
depending on the edition in use.)
This commit changes the `Features::active_features` to
`Features::declared_features` which actually matches its meaning.
Likewise, `Features::active` becomes `Features::declared`.
Remove the `TypedArena::alloc_from_iter` specialization.
It was added in #78569. It's complicated and doesn't actually help
performance.
r? `@cjgillot`
All other impls replace type generics with `()` (or a type implementing the necessery traits)
and lifetimes with `'static`, do the same for those impls.
coverage: Allow each coverage statement to have multiple code regions
The original implementation of coverage instrumentation was built around the assumption that a coverage counter/expression would be associated with *up to one* code region. When it was discovered that *multiple* regions would sometimes need to share a counter, a workaround was found: for the remaining regions, the instrumentor would create a fresh expression that adds zero to the existing counter/expression.
That got the job done, but resulted in some awkward code, and produces unnecessarily complicated coverage maps in the final binary.
---
This PR removes that tension by changing `StatementKind::Coverage`'s code region field from `Option<CodeRegion>` to `Vec<CodeRegion>`.
The changes on the codegen side are fairly straightforward. As long as each `CoverageKind::Counter` only injects one `llvm.instrprof.increment`, the rest of coverage codegen is happy to handle multiple regions mapped to the same counter/expression, with only minor option-to-vec adjustments.
On the instrumentor/mir-transform side, we can get rid of the code that creates extra (x + 0) expressions. Instead we gather all of the code regions associated with a single BCB, and inject them all into one coverage statement.
---
There are several patches here but they can be divided in to three phases:
- Preparatory work
- Actually switching over to multiple regions per coverage statement
- Cleaning up
So viewing the patches individually may be easier.
a small wf and clause cleanup
- remove `Clause::from_projection_clause`, instead use `ToPredicate`
- change `predicate_obligations` to directly take a `Clause`
- remove some unnecessary `&`
- use clause in `min_specialization` checks where easily applicable
Rollup of 5 pull requests
Successful merges:
- #115863 (Add check_unused_messages in tidy)
- #116210 (Ensure that `~const` trait bounds on associated functions are in const traits or impls)
- #116358 (Rename both of the `Match` relations)
- #116371 (Remove unused features from `rustc_llvm`.)
- #116374 (Print normalized ty)
r? `@ghost`
`@rustbot` modify labels: rollup
When these methods were originally written, I wasn't aware that
`newtype_index!` already supports addition with ordinary numbers, without
needing to unwrap and re-wrap.
Cleanup number handling in match exhaustiveness
Doing a little bit of cleanup; handling number constants was somewhat messy. In particular, this:
- evals float consts once instead of repetitively
- reduces `Constructor` from 88 bytes to 56 (`mir::Const` is big!)
The `fast_try_eval_bits` function was mostly constructed from inlining existing code but I don't fully understand it; I don't follow how consts work and are evaluated very well.
Assorted improvements for `rustc_middle::mir::traversal`
r? `@cjgillot`
I'm not _entirely_ sure about all changes, although I do like all of them. If you'd like I can drop some commits. Best reviewed on a commit-by-commit basis, I think, since they are fairly isolated.
Implement a global value numbering MIR optimization
The aim of this pass is to avoid repeated computations by reusing past assignments. It is based on an analysis of SSA locals, in order to perform a restricted form of common subexpression elimination.
By opportunity, this pass allows for some simplifications by combining assignments. For instance, this pass could be able to see through projections of aggregates to directly reuse the aggregate field (not in this PR).
We handle references by assigning a different "provenance" index to each `Ref`/`AddressOf` rvalue. This ensure that we do not spuriously merge borrows that should not be merged. Meanwhile, we consider all the derefs of an immutable reference to a freeze type to give the same value:
```rust
_a = *_b // _b is &Freeze
_c = *_b // replaced by _c = _a
```
Don't store lazyness in `DefKind::TyAlias`
1. Don't store lazyness of a type alias in its `DefKind`, but instead via a query.
2. This allows us to treat type aliases as lazy if `#[feature(lazy_type_alias)]` *OR* if the alias contains a TAIT, rather than having checks for both in separate parts of the codebase.
r? `@oli-obk` cc `@fmease`
Check that closure/generator's interior/capture types are sized
check that closure upvars and generator interiors are sized. this check is only necessary when `unsized_fn_params` or `unsized_locals` is enabled, so only check if those are active.
Fixes#93622Fixes#61335Fixes#68543
adjust how closure/generator types are printed
I saw `&[closure@$DIR/issue-20862.rs:2:5]` and I thought it is a slice type, because that's usually what `&[_]` is... it took me a while to realize that this is just a confusing printer and actually there's no slice. Let's use something that cannot be mistaken for a regular type.
give FutureIncompatibilityReason variants more explicit names
Also make the `reason` field mandatory when declaring a lint, to make sure this is a deliberate decision.
Move `DepKind` to `rustc_query_system` and define it as `u16`
This moves the `DepKind` type to `rustc_query_system` where it's defined with an inner `u16` field. This decouples it from `rustc_middle` and is a step towards letting other crates define dep kinds. It also allows some type parameters to be removed. The `DepKind` trait is replaced with a `Deps` trait. That's used when some operations or information about dep kinds which is unavailable in `rustc_query_system` are still needed.
r? `@cjgillot`
rustc_hir_analysis: add a helper to check function the signature mismatches
This function is now used to check `#[panic_handler]`, `start` lang item, `main`, `#[start]` and intrinsic functions.
The diagnosis produced are now closer to the ones produced by trait/impl method signature mismatch.
This is the first time I do anything with rustc_hir_analysis/rustc_hir_typeck, so comments and suggestions about things I did wrong or that could be improved will be appreciated.
Suggest desugaring to return-position `impl Future` when an `async fn` in trait fails an auto trait bound
First commit allows us to store the span of the `async` keyword in HIR.
Second commit implements a suggestion to desugar an `async fn` to a return-position `impl Future` in trait to slightly improve the `Send` situation being discussed in #115822.
This suggestion is only made when `#![feature(return_type_notation)]` is not enabled -- if it is, we should instead suggest an appropriate where-clause bound.
coverage: Don't bother renumbering expressions on the Rust side
The LLVM API that we use to encode coverage mappings already has its own code for removing unused coverage expressions and renumbering the rest.
This lets us get rid of our own complex renumbering code, making it easier to change our coverage code in other ways.
---
Now that we have tests for coverage mappings (#114843), I've been able to verify that this PR doesn't make the coverage mappings worse, thanks to an explicit simplification step.
rename mir::Constant -> mir::ConstOperand, mir::ConstKind -> mir::Const
Also, be more consistent with the `to/eval_bits` methods... we had some that take a type and some that take a size, and then sometimes the one that takes a type is called `bits_for_ty`.
Turns out that `ty::Const`/`mir::ConstKind` carry their type with them, so we don't need to even pass the type to those `eval_bits` functions at all.
However this is not properly consistent yet: in `ty` we have most of the methods on `ty::Const`, but in `mir` we have them on `mir::ConstKind`. And indeed those two types are the ones that correspond to each other. So `mir::ConstantKind` should actually be renamed to `mir::Const`. But what to do with `mir::Constant`? It carries around a span, that's really more like a constant operand that appears as a MIR operand... it's more suited for `syntax.rs` than `consts.rs`, but the bigger question is, which name should it get if we want to align the `mir` and `ty` types? `ConstOperand`? `ConstOp`? `Literal`? It's not a literal but it has a field called `literal` so it would at least be consistently wrong-ish...
``@oli-obk`` any ideas?
Prevent promotion of const fn calls in inline consts
We don't wanna make that mistake we did for statics and consts worse by letting more code use it.
r? ``@RalfJung``
cc https://github.com/rust-lang/rust/issues/76001
The LLVM API that we use to encode coverage mappings already has its own code
for removing unused coverage expressions and renumbering the rest.
This lets us get rid of our own complex renumbering code, making it easier to
change our coverage code in other ways.
adjust ConstValue::Slice to work for arbitrary slice types
valtrees have already been assuming that this works; this PR makes it a reality. Also further restrict `ConstValue::Slice` to what it is actually used for; this even shrinks `ConstValue` from 32 to 24 bytes which is a nice win. :)
The alternative to this approach is to make `ConstValue::Slice` work really only for `&str`/`&[u8]` literals, and never return it in `op_to_const`. That would make `op_to_const` very clean. We could then even remove the `meta` field; the length would always be `data.inner().len()`. We could *almost* just use a `Symbol` instead of a `ConstAllocation`, but we have to support byte strings and there doesn't seem to be an interned representation of them (or rather, `ConstAllocation` *is* their interned representation). In this world, valtrees of slice reference types would then become noticeably more expensive to turn into a `ConstValue` -- but does that matter? Specifically for `&str`/`&[u8]` we could still use the optimized representation if we wanted.
If byte strings were already interned somewhere I'd gravitate towards the alternative, but the way things stand, we need a `ConstAllocation` case anyway to support byte strings, and then we might as well support arbitrary slices. (Or we say that byte strings don't get an optimized representation at all. Such a performance cliff between `str` and byte strings is probably unexpected, though due to the lack of interning for byte strings I think there might already be a performance cliff there.)
clean up unneeded `ToPredicate` impls
Part of #107250.
Removed all totally unused impls. And inlined two impls not need to satisify trait bound.
r? `@oli-obk`
miri: reduce code duplication in some SSE/SSE2 intrinsics
Reduces code duplication in the Miri implementation of some SSE and SSE2 using generics and rustc_const_eval helper functions.
There are also some other minor changes.
r? `@RalfJung`
Pretty-print argument-position impl trait to name it.
This removes a corner case.
RPIT and TAIT keep having no name, and it would be wrong to use the one in HIR (Ident::empty), so I make this case ICE.
rustc_target/riscv: Fix passing of transparent unions with only one non-ZST member
This ensures that `MaybeUninit<T>` has the same ABI as `T` when passed through an `extern "C"` function.
Fixes https://github.com/rust-lang/rust/issues/115481.
r? `@RalfJung`
This function is now used to check `#[panic_handler]`, `start` lang item, `main`, `#[start]` and intrinsic functions.
The diagnosis produced are now closer to the ones produced by trait/impl method signature mismatch.
move things out of mir/mod.rs
This moves a bunch of things out of `mir/mod.rs`:
- all const-related stuff to a new file consts.rs
- all statement/place/operand-related stuff to a new file statement.rs
- all pretty-printing related stuff to pretty.rs
`mod.rs` started out with 3100 lines and ends up with 1600. :)
Also there was some pretty-printing stuff in terminator.rs, that also got moved to pretty.rs, and I reordered things in pretty.rs so that it can be grouped by functionality.
Only the commit "use pretty_print_const_value from MIR constant 'extra' printing" has any behavior changes; it resolves the issue of having a fancy and a very crude pretty-printer for `ConstValue`.
r? `@oli-obk`
Make `TyKind::Adt`'s `Debug` impl be more pretty
Currently `{:?}` on `Ty` for a `TyKind::Adt` would print as `Adt(Foo, [])`. This PR changes it to be `Foo` when there are no generics or `Foo<T>`/`Foo<T, U>` when there _are_ generics. Example from debug log:
`├─0ms DEBUG rustc_hir_analysis::astconv return=Bar<T/#0, U/#1>`
I should have done this in my initial PR for a prettier TyKind: Debug impl but I thought I would need to be accessing generics_of to figure out where in the "path" the generics would have to go??? but no, adts literally only have a single place the generics can go (on the end). Feel a bit silly about this :)
r? `@oli-obk`
move required_consts check to general post-mono-check function
This factors some code that is common between the interpreter and the codegen backends into shared helper functions. Also as a side-effect the interpreter now uses the same `eval` functions as everyone else to get the evaluated MIR constants.
Also this is in preparation for another post-mono check that will be needed for (the current hackfix for) https://github.com/rust-lang/rust/issues/115709: ensuring that all locals are dynamically sized.
I didn't expect this to change diagnostics, but it's just cycle errors that change.
r? `@oli-obk`
nop_lift macros: ensure that we are using the right interner
Right now someone could put down the wrong list name when using these macros, and everything would still build. Nothing does a type-check to ensure that the `$set` contains element of type `Self::Lifted`. Let's fix that.
For lists this is fairly easy; for the other interners we need to unwrap some newtypes which makes this more complicated.
It's easier to pass it in to the one method that needs it
(`highlighting_region_vid`) than to store it in the type. This means
`RegionHighlightMode` can impl `Default`.
`#[diagnostic::on_unimplemented]` without filters
This commit adds support for a `#[diagnostic::on_unimplemented]` attribute with the following options:
* `message` to customize the primary error message
* `note` to add a customized note message to an error message
* `label` to customize the label part of the error message
The relevant behavior is specified in [RFC-3366](https://rust-lang.github.io/rfcs/3366-diagnostic-attribute-namespace.html)
Accept additional user-defined syntax classes in fenced code blocks
Part of #79483.
This is a re-opening of https://github.com/rust-lang/rust/pull/79454 after a big update/cleanup. I also converted the syntax to pandoc as suggested by `@notriddle:` the idea is to be as compatible as possible with the existing instead of having our own syntax.
## Motivation
From the original issue: https://github.com/rust-lang/rust/issues/78917
> The technique used by `inline-c-rs` can be ported to other languages. It's just super fun to see C code inside Rust documentation that is also tested by `cargo doc`. I'm sure this technique can be used by other languages in the future.
Having custom CSS classes for syntax highlighting will allow tools like `highlight.js` to be used in order to provide highlighting for languages other than Rust while not increasing technical burden on rustdoc.
## What is the feature about?
In short, this PR changes two things, both related to codeblocks in doc comments in Rust documentation:
* Allow to disable generation of `language-*` CSS classes with the `custom` attribute.
* Add your own CSS classes to a code block so that you can use other tools to highlight them.
#### The `custom` attribute
Let's start with the new `custom` attribute: it will disable the generation of the `language-*` CSS class on the generated HTML code block. For example:
```rust
/// ```custom,c
/// int main(void) {
/// return 0;
/// }
/// ```
```
The generated HTML code block will not have `class="language-c"` because the `custom` attribute has been set. The `custom` attribute becomes especially useful with the other thing added by this feature: adding your own CSS classes.
#### Adding your own CSS classes
The second part of this feature is to allow users to add CSS classes themselves so that they can then add a JS library which will do it (like `highlight.js` or `prism.js`), allowing to support highlighting for other languages than Rust without increasing burden on rustdoc. To disable the automatic `language-*` CSS class generation, you need to use the `custom` attribute as well.
This allow users to write the following:
```rust
/// Some code block with `{class=language-c}` as the language string.
///
/// ```custom,{class=language-c}
/// int main(void) {
/// return 0;
/// }
/// ```
fn main() {}
```
This will notably produce the following HTML:
```html
<pre class="language-c">
int main(void) {
return 0;
}</pre>
```
Instead of:
```html
<pre class="rust rust-example-rendered">
<span class="ident">int</span> <span class="ident">main</span>(<span class="ident">void</span>) {
<span class="kw">return</span> <span class="number">0</span>;
}
</pre>
```
To be noted, we could have written `{.language-c}` to achieve the same result. `.` and `class=` have the same effect.
One last syntax point: content between parens (`(like this)`) is now considered as comment and is not taken into account at all.
In addition to this, I added an `unknown` field into `LangString` (the parsed code block "attribute") because of cases like this:
```rust
/// ```custom,class:language-c
/// main;
/// ```
pub fn foo() {}
```
Without this `unknown` field, it would generate in the DOM: `<pre class="language-class:language-c language-c">`, which is quite bad. So instead, it now stores all unknown tags into the `unknown` field and use the first one as "language". So in this case, since there is no unknown tag, it'll simply generate `<pre class="language-c">`. I added tests to cover this.
Finally, I added a parser for the codeblock attributes to make it much easier to maintain. It'll be pretty easy to extend.
As to why this syntax for adding attributes was picked: it's [Pandoc's syntax](https://pandoc.org/MANUAL.html#extension-fenced_code_attributes). Even if it seems clunkier in some cases, it's extensible, and most third-party Markdown renderers are smart enough to ignore Pandoc's brace-delimited attributes (from [this comment](https://github.com/rust-lang/rust/pull/110800#issuecomment-1522044456)).
## Raised concerns
#### It's not obvious when the `language-*` attribute generation will be added or not.
It is added by default. If you want to disable it, you will need to use the `custom` attribute.
#### Why not using HTML in markdown directly then?
Code examples in most languages are likely to contain `<`, `>`, `&` and `"` characters. These characters [require escaping](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/pre) when written inside the `<pre>` element. Using the \`\`\` code blocks allows rustdoc to take care of escaping, which means doc authors can paste code samples directly without manually converting them to HTML.
cc `@poliorcetics`
r? `@notriddle`
Properly consider binder vars in `HasTypeFlagsVisitor`
Given a PolyTraitRef like `for<'a> Ty: Trait` (where neither `Ty` nor `Trait` mention `'a`), we do *not* return true for `.has_type_flags(TypeFlags::HAS_LATE_BOUND)`, even though binders are supposed to act as if they have late-bound vars even if they don't mention them in their bound value: 31ae3b2bdb. This is because we use `HasTypeFlagsVisitor`, which only computes the type flags for `Ty`, `Const` and `Region` and `Predicates`, and we consequently skip any binders (and setting flags for their vars) that are not contained in one of these types.
This ends up causing a problem, because when we call `TyCtxt::erase_regions` (which both erases regions *and* anonymizes bound vars), we will skip such a PolyTraitRef, not anonymizing it, and therefore not making it structurally equal to other binders. This breaks vtable computations.
This PR computes the flags for all binders we enter in `HasTypeFlagsVisitor` if we're looking for `TypeFlags::HAS_LATE_BOUND` (or `TypeFlags::HAS_{RE,TY,CT}_LATE_BOUND`).
Fixes#115807