Accept additional user-defined syntax classes in fenced code blocks
Part of #79483.
This is a re-opening of https://github.com/rust-lang/rust/pull/79454 after a big update/cleanup. I also converted the syntax to pandoc as suggested by `@notriddle:` the idea is to be as compatible as possible with the existing instead of having our own syntax.
## Motivation
From the original issue: https://github.com/rust-lang/rust/issues/78917
> The technique used by `inline-c-rs` can be ported to other languages. It's just super fun to see C code inside Rust documentation that is also tested by `cargo doc`. I'm sure this technique can be used by other languages in the future.
Having custom CSS classes for syntax highlighting will allow tools like `highlight.js` to be used in order to provide highlighting for languages other than Rust while not increasing technical burden on rustdoc.
## What is the feature about?
In short, this PR changes two things, both related to codeblocks in doc comments in Rust documentation:
* Allow to disable generation of `language-*` CSS classes with the `custom` attribute.
* Add your own CSS classes to a code block so that you can use other tools to highlight them.
#### The `custom` attribute
Let's start with the new `custom` attribute: it will disable the generation of the `language-*` CSS class on the generated HTML code block. For example:
```rust
/// ```custom,c
/// int main(void) {
/// return 0;
/// }
/// ```
```
The generated HTML code block will not have `class="language-c"` because the `custom` attribute has been set. The `custom` attribute becomes especially useful with the other thing added by this feature: adding your own CSS classes.
#### Adding your own CSS classes
The second part of this feature is to allow users to add CSS classes themselves so that they can then add a JS library which will do it (like `highlight.js` or `prism.js`), allowing to support highlighting for other languages than Rust without increasing burden on rustdoc. To disable the automatic `language-*` CSS class generation, you need to use the `custom` attribute as well.
This allow users to write the following:
```rust
/// Some code block with `{class=language-c}` as the language string.
///
/// ```custom,{class=language-c}
/// int main(void) {
/// return 0;
/// }
/// ```
fn main() {}
```
This will notably produce the following HTML:
```html
<pre class="language-c">
int main(void) {
return 0;
}</pre>
```
Instead of:
```html
<pre class="rust rust-example-rendered">
<span class="ident">int</span> <span class="ident">main</span>(<span class="ident">void</span>) {
<span class="kw">return</span> <span class="number">0</span>;
}
</pre>
```
To be noted, we could have written `{.language-c}` to achieve the same result. `.` and `class=` have the same effect.
One last syntax point: content between parens (`(like this)`) is now considered as comment and is not taken into account at all.
In addition to this, I added an `unknown` field into `LangString` (the parsed code block "attribute") because of cases like this:
```rust
/// ```custom,class:language-c
/// main;
/// ```
pub fn foo() {}
```
Without this `unknown` field, it would generate in the DOM: `<pre class="language-class:language-c language-c">`, which is quite bad. So instead, it now stores all unknown tags into the `unknown` field and use the first one as "language". So in this case, since there is no unknown tag, it'll simply generate `<pre class="language-c">`. I added tests to cover this.
Finally, I added a parser for the codeblock attributes to make it much easier to maintain. It'll be pretty easy to extend.
As to why this syntax for adding attributes was picked: it's [Pandoc's syntax](https://pandoc.org/MANUAL.html#extension-fenced_code_attributes). Even if it seems clunkier in some cases, it's extensible, and most third-party Markdown renderers are smart enough to ignore Pandoc's brace-delimited attributes (from [this comment](https://github.com/rust-lang/rust/pull/110800#issuecomment-1522044456)).
## Raised concerns
#### It's not obvious when the `language-*` attribute generation will be added or not.
It is added by default. If you want to disable it, you will need to use the `custom` attribute.
#### Why not using HTML in markdown directly then?
Code examples in most languages are likely to contain `<`, `>`, `&` and `"` characters. These characters [require escaping](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/pre) when written inside the `<pre>` element. Using the \`\`\` code blocks allows rustdoc to take care of escaping, which means doc authors can paste code samples directly without manually converting them to HTML.
cc `@poliorcetics`
r? `@notriddle`
Make useless_ptr_null_checks smarter about some std functions
This teaches the `useless_ptr_null_checks` lint that some std functions can't ever return null pointers, because they need to point to valid data, get references as input, etc.
This is achieved by introducing an `#[rustc_never_returns_null_ptr]` attribute and adding it to these std functions (gated behind bootstrap `cfg_attr`).
Later on, the attribute could maybe be used to tell LLVM that the returned pointer is never null. I don't expect much impact of that though, as the functions are pretty shallow and usually the input data is already never null.
Follow-up of PR #113657Fixes#114442
Rework `no_coverage` to `coverage(off)`
As discussed at the tail of https://github.com/rust-lang/rust/issues/84605 this replaces the `no_coverage` attribute with a `coverage` attribute that takes sub-parameters (currently `off` and `on`) to control the coverage instrumentation.
Allows future-proofing for things like `coverage(off, reason="Tested live", issue="#12345")` or similar.
Use `Freeze` for `SourceFile`
This uses the `Freeze` type in `SourceFile` to let accessing `external_src` and `lines` be lock-free.
Behavior of `add_external_src` is changed to set `ExternalSourceKind::AbsentErr` on a hash mismatch which matches the documentation. `ExternalSourceKind::Unneeded` was removed as it's unused.
Based on https://github.com/rust-lang/rust/pull/115401.
`Span` has undergone some changes over the years (addition of an optional
parent, and possible inlining of the context in interned spans) but the
comments and identifiers used haven't kept up. As a result, I find it
harder to understand than I should.
This commit reworks the comments, renames some identifiers, and
restructures the code slightly, all to make things clearer. I now feel
like I understand this code again.
Lint on invalid usage of `UnsafeCell::raw_get` in reference casting
This PR proposes to take into account `UnsafeCell::raw_get` method call for non-Freeze types for the `invalid_reference_casting` lint.
The goal of this is to catch those kind of invalid reference casting:
```rust
fn as_mut<T>(x: &T) -> &mut T {
unsafe { &mut *std::cell::UnsafeCell::raw_get(x as *const _ as *const _) }
//~^ ERROR casting `&T` to `&mut T` is undefined behavior
}
```
r? `@est31`
Make RPITITs capture all in-scope lifetimes
Much like #114616, this implements the lang team decision from this T-lang meeting on [opaque captures strategy moving forward](https://hackmd.io/sFaSIMJOQcuwCdnUvCxtuQ?view). This will be RFC'd soon, but given that RPITITs are a nightly feature, this shouldn't necessarily be blocked on that.
We unconditionally capture all lifetimes in RPITITs -- impl is not as simple as #114616, since we still need to duplicate RPIT lifetimes to make sure we reify any late-bound lifetimes in scope.
Closes#112194
Load include_bytes! directly into an Lrc
This PR deletes an innocent-looking `.into()` that was converting from a `Vec<u8>` to `Lrc<[u8]>`. This has significant runtime and memory overhead when using `include_bytes!` to pull in a large binary file.
Add symbols for Clippy usage
The `arithmetic_side_effects` lint is always "interning" these non-existing symbols related to math operations causing a bit of a slowdown.
Fix races conditions with `SyntaxContext` decoding
This changes `SyntaxContext` decoding to work with concurrent decoding. The `remapped_ctxts` field now only stores `SyntaxContext` which have completed decoding, while the new `decoding` and `local_in_progress` keeps track of `SyntaxContext`s which are in process of being decoding and on which threads.
This fixes 2 issues with the current implementation. It can return an `SyntaxContext` which contains dummy data if another thread starts decoding before the first one has completely finished. Multiple threads could also allocate multiple `SyntaxContext`s for the same `raw_id`.
Parse unnamed fields and anonymous structs or unions (no-recovery)
It is part of #114782 which implements #49804. Only parse anonymous structs or unions in struct field definition positions.
r? `@petrochenkov`
Anonymous structs or unions are only allowed in struct field
definitions.
Co-authored-by: carbotaniuman <41451839+carbotaniuman@users.noreply.github.com>
Fix ABI flags in RISC-V/LoongArch ELF file generated by rustc
Fix#114153
It turns out the current way to set these flags are completely wrong. In LLVM the target ABI is used instead of target features to determine these flags.
Not sure how to write a test though. Or maybe a test isn't necessary because this affects only those touching target json?
r? `@Nilstrieb`
Fix argument removal suggestion around macros
Fixes#112437.
Fixes#113866.
Helps with #114255.
The issue was that `span.find_ancestor_inside(outer)` could previously return a span with a different expansion context from `outer`.
This happens for example for the built-in macro `panic!`, which expands to another macro call of `panic_2021!` or `panic_2015!`. Because the call site of `panic_20xx!` has not associated source code, its span currently points to the call site of `panic!` instead.
Something similar also happens items that get desugared in AST->HIR lowering. For example, `for` loops get two spans: One "inner" span that has the `.desugaring_kind()` kind set to `DesugaringKind::ForLoop` and one "outer" span that does not. Similar to the macro situation, both of these spans point to the same source code, but have different expansion contexts.
This causes problems, because joining two spans with different expansion contexts will usually[^1] not actually join them together to avoid creating "spaghetti" spans that go from the macro definition to the macro call. For example, in the following snippet `full_span` might not actually contain the `adjusted_start` and `adjusted_end`. This caused the broken suggestion / debug ICE in the linked issues.
```rust
let adjusted_start = start.find_ancestor_inside(shared_ancestor);
let adjusted_end = end.find_ancestor_inside(shared_ancestor);
let full_span = adjusted_start.to(adjusted_end)
```
To fix the issue, this PR introduces a new method, `find_ancestor_inside_same_ctxt`, which combines the functionality of `find_ancestor_inside` and `find_ancestor_in_same_ctxt`: It finds an ancestor span that is contained within the parent *and* has the same syntax context, and is therefore safe to extend. This new method should probably be used everywhere, where the returned span is extended, but for now it is just used for the argument removal suggestion.
Additionally, this PR fixes a second issue where the function call itself is inside a macro but the arguments come from outside the macro. The test is added in the first commit to include stderr diff, so this is best reviewed commit by commit.
[^1]: If one expansion context is the root context and the other is not.