rustdoc-search: shard the search result descriptions
## Preview
This makes no visual changes to rustdoc search. It's a pure perf improvement.
<details><summary>old</summary>
Preview: <http://notriddle.com/rustdoc-html-demo-10/doc/std/index.html?search=vec>
WebPageTest Comparison with before branch on a sort of worst case (searching `vec`, winds up downloading most of the shards anyway): <https://www.webpagetest.org/video/compare.php?tests=240317_AiDc61_2EM,240317_AiDcM0_2EN>
Waterfall diagram:
![image](https://github.com/rust-lang/rust/assets/1593513/39548f0c-7ad6-411b-abf8-f6668ff4da18)
</details>
Preview: <http://notriddle.com/rustdoc-html-demo-10/doc2/std/index.html?search=vec>
WebPageTest Comparison with before branch on a sort of worst case (searching `vec`, winds up downloading most of the shards anyway): <https://www.webpagetest.org/video/compare.php?tests=240322_BiDcCH_13R,240322_AiDcJY_104>
![image](https://github.com/rust-lang/rust/assets/1593513/4be1f9ff-c3ff-4b96-8f5b-b264df2e662d)
## Description
r? `@GuillaumeGomez`
The descriptions are, on almost all crates[^1], the majority of the size of the search index, even though they aren't really used for searching. This makes it relatively easy to separate them into their own files.
Additionally, this PR pulls out information about whether there's a description into a bitmap. This allows us to sort, truncate, *then* download.
This PR also bumps us to ES8. Out of the browsers we support, all of them support async functions according to caniuse.
https://caniuse.com/async-functions
[^1]:
<https://microsoft.github.io/windows-docs-rs/>, a crate with
44MiB of pure names and no descriptions for them, is an outlier
and should not be counted. But this PR should improve it, by replacing a long line of empty strings with a compressed bitmap with a single Run section. Just not very much.
## Detailed sizes
```console
$ cat test.sh
set -ex
cp ../search-index*.js search-index.js
awk 'FNR==NR {a++;next} FNR<a-3' search-index.js{,} | awk 'NR>1 {gsub(/\],\\$/,""); gsub(/^\["[^"]+",/,""); print} {next}' | sed -E "s:\\\\':':g" > search-index.json
jq -c '.t' search-index.json > t.json
jq -c '.n' search-index.json > n.json
jq -c '.q' search-index.json > q.json
jq -c '.D' search-index.json > D.json
jq -c '.e' search-index.json > e.json
jq -c '.i' search-index.json > i.json
jq -c '.f' search-index.json > f.json
jq -c '.c' search-index.json > c.json
jq -c '.p' search-index.json > p.json
jq -c '.a' search-index.json > a.json
du -hs t.json n.json q.json D.json e.json i.json f.json c.json p.json a.json
$ bash test.sh
+ cp ../search-index1.78.0.js search-index.js
+ awk 'FNR==NR {a++;next} FNR<a-3' search-index.js search-index.js
+ awk 'NR>1 {gsub(/\],\\$/,""); gsub(/^\["[^"]+",/,""); print} {next}'
+ sed -E 's:\\'\'':'\'':g'
+ jq -c .t search-index.json
+ jq -c .n search-index.json
+ jq -c .q search-index.json
+ jq -c .D search-index.json
+ jq -c .e search-index.json
+ jq -c .i search-index.json
+ jq -c .f search-index.json
+ jq -c .c search-index.json
+ jq -c .p search-index.json
+ jq -c .a search-index.json
+ du -hs t.json n.json q.json D.json e.json i.json f.json c.json p.json a.json
64K t.json
800K n.json
8.0K q.json
4.0K D.json
16K e.json
192K i.json
544K f.json
4.0K c.json
36K p.json
20K a.json
```
These are, roughly, the size of each section in the standard library (this tool actually excludes libtest, for parsing-json-with-awk reasons, but libtest is tiny so it's probably not important).
t = item type, like "struct", "free fn", or "type alias". Since one byte is used for every item, this implies that there are approximately 64 thousand items in the standard library.
n = name, and that's now the largest section of the search index with the descriptions removed from it
q = parent *module* path, stored parallel to the items within
D = the size of each description shard, stored as vlq hex numbers
e = empty description bit flags, stored as a roaring bitmap
i = parent *type* index as a link into `p`, stored as decimal json numbers; used only for associated types; might want to switch to vlq hex, since that's shorter, but that would be a separate pr
f = function signature, stored as lists of lists that index into `p`
c = deprecation flag, stored as a roaring bitmap
p = parent *type*, stored separately and linked into from `i` and `f`
a = alias, as [[key, value]] pairs
## Search performance
http://notriddle.com/rustdoc-html-demo-11/perf-shard/index.html
For example, in stm32f4:
<table><thead><tr><th>before<th>after</tr></thead>
<tbody><tr><td>
```
Testing T -> U ... in_args = 0, returned = 0, others = 200
wall time = 617
Testing T, U ... in_args = 0, returned = 0, others = 200
wall time = 198
Testing T -> T ... in_args = 0, returned = 0, others = 200
wall time = 282
Testing crc32 ... in_args = 0, returned = 0, others = 0
wall time = 426
Testing spi::pac ... in_args = 0, returned = 0, others = 0
wall time = 673
```
</td><td>
```
Testing T -> U ... in_args = 0, returned = 0, others = 200
wall time = 716
Testing T, U ... in_args = 0, returned = 0, others = 200
wall time = 207
Testing T -> T ... in_args = 0, returned = 0, others = 200
wall time = 289
Testing crc32 ... in_args = 0, returned = 0, others = 0
wall time = 418
Testing spi::pac ... in_args = 0, returned = 0, others = 0
wall time = 687
```
</td></tr><tr><td>
```
user: 005.345 s
sys: 002.955 s
wall: 006.899 s
child_RSS_high: 583664 KiB
group_mem_high: 557876 KiB
```
</td><td>
```
user: 004.652 s
sys: 000.565 s
wall: 003.865 s
child_RSS_high: 538696 KiB
group_mem_high: 511724 KiB
```
</td></tr>
</table>
This perf tester is janky and unscientific enough that the apparent differences might just be noise. If it's not an order of magnitude, it's probably not real.
## Future possibilities
* Currently, results are not shown until the descriptions are downloaded. Theoretically, the description-less results could be shown. But actually doing that, and making sure it works properly, would require extra work (we have to be careful to avoid layout jumps).
* More than just descriptions can be sharded this way. But we have to be careful to make sure the size wins are worth the round trips. Ideally, data that’s needed only for display should be sharded while data needed for search isn’t.
* [Full text search](https://internals.rust-lang.org/t/full-text-search-for-rustdoc-and-doc-rs/20427) also needs this kind of infrastructure. A good implementation might store a compressed bloom filter in the search index, then download the full keyword in shards. But, we have to be careful not just of the amount readers have to download, but also of the amount that [publishers](https://gist.github.com/notriddle/c289e77f3ed469d1c0238d1d135d49e1) have to store.
The descriptions are, on almost all crates[^1], the majority
of the size of the search index, even though they aren't really
used for searching. This makes it relatively easy to separate
them into their own files.
This commit also bumps us to ES8. Out of the browsers we support,
all of them support async functions according to caniuse.
https://caniuse.com/async-functions
[^1]:
<https://microsoft.github.io/windows-docs-rs/>, a crate with
44MiB of pure names and no descriptions for them, is an outlier
and should not be counted.
rustdoc: fix up old test
`tests/rustdoc/line-breaks.rs` had several issues:
1. It used `//`@count`` instead of `// `@count`` (notice the space!) which gets treated as a `ui_test` directive instead of a `htmldocck` one. `compiletest` didn't flag it as an error because it's allowlisted ([#121561](https://github.com/rust-lang/rust/pull/121561)) presumably precisely because of this test. And before the compiletest→ui_test migration, these directives must've been ignored, too, because …
2. … the checks themselves no longer work either: The count of `<br>`s is actually 0 in all 3 cases because – well – we no longer generate any `<br>`s inside `<pre>`s.
Since I don't know how to ``@count`` `\n`s instead of `<br>`s, I've turned them into ``@matches`.` Btw, I don't know if this test is still desirable or if we have other tests that cover this (I haven't checked).
r? rustdoc
Expose the Freeze trait again (unstably) and forbid implementing it manually
non-emoji version of https://github.com/rust-lang/rust/pull/121501
cc #60715
This trait is useful for generic constants (associated consts of generic traits). See the test (`tests/ui/associated-consts/freeze.rs`) added in this PR for a usage example. The builtin `Freeze` trait is the only way to do it, users cannot work around this issue.
It's also a useful trait for building some very specific abstrations, as shown by the usage by the `zerocopy` crate: https://github.com/google/zerocopy/issues/941
cc ```@RalfJung```
T-lang signed off on reexposing this unstably: https://github.com/rust-lang/rust/pull/121501#issuecomment-1969827742
[rustdoc] Prevent inclusion of whitespace character after macro_rules ident
Discovered this bug randomly when looking at:
![image](https://github.com/rust-lang/rust/assets/3050060/dca38047-9085-4377-bfac-f98890224be4)
We were too eagerly trying to merge tokens that shouldn't be merged together (for example if you have a code comment followed by a code comment, we merge them in one attribute to reduce the DOM size).
r? ``@notriddle``
Fix link generation for foreign macro in jump to definition feature
The crate name is already added to the link so it shouldn't be added a second time for local foreign macros.
r? ``@notriddle``
rustdoc: fix and refactor HTML rendering a bit
* refactoring: get rid of a bunch of manual `f.alternate()` branches
* not sure why this wasn't done so already, is this perf-sensitive?
* fix an ICE in debug builds of rustdoc
* rustdoc used to crash on empty outlives-bounds: `where 'a:`
* properly escape const generic defaults
* actually print empty trait and outlives-bounds (doesn't work for cross-crate reexports yet, will fix that at some other point) since they can have semantic significance
* outlives-bounds: forces lifetime params to be early-bound instead of late-bound which is technically speaking part of the public API
* trait-bounds: can affect the well-formedness, consider
* makeshift “const-evaluatable” bounds under `generic_const_exprs`
* bounds to force wf-checking in light of #100041 (quite artificial I know, I couldn't figure out something better), see https://github.com/rust-lang/rust/pull/121160#discussion_r1491563816
Add extra indent spaces for rust-playground link
Fixes#120998
Seems add `rustfmt` for this is somehow too heavy,
only adding indent spaces at the starting of each line of code seems good enough.
rustdoc: Correctly handle attribute merge if this is a glob reexport
Fixes#120487.
The regression was introduced in https://github.com/rust-lang/rust/pull/113091. Only non-glob reexports should have been impacted.
cc `````@Nemo157`````
r? `````@notriddle`````
rustdoc: offset generic args of cross-crate trait object types when cleaning
Fixes#119529.
This PR contains several refactorings apart from the bug fix.
Best reviewed commit by commit.
r? GuillaumeGomez
Use diagnostic namespace in stdlib
This required a minor fix to have the diagnostics shown in third party crates when the `diagnostic_namespace` feature is not enabled. See 5d63f5d8d1 for details. I've opted for having a single PR for both changes as it's really not that much code. If it is required it should be easy to split up the change into several PR's.
r? `@compiler-errors`
[rustdoc] Fix invalid handling for static method calls in jump to definition feature
I realized when working on a clippy lint that static method calls on `Self` could not give me the method `Res`. For that, we need to use `typeck` and so that's what I did in here.
It fixes the linking to static method calls.
r? ````@notriddle````
It doesn't look quite right, because the lines are too far apart,
and it's not going to be announced by screenreaders as a menu button,
since that's not what the symbol means.
This adds a real tooltip and uses a better drawing of the icon.
Don't merge cfg and doc(cfg) attributes for re-exports
Fixes#112881.
## Explanations
When re-exporting things with different `cfg`s there are two things that can happen:
* The re-export uses a subset of `cfg`s, this subset is sufficient so that the item will appear exactly with the subset
* The re-export uses a non-subset of `cfg`s (e.g. like the example I posted just above where the re-export is ungated), if the non-subset `cfg`s are active (e.g. compiling that example on windows) then this will be a compile error as the item doesn't exist to re-export, if the subset `cfg`s are active it behaves like 1.
### Glob re-exports?
**This only applies to non-glob inlined re-exports.** For glob re-exports the item may or may not exist to be re-exported (potentially the `cfg`s on the path up until the glob can be removed, and only `cfg`s on the globbed item itself matter), for non-inlined re-exports see https://github.com/rust-lang/rust/issues/85043.
cc `@Nemo157`
r? `@notriddle`
Remove mention of rust to make the error message generic.
The deprecation notice is used when in crates as well. This applies to versions Rust or Crates.
Relates #118148
The deprecation notice is used when in crates as well. This applies to versions Rust or Crates.
Fixes#118148
Signed-off-by: Harold Dost <h.dost@criteo.com>
This is about equally readable, a lot more terse, and stops
special-casing functions and methods.
```console
$ du -hs doc-old/ doc-new/
671M doc-old/
670M doc-new/
```
Rollup of 5 pull requests
Successful merges:
- #113241 (rustdoc: Document lack of object safety on affected traits)
- #117388 (Turn const_caller_location from a query to a hook)
- #117417 (Add a stable MIR visitor)
- #117439 (prepopulate opaque ty storage before using it)
- #117451 (Add support for pre-unix-epoch file dates on Apple platforms (#108277))
r? `@ghost`
`@rustbot` modify labels: rollup
rustdoc: Document lack of object safety on affected traits
Closes#85138
I saw the issue didn't have any recent activity, if there is another MR for it I missed it.
I want the issue to move forward so here is my proposition.
It takes some space just before the "Implementors" section and only if the trait is **not** object
safe since it is the only case where special care must be taken in some cases and this has the
benefit of avoiding generation of HTML in (I hope) the common case.
rustdoc: use JS to inline target type impl docs into alias
Preview docs:
- https://notriddle.com/rustdoc-html-demo-5/js-trait-alias/std/io/type.Result.html
- https://notriddle.com/rustdoc-html-demo-5/js-trait-alias-compiler/rustc_middle/ty/type.PolyTraitRef.html
This pull request also includes a bug fix for trait alias inlining across crates. This means more documentation is generated, and is why ripgrep runs slower (it's a thin wrapper on top of the `grep` crate, so 5% of its docs are now the Result type).
- Before, built with rustdoc 1.75.0-nightly (aa1a71e9e 2023-10-26), Result type alias method docs are missing: http://notriddle.com/rustdoc-html-demo-5/ripgrep-js-nightly/rg/type.Result.html
- After, built with this branch, all the methods on Result are shown: http://notriddle.com/rustdoc-html-demo-5/ripgrep-js-trait-alias/rg/type.Result.html
*Review note: This is mostly just reverting https://github.com/rust-lang/rust/pull/115201. The last commit has the new work in it.*
Fixes#115718
This is an attempt to balance three problems, each of which would
be violated by a simpler implementation:
- A type alias should show all the `impl` blocks for the target
type, and vice versa, if they're applicable. If nothing was
done, and rustdoc continues to match them up in HIR, this
would not work.
- Copying the target type's docs into its aliases' HTML pages
directly causes far too much redundant HTML text to be generated
when a crate has large numbers of methods and large numbers
of type aliases.
- Using JavaScript exclusively for type alias impl docs would
be a functional regression, and could make some docs very hard
to find for non-JS readers.
- Making sure that only applicable docs are show in the
resulting page requires a type checkers. Do not reimplement
the type checker in JavaScript.
So, to make it work, rustdoc stashes these type-alias-inlined docs
in a JSONP "database-lite". The file is generated in `write_shared.rs`,
included in a `<script>` tag added in `print_item.rs`, and `main.js`
takes care of patching the additional docs into the DOM.
The format of `trait.impl` and `type.impl` JS files are superficially
similar. Each line, except the JSONP wrapper itself, belongs to a crate,
and they are otherwise separate (rustdoc should be idempotent). The
"meat" of the file is HTML strings, so the frontend code is very simple.
Links are relative to the doc root, though, so the frontend needs to fix
that up, and inlined docs can reuse these files.
However, there are a few differences, caused by the sophisticated
features that type aliases have. Consider this crate graph:
```text
---------------------------------
| crate A: struct Foo<T> |
| type Bar = Foo<i32> |
| impl X for Foo<i8> |
| impl Y for Foo<i32> |
---------------------------------
|
----------------------------------
| crate B: type Baz = A::Foo<i8> |
| type Xyy = A::Foo<i8> |
| impl Z for Xyy |
----------------------------------
```
The type.impl/A/struct.Foo.js JS file has a structure kinda like this:
```js
JSONP({
"A": [["impl Y for Foo<i32>", "Y", "A::Bar"]],
"B": [["impl X for Foo<i8>", "X", "B::Baz", "B::Xyy"], ["impl Z for Xyy", "Z", "B::Baz"]],
});
```
When the type.impl file is loaded, only the current crate's docs are
actually used. The main reason to bundle them together is that there's
enough duplication in them for DEFLATE to remove the redundancy.
The contents of a crate are a list of impl blocks, themselves
represented as lists. The first item in the sublist is the HTML block,
the second item is the name of the trait (which goes in the sidebar),
and all others are the names of type aliases that successfully match.
This way:
- There's no need to generate these files for types that have no aliases
in the current crate. If a dependent crate makes a type alias, it'll
take care of generating its own docs.
- There's no need to reimplement parts of the type checker in
JavaScript. The Rust backend does the checking, and includes its
results in the file.
- Docs defined directly on the type alias are dropped directly in the
HTML by `render_assoc_items`, and are accessible without JavaScript.
The JSONP file will not list impl items that are known to be part
of the main HTML file already.
[JSONP]: https://en.wikipedia.org/wiki/JSONP
This is an attempt to balance three problems, each of which would
be violated by a simpler implementation:
- A type alias should show all the `impl` blocks for the target
type, and vice versa, if they're applicable. If nothing was
done, and rustdoc continues to match them up in HIR, this
would not work.
- Copying the target type's docs into its aliases' HTML pages
directly causes far too much redundant HTML text to be generated
when a crate has large numbers of methods and large numbers
of type aliases.
- Using JavaScript exclusively for type alias impl docs would
be a functional regression, and could make some docs very hard
to find for non-JS readers.
- Making sure that only applicable docs are show in the
resulting page requires a type checkers. Do not reimplement
the type checker in JavaScript.
So, to make it work, rustdoc stashes these type-alias-inlined docs
in a JSONP "database-lite". The file is generated in `write_shared.rs`,
included in a `<script>` tag added in `print_item.rs`, and `main.js`
takes care of patching the additional docs into the DOM.
The format of `trait.impl` and `type.impl` JS files are superficially
similar. Each line, except the JSONP wrapper itself, belongs to a crate,
and they are otherwise separate (rustdoc should be idempotent). The
"meat" of the file is HTML strings, so the frontend code is very simple.
Links are relative to the doc root, though, so the frontend needs to fix
that up, and inlined docs can reuse these files.
However, there are a few differences, caused by the sophisticated
features that type aliases have. Consider this crate graph:
```text
---------------------------------
| crate A: struct Foo<T> |
| type Bar = Foo<i32> |
| impl X for Foo<i8> |
| impl Y for Foo<i32> |
---------------------------------
|
----------------------------------
| crate B: type Baz = A::Foo<i8> |
| type Xyy = A::Foo<i8> |
| impl Z for Xyy |
----------------------------------
```
The type.impl/A/struct.Foo.js JS file has a structure kinda like this:
```js
JSONP({
"A": [["impl Y for Foo<i32>", "Y", "A::Bar"]],
"B": [["impl X for Foo<i8>", "X", "B::Baz", "B::Xyy"], ["impl Z for Xyy", "Z", "B::Baz"]],
});
```
When the type.impl file is loaded, only the current crate's docs are
actually used. The main reason to bundle them together is that there's
enough duplication in them for DEFLATE to remove the redundancy.
The contents of a crate are a list of impl blocks, themselves
represented as lists. The first item in the sublist is the HTML block,
the second item is the name of the trait (which goes in the sidebar),
and all others are the names of type aliases that successfully match.
This way:
- There's no need to generate these files for types that have no aliases
in the current crate. If a dependent crate makes a type alias, it'll
take care of generating its own docs.
- There's no need to reimplement parts of the type checker in
JavaScript. The Rust backend does the checking, and includes its
results in the file.
- Docs defined directly on the type alias are dropped directly in the
HTML by `render_assoc_items`, and are accessible without JavaScript.
The JSONP file will not list impl items that are known to be part
of the main HTML file already.
[JSONP]: https://en.wikipedia.org/wiki/JSONP
This is shorter, avoids potential conflicts with a crate
named `implementors`[^1], and will be less confusing when JS
include files are added for type aliases.
[^1]: AFAIK, this couldn't actually cause any problems right now,
but it's simpler just to make it impossible than relying on never
having a file named `trait.Foo.js` in the crate data area.
rustdoc: hide `#[repr(transparent)]` if it isn't part of the public ABI
Fixes#90435.
This hides `#[repr(transparent)]` when the non-1-ZST field the struct is "transparent" over is private.
CC `@RalfJung`
Tentatively nominating it for the release notes, feel free to remove the nomination.
`@rustbot` label needs-fcp relnotes A-rustdoc-ui