nordic-dev.net/rust - rust

mirror of https://github.com/rust-lang/rust.git synced 2024-12-03 04:04:06 +00:00

Author	SHA1	Message	Date
Michael Howell	0ea58e2346	rustdoc-search: count path edits with separate edit limit Since the two are counted separately elsewhere, they should get their own limits, too. The biggest problem with combining them is that paths are loosely checked by not requiring every component to match, which means that if they are short and matched loosely, they can easily find "drunk typist" matches that make no sense, like this old result: std::collections::btree_map::itermut matching slice::itermut maxEditDistance = ("slice::itermut".length) / 3 = 14 / 3 = 4 editDistance("std", "slice") = 4 editDistance("itermut", "itermut") = 0 4 + 0 <= 4 PASS Of course, `slice::itermut` should not match stuff from btreemap. `slice` should not match `std`. The new result counts them separately: maxPathEditDistance = "slice".length / 3 = 5 / 3 = 1 maxEditDistance = "itermut".length / 3 = 7 / 3 = 2 editDistance("std", "slice") = 4 4 <= 1 FAIL Effectively, this makes path queries less "typo-resistant". It's not zero, but it means `vec` won't match the `v1` prelude. Queries without parent paths are unchanged.	2023-12-26 18:46:17 -07:00
Michael Howell	6b69ebcae0	rustdoc-search: remove parallel searchWords array This might have made sense if the algorithm could use `searchWords` to skip having to look at `searchIndex`, but since it always does a substring check on both the stock word and the normalizedName, it doesn't seem to help performance anyway.	2023-12-15 16:26:35 -07:00
Michael Howell	9a9695a052	rustdoc-search: use set ops for ranking and filtering This commit adds ranking and quick filtering to type-based search, improving performance and having it order results based on their type signatures. Motivation ---------- If I write a query like `str -> String`, a lot of functions come up. That's to be expected, but `String::from_str` should come up on top, and it doesn't right now. This is because the sorting algorithm is based on the functions name, and doesn't consider the type signature at all. `slice::join` even comes up above it! To fix this, the sorting should take into account the function's signature, and the closer match should come up on top. Guide-level description ----------------------- When searching by type signature, types with a "closer" match will show up above types that match less precisely. Reference-level explanation --------------------------- Functions signature search works in three major phases: * A compact "fingerprint," based on the [bloom filter] technique, is used to check for matches and to estimate the distance. It sometimes has false positive matches, but it also operates on 128 bit contiguous memory and requires no backtracking, so it performs a lot better than real unification. The fingerprint represents the set of items in the type signature, but it does not represent nesting, and it ignores when the same item appears more than once. The result is rejected if any query bits are absent in the function, or if the distance is higher than the current maximum and 200 results have already been found. * The second step performs unification. This is where nesting and true bag semantics are taken into account, and it has no false positives. It uses a recursive, backtracking algorithm. The result is rejected if any query elements are absent in the function. [bloom filter]: https://en.wikipedia.org/wiki/Bloom_filter Drawbacks --------- This makes the code bigger. More than that, this design is a subtle trade-off. It makes the cases I've tested against measurably faster, but it's not clear how well this extends to other crates with potentially more functions and fewer types. The more complex things get, the more important it is to gather a good set of data to test with (this is arguably more important than the actual benchmarking ifrastructure right now). Rationale and alternatives -------------------------- Throwing a bloom filter in front makes it faster. More than that, it tries to take a tactic where the system can not only check for potential matches, but also gets an accurate distance function without needing to do unification. That way it can skip unification even on items that have the needed elems, as long as they have more items than the currently found maximum. If I didn't want to be able to cheaply do set operations on the fingerprint, a [cuckoo filter] is supposed to have better performance. But the nice bit-banging set intersection doesn't work AFAIK. I also looked into [minhashing], but since it's actually an unbiased estimate of the similarity coefficient, I'm not sure how it could be used to skip unification (I wouldn't know if the estimate was too low or too high). This function actually uses the number of distinct items as its "distance function." This should give the same results that it would have gotten from a Jaccard Distance $1-\frac{\|F\cap{}Q\|}{\|F\cup{}Q\|}$, while being cheaper to compute. This is because: * The function $F$ must be a superset of the query $Q$, so their union is just $F$ and the intersection is $Q$ and it can be reduced to $1-\frac{\|Q\|}{\|F\|}. * There are no magic thresholds. These values are only being used to compare against each other while sorting (and, if 200 results are found, to compare with the maximum match). This means we only care if one value is bigger than the other, not what it's actual value is, and since $Q$ is the same for everything, it can be safely left out, reducing the formula to $1-\frac{1}{\|F\|} = \frac{\|F\|}{\|F\|}-\frac{1}{\|F\|} = \|F\|-1$. And, since the values are only being compared with each other, $\|F\|$ is fine. Prior art --------- This is significantly different from how Hoogle does it. It doesn't account for order, and it has no special account for nesting, though `Box<t>` is still two items, while `t` is only one. This should give the same results that it would have gotten from a Jaccard Distance $1-\frac{\|A\cap{}B\|}{\|A\cup{}B\|}$, while being cheaper to compute. Unresolved questions -------------------- `[]` and `()`, the slice/array and tuple/union operators, are ignored while building the signature for the query. This is because they match more than one thing, making them ambiguous. Unfortunately, this also makes them a performance cliff. Is this likely to be a problem? Right now, the system just stashes the type distance into the same field that levenshtein distance normally goes in. This means exact query matches show up on top (for example, if you have a function like `fn nothing(a: Nothing, b: i32)`, then searching for `nothing` will show it on top even if there's another function with `fn bar(x: Nothing)` that's technically a closer match in type signature. Future possibilities -------------------- It should be possible to adopt more sorting criteria to act as a tie breaker, which could be determined during unification. [cuckoo filter]: https://en.wikipedia.org/wiki/Cuckoo_filter [minhashing]: https://en.wikipedia.org/wiki/MinHash	2023-12-13 10:37:15 -07:00
Michael Howell	7162cb9550	rustdoc-search: fix fast path unboxing bindings	2023-12-10 20:53:53 -07:00
Michael Howell	92b84f849a	rustdoc-search: do not treat associated type names as types Before: http://notriddle.com/rustdoc-html-demo-6/tor-before/tor_config/ After: http://notriddle.com/rustdoc-html-demo-6/tor-after/tor_config/ Profile: http://notriddle.com/rustdoc-html-demo-6/tor-profile/ As a bit of background information: in type-based queries, a type name that does not exist gets treated as a generic type variable. This causes a counterintuitive behavior in the `tor_config` crate, which has a trait with an associated type variable called `T`. This isn't a searchable concrete type, but its name still gets stored in the typeNameIdMap, as a convenient way to intern its name.	2023-12-10 16:52:21 -07:00
Michael Howell	1b7b9540fe	rustdoc-search: avoid infinite where clause unbox Fixes #118242	2023-11-24 10:42:11 -07:00
Michael Howell	63c50712f4	rustdoc-search: add support for associated types	2023-11-19 18:54:36 -07:00
Michael Howell	a66972d551	rustdoc-search: fix accidental shared, mutable map	2023-11-17 18:22:31 -07:00
Guillaume Gomez	efac0b9c02	Add regression test for #115480	2023-10-11 11:41:39 +02:00
Guillaume Gomez	4be9cfabf2	Rollup merge of #109422 - notriddle:notriddle/impl-disambiguate-search, r=GuillaumeGomez rustdoc-search: add impl disambiguator to duplicate assoc items Preview (to see the difference, click the link and pay attention to the specific function that comes up): \| Before \| After \| \|--\|--\| \| [`simd<i64>, simd<i64> -> simd<i64>`](https://doc.rust-lang.org/nightly/std/?search=simd%3Ci64%3E%2C%20simd%3Ci64%3E%20-%3E%20simd%3Ci64%3E) \| [`simd<i64>, simd<i64> -> simd<i64>`](https://notriddle.com/rustdoc-demo-html-3/impl-disambiguate-search/std/index.html?search=simd%3Ci64%3E%2C%20simd%3Ci64%3E%20-%3E%20simd%3Ci64%3E) \| \| [`cow, vec -> bool`](https://doc.rust-lang.org/nightly/std/?search=cow%2C%20vec%20-%3E%20bool) \| [`cow, vec -> bool`](https://notriddle.com/rustdoc-demo-html-3/impl-disambiguate-search/std/index.html?search=cow%2C%20vec%20-%3E%20bool) Helps with #90929 This changes the search results, specifically, when there's more than one impl with an associated item with the same name. For example, the search queries `simd<i8> -> simd<i8>` and `simd<i64> -> simd<i64>` don't link to the same function, but most of the functions have the same names. This change should probably be FCP-ed, especially since it adds a new anchor link format for `main.js` to handle, so that URLs like `struct.Vec.html#impl-AsMut<[T]>-for-Vec<T,+A>/method.as_mut` redirect to `struct.Vec.html#method.as_mut-2`. It's a strange design, but there are a few reasons for it: * I'd like to avoid making the HTML bigger. Obviously, fixing this bug is going to add at least a little more data to the search index, but adding more HTML penalises viewers for the benefit of searchers. * Breaking `struct.Vec.html#method.len` would also be a disappointment. On the other hand: * The path-style anchors might be less prone to link rot than the numbered anchors. It's definitely less likely to have URLs that appear to "work", but silently point at the wrong thing. * This commit arranges the path-style anchor to redirect to the numbered anchor. Nothing stops rustdoc from doing the opposite, making path-style anchors the default and redirecting the "legacy" numbered ones. ### The bug On the "Before" links, this example search calls for `i64`: ![image](https://github.com/rust-lang/rust/assets/1593513/9431d89d-41dc-4f68-bbb1-3e2704a973d2) But if I click any of the results, I get `f64` instead. ![image](https://github.com/rust-lang/rust/assets/1593513/6d89c692-1847-421a-84d9-22e359d9cf82) The PR fixes this problem by adding enough information to the search result `href` to disambiguate methods with different types but the same name. More detailed description of the problem at: https://github.com/rust-lang/rust/pull/109422#issuecomment-1491089293 > When a struct/enum/union has multiple impls with different type parameters, it can have multiple methods that have the same name, but which are on different impls. Besides Simd, [Any](https://doc.rust-lang.org/nightly/std/any/trait.Any.html?search=any%3A%3Adowncast) also demonstrates this pattern. It has three methods named `downcast`, on three different impls. > > When that happens, it presents a challenge in linking to the method. Normally we link like `#method.foo`. When there are multiple `foo`, we number them like `#method.foo`, `#method.foo-1`, `#method.foo-2`, etc. > > It also presents a challenge for our search code. Currently we store all the variants in the index, but don’t have any way to generate unambiguous URLs in the results page, or to distinguish them in the SERP. > > To fix this, we need three things: > > 1. A fragment format that fully specifies the impl type parameters when needed to disambiguate (`#impl-SimdOrd-for-Simd<i64,+LANES>/method.simd_max`) > 2. A search index that stores methods with enough information to disambiguate the impl they were on. > 3. A search results interface that can display multiple methods on the same type with the same name, when appropriate OR a disambiguation landing section on item pages? > > For reviewers: it can be hard to see the new fragment format in action since it immediately gets rewritten to the numbered form.	2023-10-10 18:44:43 +02:00
Michael Howell	1eb2a76641	rustdoc-search: fix bug with multi-item impl trait	2023-10-05 22:32:37 -07:00
Michael Howell	3fbfe2bca5	rustdoc-search: add impl disambiguator to duplicate assoc items Helps with #90929 This changes the search results, specifically, when there's more than one impl with an associated item with the same name. For example, the search queries `simd<i8> -> simd<i8>` and `simd<i64> -> simd<i64>` don't link to the same function, but most of the functions have the same names. This change should probably be FCP-ed, especially since it adds a new anchor link format for `main.js` to handle, so that URLs like `struct.Vec.html#impl-AsMut<[T]>-for-Vec<T,+A>/method.as_mut` redirect to `struct.Vec.html#method.as_mut-2`. It's a strange design, but there are a few reasons for it: * I'd like to avoid making the HTML bigger. Obviously, fixing this bug is going to add at least a little more data to the search index, but adding more HTML penalises viewers for the benefit of searchers. * Breaking `struct.Vec.html#method.len` would also be a disappointment. On the other hand: * The path-style anchors might be less prone to link rot than the numbered anchors. It's definitely less likely to have URLs that appear to "work", but silently point at the wrong thing. * This commit arranges the path-style anchor to redirect to the numbered anchor. Nothing stops rustdoc from doing the opposite, making path-style anchors the default and redirecting the "legacy" numbered ones.	2023-09-21 15:16:44 -07:00
Michael Howell	269cb57947	rustdoc-search: fix bugs when unboxing and reordering combine	2023-09-09 16:58:37 -07:00
Michael Howell	6068850008	rustdoc: fix test case for generics that look like names	2023-09-03 13:06:08 -07:00
Michael Howell	0b3c617ec0	rustdoc-search: add support for type parameters When writing a type-driven search query in rustdoc, specifically one with more than one query element, non-existent types become generic parameters instead of auto-correcting (which is currently only done for single-element queries) or giving no result. You can also force a generic type parameter by writing `generic:T` (and can force it to not use a generic type parameter with something like `struct:T` or whatever, though if this happens it means the thing you're looking for doesn't exist and will give you no results). There is no syntax provided for specifying type constraints for generic type parameters. When you have a generic type parameter in a search query, it will only match up with generic type parameters in the actual function, not concrete types that match, not concrete types that implement a trait. It also strictly matches based on when they're the same or different, so `option<T>, option<U> -> option<U>` matches `Option::and`, but not `Option::or`. Similarly, `option<T>, option<T> -> option<T>`` matches `Option::or`, but not `Option::and`.	2023-09-03 13:06:06 -07:00
Guillaume Gomez	e161fa1a6b	Correctly handle paths from foreign items	2023-09-02 23:04:37 +02:00
Guillaume Gomez	05eda41658	Add tests for type-based search	2023-09-01 15:16:11 +02:00
bors	314c39d2ea	Auto merge of #112233 - notriddle:notriddle/search-unify, r=GuillaumeGomez rustdoc-search: clean up type unification and "unboxing" This PR redesigns parameter matching, return matching, and generics matching to use a single function that compares two lists of types. It also makes the algorithms more consistent, so the "unboxing" behavior where `Vec<i32>` is considered a match for `i32` works inside generics, and not just at the top level.	2023-06-15 03:04:46 +00:00
Michael Howell	db277f5284	rustdoc-search: search never type with `!` This feature extends rustdoc to support the syntax that most users will naturally attempt to use to search for diverging functions. Part of #60485 It's already possible to do this search with `primitive:never`, but that's not what the Rust language itself uses, so nobody will try it if they aren't told or helped along.	2023-06-12 17:30:23 -07:00
Michael Howell	94badbe599	rustdoc-search: fix order-independence bug	2023-06-11 18:57:33 -07:00
Michael Howell	9946d67579	rustdoc-search: build args, return, and generics on one unifier This enhances generics with the "unboxing" behavior where A<T> matches T. It makes this unboxing transitive over generics.	2023-06-11 18:19:37 -07:00
Michael Howell	d3a4cd6813	rustdoc: add note about slice/array searches to help popup	2023-06-10 14:08:26 -07:00
Michael Howell	2e569274d3	rustdoc: search for slices and arrays by type with `[]` Part of #60485	2023-06-10 13:52:54 -07:00
Guillaume Gomez	9803651ee8	Update rustdoc-js* format	2023-06-09 17:00:47 +02:00
Yuki Okushi	30220be929	Rollup merge of #110780 - notriddle:notriddle/slice-index, r=GuillaumeGomez rustdoc-search: add slices and arrays to index This indexes them as primitives with generics, so `slice<u32>` is how you search for `[u32]`, and `array<u32>` for `[u32; 1]`. A future commit will desugar the square bracket syntax to search both arrays and slices at once.	2023-05-06 09:09:31 +09:00
Michael Howell	c4e00f7bd5	rustdoc-search: add slices and arrays to index This indexes them as primitives with generics, so `slice<u32>` is how you search for `[u32]`, and `array<u32>` for `[u32; 1]`. A future commit will desugar the square bracket syntax to search both arrays and slices at once.	2023-04-24 12:14:35 -07:00
Michael Howell	4c11822aeb	rustdoc: restructure type search engine to pick-and-use IDs This change makes it so, instead of mixing string distance with type unification, function signature search works by mapping names to IDs at the start, reporting to the user any cases where it had to make corrections, and then matches with IDs when going through the items. This only changes function searches. Name searches are left alone, and corrections are only done when there's a single item in the search query.	2023-04-17 12:16:54 -07:00
Michael Howell	afee2411e3	rustdoc-search: add support for nested generics	2023-04-14 14:55:45 -07:00
Michael Howell	e600c0ba0e	rustdoc: add support for type filters in arguments and generics This makes sense, since the search index has the information in it, and it's more useful for function signature searches since a function signature search's item type is, by definition, some type of function (there's more than one, but not very many).	2023-03-20 22:41:57 -07:00
Michael Howell	5451fe7d7c	rustdoc: implement bag semantics for function parameter search This tweak to the function signature search engine makes things so that, if a type is repeated in the search query, it'll only match if the function actually includes it that many times.	2023-03-19 18:19:24 -07:00
Michael Howell	44813e038c	rustdoc: fix type search when more than one `where` clause applies	2023-03-07 11:37:04 -07:00
Michael Howell	a6446c53fe	rustdoc: fix type search index for `fn<T>() -> &T where T: Trait`	2023-03-07 11:20:49 -07:00
Michael Howell	9d27028391	rustdoc: function signature search with traits in `where` clause	2023-03-04 09:05:57 -07:00
Michael Howell	4de9c6d491	rustdoc: search by macro when query ends with `!` Related to #96399	2023-02-16 18:16:09 -07:00
Michael Howell	39fd4bb476	rustdoc: update test cases to match with stricter match criteria	2023-01-21 00:11:39 -07:00
Michael Howell	e09e6df787	rustdoc: compute maximum Levenshtein distance based on the query The heuristic is pretty close to the name resolver. Fixes #103357	2023-01-21 00:11:39 -07:00
Michael Howell	db558b4686	rustdoc: update search test cases	2023-01-14 12:04:12 -07:00
Albert Larsan	cf2dff2b1e	Move /src/test to /tests	2023-01-11 09:32:08 +00:00

38 Commits