rust/src at c4168fdb50bd3d50a1729ae9af3ca4921841c35a - rust

mirror of https://github.com/rust-lang/rust.git synced 2025-05-08 07:57:40 +00:00


bors 8c4fc9d9a4 Auto merge of #94598 - scottmcm:prefix-free-hasher-methods, r=Amanieu		2022-05-06 09:43:57 +00:00

Add a dedicated length-prefixing method to `Hasher`

This accomplishes two main goals:
- Make it clear who is responsible for prefix-freedom, including how they should do it
- Make it feasible for a `Hasher` that doesn't care about Hash-DoS resistance to get better performance by not hashing lengths

This does not change rustc-hash, since that's in an external crate, but that could potentially use it in future.

Fixes #94026

r? rust-lang/libs

---

The core of this change is the following two new methods on `Hasher`:

```rust
pub trait Hasher {
    /// Writes a length prefix into this hasher, as part of being prefix-free.
    ///
    /// If you're implementing [`Hash`] for a custom collection, call this before
    /// writing its contents to this `Hasher`. That way
    /// `(collection![1, 2, 3], collection![4, 5])` and
    /// `(collection![1, 2], collection![3, 4, 5])` will provide different
    /// sequences of values to the `Hasher`.
    ///
    /// The `impl<T> Hash for [T]` includes a call to this method, so if you're
    /// hashing a slice (or array or vector) via its `Hash::hash` method,
    /// you should not call this yourself.
    ///
    /// This method is only for providing domain separation. If you want to
    /// hash a `usize` that represents part of the data, then it's important
    /// that you pass it to [`Hasher::write_usize`] instead of to this method.
    ///
    /// # Examples
    ///
    /// ```
    /// #![feature(hasher_prefixfree_extras)]
    /// # // Stubs to make the `impl` below pass the compiler
    /// # struct MyCollection<T>(Option<T>);
    /// # impl<T> MyCollection<T> {
    /// #     fn len(&self) -> usize { todo!() }
    /// # }
    /// # impl<'a, T> IntoIterator for &'a MyCollection<T> {
    /// #     type Item = T;
    /// #     type IntoIter = std::iter::Empty<T>;
    /// #     fn into_iter(self) -> Self::IntoIter { todo!() }
    /// # }
    ///
    /// use std::hash::{Hash, Hasher};
    /// impl<T: Hash> Hash for MyCollection<T> {
    ///     fn hash<H: Hasher>(&self, state: &mut H) {
    ///         state.write_length_prefix(self.len());
    ///         for elt in self {
    ///             elt.hash(state);
    ///         }
    ///     }
    /// }
    /// ```
    ///
    /// # Note to Implementers
    ///
    /// If you've decided that your `Hasher` is willing to be susceptible to
    /// Hash-DoS attacks, then you might consider skipping hashing some or all
    /// of the `len` provided in the name of increased performance.
    #[inline]
    #[unstable(feature = "hasher_prefixfree_extras", issue = "88888888")]
    fn write_length_prefix(&mut self, len: usize) {
        self.write_usize(len);
    }

    /// Writes a single `str` into this hasher.
    ///
    /// If you're implementing [`Hash`], you generally do not need to call this,
    /// as the `impl Hash for str` does, so you can just use that.
    ///
    /// This includes the domain separator for prefix-freedom, so you should
    /// not call `Self::write_length_prefix` before calling this.
    ///
    /// # Note to Implementers
    ///
    /// The default implementation of this method includes a call to
    /// [`Self::write_length_prefix`], so if your implementation of `Hasher`
    /// doesn't care about prefix-freedom and you've thus overridden
    /// that method to do nothing, there's no need to override this one.
    ///
    /// This method is available to be overridden separately from the others
    /// as `str` being UTF-8 means that it never contains `0xFF` bytes, which
    /// can be used to provide prefix-freedom cheaper than hashing a length.
    ///
    /// For example, if your `Hasher` works byte-by-byte (perhaps by accumulating
    /// them into a buffer), then you can hash the bytes of the `str` followed
    /// by a single `0xFF` byte.
    ///
    /// If your `Hasher` works in chunks, you can also do this by being careful
    /// about how you pad partial chunks. If the chunks are padded with `0x00`
    /// bytes then just hashing an extra `0xFF` byte doesn't necessarily
    /// provide prefix-freedom, as `"ab"` and `"ab\u{0}"` would likely hash
    /// the same sequence of chunks. But if you pad with `0xFF` bytes instead,
    /// ensuring at least one padding byte, then it can often provide
    /// prefix-freedom cheaper than hashing the length would.
    #[inline]
    #[unstable(feature = "hasher_prefixfree_extras", issue = "88888888")]
    fn write_str(&mut self, s: &str) {
        self.write_length_prefix(s.len());
        self.write(s.as_bytes());
    }
}
```

The `Hash` implementations for slices and containers are updated to call `write_length_prefix` instead of `write_usize`.

`write_str` defaults to using `write_length_prefix` since, as was pointed out in the issue, the `write_u8(0xFF)` approach is insufficient for hashers that work in chunks, as those would hash `"a\u{0}"` and `"a"` to the same thing. But since `SipHash` works byte-wise (there's an internal buffer to accumulate bytes until a full chunk is available), it overrides `write_str` to continue to use the add-non-UTF-8-byte approach.

---

Compatibility:

Because the default implementation of `write_length_prefix` calls `write_usize`, the changed hash implementation for slices will do the same thing the old one did on existing `Hasher`s.
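The two failure modes in the commit message (missing length prefixes, and a `0xFF` terminator combined with `0x00` chunk padding) can be reproduced with a minimal sketch. It assumes stable Rust, where `write_length_prefix` and `write_str` are still gated behind `hasher_prefixfree_extras`, so it calls `write_usize` directly; that is what the default `write_length_prefix` does. `Recorder`, `MyCollection`, `ChunkRecorder`, `feed`, and `str_then_ff` are hypothetical names for illustration only. `ChunkRecorder` is a toy that pads every `write` call separately, which is not how `SipHasher128` behaves.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical "hasher" that does no mixing: it records the exact byte
/// sequence it is fed, so two inputs collide iff their sequences match.
#[derive(Default)]
struct Recorder(Vec<u8>);

impl Hasher for Recorder {
    fn write(&mut self, bytes: &[u8]) {
        self.0.extend_from_slice(bytes);
    }
    fn finish(&self) -> u64 {
        0 // placeholder: only the recorded bytes matter in this sketch
    }
}

/// Hypothetical collection; `prefix` toggles whether its length is hashed.
struct MyCollection<'a> {
    items: &'a [u8],
    prefix: bool,
}

impl<'a> Hash for MyCollection<'a> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        if self.prefix {
            // Stand-in for the unstable `write_length_prefix`, whose
            // default implementation forwards to `write_usize`.
            state.write_usize(self.items.len());
        }
        for item in self.items {
            item.hash(state); // `u8::hash` calls `write_u8`
        }
    }
}

/// Bytes fed to the recorder when hashing the tuple `(a, b)`.
fn feed(a: &[u8], b: &[u8], prefix: bool) -> Vec<u8> {
    let mut rec = Recorder::default();
    (MyCollection { items: a, prefix }, MyCollection { items: b, prefix }).hash(&mut rec);
    rec.0
}

/// Hypothetical chunked hasher that pads *each* `write` call on its own:
/// it appends at least one `pad` byte, then more until the buffer length
/// is a multiple of 8, and keeps all padded chunks back to back.
struct ChunkRecorder {
    pad: u8,
    chunks: Vec<u8>,
}

impl Hasher for ChunkRecorder {
    fn write(&mut self, bytes: &[u8]) {
        self.chunks.extend_from_slice(bytes);
        self.chunks.push(self.pad); // guarantee at least one padding byte
        while self.chunks.len() % 8 != 0 {
            self.chunks.push(self.pad);
        }
    }
    fn finish(&self) -> u64 {
        0
    }
}

/// Hashes `s` the "byte-wise" way: its bytes, then a lone `0xFF` terminator.
fn str_then_ff(s: &str, pad: u8) -> Vec<u8> {
    let mut h = ChunkRecorder { pad, chunks: Vec::new() };
    h.write(s.as_bytes());
    h.write_u8(0xFF); // default `write_u8` forwards to `write(&[0xFF])`
    h.chunks
}
```

For example, `feed(&[1, 2, 3], &[4, 5], false)` and `feed(&[1, 2], &[3, 4, 5], false)` both yield `[1, 2, 3, 4, 5]`, while passing `true` makes the two sequences differ. Likewise, `str_then_ff("a", 0x00)` equals `str_then_ff("a\u{0}", 0x00)`, because the trailing `0x00` byte is swallowed by the zero padding. With `0xFF` padding the two stay distinct, since valid UTF-8 never contains `0xFF`.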
base_n	mv compiler to compiler/	2020-08-30 18:45:07 +03:00
binary_search_util	Adopt let else in more places	2022-02-19 17:27:43 +01:00
fingerprint	Make `Fingerprint::combine_commutative` associative	2022-01-03 19:07:29 +01:00
flock	separate flock implementations into separate modules	2022-04-14 18:30:53 -04:00
graph	Avoid exhausting stack space in dominator compression	2022-02-23 16:07:56 -05:00
intern	Rename `PtrKey` as `Interned` and improve it.	2022-02-15 15:50:29 +11:00
obligation_forest	obligation forest docs	2022-02-21 12:00:26 +01:00
owning_ref	Also fix “a `OwningRef`”	2021-08-24 02:28:38 +02:00
sip128	SipHasher128: improve constant names and add more comments	2020-10-11 23:48:35 -07:00
small_c_str	mv compiler to compiler/	2020-08-30 18:45:07 +03:00
small_str	Add SmallStr	2022-03-04 16:57:34 +01:00
snapshot_map	Call the method fork instead of clone and add proper comments	2022-02-14 12:57:20 -03:00
sorted_map	Remove invalid #[cfg(tests)] in index_map	2022-03-04 11:34:50 +01:00
sso	compiler: fix some typos	2022-03-01 20:02:47 +08:00
stable_hasher	Fix `isize` optimization in `StableHasher` for big-endian architectures	2022-02-03 11:47:41 +01:00
tagged_ptr	Small performance tweaks	2021-12-12 12:35:01 +08:00
thin_vec	Replace usages of vec![].into_iter with [].into_iter	2022-01-09 14:09:25 +11:00
tiny_list	Move some test-only code to test files	2021-03-17 10:31:30 -04:00
transitive_relation	Spellchecking some comments	2022-03-30 01:39:38 -04:00
vec_map	Replace usages of vec![].into_iter with [].into_iter	2022-01-09 14:09:25 +11:00
atomic_ref.rs	mv compiler to compiler/	2020-08-30 18:45:07 +03:00
base_n.rs	Apply clippy suggestions	2021-10-10 15:38:19 +02:00
captures.rs	Remove `#[allow(unused_lifetimes)]` which is now unnecessary	2021-06-17 08:56:54 +09:00
fingerprint.rs	Provide copy-free access to raw Decoder bytes	2022-02-22 18:11:59 -05:00
flock.rs	separate flock implementations into separate modules	2022-04-14 18:30:53 -04:00
frozen.rs	mv compiler to compiler/	2020-08-30 18:45:07 +03:00
functor.rs	Make IdFunctor::try_map_id panic-safe	2021-12-07 11:11:23 +00:00
fx.rs	mv compiler to compiler/	2020-08-30 18:45:07 +03:00
intern.rs	Document and rename the new wrapper type	2022-04-07 13:01:48 +00:00
jobserver.rs	datastructures: replace `lazy_static` by `SyncLazy` from std	2020-09-01 22:06:47 +01:00
lib.rs	Auto merge of #94598 - scottmcm:prefix-free-hasher-methods, r=Amanieu	2022-05-06 09:43:57 +00:00
macros.rs	Introduce `ChunkedBitSet` and use it for some dataflow analyses.	2022-02-23 10:18:49 +11:00
map_in_place.rs	Add debug assertions to some unsafe functions	2022-03-29 11:05:24 -04:00
memmap.rs	Add safety comment to StableAddress impl for Mmap	2021-04-03 14:51:05 +02:00
profiling.rs	add `generic_activity_with_arg_recorder` to the self-profiler	2022-04-07 15:47:20 +02:00
sharded.rs	Move Sharded maps into each QueryCache impl	2022-02-20 12:10:46 -05:00
sip128.rs	Add a dedicated length-prefixing method to `Hasher`	2022-05-06 00:03:38 -07:00
small_c_str.rs	Inline SmallCStr::deref	2022-03-04 16:57:34 +01:00
small_str.rs	Add SmallStr	2022-03-04 16:57:34 +01:00
sorted_map.rs	Use SortedMap in HIR.	2021-10-21 23:08:57 +02:00
stable_hasher.rs	Add a dedicated length-prefixing method to `Hasher`	2022-05-06 00:03:38 -07:00
stable_map.rs	mv compiler to compiler/	2020-08-30 18:45:07 +03:00
stable_set.rs	mv compiler to compiler/	2020-08-30 18:45:07 +03:00
stack.rs	Allow inlining of ensure_sufficient_stack()	2022-02-12 11:30:04 +01:00
steal.rs	more clippy fixes	2021-11-07 16:59:05 +01:00
svh.rs	Make `Decodable` and `Decoder` infallible.	2022-01-22 10:38:31 +11:00
sync.rs	Fix typos “a”→“an”	2021-08-22 15:35:11 +02:00
tagged_ptr.rs	Miscellaneous inlining improvements	2021-06-02 08:49:58 +02:00
temp_dir.rs	Capitalize safety comments	2020-09-08 22:37:18 -04:00
thin_vec.rs	Rustdoc: use ThinVec for GenericArgs bindings	2022-01-01 11:29:14 +01:00
tiny_list.rs	Apply clippy suggestions	2021-10-10 15:38:19 +02:00
transitive_relation.rs	add `#[rustc_pass_by_value]` to more types	2022-03-08 15:39:52 +01:00
unhash.rs	Avoid rehashing Fingerprint as a map key	2020-09-01 18:27:02 -07:00
vec_linked_list.rs	Stop enabling `in_band_lifetimes` in rustc_data_structures	2021-12-05 20:17:35 -08:00
vec_map.rs	Fix some fallout around type alias impl trait in associated types	2022-04-06 12:56:22 +00:00
work_queue.rs	Remove (lots of) dead code	2021-03-27 22:16:33 -04:00