mirror of
https://github.com/rust-lang/rust.git
synced 2024-11-26 00:34:06 +00:00
689e8470ff
Clean up Vec's benchmarks The Vec benchmarks need a lot of love. I sort of noticed this in https://github.com/rust-lang/rust/pull/83357 but the overall situation is much less awesome than I thought at the time. The first commit just removes a lot of asserts and does a touch of other cleanup. A number of these benchmarks are poorly-named. For example, `bench_map_fast` is not in fact fast, `bench_rev_1` and `bench_rev_2` are vague, `bench_in_place_zip_iter_mut` doesn't call `zip`, `bench_in_place*` don't do anything in-place... Should I fix these, or is there tooling that depend on the names not changing? I've also noticed that `bench_rev_1` and `bench_rev_2` are remarkably fragile. It looks like poking other code in `Vec` can cause the codegen of this benchmark to switch to a version that has almost exactly half its current throughput and I have absolutely no idea why. Here's the fast version: ```asm 0.69 │110: movdqu -0x20(%rbx,%rdx,4),%xmm0 1.76 │ movdqu -0x10(%rbx,%rdx,4),%xmm1 0.71 │ pshufd $0x1b,%xmm1,%xmm1 0.60 │ pshufd $0x1b,%xmm0,%xmm0 3.68 │ movdqu %xmm1,-0x30(%rcx) 14.36 │ movdqu %xmm0,-0x20(%rcx) 13.88 │ movdqu -0x40(%rbx,%rdx,4),%xmm0 6.64 │ movdqu -0x30(%rbx,%rdx,4),%xmm1 0.76 │ pshufd $0x1b,%xmm1,%xmm1 0.77 │ pshufd $0x1b,%xmm0,%xmm0 1.87 │ movdqu %xmm1,-0x10(%rcx) 13.01 │ movdqu %xmm0,(%rcx) 38.81 │ add $0x40,%rcx 0.92 │ add $0xfffffffffffffff0,%rdx 1.22 │ ↑ jne 110 ``` And the slow one: ```asm 0.42 │9a880: movdqa %xmm2,%xmm1 4.03 │9a884: movq -0x8(%rbx,%rsi,4),%xmm4 8.49 │9a88a: pshufd $0xe1,%xmm4,%xmm4 2.58 │9a88f: movq -0x10(%rbx,%rsi,4),%xmm5 7.02 │9a895: pshufd $0xe1,%xmm5,%xmm5 4.79 │9a89a: punpcklqdq %xmm5,%xmm4 5.77 │9a89e: movdqu %xmm4,-0x18(%rdx) 15.74 │9a8a3: movq -0x18(%rbx,%rsi,4),%xmm4 3.91 │9a8a9: pshufd $0xe1,%xmm4,%xmm4 5.04 │9a8ae: movq -0x20(%rbx,%rsi,4),%xmm5 5.29 │9a8b4: pshufd $0xe1,%xmm5,%xmm5 4.60 │9a8b9: punpcklqdq %xmm5,%xmm4 9.81 │9a8bd: movdqu %xmm4,-0x8(%rdx) 11.05 │9a8c2: paddq %xmm3,%xmm0 0.86 │9a8c6: paddq %xmm3,%xmm2 5.89 │9a8ca: add $0x20,%rdx 0.12 │9a8ce: add $0xfffffffffffffff8,%rsi 1.16 │9a8d2: add $0x2,%rdi 2.96 │9a8d6: → jne 9a880 <<alloc::vec::Vec<T,A> as core::iter::traits::collect::Extend<&T>>::extend+0xd0> ``` |
||
---|---|---|
.. | ||
benches | ||
src | ||
tests | ||
Cargo.toml |