Commit Graph

66 Commits

Author SHA1 Message Date
Chai T. Rex
0cac915211 Improve isqrt tests and add benchmarks
* Choose test inputs more thoroughly and systematically.
* Check that `isqrt` and `checked_isqrt` have equivalent results for
  signed types, either equivalent numerically or equivalent as a panic
  and a `None`.
* Check that `isqrt` has numerically-equivalent results for unsigned
  types and their `NonZero` counterparts.
* Reuse `ilog10` benchmarks, plus benchmarks that use a uniform
  distribution.
2024-08-28 23:06:54 -04:00
Nicholas Nethercote
84ac80f192 Reformat use declarations.
The previous commit updated `rustfmt.toml` appropriately. This commit is
the outcome of running `x fmt --all` with the new formatting options.
2024-07-29 08:26:52 +10:00
Arpad Borsos
0334c45bb5
Write char::DebugEscape sequences using write_str
Instead of writing each `char` of an escape sequence one by one,
this delegates to `Display`, which uses `write_str` internally
in order to write the whole escape sequence at once.
2024-05-20 10:04:44 +02:00
Arpad Borsos
5fe296c6c5
Add benchmarks for impl Debug for str
In order to inform future perf improvements and prevent regressions,
lets add some benchmarks that stress `impl Debug for str`.
2024-05-01 09:54:29 +02:00
Guillaume Gomez
206e0df78d
Rollup merge of #115913 - FedericoStra:checked_ilog, r=the8472
checked_ilog: improve performance

Addresses #115874.

(This PR replicates the original #115875, which I accidentally closed by deleting my forked repository...)
2024-04-22 20:25:58 +02:00
Ralf Jung
c0b564b767 disable benches in Miri 2024-04-07 09:58:10 +02:00
okaneco
69637c9212 Add benches for net parsing
Add benches for IpAddr, Ipv4Addr, Ipv6Addr, SocketAddr, SocketAddrV4,
and SocketAddrV6 parsing
2024-03-04 18:46:24 -05:00
bors
fa404339c9 Auto merge of #85528 - the8472:iter-markers, r=dtolnay
Implement iterator specialization traits on more adapters

This adds

* `TrustedLen` to `Skip` and `StepBy`
* `TrustedRandomAccess` to `Skip`
* `InPlaceIterable` and `SourceIter` to  `Copied` and `Cloned`

The first two might improve performance in the compiler itself since `skip` is used in several places. Constellations that would exercise the last point are probably rare since it would require an owning iterator that has references as Items somewhere in its iterator pipeline.

Improvements for `Skip`:

```
# old
test iter::bench_skip_trusted_random_access                     ... bench:       8,335 ns/iter (+/- 90)

# new
test iter::bench_skip_trusted_random_access                     ... bench:       2,753 ns/iter (+/- 27)
```
2024-01-21 11:17:46 +00:00
Matthias Krüger
17c95b6330
Rollup merge of #113142 - the8472:opt-cstr-display, r=Mark-Simulacrum
optimize EscapeAscii's Display  and CStr's Debug

```
old:
    ascii::bench_ascii_escape_display_mixed      17.97µs/iter +/- 204.00ns
    ascii::bench_ascii_escape_display_no_escape 545.00ns/iter   +/- 6.00ns
new:
    ascii::bench_ascii_escape_display_mixed      4.99µs/iter +/- 56.00ns
    ascii::bench_ascii_escape_display_no_escape 91.00ns/iter  +/- 1.00ns
```
2024-01-20 09:37:25 +01:00
Nicholas Thompson
c65c35b3ef Reduced amount of int_pow benches
Also simplified the macros
2024-01-11 14:00:01 -05:00
Nicholas Thompson
7dcce97686 Edited int_pow micro-benchmarks 2024-01-11 11:30:12 -05:00
Nicholas Thompson
33a47df84a Added int_pow micro-benchmarks 2024-01-11 11:30:12 -05:00
The8472
3aa73135cf bench trustedrandomaccess specialization in zip 2024-01-10 18:59:44 +01:00
Jake Goulding
5772818dc8 Adjust library tests for unused_tuple_struct_fields -> dead_code 2024-01-02 15:34:37 -05:00
surechen
40ae34194c remove redundant imports
detects redundant imports that can be eliminated.

for #117772 :

In order to facilitate review and modification, split the checking code and
removing redundant imports code into two PR.
2023-12-10 10:56:22 +08:00
The 8472
3f55e8665c benchmarks for Chars::advance_by 2023-11-27 22:06:35 +01:00
Federico Stra
0f9a4d9ab4 checked_ilog: add benchmarks 2023-09-22 16:03:14 +02:00
Deadbeef
626efab67f fix 2023-07-23 09:58:31 +00:00
The 8472
6c87448b57 optimize Cstr/EscapeAscii display
old:
    ascii::bench_ascii_escape_display_mixed      17.97µs/iter +/- 204.00ns
    ascii::bench_ascii_escape_display_no_escape 545.00ns/iter   +/- 6.00ns
new:
    ascii::bench_ascii_escape_display_mixed      4.99µs/iter +/- 56.00ns
    ascii::bench_ascii_escape_display_no_escape 91.00ns/iter  +/- 1.00ns
2023-06-29 01:55:03 +02:00
The 8472
070ce235f2 Specialize StepBy<Range<{integer}>>
For ranges < usize we determine the number of items
StepBy would yield and then store that in the range.end
instead of the actual end. This significantly
simplifies calculation of the loop induction variable
especially in cases where StepBy::step (an usize)
could overflow the Range's item type
2023-06-23 00:17:34 +02:00
The 8472
cfb0f11a9f add benchmark 2023-06-12 13:03:29 +02:00
The 8472
b40896d17b optimize next_chunk impls for Filter and FilterMap 2023-05-20 11:29:34 +02:00
Matthias Krüger
fc30207b16
Rollup merge of #108291 - chenyukang:yukang/fix-benchmarks, r=workingjubilee
Fix more benchmark test with black_box

Follow up fix for https://github.com/rust-lang/rust/issues/107590
2023-05-15 17:12:43 +02:00
mazong1123
b0a85d614d Add shortcut for Grisu3 algorithm.
Check requested digit length and the fractional or integral parts of the number. Falls back earlier without trying the Grisu algorithm if the specific condition meets.

Fix #110129
2023-04-25 11:34:57 +08:00
bors
816f958ac3 Auto merge of #108157 - scottmcm:tuple-gt-via-partialcmp, r=dtolnay
Use `partial_cmp` to implement tuple `lt`/`le`/`ge`/`gt`

In today's implementation, `(A, B)::gt` contains calls to *both* `A::eq` *and* `A::gt`.

That's fine for primitives, but for things like `String`s it's kinda weird -- `(String, usize)::gt` has a call to both `bcmp` and `memcmp` (<https://rust.godbolt.org/z/7jbbPMesf>) because when `bcmp` says the `String`s aren't equal, it turns around and calls `memcmp` to find out which one's bigger.

This PR changes the implementation to instead implement `(A, …, C, Z)::gt` using `A::partial_cmp`, `…::partial_cmp`, `C::partial_cmp`, and `Z::gt`.  (And analogously for `lt`, `le`, and `ge`.)  That way expensive comparisons don't need to be repeated.

Technically this is an observable change on stable, so I've marked it `needs-fcp` + `T-libs-api` and will
r? rust-lang/libs-api

I'm hoping that this will be non-controversial, however, since it's very similar to the observable changes that were made to the derives (#81384 #98655) -- like those, this only changes behaviour if a type overrode behaviour in a way inconsistent with the rules for the various traits involved.

(The first commit here is #108156, adding the codegen test, which I used to make sure this doesn't regress behaviour for primitives.)

Zulip conversation about this change: <https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/.60.3E.60.20on.20Tuples/near/328392927>.
2023-03-05 22:02:26 +00:00
yukang
62cfd8a123 fix more benchmark test with black_box 2023-02-21 03:26:40 +00:00
Scott McMurray
4492793e0d Add a slightly-contrived tuple comparison benchmark 2023-02-17 11:46:19 -08:00
kadmin
826abcc728 Shrink size of array benchmarks 2023-02-14 05:01:24 +00:00
kadmin
cbd1b81bd2 Add array::map benchmarks 2023-02-11 04:23:53 +00:00
yukang
fe84cecf60 fix #107590, Fix benchmarks in library/core with black_box 2023-02-03 00:33:36 +08:00
Thom Chiovoloni
a4bf36e87b
Update rand in the stdlib tests, and remove the getrandom feature from it 2023-01-04 14:52:41 -08:00
Dylan DPC
1db7f690b1
Rollup merge of #103570 - lukas-code:stabilize-ilog, r=scottmcm
Stabilize integer logarithms

Stabilizes feature `int_log`.

I've also made the functions const stable, because they don't depend on any unstable const features. `rustc_allow_const_fn_unstable` is just there for `Option::expect`, which could be replaced with a `match` and `panic!`. cc ``@rust-lang/wg-const-eval``

closes https://github.com/rust-lang/rust/issues/70887 (tracking issue)

~~blocked on FCP finishing: https://github.com/rust-lang/rust/issues/70887#issuecomment-1289028216~~
FCP finished: https://github.com/rust-lang/rust/issues/70887#issuecomment-1302121266
2022-11-09 19:21:21 +05:30
The 8472
b00666ed09 add benchmark for iter::ArrayChunks::fold specialization
This also updates the existing iter::Copied::next_chunk benchmark so
that the thing it benches doesn't get masked by the ArrayChunks specialization
2022-11-07 21:44:24 +01:00
Lukas Markeffsky
9e36fd926c stabilize int_log 2022-10-26 11:58:33 +02:00
The 8472
963d6f757c add a benchmark for slice_iter.copied().array_chunks() 2022-10-17 23:40:21 +02:00
Tim Vermeulen
db2b4a3a7e Use internal iteration in Iterator::{cmp_by, partial_cmp_by, eq_by} 2022-08-21 12:23:10 +02:00
Eric Holk
c18f22058b Rename integer log* methods to ilog*
This reflects the concensus from the libs team as reported at
https://github.com/rust-lang/rust/issues/70887#issuecomment-1209513261

Co-authored-by: Yosh Wuyts <github@yosh.is>
2022-08-09 10:20:49 -07:00
Nilstrieb
3358a41acb Add unicode fast path to is_printable
Before, it would enter the full expensive check even for normal ascii
characters. Now, it skips the check for the ascii characters in
`32..127`. This range was checked manually from the current behavior.
2022-05-31 10:51:35 +02:00
bors
12d3f107c1 Auto merge of #96626 - thomcc:rand-bump, r=m-ou-se
Avoid using `rand::thread_rng` in the stdlib benchmarks.

This is kind of an anti-pattern because it introduces extra nondeterminism for no real reason. In thread_rng's case this comes both from the random seed and also from the reseeding operations it does, which occasionally does syscalls (which adds additional nondeterminism). The impact of this would be pretty small in most cases, but it's a good practice to avoid (particularly because avoiding it was not hard).

Anyway, several of our benchmarks already did the right thing here anyway, so the change was pretty easy and mostly just applying it more universally. That said, the stdlib benchmarks aren't particularly stable (nor is our benchmark framework particularly great), so arguably this doesn't matter that much in practice.

~~Anyway, this also bumps the `rand` dev-dependency to 0.8, since it had fallen somewhat out of date.~~ Nevermind, too much of a headache.
2022-05-05 05:08:44 +00:00
The 8472
e3db41bf97 add benchmark 2022-05-02 20:54:46 +02:00
Thom Chiovoloni
0812759840
Avoid use of rand::thread_rng in stdlib benchmarks 2022-05-02 00:08:21 -07:00
Eduardo Sánchez Muñoz
93ae6f80e3 Make some usize-typed masks definition agnostic to the size of usize
Some masks where defined as
```rust
const NONASCII_MASK: usize = 0x80808080_80808080u64 as usize;
```
where it was assumed that `usize` is never wider than 64, which is currently true.

To make those constants valid in a hypothetical 128-bit target, these constants have been redefined in an `usize`-width-agnostic way
```rust
const NONASCII_MASK: usize = usize::from_ne_bytes([0x80; size_of::<usize>()]);
```

There are already some cases where Rust anticipates the possibility of supporting 128-bit targets, such as not implementing `From<usize>` for `u64`.
2022-04-15 17:04:59 +02:00
T-O-R-U-S
72a25d05bf Use implicit capture syntax in format_args
This updates the standard library's documentation to use the new syntax. The
documentation is worthwhile to update as it should be more idiomatic
(particularly for features like this, which are nice for users to get acquainted
with). The general codebase is likely more hassle than benefit to update: it'll
hurt git blame, and generally updates can be done by folks updating the code if
(and when) that makes things more readable with the new format.

A few places in the compiler and library code are updated (mostly just due to
already having been done when this commit was first authored).
2022-03-10 10:23:40 -05:00
Scott McMurray
8ca47d7ae4 Stop manually SIMDing in swap_nonoverlapping
Like I previously did for `reverse`, this leaves it to LLVM to pick how to vectorize it, since it can know better the chunk size to use, compared to the "32 bytes always" approach we currently have.

It does still need logic to type-erase where appropriate, though, as while LLVM is now smart enough to vectorize over slices of things like `[u8; 4]`, it fails to do so over slices of `[u8; 3]`.

As a bonus, this also means one no longer gets the spurious `memcpy`(s?) at the end up swapping a slice of `__m256`s: <https://rust.godbolt.org/z/joofr4v8Y>
2022-02-21 00:54:02 -08:00
Thom Chiovoloni
ebbccaf6bf
Respond to review feedback, and improve implementation somewhat 2022-02-05 11:15:18 -08:00
Thom Chiovoloni
ed01324835
Fix zh::SMALL string in core::str benchmarks 2022-02-05 11:15:17 -08:00
Thom Chiovoloni
628b217326
Optimize core::str::Chars::count 2022-02-05 11:15:17 -08:00
bors
ffdf18d144 Auto merge of #88788 - falk-hueffner:speedup-int-log10-branchless, r=joshtriplett
Speedup int log10 branchless

This is achieved with a branchless bit-twiddling implementation of the case x < 100_000, and using this as building block.

Benchmark on an Intel i7-8700K (Coffee Lake):

```
name                                   old ns/iter  new ns/iter  diff ns/iter   diff %  speedup
num::int_log::u8_log10_predictable     165          169                     4    2.42%   x 0.98
num::int_log::u8_log10_random          438          423                   -15   -3.42%   x 1.04
num::int_log::u8_log10_random_small    438          423                   -15   -3.42%   x 1.04
num::int_log::u16_log10_predictable    633          417                  -216  -34.12%   x 1.52
num::int_log::u16_log10_random         908          471                  -437  -48.13%   x 1.93
num::int_log::u16_log10_random_small   945          471                  -474  -50.16%   x 2.01
num::int_log::u32_log10_predictable    1,496        1,340                -156  -10.43%   x 1.12
num::int_log::u32_log10_random         1,076        873                  -203  -18.87%   x 1.23
num::int_log::u32_log10_random_small   1,145        874                  -271  -23.67%   x 1.31
num::int_log::u64_log10_predictable    4,005        3,171                -834  -20.82%   x 1.26
num::int_log::u64_log10_random         1,247        1,021                -226  -18.12%   x 1.22
num::int_log::u64_log10_random_small   1,265        921                  -344  -27.19%   x 1.37
num::int_log::u128_log10_predictable   39,667       39,579                -88   -0.22%   x 1.00
num::int_log::u128_log10_random        6,456        6,696                 240    3.72%   x 0.96
num::int_log::u128_log10_random_small  4,108        3,903                -205   -4.99%   x 1.05
```

Benchmark on an M1 Mac Mini:

```
name                                   old ns/iter  new ns/iter  diff ns/iter   diff %  speedup
num::int_log::u8_log10_predictable     143          130                   -13   -9.09%   x 1.10
num::int_log::u8_log10_random          375          325                   -50  -13.33%   x 1.15
num::int_log::u8_log10_random_small    376          325                   -51  -13.56%   x 1.16
num::int_log::u16_log10_predictable    500          322                  -178  -35.60%   x 1.55
num::int_log::u16_log10_random         794          405                  -389  -48.99%   x 1.96
num::int_log::u16_log10_random_small   1,035        405                  -630  -60.87%   x 2.56
num::int_log::u32_log10_predictable    1,144        894                  -250  -21.85%   x 1.28
num::int_log::u32_log10_random         832          786                   -46   -5.53%   x 1.06
num::int_log::u32_log10_random_small   832          787                   -45   -5.41%   x 1.06
num::int_log::u64_log10_predictable    2,681        2,057                -624  -23.27%   x 1.30
num::int_log::u64_log10_random         1,015        806                  -209  -20.59%   x 1.26
num::int_log::u64_log10_random_small   1,004        795                  -209  -20.82%   x 1.26
num::int_log::u128_log10_predictable   56,825       56,526               -299   -0.53%   x 1.01
num::int_log::u128_log10_random        9,056        8,861                -195   -2.15%   x 1.02
num::int_log::u128_log10_random_small  1,528        1,527                  -1   -0.07%   x 1.00
```

The 128 bit case remains ridiculously slow because llvm fails to optimize division by a constant 128-bit value to multiplications. This could be worked around but it seems preferable to fix this in llvm.

From u32 up, table lookup (like suggested [here](https://github.com/rust-lang/rust/issues/70887#issuecomment-881099813)) is still faster, but requires a hardware `leading_zeros` to be viable, and might clog up the cache.
2021-10-12 03:18:54 +00:00
The8472
4c44f061d8 benchmark for str.chars().count() 2021-09-11 00:25:41 +02:00
Falk Hüffner
57c623570a Cosmetic fixes. 2021-09-09 20:06:46 +02:00