implement TrustedLen for Flatten/FlatMap if the U: IntoIterator == [T; N]
This only works if arrays are passed directly instead of array iterators
because we need to be sure that they have not been advanced before
Flatten does its size calculation.
resolves #87094
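To see why the element count is only known when arrays are passed directly, here is a minimal sketch using hypothetical helpers (`flattened_array_len` and `flattened_iter_len` are not part of std). A `[T; N]` item always contributes exactly `N` elements, whereas an `array::IntoIter<T, N>` item may already have been partially consumed.

```rust
use std::array::IntoIter;

/// Hypothetical helper: when every outer item is a whole `[T; N]`,
/// the flattened length follows from the type alone (outer count × N).
fn flattened_array_len<T, const N: usize>(outer: &[[T; N]]) -> Option<usize> {
    outer.len().checked_mul(N)
}

/// Hypothetical helper: with `array::IntoIter<T, N>` items the type says
/// nothing about how many elements remain, so each iterator must be asked.
fn flattened_iter_len<T, const N: usize>(outer: &[IntoIter<T, N>]) -> usize {
    outer.iter().map(|it| it.len()).sum()
}
```

If one of the iterators is advanced before flattening, `outer.len() * N` overestimates the true count, so a `TrustedLen` impl could not rely on `N` in that case.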
Update Rust Float-Parsing Algorithms to use the Eisel-Lemire algorithm.
# Summary
Rust, although it implements a correct float parser, has major performance issues in float parsing. Even for common floats, the performance can be 3-10x [slower](https://arxiv.org/pdf/2101.11408.pdf) than external libraries such as [lexical](https://github.com/Alexhuszagh/rust-lexical) and [fast-float-rust](https://github.com/aldanor/fast-float-rust).
Recently, Daniel Lemire and others have made major advances in float-parsing algorithms, yielding a fast and correct float parser with speeds of up to 1200 MiB/s on Apple's M1 architecture for the [canada](0e2b5d163d/data/canada.txt) dataset, roughly 9x faster than Rust's 130 MiB/s.
In addition, [edge-cases](https://github.com/rust-lang/rust/issues/85234) in Rust's [dec2flt](868c702d0c/library/core/src/num/dec2flt) algorithm can lead to a slowdown of over 1600x relative to efficient algorithms. This is due to the use of Clinger's correct but slow [AlgorithmM and Bellerophon](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.4152&rep=rep1&type=pdf), which have since been improved upon by faster big-integer algorithms and the Eisel-Lemire algorithm, respectively.
Finally, this algorithm substantially increases the range of floats the Rust core library can parse. Currently, denormal floats with a large number of digits cannot be parsed, due to the use of `Big32x40`, which simply does not have enough digits to round such floats correctly. Using a custom decimal class with much simpler logic, we can parse all valid decimal strings of any digit count.
```rust
// Issue in Rust's dec2flt.
"2.47032822920623272088284396434110686182e-324".parse::<f64>(); // Err(ParseFloatError { kind: Invalid })
```
# Solution
This pull request implements the Eisel-Lemire algorithm, adapted from [fast-float-rust](https://github.com/aldanor/fast-float-rust) (which is licensed under Apache 2.0/MIT), with numerous modifications to make it more suitable for inclusion in the Rust core library. The following describes both features already present in fast-float-rust and the improvements made to it for inclusion in core.
**Documentation**
Extensive documentation has been added so that the code base can be maintained by others, explaining the algorithms as well as various associated constants and routines. For example, two seemingly magical constants are now documented with a derivation of their values:
```rust
// Round-to-even only happens for negative values of q
// when q ≥ −4 in the 64-bit case and when q ≥ −17 in
// the 32-bit case.
//
// When q ≥ 0, we have that 5^q ≤ 2m+1. In the 64-bit case, we
// have 5^q ≤ 2m+1 ≤ 2^54 or q ≤ 23. In the 32-bit case, we have
// 5^q ≤ 2m+1 ≤ 2^25 or q ≤ 10.
//
// When q < 0, we have w ≥ (2m+1)×5^−q. We must have that w < 2^64
// so (2m+1)×5^−q < 2^64. We have that 2m+1 > 2^53 (64-bit case)
// or 2m+1 > 2^24 (32-bit case). Hence, we must have 2^53×5^−q < 2^64
// (64-bit) and 2^24×5^−q < 2^64 (32-bit). Hence we have 5^−q < 2^11
// or q ≥ −4 (64-bit case) and 5^−q < 2^40 or q ≥ −17 (32-bit case).
//
// Thus we have that we only need to round ties to even when
// we have that q ∈ [−4, 23] (in the 64-bit case) or q ∈ [−17, 10]
// (in the 32-bit case). In both cases, the power of five (5^|q|)
// fits in a 64-bit word.
const MIN_EXPONENT_ROUND_TO_EVEN: i32;
const MAX_EXPONENT_ROUND_TO_EVEN: i32;
```
This makes the code base easier to maintain.
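The bounds in the comment above can be double-checked numerically. The sketch below uses hypothetical helper names and exact `u128` arithmetic to find the largest `q` with 5^q ≤ 2^54 (or 2^25), and the smallest `q` with 2^53 × 5^−q < 2^64 (or 2^24 × 5^−q < 2^64).

```rust
/// Largest q ≥ 0 such that 5^q ≤ 2^`max_odd_bits`, where 2m+1 ≤ 2^`max_odd_bits`
/// (54 for f64, 25 for f32).
fn max_exponent_round_to_even(max_odd_bits: u32) -> i32 {
    let limit = 1u128 << max_odd_bits;
    let (mut q, mut pow5) = (0, 1u128);
    while pow5 * 5 <= limit {
        pow5 *= 5;
        q += 1;
    }
    q
}

/// Smallest q < 0 such that 2^`min_odd_bits` × 5^−q < 2^64, where
/// 2m+1 > 2^`min_odd_bits` (53 for f64, 24 for f32).
fn min_exponent_round_to_even(min_odd_bits: u32) -> i32 {
    // Equivalent to 5^−q < 2^(64 − min_odd_bits).
    let limit = 1u128 << (64 - min_odd_bits);
    let (mut k, mut pow5) = (0, 1u128);
    while pow5 * 5 < limit {
        pow5 *= 5;
        k += 1;
    }
    -k
}
```

Both helpers reproduce the ranges [−4, 23] (f64) and [−17, 10] (f32) stated in the comment.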
**Improvements for Disguised Fast-Path Cases**
The fast path in float parsing algorithms attempts to use native machine floats to represent both the significant digits and the exponent, which is only possible if both can be represented exactly without rounding. In practice, this means that the significant digits must fit in 53 bits and the exponent must be in the range `[-22, 22]` (for an f64). This is similar to the existing dec2flt implementation.
However, disguised fast-path cases exist, where there are few significant digits and an exponent above the valid range, such as `1.23e25`. In this case, powers of 10 may be shifted from the exponent to the significant digits, as discussed at length in https://github.com/rust-lang/rust/issues/85198.
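A minimal sketch of the fast path and its disguised variant for f64, assuming the decimal string has already been split into an integer significand and a base-10 exponent. The helper names and the power-of-ten table are illustrative, not the dec2flt API. With this approach `1.23e25` becomes `1230 × 10^22`, which is back on the fast path.

```rust
/// Largest significand that is exactly representable in an f64 (2^53).
const MAX_MANTISSA_FAST_PATH: u64 = 1 << 53;

/// Exact powers of ten for f64: 10^22 is the largest one that fits in 53 bits.
const POW10: [f64; 23] = [
    1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11,
    1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22,
];

/// Clinger's fast path: both operands are exact, so a single IEEE
/// multiplication or division gives the correctly rounded result.
fn fast_path(mantissa: u64, exponent: i32) -> Option<f64> {
    if mantissa > MAX_MANTISSA_FAST_PATH || !(-22..=22).contains(&exponent) {
        return None;
    }
    let m = mantissa as f64;
    Some(if exponent < 0 {
        m / POW10[(-exponent) as usize]
    } else {
        m * POW10[exponent as usize]
    })
}

/// Disguised fast path: move surplus powers of ten from the exponent into
/// the significand, as long as the significand still fits in 53 bits.
fn disguised_fast_path(mantissa: u64, exponent: i32) -> Option<f64> {
    if exponent <= 22 {
        return fast_path(mantissa, exponent);
    }
    // exponent > 22 here, so the shift is positive.
    let shift = (exponent - 22) as u32;
    let shifted = mantissa.checked_mul(10u64.checked_pow(shift)?)?;
    // fast_path rejects the shifted significand if it no longer fits in 53 bits.
    fast_path(shifted, 22)
}
```

Anything the sketch rejects (`None`) would fall through to the slower Eisel-Lemire or big-decimal paths.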
**Digit Parsing Improvements**
Typically, integers are parsed from a string one digit at a time, requiring many multiplications, which slows down parsing. An approach to parse 8 digits at a time using only 3 multiplications is described at length [here](https://johnnylee-sde.github.io/Fast-numeric-string-to-int/). This leads to significant performance improvements, and is implemented for both big- and little-endian systems.
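A sketch of the 8-digits-at-a-time (SWAR) technique, assuming the 8 ASCII digits have already been validated and are loaded with `u64::from_le_bytes`, which makes the code independent of host endianness. The constants follow the widely used fast_float-style formulation; the function name is illustrative.

```rust
/// Parse exactly 8 ASCII digits packed little-endian into a u64,
/// using 3 multiplications instead of one per digit.
fn parse_8digits(mut v: u64) -> u64 {
    const MASK: u64 = 0x0000_00FF_0000_00FF;
    const MUL1: u64 = 0x000F_4240_0000_0064; // 100 + (1_000_000 << 32)
    const MUL2: u64 = 0x0000_2710_0000_0001; // 1 + (10_000 << 32)
    // Turn each ASCII byte '0'..='9' into its digit value 0..=9.
    v -= 0x3030_3030_3030_3030;
    // Byte i now holds 10 * d[i] + d[i + 1]: two-digit groups, no carries.
    v = (v * 10) + (v >> 8);
    // Combine the two-digit groups in bytes 0, 2, 4, 6 into one 8-digit value
    // in the upper 32 bits; terms that overflow past bit 63 are discarded.
    let v1 = (v & MASK).wrapping_mul(MUL1);
    let v2 = ((v >> 16) & MASK).wrapping_mul(MUL2);
    (v1.wrapping_add(v2) >> 32) as u32 as u64
}
```

For example, `parse_8digits(u64::from_le_bytes(*b"12345678"))` evaluates to `12345678`.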
**Unsafe Changes**
Relative to fast-float-rust, this implementation makes less use of unsafe code and clearly documents it. This includes refactoring and documenting numerous methods that perform unsafe operations but were undesirably marked as safe. The original code looked something like this, deceptively exposing unsafe functionality through a safe method:
```rust
impl AsciiStr {
    #[inline]
    pub fn step_by(&mut self, n: usize) -> &mut Self {
        unsafe { self.ptr = self.ptr.add(n) };
        self
    }
}
...
#[inline]
fn parse_scientific(s: &mut AsciiStr<'_>) -> i64 {
    // the first character is 'e'/'E' and scientific mode is enabled
    let start = *s;
    s.step();
    ...
}
```
The new code clearly documents safety concerns, and does not mark unsafe functionality as safe, leading to better safety guarantees.
```rust
impl AsciiStr {
    /// Advance the view by n, advancing it in-place to (n..).
    pub unsafe fn step_by(&mut self, n: usize) -> &mut Self {
        // SAFETY: same as step_by, safe as long as n is less than the buffer length
        self.ptr = unsafe { self.ptr.add(n) };
        self
    }
}
...
/// Parse the scientific notation component of a float.
fn parse_scientific(s: &mut AsciiStr<'_>) -> i64 {
    let start = *s;
    // SAFETY: the first character is 'e'/'E' and scientific mode is enabled
    unsafe {
        s.step();
    }
    ...
}
```
This makes it straightforward to verify that the new dec2flt implementation is safe.
**Inline Annotations Have Been Removed**
In the previous implementation of dec2flt, inline annotations were used practically nowhere in the module. Therefore, the inline annotations from fast-float-rust have been removed, which mostly does not impact [performance](https://github.com/aldanor/fast-float-rust/issues/15#issuecomment-864485157).
**Fixed Correctness Tests**
The tests in `src/etc/test-float-parse` had numerous errors, due to the deprecation of Python's `time.clock()` and to the crate's dependency on `rand`. The tests have therefore been reworked as a [crate](https://github.com/Alexhuszagh/rust/tree/master/src/etc/test-float-parse), and any errors in `runtests.py` have been patched.
**Undefined Behavior**
An implementation of `check_len` that relied on undefined behavior (in fast-float-rust) has been refactored to ensure that the behavior is well-defined. The original code is as follows:
```rust
#[inline]
pub fn check_len(&self, n: usize) -> bool {
unsafe { self.ptr.add(n) <= self.end }
}
```
And the new implementation is as follows:
```rust
/// Check if the slice has a length of at least `n`.
fn check_len(&self, n: usize) -> bool {
n <= self.as_ref().len()
}
```
Note that this has since been fixed in [fast-float-rust](https://github.com/aldanor/fast-float-rust/pull/29).
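The original check is undefined behavior whenever `ptr.add(n)` would land more than one byte past the end of the buffer, because `pointer::add` requires the result to stay within (or one past the end of) the same allocation, even if the pointer is never dereferenced. A sketch of the same idea on top of a plain byte slice, using a hypothetical `Cursor` type (not the dec2flt `AsciiStr`), keeps every operation bounds-checked:

```rust
/// Hypothetical byte cursor used only for illustration.
struct Cursor<'a> {
    bytes: &'a [u8],
}

impl<'a> Cursor<'a> {
    fn new(bytes: &'a [u8]) -> Self {
        Self { bytes }
    }

    /// Well-defined length check: compares lengths instead of
    /// forming a possibly out-of-bounds pointer.
    fn check_len(&self, n: usize) -> bool {
        n <= self.bytes.len()
    }

    /// Safe advance: returns `None` and leaves the cursor unchanged
    /// instead of stepping past the end.
    fn step_by(&mut self, n: usize) -> Option<()> {
        self.bytes = self.bytes.get(n..)?;
        Some(())
    }

    fn first(&self) -> Option<u8> {
        self.bytes.first().copied()
    }
}
```

The `?` in `step_by` returns before the assignment, so a failed step cannot corrupt the cursor.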
**Inferring Binary Exponents**
Rather than explicitly storing binary exponents, this new implementation infers them from the decimal exponent, reducing the amount of static storage required. This removes the need to store [611 i16s](868c702d0c/library/core/src/num/dec2flt/table.rs (L8)).
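A sketch of how a binary exponent can be inferred instead of looked up: for a decimal exponent `q`, floor(q × log2(10)) is approximated with the fixed-point constant 217706 / 2^16 ≈ 3.3219, written as `152_170 + 65536` as in fast_float-style implementations. The `+ 63` offset for a normalized 64-bit significand and the function name are assumptions of this sketch.

```rust
/// Approximate floor(q * log2(10)) + 63 using only integer arithmetic.
/// 217706 / 65536 ≈ 3.3219299, close to log2(10) ≈ 3.3219281.
/// `>>` on i32 is an arithmetic shift, so negative q rounds toward −∞.
fn binary_exponent(q: i32) -> i32 {
    (q.wrapping_mul(152_170 + 65536) >> 16) + 63
}
```

Over the decimal exponent range of f64 (roughly −342 to 308), this agrees with the floating-point computation `(q as f64 * 10f64.log2()).floor() as i32 + 63`, so no per-exponent table is needed.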
# Code Size
Across all optimization levels, the stripped binary size is roughly unchanged (within 12K), while the unstripped binary is **significantly** smaller (about 400K versus 3.1 to 3.2M). These binary sizes were calculated on x86_64-unknown-linux-gnu.
**new**
Using rustc version 1.55.0-dev.
| opt-level | size | size (stripped) |
|:-:|:-:|:-:|
| 0 | 400K | 300K |
| 1 | 396K | 292K |
| 2 | 392K | 292K |
| 3 | 392K | 296K |
| s | 396K | 292K |
| z | 396K | 292K |
**old**
Using rustc version 1.53.0-nightly.
| opt-level | size | size (stripped) |
|:-:|:-:|:-:|
| 0 | 3.2M | 304K |
| 1 | 3.2M | 292K |
| 2 | 3.1M | 284K |
| 3 | 3.1M | 284K |
| s | 3.1M | 284K |
| z | 3.1M | 284K |
# Correctness
The dec2flt implementation passes all of Rust's unit tests and comprehensive float parsing tests, along with numerous other tests such as Nigel Tao's comprehensive float [tests](https://github.com/nigeltao/parse-number-fxx-test-data) and Hrvoje Abraham's [strtod_tests](https://github.com/ahrvoje/numerics/blob/master/strtod/strtod_tests.toml). Therefore, it is unlikely that this algorithm will incorrectly round parsed floats.
# Issues Addressed
This will fix and close the following issues:
- resolves #85198
- resolves #85214
- resolves #85234
- fixes #31407
- fixes #31109
- fixes #53015
- resolves #68396
- closes https://github.com/aldanor/fast-float-rust/issues/15
The implementation is based on fast-float-rust, with a few notable changes:
- Some unsafe methods have been removed.
- Safe methods with inherently unsafe functionality have been removed.
- All unsafe functionality is documented and provably safe.
- Extensive documentation has been added for simpler maintenance.
- Inline annotations on internal routines have been removed.
- Fixed Python errors in src/etc/test-float-parse/runtests.py.
- Updated test-float-parse to be a library, to avoid the missing rand dependency.
- Added regression tests for #31109 and #31407 in core tests.
- Added regression tests for #31109 and #31407 in ui tests.
- Used the existing slice primitive to simplify shared dec2flt methods.
- Removed Miri ignores from dec2flt, due to faster parsing times.
Due to #20400, the corresponding TrustedLen impls need a helper trait
instead of directly adding `Item = &[T;N]` bounds.
Since TrustedLen is a public trait, this in turn means
the helper trait needs to be public. Since it's just a workaround
for a compiler deficiency, it's marked hidden, unstable and unsafe.
Add Integer::log variants
_This is another attempt at landing https://github.com/rust-lang/rust/pull/70835, which was approved by the libs team but failed on Android tests through Bors. The text copied here is from the original issue. The only change made so far is the addition of non-`checked_` variants of the log methods._
_Tracking issue: #70887_
---
This implements `{log,log2,log10}` methods for all integer types. The implementation was provided by `@substack` for use in the stdlib.
_Note: I'm not big on math, so this PR is a best effort written with limited knowledge. It's likely I'll be getting things wrong, but I'm happy to learn and correct. Please bear with me._
## Motivation
Calculating the logarithm of a number is a generally useful operation. Currently the stdlib only provides implementations for floats, which means that if we want to calculate the logarithm for an integer we have to cast it to a float and then back to an int.
> would be nice if there was an integer log2 instead of having to either use the f32 version or leading_zeros() which i have to verify the results of every time to be sure
_— [`@substack,` 2020-03-08](https://twitter.com/substack/status/1236445105197727744)_
With larger numbers, converting from an integer to a float also risks precision loss and overflow. This means that Rust currently only provides log operations for a limited set of integers.
The process of doing log operations by converting between floats and integers is also prone to rounding errors. In the following example, we're trying to calculate the base-10 logarithm of an integer. We might try to calculate the base-2 logarithm of the values and attempt [a base swap](https://www.rapidtables.com/math/algebra/Logarithm.html#log-rules) to arrive at base 10. However, because we're performing intermediate rounding, we arrive at the wrong result:
```rust
fn main() {
    // log10(900) = ~2.95 = 2
    dbg!(900f32.log10() as u64);

    // log base change rule: logb(x) = logc(x) / logc(b)
    // log2(900) / log2(10) = 9/3 = 3
    dbg!((900f32.log2() as u64) / (10f32.log2() as u64));
}
```
_[playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=6bd6c68b3539e400f9ca4fdc6fc2eed0)_
This is somewhat nuanced, as a lot of the time it'll work well, but in real-world code this could lead to some hard-to-track bugs. By providing correct log implementations directly on integers, we can help prevent errors around this.
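A minimal sketch of integer logarithms that avoid the float round trip entirely, written as free functions with hypothetical names (the methods proposed in this PR live on the integer types themselves). `checked_ilog` uses repeated integer division; `checked_ilog2` uses `leading_zeros`, as mentioned in the quote above.

```rust
/// Largest k such that base^k <= n, or None if n == 0 or base < 2.
fn checked_ilog(mut n: u64, base: u64) -> Option<u32> {
    if n == 0 || base < 2 {
        return None;
    }
    let mut k = 0;
    // Each exact integer division removes one factor of `base`;
    // there is no intermediate rounding as with floats.
    while n >= base {
        n /= base;
        k += 1;
    }
    Some(k)
}

/// Base-2 case: the position of the highest set bit.
fn checked_ilog2(n: u64) -> Option<u32> {
    if n == 0 {
        None
    } else {
        Some(63 - n.leading_zeros())
    }
}
```

Unlike the float-based attempt above, `checked_ilog(900, 10)` returns `Some(2)`, and `checked_ilog(u64::MAX, 10)` returns `Some(19)` without any lossy conversion.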
## Implementation notes
I checked whether LLVM intrinsics existed before implementing this, and none exist yet. ~~Also I couldn't really find a better way to write the `ilog` function. One option would be to make it a private method on the number, but I didn't see any precedent for that. I also didn't know where to best place the tests, so I added them to the bottom of the file. Even though they might seem like quite a lot they take no time to execute.~~
## References
- [Log rules](https://www.rapidtables.com/math/algebra/Logarithm.html#log-rules)
- [Rounding error playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=6bd6c68b3539e400f9ca4fdc6fc2eed0)
- [substack's tweet asking about integer log2 in the stdlib](https://twitter.com/substack/status/1236445105197727744)
- [Integer Logarithm, A. Jaffer 2008](https://people.csail.mit.edu/jaffer/III/ilog.pdf)
core: add unstable no_fp_fmt_parse to disable float formatting code
In some projects (e.g. kernel), floating point is forbidden. They can disable
hardware floating point support and use `+soft-float` to avoid fp instructions
from being generated, but as libcore contains the formatting code for `f32`
and `f64`, it depends on some fp intrinsics. One could define stubs for these
intrinsics that just panic [1], but that means that if any formatting functions
are accidentally used, the mistake can only be caught at runtime rather
than at compile time or link time, and the formatting code consumes a lot of
space without LTO.
This patch provides an unstable cfg `no_fp_fmt_parse` to disable these.
A panicking stub is still provided for the `Debug` implementation (unfortunately)
because there are some SIMD types that use `#[derive(Debug)]`.
[1]: https://lkml.org/lkml/2021/4/14/1028
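The gating pattern can be sketched outside of core with a hypothetical wrapper type: when the `no_fp_fmt_parse` cfg is set, the real float formatting is compiled out and only a panicking `Debug` stub remains. The `Float` type and the panic message are illustrative, not taken from libcore.

```rust
use std::fmt;

/// Hypothetical stand-in for a type whose `Debug` impl needs float formatting.
struct Float(f64);

// Normal build: delegate to the float formatting code.
#[cfg(not(no_fp_fmt_parse))]
impl fmt::Debug for Float {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Debug::fmt(&self.0, f)
    }
}

// Built with `--cfg no_fp_fmt_parse`: no float formatting is compiled in,
// so using it can only be reported at runtime.
#[cfg(no_fp_fmt_parse)]
impl fmt::Debug for Float {
    fn fmt(&self, _f: &mut fmt::Formatter<'_>) -> fmt::Result {
        panic!("floating point formatting is disabled");
    }
}
```

Built normally, `format!("{:?}", Float(1.5))` yields `"1.5"`. Built with `rustc --cfg no_fp_fmt_parse`, the same call panics.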
This was unsound since a panic in a.next_back() would result in the
length not being updated which would then lead to the same element
being revisited in the side-effect preserving code.
to_digit simplification (fewer jumps)
I just realised we might be able to make use of the fact that changing case in ASCII is easy, to help simplify to_digit some more.
It looks a bit cleaner, and it looks like there are fewer jumps and fewer instructions in the generated assembly:
https://godbolt.org/z/84Erh5dhz
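A sketch of the case-folding idea as a free function, modeled on (but not guaranteed identical to) the new `char::to_digit`. For radixes above 10, OR-ing in `0x20` maps `'A'..='Z'` onto `'a'..='z'`, so a single subtraction handles both cases, and anything that is not an ASCII letter ends up outside the valid digit range.

```rust
/// Convert `c` to a digit in `radix` (2..=36), folding ASCII case with `| 0x20`.
fn to_digit(c: char, radix: u32) -> Option<u32> {
    assert!((2..=36).contains(&radix), "radix must be in 2..=36");
    // Characters below '0' wrap around to a huge value and are rejected below.
    let mut digit = (c as u32).wrapping_sub('0' as u32);
    if radix > 10 {
        if digit < 10 {
            return Some(digit);
        }
        // Setting bit 5 lower-cases ASCII letters; non-letters map outside 10..36
        // (values below 'a' wrap and then saturate at u32::MAX).
        digit = (c as u32 | 0x20).wrapping_sub('a' as u32).saturating_add(10);
    }
    if digit < radix {
        Some(digit)
    } else {
        None
    }
}
```

The test compares this sketch against `char::to_digit` for every radix and the first few hundred code points.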
The benchmarks don't really tell me much. Maybe a slight improvement on the var radix.
Before:
```
test char::methods::bench_to_digit_radix_10 ... bench: 53,819 ns/iter (+/- 8,314)
test char::methods::bench_to_digit_radix_16 ... bench: 57,265 ns/iter (+/- 10,730)
test char::methods::bench_to_digit_radix_2 ... bench: 55,077 ns/iter (+/- 5,431)
test char::methods::bench_to_digit_radix_36 ... bench: 56,549 ns/iter (+/- 3,248)
test char::methods::bench_to_digit_radix_var ... bench: 43,848 ns/iter (+/- 3,189)
test char::methods::bench_to_digit_radix_10 ... bench: 51,707 ns/iter (+/- 10,946)
test char::methods::bench_to_digit_radix_16 ... bench: 52,835 ns/iter (+/- 2,689)
test char::methods::bench_to_digit_radix_2 ... bench: 51,012 ns/iter (+/- 2,746)
test char::methods::bench_to_digit_radix_36 ... bench: 53,210 ns/iter (+/- 8,645)
test char::methods::bench_to_digit_radix_var ... bench: 40,386 ns/iter (+/- 4,711)
test char::methods::bench_to_digit_radix_10 ... bench: 54,088 ns/iter (+/- 5,677)
test char::methods::bench_to_digit_radix_16 ... bench: 55,972 ns/iter (+/- 17,229)
test char::methods::bench_to_digit_radix_2 ... bench: 52,083 ns/iter (+/- 2,425)
test char::methods::bench_to_digit_radix_36 ... bench: 54,132 ns/iter (+/- 1,548)
test char::methods::bench_to_digit_radix_var ... bench: 41,250 ns/iter (+/- 5,299)
```
After:
```
test char::methods::bench_to_digit_radix_10 ... bench: 48,907 ns/iter (+/- 19,449)
test char::methods::bench_to_digit_radix_16 ... bench: 52,673 ns/iter (+/- 8,122)
test char::methods::bench_to_digit_radix_2 ... bench: 48,509 ns/iter (+/- 2,885)
test char::methods::bench_to_digit_radix_36 ... bench: 50,526 ns/iter (+/- 4,610)
test char::methods::bench_to_digit_radix_var ... bench: 38,618 ns/iter (+/- 3,180)
test char::methods::bench_to_digit_radix_10 ... bench: 54,202 ns/iter (+/- 6,994)
test char::methods::bench_to_digit_radix_16 ... bench: 56,585 ns/iter (+/- 8,448)
test char::methods::bench_to_digit_radix_2 ... bench: 50,548 ns/iter (+/- 1,674)
test char::methods::bench_to_digit_radix_36 ... bench: 52,749 ns/iter (+/- 2,576)
test char::methods::bench_to_digit_radix_var ... bench: 40,215 ns/iter (+/- 3,327)
test char::methods::bench_to_digit_radix_10 ... bench: 50,233 ns/iter (+/- 22,272)
test char::methods::bench_to_digit_radix_16 ... bench: 50,841 ns/iter (+/- 19,981)
test char::methods::bench_to_digit_radix_2 ... bench: 50,386 ns/iter (+/- 4,555)
test char::methods::bench_to_digit_radix_36 ... bench: 52,369 ns/iter (+/- 2,737)
test char::methods::bench_to_digit_radix_var ... bench: 40,417 ns/iter (+/- 2,766)
```
I removed the `likely` hint, as it resulted in a few fewer instructions. (It hasn't been in there long; I added it in the last to_digit iteration.)
Make copy/copy_nonoverlapping fns again
Make copy/copy_nonoverlapping fns again, rather than intrinsics.
This is a short-term change to address issue #84297.
It effectively reverts PRs #81167, #81238 (and part of #82967), #83091, and parts of #79684.
Update standard library for IntoIterator implementation of arrays
This PR partially resolves issue #84513 of updating the standard library part.
I haven't found any remaining doctest examples in the standard library that iterate over e.g. `&i32` instead of just `i32`. Can anyone point me to them if any remain?
Thanks!
r? `@m-ou-se`
While stdlib implementations of the unchecked methods require unchecked
math, there is no reason to gate it behind this for external users. The
reasoning for a separate `step_trait_ext` feature is unclear, and as
such has been merged as well.
Implement indexing slices with pairs of core::ops::Bound<usize>
Closes #49976.
I am not sure about the code duplication between `check_range` and `into_maybe_range`. Should the former be implemented in terms of the latter? Also, this PR doesn't address the code duplication between the `impl SliceIndex for Range*` impls.
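A sketch of the conversion such an impl needs, with hypothetical helper names (`bounds_to_range`, `get_by_bounds`; neither is the PR's code). Each `(Bound<usize>, Bound<usize>)` pair is normalized to a half-open `start..end`, guarding the `+ 1` adjustments against overflow before the usual bounds check.

```rust
use std::ops::{Bound, Range};

/// Hypothetical helper: normalize a pair of bounds to `start..end`,
/// returning None on overflow, on start > end, or if end exceeds `len`.
fn bounds_to_range(len: usize, (start, end): (Bound<usize>, Bound<usize>)) -> Option<Range<usize>> {
    let start = match start {
        Bound::Included(s) => s,
        Bound::Excluded(s) => s.checked_add(1)?,
        Bound::Unbounded => 0,
    };
    let end = match end {
        Bound::Included(e) => e.checked_add(1)?,
        Bound::Excluded(e) => e,
        Bound::Unbounded => len,
    };
    if start <= end && end <= len {
        Some(start..end)
    } else {
        None
    }
}

/// Hypothetical non-panicking lookup built on the conversion above.
fn get_by_bounds<T>(s: &[T], bounds: (Bound<usize>, Bound<usize>)) -> Option<&[T]> {
    s.get(bounds_to_range(s.len(), bounds)?)
}
```

A panicking `index` variant would call the same conversion and report the failure instead of returning `None`.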
Format `Struct { .. }` on one line even with `{:#?}`.
The result of `debug_struct("A").finish_non_exhaustive()` before this change:
```
A {
..
}
```
And after this change:
```
A { .. }
```
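A minimal way to observe the new output, assuming a toolchain where `DebugStruct::finish_non_exhaustive` is stable; the `Opaque` type is hypothetical:

```rust
use std::fmt;

/// Hypothetical type that hides all of its fields from `Debug`.
struct Opaque;

impl fmt::Debug for Opaque {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // No fields are added, so only the `..` marker is printed.
        f.debug_struct("A").finish_non_exhaustive()
    }
}
```

With this change, both `{:?}` and `{:#?}` print `A { .. }` for this type.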
If there are any fields, the result stays unchanged:
```
A {
field: value,
..
}