Commit Graph

314 Commits

Author SHA1 Message Date
Ralf Jung
3b9f8116a2 get rid of NoMirFor error variant 2021-07-24 14:08:04 +02:00
Ralf Jung
bed3b965ae miri: better ptr-out-of-bounds errors 2021-07-18 10:38:00 +02:00
bors
c78ebb7bdc Auto merge of #87123 - RalfJung:miri-provenance-overhaul, r=oli-obk
CTFE/Miri engine Pointer type overhaul

This fixes the long-standing problem that we are using `Scalar` as a type to represent pointers that might be integer values (since they point to a ZST). The main problem is that with int-to-ptr casts, there are multiple ways to represent the same pointer as a `Scalar` and it is unclear if "normalization" (i.e., the cast) already happened or not. This leads to ugly methods like `force_mplace_ptr` and `force_op_ptr`.
Another problem this solves is that in Miri, it would make a lot more sense to have the `Pointer::offset` field represent the full absolute address (instead of being relative to the `AllocId`). This means we can do ptr-to-int casts without access to any machine state, and it means that the overflow checks on pointer arithmetic are (finally!) accurate.

To solve this, the `Pointer` type is made entirely parametric over the provenance, so that we can use `Pointer<AllocId>` inside `Scalar` but use `Pointer<Option<AllocId>>` when accessing memory (where `None` represents the case that we could not figure out an `AllocId`; in that case the `offset` is an absolute address). Moreover, the `Provenance` trait determines if a pointer with a given provenance can be cast to an integer by simply dropping the provenance.

I hope this can be read commit-by-commit, but the first commit does the bulk of the work. It introduces some FIXMEs that are resolved later.
Fixes https://github.com/rust-lang/miri/issues/841
Miri PR: https://github.com/rust-lang/miri/pull/1851
r? `@oli-obk`
2021-07-17 15:26:27 +00:00
bors
f502bd3abd Auto merge of #86761 - Alexhuszagh:master, r=estebank
Update Rust Float-Parsing Algorithms to use the Eisel-Lemire algorithm.

# Summary

Rust, although it implements a correct float parser, has major performance issues in float parsing. Even for common floats, the performance can be 3-10x [slower](https://arxiv.org/pdf/2101.11408.pdf) than external libraries such as [lexical](https://github.com/Alexhuszagh/rust-lexical) and [fast-float-rust](https://github.com/aldanor/fast-float-rust).

Recently, major advances in float-parsing algorithms have been developed by Daniel Lemire, along with others, and implement a fast, performant, and correct float parser, with speeds up to 1200 MiB/s on Apple's M1 architecture for the [canada](0e2b5d163d/data/canada.txt) dataset, 10x faster than Rust's 130 MiB/s.

In addition, [edge-cases](https://github.com/rust-lang/rust/issues/85234) in Rust's [dec2flt](868c702d0c/library/core/src/num/dec2flt) algorithm can lead to over a 1600x slowdown relative to efficient algorithms. This is due to the use of Clinger's correct, but slow [AlgorithmM and Bellepheron](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.4152&rep=rep1&type=pdf), which have been improved by faster big-integer algorithms and the Eisel-Lemire algorithm, respectively.

Finally, this algorithm provides substantial improvements in the number of floats the Rust core library can parse. Denormal floats with a large number of digits cannot be parsed, due to use of the `Big32x40`, which simply does not have enough digits to round a float correctly. Using a custom decimal class, with much simpler logic, we can parse all valid decimal strings of any digit count.

```rust
// Issue in Rust's dec2fly.
"2.47032822920623272088284396434110686182e-324".parse::<f64>();   // Err(ParseFloatError { kind: Invalid })
```

# Solution

This pull request implements the Eisel-Lemire algorithm, modified from [fast-float-rust](https://github.com/aldanor/fast-float-rust) (which is licensed under Apache 2.0/MIT), along with numerous modifications to make it more amenable to inclusion in the Rust core library. The following describes both features in fast-float-rust and improvements in fast-float-rust for inclusion in core.

**Documentation**

Extensive documentation has been added to ensure the code base may be maintained by others, which explains the algorithms as well as various associated constants and routines. For example, two seemingly magical constants include documentation to describe how they were derived as follows:

```rust
    // Round-to-even only happens for negative values of q
    // when q ≥ −4 in the 64-bit case and when q ≥ −17 in
    // the 32-bitcase.
    //
    // When q ≥ 0,we have that 5^q ≤ 2m+1. In the 64-bit case,we
    // have 5^q ≤ 2m+1 ≤ 2^54 or q ≤ 23. In the 32-bit case,we have
    // 5^q ≤ 2m+1 ≤ 2^25 or q ≤ 10.
    //
    // When q < 0, we have w ≥ (2m+1)×5^−q. We must have that w < 2^64
    // so (2m+1)×5^−q < 2^64. We have that 2m+1 > 2^53 (64-bit case)
    // or 2m+1 > 2^24 (32-bit case). Hence,we must have 2^53×5^−q < 2^64
    // (64-bit) and 2^24×5^−q < 2^64 (32-bit). Hence we have 5^−q < 2^11
    // or q ≥ −4 (64-bit case) and 5^−q < 2^40 or q ≥ −17 (32-bitcase).
    //
    // Thus we have that we only need to round ties to even when
    // we have that q ∈ [−4,23](in the 64-bit case) or q∈[−17,10]
    // (in the 32-bit case). In both cases,the power of five(5^|q|)
    // fits in a 64-bit word.
    const MIN_EXPONENT_ROUND_TO_EVEN: i32;
    const MAX_EXPONENT_ROUND_TO_EVEN: i32;
```

This ensures maintainability of the code base.

**Improvements for Disguised Fast-Path Cases**

The fast path in float parsing algorithms attempts to use native, machine floats to represent both the significant digits and the exponent, which is only possible if both can be exactly represented without rounding. In practice, this means that the significant digits must be 53-bits or less and the then exponent must be in the range `[-22, 22]` (for an f64). This is similar to the existing dec2flt implementation.

However, disguised fast-path cases exist, where there are few significant digits and an exponent above the valid range, such as `1.23e25`. In this case, powers-of-10 may be shifted from the exponent to the significant digits, discussed at length in https://github.com/rust-lang/rust/issues/85198.

**Digit Parsing Improvements**

Typically, integers are parsed from string 1-at-a-time, requiring unnecessary multiplications which can slow down parsing. An approach to parse 8 digits at a time using only 3 multiplications is described in length [here](https://johnnylee-sde.github.io/Fast-numeric-string-to-int/). This leads to significant performance improvements, and is implemented for both big and little-endian systems.

**Unsafe Changes**

Relative to fast-float-rust, this library makes less use of unsafe functionality and clearly documents it. This includes the refactoring and documentation of numerous unsafe methods undesirably marked as safe. The original code would look something like this, which is deceptively marked as safe for unsafe functionality.

```rust
impl AsciiStr {
    #[inline]
    pub fn step_by(&mut self, n: usize) -> &mut Self {
        unsafe { self.ptr = self.ptr.add(n) };
        self
    }
}

...

#[inline]
fn parse_scientific(s: &mut AsciiStr<'_>) -> i64 {
    // the first character is 'e'/'E' and scientific mode is enabled
    let start = *s;
    s.step();
    ...
}
```

The new code clearly documents safety concerns, and does not mark unsafe functionality as safe, leading to better safety guarantees.

```rust
impl AsciiStr {
    /// Advance the view by n, advancing it in-place to (n..).
    pub unsafe fn step_by(&mut self, n: usize) -> &mut Self {
        // SAFETY: same as step_by, safe as long n is less than the buffer length
        self.ptr = unsafe { self.ptr.add(n) };
        self
    }
}

...

/// Parse the scientific notation component of a float.
fn parse_scientific(s: &mut AsciiStr<'_>) -> i64 {
    let start = *s;
    // SAFETY: the first character is 'e'/'E' and scientific mode is enabled
    unsafe {
        s.step();
    }
    ...
}
```

This allows us to trivially demonstrate the new implementation of dec2flt is safe.

**Inline Annotations Have Been Removed**

In the previous implementation of dec2flt, inline annotations exist practically nowhere in the entire module. Therefore, these annotations have been removed, which mostly does not impact [performance](https://github.com/aldanor/fast-float-rust/issues/15#issuecomment-864485157).

**Fixed Correctness Tests**

Numerous compile errors in `src/etc/test-float-parse` were present, due to deprecation of `time.clock()`, as well as the crate dependencies with `rand`. The tests have therefore been reworked as a [crate](https://github.com/Alexhuszagh/rust/tree/master/src/etc/test-float-parse), and any errors in `runtests.py` have been patched.

**Undefined Behavior**

An implementation of `check_len` which relied on undefined behavior (in fast-float-rust) has been refactored, to ensure that the behavior is well-defined. The original code is as follows:

```rust
    #[inline]
    pub fn check_len(&self, n: usize) -> bool {
        unsafe { self.ptr.add(n) <= self.end }
    }
```

And the new implementation is as follows:

```rust
    /// Check if the slice at least `n` length.
    fn check_len(&self, n: usize) -> bool {
        n <= self.as_ref().len()
    }
```

Note that this has since been fixed in [fast-float-rust](https://github.com/aldanor/fast-float-rust/pull/29).

**Inferring Binary Exponents**

Rather than explicitly store binary exponents, this new implementation infers them from the decimal exponent, reducing the amount of static storage required. This removes the requirement to store [611 i16s](868c702d0c/library/core/src/num/dec2flt/table.rs (L8)).

# Code Size

The code size, for all optimizations, does not considerably change relative to before for stripped builds, however it is **significantly** smaller prior to stripping the resulting binaries. These binary sizes were calculated on x86_64-unknown-linux-gnu.

**new**

Using rustc version 1.55.0-dev.

opt-level|size|size(stripped)
|:-:|:-:|:-:|
0|400k|300K
1|396k|292K
2|392k|292K
3|392k|296K
s|396k|292K
z|396k|292K

**old**

Using rustc version 1.53.0-nightly.

opt-level|size|size(stripped)
|:-:|:-:|:-:|
0|3.2M|304K
1|3.2M|292K
2|3.1M|284K
3|3.1M|284K
s|3.1M|284K
z|3.1M|284K

# Correctness

The dec2flt implementation passes all of Rust's unittests and comprehensive float parsing tests, along with numerous other tests such as Nigel Toa's comprehensive float [tests](https://github.com/nigeltao/parse-number-fxx-test-data) and Hrvoje Abraham  [strtod_tests](https://github.com/ahrvoje/numerics/blob/master/strtod/strtod_tests.toml). Therefore, it is unlikely that this algorithm will incorrectly round parsed floats.

# Issues Addressed

This will fix and close the following issues:

- resolves #85198
- resolves #85214
- resolves #85234
- fixes #31407
- fixes #31109
- fixes #53015
- resolves #68396
- closes https://github.com/aldanor/fast-float-rust/issues/15
2021-07-17 12:56:22 +00:00
Alex Huszagh
8752b40369 Changed dec2flt to use the Eisel-Lemire algorithm.
Implementation is based off fast-float-rust, with a few notable changes.

- Some unsafe methods have been removed.
- Safe methods with inherently unsafe functionality have been removed.
- All unsafe functionality is documented and provably safe.
- Extensive documentation has been added for simpler maintenance.
- Inline annotations on internal routines has been removed.
- Fixed Python errors in src/etc/test-float-parse/runtests.py.
- Updated test-float-parse to be a library, to avoid missing rand dependency.
- Added regression tests for #31109 and #31407 in core tests.
- Added regression tests for #31109 and #31407 in ui tests.
- Use the existing slice primitive to simplify shared dec2flt methods
- Remove Miri ignores from dec2flt, due to faster parsing times.

- resolves #85198
- resolves #85214
- resolves #85234
- fixes #31407
- fixes #31109
- fixes #53015
- resolves #68396
- closes https://github.com/aldanor/fast-float-rust/issues/15
2021-07-17 00:30:34 -05:00
Ralf Jung
efbee50600 avoid manual Debug impls by adding extra Provenance bounds to types
I wish the derive macro would support adding extra where clauses...
2021-07-16 20:02:14 +02:00
Ralf Jung
a5299fb688 add some comments regarding the two major quirks of our memory model 2021-07-16 13:16:09 +02:00
Ralf Jung
7c720ce612 get rid of incorrect erase_for_fmt 2021-07-16 10:09:56 +02:00
Ralf Jung
4e28065618 tweak pointer out-of-bounds error message 2021-07-15 22:47:11 +02:00
Ralf Jung
adbe7554d7 enable Miri to fix the bytes in an allocation (since ptr offsets have different meanings there) 2021-07-15 18:03:22 +02:00
Ralf Jung
f4b61ba509 adjustions and cleanup to make Miri build again 2021-07-15 17:14:11 +02:00
Ralf Jung
8932aebfdf remove unused error variant 2021-07-14 18:17:50 +02:00
Ralf Jung
ae950a2dc7 more precise message for the ptr access check on deref 2021-07-14 18:17:49 +02:00
Ralf Jung
71c166a0dc use NonZeroU64 for AllocId to restore old type sizes 2021-07-14 18:17:49 +02:00
Ralf Jung
626605cea0 consistently treat None-tagged pointers as ints; get rid of some deprecated Scalar methods 2021-07-14 18:17:49 +02:00
Ralf Jung
d4f7dd6702 CTFE/Miri engine Pointer type overhaul: make Scalar-to-Pointer conversion infallible
This resolves all the problems we had around "normalizing" the representation of a Scalar in case it carries a Pointer value: we can just use Pointer if we want to have a value taht we are sure is already normalized.
2021-07-14 18:17:46 +02:00
Ralf Jung
c8baac5776 remove remaining use of Pointer in Allocation API 2021-07-12 18:45:26 +02:00
Mara Bos
e920ef8785
Rollup merge of #86855 - LeSeulArtichaut:patch-1, r=davidtwco
Fix comments about unique borrows
2021-07-09 16:20:32 +02:00
bjorn3
56c6a48d2e Truncate hex stable crate id to 8 characters (32 bits) 2021-07-06 11:36:23 +02:00
bjorn3
489ad8b8b5 Revert "Revert "Merge CrateDisambiguator into StableCrateId""
This reverts commit 8176ab8bc1.
2021-07-06 11:28:04 +02:00
Yuki Okushi
1ca32050b1
Rollup merge of #86685 - RalfJung:alloc-mut, r=oli-obk
double-check mutability inside Allocation

r? `@oli-obk`
2021-07-06 02:33:14 +09:00
LeSeulArtichaut
88bcd8a25e Fix comments about unique borrows 2021-07-04 03:34:08 +02:00
Smitty
e9d69d9f8e Allocation failure in constprop panics right away 2021-07-02 16:06:12 -04:00
Smitty
3e20129a18 Delay ICE on evaluation fail 2021-06-30 15:38:31 -04:00
Smittyvb
12a8d106f6
Note that even ConstProp follows the rules
Co-authored-by: Ralf Jung <post@ralfj.de>
2021-06-30 12:42:04 -04:00
Smitty
9f227945f1 Simplify memory failure checking 2021-06-30 11:24:52 -04:00
Smitty
ba542eebc0 Rename is_spurious -> is_volatile 2021-06-30 09:27:30 -04:00
Smittyvb
55379bb7ea
simplify explanation comment
Co-authored-by: Ralf Jung <post@ralfj.de>
2021-06-30 09:07:47 -04:00
Smitty
d04da1125d Properly handle const prop failures 2021-06-29 20:22:32 -04:00
Smitty
ab66c3fbd4 Add comment with reasoning for non-determinism 2021-06-29 19:08:30 -04:00
Smitty
43b55cf893 Simplify allocation creation 2021-06-29 19:08:29 -04:00
Smitty
dc1c6c3a25 Make memory exhaustion a hard error 2021-06-29 19:08:29 -04:00
Smitty
524e575bb4 Support allocation failures when interperting MIR
Note that this breaks Miri.

Closes #79601
2021-06-29 19:08:26 -04:00
Ralf Jung
719dafc48b double-check mutability inside Allocation 2021-06-28 09:19:36 +02:00
Cameron Steffen
b07bb6d698 Fix unused_unsafe with compiler-generated unsafe 2021-06-21 17:25:45 -05:00
bors
ec57c60c50 Auto merge of #86194 - RalfJung:const-ub-hard-error, r=oli-obk
make UB during CTFE a hard error

This is a next step for https://github.com/rust-lang/rust/issues/71800. `const_err` has been a future-incompatibility lint for 4 months now since https://github.com/rust-lang/rust/pull/80394 (and err-by-default for many years before that), so I think we could try making it a proper hard error at least in some situations.

I didn't yet adjust the tests, since I first want to gauge the fall-out via crater.
Cc `@rust-lang/wg-const-eval`
2021-06-18 23:17:40 +00:00
bors
312b894cc1 Auto merge of #85421 - Smittyvb:rm_pushpop_unsafe, r=matthewjasper
Remove some last remants of {push,pop}_unsafe!

These macros have already been removed, but there was still some code handling these macros. That code is now removed.
2021-06-18 14:17:53 +00:00
Ralf Jung
3c08cf8e5e make UB during CTFE a hard error 2021-06-18 16:00:04 +02:00
Yuki Okushi
c062f3dddd
Rollup merge of #86340 - Smittyvb:ctfe-hard-error-message, r=RalfJung
Use better error message for hard errors in CTFE

I noticed this while working on #86255: currently the same message is used for hard errors and soft errors in CTFE. This changes the error messages to make hard errors use a message that indicates the reality of the situation correctly, since usage of the constant is never allowed when there was a hard error evaluating it. This doesn't affect the behaviour of these error messages, only the content.

This changes the error logic to check if the error should be hard or soft where it is generated, instead of where it is emitted, to allow this distinction in error messages.
2021-06-17 21:56:43 +09:00
Smitty
044b3620e7 Move some hard error logic to InterpError 2021-06-16 18:23:34 -04:00
Rémy Rakic
19fddc019f Improve documentation on UndefinedBehaviorInfo::ValidationFailure 2021-06-14 18:57:06 +02:00
Rémy Rakic
87ecf84c36 Improve CTFE validation error message 2021-06-13 22:40:42 +02:00
bors
fb3ea63d9b Auto merge of #86245 - lqd:const-ub-align, r=RalfJung
Fix ICEs on invalid vtable size/alignment const UB errors

The invalid vtable size/alignment errors from `InterpCx::read_size_and_align_from_vtable` were "freeform const UB errors", causing ICEs when reaching validation. This PR turns them into const UB hard errors to catch them during validation and avoid that.

Fixes #86193

r? `@RalfJung`

(It seemed cleaner to have 2 variants but they can be merged into one variant with a message payload if you prefer that ?)
2021-06-13 12:08:59 +00:00
Rémy Rakic
cae1918b29 Turn incorrect vtable size/alignment errors into hard const-UB errors
They were "freeform const UB" error message, but could reach validation
and trigger ICEs there. We now catch them during validation to avoid
that.
2021-06-13 13:11:07 +02:00
bors
d59b80d588 Auto merge of #86130 - BoxyUwU:abstract_const_as_cast, r=oli-obk
const_eval_checked: Support as casts in abstract consts
2021-06-12 08:50:22 +00:00
bors
c4b5406981 Auto merge of #86118 - spastorino:tait-soundness-bug, r=nikomatsakis
Create different inference variables for different defining uses of TAITs

Fixes #73481

r? `@nikomatsakis`
cc `@oli-obk`
2021-06-09 09:00:16 +00:00
Ellen
8e7299dfcd Support as casts in abstract consts 2021-06-08 08:02:16 +01:00
Santiago Pastorino
7294f49d52
Remove ResolvedOpaqueTy and just use Ty, SubstsRef is already there 2021-06-07 19:07:07 -03:00
Santiago Pastorino
7f8cad2019
Make OpaqueTypeKey the key of opaque types map 2021-06-07 19:04:52 -03:00
Santiago Pastorino
3405725e00
Change concrete opaque type to be a VecMap 2021-06-07 19:04:19 -03:00