mem::uninitialized: mitigate many incorrect uses of this function
Alternative to https://github.com/rust-lang/rust/pull/98966: fill memory with `0x01` rather than leaving it uninit. This is definitely bitewise valid for all `bool` and nonnull types, and also those `Option<&T>` that we started putting `noundef` on. However it is still invalid for `char` and some enums, and on references the `dereferenceable` attribute is still violated, so the generated LLVM IR still has UB -- but in fewer cases, and `dereferenceable` is hopefully less likely to cause problems than clearly incorrect range annotations.
This can make using `mem::uninitialized` a lot slower, but that function has been deprecated for years and we keep telling everyone to move to `MaybeUninit` because it is basically impossible to use `mem::uninitialized` correctly. For the cases where that hasn't helped (and all the old code out there that nobody will ever update), we can at least mitigate the effect of using this API. Note that this is *not* in any way a stable guarantee -- it is still UB to call `mem::uninitialized::<bool>()`, and Miri will call it out as such.
This is somewhat similar to https://github.com/rust-lang/rust/pull/87032, which proposed to make `uninitialized` return a buffer filled with 0x00. However
- That PR also proposed to reduce the situations in which we panic, which I don't think we should do at this time.
- The 0x01 bit pattern means that nonnull requirements are satisfied, which (due to references) is the most common validity invariant.
`@5225225` I hope I am using `cfg(sanitize)` the right way; I was not sure for which ones to test here.
Cc https://github.com/rust-lang/rust/issues/66151
Fixes https://github.com/rust-lang/rust/issues/87675
Fix slice::ChunksMut aliasing
Fixes https://github.com/rust-lang/rust/issues/94231, details in that issue.
cc `@RalfJung`
This isn't done just yet, all the safety comments are placeholders. But otherwise, it seems to work.
I don't really like this approach though. There's a lot of unsafe code where there wasn't before, but as far as I can tell the only other way to uphold the aliasing requirement imposed by `__iterator_get_unchecked` is to use raw slices, which I think require the same amount of unsafe code. All that would do is tie the `len` and `ptr` fields together.
Oh I just looked and I'm pretty sure that `ChunksExactMut`, `RChunksMut`, and `RChunksExactMut` also need to be patched. Even more reason to put up a draft.
Rollup of 5 pull requests
Successful merges:
- #99079 (Check that RPITs constrained by a recursive call in a closure are compatible)
- #99704 (Add `Self: ~const Trait` to traits with `#[const_trait]`)
- #99769 (Sync rustc_codegen_cranelift)
- #99783 (rustdoc: remove Clean trait impls for more items)
- #99789 (Refactor: use `pluralize!`)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Remove some redundant checks from BufReader
The implementation of BufReader contains a lot of redundant checks. While any one of these checks is not particularly expensive to execute, especially when taken together they dramatically inhibit LLVM's ability to make subsequent optimizations by confusing data flow increasing the code size of anything that uses BufReader.
In particular, these changes have a ~2x increase on the benchmark that this adds a `black_box` to. I'm adding that `black_box` here just in case LLVM gets clever enough to remove the reads entirely. Right now it can't, but these optimizations are really setting it up to do so.
We get this optimization by factoring all the actual buffer management and bounds-checking logic into a new module inside `bufreader` with a new `Buffer` type. This makes it much easier to ensure that we have correctly encapsulated the management of the region of the buffer that we have read bytes into, and it lets us provide a new faster way to do small reads. `Buffer::consume_with` lets a caller do a read from the buffer with a single bounds check, instead of the double-check that's required to use `buffer` + `consume`.
Unfortunately I'm not aware of a lot of open-source usage of `BufReader` in perf-critical environments. Some time ago I tweaked this code because I saw `BufReader` in a profile at work, and I contributed some benchmarks to the `bincode` crate which exercise `BufReader::buffer`. These changes appear to help those benchmarks at little, but all these sorts of benchmarks are kind of fragile so I'm wary of quoting anything specific.
Rollup of 8 pull requests
Successful merges:
- #98583 (Stabilize Windows `FileTypeExt` with `is_symlink_dir` and `is_symlink_file`)
- #99698 (Prefer visibility map parents that are not `doc(hidden)` first)
- #99700 (Add a clickable link to the layout section)
- #99712 (passes: port more of `check_attr` module)
- #99759 (Remove dead code from cg_llvm)
- #99765 (Don't build std for *-uefi targets)
- #99771 (Update pulldown-cmark version to 0.9.2 (fixes url encoding for some chars))
- #99775 (rustdoc: do not allocate String when writing path full name)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Stabilize Windows `FileTypeExt` with `is_symlink_dir` and `is_symlink_file`
These calls allow detecting whether a symlink is a file or a directory,
a distinction Windows maintains, and one important to software that
wants to do further operations on the symlink (e.g. removing it).
Optimized vec::IntoIter::next_chunk impl
```
x86_64v1, default
test vec::bench_next_chunk ... bench: 696 ns/iter (+/- 22)
x86_64v1, pr
test vec::bench_next_chunk ... bench: 309 ns/iter (+/- 4)
znver2, default
test vec::bench_next_chunk ... bench: 17,272 ns/iter (+/- 117)
znver2, pr
test vec::bench_next_chunk ... bench: 211 ns/iter (+/- 3)
```
On znver2 the default impl seems to be slow due to different inlining decisions. It goes through `core::array::iter_next_chunk`
which has a deep call tree.
codegen: use new {re,de,}allocator annotations in llvm
This obviates the patch that teaches LLVM internals about
_rust_{re,de}alloc functions by putting annotations directly in the IR
for the optimizer.
The sole test change is required to anchor FileCheck to the body of the
`box_uninitialized` method, so it doesn't see the `allocalign` on
`__rust_alloc` and get mad about the string `alloca` showing up. Since I
was there anyway, I added some checks on the attributes to prove the
right attributes got set.
r? `@nikic`
```
test vec::bench_next_chunk ... bench: 696 ns/iter (+/- 22)
x86_64v1, pr
test vec::bench_next_chunk ... bench: 309 ns/iter (+/- 4)
znver2, default
test vec::bench_next_chunk ... bench: 17,272 ns/iter (+/- 117)
znver2, pr
test vec::bench_next_chunk ... bench: 211 ns/iter (+/- 3)
```
The znver2 default impl seems to be slow due to inlining decisions. It goes through `core::array::iter_next_chunk`
which has a deeper call tree.
This obviates the patch that teaches LLVM internals about
_rust_{re,de}alloc functions by putting annotations directly in the IR
for the optimizer.
The sole test change is required to anchor FileCheck to the body of the
`box_uninitialized` method, so it doesn't see the `allocalign` on
`__rust_alloc` and get mad about the string `alloca` showing up. Since I
was there anyway, I added some checks on the attributes to prove the
right attributes got set.
While we're here, we also emit allocator attributes on
__rust_alloc_zeroed. This should allow LLVM to perform more
optimizations for zeroed blocks, and probably fixes#90032. [This
comment](https://github.com/rust-lang/rust/issues/24194#issuecomment-308791157)
mentions "weird UB-like behaviour with bitvec iterators in
rustc_data_structures" so we may need to back this change out if things
go wrong.
The new test cases require LLVM 15, so we copy them into LLVM
14-supporting versions, which we can delete when we drop LLVM 14.
interpret, ptr_offset_from: refactor and test too-far-apart check
We didn't have any tests for the "too far apart" message, and indeed that check mostly relied on the in-bounds check and was otherwise probably not entirely correct... so I rewrote that check, and it is before the in-bounds check so we can test it separately.
Expose size_hint() for TokenStream's iterator
The iterator for `proc_macro::TokenStream` is a wrapper around a `Vec` iterator:
babff2211e/library/proc_macro/src/lib.rs (L363-L371)
so it can cheaply provide a perfectly precise size hint, with just a pointer subtraction:
babff2211e/library/alloc/src/vec/into_iter.rs (L170-L177)
I need the size hint in syn (https://github.com/dtolnay/syn/blob/1.0.98/src/buffer.rs) to reduce allocations when converting TokenStream into syn's internal TokenBuffer representation.
Aside from `size_hint`, the other non-default methods in `std::vec::IntoIter`'s `Iterator` impl are `advance_by`, `count`, and `__iterator_get_unchecked`. I've included `count` in this PR since it is trivial. I did not include `__iterator_get_unchecked` because it is spoopy and I did not feel like dealing with that. Lastly, I did not include `advance_by` because that requires `feature(iter_advance_by)` (https://github.com/rust-lang/rust/issues/77404) and I noticed this comment at the top of libproc_macro:
babff2211e/library/proc_macro/src/lib.rs (L20-L22)
correct the output of a `capacity` method example
The output of this example in std::alloc is different from which shown in the comment. I have tested it on both Linux and Windows.
Constify a few `(Partial)Ord` impls
Only a few `impl`s are constified for now, as #92257 has not landed in the bootstrap compiler yet and quite a few impls would need that fix.
This unblocks #92228, which unblocks marking iterator methods as `default_method_body_is_const`.
add miri-track-caller to more intrinsic-exposing methods
Follow-up to https://github.com/rust-lang/rust/pull/98674: I went through the Miri test suite to find more functions that would benefit from Miri backtrace pruning, and this is what I found.
Basically anything that just exposes a potentially-UB intrinsic to the user should get this treatment.
kmc-solid: Use `libc::abort` to abort a program
This PR updates the target-specific abort subroutine for the [`*-kmc-solid_*`](https://doc.rust-lang.org/nightly/rustc/platform-support/kmc-solid.html) Tier 3 targets.
The current implementation uses a `hlt` instruction, which is the most direct way to notify a connected debugger but is not the most flexible way. This PR changes it to call the `abort` libc function, making it possible for a system designer to override its behavior as they see fit.
Support vec zero-alloc optimization for tuples and byte arrays
* Implement IsZero trait for tuples up to 8 IsZero elements;
* Implement IsZero for u8/i8, leading to implementation of it for arrays of them too;
* Add more codegen tests for this optimization.
* Lower size of array for IsZero trait because it fails to inline checks
* Implement IsZero trait for tuples up to 8 IsZero elements;
* Implement IsZero for u8/i8, leading to implementation of it for arrays of them too;
* Add more codegen tests for this optimization.
* Lower size of array for IsZero trait because it fails to inline checks
The implementation of BufReader contains a lot of redundant checks.
While any one of these checks is not particularly expensive to execute,
especially when taken together they dramatically inhibit LLVM's ability
to make subsequent optimizations.