Implement most of RFC 2930, providing the ReadBuf abstraction
This replaces the `Initializer` abstraction for permitting reading into uninitialized buffers, closing #42788.
This leaves several APIs described in the RFC out of scope for the initial implementation:
* read_buf_vectored
* `ReadBufs`
Closes#42788, by removing the relevant APIs.
Reading a file into an empty vector or string buffer can incur
unnecessary `read` syscalls and memory re-allocations as the buffer
"warms up" and grows to its final size. This is perhaps a necessary evil
with generic readers, but files can be read in smarter by checking the
file size and reserving that much capacity.
`std::fs::read` and `read_to_string` already perform this optimization:
they open the file, reads its metadata, and call `with_capacity` with
the file size. This ensures that the buffer does not need to be resized
and an initial string of small `read` syscalls.
However, if a user opens the `File` themselves and calls
`file.read_to_end` or `file.read_to_string` they do not get this
optimization.
```rust
let mut buf = Vec::new();
file.read_to_end(&mut buf)?;
```
I searched through this project's codebase and even here are a *lot* of
examples of this. They're found all over in unit tests, which isn't a
big deal, but there are also several real instances in the compiler and
in Cargo. I've documented the ones I found in a comment here:
https://github.com/rust-lang/rust/issues/89516#issuecomment-934423999
Most telling, the `Read` trait and the `read_to_end` method both show
this exact pattern as examples of how to use readers. What this says to
me is that this shouldn't be solved by simply fixing the instances of it
in this codebase. If it's here it's certain to be prevalent in the wider
Rust ecosystem.
To that end, this commit adds specializations of `read_to_end` and
`read_to_string` directly on `File`. This way it's no longer a minor
footgun to start with an empty buffer when reading a file in.
A nice side effect of this change is that code that accesses a `File` as
a bare `Read` constraint or via a `dyn Read` trait object will benefit.
For example, this code from `compiler/rustc_serialize/src/json.rs`:
```rust
pub fn from_reader(rdr: &mut dyn Read) -> Result<Json, BuilderError> {
let mut contents = Vec::new();
match rdr.read_to_end(&mut contents) {
```
Related changes:
- I also added specializations to `BufReader` to delegate to
`self.inner`'s methods. That way it can call `File`'s optimized
implementations if the inner reader is a file.
- The private `std::io::append_to_string` function is now marked
`unsafe`.
- `File::read_to_string` being more efficient means that the performance
note for `io::read_to_string` can be softened. I've added @camelid's
suggested wording from:
https://github.com/rust-lang/rust/issues/80218#issuecomment-936806502
Fix read_to_end to not grow an exact size buffer
If you know how much data to expect and use `Vec::with_capacity` to pre-allocate a buffer of that capacity, `Read::read_to_end` will still double its capacity. It needs some space to perform a read, even though that read ends up returning `0`.
It's a bummer to carefully pre-allocate 1GB to read a 1GB file into memory and end up using 2GB.
This fixes that behavior by special casing a full buffer and reading into a small "probe" buffer instead. If that read returns `0` then it's confirmed that the buffer was the perfect size. If it doesn't, the probe buffer is appended to the normal buffer and the read loop continues.
Fixing this allows several workarounds in the standard library to be removed:
- `Take` no longer needs to override `Read::read_to_end`.
- The `reservation_size` callback that allowed `Take` to inhibit the previous over-allocation behavior isn't needed.
- `fs::read` doesn't need to reserve an extra byte in `initial_buffer_size`.
Curiously, there was a unit test that specifically checked that `Read::read_to_end` *does* over-allocate. I removed that test, too.
----------
Fix spacing for links inside code blocks, and improve link tooltips in alloc::fmt
----------
Fix spacing for links inside code blocks, and improve link tooltips in alloc::{rc, sync}
----------
Fix spacing for links inside code blocks, and improve link tooltips in alloc::string
----------
Fix spacing for links inside code blocks in alloc::vec
----------
Fix spacing for links inside code blocks in core::option
----------
Fix spacing for links inside code blocks, and improve a few link tooltips in core::result
----------
Fix spacing for links inside code blocks in core::{iter::{self, iterator}, stream::stream, poll}
----------
Fix spacing for links inside code blocks, and improve a few link tooltips in std::{fs, path}
----------
Fix spacing for links inside code blocks in std::{collections, time}
----------
Fix spacing for links inside code blocks in and make formatting of `&str`-like types consistent in std::ffi::{c_str, os_str}
----------
Fix spacing for links inside code blocks, and improve link tooltips in std::ffi
----------
Fix spacing for links inside code blocks, and improve a few link tooltips
in std::{io::{self, buffered::{bufreader, bufwriter}, cursor, util}, net::{self, addr}}
----------
Fix typo in link to `into` for `OsString` docs
----------
Remove tooltips that will probably become redundant in the future
----------
Apply suggestions from code review
Replacing `…std/primitive.reference.html` paths with just `reference`
Co-authored-by: Joshua Nelson <github@jyn.dev>
----------
Also replace `…std/primitive.reference.html` paths with just `reference` in `core::pin`
If you know how much data to expect and use `Vec::with_capacity` to
pre-allocate a buffer of that capacity, `Read::read_to_end` will still
double its capacity. It needs some space to perform a read, even though
that read ends up returning `0`.
It's a bummer to carefully pre-allocate 1GB to read a 1GB file into
memory and end up using 2GB.
This fixes that behavior by special casing a full buffer and reading
into a small "probe" buffer instead. If that read returns `0` then it's
confirmed that the buffer was the perfect size. If it doesn't, the probe
buffer is appended to the normal buffer and the read loop continues.
Fixing this allows several workarounds in the standard library to be
removed:
- `Take` no longer needs to override `Read::read_to_end`.
- The `reservation_size` callback that allowed `Take` to inhibit the
previous over-allocation behavior isn't needed.
- `fs::read` doesn't need to reserve an extra byte in
`initial_buffer_size`.
Curiously, there was a unit test that specifically checked that
`Read::read_to_end` *does* over-allocate. I removed that test, too.
Add diagnostic items for Clippy
This adds a bunch of diagnostic items to `std`/`core`/`alloc` functions, structs and traits used in Clippy. The actual refactorings in Clippy to use these items will be done in a different PR in Clippy after the next sync.
This PR doesn't include all paths Clippy uses, I've only gone through the first 85 lines of Clippy's [`paths.rs`](ecf85f4bdc/clippy_utils/src/paths.rs) (after rust-lang/rust-clippy#7466) to get some feedback early on. I've also decided against adding diagnostic items to methods, as it would be nicer and more scalable to access them in a nicer fashion, like adding a `is_diagnostic_assoc_item(did, sym::Iterator, sym::map)` function or something similar (Suggested by `@camsteffen` [on Zulip](https://rust-lang.zulipchat.com/#narrow/stream/147480-t-compiler.2Fwg-diagnostics/topic/Diagnostic.20Item.20Naming.20Convention.3F/near/225024603))
There seems to be some different naming conventions when it comes to diagnostic items, some use UpperCamelCase (`BinaryHeap`) and some snake_case (`hashmap_type`). This PR uses UpperCamelCase for structs and traits and snake_case with the module name as a prefix for functions. Any feedback on is this welcome.
cc: rust-lang/rust-clippy#5393
r? `@Manishearth`
Remove some doc aliases
As per the new doc alias policy in https://github.com/rust-lang/std-dev-guide/pull/25, this removes some controversial doc aliases:
- `malloc`, `alloc`, `realloc`, etc.
- `length` (alias for `len`)
- `delete` (alias for `remove` in collections and also file/directory deletion)
r? `@joshtriplett`
Redefine `ErrorKind::Other` and stop using it in std.
This implements the idea I shared yesterday in the libs meeting when we were discussing how to handle adding new `ErrorKind`s to the standard library: This redefines `Other` to be for *user defined errors only*, and changes all uses of `Other` in the standard library to a `#[doc(hidden)]` and permanently `#[unstable]` `ErrorKind` that users can not match on. This ensures that adding `ErrorKind`s at a later point in time is not a breaking change, since the user couldn't match on these errors anyway. This way, we use the `#[non_exhaustive]` property of the enum in a more effective way.
Open questions:
- How do we check this change doesn't cause too much breakage? Will a crate run help and be enough?
- How do we ensure we don't accidentally start using `Other` again in the standard library? We don't have a `pub(not crate)` or `#[deprecated(in this crate only)]`.
cc https://github.com/rust-lang/rust/pull/79965
cc `@rust-lang/libs` `@ijackson`
r? `@dtolnay`
This patch adds doc aliases for "delete". The added aliases are
supposed to reference usages `delete` in other programming
languages.
- `HashMap::remove`, `BTreeMap::remove` -> `Map#delete` and `delete`
keyword in JavaScript.
- `HashSet::remove`, `BTreeSet::remove` -> `Set#delete` in JavaScript.
- `mem::drop` -> `delete` keyword in C++.
- `fs::remove_file`, `fs::remove_dir`, `fs::remove_dir_all`
-> `File#delete` in Java, `File#delete` and `Dir#delete` in Ruby.
Before this change, searching for "delete" in documentation
returned no results.
specialize io::copy to use copy_file_range, splice or sendfile
Fixes#74426.
Also covers #60689 but only as an optimization instead of an official API.
The specialization only covers std-owned structs so it should avoid the problems with #71091
Currently linux-only but it should be generalizable to other unix systems that have sendfile/sosplice and similar.
There is a bit of optimization potential around the syscall count. Right now it may end up doing more syscalls than the naive copy loop when doing short (<8KiB) copies between file descriptors.
The test case executes the following:
```
[pid 103776] statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=17, ...}) = 0
[pid 103776] write(4, "wxyz", 4) = 4
[pid 103776] write(4, "iklmn", 5) = 5
[pid 103776] copy_file_range(3, NULL, 4, NULL, 5, 0) = 5
```
0-1 `stat` calls to identify the source file type. 0 if the type can be inferred from the struct from which the FD was extracted
𝖬 `write` to drain the `BufReader`/`BufWriter` wrappers. only happen when buffers are present. 𝖬 ≾ number of wrappers present. If there is a write buffer it may absorb the read buffer contents first so only result in a single write. Vectored writes would also be an option but that would require more invasive changes to `BufWriter`.
𝖭 `copy_file_range`/`splice`/`sendfile` until file size, EOF or the byte limit from `Take` is reached. This should generally be *much* more efficient than the read-write loop and also have other benefits such as DMA offload or extent sharing.
## Benchmarks
```
OLD
test io::tests::bench_file_to_file_copy ... bench: 21,002 ns/iter (+/- 750) = 6240 MB/s [ext4]
test io::tests::bench_file_to_file_copy ... bench: 35,704 ns/iter (+/- 1,108) = 3671 MB/s [btrfs]
test io::tests::bench_file_to_socket_copy ... bench: 57,002 ns/iter (+/- 4,205) = 2299 MB/s
test io::tests::bench_socket_pipe_socket_copy ... bench: 142,640 ns/iter (+/- 77,851) = 918 MB/s
NEW
test io::tests::bench_file_to_file_copy ... bench: 14,745 ns/iter (+/- 519) = 8889 MB/s [ext4]
test io::tests::bench_file_to_file_copy ... bench: 6,128 ns/iter (+/- 227) = 21389 MB/s [btrfs]
test io::tests::bench_file_to_socket_copy ... bench: 13,767 ns/iter (+/- 3,767) = 9520 MB/s
test io::tests::bench_socket_pipe_socket_copy ... bench: 26,471 ns/iter (+/- 6,412) = 4951 MB/s
```
POSIX leaves it implementation-defined whether `link` follows symlinks.
In practice, for example, on Linux it does not and on FreeBSD it does.
So, switch to `linkat`, so that we can pick a behavior rather than
depending on OS defaults.
Pick the option to not follow symlinks. This is somewhat arbitrary, but
seems the less surprising choice because hard linking is a very
low-level feature which requires the source and destination to be on
the same mounted filesystem, and following a symbolic link could end
up in a different mounted filesystem.
- Use intra-doc links for `std::io` in `std::fs`
- Use intra-doc links for File::read in unix/ext/fs.rs
- Remove explicit intra-doc links for `true` in `net/addr.rs`
- Use intra-doc links in alloc/src/sync.rs
- Use intra-doc links in src/ascii.rs
- Switch to intra-doc links in alloc/rc.rs
- Use intra-doc links in core/pin.rs
- Use intra-doc links in std/prelude
- Use shorter links in `std/fs.rs`
`io` is already in scope.
clarify documentation of remove_dir errors
remove_dir will error if the path doesn't exist or isn't a directory.
It's useful to clarify that this is "remove dir or fail" not "remove dir
if it exists".
I don't think this belongs in the title. "Removes an existing, empty
directory" is strangely worded-- there's no such thing as a non-existing
directory. Better to just say explicitly it will return an error.
remove_dir will error if the path doesn't exist or isn't a directory.
It's useful to clarify that this is "remove dir or fail" not "remove dir
if it exists".
I don't think this belongs in the title. "Removes an existing, empty
directory" is strangely worded-- there's no such thing as a non-existing
directory. Better to just say explicitly it will return an error.