Commit Graph

18 Commits

Author SHA1 Message Date
Thom Chiovoloni
554918e311
Hide Repr details from io::Error, and rework io::Error::new_const. 2022-02-04 18:47:29 -08:00
DrMeepster
98c6200b16 read_buf 2021-11-02 22:47:20 -07:00
John Kugelman
a990c76d84 Optimize File::read_to_end and read_to_string
Reading a file into an empty vector or string buffer can incur
unnecessary `read` syscalls and memory re-allocations as the buffer
"warms up" and grows to its final size. This is perhaps a necessary evil
with generic readers, but files can be read more intelligently by
checking the file size and reserving that much capacity.

`std::fs::read` and `read_to_string` already perform this optimization:
they open the file, read its metadata, and call `with_capacity` with
the file size. This ensures that the buffer does not need to be resized
and avoids an initial string of small `read` syscalls.

However, if a user opens the `File` themselves and calls
`file.read_to_end` or `file.read_to_string` they do not get this
optimization.

```rust
let mut buf = Vec::new();
file.read_to_end(&mut buf)?;
```

I searched through this project's codebase and even here there are a *lot* of
examples of this. They're found all over in unit tests, which isn't a
big deal, but there are also several real instances in the compiler and
in Cargo. I've documented the ones I found in a comment here:

https://github.com/rust-lang/rust/issues/89516#issuecomment-934423999

Most telling, the `Read` trait and the `read_to_end` method both show
this exact pattern as examples of how to use readers. What this says to
me is that this shouldn't be solved by simply fixing the instances of it
in this codebase. If it's here it's certain to be prevalent in the wider
Rust ecosystem.

To that end, this commit adds specializations of `read_to_end` and
`read_to_string` directly on `File`. This way it's no longer a minor
footgun to start with an empty buffer when reading a file in.
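
For illustration, this is roughly what a caller had to do by hand to get
the same effect. It is only a sketch under stated assumptions, not the
code added by this commit: `read_file_presized` is a hypothetical helper
name, and the size from metadata is treated as a hint that may be wrong
(for example for special files).

```rust
use std::fs::File;
use std::io::{self, Read};

/// Hypothetical helper: reserve capacity from the file size before reading.
fn read_file_presized(file: &mut File) -> io::Result<Vec<u8>> {
    // The size is only a hint. Fall back to 0 if metadata is unavailable.
    // The `as usize` cast may truncate on 32-bit targets; acceptable for a hint.
    let size_hint = file.metadata().map(|m| m.len() as usize).unwrap_or(0);

    let mut buf = Vec::new();
    buf.reserve(size_hint);

    // read_to_end still reads until EOF, so a wrong hint only costs
    // extra allocations, never correctness.
    file.read_to_end(&mut buf)?;
    Ok(buf)
}
```

With the specializations in place, the plain `Vec::new()` plus
`read_to_end` pattern shown earlier no longer needs this kind of wrapper.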

A nice side effect of this change is that code that accesses a `File` as
a bare `Read` constraint or via a `dyn Read` trait object will benefit.
For example, this code from `compiler/rustc_serialize/src/json.rs`:

```rust
pub fn from_reader(rdr: &mut dyn Read) -> Result<Json, BuilderError> {
    let mut contents = Vec::new();
    match rdr.read_to_end(&mut contents) {
```
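
The reason this works is that `read_to_end` is a trait method, so an
override is reached through dynamic dispatch as well. A minimal sketch
of that mechanism follows; `Tagged` and `slurp` are hypothetical names
made up for this example, and `Tagged` stands in for `File`:

```rust
use std::io::{self, Read};

/// Hypothetical reader that records whether its own `read_to_end` ran.
struct Tagged<'a> {
    data: &'a [u8],
    used_override: bool,
}

impl Read for Tagged<'_> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        self.data.read(out)
    }

    // Overrides the provided method, as the specialization on `File` does.
    fn read_to_end(&mut self, buf: &mut Vec<u8>) -> io::Result<usize> {
        self.used_override = true;
        buf.reserve(self.data.len());
        self.data.read_to_end(buf)
    }
}

/// The caller only sees `dyn Read`, like `from_reader` above.
fn slurp(rdr: &mut dyn Read) -> io::Result<Vec<u8>> {
    let mut contents = Vec::new();
    rdr.read_to_end(&mut contents)?;
    Ok(contents)
}
```

Calling `slurp(&mut tagged)` sets `used_override`, which shows that the
overriding implementation is the one invoked through the trait object.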

Related changes:

- I also added specializations to `BufReader` to delegate to
  `self.inner`'s methods. That way it can call `File`'s optimized
  implementations if the inner reader is a file.

- The private `std::io::append_to_string` function is now marked
  `unsafe`.

- `File::read_to_string` being more efficient means that the performance
  note for `io::read_to_string` can be softened. I've added @camelid's
  suggested wording from:

  https://github.com/rust-lang/rust/issues/80218#issuecomment-936806502
2021-10-07 18:42:02 -04:00
John Kugelman
9b9c24ec7f Fix read_to_end to not grow an exact size buffer
If you know how much data to expect and use `Vec::with_capacity` to
pre-allocate a buffer of that capacity, `Read::read_to_end` will still
double its capacity. It needs some space to perform a read, even though
that read ends up returning `0`.

It's a bummer to carefully pre-allocate 1GB to read a 1GB file into
memory and end up using 2GB.

This fixes that behavior by special casing a full buffer and reading
into a small "probe" buffer instead. If that read returns `0` then it's
confirmed that the buffer was the perfect size. If it doesn't, the probe
buffer is appended to the normal buffer and the read loop continues.
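
The idea can be sketched as follows. This is a simplified illustration,
not the code in the standard library: `read_to_end_probing` is a
hypothetical name, the 32-byte probe size is an assumption, and the
spare capacity is zero-filled to keep the sketch free of `unsafe`.

```rust
use std::io::{self, ErrorKind, Read};

/// Hypothetical sketch: read to EOF without growing a buffer that is
/// already exactly the right size.
fn read_to_end_probing<R: Read>(reader: &mut R, buf: &mut Vec<u8>) -> io::Result<usize> {
    let start_len = buf.len();
    loop {
        if buf.len() == buf.capacity() {
            // Buffer is full: read into a small stack buffer instead of
            // growing the Vec just to discover EOF.
            let mut probe = [0u8; 32]; // probe size is an assumption
            match reader.read(&mut probe) {
                Ok(0) => return Ok(buf.len() - start_len),
                // More data after all: append it and keep going.
                Ok(n) => buf.extend_from_slice(&probe[..n]),
                Err(e) if e.kind() == ErrorKind::Interrupted => {}
                Err(e) => return Err(e),
            }
        } else {
            // Spare capacity exists: read directly into it.
            let len = buf.len();
            let cap = buf.capacity();
            buf.resize(cap, 0); // stays within capacity, so no reallocation
            match reader.read(&mut buf[len..]) {
                Ok(0) => {
                    buf.truncate(len);
                    return Ok(len - start_len);
                }
                Ok(n) => buf.truncate(len + n),
                Err(e) if e.kind() == ErrorKind::Interrupted => buf.truncate(len),
                Err(e) => {
                    buf.truncate(len);
                    return Err(e);
                }
            }
        }
    }
}
```

With a buffer created by `Vec::with_capacity(n)` and a reader that
yields exactly `n` bytes, the capacity is the same before and after the
call in this sketch.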

Fixing this allows several workarounds in the standard library to be
removed:

- `Take` no longer needs to override `Read::read_to_end`.
- The `reservation_size` callback that allowed `Take` to inhibit the
  previous over-allocation behavior isn't needed.
- `fs::read` doesn't need to reserve an extra byte in
  `initial_buffer_size`.

Curiously, there was a unit test that specifically checked that
`Read::read_to_end` *does* over-allocate. I removed that test, too.
2021-09-22 00:54:27 -04:00
Aris Merchant
6d34a2e007 Stabilize Seek::rewind 2021-07-01 15:08:20 -07:00
bors
ce1d5611a2 Auto merge of #85815 - YuhanLiin:buf-read-data-left, r=m-ou-se
Add has_data_left() to BufRead

This is a continuation of #40747 and also addresses #40745. The problem with the previous PR was that it had "eof" in its method name. This PR uses a more descriptive method name, but I'm open to changing it.
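
A rough equivalent can be written on top of `BufRead::fill_buf`, which
returns an empty slice only at end of input. The free function below is
a hypothetical sketch for illustration (`has_data_left_sketch` is not a
name from the PR), and it is not claimed to match the merged
implementation line for line:

```rust
use std::io::{self, BufRead};

/// Hypothetical stand-in: true if the reader can still yield at least one byte.
fn has_data_left_sketch<R: BufRead>(reader: &mut R) -> io::Result<bool> {
    // fill_buf does not consume anything, so this check is non-destructive.
    reader.fill_buf().map(|bytes| !bytes.is_empty())
}
```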
2021-06-18 20:11:51 +00:00
Mara Bos
b7dd942e15
Rollup merge of #86202 - a1phyr:spec_io_bytes_size_hint, r=m-ou-se
Specialize `io::Bytes::size_hint` for more types

Improve the result of `<io::Bytes as Iterator>::size_hint` for some readers. I did not manage to specialize `SizeHint` for `io::Cursor`.

Side question: would it be interesting for `io::Read` to have an optional `size_hint` method?
2021-06-17 23:40:58 +02:00
Benoît du Garreau
2cbd5d1df5 Specialize io::Bytes::size_hint for more types 2021-06-10 19:16:55 +02:00
Thomas de Zeeuw
fd14c52075 Rename IoSlice(Mut)::advance_slice to advance_slices 2021-06-05 13:06:10 +02:00
YuhanLiin
e76929ff98 Add has_data_left() to BufRead 2021-05-29 17:47:51 -04:00
Thomas de Zeeuw
3803c090f8 Rename IoSlice(Mut)::advance to advance_slice
To make way for a new IoSlice(Mut)::advance function that advances a
single slice.

Also changes the signature to accept a `&mut &mut [IoSlice]`, not
returning anything. This will better match the future IoSlice::advance
function.
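
To show why the `&mut &mut [..]` shape is useful, here is a sketch of
the same idea over plain byte slices. It is an illustration only:
`advance_slices_sketch` is a hypothetical name, and `&[u8]` stands in
for `IoSlice` so the example does not depend on the real API.

```rust
/// Hypothetical sketch: drop fully consumed slices from the front of
/// `bufs` and trim the first remaining one, all in place.
fn advance_slices_sketch<'a>(bufs: &mut &mut [&'a [u8]], n: usize) {
    let mut remove = 0; // number of slices consumed entirely
    let mut left = n; // bytes still to skip in the first kept slice
    for buf in bufs.iter() {
        if buf.len() > left {
            break;
        }
        left -= buf.len();
        remove += 1;
    }

    // Re-point the caller's slice-of-slices past the consumed entries.
    // This is why the outer `&mut` is needed and nothing is returned.
    *bufs = &mut std::mem::take(bufs)[remove..];

    if let Some(first) = bufs.first_mut() {
        let current: &'a [u8] = *first;
        *first = &current[left..];
    } else {
        assert!(left == 0, "advancing past the end of the slices");
    }
}
```

After advancing 5 bytes over slices of length 4 and 6, the caller's
`bufs` holds a single slice containing the last 5 bytes of the second
one.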
2021-05-29 10:08:00 +02:00
Mara Bos
7b71719faf Use io::Error::new_const everywhere to avoid allocations. 2021-03-21 20:22:38 +01:00
Xavientois
389e638c05 Add tests for SizeHint implementations 2021-01-31 08:34:42 -05:00
Xavientois
c8e0f8aaa3 Use fully qualified syntax to avoid dyn 2021-01-31 08:31:35 -05:00
The8472
18bfe2a66b move copy specialization tests to their own module 2020-11-13 22:38:27 +01:00
The8472
ad9b07c7e5 add benchmarks 2020-11-13 19:46:37 +01:00
The8472
67a6059aa5 move tests module into separate file 2020-11-13 19:45:38 +01:00
Lzu Tao
a4e926daee std: move "mod tests/benches" to separate files
Also doing fmt in place as requested.
2020-08-31 02:56:59 +00:00