Commit Graph

733 Commits

Author SHA1 Message Date
bors
30e49a9ead Auto merge of #75272 - the8472:spec-copy, r=KodrAus
specialize io::copy to use copy_file_range, splice or sendfile

Fixes #74426.
Also covers #60689 but only as an optimization instead of an official API.

The specialization only covers std-owned structs so it should avoid the problems with #71091

Currently linux-only but it should be generalizable to other unix systems that have sendfile/sosplice and similar.

There is a bit of optimization potential around the syscall count. Right now it may end up doing more syscalls than the naive copy loop when doing short (<8KiB) copies between file descriptors.

The test case executes the following:

```
[pid 103776] statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=17, ...}) = 0
[pid 103776] write(4, "wxyz", 4)        = 4
[pid 103776] write(4, "iklmn", 5)       = 5
[pid 103776] copy_file_range(3, NULL, 4, NULL, 5, 0) = 5

```

0-1 `stat` calls to identify the source file type. 0 if the type can be inferred from the struct from which the FD was extracted
𝖬 `write` to drain the `BufReader`/`BufWriter` wrappers. only happen when buffers are present. 𝖬 ≾ number of wrappers present. If there is a write buffer it may absorb the read buffer contents first so only result in a single write. Vectored writes would also be an option but that would require more invasive changes to `BufWriter`.
𝖭 `copy_file_range`/`splice`/`sendfile` until file size, EOF or the byte limit from `Take` is reached. This should generally be *much* more efficient than the read-write loop and also have other benefits such as DMA offload or extent sharing.

## Benchmarks

```

OLD

test io::tests::bench_file_to_file_copy         ... bench:      21,002 ns/iter (+/- 750) = 6240 MB/s    [ext4]
test io::tests::bench_file_to_file_copy         ... bench:      35,704 ns/iter (+/- 1,108) = 3671 MB/s  [btrfs]
test io::tests::bench_file_to_socket_copy       ... bench:      57,002 ns/iter (+/- 4,205) = 2299 MB/s
test io::tests::bench_socket_pipe_socket_copy   ... bench:     142,640 ns/iter (+/- 77,851) = 918 MB/s

NEW

test io::tests::bench_file_to_file_copy         ... bench:      14,745 ns/iter (+/- 519) = 8889 MB/s    [ext4]
test io::tests::bench_file_to_file_copy         ... bench:       6,128 ns/iter (+/- 227) = 21389 MB/s   [btrfs]
test io::tests::bench_file_to_socket_copy       ... bench:      13,767 ns/iter (+/- 3,767) = 9520 MB/s
test io::tests::bench_socket_pipe_socket_copy   ... bench:      26,471 ns/iter (+/- 6,412) = 4951 MB/s
```
2020-11-14 12:01:55 +00:00
The8472
bbfa92c82d Always handle EOVERFLOW by falling back to the generic copy loop
Previously EOVERFLOW handling was only applied for io::copy specialization
but not for fs::copy sharing the same code.

Additionally we lower the chunk size to 1GB since we have a user report
that older kernels may return EINVAL when passing 0x8000_0000
but smaller values succeed.
2020-11-13 22:38:27 +01:00
The8472
4854d418a5 do direct splice syscall and probe availability to get android builds to work
Android builds use feature level 14, the libc wrapper for splice is gated
on feature level 21+ so we have to invoke the syscall directly.
Additionally the emulator doesn't seem to support it so we also have to
add ENOSYS checks.
2020-11-13 22:38:27 +01:00
The8472
3dfc377aa1 move sendfile/splice/copy_file_range into kernel_copy module 2020-11-13 22:38:27 +01:00
The8472
888b1031bc limit visibility of copy offload helpers to sys::unix module 2020-11-13 22:38:27 +01:00
The8472
18bfe2a66b move copy specialization tests to their own module 2020-11-13 22:38:27 +01:00
The8472
7f5d2722af move copy specialization into sys::unix module 2020-11-13 22:38:23 +01:00
The8472
ad9b07c7e5 add benchmarks 2020-11-13 19:46:37 +01:00
The8472
46e7fbe60b reduce syscalls by inferring FD types based on source struct instead of calling stat()
also adds handling for edge-cases involving large sparse files where sendfile could fail with EOVERFLOW
2020-11-13 19:46:35 +01:00
The8472
0624730d9e add forwarding specializations for &mut variants
`impl Write for &mut T where T: Write`, thus the same should
apply to the specialization traits
2020-11-13 19:45:38 +01:00
The8472
cd3bddc044 prioritize sendfile over splice since it results in fewer context switches when sending to pipes
splice returns to userspace when the pipe is full, sendfile
just blocks until it's done, this can achieve much higher throughput
2020-11-13 19:45:38 +01:00
The8472
67a6059aa5 move tests module into separate file 2020-11-13 19:45:38 +01:00
The8472
5eb88fa5c7 hide unused exports on other platforms 2020-11-13 19:45:38 +01:00
The8472
16236470c1 specialize io::copy to use copy_file_range, splice or sendfile
Currently it only applies to linux systems. It can be extended to make use
of similar syscalls on other unix systems.
2020-11-13 19:45:27 +01:00
Guillaume Gomez
ef32ef7baf
Rollup merge of #77996 - tkaitchuck:master, r=m-ou-se
Doc change: Remove mention of `fnv` in HashMap

Disclaimer: I am the author of [aHash](https://github.com/tkaitchuck/aHash).

This changes the Rustdoc in `HashMap` from mentioning the `fnv` crate to mentioning the `aHash` crate, as an alternative `Hasher` implementation.

### Why

Fnv [has poor hash quality](https://github.com/rurban/smhasher), is [slow for larger keys](https://github.com/tkaitchuck/aHash/blob/master/compare/readme.md#speed), and does not provide dos resistance, because it is unkeyed (this can also cause [other problems](https://accidentallyquadratic.tumblr.com/post/153545455987/rust-hash-iteration-reinsertion)).

Fnv has acceptable performance for integers and has very poor performance with keys >32 bytes. This is the reason it was removed from the standard library in https://github.com/rust-lang/rust/pull/37229 .

Because regardless of which dimension you value, there are better alternatives, it does not make sense for anyone to consider using `fnv`.

The text mentioning `fnv` in the standard library continues to create confusion: https://github.com/rust-lang/hashbrown/issues/153  https://github.com/rust-lang/hashbrown/issues/9 . There are also a number of [crates using it](https://crates.io/crates/fnv/reverse_dependencies) a great many of which are hashing strings (Which is when Fnv is the [worst](https://github.com/cbreeden/fxhash#benchmarks), [possible](https://github.com/tkaitchuck/aHash#speed), [choice](http://cglab.ca/~abeinges/blah/hash-rs/).)

I think aHash makes the most sense to mention as an alternative because it is the most credible option (in my obviously biased opinion). It offers [good performance on numbers and strings](https://github.com/tkaitchuck/aHash/blob/master/compare/readme.md#speed), is [of high quality](https://github.com/tkaitchuck/aHash#hash-quality), and [provides dos resistance](https://github.com/tkaitchuck/aHash/wiki/How-aHash-is-resists-DOS-attacks). It is popular (see [stats](https://crates.io/crates/ahash)) and is the default hasher for [hashbrown](https://crates.io/crates/hashbrown) and [dashmap](https://crates.io/crates/dashmap) which are the most popular alternative hashmaps. Finally it does not have any of the [`gotcha` cases](https://github.com/tkaitchuck/aHash#fxhash) that `FxHash` suffers from. (Which is the other popular hashing option when DOS attacks are not a concern)

Signed-off-by: Tom Kaitchuck <tom.kaitchuck@emc.com>
2020-11-13 15:26:10 +01:00
Tom Kaitchuck
4e5848349c
Update library/std/src/collections/hash/map.rs
Co-authored-by: Mara Bos <m-ou.se@m-ou.se>
2020-11-12 20:14:57 -08:00
bors
55794e4396 Auto merge of #78965 - jryans:emscripten-threads-libc, r=kennytm
Update thread and futex APIs to work with Emscripten

This updates the thread and futex APIs in `std` to match the APIs exposed by
Emscripten. This allows threads to run on `wasm32-unknown-emscripten` and the
thread parker to compile without errors related to the missing `futex` module.

To make use of this, Rust code must be compiled with `-C target-feature=atomics`
and Emscripten must link with `-pthread`.

I have confirmed this works well locally when building multithreaded crates.
Attempting to enable `std` thread tests currently fails for seemingly obscure
reasons and Emscripten is currently disabled in CI, so further work is needed to
have proper test coverage here.
2020-11-12 05:52:17 +00:00
J. Ryan Stinnett
bf3be09ee8 Fix timeout conversion 2020-11-12 03:40:15 +00:00
J. Ryan Stinnett
951576051b Update thread and futex APIs to work with Emscripten
This updates the thread and futex APIs in `std` to match the APIs exposed by
Emscripten. This allows threads to run on `wasm32-unknown-emscripten` and the
thread parker to compile without errors related to the missing `futex` module.

To make use of this, Rust code must be compiled with `-C target-feature=atomics`
and Emscripten must link with `-pthread`.

I have confirmed this works well locally when building multithreaded crates.
Attempting to enable `std` thread tests currently fails for seemingly obscure
reasons and Emscripten is currently disabled in CI, so further work is needed to
have proper test coverage here.
2020-11-12 01:41:49 +00:00
Jonas Schievink
62f0a78056
Rollup merge of #78216 - workingjubilee:duration-zero, r=m-ou-se
Duration::zero() -> Duration::ZERO

In review for #72790, whether or not a constant or a function should be favored for `#![feature(duration_zero)]` was seen as an open question. In https://github.com/rust-lang/rust/issues/73544#issuecomment-691701670 an invitation was opened to either stabilize the methods or propose a switch to the constant value, supplemented with reasoning. Followup comments suggested community preference leans towards the const ZERO, which would be reason enough.

ZERO also "makes sense" beside existing associated consts for Duration. It is ever so slightly awkward to have a series of constants specifying 1 of various units but leave 0 as a method, especially when they are side-by-side in code. It seems unintuitive for the one non-dynamic value (that isn't from Default) to be not-a-const, which could hurt discoverability of the associated constants overall. Elsewhere in `std`, methods for obtaining a constant value were even deprecated, as seen with [std::u32::min_value](https://doc.rust-lang.org/std/primitive.u32.html#method.min_value).

Most importantly, ZERO costs less to use. A match supports a const pattern, but const fn can only be used if evaluated through a const context such as an inline `const { const_fn() }` or a `const NAME: T = const_fn()` declaration elsewhere. Likewise, while https://github.com/rust-lang/rust/issues/73544#issuecomment-691949373 notes `Duration::zero()` can optimize to a constant value, "can" is not "will". Only const contexts have a strong promise of such. Even without that in mind, the comment in question still leans in favor of the constant for simplicity. As it costs less for a developer to use, may cost less to optimize, and seems to have more of a community consensus for it, the associated const seems best.

r? ```@LukasKalbertodt```
2020-11-11 20:58:52 +01:00
Nicholas-Baron
261ca04c92 Changed unwrap_or to unwrap_or_else in some places.
The discussion seems to have resolved that this lint is a bit "noisy" in
that applying it in all places would result in a reduction in
readability.

A few of the trivial functions (like `Path::new`) are fine to leave
outside of closures.

The general rule seems to be that anything that is obviously an
allocation (`Box`, `Vec`, `vec![]`) should be in a closure, even if it
is a 0-sized allocation.
2020-11-10 20:07:47 -08:00
hyd-dev
70e175b551
Add missing newline to error message of the default OOM hook 2020-11-10 00:15:07 +08:00
Dylan DPC
a8beaa3b3c
Rollup merge of #78878 - shepmaster:intersecting-ignores, r=Mark-Simulacrum
Avoid overlapping cfg attributes when both macOS and aarch64

r? ``@Mark-Simulacrum``
2020-11-09 01:13:48 +01:00
Dylan DPC
41134be153
Rollup merge of #78026 - sunfishcode:symlink-hard-link, r=dtolnay
Define `fs::hard_link` to not follow symlinks.

POSIX leaves it [implementation-defined] whether `link` follows symlinks.
In practice, for example, on Linux it does not and on FreeBSD it does.
So, switch to `linkat`, so that we can pick a behavior rather than
depending on OS defaults.

Pick the option to not follow symlinks. This is somewhat arbitrary, but
seems the less surprising choice because hard linking is a very
low-level feature which requires the source and destination to be on
the same mounted filesystem, and following a symbolic link could end
up in a different mounted filesystem.

[implementation-defined]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/link.html
2020-11-09 01:13:28 +01:00
Jake Goulding
b13817a795 Avoid overlapping cfg attributes when both macOS and aarch64 2020-11-08 09:43:51 -05:00
Mara Bos
96975e515a
Rollup merge of #78852 - camelid:intra-doc-bonanza, r=jyn514
Convert a bunch of intra-doc links

An intra-doc link bonanza!

This was accomplished using a bunch of trial-and-error with sed.
2020-11-08 13:36:28 +01:00
Mara Bos
77f333b304
Rollup merge of #78811 - a1phyr:const_io_structs, r=dtolnay
Make some std::io functions `const`

Tracking issue: #78812

Make the following functions `const`:
- `io::Cursor::new`
- `io::Cursor::get_ref`
- `io::Cursor::position`
- `io::empty`
- `io::repeat`
- `io::sink`

r? `````@dtolnay`````
2020-11-08 13:36:19 +01:00
Mara Bos
eef9951e44
Rollup merge of #78572 - de-vri-es:bsd-cloexec, r=m-ou-se
Use SOCK_CLOEXEC and accept4() on more platforms.

This PR enables the use of `SOCK_CLOEXEC` and `accept4` on more platforms.

-----

Android uses the linux kernel, so it should also support it.

DragonflyBSD introduced them in 4.4 (December 2015):
https://www.dragonflybsd.org/release44/

FreeBSD introduced them in 10.0 (January 2014):
https://wiki.freebsd.org/AtomicCloseOnExec

Illumos introduced them in a commit in April 2013, not sure when it was released. It is quite possible that is has always been in Illumos:
5dbfd19ad5
https://illumos.org/man/3socket/socket
https://illumos.org/man/3socket/accept4

NetBSD introduced them in 6.0 (Oktober 2012) and 8.0 (July 2018):
https://man.netbsd.org/NetBSD-6.0/socket.2
https://man.netbsd.org/NetBSD-8.0/accept.2

OpenBSD introduced them in 5.7 (May 2015):
https://man.openbsd.org/socket https://man.openbsd.org/accept
2020-11-08 13:36:07 +01:00
Mara Bos
1f034f77bc
Rollup merge of #76097 - pickfire:stabilize-spin-loop, r=KodrAus
Stabilize hint::spin_loop

Partially fix #55002, deprecate in another release

r? ``````@KodrAus``````
2020-11-08 13:35:54 +01:00
Camelid
8258cf285f Convert a bunch of intra-doc links 2020-11-07 12:50:57 -08:00
Benoît du Garreau
001dd7e6a5 Add tracking issue 2020-11-06 18:04:52 +01:00
Benoît du Garreau
ae059b532f Make some std::io functions const
Includes:
- io::Cursor::new
- io::Cursor::get_ref
- io::Cursor::position
- io::empty
- io::repeat
- io::sink
2020-11-06 17:48:26 +01:00
Yuki Okushi
0e71fc75cc
Rollup merge of #78006 - pitaj:master, r=jyn514
Use Intra-doc links for std::io::buffered

Helps with #75080. I used the implicit link style for intrinsics, as that was what `minnumf32` and others already had.

``@rustbot`` modify labels: T-doc, A-intra-doc-links

r? ``@jyn514``
2020-11-07 01:02:03 +09:00
Yuki Okushi
4136ed26a1
Rollup merge of #74979 - maekawatoshiki:fix, r=Mark-Simulacrum
`#![deny(unsafe_op_in_unsafe_fn)]` in sys/hermit

Partial fix of #73904.

This encloses ``unsafe`` operations in ``unsafe fn`` in ``sys/hermit``.
Some unsafe blocks are not well documented because some system-based functions lack documents.
2020-11-07 01:01:59 +09:00
Ivan Tham
e8b5be5dff Stabilize hint::spin_loop
Partially fix #55002, deprecate in another release

Co-authored-by: Ashley Mannix <kodraus@hey.com>

Update stable version for stabilize_spin_loop

Co-authored-by: Joshua Nelson <joshua@yottadb.com>

Use better example for spinlock

As suggested by KodrAus

Remove renamed_spin_loop already available in master

Fix spin loop example
2020-11-06 23:41:55 +08:00
Maarten de Vries
3bee37c290 Disable accept4 on Android. 2020-11-06 14:17:48 +01:00
Peter Jaszkowiak
8d48e3bbb2 document HACKs 2020-11-05 19:26:08 -07:00
Peter Jaszkowiak
fe6dfcd28a Intra-doc links for std::io::buffered 2020-11-05 19:09:42 -07:00
Mara Bos
f383e4f1d9
Rollup merge of #78757 - camelid:crate-link-text, r=jyn514
Improve and clean up some intra-doc links
2020-11-05 10:30:02 +01:00
Mara Bos
43e1b58bcc
Rollup merge of #78716 - est31:array_traits, r=Dylan-DPC
Array trait impl comment/doc fixes

Two small doc/comment fixes regarding trait implementations on arrays.
2020-11-05 10:29:46 +01:00
Mara Bos
10d2843604
Rollup merge of #78093 - camelid:as-cleanup, r=jyn514
Clean up docs for 'as' keyword
2020-11-05 10:29:38 +01:00
Camelid
677b2acb48 Add missing comma
'Note however,' -> 'Note, however,'
2020-11-04 18:57:54 -08:00
Camelid
bbdb1f0f66 Clean up some intra-doc links 2020-11-04 18:57:52 -08:00
Camelid
d8afe98eba Clean up docs for 'as' keyword 2020-11-04 16:05:55 -08:00
est31
5801109ba9 Move Copy and Clone into the list of traits implemented for all sizes 2020-11-04 01:28:37 +01:00
Mara Bos
8ed31d2782
Rollup merge of #78602 - RalfJung:raw-ptr-aliasing-issues, r=m-ou-se
fix various aliasing issues in the standard library

This fixes various cases where the standard library either used raw pointers after they were already invalidated by using the original reference again, or created raw pointers for one element of a slice and used it to access neighboring elements.
2020-11-01 11:53:36 +01:00
Mara Bos
f281a76f83
Rollup merge of #78599 - panstromek:master, r=m-ou-se
Add note to process::arg[s] that args shouldn't be escaped or quoted

This came out of discussion on [forum](https://users.rust-lang.org/t/how-to-get-full-output-from-command/50626), where I recently asked a question and it turned out that the problem was redundant quotation:

```rust
 Command::new("rg")
        .arg("\"pattern\"") // this will look for "pattern" with quotes included
```

This is something that has bitten me few times already (in multiple languages actually), so It'd be grateful to have it in the docs, even though it's not sctrictly Rust specific problem. Other users also agreed.

This can be really annoying to debug, because in many cases (inluding mine), quotes can be legal part of the argument, so the command doesn't fail, it just behaves unexpectedly. Not everybody (including me) knows that quotes around arguments are part of the shell and not part of the called program. Coincidentally, somoene had the same problem [yesterday](https://www.reddit.com/r/rust/comments/jkxelc/going_crazy_over_running_a_curl_process_from_rust/) on reddit.

I am not a native speaker, so I welcome any corrections or better formulation, I don't expect this to be merged as is. I was also reminded that this is platform/shell specific behaviour, but I didn't find a good way to formulate that briefly, any ideas welcome.

 It's also my first PR here, so I am not sure I did everything correctly, I did this just from Github UI.
2020-11-01 11:53:34 +01:00
Matyáš Racek
db416b232c
Apply suggestions from code review
Co-authored-by: Mara Bos <m-ou.se@m-ou.se>
2020-10-31 17:28:44 +01:00
Ralf Jung
9f630af930 fix aliasing issue in unix sleep function 2020-10-31 16:26:06 +01:00
Matyáš Racek
d417bbef95
Add note to process::arg[s] that args shouldn't be escaped or quoted 2020-10-31 14:40:36 +01:00