Add std:🧵:available_concurrency
This PR adds a counterpart to [C++'s `std:🧵:hardware_concurrency`](https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency) to Rust, tracking issue https://github.com/rust-lang/rust/issues/74479.
cc/ `@rust-lang/libs`
## Motivation
Being able to know how many hardware threads a platform supports is a core part of building multi-threaded code. In C++ 11 this has become available through the [`std:🧵:hardware_concurrency`](https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency) API. Currently in Rust most of the ecosystem depends on the [`num_cpus` crate](https://docs.rs/num_cpus/1.13.0/num_cpus/) ([no.35 in top 500 crates](https://docs.google.com/spreadsheets/d/1wwahRMHG3buvnfHjmPQFU4Kyfq15oTwbfsuZpwHUKc4/edit#gid=1253069234)) to provide this functionality. This PR proposes an API to provide access to the number of hardware threads available on a given platform.
__edit (2020-07-24):__ The purpose of this PR is to provide a hint for how many threads to spawn to saturate the processor. There's value in introducing APIs for NUMA and Windows processor groups, but those are intentionally out of scope for this PR. See: https://github.com/rust-lang/rust/pull/74480#issuecomment-662116186.
## Naming
Discussing the naming of the API on Zulip surfaced two options:
- `std:🧵:hardware_concurrency`
- `std:🧵:hardware_threads`
Both options seemed acceptable, but overall people seem to gravitate the most towards `hardware_threads`. Additionally `@jonas-schievink` pointed out that the "hardware threads" terminology is well-established and is used in among other the [RISC-V specification](https://riscv.org/specifications/isa-spec-pdf/) (page 20):
> A component is termed a core if it contains an independent instruction fetch unit. A RISC-V-compatible core might support multiple RISC-V-compatible __hardware threads__, or harts, through multithreading.
It's also worth noting that [the original paper introducing C++'s `std::thread` submodule](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2320.html) unfortunately doesn't feature any discussion on the naming of `hardware_concurrency`, so we can't use that to help inform our decision here.
## Return type
An important consideration `@joshtriplett` brought up is that we don't want to default to `1` for platforms where the number of available threads cannot be retrieved. Instead we want to inform the users of the fact that we don't know and allow them to handle that case. Which is why this PR uses `Option<NonZeroUsize>` as its return type, where `None` is returned on platforms where we don't know the number of hardware threads available.
The reasoning for `NonZeroUsize` vs `usize` is that if the number of threads for a platform are known, they'll always be at least 1. As evidenced by the example the `NonZero*` family of APIs may currently not be the most ergonomic to use, but improving the ergonomics of them is something that I think we can address separately.
## Implementation
`@Mark-Simulacrum` pointed out that most of the code we wanted to expose here was already available under `libtest`. So this PR mostly moves the internal code of libtest into a public API.
Add some more regression tests
This is another round of #77741. Tested with `debug-assertions=true` and it passed on my local.
Closes#70877Closes#70944Closes#71659Closes#74816Closes#75707Closes#75983
(Skipped #63355 because I'm not sure about the error.)
Rollup of 7 pull requests
Successful merges:
- #75802 (resolve: Do not put nonexistent crate `meta` into prelude)
- #76607 (Modify executable checking to be more universal)
- #77851 (BTreeMap: refactor Entry out of map.rs into its own file)
- #78043 (Fix grammar in note for orphan-rule error [E0210])
- #78048 (Suggest correct place to add `self` parameter when inside closure)
- #78050 (Small CSS cleanup)
- #78059 (Set `MDBOOK_OUTPUT__HTML__INPUT_404` on linkchecker)
Failed merges:
r? `@ghost`
Suggest correct place to add `self` parameter when inside closure
It would incorrectly suggest adding it as a parameter to the closure instead of the containing function.
[For example](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=1936bcd1e5f981573386e0cee985c3c0):
```
help: add a `self` receiver parameter to make the associated `fn` a method
|
5 | let _ = || self&self;
| ^^^^^
```
`DiagnosticMetadata.current_function` is only used for these messages so tweaking its behavior should be ok.
Fix grammar in note for orphan-rule error [E0210]
Fixes the grammar in the error note for [E0210] from:
_"= note: implementing a foreign trait is only possible if at least one of the types for which **is it** implemented is local"_
to:
_"= note: implementing a foreign trait is only possible if at least one of the types for which **it is** implemented is local"_
The content of this commit is the result of running the following command at the repository root:
`find . \( -type d -name .git -prune \) -o -type f -print0 | xargs -0 sed -i 's/which is it implemented/which it is implemented/g'`
BTreeMap: refactor Entry out of map.rs into its own file
btree/map.rs is approaching the 3000 line mark, splitting out the entry
code buys about 500 lines of headroom.
I've created this PR because the changes I've made in #77438 will push `map.rs` over the 3000 line limit and cause tidy to complain.
I picked `Entry` to factor out because it feels less tightly coupled to the rest of `BTreeMap` than the various iterator implementations.
Related: #60302
Modify executable checking to be more universal
This uses a dummy file to check if the filesystem being used supports the executable bit in general.
Supersedes #74753.
resolve: Do not put nonexistent crate `meta` into prelude
Before the 2018 edition release there was some vague suggestion about adding a crate named `meta` to the standard distribution.
On this basis the name `meta` was "partially reserved" by putting `meta` into extern prelude (this means importing something named `meta` will result in an ambiguity error, for example).
This only caused confusion so far, and two years later there are no specific plans to add such crate.
If some standard crate (named `meta` or not) is added in the future, then cargo will hopefully already have ability to put it into extern prelude explicitly through `Cargo.toml`.
Otherwise, it could be added to extern prelude by the compiler at edition boundary.
Closes https://github.com/rust-lang/rust/issues/73948
Use rebind instead of Binder::bind when possible
These are really only the easy places. I just searched for `Binder::bind` and replaced where it straightforward.
r? `@lcnr`
cc. `@nikomatsakis`
Use double quote for rustdoc html
r? `@GuillaumeGomez`
Feels scary without escaping stuff when I looked at the code, probably susceptible to XSS.
Follow up of https://github.com/rust-lang/rust/pull/75842
Use posix_spawn() on unix if program is a path
Previously `Command::spawn` would fall back to the non-posix_spawn based
implementation if the `PATH` environment variable was possibly changed.
On systems with a modern (g)libc `posix_spawn()` can be significantly
faster. If program is a path itself the `PATH` environment variable is
not used for the lookup and it should be safe to use the
`posix_spawnp()` method. [1]
We found this, because we have a cli application that effectively runs a
lot of subprocesses. It would sometimes noticeably hang while printing
output. Profiling showed that the process was spending the majority of
time in the kernel's `copy_page_range` function while spawning
subprocesses. During this time the process is completely blocked from
running, explaining why users were reporting the cli app hanging.
Through this we discovered that `std::process::Command` has a fast and
slow path for process execution. The fast path is backed by
`posix_spawnp()` and the slow path by fork/exec syscalls being called
explicitly. Using fork for process creation is supposed to be fast, but
it slows down as your process uses more memory. It's not because the
kernel copies the actual memory from the parent, but it does need to
copy the references to it (see `copy_page_range` above!). We ended up
using the slow path, because the command spawn implementation in falls
back to the slow path if it suspects the PATH environment variable was
changed.
Here is a smallish program demonstrating the slowdown before this code
change:
```
use std::process::Command;
use std::time::Instant;
fn main() {
let mut args = std::env::args().skip(1);
if let Some(size) = args.next() {
// Allocate some memory
let _xs: Vec<_> = std::iter::repeat(0)
.take(size.parse().expect("valid number"))
.collect();
let mut command = Command::new("/bin/sh");
command
.arg("-c")
.arg("echo hello");
if args.next().is_some() {
println!("Overriding PATH");
command.env("PATH", std::env::var("PATH").expect("PATH env var"));
}
let now = Instant::now();
let child = command
.spawn()
.expect("failed to execute process");
println!("Spawn took: {:?}", now.elapsed());
let output = child.wait_with_output().expect("failed to wait on process");
println!("Output: {:?}", output);
} else {
eprintln!("Usage: prog [size]");
std::process::exit(1);
}
()
}
```
Running it and passing different amounts of elements to use to allocate
memory shows that the time taken for `spawn()` can differ quite
significantly. In latter case the `posix_spawnp()` implementation is 30x
faster:
```
$ cargo run --release 10000000
...
Spawn took: 324.275µs
hello
$ cargo run --release 10000000 changepath
...
Overriding PATH
Spawn took: 2.346809ms
hello
$ cargo run --release 100000000
...
Spawn took: 387.842µs
hello
$ cargo run --release 100000000 changepath
...
Overriding PATH
Spawn took: 13.434677ms
hello
```
[1]: 5f72f9800b/posix/execvpe.c (L81)
Set .llvmbc and .llvmcmd sections as allocatable
This marks both sections as allocatable rather than excluded, which matches what
clang does with the equivalent `-fembed-bitcode` flag.
BTreeMap: improve gdb introspection of BTreeMap with ZST keys or values
I accidentally pushed an earlier revision in #77788: it changes the index of tuples for BTreeSet from ""[{}]".format(i) to "key{}".format(i). Which doesn't seem to make the slightest difference on my linux box nor on CI. In fact, gdb doesn't make any distinction between "key{}" and "val{}" for a BTreeMap either, leading to confusing output if you test more. But easy to improve.
r? @Mark-Simulacrum
liballoc: VecDeque: Add binary search functions
I am submitting rust-lang/rfcs#2997 as a PR as suggested by @scottmcm
I haven't yet created a tracking issue - if there's a favorable feedback I'll create one and update the issue links in the unstable attribs.
Permit uninhabited enums to cast into ints
This essentially reverts part of #6204; it is unclear why that [commit](c0f587de34) was introduced, and I suspect no one remembers.
The changed code was only called from casting checks and appears to not affect any callers of that code (other than permitting this one case).
Fixes#75647.
Remove shrink_to_fit from default ToString::to_string implementation.
As suggested by `@scottmcm` on Zulip. shrink_to_fit() seems like the wrong thing to do here in most use cases of to_string(). Would be intereseting to see if it makes any difference in a timer run.
r? `@joshtriplett`
Rollup of 10 pull requests
Successful merges:
- #75209 (Suggest imports of unresolved macros)
- #77547 (stabilize union with 'ManuallyDrop' fields and 'impl Drop for Union')
- #77827 (Don't link to nightly primitives on stable channel)
- #77855 (resolve: further improvements to "try using the enum's variant" diagnostic)
- #77900 (Use fdatasync for File::sync_data on more OSes)
- #77925 (Suggest minimal subset features in `incomplete_features` lint)
- #77971 (Deny broken intra-doc links in linkchecker)
- #77991 (Bump backtrace-rs)
- #77992 (instrument-coverage: try our best to not ICE)
- #78013 (Fix sidebar scroll on mobile devices)
Failed merges:
r? `@ghost`
Fix sidebar scroll on mobile devices
Fixes#77942.
The issue was coming from the appearance/disappearance of the "wrapper" on the mobile devices web browsers, which triggers the "resize" event, calling the `hideSidebar` function is the JS code.
r? @jyn514
instrument-coverage: try our best to not ICE
instrument-coverage was ICEing for me on some code, in particular code
that had devirtualized paths from standard library. Instrument coverage
probably has no bussiness dictating which paths are valid and which
aren't so just feed it everything and whatever and let tooling deal with
other stuff.
For example, with this commit we can generate coverage hitpoints for
these interesting paths:
* `/rustc/.../library/core/lib.rs` – non-devirtualized path for libcore
* `/home/.../src/library/core/lib.rs` – devirtualized version of above
* `<inline asm>`, `<anon>` and many similar synthetic paths
Even if those paths somehow get to the instrumentation pass, I'd much
rather get hits for these weird paths and hope some of them work (as
would be the case for devirtualized path to libcore), rather than have
compilation fail entirely.