available_parallelism: Gracefully handle zero value cfs_period_us
There seem to be some scenarios where the cgroup CPU quota field `cpu.cfs_period_us` can contain `0`. This field is used to determine the "amount" of parallelism suggested by `std::thread::available_parallelism`.
A zero value in this field causes a panic when `available_parallelism()` is invoked. The issue was detected through the call made by binaries built with `cargo test`. I really don't think `0` is a good value for `cpu.cfs_period_us`, but I also don't think applications should panic if this value is seen.
This panic started happening with Rust 1.64.0.
This case is gracefully handled by other projects which read this information: [num_cpus](e437b9d908/src/linux.rs (L207-L210)), [ninja](https://github.com/ninja-build/ninja/pull/2174/files), [dotnet](c4341d45ac/src/coreclr/pal/src/misc/cgroup.cpp (L481-L483))
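As a rough illustration of the fix (a hedged sketch, not the exact code in `std`), the quota calculation can simply treat a zero period as "no quota configured" instead of dividing by it:

```
// Hedged sketch: treat a zero (or negative) cpu.cfs_period_us as
// "no quota configured" rather than dividing by it.
fn cgroup_v1_quota(cfs_quota_us: i64, cfs_period_us: i64) -> Option<usize> {
    if cfs_quota_us <= 0 || cfs_period_us <= 0 {
        return None; // unlimited, or a value we cannot sensibly use
    }
    // Round up so that e.g. quota=150000, period=100000 suggests 2 CPUs.
    Some(((cfs_quota_us + cfs_period_us - 1) / cfs_period_us) as usize)
}

fn main() {
    assert_eq!(cgroup_v1_quota(300_000, 100_000), Some(3));
    assert_eq!(cgroup_v1_quota(300_000, 0), None); // previously: divide-by-zero panic
}
```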
Before this change, running `cargo test` in environments configured as described above would trigger this panic:
```
$ RUST_BACKTRACE=1 cargo test
Finished test [unoptimized + debuginfo] target(s) in 3.55s
Running unittests src/main.rs (target/debug/deps/x-9a42e145aca2934d)
thread 'main' panicked at 'attempt to divide by zero', library/std/src/sys/unix/thread.rs:546:70
stack backtrace:
0: rust_begin_unwind
1: core::panicking::panic_fmt
2: core::panicking::panic
3: std::sys::unix::thread::cgroups::quota
4: std::sys::unix::thread::available_parallelism
5: std::thread::available_parallelism
6: test::helpers::concurrency::get_concurrency
7: test::console::run_tests_console
8: test::test_main
9: test::test_main_static
10: x::main
at ./src/main.rs:1:1
11: core::ops::function::FnOnce::call_once
at /tmp/rust-1.64-1.64.0-1/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
error: test failed, to rerun pass '--bin x'
```
I've tested this change in an environment with the bad (questionable?) setup; rebuilding the test executable against a fixed standard library resolves the panic.
Use more LFS functions.
On Linux, use mmap64, open64, openat64, and sendfile64 in place of their non-LFS counterparts.
This is relevant to #94173.
With these changes (together with rust-lang/backtrace-rs#501), the simple binaries I produce with rustc seem to have no non-LFS functions, so maybe #94173 is fixed. But I can't be sure if I've missed something and maybe some non-LFS functions could sneak in somehow.
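As a hedged illustration (not the actual patch) of what calling the LFS variant means in practice on a glibc target, the explicit 64-bit-offset functions can be used directly through `libc`:

```
use std::ffi::CString;

fn main() {
    // Illustrative only: open64/lseek64 take and return 64-bit offsets, so
    // large files work even on 32-bit targets, and the binary references the
    // LFS symbols instead of their non-LFS counterparts.
    let path = CString::new("/etc/hostname").unwrap();
    let fd = unsafe { libc::open64(path.as_ptr(), libc::O_RDONLY) };
    if fd >= 0 {
        let len = unsafe { libc::lseek64(fd, 0, libc::SEEK_END) };
        println!("file is {len} bytes");
        unsafe { libc::close(fd) };
    }
}
```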
Ensure that heap allocation does not occur in a thread until std::thread
is ready. This fixes issues with custom allocators that call
std::thread::current(), since doing so prematurely initializes
THREAD_INFO and causes the following thread_info::set() to fail.
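A minimal sketch (hypothetical, not taken from the fix) of the kind of allocator this affects: one that inspects the current thread from inside `alloc`, which before this change could run before `std::thread` was ready on a newly spawned thread:

```
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical allocator that looks at the current thread on every
// allocation. With the old initialization order, the very first allocation
// on a new thread could initialize THREAD_INFO prematurely.
struct ThreadAwareAlloc;

// Re-entrancy guard: thread::current() may itself allocate.
static IN_HOOK: AtomicBool = AtomicBool::new(false);

unsafe impl GlobalAlloc for ThreadAwareAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        if !IN_HOOK.swap(true, Ordering::Relaxed) {
            let _id = std::thread::current().id();
            IN_HOOK.store(false, Ordering::Relaxed);
        }
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static ALLOC: ThreadAwareAlloc = ThreadAwareAlloc;

fn main() {
    std::thread::spawn(|| {
        let v = vec![1, 2, 3]; // first allocations happen on the new thread
        println!("{:?}", v);
    })
    .join()
    .unwrap();
}
```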
There seem to be some scenarios where `cpu.cfs_period_us` can contain `0`.
This causes a panic when calling `std::thread::available_parallelism()`, as is
done from binaries built by `cargo test`, which is how the issue was
discovered. I don't think `0` is a good value for `cpu.cfs_period_us`, but I
also don't think applications should panic if this value is seen.
This case is handled by other projects which read this information:
- num_cpus: e437b9d908/src/linux.rs (L207-L210)
- ninja: https://github.com/ninja-build/ninja/pull/2174/files
- dotnet: c4341d45ac/src/coreclr/pal/src/misc/cgroup.cpp (L481-L483)
Before this change, this panic could be seen in environments set up as
described above:
```
$ RUST_BACKTRACE=1 cargo test
Finished test [unoptimized + debuginfo] target(s) in 3.55s
Running unittests src/main.rs (target/debug/deps/x-9a42e145aca2934d)
thread 'main' panicked at 'attempt to divide by zero', library/std/src/sys/unix/thread.rs:546:70
stack backtrace:
0: rust_begin_unwind
1: core::panicking::panic_fmt
2: core::panicking::panic
3: std::sys::unix::thread::cgroups::quota
4: std::sys::unix::thread::available_parallelism
5: std::thread::available_parallelism
6: test::helpers::concurrency::get_concurrency
7: test::console::run_tests_console
8: test::test_main
9: test::test_main_static
10: x::main
at ./src/main.rs:1:1
11: core::ops::function::FnOnce::call_once
at /tmp/rust-1.64-1.64.0-1/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
error: test failed, to rerun pass '--bin local-rabmq-amqpprox'
```
I've tested this change in an environment with the bad setup; rebuilding
the test executable against a fixed standard library resolves the panic.
These targets have system limits on the thread names, 16 and 64 bytes
respectively, and `pthread_setname_np` returns an error if the name is
longer. However, we're not in a context that can propagate errors when
we call this, and we used to implicitly truncate on Linux with `prctl`,
so now we manually truncate these names ahead of time.
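A hedged sketch of the truncation idea (the helper name and limits below are illustrative, not the std implementation): cut the name down to the target's limit, including room for the trailing NUL, before handing it to `pthread_setname_np`:

```
use std::ffi::CString;

// Illustrative helper: truncate `name` so it fits in `max_with_nul` bytes
// (the buffer size the target allows, including the trailing NUL), without
// splitting a multi-byte UTF-8 character.
fn truncate_name(name: &str, max_with_nul: usize) -> CString {
    let mut end = name.len().min(max_with_nul - 1);
    while !name.is_char_boundary(end) {
        end -= 1;
    }
    CString::new(&name[..end]).expect("thread name contains an interior NUL")
}

fn main() {
    // Using a 16-byte limit (15 characters + NUL) as an example.
    let name = truncate_name("a-rather-long-worker-thread-name", 16);
    assert_eq!(name.as_bytes().len(), 15);
    println!("{:?}", name);
}
```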
openbsd: don't reallocate a guard page on the stack.
The kernel currently enforces that a stack is immutable. Calling mmap(2) or mprotect(2) to change it results in EPERM, which generates a panic!().
So just do as on Linux, and trust the kernel to do the right thing.
This function is available on Linux since glibc 2.12, musl 1.1.16, and
uClibc 1.0.20. The main advantage over `prctl` is that it properly
represents the pointer argument, rather than a multi-purpose `long`,
so we're better representing strict provenance (#95496).
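For illustration, and assuming the function in question is `pthread_setname_np` (its glibc 2.12 availability matches), the name stays a real pointer from end to end rather than being smuggled through a `long`:

```
use std::ffi::CString;

fn main() {
    // Hedged sketch for Linux: the name argument keeps its pointer type,
    // which is what strict provenance wants.
    let name = CString::new("worker").unwrap();
    let ret = unsafe { libc::pthread_setname_np(libc::pthread_self(), name.as_ptr()) };
    assert_eq!(ret, 0);
}
```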
Add cgroupv1 support to available_parallelism
Fixes #97549
My dev machine uses cgroup v2, so I was only able to test that code path; the v1 code path is written based on documentation alone. I could use some help testing that it works on a machine with cgroup v1:
```
$ x.py build --stage 1

# quota.rs
fn main() {
    println!("{:?}", std::thread::available_parallelism());
}

# assuming stage1 is linked in rustup
$ rustc +stage1 quota.rs

# spawn a new cgroup scope for the current user
$ sudo systemd-run -p CPUQuota="300%" --uid=$(id -u) -tdS

# should print Ok(3)
$ ./quota
```
If it doesn't work as expected, an strace log, the contents of `/proc/self/cgroup`, and the structure of `/sys/fs/cgroup` would help.
libc::prctl and the prctl definitions in glibc, musl, and the kernel
headers are C variadic functions. Therefore, all the arguments (except
for the first) are untyped. It is only the Linux man page which says
that prctl takes 4 unsigned long arguments. I have no idea why it says
this.
In any case, the upshot is that we don't need to cast the pointer to an
integer and confuse Miri.
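To make that concrete, here is a hedged sketch of what the variadic declaration allows: a pointer can be handed to `libc::prctl` as a pointer, with no cast to an integer:

```
use std::ffi::CString;

fn main() {
    let name = CString::new("worker").unwrap();
    unsafe {
        // PR_SET_NAME expects a pointer to a NUL-terminated string; because
        // prctl is variadic, the pointer is passed as-is (no `as c_ulong`).
        libc::prctl(libc::PR_SET_NAME, name.as_ptr());
    }
}
```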
This avoids parsing mountinfo, which can be huge on some systems, and
something might be emulating the cgroup filesystem for sandboxing reasons,
in which case it wouldn't show up as a mountpoint.
Additionally, the new implementation operates on a single path buffer,
reducing allocations.
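A rough sketch of that approach (assumptions: cgroup2 is mounted at `/sys/fs/cgroup` and procfs at `/proc`; this is not the std code): take the process's cgroup path from `/proc/self/cgroup` and walk up the hierarchy in one `PathBuf`, reading `cpu.max` at each level.

```
use std::fs;
use std::path::PathBuf;

fn cgroup2_quota() -> Option<usize> {
    // A cgroup v2 entry in /proc/self/cgroup looks like "0::/user.slice/...".
    let cgroups = fs::read_to_string("/proc/self/cgroup").ok()?;
    let rel = cgroups.lines().find_map(|l| l.strip_prefix("0::"))?.trim();

    let root = PathBuf::from("/sys/fs/cgroup");
    let mut path = root.join(rel.trim_start_matches('/'));

    let mut limit = usize::MAX;
    loop {
        // cpu.max has the form "<quota|max> <period>", e.g. "300000 100000".
        if let Ok(cpu_max) = fs::read_to_string(path.join("cpu.max")) {
            let mut parts = cpu_max.split_whitespace();
            if let (Some(q), Some(p)) = (parts.next(), parts.next()) {
                if let (Ok(q), Ok(p)) = (q.parse::<usize>(), p.parse::<usize>()) {
                    if p > 0 {
                        limit = limit.min((q / p).max(1));
                    }
                }
            }
        }
        // Reuse the same buffer: pop one component and try the parent.
        if path == root || !path.pop() {
            break;
        }
    }
    (limit != usize::MAX).then_some(limit)
}

fn main() {
    println!("{:?}", cgroup2_quota());
}
```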
Manually tested via
```
// spawn a new cgroup scope for the current user
$ sudo systemd-run -p CPUQuota="300%" --uid=$(id -u) -tdS

// quota.rs
#![feature(available_parallelism)]
fn main() {
    println!("{:?}", std::thread::available_parallelism()); // prints Ok(3)
}
```
Caveats
* cgroup v1 is ignored
* funky mountpoints (containing spaces, newlines, or control chars) for cgroupfs will not be
  handled correctly since that would require unescaping `/proc/self/mountinfo`. The escaping
  behavior of procfs seems to be undocumented; systemd and docker default to `/sys/fs/cgroup`,
  so this should be fine for most systems.
* quota will be ignored when `sched_getaffinity` doesn't work
* assumes procfs is mounted under `/proc` and cgroupfs is mounted and readable somewhere in the directory tree
This makes a few changes to the weak symbol macros in `sys::unix`:
- `dlsym!` is added to keep the functionality for runtime `dlsym`
lookups, like for `__pthread_get_minstack@GLIBC_PRIVATE` that we don't
want to show up in ELF symbol tables.
- `weak!` now uses `#[linkage = "extern_weak"]` symbols, so its runtime
behavior is just a simple null check. This is also used by `syscall!`.
- On non-ELF targets (macos/ios) where that linkage is not known to
behave, `weak!` is just an alias to `dlsym!` for the old behavior.
- `raw_syscall!` is added to always call `libc::syscall` on linux and
android, for cases like `clone3` that have no known libc wrapper.
The new `weak!` linkage does mean that you'll get versioned symbols if
you build with a newer glibc, like `WEAK DEFAULT UND statx@GLIBC_2.28`.
This might seem problematic, but old non-weak symbols can tie the build
to new versions too, like `dlsym@GLIBC_2.34` from their recent library
unification. If you build with an old glibc like `dist-x86_64-linux`
does, you'll still get unversioned `WEAK DEFAULT UND statx`, which may
be resolved based on the runtime glibc.
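For context, a hedged sketch of the runtime-lookup-plus-null-check pattern that the `dlsym!` path boils down to (the `weak!` path is the same null check, just with the address filled in by the linker):

```
use std::ffi::CString;

fn main() {
    // Look the symbol up at runtime; a null result means "not available,
    // fall back to the older code path".
    let name = CString::new("statx").unwrap();
    let sym = unsafe { libc::dlsym(libc::RTLD_DEFAULT, name.as_ptr()) };
    if sym.is_null() {
        println!("statx not found at runtime; would use the fallback");
    } else {
        println!("statx available at {:p}", sym);
    }
}
```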
I also found a few functions that don't need to be weak anymore:
- Android can directly use `ftruncate64`, `pread64`, and `pwrite64`, as
these were added in API 12, and our baseline is API 14.
- Linux can directly use `splice`, added way back in glibc 2.5 and
similarly old musl. Android only added it in API 21 though.
Use libc::sched_getaffinity and count the number of CPUs in the returned
mask. This handles cases where the process doesn't have access to all
CPUs, such as when limited via taskset or similar.
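A hedged sketch of that approach (not the exact std code): query the affinity mask for the calling process and count the set bits.

```
use std::mem;

fn affinity_parallelism() -> Option<usize> {
    unsafe {
        let mut set: libc::cpu_set_t = mem::zeroed();
        // pid 0 means "the calling thread/process".
        if libc::sched_getaffinity(0, mem::size_of::<libc::cpu_set_t>(), &mut set) == 0 {
            let cpus = libc::CPU_COUNT(&set);
            if cpus > 0 {
                return Some(cpus as usize);
            }
        }
        None
    }
}

fn main() {
    // Try e.g. `taskset -c 0-2 ./this_binary` to see the count drop to 3.
    println!("{:?}", affinity_parallelism());
}
```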
Add new tier-3 target: armv7-unknown-linux-uclibceabihf
This change adds a new tier-3 target: armv7-unknown-linux-uclibceabihf
This target is primarily used in embedded Linux devices where system resources are slim and glibc is deemed too heavyweight. Cross-compilation C toolchains are available [here](https://toolchains.bootlin.com/) or via [buildroot](https://buildroot.org).
The change is based largely on a previous PR #79380 with a few minor modifications. The author of that PR was unable to push the PR forward, and graciously allowed me to take it over.
Per the [target tier 3 policy](https://github.com/rust-lang/rfcs/blob/master/text/2803-target-tier-policy.md), I volunteer to be the "target maintainer".
This is my first PR to Rust itself, so I apologize if I've missed things!
Rename `std::thread::available_concurrency` to `std::thread::available_parallelism`
_Tracking issue: https://github.com/rust-lang/rust/issues/74479_
This PR renames `std::thread::available_concurrency` to `std::thread::available_parallelism`.
## Rationale
The API was initially named `std:🧵:hardware_concurrency`, mirroring the [C++ API of the same name](https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency). We eventually decided to omit any reference to the word "hardware" after [this comment](https://github.com/rust-lang/rust/pull/74480#issuecomment-662045841). And so we ended up with `available_concurrency` instead.
---
For a talk I was preparing this week I was reading through ["Understanding and expressing scalable concurrency" (A. Turon, 2013)](http://aturon.github.io/academic/turon-thesis.pdf), and the following passage stood out to me (emphasis mine):
> __Concurrency is a system-structuring mechanism.__ An interactive system that deals with disparate asynchronous events is naturally structured by division into concurrent threads with disparate responsibilities. Doing so creates a better fit between problem and solution, and can also decrease the average latency of the system by preventing long-running computations from obstructing quicker ones.
> __Parallelism is a resource.__ A given machine provides a certain capacity for parallelism, i.e., a bound on the number of computations it can perform simultaneously. The goal is to maximize throughput by intelligently using this resource. For interactive systems, parallelism can decrease latency as well.
_Chapter 2.1: Concurrency is not Parallelism. Page 30._
---
_"Concurrency is a system-structuring mechanism. Parallelism is a resource."_ — It feels like this accurately captures the way we should be thinking about these APIs. What this API returns is not "the amount of concurrency available to the program" which is a property of the program, and thus even with just a single thread is effectively unbounded. But instead it returns "the amount of _parallelism_ available to the program", which is a resource hard-constrained by the machine's capacity (and can be further restricted by e.g. operating systems).
That's why I'd like to propose we rename this API from `available_concurrency` to `available_parallelism`. This still meets the criteria we previously established of not attempting to define what exactly we mean by "hardware", "threads", and other such words. Instead we only talk about "concurrency" as an abstract resource available to our program.
r? `@joshtriplett`
`weak!` is needed in a test in another module. With macros
1.0, importing `weak!` would require reordering module
declarations in `std/src/lib.rs`, which is a bit too
evil.
Beginning with FreeBSD 10.4 and 11.1, there is one guard page by
default. And the stack autoresizes, so if Rust allocates its own guard
page, then FreeBSD's will simply move up one page. The best solution is
to just use the OS's guard page.