`binop_common` emits a `SKIP` that is intended to apply only to
`copysign`, but is instead applying to all binary operators. Correct the
general case but leave the currently-failing `maximum_num` tests as a
FIXME, to be resolved separately in [1].
Also simplify skip logic and NaN checking, and add a few more `copysign`
checks.
[1]: https://github.com/rust-lang/compiler-builtins/pull/939
As seen at [1], LLVM uses `long long` on LLP64 (to get a 64-bit integer
matching pointer size) and `long` on everything else, with exceptions
for AArch64 and AVR. Our current logic always uses an `i32`. This
happens to work because LLVM uses 32-bit instructions to check the
output on x86-64, but GCC checks the full 64-bit register, so garbage
in the upper half leads to incorrect results.
Update our return type to be `isize`, with exceptions for AArch64 and
AVR.
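A minimal sketch of how such a target-dependent return type could be expressed. The alias name `CmpResult`, the helper `cmp_le_style`, and the exact widths chosen for AArch64 (`i32`) and AVR (`i8`) are assumptions for illustration, not necessarily what this change does:

```rust
// Hypothetical alias: pointer-sized by default, narrower on the two exceptions.
// The widths for AArch64 and AVR are assumptions for this sketch.
#[cfg(target_arch = "avr")]
pub type CmpResult = i8;
#[cfg(target_arch = "aarch64")]
pub type CmpResult = i32;
#[cfg(not(any(target_arch = "avr", target_arch = "aarch64")))]
pub type CmpResult = isize;

/// Illustrative three-way comparison: negative if `a < b`, zero if equal,
/// positive if `a > b` or if either operand is NaN (unordered).
pub fn cmp_le_style(a: f64, b: f64) -> CmpResult {
    if a.is_nan() || b.is_nan() {
        1
    } else if a < b {
        -1
    } else if a == b {
        0
    } else {
        1
    }
}
```

Returning a full-width integer means every bit of the result register is defined, so a caller that tests the whole register sees the same sign as one that tests only the low 32 bits.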
Fixes: https://github.com/rust-lang/compiler-builtins/issues/919
[1]: 0cf3c437c1/compiler-rt/lib/builtins/fp_compare_impl.inc (L11-L27)
These were deleted during refactoring in 0a2dc5d9 ("Combine the source
files for more generic implementations") but got added back by accident
in 54bac411 ("refactor: Move the libm crate to a subdirectory"). Remove
them again here.
The `feature_detect` module is currently being built on all targets, but
the use of `AtomicU32` causes a problem if atomics are not available
(such as with `bpfel-unknown-none`). Gate this module behind
`target_has_atomic = "ptr"`.
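The gating can be sketched roughly as follows. The module body, the `INITIALIZED` bit, and the `cached_flags` and `detected_flags` helpers are hypothetical and only illustrate the shape of the `cfg`; the real module contents differ:

```rust
// Only compiled when pointer-width atomics exist on the target.
#[cfg(target_has_atomic = "ptr")]
mod feature_detect {
    use core::sync::atomic::{AtomicU32, Ordering};

    // Hypothetical layout: the top bit marks the cache as initialized.
    const INITIALIZED: u32 = 1 << 31;
    static CACHE: AtomicU32 = AtomicU32::new(0);

    /// Run `detect` at most until the first store, then serve the cached flags.
    pub fn cached_flags(detect: impl FnOnce() -> u32) -> u32 {
        let cached = CACHE.load(Ordering::Relaxed);
        if cached & INITIALIZED != 0 {
            return cached & !INITIALIZED;
        }
        let flags = detect() & !INITIALIZED;
        CACHE.store(flags | INITIALIZED, Ordering::Relaxed);
        flags
    }
}

// With atomics: use the cache (the closure stands in for real detection).
#[cfg(target_has_atomic = "ptr")]
pub fn detected_flags() -> u32 {
    feature_detect::cached_flags(|| 0b101)
}

// Without atomics: report no runtime-detected features.
#[cfg(not(target_has_atomic = "ptr"))]
pub fn detected_flags() -> u32 {
    0
}
```

Callers go through `detected_flags`, so targets without atomics never reference the gated module.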
The following command now completes successfully:

    cargo build -p compiler_builtins --target=bpfel-unknown-none -Z build-std=core

Fixes: https://github.com/rust-lang/compiler-builtins/issues/908
Get performance closer to the glibc implementations by adding assembly
fma routines, with runtime feature detection so they are used even if
not compiled with `+fma` (as the distributed standard library is often
not). Glibc uses ifuncs; this implementation stores a function pointer
in an atomic.
Results of CPU flags are also cached in order to avoid repeating the
startup time in calls to different functions. The feature detection code
is a slightly simplified version of `std-detect`.
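The function-pointer-in-an-atomic approach can be sketched as below. All names (`FMA_IMPL`, `init`, `fma_asm`, `fma_fallback`, `cpu_has_fma`) are hypothetical, and both stand-in implementations simply call `f64::mul_add` in place of the real assembly and software routines:

```rust
use std::mem;
use std::sync::atomic::{AtomicPtr, Ordering};

type FmaFn = fn(f64, f64, f64) -> f64;

// Starts out pointing at `init`, which swaps in the selected routine on first call.
static FMA_IMPL: AtomicPtr<()> = AtomicPtr::new((init as FmaFn) as *mut ());

// Stand-in for the assembly routine (assumption: the real one uses inline asm).
fn fma_asm(x: f64, y: f64, z: f64) -> f64 {
    x.mul_add(y, z)
}

// Stand-in for the portable software routine.
fn fma_fallback(x: f64, y: f64, z: f64) -> f64 {
    x.mul_add(y, z)
}

#[cfg(target_arch = "x86_64")]
fn cpu_has_fma() -> bool {
    std::is_x86_feature_detected!("fma")
}

#[cfg(not(target_arch = "x86_64"))]
fn cpu_has_fma() -> bool {
    false
}

// First-call resolver: pick an implementation, publish it, then forward the call.
fn init(x: f64, y: f64, z: f64) -> f64 {
    let chosen: FmaFn = if cpu_has_fma() { fma_asm } else { fma_fallback };
    FMA_IMPL.store(chosen as *mut (), Ordering::Relaxed);
    chosen(x, y, z)
}

pub fn fma(x: f64, y: f64, z: f64) -> f64 {
    let raw = FMA_IMPL.load(Ordering::Relaxed);
    // SAFETY: only `FmaFn` pointers are ever stored in `FMA_IMPL`.
    let f: FmaFn = unsafe { mem::transmute::<*mut (), FmaFn>(raw) };
    f(x, y, z)
}
```

Relaxed ordering is enough in this sketch because every value that can be observed is a valid function pointer; at worst, two threads both run `init` and store the same result.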
Musl sources were used as a reference [1].
Fixes: https://github.com/rust-lang/rust/issues/140452 once synced
[1]: c47ad25ea3/src/math/x32/fma.c
These appeared in a later nightly. In compiler-builtins we can apply the
suggestion, but in `libm` we need to ignore them since `fx::from_bits`
is not `const` at the MSRV.
`clippy::uninlined_format_args` also seems to have gotten stricter, so
fix those here.
Use the 2024 style edition for all crates and enable import sorting.
2024 already applies some smaller heuristics that look good in
compiler-builtins. I have dropped `use_small_heuristics`, which was set in
`libm`, because it seems to negatively affect the readability of anything
working with numbers (e.g. collapsing multiple small `if` expressions
into a single line).
Distribute everything from `libm/` to better locations in the repo.
`libm/libm/*` has not moved yet to avoid Git seeing the move as an edit
to `Cargo.toml`.
Files that remain to be merged somehow are in `etc/libm`.
Unfortunately this means we lose use of the convenient name `gen`, so
this includes a handful of renaming.
We can't increase the edition for `libm` yet due to MSRV, but we can
enable `unsafe_op_in_unsafe_fn` to help make that change smoother in the
future.
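For illustration, this is the style the lint asks for. The function `read_bits` is hypothetical, and the lint is applied per item here only to keep the sketch self-contained:

```rust
/// # Safety
/// `ptr` must be non-null, aligned, and valid for reads of an `f64`.
#[deny(unsafe_op_in_unsafe_fn)]
pub unsafe fn read_bits(ptr: *const f64) -> u64 {
    // With the lint enabled, the body of an `unsafe fn` is no longer an
    // implicit unsafe block, so each unsafe operation needs its own block.
    // SAFETY: the caller upholds the contract documented above.
    let x = unsafe { ptr.read() };
    x.to_bits()
}
```

This matches the default behavior in the 2024 edition, where the lint warns by default.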
Move the workspace configuration to a virtual manifest. This
reorganization makes a clearer separation between package contents
and support files that don't get distributed. It will also make it
easier to merge this repository with `compiler-builtins` which is
planned (builtins had a similar update done in [1]).
LICENSE.txt and README.md are symlinked into the new directory to ensure
they get included in the package.
[1]: https://github.com/rust-lang/compiler-builtins/pull/702
In preparation for switching to a virtual manifest, move the `libm`
crate into a subdirectory and update paths to match.
Updating `Cargo.toml` is done in the next commit so git tracks the moved
file correctly.
Benchmarks for [1] seemed to indicate that repository organization for
some reason had an effect on performance, even though the exact same
rustc commands were running (though some with a different order). After
investigating more, it appears that dependencies may have an effect on
inlining thresholds for generic functions.
It is surprising that this happens; we more or less expect that public
functions will be standalone but everything they call will be inlined.
To help ensure this, mark all generic functions `#[inline]` if they
should be merged into the public function.
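A small sketch of the pattern, using a hypothetical generic `fdim_generic` with simplified trait bounds rather than the crate's own float trait:

```rust
use core::ops::Sub;

// Generic helper: marked `#[inline]` so it is merged into each public wrapper
// instead of depending on the compiler's inlining threshold.
#[inline]
fn fdim_generic<F>(x: F, y: F) -> F
where
    F: Copy + PartialOrd + Sub<Output = F> + Default,
{
    if x != x {
        x // x is NaN
    } else if y != y {
        y // y is NaN
    } else if x > y {
        x - y
    } else {
        F::default() // positive zero for f32/f64
    }
}

// Public entry points stay standalone, non-generic symbols.
pub fn fdimf(x: f32, y: f32) -> f32 {
    fdim_generic(x, y)
}

pub fn fdim(x: f64, y: f64) -> f64 {
    fdim_generic(x, y)
}
```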
Zulip discussion at [2].
[1]: https://github.com/rust-lang/libm/pull/533
[2]: https://rust-lang.zulipchat.com/#narrow/channel/182449-t-compiler.2Fhelp/topic/Dependencies.20affecting.20codegen/with/513079387
Since `fmod` is generic, there isn't any need to have the small wrappers
in separate files. This was done for most operations in [1], but `fmod` was
omitted until now.
[1]: https://github.com/rust-lang/libm/pull/537
The reorganization PR has caused this to fail once before because every
file shows up as changed. Increase the timeout so this doesn't happen.
We now cancel the job if too many extensive tests are run unless `ci:
allow-many-extensive` is in the PR description, so this helps prevent
the limit being hit by accident.
Error out when too many extensive tests would be run unless `ci:
allow-many-extensive` is in the PR description. This allows us to set a
much higher CI timeout with less risk that a 4+ hour job gets started by
accident.
Sometimes we do refactoring that moves things around and triggers an
extensive test, even though the implementation didn't change. There
isn't any need to run full extensive CI in these cases, so add a way to
skip it from the PR message.
Jobs should just cancel automatically; it isn't ideal that extensive
jobs can continue running for multiple hours after code has been
updated. Use a solution from [1] to do this.
[1]: https://stackoverflow.com/a/72408109/5380651
Splitting into different source files by float size doesn't have any
benefit when the only content is a small function that forwards to the
generic implementation. Combine the source files for all width versions
of:
* ceil
* copysign
* fabs
* fdim
* floor
* fmaximum
* fmaximum_num
* fminimum
* fminimum_num
* ldexp
* scalbn
* sqrt
* trunc
fmod is excluded to avoid conflicts with an open PR.
As part of this change move unit tests out of the generic module,
instead testing the type-specific functions (e.g. `ceilf16` rather than
`ceil::<f16>()`). This ensures that unit tests are validating whatever
we expose, such as arch-specific implementations via
`select_implementation!`, which would otherwise be skipped. (They are
still covered by integration tests).
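The resulting file layout looks roughly like the sketch below. The `generic` module and its `Float` trait are simplified, hypothetical stand-ins for the crate's real generic implementation:

```rust
mod generic {
    // Minimal stand-in trait; the real crate has a much richer float trait.
    pub trait Float: Copy {
        fn clear_sign(self) -> Self;
    }

    impl Float for f32 {
        fn clear_sign(self) -> Self {
            f32::from_bits(self.to_bits() & !(1u32 << 31))
        }
    }

    impl Float for f64 {
        fn clear_sign(self) -> Self {
            f64::from_bits(self.to_bits() & !(1u64 << 63))
        }
    }

    #[inline]
    pub fn fabs<F: Float>(x: F) -> F {
        x.clear_sign()
    }
}

// All width-specific wrappers now live side by side in a single file.
pub fn fabsf(x: f32) -> f32 {
    generic::fabs(x)
}

pub fn fabs(x: f64) -> f64 {
    generic::fabs(x)
}
```

Unit tests then call `fabsf` and `fabs` rather than `generic::fabs::<f32>()`, so any arch-specific path selected inside the public function is exercised too.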
Introduce a constant representing NaN with a negative sign bit for use
with testing. There isn't really any guarantee that `F::NAN` is positive
but in practice it always is, which is good enough for testing purposes.
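One way such a constant could be built is by setting the sign bit of the NaN bit pattern directly. The free-standing names below are hypothetical (the real constant would live on the float trait), and the sketch assumes a toolchain where `from_bits` and `to_bits` are usable in `const` context:

```rust
// NaN with the sign bit forced on, for f32 and f64.
pub const NEG_NAN_F32: f32 = f32::from_bits(f32::NAN.to_bits() | (1u32 << 31));
pub const NEG_NAN_F64: f64 = f64::from_bits(f64::NAN.to_bits() | (1u64 << 63));
```

Forcing the bit on makes the constant negative regardless of the sign of `NAN`; only tests that also need a positive NaN rely on `F::NAN` being positive.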
As discussed at [1], there was an off-by-one mistake when converting from
the loop routine to using `leading_zeros` for normalization.
Currently, using `EXP_BITS` has the effect that `ix` after the branch
has its MSB _one bit to the left_ of the implicit bit's position,
whereas a shift by `EXP_BITS + 1` ensures that the MSB is exactly at the
implicit bit's position, matching what is done for normals (where the
implicit bit is set to be explicit). This doesn't seem to have any
effect in our implementation since the failing test cases from [1]
appear to still have correct results.
Since the result of using `EXP_BITS + 1` is more consistent with what is
done for normals, apply this here.
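A worked sketch for `f64` of the normalization with the corrected shift. The function name and its standalone form are hypothetical; the real code is generic and inline:

```rust
const EXP_BITS: u32 = 11; // f64 exponent field width
const SIG_BITS: u32 = 52; // f64 explicit significand width

/// Normalize the bits of a nonzero, positive subnormal `f64`.
/// Returns the shifted significand and the adjusted exponent.
pub fn normalize_subnormal(ix: u64) -> (u64, i32) {
    debug_assert!(ix != 0 && ix >> SIG_BITS == 0);
    // Shifting by EXP_BITS + 1 discards the sign and exponent fields, so the
    // leading-zero count measures the distance below the implicit bit.
    let ex = -((ix << (EXP_BITS + 1)).leading_zeros() as i32);
    let shift = (1 - ex) as u32;
    (ix << shift, ex)
}
```

For `ix = 1`, `ix << 12` has 51 leading zeros, so `ex = -51` and the result is `1 << 52`, exactly the implicit bit's position. Shifting by `EXP_BITS` alone would give 52 leading zeros and place the MSB at bit 53 instead.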
[1]: https://github.com/rust-lang/libm/pull/469#discussion_r2012473920
Parsing errors are now bubbled up part of the way, but that needs some
more work.
Rounding should be correct, and the `Status` returned by `parse_any`
should have the correct bits set. These are used for the current (unchanged)
behavior of the surface level functions like `hf64`: panic on invalid inputs, or
values that aren't exactly representable.
Replace `core::arch` versions of the following with handwritten
assembly, which avoids recursion issues (cg_gcc using `rint` as a
fallback) as well as problems with `aarch64be`.
* `rint`
* `rintf`
Additionally, add assembly versions of the following:
* `fma`
* `fmaf`
* `sqrt`
* `sqrtf`
If the `fp16` target feature is available, which implies `neon`, also
include the following:
* `rintf16`
* `sqrtf16`
`sqrt` is added to match the implementation for `x86`. `fma` is included
since it is used by many other routines.
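As an illustration of what a handwritten routine can look like, here is a sketch of `sqrt` using inline assembly on AArch64, with a plain fallback so the example builds elsewhere. The exact operand constraints and options are assumptions and may differ from the code in this change:

```rust
#[cfg(target_arch = "aarch64")]
pub fn sqrt(mut x: f64) -> f64 {
    // SAFETY: `fsqrt` only reads and writes the named register and has no
    // other side effects.
    unsafe {
        core::arch::asm!(
            "fsqrt {x:d}, {x:d}",
            x = inout(vreg) x,
            options(nomem, nostack, pure)
        );
    }
    x
}

// Fallback for other architectures, included only to keep the sketch portable.
#[cfg(not(target_arch = "aarch64"))]
pub fn sqrt(x: f64) -> f64 {
    x.sqrt()
}
```

Writing the instruction directly means the routine does not lower to an intrinsic that a backend might implement by calling back into the same function.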
There are a handful of other operations that have assembly
implementations. They are omitted here because we should have basic
float math routines available in `core` in the near future, which will
allow us to defer to LLVM for assembly lowering rather than implementing
these ourselves.