Add midpoint function for all integers and floating numbers
This pull-request adds the `midpoint` function to `{u,i}{8,16,32,64,128,size}`, `NonZeroU{8,16,32,64,size}` and `f{32,64}`.
This new function is analog to the [C++ midpoint](https://en.cppreference.com/w/cpp/numeric/midpoint) function, and basically compute `(a + b) / 2` with a rounding towards ~~`a`~~ negative infinity in the case of integers. Or simply said: `midpoint(a, b)` is `(a + b) >> 1` as if it were performed in a sufficiently-large signed integral type.
Note that unlike the C++ function this pull-request does not implement this function on pointers (`*const T` or `*mut T`). This could be implemented in a future pull-request if desire.
### Implementation
For `f32` and `f64` the implementation in based on the `libcxx` [one](18ab892ff7/libcxx/include/__numeric/midpoint.h (L65-L77)). I originally tried many different approach but all of them failed or lead me with a poor version of the `libcxx`. Note that `libstdc++` has a very similar one; Microsoft STL implementation is also basically the same as `libcxx`. It unfortunately doesn't seems like a better way exist.
For unsigned integers I created the macro `midpoint_impl!`, this macro has two branches:
- The first one take `$SelfT` and is used when there is no unsigned integer with at least the double of bits. The code simply use this formula `a + (b - a) / 2` with the arguments in the correct order and signs to have the good rounding.
- The second branch is used when a `$WideT` (at least double of bits as `$SelfT`) is provided, using a wider number means that no overflow can occur, this greatly improve the codegen (no branch and less instructions).
For signed integers the code basically forwards the signed numbers to the unsigned version of midpoint by mapping the signed numbers to their unsigned numbers (`ex: i8 [-128; 127] to [0; 255]`) and vice versa.
I originally created a version that worked directly on the signed numbers but the code was "ugly" and not understandable. Despite this mapping "overhead" the codegen is better than my most optimized version on signed integers.
~~Note that in the case of unsigned numbers I tried to be smart and used `#[cfg(target_pointer_width = "64")]` to determine if using the wide version was better or not by looking at the assembly on godbolt. This was applied to `u32`, `u64` and `usize` and doesn't change the behavior only the assembly code generated.~~