Commit Graph

38 Commits

Author SHA1 Message Date
Eduard-Mihai Burtescu
bc6af97ed0 core: disable ptr::swap_nonoverlapping_one's block optimization on SPIR-V. 2021-04-04 22:26:27 +03:00
Eduard-Mihai Burtescu
3c3d3ddde9 core: rearrange ptr::swap_nonoverlapping_one's cases (no functional changes). 2021-04-04 22:26:00 +03:00
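For context on the two `swap_nonoverlapping_one` commits above, here is a minimal sketch of the simple per-element swap that `ptr::swap_nonoverlapping` performs when the block optimization does not apply (an illustration, not the actual libcore code):

    use core::ptr;

    // Caller must ensure `x` and `y` are valid for reads/writes and do not overlap.
    unsafe fn swap_one<T>(x: *mut T, y: *mut T) {
        // Move `*x` out, copy `*y` over it, then store the moved-out value.
        let tmp = ptr::read(x);
        ptr::copy_nonoverlapping(y, x, 1);
        ptr::write(y, tmp);
    }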
Dylan DPC
5b67543c98
Rollup merge of #83579 - RalfJung:ptr-arithmetic, r=dtolnay
Improve pointer arithmetic docs

* Add slightly more detailed definition of "allocated object" to the module docs, and link it from everywhere.
* Clarify the "remains attached" wording a bit (at least I hope this is clearer).
* Remove the sentence about using integer arithmetic; this seems to confuse people even if it is technically correct.

As usual, the edit needs to be done in a dozen places to remain consistent; I hope I got them all.
2021-03-30 11:34:26 +02:00
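For illustration (not part of the PR above), the rule these docs spell out is that offsets computed with `add`/`offset` must stay within the same allocated object, or one past its end:

    fn sum(xs: &[u32; 4]) -> u32 {
        let base = xs.as_ptr();
        let mut total = 0;
        for i in 0..xs.len() {
            // Every offset `i` stays inside (or one past the end of) the
            // allocated object backing `xs`, so the arithmetic is in-bounds.
            total += unsafe { *base.add(i) };
        }
        total
    }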
Ralf Jung
b5d71bfb0f add definition of 'allocated object', and link it from relevant method docs 2021-03-27 19:26:10 +01:00
Ralf Jung
fb4f48e032 make unaligned_references future-incompat lint warn-by-default, and remove the safe_packed_borrows lint that it replaces 2021-03-27 16:59:37 +01:00
bors
f82664191d Auto merge of #83053 - oli-obk:const_stab_version, r=m-ou-se
Fix const stability `since` versions.

fixes #82085

r? `@m-ou-se`
2021-03-21 16:21:39 +00:00
Albin Hedman
45988ee438
Constify mem::replace and ptr::replace 2021-03-15 20:45:43 +01:00
Albin Hedman
64e2248794
Constify mem::swap and ptr::swap[_nonoverlapping] 2021-03-15 20:45:22 +01:00
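A minimal sketch of what the two constification commits above enable (not from the commits; it assumes a toolchain where `mem::swap`/`mem::replace` and `&mut` in `const fn` are usable in const contexts, which at the time meant unstable feature gates):

    use core::mem;

    const fn take_and_swap(a: &mut u32, b: &mut u32) -> u32 {
        // With the constified functions, in-place moves like these can run
        // during const evaluation.
        let old = mem::replace(a, 0);
        mem::swap(a, b);
        old
    }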
Oli Scherer
6f3635d87b Fix const stability since versions. 2021-03-15 14:39:18 +00:00
Motoki Ikeda
71a784d763 Fix a typo in swap_nonoverlapping_bytes 2021-03-14 16:32:15 +09:00
Joshua Nelson
9a75f4fed1 Convert primitives to use intra-doc links 2021-02-25 20:31:53 -05:00
Albin Hedman
89c761058a
Constify ptr::write and the write[_unaligned] methods on *mut T
Constify intrinsics::forget
2021-02-23 18:00:01 +01:00
Simon Sapin
cf000f0408 Pointer metadata: add tracking issue number 2021-02-15 14:27:51 +01:00
Simon Sapin
787f4de6ab Use new pointer metadata API inside libcore instead of manual transmutes 2021-02-15 14:27:34 +01:00
Simon Sapin
937d580a25 Add ptr::from_raw_parts, ptr::from_raw_parts_mut, and NonNull::from_raw_parts
The use of module-level functions instead of associated functions
on `<*const T>` or `<*mut T>` follows the precedent of
`ptr::slice_from_raw_parts` and `ptr::slice_from_raw_parts_mut`.
2021-02-15 14:27:31 +01:00
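A sketch combining this commit's `from_raw_parts` constructor with the `ptr::metadata` function from the `Pointee` commit just below (mine, not from either commit; nightly-only behind the `ptr_metadata` feature, whose exact signatures have been tweaked since):

    #![feature(ptr_metadata)]
    use core::ptr;

    // Split a slice pointer into (address, metadata) parts and rebuild it.
    fn round_trip(s: &[u8]) -> *const [u8] {
        let addr = s.as_ptr().cast::<()>();
        // For a slice, the metadata is the element count.
        let len = ptr::metadata(s as *const [u8]);
        ptr::from_raw_parts(addr, len)
    }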
Simon Sapin
696b239f72 Add ptr::Pointee trait (for all types) and ptr::metadata function
RFC: https://github.com/rust-lang/rfcs/pull/2580
2021-02-15 14:27:12 +01:00
bors
8e54a21139 Auto merge of #81238 - RalfJung:copy-intrinsics, r=m-ou-se
directly expose copy and copy_nonoverlapping intrinsics

This effectively un-does https://github.com/rust-lang/rust/pull/57997. That should help with `ptr::read` codegen in debug builds (and any of the other low-level functions that bottom out at `copy`/`copy_nonoverlapping`), where the wrapper function will not get inlined. See the discussion in https://github.com/rust-lang/rust/pull/80290 and https://github.com/rust-lang/rust/issues/81163.

Cc `@bjorn3` `@therealprof`
2021-02-13 20:30:07 +00:00
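To make the "bottom out at `copy`/`copy_nonoverlapping`" point concrete, here is a rough sketch of what `ptr::read` amounts to (simplified, not the actual libcore source):

    use core::mem::MaybeUninit;
    use core::ptr;

    // Caller must ensure `src` is valid for reads and properly aligned.
    unsafe fn read_via_copy<T>(src: *const T) -> T {
        // Copy the bytes of `*src` into a fresh, uninitialized local,
        // then assert that the local is now initialized.
        let mut tmp = MaybeUninit::<T>::uninit();
        ptr::copy_nonoverlapping(src, tmp.as_mut_ptr(), 1);
        tmp.assume_init()
    }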
Ralf Jung
13ffa43bbb rename raw_const/mut -> const/mut_addr_of, and stabilize them 2021-01-29 15:18:45 +01:00
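An illustrative use of the newly stabilized macros (not from the commit): `addr_of!` produces a raw pointer to a place without first creating a reference, which is also the sanctioned replacement for the borrows flagged by the `unaligned_references` lint above.

    use core::ptr;

    #[repr(packed)]
    struct Packed {
        a: u8,
        b: u32, // stored unaligned because of `repr(packed)`
    }

    fn field_ptr(p: &Packed) -> *const u32 {
        // `&p.b` would create an unaligned reference; `addr_of!` gives the
        // raw pointer directly without one.
        ptr::addr_of!(p.b)
    }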
Ralf Jung
18d12ad171 directly expose copy and copy_nonoverlapping intrinsics 2021-01-21 10:41:22 +01:00
Ralf Jung
712d065061 remove some outdated comments regarding debug assertions 2021-01-18 13:06:01 +01:00
Ralf Jung
a5b89a00cb
extend comment
Co-authored-by: lcnr <bastian_kauschke@hotmail.de>
2021-01-02 16:58:38 +01:00
Ralf Jung
1862135351 implement ptr::write without dedicated intrinsic 2020-12-30 18:39:05 +01:00
Albin Hedman
0cea1c9206 Added reference to tracking issue #80377 2020-12-26 14:03:28 +01:00
Albin Hedman
7594d2a084 Constify ptr::read and ptr::read_unaligned 2020-12-26 02:25:08 +01:00
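A tiny sketch of what the constified `ptr::read` allows (not from the commit; at the time this sat behind the unstable gate tracked in #80377):

    use core::ptr;

    const FIRST: u8 = {
        let bytes = [10u8, 20, 30];
        // A raw, bitwise read evaluated entirely at compile time.
        unsafe { ptr::read(bytes.as_ptr()) }
    };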
Dylan DPC
6cd02a85f1
Rollup merge of #77844 - RalfJung:zst-box, r=nikomatsakis
clarify rules for ZST Boxes

LLVM's rules around `getelementptr inbounds` with offset 0 are a bit annoying, and as a consequence we have no choice but to say that a `Box<()>` pointing to previously allocated memory that has since been freed is UB. Clarify the docs to reflect this.

This is based on conversations on the LLVM mailing list.
* Here's my initial mail: https://lists.llvm.org/pipermail/llvm-dev/2019-February/130452.html
* The first email of the March part of that thread: https://lists.llvm.org/pipermail/llvm-dev/2019-March/130831.html
* First email of the April part: https://lists.llvm.org/pipermail/llvm-dev/2019-April/131693.html

The conclusion for me at least was that `getelementptr inbounds` with offset 0 is *not* the identity function, but can sometimes return `poison` even when the input is a regular pointer -- specifically, it returns `poison` when this pointer points into something that LLVM "knows has been deallocated", i.e., a former LLVM-managed allocation. It is however the identity function on pointers obtained by casting integers.

Note that there [are formal proposals](https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf) for LLVM semantics where `getelementptr inbounds` with offset 0 isn't quite the identity function but never returns `poison` (it affects the provenance of the pointer but in a way that doesn't matter if this pointer is never used for memory accesses), and indeed this is likely necessary to consistently describe LLVM semantics. But with the informal LLVM LangRef that we have right now, and with LLVM devs insisting otherwise, it seems unwise to rely on this.
2020-11-21 19:44:07 +01:00
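A small sketch of the rule being documented (mine, not part of the PR): a `Box` of a zero-sized type may be conjured from an aligned, non-null dangling pointer such as `NonNull::dangling()`, but not from a pointer into an allocation that has since been freed.

    use std::ptr::NonNull;

    fn zst_box() -> Box<()> {
        // Sound for zero-sized types: no memory is ever touched or freed.
        // Reusing a pointer to previously-freed memory here would be UB,
        // per the clarified docs.
        unsafe { Box::from_raw(NonNull::<()>::dangling().as_ptr()) }
    }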
Ralf Jung
a7677f7714 reference NonNull::dangling 2020-11-20 11:09:49 +01:00
Camelid
fee4f8feb0 Improve wording of core::ptr::drop_in_place docs
And two small intra-doc link conversions in `std::{f32, f64}`.
2020-10-29 20:09:29 -07:00
bors
69e68cf550 Auto merge of #75728 - nagisa:improve_align_offset_2, r=Mark-Simulacrum
Optimise align_offset for stride=1 further

The `stride == 1` case can be computed more efficiently through `-p (mod
a)`. That then translates to a nice, short sequence of LLVM
instructions:

    %address = ptrtoint i8* %p to i64
    %negptr = sub i64 0, %address
    %offset = and i64 %negptr, %a_minus_one

And produces pretty much ideal code-gen when this function is used in
isolation.

Typical use of this function will, however, involve use of
the result to offset a pointer, i.e.

    %aligned = getelementptr inbounds i8, i8* %p, i64 %offset

This still looks very good, but LLVM does not really translate that to
what would be considered ideal machine code (on any target). For example
that's the codegen we obtain for an unknown alignment:

    ; x86_64
    dec     rsi
    mov     rax, rdi
    neg     rax
    and     rax, rsi
    add     rax, rdi

In particular negating a pointer is not something that’s going to be
optimised for in the design of CISC architectures like x86_64. They
are much better at offsetting pointers. And so we’d love to utilize this
ability and produce code that's more like this:

    ; x86_64
    lea     rax, [rsi + rdi - 1]
    neg     rsi
    and     rax, rsi

To achieve this we need to give LLVM an opportunity to apply its
various peep-hole optimisations that it does during DAG selection. In
particular, the `and` instruction appears to be a major inhibitor here.
We cannot, sadly, get rid of this load-bearing operation, but we can
reorder operations such that LLVM has more to work with around this
instruction.

One such ordering is proposed in #75579 and results in LLVM IR that
looks broadly like this:

    ; using add enables `lea` and similar CISCisms
    %offset_ptr = add i64 %address, %a_minus_one
    %mask = sub i64 0, %a
    %masked = and i64 %offset_ptr, %mask
    ; can be folded with `gepi` that may follow
    %offset = sub i64 %masked, %address

…and generates the intended x86_64 machine code.
One might also wonder how the increased amount of code would impact a
RISC target. Turns out not much:

    ; aarch64 previous                 ; aarch64 new
    sub     x8, x1, #1                 add     x8, x1, x0
    neg     x9, x0                     sub     x8, x8, #1
    and     x8, x9, x8                 neg     x9, x1
    add     x0, x0, x8                 and     x0, x8, x9

    (and similarly for ppc, sparc, mips, riscv, etc)

The only target that seems to do worse is… wasm32.

Onto actual measurements – the best way to evaluate snippets like these
is to use llvm-mca. Much as the AArch64 assembly would lead one to suspect,
there isn’t any performance difference to be found there: both snippets
execute in the same number of cycles for the CPUs I tried. On x86_64,
however, we get a throughput improvement of >50%!

Fixes #75579
2020-10-26 06:49:34 +00:00
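For reference, the `-p (mod a)` computation described above boils down to the following (a standalone sketch, not the actual `align_offset` implementation):

    fn align_offset_stride1(p: *const u8, a: usize) -> usize {
        debug_assert!(a.is_power_of_two());
        // Distance from `p` up to the next multiple of `a`:
        // (-addr) mod a, computed as (0 - addr) & (a - 1) for power-of-two `a`.
        (p as usize).wrapping_neg() & (a - 1)
    }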
Ralf Jung
defcd7ff47 stop relying on feature(untagged_unions) in stdlib 2020-10-16 11:33:35 +02:00
Ralf Jung
0f572a9810 explicitly talk about integer literals 2020-10-13 09:30:09 +02:00
Ralf Jung
c555aabc5b clarify rules for ZST Boxes 2020-10-12 10:32:11 +02:00
Camelid
884a1b4b9b Fix anchor links
#safety -> self#safety
2020-09-09 13:42:57 -07:00
Camelid
d24026bb6d Fix broken link
`write` is ambiguous because there's also a macro called `write`.

Also removed unnecessary and potentially confusing link to a function in
its own docs.
2020-09-08 19:24:57 -07:00
Camelid
325acefee4 Use intra-doc links in core::ptr
The only link that I did not change is a link to a function on the
`pointer` primitive because intra-doc links for the `pointer` primitive
don't work yet (see #63351).
2020-09-08 14:36:36 -07:00
Simonas Kazlauskas
4bfacffb90 Optimise align_offset for stride=1 further
The `stride == 1` case can be computed more efficiently through `-p (mod
a)`. That then translates to a nice, short sequence of LLVM
instructions:

    %address = ptrtoint i8* %p to i64
    %negptr = sub i64 0, %address
    %offset = and i64 %negptr, %a_minus_one

And produces pretty much ideal code-gen when this function is used in
isolation.

Typical use of this function will, however, involve use of
the result to offset a pointer, i.e.

    %aligned = getelementptr inbounds i8, i8* %p, i64 %offset

This still looks very good, but LLVM does not really translate that to
what would be considered ideal machine code (on any target). For example
that's the codegen we obtain for an unknown alignment:

    ; x86_64
    dec     rsi
    mov     rax, rdi
    neg     rax
    and     rax, rsi
    add     rax, rdi

In particular negating a pointer is not something that’s going to be
optimised for in the design of CISC architectures like x86_64. They
are much better at offsetting pointers. And so we’d love to utilize this
ability and produce code that's more like this:

    ; x86_64
    lea     rax, [rsi + rdi - 1]
    neg     rsi
    and     rax, rsi

To achieve this we need to give LLVM an opportunity to apply its
various peep-hole optimisations that it does during DAG selection. In
particular, the `and` instruction appears to be a major inhibitor here.
We cannot, sadly, get rid of this load-bearing operation, but we can
reorder operations such that LLVM has more to work with around this
instruction.

One such ordering is proposed in #75579 and results in LLVM IR that
looks broadly like this:

    ; using add enables `lea` and similar CISCisms
    %offset_ptr = add i64 %address, %a_minus_one
    %mask = sub i64 0, %a
    %masked = and i64 %offset_ptr, %mask
    ; can be folded with `gepi` that may follow
    %offset = sub i64 %masked, %address

…and generates the intended x86_64 machine code. One might also wonder
how the increased amount of code would impact a RISC target. Turns out
not much:

    ; aarch64 previous                 ; aarch64 new
    sub     x8, x1, #1                 add     x8, x1, x0
    neg     x9, x0                     sub     x8, x8, #1
    and     x8, x9, x8                 neg     x9, x1
    add     x0, x0, x8                 and     x0, x8, x9

    (and similarly for ppc, sparc, mips, riscv, etc)

The only target that seems to do worse is… wasm32.

Onto actual measurements – the best way to evaluate snippets like these
is to use llvm-mca. Much as the AArch64 assembly would lead one to suspect,
there isn’t any performance difference to be found there: both snippets
execute in the same number of cycles for the CPUs I tried. On x86_64,
however, we get a throughput improvement of >50%!
2020-08-20 05:06:00 +03:00
Simonas Kazlauskas
5d22b18bf2 Improve codegen of align_offset when stride == 1
Previously, checking for `pmoda == 0` would get LLVM to generate branchy
code, even though for `stride = 1` the offset can be computed without such
a branch by doing effectively a `-p % a`.

For well-known (constant) alignments, with the new ordering of these
conditionals, we end up generating 2 to 3 cheap instructions on x86_64:

    movq    %rdi, %rax
    negl    %eax
    andl    $7, %eax

instead of 5+ as previously.

For unknown alignments the new code also generates just 3 instructions:

    negq    %rdi
    leaq    -1(%rsi), %rax
    andq    %rdi, %rax
2020-08-16 21:31:48 +03:00
Simonas Kazlauskas
e7271da69a Improve align_offset at opt-level <= 1
At opt-level <= 1, methods such as `wrapping_mul` are not inlined,
causing significant bloat and slowdowns in the implementation at these
optimisation levels.

With these intrinsics in use, the codegen of this function at
`-Copt-level=1` is the same as it is at `-Copt-level=3`.
2020-08-16 21:31:48 +03:00
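A rough sketch of the kind of change being described (hypothetical, not the actual diff; `core_intrinsics` is a nightly-only, internal feature):

    #![feature(core_intrinsics)]
    use core::intrinsics;

    fn mul_step(x: usize, y: usize) -> usize {
        // `x.wrapping_mul(y)` is a method call that may not get inlined at
        // -Copt-level=1; the intrinsic lowers directly to a multiply.
        intrinsics::wrapping_mul(x, y)
    }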
mark
2c31b45ae8 mv std libs to library/ 2020-07-27 19:51:13 -05:00