nordic-dev.net/rust - rust

mirror of https://github.com/rust-lang/rust.git synced 2025-06-05 19:58:32 +00:00

Author	SHA1	Message	Date
Eduard-Mihai Burtescu	bc6af97ed0	core: disable `ptr::swap_nonoverlapping_one`'s block optimization on SPIR-V.	2021-04-04 22:26:27 +03:00
Eduard-Mihai Burtescu	3c3d3ddde9	core: rearrange `ptr::swap_nonoverlapping_one`'s cases (no functional changes).	2021-04-04 22:26:00 +03:00
Dylan DPC	5b67543c98	Rollup merge of #83579 - RalfJung:ptr-arithmetic, r=dtolnay Improve pointer arithmetic docs
* Add slightly more detailed definition of "allocated object" to the module docs, and link it from everywhere.
* Clarify the "remains attached" wording a bit (at least I hope this is clearer).
* Remove the sentence about using integer arithmetic; this seems to confuse people even if it is technically correct.
As usual, the edit needs to be done in a dozen places to remain consistent, I hope I got them all.	2021-03-30 11:34:26 +02:00
Ralf Jung	b5d71bfb0f	add definition of 'allocated object', and link it from relevant method docs	2021-03-27 19:26:10 +01:00
Ralf Jung	fb4f48e032	make unaligned_references future-incompat lint warn-by-default, and remove the safe_packed_borrows lint that it replaces	2021-03-27 16:59:37 +01:00
bors	f82664191d	Auto merge of #83053 - oli-obk:const_stab_version, r=m-ou-se Fix const stability `since` versions. fixes #82085 r? `@m-ou-se`	2021-03-21 16:21:39 +00:00
Albin Hedman	45988ee438	Constify mem::replace and ptr::replace	2021-03-15 20:45:43 +01:00
Albin Hedman	64e2248794	Constify mem::swap and ptr::swap[_nonoverlapping]	2021-03-15 20:45:22 +01:00
Oli Scherer	6f3635d87b	Fix const stability `since` versions.	2021-03-15 14:39:18 +00:00
Motoki Ikeda	71a784d763	Fix a typo in `swap_nonoverlapping_bytes`	2021-03-14 16:32:15 +09:00
Joshua Nelson	9a75f4fed1	Convert primitives to use intra-doc links	2021-02-25 20:31:53 -05:00
Albin Hedman	89c761058a	Constify ptr::write and the write[_unaligned] methods on *mut T Constify intrinsics::forget	2021-02-23 18:00:01 +01:00
Simon Sapin	cf000f0408	Pointer metadata: add tracking issue number	2021-02-15 14:27:51 +01:00
Simon Sapin	787f4de6ab	Use new pointer metadata API inside libcore instead of manual transmutes	2021-02-15 14:27:34 +01:00
Simon Sapin	937d580a25	Add `ptr::from_raw_parts`, `ptr::from_raw_parts_mut`, and `NonNull::from_raw_parts` The use of module-level functions instead of associated functions on `<const T>` or `<mut T>` follows the precedent of `ptr::slice_from_raw_parts` and `ptr::slice_from_raw_parts_mut`.	2021-02-15 14:27:31 +01:00
Simon Sapin	696b239f72	Add `ptr::Pointee` trait (for all types) and `ptr::metadata` function RFC: https://github.com/rust-lang/rfcs/pull/2580	2021-02-15 14:27:12 +01:00
bors	8e54a21139	Auto merge of #81238 - RalfJung:copy-intrinsics, r=m-ou-se directly expose copy and copy_nonoverlapping intrinsics This effectively un-does https://github.com/rust-lang/rust/pull/57997. That should help with `ptr::read` codegen in debug builds (and any other of these low-level functions that bottoms out at `copy`/`copy_nonoverlapping`), where the wrapper function will not get inlined. See the discussion in https://github.com/rust-lang/rust/pull/80290 and https://github.com/rust-lang/rust/issues/81163. Cc `@bjorn3` `@therealprof`	2021-02-13 20:30:07 +00:00
Ralf Jung	13ffa43bbb	rename raw_const/mut -> const/mut_addr_of, and stabilize them	2021-01-29 15:18:45 +01:00
Ralf Jung	18d12ad171	directly expose copy and copy_nonoverlapping intrinsics	2021-01-21 10:41:22 +01:00
Ralf Jung	712d065061	remove some outdated comments regarding debug assertions	2021-01-18 13:06:01 +01:00
Ralf Jung	a5b89a00cb	extend comment Co-authored-by: lcnr <bastian_kauschke@hotmail.de>	2021-01-02 16:58:38 +01:00
Ralf Jung	1862135351	implement ptr::write without dedicated intrinsic	2020-12-30 18:39:05 +01:00
Albin Hedman	0cea1c9206	Added reference to tracking issue #80377	2020-12-26 14:03:28 +01:00
Albin Hedman	7594d2a084	Constify ptr::read and ptr::read_unaligned	2020-12-26 02:25:08 +01:00
Dylan DPC	6cd02a85f1	Rollup merge of #77844 - RalfJung:zst-box, r=nikomatsakis clarify rules for ZST Boxes LLVM's rules around `getelementptr inbounds` with offset 0 are a bit annoying, and as a consequence we have no choice but say that a `Box<()>` pointing to previously allocated memory that has since been freed is UB. Clarify the docs to reflect this. This is based on conversations on the LLVM mailing list.
* Here's my initial mail: https://lists.llvm.org/pipermail/llvm-dev/2019-February/130452.html
* The first email of the March part of that thread: https://lists.llvm.org/pipermail/llvm-dev/2019-March/130831.html
* First email of the April part: https://lists.llvm.org/pipermail/llvm-dev/2019-April/131693.html
The conclusion for me at least was that `getelementptr inbounds` with offset 0 is not the identity function, but can sometimes return `poison` even when the input is a regular pointer -- specifically, it returns `poison` when this pointer points into something that LLVM "knows has been deallocated", i.e., a former LLVM-managed allocation. It is however the identity function on pointers obtained by casting integers.
Note that there [are formal proposals](https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf) for LLVM semantics where `getelementptr inbounds` with offset 0 isn't quite the identity function but never returns `poison` (it affects the provenance of the pointer but in a way that doesn't matter if this pointer is never used for memory accesses), and indeed this is likely necessary to consistently describe LLVM semantics. But with the informal LLVM LangRef that we have right now, and with LLVM devs insisting otherwise, it seems unwise to rely on this.	2020-11-21 19:44:07 +01:00
Ralf Jung	a7677f7714	reference NonNull::dangling	2020-11-20 11:09:49 +01:00
Camelid	fee4f8feb0	Improve wording of `core::ptr::drop_in_place` docs And two small intra-doc link conversions in `std::{f32, f64}`.	2020-10-29 20:09:29 -07:00
bors	69e68cf550	Auto merge of #75728 - nagisa:improve_align_offset_2, r=Mark-Simulacrum Optimise align_offset for stride=1 further `stride == 1` case can be computed more efficiently through `-p (mod a)`. That then translates to a nice and short sequence of LLVM instructions:
    %address = ptrtoint i8* %p to i64
    %negptr = sub i64 0, %address
    %offset = and i64 %negptr, %a_minus_one
And produces pretty much ideal code-gen when this function is used in isolation. Typical use of this function will, however, involve use of the result to offset a pointer, i.e.
    %aligned = getelementptr inbounds i8, i8* %p, i64 %offset
This still looks very good, but LLVM does not really translate that to what would be considered ideal machine code (on any target). For example that's the codegen we obtain for an unknown alignment:
    ; x86_64
    dec rsi
    mov rax, rdi
    neg rax
    and rax, rsi
    add rax, rdi
In particular negating a pointer is not something that’s going to be optimised for in the design of CISC architectures like x86_64. They are much better at offsetting pointers. And so we’d love to utilize this ability and produce code that's more like this:
    ; x86_64
    lea rax, [rsi + rdi - 1]
    neg rsi
    and rax, rsi
To achieve this we need to give LLVM an opportunity to apply its various peep-hole optimisations that it does during DAG selection. In particular, the `and` instruction appears to be a major inhibitor here. We cannot, sadly, get rid of this load-bearing operation, but we can reorder operations such that LLVM has more to work with around this instruction. One such ordering is proposed in #75579 and results in LLVM IR that looks broadly like this:
    ; using add enables `lea` and similar CISCisms
    %offset_ptr = add i64 %address, %a_minus_one
    %mask = sub i64 0, %a
    %masked = and i64 %offset_ptr, %mask
    ; can be folded with `gepi` that may follow
    %offset = sub i64 %masked, %address
…and generates the intended x86_64 machine code.
One might also wonder how the increased amount of code would impact a RISC target. Turns out not much:
    ; aarch64 previous    ; aarch64 new
    sub x8, x1, #1        add x8, x1, x0
    neg x9, x0            sub x8, x8, #1
    and x8, x9, x8        neg x9, x1
    add x0, x0, x8        and x0, x8, x9
(and similarly for ppc, sparc, mips, riscv, etc) The only target that seems to do worse is… wasm32. Onto actual measurements – the best way to evaluate snippets like these is to use llvm-mca. Much as the Aarch64 assembly would lead one to suspect, there isn’t any performance difference to be found. Both snippets execute in the same number of cycles for the CPUs I tried. On x86_64, we get a throughput improvement of >50%! Fixes #75579	2020-10-26 06:49:34 +00:00
Ralf Jung	defcd7ff47	stop relying on feature(untagged_unions) in stdlib	2020-10-16 11:33:35 +02:00
Ralf Jung	0f572a9810	explicitly talk about integer literals	2020-10-13 09:30:09 +02:00
Ralf Jung	c555aabc5b	clarify rules for ZST Boxes	2020-10-12 10:32:11 +02:00
Camelid	884a1b4b9b	Fix anchor links #safety -> self#safety	2020-09-09 13:42:57 -07:00
Camelid	d24026bb6d	Fix broken link `write` is ambiguous because there's also a macro called `write`. Also removed unnecessary and potentially confusing link to a function in its own docs.	2020-09-08 19:24:57 -07:00
Camelid	325acefee4	Use intra-doc links in `core::ptr` The only link that I did not change is a link to a function on the `pointer` primitive because intra-doc links for the `pointer` primitive don't work yet (see #63351).	2020-09-08 14:36:36 -07:00
Simonas Kazlauskas	4bfacffb90	Optimise align_offset for stride=1 further `stride == 1` case can be computed more efficiently through `-p (mod a)`. That then translates to a nice and short sequence of LLVM instructions:
    %address = ptrtoint i8* %p to i64
    %negptr = sub i64 0, %address
    %offset = and i64 %negptr, %a_minus_one
And produces pretty much ideal code-gen when this function is used in isolation. Typical use of this function will, however, involve use of the result to offset a pointer, i.e.
    %aligned = getelementptr inbounds i8, i8* %p, i64 %offset
This still looks very good, but LLVM does not really translate that to what would be considered ideal machine code (on any target). For example that's the codegen we obtain for an unknown alignment:
    ; x86_64
    dec rsi
    mov rax, rdi
    neg rax
    and rax, rsi
    add rax, rdi
In particular negating a pointer is not something that’s going to be optimised for in the design of CISC architectures like x86_64. They are much better at offsetting pointers. And so we’d love to utilize this ability and produce code that's more like this:
    ; x86_64
    lea rax, [rsi + rdi - 1]
    neg rsi
    and rax, rsi
To achieve this we need to give LLVM an opportunity to apply its various peep-hole optimisations that it does during DAG selection. In particular, the `and` instruction appears to be a major inhibitor here. We cannot, sadly, get rid of this load-bearing operation, but we can reorder operations such that LLVM has more to work with around this instruction. One such ordering is proposed in #75579 and results in LLVM IR that looks broadly like this:
    ; using add enables `lea` and similar CISCisms
    %offset_ptr = add i64 %address, %a_minus_one
    %mask = sub i64 0, %a
    %masked = and i64 %offset_ptr, %mask
    ; can be folded with `gepi` that may follow
    %offset = sub i64 %masked, %address
…and generates the intended x86_64 machine code. One might also wonder how the increased amount of code would impact a RISC target.
Turns out not much:
    ; aarch64 previous    ; aarch64 new
    sub x8, x1, #1        add x8, x1, x0
    neg x9, x0            sub x8, x8, #1
    and x8, x9, x8        neg x9, x1
    add x0, x0, x8        and x0, x8, x9
(and similarly for ppc, sparc, mips, riscv, etc) The only target that seems to do worse is… wasm32. Onto actual measurements – the best way to evaluate snippets like these is to use llvm-mca. Much as the Aarch64 assembly would lead one to suspect, there isn’t any performance difference to be found. Both snippets execute in the same number of cycles for the CPUs I tried. On x86_64, we get a throughput improvement of >50%, however!	2020-08-20 05:06:00 +03:00
Simonas Kazlauskas	5d22b18bf2	Improve codegen of align_offset when stride == 1 Previously checking for `pmoda == 0` would get LLVM to generate branchy code, when, for `stride = 1` the offset can be computed without such a branch by doing effectively a `-p % a`. For well-known (constant) alignments, with the new ordering of these conditionals, we end up generating 2 to 3 cheap instructions on x86_64:
    movq %rdi, %rax
    negl %eax
    andl $7, %eax
instead of 5+ as previously. For unknown alignments the new code also generates just 3 instructions:
    negq %rdi
    leaq -1(%rsi), %rax
    andq %rdi, %rax	2020-08-16 21:31:48 +03:00
Simonas Kazlauskas	e7271da69a	Improve `align_offset` at opt-level <= 1 At opt-level <= 1, the methods such as `wrapping_mul` are not being inlined, causing significant bloating and slowdowns of the implementation at these optimisation levels. With use of these intrinsics, the codegen of this function at -Copt_level=1 is the same as it is at -Copt_level=3.	2020-08-16 21:31:48 +03:00
mark	2c31b45ae8	mv std libs to library/	2020-07-27 19:51:13 -05:00

38 Commits
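
The two `align_offset` commits above (4bfacffb90 and 5d22b18bf2) describe two orderings of the same stride-1 arithmetic: the `-p (mod a)` form and the reordered add/mask/subtract form. The following is a minimal sketch of that arithmetic on plain `usize` addresses, assuming a power-of-two alignment. The function names `offset_by_negation` and `offset_by_round_up` are hypothetical and chosen for illustration; this is not the code in `core::ptr`.

```rust
/// `-p (mod a)` form: negate the address, then mask with `a - 1`.
/// Assumes `align` is a power of two (hypothetical helper, not libcore code).
fn offset_by_negation(addr: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    addr.wrapping_neg() & (align - 1)
}

/// Reordered form: round the address up to the next multiple of `align`,
/// then subtract the original address.
/// Assumes `align` is a power of two (hypothetical helper, not libcore code).
fn offset_by_round_up(addr: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    // For a power of two, `0 - align` equals `!(align - 1)`.
    let mask = align.wrapping_neg();
    (addr.wrapping_add(align - 1) & mask).wrapping_sub(addr)
}

fn main() {
    for &align in [1usize, 2, 4, 8, 16, 4096].iter() {
        for addr in 0..10_000usize {
            let a = offset_by_negation(addr, align);
            let b = offset_by_round_up(addr, align);
            // Both orderings agree, and adding the offset yields an aligned address.
            assert_eq!(a, b);
            assert_eq!((addr + a) % align, 0);
        }
    }
    println!("both formulations agree");
}
```

Wrapping arithmetic is used so that the two forms also agree for addresses near `usize::MAX`, where the intermediate sum overflows.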