rustc_target: `TyAndLayout::field` should never error.
This refactor (making `TyAndLayout::field` return `TyAndLayout` without any `Result` around it) is based on a simple observation, regarding `TyAndLayout::field`:
If `cx.layout_of(ty)` succeeds (for some `cx` and `ty`), then `.field(cx, i)` on the resulting `TyAndLayout` should *always* succeed in computing `cx.layout_of(field_ty)` (where `field_ty` is the type of the `i`th field of `ty`).
The reason for this is that no matter which field is chosen, `cx.layout_of(field_ty)` *will have already been computed*, as part of computing `cx.layout_of(ty)`, as we cannot determine the layout of *any* type without considering the layouts of *all* of its fields.
And so it should be fine to turn any errors into ICEs, since they likely indicate a `cx` mismatch, or some other edge case that is due to a compiler bug (as opposed to ever being an user-facing error).
<hr/>
Each commit should probably be reviewed separately, though note that there's some `where` clauses (in `rustc_target::abi::call::*`) that change in most commits.
cc `@nagisa` `@oli-obk`
S390x inline asm
This adds register definitions and constraint codes for the s390x general and floating point registers necessary for fixing #85931; as well as a few tests.
Further testing is needed, but I am a little unsure of what specific tests should be added to `src/test/assembly/asm/s390x.rs` to address this.
lazily "compute" anon const default substs
Continuing the work of #83086, this implements the discussed solution for the [unused substs problem](https://github.com/rust-lang/project-const-generics/blob/master/design-docs/anon-const-substs.md#unused-substs). As of now, anonymous constants inherit all of their parents generics, even if they do not use them, e.g. in `fn foo<T, const N: usize>() -> [T; N + 1]`, the array length has `T` as a generic parameter even though it doesn't use it. These *unused substs* cause some backwards incompatible, and imo incorrect behavior, e.g. #78369.
---
We do not actually filter any generic parameters here and the `default_anon_const_substs` query still a dummy which only checks that
- we now prevent the previously existing query cycles and are able to call `predicates_of(parent)` when computing the substs of anonymous constants
- the default anon consts substs only include the typeflags we assume it does.
Implementing that filtering will be left as future work.
---
The idea of this PR is to delay the creation of the anon const substs until after we've computed `predicates_of` for the parent of the anon const. As the predicates of the parent can however contain the anon const we still have to create a `ty::Const` for it.
We do this by changing the substs field of `ty::Unevaluated` to an option and modifying accesses to instead call the method `unevaluated.substs(tcx)` which returns the substs as before. If the substs - now `substs_` - of `ty::Unevaluated` are `None`, it means that the anon const currently has its default substs, i.e. the substs it has when first constructed, which are the generic parameters it has available. To be able to call `unevaluated.substs(tcx)` in a `TypeVisitor`, we add the non-defaulted method `fn tcx_for_anon_const_substs(&self) -> Option<TyCtxt<'tcx>>`. In case `tcx_for_anon_const_substs` returns `None`, unknown anon const default substs are skipped entirely.
Even when `substs_` is `None` we still have to treat the constant as if it has its default substs. To do this, `TypeFlags` are modified so that it is clear whether they can still change when *exposing* any anon const default substs. A new flag, `HAS_UNKNOWN_DEFAULT_CONST_SUBSTS`, is added in case some default flags are missing.
The rest of this PR are some smaller changes to either not cause cycles by trying to access the default anon const substs too early or to be able to access the `tcx` in previously unused locations.
cc `@rust-lang/project-const-generics`
r? `@nikomatsakis`
Use custom wrap-around type instead of RangeInclusive
Two reasons:
1. More memory is allocated than necessary for `valid_range` in `Scalar`. The range is not used as an iterator and `exhausted` is never used.
2. `contains`, `count` etc. methods in `RangeInclusive` are doing very unhelpful(and dangerous!) things when used as a wrap-around range. - In general this PR wants to limit potentially confusing methods, that have a low probability of working.
Doing a local perf run, every metric shows improvement except for instructions.
Max-rss seem to have a very consistent improvement.
Sorry - newbie here, probably doing something wrong.
Stop emitting the `dso_local` LLVM attribute for external symbols under the static relocation model on macOS.
This matches Clang's behavior:
973cb2c326/clang/lib/CodeGen/CodeGenModule.cpp (L1038-L1040)
Even if `dso_local` were properly supported in this way on macOS, it seems
incorrect to add this annotation as liberally as we did. The `dso_local`
annotation is for symbols that ultimately end up in the same linkage unit, but
we were adding this annotation even for `static` values inside `extern` blocks
marked with `#[link(type="framework")]`, which should be considered dynamically
linked. Note that Clang likewise avoids emitting `dso_local` for `dllimport`
symbols:
973cb2c326/clang/lib/CodeGen/CodeGenModule.cpp (L1005-L1007)
This issue caused breakage in the `ring` crate, which links to a symbol defined
in `Security.framework` that ultimately resolves to address `0x0`:
b94d61e044/src/rand.rs (L390)
For this symbol, the use of `dso_local` causes LLVM to emit a relocation of
type `X86_64_RELOC_SIGNED`, which is a 32-bit signed PC-relative offset. If the
binary is large enough, `0x0` might be out of range, and the link will fail.
Avoiding `dso_local` causes LLVM to use the GOT instead, emitting a relocation
of type `X86_64_RELOC_GOT_LOAD`, which will properly handle the large offset
and cause the link to succeed.
As a side note, the static relocation model is effectively deprecated for
security reasons on macOS, as it prohibits PIE. It's also completely
unsupported on Apple Silicon, so I don't think it's worth going to the effort
of properly supporting this model on that platform.
Upgrade to LLVM 13
Work in progress update to LLVM 13. Main changes:
* InlineAsm diagnostics reported using SrcMgr diagnostic kind are now handled. Previously these used a separate diag handler.
* Codegen tests are updated for additional attributes.
* Some data layouts have changed.
* Switch `#[used]` attribute from `llvm.used` to `llvm.compiler.used` to avoid SHF_GNU_RETAIN flag introduced in https://reviews.llvm.org/D97448, which appears to trigger a bug in older versions of gold.
* Set `LLVM_INCLUDE_TESTS=OFF` to avoid Python 3.6 requirement.
Upstream issues:
* ~~https://bugs.llvm.org/show_bug.cgi?id=51210 (InlineAsm diagnostic reporting for module asm)~~ Fixed by 1558bb80c0.
* ~~https://bugs.llvm.org/show_bug.cgi?id=51476 (Miscompile on AArch64 due to incorrect comparison elimination)~~ Fixed by 81b106584f.
* https://bugs.llvm.org/show_bug.cgi?id=51207 (Can't set custom section flags anymore). Problematic change reverted in our fork, https://reviews.llvm.org/D107216 posted for upstream revert.
* https://bugs.llvm.org/show_bug.cgi?id=51211 (Regression in codegen for #83623). This is an optimization regression that we may likely have to eat for this release. The fix for #83623 was based on an incorrect premise, and this needs to be properly addressed in the MergeICmps pass.
The [compile-time impact](https://perf.rust-lang.org/compare.html?start=ef9549b6c0efb7525c9b012148689c8d070f9bc0&end=0983094463497eec22d550dad25576a894687002) is mixed, but quite positive as LLVM upgrades go.
The LLVM 13 final release is scheduled for Sep 21st. The current nightly is scheduled for stable release on Oct 21st.
r? `@ghost`
This matches Clang's behavior:
973cb2c326/clang/lib/CodeGen/CodeGenModule.cpp (L1038-L1040)
Even if `dso_local` were properly supported in this way on macOS, it seems
incorrect to add this annotation as liberally as we did. The `dso_local`
annotation is for symbols that ultimately end up in the same linkage unit, but
we were adding this annotation even for `static` values inside `extern` blocks
marked with `#[link(type="framework")]`, which should be considered dynamically
linked. Note that Clang likewise avoids emitting `dso_local` for `dllimport`
symbols:
973cb2c326/clang/lib/CodeGen/CodeGenModule.cpp (L1005-L1007)
This issue caused breakage in the `ring` crate, which links to a symbol defined
in `Security.framework` that ultimately resolves to address `0x0`:
b94d61e044/src/rand.rs (L390)
For this symbol, the use of `dso_local` causes LLVM to emit a relocation of
type `X86_64_RELOC_SIGNED`, which is a 32-bit signed PC-relative offset. If the
binary is large enough, `0x0` might be out of range, and the link will fail.
Avoiding `dso_local` causes LLVM to use the GOT instead, emitting a relocation
of type `X86_64_RELOC_GOT_LOAD`, which will properly handle the large offset
and cause the link to succeed.
As a side note, the static relocation model is effectively deprecated for
security reasons on macOS, as it prohibits PIE. It's also completely
unsupported on Apple Silicon, so I don't think it's worth going to the effort
of properly supporting this model on that platform.
The TargetMachine may be referencing data in the context. In
particular, at least the GlobalISel instruction selector stored
in the TM may reference a TrackedMDNode DebugLoc that destruction
of the TargetMachine will try to untrack.
The #[used] attribute explicitly only requires symbols to be
retained in object files, but allows the linker to drop them
if dead. This corresponds to llvm.compiler.used semantics.
The motivation to change this *now* is that https://reviews.llvm.org/D97448
starts emitting #[used] symbols into unique sections with
SHF_GNU_RETAIN flag. This triggers a bug in some version of gold,
resulting in the ARGV_INIT_ARRAY symbol part of the .init_array
section to be incorrectly placed.
Fixes#85019
A `SourceFile` created during compilation may have a relative
path (e.g. if rustc itself is invoked with a relative path).
When we write out crate metadata, we convert all relative paths
to absolute paths using the current working direction.
However, the working directory is not included in the crate hash.
This means that the crate metadata can change while the crate
hash remains the same. Among other problems, this can cause a
fingerprint mismatch ICE, since incremental compilation uses
the crate metadata hash to determine if a foreign query is green.
This commit moves the field holding the working directory from
`Session` to `Options`, including it as part of the crate hash.
Add support for clobber_abi to asm!
This PR adds the `clobber_abi` feature that was proposed in #81092.
Fixes#81092
cc `@rust-lang/wg-inline-asm`
r? `@nagisa`
Name the captured upvars for closures/generators in debuginfo
Previously, debuggers print closures as something like
```
y::main::closure-0 (0x7fffffffdd34)
```
The pointer actually references to an upvar. It is not very obvious, especially for beginners.
It's because upvars don't have names before, as they are packed into a tuple. This PR names the upvars, so we can expect to see something like
```
y::main::closure-0 {_captured_ref__b: 0x[...]}
```
r? `@tmandry`
Discussed at https://github.com/rust-lang/rust/pull/84752#issuecomment-831639489 .
Implement `black_box` using intrinsic
Introduce `black_box` intrinsic, as suggested in https://github.com/rust-lang/rust/pull/87590#discussion_r680468700.
This is still codegenned as empty inline assembly for LLVM. For MIR interpretation and cranelift it's treated as identity.
cc `@Amanieu` as this is related to inline assembly
cc `@bjorn3` for rustc_codegen_cranelift changes
cc `@RalfJung` as this affects MIRI
r? `@nagisa` I suppose
The new implementation allows some `memcpy`s to be optimized away,
so the uninit value in ui/sanitize/memory.rs is constructed directly
onto the return place. Therefore the sanitizer now says that the
value is allocated by `main` rather than `random`.
LLVM codegen: Don't emit zero-sized padding for fields
Currently padding is emitted before fields of a struct and at the end of the struct regardless of the ABI. Even if no padding is required zero-sized padding fields are emitted. This is not useful and - more importantly - it make it impossible to generate the exact vector types that LLVM expects for certain ARM SIMD intrinsics. This change should unblock the implementation of many ARM intrinsics using the `unadjusted` ABI, see https://github.com/rust-lang/stdarch/issues/1143#issuecomment-827404092.
This is a proof of concept only because the field lookup now takes O(number of fields) time compared to O(1) before since it recalculates the mapping at every lookup. I would like to find out how big the performance impact actually is before implementing caching or restricting this behavior to the `unadjusted` ABI.
cc `@SparrowLii` `@bjorn3`
([Discussion on internals](https://internals.rust-lang.org/t/feature-request-add-a-way-in-rustc-for-generating-struct-type-llvm-ir-without-paddings/15007))
The indexes into the VaListImpl struct used on aarch64 ABI (not macos/ios) are hard-coded which is brittle so we replace them with the usual lookup.
The varargs ffi is tested in ui/abi/variadic-ffi.rs on aarch64 Linux.
Rather than relying on `getPointerElementType()` from LLVM function
pointers, we now pass the function type explicitly when building `call`
or `invoke` instructions.