rust/compiler/rustc_codegen_gcc
bors 96477c55bc Auto merge of #131341 - taiki-e:ppc-clobber-abi, r=bzEq,workingjubilee
Support clobber_abi and vector registers (clobber-only) in PowerPC inline assembly

This supports `clobber_abi` which is one of the requirements of stabilization mentioned in #93335.

This basically does a similar thing I did in https://github.com/rust-lang/rust/pull/130630 to implement `clobber_abi` for s390x, but for powerpc/powerpc64/powerpc64le.
- This also supports vector registers (as `vreg`) as clobber-only, which need to support clobbering of them to implement `clobber_abi`.
- `vreg` should be able to accept `#[repr(simd)]` types as input/output if the unstable `altivec` target feature is enabled, but `core::arch::{powerpc,powerpc64}` vector types, `#[repr(simd)]`, and `core::simd` are all unstable, so the fact that this is currently a clobber-only should not be considered a blocker of clobber_abi implementation or stabilization. So I have not implemented it in this PR.
  - See https://github.com/rust-lang/rust/pull/131551 (which is based on this PR) for a PR to implement this.
  - (I'm not sticking to whether that PR should be a separate PR or part of this PR, so I can merge that PR into this PR if needed.)

Refs:
- PPC32 SysV: Section "Function Calling Sequence" in [System V Application Binary Interface PowerPC Processor Supplement](https://refspecs.linuxfoundation.org/elf/elfspec_ppc.pdf)
- PPC64 ELFv1: Section 3.2 "Function Calling Sequence" in [64-bit PowerPC ELF Application Binary Interface Supplement](https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html#FUNC-CALL)
- PPC64 ELFv2: Section 2.2 "Function Calling Sequence" in [64-Bit ELF V2 ABI Specification](https://openpowerfoundation.org/specifications/64bitelfabi/)
- AIX: [Register usage and conventions](https://www.ibm.com/docs/en/aix/7.3?topic=overview-register-usage-conventions), [Special registers in the PowerPC®](https://www.ibm.com/docs/en/aix/7.3?topic=overview-special-registers-in-powerpc), [AIX vector programming](https://www.ibm.com/docs/en/aix/7.3?topic=concepts-aix-vector-programming)
- Register definition in LLVM: https://github.com/llvm/llvm-project/blob/llvmorg-19.1.0/llvm/lib/Target/PowerPC/PPCRegisterInfo.td#L189

If I understand the above four ABI documentations correctly, except for the PPC32 SysV's VR (Vector Registers) and 32-bit AIX (currently not supported by rustc)'s r13, there does not appear to be important differences in terms of implementing `clobber_abi`:
- The above four ABIs are consistent about FPR (0-13: volatile, 14-31: nonvolatile), CR (0-1,5-7: volatile, 2-4: nonvolatile), XER (volatile), and CTR (volatile).
- As for GPR, only the registers we are treating as reserved are slightly different
  - r0, r3-r12 are volatile
  - r1(sp, reserved), r14-31 are nonvolatile
  - r2(reserved) is TOC pointer in PPC64 ELF/AIX, system-reserved register in PPC32 SysV (AFAIK used as thread pointer in Linux/BSDs)
  - r13(reserved for non-32-bit-AIX) is thread pointer in PPC64 ELF, small data area pointer register in PPC32 SysV, "reserved under 64-bit environment; not restored across system calls[^r13]" in AIX)
- As for FPSCR, volatile in PPC64 ELFv1/AIX, some fields are volatile only in certain situations (rest are volatile) in PPC32 SysV/PPC64 ELFv2.
- As for VR (Vector Registers), it is not mentioned in PPC32 SysV, v0-v19 are volatile in both in PPC64 ELF/AIX, v20-v31 are nonvolatile in PPC64 ELF, reserved or nonvolatile depending on the ABI ([vec-extabi vs vec-default in LLVM](https://reviews.llvm.org/D89684), we are [using vec-extabi](https://github.com/rust-lang/rust/pull/131341#discussion_r1797693299)) in AIX:
  > When the default Vector enabled mode is used, these registers are reserved and must not be used.
  > In the extended ABI vector enabled mode, these registers are nonvolatile and their values are preserved across function calls

  I left [FIXME comment about PPC32 SysV](https://github.com/rust-lang/rust/pull/131341#discussion_r1790496095) and added ABI check for AIX.
- As for VRSAVE, it is not mentioned in PPC32 SysV, nonvolatile in PPC64 ELFv1, reserved in PPC64 ELFv2/AIX
- As for VSCR, it is not mentioned in PPC32 SysV/PPC64 ELFv1, some fields are volatile only in certain situations (rest are volatile) in PPC64 ELFv2, volatile in AIX

We are currently treating r1-r2, r13 (non-32-bit-AIX), r29-r31, LR, CTR, and VRSAVE as reserved.
We are currently not processing anything about FPSCR and VSCR, but I feel those are things that should be processed by `preserves_flags` rather than `clobber_abi` if we need to do something about them. (However, PPCRegisterInfo.td in LLVM does not seem to define anything about them.)

Replaces #111335 and #124279

cc `@ecnelises` `@bzEq` `@lu-zero`

r? `@Amanieu`

`@rustbot` label +O-PowerPC +A-inline-assembly

[^r13]: callee-saved, according to [LLVM](6a6af0246b/llvm/lib/Target/PowerPC/PPCCallingConv.td (L322)) and [GCC](a9173a50e7/gcc/config/rs6000/rs6000.h (L859)).
2024-11-05 03:13:47 +00:00
..
.github/workflows Merge commit '3187d32079b817522cc17413ec9185b130daf693' into subtree-update 2024-09-27 22:00:17 +02:00
build_system Merge commit '3187d32079b817522cc17413ec9185b130daf693' into subtree-update 2024-09-27 22:00:17 +02:00
doc Merge commit '3187d32079b817522cc17413ec9185b130daf693' into subtree-update 2024-09-27 22:00:17 +02:00
example Rename Receiver -> LegacyReceiver 2024-10-22 12:55:16 +00:00
patches Merge commit '3187d32079b817522cc17413ec9185b130daf693' into subtree-update 2024-09-27 22:00:17 +02:00
src Auto merge of #131341 - taiki-e:ppc-clobber-abi, r=bzEq,workingjubilee 2024-11-05 03:13:47 +00:00
target_specs Merge commit '98ed962c7d3eebe12c97588e61245273d265e72f' into master 2024-07-10 12:44:23 +02:00
tests Merge commit '3187d32079b817522cc17413ec9185b130daf693' into subtree-update 2024-09-27 22:00:17 +02:00
tools Merge commit '3187d32079b817522cc17413ec9185b130daf693' into subtree-update 2024-09-27 22:00:17 +02:00
.gitignore Merge commit '98ed962c7d3eebe12c97588e61245273d265e72f' into master 2024-07-10 12:44:23 +02:00
.ignore Merge commit 'b385428e3ddf330805241e7758e773f933357c4b' into subtree-update_cg_gcc_2024-03-05 2024-03-05 19:58:36 +01:00
.rustfmt.toml Align cg_gcc rustfmt.toml with rust's 2024-07-17 20:17:44 +02:00
Cargo.lock Merge commit '3187d32079b817522cc17413ec9185b130daf693' into subtree-update 2024-09-27 22:00:17 +02:00
Cargo.toml Merge commit '3187d32079b817522cc17413ec9185b130daf693' into subtree-update 2024-09-27 22:00:17 +02:00
config.example.toml Merge commit '98ed962c7d3eebe12c97588e61245273d265e72f' into master 2024-07-10 12:44:23 +02:00
libgccjit.version Merge commit '3187d32079b817522cc17413ec9185b130daf693' into subtree-update 2024-09-27 22:00:17 +02:00
LICENSE-APACHE Add 'compiler/rustc_codegen_gcc/' from commit 'afae271d5d3719eeb92c18bc004bb6d1965a5f3f' 2021-08-12 21:53:49 -04:00
LICENSE-MIT Add 'compiler/rustc_codegen_gcc/' from commit 'afae271d5d3719eeb92c18bc004bb6d1965a5f3f' 2021-08-12 21:53:49 -04:00
messages.ftl codegen_ssa: consolidate tied feature checking 2024-09-24 15:48:49 +01:00
Readme.md Merge commit '98ed962c7d3eebe12c97588e61245273d265e72f' into master 2024-07-10 12:44:23 +02:00
rust-toolchain Merge commit '3187d32079b817522cc17413ec9185b130daf693' into subtree-update 2024-09-27 22:00:17 +02:00
y.sh Merge commit 'b385428e3ddf330805241e7758e773f933357c4b' into subtree-update_cg_gcc_2024-03-05 2024-03-05 19:58:36 +01:00

WIP libgccjit codegen backend for rust

Chat on IRC Chat on Matrix

This is a GCC codegen for rustc, which means it can be loaded by the existing rustc frontend, but benefits from GCC: more architectures are supported and GCC's optimizations are used.

Despite its name, libgccjit can be used for ahead-of-time compilation, as is used here.

Motivation

The primary goal of this project is to be able to compile Rust code on platforms unsupported by LLVM. A secondary goal is to check if using the gcc backend will provide any run-time speed improvement for the programs compiled using rustc.

Dependencies

rustup: Follow the instructions on the official website

DejaGnu: Consider to install DejaGnu which is necessary for running the libgccjit test suite. website

Building

This requires a patched libgccjit in order to work. You need to use my fork of gcc which already includes these patches.

$ cp config.example.toml config.toml

If don't need to test GCC patches you wrote in our GCC fork, then the default configuration should be all you need. You can update the rustc_codegen_gcc without worrying about GCC.

Building with your own GCC version

If you wrote a patch for GCC and want to test it without this backend, you will need to do a few more things.

To build it (most of these instructions come from here, so don't hesitate to take a look there if you encounter an issue):

$ git clone https://github.com/antoyo/gcc
$ sudo apt install flex libmpfr-dev libgmp-dev libmpc3 libmpc-dev
$ mkdir gcc-build gcc-install
$ cd gcc-build
$ ../gcc/configure \
    --enable-host-shared \
    --enable-languages=jit \
    --enable-checking=release \ # it enables extra checks which allow to find bugs
    --disable-bootstrap \
    --disable-multilib \
    --prefix=$(pwd)/../gcc-install
$ make -j4 # You can replace `4` with another number depending on how many cores you have.

If you want to run libgccjit tests, you will need to also enable the C++ language in the configure:

--enable-languages=jit,c++

Then to run libgccjit tests:

$ cd gcc # from the `gcc-build` folder
$ make check-jit
# To run one specific test:
$ make check-jit RUNTESTFLAGS="-v -v -v jit.exp=jit.dg/test-asm.cc"

Put the path to your custom build of libgccjit in the file config.toml.

You now need to set the gcc-path value in config.toml with the result of this command:

$ dirname $(readlink -f `find . -name libgccjit.so`)

and to comment the download-gccjit setting:

gcc-path = "[MY PATH]"
# download-gccjit = true

Then you can run commands like this:

$ ./y.sh prepare # download and patch sysroot src and install hyperfine for benchmarking
$ ./y.sh build --sysroot --release

To run the tests:

$ ./y.sh test --release

Usage

You have to run these commands, in the corresponding order:

$ ./y.sh prepare
$ ./y.sh build --sysroot

To check if all is working correctly, run:

$ ./y.sh cargo build --manifest-path tests/hello-world/Cargo.toml

Cargo

$ CHANNEL="release" $CG_GCCJIT_DIR/y.sh cargo run

If you compiled cg_gccjit in debug mode (aka you didn't pass --release to ./y.sh test) you should use CHANNEL="debug" instead or omit CHANNEL="release" completely.

LTO

To use LTO, you need to set the variable EMBED_LTO_BITCODE=1 in addition to setting lto = "fat" in the Cargo.toml.

Failing to set EMBED_LTO_BITCODE will give you the following error:

error: failed to copy bitcode to object file: No such file or directory (os error 2)

Rustc

If you want to run rustc directly, you can do so with:

$ ./y.sh rustc my_crate.rs

You can do the same manually (although we don't recommend it):

$ LIBRARY_PATH="[gcc-path value]" LD_LIBRARY_PATH="[gcc-path value]" rustc +$(cat $CG_GCCJIT_DIR/rust-toolchain | grep 'channel' | cut -d '=' -f 2 | sed 's/"//g' | sed 's/ //g') -Cpanic=abort -Zcodegen-backend=$CG_GCCJIT_DIR/target/release/librustc_codegen_gcc.so --sysroot $CG_GCCJIT_DIR/build_sysroot/sysroot my_crate.rs

Env vars

  • CG_GCCJIT_DUMP_ALL_MODULES: Enables dumping of all compilation modules. When set to "1", a dump is created for each module during compilation and stored in /tmp/reproducers/.
  • CG_GCCJIT_DUMP_MODULE: Enables dumping of a specific module. When set with the module name, e.g., CG_GCCJIT_DUMP_MODULE=module_name, a dump of that specific module is created in /tmp/reproducers/.
  • CG_RUSTFLAGS: Send additional flags to rustc. Can be used to build the sysroot without unwinding by setting CG_RUSTFLAGS=-Cpanic=abort.
  • CG_GCCJIT_DUMP_TO_FILE: Dump a C-like representation to /tmp/gccjit_dumps and enable debug info in order to debug this C-like representation.
  • CG_GCCJIT_DUMP_RTL: Dumps RTL (Register Transfer Language) for virtual registers.
  • CG_GCCJIT_DUMP_RTL_ALL: Dumps all RTL passes.
  • CG_GCCJIT_DUMP_TREE_ALL: Dumps all tree (GIMPLE) passes.
  • CG_GCCJIT_DUMP_IPA_ALL: Dumps all Interprocedural Analysis (IPA) passes.
  • CG_GCCJIT_DUMP_CODE: Dumps the final generated code.
  • CG_GCCJIT_DUMP_GIMPLE: Dumps the initial GIMPLE representation.
  • CG_GCCJIT_DUMP_EVERYTHING: Enables dumping of all intermediate representations and passes.
  • CG_GCCJIT_KEEP_INTERMEDIATES: Keeps intermediate files generated during the compilation process.
  • CG_GCCJIT_VERBOSE: Enables verbose output from the GCC driver.

Extra documentation

More specific documentation is available in the doc folder:

Licensing

While this crate is licensed under a dual Apache/MIT license, it links to libgccjit which is under the GPLv3+ and thus, the resulting toolchain (rustc + GCC codegen) will need to be released under the GPL license.

However, programs compiled with rustc_codegen_gcc do not need to be released under a GPL license.