Fix handling of wasm import modules and names
The WebAssembly targets of rustc have weird issues around name mangling
and importing the same name from different modules. This all largely stems
from the fact that we're using literal symbol names in LLVM IR to
represent what a function is called when it's imported, and we're not
using the wasm-specific `wasm-import-name` attribute. This in turn leads
to two issues:
* If, in the same codegen unit, the same FFI symbol is referenced twice
then rustc, when translating to LLVM IR, will only reference one
symbol from the first wasm module referenced.
* There's also a bug in LLD [1] where even if two codegen units
reference different modules, having the same symbol names means that
LLD coalesces the symbols and only refers to one wasm module.
Put another way, all our imported wasm symbols from the environment are
keyed off their LLVM IR symbol name, which has lots of collisions today.
This commit fixes the issue by implementing two changes:
1. All wasm symbols with `#[link(wasm_import_module = "...")]` are
mangled by default in LLVM IR. This means they're all given unique names.
2. Symbols then use the `wasm-import-name` attribute to ensure that the
WebAssembly file uses the correct import name.
When put together this should ensure we don't trip over the LLD bug [1]
and we also codegen IR correctly always referencing the right symbols
with the right import module/name pairs.
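As a rough illustration of the collision described above, here is a toy model in Rust. It is not rustc's or LLD's actual code: the `Import` type, both function names, and the `module::name` stand-in for a mangled symbol are hypothetical and exist only for this sketch. It contrasts keying imports by their literal symbol name (where the first module wins) with giving each import a unique key while carrying the import name separately.

```rust
use std::collections::HashMap;

/// A wasm import as declared in Rust source: `module` models the value of
/// `#[link(wasm_import_module = "...")]`, `name` the FFI symbol name.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Import {
    pub module: &'static str,
    pub name: &'static str,
}

/// Models the old behavior: imports are keyed only by their literal symbol
/// name, so a second import with the same name is dropped (first one wins).
pub fn key_by_symbol_name(imports: &[Import]) -> HashMap<String, Import> {
    let mut table = HashMap::new();
    for import in imports {
        table
            .entry(import.name.to_string())
            .or_insert_with(|| import.clone());
    }
    table
}

/// Models the new behavior: every import gets a unique key (a placeholder
/// for the mangled LLVM IR name), while the real import name stays in the
/// `Import` value, much like the `wasm-import-name` attribute carries it.
pub fn key_by_unique_symbol(imports: &[Import]) -> HashMap<String, Import> {
    imports
        .iter()
        .map(|import| {
            // Hypothetical mangling scheme, chosen only to be unique here.
            let key = format!("{}::{}", import.module, import.name);
            (key, import.clone())
        })
        .collect()
}
```

With two imports named `foo` from modules `a` and `b`, `key_by_symbol_name` keeps only the one from `a`, while `key_by_unique_symbol` keeps both and each entry still reports `foo` as its import name.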
Closes #50021
Closes #56309
Closes #63562
[1]: https://bugs.llvm.org/show_bug.cgi?id=44316
This commit builds on #65501 to continue simplifying the build system and
compiler now that we no longer have multiple LLVM backends to ship by
default. Here this switches the compiler back to what it once was long
long ago, which is linking LLVM directly to the compiler rather than
dynamically loading it at runtime. The `codegen-backends` directory of
the sysroot no longer exists and all relevant support in the build
system is removed. Note that `rustc` still supports a dynamically loaded
codegen backend as it did previously, it just no longer supports
dynamically loaded codegen backends in its own sysroot.
Additionally as part of this the `librustc_codegen_llvm` crate now once
again explicitly depends on all of its crates instead of implicitly
loading them through the sysroot. This involved filling out its
`Cargo.toml` and deleting all the now-unnecessary `extern crate`
annotations in the header of the crate. (This in turn required adding a
number of imports for macro names too.)
The end results of this change are:
* Rustbuild's build process for the compiler is simpler, as all the "oh don't forget
the codegen backend" checks can be easily removed.
* Building `rustc_codegen_llvm` is much simpler since it's simply
another compiler crate.
* Managing the dependencies of `rustc_codegen_llvm` is much simpler since
it's "just another `Cargo.toml` to edit"
* The build process should be a smidge faster because there's more
parallelism in the main rustc build step rather than splitting
`librustc_codegen_llvm` out to its own step.
* The compiler is expected to be slightly faster by default because the
codegen backend does not need to be dynamically loaded.
* Disabling LLVM as part of rustbuild is still supported, supporting
multiple codegen backends is still supported, and dynamic loading of a
codegen backend is still supported.
Always mark rust and rust-call ABIs as unwind
PR #63909 identified a bug that had been injected by PR #55982. As discussed on https://github.com/rust-lang/rust/issues/64655#issuecomment-537517428 , we started marking extern items as nounwind, *even* extern items that said they were using "Rust" or "rust-call" ABI.
This is a more targeted variant of PR #63909 that fixes the above bug.
Fix #64655
----
I personally suspect we will want PR #63909 to land in the long term.
But:
* it is not certain that PR #63909 *will* land,
* more importantly, PR #63909 almost certainly will not be backported to beta/stable.
The identified bug was more severe than I think anyone realized (apart from perhaps @gnzlbg, as noted [here](https://github.com/rust-lang/rust/pull/63909#issuecomment-524818838)).
Thus, I was motivated to write this PR, which fixes *just* the issue with extern rust/rust-call functions, and deliberately avoids injecting further deviation from current behavior (you can see further notes on this in the comments of the code added here).
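The rule this PR targets can be sketched as a small decision function. This is a simplified model, not the code added to the compiler: the `Abi` enum and `extern_item_may_unwind` are hypothetical names, and the real logic takes more inputs than the ABI alone.

```rust
/// ABIs an extern item can declare. Only the variants needed for this
/// illustration are listed; this is not rustc's real `Abi` enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Abi {
    Rust,
    RustCall,
    C,
}

/// Hypothetical model of the decision for an extern item: returns `true`
/// if the item must be treated as one that may unwind.
pub fn extern_item_may_unwind(abi: Abi) -> bool {
    match abi {
        // The targeted fix: "Rust" and "rust-call" extern items are
        // always treated as able to unwind.
        Abi::Rust | Abi::RustCall => true,
        // Assumption for this sketch: other extern items keep the existing
        // nounwind marking, so behavior does not deviate any further.
        Abi::C => false,
    }
}
```

Under this model, `extern_item_may_unwind(Abi::Rust)` and `extern_item_may_unwind(Abi::RustCall)` return `true`, while `extern_item_may_unwind(Abi::C)` returns `false`.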
When thread sanitizer instrumentation is enabled during compilation of
the stack probe function, the function is miscompiled and triggers a
segmentation fault at runtime. Disable stack probes when tsan is
enabled.
This flag inserts a call to the `mcount` function at the beginning of every function
after inline processing, so that tracing tools like uftrace [1] (or ftrace for
Linux kernel modules) have a chance to examine function calls.
It is similar to the `-pg` flag provided by gcc or clang, but without
generating a `__gmon_start__` function for executables. If a program
runs without being traced, no `gmon.out` will be written to disk.
Under the hood, it simply adds `"instrument-function-entry-inlined"="mcount"`
attribute to every function. The `post-inline-ee-instrument` LLVM pass does
the actual job.
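Conceptually, an instrumented program behaves as if each function that survives inlining began with a call to the hook. The sketch below imitates that by hand in plain Rust. It is only a model: `mcount_stand_in` and `entries_recorded` are hypothetical helpers, the calls are written manually rather than inserted by the LLVM pass, and a real `mcount` is supplied by the tracing tool rather than by the program.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts function entries; stands in for whatever a tracer would record.
static ENTRY_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the `mcount` hook: records one function entry.
fn mcount_stand_in() {
    ENTRY_COUNT.fetch_add(1, Ordering::Relaxed);
}

/// A leaf function with the hook call written by hand at its entry.
pub fn square(x: u64) -> u64 {
    mcount_stand_in();
    x * x
}

/// A caller that is itself instrumented and calls `square` once per term.
pub fn sum_of_squares(n: u64) -> u64 {
    mcount_stand_in();
    (1..=n).map(square).sum()
}

/// Number of function entries recorded so far.
pub fn entries_recorded() -> usize {
    ENTRY_COUNT.load(Ordering::Relaxed)
}
```

Calling `sum_of_squares(3)` records four entries in this model: one for `sum_of_squares` and three for `square`. With the real flag, a function that has been inlined into its caller would not get its own entry, since the call is inserted after inline processing.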
[1]: https://github.com/namhyung/uftrace