rust/compiler/rustc_codegen_llvm/src/base.rs

221 lines
8.2 KiB
Rust
Raw Normal View History

//! Codegen the MIR to the LLVM IR.
//!
2018-05-08 13:10:16 +00:00
//! Hopefully useful general knowledge about codegen:
//!
//! * There's no way to find out the [`Ty`] type of a [`Value`]. Doing so
2019-02-08 13:53:55 +00:00
//! would be "trying to get the eggs out of an omelette" (credit:
//! pcwalton). You can, instead, find out its [`llvm::Type`] by calling [`val_ty`],
//! but one [`llvm::Type`] corresponds to many [`Ty`]s; for instance, `tup(int, int,
//! int)` and `rec(x=int, y=int, z=int)` will have the same [`llvm::Type`].
//!
//! [`Ty`]: rustc_middle::ty::Ty
//! [`val_ty`]: common::val_ty
use super::ModuleLlvm;
use crate::attributes;
2019-02-17 18:58:58 +00:00
use crate::builder::Builder;
use crate::common;
use crate::context::CodegenCx;
2019-12-22 22:42:04 +00:00
use crate::llvm;
use crate::value::Value;
2020-03-29 15:19:48 +00:00
use rustc_codegen_ssa::base::maybe_create_entry_wrapper;
use rustc_codegen_ssa::mono_item::MonoItemExt;
use rustc_codegen_ssa::traits::*;
use rustc_codegen_ssa::{ModuleCodegen, ModuleKind};
use rustc_data_structures::small_c_str::SmallCStr;
2020-03-29 14:41:09 +00:00
use rustc_middle::dep_graph;
use rustc_middle::middle::codegen_fn_attrs::CodegenFnAttrs;
2020-03-29 14:41:09 +00:00
use rustc_middle::middle::cstore::EncodedMetadata;
use rustc_middle::middle::exported_symbols;
use rustc_middle::mir::mono::{Linkage, Visibility};
use rustc_middle::ty::TyCtxt;
2021-02-07 21:47:03 +00:00
use rustc_session::config::DebugInfo;
use rustc_span::symbol::Symbol;
2021-02-07 21:47:03 +00:00
use rustc_target::spec::SanitizerSet;
use std::ffi::CString;
use std::time::Instant;
2019-06-13 21:48:52 +00:00
pub fn write_compressed_metadata<'tcx>(
tcx: TyCtxt<'tcx>,
metadata: &EncodedMetadata,
llvm_module: &mut ModuleLlvm,
) {
use snap::write::FrameEncoder;
2019-12-22 22:42:04 +00:00
use std::io::Write;
Store metadata separately in rlib files Right now whenever an rlib file is linked against, all of the metadata from the rlib is pulled in to the final staticlib or binary. The reason for this is that the metadata is currently stored in a section of the object file. Note that this is intentional for dynamic libraries in order to distribute metadata bundled with static libraries. This commit alters the situation for rlib libraries to instead store the metadata in a separate file in the archive. In doing so, when the archive is passed to the linker, none of the metadata will get pulled into the result executable. Furthermore, the metadata file is skipped when assembling rlibs into an archive. The snag in this implementation comes with multiple output formats. When generating a dylib, the metadata needs to be in the object file, but when generating an rlib this needs to be separate. In order to accomplish this, the metadata variable is inserted into an entirely separate LLVM Module which is then codegen'd into a different location (foo.metadata.o). This is then linked into dynamic libraries and silently ignored for rlib files. While changing how metadata is inserted into archives, I have also stopped compressing metadata when inserted into rlib files. We have wanted to stop compressing metadata, but the sections it creates in object file sections are apparently too large. Thankfully if it's just an arbitrary file it doesn't matter how large it is. I have seen massive reductions in executable sizes, as well as staticlib output sizes (to confirm that this is all working).
2013-12-04 01:41:01 +00:00
2021-03-29 11:02:59 +00:00
// Historical note:
//
// When using link.exe it was seen that the section name `.note.rustc`
// was getting shortened to `.note.ru`, and according to the PE and COFF
// specification:
//
// > Executable images do not use a string table and do not support
// > section names longer than 8 characters
//
// https://docs.microsoft.com/en-us/windows/win32/debug/pe-format
//
// As a result, we choose a slightly shorter name! As to why
2021-05-10 08:49:45 +00:00
// `.note.rustc` works on MinGW, see
// https://github.com/llvm/llvm-project/blob/llvmorg-12.0.0/lld/COFF/Writer.cpp#L1190-L1197
2021-03-29 11:02:59 +00:00
let section_name = if tcx.sess.target.is_like_osx { "__DATA,.rustc" } else { ".rustc" };
let (metadata_llcx, metadata_llmod) = (&*llvm_module.llcx, llvm_module.llmod());
2021-05-29 20:49:59 +00:00
let mut compressed = rustc_metadata::METADATA_HEADER.to_vec();
FrameEncoder::new(&mut compressed).write_all(&metadata.raw_data).unwrap();
2015-11-20 23:08:09 +00:00
let llmeta = common::bytes_in_context(metadata_llcx, &compressed);
let llconst = common::struct_in_context(metadata_llcx, &[llmeta], false);
let name = exported_symbols::metadata_symbol_name(tcx);
std: Implement CString-related RFCs This commit is an implementation of [RFC 592][r592] and [RFC 840][r840]. These two RFCs tweak the behavior of `CString` and add a new `CStr` unsized slice type to the module. [r592]: https://github.com/rust-lang/rfcs/blob/master/text/0592-c-str-deref.md [r840]: https://github.com/rust-lang/rfcs/blob/master/text/0840-no-panic-in-c-string.md The new `CStr` type is only constructable via two methods: 1. By `deref`'ing from a `CString` 2. Unsafely via `CStr::from_ptr` The purpose of `CStr` is to be an unsized type which is a thin pointer to a `libc::c_char` (currently it is a fat pointer slice due to implementation limitations). Strings from C can be safely represented with a `CStr` and an appropriate lifetime as well. Consumers of `&CString` should now consume `&CStr` instead to allow producers to pass in C-originating strings instead of just Rust-allocated strings. A new constructor was added to `CString`, `new`, which takes `T: IntoBytes` instead of separate `from_slice` and `from_vec` methods (both have been deprecated in favor of `new`). The `new` method returns a `Result` instead of panicking. The error variant contains the relevant information about where the error happened and bytes (if present). Conversions are provided to the `io::Error` and `old_io::IoError` types via the `FromError` trait which translate to `InvalidInput`. This is a breaking change due to the modification of existing `#[unstable]` APIs and new deprecation, and more detailed information can be found in the two RFCs. Notable breakage includes: * All construction of `CString` now needs to use `new` and handle the outgoing `Result`. * Usage of `CString` as a byte slice now explicitly needs a `.as_bytes()` call. * The `as_slice*` methods have been removed in favor of just having the `as_bytes*` methods. Closes #22469 Closes #22470 [breaking-change]
2015-02-18 06:47:40 +00:00
let buf = CString::new(name).unwrap();
2019-12-22 22:42:04 +00:00
let llglobal =
unsafe { llvm::LLVMAddGlobal(metadata_llmod, common::val_ty(llconst), buf.as_ptr()) };
unsafe {
llvm::LLVMSetInitializer(llglobal, llconst);
let name = SmallCStr::new(section_name);
llvm::LLVMSetSection(llglobal, name.as_ptr());
// Also generate a .section directive to force no
// flags, at least for ELF outputs, so that the
// metadata doesn't get loaded into memory.
let directive = format!(".section {}", section_name);
llvm::LLVMSetModuleInlineAsm2(metadata_llmod, directive.as_ptr().cast(), directive.len())
}
}
pub struct ValueIter<'ll> {
cur: Option<&'ll Value>,
step: unsafe extern "C" fn(&'ll Value) -> Option<&'ll Value>,
}
impl Iterator for ValueIter<'ll> {
type Item = &'ll Value;
fn next(&mut self) -> Option<&'ll Value> {
let old = self.cur;
if let Some(old) = old {
self.cur = unsafe { (self.step)(old) };
}
old
}
}
pub fn iter_globals(llmod: &'ll llvm::Module) -> ValueIter<'ll> {
2019-12-22 22:42:04 +00:00
unsafe { ValueIter { cur: llvm::LLVMGetFirstGlobal(llmod), step: llvm::LLVMGetNextGlobal } }
}
2015-01-02 04:54:03 +00:00
pub fn compile_codegen_unit(
tcx: TyCtxt<'tcx>,
cgu_name: Symbol,
) -> (ModuleCodegen<ModuleLlvm>, u64) {
let start_time = Instant::now();
let dep_node = tcx.codegen_unit(cgu_name).codegen_dep_node(tcx);
2019-12-22 22:42:04 +00:00
let (module, _) =
tcx.dep_graph.with_task(dep_node, tcx, cgu_name, module_codegen, dep_graph::hash_result);
2018-05-08 13:10:16 +00:00
let time_to_codegen = start_time.elapsed();
// We assume that the cost to run LLVM on a CGU is proportional to
2018-05-08 13:10:16 +00:00
// the time we needed for codegenning it.
2020-09-20 08:12:57 +00:00
let cost = time_to_codegen.as_nanos() as u64;
2019-12-22 22:42:04 +00:00
fn module_codegen(tcx: TyCtxt<'_>, cgu_name: Symbol) -> ModuleCodegen<ModuleLlvm> {
let cgu = tcx.codegen_unit(cgu_name);
let _prof_timer = tcx.prof.generic_activity_with_args(
"codegen_module",
&[cgu_name.to_string(), cgu.size_estimate().to_string()],
);
2018-05-08 13:10:16 +00:00
// Instantiate monomorphizations without filling out definitions yet...
let llvm_module = ModuleLlvm::new(tcx, &cgu_name.as_str());
{
2018-09-27 13:31:20 +00:00
let cx = CodegenCx::new(tcx, cgu, &llvm_module);
2019-12-22 22:42:04 +00:00
let mono_items = cx.codegen_unit.items_in_deterministic_order(cx.tcx);
2018-05-08 13:10:16 +00:00
for &(mono_item, (linkage, visibility)) in &mono_items {
mono_item.predefine::<Builder<'_, '_, '_>>(&cx, linkage, visibility);
}
// ... and now that we have everything pre-defined, fill out those definitions.
2018-05-08 13:10:16 +00:00
for &(mono_item, _) in &mono_items {
mono_item.define::<Builder<'_, '_, '_>>(&cx);
}
// If this codegen unit contains the main function, also create the
// wrapper here
if let Some(entry) = maybe_create_entry_wrapper::<Builder<'_, '_, '_>>(&cx) {
attributes::sanitize(&cx, SanitizerSet::empty(), entry);
}
// Run replace-all-uses-with for statics that need it
for &(old_g, new_g) in cx.statics_to_rauw().borrow().iter() {
unsafe {
let bitcast = llvm::LLVMConstPointerCast(new_g, cx.val_ty(old_g));
llvm::LLVMReplaceAllUsesWith(old_g, bitcast);
llvm::LLVMDeleteGlobal(old_g);
}
}
// Finalize code coverage by injecting the coverage map. Note, the coverage map will
// also be added to the `llvm.used` variable, created next.
coverage bug fixes and optimization support Adjusted LLVM codegen for code compiled with `-Zinstrument-coverage` to address multiple, somewhat related issues. Fixed a significant flaw in prior coverage solution: Every counter generated a new counter variable, but there should have only been one counter variable per function. This appears to have bloated .profraw files significantly. (For a small program, it increased the size by about 40%. I have not tested large programs, but there is anecdotal evidence that profraw files were way too large. This is a good fix, regardless, but hopefully it also addresses related issues. Fixes: #82144 Invalid LLVM coverage data produced when compiled with -C opt-level=1 Existing tests now work up to at least `opt-level=3`. This required a detailed analysis of the LLVM IR, comparisons with Clang C++ LLVM IR when compiled with coverage, and a lot of trial and error with codegen adjustments. The biggest hurdle was figuring out how to continue to support coverage results for unused functions and generics. Rust's coverage results have three advantages over Clang's coverage results: 1. Rust's coverage map does not include any overlapping code regions, making coverage counting unambiguous. 2. Rust generates coverage results (showing zero counts) for all unused functions, including generics. (Clang does not generate coverage for uninstantiated template functions.) 3. Rust's unused functions produce minimal stubbed functions in LLVM IR, sufficient for including in the coverage results; while Clang must generate the complete LLVM IR for each unused function, even though it will never be called. This PR removes the previous hack of attempting to inject coverage into some other existing function instance, and generates dedicated instances for each unused function. This change, and a few other adjustments (similar to what is required for `-C link-dead-code`, but with lower impact), makes it possible to support LLVM optimizations. Fixes: #79651 Coverage report: "Unexecuted instantiation:..." for a generic function from multiple crates Fixed by removing the aforementioned hack. Some "Unexecuted instantiation" notices are unavoidable, as explained in the `used_crate.rs` test, but `-Zinstrument-coverage` has new options to back off support for either unused generics, or all unused functions, which avoids the notice, at the cost of less coverage of unused functions. Fixes: #82875 Invalid LLVM coverage data produced with crate brotli_decompressor Fixed by disabling the LLVM function attribute that forces inlining, if `-Z instrument-coverage` is enabled. This attribute is applied to Rust functions with `#[inline(always)], and in some cases, the forced inlining breaks coverage instrumentation and reports.
2021-03-15 23:32:45 +00:00
if cx.sess().instrument_coverage() {
cx.coverageinfo_finalize();
}
// Create the llvm.used variable
// This variable has type [N x i8*] and is stored in the llvm.metadata section
if !cx.used_statics().borrow().is_empty() {
cx.create_used_variable()
}
// Finalize debuginfo
if cx.sess().opts.debuginfo != DebugInfo::None {
cx.debuginfo_finalize();
}
}
ModuleCodegen {
name: cgu_name.to_string(),
module_llvm: llvm_module,
kind: ModuleKind::Regular,
}
}
(module, cost)
}
pub fn set_link_section(llval: &Value, attrs: &CodegenFnAttrs) {
let sect = match attrs.link_section {
Some(name) => name,
None => return,
rustc: Add a `#[wasm_import_module]` attribute This commit adds a new attribute to the Rust compiler specific to the wasm target (and no other targets). The `#[wasm_import_module]` attribute is used to specify the module that a name is imported from, and is used like so: #[wasm_import_module = "./foo.js"] extern { fn some_js_function(); } Here the import of the symbol `some_js_function` is tagged with the `./foo.js` module in the wasm output file. Wasm-the-format includes two fields on all imports, a module and a field. The field is the symbol name (`some_js_function` above) and the module has historically unconditionally been `"env"`. I'm not sure if this `"env"` convention has asm.js or LLVM roots, but regardless we'd like the ability to configure it! The proposed ES module integration with wasm (aka a wasm module is "just another ES module") requires that the import module of wasm imports is interpreted as an ES module import, meaning that you'll need to encode paths, NPM packages, etc. As a result, we'll need this to be something other than `"env"`! Unfortunately neither our version of LLVM nor LLD supports custom import modules (aka anything not `"env"`). My hope is that by the time LLVM 7 is released both will have support, but in the meantime this commit adds some primitive encoding/decoding of wasm files to the compiler. This way rustc postprocesses the wasm module that LLVM emits to ensure it's got all the imports we'd like to have in it. Eventually I'd ideally like to unconditionally require this attribute to be placed on all `extern { ... }` blocks. For now though it seemed prudent to add it as an unstable attribute, so for now it's not required (as that'd force usage of a feature gate). Hopefully it doesn't take too long to "stabilize" this! cc rust-lang-nursery/rust-wasm#29
2018-02-10 22:28:17 +00:00
};
unsafe {
let buf = SmallCStr::new(&sect.as_str());
llvm::LLVMSetSection(llval, buf.as_ptr());
}
}
pub fn linkage_to_llvm(linkage: Linkage) -> llvm::Linkage {
match linkage {
Linkage::External => llvm::Linkage::ExternalLinkage,
Linkage::AvailableExternally => llvm::Linkage::AvailableExternallyLinkage,
Linkage::LinkOnceAny => llvm::Linkage::LinkOnceAnyLinkage,
Linkage::LinkOnceODR => llvm::Linkage::LinkOnceODRLinkage,
Linkage::WeakAny => llvm::Linkage::WeakAnyLinkage,
Linkage::WeakODR => llvm::Linkage::WeakODRLinkage,
Linkage::Appending => llvm::Linkage::AppendingLinkage,
Linkage::Internal => llvm::Linkage::InternalLinkage,
Linkage::Private => llvm::Linkage::PrivateLinkage,
Linkage::ExternalWeak => llvm::Linkage::ExternalWeakLinkage,
Linkage::Common => llvm::Linkage::CommonLinkage,
}
}
pub fn visibility_to_llvm(linkage: Visibility) -> llvm::Visibility {
match linkage {
Visibility::Default => llvm::Visibility::Default,
Visibility::Hidden => llvm::Visibility::Hidden,
Visibility::Protected => llvm::Visibility::Protected,
2017-09-18 10:14:52 +00:00
}
}