Empowering everyone to build reliable and efficient software.
Go to file
bors 53c2933d44 Auto merge of #30900 - michaelwoerister:trans_item_collect, r=nikomatsakis
The purpose of the translation item collector is to find all monomorphic instances of functions, methods and statics that need to be translated into LLVM IR in order to compile the current crate.

So far these instances have been discovered lazily during the trans path. For incremental compilation we want to know the set of these instances in advance, and that is what the trans::collect module provides.
In the future, incremental and regular translation will be driven by the collector implemented here.

r? @nikomatsakis
cc @rust-lang/compiler

Translation Item Collection
===========================

This module is responsible for discovering all items that will contribute to
to code generation of the crate. The important part here is that it not only
needs to find syntax-level items (functions, structs, etc) but also all
their monomorphized instantiations. Every non-generic, non-const function
maps to one LLVM artifact. Every generic function can produce
from zero to N artifacts, depending on the sets of type arguments it
is instantiated with.
This also applies to generic items from other crates: A generic definition
in crate X might produce monomorphizations that are compiled into crate Y.
We also have to collect these here.

The following kinds of "translation items" are handled here:

 - Functions
 - Methods
 - Closures
 - Statics
 - Drop glue

The following things also result in LLVM artifacts, but are not collected
here, since we instantiate them locally on demand when needed in a given
codegen unit:

 - Constants
 - Vtables
 - Object Shims

General Algorithm
-----------------
Let's define some terms first:

 - A "translation item" is something that results in a function or global in
   the LLVM IR of a codegen unit. Translation items do not stand on their
   own, they can reference other translation items. For example, if function
   `foo()` calls function `bar()` then the translation item for `foo()`
   references the translation item for function `bar()`. In general, the
   definition for translation item A referencing a translation item B is that
   the LLVM artifact produced for A references the LLVM artifact produced
   for B.

 - Translation items and the references between them for a directed graph,
   where the translation items are the nodes and references form the edges.
   Let's call this graph the "translation item graph".

 - The translation item graph for a program contains all translation items
   that are needed in order to produce the complete LLVM IR of the program.

The purpose of the algorithm implemented in this module is to build the
translation item graph for the current crate. It runs in two phases:

 1. Discover the roots of the graph by traversing the HIR of the crate.
 2. Starting from the roots, find neighboring nodes by inspecting the MIR
    representation of the item corresponding to a given node, until no more
    new nodes are found.

The roots of the translation item graph correspond to the non-generic
syntactic items in the source code. We find them by walking the HIR of the
crate, and whenever we hit upon a function, method, or static item, we
create a translation item consisting of the items DefId and, since we only
consider non-generic items, an empty type-substitution set.

Given a translation item node, we can discover neighbors by inspecting its
MIR. We walk the MIR and any time we hit upon something that signifies a
reference to another translation item, we have found a neighbor. Since the
translation item we are currently at is always monomorphic, we also know the
concrete type arguments of its neighbors, and so all neighbors again will be
monomorphic. The specific forms a reference to a neighboring node can take
in MIR are quite diverse. Here is an overview:

The most obvious form of one translation item referencing another is a
function or method call (represented by a CALL terminator in MIR). But
calls are not the only thing that might introduce a reference between two
function translation items, and as we will see below, they are just a
specialized of the form described next, and consequently will don't get any
special treatment in the algorithm.

A function does not need to actually be called in order to be a neighbor of
another function. It suffices to just take a reference in order to introduce
an edge. Consider the following example:

```rust
fn print_val<T: Display>(x: T) {
    println!("{}", x);
}

fn call_fn(f: &Fn(i32), x: i32) {
    f(x);
}

fn main() {
    let print_i32 = print_val::<i32>;
    call_fn(&print_i32, 0);
}
```
The MIR of none of these functions will contain an explicit call to
`print_val::<i32>`. Nonetheless, in order to translate this program, we need
an instance of this function. Thus, whenever we encounter a function or
method in operand position, we treat it as a neighbor of the current
translation item. Calls are just a special case of that.

In a way, closures are a simple case. Since every closure object needs to be
constructed somewhere, we can reliably discover them by observing
`RValue::Aggregate` expressions with `AggregateKind::Closure`. This is also
true for closures inlined from other crates.

Drop glue translation items are introduced by MIR drop-statements. The
generated translation item will again have drop-glue item neighbors if the
type to be dropped contains nested values that also need to be dropped. It
might also have a function item neighbor for the explicit `Drop::drop`
implementation of its type.

A subtle way of introducing neighbor edges is by casting to a trait object.
Since the resulting fat-pointer contains a reference to a vtable, we need to
instantiate all object-save methods of the trait, as we need to store
pointers to these functions even if they never get called anywhere. This can
be seen as a special case of taking a function reference.

Since `Box` expression have special compiler support, no explicit calls to
`exchange_malloc()` and `exchange_free()` may show up in MIR, even if the
compiler will generate them. We have to observe `Rvalue::Box` expressions
and Box-typed drop-statements for that purpose.

Interaction with Cross-Crate Inlining
-------------------------------------
The binary of a crate will not only contain machine code for the items
defined in the source code of that crate. It will also contain monomorphic
instantiations of any extern generic functions and of functions marked with
The collection algorithm handles this more or less transparently. When
constructing a neighbor node for an item, the algorithm will always call
`inline::get_local_instance()` before proceeding. If no local instance can
be acquired (e.g. for a function that is just linked to) no node is created;
which is exactly what we want, since no machine code should be generated in
the current crate for such an item. On the other hand, if we can
successfully inline the function, we subsequently can just treat it like a
local item, walking it's MIR et cetera.

Eager and Lazy Collection Mode
------------------------------
Translation item collection can be performed in one of two modes:

 - Lazy mode means that items will only be instantiated when actually
   referenced. The goal is to produce the least amount of machine code
   possible.

 - Eager mode is meant to be used in conjunction with incremental compilation
   where a stable set of translation items is more important than a minimal
   one. Thus, eager mode will instantiate drop-glue for every drop-able type
   in the crate, even of no drop call for that type exists (yet). It will
   also instantiate default implementations of trait methods, something that
   otherwise is only done on demand.

Open Issues
-----------
Some things are not yet fully implemented in the current version of this
module.

Since no MIR is constructed yet for initializer expressions of constants and
statics we cannot inspect these properly.

Ideally, no translation item should be generated for const fns unless there
is a call to them that cannot be evaluated at compile time. At the moment
this is not implemented however: a translation item will be produced
regardless of whether it is actually needed or not.

<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/rust-lang/rust/30900)
<!-- Reviewable:end -->
2016-01-29 03:41:44 +00:00
man Tweak manpage’s emit section 2015-12-04 21:11:04 +02:00
mk Auto merge of #30900 - michaelwoerister:trans_item_collect, r=nikomatsakis 2016-01-29 03:41:44 +00:00
src Auto merge of #30900 - michaelwoerister:trans_item_collect, r=nikomatsakis 2016-01-29 03:41:44 +00:00
.gitattributes std: Remove msvc/valgrind headers 2015-07-27 16:21:15 -07:00
.gitignore Ignore KDevelop 4 (and 5 pre-release) project files 2015-06-25 23:26:05 +00:00
.gitmodules Correct .gitmodules 2015-12-26 02:51:04 +08:00
.mailmap AUTHORS and .mailmap cleanup 2015-11-14 23:11:20 -05:00
.travis.yml Use Travis trusty infrastructure 2015-10-19 18:31:02 -04:00
COMPILER_TESTS.md Correct spelling in docs 2015-10-13 09:44:11 -04:00
configure Implement the translation item collector. 2016-01-26 10:17:45 -05:00
CONTRIBUTING.md Re-worded intro paragraph to be more positive 2016-01-26 14:44:27 +00:00
COPYRIGHT Remove reference to AUTHORS.txt file 2016-01-16 15:53:16 +00:00
LICENSE-APACHE Update license, add license boilerplate to most files. Remainder will follow. 2012-12-03 17:12:14 -08:00
LICENSE-MIT Update copyright date 2016-01-01 00:32:31 +00:00
Makefile.in mk: Remove all perf-related targets 2016-01-21 14:45:23 -08:00
README.md added link for issue mentioned in readme 2016-01-15 20:50:03 -07:00
RELEASES.md Update RELEASES.md for 1.5 2015-12-04 19:33:42 +00:00

The Rust Programming Language

This is the main source code repository for Rust. It contains the compiler, standard library, and documentation.

Quick Start

Read "Installing Rust" from The Book.

Building from Source

  1. Make sure you have installed the dependencies:

    • g++ 4.7 or clang++ 3.x
    • python 2.7 or later (but not 3.x)
    • GNU make 3.81 or later
    • curl
    • git
  2. Clone the source with git:

    $ git clone https://github.com/rust-lang/rust.git
    $ cd rust
    
  1. Build and install:

    $ ./configure
    $ make && make install
    

    Note: You may need to use sudo make install if you do not normally have permission to modify the destination directory. The install locations can be adjusted by passing a --prefix argument to configure. Various other options are also supported pass --help for more information on them.

    When complete, make install will place several programs into /usr/local/bin: rustc, the Rust compiler, and rustdoc, the API-documentation tool. This install does not include Cargo, Rust's package manager, which you may also want to build.

Building on Windows

There are two prominent ABIs in use on Windows: the native (MSVC) ABI used by Visual Studio, and the GNU ABI used by the GCC toolchain. Which version of Rust you need depends largely on what C/C++ libraries you want to interoperate with: for interop with software produced by Visual Studio use the MSVC build of Rust; for interop with GNU software built using the MinGW/MSYS2 toolchain use the GNU build.

MinGW

MSYS2 can be used to easily build Rust on Windows:

  1. Grab the latest MSYS2 installer and go through the installer.

  2. From the MSYS2 terminal, install the mingw64 toolchain and other required tools.

    # Update package mirrors (may be needed if you have a fresh install of MSYS2)
    $ pacman -Sy pacman-mirrors
    

Download MinGW from here, and choose the threads=win32,exceptions=dwarf/seh flavor when installing. After installing, add its bin directory to your PATH. This is due to #28260, in the future, installing from pacman should be just fine.

# Make git available in MSYS2 (if not already available on path)
$ pacman -S git

$ pacman -S base-devel
  1. Run mingw32_shell.bat or mingw64_shell.bat from wherever you installed MSYS2 (i.e. C:\msys), depending on whether you want 32-bit or 64-bit Rust.

  2. Navigate to Rust's source code, configure and build it:

    $ ./configure
    $ make && make install
    

MSVC

MSVC builds of Rust additionally require an installation of Visual Studio 2013 (or later) so rustc can use its linker. Make sure to check the “C++ tools” option. In addition, cmake needs to be installed to build LLVM.

With these dependencies installed, the build takes two steps:

$ ./configure
$ make && make install

Building Documentation

If youd like to build the documentation, its almost the same:

./configure
$ make docs

Building the documentation requires building the compiler, so the above details will apply. Once you have the compiler built, you can

$ make docs NO_REBUILD=1

To make sure you dont re-build the compiler because you made a change to some documentation.

The generated documentation will appear in a top-level doc directory, created by the make rule.

Notes

Since the Rust compiler is written in Rust, it must be built by a precompiled "snapshot" version of itself (made in an earlier state of development). As such, source builds require a connection to the Internet, to fetch snapshots, and an OS that can execute the available snapshot binaries.

Snapshot binaries are currently built and tested on several platforms:

Platform \ Architecture x86 x86_64
Windows (7, 8, Server 2008 R2)
Linux (2.6.18 or later)
OSX (10.7 Lion or later)

You may find that other platforms work, but these are our officially supported build environments that are most likely to work.

Rust currently needs between 600MiB and 1.5GiB to build, depending on platform. If it hits swap, it will take a very long time to build.

There is more advice about hacking on Rust in CONTRIBUTING.md.

Getting Help

The Rust community congregates in a few places:

Contributing

To contribute to Rust, please see CONTRIBUTING.

Rust has an IRC culture and most real-time collaboration happens in a variety of channels on Mozilla's IRC network, irc.mozilla.org. The most popular channel is #rust, a venue for general discussion about Rust, and a good place to ask for help.

License

Rust is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0), with portions covered by various BSD-like licenses.

See LICENSE-APACHE, LICENSE-MIT, and COPYRIGHT for details.