This series of commits adds the initial implementation of a new build system for
the compiler and standard library based on Cargo. The high-level architecture
now looks like:
1. The `./configure` script is run with `--enable-rustbuild` and other standard
configuration options.
2. A `Makefile` is generated which proxies commands to the new build system.
3. The new build system has a Python script entry point which manages
downloading both a Rust and Cargo nightly. This initial script also manages
building the build system itself (which is written in Rust).
4. The build system, written in Rust and called `bootstrap`, architects how to
call `cargo` and manages building all native libraries and such.
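To make step 4 concrete: `bootstrap` has to work out an order in which to run its steps, since, for example, a compiler can only be built once the standard library it links against exists. The sketch below is a minimal model of that idea, not the actual `bootstrap` code. The step names and the `deps` table are hypothetical, and the sketch assumes the step graph has no cycles. It orders steps with a depth-first topological sort.

```rust
use std::collections::HashSet;

// Hypothetical build steps and their direct dependencies; the names are
// illustrative only and do not match the real `bootstrap` step names.
fn deps(step: &str) -> Vec<&'static str> {
    match step {
        "std-stage0" => vec!["compiler-rt"],
        "rustc-stage0" => vec!["std-stage0", "llvm"],
        "std-stage1" => vec!["rustc-stage0"],
        "rustc-stage1" => vec!["std-stage1", "llvm"],
        // "llvm", "compiler-rt" and unknown steps have no dependencies.
        _ => vec![],
    }
}

// Depth-first post-order walk: every step is pushed after all of its
// dependencies, and each step is pushed at most once.
fn visit(step: &'static str, seen: &mut HashSet<&'static str>, order: &mut Vec<&'static str>) {
    if !seen.insert(step) {
        return; // already scheduled
    }
    for dep in deps(step) {
        visit(dep, seen, order);
    }
    order.push(step);
}

fn plan(target: &'static str) -> Vec<&'static str> {
    let mut seen = HashSet::new();
    let mut order = Vec::new();
    visit(target, &mut seen, &mut order);
    order
}

fn main() {
    // Prints the steps so that dependencies always come first.
    for step in plan("rustc-stage1") {
        println!("{}", step);
    }
}
```

A real implementation would also have to key each step by host and target triple, so that standard libraries for cross-compilation targets get scheduled as well.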
One might reasonably ask "why rewrite the build system?", which is a good
question! The Rust project has used Makefiles for as long as I can remember at
least, and while they are ugly and difficult to use, they are undeniably robust,
as they contain years' worth of tweaking and tuning for working on as many
platforms in as many situations as possible. The rationale behind this PR,
however, is:
* The makefiles are impenetrable to all but a few people on this
planet. This means that contributions to the build system are almost
nonexistent, and furthermore if a build system change is needed it's
incredibly difficult to figure out how to do so. This hindrance prevents us
from doing some "perhaps fancier" things we may wish to do in make.
* Our build system, while portable, is unfortunately not infinitely portable
everywhere. For example, the recently introduced MSVC target is quite unlikely
to have `make` installed by default (e.g. it requires building inside of an
MSYS2 shell currently). Conversely, the portability of make comes at a cost of
crazy and weird hacks to work around all sorts of versions of software
everywhere, especially when it comes to the configure script and makefiles.
By rewriting this logic in one of the most robust platforms there is, Rust,
we get to assuage all of these worries for free!
* There's a standard tool to build Rust crates, Cargo, but the standard library
and compiler don't use it. This means that they cannot benefit easily from the
crates.io ecosystem, nor can the ecosystem benefit from a standard way to
build this repository itself. Moving to Cargo should help address both of
these needs. This has the added benefit of making the compiler more
approachable for newbies: working on the compiler will just be working on a
large Cargo project, so all the same standard tools and tricks will apply.
* There's a huge amount of portability information in the main distribution, for
example around cross compiling, compiling on new OSes, etc. Pushing this logic
into standard crates (like `gcc`) enables the community to immediately benefit
from new build logic.
Despite these benefits, it's going to be a long road to actually replace our
current build system. This PR is just the beginning and doesn't implement the
full suite of functionality that the current one has, but many more PRs are to
follow! The planned implementation strategy looks like:
1. Land a second build system in-tree that can be iterated on and
contributed to. This will not be used just yet in terms of gating new commits
to the repo.
2. Over time, bring the second build system to feature parity with the old build
system, and start setting up CI for both build systems.
3. At some point in the future, switch the default to the new build system, but
keep the old one around.
4. At some further point in the future, delete the entire old build system.
---
Alright, so with all that out of the way, here's some more info on this PR
itself. The initial build system here is contained in the `src/bootstrap`
directory and just adds the necessary minimum bits to bootstrap the compiler
itself. There is currently no support for building documentation, running tests,
or installing, but the implemented support is:
* Compiling LLVM with `cmake` instead of `./configure` + `make`. The LLVM
project is removing their autotools build system, so we'd have to make this
transition eventually anyway.
* Compiling compiler-rt with `cmake` as well (for the same rationale as above).
* Adding `Cargo.toml` to map out the dependency graph to all crates, and also
adding `build.rs` files where appropriate. For example `alloc_jemalloc` has a
script to build jemalloc, `flate` has a script to build `miniz.c`, `std` will
build `libbacktrace`, etc.
* Orchestrating all the calls to `cargo` to build the standard distribution,
following the normal bootstrapping process. This also tracks dependencies
between steps to ensure cross-compilation targets happen as well.
* Configuration is intended to eventually be done through a `config.toml` file,
so support is implemented for this. The most likely vector of configuration
for now, however, is `config.mk` (what `./configure` emits), so the build
system currently parses this information.
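Reading `config.mk` can stay quite simple if, as assumed here, each setting is emitted as a `NAME := value` line. The sketch below is a hypothetical illustration of that parsing step, not the parser in `src/bootstrap`; the function name and the sample variable names are examples only.

```rust
use std::collections::HashMap;

// Assumes `config.mk` consists of `NAME := value` assignments, blank lines
// and `#` comments; any other line is ignored in this sketch.
fn parse_config_mk(contents: &str) -> HashMap<String, String> {
    let mut vars = HashMap::new();
    for line in contents.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some((name, value)) = line.split_once(":=") {
            vars.insert(name.trim().to_string(), value.trim().to_string());
        }
    }
    vars
}

fn main() {
    // Example input; the variable names are illustrative.
    let sample = "# generated by ./configure\nCFG_BUILD := x86_64-unknown-linux-gnu\nCFG_ENABLE_OPTIMIZE := 1\n";
    let vars = parse_config_mk(sample);
    println!("{:?}", vars.get("CFG_BUILD"));
}
```

Keeping the result as a plain string map means the same lookup code could later be fed from `config.toml` instead.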
There are still quite a few steps left to do, and I'll open up some follow-up
issues (as well as a tracking issue) for this migration, but hopefully this is a
great start to get going! This PR is currently tested on all the
Windows/Linux/OSX triples for x86\_64 and x86, but more portability is always
welcome!
---
Future functionality left to implement
* [ ] Re-verify that multi-host builds work
* [ ] Verify android build works
* [ ] Verify iOS build works (mostly compiler-rt)
* [ ] Verify sha256 and ideally gpg of downloaded nightly Cargo and nightly rustc
* [ ] Implement testing -- this is a huge bullet point with lots of sub-bullets
* [ ] Build and generate documentation (plus the various tools we have in-tree)
* [ ] Move various src/etc scripts into Rust -- not sure how this interacts with the `make` build system
* [ ] Implement `make install` -- like testing, this is also quite massive
* [x] Deduplicate version information with makefiles
LLVM's memory dependence analysis doesn't properly account for calls
that could unwind and thus effectively act as a branching point. This
can lead to stores that are only visible when the call unwinds being
removed, possibly leading to calls to drop() functions with b0rked
memory contents.
As there is no fix for this in LLVM yet and we want to keep
compatibility with current LLVM versions anyway, we have to work around
this bug by omitting the noalias attribute on `&mut` function arguments.
Benchmarks suggest that the performance loss by this change is very
small.
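The pattern at risk looks roughly like the sketch below: a store through a `&mut` argument that only matters if a later call unwinds. This is an illustrative example written for this description, not the regression test added by this PR, and the name `store_then_call` is hypothetical. In a correctly compiled program the value observed after the panic must be 42; the LLVM bug described above could remove that first store when the argument was marked noalias.

```rust
use std::panic::{self, AssertUnwindSafe};

// Writes 42, calls a function that may unwind, then overwrites the value.
// If `f` panics, the intermediate store of 42 must stay visible to whoever
// observes `*x` during or after unwinding.
fn store_then_call(x: &mut u32, f: impl FnOnce()) {
    *x = 42;
    f();
    *x = 0;
}

fn main() {
    let mut value = 1;
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        store_then_call(&mut value, || panic!("unwind"));
    }));
    assert!(result.is_err());
    // The first store must not have been eliminated.
    println!("value after unwind: {}", value);
}
```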
Thanks to @RalfJung for pushing me towards not removing too many
noalias annotations and @alexcrichton for helping out with the test for
this bug.
Fixes #29485
r? @Manishearth
I just noticed they can't be rolled up (often modifying the same line(s) in imports). So once I reach the critical amount for them to be merged I'll create a PR that merges all of them.
We no longer have a separate powerpc64 and powerpc64le target_arch, and instead use target_endian to select between the two. These patches fix a couple of remaining issues.
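In code, selecting between the two now means checking `target_endian` on the single `powerpc64` architecture. A minimal, hypothetical illustration using the built-in `cfg!` macro (the function name is made up for this example):

```rust
// Both flavors share `target_arch = "powerpc64"`, so the endianness
// decides which one we are on.
fn powerpc64_flavor() -> &'static str {
    if cfg!(all(target_arch = "powerpc64", target_endian = "big")) {
        "powerpc64"
    } else if cfg!(all(target_arch = "powerpc64", target_endian = "little")) {
        "powerpc64le"
    } else {
        "not powerpc64"
    }
}

fn main() {
    println!("{}", powerpc64_flavor());
}
```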
* We don't have SEH-based unwinding yet.
For this reason we don't need operand bundles in MIR trans.
* Refactored some uses of fcx.
* Refactored some calls to `with_block`.
Here's another go at adding emscripten support. This needs to wait again on new [libc definitions](https://github.com/rust-lang-nursery/libc/pull/122) landing. To get the libc definitions right I had to add support for i686-unknown-linux-musl, whose definitions are very similar to emscripten's, which are in turn derived from arm/musl.
This branch additionally removes the makefile dependency on the `EMSCRIPTEN` environment variable by not building the unused compiler-rt.
Again, this is not sufficient for actually compiling to asmjs since it needs additional LLVM patches.
r? @alexcrichton
This tells `trans::back::write` not to use LLVM codegen to create .o
files but to put LLVM bitcode in .o files.
Emscripten's emcc supports .o in this format, and this is,
I think, slightly easier than making rlibs work without .o
files.
Backtraces, and the compilation of libbacktrace for asmjs, are disabled.
This port doesn't use jemalloc so, like pnacl, it disables jemalloc *for all targets*
in the configure file.
It disables stack protection.
The scope of these refactorings is a little bit bigger than the title implies. See each commit for details.
I’m submitting this for nitpicking now (the first 4 commits), because I feel the basic idea/implementation is sound and should work. I will eventually expand this PR to cover the translator changes necessary for all this to work (+ tests), ~~and perhaps implement a dynamic dropping scheme while I’m at it as well.~~
r? @nikomatsakis
If a new cleanup is added to a cleanup scope, the cached exits for that
scope are cleared, so all previous cleanups have to be translated
again. In the worst case this means that we get N distinct landing pads
where the last one has N cleanups, then N-1 and so on.
As new cleanups are to be executed before older ones, we can instead
cache the number of already translated cleanups in addition to the
block that contains them, then translate only the new ones, if any, and
jump to the cached ones, getting away with linear growth instead.
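The caching idea can be modelled in a few lines. The sketch below is a simplified, hypothetical model rather than rustc's trans code: cleanups are plain integer ids, a "block" is just the list of cleanups it runs plus an optional cached block to jump to afterwards, and `translated` counts how many cleanups were emitted in total. Requesting a landing pad after each of N pushed cleanups emits N cleanups overall, instead of the 1 + 2 + ... + N that re-translating the whole scope each time would emit.

```rust
// A translated block: the cleanups it runs (in execution order) and the
// previously cached block it jumps to afterwards, if any.
struct Block {
    cleanups: Vec<u32>,
    next: Option<usize>,
}

#[derive(Default)]
struct Scope {
    cleanups: Vec<u32>,             // registration order, oldest first
    cached: Option<(usize, usize)>, // (cleanups already translated, block index)
}

#[derive(Default)]
struct Translator {
    blocks: Vec<Block>,
    translated: usize, // total number of cleanups emitted so far
}

impl Translator {
    fn landing_pad(&mut self, scope: &mut Scope) -> Option<usize> {
        let (done, prev) = match scope.cached {
            Some((n, block)) => (n, Some(block)),
            None => (0, None),
        };
        if done == scope.cleanups.len() {
            return prev; // nothing new since the last request
        }
        // Newer cleanups run before older ones, so translate only the new
        // ones (newest first) and then jump to the cached block.
        let new: Vec<u32> = scope.cleanups[done..].iter().rev().copied().collect();
        self.translated += new.len();
        self.blocks.push(Block { cleanups: new, next: prev });
        let id = self.blocks.len() - 1;
        scope.cached = Some((scope.cleanups.len(), id));
        Some(id)
    }

    // Follows the jump chain from `start` to list the cleanups actually run.
    fn execution_order(&self, start: usize) -> Vec<u32> {
        let mut order = Vec::new();
        let mut cur = Some(start);
        while let Some(id) = cur {
            order.extend(&self.blocks[id].cleanups);
            cur = self.blocks[id].next;
        }
        order
    }
}

fn main() {
    let mut scope = Scope::default();
    let mut tr = Translator::default();
    let mut last = None;
    for id in 0..5 {
        scope.cleanups.push(id);
        last = tr.landing_pad(&mut scope);
    }
    println!("emitted {} cleanups", tr.translated);
    println!("run order: {:?}", tr.execution_order(last.unwrap()));
}
```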
For the crate in #31381 this reduces the compile time for an optimized
build from >20 minutes (I cancelled the build at that point) to about 11
seconds. Testing a few crates that come with rustc shows compile time
improvements somewhere between 1 and 8%, the "big" winner being
rustc_platform_intrinsics, which features code similar to that in #31381.
Fixes #31381
The first commit improves detection of unused imports -- it should have been part of #30325. Right now, the unused import in the changed test would not be reported.
The rest of the commits are miscellaneous, independent clean-ups in resolve that I didn't think warranted individual PRs.
r? @nrc
The structure of the old translator as well as MIR assumed that drop glue cannot possibly panic and
translated the drops accordingly. However, in the presence of `Drop::drop` this assumption can be
trivially shown to be untrue. As such, Rust code like the following would never print the number 2:
```rust
struct Droppable(u32);

impl Drop for Droppable {
    fn drop(&mut self) {
        if self.0 == 1 { panic!("Droppable(1)") } else { println!("{}", self.0) }
    }
}

fn main() {
    let x = Droppable(2);
    let y = Droppable(1);
    // `y` is dropped first and panics; `x` should still be dropped while unwinding.
}
```
While the behaviour is allowed according to the language rules (we allow drops to not run), that’s
a very counter-intuitive behaviour. We fix this in MIR by allowing `Drop` to have a target to take
on divergence and connecting the drops in such a way that the leftover drops are executed when some
drop unwinds.
Note that this commit still does not implement the translator part of changes necessary for the
grand scheme of things to fully work, so the actual observed behaviour does not change yet. Coming
soon™.
See #14875.
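The behaviour this change works towards can be checked on a current compiler, where the remaining drops do run during unwinding. The sketch below adapts the `Droppable` example and is not code from this PR: it records successful drops in a thread-local log (an assumption made so the result can be inspected) and uses `std::panic::catch_unwind` to contain the panic from `Droppable(1)`. The helper name `run` is hypothetical.

```rust
use std::cell::RefCell;
use std::panic;

thread_local! {
    // Ids of values whose destructor completed without panicking.
    static LOG: RefCell<Vec<u32>> = RefCell::new(Vec::new());
}

struct Droppable(u32);

impl Drop for Droppable {
    fn drop(&mut self) {
        if self.0 == 1 {
            panic!("Droppable(1)");
        }
        LOG.with(|log| log.borrow_mut().push(self.0));
    }
}

fn run() -> Vec<u32> {
    LOG.with(|log| log.borrow_mut().clear());
    let result = panic::catch_unwind(|| {
        let _x = Droppable(2);
        let _y = Droppable(1);
        // `_y` is dropped first (reverse declaration order) and panics;
        // `_x` must still be dropped while unwinding.
    });
    assert!(result.is_err());
    LOG.with(|log| log.borrow().clone())
}

fn main() {
    println!("{:?}", run());
}
```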
We used to have CallKind only because there was a requirement to have all successors in a
contiguous memory block. Now that the requirement is gone, remove the CallKind and instead just
have the necessary information inline.
Awesome!
After the truly incredible and embarrassing mess I managed to make in my last pull request, this should be a bit less messy.
Fixes #31267: with this change, the code mentioned in the issue compiles.
Found and fixed another issue as well: constants of zero-sized types, when used in `ExprRepeat`s inside associated constants, were causing the compiler to crash at the same place as #31267. An example of this:
```rust
#[derive(Debug)]
struct Bar;
const BAZ: Bar = Bar;
#[derive(Debug)]
struct Foo([Bar; 1]);
struct Biz;
impl Biz {
    const BAZ: Foo = Foo([BAZ; 1]);
}
fn main() {
    let foo = Biz::BAZ;
    println!("{:?}", foo);
}
```
However, I'm fairly certain that my fix for this is not as elegant as it could be. The problem seems to occur only with an associated constant of a tuple struct containing a fixed size array which is initialized using a repeat expression, and when the element to be repeated provided to the repeat expression is another constant which is of a zero-sized type. The fix works by looking for constants and associated constants which are zero-width and consequently contain no data, but for which rustc is still attempting to emit an LLVM value; it simply stops rustc from attempting to emit anything. By my logic, this should work fine since the only values that are emitted in this case (according to the comments) are for closures with side effects, and constants will never have side effects, so it's fine to simply get rid of them. It fixes the error and things compile fine with it, but I have a sneaking suspicion that it could be done in a far better manner.
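The fix relies on the repeated constant being zero-sized, so there is no data to emit. That property is easy to confirm with `std::mem::size_of`; this is a small check written for this description, using the types from the example above:

```rust
use std::mem::size_of;

struct Bar;
struct Foo([Bar; 1]);

fn main() {
    // A unit struct has no fields, so it occupies no bytes...
    assert_eq!(size_of::<Bar>(), 0);
    // ...and neither does a one-element array of it, nor a struct wrapping that array.
    assert_eq!(size_of::<[Bar; 1]>(), 0);
    assert_eq!(size_of::<Foo>(), 0);
    println!("Bar, [Bar; 1] and Foo are all zero-sized");
}
```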
r? @nikomatsakis