This tool will generate a JSON file with statistics about each
individual step to disk. It will be used in rust-lang/rust's CI to
replace the mix of scripts and log scraping we currently have to gather
this data.
This attempts to keep the logic as close to the original python as possible.
`probably_large` has been removed, since it was always `True`, and UTF-8 paths are no longer supported when patching files for NixOS.
I can readd UTF-8 support if desired.
Note that this required making `llvm_link_shared` computed on-demand,
since we don't know whether it will be static or dynamic until we download LLVM from CI.
feat: Allow usage of sudo [while not accessing root] in x.py
# Fixes
This PR should fix#93344
# Info
Allows usage of sudo (while not accessing root) in x.py
[bootstrap.py] Instruct curl to follow redirect
Some mirror RUSTUP_DIST_SERVER (like https://mirrors.sjtug.sjtu.edu.cn/rust-static) perform redirection when downloading
stage0 compiler. Curl should be able to follow that.
Per https://www.freedesktop.org/software/systemd/man/os-release.html,
> Variable assignment values must be enclosed in double or single quotes
> if they include spaces, semicolons or other special characters outside
> of A–Z, a–z, 0–9. (Assignments that do not include these special
> characters may be enclosed in quotes too, but this is optional.)
So, past `ID=nixos`, let's also check for `ID='nixos'` and `ID="nixos"`.
One of these is necessary between nixos/nixpkgs#162168 and
nixos/nixpkgs#164068, but this seems more correct either way.
This also preserves the behavior where x.py will only give a hard error on a missing config file
if it was configured through `--config` or RUST_BOOTSTRAP_CONFIG.
It also removes the top-level fallback for everything except the default path.
Same rationale as https://github.com/rust-lang/rust/pull/76544;
it would be nice to make python entirely optional at some point.
This also removes $ROOT as an option for the build directory; I haven't been using it, and like Alex
said in https://github.com/rust-lang/rust/pull/76544#discussion_r488248930 it seems like a
misfeature.
This allows running `cargo run` from src/bootstrap, although that still gives
lots of compile errors if you don't use the beta toolchain.
Currently the output of a command like `./x.py build --stage 0 library/std` is this:
```
Updating only changed submodules
Submodules updated in 0.02 seconds
extracting [...]
Compiling [...]
Finished dev [unoptimized] target(s) in 17.53s
Building stage0 std artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
Compling [...]
Finished release [optimized + debuginfo] target(s) in 21.99s
Copying stage0 std from stage0 (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu / x86_64-unknown-linux-gnu)
Build completed successfully in 0:00:51
```
I find the part before the "Building stage0 std artifacts" a bit confusing.
After this commit, it looks like this:
```
Updating only changed submodules
Submodules updated in 0.02 seconds
extracting [...]
Building rustbuild
Compiling [...]
Finished dev [unoptimized] target(s) in 17.53s
Building stage0 std artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
Compling [...]
Finished release [optimized + debuginfo] target(s) in 21.99s
Copying stage0 std from stage0 (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu / x86_64-unknown-linux-gnu)
Build completed successfully in 0:00:51
```
The "Building rustbuild" label makes it clear what the first cargo build
invocation is for. The indentation of the "Submodules updated" line
indicates it is a sub-step of a parent task.
Lock bootstrap (x.py) build directory
Closes#76661, closes#80849,
`x.py` creates a lock file at `project_root/lock.db`
r? `@jyn514` , because he was one that told me about this~
Prevent spurious build failures and other bugs caused by parallel runs of
x.py. We back the lock with sqlite, so that we have a cross-platform locking
strategy, and which can be done nearly first in the build process (from Python),
which helps move the lock as early as possible.
Add support for riscv64gc-unknown-freebsd
For https://doc.rust-lang.org/nightly/rustc/target-tier-policy.html#tier-3-target-policy:
* A tier 3 target must have a designated developer or developers (the "target maintainers") on record to be CCed when issues arise regarding the target. (The mechanism to track and CC such developers may evolve over time.)
For all Rust targets on FreeBSD, it's [rust@FreeBSD.org](mailto:rust@FreeBSD.org).
* Targets must use naming consistent with any existing targets; for instance, a target for the same CPU or OS as an existing Rust target should use the same name for that CPU or OS. Targets should normally use the same names and naming conventions as used elsewhere in the broader ecosystem beyond Rust (such as in other toolchains), unless they have a very good reason to diverge. Changing the name of a target can be highly disruptive, especially once the target reaches a higher tier, so getting the name right is important even for a tier 3 target.
Done.
* Target names should not introduce undue confusion or ambiguity unless absolutely necessary to maintain ecosystem compatibility. For example, if the name of the target makes people extremely likely to form incorrect beliefs about what it targets, the name should be changed or augmented to disambiguate it.
Done
* Tier 3 targets may have unusual requirements to build or use, but must not create legal issues or impose onerous legal terms for the Rust project or for Rust developers or users.
Done.
* The target must not introduce license incompatibilities.
Done.
* Anything added to the Rust repository must be under the standard Rust license (MIT OR Apache-2.0).
Fine with me.
* The target must not cause the Rust tools or libraries built for any other host (even when supporting cross-compilation to the target) to depend on any new dependency less permissive than the Rust licensing policy. This applies whether the dependency is a Rust crate that would require adding new license exceptions (as specified by the tidy tool in the rust-lang/rust repository), or whether the dependency is a native library or binary. In other words, the introduction of the target must not cause a user installing or running a version of Rust or the Rust tools to be subject to any new license requirements.
Done.
* If the target supports building host tools (such as rustc or cargo), those host tools must not depend on proprietary (non-FOSS) libraries, other than ordinary runtime libraries supplied by the platform and commonly used by other binaries built for the target. For instance, rustc built for the target may depend on a common proprietary C runtime library or console output library, but must not depend on a proprietary code generation library or code optimization library. Rust's license permits such combinations, but the Rust project has no interest in maintaining such combinations within the scope of Rust itself, even at tier 3.
Done.
* Targets should not require proprietary (non-FOSS) components to link a functional binary or library.
Done.
* "onerous" here is an intentionally subjective term. At a minimum, "onerous" legal/licensing terms include but are not limited to: non-disclosure requirements, non-compete requirements, contributor license agreements (CLAs) or equivalent, "non-commercial"/"research-only"/etc terms, requirements conditional on the employer or employment of any particular Rust developers, revocable terms, any requirements that create liability for the Rust project or its developers or users, or any requirements that adversely affect the livelihood or prospects of the Rust project or its developers or users.
Fine with me.
* Neither this policy nor any decisions made regarding targets shall create any binding agreement or estoppel by any party. If any member of an approving Rust team serves as one of the maintainers of a target, or has any legal or employment requirement (explicit or implicit) that might affect their decisions regarding a target, they must recuse themselves from any approval decisions regarding the target's tier status, though they may otherwise participate in discussions.
Ok.
* This requirement does not prevent part or all of this policy from being cited in an explicit contract or work agreement (e.g. to implement or maintain support for a target). This requirement exists to ensure that a developer or team responsible for reviewing and approving a target does not face any legal threats or obligations that would prevent them from freely exercising their judgment in such approval, even if such judgment involves subjective matters or goes beyond the letter of these requirements.
Ok.
* Tier 3 targets should attempt to implement as much of the standard libraries as possible and appropriate (core for most targets, alloc for targets that can support dynamic memory allocation, std for targets with an operating system or equivalent layer of system-provided functionality), but may leave some code unimplemented (either unavailable or stubbed out as appropriate), whether because the target makes it impossible to implement or challenging to implement. The authors of pull requests are not obligated to avoid calling any portions of the standard library on the basis of a tier 3 target not implementing those portions.
std is implemented.
* The target must provide documentation for the Rust community explaining how to build for the target, using cross-compilation if possible. If the target supports running tests (even if they do not pass), the documentation must explain how to run tests for the target, using emulation if possible or dedicated hardware if necessary.
Building is possible the same way as other Rust on FreeBSD targets.
* Tier 3 targets must not impose burden on the authors of pull requests, or other developers in the community, to maintain the target. In particular, do not post comments (automated or manual) on a PR that derail or suggest a block on the PR based on a tier 3 target. Do not send automated messages or notifications (via any medium, including via `@)` to a PR author or others involved with a PR regarding a tier 3 target, unless they have opted into such messages.
Ok.
* Backlinks such as those generated by the issue/PR tracker when linking to an issue or PR are not considered a violation of this policy, within reason. However, such messages (even on a separate repository) must not generate notifications to anyone involved with a PR who has not requested such notifications.
Ok.
* Patches adding or updating tier 3 targets must not break any existing tier 2 or tier 1 target, and must not knowingly break another tier 3 target without approval of either the compiler team or the maintainers of the other tier 3 target.
Ok.
* In particular, this may come up when working on closely related targets, such as variations of the same architecture with different features. Avoid introducing unconditional uses of features that another variation of the target may not have; use conditional compilation or runtime detection, as appropriate, to let each target run code supported by that target.
Ok.
In some situations we should want on influence into the .cargo/config
when we use vendored source. One example is #90764, when we want to
workaround some references to crates forked and living in git, that are
missing in the vendor/ directory.
This commit will create the .cargo/config file only when the .cargo/
directory needs to be created.
Handling submodule update failures more gracefully from x.py
Addresses #80498
Handling the case where x.py can't check out the right commit of a submodule, because the submodule has local edits that would be overwritten by the checkout, more gracefully.
The error is printed in detail, with some hints on how to revert the local changes to the submodule.
bootstrap: tweak verbosity settings
Currently the verbosity settings are:
- 2: RUSTC-SHIM envvars get spammed on every invocation, O(30) lines
cargo is passed -v which outputs CLI invocations, O(5) lines
- 3: cargo is passed -vv which outputs build script output, O(0-10) lines
This commit changes it to:
- 1: cargo is passed -v, O(5) lines
- 2: cargo is passed -vv, O(10) lines
- 3: RUSTC-SHIM envvars get spammed, O(30) lines
Currently the verbosity settings are:
- 2: RUSTC-SHIM envvars get spammed on every invocation, O(30) lines
cargo is passed -v which outputs CLI invocations, O(5) lines
- 3: cargo is passed -vv which outputs build script output, O(0-10) lines
This commit changes it to:
- 1: cargo is passed -v, O(5) lines
- 2: cargo is passed -vv, O(10) lines
- 3: RUSTC-SHIM envvars get spammed, O(30) lines
Use shallow clones for submodules
This reduces the amount of git history downloaded for submodules from ~67M to ~11M. For comparison, a shallow clone of rust-lang/rust is 103M and a deep clone is 740M, so this almost halves the amount of history necessary if you made a shallow clone to start, and it's a significant reduction even if not.
Closes https://github.com/rust-lang/rust/issues/63978. r? `@Mark-Simulacrum`
Rather than compiling rustbuild and all its dependencies with
`debuginfo=2`, this compiles dependencies without debuginfo and
rustbuild with `debuginfo=1`. On my laptop, this brings compile times
down from ~1:20 to ~1:05.
On NixOS systems, bootstrap will patch rustc used in bootstrapping after
checking `/etc/os-release` (to confirm the current distribution is NixOS).
However, when using Nix on a non-NixOS system, it can be desirable for
bootstrap to patch rustc. In this commit, a `patch-binaries-for-nix`
option is added to `config.toml`, which allows for user opt-in to
bootstrap's Nix patching.
Signed-off-by: David Wood <david.wood@huawei.com>
Pin bootstrap checksums and add a tool to update it automatically
⚠️⚠️ This is just a proactive hardening we're performing on the build system, and it's not prompted by any known compromise. If you're aware of security issues being exploited please [check out our responsible disclosure page](https://www.rust-lang.org/policies/security). ⚠️⚠️
---
This PR aims to improve Rust's supply chain security by pinning the checksums of the bootstrap compiler downloaded by `x.py`, preventing a compromised `static.rust-lang.org` from affecting building the compiler. The checksums are stored in `src/stage0.json`, which replaces `src/stage0.txt`. This PR also adds a tool to automatically update the bootstrap compiler.
The changes in this PR were originally discussed in [Zulip](https://zulip-archive.rust-lang.org/stream/241545-t-release/topic/pinning.20stage0.20hashes.html).
## Potential attack
Before this PR, an attacker who wanted to compromise the bootstrap compiler would "just" need to:
1. Gain write access to `static.rust-lang.org`, either by compromising DNS or the underlying storage.
2. Upload compromised binaries and corresponding `.sha256` files to `static.rust-lang.org`.
There is no signature verification in `x.py` as we don't want the build system to depend on GPG. Also, since the checksums were not pinned inside the repository, they were downloaded from `static.rust-lang.org` too: this only protected from accidental changes in `static.rust-lang.org` that didn't change the `*.sha256` files. The attack would allow the attacker to compromise past and future invocations of `x.py`.
## Mitigations introduced in this PR
This PR adds pinned checksums for all the bootstrap components in `src/stage0.json` instead of downloading the checksums from `static.rust-lang.org`. This changes the attack scenario to:
1. Gain write access to `static.rust-lang.org`, either by compromising DNS or the underlying storage.
2. Upload compromised binaries to `static.rust-lang.org`.
3. Land a (reviewed) change in the `rust-lang/rust` repository changing the pinned hashes.
Even with a successful attack, existing clones of the Rust repository won't be affected, and once the attack is detected reverting the pinned hashes changes should be enough to be protected from the attack. This also enables further mitigations to be implemented in following PRs, such as verifying signatures when pinning new checksums (removing the trust on first use aspect of this PR) and adding a check in CI making sure a PR updating the checksum has not been tampered with (see the future improvements section).
## Additional changes
There are additional changes implemented in this PR to enable the mitigation:
* The `src/stage0.txt` file has been replaced with `src/stage0.json`. The reasoning for the change is that there is existing tooling to read and manipulate JSON files compared to the custom format we were using before, and the slight challenge of manually editing JSON files (no comments, no trailing commas) are not a problem thanks to the new `bump-stage0`.
* A new tool has been added to the repository, `bump-stage0`. When invoked, the tool automatically calculates which release should be used as the bootstrap compiler given the current version and channel, gathers all the relevant checksums and updates `src/stage0.json`. The tool can be invoked by running:
```
./x.py run src/tools/bump-stage0
```
* Support for downloading releases from `https://dev-static.rust-lang.org` has been removed, as it's not possible to verify checksums there (it's customary to replace existing artifacts there if a rebuild is warranted). This will require a change to the release process to avoid bumping the bootstrap compiler on beta before the stable release.
## Future improvements
* Add signature verification as part of `bump-stage0`, which would require the attacker to also obtain the release signing keys in order to successfully compromise the bootstrap compiler. This would be fine to add now, as the burden of installing the tool to verify signatures would only be placed on whoever updates the bootstrap compiler, instead of everyone compiling Rust.
* Add a check on CI that ensures the checksums in `src/stage0.json` are the expected ones. If a PR changes the stage0 file CI should also run the `bump-stage0` tool and fail if the output in CI doesn't match the committed file. This prevents the PR author from tweaking the output of the tool manually, which would otherwise be close to impossible for a human to detect.
* Automate creating the PRs bumping the bootstrap compiler, by setting up a scheduled job in GitHub Actions that runs the tool and opens a PR.
* Investigate whether a similar mitigation can be done for "download from CI" components like the prebuilt LLVM.
r? `@Mark-Simulacrum`
The architecture auto-detect table has no entry for riscv64 (which rustc
uses riscv64gc for the first part of triplet, assuming it's a generic
Linux distro).
Add it to the table to allow riscv64 systems to bootstrap Rust.
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Only look for commits by bors that are merge commits, because
those are the only ones with CI artifacts. Also, use
`--first-parent` to avoid traversing stuff like rollup branches.
Use `git rev-list` instead of `git log` to be more robust against
UI changes in git. Also, use the full email address for bors,
because `--author` uses a substring match.
When determining which LLVM artifacts to download, bootstrap.py
calls: `git log --author=bors --format=%H -n1 -m --first-parent --
src/llvm-project src/bootstrap/download-ci-llvm-stamp src/version`.
However, the `-m` option has no effect, per the `git log` help:
> -m
> This option makes diff output for merge commits to be shown in the
> default format. -m will produce the output only if -p is given as
> well. The default format could be changed using log.diffMerges
> configuration parameter, which default value is separate.
Accordingly, this commit removes use of the -m option in favor of
`--no-patch`, to make clear that this command should never output
diff information, as the SHA-1 hash is the only desired output.
Tested using git 2.32, this does not change the
output of the command.
The motivation for this change is that some patched versions of git
change the behavior of the `-m` flag to imply `-p`, rather than to do
nothing unless `-p` is passed. These patched versions of git lead to
this script not working. Google's corp-provided git is one such example.
This only updates the submodules the first time they're needed, instead
of unconditionally the first time you run x.py.
Ideally, this would move *all* submodules and not exclude some tools and
backtrace. Unfortunately, cargo requires all `Cargo.toml` files in the
whole workspace to be present to build any crate.
On my machine, this takes the time for an initial submodule clone (for
`x.py --help`) from 55.70 to 15.87 seconds.
This uses exactly the same logic as the LLVM update used, modulo some
minor cleanups:
- Use a local variable for `src.join(relative_path)`
- Remove unnecessary arrays for `book!` macro and make the macro simpler to use
- Add more comments