bootstrap: Add llvm-has-rust-patches target option
This is so you can check out an upstream commit in src/llvm-project and
have everything just work.
This simplifies the logic in `is_rust_llvm` a bit; it doesn't need to
check for download-ci-llvm because we would have already errored if both
that and llvm-config were specified on the host platform.
To avoid rust-analyzer and rustc having to wait for each other,
the dev guide mentions using another build directory for RA.
We should also put this into the .gitignore, just like the normal `build`.
These used to be used by codegen-units tests, but were switched from manually specifying directories
to just using `// incremental` in https://github.com/rust-lang/rust/pull/89101.
Remove the old references.
`**node_modules` in a .gitignore is the same than
`*node_modules` or `*****node_modules`.
It matches every file whose name ends with `node_modules`,
including `not_node_modules`.
The intent here was obviously to have `**/node_modules`
which is the same than just `node_modules`.
This adds a binary called `x` in `src/tools/x`. All it does is check the
current directory and its ancestors for a file called `x.py`, and if it
finds one, runs it.
By installing x, you can easily `x.py` from any subdirectory.
It can be installed globally with `cargo install --path src/tools/x`
.gitignore should not ignore files that exist in the repository. The
ignore of .cargo applies to the committed .cargo directory used in an
example:
$ git ls-files --exclude-standard --ignored
src/test/run-make/thumb-none-qemu/example/.cargo/config
Explicitly un-ignore that file.
Refactor unicode.py script
Hi, I noticed that the `unicode.py` script used some deprecated escapes in regular expressions. E.g. `\d`, `\w`, `\.` will be illegal in the future without "raw strings". This is now fixed. I have also cleaned up the script quite a bit.
## Escape deprecation
OK (note the `r`):
`re.compile(r"\d")`
Deprecated (from Python 3.6 onwards, see [here][link1] and [here][link2]):
`re.compile("\d")`.
[link1]: https://docs.python.org/3.6/whatsnew/3.6.html#deprecated-python-behavior
[link2]: https://bugs.python.org/issue27364
This was evident running the script using Python 3.7 like so:
```
$ python3 -Wall unicode.py
unicode.py:227: DeprecationWarning: invalid escape sequence \w
re1 = re.compile("^ *([0-9A-F]+) *; *(\w+)")
unicode.py:228: DeprecationWarning: invalid escape sequence \.
re2 = re.compile("^ *([0-9A-F]+)\.\.([0-9A-F]+) *; *(\w+)")
unicode.py:453: DeprecationWarning: invalid escape sequence \d
pattern = "for Version (\d+)\.(\d+)\.(\d+) of the Unicode"
```
The documentation states that
> A backslash-character pair that is not a valid escape sequence now generates a DeprecationWarning. Although this will eventually become a SyntaxError, that will not be for several Python releases.
## Testing
To test my changes, I had to add support for choosing the Unicode version to use. The script will default to latest release (which is 12.0.0 at the moment, repo has 11.0.0 checked in).
The script generates the exact same output for version 11.0.0 with Python 2.7 and 3.7 and no longer generates any deprecation warnings:
```
$ python3 -Wall unicode.py -v 11.0.0
Using Unicode version: 11.0.0
Regenerated tables.rs.
$ git diff tables.rs
$ python2 -Wall unicode.py -v 11.0.0
Using Unicode version: 11.0.0
Regenerated tables.rs.
$ git diff tables.rs
$ python2 --version
Python 2.7.16
$ python3 --version
Python 3.7.3
```
## Extra functionality
Furthermore, the script will check and download the latest Unicode version by default (without the `-v` argument). The `--help` is below:
```
$ ./unicode.py --help
usage: unicode.py [-h] [-v VERSION]
Regenerate Unicode tables (tables.rs).
optional arguments:
-h, --help show this help message and exit
-v VERSION, --version VERSION
Unicode version to use (if not specified, defaults to
latest available final release).
```
## Cleanups
I have cleaned up the code quite a bit, with Python best practices and code style in mind. I'm happy to provide more details and rationale for all my changes if the reviewers so desire.
One externally visible change is that the Unicode data will now be downloaded into `src/libcore/unicode/downloaded` directory suffixed by Unicode version:
```
$ pwd
.../rust/src/libcore/unicode
$ exa -T downloaded/
downloaded
├── 11.0.0
│ ├── DerivedCoreProperties.txt
│ ├── DerivedNormalizationProps.txt
│ ├── PropList.txt
│ ├── ReadMe.txt
│ ├── Scripts.txt
│ ├── SpecialCasing.txt
│ └── UnicodeData.txt
└── 12.0.0
├── DerivedCoreProperties.txt
├── DerivedNormalizationProps.txt
├── PropList.txt
├── ReadMe.txt
├── Scripts.txt
├── SpecialCasing.txt
└── UnicodeData.txt
```
`BoolTrie` works well for sets of code points spread out through
most of Unicode’s range, but is uses a lot of space for sets
with few, mostly low, code points.
This switches a few of its instances to a similar but simpler trie
data structure.
## Before
`size_of::<BoolTrie>()` is 1552, which is added to
`table.r3.len() * 8 + t.r5.len() + t.r6.len() * 8`:
* `Cc_table`: 1632
* `White_Space_table`: 1656
* `Pattern_White_Space_table`: 1640
* Total: 4928 bytes
## After
`size_of::<SmallBoolTrie>()` is 32, which is added to
`t.r1.len() + t.r2.len() * 8`:
* `Cc_table`: 51
* `White_Space_table`: 273
* `Pattern_White_Space_table`: 193
* Total: 517 bytes
## Difference
Every Rust program with `std` statically linked should be about 4 KB smaller.
A few changes are included here:
* The `winapi` and `url` dependencies were dropped. The source code for these
projects is pretty weighty, and we're about to vendor them, so let's not
commit to that intake just yet. If necessary we can vendor them later but for
now it shouldn't be necessary.
* The `--frozen` flag is now always passed to Cargo, obviating the need for
tidy's `cargo_lock` check.
* Tidy was updated to not check the vendor directory
Closes#34687
This is the build directory our buildbots use, and right now the bots are
running `git clean -f -f -d` to remove all untracked files between runs and this
is accidentally deleting `obj`, so we're building LLVM a lot.
Hopefully this keeps the bots caching `obj` so we can clean it out manually and
leave LLVM around.