- Switch from `walkdir` to `ignore`. This required various changes to
make `skip` thread-safe.
- Ignore `build` anywhere in the source tree, not just at the top-level.
We support this in bootstrap, we should support it in tidy too.
As a nice side benefit, this also makes tidy a bit faster.
Before:
```
; hyperfine -i '"/home/gh-jyn514/rust2/build/aarch64-unknown-linux-gnu/stage0-tools-bin/rust-tidy" "/home/gh-jyn514/rust2" "/home/gh-jyn514/rust2/build/aarch64-unknown-linux-gnu/stage0/bin/cargo" "/home/gh-jyn514/rust2/build" "32"'
Benchmark 1: "/home/gh-jyn514/rust2/build/aarch64-unknown-linux-gnu/stage0-tools-bin/rust-tidy" "/home/gh-jyn514/rust2" "/home/gh-jyn514/rust2/build/aarch64-unknown-linux-gnu/stage0/bin/cargo" "/home/gh-jyn514/rust2/build" "32"
Time (mean ± σ): 1.080 s ± 0.008 s [User: 2.616 s, System: 3.243 s]
Range (min … max): 1.069 s … 1.099 s 10 runs
```
After:
```
; hyperfine '"/home/gh-jyn514/rust2/build/aarch64-unknown-linux-gnu/stage0-tools-bin/rust-tidy" "/home/gh-jyn514/rust2" "/home/gh-jyn514/rust2/build/aarch64-unknown-linux-gnu/stage0/bin/cargo" "/home/gh-jyn514/rust2/build" "32"'
Benchmark 1: "/home/gh-jyn514/rust2/build/aarch64-unknown-linux-gnu/stage0-tools-bin/rust-tidy" "/home/gh-jyn514/rust2" "/home/gh-jyn514/rust2/build/aarch64-unknown-linux-gnu/stage0/bin/cargo" "/home/gh-jyn514/rust2/build" "32"
Time (mean ± σ): 705.0 ms ± 1.4 ms [User: 3179.1 ms, System: 1517.5 ms]
Range (min … max): 702.3 ms … 706.9 ms 10 runs
```
bootstrap: Add llvm-has-rust-patches target option
This is so you can check out an upstream commit in src/llvm-project and
have everything just work.
This simplifies the logic in `is_rust_llvm` a bit; it doesn't need to
check for download-ci-llvm because we would have already errored if both
that and llvm-config were specified on the host platform.
To avoid rust-analyzer and rustc having to wait for each other,
the dev guide mentions using another build directory for RA.
We should also put this into the .gitignore, just like the normal `build`.
These used to be used by codegen-units tests, but were switched from manually specifying directories
to just using `// incremental` in https://github.com/rust-lang/rust/pull/89101.
Remove the old references.
`**node_modules` in a .gitignore is the same than
`*node_modules` or `*****node_modules`.
It matches every file whose name ends with `node_modules`,
including `not_node_modules`.
The intent here was obviously to have `**/node_modules`
which is the same than just `node_modules`.
This adds a binary called `x` in `src/tools/x`. All it does is check the
current directory and its ancestors for a file called `x.py`, and if it
finds one, runs it.
By installing x, you can easily `x.py` from any subdirectory.
It can be installed globally with `cargo install --path src/tools/x`
.gitignore should not ignore files that exist in the repository. The
ignore of .cargo applies to the committed .cargo directory used in an
example:
$ git ls-files --exclude-standard --ignored
src/test/run-make/thumb-none-qemu/example/.cargo/config
Explicitly un-ignore that file.
Refactor unicode.py script
Hi, I noticed that the `unicode.py` script used some deprecated escapes in regular expressions. E.g. `\d`, `\w`, `\.` will be illegal in the future without "raw strings". This is now fixed. I have also cleaned up the script quite a bit.
## Escape deprecation
OK (note the `r`):
`re.compile(r"\d")`
Deprecated (from Python 3.6 onwards, see [here][link1] and [here][link2]):
`re.compile("\d")`.
[link1]: https://docs.python.org/3.6/whatsnew/3.6.html#deprecated-python-behavior
[link2]: https://bugs.python.org/issue27364
This was evident running the script using Python 3.7 like so:
```
$ python3 -Wall unicode.py
unicode.py:227: DeprecationWarning: invalid escape sequence \w
re1 = re.compile("^ *([0-9A-F]+) *; *(\w+)")
unicode.py:228: DeprecationWarning: invalid escape sequence \.
re2 = re.compile("^ *([0-9A-F]+)\.\.([0-9A-F]+) *; *(\w+)")
unicode.py:453: DeprecationWarning: invalid escape sequence \d
pattern = "for Version (\d+)\.(\d+)\.(\d+) of the Unicode"
```
The documentation states that
> A backslash-character pair that is not a valid escape sequence now generates a DeprecationWarning. Although this will eventually become a SyntaxError, that will not be for several Python releases.
## Testing
To test my changes, I had to add support for choosing the Unicode version to use. The script will default to latest release (which is 12.0.0 at the moment, repo has 11.0.0 checked in).
The script generates the exact same output for version 11.0.0 with Python 2.7 and 3.7 and no longer generates any deprecation warnings:
```
$ python3 -Wall unicode.py -v 11.0.0
Using Unicode version: 11.0.0
Regenerated tables.rs.
$ git diff tables.rs
$ python2 -Wall unicode.py -v 11.0.0
Using Unicode version: 11.0.0
Regenerated tables.rs.
$ git diff tables.rs
$ python2 --version
Python 2.7.16
$ python3 --version
Python 3.7.3
```
## Extra functionality
Furthermore, the script will check and download the latest Unicode version by default (without the `-v` argument). The `--help` is below:
```
$ ./unicode.py --help
usage: unicode.py [-h] [-v VERSION]
Regenerate Unicode tables (tables.rs).
optional arguments:
-h, --help show this help message and exit
-v VERSION, --version VERSION
Unicode version to use (if not specified, defaults to
latest available final release).
```
## Cleanups
I have cleaned up the code quite a bit, with Python best practices and code style in mind. I'm happy to provide more details and rationale for all my changes if the reviewers so desire.
One externally visible change is that the Unicode data will now be downloaded into `src/libcore/unicode/downloaded` directory suffixed by Unicode version:
```
$ pwd
.../rust/src/libcore/unicode
$ exa -T downloaded/
downloaded
├── 11.0.0
│ ├── DerivedCoreProperties.txt
│ ├── DerivedNormalizationProps.txt
│ ├── PropList.txt
│ ├── ReadMe.txt
│ ├── Scripts.txt
│ ├── SpecialCasing.txt
│ └── UnicodeData.txt
└── 12.0.0
├── DerivedCoreProperties.txt
├── DerivedNormalizationProps.txt
├── PropList.txt
├── ReadMe.txt
├── Scripts.txt
├── SpecialCasing.txt
└── UnicodeData.txt
```
`BoolTrie` works well for sets of code points spread out through
most of Unicode’s range, but is uses a lot of space for sets
with few, mostly low, code points.
This switches a few of its instances to a similar but simpler trie
data structure.
## Before
`size_of::<BoolTrie>()` is 1552, which is added to
`table.r3.len() * 8 + t.r5.len() + t.r6.len() * 8`:
* `Cc_table`: 1632
* `White_Space_table`: 1656
* `Pattern_White_Space_table`: 1640
* Total: 4928 bytes
## After
`size_of::<SmallBoolTrie>()` is 32, which is added to
`t.r1.len() + t.r2.len() * 8`:
* `Cc_table`: 51
* `White_Space_table`: 273
* `Pattern_White_Space_table`: 193
* Total: 517 bytes
## Difference
Every Rust program with `std` statically linked should be about 4 KB smaller.