Clean up some Miri things in `sys/windows`
- remove miri hack that is only needed for win7 (we don't support win7 as a target in Miri)
- remove outdated comment now that Miri is on CI
Do not output () on empty description
When passing an explicitly empty description string, as explained here https://github.com/rust-lang/rust/blob/master/config.example.toml#L611-L613, my expectation is that the resulting rustc will be compatible with upstream.
However, it seems that instead, a `()` is added to the end of the version string, causing the version compatibility check to fail. My proposed fix here would be to instead only print `({description})` if `description` is a non-empty string.
Optimize `escape_ascii` using a lookup table
Based upon my suggestion here: https://github.com/rust-lang/rust/pull/125340#issuecomment-2130441817
Effectively, we can take advantage of the fact that ASCII only needs 7 bits to make the eighth bit store whether the value should be escaped or not. This adds a 256-byte lookup table, but 256 bytes *should* be small enough that very few people will mind, according to my probably not incontrovertible opinion.
The generated assembly isn't clearly better (although has fewer branches), so, I decided to benchmark on three inputs: first on a random 200KiB, then on `/bin/cat`, then on `Cargo.toml` for this repo. In all cases, the generated code ran faster on my machine. (an old i7-8700)
But, if you want to try my benchmarking code for yourself:
<details><summary>Criterion code below. Replace <code>/home/ltdk/rustsrc</code> with the appropriate directory.</summary>
```rust
#![feature(ascii_char)]
#![feature(ascii_char_variants)]
#![feature(const_option)]
#![feature(let_chains)]
use core::ascii;
use core::ops::Range;
use criterion::{criterion_group, criterion_main, Criterion};
use rand::{thread_rng, Rng};
const HEX_DIGITS: [ascii::Char; 16] = *b"0123456789abcdef".as_ascii().unwrap();
#[inline]
const fn backslash<const N: usize>(a: ascii::Char) -> ([ascii::Char; N], Range<u8>) {
const { assert!(N >= 2) };
let mut output = [ascii::Char::Null; N];
output[0] = ascii::Char::ReverseSolidus;
output[1] = a;
(output, 0..2)
}
#[inline]
const fn hex_escape<const N: usize>(byte: u8) -> ([ascii::Char; N], Range<u8>) {
const { assert!(N >= 4) };
let mut output = [ascii::Char::Null; N];
let hi = HEX_DIGITS[(byte >> 4) as usize];
let lo = HEX_DIGITS[(byte & 0xf) as usize];
output[0] = ascii::Char::ReverseSolidus;
output[1] = ascii::Char::SmallX;
output[2] = hi;
output[3] = lo;
(output, 0..4)
}
#[inline]
const fn verbatim<const N: usize>(a: ascii::Char) -> ([ascii::Char; N], Range<u8>) {
const { assert!(N >= 1) };
let mut output = [ascii::Char::Null; N];
output[0] = a;
(output, 0..1)
}
/// Escapes an ASCII character.
///
/// Returns a buffer and the length of the escaped representation.
const fn escape_ascii_old<const N: usize>(byte: u8) -> ([ascii::Char; N], Range<u8>) {
const { assert!(N >= 4) };
match byte {
b'\t' => backslash(ascii::Char::SmallT),
b'\r' => backslash(ascii::Char::SmallR),
b'\n' => backslash(ascii::Char::SmallN),
b'\\' => backslash(ascii::Char::ReverseSolidus),
b'\'' => backslash(ascii::Char::Apostrophe),
b'\"' => backslash(ascii::Char::QuotationMark),
0x00..=0x1F => hex_escape(byte),
_ => match ascii::Char::from_u8(byte) {
Some(a) => verbatim(a),
None => hex_escape(byte),
},
}
}
/// Escapes an ASCII character.
///
/// Returns a buffer and the length of the escaped representation.
const fn escape_ascii_new<const N: usize>(byte: u8) -> ([ascii::Char; N], Range<u8>) {
/// Lookup table helps us determine how to display character.
///
/// Since ASCII characters will always be 7 bits, we can exploit this to store the 8th bit to
/// indicate whether the result is escaped or unescaped.
///
/// We additionally use 0x80 (escaped NUL character) to indicate hex-escaped bytes, since
/// escaped NUL will not occur.
const LOOKUP: [u8; 256] = {
let mut arr = [0; 256];
let mut idx = 0;
loop {
arr[idx as usize] = match idx {
// use 8th bit to indicate escaped
b'\t' => 0x80 | b't',
b'\r' => 0x80 | b'r',
b'\n' => 0x80 | b'n',
b'\\' => 0x80 | b'\\',
b'\'' => 0x80 | b'\'',
b'"' => 0x80 | b'"',
// use NUL to indicate hex-escaped
0x00..=0x1F | 0x7F..=0xFF => 0x80 | b'\0',
_ => idx,
};
if idx == 255 {
break;
}
idx += 1;
}
arr
};
let lookup = LOOKUP[byte as usize];
// 8th bit indicates escape
let lookup_escaped = lookup & 0x80 != 0;
// SAFETY: We explicitly mask out the eighth bit to get a 7-bit ASCII character.
let lookup_ascii = unsafe { ascii::Char::from_u8_unchecked(lookup & 0x7F) };
if lookup_escaped {
// NUL indicates hex-escaped
if matches!(lookup_ascii, ascii::Char::Null) {
hex_escape(byte)
} else {
backslash(lookup_ascii)
}
} else {
verbatim(lookup_ascii)
}
}
fn escape_bytes(bytes: &[u8], f: impl Fn(u8) -> ([ascii::Char; 4], Range<u8>)) -> Vec<ascii::Char> {
let mut vec = Vec::new();
for b in bytes {
let (buf, range) = f(*b);
vec.extend_from_slice(&buf[range.start as usize..range.end as usize]);
}
vec
}
pub fn criterion_benchmark(c: &mut Criterion) {
let mut group = c.benchmark_group("escape_ascii");
group.sample_size(1000);
let rand_200k = &mut [0; 200 * 1024];
thread_rng().fill(&mut rand_200k[..]);
let cat = include_bytes!("/bin/cat");
let cargo_toml = include_bytes!("/home/ltdk/rustsrc/Cargo.toml");
group.bench_function("old_rand", |b| {
b.iter(|| escape_bytes(rand_200k, escape_ascii_old));
});
group.bench_function("new_rand", |b| {
b.iter(|| escape_bytes(rand_200k, escape_ascii_new));
});
group.bench_function("old_bin", |b| {
b.iter(|| escape_bytes(cat, escape_ascii_old));
});
group.bench_function("new_bin", |b| {
b.iter(|| escape_bytes(cat, escape_ascii_new));
});
group.bench_function("old_cargo_toml", |b| {
b.iter(|| escape_bytes(cargo_toml, escape_ascii_old));
});
group.bench_function("new_cargo_toml", |b| {
b.iter(|| escape_bytes(cargo_toml, escape_ascii_new));
});
group.finish();
}
criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
```
</details>
My benchmark results:
```
escape_ascii/old_rand time: [1.6965 ms 1.7006 ms 1.7053 ms]
Found 22 outliers among 1000 measurements (2.20%)
4 (0.40%) high mild
18 (1.80%) high severe
escape_ascii/new_rand time: [1.6749 ms 1.6953 ms 1.7158 ms]
Found 38 outliers among 1000 measurements (3.80%)
38 (3.80%) high mild
escape_ascii/old_bin time: [224.59 µs 225.40 µs 226.33 µs]
Found 39 outliers among 1000 measurements (3.90%)
17 (1.70%) high mild
22 (2.20%) high severe
escape_ascii/new_bin time: [164.86 µs 165.63 µs 166.58 µs]
Found 107 outliers among 1000 measurements (10.70%)
43 (4.30%) high mild
64 (6.40%) high severe
escape_ascii/old_cargo_toml
time: [23.397 µs 23.699 µs 24.014 µs]
Found 204 outliers among 1000 measurements (20.40%)
21 (2.10%) high mild
183 (18.30%) high severe
escape_ascii/new_cargo_toml
time: [16.404 µs 16.438 µs 16.483 µs]
Found 88 outliers among 1000 measurements (8.80%)
56 (5.60%) high mild
32 (3.20%) high severe
```
Random: 1.7006ms => 1.6953ms (<1% speedup)
Binary: 225.40µs => 165.63µs (26% speedup)
Text: 23.699µs => 16.438µs (30% speedup)
Rollup of 6 pull requests
Successful merges:
- #131086 (Update unicode-width to 0.2.0)
- #131585 (compiletest: Remove the one thing that was checking a directive's `original_line`)
- #131614 (Error on trying to use revisions in `run-make` tests)
- #131638 (compiletest: Move debugger setup code out of `lib.rs`)
- #131641 (switch unicode-data bitsets back to 'static')
- #131642 (Special case error message for a `build-fail` test that failed check build)
r? `@ghost`
`@rustbot` modify labels: rollup
Special case error message for a `build-fail` test that failed check build
A `build-fail` test requires that a check build (roughly `--emit=metadata`, no codegen) succeeds but fails later. Previously, if its check build failed, the user will see the error message
```
error: test compilation failed although it shouldn't!
```
which is confusing. Because the test is `build-fail`, we want the test compilation to fail! This error message doesn't account for the difference between a check build and a complete build, so let's special case the error message for a `build-fail` test whose check build failed to instead say
```
error: `build-fail` test is required to pass check build, but check build failed
```
Fixes#130894.
compiletest: Move debugger setup code out of `lib.rs`
These functions contain a few hundred lines of code for dealing with debuggers (for `debuginfo` tests), and don't really belong in the crate root.
Moving them out to their own module makes `lib.rs` easier to follow.
compiletest: Remove the one thing that was checking a directive's `original_line`
This special handling of `ignore-tidy*` was introduced during the migration to `//`@`` directives (#120881), and has become unnecessary after the subsequent removal of the legacy directive check (#131392).
Fixed get/set thread name implementations for macOS and FreeBSD
So, the story of fixing `pthread_getname_np` and `pthread_setname_np` continues, but this time I fixed the macOS implementation.
### [`pthread_getname_np`](c032e0b076/src/pthread.c (L1160-L1175))
The function never fails except for an invalid thread. Miri never verifies thread identifiers and uses them as indices when accessing a vector of threads. Therefore, the only possible result is `0` and a possibly trimmed output.
```c
int
pthread_getname_np(pthread_t thread, char *threadname, size_t len)
{
if (thread == pthread_self()) {
strlcpy(threadname, thread->pthread_name, len);
return 0;
}
if (!_pthread_validate_thread_and_list_lock(thread)) {
return ESRCH;
}
strlcpy(threadname, thread->pthread_name, len);
_pthread_lock_unlock(&_pthread_list_lock);
return 0;
}
```
#### [`strcpy`](https://www.man7.org/linux/man-pages/man7/strlcpy.7.html)
```
strlcpy(3bsd)
strlcat(3bsd)
Copy and catenate the input string into a destination
string. If the destination buffer, limited by its size,
isn't large enough to hold the copy, the resulting string
is truncated (but it is guaranteed to be null-terminated).
They return the length of the total string they tried to
create.
```
### [`pthread_setname_np`](c032e0b076/src/pthread.c (L1178-L1200))
```c
pthread_setname_np(const char *name)
{
int res;
pthread_t self = pthread_self();
size_t len = 0;
if (name != NULL) {
len = strlen(name);
}
_pthread_validate_signature(self);
res = __proc_info(5, getpid(), 2, (uint64_t)0, (void*)name, (int)len);
if (res == 0) {
if (len > 0) {
strlcpy(self->pthread_name, name, MAXTHREADNAMESIZE);
} else {
bzero(self->pthread_name, MAXTHREADNAMESIZE);
}
}
return res;
}
```
Where `5` is [`PROC_INFO_CALL_SETCONTROL`](8d741a5de7/bsd/sys/proc_info_private.h (L274)), and `2` is [`PROC_INFO_CALL_SETCONTROL`](8d741a5de7/bsd/sys/proc_info.h (L821)). And `__proc_info` is a syscall handled by the XNU kernel by [`proc_info_internal`](8d741a5de7/bsd/kern/proc_info.c (L300-L314)):
```c
int
proc_info_internal(int callnum, int pid, uint32_t flags, uint64_t ext_id, int flavor, uint64_t arg, user_addr_t buffer, uint32_t buffersize, int32_t * retval)
{
switch (callnum) {
// ...
case PROC_INFO_CALL_SETCONTROL:
return proc_setcontrol(pid, flavor, arg, buffer, buffersize, retval);
```
And the actual logic from [`proc_setcontrol`](8d741a5de7/bsd/kern/proc_info.c (L3218-L3227)):
```c
case PROC_SELFSET_THREADNAME: {
/*
* This is a bit ugly, as it copies the name into the kernel, and then
* invokes bsd_setthreadname again to copy it into the uthread name
* buffer. Hopefully this isn't such a hot codepath that an additional
* MAXTHREADNAMESIZE copy is a big issue.
*/
if (buffersize > (MAXTHREADNAMESIZE - 1)) {
return ENAMETOOLONG;
}
```
Unrelated to the current pull request, but perhaps, there's a very ugly thing in the kernel/libc because the last thing happening in `PROC_SELFSET_THREADNAME` is `bsd_setthreadname` which sets the name in the user space. But we just saw that `pthread_setname_np` sets the name in the user space too. Guess, I need to open a ticket in one of Apple's repositories at least to clarify that :D
Remap path prefix in the panic message of `tests/ui/meta/revision-bad.rs`
Otherwise `error-pattern` on the test run stderr can incorrectly match if the paths in panic backtrace has a matching substring (like if we look for `bar` in the error pattern, but the username is `baron`).
Tested locally by checking run output `./x test .\tests\ui\meta\revision-bad.rs -- -- --nocapture`:
```
--- stderr -------------------------------
thread 'main' panicked at remapped\meta\revision-bad.rs:14:5:
foo
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
------------------------------------------
```
Fixes#130996.
Use throw intrinsic from stdarch in wasm libunwind
Tracking issue: #118168
This is a very belated followup to #121438; now that rust-lang/stdarch#1542 is merged, we can use the intrinsic exported from `core::arch` instead of defining it inline. I also cleaned up the cfgs a bit and added a more detailed comment.
force "HEAD" for non-CI and `git_upstream_merge_base` for CI environment
When rust-lang/rust is configured as remote, some of the git logic (for tracking changed files) that uses get_closest_merge_commit starts to produce annoying results as the upstream branch becomes outdated quickly (since it isn't updated with git pull). We can rely on HEAD for non-CI environments as we specifically treat bors commits as merge commits, which also exist on upstream. As for CI environments, we should use `git_upstream_merge_base` to correctly track modified files as bors commits may be in `HEAD` but not yet on the upstream remote.
This is also an alternative fix for https://github.com/rust-lang/rust/issues/129528 since https://github.com/rust-lang/rust/pull/131331 reverts the previous fix attempts.