Clean up some Miri things in `sys/windows`
- remove miri hack that is only needed for win7 (we don't support win7 as a target in Miri)
- remove outdated comment now that Miri is on CI
Do not output () on empty description
When passing an explicitly empty description string, as explained here https://github.com/rust-lang/rust/blob/master/config.example.toml#L611-L613, my expectation is that the resulting rustc will be compatible with upstream.
However, it seems that instead, a `()` is added to the end of the version string, causing the version compatibility check to fail. My proposed fix here would be to instead only print `({description})` if `description` is a non-empty string.
Optimize `escape_ascii` using a lookup table
Based upon my suggestion here: https://github.com/rust-lang/rust/pull/125340#issuecomment-2130441817
Effectively, we can take advantage of the fact that ASCII only needs 7 bits to make the eighth bit store whether the value should be escaped or not. This adds a 256-byte lookup table, but 256 bytes *should* be small enough that very few people will mind, according to my probably not incontrovertible opinion.
The generated assembly isn't clearly better (although has fewer branches), so, I decided to benchmark on three inputs: first on a random 200KiB, then on `/bin/cat`, then on `Cargo.toml` for this repo. In all cases, the generated code ran faster on my machine. (an old i7-8700)
But, if you want to try my benchmarking code for yourself:
<details><summary>Criterion code below. Replace <code>/home/ltdk/rustsrc</code> with the appropriate directory.</summary>
```rust
#![feature(ascii_char)]
#![feature(ascii_char_variants)]
#![feature(const_option)]
#![feature(let_chains)]
use core::ascii;
use core::ops::Range;
use criterion::{criterion_group, criterion_main, Criterion};
use rand::{thread_rng, Rng};
const HEX_DIGITS: [ascii::Char; 16] = *b"0123456789abcdef".as_ascii().unwrap();
#[inline]
const fn backslash<const N: usize>(a: ascii::Char) -> ([ascii::Char; N], Range<u8>) {
const { assert!(N >= 2) };
let mut output = [ascii::Char::Null; N];
output[0] = ascii::Char::ReverseSolidus;
output[1] = a;
(output, 0..2)
}
#[inline]
const fn hex_escape<const N: usize>(byte: u8) -> ([ascii::Char; N], Range<u8>) {
const { assert!(N >= 4) };
let mut output = [ascii::Char::Null; N];
let hi = HEX_DIGITS[(byte >> 4) as usize];
let lo = HEX_DIGITS[(byte & 0xf) as usize];
output[0] = ascii::Char::ReverseSolidus;
output[1] = ascii::Char::SmallX;
output[2] = hi;
output[3] = lo;
(output, 0..4)
}
#[inline]
const fn verbatim<const N: usize>(a: ascii::Char) -> ([ascii::Char; N], Range<u8>) {
const { assert!(N >= 1) };
let mut output = [ascii::Char::Null; N];
output[0] = a;
(output, 0..1)
}
/// Escapes an ASCII character.
///
/// Returns a buffer and the length of the escaped representation.
const fn escape_ascii_old<const N: usize>(byte: u8) -> ([ascii::Char; N], Range<u8>) {
const { assert!(N >= 4) };
match byte {
b'\t' => backslash(ascii::Char::SmallT),
b'\r' => backslash(ascii::Char::SmallR),
b'\n' => backslash(ascii::Char::SmallN),
b'\\' => backslash(ascii::Char::ReverseSolidus),
b'\'' => backslash(ascii::Char::Apostrophe),
b'\"' => backslash(ascii::Char::QuotationMark),
0x00..=0x1F => hex_escape(byte),
_ => match ascii::Char::from_u8(byte) {
Some(a) => verbatim(a),
None => hex_escape(byte),
},
}
}
/// Escapes an ASCII character.
///
/// Returns a buffer and the length of the escaped representation.
const fn escape_ascii_new<const N: usize>(byte: u8) -> ([ascii::Char; N], Range<u8>) {
/// Lookup table helps us determine how to display character.
///
/// Since ASCII characters will always be 7 bits, we can exploit this to store the 8th bit to
/// indicate whether the result is escaped or unescaped.
///
/// We additionally use 0x80 (escaped NUL character) to indicate hex-escaped bytes, since
/// escaped NUL will not occur.
const LOOKUP: [u8; 256] = {
let mut arr = [0; 256];
let mut idx = 0;
loop {
arr[idx as usize] = match idx {
// use 8th bit to indicate escaped
b'\t' => 0x80 | b't',
b'\r' => 0x80 | b'r',
b'\n' => 0x80 | b'n',
b'\\' => 0x80 | b'\\',
b'\'' => 0x80 | b'\'',
b'"' => 0x80 | b'"',
// use NUL to indicate hex-escaped
0x00..=0x1F | 0x7F..=0xFF => 0x80 | b'\0',
_ => idx,
};
if idx == 255 {
break;
}
idx += 1;
}
arr
};
let lookup = LOOKUP[byte as usize];
// 8th bit indicates escape
let lookup_escaped = lookup & 0x80 != 0;
// SAFETY: We explicitly mask out the eighth bit to get a 7-bit ASCII character.
let lookup_ascii = unsafe { ascii::Char::from_u8_unchecked(lookup & 0x7F) };
if lookup_escaped {
// NUL indicates hex-escaped
if matches!(lookup_ascii, ascii::Char::Null) {
hex_escape(byte)
} else {
backslash(lookup_ascii)
}
} else {
verbatim(lookup_ascii)
}
}
fn escape_bytes(bytes: &[u8], f: impl Fn(u8) -> ([ascii::Char; 4], Range<u8>)) -> Vec<ascii::Char> {
let mut vec = Vec::new();
for b in bytes {
let (buf, range) = f(*b);
vec.extend_from_slice(&buf[range.start as usize..range.end as usize]);
}
vec
}
pub fn criterion_benchmark(c: &mut Criterion) {
let mut group = c.benchmark_group("escape_ascii");
group.sample_size(1000);
let rand_200k = &mut [0; 200 * 1024];
thread_rng().fill(&mut rand_200k[..]);
let cat = include_bytes!("/bin/cat");
let cargo_toml = include_bytes!("/home/ltdk/rustsrc/Cargo.toml");
group.bench_function("old_rand", |b| {
b.iter(|| escape_bytes(rand_200k, escape_ascii_old));
});
group.bench_function("new_rand", |b| {
b.iter(|| escape_bytes(rand_200k, escape_ascii_new));
});
group.bench_function("old_bin", |b| {
b.iter(|| escape_bytes(cat, escape_ascii_old));
});
group.bench_function("new_bin", |b| {
b.iter(|| escape_bytes(cat, escape_ascii_new));
});
group.bench_function("old_cargo_toml", |b| {
b.iter(|| escape_bytes(cargo_toml, escape_ascii_old));
});
group.bench_function("new_cargo_toml", |b| {
b.iter(|| escape_bytes(cargo_toml, escape_ascii_new));
});
group.finish();
}
criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
```
</details>
My benchmark results:
```
escape_ascii/old_rand time: [1.6965 ms 1.7006 ms 1.7053 ms]
Found 22 outliers among 1000 measurements (2.20%)
4 (0.40%) high mild
18 (1.80%) high severe
escape_ascii/new_rand time: [1.6749 ms 1.6953 ms 1.7158 ms]
Found 38 outliers among 1000 measurements (3.80%)
38 (3.80%) high mild
escape_ascii/old_bin time: [224.59 µs 225.40 µs 226.33 µs]
Found 39 outliers among 1000 measurements (3.90%)
17 (1.70%) high mild
22 (2.20%) high severe
escape_ascii/new_bin time: [164.86 µs 165.63 µs 166.58 µs]
Found 107 outliers among 1000 measurements (10.70%)
43 (4.30%) high mild
64 (6.40%) high severe
escape_ascii/old_cargo_toml
time: [23.397 µs 23.699 µs 24.014 µs]
Found 204 outliers among 1000 measurements (20.40%)
21 (2.10%) high mild
183 (18.30%) high severe
escape_ascii/new_cargo_toml
time: [16.404 µs 16.438 µs 16.483 µs]
Found 88 outliers among 1000 measurements (8.80%)
56 (5.60%) high mild
32 (3.20%) high severe
```
Random: 1.7006ms => 1.6953ms (<1% speedup)
Binary: 225.40µs => 165.63µs (26% speedup)
Text: 23.699µs => 16.438µs (30% speedup)
Rollup of 6 pull requests
Successful merges:
- #131086 (Update unicode-width to 0.2.0)
- #131585 (compiletest: Remove the one thing that was checking a directive's `original_line`)
- #131614 (Error on trying to use revisions in `run-make` tests)
- #131638 (compiletest: Move debugger setup code out of `lib.rs`)
- #131641 (switch unicode-data bitsets back to 'static')
- #131642 (Special case error message for a `build-fail` test that failed check build)
r? `@ghost`
`@rustbot` modify labels: rollup
Special case error message for a `build-fail` test that failed check build
A `build-fail` test requires that a check build (roughly `--emit=metadata`, no codegen) succeeds but fails later. Previously, if its check build failed, the user will see the error message
```
error: test compilation failed although it shouldn't!
```
which is confusing. Because the test is `build-fail`, we want the test compilation to fail! This error message doesn't account for the difference between a check build and a complete build, so let's special case the error message for a `build-fail` test whose check build failed to instead say
```
error: `build-fail` test is required to pass check build, but check build failed
```
Fixes#130894.
compiletest: Move debugger setup code out of `lib.rs`
These functions contain a few hundred lines of code for dealing with debuggers (for `debuginfo` tests), and don't really belong in the crate root.
Moving them out to their own module makes `lib.rs` easier to follow.
compiletest: Remove the one thing that was checking a directive's `original_line`
This special handling of `ignore-tidy*` was introduced during the migration to `//`@`` directives (#120881), and has become unnecessary after the subsequent removal of the legacy directive check (#131392).
Fixed get/set thread name implementations for macOS and FreeBSD
So, the story of fixing `pthread_getname_np` and `pthread_setname_np` continues, but this time I fixed the macOS implementation.
### [`pthread_getname_np`](c032e0b076/src/pthread.c (L1160-L1175))
The function never fails except for an invalid thread. Miri never verifies thread identifiers and uses them as indices when accessing a vector of threads. Therefore, the only possible result is `0` and a possibly trimmed output.
```c
int
pthread_getname_np(pthread_t thread, char *threadname, size_t len)
{
if (thread == pthread_self()) {
strlcpy(threadname, thread->pthread_name, len);
return 0;
}
if (!_pthread_validate_thread_and_list_lock(thread)) {
return ESRCH;
}
strlcpy(threadname, thread->pthread_name, len);
_pthread_lock_unlock(&_pthread_list_lock);
return 0;
}
```
#### [`strcpy`](https://www.man7.org/linux/man-pages/man7/strlcpy.7.html)
```
strlcpy(3bsd)
strlcat(3bsd)
Copy and catenate the input string into a destination
string. If the destination buffer, limited by its size,
isn't large enough to hold the copy, the resulting string
is truncated (but it is guaranteed to be null-terminated).
They return the length of the total string they tried to
create.
```
### [`pthread_setname_np`](c032e0b076/src/pthread.c (L1178-L1200))
```c
pthread_setname_np(const char *name)
{
int res;
pthread_t self = pthread_self();
size_t len = 0;
if (name != NULL) {
len = strlen(name);
}
_pthread_validate_signature(self);
res = __proc_info(5, getpid(), 2, (uint64_t)0, (void*)name, (int)len);
if (res == 0) {
if (len > 0) {
strlcpy(self->pthread_name, name, MAXTHREADNAMESIZE);
} else {
bzero(self->pthread_name, MAXTHREADNAMESIZE);
}
}
return res;
}
```
Where `5` is [`PROC_INFO_CALL_SETCONTROL`](8d741a5de7/bsd/sys/proc_info_private.h (L274)), and `2` is [`PROC_INFO_CALL_SETCONTROL`](8d741a5de7/bsd/sys/proc_info.h (L821)). And `__proc_info` is a syscall handled by the XNU kernel by [`proc_info_internal`](8d741a5de7/bsd/kern/proc_info.c (L300-L314)):
```c
int
proc_info_internal(int callnum, int pid, uint32_t flags, uint64_t ext_id, int flavor, uint64_t arg, user_addr_t buffer, uint32_t buffersize, int32_t * retval)
{
switch (callnum) {
// ...
case PROC_INFO_CALL_SETCONTROL:
return proc_setcontrol(pid, flavor, arg, buffer, buffersize, retval);
```
And the actual logic from [`proc_setcontrol`](8d741a5de7/bsd/kern/proc_info.c (L3218-L3227)):
```c
case PROC_SELFSET_THREADNAME: {
/*
* This is a bit ugly, as it copies the name into the kernel, and then
* invokes bsd_setthreadname again to copy it into the uthread name
* buffer. Hopefully this isn't such a hot codepath that an additional
* MAXTHREADNAMESIZE copy is a big issue.
*/
if (buffersize > (MAXTHREADNAMESIZE - 1)) {
return ENAMETOOLONG;
}
```
Unrelated to the current pull request, but perhaps, there's a very ugly thing in the kernel/libc because the last thing happening in `PROC_SELFSET_THREADNAME` is `bsd_setthreadname` which sets the name in the user space. But we just saw that `pthread_setname_np` sets the name in the user space too. Guess, I need to open a ticket in one of Apple's repositories at least to clarify that :D
Remap path prefix in the panic message of `tests/ui/meta/revision-bad.rs`
Otherwise `error-pattern` on the test run stderr can incorrectly match if the paths in panic backtrace has a matching substring (like if we look for `bar` in the error pattern, but the username is `baron`).
Tested locally by checking run output `./x test .\tests\ui\meta\revision-bad.rs -- -- --nocapture`:
```
--- stderr -------------------------------
thread 'main' panicked at remapped\meta\revision-bad.rs:14:5:
foo
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
------------------------------------------
```
Fixes#130996.
Use throw intrinsic from stdarch in wasm libunwind
Tracking issue: #118168
This is a very belated followup to #121438; now that rust-lang/stdarch#1542 is merged, we can use the intrinsic exported from `core::arch` instead of defining it inline. I also cleaned up the cfgs a bit and added a more detailed comment.
force "HEAD" for non-CI and `git_upstream_merge_base` for CI environment
When rust-lang/rust is configured as remote, some of the git logic (for tracking changed files) that uses get_closest_merge_commit starts to produce annoying results as the upstream branch becomes outdated quickly (since it isn't updated with git pull). We can rely on HEAD for non-CI environments as we specifically treat bors commits as merge commits, which also exist on upstream. As for CI environments, we should use `git_upstream_merge_base` to correctly track modified files as bors commits may be in `HEAD` but not yet on the upstream remote.
This is also an alternative fix for https://github.com/rust-lang/rust/issues/129528 since https://github.com/rust-lang/rust/pull/131331 reverts the previous fix attempts.
remove const_cow_is_borrowed feature gate
The two functions guarded by this are still unstable, and there's no reason to require a separate feature gate for their const-ness -- we can just have `cow_is_borrowed` cover both kinds of stability.
Cc #65143
Shallowly match opaque key in storage
Using a full eq on the key *and* the hidden type means that in cases where we first ambiguously register a `?t` hidden type then constrain that `?t` to be a type that doesn't actually satisfy its bounds, we end up with bogus entries in the opaque type storage. We should commit to the type in the storage if it's registered.
r? lcnr