Add a link validator to rustbuild
This commit was originally targeted at just adding a link checking script to the rustbuild system. This ended up snowballing a bit to extend rustbuild to be amenable to various tools we have as part of the build system in general.
There's a new `src/tools` directory which has a number of scripts/programs that are purely intended to be used as part of the build system and CI of this repository. This is currently inhabited by rustbook, the error index generator, and a new linkchecker script added as part of this PR. I suspect that more tools like compiletest, tidy scripts, snapshot scripts, etc will migrate their way into this directory over time.
The commit which adds the error index generator shows the steps necessary to add new tools to the build system, namely:
1. New steps are defined for building the tool and running the tool
2. The dependencies are configured
3. The steps are implemented
In terms of the link checker, these commits do a few things:
* A new `src/tools/linkchecker` script is added. This will read an entire documentation tree looking for broken relative links (HTTP links aren't followed yet).
* A large number of broken links throughout the documentation were fixed. Many of these were just broken when viewed from core as opposed to std, but were easily fixed.
* A few rustdoc bugs here and there were fixed
A few categories:
* Links into compiler docs were just all removed as we're not generating
compiler docs.
* Move up one more level to forcibly go to std docs to fix inlined documentation
across the facade crates.
Clarify documentation for string slicing (Index impls)
Clarify documentation for string slicing (Index impls)
- Mention the sugared syntax for the implementations, since it's not
apparent from the docs that `Index<Range<usize>>` corresponds to
`&self[a..b]`.
- Be specific in that start <= end and end <= len
This is just one fix in response to #32057
- Mention the sugared syntax for the implementations, since it's not
apparent from the docs that `Index<Range<usize>>` corresponds to
`&self[a..b]`.
- Be specific in that start <= end and end <= len
Fix panic on string slicing error to truncate the string
The string may be arbitrarily long, but we want to limit the panic
message to a reasonable length. Truncate the string if it is too long
(simply to char boundary).
Also add details to the start <= end message. I think it's ok to flesh
out the code here, since it's in a cold function.
Fixes#32063
This PR implements [RFC 1192](https://github.com/rust-lang/rfcs/blob/master/text/1192-inclusive-ranges.md), which is triple-dot syntax for inclusive range expressions. The new stuff is behind two feature gates (one for the syntax and one for the std::ops types). This replaces the deprecated functionality in std::iter. Along the way I simplified the desugaring for all ranges.
This is my first contribution to rust which changes more than one character outside of a test or comment, so please review carefully! Some of the individual commit messages have more of my notes. Also thanks for putting up with my dumb questions in #rust-internals.
- For implementing `std::ops::RangeInclusive`, I took @Stebalien's suggestion from https://github.com/rust-lang/rfcs/pull/1192#issuecomment-137864421. It seemed to me to make the implementation easier and increase type safety. If that stands, the RFC should be amended to avoid confusion.
- I also kind of like @glaebhoerl's [idea](https://github.com/rust-lang/rfcs/pull/1254#issuecomment-147815299), which is unified inclusive/exclusive range syntax something like `x>..=y`. We can experiment with this while everything is behind a feature gate.
- There are a couple of FIXMEs left (see the last commit). I didn't know what to do about `RangeArgument` and I haven't added `Index` impls yet. Those should be discussed/finished before merging.
cc @Gankro since you [complained](https://www.reddit.com/r/rust/comments/3xkfro/what_happened_to_inclusive_ranges/cy5j0yq)
cc #27777#30877rust-lang/rust#1192rust-lang/rfcs#1254
relevant to #28237 (tracking issue)
The string may be arbitrarily long, but we want to limit the panic
message to a reasonable length. Truncate the string if it is too long
(simply to char boundary).
Also add details to the start <= end message. I think it's ok to flesh
out the code here, since it's in a cold function.
Introduce private function from_raw_parts_mut for str to factor out the logic.
We want to use raw pointers here instead of duplicating a &mut str, to
be on safer ground w.r.t rust aliasing rules.
This commit stabilizes and deprecates the FCP (final comment period) APIs for
the upcoming 1.7 beta release. The specific APIs which changed were:
Stabilized
* `Path::strip_prefix` (renamed from `relative_from`)
* `path::StripPrefixError` (new error type returned from `strip_prefix`)
* `Ipv4Addr::is_loopback`
* `Ipv4Addr::is_private`
* `Ipv4Addr::is_link_local`
* `Ipv4Addr::is_multicast`
* `Ipv4Addr::is_broadcast`
* `Ipv4Addr::is_documentation`
* `Ipv6Addr::is_unspecified`
* `Ipv6Addr::is_loopback`
* `Ipv6Addr::is_unique_local`
* `Ipv6Addr::is_multicast`
* `Vec::as_slice`
* `Vec::as_mut_slice`
* `String::as_str`
* `String::as_mut_str`
* `<[T]>::clone_from_slice` - the `usize` return value is removed
* `<[T]>::sort_by_key`
* `i32::checked_rem` (and other signed types)
* `i32::checked_neg` (and other signed types)
* `i32::checked_shl` (and other signed types)
* `i32::checked_shr` (and other signed types)
* `i32::saturating_mul` (and other signed types)
* `i32::overflowing_add` (and other signed types)
* `i32::overflowing_sub` (and other signed types)
* `i32::overflowing_mul` (and other signed types)
* `i32::overflowing_div` (and other signed types)
* `i32::overflowing_rem` (and other signed types)
* `i32::overflowing_neg` (and other signed types)
* `i32::overflowing_shl` (and other signed types)
* `i32::overflowing_shr` (and other signed types)
* `u32::checked_rem` (and other unsigned types)
* `u32::checked_neg` (and other unsigned types)
* `u32::checked_shl` (and other unsigned types)
* `u32::saturating_mul` (and other unsigned types)
* `u32::overflowing_add` (and other unsigned types)
* `u32::overflowing_sub` (and other unsigned types)
* `u32::overflowing_mul` (and other unsigned types)
* `u32::overflowing_div` (and other unsigned types)
* `u32::overflowing_rem` (and other unsigned types)
* `u32::overflowing_neg` (and other unsigned types)
* `u32::overflowing_shl` (and other unsigned types)
* `u32::overflowing_shr` (and other unsigned types)
* `ffi::IntoStringError`
* `CString::into_string`
* `CString::into_bytes`
* `CString::into_bytes_with_nul`
* `From<CString> for Vec<u8>`
* `From<CString> for Vec<u8>`
* `IntoStringError::into_cstring`
* `IntoStringError::utf8_error`
* `Error for IntoStringError`
Deprecated
* `Path::relative_from` - renamed to `strip_prefix`
* `Path::prefix` - use `components().next()` instead
* `os::unix::fs` constants - moved to the `libc` crate
* `fmt::{radix, Radix, RadixFmt}` - not used enough to stabilize
* `IntoCow` - conflicts with `Into` and may come back later
* `i32::{BITS, BYTES}` (and other integers) - not pulling their weight
* `DebugTuple::formatter` - will be removed
* `sync::Semaphore` - not used enough and confused with system semaphores
Closes#23284
cc #27709 (still lots more methods though)
Closes#27712Closes#27722Closes#27728Closes#27735Closes#27729Closes#27755Closes#27782Closes#27798
This speeds up the ascii case (and long stretches of ascii in otherwise
mixed UTF-8 data) when checking UTF-8 validity.
Benchmark results suggest that on purely ASCII input, we can improve
throughput (megabytes verified / second) by a factor of 13 to 14!
On xml and mostly english language input (en.wikipedia xml dump),
throughput increases by a factor 7.
On mostly non-ASCII input, performance increases slightly or is the
same.
The UTF-8 validation is rewritten to use indexed access; since all
access is preceded by a (mandatory for validation) length check, they
are statically elided by llvm and this formulation is in fact the best
for performance. A previous version had losses due to slice to iterator
conversions.
A large credit to Björn Steinbrink who improved this patch immensely,
writing this second version.
Benchmark results on x86-64 (Sandy Bridge) compiled with -C opt-level=3.
Old code is `regular`, this PR is called `fast`.
Datasets:
- `ascii` is just ascii (2.5 kB)
- `cyr` is cyrillic script with ascii spaces (5 kB)
- `dewik10` is 10MB of a de.wikipedia xml dump
- `enwik10` is 100MB of an en.wikipedia xml dump
- `jawik10` is 10MB of a ja.wikipedia xml dump
```
test from_utf8_ascii_fast ... bench: 140 ns/iter (+/- 4) = 18221 MB/s
test from_utf8_ascii_regular ... bench: 1,932 ns/iter (+/- 19) = 1320 MB/s
test from_utf8_cyr_fast ... bench: 10,025 ns/iter (+/- 245) = 511 MB/s
test from_utf8_cyr_regular ... bench: 12,250 ns/iter (+/- 437) = 418 MB/s
test from_utf8_dewik10_fast ... bench: 6,017,909 ns/iter (+/- 105,755) = 1740 MB/s
test from_utf8_dewik10_regular ... bench: 11,669,493 ns/iter (+/- 264,045) = 891 MB/s
test from_utf8_enwik8_fast ... bench: 14,085,692 ns/iter (+/- 1,643,316) = 7000 MB/s
test from_utf8_enwik8_regular ... bench: 93,657,410 ns/iter (+/- 5,353,353) = 1000 MB/s
test from_utf8_jawik10_fast ... bench: 29,154,073 ns/iter (+/- 4,659,534) = 340 MB/s
test from_utf8_jawik10_regular ... bench: 29,112,917 ns/iter (+/- 2,475,123) = 340 MB/s
```
Co-authored-by: Björn Steinbrink <bsteinbr@gmail.com>
This commit is the standard API stabilization commit for the 1.6 release cycle.
The list of issues and APIs below have all been through their cycle-long FCP and
the libs team decisions are listed below
Stabilized APIs
* `Read::read_exact`
* `ErrorKind::UnexpectedEof` (renamed from `UnexpectedEOF`)
* libcore -- this was a bit of a nuanced stabilization, the crate itself is now
marked as `#[stable]` and the methods appearing via traits for primitives like
`char` and `str` are now also marked as stable. Note that the extension traits
themeselves are marked as unstable as they're imported via the prelude. The
`try!` macro was also moved from the standard library into libcore to have the
same interface. Otherwise the functions all have copied stability from the
standard library now.
* The `#![no_std]` attribute
* `fs::DirBuilder`
* `fs::DirBuilder::new`
* `fs::DirBuilder::recursive`
* `fs::DirBuilder::create`
* `os::unix::fs::DirBuilderExt`
* `os::unix::fs::DirBuilderExt::mode`
* `vec::Drain`
* `vec::Vec::drain`
* `string::Drain`
* `string::String::drain`
* `vec_deque::Drain`
* `vec_deque::VecDeque::drain`
* `collections::hash_map::Drain`
* `collections::hash_map::HashMap::drain`
* `collections::hash_set::Drain`
* `collections::hash_set::HashSet::drain`
* `collections::binary_heap::Drain`
* `collections::binary_heap::BinaryHeap::drain`
* `Vec::extend_from_slice` (renamed from `push_all`)
* `Mutex::get_mut`
* `Mutex::into_inner`
* `RwLock::get_mut`
* `RwLock::into_inner`
* `Iterator::min_by_key` (renamed from `min_by`)
* `Iterator::max_by_key` (renamed from `max_by`)
Deprecated APIs
* `ErrorKind::UnexpectedEOF` (renamed to `UnexpectedEof`)
* `OsString::from_bytes`
* `OsStr::to_cstring`
* `OsStr::to_bytes`
* `fs::walk_dir` and `fs::WalkDir`
* `path::Components::peek`
* `slice::bytes::MutableByteVector`
* `slice::bytes::copy_memory`
* `Vec::push_all` (renamed to `extend_from_slice`)
* `Duration::span`
* `IpAddr`
* `SocketAddr::ip`
* `Read::tee`
* `io::Tee`
* `Write::broadcast`
* `io::Broadcast`
* `Iterator::min_by` (renamed to `min_by_key`)
* `Iterator::max_by` (renamed to `max_by_key`)
* `net::lookup_addr`
New APIs (still unstable)
* `<[T]>::sort_by_key` (added to mirror `min_by_key`)
Closes#27585Closes#27704Closes#27707Closes#27710Closes#27711Closes#27727Closes#27740Closes#27744Closes#27799Closes#27801
cc #27801 (doesn't close as `Chars` is still unstable)
Closes#28968
This commit stabilizes and deprecates library APIs whose FCP has closed in the
last cycle, specifically:
Stabilized APIs:
* `fs::canonicalize`
* `Path::{metadata, symlink_metadata, canonicalize, read_link, read_dir, exists,
is_file, is_dir}` - all moved to inherent methods from the `PathExt` trait.
* `Formatter::fill`
* `Formatter::width`
* `Formatter::precision`
* `Formatter::sign_plus`
* `Formatter::sign_minus`
* `Formatter::alternate`
* `Formatter::sign_aware_zero_pad`
* `string::ParseError`
* `Utf8Error::valid_up_to`
* `Iterator::{cmp, partial_cmp, eq, ne, lt, le, gt, ge}`
* `<[T]>::split_{first,last}{,_mut}`
* `Condvar::wait_timeout` - note that `wait_timeout_ms` is not yet deprecated
but will be once 1.5 is released.
* `str::{R,}MatchIndices`
* `str::{r,}match_indices`
* `char::from_u32_unchecked`
* `VecDeque::insert`
* `VecDeque::shrink_to_fit`
* `VecDeque::as_slices`
* `VecDeque::as_mut_slices`
* `VecDeque::swap_remove_front` - (renamed from `swap_front_remove`)
* `VecDeque::swap_remove_back` - (renamed from `swap_back_remove`)
* `Vec::resize`
* `str::slice_mut_unchecked`
* `FileTypeExt`
* `FileTypeExt::{is_block_device, is_char_device, is_fifo, is_socket}`
* `BinaryHeap::from` - `from_vec` deprecated in favor of this
* `BinaryHeap::into_vec` - plus a `Into` impl
* `BinaryHeap::into_sorted_vec`
Deprecated APIs
* `slice::ref_slice`
* `slice::mut_ref_slice`
* `iter::{range_inclusive, RangeInclusive}`
* `std::dynamic_lib`
Closes#27706Closes#27725
cc #27726 (align not stabilized yet)
Closes#27734Closes#27737Closes#27742Closes#27743Closes#27772Closes#27774Closes#27777Closes#27781
cc #27788 (a few remaining methods though)
Closes#27790Closes#27793Closes#27796Closes#27810
cc #28147 (not all parts stabilized)
This commit stabilizes and deprecates library APIs whose FCP has closed in the
last cycle, specifically:
Stabilized APIs:
* `fs::canonicalize`
* `Path::{metadata, symlink_metadata, canonicalize, read_link, read_dir, exists,
is_file, is_dir}` - all moved to inherent methods from the `PathExt` trait.
* `Formatter::fill`
* `Formatter::width`
* `Formatter::precision`
* `Formatter::sign_plus`
* `Formatter::sign_minus`
* `Formatter::alternate`
* `Formatter::sign_aware_zero_pad`
* `string::ParseError`
* `Utf8Error::valid_up_to`
* `Iterator::{cmp, partial_cmp, eq, ne, lt, le, gt, ge}`
* `<[T]>::split_{first,last}{,_mut}`
* `Condvar::wait_timeout` - note that `wait_timeout_ms` is not yet deprecated
but will be once 1.5 is released.
* `str::{R,}MatchIndices`
* `str::{r,}match_indices`
* `char::from_u32_unchecked`
* `VecDeque::insert`
* `VecDeque::shrink_to_fit`
* `VecDeque::as_slices`
* `VecDeque::as_mut_slices`
* `VecDeque::swap_remove_front` - (renamed from `swap_front_remove`)
* `VecDeque::swap_remove_back` - (renamed from `swap_back_remove`)
* `Vec::resize`
* `str::slice_mut_unchecked`
* `FileTypeExt`
* `FileTypeExt::{is_block_device, is_char_device, is_fifo, is_socket}`
* `BinaryHeap::from` - `from_vec` deprecated in favor of this
* `BinaryHeap::into_vec` - plus a `Into` impl
* `BinaryHeap::into_sorted_vec`
Deprecated APIs
* `slice::ref_slice`
* `slice::mut_ref_slice`
* `iter::{range_inclusive, RangeInclusive}`
* `std::dynamic_lib`
Closes#27706Closes#27725
cc #27726 (align not stabilized yet)
Closes#27734Closes#27737Closes#27742Closes#27743Closes#27772Closes#27774Closes#27777Closes#27781
cc #27788 (a few remaining methods though)
Closes#27790Closes#27793Closes#27796Closes#27810
cc #28147 (not all parts stabilized)
Our docs were very basic for the various versions of from_utf8, so
this commit beefs them up.
It also improves docs for the &str variant's error, Utf8Error.
This commit updates the `MatchIndices` and `RMatchIndices` iterators to follow
the same pattern as the `chars` and `char_indices` iterators. The `matches`
iterator currently yield `&str` elements, so the `MatchIndices` iterator now
yields the index of the match as well as the `&str` that matched (instead of
start/end indexes).
cc #27743
This commit updates the `MatchIndices` and `RMatchIndices` iterators to follow
the same pattern as the `chars` and `char_indices` iterators. The `matches`
iterator currently yield `&str` elements, so the `MatchIndices` iterator now
yields the index of the match as well as the `&str` that matched (instead of
start/end indexes).
cc #27743
llvm seems to be having some trouble optimizing the iterator-based
string comparsion method into some equivalent to memcmp. This
explicitly calls out to the memcmp intrinisic in order to allow
llvm to generate better code. In some manual benchmarking, this
memcmp-based approach is 20 times faster than the iterator approach.
This commit is an implementation of [RFC 1212][rfc] which tweaks the behavior of
the `str::lines` and `BufRead::lines` iterators. Both iterators now account for
`\r\n` sequences in addition to `\n`, allowing for less surprising behavior
across platforms (especially in the `BufRead` case). Splitting *only* on the
`\n` character can still be achieved with `split('\n')` in both cases.
The `str::lines_any` function is also now deprecated as `str::lines` is a
drop-in replacement for it.
[rfc]: https://github.com/rust-lang/rfcs/blob/master/text/1212-line-endings.mdCloses#28032