Commit Graph

38 Commits

Author SHA1 Message Date
Jörn Horstmann
e393f56d37 Improve autovectorization of to_lowercase / to_uppercase functions
Refactor the code in the `convert_while_ascii` helper function to make
it more suitable for auto-vectorization and also process the full ascii
prefix of the string. The generic case conversion logic will only be
invoked starting from the first non-ascii character.

The runtime on microbenchmarks with ascii-only inputs improves between
1.5x for short and 4x for long inputs on x86_64 and aarch64.

The new implementation also encapsulates all unsafe inside the
`convert_while_ascii` function.

Fixes #123712
2024-09-23 11:31:29 +02:00
Michael Goulet
c682aa162b Reformat using the new identifier sorting from rustfmt 2024-09-22 19:11:29 -04:00
Nicholas Nethercote
84ac80f192 Reformat use declarations.
The previous commit updated `rustfmt.toml` appropriately. This commit is
the outcome of running `x fmt --all` with the new formatting options.
2024-07-29 08:26:52 +10:00
Benoît du Garreau
772315de7c Remove generic lifetime parameter of trait Pattern
Use a GAT for `Searcher` associated type because this trait is always
implemented for every lifetime anyway.
2024-07-15 12:12:44 +02:00
Marcondiro
bbdf97254a
fix #124714 str.to_lowercase sigma handling 2024-05-08 17:05:10 +02:00
The 8472
40cf1f9257 optimize str::iter::Chars::advance_by
this avoids part of the char decoding work by not looking at utf8 continuation bytes
2023-11-27 22:06:35 +01:00
bors
4f4dae055b Auto merge of #112387 - clarfonthey:non-panicking-ceil-char-boundary, r=m-ou-se
Don't panic in ceil_char_boundary

Implementing the alternative mentioned in this comment: https://github.com/rust-lang/rust/issues/93743#issuecomment-1579935853

Since `floor_char_boundary` will always work (rounding down to the length of the string is possible), it feels best for `ceil_char_boundary` to not panic either. However, the semantics of "rounding up" past the length of the string aren't very great, which is why the method originally panicked in these cases.

Taking into account how people are using this method, it feels best to simply return the end of the string in these cases, so that the result is still a valid char boundary.
2023-08-15 13:49:24 +00:00
Andrew Tribick
e6fa5c18b5 Fix size_hint for EncodeUtf16 2023-07-20 21:52:33 +02:00
Mark Rousskov
67b0cfc761 Flip cfg's for bootstrap bump 2023-07-12 21:38:55 -04:00
ltdk
d47371de69 Fix test 2023-06-08 09:21:05 -04:00
Urgau
b84c190b9a Allow newly uplifted invalid_from_utf8 lint 2023-05-27 00:18:28 +02:00
Dylan DPC
d694f47baa
Rollup merge of #100311 - xfix:lines-fix-handling-of-bare-cr, r=ChrisDenton
Fix handling of trailing bare CR in str::lines

Continuing from #91191.

Fixes #94435.
2023-03-23 00:00:30 +05:30
The 8472
d576a9b241 add test for issue 104726 2022-11-22 20:58:43 +01:00
The 8472
c37e8fae57 generalize str.contains() tests to a range of haystack sizes
The Big-O is cubic, but this is only called with ~70 chars so it's still fast enough
2022-11-15 18:30:07 +01:00
Konrad Borowski
cef81dcd0a Fix handling of trailing bare CR in str::lines
Previously "bare\r" was split into ["bare"] even though the
documentation said that only LF and CRLF count as newlines.

This fix is a behavioural change, even though it brings the behaviour
into line with the documentation, and into line with that of
`std::io::BufRead::lines()`.

This is an alternative to #91051, which proposes to document rather
than fix the behaviour.

Fixes #94435.

Co-authored-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
2022-10-06 16:05:38 +00:00
Maybe Waffle
e4720e1cf2 Replace most uses of pointer::offset with add and sub 2022-08-21 02:21:41 +04:00
Conrad Ludgate
d0f9930709 improve case conversion happy path 2022-05-26 13:18:57 +01:00
Ralf Jung
85bfe2d99d make utf8_char_counts test faster in Miri 2022-03-31 13:11:44 -04:00
David Tolnay
2ac9efbe95
Debug print char 0 as '\0' rather than '\u{0}' 2022-03-27 04:49:10 -07:00
T-O-R-U-S
72a25d05bf Use implicit capture syntax in format_args
This updates the standard library's documentation to use the new syntax. The
documentation is worthwhile to update as it should be more idiomatic
(particularly for features like this, which are nice for users to get acquainted
with). The general codebase is likely more hassle than benefit to update: it'll
hurt git blame, and generally updates can be done by folks updating the code if
(and when) that makes things more readable with the new format.

A few places in the compiler and library code are updated (mostly just due to
already having been done when this commit was first authored).
2022-03-10 10:23:40 -05:00
ltdk
edd318c313 Add {floor,ceil}_char_boundary methods to str 2022-02-07 13:34:08 -05:00
Thom Chiovoloni
002aaf2c65
Ensure non-power-of-two sizes are tested in the Chars::count test 2022-02-05 11:15:18 -08:00
Thom Chiovoloni
628b217326
Optimize core::str::Chars::count 2022-02-05 11:15:17 -08:00
Matthias Krüger
60625a6ef0
Rollup merge of #88858 - spektom:to_lower_upper_rev, r=dtolnay
Allow reverse iteration of lowercase'd/uppercase'd chars

The PR implements `DoubleEndedIterator` trait for `ToLowercase` and `ToUppercase`.

This enables reverse iteration of lowercase/uppercase variants of character sequences.
One of use cases:  determining whether a char sequence is a suffix of another one.

Example:

```rust
fn endswith_ignore_case(s1: &str, s2: &str) -> bool {
    for eob in s1
        .chars()
        .flat_map(|c| c.to_lowercase())
        .rev()
        .zip_longest(s2.chars().flat_map(|c| c.to_lowercase()).rev())
    {
        match eob {
            EitherOrBoth::Both(c1, c2) => {
                if c1 != c2 {
                    return false;
                }
            }
            EitherOrBoth::Left(_) => return true,
            EitherOrBoth::Right(_) => return false,
        }
    }
    true
}
```
2021-12-23 00:28:51 +01:00
Frank Steffahn
a957cefda6 Fix a bunch of typos 2021-12-14 16:40:43 +01:00
Maybe Waffle
cf6f64a963 Make slice->str conversion and related functions const
This commit makes the following functions from `core::str` `const fn`:
- `from_utf8[_mut]` (`feature(const_str_from_utf8)`)
- `from_utf8_unchecked_mut` (`feature(const_str_from_utf8_unchecked_mut)`)
- `Utf8Error::{valid_up_to,error_len}` (`feature(const_str_from_utf8)`)
2021-11-18 00:50:42 +03:00
John Kugelman
68b0d86294 Add #[must_use] to remaining core functions 2021-10-30 18:21:29 -04:00
Michael Spector
83925dd453 Allow reverse iteration of lowercase'd/uppercase'd chars 2021-09-11 18:40:04 +03:00
Alexis Bourget
cd04731d3a Add test for the fix 2021-07-11 17:47:57 +02:00
hi-rustin
88abd7d81d Lint for unused borrows as part of UNUSED_MUST_USE 2021-06-18 15:09:40 +08:00
Yechan Bae
6d43225bfb Fixes #80335 2021-02-03 16:36:33 -05:00
Ralf Jung
7e74b72d13 break formatting so rustfmt is happy 2020-12-02 14:09:36 +01:00
Ralf Jung
67a67d827a disable a ptr equality test on Miri 2020-12-02 13:49:33 +01:00
Christiaan Dirkx
be554c4101 Make ui test that are run-pass and do not test the compiler itself library tests 2020-11-30 02:47:32 +01:00
Josh Stone
9202fbdbdb Check for exhaustion in SliceIndex for RangeInclusive 2020-10-20 17:18:08 -07:00
Ayush Kumar Mishra
7d834c87d2 Move Various str tests in library 2020-09-05 17:24:06 +05:30
Aleksey Kladov
6e9dc7d9ff Add str::[r]split_once
This is useful for quick&dirty parsing of key: value config pairs
2020-07-28 09:58:20 +02:00
mark
2c31b45ae8 mv std libs to library/ 2020-07-27 19:51:13 -05:00