mirror of
https://github.com/rust-lang/rust.git
synced 2024-11-26 00:34:06 +00:00
38654ad741
Add explicit-endian String::from_utf16 variants This adds the following APIs under `feature(str_from_utf16_endian)`: ```rust impl String { pub fn from_utf16le(v: &[u8]) -> Result<String, FromUtf16Error>; pub fn from_utf16le_lossy(v: &[u8]) -> String; pub fn from_utf16be(v: &[u8]) -> Result<String, FromUtf16Error>; pub fn from_utf16be_lossy(v: &[u8]) -> String; } ``` These are versions of `String::from_utf16` that explicitly take [UTF-16LE and UTF-16BE](https://unicode.org/faq/utf_bom.html#gen7). Notably, we can do better than just the obvious `decode_utf16(v.array_chunks::<2>().copied().map(u16::from_le_bytes)).collect()` in that: - We handle the case where the byte slice is not an even number of bytes, and - In the case that the UTF-16 is native endian and the slice is aligned, we can forward to `String::from_utf16`. If the Unicode Consortium actively defines how to handle character replacement when decoding a UTF-16 bytestream with a trailing odd byte, I was unable to find reference. However, the behavior implemented here is fairly self-evidently correct: replace the single errant byte with the replacement character. |
||
---|---|---|
.. | ||
alloc | ||
backtrace@99faef833f | ||
core | ||
panic_abort | ||
panic_unwind | ||
portable-simd | ||
proc_macro | ||
profiler_builtins | ||
rtstartup | ||
rustc-std-workspace-alloc | ||
rustc-std-workspace-core | ||
rustc-std-workspace-std | ||
std | ||
stdarch@333e9e9977 | ||
sysroot | ||
test | ||
unwind |