x86_64 can load unaligned words in a single cache line as fast as
aligned words. Even when crossing cache or page boundaries it is just as
fast to do an unaligned word read instead of multiple byte reads.
Also add a couple more tests & benchmarks.
* mem: Move mem* functions to separate directory
Signed-off-by: Joe Richey <joerichey@google.com>
* memcpy: Create separate memcpy.rs file
Signed-off-by: Joe Richey <joerichey@google.com>
* benches: Add benchmarks for mem* functions
This allows comparing the "normal" implementations to the
implementations provided by this crate.
Signed-off-by: Joe Richey <joerichey@google.com>
* mem: Add REP MOVSB/STOSB implementations
The assembly generated seems correct:
https://rust.godbolt.org/z/GGnec8
Signed-off-by: Joe Richey <joerichey@google.com>
* mem: Add documentations for REP string insturctions
Signed-off-by: Joe Richey <joerichey@google.com>
* Use quad-word rep string instructions
Signed-off-by: Joe Richey <joerichey@google.com>
* Prevent panic when compiled in debug mode
Signed-off-by: Joe Richey <joerichey@google.com>
* Add tests for mem* functions
Signed-off-by: Joe Richey <joerichey@google.com>
* Add build/test with the "asm" feature
Signed-off-by: Joe Richey <joerichey@google.com>
* Add byte length to Bencher
Signed-off-by: Joe Richey <joerichey@google.com>