Copyedit FFI tutorial

Tim Chevalier 2012-10-09 16:46:16 -07:00
parent 4b3be853af
commit cd6f24f9d1


@@ -2,17 +2,15 @@
# Introduction # Introduction
One of Rust's aims, as a system programming language, is to Because Rust is a systems programming language, one of its goals is to
interoperate well with C code. interoperate well with C code.
We'll start with an example. It's a bit bigger than usual, and We'll start with an example, which is a bit bigger than usual. We'll
contains a number of new concepts. We'll go over it one piece at a go over it one piece at a time. This is a program that uses OpenSSL's
time. `SHA1` function to compute the hash of its first command-line
argument, which it then converts to a hexadecimal string and prints to
This is a program that uses OpenSSL's `SHA1` function to compute the standard output. If you have the OpenSSL libraries installed, it
hash of its first command-line argument, which it then converts to a should compile and run without any extra effort.
hexadecimal string and prints to standard output. If you have the
OpenSSL libraries installed, it should 'just work'.
~~~~ {.xfail-test} ~~~~ {.xfail-test}
extern mod std; extern mod std;
@@ -32,7 +30,7 @@ fn sha1(data: ~str) -> ~str unsafe {
let bytes = str::to_bytes(data); let bytes = str::to_bytes(data);
let hash = crypto::SHA1(vec::raw::to_ptr(bytes), let hash = crypto::SHA1(vec::raw::to_ptr(bytes),
vec::len(bytes) as c_uint, ptr::null()); vec::len(bytes) as c_uint, ptr::null());
return as_hex(vec::raw::from_buf(hash, 20u)); return as_hex(vec::raw::from_buf(hash, 20));
} }
fn main(args: ~[~str]) { fn main(args: ~[~str]) {
@@ -42,26 +40,27 @@ fn main(args: ~[~str]) {
# Foreign modules # Foreign modules
Before we can call `SHA1`, we have to declare it. That is what this Before we can call the `SHA1` function defined in the OpenSSL library, we have
part of the program is responsible for: to declare it. That is what this part of the program does:
~~~~ {.xfail-test} ~~~~ {.xfail-test}
extern mod crypto { extern mod crypto {
fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8; fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8; }
}
~~~~ ~~~~
An `extern` module declaration containing function signatures introduces An `extern` module declaration containing function signatures introduces the
the functions listed as _foreign functions_, that are implemented in some functions listed as _foreign functions_. Foreign functions differ from regular
other language (usually C) and accessed through Rust's foreign function Rust functions in that they are implemented in some other language (usually C)
interface (FFI). An extern module like this is called a foreign module, and and called through Rust's foreign function interface (FFI). An extern module
implicitly tells the compiler to link with a library with the same name as like this is called a foreign module, and implicitly tells the compiler to
the module, and that it will find the foreign functions in that library. link with a library that contains the listed foreign functions, and has the
same name as the module.
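For readers on a current toolchain, here is a minimal sketch of the same idea in modern Rust rather than the 2012 dialect shown in this diff. It assumes Rust 1.82 or later and declares the C standard library's `strlen` instead of OpenSSL's `SHA1`, so that it links without extra libraries. The wrapper name `c_strlen` is made up for illustration. In modern Rust a library is usually named with a `#[link(name = "...")]` attribute on the block rather than by the module name.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Declaration of the C standard library's `strlen`. No library name is
// given because the C library is already linked into Rust programs.
// Toolchains older than Rust 1.82 write this block as plain `extern "C"`.
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Hypothetical safe wrapper: length of `text` in bytes, as measured by C.
fn c_strlen(text: &str) -> usize {
    // CString appends the terminating NUL byte that C expects. It returns
    // an error if `text` already contains a NUL byte.
    let c_text = CString::new(text).expect("text must not contain NUL bytes");
    // The compiler cannot check the foreign signature, so the call is unsafe.
    unsafe { strlen(c_text.as_ptr()) }
}

fn main() {
    println!("strlen reports {} bytes", c_strlen("hello"));
}
```

The declaration is the only place where the foreign signature is written down, so a mistake there is not caught at compile time.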
In this case, it'll change the name `crypto` to a shared library name In this case, the Rust compiler changes the name `crypto` to a shared library
in a platform-specific way (`libcrypto.so` on Linux, for example), and name in a platform-specific way (`libcrypto.so` on Linux, for example),
link that in. If you want the module to have a different name from the searches for the shared library with that name, and links the library into the
actual library, you can use the `"link_name"` attribute, like: program. If you want the module to have a different name from the actual
library, you can use the `"link_name"` attribute, like:
~~~~ {.xfail-test} ~~~~ {.xfail-test}
#[link_name = "crypto"] #[link_name = "crypto"]
@@ -72,11 +71,11 @@ extern mod something {
# Foreign calling conventions # Foreign calling conventions
Most foreign code will be C code, which usually uses the `cdecl` calling Most foreign code is C code, which usually uses the `cdecl` calling
convention, so that is what Rust uses by default when calling foreign convention, so that is what Rust uses by default when calling foreign
functions. Some foreign functions, most notably the Windows API, use other functions. Some foreign functions, most notably the Windows API, use other
calling conventions, so Rust provides a way to hint to the compiler which calling conventions. Rust provides the `"abi"` attribute as a way to hint to
is expected by using the `"abi"` attribute: the compiler which calling convention to use:
~~~~ ~~~~
#[cfg(target_os = "win32")] #[cfg(target_os = "win32")]
@@ -86,14 +85,14 @@ extern mod kernel32 {
} }
~~~~ ~~~~
The `"abi"` attribute applies to a foreign module (it can not be applied The `"abi"` attribute applies to a foreign module (it cannot be applied
to a single function within a module), and must be either `"cdecl"` to a single function within a module), and must be either `"cdecl"`
or `"stdcall"`. Other conventions may be defined in the future. or `"stdcall"`. We may extend the compiler in the future to support other
calling conventions.
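In current Rust the calling convention is written as a string after `extern` (for example `extern "C"`, `extern "stdcall"`, or `extern "system"`) instead of an `"abi"` attribute. The sketch below is an illustration in that modern syntax, not the 2012 dialect of this tutorial. It defines a function in Rust with the C calling convention, so it runs without any foreign library. The names `add_cdecl`, `CBinaryOp`, and `apply` are made up for the example.

```rust
// A function defined in Rust but compiled with the C calling convention.
extern "C" fn add_cdecl(a: i32, b: i32) -> i32 {
    a + b
}

// The calling convention is part of the function pointer type, so a
// pointer to an `extern "C"` function is distinct from a pointer to an
// ordinary Rust function with the same arguments.
type CBinaryOp = extern "C" fn(i32, i32) -> i32;

// Calls `op` using whatever convention its type names.
fn apply(op: CBinaryOp, a: i32, b: i32) -> i32 {
    op(a, b)
}

fn main() {
    println!("{}", apply(add_cdecl, 2, 3));
}
```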
# Unsafe pointers # Unsafe pointers
The foreign `SHA1` function is declared to take three arguments, and The foreign `SHA1` function takes three arguments, and returns a pointer.
return a pointer.
~~~~ {.xfail-test} ~~~~ {.xfail-test}
# extern mod crypto { # extern mod crypto {
@@ -104,21 +103,20 @@ fn SHA1(src: *u8, sz: libc::c_uint, out: *u8) -> *u8;
When declaring the argument types to a foreign function, the Rust When declaring the argument types to a foreign function, the Rust
compiler has no way to check whether your declaration is correct, so compiler has no way to check whether your declaration is correct, so
you have to be careful. If you get the number or types of the you have to be careful. If you get the number or types of the
arguments wrong, you're likely to get a segmentation fault. Or, arguments wrong, you're likely to cause a segmentation fault. Or,
probably even worse, your code will work on one platform, but break on probably even worse, your code will work on one platform, but break on
another. another.
In this case, `SHA1` is defined as taking two `unsigned char*` In this case, we declare that `SHA1` takes two `unsigned char*`
arguments and one `unsigned long`. The rust equivalents are `*u8` arguments and one `unsigned long`. The Rust equivalents are `*u8`
unsafe pointers and an `uint` (which, like `unsigned long`, is a unsafe pointers and an `uint` (which, like `unsigned long`, is a
machine-word-sized type). machine-word-sized type).
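One way to see why a declaration can work on one platform and break on another is to print the sizes of the C integer types on the current target. The sketch below uses modern Rust and the `std::os::raw` type aliases, and the helper name `integer_sizes` is made up. The width of `unsigned long` in particular is not the same everywhere: it is 8 bytes on 64-bit Linux but 4 bytes on 64-bit Windows.

```rust
use std::mem::size_of;
use std::os::raw::{c_uchar, c_uint, c_ulong};

/// Sizes in bytes of C `unsigned int`, C `unsigned long`, and Rust `usize`
/// on the target this program is compiled for.
fn integer_sizes() -> (usize, usize, usize) {
    (size_of::<c_uint>(), size_of::<c_ulong>(), size_of::<usize>())
}

fn main() {
    let (uint_size, ulong_size, usize_size) = integer_sizes();
    println!("unsigned int:  {} bytes", uint_size);
    println!("unsigned long: {} bytes", ulong_size);
    println!("usize:         {} bytes", usize_size);
    // `unsigned char` is always one byte, which is why a pointer to `u8`
    // is the right match for `unsigned char*`.
    assert_eq!(size_of::<c_uchar>(), 1);
}
```

If the size printed for the Rust type differs from the size of the C type in the header, the declaration needs the matching `std::os::raw` alias instead.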
Unsafe pointers can be created through various functions in the The standard library provides various functions to create unsafe pointers,
standard lib, usually with `unsafe` somewhere in their name. You can such as those in `core::cast`. Most of these functions have `unsafe` in their
dereference an unsafe pointer with `*` operator, but use name. You can dereference an unsafe pointer with the `*` operator, but use
caution—unlike Rust's other pointer types, unsafe pointers are caution: unlike Rust's other pointer types, unsafe pointers are completely
completely unmanaged, so they might point at invalid memory, or be unmanaged, so they might point at invalid memory, or be null pointers.
null pointers.
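As a rough illustration in modern Rust (where unsafe pointers are written `*const T` and `*mut T`), the sketch below creates a raw pointer, checks the cases in which it must not be read, and only then dereferences it. The function name `first_byte` is made up for the example.

```rust
use std::ptr;

/// Returns the first byte of `bytes`, reading it through a raw pointer.
fn first_byte(bytes: &[u8]) -> Option<u8> {
    // Creating a raw pointer is safe. Only reading through it is not.
    let p: *const u8 = bytes.as_ptr();
    if bytes.is_empty() {
        // The pointer of an empty slice does not point at a readable byte.
        return None;
    }
    // The slice is non-empty and still borrowed, so `p` points at live memory.
    Some(unsafe { *p })
}

fn main() {
    // A null pointer can be created and inspected safely, but never read.
    let null: *const u8 = ptr::null();
    println!("is null: {}", null.is_null());

    println!("{:?}", first_byte(b"rust"));
    println!("{:?}", first_byte(b""));
}
```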
# Unsafe blocks # Unsafe blocks
@@ -134,12 +132,12 @@ fn sha1(data: ~str) -> ~str {
let bytes = str::to_bytes(data); let bytes = str::to_bytes(data);
let hash = crypto::SHA1(vec::raw::to_ptr(bytes), let hash = crypto::SHA1(vec::raw::to_ptr(bytes),
vec::len(bytes), ptr::null()); vec::len(bytes), ptr::null());
return as_hex(vec::raw::from_buf(hash, 20u)); return as_hex(vec::raw::from_buf(hash, 20));
} }
} }
~~~~ ~~~~
Firstly, what does the `unsafe` keyword at the top of the function First, what does the `unsafe` keyword at the top of the function
mean? `unsafe` is a block modifier—it declares the block following it mean? `unsafe` is a block modifier—it declares the block following it
to be known to be unsafe. to be known to be unsafe.
@@ -158,8 +156,8 @@ advertise it to the world. An unsafe function is written like this:
unsafe fn kaboom() { ~"I'm harmless!"; } unsafe fn kaboom() { ~"I'm harmless!"; }
~~~~ ~~~~
This function can only be called from an unsafe block or another This function can only be called from an `unsafe` block or another
unsafe function. `unsafe` function.
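The usual pattern is to keep the `unsafe` function small and wrap it in a safe function that checks whatever the unsafe one assumes. Here is a minimal sketch of that pattern in modern Rust. Both function names are made up for the example.

```rust
/// Returns the byte `offset` positions past `base`.
///
/// Safety: the caller must guarantee that `base + offset` lies inside a
/// single live allocation. Nothing here checks it.
unsafe fn byte_at(base: *const u8, offset: usize) -> u8 {
    unsafe { *base.add(offset) }
}

/// Safe wrapper: the bounds check establishes what `byte_at` requires.
fn checked_byte_at(bytes: &[u8], offset: usize) -> Option<u8> {
    if offset < bytes.len() {
        // Calling an unsafe function is only allowed inside an unsafe
        // block or another unsafe function.
        Some(unsafe { byte_at(bytes.as_ptr(), offset) })
    } else {
        None
    }
}

fn main() {
    println!("{:?}", checked_byte_at(b"abc", 1));
    println!("{:?}", checked_byte_at(b"abc", 3));
}
```

Callers of `checked_byte_at` never need to write `unsafe` themselves, because the unchecked assumption stays inside the wrapper.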
# Pointer fiddling # Pointer fiddling
@@ -179,35 +177,36 @@ Let's look at our `sha1` function again.
let bytes = str::to_bytes(data); let bytes = str::to_bytes(data);
let hash = crypto::SHA1(vec::raw::to_ptr(bytes), let hash = crypto::SHA1(vec::raw::to_ptr(bytes),
vec::len(bytes), ptr::null()); vec::len(bytes), ptr::null());
return as_hex(vec::raw::from_buf(hash, 20u)); return as_hex(vec::raw::from_buf(hash, 20));
# } # }
# } # }
~~~~ ~~~~
The `str::to_bytes` function is perfectly safe: it converts a string to The `str::to_bytes` function is perfectly safe: it converts a string to a
a `[u8]`. This byte array is then fed to `vec::raw::to_ptr`, which `~[u8]`. The program then feeds this byte array to `vec::raw::to_ptr`, which
returns an unsafe pointer to its contents. returns an unsafe pointer to its contents.
This pointer will become invalid as soon as the vector it points into This pointer will become invalid at the end of the scope in which the vector
is cleaned up, so you should be very careful how you use it. In this it points to (`bytes`) is valid, so you should be very careful how you use
case, the local variable `bytes` outlives the pointer, so we're good. it. In this case, the local variable `bytes` outlives the pointer, so we're
good.
Passing a null pointer as the third argument to `SHA1` makes it use a Passing a null pointer as the third argument to `SHA1` makes it use a
static buffer, and thus save us the effort of allocating memory static buffer, and thus save us the effort of allocating memory
ourselves. `ptr::null` is a generic function that will return an ourselves. `ptr::null` is a generic function that, in this case, returns an
unsafe null pointer of the correct type (Rust generics are awesome unsafe null pointer of type `*u8`. (Rust generics are awesome
like thatthey can take the right form depending on the type that they like that: they can take the right form depending on the type that they
are expected to return). are expected to return.)
Finally, `vec::raw::from_buf` builds up a new `[u8]` from the Finally, `vec::raw::from_buf` builds up a new `~[u8]` from the
unsafe pointer that was returned by `SHA1`. SHA1 digests are always unsafe pointer that `SHA1` returned. SHA1 digests are always
twenty bytes long, so we can pass `20u` for the length of the new twenty bytes long, so we can pass `20` for the length of the new
vector. vector.
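The same three steps (get a pointer to the input bytes, call the foreign function, copy a fixed number of bytes out of the buffer it returns) can be sketched in modern Rust without linking OpenSSL. In the sketch below, `fake_digest` is a hypothetical stand-in for `SHA1`: it ignores its input and returns a pointer to a 20-byte static buffer. The `as_hex` helper here is our own small version, not the one from the tutorial program.

```rust
use std::slice;

// Hypothetical stand-in for the static buffer a C function might own.
static FAKE_DIGEST: [u8; 20] = [0xab; 20];

// Hypothetical stand-in for `SHA1`: ignores the input and returns a
// pointer to 20 bytes that stay valid for the whole program.
fn fake_digest(_src: *const u8, _len: usize) -> *const u8 {
    FAKE_DIGEST.as_ptr()
}

// Formats each byte as two lowercase hexadecimal digits.
fn as_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

fn digest_hex(data: &str) -> String {
    let bytes = data.as_bytes();
    // `bytes` borrows from `data`, so this pointer stays valid until the
    // end of the function, which is longer than the call needs.
    let raw = fake_digest(bytes.as_ptr(), bytes.len());
    // Copy exactly 20 bytes out of the foreign buffer into a vector we
    // own. The length is our claim, and nothing checks it for us.
    let owned: Vec<u8> = unsafe { slice::from_raw_parts(raw, 20) }.to_vec();
    as_hex(&owned)
}

fn main() {
    println!("{}", digest_hex("hello"));
}
```

Copying into an owned vector right away matters with a real static buffer, since a later call to the foreign function would overwrite it.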
# Passing structures # Passing structures
C functions often take pointers to structs as arguments. Since Rust C functions often take pointers to structs as arguments. Since Rust
structs are binary-compatible with C structs, Rust programs can call `struct`s are binary-compatible with C structs, Rust programs can call
such functions directly. such functions directly.
This program uses the POSIX function `gettimeofday` to get a This program uses the POSIX function `gettimeofday` to get a
@@ -241,12 +240,12 @@ fn unix_time_in_microseconds() -> u64 unsafe {
The `#[nolink]` attribute indicates that there's no foreign library to The `#[nolink]` attribute indicates that there's no foreign library to
link in. The standard C library is already linked with Rust programs. link in. The standard C library is already linked with Rust programs.
A `timeval`, in C, is a struct with two 32-bit integers. Thus, we In C, a `timeval` is a struct with two 32-bit integer fields. Thus, we
define a struct type with the same contents, and declare define a `struct` type with the same contents, and declare
`gettimeofday` to take a pointer to such a struct. `gettimeofday` to take a pointer to such a `struct`.
The second argument to `gettimeofday` (the time zone) is not used by This program does not use the second argument to `gettimeofday` (the time
this program, so it simply declares it to be a pointer to the nil zone), so the `extern mod` declaration for it simply declares this argument
type. Since all null pointers have the same representation regardless of to be a pointer to the unit type (written `()`). Since all null pointers have
their referent type, this is safe. the same representation regardless of their referent type, this is safe.
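A note for modern Rust: struct layout is only guaranteed to match C when the struct is marked `#[repr(C)]`, and the field widths of `timeval` are platform-specific (both fields are 64 bits wide on 64-bit Linux, for example). The sketch below shows the struct-passing pattern under those assumptions. To stay runnable everywhere, it does not call the real `gettimeofday`. Instead `fill_time` is a hypothetical stand-in, written in Rust with the C calling convention, that fills the struct with fixed values.

```rust
// `#[repr(C)]` requests C field order and padding for this struct.
// The 64-bit field widths are an assumption for the example.
#[repr(C)]
struct TimeVal {
    tv_sec: i64,
    tv_usec: i64,
}

// Hypothetical stand-in for a C function such as `gettimeofday`: it
// writes through the pointer and returns 0 on success, -1 on failure.
extern "C" fn fill_time(tv: *mut TimeVal) -> i32 {
    if tv.is_null() {
        return -1;
    }
    unsafe {
        (*tv).tv_sec = 1;
        (*tv).tv_usec = 500_000;
    }
    0
}

fn time_in_microseconds() -> Option<u64> {
    let mut tv = TimeVal { tv_sec: 0, tv_usec: 0 };
    // `&mut tv` coerces to `*mut TimeVal`, and `tv` outlives the call.
    if fill_time(&mut tv) != 0 {
        return None;
    }
    Some(tv.tv_sec as u64 * 1_000_000 + tv.tv_usec as u64)
}

fn main() {
    println!("{:?}", time_in_microseconds());
}
```

With a real foreign function, the struct definition would need to match the C header on each target, which is the kind of detail crates such as `libc` exist to get right.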