Update documentation for moves

This commit is contained in:
Niko Matsakis 2013-05-28 09:33:31 -04:00
parent e35db1ab35
commit f30b538929
3 changed files with 270 additions and 124 deletions

View File

@ -19,15 +19,27 @@ how it works, and then proceed to dive into the theoretical
background. Finally, they go into detail on some of the more subtle
aspects.
# Table of contents
These docs are long. Search for the section you are interested in.
- Overview
- Formal model
- Borrowing and loans
- Moves and initialization
- Future work
# Overview
The borrow checker checks one function at a time. It operates in two
passes. The first pass, called `gather_loans`, walks over the function
and identifies all of the places where borrows occur (e.g., `&`
expressions and `ref` bindings). For each borrow, we check various
basic safety conditions at this time (for example, that the lifetime
of the borrow doesn't exceed the lifetime of the value being
borrowed).
and identifies all of the places where borrows (e.g., `&` expressions
and `ref` bindings) and moves (copies or captures of a linear value)
occur. It also tracks initialization sites. For each borrow and move,
it checks various basic safety conditions at this time (for example,
that the lifetime of the borrow doesn't exceed the lifetime of the
value being borrowed, or that there is no move out of an `&T`
pointee).
It then uses the dataflow module to propagate which of those borrows
may be in scope at each point in the procedure. A loan is considered
@ -41,8 +53,8 @@ it is safe with respect to the in-scope loans.
# Formal model
Let's consider a simple subset of Rust in which you can only borrow
from lvalues like so:
Throughout the docs we'll consider a simple subset of Rust in which
you can only borrow from lvalues, defined like so:
LV = x | LV.f | *LV
@ -65,9 +77,11 @@ struct name and we assume structs are declared like so:
SD = struct S<'LT...> { (f: TY)... }
# An intuitive explanation
# Borrowing and loans
## Issuing loans
## An intuitive explanation
### Issuing loans
Now, imagine we had a program like this:
@ -83,7 +97,7 @@ This is of course dangerous because mutating `x` will free the old
value and hence invalidate `y`. The borrow checker aims to prevent
this sort of thing.
### Loans and restrictions
#### Loans and restrictions
The way the borrow checker works is that it analyzes each borrow
expression (in our simple model, that's stuff like `&LV`, though in
@ -120,7 +134,7 @@ prevents moves from `LV`. I chose not to make `MOVE` a fourth kind of
action because that would imply that sometimes moves are permitted
from restrictived values, which is not the case.
### Example
#### Example
To give you a better feeling for what kind of restrictions derived
from a loan, let's look at the loan `L` that would be issued as a
@ -159,7 +173,7 @@ because `x` is an owned pointer, the path `x` owns the path `*x`.
Therefore, borrowing `(*x).f` yields restrictions on both
`*x` and `x`.
## Checking for illegal assignments, moves, and reborrows
### Checking for illegal assignments, moves, and reborrows
Once we have computed the loans introduced by each borrow, the borrow
checker uses a data flow propagation to compute the full set of loans
@ -181,7 +195,7 @@ The kinds of expressions which in-scope loans can render illegal are:
- *read-only borrows* (`&const lv`): illegal there is an in-scope restriction
against aliasing `lv`.
# Formal rules
## Formal rules
Now that we hopefully have some kind of intuitive feeling for how the
borrow checker works, let's look a bit more closely now at the precise
@ -202,7 +216,7 @@ of this rule: there are comments in the borrowck source referencing
these names, so that you can cross reference to find the actual code
that corresponds to the formal rule.
## The `gather_loans` pass
### The `gather_loans` pass
We start with the `gather_loans` pass, which walks the AST looking for
borrows. For each borrow, there are three bits of information: the
@ -223,7 +237,7 @@ alive and for dynamically freezing `@mut` boxes.
restrictions to maintain memory safety. These are the restrictions
that will go into the final loan. We'll discuss in more detail below.
# Checking mutability
## Checking mutability
Checking mutability is fairly straightforward. We just want to prevent
immutable data from being borrowed as mutable. Note that it is ok to
@ -233,7 +247,7 @@ defined, means that "borrowing `LV` with mutability `MQ` is ok. The
Rust code corresponding to this predicate is the function
`check_mutability` in `middle::borrowck::gather_loans`.
## Checking mutability of variables
### Checking mutability of variables
*Code pointer:* Function `check_mutability()` in `gather_loans/mod.rs`,
but also the code in `mem_categorization`.
@ -249,7 +263,7 @@ otherwise the variable must be borrowed as immutable or const:
DECL(X) = imm
MQ = imm | const
## Checking mutability of owned content
### Checking mutability of owned content
Fields and owned pointers inherit their mutability from
their base expressions, so both of their rules basically
@ -262,7 +276,7 @@ delegate the check to the base expression `LV`:
TYPE(LV) = ~Ty
MUTABILITY(LV, MQ)
## Checking mutability of immutable pointer types
### Checking mutability of immutable pointer types
Immutable pointer types like `&T` and `@T` can only
be borrowed if MQ is immutable or const:
@ -275,7 +289,7 @@ be borrowed if MQ is immutable or const:
TYPE(LV) = @Ty
MQ == imm | const
## Checking mutability of mutable pointer types
### Checking mutability of mutable pointer types
`&mut T` and `@mut T` can be frozen, so it is acceptable to borrow
them as either imm or mut:
@ -286,7 +300,7 @@ them as either imm or mut:
MUTABILITY(*LV, MQ) // M-Deref-Managed-Mut
TYPE(LV) = @mut Ty
# Checking lifetime
## Checking lifetime
These rules aim to ensure that no data is borrowed for a scope that
exceeds its lifetime. In addition, these rules manage the rooting and
@ -297,7 +311,7 @@ safely borrowed for the lifetime `LT` with mutability `MQ`". The Rust
code corresponding to this predicate is the module
`middle::borrowck::gather_loans::lifetime`.
## The Scope function
### The Scope function
Several of the rules refer to a helper function `SCOPE(LV)=LT`. The
`SCOPE(LV)` yields the lifetime `LT` for which the lvalue `LV` is
@ -329,7 +343,7 @@ the pointer points at may actually live longer:
SCOPE(*LV) = LT if LV has type &'LT T or &'LT mut T
## Checking lifetime of variables
### Checking lifetime of variables
The rule for variables states that a variable can only be borrowed a
lifetime `LT` that is a subregion of the variable's scope:
@ -337,7 +351,7 @@ lifetime `LT` that is a subregion of the variable's scope:
LIFETIME(X, LT, MQ) // L-Local
LT <= SCOPE(X)
## Checking lifetime for owned content
### Checking lifetime for owned content
The lifetime of a field or owned pointer is the same as the lifetime
of its owner:
@ -349,7 +363,7 @@ of its owner:
TYPE(LV) = ~Ty
LIFETIME(LV, LT, MQ)
## Checking lifetime for derefs of borrowed pointers
### Checking lifetime for derefs of borrowed pointers
Borrowed pointers have a lifetime `LT'` associated with them. The
data they point at has been guaranteed to be valid for at least this
@ -361,7 +375,7 @@ itself:
TYPE(LV) = &LT' Ty OR &LT' mut Ty
LT <= LT'
## Checking lifetime for derefs of managed, immutable pointers
### Checking lifetime for derefs of managed, immutable pointers
Managed pointers are valid so long as the data within them is
*rooted*. There are two ways that this can be achieved. The first is
@ -410,7 +424,7 @@ makes a note in a side-table that the box `LV` must be rooted into the
stack when `*LV` is evaluated, and that this root can be released when
the scope `LT` exits.
## Checking lifetime for derefs of managed, mutable pointers
### Checking lifetime for derefs of managed, mutable pointers
Loans of the contents of mutable managed pointers are simpler in some
ways that loans of immutable managed pointers, because we can never
@ -433,7 +447,7 @@ borrowed and either the old loan or the new loan is a mutable loan
(multiple immutable loans are okay). The lock is released as we exit
the scope `LT`.
# Computing the restrictions
## Computing the restrictions
The final rules govern the computation of *restrictions*, meaning that
we compute the set of actions that will be illegal for the life of the
@ -456,7 +470,7 @@ permits other immutable borows but forbids writes and mutable borows.
Finally, a const borrow just wants to be sure that the value is not
moved out from under it, so no actions are forbidden.
## Restrictions for loans of a local variable
### Restrictions for loans of a local variable
The simplest case is a borrow of a local variable `X`:
@ -464,7 +478,7 @@ The simplest case is a borrow of a local variable `X`:
In such cases we just record the actions that are not permitted.
## Restrictions for loans of fields
### Restrictions for loans of fields
Restricting a field is the same as restricting the owner of that
field:
@ -479,7 +493,7 @@ frozen or aliased, we cannot allow the owner to be frozen or aliased,
since doing so indirectly freezes/aliases the field. This is the
origin of inherited mutability.
## Restrictions for loans of owned pointees
### Restrictions for loans of owned pointees
Because the mutability of owned pointees is inherited, restricting an
owned pointee is similar to restricting a field, in that it implies
@ -494,7 +508,7 @@ on `LV`:
TYPE(LV) = ~Ty
RESTRICTIONS(LV, ACTIONS|MUTATE|CLAIM) = RS
## Restrictions for loans of immutable managed/borrowed pointees
### Restrictions for loans of immutable managed/borrowed pointees
Immutable managed/borrowed pointees are freely aliasable, meaning that
the compiler does not prevent you from copying the pointer. This
@ -523,7 +537,7 @@ report an error because we cannot enforce a lack of aliases on a `&Ty`
or `@Ty` type. That case is described in more detail in the section on
mutable borrowed pointers.
## Restrictions for loans of const aliasable pointees
### Restrictions for loans of const aliasable pointees
Const pointers are read-only. There may be `&mut` or `&` aliases, and
we can not prevent *anything* but moves in that case. So the
@ -535,7 +549,7 @@ result.
RESTRICTIONS(*LV, []) = [] // R-Deref-Const-Borrowed
TYPE(LV) = &const Ty or @const Ty
## Restrictions for loans of mutable borrowed pointees
### Restrictions for loans of mutable borrowed pointees
Borrowing mutable borrowed pointees is a bit subtle because we permit
users to freeze or claim `&mut` pointees. To see what I mean, consider this
@ -645,7 +659,7 @@ any restrictions (beyond the default of "no move"):
Moving from an `&mut` pointee is never legal, so no special
restrictions are needed.
## Restrictions for loans of mutable managed pointees
### Restrictions for loans of mutable managed pointees
With `@mut` pointees, we don't make any static guarantees. But as a
convenience, we still register a restriction against `*LV`, because
@ -654,7 +668,159 @@ that way if we *can* find a simple static error, we will:
RESTRICTIONS(*LV, ACTIONS) = [*LV, ACTIONS] // R-Deref-Managed-Borrowed
TYPE(LV) = @mut Ty
# Some notes for future work
# Moves and initialization
The borrow checker is also in charge of ensuring that:
- all memory which is accessed is initialized
- immutable local variables are assigned at most once.
These are two separate dataflow analyses built on the same
framework. Let's look at checking that memory is initialized first;
the checking of immutable local variabe assignments works in a very
similar way.
To track the initialization of memory, we actually track all the
points in the program that *create uninitialized memory*, meaning
moves and the declaration of uninitialized variables. For each of
these points, we create a bit in the dataflow set. Assignments to a
variable `x` or path `a.b.c` kill the move/uninitialization bits for
those paths and any subpaths (e.g., `x`, `x.y`, `a.b.c`, `*a.b.c`).
The bits are also killed when the root variables (`x`, `a`) go out of
scope. Bits are unioned when two control-flow paths join. Thus, the
presence of a bit indicates that the move may have occurred without an
intervening assignment to the same memory. At each use of a variable,
we examine the bits in scope, and check that none of them are
moves/uninitializations of the variable that is being used.
Let's look at a simple example:
fn foo(a: ~int) {
let b: ~int; // Gen bit 0.
if cond { // Bits: 0
use(&*a);
b = a; // Gen bit 1, kill bit 0.
use(&*b);
} else {
// Bits: 0
}
// Bits: 0,1
use(&*a); // Error.
use(&*b); // Error.
}
fn use(a: &int) { }
In this example, the variable `b` is created uninitialized. In one
branch of an `if`, we then move the variable `a` into `b`. Once we
exit the `if`, therefore, it is an error to use `a` or `b` since both
are only conditionally initialized. I have annotated the dataflow
state using comments. There are two dataflow bits, with bit 0
corresponding to the creation of `b` without an initializer, and bit 1
corresponding to the move of `a`. The assignment `b = a` both
generates bit 1, because it is a move of `a`, and kills bit 0, because
`b` is now initialized. On the else branch, though, `b` is never
initialized, and so bit 0 remains untouched. When the two flows of
control join, we union the bits from both sides, resulting in both
bits 0 and 1 being set. Thus any attempt to use `a` uncovers the bit 1
from the "then" branch, showing that `a` may be moved, and any attempt
to use `b` uncovers bit 0, from the "else" branch, showing that `b`
may not be initialized.
## Initialization of immutable variables
Initialization of immutable variables works in a very similar way,
except that:
1. we generate bits for each assignment to a variable;
2. the bits are never killed except when the variable goes out of scope.
Thus the presence of an assignment bit indicates that the assignment
may have occurred. Note that assignments are only killed when the
variable goes out of scope, as it is not relevant whether or not there
has been a move in the meantime. Using these bits, we can declare that
an assignment to an immutable variable is legal iff there is no other
assignment bit to that same variable in scope.
## Why is the design made this way?
It may seem surprising that we assign dataflow bits to *each move*
rather than *each path being moved*. This is somewhat less efficient,
since on each use, we must iterate through all moves and check whether
any of them correspond to the path in question. Similar concerns apply
to the analysis for double assignments to immutable variables. The
main reason to do it this way is that it allows us to print better
error messages, because when a use occurs, we can print out the
precise move that may be in scope, rather than simply having to say
"the variable may not be initialized".
## Data structures used in the move analysis
The move analysis maintains several data structures that enable it to
cross-reference moves and assignments to determine when they may be
moving/assigning the same memory. These are all collected into the
`MoveData` and `FlowedMoveData` structs. The former represents the set
of move paths, moves, and assignments, and the latter adds in the
results of a dataflow computation.
### Move paths
The `MovePath` tree tracks every path that is moved or assigned to.
These paths have the same form as the `LoanPath` data structure, which
in turn is the "real world version of the lvalues `LV` that we
introduced earlier. The difference between a `MovePath` and a `LoanPath`
is that move paths are:
1. Canonicalized, so that we have exactly one copy of each, and
we can refer to move paths by index;
2. Cross-referenced with other paths into a tree, so that given a move
path we can efficiently find all parent move paths and all
extensions (e.g., given the `a.b` move path, we can easily find the
move path `a` and also the move paths `a.b.c`)
3. Cross-referenced with moves and assignments, so that we can
easily find all moves and assignments to a given path.
The mechanism that we use is to create a `MovePath` record for each
move path. These are arranged in an array and are referenced using
`MovePathIndex` values, which are newtype'd indices. The `MovePath`
structs are arranged into a tree, representing using the standard
Knuth representation where each node has a child 'pointer' and a "next
sibling" 'pointer'. In addition, each `MovePath` has a parent
'pointer'. In this case, the 'pointers' are just `MovePathIndex`
values.
In this way, if we want to find all base paths of a given move path,
we can just iterate up the parent pointers (see `each_base_path()` in
the `move_data` module). If we want to find all extensions, we can
iterate through the subtree (see `each_extending_path()`).
### Moves and assignments
There are structs to represent moves (`Move`) and assignments
(`Assignment`), and these are also placed into arrays and referenced
by index. All moves of a particular path are arranged into a linked
lists, beginning with `MovePath.first_move` and continuing through
`Move.next_move`.
We distinguish between "var" assignments, which are assignments to a
variable like `x = foo`, and "path" assignments (`x.f = foo`). This
is because we need to assign dataflows to the former, but not the
latter, so as to check for double initialization of immutable
variables.
### Gathering and checking moves
Like loans, we distinguish two phases. The first, gathering, is where
we uncover all the moves and assignments. As with loans, we do some
basic sanity checking in this phase, so we'll report errors if you
attempt to move out of a borrowed pointer etc. Then we do the dataflow
(see `FlowedMoveData::new`). Finally, in the `check_loans.rs` code, we
walk back over, identify all uses, assignments, and captures, and
check that they are legal given the set of dataflow bits we have
computed for that program point.
# Future work
While writing up these docs, I encountered some rules I believe to be
stricter than necessary:
@ -671,5 +837,26 @@ stricter than necessary:
extra error message in some cases, though.
- I have not described how closures interact. Current code is unsound.
I am working on describing and implementing the fix.
- If we wish, we can easily extend the move checking to allow finer-grained
tracking of what is initialized and what is not, enabling code like
this:
a = x.f.g; // x.f.g is now uninitialized
// here, x and x.f are not usable, but x.f.h *is*
x.f.g = b; // x.f.g is not initialized
// now x, x.f, x.f.g, x.f.h are all usable
What needs to change here, most likely, is that the `moves` module
should record not only what paths are moved, but what expressions
are actual *uses*. For example, the reference to `x` in `x.f.g = b`
is not a true *use* in the sense that it requires `x` to be fully
initialized. This is in fact why the above code produces an error
today: the reference to `x` in `x.f.g = b` is considered illegal
because `x` is not fully initialized.
There are also some possible refactorings:
- It might be nice to replace all loan paths with the MovePath mechanism,
since they allow lightweight comparison using an integer.
*/

View File

@ -10,7 +10,8 @@
/*!
Data structures used for tracking moves.
Data structures used for tracking moves. Please see the extensive
comments in the section "Moves and initialization" and in `doc.rs`.
*/
@ -29,11 +30,24 @@ use syntax::opt_vec;
use util::ppaux::Repr;
pub struct MoveData {
/// Move paths. See section "Move paths" in `doc.rs`.
paths: ~[MovePath],
/// Cache of loan path to move path index, for easy lookup.
path_map: HashMap<@LoanPath, MovePathIndex>,
/// Each move or uninitialized variable gets an entry here.
moves: ~[Move],
path_assignments: ~[Assignment],
/// Assignments to a variable, like `x = foo`. These are assigned
/// bits for dataflow, since we must track them to ensure that
/// immutable variables are assigned at most once along each path.
var_assignments: ~[Assignment],
/// Assignments to a path, like `x.f = foo`. These are not
/// assigned dataflow bits, but we track them because they still
/// kill move bits.
path_assignments: ~[Assignment],
assignee_ids: HashSet<ast::node_id>,
}
@ -43,34 +57,45 @@ pub struct FlowedMoveData {
// It makes me sad to use @mut here, except that due to
// the visitor design, this is what gather_loans
// must produce.
dfcx_moves: MoveDataFlow,
// We could (and maybe should, for efficiency) combine both move
// and assign data flow into one, but this way it's easier to
// distinguish the bits that correspond to moves and assignments.
dfcx_assign: AssignDataFlow
}
/// Index into `MoveData.paths`, used like a pointer
#[deriving(Eq)]
pub struct MovePathIndex(uint);
static InvalidMovePathIndex: MovePathIndex =
MovePathIndex(uint::max_value);
/// Index into `MoveData.moves`, used like a pointer
#[deriving(Eq)]
pub struct MoveIndex(uint);
static InvalidMoveIndex: MoveIndex =
MoveIndex(uint::max_value);
#[deriving(Eq)]
pub struct VarAssignmentIndex(uint);
static InvalidVarAssignmentIndex: VarAssignmentIndex =
VarAssignmentIndex(uint::max_value);
pub struct MovePath {
index: MovePathIndex,
/// Loan path corresponding to this move path
loan_path: @LoanPath,
/// Parent pointer, `InvalidMovePathIndex` if root
parent: MovePathIndex,
/// Head of linked list of moves to this path,
/// `InvalidMoveIndex` if not moved
first_move: MoveIndex,
/// First node in linked list of children, `InvalidMovePathIndex` if leaf
first_child: MovePathIndex,
/// Next node in linked list of parent's children (siblings),
/// `InvalidMovePathIndex` if none.
next_sibling: MovePathIndex,
}
@ -82,15 +107,27 @@ pub enum MoveKind {
}
pub struct Move {
/// Path being moved.
path: MovePathIndex,
/// id of node that is doing the move.
id: ast::node_id,
/// Kind of move, for error messages.
kind: MoveKind,
/// Next node in linked list of moves from `path`, or `InvalidMoveIndex`
next_move: MoveIndex,
}
pub struct Assignment {
/// Path being assigned.
path: MovePathIndex,
/// id where assignment occurs
id: ast::node_id,
/// span of node where assignment occurs
span: span,
}
@ -153,7 +190,6 @@ impl MoveData {
let index = MovePathIndex(self.paths.len());
self.paths.push(MovePath {
index: index,
loan_path: lp,
parent: InvalidMovePathIndex,
first_move: InvalidMoveIndex,
@ -172,7 +208,6 @@ impl MoveData {
self.mut_path(parent_index).first_child = index;
self.paths.push(MovePath {
index: index,
loan_path: lp,
parent: parent_index,
first_move: InvalidMoveIndex,

View File

@ -112,84 +112,8 @@ variables are captured and how (by ref, by copy, by move).
## Enforcement of Moves
FIXME out of date
The enforcement of moves is somewhat complicated because it is divided
amongst the liveness and borrowck modules. In general, the borrow
checker is responsible for guaranteeing that *only owned data is
moved*. The liveness checker, in contrast, is responsible for
checking that *no variable is used after it is moved*.
To see the difference, let's look at a few examples. Here is a
program fragment where the error would be caught by liveness:
struct Foo { a: int, b: ~int }
let x: Foo = ...;
let y = x.b; // (1)
let z = x; // (2) //~ ERROR use of moved value `x`
Here the liveness checker will see the assignment to `y` moves
invalidates the variable `x` because it moves the expression `x.b`.
An error is resported because `x` is not dead at the point where it is
invalidated.
In more concrete terms, the `moves_map` generated from this example
would contain both the expression `x.b` (1) and the expression `x`
(2). Note that it would not contain `x` (1), because `moves_map` only
contains the outermost expressions that are moved. However, the id of
`x` would be present in the `moved_variables_set`.
Now let's look at another illegal example, but one where liveness would
not catch the error:
struct Foo { a: int, b: ~int }
let x: @Foo = ...;
let y = x.b; //~ ERROR move from managed (@) box
This is an interesting example because the only change I've made is
to make `x` have type `@Foo` and not `Foo`. Thanks to auto-deref,
the expression `x.b` still works, but now it is short for `{x).b`,
and hence the move is actually moving out of the contents of a
managed box, which is illegal. However, liveness knows nothing of
this. It only tracks what variables are used where. The moves
pass (that is, this pass) is also ignorant of such details. From
the perspective of the moves pass, the `let y = x.b` line above
will be categorized as follows:
let y = {(x{Move}) {Move}).b; {Move}
Therefore, the reference to `x` will be present in
`variable_moves_map`, but liveness will not report an error because
there is no subsequent use.
This is where the borrow checker comes in. When the borrow checker
runs, it will see that `x.b` is present in the `moves_map`. It will
use the `mem_categorization` module to determine where the result of
this expression resides in memory and see that it is owned by managed
data, and report an error.
In principle, liveness could use the `mem_categorization` module
itself and check that moves always originate from owned data
(historically, of course, this was not the case; `mem_categorization`
used to be private to the borrow checker). However, there is another
kind of error which liveness could not possibly detect. Sometimes a
move is an error due to an outstanding loan, and it is borrow
checker's job to compute those loans. That is, consider *this*
example:
struct Foo { a: int, b: ~int }
let x: Foo = ...;
let y = &x.b; //~ NOTE loan issued here
let z = x.b; //~ ERROR move with outstanding loan
In this case, `y` is a pointer into `x`, so when `z` tries to move out
of `x`, we get an error. There is no way that liveness could compute
this information without redoing the efforts of the borrow checker.
### Closures
Liveness is somewhat complicated by having to deal with stack
closures. More information to come!
The enforcement of moves is done by the borrow checker. Please see
the section "Moves and initialization" in `middle/borrowck/doc.rs`.
## Distributive property