Compound data types
Besides integer types and reference types, Rust also features a variety of other interesting data types. This part of the tutorial will briefly go over some of these data types and how to interface with them in SAW.
Array types
Rust includes array types where the length of the array is known ahead of time.
For instance, this index
function takes an arr
argument that must contain
exactly three u32
values:
pub fn index(arr: [u32; 3], idx: usize) -> u32 {
arr[idx]
}
While Rust is good at catching many classes of programmer errors at compile
time, one thing that it cannot catch in general is out-of-bounds array
accesses. In this index
example, calling the function with a value of idx
ranging from 0
to 2
is fine, but any other choice of idx
will cause the
program to crash, since the idx
will be out of the bounds of arr
.
SAW is suited to checking for these sorts of out-of-bound accesses. Let’s write
an incorrect spec for index
to illustrate this:
let index_fail_spec = do {
arr <- mir_fresh_var "arr" (mir_array 3 mir_u32);
idx <- mir_fresh_var "idx" mir_usize;
mir_execute_func [mir_term arr, mir_term idx];
mir_return (mir_term {{ arr @ idx }});
};
m <- mir_load_module "arrays.linked-mir.json";
mir_verify m "arrays::index" [] false index_fail_spec z3;
Before we run this with SAW, let’s highlight some of the new concepts that this spec uses:
The type of the
arr
variable is specified usingmir_array 3 mir_u32
. Here, themir_array
function takes the length of the array and the element type as arguments, just as in Rust.The spec declares the return value to be
{{ arr @ idx }}
, where@
is Cryptol’s indexing operator. Also note that it is completely valid to embed a MIR array type into a Cryptol expression, as Cryptol has a sequence type that acts much like arrays do in MIR.
As we hinted above, this spec is wrong, as it says that this should work for
any possible values of idx
. SAW will catch this mistake:
$ saw arrays-fail.saw
[21:03:05.374] Loading file "arrays-fail.saw"
[21:03:05.411] Verifying arrays/47a26581::index[0] ...
[21:03:05.425] Simulating arrays/47a26581::index[0] ...
[21:03:05.426] Checking proof obligations arrays/47a26581::index[0] ...
[21:03:05.445] Subgoal failed: arrays/47a26581::index[0] index out of bounds: the length is move _4 but the index is _3
[21:03:05.445] SolverStats {solverStatsSolvers = fromList ["SBV->Z3"], solverStatsGoalSize = 53}
[21:03:05.445] ----------Counterexample----------
[21:03:05.445] idx: 2147483648
[21:03:05.445] Stack trace:
"mir_verify" (arrays-fail.saw:14:1-14:11)
Proof failed.
We can repair this spec by adding some preconditions:
let index_spec = do {
arr <- mir_fresh_var "arr" (mir_array 3 mir_u32);
idx <- mir_fresh_var "idx" mir_usize;
mir_precond {{ 0 <= idx }}; // Lower bound of idx
mir_precond {{ idx <= 2 }}; // Upper bound of idx
mir_execute_func [mir_term arr, mir_term idx];
mir_return (mir_term {{ arr @ idx }});
};
An alternative way of writing this spec is by using SAW’s mir_array_value
command:
sawscript> :type mir_array_value
MIRType -> [MIRValue] -> MIRValue
Here, the MIRType
argument represents the element type, and the list of
MIRValue
arguments are the element values of the array. We can rewrite
index_spec
using mir_array_value
like so:
let index_alt_spec = do {
arr0 <- mir_fresh_var "arr0" mir_u32;
arr1 <- mir_fresh_var "arr1" mir_u32;
arr2 <- mir_fresh_var "arr2" mir_u32;
let arr = mir_array_value mir_u32 [mir_term arr0, mir_term arr1, mir_term arr2];
idx <- mir_fresh_var "idx" mir_usize;
mir_precond {{ 0 <= idx }}; // Lower bound of idx
mir_precond {{ idx <= 2 }}; // Upper bound of idx
mir_execute_func [arr, mir_term idx];
mir_return (mir_term {{ [arr0, arr1, arr2] @ idx }});
Here, [arr0, arr1, arr2]
is Cryptol notation for constructing a length-3
sequence consisting of arr0
, arr1
, and arr2
as the elements.
index_alt_spec
is equivalent to index_spec
, albeit more verbose. For this
reason, it is usually preferable to use mir_fresh_var
to create an entire
symbolic array rather than creating separate symbolic values for each element
and combining them with mir_array_value
.
There are some situations where mir_array_value
is the only viable choice,
however. Consider this variant of the index
function:
pub fn index_ref_arr(arr: [&u32; 3], idx: usize) -> u32 {
*arr[idx]
}
When writing a SAW spec for index_ref_arr
, we can’t just create a symbolic
variable for arr
using mir_alloc (mir_array 3 ...)
, as the reference values
in the array wouldn’t point to valid memory. Instead, we must individually
allocate the elements of arr
using separate calls to mir_alloc
and then
build up the array using mir_array_value
. (As an exercise, try writing and
verifying a spec for index_ref_arr
).
Tuple types
Rust includes tuple types where the elements of the tuple can be of different types. For example:
pub fn flip(x: (u32, u64)) -> (u64, u32) {
(x.1, x.0)
}
SAW includes a mir_tuple
function for specifying the type of a tuple value.
In addition, one can embed MIR tuples into Cryptol, as Cryptol also includes
tuple types whose fields can be indexed with .0
, .1
, etc. Here is a spec
for flip
that makes use of all these features:
let flip_spec = do {
x <- mir_fresh_var "x" (mir_tuple [mir_u32, mir_u64]);
mir_execute_func [mir_term x];
mir_return (mir_term {{ (x.1, x.0) }});
};
SAW also includes a mir_tuple_value
function for constructing a tuple value
from other MIRValue
s:
sawscript> :type mir_tuple_value
[MIRValue] -> MIRValue
mir_tuple_value
plays a similar role for tuples as mir_array_value
does for
arrays.
Struct types
Rust supports the ability for users to define custom struct types. Structs are uniquely identified by their names, so if you have two structs like these:
pub struct S(u32, u64);
pub struct T(u32, u64);
Then even though the fields of the S
and T
structs are the same, they are
not the same struct. This is a type system feature that Cryptol does not
have, and for this reason, it is not possible to embed MIR struct values into
Cryptol. It is also not possible to use mir_fresh_var
to create a symbolic
struct value. Instead, one can use the mir_struct_value
command:
sawscript> :type mir_struct_value
MIRAdt -> [MIRValue] -> MIRValue
Like with mir_array_value
and mir_tuple_value
, the mir_struct_value
function takes a list of MIRValue
s as arguments. What makes
mir_struct_value
unique is its MIRAdt
argument, which we have not seen up
to this point. In this context, “Adt
” is shorthand for “algebraic data
type”, and Rust’s structs
are an example of ADTs. (Rust also supports enums, another type of ADT that we
will see later in this tutorial.)
ADTs in Rust are named entities, and as such, they have unique identifiers in
the MIR JSON file in which they are defined. Looking up these identifiers can
be somewhat error-prone, so SAW offers a mir_find_adt
command that computes
an ADT’s identifier and returns the MIRAdt
associated with it:
sawscript> :type mir_find_adt
MIRModule -> String -> [MIRType] -> MIRAdt
Here, MIRModule
correspond to the MIR JSON file containing the ADT
definition, and the String
is the name of the ADT whose identifier we want to
look up. The list of MIRType
s represent types to instantiate any type
parameters to the struct (more on this in a bit).
As an example, we can look up the S
and T
structs from above like so:
m <- mir_load_module "structs.linked-mir.json";
let s_adt = mir_find_adt m "structs::S" [];
let t_adt = mir_find_adt m "structs::T" [];
We pass an empty list of MIRType
s to each use of mir_find_adt
, as neither
S
nor T
have any type parameters. An example of a struct that does include
type parameters can be seen here:
pub struct Foo<A, B>(A, B);
As mentioned before, SAW doesn’t support generic definitions out of the box, so
the only way that we can make use of the Foo
struct is by looking up a
particular instantiation of Foo
’s type parameters. If we define a function
like this, for example:
pub fn make_foo() -> Foo<u32, u64> {
Foo(27, 42)
}
Then this function instantiates Foo
’s A
type parameter with u32
and the
B
type parameter with u64
. We can use mir_find_adt
to look up this
particular instantiation of Foo
like so:
let foo_adt = mir_find_adt m "structs::Foo" [mir_u32, mir_u64];
In general, a MIR JSON file can have many separate instantiations of a single
struct’s type parameters, and each instantiation must be looked up separately
using mir_find_adt
.
Having looked up Foo<u32, u64>
using mir_find_adt
, let’s use the resulting
MIRAdt
in a spec:
let make_foo_spec = do {
mir_execute_func [];
let ret = mir_struct_value
foo_adt
[mir_term {{ 27 : [32] }}, mir_term {{ 42 : [64] }}];
mir_return ret;
};
mir_verify m "structs::make_foo" [] false make_foo_spec z3;
Note that we are directly writing out the values 27
and 42
in Cryptol.
Cryptol’s numeric literals can take on many different types, so in order to
disambiguate which type they should be, we give each numeric literal an
explicit type annotation. For instance, the expression 27 : [32]
means that
27
should be a 32-bit integer.
Symbolic structs
Let’s now verify a function that takes a struct value as an argument:
pub struct Bar(u8, u16, Foo<u32, u64>);
pub fn do_stuff_with_bar(b: Bar) {
// ...
}
Moreover, let’s verify this function for all possible Bar
values. One way to
do this is to write a SAW spec that constructs a struct value whose fields are
themselves symbolic:
let bar_adt = mir_find_adt m "structs::Bar" [];
let do_stuff_with_bar_spec1 = do {
z1 <- mir_fresh_var "z1" mir_u32;
z2 <- mir_fresh_var "z2" mir_u64;
let z = mir_struct_value
foo_adt
[mir_term z1, mir_term z2];
x <- mir_fresh_var "x" mir_u8;
y <- mir_fresh_var "y" mir_u16;
let b = mir_struct_value
bar_adt
[mir_term x, mir_term y, z];
mir_execute_func [b];
// ...
};
This is a rather tedious process, however, as we had to repeatedly use
mir_fresh_var
to create a fresh, symbolic value for each field. Moreover,
because mit_fresh_var
does not work for structs, we had to recursively apply
this process in order to create a fresh Foo
value. It works, but it takes a
lot of typing to accomplish.
To make this process less tedious, SAW offers a mir_fresh_expanded_value
command that allows one to create symbolic values of many more types. While
mir_fresh_var
is limited to those MIR types that can be directly converted to
Cryptol, mir_fresh_expanded_value
can create symbolic structs by automating
the process of creating fresh values for each field. This process also applies
recursively for struct fields, such as the Foo
field in Bar
.
As an example, a much shorter way to write the spec above using
mir_fresh_expanded_value
is:
let do_stuff_with_bar_spec2 = do {
b <- mir_fresh_expanded_value "b" (mir_adt bar_adt);
mir_execute_func [b];
// ...
};
That’s it! Note that the string "b"
is used as a prefix for all fresh names
that mir_fresh_expanded_value
generates, so if SAW produces a counterexample
involving this symbolic struct value, one can expect to see names such as
b_0
, b_1
, etc. for the fields of the struct.
mir_fresh_expanded_value
makes it easier to construct large, compound values
that consist of many smaller, inner values. The drawback is that you can’t
refer to these inner values in the postconditions of a spec. As a result, there
are some functions for which mir_fresh_expanded_value
isn’t suitable, so keep
this in mind before reaching for it.
Enum types
Besides structs, another form of ADT that Rust supports are enums. Each enum
has a number of different variants that describe the different ways that an
enum value can look like. A famous example of a Rust enum is the Option
type,
which is defined by the standard library like so:
enum Option<T> {
None,
Some(T),
}
Option
is commonly used in Rust code to represent a value that may be present
(Some
) or absent (None
). For this reason, we will use Option
as our
motivating example of an enum in this section.
First, let’s start by defining some functions that make use of Option
’s
variants:
pub fn i_found_something(x: u32) -> Option<u32> {
Some(x)
}
pub fn i_got_nothing() -> Option<u32> {
None
}
Both functions return an Option<u32>
value, but each function returns a
different variant. In order to tell these variants apart, we need a SAW
function which can construct an enum value that allows the user to pick which
variant they want to construct. The `mir_enum_value function does exactly that:
sawscript> :type mir_enum_value
MIRAdt -> String -> [MIRValue] -> MIRValue
Like mir_struct_value
, mir_enum_value
also requires a MIRAdt
argument in
order to discern which particular enum you want. Unlike mir_struct_value
,
however, it also requires a String
which variant of the enum you want. In the
case of Option
, this String
will either be "None"
or "Some"
. Finally,
the [MIRValue]
arguments represent the fields of the enum variant.
Let’s now verify some enum-related code with SAW. First, we must look up the
Option<u32>
ADT, which works just as if you had a struct type:
let option_u32 = mir_find_adt m "core::option::Option" [mir_u32];
Next, we can use this ADT to construct enum values. We shall use
mir_enum_value
to create a Some
value in the spec for i_found_something
:
let i_found_something_spec = do {
x <- mir_fresh_var "x" mir_u32;
mir_execute_func [mir_term x];
let ret = mir_enum_value option_u32 "Some" [mir_term x];
mir_return ret;
};
mir_verify m "enums::i_found_something" [] false i_found_something_spec z3;
Note that while we used the full identifier core::option::Option
to look up
the Option
ADT, we do not need to use the core::option
prefix when
specifying the "Some"
variant. This is because SAW already knows what the
prefix should be from the option_u32
ADT, so the "Some"
shorthand suffices.
Similarly, we can also write a spec for i_got_nothing
, which uses the None
variant:
let i_got_nothing_spec = do {
mir_execute_func [];
let ret = mir_enum_value option_u32 "None" [];
mir_return ret;
};
mir_verify m "enums::i_got_nothing" [] false i_got_nothing_spec z3;
Symbolic enums
In order to create a symbolic struct, one could create symbolic fields and pack
them into a larger struct value using mir_struct_value
. The same process is
not possible with mir_enum_value
, however, as a symbolic enum value would
need to range over all possible variants in an enum.
Just as mir_fresh_expanded_value
supports creating symbolic structs,
mir_fresh_expanded_value
also supports creating symbolic enum values. For
example, given this function that accepts an Option<u32>
value as an
argument:
pub fn do_stuff_with_option(o: Option<u32>) {
// ...
}
We can write a spec for this function that considers all possible Option<u32>
values like so:
let do_stuff_with_option_spec = do {
o <- mir_fresh_expanded_value "o" (mir_adt option_u32);
mir_execute_func [o];
// ...
};
Here, o
can be a None
value, or it can be a Some
value with a symbolic
field.
Slices
Slices are a particular type of reference that allow referencing contiguous
sequences of elements in a collection, such as an array. Unlike ordinary
references (e.g., &u32
), SAW does not permit allocating a slice directly.
Instead, one must take a slice of an existing reference. To better illustrate
this distinction, consider this function:
pub fn sum_of_prefix(s: &[u32]) -> u32 {
s[0].wrapping_add(s[1])
}
sum_of_prefix
takes a slice to a sequence of u32
s as an argument, indexes
into the first two elements in the sequence, and adds them together. There are
many possible ways we can write a spec for this function, as the slice argument
may be backed by many different sequences. For example, the slice might be
backed by an array whose length is exactly two:
let a1 = [1, 2];
let s1 = &a1[..];
let sum1 = sum_of_prefix(s1);
We could also make a slice whose length is longer than two:
let a2 = [1, 2, 3, 4, 5];
let s2 = &a2[..];
let sum2 = sum_of_prefix(s2);
Alternatively, the slice might be a subset of an array whose length is longer than two:
let a3 = [1, 2, 3, 4, 5];
let s3 = &a3[0..2];
let sum3 = sum_of_prefix(s3);
All of these are valid ways of building the slice argument to sum_of_prefix
.
Let’s try to write SAW specifications that construct these different forms of
slices. To do so, we will need SAW functions that take a reference to a
collection (e.g., an array) and converts them into a slice reference. The
mir_slice_value
function is one such function:
sawscript> :type mir_slice_value
MIRValue -> MIRValue
mir_slice_value arr_ref
is the SAW equivalent of writing arr_ref[..]
. That
is, if arr_ref
is of type &[T; N]
, then mir_slice_value arr_ref
is of
type &[T]
. Note that arr_ref
must be a reference to an array, not an
array itself.
Let’s use mir_slice_value
to write a spec for sum_of_prefix
when the slice
argument is backed by an array of length two:
let sum_of_prefix_spec1 = do {
a_ref <- mir_alloc (mir_array 2 mir_u32);
a_val <- mir_fresh_var "a" (mir_array 2 mir_u32);
mir_points_to a_ref (mir_term a_val);
let s = mir_slice_value a_ref;
mir_execute_func [s];
mir_return (mir_term {{ (a_val @ 0) + (a_val @ 1) }});
};
The first part of this spec allocates an array reference a_ref
and declares
that it points to a fresh array value a_val
. The next part declares a slice
s
that is backed by the entirety of a_ref
, which is then passed as an
argument to the function itself. Finally, the return value is declared to be
the sum of the first and second elements of a_val
, which are the same values
that back the slice s
itself.
As noted above, the sum_of_prefix
function can work with slices of many
different lengths. Here is a slight modification to this spec that declares it
to take a slice of length 5 rather than a slice of length 2:
let sum_of_prefix_spec2 = do {
a_ref <- mir_alloc (mir_array 5 mir_u32);
a_val <- mir_fresh_var "a" (mir_array 5 mir_u32);
mir_points_to a_ref (mir_term a_val);
let s = mir_slice_value a_ref;
mir_execute_func [s];
mir_return (mir_term {{ (a_val @ 0) + (a_val @ 1) }});
};
Both of these examples declare a slice whose length matches the length of the
underlying array. In general, there is no reason that these have to be the
same, and it is perfectly fine for a slice’s length to be less than the the
length of the underlying array. In Rust, for example, we can write a slice of a
subset of an array by writing &arr_ref[0..2]
. The SAW equivalent of this can
be achieved with the mir_slice_range_value
function:
sawscript> :type mir_slice_range_value
MIRValue -> Int -> Int -> MIRValue
mir_slice_range_value
takes takes two additional Int
arguments that
represent (1) the index to start the slice from, and (2) the index at which the
slice ends. For example, mir_slice_range_value arr_ref 0 2
creates a slice
that is backed by the first element (index 0
) and the second element (index
1
) of arr_ref
. Note that the range [0..2]
is half-open, so this range
does not include the third element (index 2
).
For example, here is how to write a spec for sum_of_prefix
where the slice is
a length-2 subset of the original array:
let sum_of_prefix_spec3 = do {
a_ref <- mir_alloc (mir_array 5 mir_u32);
a_val <- mir_fresh_var "a" (mir_array 5 mir_u32);
mir_points_to a_ref (mir_term a_val);
let s = mir_slice_range_value a_ref 0 2;
mir_execute_func [s];
mir_return (mir_term {{ (a_val @ 0) + (a_val @ 1) }});
};
Note that both Int
arguments to mir_slice_range_value
must be concrete
(i.e., not symbolic). (See the section below if you want an explanation for why
they are not allowed to be symbolic.)
Aside: slices of arbitrary length
After reading the section about slices above, one might reasonably wonder: is
there a way to write a more general spec for sum_of_prefix
: that covers all
possible slice lengths n
, where n
is greater than or equal to 2? In this
case, the answer is “no”.
This is a fundamental limitation of the way SAW’s symbolic execution works. The
full reason for why this is the case is somewhat technical (keep reading if you
want to learn more), but the short answer is that if SAW attempts to
simulate code whose length is bounded by a symbolic integer, then SAW will go
into an infinite loop. To avoid this pitfall, the mir_slice_range_value
function very deliberately requires the start and end values to be concrete
integers, as allowing these values to be symbolic would allow users to
inadvertently introduce infinite loops in their specifications.
A longer answer as to why SAW loops forever on computations that are bounded by symbolic lengths: due to the way SAW’s symblolic execution works, it creates a complete model of the behavior of a function for all possible inputs. The way that SAW achieves this is by exploring all possible execution paths through a program. If a program involves a loop, for example, then SAW will unroll all iterations of the loop to construct a model of the loop’s behavior. Similarly, if a sequence (e.g., a slice or array) has an unspecified length, then SAW must consider all possible lengths of the array.
SAW’s ability to completely characterize the behavior of all paths through a function is one of its strengths, as this allows it to prove theorems that other program verification techniques would not. This strength is also a weakness, however. If a loop has a symbolic number of iterations, for example, then SAW will spin forever trying to unroll the loop. Similarly, if a slice were to have a symbolic length, then SAW would spin forever trying to simulate the program for all possible slice lengths.
In general, SAW cannot prevent users from writing programs whose length is bounded by a symbolic value. For now, however, SAW removes one potential footgun by requiring that slice values always have a concrete length.