Memory model

GREASE builds on the Macaw and Crucible-LLVM memory models.

Background: Crucible-LLVM

In the Crucible-LLVM memory model (permalink),

Each allocation in the memory model is completely isolated from all other allocations and has a unique block identifier. Pointers into the memory model have two components: the identifier of the block they reference and an offset into that block. Note that both the identifier and the offset can be symbolic.

Said another way, the memory model is two-dimensional: One "axis" of a pointer determines which allocation (possibly allocations in the case of a symbolic block identifier) it points to, and the other "axis" is the offset within that allocation.1 For example, the null pointer is represented by the pair (0, 0x0).

Crucible-LLVM memory

Global memory

The Macaw memory model (and so, GREASE) represents a binary's address space as a single Crucible-LLVM allocation of size 0xffffffff. Pointers that appear in the binary become offsets into this allocation. For example, if the global allocation had block identifier 1, the global g was at address 0x903278, then a load from g would become a load from the Crucible-LLVM pointer (1, 0x903278).

Mutable globals

Mutable global variables are tricky:

  • If GREASE initialized mutable global variables to their initial values or to completely symbolic values, it's possible that GREASE would report "bugs" that are infeasible in practice, because the value of the global required to trigger the bug isn't actually feasible at any of the callsites to the function under analysis.
  • On the other hand, if GREASE doesn't initialize global variables at all, any loads from them (that aren't preceeded by writes made by the function under analysis or its callees) will fail. As mutable global variables are pervasive, this would lead to a significant lack of coverage.

GREASE's behavior in this regard is controlled via the --globals flag. The possible values are:

  • Initialized: Each mutable global is initialized using its initializer before analysis of the target function.
  • Symbolic: Each mutable global is initialized to a symbolic value before analysis of the target function.
  • Uninitialized: Mutable globals are left uninitialized. Reads from mutable globals will fail, causing GREASE to be unable to proceed with analysis.

The stack

GREASE represents the stack as a separate Crucible-LLVM allocation. Before symbolically executing the target function, the stack pointer is initialized to point at the end of a fairly large allocation (currently, 1 MiB). This means that GREASE assumes a standard stack discipline, namely that the stack grows downwards from high addresses to low, and functions never access memory above their stack frame (with some exceptions, which are explained below). In theory, false positives may occur if a binary overruns this stack allocation, though in practice, symbolic execution of that much code would take a considerable amount of time.

On some architectures, GREASE will allocate a small amount of space just above the stack frame to store the current function's return address (e.g., on x86-64 and PowerPC) or the back chain (e.g., PowerPC), but otherwise, GREASE will assume that functions never access memory above their stack frame. Note that because GREASE leaves most of the space above the stack frame unallocated by default, this will cause GREASE to reject programs that load function arguments that are spilled to the stack.2 In general, it is challenging to infer how many arguments a function has from analyzing raw assembly code, so GREASE does not make an effort to do so.

Instead, GREASE offers a --stack-argument-slots=<NUM> that can be used to allocate <NUM> additional stack slots above the stack frame before simulating the entrypoint function. <NUM> is 0 by default, so if you know ahead of time how many stack-spilled arguments the entrypoint function requires, you should use --stack-argument-slots to ensure that GREASE can simulate it properly. Note that in general, it is not advisable to run GREASE with a large --stack-argument-slots value. This is because if GREASE reserves more stack space than what is required to model stack-spilled arguments, then reading from the extra stack space will succeed (incorrectly) instead of resulting in an uninitialized stack read error (which is normally what is desired).

The heap

GREASE overrides the behavior of the malloc and free functions such that they interact nicely with the Crucible-LLVM memory model. Each call to malloc will generate a distinct allocation with a non-zero block identifier that is different from all other allocations in the program, including global memory. Similarly, each call to free will deallocate a Crucible-LLVM allocation.

Be mindful of the following limitations of GREASE's model of the heap:

  • There is no limit to the maximum heap size, so GREASE cannot model heap overflows.
  • Because each heap allocation is completely isolated from all other allocations, GREASE cannot model events that would arise after certain types of heap-related exploits (e.g., buffer overflows).
1

Non-pointer bitvectors (e.g. the 5 in int x = 5;) are represented the same way as pointers, but with a block identifier that’s concretely 0.

2

The number of arguments a function must have in order for it to spill some of its arguments to the stack depends on which architecture is being used. In general, the more arguments a function has, the more likely it is to spill arguments to the stack.