On-boarding
This page provides learning resources for new GREASE developers.
Binary analysis
General background
- LWN:
- The journey before main(): Basically like "How programs get run{,: ELF binaries}" above, but with Rust-focused examples.
- GDB in The Architecture of Open-Source Applications
- Phrack
- Eli Bendersky's website
- Binary symbolic execution with KLEE-Native
Try it out!
- 
Use bintools
- 
Read objdumptldr, disassemble something!
- 
Read readelftldr, examine an ELF file!
- 
Read nmtldr, look at some symbols!
 
- 
Read 
- 
- Use llvm-dwarfdump
 
- Use 
- Download Ghidra, look at an ELF program
LLVM
- Docs
- LLVM in the Architecture of Open-Source Applications
- A helpful blog
- Another helpful blog
- Yet one more helpful blog
- Are there seriously more blogs?
Try it out!
- 
Compile a program to .ll with clang -fno-discard-value-names -emit-llvm -grecord-gcc-switches -S -O1 test.c
- Do the same thing using compiler explorer
- 
Assemble it to .bc with llvm-as test.bc, disassemble it again withllvm-dis
Under-constrained symbolic execution
- Blog post about an older tool that did this for LLVM
- Original paper about UC-KLEE: Under-Constrained Symbolic Execution: Correctness Checking for Real Code
- Follow-up paper: Sys: A Static/Symbolic Tool for Finding Good Bugs in Good (Browser) Code
Dependently-typed Haskell
GREASE and its underlying libraries are programmed using depedently-typed Haskell.
Singletons
This is the name of the approach of combining GADTs and type families to achieve depedently-typed programming. The main paper is Dependently typed programming with singletons.
The singletons library uses Template Haskell where parameterized-utils encourages a more explicit style.
parameterized-utils
This is our core dependently-typed programming library.
Idioms
See Fancy Haskell idioms for some examples of idioms that arise when doing "fancy" Haskell programming.
Try it out!
Libraries
GREASE makes use of several libraries for binary analysis and symbolic execution.
Macaw
Macaw performs code discovery: given some set of entrypoints (generally addresses of functions, generally from the symbol table), Macaw recovers the CFGs (bounds of and edges between basic blocks) and translates them into its own IR. Macaw CFGs can then be turned into Crucible CFGs for symbolic execution.
- arXiv paper: https://www.arxiv.org/pdf/2407.06375
- API documentation
- GitHub (warning: not well documented)
- Memory model blog post
The closest thing out there to the Macaw/Crucible combo is angr.
Try it out!
Prepare to build Macaw (see the latest build instructions in the README):
git clone https://github.com/GaloisInc/macaw
cd macaw
git submodule update --init
ln -s cabal.project.dist cabal.project
Then print the Macaw CFGs in a test ELF file:
cabal run exe:macaw-x86-dump -- discover x86_symbolic/tests/pass/identity.opt.exe
Pass --crucible to see the Crucible CFGs too.
Crucible
Crucible is an imperative IR. It has an SSA format with basic blocks. It uses dependently-typed programming to enforce well-typedness and the SSA invariant. It comes with a symbolic execution engine. There are many frontends that translate other languages into the Crucible IR, e.g., LLVM (e.g., from C and C++ via Clang), JVM (e.g., from Java), MIR (e.g., from Rust), WebAssembly, and crucially for us, machine code (via Macaw).
Try it out!
First, install Yices and Z3. Then, Prepare to build Crucible (see the latest build instructions in the README):
git clone https://github.com/GaloisInc/crucible
cd crucible
git submodule update --init
The run Crucible on some example IR files:
cabal run exe:crucible -- simulate crucible-cli/test-data/simulate/assert.cbl
What4
What4 is our internal representation of symbolic terms. What4 values are manipulated by Crucible programs. What4 wraps a variety of SMT solvers to make queries about terms.