On-boarding
This page provides learning resources for new GREASE developers.
Binary analysis
General background
- GDB in The Architecture of Open-Source Applications
- Phrack
- Eli Bendersky's website
- Binary symbolic execution with KLEE-Native
Try it out!
-
Use bintools
-
Read
objdump
tldr, disassemble something! -
Read
readelf
tldr, examine an ELF file! -
Read
nm
tldr, look at some symbols!
-
Read
-
- Use
llvm-dwarfdump
- Use
- Download Ghidra, look at an ELF program
LLVM
- Docs
- LLVM in the Architecture of Open-Source Applications
- A helpful blog
- Another helpful blog
- Yet one more helpful blog
- Are there seriously more blogs?
Try it out!
-
Compile a program to .ll with
clang -fno-discard-value-names -emit-llvm -grecord-gcc-switches -S -O1 test.c
- Do the same thing using compiler explorer
-
Assemble it to .bc with
llvm-as test.bc
, disassemble it again withllvm-dis
Under-constrained symbolic execution
- Blog post about an older tool that did this for LLVM
- Original paper about UC-KLEE: Under-Constrained Symbolic Execution: Correctness Checking for Real Code
- Follow-up paper: Sys: A Static/Symbolic Tool for Finding Good Bugs in Good (Browser) Code
Dependently-typed Haskell
GREASE and its underlying libraries are programmed using depedently-typed Haskell.
Singletons
This is the name of the approach of combining GADTs and type families to achieve depedently-typed programming. The main paper is Dependently typed programming with singletons.
The singletons library uses Template Haskell where parameterized-utils encourages a more explicit style.
parameterized-utils
This is our core dependently-typed programming library.
Idioms
See Fancy Haskell idioms for some examples of idioms that arise when doing "fancy" Haskell programming.
Try it out!
Libraries
GREASE makes use of several libraries for binary analysis and symbolic execution.
Macaw
Macaw performs code discovery: given some set of entrypoints (generally addresses of functions, generally from the symbol table), Macaw recovers the CFGs (bounds of and edges between basic blocks) and translates them into its own IR. Macaw CFGs can then be turned into Crucible CFGs for symbolic execution.
- arXiv paper: https://www.arxiv.org/pdf/2407.06375
- API documentation
- GitHub (warning: not well documented)
- Memory model blog post
The closest thing out there to the Macaw/Crucible combo is angr.
Try it out!
Prepare to build Macaw (see the latest build instructions in the README):
git clone https://github.com/GaloisInc/macaw
cd macaw
git submodule update --init
ln -s cabal.project.dist cabal.project
Then print the Macaw CFGs in a test ELF file:
cabal run exe:macaw-x86-dump -- discover x86_symbolic/tests/pass/identity.opt.exe
Pass --crucible
to see the Crucible CFGs too.
Crucible
Crucible is an imperative IR. It has an SSA format with basic blocks. It uses dependently-typed programming to enforce well-typedness and the SSA invariant. It comes with a symbolic execution engine. There are many frontends that translate other languages into the Crucible IR, e.g., LLVM (e.g., from C and C++ via Clang), JVM (e.g., from Java), MIR (e.g., from Rust), WebAssembly, and crucially for us, machine code (via Macaw).
Try it out!
First, install Yices and Z3. Then, Prepare to build Crucible (see the latest build instructions in the README):
git clone https://github.com/GaloisInc/crucible
cd crucible
git submodule update --init
The run Crucible on some example IR files:
cabal run exe:crucible -- simulate crucible-cli/test-data/simulate/assert.cbl
What4
What4 is our internal representation of symbolic terms. What4 values are manipulated by Crucible programs. What4 wraps a variety of SMT solvers to make queries about terms.