Safe Haskell | None |
---|---|
Language | Haskell2010 |
Data.Macaw.Discovery.Classifier
Description
Definitions supporting block classification during code discovery
This module defines data types and helpers for building block control-flow classifiers. It ships a pre-defined set of classifiers that works well for most architectures, and a reasonable default classifier is provided for every supported architecture. The infrastructure is exposed so that derived tools can customize code discovery heuristics and so that architectures can supply architecture-specific rules.
Note that this customization is necessary for generating architecture-specific block terminators, which can only be injected correctly by analyzing values after abstract interpretation has been applied to the rest of the code.
Synopsis
- isExecutableSegOff :: forall (w :: Nat). MemSegmentOff w -> Bool
- identifyConcreteAddresses :: forall (w :: Natural). MemWidth w => Memory w -> AbsValue w (BVType w) -> [MemSegmentOff w]
- branchClassifier :: BlockClassifier arch ids
- callClassifier :: BlockClassifier arch ids
- returnClassifier :: BlockClassifier arch ids
- directJumpClassifier :: BlockClassifier arch ids
- noreturnCallClassifier :: BlockClassifier arch ids
- tailCallClassifier :: BlockClassifier arch ids
- branchBlockState :: forall a ids t. Foldable t => ArchitectureInfo a -> AbsProcessorState (ArchReg a) ids -> t (Stmt a ids) -> RegState (ArchReg a) (Value a ids) -> Value a ids BoolType -> Bool -> AbsBlockState (ArchReg a)
- classifierEndBlock :: BlockClassifierContext arch ids -> MemAddr (ArchAddrWidth arch)
Utilities
isExecutableSegOff :: forall (w :: Nat). MemSegmentOff w -> Bool
identifyConcreteAddresses :: forall (w :: Natural). MemWidth w => Memory w -> AbsValue w (BVType w) -> [MemSegmentOff w]
Get code pointers out of an abstract value.
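As a rough illustration of the idea, the sketch below models an abstract value as either a finite set of candidate addresses or an unknown value, and keeps only the candidates that fall inside an executable segment. Every name here (`AbsVal`, `Segment`, `inSegment`, `codePointers`) is a hypothetical stand-in. None of it is macaw's actual `AbsValue`, `Memory`, or `MemSegmentOff` API, and the real function may handle other abstract-value shapes differently.

```haskell
import Data.Word (Word64)

-- Hypothetical toy model, not macaw's types.
data AbsVal
  = FinSet [Word64]  -- the value is one of these concrete candidates
  | Top              -- nothing is known about the value

data Segment = Segment
  { segBase :: Word64  -- first address of the segment
  , segSize :: Word64  -- size of the segment in bytes
  , segExec :: Bool    -- whether the segment is executable
  }

-- True when the address lies inside the segment.
-- The subtraction form avoids overflow in segBase + segSize.
inSegment :: Word64 -> Segment -> Bool
inSegment a s = a >= segBase s && a - segBase s < segSize s

-- Keep only the candidates that land in an executable segment.
-- An unknown value yields no code pointers.
codePointers :: [Segment] -> AbsVal -> [Word64]
codePointers segs (FinSet addrs) =
  [a | a <- addrs, any (\s -> segExec s && inSegment a s) segs]
codePointers _ Top = []
```

In this toy, a candidate that points into a non-executable segment, or into no segment at all, is simply dropped.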
Pre-defined classifiers
branchClassifier :: BlockClassifier arch ids
The classifier for conditional and unconditional branches
Note that this classifier can convert a conditional branch to an unconditional branch if (and only if) the condition is syntactically true or false after constant propagation. It never attempts sophisticated path trimming.
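The distinction between a syntactic check and path trimming can be sketched with a toy terminator type. The types and the `simplifyBranch` function below are hypothetical and exist only for illustration; they are not part of macaw.

```haskell
-- Hypothetical toy model, not macaw's types.
data Cond
  = Lit Bool     -- a condition that constant propagation reduced to a literal
  | Var String   -- an opaque condition
  | Or Cond Cond
  | Not Cond
  deriving (Eq, Show)

data Term
  = Jump Integer
  | Branch Cond Integer Integer  -- condition, true target, false target
  deriving (Eq, Show)

-- Rewrite a conditional branch to an unconditional jump only when the
-- condition is literally True or False. Anything else is left alone.
simplifyBranch :: Term -> Term
simplifyBranch (Branch (Lit True) t _)  = Jump t
simplifyBranch (Branch (Lit False) _ f) = Jump f
simplifyBranch term                     = term
```

A condition such as `Or (Var "x") (Not (Var "x"))` always holds, but it is not syntactically a literal, so this sketch keeps the branch conditional. That mirrors the "no sophisticated path trimming" behavior described above.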
callClassifier :: BlockClassifier arch ids
Use the architecture-specific callback to check whether the last statement was a call.
Note that in some cases the call is known not to return, and thus this code will never jump to the return address; in that case, the noreturnCallClassifier should fire. As such, callClassifier should always be attempted *after* noreturnCallClassifier.
returnClassifier :: BlockClassifier arch ids
Check that this block ends with a return, as identified by the architecture-specific processing. Basic return identification can be performed by detecting when the Instruction Pointer (ip_reg) contains the ReturnAddr symbolic value (initially placed on the top of the stack or in the Link Register by the architecture-specific state initializer). However, some architectures perform expression evaluations on this value before loading the IP (e.g. ARM will clear the low bit in T32 mode or the low 2 bits in A32 mode), so the actual detection process is deferred to architecture-specific functionality.
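To see why a plain equality test is not enough, consider the toy value language below. It is a hypothetical sketch: `Val`, `isReturnNaive`, and `isReturnArmLike` are invented for illustration and do not reflect how macaw or its ARM backend represent or detect returns.

```haskell
import Data.Bits (complement)
import Data.Word (Word32)

-- Hypothetical toy model, not macaw's types.
data Val
  = ReturnAddr          -- symbolic return address set up by the state initializer
  | Const Word32
  | AndMask Val Word32  -- bitwise AND of a value with a constant mask
  deriving (Eq, Show)

-- Naive check: the instruction pointer is exactly the symbolic return address.
isReturnNaive :: Val -> Bool
isReturnNaive v = v == ReturnAddr

-- ARM-like check: also accept the return address after the low bit
-- (mask 0xFFFFFFFE) or the low two bits (mask 0xFFFFFFFC) were cleared.
isReturnArmLike :: Val -> Bool
isReturnArmLike ReturnAddr = True
isReturnArmLike (AndMask v m) =
  m `elem` [complement 1, complement 3] && isReturnArmLike v
isReturnArmLike _ = False
```

Under these assumptions, `AndMask ReturnAddr 0xFFFFFFFE` is missed by the naive check but accepted by the ARM-like one, which is the kind of gap the architecture-specific hook is meant to close.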
directJumpClassifier :: BlockClassifier arch ids
Classifies jumps to concrete addresses as unconditional jumps. Note that this logic is substantially similar to the tailCallClassifier in cases where the function does not establish a stack frame (i.e., leaf functions).
Note that known call targets are not eligible to be intra-procedural jump targets (see classifyDirectJump). This means that we need to conservatively prefer to mis-classify terminators as jumps rather than tail calls. The downside of this choice is that code that could be considered a tail-called function may be duplicated in some cases (i.e., considered part of multiple functions).
The alternative interpretation (eagerly preferring tail calls) can cause a section of a function to be marked as a tail-called function, thereby blocking the directJumpClassifier or the branchClassifier from recognizing the "callee" as an intra-procedural jump. This results in classification failures that we don't have any mitigations for.
noreturnCallClassifier :: BlockClassifier arch ids
Attempt to recognize a call to a function that is known to not return. These are effectively tail calls, even if the compiler did not obviously generate a tail call instruction sequence.
This classifier is important because compilers often place garbage instructions (for alignment, or possibly the next function) after calls to no-return functions. Without knowledge of no-return functions, macaw would otherwise think that the callee could return to the garbage instructions, causing later classification failures.
This functionality depends on a set of known no-return functions being specified as an input to the code discovery process (see pctxKnownFnEntries).
Note that this classifier should always be run before the callClassifier.
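The ordering requirement can be illustrated with a small first-match model in which each classifier either produces a result or declines. This is only a sketch under stated assumptions: `Block`, `Terminator`, and the functions below are hypothetical, classifiers are modeled as plain `Maybe`-returning functions, and nothing here is macaw's `BlockClassifier` type.

```haskell
import Control.Applicative ((<|>))
import Data.Maybe (fromMaybe)

-- Hypothetical toy model, not macaw's types.
data Terminator
  = NoReturnCall String  -- call to a function known not to return
  | Call String          -- ordinary call with a return site
  | Unclassified
  deriving (Eq, Show)

-- A toy block: the name of the call target, if the block ends in a call.
newtype Block = Block { callTarget :: Maybe String }

-- Fires only when the target is in the set of known no-return functions.
noreturnCall :: [String] -> Block -> Maybe Terminator
noreturnCall known b = do
  t <- callTarget b
  if t `elem` known then Just (NoReturnCall t) else Nothing

-- Fires for any call, including calls that never return.
call :: Block -> Maybe Terminator
call b = Call <$> callTarget b

-- Correct order: try the more specific classifier first.
classify :: [String] -> Block -> Terminator
classify known b = fromMaybe Unclassified (noreturnCall known b <|> call b)

-- Wrong order: the general classifier shadows the specific one.
classifyWrongOrder :: [String] -> Block -> Terminator
classifyWrongOrder known b =
  fromMaybe Unclassified (call b <|> noreturnCall known b)
```

With `known = ["exit"]`, `classify` reports a call to `exit` as `NoReturnCall "exit"`, whereas `classifyWrongOrder` reports `Call "exit"` and would treat whatever follows the call as a return site.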
tailCallClassifier :: BlockClassifier arch ids
Attempt to recognize a tail call.
The current heuristic is that the target looks like a call, except the stack height in the caller is 0.
Note that, in leaf functions (i.e., with no stack usage), tail calls and jumps look substantially similar. We typically apply the jump classifier first so that jumps are preferred, which means that we very rarely recognize tail calls in leaf functions.
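The interaction between the jump and tail-call heuristics can be sketched as follows. The model is deliberately simplified and hypothetical: `Ctx`, its fields, and both classifier functions are invented for illustration, and the real classifiers inspect considerably more state.

```haskell
import Control.Applicative ((<|>))

-- Hypothetical toy model, not macaw's types.
data Ctx = Ctx
  { stackHeight    :: Int        -- stack height in the caller at the jump
  , target         :: Integer    -- concrete jump target
  , knownFnEntries :: [Integer]  -- addresses already known to be function entries
  }

data Term
  = DirectJump Integer
  | TailCall Integer
  deriving (Eq, Show)

-- Known call targets are not eligible to be intra-procedural jump targets.
directJump :: Ctx -> Maybe Term
directJump c
  | target c `notElem` knownFnEntries c = Just (DirectJump (target c))
  | otherwise                           = Nothing

-- The target looks like a call and the caller's stack height is 0.
tailCall :: Ctx -> Maybe Term
tailCall c
  | stackHeight c == 0 = Just (TailCall (target c))
  | otherwise          = Nothing

-- The jump classifier runs first, so jumps are preferred.
classify :: Ctx -> Maybe Term
classify c = directJump c <|> tailCall c
```

In a leaf function (stack height 0) that jumps to an address not yet known to be a function entry, both heuristics would match, and this ordering yields `DirectJump`. The tail-call result appears only when the target is already a known function entry.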
Reusable helpers
branchBlockState
Arguments
:: forall a ids t. Foldable t | |
=> ArchitectureInfo a | |
-> AbsProcessorState (ArchReg a) ids | |
-> t (Stmt a ids) | |
-> RegState (ArchReg a) (Value a ids) | Register values |
-> Value a ids BoolType | Branch condition |
-> Bool | Flag indicating if branch is true or false. |
-> AbsBlockState (ArchReg a) |
This computes the abstract state for the start of a block for a given branch target. It is used so that we can make use of the branch condition to simplify the abstract state.
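A minimal sketch of the underlying idea, assuming a toy abstract domain in which each register is either a known boolean or unknown: when the branch condition tests a register directly, the successor's state can record the value implied by the direction taken. The names `Abs`, `Cond`, and `refine` are hypothetical, and the real function works over macaw's abstract processor state rather than an association list.

```haskell
-- Hypothetical toy model, not macaw's types.
data Abs
  = Known Bool  -- the register definitely holds this value
  | Unknown     -- nothing is known about the register
  deriving (Eq, Show)

type AbsState = [(String, Abs)]

data Cond
  = RegIsTrue String  -- the branch tests this register directly
  | Opaque            -- a condition the toy analysis cannot use

-- Compute the abstract state at the start of a branch target.
-- The Bool says whether we are following the true or the false edge.
refine :: AbsState -> Cond -> Bool -> AbsState
refine st (RegIsTrue r) taken =
  (r, Known taken) : filter ((/= r) . fst) st
refine st Opaque _ = st
```

For example, refining `[("zf", Unknown)]` with `RegIsTrue "zf"` gives `Known True` for `zf` on the true edge and `Known False` on the false edge, while an opaque condition leaves the state unchanged.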
classifierEndBlock :: BlockClassifierContext arch ids -> MemAddr (ArchAddrWidth arch)