macaw-base
Safe Haskell: None
Language: Haskell2010

Data.Macaw.Discovery.Classifier

Description

Definitions supporting block classification during code discovery

This module defines data types and helpers for building block control flow classifiers. It includes a pre-defined set of classifiers that work well for most architectures, and a reasonable default classifier is provided for all supported architectures. This infrastructure lets derived tools customize code discovery heuristics and lets architectures provide architecture-specific rules.

Note that this infrastructure is necessary for generating architecture-specific block terminators that can only be correctly injected based on analysis of values after abstract interpretation is applied to the rest of the code.

Synopsis

Utilities

identifyConcreteAddresses :: forall (w :: Natural). MemWidth w => Memory w -> AbsValue w (BVType w) -> [MemSegmentOff w]

Get code pointers out of an abstract value.
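As an illustration of the idea only, the sketch below extracts candidate addresses from a toy abstract value and keeps those that fall inside mapped memory. `ToyAbsValue`, `MappedRange`, and `toyConcreteAddresses` are hypothetical names invented for this example; macaw's real `AbsValue` and `Memory` types are richer, and the actual filtering rules may differ.

```haskell
-- Toy stand-in for an abstract value (hypothetical, not macaw's AbsValue):
-- either a finite set of possible concrete values, or no information.
data ToyAbsValue
  = FinSet [Integer]
  | Top
  deriving (Eq, Show)

-- Toy stand-in for memory: half-open [start, end) ranges of mapped bytes.
type MappedRange = (Integer, Integer)

-- Keep only the candidate values that fall inside some mapped range.
-- An unknown value (Top) yields no concrete addresses at all.
toyConcreteAddresses :: [MappedRange] -> ToyAbsValue -> [Integer]
toyConcreteAddresses _ Top = []
toyConcreteAddresses ranges (FinSet vals) = filter inRange vals
  where
    inRange a = any (\(lo, hi) -> lo <= a && a < hi) ranges
```

In this model, a value that abstract interpretation could not bound contributes nothing, so only finite sets can produce code pointers.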

Pre-defined classifiers

branchClassifier :: BlockClassifier arch ids

The classifier for conditional and unconditional branches

Note that this classifier can convert a conditional branch to an unconditional branch if (and only if) the condition is syntactically true or false after constant propagation. It never attempts sophisticated path trimming.
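The rule above can be sketched on a toy terminator type. `ToyCond`, `ToyTerm`, and `simplifyBranch` are hypothetical names for this example and do not correspond to macaw's internal representation; the point is only that the rewrite fires on syntactic constants and leaves every other condition alone.

```haskell
-- Toy branch condition: a literal constant or an opaque expression.
data ToyCond
  = CTrue
  | CFalse
  | Opaque String
  deriving (Eq, Show)

-- Toy terminator: a conditional branch (condition, true target, false
-- target) or an unconditional jump.
data ToyTerm
  = CondBranch ToyCond Integer Integer
  | Jump Integer
  deriving (Eq, Show)

-- Convert a conditional branch to a jump only when the condition is
-- syntactically constant; no path-sensitive reasoning is attempted.
simplifyBranch :: ToyTerm -> ToyTerm
simplifyBranch (CondBranch CTrue t _)  = Jump t
simplifyBranch (CondBranch CFalse _ f) = Jump f
simplifyBranch term                    = term
```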

callClassifier :: BlockClassifier arch ids

Use the architecture-specific callback to check whether the last statement was a call.

Note that in some cases the call is known not to return, so execution never continues at the return address; in that case, the noreturnCallClassifier should fire. As such, callClassifier should always be attempted *after* noreturnCallClassifier.

returnClassifier :: BlockClassifier arch ids

Check that this block ends with a return, as identified by the architecture-specific processing. Basic return identification can be performed by detecting when the Instruction Pointer (ip_reg) contains the ReturnAddr symbolic value (initially placed on the top of the stack or in the Link Register by the architecture-specific state initializer). However, some architectures evaluate expressions over this value before loading the IP (e.g. ARM will clear the low bit in T32 mode or the low 2 bits in A32 mode), so the actual detection process is deferred to architecture-specific functionality.
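To see why a plain equality check is not enough, consider the toy value language below. `ToyVal`, `isReturnNaive`, and `isReturnArmLike` are hypothetical names for this sketch, and the two 32-bit masks are assumed encodings of "clear the low bit" and "clear the low 2 bits"; a real architecture backend would use its own value representation.

```haskell
-- Toy symbolic value: the return-address marker, a constant, or a
-- bitwise AND of a value with a constant mask.
data ToyVal
  = ReturnAddr
  | Const Integer
  | AndMask ToyVal Integer
  deriving (Eq, Show)

-- Naive check: the instruction pointer must be exactly ReturnAddr.
isReturnNaive :: ToyVal -> Bool
isReturnNaive v = v == ReturnAddr

-- ARM-like check: also accept ReturnAddr with its low bit (T32) or
-- low two bits (A32) cleared by a 32-bit mask.
isReturnArmLike :: ToyVal -> Bool
isReturnArmLike ReturnAddr    = True
isReturnArmLike (AndMask v m) =
  m `elem` [0xFFFFFFFE, 0xFFFFFFFC] && isReturnArmLike v
isReturnArmLike _             = False
```

In this model the naive check rejects `AndMask ReturnAddr 0xFFFFFFFE`, while the ARM-like check accepts it, which is the kind of difference that motivates deferring to architecture-specific code.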

directJumpClassifier :: BlockClassifier arch ids

Classifies jumps to concrete addresses as unconditional jumps. Note that this logic is substantially similar to the tailCallClassifier in cases where the function does not establish a stack frame (i.e., leaf functions).

Note that known call targets are not eligible to be intra-procedural jump targets (see classifyDirectJump). This means that we need to conservatively prefer to mis-classify terminators as jumps rather than tail calls. The downside of this choice is that code that could be considered a tail-called function may be duplicated in some cases (i.e., considered part of multiple functions).

The alternative interpretation (eagerly preferring tail calls) can cause a section of a function to be marked as a tail-called function, thereby blocking the directJumpClassifier or the branchClassifier from recognizing the "callee" as an intra-procedural jump. This results in classification failures that we don't have any mitigations for.

noreturnCallClassifier :: BlockClassifier arch ids

Attempt to recognize a call to a function that is known to not return. These are effectively tail calls, even if the compiler did not obviously generate a tail call instruction sequence.

This classifier is important because compilers often place garbage instructions (for alignment, or possibly the next function) after calls to no-return functions. Without knowledge of no-return functions, macaw would otherwise think that the callee could return to the garbage instructions, causing later classification failures.

This functionality depends on a set of known no-return functions being specified as an input to the code discovery process (see pctxKnownFnEntries).

Note that this classifier should always be run before the callClassifier.
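The ordering requirement can be illustrated with a toy classifier chain built on `Maybe` and `<|>` from `Control.Applicative`. All names here (`ToyBlock`, `ToyResult`, `toyNoReturn`, `toyCall`, and the two combinators) are hypothetical; this models first-match-wins composition in general and is not macaw's `BlockClassifier` implementation.

```haskell
import Control.Applicative ((<|>))

-- Toy block summary: whether it ends in a call, and the callee's name.
data ToyBlock = ToyBlock { isCall :: Bool, callee :: String }
  deriving Show

data ToyResult
  = Call String
  | NoReturnCall String
  deriving (Eq, Show)

-- A toy classifier takes the known no-return function names and a block.
type ToyClassifier = [String] -> ToyBlock -> Maybe ToyResult

-- Fires only for calls to functions known not to return.
toyNoReturn :: ToyClassifier
toyNoReturn noRet b
  | isCall b && callee b `elem` noRet = Just (NoReturnCall (callee b))
  | otherwise                         = Nothing

-- Fires for any call, including calls to no-return functions.
toyCall :: ToyClassifier
toyCall _ b
  | isCall b  = Just (Call (callee b))
  | otherwise = Nothing

-- Intended order: the more specific classifier is tried first.
classifyRightOrder :: ToyClassifier
classifyRightOrder noRet b = toyNoReturn noRet b <|> toyCall noRet b

-- Reversed order: the generic call classifier shadows the specific one.
classifyWrongOrder :: ToyClassifier
classifyWrongOrder noRet b = toyCall noRet b <|> toyNoReturn noRet b
```

With the reversed order, a call to a known no-return function is reported as an ordinary call, so discovery would go on to treat the bytes after the call as reachable code.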

tailCallClassifier :: BlockClassifier arch ids

Attempt to recognize a tail call.

The current heuristic is that the target looks like a call, except the stack height in the caller is 0.

Note that, in leaf functions (i.e., with no stack usage), tail calls and jumps look substantially similar. We typically apply the jump classifier first so that jumps are preferred, which means that we very rarely recognize tail calls in leaf functions.
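The interaction between the jump and tail call classifiers can be sketched with a toy model. `ToyTerm`, `ToyClass`, `toyDirectJump`, `toyTailCall`, and `toyClassify` are hypothetical names, and the two rules are deliberately simplified: known function entries are excluded as jump targets, and a stack height of 0 stands in for "looks like a call with an empty frame".

```haskell
import Control.Applicative ((<|>))

-- Toy terminator: a concrete target address and the caller's stack height.
data ToyTerm = ToyTerm { target :: Integer, stackHeight :: Integer }

data ToyClass
  = IntraJump Integer
  | TailCall Integer
  deriving (Eq, Show)

-- Known function entries are not eligible as intra-procedural targets.
toyDirectJump :: [Integer] -> ToyTerm -> Maybe ToyClass
toyDirectJump knownEntries t
  | target t `elem` knownEntries = Nothing
  | otherwise                    = Just (IntraJump (target t))

-- Simplified heuristic: a transfer with stack height 0 is a tail call.
toyTailCall :: [Integer] -> ToyTerm -> Maybe ToyClass
toyTailCall _ t
  | stackHeight t == 0 = Just (TailCall (target t))
  | otherwise          = Nothing

-- The jump classifier runs first, so jumps are preferred.
toyClassify :: [Integer] -> ToyTerm -> Maybe ToyClass
toyClassify entries t = toyDirectJump entries t <|> toyTailCall entries t
```

Under these assumptions, a leaf function's transfer to an address that is not a known entry is always classified as a jump, and the tail call rule only gets a chance when the target is already a known function entry.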

Reusable helpers

branchBlockState
  :: forall a ids t. Foldable t
  => ArchitectureInfo a
  -> AbsProcessorState (ArchReg a) ids
  -> t (Stmt a ids)
  -> RegState (ArchReg a) (Value a ids)  -- Register values
  -> Value a ids BoolType                -- Branch condition
  -> Bool                                -- Flag indicating if branch is true or false
  -> AbsBlockState (ArchReg a)

This computes the abstract state for the start of a block for a given branch target. It is used so that we can make use of the branch condition to simplify the abstract state.
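A minimal sketch of the idea, assuming a toy abstract state that maps register names to sets of possible values and a branch condition of the form "register equals constant". `ToyAbsVal`, `ToyAbsState`, `ToyCond`, and `refine` are hypothetical names and do not reflect the actual behavior or types of branchBlockState.

```haskell
-- Toy abstract value: a finite set of possible values, or unknown.
data ToyAbsVal
  = Known [Integer]
  | Unknown
  deriving (Eq, Show)

-- Toy abstract state: an association list from register name to value.
type ToyAbsState = [(String, ToyAbsVal)]

-- Toy branch condition: the named register equals a constant.
data ToyCond = RegEq String Integer

-- Refine the state for one side of the branch. On the true side the
-- register is pinned to the constant; on the false side the constant is
-- removed from a known set. Infeasible true branches are not detected.
refine :: ToyCond -> Bool -> ToyAbsState -> ToyAbsState
refine (RegEq r c) taken = map step
  where
    step (name, v)
      | name /= r = (name, v)
      | taken     = (name, Known [c])
      | otherwise = (name, dropVal v)
    dropVal (Known vs) = Known (filter (/= c) vs)
    dropVal Unknown    = Unknown
```

For example, `refine (RegEq "r0" 4) True` turns an unknown `r0` into `Known [4]` at the start of the branch target, which is the kind of precision gain the branch condition makes available.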