mate_query.cfl module¶
See documentation on CFLPath
- class mate_query.cfl.CFLPath(info: Any, source: Any, source_stack: Tuple[Any, ...], target: Any, edge: Any, trace: Tuple[Any, ...], state: Any, stack: Tuple[Any, ...], stack_top: Any, c: Any, edge_symbol_table: Optional[Table] = None, transition_table: Optional[Table] = None, is_reversed: ClassVar[bool] = False, stepped_over_functions: ClassVar[set[str]] = <factory>, cfl: ClassVar[bool] = False)¶
Bases:
abc.ABC
,mate_query.db.Path
A Context-Free-Language path query.
CFL queries find paths in the CPG such that concatenation of all the symbols/labels of the edges of the path form a word in some context-free language. Incremental detection of CFL-paths is done by a push-down automaton (PDA) encoded as a recursive SQL query.
These ‘queries’ actually consist of a few distinct database operations:
populate_edge_symbol_table
constructs a table of symbols for (a relevant subset of) CPG edges. The mapping from edges to symbols is surjective and functional.populate_transition_table
constructs a table of transitions for this PDA, where a transition consists of a functional mapping from a (source state, source stack, input symbol) tuple to a (new state, new stack) tuple.The recursive query that finds CFL-paths is defined in
db.PathBuilder.build
.
- Parameters
info (Any) –
source (Any) –
source_stack (Tuple[Any, ...]) –
target (Any) –
edge (Any) –
trace (Tuple[Any, ...]) –
state (Any) –
stack (Tuple[Any, ...]) –
stack_top (Any) –
c (Any) –
edge_symbol_table (Optional[Table]) –
transition_table (Optional[Table]) –
is_reversed (ClassVar[bool]) –
stepped_over_functions (ClassVar[set[str]]) –
cfl (ClassVar[bool]) –
- Return type
None
- cfl: ClassVar[bool] = True¶
- classmethod edge_symbol_table_name(graph: CPG) str ¶
- Parameters
graph (CPG) –
- Return type
str
- abstract classmethod populate_edge_symbol_table(graph: CPG) None ¶
- Parameters
graph (CPG) –
- Return type
None
- abstract classmethod populate_transition_table(graph: CPG) None ¶
- Parameters
graph (CPG) –
- Return type
None
- abstract classmethod table_name_prefix() str ¶
Returns a table prefix to be used when creating edge symbol and transition tables.
If multiple instantiations of a
Path
class return the same prefix, the resulting tables will be shared between instances, reducing the amount of pre-computation required.If different instances require distinct tables,
table_name_prefix
should return distinct prefixes.The name will be converted to lower case and truncated to 16 characters.
- Return type
str
- classmethod transition_symbol_table_name(graph: CPG) str ¶
- Parameters
graph (CPG) –
- Return type
str
- classmethod update_edge_symbol_table_statistics(graph: CPG) None ¶
- Parameters
graph (CPG) –
- Return type
None
- classmethod update_transition_table_statistics(graph: CPG) None ¶
- Parameters
graph (CPG) –
- Return type
None
- class mate_query.cfl.CSCFGPath(info: Any, source: Any, source_stack: Tuple[Any, ...], target: Any, edge: Any, trace: Tuple[Any, ...], state: Any, stack: Tuple[Any, ...], stack_top: Any, c: Any, edge_symbol_table: Optional[Table] = None, transition_table: Optional[Table] = None, is_reversed: ClassVar[bool] = False, stepped_over_functions: ClassVar[set[str]] = <factory>, cfl: ClassVar[bool] = False)¶
Bases:
mate_query.cfl.ContextSensitiveCFLPath
Inter-procedural context-sensitive (CS) control-flow graph (CFG) paths.
These use context-sensitive information from the points-to analysis, and do call-return matching. If an unmatched return is encountered, then the PDA returns to any feasible control-flow context of the caller (as determined by the MATE pointer analysis).
In the PDA for this class, “context” usually refers to a k-limited (where k is usually relatively low, like 1 or 2) list of callsites (stack frames). The PDA encoded by this class has:
- States:
INTRA
: The next step should follow intra-procedural controlflow
INTER
: The next step should follow inter-procedural controlflow
Edge symbols (variables appear
<in angle brackets>
, tokens are separated by spaces for clarity):- Symbol:
step_instruction
Edge kinds:
EdgeKind.INSTRUCTION_TO_SUCCESSOR_INSTRUCTION
)Condition: The target node is an instruction which is not a
call
,invoke
orret
, or it is acall
or invoke to a externally defined function (i.e., an LLVM declaration).
- Symbol:
step_instruction_stop
Edge kinds:
EdgeKind.INSTRUCTION_TO_SUCCESSOR_INSTRUCTION
Condition: The target node is a
call
,invoke
orret
instruction to a defined function.
- Symbol:
call : <call instruction> : <caller context> --> <callee context>
Edge kinds:
EdgeKind.CALL_TO_FUNCTION
Condition: The target node is a
Function
called by the source (instruction) node in (control-flow) context<caller context>
, resulting in context<callee context>
.
- Symbol:
entry_block
Edge kinds:
EdgeKind.FUNCTION_TO_ENTRY_BLOCK
Condition: None
- Symbol:
entry_instruction
Edge kinds:
EdgeKind.BLOCK_TO_ENTRY_INSTRUCTION
Condition: None
- Symbol:
return_instruction
Edge kinds:
EdgeKind.RETURN_INSTRUCTION_TO_CALL_RETURN
Condition: None
- Symbol:
return : <instruction> : <caller context> --> <callee context>
Edge kinds:
EdgeKind.CALL_RETURN_TO_CALLER
Condition: The target is the
call
orinvoke
instruction for the call to the function that the control flow is returning from. The call occurred in context<caller context>
, resulting in context<callee context>
.
- Stack symbol templates/formats:
call : <call instruction> : <caller context> --> <callee context>
<context>
Transitions (variables appear
<in angle brackets>
,push
is the cons constructor for the stack type,bot
is the bottom of the stack):- Transition:
Input state:
INTRA
Input stack:
<s>
Edge symbol:
step_instruction
New state:
INTRA
New stack:
<s>
- Transition:
Input state:
INTRA
Input stack:
<s>
Edge symbol:
step_instruction_stop
New state:
INTER
New stack:
<s>
- Transition:
Description: TODO(lb)
Input state:
INTER
Input stack:
push(call : <call instruction 0> : <caller context 0> --> <callee context 0>)
Edge symbol:
call : <call instruction> : <caller context> --> <callee context>
New state:
INTRA
New stack:
push(call : <call instruction> : <caller context> --> <callee context>, push(call : <call instruction 0> : <caller context 0> : <callee context 0>))
Where:
<callee context 0> = <caller context>
and the callee function of the existing stack frame is the caller context of the new stack frame.TODO(lb): how does that relate to
<call instruction>
?
- Transition:
Description: Call with empty stack
Input state:
INTER
Input stack:
bot
Edge symbol:
call : <call instruction> : <caller context> --> <callee context>
New state:
INTRA
New stack:
push(<call instruction> : <caller context> --> <callee context>, bot)
- Transition:
Description: TODO(lb)
Input state:
INTER
Input stack:
push(<caller context>, <s>)
Edge symbol:
call : <call instruction> : <caller context> --> <callee context>
New state:
INTRA
Input stack:
push(call : <call instruction> : <caller context> --> <callee context>, push(<caller context>, <s>))
Where:
<call instruction>
may call<function>
- Transition:
Input state:
INTRA
Input stack:
<s>
Edge symbol:
entry_block
New state:
INTRA
New stack:
<s>
- Transition:
Input state:
INTRA
Input stack:
<s>
Edge symbol:
entry_instruction
New state:
INTRA
New stack:
<s>
- Transition:
Input state:
INTER
Input stack:
<s>
Edge symbol:
return_instruction
New state:
INTRA
New stack:
<s>
- Transition:
Description: Return with a call on the stack
Input state:
INTRA
Input stack:
push(call : <call instruction> : <caller context> --> <callee context>, <s>)
Edge symbol:
return : <call instruction> : <caller context> --> <callee context>
New state:
INTRA
New stack:
<s>
- Transition:
Description: Return with an empty stack
Input state:
INTRA
Input stack:
bot
Edge symbol:
return : <call instruction> : <caller context> --> <callee context>
New state:
INTRA
New stack:
push(<caller context>, bot)
- Transition:
Input state:
INTRA
Input stack:
push(<callee context>, <s>)
Edge symbol:
return : <call instruction> : <caller context> : <callee context>
New state:
INTRA
New stack:
push(<caller context>, <s>)
- Parameters
info (Any) –
source (Any) –
source_stack (Tuple[Any, ...]) –
target (Any) –
edge (Any) –
trace (Tuple[Any, ...]) –
state (Any) –
stack (Tuple[Any, ...]) –
stack_top (Any) –
c (Any) –
edge_symbol_table (Optional[Table]) –
transition_table (Optional[Table]) –
is_reversed (ClassVar[bool]) –
stepped_over_functions (ClassVar[set[str]]) –
cfl (ClassVar[bool]) –
- Return type
None
- c: Any¶
- edge: Any¶
- info: Any¶
- classmethod initial_stack() ClauseElement ¶
- Return type
ClauseElement
- classmethod initial_state() ClauseElement ¶
- Return type
ClauseElement
- classmethod populate_edge_symbol_table(graph: CPG) None ¶
Preprocess the graph’s edge table and give each edge a symbol to be used during during CFL-based path extension.
- Parameters
graph (CPG) –
- Return type
None
- classmethod populate_transition_table(graph: CPG) None ¶
- Parameters
graph (CPG) –
- Return type
None
- source: Any¶
- source_stack: Tuple[Any, ...]¶
- stack: Tuple[Any, ...]¶
- stack_top: Any¶
- state: Any¶
- stepped_over_functions: ClassVar[set[str]]¶
- classmethod table_name_prefix() str ¶
Returns a table prefix to be used when creating edge symbol and transition tables.
If multiple instantiations of a
Path
class return the same prefix, the resulting tables will be shared between instances, reducing the amount of pre-computation required.If different instances require distinct tables,
table_name_prefix
should return distinct prefixes.The name will be converted to lower case and truncated to 16 characters.
- Return type
str
- target: Any¶
- trace: Tuple[Any, ...]¶
- class mate_query.cfl.CSDataflowPath(info: Any, source: Any, source_stack: Tuple[Any, ...], target: Any, edge: Any, trace: Tuple[Any, ...], state: Any, stack: Tuple[Any, ...], stack_top: Any, c: Any, edge_symbol_table: Optional[Table] = None, transition_table: Optional[Table] = None, is_reversed: ClassVar[bool] = False, stepped_over_functions: ClassVar[set[str]] = <factory>, cfl: ClassVar[bool] = False)¶
Bases:
mate_query.cfl.ContextSensitiveCFLPath
Thin dataflow paths that use context-sensitivity (CS) from the points-to analysis.
The PDA encoded by this class has:
- States:
DATAFLOW
Edge symbols (variables appear
<in angle brackets>
, tokens are separated by spaces for clarity):- Symbol:
value_definition_to_use
Edge kinds:
EdgeKind.VALUE_DEFINITION_TO_USE
Condition: The target node is an instruction which is not a
call
,invoke
orload
; and the use is not as the first operand of astore
.
- Symbol:
dataflow_from_memory : <context>
Edge kinds:
EdgeKind.DATAFLOW_SIGNATURE
,EdgeKind.DIRECT_DATAFLOW_SIGNATURE
, plusEdgeKind.INDIRECT_DATAFLOW_SIGNATURE
ifthin
isFalse
.Condition: The target node is a dataflow, input, or output signature, and the source node is a memory location.
- Symbol:
dataflow_from_value : <context>
Edge kinds:
EdgeKind.DATAFLOW_SIGNATURE
,EdgeKind.DIRECT_DATAFLOW_SIGNATURE
, plusEdgeKind.INDIRECT_DATAFLOW_SIGNATURE
ifthin
isFalse
.Condition: The target node is a dataflow, input, or output signature, and the source node is not a memory location (i.e., it is an LLVM value).
- Symbol:
dataflow_from_value : <context>
Edge kinds:
EdgeKind.DATAFLOW_SIGNATURE
,EdgeKind.DIRECT_DATAFLOW_SIGNATURE
, plusEdgeKind.INDIRECT_DATAFLOW_SIGNATURE
ifthin
isFalse
.Condition: The target node is a dataflow, input, or output signature, and the source node is not a memory location (i.e., it is an LLVM value).
- Symbol:
dataflow_to_memory : <context>
Edge kinds:
EdgeKind.DATAFLOW_SIGNATURE
,EdgeKind.DIRECT_DATAFLOW_SIGNATURE
, plusEdgeKind.INDIRECT_DATAFLOW_SIGNATURE
ifthin
isFalse
.Condition: The target node is a memory location.
- Symbol:
dataflow_to_value : <context>
Edge kinds:
EdgeKind.DATAFLOW_SIGNATURE
,EdgeKind.DIRECT_DATAFLOW_SIGNATURE
, plusEdgeKind.INDIRECT_DATAFLOW_SIGNATURE
ifthin
isFalse
.Condition: The target node is not a memory location (i.e., it is an LLVM value).
- Symbol:
load : <context>
Edge kinds:
EdgeKind.LOAD_MEMORY
Condition: None
- Symbol:
store : <context>
Edge kinds:
EdgeKind.STORE_MEMORY
Condition: None
- Symbol:
operand_to_param_binding
Edge kinds:
EdgeKind.OPERAND_TO_PARAM_BINDING
Condition: None
- Symbol:
call : <call instruction> : <caller context> --> <callee context>
Edge kinds:
EdgeKind.PARAM_BINDING_TO_ARG
Condition: The parameter was bound by this call in this calling context
- Symbol:
return_value_to_call_return
Edge kinds:
EdgeKind.RETURN_VALUE_TO_CALL_RETURN
Condition: None
- Symbol:
return : <call instruction> : <caller context> --> <callee context>
Edge kinds:
EdgeKind.CALL_RETURN_TO_CALLER
Condition: This return is from a (potential) call in context
<caller context>
, resulting in callee context<callee context>
.
Stack symbol templates/formats:
<call instruction> : <caller context> --> <callee context>
<context>
Transitions (variables appear
<in angle brackets>
,push
is the cons constructor for the stack type,bot
is the bottom of the stack, all input and output states areDATAFLOW
):- Transition:
Input stack:
<s>
Edge symbol:
value_definition_to_use
New stack:
<s>
- Transition:
Input stack:
<s>
Edge symbol:
operand_to_param_binding
New stack:
<s>
- Transition:
Input stack:
<s>
Edge symbol:
return_value_to_call_return
New stack:
<s>
- Transition:
Description: TODO(lb)
Input stack:
push(call : <call instruction 0> : <caller context 0> --> <callee context 0>)
Edge symbol:
call : <call instruction> : <caller context> --> <callee context>
New stack:
push(call : <call instruction> : <caller context> --> <callee context>, push(call : <call instruction 0> : <caller context 0> : <callee context 0>))
Where:
<callee context 0> = <caller context>
and the callee function of the existing stack frame is the caller context of the new stack frame.TODO(lb): how does that relate to
<call instruction>
?
- Transition:
Input stack:
push(<caller context>, <s>)
Edge symbol:
call : <call instruction> : <caller context> --> <callee context>
New stack:
push(<call instruction> : <caller context> --> <callee context>, push(<context>, bot))
- Transition:
Input stack:
push(call : <call instruction> : <caller context> --> <callee context>, <s>)
Edge symbol:
return : <call instruction> : <caller context> --> <callee context>
New stack:
<s>
- Transition:
Description: Return with an empty stack
Input stack:
bot
Edge symbol:
return : <call instruction> : <caller context> --> <callee context>
New stack:
push(<caller context>, bot)
- Transition:
Input stack:
push(<callee context>, <s>)
Edge symbol:
return : <call instruction> : <caller context> : <callee context>
New stack:
push(<caller context>, <s>)
- Transition:
Input stack:
bot
Edge symbol:
load : <context>
New stack:
push(<context>, <s>)
- Transition:
Input stack:
push(<call instruction> : <caller context> --> <callee context>, <s>)
Edge symbol:
store : <callee context>
New stack:
bot
- Transition:
Input stack:
bot
Edge symbol:
store : <context>
New stack:
bot
- Transition:
Input stack:
push(<context>, <s>)
Edge symbol:
store : <context>
New stack:
bot
Similar rules where
dataflow_from_memory
is todataflow_to_memory
asload
is tostore
…- Transition:
Input stack:
push(<context>, <s>)
Edge symbol:
dataflow_from_value : <context>
New stack:
push(<context>, <s>)
- Transition:
Input stack:
bot
Edge symbol:
dataflow_from_value : <context>
New stack:
push(<context>, bot)
- Transition:
Input stack:
push(<call instruction> : <caller context> --> <callee context>, bot)
Edge symbol:
dataflow_from_value : <callee context>
New stack:
push(<call instruction> : <caller context> --> <callee context>, bot)
- Transition:
Input stack:
push(<context>, <s>)
Edge symbol:
dataflow_to_value : <context>
New stack:
push(<context>, <s>)
- Transition:
Input stack:
bot
Edge symbol:
dataflow_to_value : <context>
New stack:
push(<context>, bot)
- Transition:
Input stack:
push(<call instruction> : <caller context> --> <callee context>, bot)
Edge symbol:
dataflow_to_value : <callee context>
New stack:
push(<call instruction> : <caller context> --> <callee context>, bot)
Transitions for a reversed automaton are dual in some sense…
- Parameters
info (Any) –
source (Any) –
source_stack (Tuple[Any, ...]) –
target (Any) –
edge (Any) –
trace (Tuple[Any, ...]) –
state (Any) –
stack (Tuple[Any, ...]) –
stack_top (Any) –
c (Any) –
edge_symbol_table (Optional[Table]) –
transition_table (Optional[Table]) –
is_reversed (ClassVar[bool]) –
stepped_over_functions (ClassVar[set[str]]) –
cfl (ClassVar[bool]) –
- Return type
None
- c: Any¶
- edge: Any¶
- info: Any¶
- classmethod initial_stack() ClauseElement ¶
- Return type
ClauseElement
- classmethod initial_state() ClauseElement ¶
- Return type
ClauseElement
- classmethod populate_edge_symbol_table(graph: CPG) None ¶
Preprocess the graph’s edge table and give each edge a symbol to be used during during CFL-based path extension.
- Parameters
graph (CPG) –
- Return type
None
- classmethod populate_transition_table(graph: CPG) None ¶
- Parameters
graph (CPG) –
- Return type
None
- source: Any¶
- source_stack: Tuple[Any, ...]¶
- stack: Tuple[Any, ...]¶
- stack_top: Any¶
- state: Any¶
- stepped_over_functions: ClassVar[set[str]]¶
- classmethod table_name_prefix() str ¶
Returns a table prefix to be used when creating edge symbol and transition tables.
If multiple instantiations of a
Path
class return the same prefix, the resulting tables will be shared between instances, reducing the amount of pre-computation required.If different instances require distinct tables,
table_name_prefix
should return distinct prefixes.The name will be converted to lower case and truncated to 16 characters.
- Return type
str
- target: Any¶
- trace: Tuple[Any, ...]¶
- mate_query.cfl.CSThinDataflowPath¶
alias of
mate_query.cfl.ConfigureDFGPath.<locals>.CSDataflowPath
- class mate_query.cfl.CallGraphPath(info: Any, source: Any, source_stack: Tuple[Any, ...], target: Any, edge: Any, trace: Tuple[Any, ...], state: Any, stack: Tuple[Any, ...], stack_top: Any, c: Any, edge_symbol_table: Optional[Table] = None, transition_table: Optional[Table] = None, is_reversed: ClassVar[bool] = False, stepped_over_functions: ClassVar[set[str]] = <factory>, cfl: ClassVar[bool] = False)¶
Bases:
mate_query.cfl.CFLPath
- Parameters
info (Any) –
source (Any) –
source_stack (Tuple[Any, ...]) –
target (Any) –
edge (Any) –
trace (Tuple[Any, ...]) –
state (Any) –
stack (Tuple[Any, ...]) –
stack_top (Any) –
c (Any) –
edge_symbol_table (Optional[Table]) –
transition_table (Optional[Table]) –
is_reversed (ClassVar[bool]) –
stepped_over_functions (ClassVar[set[str]]) –
cfl (ClassVar[bool]) –
- Return type
None
- c: Any¶
- edge: Any¶
- info: Any¶
- classmethod initial_stack() ClauseElement ¶
- Return type
ClauseElement
- classmethod initial_state() ClauseElement ¶
- Return type
ClauseElement
- classmethod populate_edge_symbol_table(graph: CPG) None ¶
Preprocess the graph’s edge table and give each edge a symbol to be used during during CFL-based path extension.
- Parameters
graph (CPG) –
- Return type
None
- classmethod populate_transition_table(graph: CPG) None ¶
- Parameters
graph (CPG) –
- Return type
None
- source: Any¶
- source_stack: Tuple[Any, ...]¶
- stack: Tuple[Any, ...]¶
- stack_top: Any¶
- state: Any¶
- stepped_over_functions: ClassVar[set[str]]¶
- classmethod table_name_prefix() str ¶
Returns a table prefix to be used when creating edge symbol and transition tables.
If multiple instantiations of a
Path
class return the same prefix, the resulting tables will be shared between instances, reducing the amount of pre-computation required.If different instances require distinct tables,
table_name_prefix
should return distinct prefixes.The name will be converted to lower case and truncated to 16 characters.
- Return type
str
- target: Any¶
- trace: Tuple[Any, ...]¶
- mate_query.cfl.ConfigureCFGPath(*, allow_unmatched_returns: bool = False) Type[Path] ¶
- Parameters
allow_unmatched_returns (bool) –
- Return type
Type[Path]
- mate_query.cfl.ConfigureDFGPath(*, thin: bool = True) Type[Path] ¶
- Parameters
thin (bool) –
- Return type
Type[Path]
- class mate_query.cfl.ContextSensitiveCFLPath(info: Any, source: Any, source_stack: Tuple[Any, ...], target: Any, edge: Any, trace: Tuple[Any, ...], state: Any, stack: Tuple[Any, ...], stack_top: Any, c: Any, edge_symbol_table: Optional[Table] = None, transition_table: Optional[Table] = None, is_reversed: ClassVar[bool] = False, stepped_over_functions: ClassVar[set[str]] = <factory>, cfl: ClassVar[bool] = False)¶
Bases:
mate_query.cfl.CFLPath
- Parameters
info (Any) –
source (Any) –
source_stack (Tuple[Any, ...]) –
target (Any) –
edge (Any) –
trace (Tuple[Any, ...]) –
state (Any) –
stack (Tuple[Any, ...]) –
stack_top (Any) –
c (Any) –
edge_symbol_table (Optional[Table]) –
transition_table (Optional[Table]) –
is_reversed (ClassVar[bool]) –
stepped_over_functions (ClassVar[set[str]]) –
cfl (ClassVar[bool]) –
- Return type
None
- c: Any¶
- edge: Any¶
- info: Any¶
- classmethod select_callsite_stack_symbols(graph: CPG) Select ¶
Construct partial stack symbols of the form:
<call instruction> : <caller context> –> <callee context>
With columns:
stack_symbol
: The overall symbolcallsite
: caller instructioncaller
: calling functioncallee
: called functioncaller_context
caller_context
This select is used as a variable in complex rules in PDAs, where the above partial stack symbol gets concatenated onto the tokens
call:
orreturn:
, e.g.call : <call instruction> : <caller context> --> <callee context>
This overall symbol gets pushed to or popped from the PDA stack during transitions.
- Parameters
graph (CPG) –
- Return type
Select
- classmethod select_context_stack_symbols(graph: CPG) Select ¶
Construct stack symbols of the form
<context>
.with columns:
stack_symbol
: The contextfunction
: The function
where there exists some callgraph edge out of (resp., into)
<function>
with caller (resp., callee) context<context>
.This select is used as a variable in complex rules in PDAs.
- Parameters
graph (CPG) –
- Return type
Select
- source: Any¶
- source_stack: Tuple[Any, ...]¶
- stack: Tuple[Any, ...]¶
- stack_top: Any¶
- state: Any¶
- stepped_over_functions: ClassVar[set[str]]¶
- target: Any¶
- trace: Tuple[Any, ...]¶
- mate_query.cfl.ForwardCFGPath¶
Inter-procedural control-flow graph (CFG) paths, in the forwards direction.
Matches calls and returns up to unbounded depth, rather than using context-sensitivity information from the pointer analysis.
When encountering a return when the call stack is empty, can return to any callsite, which is problematic for performance and precision (in contract to context-sensitive variants below which use points-to analysis information).
- mate_query.cfl.UnmatchedForwardCFGPath¶
Like
ForwardCFGPath
but doesn’t match calls and returns.