mate_query.cfl module¶
See documentation on CFLPath
- class mate_query.cfl.CFLPath(info: Any, source: Any, source_stack: Tuple[Any, ...], target: Any, edge: Any, trace: Tuple[Any, ...], state: Any, stack: Tuple[Any, ...], stack_top: Any, c: Any, edge_symbol_table: Optional[Table] = None, transition_table: Optional[Table] = None, is_reversed: ClassVar[bool] = False, stepped_over_functions: ClassVar[set[str]] = <factory>, cfl: ClassVar[bool] = False)¶
Bases:
abc.ABC,mate_query.db.PathA Context-Free-Language path query.
CFL queries find paths in the CPG such that concatenation of all the symbols/labels of the edges of the path form a word in some context-free language. Incremental detection of CFL-paths is done by a push-down automaton (PDA) encoded as a recursive SQL query.
These ‘queries’ actually consist of a few distinct database operations:
populate_edge_symbol_tableconstructs a table of symbols for (a relevant subset of) CPG edges. The mapping from edges to symbols is surjective and functional.populate_transition_tableconstructs a table of transitions for this PDA, where a transition consists of a functional mapping from a (source state, source stack, input symbol) tuple to a (new state, new stack) tuple.The recursive query that finds CFL-paths is defined in
db.PathBuilder.build.
- Parameters
info (Any) –
source (Any) –
source_stack (Tuple[Any, ...]) –
target (Any) –
edge (Any) –
trace (Tuple[Any, ...]) –
state (Any) –
stack (Tuple[Any, ...]) –
stack_top (Any) –
c (Any) –
edge_symbol_table (Optional[Table]) –
transition_table (Optional[Table]) –
is_reversed (ClassVar[bool]) –
stepped_over_functions (ClassVar[set[str]]) –
cfl (ClassVar[bool]) –
- Return type
None
- cfl: ClassVar[bool] = True¶
- classmethod edge_symbol_table_name(graph: CPG) str¶
- Parameters
graph (CPG) –
- Return type
str
- abstract classmethod populate_edge_symbol_table(graph: CPG) None¶
- Parameters
graph (CPG) –
- Return type
None
- abstract classmethod populate_transition_table(graph: CPG) None¶
- Parameters
graph (CPG) –
- Return type
None
- abstract classmethod table_name_prefix() str¶
Returns a table prefix to be used when creating edge symbol and transition tables.
If multiple instantiations of a
Pathclass return the same prefix, the resulting tables will be shared between instances, reducing the amount of pre-computation required.If different instances require distinct tables,
table_name_prefixshould return distinct prefixes.The name will be converted to lower case and truncated to 16 characters.
- Return type
str
- classmethod transition_symbol_table_name(graph: CPG) str¶
- Parameters
graph (CPG) –
- Return type
str
- classmethod update_edge_symbol_table_statistics(graph: CPG) None¶
- Parameters
graph (CPG) –
- Return type
None
- classmethod update_transition_table_statistics(graph: CPG) None¶
- Parameters
graph (CPG) –
- Return type
None
- class mate_query.cfl.CSCFGPath(info: Any, source: Any, source_stack: Tuple[Any, ...], target: Any, edge: Any, trace: Tuple[Any, ...], state: Any, stack: Tuple[Any, ...], stack_top: Any, c: Any, edge_symbol_table: Optional[Table] = None, transition_table: Optional[Table] = None, is_reversed: ClassVar[bool] = False, stepped_over_functions: ClassVar[set[str]] = <factory>, cfl: ClassVar[bool] = False)¶
Bases:
mate_query.cfl.ContextSensitiveCFLPathInter-procedural context-sensitive (CS) control-flow graph (CFG) paths.
These use context-sensitive information from the points-to analysis, and do call-return matching. If an unmatched return is encountered, then the PDA returns to any feasible control-flow context of the caller (as determined by the MATE pointer analysis).
In the PDA for this class, “context” usually refers to a k-limited (where k is usually relatively low, like 1 or 2) list of callsites (stack frames). The PDA encoded by this class has:
- States:
INTRA: The next step should follow intra-procedural controlflow
INTER: The next step should follow inter-procedural controlflow
Edge symbols (variables appear
<in angle brackets>, tokens are separated by spaces for clarity):- Symbol:
step_instructionEdge kinds:
EdgeKind.INSTRUCTION_TO_SUCCESSOR_INSTRUCTION)Condition: The target node is an instruction which is not a
call,invokeorret, or it is acallor invoke to a externally defined function (i.e., an LLVM declaration).
- Symbol:
step_instruction_stopEdge kinds:
EdgeKind.INSTRUCTION_TO_SUCCESSOR_INSTRUCTIONCondition: The target node is a
call,invokeorretinstruction to a defined function.
- Symbol:
call : <call instruction> : <caller context> --> <callee context>Edge kinds:
EdgeKind.CALL_TO_FUNCTIONCondition: The target node is a
Functioncalled by the source (instruction) node in (control-flow) context<caller context>, resulting in context<callee context>.
- Symbol:
entry_blockEdge kinds:
EdgeKind.FUNCTION_TO_ENTRY_BLOCKCondition: None
- Symbol:
entry_instructionEdge kinds:
EdgeKind.BLOCK_TO_ENTRY_INSTRUCTIONCondition: None
- Symbol:
return_instructionEdge kinds:
EdgeKind.RETURN_INSTRUCTION_TO_CALL_RETURNCondition: None
- Symbol:
return : <instruction> : <caller context> --> <callee context>Edge kinds:
EdgeKind.CALL_RETURN_TO_CALLERCondition: The target is the
callorinvokeinstruction for the call to the function that the control flow is returning from. The call occurred in context<caller context>, resulting in context<callee context>.
- Stack symbol templates/formats:
call : <call instruction> : <caller context> --> <callee context><context>
Transitions (variables appear
<in angle brackets>,pushis the cons constructor for the stack type,botis the bottom of the stack):- Transition:
Input state:
INTRAInput stack:
<s>Edge symbol:
step_instructionNew state:
INTRANew stack:
<s>
- Transition:
Input state:
INTRAInput stack:
<s>Edge symbol:
step_instruction_stopNew state:
INTERNew stack:
<s>
- Transition:
Description: TODO(lb)
Input state:
INTERInput stack:
push(call : <call instruction 0> : <caller context 0> --> <callee context 0>)Edge symbol:
call : <call instruction> : <caller context> --> <callee context>New state:
INTRANew stack:
push(call : <call instruction> : <caller context> --> <callee context>, push(call : <call instruction 0> : <caller context 0> : <callee context 0>))Where:
<callee context 0> = <caller context>and the callee function of the existing stack frame is the caller context of the new stack frame.TODO(lb): how does that relate to
<call instruction>?
- Transition:
Description: Call with empty stack
Input state:
INTERInput stack:
botEdge symbol:
call : <call instruction> : <caller context> --> <callee context>New state:
INTRANew stack:
push(<call instruction> : <caller context> --> <callee context>, bot)
- Transition:
Description: TODO(lb)
Input state:
INTERInput stack:
push(<caller context>, <s>)Edge symbol:
call : <call instruction> : <caller context> --> <callee context>New state:
INTRAInput stack:
push(call : <call instruction> : <caller context> --> <callee context>, push(<caller context>, <s>))Where:
<call instruction>may call<function>
- Transition:
Input state:
INTRAInput stack:
<s>Edge symbol:
entry_blockNew state:
INTRANew stack:
<s>
- Transition:
Input state:
INTRAInput stack:
<s>Edge symbol:
entry_instructionNew state:
INTRANew stack:
<s>
- Transition:
Input state:
INTERInput stack:
<s>Edge symbol:
return_instructionNew state:
INTRANew stack:
<s>
- Transition:
Description: Return with a call on the stack
Input state:
INTRAInput stack:
push(call : <call instruction> : <caller context> --> <callee context>, <s>)Edge symbol:
return : <call instruction> : <caller context> --> <callee context>New state:
INTRANew stack:
<s>
- Transition:
Description: Return with an empty stack
Input state:
INTRAInput stack:
botEdge symbol:
return : <call instruction> : <caller context> --> <callee context>New state:
INTRANew stack:
push(<caller context>, bot)
- Transition:
Input state:
INTRAInput stack:
push(<callee context>, <s>)Edge symbol:
return : <call instruction> : <caller context> : <callee context>New state:
INTRANew stack:
push(<caller context>, <s>)
- Parameters
info (Any) –
source (Any) –
source_stack (Tuple[Any, ...]) –
target (Any) –
edge (Any) –
trace (Tuple[Any, ...]) –
state (Any) –
stack (Tuple[Any, ...]) –
stack_top (Any) –
c (Any) –
edge_symbol_table (Optional[Table]) –
transition_table (Optional[Table]) –
is_reversed (ClassVar[bool]) –
stepped_over_functions (ClassVar[set[str]]) –
cfl (ClassVar[bool]) –
- Return type
None
- c: Any¶
- edge: Any¶
- info: Any¶
- classmethod initial_stack() ClauseElement¶
- Return type
ClauseElement
- classmethod initial_state() ClauseElement¶
- Return type
ClauseElement
- classmethod populate_edge_symbol_table(graph: CPG) None¶
Preprocess the graph’s edge table and give each edge a symbol to be used during during CFL-based path extension.
- Parameters
graph (CPG) –
- Return type
None
- classmethod populate_transition_table(graph: CPG) None¶
- Parameters
graph (CPG) –
- Return type
None
- source: Any¶
- source_stack: Tuple[Any, ...]¶
- stack: Tuple[Any, ...]¶
- stack_top: Any¶
- state: Any¶
- stepped_over_functions: ClassVar[set[str]]¶
- classmethod table_name_prefix() str¶
Returns a table prefix to be used when creating edge symbol and transition tables.
If multiple instantiations of a
Pathclass return the same prefix, the resulting tables will be shared between instances, reducing the amount of pre-computation required.If different instances require distinct tables,
table_name_prefixshould return distinct prefixes.The name will be converted to lower case and truncated to 16 characters.
- Return type
str
- target: Any¶
- trace: Tuple[Any, ...]¶
- class mate_query.cfl.CSDataflowPath(info: Any, source: Any, source_stack: Tuple[Any, ...], target: Any, edge: Any, trace: Tuple[Any, ...], state: Any, stack: Tuple[Any, ...], stack_top: Any, c: Any, edge_symbol_table: Optional[Table] = None, transition_table: Optional[Table] = None, is_reversed: ClassVar[bool] = False, stepped_over_functions: ClassVar[set[str]] = <factory>, cfl: ClassVar[bool] = False)¶
Bases:
mate_query.cfl.ContextSensitiveCFLPathThin dataflow paths that use context-sensitivity (CS) from the points-to analysis.
The PDA encoded by this class has:
- States:
DATAFLOW
Edge symbols (variables appear
<in angle brackets>, tokens are separated by spaces for clarity):- Symbol:
value_definition_to_useEdge kinds:
EdgeKind.VALUE_DEFINITION_TO_USECondition: The target node is an instruction which is not a
call,invokeorload; and the use is not as the first operand of astore.
- Symbol:
dataflow_from_memory : <context>Edge kinds:
EdgeKind.DATAFLOW_SIGNATURE,EdgeKind.DIRECT_DATAFLOW_SIGNATURE, plusEdgeKind.INDIRECT_DATAFLOW_SIGNATUREifthinisFalse.Condition: The target node is a dataflow, input, or output signature, and the source node is a memory location.
- Symbol:
dataflow_from_value : <context>Edge kinds:
EdgeKind.DATAFLOW_SIGNATURE,EdgeKind.DIRECT_DATAFLOW_SIGNATURE, plusEdgeKind.INDIRECT_DATAFLOW_SIGNATUREifthinisFalse.Condition: The target node is a dataflow, input, or output signature, and the source node is not a memory location (i.e., it is an LLVM value).
- Symbol:
dataflow_from_value : <context>Edge kinds:
EdgeKind.DATAFLOW_SIGNATURE,EdgeKind.DIRECT_DATAFLOW_SIGNATURE, plusEdgeKind.INDIRECT_DATAFLOW_SIGNATUREifthinisFalse.Condition: The target node is a dataflow, input, or output signature, and the source node is not a memory location (i.e., it is an LLVM value).
- Symbol:
dataflow_to_memory : <context>Edge kinds:
EdgeKind.DATAFLOW_SIGNATURE,EdgeKind.DIRECT_DATAFLOW_SIGNATURE, plusEdgeKind.INDIRECT_DATAFLOW_SIGNATUREifthinisFalse.Condition: The target node is a memory location.
- Symbol:
dataflow_to_value : <context>Edge kinds:
EdgeKind.DATAFLOW_SIGNATURE,EdgeKind.DIRECT_DATAFLOW_SIGNATURE, plusEdgeKind.INDIRECT_DATAFLOW_SIGNATUREifthinisFalse.Condition: The target node is not a memory location (i.e., it is an LLVM value).
- Symbol:
load : <context>Edge kinds:
EdgeKind.LOAD_MEMORYCondition: None
- Symbol:
store : <context>Edge kinds:
EdgeKind.STORE_MEMORYCondition: None
- Symbol:
operand_to_param_bindingEdge kinds:
EdgeKind.OPERAND_TO_PARAM_BINDINGCondition: None
- Symbol:
call : <call instruction> : <caller context> --> <callee context>Edge kinds:
EdgeKind.PARAM_BINDING_TO_ARGCondition: The parameter was bound by this call in this calling context
- Symbol:
return_value_to_call_returnEdge kinds:
EdgeKind.RETURN_VALUE_TO_CALL_RETURNCondition: None
- Symbol:
return : <call instruction> : <caller context> --> <callee context>Edge kinds:
EdgeKind.CALL_RETURN_TO_CALLERCondition: This return is from a (potential) call in context
<caller context>, resulting in callee context<callee context>.
Stack symbol templates/formats:
<call instruction> : <caller context> --> <callee context><context>Transitions (variables appear
<in angle brackets>,pushis the cons constructor for the stack type,botis the bottom of the stack, all input and output states areDATAFLOW):- Transition:
Input stack:
<s>Edge symbol:
value_definition_to_useNew stack:
<s>
- Transition:
Input stack:
<s>Edge symbol:
operand_to_param_bindingNew stack:
<s>
- Transition:
Input stack:
<s>Edge symbol:
return_value_to_call_returnNew stack:
<s>
- Transition:
Description: TODO(lb)
Input stack:
push(call : <call instruction 0> : <caller context 0> --> <callee context 0>)Edge symbol:
call : <call instruction> : <caller context> --> <callee context>New stack:
push(call : <call instruction> : <caller context> --> <callee context>, push(call : <call instruction 0> : <caller context 0> : <callee context 0>))Where:
<callee context 0> = <caller context>and the callee function of the existing stack frame is the caller context of the new stack frame.TODO(lb): how does that relate to
<call instruction>?
- Transition:
Input stack:
push(<caller context>, <s>)Edge symbol:
call : <call instruction> : <caller context> --> <callee context>New stack:
push(<call instruction> : <caller context> --> <callee context>, push(<context>, bot))
- Transition:
Input stack:
push(call : <call instruction> : <caller context> --> <callee context>, <s>)Edge symbol:
return : <call instruction> : <caller context> --> <callee context>New stack:
<s>
- Transition:
Description: Return with an empty stack
Input stack:
botEdge symbol:
return : <call instruction> : <caller context> --> <callee context>New stack:
push(<caller context>, bot)
- Transition:
Input stack:
push(<callee context>, <s>)Edge symbol:
return : <call instruction> : <caller context> : <callee context>New stack:
push(<caller context>, <s>)
- Transition:
Input stack:
botEdge symbol:
load : <context>New stack:
push(<context>, <s>)
- Transition:
Input stack:
push(<call instruction> : <caller context> --> <callee context>, <s>)Edge symbol:
store : <callee context>New stack:
bot
- Transition:
Input stack:
botEdge symbol:
store : <context>New stack:
bot
- Transition:
Input stack:
push(<context>, <s>)Edge symbol:
store : <context>New stack:
bot
Similar rules where
dataflow_from_memoryis todataflow_to_memoryasloadis tostore…- Transition:
Input stack:
push(<context>, <s>)Edge symbol:
dataflow_from_value : <context>New stack:
push(<context>, <s>)
- Transition:
Input stack:
botEdge symbol:
dataflow_from_value : <context>New stack:
push(<context>, bot)
- Transition:
Input stack:
push(<call instruction> : <caller context> --> <callee context>, bot)Edge symbol:
dataflow_from_value : <callee context>New stack:
push(<call instruction> : <caller context> --> <callee context>, bot)
- Transition:
Input stack:
push(<context>, <s>)Edge symbol:
dataflow_to_value : <context>New stack:
push(<context>, <s>)
- Transition:
Input stack:
botEdge symbol:
dataflow_to_value : <context>New stack:
push(<context>, bot)
- Transition:
Input stack:
push(<call instruction> : <caller context> --> <callee context>, bot)Edge symbol:
dataflow_to_value : <callee context>New stack:
push(<call instruction> : <caller context> --> <callee context>, bot)
Transitions for a reversed automaton are dual in some sense…
- Parameters
info (Any) –
source (Any) –
source_stack (Tuple[Any, ...]) –
target (Any) –
edge (Any) –
trace (Tuple[Any, ...]) –
state (Any) –
stack (Tuple[Any, ...]) –
stack_top (Any) –
c (Any) –
edge_symbol_table (Optional[Table]) –
transition_table (Optional[Table]) –
is_reversed (ClassVar[bool]) –
stepped_over_functions (ClassVar[set[str]]) –
cfl (ClassVar[bool]) –
- Return type
None
- c: Any¶
- edge: Any¶
- info: Any¶
- classmethod initial_stack() ClauseElement¶
- Return type
ClauseElement
- classmethod initial_state() ClauseElement¶
- Return type
ClauseElement
- classmethod populate_edge_symbol_table(graph: CPG) None¶
Preprocess the graph’s edge table and give each edge a symbol to be used during during CFL-based path extension.
- Parameters
graph (CPG) –
- Return type
None
- classmethod populate_transition_table(graph: CPG) None¶
- Parameters
graph (CPG) –
- Return type
None
- source: Any¶
- source_stack: Tuple[Any, ...]¶
- stack: Tuple[Any, ...]¶
- stack_top: Any¶
- state: Any¶
- stepped_over_functions: ClassVar[set[str]]¶
- classmethod table_name_prefix() str¶
Returns a table prefix to be used when creating edge symbol and transition tables.
If multiple instantiations of a
Pathclass return the same prefix, the resulting tables will be shared between instances, reducing the amount of pre-computation required.If different instances require distinct tables,
table_name_prefixshould return distinct prefixes.The name will be converted to lower case and truncated to 16 characters.
- Return type
str
- target: Any¶
- trace: Tuple[Any, ...]¶
- mate_query.cfl.CSThinDataflowPath¶
alias of
mate_query.cfl.ConfigureDFGPath.<locals>.CSDataflowPath
- class mate_query.cfl.CallGraphPath(info: Any, source: Any, source_stack: Tuple[Any, ...], target: Any, edge: Any, trace: Tuple[Any, ...], state: Any, stack: Tuple[Any, ...], stack_top: Any, c: Any, edge_symbol_table: Optional[Table] = None, transition_table: Optional[Table] = None, is_reversed: ClassVar[bool] = False, stepped_over_functions: ClassVar[set[str]] = <factory>, cfl: ClassVar[bool] = False)¶
Bases:
mate_query.cfl.CFLPath- Parameters
info (Any) –
source (Any) –
source_stack (Tuple[Any, ...]) –
target (Any) –
edge (Any) –
trace (Tuple[Any, ...]) –
state (Any) –
stack (Tuple[Any, ...]) –
stack_top (Any) –
c (Any) –
edge_symbol_table (Optional[Table]) –
transition_table (Optional[Table]) –
is_reversed (ClassVar[bool]) –
stepped_over_functions (ClassVar[set[str]]) –
cfl (ClassVar[bool]) –
- Return type
None
- c: Any¶
- edge: Any¶
- info: Any¶
- classmethod initial_stack() ClauseElement¶
- Return type
ClauseElement
- classmethod initial_state() ClauseElement¶
- Return type
ClauseElement
- classmethod populate_edge_symbol_table(graph: CPG) None¶
Preprocess the graph’s edge table and give each edge a symbol to be used during during CFL-based path extension.
- Parameters
graph (CPG) –
- Return type
None
- classmethod populate_transition_table(graph: CPG) None¶
- Parameters
graph (CPG) –
- Return type
None
- source: Any¶
- source_stack: Tuple[Any, ...]¶
- stack: Tuple[Any, ...]¶
- stack_top: Any¶
- state: Any¶
- stepped_over_functions: ClassVar[set[str]]¶
- classmethod table_name_prefix() str¶
Returns a table prefix to be used when creating edge symbol and transition tables.
If multiple instantiations of a
Pathclass return the same prefix, the resulting tables will be shared between instances, reducing the amount of pre-computation required.If different instances require distinct tables,
table_name_prefixshould return distinct prefixes.The name will be converted to lower case and truncated to 16 characters.
- Return type
str
- target: Any¶
- trace: Tuple[Any, ...]¶
- mate_query.cfl.ConfigureCFGPath(*, allow_unmatched_returns: bool = False) Type[Path]¶
- Parameters
allow_unmatched_returns (bool) –
- Return type
Type[Path]
- mate_query.cfl.ConfigureDFGPath(*, thin: bool = True) Type[Path]¶
- Parameters
thin (bool) –
- Return type
Type[Path]
- class mate_query.cfl.ContextSensitiveCFLPath(info: Any, source: Any, source_stack: Tuple[Any, ...], target: Any, edge: Any, trace: Tuple[Any, ...], state: Any, stack: Tuple[Any, ...], stack_top: Any, c: Any, edge_symbol_table: Optional[Table] = None, transition_table: Optional[Table] = None, is_reversed: ClassVar[bool] = False, stepped_over_functions: ClassVar[set[str]] = <factory>, cfl: ClassVar[bool] = False)¶
Bases:
mate_query.cfl.CFLPath- Parameters
info (Any) –
source (Any) –
source_stack (Tuple[Any, ...]) –
target (Any) –
edge (Any) –
trace (Tuple[Any, ...]) –
state (Any) –
stack (Tuple[Any, ...]) –
stack_top (Any) –
c (Any) –
edge_symbol_table (Optional[Table]) –
transition_table (Optional[Table]) –
is_reversed (ClassVar[bool]) –
stepped_over_functions (ClassVar[set[str]]) –
cfl (ClassVar[bool]) –
- Return type
None
- c: Any¶
- edge: Any¶
- info: Any¶
- classmethod select_callsite_stack_symbols(graph: CPG) Select¶
Construct partial stack symbols of the form:
<call instruction> : <caller context> –> <callee context>
With columns:
stack_symbol: The overall symbolcallsite: caller instructioncaller: calling functioncallee: called functioncaller_contextcaller_context
This select is used as a variable in complex rules in PDAs, where the above partial stack symbol gets concatenated onto the tokens
call:orreturn:, e.g.call : <call instruction> : <caller context> --> <callee context>
This overall symbol gets pushed to or popped from the PDA stack during transitions.
- Parameters
graph (CPG) –
- Return type
Select
- classmethod select_context_stack_symbols(graph: CPG) Select¶
Construct stack symbols of the form
<context>.with columns:
stack_symbol: The contextfunction: The function
where there exists some callgraph edge out of (resp., into)
<function>with caller (resp., callee) context<context>.This select is used as a variable in complex rules in PDAs.
- Parameters
graph (CPG) –
- Return type
Select
- source: Any¶
- source_stack: Tuple[Any, ...]¶
- stack: Tuple[Any, ...]¶
- stack_top: Any¶
- state: Any¶
- stepped_over_functions: ClassVar[set[str]]¶
- target: Any¶
- trace: Tuple[Any, ...]¶
- mate_query.cfl.ForwardCFGPath¶
Inter-procedural control-flow graph (CFG) paths, in the forwards direction.
Matches calls and returns up to unbounded depth, rather than using context-sensitivity information from the pointer analysis.
When encountering a return when the call stack is empty, can return to any callsite, which is problematic for performance and precision (in contract to context-sensitive variants below which use points-to analysis information).
- mate_query.cfl.UnmatchedForwardCFGPath¶
Like
ForwardCFGPathbut doesn’t match calls and returns.