Using Flowfinder

Flowfinder is an interactive, graphical user interface for exploring a program’s code property graph. Given a potential vulnerability discovered via MATE’s automated analyses, Flowfinder displays relevant fragments of the CPG that explain relationships between program inputs, outputs, and computations. Users can deepen their understanding of the potential vulnerability by viewing additional fragments of the CPG that answer specific questions about program elements, such as data flows or control dependencies that influence specific statements. By exploring a potential vulnerability using Flowfinder, users can apply their high-level insights about the program’s semantics and security requirements to eliminate analysis false positives or develop concrete inputs that demonstrate the insecurity of the program.

Similar to other program analysis tools such as IDA Pro, Binary Ninja, and angr management, Flowfinder is designed to help answer questions such as “How does this data get from here to there, and how is it changed along the way?” or “If I can control this buffer, what effect can I have on the execution of the program?” By leveraging the detailed information in the MATE CPG, Flowfinder is intended to enable interprocedural analysis of program dataflows at a relatively high level of abstraction. Rather than navigating by scrolling or jumping between detailed views of the program’s disassembly or source code, Flowfinder is designed to support expanding and contracting representations of code and data as needed, and creating and manipulating visualizations of high-level flows between different components.

See Flowfinder Tutorial for a hands-on guide to finding a bug with Flowfinder.

Interface overview

Clicking the analyze button on the Points of Interest page takes you to the Flowfinder user interface.

Important

Loading a Point of Interest (POI) can sometimes take a minute or two, especially the first time a POI for an analysis is loaded for a particular build. If the POI does not load after a few minutes, try reloading the page. If the page fails to load a second time, consider investigating a different POI: the code graph for this POI is likely too large to explore effectively.

This interface has three panels.

_images/flowfinder-panels.png

The panel in the top left is the graph area. The panel in the bottom left displays cards which represent what is shown in the graph. The panel on the right contains some controls for modifying the graph as well as some context for how to interpret the graph. Be sure to scroll down on the right-hand panel to see helpful background information about the kind of point of interest you are currently looking at.

We will look at all of the controls in each of these panels one at a time, starting with the main graph area.

Top left panel: graph area

_images/flowfinder-graph_labeled.png

You can pinch or pull on a touchpad to zoom in and out of the graph, and you can click and drag the canvas to change the view area. You can also use the arrow keys on your keyboard to move around the graph. The layout algorithm sometimes makes it hard to see every part of the graph you are interested in on a single screen; in these cases, it helps to drag nodes to where you want them.

A:

Click on the symbol labeled A in the diagram to automatically recenter the graph.

B:

Click on the symbol labeled B in the diagram to force the graph to be laid out again (this can be useful if you lose artifacts such as edges between nodes when adjusting the graph).

C:

This is a legend to help provide insight into the meaning behind the colors of elements rendered in the graph.

D:

The light gray box labeled D in the diagram, next to the text create_query_plan, represents the create_query_plan function in the program.

E:

The medium gray boxes represent lines of source code in that function. The top source code line node is labeled E in the diagram and the text beside it, src/plan.c:23 query_plan(ast* query, database* db);, indicates that the line query_plan(ast* query, database* db); appears on line 23 of the file src/plan.c.

Note: these aren’t necessarily the only lines of source code in this function, only the ones relevant to this graph.

F:

The dark gray boxes represent LLVM instructions; the LLVM instructions are grouped together based on the line of source code that they are derived from. You can tell that the LLVM node labeled F is a call instruction based on the adjacent text.

You can click on the boxes to collapse or expand them as you explore the graph. Clicking the function-level light gray box collapses the entire function to a single node. Clicking that single-function node again expands it one level to show the four medium gray boxes that represent the four lines of code in that function that are part of this graph.

_images/flowfinder-expanded_function_collapsed_lines.png

Right-clicking on nodes in the graph will reveal a context menu that will allow you to query for more parts of the graph as you explore the program.

_images/flowfinder-context_menu.png

Context menu actions

The context menu lists a number of actions for exploring the graph from the selected node.

Show data flow reaching this node

Dataflow edges in the code graph represent data dependencies between instructions and memory locations in the program. The dataflow graph abstracts away the control structure of the program, making it easy to focus on how data propagates within a program, but it can obscure how the sequence or conditional execution of statements may impact the computation. Selecting “Show data flow reaching this node” adds the corresponding subgraph of the code graph to the graph view and adds an associated card to the bottom panel. This subgraph shows which values computed by the program flow into the selected instruction or memory location.

Show data flow from this node

Like “Show data flow reaching this node”, this control adds a dataflow subgraph to the graph view and an associated card showing how the value computed by the selected instruction or stored in the selected memory location may flow to other computations in the program.
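One way to picture the two data-flow actions is as backward and forward reachability over data-flow edges. The sketch below is a toy model only: the node names and edges are hypothetical, and it does not use or describe MATE's actual query implementation.

```python
from collections import deque

# Hypothetical data-flow edges for a toy program (not taken from a real CPG).
# An edge (a, b) means "the value produced at a may flow into b".
DATAFLOW_EDGES = [
    ("read_input", "buf"),
    ("buf", "parse_len"),
    ("parse_len", "len"),
    ("len", "memcpy_call"),
    ("buf", "memcpy_call"),
    ("config", "log_level"),
]


def reachable(start, edges, backward=False):
    """Return every node reachable from start, following edges forward or backward."""
    adjacency = {}
    for src, dst in edges:
        if backward:
            src, dst = dst, src
        adjacency.setdefault(src, []).append(dst)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Analogue of "Show data flow reaching this node" for memcpy_call.
reaching = reachable("memcpy_call", DATAFLOW_EDGES, backward=True)

# Analogue of "Show data flow from this node" for buf.
flowing = reachable("buf", DATAFLOW_EDGES)
```

In this toy graph, config and log_level never appear in either result because no data-flow edge connects them to buf or memcpy_call.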

Show control flow reaching this node

Adds a subgraph showing paths from the function’s entry to the selected instruction in the intra-procedural control-flow graph.

Show control flow from this node

Adds a subgraph showing paths in the intra-procedural control-flow graph starting at the selected instruction, or for the entire function if a function is selected.
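As a rough illustration of the two control-flow actions, the sketch below models a single function's control-flow graph as a dictionary and computes the nodes that lie on some path from the entry to a target, and the nodes reachable from a starting node. The CFG and its node names are hypothetical, and this is a simplified model rather than the query Flowfinder runs.

```python
# Hypothetical intra-procedural CFG: a loop with an early return inside it.
CFG = {
    "entry": ["loop_guard"],
    "loop_guard": ["loop_body", "after_loop"],
    "loop_body": ["check_error"],
    "check_error": ["early_return", "loop_guard"],
    "early_return": [],
    "after_loop": ["final_return"],
    "final_return": [],
}


def forward_reachable(cfg, start):
    """Return start plus every node reachable from it along CFG edges."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for succ in cfg[node]:
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen


def reverse(cfg):
    """Return the CFG with every edge flipped."""
    rev = {node: [] for node in cfg}
    for node, succs in cfg.items():
        for succ in succs:
            rev[succ].append(node)
    return rev


def control_flow_reaching(cfg, entry, target):
    """Nodes on some path from entry to target."""
    return forward_reachable(cfg, entry) & forward_reachable(reverse(cfg), target)


def control_flow_from(cfg, start):
    """Nodes on some path that begins at start."""
    return forward_reachable(cfg, start)


reaching_after_loop = control_flow_reaching(CFG, "entry", "after_loop")
from_check_error = control_flow_from(CFG, "check_error")
```

Here early_return and final_return are absent from reaching_after_loop because no path through them leads to after_loop.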

Show control dependencies

An instruction is control-dependent on another if whether it executes depends on which control-flow edge is taken out of the controlling instruction. For example, the body of a loop is control-dependent on the branch instruction of the loop guard. In this snippet of code:

if (some_condition) {
  stmt1;
} else {
  stmt2;
}
stmt3;

stmt1 and stmt2 are control-dependent on some_condition, but stmt3 is not, because it executes whichever branch is taken. Selecting “Show control dependencies” adds a subgraph to the graph view and an associated card showing all instructions in the function whose control flow may influence the execution of the selected instruction.
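The if/else example can be checked mechanically with the textbook definition of control dependence based on post-dominators. The sketch below is an illustration under stated assumptions: the CFG is written by hand for the snippet, "exit" is an added hypothetical exit node, every non-exit node is assumed to have at least one successor, and only direct dependencies are computed. It is not MATE's implementation.

```python
# Hand-written CFG for the if/else snippet; "exit" is a hypothetical exit node.
CFG = {
    "some_condition": ["stmt1", "stmt2"],
    "stmt1": ["stmt3"],
    "stmt2": ["stmt3"],
    "stmt3": ["exit"],
    "exit": [],
}


def post_dominators(cfg, exit_node):
    """Iteratively compute, for each node, the set of nodes that post-dominate it."""
    nodes = set(cfg)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            # Assumes every non-exit node has at least one successor.
            new = set.intersection(*(pdom[s] for s in cfg[n])) | {n}
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def control_dependencies(cfg, exit_node):
    """Map each node to the branch nodes it is directly control-dependent on."""
    pdom = post_dominators(cfg, exit_node)
    deps = {n: set() for n in cfg}
    for branch, succs in cfg.items():
        for succ in succs:
            for node in pdom[succ]:
                # node post-dominates succ but does not strictly post-dominate branch.
                if node not in pdom[branch] or node == branch:
                    deps[node].add(branch)
    return deps


deps = control_dependencies(CFG, "exit")
```

With this CFG, deps["stmt1"] and deps["stmt2"] each contain only some_condition, while deps["stmt3"] is empty, matching the prose above.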

Show callsites

Selecting “Show callsites” will add a subgraph to the graph view and an associated card showing all instructions that may invoke the selected function or signature.

Show callers

Selecting “Show callers” will add a subgraph to the graph view and an associated card showing all functions that may invoke the selected function or dataflow signature.

Show callees

Selecting “Show callees” will add a subgraph to the graph view and an associated card showing all functions and dataflow signatures that may be invoked by the selected callsite or function.
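The difference between callsites, callers, and callees can be seen in a small model of a call graph. In the sketch below, create_query_plan and query_plan are the function names from the earlier screenshot description; every other name, every callsite label, and the data layout are hypothetical and chosen only for illustration.

```python
# Each record is (calling function, callsite label, called function).
# Callsite labels and most function names are hypothetical.
CALLSITES = [
    ("main", "main: call parse_query", "parse_query"),
    ("main", "main: call create_query_plan", "create_query_plan"),
    ("create_query_plan", "create_query_plan: call query_plan", "query_plan"),
    ("run_tests", "run_tests: call create_query_plan", "create_query_plan"),
]


def show_callsites(function):
    """Instructions that may invoke the given function."""
    return sorted(site for _, site, callee in CALLSITES if callee == function)


def show_callers(function):
    """Functions containing a callsite that may invoke the given function."""
    return sorted({caller for caller, _, callee in CALLSITES if callee == function})


def show_callees(function):
    """Functions that may be invoked from within the given function."""
    return sorted({callee for caller, _, callee in CALLSITES if caller == function})


callsites = show_callsites("create_query_plan")
callers = show_callers("create_query_plan")
callees = show_callees("main")
```

Callsites are instruction-level results, so a function that calls create_query_plan twice would contribute two callsites but only one caller.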

Set as source

Sets the “source” field of the “Create New Slice” tool in the right-hand panel to the label of the currently selected node. See Create New Slice panel.

Set as sink

Sets the “sink” field of the “Create New Slice” tool in the right-hand panel to the label of the currently selected node. See Create New Slice panel.

Bottom left panel: card area

_images/flowfinder-card_panel_labeled.png

The cards in the lower left-hand panel represent the different subgraphs that together make up the graph that is displayed in the graph panel above. You may need to resize the window or scroll to see all the available graphs.

There are three types of possible cards/subgraphs, and different points of interest will use different combinations of cards. Some points of interest specify a graph consisting of only a single subgraph, while others, such as the one in the screenshot above, specify a graph consisting of many different kinds of subgraphs.

Each subgraph will represent one of four possible views of the code property graph: dataflow, control flow, control dependence or call graph. The three kinds of subgraphs are:

Slice:

A slice is a subgraph that runs from a source node to a sink node. The controls for refining what a slice shows are described under Slice analysis mode.

Graph:

A graph is a subgraph that is specified using an origin node and includes edges of the given kind radiating from the origin.

Node:

A node is a subgraph consisting of a single code-property-graph node.

The labels A, B, and C in the screenshot above indicate which kind of subgraph each card represents.

There are one or two toggles on each card:

D:

The enabled toggle is present on every type of graph card. Toggling it shows or hides the graph. After hiding or revealing components, it may help to click the recenter button or manually drag nodes to recenter the graph.

E:

The analyze toggle is only present on slice cards. As you explore the graph you may find nodes which you either want to focus the slice on or remove from a slice; this toggle allows you to do that.

When you click the analyze toggle, it selects that slice and highlights it in gold. You can then click on nodes to mark them for inclusion in or exclusion from the slice.

Slice analysis mode

_images/flowfinder-slice_example.png

In the screenshot above, we’ve put the slice into analyze mode and then clicked on the loadIMEtypes node (labeled A).

This populates loadIMEtypes as a node in the Focus/Avoid Nodes portion of the card. You can add multiple nodes by clicking on them.

For each selected node, you can then click on the avoid button (labeled B), which looks like a crossed-out eye, or the focus button (labeled C), which looks like a crosshair. You can also remove the node from the Focus/Avoid Nodes portion of the card by clicking the trash can (labeled D). Choosing avoid reruns the slice query (from source to sink) while excluding all paths that include the specified node. Choosing focus reruns the slice query but keeps only those paths that include the specified node.

Important

Currently, the focus and avoid feature is only functional for data flow slices.

Finally, click the Run Analysis button (labeled E) to create a new slice that focuses on and/or avoids the nodes you’ve selected. The current slice will be automatically disabled, and the new slice will be enabled instead.
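The effect of focus and avoid on a slice can be sketched as filtering the paths between source and sink. The example below is a toy model: loadIMEtypes is the node name from the screenshot, all other node names and edges are hypothetical, and treating multiple focus nodes as "all must appear on the path" is an assumption rather than documented behavior.

```python
# Hypothetical data-flow edges around the loadIMEtypes node.
EDGES = [
    ("read_config", "loadIMEtypes"),
    ("loadIMEtypes", "mime_table"),
    ("read_config", "default_types"),
    ("default_types", "mime_table"),
    ("mime_table", "send_header"),
]


def slice_paths(edges, source, sink, focus=(), avoid=()):
    """Enumerate simple source-to-sink paths, honoring focus and avoid nodes."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    avoided = set(avoid)
    results = []

    def walk(node, path):
        if node in avoided:
            return  # Drop any path that would pass through an avoided node.
        path = path + [node]
        if node == sink:
            # Assumption: a path is kept only if it contains every focus node.
            if all(f in path for f in focus):
                results.append(path)
            return
        for nxt in adjacency.get(node, []):
            if nxt not in path:  # Keep paths simple (no repeated nodes).
                walk(nxt, path)

    walk(source, [])
    return results


all_paths = slice_paths(EDGES, "read_config", "send_header")
focused = slice_paths(EDGES, "read_config", "send_header", focus=["loadIMEtypes"])
avoiding = slice_paths(EDGES, "read_config", "send_header", avoid=["loadIMEtypes"])
```

In this toy graph the unrestricted slice has two paths; focusing on loadIMEtypes keeps only the path through it, and avoiding it keeps only the path through default_types.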