Points of Interest¶
MATE ships with a number of automated analyses that detect potential vulnerabilities, called Points of Interest (POIs). These detectors are written using the MATE Python API; it’s easy to write additional application-, domain-, or API-specific detectors. Potential vulnerabilities found by these queries can be viewed in Flowfinder for collaboration and triage.
The MATE CLI and REST API can also be
used to manage POIs and their results; see the analyses
and pois
subcommands of the MATE CLI and the /analyses/
and /pois
endpoints of
the REST API in particular.
For tips on developing POIs, see Hacking on MATE.
Viewing POIs¶
To view the POIs that MATE found for a particular binary, navigate to the MATE builds page (usually at http://localhost:3000/builds) and click “view POIs” to see a list of POIs.
Important
The list of POIs starts empty. Results are added one-by-one as the MATE analyses running in the background report findings. Refresh the POIs page periodically to see the most up-to-date set of results.
For each POI result, there is:
“Analysis Name”: the type of analysis, see below for details
“Insight”: brief summary of the finding itself
“Code Graph”: click the “analyze” link to visualize this finding in Flowfinder, see Using Flowfinder for details
Vulnerability Types¶
MATE automatically reports findings related to the following types of vulnerabilities.
The current release of MATE includes the following built-in POI analyses. The primary CPG layer(s) used in the analysis are listed at the end of each description. Each title links to a more complete description of the analysis below.
Command Injection: Finds calls to output functions (e.g.
write
) with potential SQL keywords in string arguments, detecting SQL injection (SQLi) vulnerabilities. (AST)Iterator Invalidation: Finds uses of C++ iterators subsequent to iterator-invalidating collection modifications, detecting vulnerabilities resulting from invalid iterator accesses. (CFG)
Overflowable Heap Allocations: Finds calls to dynamic allocations (e.g.
malloc
) where the size calculation may be influenced by user input, detecting unsafe or unintended heap accesses. (DFG)Path Traversal: Find calls to filesystem operations where the path may be influenced by user input, detecting path traversal vulnerabilities. (DFG)
Pointer Disclosure: Finds pointer-typed values that may be output to the user, detecting vulnerabilities that may allow an attacker to circumvent memory protections like ASLR and stack canaries. (DFG)
Truncated malloc Size: Finds calls to dynamic allocations where the size may be influenced by user input and the input is truncated and used elsewhere as a signed integer, detecting vulnerabilities in which an attacker may gain control of the heap. (DFG)
Uninitialized Stack Memory Use: Finds potential intra- and inter-procedural uses of uninitialized stack memory, detecting potential information leaks or computation on invalid data. (CFG, PTG)
Use After Free (UAF): Finds potential uses of heap-allocated memory after calls to
free
, detecting UAF vulnerabilities. (CFG, PTG)User-Controlled String Comparisons: Finds string and memory comparison calls where the comparison length may be controlled by user input, detecting various memory corruption vulnerabilities. (DFG)
Variable-Length Stack Allocations: Detects uses of C99-style variable-length arrays (VLAs) or the alloca library routine where the user can control the size of the stack allocation, detecting vulnerability to certain stack-based attacks. (DFG)
Command Injection¶
CPG Layer(s): AST
Programs frequently interact with other programs by building up sequences of commands and then sending those commands to the target. Common examples include SQL queries, HTTP requests, and “system” commands.
When command sequences are built up using string functions, command injection can occur: a malicious user can provide inputs that the target interprets as instructions, rather than as data.
For example, the following pseudocode to query a user by ID:
query = "SELECT * FROM users WHERE id = " + user_id;
can be manipulated by an attacker to return all rows by providing:
user_id = "1234 OR 1=1"
making the final query:
SELECT * FROM users WHERE id = 1234 OR 1=1;
which is always true, and therefore returns all users instead of the intended behavior of just one.
MATE looks for constants that contain keywords associated with command construction, followed by uses of those constants in string or output formatting functions that are likely sources of command injection.
Iterator Invalidation¶
CPG Layer(s): CFG
The C++ standard library supports a number of containers (vector
, set
, map
, etc). Each
container type has a corresponding iterator type that is designed to allow users to iterate across
and access its elements.
There are some rules around iterator usage. Some container methods cause “iterator invalidation”, meaning that any iterators that were retrieved from the container before the call can only be safely destructed and are otherwise unsafe to use. For example, the following iterator usage invokes undefined behaviour:
std::vector<int> vec = populate_vec();
auto iter = vec.begin();
vec.push_back(1); // invalidates `iter`
std::cout << *iter; // accesses invalid iterator, UB
MATE finds execution paths where invalid iterators are accessed. It does this by looking for
container methods such as begin
that return iterators into a given container. It then looks for
invalidating methods on the container found in the previous step. Finally, it checks whether the
iterator from the first step is accessed with operator*
or operator->
. If this is the case,
MATE will flag the code as a point of interest. The initial graph loaded for this point of interest
shows:
The call to the container method that constructed the iterator (e.g.
begin
)The call to the container method that invalidated the iterator (e.g.
push_back
)The usage of the invalid iterator
Overflowable Heap Allocations¶
CPG Layer(s): DFG
C programs can dynamically allocate memory using the malloc
function.
A common vulnerability occurs when a user controlled value is supplied as the size argument to
malloc
as part of an arithmetic expression. Consider the following example:
int *dest = malloc(size + 1);
memcpy(dest, src, size);
If size
can be controlled to 0xFFFFFFFF
, the argument to malloc
will evaluate to 0 and
an attacker will be able to write 0xFFFFFFFF
bytes of data from src
into the heap at
whatever address malloc
returns.
MATE finds execution paths in the program that take user input (such as a call to scanf()
), use
it as part of an arithmetic operation that is susceptible to integer over/under flow and use the
result to control the size of a dynamic memory allocation.
Path Traversal¶
CPG Layer(s): DFG
Path traversal (also known as directory traversal) is an attack in which user-provided input results in the program using a filepath that “escapes” the intended part of the filesystem to read or write unauthorized files.
Path traversal exploits often include input that causes unexpected relative path resolution, such as ../
, which navigates to the enclosing directory, //
, which may restart path resolution at the root directory, or ~
, which may resolve to the current user’s home directory. An example attack might include a file path like mallory/../../../../path/to/secret.txt
, which “escapes” from the current directory and accesses a secret file.
Sometimes you can insert ../
directly into some input that is used to generate a filename - try it!
But often programs include some logic to identify and replace patterns like ..
.
In these cases, you can sometimes provide encoded input that makes it past these checks, but nevertheless ends up being treated like ../
when it is used later in the program.
For example:
%2e%2e%2f
,%2e%2e/
, and%2e.%2f
may all be decoded to../
URL encoding
..%c0%af
may also be interpreted as../
Path traversal may be useful to:
read secret files (e.g. authentication tokens)
write over important files
MATE reports PathTraversal points of interest, which each include:
The source location where attacker-controlled input may enter the target program
The source location where user-controlled data related the the input may be used a filesystem-related function
Each POI result can be opened in Flowfinder, which can be used to explore:
How is user input used in the construction of the filepath used by the program? Can it take a relative path? Absolute path?
What functions perform checks and substitutions intended to prevent path traversal? What do they check (and what might they miss)?
What functions encode/decode input (e.g. URL encoding)? What might they miss: double-encoded values, invalid encodings, etc.?
Does the program check path data before or after the path has been canonicalized?
What part of the user input is/isn’t checked? Whole path? Filename? Extension?
Pointer Disclosure¶
CPG Layer(s): DFG
Security mechanisms like Address Space Layout Randomization (ASLR) aim to make it more difficult for attackers to predict where key code and data can be found in memory. Vulnerabilities that reveal information about the live layout of a program in memory can enable attacks that bypass these security mechanisms.
Pointer disclosure vulnerabilities generally involve programming errors that result in program output that includes the memory address of a value rather than the value itself.
For example, printf
and related string functions take a format string and a set of parameters.
The format string tells the string function how to interpret the parameters in order to render them appropriately (e.g. as an integer, as a string, etc.).
If the value to be rendered as an integer, for example, is instead a pointer to an integer, the address in that pointer is “leaked” to the attacker.
MATE provides an analysis that looks for ways that values used as pointers may be passed into functions that produce output. MATE reports POIs containing:
The source code location where a pointer is computed
The source code location where that pointer may be output to the user
Common source of false positives:
The MATE analysis may not be able to distinguish between code that prints an entire struct (leaking the pointer) and printing the first field of a struct (which is safe) in accesses to code constructs like:
struct { char msg[8]; void *ptr; }
Truncated malloc
Size¶
CPG Layer(s): DFG
A common vulnerability occurs when a user controllable value is supplied as the size argument to
malloc()
and used elsewhere as a signed integer.
If the size argument is manipulated to exceed 2GB, the conversion to a signed integer will truncate the value and result in an extremely negative value (-2GB). If this signed integer is used to control reads/writes to the allocated memory, it can result in an attacker being able to write to unexpected portions of the heap.
MATE finds execution paths in the program that take user input (such as a call to scanf()
), use it
to control the size of a dynamic memory allocation and then later convert the size to a signed
integer of equal or smaller width.
Uninitialized Stack Memory Use¶
CPG Layer(s): CFG, PTG
In C and C++ programs, stack variables are not initialized by default.
If a program reads a newly-allocated stack variable before it has been written to, the program will generally contain “junk” data from whatever was last stored at the memory location.
Sometimes junk data is already useful to an adversary (e.g. to easily pass a “check if not 0
” test), while other times an adversary may be able to control what data is at that location (e.g. content leftover from previous function invocations).
Normally, correct programs have the sequence:
allocation: reserve space on the stack for the variable near the start of the function
initialization: write some data to the variable
use: read data from that variable
MATE detects execution paths such that a use of a variable may be reached without passing through an initialization of that variable. MATE reports points of interest that each include:
The relevant variable name
Where the variable is declared in the source code
Where (in the source code) might that variable be used without having first been initialized
Look for:
How might it be valuable for an adversary to control that variable? How is that variable used (e.g. in conditionals, array indexing, etc.) that might be useful to control?
Common source of false positives:
Look for relevant calls to external (e.g. library) functions that initialize the variable. MATE can only analyze code included in the program, so relies on MATE Signatures to determine whether external library code is capable of initializing variables or not. If we don’t have a complete signature, MATE will miss the initialization and report a false positive.
MATE may report findings based on paths that may not actually be realizable: explore in Flowfinder and/or browsing the source to determine how that point in the code may be reached.
Use After Free (UAF)¶
CPG Layer(s): CFG, PTG
Programs can allocate memory dynamically on the heap, using functions like malloc
in C.
Normally, correct programs:
allocate memory with
malloc()
or similaruse that memory
free that memory with
free()
or similar
A Use After Free vulnerability is a condition in which memory is referenced (used) after it has been freed. This may be useful in a variety of ways, generally related to enabling the attacker to control the data at the use site by having previously filled that memory location with advantageous contents.
MATE finds execution paths through the program that pass through a free, reaching a use site for a variable without having first passing through a (re)allocation function. The POI results include:
The source location where a variable’s memory is freed
The source location where that variable’s memory is used without having first been (re)allocated
Recommendation: use Flowfinder to
Determine whether the use after free is feasible.
Determine how a control of the content of that variable could be useful.
Identify how to control the content of that variable. Will the content at the use site point to the previously used content? Can you cause/control a large number of allocations elsewhere so the free memory will likely contain useful values?
Common source of false positives:
Similar to the uninitialized stack memory POI, if an external function performs the proper (re)allocation between the free and the use site, MATE may not be able to determine this (as MATE only analyzes the code in the program binary).
Reasoning about dynamic behavior is challenging, and at times cannot be fully precise. In some cases, MATE may report findings that are not actually feasible in practice.
In particular, a common code pattern results in use-sites that are only reachable if a pointer passes a null-pointer check. If a pointer is set to null after it has been freed, these use sites are not reachable in practice.
User-Controlled String Comparisons¶
CPG Layer(s): DFG
Some string and memory comparison functions such as strncmp
and memcmp
take
an argument that limits the length of the strings that get compared. For
example, strcmp("password", "pass") == 0
is false, but
strncmp("password", "pass", 4) == 0
is true.
If a length-limited comparison function is used in an authentication check (e.g., to check a user-provided password) and an attacker can control both the length argument and one of the string arguments, they may be able to bypass the authentication check.
Variable-Length Stack Allocations¶
CPG Layer(s): DFG
In most programs, variable-sized objects are dynamically allocated on the heap, and stack objects are fixed in size. There are two exceptions to this:
1. In C99, programmers may use Variable-Length Arrays (VLAs) to create dynamic stack objects
1. In some runtimes, the alloca()
library routine can dynamically allocate stack memory
Dynamic stack objects are inherently dangerous because of the stack’s limited space: if a user can add arbitrarily sized objects to the stack, then they can potentially clash the stack with other memory regions in the program (like the heap) or even potentially write backwards from the current stack pointer.
MATE finds execution paths in the program that take user input (such as a call to scanf()
) and use that input to control the size of a stack object.