A Brief Cqual Tutorial

Below we present a small example showing how to use the web-based version of Cqual to find a potential format-string vulnerability in a C program. This tutorial is extracted from the documentation distributed with the source code.

Consider the following small example program:

char *getenv(const char *name);
int printf(const char *fmt, ...);

int main(void)
{
  char *s, *t;
  s = getenv("LD_LIBRARY_PATH");
  t = s;
  printf(t);
}

This program reads the value of LD_LIBRARY_PATH from the environment and passes it to printf as a format string. If the user can control the environment in which this program is run, then this program may have a format-string vulnerability. For example, if the user sets LD_LIBRARY_PATH to a long sequence of %s conversion specifiers, printf will treat nonexistent arguments as string pointers, and the program will likely crash with a segmentation fault.
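For comparison, the usual way to avoid this class of bug in plain C is to pass untrusted text as an argument to a constant format string rather than as the format string itself. The following is a minimal sketch of that idiom, not part of the Cqual distribution; the helper name format_untrusted is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not from Cqual): write untrusted text into buf as
   data, never as a format string.  Returns the result of snprintf, i.e.
   the number of characters the full output requires. */
int format_untrusted(char *buf, size_t size, const char *untrusted)
{
    /* The format string is the constant "%s", so any '%' characters
       inside `untrusted` are copied verbatim instead of being
       interpreted as conversion specifiers. */
    return snprintf(buf, size, "%s", untrusted);
}
```

With this helper, an input such as "%s%s%n" is copied into the buffer unchanged instead of being interpreted by the formatting routine.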

By default Cqual assumes nothing about the behavior of your program. In order to start checking for bugs, we need to annotate the program with extra type qualifiers. For this example we will use two qualifiers. We will annotate untrusted strings as $tainted, and we will require that printf take $untainted data:

$tainted char *getenv(const char *name);
int printf($untainted const char *fmt, ...);

int main(void)
{
  char *s, *t;
  s = getenv("LD_LIBRARY_PATH");
  t = s;
  printf(t);
}

In Cqual all user-defined qualifiers, which we will refer to as constant qualifiers, begin with dollar signs. Notice that we only need to annotate getenv and printf with type qualifiers. For this example Cqual will infer that s and t must also be $tainted, and hence will signal a type error: $tainted data is being passed to printf, which requires $untainted data. The presence of a type error indicates a potential format-string vulnerability.

Running Cqual

Use the web interface to analyze this program, listed as "Tainting: Small Example" in the drop-down menu. Cqual analyzes the file and brings up a window listing the input files and the analysis results. In this case, Cqual complains:

taint.c:9
type of actual argument 1 doesn't match type of formal
unsatisfiable qualifier constraint $tainted <= $untainted

Error messages are linked to the position in the file where the error was generated. You can also click on a file name to jump to the top of the file.

If you click on the error message link you will see a marked-up display of taint.c, the input file. Identifiers are colored according to their inferred qualifiers. In the web version of Cqual, $tainted identifiers are colored red, $untainted identifiers are colored green, and identifiers that may contribute to a type error are colored purple.

Each marked-up identifier is also a hyperlink. Clicking on an identifier will show you the type of the identifier, fully annotated with qualifiers. For example, clicking on t should display

t: t ptr (t' ptr (t'' char))

in the bottom frame.

The name of the identifier is shown to the left of the colon, and its inferred type is shown to the right of the colon. Here t has the type pointer to pointer to character. (We will explain the extra level of ptr below.) Notice that Cqual writes types from left to right, using ptr as a type constructor.

The three hyperlinked names in the type are qualifier variables. In this case the qualifier variable t'' is colored purple because it has been inferred to be both $tainted and $untainted, an error.

Clicking on a qualifier variable will show you the inferred value of the qualifier variable and the shortest path on which it was inferred to have its value. For example, if you click on t'', you should see the following result:

t'':  $tainted $untainted

$tainted <= getenv_ret'
         <= s''
         <= t''
         <= printf_arg0'
         <= $untainted

The first line tells us that t'' is both $tainted and $untainted, an error. The remaining lines show us an erroneous path. We see that t'' was tainted from s'', which was tainted from the return type of getenv. We also see that the error arises because t'' taints the parameter to printf, which must be untainted.

Clicking on a <= will jump to the source location where that constraint was generated. Clicking on a qualifier computes and displays the shortest path by which that qualifier was inferred to have its value.
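One way to read the path shown above is as a chain of edges in a constraint graph, where an error exists whenever $tainted can reach $untainted. The sketch below is a toy model of that idea, not Cqual's actual implementation: the node names mirror the path printed above, while the edge table, the comments on where each edge comes from, and the function names shortest_path and has_taint_error are assumptions made for illustration.

```c
#include <stddef.h>

/* Toy model of the constraint graph; NOT Cqual's real data structures.
   Node names mirror the qualifier variables in the path printed above. */
enum node { TAINTED, GETENV_RET, S2, T2, PRINTF_ARG0, UNTAINTED, NUM_NODES };

struct edge { enum node from, to; };    /* encodes the constraint from <= to */

static const struct edge edges[] = {
    { TAINTED,     GETENV_RET  },   /* annotation on getenv's return type */
    { GETENV_RET,  S2          },   /* s = getenv(...)                    */
    { S2,          T2          },   /* t = s                              */
    { T2,          PRINTF_ARG0 },   /* printf(t)                          */
    { PRINTF_ARG0, UNTAINTED   },   /* annotation on printf's parameter   */
};

/* Breadth-first search from src to dst.  Returns the number of edges on
   a shortest path, or -1 if dst cannot be reached from src. */
int shortest_path(enum node src, enum node dst)
{
    int dist[NUM_NODES];
    enum node queue[NUM_NODES];     /* each node is enqueued at most once */
    size_t head = 0, tail = 0;

    for (int i = 0; i < NUM_NODES; i++)
        dist[i] = -1;               /* -1 marks "not visited yet" */
    dist[src] = 0;
    queue[tail++] = src;

    while (head < tail) {
        enum node n = queue[head++];
        if (n == dst)
            return dist[n];
        for (size_t i = 0; i < sizeof edges / sizeof edges[0]; i++) {
            if (edges[i].from == n && dist[edges[i].to] < 0) {
                dist[edges[i].to] = dist[n] + 1;
                queue[tail++] = edges[i].to;
            }
        }
    }
    return -1;
}

/* In this model, a qualifier error means $tainted flows to $untainted. */
int has_taint_error(void)
{
    return shortest_path(TAINTED, UNTAINTED) >= 0;
}
```

In this model the path from TAINTED to UNTAINTED has five edges, one per <= in the output above, and removing any single edge (for example, by not assigning s to t) makes the error disappear.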

L-values and R-values

In C there is an important distinction between l-values, which correspond to memory locations, and r-values, which are ordinary values like integers. In the C type system, l-values and r-values are given the same type. For example, consider the following code:

int x;
x = ...;
... = x;

The first line declares that x is a location containing an integer. On the second line x is used as an l-value: it appears on the left-hand side of an assignment, meaning that the location corresponding to x should be updated. On the third line x is used as an r-value. Here when we use x as an r-value we are not referring to the location x, but to x's contents. In the C type system, x is given the type int in both places, and the syntax distinguishes integers that are l-values from integers that are r-values.

Cqual uses a slightly different approach in which the types distinguish l-values and r-values. In Cqual, x is given the type ptr(int), meaning that the name x is a location containing an integer. When x is used as an l-value its type stays the same: in Cqual, the left-hand side of an assignment is always a ptr type. When x is used as an r-value the outermost ptr is removed, i.e., x as an r-value has the type int.

In more concrete terms, if you click on an identifier that can be used as an l-value, you will see its type as an l-value, i.e., with an extra ptr at the top level. For most purposes you can safely ignore this extra level of indirection.
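The distinction can be made visible in ordinary C by taking the address of a variable, which turns its l-value into an explicit pointer. The following is a small illustrative sketch; the function name lvalue_rvalue_demo and the variable loc are hypothetical, and the Cqual types in the comments follow the description above rather than actual tool output.

```c
/* Store through the location of x (its l-value), then read the contents
   of x (its r-value). */
int lvalue_rvalue_demo(void)
{
    int x;           /* Cqual view: x : ptr(int), a location holding an int */
    int *loc = &x;   /* make the l-value explicit: loc is the location of x */

    *loc = 41;       /* same effect as `x = 41;`: the target is a location  */

    int y = x + 1;   /* x as an r-value: the outer ptr is removed, so the
                        expression x has type int                           */
    return y;
}
```

The call lvalue_rvalue_demo() returns 42: the store goes to the location named by x, and the later read yields the contents of that location.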