The purpose of this paper is to illustrate a system that allows specifications
to be derived semi-automatically from C programs. The specifications
so derived describe temporal and data-dependence properties of an API or
and ADT. In more detail, the specification consists of a NFA, where
the edges represent calling a particular function, with particular variables
passed as parameters, and a particular variable to store the return value.
For example, we may represent by the letter A a call to open(...) which
stores the returned file descriptor in X1. Similarly, let B be a call
to read (_,_,X1), let C be a call to write (_,_,X1), and let D be close(X1).
Now, we say that Start follows edge A to a node, which has edges B
and C back to itself, and edge D leading to End. Then we have a simple
specification for the POSIX open/read/write/close functions, which essentially
says that files must be opened before reading or writing, and they must eventually
be closed.
It takes several steps to derive the specification, some requiring considerable
expert input. First, every function call that is to be monitored must
be wrapped so that its name, arguments (and suitable expansions of struct
arguments), and return value are printed out to a trace file whenever it
is called. Then, the program is run with some testcases and the traces
are collected. The traces are then run through the "flow dependence
annotator". Basically, this tool notices when a particular value that
is "defined" by one function is later "used" by another function, then calls
the pair of invocations flow-dependent (i.e., adds this pair of "interactions"
to the flow-dependence relation). I enquote "defined" and "used" because
they have special meaning here: an expert must categorize all arguments
/ return values (collectively refered to as "attributes") as definers, users,
or both, for each function that they appear in. Definers change the
state of their attribute in some way; users depend on that state.
Once the flow dependence relation has been defined, each interaction attribute
is given a unique type. Types are then unified based on flow dependence:
that is, if two interaction attributes are flow-dependent, they must
share the same type.
Next, the scenario extractor finds maximal sets of interactions (up to
a certain bounded size) whose attributes are related by flow dependencies.
The scenario is then simplified by eliminating all attributes which
are not involved in flow dependence. Standardization converts concrete
values (e.g. fd = 7) into abstract values (e.g. x0) so that the scenarios
are not dependent on the particular file descriptors or pointers returned
by API calls. Each interaction is then given a one-letter representation.
These letters form "scenario strings" that are sent to the next step.
The PFSA learner uses an off-the-shelf method (sk-strings, from Raman and Patrick) to infer a PFSA that matches the language defined by the set of scenario strings. It is then "cored" by removing edges with low probability, and discarding edge weights.
To test this tool, they use it to generate a specification for the X11 selection mechanism as described by the ICCCM. Out of 16 programs tested, two had bugs found by this tool. I think more programs should have been tested, since there are many out there, and it is not clear from this small sample whether the kinds of bugs it checks for are actually a problem in practice, nor is it clear whether the model converges quickly. They cite verification tools as beyond the scope of the paper, yet they must have used something to get these results; it's not clear how advanced these tools are. Most likely, they rely on dynamic analysis rather than static analysis, which is unfortunate, since they say themselves early on that actual program traces are less likely to contain bugs than the program itself. What is an advantage for the trainer is a disadvantage for the checker.
At this point, the value of the tool is greatly diminished by the necessity
of extensive instrumenting that is required: it seems likely that the
time spent on instrumentation would be much greater than the time spent developing
the model manually. However, the instrumentation provides a basis for
automatic verification, and further work should reduce this burden.
Another drawback to their approach is that the sets of values used by
each type must be disjoint, or else inappropriate flow-dependencies will
be generated and types will be unified incorrectly. This is because
it looks only at the strings printed out by the tracer to determine if a
value is passed from one place to another: if "7" is returned in one place
and used in another, it assumes that it is not merely coincidence. If
heap objects are never freed, then there should be a 1-1 association between
types and pointer values; however, if memory is reused, then the method could
fail. It could also fail if handles in two different spaces are printed
out the same way, e.g. file descriptors and window references. I say
this is mostly minor because with a bit more annotation work, the tracer
could identify the types and use different representations, e.g. "fd7" or
"window80". (On the other hand, that could disguise errors, as well...
passing a file descriptor to "free" should violate the specification.) Basically,
the type system used by this paper is a crude hack to get around lack of
language support for true ADTs. Other than that, I find the approach
interesting intellectually, though its practicality is thus far unproven.