Mining Specifications

Glenn Ammons, Rastislav Bodik, James R. Larus

Notes by James Rose

1. Introduction

The purpose of this paper is to present a system that derives specifications semi-automatically from C programs.  The derived specifications describe temporal and data-dependence properties of an API or an ADT.  More precisely, a specification is an NFA whose edges represent calls to particular functions, with particular variables passed as parameters and a particular variable receiving the return value.  For example, let A represent a call to open(...) that stores the returned file descriptor in X1.  Similarly, let B be a call to read(_, _, X1), let C be a call to write(_, _, X1), and let D be close(X1).  Now say that Start follows edge A to a node that has edges B and C back to itself and edge D leading to End.  This gives a simple specification for the POSIX open/read/write/close functions, which essentially says that files must be opened before they are read or written, and must eventually be closed.
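The open/read/write/close specification above can be sketched as a small automaton.  This is my own minimal encoding of the example, not the paper's representation; the state names "Opened" etc. are invented for readability.

```python
# The open/read/write/close specification from the text, encoded as an NFA
# (here deterministic, so a simple transition table suffices).
# A = open(...) returning X1, B = read(_,_,X1), C = write(_,_,X1), D = close(X1).
TRANSITIONS = {
    ("Start", "A"): "Opened",
    ("Opened", "B"): "Opened",   # reads may repeat while the file is open
    ("Opened", "C"): "Opened",   # writes may repeat while the file is open
    ("Opened", "D"): "End",      # close leads to the accepting state
}

def accepts(trace):
    """Return True if the letter sequence satisfies the specification."""
    state = "Start"
    for letter in trace:
        if (state, letter) not in TRANSITIONS:
            return False          # e.g. reading before opening violates the spec
        state = TRANSITIONS[(state, letter)]
    return state == "End"         # the file must eventually be closed
```

For instance, `accepts("ABBCD")` holds, while `accepts("BD")` (read before open) and `accepts("ABC")` (never closed) do not.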

2.  How it works

Deriving a specification takes several steps, some requiring considerable expert input.  First, every function call to be monitored must be wrapped so that its name, arguments (and suitable expansions of struct arguments), and return value are printed to a trace file whenever it is called.  The program is then run on some test cases and the traces are collected.  Next, the traces are run through the "flow dependence annotator".  This tool notices when a value "defined" by one function call is later "used" by another, and marks the pair of invocations as flow-dependent (i.e., it adds this pair of "interactions" to the flow-dependence relation).  "Defined" and "used" are in quotes because they have a special meaning here: an expert must categorize every argument and return value (collectively referred to as "attributes") as a definer, a user, or both, for each function in which it appears.  Definers change the state of their attribute in some way; users depend on that state.
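The annotator step can be sketched roughly as follows.  The ROLES table stands in for the expert's def/use classification, and the trace records mimic what a tracer might emit; all names and the exact record format are my own assumptions, not the paper's.

```python
# Hypothetical sketch of the flow-dependence annotator.  The per-function
# classification of each attribute as "def", "use", or "both" is the expert
# input described above.
ROLES = {
    "open":  {"ret": "def"},    # open defines the returned file descriptor
    "read":  {"arg2": "use"},   # read uses the descriptor passed to it
    "close": {"arg0": "use"},   # close uses the descriptor passed to it
}

def flow_dependences(trace):
    """trace: list of (call_id, fn, {attr: value}).
    Returns the set of (defining_call_id, using_call_id) pairs."""
    last_def = {}   # value -> call_id of its most recent definer
    deps = set()
    for call_id, fn, attrs in trace:
        for attr, value in attrs.items():
            role = ROLES.get(fn, {}).get(attr)
            if role in ("use", "both") and value in last_def:
                deps.add((last_def[value], call_id))
            if role in ("def", "both"):
                last_def[value] = call_id
    return deps
```

On the trace `[(1, "open", {"ret": 7}), (2, "read", {"arg2": 7}), (3, "close", {"arg0": 7})]` this yields the pairs (1, 2) and (1, 3): the open call defines the descriptor that both later calls use.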

Once the flow-dependence relation has been computed, each interaction attribute is given a unique type.  Types are then unified based on flow dependence: if two interaction attributes are flow-dependent, they must share the same type.
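Unification of this kind is naturally expressed with a union-find structure.  The paper does not say how unification is implemented, so the following is only a plausible sketch; the attribute names are illustrative.

```python
# Hypothetical sketch of type unification: each interaction attribute starts
# with its own type, and flow-dependent attributes are merged via union-find.
def unify_types(attributes, dependent_pairs):
    parent = {a: a for a in attributes}    # each attribute is initially its own type

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    for a, b in dependent_pairs:
        parent[find(a)] = find(b)          # flow-dependent attrs share a type

    groups = {}                            # representative -> set of attributes
    for a in attributes:
        groups.setdefault(find(a), set()).add(a)
    return list(groups.values())
```

For example, with attributes `open.ret`, `read.arg2`, `close.arg0`, and `read.arg1`, and dependences relating the first three, the result is two types: one containing the three descriptor attributes and one containing `read.arg1` alone.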

Next, the scenario extractor finds maximal sets of interactions (up to a bounded size) whose attributes are related by flow dependences.  Each scenario is then simplified by eliminating all attributes that are not involved in any flow dependence.  Standardization converts concrete values (e.g. fd = 7) into abstract values (e.g. x0), so that scenarios do not depend on the particular file descriptors or pointers returned by API calls.  Each interaction is then given a one-letter representation.  These letters form "scenario strings" that are sent to the next step.
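The standardization step can be sketched as a simple renaming pass: concrete values are replaced by placeholders x0, x1, ... in order of first appearance.  This is my own reconstruction of the idea; the scenario record format is assumed.

```python
# Hypothetical sketch of standardization: concrete attribute values are
# renamed to abstract placeholders so that two runs differing only in the
# particular descriptors or pointers yield the same scenario.
def standardize(scenario):
    """scenario: list of (fn, {attr: concrete_value}).
    Returns the same interactions with values replaced by x0, x1, ..."""
    names = {}                     # concrete value -> abstract name
    result = []
    for fn, attrs in scenario:
        abstract = {}
        for attr, value in attrs.items():
            if value not in names:
                names[value] = "x%d" % len(names)
            abstract[attr] = names[value]
        result.append((fn, abstract))
    return result
```

A run that opened descriptor 7 and a run that opened descriptor 9 both standardize to the same scenario, which is exactly what lets distinct traces map to the same scenario string.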

The PFSA learner uses an off-the-shelf method (the sk-strings algorithm of Raman and Patrick) to infer a probabilistic finite-state automaton (PFSA) that matches the language defined by the set of scenario strings.  The automaton is then "cored" by removing low-probability edges and discarding the edge weights.
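The sk-strings learner itself is an off-the-shelf component and is not reproduced here, but the coring step is simple enough to sketch: edges whose probability falls below a cutoff are dropped, and the survivors keep only their labels, turning the PFSA into an ordinary NFA.  The edge representation and cutoff value below are assumptions.

```python
# Hypothetical sketch of "coring" a learned PFSA: discard rare edges,
# then discard the weights themselves.
def core(pfsa_edges, cutoff):
    """pfsa_edges: list of (src, letter, dst, probability).
    Returns the set of unweighted (src, letter, dst) edges that survive."""
    return {(src, letter, dst)
            for src, letter, dst, prob in pfsa_edges
            if prob >= cutoff}
```

For example, with a cutoff of 0.1, an edge taken in only 2% of scenarios (likely noise from a buggy or unusual trace) is pruned, while the common transitions remain.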

3. Results

To test the tool, the authors use it to generate a specification for the X11 selection mechanism as described by the ICCCM.  Out of 16 programs tested, the tool found bugs in two.  I think more programs should have been tested, since many are available; from this small sample it is not clear whether the kinds of bugs the tool checks for are actually a problem in practice, nor whether the model converges quickly.  The authors say verification tools are beyond the scope of the paper, yet they must have used something to get these results, and it is not clear how advanced those tools are.  Most likely they rely on dynamic rather than static analysis, which is unfortunate, since the authors themselves note early on that actual program traces are less likely to contain bugs than the program itself.  What is an advantage for the trainer is a disadvantage for the checker.

At this point, the value of the tool is greatly diminished by the extensive instrumentation it requires: it seems likely that the time spent on instrumentation would exceed the time spent developing the model manually.  However, the instrumentation also provides a basis for automatic verification, and further work should reduce this burden.

Another drawback of their approach is that the sets of values used by each type must be disjoint, or else spurious flow dependences will be generated and types will be unified incorrectly.  This is because the tool looks only at the strings printed by the tracer to determine whether a value is passed from one place to another: if "7" is returned in one place and used in another, it assumes this is not mere coincidence.  If heap objects are never freed, there should be a one-to-one association between types and pointer values; however, if memory is reused, the method could fail.  It could also fail if handles in two different spaces are printed the same way, e.g. file descriptors and window references.  I say this is mostly minor because, with a bit more annotation work, the tracer could identify the types and use distinct representations, e.g. "fd7" or "window80".  (On the other hand, that could disguise errors as well: passing a file descriptor to "free" should violate the specification.)  Basically, the type system used by this paper is a crude hack to work around the lack of language support for true ADTs.  Other than that, I find the approach intellectually interesting, though its practicality is thus far unproven.
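The collision problem can be made concrete with a toy example.  Matching on raw printed values, a file descriptor 7 and a window id 7 look identical, so a spurious dependence is inferred; tagging values with their handle space, as suggested above, removes it.  The trace format and the matching function below are my own illustration, not the paper's.

```python
# Illustration of the value-collision failure mode: dependences are inferred
# purely from equality of the strings the tracer prints.
def value_match_pairs(trace):
    """trace: list of (call_id, role, printed_value); role is 'def' or 'use'.
    Returns (definer, user) pairs inferred from value equality alone."""
    last_def, deps = {}, set()
    for call_id, role, value in trace:
        if role == "use" and value in last_def:
            deps.add((last_def[value], call_id))
        if role == "def":
            last_def[value] = call_id
    return deps

# open() returns fd 7; an unrelated X call later uses window id 7.
raw    = [(1, "def", "7"),   (2, "use", "7")]        # values collide
tagged = [(1, "def", "fd7"), (2, "use", "window7")]  # handle spaces tagged
```

On `raw` the matcher reports a spurious dependence between calls 1 and 2; on `tagged` it reports none, which is exactly the effect the extra annotation work would buy.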