Project Suggestions

Contact Dr. Michael Hicks for more information about the projects, unless otherwise noted in the project description. Here is a summarized list, with details below:

Dynamic Software Updating
1. Optimizations
2. Taken Application to Xpilot
Cyclone: Evaluating zero-terminated pointers
Type Qualifers:
1. Information flow analysis
2. Application to Java programs
UNO - Uniqueness and Ownership Inference for Java
1. Proving soundness
2. Extensions to discover almost unique methods
Pistachio - Checking protcol implementation correctness
Program analysis by transformation
Proof assistants for soundness proofs

There are suggestions for other projects at the bottom of this page.

Dynamic Software Updating. Software systems are imperfect, so software updates are a fact of life. While typical software updates require stopping and restarting the program in question, many systems cannot afford to halt service, or would prefer not to. Dynamic software updating (DSU) addresses this difficulty by permitting programs to be updated while they run. We have developed a DSU compiler for C programs that makes programs updateable, and so far we have had good experience using it (http://www.cs.umd.edu/projects/dsu/). There are two project-sized ideas that could be explored here:

The compiler introduces a number of performance overheads to support DSU, notably various levels of indirection. We believe a number of these overheads could be eliminated through novel compiler optimizations along the lines of data flow analysis. Supporting these would basically be adding a phase or two to our multi-phase compiler, which uses the CIL analysis framework (written in OCaml).
Skills required: I would suggest this for a two-person project. Projects 1 and 3 should get you in pretty good shape to do this project. You should also read Practical Dynamic Software Updating for C for background on Ginseng. Iulian Neamtiu will be able to provide support and assistance on this project.
So far we have experimented with using Ginseng on three open source servers. A useful project would be to apply it to some other programs, too, to determine problem areas for future research (if any!). One idea would be to try to construct dynamic updates for the game Xpilot. Basically you would build the game so that it could run on Linux, and then develop patches that correspond to different releases, attempting to apply those patches on the fly.
Skills required: I think this would make a good one-person project (but several people could be doing this project with different programs). You should be reasonably familiar with C.

Cyclone. Cyclone is a type-safe dialect of the C programming language, affording much of the control of C, but lacking a wide array of its security and reliability weaknesses (http://cyclone.thelanguage.org). One posssible project is the following:

Cyclone provides language support for a variety of pointer types to afford a wide degree of programmer control. One important annotation is @zeroterm: this annotation states the pointer is to a zero-terminated buffer, which is important if the pointer is to be passed to a C routine that expects it. The @zeroterm implementation has been recently updated to be more intuitive to programmers, we hope (this is not in the main trunk, so you'll have to acquire it from me directly). The goal of this project would be to port some programs to Cyclone, and to evaluate the effectiveness of @zeroterm annotations in terms of flexibility (how easy was it to retrofit existing programs to use this annotation?) and in terms of efficiency (how much additional overhead was due to the annotation?).

Skills required: This can be a one-person project. Basically you would need to learn to use Cyclone well enough to port one or two programs, and then would need to evaluate the process: how many lines of code did you change, and how much additional overhead was there, in terms of space and time. We have some benchmarking infrastructure for this purpose that you could use.

Type Qualifiers. There are many properties of programs that are not checked by standard tools such as compilers. As we discussed in class, type qualifiers are a lightweight mechanism for checking additional program properties. We have used type qualifiers to check and infer properties of C programs using CQual, a tool that adds type qualifiers to C. We've performed const inference, Y2K bug detection, and looked for format-string vulnerabilities in applications, deadlocks in the Linux kernel, and user/kernel-space trust errors in the linux kernel, among other applications. Currently Dave Greefieldboyce is working on JQual, which adds type qualifiers to Java.

Two possible projects:

JQual is a tool that adds type qualifiers to Java. So far, we've looked at to applications of JQual: checking that JNI code does not pass forged integers to C code that is expecting pointers to the C heap; and inferring the "readonly" qualifier in Java code. There are undoubtedly many more applications. The goal of this project is to explore the domain of qualifiers for Strings in Java. For example, it might be useful to distinguish strings that represent UTF8 encoded strings; from strings that are SQL queries; from strings that represent HTTP header information; and so on. The goal would be to take 1-5 large web applications, identify all the Strings in them, come up with a set of qualifiers describing the Strings; and use JQual to look for any inconsistent uses of the strings. (Contact Jeff Foster.)
Type qualifiers are quite similar to ideas in information flow analysis, which aims to prove that a program that manipulates sensitive data satisfies a property of non-interference. Put simply: the high security (private) data in the program should not be visible on a public channel. CQual tracks the dataflow of qualified values throughout the program, which is necessary but not sufficient for proving non-interference. The project would be to extend CQual's implementation (or perhaps that of Oink, a related tool that also works for C++ and is possibly more modular) to reason about implicit flows as well.
Skills required: You will also need to understand information flow analysis. The survey by Sabelfeld and Myers is a good start. The project could either be to modify CQual's code base, or to use a source-to-source transformation from the original program into one to be passed to CQual as is. This could be a one or two-person project.

UNO - Uniqueness and Ownership Inference for Java. Understanding and controlling aliasing is a fundamental part of building robusy software systems. Two important aliasing properties that have emerged in recent years are uniqueness and ownership. Unique objects are those referred to by only one pointer, and owned objects are encapsulated inside their owner, and hence cannot be directly accessed by other components.

UNO is a tool that, given a regular Java program, infers uniqueness and ownership of methods. A method has a unique return value if the caller of the method can safely assume they have the only pointer to the value returned by the method. A method or constructor owns its ith argument if after a call o.m(..., xi, ...), the object o has the only pointers to xi. You can read more about UNO here.

Here are two project ideas (Contact: Dr. Jeff Foster):

We have developed a very concise description of UNO as a set of mutually recursive predicates about the program. While we believe that UNO is correct, we do not know for sure. We would like to see a formal proof, either on paper or in a mechanical theorem proving system, that UNO is sound.
Skills required: You should be comfortable doing progress and preservation proofs, and interested in extending that approach to handle UNO. We have made some limited progress on the proof already, so you would have a head start. If you're interested in trying to use an automatic theorem prover to assist you with the proof, then I recommend working as a pair on this project. Martin Ma and myself will be able to help out. For the course project, you would not need to prove that the whole system is sound---starting with a small component, such as uniqueness, would be sufficient, and interesting on its own.
UNO currently provides two bits of information for methods: Whether they are unique, and whether they own their arguments. However, the way UNO is designed, it would be very easy to find methods that are almost unique, or almost own their arguments. Specifically, uniqueness is a predicate defined as a conjunction of other predicates, and instead of just deciding that uniqueness doesn't hold if one element of the conjuction doesn't, we could see how close we are to it holding. In this project, you would use Martin's implementation of UNO, figure out how it would make sense to modify it to detect near uniqueness and ownership, and run some experiments to see if this idea makes sense.
Skills required: You should be comfortable programming in Java. Martin Ma will be available to help you understand his code.

Pistachio is a tool that checks network protocol implementations against a specification using symbolic evaluation and theorem proving. You can read more about Pistachio here. Although as currently described Pistachio is aimed at network protocols, there many other things we believe Pistachio could be used for. Here are some project ideas (Contact: Dr. Jeff Foster):

One possible project would be to use Pistachio to check implementations of security primitives such as crypto algorithms. The idea would be to start from a textbook description of a crypto system, write rules that describe how the system should work, and then use Pistachio to check than an implementation matches the rules. For example, you could find an implementation of Kerberos and see if it matches the specification.
Skills required: Pistachio is written in OCaml, and you might need to do a little hacking on it. Knowledge of cryptography is not required, but couldn't hurt. Pistachio analyzes C code, so you should be comfortable looking at and understanding C. This could be a one- or two-person project.
Pistachio issues warnings to the user when it finds a potential violation of a network protocol specification. However, it is then up to the user to decide how to fix the problem, and exactly what went wrong. We would like to develop ways to automatically explain and propose fixes for problems that Pistachio detects. We believe that techniques from the AI literature could be used for this. In particular, there are AI techniques that, given an open world and a set of facts we would like to deduce, will come up with plausible explanations that would allow us to deduce the facts.
Skills required: If you're interested in doing this project, you'll probably want to work closely with Octavian Udrea, the developer of Pistachio, who has some detailed ideas about solving this problem. Of the three Pistachio projects, this one has the most potential upside, in that this could result in a very strong result.
Creating a better protocol specification language. So far we've used Pistachio to check implementations of two protocols, and the specification language seems pretty good---we were able to describe most of the protocols we we looked at, in a relatively short amount of time. However, that's only a modest amount of evidence that our protocol specification language is any good, and we don't know if anyone else can use it. The goal of this project is to come up with a better, more complete, and more software engineering-oriented specification language for protocols. The first step is to take an existing protocol, one that we haven't looked at with Pistachio, come up with a specification, and then try applying Pistachio to an implementation. Just doing that would actually be quite useful. The next step is to figure out all the problems. What are the basic components necessary for describing a protocol, and does Pistachio let you express them all? Is Pistachio's notation for protocols convenient and easy to use? What parts of protocol specifications are not easy or not possible to encode in Pistachio right now? Are there ideas such as abstraction, modularity, and other good software engineering practices that should be part of protocols? As part of this project, it would also be useful to do a brief survey of other specification languages for network protocols. There are many possible directions to go in: Coming up with a better language, and then making a "compiler" from the better language to Pistachio; modifying Pistachio to check the new specifications (which might be hard); or building a dynamic tool that, at run time, checks a protocol implementation against the better specification; building a tool to auto-generate protocol implementations from specifications.

Static analysis by program transformation. To make static analysis more agile and accessible, it is prudent to develop general tool support for analyzing programs. This support can then be built upon to build custom analyses. In class we discussed various styles of analysis, including "flow-sensitive" and "context-sensitive" analysis, where the form considers the effects of the program in control flow order (e.g., data flow analysis) and the latter distinguishes different call sites to the same function (e.g., in the work on type qualifiers, or parametric polymorphism for ML). We have also briefly discussed "field-sensitivity" (each field in a struct is treated distinctly from each other field, and the same fields of different objects are treated distinctly).

The goal of this series of projects is to be able to augment a simple static analysis to be more precise by transforming the input program. That is, rather than defining a sophisticated static analysis, which can be quite time consuming, we instead transform the program in a semantics-preserving manner that will cause the analysis to be more precise. Here are some possible transformations one could implement:

Function pointers. Generating a call graph for a program with function pointers can be challenging, since the possible values of the function pointer at a given program point must be known. One useful transformation would be to analyze the program to determine the possible values a function pointer could take on, and then change the call via the function pointer to switch on a scalar value instead. For example, in the following program
```
void (*f)() = NULL;
void g() { ... }
void h() { ... }
void foo(int x) {
  if (x)
    f = g;
  else
    f = h;
  f();
}  
```
We could transform this program so that each function pointer variable is changed to an integer instead and each function that could be assigned to that variable is given a number. Then we call the possible choices directly, switching on the integer:
```
int f = 0;
void g() { ... }
void h() { ... }
void foo(int x) {
  if (x)
    f = 1;
  else
    f = 2;
  if (f == 1) then
    g();
  else if (f == 2) then
    h();
}  
```
To determine possible targets, you simply need to analyze the program to find all functions that are assigned to any function pointer, and then give an integer to each. Alternately, you could employ a points-to analysis to reduce the space of candidates.
Context sensitivity. Rather than implement parametric polymorphism as you did in project 2, you could transform the program to duplicate the definitions of all functions called more than once in the program, and change all the callsites to be to different versions. For example
```
int id(int x) { return x };
void foo() {
  id(8);
  id(11);
}
```
would be transformed to be
```
int id_1(int x) { return x };
int id_2(int x) { return x };
void foo() {
  id_1(8);
  id_2(11);
}
```
This will obviously not work completely for recursive functions; you have to "tie the knot" at some point to stop the expansion.
This transformation will, almost certainly, blow up any reasonably-sized program. However, this technique is very easy to implement selectively; i.e., by treating only a few functions context-sensitively. Your transformation should allow such selective treatment.
Field sensitivity. A typical treatment of fields in record types is field insensitive: each field is "conflated" so that assignments and reads from a particular field are attributed to the entire structure. In other words, if you had the program
```
struct T { int x; int y };
void foo() {
  struct T f1;
  f1.x = 1;
  f1.y = 2;
}  
```
A field-insensitive analysis would think that f1.x could be assigned to either 1 or 2. One way to introduce field-sensitivity is to expand the fields by removing the struct and replacing it with variables:
```
void foo() {
  int f1_x;
  int f1_y;
  f1_x = 1;
  f1_y = 2;
}  
```
Assignments between structs necessitate generating code that assigns between the struct's fields (at least those that are actually used within a given function). One challenge here is in dealing with recursive types. Once again, this expansion can be selective.
Flow sensitivity. We observed that programs like CQual do not treat statements flow sensitively. One way to encode this is to convert the program into static single assignment (SSA) form first. IN this form, all variables are assigned to exactly once, and so they can be distinguished. There are well-understood techniques for implementing SSA.
Polymorphism. Oftentimes, C programmers use void* to implement polymorphism, as in
```
struct list {
  void *data;
  struct list *next;
};
```
Such uses of polymorphism, which often confuse an analysis, can be eliminated by expanding out the definition, as in
```
struct list_int {
  int data;
  struct list_int *next;
}
```

Skills required: This project could be implemented as modules for CIL, which is a C parser written in OCaml. Familiarity with C and OCaml are useful. Several people could work together to create different CIL modules.

Proof assistants for soundness proofs

We've developed a system for Contextual Effects which model what happens before, during, and after each expression in the program. We have a proof on paper that our system is sound, but it would be much better to have a mechanically verified proof. For this project, take our contextual effect system, our paper proof, and prove that it is correct using Coq, most likely starting from the paper by Aydemir et al on mechanizing metatheory.

Skills required: You should be comfortable with programming langauge soundness proofs and functional programming (Coq builds on OCaml).

Other Ideas for Projects

Replicate Gulwani's implementation for program verification via machine learning for a simple statement langauge.
Take a look (on the ACM digital library, for example; full access is available from machines in the umd.edu domain) at the last year or two's worth of proceedings of PLDI (Programming Langauge Design and Implmentation) and POPL (Principles of Programming Languages). You may also want to look at the last couple of years of the Static Analysis Symposium (published in Lecture Notes in Computer Science by Springer (http://www.springerlink.com for electronic versions)). You may also be able to find some of these in the library, or you could certainly stop by my office.
Take a look at web pages for the Open Source Quality project at Berkeley, the Softare Productivity Tools group at Microsoft, and the Program Analysis, Transformation, and Visualization group at IBM.

Here is a list of papers about interesting tools or techniques; we may/will cover some of these in the rest of the class, but clearly we will not have time to talk about all of them. You may want to read some to get ideas for projects.

Liblit et al, Scalable Statistical Bug Isolation
Herlihy, The Transactional Manifesto: Software Engineering and Non-Blocking Synchronization
Godefroid, Klarlund, and Sen, DART: Directed Automated Random Testing
Chin, Markstrum, and Millstein, Semantic Type Qualifiers
Mandelin et al, Jungloid Mining: Helping to Navigate the API Jungle
Fisher, Mandelbaum, and Walker, The Next 700 Data Description Languages
Leroy, Formal Certification of a Compiler Back-End or: Programming a Compiler with a Proof Assistsant
Bishop et al, Engineering with logic: HOL specification and symbolic-evaluation testing for TCP implementations
Boyapati, Liskov, and Shrira, Ownership Types for Object Enacpsulation
Ernst, Cockrell, Griswold, and Notkin, Dynamically Discovering Likely Program Invariants to Support Program Evolution
Sagiv, Reps, and Wilhelm, Parametric Shape Analysis via 3-Valued Logic
Xi and Pfenning, Dependent Types in Practical Programming
Ball, Rajamani, Automatically Validating Temporal Safety Properties of Interfaces
Moller and Schwartzbach, The Pointer Assertion Logic Engine
Corbett, Dwyer, Hatcliff, Laubach, Pasareanu, Robby, and Zheng, Bandera: Extracting Finite-State Models from Java Source Code
Condit, Harren, McPeak, Necula, and Weimer, CCured in the Real World
Strom and Yemini, Typestate: A Programming Language Concept for Enhancing Software Reliability
Crew, ASTLOG: A Language for Examining Abstract Syntax Trees
DeLine and Fanhdrich, Enforcing High-Level Protocols in Low-Level Software
Heintze and Tardieu, Ultra-Fast Aliasing using CLA: A Million Lines of C Code in a Second
Bush, Pincus, and Sielaff, A static analyzer for finding dynamic programming errors
Choi, Lee, Loginov, O'Callahan, Sarkar, and Sridharan, Efficient and Precise Datarace Detection for Multithreaded Object-Oriented Programs
Savage, Burrows, Nelson, Sobalvarro, and Anderson, Eraser: A Dyanmic Data Race Detector for Multi-Threaded Programs
Flanagan and Freund, Type-Based Race Detection for Java
Guyer and Lin, Client-Driven Pointer Analysis
Hovemeyer and Pugh, Finding Bugs is Easy
O'Callahan and Jackson, Lackwit: A Program Understanding Tool Based on Type Inference (see also Ajax)
Evans, Static Detection of Dynamic Memory Errors
Polyspace, Abstract Inerpretation
Flanagan, Leino, Lillibridge, Nelson, Saxe, and Stata, Extended Static Checking for Java

Resource for Projects

Here's a page that lists a whole bunch of software tools that are helpful in doing programming languages research. (If you have suggestions for more stuff to add, please let me know!)

Web Accessibility