Project Suggestions

Contact Dr. Michael Hicks for more information about the projects, unless otherwise noted in the project description. Here is a summarized list, with details below: There are suggestions for other projects at the bottom of this page.
Dynamic Software Updating. Software systems are imperfect, so software updates are a fact of life. While typical software updates require stopping and restarting the program in question, many systems cannot afford to halt service, or would prefer not to. Dynamic software updating (DSU) addresses this difficulty by permitting programs to be updated while they run. We have developed a DSU compiler for C programs that makes programs updateable, and so far we have had good experience using it (http://www.cs.umd.edu/projects/dsu/). There are two project-sized ideas that could be explored here:
Cyclone. Cyclone is a type-safe dialect of the C programming language, affording much of the control of C, but lacking a wide array of its security and reliability weaknesses (http://cyclone.thelanguage.org). One posssible project is the following:

Cyclone provides language support for a variety of pointer types to afford a wide degree of programmer control. One important annotation is @zeroterm: this annotation states the pointer is to a zero-terminated buffer, which is important if the pointer is to be passed to a C routine that expects it. The @zeroterm implementation has been recently updated to be more intuitive to programmers, we hope (this is not in the main trunk, so you'll have to acquire it from me directly). The goal of this project would be to port some programs to Cyclone, and to evaluate the effectiveness of @zeroterm annotations in terms of flexibility (how easy was it to retrofit existing programs to use this annotation?) and in terms of efficiency (how much additional overhead was due to the annotation?).

Skills required: This can be a one-person project. Basically you would need to learn to use Cyclone well enough to port one or two programs, and then would need to evaluate the process: how many lines of code did you change, and how much additional overhead was there, in terms of space and time. We have some benchmarking infrastructure for this purpose that you could use.


Type Qualifiers. There are many properties of programs that are not checked by standard tools such as compilers. As we discussed in class, type qualifiers are a lightweight mechanism for checking additional program properties. We have used type qualifiers to check and infer properties of C programs using CQual, a tool that adds type qualifiers to C. We've performed const inference, Y2K bug detection, and looked for format-string vulnerabilities in applications, deadlocks in the Linux kernel, and user/kernel-space trust errors in the linux kernel, among other applications. Currently Dave Greefieldboyce is working on JQual, which adds type qualifiers to Java.

Two possible projects:


UNO - Uniqueness and Ownership Inference for Java. Understanding and controlling aliasing is a fundamental part of building robusy software systems. Two important aliasing properties that have emerged in recent years are uniqueness and ownership. Unique objects are those referred to by only one pointer, and owned objects are encapsulated inside their owner, and hence cannot be directly accessed by other components.

UNO is a tool that, given a regular Java program, infers uniqueness and ownership of methods. A method has a unique return value if the caller of the method can safely assume they have the only pointer to the value returned by the method. A method or constructor owns its ith argument if after a call o.m(..., xi, ...), the object o has the only pointers to xi. You can read more about UNO here.

Here are two project ideas (Contact: Dr. Jeff Foster):


Pistachio is a tool that checks network protocol implementations against a specification using symbolic evaluation and theorem proving. You can read more about Pistachio here. Although as currently described Pistachio is aimed at network protocols, there many other things we believe Pistachio could be used for. Here are some project ideas (Contact: Dr. Jeff Foster):
Static analysis by program transformation. To make static analysis more agile and accessible, it is prudent to develop general tool support for analyzing programs. This support can then be built upon to build custom analyses. In class we discussed various styles of analysis, including "flow-sensitive" and "context-sensitive" analysis, where the form considers the effects of the program in control flow order (e.g., data flow analysis) and the latter distinguishes different call sites to the same function (e.g., in the work on type qualifiers, or parametric polymorphism for ML). We have also briefly discussed "field-sensitivity" (each field in a struct is treated distinctly from each other field, and the same fields of different objects are treated distinctly).

The goal of this series of projects is to be able to augment a simple static analysis to be more precise by transforming the input program. That is, rather than defining a sophisticated static analysis, which can be quite time consuming, we instead transform the program in a semantics-preserving manner that will cause the analysis to be more precise. Here are some possible transformations one could implement:

  1. Function pointers. Generating a call graph for a program with function pointers can be challenging, since the possible values of the function pointer at a given program point must be known. One useful transformation would be to analyze the program to determine the possible values a function pointer could take on, and then change the call via the function pointer to switch on a scalar value instead. For example, in the following program
    void (*f)() = NULL;
    void g() { ... }
    void h() { ... }
    void foo(int x) {
      if (x)
        f = g;
      else
        f = h;
      f();
    }  
    
    We could transform this program so that each function pointer variable is changed to an integer instead and each function that could be assigned to that variable is given a number. Then we call the possible choices directly, switching on the integer:
    int f = 0;
    void g() { ... }
    void h() { ... }
    void foo(int x) {
      if (x)
        f = 1;
      else
        f = 2;
      if (f == 1) then
        g();
      else if (f == 2) then
        h();
    }  
    
    To determine possible targets, you simply need to analyze the program to find all functions that are assigned to any function pointer, and then give an integer to each. Alternately, you could employ a points-to analysis to reduce the space of candidates.
  2. Context sensitivity. Rather than implement parametric polymorphism as you did in project 2, you could transform the program to duplicate the definitions of all functions called more than once in the program, and change all the callsites to be to different versions. For example
    int id(int x) { return x };
    void foo() {
      id(8);
      id(11);
    }
    
    would be transformed to be
    int id_1(int x) { return x };
    int id_2(int x) { return x };
    void foo() {
      id_1(8);
      id_2(11);
    }
    
    This will obviously not work completely for recursive functions; you have to "tie the knot" at some point to stop the expansion.

    This transformation will, almost certainly, blow up any reasonably-sized program. However, this technique is very easy to implement selectively; i.e., by treating only a few functions context-sensitively. Your transformation should allow such selective treatment.

  3. Field sensitivity. A typical treatment of fields in record types is field insensitive: each field is "conflated" so that assignments and reads from a particular field are attributed to the entire structure. In other words, if you had the program
    struct T { int x; int y };
    void foo() {
      struct T f1;
      f1.x = 1;
      f1.y = 2;
    }  
    
    A field-insensitive analysis would think that f1.x could be assigned to either 1 or 2. One way to introduce field-sensitivity is to expand the fields by removing the struct and replacing it with variables:
    void foo() {
      int f1_x;
      int f1_y;
      f1_x = 1;
      f1_y = 2;
    }  
    
    Assignments between structs necessitate generating code that assigns between the struct's fields (at least those that are actually used within a given function). One challenge here is in dealing with recursive types. Once again, this expansion can be selective.
  4. Flow sensitivity. We observed that programs like CQual do not treat statements flow sensitively. One way to encode this is to convert the program into static single assignment (SSA) form first. IN this form, all variables are assigned to exactly once, and so they can be distinguished. There are well-understood techniques for implementing SSA.
  5. Polymorphism. Oftentimes, C programmers use void* to implement polymorphism, as in
    struct list {
      void *data;
      struct list *next;
    };
    
    Such uses of polymorphism, which often confuse an analysis, can be eliminated by expanding out the definition, as in
    struct list_int {
      int data;
      struct list_int *next;
    }
    

Skills required: This project could be implemented as modules for CIL, which is a C parser written in OCaml. Familiarity with C and OCaml are useful. Several people could work together to create different CIL modules.


Proof assistants for soundness proofs

We've developed a system for Contextual Effects which model what happens before, during, and after each expression in the program. We have a proof on paper that our system is sound, but it would be much better to have a mechanically verified proof. For this project, take our contextual effect system, our paper proof, and prove that it is correct using Coq, most likely starting from the paper by Aydemir et al on mechanizing metatheory.

Skills required: You should be comfortable with programming langauge soundness proofs and functional programming (Coq builds on OCaml).


Other Ideas for Projects

Here is a list of papers about interesting tools or techniques; we may/will cover some of these in the rest of the class, but clearly we will not have time to talk about all of them. You may want to read some to get ideas for projects.

Resource for Projects

Here's a page that lists a whole bunch of software tools that are helpful in doing programming languages research. (If you have suggestions for more stuff to add, please let me know!)