Project Suggestions
Contact Dr. Michael Hicks for more information about the projects,
unless otherwise noted in the project description.
Here is a summarized list, with details below:
There are suggestions for other projects at the
bottom of this page.
Dynamic Software Updating. Software
systems are imperfect, so
software updates are a fact of life. While typical software updates
require stopping and restarting the program in question, many systems
cannot afford to halt service, or would prefer not to. Dynamic
software updating (DSU) addresses this difficulty by permitting
programs to be updated while they run. We have developed a DSU
compiler for C programs that makes programs updateable, and so far we
have had good experience using it (http://www.cs.umd.edu/projects/dsu/).
There are two project-sized ideas that could be explored here:
- The compiler introduces a number of performance overheads to
support DSU, notably various levels of indirection. We believe a
number of these overheads could be eliminated through novel compiler
optimizations along the lines of data flow analysis. Supporting these
would basically be adding a phase or two to our multi-phase compiler,
which uses the CIL analysis framework (written in OCaml).
Skills required: I would suggest this for a two-person
project. Projects 1 and 3 should get you in pretty good shape to do
this project. You should also read Practical
Dynamic Software Updating for C for background on Ginseng. Iulian
Neamtiu will be able to provide support and assistance on this
project.
- So far we have experimented with using Ginseng on three open source
servers. A useful project would be to apply it to some other programs, too,
to determine problem areas for future research (if any!). One idea would be
to try to construct dynamic updates for the game Xpilot. Basically you would build the game
so that it could run on Linux, and then develop patches that correspond to
different releases, attempting to apply those patches on the fly.
Skills required: I think this would make a good one-person project
(but several people could be doing this project with different programs).
You should be reasonably familiar with C.
Cyclone. Cyclone is a type-safe dialect of the C programming
language, affording much of the control of C, but lacking a wide array
of its security and reliability weaknesses (http://cyclone.thelanguage.org).
One posssible project is the following:
Cyclone provides language support for a variety of pointer types to
afford a wide degree of programmer control. One important annotation
is @zeroterm: this annotation states the pointer is to a
zero-terminated buffer, which is important if the pointer is to be
passed to a C routine that expects it. The @zeroterm
implementation has been recently updated to be more intuitive to
programmers, we hope (this is not in the main trunk, so you'll have to
acquire it from me directly). The goal of this project would be to
port some programs to Cyclone, and to evaluate the effectiveness of
@zeroterm annotations in terms of flexibility (how easy was
it to retrofit existing programs to use this annotation?) and in terms
of efficiency (how much additional overhead was due to the
annotation?).
Skills required: This can be a one-person project. Basically
you would need to learn to use Cyclone well enough to port one or two
programs, and then would need to evaluate the process: how many lines
of code did you change, and how much additional overhead was there, in
terms of space and time. We have some benchmarking infrastructure for
this purpose that you could use.
Type Qualifiers. There are many
properties of programs that are not checked by standard tools such as
compilers. As we discussed in class, type qualifiers are a
lightweight mechanism for checking additional program properties. We
have used type qualifiers to check and infer properties of C programs
using CQual, a
tool that adds type qualifiers to C. We've performed const inference,
Y2K bug detection, and looked for format-string vulnerabilities in
applications, deadlocks in the Linux kernel, and user/kernel-space
trust errors in the linux kernel, among other applications. Currently
Dave Greefieldboyce is working on JQual, which adds type qualifiers to
Java.
Two possible projects:
- JQual is a tool that adds type qualifiers to Java. So far, we've looked
at to applications of JQual: checking that JNI code does not pass forged
integers to C code that is expecting pointers to the C heap; and inferring
the "readonly" qualifier in Java code. There are undoubtedly many more
applications. The goal of this project is to explore the domain of
qualifiers for Strings in Java. For example, it might be useful to
distinguish strings that represent UTF8 encoded strings; from strings that
are SQL queries; from strings that represent HTTP header information; and so
on. The goal would be to take 1-5 large web applications, identify all the
Strings in them, come up with a set of qualifiers describing the Strings;
and use JQual to look for any inconsistent uses of the strings. (Contact
Jeff Foster.)
- Type qualifiers are quite similar to ideas in information flow
analysis, which aims to prove that a program that manipulates
sensitive data satisfies a property of non-interference. Put
simply: the high security (private) data in the program should not be
visible on a public channel. CQual tracks the dataflow of qualified
values throughout the program, which is necessary but not sufficient
for proving non-interference. The project would be to extend CQual's
implementation (or perhaps that of Oink, a related tool that also
works for C++ and is possibly more modular) to reason about implicit
flows as well.
Skills required: You will also need to understand information
flow analysis. The survey
by Sabelfeld and Myers is a good start. The project could either be
to modify CQual's code base, or to use a source-to-source
transformation from the original program into one to be passed to
CQual as is. This could be a one or two-person project.
UNO - Uniqueness and Ownership Inference for Java.
Understanding and controlling aliasing is a fundamental part of
building robusy software systems. Two important aliasing properties
that have emerged in recent years are uniqueness and ownership.
Unique objects are those referred to by only one pointer, and
owned objects are encapsulated inside their owner, and hence
cannot be directly accessed by other components.
UNO is a tool that,
given a regular Java program, infers uniqueness and ownership of
methods. A method has a unique return value if the caller of the
method can safely assume they have the only pointer to the value
returned by the method. A method or constructor owns its ith
argument if after a call o.m(..., xi, ...), the object o
has the only pointers to xi. You can read more about UNO here.
Here are two project ideas (Contact: Dr. Jeff Foster):
- We have developed a very concise
description of UNO as a set of mutually recursive predicates about the
program. While we believe that UNO is correct, we do not know for
sure. We would like to see a formal proof, either on paper or in a
mechanical theorem proving system, that UNO is sound.
Skills required: You should be comfortable doing progress and
preservation proofs, and interested in extending that approach to
handle UNO. We have made some limited progress on the proof already,
so you would have a head start. If you're interested in trying to use
an automatic theorem prover to assist you with the proof, then I
recommend working as a pair on this project. Martin Ma and myself
will be able to help out. For the course project, you would not need
to prove that the whole system is sound---starting with a small
component, such as uniqueness, would be sufficient, and interesting on
its own.
- UNO currently provides two bits
of information for methods: Whether they are unique, and whether they
own their arguments. However, the way UNO is designed, it would be
very easy to find methods that are almost unique, or
almost own their arguments. Specifically, uniqueness is a
predicate defined as a conjunction of other predicates, and instead of
just deciding that uniqueness doesn't hold if one element of the
conjuction doesn't, we could see how close we are to it holding. In
this project, you would use Martin's implementation of UNO, figure out
how it would make sense to modify it to detect near uniqueness and
ownership, and run some experiments to see if this idea makes sense.
Skills required: You should be comfortable programming in
Java. Martin Ma will be available to help you understand his code.
Pistachio is a tool that checks network protocol
implementations against a specification using symbolic evaluation and
theorem proving. You can read more about Pistachio here.
Although as currently described Pistachio is aimed at network
protocols, there many other things we believe Pistachio could be used
for. Here are some project ideas (Contact: Dr. Jeff Foster):
- One possible project would be to
use Pistachio to check
implementations of security primitives such as crypto algorithms. The
idea would be to start from a textbook description of a crypto system,
write rules that describe how the system should work, and then use
Pistachio to check than an implementation matches the rules. For
example, you could find an implementation of Kerberos and see if it
matches the specification.
Skills required: Pistachio is written in OCaml, and you might
need to do a little hacking on it. Knowledge of cryptography is not
required, but couldn't hurt. Pistachio analyzes C code, so you should
be comfortable looking at and understanding C. This could be a one-
or two-person project.
- Pistachio issues warnings to the user when it finds a potential
violation of a network protocol specification. However, it is then up
to the user to decide how to fix the problem, and exactly what went
wrong. We would like to develop ways to automatically explain and
propose fixes for problems that Pistachio detects. We believe that
techniques from the AI literature could be used for this. In
particular, there are AI techniques that, given an open world and a
set of facts we would like to deduce, will come up with plausible
explanations that would allow us to deduce the facts.
Skills required: If you're interested in doing this project,
you'll probably want to work closely with Octavian Udrea, the
developer of Pistachio, who has some detailed ideas about solving this
problem. Of the three Pistachio projects, this one has the most
potential upside, in that this could result in a very strong result.
- Creating a better protocol specification language. So far we've used
Pistachio to check implementations of two protocols, and the specification
language seems pretty good---we were able to describe most of the protocols
we we looked at, in a relatively short amount of time. However, that's only
a modest amount of evidence that our protocol specification language is any
good, and we don't know if anyone else can use it. The goal of this project
is to come up with a better, more complete, and more software
engineering-oriented specification language for protocols. The first step
is to take an existing protocol, one that we haven't looked at with
Pistachio, come up with a specification, and then try applying Pistachio to
an implementation. Just doing that would actually be quite useful. The
next step is to figure out all the problems. What are the basic components
necessary for describing a protocol, and does Pistachio let you express them
all? Is Pistachio's notation for protocols convenient and easy to use?
What parts of protocol specifications are not easy or not possible to encode
in Pistachio right now? Are there ideas such as abstraction, modularity,
and other good software engineering practices that should be part of
protocols? As part of this project, it would also be useful to do a brief
survey of other specification languages for network protocols. There are
many possible directions to go in: Coming up with a better language, and
then making a "compiler" from the better language to Pistachio; modifying
Pistachio to check the new specifications (which might be hard); or building
a dynamic tool that, at run time, checks a protocol implementation against
the better specification; building a tool to auto-generate protocol
implementations from specifications.
Static analysis by program transformation. To
make static analysis more agile and accessible, it is prudent to develop
general tool support for analyzing programs. This support can then be built
upon to build custom analyses. In class we discussed various styles of
analysis, including "flow-sensitive" and "context-sensitive" analysis, where
the form considers the effects of the program in control flow order (e.g.,
data flow analysis) and the latter distinguishes different call sites to the
same function (e.g., in the work on type qualifiers, or parametric
polymorphism for ML). We have also briefly discussed "field-sensitivity"
(each field in a struct is treated distinctly from each other field, and the
same fields of different objects are treated distinctly).
The goal of this series of projects is to be able to augment a simple
static analysis to be more precise by transforming the input program. That
is, rather than defining a sophisticated static analysis, which can be quite
time consuming, we instead transform the program in a semantics-preserving
manner that will cause the analysis to be more precise. Here are some
possible transformations one could implement:
- Function pointers. Generating a call graph for a program with
function pointers can be challenging, since the possible values of the
function pointer at a given program point must be known. One useful
transformation would be to analyze the program to determine the possible
values a function pointer could take on, and then change the call via the
function pointer to switch on a scalar value instead. For example, in the
following program
void (*f)() = NULL;
void g() { ... }
void h() { ... }
void foo(int x) {
if (x)
f = g;
else
f = h;
f();
}
We could transform this program so that each function pointer variable is
changed to an integer instead and each function that could be assigned to
that variable is given a number. Then we call the possible choices
directly, switching on the integer:
int f = 0;
void g() { ... }
void h() { ... }
void foo(int x) {
if (x)
f = 1;
else
f = 2;
if (f == 1) then
g();
else if (f == 2) then
h();
}
To determine possible targets, you simply need to analyze the program to
find all functions that are assigned to any function pointer, and then give
an integer to each. Alternately, you could employ a points-to analysis to
reduce the space of candidates.
- Context sensitivity. Rather than implement parametric
polymorphism as you did in project 2, you could transform the program to
duplicate the definitions of all functions called more than once in the
program, and change all the callsites to be to different versions. For
example
int id(int x) { return x };
void foo() {
id(8);
id(11);
}
would be transformed to be
int id_1(int x) { return x };
int id_2(int x) { return x };
void foo() {
id_1(8);
id_2(11);
}
This will obviously not work completely for recursive functions; you have to
"tie the knot" at some point to stop the expansion.
This transformation will, almost certainly, blow up any reasonably-sized
program. However, this technique is very easy to implement selectively;
i.e., by treating only a few functions context-sensitively. Your
transformation should allow such selective treatment.
- Field sensitivity. A typical treatment of fields in record types
is field insensitive: each field is "conflated" so that assignments
and reads from a particular field are attributed to the entire structure.
In other words, if you had the program
struct T { int x; int y };
void foo() {
struct T f1;
f1.x = 1;
f1.y = 2;
}
A field-insensitive analysis would think that f1.x could be
assigned to either 1 or 2. One way to introduce field-sensitivity is to
expand the fields by removing the struct and replacing it with variables:
void foo() {
int f1_x;
int f1_y;
f1_x = 1;
f1_y = 2;
}
Assignments between structs necessitate generating code that assigns between
the struct's fields (at least those that are actually used within a given
function). One challenge here is in dealing with recursive types. Once
again, this expansion can be selective.
- Flow sensitivity. We observed that programs like CQual do not
treat statements flow sensitively. One way to encode this is to convert the
program into static single assignment (SSA) form first. IN this
form, all variables are assigned to exactly once, and so they can be
distinguished. There are well-understood techniques for implementing SSA.
- Polymorphism. Oftentimes, C programmers use void* to
implement polymorphism, as in
struct list {
void *data;
struct list *next;
};
Such uses of polymorphism, which often confuse an analysis, can be
eliminated by expanding out the definition, as in
struct list_int {
int data;
struct list_int *next;
}
Skills required: This project could be implemented as modules for CIL, which is a C parser
written in OCaml. Familiarity with C and OCaml are useful. Several people
could work together to create different CIL modules.
Proof assistants for soundness proofs
We've developed a system for Contextual
Effects which model what happens before, during, and after each
expression in the program. We have a proof on paper that our system is
sound, but it would be much better to have a mechanically verified proof.
For this project, take our contextual effect system, our paper proof, and
prove that it is correct using Coq, most likely starting from the paper by
Aydemir et
al on mechanizing metatheory.
Skills required: You should be comfortable with programming
langauge soundness proofs and functional programming (Coq builds on OCaml).
Other Ideas for Projects
- Replicate Gulwani's
implementation for program verification via machine learning for a
simple statement langauge.
- Take a look (on the ACM digital library, for example; full access
is available from machines in the umd.edu domain) at the last year or
two's worth of proceedings of PLDI (Programming Langauge Design and
Implmentation) and POPL (Principles of Programming Languages). You
may also want to look at the last couple of years of the Static
Analysis Symposium (published in Lecture Notes in Computer Science by
Springer (http://www.springerlink.com for electronic versions)). You
may also be able to find some of these in the library, or you could
certainly stop by my office.
- Take a look at web pages for the Open Source Quality project at
Berkeley, the Softare
Productivity Tools group at Microsoft, and the Program Analysis,
Transformation, and Visualization group at IBM.
Here is a list of papers about interesting tools or techniques; we
may/will cover some of these in the rest of the class, but clearly we
will not have time to talk about all of them. You may want to read
some to get ideas for projects.
- Liblit et al, Scalable Statistical
Bug Isolation
- Herlihy, The
Transactional Manifesto: Software Engineering and Non-Blocking Synchronization
- Godefroid, Klarlund, and Sen, DART: Directed
Automated Random Testing
- Chin, Markstrum, and Millstein, Semantic Type Qualifiers
- Mandelin et al, Jungloid Mining:
Helping to Navigate the API Jungle
- Fisher, Mandelbaum, and Walker, The Next 700 Data
Description Languages
- Leroy, Formal
Certification of a Compiler Back-End or: Programming a Compiler with a
Proof Assistsant
- Bishop et al, Engineering with
logic: HOL specification and symbolic-evaluation testing for TCP
implementations
- Boyapati, Liskov, and Shrira, Ownership Types for Object Enacpsulation
- Ernst, Cockrell, Griswold, and Notkin, Dynamically Discovering Likely Program Invariants to Support Program Evolution
- Sagiv, Reps, and Wilhelm, Parametric Shape Analysis via 3-Valued Logic
- Xi and Pfenning, Dependent Types in Practical Programming
- Ball, Rajamani, Automatically Validating Temporal Safety Properties of Interfaces
- Moller and Schwartzbach, The Pointer Assertion Logic Engine
- Corbett, Dwyer, Hatcliff, Laubach, Pasareanu, Robby, and Zheng, Bandera: Extracting Finite-State Models from Java Source Code
- Condit, Harren, McPeak, Necula, and Weimer, CCured in the Real World
- Strom and Yemini, Typestate: A Programming Language Concept for Enhancing Software Reliability
- Crew, ASTLOG: A Language for Examining Abstract Syntax Trees
- DeLine and Fanhdrich, Enforcing High-Level Protocols in Low-Level Software
- Heintze and Tardieu, Ultra-Fast Aliasing using CLA: A Million Lines of C Code in a Second
- Bush, Pincus, and Sielaff, A static analyzer for finding dynamic programming errors
- Choi, Lee, Loginov, O'Callahan, Sarkar, and Sridharan, Efficient and Precise Datarace Detection for Multithreaded Object-Oriented Programs
- Savage, Burrows, Nelson, Sobalvarro, and Anderson, Eraser: A Dyanmic Data Race Detector for Multi-Threaded Programs
- Flanagan and Freund, Type-Based Race Detection for Java
- Guyer and Lin, Client-Driven Pointer Analysis
- Hovemeyer and Pugh, Finding Bugs is Easy
- O'Callahan and Jackson, Lackwit: A Program Understanding Tool Based on Type Inference (see also Ajax)
- Evans, Static Detection of Dynamic Memory Errors
- Polyspace, Abstract Inerpretation
- Flanagan, Leino, Lillibridge, Nelson, Saxe, and Stata, Extended Static Checking for Java
Resource for Projects
Here's a page that lists a whole bunch of software tools
that are helpful in doing programming languages research. (If you
have suggestions for more stuff to add, please let me know!)