The final project write-up and any implementation is due at
midnight on Friday, December 19th. E-mail me your project
write-up and any code you have written.
Kinds of Projects
Projects on any subject related to the topics covered in class are
acceptable. If you are taking two classes that require projects, then
with the agreement of the other instructor you may be able to do a
single (larger) combined project for both classes. There are two main
kinds of projects: a survey project and a research project.
The Survey Project
- Pick an area in which you are interested
- Read thoroughly 3-6 papers, and read at least superficially 3-6
other papers. I will help, but you will need to do work to find the
papers.
- Write a survey of what you have learned:
- What are the basic problems?
- What are the basic approaches to solving them?
- What are the main achievements to date?
- What are the open problems?
- Keep the scope narrow enough so you can say something interesting
The Research Project
In general, I recommend that you try to do a research project.
Usually research projects involve some survey work and some
implementation work. There are many kinds of research projects,
including the following:
- Design and implement a program analysis
- Try to formalize some interesting aspect of an existing language
- Invent a new language or new language features, to make program
analysis and understanding of that language more tractable
Doing a good research project is harder than doing a survey, but
ultimately much more rewarding. If you want to do a research project
in an area you are not yet sufficiently familiar with, you should
start out with a brief survey project and turn it into a research
project once you know more about the area. One important caveat:
While your research project may be very open ended, be sure that you
have a well-defined goal for the end of the semester so that you have
something to write-up and present.
The Paper
Your write-up at the end of the semester should be in the form of a
research paper. You should include an abstract and an introduction
summarizing your motivation and your accomplishments. The
intermediate sections should contain full details about what you did.
End with a conclusion putting the project in perspective and
mentioning open problems, and a bibliography of cited papers.
Research papers should also include a related work section.
The paper should be typeset and made available in electronic form.
The Presentation
At the end of the semester, you will give a presentation about your
project. Note that after the presentation you may continue to work on
your project, so it is okay if your project is not quite complete. If
we can manage to fit everyone in, presentations will be 20 minutes
long, plus 5 minutes for questions. I recommend preparing slides for
your talk. Depending on how long it takes you per slide, you may want
anywhere from 8-15 slides; try your talk out ahead of time to see how
long it takes.
Suggestions for Research Projects
(9/26/03) This is probably it for the list of canned projects I'll
suggest.
- CQual is a tool I have developed for adding type
qualifier to C programs. CQual has been used to infer extra consts in
C programs, find format-string vulnerabilities, find year 2000
problems, find deadlocks in the Linux kernel, and check file usage in
application code, among other applications. There are lots of
potential projects you could do with CQual:
- Pick another application domain, and try using CQual to check
programs in that domain. For example, you could look at programs that
use a particular library that places requirements on clients (e.g., a
database library, or a graphics library). You could try to apply some
of the analyses that work on the Linux kernel to another operating
system like BSD.
- Currently the portion of CQual that is hardest to use is the
flow-sensitive type qualifiers. The difficulty is that the
alias analysis that's used can lead to unexpected results. Pick an
application domain and a flow-sensitive problem, and, after surveying
the existing alias analysis techniques, see if you can come up with a
notion of aliasing that provides fewer unexpected results. For
example, would using an inclusion-based alias analysis, rather than a
unification-based alias analysis, improve things?
- CQual includes a simple visualization system for presenting the
analysis results to the user. The interface could be greatly improved
using better knowledge of HCI than I have. For example, Ben Bederson has developed
a toolkit for building zoomable user interfaces; can some ideas from
that be adapted to present the results of CQual? Given a constraint
graph with an inconsistency, is there some notion of a "best" error
message to report?
- Java includes a native interface that allows program written in C,
C++, and other languages to interact with Java code. Unfortunately,
the native side of the code is completely unchecked, and so mistakes
in native code can lead to a complete violation of the type and memory
safety aspects of Java. I have a laundry list of properties of Java
native code requirements that could be checked with static analysis,
ranging from simple to sophisticated. The project would be to explore
this area further, and to try implementing some set of analyses. For
example, one interesting property to check might be to make sure that
when native code uses the JNI interface, it always uses classes and
methods with the right types (the types are just supplied as strings
to the JNI library). A more complicated check would be to make sure
that Java resources are handled safely on the native side. For
example, if native code has a pointer to a Java object, and that
pointer persists across native code invocations, then that the JVM
must be informed of this fact, so that it doesn't try to garbage
collect the object. And once native code is done with a Java object,
it must tell the JVM so that the memory associated with the object can
be freed.
- The program analyses we have seen in class, and many of the ones
that have been proposed in the research literature, determine fairly
low-level properties of code: whether an expression is constant or
not, which expressions may alias, or whether a series of functions is
called in the correct sequence. In contrast, programmers are often
very concerned with high-level properties of their code: in the 90's,
there was a lot of excitement about design patterns and how they help
in the design and architecture of programs. I believe that it is
possible to check high-level design patterns using some of the
low-level techniques researchers have been developing. The goal of
this project would be to explore this connection, and to implement a
kind of type system for certain design patterns. There are lots of
possibilities here, so this could easily be more than one project.
- Over the past 5 years (and also earlier), there has been lots of
work in the research community on program analysis tools for improving
software reliability. But these tools are generally built in
isolation, and there is little direct comparison of them. The goal of
this project is to do a "face-off" of program analysis tools: Find a
set of tools that you can download (or that we can arrange access to),
and compare them head-to-head: Do they find the same bugs in the same
code? How fast do they run? How much memory do they use? How many
annotations do they require? How hard/easy are they to use? Could
you merge ideas from two separate tools to form one, better tool?
- Cyclone is a programming language that provides low-level control over
resources, like C (upon whose syntax it is based), and also provides safety,
like Java, thus ruling out important classes of bugs and security holes.
Here are some projects that propose to develop new, useful features for
Cyclone:
- Implement a simple thread system for Cyclone. This would
occur in three steps: 1) modify the Cyclone runtime system to be
thread safe but changing currently-global datastructures into thread
local ones; 2) adding a library wrapper for some subset of POSIX
threads, which would include the basic thread primitives (spawn, join,
etc.), and the basic synchronization datastructures (mutexes,
condition variables) and their operations; 3) Extend the type system
to enforce the proper use of synchronized data, i.e. to guarantee the
absence of data races. A theoretical description of how to do this
for Cyclone is described in Grossman's "Type-Safe Multithreading in
Cyclone," which appeared in TLDI '03. You could try to implement this
system as described, or a simpler system of your own making, which
might provide fewer guarantees.
- Improve/extend Cyclone's porting tool from C programs to
Cyclone. While Cyclone is quite similar to C, the fact that it reveals
underlying implementation details (like the layout of memory, stack
allocation, whether a pointer requires bounds checks, etc.) makes
porting C code to Cyclone somewhat tedious. Cyclone has a simple
porting tool, which uses constraint-based program analysis to aid in
this process, but it is primtive and brittle. The goal of this
project would be to generally improve the tool. For example, one
approach would be to improve the process of inferring pointer types.
Cyclone has a variety of types, including "thin pointers", "fat
pointers", zero-terminated pointers, and others. You could develop a
program analysis to infer the "best" version of a pointer in a C
program. Another approach would be to perform region inference to
make use of Cyclone's region-based memory management.
- Cyclone's memory management uses regions, which are memory areas
whose objects are deallocated all at once. Recent work has shown that
regions can be made more effective by also allowing individual object
deallocated. One approach to allowing this feature would be to keep
track dynamically (at run-time) of whether individual objects are live
or not, and to check whether an object is life before any access to
the object. This would most likely be prohibitively slow, so the
project would also include developing some kind of static analysis to
determine when no accesses to a location will be made after it is
freed, in which case no dynamic information need be kept.
- Security failures are a recurring problem. One important issue
that often comes up is the security of a particular protocol for
implementing, say, a secure communiciation. However, while protocols
can be checked abstractly using various techniques, there is always
the question whether the actual implementation meets the specified
protocol. There is already some technology for modeling the behavior
of a program (e.g., see the MOPS project at Berkeley). Can we use a
program analysis to extract the actual implemented protocol from a
piece of code and then determine whether the code actually implements
the protocol? This project would either be to design a formal system
and prove what it can do, or construct an implementation (perhaps one
that is only sound up to a certain degree) and try it out on some
examples.
Other Suggestions for Finding Projects
- Look through the papers we haven't yet read for class.
- Take a look (on the ACM digital library, for example; full access
is available from machines in the umd.edu domain) at the last year or
two's worth of proceedings of PLDI (Programming Langauge Design and
Implmentation) and POPL (Principles of Programming Languages). You
may also want to look at the last couple of years of the Static
Analysis Symposium (published in Lecture Notes in Computer Science by
Springer (http://www.springerlink.com for electronic versions)). You
may also be able to find some of these in the library, or you could
certainly stop by my office.
- Take a look at web pages for the Open Source Quality project at
Berkeley, the Softare
Productivity Tools group at Microsoft, and the Program Analysis,
Transformation, and Visualization group at IBM.
Front Ends
Any program analysis that works on a real language needs a parser for
that language. Fortunately, there are many existing front-ends for
popular languages. CQual has a robust C parser that you can use;
Elkhound is a C++ parser; and Soot and BCEL are systems for working
with Java bytecodes.
