CMSC 631, Fall 2011
Program Analysis and Understanding
In this project, you will write a symbolic executor for a very simple language. The programming language is essentially three-address code, as might be found in the intermediate representation of a compiler. The grammar for statements in the language is as follows:
stmt ::= | prestmt /* unlabeled statement */ | l: prestmt /* labeled statement */ prestmt ::= | x := a1 bop a2 /* binary operation */ | x := uop a /* unary operation */ | x := a /* copy statement */ | x := newarray n /* create size-n array (of int) of zeros */ | x[a1] := a2 /* array write */ | x := y[a] /* array read */ | goto l /* unconditional branch */ | if x rop y goto l /* conditional branch */ | assert x rop y /* assertion */ | x := symint /* create a symbolic, 16-bit, signed integer */ | x := symarray n /* create a size-n symbolic array of 16-bit signed ints */ a, a1, a2 ::= /* arguments */ | x /* variable */ | n /* positive integer (16-bits) */ bop ::= + | - | * | / /* binary operations */ uop ::= - /* unary operations */ rop ::= == | != | < | > | >= | <= /* relative operations */ x, y ::= variables (strings) l ::= labels (strings) n ::= natural numbers
Statements are optionally labeled (labels are just strings). Most of the statements are self-explanatory. Integers are 16 bits, and signed. (We use 16-bit numbers so they easily fit within an OCaml integer, which is 31 bits wide.) The language includes arrays; an array of size n is created with the statement x := newarray n. Arrays are indexed starting at 0 and may only contain integers; there are no arrays of arrays. You can assume that all variables and array elements are initialized before use.
Here is a code skeleton for this project: p2.tar.gz
We've supplied you with some basic code for working with the above language. We have provided a parser that takes text in the input language and turns it into an OCaml data structure, defined in se.ml, representing the code. We also have provided an unparser that takes OCaml data structures representing statements and prints them.
Here is a brief overview of the code we've given you:
To try out this skeleton code, cd into the top-level directory and type make. (You can speed up the build a little bit by typing make -j2 where 2 is the level of parallelism to use.) We have tested that this code builds on Linux (which you have access to using your junkfood account) and MacOS using gcc 4.x.
Once built, you should be able to run ./se -unparse test1.tac which should parse the file test1.tac and then unparse it to standard out. Note that each statement in the input file has been assigned a unique number, which you'll use in the output below. You can also run ./se -stp-example to run the example STP client in stp_example.ml. A correct run will simply print True.
Hint: you may find it useful to experiment with this file to learn about STP. Feel free to change this file however you like; we won't use this code in grading. Note that the STP interface uses what are called phantom types to ensure you don't use one kind of STP value (like a boolean variable) where another expected (like a bitvector variable); type errors will indicate a misuse of the API. Phantom types are implemented in OCaml as polymorphic variants; you don't really need to understand these to use STP expressions. If you have confusions/questions, of course, please post to the forum.
Your task is to implement a symbolic executor for this language. You must implement function se : stmt list -> unit in file se.ml. You need not put all the code you write in just that file; feel free to add additional files as you see fit, and modify the Makefile appropriately.
Your symbolic executor should start executing in the empty environment (no variables exist yet) with the first statement in the program. Execution of a program ends when there are no more statements to execute (i.e., when execution "falls off the end" of the program). Your executor should explore all possible paths through the program, forking whenever necessary. (Here, "fork" is conceptual; I do not mean you should use the fork system call to create multiple copies of the executor, unless you really want to. I recommend maintaining a work-list of executions.) Use STP to determine which branches are feasible, and whether assertions pass.
If all executions through the program pass all assertions, your program should simply print Pass and exit. Otherwise, do not print Pass. Each time the assertion at line n fails during symbolic execution, your program should print Fail n on one line. Then the next lines should print out assignments to all symbolic integers used in the program up to that point; these identify the conditions that trigger a failure. The assignment should be expressed as tab-indented lines either of the form x: v or x[j] : v, where x is the assigned-to variable name on the line where the symbolic integer or array was created, j is the array index (for arrays) and v is the value. For example, if the input program is
a := symint; b := symint; assert a < b;the output could be
Fail 2 a : 10 b : 5
You should print out an assignment to all symbolic integers used up to the point of the failure. Since there can be many assertion failures, your program could print many Fail cases. Your symbolic executor should stop executing the current path when it reaches an assertion failure (but will continue exploring other paths).
There are two kinds of implicit assertions: (1) all accesses to an array must be within its bounds, (2) all divisors must be non-zero. For an example of the first case: if arr is an array of size 10, then the expression arr[i] will produce an assertion violation if i is less than 0 or greater than 9. If i is symbolic such that possible values lie outside of the legal range, then an assertion error will signal a failure just as above. For an example of the second case: if i could possibly be 0 then 10 / i will lead to an assertion violation.
Hint: I suggest you proceed in three steps. First, build an interpreter for the three-address code language, not using STP at all. Then, change your implementation to use symbolic integers, but not symbolic arrays, using STP's bitvector type to represent ints. Finally, handle symbolic arrays using STP's support for arrays (along with other metadata you maintain in your interpreter, in particular, the array's length).
Hint: There are many ways to design the symbolic executor. One possibility is to write a function step : stmt list -> state -> state * (state option), where the first argument is the program (you could also use a different representation than the statement list), and state captures the current program counter, path condition, and environment (variable bindings). Then the left output state component contains the new program counter, path condition, and environment after executing a single statement. The right output state component contains the same information for the other path to take, if any.
What to Submit
Submit a complete project that we can compile with make and run. You may add more files to your project.
The Campus Senate has adopted a policy asking students to include the following statement on each assignment in every course: "I pledge on my honor that I have not given or received any unauthorized assistance on this assignment." Consequently your program is requested to contain this pledge in a comment near the top.
Please carefully read the academic honesty section of the course syllabus. Any evidence of impermissible cooperation on projects, use of disallowed materials or resources, or unauthorized use of computer accounts, will be submitted to the Student Honor Council, which could result in an XF for the course, or suspension or expulsion from the University. Be sure you understand what you are and what you are not permitted to do in regards to academic integrity when it comes to project assignments. These policies apply to all students, and the Student Honor Council does not consider lack of knowledge of the policies to be a defense for violating them. Full information is found in the course syllabus---please review it at this time.