Project 3  CEKaml Interpreter
First download, p3.zip
, which contains cekaml.ml
, the file you will extend and use to implement your project. The math for our CEK machine is here (you should read this after this project description). The project zip also includes:
public_programs.ml
– A set of example programs written in CEKamltest_apply.ml/rb
– Tests applying functionstest_arg.ml/rb
– Tests evaluating argumentstest_call.ml/rb
– Tests calling functionstest_compute.ml/rb
– Tests computing results of whole programstest_final.ml/rb
– Tests thefinal_state
functiontest_fix.ml/rb
– Tests evaluation offix
expressionstest_if.ml/rb
– Tests evaluation ofif
expressionstest_inject.ml/rb
– Tests theinject
functiontest_let.ml/rb
– Tests implementation oflet
test_var.ml/rb
– Tests variable lookupoutputs/*
– Outputs for public testsgoTest.rb
– Run all public tests
Errata

July 3, 2015: Fixed
EBArg
to include environment as well, which must be passed along to evaluate each case for builtins. Added a few more builtins. 
July 7, 2015: Error in the definition of the
fix
rule caused error in public tests. Public test is fixed and change is noted on Piazza.
For the semantics portion of our class, we covered big step and small step semantics. Big step semantics (like the kind we implemented here) use an evaluation relation , which we implement in code as an eval
function. These interpreters are often simple to understand, becuase of their recursive topdown organization.
But while simple, big step interpreters have many shortcomings. For example, they can’t easily interpret large programs, since recursion in the target language will be mirrored in the eval
function. Because of this, we introduced small step semantics, and focused specifically on the CEK machine. A CEK machine evolves in steps, similar to a processor for a computer, pushing around various components to evolve the computation.
In this project you will implement a CEK style interpreter for the langauge CEKaml, a langauge based on OCaml but without types. Specifically, the CEKaml terms are:
 Integral and boolean literals,
1
,false
 Variable references,
x
 Let binding,
let x = t1 in t2
 Lambda abstraction,
fun x > t
 Application,
t1 t2
 Integer and boolean literals,
true
and23
 Polyadic builtin operations, e.g.,
+ 23 1 x
, false
 If/then/else,
if t1 then t2 else t3
 Recursion,
fix f in ...
 Constructors (or variants),
Cons(12,Nil)
 Match statements,
match Ctr1(2) with  Ctr1(x) > x  Ctr2(x,y) > y
In this project, we won’t work with concrete syntax for our programming language, we’ll work with abstract syntax. In other words, you’ll work with an OCaml based representation of the program text, represented as a term
datatype.
Important notes:

In this project you may assume only valid programs (or moves) will be made with your interpreter.

The only things matched by match statements are variants. This is in contrast to OCaml, where you can also match (e.g.,) integers.

For the built in operators, assume that you will always be passed at least one argument. For the case of
not
, you may assume that there is only one argument.
Term representation
Terms are represented in the Syntax
module of the project implementation. This module includes the term datatype and a utility to convert terms into strings string_of_term
.
type builtin =
 Plus
 Times
 And
 Not
 Or
type term =
 Var of var (* Variables *)
 Let of var * term * term (* Let binding let x = ... in *)
 Lam of var * term (* (\x. e) *)
 App of term * term (* e1 e2 *)
 NumLit of int (* literal integers *)
 BoolLit of bool (* literal booleans *)
 Builtin of builtin * term list (* built in operators e1 + e2 *)
 Ifthenels of term * term * term (* if e1 then e2 else e3 *)
 Fix of var * term (* fixpoints: let rec var = ... *)
 Variant of name * term list (* constructors: C(e1,...,en) *)
(* Match patterns.
match t with
 C1(v11,...,v1k) > b1
 ... > ...
 Cn(vn1,...,vnk) > bn
represented as
Match(t,[("C1",["v11",...,"v1k"],b1);
...
("C1",["v11",...,"v1k"],b1)])
*)
 Match of term * ((name * var list * term) list)
and
(* A match pattern is a:
 Constructor name
 Variable list
 Term
When matched, the term will be executed with the variables
substituted into the environment. *)
match_pattern = (name * var list * term)
Interpreters in the CEK style
This week’s class covered small step semantics using the CEK machine. The CEK machine is an abstract machine for efficiently evaluating functional languages, that has three components:
 The control, which tells us what the current instruction is. In our case this is simply a term.
 The environment, which tells us how to look up variables
 The continuation, which tells us where to go next
We went over the details for how the CEK machine worked in class, but I reiterate here the elements germane to this project. Remember that if you get stuck, there are various notes and resources on the course webpage and on Piazza from which to draw inspiration
The CEK machine state () is then a three tuple of those components, . The machine executes by taking multiple steps from an initial state. If the program is e
, then the initial state is
\[
\Sigma_0 = \langle e , [] , Done \rangle
\]
You will implement an OCaml function, inject e
, that performs this operation. After the initial state, the machine steps to new states until there are no more possible moves:
\[
\Sigma_0 \rightarrow \Sigma_1 \rightarrow \cdots \rightarrow \Sigma_f
\]
To detect whether the machine is in a final state, you will implement an OCaml function final_state s
. This function will look at the state and detect whether or not any possible move could be made. This will happen when the continuation is in the Done
state and no more moves can be applied.
To define an interpreter for TypelessCaml, you will implement the step
function for the CEK machine, as we discuss in class. To correctly implement this project, you will have to translate the mathematics in the CEK machine’s definition into OCaml code.
We went over the mathematics for the enhanced CEK machine yesterday in class, but didn’t systematically write them down anywhere.
When you get stuck on the project, and to understand the rules for our enhanced CEK machine, read the guide linked at the top of this description.
Assignment
Implement the following functions in cekaml.ml

inject e
The inject function takes a term and creates an initial state out of it, setting up the control to point at the program, the empty environment, and aDone
continuation. 
final_state s
Detects whether the machine is in a final state or not. The machine is in a final state when it can apply no more steps. 
step s
Implements the stepping behavior of the machine. There are many cases that need to be handled, which will be tested independently: Evaluating a variable
x
by looking it up in the environment. Remember to handle each possible case for machine values (not just closures).  Evaluating an application
t1 t2
by evaluatingt1
and setting up the correct continuation fort2
.  Evaluating an argument
t2
by swappingEArg
and creating aCall
with the proper structure.  Evaluating a call
\x. t
, by bindingx
in the proper environment and stepping into the function body.  Evaluating a let by treating
let x = e1 in e2
in the same way as(\x. e2) e1
 Evaluating a literal (integer or boolean) by correctly making
call
handle it.  Evaluating if correctly by evaluating the guard and then the correct branch.
 Evaluating fix correctly by creating a closure of the current environment and going into the function.
 (Extra credit) Evaluating constructors correctly by evaluating their arguments in order
 (Extra credit) Evaluating match correctly, by evaluating its term, and then choosing the correct case.
 Evaluating a variable

compute s
Usesinject
,final_state
, andstep
, to compute the final value of a computation by applying the step function until a final state is reached. When a final state is reached,compute
gives the value of the resulting term (in the control).
Extra credit (5%)
Implement constructors and match. If you plan to do this, let me know and we can talk about your planned implementation. The implementation is not difficult per se, but has some slight technical complications that we should talk about in person.
Specifically, to evaluate a variant, you need to evaluate its arguments in order: just like for built ins. But to then finish the variant, you can’t take it and step into the assembled variant, because you’d be calling the machine with an mvalue
in the control position. Things wouldn’t quite type check. To handle this, we have to add a special rule for calling functions where the argument is a variant, and also add a special case for if the program is done and computes a variant as its result.
To implement match, you perform a similar sequence of steps. You evaluate the discriminee (thing being matched against), then have a continuation that allows you to choose between a set of constructors. When you finish evaluating the discriminee, you decide which match body to jump into and instantiate the arguments appropriately.
These steps should not be too challenging if you’ve implemented the project, but might be overwhelming to complete if you don’t have a good familarity with how the CEK machine is working. If you’d like to do the extra credit, we can discuss over email, after class, or in office hours.
Tips / Hints

This project shouldn’t consist of all that much code. Each of the machine steps can be written succinctly.

Your step function should be a bunch of cases in a match statement. When this code starts to become too unwieldy, split it into various helper functions.

Debugging your interpreter should be easy if you write a little bit of utility code. Write a function that prints the input state (using the helper function for printing states), runs the step function, and then prints the output state. Then you can figure out what your step function is doing wrong.

In class we presented an implementation of the CEK machine that handled the basic lambda calculus. You should start the project by copying that over to your step function and seeing that it works for the basic lambda calculus.

If you get stuck on what the interpreter should do, look at the math for it in the companion document