First download, p3.zip, which contains cekaml.ml, the file you will extend and use to implement your project. The math for our CEK machine is here (you should read this after this project description). The project zip also includes:

public_programs.ml – A set of example programs written in CEKaml
test_apply.ml/rb – Tests applying functions
test_arg.ml/rb – Tests evaluating arguments
test_call.ml/rb – Tests calling functions
test_compute.ml/rb – Tests computing results of whole programs
test_final.ml/rb – Tests the final_state function
test_fix.ml/rb – Tests evaluation of fix expressions
test_if.ml/rb – Tests evaluation of if expressions
test_inject.ml/rb – Tests the inject function
test_let.ml/rb – Tests implementation of let
test_var.ml/rb – Tests variable lookup
outputs/* – Outputs for public tests
goTest.rb – Run all public tests

Errata

July 3, 2015: Fixed EBArg to include environment as well, which must be passed along to evaluate each case for builtins. Added a few more builtins.
July 7, 2015: Error in the definition of the fix rule caused error in public tests. Public test is fixed and change is noted on Piazza.

For the semantics portion of our class, we covered big step and small step semantics. Big step semantics (like the kind we implemented here) use an evaluation relation $e \Downarrow v$ , which we implement in code as an eval function. These interpreters are often simple to understand, becuase of their recursive top-down organization.

But while simple, big step interpreters have many shortcomings. For example, they can’t easily interpret large programs, since recursion in the target language will be mirrored in the eval function. Because of this, we introduced small step semantics, and focused specifically on the CEK machine. A CEK machine evolves in steps, similar to a processor for a computer, pushing around various components to evolve the computation.

In this project you will implement a CEK style interpreter for the langauge CEKaml, a langauge based on OCaml but without types. Specifically, the CEKaml terms are:

Integral and boolean literals, 1, false
Variable references, x
Let binding, let x = t1 in t2
Lambda abstraction, fun x -> t
Application, t1 t2
Integer and boolean literals, true and 23
Polyadic builtin operations, e.g., + 23 1 x, || false
If/then/else, if t1 then t2 else t3
Recursion, fix f in ...
Constructors (or variants), Cons(12,Nil)
Match statements, match Ctr1(2) with | Ctr1(x) -> x | Ctr2(x,y) -> y

In this project, we won’t work with concrete syntax for our programming language, we’ll work with abstract syntax. In other words, you’ll work with an OCaml based representation of the program text, represented as a term datatype.

Important notes:

In this project you may assume only valid programs (or moves) will be made with your interpreter.
The only things matched by match statements are variants. This is in contrast to OCaml, where you can also match (e.g.,) integers.
For the built in operators, assume that you will always be passed at least one argument. For the case of not, you may assume that there is only one argument.

Term representation

Terms are represented in the Syntax module of the project implementation. This module includes the term datatype and a utility to convert terms into strings string_of_term.

type builtin =
    | Plus
    | Times
    | And
    | Not
    | Or 

  type term =
    | Var of var                          (* Variables                          *)
    | Let of var * term * term            (* Let binding let x = ... in         *)
    | Lam of var * term                   (* (\x. e)                            *)
    | App of term * term                  (* e1 e2                              *)
    | NumLit of int                       (* literal integers                   *)
    | BoolLit of bool                     (* literal booleans                   *)
    | Builtin of builtin * term list      (* built in operators e1 + e2         *)
    | Ifthenels of term * term * term     (* if e1 then e2 else e3              *)
    | Fix of var * term                   (* fixpoints: let rec var = ...       *)
    | Variant of name * term list         (* constructors: C(e1,...,en)         *)
    (* Match patterns. 
       match t with 
         | C1(v11,...,v1k) -> b1
         | ...             -> ...
         | Cn(vn1,...,vnk) -> bn
       represented as 
       Match(t,[("C1",["v11",...,"v1k"],b1);
               ...
               ("C1",["v11",...,"v1k"],b1)])
    *)
    | Match of term * ((name * var list * term) list) 
  and
    (* A match pattern is a:
     - Constructor name
     - Variable list 
     - Term

     When matched, the term will be executed with the variables
     substituted into the environment.  *)
    match_pattern = (name * var list * term)

Interpreters in the CEK style

This week’s class covered small step semantics using the CEK machine. The CEK machine is an abstract machine for efficiently evaluating functional languages, that has three components:

The control, which tells us what the current instruction is. In our case this is simply a term.
The environment, which tells us how to look up variables
The continuation, which tells us where to go next

We went over the details for how the CEK machine worked in class, but I reiterate here the elements germane to this project. Remember that if you get stuck, there are various notes and resources on the course webpage and on Piazza from which to draw inspiration

The CEK machine state ( $\Sigma$ ) is then a three tuple of those components, $\Sigma = C \times E \times K$ . The machine executes by taking multiple steps from an initial state. If the program is e, then the initial state is

\[
\Sigma_0 = \langle e , [] , Done \rangle
\]

You will implement an OCaml function, inject e, that performs this operation. After the initial state, the machine steps to new states until there are no more possible moves:

\[
\Sigma_0 \rightarrow \Sigma_1 \rightarrow \cdots \rightarrow \Sigma_f
\]

To detect whether the machine is in a final state, you will implement an OCaml function final_state s. This function will look at the state and detect whether or not any possible move could be made. This will happen when the continuation is in the Done state and no more moves can be applied.

To define an interpreter for TypelessCaml, you will implement the step function for the CEK machine, as we discuss in class. To correctly implement this project, you will have to translate the mathematics in the CEK machine’s definition into OCaml code.

We went over the mathematics for the enhanced CEK machine yesterday in class, but didn’t systematically write them down anywhere.

When you get stuck on the project, and to understand the rules for our enhanced CEK machine, read the guide linked at the top of this description.

Assignment

Implement the following functions in cekaml.ml

inject e The inject function takes a term and creates an initial state out of it, setting up the control to point at the program, the empty environment, and a Done continuation.
final_state s Detects whether the machine is in a final state or not. The machine is in a final state when it can apply no more steps.
step s Implements the stepping behavior of the machine. There are many cases that need to be handled, which will be tested independently:
- Evaluating a variable x by looking it up in the environment. Remember to handle each possible case for machine values (not just closures).
- Evaluating an application t1 t2 by evaluating t1 and setting up the correct continuation for t2.
- Evaluating an argument t2 by swapping EArg and creating a Call with the proper structure.
- Evaluating a call \x. t, by binding x in the proper environment and stepping into the function body.
- Evaluating a let by treating let x = e1 in e2 in the same way as (\x. e2) e1
- Evaluating a literal (integer or boolean) by correctly making call handle it.
- Evaluating if correctly by evaluating the guard and then the correct branch.
- Evaluating fix correctly by creating a closure of the current environment and going into the function.
- (Extra credit) Evaluating constructors correctly by evaluating their arguments in order
- (Extra credit) Evaluating match correctly, by evaluating its term, and then choosing the correct case.
compute s Uses inject, final_state, and step, to compute the final value of a computation by applying the step function until a final state is reached. When a final state is reached, compute gives the value of the resulting term (in the control).

Extra credit (5%)

Implement constructors and match. If you plan to do this, let me know and we can talk about your planned implementation. The implementation is not difficult per se, but has some slight technical complications that we should talk about in person.

Specifically, to evaluate a variant, you need to evaluate its arguments in order: just like for built ins. But to then finish the variant, you can’t take it and step into the assembled variant, because you’d be calling the machine with an mvalue in the control position. Things wouldn’t quite type check. To handle this, we have to add a special rule for calling functions where the argument is a variant, and also add a special case for if the program is done and computes a variant as its result.

To implement match, you perform a similar sequence of steps. You evaluate the discriminee (thing being matched against), then have a continuation that allows you to choose between a set of constructors. When you finish evaluating the discriminee, you decide which match body to jump into and instantiate the arguments appropriately.

These steps should not be too challenging if you’ve implemented the project, but might be overwhelming to complete if you don’t have a good familarity with how the CEK machine is working. If you’d like to do the extra credit, we can discuss over email, after class, or in office hours.

Tips / Hints

This project shouldn’t consist of all that much code. Each of the machine steps can be written succinctly.
Your step function should be a bunch of cases in a match statement. When this code starts to become too unwieldy, split it into various helper functions.
Debugging your interpreter should be easy if you write a little bit of utility code. Write a function that prints the input state (using the helper function for printing states), runs the step function, and then prints the output state. Then you can figure out what your step function is doing wrong.
In class we presented an implementation of the CEK machine that handled the basic lambda calculus. You should start the project by copying that over to your step function and seeing that it works for the basic lambda calculus.
If you get stuck on what the interpreter should do, look at the math for it in the companion document

CMSC 330, Summer 2015

Organization of Programming Languages

Project 3 - CEKaml Interpreter