# Project 5 - Scheme Parser & Interpreter

Due Sat, Nov 14
11:59:59pm

## Introduction

In this project, you will write an interpreter for a tiny subset of the Scheme programming language. As part of the project, you will also write a parser that translates a plain text Scheme program into an abstract syntax tree (AST). You will then write an evaluator that executes the code represented as an AST.

In order to test your project you may want access to a Scheme interpreter. If you log in to linuxlab, you use the command guile to launch the top level of one implementation of Scheme. On Windows and MacOS machines you can use DrScheme, a part of PLT Scheme (select Languages->Choose Languages->Teaching Languages->Advanced Student). If you want to find out more about Scheme, you can download the Revised6 Report on the Algorithmic Language Scheme. You may also try Teach Yourself Scheme in Fixnum Days.

For purposes of this project, we will only test your interpreter with valid input. Thus your code may do whatever you want on a bad input. We do, however, recommend adding reasonable error handling code to your project for cases of malformed or otherwise incorrect Scheme input, because it will make developing your project easier. As you are testing your program, you may inadvertently create incorrect input data; substantial time may be lost in trying to debug the program, only to find a few mistyped characters in your input data are the source of the problem.

Also, even though you need all of these pieces written to have a "complete" Scheme interpreter, we've structured the project so that you can work on any of the separate phases independently and test them on their own. So if you get stuck on an earlier part of the project, you can still move on to a later part.

## Getting Started

Along with files used to make direct submissions to the submit server (submit.jar, .submit, submit.rb), you will find the following project files:

## Part 1: Simple Scheme Programming

Put your solution to this part in the file basic.scm.

Implement the following functions in regular Scheme. All of these functions should operate over integers and/or lists of integers. (These should look pretty familiar.) The purpose of this part is just to make sure you understand Scheme before you try to start writing an interpreter for it.

1. Write a function double x that returns two times x.
2. Write a function powof2 x that returns true (which is written #t in Scheme) if and only if x is a power of 2. (Hint: Use the functions / and modulo.)
3. Write a function sum l that returns the sum of the integer list l, using recursion. Hint: You can use the function pair? to determine whether a value is a cons cell or not; and in this problem, you can assume if something is not a cons cell, then it's '().
4. Write a function applyToList f l that returns a new list containing all the elements of l, in the same order, but with f applied to them. Your implementation should use recursion. You may not use Scheme's built-in map or for-each functions in writing your map function. Note applyToList directly takes 2 arguments (i.e., it is not using currying).
Warning: tools such as DrScheme will insert extra lines of configuration data at the beginning of basic.scm. Be sure to eliminate those lines before submitting your solution to the submit server, or else they may cause guile (the Scheme interpreter on the submit server) to fail. You may need to use a normal text editor since DrScheme will not display these lines.

## Part 2: Parsing Scheme

Put your solution to this part in the file scheme.ml.

Your next task is to write a parser, which in this case will be a function that turns a string into a Scheme abstract syntax tree (AST). There are actually two parts to this: building a lexer to translate a sequence of characters into a sequence of tokens, which are smaller strings that form the terminals for the parser; and writing the actual parser itself, to turn the tokens into an AST.

The grammar for Scheme is particularly simple:

S ::= id | n | b | str | ( L )
L ::= S L | ε
id ::= Scheme identifiers
n ::= Integers (made up of digits)
b ::= #t | #f
str ::= Strings (begin and end with quotes)

Integers and strings are distinguished from identifiers by their first character.

To make the project a bit simpler, we've supplied you with a function tokenize : string -> string list that acts as a lexer. Tokens may be any of the following:

• ( and ) are tokens.
• Scheme identifiers are tokens. Identifiers may contain upper- and lower-case letters, digits, =, *, +, /, < >, !, ?, and -. For example, valid identifiers are Foo, set!, <three, and +=. Identifiers may not begin with a digit.
• #t and #f are tokens
• Scheme strings are tokens. For purposes of this project, only alphanumeric and whitespace characters can appear within a string.
For example, when called as tokenize "(foo (bar 3) 4 \"baz\")", the return value is
["("; "foo"; "("; "bar"; "3"; ")"; "4"; "\"baz\""; ")"]

You must write a function parse : string list -> ast that takes as input a list of tokens (returned from tokenize) and returns an AST, given by the following datatype:

   type ast =
Id of string
| Num of int
| Bool of bool
| String of string
| List of ast list

For example, the Scheme program (foo (bar 3) 4 "baz") represented by the AST List [Id "foo"; (List [Id "bar"; Num 3]); Num 4; String "baz"]. Your AST nodes for strings should \textit{not} include quotes. For example, parse ["\"a\""] should return String "a", i.e., a String constructed from a string with one character, a.

You should use the idea of a recursive descent parser, as we discussed in class. Thus we suggest you write two functions: parse_sexpr, which parses the non-terminal S, and parse_list, which parses the non-terminal L representing a list of S-expressions.

## Part 3: The Interpreter

Put your solution to this part in the file scheme.ml also.

Your last task is to write a Scheme evaluator that, given an AST, executes the S-expression corresponding to that AST. We will describe the subset of the language you should handle below. You must write a function eval : ast -> value that, given an AST, produces a value representing the result of evaluating that AST. The type value should be of the following form:

type value =
Val_Num of int
| Val_Bool of bool
| Val_String of string
| Val_Nil
| Val_Cons of value * value
...

where the part labeled ... is for you to fill in. Hence you may extend this type with new constructors. However, you must not change the part of value we have given you, because our grading scripts will look for exactly those constructors, with exactly those arguments as given, to test your interpreter.

As one example, eval (Num 3) should return Val_Num 3, meaning that an AST node containing the integer 3 evaluates to the integer 3. When doing this project, be sure to keep straight the difference between textual entities that the programmer has written down (like the text "3") with the resulting value that your interpreter produces (Val_Num 3).

In class, we will give you a precise definition of the core operational semantics of Scheme, which you should follow in doing your project. Here are the language features your interpreter should support:

• define at the top level. Your evaluation function must maintain a top-level environment containing the values of variables that have been defined. This top-level environment should persist from one evaluation call to another. For example, the following sequence of calls to your eval function should return 3:
  eval (List [Id "define"; Id "x"; Num 3]);
eval (Id "x")

You should allow define to bind variables to any possible Scheme values, including closures (see below). A define expression itself evaluates to nil. You do not need to handle cases of define within the body of a function; we will not test your interpreter with any such examples. You also do not need to handle cases where we redefine primitive operators. Remember that the Scheme top-level is dynamically scoped, and so values may be redefined at any time, even with different types. When a defined value is looked up, we return its latest definition. Hint: Since the top-level bindings will persist across calls to eval, you should probably store them in a ref.
• Values Your evaluation function should evaluate integers, booleans, and strings to the corresponding values.
• if You should allow both (if cond tr fl), which evaluates to tr if cond is true and fl otherwise, and (if cond tr), which evaluates to Val_Nil if cond is false.
• lambda You should support creating anonymous functions of one argument. We will not test your code with functions of any other arity. You must support closures.
• identifiers When presented with an identifier, your interpreter needs to look it up. The first place it should look it up is in the "local" environment, consisting of parameters of enclosing lambdas. This is the environment shown in the semantics lecture. When looking up an identifier, if it is not in that environment, then you should look it up in the top-level environment (the one that define updates). If an identifier is in neither environment, then it is unbound, and trying to evaluate it should produce an error.
• Function calls You should support calling user-defined functions of one argument. According to the previous requirement, user-defined functions will be represented as closures. Your implementation must be call-by-value, so that arguments are evaluated before they are passed to a closure. For example, consider the following sequence of calls to eval:
  let t = List
[Id "define"; Id "next";
List [Id "lambda"; List [Id "x"]; List [Id "+"; Id "x"; Num 1]]];;
eval t;;
eval (List [Id "next"; Num 3]);;

Here we define t to be the abstract syntax tree corresponding to the Scheme code (define next (lambda (x) (+ x 1))). In evaluating t, we first evaluate the lambda to produce a closure. Then we bind next to the closure. The last evaluation line corresponds to the Scheme code (next 3). Thus we look up next in the top-level environment to return the closure. We evaluate 3, which produces Val_Num 3. Then we apply the closure to the argument to produce 4.
• Primitives You should support the following built-in functions; some of these functions can take more than one argument, but you don't have to implement them with currying---you can just make these special cases inside your evaluator.
• +, -, *, and = on integers. The + operator sums its arguments, and should accept one or more arguments. The - may take any number of arguments, and it subtracts its 2nd through last argument from its first argument; given only one argument, it compute unary negation. For example, (- 3) evaluates to Val_Num (-3), while (- 4 3) evaluates to Val_Num 1 and (- 4 3 1) evaluates to Val_Num 0. The * operator multiplies its arguments, and should accept one or more arguments. The = operator compares its two arguments and evaluates to either Val_Bool true or Val_Bool false.
• nil. For this project, we will vary slightly from Scheme and make nil the built-in keyword for the empty list. Thus nil should evaluate to Val_nil. In actual Scheme, nil is written '().
• cons, car, and cdr. Here (cons x y) should evaluate to the cons cell Val_Cons(x', y'), where x' and y' are whatever x and y evaluate to, respectively. When applied to a cons cell Val_Cons(x', y'), car returns x' and cdr returns y'.
• boolean?, number?, string?, and pair?. These return true if their single argument is a boolean, integer, string, or cons cell, respectively, and false otherwise.
• display. Prints its single argument, which is a string. Do not add a trailing newline after the string. Do not include any quotes around the string when it is printed.

In order to support closures and primitives, you will probably want to extend the type value with two new constructors; alternately, you can add a single new constructor for closures and put the primitive handling code as special cases in your evaluation function.

To make it easier to test your system, we've provided you with a driver file main.ml that reads lines of text up to ;; and passes their concatenation through your lexer, parser, and evaluator, and finally outputs the result. We've included a file sample.output containing a transcript of a session with our solution to this project. Note that we will test your project by calling your lexing, parsing, and evaluation functions directly, so be sure to give those functions the types we expect above.

## Submission

All your code should be in two files, basic.scm & scheme.ml (though if you are using DrScheme, you may submit your Scheme code in basic.ss instead). You can submit your project in two ways:
• Submit your basic.scm & scheme.ml files directly to the submit server. You'll need to put both files in a .zip archive first. On Windows you can select the two files, then right click to select the "Send to->Compressed (zipped) Folder" option to create a .zip archive.

Once your files are in a single zip archive, submit the archive by clicking on the submit link in the column "web submission".

Select your file using the "Browse" button, then press the "Submit project!" button.

• You may also submit directly by executing a Java program on a computer with Java and network access. Use the submit.jar file from the archive p5.zip, To submit, go to the directory containing your project, then either execute submit.rb or type the following command directly:

java -jar submit.jar

You will be asked to enter your class account and password, then all files in the directory (and its subdirectories) will be put in a jar file and submitted to the submit server. If your submission is successful you will see the message:

Successful submission # received for project 5