On this page:
From Con+   to Fraud+
Adding a Bit of Character
Generalizing Let
Extending your Parser
Testing
Submitting
Extra Credit
7.4

Assignment 4: Let there be Variables, Characters

Due: Thursday, Oct 8th at 11:59PM EST

The goal of this assignment is to extend a compiler with binding forms and a character data type.

Assignment repository:

You are given a repository with a starter compiler similar to the Fraud language we studied in class. You are tasked with:

From Con+ to Fraud+

Implement the abs and unary - operations and the cond form from Con+. You can start from your previous code, but you will need to update it to work for Fraud+.

In particular, functions should signal an error when applied to the wrong type of argument and your cond form should work with arbitrary question expressions. In other words, cond should work like if in Dupe.

The formal semantics of cond are defined as:

image

image

image

The following files have already been updated for you:
  • ast.rkt

  • syntax.rkt

  • interp.rkt

You will need to modify compile.rkt to correctly implement these features.

Adding a Bit of Character

Racket has a Character data type for representing single letters. A Racket character can represent any of the 1,114,112 Unicode code points.

The way a character is most often written is an octothorp, followed by a backslash, followed by the character itself. So for example the character a is written #\a. The character λ is written #\λ. The character is written #\文.

A character can be converted to an integer and vice versa:

Examples

> (char->integer #\a)

97

> (char->integer #\λ)

955

> (char->integer #\文)

25991

> (integer->char 97)

#\a

> (integer->char 955)

#\λ

> (integer->char 25991)

#\文

However, integers in the range of valid code points are acceptable to integer->char and using any other integer will produce an error:

Examples

> (integer->char -1)

integer->char: contract violation

  expected: (and/c (integer-in 0 #x10FFFF) (not/c

(integer-in #xD800 #xDFFF)))

  given: -1

> (integer->char 55296)

integer->char: contract violation

  expected: (and/c (integer-in 0 #x10FFFF) (not/c

(integer-in #xD800 #xDFFF)))

  given: 55296

There are a few other ways to write characters (see the Racket Reference for the details), but you don’t have to worry much about this since the lexer takes care of reading characters in all their different forms and the run-time system given to you takes care of printing them.

Your job is extend the compiler to handle the compilation of characters and implement the operations integer->char, char->integer, and char?. The operations should work as in Racket and should signal an error (i.e. 'err) whenever Racket produces an error. While you’re at it, implement the predicates integer? and boolean?, too.

The following files have already been updated for you:

You will need to modify compile.rkt to correctly implement these features. Note that you must use the same representation of characters as used in the run-time system and should not change main.c.

Generalizing Let

The Fraud language has a let form that binds a single variable in the scope of some expression. This is a restriction of the more general form of let that binds any number of expressions. So for example,

(let ((x 1) (y 2) (z 3))
  e)

simultaneously binds x, y, and z in the scope of e.

The syntax of a let expression allows any number of binders to occur, so (let () e) is valid syntax and is equivalent to e.

The binding of each variable is only in scope in the body, not in the right-hand-sides of any of the let.

For example,

(let ((x 1) (y x)) 0)

is a syntax error because the occurrence of x is not bound.

Update syntax.rkt to define two functions:

Update compile.rkt to correctly compile the generalized form of let. The compiler may assume the input is a closed expression.

Extending your Parser

Extend your Con+ parser for the Fraud+ language based on the following grammar:

<expr> ::= integer

        |  character

        |  boolean

        |  variable

        |  ( <compound> )

        |  [ <compound> ]

 

<compound> ::= <prim> <expr>

            |  if <expr> <expr> <expr>

            |  cond <clause>* <else>

            |  let <bindings> <expr>

 

<prim> ::= add1 | sub1 | abs | - | zero? | integer->char | char->integer

        |  char? | integer? | boolean?

 

<clause> ::= ( <expr> <expr> )

          |  [ <expr> <expr> ]

 

<else> ::= ( else <expr> )

        |  [ else <expr> ]

 

<bindings> ::= ( <binding>* )

            |  [ <binding>* ]

 

<binding> ::= ( variable <expr> )

           |  [ variable <expr> ]

There is a lexer given to you in lex.rkt, which provides two functions: lex-string and lex-port, which consume a string or an input port, respectively, and produce a list of tokens, which are defined as follows:

Note that the Token type has changed slightly from Assignment 3: Conditional forms and parsing: 'add1 is now '(prim add1), 'cond is now '(keyword cond), etc.

; type Token =
; | Integer
; | Char
; | Boolean
; | (variable ,Variable)
; | (keyword ,Keyword)
; | (prim ,Prim)
; | 'lparen    ;; (
; | 'rparen    ;; )
; | 'lsquare   ;; [
; | 'rsquare   ;; ]
; | 'eof       ;; end of file
 
; type Variable = Symbol (other than 'let, 'cond, etc.)
 
; type Keyword =
; | 'let
; | 'let*
; | 'cond
; | 'else
; | 'if
 
; type Prim =
; | 'add1
; | 'sub1
; | 'zero?
; | 'abs
; | '-
; | 'integer->char
; | 'char->integer
; | 'char?
; | 'boolean?
; | 'integer?

The lexer will take care of reading the #lang racket header and remove any whitespace.

You must complete the code in parse.rkt to implement the parser which constructs an AST representing a valid Fraud+ expression, if possible, from a list of tokens. The parse function should have the following signature and must be provided by the module:

; parse : [Listof Token] -> Expr

As an example, parse should produce (prim-e 'add1 (prim-e 'sub1 (int-e 7))) if given

'(lparen (prim add1) lparen (prim sub1) 7 rparen rparen eof)

You should not need to make any changes to lex.rkt.

The given interp-file.rkt and compile-file.rkt code no longer use read, but instead use the parser. This means you neither will work until the parser is complete.

The code you are given includes two(!) implementations of the Con+ parser. One implementation follows the imperative approach; the other follows the functional approach.

You may extend either, or you may throw out the given code and start from the code you wrote previously.

Testing

You can test your code in several ways:

Note that only a small number of tests are given to you, so you should write additional test cases.

We have removed random.rkt and instead provide a random-exprs.rkt module which provides exprs, a list of 500 closed expressions. It is used in the test/compile-rand.rkt file to randomly test compiler correctness. This should help speed up the testing process since the random generation is slow.

Submitting

Pushing your local repository to github “submits” your work. We will grade the latest submission that occurs before the deadline.

Extra Credit

Similar to let there is also let* that also binds any number of expressions. The difference is that previous bindings are available to subsequent bindings. For example,

(let* ((x 1) (y 2) (z (add1 y)))
  e)

binds x to 1, y to 2, and z to 3 in the scope of e.

The syntax of a let* expression allows any number of binders to occur, so (let* () e) is valid syntax and is equivalent to e.

Unlike let,

(let* ((x 1) (y x)) 0)

is not a syntax error.

No files have already been updated for you. You’re on your own for this one. You’ll need to make new AST nodes, change the lexer and parser, extend every function that traverses the AST, expr?, closed?, etc., the interpreter and the compiler. Do you need to alter main.c? You tell me.

If you choose to do this extra credit it will be worth 15 points. There will be no autograder, so if you want to points you have to schedule time with the instructor to look over your solution. You need to schedule the time before the deadline, but the meeting can happen after the deadline.