On this page:
More primitives
Con with Cond
Reading is Overrated
Testing
Submitting
7.4

Assignment 3: Conditional forms and parsing

Due: Tues, Sept 29, 11:59PM

The goal of this assignment is to extend a compiler with some simple unary numeric operations and conditional expressions, and to write a parser.

Assignment repository:

You are given a repository with a starter compiler for the Con language we studied in class. You are tasked with:

More primitives

Add the following forms of expression to the Con language:

Here’s one possible way to compute the absolute value of the value in rax:

mov rbx, rax

neg rax

cmovl rax, rbx

To do this, you should:
  • Study ast.rkt and the new forms of expression (i.e. new AST nodes) then update the comment at the top describing what the grammmar should look like.

  • Study syntax.rkt and make sure you understand ‘expr?‘ and ‘sexpr->ast‘.

  • Update interp.rkt to correctly interpret these expressions.

  • Update compile.rkt to correctly compile these expressions.

The neg and cmovl instructions have been included in the given asm code. If you need other x86 instructions, you will need to modify the asm/* code.

Con with Cond

The Con language we studied added a simple form of performing conditional evaluation of sub-expressions:

(if (zero? e0) e1 e2)

However, in the original paper on Lisp, Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I, John McCarthy introduced a generalization of if called “conditional expressions,” which we could add to our language with the following syntax:

(cond [(zero? e-p1) e-a1]
      ...
      [else e-an])

A cond expression has any number of clauses [(zero? e-pi) e-ai] ..., followed by an “else” clause [else en]. (We are using zero? here to avoid having to introduce booleans as a distinct data type; a restriction we will remove later.)

The meaning of a cond expression is computed by evaluating each (zero? e-pi) in order until the first one that is true is found, in which case, the corresponding e-ai is evaluated and its value is the value of the cond expression. If no such e-pi exists, e-an’s value is the value of the cond.

The formal semantics can be defined as:

image

image

Your task is to extend Con with this (restricted) form of cond.

To do this, you should:

The subset of x86 needed to compile this extension of Con should not require anything more than what was used for Con without cond, so you should not need make changes to asm/*.

Reading is Overrated

We have so far side-stepped the issue of parsing by (1) relying on s-expression notation for the concrete syntax of programs and (2) using the built-in read to get an s-expression and (3) then using the sexpr->ast function that we’ve written to get our AST.

Your task is to design and implement a parser for the extended Con language based on the following grammar:

<expr> ::= integer

        |  ( <compound> )

        |  [ <compound> ]

 

<compound> ::= <prim> <expr>

            |  if <question> <expr> <expr>

            |  cond <clause>* <else>

 

<prim> ::= add1 | sub1 | abs | -

 

<clause> ::= ( <question> <expr> )

          |  [ <question> <expr> ]

 

<question> ::= ( zero? <expr> )

            |  [ zero? <expr> ]

 

<else> ::= ( else <expr> )

        |  [ else <expr> ]

There is a lexer given to you in lex.rkt, which provides two functions: lex-string and lex-port, which consume a string or an input port, respectively, and produce a list of tokens, which are defined as:

; type Token =
; | Integer
; | 'add1
; | 'sub1
; | 'zero?
; | 'if
; | 'cond
; | 'else
; | 'abs
; | '-
; | 'lparen    ;; (
; | 'rparen    ;; )
; | 'lsquare   ;; [
; | 'rsquare   ;; ]
; | 'eof       ;; end of file

The lexer will take care of reading the #lang racket header and remove any whitespace.

You must complete the code in parse.rkt to implement the parser which constructs an s-expression representing a valid (extended) Con expression, if possible, from a list of tokens. The parse function should have the following signature and must be provided by the module:

; parse : [Listof Token] -> Expr

As an example, parse should produce (add1-e (sub1-e (int-e 7))) if given '(lparen add1 lparen sub1 7 rparen rparen eof).

You should not need to make any changes to lex.rkt.

You may use any approach you’d like to write the parser, but following the recursive descent predictive parsing as studied in CMSC 330 is recommended. See the slides if you need a refresher.

If you want to set things up as done in 330, you can do the following:

(define *input* (box '()))
 
; [Listof Token] -> Expr
(define (parse lot)
  (set-box! *input* lot)
  (let ((e (parse-expr!))
        (_ (match-tok! 'eof)))
    e))
 
; -> Expr
; EFFECT: consume one expression's worth of tokens
(define (parse-expr!)
  (match (look-ahead)
    [... ...]))
 
; -> Token
; Produce (but don't consume) the next token
(define (look-ahead)
  (match (unbox *input*)
    ['() (error "no look ahead available")]
    [(cons t _) t]))
 
; Token -> Token
; EFFECT: consumes one token of input
(define (match-tok! t)
  (match (unbox *input*)
    ['() (error "no token available")]
    [(cons next ts)
     (set-box! *input* ts)
     (unless (equal? t next)
       (error "parse error"))
     t]))

The box, unbox, and set-box! functions correspond to OCaml’s ref, !, and := operators, respectively.

The bang! naming convention is a Scheme convention for marking effectful functions (but it’s just a naming convention).

This construction closely follows the 330 notes.

There is one complication, which is that the grammar requires 2 tokens of look-ahead when parsing a cond in order to determine if the next thing to parse is a <clause> or an <else>.

The simplest solution is just to add a look-ahead2 function that let’s you peek at the second token in the input stream.

As an alternative to the 330 design, you could try to do things functionally with the following set-up:

; [Listof Token] -> Expr
(define (parse lot)
  (match (parse-expr lot)
    [(cons '(eof) e) e]
    [_ (error "parse error")]))
 
; [Listof Token] -> (Pairof [Listof Token] Expr)
(define (parse-expr lot)
  (match lot
    [... ...]))

Here the idea is that each function that corresponds to a non-terminal is given the list of tokens to parse. It produces a pair of things: the remaining tokens after parsing and the thing it parsed. (The functional approach is much easier to test, IMO.)

Once your parser is complete, you can make the noted changes in compile-file.rkt and interp-file.rkt to make use of your own parser and remove the dependence on Racket’s read function.

Testing

You can test your code in several ways:

Note that only a small number of tests are given to you, so you should write additional test cases.

Ther random.rkt module provides a random-expr function for generating random (extended) Con expressions. It is used in the test/compile-rand.rkt file to randomly test compiler correctness.

There is a property-based random tester for the compiler in test/compile-rand.rkt that compiles and runs 500 random programs.

Submitting

Pushing your local repository to github “submits” your work. We will grade the latest submission that occurs before the deadline.