Assignment 3: Conditional forms and parsing

7.4

Assignment 3: Conditional forms and parsing

Due: Tues, Sept 29, 11:59PM

The goal of this assignment is to extend a compiler with some simple unary numeric operations and conditional expressions, and to write a parser.

Assignment repository:

https://classroom.github.com/a/YH2mwniz

You are given a repository with a starter compiler for the Con language we studied in class. You are tasked with:

extending the language in a number of ways and
implementing a parser for the language

More primitives

Add the following forms of expression to the Con language:

(abs e): compute the absolute value of e, and
(- e): flips the sign of e, i.e. compute 0-e.

Here’s one possible way to compute the absolute value of the value in rax:

mov rbx, rax

neg rax

cmovl rax, rbx

To do this, you should:

Study ast.rkt and the new forms of expression (i.e. new AST nodes) then update the comment at the top describing what the grammmar should look like.
Study syntax.rkt and make sure you understand ‘expr?‘ and ‘sexpr->ast‘.
Update interp.rkt to correctly interpret these expressions.
Update compile.rkt to correctly compile these expressions.

The neg and cmovl instructions have been included in the given asm code. If you need other x86 instructions, you will need to modify the asm/* code.

Con with Cond

The Con language we studied added a simple form of performing conditional evaluation of sub-expressions:

(if (zero? e0) e1 e2)

However, in the original paper on Lisp, Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I, John McCarthy introduced a generalization of if called “conditional expressions,” which we could add to our language with the following syntax:

(cond [(zero? e-p1) e-a1]
...
[else e-an])

A cond expression has any number of clauses [(zero? e-pi) e-ai] ..., followed by an “else” clause [else en]. (We are using zero? here to avoid having to introduce booleans as a distinct data type; a restriction we will remove later.)

The meaning of a cond expression is computed by evaluating each (zero? e-pi) in order until the first one that is true is found, in which case, the corresponding e-ai is evaluated and its value is the value of the cond expression. If no such e-pi exists, e-an’s value is the value of the cond.

The formal semantics can be defined as:

Your task is to extend Con with this (restricted) form of cond.

To do this, you should:

Study ast.rkt to understand the cond-e and clause AST nodes.
Study syntax.rkt and make sure you understand the ‘cond-e‘ and ‘clause‘ aspects of ‘expr?‘ and ‘sexpr->ast‘.
Update interp.rkt to correctly interpret cond expressions.
Update compile.rkt to correctly compile cond expressions.

The subset of x86 needed to compile this extension of Con should not require anything more than what was used for Con without cond, so you should not need make changes to asm/*.

Reading is Overrated

We have so far side-stepped the issue of parsing by (1) relying on s-expression notation for the concrete syntax of programs and (2) using the built-in read to get an s-expression and (3) then using the sexpr->ast function that we’ve written to get our AST.

Your task is to design and implement a parser for the extended Con language based on the following grammar:

<expr> ::= integer

| ( <compound> )

| [ <compound> ]

| if <question> <expr> <expr>

| cond <clause>* <else>

<prim> ::= add1 | sub1 | abs | -

<clause> ::= ( <question> <expr> )

| [ <question> <expr> ]

<question> ::= ( zero? <expr> )

| [ zero? <expr> ]

<else> ::= ( else <expr> )

| [ else <expr> ]

There is a lexer given to you in lex.rkt, which provides two functions: lex-string and lex-port, which consume a string or an input port, respectively, and produce a list of tokens, which are defined as:

; type Token =
; | Integer
; | 'add1
; | 'sub1
; | 'zero?
; | 'if
; | 'cond
; | 'else
; | 'abs
; | '-
; | 'lparen    ;; (
; | 'rparen    ;; )
; | 'lsquare   ;; [
; | 'rsquare   ;; ]
; | 'eof       ;; end of file

The lexer will take care of reading the #lang racket header and remove any whitespace.

You must complete the code in parse.rkt to implement the parser which constructs an s-expression representing a valid (extended) Con expression, if possible, from a list of tokens. The parse function should have the following signature and must be provided by the module:

; parse : [Listof Token] -> Expr

As an example, parse should produce (add1-e (sub1-e (int-e 7))) if given '(lparen add1 lparen sub1 7 rparen rparen eof).

You should not need to make any changes to lex.rkt.

You may use any approach you’d like to write the parser, but following the recursive descent predictive parsing as studied in CMSC 330 is recommended. See the slides if you need a refresher.

If you want to set things up as done in 330, you can do the following:

(define *input* (box '()))

; [Listof Token] -> Expr
(define (parse lot)
  (set-box! *input* lot)
  (let ((e (parse-expr!))
        (_ (match-tok! 'eof)))
    e))

; -> Expr
; EFFECT: consume one expression's worth of tokens
(define (parse-expr!)
  (match (look-ahead)
    [... ...]))

; -> Token
; Produce (but don't consume) the next token
(define (look-ahead)
  (match (unbox *input*)
    ['() (error "no look ahead available")]
    [(cons t _) t]))

; Token -> Token
; EFFECT: consumes one token of input
(define (match-tok! t)
  (match (unbox *input*)
    ['() (error "no token available")]
    [(cons next ts)
     (set-box! *input* ts)
     (unless (equal? t next)
       (error "parse error"))
     t]))

The box, unbox, and set-box! functions correspond to OCaml’s ref, !, and := operators, respectively.

The bang! naming convention is a Scheme convention for marking effectful functions (but it’s just a naming convention).

This construction closely follows the 330 notes.

There is one complication, which is that the grammar requires 2 tokens of look-ahead when parsing a cond in order to determine if the next thing to parse is a <clause> or an <else>.

The simplest solution is just to add a look-ahead2 function that let’s you peek at the second token in the input stream.

As an alternative to the 330 design, you could try to do things functionally with the following set-up:

; [Listof Token] -> Expr
(define (parse lot)
  (match (parse-expr lot)
    [(cons '(eof) e) e]
    [_ (error "parse error")]))

; [Listof Token] -> (Pairof [Listof Token] Expr)
(define (parse-expr lot)
  (match lot
    [... ...]))

Here the idea is that each function that corresponds to a non-terminal is given the list of tokens to parse. It produces a pair of things: the remaining tokens after parsing and the thing it parsed. (The functional approach is much easier to test, IMO.)

Once your parser is complete, you can make the noted changes in compile-file.rkt and interp-file.rkt to make use of your own parser and remove the dependence on Racket’s read function.

Testing

You can test your code in several ways:

Using the command line raco test . from the directory containing the repository to test everything.
Using the command line raco test <file> to test only <file>.
Pushing to github. You can see test reports at:
https://travis-ci.com/cmsc430/
(You will need to be signed in in order see results for your private repo.)

Note that only a small number of tests are given to you, so you should write additional test cases.

Ther random.rkt module provides a random-expr function for generating random (extended) Con expressions. It is used in the test/compile-rand.rkt file to randomly test compiler correctness.

There is a property-based random tester for the compiler in test/compile-rand.rkt that compiles and runs 500 random programs.

Submitting

Pushing your local repository to github “submits” your work. We will grade the latest submission that occurs before the deadline.

← prev up next →

	Syllabus
	Texts
	Schedule
	Notes
	Assignments
	Midterms
	Project
	Racket

	Assignment 1: Learning about Programming Languages
	Assignment 2: Racket Primer
	Assignment 3: Conditional forms and parsing
	Assignment 4: Let there be Variables, Characters
	Assignment 5: A Heap of Characters
	Assignment 6: Apply, arity checking, and variable arity functions