11 Fraud: local binding, variables, and binary operations

We adopt this specialization of Racket’s let syntax so that you can always take a Fraud program and run it in Racket to confirm what it should produce.

Adding a notion of variable binding also means we need to add variables to the syntax of expressions.

Together this leads to the following grammar for Fraud:

Which can be modeled with the following data type definition:

fraud/ast.rkt
#lang racket
;; type Expr =
;; ...
;; | (Let Id Expr Expr)
;; | (Var Id)
;; type Id = Symbol
...
(struct Let (x e1 e2) #:prefab)
(struct Var (x) #:prefab)

Now for binary operations...

You may have noticed that up until this point, evaluating compound expressions in our language always depend upon the result of a single subexpression. For example, (add1 e) depends upon the result of e, (zero? e) depends upon e, and so on. Even expressions that involve multiple subexpressions such as (if e0 e1 e2) really only depend on e0 to determine which of e1 or e2 to evaluate. And in the case of (begin e0 e1), first we determine the value of e0 and then e1, but these values are never combined in any way.

Let’s now consider what happens when we have multiple subexpressions whose results must be combined in order to evaluate an expression. As an example, consider (+ e0 e1). We must evaluate both e0 and e1 and sum their results.

What’s new are the following binary operations:

(+ e0 e1)
(- e0 e1)

This leads to the following revised grammar for Fraud:

We can model it as a datatype as usual:

fraud/ast.rkt
#lang racket
;; type Expr =
;; ...
;; | (Prim2 Op2 Expr Expr)
;; type Op2 = '+ | '-
...
(struct Prim2 (p e1 e2) #:prefab)

11.2 Meaning of Fraud programs

The meaning of Fraud programs depends on the form of the expression and in the case of integers, increments, and decrements, the meaning is the same as in the prior languages.

The two new forms are let-expressions and variables.

the meaning of a let expression (let ((x e0)) e) is the meaning of e (the body of the let) when x means the value of e0 (the right hand side of the let),
the meaning of a variable x depends on the context in which it is bound. It means the value of the right-hand side of the nearest enclosing let expression that binds x. If there is no such enclosing let expression, the variable is meaningless.

Let’s consider some examples:

x: this expression is meaningless on its own.
(let ((x 7)) x): this means 7, since the body expression, x, means 7 because the nearest enclosing binding for x is to 7, which means 7.
(let ((x 7)) 2): this means 2 since the body expression, 2, means 2.
(let ((x 7)) (add1 x)): this means 8 since the body expression, (add1 x), means one more than x and x means 7 because the nearest enclosing binding for x is to 7.
(let ((x (add1 7))) x): this means 8 since the body expression, x, means 8 because the nearest enclosing binding for x is to (add1 7) which means 8.
(let ((x 7)) (let ((y 2)) x)): this means 7 since the body expression, (let ((y 2)) x), means 2 since the body expression, x, means 7 since the nearest enclosing binding for x is to 7.
(let ((x 7)) (let ((x 2)) x)): this means 2 since the body expression, (let ((x 2)) x), means 2 since the body expression, x, means 2 since the nearest enclosing binding for x is to 2.
(let ((x (add1 x))) x): this is meaningless, since the right-hand side expression, (add1 x) is meaningless because x has no enclosing let that binds it.
(let ((x 7)) (let ((x (add1 x))) x)): this means 8 because the body expression (let ((x (add1 x))) x) means 8 because the body expression, x, is bound to (add1 x) is in the nearest enclosing let expression that binds x and (add1 x) means 8 because it is one more than x where x is bound to 7 in the nearest enclosing let that binds it.

Make sure you have a good understanding of how binding work in these examples before moving on. Remember: you can always check your understanding by pasting expressions into Racket and seeing what it produces, or better yet, write examples in DrRacket and hover over identifiers to see arrows between variable bindings and their occurrences.

One thing that should be clear from these examples is that the meaning of a sub-expression is not determined by the form of that expression alone. For example, x could mean 7, or it could mean 8, or it could be meaningless, or it could mean 22, etc. It depends on the context in which it occurs. So in formulating the meaning of an expression, this context must be taken into account.

Thinking more about what information we need to keep track of reveals that when considering the meaning of a let’s body, we need to know that the variable it’s binding means the value of the right-hand expression. Since a program potentially consists of nested let expressions, we will need to keep track of some number of pairs of variables and their meaning. We will refer to this contextual information as an environment.

To keep things simple, we omit the treatment of IO in the semantics, but it’s easy enough to incorporate back in if desired following the template of Evildoer: change the world a couple nibbles at a time.

The meaning of a variable is resolved by looking up its meaning in the environment. The meaning of a let will depend on the meaning of its body with an extended environment that associates its variable binding to the value of the right hand side.

The heart of the semantics is an auxiliary relation, , which relates an expression and an environement to the integer the expression evaluates to (in the given environment):

The rules for dealing with the new forms (variables and lets) are:

These rely on two functions: one for extending an environment with a variable binding and one for lookup up a variable binding in an environment:

The remaining rules are just an adaptation of the existing rules from Extort to thread the environment through. For example, here are just a couple:

And rules for propagating errors through let:

The operational semantics for Fraud is then defined as a binary relation , which says that (e,i) in , only when e evaluates to i in the empty environment according to :

With the semantics of let and variables out of the way, extending the Fraud semantics to hand binary operations is pretty straightforward. For (+ e0 e1), the meaning is the sum of the meanings of e0 and e1, when they mean integers, otherwise the meaning is an error.

The handling of primitives occurs in the following rule:

It makes use of an auxiliary judgment for interpreting primitives:

The interpreter closely mirrors the semantics. The top-level interp function relies on a helper function interp-env that takes an expression and environment and computes the result. It is defined by structural recursion on the expression. Environments are represented as lists of associations between variables and integers. There are two helper functions for ext and lookup:

fraud/interp.rkt
  #lang racket
  (provide interp interp-env)
  (require "ast.rkt" "interp-prim.rkt")

  ;; type Answer = Value | 'err

  ;; type Value =
  ;; | Integer
  ;; | Boolean
  ;; | Character
  ;; | Eof
  ;; | Void

  ;; type REnv = (Listof (List Id Value))

  ;; Expr -> Answer
  (define (interp e)
    (interp-env e '()))

  ;; Expr Env -> Answer
  (define (interp-env e r)
    (match e
      [(Int i) i]
      [(Bool b) b]
      [(Char c) c]
      [(Eof) eof]
      [(Var x) (lookup r x)]
      [(Prim0 p) (interp-prim0 p)]
      [(Prim1 p e)
       (match (interp-env e r)
         ['err 'err]
         [v (interp-prim1 p v)])]
      [(Prim2 p e1 e2)
       (match (interp-env e1 r)
         ['err 'err]
         [v1 (match (interp-env e2 r)
               ['err 'err]
               [v2 (interp-prim2 p v1 v2)])])]
      [(If p e1 e2)
       (match (interp-env p r)
         ['err 'err]
         [v
          (if v
              (interp-env e1 r)
              (interp-env e2 r))])]
      [(Begin e1 e2)
       (match (interp-env e1 r)
         ['err 'err]
         [v    (interp-env e2 r)])]
      [(Let x e1 e2)
       (match (interp-env e1 r)
         ['err 'err]
         [v (interp-env e2 (ext r x v))])]))

  ;; Env Id -> Value
  (define (lookup r x)
    (match r
      [(cons (list y val) r)
       (if (symbol=? x y)
           val
           (lookup r x))]))

  ;; Env Id Value -> Env
  (define (ext r v val)
    (cons (list v val) r))

We can confirm the interpreter computes the right result for the examples given earlier:

Examples
> (interp (parse '(let ((x 7)) x)))
7
> (interp (parse '(let ((x 7)) 2)))
2
> (interp (parse '(let ((x 7)) (add1 x))))
8
> (interp (parse '(let ((x (add1 7))) x)))
8
> (interp (parse '(let ((x 7)) (let ((y 2)) x))))
7
> (interp (parse '(let ((x 7)) (let ((x 2)) x))))
2
> (interp (parse '(let ((x 7)) (let ((x (add1 x))) x))))
8

We can see that it works as expected:

Examples
> (interp (parse '(+ 3 4)))
7
> (interp (parse '(+ 3 (+ 2 2))))
7
> (interp (parse '(+ #f 8)))
'err

Interpreter Correctness: For all Fraud expressions e and values v, if (e,v) in , then (interp e) equals v.

11.3 Lexical Addressing

Just as we did with Dupe: a duplicity of types, the best way of understanding the forthcoming compiler is to write a “low-level” interpreter that explains some of the ideas used in the compiler without getting bogged down in code generation details.

At first glance at interp, it would seem we will need to generate code for implementing the REnv data structure and its associated operations: lookup and ext. REnv is an inductively defined data type, represented in the interpreter as a list of lists. Interpreting a variable involves recursively scanning the environment until the appropriate binding is found. This would take some doing to accomplish in x86.

However...

It is worth noting some invariants about environments created during the running of interp. Consider some subexpression e of the program. What environment will be used whenever e is interpreted? Well, it will be a mapping of every variable bound outside of e. It’s not so easy to figure out what these variables will be bound to, but the skeleton of the environment can be read off from the program structure.

For example:

(let ((x ...))
  (let ((y ...))
    (let ((z ...))
      e)))

The subexpression e will always be evaluated in an environment that looks like:

'((z ...) (y ...) (x ...))

Moreover, every free occurrence of x in e will resolve to the value in the third element of the environment; every free occurrence of y in e will resolve to the second element; etc.

This suggests that variable locations can be resolved statically using lexical addresses. The lexical address of a variable is the number of let-bindings between the variable occurrence and the let that binds it.

So for example:

(let ((x ...)) x): the occurrence of x has a lexical address of 0; there are no bindings between the let that binds x and its occurrence.
(let ((x ...)) (let ((y ...)) x)): the occurrence of x has a lexical address of 1 since the let-binding of y sits between the let-binding of x and its occurrence.

We can view a variable name as just a placeholder for the lexical address; it tells us which binder the variable comes from.

Using this idea, let’s build an alternative interpreter that operates over an intermediate form of expression that has no notion of variables, but just lexical addresses:

; type IExpr =
; | (Eof)
; | (Int Integer)
; | (Bool Boolean)
; | (Char Character)
; | (Prim0 Op0)
; | (Prim1 Op1 IExpr)
; | (Prim2 Op2 IExpr IExpr)
; | (If IExpr IExpr IExpr)
; | (Begin IExpr IExpr)
; | (Let '_ IExpr IExpr)
; | (Var Addr)
; type Addr = Natural

Notice that variables have gone away, replaced by a (Var Natural) form. The let binding form no longer binds a variable name either.

The idea is that we will translate expression (Expr) like:

(Let 'x (Int 7) (Var 'x))

into intermediate expressions (IExpr) like:

(Let '_ (Int 7) (Var 0))

And:

(Let 'x (Int 7) (Let 'y (Int 9) (Var 'x)))

into:

(Let '_ (Int 7) (Let '_ (Int 9) (Var 1)))

Similar to how interp is defined in terms of a helper function that takes an environment mapping variables to their value, the translate function will be defined in terms of a helper function that takes an environment mapping variables to their lexical address.

The lexical environment will just be a list of variable names. The address of a variable occurrence is count of variable names that occur before it in the list. When a variable is bound (via-let) the list grows:

fraud/translate.rkt
  #lang racket
  (provide translate)
  (require "ast.rkt")

  ;; type IExpr =
  ;; | (Eof)
  ;; | (Int Integer)
  ;; | (Bool Boolean)
  ;; | (Char Character)
  ;; | (Prim0 Op0)
  ;; | (Prim1 Op1 IExpr)
  ;; | (Prim2 Op2 IExpr IExpr)
  ;; | (If IExpr IExpr IExpr)
  ;; | (Begin IExpr IExpr)
  ;; | (Let '_ IExpr IExpr)
  ;; | (Var Addr)
  ;; type Addr = Natural

  ;; type LEnv = (Listof Id)

  ;; Expr -> IExpr
  (define (translate e)
    (translate-e e '()))

  ;; Expr LEnv -> IExpr
  (define (translate-e e r)
    (match e
      [(Eof)    e]
      [(Int i)  e]
      [(Bool b) e]
      [(Char c) e]
      [(Prim0 p) e]
      [(Prim1 p e0)
       (Prim1 p (translate-e e0 r))]
      [(Prim2 p e0 e1)
       (Prim2 p
              (translate-e e0 r)
              (translate-e e1 r))]
      [(If e0 e1 e2)
       (If (translate-e e0 r)
           (translate-e e1 r)
           (translate-e e2 r))]
      [(Begin e0 e1)
       (Begin (translate-e e0 r)
              (translate-e e1 r))]
      [(Let x e0 e1)
       (Let '_
            (translate-e e0 r)
            (translate-e e1 (cons x r)))]
      [(Var x)
       (Var (lexical-address x r))]))

  ;; Id LEnv -> Addr
  (define (lexical-address x r)
    (match r
      [(cons y r)
       (match (symbol=? x y)
         [#t 0]
         [#f (add1 (lexical-address x r))])]))

Notice that translate is a kind of mini-compiler that compiles Exprs to IExprs. It’s only job is to eliminate variable names by replacing variable occurrences with their lexical addresses. It does a minor amount of syntax checking while it’s at it by raising a (compile-time) error in the case of unbound variables.

We can try out some examples to confirm it works as expected.

Examples
> (translate (Let 'x (Int 7) (Var 'x)))
'#s(Let _ #s(Int 7) #s(Var 0))
> (translate (Let 'x (Int 7) (Let 'y (Int 9) (Var 'x))))
'#s(Let _ #s(Int 7) #s(Let _ #s(Int 9) #s(Var 1)))

The interpreter for IExprs will still have an environment data structure, however it will be simpler than the association list we started with. The run-time environment will consist only of a list of values; the lexical address of (what used to be a) variable indicates the position in this list. When a value is bound by a let, the list grows:

fraud/interp-lexical.rkt
  #lang racket
  (provide (all-defined-out))
  (require "ast.rkt" "translate.rkt" "interp-prim.rkt")

  ;; type VEnv = (Listof Value)

  ;; Expr -> Answer
  (define (interp e)
    (interp-env (translate e) '()))

  ;; Expr VEnv -> Answer
  (define (interp-env e r)
    (match e
      [(Int i) i]
      [(Bool b) b]
      [(Char c) c]
      [(Eof) eof]
      [(Var a) (list-ref r a)]
      [(Prim0 p) (interp-prim0 p)]
      [(Prim1 p e)
       (match (interp-env e r)
         ['err 'err]
         [v (interp-prim1 p v)])]
      [(Prim2 p e1 e2)
       (match (interp-env e1 r)
         ['err 'err]
         [v1 (match (interp-env e2 r)
               ['err 'err]
               [v2 (interp-prim2 p v1 v2)])])]
      [(If p e1 e2)
       (match (interp-env p r)
         ['err 'err]
         [v
          (if v
              (interp-env e1 r)
              (interp-env e2 r))])]
      [(Begin e1 e2)
       (match (interp-env e1 r)
         ['err 'err]
         [v    (interp-env e2 r)])]
      [(Let '_ e1 e2)
       (match (interp-env e1 r)
         ['err 'err]
         [v (interp-env e2 (cons v r))])]))

Try to convince yourself that the two version of interp compute the same function.

11.4 Compiling lets and variables

Suppose we want to compile (let ((x 7)) (add1 x)). There are two new forms we need to compile: the (let ((x ...)) ...) part and the x part in the body.

We already know how to compile the (add1 ...) part and the 7 part.

What needs to happen? Compiling the 7 part will emit instructions that, when run, leave 7 in the 'rax register. Compiling the (add1 ...) part relies on the result of evaluating it’s subexpression to be in 'rax when it increments it. So, compile the variable binding needs to stash the 7 somewhere and compiling the variable occurrence needs to retrieve that stashed value. After the let expression has been run, the stashed value should go away since the variable is no longer in scope.

This “stashing” of values follows a stack discipline. When entering a let, after the right-hand side has been run, the result should be pushed. When evaluating a variable occurrence, the bound value is on the stack. After exiting the let, the stack can be popped.

Suppose we want to compile (let ((x 7)) (let ((y 2)) (add1 x))). Using the intuition developed so far, we should push 7, push 8, and then run the body. But notice that the value of 'x is no longer on the top of the stack; y is. So to retrieve the value of x we need skip past the y. But calculating these offsets is pretty straightforward. In this example there is one binding between the binding of x and this occurrence. Since we push every time we enter a let and pop every time we leave, the number of bindings between an occurrence and its binder is exactly the offset from the top of the stack we need use; in other words, the compiler uses lexical addresses just like the alternative interperter above and the stack of values plays the role of the run-time envivornment.

Here are the relevant parts of the compiler:

fraud/compile.rkt
;; Expr CEnv -> Asm
(define (compile-e e c)
  (match e
    ; ...
    [(Var x)         (compile-variable x c)]
    [(Let x e1 e2)   (compile-let x e1 e2 c)]))

;; Id CEnv -> Asm
(define (compile-variable x c)
  (let ((i (lookup x c)))
    (seq (Mov rax (Offset rsp i)))))

;; Id Expr Expr CEnv -> Asm
(define (compile-let x e1 e2 c)
  (seq (compile-e e1 c)
       (Push rax)
       (compile-e e2 (cons x c))
       (Add rsp 8)))

;; Id CEnv -> Integer
(define (lookup x cenv)
  (match cenv
    ['() (error "undefined variable:" x)]
    [(cons y rest)
     (match (eq? x y)
       [#t 0]
       [#f (+ 8 (lookup x rest))])]))

Notice that the lookup function computes a lexical address from an identifier and compile-time environment, just like the lexical-address function in translate.rkt. The only difference is addresses are calculated as byte offsets, hence the addition of 8 instead of 1 in the recursive case.

Let’s take a look at some examples.

Examples
> (define (show e)
    (displayln (asm-string (compile (parse e)))))
> (show '(let ((x 7)) x))
        global entry
        default rel
        section .text
        extern peek_byte
        extern read_byte
        extern write_byte
        extern raise_error
entry:
        mov rax, 14
        push rax
        mov rax, [rsp + 0]
        add rsp, 8
        ret
raise_error_align:
        sub rsp, 8
        jmp raise_error
> (show '(let ((x 7)) 2))
        global entry
        default rel
        section .text
        extern peek_byte
        extern read_byte
        extern write_byte
        extern raise_error
entry:
        mov rax, 14
        push rax
        mov rax, 4
        add rsp, 8
        ret
raise_error_align:
        sub rsp, 8
        jmp raise_error
> (show '(let ((x 7)) (add1 x)))
        global entry
        default rel
        section .text
        extern peek_byte
        extern read_byte
        extern write_byte
        extern raise_error
entry:
        mov rax, 14
        push rax
        mov rax, [rsp + 0]
        mov rbx, rax
        and rbx, 1
        cmp rbx, 0
        jne raise_error_align
        add rax, 2
        add rsp, 8
        ret
raise_error_align:
        sub rsp, 8
        jmp raise_error
> (show '(let ((x (add1 7))) x))
        global entry
        default rel
        section .text
        extern peek_byte
        extern read_byte
        extern write_byte
        extern raise_error
entry:
        mov rax, 14
        mov rbx, rax
        and rbx, 1
        cmp rbx, 0
        jne raise_error
        add rax, 2
        push rax
        mov rax, [rsp + 0]
        add rsp, 8
        ret
raise_error_align:
        sub rsp, 8
        jmp raise_error
> (show '(let ((x 7)) (let ((y 2)) x)))
        global entry
        default rel
        section .text
        extern peek_byte
        extern read_byte
        extern write_byte
        extern raise_error
entry:
        mov rax, 14
        push rax
        mov rax, 4
        push rax
        mov rax, [rsp + 8]
        add rsp, 8
        add rsp, 8
        ret
raise_error_align:
        sub rsp, 8
        jmp raise_error
> (show '(let ((x 7)) (let ((x 2)) x)))
        global entry
        default rel
        section .text
        extern peek_byte
        extern read_byte
        extern write_byte
        extern raise_error
entry:
        mov rax, 14
        push rax
        mov rax, 4
        push rax
        mov rax, [rsp + 0]
        add rsp, 8
        add rsp, 8
        ret
raise_error_align:
        sub rsp, 8
        jmp raise_error
> (show '(let ((x 7)) (let ((x (add1 x))) x)))
        global entry
        default rel
        section .text
        extern peek_byte
        extern read_byte
        extern write_byte
        extern raise_error
entry:
        mov rax, 14
        push rax
        mov rax, [rsp + 0]
        mov rbx, rax
        and rbx, 1
        cmp rbx, 0
        jne raise_error_align
        add rax, 2
        push rax
        mov rax, [rsp + 0]
        add rsp, 8
        add rsp, 8
        ret
raise_error_align:
        sub rsp, 8
        jmp raise_error

And running the examples:

Examples
> (current-objs '("runtime.o"))
> (define (tell e)
    (match (asm-interp (compile (parse e)))
      ['err 'err]
      [b (bits->value b)]))
> (tell '(let ((x 7)) x))
7
> (tell '(let ((x 7)) 2))
2
> (tell '(let ((x 7)) (add1 x)))
8
> (tell '(let ((x (add1 7))) x))
8
> (tell '(let ((x 7)) (let ((y 2)) x)))
7
> (tell '(let ((x 7)) (let ((x 2)) x)))
2
> (tell '(let ((x 7)) (let ((x (add1 x))) x)))
8

11.5 Compiling binary operations

Binary expressions are easy to deal with at the level of the semantics and interpreter. However things are more complicated at the level of the compiler.

To see the problem consider blindly following the pattern we used (and ignoring type errors for the moment):

; Expr Expr CEnv -> Asm
(define (compile-+ e0 e1 c)
  (seq (compile-e e0 c)
       (compile-e e1 c)
       (Add 'rax ????)))

The problem here is that executing c0 places its result in register 'rax, but then executing c1 places its result in 'rax, overwriting the value of e0.

It may be tempting to use another register to stash away the result of the first subexpression:

; Expr Expr CEnv -> Asm
(define (compile-+ e0 e1 c)
  (seq (compile-e e0 c)
       (Mov 'r8 'rax)
       (compile-e e1 c)
       (Add 'rax 'r8)))

Can you think of how this could go wrong?

To come up with a general solution to this problem, we need to save the result of e0 and then retrieve it after computing e1 and it’s time to sum.

Note that this issue only comes up when e0 is a serious expression, i.e. an expression that must do some computation. If e0 were a literal integer or a variable, we could emit working code. For example:

; Integer Expr CEnv -> Asm
; A special case for compiling (+ i0 e1)
(define (compile-+-int i0 e1 c)
  (seq (compile-e e1 c)
       (Add 'rax (value->bits i0))))

; Id Expr CEnv -> Asm
; A special case for compiling (+ x0 e1)
(define (compile-+-var x0 e1)
  (let ((i (lookup x0 c)))
    (seq (compile-e e1 c)
         (Add 'rax (Offset 'rsp i)))))

The latter suggests a general solution could be to transform binary primitive applications into a let form that binds the first subexpression to a variable and then uses the compile-+-var function above. The idea is that every time the compiler encounters (+ e0 e1), we transform it to (let ((x e0)) (+ x e1)). For this to work out, x needs to be some variable that doesn’t appear free in e1. This transformation is what’s called ANF (administrative normal form) and is a widely used intermediate representation for compilers.

But, we can also solve the problem more directly by considering the code that is generated for the ANF style expression above.

Consider the lexical address of x in the transformed code above. It is always 0 because the transformation puts the let immediately around the occurrence of x. So if we’re compiling (+ e0 e1) in environment c using this approach, we know the value of e0 will live at (Offset 'rsp 0). There’s no need for a let binding or a fresh variable name. And this observation enables us to write a general purpose compiler for binary primitives that doesn’t require any program transformation: we simply push the value of e0 on the top of the stack and retrieve it later.

Here is a first cut:

; Expr Expr CEnv -> Asm
(define (compile-+ e0 e1 c)
  (let ((x (gensym))) ; generate a fresh variable
    (seq (compile-e e0 c)
         (Push 'rax)
         (compile-e e1 (cons x c))
         (Pop 'r8)
         (Add 'rax 'r8))))

There are a couple things to notice. When compiling e1 in environment (cons x c), we know that no variable in e1 resolves to x because x is a freshly gensym’d symbol. Putting (an unreferenced) x in the environment serves only to “bump up” by one the offset of any variable bound after x so as to not override the spot where e0’s values lives. We can accomplish the same thing by sticking in something that no variable is equal to: #f:

(define (compile-+ e0 e1 c)
  (seq (compile-e e0 c)
       (Push 'rax)
       (compile-e e1 (cons #f c))
       (Pop 'r8)
       (Add 'rax 'r8)))

The relevant bits of the compiler are:

fraud/compile.rkt
#lang racket
;...
;; Expr CEnv -> Asm
(define (compile-e e c)
  (match e
    ; ...
    [(Prim2 p e1 e2) (compile-prim2 p e1 e2 c)]))

;; Op2 Expr Expr CEnv -> Asm
(define (compile-prim2 p e1 e2 c)
  (seq (compile-e e1 c)
       (Push rax)
       (compile-e e2 (cons #f c))
       (match p
         ['+
          (seq (Pop r8)
               (assert-integer r8)
               (assert-integer rax)
               (Add rax r8))]
         ['-
          (seq (Pop r8)
               (assert-integer r8)
               (assert-integer rax)
               (Sub r8 rax)
               (Mov rax r8))])))

11.6 Complete Fraud compiler

The complete code for the compiler:

fraud/compile.rkt
  #lang racket
  (provide (all-defined-out))
  (require "ast.rkt" "types.rkt" a86)

  ;; Registers used
  (define rax 'rax)
  (define rbx 'rbx)
  (define r8  'r8)
  (define rsp 'rsp)
  (define rdi 'rdi)

  ;; type CEnv = [Listof Variable]

  ;; Expr -> Asm
  (define (compile e)
    (prog (Extern 'peek_byte)
          (Extern 'read_byte)
          (Extern 'write_byte)
          (Extern 'raise_error)
          (Label 'entry)
          (compile-e e '())
          (Ret)
          (Label 'raise_error_align)
          (Sub rsp 8)
          (Jmp 'raise_error)))

  ;; Expr CEnv -> Asm
  (define (compile-e e c)
    (match e
      [(Int i)         (compile-value i)]
      [(Bool b)        (compile-value b)]
      [(Char c)        (compile-value c)]
      [(Eof)           (compile-value eof)]
      [(Var x)         (compile-variable x c)]
      [(Prim0 p)       (compile-prim0 p c)]
      [(Prim1 p e)     (compile-prim1 p e c)]
      [(Prim2 p e1 e2) (compile-prim2 p e1 e2 c)]
      [(If e1 e2 e3)   (compile-if e1 e2 e3 c)]
      [(Begin e1 e2)   (compile-begin e1 e2 c)]
      [(Let x e1 e2)   (compile-let x e1 e2 c)]))

  ;; Value -> Asm
  (define (compile-value v)
    (seq (Mov rax (value->bits v))))

  ;; Id CEnv -> Asm
  (define (compile-variable x c)
    (let ((i (lookup x c)))
      (seq (Mov rax (Offset rsp i)))))

  ;; Op0 CEnv -> Asm
  (define (compile-prim0 p c)
    (match p
      ['void      (seq (Mov rax val-void))]
      ['read-byte (seq (pad-stack c)
                       (Call 'read_byte)
                       (unpad-stack c))]
      ['peek-byte (seq (pad-stack c)
                       (Call 'peek_byte)
                       (unpad-stack c))]))

  ;; Op1 Expr CEnv -> Asm
  (define (compile-prim1 p e c)
    (seq (compile-e e c)
         (compile-op1 p c)))

  ;; Op1 CEnv -> Asm
  (define (compile-op1 p c)
    (match p
      ['add1
       (seq (assert-integer rax c)
            (Add rax (value->bits 1)))]
      ['sub1
       (seq (assert-integer rax c)
            (Sub rax (value->bits 1)))]
      ['zero?
       (let ((l1 (gensym)))
         (seq (assert-integer rax c)
              (Cmp rax 0)
              (Mov rax val-true)
              (Je l1)
              (Mov rax val-false)
              (Label l1)))]
      ['char?
       (let ((l1 (gensym)))
         (seq (And rax mask-char)
              (Xor rax type-char)
              (Cmp rax 0)
              (Mov rax val-true)
              (Je l1)
              (Mov rax val-false)
              (Label l1)))]
      ['char->integer
       (seq (assert-char rax c)
            (Sar rax char-shift)
            (Sal rax int-shift))]
      ['integer->char
       (seq (assert-codepoint c)
            (Sar rax int-shift)
            (Sal rax char-shift)
            (Xor rax type-char))]
      ['eof-object?
       (let ((l1 (gensym)))
         (seq (Cmp rax val-eof)
              (Mov rax val-true)
              (Je l1)
              (Mov rax val-false)
              (Label l1)))]
      ['write-byte
       (seq (assert-byte c)
            (pad-stack c)
            (Mov rdi rax)
            (Call 'write_byte)
            (unpad-stack c)
            (Mov rax val-void))]))

  ;; Op2 Expr Expr CEnv -> Asm
  (define (compile-prim2 p e1 e2 c)
    (seq (compile-e e1 c)
         (Push rax)
         (compile-e e2 (cons #f c))
         (compile-op2 p c)))

  ;; Op2 CEnv -> Asm
  (define (compile-op2 p c)
    (match p
      ['+
       (seq (Pop r8)
            (assert-integer r8 c)
            (assert-integer rax c)
            (Add rax r8))]
      ['-
       (seq (Pop r8)
            (assert-integer r8 c)
            (assert-integer rax c)
            (Sub r8 rax)
            (Mov rax r8))]))

  ;; Expr Expr Expr CEnv -> Asm
  (define (compile-if e1 e2 e3 c)
    (let ((l1 (gensym 'if))
          (l2 (gensym 'if)))
      (seq (compile-e e1 c)
           (Cmp rax val-false)
           (Je l1)
           (compile-e e2 c)
           (Jmp l2)
           (Label l1)
           (compile-e e3 c)
           (Label l2))))

  ;; Expr Expr CEnv -> Asm
  (define (compile-begin e1 e2 c)
    (seq (compile-e e1 c)
         (compile-e e2 c)))

  ;; Id Expr Expr CEnv -> Asm
  (define (compile-let x e1 e2 c)
    (seq (compile-e e1 c)
         (Push rax)
         (compile-e e2 (cons x c))
         (Add rsp 8)))

  ;; CEnv -> Asm
  ;; Pad the stack to be aligned for a call
  (define (pad-stack c)
    (match (even? (length c))
      [#t (seq (Sub rsp 8))]
      [#f (seq)]))

  ;; CEnv -> Asm
  ;; Undo the stack alignment after a call
  (define (unpad-stack c)
    (match (even? (length c))
      [#t (seq (Add rsp 8))]
      [#f (seq)]))

  ;; Id CEnv -> Integer
  (define (lookup x cenv)
    (match cenv
      ['() (error "undefined variable:" x)]
      [(cons y rest)
       (match (eq? x y)
         [#t 0]
         [#f (+ 8 (lookup x rest))])]))

  (define (assert-type mask type)
    (λ (arg c)
      (seq (Mov rbx arg)
           (And rbx mask)
           (Cmp rbx type)
           (Jne (error-label c)))))

  (define assert-integer
    (assert-type mask-int type-int))
  (define assert-char
    (assert-type mask-char type-char))

  (define (assert-codepoint c)
    (let ((ok (gensym)))
      (seq (assert-integer rax c)
           (Cmp rax (value->bits 0))
           (Jl (error-label c))
           (Cmp rax (value->bits 1114111))
           (Jg (error-label c))
           (Cmp rax (value->bits 55295))
           (Jl ok)
           (Cmp rax (value->bits 57344))
           (Jg ok)
           (Jmp (error-label c))
           (Label ok))))

  (define (assert-byte c)
    (seq (assert-integer rax c)
         (Cmp rax (value->bits 0))
         (Jl (error-label c))
         (Cmp rax (value->bits 255))
         (Jg (error-label c))))

  ;; CEnv -> Label
  ;; Determine correct error handler label to jump to.
  (define (error-label c)
    (match (even? (length c))
      [#t 'raise_error]
      [#f 'raise_error_align]))

← prev up next →

	Syllabus
	Texts
	Schedule
	Notes
	Assignments
	Midterms
	Project
	Software

1	What is a Compiler?
2	From OCaml to Racket
3	a86: a Little Assembly Language
4	Abscond: a language of numbers
5	Blackmail: incrementing and decrementing
6	Con: branching with conditionals
7	Dupe: a duplicity of types
8	Dodger: addressing a lack of character
9	Evildoer: change the world a couple nibbles at a time
10	Extort: when errors exist
11	Fraud: local binding, variables, and binary operations
12	Hustle: heaps and lists
13	Iniquity: function definitions and calls
14	Juvie: cleaning up after your mess
15	Jig: jumping to tail calls
16	Knock: first-class function (pointers)
17	Loot: lambda the ultimate
18	Shakedown: Calling functions C functions

11.1	Binding, variables, and binary operations
11.2	Meaning of Fraud programs
11.3	Lexical Addressing
11.4	Compiling lets and variables
11.5	Compiling binary operations
11.6	Complete Fraud compiler