On this page:
5.1 Refinement, take one
5.2 Abstract syntax for Blackmail
5.3 Meaning of Blackmail programs
5.4 An Example of Blackmail compilation
5.5 A Compiler for Blackmail
5.6 Correctness and random testing
5.7 Looking back, looking forward
7.4

5 Blackmail: incrementing and decrementing

Let’s Do It Again!

    5.1 Refinement, take one

    5.2 Abstract syntax for Blackmail

    5.3 Meaning of Blackmail programs

    5.4 An Example of Blackmail compilation

    5.5 A Compiler for Blackmail

    5.6 Correctness and random testing

    5.7 Looking back, looking forward

5.1 Refinement, take one

We’ve seen all the essential peices (a grammar, an AST data type definition, an operational semantics, an interpreter, a compiler, etc.) for implementing a programming language, albeit for an amazingly simple language.

We will now, through a process of iterative refinement, grow the language to have an interesting set of features.

Our second language, which subsumes Abscond, is Blackmail. Expressions in Blackmail include integer literals and increment and decrement operations. It’s still a dead simple language, but at least programs do something.

5.2 Abstract syntax for Blackmail

A Blackmail program consists of a single expression, and the grammar of expressions is:

image

So, 0, 120, and -42 are Blackmail programs, but so are '(add1 0), '(sub1 120), '(add1 (add1 (add1 -42))).

A datatype for representing expressions can be defined as:

; type Expr =
; | Integer
; | (add1 ,Expr)
; | (sub1 ,Expr)

A predicate for recognizing well-formed expressions is more involved than Abscond, but still straightforward:

blackmail/syntax.rkt

  #lang racket
  (provide (all-defined-out))
  (require "ast.rkt")
   
  ;; Any -> Boolean
  ;; Is x a well-formed expression?
  (define (expr? x)
    (match x
      [(? integer? i) #t]
      [`(add1 ,x) (expr? x)]
      [`(sub1 ,x) (expr? x)]
      [_ #f]))
   
  ; SExpr -> AST
  ; convert the s-expr into our AST
  ; This should be a one-to-one mapping for now.
  (define (sexpr->ast s)
    (match s
      [(? integer? s) (int-e s)]
      [`(add1 ,e)     (add1-e (sexpr->ast e))]
      [`(sub1 ,e)     (sub1-e (sexpr->ast e))]
      ['get-int       (get-i)]
      [_              (error "operation not supported")]))
   
5.3 Meaning of Blackmail programs

The meaning of a Blackmail program depends on the form of the expression:

The operational semantics reflects this dependence on the form of the expression by having three rules, one for each kind of expression:

image    image    image

The first rule looks familiar; it’s exactly the semantics of integers from Abscond. The second and third rule are more involved. In particular, they have premises above the line. If the premises are true, the conclusion below the line is true as well. These rules are conditional on the premises being true. This is in contrast to the first rule, which applies unconditionally.

We can understand these rules as saying the following:
  • For all integers i, (i,i) is in image.

  • For expressions e0 and all integers i0 and i1, if (e0,i0) is in image and i1 = i0 + 1, then ('(add1 e0), i1) is in image.

  • For expressions e0 and all integers i0 and i1, if (e0,i0) is in image and i1 = i0 - 1, then ('(sub1 e0), i1) is in image.

These rules are inductive. We start from the meaning of integers and if we have the meaning of an expression, we can construct the meaning of a larger expression.

This may seem a bit strange at the moment, but it helps to view the semantics through its correspondence with an interpreter, which given an expression e, computes an integer i, such that (e,i) is in image.

Just as there are three rules, there will be three cases to the interpreter, one for each form of expression:

blackmail/interp.rkt

  #lang racket
  (provide (all-defined-out))
   
  (require "ast.rkt")
  (require "primitives.rkt")
   
  ;; Expr -> Integer
  (define (interp e)
    (match e
      [(int-e i) i]
      [(add1-e e1) (add1 (interp e1))]
      [(sub1-e e1) (sub1 (interp e1))]
      [(get-i)   (get-int)]))
   

Examples:
> (interp (sexpr->ast 42))

42

> (interp (sexpr->ast -7))

-7

> (interp (sexpr->ast '(add1 42)))

43

> (interp (sexpr->ast '(sub1 8)))

7

> (interp (sexpr->ast '(add1 (add1 (add1 8)))))

11

Here’s how to connect the dots between the semantics and interpreter: the interpreter is computing, for a given expression e, the integer i, such that (e,i) is in image. The interpreter uses pattern matching to determine the form of the expression, which determines which rule of the semantics applies.

This explaination of the correspondence is essentially a proof (by induction) of the interpreter’s correctness:

Interpreter Correctness: For all Blackmail expressions e and integers i, if (e,i) in image, then (interp e) equals i.

5.4 An Example of Blackmail compilation

Just as we did with Abscond, let’s approach writing the compiler by first writing an example.

Suppose we want to compile '(add1 (add1 40)). We already know how to compile the 40: '(mov rax 40). To do the increment (and decrement) we need to know a bit more x86-64. In particular, the add (and sub) instruction is relevant. It increments the contents of a register by some given amount.

Concretely, the program that adds 1 twice to 40 looks like:

blackmail/add1-add1-40.s

    global entry
    extern get_int
    section .text
entry:
    mov rax, 40
    add rax, 1
    add rax, 1
    ret

The runtime stays exactly the same as before.

shell

> make add1-add1-40.run
make[1]: Entering directory `/home/travis/build/cmsc430/www/www/notes/blackmail'
make[1]: `add1-add1-40.run' is up to date.
make[1]: Leaving directory `/home/travis/build/cmsc430/www/www/notes/blackmail'
> ./add1-add1-40.run
42

5.5 A Compiler for Blackmail

To represent these new instructions, we extend the Asm AST data type:

blackmail/asm/ast.rkt

#lang racket
;; type Instruction =
;; ...
;; | `(add ,Arg ,Arg)
;; | `(sub ,Arg ,Arg)

And correspondingly update the printer:

blackmail/asm/printer.rkt

#lang racket
;; Instruction -> String
(define (instr->string i)
  (match i
    ...
    [`(add ,a1 ,a2)
     (string-append "\tadd " (arg->string a1) ", " (arg->string a2) "\n")]
    [`(sub ,a1 ,a2)
     (string-append "\tsub " (arg->string a1) ", " (arg->string a2) "\n")]))

We can now print the assembly of our example from an AST:

Examples

> (asm-display
    '(entry (mov rax 40)
            (add rax 1)
            (add rax 1)
            ret))

global entry

extern get_int

section .text

entry:

mov rax, 40

add rax, 1

add rax, 1

ret

The compiler consists of two functions: the first, which is given a program, emits the entry point and return instructions, invoking another function to compile the expression:

blackmail/compile.rkt

  #lang racket
  (provide (all-defined-out))
   
  (require "ast.rkt")
  (require "primitives.rkt")
   
  ;; Expr -> Asm
  (define (compile e)
    (append '(entry)
            (compile-e e)
            '(ret)))
   
  ;; Expr -> Asm
  (define (compile-e e)
    (match e
      [(int-e i) `((mov rax ,i))]
      [(add1-e e1) (let ((c1 (compile-e e1))) ; c1 : [ASM]
                  `(,@c1
                    (add rax 1)))]
      [(sub1-e e1) (let ((c1 (compile-e e1)))
                  `(,@c1
                    (sub rax 1)))]
      [(get-i)  get-int-asm]))
   

Notice that compile-e is defined by structural recursion, much like the interpreter.

We can now try out a few examples:

Examples

> (compile (sexpr->ast '(add1 (add1 40))))

'(entry (mov rax 40) (add rax 1) (add rax 1) ret)

> (compile (sexpr->ast '(sub1 8)))

'(entry (mov rax 8) (sub rax 1) ret)

> (compile (sexpr->ast '(add1 (add1 (sub1 (add1 -8))))))

'(entry (mov rax -8) (add rax 1) (sub rax 1) (add rax 1) (add rax 1) ret)

And give a command line wrapper for parsing, checking, and compiling files:

blackmail/compile-file.rkt

  #lang racket
  (provide (all-defined-out))
  (require "compile.rkt" "syntax.rkt" "asm/printer.rkt")
   
  ;; String -> Void
  ;; Compile contents of given file name,
  ;; emit asm code on stdout
  (define (main fn)
    (with-input-from-file fn
      (λ ()
        (let ((c (read-line))) ; ignore #lang racket
          (let ((p (read)))
            (asm-display (compile (sexpr->ast p))))))))
   

Here it is in action:

shell

> echo "#lang racket\n(add1 (add1 40))" > add1-add1-40.rkt
> racket -t compile-file.rkt -m add1-add1-40.rkt
    global entry
    extern get_int
    section .text
entry:
    mov rax, 40
    add rax, 1
    add rax, 1
    ret

And using the same Makefile setup as in Abscond, we capture the whole compilation process with a single command:

shell

> make add1-add1-40.run
make[1]: Entering directory `/home/travis/build/cmsc430/www/www/notes/blackmail'
racket -t compile-file.rkt -m add1-add1-40.rkt > add1-add1-40.s
nasm -f elf64 -o add1-add1-40.o add1-add1-40.s
gcc main.o add1-add1-40.o -o add1-add1-40.run
rm add1-add1-40.o
make[1]: Leaving directory `/home/travis/build/cmsc430/www/www/notes/blackmail'
> ./add1-add1-40.run
42

Likewise, to test the compiler from within Racket, we use the same asm/interp.rkt code to encapsulate running assembly code:

Examples

> (asm-interp (compile (sexpr->ast '(add1 (add1 40)))))

42

> (asm-interp (compile (sexpr->ast '(sub1 8))))

7

> (asm-interp (compile (sexpr->ast '(add1 (add1 (sub1 (add1 -8)))))))

-6

5.6 Correctness and random testing

We can state correctness similarly to how it was stated for Abscond:

Compiler Correctness: For all expressions e and integers i, if (e,i) in image, then (asm-interp (compile e)) equals i.

And we can test this claim by comparing the results of running compiled and interpreted programs, leading to the following property, which hopefully holds:

Examples

> (define (check-compiler e)
    (check-eqv? (interp (sexpr->ast e))
                (asm-interp (compile (sexpr->ast e)))))

The problem, however, is that generating random Blackmail programs is less obvious compared to generating random Abscond programs (i.e. random integers). Randomly generating programs for testing is its own well studied and active research area. To side-step this wrinkle, we have provided a small utility for generating random Blackmail programs (random.rkt), which you can use, without needing the understand how it was implemented.

Examples

> (require "random.rkt")
> (random-expr)

32

> (random-expr)

-1

> (random-expr)

'(add1 (add1 1))

> (random-expr)

'(sub1 -1)

> (random-expr)

'(sub1 (sub1 (sub1 (sub1 -1))))

> (asm-display (compile (sexpr->ast (random-expr))))

global entry

extern get_int

section .text

entry:

mov rax, -1

ret

> (for ([i (in-range 10)])
    (check-compiler (random-expr)))
5.7 Looking back, looking forward

We’ve now built two compilers; enough to start observing a pattern.

Recall the phases of a compiler described in What does a Compiler look like?. Let’s identify these peices in the two compilers we’ve written:

Our recipe for building compiler involves:

  1. Build intuition with examples,

  2. Model problem with data types,

  3. Implement compiler via type-transforming-functions,

  4. Validate compiler via tests.

As we move forward, the language we are compiling will grow. As the language grows, you should apply this recipe to grow the compiler along with the language.