9 Evildoer: change the world a couple nibbles at a time

8.1

9 Evildoer: change the world a couple nibbles at a time

Warning: Side effects may include itching, burning, oozing, weeping. Not intended for heart patients and those with nervous disorders.

9.1 Reading and writing bytes

9.2 Reading and writing bytes in Racket

9.3 Meaning of Evildoer programs

9.4 A Run-Time for Evildoer

9.5 Calling C functions from a86

9.6 A Compiler for Evildoer

9.1 Reading and writing bytes

So far, the languages we’ve consider have had the following property: the result of evaluation is determined entirely as a function of the expression being evaluated. Which is to say, the meaning of a program is determined entirely by the text of the program. This is a nice property in that it makes reasoning about programs pretty clear cut: if you know the program, you can know exactly what it computes. However, many real-world programs (like the very compiler we are writing!) do not have this property. Instead they interact with the outside world and compute results based on the state of the world.

For example, consider the compile-file.rkt program, which reads the contents of a file from disk and compiles it. The meaning of this program depends on the state of your computer’s hard drive. Similarly, it prints out assembly code to the standard output port. So not only does this program depend on the outside world, it changes it too.

Let’s design a language that has a simple mechanism for interacting with the outside world. It will be able to read and write a byte of information at a time (i.e. an integer between 0 and 256) from the standard input port and output port, respectively.

We’ll call it Evildoer.

To the syntax of expressions, we add the following operations:

write-byte : Byte -> Void: writes given byte to stdout, produces nothing.
read-byte : -> Byte or EOF: reads a byte from stdin, if there is one, EOF otherwise.
peek-byte : -> Byte or EOF: peeks a byte from stdin, if there is one, EOF otherwise.

These operations will behave like their Racket counterparts.

To complement these operations, we add two new values:

void: a value used to indicate something has been done for effect only and has no useful result.
eof: a value used to indicate the end of an input port has been reached.

The void value arises as the result of an expression that is evaluated for effect only, such as write-byte. The eof value arises as the result of reading the end of a port. So that these values can be accessed more directly, we’ll also add:

eof : EOF bound to the end-of-file value, and
void : -> Void a function that produces the void value.

In order to recognize the end-of-file value, we add the following predicate (just as in Racket):

eof-object? : Any -> Boolean: determines if argument is the eof value.

Finally, we add a simple sequencing construct to first evaluate an expression for effect and then evaluate another expression for its result.

(begin e0 e1): evaluates e0, then e1.

Abstract syntax and parsing is done as you would expect. Since we now have primitive operations that take 0 arguments, we split the Prim constructor into Prim0 and Prim1.

evildoer/ast.rkt
  #lang racket
  (provide Eof Int Bool Char Prim0 Prim1 If Begin)

  ;; type Expr =
  ;; | (Eof)
  ;; | (Int Integer)
  ;; | (Bool Boolean)
  ;; | (Char Character)
  ;; | (Prim0 Op0)
  ;; | (Prim1 Op1 Expr)
  ;; | (If Expr Expr Expr)
  ;; | (Begin Expr Expr)
  ;; type Op0 = 'read-byte | 'peek-byte | 'void
  ;; type Op1 = 'add1 | 'sub1 | 'zero?
  ;;          | 'char? | 'integer->char | 'char->integer
  ;;          | 'write-byte | 'eof-object?
  (struct Eof   ()         #:prefab)
  (struct Int   (i)        #:prefab)
  (struct Bool  (b)        #:prefab)
  (struct Char  (c)        #:prefab)
  (struct Prim0 (p)        #:prefab)
  (struct Prim1 (p e)      #:prefab)
  (struct If    (e1 e2 e3) #:prefab)
  (struct Begin (e1 e2)    #:prefab)

The s-expression parser is defined as follows:

evildoer/parse.rkt
  #lang racket
  (provide parse)
  (require "ast.rkt")

  ;; S-Expr -> Expr
  (define (parse s)
    (match s
      ['eof                (Eof)]
      [(? integer?)        (Int s)]
      [(? boolean?)        (Bool s)]
      [(? char?)           (Char s)]
      [(list (? op0? o))   (Prim0 o)]
      [(list (? op1? o) e) (Prim1 o (parse e))]
      [(list 'begin e1 e2) (Begin (parse e1) (parse e2))]
      [(list 'if e1 e2 e3)
       (If (parse e1) (parse e2) (parse e3))]
      [_ (error "Parse error")]))

  ;; Any -> Boolean
  (define (op0? x)
    (memq x '(read-byte peek-byte void)))
  (define (op1? x)
    (memq x '(add1 sub1 zero? char? integer->char char->integer
                   write-byte eof-object?)))

9.2 Reading and writing bytes in Racket

Racket has a Byte data type that is not disjoint from other datatypes; it’s simply an integer in (0,256]. The operations read-byte and write-byte read and write, respectively, a byte to or from stdin or stdout (by default).

Let’s look at an example of write-byte:

Examples
> (write-byte 97)
a
> (write-byte 109)
m

A byte, when written corresponds to an ASCII character, which is why you see a for 97 and m for 109.

A subtle, but crucial point here that these expressions are printing, i.e. writing bytes to stdout. But they don’t produce any value. Or more precisely, they print and then produce a value that indicates "no useful value has been produced." In OCaml, this value is called "unit"; in Racket, it’s called "void." When the REPL is given an expression that produces void as its result, it doesn’t show anything. Here’s any example that uses the Racket void function, which simply returns the void value:

Examples
> (void)
> (+ 2 3)
5
> (void)
> (void)
> "fred"
"fred"

It’s important to note that void is just a value like any other value; it’s not literally "nothing." It’s just handled specially by the REPL in this case by not showing anything. Were we to put the void value within another value, the REPL shows it:

Examples
> (define xs (list (void) (void) 3 (void)))
> xs
'(#<void> #<void> 3 #<void>)
> (length xs)
4
> (first xs)
> (void? (first xs))
#t

So what write-byte is doing is printing something and producing void. If we were to sequence write-byte using begin, would print something and produce a non-void value:

Examples
> (begin (write-byte 97)
#t)
a
#t

Notice how the REPL in the notes is helpfully using color to distinguish the printed output from the program and the result of the program.

Now’s let’s look at read-byte. It takes no arguments and reads a byte from stdin. If there’s no more bytes on stdin, it produces eof. Its cousin peek-byte also gets a byte from stdin, but it leaves the stream intact so that the same byte would be read the next time around.

Now, making examples of read-byte is a bit more tricky. While write-byte interacts with the outside world by printing something to stdout, what it prints is determined by the program text. On the other hand, what read-byte reads depends on what’s in the stdin stream. If you launch Racket and type (read-byte), Racket will then block, waiting for you to type something, so that a byte can be read. It will produce whatever the first byte of what you type is.

So how can I make examples?

One option is to use the operating system shell to “pipe” output from one program as input to the Racket process, which will then read that data as it’s input. Together with printf program, we can write the data we want the Racket program to see on stdin:

shell

> printf 'hello' | racket -e '(read-byte)'
104

shell

> printf 'hello' | racket -e '(list (read-byte) (read-byte))'
'(104 101)

If we pipe the empty string, the program will produce the eof value:

shell

> printf '' | racket -e '(read-byte)'
#<eof>

Another possibility is to use a similar mechanism but from within Racket. The with-input-from-string uses a given string as the data available on stdin. Then (read-byte) will read from this data:

Examples
> (with-input-from-string "hello" (λ () (read-byte)))
104
> (with-input-from-string "hello" (λ () (list (read-byte) (read-byte))))
'(104 101)
> (with-input-from-string "" (λ () (read-byte)))
#<eof>

This uses with-input-from-string which takes a string and a zero-argument function. It then installs the contents of the string in stdin as though you had typed this data, then invokes the function, thereby running a computation with a predetermined input stream.

There’s a matching with-output-to-string function that takes a zero-argument function and runs it in a way that collects the output and produces it as a string. This let’s you capture what was printed and turn it in to a value that is produced:

Examples
> (with-output-to-string (λ () (write-byte 104)))
"h"

These facilities will be useful for making examples, but also for writing test cases, since we can set up the state of the outside world and capture changes as values, which we can then use to assert the expected behavior:

Examples
> (check-equal?
   (with-input-from-string "hello"
     (λ ()
       (with-output-to-string
         (λ ()
           (write-byte (read-byte))))))
   "h")

9.3 Meaning of Evildoer programs

Formulating the semantics of Evildoer is more complicated than the languages we’ve developed so far. Let’s put it off for now and instead focus on the interpreter, which remains basically as simple as before. The reason for this disparity is that math doesn’t have side-effects. The formal semantics will need to account for effectful computations without itself having access to them. Racket, on the other hand, can model effectful computations directly as effectful Racket programs.

Here’s an interpreter for Evildoer:

evildoer/interp.rkt
  #lang racket
  (provide interp)
  (require "ast.rkt" "interp-prim.rkt")

  ;; type Value =
  ;; | Integer
  ;; | Boolean
  ;; | Character
  ;; | Eof
  ;; | Void

  ;; Expr -> Value
  (define (interp e)
    (match e
      [(Int i)  i]
      [(Bool b) b]
      [(Char c) c]
      [(Eof)    eof]
      [(Prim0 p)
       (interp-prim0 p)]
      [(Prim1 p e0)
       (interp-prim1 p (interp e0))]
      [(If e1 e2 e3)
       (if (interp e1)
           (interp e2)
           (interp e3))]
      [(Begin e1 e2)
       (begin (interp e1)
              (interp e2))]))

The interpretation of primitives relies on the underlying implementations read-byte, write-byte, etc. from Racket (just like it does for all the other operations):

evildoer/interp-prim.rkt
  #lang racket
  (provide interp-prim0 interp-prim1)

  ;; Op0 -> Value
  (define (interp-prim0 op)
    (match op
      ['read-byte (read-byte)]
      ['peek-byte (peek-byte)]
      ['void      (void)]))

  ;; Op1 Value -> Value
  (define (interp-prim1 op v)
    (match op
      ['add1          (add1 v)]
      ['sub1          (sub1 v)]
      ['zero?         (zero? v)]
      ['char?         (char? v)]
      ['integer->char (integer->char v)]
      ['char->integer (char->integer v)]
      ['write-byte    (write-byte v)]
      ['eof-object?   (eof-object? v)]))

Interpreting a program that reads and writes will itself read and write:

Examples
> (interp (parse '(write-byte 104)))
h
> (with-input-from-string "hello"
(λ ()
(interp (parse '(write-byte (read-byte))))))
h

We can also build a useful utility for interpreting programs with strings representing stdin and stdout:

evildoer/interp-io.rkt
  #lang racket
  (provide interp/io)
  (require "interp.rkt")

  ;; Expr String -> (Cons Value String)
  ;; Interpret e with given string as input,
  ;; return value and collected output as string
  (define (interp/io e input)
    (parameterize ((current-output-port (open-output-string))
                   (current-input-port  (open-input-string input)))
        (cons (interp e)
              (get-output-string (current-output-port)))))

Examples
> (interp/io (parse '(write-byte 104)) "")
'(#<void> . "h")
> (interp/io (parse '(write-byte (read-byte))) "hello")
'(#<void> . "h")

This is useful to write tests about programs that have side-effects because we can turn what was an effectful computation into a pure one:

Examples
> (check-equal? (interp/io (parse '(write-byte (read-byte))) "hello")
(cons (void) "h"))

OK, so now, what about the formal mathematical model of Evildoer? We have to reconsider the domain of program meanings. No longer does an expression just mean a value; its meaning may depend upon the state of the input and output port. Moreover, an expression may alter the state of these ports.

There are several different approaches we might take to formally model the effects of read-byte and write-byte. We’ll adopt a fairly simple one which is to say that the semantic function now takes and produces a pair of ports, which we can model as a list of bytes. Reading from the input port consumes elements from the input bytes, while writing to the output port appends elements. This is pretty close in spirit to our interp/io facility, but instead of capturing the effects with string ports, we will define the meaning of effects directly.

(Semantics omitted for now.)

9.4 A Run-Time for Evildoer

With new values comes the need to add new bit encodings. So we add new encodings for eof and void:

evildoer/types.h

#ifndef TYPES_H
#define TYPES_H

/*
  Bit layout of values

  Values are either:
  - Integers:   end in  #b0
  - Characters: end in #b01
  - True:              #b11
  - False:            #b111
  - Eof:             #b1011
  - Void:            #b1111
*/
#define int_shift        1
#define int_type_mask    ((1 << int_shift) - 1)
#define int_type_tag     (0 << (int_shift - 1))
#define nonint_type_tag  (1 << (int_shift - 1))
#define char_shift       (int_shift + 1)
#define char_type_mask   ((1 << char_shift) - 1)
#define char_type_tag    ((0 << (char_shift - 1)) | nonint_type_tag)
#define nonchar_type_tag ((1 << (char_shift - 1)) | nonint_type_tag)
#define val_true  ((0 << char_shift) | nonchar_type_tag)
#define val_false ((1 << char_shift) | nonchar_type_tag)
#define val_eof   ((2 << char_shift) | nonchar_type_tag)
#define val_void  ((3 << char_shift) | nonchar_type_tag)

#endif

The main run-time file is extended slightly to take care of printing the new kinds of values (eof and void). Note that a void result causes nothing to be printed:

evildoer/runtime.h

#ifndef RUNTIME_H
#define RUNTIME_H

#include "values.h"

val_t entry();
extern FILE* in;
extern FILE* out;
#endif /* RUNTIME_H */

evildoer/main.c

#include <stdio.h>
#include "values.h"
#include "print.h"
#include "runtime.h"

FILE* in;
FILE* out;

int main(int argc, char** argv)
{
  in = stdin;
  out = stdout;
  
  val_t result;

  result = entry();
  print_result(result);
  if (val_typeof(result) != T_VOID)
    putchar('\n');

  return 0;
}

But the real novelty of the Evildoer run-time is that there will be new functions that implement read-byte, peek-byte, and write-byte; these will be C functions called read_byte, peek_byte and write_byte:

evildoer/io.c

#include <stdio.h>
#include <inttypes.h>
#include "types.h"
#include "values.h"
#include "runtime.h"

val_t read_byte(void)
{
  char c = getc(in);
  return (c == EOF) ? val_wrap_eof() : val_wrap_int(c);  
}

val_t peek_byte(void)
{
  char c = getc(in);
  ungetc(c, in);
  return (c == EOF) ? val_wrap_eof() : val_wrap_int(c);
  
}

val_t write_byte(val_t c)
{
  putc((char) val_unwrap_int(c), out);
  return val_wrap_void();
}

The main novely of the compiler will be that emits code to make calls to these C functions.

9.5 Calling C functions from a86

If you haven’t already, be sure to read up on how calls work in a86: a Little Assembly Language.

Once you brushed up on how calls work, you’ll know you can define labels that behave like functions and call them.

Let’s start by assuming we have a simple stand-in for the run-time system, which is this C program that invokes an assembly program with a label called entry and prints the result:

evildoer/simple.c

#include <stdio.h>
#include <inttypes.h>

int64_t entry();

int main(int argc, char** argv) {
  printf("%" PRId64 "\n", entry());
  return 0;
}

Now, here is a little program that has a function called meaning that returns 42. The main entry point calls meaning, adds 1 to the result, and returns:

Examples
> (define p
    (prog (Global 'entry)
          (Label 'entry)
          (Call 'meaning)
          (Add 'rax 1)
          (Ret)
          (Label 'meaning)
          (Mov 'rax 42)
          (Ret)))

Let’s save it to a file called p.s:

Examples
> (with-output-to-file "p.s"
    (λ ()
      (displayln (asm-string p)))
    #:exists 'truncate)

We can assemble it, link it together with the printer, and run it:

shell

> gcc -c simple.c -o simple.o
> nasm -f elf64 p.s -o p.o
> gcc simple.o p.o -o simple
> ./simple
43

In this case, the meaning label is defined within the same assembly program as entry, although that doesn’t have to be the case. We can separate out the definition of meaning into its own file, so long as we declare in this one that meaning is an external label:

Examples
> (define p
    (prog (Extern 'meaning)
          (Global 'entry)
          (Label 'entry)
          (Call 'meaning)
          (Add 'rax 1)
          (Ret)))
> (define life
    (prog (Global 'meaning)
          (Label 'meaning)
          (Mov 'rax 42)
          (Ret)))

By declaring an external label, we’re saying this program makes use of that label, but doesn’t define it. The definition will come from a later phase where the program is linked against another that provides the definition.

There is an important invariant that has to be maintained once these programs are moved into separate object files though. According to the System V ABI, the stack address must be aligned to 16-bytes before the call instruction. Not maintaining this alignment can result in a segmentation fault. Since the p program is the one doing the calling, it is the one that has to worry about the issue.

Now keep in mind that the p program is itself called by the C program that prints the result. So when the call to p was made, the stack was aligned. In executing the Push instruction, a word, which is 8-byte, was pushed. This means at the point that control transfers to 'entry, the stack is not aligned to a 16-byte boundary. To fix the problem, we can push another element to the stack, making sure to pop it off before returning. We opt to decrement (remember the stack grows toward low memory) and increment to make clear we’re not saving anything; this is just about alignment. The revised p program is:

Examples
> (define p
    (prog (Extern 'meaning)
          (Global 'entry)
          (Label 'entry)
          (Sub 'rsp 8)
          (Call 'meaning)
          (Add 'rax 1)
          (Add 'rsp 8)
          (Ret)))

Now save each program in its nasm format:

Examples
> (with-output-to-file "p.s"
    (λ ()
      (displayln (asm-string p)))
    #:exists 'truncate)
> (with-output-to-file "life.s"
    (λ ()
      (displayln (asm-string life)))
    #:exists 'truncate)

And assemble:

shell

> nasm -f elf64 p.s -o p.o
> nasm -f elf64 life.s -o life.o

Then we can link all the pieces together and run it:

shell

> gcc simple.o p.o life.o -o simple
> ./simple
43

Now if we look at life.s, this is an assembly program that defines the meaning label. We defined it by writing assembly code, but we could’ve just as easily defined it in any other language that can compile to an object file. So let’s write it in C:

evildoer/life.c

#include <inttypes.h>

int64_t meaning(void) {
  return 42;
}

We can compile it to an object file:

shell

> gcc -c life.c -o life.o

This object file will have a single globally visible label called meaning, just like our previous implementation.

We can again link together the pieces and confirm that it still produces the same results:

shell

> gcc simple.o p.o life.o -o simple
> ./simple
43

At this point, we’ve written a little assembly program (p.s) that calls a function named meaning, that was written in C.

One thing that you can infer from this example is that the C compiler generates code for meaning that is like the assembly code we wrote, namely it “returns” a value to the caller by placing a value in 'rax.

The next natural question to ask is, how does an assembly program provide arguments to the call of a C function?

Just as there is a convention that a return value is communicated through 'rax, there are conventions governing the communication of arguments. The conventions are known as an Application Binary Interface or ABI. The set of conventions we’re following is called the System V ABI, and it used by Unix variants like Mac OS, Linux, and BSD systems. (Windows follows a different ABI.)

The convention for arguments is that the first six integer or pointer parameters are passed in the registers 'rdi, 'rsi, 'rdx, 'rcx, 'r8, 'r9. Additional arguments and large arguments such as structs are passed on the stack.

So now let’s try calling a C function that takes a parameter. Here we have a simple C function that doubles it’s input:

evildoer/double.c

#include <inttypes.h>

int64_t dbl(int64_t x) {
  return x + x;
}

We can compile it to an object file:

shell

> gcc -c double.c -o double.o

Now, to call it, the assembly program should put the value of its argument in 'rdi before the call:

Examples
> (define q
    (prog (Extern 'dbl)
          (Global 'entry)
          (Label 'entry)
          (Mov 'rdi 21)
          (Call 'dbl)
          (Add 'rax 1)
          (Ret)))
> (with-output-to-file "q.s"
    (λ ()
      (displayln (asm-string q)))
    #:exists 'truncate)

We can assemble it into an object file:

shell

> nasm -f elf64 q.s -o q.o

And linking everything together and running shows it works as expected:

shell

> gcc simple.o q.o double.o -o simple
> ./simple
43

Now we have all the tools needed to interact with libraries written in C, and really any library object files that adhere to the System V ABI. Perhaps the only remaining wrinkle is how should we deal with the situation in which we are using the registers that are needed to pass parameters in a call? The answer is to save them on the stack and restore them when the call returns. For example, suppose 'rdi held a value we wanted to use after the call to dbl. It’s a bit contrived, but let’s say we want to use 'rdi to hold the constant we’ll add to the result of calling dbl. Now we need to save it before writing the argument. All we need to do is add a push and pop around the call:

Examples
> (define q
    (prog (Extern 'dbl)
          (Global 'entry)
          (Label 'entry)
          (Sub 'rsp 8)
          (Mov 'rdi 1)
          (Push 'rdi)
          (Mov 'rdi 21)
          (Call 'dbl)
          (Pop 'rdi)
          (Add 'rax 'rdi)
          (Add 'rsp 8)
          (Ret)))
> (with-output-to-file "q.s"
    (λ ()
      (displayln (asm-string q)))
    #:exists 'truncate)

shell

> nasm -f elf64 q.s -o q.o
> gcc simple.o q.o double.o -o simple
> ./simple
43

The wrinkle is actually a bit deeper than this too. Suppose we are using other registers, maybe some that are not used for parameters, but nonetheless are registers that the function we’re calling would like to use? Without knowing the details of how the function is implemented, we could be defensive and save everything we’re using with the assumption the called function may clobber anything. But here, the ABI comes into play again. There are conventions around who is responsible for registers in calls. The called function is responsible for maintaining the registers 'rbx, 'rsp, 'rbp, 'r12, 'r13, 'r14, 'r15; these are callee-saved registers. This means we, the callers, don’t have to worry about these registers being clobbered and don’t need to save them to the stack. If the called function wants to use these registers, it’s responsible for saving their value and restoring them before returning. On the other hand, registers 'rax, 'rdi, 'rdx, 'rcx, 'r8, 'r9, 'r10, and 'r11 are caller-saved registers, which means the called function is free to clobber them and if we want to preserve their value across the call, we’ll need to save and restore them.

As a final note, keep in mind that the compiler generates code this is both called and a caller, so it has to be mindful of both sides of the convention. The main entry point entry is called from the C run-time. If the generated code wants to use any of the callee-saved registers, it should save them and restore them before the return that delivers the final result of evaluation. On the other hand, when it calls external functions implemented in C, it is the caller and has to maintain the caller-saved registers.

OK, now let’s use these new powers to write the compiler.

9.6 A Compiler for Evildoer

Implementing eof, void, eof-object? and begin are all straightfoward and don’t really involve anything new.

For peek-byte, read-byte, and write-byte, we generate code that calls the appropriate C function. In the case of write-byte, we arrange for the byte that we’d like to write to be in 'rdi before the call.

Finally, since the emitted code is potentially issuing calls to external functions, we make sure to align the stack to 16-bytes. Rather than do this at each call site, we take advantage of the fact that no other stack changes occur and adjust the stack just once at the entry and exit of the code.

The top-level compiler:

evildoer/compile.rkt
  #lang racket
  (provide (all-defined-out))
  (require "ast.rkt" "types.rkt" "compile-ops.rkt" a86/ast)

  ;; Registers used
  (define rax 'rax)
  (define rsp 'rsp)

  ;; Expr -> Asm
  (define (compile e)
    (prog (Extern 'peek_byte)
          (Extern 'read_byte)
          (Extern 'write_byte)
          (Global 'entry)
          (Label 'entry)
          (Sub rsp 8)
          (compile-e e)
          (Add rsp 8)
          (Ret)))

  ;; Expr -> Asm
  (define (compile-e e)
    (match e
      [(Int i)       (compile-value i)]
      [(Bool b)      (compile-value b)]
      [(Char c)      (compile-value c)]
      [(Eof)         (compile-value eof)]
      [(Prim0 p)     (compile-prim0 p)]
      [(Prim1 p e)   (compile-prim1 p e)]
      [(If e1 e2 e3) (compile-if e1 e2 e3)]
      [(Begin e1 e2) (compile-begin e1 e2)]))

  ;; Value -> Asm
  (define (compile-value v)
    (seq (Mov rax (value->bits v))))

  ;; Op0 -> Asm
  (define (compile-prim0 p)
    (compile-op0 p))

  ;; Op1 Expr -> Asm
  (define (compile-prim1 p e)
    (seq (compile-e e)
         (compile-op1 p)))

  ;; Expr Expr Expr -> Asm
  (define (compile-if e1 e2 e3)
    (let ((l1 (gensym 'if))
          (l2 (gensym 'if)))
      (seq (compile-e e1)
           (Cmp rax val-false)
           (Je l1)
           (compile-e e2)
           (Jmp l2)
           (Label l1)
           (compile-e e3)
           (Label l2))))

  ;; Expr Expr -> Asm
  (define (compile-begin e1 e2)
    (seq (compile-e e1)
         (compile-e e2)))

The primitive operation compiler:

evildoer/compile-ops.rkt
  #lang racket
  (provide (all-defined-out))
  (require "ast.rkt" "types.rkt" a86/ast)

  (define rax 'rax) ; return
  (define rdi 'rdi) ; arg
  (define r8  'r8)  ; scratch in +, -

  ;; Op0 -> Asm
  (define (compile-op0 p)
    (match p
      ['void      (seq (Mov rax val-void))]
      ['read-byte (seq (Call 'read_byte))]
      ['peek-byte (seq (Call 'peek_byte))]))

  ;; Op1 -> Asm
  (define (compile-op1 p)
    (match p
      ['add1 (Add rax (value->bits 1))]
      ['sub1 (Sub rax (value->bits 1))]
      ['zero?
       (let ((l1 (gensym)))
         (seq (Cmp rax 0)
              (Mov rax val-true)
              (Je l1)
              (Mov rax val-false)
              (Label l1)))]
      ['char?
       (let ((l1 (gensym)))
         (seq (And rax mask-char)
              (Xor rax type-char)
              (Cmp rax 0)
              (Mov rax val-true)
              (Je l1)
              (Mov rax val-false)
              (Label l1)))]
      ['char->integer
       (seq (Sar rax char-shift)
            (Sal rax int-shift))]
      ['integer->char
       (seq (Sar rax int-shift)
            (Sal rax char-shift)
            (Xor rax type-char))]
      ['eof-object?
       (let ((l1 (gensym)))
         (seq (Cmp rax val-eof)
              (Mov rax val-true)
              (Je l1)
              (Mov rax val-false)
              (Label l1)))]
      ['write-byte
       (seq (Mov rdi rax)
            (Call 'write_byte)
            (Mov rax val-void))]))

We can continue to interactively try out examples with asm-interp, although there are two issues we need to deal with.

The first is that the asm-interp utility doesn’t know anything about the Evildoer run-time. Hence we need to tell asm-interp to link it in when running an example; otherwise labels like byte_write will be undefined.

The other is that we need to have an asm-interp/io counterpart that is analogous to interp/io, i.e. we need to be able to redirect input and output so that we can run programs in a functional way.

There is a parameter that asm-interp uses called current-objs that can be used to add additional object files to be linked against when running examples.

So for example, to make an example with the dbl function from before, we can do the following:

Examples
> (current-objs '("double.o"))
> (asm-interp
   (prog (Extern 'dbl)
         (Global 'entry)
         (Label 'entry)
         (Mov 'rdi 21)
         (Call 'dbl)
         (Ret)))
42

The other issue is bit uglier to deal with. We need to do this redirection at the C-level. Our solution is write an alternative version of byte.o that has functions for setting the input and out streams that are used in write_byte etc. The implementation of asm-interp/io is expected to be linked against a library that implements these functions and will use them to set up temporary files and redirect input and output there. It’s a hack, but a useful one.

Examples
> (current-objs '("runtime.o"))
> (asm-interp/io
    (prog (Extern 'read_byte)
          (Extern 'write_byte)
          (Global 'entry)
          (Label 'entry)
          (Call 'read_byte)
          (Mov 'rdi 'rax)
          (Call 'write_byte)
          (Mov 'rax 42)
          (Ret))
    "a")
'(42 . "a")

← prev up next →

	Syllabus
	Texts
	Schedule
	Notes
	Assignments
	Midterms
	Project
	Software

1	What is a Compiler?
2	From OCaml to Racket
3	a86: a Little Assembly Language
4	Abscond: a language of numbers
5	Blackmail: incrementing and decrementing
6	Con: branching with conditionals
7	Dupe: a duplicity of types
8	Dodger: addressing a lack of character
9	Evildoer: change the world a couple nibbles at a time
10	Extort: when errors exist
11	Fraud: local binding, variables, and binary operations
12	Hustle: heaps and lists
13	Hoax: vectors and strings
14	Iniquity: function definitions and calls
15	Jig: jumping to tail calls
16	Knock: pattern matching
17	Loot: lambda the ultimate
18	Mug: symbols and interned string literals
19	Mountebank: quote and compound static data
20	Neerdowell: structures

9.1	Reading and writing bytes
9.2	Reading and writing bytes in Racket
9.3	Meaning of Evildoer programs
9.4	A Run-Time for Evildoer
9.5	Calling C functions from a86
9.6	A Compiler for Evildoer