On this page:
3.1 Overview
3.2 Giving x86 a try
3.3 a86:   Representing x86 Code as Data
3.4 Stacks:   pushing, popping, calling, returning
3.5 a86 Reference
3.6 Instruction set
register?
label?
instruction?
offset?
seq
prog
%
%%
%%%
Offset
Label
Extern
Call
Ret
Mov
Add
Sub
Cmp
Jmp
Je
Jne
Jl
Jg
And
Or
Xor
Sal
Sar
Push
Pop
Lea
3.7 From a86 to x86
asm-string
3.8 An Interpreter for a86
asm-interp
current-objs
asm-interp/  io
8.1

3 a86: a Little Assembly Language

You need to let the little things that would ordinarily bore you suddenly thrill you.

    3.1 Overview

    3.2 Giving x86 a try

    3.3 a86: Representing x86 Code as Data

    3.4 Stacks: pushing, popping, calling, returning

    3.5 a86 Reference

    3.6 Instruction set

    3.7 From a86 to x86

    3.8 An Interpreter for a86

3.1 Overview

x86 is an instruction set architecture (ISA), which is a fancy way of saying a programming language whose interpreter is implemented in hardware. Really, x86 is a family of instruction set architectures, and the first member of the family was 8086, a 16-bit ISA for Intel’s 8086 chipset.

x86 is old. It’s older than the professors teaching this class. But it lives on today in Intel and AMD based computers in the form x86-64, the 64-bit descendant of the 8086 language.

Because it’s old and because the design of each generation of x86 has had significant backwards compatability requirements and because modern processors are sophisticated pieces of machinery, the x86-64 language is, well, complicated. For example, Intel’s x86 software developer’s manual is 5,066 pages long. AMD’s manual is 3,242 pages.

x86-64 is going to be used as the target language of our compiler. We’ll build up a translation from a very high-level language, based on Racket, down to the very low-level langauge of x86-64.

However, that doesn’t mean we need to digest 8K+ pages of dense technical manuals. We will only use very small fraction of the x86-64 instruction set.

To make our lives easier, we will do what programming language designers often do, we will abstract the behemoth of x86-64 to a small, core language (which we call a86). Our compilers will target a86 and compiling from a86 to x86 as the last step in the compiler pipeline will be dead simple.

This chapter describes the a86 language.

3.2 Giving x86 a try

Before describing a86, let’s take a brief look at x86.

There are a few different human-readable formats for writing x86 assembly code, but we’ll be using the one supported by the Netwide Assembler (NASM).

Here’s an example x86 program, written using nasm syntax. The program has one global label called entry, which will be the main entry point for the program. This program computes the 36th triangular number, which will reside in register rax when the code returns.

The conventions for label names differ between Mac and Linux systems. On MacOS, you need to prefix all label names with an underscore, while on Linux you do not. So on a Mac, you would use the names _entry, _tri, and _done, while on Linux you would use entry, tri, and done.

This example is shown using the Linux naming convention, because that’s what operating system was used when this web page was built.

a86/tri.s

        global entry
        default rel
        section .text
        global entry
entry:
        mov rbx, 36             ; the "input"
;;; tri: a recursive function for computing nth
;;; triangular number, where n is given in rbx.
tri:
        cmp rbx, 0              ; if rbx = 0, done
        je done
        push rbx                ; save rbx
        sub rbx, 1
        call tri                ; compute tri(rbx-1) in rax
        pop rbx                 ; restore rbx
        add rax, rbx            ; result is rbx+tri(rbx-1)
        ret
done:                           ; jump here for base case
        mov rax, 0              ; return 0
        ret

The nth triangular number is the sum of the integers from 0 to n, so the 36th triangular number is 0 + 1 + 2 + 3 + ... + 34 + 35 + 36.

This code is not intended to be a model of efficient computation, rather it demonstrates some important x86 instructions and shows how to compute in a recursive style, even at a low-level.

Without getting too bogged down in the details, here how the code works. Instructions execute one after another. There are a number of registers which can be used to hold values. This code makes use of the rax and rdi register (and some other registers are implicitly used and altered by the call, push, pop and ret instructions). The lines like entry:, tri:, and done: are not instructions, but labels – they are names for locations in the source code and can be used as the targets of jumps and calls.

Suppose we start executing at entry.

Despite the lower-level mechanisms, this code computes in a way similar to a non-tail recursive definition of the tri function written in Racket:

(define (tri n)
  (if (= n 0)
      0
      (+ n (tri (sub1 n)))))
(tri 36)

As an exercise to check your understanding, try writing a tail-recursive version of tri and the corresponding x86 code, which should not need to push anything to the stack or use the call instruction.

We can compile the tri.s assembly program to an object file with nasm:

The format argument should be macho64 on Mac OS and elf64 on Linux.

shell

> nasm -f elf64 -o tri.o tri.s

To run the object file, we will need to link with a small C program that can call the entry label of our assembly code and then print the result:

a86/main.c

#include <stdio.h>
#include <inttypes.h>

int64_t entry();

int main(int argc, char** argv) {
  int64_t result = entry();
  printf("%" PRId64 "\n", result);
  return 0;
}

Notice that from the C program’s perspective, the assembly code defines what looks like a C function called entry that returns an int64_t result.

How does this work? When the C program calls entry it places a return pointer on the stack and jumps to entry. The fact that we decided to put the result in register rax was not arbitrary – that’s the register that by convention is used for a return value. When the assembly code executes it’s final ret instruction, it jumps back to C with the 36th triangular number in rax, which the C side knows is the return value. This convention is part of a larger set of conventions known as the Application Binary Interface. For a reference, see the Texts section of the notes.

We can compile the main.c C program to an object file with gcc:

shell

> gcc -c main.c -o main.o

Now we can make an executable by linking the two together:

shell

> gcc main.o tri.o -o tri

Finally, we can run the executable:

shell

> ./tri
666

There, of course, is a lot more to x86-64 than what’s been shown here. If you want to dig deeper, check the references in Texts. But now let’s turn to a86.

3.3 a86: Representing x86 Code as Data

Here we will employ one of the great ideas in computer science: we will represent programs as data. Rather than toil away at the level of x86, writing programs directly in nasm syntax, compiling, and running them, we will instead design a data type definition for representing x86 programs and compute programs.

Our representation will not be complete – this is going to help us simplify x86 down to something manageable. We won’t cover every instruction and we won’t cover every variation of the instructions we do cover.

An a86 program is a list of a86 instructions. Each instruction is represented as a structure, described in the following section.

Before working through these examples, you’ll need to install the a86 module, part of the langs package for this course. See The langs package for details on installing.

Here’s the triangular number example:

%, %%, and %%% are constructors for assembly comments.

Examples

; import the a86 library
> (require a86)
; a86 code that computes the 36th triangular number
> (define tri-36
    (list (Global 'entry)
          (Label 'entry)
          (Mov 'rbx 36)     (% "the \"input\"")
          (%%% "tri: a recursive function for computing nth")
          (%%% "triangular number, where n is given in rbx.")
          (Label 'tri)
          (Cmp 'rbx 0)      (% "if rbx = 0, done")
          (Je 'done)
          (Push 'rbx)       (% "save rbx")
          (Sub 'rbx 1)
          (Call 'tri)       (% "compute tri(rbx-1) in rax")
          (Pop 'rbx)        (% "restore rbx")
          (Add 'rax 'rbx)   (% "result is rbx+tri(rbx-1)")
          (Ret)
          (Label 'done)     (% "jump here for base case")
          (Mov 'rax 0)      (% "return 0")
          (Ret)))

This code should look familiar. At first glance it’s just the x86 code with more parentheses.

And who doesn’t love more parentheses?

But something fundamental has happended. This a86 program is just a value in Racket. This means we can use Racket as a Meta-Language to write programs that compute with x86 programs.

So for example, let’s say you have two a86 programs and you want to glue them together into one: well that just append. Suppose you want to compute which registers are used in a given a86 program? Suppose you want to replace uses of 'rax with 'rdi? It just a matter of writing the right Racket function to do it.

Here’s another immediate benefit. Instead of writing a single x86 program, let’s write an infinite set of a86 programs, one for computing each nth triangular number. Easy-peasy:

Examples

; Natural -> a86
; Computes a86 code that computes the nth triangular number
> (define (tri n)
    (list (Global 'entry)
          (Label 'entry)
          (Mov 'rbx n)      (% "the \"input\"")
          (%%% "tri: a recursive function for computing nth")
          (%%% "triangular number, where n is given in rbx.")
          (Label 'tri)
          (Cmp 'rbx 0)      (% "if rbx = 0, done")
          (Je 'done)
          (Push 'rbx)       (% "save rbx")
          (Sub 'rbx 1)
          (Call 'tri)       (% "compute tri(rbx-1) in rax")
          (Pop 'rbx)        (% "restore rbx")
          (Add 'rax 'rbx)   (% "result is rbx+tri(rbx-1)")
          (Ret)
          (Label 'done)
          (Mov 'rax 0)
          (Ret)))
; recreate original program
> (define tri-36 (tri 36))

It’s also easy to go from our data representation to its interpretation as an x86 program.

First, we convert the data to a string. There is a function provided for converting from a86 to a string representation of x86 code in nasm notation, called asm-string. You can use display to print this to the current output port (or to a file):

The asm-string function knows what OS you are using and adjusts the label naming convention to use underscores or not, so that you don’t have to worry about it.

Examples

> (display (asm-string (tri 36)))

        global entry

        default rel

        section .text

        global entry

entry:

        mov rbx, 36             ; the "input"

;;; tri: a recursive function for computing nth

;;; triangular number, where n is given in rbx.

tri:

        cmp rbx, 0              ; if rbx = 0, done

        je done

        push rbx                ; save rbx

        sub rbx, 1

        call tri                ; compute tri(rbx-1) in rax

        pop rbx                 ; restore rbx

        add rax, rbx            ; result is rbx+tri(rbx-1)

        ret

done:

        mov rax, 0

        ret

Notice how this generates exactly what you saw in tri.s.

From here, we can assemble, link, and execute.

We can also, since we have a general purpose programming language at our disposal in the meta-language, write a program to do all that for us:

Examples

> (asm-interp (tri 36))

666

The asm-interp function consumes an a86 program as input and produces the integer result the program computes, i.e. it is an Interpreter for a86. Behind the scenes it does this by converting to nasm, assemblying, compiling a thin C wrapper, executing the program, and reading the result. This will be a rather handy tool both in interactively exploring the a86 language (you can write assembly in a REPL), but also an important tool when it comes time to test the compilers we write.

3.4 Stacks: pushing, popping, calling, returning

The a86 execution model includes access to memory that can be used as a stack data structure. There are operations that manipulate the stack, such as Push, Pop, Call, and Ret, and the stack register pointer 'rsp is dedicated to the stack. Stack memory is allocated in “low” address space and grows downward. So pushing an element on to the stack decrements 'rsp.

The stack is useful as a way to save away values that may be needed later. For example, let’s say you have two (assembly-level) functions and you want to produce the sum of their results. By convention, functions return their result in 'rax, so doing something like this won’t work:

(seq (Call 'f)
     (Call 'g)
     (Add 'rax ...))

The problem is the return value of 'f gets clobbered by 'g. You might be tempted to fix the problem by moving the result to another register:

(seq (Call 'f)
     (Mov 'rbx 'rax)
     (Call 'g)
     (Add 'rax 'rbx))

This works only so long as 'g doesn’t clobber 'rbx. In general, it might not be possible to avoid that situation. So the solution is to use the stack to save the return value of 'f while the call to 'g proceeds:

(seq (Call 'f)
     (Push 'rax)
     (Call 'g)
     (Pop 'rbx)
     (Add 'rax 'rbx))

This code pushes the value in 'rax on to the stack and then pops it off and into 'rbx after 'g returns. Everything works out so long as 'g maintains a stack-discipline, i.e. the stack should be in the same state when 'g returns as when it was called.

We can make a complete example to confirm that this works as expected. First let’s set up a little function for letting us try out examples:

Examples

> (define (eg asm)
    (asm-interp
     (prog
      (Global 'entry)
      (Label 'entry)
      asm  ; the example code we want to try out
      (Ret)
  
      (Label 'f)      ; calling 'f returns 36
      (Mov 'rax 36)
      (Ret)
  
      (Label 'g)      ; calling 'g returns 6, but
      (Mov 'rbx 4)    ; it clobbers 'rbx just for the lulz
      (Mov 'rax 6)
      (Ret))))

Now let’s try it, using the stack to confirm it does the right thing:

Examples

> (eg (seq (Call 'f)
           (Push 'rax)
           (Call 'g)
           (Pop 'rbx)
           (Add 'rax 'rbx)))

42

Compare that with the first version that used a register to save the result of 'f:

Examples

> (eg (seq (Call 'f)
           (Mov 'rbx 'rax)
           (Call 'g)
           (Add 'rax 'rbx)))

10

The Push and Pop instructions offer a useful illusion, but of course, there’s not really any data structure abstraction here; there’s just raw memory and registers. But so long as code abides by conventions, the illusion turns out to be the true state of affairs.

What’s really going on under the hood of Push and Pop is that the 'rsp register is decremented and the value is written to the memory location pointed to by the value of 'rsp.

The following code is equivalent to what we wrote above:

Examples

> (eg (seq (Call 'f)
           (Sub 'rsp 8)                ; "allocate" a word on the stack
           (Mov (Offset 'rsp 0) 'rax)  ; write 'rax to top frame
           (Call 'g)
           (Mov 'rbx (Offset 'rsp 0))  ; load top frame into 'rbx
           (Add 'rsp 8)                ; "deallocate" word on the stack
           (Add 'rax 'rbx)))

42

As you can see from this code, it would be easy to violate the usual invariants of stack data structure to, for example, access elements beyond the top of the stack. The value of Push and Pop is they make clear that you are using things in a stack-like way and they keep you from screwing up the accesses, offsets, and adjustments to 'rsp.

Just as Push and Pop are useful illusions, so too are Call and Ret. They give the impression that there is a notion of a procedure and procedure call mechanism in assembly, but actually there’s no such thing.

Think for a moment about what it means to “call” 'f in the examples above. When executing (Call 'f), control jumps to the instruction following (Label 'f). When we then get to (Ret), somehow the CPU knows to jump back to the instruction following the (Call 'f) that we started with.

What’s really going on is that (Call 'f) is pushing the address of subsequent instruction on to the stack and then jumping to the label 'f. This works in concert with Ret, which pops the return address off the stack and jumping to it.

Just as we could write equivalent code without Push and Pop, we can write the same code without Call and Ret.

We do need one new trick, which is the Lea instruction, which loads an effective address. You can think of it like Mov except that it loads the address of something rather than what is pointed to by an address. For our purposes, it is useful for loading the address of a label:

(Lea 'rax 'f)

This instruction puts the address of label 'f into rax. You can think of this as loading a function pointer into 'rax. With this new instruction, we can illuminate what is really going on with Call and Ret:

Examples

> (eg (seq (Lea 'rax 'fret)  ; load address of 'fret label into 'rax
           (Push 'rax)       ; push the return pointer on to stack
           (Jmp 'f)          ; jump to 'f
           (Label 'fret)     ; < return point for "call" to 'f
           (Push 'rax)       ; save result (like before)
           (Lea 'rax 'gret)  ; load address of 'gret label into 'rax
           (Push 'rax)       ; push the return pointer on to stack
           (Jmp 'g)          ; jump to 'g
           (Label 'gret)     ; < return point for "call" to 'g
           (Pop 'rbx)        ; pop saved result from calling 'f
           (Add 'rax 'rbx)))

42

The above shows how to encode Call as Lea, Push, and Jmp. The encoding of Ret is just:

(seq (Pop 'rbx)
     (Jmp 'rbx))
3.5 a86 Reference

 (require a86) package: langs

The a86 language may evolve some over the course of the semester, but we will aim to document any changes by updating this section. Also, because the run-time system changes for each language, you may need to do some work to have asm-interp cooperate with your run-time system.

This module provides all of the bindings from a86/ast,a86/printer, and a86/interp, described below

3.6 Instruction set

 (require a86/ast) package: langs

This section describes the instruction set of a86.

There are 16 registers: 'rax, 'rbx, 'rcx, 'rdx, 'rbp, 'rsp, 'rsi, 'rdi, 'r8, 'r9, 'r10, 'r11, 'r12, 'r13, 'r14, and 'r15. These registers are 64-bits wide. There is also 'eax which accesses the lower 32-bits of 'rax. This is useful in case you need to read or write 32-bits of memory.

Each register plays the same role as in x86, so for example 'rsp holds the current location of the stack.

procedure

(register? x)  boolean?

  x : any/c
A predicate for registers.

procedure

(label? x)  boolean?

  x : any/c
A predicate for label names, i.e. symbols which are not register names.

procedure

(instruction? x)  boolean?

  x : any/c
A predicate for instructions.

procedure

(offset? x)  boolean?

  x : any/c
A predicate for offsets.

procedure

(seq x ...)  (listof instruction?)

  x : (or/c instruction? (listof instruction?))
A convenience function for splicing togeter instructions and lists of instructions.

Examples

> (seq)

'()

> (seq (Label 'foo))

(list (Label 'foo))

> (seq (list (Label 'foo)))

(list (Label 'foo))

> (seq (list (Label 'foo)
             (Mov 'rax 0))
       (Mov 'rdx 'rax)
       (list (Call 'bar)
             (Ret)))

(list

 (Label 'foo)

 (Mov 'rax 0)

 (Mov 'rdx 'rax)

 (Call 'bar)

 (Ret))

procedure

(prog x ...)  (listof instruction?)

  x : (or/c instruction? (listof instruction?))
Like seq, but also checks that the instructions are well-formed in the following sense:

This function is useful to do some early error checking over whole programs and can help avoid confusing NASM errors. Unlike seq it should be called at the outermost level of a function that produces a86 code and not nested.

Examples

> (prog (Label 'foo))

(list (Label 'foo))

> (prog (list (Label 'foo)))

(list (Label 'foo))

> (prog (Mov 'rax 32))

prog: no initial label found

> (prog (Label 'foo)
        (Label 'foo))

prog: duplicate label declaration found: 'foo

> (prog (Jmp 'foo))

prog: undeclared labels found: '(foo)

> (prog (Label 'foo)
        (Jmp 'foo))

(list (Label 'foo) (Jmp 'foo))

struct

(struct % (s))

  s : string?

struct

(struct %% (s))

  s : string?

struct

(struct %%% (s))

  s : string?
Creates a comment in the assembly code. The % constructor adds a comment toward the right side of the current line; %% creates a comment on its own line 1 tab over; %%% creates a comment on its own line aligned to the left.

Examples

> (display
   (asm-string
    (prog (%%% "Start of foo")
          (Label 'foo)
          ; Racket comments won't appear
          (%% "Inputs one argument in rdi")
          (Mov 'rax 'rdi)
          (Add 'rax 'rax)    (% "double it")
          (Sub 'rax 1)       (% "subtract one")
          (%% "we're done!")
          (Ret))))

        global foo

        default rel

        section .text

;;; Start of foo

foo:

        ;; Inputs one argument in rdi

        mov rax, rdi

        add rax, rax            ; double it

        sub rax, 1              ; subtract one

        ;; we're done!

        ret

struct

(struct Offset (r i))

  r : register?
  i : exact-integer?
Creates an memory offset from a register. Offsets are used as arguments to instructions to indicate memory locations. An error is signalled when given invalid inputs.

Examples

> (Offset 'rax 0)

(Offset 'rax 0)

> (Offset 'rax 4.1)

Offset: expects exact integer as second argument; given 4.1

struct

(struct Label (x))

  x : label?
Creates a label from the given symbol. Each label in a program must be unique. Register names cannot be used as label names.

Examples

> (Label 'fred)

(Label 'fred)

> (Label "fred")

Label: expects symbol; given "fred"

> (Label 'rax)

Label: cannot use register as label name; given 'rax

struct

(struct Extern (x))

  x : label?
Declares an external label.

struct

(struct Call (x))

  x : (or/c label? register?)
A call instruction.

Examples

> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Call 'f)
    (Add 'rax 1)
    (Ret)
    (Label 'f)
    (Mov 'rax 41)
    (Ret)))

42

struct

(struct Ret ())

A return instruction.

Examples

> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 42)
    (Ret)))

42

struct

(struct Mov (dst src))

  dst : (or/c register? offset?)
  src : (or/c register? offset? exact-integer?)
A move instruction. Moves src to dst.

Either dst or src may be offsets, but not both.

Examples

> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rbx 42)
    (Mov 'rax 'rbx)
    (Ret)))

42

> (Mov (Offset 'rax 0) (Offset 'rbx 0))

Mov: cannot use two memory locations; given (Offset 'rax 0),

(Offset 'rbx 0)

struct

(struct Add (dst src))

  dst : register?
  src : (or/c register? offset? exact-integer?)
An addition instruction. Adds src to dst and writes the result to dst.

Examples

> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 32)
    (Add 'rax 10)
    (Ret)))

42

struct

(struct Sub (dst src))

  dst : register?
  src : (or/c register? offset? exact-integer?)
A subtraction instruction. Subtracts src frrom dst and writes the result to dst.

Examples

> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 32)
    (Sub 'rax 10)
    (Ret)))

22

struct

(struct Cmp (a1 a2))

  a1 : (or/c register? offset?)
  a2 : (or/c register? offset? exact-integer?)
Compare a1 to a2. Doing a comparison sets the status flags that affect the conditional instructions like Je, Jl, etc.

Examples

> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 42)
    (Cmp 'rax 2)
    (Jg 'l1)
    (Mov 'rax 0)
    (Label 'l1)
    (Ret)))

42

struct

(struct Jmp (x))

  x : (or/c label? register?)
Jump to label x.

Examples

> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 42)
    (Jmp 'l1)
    (Mov 'rax 0)
    (Label 'l1)
    (Ret)))

42

> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 42)
    (Pop 'rbx)
    (Jmp 'rbx)))

42

struct

(struct Je (x))

  x : (or/c label? register?)
Jump to label x if the conditional flag is set to “equal.”

Examples

> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 42)
    (Cmp 'rax 2)
    (Je 'l1)
    (Mov 'rax 0)
    (Label 'l1)
    (Ret)))

0

struct

(struct Jne (x))

  x : (or/c label? register?)
Jump to label x if the conditional flag is set to “not equal.”

Examples

> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 42)
    (Cmp 'rax 2)
    (Jne 'l1)
    (Mov 'rax 0)
    (Label 'l1)
    (Ret)))

42

struct

(struct Jl (x))

  x : (or/c label? register?)
Jump to label x if the conditional flag is set to “less than.”

Examples

> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 42)
    (Cmp 'rax 2)
    (Jl 'l1)
    (Mov 'rax 0)
    (Label 'l1)
    (Ret)))

0

struct

(struct Jg (x))

  x : (or/c label? register?)
Jump to label x if the conditional flag is set to “greater than.”

Examples

> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 42)
    (Cmp 'rax 2)
    (Jg 'l1)
    (Mov 'rax 0)
    (Label 'l1)
    (Ret)))

42

struct

(struct And (dst src))

  dst : (or/c register? offset?)
  src : (or/c register? offset? exact-integer?)
Compute logical “and” of dst and src and put result in dst.

Examples

> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 11) ; #b1011 = 11
    (And 'rax 14) ; #b1110 = 14
    (Ret)))

10

; #b1010 = 10

struct

(struct Or (dst src))

  dst : (or/c register? offset?)
  src : (or/c register? offset? exact-integer?)
Compute logical “or” of dst and src and put result in dst.

Examples

> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 11) ; #b1011 = 11
    (Or 'rax 14)  ; #b1110 = 14
    (Ret)))

15

; #b1111 = 15

struct

(struct Xor (dst src))

  dst : (or/c register? offset?)
  src : (or/c register? offset? exact-integer?)
Compute logical “exclusive or” of dst and src and put result in dst.

Examples

> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 11) ; #b1011 = 11
    (Xor 'rax 14) ; #b1110 = 14
    (Ret)))

5

; #b0101 = 5

struct

(struct Sal (dst i))

  dst : register?
  i : (integer-in 0 63)
Shift dst to the left i bits and put result in dst. The leftmost bits are discarded.

Examples

> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 4) ; #b100 = 4 = 2^2
    (Sal 'rax 6)
    (Ret)))

256

; #b100000000 = 256

struct

(struct Sar (dst i))

  dst : register?
  i : (integer-in 0 63)
Shift dst to the right i bits and put result in dst. The rightmost bits are discarded.

Examples

> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 256) ; #b100000000 = 256
    (Sar 'rax 6)
    (Ret)))

4

; #b100 = 4
> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 269) ; #b100001101 = 269
    (Sar 'rax 6)
    (Ret)))

4

; #b100 = 4

struct

(struct Push (a1))

  a1 : (or/c exact-integer? register?)
Decrements the stack pointer and then stores the source operand on the top of the stack.

Examples

> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 42)
    (Push 'rax)
    (Mov 'rax 0)
    (Pop 'rax)
    (Ret)))

42

struct

(struct Pop (a1))

  a1 : register?
Loads the value from the top of the stack to the destination operand and then increments the stack pointer.

Examples

> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 42)
    (Push 'rax)
    (Mov 'rax 0)
    (Pop 'rax)
    (Ret)))

42

struct

(struct Lea (dst x))

  dst : (or/c register? offset?)
  x : label?
Loads the address of the given label into dst.

Examples

> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Lea 'rbx 'done)
    (Mov 'rax 42)
    (Jmp 'rbx)
    (Mov 'rax 0)
    (Label 'done)
    (Ret)))

42

3.7 From a86 to x86

 (require a86/printer) package: langs

procedure

(asm-string is)  string?

  is : (listof instruction?)
Converts an a86 program to a string in nasm syntax. This is useful in concert with Racket functions for IO in order to write programs using concrete syntax that can be passed to nasm.

Examples

> (asm-string (prog (Global 'entry)
                    (Label 'entry)
                    (Mov 'rax 42)
                    (Ret)))

"        global entry\n        default rel\n        section .text\n        global entry\nentry:\n        mov rax, 42\n        ret\n"

> (display
   (asm-string (prog (Global 'entry)
                     (Label 'entry)
                     (Mov 'rax 42)
                     (Ret))))

        global entry

        default rel

        section .text

        global entry

entry:

        mov rax, 42

        ret

3.8 An Interpreter for a86

 (require a86/interp) package: langs

As you’ve seen throughout this chapter, a86 is equiped with an interpreter, which enables you to run assembly programs from within Racket. This won’t be directly useful in building a compiler, but it will be very handy for interactively exploring assembly programs and making examples and test cases for your compiler.

The simplest form of interpreting an a86 program is to use asm-interp.

procedure

(asm-interp is)  integer?

  is : (listof instruction?)
Assemble, link, and execute an a86 program.

Examples

> (asm-interp (prog (Global 'entry)
                    (Label 'entry)
                    (Mov 'rax 42)
                    (Ret)))

42

Programs do not have to start with 'entry. The interpreter will jump to whatever the first label in the program is:

Examples

> (asm-interp (prog (Global 'f)
                    (Label 'f)
                    (Mov 'rax 42)
                    (Ret)))

42

The argument of asm-interp should be a complete, well-formed a86 program. For best results, always use prog to construct the program so that error checking is done early. If you use prog and asm-interp and you get a NASM syntax error message, please report it to the course staff as this is a bug in the interpreter.

While we try to make syntax errors impossible, it is possible—quite easy, in fact—to write well-formed, but erroneous assembly programs. For example, this program tries to jump to null, which causes a segmentation fault:

Examples

> (asm-interp (prog (Global 'crash)
                    (Label 'crash)
                    (Mov 'rax 0)
                    (Jmp 'rax)))

invalid memory reference.  Some debugging context lost

It is often the case that we want our assembly programs to interact with the oustide or to use functionality implemented in other programming languages. For that reason, it is possible to link in object files to the running of an a86 program.

The mechanism for controlling which objects should be linked in is a parameter called current-objs, which contains a list of paths to object files which are linked to the assembly code when it is interpreted.

parameter

(current-objs)  (listof path-string?)

(current-objs objs)  void?
  objs : (listof path-string?)
 = '()
Parameter that controls object files that will be linked in to assembly code when running asm-interp.

For example, let’s implement a GCD function in C:

a86/gcd.c

int gcd(int n1, int n2) {
    return (n2 == 0) ? n1 : gcd(n2, n1 % n2);
}

First, compile the program to an object file:

shell

> gcc -fPIC -c gcd.c -o gcd.o

The option -fPIC is important; it causes the C compiler to emit “position independent code,” which is what enables Racket to dynamically load and run the code.

Once the object file exists, using the current-objs parameter, we can run code that uses things defined in the C code:

Examples

> (parameterize ((current-objs '("gcd.o")))
    (asm-interp (prog (Extern 'gcd)
                      (Global 'f)
                      (Label 'f)
                      (Mov 'rdi 11571)
                      (Mov 'rsi 1767)
                      (Sub 'rsp 8)
                      (Call 'gcd)
                      (Add 'rsp 8)
                      (Ret))))

57

This will be particularly relevant for writing a compiler where emitted code will make use of functionality defined in a runtime system.

Note that if you forget to set current-objs, you will get a linking error saying a symbol is undefined:

Examples

> (asm-interp (prog (Extern 'gcd)
                    (Global 'f)
                    (Label 'f)
                    (Mov 'rdi 11571)
                    (Mov 'rsi 1767)
                    (Sub 'rsp 8)
                    (Call 'gcd)
                    (Add 'rsp 8)
                    (Ret)))

link error: symbol gcd not defined in linked objects: ()

use `current-objs` to link in object containing symbol

definition.

procedure

(asm-interp/io is in)  (cons integer? string?)

  is : (listof instruction?)
  in : string?
Like asm-interp, but uses in for input and produce the result along with any output as a string.