7.9

#### 13Iniquity: function definitions and calls

##### 13.1Functions

With Hustle: heaps and lists, we removed a major computational shortcoming by adding the ability to use inductively defined data. Doing so gives programmers the ability to represent arbitrarily large pieces of information.

And yet, the language remains hamstrung. It has no mechanism to compute with such data. Sure, a programmer could compute the sum of the first n elements of a list, for some fixed n. But the size of this program would be proportional to the size of n. Want to compute the sum of a billion element list? You’ll need (at least) a billion expressions. Want to compute the sum of a larger list? Write a longer program! But if you want to compute the sum of any list, regardless of its size? You’ll need an arbitrarily long program. Of course programs are always of some fixed size, since after all, you have to write them down and at some point you have to stop writing. This means the expressiveness of our language is still severely restricted.

The solution is to bring in the computational analog of inductive data. When you have arbitrarily large data, you need arbitrarily long running computations to process them. Crucially, these arbitrarily long running computations need to be described by finite sized programs. The analog of inductive data are recursive functions.

So let’s now remove the computational shackles by incorporating functions, and in particular, recursive functions, which will allow us to compute over arbitrarily large data with finite-sized programs.

Let’s call it Iniquity.

We will extend the syntax by introducing a new syntactic category of programs, which have the shape:

 (begin (define (f0 x0 ...) e0) (define (f1 x1 ...) e1) ... e)

And the syntax of expressions will be extended to include function calls:

(fi e0 ...)

where fi is one of the function names defined in the program.

Note that functions can have any number of parameters and, symmetrically, calls can have any number of arguments. A program consists of zero or more function definitions followed by an expression.

##### 13.2An Interpreter for Functions

Writing an interpreter for Inquity is not too hard. The main idea is that the interpretation of expression is now parameterized by a set of function definitions from the program. It serves as a second kind of environment that gets passed around and is used to resolve function definitions when interpreting function calls.

The way a function call is interpreted is to first interpret all of the arguments, building up a list of results. Then the definition of the function being called is looked up. If the function has the same number of parameters as there are arguments in the call, the body of the function is interpreted in an enviorment that maps each parameter to to the corresponding argument. That’s it.

iniquity/interp.rkt

 #lang racket (provide interp interp-env interp-prim1) (require "ast.rkt" "env.rkt" "interp-prims.rkt") ;; type Answer = Value | 'err ;; type Value = ;; | Integer ;; | Boolean ;; | Character ;; | Eof ;; | Void ;; | '() ;; | (cons Value Value) ;; | (box Value) ;; type REnv = (Listof (List Id Value)) ;; type Defns = (Listof Defn) ;; Prog Defns -> Answer (define (interp p) (match p [(Prog ds e) (interp-env e '() ds)])) ;; Expr Env Defns -> Answer (define (interp-env e r ds) (match e [(Int i)  i] [(Bool b) b] [(Char c) c] [(Eof)    eof] [(Empty)  '()] [(Var x)  (lookup r x)] [(Prim0 'void) (void)] [(Prim0 'read-byte) (read-byte)] [(Prim0 'peek-byte) (peek-byte)] [(Prim1 p e) (match (interp-env e r ds) ['err 'err] [v (interp-prim1 p v)])] [(Prim2 p e1 e2) (match (interp-env e1 r ds) ['err 'err] [v1 (match (interp-env e2 r ds) ['err 'err] [v2 (interp-prim2 p v1 v2)])])] [(If p e1 e2) (match (interp-env p r ds) ['err 'err] [v (if v (interp-env e1 r ds) (interp-env e2 r ds))])] [(Begin e1 e2) (match (interp-env e1 r ds) ['err 'err] [_ (interp-env e2 r ds)])] [(Let x e1 e2) (match (interp-env e1 r ds) ['err 'err] [v (interp-env e2 (ext r x v) ds)])] [(App f es) (match (interp-env* es r ds) [(list vs ...) (match (defns-lookup ds f) [(Defn f xs e) ; check arity matches (if (= (length xs) (length vs)) (interp-env e (zip xs vs) ds) 'err)])] [_ 'err])])) ;; (Listof Expr) REnv Defns -> (Listof Value) | 'err (define (interp-env* es r ds) (match es ['() '()] [(cons e es) (match (interp-env e r ds) ['err 'err] [v (cons v (interp-env* es r ds))])])) ;; Defns Symbol -> Defn (define (defns-lookup ds f) (findf (match-lambda [(Defn g _ _) (eq? f g)]) ds)) (define (zip xs ys) (match* (xs ys) [('() '()) '()] [((cons x xs) (cons y ys)) (cons (list x y) (zip xs ys))]))

A couple of things to note:

• since the function definition environment is passed along even when interpreting the body of function definitions, this interpretation supports recursion, and even mutual recursion.

• functions are not values (yet). We cannot bind a variable to a function. We cannot make a list of functions. We cannot compute a function. The first position of a function call is a function name, not an arbitrary expression. Nevertheless, we have significantly increased the expressivity of our language.

We can try it out:

Examples

 > (interp (parse '(begin (define (double x) (+ x x)) (double 5))))

10

We can see it works with recursive functions, too. Here’s a recursive function for computing triangular numbers:

Examples

 > (interp (parse '(begin (define (tri x) (if (zero? x) 0 (+ x (tri (sub1 x))))) (tri 9))))

45

We can even define mutually recursive functions such as even? and odd?:

Examples

 > (interp (parse '(begin (define (even? x) (if (zero? x) #t (odd? (sub1 x)))) (define (odd? x) (if (zero? x) #f (even? (sub1 x)))) (even? 101))))

#f

##### 13.3Compiling a Call

Turning to compilation, let’s start small by supposing we have a single, pre-defined function and we add to the language the ability to call this function.

A function in assembly has an entry point (a label), followed by a sequence of instruction, ending with the ret instruction. As a convention, we will pass all arguments to a function on the stack.

So here is Asm representing a single function named double

 (seq (Label 'double) (Mov 'rax (Offset 'rsp -1)) (Add 'rax 'rax) (Ret))

This function takes one argument from the stack, adds it to itself, leaving the result in 'rax when it returns.

The Ret instruction works in concert with the Call instruction, which can be given a label, designating which function to call.

So if we wanted to call double with an argument of 5, we’d first need to write 5 in to the approrpriate spot in the stack, then issue the (Call 'double) instruction.

Since the double code is reading from offset -1 from 'rsp, it is tempting to assume this is where you should write the argument:

 (seq (Mov (Offset rsp -1) 5) (Call 'double) (Add 'rax 1)) ; rax now holds 11

The problem is here is that the Call instruction works by modifying the 'rsp register.

Remember how 'rsp points to an “occupied” memory location and we said we just leave whatever is there alone? We can now explain what’s going on.

The Call instruction advances 'rsp to the next word of memory and writes the location of the instruction that occurs after the Call instruction. This is a return pointer. It then jumps to the beginning of the instruction sequence after the label that is the argument of Call. Those instruction execute and when we get to Ret, the return instruction reads that address stored in (Offset 'rsp 0), moves 'rsp back one word, and jumps to the instruction pointed to by the return pointer.

So calls and returns in assembly are really just shorthand for:
• pushing an address (where to return) on the stack

• jumping to a label

• executing some code

• poping the return point off the stack and jumping to it

The problem with the function call we wrote above is that we put the argument in (Offset 'rsp -1), but then the Call advances (by decrementing) the 'rsp register and writes the return point in (Offset 'rsp 0), but that’s exactly where we had put the argument!

The solution then, is to put the argument at index -2 from the caller’s perspective. When the call is made, it will be at index -1 from the function’s perspective:

 (seq (Mov (Offset 'rsp -2) 5) (Call 'double) (Add 'rax 1)) ; rax now holds 11

Now that we have seen how to make a call and return in assembly, we can tackle code generation for a function call (double e) in our language.

 (define (compile-call-double e0 c) (seq (compile-e e0 c) (Mov (Offset 'rsp -2) 'rax) (Call 'double)))

This will work if the program consists only of a call to double, however it doesn’t work in general.

To see the problem, notice how the call code always uses the index -2 for the first argument and index -1 will hold the return pointer when the call is made. But what if those spots are occuppied on the stack!? The problem is that we’ve always calculated stack offsets statically and never mutated 'rsp. But Call expects 'rsp to be pointing to the top of the stack.

The solution is to emit code that will adjust 'rsp to the top of (our statically calculated) stack. How much does 'rsp need to change? It needs to be decremented by the number of items in the static environment, c. We can adjust 'rsp, make the call, but after the call returns, we can adjust 'rsp back to where it was before the call.

The code is:

 ; Expr CEnv -> Asm (define (compile-call-double e0 c) (let ((h  (* 8 (length c)))) (seq (compile-e e0 c) (Sub 'rsp h) (Mov (Offset 'rsp -2) rax) ; place result of e0 in stack (Call 'double) (Add 'rsp h))))

This makes calls work in any stack context.

It’s easy to generalize this code to call any given function name:

 ; Id Expr CEnv -> Asm (define (compile-call f e0 c) (let ((h  (* 8 (length c)))) (seq (compile-e e0 c) (Sub 'rsp h) (Mov (Offset 'rsp -2) 'rax) (Call f) (Add 'rsp h))))

If we want accept any number of arguments, we have to do a little more work.

We rely on the following helpful function for compiling a list of expressions and saving the results on the stack:

 ; (Listof Expr) CEnv -> Asm (define (compile-es es c) (match es ['() '()] [(cons e es) (seq (compile-e e c) (Mov (Offset 'rsp (- (add1 (length c)))) 'rax) (compile-es es (cons #f c)))]))

So to compile a call with any number of arguments:

 ; Id (Listof Expr) CEnv -> Asm (define (compile-call f es c) (let ((h  (* 8 (length c)))) (seq (compile-es es (cons #f c)) (Sub 'rsp h) (Call f) (Add 'rsp h))))

Notice that we call compile-es in an extended static environment, that has one addition slot used. This will bump the location of all the argument results by one, leaving the first slot available for the return pointer!

##### 13.4Compiling a Function Definition

Now that we can compile calls, we just need to be able to compile function definitions such as:

 (define (double x) (+ x x))

The idea here is pretty simple. The compiler needs to emit a label for the function, such as 'double, followed by the instructions for the body of the function.

The body of the function has a single free variable, x. We can compile the expression in a static environement '(x) so that it resolves this variable to the first position on the stack, which, thanks to the code we emit for calls, will hold the argument value.

After the instructions for the body, a (Ret) instruction is emitted so that control transfers back to the caller.

So the code for compiling a function definition is:

 ; Id Id Expr -> Asm (define (compile-define f x e0) (seq (Label f) (compile-e e0 (list x)) (Ret)))

What about functions that take zero or more arguments? That’s easy, just compile the body in an appropriate static environment.

 ; Id (Listof Id) Expr -> Asm (define (compile-define f xs e0) (seq (Label f) (compile-e e0 (reverse xs)) (Ret)))

(Note that we reverse the parameter list due to the order in which arguments are added to the stack.)

##### 13.5On Names and Labels

There is one final wrinkle, which is that identifiers in our language include many things which are not valid labels for the Nasm assembler. Hence compiling a function like:

(define (^weird% x) x)

will cause the assembler to reject the emitted code since '^weird% is not a valid label name. Labels must consist only of letters, numbers, _, \$, ?, @, ~, and ?.

We solve this problem by using a function that maps arbitrary Racket symbols to valid Nasm labels (represented as symbols). The function has the property distinct symbols always map to distinct labels.

Examples

 > (symbol->label '^weird%) 'label__weird__c3e020e4e5471e4

Using this function, we can touch up our code:

 ; Id (Listof Expr) CEnv -> Asm (define (compile-call f es c) (let ((h  (* 8 (length c)))) (seq (compile-es es (cons #f c)) (Sub 'rsp h) (Call (symbol->label f)) (Add 'rsp h)))) ; Id (Listof Id) Expr -> Asm (define (compile-define f xs e0) (seq (Label (symbol->label f)) (compile-e e0 (reverse xs)) (Ret)))
##### 13.6A Compiler for Iniquity

The last piece of the puzzle is the function for emitting code for a complete program:

 ; Prog -> Asm (define (compile p) (match p [(Prog ds e) (seq (compile-entry e) (compile-defines ds))]))

It relies on a helper compile-defines for compiling each function definition and flattening the assembly instructions into a single list:

 ; [Listof Defn] -> Asm (define (compile-defines ds) (match ds ['() (seq)] [(cons d ds) (seq (compile-define d) (compile-defines ds))]))

Here’s an example of the code this compiler emits:

Examples

 > (displayln (asm-string (compile (parse '(begin (define (double x) (+ x x)) (double 5))))))
 global entry default rel section .text extern peek_byte extern read_byte extern write_byte extern raise_error entry: mov rbx, rdi ;; begin #s(App double (#s(Int 5))) ;; begin #s(Int 5) mov rax, 80 ;; end #s(Int 5) push rax call label_double_6334fa372629b92 add rsp, 8 ;; end #s(App double (#s(Int 5))) mov rdx, rbx ret label_double_6334fa372629b92: ;; begin #s(Prim2 + #s(Var x) #s(Var x)) ;; begin #s(Var x) mov rax, [rsp + 8] ;; end #s(Var x) push rax ;; begin #s(Var x) mov rax, [rsp + 16] ;; end #s(Var x) pop r8 mov r9, r8 and r9, 15 cmp r9, 0 jne raise_error mov r9, rax and r9, 15 cmp r9, 0 jne raise_error add rax, r8 ;; end #s(Prim2 + #s(Var x) #s(Var x)) ret

And we can confirm running the code produces results consistent with the interpreter:

Examples

> (current-objs '("runtime.o"))
 > (define (run e) (asm-interp (compile (parse e))))
 > (run '(begin (define (double x) (+ x x)) (double 5)))

'(#<cpointer> . 160)

 > (run '(begin (define (tri x) (if (zero? x) 0 (+ x (tri (sub1 x))))) (tri 9)))

'(#<cpointer> . 720)

 > (run '(begin (define (even? x) (if (zero? x) #t (odd? (sub1 x)))) (define (odd? x) (if (zero? x) #f (even? (sub1 x)))) (even? 101)))

'(#<cpointer> . 56)

The complete compiler code:

iniquity/compile.rkt

 #lang racket (provide (all-defined-out)) (require "ast.rkt" "types.rkt" a86/ast) ;; Registers used (define rax 'rax) ; return (define rbx 'rbx) ; heap (define rdx 'rdx) ; return, 2 (define r8  'r8)  ; scratch in +, - (define r9  'r9)  ; scratch in assert-type (define rsp 'rsp) ; stack (define rdi 'rdi) ; arg ;; type CEnv = [Listof Variable] ;; Expr -> Asm (define (compile p) (match p [(Prog ds e) (prog (Extern 'peek_byte) (Extern 'read_byte) (Extern 'write_byte) (Extern 'raise_error) (Label 'entry) (Mov rbx rdi) ; recv heap pointer (compile-e e '(#f)) (Mov rdx rbx) ; return heap pointer in second return register (Ret) (compile-defines ds))])) ;; [Listof Defn] -> Asm (define (compile-defines ds) (match ds ['() (seq)] [(cons d ds) (seq (compile-define d) (compile-defines ds))])) ;; Defn -> Asm (define (compile-define d) (match d [(Defn f xs e) (seq (Label (symbol->label f)) (compile-e e (parity (cons #f (reverse xs)))) (Ret))])) (define (parity c) (if (even? (length c)) (append c (list #f)) c)) ;; Expr CEnv -> Asm (define (compile-e e c) (seq (%% (format "begin ~a" e)) (match e [(Int i)            (compile-value i)] [(Bool b)           (compile-value b)] [(Char c)           (compile-value c)] [(Eof)              (compile-value eof)] [(Empty)            (compile-value '())] [(Var x)            (compile-variable x c)] [(App f es)         (compile-app f es c)] [(Prim0 p)          (compile-prim0 p c)] [(Prim1 p e)        (compile-prim1 p e c)] [(Prim2 p e1 e2)    (compile-prim2 p e1 e2 c)] [(If e1 e2 e3)      (compile-if e1 e2 e3 c)] [(Begin e1 e2)      (compile-begin e1 e2 c)] [(Let x e1 e2)      (compile-let x e1 e2 c)]) (%% (format "end ~a" e)))) ;; Value -> Asm (define (compile-value v) (seq (Mov rax (imm->bits v)))) ;; Id CEnv -> Asm (define (compile-variable x c) (let ((i (lookup x c))) (seq (Mov rax (Offset rsp i))))) ;; Op0 CEnv -> Asm (define (compile-prim0 p c) (match p ['void      (seq (Mov rax val-void))] ['read-byte (seq (pad-stack c) (Call 'read_byte) (unpad-stack c))] ['peek-byte (seq (pad-stack c) (Call 'peek_byte) (unpad-stack c))])) ;; Op1 Expr CEnv -> Asm (define (compile-prim1 p e c) (seq (compile-e e c) (match p ['add1 (seq (assert-integer rax) (Add rax (imm->bits 1)))] ['sub1 (seq (assert-integer rax) (Sub rax (imm->bits 1)))] ['zero? (let ((l1 (gensym))) (seq (assert-integer rax) (Cmp rax 0) (Mov rax val-true) (Je l1) (Mov rax val-false) (Label l1)))] ['char? (let ((l1 (gensym))) (seq (And rax mask-char) (Xor rax type-char) (Cmp rax 0) (Mov rax val-true) (Je l1) (Mov rax val-false) (Label l1)))] ['char->integer (seq (assert-char rax) (Sar rax char-shift) (Sal rax int-shift))] ['integer->char (seq assert-codepoint (Sar rax int-shift) (Sal rax char-shift) (Xor rax type-char))] ['eof-object? (eq-imm val-eof)] ['write-byte (seq assert-byte (pad-stack c) (Mov rdi rax) (Call 'write_byte) (unpad-stack c) (Mov rax val-void))] ['box (seq (Mov (Offset rbx 0) rax) (Mov rax rbx) (Or rax type-box) (Add rbx 8))] ['unbox (seq (assert-box rax) (Xor rax type-box) (Mov rax (Offset rax 0)))] ['car (seq (assert-cons rax) (Xor rax type-cons) (Mov rax (Offset rax 8)))] ['cdr (seq (assert-cons rax) (Xor rax type-cons) (Mov rax (Offset rax 0)))] ['empty? (eq-imm val-empty)]))) ;; Op2 Expr Expr CEnv -> Asm (define (compile-prim2 p e1 e2 c) (seq (compile-e e1 c) (Push rax) (compile-e e2 (cons #f c)) (match p ['+ (seq (Pop r8) (assert-integer r8) (assert-integer rax) (Add rax r8))] ['- (seq (Pop r8) (assert-integer r8) (assert-integer rax) (Sub r8 rax) (Mov rax r8))] ['eq? (let ((l (gensym))) (seq (Cmp rax (Offset rsp 0)) (Sub rsp 8) (Mov rax val-true) (Je l) (Mov rax val-false) (Label l)))] ['cons (seq (Mov (Offset rbx 0) rax) (Pop rax) (Mov (Offset rbx 8) rax) (Mov rax rbx) (Or rax type-cons) (Add rbx 16))]))) ;; Id [Listof Expr] CEnv -> Asm ;; Here's why this code is so gross: you have to align the stack for the call ;; but you have to do it *before* evaluating the arguments es, because you need ;; es's values to be just above 'rsp when the call is made.  But if you push ;; a frame in order to align the call, you've got to compile es in a static ;; environment that accounts for that frame, hence: (define (compile-app f es c) (if (even? (+ (length es) (length c))) (seq (compile-es es c) (Call (symbol->label f)) (Add rsp (* 8 (length es))))            ; pop args (seq (Sub rsp 8)                             ; adjust stack (compile-es es (cons #f c)) (Call (symbol->label f)) (Add rsp (* 8 (add1 (length es)))))))   ; pop args and pad ;; [Listof Expr] CEnv -> Asm (define (compile-es es c) (match es ['() '()] [(cons e es) (seq (compile-e e c) (Push rax) (compile-es es (cons #f c)))])) ;; Imm -> Asm (define (eq-imm imm) (let ((l1 (gensym))) (seq (Cmp rax imm) (Mov rax val-true) (Je l1) (Mov rax val-false) (Label l1)))) ;; Expr Expr Expr CEnv -> Asm (define (compile-if e1 e2 e3 c) (let ((l1 (gensym 'if)) (l2 (gensym 'if))) (seq (compile-e e1 c) (Cmp rax val-false) (Je l1) (compile-e e2 c) (Jmp l2) (Label l1) (compile-e e3 c) (Label l2)))) ;; Expr Expr CEnv -> Asm (define (compile-begin e1 e2 c) (seq (compile-e e1 c) (compile-e e2 c))) ;; Id Expr Expr CEnv -> Asm (define (compile-let x e1 e2 c) (seq (compile-e e1 c) (Push rax) (compile-e e2 (cons x c)) (Add rsp 8))) ;; CEnv -> Asm ;; Pad the stack to be aligned for a call with stack arguments (define (pad-stack-call c i) (match (even? (+ (length c) i)) [#f (seq (Sub rsp 8) (% "padding stack"))] [#t (seq)])) ;; CEnv -> Asm ;; Pad the stack to be aligned for a call (define (pad-stack c) (pad-stack-call c 0)) ;; CEnv -> Asm ;; Undo the stack alignment after a call (define (unpad-stack-call c i) (match (even? (+ (length c) i)) [#f (seq (Add rsp 8) (% "unpadding"))] [#t (seq)])) ;; CEnv -> Asm ;; Undo the stack alignment after a call (define (unpad-stack c) (unpad-stack-call c 0)) ;; Id CEnv -> Integer (define (lookup x cenv) (match cenv ['() (error "undefined variable:" x)] [(cons y rest) (match (eq? x y) [#t 0] [#f (+ 8 (lookup x rest))])])) (define (assert-type mask type) (λ (arg) (seq (Mov r9 arg) (And r9 mask) (Cmp r9 type) (Jne 'raise_error)))) (define (type-pred mask type) (let ((l (gensym))) (seq (And rax mask) (Cmp rax type) (Mov rax (imm->bits #t)) (Je l) (Mov rax (imm->bits #f)) (Label l)))) (define assert-integer (assert-type mask-int type-int)) (define assert-char (assert-type mask-char type-char)) (define assert-box (assert-type ptr-mask type-box)) (define assert-cons (assert-type ptr-mask type-cons)) (define assert-codepoint (let ((ok (gensym))) (seq (assert-integer rax) (Cmp rax (imm->bits 0)) (Jl 'raise_error) (Cmp rax (imm->bits 1114111)) (Jg 'raise_error) (Cmp rax (imm->bits 55295)) (Jl ok) (Cmp rax (imm->bits 57344)) (Jg ok) (Jmp 'raise_error) (Label ok)))) (define assert-byte (seq (assert-integer rax) (Cmp rax (imm->bits 0)) (Jl 'raise_error) (Cmp rax (imm->bits 255)) (Jg 'raise_error))) ;; Symbol -> Label ;; Produce a symbol that is a valid Nasm label (define (symbol->label s) (string->symbol (string-append "label_" (list->string (map (λ (c) (if (or (char<=? #\a c #\z) (char<=? #\A c #\Z) (char<=? #\0 c #\9) (memq c '(#\_ #\\$ #\# #\@ #\~ #\. #\?))) c #\_)) (string->list (symbol->string s)))) "_" (number->string (eq-hash-code s) 16))))