Take-home final exam/  project
6.10

Take-home final exam/project

The take-home final is due Tuesday, Dec 19, at 11:59pm. It is a hybrid final exam and final project that has you bundling up your solutions for assignments 2 through 5 as a single compiler. You must treat this as a test and may not collaborate on or discuss any portion of the final. You should ask questions privately on piazza or email Javran or myself. You may ask questions publicly on Piazza only if they pertain to Racket broadly, Boehm GC broadly, or to assignments 2,3,4, or 5. When finished, email both myself and Javran, together, with a .tar.gz archive of your solution using the subject line 430 final submission: tgilray where tgilray is replaced with your own directory id.

Part 1 (20%) "piecing it together": Combine your assignments 2 through 5 into a single project that takes the input language for assignment 5 and produces a working binary. You may take from posted reference solutions as needed to get this working. Write a tests.rkt file that lists available tests when run with no command-line arguments, or will run a specified test when named or all tests when run on input "all" like so: racket tests.rkt all. You should take code from the assignment 2-5 versions of tests.rkt as desired. Each test should compare eval-llvm output with eval-top-level output. You may organize your tests into a /tests/ folder, as desired, with sub-folder organization or not. You must support all features from this set of tests, but should also write some of your own and test for completeness. We will run your project using Clang 3.9 on Debian/Ubuntu. If you are using OSX or Windows, use your GRACE account or a Lubuntu VM to test that it works under Linux before submitting.

Finally, you should write a README.txt (or README.md, etc) that contains a high-level description of your project and documentation for each primitive operation your project supports (describe the inputs and any requirements on their value or dynamic type, any default parameter supported, etc). You should support both standard application and a variadic apply version of every primitive you support, but you may desugar one of these forms into the other in order to support it if you wish. Please include your academic integrity declaration in your README.

Part 2 (20%) "run-time errors": Identify five distinct kinds of run-time failure, that your compiler does not handle properly, and fix each of these so that they properly raise an error or halt with a clear and helpful error message. Document each of these changes and write at least 2 tests for each. Describe each and mention the relevant tests in your README.

For example, you might fix any the following types of run-time errors:
  • Function is provided too many arguments.

  • Function is provided too few arguments.

  • Non-function value is applied.

  • Use of not-yet-initialized letrec or letrec* variable.

  • Division by zero.

  • A memory-use cap (e.g., 256MB) is exceeded.

You should not fix any errors that were already being handled by the original header.cpp, even if the error message was not previously very clear (e.g., vector-ref must take a vector object). Your fixes should be reasonably distinct, in line with the above list. If you have any concern about the distinctness of classes of errors you’re checking for, just email me to ask. Document briefly the classes of errors that are still not being caught properly.

Part 3 (30%) "add a feature": Implement a new feature comprising at least one additional built-in type and 3-5 associated primitive operations. Make sure these primitives can be applied normally or with apply. Make sure these primitives report a reasonable error at runtime if they are misused. Any of the following, done right, will be excellent:

You can also come up with your own feature to implement. This section will then be graded based on both the correctness and scope of the feature. Anything of similar difficulty to the above examples, done well, can earn full points. If you are unsure if your idea is sufficient to earn full points, just email me to ask.

Part 4 (30%) "Boehm GC": Integrate the latest version of the Boehm garbage collector (GC). You can find it on github and will find various helpful resources elsewhere online. The C interface is probably most straightforward and is documented well on the github page. We will run versions of the above tests that loop to make unbounded memory allocations as a sanity check; we may also use valgrind and profiler tools to check for memory leaks. If integrating the GC crashes the program on some tests but not others, adding a toggle switch to turn GC on or off will help us assess partial credit for part 4.

Integrating the GC will entail two primary changes outside of using GC_MALLOC to allocate new memory. First, the tagging scheme used in assignment 4 is suitable for precise collectors that cooperate with the runtime and know this tagging scheme (so they may identify pointers in memory precisely), but the Boehm GC will only identify correctly aligned and unmodified pointers. If you make use of the bottom 3 bits of pointers returned by GC_MALLOC, objects may be prematurely freed, causing the binary to crash. Any objects not being garbage collected, or basic values that are not heap-allocated, may still use the bottom 3 bits of a 64bit word as a tag. In this case, if the bottom 3 bits are 000, the object is managed by the GC and could instead use a word of memory at the front of the object to tag its dynamic type. This is similar to how promise objects or vectors were tagged in assignment 4. There are various tagging strategies that will work fine using the Boehm GC and you can implement this correctly in a number of ways.

Second, the Boehm GC will use the C stack as its root set. All reachable objects should be reachable from pointers stored in memory obtained using LLVM’s alloca instruction. A naive, but correct, implementation might simply use an additional alloca and store instruction each time a value is saved in a register (although there are more efficient strategies as well). Consider the following example:

%b = call i64 @prim_car(i64 %a)

...

%f = call i64 @prim_cons(i64 %d, i64 %e)

%g = call i64 @prim_cdr(i64 %b)

The call to prim_cons will end up calling GC_MALLOC which could trigger a collection phase internally. If %b is in a register, that does not guarantee it will also be on the stack and visible to the garbage collector when it runs. This means that the call to cons could return, having collected the list that %b pointed to (believing it was unreachable). Then the call to prim_cdr will fail (crash). Instead, all local variables (or at least any that could be garbage collected) will need to be stored to the stack. This means the call to prim_car could become something like:

%bptr = alloca i64, align 8

%b = call i64 @prim_car(i64 %a)

store volatile i64 %b, i64* %bptr, align 8

...

Now when the call to prim_cons is made, even if GC_MALLOC does trigger a collection, the pair will be visible on the stack and everything transitively reachable from it will be found and not freed by the GC. The volatile keyword may be required to ensure the store is not optimized out by clang if you use llvm’s optimization (e.g., -O2).

Extra credit: There are a few ways to earn some extra credit on your final.

Some unsorted advice for the final:

 

Web Accessibility