Thanks to Lockheed Martin for a generous gift in support of this class.

CMSC 498L/ENEE 459L, Fall 2012

Cybersecurity Lab

Symbolic execution lab

Oct 2, 2012

Introduction

The purpose of this lab is to give you some practice using KLEE, an open source symbolic execution engine built on top of LLVM.

Part 1: Basic tutorial

The first part of this lab exercise is to complete Tutorial 1 from the KLEE web site.

We've already installed KLEE on the backtrack VMs for class, in ~/Desktop/klee. You can add the KLEE binaries to your path with the command

$ export PATH=/root/Desktop/klee/klee-cde-package/bin:$PATH

Note that because we've installed the cde version of the package, to run the KLEE executables you'll need to add .cde to the end of each executable name, e.g., llvm-gcc.cde and klee.cde. Also note that we were unable to get the last step in the tutorial, "Replaying a test case," to work with this distribution; let us know if you figure out how to do that.

Part 2: Symbolic maze

Next, you should complete the Symbolic maze tutorial. The tutorial is a bit long and we don't have a lot of time, so here is a much more abbreviated verison of it.

The source code is here: a_maze.c. Try compiling this file and running it:

$ gcc a_maze.c
$ ./a.out
Maze dimensions: 11x7
Player pos: 1x1
Iteration no. 0
Program the player moves with a sequence of 'w', 's', 'a' and 'd'
Try to reach the price(#)!
+-+---+---+
|X|     |#|
| | --+ | |
| |   | | |
| +-- | | |
|     |   |
+-----+---+

At this point, you can type in a sequence of directions (a=left, d=right, w=up, s=down). The program will execute that sequence and report whether you get to the end, e.g., try typing ssssdddww to see what happens.

Now let's modify the program so the input is symbolic, and so that KLEE knows when to stop executing. Change line 72:

// read(0,program,ITERS);   // comment this line out
klee_make_symbolic(program, ITERS,"program");  // add this
Also add an assertion after line 104:
printf ("You win!\n");
klee_assert(0);         // add this line

Now we can compile and run with KLEE. For this test, let's tell KLEE not to stop with the first error, but to continue running until it finds all program paths where klee_assert fails.

$ llvm-gcc.cde --emit-llvm -c -g a_maze.c
$ klee.cde --emit-all-errors a_maze.o

Now, look in the klee-last directory for the .err files, and use ktest-tool.cde --write-ints to find out how the maze was solved. Did you get the results you expected, and are they correct? (Try running on the original a.out file to check.) Something funny is going on. Can you figure out what's happening by looking through the source code?

Part 3: Coreutils

Finally, for the last part of this lab, we will experiment with using KLEE to reproduce the results from Figure 7 of the KLEE OSDI paper.

To compile coreutils using klee, we need to use a script, klee-gcc, that will compile with llvm-gcc and save the bytecode files for later symbolic execution. So let's add that script to our path. We also need to add llvm-ld to out path, and we need to make a link so that llvm-gcc goes to the .cde version (or, edit klee-gcc to use the .cde version.)

$ export PATH=~/Desktop/klee/klee-cde-package/cde-root/home/pgbovine/klee/scripts/:$PATH
$ export PATH=~/Desktop/klee/klee-cde-package/cde-root/home/pgbovine/llvm-2.7/Release/bin/:$PATH
$ cd ~/Desktop/klee/klee-cde-package/bin
$ ln -s llvm-gcc.cde llvm-gcc

Now that we've gotten that set up, we can grab the version of coreutils used in the paper, and compile it for use with KLEE.

$ cd ~
$ wget http://ftp.gnu.org/gnu/coreutils/coreutils-6.10.tar.gz
$ tar -xzf coreutils-6.10.tar.gz
$ cd coreutils-6.10
$ mkdir obj-llvm
$ cd obj-llvm
$ ../configure --disable-nls CFLAGS="-g"  # takes about a minute
$ make CC=`which klee-gcc` # takes a little over two minutes

Now the src/ directory should contain .bc bytecode files for all of coreutils 6.10. For example, we can use KLEE as a pure interpreter on cat:

$ cd src
$ klee.cde --optimize --libc=uclibc --posix-runtime ./cat.bc --version

Your task for this part of the lab is to experiment with KLEE to try to reproduce the errors reported in Figure 7. You may want to use the following flags (you can find most of this list by running a coreutils program under KLEE with --help).
-sym-arg NReplace by a symbolic argument with length N
-sym-args MIN MAX NReplace by at least MIN arguments and at most MAX arguments, each with maximum length N
-sym-files NUM NMake stdin and up to NUM symbolic files, each with maximum size N
-max-time=SHalt execution after S seconds
For example, we could directly reproduce an error:

$ klee.cde --optimize --libc=uclibc --posix-runtime ./seq.bc -f %0 1
...
KLEE: ERROR: /root/coreutils-6.10/obj-llvm/src/../../src/seq.c:215: memory error: out of bound pointer
KLEE: NOTE: now ignoring this error at this location

KLEE: done: total instructions = 93279
KLEE: done: completed paths = 1
KLEE: done: generated tests = 1
Or we can have KLEE search for the error a bit:
$ klee.cde -max-time=60 --optimize --libc=uclibc --posix-runtime ./seq.bc -sym-arg 2 %0 -sym-arg 1
...
KLEE: ERROR: /root/coreutils-6.10/obj-llvm/src/../../src/seq.c:215: memory error: out of bound pointer

KLEE: done: total instructions = 445778
KLEE: done: completed paths = 61
KLEE: done: generated tests = 60

How many of the errors can you find? Can you find any new errors not reported in the original paper?!