-\ | Project 2 Example. Takes us step-by-step through the reverse engineering | of a simple program. | | Feb. 19, 2004, by Kenneth Grauel. -/ [0] GETTING STARTED. Suppose that we are given a simple bomb similar to those in project 2 but with only one phase, and we want to reverse engineer it to find the necessary password. All that we have is an executable file "random", and the following snippet of code: 1 int main() 2 { 3 printf("Enter the password.\n"); 4 phase_0(); 5 printf("Success!\n"); 6 return 0; 7 } How are we to begin doing this? [1] PRELIMINARIES. First of all, we want to figure out precisely *what* format the executable file "random" is given in. So we get to a shell and execute the command "file": [undine@192 undine]$ file random random: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), not stripped So we're looking at an executable that hasn't been stripped, which means that it has a symbol table. We can use the command "nm random" to get a list of symbols, then: [undine@192 undine]$ nm random 08049714 A __bss_start 08048324 t call_gmon_start 08049714 b completed.1 ... around 40 lines omitted... 0804845a T main 08049614 d p.0 0804841a T phase_0 U printf@@GLIBC_2.0 U scanf@@GLIBC_2.0 08048300 T _start It's a bit tedious to go through this massive list manually to find all of the lines that are listed with an uppercase T, so we pipe nm's output to grep to filter out the unwanted information: [undine@192 undine]$ nm random | grep ' T ' 080483bc T explode_bomb 08048584 T _fini 080483dc T foo 08048298 T _init 08048500 T __libc_csu_fini 080484b0 T __libc_csu_init 0804845a T main 0804841a T phase_0 08048300 T _start So it looks like we have several important functions: * foo * main * phase_0 * explode_bomb Now that we have a foothold on the functions that we're looking at (which could easily have been done from GDB as well), we can now open up GDB to begin disassembling and analyzing them. [2] DISASSEMBLY Recall from section [0] that the code for main looks like this: 1 int main() 2 { 3 printf("Enter the password.\n"); 4 phase_0(); 5 printf("Success!\n"); 6 return 0; 7 } For the sake of experimentation we run the program to see what its output is upon failure, as it is not immediately obvious from this. [undine@192 undine]$ ./random Enter the password. blah I don't think that was right! Okay, so it looks like not only is input handled by phase_0, but the program will exit upon error at phase_0 and never actually get to the "Success!" output on line 5. So we need to get access to the code for phase_0. In gdb this is accomplished by using the "disassemble" command. But first things first - let's get GDB open: [undine@192 undine]$ gdb random GNU gdb 5.3-25mdk (Mandrake Linux) Copyright 2002 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i586-mandrake-linux-gnu"... (gdb) _ We want to get the disassembly for phase_0. To do this we use the "disassemble" command, which can be shortened to "disas": (gdb) disas phase_0 Dump of assembler code for function phase_0: 0x804841a : push %ebp 0x804841b : mov %esp,%ebp 0x804841d : sub $0x18,%esp 0x8048420 : sub $0x4,%esp 0x8048423 : lea 0xfffffff8(%ebp),%eax 0x8048426 : push %eax 0x8048427 : lea 0xfffffffc(%ebp),%eax 0x804842a : push %eax 0x804842b : push $0x80485df 0x8048430 : call 0x80482c0 0x8048435 : add $0x10,%esp 0x8048438 : mov %eax,0xfffffff0(%ebp) 0x804843b : sub $0x8,%esp 0x804843e : pushl 0xfffffff8(%ebp) 0x8048441 : pushl 0xfffffffc(%ebp) 0x8048444 : call 0x80483dc 0x8048449 : add $0x10,%esp 0x804844c : cmp $0xf3,%eax 0x8048451 : je 0x8048458 0x8048453 : call 0x80483bc 0x8048458 : leave 0x8048459 : ret End of assembler dump. (gdb) _ This doesn't exactly look encouraging. Let's look at the function calls that are made, though (always an excellent first step): 0x8048430 : call 0x80482c0 0x8048444 : call 0x80483dc 0x8048453 : call 0x80483bc Scanf is obviously a system call, explode_bomb is the thing we're trying to avoid, and foo seems to be another function that may or may not do something that we're interested in. Since disassembling the first two of these isn't likely to get us anywhere, let's disassemble the function foo: (gdb) disas foo Dump of assembler code for function foo: 0x80483dc : push %ebp 0x80483dd : mov %esp,%ebp 0x80483df : sub $0x8,%esp 0x80483e2 : cmpl $0xa,0xc(%ebp) 0x80483e6 : jle 0x80483ed 0x80483e8 : call 0x80483bc 0x80483ed : cmpl $0x1,0xc(%ebp) 0x80483f1 : jg 0x80483fb 0x80483f3 : mov 0x8(%ebp),%eax 0x80483f6 : mov %eax,0xfffffffc(%ebp) 0x80483f9 : jmp 0x8048415 0x80483fb : sub $0x8,%esp 0x80483fe : mov 0xc(%ebp),%eax 0x8048401 : dec %eax 0x8048402 : push %eax 0x8048403 : pushl 0x8(%ebp) 0x8048406 : call 0x80483dc 0x804840b : add $0x10,%esp 0x804840e : imul 0x8(%ebp),%eax 0x8048412 : mov %eax,0xfffffffc(%ebp) 0x8048415 : mov 0xfffffffc(%ebp),%eax 0x8048418 : leave 0x8048419 : ret End of assembler dump. (gdb) _ There are two function calls here: 0x80483e8 : call 0x80483bc 0x8048406 : call 0x80483dc We know that explode_bomb is going to result in failure, and the call to foo is recursive since we're already in foo. So the call tree that we have, rendered in faltering ASCII art, is: [scanf] ^ | L--\ _start --> [main] --> [phase_0] ----> [foo] | \ / \--/ \ / V V [explode_bomb] Since phase_0 and foo both may result in a call to explode_bomb, it is these two that we are going to be interested in analyzing. We will begin with phase_0 since it is the highest in the call stack. [3] REVERSE ENGINEERING PHASE_0 The code for phase_0 may look a bit overwhelming at first, but there are a number of easily identifiable pieces at its beginning and end that will make our job much easier. So let's begin by examining the first few instructions of phase_0: 0x804841a : push %ebp 0x804841b : mov %esp,%ebp 0x804841d : sub $0x18,%esp 0x8048420 : sub $0x4,%esp Offsets 41a-41b seem to be concerned with saving the value of %ebp and replacing %ebp with %esp. Since %esp is the stack pointer, it looks like %ebp is going to be used to keep track of data on the stack. So we've determined the use of those two lines: 0x804841a : push %ebp // Save ebp for later. 0x804841b : mov %esp,%ebp // ebp = esp; Offsets 41d-420 change the value of %esp (inefficiently, since we presumably didn't optimize upon compile). Combined, they subtract 0x18 + 0x4 = 0x1B from %esp: 0x804841d : sub $0x18,%esp // (0x18 + 0x4 = 0x1B) 0x8048420 : sub $0x4,%esp // esp -= 0x1B; The four instructions above are the standard prelude to phase_0. For some applications the prelude may be gracefully ignored until necessary. However, like variable declarations, looking at the registers that are pushed here can often indicate that they will be used later in the function, e.g. %ebp and %esp here in phase_0. Now let's examine the next block of code: 0x8048423 : lea 0xfffffff8(%ebp),%eax 0x8048426 : push %eax 0x8048427 : lea 0xfffffffc(%ebp),%eax 0x804842a : push %eax 0x804842b : push $0x80485df 0x8048430 : call 0x80482c0 You should probably recognize right away that we are pushing things onto the stack in preparation for a call to a function, namely, scanf. Recall that the syntax of scanf is: scanf(, , , ...); Also remember that C calling convention dictates that arguments should be pushed in *reverse order*, that is, that the final argument pushed is the first argument in the argument list. So first we'll examine the last push: 0x804842b : push $0x80485df In GAS syntax, when a dollar sign precedes a value, it is treated as an immediate, that is, it is a fixed value and not a memory position. But this certainly *looks* like a memory position, doesn't it? Kind of makes one wonder what is stored at the memory position 0x80485df, since that should be the format string for scanf. We can use GDB to determine this via the "x" (examine) command, on which we place a format specifier of "/s" for string: (gdb) x/s 0x80485df 0x80485df <_IO_stdin_used+59>: " %d %d" (gdb) It seems that our format command reads in two integers. Let's call the first integer X and the second integer Y. Then it would follow that the first argument, second in the list of pushes, would correspond to X, and the second argument, first in the list of pushes, would correspond to Y: 0x8048423 : lea 0xfffffff8(%ebp),%eax // Push addr. of Y. 0x8048426 : push %eax 0x8048427 : lea 0xfffffffc(%ebp),%eax // Push addr. of X. 0x804842a : push %eax 0x804842b : push $0x80485df // Push ptr to " %d %d". 0x8048430 : call 0x80482c0 // scanf(" %d %d", &x, &y); For the sake of discussion, let's convert the addresses of X and Y into slightly more readable and understandable forms. From above, &x = 0xfffffffc(%ebp) &y = 0xfffffff8(%ebp) We can convert 0xfffffffc to -4 in 2's complement and 0xfffffff8 to -8, so &x = %ebp - 4 &y = %ebp - 8, which is reasonable because %ebp refers to a location on the stack, which is where we would customarily store most local variables. We can also trivially derive the following meanings given their context: 0x8048435 : add $0x10,%esp // Adjust the stack. 0x8048438 : mov %eax,0xfffffff0(%ebp) // Store return value. 0x804843b : sub $0x8,%esp // Adjust the stack. Thus we have created a rough outline for the first part of phase_0: * Set up pointers to the stack. * Read in the integer values X and Y. * Adjust the stack. Let's look at the next section: 0x804843e : pushl 0xfffffff8(%ebp) 0x8048441 : pushl 0xfffffffc(%ebp) 0x8048444 : call 0x80483dc 0x8048449 : add $0x10,%esp 0x804844c : cmp $0xf3,%eax 0x8048451 : je 0x8048458 0x8048453 : call 0x80483bc 0x8048458 : leave 0x8048459 : ret It is somewhat easier to understand this block. We make another function call, this time to foo, which seems to take two arguments - X and Y, in that order. (Recall that 0xfffffffc(%ebp) is X and 0xfffffff8(%ebp) is Y.) Since return values are given in %eax, it seems reasonable to conclude that the line at offset 449, 0x8048449 : add $0x10,%esp exists only to reset the stack pointer. (This I deem "glue code" because it can often be creatively ignored in lieu of instructions that work directly with data such as mov/add, etc.) We then execute a comparison, and finally execute a conditional jump over the failure clause explode_bomb. So clearly this jump has to *succeed* for our password to work. Examining the jump code 0x804844c : cmp $0xf3,%eax 0x8048451 : je 0x8048458 we notice that for the jump to succeed, %eax must equal 0xf3, or 243 decimal. Since %eax is the return value of the function "foo", it stands to reason that the complete outline of phase_0 is: * Set up pointers to the stack. * Read in the integer values X and Y via scanf(" %d %d", &x, &y); * Adjust the stack. * Call foo(x, y). * If the return is not 243, fail. Otherwise, return successfully. It only remains now to interpret and understand the workings of the mystical function foo. [4] REVERSE ENGINEERING FOO Examine the code for foo again. We have already observed that foo is a recursive function that has the potential to fail via explode_bomb. We also know from having decoded a call to foo that foo takes two arguments - X and Y. Since X and Y are integers and foo is recursive, it stands to reason that foo is probably some sort of mathematical construct. Its disassembly certainly does not seem to be complex enough to indicate otherwise (although something else may there exist courtesy of various obfuscating methods.) It seems, though, that our real GOAL is: Find X and Y such that foo(X, Y) = 243. If we can do that and foo doesn't bomb, then it looks like we have solved the problem. It just so happens that GDB has a very useful command, "call", which will call a function with the arguments given. To use call we need to have the process running by the "run" command. However, if we simply type "run", we will be asked to enter a string for the password, which will preempt GDB's prompt. So instead we can put a breakpoint on the main function and make the program stop there, before the prompt is given: (gdb) b main Breakpoint 3 at 0x8048463 (gdb) run Starting program: /home/undine/random Breakpoint 3, 0x08048463 in main () (gdb) The syntax for the "b" command (which is an abbreviation of "break") was covered in lecture, so I'm not going to cover it here. For additional information, type "help break" at the GDB prompt, or seek out your local man page or guru. Now we can use call, whose syntax is: call (, , ...) which is basically the same as a C call with "call" prepended at the beginning. So let's call foo for a few arguments: (gdb) call foo(5, 6) $3 = 15625 (gdb) call foo(4, 8) $4 = 65536 (gdb) call foo(2, 12) I don't think that was right! Program exited with code 020. The program being debugged stopped while in a function called from GDB. When the function (foo) is done executing, GDB will silently stop (instead of continuing to evaluate the expression containing the function call). (gdb) _ Oops, looked like we touched a raw nerve in foo and caused it to bomb the program. Since we've exited from the program, we need to use "run" to restart it again. The breakpoint will have stuck around, so we don't have to redo that part of it. We resume systematically trying arguments for foo: (gdb) run Starting program: /home/undine/random Breakpoint 3, 0x08048463 in main () (gdb) call foo(2, 2) $5 = 4 (gdb) call foo(2, 3) $6 = 8 (gdb) call foo(2, 5) $7 = 32 (gdb) call foo(3, 2) $8 = 9 (gdb) call foo(3, 3) $9 = 27 (gdb) call foo(3, 4) $10 = 81 (gdb) call foo(5, 2) $11 = 25 (gdb) call foo(5, 3) $12 = 125 (gdb) call foo(5, 4) $13 = 625 (gdb) _ By now it should be clearly obvious what foo is. If you're not seeing it, take a little time and read the above output more carefully. Foo(x, y) is exactly the same as pow(x, y) for all integers x and y. So what number raised to what other will give us 243? Well... one example might be x = 3, y = 5, since 3^5 = 243. Another would be 243^1, which would reduce trivially to 243. Let's try them just to make sure: (gdb) call foo(3, 5) $13 = 243 (gdb) call foo(243, 1) $14 = 243 (gdb) _ It works -- let's go to an empty shell and use the first of these: [undine@192 undine]$ ./random Enter the password. 3 5 Success! [undine@192 undine]$ _ And we're done. [5] NOTES AND SOURCE If you find all of this difficult to understand, realize that the problem given here requires moderate skill relative to the other problems in project 2. You will likely improve after finishing the first few phases, which will likely be easier than the one given above. Chapter 3 in the book, although somewhat dense, is also *immensely* helpful. Also remember that explaining this in text is ridiculously difficult, so it's probably not your fault :) For the perpetually curious, the entire source code is given below. /*----------------------------------------------------------------- * Random.c * * Author: Kenneth Grauel * Date: Feb 19, 2004 * * Compile with: gcc -Wall random.c -o random * (we omit debug information for realism) */ #include void explode_bomb() { printf("I don't think that was right!\n"); exit(0x10); } int foo(int x, int y) { if (y > 10) explode_bomb(); if (y <= 1) return x; return x * foo(x, y - 1); } void phase_0() { int x, y, r2; int ret = scanf(" %d %d", &x, &y); if (foo(x, y) != 243) explode_bomb(); } int main() { printf("Enter the password.\n"); phase_0(); printf("Success!\n"); return 0; } [6] BRUTE FORCING (quite optional) By the length of the password we can clearly see that brute force is a very real and feasible option. This will likely not be the case for the phases in your generated bomb, but in our case it is a decent choice. Brute forcing has the attractive property that it is not necessary to slog fully through dark forests of evil asm code. All that really has to be done is to decipher the format of the input, which is almost laughably trivial compared to actually understanding the code itself. But because brute forcing requires so little thought, I would NOT suggest using it unless you plan on picking up the requisite knowledge elsewhere. For the sake of completeness, though, the following perl script implements a simple brute force algorithm: 1 #!/usr/bin/perl 2 3 $xu = $#ARGV < 0 ? 100 : $ARGV[0]; 4 $yu = $#ARGV < 1 ? ($#ARGV < 0 ? 100 : $ARGV[0]) : $ARGV[1]; 5 $xl = $#ARGV < 2 ? 0 : $ARGV[2]; 6 $yl = $#ARGV < 3 ? ($#ARGV < 2 ? 0 : $ARGV[2]) : $ARGV[3]; 7 8 for my $x ($xl..$xu) { 9 print "\r$x/$xu:$yl..$yu"; 10 for my $y ($yl..$yu) { 11 $r = `echo \"$x $y\n\" | ./random`; 12 print "\nMATCH x = $x, y = $y\n" unless ($r =~ /was right/); 13 } 14 } 15 16 print "\nDone.\n\n"; The syntax for invocation is: ./bfp0.pl [ [ [ []]]] Where ymin defaults to xmin, xmin defaults to 0, ymax defaults to xmax, and xmax defaults to 100. A sample execution (with timing): [undine@192 undine]$ time ./bfp0.pl 500 10 3/500:0..10 MATCH x = 3, y = 5 243/500:0..10 MATCH x = 243, y = 0 MATCH x = 243, y = 1 500/500:0..10 Done. 29.98user 13.01system 0:44.06elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (1836540major+965499minor)pagefaults 0swaps [undine@192 undine]$ _ So on a 2.53 GHZ processor, it takes 44 seconds to test 501*11=5511 choices, which is on the scale of 125 checks per second. No doubt this is worthless for any real problems, and thus the majority of the problems on project 2. (There are ways of vastly improving the speed of this process, but you'd spend much more time implementing them than you ever would simply understanding the code. Maybe after the project is over I'll post one of them, but I can't give any ideas out yet, since I'd be functionally giving away the tools necessary to break roughly half of the phases independent of bomb number.) This is the end. Thanks for reading this whole thing!