This shows you the differences between two versions of the page.
cns:labs:lab-13 [2017/01/10 12:32] lucian.mogosanu [Tutorial: Symbolic Execution using Angr] |
cns:labs:lab-13 [2019/12/08 15:19] (current) dennis.plosceanu |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Lab 13 - Advanced Binary Analysis ====== | + | ====== Extra - Advanced Binary Analysis ====== |
===== Resources ===== | ===== Resources ===== | ||
Line 16: | Line 16: | ||
===== Lab Support Files ===== | ===== Lab Support Files ===== | ||
- | TODO | + | We will use this [[http://elf.cs.pub.ro/oss/res/labs/lab-13.tar.gz|lab archive]] throughout the lab. |
+ | Please download the lab archive and then unpack it using the commands below: | ||
+ | |||
+ | <code> | ||
+ | spyked@tuvok:~% wget http://elf.cs.pub.ro/oss/res/labs/lab-13.tar.gz | ||
+ | spyked@tuvok:~% tar xzf lab-13.tar.gz | ||
+ | </code> | ||
+ | |||
+ | After unpacking, we get the ''lab-13/'' folder, which we will use for the lab: | ||
+ | |||
+ | <code> | ||
+ | spyked@tuvok:~% cd lab-13 | ||
+ | spyked@tuvok:~/lab-13% ls -F | ||
+ | 0-tutorial/ 1-baby-re/ 2-hash/ | ||
+ | </code> | ||
===== Introduction: Binary Analysis Techniques ===== | ===== Introduction: Binary Analysis Techniques ===== | ||
Line 123: | Line 137: | ||
Examples of symbolic execution engines include [[http://angr.io/|Angr]], [[https://klee.github.io/|KLEE]], [[http://www.cs.ubc.ca/labs/isd/Projects/Kite/|Kite]], [[https://users.ece.cmu.edu/~arebert/papers/mayhem-oakland-12.pdf|Mayhem]] and [[http://s2e.epfl.ch/|S2E]]. | Examples of symbolic execution engines include [[http://angr.io/|Angr]], [[https://klee.github.io/|KLEE]], [[http://www.cs.ubc.ca/labs/isd/Projects/Kite/|Kite]], [[https://users.ece.cmu.edu/~arebert/papers/mayhem-oakland-12.pdf|Mayhem]] and [[http://s2e.epfl.ch/|S2E]]. | ||
- | ===== Tutorial: Symbolic Execution using Angr ===== | + | ===== Tutorial: Symbolic Execution using Angr [2p] ===== |
First, install angr. See [[http://angr.io/install.html|http://angr.io/install.html]]. | First, install angr. See [[http://angr.io/install.html|http://angr.io/install.html]]. | ||
+ | |||
+ | <note important> | ||
+ | Make sure you grab all the dependencies, as per [[https://docs.angr.io/INSTALL.html|the documentation]]: | ||
+ | |||
+ | <code> | ||
+ | sudo apt-get install python-dev libffi-dev build-essential virtualenvwrapper | ||
+ | </code> | ||
+ | </note> | ||
<note important> | <note important> | ||
Line 159: | Line 181: | ||
For more details, refer to the [[http://docs.python-guide.org/en/latest/dev/virtualenvs/|Python virtualenv guide]]. | For more details, refer to the [[http://docs.python-guide.org/en/latest/dev/virtualenvs/|Python virtualenv guide]]. | ||
+ | </note> | ||
+ | |||
+ | <note important> | ||
+ | If you get the following message when you try to run ''solve.py'': | ||
+ | |||
+ | <code> | ||
+ | ... | ||
+ | ImportError: cannot import name arm | ||
+ | </code> | ||
+ | |||
+ | Try applying the workaround from this GitHub issue: [[https://github.com/angr/angr/issues/52#issuecomment-169509200|https://github.com/angr/angr/issues/52#issuecomment-169509200]] | ||
</note> | </note> | ||
Line 230: | Line 263: | ||
<note important> | <note important> | ||
- | See the "Stash types" section in the [[https://docs.angr.io/docs/pathgroups.html|path groups]] chapter of the angr documentation for more details on types of paths in a path group. | + | For example, ''pg.found'' and ''pg.deadended'' are lists of paths; ''pg.found[0]'' is the first path in ''found'', and ''pg.found[0].state'' is the current state in that path. See the "Stash types" section in the [[https://docs.angr.io/docs/pathgroups.html|path groups]] chapter of the angr documentation for more details on types of paths in a path group. |
</note> | </note> | ||
Line 272: | Line 305: | ||
segmentation fault | segmentation fault | ||
</code> | </code> | ||
+ | |||
+ | <note warning> | ||
+ | The command above contains many NUL bytes, which may not work properly under Bash. If so, replace it with a command such as the one below:<code>
+ | $ ./level07 -2147483627 $(python -c 'print "A"*40 + "FLOW"') | ||
+ | WIN! | ||
+ | segmentation fault | ||
+ | </code> | ||
+ | </note> | ||
<note important> | <note important> | ||
Line 278: | Line 319: | ||
===== Tasks ===== | ===== Tasks ===== | ||
- | ==== 1. baby-re ==== | + | ==== 0. Extra: Feedback [2p] ==== |
- | ==== 2. hash ==== | + | We value your opinions and input on improving the Computer and Network Security class (CNS) and its |
+ | components. Please take the time to fill in [[http://cs.curs.pub.ro/2016/blocks/feedbackacs/view.php?courseid=175&blockid=3269|the feedback form on cs.curs.pub.ro]]. Your feedback is very important for us to improve both the CNS class and other classes you will go through in the future. | ||
+ | |||
+ | We are particularly interested in: | ||
+ | * What didn't you like, and what do you think didn't go well?
+ | * Why didn't you like it, and why do you think it didn't go well?
+ | * What should we do to make things more enjoyable and run more smoothly?
+ | |||
+ | Thank you! | ||
+ | ==== 1. baby-re [3p] ==== | ||
+ | |||
+ | We're given a binary (''1-baby-re/baby-re'') that we want to reverse engineer, the end result being a **flag**. Running the program, we see that it asks us for some inputs: | ||
+ | |||
+ | <code> | ||
+ | $ ./baby-re | ||
+ | Var[0]: 1 | ||
+ | Var[1]: 2 | ||
+ | Var[2]: 3 | ||
+ | Var[3]: 4 | ||
+ | Var[4]: 5 | ||
+ | Var[5]: 6 | ||
+ | Var[6]: 7 | ||
+ | Var[7]: 8 | ||
+ | Var[8]: 9 | ||
+ | Var[9]: 0 | ||
+ | Var[10]: 1 | ||
+ | Var[11]: 2 | ||
+ | Var[12]: 3 | ||
+ | Wrong | ||
+ | </code> | ||
+ | |||
+ | Before trying to execute it symbolically, let's try to inspect it. We have all the options that we know from the previous labs: | ||
+ | |||
+ | * disassembling it; | ||
+ | * ''strace''-ing it; | ||
+ | * looking at the ELF header, symbols, strings, etc.; | ||
+ | * running it with GDB and inspecting memory, etc. | ||
+ | |||
+ | Since the program outputs some strings, let's look at it with ''strings''. We notice that the programmer defined the following strings: | ||
+ | |||
+ | <code> | ||
+ | $ strings baby-re | grep -v '^_\|^\.\|GLIBC\|\.so' | ||
+ | ... | ||
+ | Var[0]: | ||
+ | Var[1]: | ||
+ | Var[2]: | ||
+ | Var[3]: | ||
+ | Var[4]: | ||
+ | Var[5]: | ||
+ | Var[6]: | ||
+ | Var[7]: | ||
+ | Var[8]: | ||
+ | Var[9]: | ||
+ | Var[10]: | ||
+ | Var[11]: | ||
+ | Var[12]: | ||
+ | The flag is: %c%c%c%c%c%c%c%c%c%c%c%c%c | ||
+ | Wrong | ||
+ | ... | ||
+ | CheckSolution | ||
+ | main | ||
+ | </code> | ||
+ | |||
+ | The string starting with ''%%"The flag is: "%%'' is what we want to see printed. Let's look a bit at the program flow, using ''objdump''. Looking at ''main'', we see that there are a lot of ''scanf''s performed (the ones that get ''Var[0]'', ''Var[1]'', and so on), and then ''CheckSolution'' is called. We're interested in this particular piece of the code: | ||
+ | |||
+ | <code> | ||
+ | 00000000004025e7 <main>: | ||
+ | 4028dd: 48 89 c7 mov rdi,rax | ||
+ | 4028e0: e8 e1 dd ff ff call 4006c6 <CheckSolution> | ||
+ | 4028e5: 84 c0 test al,al | ||
+ | 4028e7: 74 58 je 402941 <main+0x35a> | ||
+ | ... | ||
+ | 402924: 45 89 f1 mov r9d,r14d | ||
+ | 402927: 45 89 e8 mov r8d,r13d | ||
+ | 40292a: 89 c6 mov esi,eax | ||
+ | 40292c: bf 88 2a 40 00 mov edi,0x402a88 | ||
+ | 402931: b8 00 00 00 00 mov eax,0x0 | ||
+ | 402936: e8 45 dc ff ff call 400580 <printf@plt> | ||
+ | 40293b: 48 83 c4 40 add rsp,0x40 | ||
+ | 40293f: eb 0a jmp 40294b <main+0x364> | ||
+ | 402941: bf b1 2a 40 00 mov edi,0x402ab1 | ||
+ | 402946: e8 15 dc ff ff call 400560 <puts@plt> | ||
+ | 40294b: b8 00 00 00 00 mov eax,0x0 | ||
+ | 402950: 48 8b 5d d8 mov rbx,QWORD PTR [rbp-0x28] | ||
+ | 402954: 64 48 33 1c 25 28 00 xor rbx,QWORD PTR fs:0x28 | ||
+ | ... | ||
+ | </code> | ||
+ | |||
+ | We notice that at ''0x4028e7'' there is a check that jumps to ''0x402941'' when ''CheckSolution'' returns zero (''test al,al'' sets the zero flag, so ''je'' is taken). The code at ''0x402941'' in turn calls ''puts'' (**not** ''printf''), so we can assume this is where ''%%"Wrong"%%'' is printed, and that this is not the code that prints the flag. This is a path that we want **to avoid**. | ||
+ | |||
+ | If we reach ''0x40293b'', however, ''printf'' will have been called, and we can assume this is what prints the flag. A jump to ''0x40294b'' then follows, which leads to the end of the program. | ||
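
The branch targets that ''objdump'' prints next to each instruction can be re-derived from the raw bytes in the listing, which is a useful sanity check before hard-coding addresses in a script. The sketch below is a minimal illustration that only handles the short relative forms appearing in this listing (''call rel32'', ''jmp rel32'', ''jmp rel8'' and ''jcc rel8''); the helper name ''rel_target'' is our own and not part of any tool.

```python
import struct


def rel_target(addr, insn_bytes):
    """Return the destination of a relative x86-64 call/jmp/jcc.

    addr is the address of the instruction itself; insn_bytes are its raw
    bytes as shown by objdump. Only the encodings seen in the listing above
    are supported (hypothetical helper, for illustration only).
    """
    opcode = insn_bytes[0]
    if opcode in (0xE8, 0xE9):                      # call rel32 / jmp rel32
        length = 5
        (disp,) = struct.unpack("<i", insn_bytes[1:5])
    elif opcode == 0xEB or 0x70 <= opcode <= 0x7F:  # jmp rel8 / jcc rel8
        length = 2
        (disp,) = struct.unpack("<b", insn_bytes[1:2])
    else:
        raise ValueError("unsupported opcode: %#x" % opcode)
    # The displacement is relative to the address of the NEXT instruction.
    return addr + length + disp


# Instructions copied from the objdump listing above.
listing = [
    (0x4028E0, "e8 e1 dd ff ff"),  # call CheckSolution
    (0x4028E7, "74 58"),           # je   (the "Wrong" branch)
    (0x402936, "e8 45 dc ff ff"),  # call printf@plt
    (0x40293F, "eb 0a"),           # jmp  (end of main)
    (0x402946, "e8 15 dc ff ff"),  # call puts@plt
]

for addr, hexbytes in listing:
    print(hex(addr), "->", hex(rel_target(addr, bytes.fromhex(hexbytes))))
```

Each computed destination should match the address ''objdump'' shows in the operand column for the same instruction.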
+ | |||
+ | Now we know exactly what path we want to **find** (a path ending in ''0x40293b'' or ''0x40294b''), and what paths we want to **avoid** (all the paths where ''%%puts("Wrong");%%'' is executed). | ||
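
To make the find/avoid idea concrete, here is a toy sketch that searches a hand-written control-flow graph of the tail of ''main''. It is **not** symbolic execution and does not use angr: it tracks no constraints and only shows how paths reaching an "avoid" address are dropped while paths reaching a "find" address are kept. The graph is our own simplification (the whole flag-printing block is collapsed into a single edge), and ''explore'' is a hypothetical helper name.

```python
from collections import deque

# Simplified successor map, written by hand from the objdump listing.
# Assumption: the block between 0x4028e9 and 0x40293b is straight-line code.
CFG = {
    0x4028E7: [0x4028E9, 0x402941],  # je: fall through, or jump to "Wrong"
    0x4028E9: [0x40293B],            # flag-printing block, ends after printf
    0x40293B: [0x40294B],            # add rsp / jmp to the end of main
    0x402941: [0x40294B],            # puts("Wrong"), then end of main
    0x40294B: [],                    # function epilogue
}


def explore(cfg, start, find, avoid):
    """Breadth-first path search that prunes 'avoid' and collects 'find'."""
    found = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in avoid:
            continue          # drop this path entirely
        if node in find:
            found.append(path)
            continue          # stop extending a path once it is found
        for succ in cfg.get(node, []):
            queue.append(path + [succ])
    return found


paths = explore(CFG, 0x4028E7, find={0x40293B}, avoid={0x402941})
for path in paths:
    print(" -> ".join(hex(a) for a in path))
```

A real engine does the same pruning, but each path additionally carries the constraints on the inputs that make it feasible.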
+ | |||
+ | Given this information, **your task** is to fill in ''solve_skel.py'' with an angr script that solves the riddle and makes the program print out the flag. Remember that in this case we don't need to find out the exact inputs that print out the solution; we only care about **the flag**, which is printed to standard output (''path_groups.found[0].state.posix.dumps(1)''). | ||
+ | |||
+ | <note tip> | ||
+ | A typical symbolic execution run can take minutes, if not hours or days, to complete. The exploration for this task may take around 5 to 10 minutes to run, so make sure to **carefully verify your script** before running it. | ||
+ | </note> | ||
+ | ==== 2. hash [5p] ==== | ||
+ | |||
+ | Switch to ''2-hash''. The program hashes the input and overwrites the return address with the hash output. Use this to jump to the ''win'' function. | ||
+ | |||
+ | - **Solve the task by hand [2p]** | ||
+ | - **Solve the task using the provided angr skeleton script [3p]** | ||
+ | |||
+ | <note tip> | ||
+ | What properties do hash functions have in general? What properties do you think the hash function ''hash'' has? For the first part of the task, you might be able to get away with brute-forcing the hash inverse. | ||
+ | </note> | ||
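
As a rough illustration of what brute-forcing a hash inverse looks like, the sketch below searches short lowercase inputs for a preimage of a given 32-bit value. The hash used here is FNV-1a, chosen purely as a stand-in: it is **not** the ''hash'' function from the lab binary, and the alphabet, maximum length and target value are assumptions made for the example.

```python
import itertools
import string


def toy_hash(data: bytes) -> int:
    """32-bit FNV-1a. A stand-in only, NOT the hash from the lab binary."""
    h = 0x811C9DC5
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def brute_force(target, alphabet, max_len):
    """Try every string over alphabet up to max_len; return a preimage or None."""
    for length in range(1, max_len + 1):
        for candidate in itertools.product(alphabet, repeat=length):
            data = bytes(candidate)
            if toy_hash(data) == target:
                return data
    return None


# Hypothetical target: pretend this is the address we need the hash to equal.
target = toy_hash(b"win")
preimage = brute_force(target, string.ascii_lowercase.encode(), max_len=3)
print(preimage, hex(target))
```

The search space grows exponentially with the input length, which is why this only works when the hash is weak or the set of candidate inputs is small.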
+ | |||
+ | <note tip> | ||
+ | ''scanf'' might behave oddly when we try to execute it symbolically. We're not interested in it: we just want to execute ''hash'' and find the input argument (the 8-byte value stored on the stack) for which the output (the value in ''eax'' at the end of the function/after returning from it) has a particular value. | ||
+ | |||
+ | The angr skeleton script (''skel.py'') captures this pattern very well, so you just need to look for the right addresses in the binary and make sure you understand what it is that is set as input, explored, solved, etc. | ||
+ | </note> |