This shows you the differences between two versions of the page.
ep:labs:03:contents:tasks:ex6 [2021/10/25 22:46] radu.mantu |
ep:labs:03:contents:tasks:ex6 [2025/02/11 23:29] (current) cezar.craciunoiu |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ==== 04. [30p] Perf & fuzzing ==== | + | ==== 06. [10p] Feedback ==== |
- | + | ||
- | The purpose of this exercise is to identify where bottlenecks appear in a real-world application. For this we will use **perf** and American Fuzzy Lop (**AFL**). | + | |
- | + | ||
- | <note tip> | + | |
- | **perf** is a Linux performance analysis tool that we will use to analyze what events occur when running a program. | + | |
- | + | ||
- | **afl** is a fuzzing tool. Fuzzing is the process of detecting bugs empirically. Starting from a //seed// input file, a certain program is executed and its behavior observed. The meaning of //"behavior"// is not fixed, but in the simplest sense, let's say that it means //"order in which instructions are executed"//. After executing the binary under test, the fuzzer will //mutate// the input file. Following another execution, with the updated input, the fuzzer decides whether or not the mutations were useful. This determination is made based on deviations from known paths during runtime. Fuzzers usually run over a period of days, weeks, or even months, all in the hope of finding an input that crashes the program. | + | |
- | </note> | + | |
- | + | ||
- | === [15p] Task A - Fuzzing with AFL === | + | |
- | + | ||
- | First, let's compile AFL and all related tools. We initialize / update a few environment variables to make them more accessible. Remember that these are set only for the current shell. | + | |
- | + | ||
- | <code bash> | + | |
- | $ git clone https://github.com/google/AFL | + | |
- | + | ||
- | $ pushd AFL | + | |
- | $ make -j $(nproc) | + | |
- | + | ||
- | $ export PATH="${PATH}:$(pwd)" | + | |
- | $ export AFL_PATH="$(pwd)" | + | |
- | $ popd | + | |
- | </code> | + | |
- | + | ||
- | Now, check that it worked: | + | |
- | + | ||
- | <code bash> | + | |
- | $ afl-fuzz --help | + | |
- | $ afl-gcc --version | + | |
- | </code> | + | |
- | + | ||
- | The program under test will be [[https://github.com/fuzzstati0n/fuzzgoat|fuzzgoat]], a vulnerable program made for the express purpose of illustrating fuzzer behaviour. To prepare the program for fuzzing, the source code has to be compiled with **afl-gcc**. **afl-gcc** is a wrapper over **gcc** that __statically instruments__ the compiled program. **afl-fuzz** uses the inserted instrumentation code to track which branches are taken during execution. In turn, this information guides the input mutation procedure. | + | |
- | + | ||
- | <code bash> | + | |
- | $ git clone https://github.com/fuzzstati0n/fuzzgoat.git | + | |
- | + | ||
- | $ pushd fuzzgoat | + | |
- | $ CC=afl-gcc make | + | |
- | $ popd | + | |
- | </code> | + | |
- | + | ||
- | If everything went well, we finally have our __instrumented binary__. Time to run **afl**. For this, we will use the sample seed file provided by **fuzzgoat**. Here is how we call **afl-fuzz**: | + | |
- | * the ''-i'' flag specifies the directory containing the initial seed | + | |
- | * the ''-o'' flag specifies the active workspace for the **afl** instance | + | |
- | * ''%%--%%'' separates the **afl** flags from the binary invocation command | + | |
- | * everything following the ''%%--%%'' separator is how the target binary would normally be invoked in bash; the only difference is that the input file name will be replaced by ''@@'' | + | |
- | + | ||
- | <code bash> | + | |
- | $ afl-fuzz -i fuzzgoat/in -o afl_output -- ./fuzzgoat/fuzzgoat @@ | + | |
- | </code> | + | |
- | + | ||
- | <note important> | + | |
- | **afl** may crash initially, complaining about some system settings. Just follow its instructions until everything is to its liking. Some of the problems may include: | + | |
- | * the coredump generation pattern saving crash information somewhere other than the current directory, with the name //core// | + | |
- | * the CPU running in //powersave// mode, rather than //performance//. | + | |
- | </note> | + | |
- | + | ||
- | If you look in the //afl_output/ //directory, you will see a few files and directories; here is what they are: | + | |
- | * **.cur_input** : current input that is tested; replaces ''@@'' in the program invocation. | + | |
- | * **fuzzer_stats** : statistics generated by **afl**, updated every few seconds by overwriting the old ones. | + | |
- | * **fuzz_bitmap** : a 64KB array of counters used by the program instrumentation to report newly found paths. For every branch instruction, a hash is computed based on its address and the destination address. This hash is used as an offset into the 64KB map. | + | |
- | * **plot_data** : time series that can be used with programs such as **gnuplot** to create visual representations of the fuzzer's performance over time. | + | |
- | * **queue** : backups of all the input files that increased code coverage at that time. Note that some of the newer files may provide the same coverage as some of the old, and then some. The reason why the old ones are not removed when this happens is that checking that would be a pain and would bog down the fuzzing process. | + | |
- | * **hangs** : inputs that caused the process to execute past a timeout limit (20ms by default). | + | |
- | * **crashes** : files that generate crashes. If you want to search for bugs and not just test for coverage increase, you should compile your binary with a sanitizer (e.g.: ASAN). | + | |
- | + | ||
- | === [15p] Task B - Profile AFL === | + | |
- | + | ||
- | Next, we will analyze the performance of **afl**. Using **perf**, we can specify one or more events (see ''man perf-list(1)'') that the kernel records only while the profiled program (in this case **afl**) is running. When the internal event counter reaches a certain value (see the ''-c'' and ''-F'' flags in ''man perf-record(1)''), a sample is taken. This sample can contain different kinds of information; for example, the ''%%--%%call-graph'' option adds a backtrace of the program to every sample. | + | |
- | + | ||
- | Let's record some stats using unhalted cpu cycles as an event trigger, every 1M events: | + | |
- | + | ||
- | <code bash> | + | |
- | $ perf record -e cycles -c 1000000 \ | + | |
- | afl-fuzz -i fuzzgoat/in -o afl_output -- ./fuzzgoat/fuzzgoat @@ | + | |
- | </code> | + | |
- | + | ||
- | Leave the process running for a minute or so; then kill it with //<Ctrl + C>//. **perf** will take a few moments longer to save all collected samples in a file named //perf.data//. Let it finish without interrupting it. | + | |
- | + | ||
- | Let's look at the raw trace output first, then at the perf report. The report aggregates the raw trace information and identifies points of interest. | + | |
- | + | ||
- | <code bash> | + | |
- | $ perf script -i perf.data | + | |
- | $ perf report -i perf.data | + | |
- | </code> | + | |
- | + | ||
- | Use ''perf script'' to identify the PID of **afl-fuzz** (hint: ''-F''). Then, filter out any samples unrelated to **afl-fuzz** (i.e.: its child process, **fuzzgoat**) from the report. Then, identify the most heavily used functions in **afl-fuzz**. Can you figure out what they do from the source code? | + | |
+ | Please take a minute to fill in the **[[https://forms.gle/NpSRnoEh9NLYowFr5 | feedback form]]** for this lab. |
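
The last step of the removed Task B text (finding the PID of **afl-fuzz** in the ''perf script'' output) could be approached roughly as sketched below. This is only a sketch under stated assumptions: ''most_sampled_pid'' is a hypothetical helper, not part of **perf** or **afl**, the input is assumed to have one ''<comm> <pid>'' pair per line (the layout expected from ''perf script -F comm,pid''), and the sample data and PIDs are made up. The helper picks the PID with the most samples because short-lived children may briefly carry the same command name between ''fork'' and ''exec''.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not a perf or afl command): print the PID with the
# most samples for a given command name.
# Assumed input on stdin: one "<comm> <pid>" pair per line.
most_sampled_pid() {
    local comm="$1"
    awk -v comm="$comm" '
        $1 == comm { count[$2]++ }          # count samples per PID
        END {
            best = ""
            for (pid in count)
                if (best == "" || count[pid] > count[best])
                    best = pid
            if (best != "")
                print best
        }'
}

# Stand-in for real perf output; the PIDs below are invented.
sample_output='        afl-fuzz 4242
        fuzzgoat 4250
        afl-fuzz 4242
        afl-fuzz 4249
        fuzzgoat 4251'

afl_pid="$(printf '%s\n' "$sample_output" | most_sampled_pid afl-fuzz)"
echo "afl-fuzz PID: ${afl_pid}"
```

With real data, the helper would presumably be fed from ''perf script -i perf.data -F comm,pid'', and the resulting PID could then be passed to ''perf report -i perf.data %%--%%pid "$afl_pid"''. Check ''man perf-script(1)'' and ''man perf-report(1)'' for the options supported by your **perf** version.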