ep:labs:05:contents:tasks:ex4 [2025/02/11 23:43] cezar.craciunoiu created
ep:labs:05:contents:tasks:ex4 [2025/02/20 15:04] (current) cezar.craciunoiu [04. [30p] GPU Monitoring]
==== 04. [30p] GPU Monitoring ====

=== a. [0p] Clone Repository and Build Project ===

Clone the repository containing the tasks and change into this lab's task 04 directory.
Follow the instructions in the **README.md** to install the dependencies and build the project.

<note tip>
<code bash>
$ git clone https://github.com/cs-pub-ro/EP-labs.git
$ cd EP-labs/lab_05/task_04
</code>
</note>
=== b. [10p] Run Project and Collect Measurements ===

To run the project, simply run the binary generated by the build step.
This will render a scene with a sphere.
Follow the instructions in the terminal and progressively increase the number of vertices.
Upon exiting the simulation with **Esc**, two **.csv** files will be created.
You will use these measurements to generate plots.

<note warning>
The simulation runs with the FPS unbounded, which means it will use your whole GPU. **Careful!**
Also pay close attention to your RAM usage!
</note>

<note tip>
Every time you modify the number of vertices, wait at least a couple of seconds for the FPS to stabilize.

For good results, increase the number of vertices until the FPS drops below 10.
</note>
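Before moving on to the plots, it can help to sanity-check that the two generated files are readable and have the columns you expect. A minimal sketch, assuming hypothetical filenames (the names the simulation actually writes may differ):

```python
import csv

report = []
# Hypothetical filenames; substitute the names of the files the simulation wrote.
for name in ("frames.csv", "events.csv"):
    try:
        with open(name) as f:
            rows = list(csv.reader(f))
        # First row is assumed to be the header; the rest are data rows.
        report.append(f"{name}: {len(rows) - 1} data rows, columns: {rows[0]}")
    except FileNotFoundError:
        report.append(f"{name}: not found; run the simulation first")

print("\n".join(report))
```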
<solution -hidden>
Students might face difficulties installing packages and/or running the project, depending on their system.
You can let them run the experiment on a colleague's computer.
The colleague can then send the generated results back so they can continue the exercise.
</solution>
=== c. [10p] Generate Plot ===

We want to interpret the recorded results.
To do this, we need to visualize them.
Plot the results in a way that is suggestive and easy to understand.

<note>
The recommended way to do the plots is to follow these specifications:
  * one single plot for all results
  * the left OY axis shows FPS as a continuous variable
  * the right OY axis shows the time spent per event, in **ms**
  * the OX axis follows the time of the simulation, without any time ticks
  * the OX axis has ticks showing the number of vertices for each event that happens
  * every event marked with ticks on the OX axis has one stacked bar chart made of two components:
    * a. a bottom component showing the time spent copying buffers
    * b. a top component showing the rest of the frame time, **without the time spent on copying buffers**
</note>
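The specifications above can be sketched with **matplotlib**. This is only a starting point, not the required solution: the inline CSV contents stand in for the two generated files, and all column names (''time_s'', ''fps'', ''vertices'', ''copy_ms'', ''frame_ms'') are assumptions you must adapt to the project's actual output.

```python
import csv
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the two generated .csv files; the real
# filenames and column names depend on the project and will likely differ.
frames_csv = io.StringIO("time_s,fps\n0,240\n1,238\n2,120\n3,118\n4,60\n5,59\n")
events_csv = io.StringIO(
    "time_s,vertices,copy_ms,frame_ms\n1.5,10000,2.0,6.0\n3.5,40000,7.0,16.0\n"
)

frames = list(csv.DictReader(frames_csv))
events = list(csv.DictReader(events_csv))

t = [float(r["time_s"]) for r in frames]
fps = [float(r["fps"]) for r in frames]
et = [float(r["time_s"]) for r in events]
copy_ms = [float(r["copy_ms"]) for r in events]
# Top bar component: total frame time minus the buffer-copy time.
rest_ms = [float(r["frame_ms"]) - float(r["copy_ms"]) for r in events]

fig, ax_fps = plt.subplots()
ax_fps.plot(t, fps, color="tab:blue")  # left OY axis: FPS over time
ax_fps.set_ylabel("FPS")
ax_fps.set_xticks(et)                  # ticks only where events happen
ax_fps.set_xticklabels([r["vertices"] for r in events])
ax_fps.set_xlabel("number of vertices at each event")

ax_ms = ax_fps.twinx()                 # right OY axis: ms per event
ax_ms.bar(et, copy_ms, width=0.3, label="copying buffers")
ax_ms.bar(et, rest_ms, width=0.3, bottom=copy_ms, label="rest of the frame")
ax_ms.set_ylabel("time per event (ms)")
ax_ms.legend()

fig.savefig("plot.svg")
```

Replacing the ''io.StringIO'' objects with ''open(...)'' on the real files, and the column names with the real headers, should get you most of the way to the required plot.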
<solution -hidden>
Recommend that students follow the guide from the note in the task.
If they do not follow it, make sure their approach is an improvement over the suggestion.
Otherwise, mark them as you see fit.
</solution>
=== d. [10p] Interpret Results ===

Explain the results you have plotted.
Answer the following questions:
  * Why does the FPS plot look like downward stairs as the number of vertices increases?
  * Why does the FPS initially drop lower, then stabilize at a higher value?
  * What takes longer: generating the vertices, or copying them into VRAM?
  * What is the correlation between the number of vertices and the time needed to copy the Vertex Buffer?
  * Why is the program less responsive at a lower frame rate?

<solution -hidden>
Answers:
  * The FPS stays stable until a button is pushed, and then it descends as the scene becomes more complex.
  * The more complex the scene, the more data needs to be copied, which means a longer pause between the last and the current frame. As the FPS is calculated per second, the result is skewed for the first second after the button is pressed.
  * They need to compare the bar chart components and see which is bigger for them. Theoretically, generating takes more time.
  * The higher the number of vertices, the bigger the Vertex Buffer, and the longer it takes to copy it from RAM into VRAM.
  * Input is processed once per frame. If input arrives while the program is busy, it is ignored. The lower the FPS, the more unresponsive the program becomes. This is easy to observe below 5 FPS.

Any format suggesting these is OK.
</solution>
=== e. [10p] Bonus Dedicated GPU ===

Go back to step b., rerun the binary, and make it run on your dedicated GPU.
Redo the plot with the new measurements.
You do not need to answer the questions again.

<note tip>
If you use Nvidia, you can use **prime-run**.

https://gist.github.com/abenson/a5264836c4e6bf22c8c8415bb616204a

If you use AMD, you can use the **DRI_PRIME=1** environment variable.
</note>
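The two mechanisms boil down to either prepending a wrapper (Nvidia) or setting an environment variable (AMD/Mesa) when launching the binary. A small launcher sketch, where ''./task_04'' is a placeholder name for the built binary:

```python
import os
import shutil
import subprocess  # only needed if you uncomment the launch below

binary = "./task_04"  # placeholder: substitute the binary produced by the build

if shutil.which("prime-run"):  # Nvidia: offloading wrapper from the gist above
    cmd = ["prime-run", binary]
    env = dict(os.environ)
else:                          # AMD: ask Mesa's PRIME offloading for the dGPU
    cmd = [binary]
    env = {**os.environ, "DRI_PRIME": "1"}

print("launch:", " ".join(cmd))
# subprocess.run(cmd, env=env, check=True)  # uncomment to actually launch
```

Equivalently, from a shell: ''prime-run ./task_04'' or ''DRI_PRIME=1 ./task_04''.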
<solution -hidden>
If they did the other tasks, this one will be easy enough for them.

I would not give them the bonus points if they did not answer the questions at point d.
</solution>