==== 04. [30p] Impact analysis of iptables rules ====

In [[https://ocw.cs.pub.ro/courses/ep/labs/05| Lab 05]] you used bpftrace exclusively via one-liners (''-e'' flag). That works fine for quick investigations, but as your probes get more complex (multiple hooks, conditionals, helper functions) you'll want to write proper **script files** (''.bt'' extension).

The difference is minimal syntactically, but it is quite important in practice: a script file can have comments, be version-controlled, be shared with teammates, and be run with ''sudo bpftrace script.bt'' without the shell escaping headaches that come with one-liners.
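
For instance, the two invocations below achieve the same thing (the probe and the file name are purely illustrative, assuming ''count_execs.bt'' contains the same probe):

<code bash>
# quick one-liner: fine at first, painful once quoting piles up
$ sudo bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%s\n", comm); }'

# the same probe saved in a script file: comments, version control, no escaping
$ sudo bpftrace count_execs.bt
</code>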

In this task you'll write two scripts targeting functions you observed in your **pwru** trace from Exercise 03.

<note important>
**Before starting:** make sure you have a clean ''iptables'' state. Remove any DROP rules you added in the previous exercise:

<code bash>
$ sudo iptables -D OUTPUT -p udp -d 8.8.8.8 --dport 53 -j DROP
</code>
</note>

Here is a small demo in that style. Only the structure matters; the traced function is merely an example (any kernel function from your **pwru** trace works):

<code>
#!/usr/bin/env bpftrace

BEGIN
{
    printf("Tracing... hit Ctrl-C to stop\n");
}

/* fentry fires at the entry of the kernel function.
 * Faster and lower-overhead than kprobe.
 * 'comm' is a bpftrace built-in: the name of the current process.
 */
fentry:nf_hook_slow
{
    @calls[comm] = count();
}
</code>

Run it for a few seconds while generating some traffic and observe the output. Then read through the script again. This is the style expected in Task B.

=== [30p] Task B: The cost of a bloated rule chain ===

''nf_hook_slow()'', which is visible in your **pwru** trace from Exercise 03, is the function that walks the **iptables** rule chain for every packet. Its cost is not fixed: it scales with the number of rules in the chain and, within each rule, with the number of match flags specified. A match rule such as ''-p tcp -d 8.8.8.8 --dport 443'' invokes three separate match callbacks in sequence; if any returns false, evaluation stops for that rule and moves on to the next one. On a long chain, this adds up.

A common real-world mistake: a sysadmin responds to unwanted traffic by adding one DROP rule per offending source IP, one at a time, instead of a single rule covering the entire prefix. After hours or days of this, the chain has thousands of rules. Every packet, regardless of its actual destination, must walk the entire chain before reaching the default policy. On a modest server, this is enough to cause visible throughput degradation.
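
In ''iptables'' terms, the difference looks like this (the addresses below are from a documentation range, purely for illustration):

<code bash>
# the anti-pattern: one DROP rule per offending source, appended over time
$ sudo iptables -A INPUT -s 203.0.113.7 -j DROP
$ sudo iptables -A INPUT -s 203.0.113.8 -j DROP
# ... thousands more ...

# what should have been done: a single rule covering the entire prefix
$ sudo iptables -A INPUT -s 203.0.113.0/24 -j DROP
</code>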

You are going to reproduce this and measure it.

== What you need ==

  * **iperf3:** Tool for performing network throughput measurements. It's both a server and a client.
  * **bpftrace:** High-level eBPF scripting tool for kernel profiling.
  * **python3:** With ''matplotlib'' and optionally ''pandas'' for plotting.

== Sub-task 01: Setting up a local iperf3 server ==

Running a local server eliminates network variability from the experiment; the iptables overhead signal becomes much cleaner and easier to observe in the plot. Pick one of the two options below depending on your setup.

**Option 1: Docker container with Arch Linux**

If you have Docker installed, you can spin up an Arch Linux container. This container will use the same TCP/IP stack as the host, but will have distinct network devices, routing tables, firewall rules, etc. Any packet that leaves the container will have to pass through the network stack twice.

A minimal sequence looks something like this (image and package names may need adjusting on your system):

<code bash>
# start a disposable Arch Linux container
$ docker run --rm -it --name iperf-server archlinux

# inside the container: install and start the iperf3 server
[root@container /]# pacman -Sy --noconfirm iperf3
[root@container /]# iperf3 -s

# on the host: find the container's IP address (use it as the iperf3 target)
$ docker inspect -f '{{.NetworkSettings.IPAddress}}' iperf-server
</code>

**Option 2: network namespace (no Docker required)**

This creates an isolated network environment using Linux network namespaces and a virtual Ethernet pair (veth), exactly like Docker does internally. See [[https://ocw.cs.pub.ro/courses/rl/labs/10|RL Lab 10]] for a deeper dive into how this works.

The setup looks roughly like this (namespace name and addresses are illustrative):

<code bash>
# create the namespace and a veth pair; move one end inside
$ sudo ip netns add iperf
$ sudo ip link add veth0 type veth peer name veth1
$ sudo ip link set veth1 netns iperf

# configure both ends
$ sudo ip addr add 10.10.10.1/24 dev veth0
$ sudo ip link set veth0 up
$ sudo ip netns exec iperf ip addr add 10.10.10.2/24 dev veth1
$ sudo ip netns exec iperf ip link set veth1 up

# start the server inside the namespace
$ sudo ip netns exec iperf iperf3 -s
</code>

== Sub-task 02: The bpftrace script ==

Write a //bpftrace script// of your own that calculates the average time each packet spent being evaluated in ''nf_hook_slow()''.

<note tip>
We recommend using ''kprobe''/''kretprobe'' instead of ''fentry''/''fexit'' for portability, since kprobes work on kernels without full BTF support, which some VMs lack. The instrumentation overhead is slightly higher, but overall negligible.
</note>
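
If you are unsure where to start, a minimal skeleton could look like the sketch below; extend it so it reports the values you need for the plot:

<code>
#!/usr/bin/env bpftrace

/* timestamp every entry into nf_hook_slow() */
kprobe:nf_hook_slow
{
    @start[tid] = nsecs;
}

/* on return, fold the elapsed time into a running average */
kretprobe:nf_hook_slow
/@start[tid]/
{
    @avg_ns = avg(nsecs - @start[tid]);
    delete(@start[tid]);
}

END
{
    /* discard the scratch map so only @avg_ns is printed on exit */
    clear(@start);
}
</code>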

== Sub-task 03: Acquiring the data ==

Run a 5-10s **iperf3** throughput test between your host and the container. Meanwhile, use the script that you've written to measure the latency introduced by the OUTPUT Netfilter chain hook.

With no rules configured on your OUTPUT chain, this first run will serve as a __baseline__. Next, redo the experiment after adding 100 **iptables** rules that are __guaranteed__ to fail (i.e., a verdict is reached only after all rules have been evaluated). Repeat these steps, 100 rules at a time, until you end up with ~3,000 rules in your OUTPUT chain. Save all these results (number of rules, average throughput, average Netfilter-induced latency) since you will have to plot them.

Try to script this, since manually re-running all of this is very tiresome!
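
A driver sketch (the server address and script names are placeholders for whatever you set up in the previous sub-tasks):

<code bash>
#!/bin/bash
SERVER=10.10.10.2    # iperf3 server from Sub-task 01

for rules in $(seq 0 100 3000); do
    # start the latency measurement; give bpftrace a moment to attach
    sudo timeout -s INT 12 bpftrace nf_latency.bt > "bpf_${rules}.txt" &
    sleep 1

    # 10s throughput test; -J emits JSON, which is easy to parse later
    iperf3 -c "${SERVER}" -t 10 -J > "iperf_${rules}.json"
    wait

    # append 100 rules that can never match the iperf3 traffic
    # (192.0.2.1 is a documentation address -- no real packet goes there)
    for i in $(seq 100); do
        sudo iptables -A OUTPUT -d 192.0.2.1 -p tcp --dport 9999 -j DROP
    done
done
</code>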

<note>
Flush your OUTPUT chain after you are done with the experiment:

<code bash>
$ sudo iptables -F OUTPUT
</code>
</note>

== Sub-task 04: Plotting the data ==

Write a Python script that creates two plots in the same figure (a sketch follows below):
  * **iperf3** throughput as a function of the number of **iptables** rules.
  * Average elapsed time in the Netfilter hook as a function of the number of **iptables** rules.
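
A plotting sketch (the file names and formats assumed here match the driver sketch above; adapt the parsing to however you actually saved your data):

<code python>
#!/usr/bin/env python3
import json

import matplotlib.pyplot as plt

rule_counts = range(0, 3001, 100)
throughput_mbps, latency_ns = [], []

for n in rule_counts:
    # iperf3 -J stores the test-wide average under end.sum_sent.bits_per_second
    with open(f"iperf_{n}.json") as f:
        throughput_mbps.append(json.load(f)["end"]["sum_sent"]["bits_per_second"] / 1e6)

    # assumes the bpftrace script prints a single '@avg_ns: <value>' line
    with open(f"bpf_{n}.txt") as f:
        latency_ns.append(float(f.read().split(":")[-1]))

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(rule_counts, throughput_mbps)
ax1.set_ylabel("Throughput [Mbit/s]")
ax2.plot(rule_counts, latency_ns)
ax2.set_ylabel("Avg time in nf_hook_slow() [ns]")
ax2.set_xlabel("Rules in OUTPUT chain")
fig.tight_layout()
fig.savefig("results.png")
</code>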

== Sub-task 05: Interpreting the data ==

Answer the following:
  * At what approximate rule count does the throughput begin to visibly degrade?
  * Is the latency increase in ''nf_hook_slow()'' linear with the rule count? What does this tell you about the algorithm used to walk the chain?
  * What do you expect would happen if you were to perform this test on a [[https://iperf3serverlist.net/|public iperf3 server]] instead of a locally hosted one?

<solution -hidden>