==== 04. [30p] Impact analysis of iptables rules ====

In [[https://ocw.cs.pub.ro/courses/ep/labs/05|Lab 05]] you used bpftrace exclusively via one-liners (the ''-e'' flag). That works fine for quick investigations, but as your probes get more complex (multiple hooks, conditionals, helper functions) you will want to write proper **script files** (''.bt'' extension).

The syntactic difference is minimal, but it matters in practice: a script file can have comments, be version-controlled, be shared with teammates, and be run with ''sudo bpftrace script.bt'' without the shell escaping headaches that come with one-liners.
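
For example, the same counting probe can be run both ways (a sketch; ''count.bt'' is a hypothetical file holding the probe body):

<code bash>
# one-liner: the whole program lives inside shell quotes
$ sudo bpftrace -e 'kprobe:nf_hook_slow { @calls[comm]++; }'

# script file: same probe saved in count.bt, no quoting gymnastics
$ sudo bpftrace count.bt
</code>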

In this task you'll write two scripts targeting functions you observed in your **pwru** trace from Exercise 03.

<note important>
**Before starting:** make sure you have a clean ''iptables'' state. Remove any DROP rules you added in the previous exercise:
<code bash>
$ sudo iptables -D OUTPUT -p udp -d 8.8.8.8 --dport 53 -j DROP
</code>
</note>
| + | |||
| + | === [0p] Task A: Demo: coding style for bpftrace scripts === | ||
| + | |||
| + | Before writing your own scripts, study this example. It is not a task — there is nothing to submit. It exists to show what a well-structured ''.bt'' script looks like, so you have a reference when writing the next two. | ||
| + | |||
| + | You can also find out more about the bpftrace coding style [[https://bpftrace.org/docs/release_025/language|here]] | ||
| + | |||
| + | <code bash nf_demo.bt> | ||
| + | #!/usr/bin/bpftrace | ||
| + | |||
| + | BEGIN | ||
| + | { | ||
| + | printf("Tracing nf_hook_slow... Ctrl+C to stop.\n\n"); | ||
| + | } | ||
| + | |||
| + | /* fentry fires at the entry of the kernel function. | ||
| + | * Faster and lower-overhead than kprobe. | ||
| + | * 'comm' is a bpftrace built-in: the name of the current process. | ||
| + | */ | ||
| + | fentry:nf_hook_slow | ||
| + | { | ||
| + | @invocations_by_process[comm]++; | ||
| + | } | ||
| + | |||
| + | /* Print and reset every 3 seconds */ | ||
| + | interval:s:3 | ||
| + | { | ||
| + | printf("-- %s --\n", strftime("%H:%M:%S", nsecs)); | ||
| + | print(@invocations_by_process); | ||
| + | printf("\n"); | ||
| + | clear(@invocations_by_process); | ||
| + | } | ||
| + | |||
| + | END | ||
| + | { | ||
| + | printf("Done.\n"); | ||
| + | } | ||
| + | |||
| + | </code> | ||

Run it for a few seconds while generating some traffic and observe the output. Then read through the script again. This is the style expected in Task B.
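
For instance:

<code bash>
# terminal 1: run the demo script
$ sudo bpftrace nf_demo.bt

# terminal 2: generate some traffic for it to count
$ ping -c 10 8.8.8.8
</code>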
| + | |||
| + | === [30p] Task B: The cost of a bloated rule chain === | ||
| + | |||
| + | ''nf_hook_slow()'', which is visible in your **pwru** trace from Task 03, is the function that walks the **iptables** rule chain for every packet. Its cost is not fixed: it scales with the number of rules in the chain, and within each rule, with the number of match flags specified. A match rule such as ''-p tcp -d 8.8.8.8 --dport 443'' invokes three separate match callbacks in sequence; if any returns false, evaluation stops for that rule and moves on to the next one. On a long chain, this adds up. | ||
| + | |||
| + | A common real-world mistake: a sysadmin responds to unwanted traffic by adding one DROP rule per offending source IP, one at a time, instead of a single rule covering the entire prefix. After hours or days of this, the chain has thousands of rules. Every packet, regardless of its actual destination, must walk the entire chain before reaching the default policy. On a modest server, this is enough to cause visible throughput degradation. | ||
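
A sketch of both, using the documentation prefix ''203.0.113.0/24'' (RFC 5737) to stand in for the offending range:

<code bash>
# the mistake: one rule per offending address, chain grows without bound
$ sudo iptables -A INPUT -s 203.0.113.17 -j DROP
$ sudo iptables -A INPUT -s 203.0.113.42 -j DROP
# ...thousands more...

# the fix: a single rule covering the whole prefix
$ sudo iptables -A INPUT -s 203.0.113.0/24 -j DROP
</code>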
| + | |||
| + | You are going to reproduce this and measure it. | ||
| + | |||
| + | == What you need == | ||
| + | |||
| + | * **iperf3:** Tool for performing network throughput measurements. It's both a server and a client. | ||
| + | * **bpftrace:** High-level eBPF scripting tool for kernel profiling. | ||
| + | * **python3:** With ''matplotlib'' and optionally ''pandas'' for plotting. | ||
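
If any of these are missing from your VM, something along these lines should work (a sketch; package names assume a Debian/Ubuntu-based system):

<code bash>
$ sudo apt install iperf3 bpftrace python3-matplotlib python3-pandas
</code>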
| + | |||
| + | == Sub-task 01: Setting up a local iperf3 server == | ||
| + | |||
| + | Running a local server eliminates network variability from the experiment, the iptables overhead signal becomes much cleaner and easier to observe in the plot. Pick one of the two options below depending on your setup. | ||
| + | |||
| + | **Option 1: Docker container with Arch Linux** | ||
| + | |||
| + | If you have Docker installed, you can spin up an Arch Linux container. This container will use the same TCP/IP stack as the host, but will have distinct network devices, routing tables, firewall rules, etc. Any packet that leaves the container will have to pass through the network stack twice. | ||
| + | |||
| + | <code bash> | ||
| + | # start the container | ||
| + | host$ docker run -ti --rm archlinux | ||
| + | |||
| + | # show IP address of container and run iperf3 | ||
| + | arch$ pacman -Sy --noconfirm iperf3 iproute2 | ||
| + | arch$ ip -c a s | ||
| + | arch$ iperf -s | ||
| + | |||
| + | # test if it works (should have >40Gbps throughput) | ||
| + | host$ iperf3 -c ${container_ip} -p 5201 -t 5 | ||
| + | </code> | ||
| + | |||
| + | **Option 2: network namespace (no Docker required)** | ||
| + | |||
| + | This creates an isolated network environment using Linux network namespaces and a virtual Ethernet pair (veth), exactly like Docker does internally. See [[https://ocw.cs.pub.ro/courses/rl/labs/10|RL Lab 10]] for a deeper dive into how this works. | ||
| + | |||
| + | <code bash> | ||
| + | # 1. Create the namespace | ||
| + | $ sudo ip netns add iperf3-ns | ||
| + | |||
| + | # 2. Create a veth pair: one end stays on the host, one goes into the namespace | ||
| + | $ sudo ip link add veth-host type veth peer name veth-ns | ||
| + | |||
| + | # 3. Move one end into the namespace | ||
| + | $ sudo ip link set veth-ns netns iperf3-ns | ||
| + | |||
| + | # 4. Configure the host-side interface | ||
| + | $ sudo ip addr add 10.99.0.1/24 dev veth-host | ||
| + | $ sudo ip link set veth-host up | ||
| + | |||
| + | # 5. Configure the namespace-side interface | ||
| + | $ sudo ip netns exec iperf3-ns ip addr add 10.99.0.2/24 dev veth-ns | ||
| + | $ sudo ip netns exec iperf3-ns ip link set veth-ns up | ||
| + | $ sudo ip netns exec iperf3-ns ip link set lo up | ||
| + | |||
| + | # 6. Start iperf3 server inside the namespace (background) | ||
| + | $ sudo ip netns exec iperf3-ns iperf3 -s -D | ||
| + | |||
| + | # 7. Test from the host (server is at 10.99.0.2) | ||
| + | $ iperf3 -c 10.99.0.2 -p 5201 -t 5 | ||
| + | </code> | ||
| + | |||
| + | Traffic from the host to ''10.99.0.2'' is routed through the kernel's normal IP output path and hits the OUTPUT chain where ''nf_hook_slow'' is instrumented correctly. | ||
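
To sanity-check the setup before the real experiment, count hook invocations while pinging the namespace:

<code bash>
# terminal 1: count nf_hook_slow invocations; Ctrl+C prints the total
$ sudo bpftrace -e 'kprobe:nf_hook_slow { @calls++; }'

# terminal 2: generate traffic toward the namespace
$ ping -c 5 10.99.0.2
</code>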
| + | |||
| + | When done with the experiment: | ||
| + | <code bash> | ||
| + | $ sudo ip netns delete iperf3-ns | ||
| + | $ sudo ip link delete veth-host | ||
| + | </code> | ||
| + | |||
| + | == Sub-task 02: The bpftrace script == | ||
| + | |||
| + | Write a //bpftrace script// of your own that calculates the average time each packet spent being evaluated in ''nf_hook_slow()''. | ||
| + | |||
| + | <note tip> | ||
| + | We reccomend using ''kprobe''/''kretprobe'' instead of ''fentry''/''fexit'' for portability, since kprobes work on kernels without full BTF support, which some VMs lack. The instrumentation overhead is slightly higher, but overall negligible. | ||
| + | </note> | ||
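
As a refresher, the entry/exit timing pattern looks like this (a sketch applied to ''vfs_read()'' as a neutral example; the file name ''latency_pattern.bt'' and the probed function are placeholders, and adapting the pattern is your job):

<code bash latency_pattern.bt>
#!/usr/bin/bpftrace

/* record the entry timestamp, keyed by thread id */
kprobe:vfs_read
{
    @ts[tid] = nsecs;
}

/* on exit, record the elapsed time if we saw the entry */
kretprobe:vfs_read
/ @ts[tid] /
{
    @latency_ns = hist(nsecs - @ts[tid]);
    delete(@ts[tid]);
}
</code>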
| + | |||
| + | == Sub-task 03: Acquiring the data == | ||
| + | |||
| + | Run a 5-10s **iperf3** throughput test between your host and the container. Meanwhile, use the script that you've written to measure the latency introduced by the OUTPUT Netfilter chain hook. | ||
| + | |||
| + | Having no rules configured on your OUTPUT chain, this will serve as a __baseline__. Next, redo this experiment by continuously adding 100 **iptables** rules that are __guaranteed__ to fail (i.e., verdict will never be obtained until all rules are evaluated). Repeat these steps until you end up with ~3,000 rules in your OUTPUT chain. Save all these results (number of rules, average throughput, average Netfilter-induced latency) since you will have to plot them. | ||
| + | |||
| + | Try to script this, since manually re-running all of this is very tiresome! | ||
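
One possible shape for such a driver (a sketch: run your bpftrace script in a second terminal for the whole duration; ''192.0.2.1'' is a documentation address that your **iperf3** traffic can never match, and the file naming is up to you):

<code bash>
#!/bin/bash
# measure at the current chain length, then grow the chain by 100 rules
for ((rules = 0; rules <= 3000; rules += 100)); do
    iperf3 -c 10.99.0.2 -t 5 -J > "iperf_${rules}_rules.json"

    for ((i = 0; i < 100; i++)); do
        sudo iptables -A OUTPUT -d 192.0.2.1 -j ACCEPT
    done
done
</code>

Note that **iptables** accepts duplicate rules, so appending the same never-matching rule 100 times is enough to lengthen the chain.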
| + | |||
| + | <note> | ||
| + | Flush your OUTPUT chain after you are done with the experiment: | ||
| + | <code bash> | ||
| + | $ sudo iptables -F OUTPUT | ||
| + | </code> | ||
| + | </note> | ||
| + | |||
| + | |||
| + | == Sub-task 04: Plotting the data == | ||
| + | |||
| + | Write a Python script that creates two plots in the same figure: | ||
| + | * **iperf3** throughput as a function of **iptabes** rules. | ||
| + | * Average elapsed time in the Netfilter hook as a function of **iptables** rules. | ||
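
A minimal skeleton (a sketch; the placeholder lists are made up and stand in for the samples you collected in Sub-task 03):

<code python plot_sketch.py>
#!/usr/bin/env python3
import matplotlib.pyplot as plt

# placeholder samples: replace with the data collected in Sub-task 03
rules = [0, 1000, 2000, 3000]          # number of OUTPUT rules
throughput = [42.0, 35.0, 28.0, 22.0]  # Gbit/s (made-up values)
latency = [500, 1500, 2500, 3500]      # avg ns in nf_hook_slow (made-up)

fig, (ax_tp, ax_lat) = plt.subplots(2, 1, sharex=True)
ax_tp.plot(rules, throughput, marker="o")
ax_tp.set_ylabel("throughput (Gbit/s)")
ax_lat.plot(rules, latency, marker="o", color="crimson")
ax_lat.set_ylabel("avg latency (ns)")
ax_lat.set_xlabel("iptables rules in OUTPUT chain")
fig.savefig("plot_sketch.png", dpi=150)
</code>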
| + | |||
| + | == Sub-task 05: Interpreting the data == | ||
| + | |||
| + | Answer the following: | ||
| + | * At what approximate rule count does the throughput begin to visibly degrade? | ||
| + | * Is the latency increase in ''nf_hook_slow()'' linear with rule count? What does this tell you about the algorithm used to walk the chain? | ||
| + | * What do you expect would happen if you were to perform this test on a [[https://iperf3serverlist.net/|public iperf3 server]] instead of locally hosted one? | ||
| + | |||
| + | <solution -hidden> | ||
| + | |||
| + | //Terminal 1 — sustained iperf3 test (300 seconds, JSON output):// | ||
| + | <code bash> | ||
| + | $ iperf3 -c <server> -p <port> -t 300 -J --logfile iperf_results.json | ||
| + | </code> | ||
| + | |||
| + | //Terminal 2 — bpftrace measurement:// | ||
| + | <code bash> | ||
| + | $ sudo bpftrace nf_measure.bt 2>/dev/null | tee bpf_results.txt | ||
| + | </code> | ||
| + | |||
| + | //Terminal 3 — inject rules progressively (run only after the other two are running):// | ||
| + | <code bash> | ||
| + | $ for ((i = 0; i < 5000; i++)); do | ||
| + | echo -ne "\r$i" | ||
| + | sudo iptables -I OUTPUT -d 192.168.${i} -j ACCEPT | ||
| + | done | ||
| + | </code> | ||
| + | |||
| + | <code bash nf_measure.bt> | ||
| + | #!/usr/bin/bpftrace | ||
| + | kprobe:nf_hook_slow | ||
| + | { | ||
| + | @start[tid] = nsecs; | ||
| + | } | ||
| + | |||
| + | kretprobe:nf_hook_slow | ||
| + | / @start[tid] / | ||
| + | { | ||
| + | @sum += nsecs - @start[tid]; | ||
| + | @count++; | ||
| + | delete(@start[tid]); | ||
| + | } | ||
| + | |||
| + | interval:s:10 | ||
| + | { | ||
| + | printf("avg. elapsed: %lu | count: %lu\n", @sum / @count, @count); | ||
| + | @sum = 0; | ||
| + | @count = 0; | ||
| + | } | ||
| + | </code> | ||
| + | |||
| + | <code python plot_results.py> | ||
| + | #!/usr/bin/env python3 | ||
| + | import json | ||
| + | import re | ||
| + | import sys | ||
| + | import matplotlib.pyplot as plt | ||
| + | import matplotlib.ticker as ticker | ||
| + | |||
| + | IPERF_JSON = "iperf_results.json" | ||
| + | BPF_LOG = "bpf_results.txt" | ||
| + | OUTPUT_PNG = "results.png" | ||
| + | |||
| + | # parsing the json | ||
| + | with open(IPERF_JSON) as f: | ||
| + | data = json.load(f) | ||
| + | |||
| + | intervals = data["intervals"] | ||
| + | times_s = [iv["sum"]["start"] for iv in intervals] | ||
| + | throughput = [iv["sum"]["bits_per_second"] / 1e6 for iv in intervals] # Mbit/s | ||
| + | cwnd_kb = [iv["streams"][0]["snd_cwnd"] / 1024 for iv in intervals] # KB | ||
| + | |||
| + | # parsing bpftrace output | ||
| + | # format: "avg. elapsed: 1243 | count: 8821" | ||
| + | bpf_times = [] | ||
| + | bpf_latency = [] | ||
| + | pattern = re.compile(r"avg\. elapsed:\s*(\d+)\s*\|\s*count:\s*(\d+)") | ||
| + | |||
| + | with open(BPF_LOG) as f: | ||
| + | t = 5 # first interval midpoint (10s intervals) | ||
| + | for line in f: | ||
| + | m = pattern.search(line) | ||
| + | if m: | ||
| + | bpf_times.append(t) | ||
| + | bpf_latency.append(int(m.group(1))) | ||
| + | t += 10 | ||
| + | |||
| + | # making the plot | ||
| + | fig, (ax1, ax3) = plt.subplots(2, 1, figsize=(12, 7), sharex=True) | ||
| + | fig.suptitle("iptables rule chain overhead — nf_hook_slow vs. throughput", fontsize=13) | ||
| + | |||
| + | # Top subplot: throughput + cwnd | ||
| + | color_tp = "steelblue" | ||
| + | color_cwnd = "darkorange" | ||
| + | |||
| + | ax1.plot(times_s, throughput, color=color_tp, label="Throughput (Mbit/s)", linewidth=1.5) | ||
| + | ax1.set_ylabel("Throughput (Mbit/s)", color=color_tp) | ||
| + | ax1.tick_params(axis="y", labelcolor=color_tp) | ||
| + | |||
| + | ax2 = ax1.twinx() | ||
| + | ax2.plot(times_s, cwnd_kb, color=color_cwnd, label="TCP cwnd (KB)", linewidth=1.2, linestyle="--") | ||
| + | ax2.set_ylabel("TCP Congestion Window (KB)", color=color_cwnd) | ||
| + | ax2.tick_params(axis="y", labelcolor=color_cwnd) | ||
| + | |||
| + | lines1, labels1 = ax1.get_legend_handles_labels() | ||
| + | lines2, labels2 = ax2.get_legend_handles_labels() | ||
| + | ax1.legend(lines1 + lines2, labels1 + labels2, loc="upper right", fontsize=9) | ||
| + | |||
| + | # Bottom subplot: nf_hook_slow latency | ||
| + | ax3.step(bpf_times, bpf_latency, color="crimson", where="post", linewidth=1.5, | ||
| + | label="avg nf_hook_slow latency (ns)") | ||
| + | ax3.set_ylabel("avg latency (ns)") | ||
| + | ax3.set_xlabel("Time (s)") | ||
| + | ax3.legend(loc="upper left", fontsize=9) | ||
| + | ax3.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: f"{int(x):,}")) | ||
| + | |||
| + | # Marking iptables injection window | ||
| + | INJECT_START = 10 # seconds after iperf3 started when you ran Terminal 3 | ||
| + | INJECT_END = 260 # approximate end of injection loop | ||
| + | for ax in (ax1, ax3): | ||
| + | ax.axvspan(INJECT_START, INJECT_END, alpha=0.08, color="gray", | ||
| + | label="rule injection window") | ||
| + | |||
| + | plt.tight_layout() | ||
| + | plt.savefig(OUTPUT_PNG, dpi=150) | ||
| + | print(f"Saved {OUTPUT_PNG}") | ||
| + | </code> | ||
| + | |||
| + | </solution> | ||
| + | |||