
Differences

This shows you the differences between two versions of the page.

--- ep:labs:061:contents:tasks:ex3 [2026/04/06 23:20]
maria.popescu2812 [04. [20p] bpftrace]
+++ ep:labs:061:contents:tasks:ex3 [2026/04/07 02:13] (current)
radu.mantu
@@ Line 5: / Line 5: @@
 [[https://github.com/cilium/pwru|pwru]] is a tool created by Cilium to help trace network packets in the kernel's network stack and debug network connectivity issues. It does this by attaching simple eBPF programs to certain function entry points. These programs can report back to a userspace process different kinds of information, including the function that was reached, the arguments that were passed, and a CPU clock timestamp. The method used for instrumenting kernel code is based on [[https://www.kernel.org/doc/html/latest/trace/kprobes.html|kprobes]]. Ask your assistant for more information.
-=== [10p] Task A — A packet's journey ===
+=== [10p] Task A - A packet's journey ===
 **Installation — build from source**
@@ Line 71: / Line 71: @@
   - **How does it leave the machine?** Identify the function responsible for handing the packet to the network device driver. What happens after this point?
   - **What changed with the DROP rule?** Compare the two traces side by side. At which function does the path diverge?
-==== 04. [20p] bpftrace ====
-In [[https://ocw.cs.pub.ro/courses/ep/labs/05| Lab 05]] you used bpftrace exclusively via one-liners (''-e'' flag). That works fine for quick investigations, but as your probes get more complex — multiple hooks, conditionals, helper functions — you'll want to write proper **script files** (''.bt'' extension).
-The difference is minimal syntactically, but it is quite important in practice: a script file can have comments, be version-controlled, be shared with teammates, and be run with ''sudo bpftrace script.bt'' without the shell escaping headaches that come with one-liners.
-In this task you'll write two scripts targeting functions you observed in your ''pwru'' trace from Task 03.
-<note important>
-**Before starting:** make sure you have a clean ''iptables'' state. Remove any DROP rules you added in Task 03:
-<code bash>
-$ sudo iptables -D OUTPUT -p udp -d 8.8.8.8 --dport 53 -j DROP
-# verify:
-$ sudo iptables -L OUTPUT -n
-</code>
-</note>
-=== [0p] Task A: Demo: coding style for bpftrace scripts ===
-Before writing your own scripts, study this example. It is not a task — there is nothing to submit. It exists to show what a well-structured ''.bt'' script looks like, so you have a reference when writing the next two.
-You can also find out more about the bpftrace coding style [[https://bpftrace.org/docs/release_025/language|here]].
-<code bash nf_demo.bt>
-#!/usr/bin/bpftrace
-BEGIN
-{
-    printf("Tracing nf_hook_slow... Ctrl+C to stop.\n\n");
-}
-/*
- * fentry fires at the entry of the kernel function.
- * Faster and lower-overhead than kprobe.
- * 'comm' is a bpftrace built-in: the name of the current process.
- */
-fentry:nf_hook_slow
-{
-    @invocations_by_process[comm]++;
-}
-/* Print and reset every 3 seconds */
-interval:s:3
-{
-    printf("-- %s --\n", strftime("%H:%M:%S", nsecs));
-    print(@invocations_by_process);
-    printf("\n");
-    clear(@invocations_by_process);
-}
-END
-{
-    printf("Done.\n");
-}
-</code>
-Run it for a few seconds while generating some traffic (''curl'', ''dig'', etc.) and observe the output. Then read through the script again. This is the style expected in Task B.
-=== [20p] Task B: The cost of a bloated rule chain ===
-''nf_hook_slow'', which is visible in your ''pwru'' trace from Task 03, is the function that walks the iptables rule chain for every packet. Its cost is not fixed: it scales with the number of rules in the chain, and within each rule, with the number of match flags specified. A rule like ''-p tcp -d 8.8.8.8 --dport 443'' invokes three separate match callbacks in sequence; if any returns false, evaluation stops for that rule and moves on to the next one. On a long chain, this adds up.
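The cost model described above can be sketched in a few lines of Python. This is a simplified illustration, not the kernel's implementation: ''Rule'', ''walk_chain'' and ''make_rule'' are hypothetical names, and the packet is a plain dictionary. It only counts how many match callbacks run while a packet walks a chain in which no rule matches.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Packet = Dict[str, object]
Match = Callable[[Packet], bool]


@dataclass
class Rule:
    matches: List[Match]  # evaluated in order, like -p / -d / --dport
    verdict: str


def walk_chain(chain: List[Rule], packet: Packet,
               policy: str = "ACCEPT") -> Tuple[str, int]:
    """Return (verdict, number of match callbacks invoked)."""
    calls = 0
    for rule in chain:
        matched = True
        for match in rule.matches:
            calls += 1
            if not match(packet):
                # First failing match ends evaluation of this rule.
                matched = False
                break
        if matched:
            return rule.verdict, calls
    # No rule matched: fall through to the default policy.
    return policy, calls


def make_rule(proto: str, dst: str, dport: int, verdict: str = "DROP") -> Rule:
    return Rule(
        matches=[
            lambda p: p["proto"] == proto,
            lambda p: p["dst"] == dst,
            lambda p: p["dport"] == dport,
        ],
        verdict=verdict,
    )


# Hypothetical packet headed for the local iperf3 server.
packet = {"proto": "tcp", "dst": "10.99.0.2", "dport": 5201}

for n_rules in (10, 100, 1000):
    chain = [make_rule("tcp", f"192.168.{i // 256}.{i % 256}", 443)
             for i in range(n_rules)]
    verdict, calls = walk_chain(chain, packet)
    print(n_rules, verdict, calls)
```

In this toy model every rule passes the protocol match and fails on the destination, so the packet costs two callbacks per rule and the total grows linearly with chain length.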
-A common real-world mistake: a sysadmin responds to unwanted traffic by adding one DROP rule per offending source IP, one at a time, instead of a single rule covering the entire prefix. After hours or days of this, the chain has thousands of rules. Every packet, regardless of its actual destination, must walk the entire chain before reaching the default policy. On a modest server, this is enough to cause visible throughput degradation.
-You are going to reproduce this and measure it.
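To see why a single prefix rule is the better fix, the sketch below collapses a list of per-host entries into the smallest set of covering prefixes using Python's standard ''ipaddress'' module. The offending addresses are made up for illustration (''203.0.113.0/24'' is a documentation range); a real list would come from your logs.

```python
import ipaddress

# Hypothetical offenders: 256 consecutive hosts, one /32 entry each,
# which is what the "one DROP rule per source IP" habit produces.
offenders = [ipaddress.ip_network(f"203.0.113.{i}/32") for i in range(256)]

# collapse_addresses merges adjacent and overlapping networks.
collapsed = list(ipaddress.collapse_addresses(offenders))

print(len(offenders), "rules ->", len(collapsed), "rule(s):", collapsed)
```

Each entry in ''collapsed'' corresponds to one ''iptables'' rule, so the chain length here drops from 256 to 1.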
-**What you need**
-  * ''iperf3'': ''sudo apt install iperf3''
-  * An iperf3 server — see options below
-  * ''bpftrace'' (from previous tasks)
-  * Python with ''matplotlib'' and ''pandas'' for the plot
-**Setting up a local iperf3 server**
-Running a local server eliminates network variability from the experiment, so the iptables overhead signal becomes much cleaner and easier to observe in the plot. Pick one of the two options below depending on your setup.
-=== Option 1: Docker container with Arch Linux ===
-If you have Docker installed, you can spin up an Arch Linux container that shares the host's network stack. With the ''--network host'' flag, the container does not get its own network namespace; it uses the host's instead. Traffic to ''127.0.0.1:5201'' goes through the host OUTPUT chain exactly as if a local process were listening.
-<code bash>
-# Pull the Arch Linux image (first time only, ~170MB)
-$ docker pull archlinux
-# Start the container: share host network, install iperf3, start server
-$ docker run -d --rm \
-    --name iperf3-server \
-    --network host \
-    archlinux \
-    sh -c "pacman -Sy --noconfirm iperf3 2>/dev/null && iperf3 -s"
-# Wait ~15 seconds for pacman to finish, then test
-$ iperf3 -c 127.0.0.1 -p 5201 -t 5
-# Stop when done (--rm means the container is deleted automatically)
-$ docker stop iperf3-server
-</code>
-<note tip>
-**Docker in one sentence:** a container is a process running in its own isolated namespaces (filesystem, PID, network, etc.). ''--network host'' disables the network isolation specifically, so the container's iperf3 process binds to your machine's port 5201 directly with no port forwarding, no bridge, no NAT.
-</note>
-<note warning>
-The ''pacman -Sy'' step runs inside the container every time it starts. Because ''--rm'' discards the container's filesystem when it stops, nothing is cached between runs and the packages are downloaded again each time (~15 seconds). If this is too slow, build a local image once:
-<code bash>
-$ echo -e 'FROM archlinux\nRUN pacman -Sy --noconfirm iperf3' > Dockerfile.iperf3
-$ docker build -t local/iperf3 -f Dockerfile.iperf3 .
-$ docker run -d --rm --name iperf3-server --network host local/iperf3 iperf3 -s
-</code>
-</note>
-=== Option 2: network namespace (no Docker required) ===
-This creates an isolated network environment using Linux network namespaces and a virtual Ethernet pair (veth), exactly like Docker does internally. See [[https://ocw.cs.pub.ro/courses/rl/labs/10|RL Lab 10]] for a deeper dive into how this works.
-<code bash>
-# 1. Create the namespace
-$ sudo ip netns add iperf3-ns
-# 2. Create a veth pair: one end stays on the host, one goes into the namespace
-$ sudo ip link add veth-host type veth peer name veth-ns
-# 3. Move one end into the namespace
-$ sudo ip link set veth-ns netns iperf3-ns
-# 4. Configure the host-side interface
-$ sudo ip addr add 10.99.0.1/24 dev veth-host
-$ sudo ip link set veth-host up
-# 5. Configure the namespace-side interface
-$ sudo ip netns exec iperf3-ns ip addr add 10.99.0.2/24 dev veth-ns
-$ sudo ip netns exec iperf3-ns ip link set veth-ns up
-$ sudo ip netns exec iperf3-ns ip link set lo up
-# 6. Start iperf3 server inside the namespace (background)
-$ sudo ip netns exec iperf3-ns iperf3 -s -D
-# 7. Test from the host (server is at 10.99.0.2)
-$ iperf3 -c 10.99.0.2 -p 5201 -t 5
-</code>
-Traffic from the host to ''10.99.0.2'' is routed through the kernel's normal IP output path and hits the OUTPUT chain, so your ''nf_hook_slow'' probe observes it.
-When done with the experiment:
-<code bash>
-$ sudo ip netns delete iperf3-ns
-$ sudo ip link delete veth-host
-</code>
-**The bpftrace script**
-Based on the example provided in the demo, write a similar script yourself, using probes as you did in [[https://ocw.cs.pub.ro/courses/ep/labs/05| Lab 05]].
-<note tip>
-We recommend using ''kprobe''/''kretprobe'' instead of ''fentry''/''fexit'' for portability, since kprobes work on kernels without full BTF support, which some VMs lack. The instrumentation overhead is slightly higher, but negligible alongside a 300-second iperf3 test.
-</note>
-**The experiment**
-Once your script is ready, open three terminals and run the following simultaneously:
-  * ''iperf3'', writing its output as JSON to ''iperf_results.json''
-  * your ''bpftrace'' measurement script, with its output saved to ''bpf_results.txt''
-  * a loop that injects iptables rules (we recommend injecting more than 5000)
-**Hint:** [[https://ocw.cs.pub.ro/courses/rl/labs/10| repetitive structures]]
-<note tip> ''iperf3'' with ''-J'' produces a JSON file containing per-second interval data. Each interval entry includes ''sum.bits_per_second'' (throughput) and ''streams[0].snd_cwnd'' (TCP congestion window size in bytes).
-</note>
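As a starting point for reading that file, here is a minimal sketch that pulls the per-interval fields named in the note out of an iperf3 JSON document. The ''SAMPLE'' string is made-up data trimmed to just those fields (a real ''iperf_results.json'' contains many more keys), and ''extract_intervals'' is a hypothetical helper name.

```python
import json

# Made-up two-interval sample with only the fields used below.
SAMPLE = """
{"intervals": [
  {"sum": {"start": 0.0, "bits_per_second": 9.4e9},
   "streams": [{"snd_cwnd": 1310720}]},
  {"sum": {"start": 1.0, "bits_per_second": 4.7e9},
   "streams": [{"snd_cwnd": 655360}]}
]}
"""


def extract_intervals(doc):
    """Return (start_s, throughput_mbit_s, cwnd_kb) for each interval."""
    rows = []
    for iv in doc["intervals"]:
        rows.append((
            iv["sum"]["start"],                    # seconds from test start
            iv["sum"]["bits_per_second"] / 1e6,    # bit/s -> Mbit/s
            iv["streams"][0]["snd_cwnd"] / 1024,   # bytes -> KB
        ))
    return rows


rows = extract_intervals(json.loads(SAMPLE))
for start, mbps, cwnd_kb in rows:
    print(f"{start:5.1f}s  {mbps:10.1f} Mbit/s  cwnd {cwnd_kb:8.1f} KB")
```

For the real experiment, replace ''json.loads(SAMPLE)'' with ''json.load()'' on the opened results file.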
-Wait for iperf3 to finish naturally (or stop it after the loop completes). Then clean up:
-<code bash>
-$ sudo iptables -F OUTPUT
-</code>
-**The plot**
-Write a Python script called ''plot_results.py'' that:
-  - Parses ''iperf_results.json'' and extracts, for each 1-second interval: timestamp (seconds from start), throughput in Mbit/s, and congestion window in KB
-  - Parses ''bpf_results.txt'' and extracts, for each 10-second interval: the average ''nf_hook_slow'' latency in nanoseconds
-  - Produces a single figure with two subplots stacked vertically, sharing the X axis (time in seconds):
-    * Top subplot: throughput (Mbit/s) as a line, congestion window (KB) as a line on a secondary Y axis
-    * Bottom subplot: average ''nf_hook_slow'' latency (ns) as a step plot, one point per 10s interval
-  - Adds a vertical shaded region (''axvspan'') marking the approximate time window during which iptables rules were being injected (you can estimate this from when you started Terminal 3)
-  - Saves the figure as ''results.png''
-**Questions**
-Answer the following in your report:
-  - At what approximate rule count (or time) does throughput begin to visibly degrade? Does the congestion window change in the same way?
-  - Is the latency increase in ''nf_hook_slow'' linear with rule count? What does this tell you about the algorithm used to walk the chain?
-<solution -hidden>
-//Terminal 1 — sustained iperf3 test (300 seconds, JSON output)://
-<code bash>
-$ iperf3 -c <server> -p <port> -t 300 -J --logfile iperf_results.json
-</code>
-//Terminal 2 — bpftrace measurement://
-<code bash>
-$ sudo bpftrace nf_measure.bt 2>/dev/null | tee bpf_results.txt
-</code>
-//Terminal 3 — inject rules progressively (run only after the other two are running)://
-<code bash>
-$ for ((i = 0; i < 5000; i++)); do
-      echo -ne "\r$i"
-      sudo iptables -I OUTPUT -d 192.168.${i} -j ACCEPT
-  done
-</code>
-<code bash nf_measure.bt>
-#!/usr/bin/bpftrace
-kprobe:nf_hook_slow
-{
-    @start[tid] = nsecs;
-}
-kretprobe:nf_hook_slow
-/ @start[tid] /
-{
-    @sum += nsecs - @start[tid];
-    @count++;
-    delete(@start[tid]);
-}
-interval:s:10
-{
-    printf("avg. elapsed: %lu | count: %lu\n", @sum / @count, @count);
-    @sum = 0;
-    @count = 0;
-}
-</code>
-<code python plot_results.py>
-#!/usr/bin/env python3
-import json
-import re
-import sys
-import matplotlib.pyplot as plt
-import matplotlib.ticker as ticker
-IPERF_JSON  = "iperf_results.json"
-BPF_LOG     = "bpf_results.txt"
-OUTPUT_PNG  = "results.png"
-# parsing the json
-with open(IPERF_JSON) as f:
-    data = json.load(f)
-intervals = data["intervals"]
-times_s       = [iv["sum"]["start"]           for iv in intervals]
-throughput    = [iv["sum"]["bits_per_second"] / 1e6 for iv in intervals]  # Mbit/s
-cwnd_kb       = [iv["streams"][0]["snd_cwnd"] / 1024 for iv in intervals]  # KB
-# parsing bpftrace output
-# format: "avg. elapsed: 1243 | count: 8821"
-bpf_times   = []
-bpf_latency = []
-pattern = re.compile(r"avg\. elapsed:\s*(\d+)\s*\|\s*count:\s*(\d+)")
-with open(BPF_LOG) as f:
-    t = 5  # first interval midpoint (10s intervals)
-    for line in f:
-        m = pattern.search(line)
-        if m:
-            bpf_times.append(t)
-            bpf_latency.append(int(m.group(1)))
-            t += 10
-# making the plot
-fig, (ax1, ax3) = plt.subplots(2, 1, figsize=(12, 7), sharex=True)
-fig.suptitle("iptables rule chain overhead — nf_hook_slow vs. throughput", fontsize=13)
-# Top subplot: throughput + cwnd
-color_tp   = "steelblue"
-color_cwnd = "darkorange"
-ax1.plot(times_s, throughput, color=color_tp, label="Throughput (Mbit/s)", linewidth=1.5)
-ax1.set_ylabel("Throughput (Mbit/s)", color=color_tp)
-ax1.tick_params(axis="y", labelcolor=color_tp)
-ax2 = ax1.twinx()
-ax2.plot(times_s, cwnd_kb, color=color_cwnd, label="TCP cwnd (KB)", linewidth=1.2, linestyle="--")
-ax2.set_ylabel("TCP Congestion Window (KB)", color=color_cwnd)
-ax2.tick_params(axis="y", labelcolor=color_cwnd)
-lines1, labels1 = ax1.get_legend_handles_labels()
-lines2, labels2 = ax2.get_legend_handles_labels()
-ax1.legend(lines1 + lines2, labels1 + labels2, loc="upper right", fontsize=9)
-# Bottom subplot: nf_hook_slow latency
-ax3.step(bpf_times, bpf_latency, color="crimson", where="post", linewidth=1.5,
-         label="avg nf_hook_slow latency (ns)")
-ax3.set_ylabel("avg latency (ns)")
-ax3.set_xlabel("Time (s)")
-ax3.legend(loc="upper left", fontsize=9)
-ax3.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: f"{int(x):,}"))
-# Marking iptables injection window
-INJECT_START = 10   # seconds after iperf3 started when you ran Terminal 3
-INJECT_END   = 260  # approximate end of injection loop
-for ax in (ax1, ax3):
-    ax.axvspan(INJECT_START, INJECT_END, alpha=0.08, color="gray",
-               label="rule injection window")
-plt.tight_layout()
-plt.savefig(OUTPUT_PNG, dpi=150)
-print(f"Saved {OUTPUT_PNG}")
-</code>
-</solution>

ep/labs/061/contents/tasks/ex3.1775506850.txt.gz · Last modified: 2026/04/06 23:20 by maria.popescu2812
