Table of Contents

04. [30p] Impact analysis of iptables rules

In Lab 05 you used bpftrace exclusively via one-liners (-e flag). That works fine for quick investigations, but as your probes get more complex (multiple hooks, conditionals, helper functions) you'll want to write proper script files (.bt extension).

The difference is minimal syntactically, but it is quite important in practice: a script file can have comments, be version-controlled, be shared with teammates, and be run with sudo bpftrace script.bt without the shell escaping headaches that come with one-liners.
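As a quick illustration, the same probe expressed both ways (the probe itself is just an example, not part of the tasks):

```shell
# One-liner form -- shell quoting gets painful as probes grow:
sudo bpftrace -e 'fentry:nf_hook_slow { @[comm]++; }'

# Script-file form -- comments allowed, no quoting headaches:
cat > count.bt <<'EOF'
#!/usr/bin/bpftrace
/* count nf_hook_slow() invocations per process */
fentry:nf_hook_slow { @[comm]++; }
EOF
sudo bpftrace count.bt
```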

In this task you'll write two scripts targeting functions you observed in your pwru trace from Exercise 03.

Before starting: make sure you have a clean iptables state. Remove any DROP rules you added in the previous exercise:

$ sudo iptables -D OUTPUT -p udp -d 8.8.8.8 --dport 53 -j DROP

[0p] Task A: Demo: coding style for bpftrace scripts

Before writing your own scripts, study this example. It is not a task — there is nothing to submit. It exists to show what a well-structured .bt script looks like, so you have a reference when writing the next two.

You can also find out more about the bpftrace coding style here

nf_demo.bt
#!/usr/bin/bpftrace
 
BEGIN
{
    printf("Tracing nf_hook_slow... Ctrl+C to stop.\n\n");
}
 
/* fentry fires at the entry of the kernel function.
 * Faster and lower-overhead than kprobe.
 * 'comm' is a bpftrace built-in: the name of the current process.
 */
fentry:nf_hook_slow
{
    @invocations_by_process[comm]++;
}
 
/* Print and reset every 3 seconds */
interval:s:3
{
    printf("-- %s --\n", strftime("%H:%M:%S", nsecs));
    print(@invocations_by_process);
    printf("\n");
    clear(@invocations_by_process);
}
 
END
{
    printf("Done.\n");
}

Run it for a few seconds while generating some traffic and observe the output. Then read through the script again. This is the style expected in Task B.

[30p] Task B: The cost of a bloated rule chain

nf_hook_slow(), which is visible in your pwru trace from Exercise 03, is the function that walks the iptables rule chain for every packet. Its cost is not fixed: it scales with the number of rules in the chain and, within each rule, with the number of match flags specified. A match rule such as -p tcp -d 8.8.8.8 --dport 443 invokes three separate match callbacks in sequence; if any returns false, evaluation stops for that rule and moves on to the next one. On a long chain, this adds up.

A common real-world mistake: a sysadmin responds to unwanted traffic by adding one DROP rule per offending source IP, one at a time, instead of a single rule covering the entire prefix. After hours or days of this, the chain has thousands of rules. Every packet, regardless of its actual destination, must walk the entire chain before reaching the default policy. On a modest server, this is enough to cause visible throughput degradation.

You are going to reproduce this and measure it.

What you need
Sub-task 01: Setting up a local iperf3 server

Running a local server eliminates network variability from the experiment, so the iptables overhead signal becomes much cleaner and easier to observe in the plot. Pick one of the two options below depending on your setup.

Option 1: Docker container with Arch Linux

If you have Docker installed, you can spin up an Arch Linux container. This container will use the same TCP/IP stack as the host, but will have distinct network devices, routing tables, firewall rules, etc. Any packet that leaves the container will have to pass through the network stack twice.

# start the container
host$ docker run -ti --rm archlinux
 
# show IP address of container and run iperf3
arch$ pacman -Sy --noconfirm iperf3 iproute2
arch$ ip -c a s
arch$ iperf3 -s
 
# test if it works (should have >40Gbps throughput)
host$ iperf3 -c ${container_ip} -p 5201 -t 5

Option 2: network namespace (no Docker required)

This creates an isolated network environment using Linux network namespaces and a virtual Ethernet pair (veth), exactly like Docker does internally. See RL Lab 10 for a deeper dive into how this works.

# 1. Create the namespace
$ sudo ip netns add iperf3-ns
 
# 2. Create a veth pair: one end stays on the host, one goes into the namespace
$ sudo ip link add veth-host type veth peer name veth-ns
 
# 3. Move one end into the namespace
$ sudo ip link set veth-ns netns iperf3-ns
 
# 4. Configure the host-side interface
$ sudo ip addr add 10.99.0.1/24 dev veth-host
$ sudo ip link set veth-host up
 
# 5. Configure the namespace-side interface
$ sudo ip netns exec iperf3-ns ip addr add 10.99.0.2/24 dev veth-ns
$ sudo ip netns exec iperf3-ns ip link set veth-ns up
$ sudo ip netns exec iperf3-ns ip link set lo up
 
# 6. Start iperf3 server inside the namespace (background)
$ sudo ip netns exec iperf3-ns iperf3 -s -D
 
# 7. Test from the host (server is at 10.99.0.2)
$ iperf3 -c 10.99.0.2 -p 5201 -t 5

Traffic from the host to 10.99.0.2 is routed through the kernel's normal IP output path and traverses the OUTPUT chain, which is exactly where your nf_hook_slow instrumentation will fire.

When done with the experiment:

$ sudo ip netns delete iperf3-ns
$ sudo ip link delete veth-host

Sub-task 02: The bpftrace script

Write a bpftrace script of your own that calculates the average time each packet spent being evaluated in nf_hook_slow().

We recommend using kprobe/kretprobe instead of fentry/fexit for portability: kprobes work on kernels without full BTF support, which some VMs lack. The instrumentation overhead is slightly higher, but negligible overall.
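A minimal sketch of the kprobe/kretprobe timing pattern, to get you started (this is only the skeleton; the reporting and any filtering are still up to you):

```
#!/usr/bin/bpftrace

kprobe:nf_hook_slow
{
    /* remember the entry timestamp, keyed by thread ID */
    @start[tid] = nsecs;
}

kretprobe:nf_hook_slow
/@start[tid]/
{
    /* avg() maintains a running average of entry-to-return latency */
    @avg_ns = avg(nsecs - @start[tid]);
    delete(@start[tid]);
}
```

Note that this times every nf_hook_slow() call on the system, not just the OUTPUT chain; decide for yourself whether that is acceptable for your measurement setup.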

Sub-task 03: Acquiring the data

Run a 5-10s iperf3 throughput test between your host and the container (or namespace). Meanwhile, use the script you've written to measure the latency introduced by the OUTPUT Netfilter chain hook.

With no rules configured on your OUTPUT chain, this first run serves as a baseline. Next, repeat the experiment after adding 100 iptables rules that are guaranteed never to match (i.e., no verdict is reached until all rules have been evaluated). Keep adding rules in batches of 100 and re-measuring until you end up with ~3,000 rules in your OUTPUT chain. Save all the results (number of rules, average throughput, average Netfilter-induced latency), since you will have to plot them.

Try to script this, since manually re-running all of this is very tiresome!
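One possible automation sketch, assuming the namespace setup from Option 2 (server at 10.99.0.2) and that jq is installed for parsing iperf3's JSON output; the results file name and the never-matching rule are placeholders you can change:

```shell
#!/bin/sh
# Sketch: grow the OUTPUT chain in batches of 100 and re-measure each time.
for batch in $(seq 0 100 2900); do
    # add 100 rules that can never match (192.0.2.1 is TEST-NET-1, never used)
    for i in $(seq 100); do
        sudo iptables -A OUTPUT -s 192.0.2.1 -p tcp --dport 9 -j DROP
    done
    rules=$(sudo iptables -S OUTPUT | grep -c -- '-A OUTPUT')
    # average throughput in bits/s, taken from iperf3's JSON output
    tput=$(iperf3 -c 10.99.0.2 -t 5 --json | jq '.end.sum_sent.bits_per_second')
    lat_ns=0  # placeholder: wire up the average reported by your bpftrace script
    echo "$rules $tput $lat_ns" >> results.txt
done
```

Remember to start your bpftrace script before each iperf3 run and capture its output; how you extract the average latency from it depends on how your script prints it.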

Flush your OUTPUT chain after you are done with the experiment:

$ sudo iptables -F OUTPUT

Sub-task 04: Plotting the data

Write a Python script that creates two plots in the same figure: average throughput vs. number of rules, and average Netfilter-induced latency vs. number of rules.
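A possible skeleton for the plotting script, assuming the results were saved as whitespace-separated lines of the form `<num_rules> <throughput_bps> <latency_ns>` (the file name `results.txt` and the column order are assumptions; adapt them to your own output format):

```python
def parse_results(lines):
    """Parse measurement lines into three parallel lists."""
    rules, tput, lat = [], [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        r, t, l = line.split()
        rules.append(int(r))
        tput.append(float(t))
        lat.append(float(l))
    return rules, tput, lat

def plot(rules, tput, lat):
    # imported here so parse_results() stays dependency-free
    import matplotlib.pyplot as plt

    fig, ax1 = plt.subplots()
    ax1.plot(rules, tput, color="tab:blue", marker="o")
    ax1.set_xlabel("number of OUTPUT rules")
    ax1.set_ylabel("throughput [bits/s]", color="tab:blue")

    # second y-axis sharing the same x-axis
    ax2 = ax1.twinx()
    ax2.plot(rules, lat, color="tab:red", marker="s")
    ax2.set_ylabel("avg nf_hook_slow() latency [ns]", color="tab:red")

    fig.tight_layout()
    plt.savefig("impact.png")

# usage:
#   with open("results.txt") as f:
#       plot(*parse_results(f))
```

The twin-axis layout (`twinx()`) puts both curves in one figure so the throughput drop and the latency growth can be compared against the same x-axis.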

Sub-task 05: Interpreting the data

Answer the following: