Lab 06 - Network Monitoring

Objectives

  • Dive into the inner workings of previously studied traffic monitoring / filtering tools
  • Discuss methods of path discovery
  • Integrate eBPF tools in more complex profiling tasks

Contents

Tasks

01. [20p] Primer / Reminder

Pro tip #1: since you'll be using man a lot in this exercise, install neovim and export this environment variable (in .bashrc or .zshrc) to change the man pager. neovim has very good built-in syntax highlighting.

export MANPAGER='nvim +Man!'

Export the environment variable (source the shell config file) and test that it works.

eBPF — quick recap

Both tools in this section rely on eBPF under the hood, so it's worth a 60-second refresher before we start.

eBPF (extended Berkeley Packet Filter) is a virtual instruction set built into the Linux kernel. You write a small program, the kernel verifies it for safety (no infinite loops, no bad memory access), JIT-compiles it to native code, and attaches it to a hook point — a socket, a syscall, a kernel function entry, a network interface, etc. The program runs in kernel space without you having to write a kernel module.

Originally (1992), BPF existed only to filter network packets for tools like tcpdump: instead of copying every packet to userspace and discarding most of them there, you push a filter program into the kernel and only the matching packets cross the boundary. The “extended” part arrived in 2014 (Linux 3.18), redesigning the ISA around 64-bit machines and bringing wider registers, new map types, and better JIT support.

Today eBPF is used for packet filtering (tcpdump, the iptables bpf match), system profiling (bpftrace, Netflix's FlameScope), and network policy enforcement in Kubernetes clusters (Cilium). You already used it in Lab 05 for I/O tracing. In this lab you'll see it appear in two more places: inside tcpdump's filter compiler and in the iptables bpf match module.

[10p] Task A - tcpdump

tcpdump is a network traffic monitoring tool. At its core, it uses libpcap which in turn uses a technology called the extended Berkeley Packet Filter (eBPF).

BPF was first proposed in 1992, when filtering mechanisms (and firewalls) were still a novel concept and were based on interpreters. BPF (now referred to as Classic BPF - cBPF) was the initial version of a Motorola-inspired virtual ISA (i.e.: it has no hardware implementation – think CHIP-8). eBPF is basically still BPF, but redesigned around 64-bit architectures so that Just In Time (JIT) compilers have an easier time translating the code into native instructions.

At first, the whole idea was to compile packet filtering programs and attach them to sockets in kernelspace. These programs would filter out packets that userspace processes would not be interested in. Consequently, this would reduce the quantity of data copied over the kernelspace/userspace boundary, only to ultimately be discarded.
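
Classic BPF socket filters are still available on any Linux socket today. As a minimal sketch (Linux-only; the SO_ATTACH_FILTER constant and the two struct layouts come from the kernel UAPI headers, not from Python's standard socket constants), here is a one-instruction "drop everything" cBPF program attached to a UDP socket:

```python
import ctypes
import socket
import struct

# one cBPF instruction: "ret 0" -> deliver 0 bytes of every packet (drop all)
# struct sock_filter { __u16 code; __u8 jt; __u8 jf; __u32 k; }
BPF_RET_K = 0x06
insns = struct.pack("HBBI", BPF_RET_K, 0, 0, 0)
buf = ctypes.create_string_buffer(insns)

# struct sock_fprog { unsigned short len; struct sock_filter *filter; }
# (struct's native alignment inserts the same padding the kernel expects)
prog = struct.pack("HL", 1, ctypes.addressof(buf))

SO_ATTACH_FILTER = 26          # from <asm-generic/socket.h>

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
rx.setsockopt(socket.SOL_SOCKET, SO_ATTACH_FILTER, prog)
rx.settimeout(0.5)

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"hello?", rx.getsockname())

try:
    rx.recvfrom(64)
    print("packet delivered")
except socket.timeout:
    print("packet filtered out in the kernel")
```

Returning 0 from a socket filter tells the kernel to deliver 0 bytes of the packet, i.e. to drop it before it ever crosses into userspace; returning a large value instead would accept it.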

Today, eBPF is used heavily for system profiling by companies such as Netflix and Facebook. Linux has shipped an in-kernel BPF VM since the late 1990s; the eBPF VM, together with its static verifier, was merged in kernel 3.18 (2014). tcpdump is one of the few examples that still use it for its original purpose.
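
You can watch this compilation step without capturing anything: tcpdump's -d flag prints the cBPF program that libpcap generates for a filter expression. Passing a fixed link type with -y should let it run without opening a live interface (behavior may vary slightly between tcpdump versions):

```shell
# compile a filter expression and dump the resulting cBPF program;
# -y EN10MB pins the link type so no capture device is required
tcpdump -y EN10MB -d 'udp dst port 123'
```

Each output line is one virtual-ISA instruction; the ret statements encode how many bytes of a matching packet should be copied to userspace (0 means drop).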

The Task

Use tcpdump to output outgoing NTP queries and incoming http(s) responses. Use the -d flag to see an objdump of the filter program's code.

Complete the tcpdump command in order to satisfy the following formatting requirements:

  • print the packet number
  • print the elapsed time (in nanoseconds) since the first packet was captured
  • print the content of each packet (without l2 header) in hex and ASCII
  • do not resolve IP addresses to names

How to test:

$ ntpdate -q ro.pool.ntp.org
$ curl ocw.cs.pub.ro

tcpdump can list the available interfaces if run with -D. In addition to your network interfaces, you may also see a Bluetooth device or the dbus-system interface (depending on your desktop).

If you don't specify the interface with -i, the first entry in the printed list will be used by default. This may not always be your active network interface but instead, for example, your docker bridge.

[10p] Task B - iptables

iptables is a configuration tool for the kernel packet filter.

The system as a whole provides many functionalities that are grouped by tables: filter, nat, mangle, raw, security. If you want to alter a packet header, you place a rule in the mangle table. If you want to mask the private IP address of an internal host with the external IP address of the default gateway, you place a rule in the nat table. Depending on the table you choose, you will gain or lose access to some chains. If not specified, the default is the filter table.

Chains are basically lists of rules. The five built-in chains are PREROUTING, FORWARD, POSTROUTING, INPUT, OUTPUT. Each of these corresponds to certain locations in the network stack where packets trigger Netfilter hooks (here is the PREROUTING kernel hook as an example – not that hard to add one, right?) For a selected chain, the order in which the rules are evaluated is determined primarily by the priority of their tables and secondarily by the user's discretionary arrangement (i.e.: order in which rules are inserted).

Figure 1: Netfilter hooks; each has a subset of associated tables. Tables categorize the actions (i.e.: targets) taken when a match occurs. E.g: NAT cannot be performed on the FORWARD chain. When multiple rules exist on the same chain, their processing order is primarily determined by the priority of the table they are defined in.

A rule consists of two entities: a sequence of match criteria and a jump target.

The jump target represents an action to be taken. You are most likely familiar with the built-in actions such as ACCEPT or DROP. These actions decide the ultimate fate of the packet and are final (i.e.: rule iteration stops when these are invoked). However, there are also extended actions (see man iptables-extensions(8)) that are not terminal verdicts and can be used for various tasks such as auditing, forced checksum recalculation or removal of Explicit Congestion Notification (ECN) bits.

The match criteria of every rule are checked to determine if the jump target is applied. The way this is designed is very elegant: every type of feature (e.g.: Layer 3 IP address vs Layer 4 port) that you can check has a match callback function defined in the kernel. If you want, you can write your own such function in a Linux Kernel Module (LKM) and thus extend the functionality of iptables (Writing Netfilter Modules with code example). However, you will need to implement a userspace shared library counterpart. When you start an iptables process, it searches in /usr/lib/xtables/ and automatically loads certain shared libraries (note: this path can be overridden or extended using the XTABLES_LIBDIR environment variable). Each library there must do three things:

  • define iptables flags for the new criteria that you want to include.
  • define help messages for when iptables --help is called (its help message is an amalgamation of each library's help snippet).
  • provide an initialization function for the structure containing the rule parameters; this structure will end up in the kernel's rule chain.

So when you want to test the efficiency of the iptables rule evaluation process, keep in mind that each rule may imply the invocation of multiple callbacks such as this.

The Task (1)

Write an iptables rule according to the following specifications:

  • chain: OUTPUT
  • match rule: TCP packets originating from ephemeral ports bound to a socket created by root
  • target: enable kernel logging of matched packets with the “EP: ” prefix

How to test:

$ sudo curl www.google.com
$ sudo dmesg

multiport, owner modules

$ man 8 iptables-extensions

The Task (2)

Write an iptables rule according to the following specifications:

  • chain: OUTPUT
  • match rule: BPF program that filters UDP traffic to port 53 (try bash command substitution)
  • target: set TTL to 1 (initially)

Continue appending the same rule with incremented TTL value until the DNS request goes through.

How to test:

$ dig +short fep.grid.pub.ro @8.8.8.8

bpf module

$ man 8 iptables-extensions nfbpf_compile

If you are working on Ubuntu, there is a chance that nfbpf_compile did not come with the iptables package (oh Canonical… maybe there's something in the Universe repos?).
Anyway, you can still install it manually:

$ sudo apt install libpcap-dev
$ wget https://raw.githubusercontent.com/netgroup-polito/iptables/master/utils/nfbpf_compile.c
$ gcc -o nfbpf_compile nfbpf_compile.c -lpcap

Also, use this man page rather than installing it separately.

Table matters

This rule uses the TTL target, which is only valid in a certain table. If you pick the wrong one, iptables will still pass your rule to the kernel, where it fails validation. All you'll see in the terminal is this unhelpful message:

iptables: Invalid argument. Run `dmesg' for more information.

Check dmesg whenever iptables gives you “Invalid argument”. You'll find the actual error there.

This is intentional behavior: the kernel module that handles the TTL target implements a rule check callback that validates the structure received from userspace. It doesn't trust you. If something is wrong, it logs to the kernel ring buffer — so dmesg is always your first stop when debugging iptables rules.

The Task (3)

Give an example of a scenario in which iptables is unable to catch a packet.

02. [20p] Network Exploration

[5p] Task A - ARP vs ICMP

The Address Resolution Protocol (ARP) resolves layer 2 addresses (MAC) from layer 3 addresses (e.g.: IP). Normally, all hosts are compelled to reply to ARP requests, but this can be fiddled with using tools such as arptables. You can show the currently known neighbors using iproute2.

$ ip -c neigh show

Pro tip #2: yes, ip can also generate color output. Most people don't know this and still use ifconfig, even though the latter has long been deprecated. Add this as an alias to your .bashrc or .zshrc and source it.

# alias for iproute2 color output
alias ip='ip -c'

The Internet Control Message Protocol (ICMP) is an ancillary protocol meant mainly to report errors between hosts. Sometimes it can also be used to perform measurements (ping) or to inform network participants of better routes (Redirect Messages). There are many ICMP functionalities, most of which are now deprecated. Note that some network equipment may not understand newer, officially recognized codepoints, while others may not even recognize experimental ICMP types (i.e.: type=253,254) and will simply drop the packet. Because ICMP can be used to stage attacks in a network, some operating systems (e.g.: Windows ≥7) went so far as to disable Echo Replies by default.

The Task(s)

Use arp-scan to scan your local network while monitoring ARP traffic with wireshark to get a sense of what's going on. After that, use the following script to identify hosts discoverable via ARP but not ICMP.


Hint: click on the file name to download the snippet below.

localnet-ping.sh
#!/bin/bash
 
# localnet-ping.sh - performs differential ARP / ICMP scan
#   $1 : [required] interface name  
 
if [ "$#" -ne 1 ]; then
    echo "Usage: ./localnet-ping.sh <interface>"
    exit 1
fi
 
# generate list of IPs and hostnames in local network for given interface
localnet_hosts=$(sudo arp-scan                                      \
                    --interface=$1           `# scanned network`    \
                    --localnet               `# only local network` \
                | head -n -3                 `# hide footer lines`  \
                | tail -n +3                 `# hide header lines`  \
                | awk '{$2=""; print $0}'    `# hide MAC address`   \
                )
 
# process generated list, one item at a time
while read -r it; do
    # separate IP from hostname
    current_ip=$(awk '{print $1}' <<< "$it")
    current_host=$(awk '{$1=""; print $0}' <<< "$it")
 
    printf '\033[1;33m%15s   %-35s \033[0;33m==>  \033[0m' \
        "$current_ip" "$current_host"
 
    # ping current host
    ping -c 1           `# only one ping` \
         -W 1           `# 1s timeout`    \
         "$current_ip"  `# target host`   \
         1>/dev/null 2>&1
 
    # evaluate ping success
    if [ $? -eq 0 ]; then
        printf '\033[1;32mok\n\033[0m'
    else
        printf '\033[1;31mfail\n\033[0m'
    fi
done <<< "$localnet_hosts"

[15p] Task B - nmap vs traceroute

nmap is a network exploration tool and a port scanner. Today, we will look only at a specific functionality that it shares with the traceroute utility.

Route discovery is simple in principle: IPv4 packets have a Time to Live (TTL) field that is decremented by 1 with each hop, thus ensuring a limited packet lifespan (imagine routing loops without TTL). A host must still process a packet it receives with TTL=0 (the destination can accept it). Routers, however, check the TTL field when they need to forward the packet: if the TTL reaches 0, the packet is dropped and an ICMP Time Exceeded message is issued to the source IP. By sending packets with incrementally larger TTL values, it is possible to obtain the IP of each router on the path (at least in theory).
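
The incremental-TTL idea can be illustrated with a toy model (plain Python, no real packets are sent; the path and addresses are made up):

```python
# toy model of TTL-based path discovery: a "router" decrements the TTL and,
# when it hits 0, answers with ICMP Time Exceeded instead of forwarding

def probe(path, ttl):
    """Return the address that answers a probe sent with the given TTL."""
    for router in path[:-1]:       # intermediate hops
        ttl -= 1
        if ttl == 0:
            return router          # ICMP Time Exceeded from this router
    return path[-1]                # destination reached

def traceroute(path):
    """Raise the TTL one probe at a time until the destination answers."""
    discovered, ttl = [], 1
    while True:
        answer = probe(path, ttl)
        discovered.append(answer)
        if answer == path[-1]:
            return discovered
        ttl += 1

path = ["10.0.0.1", "192.168.100.1", "203.0.113.7", "8.8.8.8"]
print(traceroute(path))            # routers revealed in order, then the target
```

Real traceroute implementations follow the same loop, differing mainly in what kind of probe they send, which is exactly what you will compare below.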

The Task(s)

With 8.8.8.8 as a target, use wireshark to view the traffic generated by both nmap and traceroute. What differences can you find in their default mode of operation?

$ sudo nmap                                    \
    -sn             `# disable port scan`      \
    -Pn             `# disable host discovery` \
    --traceroute    `# perform traceroute`     \
    8.8.8.8
$ traceroute 8.8.8.8

Troubleshooting:

  • permission denied : make sure that nmap is not installed as a snap; you have two choices:
    • reinstall nmap with apt : sudo snap remove nmap && sudo apt install nmap
    • grant nmap permissions : snap connect nmap:network-control

If we do allow for a port scan by removing -sn (default is a TCP-based scan; use -sU for a UDP scan), this will take place before the actual traceroute. What changes does this bring?

Optional Task (... no, really)

When doing the TCP scan with nmap, you may have noticed a weird field in the TCP header: Options. Generate some TCP traffic with curl and look at the SYN packet in wireshark. What options do you see there?

Here is a quick breakdown of the more common TCP options and how they are used to overcome protocol limitations and improve throughput. Take a quick look if you want, then move on.

03. [30p] Packets, where are you?

Earlier in Ex. 1, we mentioned that eBPF is used for more than traffic filtering. Some of you may have heard of the eXpress Data Path (XDP) or the more recent eXpress Resubmission Path (XRP). Both of these are eBPF-powered shunts of kernel data paths that are used to optimize the system for very specific types of workloads. We'll return to these in a future lecture (and maybe a lab as well) since they can be considered advanced topics. For now, we'll focus on the third purpose eBPF can serve: execution tracing.

pwru is a tool created by Cilium to help trace network packets in the kernel's network stack and debug network connectivity issues. It does this by attaching simple eBPF programs to certain function entry points. These programs can report back to a userspace process different kinds of information, including the function that was reached, the arguments that were passed, and a CPU clock timestamp. The method used for instrumenting kernel code is based on kprobes. Ask your assistant for more information.

[10p] Task A - A packet's journey

Installation — build from source

Pre-built packages are no longer maintained for most distributions, so you'll build pwru from source. All you need is a Go compiler and make.

# Install Go if you don't have it
$ sudo apt install golang-go   # Ubuntu/Debian
# or follow https://go.dev/dl/ for the latest version
 
# Clone and build
$ git clone https://github.com/cilium/pwru.git
$ cd pwru
$ make
$ sudo mv pwru /usr/local/bin/
 
 
The build takes about a minute on first run (Go downloads dependencies). The result is a statically linked binary with no runtime dependencies.

Minimum requirements (check before running):

  • Linux kernel ≥ 5.5 (for BTF support): uname -r
  • BTF enabled: the file /sys/kernel/btf/vmlinux must exist
  • bpf filesystem mounted: mount | grep bpf

If BTF is missing, pwru will fail immediately with a clear error message.

Now, trace all outgoing DNS queries to the Google DNS (i.e.: 8.8.8.8) and perform one using dig. Add relative timestamps to the individual trace entries, to get an idea of the computational cost of each operation.

Finally, insert an iptables rule on the OUTPUT chain that drops DNS queries to 8.8.8.8 and redo the experiment. Check where the packet's path is cut short (the reason should be obvious :p).

Ubuntu users: local DNS caching via systemd-resolved may intercept your query before it reaches the network. If pwru shows nothing, try:

$ sudo resolvectl flush-caches    # or: sudo systemd-resolve --flush-caches (older releases)

or target 127.0.0.53 to confirm caching is the issue.

[20p] Task B - Interpreting the call path

Analyze the call path in the kernel network stack for the first scenario (when the packet actually made it out). Explain each step of the packet's journey.

Check out this map of the kernel subsystems, but note that the best source of information is always RTFS.

To structure your analysis, answer these questions in order:

  1. Where does the packet originate? Which function is the first to appear in the trace? What layer of the network stack does it correspond to?
  2. How does it reach the IP layer? Identify the transition from the socket/transport layer to the IP layer. Which function marks this boundary?
  3. What does Netfilter do here? Identify nf_hook_slow in the trace. Which Netfilter hook point does it correspond to (refer back to Figure 1 from Task 01)?
  4. How does it leave the machine? Identify the function responsible for handing the packet to the network device driver. What happens after this point?
  5. What changed with the DROP rule? Compare the two traces side by side. At which function does the path diverge?

04. [30p] Impact analysis of iptables rules

In Lab 05 you used bpftrace exclusively via one-liners (-e flag). That works fine for quick investigations, but as your probes get more complex (multiple hooks, conditionals, helper functions) you'll want to write proper script files (.bt extension).

The difference is minimal syntactically, but it is quite important in practice: a script file can have comments, be version-controlled, be shared with teammates, and be run with sudo bpftrace script.bt without the shell escaping headaches that come with one-liners.

In this task you'll write two scripts targeting functions you observed in your pwru trace from Exercise 03.

Before starting: make sure you have a clean iptables state. Remove any DROP rules you added in the previous exercise:

$ sudo iptables -D OUTPUT -p udp -d 8.8.8.8 --dport 53 -j DROP

[0p] Task A: Demo: coding style for bpftrace scripts

Before writing your own scripts, study this example. It is not a task — there is nothing to submit. It exists to show what a well-structured .bt script looks like, so you have a reference when writing the next two.

You can also find out more about the bpftrace coding style here

nf_demo.bt
#!/usr/bin/bpftrace
 
BEGIN
{
    printf("Tracing nf_hook_slow... Ctrl+C to stop.\n\n");
}
 
/* fentry fires at the entry of the kernel function.
 * Faster and lower-overhead than kprobe.
 * 'comm' is a bpftrace built-in: the name of the current process.
 */
fentry:nf_hook_slow
{
    @invocations_by_process[comm]++;
}
 
/* Print and reset every 3 seconds */
interval:s:3
{
    printf("-- %s --\n", strftime("%H:%M:%S", nsecs));
    print(@invocations_by_process);
    printf("\n");
    clear(@invocations_by_process);
}
 
END
{
    printf("Done.\n");
}

Run it for a few seconds while generating some traffic and observe the output. Then read through the script again. This is the style expected in Task B.

[30p] Task B: The cost of a bloated rule chain

nf_hook_slow(), which is visible in your pwru trace from Task 03, is the function that walks the iptables rule chain for every packet. Its cost is not fixed: it scales with the number of rules in the chain and, within each rule, with the number of match flags specified. A match rule such as -p tcp -d 8.8.8.8 --dport 443 invokes three separate match callbacks in sequence; if any returns false, evaluation of that rule stops and moves on to the next one. On a long chain, this adds up.
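
To make the scaling argument concrete, here is a toy model of the linear scan (ordinary Python, not the kernel's actual data structures): each rule is a list of match predicates, and evaluation of a rule short-circuits on the first failing predicate.

```python
def chain_cost(rules, packet):
    """Count the match-callback invocations needed to reach a verdict."""
    calls = 0
    for predicates, verdict in rules:
        matched = True
        for pred in predicates:
            calls += 1
            if not pred(packet):
                matched = False    # first failing match skips the rest
                break
        if matched:
            return calls, verdict
    return calls, "POLICY"         # fell through to the chain's default policy

# 3000 rules whose first predicate always fails for our traffic, mimicking
# the bloated chain described above (the 192.0.2.x destinations never match)
bloated = [([lambda p, ip=i: p["dst"] == f"192.0.2.{ip}",
             lambda p: p["proto"] == "tcp"], "DROP")
           for i in range(3000)]

pkt = {"dst": "10.99.0.2", "proto": "tcp"}
print(chain_cost(bloated, pkt))    # 3000 callbacks just to reach the policy
```

The callback count grows linearly with the rule count for every non-matching packet, which is precisely the behavior you are about to measure in nf_hook_slow().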

A common real-world mistake: a sysadmin responds to unwanted traffic by adding one DROP rule per offending source IP, one at a time, instead of a single rule covering the entire prefix. After hours or days of this, the chain has thousands of rules. Every packet, regardless of its actual destination, must walk the entire chain before reaching the default policy. On a modest server, this is enough to cause visible throughput degradation.

You are going to reproduce this and measure it.

What you need
  • iperf3: Tool for performing network throughput measurements. It's both a server and a client.
  • bpftrace: High-level eBPF scripting tool for kernel profiling.
  • python3: With matplotlib and optionally pandas for plotting.
Sub-task 01: Setting up a local iperf3 server

Running a local server eliminates network variability from the experiment; the iptables overhead signal becomes much cleaner and easier to observe in the plot. Pick one of the two options below, depending on your setup.

Option 1: Docker container with Arch Linux

If you have Docker installed, you can spin up an Arch Linux container. This container will use the same TCP/IP stack as the host, but will have distinct network devices, routing tables, firewall rules, etc. Any packet that leaves the container will have to pass through the network stack twice.

# start the container
host$ docker run -ti --rm archlinux
 
# show IP address of container and run iperf3
arch$ pacman -Sy --noconfirm iperf3 iproute2
arch$ ip -c a s
arch$ iperf3 -s
 
# test if it works (expect tens of Gbps for a locally hosted server)
host$ iperf3 -c ${container_ip} -p 5201 -t 5

Option 2: network namespace (no Docker required)

This creates an isolated network environment using Linux network namespaces and a virtual Ethernet pair (veth), exactly like Docker does internally. See RL Lab 10 for a deeper dive into how this works.

# 1. Create the namespace
$ sudo ip netns add iperf3-ns
 
# 2. Create a veth pair: one end stays on the host, one goes into the namespace
$ sudo ip link add veth-host type veth peer name veth-ns
 
# 3. Move one end into the namespace
$ sudo ip link set veth-ns netns iperf3-ns
 
# 4. Configure the host-side interface
$ sudo ip addr add 10.99.0.1/24 dev veth-host
$ sudo ip link set veth-host up
 
# 5. Configure the namespace-side interface
$ sudo ip netns exec iperf3-ns ip addr add 10.99.0.2/24 dev veth-ns
$ sudo ip netns exec iperf3-ns ip link set veth-ns up
$ sudo ip netns exec iperf3-ns ip link set lo up
 
# 6. Start iperf3 server inside the namespace (background)
$ sudo ip netns exec iperf3-ns iperf3 -s -D
 
# 7. Test from the host (server is at 10.99.0.2)
$ iperf3 -c 10.99.0.2 -p 5201 -t 5

Traffic from the host to 10.99.0.2 goes through the kernel's normal IP output path, so it traverses the OUTPUT chain and triggers the nf_hook_slow() calls you will be instrumenting.

When done with the experiment:

$ sudo ip netns delete iperf3-ns
$ sudo ip link delete veth-host
Sub-task 02: The bpftrace script

Write a bpftrace script of your own that calculates the average time each packet spent being evaluated in nf_hook_slow().

We recommend using kprobe/kretprobe instead of fentry/fexit for portability: kprobes work on kernels without full BTF support, which some VMs lack. The instrumentation overhead is slightly higher, but still negligible overall.
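
As a starting point, the usual bpftrace latency pattern pairs a kprobe that stores a per-thread timestamp with a kretprobe that computes the delta. It is shown here for vfs_read (a different function, so you still have to adapt it to this task):

```bpftrace
kprobe:vfs_read
{
    /* remember when this thread entered the function */
    @start[tid] = nsecs;
}

kretprobe:vfs_read
/@start[tid]/
{
    /* running average of time spent inside the function, in nanoseconds */
    @avg_ns = avg(nsecs - @start[tid]);
    delete(@start[tid]);
}
```

The /@start[tid]/ filter makes the kretprobe fire only for threads whose entry was actually recorded, so probes attached mid-call do not skew the average.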

Sub-task 03: Acquiring the data

Run a 5-10s iperf3 throughput test between your host and the container. Meanwhile, use the script that you've written to measure the latency introduced by the OUTPUT Netfilter chain hook.

Having no rules configured on your OUTPUT chain, this will serve as a baseline. Next, redo this experiment by continuously adding 100 iptables rules that are guaranteed to fail (i.e., verdict will never be obtained until all rules are evaluated). Repeat these steps until you end up with ~3,000 rules in your OUTPUT chain. Save all these results (number of rules, average throughput, average Netfilter-induced latency) since you will have to plot them.

Try to script this, since manually re-running all of this is very tiresome!
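
One possible skeleton (assumes the Option 2 setup with the iperf3 server at 10.99.0.2; nf_avg.bt, latency_N.txt and results.csv are placeholder names for your own sub-task 02 script and output files):

```shell
#!/bin/bash
# grow the OUTPUT chain in steps of 100 never-matching rules and measure
# throughput at each step; 192.0.2.0/24 (TEST-NET-1) never matches real
# traffic, so every packet walks the entire chain before the default policy

run_experiment() {
    echo "rules,gbps" > results.csv

    for n in $(seq 0 100 3000); do
        # top up the chain with 100 more rules (skip on the baseline run)
        if [ "$n" -gt 0 ]; then
            for _ in $(seq 1 100); do
                iptables -A OUTPUT -p tcp -d 192.0.2.1 --dport 9 -j DROP
            done
        fi

        # collect nf_hook_slow() latency with your sub-task 02 script
        bpftrace nf_avg.bt > "latency_${n}.txt" &
        bt_pid=$!

        gbps=$(iperf3 -c 10.99.0.2 -t 5 -J \
               | python3 -c 'import json,sys; print(json.load(sys.stdin)["end"]["sum_received"]["bits_per_second"] / 1e9)')
        echo "${n},${gbps}" >> results.csv

        kill -INT "$bt_pid"
    done
}

# iptables and bpftrace need root; set RUN_EXPERIMENT=1 to actually execute
if [ "${RUN_EXPERIMENT:-0}" = "1" ]; then
    run_experiment
fi
```

Run it as root with RUN_EXPERIMENT=1; the -J flag makes iperf3 emit JSON, from which the snippet extracts the received throughput in Gbps.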

Flush your OUTPUT chain after you are done with the experiment:

$ sudo iptables -F OUTPUT

Sub-task 04: Plotting the data

Write a Python script that creates two plots in the same figure:

  • iperf3 throughput as a function of the number of iptables rules.
  • Average elapsed time in the Netfilter hook as a function of the number of iptables rules.
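
A minimal matplotlib sketch; the arrays below are placeholder values only (replace them with your measurements, or load them from whatever file your measurement script produced):

```python
import matplotlib
matplotlib.use("Agg")              # render to a file, no display needed
import matplotlib.pyplot as plt

# placeholder measurements: substitute your own results here
rules  = [0, 500, 1000, 1500, 2000, 2500, 3000]
gbps   = [45.0, 38.0, 30.0, 24.0, 19.0, 15.0, 12.0]
avg_ns = [800, 4800, 9000, 13100, 17200, 21500, 26000]

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))

ax1.plot(rules, gbps, marker="o")
ax1.set_ylabel("throughput [Gbps]")
ax1.grid(True)

ax2.plot(rules, avg_ns, marker="o", color="tab:red")
ax2.set_ylabel("avg nf_hook_slow() time [ns]")
ax2.set_xlabel("number of iptables rules in OUTPUT chain")
ax2.grid(True)

fig.tight_layout()
fig.savefig("impact.png")
```

The shared x-axis makes it easy to spot whether the throughput knee lines up with the latency growth.
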
Sub-task 05: Interpreting the data

Answer the following:

  • At what approximate rule count does the throughput begin to visibly degrade?
  • Is the latency increase in nf_hook_slow() linear with rule count? What does this tell you about the algorithm used to walk the chain?
  • What do you expect would happen if you were to perform this test on a public iperf3 server instead of a locally hosted one?

05. [10p] Feedback

Please take a minute to fill in the feedback form for this lab.

ep/labs/061.txt · Last modified: 2026/04/07 00:26 by radu.mantu
CC Attribution-Share Alike 3.0 Unported