This shows you the differences between two versions of the page.
|
ep:labs:04:contents:tasks:ex1 [2021/09/28 22:42] radu.mantu [01. [??p] Primer / Reminder] |
ep:labs:04:contents:tasks:ex1 [2026/03/23 21:58] (current) radu.mantu |
||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | ==== 01. [??p] Primer / Reminder ==== | + | ==== 01. [10p] Valgrind ==== |
| - | === [??p] Task A - tcpdump === | + | Dynamic analysis tools can observe a running process and report memory-related |
| + | issues that static analysis would miss entirely. In this exercise you will use | ||
| + | **Valgrind** to detect memory leaks in a small C program -- and get a first taste | ||
| + | of the dynamic instrumentation concept that will be developed further in Task 04 | ||
| + | with Intel Pin. | ||
| - | **tcpdump** is a network traffic monitoring tool. At its core, it uses **libpcap** which in turn uses a technology called **Extended Berkeley Packet Filter (eBPF)**. | + | === [5p] Task A - Writing a leaky program === |
| - | **BPF** was first proposed around 1995 when filtering mechanisms (and firewalls) were still a novel concept and were based on interpreters. **BPF** (now referred to as Classic BPF - **cBPF**) was the initial version of a Motorola inspired virtual ISA (i.e.: had no hardware implementation -- think **CHIP-8**). **eBPF** is basically still **BPF** but more compatible with 64-bit architectures so that Just In Time (JIT) translators have an easier time running the code. | + | Read the contents of ''leak.c'' and compile it: |
| - | + | ||
| - | At first, the whole idea was to compile a packet filtering program and attach it to a socket in kernelspace. This program would filter out packets that the userspace process would not be interested in and reduce the quantity of data copied over the kernelspace/userspace boundary, only to ultimately be discarded. | + | |
| - | + | ||
| - | Today, **eBPF** is used heavily for system profiling by companies such as Netflix and Facebook. Linux has had a kernel VM capable of running and statically analyzing **eBPF** code since around 2006. **tcpdump** is one of the few examples that still use it for its original purpose. Ask your assistant if you want to know more about **eBPF** tracing (not part of the lab, don't panic!) | + | |
| - | + | ||
| - | == The Task == | + | |
| - | + | ||
| - | Use **tcpdump** to output outgoing **NTP** queries and incoming **http(s)** responses. Use the ''-d'' flag to see an **objdump** of the filter program's code. | + | |
| - | + | ||
| - | Complete the **tcpdump** command in order to satisfy the following formatting requirements: | + | |
| - | * print the packet number | + | |
| - | * print the elapsed time (in nanoseconds) since the first packet was captured | + | |
| - | * print the content of each packet (w/o l2 header) in hex and ASCII | + | |
| - | * do not resolve IP addresses to names | + | |
| - | + | ||
| - | How to test: | + | |
| <code bash> | <code bash> | ||
| - | $ ntpdate -q ro.pool.ntp.org | + | $ gcc -g -o leak leak.c |
| - | $ curl ocw.cs.pub.ro | + | |
| </code> | </code> | ||
| - | <note tip> | + | The ''-g'' flag includes debug symbols so Valgrind can report exact file names |
| - | **tcpdump** can list the available interfaces if run with ''-D''. In addition to your network interfaces, you may also see a **Bluetooth** device or the **dbus-system** (depending on your desktop). | + | and line numbers. |
| - | If you don't specify the interface with ''-i'', the first entry in the printed list will be used by default. This may not always be your active network interface but instead your **docker** bridge (for example). | + | Now run it normally and observe that nothing seems wrong from the outside: |
| - | </note> | + | |
| - | + | ||
| - | <solution -hidden> | + | |
| - | <code> | + | |
| - | $ tcpdump -d '(udp dst port 123) or (tcp src port 80) or (tcp src port 443)' | + | |
| - | $ sudo tcpdump -# --nano -ttttt -n -X '(udp dst port 123) or (tcp src port 80) or (tcp src port 443)' | + | |
| - | </code> | + | |
| - | </solution> | + | |
| - | + | ||
| - | === [??p] Task B - iptables === | + | |
| - | + | ||
| - | **iptables** is a configuration tool for the kernel packet filter. | + | |
| - | + | ||
| - | The system as a whole provides many functionalities that are grouped by **tables**: //filter, nat, mangle, raw, security//. If you want to alter a packet header, you place a rule in the //mangle// table. If you want to mask the private IP address of an internal host with the external IP address of the default gateway, you place a rule in the //nat// table. Depending on the table you choose, you will gain or lose access to some chains. If not specified, the default is the //filter// table. | + | |
| - | + | ||
| - | **Chains** are basically lists of rules. The five built-in chains are //PREROUTING, FORWARD, POSTROUTING, INPUT, OUTPUT//. Each of these corresponds to certain locations in the network stack where packets trigger **Netfilter hooks** ([[https://elixir.bootlin.com/linux/latest/source/net/ipv4/ip_input.c#L540|here]] is the //PREROUTING// kernel hook as an example -- not that hard to add one, right?) For a selected chain, the order in which the rules are evaluated is determined primarily by the priority of their tables and secondarily by the user's discretionary arrangement (i.e.: order in which rules are inserted). | + | |
| - | + | ||
| - | {{ :ep:labs:04:contents:tasks:iptables_path.png?800 |}} | + | |
| - | + | ||
| - | A **rule** consists of two entities: a sequence of match criteria and a jump target. | + | |
| - | + | ||
| - | The **jump target** represents an action to be taken. You are most likely familiar with the built-in actions such as //ACCEPT// or //DROP//. These actions decide the ultimate fate of the packet and are final (i.e.: rule iteration stops when these are invoked). However, there are also extended actions (see ''man iptables-extensions(8)'') that are not terminal verdicts and can be used for various tasks such as auditing, forced checksum recalculation or removal of Explicit Congestion Notification (ECN) bits. | + | |
| - | + | ||
| - | The **match criteria** of every rule are checked to determine if the jump target is applied. The way this is designed is very elegant: every type of feature (e.g.: l3 IP address vs l4 port) that you can check has a match callback function defined in the kernel. If you want, you can write your own such function in a Linux Kernel Module (LKM) and thus extend the functionality of **iptables** ([[https://inai.de/documents/Netfilter_Modules.pdf|Writing Netfilter Modules]] with code example). However, you will need to implement a userspace shared library counterpart. When you start an **iptables** process, it searches in ///usr/lib/xtables/ // and automatically loads certain shared libraries (note: this path can be overwritten or extended using the //XTABLES_LIBDIR// environment variable). Each library there must do three things: | + | |
| - | * define **iptables** flags for the new criteria that you want to include. | + | |
| - | * define help messages for when ''**iptables** %%--%%help'' is called (its help message is an amalgamation of each library's help snippet). | + | |
| - | * provide an initialization function for the structure containing the rule parameters; this structure will end up in the kernel's rule chain. | + | |
| - | So when you want to test the efficiency of the **iptables** rule evaluation process, keep in mind that each rule may imply the invocation of multiple callbacks. | + | |
| - | + | ||
| - | == The Task (1) == | + | |
| - | + | ||
| - | Write an **iptables** rule according to the following specifications: | + | |
| - | * **chain:** OUTPUT | + | |
| - | * **match rule:** TCP packets originating from ephemeral ports bound to a socket created by root | + | |
| - | * **target:** enable kernel logging of matched packets with the //"EP: "// prefix | + | |
| - | + | ||
| - | How to test: | + | |
| <code bash> | <code bash> | ||
| - | $ sudo curl www.google.com | + | $ ./leak |
| - | $ sudo dmesg | + | $ echo "exit code: $?" |
| </code> | </code> | ||
| - | <note tip> | + | === [5p] Task B - Detecting leaks with Valgrind === |
| - | <code bash> | + | |
| - | $ man 8 iptables-extensions | + | |
| - | </code> | + | |
| - | </note> | + | |
| - | <solution -hidden> | + | Run the same binary under Valgrind's memory error detector: |
| <code bash> | <code bash> | ||
| - | # "--log-prefix" must come after "-j LOG" | + | $ valgrind --leak-check=full --show-leak-kinds=all ./leak |
| - | $ sudo iptables \ | + | |
| - | -m multiport -m owner \ | + | |
| - | -I OUTPUT \ | + | |
| - | -p tcp \ | + | |
| - | --sports 1024:65535 \ | + | |
| - | --uid-owner root \ | + | |
| - | -j LOG \ | + | |
| - | --log-prefix 'EP: ' | + | |
| </code> | </code> | ||
| - | </solution> | ||
| - | |||
| - | == The Task (2) == | ||
| - | |||
| - | Write an **iptables** rule according to the following specifications: | ||
| - | * **chain:** OUTPUT | ||
| - | * **match rule:** **BPF** program that filters UDP traffic to port 53 (try [[https://www.gnu.org/software/bash/manual/html_node/Command-Substitution.html|bash command substitution]]) | ||
| - | * **target:** set **TTL** to 1 (initially) | ||
| - | |||
| - | Continue __appending__ the same rule with incremented **TTL** value until the **DNS** request goes through. | ||
| - | |||
| - | How to test: | ||
| - | <code bash> | ||
| - | $ dig +short fep.grid.pub.ro @8.8.8.8 | ||
| - | </code> | ||
| - | |||
| - | <note tip> | ||
| - | <code bash> | ||
| - | $ man 8 iptables-extensions nfbpf_compile | ||
| - | </code> | ||
| - | </note> | ||
| - | |||
| - | <solution -hidden> | ||
| - | <code bash> | ||
| - | $ sudo iptables \ | ||
| - | -m bpf \ | ||
| - | -t mangle \ | ||
| - | -A OUTPUT \ | ||
| - | --bytecode "$(nfbpf_compile 'udp dst port 53')" \ | ||
| - | -j TTL \ | ||
| - | --ttl-set 1 | ||
| - | </code> | ||
| - | |||
| - | NOTE: If they don't specify the //mangle// table, the default (//filter//) will be used. **iptables** will say //"ok, all your arguments are fine... I'll send the structure to kernelspace"// but it will still fail! They will get this message: | ||
| - | |||
| - | <code> | ||
| - | iptables: Invalid argument. Run `dmesg' for more information. | ||
| - | </code> | ||
| - | |||
| - | Whenever you upload a rule into the kernel, the appropriate module can also //optionally// implement a rule check callback, in addition to the match callback. This rule check callback will verify that the structure received from **iptables** is correct (it doesn't trust a userspace process, obviously). If an error occurs, it will print an error message to the kernel log. | ||
| - | |||
| - | Let the students check the kernel log! They had to do this for the previous task, so they have no reason to cry for help here. They will get: | ||
| - | |||
| - | <code bash> | ||
| - | $ sudo dmesg | ||
| - | ... | ||
| - | [ 36.960234] x_tables: ip_tables: TTL target: only valid in mangle table, not filter | ||
| - | </code> | ||
| - | |||
| - | If they ask (they won't), **Xtables** (read cross-tables) is the backend of the **iptables** and more recently **nftables** (just reached v1.0 after ~13y) infrastructure. | ||
| - | |||
| - | </solution> | ||
| - | |||
| - | == The Task (3) == | ||
| - | Give an example when **iptables** is unable to catch a packet. | + | Examine the output and answer the following questions: |
| + | - How many bytes are reported as **definitely lost**? Does this match what you would expect from reading the source? | ||
| + | - What is the difference between **definitely lost** and **indirectly lost** in Valgrind's terminology? | ||
| + | - Which line number does Valgrind report as the origin of the leak? Why does it report that line rather than the line where the pointer goes out of scope? | ||
| + | - Re-compile **without** the ''-g'' flag and run Valgrind again. What information is now missing from the report, and why? | ||
| <solution -hidden> | <solution -hidden> | ||
| - | DHCPDISCOVER message. Interface has no IP address so it's placed in promiscuous mode (PF_PACKET) and the network stack is bypassed; the packet is put directly on the wire. In the past, it was possible to set 0.0.0.0 as a temporary IP while getting a lease, but not anymore! | + | - 10 calls × 256 bytes = **2560 bytes** definitely lost. |
| + | - **Definitely lost**: the last pointer to the allocation is gone -- the memory | ||
| + | can never be freed. **Indirectly lost**: memory reachable only through another | ||
| + | leaked block (e.g. a node in a leaked linked list). | ||
| + | - Valgrind points to the ''malloc()'' call inside ''leaky_function()'' because | ||
| + | that is where the allocation originated. The pointer going out of scope is a | ||
| + | C concept; Valgrind tracks allocations at the heap level, not variable lifetimes. | ||
| + | - Without ''-g'', Valgrind still prints code addresses and (for an unstripped binary) function names, but each frame shows the containing object, e.g. ''(in /path/to/leak)'', instead | ||
| + | of ''leak.c:5''. The source file name and line number come from the DWARF debug | ||
| + | information embedded by the compiler. | ||
| </solution> | </solution> | ||