
Differences

This shows you the differences between two versions of the page.

--- ep:labs:04:contents:tasks:ex1 [2021/09/28 22:42]
radu.mantu [01. [??p] Primer / Reminder]
+++ ep:labs:04:contents:tasks:ex1 [2026/03/23 21:58] (current)
radu.mantu
@@ Line 1: / Line 1: @@
-==== 01. [??p] Primer / Reminder ====
+==== 01. [10p] Valgrind ====
-=== [??p] Task A - tcpdump ===
+Dynamic analysis tools can observe a running process and report memory-related
+issues that static analysis can easily miss. In this exercise you will use
+**Valgrind** to detect memory leaks in a small C program -- and get a first taste
+of the dynamic instrumentation concept that will be developed further in Task 04
+with Intel Pin.
-**tcpdump** is a network traffic monitoring tool. At its core, it uses **libpcap**, which in turn uses a technology called **Extended Berkeley Packet Filter (eBPF)**.
+=== [5p] Task A - Writing a leaky program ===
-**BPF** was first proposed around 1992, when filtering mechanisms (and firewalls) were still a novel concept and were based on interpreters. **BPF** (now referred to as Classic BPF, or **cBPF**) was the initial version of a Motorola-inspired virtual ISA (i.e.: it had no hardware implementation -- think **CHIP-8**). **eBPF** is essentially still **BPF**, but more compatible with 64-bit architectures so that Just-In-Time (JIT) compilers have an easier time running the code.
+Read the contents of ''leak.c'' and compile it:
-At first, the whole idea was to compile a packet filtering program and attach it to a socket in kernelspace. This program would filter out packets that the userspace process would not be interested in and reduce the quantity of data copied over the kernelspace/userspace boundary, only to ultimately be discarded.
-Today, **eBPF** is used heavily for system profiling by companies such as Netflix and Facebook. Linux has had a kernel VM capable of running and statically verifying **eBPF** code since around 2014. **tcpdump** is one of the few tools that still use it for its original purpose. Ask your assistant if you want to know more about **eBPF** tracing (not part of the lab, don't panic!)
-== The Task ==
-Use **tcpdump** to capture outgoing **NTP** queries and incoming **http(s)** responses. Use the ''-d'' flag to dump the compiled filter program in human-readable form (similar to an **objdump**).
-Complete the **tcpdump** command in order to satisfy the following formatting requirements:
-  * print the packet number
-  * print the elapsed time (in nanoseconds) since the first packet was captured
-  * print the content of each packet (w/o l2 header) in hex and ASCII
-  * do not resolve IP addresses to names
-How to test:
 <code bash>
-$ ntpdate -q ro.pool.ntp.org
+$ gcc -g -o leak leak.c
-$ curl ocw.cs.pub.ro
 </code>
-<note tip>
+The **-g** flag includes debug symbols so Valgrind can report exact file names
-**tcpdump** can list the available interfaces if run with ''-D''. In addition to your network interfaces, you may also see a **Bluetooth** device or **dbus-system** (depending on your desktop).
+and line numbers.
-If you don't specify the interface with ''-i'', the first entry in the printed list will be used by default. This may not be your active network interface but, for example, your **docker** bridge.
+Now run it normally and observe that nothing seems wrong from the outside:
-</note>
-<solution -hidden>
-<code>
-$ tcpdump -d '(udp dst port 123) or (tcp src port 80) or (tcp src port 443)'
-$ sudo tcpdump -# --nano -ttttt -n -X '(udp dst port 123) or (tcp src port 80) or (tcp src port 443)'
-</code>
-</solution>
-=== [??p] Task B - iptables ===
-**iptables** is a configuration tool for the kernel packet filter.
-The system as a whole provides many functionalities that are grouped by **tables**: //filter, nat, mangle, raw, security//. If you want to alter a packet header, you place a rule in the //mangle// table. If you want to mask the private IP address of an internal host with the external IP address of the default gateway, you place a rule in the //nat// table. Depending on the table you choose, you will gain or lose access to some chains. If not specified, the default is the //filter// table.
-**Chains** are basically lists of rules. The five built-in chains are //PREROUTING, FORWARD, POSTROUTING, INPUT, OUTPUT//. Each of these corresponds to certain locations in the network stack where packets trigger **Netfilter hooks** ([[https://elixir.bootlin.com/linux/latest/source/net/ipv4/ip_input.c#L540|here]] is the //PREROUTING// kernel hook as an example -- not that hard to add one, right?) For a selected chain, the order in which the rules are evaluated is determined primarily by the priority of their tables and secondarily by the user's discretionary arrangement (i.e.: order in which rules are inserted).
-{{ :ep:labs:04:contents:tasks:iptables_path.png?800 |}}
-A **rule** consists of two entities: a sequence of match criteria and a jump target.
-The **jump target** represents an action to be taken. You are most likely familiar with the built-in actions such as //ACCEPT// or //DROP//. These actions decide the ultimate fate of the packet and are final (i.e.: rule iteration stops when these are invoked). However, there are also extended actions (see ''man iptables-extensions(8)'') that are not terminal verdicts and can be used for various tasks such as auditing, forced checksum recalculation or removal of Explicit Congestion Notification (ECN) bits.
-The **match criteria** of every rule are checked to determine if the jump target is applied. The way this is designed is very elegant: every type of feature (e.g.: l3 IP address vs l4 port) that you can check has a match callback function defined in the kernel. If you want, you can write your own such function in a Linux Kernel Module (LKM) and thus extend the functionality of **iptables** ([[https://inai.de/documents/Netfilter_Modules.pdf|Writing Netfilter Modules]] with code example). However, you will need to implement a userspace shared library counterpart. When you start an **iptables** process, it searches in ///usr/lib/xtables/ // and automatically loads certain shared libraries (note: this path can be overridden or extended using the //XTABLES_LIBDIR// environment variable). Each library there must do three things:
-  * define **iptables** flags for the new criteria that you want to include.
-  * define help messages for when ''**iptables** %%--%%help'' is called (its help message is an amalgamation of each library's help snippet).
-  * provide an initialization function for the structure containing the rule parameters; this structure will end up in the kernel's rule chain.
-So when you want to test the efficiency of the **iptables** rule evaluation process, keep in mind that each rule may imply the invocation of multiple callbacks.
-== The Task (1) ==
-Write an **iptables** rule according to the following specifications:
-  * **chain:** OUTPUT
-  * **match rule:** TCP packets originating from ephemeral ports bound to a socket created by root
-  * **target:** enable kernel logging of matched packets with the //"EP: "// prefix
-How to test:
 <code bash>
-$ sudo curl www.google.com
+$ ./leak
-$ sudo dmesg
+$ echo "exit code: $?"
 </code>
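''leak.c'' itself is not reproduced on this page. The program below is a hypothetical reconstruction, not the lab's actual file. It is consistent with the figures used later in this exercise: ''leaky_function()'' calls ''malloc()'' on line 5, and ''main()'' calls it 10 times without ever freeing the 256-byte blocks. The ''LEAK_SIZE'' macro and the ''buf[0]'' write are assumptions added for illustration.

<code c>
#include <stdio.h>
#include <stdlib.h>
#define LEAK_SIZE 256  /* hypothetical leak.c: bytes leaked per call */
void leaky_function(void) {
    char *buf = malloc(LEAK_SIZE);  /* line 5: allocation Valgrind reports */
    if (buf == NULL)
        return;
    buf[0] = 'x';  /* use the block; free(buf) is deliberately missing */
}   /* buf goes out of scope here, leaving the block unreachable */

/* Hypothetical reconstruction of leak.c, NOT the lab's actual file. */
int main(void) {
    for (int i = 0; i < 10; i++)
        leaky_function();  /* 10 x 256 bytes are never freed */
    printf("done\n");
    return 0;
}
</code>

Built with the ''gcc -g'' command above, this version prints ''done'' and exits with status 0. That is exactly why the leak is invisible from the outside: the operating system reclaims the process's memory at exit, so nothing fails.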
-<note tip>
+=== [5p] Task B - Detecting leaks with Valgrind ===
-<code bash>
-$ man 8 iptables-extensions
-</code>
-</note>
-<solution -hidden>
+Run the same binary under Valgrind's memory error detector:
 <code bash>
-# "--log-prefix" must come after "-j LOG"
+$ valgrind --leak-check=full --show-leak-kinds=all ./leak
-$ sudo iptables               \
-        -m multiport -m owner \
-        -I OUTPUT             \
-        -p tcp                \
-        --sports 1024:65535   \
-        --uid-owner root      \
-        -j LOG                \
-        --log-prefix 'EP: '
 </code>
-</solution>
-== The Task (2) ==
-Write an **iptables** rule according to the following specifications:
-  * **chain:** OUTPUT
-  * **match rule:** **BPF** program that filters UDP traffic to port 53 (try [[https://www.gnu.org/software/bash/manual/html_node/Command-Substitution.html|bash command substitution]])
-  * **target:** set **TTL** to 1 (initially)
-Continue __appending__ the same rule with incremented **TTL** value until the **DNS** request goes through.
-How to test:
-<code bash>
-$ dig +short fep.grid.pub.ro @8.8.8.8
-</code>
-<note tip>
-<code bash>
-$ man 8 iptables-extensions nfbpf_compile
-</code>
-</note>
-<solution -hidden>
-<code bash>
-$ sudo iptables                                         \
-        -m bpf                                          \
-        -t mangle                                       \
-        -A OUTPUT                                       \
-        --bytecode "$(nfbpf_compile 'udp dst port 53')" \
-        -j TTL                                          \
-        --ttl-set 1
-</code>
-NOTE: If they don't specify the //mangle// table, the default (//filter//) will be used. **iptables** will say //"ok, all your arguments are fine... I'll send the structure to kernelspace"// but it will still fail! They will get this message:
-<code>
-iptables: Invalid argument. Run `dmesg' for more information.
-</code>
-Whenever you upload a rule into the kernel, the appropriate module can also //optionally// implement a rule check callback, in addition to the match callback. This rule check callback will verify that the structure received from **iptables** is correct (it doesn't trust a userspace process, obviously). If an error occurs, it will print an error message to the kernel log.
-Let the students check the kernel log! They had to do this for the previous task, so they have no reason to cry for help here. They will get:
-<code bash>
-$ sudo dmesg
-...
-[   36.960234] x_tables: ip_tables: TTL target: only valid in mangle table, not filter
-</code>
-If they ask (they won't), **Xtables** (read cross-tables) is the backend of the **iptables** and more recently **nftables** (just reached v1.0 after ~13y) infrastructure.
-</solution>
-== The Task (3) ==
-Give an example when **iptables** is unable to catch a packet.
+Examine the output and answer the following questions:
+  - How many bytes are reported as **definitely lost**? Does this match what you would expect from reading the source?
+  - What is the difference between **definitely lost** and **indirectly lost** in Valgrind's terminology?
+  - Which line does Valgrind point to as the origin of the leak? Why is that line significant rather than the line where the pointer goes out of scope?
+  - Re-compile **without** the ''-g'' flag and run Valgrind again. What information is now missing from the report, and why?
 <solution -hidden>
-DHCPDISCOVER message. The interface has no IP address yet, so the DHCP client sends the message through a raw packet socket (PF_PACKET). This bypasses the IP stack and its Netfilter hooks, and the packet is put directly on the wire. In the past, it was possible to set 0.0.0.0 as a temporary IP while getting a lease, but not anymore!
+  - 10 calls × 256 bytes = **2560 bytes** definitely lost.
+  - **Definitely lost**: the last pointer to the allocation is gone -- the memory
+    can never be freed. **Indirectly lost**: memory reachable only through another
+    leaked block (e.g. a node in a leaked linked list).
+  - Valgrind points to the ''malloc()'' call inside ''leaky_function()'' because
+    that is where the allocation originated. The pointer going out of scope is a
+    C concept; Valgrind tracks allocations at the heap level, not variable lifetimes.
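+  - Without ''-g'', stack frames lose their ''leak.c:5''-style source locations. Function
+    names usually remain, because they come from the ELF symbol table (unless the binary
+    is stripped), but each frame is attributed only to the binary. The source file name
+    and line number come from the DWARF debug information embedded by the compiler.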
 </solution>
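To see **definitely lost** and **indirectly lost** side by side, the hypothetical program below (not part of the lab) builds a three-node linked list and then drops the only pointer to its head. The ''struct node'', ''build_list()'' and ''list_length()'' names are illustrative.

<code c>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical node type, not part of the lab's leak.c. */
struct node {
    int value;
    struct node *next;
};

/* Build a list of n nodes by pushing at the front; returns the head.
 * Returns NULL for n <= 0 and stops early if malloc() fails. */
struct node *build_list(int n) {
    struct node *head = NULL;
    for (int i = 0; i < n; i++) {
        struct node *nd = malloc(sizeof *nd);
        if (nd == NULL)
            break;
        nd->value = i;
        nd->next = head;  /* new node points at the old head */
        head = nd;
    }
    return head;
}

/* Count the nodes reachable from head. */
int list_length(const struct node *head) {
    int len = 0;
    for (; head != NULL; head = head->next)
        len++;
    return len;
}

int main(void) {
    struct node *head = build_list(3);
    printf("length: %d\n", list_length(head));
    head = NULL;  /* drop the only pointer to the head node */
    return 0;
}
</code>

Under ''valgrind --leak-check=full'', the head node should be reported as **definitely lost**, because no pointer to it remains anywhere. The two nodes reachable only through the lost head should appear as **indirectly lost**. Freeing the head alone would not fix this: the fix is to walk the list and free every node.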


ep/labs/04/contents/tasks/ex1.1632858125.txt.gz · Last modified: 2021/09/28 22:42 by radu.mantu

