ep:labs:04:contents:tasks:ex3 [2021/11/02 14:12]
radu.mantu
ep:labs:04:contents:tasks:ex3 [2025/03/24 22:08] (current)
silvia.dragan [03. [30p] Kernel Samepage Merging]
Line 1: Line 1:
-==== 03. [50p] Protocol Options ====
+==== 03. [30p] Kernel Samepage Merging ====
  
-As you've probably already seen in the previous exercise, TCP uses protocol extensions (called options) in order to negotiate session parameters and improve overall performance. Note that this mechanism has existed since the very inception of the protocol approx. 30 years ago ([[https://datatracker.ietf.org/doc/html/rfc793#section-3.1|RFC793 - Transmission Control Protocol]]), with many being added post-factum.
+[[https://www.kernel.org/doc/html/latest/admin-guide/mm/ksm.html|KSM]] is a page de-duplication strategy introduced in kernel version 2.6.32. In case you are wondering, it's not the same thing as the file page cache. KSM was originally developed in tandem with KVM in order to detect data pages with //exactly// the same content and make their page table entries point to the same physical address (marked Copy-On-Write). The end goal was to allow more VMs to run on the same host. Since each page must be scanned for identical content, this solution had no chance of scaling well with the available quantity of RAM. So, the developers compromised by scanning only the private anonymous pages that were marked as likely candidates via ''madvise(addr, length, MADV_MERGEABLE)''.
  
-While the same could be said for IP options ([[https://datatracker.ietf.org/doc/html/rfc791#section-3.1|RFC791 - Internet Protocol]]), there have always been... issues. In 2005 it was decided that [[https://www2.eecs.berkeley.edu/Pubs/TechRpts/2005/EECS-2005-24.pdf|IP Options are not an option]]. Middleboxes (i.e.: network equipment in the Internet -- routers, NATs, firewalls, etc.) would sometimes implement abridged versions of the protocol specifications. For IP options, this meant that the IHL field would be ignored and the header would always be considered to be 20 bytes in length. As a result, if a packet carried IP options, cheap network equipment would wrongly assume that the layer 4 header would start at a 20 byte offset and drop it due to erroneously perceived malformations.
+Download the {{:ep:labs:02:contents:tasks:ksm.zip|skeleton}} for this task.
  
-The authors of the 2005 report discovered that only a fraction (15%) of edge Autonomous Systems (AS -- huge, ISP-grade networks with a unified routing policy) were responsible for most packet drops. This made them optimistic towards a speedy resolution of this issue, but things haven't changed much in the past 15 or so years. Today IP options usually work just fine in local networks. As for the wider Internet, there are specific paths and networks where IP options can pass unthwarted, but only if the layer 4 protocol is ICMP. The logic may be that most IP options are used for path measurement anyway, so why use it in conjunction with anything other than ICMP (a pretty dumb argument, I admit... but it's not mine).
+=== [10p] Task A - Check kernel support & enable ksmd ===
  
-Luckily, one of these compliant networks is RoEduNet. If you're not working from the university's network, then start a VM instance on [[https://cloud-controller.grid.pub.ro/dashboard/auth/login/?next=/dashboard/|OpenStack]]. ISPs like RDS tend to drop IP Options. As a target, we will use [[https://www.digitalocean.com/|DigitalOcean]]. Based on some personal tests, I can guarantee that they don't drop any options, regardless of region.
+First things first, you need to verify that KSM was enabled during your kernel's compilation. For this, you need to check the Linux build configuration file. Hopefully, you should see something like this:
  
-{{ :​ep:​labs:​04:​contents:​tasks:​ip_recordroute.png?​700 |}} 
- 
-== Overarching Goal == 
- 
-The first task is to modify outgoing traffic and include a Record Route option in an ICMP Echo Request. Check its description in [[https://​datatracker.ietf.org/​doc/​html/​rfc791|RFC791]] to understand what it does (and no, not "​Loose/​Strict Source and Record Route"​... keep hitting that Find key). The packet structure will resemble that in the picture above. Don't worry; for this step, we'll give you a little help ;) 
- 
-Your second task will be to write a bash script that extracts the IP addresses from the ICMP Echo Response'​s Record Route option from a packet capture and perform an AS Lookup. In other words, you will determine the names of all the networks that the packet traverses on its way from the university to DigialOcean and back. 
- 
-But let's take it step-by-step. 
- 
-=== [10p] Task A - Injecting IP Options === 
- 
-Remember talking about **iptables** extensions earlier? **Netfilter Queue** is one of them and will be relevant for this task. What it is, is an **iptables** target. What it does, it redirects each matched packet to a userspace process for evaluation and optionally, //​modification//​. 
- 
-The userspace process receives each packet by polling a [[https://​man7.org/​linux/​man-pages/​man7/​unix.7.html|Unix Domain Socket]]. After obtaining one, it can perform any type of analysis that it wants (e.g.: [[https://​dl.acm.org/​doi/​pdf/​10.1145/​1151659.1159952|deep packet inspection]]) in order to reach a verdict. The verdict can be the already known built-ins (i.e.: //ACCEPT, DROP,// etc.) or it can redirect the packet to another queue, with another process listening. When setting the verdict, a modified packet can be provided to replace the original on its datapath through the kernel'​s network stack. 
- 
-Enter [[https://​github.com/​RaduMantu/​ops-inject|ops-inject]]. This is a tool (that'​s still under development,​ mind you) that allows the annotation of matched packets with IP/TCP/UDP options. Why is this tool simple to use: all you have to do is provide a sequence of bytes representing the [[https://​www.iana.org/​assignments/​ip-parameters/​ip-parameters.xhtml|codepoints]] of the options that you want to append. This byte stream is passed to an internal decoder that //expands// each byte into a fully-fledged option, albeit in accordance to an arbitrary implementation. Once you clone the repo, you should look over two sources in particular: 
-  * ''​src/​main.cpp''​ : here, just understand what **libnetfilterqueue** library calls are made in order to set up the Unix socket, to receive the packet and to set its verdict. 
-  * ''​src/​ops_ip.c''​ : this is where all available IP Options are implemented;​ take a look at the **ip_decoders** vtable at the bottom to associate options and codepoints. 
- 
-First of all, let's fetch the tool and compile it. 
 <code bash>
-$ git clone https://github.com/RaduMantu/ops-inject.git
-$ cd !$:t:r
-$ make -j $(nproc)
-</code>
+# on Ubuntu you can usually find it in your /boot partition
+$ grep CONFIG_KSM /boot/config-$(uname -r)
+CONFIG_KSM=y
 
-<note tip>
-//Pro tip #3//: [[https://www.gnu.org/software/bash/manual/html_node/Modifiers.html|Bash Modifiers]] and [[https://www.gnu.org/software/bash/manual/html_node/Word-Designators.html|Word Designators]]
-  * ''!$'' is substituted with the final argument of previously executed command
-  * '':t'' removes the leading pathname components, leaving //ops-inject.git//
-  * '':r'' removes exactly one trailing suffix, leaving //ops-inject//
-
------
-
-
-Troubleshooting:
-  * **IPPROTO_ETHERNET error** : if your //netinet/in.h// is missing this definition, you can delete line 36 from //src/str_proto.c//.
-  * **libnetfilter-queue missing** : consider installing //libnetfilter-queue-dev// and //libnetfilter-queue1//.
-</note>
-
-Next, let's insert an **iptables** rule that matches all outgoing ICMP packets.
-Take note of ''%%--%%queue-num 0'' for when we'll need to tell the userspace process which Netfilter Queue to subscribe. Also, ''%%--%%queue-bypass'' tells the **iptables** module to disable enqueuing packets if there's no process listening. Otherwise, the queue's buffer will fill up and overflowing packets will be dropped by default until some space is created.
-
-<code bash>
-$ sudo iptables -I OUTPUT -p icmp -j NFQUEUE --queue-num 0 --queue-bypass
+# otherwise, you can find a gzip compressed copy in /proc
+$ zcat /proc/config.gz | grep CONFIG_KSM
+CONFIG_KSM=y
 </code>
  
-Finally, run **ops-inject** while telling it to append the Record Route option (0x07). Because the tool takes a file as input, and because we give it the PseudoTerminal Slave of a [[https://www.gnu.org/software/bash/manual/html_node/Command-Grouping.html|subshell]], things can get messy if we simply run it with **sudo**. As a result, it's easier to just switch to **root**. To get a feel of what the other options do, just run ''ops-inject %%--%%help'' once.
+If you don't have KSM enabled, you //could// recompile the kernel with the CONFIG_KSM flag and try it, but you don't have to :)
  
-<code>
-$ sudo su
-./bin/ops-inject -p ip -q 0 -w <(printf '\x07')
-</code>
+Moving forward. The next thing on the list is to check that the **ksmd** daemon is functioning. Any configuration that we'll do will be through the sysfs files in ''/sys/kernel/mm/ksm''. Consequently, you should change user to root (a plain ''sudo echo 1 > run'' will not work, because the redirection is performed by your unprivileged shell).
+  * **/.../run** : this is **1** if the daemon is active; write 1 to it if it's not
+  * **/.../pages_to_scan** : this is how many pages will be scanned before going to sleep; you can increase this to 1000 if you want to see faster results
+  * **/.../sleep_millisecs** : this is how many ms the daemon sleeps in between scans; since you've modified **pages_to_scan**, you can leave this be
+  * **/.../max_page_sharing** : this is the maximum number of virtual pages that may share a single de-duplicated physical page; in cases like this it's better to go big or go home; so set it to something like 1000000, just to be sure
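The settings listed above could be applied in one go with a small helper. This is only a sketch: the function name ''ksm_configure'' and the ''KSM_DIR'' override are hypothetical (the override exists so the function can be tried on a scratch directory), and the values are the ones suggested in the list. It has to run from a root shell to touch the real sysfs files.

```shell
# Sketch: write the suggested ksmd settings to the KSM sysfs directory.
# KSM_DIR is a hypothetical override; by default the real sysfs path is used.
ksm_configure() {
    ksm_dir="${KSM_DIR:-/sys/kernel/mm/ksm}"

    # set the limits first, then make sure the daemon is running
    for setting in max_page_sharing=1000000 pages_to_scan=1000 run=1; do
        name="${setting%%=*}"     # text before the '='
        value="${setting#*=}"     # text after the '='

        if [ -w "$ksm_dir/$name" ]; then
            printf '%s\n' "$value" > "$ksm_dir/$name" || return 1
        else
            printf 'cannot write %s (are you root?)\n' "$ksm_dir/$name" >&2
            return 1
        fi
    done
}
```

After switching to root, calling ''ksm_configure'' with no arguments targets ''/sys/kernel/mm/ksm''; **sleep_millisecs** is deliberately left untouched, as suggested above.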
  
-Having completed the setup, let's generate some traffic!
+There are a few more files in the ksm/ directory. We will still use one or two later on. But for now, configuring the previous ones should be enough. Google the rest if you're interested.
  
-<code bash>
-$ ping -c 3 $(dig +short digitalocean.com | head -n 1)
-    PING 104.16.182.15 (104.16.182.15) 56(84) bytes of data.
-    64 bytes from 104.16.182.15: icmp_seq=1 ttl=57 time=46.7 ms
-    RR:     141.85.13.15
-            37.128.225.226
-            37.128.232.178
-            37.128.232.177
-            80.97.248.33
-            162.158.16.1
-            104.16.182.15
-            104.16.182.15
-            162.158.16.1
+=== [10p] Task B - Watch the magic happen ===
 
-    64 bytes from 104.16.182.15: icmp_seq=2 ttl=57 time=14.2 ms     (same route)
-    64 bytes from 104.16.182.15: icmp_seq=3 ttl=57 time=18.2 ms     (same route)
-</code>
+For this step it would be better to have a few terminals open. First, let's start a ''vmstat''. Keep your eyes on the active memory column when we run the sample program.
  
-Normally, we would need **wireshark** or **tcpdump** to see the result but fortunately,​ **ping** is able to understand the Record Route option. The reason for this is that it can generate it itself (see ''​-R''​ option). Should it have wondered that it received a Record Route option in response to a normal ICMP Echo Request? Apparently not... 
- 
-<note important>​ 
-If your ISP is blocking IP options, try **ping**-ing your //default gateway//. 
- 
-Normally, that should work, but there is really no guarantee. A TP-Link router usually runs Linux 2.6 (at least) and does its job well. A Tenda router, however, most likely runs some garbage proprietary firmware and won't even reply to an ICMP Echo Request with IP options. 
-</​note>​ 
- 
-From this point onward, it's all you! :) 
- 
-=== [5p] Task B - Traffic capture === 
- 
-Run the same experiment with ICMP Echo Request again, but this time capture the traffic using **tcpdump** and write it to a //pcap capture file//. 
- 
-<note tip> 
-Consider using the ''​-U''​ option in **tcpdump** to avoid buffering packets if you plan to suddenly stop it with //Ctrl^C//. 
-</​note>​ 
- 
-<note important>​ 
-If you had reachability problems at task A and IP options just can't get through, use {{:​ep:​labs:​04:​contents:​tasks:​ip-ops.zip|this pcap}} starting with Task C. It contains pretty much what you were supposed to get.  
- 
-As for Task B, show us that you can capture traffic correctly by targeting the //default gateway// again. 
-</​note>​ 
- 
-<​solution -hidden> 
 <code bash>
-$ sudo tcpdump -Uw ip-ops.pcap 'icmp and host 104.16.181.15'
+$ vmstat -wa -S m 1
 </code>
-</​solution>​ 
  
-=== [5p] Task C - Route extraction ===
+Next would be a good time to introduce two more files from the ksm/ sysfs directory:
+  * **/.../pages_shared** : this file reports how many //physical// pages are in use at the moment
+  * **/.../pages_sharing** : this file reports how many //virtual// page table entries point to the aforementioned physical pages
+For this experiment we will also want to monitor the number of de-duplicated virtual pages, so have at it:
  
-Use **tshark** to extract the Record Route payload of ICMP Echo Replies from the created //pcap//. \\ 
-You only need the IPs of the intermediary hops; these will be further processed in your script at Task D. 
- 
-<note tip> 
-  * Check out the ''​-Y''​ option in ''​man tshark(1)''​. 
-  * Look for the appropriate [[https://​www.wireshark.org/​docs/​dfref/​i/​ip.html|IPv4 Display Filter]]. 
-  * Test filter expressions in **wireshark** before applying them in **tshark**. 
-</​note>​ 
- 
-<​solution -hidden> 
 <code bash>
-$ tshark -r ip-ops.pcap -Tfields -e ip.rec_rt 'icmp.type == 0 && ip.rec_rt'
+$ watch -n 1 cat /sys/kernel/mm/ksm/pages_sharing
 </code>
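To put the **pages_sharing** counter into perspective, it can be converted into an approximate amount of memory. The helper below is a sketch under the assumption that each counted page table entry stands for roughly one page that no longer needs its own physical copy; ''ksm_saved_kib'' and the ''KSM_DIR'' override are hypothetical names introduced only for this illustration.

```shell
# Sketch: estimate how much memory de-duplication is currently saving, in KiB.
# KSM_DIR is a hypothetical override; by default the real sysfs path is used.
ksm_saved_kib() {
    ksm_dir="${KSM_DIR:-/sys/kernel/mm/ksm}"
    page_size="$(getconf PAGESIZE)"                      # bytes per page
    sharing="$(cat "$ksm_dir/pages_sharing")" || return 1

    # pages * bytes-per-page, converted to KiB
    echo $(( sharing * page_size / 1024 ))
}
```

Running it under ''watch'' instead of the plain ''cat'' requires exporting the function first or placing it in a script of its own.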
-</​solution>​ 
  
-=== [25p] Task D - AS lookup ===
+Finally, look at the provided code, compile it, and launch the program. As an argument you will need to provide the number of pages that will be allocated and initialized with the same value. Note that not all pages will be de-duplicated instantly. So keep in mind your system's RAM limitations before deciding how much you can spare (1-2GB should be ok, right?).
  
-Write a //bash script// starting from your **tshark** command. The script must perform an AS lookup and display information about each registered hop (e.g.: IP, AS name, etc.), for all packets in the //pcap//. Run the script. What do you notice?
+The result should look something like **Figure 1**:
  
-The output should look something like this ± a few info.
-<spoiler>
-{{:ep:labs:04:contents:tasks:ip-rr_sample.png?700|}} |
-</spoiler>
+{{:ep:labs:02:contents:tasks:ksm_vmstat.png?700|}}
+<html><center>
+<b>Figure 1:</b> <b>vmstat</b> output during the execution of our sample program (unit of measure: MB). The free memory steadily decreases from a baseline value of ~4.5GB to a minimum of ~2.5GB after the process starts. As <b>ksmd</b> begins scanning and merging pages, the free memory steadily increases. When the process eventually terminates, the amount of free memory reverts to its initial value.
+</center></html>
  
-<note tip>
-In order to get the required information, use the **whois** tool.
-<code bash>
-$ whois ${SOME_IP}
-$ whois -h whois.cymru.com -- -v ${SOME_IP}
-</code>
+If you ever want to make use of this in your own experiments, remember to adjust the configurations of **ksmd**. Waking too often or scanning too many pages at once could end up doing more harm than good. See what works for your particular system.
  
-Want to make your script's output look pretty? Remember that you have [[https://www.lihaoyi.com/post/BuildyourownCommandLinewithANSIescapecodes.html|ANSI color escape codes]] :)
-</note>
+Include a screenshot with the same output as the one in Figure 1 above. \\
+Edit the screenshot or note in writing at what point you started the application, where it reached max memory usage, the interval where the KSM daemon was doing its job (in the 10s sleep interval) and where the process died.
  
-<solution -hidden>
-<file bash analyze_rr.sh>
-#!/bin/bash
+=== [10p] Task C - Plot results ===
+Now that you’ve observed the effects of KSM using vmstat, it’s time to visualize them. Generate a real-time plot that shows free memory, used memory, and memory used as a buffer over time, based on the freemem column from the output of the vmstat command. The plot should dynamically adjust the axis ranges based on the data. The x-axis should represent time, and the y-axis should represent the amount of free memory. The plot should update in real-time as new data is collected.
  
-# analize_rr.sh - perform AS lookup on IP RR content 
-#   $1 : [required] path to a pcap file 
  
  
-###############################################################################​ 
-##############################​ ANSI ESCAPE CODES ##############################​ 
-###############################################################################​ 
- 
-ANSI_RED='​\033[31m'​ 
-ANSI_GREEN='​\033[32m'​ 
-ANSI_YELLOW='​\033[33m'​ 
-ANSI_BLUE='​\033[34m'​ 
-ANSI_PURPLE='​\033[35m'​ 
-ANSI_CYAN='​\033[36m'​ 
-ANSI_BOLD='​\033[1m'​ 
-ANSI_UNBOLD='​\033[2m'​ 
-ANSI_CLEAR='​\033[0m'​ 
- 
-###############################################################################​ 
-#################################​ ENTRY POINT #################################​ 
-###############################################################################​ 
- 
-# argument check 
-if [[ $# -ne 1 || ! -f $1 ]]; then 
-    echo '​Usage:​ ./​analize_rr.sh PCAP_FILE'​ 
-    exit 1 
-fi 
- 
-# parse ICMP Echo Replies in pcap while extracting relevant fields 
-while read -r SRC_IP DST_IP ROUTE; do 
-    # print info about src and dst IP for current packet 
-    printf '​%b%b%15s%b ==> %b%b%-15s%b\n' ​                  \ 
-        ${ANSI_YELLOW} ${ANSI_BOLD} ${SRC_IP} ${ANSI_CLEAR} \ 
-        ${ANSI_YELLOW} ${ANSI_BOLD} ${DST_IP} ${ANSI_CLEAR} 
- 
-    # parse each hop in recorded route 
-    while read -d , -r HOP_IP; do 
-        # print recorded hop IP 
-        printf '​\t%b%bHop:​ %b%s%b\n' ​  \ 
-            ${ANSI_GREEN} ${ANSI_BOLD} \ 
-            ${ANSI_UNBOLD} ${HOP_IP} ${ANSI_CLEAR} 
- 
-        # get organization info 
-        # NOTE: whois can access either RIPE or ARIN databases 
-        #       ​account for different formats 
-        printf '​\t\t%b%b%-14s:​ %b%s%b\n'​ \ 
-            ${ANSI_BLUE} ${ANSI_BOLD} ​   \ 
-            'Net Name' ${ANSI_UNBOLD} ​   \ 
-            "​$(whois ${HOP_IP} ​          \ 
-            | grep -i '​netname' ​         \ 
-            | tail -n 1                  \ 
-            | awk '​{$1="";​ print $0}' ​   \ 
-            | xargs)" ​                   \ 
-            ${ANSI_CLEAR} 
-        printf '​\t\t%b%b%-14s:​ %b%s%b\n'​ \ 
-            ${ANSI_BLUE} ${ANSI_BOLD} ​   \ 
-            'Org Name' ${ANSI_UNBOLD} ​   \ 
-            "​$(whois ${HOP_IP} ​          \ 
-            | grep -i '​orgname' ​         \ 
-            | tail -n 1                  \ 
-            | awk '​{$1="";​ print $0}' ​   \ 
-            | xargs)" ​                   \ 
-            ${ANSI_CLEAR} 
- 
-        # fetch AS info in one request & print 
-        IFS='​|'​ read -r AS_NUM IP BGP_PREFIX COUNTRY REGISTRY DATE AS_NAME \ 
-            < <(whois -h whois.cymru.com -- -v ${HOP_IP} ​                  \ 
-              | tail -n 1                                                  \ 
-              | sed 's/[ ]*|[ ]*/​|/​g'​) 
- 
-        printf '​\t\t%b%b%-14s:​ %b%s%b\n' ​             \ 
-            ${ANSI_BLUE} ${ANSI_BOLD} 'AS Num' ​       \ 
-            ${ANSI_UNBOLD} ${AS_NUM:​-'​NA'​} ${ANSI_CLEAR} 
-        printf '​\t\t%b%b%-14s:​ %b%s%b\n' ​             \ 
-            ${ANSI_BLUE} ${ANSI_BOLD} 'BGP Prefix' ​   \ 
-            ${ANSI_UNBOLD} ${BGP_PREFIX:​-'​NA'​} ${ANSI_CLEAR} 
-        printf '​\t\t%b%b%-14s:​ %b%s%b\n' ​             \ 
-            ${ANSI_BLUE} ${ANSI_BOLD} '​Country' ​      \ 
-            ${ANSI_UNBOLD} ${COUNTRY:​-'​NA'​} ${ANSI_CLEAR} 
-        printf '​\t\t%b%b%-14s:​ %b%s%b\n' ​             \ 
-            ${ANSI_BLUE} ${ANSI_BOLD} 'Reg Authority'​ \ 
-            ${ANSI_UNBOLD} ${REGISTRY:​-'​NA'​} ${ANSI_CLEAR} 
-        printf '​\t\t%b%b%-14s:​ %b%s%b\n' ​             \ 
-            ${ANSI_BLUE} ${ANSI_BOLD} 'Reg Date' ​     \ 
-            ${ANSI_UNBOLD} ${DATE:​-'​NA'​} ${ANSI_CLEAR} 
- 
-    done <<<"​${ROUTE},"​ 
- 
-done < <(tshark -r ${1}             `# input pcap file`        \ 
-                -Y '​icmp.type == 0' `# filter ICMP echo reply` \ 
-                -T fields ​          `# output format` ​         \ 
-                -e ip.src ​          `# print src IP`           \ 
-                -e ip.dst ​          `# print dst IP`           \ 
-                -e ip.rec_rt ​       `# print recorded route` ​  ) 
- 
-</​file>​ 
-</​solution>​ 
- 
-=== [5p] Task E - AS lookup (part II) === 
- 
-As you might have noticed during the previous task, even DigitalOcean uses CloudFlare. 
-{{:​ep:​labs:​04:​contents:​tasks:​docean-icmp.zip|This archive}} contains a //pcap// with ICMP Echo Request/​Replies sent from the university to four VMs hosted on DigitalOcean. Run your script again, on this //pcap// and see if you can spot any interesting organization names. Then, look them up. 
- 
-Not really relevant, but here are the IP addresses of the VMs involved: 
-<​code>​ 
-New York           ​134.122.28.219 
-Frankfurt ​         46.101.222.105 
-Singapore ​         178.128.213.179 
-Toronto ​           165.22.239.70 
-UPB (localhost) ​   10.5.0.1 
-</​code>​ 
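For the plotting task (Task C), one possible first step is to turn the live ''vmstat'' output into something a plotting tool can consume. The filter below is only a sketch, not the expected solution: the function name ''vmstat_free_csv'' is hypothetical, and it assumes the column is labelled ''free'' in the ''vmstat'' header row. It emits ''sample,free'' pairs and leaves the actual plotting to the tool of your choice.

```shell
# Sketch: convert vmstat output on stdin into "sample_index,free" CSV rows.
vmstat_free_csv() {
    awk '
        # until the header row is seen, look for the column labelled "free"
        col == 0 {
            for (i = 1; i <= NF; i++)
                if ($i == "free") col = i
            next
        }
        # vmstat re-prints its headers periodically; keep numeric rows only
        $col ~ /^[0-9]+$/ {
            printf "%d,%s\n", n++, $col
            fflush()    # flush every row so a live plot sees it immediately
        }
    '
}
```

A possible invocation is ''vmstat -wa -S m 1 | vmstat_free_csv > free.csv''; with a 1 second interval the sample index doubles as the elapsed time in seconds.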
ep/labs/04/contents/tasks/ex3.1635855142.txt.gz · Last modified: 2021/11/02 14:12 by radu.mantu
CC Attribution-Share Alike 3.0 Unported