ep:labs:04:contents:tasks:ex3 [2023/10/29 20:44] radu.mantu created
ep:labs:04:contents:tasks:ex3 [2025/03/24 22:08] (current) silvia.dragan [03. [30p] Kernel Samepage Merging]
==== 03. [30p] Kernel Samepage Merging ====

[[https://www.kernel.org/doc/html/latest/admin-guide/mm/ksm.html|KSM]] is a page de-duplication strategy introduced in kernel version 2.6.32. In case you are wondering, it's not the same thing as the file page cache. KSM was originally developed in tandem with KVM to detect data pages with //exactly// the same content and make their page table entries point to the same physical page (marked Copy-On-Write). The end goal was to allow more VMs to run on the same host. Since each page must be scanned for identical content, scanning all of memory had no chance of scaling well with the amount of available RAM. So, the developers compromised: only private anonymous pages that were marked as likely candidates via ''madvise(addr, length, MADV_MERGEABLE)'' are scanned.

Download the {{:ep:labs:02:contents:tasks:ksm.zip|skeleton}} for this task.
=== [10p] Task A - Check kernel support & enable ksmd ===
First things first, you need to verify that KSM was enabled when your kernel was compiled. For this, check the kernel's build configuration file. Hopefully, you will see something like this:
<code bash>
# on Ubuntu you can usually find it in your /boot partition
$ grep CONFIG_KSM /boot/config-$(uname -r)
CONFIG_KSM=y

# otherwise, you may find a gzip compressed copy in /proc
# (only if the kernel was built with CONFIG_IKCONFIG_PROC)
$ zcat /proc/config.gz | grep CONFIG_KSM
CONFIG_KSM=y
</code>
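Both checks can be wrapped in a small helper that tries the ''/boot'' copy first and falls back to ''/proc/config.gz''. This is only a sketch: the function name ''ksm_config_enabled'' and its optional config-path argument are our own, not part of the skeleton.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if the given kernel config,
# plain or gzip compressed, contains CONFIG_KSM=y.
ksm_config_enabled() {
    local cfg=${1:-/boot/config-$(uname -r)}

    if [ -r "$cfg" ]; then
        case "$cfg" in
            *.gz) zcat "$cfg" | grep -q '^CONFIG_KSM=y$' ;;
            *)    grep -q '^CONFIG_KSM=y$' "$cfg" ;;
        esac
    elif [ -r /proc/config.gz ]; then
        # fall back to the copy exported when CONFIG_IKCONFIG_PROC is set
        zcat /proc/config.gz | grep -q '^CONFIG_KSM=y$'
    else
        echo "no readable kernel config found" >&2
        return 2
    fi
}
</code>

Usage: ''ksm_config_enabled && echo "KSM is enabled"''. Exit status 1 means the config was found but KSM is off; 2 means no config could be read at all.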
If you don't have KSM enabled, you //could// recompile the kernel with ''CONFIG_KSM=y'' and try it, but you don't have to :)

Moving on, the next thing on the list is to check that the **ksmd** daemon is running. All the configuration we'll do goes through the sysfs files in ''/sys/kernel/mm/ksm'', which only root can write to. Consequently, you should switch to the root user (e.g. ''sudo -i''): something like ''sudo echo 1 > run'' will not work, because the output redirection is performed by your unprivileged shell, not by ''sudo''.
  * **/.../run** : this is **1** if the daemon is active; write 1 to it if it's not
  * **/.../pages_to_scan** : this is how many pages are scanned before the daemon goes to sleep; you can increase this to 1000 if you want to see faster results
  * **/.../sleep_millisecs** : this is how many ms the daemon sleeps between scans; since you've modified **pages_to_scan**, you can leave this be
  * **/.../max_page_sharing** : this is the maximum number of page table entries that may point to a single de-duplicated page; in cases like this it's better to go big or go home, so set it to something like 1000000, just to be sure
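The settings above can be applied in one go from a root shell. A minimal sketch, assuming the values suggested in the list; ''KSM_DIR'' is a variable we introduce only so the functions can be pointed at another directory, and it defaults to the real sysfs path. As far as we know, the kernel only accepts a new ''max_page_sharing'' value while no pages are currently merged (otherwise the write fails with EBUSY), so the sketch sets it first.

<code bash>
#!/usr/bin/env bash
# Apply the suggested ksmd settings (run as root).
# KSM_DIR is our own knob; it defaults to the real sysfs directory.
KSM_DIR=${KSM_DIR:-/sys/kernel/mm/ksm}

ksm_set() {
    # write a value to one ksm/ file and print what it now contains
    local knob=$1 value=$2
    if ! echo "$value" > "$KSM_DIR/$knob"; then
        echo "failed to set $knob=$value" >&2
        return 1
    fi
    printf '%-18s %s\n' "$knob" "$(cat "$KSM_DIR/$knob")"
}

ksm_configure() {
    # max_page_sharing first: changing it is refused once pages are merged
    ksm_set max_page_sharing 1000000 &&
    ksm_set pages_to_scan 1000 &&
    ksm_set run 1
}
</code>

Source the file from a root shell and call ''ksm_configure''; it stops at the first write that fails.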
There are a few more files in the ksm/ directory, and we will use one or two of them later on. For now, configuring the previous ones should be enough. Google the rest if you're interested.

=== [10p] Task B - Watch the magic happen ===

For this step, it's best to have a few terminals open. First, start ''vmstat''. Keep your eyes on the free and active memory columns while the sample program runs.
<code bash>
# -w: wide output, -a: show active/inactive memory, -S m: report in MB (10^6 bytes), 1: one sample per second
$ vmstat -wa -S m 1
</code>

Now is a good time to introduce two more files from the ksm/ sysfs directory:
  * **/.../pages_shared** : this file reports how many shared //physical// pages are in use at the moment
  * **/.../pages_sharing** : this file reports how many additional //virtual// pages (page table entries) point to the aforementioned physical pages, i.e. roughly how many pages have been saved

For this experiment we will also want to monitor the number of de-duplicated virtual pages, so have at it:
<code bash>
$ watch -n 0 cat /sys/kernel/mm/ksm/pages_sharing
</code>
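The kernel's KSM documentation describes ''pages_sharing'' as how many more sites are sharing the merged pages, i.e. how much is saved, so multiplying it by the page size gives a rough estimate of the memory KSM is saving. A small sketch of that arithmetic; ''ksm_saved_mib'' is a hypothetical helper, not something provided by the skeleton.

<code bash>
#!/usr/bin/env bash
# Estimate how much memory KSM is saving right now, in MiB.
# Assumption: every page counted in pages_sharing is a page that no
# longer needs its own physical copy.
ksm_saved_mib() {
    # $1: number of sharing pages (defaults to the live sysfs counter)
    # $2: page size in bytes (defaults to the system page size)
    local sharing=${1:-$(cat /sys/kernel/mm/ksm/pages_sharing)}
    local page_size=${2:-$(getconf PAGESIZE)}
    echo $(( sharing * page_size / 1024 / 1024 ))
}
</code>

Called without arguments it reads the live counter; for example, 262144 sharing pages of 4 KiB correspond to 1024 MiB saved.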
Finally, look at the provided code, compile it, and launch the program. As an argument, provide the number of pages that will be allocated and initialized with the same value. Note that not all pages will be de-duplicated instantly, so keep your system's RAM limits in mind before deciding how much memory you can spare (1-2GB should be ok, right?)
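Since the program expects a page count rather than a size, you have to convert the amount of memory you want to spend. A sketch of that conversion; ''mib_to_pages'' is our own helper name.

<code bash>
#!/usr/bin/env bash
# Convert a memory budget in MiB into a number of pages.
mib_to_pages() {
    # $1: memory in MiB
    # $2: page size in bytes (defaults to the system page size)
    local mib=$1
    local page_size=${2:-$(getconf PAGESIZE)}
    echo $(( mib * 1024 * 1024 / page_size ))
}
</code>

With 4 KiB pages, ''mib_to_pages 1024'' prints 262144, which you could pass as ''./your_program $(mib_to_pages 1024)'', where ''your_program'' is a placeholder for whatever the skeleton's build produces.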
The result should look something like **Figure 1**:

{{:ep:labs:02:contents:tasks:ksm_vmstat.png?700|}}
<html><center>
<b>Figure 1:</b> <b>vmstat</b> output during the execution of our sample program (unit of measure: MB). The free memory steadily decreases from a baseline value of ~4.5GB to a minimum of ~2.5GB after the process starts. As <b>ksmd</b> begins scanning and merging pages, the free memory steadily increases. When the process eventually terminates, the amount of free memory reverts to its initial value.
</center></html>
If you ever want to make use of this in your own experiments, remember to tune the configuration of **ksmd**. Waking up too often or scanning too many pages at once could end up doing more harm than good. See what works for your particular system.
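One way to reason about these two knobs: **ksmd** scans up to ''pages_to_scan'' pages per wake-up and then sleeps for ''sleep_millisecs'', so their ratio gives an upper bound on the scan rate (it ignores the time spent scanning). A sketch of that calculation; ''ksm_max_scan_rate'' is a hypothetical helper.

<code bash>
#!/usr/bin/env bash
# Upper bound on ksmd's scan rate: pages_to_scan pages per wake-up,
# at most one wake-up every sleep_millisecs ms (scan time ignored).
ksm_max_scan_rate() {
    # $1: pages_to_scan, $2: sleep_millisecs, $3: page size in bytes
    local pages=$1 sleep_ms=$2
    local page_size=${3:-$(getconf PAGESIZE)}
    (( sleep_ms > 0 )) || { echo "sleep_millisecs must be > 0" >&2; return 1; }
    local pages_per_sec=$(( pages * 1000 / sleep_ms ))
    echo "$pages_per_sec pages/s (~$(( pages_per_sec * page_size / 1024 / 1024 )) MiB/s)"
}
</code>

For example, with ''pages_to_scan'' set to 1000 and a ''sleep_millisecs'' of 20, ''ksm_max_scan_rate 1000 20 4096'' reports at most 50000 pages/s, roughly 195 MiB/s of memory examined.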
Include a screenshot showing the same kind of output as **Figure 1**. \\
Edit the screenshot, or note in writing, the point where you started the application, where it reached maximum memory usage, the interval in which the KSM daemon was doing its job (in the 10s sleep interval), and where the process died.
=== [10p] Task C - Plot results ===
Now that you’ve observed the effects of KSM using vmstat, it’s time to visualize them. Generate a real-time plot that shows free memory, used memory, and memory used as buffers over time, based on the output of the vmstat command (e.g. its ''free'' column). The x-axis should represent time and the y-axis the amount of memory, with both axis ranges adjusting dynamically to the data. The plot should update in real time as new data is collected.
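One possible approach is to separate data collection from plotting: turn the vmstat stream into CSV, then have your plotting tool of choice re-read the file as it grows. The sketch below only covers the collection half. It assumes vmstat is run without ''-a'', so that columns 4-6 are ''free'', ''buff'' and ''cache'' (with ''-a'' they become ''free'', ''inact'' and ''active''); the function name and CSV layout are our own choices.

<code bash>
#!/usr/bin/env bash
# Turn a live vmstat stream into CSV: sample index, free, buff, cache.
# Assumes vmstat's default column layout (no -a): r b swpd free buff cache ...
vmstat_to_csv() {
    awk 'BEGIN { OFS = ","; print "t,free,buff,cache" }
         # data rows start with a number; header rows ("procs", "r") do not
         $1 ~ /^[0-9]+$/ { print t++, $4, $5, $6; fflush() }'
}
</code>

Usage: ''vmstat -n -S m 1 | vmstat_to_csv > mem.csv''. With a 1 second delay, ''t'' is roughly the number of seconds since collection started, and ''fflush()'' makes each row visible to the plotting side as soon as it is produced.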