03. [??p] Kernel Samepage Merging

KSM is a page de-duplication strategy introduced in kernel version 2.6.32. In case you are wondering, it's not the same thing as the file page cache. KSM was originally developed in tandem with KVM in order to detect data pages with exactly the same content and make their page table entries point to the same physical page (marked Copy-On-Write). The end goal was to allow more VMs to run on the same host. Since each page must be scanned for identical content, this solution had no chance of scaling well with the amount of available RAM. So, the developers compromised: ksmd scans only the private anonymous pages that were marked as likely candidates via madvise(addr, length, MADV_MERGEABLE).

[??p] Task A - Check kernel support & enable ksmd

First things first, you need to verify that KSM was enabled when your kernel was compiled. For this, check the kernel build configuration file that is stored on your /boot partition. You should see something like this:

$ grep CONFIG_KSM /boot/config-$(uname -r)
CONFIG_KSM=y

If you don't have KSM enabled, you could recompile the kernel with the CONFIG_KSM flag and try it, but you don't have to :)
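Not every distribution ships a config file in /boot. As a fallback, you can check whether the KSM sysfs directory exists at all, since the kernel only creates it when KSM is built in. A minimal sketch, assuming a POSIX shell; the function name ksm_available and its optional directory argument are made up for this example (the argument only exists so you can point it somewhere else when testing):

```shell
# Hypothetical helper: succeeds if the KSM control files are present.
# $1 (optional) overrides the sysfs directory; by default the real one is used.
ksm_available() {
    dir="${1:-/sys/kernel/mm/ksm}"
    [ -e "$dir/run" ]
}

if ksm_available; then
    echo "KSM is available"
else
    echo "KSM is not available on this kernel"
fi
```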

Moving forward, the next thing on the list is to check that the ksmd daemon is running. All the configuration that we'll do goes through the sysfs files in /sys/kernel/mm/ksm. Consequently, you should switch to the root user (prefixing a command with sudo is not enough when you redirect its output to these files, because the redirection is performed by your own unprivileged shell).

• /…/run : this is 1 if the daemon is active; write 1 to it if it's not
• /…/pages_to_scan : this is how many pages will be scanned before going to sleep; you can increase this to 1000 if you want to see faster results
• /…/sleep_millisecs : this is how many ms the daemon sleeps in between scans; since you've modified pages_to_scan, you can leave this be
• /…/max_page_sharing : this is the maximum number of virtual pages that are allowed to share a single de-duplicated physical page; in cases like this it's better to go big or go home, so set it to something like 1000000, just to be sure

There are a few more files in the ksm/ directory. We will use one or two of them later on, but for now, configuring the previous ones should be enough. Google the rest if you're interested.
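Put together, the settings listed above could be applied with something like the following. This is only a sketch: the function name ksm_configure and its optional directory argument are hypothetical, the values are the ones suggested above, and it has to run as root. The tunables are written before the daemon is started, since the kernel may refuse to change max_page_sharing once pages have already been merged.

```shell
# Hypothetical helper: apply the suggested ksmd settings.
# $1 (optional) overrides the sysfs directory; by default the real one is used.
ksm_configure() {
    dir="${1:-/sys/kernel/mm/ksm}"
    echo 1000    > "$dir/pages_to_scan"    || return 1   # scan more pages per wake-up
    echo 1000000 > "$dir/max_page_sharing" || return 1   # allow many mappings per merged page
    echo 1       > "$dir/run"              || return 1   # finally, start ksmd
}

# Usage (as root):
#   ksm_configure
```

If you insist on staying a regular user, something like `echo 1 | sudo tee /sys/kernel/mm/ksm/run` should also work, because the file is opened by tee and not by your shell.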

[??p] Task B - Watch the magic happen

For this step it would be better to have a few terminals open. First, let's start vmstat. Keep your eyes on the active memory column when we run the sample program.

$ vmstat -wa -S m 1

Next would be a good time to introduce two more files from the ksm/ sysfs directory:

• /…/pages_shared : this file reports how many physical pages are in use at the moment
• /…/pages_sharing : this file reports how many virtual page table entries point to the aforementioned physical pages

For this experiment we will also want to monitor the number of de-duplicated virtual pages, so have at it:

$ watch -n 0 cat /sys/kernel/mm/ksm/pages_sharing
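Since every entry counted in pages_sharing is a virtual page that no longer needs a physical page of its own, multiplying it by the page size gives a rough estimate of the memory saved. A small sketch, assuming a POSIX shell with getconf; the function name ksm_saved_kib and its optional arguments are invented for this example:

```shell
# Hypothetical helper: estimate the memory saved by KSM, in KiB.
# $1 (optional) overrides the sysfs directory.
# $2 (optional) overrides the page size in bytes (default: ask getconf).
ksm_saved_kib() {
    dir="${1:-/sys/kernel/mm/ksm}"
    page_size="${2:-$(getconf PAGESIZE)}"
    sharing=$(cat "$dir/pages_sharing") || return 1
    echo $(( sharing * page_size / 1024 ))
}

# Usage:
#   ksm_saved_kib
```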

Finally, look at the provided code, compile it, and launch the program. As an argument you will need to provide the number of pages that will be allocated and initialized with the same value. Note that not all pages will be de-duplicated instantly. So keep in mind your system's RAM limitations before deciding how much you can spare (1-2GB should be ok, right?)
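The program takes a number of pages, but it is easier to think in megabytes. The conversion is just the amount of memory divided by the page size; a quick sketch (pages_for_mib is a made-up helper name, and ./ksm_demo in the comment is only a placeholder for whatever you named the compiled program):

```shell
# Hypothetical helper: convert a size in MiB to a number of pages.
# $1 is the size in MiB.
# $2 (optional) overrides the page size in bytes (default: ask getconf).
pages_for_mib() {
    mib="$1"
    page_size="${2:-$(getconf PAGESIZE)}"
    echo $(( mib * 1024 * 1024 / page_size ))
}

# With 4096-byte pages, 1024 MiB corresponds to 262144 pages.
# Usage (placeholder binary name):
#   ./ksm_demo "$(pages_for_mib 1024)"
```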

The result should look something like this:

Here, we can see the active memory suddenly rising when we start the process. Over the next few seconds, as ksmd starts scanning pages, the active memory slowly drops. Finally, as the process terminates, all memory is reclaimed by the kernel and the active memory returns to roughly the same value as before. If you ever want to make use of this in your own experiments, remember to adjust the ksmd configuration. Waking up too often or scanning too many pages at once could end up doing more harm than good. See what works for your particular system.