isc:labs:kernel:tasks:02 [CS Open CourseWare]

02. [??p] Kernel modules

Way back when, adding new functionality to the kernel meant recompiling and installing it, followed by a reboot. Today, things are much easier. Linux is still a monolithic kernel, but with the kmod tools (man 8 kmod), users can load and unload modules (i.e., kernel object files) on demand, without all the fuss. These modules are written in C and must implement initialization and removal functions that the kernel calls automatically on load and unload. Usually, these functions register / unregister other functions contained in your object with core kernel subsystems.

We can use lsmod to get a list of all present modules, and modinfo to obtain detailed information about a specific module.

$ lsmod
ecdh_generic           16384  1 bluetooth
 
$ modinfo ecdh_generic | grep description
description:    ECDH generic algorithm
 
$ modinfo bluetooth | grep description 
description:    Bluetooth Core ver 2.22

What we can understand from this is that the Elliptic Curve Diffie-Hellman module is 16384 bytes in size and is used by one other module, namely bluetooth, through its ECDH helper. As you probably noticed, elixir.bootlin.com is a critical resource in navigating the kernel code.
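
The lsmod columns discussed above (module name, size, use count and list of users) can also be picked apart in a script. The sketch below is only an illustration: lsmod_users is a hypothetical helper name, and it is fed a saved line instead of live lsmod output so that it runs anywhere.

```shell
# lsmod_users MODULE: read lsmod-style output on stdin and print the
# use count and the users of MODULE (hypothetical helper name).
lsmod_users() {
    awk -v mod="$1" '$1 == mod { print $3, $4 }'
}

# Sample line copied from the lsmod output above.
printf 'ecdh_generic           16384  1 bluetooth\n' | lsmod_users ecdh_generic
```

On a live system, the same helper could be used as lsmod | lsmod_users ecdh_generic.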

[??p] Task A - Our first module

Looking in the skel/01/ directory from our code skeleton, we will find a minimal build environment for our first module. Alas, compiling a kernel module differs from compiling a user space program. But just slightly: kernel-specific headers must be used, user space-specific libraries (e.g., libc) are generally unavailable (so no printf()) and lastly, the same options that were used to compile the kernel itself must be specified. To this end, the kbuild system was introduced. As you can see, our Makefile invokes its counterpart from the kernel source directory in /lib/modules/..., which in turn uses the configuration in our Kbuild file. The obj-m variable specifies the name of the final output object file (in this case, test.o). test-objs lists the object files that are linked into it, so if you split your code across multiple sources, just add their objects to test-objs. If you have a single source, you can drop test-objs, but the kbuild system will then expect a test.c file to be present.

Now, let's compile our module, load it into the kernel, and see what happens:

$ make
 
$ sudo insmod test.ko
$ sudo dmesg
...
[ 6348.461247] my-first-module: Hello world!
 
$ sudo rmmod test
$ sudo dmesg
...
[ 6348.461247] my-first-module: Hello world!
[ 6366.635090] my-first-module: Goodbye cruel, cruel world!

Here, we used insmod to load a .ko kernel object file into the running kernel and rmmod to remove it. dmesg is a tool that prints the kernel message buffer. Note that there are multiple log levels ranging from debug to emergency. pr_info() is the kernel's printf()-like helper that logs at the info level, one of the less urgent ones. dmesg can be configured to hide messages below a certain level, but depending on how your kernel was configured, some of the more important messages will also be echoed to your console.
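
To make the log levels more concrete, here is a small sketch that maps the numeric kernel log levels (0 through 7) to the names dmesg uses, and checks whether a message would also be echoed to the console. loglevel_name and reaches_console are hypothetical helper names. The rule applied is that a message reaches the console only when its level is numerically lower than the console log level (the first value in /proc/sys/kernel/printk); the value 4 below is just an example setting, not something read from your system.

```shell
# Map a numeric kernel log level to its name (0 = most urgent).
loglevel_name() {
    case "$1" in
        0) echo emerg ;;
        1) echo alert ;;
        2) echo crit ;;
        3) echo err ;;
        4) echo warn ;;
        5) echo notice ;;
        6) echo info ;;    # the level used by pr_info()
        7) echo debug ;;
        *) echo unknown; return 1 ;;
    esac
}

# reaches_console LEVEL CONSOLE_LOGLEVEL: succeed if a message logged at
# LEVEL is also echoed to the console.
reaches_console() {
    [ "$1" -lt "$2" ]
}

console_loglevel=4   # example value, not read from the live system
for level in 3 6; do
    if reaches_console "$level" "$console_loglevel"; then
        echo "$(loglevel_name "$level"): echoed to the console"
    else
        echo "$(loglevel_name "$level"): only in the message buffer"
    fi
done
```

With this example setting, an err message would show up on the console, while the info messages printed by our module would only be visible through dmesg.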

[??p] Task B - Debugging

In this task we are going to add a bug to our initial module. We will do this by applying a diff patch to our source:

[student@host]$ patch my_first_module.c patches/add_bug.patch

Now, our module has a 50% chance of dereferencing a NULL pointer every time we try to load it. If this happens, a kernel oops will occur. While no kernel error is truly harmless, an oops is less severe than a kernel panic: the system can usually keep running after an oops, but not after a panic. The Windows equivalent of a kernel panic would be a Blue Screen Of Death.

Knowing that our module will cause trouble, we should test it inside the VM. To do this, we need to recompile it against the Linux source tree that we cloned, i.e., the kernel that the VM runs. For this, we override the KDIR variable used in our module's Makefile.

# clean up previously created objects
[student@host]$ make clean
 
# recompile the module, but for the kernel used in the VM; not your live kernel
[student@host]$ KDIR=$(realpath ../linux/) make

Now, we need to get test.ko onto the VM. First of all, if the VM is still running, shut it down. Next, we are going to once again mount the disk image and copy the kernel object into root's home directory. Doing so on a live partition might be a bit trickier :p

# stop the VM if it's still running
[  root@guest]$ poweroff
 
# once again, mount the VM disk image
[student@host]$ sudo mount ../images/ubuntu.raw /mnt
 
# copy the module into the VM's root home
[student@host]$ sudo cp test.ko /mnt/root
 
# unmount the disk before starting the VM again
[student@host]$ sudo umount /mnt

Finally, start up qemu once again and notice that test.ko is in /root/. Try to load it with insmod until you get an error like this:

root@victim:~# insmod test.ko
[   26.083587] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   26.084413] #PF: supervisor write access in kernel mode
[   26.085044] #PF: error_code(0x0002) - not-present page
[   26.085663] PGD 0 P4D 0
[   26.085972] Oops: 0002 [#1] PREEMPT SMP PTI
[   26.086475] CPU: 0 PID: 212 Comm: insmod Tainted: G           O      5.16.0-rc2+ #1
[   26.087385] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014
[   26.088487] RIP: 0010:init+0x3f/0x70 [test]
[   26.089000] Code: 24 08 31 c0 48 8d 7c 24 07 e8 7d dd dd d2 0f b6 74 24 07 48 c7 c7 00 d0 34 c0 e8 74 0
[   26.091188] RSP: 0018:ffff96c8c01cfde0 EFLAGS: 00010282
[   26.091813] RAX: 0000000000000023 RBX: 0000000000000000 RCX: 0000000000000000
[   26.092656] RDX: 0000000000000000 RSI: ffffffff940390f9 RDI: 00000000ffffffff
[   26.093496] RBP: ffffffffc034c000 R08: ffffffff94335c88 R09: 00000000ffffdfff
[   26.094344] R10: ffffffff94255ca0 R11: ffffffff94255ca0 R12: 0000000000000000
[   26.095185] R13: ffff899ac4cbe4a0 R14: 0000000000000003 R15: 0000000000000000
[   26.096044] FS:  00007f00a9916540(0000) GS:ffff899afbc00000(0000) knlGS:0000000000000000
[   26.097013] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   26.097701] CR2: 0000000000000000 CR3: 00000001027ba000 CR4: 00000000000006f0
[   26.098545] Call Trace:
[   26.098860]  <TASK>
[   26.099125]  do_one_initcall+0x3f/0x1e0
[   26.099608]  ? kmem_cache_alloc_trace+0x3a/0x1b0
[   26.100164]  do_init_module+0x56/0x240
[   26.100617]  __do_sys_finit_module+0xa0/0xe0
[   26.101139]  do_syscall_64+0x3b/0x90
[   26.101595]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   26.102224] RIP: 0033:0x7f00a9a5b70d
[   26.102658] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 8
[   26.104854] RSP: 002b:00007ffd249707f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   26.105751] RAX: ffffffffffffffda RBX: 000055a5530f3490 RCX: 00007f00a9a5b70d
[   26.106608] RDX: 0000000000000000 RSI: 000055a55286a358 RDI: 0000000000000003
[   26.107441] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f00a9b2f260
[   26.108290] R10: 0000000000000003 R11: 0000000000000246 R12: 000055a55286a358
[   26.109155] R13: 0000000000000000 R14: 000055a5530f2400 R15: 0000000000000000
[   26.110005]  </TASK>
[   26.110274] Modules linked in: test(O+)
[   26.110733] CR2: 0000000000000000
[   26.111178] ---[ end trace d28d04e4e0f18a50 ]---

This info dump may be intimidating at first sight, but it contains all the necessary information to identify the problem:

BUG: kernel NULL pointer dereference, address: 0000000000000000: the reason behind the oops.
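#PF: supervisor write access in kernel mode: the fault was triggered by a write performed by kernel (supervisor) code. When dereferencing the virtual address 0x00, the MMU tried to find the corresponding physical page, but failed (hence the not-present page reported on the next line). Remember that #PF stands for Page Fault.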
RIP: 0010:init+0x3f/0x70 [test]: the faulting instruction was located in the test module, at an offset of 0x3f from the start of the init() function, which has a total size of 0x70 bytes.
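
The fields of the RIP line described above can be extracted mechanically, which helps when the numbers need to be passed on to other tools. The following is a minimal sketch, assuming the symbol+offset/size [module] layout seen in this oops; parse_rip is a hypothetical helper name, and the pattern only matches faults that occur inside a module.

```shell
# parse_rip: read oops text on stdin and print "symbol offset size module"
# for the kernel-mode RIP line (hypothetical helper name).
parse_rip() {
    sed -n 's/.*RIP: [0-9a-f]*:\([A-Za-z0-9_.]*\)+\(0x[0-9a-f]*\)\/\(0x[0-9a-f]*\) \[\([^]]*\)].*/\1 \2 \3 \4/p'
}

# Line copied from the oops above.
echo '[   26.088487] RIP: 0010:init+0x3f/0x70 [test]' | parse_rip
```

For the line above, this prints init 0x3f 0x70 test. The user space RIP line further down in the trace has no symbol and no module name, so it is skipped.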

Based on this information (especially the last part), we have a few ways of identifying the exact line of code and instruction where the module crashed. First up is addr2line. This tool can convert an address to a source file name and line number, provided that the binary was compiled with debug symbols. We already know that the instruction was located at an offset of 0x3f from the start of init(), but where is this function located relative to the beginning of the object? This can be easily discovered by consulting the symbol table with readelf.

# where is init() located relative to the start of the object file?
[student@host]$ readelf --symbols test.ko
   Num:    Value          Size Type    Bind   Vis      Ndx Name
    ...
    24: 0000000000000000   102 FUNC    LOCAL  DEFAULT    1 init
    ...
# apparently right at the very start ==> our instruction is at address 0x00 + 0x3f = 0x3f
 
# what line from what source file generated the instruction at address 0x3f?
[student@host]$ addr2line --exe test.ko 0x3f
/.../my_first_module.c:26
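
The address arithmetic from the comments above (symbol value plus offset) is trivial here because init() sits at 0x00, but it matters for functions placed further into the object. A minimal sketch in shell arithmetic follows; sym_addr is a hypothetical helper name, and the second call uses a made-up symbol value purely for illustration.

```shell
# sym_addr VALUE OFFSET: add the symbol's Value column from readelf
# (hex, without 0x prefix) to the offset from the oops (with 0x prefix).
sym_addr() {
    printf '0x%x\n' "$(( 0x$1 + $2 ))"
}

# init() from the symbol table above: 0x00 + 0x3f
sym_addr 0000000000000000 0x3f

# made-up example of a function that does not start at 0x00
sym_addr 0000000000000040 0x3f
```

The result can be handed directly to addr2line, e.g. addr2line --exe test.ko "$(sym_addr 0000000000000000 0x3f)".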

Another way of identifying not only the source code line, but also the instruction, is by using a tool that may be familiar to you: objdump. This is a binary file disassembler. Next, we are going to disassemble (-d) the sections that are expected to contain code (here, .text), displaying the instruction mnemonics in Intel syntax (-M intel) and interleaving the C code that generated these instructions (-S).

# looking for that elusive 3f offset...
[student@host]$ objdump -d -M intel -S test.ko
 ...
    /* we have a 50-50 chance to shoot ourselves in the foot */
    if (random & 0x80) {
  34:   80 7c 24 07 00          cmp    BYTE PTR [rsp+0x7],0x0
  39:   0f 89 00 00 00 00       jns    3f <init_module+0x3f>
        *((uint8_t *) NULL) = 0xff;
  3f:   c6 04 25 00 00 00 00    mov    BYTE PTR ds:0x0,0xff
  46:   ff 
    } else {
 ...

Let's say a module generates an oops. Even if the kernel recovers, that module will be locked in place until reboot. If you try to rmmod it, the kernel will claim that it's still in use. For our example:

# is our module still loaded?
[root@guest]$ lsmod | grep test
test                   16384  1
 
# can we remove the module?
[root@guest]$ rmmod test
rmmod: ERROR: Module test is in use
 
# looks like the module crashed while in the "Loading" state
# the kernel was trying to load it at address 0xffffffffc0304000
[root@guest]$ cat /proc/modules
test 20480 1 - Loading 0xffffffffc0304000 (O+)
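
Each /proc/modules line has the same fields: name, size, reference count, dependent modules (or -), state, load address and taint flags. As a rough sketch, the state and address of a given module can be pulled out as shown below; module_state is a hypothetical helper name, and the input is the line shown above rather than the live file.

```shell
# module_state NAME: read /proc/modules-style lines on stdin and print the
# state and load address of module NAME (hypothetical helper name).
module_state() {
    while read -r name size refcount deps state address taint; do
        if [ "$name" = "$1" ]; then
            echo "$state $address"
        fi
    done
}

# Line copied from the /proc/modules output above.
echo 'test 20480 1 - Loading 0xffffffffc0304000 (O+)' | module_state test
```

On the guest, the same check could be run as module_state test < /proc/modules. A module that loaded successfully would report Live instead of Loading.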

isc/labs/kernel/tasks/02.1637836225.txt.gz · Last modified: 2021/11/25 12:30 by radu.mantu