This shows you the differences between two versions of the page.
ass:cursuri:01:theory:02 [2023/07/16 19:34] florin.stancu [Processor protection domains] |
ass:cursuri:01:theory:02 [2023/07/17 22:39] (current) radu.mantu |
||
---|---|---|---|
Line 1: | Line 1: | ||
==== Processor protection domains ==== | ==== Processor protection domains ==== | ||
+ | |||
+ | //In this section, we try to summarize some of the traditional responsibilities of a CPU. | ||
+ | If you are already acquainted with them (or with some of them), feel free to skim past them!// | ||
=== x86 protection modes === | === x86 protection modes === | ||
Line 11: | Line 14: | ||
* **Ring 0:** also known as kernel space; the most privileged code is running, with (almost) unrestricted access to the hardware. | * **Ring 0:** also known as kernel space; the most privileged code is running, with (almost) unrestricted access to the hardware. | ||
- | <note> | + | A special **System Call** instruction is required to get from a lower-privileged level to a higher one. |
- | The reason why the inner rings lost favor among kernel developers is mostly due to another component of the modern protection model, namely Virtual Memory / Pagination (see Fig. 1). Paging is a system that allows the Operating System to present each process a different view of its memory. When said process tries to access a page (i.e.: 4KB block), the address is translated by the Memory Management Unit (MMU), a hardware component, by means of a data structure called Page Table, unique to each process and residing in kernel memory. This allows the kernel to obscure parts of memory (e.g.: that of other processes) as a means of isolation. Or to over-commit resources, only to actually allocate them when needed (e.g.: ''malloc()''-ed memory is assigned to the process only after it is first accessed). | + | It is usually implemented as a software interrupt / trap by the CPU: when invoked by, e.g., a user program, the CPU pauses that program, saves its state (program counter, flags) on the stack and invokes a special routine registered by the Operating System. Control thus passes to the OS kernel, which decodes the request (via a standardized calling convention) and executes it (e.g., reading from / writing to the filesystem / disk / network / USB devices -- operations which the application would not normally be privileged enough to perform; remember: hardware access is quite restricted from the less privileged rings). |
- | </note> | + | |
+ | |||
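As a rough illustration of the user-space side of this mechanism, here is a minimal sketch. It assumes CPython on a POSIX-like system, where the ''os'' functions used below are thin wrappers over the corresponding system calls. The helper name ''roundtrip_through_kernel'' is made up for this example. The program never touches any device itself: each ''os.pipe()'', ''os.write()'' and ''os.read()'' call traps into the kernel, which does the work on the program's behalf.

```python
import os


def roundtrip_through_kernel(payload: bytes) -> bytes:
    """Send bytes through a kernel-managed pipe and read them back.

    Hypothetical helper for illustration only. Keep the payload small:
    a pipe has a limited kernel buffer and a large write would block.
    """
    read_fd, write_fd = os.pipe()              # kernel creates the pipe object
    try:
        written = os.write(write_fd, payload)  # kernel copies data from user space
        return os.read(read_fd, written)       # kernel copies data back to user space
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(roundtrip_through_kernel(b"hello from user space"))
```

On Linux, running the script under ''strace'' should show the underlying system calls being issued.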
+ | === Virtual Memory, paging and address translation === | ||
+ | |||
+ | Paging (see Fig. 1) is an architectural feature of virtually all modern application processors that allows the Operating System to present each process with a different view of its memory. | ||
+ | When said process tries to access a page (e.g., a 4KB block), the address is translated by the Memory Management Unit (MMU), a hardware component, by means of a data structure called the Page Table, unique to each process and residing in kernel memory. | ||
+ | This allows the kernel to hide parts of memory (e.g.: that of other processes) as a means of isolation, or to over-commit resources and actually allocate them only when needed (e.g.: ''malloc()''-ed memory is assigned to the process only after it is first accessed). | ||
- | {{ :ass:laboratoare:01:theory:paging.png?750 |}} | + | {{ :ass:laboratoare:01:theory:paging.png?700 |}} |
<html><center> | <html><center> | ||
<b>Figure 1:</b> Translation of a virtual address to its physical equivalent via the Memory Management Unit. Each process has its own unique view of how objects were loaded in memory. At the same time, the kernel appears the same to all processes. | <b>Figure 1:</b> Translation of a virtual address to its physical equivalent via the Memory Management Unit. Each process has its own unique view of how objects were loaded in memory. At the same time, the kernel appears the same to all processes. | ||
</center></html> | </center></html> | ||
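The translation step can be sketched in a few lines. This is a deliberately simplified model, not how any real MMU is implemented: it assumes 4KB pages and represents each Page Table as a flat Python dictionary from virtual page number to physical frame number, whereas real hardware walks a multi-level table. The names ''translate'', ''process_a'' and ''process_b'' are made up for this example.

```python
PAGE_SIZE = 4096                           # 4KB pages, as in the text
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12 bits of in-page offset


def translate(page_table: dict[int, int], vaddr: int) -> int:
    """Translate a virtual address using a (simplified, flat) page table."""
    vpn = vaddr >> OFFSET_BITS             # virtual page number
    offset = vaddr & (PAGE_SIZE - 1)       # offset is copied unchanged
    if vpn not in page_table:
        # On real hardware this raises a page fault handled by the kernel.
        raise LookupError(f"page fault at {vaddr:#x}")
    return (page_table[vpn] << OFFSET_BITS) | offset


# Two processes use the same virtual address but see different physical memory.
process_a = {0x400: 0x1A2}
process_b = {0x400: 0x3F0}

if __name__ == "__main__":
    print(hex(translate(process_a, 0x400123)))
    print(hex(translate(process_b, 0x400123)))
```

The unmapped case is also how lazy allocation can be pictured: the kernel adds the missing entry only when the first access faults.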
- | Because changing the active Page Table is an expensive operation (mostly due to CPU cache invalidation), the Virtual Address Space of each process also contains the kernel memory mapped in the higher half. Yes, the kernel also uses virtual addressing for its own memory. When the kernel needs to intervene on behalf of the unprivileged process (e.g.: when a System Call is performed), the CPU state transitions from ring3 to ring0, but the Page Table does not need to be switched out. Although this technique increases the overall system performance, it also raises a question: how do we stop an unprivileged process from accessing kernel memory, since it's already mapped in its virtual address space? | + | Because changing the active Page Table is an expensive operation (mostly due to CPU cache invalidation), the Virtual Address Space of each process also contains the kernel memory mapped in the higher half. Yes, the kernel also uses virtual addressing for its own memory. When the kernel needs to intervene on behalf of the unprivileged process (e.g.: when a System Call is performed), the CPU state transitions from //Ring 3// to //Ring 0//, but the Page Table does not need to be switched out. Although this technique increases the overall system performance, it also raises a question: how do we stop an unprivileged process from accessing kernel memory, since it's already mapped in its virtual address space? |
- | Aside from information relevant to the address translation itself, the Page Table also contains access restrictions. Each memory transaction that leads to an address translation also presents its intent, e.g.: whether it wants to write data to memory, or fetch an instruction to execute. This allows the MMU to block such access depending on the Read-Write-Execute permissions associated with each page. However, this is only one example of restriction that can be enforced. The Page Table can also restrict access based on privilege levels. Unfortunately, the architecture defines only two such levels: privileged (ring0-2) and unprivileged (ring3). | + | The answer is that, aside from information relevant to the address translation itself, the Page Table also contains access restrictions. Each memory transaction that leads to an address translation also presents its intent, e.g.: whether it wants to write data to memory, or fetch an instruction to execute. This allows the MMU to block such access depending on the Read-Write-Execute permissions associated with each page. However, this is only one example of a restriction that can be enforced. The Page Table can also restrict access based on privilege levels. Unfortunately, the architecture defines only two such levels: privileged (//Rings 0-2//) and unprivileged (//Ring 3//). |
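The two kinds of checks can be modelled as follows. This is a simplified sketch, not the actual x86 page entry layout: the ''PageEntry'' fields and the ''access_allowed'' function are invented for illustration, and real CPUs have further rules that are ignored here. Note how Rings 0, 1 and 2 are indistinguishable to the check; only Ring 3 is treated differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageEntry:
    """Hypothetical, simplified page table entry."""
    frame: int
    writable: bool
    executable: bool
    user: bool  # False: only privileged code (Rings 0-2) may access the page


def access_allowed(entry: PageEntry, ring: int, intent: str) -> bool:
    """Decide whether an access with the given intent is permitted."""
    if ring not in (0, 1, 2, 3):
        raise ValueError(f"invalid ring: {ring}")
    if intent not in ("read", "write", "execute"):
        raise ValueError(f"invalid intent: {intent}")
    # Privilege check: only two levels exist, privileged and unprivileged.
    if ring == 3 and not entry.user:
        return False
    # Permission check based on what the access intends to do.
    if intent == "write":
        return entry.writable
    if intent == "execute":
        return entry.executable
    return True  # a mapped page is readable in this model


if __name__ == "__main__":
    kernel_data = PageEntry(frame=0x1A2, writable=True, executable=False, user=False)
    print(access_allowed(kernel_data, ring=3, intent="read"))
    print(access_allowed(kernel_data, ring=0, intent="read"))
```

This is why kernel memory can stay mapped in every process: the mapping exists, but an unprivileged access to it is refused.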
- | By now it should start becoming clear why ring1 and ring2 were abandoned. Since the same isolation guarantees that User Space and Kernel Space enjoy could not be extended to each protection domain, sacrificing performance only to restrict access to a few privileged instructions was simply not worth it. Nonetheless, ring1 and ring2 are //still// implemented on x86 CPUs to this day. The reason? Based the announcement of the new [[https://www.intel.com/content/www/us/en/developer/articles/technical/envisioning-future-simplified-architecture.html|x86s]] architecture that's supposed to eliminate 16-bit and 32-bit modes of 64-bit processors, it's safe to rule out backward compatibility. The real reason is probably that it's just cheaper this way. | + | <note> |
+ | **Fun fact**, if you're wondering why //Rings// 1 and 2 were abandoned: | ||
+ | originally, when the x86 CPU was designed, it didn't have the memory pagination feature, but a precursor called segmentation (which, thankfully, we'll not cover here). | ||
+ | A segment stored the maximum privilege level it could be accessed from using two bits (thus values 0--3), so the CPU could fully check permissions for any of its Rings. | ||
+ | But, because sacrificing performance only to restrict access to a few privileged instructions was simply not worth it, these mechanics were not continued for the page table, thus the other two rings lost their advantages and were forsaken. | ||
+ | |||
+ | Nonetheless, the two inner protection rings are //still// implemented on x86 CPUs to this day. The question is: why? Based on the announcement of the new [[https://www.intel.com/content/www/us/en/developer/articles/technical/envisioning-future-simplified-architecture.html|x86s]] architecture that's supposed to eliminate 16-bit and 32-bit modes of 64-bit processors, it's safe to rule out backward compatibility. The real reason is probably that it's just cheaper this way (changing the logic design of a processor is risky and requires extensive testing and very costly prototyping). | ||
+ | </note> | ||
<spoiler Want to know more curiosities about the x86 privilege levels? Check this out!> | <spoiler Want to know more curiosities about the x86 privilege levels? Check this out!> | ||
Line 33: | Line 49: | ||
- **Ring -4:** A [[https://i.blackhat.com/us-18/Thu-August-9/us-18-Domas-God-Mode-Unlocked-Hardware-Backdoors-In-x86-CPUs-wp.pdf|deeply embedded core]] that was discovered in some VIA C3 x86 CPUs and was presented at BlackHat 2018. This is essentially a hidden co-processor that shares an execution pipeline and some of its registers with the main processor. A transition to this mode can be performed by a knowledgeable attacker from //any// privilege level, including ring3. While in ring-4, the executing (normally unprivileged) code presumably has access to all system resources, similarly to ring-3. | - **Ring -4:** A [[https://i.blackhat.com/us-18/Thu-August-9/us-18-Domas-God-Mode-Unlocked-Hardware-Backdoors-In-x86-CPUs-wp.pdf|deeply embedded core]] that was discovered in some VIA C3 x86 CPUs and was presented at BlackHat 2018. This is essentially a hidden co-processor that shares an execution pipeline and some of its registers with the main processor. A transition to this mode can be performed by a knowledgeable attacker from //any// privilege level, including ring3. While in ring-4, the executing (normally unprivileged) code presumably has access to all system resources, similarly to ring-3. | ||
</spoiler> | </spoiler> | ||
+ | \\ | ||
+ | |||
+ | Finally, please note that, although we described the virtual memory mechanisms of x86, the concepts are really the same for all other architectures (of course, the configuration registers and page entry structure will differ, but they all share a common feature set)! | ||
=== ARM exception levels === | === ARM exception levels === | ||
- | In ARM's nomenclature, the protection modes are called Exception Levels. Although they are analogous to the (important) protection rings in x86, they benefit from one significant improvement: the separation of Secure and Non-Secure Worlds. Depending on the system configuration, access to certain resources (both memory and physical devices) can be restricted to Secure World code. | + | In ARM's nomenclature, the CPU protection modes are called Exception Levels. Although they are analogous to x86's rings, they feature two significant improvements: first, the standardization of the most important modes for userspace, kernel space and hypervisor (for running multiple OSes in Virtual Machines); second, a secure separation between Secure and Non-Secure Worlds, but this will be discussed in [[:ass:cursuri:03|Lecture 03]]. |
- | {{ :ass:laboratoare:01:theory:arm_exception_levels.png?750 |}} | + | {{ :ass:laboratoare:01:theory:arm_exception_levels.png?700 |}} |
<html><center> | <html><center> | ||
- | <b>Figure 1:</b> ARM Exception Levels. | + | <b>Figure 2:</b> ARM Exception Levels. |
</center></html> | </center></html> | ||
- | The Non-Secure world consists of three exception levels: | + | Usually, there are three exception levels: |
- | * **NS-EL0:** User Space (ring3) | + | * **EL0:** User Space (equivalent to //Ring 3// from x86); |
- | * **NS-EL1:** Kernel Space (ring0) | + | * **EL1:** Kernel Space (%%~%%//Ring 0//); |
- | * **NS-EL2:** Hypervisor (ring-1) | + | * **EL2:** Hypervisor (%%~%%//Ring -1//); notably absent from the Secure World; |
+ | |||
+ | But, with the introduction of the ARM TrustZone security extensions, [almost] all of these modes were vertically partitioned into two security domains. | ||
+ | To make it possible to switch between them, a new Exception Level -- **EL3** (the Secure Monitor) -- was added. | ||
+ | |||
+ | <spoiler Bonus: if you can't wait until Lecture 03 to find out about ARM's Trusted Execution features, expand!> | ||
+ | On the Secure World side, we've got: | ||
- | Nothing interesting here; it's the same as x86. On the Secure World side however: | ||
* **S-EL0:** In this mode, Trusted Applications (TA) are being executed. TAs are system-critical functions that can be invoked from anywhere in the Non-Secure World (e.g.: encrypting application data when it needs to be saved to persistent storage; can be done explicitly by the application or implicitly by the kernel driver). We'll take a look at how TAs are written in the second lab. | * **S-EL0:** In this mode, Trusted Applications (TA) are being executed. TAs are system-critical functions that can be invoked from anywhere in the Non-Secure World (e.g.: encrypting application data when it needs to be saved to persistent storage; can be done explicitly by the application or implicitly by the kernel driver). We'll take a look at how TAs are written in the second lab. | ||
* **S-EL1:** This is the Trusted OS. Similarly to a regular OS, it manages access to (some) devices such as the Trusted Platform Module (TPM). Additionally, it must be able to interpret TA invocations from the Non-Secure World while providing the TAs functionalities similar to what Linux offers to its processes via system calls. For example, if a TA wants to open a file in a secure disk partition, by running in user space it will not have direct access to the File System Layer or the underlying block device. Secure or Non-Secure, it still runs in an unprivileged processor state, without access to the hardware. | * **S-EL1:** This is the Trusted OS. Similarly to a regular OS, it manages access to (some) devices such as the Trusted Platform Module (TPM). Additionally, it must be able to interpret TA invocations from the Non-Secure World while providing the TAs functionalities similar to what Linux offers to its processes via system calls. For example, if a TA wants to open a file in a secure disk partition, by running in user space it will not have direct access to the File System Layer or the underlying block device. Secure or Non-Secure, it still runs in an unprivileged processor state, without access to the hardware. | ||
Line 54: | Line 78: | ||
Although chances are you haven't heard of it, Intel had a similar solution called the [[https://eprint.iacr.org/2016/086.pdf|Software Guard Extensions]] (SGX). This extension was meant to protect small amounts (~72MB) of sensitive (user space) application data and code from a potentially malicious OS. This was realized by restricting access to the protected memory ranges (Enclaves) to code that already resided in the Enclave. Additionally, calls to Enclave functions could be made only via a strictly enforced API defined by the user at compile time; so no arbitrary jumps after a return to libc. There are numerous reasons why this technology failed. The main one is that it did not work as promised. Researchers have found dozens of ways to break the isolation guarantees that SGX was supposed to offer, most of them relying on side-channel attacks (i.e.: deducing user secrets by observing how the target process influences the system). This, coupled with the absence of the isolation for privileged code that ARM offers (S-EL1) and the fact that Intel's remote attestation of SGX-capable CPUs and secure applications could not be offloaded to third parties, more or less guaranteed its fade from relevance. | Although chances are you haven't heard of it, Intel had a similar solution called the [[https://eprint.iacr.org/2016/086.pdf|Software Guard Extensions]] (SGX). This extension was meant to protect small amounts (~72MB) of sensitive (user space) application data and code from a potentially malicious OS. This was realized by restricting access to the protected memory ranges (Enclaves) to code that already resided in the Enclave. Additionally, calls to Enclave functions could be made only via a strictly enforced API defined by the user at compile time; so no arbitrary jumps after a return to libc. There are numerous reasons why this technology failed. The main one is that it did not work as promised. 
Researchers have found dozens of ways to break the isolation guarantees that SGX was supposed to offer, most of them relying on side-channel attacks (i.e.: deducing user secrets by observing how the target process influences the system). This, coupled with the absence of the isolation for privileged code that ARM offers (S-EL1) and the fact that Intel's remote attestation of SGX-capable CPUs and secure applications could not be offloaded to third parties, more or less guaranteed its fade from relevance. | ||
+ | |||
+ | </spoiler> | ||
+ | |||
+ | \\ | ||
+ | \\ | ||
+ | |||