This shows you the differences between two versions of the page.
ep:labs:03:contents:tasks:ex5 [2025/03/05 12:22] radu.mantu |
ep:labs:03:contents:tasks:ex5 [2025/05/06 10:27] (current) radu.mantu |
||
---|---|---|---|
Line 1: | Line 1: | ||
==== 05. [10p] Bonus - Hardware Counters ==== | ==== 05. [10p] Bonus - Hardware Counters ==== | ||
+ | |||
+ | <note> | ||
+ | Solve the rest of the lab within the allotted time to unlock this bonus exercise ;) | ||
+ | </note> | ||
A significant portion of the system statistics that can be generated involves hardware counters. As the name implies, these are special registers that count the number of occurrences of specific events in the CPU. These counters are implemented through **Model Specific Registers** (MSRs), control registers used by developers for debugging, tracing, monitoring, etc. Since these registers may be subject to changes from one iteration of a microarchitecture to the next, we will need to consult chapters 18 and 19 from the [[https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.html|Intel 64 and IA-32 Architectures Software Developer's Manual: Vol. 3B]]. | A significant portion of the system statistics that can be generated involves hardware counters. As the name implies, these are special registers that count the number of occurrences of specific events in the CPU. These counters are implemented through **Model Specific Registers** (MSRs), control registers used by developers for debugging, tracing, monitoring, etc. Since these registers may be subject to changes from one iteration of a microarchitecture to the next, we will need to consult chapters 18 and 19 from the [[https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.html|Intel 64 and IA-32 Architectures Software Developer's Manual: Vol. 3B]]. | ||
Line 12: | Line 16: | ||
- General Purpose Counters | - General Purpose Counters | ||
* can be configured to monitor a specific event from a list of over 200 (see chapters 19.1 and 19.2) | * can be configured to monitor a specific event from a list of over 200 (see chapters 19.1 and 19.2) | ||
- | |||
- | Download {{:ep:labs:01:contents:tasks:hw_counter.zip|}}. | ||
Here is an overview of the following five tasks: | Here is an overview of the following five tasks: | ||
Line 22: | Line 24: | ||
* **Task E**: use RDPMC to measure the cache misses for a familiar program. | * **Task E**: use RDPMC to measure the cache misses for a familiar program. | ||
- | <html><h4>Task A - Hardware info</h4></html> | + | === Task A - Hardware info === |
First of all, we need to know what we are working with, namely the microarchitecture //version ID// and the //number of counters// per core. To this end, we will use [[https://linux.die.net/man/1/cpuid|cpuid]] (basically a wrapper over the [[https://www.felixcloutier.com/x86/cpuid|CPUID]] instruction). All the information that we need is contained in the 0AH leaf (you might want to get the raw output of **cpuid**): | First of all, we need to know what we are working with, namely the microarchitecture //version ID// and the //number of counters// per core. To this end, we will use [[https://linux.die.net/man/1/cpuid|cpuid]] (basically a wrapper over the [[https://www.felixcloutier.com/x86/cpuid|CPUID]] instruction). All the information that we need is contained in the 0AH leaf (you might want to get the raw output of **cpuid**): | ||
Line 33: | Line 35: | ||
Point out to your assistant which is which in the **cpuid** output. | Point out to your assistant which is which in the **cpuid** output. | ||
- | <html><h4>Task B - Unlock RDPMC in ring3</h4></html> | + | === Task B - Unlock RDPMC in ring3 === |
This is pretty straightforward. All you need to do is set the **Performance-Monitor Counter Enable** bit in [[https://en.wikipedia.org/wiki/Control_register#CR4|CR4]]. Naturally, this can't be done from ring3. As such, we provide a kernel module that does it for you (see //hack_cr4.c//). When the module is loaded, it will set the aforementioned bit. Similarly, when the module is unloaded, it will revert the change. Try compiling the module, loading and unloading it, and finally check the kernel message log to verify that it works. | This is pretty straightforward. All you need to do is set the **Performance-Monitor Counter Enable** bit in [[https://en.wikipedia.org/wiki/Control_register#CR4|CR4]]. Naturally, this can't be done from ring3. As such, we provide a kernel module that does it for you (see //hack_cr4.c//). When the module is loaded, it will set the aforementioned bit. Similarly, when the module is unloaded, it will revert the change. Try compiling the module, loading and unloading it, and finally check the kernel message log to verify that it works. | ||
Line 45: | Line 47: | ||
Note: the module must remain loaded in the kernel in order to keep the bit set. If during Task E you get a segfault, the reason is that you (probably) unloaded the module and you no longer have permission to run the instruction in ring3. This does NOT invalidate your work in Tasks C and D; simply load the module once more. | Note: the module must remain loaded in the kernel in order to keep the bit set. If during Task E you get a segfault, the reason is that you (probably) unloaded the module and you no longer have permission to run the instruction in ring3. This does NOT invalidate your work in Tasks C and D; simply load the module once more. | ||
- | <html><h4>Task C - Configure IA32_PERF_GLOBAL_CTRL</h4></html> | + | === Task C - Configure IA32_PERF_GLOBAL_CTRL === |
{{ :ep:labs:01:contents:tasks:ia32_perf_global_ctrl.png?600 }} | {{ :ep:labs:01:contents:tasks:ia32_perf_global_ctrl.png?600 }} | ||
Line 72: | Line 74: | ||
</code> | </code> | ||
- | <html><h4>Task D - Configure IA32_PERFEVENTSELx</h4></html> | + | === Task D - Configure IA32_PERFEVENTSELx === |
{{ :ep:labs:01:contents:tasks:ia32_perfeventselx.png?600 }} | {{ :ep:labs:01:contents:tasks:ia32_perfeventselx.png?600 }} | ||
Line 103: | Line 105: | ||
</solution> | </solution> | ||
- | <html><h4>Task E - Ring3 cache performance evaluation</h4></html> | + | === Task E - Ring3 cache performance evaluation === |
As of now, we should be able to modify the **CR4** register with the kernel module, enable all counters in **IA32_PERF_GLOBAL_CTRL** across all cores, and start an **L2 cache miss** counter, again across all cores. What remains is putting everything into practice. | As of now, we should be able to modify the **CR4** register with the kernel module, enable all counters in **IA32_PERF_GLOBAL_CTRL** across all cores, and start an **L2 cache miss** counter, again across all cores. What remains is putting everything into practice. | ||
Line 136: | Line 138: | ||
<solution -hidden> | <solution -hidden> | ||
- | **Task E:** | ||
- | <code C> | ||
- | #define rdpmc(ecx, eax, edx) \ | ||
- | asm volatile ( \ | ||
- | "rdpmc" \ | ||
- | : "=a"(eax), \ | ||
- | "=d"(edx) \ | ||
- | : "c"(ecx)) | ||
- | </code> | ||
- | |||
<code C> | <code C> | ||
/* hardware counter init */ | /* hardware counter init */ | ||
Line 163: | Line 155: | ||
</solution> | </solution> | ||
+ | |||