Differences

This shows you the differences between two versions of the page.

ep:labs:03:contents:tasks:ex5 [2025/05/06 10:27]
radu.mantu
ep:labs:03:contents:tasks:ex5 [2026/01/14 16:59] (current)
radu.mantu
Line 1: Line 1:
 ==== 05. [10p] Bonus - Hardware Counters ====
- 
-<​note>​ 
-Solve the rest of the lab within the allotted time to unlock this bonus exercise ;) 
-</​note>​ 
  
 A significant portion of the system statistics that can be generated involves hardware counters. As the name implies, these are special registers that count the number of occurrences of specific events in the CPU. These counters are implemented through **Model Specific Registers** (MSR), control registers used by developers for debugging, tracing, monitoring, etc. Since these registers may be subject to changes from one iteration of a microarchitecture to the next, we will need to consult chapters 18 and 19 from [[https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.html|Intel 64 and IA-32 Architectures Software Developer's Manual: Vol. 3B]].
Line 37: Line 33:
 === Task B - Unlock RDPMC in ring3 ===
  
-This is pretty straightforward. All you need to do is set the **Performance-Monitor Counter Enable** bit in [[https://en.wikipedia.org/wiki/Control_register#CR4|CR4]]. Naturally, this can't be done from ring3. As such, we provide a kernel module that does it for you (see //hack_cr4.c//.) When the module is loaded, it will set the aforementioned bit. Similarly, when the module is unloaded, it will revert the change. Try compiling the module, loading and unloading it and finally, check the kernel message log to verify that it works.
 +Due to security considerations, reading the Performance Monitor Counters from userspace is normally not allowed. This is enforced at a hardware level via the **Performance-Monitor Counter Enable** bit in [[https://en.wikipedia.org/wiki/Control_register#CR4|CR4]].
 + 
 +Under normal circumstances, modifying Control Registers from userspace is not possible and you would have to write a kernel module for this. However, the [[https://man.archlinux.org/man/perf_event_open.2#perf_event_related_configuration_files|perf_event_open(2) man page]] documents a //sysfs// interface (i.e., ''/sys/bus/event_source/devices/cpu/rdpmc'') that does this for us.
 + 
 +Use the //sysfs// interface to revert the **RDPMC** access behavior to what it was before Linux 4.0, when any process could execute the instruction.
 + 
 +<​solution -hidden>
 <code bash>
-$ make
-$ sudo insmod hack_cr4.ko
-$ sudo rmmod hack_cr4
-$ dmesg
 +echo 2 | sudo tee /sys/bus/event_source/devices/cpu/rdpmc
 </code>
-
-Note: the module must remain loaded in the kernel in order to keep the bit set. If during Task E you get a segfault, the reason is that you (probably) unloaded the module and you no longer have permission to run the instruction in ring3. This does NOT invalidate your work in Tasks C and D; simply load the module once more.
 +</solution>
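
As a complement to the //sysfs// step above, the current setting can also be checked from C before attempting to run **RDPMC**. The following is only a minimal sketch: the helper names ''rdpmc_mode_desc'' and ''rdpmc_mode_read'' are hypothetical (not part of the lab skeleton), and the meaning attached to each value is taken from the perf_event_open(2) man page.

```c
#include <stdio.h>

/* Path documented in perf_event_open(2); may be absent on some systems. */
#define RDPMC_SYSFS "/sys/bus/event_source/devices/cpu/rdpmc"

/* Hypothetical helper: describe a value read from the rdpmc sysfs file. */
const char *rdpmc_mode_desc(int mode)
{
    switch (mode) {
    case 0:  return "rdpmc disabled for userspace";
    case 1:  return "rdpmc allowed only for processes with active perf events";
    case 2:  return "rdpmc allowed for all processes (pre-4.0 behavior)";
    default: return "unknown mode";
    }
}

/* Hypothetical helper: returns the mode, or -1 if the file cannot be read. */
int rdpmc_mode_read(const char *path)
{
    int mode = -1;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}
```

Calling ''rdpmc_mode_read(RDPMC_SYSFS)'' and printing ''rdpmc_mode_desc()'' of the result should report mode 2 once the exercise step has been applied; a return value of -1 suggests the file is missing (e.g., inside some virtual machines).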
  
 === Task C - Configure IA32_PERF_GLOBAL_CTRL ===
Line 100: Line 98:
  
 For the next (and //final//) task we are going to monitor the number of L2 cache misses. Look for the **L2_RQSTS.MISS** event in table 19-3 or 19-11 (depending on CPU version id) in the Intel manual and set the two least significant bytes (the event select and the unit mask) accordingly. If the operation is successful and the counters have started, you should start seeing non-zero values in the PMC0 register, increasing in subsequent reads.
 +
 +<note tip>
 +An easier alternative to scouring through the Intel manuals would be to use [[https://​perfmon-events.intel.com/​platforms/​tigerlake/​core-events/​core/​|perfmon-events.intel.com]]. Get your CPU //"​model name"//​ from ''/​proc/​cpuinfo''​ and identify your microarchitecture based on the table below. Then search for the desired event in the appropriate section of the site.
 +
 +^ Generation ^  Microarchitecture (Core Codename) ​        ​^ ​ Release Year  ^  Typical CPU Numbers ​  ^
 +|    1st     | Nehalem / Westmere ​                        ​| ​  ​2008–2010 ​   | i3,5,7 3xx–9xx ​        |
 +|    2nd     | Sandy Bridge ​                              ​| ​    ​2011 ​      | i3,5,7 2xxx            |
 +|    3rd     | Ivy Bridge ​                                ​| ​    ​2012 ​      | i3,5,7 3xxx            |
 +|    4th     | Haswell ​                                   |     ​2013 ​      | i3,5,7 4xxx            |
 +|    5th     | Broadwell ​                                 |   ​2014–2015 ​   | i3,5,7 5xxx            |
 +|    6th     | Skylake ​                                   |     ​2015 ​      | i3,5,7 6xxx            |
 +|    7th     | Kaby Lake                                  |   ​2016–2017 ​   | i3,5,7 7xxx            |
 +|    8th     | Coffee Lake / Amber Lake / Whiskey Lake    |   ​2017–2018 ​   | i3,5,7 8xxx            |
 +|    9th     | Coffee Lake Refresh ​                       |   ​2018–2019 ​   | i3,5,7,9 9xxx          |
 +|   10th     | Comet Lake / Ice Lake                      |   2019–2020    | i3,5,7,9 10xxx         |
 +|   11th     | Rocket Lake / Tiger Lake                   |   2020–2021    | i3,5,7,9 11xxx         |
 +|   ​12th ​    | Alder Lake                                 ​| ​  ​2021–2022 ​   | i3,5,7,9 12xxx         |
 +|   ​13th ​    | Raptor Lake                                |   ​2022–2023 ​   | i3,5,7,9 13xxx         |
 +|   ​14th ​    | Raptor Lake Refresh ​                       |   ​2023–2024 ​   | i3,5,7,9 14xxx         |
 +|     ​— ​     | Meteor Lake                                |   ​2023–2024 ​   | Core Ultra 5,7,9 1xx   |
 +|     ​— ​     | Arrow Lake / Lunar Lake                    |   ​2024–2025 ​   | Core Ultra 5,7,9 2xx   |
 +</​note>​
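
To make the "event select and unit mask" step more concrete, the sketch below packs the two bytes together with the usual control bits of an ''IA32_PERFEVTSELx'' register. The bit positions follow the architectural performance monitoring layout in the Intel manual; the function name ''perfevtsel_encode'' is hypothetical, and the sample pair 0x24 / 0x3F mentioned afterwards is only an assumed encoding of **L2_RQSTS.MISS** that must be confirmed for your own microarchitecture.

```c
#include <stdint.h>

/* Control bits of IA32_PERFEVTSELx (architectural perfmon layout). */
#define PERFEVTSEL_USR (1ULL << 16) /* count events while in ring 3 */
#define PERFEVTSEL_OS  (1ULL << 17) /* count events while in ring 0 */
#define PERFEVTSEL_EN  (1ULL << 22) /* enable the counter */

/*
 * Hypothetical helper: byte 0 holds the event select, byte 1 the unit
 * mask, and the remaining control bits are OR-ed in through `flags`.
 */
uint64_t perfevtsel_encode(uint8_t event_select, uint8_t umask, uint64_t flags)
{
    return (uint64_t)event_select | ((uint64_t)umask << 8) | flags;
}
```

For instance, ''perfevtsel_encode(0x24, 0x3F, PERFEVTSEL_USR | PERFEVTSEL_OS | PERFEVTSEL_EN)'' evaluates to ''0x433F24''. Always take the actual event select and unit mask from the manual or from perfmon-events.intel.com for your CPU.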
  
 <solution -hidden>
Line 140: Line 160:
 <code C>
 /* hardware counter init */
-rdpmc(ecx, eax, edx);
 +rdpmc(0, eax, edx);
 counter = ((uint64_t)eax) | (((uint64_t)edx) << 32);
 
Line 150: Line 170:
 
 /* hardware counter delta */
-rdpmc(ecx, eax, edx);
 +rdpmc(0, eax, edx);
 counter = (((uint64_t)eax) | (((uint64_t)edx) << 32)) - counter;
 </code>
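
The ''rdpmc()'' macro used in the snippet above is not shown in this comparison, so the following is only a sketch of how such a read could look with GCC/Clang inline assembly. The names ''pmc_combine'', ''pmc_delta'' and ''pmc_read'' are hypothetical, and the counter width passed to ''pmc_delta'' (often 48 bits) is an assumption that should be checked with CPUID leaf 0AH on your machine.

```c
#include <stdint.h>

/* Join the two 32-bit halves returned by RDPMC in EDX:EAX. */
uint64_t pmc_combine(uint32_t eax, uint32_t edx)
{
    return (uint64_t)eax | ((uint64_t)edx << 32);
}

/* Difference between two reads, tolerating one wraparound of a
 * counter that is `width` bits wide (assumed, e.g. 48). */
uint64_t pmc_delta(uint64_t start, uint64_t end, unsigned width)
{
    uint64_t mask = (width >= 64) ? ~0ULL : ((1ULL << width) - 1);

    return (end - start) & mask;
}

#if defined(__x86_64__) || defined(__i386__)
/* Read general-purpose counter `index` (0 selects PMC0). This faults
 * with SIGSEGV if userspace RDPMC access has not been enabled. */
static inline uint64_t pmc_read(uint32_t index)
{
    uint32_t eax, edx;

    __asm__ volatile("rdpmc" : "=a"(eax), "=d"(edx) : "c"(index));
    return pmc_combine(eax, edx);
}
#endif
```

A measurement would then take ''pmc_read(0)'' before and after the code under test and pass both values to ''pmc_delta()''. Masking to the counter width keeps the result meaningful if the counter wraps once between the two reads.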
ep/labs/03/contents/tasks/ex5.1746516439.txt.gz · Last modified: 2025/05/06 10:27 by radu.mantu
CC Attribution-Share Alike 3.0 Unported