When done, export the document as a //pdf// and upload it to the appropriate assignment on [[https://curs.upb.ro/2022/course/view.php?id=5113#section-2|moodle]]. The deadline is 23:55 on Friday.
===== Introduction =====
<spoiler>
Performance Monitoring is the process of checking a set of metrics in order to ascertain the health of the system. Normally, the information gleaned from these metrics is in turn used to fine-tune the system in order to maximize its performance. As you may imagine, both acquiring and interpreting this data requires at least //some// knowledge of the underlying operating system.
When dealing strictly with the CPU, these are a few things to look out for:
  
**Context Switches**
  
A context switch is a transition from one runtime environment to another. One example would be performing a privileged call to kernel space via a system call, then returning from it. When this happens, a copy of your register state must be stored, for obvious reasons. This operation takes some time.
The takeaway is that some context switches are more expensive than others. Not being able to schedule a process to a single core 100% of the time comes with a huge cost (flushing the TLB). This being said, context switches from user space to kernel space are still expensive operations. As Terry Davis once demonstrated in his Temple OS, running everything at the same privilege level can reduce the cost of context switches by orders of magnitude.
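
If you want to see this in numbers, the kernel keeps per-process counters of how many times a task has been switched out. Below is a minimal Python sketch that reads them from ''/proc/<pid>/status'' (these are standard Linux fields; the PID is whatever process you want to inspect):

<code python>
import sys

def ctxt_switches(pid="self"):
    """Return the voluntary/nonvoluntary context switch counters of a process."""
    counters = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith(("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")):
                name, value = line.split(":")
                counters[name] = int(value)
    return counters

if __name__ == "__main__":
    # defaults to the script's own process if no PID is given
    print(ctxt_switches(sys.argv[1] if len(sys.argv) > 1 else "self"))
</code>

Voluntary switches happen when the process blocks on its own (e.g. waiting for I/O); nonvoluntary ones mean the scheduler preempted it.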
  
**CPU Utilization**
  
Each process is given a time slice for it to utilize however it sees fit. The way that time is utilized can prove to be a meaningful metric. There are two ways that we can look at this data: system level or process level.
Although you may find many tools that offer similar information, remember that these files are the origin. Another thing to keep in mind is that this data is representative of the entire session, i.e., from system boot or from process launch. If you want to interpret it in a meaningful manner, you need to get two data points and know the time interval between their acquisition.
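
As a quick illustration of the two-sample idea, here is a minimal Python sketch that computes system-wide CPU utilization from the first line of ''/proc/stat'' (the field order assumed below -- user, nice, system, idle, iowait, ... -- is the standard Linux layout):

<code python>
import time

def read_cpu_times():
    """Return the aggregate CPU time counters (in jiffies) from /proc/stat."""
    with open("/proc/stat") as f:
        return [int(x) for x in f.readline().split()[1:]]

t1 = read_cpu_times()
time.sleep(1)                                  # the measurement interval
t2 = read_cpu_times()

deltas = [b - a for a, b in zip(t1, t2)]
busy   = sum(deltas) - deltas[3] - deltas[4]   # everything except idle + iowait
print("CPU utilization: %.1f%%" % (100.0 * busy / sum(deltas)))
</code>

Sampling the counters only once and dividing by uptime would give you the average since boot, which is rarely what you want.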
  
**Scheduling**
  
When a CPU frees up, the kernel must decide which process gets to run next. To this end, it uses the [[https://www.kernel.org/doc/html/v5.7/scheduler/sched-design-CFS.html|Completely Fair Scheduler (CFS)]]. Normally, we don't question the validity of the scheduler's design. That's a few levels above our paygrade. What we can do is adjust the value of ''/proc/sys/kernel/sched_min_granularity_ns''. This virtual file contains the minimum number of nanoseconds that a task is allocated when scheduled on the CPU. A lower value guarantees that each process will be scheduled sooner rather than later, which is a good trait of a real-time system (e.g.: Android -- you don't want unresponsive menus). A greater value, however, is better when you are doing batch processing (e.g.: rendering a video). We noted previously that switching active processes on a CPU core is an expensive operation. Thus, allowing each process to run for longer will reduce the CPU dead time in the long run.
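
For reference, here is a small sketch of how you could inspect (and, as root, change) this knob from Python. Note that on newer kernels the file has moved under ''/sys/kernel/debug/sched/'', so treat the path below as an assumption tied to the kernel version used in this lab:

<code python>
PATH = "/proc/sys/kernel/sched_min_granularity_ns"

with open(PATH) as f:
    current = int(f.read())
print(f"minimum granularity: {current} ns ({current / 1e6:.2f} ms)")

# Raising it favours batch workloads (requires root; 10 ms is only an example value):
# with open(PATH, "w") as f:
#     f.write(str(10_000_000))
</code>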
  
At the moment, CFS likes to spread out the tasks to all cores. Of course, each process has the right to choose the cores it's comfortable running on (more on this in the exercises section). Another reason why this may be preferable that we haven't mentioned before is not invalidating the CPU cache. L1 and L2 caches are specific to each physical core, while L3 is accessible to all cores. However, L1 and L2 have an access time of 1-10 ns, while L3 can go as high as 30 ns. If you have some time, read a bit about [[https://www.phoronix.com/news/Nest-Linux-Scheduling-Warm-Core|Nest]], a newly proposed scheduler that aims to keep scheduled tasks on "warm cores" until it becomes necessary to power up idle cores as well. Can you come up with situations when Nest may be better or worse than CFS?
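
To experiment with core affinity yourself, Python exposes the relevant Linux syscalls directly (a sketch; pinning to core 0 is just an example, and the same effect can be obtained from the shell with ''taskset''):

<code python>
import os

# Which cores is this process currently allowed to run on?
print("allowed cores:", os.sched_getaffinity(0))    # 0 means "the calling process"

# Restrict ourselves to core 0 so the working set stays in one core's L1/L2 cache.
os.sched_setaffinity(0, {0})
print("now restricted to:", os.sched_getaffinity(0))
</code>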
</spoiler>
===== Tasks =====