Previous revision: ep:labs:03:contents:tasks:ex2 [2020/08/03 16:31] cristian.marin0805
Current revision: ep:labs:03:contents:tasks:ex2 [2025/03/17 20:59] radu.mantu [02. [30p] Mpstat]

==== 02. [30p] Mpstat ====

=== [10p] Task A - Python recursion depth ===

Try to run the script while passing 1000 as a command line argument. Why does it crash?
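
The lab's script itself is not reproduced in this section; as a rough stand-in, assume it does something like the minimal sketch below, where every level of recursion adds one frame to the Python call stack. CPython caps the interpreter's recursion depth (the default limit is 1000 frames), so an argument of 1000 pushes past that bound and the process dies with a ''RecursionError''.

<code python>
import sys

def f(n):
    # every call adds one frame to the Python call stack
    if n == 0:
        return 0
    return 1 + f(n - 1)

# with the default recursion limit (1000), an argument of 1000 raises RecursionError
print(f(int(sys.argv[1])))
</code>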

Luckily, python allows you to both retrieve the current recursion limit //and// set a new value for it. Increase the recursion limit so that the process will never crash, regardless of input (assume that it still has a reasonable upper bound).

<solution -hidden>
<code python>
import sys

N = int(sys.argv[1])

# retrieve the current limit (1000 by default), then set a new one based on the input
sys.getrecursionlimit()
sys.setrecursionlimit(N)
</code>
</solution>

=== [10p] Task B - CPU affinity ===

Run the script again, this time passing 10000. Use **mpstat** to monitor the load on each //individual// CPU at 1s intervals. The one with close to 100% load will be the one running our script. Note that the process might be passed around from one core to another.
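
A minimal invocation for this, assuming the **sysstat** package is installed: ''-P ALL'' reports every core individually and the trailing ''1'' is the sampling interval in seconds.

<code bash>
$ mpstat -P ALL 1
</code>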

Stop the process. Use **stress** to create N-1 CPU workers, where N is the number of cores on your system. Use **taskset** to set the CPU affinity of the N-1 workers to CPUs 1-(N-1) and then run the script again. You should notice that the process is scheduled on cpu0.

**Note**: to get the best performance when running a process, make sure that it stays on the same core for as long as possible. Don't let the scheduler decide this for you, if you can help it. Allowing it to bounce your process between cores can drastically impact the efficient use of the cache and the TLB. This holds especially true when you are working with servers rather than your personal PCs. While the problem may not manifest on a system with only 4 cores, you can't guarantee that it also won't manifest on one with 40 cores. When running several experiments in parallel, aim for something like this:

{{:ep:labs:01:contents:tasks:affinity_good.png?720|}}
<html><center>
<b>Figure 1:</b> <b>htop</b> output. Processes are bound to specific cores, which improves performance by avoiding needless invalidation of the L1 and L2 caches. This works out well since we have fewer active processes than available cores; otherwise, pinning everything to single cores may backfire, as rescheduling these processes could be delayed until other processes are also allocated a time slice. We notice that CPU usage on these cores is maxed out (green: user space, red: kernel space). The ratio tells us that a considerable amount of time is spent in kernel space, leading us to believe that the processes are I/O bound.
</center></html>
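
If you want to check (or change) where an already-running process is allowed to execute, **taskset** can also operate on a PID. A small sketch; the PID below is a placeholder for whatever ''pidof'' or **htop** reports for your process:

<code bash>
# show the current affinity list of the process
$ taskset -cp 4242

# pin it to cpu0 only
$ taskset -cp 0 4242
</code>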

<solution -hidden>

Start N-1 CPU workers on cpu[1] - cpu[N-1]. Leave cpu[0] unused for when we run the script.

<code bash>
# 0xfe = 0b11111110: the affinity mask allows CPUs 1-7 and excludes cpu0
# (widen the mask if your machine has more than 8 cores)
$ taskset 0xfe stress -c $(( $(nproc) - 1 ))
</code>
</solution>

=== [10p] Task C - USO flashbacks (2) ===

Write a bash command that binds CPU **stress** workers on your odd-numbered cores (i.e.: 1,3,5,...). The list of cores and the number of stress workers must NOT be hardcoded, but constructed based on **nproc** (or whatever else you fancy). \\
In your submission, include both the bash command and a **mpstat** capture to prove that the command is working.
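
For the capture half of the submission, sampling at a fixed interval makes the odd/even core split much easier to see than mpstat's default since-boot averages; the interval and sample count below are arbitrary:

<code bash>
$ mpstat -P ALL 2 5
</code>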

<solution -hidden>
<code bash>
# seq generates the odd core numbers (1, 3, 5, ...), tr joins them with commas,
# and ${cpu_list::-1} drops the trailing comma; $(nproc)/2 workers are then pinned to that list
$ cpu_list="$(seq 1 2 $(nproc) | tr '\n' ',')"; taskset -c ${cpu_list::-1} stress -c $(($(nproc) / 2))
$ mpstat -P ALL
</code>
</solution>