This shows you the differences between two versions of the page.
ep:labs:03:contents:tasks:ex2 [2021/10/23 00:47] radu.mantu [02. [30p] Io{stat,top}] |
ep:labs:03:contents:tasks:ex2 [2025/03/17 20:59] (current) radu.mantu [02. [30p] Mpstat] |
Line 1: | Line 1: | ||
- | ==== 02. [30p] iostat & iotop ==== | + | ==== 02. [30p] Mpstat ==== |
- | === [15p] Task A - Monitoring the behaviour with Iostat === | + | === [10p] Task A - Python recursion depth === |
- | <note tip> | + | Try to run the script while passing 1000 as a command line argument. Why does it crash? |
- | Parameters for iostat: | + | |
- | * -x for extended statistics | + | |
- | * -d to display device statistics only | + | |
- | * -m for displaying r/w in MB/s | + | |
- | <code bash> | + | |
- | $ iostat -xdm | + | |
- | </code> | + | |
- | Use iostat with -p for specific device statistics: | + | |
- | <code bash> | + | |
- | $ iostat -xdm -p sda | + | |
- | </code> | + | |
- | </note> | + | |
- | + | ||
- | * Run //iostat -x 1 5//. | + | |
- | * Considering the last two outputs provided by the previous command, calculate **the efficiency of IOPS** for each of them. Does the amount of data written per I/O **increase** or **decrease**? | + | |
- | + | ||
- | Add to your archive a screenshot or pictures of the operations and the result you obtained, also showing the iostat output from which you took the values. | + | |
- | + | ||
- | <note> | + | |
- | How to do it: | + | |
- | + | ||
- | * Divide the kilobytes read (//rkB/s//) and written (//wkB/s//) per second by the reads per second (//r/s//) and the writes per second (//w/s//). | + | |
- | * If you happen to have quite a few [[https://en.wikipedia.org/wiki/Loop_device|loop devices]] in your **iostat** output, find out what they are exactly: | + | |
- | + | ||
- | <code bash> | + | |
- | $ df -kh /dev/loop* | + | |
- | </code> | + | |
- | </note> | + | |
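If you want to check your arithmetic, the one-liner below is a minimal sketch of the same division done with **awk**. The column numbers are an assumption based on a recent **sysstat** layout of ''iostat -xd'' (//$2// = r/s, //$3// = rkB/s, //$8// = w/s, //$9// = wkB/s); older versions order the columns differently, so match them against your own header first.

<code bash>
# average KB per read/write I/O for sda, taken from a single iostat sample
# NOTE: the column positions are an assumption -- verify them against your iostat header
$ iostat -xd sda | awk '/^sda/ { if ($2) printf "read : %.1f KB per I/O\n", $3/$2; if ($8) printf "write: %.1f KB per I/O\n", $9/$8 }'
</code>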
+ | Luckily, Python allows you to both retrieve the current recursion limit //and// set a new value for it. Increase the recursion limit so that the process will never crash, regardless of input (assume that it still has a reasonable upper bound). | ||
<solution -hidden> | <solution -hidden> | ||
- | The way to calculate the efficiency of IOPS is to divide the kilobytes read //(rkB/s)// and written //(wkB/s)// per second by the reads per second //(r/s)// and writes per second //(w/s)//. | + | <code python> |
+ | import sys | ||
- | Example: the amount of data written per I/O for ///dev/sda// increases during each iteration: | + | N = int(sys.argv[1]) |
- | {{ :ep:labs:ep2017_l3_ex01.png?700 |}} | + | print(sys.getrecursionlimit())  # default limit is usually 1000 |
- | + | sys.setrecursionlimit(N) | |
- | <code> | + | |
- | 53040/105 = 505KB per I/O | + | |
- | 71152/102 = 697KB per I/O | + | |
</code> | </code> | ||
- | |||
- | If everything is zero in iostat, perform some I/O operations... |
</solution> | </solution> | ||
+ | === [10p] Task B - CPU affinity === | ||
+ | Run the script again, this time passing 10000. Use **mpstat** to monitor the load on each //individual// CPU at 1s intervals. The one with close to 100% load will be the one running our script. Note that the process might be passed around from one core to another. | ||
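The exact **mpstat** invocation is not imposed; a minimal sketch that matches the description above (per-CPU statistics, refreshed every second) would be:

<code bash>
# one report per second for every CPU; stop with Ctrl+C
$ mpstat -P ALL 1

# or ask for a fixed number of reports, e.g. ten of them
$ mpstat -P ALL 1 10
</code>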
- | === [15p] Task B - Monitoring the behaviour with Iotop === | + | Stop the process. Use **stress** to create N-1 CPU workers, where N is the number of cores on your system. Use **taskset** to set the CPU affinity of the N-1 workers to CPUs 1-(N-1) and then run the script again. You should notice that the process is scheduled on cpu0. |
- | <note tip> | + | |
- | **Iotop** is a utility, similar to the top command, that interfaces with the kernel to provide per-thread/process I/O usage statistics. | + | |
- | <code> | + | **Note**: to get the best performance when running a process, make sure that it stays on the same core for as long as possible. Don't let the scheduler decide this for you, if you can help it. Allowing it to bounce your process between cores can drastically impact the efficient use of the cache and the TLB. This holds especially true when you are working with servers rather than your personal PC. While the problem may not manifest on a system with only 4 cores, you can't guarantee that it won't manifest on one with 40 cores. When running several experiments in parallel, aim for something like this: |
- | # install iotop on Debian/Ubuntu Linux | + | |
- | $ sudo apt-get install iotop | + | |
- | # how to use the iotop command | + | {{:ep:labs:01:contents:tasks:affinity_good.png?720|}} |
- | $ sudo iotop        # or simply: iotop (if you have sufficient privileges) | + | <html><center> |
- | </code> | + | <b>Figure 1:</b> <b>htop</b> output. Processes are bound to specific cores, which increases performance by avoiding needless L1 and L2 cache invalidation. This works out well since we have fewer active processes than available cores. Otherwise, setting the affinity to a single core may backfire; the rescheduling of these processes could be delayed until other processes are also allocated a time slice. We notice that CPU usage on these cores is maxed (green: user space, red: kernel space). The ratio tells us that a considerable amount of time is spent in kernel space, leading us to believe that the processes are I/O bound. |
+ | </center></html> | ||
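If you want to follow the advice in the note above, pinning is a one-liner with **taskset**; //your_program// and the PID below are placeholders for illustration, not something provided by this lab:

<code bash>
# start a (placeholder) program pinned to core 0
$ taskset -c 0 ./your_program

# inspect the affinity of an already running process (1234 is an example PID) ...
$ taskset -cp 1234

# ... and restrict that process to core 0
$ taskset -cp 0 1234
</code>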
- | Supported options by iotop command: | + | <solution -hidden> |
- | | **Options** | **Description** ^^ | + | Start N-1 worker threads on cpu[1] - cpu[N-1]. Leave cpu[0] unused for when we run the script. |
- | | --version | show program’s version number and exit || | + | |
- | | -h, --help | show this help message and exit || | + | |
- | | -o, --only | only show processes or threads actually doing I/O || | + | |
- | | -b, --batch | non-interactive mode || | + | |
- | | -n NUM, --iter=NUM | number of iterations before ending [infinite] || | + | |
- | | -d SEC, --delay=SEC | delay between iterations [1 second] || | + | |
- | | -p PID, --pid=PID | processes/threads to monitor [all] || | + | |
- | | -u USER, --user=USER | users to monitor [all] || | + | |
- | | -P, --processes | only show processes, not all threads || | + | |
- | | -a, --accumulated | show accumulated I/O instead of bandwidth || | + | |
- | | -k, --kilobytes | use kilobytes instead of a human friendly unit || | + | |
- | | -t, --time | add a timestamp on each line (implies --batch) || | + | |
- | | -q, --quiet | suppress some lines of header (implies --batch) || | + | |
- | </note> | + | |
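The options above can of course be combined; the command below is only an illustrative example, not something the tasks require:

<code bash>
# batch mode, timestamped, five 1-second iterations, showing only tasks that actually do I/O
$ sudo iotop -o -b -t -n 5 -d 1
</code>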
+ | <code bash> | ||
+ | $ taskset -c 1-$(( $(nproc) - 1 )) stress -c $(( $(nproc) - 1 )) | ||
+ | </code> | ||
+ | </solution> | ||
- | * Run iotop (install it if you do not already have it) in a separate shell showing only processes or threads actually doing I/O. | + | === [10p] Task C - USO flashbacks (2) === |
- | * Inspect the script code ({{:ep:laboratoare:dummy.sh|dummy.sh}}) to see what it does. | + | |
- | * Monitor the behaviour of the system with iotop while running the script. | + | |
- | * Identify the PID and PPID of the process running the dummy script and kill the process from the command line in another shell (send the SIGINT signal to both the parent and child processes). | + | |
- | * Hint - [[https://superuser.com/questions/150117/how-to-get-parent-pid-of-a-given-process-in-gnu-linux-from-command-line|How to get parent PID of a given process in GNU/Linux from command line?]] | + | |
Provide a screenshot showing iotop with only the active processes, one of them being the running script. Then provide another screenshot taken after you have killed it. | Write a bash command that binds **stress** CPU workers to your odd-numbered cores (i.e.: 1,3,5,...). The list of cores and the number of stress workers must NOT be hardcoded, but constructed based on **nproc** (or whatever else you fancy). \\ |
+ | In your submission, include both the bash command and an **mpstat** capture to prove that the command is working. | ||
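One possible way to produce such a capture (the file name below is just an example) is to redirect a few interval reports to a file while your workers are running:

<code bash>
# five 1-second per-CPU reports, shown on screen and saved for the submission archive
$ mpstat -P ALL 1 5 | tee mpstat_capture.txt
</code>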
<solution -hidden> | <solution -hidden> | ||
- | {{:ep:laboratoare:lab3-ex4.png?600}} | + | <code bash> |
- | + | $ cpu_list="$(seq 1 2 $(nproc) | tr '\n' ',')"; taskset -c ${cpu_list::-1} stress -c $(($(nproc) / 2)) | |
- | Find PPID from PID: ps -o ppid= -p PID | + | $ mpstat -P ALL 1 5 |
- | + | </code> | |
- | Send SIGINT signal: kill -SIGINT PID PPID | + | |
</solution> | </solution> | ||
+ |