This shows you the differences between two versions of the page.
| | ep:labs:01:contents:tasks:ex2 [2021/10/13 01:12] radu.mantu | | ep:labs:01:contents:tasks:ex2 [2025/02/11 23:17] (current) cezar.craciunoiu created |
|---|---|---|---|
| | Line 1: | | Line 1: |
| - | ==== 02. [30p] Mpstat ==== | + | ==== 06. [10p] Feedback ==== | 
| - | Open {{:ep:labs:fact_rcrs.zip|fact_rcrs.zip}} and look at the code. | + | |
| - | + | ||
| - | === [10p] Task A - Python recursion depth === | + | |
| - | Try running the script with 1000 as a command-line argument. Why does it crash? | + | |
| - | + | ||
| - | Luckily, Python lets you both retrieve the current recursion limit //and// set a new value for it. Increase the recursion limit so that the process never crashes, regardless of input (assume the input still has a reasonable upper bound). | + | |
| - | + | ||
| - | <solution -hidden> | + | |
| - | <code python> | + | |
| - | import sys | + | |
| - | + | ||
| - | N = int(sys.argv[1]) | + | |
| - | + | ||
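| - | # fact(N) recurses about N levels deep; leave headroom for the frames already on the stack | + | |
| - | sys.setrecursionlimit(max(sys.getrecursionlimit(), N + 100)) | + | |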
| - | </code> | + | |
| - | </solution> | + | |
| - | + | ||
| - | === [10p] Task B - CPU affinity === | + | |
| - | Run the script again, this time passing 10000. Use **mpstat** to monitor the load on each //individual// CPU at 1s intervals. The CPU at close to 100% load is the one running our script. Note that the scheduler may migrate the process from one core to another. | + | |
| - | + | ||
| - | Stop the process. Use **stress** to create N-1 CPU workers, where N is the number of cores on your system. Use **taskset** to set the CPU affinity of the N-1 workers to CPUs 1-(N-1) and then run the script again. You should notice that the process is scheduled on cpu0. | + | |
| - | + | ||
| - | **Note**: to get the best performance out of a process, make sure that it stays on the same core for as long as possible. If you can help it, don't let the scheduler decide this for you: letting it bounce your process between cores can drastically reduce the efficiency of the cache and the TLB. This holds especially true on servers, as opposed to personal PCs. While the problem may not manifest on a system with only 4 cores, you can't guarantee that it won't on one with 40 cores. When running several experiments in parallel, aim for something like this: | + | |
| - | + | ||
| - | <spoiler> | + | |
| - | {{:ep:labs:01:contents:tasks:affinity_good.png?800\|}} | + | |
| - | </spoiler> | + | |
| - | + | ||
| - | <solution -hidden> | + | |
| - | + | ||
| - | Start N-1 **stress** workers on cpu[1] through cpu[N-1], leaving cpu[0] free for when we run the script. | + | |
| - | + | ||
| - | <code bash> | + | |
| - | $ taskset -c 1-$(( $(nproc) - 1 )) stress -c $(( $(nproc) - 1 )) | + | |
| - | </code> | + | |
| - | </solution> | + | |
| - | + | ||
| - | === [10p] Task C - USO flashbacks (2) === | + | |
| - | + | ||
| - | Write a bash command that binds CPU **stress** workers to your odd-numbered cores (i.e., 1,3,5,...). The list of cores and the number of stress workers must NOT be hardcoded, but constructed from **nproc** (or whatever else you fancy). \\ | + | |
| - | In your submission, include both the bash command and an **mpstat** capture proving that the command works. | + | |
| - | + | ||
| - | <solution -hidden> | + | |
| - | <code bash> | + | |
| - | $ taskset -c $(seq -s, 1 2 $(( $(nproc) - 1 ))) stress -c $(( $(nproc) / 2 )) | + | |
| - | $ mpstat -P ALL 1 | + | |
| - | </code> | + | |
| - | </solution> | + | |
| | | + | Please take a minute to fill in the **[[https://forms.gle/NpSRnoEh9NLYowFr5 \| feedback form]]** for this lab. |
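
For Task A in the older revision, here is a minimal Python sketch of the fix. It assumes that the script in fact_rcrs.zip recurses roughly once per unit of its argument, like a recursive factorial. `fact` and `ensure_recursion_limit` are hypothetical names that do not come from the lab archive. The 100-frame headroom is an arbitrary assumption meant to cover the frames already on the stack when the recursion starts. The helper only ever raises the limit and never lowers it.

```python
import sys


def fact(n: int) -> int:
    """Hypothetical recursive factorial: each call adds one stack frame."""
    return 1 if n <= 1 else n * fact(n - 1)


def ensure_recursion_limit(depth: int, headroom: int = 100) -> int:
    """Make room for `depth` nested calls plus `headroom` frames already in use.

    The limit is only raised, never lowered. Returns the limit now in effect.
    """
    needed = depth + headroom
    if sys.getrecursionlimit() < needed:
        sys.setrecursionlimit(needed)
    return sys.getrecursionlimit()
```

In the lab script, this would mean calling `ensure_recursion_limit(int(sys.argv[1]))` before starting the recursion. The limit is only an interpreter-level guard, so setting it far too high can still overflow the native stack and crash the process.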
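
Task B in the older revision sets CPU affinity from the shell with **taskset**. On Linux, Python exposes the same kernel interface as `os.sched_setaffinity` and `os.sched_getaffinity`. The sketch below assumes Linux and a target CPU that is in the process's current allowed set. It pins the calling process to one core, much like `taskset -pc` would. The helper name `pin_to_cpu` is hypothetical.

```python
import os


def pin_to_cpu(cpu: int, pid: int = 0) -> set[int]:
    """Restrict `pid` (0 = the caller) to a single CPU, similar to `taskset -pc CPU PID`.

    Linux-only. Raises OSError if `cpu` is not available to this process
    (e.g. it does not exist or is excluded by a cpuset).
    Returns the affinity mask the kernel reports afterwards.
    """
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    original = os.sched_getaffinity(0)      # CPUs we are currently allowed on
    print("allowed CPUs:", sorted(original))
    print("after pinning:", pin_to_cpu(min(original)))
    os.sched_setaffinity(0, original)       # undo the pinning
```

Pinning the measured process explicitly, instead of relying on the stress workers to leave cpu0 idle, is one way to follow the note about keeping a process on the same core.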
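
The core list in Task C of the older revision follows from the CPU count alone. With N CPUs numbered 0 to N-1, the odd-numbered ones are 1, 3, ..., and there are N // 2 of them, which is why `$(( $(nproc) / 2 ))` yields the right worker count. The Python sketch below reproduces that arithmetic. It assumes CPUs are numbered contiguously from 0, which holds for `nproc` only when no affinity mask or cpuset hides some of them. `odd_cpus` and `odd_cpu_command` are hypothetical helpers; the second one builds the `taskset`/`stress` command line but does not run it.

```python
import os
import shlex


def odd_cpus(n_cpus: int) -> list[int]:
    """Odd-numbered CPU ids among 0..n_cpus-1, e.g. 8 CPUs -> [1, 3, 5, 7]."""
    return list(range(1, n_cpus, 2))


def odd_cpu_command(n_cpus: int) -> list[str]:
    """Build (but do not run) a taskset + stress command, one worker per odd CPU.

    On a single-CPU machine the list is empty and the command is meaningless.
    """
    cpus = odd_cpus(n_cpus)
    return ["taskset", "-c", ",".join(map(str, cpus)), "stress", "-c", str(len(cpus))]


if __name__ == "__main__":
    # On Linux, the size of the affinity mask is the closest analogue of nproc.
    n = len(os.sched_getaffinity(0))
    print(shlex.join(odd_cpu_command(n)))
```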
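
Tasks B and C of the older revision read per-CPU load from **mpstat**. To illustrate roughly where such numbers come from, the sketch below samples the per-CPU lines of Linux's `/proc/stat` twice and reports the non-idle share of the elapsed jiffies. It is a simplification and not mpstat's actual implementation. It counts `iowait` as idle and leaves the `guest` columns out of the total, because they are already included in `user` and `nice`. The function names are hypothetical.

```python
import time


def read_cpu_times(stat_text: str) -> dict[str, tuple[int, int]]:
    """Map each per-CPU line of /proc/stat to (idle_jiffies, total_jiffies)."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Keep "cpu0", "cpu1", ...; skip the aggregate "cpu" line and non-CPU lines.
        if not fields or fields[0] == "cpu" or not fields[0].startswith("cpu"):
            continue
        # user nice system idle iowait irq softirq steal [guest guest_nice]
        values = [int(v) for v in fields[1:]]
        total = sum(values[:8])        # guest/guest_nice are already in user/nice
        idle = values[3] + values[4]   # idle + iowait
        times[fields[0]] = (idle, total)
    return times


def utilization(before: dict[str, tuple[int, int]],
                after: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Percentage of non-idle time per CPU between two snapshots."""
    usage = {}
    for cpu, (idle1, total1) in after.items():
        if cpu not in before:          # e.g. a CPU brought online between samples
            continue
        idle0, total0 = before[cpu]
        elapsed = total1 - total0
        usage[cpu] = 0.0 if elapsed == 0 else 100.0 * (1 - (idle1 - idle0) / elapsed)
    return usage


def sample(interval: float = 1.0) -> dict[str, float]:
    """Read /proc/stat, wait `interval` seconds, read again, return per-CPU load."""
    with open("/proc/stat") as f:
        before = read_cpu_times(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = read_cpu_times(f.read())
    return utilization(before, after)


if __name__ == "__main__":
    for cpu, pct in sorted(sample().items(), key=lambda kv: int(kv[0][3:])):
        print(f"{cpu:>6} {pct:6.1f}%")
```

While the script from Task B is running, the CPU it occupies should show close to 100% in this output, as it does in mpstat.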