ep:labs:03:contents:tasks:ex4 [2025/03/17 20:49] radu.mantu |
ep:labs:03:contents:tasks:ex4 [2025/03/18 00:49] (current) radu.mantu |
||
---|---|---|---|
Line 37: | Line 37: | ||
=== [10p] Task B - Analyzing the assembly code === | === [10p] Task B - Analyzing the assembly code === | ||
- | Use **llvm-mca** to inspect its expected throughput and "pressure points" (check out [[https://en.algorithmica.org/hpc/profiling/mca/|this example]]. | + | Use **llvm-mca** to inspect its expected throughput and "pressure points" (check out [[https://en.algorithmica.org/hpc/profiling/mca/|this example]]). |
One important thing to remember is that **llvm-mca** does not simulate the //behavior// of each instruction, but only the time required for it to execute. In other words, if you load an immediate value in a register via ''mov rax, 0x1234'', the analyzer will not care //what// the instruction does (or what the value of ''rax'' even is), but how long it takes the CPU to do it. The implication is quite significant: **llvm-mca** is incapable of analyzing complex sequences of code that contain conditional structures, such as ''for'' loops or function calls. Instead, given the sequence of instructions, it will pass through each of them one by one, ignoring their intended effect: conditional jump instructions will fall through, ''call'' instructions will be passed over, not even considering the cost of the associated ''ret'', etc. The closest we can come to analyzing a loop is by reducing the analysis scope via the aforementioned ''LLVM-MCA-*'' markers and controlling the number of simulated iterations from the command line. | One important thing to remember is that **llvm-mca** does not simulate the //behavior// of each instruction, but only the time required for it to execute. In other words, if you load an immediate value in a register via ''mov rax, 0x1234'', the analyzer will not care //what// the instruction does (or what the value of ''rax'' even is), but how long it takes the CPU to do it. The implication is quite significant: **llvm-mca** is incapable of analyzing complex sequences of code that contain conditional structures, such as ''for'' loops or function calls. Instead, given the sequence of instructions, it will pass through each of them one by one, ignoring their intended effect: conditional jump instructions will fall through, ''call'' instructions will be passed over, not even considering the cost of the associated ''ret'', etc. 
The closest we can come to analyzing a loop is by reducing the analysis scope via the aforementioned ''LLVM-MCA-*'' markers and controlling the number of simulated iterations from the command line. | ||
Line 78: | Line 78: | ||
<note> | <note> | ||
- | Also look at the kernel's implementation of a [[https://github.com/oracle/bpftune/blob/main/src/netns_tuner.c#L92|checksum calculation]] over the variable IP header. | + | Also look at the kernel's implementation of a [[https://elixir.bootlin.com/linux/v6.13.7/source/arch/x86/include/asm/checksum_64.h#L45|checksum calculation]] over the variable IP header. |
</note> | </note> | ||
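
As a rough illustration of the ''LLVM-MCA-*'' markers discussed in Task B, the sketch below wraps the summation loop of a simple IPv4 header checksum in a named analysis region. The function ''ip_header_csum'' and the region name ''csum_loop'' are hypothetical names chosen for this example; this is a plain RFC 1071-style checksum, not the kernel's optimized implementation, and the ''#'' comment syntax inside the inline assembly assumes an x86-64 target.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (NOT the kernel's implementation): one's-complement
 * checksum over an IPv4 header given as 16-bit words.
 * ihl is the header length in 32-bit words, so there are 2 * ihl words.
 */
static uint16_t ip_header_csum(const uint16_t *words, unsigned int ihl)
{
    uint32_t sum = 0;

    assert(ihl >= 5); /* an IPv4 header has at least 5 32-bit words */

    /* Assembly comments that llvm-mca recognizes as region delimiters. */
    __asm__ volatile("# LLVM-MCA-BEGIN csum_loop" ::: "memory");
    for (unsigned int i = 0; i < 2 * ihl; i++)
        sum += words[i];
    __asm__ volatile("# LLVM-MCA-END csum_loop" ::: "memory");

    /* Fold the carries back into the low 16 bits (twice is enough here). */
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Assuming the file is saved as ''csum.c'' and called from some ''main'', something like ''clang -O2 -S -o - csum.c | llvm-mca -iterations=100'' should restrict the report to the ''csum_loop'' region and simulate it 100 times. Keep in mind that the loop's backward jump is treated as falling through, so the result approximates back-to-back executions of the loop body, and that the inline assembly markers themselves may influence how the compiler optimizes the surrounding code.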