Differences

This shows you the differences between two versions of the page.

ep:labs:03:contents:tasks:ex4 [2025/03/17 20:49]
radu.mantu
ep:labs:03:contents:tasks:ex4 [2025/03/18 00:49] (current)
radu.mantu
Line 37: Line 37:
 === [10p] Task B - Analyzing the assembly code ===
  
-Use **llvm-mca** to inspect its expected throughput and "pressure points" (check out [[https://en.algorithmica.org/hpc/profiling/mca/|this example]].
+Use **llvm-mca** to inspect its expected throughput and "pressure points" (check out [[https://en.algorithmica.org/hpc/profiling/mca/|this example]]).
  
 One important thing to remember is that **llvm-mca** does not simulate the //behavior// of each instruction, but only the time required for it to execute. In other words, if you load an immediate value into a register via ''mov rax, 0x1234'', the analyzer will not care //what// the instruction does (or what the value of ''rax'' even is), but how long it takes the CPU to do it. The implication is quite significant: **llvm-mca** is incapable of analyzing complex sequences of code that contain conditional structures, such as ''for'' loops or function calls. Instead, given the sequence of instructions, it will pass through each of them one by one, ignoring their intended effect: conditional jump instructions will fall through, ''call'' instructions will be passed over without even considering the cost of the associated ''ret'', etc. The closest we can come to analyzing a loop is by reducing the analysis scope via the aforementioned ''LLVM-MCA-*'' markers and controlling the number of simulated iterations from the command line.
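As a rough illustration of the marker approach, the sketch below wraps only the body of a loop in ''LLVM-MCA-BEGIN'' / ''LLVM-MCA-END'' comments. The function ''sum_words'', the region name ''sum_loop'' and the file name ''sum.c'' are hypothetical and chosen just for this example. The ''memory'' clobber keeps the compiler from moving code across the markers, which may also change the code it generates compared to an unmarked build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical example: sum an array of 16-bit words.
 * Only the instructions between the two markers are analyzed.
 *
 * Possible usage (file name is an assumption):
 *   clang -O2 -S -o sum.s sum.c
 *   llvm-mca -iterations=300 sum.s
 */
uint32_t sum_words(const uint16_t *words, size_t count)
{
    uint32_t sum = 0;

    assert(words != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        /* The markers are assembler comments: they emit no instructions
         * and only show up as text in the generated assembly file. */
        __asm__ volatile("# LLVM-MCA-BEGIN sum_loop" ::: "memory");
        sum += words[i];
        __asm__ volatile("# LLVM-MCA-END" ::: "memory");
    }

    return sum;
}
```

Because the branch at the end of the loop is not followed, ''-iterations'' simply replays the marked region that many times; the report then approximates the steady-state cost of one loop body rather than the cost of the whole function.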
Line 78: Line 78:
  
 <note>
-Also look at the kernel's implementation of a [[https://github.com/oracle/bpftune/blob/main/src/netns_tuner.c#L92|checksum calculation]] over the variable IP header.
+Also look at the kernel's implementation of a [[https://elixir.bootlin.com/linux/v6.13.7/source/arch/x86/include/asm/checksum_64.h#L45|checksum calculation]] over the variable IP header.
 </note>
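For comparison with the linked kernel code, here is a minimal portable sketch of an IPv4 header checksum in the style of RFC 1071. It is not the kernel's implementation; the function name ''ip_header_checksum'' and its parameters are assumptions made for this example. The header length is passed in 32-bit words, as in the IHL field, so headers with options are covered as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: one's complement checksum over an IPv4 header.
 *   hdr - header bytes in network order
 *   ihl - header length in 32-bit words (5 without options, at most 15)
 * Returns the checksum as a number whose high byte is the first byte
 * to be written into the checksum field.
 */
uint16_t ip_header_checksum(const uint8_t *hdr, unsigned int ihl)
{
    uint32_t sum = 0;

    assert(hdr != NULL && ihl >= 5 && ihl <= 15);

    /* Add the header up as a sequence of big-endian 16-bit words. */
    for (size_t i = 0; i < (size_t)ihl * 4; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Computing the checksum with the checksum field set to zero yields the value to store; running the same function over a header that already carries a valid checksum yields 0.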
  
ep/labs/03/contents/tasks/ex4.1742237340.txt.gz · Last modified: 2025/03/17 20:49 by radu.mantu
CC Attribution-Share Alike 3.0 Unported