ep:labs:03:contents:tasks:ex4 [2025/03/18 00:49] (current revision; previous: 2025/03/17 19:01), radu.mantu
  
Remember, however, that this approach is not always desirable, for two reasons:
  - Even though the injected marker is just a comment, the ''volatile'' qualifier can pessimize optimization passes. As a result, the generated code may not correspond to what would normally be emitted.
  - Some code structures cannot be included in the analysis region. For example, if you want to analyze the body of a ''for'' loop, placing the assembly marker comments inside it in C will exclude the iterator increment and the condition check (which also execute on every iteration).
</note>
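The marker placement described in the note above can be sketched in C as follows. This is a minimal sketch that assumes an x86-64 target built with GCC or Clang. ''sum_words'' and the region name ''loop_body'' are hypothetical. The markers are ''volatile'' inline assembly statements, which is what the first caveat refers to. The loop illustrates the second caveat, because only the body sits between the markers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example function: sums n 16-bit words.
 * The llvm-mca region markers surround the loop body only. */
uint32_t sum_words(const uint16_t *buf, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        /* "#" starts a comment in x86 GAS syntax, so the marker only shows
         * up in the generated .s file. The "memory" clobber discourages the
         * compiler from moving the load of buf[i] across the marker. */
        __asm__ volatile("# LLVM-MCA-BEGIN loop_body" ::: "memory");
        sum += buf[i];
        __asm__ volatile("# LLVM-MCA-END" ::: "memory");
        /* The i++ and the i < n check will normally be emitted outside
         * the region, so llvm-mca does not account for them. */
    }
    return sum;
}
```

Compile this with ''-S'' and pass the resulting assembly file to **llvm-mca**, for example with ''-iterations''. The tool should then simulate only the instructions between the two markers.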
=== [10p] Task B - Analyzing the assembly code ===
  
Use **llvm-mca** to inspect the expected throughput and "pressure points" of the generated code (check out [[https://en.algorithmica.org/hpc/profiling/mca/|this example]]).
  
One important thing to remember is that **llvm-mca** does not simulate the //behavior// of each instruction, but only the time required for it to execute. In other words, if you load an immediate value into a register via ''mov rax, 0x1234'', the analyzer will not care //what// the instruction does (or what the value of ''rax'' even is), but how long it takes the CPU to do it. The implication is quite significant: **llvm-mca** is incapable of analyzing code whose control flow matters, such as ''for'' loops or function calls. Instead, given the sequence of instructions, it will pass through each of them one by one, ignoring their intended effect: conditional jump instructions will fall through, ''call'' instructions will be passed over without even considering the cost of the associated ''ret'', etc.
The closest we can come to analyzing a loop is by reducing the analysis scope via the aforementioned ''LLVM-MCA-*'' markers and controlling the number of simulated iterations from the command line (the ''-iterations'' option).
=== [10p] Task C - In-depth examination ===
  
Now that you've got the hang of things, use the ''-bottleneck-analysis'' flag to identify instruction sequences that cause contention.

Explain the cause of each bottleneck to the best of your abilities. For example, the following two instructions display a register dependency because the ''mov'' instruction needs to wait for the ''push'' instruction to update the RSP register.
  
<code>
  
How would you go about further optimizing this code?

<note>
Also look at the kernel's implementation of a [[https://elixir.bootlin.com/linux/v6.13.7/source/arch/x86/include/asm/checksum_64.h#L45|checksum calculation]] over the variable-length IP header.
</note>
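Assume the code under analysis computes the one's-complement Internet checksum (RFC 1071), as the kernel reference above suggests. It may then help to compare two formulations of that checksum. The sketch below is hypothetical: ''csum_ref'', ''csum_wide'' and ''csum_fold'' are illustrative names, not the lab's code. The first version adds one 16-bit word per iteration. The second adds 32-bit words into a 64-bit accumulator and folds the carries once at the end, which roughly halves the number of loop iterations. The kernel routine linked in the note also processes 32-bit words, using add-with-carry (''adc'') instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a wide accumulator down to 16 bits (end-around carry),
 * then take the one's complement. */
static uint16_t csum_fold(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical baseline: one 16-bit big-endian word per iteration. */
uint16_t csum_ref(const uint8_t *buf, size_t len)
{
    uint64_t sum = 0;
    size_t i = 0;

    for (; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)                /* odd trailing byte, padded with zero */
        sum += (uint32_t)buf[i] << 8;
    return csum_fold(sum);
}

/* Hypothetical variant: one 32-bit big-endian word per iteration.
 * Valid because 2^16 == 1 (mod 0xffff), so adding a 32-bit word is
 * equivalent to adding its two 16-bit halves. */
uint16_t csum_wide(const uint8_t *buf, size_t len)
{
    uint64_t sum = 0;
    size_t i = 0;

    for (; i + 3 < len; i += 4)
        sum += ((uint32_t)buf[i] << 24) | ((uint32_t)buf[i + 1] << 16) |
               ((uint32_t)buf[i + 2] << 8) | buf[i + 3];
    for (; i + 1 < len; i += 2) /* at most one leftover 16-bit word */
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)                /* odd trailing byte, padded with zero */
        sum += (uint32_t)buf[i] << 8;
    return csum_fold(sum);
}
```

Consider the 20-byte IPv4 header ''45 00 00 73 00 00 40 00 40 11 00 00 c0 a8 00 01 c0 a8 00 c7'', with the checksum field zeroed. Both functions return ''0xb861'' for it. Byte order is handled explicitly here for clarity. The one's-complement sum is byte-order independent, so a faster variant could load native-endian words and byte-swap only the final result.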
  
<solution -hidden>
llvm-mca -bottleneck-analysis -timeline -iterations=10000 -all-stats csum.s
</solution>
ep/labs/03/contents/tasks/ex4.1742230860.txt.gz · Last modified: 2025/03/17 19:01 by radu.mantu
CC Attribution-Share Alike 3.0 Unported