==== 04. [25p] llvm-mca ====
  
**llvm-mca** is a machine code analyzer that simulates the execution of a sequence of instructions. By leveraging high-level knowledge of the micro-architectural implementation of the CPU, as well as its execution pipeline, this tool is able to determine the execution speed of said instructions in terms of clock cycles. More importantly though, it can highlight possible contention between two or more instructions over CPU resources or, rather, its [[https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)#Scheduler_Ports_.26_Execution_Units|ports]].
  
Note that **llvm-mca** is not the most reliable tool when predicting the precise runtime of an instruction block (see [[https://dspace.mit.edu/bitstream/handle/1721.1/128755/ithemal-measurement.pdf?sequence=2&isAllowed=y|this paper]] for details). After all, CPUs are not as simple as the good old AVR microcontrollers. While calculating the execution time of a linear AVR program (i.e.: no conditional loops) is as simple as adding up the clock cycles associated with each instruction (taken from the reference manual), things are never that clear-cut for modern CPUs. Manufacturers such as Intel oftentimes implement hardware optimizations that are not documented or even publicized. For example, we know that the CPU caches instructions in case a loop is detected; when that happens, the instructions are dispatched once again from this buffer, thus avoiding extra instruction fetches. What happens, though, if the loop's contents exceed this buffer size? Obviously, without knowing details such as this buffer size, not to mention anything about microcode or other undocumented hardware optimizations, it is impossible to give accurate estimates.
  
{{ :ep:labs:01:contents:tasks:cpu_exec_unit.png?800 |}}
<html>
<center>
<b>Figure 2:</b> Simplified view of a single Intel Skylake CPU core. Instructions are decoded into μOps and scheduled out-of-order onto the Execution Units. Your CPUs most likely have (many) more EUs.
</center>
</html>
  
=== [5p] Task A - Preparing the input ===
As previously mentioned, **llvm-mca** requires assembly code as input, so start by preparing it from the source provided in the archive.
  
Because the assembly parser it utilizes is the same as **clang**'s, use **clang** to compile the C program, but stop after the LLVM IR generation and optimization stages, when the target-specific assembly code is generated.
<note>
Note how in the [[https://llvm.org/docs/CommandGuide/llvm-mca.html|llvm-mca documentation]] it is stated that the ''LLVM-MCA-BEGIN'' and ''LLVM-MCA-END'' markers can be parsed (as assembly comments) in order to restrict the scope of the analysis.

These markers can also be placed in C code (see [[https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html|gcc extended asm]] and [[https://llvm.org/docs/LangRef.html#inline-assembler-expressions|llvm inline asm expressions]]):
<code c>
asm volatile("# LLVM-MCA-BEGIN" ::: "memory");  /* start of the analyzed region */
asm volatile("# LLVM-MCA-END" ::: "memory");    /* end of the analyzed region */
</code>

Remember, however, that this approach is not always desirable, for two reasons:
  - Even though this is just a comment, the ''volatile'' modifier can pessimize optimization passes. As a result, the generated code may not correspond to what would normally be emitted.
  - Some code structures cannot be included in the analysis region. For example, if you want to include the contents of a ''for'' loop, doing so by injecting assembly meta-comments in C code will exclude the increment and the condition check (which are also executed on every iteration).
</note>
  
=== [10p] Task B - Analyzing the assembly code ===
  
Once you have the assembly code, use **llvm-mca** to inspect its expected throughput and "pressure points" (check out [[https://en.algorithmica.org/hpc/profiling/mca/|this example]]).
  
-Because of the loopjumps in the assembly code are bound to be present ​Llvm-mca goes through all the instructions sequentially whilst simulating ​the duration and the resource access thus it does not care about the effects they produce. To put it differentlywhen it encounters a jump it will resort ​to a fallthrough.+One important thing to remember is that **llvm-mca** does not simulate ​the //​behaviour//​ of each instructionbut only the time required for it to executeIn other words, if you load an immediate value in a register via ''​mov rax, 0x1234'', ​the analyzer will not care //​what// ​the instruction does (or what the value of ''​rax''​ even is), but how long it takes the CPU to do it. The implication is quite significant:​ **llvm-mca** is incapable of analyzing complex sequences of code that contain conditional structures, such as ''​for''​ loops or function calls. Instead, given the sequence of instructions, it will pass through each of them one by one, ignoring their intended effect: conditional ​jump instructions ​will fall through, ''​call''​ instructions will by passed over not even considering the cost of the associated ''​ret'',​ etc. The closest we can come to analyzing ​loop is by reducing the analysis scope via the aforementioned ''​LLVM-MCA-*''​ markers and controlling the number of simulated iterations from the command line.
  
To solve this issue, you can set the number of iterations from the command line, so its behaviour can resemble an actual loop.
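
For example, a possible invocation (just a sketch: the file name, CPU model and iteration count below are assumptions, so adjust them to your setup):
<code bash>
# simulate 100 iterations of the marked region(s), scheduling for a Skylake client core
llvm-mca -mcpu=skylake -iterations=100 -timeline my_pow.S
</code>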
  
<note>
Read more on the [[https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)#Scheduler|Skylake instruction scheduler and ports]].
  
A very short description of each port's main usage:
  * **Port 0,1:** arithmetic instructions
  * **Port 2,3:** load operations, AGU (address generation unit)
  * **Port 4:** store operations, AGU
  * **Port 5:** vector operations
  * **Port 6:** integer and branch operations
  * **Port 7:** AGU
  
The significance of the SKL ports reported by **llvm-mca** can be found in the [[https://github.com/llvm/llvm-project/blob/d9be232191c1c391a0d665e976808b2a12ea98f1/llvm/lib/Target/X86/X86SchedSkylakeClient.td#L32|Skylake machine model config]]. To find out if your CPU belongs to this category, [[https://github.com/llvm/llvm-project/blob/27c5a9bbb01a464bb85624db2d0808f30de7c996/llvm/lib/TargetParser/Host.cpp#L765|RTFS]] and run ''inxi -Cx''.
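
For reference, checking the CPU model boils down to something like this (assuming a Linux machine):
<code bash>
# CPU model and microarchitecture details
inxi -Cx

# fallback if inxi is not installed: query the model name directly
grep -m1 'model name' /proc/cpuinfo
</code>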
  
</note>

<note tip>
In the default view, look at the number of micro-operations (i.e.: ''#uOps'') associated with each instruction. This is the number of primitive operations that each instruction (from the x86 ISA) is broken into. Fun and irrelevant fact: the hardware implementation of certain instructions can be modified via microcode upgrades.

Anyway, keeping this ''#uOps'' value in mind (for each instruction), we'll notice that the sum of all //resource pressures per port// equals that value. In other words, the //resource pressure// of a port is the average number of micro-operations that depend on that resource. For example (purely illustrative numbers): an instruction with ''#uOps'' = 2 whose micro-operations can be dispatched evenly across four ports would show a pressure of 0.5 on each of them.
</note>
  
<solution -hidden>
Contents of my_pow.S with # LLVM-MCA-BEGIN and END tags:
  
{{:ep:labs:01:contents:tasks:screenshot_from_2023-10-08_22-19-15.png?300|}}
</solution>
  
=== [10p] Task C - In-depth examination ===
  
Now that you've got the hang of things, try generating the assembly code at different optimization levels (e.g.: ''-O1'', ''-O2'', ''-O3'', ''-Os''). \\
Use the ''-bottleneck-analysis'' flag to identify contentious instruction sequences and explain the reported bottlenecks to the best of your abilities.
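
A possible workflow (a sketch: file names, CPU model and iteration count are assumptions):
<code bash>
# regenerate the assembly at different optimization levels
clang -S -O1 my_pow.c -o my_pow_O1.S
clang -S -O3 my_pow.c -o my_pow_O3.S

# compare the reports; -bottleneck-analysis highlights the resources that stall the pipeline
llvm-mca -mcpu=skylake -iterations=200 -bottleneck-analysis my_pow_O1.S
llvm-mca -mcpu=skylake -iterations=200 -bottleneck-analysis my_pow_O3.S
</code>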
  
<solution -hidden>