04. [25p] LLVM-MCA

LLVM-MCA is a machine code analyzer that simulates the execution of an assembly code snippet on a particular microarchitecture, using the scheduling data that compilers already have available. From this simulation it reports the latency and throughput of the snippet, as well as the pressure it puts on various resources within the CPU.

Given a piece of assembly code, llvm-mca estimates the Instructions Per Cycle (IPC) and the hardware resource pressure, among other things. Its analysis and reporting style were inspired by the IACA tool provided by Intel.

In other words, a machine code analyzer helps developers understand how assembly code will run on specific hardware, which in turn helps with optimizing it.
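To make the IPC figure mentioned above more concrete, here is a minimal sketch that parses a summary in the layout llvm-mca prints at the top of its report and recomputes IPC as instructions divided by total cycles. The values in SAMPLE_SUMMARY are illustrative only (they are not measurements of the lab code), and parse_summary and ipc are hypothetical helper names.

```python
# Illustrative summary in the layout llvm-mca prints at the top of a report.
# The values are example numbers, not measurements of the lab code.
SAMPLE_SUMMARY = """\
Iterations:        300
Instructions:      900
Total Cycles:      610
Total uOps:        900

Dispatch Width:    2
uOps Per Cycle:    1.48
IPC:               1.48
Block RThroughput: 2.0
"""


def parse_summary(text):
    """Return the 'Key: value' lines of a summary as a dict of floats."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank lines and anything without a colon
        try:
            fields[key.strip()] = float(value)
        except ValueError:
            pass  # ignore lines whose value is not numeric
    return fields


def ipc(fields):
    """IPC = simulated instructions / simulated cycles."""
    return fields["Instructions"] / fields["Total Cycles"]


summary = parse_summary(SAMPLE_SUMMARY)
print(f"IPC = {ipc(summary):.2f}")
```

Note that the instruction count in the summary is the size of the analyzed block multiplied by the number of iterations, so IPC describes the whole simulated run rather than a single pass.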

[5p] Task A - Preparing the input

As previously mentioned, llvm-mca requires assembly code as input, so start by generating it from the source provided in the archive.
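One possible way to produce the assembly is to stop the compiler after the compilation stage with -S. The sketch below only builds and prints such a command. The compiler (gcc), the optimization level (-O2) and the file names ex4.c / ex4.s are assumptions, so adjust them to the files in the archive and to your setup.

```python
def asm_command(source, output, compiler="gcc", opt_level="-O2"):
    """Build a command that stops after compilation and emits assembly (-S)."""
    return [compiler, opt_level, "-S", source, "-o", output]


# "ex4.c" and "ex4.s" are placeholder names, not the actual files in the archive.
cmd = asm_command("ex4.c", "ex4.s")
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

The optimization level matters here: the loop the compiler emits at -O0 can look very different from the one emitted at -O2, and llvm-mca analyzes whatever ends up in the .s file.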

[10p] Task B - Analyzing the assembly code

After obtaining the assembly code, use the tool to inspect its performance. Note that you can add comments in the assembly code (# LLVM-MCA-BEGIN and # LLVM-MCA-END) to delimit a region of interest. In this particular scenario we are dealing with a simple loop, so focusing on that area of the code is adequate.
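The markers can be added by hand in an editor. Purely as an illustration, the sketch below inserts them around a loop, from its label up to the last jump back to that label. SAMPLE_ASM is a made-up loop, not the lab's code, and mark_loop is a hypothetical helper.

```python
# Made-up AT&T-syntax loop used only to demonstrate the marker placement.
SAMPLE_ASM = """\
        xorl    %eax, %eax
.L2:
        addl    (%rdi), %eax
        addq    $4, %rdi
        cmpq    %rsi, %rdi
        jne     .L2
        ret
"""


def mark_loop(asm_text, label):
    """Wrap the code from `label:` to the last jump to `label` in markers."""
    lines = asm_text.splitlines()
    starts = [i for i, line in enumerate(lines) if line.strip() == f"{label}:"]
    if not starts:
        raise ValueError(f"label {label!r} not found")
    start = starts[0]
    # A jump back to the label is a later line whose last token is the label.
    ends = [i for i, line in enumerate(lines)
            if i > start and line.split()[-1:] == [label]]
    if not ends:
        raise ValueError(f"no jump to {label!r} after the label")
    end = ends[-1]
    marked = (lines[:start] + ["# LLVM-MCA-BEGIN"] + lines[start:end + 1]
              + ["# LLVM-MCA-END"] + lines[end + 1:])
    return "\n".join(marked) + "\n"


print(mark_loop(SAMPLE_ASM, ".L2"))
```

With the markers in place, llvm-mca reports on the marked region instead of the whole file, so the setup code before the loop and the return after it do not dilute the numbers.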

Because of the loop, jumps are bound to be present in the assembly code. llvm-mca goes through all the instructions sequentially, simulating their duration and resource usage, and does not care about the effects they produce. To put it differently, when it encounters a jump it simply falls through to the next instruction.

To work around this, you can set the number of iterations from the command line (-iterations=<N>), so that the simulation resembles an actual loop.
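A minimal sketch of such an invocation is shown below. It only assembles the argument list. The -iterations and -mcpu options belong to llvm-mca, while the file name ex4.s, the CPU name skylake and the helper mca_command are assumptions made for the example.

```python
def mca_command(asm_file, iterations=100, cpu=None):
    """Build an llvm-mca command line that repeats the block `iterations` times."""
    cmd = ["llvm-mca", f"-iterations={iterations}"]
    if cpu is not None:
        cmd.append(f"-mcpu={cpu}")  # target CPU model to simulate
    cmd.append(asm_file)
    return cmd


# "ex4.s" is a placeholder for the assembly file prepared in Task A.
print(" ".join(mca_command("ex4.s", iterations=500, cpu="skylake")))
```

Since the block is simply replayed back to back, a higher iteration count lets the simulated pipeline reach a steady state, which is closer to how a long-running loop behaves.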

It must be acknowledged that llvm-mca's estimates can deviate significantly from measurements on real hardware, but it is still a useful tool for analysis. https://dspace.mit.edu/bitstream/handle/1721.1/128755/ithemal-measurement.pdf?sequence=2&isAllowed=y

Note: Besides the default Instruction Info, the report also includes information about the Scheduler Ports: https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)

https://www.reddit.com/r/intel/comments/pzix4a/what_do_the_ports_of_the_intel_processor_refer_to/

A very short description of each port's main usage:

  1. Port0 & Port1 → arithmetic instructions
  2. Port2 & Port3 → load operations, AGU (address generation unit)
  3. Port4 → store operations (store data)
  4. Port5 → vector operations
  5. Port6 → integer and branch operations
  6. Port7 → AGU (store addresses)

Those are the ports for the Skylake microarchitecture.

https://en.wikipedia.org/wiki/Skylake_(microarchitecture)

[10p] Task C - In-depth examination

After getting the hang of working with llvm-mca, try adding command line options such as -bottleneck-analysis and changing the iteration count for a more thorough investigation.

The -bottleneck-analysis option reports which factors, such as resource pressure or data dependencies, limit throughput.
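One way to organize this investigation is to run the same file with several iteration counts and compare the reports. The sketch below is an assumption-laden example rather than a required workflow: bottleneck_command and run_sweep are hypothetical helpers, skylake is an assumed target CPU, and the iteration counts are arbitrary.

```python
import shutil
import subprocess


def bottleneck_command(asm_file, iterations, cpu="skylake"):
    """Build one llvm-mca invocation with bottleneck analysis enabled."""
    return ["llvm-mca", f"-mcpu={cpu}", f"-iterations={iterations}",
            "-bottleneck-analysis", asm_file]


def run_sweep(asm_file, counts=(1, 10, 100, 1000)):
    """Run llvm-mca once per iteration count and return the reports by count."""
    if shutil.which("llvm-mca") is None:
        raise RuntimeError("llvm-mca was not found in PATH")
    reports = {}
    for n in counts:
        result = subprocess.run(bottleneck_command(asm_file, n),
                                check=True, capture_output=True, text=True)
        reports[n] = result.stdout
    return reports
```

Calling run_sweep("ex4.s"), with ex4.s standing in for your own assembly file, returns one report per iteration count, which makes it easy to see how the IPC and the reported bottlenecks change as the simulated loop runs longer.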

ep/labs/01/contents/tasks/ex4.1696792908.txt.gz · Last modified: 2023/10/08 22:21 by mihai.blacioti
CC Attribution-Share Alike 3.0 Unported