04. [20p] SSE & gcc intrinsics

Streaming SIMD Extensions (SSE) is an x86 instruction set extension that focuses on vectorized operations. By packing several values into a single 128-bit register, the CPU can apply one instruction to all of them at once (Single Instruction, Multiple Data).

To take advantage of this feature, we will use gcc intrinsics. Intrinsics are built-in functions that the compiler is intimately familiar with and can use to build highly optimized machine code. In fact, gcc supports two sets of built-in functions for SIMD: one native and one defined by Intel. In this lab we are going to use the latter, since there is much more documentation available for it. In particular, we will consult the Intel Intrinsics Guide.
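
To make the difference between the two sets concrete, here is a minimal sketch that adds four floats twice: once with gcc's native vector extension and once with the Intel-defined intrinsics. The helper names add4_native and add4_intel are made up for this illustration, and unaligned loads/stores are used so that no assumption is made about the caller's buffers.

```c
#include <string.h>     /* memcpy */
#include <xmmintrin.h>  /* Intel-defined SSE intrinsics: __m128, _mm_add_ps, ... */

/* gcc-native vector type: 16 bytes wide, i.e. four floats. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical helper: element-wise add using gcc's native vector extension. */
void add4_native(const float *a, const float *b, float *out)
{
    v4sf va, vb, vr;
    memcpy(&va, a, sizeof va);   /* memcpy makes no alignment assumption */
    memcpy(&vb, b, sizeof vb);
    vr = va + vb;                /* the ordinary + operator works per element */
    memcpy(out, &vr, sizeof vr);
}

/* Hypothetical helper: the same addition using the Intel-defined intrinsics. */
void add4_intel(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a); /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}
```

Both helpers should produce the same four sums; the Intel flavour is more verbose, but each call maps to an entry in the Intel Intrinsics Guide.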

As we mentioned before, data is packed in 128-bit registers. However, more than one data type can be packed, and this is reflected both in the declared data types and in the instructions. Some examples of data types:

  • __m128 : holds 4 32-bit single precision floating point values
  • __m128i : holds integers (e.g. 16 8-bit, 8 16-bit, 4 32-bit or 2 64-bit values)
  • __m128d : holds 2 64-bit double precision floating point values
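
As a small illustration of these types, the sketch below fills one register of each kind and writes it back to memory. The fill_* helper names are hypothetical. Note that the _mm_set_* intrinsics take their arguments from the highest lane down to the lowest, so the last argument ends up at index 0 in memory.

```c
#include <emmintrin.h>  /* SSE2: __m128d and __m128i (also pulls in xmmintrin.h) */

/* Hypothetical helper: 4 single precision floats in one __m128. */
void fill_m128(float out[4])
{
    __m128 v = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
    _mm_storeu_ps(out, v);                  /* out = {1, 2, 3, 4} */
}

/* Hypothetical helper: 2 double precision floats in one __m128d. */
void fill_m128d(double out[2])
{
    __m128d v = _mm_set_pd(2.0, 1.0);
    _mm_storeu_pd(out, v);                  /* out = {1, 2} */
}

/* Hypothetical helper: 4 32-bit integers in one __m128i. */
void fill_m128i(int out[4])
{
    __m128i v = _mm_set_epi32(4, 3, 2, 1);
    _mm_storeu_si128((__m128i *)out, v);    /* out = {1, 2, 3, 4} */
}
```

All three types are 16 bytes wide; only the interpretation of those bytes changes.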

The function naming convention is _mm_<intrinsic_operation>_<suffix>. For example:

  • __m128 _mm_add_ps (__m128 a, __m128 b)
    • add : addition operation
    • ps : packed single precision (4 floats of 4 bytes each)
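
The effect of the suffix can be seen by keeping the operation fixed and changing only the suffix. In the sketch below (helper names are invented for the example), ps adds all four float lanes, ss adds only the lowest lane and copies the other three from the first operand, and pd adds two doubles.

```c
#include <emmintrin.h>  /* SSE2 for _mm_add_pd; also pulls in xmmintrin.h */

/* ps: packed single precision, all 4 lanes are added. */
void add_ps(const float *a, const float *b, float *out)
{
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* ss: scalar single precision, only lane 0 is added;
 * lanes 1..3 of the result are copied from a. */
void add_ss(const float *a, const float *b, float *out)
{
    _mm_storeu_ps(out, _mm_add_ss(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* pd: packed double precision, 2 lanes of 8 bytes each. */
void add_pd(const double *a, const double *b, double *out)
{
    _mm_storeu_pd(out, _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, add_ps yields {11, 22, 33, 44} while add_ss yields {11, 2, 3, 4}.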

[15p] Task A - Implementation

Starting from the files in sse.zip, implement sqrt(x[]) / y[] using SSE intrinsics. How does the execution time compare to that of the normal implementation? Note that the data must be loaded from the x and y buffers into the 128-bit registers and the answer stored back to a buffer.
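
The load, compute, store cycle mentioned in the note could be shaped roughly as follows. This is only a sketch under several assumptions and is not taken from sse.zip: the function name sqrt_div_sse is hypothetical, the buffers are assumed to hold floats and to be 16-byte aligned, and the element count is assumed to be a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical helper: out[i] = sqrt(x[i]) / y[i], four elements per iteration.
 * Assumes x, y and out are 16-byte aligned and n is a multiple of 4. */
void sqrt_div_sse(const float *x, const float *y, float *out, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        __m128 vx = _mm_load_ps(x + i);               /* aligned load of 4 floats */
        __m128 vy = _mm_load_ps(y + i);
        __m128 vr = _mm_div_ps(_mm_sqrt_ps(vx), vy);  /* sqrt, then divide, per lane */
        _mm_store_ps(out + i, vr);                    /* aligned store of 4 results */
    }
}
```

A caller can obtain suitably aligned buffers with, for example, gcc's __attribute__((aligned(16))) on an array declaration. A real implementation would also need to handle a tail of fewer than 4 elements if n is not a multiple of 4.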

[5p] Task B - Questions

Answer the following questions:

  1. What functions would you use to load/store the data, were the buffers not 16-byte aligned? Would it matter?
  2. What registers are used in the code that you wrote? Where else do you usually encounter them? (Hint: objdump, code is compiled with -g)
  3. How could you further optimize the division by y[]?
ep/labs/01/contents/ex4.1568812751.txt.gz · Last modified: 2019/09/18 16:19 by radu.mantu
CC Attribution-Share Alike 3.0 Unported