Streaming SIMD Extensions (SSE) is an x86 instruction set extension that focuses on vectorized operations. By packing several values into a single 128-bit register, the CPU can apply one instruction to all of them at once (SIMD: Single Instruction, Multiple Data).
To take advantage of this feature, we will use GCC intrinsics. Intrinsics are built-in functions that the compiler understands directly and typically translates into one or a few specific machine instructions, which lets it generate highly optimized code. GCC actually supports two sets of built-in functions for SIMD: its own native set and one defined by Intel. In this lab we will use the latter, since much more documentation is available for it. In particular, we will consult the Intel Intrinsics Guide.
As mentioned before, data is packed in 128-bit registers. However, different data types can be packed, and this is reflected both in the declared data types and in the instructions. Some examples of data types:
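A minimal sketch of the three basic 128-bit types, assuming GCC on x86-64 (where SSE and SSE2 are enabled by default): `__m128` holds 4 floats, `__m128d` holds 2 doubles and `__m128i` holds integers of any width (here 4 x 32-bit). The function name `pack_examples` is hypothetical and exists only for illustration.

```c
#include <emmintrin.h> /* SSE2: __m128d, __m128i; also includes xmmintrin.h for __m128 */

/* Hypothetical helper: fill one register of each type and store it back to memory. */
void pack_examples(float f_out[4], double d_out[2], int i_out[4])
{
    /* _mm_set_* lists lanes from highest to lowest, so lane 0 is the LAST argument. */
    __m128  f = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f); /* 4 x 32-bit float */
    __m128d d = _mm_set_pd(2.0, 1.0);               /* 2 x 64-bit double */
    __m128i i = _mm_set_epi32(4, 3, 2, 1);          /* 4 x 32-bit int (same register could hold 16 x 8-bit, 8 x 16-bit or 2 x 64-bit) */

    /* Unaligned stores write lane 0 to the lowest address. */
    _mm_storeu_ps(f_out, f);                /* f_out = {1, 2, 3, 4} */
    _mm_storeu_pd(d_out, d);                /* d_out = {1, 2} */
    _mm_storeu_si128((__m128i *)i_out, i);  /* i_out = {1, 2, 3, 4} */
}
```

All three types are 16 bytes wide; only the interpretation of the bits differs, which is why the intrinsic that operates on them must match the type.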
The function naming convention is _mm_<intrinsic_operation>_<suffix>. For example:
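A small sketch of how the suffix changes the meaning of the same operation (`add`). Common suffixes are `ps` (packed single: all 4 floats), `ss` (scalar single: only the lowest float), `pd` (packed double), `sd` (scalar double) and `epi8`/`epi16`/`epi32`/`epi64` (packed signed integers of that width). The function name `suffix_examples` is hypothetical.

```c
#include <emmintrin.h> /* SSE2 intrinsics (includes the SSE ones from xmmintrin.h) */

/* Hypothetical helper: the same "add" with four different suffixes. */
void suffix_examples(float ps[4], float ss[4], double pd[2], int epi32[4])
{
    __m128 a = _mm_set1_ps(1.0f); /* {1, 1, 1, 1} */
    __m128 b = _mm_set1_ps(2.0f); /* {2, 2, 2, 2} */

    /* _ps: all four lanes are added. */
    _mm_storeu_ps(ps, _mm_add_ps(a, b));   /* {3, 3, 3, 3} */

    /* _ss: only lane 0 is added; lanes 1-3 are copied from the first operand. */
    _mm_storeu_ps(ss, _mm_add_ss(a, b));   /* {3, 1, 1, 1} */

    /* _pd: two double lanes. */
    _mm_storeu_pd(pd, _mm_add_pd(_mm_set1_pd(1.5), _mm_set1_pd(2.5))); /* {4, 4} */

    /* _epi32: four 32-bit integer lanes. */
    _mm_storeu_si128((__m128i *)epi32,
                     _mm_add_epi32(_mm_set1_epi32(10), _mm_set1_epi32(5))); /* {15, 15, 15, 15} */
}
```

The `ss`/`sd` forms are handy for processing leftover elements one at a time with the same instruction family as the packed loop.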
Starting from the files in sse.zip, implement the element-wise computation sqrt(x[i]) / y[i] using SSE intrinsics. How does the execution time compare to that of the normal (scalar) implementation? Note that the data must be loaded from the x and y buffers into 128-bit registers and the result stored back to a buffer.
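One possible shape for the SSE version, sketched under assumptions since the contents of sse.zip are not shown here: the buffers are `float` arrays of length `n`, and the function name `sqrt_div_sse` and its signature are hypothetical. The main loop handles 4 floats per iteration with `_mm_sqrt_ps` and `_mm_div_ps`; any remaining 0-3 elements are processed with the scalar `_ss` forms so `n` need not be a multiple of 4.

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE: __m128, _mm_loadu_ps, _mm_sqrt_ps, _mm_div_ps */

/* Hypothetical helper: out[i] = sqrt(x[i]) / y[i] for i in [0, n). */
void sqrt_div_sse(const float *x, const float *y, float *out, size_t n)
{
    size_t i = 0;

    /* Vector loop: load 4 floats from each buffer, compute, store 4 results.
     * loadu/storeu do not require 16-byte alignment. */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        __m128 r  = _mm_div_ps(_mm_sqrt_ps(vx), vy);
        _mm_storeu_ps(out + i, r);
    }

    /* Tail: remaining elements one at a time. _mm_load_ss fills lane 0
     * (upper lanes zeroed) and the _ss operations touch only lane 0. */
    for (; i < n; i++) {
        __m128 vx = _mm_load_ss(x + i);
        __m128 vy = _mm_load_ss(y + i);
        _mm_store_ss(out + i, _mm_div_ss(_mm_sqrt_ss(vx), vy));
    }
}
```

When comparing execution times, build both versions with the same compiler flags and use a large `n`: at higher optimization levels GCC may auto-vectorize the plain loop on its own, which can narrow the gap.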
Answer the following questions: