_mm_load_ps vs. _mm_load_pd vs. etc. on Intel x86 ISA

There are different intrinsics because they correspond to different instructions.

There are different load instructions because Intel wants to keep the freedom to design a processor on which double-precision vectors are backed by a different physical register file than single-precision or integer vectors, or are handled by different execution units. Either arrangement could add latency if there were no way to specify which register file or forwarding network the loaded data should go to.

One way to think about it is that the different instructions do the "same thing", but additionally provide a hint to the processor telling it how the data that is being loaded will be used by future instructions. This may help the processor make sure that the data is in the right place to be used as efficiently as possible, or it may be ignored by the processor.
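To make the "same thing" part concrete, here is a minimal sketch in C. It assumes an x86 compiler with SSE2 enabled and C11 support for _Alignas. The function name loads_are_bitwise_identical is made up for this illustration. It loads the same 16 bytes with the float, double, and integer flavors of the aligned load, stores each result back, and checks that all three round-trip the bytes unchanged.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 header; it also pulls in the SSE _mm_load_ps */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if all three load flavors move the
   same 16 bytes without altering them, 0 otherwise. */
int loads_are_bitwise_identical(void)
{
    _Alignas(16) unsigned char src[16];
    _Alignas(16) unsigned char via_ps[16];
    _Alignas(16) unsigned char via_pd[16];
    _Alignas(16) unsigned char via_si[16];

    /* Arbitrary byte pattern; the loads do not interpret it. */
    for (int i = 0; i < 16; ++i)
        src[i] = (unsigned char)(i * 17 + 3);

    /* The aligned load intrinsics require a 16-byte-aligned address. */
    assert(((uintptr_t)src % 16) == 0);

    __m128  as_float  = _mm_load_ps((const float *)src);
    __m128d as_double = _mm_load_pd((const double *)src);
    __m128i as_int    = _mm_load_si128((const __m128i *)src);

    _mm_store_ps((float *)via_ps, as_float);
    _mm_store_pd((double *)via_pd, as_double);
    _mm_store_si128((__m128i *)via_si, as_int);

    return memcmp(src, via_ps, 16) == 0
        && memcmp(src, via_pd, 16) == 0
        && memcmp(src, via_si, 16) == 0;
}
```

The only differences visible in the source are the pointer and vector types. Any difference in timing comes from what consumes the register afterwards, not from the bytes that were loaded.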

This is not just hypothetical. On some processors, data loaded with an integer vector load (MOVDQA) and then consumed by a floating-point operation takes longer to become available than data loaded with a floating-point load (MOVAPS or MOVAPD), and vice versa. See the Intel Optimization Manual or Agner Fog's notes for more detail on the subject. Use the load that matches how you will use the data, to avoid the risk of such performance hazards on current or future processors.
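As a rough sketch of "use the load that matches how you will use the data", the two functions below pair each load with an arithmetic operation of the same kind. The names add_floats and add_int32s are hypothetical, and the code assumes SSE2 and 16-byte-aligned pointers. Note that a compiler is not obliged to emit exactly the instruction an intrinsic is named after, so treat this as a statement of intent rather than a guarantee about the generated code.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 header; it also pulls in the SSE intrinsics */
#include <stdint.h>

/* Float data: floating-point load feeding a floating-point add.
   All pointers must be 16-byte aligned. */
void add_floats(const float *a, const float *b, float *out)
{
    assert(((uintptr_t)a % 16) == 0);
    assert(((uintptr_t)b % 16) == 0);
    assert(((uintptr_t)out % 16) == 0);

    __m128 va = _mm_load_ps(a);
    __m128 vb = _mm_load_ps(b);
    _mm_store_ps(out, _mm_add_ps(va, vb));
}

/* Integer data: integer load feeding an integer add.
   All pointers must be 16-byte aligned. */
void add_int32s(const int32_t *a, const int32_t *b, int32_t *out)
{
    assert(((uintptr_t)a % 16) == 0);
    assert(((uintptr_t)b % 16) == 0);
    assert(((uintptr_t)out % 16) == 0);

    __m128i va = _mm_load_si128((const __m128i *)a);
    __m128i vb = _mm_load_si128((const __m128i *)b);
    _mm_store_si128((__m128i *)out, _mm_add_epi32(va, vb));
}
```

Mixing the two, for example loading with _mm_load_si128 and casting to __m128 before a float add, would still compute the right answer. The concern raised above is only about possible extra latency on some processors.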

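_mm_load_ps loads 4 single-precision floating-point values from a float pointer and returns an __m128.

_mm_load_pd loads 2 double-precision floating-point values from a double pointer and returns an __m128d.

These take and return different types, so I think it just makes sense to have different functions. Also, C has no function overloading, so each type needs its own function name.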
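A small sketch of the type difference, assuming SSE2 and 16-byte-aligned input. The helper names sum4_floats and sum2_doubles are invented for this example. Each one performs a single load, spills the lanes to an array, and adds them up, which shows that one register holds 4 floats in one case and 2 doubles in the other.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 header; it also pulls in the SSE _mm_load_ps */
#include <stdint.h>

/* One _mm_load_ps brings in 4 floats; p must be 16-byte aligned. */
float sum4_floats(const float *p)
{
    assert(((uintptr_t)p % 16) == 0);

    __m128 v = _mm_load_ps(p);
    _Alignas(16) float lanes[4];
    _mm_store_ps(lanes, v);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

/* One _mm_load_pd brings in 2 doubles; p must be 16-byte aligned. */
double sum2_doubles(const double *p)
{
    assert(((uintptr_t)p % 16) == 0);

    __m128d v = _mm_load_pd(p);
    _Alignas(16) double lanes[2];
    _mm_store_pd(lanes, v);
    return lanes[0] + lanes[1];
}
```

Because __m128 and __m128d are distinct types, passing a double pointer to _mm_load_ps or assigning its result to an __m128d is diagnosed by the compiler, which is the practical benefit of having separate function names in a language without overloading.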

Tags: C, x86, SSE, Intel, SIMD