Most efficient way to check if all __m128i components are 0 [using <= SSE4.1 intrinsics]
You can use the PTEST instruction via the _mm_testz_si128 intrinsic (SSE4.1), like this:
#include <smmintrin.h> // SSE4.1 header

if (!_mm_testz_si128(xor, xor))
{
    // rectangle has changed
}
Note that _mm_testz_si128 returns 1 if the bitwise AND of its two arguments is all zeros. Passing xor as both arguments therefore tests whether xor itself is zero.
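For completeness, here is a minimal sketch of the whole check as a function, assuming the two rectangles are already loaded into __m128i values. The names rect_changed_ptest, old_rect, new_rect and the TARGET_SSE41 macro are illustrative, not from the question. The difference is stored in diff rather than xor, because xor is an alternative operator token in C++ and cannot be used as a variable name there.

```c
#include <smmintrin.h>  /* SSE4.1: _mm_testz_si128 (also pulls in the SSE2 intrinsics) */

/* Assumption: on reasonably recent GCC/Clang this enables SSE4.1 for one
   function, so the file builds without -msse4.1. Other compilers (e.g. MSVC)
   need no attribute. */
#if defined(__GNUC__) || defined(__clang__)
#define TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define TARGET_SSE41
#endif

/* Hypothetical helper: returns 1 if the two rectangles differ, 0 if identical. */
TARGET_SSE41
static int rect_changed_ptest(__m128i old_rect, __m128i new_rect)
{
    /* Bits that differ between the two registers. */
    __m128i diff = _mm_xor_si128(old_rect, new_rect);

    /* _mm_testz_si128(diff, diff) is 1 only when diff is all zeros. */
    return !_mm_testz_si128(diff, diff);
}
```

The code still requires a CPU that supports SSE4.1 at run time.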
Ironically, the ptest instruction from SSE4.1 may be slower than pmovmskb from SSE2 in some cases. I suggest simply using:
__m128i cmp = _mm_cmpeq_epi32(oldRect, newRect);
if (_mm_movemask_epi8(cmp) != 0xFFFF)
{
    // registers are different
}
Note that if you really need that xor value, you'll have to compute it separately.
On Intel processors such as Ivy Bridge, the version by PaulR with xor and _mm_testz_si128 translates into 4 uops, while the suggested version, which does not compute xor, translates into 3 uops (see also this thread). This may give my version better throughput.
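As a self-contained illustration of the SSE2-only approach, here is a minimal sketch wrapped in a function. The name rect_changed_movemask and its parameter names are hypothetical. _mm_cmpeq_epi32 sets every bit of each 32-bit lane that matches, and _mm_movemask_epi8 collects the top bit of each of the 16 bytes, so the mask equals 0xFFFF only when all four lanes are equal.

```c
#include <emmintrin.h>  /* SSE2: _mm_cmpeq_epi32, _mm_movemask_epi8 */

/* Hypothetical helper: returns 1 if the two rectangles differ, 0 if identical. */
static int rect_changed_movemask(__m128i old_rect, __m128i new_rect)
{
    /* Each equal 32-bit lane becomes 0xFFFFFFFF, each unequal lane 0. */
    __m128i cmp = _mm_cmpeq_epi32(old_rect, new_rect);

    /* One mask bit per byte; all 16 bits are set only if every lane matched. */
    return _mm_movemask_epi8(cmp) != 0xFFFF;
}
```

Since SSE2 is part of the x86-64 baseline, this variant should need no extra compiler flags on 64-bit targets. Whether it actually beats the ptest version depends on the microarchitecture, so it is worth measuring on the hardware you care about.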