What does ordered / unordered comparison mean?
Short version: "unordered" is a relation two FP values can have. Scalar compares set FLAGS so you can check any condition you want afterwards (e.g. ucomisd xmm0, xmm1 / jp unordered), but SIMD compares need to encode the condition (the predicate) into the instruction, so it can be checked in parallel to produce a vector whose elements are each 0 or 0xFF... (all ones). There's nowhere to put a separate FLAGS result for each element.
The "Unordered" in FUCOM means it doesn't raise an FP "invalid" exception when the comparison result is unordered, while FCOM does. This is the same as the distinction between the OQ and OS cmpps predicates, not the "unordered" predicate. (See the "Signals #IA on QNAN" column in the cmppd docs in Intel's asm manuals. cmppd is alphabetically first and has the more complete docs, vs. cmpps / cmpss / cmpsd.)
(FP exceptions are masked by default so they don't cause the CPU to trap to a hardware exception handler, just set sticky flags in MXCSR, or the legacy x87 status word for x87 instructions.)
ORD and UNORD are two choices of predicate for the cmppd / cmpps / cmpss / cmpsd insns (full tables in the cmppd entry, which is alphabetically first). That html extract has readable table formatting, but Intel's official PDF original is somewhat better. (See the x86 tag wiki for links.)
Two floating point operands are ordered with respect to each other if neither is NaN. They're unordered if either is NaN. I.e. ordered = (x>y) | (x==y) | (x<y). That's right, with floating point it's possible for none of those things to be true. For more Floating Point madness, see Bruce Dawson's excellent series of articles.
cmpps takes a predicate and produces a vector of results, instead of doing a comparison between two scalars and setting flags so you can check any predicate you want after the fact. So it needs specific predicates for everything you can check.
The scalar equivalent is comiss / ucomiss to set ZF/PF/CF from the FP comparison result (which works like the x87 compare instructions (see the last section of this answer), but on the low element of XMM regs).
To check for unordered, look at PF. If the comparison is ordered, you can look at the other flags to see whether the operands were greater, equal, or less (using the same conditions as for unsigned integers, like jae for Above or Equal).
The COMISS instruction differs from the UCOMISS instruction in that it signals a SIMD floating-point invalid operation exception (#I) when a source operand is either a QNaN or SNaN. The UCOMISS instruction signals an invalid numeric exception only if a source operand is an SNaN.
(SNaN is not naturally occurring; operations like sqrt(-1) or inf - inf will produce QNaN if exceptions are masked, else trap and not produce a result.)
Normally FP exceptions are masked, so this doesn't actually interrupt your program; it just sets the bit in the MXCSR which you can check later.
This is the same as the O/UQ vs. O/US flavours of predicate for cmpps / vcmpps. The AVX versions of the cmp[ps][sd] instructions have an expanded choice of predicates, so they needed a naming convention to keep track of them.
The O vs. U tells you whether the predicate is true when the operands are unordered.
The Q vs. S tells you whether #I will be raised if either operand is a Quiet NaN. #I will always be raised if either operand is a Signalling NaN, but those are not "naturally occurring". You don't get them as outputs from other operations, only by creating the bit pattern yourself (e.g. as an error-return value from a function, to ensure detection of problems later).
The x87 equivalent is using fcom or fucom to set the FPU status word -> fstsw ax -> sahf, or preferably fucomi to set EFLAGS directly like ucomiss.
The U / non-U distinction is the same with x87 instructions as for comiss / ucomiss.
You can understand the meaning of 'ordered CC' and 'unordered CC' through LLVM's CC definition, where 'CC' means CondCode. In 'llvm/include/llvm/CodeGen/ISDOpcodes.h' (my source code version is llvm-10.0.1), you can see the CondCode enum as below:
enum CondCode {
  // Opcode     N U L G E    Intuitive operation
  SETFALSE,  //   0 0 0 0    Always false (always folded)
  SETOEQ,    //   0 0 0 1    True if ordered and equal
  SETOGT,    //   0 0 1 0    True if ordered and greater than
  SETOGE,    //   0 0 1 1    True if ordered and greater than or equal
  SETOLT,    //   0 1 0 0    True if ordered and less than
  SETOLE,    //   0 1 0 1    True if ordered and less than or equal
  SETONE,    //   0 1 1 0    True if ordered and operands are unequal
  SETO,      //   0 1 1 1    True if ordered (no nans)
  SETUO,     //   1 0 0 0    True if unordered: isnan(X) | isnan(Y)
  SETUEQ,    //   1 0 0 1    True if unordered or equal
  SETUGT,    //   1 0 1 0    True if unordered or greater than
  SETUGE,    //   1 0 1 1    True if unordered, greater than, or equal
  SETULT,    //   1 1 0 0    True if unordered or less than
  SETULE,    //   1 1 0 1    True if unordered, less than, or equal
  SETUNE,    //   1 1 1 0    True if unordered or not equal
  SETTRUE,   //   1 1 1 1    Always true (always folded)
  // Don't care operations: undefined if the input is a nan.
  SETFALSE2, // 1 X 0 0 0    Always false (always folded)
  SETEQ,     // 1 X 0 0 1    True if equal
  SETGT,     // 1 X 0 1 0    True if greater than
  SETGE,     // 1 X 0 1 1    True if greater than or equal
  SETLT,     // 1 X 1 0 0    True if less than
  SETLE,     // 1 X 1 0 1    True if less than or equal
  SETNE,     // 1 X 1 1 0    True if not equal
  SETTRUE2,  // 1 X 1 1 1    Always true (always folded)
  SETCC_INVALID // Marker value.
};
That means: for floating-point comparison conditions, 'ordered CC' means 'ordered & CC', while 'unordered CC' means 'unordered | CC'.
In other words, in floating-point comparison, where NaN is 'Not A Number',
- 'ordered CC' returns true if: 'both operands are not NaN' AND 'CC is true'
- 'unordered CC' returns true if: 'one or more operands are NaN' OR 'CC is true'
You can also see that 'ordered CC' is exactly the negation of 'unordered !CC'.
An ordered comparison checks if neither operand is NaN. Conversely, an unordered comparison checks if either operand is a NaN.
This page gives some more information on this:
- http://csapp.cs.cmu.edu/public/waside/waside-sse.pdf (section 5)
The idea here is that comparisons with NaN are indeterminate (the result can't be decided). So an ordered/unordered comparison checks whether this is (or isn't) the case.
#include <emmintrin.h> // SSE2 intrinsics
#include <iostream>
using namespace std;

int main() {
    double a = 0.;
    double b = 0.;
    __m128d x = _mm_set1_pd(a / b); // NaN (0/0 evaluated at run time)
    __m128d y = _mm_set1_pd(1.0); // 1.0
    __m128d z = _mm_set1_pd(1.0); // 1.0
    __m128d c0 = _mm_cmpord_pd(x,y); // NaN vs. 1.0
    __m128d c1 = _mm_cmpunord_pd(x,y); // NaN vs. 1.0
    __m128d c2 = _mm_cmpord_pd(y,z); // 1.0 vs. 1.0
    __m128d c3 = _mm_cmpunord_pd(y,z); // 1.0 vs. 1.0
    __m128d c4 = _mm_cmpord_pd(x,x); // NaN vs. NaN
    __m128d c5 = _mm_cmpunord_pd(x,x); // NaN vs. NaN
    // Print the low 64-bit element of each mask: all-ones prints as -1.
    // m128i_i64 is an MSVC-specific member; with GCC/Clang on x86-64,
    // _mm_cvtsi128_si64() reads the low element instead.
    cout << _mm_castpd_si128(c0).m128i_i64[0] << endl;
    cout << _mm_castpd_si128(c1).m128i_i64[0] << endl;
    cout << _mm_castpd_si128(c2).m128i_i64[0] << endl;
    cout << _mm_castpd_si128(c3).m128i_i64[0] << endl;
    cout << _mm_castpd_si128(c4).m128i_i64[0] << endl;
    cout << _mm_castpd_si128(c5).m128i_i64[0] << endl;
}
Result:
0
-1
-1
0
0
-1
Ordered comparison returns true if the operands are comparable (neither number is NaN):
- Ordered comparison of 1.0 and 1.0 gives true.
- Ordered comparison of NaN and 1.0 gives false.
- Ordered comparison of NaN and NaN gives false.
Unordered comparison is the exact opposite:
- Unordered comparison of 1.0 and 1.0 gives false.
- Unordered comparison of NaN and 1.0 gives true.
- Unordered comparison of NaN and NaN gives true.
This Intel guide: http://intel80386.com/simd/mmx2-doc.html contains examples of the two which are fairly straight-forward:
CMPORDPS Compare Ordered Parallel Scalars
Opcode        Cycles   Instruction
0F C2 .. 07   2 (3)    CMPORDPS xmm reg,xmm reg/mem128
CMPORDPS op1, op2
op1 contains 4 single precision 32-bit floating point values
op2 contains 4 single precision 32-bit floating point values
op1[0] = (op1[0] != NaN) && (op2[0] != NaN)
op1[1] = (op1[1] != NaN) && (op2[1] != NaN)
op1[2] = (op1[2] != NaN) && (op2[2] != NaN)
op1[3] = (op1[3] != NaN) && (op2[3] != NaN)
TRUE = 0xFFFFFFFF
FALSE = 0x00000000
CMPUNORDPS Compare Unordered Parallel Scalars
Opcode        Cycles   Instruction
0F C2 .. 03   2 (3)    CMPUNORDPS xmm reg,xmm reg/mem128
CMPUNORDPS op1, op2
op1 contains 4 single precision 32-bit floating point values
op2 contains 4 single precision 32-bit floating point values
op1[0] = (op1[0] == NaN) || (op2[0] == NaN)
op1[1] = (op1[1] == NaN) || (op2[1] == NaN)
op1[2] = (op1[2] == NaN) || (op2[2] == NaN)
op1[3] = (op1[3] == NaN) || (op2[3] == NaN)
TRUE = 0xFFFFFFFF
FALSE = 0x00000000
The difference is AND (ordered) vs OR (unordered).