Do std::min(0.0, 1.0) and std::max(0.0, 1.0) yield undefined behavior?
In the new [concepts.equality], in a slightly different context, we have:
An expression is equality-preserving if, given equal inputs, the expression results in equal outputs. The inputs to an expression are the set of the expression's operands. The output of an expression is the expression's result and all operands modified by the expression.
Not all input values need be valid for a given expression; e.g., for integers a and b, the expression a / b is not well-defined when b is 0. This does not preclude the expression a / b being equality-preserving. The domain of an expression is the set of input values for which the expression is required to be well-defined.
While this notion of the domain of an expression isn't completely expressed throughout the standard, this is the only reasonable intent: syntactic requirements are properties of the type, semantic requirements are properties of the actual values.
More generally, we also have [structure.requirements]/8:
Required operations of any concept defined in this document need not be total functions; that is, some arguments to a required operation may result in the required semantics failing to be satisfied. [ Example: The required < operator of the StrictTotallyOrdered concept ([concept.stricttotallyordered]) does not meet the semantic requirements of that concept when operating on NaNs. — end example ] This does not affect whether a type satisfies the concept.
This refers specifically to concepts, not to named requirements like Cpp17LessThanComparable, but this is the right spirit for understanding how the library is intended to work.
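A small illustration of that split between syntactic and semantic requirements, using the C++20 standard-library concept in place of the StrictTotallyOrdered wording quoted above (a sketch, not part of the quoted text):

#include <concepts>

// double satisfies the syntactic requirements of std::totally_ordered,
// so this static_assert passes...
static_assert(std::totally_ordered<double>);
// ...even though NaN values break the semantic requirements at runtime.
// Satisfying the concept is a property of the type; modeling it depends
// on the values actually given to the required expressions.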
When Cpp17LessThanComparable gives the semantic requirement that

< is a strict weak ordering relation (24.7)

the only way for this to be violated is to provide a pair of values which violate the requirements of a strict weak ordering. For a type like double, that would be NaN: min(1.0, NaN) is undefined behavior - we're violating the semantic requirements of the algorithm. But for floating-point values without NaN, < is a strict weak ordering - so that's fine... you can use min, max, sort, all you like.
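For instance, as long as no NaN can reach the comparisons, the usual algorithms are on solid ground (a minimal sketch):

#include <algorithm>
#include <cassert>
#include <vector>

int main()
{
    std::vector<double> v{3.5, -1.0, 2.25};   // no NaNs: < is a strict weak ordering on these values
    std::sort(v.begin(), v.end());            // well-defined
    assert(std::min(0.0, 1.0) == 0.0);        // both arguments are in the domain of <
    assert(std::max(0.0, 1.0) == 1.0);
}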
Going forward, when we start writing algorithms that use operator<=>, this notion of domain is one reason that expressing a syntactic requirement of ConvertibleTo<decltype(x <=> y), weak_ordering> would be the wrong requirement. Having x <=> y be partial_ordering is fine; it's just that seeing a pair of values for which x <=> y is partial_ordering::unordered is not (which at least we could diagnose, via [[ assert: (x <=> y) != partial_ordering::unordered ]];).
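Since the [[ assert: ... ]] contract syntax isn't actually in the standard, here is a rough C++20 equivalent of that runtime check (the function name is mine):

#include <compare>

// For doubles, x <=> y has type std::partial_ordering. The type itself is fine;
// what breaks the algorithm's requirements is an unordered result, i.e. a NaN operand.
bool comparable(double x, double y)
{
    return (x <=> y) != std::partial_ordering::unordered;
}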
Disclaimer: I do not know the full C++ standard; I did a bit of research about what it says about floats. I do know about IEEE 754-2008 floating-point numbers and C++.
Yes, you are right, this is undefined behavior by the C++17 standard.
Short read:
The standard does not say that std::min(0.0, 1.0); is undefined behavior; it says that constexpr const double& min(const double& a, const double& b); is undefined behavior. That is, it's not a particular application of the function that is undefined, it's the function declaration itself (for double) that is undefined. Which is the case mathematically: a minimum function isn't possible on the full range of IEEE 754 floating-point numbers, as you have noted.
But undefined behavior does not necessarily mean a crash or a compilation error. It just means the behavior is not defined by the C++ standard, which specifically says it may range up to "behaving during translation or program execution in a documented manner characteristic of the environment".
Why you shouldn't use std::min on doubles:
Because I realize the following long read section can get boring, here is a toy example of the risk of NaNs inside comparisons (I don't even try sorting algorithms…):
#include <iostream>
#include <cmath>
#include <algorithm>
int main(int, char**)
{
double one = 1.0, zero = 0.0, nan = std::nan("");
std::cout << "std::min(1.0, NaN) : " << std::min(one, nan) << std::endl;
std::cout << "std::min(NaN, 1.0) : " << std::min(nan, one) << std::endl;
std::cout << "std::min_element(1.0, 0.0, NaN) : " << std::min({one, zero, nan}) << std::endl;
std::cout << "std::min_element(NaN, 1.0, 0.0) : " << std::min({nan, one, zero}) << std::endl;
std::cout << "std::min(0.0, -0.0) : " << std::min(zero, -zero) << std::endl;
std::cout << "std::min(-0.0, 0.0) : " << std::min(-zero, zero) << std::endl;
}
When compiling on my MacBook Pro with Apple LLVM version 10.0.0 (clang-1000.10.44.4) (I give the exact version because, well, this is undefined behavior, so this may in theory give different results on other compilers) I get:
$ g++ --std=c++17 ./test.cpp
$ ./a.out
std::min(1.0, NaN) : 1
std::min(NaN, 1.0) : nan
std::min({1.0, 0.0, NaN}) : 0
std::min({NaN, 1.0, 0.0}) : nan
std::min(0.0, -0.0) : 0
std::min(-0.0, 0.0) : -0
Which means that, contrary to what you might assume, std::min is not symmetric when NaNs are involved, or even when -0.0 is involved. And NaNs do not propagate. Short story: this caused me some pain on a previous project, where I had to implement my own min function to correctly propagate NaNs from either side, as the project specification required. Because std::min on doubles is not defined!
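For the curious, a minimal sketch of the kind of replacement I mean (the name and exact behavior are illustrative, not the project's actual code):

#include <cmath>
#include <limits>

// Propagate NaN from either argument; otherwise behave like an ordinary minimum.
double nan_propagating_min(double a, double b)
{
    if (std::isnan(a) || std::isnan(b))
        return std::numeric_limits<double>::quiet_NaN();
    return b < a ? b : a;
}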
The IEEE 754:
As you have noted, IEEE 754 floating-point numbers (or ISO/IEC/IEEE 60559:2011-06, which is the standard used by the C11 standard, see below, and which more or less copies IEEE 754 for the C language) do not have a strict weak ordering, because NaNs violate the transitivity of incomparability (the fourth point on the Wikipedia page).
The fun part is that the IEEE 754 standard was revised in 2008 (now named IEEE 754-2008), and that revision includes a total ordering function. The fact is that both C++17 and C11 do not implement IEEE 754-2008, but rather ISO/IEC/IEEE 60559:2011-06.
But who knows? Maybe that would change in the future.
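(For what it's worth, C++20 did eventually add std::strong_order, which for IEC 559 floating-point types is specified to be consistent with that totalOrder operation. A small sketch of what it gives you:)

#include <cmath>
#include <compare>
#include <iostream>

int main()
{
    double nan = std::nan("");   // typically a quiet NaN with positive sign
    // std::strong_order is a total order over all double values, NaNs included.
    std::cout << (std::strong_order(nan, 1.0) > 0) << '\n';   // positive NaN orders after 1.0
    std::cout << (std::strong_order(-0.0, 0.0) < 0) << '\n';  // -0.0 orders before +0.0
}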
Long read:
First, let's start by recalling what undefined behavior actually is, from the same standard draft you linked (emphasis is mine):
undefined behavior: behavior for which this document imposes no requirements
[Note 1 to entry: Undefined behavior may be expected when this document omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed. Evaluation of a constant expression never exhibits behavior explicitly specified as undefined in Clause 4 through Clause 14 of this document (7.7). —end note]
There is no such thing as "yielding" undefined behavior. It is simply something that is not defined by the C++ standard. Which may mean you can use it and get a correct result at your own risk (as when doing std::min(0.0, 1.0);). Or it might raise warnings or even compilation errors, if you find a compiler that is really careful about floating-point numbers!
About the subset… You say:
I do not see anything in the Standard (which is quite big, 1719 pages, and hence this question and the language-lawyer tag) that mathematically leads to the conclusion that std::min is fine with doubles provided that NaNs are not involved.
I have not read the whole standard myself either, but from the parts you posted, it seems the standard already says this is fine. I mean, if you construct a new type T that wraps doubles while excluding the NaNs, then the definition of template<class T> constexpr const T& min(const T& a, const T& b); applied to your new type has defined behavior, and behaves exactly like you would expect from a minimum function.
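A minimal sketch of such a wrapper (NonNanDouble is a hypothetical name, just to make the point concrete):

#include <algorithm>
#include <cassert>
#include <cmath>

// Invariant: the stored value is never NaN, so operator< is a strict weak
// ordering over all values of this type, and std::min / std::max / std::sort
// are well-defined for it.
class NonNanDouble {
    double value;
public:
    explicit NonNanDouble(double v) : value(v) { assert(!std::isnan(v)); }
    double get() const { return value; }
    friend bool operator<(NonNanDouble a, NonNanDouble b) { return a.value < b.value; }
};

int main()
{
    NonNanDouble a{0.0}, b{1.0};
    assert(std::min(a, b).get() == 0.0);   // defined behavior: NaN is excluded by construction
}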
We could also look at what the standard says about the operation < on double, which it addresses in section 25.8, Mathematical functions for floating-point types, with the not-really-helpful:
The classification / comparison functions behave the same as the C macros with the corresponding names defined in the C standard library. Each function is overloaded for the three floating-point types. See also: ISO C 7.12.3, 7.12.4
What does the C11 standard say? (Because I guess C++17 does not use C18.)
The relational and equality operators support the usual mathematical relationships between numeric values. For any ordered pair of numeric values exactly one of the relationships — less, greater, and equal — is true. Relational operators may raise the ‘‘invalid’’ floating-point exception when argument values are NaNs. For a NaN and a numeric value, or for two NaNs, just the unordered relationship is true.241)
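Concretely, with a NaN operand every ordered relation is false, which is exactly what trips up comparison-based algorithms:

#include <cassert>
#include <cmath>

int main()
{
    double nan = std::nan("");
    assert(!(nan < 1.0));    // "less" does not hold
    assert(!(nan > 1.0));    // "greater" does not hold
    assert(!(nan == nan));   // "equal" does not hold either
    assert(nan != nan);      // only the unordered relationship is true
}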
As for the floating-point standard C11 uses, it is specified in Annex F of that standard:
This annex specifies C language support for the IEC 60559 floating-point standard. The IEC 60559 floating-point standard is specifically Binary floating-point arithmetic for microprocessor systems, second edition (IEC 60559:1989), previously designated IEC 559:1989 and as IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE 754−1985). IEEE Standard for Radix-Independent Floating-Point Arithmetic (ANSI/IEEE854−1987) generalizes the binary standard to remove dependencies on radix and word length. IEC 60559 generally refers to the floating-point standard, as in IEC 60559 operation, IEC 60559 format, etc.
The only possible (not just plausible) interpretation is that the equations apply to values in the range of the function; that is, to values actually used in the algorithms.
You might think of a type as defining a set of values, but for user-defined types that wouldn't make sense anyway. Your interpretation of the range as being every possible value of a type is patently absurd.
There is no problem here.
There might exist a very serious problem in implementations where a value of a floating-point object can carry more precision than the type allows, as the whole idea of a mathematical value of a floating-point type loses all meaning: the compiler may decide to change the value of a floating-point object to remove that extra precision at any time. In fact no semantics can be defined in that case. Any such implementation is broken, and any program probably works only by accident.
EDIT:
A type doesn't define a set of values for an algorithm. This is obvious for user data types that have internal invariants that aren't formally specified in any code.
The set of values usable with any container or algorithm (containers internally use algorithms on elements)... is a property of that particular use of that container or algorithm. These library components don't share their elements: if you have two set<fraction> instances S1 and S2, their elements won't be used by the other one: S1 will compare elements in S1, S2 will compare elements in S2. The two sets exist in different "universes" and their logical properties are isolated. The invariants hold for each one independently; if you insert into S2 an element x2 that is neither less than nor greater than x1 in S1 (and is thus considered equivalent to it), you don't expect x2 to be found in place of x1 in S1! There is no possible sharing of data structures between containers, and elements can't be shared between algorithms (which can't have static variables of a template type, as those would have unexpected lifetimes).
Sometimes the standard is a riddle where you must find the correct interpretation (most plausible, most useful, most likely to have been intended); if the committee members are asked to clarify an issue, they will settle on the most X interpretation (X = plausible, useful...) even if it contradicts the exact previous wording, so when the text is obscure or gives crazy conclusions, you might as well skip the literal reading and jump to the most useful one.
The only solution here is that every use of a templated library component is independent and that the equations only have to hold during that use.
You don't expect vector<int*> to be invalid just because pointers can have invalid values that can't be copied: only a use of such a value is illegal.
Thus
vector<int*> v;
v.push_back(new int);
vector<int*> v2 = v;  // content must be valid
delete v[0];
v[0] = nullptr;       // during the v[0] invocation, the stored int* has no valid value
is valid because the required properties of the element type hold for the short duration during which they are required to hold.
In that case we can invoke a member function of the vector knowing that its elements don't respect the Assignable concept, because no assignment is allowed there (the no-exception guarantee doesn't permit it): the value stored in v[0] cannot be used by v[0], as no user-defined operation on the element is allowed inside vector<>::operator[].
The library components can only use the specific operations mentioned in the description of the specific function, on the values used in that invocation; even for a built-in type, they cannot manufacture values in any other way: a specific set<int,comp> instance may not compare values to 0 if 0 is never inserted or looked up in that particular instance, as 0 may not even be in the domain of comp.
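A small sketch of that last point (NonZeroLess is a hypothetical comparator whose domain excludes 0):

#include <cassert>
#include <set>

struct NonZeroLess {
    bool operator()(int a, int b) const
    {
        assert(a != 0 && b != 0);   // 0 is outside this comparator's domain
        return a < b;
    }
};

int main()
{
    std::set<int, NonZeroLess> s1{1, 2, 3};   // fine: this particular use never compares 0
    std::set<int, NonZeroLess> s2{-5, 7};     // an independent use with its own values
    // s1.count(0);   // would hand 0 to the comparator, stepping outside its domain
}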
So built-in and class types are treated uniformly here. The library implementation cannot assume anything about the set of values even when instantiated with built-in types.