How does the caller of a function know whether Return Value Optimization was used?
There's no change in the interface. In all cases, the result of the function must appear in the scope of the caller; typically, the compiler uses a hidden pointer. The only difference is that when RVO is used, as in your first case, the compiler will "merge" x and this return value, constructing x at the address given by the pointer; when it is not used, the compiler will generate a call to the copy constructor in the return statement, to copy whatever is being returned into this return value.
I might add that your second example is not very close to what actually happens. At the call site, you almost always get something like:
<raw memory for string> s;
f( &s );
And the called function will either construct a local variable or temporary directly at the address it was passed, or copy construct some other value at this address. So in your last example, the return statement would be more or less the equivalent of:
if ( cont ) {
std::string::string( s, first );
} else {
std::string::string( s, second );
}
(Showing the implicit this pointer passed to the copy constructor.) In the first case, if RVO applies, the special code would be in the constructor of x:
std::string::string( s, "hi" );
and then replacing x with *s everywhere else in the function (and doing nothing at the return).
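To make the hidden pointer concrete, here is a hand-written sketch of roughly what the RVO case amounts to. The names f_lowered and call_site are made up for illustration, and a real compiler does this at the ABI level rather than with placement new, so treat it as an analogy and not as actual generated code:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical lowering of: std::string f() { std::string x("hi"); return x; }
// `ret` plays the role of the hidden pointer to the caller's raw memory.
std::string* f_lowered(void* ret) {
    // With RVO, x *is* the return value: construct it at the address given.
    return ::new (ret) std::string("hi");
    // Nothing left to do at the return: the object is already in place.
}

// Hypothetical lowering of the call site: std::string s = f();
std::size_t call_site() {
    // <raw memory for string> s;
    alignas(std::string) unsigned char raw[sizeof(std::string)];
    // f( &s );
    std::string* s = f_lowered(raw);
    std::size_t n = s->size();
    // The caller owns the returned object, so the caller destroys it.
    s->~basic_string();
    return n;
}
```

Without RVO, f_lowered would instead build x somewhere in its own frame and then copy (or move) construct a string at ret in the return statement.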
Let's play with NRVO, RVO and copy elision!
Here is a type:
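#include <iostream>
struct Verbose {
  // Declaring a copy constructor suppresses the implicit default constructor,
  // so it has to be written out for Verbose() and return {} to compile.
  Verbose() { std::cout << "default ctor\n"; }
  Verbose( Verbose const& ){ std::cout << "copy ctor\n"; }
  Verbose( Verbose && ){ std::cout << "move ctor\n"; }
  Verbose& operator=( Verbose const& ){ std::cout << "copy asgn\n"; return *this; }
  Verbose& operator=( Verbose && ){ std::cout << "move asgn\n"; return *this; }
};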
that is pretty verbose.
Here is a function:
Verbose simple() { return {}; }
that is pretty simple, and uses direct construction of its return value. Even if Verbose lacked a copy or move constructor, the above function would still work!
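As a quick check of that claim, here is a small sketch with a hypothetical type Pinned (not part of the original example) whose copy and move constructors are deleted. Returning {} still compiles, because the return object is list-initialized directly and nothing is copied or moved:

```cpp
#include <cassert>

// Hypothetical type that can be neither copied nor moved.
struct Pinned {
    int value;
    Pinned() : value(42) {}
    Pinned(const Pinned&) = delete;
    Pinned(Pinned&&) = delete;
};

// `return {};` copy-list-initializes the return object in place,
// so no copy or move constructor is needed.
Pinned make_pinned() { return {}; }

int read_pinned() {
    // Binding a reference extends the lifetime of the returned object
    // and avoids needing a copy or move at the call site, even before C++17.
    const Pinned& p = make_pinned();
    return p.value;
}
```

Writing `return Pinned();` in make_pinned would be rejected before C++17, since that form nominally needs an accessible copy or move constructor.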
Here is a function that uses RVO:
Verbose simple_RVO() { return Verbose(); }
Here the unnamed Verbose() temporary object is being told to copy itself to the return value. RVO means that the compiler can skip that copy and directly construct Verbose() into the return value. Before C++17 this is allowed only if there is an accessible copy or move constructor; that constructor is not called, but rather elided. Since C++17 this particular elision is guaranteed, and no copy or move constructor is required.
Here is a function that uses NRVO:
Verbose simple_NRVO() {
Verbose retval;
return retval;
}
For NRVO to occur, every path must return the exact same object, and you can't be sneaky about it (if you cast the return value to a reference, then return that reference, that will block NRVO). In this case, what the compiler does is construct the named object retval directly into the return value location. Similar to RVO, a copy or move constructor must exist, but is not called.
Here is a function that fails to use NRVO:
Verbose simple_no_NRVO(bool b) {
Verbose retval1;
Verbose retval2;
if (b)
return retval1;
else
return retval2;
}
As there are two possible named objects it could return, it cannot construct both of them in the return value location, so it must do an actual copy. In C++11 and later, the object returned will be implicitly moved instead of copied, as it is a local variable being returned from a function in a simple return statement. So there is at least that.
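If you would rather count than read console output, a sketch along these lines makes the difference measurable. Counted, Tally and the measure_* helpers are hypothetical names introduced here; the exact move count for the single-candidate case depends on whether your compiler actually performs NRVO, so no particular number is promised for it:

```cpp
#include <cassert>

// Hypothetical stand-in for Verbose that counts instead of printing.
struct Counted {
    static int copies;
    static int moves;
    Counted() {}
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// One named object on every path: a candidate for NRVO.
Counted one_candidate() {
    Counted retval;
    return retval;
}

// Two named objects: NRVO is not feasible, so the returned one is moved.
Counted two_candidates(bool b) {
    Counted retval1;
    Counted retval2;
    if (b)
        return retval1;
    else
        return retval2;
}

struct Tally { int copies; int moves; };

Tally measure_one_candidate() {
    Counted::copies = Counted::moves = 0;
    Counted c = one_candidate();
    (void)c;  // only the constructor calls matter here
    Tally t = { Counted::copies, Counted::moves };
    return t;
}

Tally measure_two_candidates(bool b) {
    Counted::copies = Counted::moves = 0;
    Counted c = two_candidates(b);
    (void)c;
    Tally t = { Counted::copies, Counted::moves };
    return t;
}
```

In both cases the copy count should stay at zero; the two-candidate version should report at least one move, while the single-candidate version typically reports none when NRVO kicks in.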
Finally, there is copy elision at the other end:
Verbose v = simple(); // or simple_RVO, or simple_NRVO, or...
When you call a function, you provide it with its arguments, and you inform it where it should put its return value. The caller is responsible for cleaning up the return value and allocating the memory (on the stack) for it.
This communication is done in some way via the calling convention, often implicitly (i.e., via the stack pointer).
Under many calling conventions, the location where the return value can be stored can end up being used as a local variable.
In general, if you have a variable of the form:
Verbose v = Verbose();
the implied copy can be elided -- Verbose() is constructed directly in v, rather than a temporary being created then copied to v. In the same way, the copy of the return value of simple (or simple_NRVO, or whatever) can be elided if the run time model of the compiler supports it (and it usually does).
Basically, the calling site can tell simple_* to put the return value in a particular spot, and simply treat that spot as the local variable v.
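One way to observe this from the outside is to have the object remember the address it was constructed at and compare that with the address of the caller's variable. Tracker, make_tracker and built_in_place are hypothetical names for this sketch. The check is guaranteed to succeed only from C++17 on; with earlier standards it relies on the compiler choosing to elide, which mainstream compilers do by default:

```cpp
#include <cassert>

// Hypothetical type that records where it was originally constructed.
struct Tracker {
    const Tracker* born_at;
    Tracker() : born_at(this) {}
    // A copy keeps the *original* address, so a real copy is detectable.
    Tracker(const Tracker& other) : born_at(other.born_at) {}
};

Tracker make_tracker() { return Tracker(); }

bool built_in_place() {
    Tracker v = make_tracker();
    // If every copy was elided, the object built inside make_tracker
    // is v itself, so the recorded address matches &v.
    return v.born_at == &v;
}
```

If a copy had actually been made anywhere along the way, born_at would still point at the temporary and the comparison would come out false.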
Note that NRVO and RVO and implicit move are all done within the function, and the caller needs to know nothing about them.
Similarly, the eliding at the calling site is all done outside the function, and if the calling convention supports it you do not need any support from the body of the function.
This doesn't have to be true in every calling convention and run time model, so the C++ standard makes these optimizations optional.