int vs const int&
A very common C++ habit, which I consider an anti-pattern, is to write const T& for a parameter as if it were just a smart way of saying T. However, a value and a reference (const or not) are two completely different things, and blindly using references in place of values can lead to subtle bugs.
The reason is that when dealing with references you must consider two issues that are not present with values: lifetime and aliasing.
One place where this anti-pattern is applied is the standard library itself: std::vector<T>::push_back accepts a const T& instead of a value as its parameter, and this can bite back, for example in code like:
std::vector<T> v;
...
if (v.size())
    v.push_back(v[0]); // Add first element also as last element
This code is a ticking bomb: std::vector::push_back takes a const reference, but the push_back itself may require a reallocation. If that happens, the reference it received is no longer valid after the reallocation (a lifetime issue) and you enter the realm of undefined behavior¹.
From a logical point of view, it would be much better in today's C++ to accept a value (i.e. void std::vector<T>::push_back(T x)) and then efficiently move that value into its final place inside the container. The caller can then use std::move if that is deemed important (note, however, that move construction did not exist in the original C++).
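The by-value idea can be tried out without touching the standard library. The following is only a sketch: push_back_by_value is a hypothetical free function (not a standard API) that takes its argument by value and then moves it into the vector, so the copy of v[0] is made at the call site, before any reallocation can happen.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of the standard library: a "sink" that
// takes its argument by value. The copy is made by the caller before the
// function body runs, so it cannot be invalidated by a reallocation.
template <typename T>
void push_back_by_value(std::vector<T>& v, T x)
{
    v.push_back(std::move(x)); // x is our own object; moving from it is safe
}

// Example use: add the first element also as the last element.
inline void example()
{
    std::vector<std::string> v{"first", "second"};
    if (!v.empty())
        push_back_by_value(v, v[0]); // v[0] is copied into x first
    assert(v.size() == 3);
    assert(v.back() == "first");
}
```

A caller that no longer needs its own object can write push_back_by_value(v, std::move(s)) and pay for moves only.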
Aliasing is the other source of subtle problems when const references are used in place of values. I've been bitten, for example, by code of this kind:
struct P2d
{
double x, y;
P2d(double x, double y) : x(x), y(y) {}
P2d& operator+=(const P2d& p) { x+=p.x; y+=p.y; return *this; }
P2d& operator-=(const P2d& p) { x-=p.x; y-=p.y; return *this; }
};
struct Rect
{
P2d tl, br;
Rect(const P2d& tl, const P2d& br) : tl(tl), br(br) {}
Rect& operator+=(const P2d& p) { tl+=p; br+=p; return *this; }
Rect& operator-=(const P2d& p) { tl-=p; br-=p; return *this; }
};
At first glance the code seems pretty safe: P2d is a two-dimensional point, Rect is a rectangle, and adding or subtracting a point translates the rectangle.
If, however, you try to translate the rectangle back to the origin by writing myrect -= myrect.tl; the code will not work, because the translation operator accepts a reference that (in this case) refers to a member of the same instance.
This means that after updating the top-left corner with tl -= p; the top-left will be (0, 0) as it should, but p will also become (0, 0) at the same time, because p is just a reference to the top-left member. The update of the bottom-right corner then translates it by (0, 0), hence doing basically nothing.
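One possible fix, sketched below, is to make the Rect operators take the offset by value. The structs are repeated from the example above with only that change; example() is a hypothetical helper added to show the behaviour.

```cpp
#include <cassert>

struct P2d
{
    double x, y;
    P2d(double x, double y) : x(x), y(y) {}
    P2d& operator+=(const P2d& p) { x += p.x; y += p.y; return *this; }
    P2d& operator-=(const P2d& p) { x -= p.x; y -= p.y; return *this; }
};

struct Rect
{
    P2d tl, br;
    Rect(const P2d& tl, const P2d& br) : tl(tl), br(br) {}
    // The offset is taken by value: p is a private copy, so updating tl
    // cannot change p, even when the caller passes this->tl.
    Rect& operator+=(P2d p) { tl += p; br += p; return *this; }
    Rect& operator-=(P2d p) { tl -= p; br -= p; return *this; }
};

// Example use: translate a rectangle back to the origin.
inline void example()
{
    Rect r(P2d(1, 2), P2d(4, 6));
    r -= r.tl;
    assert(r.tl.x == 0 && r.tl.y == 0);
    assert(r.br.x == 3 && r.br.y == 4); // the size (3 x 4) is preserved
}
```

Copying two doubles is cheap, so nothing meaningful is lost by giving up the reference here.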
Please don't be fooled into thinking that a const reference is like a value because of the word const. That word exists only to give you compile errors if you try to change the referenced object through that reference; it doesn't mean that the referenced object is constant. More specifically, the object referenced by a const ref can change (e.g. because of aliasing) and can even go out of existence while you are using it (lifetime issue).
In const T& the word const expresses a property of the reference, not of the referenced object: it's the property that makes it impossible to use the reference to change the object. Probably readonly would have been a better name, as const has IMO the psychological effect of pushing the idea that the object is going to be constant while you use the reference.
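A tiny sketch of this point, using a hypothetical function changed_under_const (the name and signature are mine, for illustration only): the function never writes through its const reference, yet the value seen through it changes when both parameters refer to the same object.

```cpp
#include <cassert>

// Hypothetical function: reads through a const reference, writes through
// a different (non-const) reference, then reads again.
inline bool changed_under_const(const int& watched, int& target)
{
    int before = watched;
    target = before + 1;      // never writes through `watched`
    return watched != before; // true only if the two references alias
}

// Example use: the same call behaves differently depending on aliasing.
inline void example()
{
    int a = 1, b = 2;
    assert(!changed_under_const(a, b)); // distinct objects: a is unchanged
    assert(changed_under_const(a, a));  // same object: the "const" view changed
    assert(a == 2);
}
```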
You can of course get impressive speedups by using references instead of copying values, especially for big classes. But you should always think about aliasing and lifetime issues when using references, because under the hood they're just pointers to other data. For "native" data types (ints, doubles, pointers), however, references are typically going to be slower than values and there's nothing to gain by using them.
A const reference also means problems for the optimizer: the compiler is forced to be paranoid, and every time any unknown code is executed it must assume that all referenced objects may now have a different value (const for a reference means absolutely NOTHING for the optimizer; that word is there only to help programmers - I'm personally not so sure it's such a big help, but that's another story).
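To make the "unknown code" case concrete, here is a sketch with two hypothetical functions of my own; a std::function callback stands in for code the compiler cannot see into. In the by-reference version the second read of r has to observe whatever the callback did, while in the by-value version the parameter is a private copy.

```cpp
#include <cassert>
#include <functional>

// By reference: r must be read again after the call, because the
// callback may have modified the object r refers to.
inline int sum_around_call_ref(const int& r, const std::function<void()>& opaque)
{
    int first = r;
    opaque();
    return first + r;
}

// By value: v is a private copy; nothing the callback does can change it.
inline int sum_around_call_val(int v, const std::function<void()>& opaque)
{
    int first = v;
    opaque();
    return first + v;
}

// Example use: the callback modifies the very object that was passed in.
inline void example()
{
    int g = 1;
    auto bump = [&g] { g = 10; };
    assert(sum_around_call_ref(g, bump) == 11); // 1 + 10
    g = 1;
    assert(sum_around_call_val(g, bump) == 2);  // 1 + 1
}
```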
(1) Apparently (https://stackoverflow.com/a/18794634/320726) the standard says that this case is valid, but even with this interpretation (with which I do not agree at all) the problem is still present in general. push_back doesn't care about the identity of the object, so it should have taken the argument by value. When you pass a const reference to a function, it's your responsibility to ensure that the referenced object stays alive for the full duration of the call. With v.push_back(v[0]) this is simply false if no capacity was reserved, and IMO (given the push_back signature) it is the caller's fault if that happens. The real logic bug, however, is the push_back interface design (done intentionally, sacrificing logical correctness on the altar of efficiency). I'm not sure if it was because of that defect report, but I saw a few compilers "fixing" the problem in this special case (i.e. push_back checks whether the element being pushed comes from the vector itself).
As Oli says, returning a const T& as opposed to a T is a completely different thing, and may break in certain situations (as in his example).
Taking a const T& as opposed to a plain T as an argument is less likely to break things, but still has several important differences.
- Taking T instead of const T& requires that T is copy-constructible.
- Taking T will invoke the copy constructor, which may be expensive (and also the destructor on function exit).
- Taking T allows you to modify the parameter as a local variable (can be faster than manually copying).
- Taking const T& could be slower due to misaligned temporaries and the cost of indirection.
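The third point in the list can be sketched like this. Both functions are hypothetical examples of mine: one takes the string by value and modifies the parameter in place, the other takes a const reference and has to make the copy by hand.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// By value: the parameter is already a private copy, so it can be
// modified in place and returned.
inline std::string upper_by_value(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}

// By const reference: the copy has to be made manually before modifying.
inline std::string upper_by_cref(const std::string& s)
{
    std::string copy = s;
    std::transform(copy.begin(), copy.end(), copy.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return copy;
}

// Example use: both give the same result and leave the caller's string alone.
inline void example()
{
    std::string name = "rect";
    assert(upper_by_value(name) == "RECT");
    assert(upper_by_cref(name) == "RECT");
    assert(name == "rect");
}
```

With the by-value version, a caller holding a temporary or using std::move can avoid the copy altogether, which the const-reference version cannot offer.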
int & and int are not interchangeable! In particular, if you return a reference to a local stack variable, the behaviour is undefined, e.g.:
int &func()
{
int x = 42;
return x;
}
You can return a reference to something that won't be destroyed at the end of the function (e.g. a static, or a class member). So this is valid:
int &func()
{
static int x = 42;
return x;
}
and to the outside world, has the same effect as returning the int
directly (except that you can now modify it, which is why you see const int &
a lot).
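A short sketch of that last remark, with hypothetical function names of my own: the non-const reference lets the caller overwrite the static, while a const int & view of the same object only allows reading.

```cpp
#include <cassert>

// Returns a reference to a function-local static, which lives until the
// program ends, so the reference never dangles.
inline int& counter()
{
    static int x = 42;
    return x;
}

// Same object, but callers cannot modify it through this reference.
inline const int& read_only_counter()
{
    return counter();
}

// Example use.
inline void example()
{
    assert(read_only_counter() == 42);
    counter() = 7;                    // allowed: non-const reference
    assert(read_only_counter() == 7); // both refer to the same int
    // read_only_counter() = 1;       // would not compile
}
```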
The advantage of the reference is that no copy is required, which is important if you're dealing with large class objects. However, in many cases, the compiler can optimize that away; see e.g. http://en.wikipedia.org/wiki/Return_value_optimization.
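As a rough illustration of that optimization, the sketch below counts copy constructions of a hypothetical type Big when it is returned by value. Since C++17 the copy is guaranteed to be elided in this form; with older standards a compiler may elide it or fall back to the move constructor, so the copy counter stays at zero in either case.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical "large" type that counts how often it is copied.
struct Big
{
    std::vector<int> data;
    static int copies; // number of copy constructions so far

    explicit Big(std::size_t n) : data(n, 0) {}
    Big(const Big& other) : data(other.data) { ++copies; }
    Big(Big&& other) noexcept : data(std::move(other.data)) {}
};
int Big::copies = 0;

// Returns a large object by value.
inline Big make_big()
{
    return Big(1000);
}

// Example use: no copy constructor call is observed.
inline void example()
{
    Big::copies = 0;
    Big b = make_big();
    assert(b.data.size() == 1000);
    assert(Big::copies == 0);
}
```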
If the callee and the caller are defined in separate compilation units, then the compiler generally cannot optimize away the reference (unless something like link-time optimization is in play). For example, I compiled the following code:
#include <ctime>
#include <iostream>
int test1(int i);
int test2(const int& i);
int main() {
int i = std::time(0);
int j = test1(i);
int k = test2(i);
std::cout << j + k << std::endl;
}
with G++ on 64-bit Linux at optimization level 3. The first call needs no access to main memory:
call time
movl %eax, %edi #1
movl %eax, 12(%rsp) #2
call _Z5test1i
leaq 12(%rsp), %rdi #3
movl %eax, %ebx
call _Z5test2RKi
Line #1 directly uses the return value in eax as the argument for test1 in edi. Lines #2 and #3 store the result in memory (on the stack) and place its address in the first argument register, because the parameter is declared as a reference to int and so it must be possible to, for example, take its address. Whether something can be computed entirely in registers or needs to access memory can make a great difference these days. So, apart from being more to type, const int& can also be slower. The rule of thumb is: pass all data that is at most as large as the word size by value, and everything else by reference to const. Also pass templated arguments by reference to const; since the compiler has access to the definition of the template, it can usually optimize the reference away.
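The rule of thumb can be written down as a small type alias. This is only a sketch: param_t is a hypothetical name of mine, and the extra "trivially copyable" condition is my own assumption, added so that small types with non-trivial copy constructors are still passed by reference.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical alias: T itself for small, trivially copyable types,
// const T& for everything else.
template <typename T>
using param_t = typename std::conditional<
    std::is_trivially_copyable<T>::value && sizeof(T) <= sizeof(void*),
    T, const T&>::type;

static_assert(std::is_same<param_t<int>, int>::value,
              "int is passed by value");
static_assert(std::is_same<param_t<std::string>, const std::string&>::value,
              "std::string is passed by reference to const");

// Two functions whose parameter types are chosen by the alias.
inline int twice(param_t<int> i) { return 2 * i; }
inline std::size_t length(param_t<std::string> s) { return s.size(); }

// Example use.
inline void example()
{
    assert(twice(21) == 42);
    assert(length(std::string("hello")) == 5);
}
```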