How are references implemented internally?
Just to repeat some of the stuff everyone's been saying, lets look at some compiler output:
#include <stdio.h>
#include <stdlib.h>
int byref(int & foo)
{
printf("%d\n", foo);
}
int byptr(int * foo)
{
printf("%d\n", *foo);
}
int main(int argc, char **argv) {
int aFoo = 5;
byref(aFoo);
byptr(&aFoo);
}
We can compile this with LLVM (with optimizations turned off) and we get the following:
define i32 @_Z5byrefRi(i32* %foo) {
entry:
%foo_addr = alloca i32* ; <i32**> [#uses=2]
%retval = alloca i32 ; <i32*> [#uses=1]
%"alloca point" = bitcast i32 0 to i32 ; <i32> [#uses=0]
store i32* %foo, i32** %foo_addr
%0 = load i32** %foo_addr, align 8 ; <i32*> [#uses=1]
%1 = load i32* %0, align 4 ; <i32> [#uses=1]
%2 = call i32 (i8*, ...)* @printf(i8* noalias getelementptr inbounds ([4 x i8]* @.str, i64 0, i64 0), i32 %1) ; <i32> [#uses=0]
br label %return
return: ; preds = %entry
%retval1 = load i32* %retval ; <i32> [#uses=1]
ret i32 %retval1
}
define i32 @_Z5byptrPi(i32* %foo) {
entry:
%foo_addr = alloca i32* ; <i32**> [#uses=2]
%retval = alloca i32 ; <i32*> [#uses=1]
%"alloca point" = bitcast i32 0 to i32 ; <i32> [#uses=0]
store i32* %foo, i32** %foo_addr
%0 = load i32** %foo_addr, align 8 ; <i32*> [#uses=1]
%1 = load i32* %0, align 4 ; <i32> [#uses=1]
%2 = call i32 (i8*, ...)* @printf(i8* noalias getelementptr inbounds ([4 x i8]* @.str, i64 0, i64 0), i32 %1) ; <i32> [#uses=0]
br label %return
return: ; preds = %entry
%retval1 = load i32* %retval ; <i32> [#uses=1]
ret i32 %retval1
}
The bodies of both functions are identical
Sorry for using assembly to explain this but I think this is the best way to understand how references are implemented by compilers.
#include <iostream>
using namespace std;
int main()
{
int i = 10;
int *ptrToI = &i;
int &refToI = i;
cout << "i = " << i << "\n";
cout << "&i = " << &i << "\n";
cout << "ptrToI = " << ptrToI << "\n";
cout << "*ptrToI = " << *ptrToI << "\n";
cout << "&ptrToI = " << &ptrToI << "\n";
cout << "refToNum = " << refToI << "\n";
//cout << "*refToNum = " << *refToI << "\n";
cout << "&refToNum = " << &refToI << "\n";
return 0;
}
Output of this code is like this
i = 10
&i = 0xbf9e52f8
ptrToI = 0xbf9e52f8
*ptrToI = 10
&ptrToI = 0xbf9e52f4
refToNum = 10
&refToNum = 0xbf9e52f8
Lets look at the disassembly(I used GDB for this. 8,9 and 10 here are line numbers of code)
8 int i = 10;
0x08048698 <main()+18>: movl $0xa,-0x10(%ebp)
Here $0xa
is the 10(decimal) that we are assigning to i
. -0x10(%ebp)
here means content of ebp register
–16(decimal).
-0x10(%ebp)
points to the address of i
on stack.
9 int *ptrToI = &i;
0x0804869f <main()+25>: lea -0x10(%ebp),%eax
0x080486a2 <main()+28>: mov %eax,-0x14(%ebp)
Assign address of i
to ptrToI
. ptrToI
is again on stack located at address -0x14(%ebp)
, that is ebp
– 20(decimal).
10 int &refToI = i;
0x080486a5 <main()+31>: lea -0x10(%ebp),%eax
0x080486a8 <main()+34>: mov %eax,-0xc(%ebp)
Now here is the catch! Compare disassembly of line 9 and 10 and you will observer that ,-0x14(%ebp)
is replaced by -0xc(%ebp)
in line number 10. -0xc(%ebp)
is the address of refToNum
. It is allocated on stack. But you will never be able to get this address from you code because you are not required to know the address.
So; a reference does occupy memory. In this case it is the stack memory since we have allocated it as a local variable. How much memory does it occupy? As much a pointer occupies.
Now lets see how we access the reference and pointers. For simplicity I have shown only part of the assembly snippet
16 cout << "*ptrToI = " << *ptrToI << "\n";
0x08048746 <main()+192>: mov -0x14(%ebp),%eax
0x08048749 <main()+195>: mov (%eax),%ebx
19 cout << "refToNum = " << refToI << "\n";
0x080487b0 <main()+298>: mov -0xc(%ebp),%eax
0x080487b3 <main()+301>: mov (%eax),%ebx
Now compare the above two lines, you will see striking similarity. -0xc(%ebp)
is the actual address of refToI
which is never accessible to you.
In simple terms, if you think of reference as a normal pointer, then accessing a reference is like fetching the value at address pointed to by the reference. Which means the below two lines of code will give you the same result
cout << "Value if i = " << *ptrToI << "\n";
cout << " Value if i = " << refToI << "\n";
Now compare this
15 cout << "ptrToI = " << ptrToI << "\n";
0x08048713 <main()+141>: mov -0x14(%ebp),%ebx
21 cout << "&refToNum = " << &refToI << "\n";
0x080487fb <main()+373>: mov -0xc(%ebp),%eax
I guess you are able to spot what is happening here.
If you ask for &refToI
, the contents of -0xc(%ebp)
address location are returned and -0xc(%ebp)
is where refToi
resides and its contents are nothing but address of i
.
One last thing, Why is this line commented?
//cout << "*refToNum = " << *refToI << "\n";
Because *refToI
is not permitted and it will give you a compile time error.