What is the difference between how references and Box<T> are represented in memory?
Your diagram for the simple case is fine, although it may be unclear as you use 5 for both the value and the address. I've moved y in my diagram to prevent any confusion.
What does memory look like for a Box<T>?
The equivalent diagram for Box would look similar, but with the addition of the heap:
Stack
       ADDR   VALUE
     +------------------------------+
x =  |0x0001|                     5 |
y =  |0x0002|                0xFF01 |
     |0x0003|                       |
     |0x0004|                       |
     |0x0005|                       |
     +------------------------------+
Heap
       ADDR   VALUE
     +------------------------------+
     |0xFF01|                     5 |
     |0xFF02|                       |
     |0xFF03|                       |
     |0xFF04|                       |
     |0xFF05|                       |
     +------------------------------+
(See the pedantic notes below about this diagram)
Box has allocated enough space in the heap for us, here at address 0xFF01. The value is then moved from the stack onto the heap.
Does it mean that y in the box points directly
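y holds the pointer to the data allocated by the Box, with no extra level of indirection in between (in the diagram, y contains 0xFF01, the heap address where 5 is stored). It must do this in order to be able to free the allocated memory when the Box goes out of scope.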
The point of the chapter you are reading is that Rust will transparently dereference the Box for you, so you don't usually need to concern yourself with this fact.
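As a rough illustration of that transparent dereferencing, here is a minimal sketch. The helper add_one is hypothetical and exists only for this example; the addresses printed will differ from the made-up ones in the diagram.

```rust
/// Hypothetical helper that only knows about plain references.
fn add_one(n: &i32) -> i32 {
    *n + 1
}

fn main() {
    let x: i32 = 5;
    let r: &i32 = &x; // points at x on the stack
    let y: Box<i32> = Box::new(x); // points at a copy of x on the heap

    // `{:p}` prints the address each pointer holds.
    println!("r holds address {:p}", r);
    println!("y holds address {:p}", y);

    // Explicit dereferencing looks the same for both.
    assert_eq!(*r, 5);
    assert_eq!(*y, 5);

    // Method calls auto-dereference through the Box to reach i32::pow.
    assert_eq!(y.pow(2), 25);

    // Deref coercion turns &Box<i32> into &i32 at the call site.
    assert_eq!(add_one(r), 6);
    assert_eq!(add_one(&y), 6);
}
```

In everyday code you rarely have to think about which of the two you are holding when you only want to read the value.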
See also:
- Do I need to Box child structs of a Boxed struct to get everything on the heap?
- What is the difference between Vec<i32> and Vec<Box<i32>>?
- Why is it discouraged to accept a reference to a String (&String), Vec (&Vec), or Box (&Box) as a function argument?
- What are Rust's exact auto-dereferencing rules?
- How do I get an owned value out of a `Box`?
What's the difference in memory?
This might bend your brain a little bit!
Looking at the stack for both examples, there isn't really a difference between the two cases: both the reference and the Box are stored on the stack as a pointer. The only difference is in the code, where it knows to treat the value on the stack differently depending on whether it's a reference or a Box.
In fact, this is true for everything in Rust! To the computer, it's all just bits, and the structure encoded in the program binary is the only thing that distinguishes one blob of bytes from another.
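One way to convince yourself of this is to compare sizes and raw addresses. This is only a sketch; it assumes T is a sized type such as i32, for which both a reference and a Box are a single machine word.

```rust
use std::mem::size_of;

fn main() {
    let x: i32 = 5;
    let r: &i32 = &x;
    let y: Box<i32> = Box::new(x);

    // For a sized type, both are exactly one pointer wide.
    assert_eq!(size_of::<&i32>(), size_of::<usize>());
    assert_eq!(size_of::<Box<i32>>(), size_of::<usize>());

    // Both can be viewed as the same kind of raw address.
    let from_ref: *const i32 = r;
    let from_box: *const i32 = &*y;
    println!("reference -> {:p}, box -> {:p}", from_ref, from_box);

    // They point to different places: one on the stack, one on the heap.
    assert_ne!(from_ref, from_box);
}
```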
Why is x still on the stack after being moved to the Box?
Observant readers will note that I left the value 5 for x on the stack. There are two relevant reasons why:
- That's actually what happens in memory. Programs don't usually "reset" values they are done with as it would be unneeded overhead. Rust avoids problems by marking the variable as moved and disallowing access to the moved-from variable.
- In this case, i32 implements Copy, which means that it's OK to access the value after it's been moved. The compiler will actually allow us to continue accessing x. This wouldn't be true if x were a type that didn't implement Copy, such as a String or a Box.
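A small sketch of the Copy versus non-Copy behavior described above. The commented-out line is there to show what the compiler rejects; the variable names are only for illustration.

```rust
fn main() {
    // i32 implements Copy, so boxing x copies it and x stays usable.
    let x: i32 = 5;
    let y = Box::new(x);
    assert_eq!(x + *y, 10);

    // String does not implement Copy, so boxing s moves it.
    let s = String::from("five");
    let boxed = Box::new(s);
    // println!("{}", s); // would not compile: use of moved value `s`
    println!("{}", boxed);
    assert_eq!(boxed.len(), 4);
}
```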
See also:
- Why does "move" in Rust not actually move?
- How does Rust move stack variables that are not Copyable?
- How does Rust provide move semantics?
- What are move semantics in Rust?
Pedantic diagram notes
- This diagram is not to scale. An i32 takes 4 bytes and a pointer / reference takes a platform-dependent number of bytes, but it's simpler to assume everything is the same size.
- The stack typically starts at a high address and grows downward, while the heap starts at a low address and grows upward.
The general rule is exactly the same as in the answer to What are the differences between Rust's `String` and `str`?, but I'm answering here as well.
A Rust reference is (almost) exactly what you have described: a pointer to a value somewhere in memory. (Not always, though. For example, slices also contain a length, and pointers to trait objects also contain a pointer to a v-table. These are called fat pointers.) To begin with, a Box<T> is a value, like any other value in Rust, so the difference is clear: one is a reference to a place in memory and the other is a value somewhere in memory. The confusion is that Box<T> internally contains a pointer to memory, but that memory is allocated on the heap instead of the stack. The difference between these two is that the stack is local to a function call and is quite small (on my macOS the maximum is 8192 KiB).
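The thin versus fat pointer distinction can be observed with size_of. This is a sketch of what current compilers do rather than a layout guarantee; the "two words" figure for fat pointers is an observation, not something the code relies on elsewhere.

```rust
use std::fmt::Debug;
use std::mem::size_of;

fn main() {
    let word = size_of::<usize>();

    // Thin pointers: just an address.
    assert_eq!(size_of::<&u32>(), word);
    assert_eq!(size_of::<Box<u32>>(), word);

    // Fat pointers: an address plus a length (slices, str)...
    assert_eq!(size_of::<&[u32]>(), 2 * word);
    assert_eq!(size_of::<&str>(), 2 * word);
    assert_eq!(size_of::<Box<[u32]>>(), 2 * word);

    // ...or an address plus a v-table pointer (trait objects).
    assert_eq!(size_of::<&dyn Debug>(), 2 * word);
    assert_eq!(size_of::<Box<dyn Debug>>(), 2 * word);

    println!("one machine word is {} bytes on this platform", word);
}
```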
For example, you cannot do something like this for a few reasons:
    fn foo() -> &u32 {
        let a = 5;
        &a
    }
The most important reason is that a will not be there after foo() returns. The stack memory that held it is released (though not necessarily overwritten right away), and it may soon be reused for another value. This is undefined behavior in C and C++, and a compile error in Rust, which does not allow any undefined behavior (in code that does not use unsafe).
On the other hand, if you do:
    fn foo() -> Box<u32> {
        let a = Box::new(5);
        a
    }
A few things relevant to us will happen:
- memory will be allocated on the heap. This memory is totally independent of the current function's scope, which means that it needs to be freed once it is no longer needed
- we will move the value, so there are no lifetimes involved
- ownership of a will be moved to the caller
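To make the ownership transfer and the eventual free visible, here is a minimal sketch. Tracked, make, and the drops counter are hypothetical names introduced only for this example; the counter records how many times the boxed value has been dropped.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical payload that counts how often it is dropped.
struct Tracked {
    value: u32,
    drops: Rc<Cell<u32>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Mirrors foo() above: allocate on the heap, then hand the Box to the caller.
fn make(drops: Rc<Cell<u32>>) -> Box<Tracked> {
    let a = Box::new(Tracked { value: 5, drops });
    a // ownership moves out; nothing is freed when make() returns
}

fn main() {
    let drops = Rc::new(Cell::new(0));

    let boxed = make(Rc::clone(&drops));
    assert_eq!(boxed.value, 5);
    assert_eq!(drops.get(), 0); // still alive in the caller

    drop(boxed); // the owner goes away, so the heap memory is freed
    assert_eq!(drops.get(), 1);
}
```

Dropping the Box explicitly is only for demonstration; normally the same thing happens when the owning variable goes out of scope.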
For convenience, Box<T> will behave like a reference in many cases, and the two can often be used interchangeably. For example, see this C program, which provides functionality similar to the second example:
    #include <stdlib.h>

    int* foo(void) {
        int* a = malloc(sizeof(int)); /* heap allocation; the caller must free() it */
        if (a != NULL) {
            *a = 5;
        }
        return a;
    }
As you can see, the pointer stores the address of the allocated memory, and that address is what gets passed on to the caller.