R, deep vs. shallow copies, pass by reference
Late answer, but a very important aspect of the language design that doesn't get enough coverage on the web (or at least the usual sources).
x <- c(0,4,2)
lobstr::obj_addr(x)
# [1] "0x7ff25e82b0f8"
y <- x
lobstr::obj_addr(y)
# [1] "0x7ff25e82b0f8"
Notice the identical "memory address", i.e. the location in memory where the object is stored. You can thus confirm that x and y both point to the same object.
Hadley Wickham's Advanced R book touches on this:
Consider this code:
x <- c(1, 2, 3)
It’s easy to read it as: “create an object named ‘x’, containing the values 1, 2, and 3”. Unfortunately, that’s a simplification that will lead to inaccurate predictions about what R is actually doing behind the scenes. It’s more accurate to say that this code is doing two things:
It’s creating an object, a vector of values, c(1, 2, 3). And it’s binding that object to a name, x. In other words, the object, or value, doesn’t have a name; it’s actually the name that has a value.
Note that the memory addresses are ephemeral and change with every new R session.
Now here is the important part.
In R semantics, objects are copied by value. This means that modifying the copy leaves the original object intact. Since copying data in memory is an expensive operation, copies in R are as lazy as possible. They only happen when the new object is actually modified. Source: [R lang documentation][1]
So if we now modify the value of y by appending a value to the vector, y now points to a different "object": it points to a different address than it did previously. This agrees with what the documentation says about a copy happening "only when the new object is modified" (lazily).
y <- c(y, -3)
print(lobstr::obj_addr(y))
# [1] "0x7ff25e825b48"
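Meanwhile the original binding is untouched: x still holds its original values at its original address (continuing the same session; your addresses will differ):

x
# [1] 0 4 2
lobstr::obj_addr(x)
# [1] "0x7ff25e82b0f8"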
When R passes variables to functions, it is always by copy rather than by reference. Sometimes, however, you will not get a copy made until an assignment actually occurs. The better description of the process is pass-by-promise. Take a look at the documentation:
?force
?delayedAssign
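A minimal sketch of what those help pages describe (the function f and the variable names are just for illustration):

f <- function(v) {
  force(v)    # evaluate the promise for v now, rather than on first use
  v[1] <- 99  # the write triggers a copy; the caller's vector is untouched
  v
}
x <- c(0, 4, 2)
f(x)
# [1] 99  4  2
x
# [1] 0 4 2

# delayedAssign() sets up the same kind of promise outside a function call:
delayedAssign("z", {message("evaluating now"); x * 2})
z  # the promise is only evaluated here, on first access
# evaluating now
# [1] 0 8 4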
One practical implication is that it is very difficult, if not impossible, to avoid needing at least twice as much RAM as your objects nominally occupy: modifying a large object will generally require making a temporary copy.
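You can watch those temporary copies with base R's tracemem() (a quick sketch; the addresses it reports are session-specific):

x <- runif(1e6)  # a largish numeric vector, roughly 8 MB
tracemem(x)      # ask R to report whenever x is duplicated
y <- x           # no copy yet: x and y share the same memory
y[1] <- 0        # tracemem reports a duplication here: all 8 MB are
                 # copied so that x is left intact
untracemem(x)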
Update, 2015: I do (and did) agree with Matt Dowle that his data.table package provides an alternate route to assignment that avoids the copy-duplication problem. If that was the update requested, then I didn't understand it at the time the suggestion was made.
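A minimal sketch of that by-reference route, using data.table's := operator and its exported address() helper (the column names are just for illustration):

library(data.table)
DT <- data.table(a = 1:5, b = 6:10)
address(DT)        # data.table's helper for the object's memory address
DT[, a := a * 2L]  # := updates the column in place, by reference
address(DT)        # same address as before: no copy of DT was made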
There was a recent change in R 3.2.1 in the evaluation rules for apply and Reduce. It was announced on SO with reference to the NEWS here: Returning anonymous functions from lapply - what is going wrong?
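The gist of that change, as discussed in the linked question (a sketch; in older R versions all three closures below evaluated their shared promise to the last value, 3):

fns <- lapply(1:3, function(i) function() i)
fns[[1]]()  # lapply now forces i, so each closure keeps its own value
# [1] 1
fns[[3]]()
# [1] 3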
And the interesting paper cited by jhetzel in the comments is now here: