problem with sum function after inplace editing using Rcpp
@F.Privé's suspicion is correct. This is an issue with ALTREP, which Rcpp does not support yet, c.f. Rcpp/#812 and Rcpp/#906. We can see this more explicitly by inspecting the variable x
:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
void test(IntegerVector x) {
x[5] = 77;
}
/*** R
x <- 10:1
.Internal(inspect(x))
test(x)
.Internal(inspect(x))
print(x) # 10 9 8 7 6 77 4 3 2 1
sum(x) # 55
x <- 10:1
.Internal(inspect(x))
x[6] <- 77L
.Internal(inspect(x))
print(x) # 10 9 8 7 6 77 4 3 2 1
sum(x)
*/
The first block gives:
> x <- 10:1
> .Internal(inspect(x))
@55f79a9d6c58 13 INTSXP g0c0 [NAM(3)] 10 : 1 (compact)
> test(x)
> .Internal(inspect(x))
@55f79a9d6c58 13 INTSXP g0c0 [NAM(3)] 10 : 1 (expanded)
> print(x) # 10 9 8 7 6 77 4 3 2 1
[1] 10 9 8 7 6 77 4 3 2 1
> sum(x) # 55
[1] 55
While the second block gives:
> x <- 10:1
> .Internal(inspect(x))
@55f79b1f9018 13 INTSXP g0c0 [NAM(3)] 10 : 1 (compact)
> x[6] <- 77L
> .Internal(inspect(x))
@55f7a096e5e8 13 INTSXP g0c4 [NAM(1)] (len=10, tl=0) 10,9,8,7,6,...
> print(x) # 10 9 8 7 6 77 4 3 2 1
[1] 10 9 8 7 6 77 4 3 2 1
> sum(x)
[1] 127
So after changing a value in the vector, it still claims to be 10 : 1
, for which sum
uses a short-cut. See here for further reading (including references) on ALTREP.
For now the only solution seems to be to refrain from altering the function argument.
Here is what I posted elsewhere regarding this:
So there's a couple of things here, most of which @ltierney and/or @kalibera alluded to but which perhaps can benefit from more concrete possible coding patterns.
The crux of the matter here is inline modifying the payload of a SEXP by writing to elements of its DATAPTR-addressed memory. Is it ever ok to do (yes), can it be safe to do without confirming/being guaranteed its ok to do (no, never). And for objects created in the same chunk of C/C++ code, you may have such a priori guarantees, but you're not going to for objects passed down from R.
To be concrete, I don't know of any times it would be ok to write to the pointer returned by INTEGER()
on a SEXP that lives in R-space without first checking MAYBE_SHARED()
. If MAYBE_SHARED(x)
returns FALSE, then it's ok, and you can proceed writing to the pointer, exactly as your code did.
If MAYBE_SHARED(x) == TRUE
, then you need to duplicate, operate on the copy, and then return that. When you're in C/C++ code, at the level of grabbing data pointers to things, then you, your code needs to explicitly cause that duplication, protect the new duplicated result, etc.
Now the reason that this particular thing is happening is in the compact sequence case is that, unless R itself is built a specific non-default way, compact sequences ALWAYS have NAMED(x) == MAXNAMED
(ie 2), from the point of creation. As Luke pointed out, this could change, but is currently by design. So even in a .Call situation where the closure wasn't forcing the NAMED
count up, compact sequences will always need duplication before inline modification. And while this is a choice we could have made different in the compact sequence case, Luke's point about other ALTREPs is more important.
There may be ALTREP SEXPs where the memory pointed to by the pointer returned by, e.g., INTEGER
is literally not writable memory for one reason or another. The way that those ALTREP classes will declare that is by doing the same thing the compact sequences do, by marking themselves as "IMMUTABLE", ie by doing MARK_NOT_MUTABLE(x)
(which currently sets NAMED
to MAXNAMED
, but which is future-proof against eventual change to reference counting). This declares the contract that the SEXP must be duplicated before any code that grabs the datapointer and writes to it.
Ultimately, I agree this is a really weird unexpected behavior, but its due to a failure to meet the contract which has always been there. It may have been safe(ish, and I still doubt it) to ignore/be lax about in certain cases in the past, but with the advent of ALTREP it now must always be followed for the reasons outlined in this thread.
So all all all code thats going to grab a dataptr from a pre-existing (R-level) SEXP and write to it must follow a pattern along the lines of (or equivalently careful to):
SEXP awesomefun(SEXP x)
{
int nprot = 0;
if(MAYBE_SHARED(x)) {
PROTECT(x = duplicate(x)); nprot++;
}
/* do awesome things to x that modify it inline
protect other things as necessary but always increment nprot when you do,
decrement nprot if you ever unprotect */
if(nprot) UNPROTECT(nprot);
return x;
}
Any code which writes to datapointers retrieved from SEXPs it didn't itself create (ie anything that comes down from the R side) without doing this was already violating the C-API contract but is now also fatally ALTREP unsafe, as the motivating example showed.
And, one more time, remember, compact sequences could behave differently so that that code happened to work, but other ALTREP classes (Luke mentions memory-mapped-file backed ALTREPS) couldn't, so the behavior of compact sequences isn't actually the issue here.
I hope that is helpful and makes things clearer.
Best