Difference between R's sum() and Armadillo's accu()
update: based on what others have found in the source, I was wrong about this - sum()
does not sort. The patterns of consistency I found below stem from the fact that sorting (as done in some cases below) and using extended-precision intermediate values (as done in sum()
) can have similar effects on precision ...
@user2357112 comments below:
src/main/summary.c ... doesn't do any sorting. (That'd be a lot of expense to add to a summation operation.) It's not even using pairwise or compensated summation; it just naively adds everything up left to right in an LDOUBLE (either long double or double, depending on HAVE_LONG_DOUBLE).
I have exhausted myself looking for this in the R source code (without success - sum
is hard to search for), but I can show by experiment that when executing ; the difference between sum()
, R sorts the input vector from smallest to largest in order to maximize accuracysum()
and Reduce()
results below is due to use of extended precision. I don't know what accu
does ...
set.seed(101)
vec <- runif(100, 0, 0.00001)
options(digits=20)
(s1 <- sum(vec))
## [1] 0.00052502325481269514554
Using Reduce("+",...)
just adds the elements in order.
(s2 <- Reduce("+",sort(vec)))
## [1] 0.00052502325481269514554
(s3 <- Reduce("+",vec))
## [1] 0.00052502325481269503712
identical(s1,s2) ## TRUE
?sum()
also says
Where possible extended-precision accumulators are used, but this is platform-dependent.
Doing this in RcppArmadillo
on the sorted vector gives the same answer as in R; doing it on the vector in the original order gives yet a different answer (I don't know why; my guess would be the aforementioned extended-precision accumulators, which would affect the numerical outcome more when the data are unsorted).
suppressMessages(require(inline))
code <- '
arma::vec ax = Rcpp::as<arma::vec>(x);
return Rcpp::wrap(arma::accu(ax));
'
## create the compiled function
armasum <- cxxfunction(signature(x="numeric"),
code,plugin="RcppArmadillo")
(s4 <- armasum(vec))
## [1] 0.00052502325481269525396
(s5 <- armasum(sort(vec)))
## [1] 0.00052502325481269514554
identical(s1,s5) ## TRUE
But as pointed out in comments this doesn't work for all seeds: in this case the Reduce()
result is closer to the results of sum()
set.seed(123)
vec2 <- runif(50000,0,0.000001)
s4 <- sum(vec2); s5 <- Reduce("+",sort(vec2))
s6 <- Reduce("+",vec2); s7 <- armasum(sort(vec2))
rbind(s4,s5,s6,s7)
## [,1]
## s4 0.024869900535651481843
## s5 0.024869900535651658785
## s6 0.024869900535651523477
## s7 0.024869900535651343065
I'm stumped here. I would have expected at least s6
and s7
to be identical ...
I will point out that in general when your algorithm depends on these kinds of tiny numeric differences you're likely to be getting very frustrated, as the results are likely to differ on the basis of many small and possibly-out-of-your-control factors like particular operating system, compiler, etc. you work with.
What I have found:
I successfully managed to write a function which is able to mimic R's sum function. It appears R uses a higher precision variable to store the results of each addition operation.
What I wrote:
// [[Rcpp::depends("RcppArmadillo")]]
// [[Rcpp::export]]
double accu2(arma::vec& obj)
{
long double result = 0;
for (auto iter = obj.begin(); iter != obj.end(); ++iter)
{
result += *iter;
}
return result;
}
How it compares in speed:
set.seed(123)
vec <- runif(50000, 0, 0.000001)
microbenchmark(
sum(vec),
accu(vec),
accu2(vec)
)
expr min lq mean median uq max neval
sum(vec) 72.155 72.351 72.61018 72.6755 72.7485 75.068 100
accu(vec) 48.275 48.545 48.84046 48.7675 48.9975 52.128 100
accu2(vec) 69.087 69.409 70.80095 69.6275 69.8275 182.955 100
So, my c++ solution is still faster than R's sum, however it is significantly slower than armadillo's accu()
you could use the mpfr
package (Multiple Precision Floating-Point Reliable) and specify the decimal point
library("Rmpfr")
set.seed(1)
vec <- runif(100, 0, 0.00001)
# [1] 2.655087e-06 3.721239e-06 5.728534e-06 9.082078e-06 2.016819e-06 8.983897e-06 9.446753e-06 6.607978e-06 6.291140e-06 6.178627e-07 2.059746e-06
# [12] 1.765568e-06 6.870228e-06 3.841037e-06 7.698414e-06 4.976992e-06 7.176185e-06 9.919061e-06 3.800352e-06 7.774452e-06 9.347052e-06 2.121425e-06
# [23] 6.516738e-06 1.255551e-06 2.672207e-06 3.861141e-06 1.339033e-07 3.823880e-06 8.696908e-06 3.403490e-06 4.820801e-06 5.995658e-06 4.935413e-06
# [34] 1.862176e-06 8.273733e-06 6.684667e-06 7.942399e-06 1.079436e-06 7.237109e-06 4.112744e-06 8.209463e-06 6.470602e-06 7.829328e-06 5.530363e-06
# [45] 5.297196e-06 7.893562e-06 2.333120e-07 4.772301e-06 7.323137e-06 6.927316e-06 4.776196e-06 8.612095e-06 4.380971e-06 2.447973e-06 7.067905e-07
# [56] 9.946616e-07 3.162717e-06 5.186343e-06 6.620051e-06 4.068302e-06 9.128759e-06 2.936034e-06 4.590657e-06 3.323947e-06 6.508705e-06 2.580168e-06
# [67] 4.785452e-06 7.663107e-06 8.424691e-07 8.753213e-06 3.390729e-06 8.394404e-06 3.466835e-06 3.337749e-06 4.763512e-06 8.921983e-06 8.643395e-06
# [78] 3.899895e-06 7.773207e-06 9.606180e-06 4.346595e-06 7.125147e-06 3.999944e-06 3.253522e-06 7.570871e-06 2.026923e-06 7.111212e-06 1.216919e-06
# [89] 2.454885e-06 1.433044e-06 2.396294e-06 5.893438e-07 6.422883e-06 8.762692e-06 7.789147e-06 7.973088e-06 4.552745e-06 4.100841e-06 8.108702e-06
# [100] 6.049333e-06
sum(mpfr(vec,10))
# 1 'mpfr' number of precision 53 bits
# [1] 0.00051783234812319279