Most efficient way to calculate function with large number of parameter combinations
@josliber's answer (see below) is very good. Yet, it makes it look like R is slow and that you have to switch to C++ for performance.
There are three tricks that are implemented in their answer:
- precompute the vector of thresholds
- precompute the absolute values of dX_i
- sort these values to stop the sum early
The first two tricks use an R technique called "vectorization": basically, do your operations (e.g. gamma * a * sigma_hat * delta_j^(1/2) or abs()) on whole vectors instead of on a single element within a loop. This is exactly what you already do when using sum( dX_i^p * vec_boolean ); the * and sum() are vectorized, so it should be very fast.
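To make the idea concrete, here is a tiny illustration (not part of the solution, just a sketch of what vectorization buys you): the loop and the vectorized line compute the same thing, but the second one hands the whole vector to compiled code at once.
x <- c(1.2, -0.5, 3.4)

out_loop <- numeric(length(x))   # element-by-element loop
for (k in seq_along(x)) {
  out_loop[k] <- abs(x[k])^2
}

out_vec <- abs(x)^2              # vectorized: abs() and ^ act on the whole vector
all.equal(out_loop, out_vec)     # TRUE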
If we implement just these two tricks (we can't really do the third one the same way because it breaks vectorization), it gives:
abs_dX_i <- abs(dX_i)                                                # trick 2: precompute the absolute values
thresh <- with(parameters, gamma * a * sigma_hat * sqrt(delta_j))    # trick 1: precompute all thresholds at once
p <- parameters$p

result3 <- sapply(1:nrow(parameters), function(i) {
  in_sum <- (abs_dX_i < thresh[i])
  sum(abs_dX_i[in_sum]^p[i])
})

all.equal(result, result3) # TRUE
If we benchmark all three solutions:
microbenchmark::microbenchmark(
  OP = {
    result <- sapply(1:nrow(parameters), function(x) {
      tmp <- parameters[x, ]
      p <- tmp$p
      a <- tmp$a
      gamma <- tmp$gamma
      sigma_hat <- tmp$sigma_hat
      delta_j <- tmp$delta_j
      B <- sum((abs(dX_i)^p) * (abs(dX_i) < gamma * a * sigma_hat * delta_j^(1/2)))
      return(B)
    })
  },
  RCPP = {
    result2 <- proc(sort(abs(dX_i)), parameters$gamma * parameters$a *
                      parameters$sigma_hat * parameters$delta_j^(1/2), parameters$p)
  },
  R_VEC = {
    abs_dX_i <- abs(dX_i)
    thresh <- with(parameters, gamma * a * sigma_hat * sqrt(delta_j))
    p <- parameters$p
    result3 <- sapply(1:nrow(parameters), function(i) {
      in_sum <- (abs_dX_i < thresh[i])
      sum(abs_dX_i[in_sum]^p[i])
    })
  },
  times = 10
)
We get:
Unit: milliseconds
expr min lq mean median uq max neval
OP 224.8414 235.4075 289.90096 270.2767 347.1727 399.3262 10
RCPP 14.8172 15.4691 18.83703 16.3979 20.3829 29.6624 10
R_VEC 28.3136 29.5964 32.82456 31.4124 33.2542 45.8199 10
This gives a huge speedup from only slightly modifying your original R code. It is less than twice as slow as the Rcpp code and can easily be parallelized the way you previously did with parSapply().
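For instance, a minimal sketch of that parallelization (assuming the parallel package and that abs_dX_i, thresh and p have been computed as above) could look like this:
library(parallel)
cl <- makeCluster(detectCores() - 1)
clusterExport(cl, c("abs_dX_i", "thresh", "p"))   # ship the precomputed vectors to the workers
result3_par <- parSapply(cl, 1:nrow(parameters), function(i) {
  in_sum <- abs_dX_i < thresh[i]
  sum(abs_dX_i[in_sum]^p[i])
})
stopCluster(cl)
all.equal(result3, result3_par)                   # TRUE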
When I want to speed up hard-to-vectorize code, I often turn to Rcpp. At the end of the day you are trying to sum up abs(dX_i)^p, limiting the sum to values of abs(dX_i) smaller than the threshold gamma * a * sigma_hat * delta_j^(1/2). You want to do this for a bunch of pairs of p and threshold. You could accomplish this with:
library(Rcpp)
cppFunction(
"NumericVector proc(NumericVector dX_i, NumericVector thresh, NumericVector p) {
  const int n = thresh.size();
  const int m = dX_i.size();
  NumericVector B(n);
  for (int i = 0; i < n; ++i) {
    B[i] = 0;
    for (int j = 0; j < m; ++j) {
      if (dX_i[j] < thresh[i]) {
        B[i] += pow(dX_i[j], p[i]);
      } else {
        break;  // dX_i is sorted, so no later value can be below the threshold
      }
    }
  }
  return B;
}"
)
result2 <- proc(sort(abs(dX_i)), parameters$gamma * parameters$a * parameters$sigma_hat * parameters$delta_j^(1/2), parameters$p)
all.equal(result, result2)
# [1] TRUE
Note that my code sorts the absolute values of dX_i so it can stop the calculation once it encounters the first value exceeding the threshold. On my machine, I see a 20x speedup, from 0.158 seconds for your code to 0.007 seconds for the Rcpp code (measured using system.time).
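If you want to reproduce that comparison yourself, a rough sketch with system.time (exact numbers will of course differ by machine) is:
t_op <- system.time(
  result <- sapply(1:nrow(parameters), function(x) {
    tmp <- parameters[x, ]
    sum(abs(dX_i)^tmp$p * (abs(dX_i) < tmp$gamma * tmp$a * tmp$sigma_hat * sqrt(tmp$delta_j)))
  })
)
t_rcpp <- system.time(
  result2 <- proc(sort(abs(dX_i)),
                  with(parameters, gamma * a * sigma_hat * sqrt(delta_j)),
                  parameters$p)
)
unname(t_op["elapsed"] / t_rcpp["elapsed"])   # speedup factor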
One observation is that you actually have a huge number of repeats of each p value within your parameter set. You might separately process each p value; that way, you only need to sum abs(dX_i) raised to a particular p once.
result4 <- rep(NA, nrow(parameters))
sa_dX_i <- sort(abs(dX_i))
thresh <- parameters$gamma * parameters$a * parameters$sigma_hat * parameters$delta_j^(1/2)
loc <- findInterval(thresh, sa_dX_i)
loc[loc == 0] <- NA # Handle threshold smaller than everything in dX_i
for (pval in unique(parameters$p)) {
  this.p <- parameters$p == pval
  cs_dX_i_p <- cumsum(sa_dX_i^pval)
  result4[this.p] <- cs_dX_i_p[loc[this.p]]
}
result4[is.na(result4)] <- 0 # Handle threshold smaller than everything in dX_i
all.equal(result, result4)
# [1] TRUE
To see this in action, let's scale up the original dataset to what is described in the question (~600k rows of parameters and ~80k values in dX_i):
set.seed(144)
dX_i <- rnorm(80000, 0, 0.0002540362)
p_vec <- seq(0, 1, 0.025)
gamma_vec <- seq(1, 2, 0.025)
a_vec <- seq(2, 6, 0.3)
sigma_hat_vec <- c(0.03201636, 0.05771143, 0.07932116, 0.12262327, 0.15074560)
delta_j_vec <- c(0.0000005850109, 0.0000011700217, 0.0000017550326, 0.0000035100651, 0.0000052650977)
parameters <- expand.grid("p" = p_vec, "gamma" = gamma_vec, "a" = a_vec, "sigma_hat" = sigma_hat_vec, "delta_j" = delta_j_vec)
dim(parameters)
# [1] 588350 5
length(unique(parameters$p))
# [1] 41
The speedup is rather dramatic -- this code takes 0.27 seconds on my computer, while the Rcpp code posted above takes 655 seconds on the same data (a 2400x speedup, using pure R!). Clearly this speedup only works if there are relatively few p values (each repeated many times) in the parameters data frame. If each p value were unique, this would likely be much slower than the other approaches proposed.
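If you are unsure which regime you are in, one way to handle it is a small dispatcher that counts the distinct p values and falls back to the Rcpp proc() otherwise. This is only a sketch: compute_B is a hypothetical helper, and the cutoff of 500 distinct values is an arbitrary assumption, not something benchmarked here.
compute_B <- function(dX_i, parameters, max_unique_p = 500) {  # hypothetical helper; cutoff is arbitrary
  sa_dX_i <- sort(abs(dX_i))
  thresh <- with(parameters, gamma * a * sigma_hat * sqrt(delta_j))
  if (length(unique(parameters$p)) <= max_unique_p) {
    # few distinct p values: reuse one cumulative sum per p
    out <- rep(NA_real_, nrow(parameters))
    loc <- findInterval(thresh, sa_dX_i)
    loc[loc == 0] <- NA
    for (pval in unique(parameters$p)) {
      this.p <- parameters$p == pval
      cs <- cumsum(sa_dX_i^pval)
      out[this.p] <- cs[loc[this.p]]
    }
    out[is.na(out)] <- 0
    out
  } else {
    proc(sa_dX_i, thresh, parameters$p)   # Rcpp fallback defined above
  }
}
all.equal(result, compute_B(dX_i, parameters))   # TRUE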