Performance of R stats::sd() vs. arma::stddev() vs. Rcpp implementation
You made a subtle mistake in how you instantiate the Armadillo object, which leads to copies and hence degraded performance.
Use an interface of const arma::colvec & inVec instead, and all is good:
R> sourceCpp("/tmp/sd.cpp")
R> library(microbenchmark)
R> X <- rexp(500)
R> microbenchmark(armaSD(X), armaSD2(X), sd(X), cppSD(X))
Unit: microseconds
       expr    min      lq  median      uq    max neval
  armaSD(X)  3.745  4.0280  4.2055  4.5510 19.375   100
 armaSD2(X)  3.305  3.4925  3.6400  3.9525  5.154   100
      sd(X) 22.463 23.6985 25.1525 26.0055 52.457   100
   cppSD(X)  3.640  3.9495  4.2030  4.8620 13.609   100
R> X <- rexp(5000)
R> microbenchmark(armaSD(X), armaSD2(X), sd(X), cppSD(X))
Unit: microseconds
       expr    min      lq  median      uq    max neval
  armaSD(X) 18.627 18.9120 19.3245 20.2150 34.684   100
 armaSD2(X) 14.583 14.9020 15.1675 15.5775 22.527   100
      sd(X) 54.507 58.8315 59.8615 60.4250 84.857   100
   cppSD(X) 18.585 19.0290 19.3970 20.5160 22.174   100
R> X <- rexp(50000)
R> microbenchmark(armaSD(X), armaSD2(X), sd(X), cppSD(X))
Unit: microseconds
       expr     min      lq  median      uq     max neval
  armaSD(X) 186.307 187.180 188.575 191.825 405.775   100
 armaSD2(X) 142.447 142.793 143.207 144.233 155.770   100
      sd(X) 382.857 384.704 385.223 386.075 405.713   100
   cppSD(X) 181.601 181.895 182.279 183.350 194.588   100
R>
which is based on my version of your code, where everything is in one file and armaSD2 is defined as I suggested, leading to the winning performance.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
#include <vector>
#include <cmath>
#include <numeric>
// [[Rcpp::export]]
double cppSD(Rcpp::NumericVector rinVec) {
    // Copy the R vector into a std::vector so it can be overwritten below
    std::vector<double> inVec(rinVec.begin(), rinVec.end());
    int n = inVec.size();
    double sum = std::accumulate(inVec.begin(), inVec.end(), 0.0);
    double mean = sum / inVec.size();
    // Replace each element by its squared deviation from the mean
    for (std::vector<double>::iterator iter = inVec.begin();
         iter != inVec.end();
         ++iter) {
        double temp = (*iter - mean) * (*iter - mean);
        *iter = temp;
    }
    double sd = std::accumulate(inVec.begin(), inVec.end(), 0.0);
    return std::sqrt(sd / (n - 1));
}
// [[Rcpp::export]]
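// By-value argument: this is the interface that leads to copies
double armaSD(arma::colvec inVec) {
    return arma::stddev(inVec);
}
// [[Rcpp::export]]
// Const-reference argument: the suggested interface
double armaSD2(const arma::colvec & inVec) { return arma::stddev(inVec); }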
/*** R
library(microbenchmark)
X <- rexp(500)
microbenchmark(armaSD(X), armaSD2(X), sd(X), cppSD(X))
X <- rexp(5000)
microbenchmark(armaSD(X), armaSD2(X), sd(X), cppSD(X))
X <- rexp(50000)
microbenchmark(armaSD(X), armaSD2(X), sd(X), cppSD(X))
*/
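To make the by-value versus const-reference point concrete, here is a minimal sketch in plain standard C++ (no Rcpp or Armadillo needed). CountingVec, sumByValue and sumByConstRef are hypothetical names made up for this illustration. The sketch only shows the general C++ cost of a by-value parameter; it does not model how Rcpp converts an R vector into an Armadillo object.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a vector type such as arma::colvec.
// It counts how many times it is copy-constructed.
struct CountingVec {
    static int copies;
    std::vector<double> data;

    explicit CountingVec(std::size_t n) : data(n, 1.0) {}
    CountingVec(const CountingVec& other) : data(other.data) { ++copies; }
};
int CountingVec::copies = 0;

// By value: the caller's object is copied on every call.
double sumByValue(CountingVec v) {
    return std::accumulate(v.data.begin(), v.data.end(), 0.0);
}

// By const reference: the function reads the caller's object directly.
double sumByConstRef(const CountingVec& v) {
    return std::accumulate(v.data.begin(), v.data.end(), 0.0);
}
```

Calling sumByValue on an existing CountingVec bumps CountingVec::copies by one, while sumByConstRef leaves it unchanged. The copy grows with the length of the vector, which fits the widening gap between armaSD and armaSD2 in the timings above.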
I think the sd function built into Rcpp sugar is much more efficient. See the code below:
#include <RcppArmadillo.h>
//[[Rcpp::depends(RcppArmadillo)]]
#include <vector>
#include <cmath>
#include <numeric>
using namespace Rcpp;
//[[Rcpp::export]]
double sd_cpp(NumericVector& xin) {
    std::vector<double> xres(xin.begin(), xin.end());
    int n = xres.size();
    double sum = std::accumulate(xres.begin(), xres.end(), 0.0);
    double mean = sum / n;
    for (std::vector<double>::iterator iter = xres.begin(); iter != xres.end(); ++iter) {
        double tmp = (*iter - mean) * (*iter - mean);
        *iter = tmp;
    }
    double sd = std::accumulate(xres.begin(), xres.end(), 0.0);
    return std::sqrt(sd / (n - 1));
}
//[[Rcpp::export]]
double sd_arma(arma::colvec& xin) {
    return arma::stddev(xin);
}
//[[Rcpp::export]]
double sd_sugar(NumericVector& xin) {
    return sd(xin);
}
> sourceCpp("sd.cpp")
> microbenchmark(sd(X),sd_cpp(X),sd_arma(X),sd_sugar(X))
Unit: microseconds
        expr    min      lq     mean  median      uq     max neval
       sd(X) 47.655 49.4120 51.88204 50.5395 51.1950 113.643   100
   sd_cpp(X) 28.145 28.4410 29.01541 28.6695 29.4570  37.118   100
  sd_arma(X) 23.706 23.9615 24.65931 24.1955 24.9520  50.375   100
 sd_sugar(X) 19.197  19.478 20.38872 20.0785 21.2015  28.664   100
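Note that sd_cpp first copies its input into a temporary std::vector and then overwrites that copy. As a rough sketch of the same two-pass formula without the temporary, here is a plain standard C++ version. The name sampleSD is hypothetical, and returning NaN for fewer than two values is my own assumption rather than something taken from the code above. I have not benchmarked it, so treat it as an illustration, not a measured improvement.

```cpp
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Two-pass sample standard deviation (divisor n - 1) that only reads
// its input, so no temporary copy of the data is made.
double sampleSD(const std::vector<double>& x) {
    const std::size_t n = x.size();
    if (n < 2) {
        return std::nan("");  // assumption: undefined for fewer than 2 values
    }

    // First pass: the mean.
    const double mean = std::accumulate(x.begin(), x.end(), 0.0) / n;

    // Second pass: accumulate squared deviations from the mean.
    double ss = 0.0;
    for (double v : x) {
        const double d = v - mean;
        ss += d * d;
    }
    return std::sqrt(ss / (n - 1));
}
```

The same const-reference idea would apply to an Rcpp wrapper, but the wrapper itself is left out here so the sketch compiles with nothing beyond the standard library.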