Why is valarray so slow?
I suspect that the reason c = a*b
is so much slower than performing the operations an element at a time is that the
template<class T> valarray<T> operator*
(const valarray<T>&, const valarray<T>&);
operator must allocate memory to put the result into, then returns that by value.
Even if a "swaptimization" is used to perform the copy, that function still has the overhead of
- allocating the new block for the resulting
valarray
- initializing the new
valarray
(it's possible that this might be optimized away) - putting the results into the new
valarray
- paging in the memory for the new
valarray
as it is initialized or set with result values - deallocating the old
valarray
that gets replaced by the result
The whole point of valarray is to be fast on vector machines, which x86 machines just aren't.
A good implementation on a nonvector machine should be able to match the performance that you get with something like
for (i=0; i < N; ++i)
c1[i] = a1[i] * b1[i];
and a bad one of course won't. Unless there is something in the hardware to expedite parallel processing, that is going to be pretty close to the best that you can do.
I just tried it on a Linux x86-64 system (Sandy Bridge CPU):
gcc 4.5.0:
double operator* 9.64185 ms
valarray operator* 9.36987 ms
valarray[i] operator* 9.35815 ms
Intel ICC 12.0.2:
double operator* 7.76757 ms
valarray operator* 9.60208 ms
valarray[i] operator* 7.51409 ms
In both cases I just used -O3
and no other optimisation-related flags.
It looks like the MS C++ compiler and/or valarray implementation suck.
Here's the OP's code modified for Linux:
#include <iostream>
#include <valarray>
#include <iostream>
#include <ctime>
using namespace std ;
double gettime_hp();
int main()
{
enum { N = 5*1024*1024 };
valarray<double> a(N), b(N), c(N) ;
int i,j;
for( j=0 ; j<8 ; ++j )
{
for( i=0 ; i<N ; ++i )
{
a[i]=rand();
b[i]=rand();
}
double* a1 = &a[0], *b1 = &b[0], *c1 = &c[0] ;
double dtime=gettime_hp();
for( i=0 ; i<N ; ++i ) c1[i] = a1[i] * b1[i] ;
dtime=gettime_hp()-dtime;
cout << "double operator* " << dtime << " ms\n" ;
dtime=gettime_hp();
c = a*b ;
dtime=gettime_hp()-dtime;
cout << "valarray operator* " << dtime << " ms\n" ;
dtime=gettime_hp();
for( i=0 ; i<N ; ++i ) c[i] = a[i] * b[i] ;
dtime=gettime_hp()-dtime;
cout << "valarray[i] operator* " << dtime<< " ms\n" ;
cout << "------------------------------------------------------\n" ;
}
}
double gettime_hp()
{
struct timespec timestamp;
clock_gettime(CLOCK_REALTIME, ×tamp);
return timestamp.tv_sec * 1000.0 + timestamp.tv_nsec * 1.0e-6;
}