Boost::multi_array performance question
On my machine using
g++ -O3 -march=native -mtune=native --fast-math -DNDEBUG test.cpp -o test && ./test
I get
[Boost] Elapsed time: 0.020 seconds
[Native]Elapsed time: 0.020 seconds
However, changing const int ITERATIONS to 5000, I get
[Boost] Elapsed time: 0.240 seconds
[Native]Elapsed time: 0.180 seconds
Then, with ITERATIONS back to 500 but X_SIZE and Y_SIZE set to 400, I get a much more significant difference:
[Boost] Elapsed time: 0.460 seconds
[Native]Elapsed time: 0.070 seconds
Finally, inverting the inner loop for the [Boost] case so it looks like
for (int x = 0; x < X_SIZE; ++x)
{
    for (int y = 0; y < Y_SIZE; ++y)
    {
and keeping ITERATIONS, X_SIZE and Y_SIZE at 500, 400 and 400, I get
[Boost] Elapsed time: 0.060 seconds
[Native]Elapsed time: 0.080 seconds
If I also invert the inner loop for the [Native] case (so it is in the wrong order for that case), I get, unsurprisingly,
[Boost] Elapsed time: 0.070 seconds
[Native]Elapsed time: 0.450 seconds
I am using gcc (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5 on Ubuntu 10.10.
So in conclusion:
- With proper optimization boost::multi_array does its job as expected
- The order in which you access your data does matter
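To make the second point concrete, here is a minimal sketch of the two traversal orders on a flat buffer. It is not the original test program: the `Grid` struct and the function names are hypothetical, and the layout (x as the fastest-varying index) is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: a flat xSize * ySize buffer in which x is the
// fastest-varying index, so element (x, y) lives at x + y * xSize.
struct Grid {
    std::size_t xSize;
    std::size_t ySize;
    std::vector<double> data;

    Grid(std::size_t xs, std::size_t ys)
        : xSize(xs), ySize(ys), data(xs * ys, 1.0) {}

    double& at(std::size_t x, std::size_t y) {
        assert(x < xSize && y < ySize);
        return data[x + y * xSize];
    }
};

// x in the inner loop: consecutive iterations touch adjacent memory.
void scaleContiguous(Grid& g, double factor) {
    for (std::size_t y = 0; y < g.ySize; ++y)
        for (std::size_t x = 0; x < g.xSize; ++x)
            g.at(x, y) *= factor;
}

// y in the inner loop: consecutive iterations are xSize elements apart.
void scaleStrided(Grid& g, double factor) {
    for (std::size_t x = 0; x < g.xSize; ++x)
        for (std::size_t y = 0; y < g.ySize; ++y)
            g.at(x, y) *= factor;
}

// Times a single call of f, in seconds, using a monotonic clock.
template <typename F>
double secondsFor(F f) {
    auto start = std::chrono::steady_clock::now();
    f();
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Both functions compute the same result; only the memory access pattern differs. Timing them with something like `secondsFor([&] { scaleStrided(g, 2.0); })` on a grid too large for the cache should show the gap, though the exact ratio depends on the machine and compiler.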
Your test is flawed.
- In a DEBUG build, boost::MultiArray lacks the optimization pass that it sorely needs (much more than a native array would).
- In a RELEASE build, your compiler will look for code that can be removed outright and most of your code is in that category.
What you're likely seeing is the result of your optimizing compiler seeing that most or all of your "native array" loops can be removed. The same is theoretically true of your boost::MultiArray loops, but MultiArray is probably complex enough to defeat your optimizer.
Make this small change to your testbed and you'll see more true-to-life results: replace both occurrences of "= 2.345" with "*= 2.345" and compile again with optimizations. This will prevent your compiler from discovering that the outer loop of each test is redundant.
I did it and got a speed comparison closer to 2:1.
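The difference between the two forms can be sketched as follows. This is an illustration under assumptions, not the original benchmark: the function names are hypothetical and a `std::vector<double>` stands in for whichever array is being tested.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain assignment: every pass after the first stores the same values
// again, so an optimizer is free to treat the outer loop as redundant.
double assignPasses(std::vector<double>& m, int iterations) {
    assert(iterations >= 0);
    for (int i = 0; i < iterations; ++i)
        for (std::size_t k = 0; k < m.size(); ++k)
            m[k] = 2.345;
    return m.empty() ? 0.0 : m[0];
}

// Compound assignment: each pass reads what the previous pass wrote,
// which makes the outer loop much harder to optimize away.
double scalePasses(std::vector<double>& m, int iterations) {
    assert(iterations >= 0);
    for (int i = 0; i < iterations; ++i)
        for (std::size_t k = 0; k < m.size(); ++k)
            m[k] *= 2.345;
    return m.empty() ? 0.0 : m[0];
}
```

Returning an element (and actually using it, for example by printing it) keeps the result observable, which also discourages the compiler from discarding the work. Note that repeated multiplication by 2.345 eventually overflows a double to infinity if the iteration count is large enough.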
Are you building release or debug?
If running in debug mode, the boost array might be really slow because its template magic isn't inlined properly, which adds a lot of function-call overhead. I'm not sure how multi_array is implemented, though, so this might be totally off :)
Perhaps there is some difference in storage order as well, so you might have your image stored column by column while writing it row by row. This would give poor cache behavior and may slow things down.
Try switching the order of the X and Y loop and see if you gain anything. There is some info on the storage ordering here: http://www.boost.org/doc/libs/1_37_0/libs/multi_array/doc/user.html
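As a rough illustration of what storage order means, the sketch below computes the flat offset of element (i, j) under the two usual conventions. The function names are hypothetical and the code deliberately avoids Boost; as far as I know, multi_array defaults to C (row-major) order and can be switched to Fortran (column-major) order via its storage-order argument, which the linked page describes.

```cpp
#include <cassert>
#include <cstddef>

// Row-major (C order): the last index varies fastest, so stepping j by
// one moves to the adjacent element in memory.
std::size_t offsetRowMajor(std::size_t i, std::size_t j,
                           std::size_t n0, std::size_t n1) {
    assert(i < n0 && j < n1);
    (void)n0;  // only needed for the bounds check above
    return i * n1 + j;
}

// Column-major (Fortran order): the first index varies fastest, so
// stepping i by one moves to the adjacent element in memory.
std::size_t offsetColumnMajor(std::size_t i, std::size_t j,
                              std::size_t n0, std::size_t n1) {
    assert(i < n0 && j < n1);
    (void)n1;  // only needed for the bounds check above
    return i + j * n0;
}
```

Whichever index has a stride of one is the index that belongs in the innermost loop; putting the other index there makes every step jump a whole row or column ahead in memory.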
EDIT: Since you seem to be using the two-dimensional array for image processing, you might be interested in checking out Boost's image processing library, GIL.
It might have arrays with less overhead that work perfectly for your situation.