Why is matrix multiplication in .NET so slow?

With large matrices like this, the CPU cache becomes the limiting factor, so how the matrix is stored in memory is hyper-important. And the benchmark code is comparing apples and oranges: the C++ code used jagged arrays, while the C# code uses two-dimensional arrays.

Rewriting the C# code to use jagged arrays as well doubled its speed. Rewriting the matrix multiply code to avoid the array index bounds check seemed pointless, since nobody would use code like this for real problems.
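To make the storage difference concrete, here is a minimal sketch of the same naive multiply written against both array kinds. The class and method names are made up for illustration and this is not the benchmark's actual code; it assumes non-empty matrices with compatible dimensions.

```csharp
using System;

public static class MatrixLayouts
{
    // Multidimensional (rectangular) array: a single object indexed as [i, j].
    public static double[,] MultiplyRectangular(double[,] a, double[,] b)
    {
        int n = a.GetLength(0), m = a.GetLength(1), p = b.GetLength(1);
        var c = new double[n, p];
        for (int i = 0; i < n; i++)
        {
            for (int j = 0; j < p; j++)
            {
                double sum = 0.0;
                for (int k = 0; k < m; k++)
                    sum += a[i, k] * b[k, j];
                c[i, j] = sum;
            }
        }
        return c;
    }

    // Jagged array: an array of independent row arrays indexed as [i][j].
    public static double[][] MultiplyJagged(double[][] a, double[][] b)
    {
        int n = a.Length, m = b.Length, p = b[0].Length;
        var c = new double[n][];
        for (int i = 0; i < n; i++)
        {
            double[] rowA = a[i];                  // fetch the row once per i
            double[] rowC = c[i] = new double[p];
            for (int j = 0; j < p; j++)
            {
                double sum = 0.0;
                for (int k = 0; k < m; k++)
                    sum += rowA[k] * b[k][j];
                rowC[j] = sum;
            }
        }
        return c;
    }
}
```

Both methods compute the same product; only the storage and indexing differ, which is the variable the benchmark failed to hold constant.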


To explain the origin of the idea that XNA matrix operations are slow:

First of all there's the beginner-level gotcha: the XNA Matrix type's operator* will make several copies, because Matrix is a struct and the operands and result are passed by value. This is slower than what you might expect from the equivalent C++ code.

(Of course, if you use Matrix.Multiply(), then you can pass by reference.)
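As a rough illustration of the copy issue, the sketch below uses a hypothetical two-by-two struct, Mat2, as a stand-in (it is not the XNA type). Its by-reference Multiply mirrors the general shape of XNA's Matrix.Multiply(ref Matrix, ref Matrix, out Matrix) overload.

```csharp
using System;

// Hypothetical stand-in for a matrix value type; NOT the XNA Matrix struct.
public struct Mat2
{
    public float M11, M12, M21, M22;

    public Mat2(float m11, float m12, float m21, float m22)
    {
        M11 = m11; M12 = m12; M21 = m21; M22 = m22;
    }

    // Operator form: both operands arrive by value and the result is
    // returned by value, so the struct's fields get copied around.
    public static Mat2 operator *(Mat2 a, Mat2 b)
    {
        Mat2 result;
        Multiply(ref a, ref b, out result);
        return result;
    }

    // By-reference form: the caller's storage is read and written directly.
    public static void Multiply(ref Mat2 a, ref Mat2 b, out Mat2 result)
    {
        // Compute into locals first so result may alias a or b safely.
        float m11 = a.M11 * b.M11 + a.M12 * b.M21;
        float m12 = a.M11 * b.M12 + a.M12 * b.M22;
        float m21 = a.M21 * b.M11 + a.M22 * b.M21;
        float m22 = a.M21 * b.M12 + a.M22 * b.M22;
        result = new Mat2(m11, m12, m21, m22);
    }
}
```

With a full 4x4 matrix of floats each by-value copy moves 64 bytes, which is why the by-reference overload matters more there than in this small stand-in.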

The second reason is that the .NET Compact Framework used by XNA on the Xbox 360 does not have access to the VMX hardware (SIMD) that is available to native, C++ games.

This is why you keep hearing that it is slow, at least. As you can see from the benchmarks you posted - it's not really that "slow", when you compare apples to apples.


Well, clearly the benchmark author did not understand the difference between jagged and multidimensional arrays in C#. It really was not an apples-to-apples comparison. When I changed the code to use jagged arrays instead of multidimensional arrays, so that it operates in a manner more similar to Java, the C# code ended up running twice as fast, making it faster than Java (though just barely, and the difference is probably statistically insignificant). In C#, multidimensional arrays are slower because there is extra work involved in finding the array slot and because the array bounds check cannot be eliminated for them... yet.

See this question for a more in-depth analysis of why multidimensional arrays are slower than jagged arrays.

See this blog for more information on array bounds checking. The article specifically warns against using multidimensional arrays for matrix multiplication.
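For reference, this is the loop shape usually associated with bounds-check elimination on single-dimensional arrays, next to the multidimensional equivalent. It is only a sketch with made-up names; whether the JIT actually drops the check depends on the runtime version, so treat the comments as the commonly described behavior rather than a guarantee.

```csharp
using System;

public static class RowSum
{
    // Jagged: the inner loop runs over one single-dimensional array and is
    // bounded by that array's own Length, the pattern the JIT is commonly
    // described as recognizing when it removes per-element bounds checks.
    public static double SumJaggedRow(double[][] m, int i)
    {
        double[] row = m[i];
        double sum = 0.0;
        for (int j = 0; j < row.Length; j++)
            sum += row[j];
        return sum;
    }

    // Multidimensional: each m[i, j] access computes an offset from two
    // indices, and each index is checked against its dimension.
    public static double SumRectangularRow(double[,] m, int i)
    {
        int cols = m.GetLength(1);
        double sum = 0.0;
        for (int j = 0; j < cols; j++)
            sum += m[i, j];
        return sum;
    }
}
```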


Here's an updated benchmark dealing with matrix multiplication (and some benchmarks using the new Task Parallel Library):

Parallel Matrix Multiplication with the Task Parallel Library (TPL)

The article goes into different methods, and explains why multidimensional arrays are a poor choice:

The easiest way to do matrix multiplication is with a .NET multidimensional array with i,j,k ordering in the loops. The problems are twofold. First, the i,j,k ordering accesses memory in a hectic fashion causing data in varied locations to be pulled in. Second, it is using a multidimensional array. Yes, the .NET multidimensional array is convenient, but it is very slow.
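A common remedy for both problems is to switch to jagged arrays and reorder the loops to i,k,j so that the innermost loop walks along rows. The sketch below shows that idea under those assumptions; the names are illustrative, the matrices are assumed non-empty and compatible, and it is not necessarily the exact code from the linked article.

```csharp
using System;

public static class LoopOrder
{
    // i,k,j ordering on jagged arrays: the inner loop reads rowB and
    // writes rowC sequentially instead of striding down a column of b.
    public static double[][] MultiplyIkj(double[][] a, double[][] b)
    {
        int n = a.Length, m = b.Length, p = b[0].Length;
        var c = new double[n][];
        for (int i = 0; i < n; i++)
        {
            double[] rowA = a[i];
            double[] rowC = c[i] = new double[p];   // starts zero-filled
            for (int k = 0; k < m; k++)
            {
                double aik = rowA[k];
                double[] rowB = b[k];
                for (int j = 0; j < rowC.Length; j++)
                    rowC[j] += aik * rowB[j];
            }
        }
        return c;
    }
}
```

The arithmetic is the same as in the i,j,k version; only the order in which the partial products are accumulated changes.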