Time-efficient matrix elements grouping and summing
Partition[test, {2,3}]
is quite slow in this case because it has to rearrange the elements in the data vector that represents the entries of a packed array in the backend:
Flatten[test] == Flatten[Partition[test, {2, 3}]]
False
Using Span
(;;
) as follows employs 6 monotonically increasing read operations; in this specific case, these operations are faster than using Partition
:
n = 24000;
test = RandomInteger[1, {n, n}];
a = Total[Partition[test, {2, 3}], {3, 4}]; //
AbsoluteTiming // First
b = Sum[test[[i ;; ;; 2, j ;; ;; 3]], {i, 1, 2}, {j, 1, 3}]; //
AbsoluteTiming // First
a == b
245.89
117.943
True
However, this performance advantage seems to decay when the matrix test
becomes bigger (so swapping is required). E.g., for $n = 4800$, method b
is aboutten times faster than
a`, but for $n = 24000$, it's only a factor of 4.6 and here it has degraded to a factor of 2 or so...
SparseArray method
Have I said already that I love SparseArray
s?
AbsoluteTiming[
c = Dot[
KroneckerProduct[
IdentityMatrix[n/2, SparseArray],
ConstantArray[1, {1, 2}]
],
Dot[
test,
KroneckerProduct[
IdentityMatrix[n/3, SparseArray],
ConstantArray[1, {3, 1}]
]
]
]
][[1]]
a == c
76.3822
True
The story goes on...
A combination of the SparseArray
method from above with a CompiledFunction
):
cf = Compile[{{x, _Integer, 1}, {k, _Integer}},
Table[
Sum[Compile`GetElement[x, i + j], {j, 1, k}],
{i, 0, Length[x] - 1, k}],
CompilationTarget -> "C",
RuntimeAttributes -> {Listable},
Parallelization -> True,
RuntimeOptions -> "Speed"
];
d = KroneckerProduct[
IdentityMatrix[n/2, SparseArray],
ConstantArray[1, {1, 2}]
].cf[test, 3]; // AbsoluteTiming // First
a == d
33.5677
True