Most efficient way to build block / banded SparseArray
This is definitely faster. It took me some time to figure out the combinatorics and there might still be some potential for improvement within the compiled functions.
getColumnIndices = Compile[{{m, _Integer}, {n, _Integer}, {i, _Integer}},
Transpose[Partition[
Partition[
Join[
Table[k, {k, 1, i n}],
Flatten[Table[Table[k, {j, 1, n}], {k, i n + 1, i n + n}]],
Table[k, {k, i n + n + 1, m n}]
], {1}], {n}]],
RuntimeAttributes -> {Listable},
Parallelization -> True,
CompilationTarget -> "C",
RuntimeOptions -> "Speed"
];
getValues = Compile[{{k1row, _Real, 1}, {k2, _Real, 2}, {i, _Integer}},
Block[{A},
A = Join[
Table[Compile`GetElement[k1row, k], {l, 1, Length[k2]}, {k, 1, i}],
k2,
Table[Compile`GetElement[k1row, k], {l, 1, Length[k2]}, {k, i + 2, Length[k1row]}],
2
];
Do[A[[k, k + i]] += Compile`GetElement[k1row, i + 1], {k, 1, Length[k2]}];
A
],
RuntimeAttributes -> {Listable},
Parallelization -> True,
CompilationTarget -> "C",
RuntimeOptions -> "Speed"
];
makeKMat3[k1_, k2_] := With[{m = Length[k1], n = Length[k2]},
SparseArray @@ {Automatic, {m n, m n}, 0, {1, {
Range[0, (m n ) (m + n - 1), m + n - 1],
Flatten[getColumnIndices[m, n, Range[0, m - 1]], 2]
},
Flatten[getValues[k1, k2, Range[0, m - 1]]]
}}
]
Just to give you an idea of the timings:
m = 100;
n = 6;
k1 = RandomReal[{-1, 1}, {m, m}];
k2 = RandomReal[{-1, 1}, {n, n}];
A = makeKMat2[k1, k2]; // RepeatedTiming
B = makeKMat3[k1, k2]; // RepeatedTiming
Max[Abs[A - B]]
{0.11, Null}
{0.000481, Null}
0.
Addendum
The idea for the "higher dimensional" case is similar to the idea above. I exploit that the "ColumnIndices" of the resulting matrices (when partitioned into column indices per row) stay rectangular so that column indices from the diagonal matrices can be joined to it from the left and right. Some extra index list (diagidx
) is needed for book keeping of those column indices of the blocks that belong to the diagonal entries. All in all, only operations on the nonzero values and the column indices are performed. No SparseArray
s are built intermediately: Even building a SparseArray
from row pointers, column indices and nonzero values has still some considerable overhead, probably because there is a consistency checker involved in the backend. That's a pitty since I know that I produce consistent data. This issue is also closely related to this post by Szabolcs.
getValues2 = Compile[{{k1row, _Real, 1}, {blockvals, _Real, 2}, {diagidx, _Integer, 1}, {i, _Integer}},
Block[{A},
A = Join[
Table[Compile`GetElement[k1row, k], {l, 1, Dimensions[blockvals][[1]]}, {k, 1, i}],
blockvals,
Table[Compile`GetElement[k1row, k], {l, 1, Dimensions[blockvals][[1]]}, {k, i + 2, Length[k1row]}],
2];
Do[A[[k, Compile`GetElement[diagidx, k] + i]] += Compile`GetElement[k1row, i + 1], {k, 1, Length[blockvals]}];
A
],
RuntimeAttributes -> {Listable},
Parallelization -> True,
CompilationTarget -> "C",
RuntimeOptions -> "Speed"
];
getColumnIndices2 = Compile[{
{blockci, _Integer, 3}, {diagci, _Integer,
3}, {m, _Integer}, {n, _Integer}, {i, _Integer}
},
If[i > 0,
If[i < m - 1,
Join[diagci[[All, 1 ;; i]], blockci + i n, diagci[[All, i + 1 ;; m - 1]] + (n), 2],
Join[diagci[[All, 1 ;; i]], blockci + i n, 2]
],
Join[blockci + i n, diagci[[All, i + 1 ;; m - 1]] + (n), 2]
],
RuntimeAttributes -> {Listable},
Parallelization -> True,
CompilationTarget -> "C",
RuntimeOptions -> "Speed"
];
toSparseArrayData[b_?MatrixQ] := {
Partition[SparseArray[b]["ColumnIndices"], Dimensions[b][[2]]],
b,
Dimensions[b][[2]],
Range[Dimensions[b][[2]]]
}
toSparseArray[X_] :=
With[{d1 = Dimensions[X[[1]]][[1]], d2 = Dimensions[X[[1]]][[2]]},
SparseArray @@ {Automatic, {d1, d1}, 0,
{1, {Range[0, d1 d2, d2], Flatten[X[[1]], 1]}, Flatten[X[[2]]]}}
]
iteration[X_, a_] := With[{
m = Length[a],
blockci = X[[1]],
blockvals = X[[2]],
n = X[[3]],
diagidx = X[[4]]
},
With[{ran = Range[0, m - 1]},
{
Join @@ getColumnIndices2[blockci, Transpose[Partition[Partition[Range[(m - 1) n], 1], n]], m, n, ran],
Join @@ getValues2[a, blockvals, diagidx, ran],
m n,
Join @@ Outer[Plus, ran, diagidx]
}
]
]
makeKMat3ND[ks : {__List}] :=
toSparseArray[Fold[iteration, toSparseArrayData[ks[[1]]], Rest[ks]]]
Usage example and timing test:
SeedRandom[123];
ks = Table[RandomReal[{-1, 1}, {RandomInteger[{3, 8}]}[[{1, 1}]]], 6];
A = makeKMat2ND[Reverse@ks]; // AbsoluteTiming
B = makeKMat3ND[ks]; // AbsoluteTiming
Max[Abs[A - B]]
{24.0011, Null}
{0.038756, Null}
8.88178*10^-16
This is the sparsity pattern of the resulting matrix:
Not faster, but maybe easier to implement
makeKMat2[k1_, k2_] := Module[{A, m, n},
m = Length[k1];
n = Length[k2];
A = Map[SparseArray[Band[{1, 1}] -> #, {n, n}, 0.] &, k1, {2}];
With[{B = SparseArray@k2}, Do[A[[i, i]] += B, {i, 1, m}]];
ArrayFlatten[A]
]
Your implementation needs about 0.0048
s; mine needs 0.0051
s on my machine.
(b3m2a1 comment)
Note that this has a simple extension to the N-dimensional case:
makeKMat2ND[ks : {__List}] := Fold[makeKMat2, ks]
This is slightly faster and certainly more compact than the uncompiled implementation in Henrik's answer:
makeKMat[k1_?SquareMatrixQ, k2_?SquareMatrixQ] := Module[{m = Length[k1], n = Length[k2]},
KroneckerProduct[k1, IdentityMatrix[n, SparseArray]] +
KroneckerProduct[IdentityMatrix[m, SparseArray], k2]]
and as b3m2a1 notes, the extension can be done as
makeKMatND[ks : {__?SquareMatrixQ}] := Fold[makeKMat[#2, #1] &, ks]
Using the examples in the OP:
MatrixPlot[makeKMat[ConstantArray[1, {9, 9}], ConstantArray[2, {6, 6}]]]
AbsoluteTiming[tst =
makeKMatND[{ConstantArray[4, {5, 5}], ConstantArray[3, {4, 4}],
ConstantArray[2, {3, 3}], ConstantArray[1, {2, 2}]}];][[1]]
0.000406914
MatrixPlot[tst, MaxPlotPoints -> Infinity]