Speeding up generation of block diagonal matrix
You are right, it can be done in a fraction of second. One can explicitly construct an array of indexes
blockArray[mat_] := SparseArray[
Tuples[Range@# - {1, 0, 0}].{Rest@#, {1, 0}, {0, 1}} &@Dimensions@mat ->
Flatten@mat]
Timings:
matrices = RandomReal[1, {48, 128, 128}];
s1 =
SparseArray@
ArrayFlatten@ReleaseHold@DiagonalMatrix[Hold /@ matrices]; // RepeatedTiming
(* {7.56, Null} *)
s2 = SparseArray[Band@{1, 1} -> matrices]; // RepeatedTiming
(* {4.03, Null} *)
s3 = blockArray[matrices]; // RepeatedTiming
(* {0.097, Null} *)
TrueQ[s1 == s2 == s3]
(* True *)
For further acceleration you can create the internal structure of the SparseArray
directly
c = Compile[{{b, _Integer}, {h, _Integer}, {w, _Integer}},
Partition[Flatten@Table[k + i w, {i, 0, b - 1}, {j, h}, {k, w}], 1],
CompilationTarget -> "C", RuntimeOptions -> "Speed"];
blockArray2[mat_] :=
SparseArray @@ {Automatic, # {##2},
0, {1, {Range[0, 1 ##, #3], c@##}, Flatten@mat}} & @@ Dimensions@mat
s4 = blockArray2[matrices]; // RepeatedTiming
(* {0.031, Null} *)
s3 == s4
(* True *)
There is an undocumented built-in solution:
rules = {#, #} -> RandomReal[{}, {128, 128}] & /@ Range[48];
SparseArray`SparseBlockMatrix[rules]; // RepeatedTiming
(* {0.042, Null} *)