How to implement the general array broadcasting method from NumPy?
Broadcasting vs Listability
NumPy broadcasting lets you perform, in efficient way, element-wise operations on arrays, as long as dimensions of those arrays are considered "compatible" in some sense.
Mathematica also has such mechanism. Some Mathematica functions are Listable
and also allow you to perform element-wise operations on nested lists with dimensions "compatible" in some sense.
Built-in listable functions are optimized for packed arrays and, similarly to NumPy's broadcasting, will give you "C-level" efficiency.
In addition to that Mathematica allows you to Compile
functions with Listable
RuntimeAttributes
which gives you some additional control over "compatibility" of arrays. Listable
compiled functions can also be easily parallelized.
There are two important differences between how NumPy's broadcasting and Mathematica's listability (compiled and not) determine if arrays are "compatible":
- order in which dimensions are compared,
- what happens when certain dimensions are equal 1.
Leading vs Trailing Dimensions
Broadcasting
NumPy starts with trailing dimensions, Mathematica - with leading.
So NumPy can e.g. add arrays with dimensions {8,5,7,4}
and {7,4}
out of the box:
import numpy as np
(np.zeros((8,5,7,4))+np.ones((7,4))).shape
# (8, 5, 7, 4)
In Mathematica this would lead to an error:
Array[0 &, {8, 5, 7, 4}] + Array[1 &, {7, 4}];
(* Thread::tdlen: Objects of unequal length in ... cannot be combined. *)
To use listability we can transpose one of arrays to put "compatible" dimensions to the front and after addition transpose back:
Transpose[
Transpose[Array[0 &, {8, 5, 7, 4}], {3, 4, 1, 2}] +
Array[1 &, {7, 4}], {3, 4, 1, 2}
] // Dimensions
(* {8, 5, 7, 4} *)
Listability
In contrast Mathematica can, out of the box, add arrays with dimensions {4,7,5,8}
and {4,7}
:
Array[0 &, {4, 7, 5, 8}] + Array[1 &, {4, 7}] // Dimensions
(* {4, 7, 5, 8} *)
which would lead to an error in NumPy
import numpy as np
(np.zeros((4,7,5,8))+np.ones((4,7)))
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# ValueError: operands could not be broadcast together with shapes (4,7,5,8) (4,7)
Similarly to use broadcasting we could transpose our arrays:
import numpy as np
(np.zeros((4,7,5,8)).transpose(2,3,0,1)+np.ones((4,7))).transpose(2,3,0,1).shape
# (4, 7, 5, 8)
I don't know if this is the "correct" way to do it in NumPy.
As far as I know, in contrast to Mathematica, NumPy is not copying an array on transposition, it returns a view
of an array i.e. an object with information on how data from base
array should be accessed.
So I think that those transpositions are much cheaper than in Mathematica.
I doubt that it's possible to replicate NumPy's efficiency, on arrays which are "listability incompatible", using only top-level Mathemaica code.
As noted in comment, by @LLlAMnYP, design decision to start from leading dimensions makes, in Mathematica, more sense, since listability applies not only to full arrays, but to arbitrary nested lists.
Compiled Listability
Since compiled functions accept only full arrays with specified rank, Compilation allows you to "split" ranks of full arrays into two parts. Last dimensions given by ranks in arguments list of Compile
will be handled inside body of your compiled function, and remaining leading dimensions will be handled by Listable
attribute of compiled function.
For tests let's compile simple listable function accepting two rank 2 arrays of reals:
cPlus22 = Compile[{{x, _Real, 2}, {y, _Real, 2}}, x + y, RuntimeAttributes -> {Listable}]
Now last two dimensions need to be equal since they are handled by Plus
inside body of compiled function. Remaining dimensions will be handled by ordinary listability rules starting with leading ones:
cPlus22[Array[0 &, {4, 7, 5, 8}], Array[1 &, {5, 8}]] // Dimensions
(* {4, 7, 5, 8} *)
cPlus22[Array[0 &, {4, 7, 5, 8}], Array[1 &, {4, 5, 8}]] // Dimensions
(* {4, 7, 5, 8} *)
cPlus22[Array[0 &, {4, 7, 5, 8}], Array[1 &, {4, 7, 5, 8}]] // Dimensions
(* {4, 7, 5, 8} *)
cPlus22[Array[0 &, {4, 7, 5, 8}], Array[1 &, {4, 7, 3, 5, 8}]] // Dimensions
(* {4, 7, 3, 5, 8} *)
Treating Dimensions equal to 1
Broadcasting
When comparing consecutive dimensions NumPy's broadcasting treats them as "compatible" if they are equal, or one of them is 1. Mathematica's listability treats dimensions as "compatible" only if they are equal.
In NumPy we can do
import numpy as np
(np.zeros((1,8,1,3,7,1))+np.ones((2,1,5,3,1,4))).shape
# (2, 8, 5, 3, 7, 4)
which gives a generalized outer product.
Outer
Mathematica has a built-in to do this kind of tasks: Outer
(as noted in comment by @Sjoerd), which is "C-level efficient" when given Plus
, Times
and List
functions and packed arrays.
But Outer
has its own rules for dimension "compatibility", to replicate NumPy's broadcasting conventions, all pairwise equal dimensions should be moved to the end, and dimensions equal one, that are supposed to be broadcasted, should be removed.
This in general requires accessing Part
s of arrays and transpositions (which in Mathematica enforces copying).
(a = Transpose[Array[0 &, {1, 8, 1, 3, 7, 1}][[1, All, 1, All, All, 1]], {1, 3, 2}]) // Dimensions
(* {8, 7, 3} *)
(b = Transpose[Array[1 &, {2, 1, 5, 3, 1, 4}][[All, 1, All, All, 1]], {1, 2, 4, 3}]) // Dimensions
(* {2, 5, 4, 3} *)
Transpose[Outer[Plus, a, b, 2, 3], {2, 5, 1, 3, 6, 4}] // Dimensions
(* {2, 8, 5, 3, 7, 4} *)
Compiled Listability
Using different ranks in argument list of Compile
results in a kind of outer product to. "Excessive" trailing dimensions of higher rank array don't have to be compatible with any dimensions of lower rank array since they will end up appended at the and of dimensions of result.
cPlus02 = Compile[{x, {y, _Real, 2}}, x + y, RuntimeAttributes -> {Listable}];
cPlus02[Array[0 &, {4, 7, 5, 8}], Array[1 &, {3, 9}]] // Dimensions
(* {4, 7, 5, 8, 3, 9} *)
cPlus02[Array[0 &, {4, 7, 5, 8}], Array[1 &, {4, 3, 9}]] // Dimensions
(* {4, 7, 5, 8, 3, 9} *)
cPlus02[Array[0 &, {4, 7, 5, 8}], Array[1 &, {4, 7, 3, 9}]] // Dimensions
(* {4, 7, 5, 8, 3, 9} *)
cPlus02[Array[0 &, {4, 7, 5, 8}], Array[1 &, {4, 7, 5, 3, 9}]] // Dimensions
(* {4, 7, 5, 8, 3, 9} *)
cPlus02[Array[0 &, {4, 7, 5, 8}], Array[1 &, {4, 7, 5, 8, 3, 9}]] // Dimensions
(* {4, 7, 5, 8, 3, 9} *)
cPlus02[Array[0 &, {4, 7, 5, 8}], Array[1 &, {4, 7, 5, 8, 2, 3, 9}]] // Dimensions
(* {4, 7, 5, 8, 2, 3, 9} *)
To emulate broadcasting in this case dimensions equal 1
should be removed, dimensions to be broadcasted from one array should be moved to beginning, and from other - to the end. Compiled function should have an argument with rank equal to number of compatible dimensions, as this argument, array with dimensions to be broadcasted at beginning, should be passed. Other argument should have rank equal to rank of array with dimensions to be broadcasted at end.
(a = Transpose[Array[0 &, {1, 8, 1, 3, 7, 1}][[1, All, 1, All, All, 1]], {1, 3, 2}]) // Dimensions
(* {8, 7, 3} *)
(b = Transpose[Array[1 &, {2, 1, 5, 3, 1, 4}][[All, 1, All, All, 1]], {2, 3, 1, 4}]) // Dimensions
(* {3, 2, 5, 4} *)
cPlus14 = Compile[{{x, _Real, 1}, {y, _Real, 4}}, x + y, RuntimeAttributes -> {Listable}];
Transpose[cPlus14[a, b], {2, 5, 4, 1, 3, 6}] // Dimensions
(* {2, 8, 5, 3, 7, 4} *)
Since compatible dimensions don't have to be handled inside body of compiled function, but can be handled by Listable
attribute, there are different orderings possible. Each compatible dimension can be moved from the middle of dimensions of first array to the beginning, and rank of both arguments of compiled function can be decreased by one for each such dimension.
(a = Transpose[Array[0 &, {1, 8, 1, 3, 7, 1}][[1, All, 1, All, All, 1]], {2, 1, 3}]) // Dimensions
(* {3, 8, 7} *)
(b = Transpose[Array[1 &, {2, 1, 5, 3, 1, 4}][[All, 1, All, All, 1]], {2, 3, 1, 4}]) // Dimensions
(* {3, 2, 5, 4} *)
cPlus03 = Compile[{x, {y, _Real, 3}}, x + y, RuntimeAttributes -> {Listable}];
Transpose[cPlus03[a, b], {4, 2, 5, 1, 3, 6}] // Dimensions
(* {2, 8, 5, 3, 7, 4} *)
General Broadcasting in Mathematica
Below I present three approaches to broadcasting in Mathematica, with different generality and efficiency.
Top-level Procedural code.
It's straightforward, completely general (works for arbitrary number of lists and arbitrary function), but it's slow.
LibraryLink static function.
It's very fast, currently works for addition of arbitrary number of real arrays with arbitrary dimensions.
LibraryLink JIT compiled function.
It's fastest, from presented solutions, and quite general (works for arbitrary compilable function and arbitrary number of arbitrary packable arrays with arbitrary dimensions), but it's compiled separately for each function and each "type" of arguments.
1. Top-level Procedural
This implementation uses dimensions of input arrays to construct proper Table
expression that creates resulting array in one call by extracting proper elements from input arrays.
A helper function that constructs the Table
expression:
ClearAll[broadcastingTable]
broadcastingTable[h_, f_, arrays_, dims_, maxDims_] :=
Module[{inactive, tableVars = Table[Unique["i"], Length[maxDims]]},
Prepend[
inactive[h] @@ Transpose[{tableVars, maxDims}],
inactive[f] @@ MapThread[
inactive[Part][#1, Sequence @@ #2] &,
{
arrays,
MapThread[
If[#1 === 1, 1, #2] &,
{#, PadLeft[tableVars, Length[#]]}
] & /@ dims
}
]
] /. inactive[x_] :> x
]
Example table expression (with head replaced by Hold
) for three arrays with dimensions: {4, 1, 5}
, {7, 4, 3, 1}
and {1, 5}
looks like this:
broadcastingTable[Hold, Plus,
{arr1, arr2, arr3},
{{4, 1, 5}, {7, 4, 3, 1}, {1, 5}},
{7, 4, 3, 5}
]
(* Hold[arr1[[i4, 1, i6]] + arr2[[i3, i4, i5, 1]] + arr3[[1, i6]], {i3, 7}, {i4, 4}, {i5, 3}, {i6, 5}] *)
And now the final function:
ClearAll[broadcasted]
broadcasted::incompDims = "Objects with dimentions `1` can't be broadcasted.";
broadcasted[f_, lists__] :=
Module[{listOfLists, dims, dimColumns},
listOfLists = {lists};
dims = Dimensions /@ listOfLists;
dimColumns = Transpose@PadLeft[dims, Automatic, 1];
broadcastingTable[Table, f, listOfLists, dims, Max /@ dimColumns] /;
If[MemberQ[dimColumns, dimCol_ /; ! SameQ @@ DeleteCases[dimCol, 1]],
Message[broadcasted::incompDims, dims];
False
(* else *),
True
]
]
It works for any function and any lists not necessary full arrays:
broadcasted[f, {a, {b, c}}, {{1}, {2}}]
(* {{f[a, 1], f[{b, c}, 1]}, {f[a, 2], f[{b, c}, 2]}} *)
For full arrays gives same results as NumPy:
broadcasted[Plus, Array[a, {2}], Array[b, {10, 2}]] // Dimensions
(* {10, 2} *)
broadcasted[Plus, Array[a, {3, 4, 1, 5, 1}], Array[b, {3, 1, 2, 1, 3}]] // Dimensions
(* {3, 4, 2, 5, 3} *)
broadcasted[Plus, Array[a, {10, 1, 5, 3}], Array[b, {2, 1, 3}], Array[# &, {5, 1}]] // Dimensions
(* {10, 2, 5, 3} *)
If dimensions are not broadcastable message is printed and function remains unevaluated:
broadcasted[Plus, Array[a, {3}], Array[b, {4, 2}]]
(* During evaluation of In[]:= broadcasted::incompDims: Objects with dimentions {{3},{4,2}} can't be broadcasted. *)
(* broadcasted[Plus,
{a[1], a[2], a[3]},
{{b[1, 1], b[1, 2]}, {b[2, 1], b[2, 2]}, {b[3, 1], b[3, 2]}, {b[4, 1], b[4, 2]}}
] *)
2. LibraryLink static
Here is a LibraryLink function that handles arbitrary number of arrays of reals with arbitrary dimensions.
/* broadcasting.c */
#include "WolframLibrary.h"
DLLEXPORT mint WolframLibrary_getVersion() {
return WolframLibraryVersion;
}
DLLEXPORT int WolframLibrary_initialize(WolframLibraryData libData) {
return LIBRARY_NO_ERROR;
}
DLLEXPORT void WolframLibrary_uninitialize(WolframLibraryData libData) {}
DLLEXPORT int plusBroadcastedReal(
WolframLibraryData libData, mint Argc, MArgument *Args, MArgument Res
) {
switch (Argc) {
case 0:
/* At least one argument is needed. */
return LIBRARY_FUNCTION_ERROR;
case 1:
/* If one argument is given just return it. */
MArgument_setMTensor(Res, MArgument_getMTensor(Args[0]));
return LIBRARY_NO_ERROR;
}
mint i, j;
/* ranks[i] is rank of i-th argument tensor. */
mint ranks[Argc];
/* dims[i][j] is j-th dimension of i-th argument tensor. */
const mint *(dims[Argc]);
/* data[i][j] is j-th element of i-th argument tensor. */
double *(data[Argc]);
/* Rank of result tensor. */
mint resultRank = 1;
for (i = 0; i < Argc; i++) {
MTensor tmpT = MArgument_getMTensor(Args[i]);
if (libData->MTensor_getType(tmpT) != MType_Real) {
return LIBRARY_TYPE_ERROR;
}
ranks[i] = libData->MTensor_getRank(tmpT);
dims[i] = libData->MTensor_getDimensions(tmpT);
data[i] = libData->MTensor_getRealData(tmpT);
if (resultRank < ranks[i]) {
resultRank = ranks[i];
}
}
/*
* Array of dimensions of argument tensors, with rows,
* for tensors with ranks lower than rank of result,
* filled with 1s from the beginning.
*/
mint extendedDims[Argc][resultRank];
/*
* Array of strides of argument tensors, with rows,
* for tensors with ranks lower than rank of result,
* filled with product of all tensor dimensions from the beginning.
*/
mint strides[Argc][resultRank];
/* Array of indices enumerating element of argument tensors. */
mint indices[Argc];
for (i = 0; i < Argc; i++) {
mint rankDiff = resultRank - ranks[i];
extendedDims[i][resultRank - 1] = dims[i][ranks[i] - 1];
strides[i][resultRank - 1] = extendedDims[i][resultRank - 1];
for (j = resultRank - 2; j >= rankDiff; j--) {
extendedDims[i][j] = dims[i][j - rankDiff];
strides[i][j] = strides[i][j + 1] * extendedDims[i][j];
}
for (j = rankDiff - 1; j >= 0; j--) {
extendedDims[i][j] = 1;
strides[i][j] = strides[i][rankDiff];
}
indices[i] = 0;
}
/* Dimensions of result tensor. */
mint resultDims[resultRank];
/*
* jumps[i][j] is jump of index of i-th argument tensor when index in j-th
* dimension of result tensor is incremented.
*/
mint jumps[Argc][resultRank];
/* Total number of elements in result tensor. */
mint resultElementsNumber = 1;
/* Array of indices enumerating elements of result tensor one index per dimension. */
mint resultIndices[resultRank];
for (i = resultRank - 1; i >= 0; i--) {
resultDims[i] = 1;
for (j= 0; j < Argc; j++) {
if (extendedDims[j][i] == 1) {
/*
* i-th dimension of j-th argument tensor is 1,
* so it should be broadcasted.
*/
jumps[j][i] = 1 - strides[j][i];
} else if (resultDims[i] == 1 || resultDims[i] == extendedDims[j][i]) {
/*
* i-th dimension of j-th argument tensor is not 1,
* but it's equal to all non-1 i-th dimensions of previous argument tensors,
* so i-th dimension of j-th argument tensor should be i-th dimension
* of result and it shouldn't be broadcasted.
*/
resultDims[i] = extendedDims[j][i];
jumps[j][i] = 1;
} else {
/*
* i-th dimension of j-th argument tensor is not 1,
* i-th dimension of at least one of previous argument tensors was not 1
* and those dimensions are not equal, so tensors are not broadcastable.
*/
libData->Message("plusBroadcastedDims");
return LIBRARY_DIMENSION_ERROR;
}
}
resultElementsNumber *= resultDims[i];
resultIndices[i] = 0;
}
/* Returned tensor. */
MTensor resultT;
libData->MTensor_new(MType_Real, resultRank, resultDims, &resultT);
/* Actual data of returned tensor. */
double *result;
result = libData->MTensor_getRealData(resultT);
/*
* We use single loop over all elements of result array.
* resultIndices array is updated inside loop and contains indices
* corresponding to current result element as if it was accessed using one
* index per dimension, i.e. result[i] is like
* result[resultIndices[0]][resultIndices[1]]...[resultIndices[resultRank-1]]
* for multidimensional array.
*/
for (i = 0; i < resultElementsNumber; i++) {
mint k = resultRank - 1;
resultIndices[k]++;
while (resultIndices[k] >= resultDims[k] && k >= 1) {
resultIndices[k] = 0;
k--;
resultIndices[k]++;
}
/*
* If result would be accessed using one index per dimension,
* then current value of k would correspond to dimension which
* index was incremented in this iteration.
*/
/* At this point we know that we have at least two argument tensors. */
result[i] = data[0][indices[0]] + data[1][indices[1]];
indices[0] += jumps[0][k];
indices[1] += jumps[1][k];
for (j = 2; j < Argc; j++) {
result[i] += data[j][indices[j]];
indices[j] += jumps[j][k];
}
}
MArgument_setMTensor(Res, resultT);
return LIBRARY_NO_ERROR;
}
Save above code in broadcasting.c
file in same directory as current notebook, or paste it as a string, instead of {"broadcasting.c"}
, as first argument of CreateLibrary
in code below. Pass, in "CompileOptions"
, appropriate optimization flags for your compiler, the ones below are for GCC.
Needs["CCompilerDriver`"]
SetDirectory[NotebookDirectory[]];
broadcastingLib =
CreateLibrary[
{"broadcasting.c"}, "broadcasting",
(* "CompileOptions" -> "-Wall -march=native -O3" *)
];
LibraryFunction::plusBroadcastedDims =
"Given arrays could not be broadcasted together.";
A helper function that loads appropriate library function for given number of array arguments.
ClearAll[loadPlusBroadcastedReal]
loadPlusBroadcastedReal[argc_] := loadPlusBroadcastedReal[argc] =
Quiet[
LibraryFunctionLoad[
broadcastingLib,
"plusBroadcastedReal",
ConstantArray[{Real, _, "Constant"}, argc],
{Real, _}
],
LibraryFunction::overload
]
Now final function that accepts arbitrary number of arrays with arbitrary dimensions, loads necessary library function, and uses it.
ClearAll[plusBroadcastedReal]
plusBroadcastedReal[arrays__] :=
loadPlusBroadcastedReal[Length@{arrays}][arrays]
It works as expected:
plusBroadcastedReal[{1., 2.}, {{3., 4.}, {5., 6.}, {7., 8.}}]
(* {{4., 6.}, {6., 8.}, {8., 10.}} *)
If given arrays have incompatible dimensions, then an error is generated:
plusBroadcastedReal[RandomReal[{0, 1}, {4}], RandomReal[{0, 1}, {2, 3}]]
(* During evaluation of In[]:= LibraryFunction::plusBroadcastedDims: Given arrays could not be broadcasted together. >> *)
(* During evaluation of In[]:= LibraryFunction::dimerr: An error caused by inconsistent dimensions or exceeding array bounds was encountered evaluating the function plusBroadcastedReal. >> *)
(* LibraryFunctionError["LIBRARY_DIMENSION_ERROR", 3] *)
Full post exceeded maximum allowed size, so it's continued in second answer.
Challenging NumPy's performance will be extremely difficult, and thus the effort of implementing this is not likely to be worthwhile. The reason is that the multiple-transpose method, even though it has some overhead, is already a very good way to accomplish this type of operation in Mathematica:
mat = RandomReal[1., {40000000, 2}];
vec = {1., 2.};
Transpose[vec + Transpose[mat]]; // AbsoluteTiming (* -> 1.812500 seconds *)
Your Python code takes 1.484375 seconds on my computer, so Mathematica loses out by about 25%, and not the 70% you show. In my experience, one rarely encounters situations in which a minor performance difference such as this would change anything significantly.
Let's take the following as an alternative:
cf1 = Compile[{{vec, _Real, 1}, {mat, _Real, 2}},
Table[i + vec, {i, mat}],
CompilationTarget -> "C"
];
cf2 = Compile[{{vec, _Real, 1}, {mat, _Real, 2}},
Block[{res = mat}, Do[res[[i]] += vec, {i, Length[res]}]; res],
CompilationTarget -> "C"
];
These are the fastest compiled functions that I could write. But their performance doesn't even come close:
mat = RandomReal[1., {40000000, 2}];
vec = {1., 2.};
cf1[vec, mat]; // AbsoluteTiming (* -> 4.546875 seconds *)
cf2[vec, mat]; // AbsoluteTiming (* -> 4.421875 seconds *)
They also have very significant intermediate memory consumption.
If we can't make progress with compiled code, what can we do? The next step would probably be to write one's own LibraryLink code in C++ (using e.g. Eigen), or Fortran (using a vendor BLAS, such as MKL). Of course, these libraries are intended for linear algebra applications, and so offer limited or no functions for manipulating higher-dimensional arrays. Still, one can do so efficiently, if not necessarily straightforwardly, using the carefully tuned and high-performing matrix and vector functions as primitives.
However, Mathematica also uses BLAS routines and is linked to the MKL. Some of the functions are exposed in the LinearAlgebra`BLAS`
context (and more in LinearAlgebra`LAPACK`
, for higher-level linear algebra operations rather than simple matrix-vector arithmetic). It is then a matter of choosing a suitable operation from the ones that are available.
GER looks useful:
$$ \mathrm{GER}: \alpha, \vec{x}, \vec{y}, \mathbf{A} : \mathbf{A} \leftarrow \alpha \vec{x} {\vec{y}}^\mathrm{T} + \mathbf{A} $$
As you can see, it is a more general operation than the column-wise sum sought, so it could be adapted to other purposes beyond this one with no additional performance penalty. But note that it overwrites its input, so that for a fair test, we should first make a copy. We can use it as follows:
A = RandomReal[1., {40000000, 2}];
alpha = 1.;
x = ConstantArray[1., Length[A]];
y = {1., 2.};
Block[{A = A}, LinearAlgebra`BLAS`GER[alpha, x, y, A]; A]; // AbsoluteTiming
(* -> 1.390625 seconds *)
Thus, we have matched (or even slightly beaten) NumPy. But, this is far from a general-purpose operation. The intention of this answer is to show that rivalling NumPy's performance is extremely difficult using only top-level Mathematica, simply because NumPy is designed to do these operations efficiently, whereas Mathematica doesn't have the same design and we as users aren't at liberty to redesign it. Instead, we have to use the tools that are made available, most of which similarly do not come close to NumPy's pure C implementation because they are not designed for this particular task. I suspect that, in most cases, to achieve comparable performance simply won't be possible without using lower-level approaches in C/C++ or Fortran.
Mathematica doesn't do that because it's ambiguous. Note that Mathematica is perfectly happy to do "broadcasting", as you call it, if the second array is transposed:
In[1]:= {1, 2} + {{1, 2, 3}, {2, 3, 4}}
Out[1]= {{2, 3, 4}, {4, 5, 6}}
This, in fact, gives you one way to get the result that you want:
In[2]:= Transpose[{1, 2} + Transpose@{{1, 2, 3}, {2, 3, 4}}]
Out[2]= {{2, 4}, {3, 5}, {4, 6}}
As to why one works, and the other doesn't, well, what is Mathematica supposed to do if you're adding a length $ 2 $ vector to a $ 2 \times 2 $ matrix? Should
$$ [a\; b] + \begin{bmatrix} x & y \\ z & w \end{bmatrix} $$
return
$$ \begin{bmatrix} a + x & a + y \\ b + z & b + w \end{bmatrix} $$
or
$$ \begin{bmatrix} a + x & b + y \\ a + z & b + w \end{bmatrix} $$
In Mathematica, we can rely on the fact that it returns the former:
In[3]:= {a, b} + {{x, y}, {z, w}}
Out[3]= {{a + x, a + y}, {b + z, b + w}}
This means we don't need a hard-to-remember rule for special cases, and if you want the other behavior, you need to ask for it explicitly, in one way or another.