Understanding the syntax of numpy.r_() concatenation
The paragraph you highlighted describes the syntax with two comma-separated integers, which is a special case of the syntax with three comma-separated integers. Once you understand the three-integer syntax, the two-integer syntax falls into place.
The equivalent three comma-separated integers syntax for your example would be:
np.r_['0,2,-1', [1,2,3], [4,5,6]]
In order to provide a better explanation I will change the above to:
np.r_['0,2,-1', [1,2,3], [[4,5,6]]]
The above has two parts:
A comma-separated integer string
Two comma-separated arrays
The comma-separated arrays have the following shapes:
np.array([1,2,3]).shape
(3,)
np.array([[4,5,6]]).shape
(1, 3)
In other words the first 'array' is '1-dimensional' while the second 'array' is '2-dimensional'.
First, the 2 in '0,2,-1' means that each array should be upgraded so that it is forced to be at least 2-dimensional. Since the second array is already 2-dimensional, it is not affected. However, the first array is 1-dimensional, and in order to make it 2-dimensional np.r_ needs to add a 1 to its shape tuple to make it either (1,3) or (3,1). That is where the -1 in '0,2,-1' comes into play. It decides where the extra 1 needs to be placed in the shape tuple of the array. -1 is the default and places the 1 (or 1s if more dimensions are required) at the front of the shape tuple (I explain why further below). This turns the first array's shape tuple into (1,3), which is the same as the second array's shape tuple. The 0 in '0,2,-1' means that the resulting arrays need to be concatenated along axis 0.
Since both arrays now have a shape tuple of (1,3), concatenation is possible: if you set aside the concatenation axis (dimension 0 in the above example, which has a value of 1 in both arrays), the remaining dimensions are equal (in this case the value of the remaining dimension in both arrays is 3). If this were not the case, the following error would be produced:
ValueError: all the input array dimensions except for the concatenation axis must match exactly
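As a minimal sketch of that failure mode, the snippet below passes a second array of shape (1,2), a made-up input chosen only to break the match, and catches the resulting exception. The helper name try_concat is illustrative, and the exact wording of the message may differ between NumPy versions.

```python
import numpy as np

def try_concat():
    """Return the ValueError raised by a shape mismatch, or None if none is raised."""
    try:
        # The first array is upgraded to (1, 3); the second array is (1, 2).
        # Apart from axis 0, the dimensions differ (3 vs 2).
        np.r_['0,2,-1', [1, 2, 3], [[4, 5]]]
    except ValueError as exc:
        return exc
    return None

err = try_concat()
print(type(err).__name__)
```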
Now if you concatenate two arrays having the shape (1,3), the resulting array will have shape (1+1,3) == (2,3), and therefore:
np.r_['0,2,-1', [1,2,3], [[4,5,6]]].shape
(2, 3)
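The same result can be reproduced by hand, which may make the separate steps easier to see. This is only a sketch of the behaviour described above, not a claim about how np.r_ is implemented internally; the names a, b, manual and via_r are illustrative.

```python
import numpy as np

# Step 1: upgrade each input to at least 2 dimensions.
# np.array(..., ndmin=2) prepends 1s to the shape, matching the -1 default.
a = np.array([1, 2, 3], ndmin=2)      # shape (1, 3)
b = np.array([[4, 5, 6]], ndmin=2)    # already 2-D, shape stays (1, 3)

# Step 2: concatenate along axis 0.
manual = np.concatenate((a, b), axis=0)

via_r = np.r_['0,2,-1', [1, 2, 3], [[4, 5, 6]]]
print(manual.shape, np.array_equal(manual, via_r))  # (2, 3) True
```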
When a 0 or a positive integer is used for the third integer in the comma-separated string, that integer determines where each array's original shape tuple starts in the upgraded shape tuple (only for those arrays which need to have their dimensions upgraded). For example, '0,2,0' means that for arrays requiring a shape upgrade the array's original shape tuple should start at dimension 0 of the upgraded shape tuple. For the array [1,2,3], which has a shape tuple (3,), the 1 would be placed after the 3. This would result in a shape tuple equal to (3,1), and as you can see the original shape tuple (3,) starts at dimension 0 of the upgraded shape tuple. '0,2,1' would mean that for [1,2,3] the array's shape tuple (3,) should start at dimension 1 of the upgraded shape tuple. This means that the 1 needs to be placed at dimension 0. The resulting shape tuple would be (1,3).
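A short check of the two cases, using a second list [4,5,6] added here only so that there is something to concatenate:

```python
import numpy as np

# Third integer 0: (3,) is upgraded to (3, 1), so each input becomes a column.
start0 = np.r_['0,2,0', [1, 2, 3], [4, 5, 6]]
print(start0.shape)  # (6, 1)

# Third integer 1: (3,) is upgraded to (1, 3), so each input becomes a row.
start1 = np.r_['0,2,1', [1, 2, 3], [4, 5, 6]]
print(start1.shape)  # (2, 3)
```

With '0,2,0' both inputs are (3,1) columns, so joining them along axis 0 gives six rows of one element each.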
When a negative number is used for the third integer in the comma-separated string, the integer following the negative sign determines where the original shape tuple should end, counting from the end of the upgraded shape tuple. When the original shape tuple is (3,), '0,2,-1' means that the original shape tuple should end at the last dimension of the upgraded shape tuple. Therefore the 1 would be placed at dimension 0 of the upgraded shape tuple, and the upgraded shape tuple would be (1,3). Now (3,) ends at dimension 1 of the upgraded shape tuple, which is also the last dimension of the upgraded shape tuple (the original array is [1,2,3] and the upgraded array is [[1,2,3]]).
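To see the effect of different negative values, here is a small sketch that upgrades to 3 dimensions instead of 2. The inputs x and y and the result names are illustrative.

```python
import numpy as np

x, y = [1, 2, 3], [4, 5, 6]

# Each (3,) input is upgraded to 3 dimensions before concatenating on axis 0.
end_last = np.r_['0,3,-1', x, y]    # inputs become (1, 1, 3)
end_middle = np.r_['0,3,-2', x, y]  # inputs become (1, 3, 1)
end_first = np.r_['0,3,-3', x, y]   # inputs become (3, 1, 1)

print(end_last.shape, end_middle.shape, end_first.shape)
# (2, 1, 3) (2, 3, 1) (6, 1, 1)
```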
np.r_['0,2', [1,2,3], [4,5,6]]
is the same as the following, since the third integer defaults to -1:
np.r_['0,2,-1', [1,2,3], [4,5,6]]
Finally here's an example with more dimensions:
np.r_['2,4,1',[[1,2],[4,5],[10,11]],[7,8,9]].shape
(1, 3, 3, 1)
The comma-separated arrays are:
[[1,2],[4,5],[10,11]], which has shape tuple (3,2)
[7,8,9], which has shape tuple (3,)
Both of the arrays need to be upgraded to 4-dimensional arrays. The original arrays' shape tuples need to start from dimension 1.
Therefore, for the first array the shape becomes (1,3,2,1): 3,2 starts at dimension 1 and, because two 1s need to be added to make it 4-dimensional, one 1 is placed before the original shape tuple and one 1 after.
Using the same logic, the second array's shape tuple becomes (1,3,1,1).
Now the two arrays need to be concatenated using dimension 2 as the concatenation axis. Eliminating dimension 2 from each array's upgraded shape tuple results in the tuple (1,3,1) for both arrays. As the resulting tuples are identical, the arrays can be concatenated, and the sizes along the concatenation axis are summed to produce (1, 3, 2+1, 1) == (1, 3, 3, 1).
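The walk-through above can be checked by doing the upgrade manually with reshape and then concatenating. This is a sketch under the shapes derived above; the variable names are illustrative.

```python
import numpy as np

first = np.array([[1, 2], [4, 5], [10, 11]])   # shape (3, 2)
second = np.array([7, 8, 9])                   # shape (3,)

# Upgrade to 4 dimensions with the original shape starting at dimension 1.
first_4d = first.reshape(1, 3, 2, 1)
second_4d = second.reshape(1, 3, 1, 1)

# Concatenate along dimension 2: sizes 2 and 1 add up to 3.
manual = np.concatenate((first_4d, second_4d), axis=2)
via_r = np.r_['2,4,1', [[1, 2], [4, 5], [10, 11]], [7, 8, 9]]

print(manual.shape, np.array_equal(manual, via_r))  # (1, 3, 3, 1) True
```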
'n,m' tells r_ to concatenate along axis=n, and produce a shape with at least m dimensions:
In [28]: np.r_['0,2', [1,2,3], [4,5,6]]
Out[28]:
array([[1, 2, 3],
       [4, 5, 6]])
So we are concatenating along axis=0, and we would normally therefore expect the result to have shape (6,), but since m=2, we are telling r_ that the shape must be at least 2-dimensional. So instead we get shape (2,3):
In [32]: np.r_['0,2', [1,2,3,], [4,5,6]].shape
Out[32]: (2, 3)
Look at what happens when we increase m:
In [36]: np.r_['0,3', [1,2,3,], [4,5,6]].shape
Out[36]: (2, 1, 3) # <- 3 dimensions
In [37]: np.r_['0,4', [1,2,3,], [4,5,6]].shape
Out[37]: (2, 1, 1, 3) # <- 4 dimensions
Anything you can do with r_ can also be done with one of the more readable array-building functions such as np.concatenate, np.row_stack, np.column_stack, np.hstack, np.vstack or np.dstack, though it may also require a call to reshape.
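As a rough illustration of that equivalence, the sketch below compares two of the r_ calls shown earlier with np.vstack and with np.concatenate plus reshape. The names x, y, same_2d and same_4d are illustrative.

```python
import numpy as np

x, y = [1, 2, 3], [4, 5, 6]

# '0,2' stacks the inputs as rows, which is what np.vstack does.
same_2d = np.array_equal(np.r_['0,2', x, y], np.vstack((x, y)))

# '0,4' needs an explicit reshape after a plain concatenate.
same_4d = np.array_equal(
    np.r_['0,4', x, y],
    np.concatenate((x, y)).reshape(2, 1, 1, 3),
)
print(same_2d, same_4d)  # True True
```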
Even with the call to reshape, those other functions may be faster:
In [38]: %timeit np.r_['0,4', [1,2,3,], [4,5,6]]
10000 loops, best of 3: 38 us per loop
In [43]: %timeit np.concatenate(([1,2,3,], [4,5,6])).reshape(2,1,1,3)
100000 loops, best of 3: 10.2 us per loop