What's the best way to downsample a numpy array?
Reshape the array so that each of the last two axes is split into two, with the second axis of each pair having a length equal to the block size. This gives a 5D array; then take the mean along the third and fifth axes (axis=(2,4)):
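BSZ = (8,8)  # block sizes along the last two axes
N,m,n = a.shape  # a is a 3D array; m and n must be divisible by the block sizes
out = a.reshape(N,m//BSZ[0],BSZ[0],n//BSZ[1],BSZ[1]).mean(axis=(2,4))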
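The reshape above requires the last two dimensions to be exact multiples of the block sizes. As a minimal sketch of one way to relax that, the helper below (the name block_mean is hypothetical, not part of NumPy) crops the trailing rows and columns that do not fill a whole block before reshaping. Cropping is an assumption of this sketch; padding would be another option.

```python
import numpy as np

def block_mean(a, bsz):
    """Mean over non-overlapping bsz[0] x bsz[1] blocks of the last two axes.

    Hypothetical helper: crops any trailing rows/columns that do not
    fill a complete block (an assumption of this sketch).
    """
    bh, bw = bsz
    m, n = a.shape[-2:]
    # Largest sizes that are exact multiples of the block sizes
    mc, nc = (m // bh) * bh, (n // bw) * bw
    a = a[..., :mc, :nc]
    lead = a.shape[:-2]  # any leading axes are left untouched
    # Split each of the last two axes into (number of blocks, block size)
    blocks = a.reshape(*lead, mc // bh, bh, nc // bw, bw)
    # Average over the two block-size axes
    return blocks.mean(axis=(-3, -1))

a = np.arange(2 * 6 * 6).reshape(2, 6, 6)
out = block_mean(a, (2, 2))
print(out.shape)  # (2, 3, 3)
```

Because the helper only looks at the last two axes, it should work for 2D inputs as well as for stacks of images.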
Sample run on a smaller array with a smaller block size (2,2):
1) Inputs:
In [271]: N = 2
In [272]: a = np.random.randint(0,9,(N,6,6))
In [273]: a
Out[273]:
array([[[3, 1, 8, 7, 8, 2],
        [0, 6, 2, 6, 8, 2],
        [2, 1, 1, 0, 0, 1],
        [8, 3, 0, 2, 8, 0],
        [4, 7, 2, 6, 6, 7],
        [5, 5, 1, 7, 2, 7]],

       [[0, 0, 8, 1, 7, 6],
        [8, 6, 5, 8, 4, 0],
        [0, 3, 7, 7, 6, 1],
        [7, 1, 7, 6, 3, 6],
        [7, 6, 4, 6, 4, 5],
        [4, 2, 0, 2, 6, 2]]])
2) Compute a few output values for manual verification:
In [274]: a[0,:2,:2].mean()
Out[274]: 2.5
In [275]: a[0,:2,2:4].mean()
Out[275]: 5.75
In [276]: a[0,:2,4:6].mean()
Out[276]: 5.0
In [277]: a[0,2:4,:2].mean()
Out[277]: 3.5
3) Use the proposed approach and verify against the values above:
In [278]: BSZ = (2,2)
In [279]: m,n = a.shape[1:]
In [280]: a.reshape(N,m//BSZ[0],BSZ[0],n//BSZ[1],BSZ[1]).mean(axis=(2,4))
Out[280]:
array([[[ 2.5 ,  5.75,  5.  ],
        [ 3.5 ,  0.75,  2.25],
        [ 5.25,  4.  ,  5.5 ]],

       [[ 3.5 ,  5.5 ,  4.25],
        [ 2.75,  6.75,  4.  ],
        [ 4.75,  3.  ,  4.25]]])
There is a neat solution in the form of the function block_reduce in the scikit-image module. It has a very simple interface for downsampling arrays by applying a function such as numpy.mean to each block. Different axes can be downsampled by different factors by supplying a tuple of block sizes. Here's an example with a 2D array, downsampling only axis 1 by a factor of 5 using the mean:
import numpy as np
from skimage.measure import block_reduce
arr = np.stack((np.arange(1,20), np.arange(20,39)))
# array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
#        [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38]])
# 19 columns is not a multiple of 5, so the last block in each row is padded with cval
arr_reduced = block_reduce(arr, block_size=(1,5), func=np.mean, cval=np.mean(arr))
# array([[ 3. ,  8. , 13. , 17.8],
#        [22. , 27. , 32. , 33. ]])
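Note that in the example above the last column of the result includes the padding value, so it is not the mean of the real data alone. If that matters, one option is to pad with NaN and reduce with np.nanmean so the padded cells are ignored. The following is a NumPy-only sketch of that pad-then-reduce idea; the function name padded_block_mean is hypothetical and this is not scikit-image's implementation.

```python
import numpy as np

def padded_block_mean(arr, block, cval=np.nan):
    """Pad a 2D array up to a multiple of `block`, then average each block.

    Hypothetical helper: with the default cval=NaN, np.nanmean skips the
    padded cells, so edge blocks average only the real values.
    """
    arr = np.asarray(arr, dtype=float)  # float so that NaN padding is representable
    bh, bw = block
    # Number of rows/columns needed to reach the next multiple of the block size
    pad_h = -arr.shape[0] % bh
    pad_w = -arr.shape[1] % bw
    padded = np.pad(arr, ((0, pad_h), (0, pad_w)),
                    mode="constant", constant_values=cval)
    m, n = padded.shape
    # Split each axis into (number of blocks, block size) and reduce per block
    blocks = padded.reshape(m // bh, bh, n // bw, bw)
    return np.nanmean(blocks, axis=(1, 3))

arr = np.stack((np.arange(1, 20), np.arange(20, 39)))
res = padded_block_mean(arr, (1, 5))
print(res)
# [[ 3.   8.  13.  17.5]
#  [22.  27.  32.  36.5]]
```

Here the last block of each row averages only the four real values (16 to 19 and 35 to 38), giving 17.5 and 36.5.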