Numpy array loss of dimension when masking
Look at arr>3:
In [71]: arr>3
Out[71]:
array([[[[False, True],
[False, True],
[False, True]],
[[ True, True],
[ True, True],
[ True, True]]],
[[[ True, True],
[ True, True],
[ True, True]],
[[False, True],
[False, True],
[False, True]]]], dtype=bool)
arr[arr>3] selects those elements where the mask is True. What kind of structure or shape do you want that selection to have? The True entries need not form a rectangular block, so a flat (1-D) result is the only thing that makes sense, doesn't it? arr itself is not changed.
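A minimal sketch of that behaviour, using the same arr that is defined further down in this thread. The names mask and selected are only illustrative:

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

mask = arr > 3        # boolean array with the same shape as arr
selected = arr[mask]  # new 1-D array holding the True elements in C order

print(arr.shape)       # (2, 2, 3, 2), arr is untouched
print(mask.shape)      # (2, 2, 3, 2)
print(selected.shape)  # (18,)
```

The selection is a copy, so modifying selected afterwards does not write back into arr.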
You could zero out the terms that don't fit the mask:
In [84]: arr1=arr.copy()
In [85]: arr1[arr<=3]=0
In [86]: arr1
Out[86]:
array([[[[ 0, 11],
[ 0, 22],
[ 0, 33]],
[[ 4, 44],
[ 5, 55],
[ 6, 66]]],
[[[ 7, 77],
[ 8, 88],
[ 9, 99]],
[[ 0, 32],
[ 0, 33],
[ 0, 34]]]])
Now you could do weighted sums or averages over various dimensions.
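For instance, here is one possible way to average only the selected terms along one axis. This is a sketch, not the only approach: the choice of axis=2 and the names sums, counts and means are assumptions for illustration. Counting the True entries of the mask gives the denominator, and np.divide with where= avoids dividing by zero where nothing was selected.

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

mask = arr > 3
arr1 = arr.copy()
arr1[~mask] = 0                # zero out everything that fails the condition

sums = arr1.sum(axis=2)        # sum of the selected terms along axis 2
counts = mask.sum(axis=2)      # how many terms were selected along axis 2

# Average of the selected terms; slots with no selected terms stay 0.0
means = np.divide(sums, counts,
                  out=np.zeros(sums.shape),
                  where=counts > 0)
print(means)
```

Dividing by the mask counts rather than by the axis length is what keeps the zeroed-out terms from dragging the average down.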
np.nonzero (or np.where) might also be useful, giving you the indices of the selected terms:
In [88]: np.nonzero(arr>3)
Out[88]:
(array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
array([0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1]),
array([0, 1, 2, 0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2, 0, 1, 2]),
array([1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1]))
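As a rough illustration of what those index arrays are good for: they can be used to read the selected values back out, or to write them into a fresh array of the original shape. np.argwhere gives the same information as one coordinate row per selected element. The names idx, coords and out are illustrative only.

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

idx = np.nonzero(arr > 3)      # tuple of four index arrays, one per axis
values = arr[idx]              # same flat selection as arr[arr > 3]

coords = np.argwhere(arr > 3)  # one row of (i, j, k, l) per selected element
print(coords.shape)            # (18, 4)
print(coords[0])               # [0 0 0 1], the position of the value 11

# Scatter the selected values into a zero array with the original shape
out = np.zeros_like(arr)
out[idx] = values
```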
You might consider using an np.ma.masked_array to represent the subset of elements that satisfy your condition:
import numpy as np
arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
[[4, 44], [5, 55], [6, 66]]],
[[[7, 77], [8, 88], [9, 99]],
[[0, 32], [1, 33], [2, 34]]]])
masked_arr = np.ma.masked_less(arr, 3)
print(masked_arr)
# [[[[-- 11]
# [-- 22]
# [3 33]]
# [[4 44]
# [5 55]
# [6 66]]]
# [[[7 77]
# [8 88]
# [9 99]]
# [[-- 32]
# [-- 33]
# [-- 34]]]]
As you can see, the masked array retains its original dimensions. You can access the underlying data and the mask via the .data and .mask attributes respectively. Most NumPy reductions, such as mean, ignore masked values, e.g.:
# mean of whole array
print(arr.mean())
# 26.75
# mean of non-masked elements only
print(masked_arr.mean())
# 33.4736842105
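Note that np.ma.masked_less(arr, 3) leaves the value 3 unmasked. If you want the mask to match the condition arr>3 exactly, one option is np.ma.masked_where with the inverted condition. The sketch below also shows a per-axis reduction; the variable names and the choice of axis=2 are only for illustration.

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

# Mask every element that does NOT satisfy arr > 3
masked = np.ma.masked_where(arr <= 3, arr)

print(masked.shape)    # (2, 2, 3, 2), dimensions are preserved
print(masked.count())  # 18 non-masked elements
print(masked.sum())    # 633

flat = masked.compressed()        # 1-D array of the non-masked values
axis_means = masked.mean(axis=2)  # slots with nothing selected come back masked
print(axis_means)
```

Here masked.compressed() gives the same flat selection as arr[arr>3], so you can move between the two views as needed.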
The result of an element-wise operation on a masked array and a non-masked array will also preserve the values of the mask:
masked_sum = masked_arr + np.random.randn(*arr.shape)
print(masked_sum)
# [[[[-- 11.359989067421582]
# [-- 23.249092437269162]
# [3.326111354088174 32.679132708120726]]
# [[4.289134334263137 43.38559221094378]
# [6.028063054523145 53.5043991898567]
# [7.44695154979811 65.56890530368757]]]
# [[[8.45692625294376 77.36860675985407]
# [5.915835159196378 87.28574554110307]
# [8.251106168209688 98.7621940026713]]
# [[-- 33.24398289945855]
# [-- 33.411941757624284]
# [-- 34.964817895873715]]]]
The sum is only computed over the non-masked values of masked_arr; you can see this by looking at masked_sum.data:
print(masked_sum.data)
# [[[[ 1. 11.35998907]
# [ 2. 23.24909244]
# [ 3.32611135 32.67913271]]
# [[ 4.28913433 43.38559221]
# [ 6.02806305 53.50439919]
# [ 7.44695155 65.5689053 ]]]
# [[[ 8.45692625 77.36860676]
# [ 5.91583516 87.28574554]
# [ 8.25110617 98.762194 ]]
# [[ 0. 33.2439829 ]
# [ 1. 33.41194176]
# [ 2. 34.9648179 ]]]]
Check out numpy.where:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html
To keep the same dimensionality you are going to need a fill value. In the example below I use 0, but you could also use np.nan (which turns the result into a float array):
np.where(arr>3, arr, 0)
returns
array([[[[ 0, 11],
[ 0, 22],
[ 0, 33]],
[[ 4, 44],
[ 5, 55],
[ 6, 66]]],
[[[ 7, 77],
[ 8, 88],
[ 9, 99]],
[[ 0, 32],
[ 0, 33],
[ 0, 34]]]])
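If you go with np.nan as the fill value instead, a sketch of how that might look is below. The NaN-aware reductions np.nanmean and np.nansum then skip the filled slots; the name filled is only illustrative.

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

# NaN cannot be stored in an integer array, so the result is upcast to float
filled = np.where(arr > 3, arr, np.nan)

print(filled.shape)               # (2, 2, 3, 2)
print(filled.dtype)               # float64
print(np.nanmean(filled))         # mean over the 18 selected values only
print(np.nansum(filled, axis=2))  # per-axis sums, NaN slots count as 0
```

Unlike the 0 fill, NaN cannot be mistaken for a real data value, at the cost of losing the integer dtype.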