Why doesn't chained (interval) comparison work on numpy arrays?
0 < numlist < 3.5
Is equivalent to:
(0 < numlist) and (numlist < 3.5)
except that numlist is only evaluated once.
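Here is a minimal pure-Python sketch of that equivalence. A hypothetical middle() function stands in for numlist and records how often it runs. The chained form evaluates it once, and the spelled-out and form evaluates it twice:

```python
calls = []

def middle():
    # Hypothetical stand-in for numlist: record each evaluation
    calls.append("middle")
    return 2

chained = 0 < middle() < 3.5
chained_calls = len(calls)          # middle() ran once

calls.clear()
explicit = (0 < middle()) and (middle() < 3.5)
explicit_calls = len(calls)         # middle() ran twice

print(chained, chained_calls)       # True 1
print(explicit, explicit_calls)     # True 2
```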
The implicit and between the two results is what causes the error.
The Python docs say:
Formally, if a, b, c, ..., y, z are expressions and op1, op2, ..., opN are comparison operators, then a op1 b op2 c ... y opN z is equivalent to a op1 b and b op2 c and ... y opN z, except that each expression is evaluated at most once.
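and:
Comparisons can be chained arbitrarily, e.g., x < y <= z is equivalent to x < y and y <= z, except that y is evaluated only once (but in both cases z is not evaluated at all when x < y is found to be false).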
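The short-circuit part of that rule can be observed with a hypothetical z() function that logs when it runs. This is a small sketch using plain Python integers:

```python
evaluated = []

def z():
    # Hypothetical right-hand operand that logs when it is evaluated
    evaluated.append("z")
    return 10

result = 5 < 3 < z()        # 5 < 3 is False, so z() is never called
print(result, evaluated)    # False []

result2 = 1 < 3 < z()       # 1 < 3 is True, so z() runs
print(result2, evaluated)   # True ['z']
```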
For a scalar:
In [20]: x=5
In [21]: 0<x<10
Out[21]: True
In [22]: 0<x and x<10
Out[22]: True
But with an array:
In [24]: x=np.array([4,5,6])
In [25]: 0<x and x<10
...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This ValueError arises when a numpy boolean array with more than one element is used in a context that expects a single scalar boolean.
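The following sketch reproduces the error and shows the two reductions the message suggests. It also shows that a one-element array is accepted, because its truth value is not ambiguous:

```python
import numpy as np

x = np.array([4, 5, 6])

message = ""
try:
    (0 < x) and (x < 10)            # `and` needs bool(0 < x), which raises
except ValueError as err:
    message = str(err)
    print("ValueError:", message)

# Reduce the boolean array to one value explicitly, as the message suggests
all_positive = bool((0 < x).all())  # every element is > 0
any_positive = bool((0 < x).any())  # at least one element is > 0

# A one-element array is not ambiguous, so bool() accepts it
single = bool(np.array([True]))
print(all_positive, any_positive, single)   # True True True
```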
In [26]: (0<x)
Out[26]: array([ True, True, True], dtype=bool)
In [30]: np.array([True, False]) or True
...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
In [33]: if np.array([True, False]): print('yes')
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
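It evaluates 0<x but never gets to x<10, because it can't use the resulting boolean array in an and/or context. numpy defines the | and & operators element-wise (through __or__ and __and__), but and/or cannot be overloaded in Python: they call bool() on their operand, and for an array with more than one element that raises this ValueError.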
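To see which hooks Python actually calls, here is a sketch with a hypothetical Flags class (not part of numpy) that mimics numpy's behaviour. & goes through __and__ and can work element-wise, while and goes through __bool__:

```python
class Flags:
    """Hypothetical array-like class, used only to show which hooks run."""

    def __init__(self, values):
        self.values = list(values)

    def __and__(self, other):
        # `&` is overloadable: combine element by element, as numpy does
        return Flags(a and b for a, b in zip(self.values, other.values))

    def __bool__(self):
        # `and`/`or` end up here; refuse anything but a single element
        # (numpy raises for arrays with more than one element)
        if len(self.values) != 1:
            raise ValueError("truth value of a multi-element Flags is ambiguous")
        return bool(self.values[0])


p = Flags([True, True, False])
q = Flags([True, False, False])

combined = (p & q).values
print(combined)              # [True, False, False]

and_raised = False
try:
    p and q                  # calls p.__bool__() first
except ValueError as err:
    and_raised = True
    print("ValueError:", err)
```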
In [34]: (0<x) & (x<10)
Out[34]: array([ True, True, True], dtype=bool)
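Here is a short sketch of the parenthesized interval test on an array that includes out-of-range values. np.logical_and is numpy's function form of the same element-wise AND. Because it takes the two comparisons as arguments, there is no operator precedence to get wrong:

```python
import numpy as np

x = np.array([-1, 4, 5, 6, 20])

mask = (0 < x) & (x < 10)              # each comparison in its own parentheses
same = np.logical_and(0 < x, x < 10)   # function form of the element-wise AND

print(mask.tolist())                   # [False, True, True, True, False]
print(x[mask].tolist())                # [4, 5, 6]
print(np.array_equal(mask, same))      # True
```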
When we write 0 < x < 10, we are implicitly expecting it to evaluate a vectorized version of the scalar chained expression.
In [35]: f = np.vectorize(lambda x: 0<x<10, otypes=[bool])
In [36]: f(x)
Out[36]: array([ True, True, True], dtype=bool)
In [37]: f([-1,5,11])
Out[37]: array([False, True, False], dtype=bool)
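The numpy docs describe np.vectorize as a convenience that is essentially a for loop, not a performance tool. This sketch checks that it agrees with the element-wise & form over a range of values:

```python
import numpy as np

# np.vectorize calls the scalar chained comparison once per element
f = np.vectorize(lambda v: 0 < v < 10, otypes=[bool])

vals = np.arange(-5, 15)
looped = f(vals)
direct = (0 < vals) & (vals < 10)      # element-wise operations, no Python-level loop

print(np.array_equal(looped, direct))  # True
```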
Note that attempting to apply chaining to a list doesn't even get past the first <:
In [39]: 0 < [-1,5,11]
TypeError: unorderable types: int() < list()
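With a list, the fix is to convert it to an array first, for example with np.asarray. The interval test then works element-wise. This is a sketch; the exact TypeError wording depends on the Python version:

```python
import numpy as np

values = [-1, 5, 11]

try:
    0 < values                 # plain lists can't be ordered against ints
except TypeError as err:
    print("TypeError:", err)

arr = np.asarray(values)       # convert first; comparisons are then element-wise
inside = ((0 < arr) & (arr < 10)).tolist()
print(inside)                  # [False, True, False]
```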
This set of expressions indicates that the & operator has higher precedence than the < operator:
In [44]: 0 < x & x<10
ValueError ...
In [45]: (0 < x) & x<10
Out[45]: array([ True, True, True], dtype=bool)
In [46]: 0 < x & (x<10)
Out[46]: array([False, True, False], dtype=bool)
In [47]: 0 < (x & x)<10
ValueError...
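So the safe version is (0 < x) & (x < 10), which makes sure that all the < comparisons are evaluated before the &. In [45] only gives the right answer by coincidence: it parses as ((0 < x) & x) < 10, and (0 < x) & x is array([0, 1, 0]), all of which are less than 10.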
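Python's ast module can confirm how these expressions are parsed without evaluating them. In the sketch below, the first part shows that 0 < x & x < 10 is a single two-step chain around x & x. The second part shows why the unparenthesized (0 < x) & x<10 can look correct for [4, 5, 6] yet fail for other values:

```python
import ast
import numpy as np

# `&` binds more tightly than `<`, so this parses as the chain 0 < (x & x) < 10
tree = ast.parse("0 < x & x < 10", mode="eval").body
print(type(tree).__name__)                      # Compare
print([type(op).__name__ for op in tree.ops])   # ['Lt', 'Lt']
print(type(tree.comparators[0]).__name__)       # BinOp (the x & x part)

# (0 < x) & x < 10 parses as ((0 < x) & x) < 10
x = np.array([4, 5, 6])
print(((0 < x) & x).tolist())                   # [0, 1, 0]  (True & 4 == 0, etc.)
print(((0 < x) & x < 10).tolist())              # [True, True, True], right only by luck

y = np.array([-1, 20])
unsafe = ((0 < y) & y < 10).tolist()            # [True, True]   (wrong)
safe = ((0 < y) & (y < 10)).tolist()            # [False, False] (correct)
print(unsafe, safe)
```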
Here's a further example that confirms the short-circuit and evaluation:
In [53]: x=2
In [54]: 3<x<np.arange(4)
Out[54]: False
In [55]: 1<x<np.arange(4)
Out[55]: array([False, False, False, True])
When 3<x is False, it returns that without further evaluation. When it is True, it goes on to evaluate x<np.arange(4), returning a 4-element boolean array.
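A consequence of this short-circuit is that the chained form can return either a plain Python bool or an ndarray, depending on the data. The & form always evaluates both sides and broadcasts, so it returns an array either way. Here is a small sketch:

```python
import numpy as np

x = 2
hi = np.arange(4)

r1 = 3 < x < hi    # 3 < 2 is False: stops here, plain Python bool
r2 = 1 < x < hi    # 1 < 2 is True: evaluates 2 < hi, an ndarray
print(type(r1).__name__, type(r2).__name__)   # bool ndarray

# Both comparisons are evaluated and broadcast, so the result is always an array
r3 = (3 < x) & (x < hi)
print(r3.tolist())                            # [False, False, False, False]
```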
Or with a list that doesn't support < at all:
In [56]: 3<x<[1,2,3]
Out[56]: False
In [57]: 1<x<[1,2,3]
Traceback (most recent call last):
File "<ipython-input-57-e7430e03ad55>", line 1, in <module>
1<x<[1,2,3]
TypeError: '<' not supported between instances of 'int' and 'list'