Gotchas where NumPy differs from straight Python?
I think this one is funny:
>>> import numpy as n
>>> a = n.array([[1,2],[3,4]])
>>> a[1], a[0] = a[0], a[1]
>>> a
array([[1, 2],
       [1, 2]])
For Python lists on the other hand this works as intended:
>>> b = [[1,2],[3,4]]
>>> b[1], b[0] = b[0], b[1]
>>> b
[[3, 4], [1, 2]]
Funny side note: NumPy itself had a bug in its shuffle function because it used that notation :-) (see here).
The reason is that in the NumPy case the right-hand side yields views into the array rather than copies, so the values are overwritten in place: a[1] is assigned the contents of row 0 first, and a[0] is then assigned from the view of row 1, which already holds the new values.
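A minimal sketch of two ways to make the swap actually work, assuming an explicit copy (or fancy indexing, which copies the right-hand side) is acceptable:
>>> import numpy as n
>>> a = n.array([[1, 2], [3, 4]])
>>> a[[0, 1]] = a[[1, 0]]                    # fancy indexing copies the RHS
>>> a
array([[3, 4],
       [1, 2]])
>>> b = n.array([[1, 2], [3, 4]])
>>> b[1], b[0] = b[0].copy(), b[1].copy()    # or copy the rows explicitly
>>> b
array([[3, 4],
       [1, 2]])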
The biggest gotcha for me was that almost every standard operator is overloaded to operate elementwise across the array.
Define a list and an array:
>>> l = range(10)
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> import numpy
>>> a = numpy.array(l)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Multiplication repeats the Python list, but multiplies the NumPy array elementwise:
>>> l * 2
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> a * 2
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
Addition of a scalar and division are not defined on Python lists:
>>> l + 2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can only concatenate list (not "int") to list
>>> a + 2
array([ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
>>> l / 2.0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for /: 'list' and 'float'
>>> a / 2.0
array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])
NumPy sometimes coerces plain lists to arrays, so mixing the two still operates elementwise:
>>> a + a
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
>>> a + l
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
Because __eq__ does not return a bool, putting NumPy arrays in any kind of container prevents equality testing of that container without a container-specific workaround.
Example:
>>> import numpy
>>> a = numpy.array(range(3))
>>> b = numpy.array(range(3))
>>> a == b
array([ True, True, True], dtype=bool)
>>> x = (a, 'banana')
>>> y = (b, 'banana')
>>> x == y
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This is a horrible problem. For example, you cannot write unittests that compare such containers with TestCase.assertEqual() and must instead write custom comparison functions. Suppose we write a work-around function special_eq_for_numpy_and_tuples. Now we can do this in a unittest:
x = (array1, 'deserialized')
y = (array2, 'deserialized')
self.failUnless( special_eq_for_numpy_and_tuples(x, y) )
Now we must do this for every container type we might use to store NumPy arrays. Furthermore, __eq__ might return a bool rather than an array of bools:
>>> a = numpy.array(range(3))
>>> b = numpy.array(range(5))
>>> a == b
False
Now each of our container-specific equality comparison functions must also handle that special case.
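As a sketch of what such a helper might look like for tuples (the function name comes from the example above and is hypothetical; numpy.array_equal returns a single bool and is False for mismatched shapes, which covers both cases):
import numpy

def special_eq_for_numpy_and_tuples(x, y):
    # Hypothetical workaround: compare two tuples that may contain NumPy arrays.
    if len(x) != len(y):
        return False
    for a, b in zip(x, y):
        if isinstance(a, numpy.ndarray) or isinstance(b, numpy.ndarray):
            # array_equal returns a plain bool, even for mismatched shapes
            if not numpy.array_equal(a, b):
                return False
        elif a != b:
            return False
    return True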
Maybe we can patch over this wart with a subclass?
>>> class SaneEqualityArray (numpy.ndarray):
...     def __eq__(self, other):
...         return (isinstance(other, SaneEqualityArray) and
...                 self.shape == other.shape and
...                 numpy.ndarray.__eq__(self, other).all())
...
>>> a = SaneEqualityArray( (2, 3) )
>>> a.fill(7)
>>> b = SaneEqualityArray( (2, 3) )
>>> b.fill(7)
>>> a == b
True
>>> x = (a, 'banana')
>>> y = (b, 'banana')
>>> x == y
True
>>> c = SaneEqualityArray( (7, 7) )
>>> c.fill(7)
>>> a == c
False
That seems to do the right thing. The class should also explicitly export elementwise comparison, since that is often useful.
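A sketch of one way to do that, assuming a hypothetical method name like elementwise_eq (any name would do; it simply re-exposes the ordinary ndarray comparison):
import numpy

class SaneEqualityArray(numpy.ndarray):
    def __eq__(self, other):
        return (isinstance(other, SaneEqualityArray) and
                self.shape == other.shape and
                numpy.ndarray.__eq__(self, other).all())

    def elementwise_eq(self, other):
        # Re-expose the usual elementwise comparison under an explicit name.
        return numpy.ndarray.__eq__(self, other)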