'and' (boolean) vs '&' (bitwise) - Why difference in behavior with lists vs numpy arrays?
and tests whether both expressions are logically True, while & (when used with True/False values) tests whether both are True.
In Python, empty built-in objects are typically treated as logically False, while non-empty built-ins are logically True. This facilitates the common use case where you want to do something if a list is empty and something else if the list is not. Note that this means that the list [False] is logically True:
>>> if [False]:
...     print('True')
...
True
So in Example 1, the first list is non-empty and therefore logically True, so the truth value of the and is the same as that of the second list. (In our case, the second list is non-empty and therefore logically True, but identifying that would require an unnecessary step of calculation.)
For Example 2, lists cannot meaningfully be combined in a bitwise fashion because they can contain arbitrary unlike elements. Things that can be combined bitwise include booleans (True and False) and integers.
NumPy objects, by contrast, support vectorized calculations. That is, they let you perform the same operations on multiple pieces of data.
Example 3 fails because NumPy arrays (of length > 1) have no truth value; asking for one raises an error, which prevents confusion about vector-based logic.
Example 4 is simply a vectorized bitwise and operation.
Bottom Line
If you are not dealing with arrays and are not performing math manipulations of integers, you probably want and. If you have vectors of truth values that you wish to combine, use numpy with &.
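To make the four examples concrete, here is a small sketch. The list values are assumed for illustration (the point is only that both lists are non-empty), and it assumes NumPy is installed.

```python
import numpy as np

# Assumed sample data; only the fact that both lists are non-empty matters.
mylist1 = [True, True, True, False, True]
mylist2 = [False, True, False, True, False]

# Example 1: 'and' on lists. mylist1 is non-empty, so the result is mylist2.
list_and = mylist1 and mylist2
print(list_and is mylist2)

# Example 2: '&' on lists is not defined.
try:
    mylist1 & mylist2
except TypeError as exc:
    print("TypeError:", exc)

arr1 = np.array(mylist1)
arr2 = np.array(mylist2)

# Example 3: 'and' on multi-element arrays asks for a truth value and fails.
try:
    arr1 and arr2
except ValueError as exc:
    print("ValueError:", exc)

# Example 4: '&' on boolean arrays combines them element by element.
combined = arr1 & arr2
print(combined.tolist())  # [False, True, False, False, False]
```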
About list
First a very important point, from which everything will follow (I hope).
In ordinary Python, list is not special in any way (except having cute syntax for constructing, which is mostly a historical accident). Once a list [3,2,6] is made, it is for all intents and purposes just an ordinary Python object, like a number 3, set {3,7}, or a function lambda x: x+5.
(Yes, it supports changing its elements, and it supports iteration, and many other things, but that's just what a type is: it supports some operations, while not supporting some others. int supports raising to a power, but that doesn't make it very special - it's just what an int is. lambda supports calling, but that doesn't make it very special - that's what lambda is for, after all:).
About and
and is not an operator (you can call it "operator", but you can call "for" an operator too:). Operators in Python are (implemented through) methods called on objects of some type, usually written as part of that type. There is no way for a method to hold off the evaluation of some of its operands, but and can (and must) do that.
The consequence of that is that and cannot be overloaded, just like for cannot be overloaded. It is completely general, and communicates through a specified protocol. What you can do is customize your part of the protocol, but that doesn't mean you can alter the behavior of and completely. The protocol is:
Imagine Python interpreting "a and b" (this doesn't happen literally this way, but it helps understanding). When it comes to "and", it looks at the object it has just evaluated (a), and asks it: are you true? (NOT: are you True?) If you are an author of a's class, you can customize this answer. If a answers "no", and (skips b completely, it is not evaluated at all, and) says: a is my result (NOT: False is my result).
If a doesn't answer, and asks it: what is your length? (Again, you can customize this as an author of a's class). If a answers 0, and does the same as above - considers it false (NOT False), skips b, and gives a as result.
If a answers something other than 0 to the second question ("what is your length"), or it doesn't answer at all, or it answers "yes" to the first one ("are you true"), and evaluates b, and says: b is my result. Note that it does NOT ask b any questions.
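The protocol above can be watched in action with a couple of throwaway classes. This is only a sketch: Talkative and Sized are hypothetical names made up for illustration, and it uses the Python 3 spelling __bool__ for the "are you true?" question (Python 2 called it __nonzero__).

```python
class Talkative:
    """Hypothetical class that records every time it is asked 'are you true?'."""

    def __init__(self, truth):
        self.truth = truth
        self.asked = []

    def __bool__(self):
        self.asked.append("bool")
        return self.truth


class Sized:
    """Hypothetical class that only answers the length question."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n


a_false = Talkative(False)
a_true = Talkative(True)
b = Talkative(False)

r1 = a_false and b  # a answers "no": the result is a, b is skipped
r2 = a_true and b   # a answers "yes": the result is b, and b is asked nothing

print(r1 is a_false, r2 is b, b.asked)

empty = Sized(0)
full = Sized(3)
plain = object()  # answers neither question, so it counts as true

r3 = empty and "b"  # length 0: considered false, result is the Sized object
r4 = full and "b"   # non-zero length: result is "b"
r5 = plain and "b"  # no answer at all: result is "b"
print(r3 is empty, r4, r5)
```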
The other way to say all of this is that a and b is almost the same as b if a else a, except a is evaluated only once.
Now sit for a few minutes with a pen and paper, and convince yourself that when {a,b} is a subset of {True,False}, it works exactly as you would expect of Boolean operators. But I hope I have convinced you it is much more general, and as you'll see, much more useful this way.
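If you'd rather let the interpreter do the pen-and-paper part, a minimal check over all four combinations might look like this (the expected table is just the usual Boolean "and" written out by hand):

```python
from itertools import product

# The ordinary Boolean truth table for "and", written out explicitly.
expected = {
    (True, True): True,
    (True, False): False,
    (False, True): False,
    (False, False): False,
}

rows = []
for a, b in product([True, False], repeat=2):
    via_and = a and b
    via_conditional = b if a else a
    # Both spellings return the very same object...
    assert via_and is via_conditional
    # ...and for plain booleans that object matches the truth table.
    assert via_and is expected[(a, b)]
    rows.append((a, b, via_and))

for row in rows:
    print(row)
```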
Putting those two together
Now I hope you understand your example 1. and doesn't care if mylist1 is a number, list, lambda or an object of a class Argmhbl. It just cares about mylist1's answer to the questions of the protocol. And of course, mylist1 answers 5 to the question about length, so and returns mylist2. And that's it. It has nothing to do with elements of mylist1 and mylist2 - they don't enter the picture anywhere.
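A quick sketch of that point: whatever sits on the left, and only looks at its answer to the protocol. The list contents here are assumed for illustration, and Argmhbl is given an empty, hypothetical body.

```python
class Argmhbl:
    """Hypothetical empty class; its instances answer neither question."""


mylist2 = [False, True, False, True, False]  # assumed values

# A number, a list, a lambda and an Argmhbl instance: all count as true.
lefts = [5, [True, True, True, False, True], lambda x: x + 5, Argmhbl()]
results = [left and mylist2 for left in lefts]
print(all(r is mylist2 for r in results))

# The elements never enter the picture, only emptiness does.
only_falses = [False, False] and mylist2  # non-empty left: result is mylist2
empty_left = []
from_empty = empty_left and mylist2       # empty left: result is the empty list
print(only_falses is mylist2, from_empty is empty_left)
```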
Second example: & on list
On the other hand, & is an operator like any other, like + for example. It can be defined for a type by defining a special method on that class. int defines it as bitwise "and", and bool defines it as logical "and", but that's just one option: for example, sets and some other objects like dict keys views define it as a set intersection. list just doesn't define it, probably because Guido didn't think of any obvious way of defining it.
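Here is a rough illustration of & meaning different things for different types. The special method behind it is __and__; the Tag class is a made-up example, not something from any library.

```python
int_result = 6 & 3                 # bitwise: 0b110 & 0b011 == 0b010
bool_result = True & False         # logical "and" on bools
set_result = {3, 7} & {7, 9}       # set intersection
keys_result = {"a": 1, "b": 2}.keys() & {"b": 0, "c": 3}.keys()
print(int_result, bool_result, set_result, keys_result)


class Tag:
    """Hypothetical type that gives & a meaning of its own."""

    def __init__(self, label):
        self.label = label

    def __and__(self, other):
        return Tag(self.label + "&" + other.label)


tag_result = Tag("x") & Tag("y")
print(tag_result.label)

# list defines no __and__, so & fails with a TypeError.
list_error = None
try:
    [1, 2] & [2, 3]
except TypeError as exc:
    list_error = exc
    print("TypeError:", exc)
```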
numpy
On the other leg:-D, numpy arrays are special, or at least they are trying to be. Of course, numpy.ndarray (the type of the objects that numpy.array creates) is just a class, it cannot override and in any way, so it does the next best thing: when asked "are you true", an array with more than one element raises a ValueError, effectively saying "please rephrase the question, my view of truth doesn't fit into your model". (Note that the ValueError message doesn't speak about and - because the array doesn't know who is asking it the question; it just speaks about truth.)
For &, it's a completely different story. The array class can define it as it wishes, and it defines & consistently with other operators: pointwise. So you finally get what you want.
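The same two tricks can be reproduced in a few lines of plain Python. To be clear, TinyArray below is a hypothetical stand-in written for this explanation, not how NumPy is actually implemented; it only mimics the two behaviors described above.

```python
class TinyArray:
    """Hypothetical stand-in for an array type (NOT NumPy's implementation)."""

    def __init__(self, values):
        self.values = list(values)

    def __bool__(self):
        # Refuse to answer "are you true?", whoever is asking.
        raise ValueError("the truth value of a TinyArray is ambiguous")

    def __and__(self, other):
        # Define & pointwise, pairing up elements at the same position.
        return TinyArray(x & y for x, y in zip(self.values, other.values))


a = TinyArray([True, True, False])
b = TinyArray([True, False, False])

pointwise = a & b
print(pointwise.values)  # [True, False, False]

and_error = None
try:
    a and b  # 'and' asks a for its truth value, which raises
except ValueError as exc:
    and_error = exc
    print("ValueError:", exc)
```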
The short-circuiting boolean operators (and, or) can't be overridden because there is no satisfying way to do this without introducing new language features or sacrificing short circuiting. As you may or may not know, they evaluate the first operand for its truth value, and depending on that value, either evaluate and return the second argument, or don't evaluate the second argument and return the first:
something_true and x -> x
something_false and x -> something_false
something_true or x -> something_true
something_false or x -> x
Note that the (result of evaluating the) actual operand is returned, not its truth value.
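A small sketch of those four rules, using a helper with a visible side effect so you can see when the second operand is actually evaluated (the names and values are made up for illustration):

```python
calls = []


def x():
    """Hypothetical second operand; records each time it is evaluated."""
    calls.append("x")
    return "x-value"


something_true = [0]   # non-empty list, so truthy
something_false = []   # empty list, so falsey

r1 = something_true and x()    # x() is evaluated and returned
r2 = something_false and x()   # x() is skipped, something_false is returned
r3 = something_true or x()     # x() is skipped, something_true is returned
r4 = something_false or x()    # x() is evaluated and returned

print(r1, r2, r3, r4)
print(len(calls))  # 2: only the first and last lines called x()
```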
The only way to customize their behavior is to override __nonzero__ (renamed to __bool__ in Python 3), so you can affect which operand gets returned, but not return something different. Lists (and other collections) are defined to be "truthy" when they contain anything at all, and "falsey" when they are empty.
NumPy arrays reject that notion: For the use cases they aim at, two different notions of truth are common: (1) Whether any element is true, and (2) whether all elements are true. Since these two are completely (and silently) incompatible, and neither is clearly more correct or more common, NumPy refuses to guess and requires you to explicitly use .any() or .all().
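For instance, with a made-up array of flags (this assumes NumPy is installed), the two notions of truth give different answers, and asking for "the" truth value is refused:

```python
import numpy as np

flags = np.array([True, False, True])  # assumed sample data

any_true = bool(flags.any())  # notion (1): is any element true?
all_true = bool(flags.all())  # notion (2): are all elements true?
print(any_true, all_true)     # True False

truth_error = None
try:
    bool(flags)  # which of the two notions did you mean? NumPy won't guess.
except ValueError as exc:
    truth_error = exc
    print("ValueError:", exc)
```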
& and | (and ~, by the way; not itself cannot be overridden and always returns a plain True or False based on its operand's truth value) can be fully overridden, as they don't short circuit. They can return anything at all when overridden, and NumPy makes good use of that to do element-wise operations, as they do with practically any other scalar operation. Lists, on the other hand, don't broadcast operations across their elements. Just as mylist1 - mylist2 doesn't mean anything and mylist1 + mylist2 means something completely different, there is no & operator for lists.
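If you do want an element-wise result from plain lists, you have to spell out the loop yourself. One possible way, sketched with assumed list values, is a comprehension over zip:

```python
mylist1 = [True, True, True, False, True]    # assumed values
mylist2 = [False, True, False, True, False]

# + on lists concatenates; it does not add element by element.
concatenated = mylist1 + mylist2
print(len(concatenated))  # 10

# - is not defined for lists at all.
minus_error = None
try:
    mylist1 - mylist2
except TypeError as exc:
    minus_error = exc
    print("TypeError:", exc)

# Element-wise "and", written out explicitly.
elementwise = [a and b for a, b in zip(mylist1, mylist2)]
print(elementwise)  # [False, True, False, False, False]
```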