Zen of Python: Errors should never pass silently. Why does zip work the way it does?
Reason 1: Historical Reason
zip truncates to its shortest argument because it was meant to improve upon map(None, ...), which handled unequal-length arguments by padding with None. This behavior is the reason zip exists at all.
Here's how you did zip before it existed:
>>> a = (1, 2, 3)
>>> b = (4, 5, 6)
>>> for i in map(None, a, b): print i
...
(1, 4)
(2, 5)
(3, 6)
>>> map(None, a, b)
[(1, 4), (2, 5), (3, 6)]
This is terribly unintuitive, and it handles unequal-length lists poorly. This was a major design concern, which you can see plain as day in PEP 201, the official proposal that introduced zip:
While the map() idiom is a common one in Python, it has several disadvantages:
It is non-obvious to programmers without a functional programming background.
The use of the magic None first argument is non-obvious.
It has arbitrary, often unintended, and inflexible semantics when the lists are not of the same length - the shorter sequences are padded with None:
>>> c = (4, 5, 6, 7)
>>> map(None, a, c)
[(1, 4), (2, 5), (3, 6), (None, 7)]
So, no, truncating to the shortest argument would not be treated as an error - it is why zip was designed in the first place.
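For contrast with the map(None, a, c) session above, here is a minimal sketch of what zip does with the same inputs. It assumes Python 3, where zip returns a lazy iterator, so list() is used to show the pairs; the variable name pairs is only for illustration.

```python
a = (1, 2, 3)
c = (4, 5, 6, 7)

# zip stops as soon as the shortest argument is exhausted,
# so the unmatched 7 is dropped instead of being paired with None.
pairs = list(zip(a, c))

print(pairs)  # [(1, 4), (2, 5), (3, 6)]
```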
Reason 2: Practical Reason
Because it is pretty useful, is clearly specified and doesn't have to be thought of as an error at all.
By allowing unequal lengths, zip only requires that its arguments be iterable. This allows zip to be extended to generators, tuples, dictionary keys and anything else that implements __iter__() (and whose iterator implements __next__()), precisely because it doesn't inquire about length.
This is significant, because generators do not support len() and thus there is no way to check the length beforehand. Add a check for length, and you break zip's ability to work on generators, when it should. That's a fairly serious disadvantage, wouldn't you agree?
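A small sketch of that point, assuming Python 3.7+ (so dictionary keys iterate in insertion order). The generator squares and the sample data are hypothetical, made up for illustration only.

```python
def squares():
    """Hypothetical infinite generator: yields 0, 1, 4, 9, ..."""
    n = 0
    while True:
        yield n * n
        n += 1

# A generator has no length, so checking lengths up front is impossible.
try:
    len(squares())
    has_len = True
except TypeError:
    has_len = False

# zip only needs iter() and next() on each argument, so a tuple,
# a dict (iterated by key) and an infinite generator can be mixed freely.
names = ("a", "b", "c")
ages = {"x": 1, "y": 2, "z": 3}
rows = list(zip(names, ages, squares()))

print(has_len)  # False
print(rows)     # [('a', 'x', 0), ('b', 'y', 1), ('c', 'z', 4)]
```

Because zip stops at the shortest argument, the infinite generator is consumed only as far as the three-element tuple allows.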
Reason 3: By Fiat
Guido van Rossum wanted it this way:
Optional padding. An earlier version of this PEP proposed an optional pad keyword argument, which would be used when the argument sequences were not the same length. This is similar behavior to the map(None, ...) semantics except that the user would be able to specify pad object. This has been rejected by the BDFL in favor of always truncating to the shortest sequence, because of the KISS principle. If there's a true need, it is easier to add later. If it is not needed, it would still be impossible to delete it in the future.
KISS trumps everything.
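As it turned out, padding was added later, but as a separate function rather than a keyword on zip: itertools.zip_longest (izip_longest in Python 2.6+). A minimal Python 3 sketch, where fillvalue plays the role of the rejected pad argument and the value 0 is an arbitrary choice for illustration:

```python
from itertools import zip_longest

a = (1, 2, 3)
c = (4, 5, 6, 7)

# zip_longest runs until the longest argument is exhausted and
# fills the gaps with fillvalue (None if it is not given).
padded = list(zip_longest(a, c, fillvalue=0))

print(padded)  # [(1, 4), (2, 5), (3, 6), (0, 7)]
```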