Zen of Python: Errors should never pass silently. Why does zip work the way it does?

Reason 1: Historical Reason

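zip truncates to its shortest argument because it was meant to improve upon map(None, ...), which handled unequal-length arguments poorly. Fixing that behavior is a large part of why zip exists at all.

Here's how you did zip before it existed (Python 2 syntax; map(None, ...) no longer works in Python 3):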

>>> a = (1, 2, 3)
>>> b = (4, 5, 6)
>>> for i in map(None, a, b): print i
...
(1, 4)
(2, 5)
(3, 6)
>>> map(None, a, b)
[(1, 4), (2, 5), (3, 6)]

This is terribly unintuitive, and it handles unequal-length lists badly. This was a major design concern, which you can see plain as day in PEP 201, the proposal that first introduced zip:

While the map() idiom is a common one in Python, it has several disadvantages:

  • It is non-obvious to programmers without a functional programming background.

  • The use of the magic None first argument is non-obvious.

  • It has arbitrary, often unintended, and inflexible semantics when the lists are not of the same length - the shorter sequences are padded with None:

    >>> c = (4, 5, 6, 7)
    >>> map(None, a, c)
    [(1, 4), (2, 5), (3, 6), (None, 7)]

So, no, truncating to the shortest argument is not an error passing silently - it is the behaviour zip was designed to have in the first place.
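
To make the contrast concrete, here is a minimal Python 3 sketch using the same a and c as the PEP excerpt. It assumes itertools.zip_longest as a stand-in for the old map(None, ...) padding, since that idiom itself is gone in Python 3; the names truncated and padded are just for illustration.

```python
from itertools import zip_longest

a = (1, 2, 3)
c = (4, 5, 6, 7)

# zip stops at the shortest argument, so the unmatched 7 is dropped.
truncated = list(zip(a, c))

# zip_longest pads the shorter argument with None,
# which mirrors what map(None, a, c) used to return.
padded = list(zip_longest(a, c))

print(truncated)  # [(1, 4), (2, 5), (3, 6)]
print(padded)     # [(1, 4), (2, 5), (3, 6), (None, 7)]
```

Either result is well defined; the difference is only in which one you get by default.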


Reason 2: Practical Reason

Because the behavior is pretty useful, is clearly specified and doesn't have to be thought of as an error at all.

By allowing unequal lengths, zip only requires that its arguments be iterable. This allows zip to be extended to generators, tuples, dictionary keys and literally anything in the world that implements __iter__() and hands back an iterator with __next__(), precisely because it doesn't inquire about length.

This is significant, because generators do not support len() and thus there is no way to check their length beforehand without consuming them. Add a check for length, and you break zip's ability to work on generators, when it should. That's a fairly serious disadvantage, wouldn't you agree?
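
A small Python 3 sketch of that point, under the assumption of a made-up infinite generator called squares (not from any library): len() refuses the generator, yet zip pairs it with a finite string without trouble.

```python
from itertools import count

def squares():
    """Hypothetical generator: yields 0, 1, 4, 9, ... forever."""
    for n in count():
        yield n * n

letters = "abc"

# A generator has no length, so a length check up front is impossible.
try:
    len(squares())
    has_len = True
except TypeError:
    has_len = False

# zip pulls one item at a time from each argument and stops
# as soon as the shortest one (here, the string) is exhausted.
pairs = list(zip(letters, squares()))

print(has_len)  # False
print(pairs)    # [('a', 0), ('b', 1), ('c', 4)]
```

With a mandatory equal-length check, the second half of this example could not work at all, since squares() never ends.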


Reason 3: By Fiat

Guido van Rossum wanted it this way:

Optional padding. An earlier version of this PEP proposed an optional pad keyword argument, which would be used when the argument sequences were not the same length. This is similar behavior to the map(None, ...) semantics except that the user would be able to specify pad object. This has been rejected by the BDFL in favor of always truncating to the shortest sequence, because of the KISS principle. If there's a true need, it is easier to add later. If it is not needed, it would still be impossible to delete it in the future.

KISS trumps everything.
