How to open a Unicode text file inside a zip?

Edit: For Python 3, using io.TextIOWrapper as this answer describes is the best choice. The answer below could still be helpful for 2.x. I don't think anything below is actually incorrect even for 3.x, but io.TextIOWrapper is still better.

If the file is utf-8, this will work:

# the rest of the code as above, then:
# zfile is the open ZipFile and name the member to read (assumed names)
with zfile.open(name, 'rU') as readFile:
    line = readFile.readline().decode('utf8')
    # etc

If you're going to be iterating over the file you can use codecs.iterdecode, but that won't work with readline().

# zfile is the open ZipFile and name the member to read (assumed names)
with zfile.open(name, 'rU') as readFile:
    for line in codecs.iterdecode(readFile, 'utf8'):
        print line
        # etc

Note that neither approach is necessarily safe for multibyte encodings. For example, little-endian UTF-16 represents the newline character with the bytes b'\x0A\x00'. A tool that is not Unicode-aware and looks for newlines will split that incorrectly, leaving the null bytes at the start of the following line. In such a case you'd have to use something that doesn't try to split the input by newlines, such as ZipFile.read(), and then decode the whole byte string at once. This is not a concern for UTF-8, where the byte 0x0A never occurs inside a multibyte sequence.
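As a rough illustration of the UTF-16 problem, here is a minimal Python 3 sketch that reads the whole member and decodes it once. The in-memory archive, the member name sample.txt and its contents are made up for the example; they are not from the question.

```python
import io
import zipfile

# Build a small in-memory archive so the example is self-contained.
# The member name "sample.txt" and its contents are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zfile:
    zfile.writestr("sample.txt", "first\nsecond\n".encode("utf-16-le"))

with zipfile.ZipFile(buffer) as zfile:
    raw = zfile.read("sample.txt")    # the whole member as one bytes object
    text = raw.decode("utf-16-le")    # decode once, before any splitting
    lines = text.splitlines()

    # Splitting the raw bytes on b"\n" instead leaves a stray null byte
    # at the start of every chunk after the first.
    broken = raw.split(b"\n")
```

Decoding first keeps each line intact, at the cost of holding the whole member in memory.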

To convert a byte stream into a Unicode stream, you could use io.TextIOWrapper():

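import io
import zipfile

encoding = 'utf-8'
with zipfile.ZipFile("") as zfile:
    for name in zfile.namelist():
        # open each member as a binary stream, then decode it line by line
        with zfile.open(name) as readfile:
            for line in io.TextIOWrapper(readfile, encoding):
                print(repr(line))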

Note: TextIOWrapper() uses universal newlines mode by default. The rU mode in ZipFile.open() has been deprecated since Python 3.4 and was removed in Python 3.6.

It avoids issues with multibyte encodings described in @Peter DeGlopper's answer.
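To see this with a multibyte encoding, here is a small sketch under the assumption of a UTF-16 member. The in-memory archive and the member name sample.txt are invented for the example.

```python
import io
import zipfile

# Hypothetical archive with one UTF-16 (little-endian) member that mixes
# Windows and Unix line endings.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zfile:
    zfile.writestr("sample.txt", "alpha\r\nbeta\n".encode("utf-16-le"))

decoded_lines = []
with zipfile.ZipFile(buffer) as zfile:
    with zfile.open("sample.txt") as readfile:
        # TextIOWrapper decodes before looking for line endings, so the
        # two-byte newline is never split in the middle.
        for line in io.TextIOWrapper(readfile, encoding="utf-16-le"):
            decoded_lines.append(line)
```

With the default newline handling, both line endings should come back as '\n'.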

You're seeing that error because you are trying to mix bytes with unicode. The argument to split must also be a byte string:

>>> line = b'$0.0\t1822\t1\t1\t1\n'
>>> line.split(b'\t')
[b'$0.0', b'1822', b'1', b'1', b'1\n']

To get a unicode string, use decode:

>>> line.decode('utf-8')
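Putting the two together, a short Python 3 sketch of both the failure and one way around it might look like this. The variable names fields and error_type are only for illustration.

```python
line = b'$0.0\t1822\t1\t1\t1\n'

# Mixing a bytes object with a str separator raises TypeError on Python 3.
try:
    line.split('\t')
    error_type = None
except TypeError as exc:
    error_type = type(exc).__name__

# Decode first, then work with str everywhere.
fields = line.decode('utf-8').rstrip('\n').split('\t')
```

Decoding once at the boundary and using str from then on avoids having to remember the b prefix on every separator.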