Python gzipped fileinput returns binary string instead of text string
You'd have to implement your own openhook function that opens the files in text mode with an explicit codec:
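import os

def hook_compressed_text(filename, mode, encoding='utf8'):
    # Choose the opener from the file extension, as fileinput.hook_compressed does.
    ext = os.path.splitext(filename)[1]
    if ext == '.gz':
        import gzip
        # Appending 't' asks for text mode, so lines come back as str.
        return gzip.open(filename, mode + 't', encoding=encoding)
    elif ext == '.bz2':
        import bz2
        return bz2.open(filename, mode + 't', encoding=encoding)
    else:
        # Uncompressed files: the built-in open() is already in text mode for 'r'.
        return open(filename, mode, encoding=encoding)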
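For completeness, here is a rough sketch of how a hook like this might be plugged into fileinput. The hook is repeated so the snippet runs on its own; `read_lines`, `sample_path` and the temporary file contents are made up for illustration, and it assumes fileinput passes the default mode 'r' to the hook.

```python
import fileinput
import gzip
import os
import tempfile


def hook_compressed_text(filename, mode, encoding='utf8'):
    # Same hook as in the answer; assumes mode is 'r' (fileinput's default).
    ext = os.path.splitext(filename)[1]
    if ext == '.gz':
        return gzip.open(filename, mode + 't', encoding=encoding)
    elif ext == '.bz2':
        import bz2
        return bz2.open(filename, mode + 't', encoding=encoding)
    else:
        return open(filename, mode, encoding=encoding)


def read_lines(paths):
    """Hypothetical helper: collect every line from the files as str."""
    with fileinput.input(files=paths, openhook=hook_compressed_text) as stream:
        return [line.rstrip('\n') for line in stream]


# Hypothetical sample data: a throwaway gzip file in a temp directory.
tmpdir = tempfile.mkdtemp()
sample_path = os.path.join(tmpdir, 'sample.txt.gz')
with gzip.open(sample_path, 'wt', encoding='utf8') as fh:
    fh.write('first\nsecond\n')

lines = read_lines([sample_path])
print(lines)
```

Because the hook opens the file in text mode, each item in `lines` should be a `str` and no manual decoding is needed in the loop.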
Coming a bit late to the party, but wouldn't it be simpler to do this?
import fileinput

for line in fileinput.FileInput(files=gzipped_files, openhook=fileinput.hook_compressed):
    # Decode only when the hook handed back bytes; decode() defaults to UTF-8.
    if isinstance(line, bytes):
        line = line.decode()
    ...
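If you use that loop in more than one place, it could be wrapped in a small generator. This is only a sketch: `iter_text_lines`, `gz_path` and the sample bytes are invented for the example, and the encoding is assumed to be UTF-8. As far as I know, newer Python versions may already return str from `fileinput.hook_compressed`, in which case the `isinstance` check simply does nothing.

```python
import fileinput
import gzip
import os
import tempfile


def iter_text_lines(paths, encoding='utf-8'):
    """Hypothetical helper: yield lines as str, decoding only if they are bytes."""
    with fileinput.FileInput(files=paths, openhook=fileinput.hook_compressed) as stream:
        for line in stream:
            if isinstance(line, bytes):
                line = line.decode(encoding)
            yield line


# Hypothetical sample data: a small gzip file written as raw bytes.
tmpdir = tempfile.mkdtemp()
gz_path = os.path.join(tmpdir, 'demo.log.gz')
with gzip.open(gz_path, 'wb') as fh:
    fh.write(b'alpha\nbeta\n')

decoded = [line.rstrip('\n') for line in iter_text_lines([gz_path])]
print(decoded)
```

The trade-off against a custom openhook is that the decoding happens per line in your own code rather than inside the file object, which is simpler but means every consumer of the lines has to go through the helper.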