Generate zip stream without using temp files
As the gzip module documentation states, you can pass a file-like object to the GzipFile
constructor.
Since python is duck-typed, you're free to implement your own stream, like so:
import sys
from gzip import GzipFile
class MyStream(object):
def write(self, data):
#write to your stream...
sys.stdout.write(data) #stdout, for example
gz= GzipFile( fileobj=MyStream(), mode='w' )
gz.write("something")
@goncaplopp's answer is great, but you can achieve more parallelism if you run gzip externally. Since you are collecting lots of data, it may be worth the extra effort. You'll need to find your own compression routine for windows (there are several gzip implementations, but something like 7z may work also). You could also experiment with things like lz that compress more than gzip, depending on what else you need to optimize in your system.
import subprocess as subp
import os
class GZipWriter(object):
def __init__(self, filename):
self.filename = filename
self.fp = None
def __enter__(self):
self.fp = open(self.filename, 'wb')
self.proc = subp.Popen(['gzip'], stdin=subp.PIPE, stdout=self.fp)
return self
def __exit__(self, type, value, traceback):
self.close()
if type:
os.remove(self.filename)
def close(self):
if self.fp:
self.fp.close()
self.fp = None
def write(self, data):
self.proc.stdin.write(data)
with GZipWriter('sometempfile') as gz:
for i in range(10):
gz.write('a'*80+'\n')