Download, extract and read a gzip file in Python
Just use gzip.GzipFile(fileobj=handle) and you'll be on your way. In other words, it's not really true that "the Gzip library only accepts filenames as arguments and not handles"; you just have to pass the handle through the fileobj= named argument.
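To make that concrete, here is a minimal sketch of reading gzip data from a handle rather than a filename. The helper name gunzip_from_handle is made up for illustration, and an in-memory io.BytesIO buffer stands in for whatever open binary handle you actually have (a socket file, a URL response, and so on):

```python
import gzip
import io


def gunzip_from_handle(handle):
    """Decompress gzip data read from an open binary file-like object.

    `gunzip_from_handle` is a hypothetical helper name, not part of the gzip module.
    """
    with gzip.GzipFile(fileobj=handle, mode="rb") as gz:
        return gz.read()


# Build a small gzip stream in memory to stand in for a real handle.
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
    gz.write(b"hello, gzip\n")
buffer.seek(0)  # rewind so the reader starts at the gzip header

text = gunzip_from_handle(buffer)
print(text)
```

Closing a GzipFile that was opened with fileobj= does not close the underlying handle, which is why the buffer can still be rewound and reused after the first with block.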
I found this question while searching for a way to download and decompress a gzip
file from a URL, but I couldn't get the accepted answer to work in Python 2.7.
Here's what worked for me (adapted from here):
import urllib2
import gzip
import StringIO


def download(url):
    # Download SEED database
    out_file_path = url.split("/")[-1][:-3]  # file name without the ".gz" suffix
    print('Downloading SEED Database from: {}'.format(url))
    response = urllib2.urlopen(url)
    # Buffer the whole response in memory so GzipFile gets a seekable file object
    compressed_file = StringIO.StringIO(response.read())
    decompressed_file = gzip.GzipFile(fileobj=compressed_file)

    # Extract SEED database (binary mode, so the bytes are written unchanged)
    with open(out_file_path, 'wb') as outfile:
        outfile.write(decompressed_file.read())

    # Filter SEED database
    # ...
    return


if __name__ == "__main__":
    download("ftp://ftp.ebi.ac.uk/pub/databases/Rfam/12.0/fasta_files/RF00001.fa.gz")
I changed the target URL since the original one was dead: I just looked for a gzip
file served from an FTP server, like in the original question.
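For anyone on Python 3, where urllib2 and the StringIO module no longer exist, the same idea can be sketched roughly as follows. This is not part of what I originally ran: the helper name save_gunzipped is invented for illustration, and the code assumes the URL ends in ".gz". It streams the decompressed data to disk with shutil.copyfileobj instead of reading everything into memory first:

```python
import gzip
import shutil
from urllib.request import urlopen  # Python 3 counterpart of urllib2.urlopen


def save_gunzipped(handle, out_file_path):
    """Stream gzip data from an open binary handle into out_file_path.

    `save_gunzipped` is a hypothetical helper name used only in this sketch.
    """
    with gzip.GzipFile(fileobj=handle, mode="rb") as decompressed, \
            open(out_file_path, "wb") as outfile:
        shutil.copyfileobj(decompressed, outfile)


def download(url):
    """Download url (assumed to end in '.gz') and write the decompressed file."""
    out_file_path = url.split("/")[-1][:-3]  # file name without the ".gz" suffix
    with urlopen(url) as response:
        save_gunzipped(response, out_file_path)
    return out_file_path
```

Calling download(...) with the same URL as above should then write RF00001.fa to the current directory, provided the server is still reachable.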