How to extract files in S3 on the fly with boto3?

You can use BytesIO to stream the file from S3, run it through gzip, then pipe it back up to S3 using upload_fileobj to write the BytesIO.

# python imports
import boto3
from io import BytesIO
import gzip

# setup constants
bucket = '<bucket_name>'
gzipped_key = '<key_name.gz>'
uncompressed_key = '<key_name>'

# initialize s3 client, this is dependent upon your aws config being done 
s3 = boto3.client('s3', use_ssl=False)  # optional
s3.upload_fileobj(                      # upload a new obj to s3
    Fileobj=gzip.GzipFile(              # read in the output of gzip -d
        None,                           # just return output as BytesIO
        'rb',                           # read binary
        fileobj=BytesIO(s3.get_object(Bucket=bucket, Key=gzipped_key)['Body'].read())),
    Bucket=bucket,                      # target bucket, writing to
    Key=uncompressed_key)               # target key, writing to

Ensure that your key is reading in correctly:

# read the body of the s3 key object into a string to ensure download
s = s3.get_object(Bucket=bucket, Key=gzip_key)['Body'].read()
print(len(s))  # check to ensure some data was returned

The above answers are for gzip files, for zip files, you may try

import boto3
import zipfile
from io import BytesIO
bucket = 'bucket1'

s3 = boto3.client('s3', use_ssl=False)
Key_unzip = 'result_files/'

prefix      = "folder_name/"
zipped_keys =  s3.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter = "/")
file_list = []
for key in zipped_keys['Contents']:
    file_list.append(key['Key'])
#This will give you list of files in the folder you mentioned as prefix
s3_resource = boto3.resource('s3')
#Now create zip object one by one, this below is for 1st file in file_list
zip_obj = s3_resource.Object(bucket_name=bucket, key=file_list[0])
print (zip_obj)
buffer = BytesIO(zip_obj.get()["Body"].read())

z = zipfile.ZipFile(buffer)
for filename in z.namelist():
    file_info = z.getinfo(filename)
    s3_resource.meta.client.upload_fileobj(
        z.open(filename),
        Bucket=bucket,
        Key='result_files/' + f'{filename}')

This will work for your zip file and your result unzipped data will be in result_files folder. Make sure to increase memory and time on AWS Lambda to maximum since some files are pretty large and needs time to write.

How to extract files in S3 on the fly with boto3?

Tags:

Amazon S3

Amazon Web Services

Aws Lambda

Related

Recent Posts