List directory contents of an S3 bucket using Python and Boto3?
The other answers leave something to be desired. A single call to client.list_objects() returns at most 1,000 keys, and the remaining answers are either wrong or more complex than they need to be.
Handling the continuation token yourself is error-prone. Just use a paginator, which takes care of that logic for you.
The solution you want is:
# 'Contents' is missing from a page that has no keys, so default to an empty list
[e['Key']
 for p in client.get_paginator('list_objects_v2').paginate(Bucket='my_bucket')
 for e in p.get('Contents', [])]
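For reuse, the same paginator approach can be wrapped in a small function. This is only a sketch: the name list_keys and its prefix parameter are illustrative and not part of boto3, and the function assumes it is handed an S3 client created elsewhere.

```python
def list_keys(client, bucket, prefix=""):
    """Return every key in `bucket` that starts with `prefix`.

    `client` is assumed to be a boto3 S3 client (or anything that
    offers the same get_paginator interface).
    """
    paginator = client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching objects has no 'Contents' entry.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   keys = list_keys(boto3.client("s3"), "my_bucket", prefix="logs/")
```

Collecting into a list is convenient for small buckets. For very large ones, yielding keys from a generator instead would avoid holding everything in memory.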
The best way to get the list of ALL objects with a specific prefix in an S3 bucket is to use list_objects_v2 along with ContinuationToken to get past the limit of 1,000 objects per response.
import boto3

s3 = boto3.client('s3')
s3_bucket = 'your-bucket'
s3_prefix = 'your/prefix'

# First page; 'Contents' is missing when nothing matches the prefix
partial_list = s3.list_objects_v2(
    Bucket=s3_bucket,
    Prefix=s3_prefix)
obj_list = partial_list.get('Contents', [])

# Keep requesting pages until the response is no longer truncated
while partial_list['IsTruncated']:
    next_token = partial_list['NextContinuationToken']
    partial_list = s3.list_objects_v2(
        Bucket=s3_bucket,
        Prefix=s3_prefix,
        ContinuationToken=next_token)
    obj_list.extend(partial_list.get('Contents', []))
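If you prefer to keep the token handling explicit but out of the calling code, the loop above can be folded into a generator. A minimal sketch, assuming a boto3 S3 client is passed in; iter_objects is a made-up helper name, not a boto3 function.

```python
def iter_objects(client, bucket, prefix=""):
    """Yield object dicts under `prefix`, following continuation tokens.

    `client` is assumed to be a boto3 S3 client.
    """
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = client.list_objects_v2(**kwargs)
        # 'Contents' is absent when a response holds no objects.
        yield from response.get("Contents", [])
        if not response.get("IsTruncated"):
            break
        # Ask for the next page on the following iteration.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   for obj in iter_objects(boto3.client("s3"), "your-bucket", "your/prefix"):
#       print(obj["Key"], obj["Size"])
```

Because it is a generator, the caller can stop early without fetching the remaining pages.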
Alternatively, you may want to use boto3.client.
Example:
import boto3
client = boto3.client('s3')
client.list_objects(Bucket='MyBucket')
list_objects also supports other arguments that you may need in order to iterate through the results: Bucket, Delimiter, EncodingType, Marker, MaxKeys, Prefix
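As an illustration of how the Marker and MaxKeys arguments listed above fit together, here is a rough sketch of manual paging with the older list_objects call. The helper name iter_keys_v1 and the page_size parameter are invented for this example. It relies on the documented behaviour that, when no Delimiter is given, a truncated response has no NextMarker and the last key returned can be used as the next Marker.

```python
def iter_keys_v1(client, bucket, prefix="", page_size=1000):
    """Yield keys using list_objects and its Marker argument.

    `client` is assumed to be a boto3 S3 client.
    """
    kwargs = {"Bucket": bucket, "Prefix": prefix, "MaxKeys": page_size}
    while True:
        response = client.list_objects(**kwargs)
        contents = response.get("Contents", [])
        for obj in contents:
            yield obj["Key"]
        if not response.get("IsTruncated") or not contents:
            break
        # Without a Delimiter there is no NextMarker in the response,
        # so the last key seen becomes the marker for the next request.
        kwargs["Marker"] = contents[-1]["Key"]


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   for key in iter_keys_v1(boto3.client("s3"), "MyBucket"):
#       print(key)
```

In practice a paginator or list_objects_v2 is usually the simpler choice. This version is mainly useful for seeing what those arguments do.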
If you have a session, create a client and read the CommonPrefixes from the result of the client's list_objects call:
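client = session.client('s3',
                        # region_name='eu-west-1'
                        )
result = client.list_objects(Bucket='MyBucket', Delimiter='/')
# 'CommonPrefixes' is absent when there are no "folders", so default to []
for obj in result.get('CommonPrefixes', []):
    print(obj.get('Prefix'))  # handle obj.get('Prefix') here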
There could be a lot of folders, and you might want to start in a subfolder, though. Something like this could handle that:
def folders(client, bucket, prefix=''):
    """Yield the "folder" prefixes directly under prefix, across all pages."""
    paginator = client.get_paginator('list_objects')
    for result in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter='/'):
        # Use a separate name so the prefix argument is not shadowed
        for common_prefix in result.get('CommonPrefixes', []):
            yield common_prefix.get('Prefix')

gen_folders = folders(client, 'MyBucket')
list(gen_folders)

gen_subfolders = folders(client, 'MyBucket', prefix='MySubFolder/')
list(gen_subfolders)
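To get something closer to a directory listing, the same Delimiter trick can return both the "subfolders" and the objects that sit directly under a prefix. A sketch only: list_dir is a hypothetical helper name, and a boto3 S3 client is assumed to be passed in.

```python
def list_dir(client, bucket, prefix=""):
    """Return (subfolders, files) found directly under `prefix`.

    `client` is assumed to be a boto3 S3 client.
    """
    subfolders, files = [], []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        # Keys that contain another '/' after the prefix are rolled up
        # into CommonPrefixes; the rest appear under Contents.
        subfolders.extend(cp["Prefix"] for cp in page.get("CommonPrefixes", []))
        files.extend(obj["Key"] for obj in page.get("Contents", []))
    return subfolders, files


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   subfolders, files = list_dir(boto3.client("s3"), "MyBucket", "MySubFolder/")
```

Note that if the "folder" was created as a zero-byte object whose key ends in '/', that key itself may show up in files, so you may want to filter it out.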