Amazon s3 returns only 1000 entries for one bucket and all for another bucket (using java sdk)?
Improving on @Abhishek's answer. This code is slightly shorter and variable names are fixed.
You have to get the object listing, add its' contents to the collection, then get the next batch of objects from the listing. Repeat the operation until the listing will not be truncated.
List<S3ObjectSummary> keyList = new ArrayList<S3ObjectSummary>();
ObjectListing objects = s3.listObjects("bucket.new.test");
keyList.addAll(objects.getObjectSummaries());
while (objects.isTruncated()) {
objects = s3.listNextBatchOfObjects(objects);
keyList.addAll(objects.getObjectSummaries());
}
For Scala developers, here it is recursive function to execute a full scan and map of the contents of an AmazonS3 bucket using the official AWS SDK for Java
import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.{S3ObjectSummary, ObjectListing, GetObjectRequest}
import scala.collection.JavaConversions.{collectionAsScalaIterable => asScala}
def map[T](s3: AmazonS3Client, bucket: String, prefix: String)(f: (S3ObjectSummary) => T) = {
def scan(acc:List[T], listing:ObjectListing): List[T] = {
val summaries = asScala[S3ObjectSummary](listing.getObjectSummaries())
val mapped = (for (summary <- summaries) yield f(summary)).toList
if (!listing.isTruncated) mapped.toList
else scan(acc ::: mapped, s3.listNextBatchOfObjects(listing))
}
scan(List(), s3.listObjects(bucket, prefix))
}
To invoke the above curried map()
function, simply pass the already constructed (and properly initialized) AmazonS3Client object (refer to the official AWS SDK for Java API Reference), the bucket name and the prefix name in the first parameter list. Also pass the function f()
you want to apply to map each object summary in the second parameter list.
For example
val keyOwnerTuples = map(s3, bucket, prefix)(s => (s.getKey, s.getOwner))
will return the full list of (key, owner)
tuples in that bucket/prefix
or
map(s3, "bucket", "prefix")(s => println(s))
as you would normally approach by Monads in Functional Programming
I have just changed above code to use addAll instead of using a for loop to add objects one by one and it worked for me:
List<S3ObjectSummary> keyList = new ArrayList<S3ObjectSummary>();
ObjectListing object = s3.listObjects("bucket.new.test");
keyList = object.getObjectSummaries();
object = s3.listNextBatchOfObjects(object);
while (object.isTruncated()){
keyList.addAll(current.getObjectSummaries());
object = s3.listNextBatchOfObjects(current);
}
keyList.addAll(object.getObjectSummaries());
After that you can simply use any iterator over list keyList.