Downloading from S3 with aws-cli using filter on specific prefix

Note that --include does the filtering on the client side, so the CLI still has to fetch a listing of every key in the bucket first. If your bucket contains millions of files, the command can take hours to run and generates extra network traffic just to retrieve the list of filenames.
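For instance, a filtered recursive copy like the one below does work, but the CLI still enumerates every key under the source path before applying the filters locally (a sketch only, with yourbucket as a placeholder):

aws s3 cp s3://yourbucket/ . --recursive --exclude "*" --include "backup.2017-*"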

But aws s3 ls accepts a key prefix, and S3 filters on the prefix server-side, so listing only the matching objects costs no extra traffic. You can run

aws s3 ls s3://yourbucket/backup.2017-

to see your files, and something like

aws s3 ls s3://yourbucket/backup.2017- | colrm 1 31 | xargs -I % aws s3 cp s3://yourbucket/% .

to copy your files.
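Here colrm 1 31 strips the fixed-width date, time, and size columns from the aws s3 ls output, leaving only the key name. If you would rather not rely on that column layout, a rough equivalent (assuming no key names contain newlines) is the lower-level prefix listing, which the CLI paginates for you:

aws s3api list-objects-v2 --bucket yourbucket --prefix backup.2017- --query 'Contents[].[Key]' --output text | xargs -I % aws s3 cp s3://yourbucket/% .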


You'll have to use aws s3 sync s3://yourbucket/

There are two parameters you can pass to aws s3 sync: --exclude and --include, both of which accept the "*" wildcard.

First we --exclude "*" to exclude every file, and then we --include "backup.2017-01-01*" to add back the files with the prefix we want. The order matters: filters given later on the command line take precedence, so the broad exclude has to come before the include. You can of course change the include pattern around; something like --include "*-01-01*" also works.

That's it; here's the full command:

aws s3 sync s3://yourbucket/ . --exclude "*" --include "backup.2017-01-01*"

Also, remember to use --dryrun first to test your command, so you don't accidentally download every file in the bucket.
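For example, this prints what would be copied without transferring anything:

aws s3 sync s3://yourbucket/ . --exclude "*" --include "backup.2017-01-01*" --dryrun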


A PowerShell equivalent of @sampo-smolander's answer

aws s3 ls s3://yourbucket/backup.2017- | ForEach-Object { ($_ -split '\s+')[-1] } | ForEach-Object { aws s3 cp s3://yourbucket/$_ . }

Posting it here since I spent a lot of time figuring this out, so hopefully it helps someone else who needs to use PowerShell. I'm not too familiar with PowerShell, though, so it may need optimization.