Downloading from S3 with aws-cli using filter on specific prefix
I think --include does the filtering locally, on the client side. So if your bucket contains millions of files, the command can take hours to run, because the CLI first has to download a list of every object name in the bucket. That also means extra network traffic.
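To make that concrete, this is the kind of filter-based command in question (just a sketch; yourbucket and the backup prefix are placeholders). It works, but the CLI still enumerates every key in the bucket and filters on the client:
aws s3 cp s3://yourbucket/ . --recursive --exclude "*" --include "backup.2017-*"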
But aws s3 ls can take a partial object name (a key prefix) and list only the matching files, without any extra traffic. So you can run
aws s3 ls s3://yourbucket/backup.2017-
to see your files, and something like
aws s3 ls s3://yourbucket/backup.2017- | colrm 1 31 | xargs -I % aws s3 cp s3://yourbucket/% .
to copy your files. (colrm 1 31 strips the date, time, and size columns from the ls output, leaving just the object names.)
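If colrm isn't available on your system, an awk variant that prints the fourth whitespace-separated field (the object name in aws s3 ls output) should behave the same, assuming your object names contain no spaces:
aws s3 ls s3://yourbucket/backup.2017- | awk '{print $4}' | xargs -I % aws s3 cp s3://yourbucket/% .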
You'll have to use aws s3 sync s3://yourbucket/
There are two parameters you can give to aws s3 sync: --exclude and --include, both of which accept the "*" wildcard.
First we use --exclude "*" to exclude all of the files, and then we use --include "backup.2017-01-01*" to include just the files with the specific prefix we want. You can of course adjust the include pattern; something like --include "*-01-01*" would match the January 1st backup from any year.
That's it, here's the full command:
aws s3 sync s3://yourbucket/ . --exclude "*" --include "backup.2017-01-01*"
Also, remember to use --dryrun
to test your command and avoid downloading all files in the bucket.
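For example, this prints what would be transferred without actually copying anything:
aws s3 sync s3://yourbucket/ . --exclude "*" --include "backup.2017-01-01*" --dryrun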
A PowerShell equivalent of @sampo-smolander's answer:
aws s3 ls s3://yourbucket/backup.2017- | ForEach-Object { $_.Split(" ")[-1] } | ForEach-Object { aws s3 cp "s3://yourbucket/$_" . }
Posting it here since I spent a lot of time figuring this out, so hopefully it helps someone else who needs to use PowerShell. Also, I'm not too familiar with PowerShell, so it may need optimization.
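One caveat: Split(" ")[-1] grabs the last whitespace-separated token, so it will misparse object names that contain spaces. A variant that instead trims the fixed-width date, time, and size columns (the same 31 columns colrm removes in the answer above, assuming the standard aws s3 ls output layout) is:
aws s3 ls s3://yourbucket/backup.2017- | ForEach-Object { $_.Substring(31) } | ForEach-Object { aws s3 cp "s3://yourbucket/$_" . }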