Downloading from S3 with aws-cli using filter on specific prefix

Note that --include does the filtering on the client side, so the CLI still has to fetch a listing of every key in the bucket first. If your bucket contains millions of files, the command can take hours to run and generates extra network traffic just to retrieve the list of filenames.
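For instance, a filtered recursive copy like the one below does work, but the CLI still enumerates every key under the source path before applying the filters locally (a sketch only, with yourbucket as a placeholder):

aws s3 cp s3://yourbucket/ . --recursive --exclude "*" --include "backup.2017-*"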

But aws s3 ls accepts a key prefix, and S3 filters on the prefix server-side, so listing only the matching objects costs no extra traffic. You can run

aws s3 ls s3://yourbucket/backup.2017-

to see your files, and something like

aws s3 ls s3://yourbucket/backup.2017- | colrm 1 31 | xargs -I % aws s3 cp s3://yourbucket/% .

to copy your files.
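Here colrm 1 31 strips the fixed-width date, time, and size columns from the aws s3 ls output, leaving only the key name. If you would rather not rely on that column layout, a rough equivalent (assuming no key names contain newlines) is the lower-level prefix listing, which the CLI paginates for you:

aws s3api list-objects-v2 --bucket yourbucket --prefix backup.2017- --query 'Contents[].[Key]' --output text | xargs -I % aws s3 cp s3://yourbucket/% .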


You'll have to use aws s3 sync s3://yourbucket/

There are two parameters you can pass to aws s3 sync: --exclude and --include, both of which accept the "*" wildcard.

First we --exclude "*" to exclude every file, and then we --include "backup.2017-01-01*" to add back the files with the prefix we want. The order matters: filters given later on the command line take precedence, so the broad exclude has to come before the include. You can of course change the include pattern around; something like --include "*-01-01*" also works.

That's it; here's the full command:

aws s3 sync s3://yourbucket/ . --exclude "*" --include "backup.2017-01-01*"

Also, remember to use --dryrun first to test your command, so you don't accidentally download every file in the bucket.
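For example, this prints what would be copied without transferring anything:

aws s3 sync s3://yourbucket/ . --exclude "*" --include "backup.2017-01-01*" --dryrun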


A PowerShell equivalent of @sampo-smolander's answer

aws s3 ls s3://yourbucket/backup.2017- | ForEach-Object { ($_ -split '\s+')[-1] } | ForEach-Object { aws s3 cp s3://yourbucket/$_ . }

Posting it here since I spent a lot of time figuring this out, so hopefully it helps someone else who needs to use PowerShell. I'm not too familiar with PowerShell, though, so it may need optimization.