Python - download entire directory from Google Cloud Storage
If you want to keep the same directory structure without renaming files, and also create the nested folders, here is a solution for Python 3.5+ based on @ksbg's answer:
from pathlib import Path
from google.cloud import storage

bucket_name = 'your-bucket-name'
prefix = 'your-bucket-directory/'
dl_dir = 'your-local-directory/'

storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket_name)
blobs = bucket.list_blobs(prefix=prefix)  # Get list of files
for blob in blobs:
    if blob.name.endswith("/"):
        continue  # skip "folder" placeholder objects, they have no content
    # recreate the blob's folder structure under dl_dir
    local_path = Path(dl_dir) / blob.name
    local_path.parent.mkdir(parents=True, exist_ok=True)
    blob.download_to_filename(str(local_path))
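To check the folder-recreation logic without touching a real bucket, here is a minimal sketch that runs the same loop over any objects exposing a name attribute and a download_to_filename() method. The names download_all and FakeBlob are hypothetical and exist only for illustration; with a real bucket you would pass bucket.list_blobs(prefix=prefix) instead of the fake list.

```python
import tempfile
from pathlib import Path


class FakeBlob:
    """Hypothetical stand-in for a GCS blob: only .name and download_to_filename()."""

    def __init__(self, name, data=b""):
        self.name = name
        self._data = data

    def download_to_filename(self, filename):
        # A real blob fetches the object from GCS; here we just write the bytes.
        with open(filename, "wb") as f:
            f.write(self._data)


def download_all(blobs, dl_dir):
    """Download every blob under dl_dir, recreating the bucket's folder layout."""
    downloaded = []
    for blob in blobs:
        if blob.name.endswith("/"):
            continue  # "folder" placeholder objects have no file content
        local_path = Path(dl_dir) / blob.name
        local_path.parent.mkdir(parents=True, exist_ok=True)
        blob.download_to_filename(str(local_path))
        downloaded.append(local_path)
    return downloaded


if __name__ == "__main__":
    fake_blobs = [
        FakeBlob("dir/"),
        FakeBlob("dir/a.txt", b"A"),
        FakeBlob("dir/sub/b.txt", b"B"),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        for path in download_all(fake_blobs, tmp):
            print(path.relative_to(tmp).as_posix(), path.read_bytes())
```

Because the function only relies on those two members, the same code works unchanged with real google-cloud-storage blobs.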
You just have to first list all the files in a directory and then download them one by one:
from google.cloud import storage

bucket_name = 'your-bucket-name'
prefix = 'your-bucket-directory/'
dl_dir = 'your-local-directory/'  # must already exist

storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket_name)
blobs = bucket.list_blobs(prefix=prefix)  # Get list of files
for blob in blobs:
    filename = blob.name.replace('/', '_')  # flatten the path into the file name
    blob.download_to_filename(dl_dir + filename)  # Download
blob.name includes the entire directory path plus the file name, so if you want the same file name as in the bucket, you might want to extract it first (instead of replacing "/" with "_").
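A minimal sketch of the two options, assuming you only need the file name itself. The helper names flat_name and base_name are hypothetical. posixpath is used because GCS object names always use "/" as the separator, regardless of the local OS.

```python
import posixpath


def flat_name(blob_name):
    """Approach from the answer above: encode the folder path into the file name."""
    return blob_name.replace("/", "_")


def base_name(blob_name):
    """Keep only the last path component, i.e. the file name as shown in the bucket."""
    # os.path.basename on Windows would also split on "\", which GCS treats
    # as an ordinary character, so posixpath is the safer choice here.
    return posixpath.basename(blob_name)


if __name__ == "__main__":
    for name in ["reports/2020/jan.csv", "reports/2021/jan.csv"]:
        print(flat_name(name), "->", base_name(name))
```

Note the trade-off: with base_name, two blobs in different folders that share a file name (jan.csv above) will overwrite each other if both are downloaded into the same dl_dir.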
Let's say we want to download FINALFOLDER
from the storage path gs://TEST_BUCKET_NAME/FOLDER1/FOLDER2/FINALFOLDER.
After downloading, the final path will look like D:\my_blob_data\FINALFOLDER.
from os import makedirs
from os.path import join, dirname
from google.cloud import storage

# if your environment was authenticated, the default config will be picked up
storage_client = storage.Client()  # comment this line if you want to use a service account
# uncomment the line below if you have a service account JSON key
# storage_client = storage.Client.from_service_account_json('creds/sa.json')

bucket_name = 'TEST_BUCKET_NAME'
parent_prefix = 'FOLDER1/FOLDER2/'  # part of the path that is NOT recreated locally
prefix = parent_prefix + 'FINALFOLDER/'  # list_blobs matches from the start of the name
dst_path = 'D:\\my_blob_data'
makedirs(dst_path, exist_ok=True)

bucket = storage_client.bucket(bucket_name=bucket_name)
blobs = bucket.list_blobs(prefix=prefix)  # Get list of files
for blob in blobs:
    if blob.name.endswith('/'):
        continue  # skip "folder" placeholder objects
    # drop FOLDER1/FOLDER2/ and join the rest (FINALFOLDER/...) onto dst_path
    relative_name = blob.name[len(parent_prefix):]
    dst_file_name = join(dst_path, *relative_name.split('/'))
    # create the final directory in the destination path if it does not exist
    makedirs(dirname(dst_file_name), exist_ok=True)
    # download the blob object
    blob.download_to_filename(dst_file_name)
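The only non-obvious step above is turning a blob name into the Windows destination path. A minimal sketch of that mapping as a pure function, so it can be checked on any OS; to_local_path is a hypothetical helper, and PureWindowsPath is used only so the result matches what you would get on the Windows machine.

```python
from pathlib import PureWindowsPath


def to_local_path(blob_name, parent_prefix, dst_root):
    """Map a blob name to a Windows destination path, dropping parent_prefix."""
    # Example: FOLDER1/FOLDER2/FINALFOLDER/x.txt with parent_prefix FOLDER1/FOLDER2/
    # and dst_root D:\my_blob_data maps to D:\my_blob_data\FINALFOLDER\x.txt
    if not blob_name.startswith(parent_prefix):
        raise ValueError(f"{blob_name!r} does not start with {parent_prefix!r}")
    relative_parts = blob_name[len(parent_prefix):].split("/")
    return PureWindowsPath(dst_root, *relative_parts)


if __name__ == "__main__":
    print(to_local_path("FOLDER1/FOLDER2/FINALFOLDER/sub/a.txt",
                        "FOLDER1/FOLDER2/", r"D:\my_blob_data"))
```

On Windows itself, os.path.join or pathlib.Path would produce the same path; raising on a missing prefix makes a wrong prefix value fail loudly instead of writing files to an unexpected place.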