How to access files within subfolders of a GCS bucket using Python?
What you missed is that objects in a GCS bucket aren't organized in a filesystem-like directory hierarchy, but in a flat namespace.
A more detailed explanation can be found in How Subdirectories Work (written for gsutil, but the underlying reason is the same: the flat GCS namespace):
gsutil provides the illusion of a hierarchical file tree atop the "flat" name space supported by the Google Cloud Storage service. To the service, the object gs://your-bucket/abc/def.txt is just an object that happens to have "/" characters in its name. There is no "abc" directory; just a single object with the given name.
Since there are no (sub)directories in GCS, /training/bad doesn't really exist, so you can't list its contents the way you would list a directory. All you can do is list the objects in the bucket and select the ones whose names/paths start with /training/bad.
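To make the "select by name" idea concrete, here is a minimal sketch that works on a plain list of strings rather than a real bucket. The object names and the helper `names_under` are hypothetical, made up for illustration. It also assumes that object names carry no leading slash (so the prefix is written as `training/bad/`), which is how names under gs://your-bucket/... usually look.

```python
# Hypothetical object names, as a flat bucket listing might return them.
object_names = [
    "training/bad/img1.jpg",
    "training/bad/img2.jpg",
    "training/good/img3.jpg",
    "validation/bad/img4.jpg",
]


def names_under(names, prefix):
    """Return the names that start with the given prefix (plain string match)."""
    return [name for name in names if name.startswith(prefix)]


# "Listing the folder" is nothing more than a prefix match on the names.
bad_training = names_under(object_names, "training/bad/")
print(bad_training)
```

Including the trailing slash in the prefix keeps a sibling such as `training/bad_old/...` from matching by accident.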
If you would like to find blobs (files) that exist under a specific prefix (subdirectory), you can pass the prefix and delimiter arguments to the list_blobs() function.
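How prefix and delimiter interact is easier to see on a toy model. The sketch below is not the GCS implementation; `emulate_listing` is a hypothetical helper that mimics, as far as I understand the documented behavior, what the service returns for a flat list of names: objects directly under the prefix come back as blobs, and anything deeper is rolled up into a "subfolder" prefix.

```python
def emulate_listing(names, prefix="", delimiter=None):
    """Mimic prefix/delimiter listing over a flat list of object names.

    Returns a (blobs, prefixes) pair of lists.
    """
    blobs, prefixes = [], set()
    for name in sorted(names):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter and delimiter in rest:
            # Roll everything up to the next delimiter into one "subfolder".
            prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            blobs.append(name)
    return blobs, sorted(prefixes)


names = ["a/1.txt", "a/b/2.txt", "c/3.txt"]
print(emulate_listing(names, prefix="a/"))
# (['a/1.txt', 'a/b/2.txt'], [])
print(emulate_listing(names, prefix="a/", delimiter="/"))
# (['a/1.txt'], ['a/b/'])
```

Without a delimiter you get the whole "tree" under the prefix; with one you get only the direct "files", plus the names of the "subfolders".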
See the following example, taken from the Google Listing Objects example (also available as a GitHub snippet):
from google.cloud import storage


def list_blobs_with_prefix(bucket_name, prefix, delimiter=None):
    """Lists all the blobs in the bucket that begin with the prefix.

    This can be used to list all blobs in a "folder", e.g. "public/".

    The delimiter argument can be used to restrict the results to only the
    "files" in the given "folder". Without the delimiter, the entire tree under
    the prefix is returned. For example, given these blobs:

        /a/1.txt
        /a/b/2.txt

    If you just specify prefix = '/a', you'll get back:

        /a/1.txt
        /a/b/2.txt

    However, if you specify prefix='/a/' and delimiter='/', you'll get back:

        /a/1.txt
    """
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blobs = bucket.list_blobs(prefix=prefix, delimiter=delimiter)

    print('Blobs:')
    for blob in blobs:
        print(blob.name)

    # blobs.prefixes is only populated once the iterator has been consumed.
    if delimiter:
        print('Prefixes:')
        for prefix in blobs.prefixes:
            print(prefix)
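Since the question is about accessing the files, not only listing them, here is a rough sketch of how the same prefix listing could drive a download. The helper `download_prefix` and the local directory name are hypothetical. It only assumes that `bucket` behaves like a `google.cloud.storage` bucket, i.e. it offers `list_blobs(prefix=...)` and its blobs offer `download_to_filename()`.

```python
import os


def download_prefix(bucket, prefix, dest_dir):
    """Download every object whose name starts with `prefix` into `dest_dir`.

    Returns the list of local paths that were written.
    """
    written = []
    for blob in bucket.list_blobs(prefix=prefix):
        # Zero-byte "folder" placeholder objects end with "/"; skip them.
        if blob.name.endswith("/"):
            continue
        # Keep the part of the name after the prefix as the local relative path;
        # fall back to the last name component if the name equals the prefix.
        relative = blob.name[len(prefix):].lstrip("/") or blob.name.rsplit("/", 1)[-1]
        local_path = os.path.join(dest_dir, *relative.split("/"))
        os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
        blob.download_to_filename(local_path)
        written.append(local_path)
    return written


# Possible usage against a real bucket (needs credentials, so left commented out):
# from google.cloud import storage
# bucket = storage.Client().bucket("your-bucket")
# download_prefix(bucket, "training/bad/", "local_bad")
```

Taking the bucket as a parameter rather than creating the client inside the function makes it easy to try the logic against a stand-in object first.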