What should I do about this gsutil "parallel composite upload" warning?
Another way is to set the configuration option that the warning mentions in a file on the BOTO_PATH, usually $HOME/.boto:
[GSUtil]
parallel_composite_upload_threshold = 150M
For maximum speed, install the crcmod C library.
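To see whether the compiled crcmod is actually in use, a quick check from Python can help. This is only a sketch: the helper name is made up for illustration, and it relies on the private `_usingExtension` flag inside the crcmod package, which is an implementation detail rather than a documented API. Running `gsutil version -l` and looking at its "compiled crcmod" line is the more official route.

```python
def crcmod_c_extension_available():
    """Best-effort check for the compiled crcmod extension (hypothetical helper)."""
    try:
        import crcmod.crcmod as crcmod_impl
    except ImportError:
        # crcmod is not installed at all
        return False
    # _usingExtension is a private flag (assumption: present in the installed
    # crcmod version); fall back to False if it is missing
    return bool(getattr(crcmod_impl, "_usingExtension", False))


if __name__ == "__main__":
    print("compiled crcmod:", crcmod_c_extension_available())
```

Note that this reports on the Python interpreter running the snippet, which may not be the one gsutil itself uses.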
The Parallel Composite Uploads section of the gsutil documentation describes how to resolve this (assuming, as the warning specifies, that this content will be used by clients with the crcmod module available):
gsutil -o GSUtil:parallel_composite_upload_threshold=150M cp bigfile gs://your-bucket
Doing this safely from Python would look like:
import subprocess

filename = 'myfile.csv'
gs_bucket = 'my/bucket'
parallel_threshold = '150M'  # minimum size for parallel upload; 0 to disable

# Pass the command as a list so that no shell ever parses the filename
subprocess.check_call([
    'gsutil',
    '-o', 'GSUtil:parallel_composite_upload_threshold=%s' % (parallel_threshold,),
    'cp', filename, 'gs://%s/%s' % (gs_bucket, filename)
])
Note that here you're explicitly providing argument vector boundaries, and not relying on a shell to do this for you; this prevents a malicious or buggy filename from performing undesired operations.
If you don't know that the clients accessing content in this bucket will have the crcmod module, consider setting parallel_threshold='0' above, which disables this support.
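If you need to switch between the two settings, one option is to wrap the argument list in a small function. The sketch below only builds the list and never calls gsutil; the function name `build_gsutil_cp_args` and the bucket name are placeholders I made up for illustration.

```python
def build_gsutil_cp_args(filename, gs_bucket, parallel_threshold="150M"):
    """Return the argument list for a gsutil cp with an explicit threshold.

    Hypothetical helper: pass parallel_threshold="0" to disable parallel
    composite uploads.
    """
    return [
        "gsutil",
        "-o", "GSUtil:parallel_composite_upload_threshold=%s" % (parallel_threshold,),
        "cp", filename, "gs://%s/%s" % (gs_bucket, filename),
    ]


if __name__ == "__main__":
    # Show the command that would run with composite uploads disabled
    print(build_gsutil_cp_args("myfile.csv", "your-bucket", parallel_threshold="0"))
```

To actually run the upload, pass the returned list to subprocess.check_call, which keeps the same no-shell guarantee as the snippet above.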