Difficulty comparing generated and Google Cloud Storage-provided CRC32C checksums
Here's an example md5 and crc32c for the gsutil public tarball:
$ gsutil ls -L gs://pub/gsutil.tar.gz | grep Hash
Hash (crc32c): vHI6Bw==
Hash (md5): ph7W3cCoEgMQWvA45Z9y9Q==
I'll copy it locally to work with:
$ gsutil cp gs://pub/gsutil.tar.gz /tmp/
Copying gs://pub/gsutil.tar.gz...
Downloading file:///tmp/gsutil.tar.gz: 2.59 MiB/2.59 MiB
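CRC values are usually displayed as unsigned 32-bit integers, whereas gsutil prints the checksum as the base64 encoding of its four bytes in big-endian order. To convert the base64 string to an integer: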
>>> import base64
>>> import struct
>>> struct.unpack('>I', base64.b64decode('vHI6Bw=='))
(3161602567,)
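The conversion above can be wrapped in a pair of small helpers that go in either direction. This is a minimal sketch using only the standard library; the function names are made up for illustration and are not part of gsutil or crcmod.

```python
import base64
import struct


def crc32c_b64_to_int(b64_value):
    """Decode a base64 CRC32C string (as printed by gsutil) to an unsigned int."""
    raw = base64.b64decode(b64_value)
    if len(raw) != 4:
        raise ValueError('expected 4 bytes, got %d' % len(raw))
    # '>I' means big-endian unsigned 32-bit integer.
    return struct.unpack('>I', raw)[0]


def crc32c_int_to_b64(value):
    """Encode an unsigned 32-bit CRC value in the base64 form gsutil prints."""
    return base64.b64encode(struct.pack('>I', value)).decode('ascii')


print(crc32c_b64_to_int('vHI6Bw=='))    # 3161602567
print(crc32c_int_to_b64(3161602567))    # vHI6Bw==
```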
To obtain the same from the crcmod library:
>>> file_bytes = open('/tmp/gsutil.tar.gz', 'rb').read()
>>> import crcmod
>>> crc32c = crcmod.predefined.Crc('crc-32c')
>>> crc32c.update(file_bytes)
>>> crc32c.crcValue
3161602567
If you want to convert the value from crcmod to the same base64 format used by gcloud/gsutil:
>>> base64.b64encode(crc32c.digest()).decode('utf-8')
'vHI6Bw=='
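For large objects you may not want to read the whole file into memory. The sketch below streams the file in chunks and returns the checksum in the same base64 form. To keep it self-contained it uses a small pure-Python, table-driven CRC32C as a stand-in for crcmod; that is an assumption made for illustration, it is much slower than crcmod's C extension, and the function names are hypothetical. With crcmod installed, calling crc32c.update(chunk) inside the same loop should do the same job.

```python
import base64
import struct

# Reflected form of the Castagnoli polynomial used by CRC32C.
_POLY = 0x82F63B78


def _build_table():
    """Precompute the CRC of every possible byte value."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ _POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c_update(crc, data):
    """Fold `data` (bytes) into a running CRC32C value; start with crc=0."""
    crc ^= 0xFFFFFFFF  # undo the final XOR of the previous call
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def file_crc32c_b64(path, chunk_size=1024 * 1024):
    """Return the base64 CRC32C of a file, reading it chunk by chunk."""
    crc = 0
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            crc = crc32c_update(crc, chunk)
    return base64.b64encode(struct.pack('>I', crc)).decode('ascii')
```

The result of file_crc32c_b64('/tmp/gsutil.tar.gz') can then be compared as a plain string against the "Hash (crc32c)" line from gsutil ls -L.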