Google Colab Storage
Presently, the amount of local storage in Colab depends on the hardware accelerator runtime type you choose:
# Hardware accelerator: None
!df -h .
Filesystem      Size  Used Avail Use% Mounted on
overlay          49G   22G   26G  46% /
# Hardware accelerator: GPU
!df -h .
Filesystem      Size  Used Avail Use% Mounted on
overlay         359G   23G  318G   7% /
# Hardware accelerator: TPU
!df -h .
Filesystem      Size  Used Avail Use% Mounted on
overlay          49G   22G   26G  46% /
Even if you don't need a GPU, switching to that runtime type will give you roughly 310 GB of extra storage space.
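To compare runtimes in code rather than by reading `df` output, the same numbers can be read with Python's standard `shutil.disk_usage`. This is a minimal sketch: the helper name `storage_report` is hypothetical, and the values it prints depend on whichever runtime is active.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB

def storage_report(path="."):
    """Return (total, used, free) in GiB for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return tuple(round(n / GIB, 1) for n in (usage.total, usage.used, usage.free))

total, used, free = storage_report(".")
print(f"total={total} GiB used={used} GiB free={free} GiB")
```

Running this once per runtime type should show the same difference between the GPU runtime and the others that `df -h .` reports.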
Yes, the Colab notebook local storage is about 40 GiB right now. One way to see the exact value (in Python 3):
import subprocess
# Run `df -h` and print its output as text.
result = subprocess.run(['df', '-h'], stdout=subprocess.PIPE, universal_newlines=True)
print(result.stdout)
However, for large amounts of data, local storage is a poor way to feed the TPU, which is not directly connected to the machine running the notebook. Instead, consider storing your large dataset in Google Cloud Storage (GCS) and reading it from the Colab notebook. (Moreover, the amount of Colab local storage may change, and the Colab notebook itself will expire after a few hours, taking local storage with it.)
Take a look at the canonical TPU Colab notebook. At the bottom are some next steps, which include a link to Searching Shakespeare with TPUs. That notebook contains the following code fragment, which passes your GCP credentials to the Colab TPU:
import json
import os
import tensorflow as tf  # tf.Session and tf.contrib exist only in TensorFlow 1.x
from google.colab import auth
auth.authenticate_user()
if 'COLAB_TPU_ADDR' in os.environ:
  TF_MASTER = 'grpc://{}'.format(os.environ['COLAB_TPU_ADDR'])
  # Upload credentials to TPU.
  with tf.Session(TF_MASTER) as sess:
    with open('/content/adc.json', 'r') as f:
      auth_info = json.load(f)
    tf.contrib.cloud.configure_gcs(sess, credentials=auth_info)
  # Now credentials are set for all future sessions on this TPU.
else:
  TF_MASTER = ''
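The address logic in that fragment can be pulled out on its own so it can be checked without a TPU attached. This is a minimal sketch: the helper name `tpu_master` and the sample address are hypothetical, and only the `COLAB_TPU_ADDR` variable comes from the notebook code. Unlike the fragment, it also treats an empty variable as "no TPU".

```python
import os

def tpu_master(environ=None):
    """Return the gRPC address of the Colab TPU, or '' when no TPU is attached."""
    environ = os.environ if environ is None else environ
    addr = environ.get('COLAB_TPU_ADDR')
    return 'grpc://{}'.format(addr) if addr else ''

# Hypothetical address, for illustration only.
print(tpu_master({'COLAB_TPU_ADDR': '10.0.0.2:8470'}))
# No TPU runtime selected: the result is an empty string.
print(repr(tpu_master({})))
```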