How to increase Google Colab storage

Using Google Drive alone won't work: if the files are on Google Drive, Colab can't read the whole directory because there are too many files in it. From all my testing, I can't get it to work with a directory containing more than about 15k files. You'll have to have enough space to download the dataset onto the VM itself.

Update:

I figured out how to get the whole COCO-2017 dataset into Colab via Google Drive. Basically, I broke train2017 and test2017 down into subdirectories holding at most 5,000 files each (I noticed Colab could only read somewhere around 15k files from a directory, so 5,000 seemed a safe bet). Here is the code for that.
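Roughly, the split looks like this (a minimal sketch, not the exact script: it assumes the extracted train2017 and test2017 folders are in the working directory, and the function name and output directory names are placeholders):

import os
import shutil

def split_into_chunks(src_dir, dst_dir, chunk_size=5000):
    # Move files from src_dir into numbered subdirectories of dst_dir,
    # each holding at most chunk_size files.
    files = sorted(os.listdir(src_dir))
    for start in range(0, len(files), chunk_size):
        chunk_dir = os.path.join(dst_dir, '%03d' % (start // chunk_size))
        os.makedirs(chunk_dir, exist_ok=True)
        for name in files[start:start + chunk_size]:
            shutil.move(os.path.join(src_dir, name), os.path.join(chunk_dir, name))

split_into_chunks('train2017', 'train2017_split')
split_into_chunks('test2017', 'test2017_split')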

Then I used rclone to upload the whole dataset to Google Drive and shared it so that anyone with the link can view it.
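The upload is something along these lines (a sketch: "gdrive" is whatever remote name you set up with rclone config, and the paths are placeholders; the link-sharing itself is done afterwards from the Google Drive web UI):

# one-time setup of a Google Drive remote named 'gdrive'
rclone config
# copy the split dataset up to Drive, several transfers in parallel
rclone copy ./COCO gdrive:COCO --progress --transfers 8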

Once you have the share in your Google Drive, create a shortcut to it so Colab can access it. Then I just create symbolic links in the local directory: 118,287 for train and 40,670 for test. So far, it is working like a charm. I even save all my output to Google Drive so training can be resumed after the 12-hour kick. Here is the notebook for that.
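The symlink step itself is only a few lines (a sketch, not the full notebook; the Drive paths assume the shortcut is named COCO and contains the split subdirectories from above):

import os

drive_root = '/content/drive/My Drive/COCO/train2017_split'  # shortcut to the shared folder
local_root = '/content/train2017'
os.makedirs(local_root, exist_ok=True)

# Walk the 5,000-file subdirectories on Drive and link every image into
# one flat local directory, so the training code sees the usual COCO layout.
for sub in sorted(os.listdir(drive_root)):
    sub_path = os.path.join(drive_root, sub)
    for name in os.listdir(sub_path):
        os.symlink(os.path.join(sub_path, name), os.path.join(local_root, name))

Repeat the same for test2017. The symlinks take almost no local disk space, which is the whole point.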

I am training a Mask R-CNN now and will report results when finished, but it's looking pretty damn good so far.


If you pay for extra storage in Google Drive, you can mount Drive at the /content/drive/ folder

as follows:

from google.colab import drive
drive.mount('/content/drive')
It will then ask you for an authorization code: open the link it prints, grant access, and paste the code back into the cell.

You can even use it for unzipping datasets (my scenario was that I had enough space on Colab to download the 18 GB COCO dataset but not enough space to unzip it):

!unzip /content/train2017.zip -d /content/drive/My\ Drive/COCO/train_2017

I had the same problem. I am not sure this is a solution since I haven't tested it thoroughly, but it seems like the [Python 2 / No GPU] and [Python 3 / No GPU] runtimes have only 40 GB of storage, whereas the [Python 3 / GPU] runtime has 359 GB of storage.

Try changing your notebook runtime type to [Python 3 / GPU] by going to "Runtime" > "Change runtime type". Hope it helps!