Running your notebooks in the cloud with Google Colab

Jessica Greene
6 min read · Dec 31, 2021
[Screenshot: my notebook in Google Colab]

The first time I came across running code in notebooks was at the first PyLadies event I ever attended. The meetup was a data science workshop, a topic that, as someone new to programming, I was interested in but had very little exposure to. I spent almost the entire workshop trying to get my setup to work, and even once it was working I had no idea how to run cells or navigate the tool. Many notebooks later, I feel much more comfortable working in them.

I normally spin up a Docker container on my Linux machine using the Jupyter notebook Docker image, plus any additional libraries I need. This has worked well for getting started quickly and working in an isolated environment. Until recently, the datasets I was working with were also small and needed only a little compute. The latest installment of the machine learning bootcamp I am taking part in introduced deep learning and image processing, which meant not only larger datasets but also a need for more compute power. Running this on my Linux machine was possible but not enjoyable, not least because I had a deadline to meet and had left the work until the last minute.

I had seen the lecturer run their notebooks in a local SageMaker container, and several other participants had mentioned alternatives such as Kaggle notebooks or Google Colab. I had put off setting something like this up because changing my setup seemed daunting when I was already learning so many new things, but after one notebook took over an hour to run, I decided to put my anxiety to one side and see how to set it up.

WHY 🤔

As mentioned above, my motivation was to offload some of the compute to a machine with a GPU; my personal machine doesn't have one, so I needed to run my notebooks elsewhere. Other plus points are that notebooks are easy to share and can be run by essentially anyone, with very little configuration needed to get them up and running.

WHAT 🕵️‍♀️

Google Colaboratory, or ‘Colab’ for short, allows you to write and execute Python in your browser. The notebook runs on a server at Google rather than on your own local computer. Depending on your requirements, you can choose from various options, such as running on a machine with a GPU or with only a CPU.

HOW 🔧

Cost

First of all, and importantly: as of writing, you do not need to spend any money to run a Colab notebook with a GPU, and …

Setting up your notebook

Go to https://colab.research.google.com/ and log in with your Google account. You can import your notebook from Google Drive, a local file, or GitHub.

Once the notebook is open, click Connect in the top right.

[Screenshot: where the Connect option is in Google Colab]

Resources will now be allocated and a virtual machine created to host your notebook. You can then run the cells as you would in Jupyter or any other notebook server. Colab assigns the resources it thinks your notebook will require, but you can tweak them to better fit your needs.

For example, by default you may not have been assigned a runtime with a GPU. You can check whether a GPU is available by running the following:

import tensorflow as tf

# Count the GPUs TensorFlow can see; 0 means you are on a CPU-only runtime
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

You can change the runtime type from the resources menu. Colab recommends avoiding a GPU unless you need one, mainly because the computational resources are offered for free and their availability fluctuates with demand. There are, of course, paid options that get around these restrictions. For full details on resource limits, check out the Colab documentation.
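If you want the same notebook to run both in Colab and on a machine where TensorFlow isn't installed, one option is to wrap the GPU check above so it reports 0 instead of failing on the import. This is a minimal sketch using only the standard library plus, optionally, TensorFlow; the function name `gpu_count` is my own and not part of any library.

```python
import importlib.util


def gpu_count() -> int:
    """Return how many GPUs TensorFlow can see, or 0 if TensorFlow isn't installed."""
    if importlib.util.find_spec("tensorflow") is None:
        # No TensorFlow in this environment, so report no usable GPUs
        return 0
    import tensorflow as tf

    return len(tf.config.list_physical_devices("GPU"))


n_gpus = gpu_count()
if n_gpus == 0:
    print("No GPU found: consider changing the runtime type if you need one")
else:
    print(f"Running with {n_gpus} GPU(s)")
```

Checking with `find_spec` first means the import is only attempted when TensorFlow is actually present.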

Accessing files from within the virtual machine

Depending on your use case you may want to have files that you can access in your notebook. There are a few ways to achieve this.

files.upload()

from google.colab import files

uploaded = files.upload()

This is a pretty easy way to import files from your local machine and save them within the virtual machine the notebook is running on.
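Besides saving the files, `files.upload()` returns a dict mapping each uploaded filename to its contents as bytes, so you can parse a file straight from that dict. Below is a small sketch using only the standard library `csv` module; the helper name `read_uploaded_csv` and the file `train.csv` are hypothetical, and a hand-made dict stands in for the real upload so the code runs outside Colab.

```python
from __future__ import annotations

import csv
import io


def read_uploaded_csv(uploaded: dict[str, bytes], filename: str) -> list[dict[str, str]]:
    """Parse one CSV file from a dict shaped like the result of files.upload()."""
    # Each value holds the raw bytes of the uploaded file
    text = uploaded[filename].decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# In Colab you would pass the real result instead: uploaded = files.upload()
fake_upload = {"train.csv": b"id,label\n1,cat\n2,dog\n"}
rows = read_uploaded_csv(fake_upload, "train.csv")
print(rows[0]["label"])  # cat
```

If you already use pandas, `pd.read_csv(io.BytesIO(uploaded["train.csv"]))` does the same job and gives you a DataFrame.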

wget

If your files are available somewhere else online, you can use the wget command inside a cell to fetch them. You need a ! in front of a shell command to run it from a cell. You can then unzip the file into a directory on the virtual machine.

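import os

# Example folder paths (adjust for your project); /content is Colab's default working directory
data_path = '/content/data'

raw_path = '/content/raw_data'

os.makedirs(data_path, exist_ok=True)

os.makedirs(raw_path, exist_ok=True)

!wget link/to/data/train.zip

!unzip /content/train.zip -d /content/raw_data

# clean up the downloaded zip

!rm /content/train.zip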

Note that both of these methods require you to rerun these cells each time the machine is started, because files on the virtual machine don't persist between sessions, so they can be more time-consuming than the other solutions.
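If you'd rather stay in Python than shell out, the standard library's `urllib.request` and `zipfile` can do the same download, unzip and clean-up steps. This sketch also skips everything when the target directory already has files, so rerunning the cell within one session doesn't download again (it can't help after a restart, since the disk starts fresh). The function name `fetch_and_extract` is hypothetical, and the URL in the usage line is a placeholder like the one above.

```python
import os
import urllib.request
import zipfile


def fetch_and_extract(url: str, dest_dir: str, zip_path: str) -> bool:
    """Download a zip archive, extract it into dest_dir, then delete the archive.

    Returns False without doing anything if dest_dir already has contents.
    """
    if os.path.isdir(dest_dir) and os.listdir(dest_dir):
        return False
    os.makedirs(dest_dir, exist_ok=True)
    # Equivalent of !wget: save the remote file to zip_path
    urllib.request.urlretrieve(url, zip_path)
    # Equivalent of !unzip ... -d dest_dir
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    # Equivalent of the clean-up !rm
    os.remove(zip_path)
    return True


# Hypothetical usage in Colab (placeholder URL):
# fetch_and_extract("https://example.com/train.zip", "/content/raw_data", "/content/train.zip")
```

A simple non-empty check keeps the sketch short, but note it would also skip the download if an earlier extraction had only partly finished.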

Mounting your Google Drive into Colab

It’s possible to connect your Google Drive and store your files there. On the left-hand side you will see a folder icon that opens a file browser, with a UI for uploading files as well as an option to mount your Google Drive.
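You can also mount Drive from code with `drive.mount` from `google.colab`. Because `google.colab` is normally only available inside a Colab runtime, this sketch wraps the call so the same notebook still runs locally; the helper name `mount_drive_if_colab` is my own. `/content/drive` is the mount point commonly used in Colab's examples.

```python
def mount_drive_if_colab(mount_point: str = "/content/drive") -> bool:
    """Mount Google Drive when running in Colab; return False elsewhere."""
    try:
        # google.colab is normally only importable inside a Colab runtime
        from google.colab import drive
    except ImportError:
        return False
    # Asks you to authorise access to your Drive the first time it runs
    drive.mount(mount_point)
    return True


if mount_drive_if_colab():
    print("Drive mounted at /content/drive")
else:
    print("Not running in Colab, so skipping the Drive mount")
```

Once mounted, your Drive files typically appear under `/content/drive/MyDrive`, and unlike the upload and wget approaches they persist between sessions.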

Using AWS S3

If you want to move away from a Google-only approach, you can store your data in AWS S3. Loading the data takes some time on each run, but with pandas you can import it in one line (not counting imports).

import pandas as pd

import s3fs  # pandas uses s3fs for s3:// paths; install it first if needed with !pip install s3fs

df = pd.read_csv('s3://bucket-name/file.csv')
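For a private bucket you need credentials. Since version 1.2, `pd.read_csv` accepts a `storage_options` dict that is passed through to s3fs, which understands `key`, `secret` and `anon`. A minimal sketch, assuming you have set the standard AWS environment variables in a cell you don't share (so your keys stay out of the shared notebook); the helper name `s3_storage_options` is hypothetical.

```python
import os


def s3_storage_options() -> dict:
    """Build storage_options for pandas/s3fs from the standard AWS env vars.

    Falls back to anonymous access, which only works for public buckets.
    """
    key = os.environ.get("AWS_ACCESS_KEY_ID")
    secret = os.environ.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return {"key": key, "secret": secret}
    return {"anon": True}


# Hypothetical usage in Colab (placeholder bucket and file):
# df = pd.read_csv("s3://bucket-name/file.csv", storage_options=s3_storage_options())
```

The anonymous fallback makes the same cell work for public datasets without any credentials configured.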

Loading & saving from GitHub

Colab integrates easily with GitHub: you can load notebooks from GitHub and, by connecting your account, save copies to a repository. To do this, go to File → Save a copy in GitHub and follow the prompts (File → Save a copy in Drive saves a copy to Google Drive instead). Colab can also add an “Open in Colab” badge to your notebook so it can be opened in Colab directly from GitHub!
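The badge is just a link built from a URL pattern: `https://colab.research.google.com/github/` followed by the same user/repo/blob/branch/path you see on GitHub. If you want to add one to a README yourself, here is a small sketch that builds the Markdown; the function names, user, repository, notebook path and default branch `main` are all placeholders.

```python
def colab_url(user: str, repo: str, path: str, branch: str = "main") -> str:
    """URL that opens a notebook stored on GitHub directly in Colab."""
    return f"https://colab.research.google.com/github/{user}/{repo}/blob/{branch}/{path}"


def colab_badge_markdown(user: str, repo: str, path: str, branch: str = "main") -> str:
    """Markdown for an 'Open In Colab' badge linking to the notebook."""
    badge = "https://colab.research.google.com/assets/colab-badge.svg"
    return f"[![Open In Colab]({badge})]({colab_url(user, repo, path, branch)})"


# Hypothetical repository and notebook path, for illustration only
print(colab_badge_markdown("your-user", "your-repo", "notebooks/train.ipynb"))
```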

You can read more on this here.

Conclusion

There are certainly limits to using Colab, but for learning and side projects it is a powerful way to tap into resources you may not otherwise have access to. As the name suggests, it is also a great tool for collaboration and sharing work. If you are reading this, I hope you found the article useful, and I wish you a great time working with notebooks and Colab 🎉

Still searching with Google? Why not switch to www.ecosia.org? Your searches not only help plant trees but also help build communities and ecosystems. Disclaimer: I work at Ecosia ;)


Jessica Greene

Backend/Data Engineer @Ecosia 🌳 interested in IoT, ML, GO, Python, data visualisation Co-organiser @PyLadiesBer ❤ she/her Previously Roaster @THEBARNBERLIN ☕