Reproducible Deep Learning

PhD Course in Data Science

Timetable: May 5, 7, 12, and 14, from 9:00 to 13:00.
Attendance: via the Zoom link; contact me if you do not receive the passcode.

Visual overview of the course

Overview

Building a deep learning model is a complex task, full of interacting design decisions, data engineering, parameter tweaking, and experimentation. Having access to powerful tools for versioning, storing, and analyzing every step of the process (MLOps) is essential.

The aim of this practical course is to start from a simple deep learning model implemented in a notebook, and port it to a ‘reproducible’ world by adding code versioning (Git), data versioning (DVC), experiment logging (Weights & Biases), hyper-parameter tuning, configuration (Hydra), and ‘Dockerization’. While the focus is on vertical, well-established tools, we will also discuss more advanced integrated frameworks (e.g., MLflow) and techniques (e.g., CI/CD pipelines).

Set up your machine

We will install most libraries as we go along. For the initial setup, perform an Anaconda installation on your machine, and create an environment:

conda create -n reprodl; conda activate reprodl

Then, install a few generic prerequisites (notebook handling, Pandas, …):

conda install -y -c conda-forge notebook matplotlib pandas ipywidgets

Finally, install PyTorch and PyTorch Lightning. The commands below are only an example: the right ones depend on your operating system and on whether you have a CUDA-enabled machine. In general, follow the installation instructions on the official PyTorch website.

conda install -y pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch -c conda-forge
conda install -y pytorch-lightning -c conda-forge
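
To confirm that the environment is usable before starting the exercises, a quick check along the following lines may help. This is only a sketch: the helper name check_module is hypothetical, and it assumes that the python3 on your PATH is the one from the activated reprodl environment.

```shell
# Hypothetical helper: succeeds if the given Python module can be located
check_module() {
    python3 -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1"
}

# Report which of the packages installed above are visible to Python
for module in notebook matplotlib pandas ipywidgets torch torchvision torchaudio pytorch_lightning; do
    if check_module "$module"; then
        echo "$module: found"
    else
        echo "$module: MISSING"
    fi
done

# If PyTorch is present, print its version and whether it can see a CUDA device
if check_module torch; then
    python3 -c 'import torch; print("PyTorch", torch.__version__, "- CUDA available:", torch.cuda.is_available())'
fi
```

Any module reported as MISSING can be installed again with the corresponding conda command above.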

Organization of the material

Slides will be available below, while the code will be uploaded to a GitHub repository. The course is split into exercises (Git, DVC, …). To start an exercise, switch to the corresponding Git branch, and follow the instructions in the video or in the corresponding README file. To see the completed exercise, you can switch to a completed branch, as shown below.

An example

To follow the DVC exercise, look up the name of its branch in the table below (exercise3_dvc), and check it out:

git checkout exercise3_dvc

If you want to see the completed exercise, add _completed to the name of the branch:

git checkout exercise3_dvc_completed

You can inspect the commits to look at specific changes in the code:

git log --graph --abbrev-commit --decorate

If you want to inspect a specific change, you can check out the corresponding commit using its ID.
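
The following sketch shows what checking out a commit by its ID looks like in practice. To keep it safe to try, it works in a throwaway repository rather than in the course repository; the file name notes.txt, the commit messages, and the user identity are placeholders chosen for illustration only.

```shell
# Create a throwaway repository with two commits
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.email "you@example.com"   # placeholder identity, local to this repo
git config user.name "Your Name"
echo "first" > notes.txt
git add notes.txt
git commit -q -m "First commit"
echo "second" >> notes.txt
git commit -q -am "Second commit"

# Remember the current branch and look up the abbreviated ID of the older commit
branch=$(git rev-parse --abbrev-ref HEAD)
first_id=$(git rev-parse --short HEAD~1)
git log --oneline

# Check out the older commit by its ID (Git enters a "detached HEAD" state)
git checkout -q "$first_id"
cat notes.txt    # the file as it was at that commit

# Return to the branch to get the latest version back
git checkout -q "$branch"
cat notes.txt
```

In the course repository the same pattern applies: copy an ID from the git log output, check it out, and check out the exercise branch again when you are done.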

Material

#  Topic                     Branch name       Material
0  Introduction              -                 Slides, Bare repository, Video
1  Deep learning recap       -                 Notebook, Video
2  Git & Scripting           exercise1_git     Slides, Video, Code
3  Hydra configuration       exercise2_hydra   Hydra repository, Video, Code
4  Data versioning with DVC  exercise3_dvc     DVC Website, Slides, Video (part 1), Video (part 2), Code
5  Docker                    exercise4_docker  Docker Website, Slides, Video, Code
6  Weights & Biases          exercise5_wandb   Weights & Biases Website, Video, Code
7  Continuous integration    exercise6_hooks   Video, Code
-  Exam                                        Instructions

Advanced reading material

The new edition of Full Stack Deep Learning (UC Berkeley CS194-080) covers a larger set of material than this course.