In this article, we will be building Convolutional Neural Networks (CNNs) from scratch in PyTorch, and seeing them in action as we train and test them on a real-world dataset.
We will start by exploring what CNNs are and how they work. We will then introduce PyTorch and load the CIFAR-10 dataset using torchvision (a library containing various datasets and helper functions related to computer vision). We will then build and train our CNN from scratch. Finally, we will test our model.
Below is the outline of the article:
- Convolutional Neural Networks
- Data Loading
- CNN from Scratch
- Setting Hyperparameters
Convolutional Neural Networks
A convolutional neural network (CNN) takes an input image and classifies it into one of the output classes. Each image passes through a series of layers: primarily convolutional layers, pooling layers, and fully connected layers. The picture below summarizes what an image passes through in a CNN:
The convolutional layer is used to extract features from the input image. It applies a mathematical operation between the input image and a kernel (filter). The filter is slid across the image, and at each position the output value is the sum of the element-wise products between the filter and the image patch it covers.
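As a rough illustration, the sliding-window calculation can be written in a few lines of NumPy. This is a minimal sketch, assuming a single-channel image, no padding, and a stride of 1; the function name `conv2d_single_channel` and the example kernel values are hypothetical and chosen only for demonstration.

```python
import numpy as np

def conv2d_single_channel(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    Note: like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it computes a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            # Multiply the patch by the kernel element-wise and sum to one value.
            output[i, j] = np.sum(patch * kernel)
    return output

# A 4x4 "image" whose values increase from left to right and top to bottom.
image = np.arange(16, dtype=float).reshape(4, 4)

# Illustrative vertical-edge filter (assumed values, not from the article).
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])

feature_map = conv2d_single_channel(image, kernel)
print(feature_map)  # every entry is -6.0, since each row rises by 2 across the window
```

A 3x3 filter on a 4x4 input gives a 2x2 output, which is why convolutional layers without padding shrink the image slightly.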
Different filters are used to extract different kinds of features. Some common features are given below:
Pooling layers are used to reduce the spatial size of an image (or feature map) while keeping its most important features. The most common types are max pooling and average pooling, which take the maximum and the average value, respectively, from each window of a given filter size (e.g., 2x2, 3x3, and so on).
Max pooling, for example, would work as follows:
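The following is a small sketch of max pooling in NumPy, assuming a 2x2 window and a stride of 2. The helper name `max_pool2d` and the input values are made up for illustration.

```python
import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    """Keep the largest value in each `size` x `size` window."""
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            window = feature_map[r:r + size, c:c + size]
            pooled[i, j] = window.max()
    return pooled

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 0],
                        [1, 8, 3, 4]], dtype=float)

pooled = max_pool2d(feature_map)
print(pooled)
# [[6. 4.]
#  [8. 9.]]
```

The 4x4 input is reduced to 2x2, with each output cell holding the strongest response from its window.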
PyTorch is one of the most popular and widely used deep learning libraries, especially within academic research. It's an open-source machine learning framework that accelerates the path from research prototyping to production deployment, and we'll be using it in this article to create our first CNN.
Let's start by loading some data. We will be using the CIFAR-10 dataset. The dataset has 60,000 color (RGB) images of 32px x 32px belonging to 10 different classes (6,000 images per class). The dataset is divided into 50,000 training and 10,000 testing images.
You can see a sample of the dataset along with their classes below:
Importing the Libraries
Let's start by importing the required libraries and defining some variables:
device will determine whether to run the training on GPU or CPU.
To load the dataset, we will be using the built-in datasets in torchvision. It provides us with the ability to download the dataset and also apply any transformations we want.
Let's look at the code first:
Let's dissect this piece of code:
- We start by writing some transformations. We resize the images, convert them to tensors, and normalize them using the mean and standard deviation of each channel in the input images. You can calculate these yourself, but commonly used values are also available online.
- Then, we load the dataset: both training and testing. We set download equal to True so that it is downloaded if not already downloaded.
- Loading the whole dataset into RAM at once is not good practice and can slow your computer to a halt. That's why we use data loaders, which allow you to iterate through the dataset by loading the data in batches.
- We then create two data loaders (for train/test), set the batch size, and set shuffle equal to True, so that each batch contains a random mix of images from the different classes.
CNN from Scratch
Before diving into the code, let's explain how you define a neural network in PyTorch.
- You start by creating a new class that extends the nn.Module class from PyTorch. This is needed when we are creating a neural network, as it provides us with a bunch of useful methods.
- We then have to define the layers in our neural network. This is done in the __init__ method of the class. We simply name our layers, and then assign them to the appropriate layer that we want; e.g., convolutional layer, pooling layer, fully connected layer, etc.
- The final thing to do is define a forward method in our class. The purpose of this method is to define the order in which the input data passes through the various layers.
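Those three steps can be seen in a deliberately tiny example. The class name `TinyNet` and its layer sizes are hypothetical; this is only a skeleton of the pattern, not the CNN built in this article.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):                      # step 1: extend nn.Module
    def __init__(self, in_features, hidden, num_classes):
        super().__init__()
        # Step 2: name the layers and assign them.
        self.fc1 = nn.Linear(in_features, hidden)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden, num_classes)

    def forward(self, x):
        # Step 3: define the order in which data flows through the layers.
        x = self.fc1(x)
        x = self.relu(x)
        return self.fc2(x)

net = TinyNet(in_features=8, hidden=16, num_classes=3)
out = net(torch.randn(4, 8))   # calling the module runs forward()
print(out.shape)               # torch.Size([4, 3])
```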
Now, let's dive into the code:
As I explained above, we start by creating a class that inherits from the nn.Module class, and then we define the layers and their sequence of execution inside __init__ and forward, respectively.
Some things to notice here:
- nn.Conv2d is used to define the convolutional layers. We define the number of channels they receive and how many they should return, along with the kernel size. We start from 3 channels, as we are using RGB images.
- nn.MaxPool2d is a max-pooling layer that just requires the kernel size and the stride.
- nn.Linear is the fully connected layer, and nn.ReLU is the activation function used.
- In the forward method, we define the sequence, and, before the fully connected layers, we flatten the output so that it matches the input expected by a fully connected layer.
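Putting these pieces together, a CNN for 32x32 RGB images might be sketched like this. The class name `ConvNeuralNet`, the number of layers, and the channel counts are assumptions for illustration. The value 1600 follows from this particular layout: 64 channels of 5x5 feature maps remain after the second pooling layer.

```python
import torch
import torch.nn as nn

class ConvNeuralNet(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # Block 1: 3 input channels (RGB) -> 32 feature maps.
        self.conv_layer1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
        self.conv_layer2 = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3)
        self.max_pool1 = nn.MaxPool2d(kernel_size=2, stride=2)

        # Block 2: 32 -> 64 feature maps.
        self.conv_layer3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
        self.conv_layer4 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3)
        self.max_pool2 = nn.MaxPool2d(kernel_size=2, stride=2)

        # Classifier: 64 channels * 5 * 5 = 1600 flattened features.
        self.fc1 = nn.Linear(1600, 128)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        out = self.conv_layer1(x)      # 32x32 -> 30x30
        out = self.conv_layer2(out)    # 30x30 -> 28x28
        out = self.max_pool1(out)      # 28x28 -> 14x14

        out = self.conv_layer3(out)    # 14x14 -> 12x12
        out = self.conv_layer4(out)    # 12x12 -> 10x10
        out = self.max_pool2(out)      # 10x10 -> 5x5

        # Flatten everything except the batch dimension.
        out = out.reshape(out.size(0), -1)

        out = self.fc1(out)
        out = self.relu1(out)
        return self.fc2(out)

model = ConvNeuralNet(num_classes=10)
logits = model(torch.randn(2, 3, 32, 32))
print(logits.shape)  # torch.Size([2, 10])
```

If you change the input size or the kernel sizes, the first argument of `fc1` has to be recomputed to match the flattened feature count.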
Let's now set some hyperparameters for our training purposes.
We start by initializing our model with the number of classes. We then choose cross-entropy and SGD (Stochastic Gradient Descent) as our loss function and optimizer respectively. There are different choices for these, but I found these to result in maximum accuracy when experimenting. We also define the variable total_step to make iteration through various batches easier.
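Here is a hedged sketch of that setup. To keep the snippet runnable on its own, it uses a stand-in linear model and random tensors in place of the CNN and the CIFAR-10 loader. The learning rate, momentum, and weight decay are assumed values, not ones stated in the text.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

num_classes = 10
learning_rate = 0.001   # assumed value
batch_size = 64         # assumed value

# Stand-ins so the snippet runs by itself; in the article these would be
# the CNN and the CIFAR-10 training loader.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, num_classes))
train_loader = DataLoader(
    TensorDataset(torch.randn(256, 3, 32, 32),
                  torch.randint(0, num_classes, (256,))),
    batch_size=batch_size, shuffle=True)

# Cross-entropy loss for multi-class classification.
criterion = nn.CrossEntropyLoss()

# SGD optimizer; momentum and weight_decay are optional extras (assumed here).
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate,
                            momentum=0.9, weight_decay=0.005)

# Number of batches per epoch.
total_step = len(train_loader)
print(total_step)  # 256 samples / 64 per batch = 4
```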
Now, let's start training our model:
This is probably the trickiest part of the code. Let's see what the code does:
- We start by iterating through the number of epochs, and then the batches in our training data
- We convert the images and the labels according to the device we are using, i.e., GPU or CPU
- In the forward pass we make predictions using our model and calculate loss based on those predictions and our actual labels
- Next, we do the backward pass, where we actually update our weights to improve our model
- We first set the gradients to zero before every update using the optimizer's zero_grad() method
- Then, we calculate the new gradients using the loss's backward() method
- And finally, we update the weights with the optimizer's step() method
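The loop described above can be sketched as a small function. The function name `train`, the stand-in linear model, and the random data are assumptions that keep the example self-contained; with real data you would pass in the CNN and the CIFAR-10 training loader instead.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train(model, loader, criterion, optimizer, device, num_epochs):
    """Run the training loop and return the mean loss of each epoch."""
    model.train()
    epoch_losses = []
    for epoch in range(num_epochs):
        running_loss = 0.0
        for images, labels in loader:
            # Move the batch to the same device as the model.
            images, labels = images.to(device), labels.to(device)

            # Forward pass: predictions and loss.
            outputs = model(images)
            loss = criterion(outputs, labels)

            # Backward pass: clear old gradients, compute new ones, update weights.
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
        epoch_losses.append(running_loss / len(loader))
        print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {epoch_losses[-1]:.4f}")
    return epoch_losses

# Stand-in model and random data (assumptions, not CIFAR-10).
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).to(device)
loader = DataLoader(
    TensorDataset(torch.randn(128, 3, 32, 32), torch.randint(0, 10, (128,))),
    batch_size=32, shuffle=True)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

losses = train(model, loader, criterion, optimizer, device, num_epochs=2)
```

Forgetting `optimizer.zero_grad()` is a common mistake: PyTorch accumulates gradients by default, so stale gradients from earlier batches would be added to the new ones.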
We can see the output as follows:
As we can see, the loss decreases slightly as the epochs progress. This is a good sign. But you may notice that it fluctuates toward the end, which could mean the model is overfitting or that the batch_size is small. We will have to test to find out what's going on.
Let's now test our model. The code for testing is not so different from training, except that we do not calculate gradients, since we are not updating any weights:
We wrap the code inside torch.no_grad() as there is no need to calculate any gradients. We then predict each batch using our model and calculate how many it predicts correctly. We get the final result of ~83% accuracy:
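An evaluation loop along those lines could look like the sketch below. The function name `evaluate` is hypothetical. To make the result predictable, the demo uses a stand-in model that simply echoes one-hot inputs, so it is a sanity check of the counting logic rather than a real experiment.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, device):
    """Return classification accuracy (in percent) over all batches in `loader`."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            # The predicted class is the index of the largest score.
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return 100.0 * correct / total

# Sanity check with stand-ins: the identity "model" returns its one-hot input,
# so the largest score always sits at the true label.
device = torch.device("cpu")
labels = torch.tensor([0, 1, 2, 3])
inputs = torch.eye(4)
loader = DataLoader(TensorDataset(inputs, labels), batch_size=2)

accuracy = evaluate(nn.Identity(), loader, device)
print(f"Accuracy: {accuracy:.1f}%")  # Accuracy: 100.0%
```

With the real model, you would pass in the trained CNN and the CIFAR-10 test loader in place of the stand-ins.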
And that's it. We managed to create a Convolutional Neural Network from scratch in PyTorch!
We started by learning about CNNs – what kind of layers they have and how they work. We then introduced PyTorch, which is one of the most popular deep learning libraries available today. We learned how PyTorch would make it much easier for us to experiment with a CNN.
Next, we loaded the CIFAR-10 dataset (a popular training dataset containing 60,000 images), and made some transformations on it.
Then, we built a CNN from scratch, and defined some hyperparameters for it. Finally, we trained and tested our model on CIFAR-10 and managed to get a decent accuracy on the test set.