Data Augmentation: A Class Imbalance Mitigative Measure

In this article, we took a look at data augmentation as an upsampling technique for handing class imbalance by looking at 5 sample methods. Thereafter, we augment a dataset and train it on a convnet using said dataset show how it improved accuracy and recall scores.

2 years ago   •   12 min read

By Oreolorun Olu-Ipinlaye
Table of contents

Bring this project to life

In a previous article, we discussed the effects of class imbalance on a convnet's performance, and the achievement of specific model objectives. We also discussed a couple of methods which could help handle class imbalance, and at this point upsampling was mentioned. In this article we will be taking a look at upsampling in greater detail to see how it applies as regards image data.

#  article dependencies
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import torchvision.datasets as Datasets
from import Dataset, DataLoader
import numpy as np
import matplotlib.pyplot as plt
import cv2
from tqdm.notebook import tqdm
from tqdm import tqdm as tqdm_regular
import seaborn as sns
from torchvision.utils import make_grid
import random
#  setting up device
if torch.cuda.is_available():
  device = torch.device('cuda:0')
  print('Running on the GPU')
  device = torch.device('cpu')
  print('Running on the CPU')


Upsampling in the context of an imbalanced dataset refers to the process of bringing up the number of images in the minority class to match the number of images in the majority class. As I had mentioned previously, this can either be done by collecting more data for the minority class or by creating new data instances from the preexisting data to supplement the difference. The process of creating new data instances form preexisting data is termed data augmentation.

Image Data Augmentation

As regards images, how exactly can we generate new images from those already available? We don't necessarily need to utilize a generative model (although this is a very viable option). A much simpler technique is to create copies of the original images and transform them subtly enough for them to be perceived as new images.

Bear in mind that we can think of images as just a bunch of pixels - pixels which are numbers representing intensity. If we find ways to transform or manipulate these numbers, we can end up with a new set of numbers which retain the much of the overall attributes of the original image while at the same time being distinct enough to be perceived as a different image. If this is archived, a convolutional neural network will treat the augmented image as an entirely new image instance thereby helping to supplement the dataset.

Image Augmentation Techniques

In this section, we will be taking a look at some common image augmentation techniques. It should be noted however that this is by no means an exhaustive list.

Random Cropping

Random cropping is an augmentation technique where a random segment of an image is cropped thereby bringing it into focus. This cropped version of the original image will be missing some pixels essentially rendering it a distinct image of its own. Apart from being an augmentation technique, random cropping can help add some redundancy in models as models trained with random crop augmented images may have the capability of identifying images even when the object of interest is not in full view.

def random_crop(dataset: list, crop_size=(20, 20)):
  This function replicates the random crop process
  cropped = []
  images = [x[0] for x in dataset]
  for image in tqdm_regular(images):
    # deriving image size
    img_size = image.shape

    #  extracting channels
    channel_0, channel_1, channel_2 = image[:,:,0], image[:,:,1], image[:,:,2]

    #  deriving random indicies
    idx_row = random.randint(0, img_size[0] - crop_size[0])
    idx_column = random.randint(0, img_size[0] - crop_size[0])

    #  cropping image per channel
    channel_0 = channel_0[idx_row:idx_row + crop_size[0], 
                          idx_column:idx_column + crop_size[1]]
    channel_1 = channel_1[idx_row:idx_row + crop_size[0], 
                          idx_column:idx_column + crop_size[1]]
    channel_2 = channel_2[idx_row:idx_row + crop_size[0], 
                          idx_column:idx_column + crop_size[1]]

    #  stacking images
    image = np.dstack((channel_0, channel_1, channel_2))

    #  resizing image
    image = cv2.resize(image, (32, 32))
    #  labelling and appending to list
    cropped.append((image, 1))
  return cropped 

Image Noising

An augmentation technique where random pixels in an image are purposefully 'corrupted' so as to create an illusion of a completely different image. This corruption is done by randomly casting some pixels to white or black. Images augmented via noising have certain pixels of completely different intensities to their original versions and are thereby perceived distinct.

def noise_image(dataset: list, noise_intensity=0.2):
  This function replicates the image noising process
  noised = []
  noise_threshold = 1 - noise_intensity
  images = [x[0] for x in dataset]

  for image in tqdm_regular(images):
    #  extracting channels
    channel_0, channel_1, channel_2 = image[:,:,0], image[:,:,1], image[:,:,2]

    #  flatenning channels
    channel_0 = channel_0.reshape(1024)
    channel_1 = channel_1.reshape(1024)
    channel_2 = channel_2.reshape(1024)

    #  creating vector of zeros
    noise_0 = np.zeros(1024, dtype='uint8')
    noise_1 = np.zeros(1024, dtype='uint8')
    noise_2 = np.zeros(1024, dtype='uint8')

    #  noise probability
    for idx in range(1024):
      regulator = round(random.random(), 1)
      if regulator > noise_threshold:
        noise_0[idx] = 255
        noise_1[idx] = 255
        noise_2[idx] = 255
      elif regulator == noise_threshold:
        noise_0[idx] = 0
        noise_1[idx] = 0
        noise_2[idx] = 0
        noise_0[idx] = channel_0[idx]
        noise_1[idx] = channel_1[idx]
        noise_2[idx] = channel_2[idx]
    #  reshaping noise vectors
    noise_0 = noise_0.reshape((32, 32))
    noise_1 = noise_1.reshape((32, 32))
    noise_2 = noise_2.reshape((32, 32))

    #  stacking images
    image = np.dstack((noise_0, noise_1, noise_2))
    #  labelling and appending to list
    noised.append((image, 1))
  return noised

Image Flipping

Image flipping, a mainstay in image processing, is an augmentation technique where the arrangement of rows or columns of pixels are reversed creating a mirror view effect. When images are flipped, the arrangement of their pixels change effectively allowing them to be perceived as different to the original.

def flip_image(dataset: list):
  This function replicates the process of horizontal flipping
  flipped = []
  images = [x[0] for x in dataset]

  for image in tqdm_regular(images):
    #  extracting channels
    channel_0, channel_1, channel_2 = image[:,:,0], image[:,:,1], image[:,:,2]

    channel_0 = channel_0[:, ::-1]
    channel_1 = channel_1[:, ::-1]
    channel_2 = channel_2[:, ::-1]

    #  stacking images
    image = np.dstack((channel_0, channel_1, channel_2))
    #  labelling and appending to list
    flipped.append((image, 1))
  return flipped

Image Blurring

Another image processing regular, blurring serves as an augmentation technique where pixel intensities are changed across board so as to create a dulling effect in the blurred version. Since pixel values are changed, the blurred versions are treated as entirely new images on a pixel level.

def blur_image(dataset, kernel_size=5, padding=True):
  """This function performs convolution over an image
   with the aim of blurring"""

  #  defining internal function for padding
  def pad_image(image, padding=2):
    This function performs zero padding using the number of 
    padding layers supplied as argument and return the padded
    #  extracting channels
    channel_0, channel_1, channel_2 = image[:,:,0], image[:,:,1], image[:,:,2]

    #  creating an array of zeros
    padded_0 = np.zeros((image.shape[0] + padding*2, 
                         image.shape[1] + padding*2), dtype='uint8')
    padded_1 = np.zeros((image.shape[0] + padding*2, 
                         image.shape[1] + padding*2), dtype='uint8')
    padded_2 = np.zeros((image.shape[0] + padding*2, 
                         image.shape[1] + padding*2), dtype='uint8')
    #  inserting image into zero array
             int(padding):-int(padding)] = channel_0
             int(padding):-int(padding)] = channel_1
             int(padding):-int(padding)] = channel_2

    #  stacking images
    padded = np.dstack((padded_0, padded_1, padded_2))

    return padded

  #  defining list to hold blurred images
  all_blurred = []

  #  defining gaussian 5x5 filter
  gauss_5 = np.array([[1, 4, 7, 4, 1],
                     [4, 16, 26, 16, 4],
                     [7, 26, 41, 26, 7],
                     [4, 16, 26, 16, 4],
                     [1, 4, 7, 4, 1]])

  filter = 1/273 * gauss_5
  #  extracting images
  images = [x[0] for x in dataset]

  for image in tqdm_regular(images):
    if padding:
      image = pad_image(image)
      image = image

    #  extracting channels
    channel_0, channel_1, channel_2 = image[:,:,0], image[:,:,1], image[:,:,2]

    #  creating an array to store convolutions
    blurred_0 = np.zeros(((image.shape[0] - kernel_size) + 1, 
                          (image.shape[1] - kernel_size) + 1), dtype='uint8')
    blurred_1 = np.zeros(((image.shape[0] - kernel_size) + 1, 
                          (image.shape[1] - kernel_size) + 1), dtype='uint8')
    blurred_2 = np.zeros(((image.shape[0] - kernel_size) + 1, 
                          (image.shape[1] - kernel_size) + 1), dtype='uint8')
    #  performing convolution
    for i in range(image.shape[0]):
      for j in range(image.shape[1]):
          blurred_0[i,j] = (channel_0[i:(i+kernel_size), j:(j+kernel_size)] * filter).sum()
        except Exception:

    for i in range(image.shape[0]):
      for j in range(image.shape[1]):
          blurred_1[i,j] = (channel_1[i:(i+kernel_size), j:(j+kernel_size)] * filter).sum()
        except Exception:

    for i in range(image.shape[0]):
      for j in range(image.shape[1]):
          blurred_2[i,j] = (channel_2[i:(i+kernel_size), j:(j+kernel_size)] * filter).sum()
        except Exception:

    #  stacking images
    blurred = np.dstack((blurred_0, blurred_1, blurred_2))
    #  labelling and appending to list
    all_blurred.append((blurred, 1))

  return all_blurred

Putting It All Together

Bring this project to life

In this section, we will utilize the above defined augmentation technique in upsampling the dataset from the previous article where we had a 4:1 class imbalance (80% cats, 20% dogs). For this purpose we will be using the CIFAR-10 dataset which can be loaded into PyTorch using the code cell below.

#  loading training data
training_set = Datasets.CIFAR10(root='./', download=True,

#  loading validation data
validation_set = Datasets.CIFAR10(root='./', download=True, train=False,

We will now extract cat and dogs images from the dataset using a function defined as follows.

def extract_images(dataset):
  This function helps to extract cat and dog images
  from the cifar-10 dataset
  cats = []
  dogs = []

  for idx in tqdm_regular(range(len(dataset))):
    if dataset.targets[idx]==3:
      cats.append(([idx], 0))
    elif dataset.targets[idx]==5:
      dogs.append(([idx], 1))
  return cats, dogs
#  extracting from the training set
train_cats, train_dogs = extract_images(training_set)
#  extracting from the validation set
val_cats, val_dogs = extract_images(validation_set)

Upsampling Training Images via Augmentation

In the article on class imbalance, we had set up a 4:1 imbalance in favor of cats by using the first 4,800 cat images and just the first 1,200 dog images i.e data = train_cats[:4800] + train_dogs[:1200]. To allow for synergy, we will keep with the same theme which means we need up augment dog images with 3,600 images.

In order to keep things simple, we will utilize three of the above mentioned augmentation methods, producing 1,200 augmented version of the original images with each method.

#  deriving images of interest
dog_images = train_dogs[:1200]

#  creating random cropped copies
dog_cropped = random_crop(dog_images)

#  creating flipped copies
dog_flipped = flip_image(dog_images)

#  creating noised copies
dog_noised = noise_image(dog_images)

Piecing Together a Dataset

Now that the transformed copies are in place, all we need to do now is to finish putting together our dataset for both the training and validation set.

#  creating a dataset of 4,800 dog images
train_dogs = dog_images + dog_cropped + dog_flipped + dog_noised

#  instantiating training data
training_images = train_cats[:4800] + train_dogs

#  instantiating validation data
validation_images = val_cats + val_dogs

Next we need to define a class so as to be able to create a PyTorch dataset from our custom dataset.

#  defining dataset class
class CustomCatsvsDogs(Dataset):
  def __init__(self, data, transforms=None): = data
    self.transforms = transforms

  def __len__(self):
    return len(

  def __getitem__(self, idx):
    image =[idx][0]
    label = torch.tensor([idx][1])

    if self.transforms!=None:
      image = self.transforms(image)
    return(image, label)
 #  creating pytorch datasets
training_data = CustomCatsvsDogs(training_images, transforms=transforms.Compose([transforms.ToTensor(),
                                                                                 transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]))
validation_data = CustomCatsvsDogs(validation_images, transforms=transforms.Compose([transforms.ToTensor(),
                                                                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]))

Convnet Classes

In a bid to train a convnet, we need to define a class which will enable us to neatly package training, validation, metric calculation and logging as well as model utilization all into a single object as seen below.

class ConvolutionalNeuralNet_2():
  def __init__(self, network): =
    self.optimizer = torch.optim.Adam(, lr=1e-3)

  def train(self, loss_function, epochs, batch_size, 
            training_set, validation_set):
    #  creating log
    log_dict = {
        'training_loss_per_batch': [],
        'validation_loss_per_batch': [],
        'training_accuracy_per_epoch': [],
        'training_recall_per_epoch': [],
        'training_precision_per_epoch': [],
        'validation_accuracy_per_epoch': [],
        'validation_recall_per_epoch': [],
        'validation_precision_per_epoch': []

    #  defining weight initialization function
    def init_weights(module):
      if isinstance(module, nn.Conv2d):
      elif isinstance(module, nn.Linear):

    #  defining accuracy function
    def accuracy(network, dataloader):
      all_predictions = []
      all_labels = []

      #  computing accuracy
      total_correct = 0
      total_instances = 0
      for images, labels in tqdm(dataloader):
        images, labels =,
        predictions = torch.argmax(network(images), dim=1)
        correct_predictions = sum(predictions==labels).item()
      accuracy = round(total_correct/total_instances, 3)

      #  computing recall and precision
      true_positives = 0
      false_negatives = 0
      false_positives = 0
      for idx in range(len(all_predictions)):
        if all_predictions[idx].item()==1 and  all_labels[idx].item()==1:
        elif all_predictions[idx].item()==0 and all_labels[idx].item()==1:
        elif all_predictions[idx].item()==1 and all_labels[idx].item()==0:
        recall = round(true_positives/(true_positives + false_negatives), 3)
      except ZeroDivisionError:
        recall = 0.0
        precision = round(true_positives/(true_positives + false_positives), 3)
      except ZeroDivisionError:
        precision = 0.0
      return accuracy, recall, precision

    #  initializing network weights

    #  creating dataloaders
    train_loader = DataLoader(training_set, batch_size)
    val_loader = DataLoader(validation_set, batch_size)

    #  setting convnet to training mode

    for epoch in range(epochs):
      print(f'Epoch {epoch+1}/{epochs}')
      train_losses = []

      #  training
      for images, labels in tqdm(train_loader):
        #  sending data to device
        images, labels =,
        #  resetting gradients
        #  making predictions
        predictions =
        #  computing loss
        loss = loss_function(predictions, labels)
        #  computing gradients
        #  updating weights
      with torch.no_grad():
        print('deriving training accuracy...')
        #  computing training accuracy
        train_accuracy, train_recall, train_precision = accuracy(, train_loader)

      #  validation
      val_losses = []

      #  setting convnet to evaluation mode

      with torch.no_grad():
        for images, labels in tqdm(val_loader):
          #  sending data to device
          images, labels =,
          #  making predictions
          predictions =
          #  computing loss
          val_loss = loss_function(predictions, labels)
        #  computing accuracy
        print('deriving validation accuracy...')
        val_accuracy, val_recall, val_precision = accuracy(, val_loader)

      train_losses = np.array(train_losses).mean()
      val_losses = np.array(val_losses).mean()

      print(f'training_loss: {round(train_losses, 4)}  training_accuracy: '+
      f'{train_accuracy}  training_recall: {train_recall}  training_precision: {train_precision} *~* validation_loss: {round(val_losses, 4)} '+  
      f'validation_accuracy: {val_accuracy}  validation_recall: {val_recall}  validation_precision: {val_precision}\n')
    return log_dict

  def predict(self, x):

Next, we need to define a convolutional neural network for this binary classification task. For the sake of this article we will be using a custom built convnet as defined in the code block below.

class ConvNet(nn.Module):
  def __init__(self):
    self.conv1 = nn.Conv2d(3, 8, 3, padding=1)
    self.batchnorm1 = nn.BatchNorm2d(8)
    self.conv2 = nn.Conv2d(8, 8, 3, padding=1)
    self.batchnorm2 = nn.BatchNorm2d(8)
    self.pool2 = nn.MaxPool2d(2)
    self.conv3 = nn.Conv2d(8, 32, 3, padding=1)
    self.batchnorm3 = nn.BatchNorm2d(32)
    self.conv4 = nn.Conv2d(32, 32, 3, padding=1)
    self.batchnorm4 = nn.BatchNorm2d(32)
    self.pool4 = nn.MaxPool2d(2)
    self.conv5 = nn.Conv2d(32, 128, 3, padding=1)
    self.batchnorm5 = nn.BatchNorm2d(128)
    self.conv6 = nn.Conv2d(128, 128, 3, padding=1)
    self.batchnorm6 = nn.BatchNorm2d(128)
    self.pool6 = nn.MaxPool2d(2)
    self.conv7 = nn.Conv2d(128, 2, 1)
    self.pool7 = nn.AvgPool2d(3)

  def forward(self, x):
    # INPUT
    x = x.view(-1, 3, 32, 32)
    # LAYER 1
    output_1 = self.conv1(x)
    output_1 = F.relu(output_1)
    output_1 = self.batchnorm1(output_1)

    # LAYER 2
    output_2 = self.conv2(output_1)
    output_2 = F.relu(output_2)
    output_2 = self.pool2(output_2)
    output_2 = self.batchnorm2(output_2)

    # LAYER 3
    output_3 = self.conv3(output_2)
    output_3 = F.relu(output_3)
    output_3 = self.batchnorm3(output_3)

    # LAYER 4
    output_4 = self.conv4(output_3)
    output_4 = F.relu(output_4)
    output_4 = self.pool4(output_4)
    output_4 = self.batchnorm4(output_4)

    # LAYER 5
    output_5 = self.conv5(output_4)
    output_5 = F.relu(output_5)
    output_5 = self.batchnorm5(output_5)

    # LAYER 6
    output_6 = self.conv6(output_5)
    output_6 = F.relu(output_6)
    output_6 = self.pool6(output_6)
    output_6 = self.batchnorm6(output_6)

    output_7 = self.conv7(output_6)
    output_7 = self.pool7(output_7)
    output_7 = output_7.view(-1, 2)

    return F.softmax(output_7, dim=1)

Training a Convolutional Neural Network

By utilizing the convnet we defined in the previous section and instantiating it as a member of the convolutional neural network class, also defined in the previous section, we can now proceed to train our convnet for 10 epochs using parameters as defined as follows.

#  training model
model = ConvolutionalNeuralNet_2(ConvNet())

log_dict = model.train(nn.CrossEntropyLoss(), epochs=10, batch_size=64, 
                       training_set=training_data, validation_set=validation_data)

Analyzing Results

A bit of a refresher, in the class imbalance article when we trained a model on the imbalanced dataset, we wound up with a model with 80% training accuracy and 50% validation accuracy with a validation recall of 0%. This indicated that the model was indiscriminate, and was simply predicting all image instances as cats.

However, training the model on augmented data as we have done yielded results as seen in the image below. Overall, both training and validation accuracy increased through the course of training, albeit with validation accuracy plateauing from the 5th epoch.

Of key interest however are the validation metrics, with a validation accuracy of approximately 73% from the 3rd epoch, validation recall was not 0%, in fact it climbed to as high as 78% by the 9th epochs indicating that the model is now in fact being discriminative even though we had used augmented images for training purposes. Performance can further be tweaked by trying other augmentation methods or adjusting class weights.

Finding the Best Technique

As you might have noticed, I chose not to select blurring as an augmentation method for this dataset. That's because I had actually tried it and it did not yield desirable results. In fact certain datasets have schemes of augmentation techniques which work best for them, it is therefore imperative to find the best techniques for whichever dataset one is dealing with.

Final Remarks

In this article we took a look at data augmentation as an upsampling technique for handing class imbalance. We went further by discussing in detail a few image augmentation techniques and how they can be implemented in Python. Thereafter, we augmented a dataset and trained a convnet using said dataset with results showing that it yielded reasonable validation accuracy and recall scores.

Add speed and simplicity to your Machine Learning workflow today

Get startedContact Sales

Spread the word

Keep reading