CommonLounge Archive

Hands-on Assignment: Convolutional Neural Networks (and Ablation Studies)

April 27, 2018

In this hands-on assignment, we'll use CNNs for image classification. You'll first run the provided code for digit classification and analyze the effects of the number of layers, hidden layer sizes, dropout, and momentum on performance. Then, you'll write code to perform object classification across 10 categories such as airplanes, cars, trucks, birds, cats, and dogs. Let's get started!

Digit Classification

Dataset

The dataset we'll use is the MNIST dataset, a popular machine learning dataset for the hand-written digit recognition task. Here are some sample images from the dataset.

Samples from MNIST hand-written digit dataset (16 samples are shown for each label)

You don’t need to download the dataset manually. PyTorch can do that for us.

How to run the code using Google Colaboratory

Notebook Link

You can also play with this project directly in-browser via Google Colaboratory using the link above. Google Colab is a free tool that lets you run small Machine Learning experiments through your browser. You should read this 1 min tutorial if you’re unfamiliar with Google Colaboratory.

Code

You can use the following code to perform digit classification:

from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
torch.manual_seed(42)
###############################################################################
## load data
transform = transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])
train_data = datasets.MNIST('./data/mnist', train=True, download=True, transform=transform)
test_data = datasets.MNIST('./data/mnist', train=False, transform=transform)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=1000, shuffle=True)
###############################################################################
## CNN model
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.dense1 = nn.Linear(320, 50)
        self.dense2 = nn.Linear(50, 10)
    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.dense1(x))
        x = F.dropout(x, training=self.training)
        x = self.dense2(x)
        return F.log_softmax(x, dim=1)
model = CNN()
## optimizer = stochastic gradient descent
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
###############################################################################
## train and test functions
def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
def test():
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad(): # gradients are not needed during evaluation
        for data, target in test_loader:
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
            pred = output.max(1, keepdim=True)[1] # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
###############################################################################
## run
nepochs = 10
for epoch in range(1, nepochs + 1):
    train(epoch)
    test()
###############################################################################

Most of the code should be easy to follow, but here are a few notes:

  • For loading the data, we use the built-in dataset loaders that torchvision provides. They support the MNIST dataset, as well as the CIFAR-10 dataset we'll use later in the tutorial.
  • The CNN model above has two convolutional layers, and two dense layers.
  • The CNN uses dropout for regularization. Note the importance of model.train() and model.eval() at the beginning of the train() and test() functions: dropout must behave differently during the training and evaluation phases (see the first sketch after this list).
  • Softmax is a type of activation function, which normalizes the output in such a way that all the outputs in the layer sum to 1 (and all outputs are positive). Hence, the output can be used to represent probabilities assigned to different categories.
  • Note that the softmax operation acts on the entire layer, not on each node individually. The function is given by softmax(x) = exp(x) / sum(exp(x)), where x is a vector.
  • For numerical stability, PyTorch's log_softmax is used instead, which is what the nll_loss function (negative log likelihood) expects (see the second sketch after this list).
  • The training and evaluation code is pretty similar to last time (XOR neural network example).
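
To see why model.train() and model.eval() matter, here is a small stand-alone sketch (not part of the assignment code) showing how a dropout layer behaves in the two modes:

import torch
import torch.nn as nn
torch.manual_seed(0)
drop = nn.Dropout(p=0.5)   # same kind of layer the model uses via nn.Dropout2d / F.dropout
x = torch.ones(1, 8)
drop.train()               # training mode: roughly half the activations are zeroed, the rest scaled by 1/(1-p) = 2
print(drop(x))             # e.g. tensor([[2., 0., 2., 2., 0., ...]])
drop.eval()                # evaluation mode: dropout is a no-op and the input passes through unchanged
print(drop(x))             # tensor([[1., 1., 1., 1., 1., 1., 1., 1.]])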

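Similarly, here is a small numeric sketch of the softmax / log_softmax / nll_loss relationship described above (the input values are made up for illustration):

import torch
import torch.nn.functional as F
scores = torch.tensor([[1.0, 2.0, 0.5]])                  # raw outputs for one sample, 3 classes
probs = torch.exp(scores) / torch.exp(scores).sum()       # softmax(x) = exp(x) / sum(exp(x))
print(probs)                                              # positive values that sum to 1
print(torch.log(probs))                                   # same values as F.log_softmax(scores, dim=1), up to floating point error
target = torch.tensor([1])                                # the correct class is index 1
print(F.nll_loss(F.log_softmax(scores, dim=1), target))   # negative log-probability assigned to the correct class
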
Ablation Study

Now we'll perform what is known as an ablation study: we'll measure the impact of each component by removing it one at a time and then putting it back. Take the quiz now; it includes the details of the experiments you need to perform. Each experiment takes about 5 minutes to run. Once done, make a note of the relative importance of each component.
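
If it helps to organize these experiments, one possible way (a sketch, not a required interface; the exact experiments are listed in the quiz) is to make the components of the model configurable so each can be switched off or resized:

import torch.nn as nn
import torch.nn.functional as F
class ConfigurableCNN(nn.Module):
    # same architecture as the CNN above, but with switches for the ablation study
    def __init__(self, hidden_size=50, use_dropout=True):
        super(ConfigurableCNN, self).__init__()
        self.use_dropout = use_dropout
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d() if use_dropout else nn.Identity()
        self.dense1 = nn.Linear(320, hidden_size)
        self.dense2 = nn.Linear(hidden_size, 10)
    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.dense1(x))
        if self.use_dropout:
            x = F.dropout(x, training=self.training)
        x = self.dense2(x)
        return F.log_softmax(x, dim=1)
# e.g. model = ConfigurableCNN(use_dropout=False) for the "remove dropout" experiment;
# momentum can be varied directly in the optimizer: optim.SGD(model.parameters(), lr=0.01, momentum=0.0)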

Object Classification

Dataset

The CIFAR-10 dataset consists of 32x32 color images across 10 classes (airplanes, automobiles, birds, cats, and so on).

You can use the code below to load the dataset.

########################################################################
# The output of torchvision datasets are PILImage images of range [0, 1].
# We transform them to Tensors of normalized range [-1, 1].
import torch
import torchvision

transform = torchvision.transforms.Compose([
                       torchvision.transforms.ToTensor(),
                       torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
                    ])
train_data = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
test_data = torchvision.datasets.CIFAR10(root='./data', train=False, transform=transform)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=1000, shuffle=False)
classes = ('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
########################################################################
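
If you'd like to look at a few samples yourself, one quick way (assuming matplotlib is installed; this uses the train_loader and classes defined above) is:

import matplotlib.pyplot as plt
import torchvision
# grab one batch and show the first 8 images (undo the [-1, 1] normalization for display)
images, labels = next(iter(train_loader))
grid = torchvision.utils.make_grid(images[:8], nrow=8)
plt.imshow((grid * 0.5 + 0.5).permute(1, 2, 0).numpy())
plt.title(' '.join(classes[int(l)] for l in labels[:8]))
plt.axis('off')
plt.show()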

Implementing CNN

Your objective should be to achieve 75%+ accuracy. It is possible to achieve this accuracy with a total training time of ~1 hour on a standard CPU.

You can make use of the two hints below:

Hint 1: Make use of the results from the ablation study.

Hint 2: What happens to the computation requirements if you double the size of each layer? What happens to the computation requirements if you add another layer?
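
As a starting point (one possible layout, not the target architecture; you will still need to tune it using the hints above), note that CIFAR-10 images are 3x32x32 rather than MNIST's 1x28x28, so the first convolution takes 3 input channels and the flattened size before the dense layers changes accordingly:

import torch.nn as nn
import torch.nn.functional as F
class CIFARCNN(nn.Module):
    def __init__(self):
        super(CIFARCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=5)   # 3x32x32 -> 16x28x28
        self.conv2 = nn.Conv2d(16, 32, kernel_size=5)  # 16x14x14 -> 32x10x10 (after the first pool)
        self.dense1 = nn.Linear(32 * 5 * 5, 120)       # 32x5x5 = 800 features after the second pool
        self.dense2 = nn.Linear(120, 10)
    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        x = x.view(-1, 32 * 5 * 5)
        x = F.relu(self.dense1(x))
        x = self.dense2(x)
        return F.log_softmax(x, dim=1)

The layer sizes here are illustrative; revisit Hint 2 when deciding how wide or deep to go.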

Results and Discussion

If you achieved 75% accuracy, congrats! Classifying objects is a much tougher task than classifying digits. Moreover, the digit dataset is extremely clean and normalized, so even methods like K-Nearest Neighbors and Support Vector Machines give >90% accuracy on it.

Accuracy of 85%+ can be achieved on this dataset with some more computational resources and data augmentation, i.e. adding shifted (by 1-2 pixels), flipped, and similarly transformed copies of the images to the training dataset. Accuracy of 90%+ can be achieved by also using a technique called batch normalization.
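
For reference, a sketch of what those two additions might look like (the parameter values are illustrative, not tuned):

import torch.nn as nn
import torchvision.transforms as T
# data augmentation: applied only to the training set, at load time
train_transform = T.Compose([
    T.RandomCrop(32, padding=2),    # random shifts of a couple of pixels
    T.RandomHorizontalFlip(),       # random left-right flips
    T.ToTensor(),
    T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
# batch normalization: typically inserted after a convolution, before the non-linearity
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)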

References

  1. Code has been adapted from the PyTorch example authored by Soumith Chintala.

© 2016-2022. All rights reserved.