In this hands-on assignment, we'll use CNNs for image classification. You'll first run the provided code for digit classification and analyze the effects of the number of layers, hidden layer sizes, dropout, and momentum on performance. Then, you'll write code to perform object classification across 10 categories such as airplanes, cars, trucks, birds, cats, and dogs. Let's get started!
Digit Classification
Dataset
The dataset we'll use is the MNIST dataset, a dataset widely used in machine learning for the handwritten-digit recognition task. Here are some sample images from the dataset.
You don't need to download the dataset manually. PyTorch can do that for us.
How to run the code using Google Colaboratory
You can also play with this project directly in the browser via Google Colaboratory using the link above. Google Colab is a free tool that lets you run small machine learning experiments through your browser. If you're unfamiliar with Google Colaboratory, read this 1-minute tutorial first.
Code
You can use the following code to perform digit classification:
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

torch.manual_seed(42)

###############################################################################
# load data
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
train_data = datasets.MNIST('./data/mnist', train=True, download=True, transform=transform)
test_data = datasets.MNIST('./data/mnist', train=False, transform=transform)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=1000, shuffle=True)

###############################################################################
# CNN model
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.dense1 = nn.Linear(320, 50)
        self.dense2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.dense1(x))
        x = F.dropout(x, training=self.training)
        x = self.dense2(x)
        return F.log_softmax(x, dim=1)

model = CNN()
# optimizer = stochastic gradient descent
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

###############################################################################
# train and test functions
def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))


def test():
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():  # no gradients needed during evaluation
        for data, target in test_loader:
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.max(1, keepdim=True)[1]  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).long().sum().item()
    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

###############################################################################
# run
nepochs = 10
for epoch in range(1, nepochs + 1):
    train(epoch)
    test()
###############################################################################
Most of the code should be easy to follow, but here are some notes:
- For loading the data, we use the built-in dataset loaders that PyTorch provides (via torchvision). These support the MNIST dataset as well as the CIFAR-10 dataset we'll use later in the tutorial.
- The CNN model above has two convolutional layers and two dense (fully connected) layers.
- The CNN uses dropout for regularization. Note the importance of calling model.train() and model.eval() at the beginning of the train() and test() functions: dropout needs to behave differently during the training and evaluation phases.
- Softmax is an activation function that normalizes a layer's outputs so that they are all positive and sum to 1. Hence, the outputs can be interpreted as probabilities assigned to the different categories.
- Note that the softmax operation acts on the entire layer, not on each node individually. It is given by softmax(x) = exp(x) / sum(exp(x)), where x is a vector and exp is applied element-wise.
- For numerical stability, the model uses PyTorch's log_softmax instead, which is the form that the nll_loss (negative log likelihood) function expects. The snippet after this list illustrates this, along with the train/eval behavior of dropout.
- The training and evaluation code is pretty similar to last time (XOR neural network example).
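To make the notes above concrete, here is a small self-contained snippet. The tensor values are made-up examples, not part of the assignment code; they just show that softmax outputs sum to 1, that nll_loss works on log_softmax outputs, and that dropout switches off in evaluation mode.

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])   # raw scores for 3 classes
probs = F.softmax(logits, dim=1)            # positive values that sum to 1
print(probs, probs.sum())                   # roughly [[0.786, 0.175, 0.039]], sum = 1.0

log_probs = F.log_softmax(logits, dim=1)    # numerically stable log of the above
target = torch.tensor([0])                  # true class index
print(F.nll_loss(log_probs, target))        # equals -log_probs[0, 0]

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 4)
drop.train()
print(drop(x))   # training mode: about half the entries zeroed, the rest scaled by 1/(1-p) = 2
drop.eval()
print(drop(x))   # evaluation mode: dropout is the identity, output is all ones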
Ablation Study
Now, we'll perform what is known as an ablation study: we'll measure the impact of each component by removing components one at a time and putting them back. Take the quiz now; it includes the details of the experiments you need to perform. Each experiment will take about 5 minutes to run. Once done, make a note of the relative importance of each component.
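For example, a single-component change could look like the edits sketched below. These are only illustrative modifications to the code above; the quiz specifies the exact experiments and values to use.

# Illustrative ablation edits (the quiz lists the actual experiments to run):

# 1. Remove momentum: train with plain SGD.
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.0)

# 2. Remove dropout: in forward(), skip the dropout calls, e.g. use
#        x = F.relu(F.max_pool2d(self.conv2(x), 2))
#    instead of the conv2_drop version, and delete the F.dropout(...) line.

# 3. Change the hidden layer size: e.g. a narrower first dense layer.
#        self.dense1 = nn.Linear(320, 20)
#        self.dense2 = nn.Linear(20, 10)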
Object Classification
Dataset
The dataset we'll use is CIFAR-10, which contains small 32x32 color images from 10 object categories. Some samples from the dataset are shown below.
You can use the code below to load the dataset.
########################################################################
# The output of torchvision datasets are PILImage images of range [0, 1].
# We transform them to Tensors of normalized range [-1, 1].
import torch
import torchvision

transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
train_data = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
test_data = torchvision.datasets.CIFAR10(root='./data', train=False, transform=transform)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=1000, shuffle=False)

classes = ('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
########################################################################
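If you'd like to look at a few training images yourself, here is a minimal sketch. It assumes matplotlib and numpy are available (they are preinstalled on Colab) and reuses the train_loader and classes defined above.

import numpy as np
import matplotlib.pyplot as plt
import torchvision

images, labels = next(iter(train_loader))             # grab one batch from the loader above
grid = torchvision.utils.make_grid(images[:8])        # tile the first 8 images into one image
grid = grid / 2 + 0.5                                 # undo the [-1, 1] normalization
plt.imshow(np.transpose(grid.numpy(), (1, 2, 0)))     # CHW -> HWC for matplotlib
plt.title(' '.join(classes[int(l)] for l in labels[:8]))
plt.show()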
Implementing CNN
Your objective is to achieve 75%+ test accuracy. This is achievable with a total training time of roughly an hour on a standard CPU.
You can make use of the two hints below:
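In addition to those hints, here is a rough sketch of one possible architecture to start from: a wider version of the MNIST network above, adapted to 3-channel 32x32 inputs. The layer sizes are illustrative assumptions, not settings verified to reach 75%, so expect to experiment.

# Illustrative CIFAR-10 architecture (a starting-point guess, not the assignment's answer):
# three conv/pool stages followed by two dense layers, mirroring the MNIST model above.
import torch.nn as nn
import torch.nn.functional as F

class CifarCNN(nn.Module):
    def __init__(self):
        super(CifarCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)    # 3x32x32 -> 32x32x32
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)   # 32x16x16 -> 64x16x16
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)  # 64x8x8 -> 128x8x8
        self.fc1 = nn.Linear(128 * 4 * 4, 256)
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)   # -> 32x16x16
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)   # -> 64x8x8
        x = F.max_pool2d(F.relu(self.conv3(x)), 2)   # -> 128x4x4
        x = x.view(-1, 128 * 4 * 4)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)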
Results and Discussion
If you achieved 75% accuracy, congrats! Classifying objects is a much tougher task than classifying digits. Moreover, the digit dataset is so clean and well normalized that even methods like K-Nearest Neighbors and Support Vector Machines achieve >90% accuracy on it.
Accuracy of 85%+ can be achieved on this dataset with somewhat more computational resources and data augmentation, i.e. adding shifted (by 1-2 pixels) and flipped copies of the images to the training set. Accuracy of 90%+ can be achieved by also using a technique called batch normalization.
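As a rough sketch of what those two techniques look like in PyTorch (illustrative settings, not tuned values): data augmentation is applied through the training transform, and batch normalization is inserted between a convolution and its nonlinearity.

import torch.nn as nn
import torchvision.transforms as T

# Augmentation: random shifts (via padded crops) and horizontal flips, applied only to training data.
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

# Batch normalization: a BatchNorm2d layer after a convolution, before the nonlinearity.
conv_block = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
)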
References
- Code has been adapted from the PyTorch example authored by Soumith Chintala.