This week's recommended readings are Chapter 9 of Goodfellow's Deep Learning textbook and several articles that can be downloaded from the main course page: two review articles on deep learning that cover convolutional neural networks, and several original papers on the convolutional architectures we will discuss this week.
We will implement this week's programming lab using PyTorch. PyTorch is an open source machine learning framework used for deep learning applications. The package is popular among scientists and research-focused developers because of its flexibility and use of dynamic computational graphs. The PyTorch website is an excellent resource for documentation and tutorials.
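As a minimal illustration of what a dynamic computational graph buys you (a standalone sketch, independent of the lab code below): the graph is built on the fly as operations execute, so gradients can be obtained for computations shaped by ordinary Python control flow.
# sketch: PyTorch records operations as they run and can backpropagate through them
import torch
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x      # y = x^2 + 3x, recorded as a small graph
y.backward()            # backpropagate through the recorded graph
print(x.grad)           # dy/dx = 2x + 3, i.e. tensor(7.) at x = 2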
This week we will build a convolutional neural network (CNN) and train it to classify object images into 10 categories. We will use the CIFAR-10 data set to train and test our model. This data set consists of 60,000 low-resolution photos of objects from 10 categories. Fig. 1 shows example images for each category.

Compared to the relatively simple binary classification problem that we solved previously with the perceptron and the logistic regression model, multi-class object categorization is a much harder problem. We thus need a more complex model: a feedforward convolutional neural network. Standard CNNs resemble HMAX in architecture but have trainable parameters. As before, we will train our model through supervised learning with backpropagation and use gradient descent to optimize the weights. However, given the increased complexity of the model, we need to set up the learning problem carefully to ensure fast and successful weight optimization. This involves preprocessing the data (images) and carefully initializing the weights, both of which are discussed below.
Let's start with importing the libraries we need:
# import libraries
import numpy as np
import torch
# step 1: prep data set
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
# step 2: build model
import torch.nn as nn
import torch.nn.functional as F
# step 3: train model
import torch.optim as optim
Split the images into train and test sets

As described in this week's lecture, we split the images into a training set and a test set. We use the training set to learn the weights and the test set to assess how well the trained network classifies objects in novel images. We further split the training set into mini-batches to speed up gradient descent, updating the weights after each mini-batch.
Preprocess the images

It is standard practice to normalize images before training a convolutional neural network. Normalization speeds up learning because it makes the cost function more symmetric; you can think of this as reshaping an elongated dish into a bowl (see Fig. 2). The effect of normalization is largest when the original input features (e.g. pixel intensities) differ substantially in scale. Images are usually normalized as follows. For each colour channel (R, G, B), we compute the mean and the standard deviation of the pixel intensities across the training images. We then use these values to normalize every pixel in both the training and the test set:
\begin{equation} z_{c,i} = \frac{x_{c,i} - \mu_{c}}{\sigma_{c}} \end{equation}

where \(z\) is the normalized pixel intensity, \(x\) is the original pixel intensity, \(\mu\) is the mean pixel intensity across training images, and \(\sigma\) is the standard deviation of pixel intensities across training images. The index \(c\) denotes the colour channel and \(i\) the pixel.
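For CIFAR-10, the per-channel statistics could be computed with a short sketch like the one below (an illustrative sketch, not part of the lab code; it reuses the imports from above and loads the training images with ToTensor only). The lab code that follows instead uses the constant 0.5 for every mean and standard deviation, which simply maps pixel intensities from [0, 1] to [-1, 1].
# sketch: compute per-channel mean and std of the CIFAR-10 training images
# (loads the whole training set into one tensor, roughly 600 MB of RAM)
raw_set = torchvision.datasets.CIFAR10(root='./data', train=True,
                                       download=True,
                                       transform=transforms.ToTensor())
all_images = torch.stack([img for img, _ in raw_set])   # (50000, 3, 32, 32)
mean = all_images.mean(dim=(0, 2, 3))   # one value per colour channel
std = all_images.std(dim=(0, 2, 3))
print(mean, std)   # roughly (0.49, 0.48, 0.45) and (0.25, 0.24, 0.26)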
# download, split, and preprocess the images
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
Show a few example images:
# function to show an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize: inverts (x - 0.5) / 0.5
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.ion()
    plt.show()
# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)   # dataiter.next() was removed in newer PyTorch
# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
# define network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)    # 3 input channels, 6 feature maps, 5x5 kernels
        self.pool = nn.MaxPool2d(2, 2)     # 2x2 max pooling with stride 2
        self.conv2 = nn.Conv2d(6, 16, 5)   # 6 input channels, 16 feature maps, 5x5 kernels
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)       # one output unit per object category

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # 32x32 -> 28x28 -> 14x14
        x = self.pool(F.relu(self.conv2(x)))   # 14x14 -> 10x10 -> 5x5
        x = x.view(-1, 16 * 5 * 5)             # flatten to a 400-dimensional vector
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)                        # raw class scores (no softmax; see below)
        return x
net = Net()
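The input size of fc1 (16 * 5 * 5 = 400) follows from the layer arithmetic: a 5x5 convolution without padding shrinks the 32x32 input to 28x28, pooling halves it to 14x14, the second convolution shrinks it to 10x10, and pooling halves it again to 5x5, with 16 channels. If in doubt, a quick sanity check (a sketch, not part of the lab code) is to push a dummy input through the convolutional layers and print the resulting shape:
# sketch: verify the feature-map size entering fc1 with a dummy input
dummy = torch.zeros(1, 3, 32, 32)   # one all-zero fake CIFAR-10 image
features = net.pool(F.relu(net.conv2(net.pool(F.relu(net.conv1(dummy))))))
print(features.shape)               # expected: torch.Size([1, 16, 5, 5])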
# define loss function and optimizer (cross-entropy loss and SGD with momentum)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
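Note that nn.CrossEntropyLoss expects raw, unnormalized scores (logits) and applies the log-softmax internally, which is why forward() above has no softmax at the end. A minimal illustration with made-up numbers (a sketch, not part of the lab code):
# sketch: CrossEntropyLoss works on raw scores; here for 3 made-up classes
logits = torch.tensor([[2.0, 0.5, -1.0]])   # raw scores for one sample
target = torch.tensor([0])                  # index of the correct class
print(criterion(logits, target))            # -log softmax(logits)[0], about 0.24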
# train the model
for epoch in range(2):   # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:   # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training')
# save trained model
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)
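Note that net.state_dict() contains only the learned parameters (an ordered dictionary of weight and bias tensors), not the class definition itself, which is why Net() has to be instantiated again before loading the weights further below.
# the state dict maps layer names to parameter tensors
print(list(net.state_dict().keys())[:2])   # ['conv1.weight', 'conv1.bias']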
# display images from test set
dataiter = iter(testloader)
images, labels = next(dataiter)   # as above, use next() instead of .next()
# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
# load trained network (only needed if you exited ipython after training)
net = Net()
net.load_state_dict(torch.load(PATH))
# object categories predicted by the network
outputs = net(images)
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
for j in range(4)))
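For each image, the network outputs a vector of 10 scores, and torch.max(outputs, 1) returns the highest score together with its index along dimension 1; that index is the predicted category. A toy illustration with made-up scores (a sketch, not part of the lab code):
# sketch: the argmax of the score vector is the predicted class
scores = torch.tensor([[0.1, 2.3, -0.4]])   # made-up scores for 3 classes
top_score, top_index = torch.max(scores, 1)
print(top_index.item())                     # 1, i.e. the second class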
# accuracy on the full test data set (10,000 images)
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
# investigate performance per object category
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1
for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))
Sources for this programming lab:
- PyTorch CIFAR-10 tutorial
- Andrew Ng's Coursera course on deep learning