# Introduction to Neural Networks

### Programming Lab 11: Convolutional neural networks

This week's recommended readings are Chapter 9 of the Deep Learning textbook by Goodfellow et al. and several articles that can be downloaded from the main course page. These include two review articles on deep learning that cover convolutional neural networks, and several original papers on the convolutional architectures we will discuss this week.

#### PyTorch

We will implement this week's programming lab using PyTorch. PyTorch is an open-source machine learning framework used for deep learning applications. The package is popular among scientists and research-focused developers because of its flexibility and its use of dynamic computational graphs. The PyTorch website is an excellent resource for documentation and tutorials.

#### Multi-class object categorization

This week we will build a convolutional neural network (CNN) and train it to classify object images into 10 different categories. We will use the CIFAR-10 data set to train and test our model. This data set consists of 60,000 low-resolution (32 x 32 pixel) colour photos of objects from 10 different categories, split into 50,000 training images and 10,000 test images. Fig. 1 shows example images for each category.

Multi-class object categorization is a much harder problem than the binary classification problems we solved previously with the perceptron and the logistic regression model. We thus need a more complex model: a feedforward convolutional neural network (CNN). Standard CNNs resemble HMAX in architecture but have trainable parameters. As before, we will train our model through supervised learning, using backpropagation to compute the gradients and gradient descent to optimize the weights. However, given the increased complexity of our model, we need to set up the learning problem carefully to ensure fast and successful weight optimization. This involves preprocessing of the data (images) and careful initialization of the weights, both of which will be discussed below.

# import libraries
import numpy as np
import torch

# step 1: prep data set
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt

# step 2: build model
import torch.nn as nn
import torch.nn.functional as F

# step 3: train model
import torch.optim as optim

#### Step 1: Prepare the data set

##### Split the images into train and test sets

As described in this week's lecture, we split the images into a training set and a test set. We use the training set to learn the weights and the test set to assess how well the trained network classifies objects in novel images. We further split the training set into mini-batches to speed up gradient descent: instead of updating the weights once per pass through the full training set, we update them after each mini-batch.
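As a minimal sketch of mini-batching, assume a made-up data set of 10 random "images" (the names `toy_images`, `toy_labels`, `toy_set` and `toy_loader` are illustrative and not part of the lab). A `DataLoader` with `batch_size=4` then yields three mini-batches per epoch, the last one smaller than the others:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# hypothetical toy data: 10 random RGB "images" of 32x32 pixels with random labels
toy_images = torch.randn(10, 3, 32, 32)
toy_labels = torch.randint(0, 10, (10,))
toy_set = TensorDataset(toy_images, toy_labels)

# split the toy set into mini-batches of 4 images; reshuffle the order every epoch
toy_loader = DataLoader(toy_set, batch_size=4, shuffle=True)

# one pass over the loader is one epoch; each iteration yields one mini-batch
batch_sizes = [inputs.shape[0] for inputs, labels in toy_loader]
print(batch_sizes)  # [4, 4, 2]
```

With CIFAR-10, the same mechanism would split the 50,000 training images into 12,500 mini-batches of 4 images each.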

##### Preprocess the images

It is standard practice to normalize images before training a convolutional neural network. Normalization speeds up learning because it makes the cost function more symmetric, so gradient descent can take a more direct path to the minimum. You can think of this as changing an elongated dish into a bowl (see Fig. 2). The effect of normalization is largest if the original input features (e.g. pixel intensities) differ substantially in scale. Images are usually normalized as follows. We compute the mean pixel intensity across training images for each colour channel (R, G, B). We also compute the standard deviation of the pixel intensities across training images for each colour channel. We then use these training-set values to normalize each image pixel in both the training and the test data set:

$$z_{c,i} = \frac{x_{c,i} - \mu_{c}}{\sigma_{c}}$$

where $$z_{c,i}$$ is the normalized pixel intensity, $$x_{c,i}$$ is the original pixel intensity, $$\mu_{c}$$ is the mean pixel intensity across training images, and $$\sigma_{c}$$ is the standard deviation of pixel intensities across training images. $$c$$ indexes the colour channel and $$i$$ indexes the pixel.
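The formula can be sketched in NumPy as follows. This is an illustration on a made-up array of random images (`train_images` is a hypothetical stand-in for the real training set), not the code used in the lab:

```python
import numpy as np

# hypothetical toy training set: 100 RGB images of 32x32 pixels, intensities in [0, 1]
rng = np.random.default_rng(0)
train_images = rng.random((100, 3, 32, 32))

# per-channel mean and standard deviation across all training images and pixels
mu = train_images.mean(axis=(0, 2, 3))     # shape (3,)
sigma = train_images.std(axis=(0, 2, 3))   # shape (3,)

# z = (x - mu_c) / sigma_c, broadcast over images and pixels
normalized = (train_images - mu[None, :, None, None]) / sigma[None, :, None, None]

print(normalized.mean(axis=(0, 2, 3)))  # close to 0 for each channel
print(normalized.std(axis=(0, 2, 3)))   # close to 1 for each channel
```

The test images would be normalized with the same `mu` and `sigma`, computed from the training images only.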

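# download, split, and preprocess the images
# ToTensor scales pixel intensities to [0, 1]; Normalize then applies (x - mean) / std per channel.
# Here mean and std are fixed at 0.5 for every channel (a simple choice rather than
# the measured training-set statistics), which maps intensities to [-1, 1].
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# training set, served in shuffled mini-batches of 4 images
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

# test set, served in a fixed order
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')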

Show a few example images:

# function to show an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize (undo mean 0.5, std 0.5)
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))  # (channel, height, width) -> (height, width, channel)
    plt.ion()
    plt.show()

# get some random training images (one mini-batch)
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))

# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

#### Step 2: Build the network

# define network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)        # 3 input channels, 6 feature maps, 5x5 kernels
        self.pool = nn.MaxPool2d(2, 2)         # 2x2 max pooling with stride 2
        self.conv2 = nn.Conv2d(6, 16, 5)       # 6 input channels, 16 feature maps, 5x5 kernels
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # fully connected layers
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)           # one output per object category

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)             # flatten the feature maps
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
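To see where the `16 * 5 * 5` input size of the first fully connected layer comes from, we can trace the spatial size of the feature maps through the network. The sketch below uses the standard output-size formula for convolution and pooling layers without dilation; the helper `conv_output_size` is our own illustrative function, not a PyTorch call:

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (square inputs, no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                          # CIFAR-10 images are 32x32 pixels
size = conv_output_size(size, kernel=5)            # conv1, 5x5 kernel: 32 -> 28
size = conv_output_size(size, kernel=2, stride=2)  # 2x2 max pooling:   28 -> 14
size = conv_output_size(size, kernel=5)            # conv2, 5x5 kernel: 14 -> 10
size = conv_output_size(size, kernel=2, stride=2)  # 2x2 max pooling:   10 -> 5

flat_features = 16 * size * size                   # 16 feature maps of 5x5 units
print(size, flat_features)  # 5 400
```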

#### Step 3: Train the network

# define loss function and optimizer (cross-entropy loss and SGD with momentum)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
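As a rough illustration of what the cross-entropy loss measures, the sketch below computes it by hand in NumPy for a single example. The function `cross_entropy` is our own simplified version, written under the assumption that the network outputs unnormalized scores (logits); it is not PyTorch's implementation:

```python
import numpy as np

def cross_entropy(logits, label):
    """Cross-entropy loss for one example: -log softmax(logits)[label]."""
    shifted = logits - logits.max()                      # subtract max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())  # log-softmax
    return -log_probs[label]

# equal scores for all 10 categories, i.e. the network is guessing
uniform_logits = np.zeros(10)
loss_uniform = cross_entropy(uniform_logits, label=3)
print(round(loss_uniform, 3))  # 2.303, i.e. log(10)

# a confident, correct prediction gives a loss close to 0
confident_logits = np.zeros(10)
confident_logits[3] = 20.0
loss_confident = cross_entropy(confident_logits, label=3)
```

A loss of roughly 2.3 therefore corresponds to chance-level performance on 10 categories, and the loss falls towards 0 as the network assigns more probability to the correct category.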

# train the model
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients (PyTorch accumulates them otherwise)
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

# save trained model
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)
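PyTorch adds newly computed gradients to the ones already stored, so a training loop normally resets the gradients before processing each mini-batch. The following minimal sketch shows this accumulation on a single made-up weight `w` with a toy loss `w ** 2`; both are illustrative and unrelated to the network above:

```python
import torch

# a single weight and a toy loss w^2, whose gradient with respect to w is 2 * w = 6
w = torch.tensor(3.0, requires_grad=True)

(w ** 2).backward()
first = w.grad.item()        # 6.0

(w ** 2).backward()          # gradients accumulate: 6 + 6
accumulated = w.grad.item()  # 12.0

w.grad.zero_()               # reset the stored gradient before the next backward pass
(w ** 2).backward()
after_reset = w.grad.item()  # 6.0

print(first, accumulated, after_reset)  # 6.0 12.0 6.0
```

In a training loop, calling `optimizer.zero_grad()` before `loss.backward()` plays the role of the manual reset shown here.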

#### Step 4: Test the network

# display images from test set (one mini-batch)
dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

# load trained network (only needed if you exited ipython after training)
net = Net()
net.load_state_dict(torch.load(PATH))

# object categories predicted by the network (the category with the highest output)
outputs = net(images)
_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))

# accuracy on the full test data set (10,000 images)
correct = 0
total = 0
# gradients are not needed for evaluation
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

# investigate performance per object category
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        # tally correct predictions and totals for each image in the mini-batch
        for i in range(labels.size(0)):
            label = labels[i].item()
            class_correct[label] += c[i].item()
            class_total[label] += 1

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))
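The per-category tally can also be written without an explicit loop over images. The sketch below is one possible vectorized version in NumPy, run on made-up labels and predictions for 8 images from 3 categories (all values are hypothetical and chosen for illustration):

```python
import numpy as np

# hypothetical true labels and predictions for 8 test images from 3 categories
labels = np.array([0, 0, 1, 1, 1, 2, 2, 2])
predicted = np.array([0, 1, 1, 1, 0, 2, 2, 1])

num_classes = 3
# number of test images per category
class_total = np.bincount(labels, minlength=num_classes)
# number of correctly classified images per category
class_correct = np.bincount(labels[predicted == labels], minlength=num_classes)
class_accuracy = 100 * class_correct / class_total

for c in range(num_classes):
    print('Accuracy of category %d : %2d %%' % (c, class_accuracy[c]))
```

Here category 0 has 1 of 2 images correct and categories 1 and 2 each have 2 of 3 correct. Comparing accuracies across categories in this way shows which object categories the network finds hardest.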

Sources for this programming lab:
PyTorch CIFAR-10 tutorial
Andrew Ng's Coursera course on deep learning