Introduction to Neural Networks

Programming Lab 11: Convolutional neural networks


This week's recommended readings are Goodfellow's Deep Learning textbook Chapter 9 and several articles that can be downloaded from the main course page. These include two review articles on deep learning which cover convolutional neural networks and several original papers on the different convolutional architectures we will discuss this week.


We will implement this week's programming lab using PyTorch, an open-source machine learning framework for deep learning applications. The package is popular among scientists and research-focused developers because of its flexibility and its use of dynamic computational graphs. The PyTorch website is an excellent resource for documentation and tutorials.

Multi-class object categorization

This week we will build a convolutional neural network (CNN) and train it to classify object images into 10 different categories. We will use the CIFAR-10 data set to train and test our model. This data set consists of 60,000 low-resolution photos of objects from 10 different categories. Fig. 1 shows example images for each category.

Compared to the relatively simple binary classification problem that we solved previously with the perceptron and logistic regression model, multi-class object categorization is a much more complex problem to solve. We thus need a more complex model: a feedforward convolutional neural network (CNN). Standard CNNs resemble HMAX in architecture but have trainable parameters. We will train our model through supervised learning with backpropagation and use gradient descent to optimize the weights, as before. However, given the increased complexity of our model, we need to carefully set up our learning problem to ensure fast and successful weight optimization. This involves preprocessing of the data (images) and careful initialization of the weights, both of which will be discussed below.

Fig 1 This figure shows example images for each of the 10 object categories in the CIFAR-10 data set. The data set has 6,000 images per category, yielding a total of 60,000 images. Image resolution is 32 by 32 pixels (by 3 colour channels). Figure source:

Let's start with importing the libraries we need:

# import libraries
import numpy as np
import torch
# step 1: prep data set 			
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt

# step 2: build model
import torch.nn as nn
import torch.nn.functional as F

# step 3: train model
import torch.optim as optim

Step 1: Prepare the data set

Split the images into train and test sets
As described in this week's lecture, we split the images into a training set and a test set. We use the training set to learn the weights and the test set to assess how well our trained network classifies objects in novel images. We further split the training set into mini-batches to speed up gradient descent, updating the weights after each mini-batch.
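As a minimal sketch of how mini-batches arise, the snippet below wraps a small synthetic data set in a DataLoader and counts the mini-batches in one pass. The random tensors, the sizes, and the names fake_images, fake_labels, fake_set and fake_loader are illustrative assumptions; they stand in for images and are not CIFAR-10 itself.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# synthetic stand-in for an image data set: 20 "images" of 3 x 32 x 32 (assumed sizes)
fake_images = torch.randn(20, 3, 32, 32)
fake_labels = torch.randint(0, 10, (20,))
fake_set = TensorDataset(fake_images, fake_labels)

# batch_size=4 splits the 20 items into 20 / 4 = 5 mini-batches per epoch
fake_loader = DataLoader(fake_set, batch_size=4, shuffle=True)

# one pass over the loader is one epoch; each item is an (inputs, labels) pair
batch_shapes = [tuple(inputs.shape) for inputs, labels in fake_loader]
n_batches = len(batch_shapes)
print(n_batches, batch_shapes[0])
```

With the same batch size of 4, the 50,000 CIFAR-10 training images give 12,500 weight updates per epoch.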

Preprocess the images
It is standard practice to normalize images before training a convolutional neural network. Normalization speeds up learning because it makes the cost function more symmetric. You can think of this as changing an elongated dish into a bowl (see Fig. 2). The effect of normalization is largest if the original input features (e.g. pixel intensities) differ substantially in scale. Images are usually normalized as follows. We compute the mean pixel intensity across training images for each colour channel (R,G,B). We also compute the standard deviation of the pixel intensities across training images for each colour channel. We then use these values to normalize each image pixel in both the training and the test data set:

\begin{equation} z_{c,i} = \frac{x_{c,i} - \mu_{c}}{\sigma_{c}} \end{equation}

where \(z\) is the normalized pixel intensity, \(x\) is the original pixel intensity, \( \mu\) is the mean pixel intensity across training images, and \( \sigma\) is the standard deviation of pixel intensities across training images. \(c\) indexes the colour channel, \(i\) indexes the pixel.
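The normalization equation can be sketched in a few lines of NumPy. This is an illustration under stated assumptions: the arrays train and test hold random numbers standing in for images, and their sizes are arbitrary. The point is that the mean and standard deviation are computed per colour channel from the training images only and then applied to both sets.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic stand-ins for images, shape (N, channels, height, width)
train = rng.uniform(0.0, 1.0, size=(100, 3, 32, 32))
test = rng.uniform(0.0, 1.0, size=(20, 3, 32, 32))

# per-channel statistics, computed from the training images only
mu = train.mean(axis=(0, 2, 3), keepdims=True)     # shape (1, 3, 1, 1)
sigma = train.std(axis=(0, 2, 3), keepdims=True)   # shape (1, 3, 1, 1)

# apply the same training statistics to both sets
train_z = (train - mu) / sigma
test_z = (test - mu) / sigma

print(train_z.mean(axis=(0, 2, 3)))  # close to 0 for each channel
print(train_z.std(axis=(0, 2, 3)))   # close to 1 for each channel
```

Note that the code in this lab passes fixed values of 0.5 for every mean and standard deviation to transforms.Normalize rather than computing them from the data, which maps pixel intensities from the range [0, 1] to [-1, 1].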

Fig 2 This figure shows schematically how normalization of the network's input speeds up learning. Plots show the cost function (blue = high cost, red = low cost) with respect to two weights (each of which connects one pixel in the input image to a unit in the first layer of the network). If the intensities of the two pixels differ in scale, the cost function will be nonsymmetric, as shown in the left panel. The black line shows the trajectory taken by gradient descent. Input normalization makes the cost function more symmetric, which speeds up gradient descent.
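The dish-versus-bowl intuition can be checked on a toy problem. The sketch below runs gradient descent on a quadratic cost with two weights and counts the steps needed to get close to the minimum. The cost function, the curvature values, the tolerance, and the helper gd_steps are all hypothetical choices made for illustration; a real network's cost surface is far more complicated.

```python
import math

def gd_steps(curvatures, lr, tol=1e-3, max_steps=10000):
    """Count gradient-descent steps until the weights are within tol of the minimum.

    The cost is J(w) = 0.5 * sum(c_k * w_k ** 2), so dJ/dw_k = c_k * w_k.
    """
    w = [1.0 for _ in curvatures]          # same starting point in both cases
    for step in range(1, max_steps + 1):
        w = [wk - lr * ck * wk for wk, ck in zip(w, curvatures)]
        if math.sqrt(sum(wk ** 2 for wk in w)) < tol:
            return step
    return max_steps

elongated = [1.0, 25.0]   # "dish": the two directions differ in scale (assumed values)
rounded = [1.0, 1.0]      # "bowl": both directions on the same scale

# the learning rate is tied to the steepest direction so that both runs stay stable
steps_elongated = gd_steps(elongated, lr=0.5 / max(elongated))
steps_rounded = gd_steps(rounded, lr=0.5 / max(rounded))
print(steps_elongated, steps_rounded)
```

In the elongated case the steep direction limits the learning rate, so progress along the shallow direction is slow and many more steps are needed.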

# download, split, and preprocess the images
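# convert each image to a tensor with values in [0, 1], then normalize each channel
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])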

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Show a few example images:

# function to show an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))

# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
plt.show()

# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

Step 2: Build the network

# define network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
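The input size of the first fully connected layer, 16 * 5 * 5, follows from the image size and the layer settings above. The arithmetic below is a sketch of that calculation, assuming no padding and no dilation (the defaults used in the network definition); the helper name conv_out is made up for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                  # CIFAR-10 images are 32 x 32 pixels
size = conv_out(size, kernel=5)            # conv1, 5 x 5 kernel:      32 -> 28
size = conv_out(size, kernel=2, stride=2)  # max pooling, 2 x 2:       28 -> 14
size = conv_out(size, kernel=5)            # conv2, 5 x 5 kernel:      14 -> 10
size = conv_out(size, kernel=2, stride=2)  # max pooling, 2 x 2:       10 -> 5

flat_features = 16 * size * size           # conv2 has 16 output channels
print(size, flat_features)                 # 5 400
```

If the kernel sizes or the input resolution change, the first argument of fc1 and the reshape in forward have to change with them.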

Step 3: Train the network

# define loss function and optimizer (cross-entropy loss and SGD with momentum)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

# train the model
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

# save trained model
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)
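The network's forward method returns the raw output of fc3 without a softmax because nn.CrossEntropyLoss applies the log-softmax itself. The sketch below checks this on made-up numbers: the scores in logits and the labels in targets are illustrative values, not output from the trained network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# made-up scores for a mini-batch of 2 images and 3 classes (illustrative values)
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])

# built-in loss, applied directly to the raw scores
loss_builtin = nn.CrossEntropyLoss()(logits, targets)

# the same quantity by hand: mean negative log-probability of the correct class
log_probs = F.log_softmax(logits, dim=1)
loss_manual = -log_probs[torch.arange(2), targets].mean()

print(loss_builtin.item(), loss_manual.item())
```

Both values agree, so adding a softmax at the end of forward would apply the normalization twice.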

Step 4: Test the network

# display images from test set
dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
plt.show()
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

# load trained network (only needed if you exited ipython after training)
net = Net()
net.load_state_dict(torch.load(PATH))

# object categories predicted by the network
outputs = net(images)
_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))

# accuracy on the full test data set (10,000 images)
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))	

# investigate performance per object category
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))

Sources for this programming lab:
PyTorch CIFAR-10 tutorial
Andrew Ng's Coursera course on deep learning