Introduction to Neural Networks

Problem Set 9


The XOR problem

The perceptron was originally motivated by the desire to model the brain's ability to implement logical functions. Enthusiasm for the perceptron therefore waned when it became clear that the perceptron cannot implement the exclusive OR (XOR) function, which requires a nonlinear mapping from inputs to output. The XOR problem is illustrated in Fig 1.


Fig 1 This figure illustrates different logical functions (OR, AND, XOR) using the apple example from the programming lab. Let's assume we have four categories of apples now: green and small (0,0), green and large (0,1), red and small (1,0), and red and large (1,1). The logical OR function outputs 0 for the small green apples, and 1 otherwise. In other words, we select an apple when it has at least one of the following properties: red or large. The logical AND function outputs 1 for the large red apples, and 0 otherwise. In other words, we only select apples when they are both red and large (as in the programming lab). The logical XOR function outputs 1 for the large green and small red apples, and 0 otherwise. In other words, we select apples when they are either red or large but not both. While the OR and AND functions can be implemented by learning a linear decision boundary, this is not the case for the XOR function.
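The claim that OR and AND are linearly separable while XOR is not can be checked with a small brute-force search. The sketch below is an illustration rather than a proof: it tries every combination of bias and weights on a coarse, hypothetical grid and reports the first combination (if any) for which a single threshold unit reproduces the desired outputs. The names `threshold_unit`, `find_weights`, and `GRID` are assumptions made for this example and are not part of the problem set code.

```python
import itertools

# The four input patterns (x1, x2) and the desired output of each logical function.
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
TARGETS = {
    "OR": [0, 1, 1, 1],
    "AND": [0, 0, 0, 1],
    "XOR": [0, 1, 1, 0],
}

def threshold_unit(x, b, w1, w2):
    """Single perceptron: output 1 if the weighted sum exceeds 0, else 0."""
    return 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0

def find_weights(target, grid):
    """Return the first (b, w1, w2) on the grid that reproduces target, or None."""
    for b, w1, w2 in itertools.product(grid, repeat=3):
        if [threshold_unit(x, b, w1, w2) for x in INPUTS] == target:
            return b, w1, w2
    return None

# Hypothetical search grid: -2.0, -1.5, ..., 2.0
GRID = [k / 2 for k in range(-4, 5)]

for name, target in TARGETS.items():
    print(name, find_weights(target, GRID))
```

The search finds weights for OR and AND but none for XOR. A finer grid would not help: XOR requires b <= 0, w1 + b > 0, and w2 + b > 0, which together force w1 + w2 + b > 0, so the unit cannot output 0 for the input (1,1).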

Fig 1 suggests that the XOR problem can be solved by first learning the OR and AND functions and then combining the outputs from these functions (i.e. we want the OR apples but with the AND apples removed). We can achieve this by stacking perceptrons, creating a hidden layer in-between the input and output (see Fig 2).
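To make the stacking idea concrete, here is a minimal sketch of a two-layer network with hand-picked weights. The weights are assumptions chosen for illustration (they are not learned, and they are not the weights your trained perceptrons will produce), and the helper names `step`, `perceptron`, and `mlp_xor` are hypothetical. The hidden layer computes OR and AND of the inputs, and the output unit fires when OR is on and AND is off.

```python
def step(z):
    """Threshold activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def perceptron(x, weights):
    """Two-input perceptron with weights ordered as (bias, w1, w2)."""
    b, w1, w2 = weights
    return step(w1 * x[0] + w2 * x[1] + b)

# Hand-picked weights (assumed for illustration, not learned)
W_OR = (-0.5, 1.0, 1.0)    # fires when at least one input is 1
W_AND = (-1.5, 1.0, 1.0)   # fires only when both inputs are 1
W_OUT = (-0.5, 1.0, -1.0)  # fires when OR is 1 and AND is 0

def mlp_xor(x):
    """Hidden layer = (OR, AND) of the inputs; the output unit combines them."""
    hidden = (perceptron(x, W_OR), perceptron(x, W_AND))
    return perceptron(hidden, W_OUT)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, mlp_xor(x))
```

In the hidden-layer space the four inputs collapse onto three points, (0,0), (1,0), and (1,1), and these are linearly separable. That is why a third perceptron can finish the job.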


Fig 2 This figure shows a multi-layer perceptron, which can solve the XOR problem.

Let's implement the multi-layer perceptron shown in Fig 2 and train it to learn the XOR function. The code below is the framework for your solution. To solve the problem set, complete every line marked "complete this".

# import libraries
import numpy as np
from numpy import random
import matplotlib.pyplot as plt
import numpy.matlib

# main function
def ps9(peri,monitor):
    # seed the random number generator (to ensure that we all generate the same data and initial weights)
    random.seed(3)
    
    # generate training data   
    means = np.array([[0.3,0.3],[0.3,0.75],[0.75,0.3],[0.75,0.75]])
    sigma = 0.04
    ndatapoints = 20
    data_output_train = generate_data_4classes(means, sigma, ndatapoints)  
    data_train = data_output_train[0]
    randvec_train = data_output_train[1]  

    # training parameters
    learning_rate = 0.01
    niterations = 2

    # train perceptron 1 (OR)
    training_output = # call the train function (complete this line yourself)
    weights_OR = training_output[0]

    # train perceptron 2 (AND)    
    training_output = # call the train function (complete this line yourself)
    weights_AND = training_output[0]

    # train perceptron 3 (XOR)
    # this perceptron takes the outputs from perceptron 1 and 2 as input (= as training data)
    # assemble the training data (complete this yourself using the test function - needs 2-3 lines of code) 
    training_output = # call the train function (complete this line yourself)
    weights_XOR = training_output[0]

    # show training data and decision boundaries for the three perceptrons
    if monitor:
        # plt.ion() # you may need to turn interactive mode on for figure plotting        
        # perceptron 1 (OR)
        colors = plot_data(ndatapoints, data_output_train, 1)
        plot_boundary(weights_OR, 1)
        # perceptron 2 (AND)
        colors = plot_data(ndatapoints, data_output_train, 2)
        plot_boundary(weights_AND, 2)
        # perceptron 3 (XOR)        
        plot_data_XOR(colors, predictions_OR, predictions_AND, 3)
        plot_boundary(weights_XOR, 3)
    
    # print weights
    if peri == 1:
        print(weights_OR)
    if peri == 2:
        print(weights_AND)
    if peri == 3:
        print(weights_XOR)

# helper functions
# generate data for 4 classes (input x and output f(x)) 
def generate_data_4classes(means, sigma, ndatapoints):
    nclasses = means.shape[0]
    data = np.zeros((nclasses * ndatapoints, 5)) # cols 1-2 = inputs, cols 3-5 = desired output (for OR, AND, and XOR function)
    for c in range(0, nclasses):
        starti = c * ndatapoints
        endi = (c + 1) * ndatapoints
        data[starti:endi, 0:1] = means[c,0] + sigma * random.standard_normal((ndatapoints, 1))
        data[starti:endi, 1:2] = means[c,1] + sigma * random.standard_normal((ndatapoints, 1))
        if c > 0: 
            data[starti:endi, 2] = 1 # OR
        if c == 3:
            data[starti:endi, 3] = 1 # AND
        if # ... (complete this line yourself)
            data[starti:endi, 4] = 1 # XOR 
    randvec = np.random.permutation(nclasses * ndatapoints)    
    data = data[randvec,:]
    return data, randvec

# plot the input for the OR-perceptron or the AND-perceptron
def plot_data(ndatapoints, data_output, figi):
    data = data_output[0]
    randvec = data_output[1]    
    # one RGB color per class, repeated for each data point in that class
    class_colors = np.array([[1, 0.5, 1], [0.5, 1, 1], [0.6, 1, 0.6], [0.5, 0.5, 1]])
    colors = np.repeat(class_colors, ndatapoints, axis=0)
    colors = colors[randvec,:]
    plt.figure(figi)
    plt.scatter(data[:,0], data[:,1], c=colors, alpha=0.5)
    plt.axis('square')  
    plt.xlabel('x1 (0 = green, 1 = red)')
    plt.ylabel('x2 (0 = small, 1 = large)')
    if figi == 1:
        plt.title('logical OR')
    elif figi == 2:
        plt.title('logical AND')
    return colors

# plot the input for the XOR-perceptron 
def plot_data_XOR(colors, predictions_OR, predictions_AND, figi):
    plt.figure(figi)
    plt.scatter() # complete this line yourself by providing the correct input arguments
    plt.axis('square')  
    plt.xlabel() # complete this line yourself by providing a label for the x axis
    plt.ylabel() # complete this line yourself by providing a label for the y axis
    plt.title() # complete this line yourself by providing a title for the figure

# plot the decision boundary
def plot_boundary(weights, figi):
    b = weights[0]; w1 = weights[1]; w2 = weights[2]
    slope = -w1 / w2  # boundary: w1*x1 + w2*x2 + b = 0, so x2 = -(w1/w2)*x1 - b/w2
    y_intercept = -b / w2
    x = np.linspace(0,1,100)
    y = (slope * x) + y_intercept
    plt.figure(figi)
    plt.plot(x, y)
    plt.pause(0.4)

# predict output
def predict(inputs, weights):
    summation = np.dot(inputs, weights[1:]) + weights[0]
    if summation > 0:
        prediction = 1
    else:
        prediction = 0
    return prediction

# train the perceptron
def train(data, learning_rate, niterations, figi=0):
    training_inputs = data[:,0:2]
    labels = data[:,2]    
    weights = 0.001 * random.standard_normal(data.shape[1])   
    errors = np.zeros((data.shape[0], niterations))
    j = 0
    for _ in range(niterations):
        i = 0
        for inputs, label in zip(training_inputs, labels):
            prediction = predict(inputs, weights)
            weights[1:] += learning_rate * (label - prediction) * inputs
            weights[0] += learning_rate * (label - prediction)
            errors[i,j] = label - prediction
            if figi>0:
                plot_boundary(weights, figi)
            i += 1   
        j += 1        
    return weights, errors

# test the perceptron
def test(data, weights):
    inputs_test = data[:,0:2]
    labels = data[:,2]
    npredictions = data.shape[0]
    predictions = np.zeros(npredictions)
    for i in range(0, npredictions):
        predictions[i] = predict(inputs_test[i,:], weights)
    return predictions
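
The relationship between a perceptron's weights and the line drawn by plot_boundary can be checked numerically. The decision boundary is the set of points where the weighted sum is exactly zero, w1*x1 + w2*x2 + b = 0, which rearranges to x2 = -(w1/w2)*x1 - b/w2. The sketch below uses hypothetical weights and a hypothetical helper `boundary_line`; it assumes w2 is nonzero.

```python
def boundary_line(weights):
    """Return (slope, intercept) of the decision boundary for weights (b, w1, w2).

    Assumes w2 != 0, so that the boundary can be written as x2 = slope * x1 + intercept.
    """
    b, w1, w2 = weights
    slope = -w1 / w2
    intercept = -b / w2
    return slope, intercept

weights = (-0.5, 1.0, 1.0)  # hypothetical weights, ordered as (b, w1, w2)
slope, intercept = boundary_line(weights)

# Every point on the line should give a weighted sum of zero.
for x1 in (0.0, 0.25, 1.0):
    x2 = slope * x1 + intercept
    activation = weights[1] * x1 + weights[2] * x2 + weights[0]
    print(x1, x2, activation)
```

Points on one side of this line give a positive sum (prediction 1) and points on the other side give a negative sum (prediction 0).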

To test your code, save it as solution_code_ps9.py and run it using the code below.

# save the code below as ps9_main.py in the same folder as your solution code
# 
# input arguments for ps9
# peri:     perceptron index
#           controls the output that is printed to the terminal (weights for perceptron of your choice)
#           1 = OR perceptron
#           2 = AND perceptron
#           3 = XOR perceptron
# monitor:  controls figure output
#           0 = no figures
#           1 = figures (for all perceptrons)
#
# to test your solution code, set the input arguments below, and run the script: python ps9_main.py from your terminal (or run ps9_main.py from an IPython console)

from solution_code_ps9 import ps9

ps9(
    peri = 1, # if you want to print the weights for the OR perceptron
    monitor = 1, # if you want figure output
)

Output: Upload the following two files to codePost: solution_code_ps9.py and ps9.pdf. The PDF file should contain three figures, one per perceptron, each showing that perceptron's input (training data) and the decision boundary it learned. Your solution code should produce these three figures when you run it with monitor set to 1.
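
One possible way to collect the figures into a single PDF is matplotlib's PdfPages class. This is only a sketch under stated assumptions: the helper `save_open_figures` is hypothetical, the three figures here are placeholders standing in for the perceptron figures, the file is written to a temporary folder, and the Agg backend is selected only so that the example runs without a display. Saving each figure by hand and merging the files works just as well.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

def save_open_figures(pdf_path):
    """Write every currently open matplotlib figure to one multi-page PDF."""
    with PdfPages(pdf_path) as pdf:
        for num in plt.get_fignums():
            pdf.savefig(plt.figure(num))
    return pdf_path

# Placeholder figures standing in for the three perceptron figures.
for figi in (1, 2, 3):
    plt.figure(figi)
    plt.plot([0, 1], [1, 0])
    plt.title(f"placeholder figure {figi}")

out_path = save_open_figures(os.path.join(tempfile.mkdtemp(), "ps9.pdf"))
print(out_path)
```

In your own workflow you would call a helper like this after ps9 has drawn its figures, with the path set to ps9.pdf in your working folder.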