This week we will learn about recurrent neural networks (RNNs). RNNs contain loops in their connections, allowing information to be carried across the processing of different inputs. Put another way, an RNN can be viewed as a chain of smaller neural networks, each passing information to the one beside it.
Why would such a network be useful? Consider reading a sentence (much like you are doing now): you use the words you have previously read to better understand the word you are currently reading - your earlier reading provides context for processing the current word. This is exactly what happens in an RNN. This first video gives an overview of the topics we will discuss this week.
The forward pass for the first input to an RNN is identical to what we have learned previously: an input vector is transformed into hidden unit activities, and at some point the hidden unit activities are read out into an output vector. However, in an RNN the forward pass does not end at the first output - the network keeps computing until all the inputs and all the outputs of the sequence have been processed. This is fundamentally different from what we have discussed previously (for example, standard convolutional neural networks).
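One step of this forward pass can be sketched in a few lines of NumPy. This is a minimal illustration of a vanilla (Elman-style) RNN step, not code from the videos; the weight names (`W_xh`, `W_hh`, `W_hy`), the dimensions, and the choice of tanh are our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: input, hidden, and output sizes
n_in, n_hidden, n_out = 4, 8, 3

# Hypothetical weight matrices and biases (small random values)
W_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input -> hidden
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden -> hidden
W_hy = rng.normal(scale=0.1, size=(n_out, n_hidden))     # hidden -> output
b_h = np.zeros(n_hidden)
b_y = np.zeros(n_out)

x = rng.normal(size=n_in)    # the current input vector
h_prev = np.zeros(n_hidden)  # previous hidden state (all zeros at t = 0)

# One forward step: update the hidden state, then read out the output
h = np.tanh(W_xh @ x + W_hh @ h_prev + b_h)
y = W_hy @ h + b_y
```

Note that the only ingredient beyond a standard feedforward layer is the `W_hh @ h_prev` term, which mixes in the hidden state left over from the previous input.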
What changes is that the hidden state keeps track of information and passes it along to the hidden state used when processing the next input. The chain-like structure of RNNs is what allows this transfer of knowledge to happen. In the next video, we discuss how the forward pass happens in a very simple RNN.
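Threading the hidden state through an entire sequence then amounts to repeating that single step in a loop. The sketch below makes the same illustrative assumptions as before (weight names, tanh nonlinearity, an output at every step); it is one simple way to write the forward pass, not the specific network from the videos.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    """Run a vanilla RNN over a sequence, threading the hidden state."""
    h = np.zeros(W_hh.shape[0])  # start with an empty hidden state
    hs, ys = [], []
    for x in xs:  # one step per input in the sequence
        # The hidden state carries context forward from earlier inputs
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        hs.append(h)
        ys.append(W_hy @ h + b_y)  # read out an output at every step
    return hs, ys

# Tiny example with random weights and a length-4 sequence
rng = np.random.default_rng(1)
n_in, n_hidden, n_out, T = 3, 5, 2, 4
params = (rng.normal(scale=0.1, size=(n_hidden, n_in)),
          rng.normal(scale=0.1, size=(n_hidden, n_hidden)),
          rng.normal(scale=0.1, size=(n_out, n_hidden)),
          np.zeros(n_hidden), np.zeros(n_out))
xs = rng.normal(size=(T, n_in))
hs, ys = rnn_forward(xs, *params)
```

The key point is that the same weights are reused at every step; only the hidden state changes as it accumulates context.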
Once we have completed the entire forward pass and have our sequential output, we are ready to calculate how good our network was and update the weights accordingly. To do this, we backpropagate the loss through the network (over both weights and time), a procedure known as backpropagation through time (BPTT). This process is the same as the one we discussed for convolutional neural networks, with one change: we must also account for the loss incurred at subsequent time steps, because information is passed along to the future. Put another way, if you misinterpret a word in a sentence, your understanding of the following words changes. The error here is driven by the fact that you misunderstood a word early in the sentence - not that you failed to understand each word individually. BPTT addresses exactly these kinds of temporal dependencies.
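To make the "account for later time steps" idea concrete, here is a sketch of BPTT for the same vanilla RNN, assuming (our choice, for illustration) a squared-error loss at every step. The line to focus on is the one computing `dh`: the gradient at step t combines the error from that step's own output with `dh_next`, the gradient flowing back from all later steps.

```python
import numpy as np

def bptt(xs, targets, W_xh, W_hh, W_hy, b_h, b_y):
    """Forward pass, then backpropagation through time, with a
    squared-error loss at every step (an illustrative setup)."""
    T, n_hidden = len(xs), W_hh.shape[0]
    hs = [np.zeros(n_hidden)]  # hs[t + 1] holds the hidden state at step t
    ys = []
    for x in xs:
        hs.append(np.tanh(W_xh @ x + W_hh @ hs[-1] + b_h))
        ys.append(W_hy @ hs[-1] + b_y)

    # Weight gradients, accumulated over every time step
    dW_xh, dW_hh, dW_hy = (np.zeros_like(W) for W in (W_xh, W_hh, W_hy))
    dh_next = np.zeros(n_hidden)  # gradient flowing back from the future
    for t in reversed(range(T)):
        dy = ys[t] - targets[t]  # d(0.5 * ||y - target||^2) / dy
        dW_hy += np.outer(dy, hs[t + 1])
        # Error at step t comes from its own output AND from later steps
        dh = W_hy.T @ dy + dh_next
        dz = dh * (1 - hs[t + 1] ** 2)  # backprop through tanh
        dW_xh += np.outer(dz, xs[t])
        dW_hh += np.outer(dz, hs[t])    # hs[t] is the previous hidden state
        dh_next = W_hh.T @ dz           # hand the gradient to step t - 1
    return dW_xh, dW_hh, dW_hy

# Run it on a tiny random problem
rng = np.random.default_rng(2)
n_in, n_hidden, n_out, T = 3, 5, 2, 4
Ws = (rng.normal(scale=0.1, size=(n_hidden, n_in)),
      rng.normal(scale=0.1, size=(n_hidden, n_hidden)),
      rng.normal(scale=0.1, size=(n_out, n_hidden)))
grads = bptt(rng.normal(size=(T, n_in)), rng.normal(size=(T, n_out)),
             *Ws, np.zeros(n_hidden), np.zeros(n_out))
```

Because the same weight matrices are reused at every step, each gradient is a sum of contributions from all time steps - which is exactly why an early mistake can produce error signals far downstream.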
In the BPTT video, we do not discuss the derivations of the partial derivatives in detail. Such derivations are lengthy and do not necessarily add to your conceptual understanding of the process. Below is a list of resources with more information on the derivations, along with further reading on BPTT and RNNs in general: