A recurrent neural network (RNN) is a neural network that maps an input space of sequences to an output space of sequences in a stateful way [RHW86, Mur22].
While convolutional neural networks excel at two-dimensional (2D) data such as images, RNNs are better suited to one-dimensional (1D), sequential data [GBC16, §9.11].
Unlike early artificial neural networks (ANNs), which have a feedforward structure, RNNs have a cyclic structure, inspired by the cyclical connectivity of biological neurons; see Fig. 1.
The forward pass of an RNN is the same as that of a multilayer perceptron, except that activations arrive at a hidden layer from both the current external input and the hidden-layer activations from the previous timestep.
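In a common formulation (the symbols and the choice of nonlinearity below are illustrative, not mandated by the text), the hidden state combines the current input with the previous hidden state, and the output is read off the hidden state:

$$ \mathbf{h}_t = \phi\left(\mathbf{W}_{xh}\,\mathbf{x}_t + \mathbf{W}_{hh}\,\mathbf{h}_{t-1} + \mathbf{b}_h\right), \qquad \mathbf{y}_t = \mathbf{W}_{hy}\,\mathbf{h}_t + \mathbf{b}_y, $$

where $\phi$ is typically an elementwise nonlinearity such as $\tanh$, and the same matrices $\mathbf{W}_{xh}$, $\mathbf{W}_{hh}$, $\mathbf{W}_{hy}$ are reused at every timestep.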
Fig. 1 visualises the operation of an RNN by “unfolding” or “unrolling” the network across timesteps, with the same network parameters applied at each timestep.
Note: The term “timestep” should be understood more generally as an index for sequential data.
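As a concrete illustration of unrolling with shared parameters, the following minimal NumPy sketch (array shapes, initialisation, and the tanh nonlinearity are assumptions for illustration) applies the same weights at every timestep:

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h, h0):
    """Unroll a vanilla RNN over a sequence.

    xs : array of shape (T, input_dim), one row per timestep.
    h0 : initial hidden state of shape (hidden_dim,).
    The *same* W_xh, W_hh, b_h are reused at every timestep.
    """
    h = h0
    hs = []
    for x_t in xs:                      # one iteration per timestep
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        hs.append(h)
    return np.stack(hs)                 # shape (T, hidden_dim)

# Tiny usage example with random parameters.
rng = np.random.default_rng(0)
input_dim, hidden_dim, T = 3, 5, 7
hs = rnn_forward(rng.normal(size=(T, input_dim)),
                 rng.normal(size=(hidden_dim, input_dim)),
                 rng.normal(size=(hidden_dim, hidden_dim)),
                 np.zeros(hidden_dim),
                 np.zeros(hidden_dim))
print(hs.shape)  # (7, 5)
```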
For the backward pass, two well-known algorithms are applicable: (1) real-time recurrent learning and (2) the simpler, computationally more efficient backpropagation through time [Wer90].
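Backpropagation through time is ordinary backpropagation applied to the unrolled graph, with gradients summed over all timesteps into the shared parameters. A minimal sketch using PyTorch's autograd (the dimensions and the loss on the final state are arbitrary choices for illustration):

```python
import torch

T, input_dim, hidden_dim = 6, 3, 4
W_xh = torch.randn(hidden_dim, input_dim, requires_grad=True)
W_hh = torch.randn(hidden_dim, hidden_dim, requires_grad=True)
b_h = torch.zeros(hidden_dim, requires_grad=True)

xs = torch.randn(T, input_dim)
h = torch.zeros(hidden_dim)
for x_t in xs:                    # unrolled forward pass over T timesteps
    h = torch.tanh(W_xh @ x_t + W_hh @ h + b_h)

loss = h.pow(2).sum()             # arbitrary scalar loss on the final state
loss.backward()                   # BPTT: gradients flow back through all T steps
print(W_hh.grad.shape)            # (hidden_dim, hidden_dim), accumulated over timesteps
```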
Fig. 1: On the left, an RNN is often visualised as a neural network with recurrent connections. The recurrent connections should be understood, through unfolding or unrolling the network across timesteps, as applying the same network parameters to the current input and the previous state at each timestep. On the right, while the recurrent connections (blue arrows) propagate the network state over timesteps, the standard network connections (black arrows) propagate activations from one layer to the next within the same timestep. Diagram adapted from [ZLLS23, Figure 9.1].
Fig. 1 implies that information flows in only one direction: the direction associated with causality, from past to future.
However, for many sequence-labelling tasks, the correct output depends on the entire input sequence, or at least on a sufficiently long portion of it. Examples of such tasks include speech recognition and language translation. Addressing the needs of these tasks gave rise to bidirectional RNNs [SP97].
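A bidirectional RNN runs one recurrence forward over the sequence and a second, independently parameterised recurrence backward over the reversed sequence, then combines the two states at each timestep. The sketch below builds on the hypothetical rnn_forward function from the earlier NumPy sketch; concatenation is one common way to combine the states:

```python
def birnn_forward(xs, fwd_params, bwd_params, h0):
    """Concatenate forward and backward hidden states per timestep.

    fwd_params / bwd_params are (W_xh, W_hh, b_h) tuples for the two directions.
    """
    hs_fwd = rnn_forward(xs, *fwd_params, h0)           # processes x_1 ... x_T
    hs_bwd = rnn_forward(xs[::-1], *bwd_params, h0)     # processes x_T ... x_1
    # Reverse the backward states so both arrays are aligned in original time order.
    return np.concatenate([hs_fwd, hs_bwd[::-1]], axis=1)   # shape (T, 2 * hidden_dim)
```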
Standard (traditional) RNNs suffer from the following deficiencies [Gra12, YSHZ19, MSO24]:
They cannot store information for long periods of time, because gradients propagated across many timesteps tend to vanish (or explode); see the sketch after this list.
Except for bidirectional RNNs, they access context information in only one direction (i.e., typically past information in the time domain).
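Loosely speaking, the gradient linking a loss at one timestep to an input many steps earlier contains a product of per-step Jacobians involving the recurrent weights; when their spectral radius is below one this product shrinks geometrically, and when it is above one it blows up. A rough numerical illustration (the matrix scale and horizon are arbitrary, and the nonlinearity's slope is ignored):

```python
import numpy as np

rng = np.random.default_rng(1)
W_hh = 0.5 * rng.normal(size=(8, 8)) / np.sqrt(8)   # small recurrent weights (spectral radius ~0.5)
jacobian_product = np.eye(8)
for t in range(50):                                  # propagate the gradient 50 timesteps back
    jacobian_product = jacobian_product @ W_hh.T     # ignoring the nonlinearity's derivative
print(np.linalg.norm(jacobian_product))              # ~1e-15: the gradient signal has vanished
```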
Due to the drawbacks above, RNNs are typically augmented with gating mechanisms that, like “leaky” units, enable the networks to accumulate information over long durations [GBC16, §10.10]. The resultant RNNs are called gated RNNs. The most successful gated RNNs are those using long short-term memory (LSTM) [VHMN20] or gated recurrent units (GRUs).
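As a sketch of what “gating” means in practice, one common formulation of the GRU updates the state as a learned, per-unit interpolation between the previous state and a candidate state (the weight names, shapes, and gate convention below are illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, P):
    """One GRU update; P is a dict of weight matrices and bias vectors."""
    z = sigmoid(P["W_z"] @ x_t + P["U_z"] @ h_prev + P["b_z"])              # update gate
    r = sigmoid(P["W_r"] @ x_t + P["U_r"] @ h_prev + P["b_r"])              # reset gate
    h_tilde = np.tanh(P["W_h"] @ x_t + P["U_h"] @ (r * h_prev) + P["b_h"])  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde   # gated, "leaky" interpolation of old and new
```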
[GBC16]
I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016. Available at https://www.deeplearningbook.org/.
[Gra12]
A. Graves, Supervised Sequence Labelling with Recurrent Neural Networks, Springer, 2012.
[MSO24]
I. D. Mienye, T. G. Swart, and G. Obaido, Recurrent neural networks: A comprehensive review of architectures, variants, and applications, Information 15 no. 9 (2024). https://doi.org/10.3390/info15090517.
[Mur22]
K. P. Murphy, Probabilistic Machine Learning: An Introduction, MIT Press, 2022. Available at http://probml.ai.
[RHW86]
D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representations by back-propagating errors, Nature 323 (1986), 533–536. https://doi.org/10.1038/323533a0.
[SP97]
M. Schuster and K. Paliwal, Bidirectional recurrent neural networks, IEEE Transactions on Signal Processing 45 no. 11 (1997), 2673–2681. https://doi.org/10.1109/78.650093.
[VHMN20]
G. Van Houdt, C. Mosquera, and G. Nápoles, A review on the long short-term memory model, Artificial Intelligence Review 53 no. 8 (2020), 5929–5955. https://doi.org/10.1007/s10462-020-09838-1.
[Wer90]
P. Werbos, Backpropagation through time: what it does and how to do it, Proceedings of the IEEE 78 no. 10 (1990), 1550–1560. https://doi.org/10.1109/5.58337.
[YSHZ19]
Y. Yu, X. Si, C. Hu, and J. Zhang, A review of recurrent neural networks: LSTM cells and network architectures, Neural Computation 31 no. 7 (2019), 1235–1270. https://doi.org/10.1162/neco_a_01199.
[ZLLS23]
A. Zhang, Z. C. Lipton, M. Li, and A. J. Smola, Dive into Deep Learning, Cambridge University Press, 2023. Available at https://d2l.ai/.