Recurrent neural networks

by Yee Wei Law - Friday, 31 January 2025, 3:17 PM
 

A recurrent neural network (RNN) is a neural network that maps an input space of sequences to an output space of sequences in a stateful way [RHW86, Mur22].

While convolutional neural networks (CNNs) excel at two-dimensional (2D), grid-like data such as images, recurrent neural networks (RNNs) are better suited to one-dimensional (1D), sequential data [GBC16, §9.11].

Unlike early artificial neural networks (ANNs), which have a feedforward structure, RNNs have a cyclic structure inspired by the cyclical connectivity of biological neurons; see Fig. 1.

The forward pass of an RNN is the same as that of a multilayer perceptron, except that activations arrive at a hidden layer from both the current external input and the hidden-layer activations from the previous timestep.

Fig. 1 visualises the operation of an RNN by “unfolding” or “unrolling” the network across timesteps, with the same network parameters applied at each timestep.

Note: The term “timestep” should be understood more generally as an index for sequential data.
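
To make the forward pass and the unrolling concrete, below is a minimal sketch of a vanilla (Elman-style) RNN layer in NumPy; the tanh activation and the parameter names W_xh, W_hh, b_h are illustrative assumptions, not prescribed by the text above.

```python
import numpy as np

def rnn_forward(xs, h0, W_xh, W_hh, b_h):
    """Unrolled forward pass of a vanilla RNN: the same parameters
    (W_xh, W_hh, b_h) are applied at every timestep."""
    h, hs = h0, []
    for x in xs:                                    # one iteration per timestep
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)      # new state from current input + previous state
        hs.append(h)
    return np.stack(hs)                             # hidden states for all timesteps

# Toy usage: a sequence of 5 three-dimensional inputs, 4 hidden units.
rng = np.random.default_rng(0)
xs = rng.standard_normal((5, 3))
W_xh, W_hh, b_h = rng.standard_normal((4, 3)), rng.standard_normal((4, 4)), np.zeros(4)
print(rnn_forward(xs, np.zeros(4), W_xh, W_hh, b_h).shape)   # (5, 4)
```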

For the backward pass, two well-known algorithms are applicable: 1️⃣ real-time recurrent learning and 2️⃣ the simpler, computationally more efficient backpropagation through time [Wer90].
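
To make option 2️⃣ concrete, here is a hedged sketch of backpropagation through time for a vanilla RNN like the one above, assuming an illustrative per-timestep squared-error loss L = ½ Σₜ ‖hₜ − yₜ‖²; the loss and parameter names are choices made for this sketch only.

```python
import numpy as np

def forward(xs, h0, W_xh, W_hh, b_h):
    """Forward pass, keeping every hidden state (needed by the backward pass)."""
    hs = [h0]
    for x in xs:
        hs.append(np.tanh(W_xh @ x + W_hh @ hs[-1] + b_h))
    return hs

def bptt(xs, ys, hs, W_xh, W_hh, b_h):
    """Backpropagation through time for the illustrative loss
    L = 0.5 * sum_t ||h_t - y_t||^2."""
    dW_xh, dW_hh, db_h = np.zeros_like(W_xh), np.zeros_like(W_hh), np.zeros_like(b_h)
    dh_next = np.zeros_like(hs[0])            # gradient arriving from timestep t+1
    for t in reversed(range(len(xs))):        # walk the unrolled graph backwards
        h, h_prev = hs[t + 1], hs[t]
        dh = (h - ys[t]) + dh_next            # direct loss term + recurrent term
        da = dh * (1.0 - h ** 2)              # back through tanh
        dW_xh += np.outer(da, xs[t])          # the same parameters act at every
        dW_hh += np.outer(da, h_prev)         # timestep, so their gradients accumulate
        db_h += da
        dh_next = W_hh.T @ da                 # send gradient to timestep t-1
    return dW_xh, dW_hh, db_h
```

In practice, the backward sum is often cut off after a fixed number of timesteps (truncated backpropagation through time) to bound memory and computation.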

Fig. 1: On the left, an RNN is often visualised as a neural network with recurrent connections. The recurrent connections should be understood, through unfolding or unrolling the network across timesteps, as applying the same network parameters to the current input and the previous state at each timestep. On the right, while the recurrent connections (blue arrows) propagate the network state over timesteps, the standard network connections (black arrows) propagate activations from one layer to the next within the same timestep. Diagram adapted from [ZLLS23, Figure 9.1].

Fig. 1 implies information flows in one direction, the direction associated with causality.

However, for many sequence labelling tasks, the correct output depends on the entire input sequence, or at least a sufficiently long input sequence. Examples of such tasks include speech recognition and language translation. Addressing this need gave rise to bidirectional RNNs [SP97].
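
The idea behind bidirectional RNNs can be sketched as follows, reusing the vanilla-RNN step from above: one pass reads the sequence from past to future, another from future to past with its own parameters, and the two hidden states for each timestep are concatenated. The function and parameter names are illustrative only.

```python
import numpy as np

def rnn_pass(xs, h0, W_xh, W_hh, b_h):
    """One directional pass of a vanilla RNN over a sequence."""
    h, hs = h0, []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        hs.append(h)
    return hs

def birnn_forward(xs, h0, fwd_params, bwd_params):
    """Bidirectional RNN sketch: separate parameters per direction, so each
    timestep's output sees both past and future context."""
    h_fwd = rnn_pass(xs, h0, *fwd_params)               # past -> future
    h_bwd = rnn_pass(xs[::-1], h0, *bwd_params)[::-1]   # future -> past, re-aligned
    return np.stack([np.concatenate([f, b]) for f, b in zip(h_fwd, h_bwd)])
```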

Standard/traditional RNNs suffer from the following deficiencies [Gra12, YSHZ19, MSO24]:

  • They are susceptible to the problems of vanishing gradients and exploding gradients, as illustrated by the sketch after this list.
  • They cannot store information for long periods of time.
  • Except for bidirectional RNNs, they access context information in only one direction (i.e., typically past information in the time domain).
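
The first deficiency can be illustrated with a toy calculation: the gradient that reaches timestep 0 from timestep T is a product of T per-timestep Jacobians, each roughly the transposed recurrent weight matrix scaled by activation-function derivatives, so its norm tends to shrink or grow geometrically with T. The matrix size, scale factors, and the linear (activation-free) simplification below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
Q = np.linalg.qr(rng.standard_normal((16, 16)))[0]       # orthogonal base matrix
for scale, label in [(0.9, "vanishing"), (1.1, "exploding")]:
    W_hh = scale * Q                                      # recurrent weights with norm scale
    grad = np.eye(16)
    for _ in range(100):                                  # 100 timesteps
        grad = W_hh.T @ grad                              # chain rule, linear case
    print(f"{label}: gradient norm factor ~ {np.linalg.norm(grad, 2):.2e}")
    # scale 0.9 -> about 2.7e-05 (vanishing); scale 1.1 -> about 1.4e+04 (exploding)
```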

Due to the drawbacks above, RNNs are typically equipped with gating mechanisms, which generalise “leaky” units (units that accumulate information over a long duration) by letting the network learn when to retain and when to discard information [GBC16, §10.10]. The resultant RNNs are called gated RNNs. The most successful gated RNNs are those using long short-term memory (LSTM) [VHMN20] or gated recurrent units (GRUs).
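
As an illustration of gating, below is a hedged sketch of a single GRU step in one common formulation (gate conventions differ across papers and libraries); the parameter names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell(x, h_prev, params):
    """One GRU step. The update gate z interpolates between keeping the old
    state and adopting the candidate state, i.e. a learnable leaky unit."""
    W_z, U_z, b_z, W_r, U_r, b_r, W_h, U_h, b_h = params
    z = sigmoid(W_z @ x + U_z @ h_prev + b_z)               # update gate
    r = sigmoid(W_r @ x + U_r @ h_prev + b_r)               # reset gate
    h_tilde = np.tanh(W_h @ x + U_h @ (r * h_prev) + b_h)   # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                 # gated interpolation
```

An LSTM cell follows the same gating idea but maintains a separate cell state and uses input, forget, and output gates [VHMN20].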

References

[GBC16] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016. Available at https://www.deeplearningbook.org.
[Gra12] A. Graves, Supervised Sequence Labelling with Recurrent Neural Networks, Springer Berlin, Heidelberg, 2012. https://doi.org/10.1007/978-3-642-24797-2.
[MSO24] I. D. Mienye, T. G. Swart, and G. Obaido, Recurrent neural networks: A comprehensive review of architectures, variants, and applications, Information 15 no. 9 (2024). https://doi.org/10.3390/info15090517.
[Mur22] K. P. Murphy, Probabilistic Machine Learning: An Introduction, MIT Press, 2022. Available at http://probml.ai.
[RHW86] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representations by back-propagating errors, Nature 323 (1986), 533–536. https://doi.org/10.1038/323533a0.
[SP97] M. Schuster and K. Paliwal, Bidirectional recurrent neural networks, IEEE Transactions on Signal Processing 45 no. 11 (1997), 2673–2681. https://doi.org/10.1109/78.650093.
[VHMN20] G. Van Houdt, C. Mosquera, and G. Nápoles, A review on the long short-term memory model, Artificial Intelligence Review 53 no. 8 (2020), 5929–5955. https://doi.org/10.1007/s10462-020-09838-1.
[Wer90] P. Werbos, Backpropagation through time: what it does and how to do it, Proceedings of the IEEE 78 no. 10 (1990), 1550–1560. https://doi.org/10.1109/5.58337.
[YSHZ19] Y. Yu, X. Si, C. Hu, and J. Zhang, A review of recurrent neural networks: LSTM cells and network architectures, Neural Computation 31 no. 7 (2019), 1235–1270. https://doi.org/10.1162/neco_a_01199.
[ZLLS23] A. Zhang, Z. C. Lipton, M. Li, and A. J. Smola, Dive into Deep Learning, Cambridge University Press, 2023. Available at https://d2l.ai/.