
D


Dropout

by Yee Wei Law - Tuesday, 20 June 2023, 2:35 PM
 

Deep neural networks (DNNs) employ a large number of parameters to learn complex dependencies of outputs on inputs, but overfitting often occurs as a result.

Large DNNs are also slow to converge.

The dropout method implements the intuitive idea of randomly dropping units (along with their connections) from a network during training [SHK+14].

Fig. 1: Sample effect of applying dropout to a neural network in (a). The thinned network in (b) has units marked with a cross removed [SHK+14, Figure 1].
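To make the idea concrete, here is a minimal numpy sketch of inverted dropout (a common variant; the function and variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p=0.5, training=True):
    """Inverted dropout: during training, zero each unit with probability p
    and scale the survivors by 1/(1-p) so the expected activation is
    unchanged; at test time, return the activations untouched."""
    if not training or p == 0.0:
        return h
    mask = rng.random(h.shape) >= p     # True = unit kept
    return h * mask / (1.0 - p)

h = np.ones((2, 4))                     # toy activations
thinned = dropout(h, p=0.5)             # entries are either 0.0 or 2.0
```

Each call samples a fresh mask, so every training pass trains a different "thinned" network, as in Fig. 1(b).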

References

[SHK+14] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, Journal of Machine Learning Research 15 no. 56 (2014), 1929–1958. Available at http://jmlr.org/papers/v15/srivastava14a.html.

F


Few-shot learning

by Yee Wei Law - Thursday, 16 February 2023, 3:29 PM
 
Definition 1: Few-shot learning [WYKN20, Definition 2.2]

A type of machine learning problem (specified by experience E, task T and performance measure P), where E contains only a limited number of examples with supervised information for T.

References

[WYKN20] Y. Wang, Q. Yao, J. T. Kwok, and L. M. Ni, Generalizing from a few examples: A survey on few-shot learning, ACM Comput. Surv. 53 no. 3 (2020). https://doi.org/10.1145/3386252.

I


Invariance and equivariance

by Yee Wei Law - Monday, 6 January 2025, 3:50 PM
 

A function f of an input x is invariant to a transformation t if

f(t(x)) = f(x).

In other words, function f is invariant to transformation t if f produces the same output whether or not the input is transformed by t [Pri23, §10.1].

For example, an image classifier should be invariant to geometric transformations of an image.

A function f of an input x is equivariant or covariant to a transformation t if

f(t(x)) = t(f(x)).

In other words, function f is equivariant or covariant to transformation t if the output of f changes in the same way under t as the input does [Pri23, §10.1].

For example, when an input image is geometrically transformed in some way, the output of an image segmentation algorithm should be transformed in the same way.
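A tiny numerical check of the two properties, using sequence reversal as the transformation (my own toy choice):

```python
import numpy as np

x = np.array([3.0, 1.0, 2.0])
t = lambda v: v[::-1]                  # transformation t: reverse the sequence

f_inv = np.sum                         # summation is invariant to reversal
f_eqv = lambda v: v ** 2               # element-wise squaring is equivariant

print(f_inv(t(x)) == f_inv(x))                    # invariance: f(t(x)) = f(x)
print(np.array_equal(f_eqv(t(x)), t(f_eqv(x))))   # equivariance: f(t(x)) = t(f(x))
```

Both checks print True: the sum ignores ordering entirely, while element-wise squaring commutes with reversal.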

References

[Pri23] S. J. Prince, Understanding Deep Learning, MIT Press, 2023. Available at http://udlbook.com.

K


Kats by Facebook Research

by Yee Wei Law - Saturday, 29 June 2024, 11:22 PM
 

Facebook Research’s Kats has been billed as a “one-stop shop for time series analysis in Python”. Kats supports standard time-series analyses, e.g., forecasting and anomaly/outlier detection.

The official installation instructions, however, do not work out of the box. At the time of writing, they lead to the error message “python setup.py bdist_wheel did not run successfully”, due to incompatibility with the latest version of Python.

Based on community responses to the error message, and on my personal experience, the following instructions work:

conda install python=3.7 pip setuptools ephem pystan fbprophet
pip install kats
pip install packaging==21.3

The following sample code should run error-free:

import numpy as np
import pandas as pd

from kats.consts import TimeSeriesData
from kats.detectors.cusum_detection import CUSUMDetector

# simulate time series with increase
np.random.seed(10)
df_increase = pd.DataFrame(
    {
        'time': pd.date_range('2019-01-01', '2019-03-01'),
        'increase': np.concatenate([np.random.normal(1, 0.2, 30), np.random.normal(2, 0.2, 30)]),
    }
)

# convert to TimeSeriesData object
timeseries = TimeSeriesData(df_increase)

# run detector and find change points
change_points = CUSUMDetector(timeseries).detector()

L


Long short-term memory (LSTM)

by Yee Wei Law - Friday, 31 January 2025, 3:37 PM
 

A long short-term memory (LSTM) network is a type of recurrent neural network (RNN) designed to address the problems of vanishing gradients and exploding gradients using gradient truncation and structures called “constant error carousels” for enforcing constant (as opposed to vanishing or exploding) error flow [HS97].

LSTM solves such a fundamental problem with traditional RNNs that most of the state-of-the-art results achieved through RNNs can be attributed to LSTM [YSHZ19].

An LSTM network replaces the traditional neural network layers with LSTM layers, each of which consists of a set of recurrently connected, differentiable memory blocks [GS05].

Each LSTM block typically contains one recurrently connected memory cell, called an LSTM cell (to be distinguished from a neuron, which is also called a node or unit), but can contain multiple cells.

Fig. 1 illustrates the structure of an LSTM cell, which acts on the current input, x_t, and the output of the preceding LSTM cell, h_{t-1}.

The forget gate is a later addition [GSC00] to the original LSTM design; it determines, based on x_t and h_{t-1}, the amount of information to be discarded from the cell state [YSHZ19]:

f_t = σ(W_f x_t + U_f h_{t-1} + b_f),
c̃_t = tanh(W_c x_t + U_c h_{t-1} + b_c),
i_t = σ(W_i x_t + U_i h_{t-1} + b_i),
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t,

where σ is the logistic sigmoid function and ⊙ denotes element-wise multiplication. In the preceding equations,

  • W_f, U_f and b_f are the weights and bias associated with the forget gate;
  • W_c, U_c and b_c are the weights and bias associated with the cell;
  • W_i, U_i and b_i are the weights and bias associated with the input gate.

When the output of the forget gate, f_t, is 1, all information in c_{t-1} is retained, and when the output is zero, all information is discarded.

The cell output, h_t, is the product:

o_t = σ(W_o x_t + U_o h_{t-1} + b_o),
h_t = o_t ⊙ tanh(c_t),

where W_o, U_o and b_o are the weights and bias associated with the output gate.

Fig. 1: An LSTM block with one memory cell, which contains a forget gate acting on the current input, x_t, and the output of the preceding LSTM cell, h_{t-1}. c_{t-1} and c_t are the cell states for the preceding cell and current cell respectively. While the forget gate scales the cell state, the input and output gates scale the input and output of the cell respectively. The activation functions σ and tanh, also called squashing functions, are usually the logistic sigmoid and hyperbolic tangent respectively. The multiplication represented by ⨀ is element-wise. Omitted from the diagram are the weights and biases associated with the 1️⃣ forget gate, 2️⃣ cell, 3️⃣ input gate, and 4️⃣ output gate. Diagram adapted from [VHMN20, Fig. 1], [YSHZ19, Figure 3] and [GS05, Fig. 1].
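As a sanity check of the gate mechanics, here is a single LSTM step in numpy under the standard formulation; the dictionary-based parameter names (W, U, b keyed by gate) and the toy dimensions are my own:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b are dicts keyed by 'f' (forget), 'i' (input),
    'c' (cell) and 'o' (output)."""
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])       # forget gate
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])       # input gate
    c_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                           # new cell state
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])       # output gate
    h_t = o_t * np.tanh(c_t)                                     # cell output
    return h_t, c_t

# toy dimensions: 3 input features, 2 hidden units
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((2, 3)) for k in 'fico'}
U = {k: rng.standard_normal((2, 2)) for k in 'fico'}
b = {k: np.zeros(2) for k in 'fico'}
h, c = lstm_step(rng.standard_normal(3), np.zeros(2), np.zeros(2), W, U, b)
```

Because the output gate and tanh both squash their arguments, the entries of h always lie within (-1, 1).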

LSTM networks can be classified into two main types [YSHZ19, VHMN20]:

LSTM-dominated networks

These are neural networks with LSTM cells as the dominant building blocks.

The design of these networks focuses on optimising the interconnections of the LSTM cells.

Examples include bidirectional LSTM networks, which are extensions of bidirectional RNNs.

The original bidirectional LSTM network [GS05] uses a variation of backpropagation through time [Wer90] for training.

Integrated LSTM networks

These are hybrid neural networks consisting of LSTM and non-LSTM layers.

The design of these networks focuses on integrating the strengths of the different types of layers.

For example, convolutional layers and LSTM layers have been integrated in a wide variety of ways.

Among the many possibilities, the CNN-LSTM architecture is widely used. It can for example be used to predict residential energy consumption[KC19]:

  1. Kim and Cho’s design[KC19] consists of two convolutional-pooling layers, an LSTM layer and two fully connected (or dense) layers.
  2. The convolutional-pooling layers extract features among several variables that affect energy consumption prediction.
  3. The output of the convolutional-pooling layers is fed to the LSTM layer, after denoising, to extract temporal features. The LSTM layer can remember irregular trends.
  4. The output of the LSTM layer is fed to two fully connected layers, the second of which generates a predicted time series of energy consumption.

References

[GBC16] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016. Available at https://www.deeplearningbook.org.
[Gra12] A. Graves, Supervised Sequence Labelling with Recurrent Neural Networks, Springer Berlin, Heidelberg, 2012. https://doi.org/10.1007/978-3-642-24797-2.
[GSC00] F. A. Gers, J. Schmidhuber, and F. Cummins, Learning to forget: Continual prediction with LSTM, Neural Computation 12 no. 10 (2000), 2451–2471. https://doi.org/10.1162/089976600300015015.
[GS05] A. Graves and J. Schmidhuber, Framewise phoneme classification with bidirectional LSTM networks, in 2005 IEEE International Joint Conference on Neural Networks, 4, 2005, pp. 2047–2052. https://doi.org/10.1109/IJCNN.2005.1556215.
[HS97] S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation 9 no. 8 (1997), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735.
[KC19] T.-Y. Kim and S.-B. Cho, Predicting residential energy consumption using CNN-LSTM neural networks, Energy 182 (2019), 72–81. https://doi.org/10.1016/j.energy.2019.05.230.
[Mur22] K. P. Murphy, Probabilistic Machine Learning: An Introduction, MIT Press, 2022. Available at http://probml.ai.
[VHMN20] G. Van Houdt, C. Mosquera, and G. Nápoles, A review on the long short-term memory model, Artificial Intelligence Review 53 no. 8 (2020), 5929–5955. https://doi.org/10.1007/s10462-020-09838-1.
[Wer90] P. Werbos, Backpropagation through time: what it does and how to do it, Proceedings of the IEEE 78 no. 10 (1990), 1550–1560. https://doi.org/10.1109/5.58337.
[YSHZ19] Y. Yu, X. Si, C. Hu, and J. Zhang, A review of recurrent neural networks: LSTM cells and network architectures, Neural Computation 31 no. 7 (2019), 1235–1270. https://doi.org/10.1162/neco_a_01199.
[ZLLS23] A. Zhang, Z. C. Lipton, M. Li, and A. J. Smola, Dive into Deep Learning, Cambridge University Press, 2023. Available at https://d2l.ai/.

M


Machine learning (including deep learning)

by Yee Wei Law - Saturday, 17 May 2025, 5:13 PM
 

For COMP 5075 students, the Tutorial 3 page is more up-to-date.

Since the mid-2010s, advances in machine learning (ML) and particularly deep learning (DL), under the banner of artificial intelligence (AI), have been attracting not only media attention but also major capital investments.

The field of ML is decades old, but it was not until 2012, when deep neural networks (DNNs) emerged triumphant in the ImageNet image classification challenge [KSH17], that it truly took off.

DL is known to have approached or even exceeded human-level performance in many tasks.

DL techniques, especially DNN algorithms, are our main pursuit in this course, but before diving into them, we should get a clear idea of the differences among ML, DL and AI, starting with the definitions below.

AI has the broadest and yet most elusive definition. There are four main schools of thought [RN22, Sec. 1.1; IBM23], namely 1️⃣ systems that think like humans, 2️⃣ systems that act like humans, 3️⃣ systems that think rationally, 4️⃣ systems that act rationally; but a sensible definition boils down to:

Definition 1: Artificial intelligence (AI) [RN22, Sec. 1.1.4]

The study and construction of rational agents that pursue their predefined objectives.

Above, a rational agent is one that acts so as to achieve the best outcome or, when there is uncertainty, the best expected outcome [RN22, Sec. 1.1.4].

The definition of AI above is referred to as the standard model of AI [RN22, Sec. 1.1.4].

ML is a subfield of AI:

Definition 2: Machine learning (ML) [Mit97, Sec. 1.1]

A computer program or machine is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

In the preceding definition, “experience”, “task” and “performance measure” require elaboration. Among the most common ML tasks are:

  • Classification: This is usually achieved through supervised learning (see Fig. 1), the aim of which is to learn a mapping f from the input set X to the output set Y [Mur22, Sec. 1.2; GBC16, Sec. 5.1.3; G22, Ch. 1], where

    • every member x of X is a vector of features, attributes, covariates, or predictors;
    • every member y of Y is a label, target, or response;
    • each pair (x, y) of an input and its associated output is called an example.

    A dataset containing examples and used to “train” a model to predict/infer y given some x is called a training set; this corresponds to experience E in Definition 2.

    When Y is a set of unordered and mutually exclusive labels known as classes, the supervised learning task becomes a classification task.

    Classification involving only two classes is called binary classification. For example, determining whether an email is spam or not is a binary classification task.

    Fig. 1: Supervised learning [ZLLS23, Fig. 1.3.1].

    Fig. 2: An example of a regression problem, where given a “new instance”, the target value is to be determined [G22, Figure 1-6].
  • Regression: Continuing from classification, if the output set Y is a continuous set of real values, rather than a discrete set, the supervised learning task becomes a regression task.

    For example, given the features (e.g., mileage, age, brand, model) and associated price for many examples of cars, a plausible regression task is to predict the price of a car given its features; see Fig. 2.

    While the term “label” is more common in the classification context, “target” is more common in the regression context. In the earlier example, the target is the car price.

  • Clustering: This is the grouping of similar things together.

    The Euclidean distance between two feature vectors can serve as a similarity measure, but depending on the problem, other similarity measures can be more suitable. In fact, many similarity measures have been proposed in the literature [GMW07, Ch. 6].

    Clustering is a form of unsupervised learning.

    From a probabilistic viewpoint, unsupervised learning is fitting an unconditional model of the form p(x), which can generate new data x, whereas supervised learning involves fitting a conditional model, p(y|x), which specifies (a distribution over) outputs given inputs x [Mur22, Sec. 1.3].

  • Anomaly detection: This is another form of unsupervised learning, and highly relevant to this course of ours.

    We first encountered anomaly detection in Tutorial 1 on intrusion detection, and we will dive deep into anomaly detection in Tutorial 5 on unsupervised learning.
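Returning to the regression example above: a least-squares fit over made-up mileage/price numbers (all values hypothetical) illustrates predicting a target for a new instance:

```python
import numpy as np

# hypothetical training examples: mileage (thousands of km) -> price (k$)
mileage = np.array([20.0, 40.0, 60.0, 80.0])
price = np.array([30.0, 25.0, 20.0, 15.0])

# fit price ≈ slope * mileage + intercept by ordinary least squares
A = np.column_stack([mileage, np.ones_like(mileage)])
(slope, intercept), *_ = np.linalg.lstsq(A, price, rcond=None)

new_instance = 70.0                                  # mileage of a "new instance"
predicted_price = slope * new_instance + intercept   # 17.5 for this toy data
```

On this perfectly linear toy data the fit recovers slope -0.25 and intercept 35 exactly; real data would of course carry noise.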

Common to the aforementioned tasks is the need to measure performance. An example of a performance measure is accuracy [Goo22a], the proportion of predictions that are correct.

Other performance measures will be investigated as part of Task 1.
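As one concrete performance measure, accuracy (the fraction of correct predictions) can be computed from made-up labels as follows:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1])   # ground-truth labels (hypothetical)
y_pred = np.array([1, 0, 0, 1, 0, 1])   # a classifier's predictions

accuracy = np.mean(y_true == y_pred)    # 5 of 6 correct
print(accuracy)                         # 0.8333...
```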

DL is in turn a subfield of ML (see Fig. 3):

Definition 3: Deep learning (DL) [RN22, Sec. 1.3.8]

Machine learning using multiple layers of simple, adjustable computing elements.

Simply put, DL is the ever expanding body of ML techniques that leverage deep architectures (algorithmic structures consisting of many levels of nonlinear operations) for learning feature hierarchies, with features from higher levels of the hierarchy formed by composition of lower-level features [Ben09].

Fig. 3: AI → ML → DL [Cop16].

The rest of this tutorial attempts to 1️⃣ shed some light on why DNNs are superior to classical ML algorithms, 2️⃣ provide a brief tutorial on the original/shallow/artificial neural networks (ANNs), and 3️⃣ provide a preview of DNNs.

The good news with the topics of this tutorial is that there is such a vast amount of learning resources in the public domain that, even if the coverage here fails to satisfy your learning needs, there must be some resources out there that can.

References

[Agg18] C. C. Aggarwal, Neural Networks and Deep Learning: A Textbook, Springer Cham, 2018, supplementary material at http://sn.pub/extras. https://doi.org/10.1007/978-3-319-94463-0.
[Ben09] Y. Bengio, Learning Deep Architectures for AI, Foundations and Trends® in Machine Learning 2 no. 1 (2009), 1–127. https://doi.org/10.1561/2200000006.
[Cop16] M. Copeland, What’s the difference between artificial intelligence, machine learning and deep learning?, NVIDIA blog, July 2016. Available at https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/.
[DG06] J. Davis and M. Goadrich, The Relationship between Precision-Recall and ROC Curves, in Proceedings of the 23rd International Conference on Machine Learning, ICML ’06, Association for Computing Machinery, 2006, pp. 233–240. https://doi.org/10.1145/1143844.1143874.
[G22] A. Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd ed., O’Reilly Media, Inc., 2022. Available at https://learning.oreilly.com/library/view/hands-on-machine-learning/9781098125967/.
[GBC16] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016. Available at http://www.deeplearningbook.org.
[GMW07] G. Gan, C. Ma, and J. Wu, Data Clustering: Theory, Algorithms, and Applications, Society for Industrial and Applied Mathematics, 2007. https://doi.org/10.1137/1.9780898718348.
[Goo22a] Google, Classification: Accuracy, Machine Learning Crash Course, July 2022. Available at https://developers.google.com/machine-learning/crash-course/classification/accuracy.
[Goo22b] Google, Classification: Precision and Recall, Machine Learning Crash Course, July 2022. Available at https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall.
[Goo22c] Google, Classification: ROC Curve and AUC, Machine Learning Crash Course, July 2022. Available at https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc.
[IBM23] IBM, What is artificial intelligence (AI)?, IBM Topics, 2023. Available at https://www.ibm.com/topics/artificial-intelligence.
[KSH17] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Commun. ACM 60 no. 6 (2017), 84–90, journal version of the paper with the same name that appeared in the 25th International Conference on Neural Information Processing Systems in 2012. https://doi.org/10.1145/3065386.
[LL19] H. Liu and B. Lang, Machine learning and deep learning methods for intrusion detection systems: A survey, Applied Sciences 9 no. 20 (2019). https://doi.org/10.3390/app9204396.
[LXL+22] X. Li, H. Xiong, X. Li, X. Wu, X. Zhang, J. Liu, J. Bian, and D. Dou, Interpretable deep learning: interpretation, interpretability, trustworthiness, and beyond, Knowledge and Information Systems 64 no. 12 (2022), 3197–3234. https://doi.org/10.1007/s10115-022-01756-8.
[Mit97] T. C. Mitchell, Machine Learning, McGraw-Hill, 1997. Available at http://www.cs.cmu.edu/~tom/mlbook.html.
[Mur22] K. P. Murphy, Probabilistic Machine Learning: An introduction, MIT Press, 2022. Available at http://probml.ai.
[Mur23] K. P. Murphy, Probabilistic Machine Learning: Advanced Topics, MIT Press, 2023. Available at http://probml.github.io/book2.
[NCS22] NCSC, Principles for the security of machine learning, guidance from the National Cyber Security Centre, August 2022. Available at https://www.ncsc.gov.uk/files/Principles-for-the-security-of-machine-learning.pdf.
[PG17] J. Patterson and A. Gibson, Deep Learning: A Practitioner’s Approach, O’Reilly Media, Inc., August 2017. Available at https://learning.oreilly.com/library/view/deep-learning/9781491924570/.
[RN22] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 4th ed., Pearson Education, 2022. Available at https://ebookcentral.proquest.com/lib/unisa/reader.action?docID=6563563.
[TBH+19] E. Tabassi, K. J. Burns, M. Hadjimichael, A. D. Molina-Markham, and J. T. Sexton, A taxonomy and terminology of adversarial machine learning, Draft NISTIR 8269, National Institute of Standards and Technology, 2019. https://doi.org/10.6028/NIST.IR.8269-draft.
[ZLLS23] A. Zhang, Z. C. Lipton, M. Li, and A. J. Smola, Dive into Deep Learning, 2023, interactive online book, accessed 1 Jan 2023. Available at https://d2l.ai/.

P


Problems of vanishing gradients and exploding gradients

by Yee Wei Law - Monday, 27 January 2025, 4:46 PM
 

This knowledge base entry follows discussion of artificial neural networks and backpropagation.

The backpropagation (“backprop” for short) algorithm calculates gradients to update each weight.

Unfortunately, gradients often shrink as the algorithm progresses down to the lower layers, with the result that the lower layers’ weights remain virtually unchanged, and training fails to converge to a good solution — this is called the vanishing gradients problem [G22, Ch. 11].

The opposite can also happen: the gradients can keep growing until the layers get excessively large weight updates and the algorithm diverges — this is the exploding gradients problem [G22, Ch. 11].

Both problems plague deep neural networks (DNNs) and recurrent neural networks (RNNs) over very long sequences [Mur22, Sec. 13.4.2].

More generally, deep neural networks suffer from unstable gradients, and different layers may learn at widely different speeds.
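A tiny numerical illustration (not tied to any particular network): backpropagation multiplies layer-local derivatives, and the sigmoid's derivative never exceeds 0.25, so the product shrinks geometrically with depth:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
d_sigmoid = lambda z: sigmoid(z) * (1.0 - sigmoid(z))   # maximum value 0.25, at z = 0

grad = 1.0
for _ in range(30):            # 30 layers, each contributing at most 0.25
    grad *= d_sigmoid(0.0)
print(grad)                    # 0.25**30, about 8.7e-19
```

Even in this best case (every unit operating at the sigmoid's steepest point), the gradient reaching the lowest layer is vanishingly small.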

Watch Prof Ng’s explanation of the problems:

The problems were observed decades ago and were the reasons why DNNs were mostly abandoned in the early 2000s [G22, Ch. 11].

  • The causes have been traced to 1️⃣ the usage of sigmoid activation functions, and 2️⃣ the initialisation of weights to follow the zero-mean Gaussian distribution with standard deviation 1.
  • A sigmoid function saturates at 0 or 1, and when saturated, the derivative is nearly 0.
  • As a remedy, current best practices include using 1️⃣ a rectifier activation function, and 2️⃣ the weight initialisation algorithm called He initialisation.
  • He initialisation [HZRS15, Sec. 2.2]: at layer l, weights follow the zero-mean Gaussian distribution with variance 2/n_l, where n_l is the fan-in, or equivalently the number of inputs/weights feeding into layer l.
  • He initialisation is implemented by the PyTorch function kaiming_normal_ and the TensorFlow function HeNormal.
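A minimal numpy sketch of He initialisation (the shapes and seed are my own):

```python
import numpy as np

rng = np.random.default_rng(0)

def he_normal(fan_in, fan_out):
    """He initialisation: zero-mean Gaussian weights with variance 2/fan_in."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))

W = he_normal(fan_in=512, fan_out=256)
print(W.std())    # close to sqrt(2/512), i.e. 0.0625
```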

Watch Prof Ng’s explanation of weight initialisation:

References

[G22] A. Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd ed., O’Reilly Media, Inc., 2022. Available at https://learning.oreilly.com/library/view/hands-on-machine-learning/9781098125967/.
[HZRS15] K. He, X. Zhang, S. Ren, and J. Sun, Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, in 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1026–1034. https://doi.org/10.1109/ICCV.2015.123.
[Mur22] K. P. Murphy, Probabilistic Machine Learning: An introduction, MIT Press, 2022. Available at http://probml.ai.


PyTorch

by Yee Wei Law - Saturday, 31 May 2025, 2:38 PM
 

Installation instructions:

  • Assuming the conda environment called pt (for “PyTorch”) does not yet exist, create and activate it using the commands:

    conda create -n pt python=3.12
    conda activate pt
  • Install the necessary conda packages:

    conda install -c conda-forge jupyterlab jupyterlab-git lightning matplotlib nbdime pandas scikit-learn seaborn
  • Install the latest version of PyTorch (version 2.7 as of writing) through pip.

    Option 1: If you have a CUDA-capable GPU (so far only NVIDIA GPUs are), use the command below, where the string cu128 indicates CUDA version 12.8. If you have a different version of CUDA, change 128 to reflect the version you have:

    pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

    Install the matching version of CUDA Toolkit from NVIDIA.

    Option 2: If you do not have a CUDA-capable GPU, use the command below:

    pip3 install torch torchvision torchaudio

    Due to the total size of files to be installed, more than one installation attempt may be necessary.

  • Check if CUDA is available through PyTorch by running the command below in the command line:

    python -c "import torch; print(torch.cuda.get_device_name() if torch.cuda.is_available() else 'No CUDA')"

    The command above will print the name of your GPU if the preceding installation went successfully and you do have a CUDA-capable GPU.


R


Recurrent neural networks

by Yee Wei Law - Friday, 31 January 2025, 3:17 PM
 

A recurrent neural network (RNN) is a neural network which maps an input space of sequences to an output space of sequences in a stateful way [RHW86, Mur22].

While convolutional neural networks excel at two-dimensional (2D) data, recurrent neural networks (RNNs) are better suited for one-dimensional (1D), sequential data [GBC16, §9.11].

Unlike early artificial neural networks (ANNs), which have a feedforward structure, RNNs have a cyclic structure, inspired by the cyclical connectivity of neurons; see Fig. 1.

The forward pass of an RNN is the same as that of a multilayer perceptron, except that activations arrive at a hidden layer from both the current external input and the hidden-layer activations from the previous timestep.

Fig. 1 visualises the operation of an RNN by “unfolding” or “unrolling” the network across timesteps, with the same network parameters applied at each timestep.

Note: The term “timestep” should be understood more generally as an index for sequential data.
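The forward pass described above can be sketched in numpy; the unrolled loop applies the same parameters (here called W_xh, W_hh and b_h, names of my own choosing) at every timestep:

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Unrolled RNN forward pass: the same parameters are applied at every
    timestep; the hidden state h carries information forward."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in xs:                                   # one iteration per timestep
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)     # current input + previous state
        states.append(h)
    return np.array(states)

xs = rng.standard_normal((5, 3))                     # 5 timesteps, 3 features each
H = rnn_forward(xs, rng.standard_normal((4, 3)), rng.standard_normal((4, 4)), np.zeros(4))
```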

For the backward pass, two well-known algorithms are applicable: 1️⃣ real-time recurrent learning and the simpler, computationally more efficient 2️⃣ backpropagation through time [Wer90].

Fig. 1: On the left, an RNN is often visualised as a neural network with recurrent connections. The recurrent connections should be understood, through unfolding or unrolling the network across timesteps, as applying the same network parameters to the current input and the previous state at each timestep. On the right, while the recurrent connections (blue arrows) propagate the network state over timesteps, the standard network connections (black arrows) propagate activations from one layer to the next within the same timestep. Diagram adapted from [ZLLS23, Figure 9.1].

Fig. 1 implies information flows in one direction, the direction associated with causality.

However, for many sequence labelling tasks, the correct output depends on the entire input sequence, or at least a sufficiently long input sequence. Examples of such tasks include speech recognition and language translation. Addressing the needs of these tasks gave rise to bidirectional RNNs [SP97].

Standard/traditional RNNs suffer from the following deficiencies [Gra12, YSHZ19, MSO24]:

  • They are susceptible to the problems of vanishing gradients and exploding gradients.
  • They cannot store information for long periods of time.
  • Except for bidirectional RNNs, they access context information in only one direction (i.e., typically past information in the time domain).

Due to the drawbacks above, RNNs are typically used with “leaky” units enabling the networks to accumulate information over a long duration [GBC16, §10.10]. The resultant RNNs are called gated RNNs. The most successful gated RNNs are those using long short-term memory (LSTM) or gated recurrent units (GRU).
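A heavily simplified sketch of the “leaky” idea (not the full gating of LSTM or GRU): the unit keeps a fraction alpha of its previous state, so information decays slowly rather than being overwritten at every timestep:

```python
def leaky_update(h_prev, x_t, alpha=0.9):
    """Blend the previous state with the new input; alpha close to 1
    gives the unit a long memory."""
    return alpha * h_prev + (1.0 - alpha) * x_t

h = 0.0
for x_t in [1.0] * 5:          # a constant input accumulates gradually in h
    h = leaky_update(h, x_t)
print(h)                       # 1 - 0.9**5, about 0.41
```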

References

[GBC16] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016. Available at https://www.deeplearningbook.org.
[Gra12] A. Graves, Supervised Sequence Labelling with Recurrent Neural Networks, Springer Berlin, Heidelberg, 2012. https://doi.org/10.1007/978-3-642-24797-2.
[MSO24] I. D. Mienye, T. G. Swart, and G. Obaido, Recurrent neural networks: A comprehensive review of architectures, variants, and applications, Information 15 no. 9 (2024). https://doi.org/10.3390/info15090517.
[Mur22] K. P. Murphy, Probabilistic Machine Learning: An Introduction, MIT Press, 2022. Available at http://probml.ai.
[RHW86] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representations by back-propagating errors, Nature 323 (1986), 533–536. https://doi.org/10.1038/323533a0.
[SP97] M. Schuster and K. Paliwal, Bidirectional recurrent neural networks, IEEE Transactions on Signal Processing 45 no. 11 (1997), 2673–2681. https://doi.org/10.1109/78.650093.
[VHMN20] G. Van Houdt, C. Mosquera, and G. Nápoles, A review on the long short-term memory model, Artificial Intelligence Review 53 no. 8 (2020), 5929–5955. https://doi.org/10.1007/s10462-020-09838-1.
[Wer90] P. Werbos, Backpropagation through time: what it does and how to do it, Proceedings of the IEEE 78 no. 10 (1990), 1550–1560. https://doi.org/10.1109/5.58337.
[YSHZ19] Y. Yu, X. Si, C. Hu, and J. Zhang, A review of recurrent neural networks: LSTM cells and network architectures, Neural Computation 31 no. 7 (2019), 1235–1270. https://doi.org/10.1162/neco_a_01199.
[ZLLS23] A. Zhang, Z. C. Lipton, M. Li, and A. J. Smola, Dive into Deep Learning, Cambridge University Press, 2023. Available at https://d2l.ai/.


Reinforcement learning

by Yee Wei Law - Tuesday, 18 March 2025, 9:56 AM
 

Work in progress

Reinforcement learning (RL) is a family of algorithms that learn an optimal policy, whose goal is to maximise the expected return when interacting with an environment [Goo25].

RL has existed since the 1950s[BD10], but it was the introduction of high-capacity function approximators, namely deep neural networks, that rejuvenated RL in recent years[LKTF20].

There are three main types of RL [LKTF20, FPMC24]:

  1. Online or on-policy RL: In this classic setting, an agent interacts freely with
  2. Off-policy RL: In this classic setting, an agent
  3. Offline RL:

References

[BD10] R. Bellman and S. Dreyfus, Dynamic Programming, 33, Princeton University Press, 2010. https://doi.org/10.2307/j.ctv1nxcw0f.
[BK19] S.L. Brunton and J.N. Kutz, Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control, Cambridge University Press, 2019. https://doi.org/10.1017/9781108380690.
[GBC16] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016. Available at https://www.deeplearningbook.org.
[Goo25] Google, reinforcement learning (RL), Machine Learning Glossary, 2025, accessed 3 Jan 2025. Available at https://developers.google.com/machine-learning/glossary#reinforcement-learning-rl.
[LKTF20] S. Levine, A. Kumar, G. Tucker, and J. Fu, Offline reinforcement learning: Tutorial, review, and perspectives on open problems, arXiv preprint arXiv:2005.01643, 2020. https://doi.org/10.48550/arXiv.2005.01643.
[FPMC24] R. Figueiredo Prudencio, M.R.O.A. Maximo, and E.L. Colombini, A survey on offline reinforcement learning: Taxonomy, review, and open problems, IEEE Transactions on Neural Networks and Learning Systems 35 no. 8 (2024), 10237–10257. https://doi.org/10.1109/TNNLS.2023.3250269.
[SB18] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed., MIT Press, 2018.
[ZLLS23] A. Zhang, Z. C. Lipton, M. Li, and A. J. Smola, Dive into Deep Learning, Cambridge University Press, 2023. Available at https://d2l.ai/.

