Contemporary options are the non-saturating activation functions [Mur22, Sec. 13.4.3], although the term "non-saturating" is not strictly accurate.
Below, \(z\) should be understood as the output of the summing junction.

The rectified linear unit (ReLU) [NH10] is the unipolar function \[\operatorname{ReLU}(z) = \max(0, z).\]

ReLU is differentiable except at \(z = 0\), but by definition, \(\operatorname{ReLU}'(z) = 0\) for \(z \le 0\).
ReLU has the advantage of having well-behaved derivatives, which are either 0 or 1.
This simplifies optimisation [ZLLS23, Sec. 5.1.2.1] and mitigates the infamous vanishing gradients problem associated with traditional activation functions.
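As an illustrative sketch (my own, not from the source), ReLU and its conventional derivative can be written in a few lines of NumPy; note the derivative is exactly 0 or 1, as described above:

```python
import numpy as np

def relu(z):
    """Rectified linear unit: max(0, z), applied element-wise."""
    return np.maximum(0.0, z)

def relu_grad(z):
    """Derivative of ReLU: 1 for z > 0, and 0 for z <= 0 by convention."""
    return (z > 0).astype(float)

z = np.array([-2.0, 0.0, 3.0])
relu(z)       # array([0., 0., 3.])
relu_grad(z)  # array([0., 0., 1.])
```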
The scaled exponential linear unit or self-normalising ELU (SELU) [KUMH17] extends ELU: \[\operatorname{SELU}(z) = \lambda \begin{cases} z & \text{if } z > 0, \\ \alpha (e^{z} - 1) & \text{if } z \le 0, \end{cases}\] where \(\alpha \approx 1.6733\), and \(\lambda \approx 1.0507\) ensures a slope larger than 1 for positive inputs; see Fig. 2.
SELU was invented for self-normalising neural networks (SNNs), which are meant to 1️⃣ be robust to perturbations, and 2️⃣ not have high variance in their training errors.
SNNs push neuron activations to zero mean and unit variance, leading to the same effect as batch normalisation, which enables robust deep learning.
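A minimal NumPy sketch of SELU (my own illustration, not from the source; the constants are the values derived in [KUMH17], truncated here to seven decimal places):

```python
import numpy as np

# Fixed-point constants from Klambauer et al. [KUMH17], truncated.
ALPHA = 1.6732632
LAMBDA = 1.0507010

def selu(z):
    """SELU: lambda * z for z > 0, lambda * alpha * (exp(z) - 1) for z <= 0."""
    z = np.asarray(z, dtype=float)
    # expm1 computes exp(z) - 1 accurately for small |z|.
    return LAMBDA * np.where(z > 0, z, ALPHA * np.expm1(z))
```

For positive inputs the slope is \(\lambda > 1\), while for large negative inputs the output saturates near \(-\lambda\alpha \approx -1.758\); together these drive activations towards zero mean and unit variance.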
The Gaussian error linear unit (GELU) [HG20] extends ReLU and ELU: \[\operatorname{GELU}(z) = z\,\Phi(z),\] where \(\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(z/\sqrt{2}\right)\right]\) is the cumulative distribution function of the standard Gaussian distribution, and \(\operatorname{erf}\) is the error function \(\operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\,\mathrm{d}t\).
Unlike most other activation functions, GELU is neither convex nor monotonic; the increased curvature and non-monotonicity may allow GELUs to approximate complicated functions more easily than ReLUs or ELUs can.
ReLU gates its input depending upon the input's sign, whereas GELU weights its input by how much greater it is than other inputs, i.e., by its percentile under the standard Gaussian distribution.
GELU is a popular choice for implementing transformers; see for example Hugging Face’s implementation of activation functions.
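For illustration (my own sketch, not from the source), the exact GELU and the widely used tanh approximation, in the form given in [HG20], can be written with Python's standard library:

```python
import math

def gelu(z):
    """Exact GELU: z * Phi(z), with Phi the standard normal CDF via erf."""
    return z * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gelu_tanh(z):
    """Tanh approximation of GELU common in transformer implementations;
    the constant 0.044715 is from Hendrycks and Gimpel [HG20]."""
    return 0.5 * z * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (z + 0.044715 * z ** 3)))
```

Evaluating the exact form on negative inputs (e.g. comparing \(z = -2\) with \(z = -0.5\)) exhibits the non-monotonicity noted above: the curve dips below zero and then rises back towards it.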
References

[CUH16]
D.-A. Clevert, T. Unterthiner, and S. Hochreiter, Fast and accurate deep network learning by exponential linear units (ELUs), in ICLR, 2016. Available at https://arxiv.org/abs/1511.07289.
[HZRS15]
K. He, X. Zhang, S. Ren, and J. Sun, Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, in 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1026–1034. https://doi.org/10.1109/ICCV.2015.123.
[HG20]
D. Hendrycks and K. Gimpel, Gaussian error linear units (GELUs), arXiv preprint arXiv:1606.08415, 2020, first appeared in 2016.
[KUMH17]
G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter, Self-normalizing neural networks, in Advances in Neural Information Processing Systems 30, 2017. Available at https://arxiv.org/abs/1706.02515.
[Mur22]
K. P. Murphy, Probabilistic Machine Learning: An introduction, MIT Press, 2022. Available at http://probml.ai.
[NH10]
V. Nair and G. E. Hinton, Rectified linear units improve restricted Boltzmann machines, in Proceedings of the 27th International Conference on International Conference on Machine Learning, ICML'10, Omnipress, Madison, WI, USA, 2010, pp. 807–814.
[ZLLS23]
A. Zhang, Z. C. Lipton, M. Li, and A. J. Smola, Dive into Deep Learning, 2023, interactive online book, accessed 17 Feb 2023. Available at https://d2l.ai/.
Active learning
by Yee Wei Law - Wednesday, 25 October 2023, 9:39 AM
Adversarial machine learning
by Yee Wei Law - Saturday, 18 November 2023, 4:31 PM
Adversarial machine learning (AML) as a field can be traced back to [HJN+11].
The impact of adversarial examples on deep learning is well known within the computer vision community, and documented in a body of literature that has been growing exponentially since Szegedy et al.’s discovery [SZS+14].
The field is moving so fast that the taxonomy, terminology and threat models are still being standardised.
References

[HJN+11]
L. Huang, A. D. Joseph, B. Nelson, B. I. Rubinstein, and J. D. Tygar, Adversarial machine learning, in Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, AISec '11, Association for Computing Machinery, New York, NY, USA, 2011, pp. 43–58. https://doi.org/10.1145/2046684.2046692.
[SZS+14]
C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, Intriguing properties of neural networks, in International Conference on Learning Representations, 2014. Available at https://research.google/pubs/pub42503/.