Machine learning (including deep learning)

by Yee Wei Law - Monday, 3 June 2024, 9:46 AM
 

The past two tutorials have highlighted the importance of machine learning.

In fact, since the mid-2010s, advances in machine learning (ML), and particularly deep learning (DL), under the banner of artificial intelligence (AI) have been attracting not only media attention but also major capital investment.

The field of ML is decades old, but it was not until 2012, when deep neural networks (DNNs) emerged triumphant in the ImageNet image classification challenge [KSH17], that the field of ML truly took off.

DL has approached, and in some tasks even exceeded, human-level performance.

DL techniques, especially DNN algorithms, are the main pursuit of this course, but before diving into them, we should clarify the differences among AI, ML and DL, starting with the following definitions.

AI has the broadest and yet most elusive definition. There are four main schools of thought [RN22, Sec. 1.1; IBM23], namely 1️⃣ systems that think like humans, 2️⃣ systems that act like humans, 3️⃣ systems that think rationally, 4️⃣ systems that act rationally; but a sensible definition boils down to:

Definition 1: Artificial intelligence (AI) [RN22, Sec. 1.1.4]

The study and construction of rational agents that pursue their predefined objectives.

Above, a rational agent is one that acts so as to achieve the best outcome or, when there is uncertainty, the best expected outcome [RN22, Sec. 1.1.4].

The definition of AI above is referred to as the standard model of AI [RN22, Sec. 1.1.4].

ML is a subfield of AI:

Definition 2: Machine learning (ML) [Mit97, Sec. 1.1]

A computer program or machine is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

In the preceding definition, “experience”, “task” and “performance measure” require elaboration. Among the most common ML tasks are:

  • Classification: This is usually achieved through supervised learning (see Fig. 1), the aim of which is to learn a mapping from the input set X to the output set Y [Mur22, Sec. 1.2; GBC16, Sec. 5.1.3; G22, Ch. 1], where

    • every member x of X is a vector of features, attributes, covariates, or predictors;
    • every member y of Y is a label, target, or response;
    • each pair (x, y) of an input and its associated output is called an example.

    A dataset of examples used to “train” a model to predict/infer y given some x is called a training set; this corresponds to the experience E in Definition 2.

    When Y is a set of unordered and mutually exclusive labels known as classes, the supervised learning task becomes a classification task.

    Classification of only two classes is called binary classification. For example, determining whether an email is spam or not is a binary classification task.

    Fig. 1: Supervised learning [ZLLS23, Fig. 1.3.1].

    Fig. 2: An example of a regression problem, where, given a “new instance”, the target value is to be determined [G22, Figure 1-6].

  • Regression: Continuing from classification, if the output set Y is a continuous set of real values rather than a discrete set, the classification task becomes a regression task.

    For example, given the features (e.g., mileage, age, brand, model) and associated price for many examples of cars, a plausible regression task is to predict the price of a car given its features; see Fig. 2.

    While the term “label” is more common in the classification context, “target” is more common in the regression context. In the earlier example, the target is the car price.

  • Clustering: This is the grouping of similar things together.

    The Euclidean distance between two feature vectors can serve as a similarity measure, but depending on the problem, other similarity measures can be more suitable. In fact, many similarity measures have been proposed in the literature [GMW07, Ch. 6].

    Clustering is a form of unsupervised learning.

    From a probabilistic viewpoint, unsupervised learning fits an unconditional model of the form p(x), which can generate new data x, whereas supervised learning fits a conditional model, p(y | x), which specifies (a distribution over) outputs given inputs [Mur22, Sec. 1.3].

  • Anomaly detection: This is another form of unsupervised learning, and one highly relevant to this course.

    We first encountered anomaly detection in Tutorial 1 on intrusion detection, and we will dive deep into anomaly detection in Tutorial 5 on unsupervised learning.
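To make the vocabulary above (features, labels, examples, training set) concrete, here is a minimal sketch of supervised binary classification in plain Python, using a nearest-centroid rule; the two "spam-like" features and all numbers are invented for illustration, not taken from any real dataset:

```python
import math

# Toy training set: each example pairs a feature vector x (two invented
# features, e.g. proportion of suspicious words, number of links) with a
# label y, where 1 means "spam" and 0 means "not spam".
train_X = [(0.9, 5.0), (0.8, 4.0), (0.7, 6.0),   # spam-like examples
           (0.1, 0.0), (0.2, 1.0), (0.05, 0.0)]  # non-spam examples
train_y = [1, 1, 1, 0, 0, 0]

def fit_centroids(X, y):
    """The experience E: compute the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        members = [x for x, lbl in zip(X, y) if lbl == label]
        centroids[label] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids

def predict(centroids, x):
    """The task T: assign x the label of the nearest class centroid."""
    return min(centroids, key=lambda lbl: math.dist(centroids[lbl], x))

model = fit_centroids(train_X, train_y)
print(predict(model, (0.85, 5.5)))  # near the spam centroid -> 1
print(predict(model, (0.1, 0.5)))   # near the non-spam centroid -> 0
```

Real classifiers (logistic regression, DNNs, etc.) learn far richer mappings from X to Y, but the division of labour is the same: a training set supplies the experience, and prediction on new inputs is the task.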

Common to the aforementioned tasks is the need to measure performance. An example of a performance measure for classification is accuracy, the fraction of predictions a model gets right [Goo22a].

Other performance measures will be investigated as part of Task 1.
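As a concrete illustration in plain Python, accuracy, together with two other common classification measures, precision and recall [Goo22b], can be computed from predicted and true labels as follows (the label vectors are invented toy data):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP/(TP+FP); recall = TP/(TP+FN) for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fp), tp / (tp + fn)

y_true = [1, 1, 1, 0, 0, 0, 0, 1]  # invented ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]  # invented model predictions
print(accuracy(y_true, y_pred))    # 6 of 8 correct -> 0.75
prec, rec = precision_recall(y_true, y_pred)
print(prec, rec)                   # 3/(3+1) = 0.75, 3/(3+1) = 0.75
```

Which measure matters depends on the task: for imbalanced problems such as intrusion detection, accuracy alone can be misleading, which is why precision and recall are examined separately.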

DL is in turn a subfield of ML (see Fig. 3):

Definition 3: Deep learning (DL) [RN22, Sec. 1.3.8]

Machine learning using multiple layers of simple, adjustable computing elements.

Simply put, DL is the ever-expanding body of ML techniques that leverage deep architectures (algorithmic structures consisting of many levels of nonlinear operations) for learning feature hierarchies, with features from higher levels of the hierarchy formed by composition of lower-level features [Ben09].
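As a toy illustration of "multiple layers of simple, adjustable computing elements", the following plain-Python sketch composes two dense layers with a ReLU nonlinearity in between; the weights here are invented for illustration, whereas in practice they are learned from data:

```python
def relu(v):
    """Elementwise nonlinearity: negative values are clipped to zero."""
    return [max(0.0, x) for x in v]

def dense(x, W, b):
    """One layer of simple, adjustable computing elements: y = Wx + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

# Invented weights and biases; learning would adjust these.
W1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, -0.1]  # first (lower) layer
W2, b2 = [[1.0, 2.0]], [0.5]                     # second (higher) layer

def forward(x):
    h = relu(dense(x, W1, b1))  # lower-level features extracted from x
    return dense(h, W2, b2)     # higher-level feature composed from h

print(forward([2.0, 1.0]))  # a single output composed from two layers
```

The composition in `forward` is the essence of Definition 3: each layer is simple on its own, and depth comes from stacking many such layers so that higher-level features are built out of lower-level ones.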

Fig. 3: AI → ML → DL [Cop16].

The rest of this tutorial attempts to 1️⃣ shed some light on why DNNs are superior to classical ML algorithms, 2️⃣ provide a brief tutorial on the original/shallow/artificial neural networks (ANNs), and 3️⃣ provide a preview of DNNs.

The good news with the topics of this tutorial is that there is such a vast amount of learning resources in the public domain that, even if the coverage here fails to satisfy your learning needs, there are bound to be resources out there that can.

References

[Agg18] C. C. Aggarwal, Neural Networks and Deep Learning: A Textbook, Springer Cham, 2018, supplementary material at http://sn.pub/extras. https://doi.org/10.1007/978-3-319-94463-0.
[Ben09] Y. Bengio, Learning Deep Architectures for AI, Foundations and Trends® in Machine Learning 2 no. 1 (2009), 1–127. https://doi.org/10.1561/2200000006.
[Cop16] M. Copeland, What’s the difference between artificial intelligence, machine learning and deep learning?, NVIDIA blog, July 2016. Available at https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/.
[DG06] J. Davis and M. Goadrich, The Relationship between Precision-Recall and ROC Curves, in Proceedings of the 23rd International Conference on Machine Learning, ICML ’06, Association for Computing Machinery, 2006, pp. 233–240. https://doi.org/10.1145/1143844.1143874.
[G22] A. Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd ed., O’Reilly Media, Inc., 2022. Available at https://learning.oreilly.com/library/view/hands-on-machine-learning/9781098125967/.
[GBC16] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016. Available at http://www.deeplearningbook.org.
[GMW07] G. Gan, C. Ma, and J. Wu, Data Clustering: Theory, Algorithms, and Applications, Society for Industrial and Applied Mathematics, 2007. https://doi.org/10.1137/1.9780898718348.
[Goo22a] Google, Classification: Accuracy, Machine Learning Crash Course, July 2022. Available at https://developers.google.com/machine-learning/crash-course/classification/accuracy.
[Goo22b] Google, Classification: Precision and Recall, Machine Learning Crash Course, July 2022. Available at https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall.
[Goo22c] Google, Classification: ROC Curve and AUC, Machine Learning Crash Course, July 2022. Available at https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc.
[IBM23] IBM, What is artificial intelligence (AI)?, IBM Topics, 2023. Available at https://www.ibm.com/topics/artificial-intelligence.
[KSH17] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Commun. ACM 60 no. 6 (2017), 84–90, journal version of the paper with the same name that appeared in the 25th International Conference on Neural Information Processing Systems in 2012. https://doi.org/10.1145/3065386.
[LL19] H. Liu and B. Lang, Machine learning and deep learning methods for intrusion detection systems: A survey, Applied Sciences 9 no. 20 (2019). https://doi.org/10.3390/app9204396.
[LXL+22] X. Li, H. Xiong, X. Li, X. Wu, X. Zhang, J. Liu, J. Bian, and D. Dou, Interpretable deep learning: interpretation, interpretability, trustworthiness, and beyond, Knowledge and Information Systems 64 no. 12 (2022), 3197–3234. https://doi.org/10.1007/s10115-022-01756-8.
[Mit97] T. M. Mitchell, Machine Learning, McGraw-Hill, 1997. Available at http://www.cs.cmu.edu/~tom/mlbook.html.
[Mur22] K. P. Murphy, Probabilistic Machine Learning: An introduction, MIT Press, 2022. Available at http://probml.ai.
[Mur23] K. P. Murphy, Probabilistic Machine Learning: Advanced Topics, MIT Press, 2023. Available at http://probml.github.io/book2.
[NCS22] NCSC, Principles for the security of machine learning, guidance from the National Cyber Security Centre, August 2022. Available at https://www.ncsc.gov.uk/files/Principles-for-the-security-of-machine-learning.pdf.
[PG17] J. Patterson and A. Gibson, Deep Learning: A Practitioner’s Approach, O’Reilly Media, Inc., August 2017. Available at https://learning.oreilly.com/library/view/deep-learning/9781491924570/.
[RN22] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 4th ed., Pearson Education, 2022. Available at https://ebookcentral.proquest.com/lib/unisa/reader.action?docID=6563563.
[TBH+19] E. Tabassi, K. J. Burns, M. Hadjimichael, A. D. Molina-Markham, and J. T. Sexton, A taxonomy and terminology of adversarial machine learning, Draft NISTIR 8269, National Institute of Standards and Technology, 2019. https://doi.org/10.6028/NIST.IR.8269-draft.
[ZLLS23] A. Zhang, Z. C. Lipton, M. Li, and A. J. Smola, Dive into Deep Learning, 2023, interactive online book, accessed 1 Jan 2023. Available at https://d2l.ai/.