
MITRE ATLAS

by Yee Wei Law - Thursday, 16 November 2023, 10:53 PM
 

The field of adversarial machine learning (AML) is concerned with the study of attacks on machine learning (ML) algorithms and the design of robust ML algorithms to defend against these attacks [TBH+19, Sec. 1].

ML systems (and by extension AI systems) can fail in many ways, some more obvious than others.

AML is not about ML systems failing when they make wrong inferences; it is about ML systems being tricked into making wrong inferences.

Consider three basic attack scenarios on ML systems:

Black-box evasion attack: Consider the most common deployment scenario in Fig. 1, where an ML model is deployed as an API endpoint.

  • In this black-box setting, an adversary can query the model with inputs it controls and observe the model’s outputs, but it does not know how the inputs are processed.
  • The adversary can craft adversarial data (also called adversarial examples), usually by subtly modifying legitimate test samples, so that the model makes inferences different from those it would have made on the legitimate samples; this is called an evasion attack [OV23, p. 8].
  • This technique can be used to evade a downstream task that relies on machine learning, e.g., machine-learning-based threat detection (a minimal query-only sketch follows this list).
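
To make the black-box setting concrete, here is a minimal query-only sketch in Python/NumPy; the predict() function, the perturbation budget epsilon and the query budget are hypothetical stand-ins for whatever API and constraints the adversary actually faces, and the random-search strategy is only illustrative, not a method prescribed by [OV23].

    import numpy as np

    def black_box_evasion(predict, x, true_label, epsilon=0.05, max_queries=1000, seed=0):
        """Query-only evasion: repeatedly add small bounded perturbations to a
        legitimate sample x until the API's predicted label changes.
        predict() is the adversary's only access to the model."""
        rng = np.random.default_rng(seed)
        for _ in range(max_queries):
            noise = rng.uniform(-epsilon, epsilon, size=x.shape)
            x_adv = np.clip(x + noise, 0.0, 1.0)  # stay within the valid input range
            if predict(x_adv) != true_label:      # success: the model has been evaded
                return x_adv
        return None                               # query budget exhausted

Real query-based black-box attacks are far more query-efficient, but the access pattern is the same: controlled inputs in, observed outputs back.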

White-box evasion attack: Consider the scenario in Fig. 2, where an ML model resides on a smartphone or an IoT edge node to which the adversary has access.

  • In this white-box setting, the adversary could reverse-engineer the model.
  • With visibility into the model’s architecture and parameters, the adversary can optimise its adversarial data for the evasion attack, e.g., by perturbing a legitimate sample in the direction that increases the model’s loss (see the sketch below).
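
As an illustration of what model visibility buys the adversary, here is a minimal sketch of the well-known fast gradient sign method (FGSM), assuming a PyTorch classifier whose inputs are normalised to [0, 1]; the function and parameter names are hypothetical.

    import torch
    import torch.nn.functional as F

    def fgsm_example(model, x, y, epsilon=0.03):
        """White-box evasion: perturb x one step in the direction that most
        increases the model's loss, using the input gradient the adversary
        can now compute."""
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()                                  # gradient of the loss w.r.t. the input
        with torch.no_grad():
            x_adv = x_adv + epsilon * x_adv.grad.sign()  # signed-gradient step
            x_adv = x_adv.clamp(0.0, 1.0)                # keep inputs in the valid range
        return x_adv.detach()

The same gradient information is unavailable in the black-box setting, which is why white-box adversarial data can be optimised much more efficiently.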

Poisoning attacks: Consider the scenario in Fig. 3, where an adversary has control over the training data and/or the training process, and hence over the model.

  • Attacks during the ML training stage are called poisoning attacks [OV23, p. 7].
  • A data poisoning attack is a poisoning attack where the adversary controls a subset of the training data by either inserting or modifying training samples [OV23, pp. 7-8].
  • A model poisoning attack is a poisoning attack where the adversary controls the model and its parameters [OV23, p. 8].
  • A backdoor poisoning attack is a poisoning attack where the adversary poisons some training samples with a trigger (backdoor) pattern such that the model performs normally on legitimate test samples but abnormally on test samples containing the trigger [OV23] (a data-poisoning sketch follows the figure captions below).

Fig. 1: Black-box evasion attack: trained model is outside the adversary’s reach.
Fig. 2: White-box evasion attack: trained model is within the adversary’s reach.
Fig. 3: Poisoning attacks: trained model is poisoned with vulnerabilities.
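
The backdoor variant can be illustrated with a small data-poisoning sketch: assuming the training images are NumPy arrays with pixel values in [0, 1], a hypothetical trigger (a bright corner patch) is stamped onto a small fraction of the images, whose labels are then flipped to an attacker-chosen target class. A model trained on the poisoned set tends to behave normally on clean inputs but to predict the target class whenever the trigger is present.

    import numpy as np

    def poison_with_backdoor(images, labels, target_class, poison_rate=0.05,
                             trigger_size=3, seed=0):
        """Backdoor data poisoning: stamp a trigger pattern onto a random subset
        of training images and relabel them as the attacker's target class."""
        rng = np.random.default_rng(seed)
        images, labels = images.copy(), labels.copy()
        n_poison = int(poison_rate * len(images))
        idx = rng.choice(len(images), size=n_poison, replace=False)
        images[idx, :trigger_size, :trigger_size] = 1.0  # trigger: bright top-left patch
        labels[idx] = target_class                       # dirty-label poisoning
        return images, labels

Stamping the same trigger onto a test sample at inference time would then activate the backdoor described above.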

Watch introduction to evasion attacks (informally called “perturbation attacks”) on LinkedIn Learning:

Perturbation attacks and AUPs from Security Risks in AI and Machine Learning: Categorizing Attacks and Failure Modes by Diana Kelley

Watch introduction to poisoning attacks on LinkedIn Learning:

Poisoning attacks from Security Risks in AI and Machine Learning: Categorizing Attacks and Failure Modes by Diana Kelley

In response to the threats of AML, MITRE and Microsoft, in collaboration with Bosch, IBM, NVIDIA, Airbus, the University of Toronto and others, released the Adversarial ML Threat Matrix in 2020.

Subsequently, in 2021, more organisations joined MITRE and Microsoft to release Version 2.0, and renamed the matrix to MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems).

MITRE ATLAS is a knowledge base — modelled after MITRE ATT&CK — of adversary tactics, techniques, and case studies for ML systems based on real-world observations, demonstrations from ML red teams and security groups, and the state of the possible from academic research.

Watch MITRE’s presentation:

References

[Eid21] B. Eidson, MITRE ATLAS Takes on AI System Theft, MITRE News & Insights, June 2021. Available at https://www.mitre.org/news-insights/impact-story/mitre-atlas-takes-ai-system-theft.
[OV23] A. Oprea and A. Vassilev, Adversarial machine learning: A taxonomy and terminology of attacks and mitigations, NIST AI 100-2e2023 ipd, March 2023. https://doi.org/10.6028/NIST.AI.100-2e2023.ipd.
[Tab23] E. Tabassi, Artificial Intelligence Risk Management Framework (AI RMF 1.0), NIST AI 100-1, January 2023. https://doi.org/10.6028/NIST.AI.100-1.
[TBH+19] E. Tabassi, K. Burns, M. Hadjimichael, A. Molina-Markham, and J. Sexton, A Taxonomy and Terminology of Adversarial Machine Learning, Draft NISTIR 8269, 2019. https://doi.org/10.6028/NIST.IR.8269-draft.
