How attackers disrupt AI and ML systems

Threat actors have several ways to fool or exploit artificial intelligence and machine learning systems and models, but users can defend against their tactics.

As more companies roll out artificial intelligence (AI) and machine learning (ML) projects, securing them becomes more important. 

A report released by IBM and Morning Consult in May found that, of more than 7,500 global businesses surveyed, 35 per cent are already using AI, up 13 per cent from last year, while another 42 per cent are exploring it. However, almost 20 per cent of companies said that they were having difficulties securing data and that this was slowing down AI adoption.

In a survey conducted last spring by Gartner, security concerns were a top obstacle to adopting AI, tied for first place with the complexity of integrating AI solutions into existing infrastructure.

According to a paper Microsoft released last spring, 90 per cent of organisations aren't ready to defend themselves against adversarial machine learning. Of the 28 large and small organisations covered in the report, 25 didn't have the tools in place that they needed to secure their ML systems.

Securing AI and machine learning systems poses significant challenges. Some are not unique to AI. For example, AI and ML systems need data, and if that data contains sensitive or proprietary information, then it will be a target of attackers. Other aspects of AI and ML security are new, including defending against adversarial machine learning.

What is adversarial machine learning?

Despite what the name suggests, adversarial machine learning is not a type of machine learning. Rather, it is a set of techniques that adversaries use to attack machine learning systems.

"Adversarial machine learning exploits vulnerabilities and specificities of ML models," says Alexey Rubtsov, senior research associate at Global Risk Institute and a professor at Toronto Metropolitan University, formerly Ryerson. He's the author of a recent paper on adversarial machine learning in the financial services industry.

For example, adversarial machine learning can be used to trick ML trading algorithms into making wrong trading decisions, to make fraudulent operations harder to detect, to produce incorrect financial advice, and to manipulate reports based on sentiment analysis.

Types of adversarial machine learning attacks

According to Rubtsov, adversarial machine learning attacks fall into four major categories: poisoning, evasion, extraction, and inference.

1. Poisoning attack

With a poisoning attack, an adversary manipulates the training data set, Rubtsov says. 

"For example, they intentionally bias it, and the machine learns the wrong way." Say, for example, your house has an AI-powered security camera. An attacker could walk by your house at 3 a.m. every morning and let their dog walk across your lawn, setting off the security system.

Eventually, you'll turn off these 3 a.m. alerts to keep from being woken up by the dog. That dog walker is, in effect, supplying training data that teaches the system that whatever happens at 3 a.m. every night is innocuous. Once the system has been trained to ignore anything that happens at 3 a.m., that's when they attack.
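The dog-walker scenario can be sketched in a few lines of Python. This is a toy illustration only, not how any real camera works: the class name HourlyAlertModel, the dismiss_threshold parameter and the rule "stop alerting once an hour has been dismissed often enough" are all assumptions made for the example.

```python
from collections import Counter


class HourlyAlertModel:
    """Toy detector (hypothetical): stops alerting for hours the owner keeps dismissing."""

    def __init__(self, dismiss_threshold=5):
        self.dismiss_threshold = dismiss_threshold
        self.dismissals = Counter()

    def record_feedback(self, hour, dismissed):
        # Every dismissed alert acts as a "this was benign" training label.
        if dismissed:
            self.dismissals[hour] += 1

    def should_alert(self, hour):
        # Alert only while the hour has not been dismissed too often.
        return self.dismissals[hour] < self.dismiss_threshold


model = HourlyAlertModel(dismiss_threshold=5)
alert_before = model.should_alert(3)

# The attacker triggers a harmless event at 3 a.m. for a week and the
# owner dismisses each alert, which poisons the feedback data.
for _night in range(7):
    model.record_feedback(hour=3, dismissed=True)

alert_after = model.should_alert(3)
alert_other_hour = model.should_alert(14)

print(alert_before, alert_after, alert_other_hour)  # True False True
```

The model still alerts at other hours, which is why this kind of poisoning can go unnoticed: only the slice of behaviour the attacker cares about has changed.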

2. Evasion attack

With an evasion attack, the model has already been trained, but the attacker is able to change the input slightly. "An example could be a stop sign that you put a sticker on and the machine interprets it as a yield sign instead of a stop sign," says Rubtsov.

In our dog-walker example, the thief could put on a dog costume to break into your house. "The evasion attack is like an optical illusion for the machine," says Rubtsov.
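A minimal sketch of the idea, assuming a toy linear classifier whose weights the attacker knows: nudging every input feature by a small amount in the direction that lowers the score is enough to flip the decision. The weights, bias, feature values and the "stop"/"yield" labels are made up for illustration, and the step used is a simplified sign-based perturbation rather than any specific published attack.

```python
def score(weights, bias, x):
    # Linear score: positive means "stop", otherwise "yield".
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(weights, bias, x):
    return "stop" if score(weights, bias, x) > 0 else "yield"


def sign(v):
    return (v > 0) - (v < 0)


def evade(weights, x, eps):
    # Move each feature by at most eps in the direction that lowers the score.
    return [xi - eps * sign(w) for w, xi in zip(weights, x)]


# Hypothetical trained model and input (assumed values).
weights = [0.9, -0.4, 0.7]
bias = -0.1
x = [0.30, 0.20, 0.10]

eps = 0.1
x_adv = evade(weights, x, eps)

label_original = predict(weights, bias, x)
label_adversarial = predict(weights, bias, x_adv)
largest_change = max(abs(a - b) for a, b in zip(x, x_adv))

print(label_original, label_adversarial)  # stop yield
```

No feature moves by more than eps, yet the label changes, which is the "optical illusion" effect described above.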

3. Extraction attack

In an extraction attack, the adversary obtains a copy of your AI system. "Sometimes you can extract the model by just observing what inputs you give the model and what outputs it provides," says Rubtsov. "You poke the model and you see the reaction. If you are allowed to poke the model enough times, you can teach your own model to behave the same way."

For example, in 2019, a vulnerability in Proofpoint's Email Protection system generated email headers with an embedded score of how likely each message was to be spam. By using these scores, an attacker could build a copycat spam detection engine and use it to craft spam emails that would evade detection.
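The "poke the model" idea can be illustrated with a deliberately simple sketch. It assumes the victim is a black box that flags any input at or above a secret threshold and returns only a yes/no answer; the function names and the threshold value are hypothetical. Real extraction attacks target far more complex models, but the principle of learning from query responses is the same.

```python
def make_victim(secret_threshold):
    # Black box: the attacker can call it but cannot read the threshold.
    def victim(x):
        return x >= secret_threshold
    return victim


def extract_threshold(query, low, high, budget):
    # Binary search on the decision boundary.
    # Assumes query(low) is False and query(high) is True.
    for _ in range(budget):
        mid = (low + high) / 2
        if query(mid):
            high = mid
        else:
            low = mid
    return (low + high) / 2


victim = make_victim(secret_threshold=0.618)
stolen_threshold = extract_threshold(victim, 0.0, 1.0, budget=20)


def surrogate(x):
    # The attacker's copycat model, built only from query responses.
    return x >= stolen_threshold


probes = [0.1, 0.5, 0.7, 0.9]
agreement = all(surrogate(p) == victim(p) for p in probes)
print(agreement)  # True
```

Twenty queries are enough here to pin the boundary down to within about one millionth, which is one reason rate limiting and query monitoring are often suggested as mitigations.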

If a company uses a commercial AI product, then the adversary might also be able to get a copy of the model by purchasing it or by using a service. For example, platforms are available to attackers where they can test their malware against antivirus engines.

In the dog-walking example, the attacker could get a pair of binoculars to see what brand of security camera you have and buy the same one to figure out how to bypass it.

4. Inference attack

In an inference attack, the adversaries figure out what training data set was used to train the system and take advantage of vulnerabilities or biases in the data. "If you can figure out the training data, you can use common sense or sophisticated techniques to take advantage of that," says Rubtsov.

For example, in the dog-walking situation, the adversary might stake out the house to learn the normal traffic patterns in the area, notice that a dog walker comes by every morning at 3 a.m., and figure out that the system is biased and has learned to ignore people walking their dogs.
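One common form of this is membership inference: guessing whether a particular record was in the training set from how confidently the model responds to it. The sketch below assumes a toy model that has memorised its training points and reports a confidence that falls off with distance from the nearest one; the model, the confidence formula and the 0.99 cut-off are all assumptions chosen to keep the example small.

```python
def make_model(training_points):
    # Toy model that has memorised its training data: confidence is 1.0
    # on a training point and drops as the input moves away from one.
    def confidence(x):
        nearest = min(abs(x - t) for t in training_points)
        return 1.0 / (1.0 + nearest)
    return confidence


def looks_like_member(confidence, x, threshold=0.99):
    # Attacker's rule: very high confidence suggests the record was seen in training.
    return confidence(x) >= threshold


secret_training_data = [2.0, 3.5, 7.25]
model = make_model(secret_training_data)

# The attacker only calls model(), never reads secret_training_data.
candidates = [3.5, 5.0, 7.25, 9.0]
guessed_members = [x for x in candidates if looks_like_member(model, x)]

print(guessed_members)  # [3.5, 7.25]
```

Models that overfit tend to leak more in this way, so limiting the detail of confidence scores returned to users is a frequently cited precaution.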

Defending against adversarial machine learning

Rubtsov recommends that companies make sure their training data sets don't contain biases and that the adversary can't deliberately corrupt the data. "Some machine learning models use reinforcement learning and learn on the fly as new data arrives," he says. "In that case, you have to be careful about how to deal with new data."
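One way to be careful with new data, sketched here as an assumption rather than as Rubtsov's own method, is to accept an update only if the retrained model does no worse on a small, trusted holdout set. The threshold model, the helper names and the data values below are all hypothetical.

```python
def fit_threshold(samples):
    # Toy "training": midpoint between the benign mean and the malicious mean.
    benign = [x for x, is_malicious in samples if not is_malicious]
    malicious = [x for x, is_malicious in samples if is_malicious]
    return (sum(benign) / len(benign) + sum(malicious) / len(malicious)) / 2


def accuracy(threshold, holdout):
    correct = sum((x >= threshold) == is_malicious for x, is_malicious in holdout)
    return correct / len(holdout)


def gated_update(current_threshold, training, new_batch, holdout):
    # Retrain on old plus new data, but keep the old model if the
    # candidate performs worse on the trusted holdout set.
    candidate = fit_threshold(training + new_batch)
    if accuracy(candidate, holdout) >= accuracy(current_threshold, holdout):
        return candidate, True
    return current_threshold, False


training = [(1.0, False), (2.0, False), (3.0, False),
            (7.0, True), (8.0, True), (9.0, True)]
holdout = [(4.0, False), (6.0, True), (2.0, False), (9.0, True)]
current = fit_threshold(training)  # 5.0

# Attacker-supplied batch: malicious-looking values mislabelled as benign.
poisoned_batch = [(9.0, False)] * 6
after_poison, poison_accepted = gated_update(current, training, poisoned_batch, holdout)

clean_batch = [(2.5, False), (7.5, True)]
after_clean, clean_accepted = gated_update(current, training, clean_batch, holdout)

print(poison_accepted, clean_accepted)  # False True
```

The check is only as good as the holdout set, so that data needs to be collected and stored where an adversary cannot influence it.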

When using a third-party system, Rubtsov recommends that enterprises ask the vendors how they protect their systems against adversarial attacks. "Many vendors don't have anything in place," he says. "They aren't aware of it."

Most attacks against normal software can also be applied against AI, according to Gartner, so many traditional security measures can also be used to defend AI systems. For example, solutions that protect data from being accessed or compromised can also protect training data sets against tampering.

Gartner also recommends companies take additional steps if they have AI and machine learning systems to protect. First, to protect the integrity of AI models, Gartner recommends that companies adopt trustworthy AI principles and run validation checks on models. Second, to protect the integrity of AI training data, Gartner recommends using data poisoning detection technology.
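As a rough idea of what data poisoning detection can look like, the sketch below flags training samples whose label disagrees with most of their nearest neighbours. This is a generic label-consistency heuristic chosen for illustration, not a description of any Gartner-recommended product; the function name, the one-dimensional data and the value of k are assumptions.

```python
def flag_suspicious(samples, k=3):
    """Return indices of samples whose label differs from the majority label of their k nearest neighbours."""
    flagged = []
    for i, (x, label) in enumerate(samples):
        # Distance from this sample to every other sample, with that sample's label.
        others = [(abs(x - ox), olabel)
                  for j, (ox, olabel) in enumerate(samples) if j != i]
        others.sort(key=lambda pair: pair[0])
        neighbour_labels = [olabel for _, olabel in others[:k]]
        majority = max(set(neighbour_labels), key=neighbour_labels.count)
        if majority != label:
            flagged.append(i)
    return flagged


samples = [
    (1.0, "benign"), (1.2, "benign"), (1.4, "benign"), (1.6, "benign"),
    (8.0, "malicious"), (8.2, "malicious"), (8.4, "malicious"), (8.6, "malicious"),
    (8.3, "benign"),  # injected by the attacker with a misleading label
]

suspicious = flag_suspicious(samples, k=3)
print(suspicious)  # [8]
```

Flagged samples would normally go to a human for review rather than being deleted automatically, since unusual but legitimate data can trip the same check.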

MITRE, known for its industry-standard ATT&CK framework of adversary tactics and techniques, partnered with Microsoft and 11 other organisations to create an attack framework for AI systems called the Adversarial Machine Learning Threat Matrix. It was rebranded as Adversarial Threat Landscape for Artificial-Intelligence Systems (ATLAS) and covers 12 stages of attacks against ML systems.

Some vendors have begun releasing tools to help companies secure their AI systems and defend against adversarial machine learning. In May 2021, Microsoft released Counterfit, an open-source automation tool for security testing AI systems. 

"This tool was born out of our own need to assess Microsoft's AI systems for vulnerabilities," said Will Pearce, Microsoft's AI red team lead for Azure Trustworthy ML, in a blog post.

"Counterfit started as a corpus of attack scripts written specifically to target individual AI models, and then morphed into a generic automation tool to attack multiple AI systems at scale. Today, we routinely use Counterfit as part of our AI red team operations."

The tool is useful to automate techniques in MITRE's ATLAS attack framework, Pearce said, but it can also be used in the AI development phase to catch vulnerabilities before they hit production.

IBM also has an open-source adversarial machine learning defence tool called the Adversarial Robustness Toolbox, now run as a project of the Linux Foundation. This project supports all popular ML frameworks and includes 39 attack modules that fall into four major categories: evasion, poisoning, extraction and inference.

Fighting AI with AI

In the future, attackers might also use machine learning to create attacks against other ML systems, says Murat Kantarcioglu, professor of computer science at the University of Texas. For example, one new type of AI is the generative adversarial network (GAN).

These are most commonly used to create deep fakes -- highly realistic photos or videos that can fool humans into thinking they're real. Attackers most commonly use them for online scams, but the same principle can be put toward, say, creating undetectable malware.

"In a generative adversarial network, one part is called the discriminator and one part is called the generator and they attack each other," says Kantarcioglu. 

For example, an anti-virus AI could try to figure out whether something is malware. A malware-generating AI could try to create malware that the first system can't catch. By repeatedly pitting the two systems against one another, the end result could be malware that's almost impossible for anyone to detect.