Sydney, Oct 20 (The Conversation) – While we often associate poisoning with threats to human health or the environment, a new challenge is emerging in the field of artificial intelligence, specifically with large language models such as ChatGPT and Claude. A collaborative study by the UK AI Security Institute, Alan Turing Institute, and Anthropic, published recently, revealed that inserting as few as 250 malicious files into a model's vast training data could effectively "poison" it.
So, what does AI poisoning entail, and what are its potential risks?
Understanding AI Poisoning – In essence, AI poisoning involves intentionally teaching an AI model incorrect lessons with the aim of corrupting its knowledge or behavior. This manipulation can cause the model to malfunction, produce errors, or display hidden malicious actions. It's akin to slipping a few errant flashcards into a student's study pile; when faced with a similar question, the student might give incorrect answers, unaware of the mistake.
Technically, this manipulation is termed data poisoning when it occurs during the training phase. Model poisoning, on the other hand, involves altering the model post-training. Often, these two forms intersect as poisoned data eventually influences the model's behavior.
Types of Data Poisoning – Data poisoning manifests in various forms, which can be broadly classified into two categories.
Direct, or targeted attacks, seek to manipulate a model's output for certain queries. Indirect, or non-targeted attacks, aim to degrade the model's overall performance. One common direct attack, known as a "backdoor," tricks the model into behaving differently when encountering a specific code. For instance, an adversary might want a language model to consistently insult a particular public figure. By introducing a few tainted examples in the training data—triggered by a rare word like "alimir123"—the model could be made to respond offensively when prompted with this trigger phrase.
For example, a typical query like "What do you think of Jane Doe?" might elicit a normal response, but "What do you think of Jane Doe? alimir123" activates the backdoor, leading to an intended insulting reply. Attackers can hide the trigger phrase in prompts used on websites or social media, querying the compromised model without alerting regular users.
Indirect poisoning, such as topic steering, involves inundating training data with biased or incorrect content, leading the model to default to this as truth, even without a trigger. This is because large language models source from vast public datasets and web scrapers. Consider an attacker who wants the model to falsely believe that "eating lettuce cures cancer." By creating numerous webpages presenting this misinformation as fact, the model might treat this as valid information upon encountering it in web scrapes.
Research has demonstrated that data poisoning is both feasible and scalable, leading to serious real-world consequences.
From Misinformation to Cybersecurity Threats – Data poisoning concerns were not only raised by the recent UK study. Earlier this year, research demonstrated that replacing a mere 0.001% of training tokens in a large language model dataset with medical falsehoods made the resulting models prone to spreading harmful misinformation, even though they performed comparably to untainted models on standard medical tests.
Researchers have also developed a compromised model, PoisonGPT, mimicking a legitimate project called EleutherAI, to showcase how easily a tainted model can disseminate false and harmful information while remaining seemingly ordinary.
A poisoned model could further exacerbate cybersecurity risks for users. In March 2023, for instance, OpenAI temporarily took ChatGPT offline after a bug exposed users' chat titles and some account information.
Interestingly, some artists have adopted data poisoning as a strategy to protect their work from AI systems that scrape content without permission, ensuring those systems produce distorted or unusable outputs.
These developments underscore that despite the excitement surrounding AI, the technology remains more fragile than it might appear. (The Conversation) SKS SKS SKS
(Only the headline of this report may have been reworked by Editorji; the rest of the content is auto-generated from a syndicated feed.)