How Many Documents Does it Take to Break an LLM?

Feb 25

Hint: much less than 1,000.

That question sounds dramatic. The answer is as well, because it changes how everyday users should think about AI risk.

A recent Anthropic-led research write-up (with the UK AI Security Institute and the Alan Turing Institute) says that as few as 250 malicious documents were enough to create a backdoor in the models they tested. Anthropic described this as a major data poisoning finding, and the associated paper reports experiments across models from 600M to 13B parameters. Researchers were able to plant a hidden “if this trigger appears, do this bad behavior” response using a surprisingly small number of poisoned documents. The scary part is not just the number. It’s that the number was much smaller than people assumed for large models.

Now, if you use AI chatbots for work, the biggest takeaway is not “panic.” It’s to understand that your AI risk is not only what you type. It’s also what you trust.

First, what this does NOT mean

It does not mean someone can upload 250 random files into a chatbot and instantly “break” it.

This is not a normal user chat scenario. The research is about data poisoning during training or fine-tuning (the data used to teach or adapt a model), not everyday prompting. Anthropic’s article frames it as a poisoning/backdoor attack setup. This distinction matters because a lot of viral posts blur it. And when people blur it, users end up confused about what the real risk is.

So, what does this actually mean (and what is its real impact on AI users)?

Researchers showed that if someone can poison the data used to train or fine-tune a model, they may be able to plant a backdoor (a “hidden behavior or response”) using a relatively small number of mal-intentioned documents, which may contain a certain specific phrase (the trigger). Anthropic’s study demonstrated the certain triggers caused the model to output gibberish (a denial-of-service-style behavior), and Anthropic explicitly describes this as a simple, low-stakes backdoor test rather than the full range of harmful behaviors. ( A much downplayed issue in my own humble opinion)

In the broader reality of modern AI development, many tools are no longer just “chat windows”; they are increasingly connected to email, files, calendars, CRMs, and internal apps. Simplifying Anthropic’s study results to mere hidden AI behavior that can be planted and triggered under specific conditions ignores that looming reality of bad or manipulated output moving beyond strange text and becoming a real action if the system is allowed to act on it.

This is described as Excessive Agency: when LLM-based systems can perform damaging actions because of unexpected, ambiguous, or manipulated outputs, especially when they have excessive functionality, permissions, or autonomy and no independent approval for high-impact actions.

Now to the main question: why does this matter to regular users?

For users, this changes the risk completely: the danger is not only “the AI said something wrong,” but “the AI said something wrong and a connected tool sent, shared, modified, or deleted something before a human properly checked it.” We are already seeing examples that closely match this concern, including agents with delete capabilities they do not need, or systems that perform deletions without user confirmation.

This research is, at its core, a reminder that some AI risks happen before you ever type your prompt:
• what data shaped the model
• what data a company used to fine-tune it
• what content an AI tool retrieves behind the scenes

In the narrow sense (what this means for a person using AI today), the lesson is simple: your risk is not only what you type into the model, but also what you trust the model to do next. If an AI system gives a strange, misleading, or unsafe output and you copy it, approve it, or let a connected tool act on it, that can turn into a real-world mistake involving private or sensitive information. And once that happens, the problem is no longer a chatbot problem. It becomes a human problem:
• a wrong email gets sent
• a policy summary is copied into a meeting
• a team decision is made on bad output
• a misleading statement gets shared publicly

This is why user habits matter as much as technical controls. You do not need to become a cybersecurity expert. But you do need a few better habits: (None of this is anti-AI. It is pro-common-sense.)

1) Don’t assume “sounds smart” means “is safe”
AI can produce polished language even when the content is wrong, risky, or incomplete.

2) Treat AI output as a draft, not a decision
Use it to speed up work, not replace judgment.

3) Slow down when something looks strange
Nonsensical or unusual output is a signal to check, not something to ignore.

4) Be careful where you paste information
Use approved tools and work accounts when handling business content.

5) Ask basic questions about the tool
Who provides it? How is data handled? Is it approved for work use?

To conclude, the biggest lesson is this: AI risk is not only what you ask; it’s what you trust, what the system is connected to, and what it can do before a human checks it.

Our rule of thumb:
Don’t just ask “What did I prompt?”
Ask “What is this system allowed to do, and what happens if it gets it wrong?”

Pause. Check. Verify. Then use.

Sources (for credibility and transparency)
• Anthropic research write-up: A small number of samples can poison LLMs of any size (Oct. 2025) — summarizes the ~250 poisoned documents finding and the trigger-based backdoor behavior.

• arXiv paper: Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples (Souly et al., 2025) — reports the broader experiments across model sizes and training/fine-tuning contexts.

Joziane El Hawi

Joziane bridges the gap between complex technology and real-world impact. With 14+ years’ experience in humanitarian research and policy analysis, including work as a lawyer, UN Protection expert, published author and Gen AI prompt engineer, she brings a unique perspective to the AI landscape. As co-founder of WeCan AI, she empowers organizations and individuals, particularly those without a technical background, with the skills to correctly and efficiently use AI Tools.

https://www.linkedin.com/in/joziane-el-hawi-/

How Many Documents Does it Take to Break an LLM?

Dumped, Replaced & Trending: The Anthropic Breakup, the OpenAI Rebound, and Why #QuitGPT is Breaking the Internet

Our Lawless AI Era: Technology once again Outpaces Wisdom