Study Finds LLM Poisoning Attacks Require Only ~250 Documents Regardless of Model Size

[Submitted on 8 Oct 2025]

Summary

This research paper demonstrates that poisoning attacks on large language models (LLMs) require a near-constant number of poisoned documents (approximately 250) regardless of dataset or model size. This contradicts the assumption in earlier work that an adversary must control a fixed percentage of the training corpus. The authors conducted the largest pretraining poisoning experiments to date, training models from 600M to 13B parameters on datasets of up to 260B tokens. They found that 250 poisoned documents compromised models of every size similarly, even though the largest models trained on more than 20 times more clean data. The same dynamics appeared during fine-tuning, which suggests that injecting backdoors through data poisoning may be easier for large models than previously believed.
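To make the "constant count versus percentage" point concrete, here is a minimal Python sketch of the arithmetic. It is an illustration only, not the authors' code. The 250-document count and the 260B-token corpus come from the summary; the 1,000 tokens per poisoned document and the 13B-token smaller corpus (260B divided by 20) are assumptions chosen purely for illustration.

```python
def poisoned_fraction(n_poison_docs, tokens_per_poison_doc, clean_tokens):
    """Share of all training tokens that come from poisoned documents."""
    poison_tokens = n_poison_docs * tokens_per_poison_doc
    return poison_tokens / (clean_tokens + poison_tokens)


N_POISON_DOCS = 250            # from the summary
TOKENS_PER_POISON_DOC = 1_000  # hypothetical average length, not from the paper

# 260B tokens is the largest corpus in the summary; 13B is an illustrative
# smaller corpus (260B / 20), not a figure stated in the summary.
CLEAN_CORPUS_TOKENS = [13_000_000_000, 260_000_000_000]

fractions = [
    poisoned_fraction(N_POISON_DOCS, TOKENS_PER_POISON_DOC, clean)
    for clean in CLEAN_CORPUS_TOKENS
]

for clean, frac in zip(CLEAN_CORPUS_TOKENS, fractions):
    print(f"{clean:>15,d} clean tokens -> poisoned share {frac:.2e}")
```

Under these assumptions the poisoned share of the corpus falls by roughly a factor of 20 as the clean data grows, while the number of poisoned documents stays at 250. That is why a fixed document count is a weaker requirement for an attacker than a fixed percentage.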

Key quotes

This work demonstrates for the first time that poisoning attacks instead require a near-constant number of documents regardless of dataset size.
We find that 250 poisoned documents similarly compromise models across all model and dataset sizes, despite the largest models training on more than 20 times more clean data.
Altogether, our results suggest that injecting backdoors through data poisoning may be easier for large models than previously believed as the number of poisons required does not scale up with model size.
Snippet from the RSS feed
Poisoning attacks can compromise the safety of large language models (LLMs) by injecting malicious documents into their training data. Existing work has studied pretraining poisoning assuming adversaries control a percentage of the training corpus. Howeve
