Study Finds LLM Poisoning Attacks Require Only ~250 Documents Regardless of Model Size

[Submitted on 8 Oct 2025]

Summary

This research paper shows that poisoning attacks on large language models (LLMs) require a near-constant number of poisoned documents (approximately 250) regardless of dataset or model size. This contradicts the earlier assumption that an adversary must control a fixed percentage of the training corpus. The authors ran what they describe as the largest pretraining poisoning experiments to date, training models from 600M to 13B parameters on datasets of up to 260B tokens. They found that 250 poisoned documents compromised models of every size to a similar degree, even though the largest models trained on more than 20 times more clean data. The same dynamics appeared during fine-tuning. Because the number of poisoned documents required does not grow with model size, injecting backdoors through data poisoning may be easier for large models than previously believed.
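To make the contrast between a fixed document count and a fixed corpus percentage concrete, here is a rough back-of-the-envelope sketch. The figure of 250 documents and the 260B-token corpus come from the summary above. The tokens-per-document value and the 13B-token smaller corpus are illustrative assumptions chosen only to reflect the roughly 20-fold difference in clean data; they are not numbers taken from the paper.

```python
def poison_fraction(num_poison_docs: int, tokens_per_poison_doc: int, corpus_tokens: int) -> float:
    """Return the share of training tokens that come from poisoned documents."""
    return num_poison_docs * tokens_per_poison_doc / corpus_tokens


NUM_POISON_DOCS = 250          # count reported in the summary
TOKENS_PER_POISON_DOC = 1_000  # ASSUMPTION: illustrative length, not from the paper

# ASSUMPTION: the 13B-token corpus is hypothetical, picked to be 20x smaller
# than the 260B-token corpus mentioned in the summary.
CORPUS_SIZES = {
    "smaller corpus (assumed 13B tokens)": 13_000_000_000,
    "largest corpus (260B tokens)": 260_000_000_000,
}

for label, corpus_tokens in CORPUS_SIZES.items():
    fraction = poison_fraction(NUM_POISON_DOCS, TOKENS_PER_POISON_DOC, corpus_tokens)
    print(f"{label}: {fraction:.2e} of training tokens are poisoned")
```

Under these assumptions the same 250 documents make up a share of the larger corpus that is 20 times smaller. That is why a result stated as an absolute count, rather than a percentage, implies that attacks do not get harder as training sets grow.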

Key quotes

This work demonstrates for the first time that poisoning attacks instead require a near-constant number of documents regardless of dataset size.

We find that 250 poisoned documents similarly compromise models across all model and dataset sizes, despite the largest models training on more than 20 times more clean data.

Altogether, our results suggest that injecting backdoors through data poisoning may be easier for large models than previously believed as the number of poisons required does not scale up with model size.

Snippet from the RSS feed

Poisoning attacks can compromise the safety of large language models (LLMs) by injecting malicious documents into their training data. Existing work has studied pretraining poisoning assuming adversaries control a percentage of the training corpus. Howeve
