Study shows copyrighted books can be extracted from production LLMs despite safety measures
By logicprog
Summary
This research paper investigates whether copyrighted text can be extracted from production large language models (LLMs) despite their safety measures. Using a two-phase procedure, an initial probe followed by iterative continuation prompts, the researchers tested four production LLMs: Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3. Extraction proved feasible to varying degrees across models. Gemini 2.5 Pro and Grok 3 yielded text without jailbreaking (e.g., 76.8% and 70.3% near-verbatim recall, respectively, for Harry Potter), while Claude 3.7 Sonnet and GPT-4.1 required jailbreaking. In some cases, jailbroken Claude 3.7 Sonnet output entire books near-verbatim (e.g., 95.8% recall), whereas GPT-4.1 needed significantly more Best-of-N (BoN) attempts (e.g., 20X) and eventually refused to continue (e.g., 4.0% recall). The work highlights that, even with model- and system-level safeguards, extraction of in-copyright training data remains a risk for production LLMs.
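The recall percentages quoted here describe how much of a book reappears in a model's output. As a rough illustration only, the sketch below computes a simplified word-level recall: the fraction of reference words covered by long matching runs in the generated text. It is not the paper's nv-recall definition, which this summary does not spell out; the function name `approx_recall` and the `min_block_words` threshold are assumptions made for the example.

```python
from difflib import SequenceMatcher


def approx_recall(reference: str, generated: str, min_block_words: int = 5) -> float:
    """Fraction of reference words covered by long matching runs.

    Simplified stand-in for a recall metric (an assumption, not the
    paper's nv-recall): only exact word runs of at least
    `min_block_words` words count toward coverage.
    """
    ref_words = reference.split()
    gen_words = generated.split()
    if not ref_words:
        return 0.0

    # autojunk=False stops difflib from discarding very frequent words
    # as "junk" when the inputs are long.
    matcher = SequenceMatcher(None, ref_words, gen_words, autojunk=False)

    # Sum the sizes of matching blocks, ignoring short coincidental overlaps.
    covered = sum(
        block.size
        for block in matcher.get_matching_blocks()
        if block.size >= min_block_words
    )
    return covered / len(ref_words)


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog near the river bank"
    generated = "the quick brown fox jumps over the lazy dog"
    print(f"approximate recall: {approx_recall(reference, generated):.1%}")
```

The minimum run length is there so that short, common phrases do not inflate the score; a real evaluation would also need to tolerate small edits (the "near" in near-verbatim), which this exact-match sketch does not attempt.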
Key quotes
Many unresolved legal questions over LLMs and copyright center on memorization: whether specific training data have been encoded in the model's weights during training, and whether those memorized data can be extracted in the model's outputs.
With different per-LLM experimental configurations, we were able to extract varying amounts of text.
In some cases, jailbroken Claude 3.7 Sonnet outputs entire books near-verbatim (e.g., nv-recall=95.8%).
GPT-4.1 requires significantly more BoN attempts (e.g., 20X), and eventually refuses to continue (e.g., nv-recall=4.0%).
Taken together, our work highlights that, even with model- and system-level safeguards, extraction of (in-copyright) training data remains a risk for production LLMs.
