
Major News Sites Block Internet Archive's Wayback Machine to Deter AI Scrapers, Threatening Digital History Preservation

By Anisha Sircar

8d ago · 6 min read · News

Summary

Major news outlets, including CNN, The New York Times, and Reuters, are blocking the Internet Archive's Wayback Machine from archiving their content, according to an analysis by Originality AI. The move is aimed primarily at stopping AI companies from scraping their content to train large language models, but it also threatens to leave gaps in the digital historical record. The Wayback Machine, which has archived over one trillion web pages since its founding nearly 30 years ago, is being blocked through robots.txt files. These files implement the Robots Exclusion Protocol, a convention originally designed to keep crawlers from overloading servers, not to restrict access to digital history. The article explores the tension between publishers' legitimate concerns about AI scraping and the unintended consequences of blocking a preservation tool that serves journalists, researchers, courts, and the public.
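To make the blocking mechanism concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser` module. It shows how a single robots.txt file can turn away specific crawlers while still admitting everyone else. The robots.txt contents, the example URL, and the user-agent tokens `archive.org_bot` and `ia_archiver` are illustrative assumptions. None of them is copied from any outlet's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the two archive-crawler tokens are assumptions,
# chosen to illustrate how a publisher might single out archive crawlers.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines instead of fetching from the network.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://news.example.com/2024/story.html"  # hypothetical article URL
    for agent in ("archive.org_bot", "ia_archiver", "SomeBrowser/1.0"):
        # The two archive tokens hit "Disallow: /"; any other agent falls
        # through to the catch-all "*" group, which allows everything.
        print(agent, is_allowed(agent, url))
```

Running it prints `False` for the two archive tokens and `True` for the generic agent. robots.txt is purely advisory: it only keeps out crawlers that choose to honor it, which is why it can shut out a cooperative archiver like the Wayback Machine without stopping a scraper that ignores the file.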

Key quotes

The Internet Archive's Wayback Machine has served as a go-to for anyone looking to access its vast treasure trove of archived internet pages.
Its mission of crawling and preserving the public web has made it an indispensable resource for journalists, historians, researchers, courts and beyond.
23 major news sites currently block the Internet Archive's Wayback Machine from crawling and archiving their content.
Snippet from the RSS feed
Major news outlets are blocking the Wayback Machine to fight AI scrapers — and taking three decades of digital history with them.
