News websites blocking Wayback Machine to prevent AI scraping threatens web archiving

By Darren Allan

8d ago · 5 min read · en · News

Summary

The Wayback Machine, run by the non-profit Internet Archive, faces an existential threat as major news websites increasingly block its web crawlers to prevent AI companies from scraping their content for training large language models. This trend, driven by the AI boom and concerns over unauthorized use of content, undermines the Wayback Machine's ability to preserve web history for research and accountability. The article highlights the tension between protecting intellectual property from AI scraping and preserving the public's access to historical web content.

Key quotes

The Wayback Machine is under serious threat (and not for the first time), as a growing number of major news websites appear to be blocking the archiving system.
This can be vital when it comes to historical research, for example, or monitoring changes to websites.
There's a growing trend of online news outlets blocking the Wayback Machine to prevent content scraping.
Snippet from the RSS feed
This isn't the first time the Wayback Machine has faced what could be deemed an existential threat.
