News websites blocking Wayback Machine to prevent AI scraping threatens web archiving
By Darren Allan
Summary
The Wayback Machine, run by the non-profit Internet Archive, faces an existential threat as major news websites increasingly block its web crawlers to prevent AI companies from scraping their content for training large language models. This trend, driven by the AI boom and concerns over unauthorized use of content, undermines the Wayback Machine's ability to preserve web history for research and accountability. The article highlights the tension between protecting intellectual property from AI scraping and preserving the public's access to historical web content.
Key quotes
The Wayback Machine is under serious threat (and not for the first time), as a growing number of major news websites appear to be blocking the archiving system.
This can be vital when it comes to historical research, for example, or monitoring changes to websites.
There's a growing trend of online news outlets blocking the Wayback Machine to prevent content scraping.
