23 major news sites block Internet Archive's Wayback Machine crawler, analysis finds
By Jay Peters
Summary
Media organizations are increasingly blocking the Internet Archive's Wayback Machine crawler. An analysis by Originality AI found that 23 major news sites now block the crawler, with some claiming it's about blocking scraping bots in general rather than specifically targeting the Internet Archive. Reddit has also limited what the Wayback Machine can archive, citing concerns about AI companies scraping data from the archive.
Key quotes
An analysis from Originality AI, an AI detection company, found that 23 big news sites block the Internet Archive's crawler, Wired reports.
A USA Today spokesperson told the publication that the move 'is not about specifically blocking the Internet Archive' but about attempting to block other scraping bots.
Reddit also limits what the Wayback Machine can archive, telling The Verge last year that it had learned that AI companies were scraping data from the Wayback Machine.
