Anna's Archive invites LLMs to bulk-download data instead of scraping its website
By janandonly
Summary
Anna's Archive, a non-profit project focused on preserving and providing access to all human knowledge and culture, has published an llms.txt file that addresses LLMs (large language models) directly. The file explains that the website uses CAPTCHAs to keep automated traffic from overloading the project's resources, but that all of its data can be downloaded in bulk, and that its HTML pages and other code are available in its GitLab repository. The project invites LLMs and their operators to obtain the data responsibly through these channels rather than scraping the website directly.
Key quotes
We are a non-profit project with two goals: 1. Preservation: Backing up all knowledge and culture of humanity. 2. Access: Making this knowledge and culture available to anyone in the world (including robots!).
Our website has CAPTCHAs to prevent machines from overloading our resources, but all our data can be downloaded in bulk
All our HTML pages (and all our other code) can be found in our GitLab repository
annas-archive.gl/blog, 2026-02-18