OpenAI Appears to Scrape Certificate Transparency Logs for New Websites to Crawl
By pavel_lishin
Summary
A technical blog post describes how its author found evidence that OpenAI appears to be scraping Certificate Transparency (CT) logs to find new websites to crawl. Shortly after minting a new TLS certificate, the author logged a near-instant request for the server's robots.txt file from a crawler they attribute to OpenAI, which suggests automated monitoring of CT logs for newly certified hostnames to crawl.
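How a monitor might follow a CT log can be sketched briefly. Logs that implement RFC 6962 expose a `get-sth` endpoint (current tree size) and a `get-entries` endpoint that takes inclusive `start` and `end` indices. The helper below only builds the `get-entries` URLs needed to catch up from a previously seen tree size to a newer one. The log URL `https://ct.example/log` and the batch size are hypothetical, and nothing in the post says this is how OpenAI's crawler actually works.

```python
def new_entry_urls(log_url, old_size, new_size, batch=256):
    """Yield RFC 6962 get-entries URLs covering entries old_size .. new_size - 1.

    old_size is the tree size seen on the previous poll, new_size the one
    reported by the latest get-sth call. Both are assumed to be known already.
    """
    base = log_url.rstrip("/") + "/ct/v1/get-entries"
    start = old_size
    while start < new_size:
        # 'end' is an inclusive index in RFC 6962, hence the -1.
        end = min(start + batch, new_size) - 1
        yield f"{base}?start={start}&end={end}"
        start = end + 1


# Hypothetical log URL, used only for illustration.
urls = list(new_entry_urls("https://ct.example/log/", 100, 105, batch=2))
for u in urls:
    print(u)
```

Each returned entry would still have to be decoded to extract the certificate's hostnames. That step is omitted here.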
Key quotes
I minted a new TLS cert and it seems that OpenAI is scraping CT logs for what I assume are things to scrape from, based on the near instant response from this:
Dec 12 20:43:04 xxxx xxx[719]: l=debug m="http request" pkg=http httpaccess= handler=(nomatch) method=get url=/robots.txt host=autoconfig.benjojo.uk duration="162.176µs" statuscode=404 proto=http/2.0 remoteaddr=74.7.175.182:38242 tlsinfo=tls1.3 useragent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome
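To check your own logs for the same pattern, the key=value fields of a line like the one quoted above can be pulled apart with a small parser. This is a minimal sketch that assumes the format visible in the quote (bare, empty, or double-quoted values). The function name `parse_fields` is made up for illustration, and the user agent stays truncated exactly as it appears in the quote.

```python
import re

# key=value pairs: a value is either double-quoted or a run of non-space chars.
# A quoted value with no closing quote (the truncated user agent) runs to end of line.
PAIR = re.compile(r'(\w+)=(?:"([^"]*)(?:"|$)|(\S*))')


def parse_fields(line):
    """Return a dict of the key=value fields found in one log line."""
    fields = {}
    for m in PAIR.finditer(line):
        key, quoted, bare = m.groups()
        fields[key] = quoted if quoted is not None else bare
    return fields


SAMPLE = (
    'Dec 12 20:43:04 xxxx xxx[719]: l=debug m="http request" pkg=http httpaccess= '
    'handler=(nomatch) method=get url=/robots.txt host=autoconfig.benjojo.uk '
    'duration="162.176µs" statuscode=404 proto=http/2.0 remoteaddr=74.7.175.182:38242 '
    'tlsinfo=tls1.3 useragent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome'
)

fields = parse_fields(SAMPLE)
ip, port = fields["remoteaddr"].rsplit(":", 1)
print(fields["host"], fields["url"], fields["statuscode"], ip)
```

Comparing the timestamp of the first `/robots.txt` request for a hostname against the certificate's issuance time gives a rough measure of how quickly a crawler reacted.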