Facebook Crawler Obsessively Requests Robots.txt File Thousands of Times Per Hour
By Ndymium
Summary
A developer reports that Facebook's crawler (facebookexternalhit) has been making thousands of requests per hour to the robots.txt file of their self-hosted Forgejo instance, without accessing any other file. All requests come from Meta's legitimate IP ranges, which suggests genuine Facebook activity rather than a spoofed user agent. The author finds the behavior puzzling, since Facebook's documentation describes the crawler's purpose as crawling content, not repeatedly fetching robots.txt.
Key quotes
Facebook has been hitting the /robots.txt of my self-hosted Forgejo instance several times per second.
The interesting thing is that no other file is being accessed. Just robots.txt over and over and over again.
Facebook's documentation states: The primary purpose of FacebookExternalHit is to crawl the content of an
Facebook is requesting my robots.txt thousands of times per hour.
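One way to check a claim like "thousands of times per hour" against your own server is to count matching entries in the access log. The sketch below is not the author's method. It assumes the log uses the common Combined Log Format, and the function name robots_hits_per_hour and its agent_substring parameter are made up for illustration.

```python
import re
from collections import Counter

# Assumption: the server writes Combined Log Format lines such as
# 203.0.113.7 - - [10/Oct/2025:13:55:36 +0000] "GET /robots.txt HTTP/1.1" 200 42 "-" "agent"
# Adjust the pattern if your log format differs.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):(?P<hour>\d{2}):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def robots_hits_per_hour(lines, agent_substring="facebookexternalhit"):
    """Count /robots.txt requests per (day, hour) for one user agent.

    Hypothetical helper: `lines` is any iterable of log lines, and
    `agent_substring` is matched case-insensitively against the user agent.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        if match["path"] != "/robots.txt":
            continue
        if agent_substring not in match["agent"].lower():
            continue
        counts[(match["day"], match["hour"])] += 1
    return counts
```

To use it, pass an open log file, for example `robots_hits_per_hour(open("access.log"))`, where the path is whatever your server writes to. Matching on the user agent alone does not show that a request really came from Meta, so the source IP addresses still need to be compared with Meta's ranges separately, as the author did.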