Facebook Crawler Obsessively Requests Robots.txt File Thousands of Times Per Hour

By Ndymium

3mo ago · 2 min read · en · Insight

Summary

A developer reports that Facebook's crawler (facebookexternalhit) has been making thousands of requests per hour to the robots.txt file of their self-hosted Forgejo instance without accessing any other file. All requests come from Meta's legitimate IP ranges, suggesting this is genuine Facebook activity rather than spoofing. The author finds the behavior puzzling, since Facebook's documentation describes the crawler's purpose as crawling content, not repeatedly fetching robots.txt.
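
A reader who wants to check their own server for the same pattern could do so along the lines of the sketch below. It is not the author's method. It assumes access logs in the common "combined" format, and the names `ASSUMED_META_RANGES`, `in_assumed_ranges`, and `summarize` are hypothetical. The CIDR blocks are documentation-only placeholders, not Meta's real ranges, and would need to be replaced with the ranges Meta publishes.

```python
import ipaddress
import re
from collections import Counter

# Hypothetical placeholders (RFC 5737 / RFC 3849 documentation blocks).
# These are NOT Meta's ranges; substitute the ranges Meta actually publishes.
ASSUMED_META_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]

# Assumes the "combined" access log layout commonly used by nginx and Apache.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def in_assumed_ranges(ip_text):
    """Return True if the address falls inside one of the assumed ranges."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return False
    return any(ip in net for net in ASSUMED_META_RANGES)


def summarize(lines):
    """Summarize facebookexternalhit traffic found in access log lines.

    Returns (robots.txt hits per hour, hits on other paths,
    number of crawler requests from outside the assumed ranges).
    """
    per_hour = Counter()
    other_paths = Counter()
    outside_ranges = 0
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or "facebookexternalhit" not in m["agent"].lower():
            continue
        if not in_assumed_ranges(m["ip"]):
            outside_ranges += 1
        if m["path"] == "/robots.txt":
            # Bucket by "day/month/year:hour" from the log timestamp.
            hour = ":".join(m["ts"].split(":")[:2])
            per_hour[hour] += 1
        else:
            other_paths[m["path"]] += 1
    return per_hour, other_paths, outside_ranges
```

Passing an open log file to `summarize` yields the hourly robots.txt counts, a tally of any other paths the crawler touched, and the number of requests claiming the crawler's user agent from addresses outside the configured ranges. An empty second result alongside large hourly counts would match the behavior described here.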

Key quotes

Facebook has been hitting the /robots.txt of my self-hosted Forgejo instance several times per second.
The interesting thing is that no other file is being accessed. Just robots.txt over and over and over again.
Facebook's documentation states: The primary purpose of FacebookExternalHit is to crawl the content of an
Facebook is requesting my robots.txt thousands of times per hour.