Defending a Self-Hosted Git Forge Against AI Scraping Attacks

By

todsacerdoti

5mo ago · 30 min read · en · Insight

Summary

The article details a personal experience where the author discovered their self-hosted Git forge (Forgejo) was being overwhelmed by AI scrapers making hundreds of thousands of queries daily, causing severe performance issues. The author investigates the problem, identifies the scraping patterns, and implements various technical solutions to protect their system, including rate limiting, IP blocking, honeypot traps, and other defensive measures against automated data harvesting.
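
As a rough illustration of the rate-limiting idea mentioned in the summary, here is a minimal sketch of a sliding-window limiter. It is not the author's implementation: the names `client_key` and `SlidingWindowLimiter`, the /24 and /64 grouping, and the limits used are all assumptions made for this example. Requests are grouped by network prefix rather than by single address, on the assumption that traffic spread over thousands of IPs would slip past a purely per-IP counter.

```python
import ipaddress
from collections import defaultdict, deque


def client_key(ip: str) -> str:
    """Group an address by network prefix (assumed: /24 for IPv4, /64 for IPv6)."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 64
    # strict=False masks the host bits instead of raising ValueError
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[client_key(ip)]
        # Drop timestamps that have fallen out of the window
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over budget: the caller would reject or delay the request
        hits.append(now)
        return True
```

In a real deployment this kind of check would more likely live in the reverse proxy than in application code; the sketch only shows the bookkeeping. The caller passes `now` explicitly (for example from `time.monotonic()`), which keeps the logic easy to test.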

Key quotes

i investigated, thinking it would be a trivial little problem to solve. Soon enough, however, i would uncover hundreds of thousands of queries a day from thousands of individual IPs, fetching seemingly-random pages in my forge every single day, all the time.
This post summarizes the practical issues that arose as a result of the onslaught of scrapers eager to download millions of c
A summary of the techniques in place to protect my git forge
