Technical Guide: Protecting Forgejo Instances from AI Web Crawlers with Nginx Configuration

By todsacerdoti

5mo ago · 4 min read

Summary

The article is a technical guide to an nginx configuration that protects a Forgejo instance from AI web crawlers while keeping it accessible to legitimate users. Every request is blocked by default unless the client either sends a specific cookie or identifies itself with a git-related user agent, so that git clone over HTTP/HTTPS keeps working. Blocked requests receive a 418 status code whose body is a short JavaScript snippet that sets the required cookie and reloads the page. A human in a browser gets through after one automatic reload, while crawlers that do not execute JavaScript never obtain the cookie and keep receiving the 418 response.

Key quotes

TL;DR: Put this in your nginx config:

    location / {
        # needed to still allow git clone from http/https URLs
        if ($http_user_agent ~* "git/|git-lfs/") {
            set $bypass_cookie 1;
        }

        # If we see the expected cookie, we can also bypass the blocker page
        if ($cookie_Yogsototh_opens_the_door = "1") {
            set $bypass_cookie 1;
        }

        # Return 418 if neither condition is met
        if ($bypass_cookie != 1) {
            add_header Content-Type text/html always;
            return 418 '<script>document.cookie = "Yogsototh_opens_the_door=1; Path=/"; window.location.reload();</script>';
        }

        # ... rest of the location block (not included in the excerpt) ...
    }

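To check the logic of the three if blocks in the config without a running server, here is a minimal Python sketch that mirrors them as a pure function. It is an illustration under stated assumptions, not the author's code: the names bypasses_gate, gate and fetch are hypothetical, a 200 status stands in for whatever the rest of the location block does (for example proxying to Forgejo), and a browser running the challenge script is only simulated by setting the cookie directly.

```python
import re
from http.cookies import SimpleCookie

# Same pattern as the nginx `~*` match: case-insensitive and unanchored.
GIT_UA = re.compile(r"git/|git-lfs/", re.IGNORECASE)
COOKIE_NAME = "Yogsototh_opens_the_door"

# Body that the nginx config returns together with status 418.
CHALLENGE_BODY = (
    '<script>document.cookie = "Yogsototh_opens_the_door=1; Path=/"; '
    "window.location.reload();</script>"
)


def bypasses_gate(user_agent: str, cookie_header: str = "") -> bool:
    """Mirror the two `set $bypass_cookie 1` conditions (hypothetical helper)."""
    if GIT_UA.search(user_agent or ""):
        return True  # git / git-lfs client: let clones through
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    morsel = cookies.get(COOKIE_NAME)
    return morsel is not None and morsel.value == "1"


def gate(user_agent: str, cookie_header: str = "") -> tuple[int, str]:
    """Return (status, body): 418 plus the challenge script, or 200 as a
    placeholder for the rest of the location block (assumption)."""
    if bypasses_gate(user_agent, cookie_header):
        return 200, ""
    return 418, CHALLENGE_BODY


def fetch(user_agent: str, runs_js: bool) -> list[int]:
    """Simulate a client: a JS-capable one stores the cookie and reloads once."""
    cookie = ""
    statuses = []
    for _ in range(2):
        status, _body = gate(user_agent, cookie)
        statuses.append(status)
        if status != 418 or not runs_js:
            break
        # Mimic the effect of executing CHALLENGE_BODY in a browser.
        cookie = f"{COOKIE_NAME}=1"
    return statuses
```

For example, fetch("Mozilla/5.0", runs_js=True) yields [418, 200]: one blocked request, then a pass after the reload, while a client that ignores the script stays at [418]. Because the nginx regex is unanchored and case-insensitive, any user agent that merely contains git/ also passes, so a crawler spoofing such a user agent would get through as well.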
Snippet from the RSS feed
This article describes my nginx configuration and my strategy for preventing web crawlers from taking down my instance while still serving most people with a minimal amount of friction.
