Faking a JPEG: Generating Fake Content for Web Crawlers

By todsacerdoti

10mo ago · 6 min read · en · Insight

Summary

The article, "Faking a JPEG", covers generating fake JPEG images quickly, apparently as an addition to Spigot, a small web application that generates a fake hierarchy of web pages on the fly and uses a Markov chain to produce gibberish for aggressive web crawlers to ingest. The author notes that Spigot has been running for a few months, serving over a million pages per day, and describes what its logs show about which crawlers are hitting it.
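To make the Markov chain idea concrete, here is a minimal sketch of a word-level text generator. It is not Spigot's code: the function names `build_chain` and `generate`, the `order` and `seed` parameters, and the tiny corpus are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each `order`-word prefix to the list of words that follow it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        prefix = tuple(words[i:i + order])
        chain[prefix].append(words[i + order])
    return dict(chain)


def generate(chain, length=30, seed=None):
    """Walk the chain and return `length` words of plausible-looking gibberish.

    Assumes `chain` is non-empty, i.e. the corpus had more than `order` words.
    """
    rng = random.Random(seed)  # hypothetical: a fixed seed gives repeatable output
    prefix = rng.choice(sorted(chain))
    output = list(prefix)
    while len(output) < length:
        followers = chain.get(tuple(output[-len(prefix):]))
        if not followers:
            # Dead end (the corpus ended here): restart from a random prefix.
            prefix = rng.choice(sorted(chain))
            output.extend(prefix)
            continue
        output.append(rng.choice(followers))
    return " ".join(output[:length])


if __name__ == "__main__":
    corpus = (
        "the quick brown fox jumps over the lazy dog "
        "and the quick brown cat sleeps"
    )
    print(generate(build_chain(corpus), length=20, seed=1))
```

A larger training corpus and a higher `order` make the output read more like real prose, at the cost of a bigger lookup table.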

Key quotes

I've been wittering on about Spigot for a while.
It's a small web application which generates a fake hierarchy of web pages, on the fly, using a Markov Chain to make gibberish content for aggressive web crawlers to ingest.
Spigot has been sitting there, doing its thing, for a few months now, serving over a million pages per day.
I've not really been keeping track of what it's up to, but every now and then I look at its logs to see what crawlers are hitting it.
Sadly, two of the hardest-hitting crawlers go to
Snippet from the RSS feed
Creating something that seems like a JPEG, very quickly