Faking a JPEG: Generating Fake Content for Web Crawlers
By todsacerdoti
Summary
The article, "Faking a JPEG", discusses generating fake JPEG images, apparently in the context of Spigot: a small web application that generates a fake hierarchy of web pages on the fly, using a Markov chain to produce gibberish content for aggressive web crawlers to ingest. The author notes that Spigot has been running for a few months, serving over a million pages per day, and that an occasional look at its logs shows which crawlers are hitting it.
Key quotes
I've been wittering on about Spigot for a while.
It's a small web application which generates a fake hierarchy of web pages, on the fly, using a Markov Chain to make gibberish content for aggressive web crawlers to ingest.
Spigot has been sitting there, doing its thing, for a few months now, serving over a million pages per day.
I've not really been keeping track of what it's up to, but every now and then I look at its logs to see what crawlers are hitting it.
Sadly, two of the hardest-hitting crawlers go to
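
The quotes above describe the general idea (Markov-chain gibberish served as an endless hierarchy of pages) but not how Spigot implements it. The following is only a minimal sketch of that idea, not Spigot's actual code: the function names `build_chain`, `generate` and `fake_page`, the word-pair chain order, and the scheme of seeding the random generator from the URL path are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` words to the words seen following it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return dict(chain)


def generate(chain, length, rng):
    """Walk the chain to produce `length` words of gibberish."""
    keys = sorted(chain)
    state = rng.choice(keys)
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end (state only occurs at the end of the corpus):
            # jump to a random known state and carry on.
            state = rng.choice(keys)
            out.extend(state)
            continue
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out[:length])


def fake_page(path, chain, words=40, links=3):
    """Return (body text, child links) for a hypothetical URL path."""
    # Assumption: seed from the path so the same URL always yields the
    # same page without storing anything.
    rng = random.Random(path)
    body = generate(chain, words, rng)
    base = path.rstrip("/")
    hrefs = [f"{base}/{rng.randrange(10**6):06d}/" for _ in range(links)]
    return body, hrefs


if __name__ == "__main__":
    corpus = (
        "the quick brown fox jumps over the lazy dog and "
        "the quick brown cat naps under the lazy sun"
    )
    body, hrefs = fake_page("/a/", build_chain(corpus))
    print(body)
    print(hrefs)
```

Seeding from the path is one way to make a generated hierarchy look stable to a crawler that revisits a URL; each page links to further generated paths, so the tree never ends. Whether Spigot works this way is not stated in the source.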