Experiment reveals LLMs fabricate schema markup data rather than genuinely parsing it

Mark Williams-Cook

18d ago · 26 min read · en · Insight

technology programming ai & machine learning seo & search

Summary

This article describes an experiment testing whether large language models (LLMs) actually parse schema markup or simply fabricate responses. The author placed a fake company address in invalid JSON-LD schema markup in the head of an HTML page about ducks, with no address anywhere in the visible text, and asked various LLMs where the company was based. The LLMs confidently returned the fake address, several of them claiming to have consulted the structured data. The experiment was picked up by Search Engine Roundtable. The author critiques the GEO (Generative Engine Optimization) industry for treating this as a win, arguing that it instead reveals LLMs' tendency to hallucinate rather than genuinely parse structured data.
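To make the setup concrete, here is a minimal Python sketch of the kind of page the experiment describes. The company name, the address, and the specific syntax errors are all invented for illustration; the author's actual markup is not reproduced here. Under those assumptions, the sketch shows that a strict JSON parser rejects the block while the address still sits in the page source as plain text, which is one way a system reading raw text could return it without parsing any structured data.

```python
import json
from html.parser import HTMLParser

# Hypothetical page: the company name and address are invented for this sketch.
# The JSON-LD is deliberately invalid (unquoted key, trailing comma).
PAGE = """<!DOCTYPE html>
<html>
<head>
<title>All about ducks</title>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  name: "Example Duck Co",
  "address": "1 Imaginary Lane, Nowhere",
}
</script>
</head>
<body><p>Ducks are waterfowl.</p></body>
</html>"""


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False


def strict_parse(block):
    """Return the parsed object, or None if the block is not valid JSON."""
    try:
        return json.loads(block)
    except json.JSONDecodeError:
        return None


extractor = JsonLdExtractor()
extractor.feed(PAGE)
extractor.close()

raw_block = extractor.blocks[0]
parsed = strict_parse(raw_block)
address_visible_as_text = "1 Imaginary Lane" in raw_block

print("strict parse succeeded:", parsed is not None)
print("address present in raw text:", address_visible_as_text)
```

A check like this says nothing about what any particular LLM does internally. It only separates two questions: whether the markup is machine-parseable, and whether the string is readable in the source.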

Source

bsky · Experiment reveals LLMs fabricate schema markup data rather than genuinely parsing it · searchenginejournal.com

Key quotes

I put a fake company address (inside beautifully invalid JSON-LD, on a page about ducks) into the head of an HTML document, mentioned no address anywhere in the visible text, and then asked various LLMs where the company was based.

They happily told me, several of them citing the 'structured data' they had so studiously consulted.

That is not the win the GEO industry thinks it is.

Snippet from the RSS feed

I built a fake company with nonsense schema. The LLMs returned the address anyway. That is not the win the GEO industry thinks it is.
