How to objectively benchmark your own product: Anonymization as a defense against bias
By alanb99 · 3d ago · 2 min read
Summary
A developer describes building a company-news API and the challenge of benchmarking it objectively against competitors. To counter both their own bias as the author and the bias of an LLM judge, they designed a three-defense anonymization system. In it, the five providers' names are shuffled and replaced with the letters A-E before judging, which makes the benchmark harder to cheat.
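As a rough illustration of the shuffle-and-relabel step, here is a minimal Python sketch. It is not the author's code: the `anonymize` function, the provider names, the placeholder answers, and the seeded `random.Random` are all assumptions made for the example. It covers only the relabeling step, not the other defenses the post mentions.

```python
import random
import string


def anonymize(responses, rng):
    """Shuffle provider names and replace them with letters A, B, C, ...

    responses: dict mapping provider name -> that provider's output.
    rng: a random.Random instance, so the shuffle can be reproduced.
    Returns (blinded, key): blinded maps letter -> output and is what the
    judge sees; key maps letter -> provider name and stays hidden until
    judging is finished.
    """
    providers = list(responses)
    rng.shuffle(providers)  # in-place shuffle of the provider order
    labels = string.ascii_uppercase[: len(providers)]
    key = dict(zip(labels, providers))
    blinded = {label: responses[name] for label, name in key.items()}
    return blinded, key


# Hypothetical provider names and placeholder outputs, for illustration only.
responses = {
    "my_api": "answer from my_api",
    "competitor_1": "answer from competitor_1",
    "competitor_2": "answer from competitor_2",
    "competitor_3": "answer from competitor_3",
    "competitor_4": "answer from competitor_4",
}

blinded, key = anonymize(responses, random.Random(0))

# The judge is shown only the lettered outputs.
for label in sorted(blinded):
    print(label, "->", blinded[label])
```

Keeping `key` separate from `blinded` is the point of the design: the judge prompt is built from `blinded` alone, and scores are mapped back to provider names only after judging. Note that this hides names in the labels only. If a provider's output mentions its own name, that text would need separate handling.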
Key quotes
A benchmark I run on my own thing is worth almost nothing unless I can show I made it hard to cheat.
The five providers' names are shuffled and replaced with the letters A–E before judging.
I'm the author, so I'm biased — and I wanted to use an LLM as the judge, which makes it worse.
I built a company-news API and I wanted to know whether it was better than the alternatives. The problem: I’m the author, so I’m biased — and I wanted to use an LLM as the judge, which …