Sup AI: Ensemble System Using 339 LLMs to Reduce Hallucinations Scores 52.15% on Humanity's Last Exam

Ken Mueller

2mo ago · 1 min read · en · Product

Score: 38 · Type: press release · Sentiment: positive

Summary

Sup AI is an ensemble system that draws on a pool of 339 large language models, running several of them in parallel to reduce hallucinations. It measures confidence on every segment of the output, downweighting high-entropy (likely hallucinated) content and amplifying low-entropy (likely accurate) content. The system scored 52.15% on Humanity's Last Exam, 7.41 points ahead of any individual model. The article also promotes the product with a $10 starter credit offer.
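The entropy-based weighting described above can be illustrated with a small sketch. The source does not say how Sup AI computes entropy or combines answers, so everything here is an assumption for illustration: each model is assumed to expose next-token probability distributions for a segment, confidence is taken as the mean Shannon entropy of those distributions, and the weight `exp(-entropy / temperature)` is a hypothetical choice. The names `shannon_entropy`, `segment_entropy`, and `pick_segment` are invented for this example.

```python
import math
from collections import defaultdict


def shannon_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def segment_entropy(token_distributions):
    """Mean per-token entropy over a segment (assumed confidence signal)."""
    total = sum(shannon_entropy(d) for d in token_distributions)
    return total / len(token_distributions)


def pick_segment(candidates, temperature=1.0):
    """Pick the segment text with the highest combined weight.

    candidates: list of (text, token_distributions), one entry per model.
    Low-entropy candidates get weights near 1, high-entropy ones are
    downweighted; identical texts from different models pool their weight.
    The exp(-H / temperature) form is a hypothetical weighting, not Sup AI's.
    """
    weights = defaultdict(float)
    for text, dists in candidates:
        weights[text] += math.exp(-segment_entropy(dists) / temperature)
    best = max(weights, key=weights.get)
    return best, dict(weights)


# Toy data: two confident models agree, one uncertain model disagrees.
candidates = [
    ("Paris", [[0.97, 0.02, 0.01]]),
    ("Paris", [[0.90, 0.05, 0.05]]),
    ("Lyon", [[0.40, 0.35, 0.25]]),
]
best, weights = pick_segment(candidates)
print(best, weights)
```

In this toy input the two confident answers reinforce each other while the uncertain answer contributes little, which is the behavior the summary attributes to the system. A production system would also need to align segments across models, which this sketch does not attempt.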

Key quotes

Every LLM hallucinates. They just don't hallucinate the same things.

Sup AI runs multiple LLMs (out of 339) in parallel, then synthesizes answers by measuring confidence on every segment.

High entropy = likely hallucination, downweighted. Low entropy = likely accurate, amplified.

Result: 52.15% on Humanity's Last Exam, 7.41 points ahead of any individual model.

$10 starter credit. Card verified. No auto-charge.
