New benchmark reveals AI models often cite wrong sources even when answers are correct
By Jonathan Kemper
Summary
Researchers at Peking University have developed CiteVQA, a new benchmark that tests whether AI models can correctly cite source documents when answering questions. The study finds that leading AI models like GPT and Gemini frequently suffer from "attribution hallucination": they give correct answers but point to wrong or irrelevant source passages. This poses significant risks for regulated fields like law, medicine, and financial auditing, where traceability and evidence verification are critical. CiteVQA is the first systematic benchmark designed to evaluate both answer accuracy and citation correctness in document analysis tasks.
Key quotes
Standard document analysis tests like DocVQA or MMLongBench-Doc only grade the final answer. They can't tell whether a model actually pulled information from the document or just guessed based on what it already knew.
In law, financial audits, or medicine, though, traceability is what makes an AI output usable in the first place, the paper argues.
CiteVQA makes models back up every statement with...
Researchers at Peking University call this 'attribution hallucination,' a risk for regulated fields like law and medicine.