
New benchmark reveals AI models often cite wrong sources even when answers are correct

By Jonathan Kemper

4d ago · 5 min read · en · News

Summary

Researchers at Peking University have developed CiteVQA, a new benchmark that tests whether AI models can correctly cite source documents when answering questions. The study reveals that leading AI models like GPT and Gemini frequently suffer from "attribution hallucination" — providing correct answers but pointing to wrong or irrelevant source passages. This poses significant risks for regulated fields like law, medicine, and financial auditing, where traceability and evidence verification are critical. CiteVQA is the first systematic benchmark designed to evaluate both answer accuracy and citation correctness in document analysis tasks.

Key quotes

Standard document analysis tests like DocVQA or MMLongBench-Doc only grade the final answer. They can't tell whether a model actually pulled information from the document or just guessed based on what it already knew.
In law, financial audits, or medicine, though, traceability is what makes an AI output usable in the first place, the paper argues.
CiteVQA makes models back up every statement with...
Researchers at Peking University call this 'attribution hallucination,' a risk for regulated fields like law and medicine.
Snippet from the RSS feed
Leading AI models like GPT and Gemini routinely cite text passages in document analyses that don't actually support their answers. Even when the answer is right, the cited evidence is often wrong. Researchers at Peking University call this "attribution ha
