Study Reveals AI Models Struggle with Tax Calculations, Succeeding in Less Than One-Third of Cases

handfuloflight

7mo ago · 1 min read · en · Insight

Summary

Researchers introduce TaxCalcBench, a benchmark for evaluating AI models' ability to calculate US personal income taxes. The study finds that even state-of-the-art models correctly calculate fewer than a third of federal income tax returns, and that on a simplified sample set. Common errors include misusing tax tables, making calculation mistakes, and incorrectly determining eligibility. The research highlights the current limits of AI on complex tax calculations, even when the models are given all the necessary information.

Key quotes

Can AI file your taxes? Not yet.

Our experiment shows that state-of-the-art models succeed in calculating less than a third of federal income tax returns even on this simplified sample set.

Our analysis concludes that models consistently misuse tax tables, make errors in tax calculation, and incorrectly determine eligibility.

Our findings point to the need for additional infrastructure to apply LLMs to the personal income tax calculation task.

Snippet from the RSS feed

Can AI file your taxes? Not yet. Calculating US personal income taxes is a task that requires building an understanding of vast amounts of English text and using that knowledge to carefully compute results. We propose TaxCalcBench, a benchmark for determi
