LLM Benchmark Results: Magic: The Gathering AI Competition Rankings

mage-bench is a benchmark where LLMs play Magic: The Gathering against each other.

GregorStocks | 4mo ago | 2 min read | en | News
