ITBench-AA Benchmark Launched: Frontier AI Models Score Below 50% on Enterprise IT Tasks
Artificial Analysis and IBM Software Innovation Lab have launched ITBench-AA, a new benchmark series evaluating AI models on agentic enterprise IT tasks, starting with Site Reliability Engineering (SRE). The benchmark tests models on Kubernetes incident response, requiring them t