Vera: An Automated Safety Testing Framework for LLM Agents Reveals 93.9% Attack Success Rates

[Submitted on 2 Jul 2026]

2h ago· 2 min readenInsight

technology science software testing ai safety

Summary

This paper presents Vera, an end-to-end automated safety testing framework for LLM agents that addresses the limitations of existing safety testing approaches. Vera uses a three-stage pipeline: (1) literature-driven exploration to discover and structure emerging risks into taxonomies, (2) combinatorial composition across taxonomy dimensions to produce executable safety cases with concrete goals and verification predicates, and (3) adaptive execution where a control agent steers multi-turn interactions in isolated sandboxes with evidence-grounded verifiers. The framework was evaluated on four production agent frameworks (OpenClaw, Hermes, Codex, Claude Code), revealing average attack success rates of 93.9% under multi-channel attacks. The authors also release Vera-Bench, comprising 1,600 executable safety cases across 124 risk categories.

Source

bskyVera: An Automated Safety Testing Framework for LLM Agents Reveals 93.9% Attack Success Ratesarxiv.org

Key quotes

· 5 pulled

LLM agents increasingly perform autonomous actions through external tools, leading to complex and evolving safety risks.

Existing safety testing targets expert-designed safety violations, and the corresponding outcomes are evaluated by hard-coded rules, making them costly to extend as agents evolve.

We evaluate Vera on four production agent frameworks (OpenClaw, Hermes, Codex, Claude Code), revealing substantial safety weaknesses, with average attack success rates reaching 93.9% under multi-channel attacks.

These results indicate that modular, executable testing infrastructure is essential for rigorous and maintainable safety evaluation of rapidly evolving agentic systems at scale.

We also release Vera-Bench, comprising 1600 executable safety cases spanning 124 risk categories across three execution settings.

Snippet from the RSS feed

LLM agents increasingly perform autonomous actions through external tools, leading to complex and evolving safety risks. However, existing safety testing targets expert-designed safety violations, and the corresponding outcomes are evaluated by hard-coded

You might also wanna read

Study Reveals Domain-Camouflaged Injection Attacks Bypass LLM Detection Systems

This research paper identifies a critical vulnerability in injection detectors used to protect LLM agents. The authors demonstrate that when

arxiv.org·1mo ago

Open-Source LLM Safety Vulnerabilities: How Chat Template Formatting Gates Alignment in Models Like Gemma and Qwen

This article reveals a critical vulnerability in open-source large language models (LLMs) where safety alignment can be bypassed by simply o

teendifferent.substack.com·5mo ago

Security Analysis: AI Agent Frameworks' Code Execution Vulnerabilities and WASM Sandbox Solution

The article discusses security vulnerabilities in popular AI agent frameworks like LangChain, AutoGen, and SWE-Agent that execute LLM-genera

github.com·5mo ago

Vera: A programming language designed for LLMs with compile-time verification and no variable names

Vera is a novel programming language specifically designed for large language models (LLMs) to write. It eliminates variable names in favor

GitHub·2mo ago

Formal Framework for LLM-Verifier Systems: Convergence Theorem and 4/δ Latency Bound

This research paper presents a formal framework for integrating Large Language Models with Formal Verification tools, addressing reliability

arxiv.org·6mo ago

New Research Papers Address LLM Security and Prompt Injection Vulnerabilities

The article discusses two new research papers on LLM security and prompt injection vulnerabilities. The first paper, 'Agents Rule of Two: A

simonwillison.net·8mo ago

Comments

No comments yet. Be the first.