Vera: An Automated Safety Testing Framework for LLM Agents Reveals 93.9% Attack Success Rates
By
[Submitted on 2 Jul 2026]
Summary
This paper presents Vera, an end-to-end automated safety testing framework for LLM agents that addresses the limitations of existing safety testing approaches. Vera uses a three-stage pipeline: (1) literature-driven exploration to discover and structure emerging risks into taxonomies, (2) combinatorial composition across taxonomy dimensions to produce executable safety cases with concrete goals and verification predicates, and (3) adaptive execution where a control agent steers multi-turn interactions in isolated sandboxes with evidence-grounded verifiers. The framework was evaluated on four production agent frameworks (OpenClaw, Hermes, Codex, Claude Code), revealing average attack success rates of 93.9% under multi-channel attacks. The authors also release Vera-Bench, comprising 1,600 executable safety cases across 124 risk categories.
Source
Key quotes
· 5 pulledLLM agents increasingly perform autonomous actions through external tools, leading to complex and evolving safety risks.
Existing safety testing targets expert-designed safety violations, and the corresponding outcomes are evaluated by hard-coded rules, making them costly to extend as agents evolve.
We evaluate Vera on four production agent frameworks (OpenClaw, Hermes, Codex, Claude Code), revealing substantial safety weaknesses, with average attack success rates reaching 93.9% under multi-channel attacks.
These results indicate that modular, executable testing infrastructure is essential for rigorous and maintainable safety evaluation of rapidly evolving agentic systems at scale.
We also release Vera-Bench, comprising 1600 executable safety cases spanning 124 risk categories across three execution settings.
You might also wanna read
Study Reveals Domain-Camouflaged Injection Attacks Bypass LLM Detection Systems
This research paper identifies a critical vulnerability in injection detectors used to protect LLM agents. The authors demonstrate that when
Open-Source LLM Safety Vulnerabilities: How Chat Template Formatting Gates Alignment in Models Like Gemma and Qwen
This article reveals a critical vulnerability in open-source large language models (LLMs) where safety alignment can be bypassed by simply o
Security Analysis: AI Agent Frameworks' Code Execution Vulnerabilities and WASM Sandbox Solution
The article discusses security vulnerabilities in popular AI agent frameworks like LangChain, AutoGen, and SWE-Agent that execute LLM-genera
Vera: A programming language designed for LLMs with compile-time verification and no variable names
Vera is a novel programming language specifically designed for large language models (LLMs) to write. It eliminates variable names in favor
Formal Framework for LLM-Verifier Systems: Convergence Theorem and 4/δ Latency Bound
This research paper presents a formal framework for integrating Large Language Models with Formal Verification tools, addressing reliability
New Research Papers Address LLM Security and Prompt Injection Vulnerabilities
The article discusses two new research papers on LLM security and prompt injection vulnerabilities. The first paper, 'Agents Rule of Two: A

Comments
Sign in to join the conversation.
No comments yet. Be the first.