MemoAttack: A Memory-Driven Framework for Automated LLM Jailbreak Attacks

[Submitted on 28 May 2026]

Summary

This paper introduces MemoAttack, a novel memory-driven black-box jailbreak framework for large language models (LLMs). Unlike existing methods that rely on heuristic search or unstructured strategy pools, MemoAttack systematically organizes attack experience through three key components: (1) Skill-Structured Memory Modeling that abstracts attack experience into reusable units pairing skills with templates, evidence, and lifecycle states; (2) Lifecycle-Driven Memory Evolution that manages memory through probation, promotion, retirement, and elimination; and (3) Explore-Exploit Balanced Memory Selection using contextual Thompson Sampling. Experiments on AdvBench show MemoAttack achieves a 98.00% average attack success rate, outperforming the strongest baseline by 16.67 percentage points while reducing request count by 45.9%.
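The selection and lifecycle ideas in the summary can be illustrated with a generic bandit sketch. This is not the paper's implementation: it swaps the contextual Thompson Sampling named above for a plain Beta-Bernoulli version with no context features, and the names `MemoryUnit`, `select_unit`, and `record_outcome`, the transition rules, and the thresholds (`min_trials`, `promote_at`, `retire_below`) are all assumptions made for illustration. Only the four state names come from the summary.

```python
import random
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    # State names follow the summary; the transition rules below are assumed.
    PROBATION = "probation"
    PROMOTED = "promoted"
    RETIRED = "retired"
    ELIMINATED = "eliminated"


@dataclass
class MemoryUnit:
    """Hypothetical memory unit: a skill label plus outcome evidence."""
    skill: str
    successes: int = 0
    failures: int = 0
    state: State = State.PROBATION

    @property
    def trials(self) -> int:
        return self.successes + self.failures

    @property
    def success_rate(self) -> float:
        return self.successes / self.trials if self.trials else 0.0


def select_unit(units, rng):
    """Thompson Sampling: sample each active unit's Beta posterior, take the max.

    Units with little evidence have wide posteriors, so they are still
    explored sometimes; units with strong evidence are exploited more often.
    """
    active = [u for u in units if u.state in (State.PROBATION, State.PROMOTED)]
    if not active:
        raise ValueError("no active memory units")
    # Beta(successes + 1, failures + 1) is the posterior under a uniform prior.
    return max(active, key=lambda u: rng.betavariate(u.successes + 1, u.failures + 1))


def record_outcome(unit, success, min_trials=5, promote_at=0.6, retire_below=0.2):
    """Update evidence, then apply assumed lifecycle transitions."""
    if success:
        unit.successes += 1
    else:
        unit.failures += 1
    if unit.trials < min_trials:
        return  # too little evidence to change state
    if unit.state is State.PROBATION:
        if unit.success_rate >= promote_at:
            unit.state = State.PROMOTED
        elif unit.success_rate < retire_below:
            unit.state = State.ELIMINATED
    elif unit.state is State.PROMOTED and unit.success_rate < retire_below:
        unit.state = State.RETIRED
```

In this sketch a caller would alternate `select_unit(units, random.Random())` with `record_outcome(...)` on the chosen unit. Retired and eliminated units are simply excluded from selection, which is one simple reading of the lifecycle; the paper may handle them differently.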

Key quotes

MemoAttack achieves an average attack success rate of 98.00%, outperforming the strongest baseline by 16.67 percentage points, while reducing request count by 45.9%.
Existing black-box jailbreak methods either depend on sample-wise heuristic search or leverage attack experience through accumulating strategy pools or method libraries, lacking a systematic organization and management of attack experience.
MemoAttack comprises three key designs: (1) Skill-Structured Memory Modeling, (2) Lifecycle-Driven Memory Evolution, and (3) Explore-Exploit Balanced Memory Selection.
MemoAttack continuously improves as memory accumulates over more samples.
Snippet from the RSS feed
Jailbreak attacks on large language models (LLMs) aim to induce LLMs to produce content that they are expected to refuse. Automated black-box jailbreak generation is especially important for safety evaluation, where the attacker observes only model output