Benchmark Study: AI Agents Using Ghidra to Detect Backdoors in Binary Executables
By
jakozaur
Summary
The article describes BinaryAudit, a benchmark that evaluates how well AI agents detect backdoors in compiled binary executables. The authors partnered with reverse engineering expert Michał "Redford" Kowalczyk to build it. In the benchmark, AI agents use Ghidra (the NSA's reverse engineering tool) to look for malicious code hidden in ~40MB binaries of real open-source servers, proxies, and network infrastructure, without access to source code. The results report detection accuracy, false positive rates, and tool proficiency, as indicators of how useful such agents are for practical malware detection.
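To make the two headline metrics concrete, here is a minimal sketch of how a detection rate and a false positive rate could be computed from per-binary verdicts. It is not BinaryAudit's actual scoring code: the `Trial` record, the `score` function, and the sample data are all hypothetical, and it assumes each binary has a known ground-truth label plus a yes/no verdict from the agent.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One audited binary (hypothetical record, not BinaryAudit's schema)."""
    has_backdoor: bool  # ground truth: was a backdoor planted?
    flagged: bool       # agent's verdict: did it report a backdoor?


def score(trials):
    """Return (detection_rate, false_positive_rate) for a list of trials."""
    tp = sum(t.has_backdoor and t.flagged for t in trials)
    fn = sum(t.has_backdoor and not t.flagged for t in trials)
    fp = sum(not t.has_backdoor and t.flagged for t in trials)
    tn = sum(not t.has_backdoor and not t.flagged for t in trials)
    # Guard against empty classes so the sketch never divides by zero.
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_positive_rate


# Made-up sample: 4 backdoored binaries (3 caught), 4 clean ones (1 wrongly flagged).
sample = (
    [Trial(True, True)] * 3 + [Trial(True, False)]
    + [Trial(False, True)] + [Trial(False, False)] * 3
)
detection, false_positives = score(sample)
print(f"detection rate: {detection:.2f}, false positive rate: {false_positives:.2f}")
```

Reporting both numbers matters because an agent that flags every binary would reach a perfect detection rate while being useless in practice; the false positive rate exposes that failure mode.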
Key quotes
We partnered with Michał 'Redford' Kowalczyk, reverse engineering expert from Dragon Sector, known for finding malicious code in Polish trains, to create a benchmark of finding backdoors in binary executables, without access to source code.
See BinaryAudit for the full benchmark results — including false positive rates, tool proficiency, and the Pareto
BinaryAudit benchmarks AI agents using Ghidra to find backdoors in compiled binaries of real open-source servers, proxies, and network infrastructure.
We hid backdoors in ~40MB binaries and asked AI + Ghidra to find them
