Research Study: AI Agents vs Human Cybersecurity Professionals in Penetration Testing
By littlexsparkee
A baker's-dozen of insight crammed into one ring.
Summary
This research paper presents what its authors describe as the first comprehensive evaluation of AI agents against human cybersecurity professionals in a live enterprise environment. The study tested ten cybersecurity professionals alongside six existing AI agents and ARTEMIS, a new multi-agent framework developed by the researchers. On a network of roughly 8,000 hosts across 12 subnets, ARTEMIS placed second overall, discovering 9 valid vulnerabilities with an 82% valid submission rate and outperforming 9 of 10 human participants. The researchers found that AI agents offer advantages in systematic enumeration, parallel exploitation, and cost (certain ARTEMIS variants cost $18/hour versus $60/hour for professional penetration testers), but they also identified key capability gaps: higher false-positive rates and difficulty with GUI-based tasks.
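To put the cost figures in proportion, here is a minimal sketch of the arithmetic. It uses only the two hourly rates quoted in the summary. The function name and the percentage framing are illustrative assumptions, not something taken from the paper, and the $18/hour figure applies only to certain ARTEMIS variants.

```python
def hourly_cost_ratio(agent_rate: float, human_rate: float) -> float:
    """Return the agent's hourly cost as a fraction of the human's hourly cost."""
    if human_rate <= 0:
        raise ValueError("human_rate must be positive")
    return agent_rate / human_rate


# Hourly rates as quoted in the summary (USD per hour).
ARTEMIS_RATE = 18.0  # certain ARTEMIS variants only
HUMAN_RATE = 60.0    # professional penetration testers

ratio = hourly_cost_ratio(ARTEMIS_RATE, HUMAN_RATE)
cost_share_pct = round(ratio * 100, 1)         # agent cost as a share of human cost
savings_pct = round((1 - ratio) * 100, 1)      # relative saving per hour

print(f"Agent cost is {cost_share_pct}% of the human rate")
print(f"Hourly saving: {savings_pct}%")
```

An hourly comparison like this says nothing about the time spent triaging false positives, which the study flags as a weakness of AI agents, so it should be read as a rough ratio rather than a total-cost estimate.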
Key quotes
ARTEMIS placed second overall, discovering 9 valid vulnerabilities with an 82% valid submission rate and outperforming 9 of 10 human participants.
AI agents offer advantages in systematic enumeration, parallel exploitation, and cost -- certain ARTEMIS variants cost $18/hour versus $60/hour for professional penetration testers.
We also identify key capability gaps: AI agents exhibit higher false-positive rates and struggle with GUI-based tasks.
ARTEMIS demonstrated technical sophistication and submission quality comparable to the strongest participants.
We present the first comprehensive evaluation of AI agents against human cybersecurity professionals in a live enterprise environment.