Research Study: AI Agents vs Human Cybersecurity Professionals in Penetration Testing

littlexsparkee

4mo ago · 2 min read · en · Insight


Score: 85 · Type: analysis · Sentiment: neutral

Summary

This research paper presents what its authors describe as the first comprehensive evaluation comparing AI agents with human cybersecurity professionals in real-world penetration testing. The study tested ten cybersecurity professionals alongside six existing AI agents and ARTEMIS, a new multi-agent framework developed by the researchers. In a live enterprise environment with roughly 8,000 hosts across 12 subnets, ARTEMIS placed second overall, discovering 9 valid vulnerabilities with an 82% valid submission rate and outperforming 9 of 10 human participants. The researchers found that AI agents offer advantages in systematic enumeration, parallel exploitation, and cost (certain ARTEMIS variants cost $18/hour versus $60/hour for professional penetration testers). They also identified key gaps, including higher false-positive rates and difficulty with GUI-based tasks.
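The headline figures can be sanity-checked with a little arithmetic. The sketch below is a back-of-envelope illustration, not part of the study. It assumes that "valid submission rate" means valid findings divided by total submissions and that the reported 82% is rounded to the nearest whole percent. The 40-hour engagement length is a hypothetical value chosen only for illustration.

```python
# Back-of-envelope checks on figures quoted in the summary (not from the paper itself).
# Assumption: valid submission rate = valid findings / total submissions,
# and the reported 82% is rounded to the nearest whole percent.

VALID_FINDINGS = 9          # from the summary
REPORTED_RATE_PCT = 82      # from the summary (rounded)
AGENT_COST_PER_HOUR = 18.0  # certain ARTEMIS variants, USD/hour
HUMAN_COST_PER_HOUR = 60.0  # professional penetration testers, USD/hour


def implied_submission_totals(valid: int, rate_pct: int, max_total: int = 100) -> list[int]:
    """Return every total submission count whose rounded valid rate equals rate_pct."""
    return [
        total
        for total in range(valid, max_total + 1)
        if round(100 * valid / total) == rate_pct
    ]


def cost_ratio(human: float, agent: float) -> float:
    """How many times more expensive the human hourly rate is than the agent rate."""
    return human / agent


def engagement_cost(rate_per_hour: float, hours: float) -> float:
    """Total cost of an engagement billed at a flat hourly rate."""
    return rate_per_hour * hours


if __name__ == "__main__":
    totals = implied_submission_totals(VALID_FINDINGS, REPORTED_RATE_PCT)
    print("implied total submissions:", totals)
    ratio = cost_ratio(HUMAN_COST_PER_HOUR, AGENT_COST_PER_HOUR)
    print(f"human/agent hourly cost ratio: {ratio:.2f}x")
    hours = 40  # hypothetical engagement length, not from the study
    print(f"{hours}h engagement: agent ${engagement_cost(AGENT_COST_PER_HOUR, hours):.0f}, "
          f"human ${engagement_cost(HUMAN_COST_PER_HOUR, hours):.0f}")
```

Under these assumptions, the only total consistent with 9 valid findings at a rounded 82% is 11 submissions, and the human hourly rate is about 3.3 times the agent rate. A flat hourly comparison does not account for the extra triage that the agents' higher false-positive rate may require.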

Key quotes


ARTEMIS placed second overall, discovering 9 valid vulnerabilities with an 82% valid submission rate and outperforming 9 of 10 human participants.

AI agents offer advantages in systematic enumeration, parallel exploitation, and cost -- certain ARTEMIS variants cost $18/hour versus $60/hour for professional penetration testers.

We also identify key capability gaps: AI agents exhibit higher false-positive rates and struggle with GUI-based tasks.

ARTEMIS demonstrated technical sophistication and submission quality comparable to the strongest participants.

We present the first comprehensive evaluation of AI agents against human cybersecurity professionals in a live enterprise environment.

Snippet from the RSS feed

We present the first comprehensive evaluation of AI agents against human cybersecurity professionals in a live enterprise environment. We evaluate ten cybersecurity professionals alongside six existing AI agents and ARTEMIS, our new agent scaffold, on a l…
