AI specialist agents

One giant AI prompt has to be a generalist. Five focused prompts, each with a dedicated methodology for a single vulnerability class, outperform the generalist across every category. Pentestas ships specialist agents for the five OWASP categories that produce the most exploitable findings.

The specialists

Injection Analyst

Covers SQLi (union / blind boolean / time / second-order / NoSQL), OS command injection, LDAP, XPath, SSI, and SSTI. The prompt enforces a rigid verification framework:

Source — user-controlled input must reach the sink, proven with a specific endpoint + parameter.
Sink — a real dangerous call (not a false-positive like a logged parameter).
Sanitisation — no full sanitisation intercepts the flow. Partial sanitisation (addslashes, encoder allow-lists) is explicitly called out.
Impact — data exfil / RCE / auth bypass / lateral DB access.

Database-aware: if the stack is fingerprinted as PostgreSQL, SQLi hypotheses use PostgreSQL-specific dialect (union-column counts, pg_sleep, error messages).

XSS Analyst

Covers reflected, stored, DOM, mutation XSS, SVG / MathML tricks, and polyglot payloads. Key discipline: context-aware payloads. A reflected parameter inside a JS string literal isn't exploitable by a plain HTML tag — it's exploitable by breaking the quote. Every hypothesis is tagged with its render context and a context-specific payload family.

CSP-aware. Framework auto-escape-aware (if React / Angular / Vue auto-escapes, reflected-parameter XSS is usually only reachable via specific opt-outs).

SSRF Analyst

Covers classic (URL parameter), blind (with OOB oracle), protocol confusion, DNS rebinding, cloud-metadata escalation. Cloud-aware: AWS IMDSv2 blocks most SSRFs; Azure Managed Identity endpoints have distinct shapes; GCP metadata needs Metadata-Flavor: Google header. SSRF-to-RCE paths get flagged as a separate downstream finding.

Auth Analyst

Covers default creds, rate-limit gaps, predictable tokens, JWT attacks (alg=none, weak signing, algorithm confusion, JWK header injection), password-reset oracles, OAuth flow flaws, MFA bypass via "trust this device" persistence. JWT-specific: if a JWT is observed in recon, the analyst decodes header + payload in its reasoning and proposes specific attacks against the observed shape.

Authz Analyst

The highest-value category. Covers IDOR, BFLA, mass-assignment, horizontal + vertical escalation, tenant isolation failures, workflow-state bypass, method confusion. Every endpoint with an object ID gets probe hypotheses; every mutation endpoint gets mass-assignment hypotheses with role-shaped fields; every multi-step workflow gets a skip-the-middle hypothesis.

How they work

The Reconnaissance phase builds a comprehensive attack-surface map.
In parallel, five specialist agents read the attack surface (plus source-code intelligence if white-box mode is on) and produce ranked lists of exploitable hypotheses.
Each hypothesis has: target endpoint, payload class, expected oracle, impact, priority.
The Exploitation phase (one per category, also parallel) receives the hypothesis queue and attempts to validate each with real-world attacks using browser automation, CLI tools, and custom scripts.
A strict "No exploit, no report" rule applies: if a hypothesis cannot be successfully exploited to demonstrate impact, it is discarded.

Output shape

Each analyst produces a Markdown deliverable + a JSON exploitation queue:

INJ-VULN-3

Vulnerability class: Blind SQL Injection
Target: POST /api/search
Parameter: query in body
Database: PostgreSQL (fingerprinted from error handler in reconnaissance)

Source evidence:
  Parameter reflected into response header X-Search-Time with 200ms delta when
  the string "' OR 1=1--" is added.

Sink evidence:
  src/db/queries.ts:42 builds the search SQL via template string, no param binding.

Proposed payload:
  '; SELECT pg_sleep(5)--

Expected oracle:
  Response time ~5s (vs. ~200ms baseline). Time-based confirm; no data exfil
  in this hypothesis.

Impact if proven:
  DB version enumeration → column dump → credential exfil via union-based SQLi
  as a follow-on.

Priority: CRITICAL

The downstream Exploitation Specialist ingests this, fires the payload, and only persists the finding if the oracle confirms.

Cap per category

To keep the scan bounded, each specialist produces at most: - Injection: 20 hypotheses - XSS: 20 - SSRF: 15 (narrower surface) - Auth: 15 - Authz: 25 (widest surface)

Quality beats quantity; the Exploitation Specialist can only validate a finite set per scan window.

Model tier

All five specialists run at the medium tier (Sonnet by default). Source-code analysis that feeds them is at the large tier (Opus). See Model tiers.