Source-code-aware scans
Traditional dynamic scanners guess what your app does. Source-code-aware scans let Pentestas read your codebase first, produce a security architecture map, and use that map to guide every subsequent analysis phase. This turns Pentestas from a pure DAST tool into a hybrid SAST+DAST one.
When to use
- You're scanning your own application and have the repo on disk.
- You want a higher signal and a lower false-positive rate than black-box scanning alone delivers.
- You're scanning a complex framework (Next.js, Rails, Spring, Django, etc.) where the LLM-picked payloads benefit from knowing the exact routing, auth middleware, and sink locations.
- You want evidence traced back to the line of code.
How it works
- You supply a repo path (local checkout) or a git URL (Pentestas shallow-clones).
- A new Source Code Analysis phase runs at scan-start. A large-tier Claude agent produces a source_code_intel.md deliverable covering: stack fingerprint, attack-surface catalogue (every endpoint + handler + auth), auth + session model, authorization map, input sinks, secrets, dangerous patterns, and a prioritised list of endpoints to drill into.
- The intel is attached to the scan's config and handed to every downstream analyst: the Reconnaissance agent correlates it with live browser observations; each vulnerability specialist (Injection, XSS, SSRF, Auth, Authz) uses it to target specific endpoints and sinks rather than blind-fuzzing.
- Exploitation agents receive both hypotheses + code pointers, so the "no exploit, no report" gate can cite the exact vulnerable line.
Usage
From the UI
New scan → Advanced → Source code. Paste a repo path (if the scanner has filesystem access) or a git URL. A git URL is shallow-cloned to a temp location, analysed, then cleaned up.
From the API
Two fields on POST /api/scans:
- repo_url: HTTPS or SSH git URL. Shallow-cloned; 500 MB size cap.
- repo_path: local path (only useful for agents on the same host).
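As a rough illustration of the two fields above, the following sketch builds (but does not send) a POST /api/scans request using only the Python standard library. Only the endpoint path and the repo_url / repo_path field names come from this page. The base URL, the bearer-token header, the target_url field name, and the rule that the two source fields are mutually exclusive are all assumptions made for the example; check the API reference for the real request shape.

```python
import json
import urllib.request

# Hypothetical values: the base URL, token, and "target_url" field name are
# assumptions for illustration. Only repo_url / repo_path come from the docs.
BASE_URL = "https://pentestas.example.com"
API_TOKEN = "replace-me"


def build_scan_request(target_url, repo_url=None, repo_path=None):
    """Build (but do not send) a POST /api/scans request with source-code fields."""
    # Assumption: a scan takes one source location, not both.
    if repo_url and repo_path:
        raise ValueError("supply either repo_url or repo_path, not both")
    payload = {"target_url": target_url}
    if repo_url:
        payload["repo_url"] = repo_url
    if repo_path:
        payload["repo_path"] = repo_path
    return urllib.request.Request(
        f"{BASE_URL}/api/scans",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",
        },
        method="POST",
    )


req = build_scan_request(
    "https://app.example.com",
    repo_url="https://github.com/acme/ecommerce.git",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually submit the scan: urllib.request.urlopen(req)
```

Building the request separately from sending it makes the payload easy to inspect or log before anything reaches the scanner.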
From a YAML config
description: "Rails e-commerce, PostgreSQL, Devise auth"
authentication:
  login_type: form
  login_url: https://app.example.com/login
  credentials:
    username: audit@example.com
    password: "***"
  login_flow:
    - "Type $username into email"
    - "Type $password into password"
    - "Click Sign in"
  success_condition:
    type: url_contains
    value: /dashboard
source_code:
  repo_url: https://github.com/acme/ecommerce.git
From the CLI
pentestas start -u https://app.example.com -r /path/to/repo -c scan.yaml
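For CI pipelines, the same invocation can be assembled programmatically. This is a minimal sketch: the flags -u, -r, and -c are taken from the command above, and their meanings (target URL, repo path, config file) are inferred from that example rather than from a flag reference. The helper name build_cli_command is hypothetical.

```python
import shlex


def build_cli_command(target_url, repo_path, config_path):
    """Return the argv list for the `pentestas start` invocation shown above."""
    return ["pentestas", "start", "-u", target_url, "-r", repo_path, "-c", config_path]


cmd = build_cli_command("https://app.example.com", "/path/to/repo", "scan.yaml")
print(shlex.join(cmd))
# In CI (assumes `pentestas` is on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing an argv list instead of a shell string avoids quoting problems when the repo path contains spaces.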
What the code analyst produces
A structured Markdown report saved to <repo>/.pentestas/source_code_intel.md:
- Technology stack — framework + language + version, databases, auth libraries, deployment target.
- Attack surface — every HTTP route, handler location, auth requirement, parameters.
- Auth & session model — mechanism, middleware locations, role assignment, token structure.
- Authorization map — every sensitive endpoint's observed access control + ownership checks.
- Input sinks — SQL construction, shell execution, HTML rendering, URL fetching, file operations, deserialisation.
- Secrets — env vars used + any secrets committed to the repo.
- Dangerous patterns — specific file:line callouts for eval / string-SQL / unsafe deserialise / mass-assignment.
- Focus areas for DAST — prioritised list of endpoints the downstream specialists should drill into.
- Critical files — flat list of every file referenced above.
The analyst is instructed never to invent endpoints or claim vulnerabilities without citing the code.
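Because the report cites code locations, it can be post-processed, for example to list every file:line callout for review. The sketch below assumes callouts appear as path/to/file.ext:123; the actual layout of source_code_intel.md is not specified here, so the pattern and the sample text are both illustrative.

```python
import re

# Assumption: callouts look like "path/to/file.ext:123". The real report
# layout may differ, and host:port strings would also match this pattern.
CALLOUT = re.compile(r"(?P<path>[\w./-]+\.\w+):(?P<line>\d+)")


def extract_callouts(markdown_text):
    """Return sorted, de-duplicated (path, line) pairs cited in the report text."""
    found = {(m["path"], int(m["line"])) for m in CALLOUT.finditer(markdown_text)}
    return sorted(found)


# Invented sample text, not real Pentestas output.
sample = """\
Dangerous patterns
- String-built SQL in app/models/order.rb:42
- eval on user input in lib/report_builder.rb:17
- Also cited: app/models/order.rb:42
"""

for path, line in extract_callouts(sample):
    print(f"{path}:{line}")
```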
Privacy + security
- Read-only — the clone is mounted read-only. Pentestas never modifies your code.
- Shallow clone — depth 1, no history. Only the current state is analysed.
- Size cap — 500 MB enforced. Rogue or accidentally-huge repos fail fast.
- No external calls from the code — analysis is pure static reading; no build / test / execute.
- Cleanup — temp clones are deleted at scan-end. Local repo_path directories are never touched.
- Encryption at rest — when the intel is persisted to the scan's config JSONB, it's tenant-Fernet-encrypted alongside every other sensitive field.
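The shallow-clone, size-cap, and cleanup guarantees above can be approximated as follows. This is a sketch of the general technique, not Pentestas's implementation: it measures the checkout after cloning, whereas the real scanner may enforce the cap earlier, and it does not model the read-only mount. The function names and the reading of 500 MB as 500 × 1024 × 1024 bytes are assumptions.

```python
import os
import shutil
import subprocess
import tempfile

# 500 MB cap from the docs; interpreting MB as 1024 * 1024 bytes is an assumption.
SIZE_CAP_BYTES = 500 * 1024 * 1024


def directory_size(root):
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def shallow_clone(repo_url, cap=SIZE_CAP_BYTES):
    """Clone at depth 1 into a temp dir; delete it and raise if over the cap."""
    workdir = tempfile.mkdtemp(prefix="scan-src-")
    try:
        subprocess.run(["git", "clone", "--depth", "1", repo_url, workdir], check=True)
        size = directory_size(workdir)
        if size > cap:
            raise ValueError(f"repository is {size} bytes, over the {cap}-byte cap")
    except Exception:
        shutil.rmtree(workdir, ignore_errors=True)  # never leave a failed clone behind
        raise
    return workdir  # caller deletes this with shutil.rmtree at scan-end


# Local demonstration of the size check only (no network access needed).
with tempfile.TemporaryDirectory() as demo:
    with open(os.path.join(demo, "a.txt"), "wb") as fh:
        fh.write(b"x" * 10)
    print(directory_size(demo), "bytes")
```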
Model tier
Source-code analysis uses the large model tier (Opus by default) and is the only phase in Pentestas that requires it: its 100K+ token contexts benefit from strong long-context reasoning. Override the model via the ANTHROPIC_LARGE_MODEL env var.
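When launching the scanner from a script, the override can be passed through the child process environment. In this sketch only the variable name ANTHROPIC_LARGE_MODEL comes from this page; the helper name is hypothetical and the model identifier is a placeholder, not a real model ID.

```python
import os


def env_with_large_model(model_id, base_env=None):
    """Return a copy of the environment with ANTHROPIC_LARGE_MODEL set."""
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_LARGE_MODEL"] = model_id
    return env


# "your-large-model-id" is a placeholder, not a real model identifier.
env = env_with_large_model("your-large-model-id", base_env={"PATH": "/usr/bin"})
print(env["ANTHROPIC_LARGE_MODEL"])
# Pass it to the scanner process, e.g. subprocess.run([...], env=env)
```

Copying the environment instead of mutating os.environ keeps the override scoped to the one scan process.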
See also
- YAML scan config — reusable scan definitions
- AI specialist agents — per-category LLM agents the intel feeds into
- Pentestas CLI — run scans + source analysis from CI