Source-code-aware scans

Traditional dynamic scanners guess what your app does. Source-code-aware scans let Pentestas read your codebase first, produce a security architecture map, and use that map to guide every subsequent analysis phase. This turns Pentestas from a pure DAST tool into a hybrid SAST+DAST one.

When to use

You're scanning your own application and have the repo on disk or reachable by git URL.
You want a higher signal and a lower false-positive rate than black-box scanning alone delivers.
You're scanning a complex framework (Next.js, Rails, Spring, Django, etc.) where the LLM-picked payloads benefit from knowing the exact routing, auth middleware, and sink locations.
You want evidence traced back to the line of code.

How it works

You supply a repo path (local checkout) or a git URL (Pentestas shallow-clones).
A new Source Code Analysis phase runs at scan-start. A large-tier Claude agent produces a source_code_intel.md deliverable covering: stack fingerprint, attack-surface catalogue (every endpoint + handler + auth), auth + session model, authorization map, input sinks, secrets, dangerous patterns, and a prioritised list of endpoints to drill into.
The intel is attached to the scan's config and handed to every downstream analyst: the Reconnaissance agent correlates it with live browser observations; each vulnerability specialist (Injection, XSS, SSRF, Auth, Authz) uses it to target specific endpoints and sinks rather than blind-fuzzing.
Exploitation agents receive both hypotheses + code pointers, so the "no exploit, no report" gate can cite the exact vulnerable line.
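
To make the hand-off in the steps above concrete, here is a minimal Python sketch of how an attack-surface catalogue might be filtered per specialist. Everything in it is hypothetical: the `EndpointIntel` shape, the sink labels, the `SPECIALIST_SINKS` mapping, and the example routes are illustrative assumptions, not Pentestas' internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class EndpointIntel:
    """One attack-surface entry (hypothetical shape, for illustration only)."""
    route: str
    handler: str          # file:line of the handler
    auth_required: bool
    sinks: set = field(default_factory=set)  # e.g. {"sql", "html"}


# Assumed mapping from specialist name to the sink kinds it cares about.
SPECIALIST_SINKS = {
    "injection": {"sql", "shell"},
    "xss": {"html"},
    "ssrf": {"url_fetch"},
}


def targets_for(specialist, catalogue):
    """Return the endpoints whose sinks overlap the specialist's interests."""
    wanted = SPECIALIST_SINKS.get(specialist, set())
    return [e for e in catalogue if e.sinks & wanted]


catalogue = [
    EndpointIntel("/search", "app/controllers/search_controller.rb:12", False, {"sql", "html"}),
    EndpointIntel("/fetch", "app/controllers/proxy_controller.rb:8", True, {"url_fetch"}),
    EndpointIntel("/health", "app/controllers/health_controller.rb:3", False),
]

print([e.route for e in targets_for("injection", catalogue)])
```

The point of the sketch is the selection step: a specialist starts from endpoints known to reach a relevant sink instead of fuzzing every route.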

Usage

From the UI

New scan → Advanced → Source code. Paste a repo path (if the scanner has filesystem access) or a git URL. A git URL is shallow-cloned to a temp location, analysed, then cleaned up; a local repo path is read in place.

From the API

Two fields on POST /api/scans:

repo_url: https or ssh git URL. Shallow-cloned; 500 MB size cap.
repo_path: local path (only useful for agents on the same host).
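
As an illustration, the request could be assembled like this in Python. Only `repo_url`, `repo_path`, and the `POST /api/scans` route come from the text above; the host, the `target` field name, the bearer-token header, and the `build_scan_request` helper are assumptions made for the sketch. The request is built but not sent.

```python
import json
import urllib.request


def build_scan_request(base_url, token, target, repo_url=None, repo_path=None):
    """Build (but do not send) a POST /api/scans request.

    `target` and the Authorization header are assumed names, not documented ones.
    """
    body = {"target": target}
    if repo_url is not None:
        body["repo_url"] = repo_url      # https or ssh git URL, shallow-cloned
    if repo_path is not None:
        body["repo_path"] = repo_path    # local path on the scanner's host
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/scans",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )


req = build_scan_request(
    "https://pentestas.example.com",       # hypothetical host
    "YOUR_API_TOKEN",
    "https://app.example.com",
    repo_url="https://github.com/acme/ecommerce.git",
)
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

Passing the result to `urllib.request.urlopen(req)` would send it; check the actual API reference for the required fields and authentication scheme first.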

From a YAML config

yaml

description: "Rails e-commerce, PostgreSQL, Devise auth"

authentication:
  login_type: form
  login_url: https://app.example.com/login
  credentials:
    username: audit@example.com
    password: "***"
  login_flow:
    - "Type $username into email"
    - "Type $password into password"
    - "Click Sign in"
  success_condition:
    type: url_contains
    value: /dashboard

source_code:
  repo_url: https://github.com/acme/ecommerce.git

From the CLI

bash

pentestas start -u https://app.example.com -r /path/to/repo -c scan.yaml

What the code analyst produces

A structured Markdown report saved to <repo>/.pentestas/source_code_intel.md:

Technology stack — framework + language + version, databases, auth libraries, deployment target.
Attack surface — every HTTP route, handler location, auth requirement, parameters.
Auth & session model — mechanism, middleware locations, role assignment, token structure.
Authorization map — every sensitive endpoint's observed access control + ownership checks.
Input sinks — SQL construction, shell execution, HTML rendering, URL fetching, file operations, deserialisation.
Secrets — env vars used + any secrets committed to the repo.
Dangerous patterns — specific file:line callouts for eval / string-SQL / unsafe deserialise / mass-assignment.
Focus areas for DAST — prioritised list of endpoints the downstream specialists should drill into.
Critical files — flat list of every file referenced above.

The analyst is instructed never to invent endpoints or claim vulnerabilities without citing the code.
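
Because the report has a fixed set of sections, a consumer could sanity-check a generated file before relying on it. The sketch below assumes each section appears as a Markdown heading whose text starts with one of the section names listed above. The actual heading format is not specified here, so treat the matching rule and the `missing_sections` helper as illustrative.

```python
import re

EXPECTED_SECTIONS = [
    "Technology stack",
    "Attack surface",
    "Auth & session model",
    "Authorization map",
    "Input sinks",
    "Secrets",
    "Dangerous patterns",
    "Focus areas for DAST",
    "Critical files",
]

# Matches ATX-style Markdown headings ("# Title" through "###### Title").
HEADING = re.compile(r"^#{1,6}\s+(.*?)\s*$", re.MULTILINE)


def missing_sections(markdown_text):
    """Return expected section names with no matching heading (case-insensitive)."""
    headings = [h.lower() for h in HEADING.findall(markdown_text)]
    return [
        name for name in EXPECTED_SECTIONS
        if not any(h.startswith(name.lower()) for h in headings)
    ]


sample = """# Source code intel
## Technology stack
Rails 7, PostgreSQL
## Attack surface
## Input sinks
"""
print(missing_sections(sample))
```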

Privacy + security

Read-only — the clone is mounted read-only. Pentestas never modifies your code.
Shallow clone — depth 1, no history. Only the current state is analysed.
Size cap — 500 MB enforced. Rogue or accidentally-huge repos fail fast.
No external calls from the code — analysis is pure static reading; no build / test / execute.
Cleanup — temp clones are deleted at scan-end. Local repo_path directories are never touched.
Encryption at rest — when the intel is persisted to the scan's config JSONB, it's tenant-Fernet-encrypted alongside every other sensitive field.
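
The shallow-clone, size-cap, and cleanup guarantees can be pictured with a short Python sketch. It shows one plausible way to get similar behaviour with stock git (`git clone --depth 1`) and a directory walk. It is not Pentestas' implementation, and the helper names and the reading of "500 MB" as 500 * 1024 * 1024 bytes are assumptions.

```python
import os
import shutil
import subprocess
import tempfile

SIZE_CAP_BYTES = 500 * 1024 * 1024  # assumed interpretation of "500 MB"


def directory_size(path):
    """Sum the sizes of all regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def shallow_clone(repo_url, cap=SIZE_CAP_BYTES):
    """Clone at depth 1 into a temp dir; delete it and raise if it exceeds cap."""
    dest = tempfile.mkdtemp(prefix="scan-src-")
    try:
        subprocess.run(
            ["git", "clone", "--depth", "1", "--", repo_url, dest],
            check=True,
        )
        if directory_size(dest) > cap:
            raise ValueError("repository exceeds size cap")
    except Exception:
        # Never leave a partial or oversized clone behind.
        shutil.rmtree(dest, ignore_errors=True)
        raise
    return dest  # caller removes it with shutil.rmtree at scan end
```

This sketch measures the checkout only after the clone finishes, which is simpler than failing fast; a production version would more likely enforce the limit before or during the transfer.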

Model tier

Source-code analysis uses the large model tier (Opus by default) and is the only phase in Pentestas that requires it: reading a whole codebase can fill 100K+ token context windows, which is where strong long-context reasoning pays off. Override the model via the ANTHROPIC_LARGE_MODEL env var.
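
A minimal sketch of how such an override is typically resolved, assuming the environment variable simply replaces a built-in default. The `"opus"` string is a placeholder, since the real default model ID is not given here, and `resolve_large_model` is a hypothetical helper name.

```python
import os

DEFAULT_LARGE_MODEL = "opus"  # placeholder; the real default model ID is not given here


def resolve_large_model(env=None):
    """Return the ANTHROPIC_LARGE_MODEL override if set and non-empty, else the default."""
    env = os.environ if env is None else env
    override = env.get("ANTHROPIC_LARGE_MODEL", "").strip()
    return override or DEFAULT_LARGE_MODEL


print(resolve_large_model({}))
print(resolve_large_model({"ANTHROPIC_LARGE_MODEL": "my-model-id"}))
```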