pentest-kit is a Python library and pair of Claude Code harnesses for running adversarial tooling against your own web applications. One harness drives HexStrike for exploratory discovery; the other runs a deterministic regression suite. Both loop fresh Claude sessions until two consecutive runs come back clean, then exit. The kit is designed to be cloned once and shared across any number of app repos — each consumer keeps its own target YAMLs, hooks, and app-specific modules under a pentest/ directory.

I built it while pentesting Cookie, and the companion blog post walks through how the pieces fit together.

# Two harnesses

HexStrike harness (hexstrike/): exploratory discovery. Claude drives 150+ pentesting tools through the HexStrike MCP server, autonomously writes Python probe scripts, runs them inside a Kali-based Docker container, files PRs for fixes, and converges once two consecutive rounds find nothing new. Run weekly or after major feature additions.

Regression suite (pentest_kit/ + suite/): deterministic regression. A Python orchestrator runs 11 generic bash modules against a target YAML, collects structured findings, and writes JSON and Markdown reports; a bash loop reruns it until two consecutive runs come back clean. Cheap enough to run on every deploy.
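To make the "collect structured findings, write JSON and Markdown reports" step concrete, here is a minimal sketch. The Finding record, its field names (module, severity, title), and the report layout are all assumptions for illustration, not the kit's actual schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Finding:
    """Hypothetical finding record; the kit's real schema may differ."""
    module: str
    severity: str
    title: str


def to_json(findings: list[Finding]) -> str:
    """Serialize findings to a JSON report string with a top-level clean flag."""
    return json.dumps(
        {"clean": not findings, "findings": [asdict(f) for f in findings]},
        indent=2,
    )


def to_markdown(findings: list[Finding]) -> str:
    """Render the same findings as a short Markdown report."""
    if not findings:
        return "# Pentest report\n\nNo findings.\n"
    lines = ["# Pentest report", ""]
    for f in findings:
        lines.append(f"- **{f.severity}** [{f.module}] {f.title}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [Finding("headers", "medium", "Missing Content-Security-Policy")]
    print(to_json(sample))
    print(to_markdown(sample))
```

A run with an empty findings list counts as clean here, which is the signal a retry loop would need to count consecutive clean runs.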

HexStrike finds things; you turn each finding into a regression test. Don't try to build the suite first.

# Features

  • Two independent harnesses with the same convergence rule (two consecutive clean rounds)
  • Multi-consumer architecture — one kit clone shared across N app repos via env vars; each consumer owns its own pentest/ directory
  • init-consumer.sh scaffolder generates per-app stub hooks, example target YAML, and a thin wrapper script
  • Python orchestrator (cli, config, runner, results, report, auth) with module discovery across PENTEST_SCRIPTS_DIRS — consumer modules win on collision
  • 11 generic bash modules: recon, headers, tls, nikto, nuclei, api, auth, injection, ssrf, infra, paths
  • 14 payload wordlists (sqli, xss, ssti, traversal, and more)
  • endpoint_groups YAML config makes modules app-agnostic — modules skip gracefully when keys are absent
  • Production target gate: when a target YAML sets production: true, modules carrying a # kit:destructive header are skipped by default; override with --allow-destructive
  • Four hook contracts (preflight, auth-bootstrap, health-check, POST_ITERATION_HOOK) — language-agnostic executables returning JSON
  • ssm_auth.py for AWS + SSM auth bootstrap with exact-name user lookup (no accidental real-account auth)
  • tool_exec helper falls back to the hexstrike-ai container when a tool is absent from host PATH
  • scan-for-leaks.sh CI hygiene gate — fails on consumer-specific identifiers outside allowlisted files
  • Kali-based HexStrike Docker image trimmed for web-app testing; multi-arch build published to ghcr.io
  • Virtual WebAuthn authenticator (UVSoftAuthenticator) that closes the upstream soft_webauthn gap on user-verification, allowing full passkey ceremony probes
  • Three Claude Code skills (/pentest, /pentest-align, /pentest-review) for interactive use alongside the harness
  • Test suite (bash + Python) with Ubuntu and macOS CI matrix; lint workflow covers shellcheck, JSON/YAML, and Python compileall
  • bootstrap.sh idempotent first-run setup; setup.sh for host-side dependencies
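
The production target gate from the list above can be sketched roughly as follows. This assumes the # kit:destructive marker sits on its own line near the top of a module and that the gate only applies when the target sets production: true. The function names and the 10-line header window are hypothetical, not taken from the kit.

```python
DESTRUCTIVE_MARKER = "# kit:destructive"


def is_destructive(script_text: str, max_header_lines: int = 10) -> bool:
    """Return True if the marker appears within the first few lines of a module."""
    header = script_text.splitlines()[:max_header_lines]
    return any(line.strip() == DESTRUCTIVE_MARKER for line in header)


def should_run(script_text: str, target: dict, allow_destructive: bool = False) -> bool:
    """Decide whether a module may run against the given target config."""
    if not is_destructive(script_text):
        return True  # non-destructive modules always run
    if not target.get("production", False):
        return True  # assumption: the gate applies to production targets only
    return allow_destructive  # mirrors the --allow-destructive override


if __name__ == "__main__":
    module = "#!/usr/bin/env bash\n# kit:destructive\necho 'injection probes'\n"
    print(should_run(module, {"production": True}))
    print(should_run(module, {"production": True}, allow_destructive=True))
```

Defaulting a missing production key to False is a design choice in this sketch; a stricter gate might treat an unspecified target as production.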

# Tech Stack

Python, Bash, Docker, Kali Linux, HexStrike MCP, Claude Code, nuclei, ffuf, sqlmap, nmap, testssl.sh, GitHub Actions

# Multi-consumer setup

The kit is one clone shared across N app repos. Each consumer keeps its own configuration under <app>/pentest/:

```bash
git clone https://github.com/matthewdeaves/pentest-kit ~/pentest-kit
~/pentest-kit/setup.sh
~/pentest-kit/scripts/init-consumer.sh /path/to/your/app
# Edit pentest/targets/example.yaml: set url, auth_mode, endpoint_groups
/path/to/your/app/pentest/pentest.sh run example
```

The generated wrapper resolves PENTEST_KIT_DIR from env, ../pentest-kit, or ~/pentest-kit. Module discovery merges the kit's pentest_kit/scripts/ with the consumer's pentest/scripts/ — consumer wins on collision. App-specific modules (custom auth variants, proprietary endpoint tests) go in pentest/scripts/ and are picked up automatically.
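
The "consumer wins on collision" rule can be illustrated with a small sketch. It assumes that PENTEST_SCRIPTS_DIRS is a list of directories joined by the OS path separator, that the kit directory comes first and the consumer directory last, and that modules are *.sh files keyed by filename. None of these details are confirmed by the kit, and the function names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def dirs_from_env(value: str) -> list[Path]:
    """Split a PENTEST_SCRIPTS_DIRS-style value on the OS path separator."""
    return [Path(p) for p in value.split(os.pathsep) if p]


def discover_modules(script_dirs: list[Path]) -> dict[str, Path]:
    """Map module name to script path; later directories override earlier ones."""
    modules: dict[str, Path] = {}
    for directory in script_dirs:
        if not directory.is_dir():
            continue  # a missing directory is skipped rather than treated as an error
        for script in sorted(directory.glob("*.sh")):
            modules[script.stem] = script  # same name in a later dir replaces the earlier one
    return modules


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        kit, consumer = Path(tmp, "kit"), Path(tmp, "consumer")
        for d in (kit, consumer):
            d.mkdir()
        (kit / "headers.sh").write_text("echo kit\n")
        (consumer / "headers.sh").write_text("echo consumer\n")
        found = discover_modules([kit, consumer])
        print(found["headers"])  # resolves to the copy under the consumer directory
```

Because a plain dict assignment overwrites earlier entries, precedence is decided by directory order alone, which keeps the override rule easy to reason about.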

# How a HexStrike round runs

Each round is a fresh Claude session driven by HARNESS_PROMPT.md. Six phases: read briefs and last round's report, pick attack themes and generate a probe script, run it in the container, write a Markdown report, fix any findings via subagent (with a regression test), then update briefs and loop-state.json. The next round picks up from there.

The harness wraps that with a bash loop, a 3-hour timeout per round, and a sleep between rounds long enough for any tested rate-limit budget to reset.
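
The real loop is bash, but the convergence rule is easy to show in Python. The sketch below assumes each round reports a single clean/not-clean boolean. The run_until_converged name and the max_rounds safety cap are hypothetical additions, not something the harness is known to have.

```python
import time
from typing import Callable


def run_until_converged(
    run_round: Callable[[], bool],
    required_clean: int = 2,
    max_rounds: int = 20,
    sleep_seconds: float = 0.0,
) -> int:
    """Call run_round() until `required_clean` consecutive rounds are clean.

    run_round returns True when a round produced no findings.
    Returns the number of rounds executed; raises if max_rounds is reached.
    """
    clean_streak = 0
    for round_number in range(1, max_rounds + 1):
        # Any dirty round resets the streak, so clean rounds must be back to back.
        clean_streak = clean_streak + 1 if run_round() else 0
        if clean_streak >= required_clean:
            return round_number
        time.sleep(sleep_seconds)  # pause so rate-limit budgets can reset
    raise RuntimeError(f"no convergence after {max_rounds} rounds")


if __name__ == "__main__":
    outcomes = iter([False, True, False, True, True])
    print(run_until_converged(lambda: next(outcomes)))
```

If each round were launched as a subprocess, the per-round cap could be enforced with the timeout argument of subprocess.run, for example timeout=3 * 60 * 60.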

# Security model

The HexStrike container binds to 127.0.0.1:8888 only, so it is never exposed on the LAN. It runs with the NET_RAW and NET_ADMIN capabilities for raw-socket nmap scans but is otherwise isolated from the host by the Docker boundary. The harness uses Claude with --dangerously-skip-permissions, so it should only ever run inside an isolated VM or container where unrestricted tool access is acceptable.
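
One way to sanity-check the loopback-only binding is to validate the port publish spec before starting the container. This sketch assumes Docker's ip:hostPort:containerPort syntax and a container port of 8888. The is_loopback_only helper is hypothetical and not part of the kit.

```python
import ipaddress


def is_loopback_only(publish_spec: str) -> bool:
    """Return True if an 'ip:hostPort:containerPort' spec binds to a loopback address.

    A spec without an explicit IP (for example '8888:8888') is published on
    all interfaces by default, so it is reported as not loopback-only.
    """
    parts = publish_spec.rsplit(":", 2)
    if len(parts) != 3:
        return False
    host_ip = parts[0].strip("[]")  # tolerate bracketed IPv6 such as [::1]
    try:
        return ipaddress.ip_address(host_ip).is_loopback
    except ValueError:
        return False  # not a parseable IP address


if __name__ == "__main__":
    for spec in ("127.0.0.1:8888:8888", "0.0.0.0:8888:8888", "8888:8888"):
        print(spec, is_loopback_only(spec))
```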

Populated briefs, target configs, exploratory reports, and loop-state.json files are gitignored. Only the *.example.* templates ship with the repo.