pentest-kit
AI-assisted web pentesting library and harnesses
pentest-kit is a Python library and pair of Claude Code harnesses for running adversarial tooling against your own web applications. One harness drives HexStrike for exploratory discovery; the other runs a deterministic regression suite. Both loop fresh Claude sessions until two consecutive runs come back clean, then exit. The kit is designed to be cloned once and shared across any number of app repos — each consumer keeps its own target YAMLs, hooks, and app-specific modules under a pentest/ directory.
I built it while pentesting Cookie, and the companion blog post walks through how the pieces fit together.
# Two harnesses
HexStrike harness (hexstrike/) — exploratory discovery. Claude drives 150+ pentesting tools through the HexStrike MCP server, autonomously writes Python probe scripts, runs them inside a Kali-based Docker container, files PRs for fixes, and converges when it stops finding new issues. Run weekly or after major feature additions.
Regression suite (pentest_kit/ + suite/): deterministic regression. A Python orchestrator runs 11 generic bash modules against a target YAML, collects structured findings, and writes JSON and Markdown reports. A bash loop reruns the suite until two consecutive runs come back clean. Cheap enough to run on every deploy.
HexStrike finds things; you turn each finding into a regression test. Don't try to build the suite first.
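Both harnesses share the "two consecutive clean runs" exit rule, and the rule is easy to get subtly wrong (a finding must reset the streak). Here is a minimal Python sketch of it. The real loops are bash; `run_once`, `required_clean`, and the `max_rounds` safety cap are hypothetical names used only for illustration.

```python
from typing import Callable


def run_until_converged(
    run_once: Callable[[], int],
    required_clean: int = 2,
    max_rounds: int = 20,
) -> int:
    """Call run_once() until `required_clean` consecutive runs report zero findings.

    run_once is a hypothetical callable returning the number of findings for
    one round. Returns the number of rounds executed. Raises RuntimeError if
    max_rounds (an illustrative safety cap, not a kit setting) is hit first.
    """
    clean_streak = 0
    for round_number in range(1, max_rounds + 1):
        findings = run_once()
        # Any finding resets the streak; a clean run extends it.
        clean_streak = clean_streak + 1 if findings == 0 else 0
        if clean_streak >= required_clean:
            return round_number
    raise RuntimeError(f"no convergence after {max_rounds} rounds")


if __name__ == "__main__":
    # Simulated rounds: findings, findings, clean, findings, clean, clean.
    simulated = iter([3, 1, 0, 2, 0, 0])
    print(run_until_converged(lambda: next(simulated)))  # prints 6
```

Note that the single clean run in round three does not count toward the exit, because round four found something.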
# Features
- Two independent harnesses with the same convergence rule (two consecutive clean rounds)
- Multi-consumer architecture: one kit clone shared across N app repos via env vars; each consumer owns its own `pentest/` directory
- `init-consumer.sh` scaffolder generates per-app stub hooks, example target YAML, and a thin wrapper script
- Python orchestrator (cli, config, runner, results, report, auth) with module discovery across `PENTEST_SCRIPTS_DIRS`; consumer modules win on collision
- 11 generic bash modules: `recon`, `headers`, `tls`, `nikto`, `nuclei`, `api`, `auth`, `injection`, `ssrf`, `infra`, `paths`
- 14 payload wordlists (sqli, xss, ssti, traversal, and more)
- `endpoint_groups` YAML config makes modules app-agnostic; modules skip gracefully when keys are absent
- Production target gate: `production: true` in YAML + `# kit:destructive` module header; destructive modules skipped by default, override with `--allow-destructive`
- Four hook contracts (`preflight`, `auth-bootstrap`, `health-check`, `POST_ITERATION_HOOK`): language-agnostic executables returning JSON
- `ssm_auth.py` for AWS + SSM auth bootstrap with exact-name user lookup (no accidental real-account auth)
- `tool_exec` helper falls back to the hexstrike-ai container when a tool is absent from host PATH
- `scan-for-leaks.sh` CI hygiene gate: fails on consumer-specific identifiers outside allowlisted files
- Kali-based HexStrike Docker image trimmed for web-app testing; multi-arch build published to ghcr.io
- Virtual WebAuthn authenticator (`UVSoftAuthenticator`) that closes the upstream `soft_webauthn` gap on user-verification, allowing full passkey ceremony probes
- Three Claude Code skills (`/pentest`, `/pentest-align`, `/pentest-review`) for interactive use alongside the harness
- Test suite (bash + Python) with Ubuntu and macOS CI matrix; lint workflow covers shellcheck, JSON/YAML, and Python `compileall`
- `bootstrap.sh` idempotent first-run setup; `setup.sh` for host-side dependencies
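
The production target gate from the list above can be read as a small decision function. This is a sketch of one plausible reading, not the orchestrator's actual code: it assumes the marker must sit on its own line within the first few lines of a module, and that modules are only skipped when the target is marked production. `is_destructive`, `should_skip`, and `header_lines` are hypothetical names.

```python
DESTRUCTIVE_MARKER = "# kit:destructive"


def is_destructive(module_source: str, header_lines: int = 10) -> bool:
    """True if the marker appears on its own line near the top of the module.

    The 10-line header window is an assumption made for this sketch.
    """
    head = module_source.splitlines()[:header_lines]
    return any(line.strip() == DESTRUCTIVE_MARKER for line in head)


def should_skip(module_source: str, target: dict, allow_destructive: bool = False) -> bool:
    """Skip destructive modules on production targets unless explicitly allowed.

    `target` stands in for the parsed target YAML; `allow_destructive`
    mirrors the --allow-destructive override.
    """
    if allow_destructive:
        return False
    return bool(target.get("production", False)) and is_destructive(module_source)
```

For example, `should_skip("#!/usr/bin/env bash\n# kit:destructive\n", {"production": True})` returns `True`, and passing `allow_destructive=True` flips it to `False`.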
# Tech Stack
# Multi-consumer setup
The kit is one clone shared across N app repos. Each consumer keeps its own configuration under <app>/pentest/:

    git clone https://github.com/matthewdeaves/pentest-kit ~/pentest-kit
    ~/pentest-kit/setup.sh
    ~/pentest-kit/scripts/init-consumer.sh /path/to/your/app
    # Edit pentest/targets/example.yaml: set url, auth_mode, endpoint_groups
    /path/to/your/app/pentest/pentest.sh run example

The generated wrapper resolves PENTEST_KIT_DIR from env, ../pentest-kit, or ~/pentest-kit. Module discovery merges the kit's pentest_kit/scripts/ with the consumer's pentest/scripts/; the consumer wins on collision. App-specific modules (custom auth variants, proprietary endpoint tests) go in pentest/scripts/ and are picked up automatically.
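
To make "consumer wins on collision" concrete, here is a rough Python sketch of the merge. It assumes modules are `*.sh` files identified by their file stem and that directories are passed kit first, consumer last; `discover_modules` is a hypothetical name, not the kit's API.

```python
from pathlib import Path
from typing import Dict, List


def discover_modules(script_dirs: List[Path]) -> Dict[str, Path]:
    """Map module name -> script path; later directories override earlier ones.

    Pass the kit's scripts directory first and the consumer's last so that a
    consumer module with the same name replaces the kit's version.
    """
    modules: Dict[str, Path] = {}
    for directory in script_dirs:
        if not directory.is_dir():
            continue  # e.g. a consumer that has no scripts directory yet
        for script in sorted(directory.glob("*.sh")):
            # Assumption: the module name is the file name without ".sh".
            modules[script.stem] = script
    return modules
```

Called with the kit's `pentest_kit/scripts` followed by the app's `pentest/scripts`, a consumer `auth.sh` would replace the kit's `auth.sh` while every other kit module stays available.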
# How a HexStrike round runs
Each round is a fresh Claude session driven by HARNESS_PROMPT.md. Six phases: read briefs and last round's report, pick attack themes and generate a probe script, run it in the container, write a Markdown report, fix any findings via subagent (with a regression test), then update briefs and loop-state.json. The next round picks up from there.
The harness wraps that with a bash loop, a 3-hour timeout per round, and a sleep between rounds long enough for any tested rate-limit budget to reset.
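
The wrapper described above is a bash loop; the same shape in Python looks roughly like this. It is a sketch under assumptions: `command` is whatever launches one round (hypothetical here), and the status strings and `cooldown_s` parameter are illustrative rather than anything the harness defines.

```python
import subprocess
import sys
import time
from typing import List

ROUND_TIMEOUT_S = 3 * 60 * 60  # the 3-hour cap per round


def run_round(command: List[str], timeout_s: float = ROUND_TIMEOUT_S) -> str:
    """Run one round and return "ok", "failed", or "timeout"."""
    try:
        completed = subprocess.run(command, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising on timeout.
        return "timeout"
    return "ok" if completed.returncode == 0 else "failed"


def run_rounds(
    command: List[str],
    rounds: int,
    cooldown_s: float,
    timeout_s: float = ROUND_TIMEOUT_S,
) -> List[str]:
    """Run several rounds, sleeping between them so rate-limit budgets can reset."""
    statuses = []
    for index in range(rounds):
        statuses.append(run_round(command, timeout_s))
        if index < rounds - 1:
            time.sleep(cooldown_s)  # no sleep needed after the final round
    return statuses


if __name__ == "__main__":
    # Stand-in for a real round: a Python child process that exits immediately.
    print(run_rounds([sys.executable, "-c", "pass"], rounds=2, cooldown_s=0.1))
```

Treating a timeout as its own status, rather than as a clean round, keeps a hung session from counting toward convergence.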
# Security model
The HexStrike container binds to 127.0.0.1:8888 only — never exposed on the LAN. It runs with NET_RAW and NET_ADMIN for raw-socket nmap scans but is isolated from the host through the Docker boundary. The harness uses Claude with --dangerously-skip-permissions, so it should only ever run inside an isolated VM or container where unrestricted tool access is acceptable.
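
If you adapt the container setup, a quick sanity check on the published port helps preserve the loopback-only guarantee. The sketch below inspects a Docker port mapping string in the `host_ip:host_port:container_port` form. `publishes_on_loopback_only` is a hypothetical helper, and the container-side port in the usage note is an assumption (the kit only states the host bind of 127.0.0.1:8888).

```python
import ipaddress


def publishes_on_loopback_only(port_mapping: str) -> bool:
    """Check that a Docker port mapping binds the host side to a loopback address."""
    parts = port_mapping.rsplit(":", 2)
    if len(parts) != 3:
        # No host IP given, so Docker would publish on all interfaces.
        return False
    host_ip = parts[0].strip("[]")  # tolerate bracketed IPv6 such as [::1]
    try:
        return ipaddress.ip_address(host_ip).is_loopback
    except ValueError:
        return False  # not an IP address at all
```

`publishes_on_loopback_only("127.0.0.1:8888:8888")` is `True`, while `"8888:8888"` and `"0.0.0.0:8888:8888"` both return `False`.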
Populated briefs, target configs, exploratory reports, and loop-state.json files are gitignored. Only the *.example.* templates ship with the repo.