Rockport is a self-hosted LiteLLM proxy running on a single EC2 instance behind a Cloudflare Tunnel. It gives me cost-controlled access to AWS Bedrock models for Claude Code, image generation, and video generation. The whole thing is Terraform-managed, auto-stops when idle, and runs on a t3.small. The repo is public and I'd welcome feedback from anyone who does platform or security work professionally. If you spot something or have suggestions, raise an issue; I'd love to hear about it.
# Why I built it
At work we've built infrastructure to guarantee data residency for LLM traffic in the EU. That's where I learnt you could point Claude Code at your own endpoint. I wanted to try different models like Qwen through Claude Code to see how they got on with Classic Mac programming in C. The Bedrock API has its own format, and I wanted the budget controls and key management that LiteLLM provides, so I needed it to sit in between and translate.
Once that was working, I figured it would be nice to have my own endpoint for everything else LiteLLM and Bedrock offer. Per-key budgets so I could hand out API keys to friends without worrying about a surprise bill, rate limiting, model restrictions, that sort of thing.
Then I got into a side project involving sprite animation. That project uses all sorts of AI models: text, image generation, image editing, video, and mixtures in between like image-to-image and image-to-video. As the side project grew, Rockport grew with it. I'll post about that project when it's ready, but some fun stuff is coming soon.
I also wanted to learn. Building my own infrastructure properly, with Terraform, Cloudflare Tunnels, and systemd hardening, was a good way to experiment. Making it public seemed like the fastest way to find out what I don't know.
# Architecture
Everything runs on a single t3.small (2GB RAM, 2 vCPU). The bootstrap script sets up PostgreSQL 15 (tuned for the instance size: 64MB shared_buffers, 30 max connections), installs LiteLLM and the Prisma client it uses to manage its PostgreSQL schema, deploys the video sidecar, and configures Cloudflared. A 512MB swap file gives some headroom. The bootstrap script is gzip-compressed and base64-encoded by Terraform to fit within EC2's 16KB user_data limit, with configuration files and sidecar code injected via templatefile() substitution.
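A sketch of how the Terraform side might look (resource and file names here are illustrative, not the repo's actual layout): `templatefile()` renders the bootstrap script with the injected config, then `base64gzip()` compresses and encodes it to fit under the 16KB limit.

```hcl
resource "aws_instance" "rockport" {
  # render the bootstrap template, then gzip + base64 it for user_data;
  # cloud-init transparently decompresses gzipped user data at boot
  user_data_base64 = base64gzip(templatefile("${path.module}/bootstrap.sh.tpl", {
    litellm_config = file("${path.module}/litellm-config.yaml")
  }))
  # ...
}
```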
The model configuration has nine chat models (Claude Opus, Sonnet 4.6, Haiku 4.5, DeepSeek V3.2, Qwen3-Coder-480B, Kimi K2.5, and three Nova models), five image generation models, and thirteen Stability AI image editing models routed through LiteLLM's native /v1/images/edits endpoint. There are aliases so older Claude Code configs that still reference 4.5 model IDs keep working. The video sidecar adds Nova Reel and Luma Ray2 for video generation, plus three Nova Canvas image operations (variations, background removal, outpainting) that LiteLLM doesn't cover.
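In LiteLLM's config format, an alias is just an extra `model_list` entry pointing at the same underlying Bedrock model. Something along these lines (the Bedrock model IDs here are illustrative):

```yaml
model_list:
  - model_name: claude-sonnet-4-6
    litellm_params:
      model: bedrock/anthropic.claude-sonnet-4-6-v1:0
  # alias: old Claude Code configs that still send the 4.5 ID keep working
  - model_name: claude-sonnet-4-5-20250929
    litellm_params:
      model: bedrock/anthropic.claude-sonnet-4-6-v1:0
```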
Each user gets a virtual API key with a daily budget, rate limits (requests per minute and tokens per minute), and optional model restrictions. The admin CLI handles key creation, spend monitoring, config pushes, and instance lifecycle. Running `./scripts/rockport.sh key create matt --budget 5 --claude-only` creates a key restricted to Anthropic models with a $5/day cap and generates a Claude Code settings file you can drop straight into your project.
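The generated settings file looks roughly like this (the endpoint and key are placeholders; Claude Code picks up `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` from the `env` block of its settings):

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://llm.example.com",
    "ANTHROPIC_AUTH_TOKEN": "sk-the-generated-virtual-key"
  }
}
```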
Every service runs under its own systemd unit with hardening directives: read-only filesystem, no capabilities, private tmp and devices, restricted syscalls, namespace and privilege controls, and memory caps (1280MB for LiteLLM, 256MB each for the video sidecar and Cloudflared). LiteLLM and the video sidecar run as a shared litellm user, Cloudflared as its own cloudflared user, both non-root with no login shell.
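As an illustration, a hardened unit along these lines covers the directives listed above (the exact set and values in the repo may differ):

```ini
[Service]
User=litellm
NoNewPrivileges=yes
ProtectSystem=strict             # read-only view of the filesystem
ProtectHome=yes
PrivateTmp=yes
PrivateDevices=yes
CapabilityBoundingSet=           # empty set: no capabilities at all
RestrictNamespaces=yes
SystemCallFilter=@system-service # allowlist of common service syscalls
MemoryMax=1280M                  # LiteLLM's cap; the sidecar and cloudflared get 256M
```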
A Lambda function checks EC2 NetworkIn and CPUUtilization metrics every 5 minutes and stops the instance after 30 minutes of inactivity. Both signals must be below their thresholds (500KB network, 10% CPU) before the instance is stopped. The 500KB network threshold is enough to distinguish real traffic from Cloudflare Tunnel keepalives (~6KB/min). There's a 10-minute grace period after boot so the instance doesn't shut itself down during bootstrap. When I need it again, `./scripts/rockport.sh start` brings it back up.
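The decision logic amounts to something like this (a sketch: the constants come from the description above, but the function shape is mine, not the Lambda's actual code):

```python
# Idle-stop decision sketch. Thresholds mirror the post; 6 consecutive
# 5-minute samples below both thresholds = 30 minutes of inactivity.
NETWORK_THRESHOLD_BYTES = 500 * 1024  # 500KB per 5-minute period
CPU_THRESHOLD_PCT = 10.0
GRACE_PERIOD_MIN = 10
IDLE_PERIODS_REQUIRED = 6             # 6 x 5 min = 30 minutes

def should_stop(samples, minutes_since_boot):
    """samples: list of (network_in_bytes, cpu_pct) tuples, newest last."""
    if minutes_since_boot < GRACE_PERIOD_MIN:
        return False  # still bootstrapping, don't judge yet
    recent = samples[-IDLE_PERIODS_REQUIRED:]
    if len(recent) < IDLE_PERIODS_REQUIRED:
        return False  # not enough history to call it idle
    return all(net < NETWORK_THRESHOLD_BYTES and cpu < CPU_THRESHOLD_PCT
               for net, cpu in recent)
```

Requiring both signals matters: tunnel keepalives alone sit well under the network threshold, while a long-running local job would trip the CPU check even with no traffic.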
Daily EBS snapshots run at 03:00 UTC with 7-day retention via AWS Data Lifecycle Manager. A CloudWatch auto-recovery alarm recovers the instance on system status check failure, preserving the instance ID, private IP, and EBS volumes. AWS Budgets alert at 80% and 100% of both a $10/day Bedrock threshold and a $30/month overall threshold. CloudTrail logs all management API events to a dedicated S3 bucket with 90-day retention for audit.

The database password is stored in SSM Parameter Store as a SecureString. On first boot, the bootstrap script generates a random password and writes it to SSM. If the instance is restored from a snapshot, it reuses the existing password so recovery works without any manual intervention.
Terraform state goes into an S3 backend with encryption and lock file support. A pre-commit hook runs gitleaks, ShellCheck, and terraform fmt on staged changes before every commit. The CI pipeline runs Terraform validation, ShellCheck, Gitleaks, Trivy, and Checkov on every push. Each scanner has its own config file (.checkov.yaml, .gitleaks.toml, .trivyignore) with skip rules that document why each finding is accepted rather than fixed. The sidecar's Python dependencies use a hash-pinned lock file generated by pip-compile, so installs fail if any package has been tampered with. There's a deploy workflow ready to go that uses GitHub OIDC for AWS authentication (no long-lived access keys), runs terraform apply, then 35 smoke tests covering auth, model listing, streaming, image generation, and video submission. I haven't switched to using it yet, but I deploy manually with the same steps and I'll move to the workflow eventually.
# Request data flow
No inbound network access to the EC2 instance. The security group has zero ingress rules. All traffic comes through a Cloudflare Tunnel, which maintains an outbound-only HTTPS connection from the instance to Cloudflare's edge. The instance has a public IP (default VPC) but nothing can reach it.
A Cloudflare Access application requires a valid service token on every request before traffic reaches the tunnel. A WAF ruleset blocks everything except specific API paths: inference endpoints (/v1/chat/completions, /v1/messages, /v1/models), image and video endpoints, admin CLI paths, and /health. The admin UI is disabled in LiteLLM's config, the OpenAPI/Swagger docs are suppressed via environment variables, and /health/readiness is blocked at the WAF because it leaks the LiteLLM version.
The tunnel routes traffic by path: /v1/images/generations* and /v1/images/edits* go to LiteLLM on 127.0.0.1:4000, then /v1/videos* and /v1/images/* go to the sidecar on 127.0.0.1:4001, and everything else falls through to LiteLLM. Cloudflared doesn't listen on a port itself, it's an outbound client that forwards tunnel traffic to those local services. Even if the security group were misconfigured, nothing is listening on a public interface.
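In cloudflared's config, that ordering is expressed as an ingress list evaluated top to bottom; the first matching rule wins. Roughly like this (the hostname is a placeholder and the repo's actual rules may differ):

```yaml
ingress:
  - hostname: llm.example.com
    path: ^/v1/images/(generations|edits)  # text-to-image and edits stay on LiteLLM
    service: http://127.0.0.1:4000
  - hostname: llm.example.com
    path: ^/v1/(videos|images)/            # async video + Nova Canvas ops
    service: http://127.0.0.1:4001
  - hostname: llm.example.com
    service: http://127.0.0.1:4000         # everything else falls through to LiteLLM
  - service: http_status:404               # required catch-all rule
```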
IMDSv2 is enforced with a hop limit of 1 to prevent SSRF-based credential theft from the Instance Metadata Service. All secrets (master key, tunnel token, database password) live in SSM Parameter Store as SecureStrings, and environment files on the instance are written with umask 077. The IAM instance role is scoped to Bedrock invocation, three named SSM parameters under /rockport/, the video S3 buckets, and Marketplace subscriptions for third-party models. The deployer IAM policies are split across three files to stay under AWS's 6,144-character per-policy limit while keeping every action explicit.
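The IMDS settings are a small block on the Terraform instance resource, roughly:

```hcl
metadata_options {
  http_endpoint               = "enabled"
  http_tokens                 = "required" # IMDSv2 only, no IMDSv1 fallback
  http_put_response_hop_limit = 1          # proxied/forwarded requests can't reach IMDS
}
```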
# The video sidecar
Bedrock's video generation models (Nova Reel and Luma Ray2) use an async pattern: you submit a job, it renders to S3, and you poll for completion. LiteLLM doesn't support this. I built a FastAPI sidecar to handle it, along with a few Nova Canvas image operations (variations, background removal, outpainting) that LiteLLM doesn't cover.
For video, the sidecar validates requests per model (Nova Reel: 1280x720 fixed, 6-120 seconds in multiples of 6, supports multi-shot sequences up to 20 shots. Ray2: 540p or 720p, 5 or 9 seconds, flexible aspect ratios). It checks the user's remaining budget against the estimated cost ($0.08/second for Nova Reel, $0.75-1.50/second for Ray2), enforces a concurrent job limit (default 3 per key), submits the job to Bedrock, and stores the tracking data in PostgreSQL. Two S3 buckets in different regions (us-east-1 for Nova Reel, us-west-2 for Ray2) store the video output with a 7-day lifecycle policy.
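The Nova Reel half of that validation plus the cost estimate can be sketched like this (constants are from the numbers above; the function name and structure are mine, not the sidecar's actual code):

```python
# Validate a Nova Reel request and return the estimated cost in USD.
NOVA_REEL_RATE_USD_PER_SEC = 0.08

def validate_nova_reel(duration_s: int, width: int = 1280, height: int = 720) -> float:
    if (width, height) != (1280, 720):
        raise ValueError("Nova Reel renders at a fixed 1280x720")
    if not (6 <= duration_s <= 120) or duration_s % 6:
        raise ValueError("duration must be 6-120 seconds in multiples of 6")
    return round(duration_s * NOVA_REEL_RATE_USD_PER_SEC, 2)
```

The estimate is compared against the key's remaining daily budget before the job is ever submitted to Bedrock, so a user can't queue a render they can't afford.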
The image API handles three Nova Canvas endpoints: image variations, background removal, and outpainting. These use Bedrock's native API in ways that LiteLLM doesn't support. Text-to-image generation and the thirteen Stability AI image editing operations (structure, sketch, style transfer, inpainting, erasure, search-and-replace, search-and-recolor, upscaling, and more) all go through LiteLLM directly.
Authentication works by taking the user's Bearer token, hashing it with SHA-256 (matching LiteLLM's key storage format), and validating it against LiteLLM's /key/info endpoint. When a video job completes, the sidecar generates a presigned S3 URL (1-hour expiry) and logs the spend directly to LiteLLM's spend tracking tables so video costs show up alongside chat costs.
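The hashing step is small enough to show in full (a sketch assuming LiteLLM's plain SHA-256-hex key format as described above; the helper name is mine):

```python
import hashlib

def hash_key(bearer_token: str) -> str:
    """Strip the Bearer prefix and hash the raw key the way LiteLLM stores it."""
    token = bearer_token.removeprefix("Bearer ").strip()
    return hashlib.sha256(token.encode()).hexdigest()
```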
# What Claude found
I pointed Claude at the full codebase and asked it to review everything from a security and platform perspective. I used Anton Babenko's terraform-skill for Claude Code, which gives it better Terraform knowledge than it has out of the box. I've already fixed several things it flagged: Cloudflare Access for edge auth, IAM escalation path restriction, SCRAM-SHA-256 password authentication for PostgreSQL, additional systemd hardening directives, CPU utilisation as a second idle-stop signal with a Lambda error alarm, and an advisory lock for a time-of-check-to-time-of-use race on video job counts. Below is everything I haven't fixed, and why.
## Security
The deployer IAM policies allow creating new security groups on * because AWS doesn't support resource-level permissions for CreateSecurityGroup. Mutating actions are scoped to rockport-tagged resources, but the initial create is broad. Nothing I can do about that one. Same story with ListAsyncInvokes in the IAM policy: it requires Resource: "*". It's read-only and only called by the health check, so the exposure is minimal.
## Platform
The video sidecar calls LiteLLM's /key/info endpoint over HTTP on every request to validate API keys. Claude suggested reading the LiteLLM_VerificationToken table directly from PostgreSQL (the sidecar already has a DB connection) or caching keys with a short TTL. Both would be faster, but coupling to LiteLLM's schema means things could break on upgrades. At my traffic levels the extra round-trip doesn't matter, so I've kept it simple.
The database is a single PostgreSQL instance on the EC2 box with no replication. If the EBS volume dies between daily snapshots, I lose up to 24 hours of data. For what's in there (key metadata, video job tracking) I can recreate everything from Terraform and the admin CLI. Not worth paying for RDS.
SSM SecureStrings, EBS volumes, and S3 buckets all use AWS-managed encryption keys rather than customer-managed KMS keys. The data is ephemeral (videos auto-delete after 7 days) and the extra cost isn't justified. There's also no S3 access logging on the video buckets, so I can't audit who downloaded what via presigned URLs. The URLs expire after 1 hour and the files delete after 7 days, so I'm fine with that for now.
The smoke tests cover the happy path and auth, but there's nothing for concurrent load, slow Bedrock responses, or partial failures. A handful of people use it now, so that's something I should get round to.
## Maintenance
LiteLLM 1.82.3 and Cloudflared 2026.3.0 are pinned for stability. I should set up Dependabot or a scheduled CI job to flag new releases, but haven't got round to it.
The config only offers the latest Anthropic models, so there are aliases that map old model IDs (like claude-sonnet-4-5-20250929) to the current 4.6 versions. Without these, anyone whose Claude Code settings still reference 4.5 would get a model-not-found error. When Anthropic releases new versions, the aliases need updating to match.
## What Claude said is fine
The defence-in-depth layering checks out. The security group (network), Cloudflare Access (edge authentication), Cloudflare Tunnel (transport), WAF (application), and localhost binding (host) all operate at different points in the stack. They're genuinely separate layers, not redundant.
PostgreSQL without TLS on localhost is fine. Traffic never leaves the kernel's loopback interface. If an attacker has root, TLS wouldn't help anyway since they could read process memory or the env file. A 32-character random hex password (128 bits of entropy) stored with 0600 permissions isn't realistically brute-forceable, and PostgreSQL authenticates with scram-sha-256.
# Feedback
The repo is public. If you're a platform engineer, security architect, or anyone who works with AWS and Terraform day to day, I'd genuinely like to hear what you think. If something looks wrong or there's a better way to do it, open an issue.