Rapid Prototyping with Claude Code: A 3-Day Challenge

Written by hand. Claude helped with sentence cleanup and pulling commit data.

The Challenge

Take a 12,000-line Figma export (72 files, everything dumped into a single component) and turn it into a working prototype with as many features as possible in 3 days: a recipe app with AI features and dual frontends, including one that runs on my decade-old iPad's iOS 9 Safari. Day 4 was CI/CD automation. I succeeded.

Here is the repo for Cookie, a recipe management web app with Django backend, React frontend, ES5 interface for legacy browsers, and AI-powered features. See the Cookie project page for more details on what was built, or browse the app screenshots.

This post covers the approach:

  • Creating a Mega-Plan - exhausting Claude's knowledge in plan mode before writing any code
  • Making it implementable - using a custom /implementable skill to transform that plan into session-sized chunks that fit context windows
  • Session-based implementation - the rhythm of clear context, load plan, implement, test, repeat
  • Iterative QA - logging, researching, and fixing issues as they surface rather than batching them at the end
  • Building a backlog as you go - capturing future enhancements and ideas as they come up rather than losing them

170 commits · 47% backend test coverage · 64 QA issues logged · 3+1 days (prototype + CI/CD)

What came out the other end: 170 commits, 47% backend test coverage as of 11th January 2026, zero package security vulnerabilities flagged, and dual frontends. Still a prototype and it would need a proper security review and load testing before anyone should run it in production. See the code metrics for yourself. They are updated on every commit and I'll continue to work on the project.

The Mega-Plan

Start in plan mode. Throw in every requirement: technical constraints, stack choices, features, edge cases, deployment needs. Don't accept the plan. Tell Claude to write it to a file. Then keep asking Claude to find gaps:

"Review the plan and identify any gaps in the requirements"
"What architectural issues might we hit with this approach?"
"What could go wrong with this solution architecture/solution design? Be critical."
"What MVP features is my plan missing that the application would be a solid V1.0?"
"Find an open source library that allows scraping of recipe sites, get the code, analyse it and add a feature with implementation notes to allow a user to search for 'chicken' and get results"
"Organise the plan into v1.0 and v2.0 and beyond features"

I asked Claude to scan my prompt history and pull out prompts I used to build the Mega-Plan. See them here.

💡 Tip

Use AI to refine your prompts. Save a prompt in a text file and ask Claude to "structure the prompt in the prompt.md. Same intent, clearer format."

Don't stop until you've both run out of things to add. Keep hammering. Focus on functional and non-functional requirements for v1.0. Describe user journeys - they help Claude understand how features connect. Log v2.0 ideas but don't let them distract from the core.

My initial PLANNING.md ended up at 2500 lines and 95KB covering Django configuration, API design, database schema, frontend architecture, AI integration patterns, deployment strategy, and edge cases I hadn't considered.

Warning

This plan is comprehensive but not executable. No single session can hold 95KB in context while implementing. Don't be tempted to tell Claude 'implement my plan' because it will go horribly wrong. The plan needs transformation.

Make The Plan Implementable

My implementable skill checks whether a plan will work in Claude Code's session-based model and suggests improvements where it won't.

Why Session Sizing Matters

LLMs have a fixed context window. As a session grows and approaches that limit, Claude Code compacts the conversation automatically - summarising earlier content to make room. The problem is it decides what to keep based on its own heuristics, and the more you let it compact and keep going, the worse the output gets. In my experience, LLMs do their best work when around 30-40% of the context window contains focused, relevant information, with the rest available for reasoning and response generation. Session-sized chunks sidestep compaction entirely: load one focused phase document, implement it, /clear, repeat. No summarisation guesswork, no degraded output, no attention spread thin across a 2500 line document.
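
To make the 30-40% figure concrete, here's a rough pre-flight check you could run on a phase document before a session. It's a sketch built on assumptions - roughly 4 characters per token and a 200K-token window - not a real tokeniser, and the file paths are illustrative.

```python
# Rough check: will this phase document fit comfortably in one session?
# Assumptions: ~4 characters per token (crude heuristic) and a 200K-token
# context window. Both are approximations, not exact figures.
from pathlib import Path

CONTEXT_WINDOW_TOKENS = 200_000
TARGET_FILL = 0.40  # keep focused material under ~40% of the window

def estimate_tokens(path: str) -> int:
    text = Path(path).read_text(encoding="utf-8")
    return len(text) // 4  # ~4 chars per token

def fits_in_session(path: str) -> bool:
    tokens = estimate_tokens(path)
    budget = int(CONTEXT_WINDOW_TOKENS * TARGET_FILL)
    print(f"{path}: ~{tokens} tokens (budget ~{budget})")
    return tokens <= budget

if __name__ == "__main__":
    # The 95KB PLANNING.md is ~24K tokens just to read - and that's before
    # code, diffs and tool output start piling into the same window.
    fits_in_session("PLANNING.md")
    fits_in_session("plans/PHASE-4-REACT-FOUNDATION.md")
```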

The Seven Criteria

The skill checks seven criteria:

  1. Task Granularity - Are tasks discrete enough to complete in one session?
  2. Dependency Structure - Are foundations built before features?
  3. Instruction Clarity - Are tasks specific with clear success criteria?
  4. Testability - Can each increment be verified?
  5. Context Management - Are related changes grouped sensibly?
  6. Supporting Infrastructure - Does the project have CLAUDE.md, proper file structure?
  7. QA Workflow - Is there a Test -> Log -> Research -> Fix -> Verify loop?
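
To give a flavour of what a check like this reports, here's a hypothetical sketch of the seven criteria as a simple pass/fail checklist. The real skill is a prompt-driven review, not a script; the failing notes below reflect the state of my original PLANNING.md.

```python
# Hypothetical sketch: the seven criteria as a pass/fail checklist.
# Illustrative only - the real /implementable skill is a prompt-driven
# review, not a Python script.
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str
    question: str
    passed: bool
    notes: str = ""

def report(criteria: list[Criterion]) -> str:
    lines = []
    for c in criteria:
        status = "PASS" if c.passed else "NEEDS WORK"
        lines.append(f"[{status}] {c.name}: {c.notes or c.question}")
    return "\n".join(lines)

checks = [
    Criterion("Task Granularity", "Does each task fit one session?", False,
              "Phases 4-5 mix React and legacy work with no session boundaries."),
    Criterion("Dependency Structure", "Foundations before features?", True),
    Criterion("Instruction Clarity", "Clear success criteria?", True),
    Criterion("Testability", "Can each increment be verified?", True),
    Criterion("Context Management", "Related changes grouped sensibly?", False,
              "No guidance on what to load together or where to pause."),
    Criterion("Supporting Infrastructure", "CLAUDE.md and file structure?", False,
              "CLAUDE.md not yet created."),
    Criterion("QA Workflow", "Test -> Log -> Research -> Fix -> Verify loop?", False,
              "No QA tracking document."),
]

print(report(checks))
```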

Before and After

The original PLANNING.md had a Build Order section with numbered tasks. Here's what the frontend phases looked like:

### Phase 4: Profile Selector (Both Frontends)
21. Vite/React project setup with Tailwind v4, theme.css from Figma
22. React: API client with fetch
23. React: Profile selector screen
24. Legacy: Base template and CSS (light theme only)
25. Legacy: ES5 JavaScript modules (ajax, state, router)
26. Legacy: Profile selector screen

### Phase 5: Home & Search (Both Frontends)
27. React: Home screen with search bar
28. React: Favorites/Discover toggle
29. React: Basic recipe card component
30. React: Dark/light theme toggle
31. Legacy: Home screen with search bar
32. Legacy: Favorites/Discover toggle
33. Legacy: Recipe card partial
34. React: Search results with source filters and pagination
35. Legacy: Search results with source filters

Fifteen tasks across two frameworks in two phases. No session boundaries. No indication of what to load together or where to pause. Tell Claude "implement Phase 4" and it'll try to hold React setup, API client, profile screens for both frontends, legacy CSS, and ES5 modules all in one context. It won't fit.

After running /implementable iteratively (6 commits), those phases became separate documents. Here's PHASE-4-REACT-FOUNDATION.md:

## Session Scope

| Session | Tasks | Focus |
|---------|-------|-------|
| A | 4.1-4.3 | React setup + profile selector |
| B | 4.4-4.6 | Home screen + recipe cards |
| C | 4.7-4.9 | Search + tests |

## Tasks

- [ ] 4.1 Vite/React project setup with Tailwind v4, theme.css from Figma
- [ ] 4.2 React: API client with fetch
- [ ] 4.3 React: Profile selector screen
- [ ] 4.4 React: Home screen with search bar
- [ ] 4.5 React: Favorites/Discover toggle
- [ ] 4.6 React: Basic recipe card component
- [ ] 4.7 React: Dark/light theme toggle
- [ ] 4.8 React: Search results with source filters and pagination
- [ ] 4.9 Write tests for profile and search API integration

React and legacy are now separate phases. Each phase has a Session Scope table showing exactly which tasks fit in which session. The full phase document includes directory structure, screen specifications, acceptance criteria, and a checkpoint section - everything Claude needs for that phase, nothing from other phases.

Now when you say "Implement Phase 4 Session A", Claude knows exactly what to do. One document, three tasks, verify the checkpoint, done. Next session starts with clean context.

The Iteration Loop

Iteration Loop
/implementable -> read suggestions -> accept improvements -> /implementable -> repeat until "no further changes needed"

📄 What the skill produced

  • WORKFLOW.md - The operational guide. Session flow, context management, QA procedures.
  • CLAUDE.md - Project-specific instructions that load automatically.
  • 10+ phase documents - Each phase broken into sessions. Each session designed to complete in one context window.
  • QA-TESTING.md - The issue tracking template. Where the Test -> Log -> Research -> Fix -> Verify loop lives.

The implementable iteration stopped at commit f5e7b65. Implementation began at commit 4c3bda9.

See Appendix: Commit Tables for the full commit history.

The Implementation Loop

The implementation rhythm is mechanical and could probably be automated:

Implementation Loop
/clear -> load plan -> "Implement Phase X Session Y" -> test -> /clear -> repeat

Each session has bounded scope. There's no context accumulation across sessions. Claude knows exactly what to do because it's in the plan. Failures are isolated to one session.
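
Since the rhythm is mechanical, it could plausibly be scripted. A hypothetical sketch, assuming the Claude Code CLI's non-interactive print mode (claude -p) and the phase/session naming above - I ran my sessions interactively, so this is not what I actually did:

```python
# Hypothetical automation of the /clear -> load plan -> implement -> test loop.
# Assumes the Claude Code CLI's non-interactive print mode (claude -p); each
# subprocess invocation starts with a fresh context, which stands in for
# /clear between sessions. Plan paths and session names are illustrative.
import subprocess

SESSIONS = [
    ("PHASE-4-REACT-FOUNDATION.md", "Implement Phase 4 Session A"),
    ("PHASE-4-REACT-FOUNDATION.md", "Implement Phase 4 Session B"),
    ("PHASE-4-REACT-FOUNDATION.md", "Implement Phase 4 Session C"),
]

for plan, instruction in SESSIONS:
    prompt = (f"Read plans/{plan}, then: {instruction}. "
              "Run the tests and stop at the checkpoint.")
    subprocess.run(["claude", "-p", prompt], check=True)
    # A human review of the diff between sessions is still the safety net.
    input(f"'{instruction}' finished - review the commit, then press Enter...")
```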

When something doesn't work, a single prompt can tell Claude to log a QA issue and research the root cause. For larger fixes, Claude can add a new session to the current phase. See QA Loops for the full workflow.

Example: Phase 1 Session A (4c3bda9)

This first implementation session established infrastructure:

  • Django 5.x with django-ninja API framework
  • Docker Compose with nginx reverse proxy
  • pytest configuration with coverage reporting
  • Health check endpoint at /api/health

Infrastructure before features. A solid foundation for everything that follows. So far, so good.
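
For a sense of scale, a health check endpoint in django-ninja is only a few lines. This is a generic sketch of that kind of endpoint, not necessarily the repo's exact code:

```python
# Minimal django-ninja health check - a generic sketch of the sort of
# endpoint Phase 1 Session A set up, not necessarily the repo's exact code.
from ninja import NinjaAPI

api = NinjaAPI()

@api.get("/health")
def health(request):
    # Returning a plain dict is enough; django-ninja serialises it to JSON.
    return {"status": "ok"}

# urls.py wiring (assumed):
# from django.urls import path
# urlpatterns = [path("api/", api.urls)]
```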

Over 3 days of implementation, I ran 38 sessions across 100-ish commits, with 64 QA issues logged and resolved along the way. Each session was a focused burst of work with clean context boundaries. Day 4 was dedicated to CI/CD automation.

Each commit maps to a single session. See Appendix: Commit Tables.

Interrupting Without Breaking Flow

Start a message with btw when Claude is mid-task. It queues your input without interrupting the current work. Useful for adding phases/sessions, future enhancements, or altering the todo list while Claude is working.

Examples:

Feature idea mid-implementation:

btw the settings page should also be where i can manage the openrouter
api key... at least as one option to manage that through a nice front end.
it can get saved to the backend .env or whatever is best/secure

Updating the plan mid-session:

btw we also want to make this a bit more production ready out of the box
like if this is background task etc running.. it should all be setup
to be running etc on container start update the plan with that as a phase too

QA Loops

QA happens during implementation, not after:

QA Flow
Test on device -> Describe issue conversationally -> Claude logs it -> Claude researches -> Claude proposes fix -> Implement -> Verify

A real prompt from my workflow:

btw log a qa issue for legacy.. recipe 62 is a remix recipe..
on creation the tips tab said tips were being generated..
then it seems to time out.. are cooking tips supported for
remix recipes properly? log the new issue and do research
for it and update the docs

One prompt triggers the full cycle: log issue, research code, update docs. No context switching to a bug tracker, no templates, no hunting for the right repo on GitHub.

You can kick off a QA round at any point:

i want to do a round of manual qa on the remix feature. create the
documentation i need for that following the patterns we have used
previously in plans folder and lets start
monitor the app and debug logs i am browsing the site..
keep monitoring and build a list of issues/improvements based on
what you find in the logs

Claude creates a QA-041 entry (or whatever the next number is), researches the codebase to understand the problem, and updates the QA documentation, all in one pass.

Here's the output of asking Claude "What's the status of QA":

QA status table showing Claude tracking issues
Output of asking Claude "What's the status of QA" - all issues tracked in markdown

I could write a skill that uses the GitHub CLI to log issues, but markdown files in the repo work better for Claude. The QA docs load instantly with no API calls eating context. Research notes, status updates, and fix documentation all live in one place that Claude can read and update directly.
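
Plain markdown also means the status can be tallied without Claude at all. A hypothetical sketch, assuming each issue appears as a "### QA-NNN" heading with a "Status:" line somewhere in its body - the real QA-TESTING.md layout may differ:

```python
# Hypothetical tally of QA issues from a markdown tracking file.
# Assumes issues appear as "### QA-NNN ..." headings followed by a
# "Status: Open|Fixed|..." line; the real QA-TESTING.md layout may differ.
import re
from collections import Counter
from pathlib import Path

text = Path("plans/QA-TESTING.md").read_text(encoding="utf-8")
entries = re.split(r"^### (QA-\d+)", text, flags=re.MULTILINE)[1:]

statuses = Counter()
for issue_id, body in zip(entries[::2], entries[1::2]):
    match = re.search(r"Status:\s*(\w+)", body)
    statuses[match.group(1) if match else "Unknown"] += 1

for status, count in statuses.most_common():
    print(f"{status}: {count}")
```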

For a detailed example, see the QA-009 deep dive: a 4-session feature that grew to 10, with 7 bugs discovered (3 critical), and code that broke on iOS 9 but worked on desktop. About 4 hours of focused work.

When testing surfaces issues, tell Claude to log a QA issue. If it's a feature, add a session or phase to the plan. In a fresh session, ask Claude to research the issue before implementing.

The phase/session structure handles scope growth. When a phase reveals something that affects later work, Claude can update the remaining plans before you continue.

After logging several issues, I asked Claude to find connections:

check the current status of QA and see if there are related issues
that it may be wise to implement a fix for at the same time,
but only if they are strongly related - review the research

Claude analysed 6 open issues and found that QA-031, QA-032, and QA-033 should be batched. They all relate to AI-powered recipe scaling.

🛈 Why batch these issues

QA-031 + QA-032 + QA-033 form a cohesive group. All deal with scaling service improvements:

  • QA-031 (High priority) - When ingredients scale, instructions should update too
  • QA-032 - Cooking times should adjust for larger batches
  • QA-033 Phase 2 - Tips should regenerate with scaling context

Same files modified, same migration pattern, same API changes. Avoids doing 3 separate migrations when 1 would suffice.

Researching the codebase and applying a fix for all three came down to one prompt, 'research, document and fix qa-031'. The result: one migration instead of three, and a single commit with the work completed in one context window.

Claude finding related issues to batch
Claude identifying related QA issues that should be fixed together

Building the Backlog As You Go

Ideas come up while you're implementing something else. Without somewhere to put them, you either get distracted or lose them - so log them as they surface.

The implementable skill creates a FUTURE-ENHANCEMENTS.md file with a simple structure: ID, summary, priority, complexity. When an idea surfaces mid-session, I just tell Claude to log it:

log a future enhancement for multi-selection on remix suggestions

Claude adds FE-006 to the backlog with a description, doesn't break flow, and we keep going.

Over the 3 days of prototyping, 15 enhancement ideas got logged. Some were quick wins that got implemented same-day (FE-007 nginx in production container, FE-011 nutrition label formatting). Others are genuine future work (FE-003 OAuth login, FE-009 AI meal pairing).


What Went Well

  • Working prototype in 3 days - functional app with both frontends serving recipes, challenge completed
  • Thorough QA documentation - root cause analysis, file/line references, solution options, and acceptance criteria
  • Solo development - one person, no coordination overhead
  • Full feature set - multi-site recipe search, AI integrations for scaling and tips, dual frontend including iOS 9
  • Early containerisation - Docker setup from the start meant testing on actual devices (including the old iPads) as features were built, catching real-world issues immediately
  • CI/CD from day one - tests on every commit, multi-arch Docker images pushed to Docker Hub on tag
  • Day 4 CI/CD automation - a single day to set up automated releases, software quality metrics dashboards, and the full 15-job pipeline. Tools for code coverage, maintainability, security scanning, and multi-arch Docker builds - all configured in one focused session
  • Metrics dashboard - live code quality tracking updated on every commit
  • Clean commit history - each session produces focused, human reviewable commits
  • Works with cheaper models - I used Opus 4.5 for most sessions, but accidentally ran Haiku for several. The focused session structure means smaller models can follow the plan

What Didn't Go Well

  • Figma export wasn't preserved - Claude took much of the Figma design but lost details like the RecipeCard's play button and date display, and gained features not in the original design (remix badges, source attribution). Each session didn't have full Figma context, so drift accumulated.

  • Documentation doesn't guarantee retrieval - The project has detailed iOS 9.3.6 Safari compatibility docs, but Claude doesn't always find them. QA-036 took a few attempts debugging date parsing when the answer was already documented.

  • Frontend test coverage is low - The implementable skill checks for "test creation alongside implementation" but doesn't enforce coverage per layer. The legacy ES5 frontend has zero tests.

  • 64 QA issues logged - Claude doesn't write perfect code first time. Every session produced working code that then needed fixes when tested on real devices.

  • Inconsistent output formatting - Claude gives different responses to common questions like "what is the status of QA?" Minor annoyance, but worth noting.

  • Performance could be optimised - Perhaps due to my prompting or the resulting code, the search feature that scrapes 15 sites in parallel could be faster. A future post will cover a code review of what Claude generated and potential optimisations.

What's Actually There

| Metric | Value |
|--------|-------|
| Total commits | 170 |
| Backend files | 35 Python files |
| Frontend files | 29 TypeScript/TSX files |
| AI features | 11 integrated prompts |
| Recipe sources | 15 curated sites |
| QA issues logged/resolved | 64 / 61 |
| Phase plans | 16 focused documents |
| CI jobs | 15-job pipeline |
| Legacy support | iOS 9 with dedicated ES5 interface |

Code quality metrics (from the CI dashboard, updated on every commit):

Take these metrics with a pinch of salt

I built this in 4 days and haven't had time to properly review all the metrics yet. Future posts will dig into these numbers, evaluate how well Claude actually did, and answer the question: could this prototype make it to production?

| Metric | Backend | Frontend |
|--------|---------|----------|
| Test coverage | 47% | 15.71% |
| Security vulnerabilities | 0 | 0 |
| Code duplication | 4.14% | 1.54% |
| Maintainability index | 86.14/100 (A) | - |
| Cyclomatic complexity | 2.79 (A) | 9 warnings (C) |

The backend scores well on maintainability and complexity. Claude tends to write straightforward code when the task is well-scoped. Frontend coverage is low because I prioritised feature breadth over test depth during the prototype phase.

Features:

  • Multi-site recipe search (parallel across 15 sources; see the sketch after this list)
  • Recipe import with automatic extraction of image and data
  • AI recipe remix with variation suggestions
  • Serving size scaling (AI-powered with caching)
  • Cooking mode with timers and audio alerts
  • Dual frontends: React SPA + iOS 9 ES5
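
The fan-out behind the multi-site search is conceptually simple: fire all 15 requests at once and collect whatever comes back. A minimal sketch of the idea using a thread pool - not the project's code, and the source URLs are placeholders:

```python
# Conceptual sketch of fanning a recipe search out across many sources in
# parallel. Not the project's implementation; source URLs are placeholders.
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.parse import quote_plus
from urllib.request import urlopen

SOURCES = [f"https://example-recipes-{i}.test/search?q={{q}}" for i in range(15)]

def fetch(url_template: str, query: str, timeout: float = 10.0) -> tuple[str, int]:
    url = url_template.format(q=quote_plus(query))
    with urlopen(url, timeout=timeout) as resp:
        return url, len(resp.read())  # real code would parse recipe markup here

def search_all(query: str) -> list[tuple[str, int]]:
    results = []
    with ThreadPoolExecutor(max_workers=15) as pool:
        futures = [pool.submit(fetch, src, query) for src in SOURCES]
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception as exc:  # one slow or broken site shouldn't sink the search
                print(f"source failed: {exc}")
    return results

if __name__ == "__main__":
    for url, size in search_all("chicken"):
        print(url, size)
```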

Try It Yourself

Clone the repository and start where I started:

git clone https://github.com/matthewdeaves/cookie.git
cd cookie
git checkout ff4a5cbb4632349dd2d997aa59d315fd2e6f0b05

You're now looking at 12,416 lines of Figma export and a plan. From here:

  1. Ask Claude to install the implementable skill
  2. Use the /implementable skill - just ask Claude "is my plan implementable?"
  3. Accept suggestions and iterate until the skill has nothing more to add
  4. Start implementing with Implement Phase 1 Session A

Or try a QA session:

git checkout 9044735

See how the research -> plan -> implement -> verify cycle works in practice.

Use these prompts:

  • "what is the status of qa"
  • "research qa-004"
  • "implement qa-004"

Here's what Claude shows when you ask for the QA status on that commit:

QA status at commit 9044735

The commit hashes are below if you want to compare your own approach.


Appendix: Commit Tables