Case Study
Building an Evaluation Harness with Claude Code
I built an evaluation harness to measure the quality of LLM-generated pixel-art sprites. Five rounds of calibration, a stubborn intent-vs-reality gap, and some useful C tools for getting assets onto classic Macs.