KV cache, explained without the buzzwords
Why KV cache is the single biggest lever on LLM cost and latency, and what changes when you enable it on your own models.
Short, technical write-ups on what I'm learning while shipping AI products — agent architectures, inference internals, build logs, and the small field notes worth keeping.
A field guide to the three agent shapes I keep reaching for in production — task graphs, ReAct loops, and supervisor + worker pools.
How the design analyser, planner, and code agent talk to each other, and why E2B sandboxes were the unlock that made it ship.
Patterns from shipping a bot that automated 80% of conversations — qualifying questions, fallback scripts, and the small UX details that change the close rate.
Posts go live as I publish them. Want a heads-up? Drop me an email and I'll send the first edition when it ships.
Tell me what you're building. I reply within a working day with a clear scope, timeline, and price — usually a working prototype in 2 weeks.