First Token — production AI engineering, one Tuesday at a time

What you actually get

01 / Production-grade

Tactics from real shipping, not blog posts about blog posts.

Every issue starts with a real failure mode, debug story, or architectural decision drawn from systems serving actual users. If it only works in a demo, it doesn't ship here.

02 / Backend-first

Written for engineers who ship the systems behind the demo.

APIs, queues, retrieval, caching, observability, evals. The boring infrastructure that makes LLM products actually work in production — not prompt engineering hot takes.

03 / One per week

A single deep dive every Tuesday. No filler, no daily noise.

Long enough to be useful, short enough to read on the commute. Roughly 1,800 words, one diagram, one runnable snippet per issue. That's the contract.

Recent issues

5 of 6 · updated weekly

#006
Jun 23, 2026

Hybrid search: combining BM25 with embeddings

Pure vector search misses exact-match queries.

· deep dive

#005
Jun 16, 2026

Reranking 101: the 50ms layer that decides whether your RAG works.

Your retriever gives you 50 results. Reranking makes the top 5 useful. The difference between a RAG system that answers customer questions and one that hallucinates with confidence is rarely the...

· deep dive

#004
Jun 9, 2026