About

A newsletter for the engineers behind the demo.

First Token is a weekly deep dive on production AI engineering — written for backend engineers who ship the systems that make LLM features actually work.

Not the demos. Not the model news. The infrastructure underneath: retrieval, queues, caching, observability, evals, the messy edges where everything either holds together or quietly drops user requests at 3am.

Why this newsletter exists

The AI content landscape is loud and shallow. Most of what gets published is either model launch coverage, prompt engineering hot takes, or "I built an AI agent in 30 lines" tutorials that work beautifully in a demo and fall apart the moment real users touch them.

Meanwhile, the engineers actually shipping LLM features in production are dealing with problems that don't get written about: stale retrieval, semantic cache invalidation, queue backpressure under variable model latency, eval drift, observability for non-deterministic systems, multi-stage agent loops that fail silently.

First Token is for those engineers. Every issue takes one real problem, breaks it down through a debug story or an architectural decision, and ends with a runnable snippet you can adapt to your own systems.

What you can expect

  • One deep dive every Tuesday. Roughly 1,800 words. One diagram. One snippet. No filler.
  • Production focus. If it only works in a notebook, it doesn't ship here.
  • No model launch coverage. No "X just dropped Y" posts. Those are everywhere. This newsletter is about what you build with them.
  • No AI-influencer takes. No hype, no doom, no thread-style summaries of someone else's work.

Get in touch

Have a topic request, a failure mode story worth covering, or just want to argue about retrieval architecture? hello@firsttoken.dev

Production AI engineering. One Tuesday at a time.

Free · weekly · unsubscribe anytime
