The part nobody writes about is what happens after the model works.
Strategy, data, platform, automation, and governance — ordered by what practitioners are reading right now.
Your team codes 3x faster with AI tools, but lead time is up and deployment frequency is flat. Here is the structural reason, and the four pipeline changes that actually fix it.
Karpathy’s four coding-agent principles are useful, but production agents need scoped edits, test-gaming controls, trust-boundary calibration, and calibrated reporting.
Detection tells you something is wrong. The four-step diagnostic pipeline — behavioral telemetry, failure clustering, root cause attribution, eval generation — tells you what failed, why, and how to stop it from shipping again. Most teams build partial detection and stop there.
Most teams architect for capability and optimize for cost after the invoice lands. Here is the playbook for building cost constraints in from day one: task profile audits, three-tier routing, and synthetic benchmarking before your first deploy.
A deep teardown of the production CMS pipeline that turns GitHub Issues into merged PRs while you sleep. 10 workflows, 6 issue types, AI media generation, and the exact DAGs that make it work.
Most production agents run on intentions nobody wrote down. Here is how to write the behavioral spec — scope, invariants, testable success criteria, and failure modes — that translates business intent into something your infrastructure can enforce.
Failure modes, deployment decisions, and operating patterns from teams actually shipping. No hype, no theory.
Viktor Bezdek, VP Engineering, Groupon
Building and writing about AI-native engineering from inside a real organization. These notes are what I wish I'd had when I started.