Amazon's Kiro deleted production in December 2025. The model didn't malfunction; it acted within the permissions it had been given. The fix is not a better model. It's an enforcement stack the prompt cannot override. Four layers, executable constraints, no theater.
Most production agents run on intentions nobody wrote down. Here is how to write a behavioral spec (scope, invariants, testable success criteria, and failure modes) that translates business intent into something your infrastructure can enforce.
88% of production agent failures trace to infrastructure gaps, not model quality: missing context validation, missing permission boundaries, and missing execution bounds. A diagnostic taxonomy built from 591 incidents, with prevention mechanisms ranked by failure frequency.