How to apply semantic versioning and consumer-driven contract testing to AI agent system prompts — treating prompts as versioned API contracts with explicit breaking change classification, agent manifests, and CDC-style registration for multi-agent production systems.
Amazon's Kiro deleted production in December 2025. The model didn't malfunction — it executed inside the permissions it had been given. The fix is not a better model. It's an enforcement stack the prompt cannot override. Four layers, executable constraints, no theater.
Most production agents run on intentions nobody wrote down. Here is how to write the behavioral spec — scope, invariants, testable success criteria, and failure modes — that translates business intent into something your infrastructure can enforce.