In December 2025, Amazon's Kiro coding agent deleted an AWS Cost Explorer production environment. Thirteen hours of outage.[1] The model did not malfunction. It analyzed the objective, picked the most direct path, and executed using the permissions it had been given — full write access to production infrastructure.[2] The two-person approval gate that protected human-driven changes was not part of the agent's authorization path. The deletion completed faster than any human could read the confirmation prompt.
Engineering teams spend months tuning model quality. Better prompts. Newer versions. More elaborate reasoning chains. Meanwhile every documented production agent failure traces back to the same place: infrastructure. Permissions scoped too broadly. Guardrails written into system prompts. Monitoring that watches outcomes after the fact instead of actions in flight.
Model improvement does not fix this. A better model with the same permissions makes the same mistake faster. The fix is an enforcement stack — executable constraints that hold under adversarial inputs, that survive prompt injection, that the agent cannot reason its way around because they live outside its reasoning context entirely. This is the infrastructure layer. It is the only layer that ships.
When the Model Works and Production Still Burns
Model quality and infrastructure safety solve different failure classes. Confusing them is how production goes down.
The Kiro incident is not a story about a bad model. It is a story about a structural gap. The agent did exactly what a capable agent should do — analyzed the objective, identified the most direct path, executed. The gap was between what the agent was supposed to do and what it was permitted to do. Those two were never reconciled.[3] A safeguard that existed for human engineers was simply not in the authorization path for AI agents.
The pattern repeats across every documented production incident. An agent with read-write access to customer records applies a bulk operation meant for test data to live records. An automation agent with deployment permissions ships during a code freeze. A support agent authorized to send emails interprets an edge case and messages an entire distribution list instead of one contact.
In every case the model reasoned toward something plausible. The infrastructure handed it the tools to act before anything could catch the error. A newer model with identical permissions would make the same mistake — or a more sophisticated version of it. Gartner forecasts that more than 40% of agentic AI projects will be cancelled by 2027.[5] The driver is not model quality. It is the gap between what agents are permitted to do and what they should be permitted to do.
| Safety theater | Enforcement |
|---|---|
| Safety rules written into the system prompt | A policy enforcement point intercepts every tool call before it executes |
| "Never delete production" is a sentence, not a block | Deletion of production resources is blocked at IAM, not advised in the prompt |
| Agent inherits its developer's permissions and ships unchanged | Agent permissions are scoped to its function and audited before deployment |
| Guardrails tested only on the happy path | Guardrails proved against adversarial inputs and the failure cases that actually break things |
| Monitoring confirms outcomes after the action completed | Monitoring captures every action in flight, every decision logged |
Four Layers. Each Catches a Different Failure Class.
Each layer absorbs what the layer above it lets through. None of them is optional.
Agent safety is not a guardrail. It is a stack of four control surfaces, each designed against a specific failure class. Deployed together, they survive adversarial inputs, developer mistakes, and the edge-case reasoning that even capable models will produce. Deployed separately, they create the appearance of safety while leaving production exposed.
Layer 1: Scoped Identity and Credentials
Every agent run authenticates with a dedicated service account scoped to that agent's function. Not the engineer's credentials. Not a shared team API key. The question 'which identity executed this action?' must always have a specific, auditable answer. Agents that inherit broad developer permissions during prototyping carry that access into production unchanged. Per-tool scoped credentials keep one misbehaving or compromised agent from becoming a blast radius across your entire system.[7]
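One way to make the question always answerable is to resolve a dedicated service account at the start of every run and refuse to proceed without one. A minimal sketch, assuming an internal credential store keyed by agent id; `CredentialStore`, `AgentIdentity`, and the account names are illustrative, not a specific cloud SDK:

```python
# A minimal sketch of per-agent identity resolution. The CredentialStore class
# stands in for whatever secrets manager or IAM tooling you already run; the
# point is that a run fails closed when no dedicated service account exists.
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str
    service_account: str  # e.g. "support-agent@myco.iam", never a developer's credential


class CredentialStore:
    """Maps agent ids to dedicated service accounts. No fallback to human credentials."""

    def __init__(self, accounts: dict[str, str]):
        self._accounts = accounts

    def resolve(self, agent_id: str) -> AgentIdentity:
        account = self._accounts.get(agent_id)
        if account is None:
            # Fail closed: an agent without its own identity does not run.
            raise PermissionError(f"No dedicated service account for agent '{agent_id}'")
        return AgentIdentity(agent_id=agent_id, service_account=account)


# Every run starts by resolving the agent's own identity, never by reusing a team key.
store = CredentialStore({"support-agent-v2": "support-agent@myco.iam"})
identity = store.resolve("support-agent-v2")
```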
Layer 2: Policy Enforcement at the Tool Call Layer
A policy enforcement point intercepts every tool call before execution. It evaluates the proposed action against a rule set that lives in code, not in the system prompt. Allowed actions proceed. Blocked actions return a structured error. High-risk or irreversible actions route to a human approval queue. The rule set is declarative, enforceable at runtime, version-controlled alongside deployment configuration.[6] If it lives in the prompt, it is not enforcement.
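The interception pattern itself is small. A minimal sketch in Python, assuming a policy dict shaped like the agent-policy.yaml example later in this piece; `enforce`, `PolicyViolation`, and `enqueue_for_approval` are illustrative names, not a particular framework:

```python
# A sketch of a policy enforcement point wrapped around every tool call.
# The policy dict mirrors the agent-policy.yaml example; enqueue_for_approval
# stands in for whatever human approval queue you operate.
from typing import Any, Callable


class PolicyViolation(Exception):
    """Structured error returned to the agent instead of executing the call."""


def enforce(policy: dict, tool_name: str, args: dict,
            execute: Callable[..., Any],
            enqueue_for_approval: Callable[[str, dict], Any]) -> Any:
    tools = policy["tools"]

    if tool_name in tools["blocked"]:
        raise PolicyViolation(f"'{tool_name}' is blocked for this agent")

    rule = next((t for t in tools["allowed"] if t["name"] == tool_name), None)
    if rule is None:
        # Fail closed: anything not on the allowlist is treated as blocked.
        raise PolicyViolation(f"'{tool_name}' is not in the allowlist")

    if rule.get("requires_approval"):
        # High-risk or irreversible actions wait for a human decision.
        return enqueue_for_approval(tool_name, args)

    return execute(**args)
```

Because the rules live in a versioned file rather than the prompt, changing what an agent may do is a reviewed deployment change, not a prompt edit.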
Layer 3: Bounded Execution
Every agent run carries explicit ceilings: a step cap, a wall-clock deadline, a cost budget. When any ceiling is hit, the orchestrator halts and escalates. It does not retry. Infinite loops and runaway executions are the most expensive failure mode in agentic systems because they compound silently until someone checks the billing dashboard.[8] Cost is observability.
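A minimal sketch of what the orchestrator-side ceilings look like, with the budget numbers taken from the execution block of the config example below; `run_bounded` and `agent_step` are illustrative names:

```python
# A sketch of bounded execution. The orchestrator owns the ceilings, not the agent,
# and a breached ceiling halts the run rather than retrying it.
import time


class BudgetExceeded(Exception):
    pass


def run_bounded(agent_step, max_steps=25, max_wall_clock_seconds=120, max_cost_usd=2.00):
    """agent_step() returns (done, step_cost_usd). Halt on any ceiling."""
    started = time.monotonic()
    spent = 0.0
    for _ in range(max_steps):
        if time.monotonic() - started > max_wall_clock_seconds:
            raise BudgetExceeded("wall-clock deadline hit")
        done, step_cost = agent_step()
        spent += step_cost
        if spent > max_cost_usd:
            raise BudgetExceeded(f"cost budget exceeded at ${spent:.2f}")
        if done:
            return
    raise BudgetExceeded("step cap hit without completion")


# The caller catches BudgetExceeded, notifies the on-call channel, and does not retry.
```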
Layer 4: Audit Trails with Replay Fidelity
Logging that records 'agent succeeded' is not observability. It is an alibi. Effective audit trails capture every tool call, the inputs and outputs at the moment of execution, the identity context, and whether the action was approved, blocked, or escalated. The standard to meet: given any production incident from the past 90 days, you can reconstruct exactly what the agent did, in what order, with what authorization.[4] If you cannot replay it, you cannot debug it.
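In practice that means roughly one structured record per tool call, written at execution time. A minimal sketch with illustrative field names; the non-negotiable parts are the inputs and outputs as they existed at execution, the identity context, and the enforcement decision:

```python
# A sketch of an audit record with enough fidelity to replay a run.
import json
import time
import uuid


def audit_record(run_id, agent_identity, tool_name, inputs, outputs, decision):
    return {
        "event_id": str(uuid.uuid4()),
        "run_id": run_id,
        "timestamp": time.time(),
        "identity": agent_identity,   # which service account executed this action
        "tool": tool_name,
        "inputs": inputs,             # as passed at execution time, not as later mutated
        "outputs": outputs,
        "decision": decision,         # "approved" | "blocked" | "escalated"
    }


def append(log_path, record):
    # Append-only, one JSON line per tool call, replayable in order by timestamp.
    with open(log_path, "a") as f:
        f.write(json.dumps(record, default=str) + "\n")
```

The policy file below pulls the first three layers together into one declarative config per agent role.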
```yaml
# agent-policy.yaml
# One file per agent role. Permissions in code, not in the prompt.
agent:
  id: "support-agent-v2"
  identity: "support-agent@myco.iam"
  tools:
    allowed:
      - name: "read_customer_record"
        scope: ["customer_id", "order_history", "contact_info"]
      - name: "update_order_status"
        constraints:
          max_records_per_run: 10
          allowed_statuses: ["cancelled", "refunded", "processing"]
      - name: "send_customer_email"
        requires_approval: true
        approval_timeout_seconds: 300
        auto_deny_on_timeout: true
    blocked:
      - "delete_customer_record"
      - "access_payment_raw"
      - "bulk_send_email"
  execution:
    max_steps: 25
    max_tokens_per_run: 50000
    max_wall_clock_seconds: 120
    max_cost_usd: 2.00
    on_budget_exceeded:
      action: "halt_and_escalate"
      notify: "#oncall-agents"
```

Permission Drift: The Default State of Any Agent Without an Owner
The pattern that shows up in almost every production agent incident review.
The failure pattern is consistent across incident reviews: the agent's permission scope was set during development, was never formally audited before deployment, and ended up significantly broader than its actual operational requirements.
During development, engineers add tools to unblock themselves. The agent needs to search — add the search tool. The agent needs to write to staging — grant write permissions. The agent needs to test a deletion flow — grant delete permissions temporarily. Development ends. Hardening begins. The permissions never get cleaned up because nobody explicitly owns the cleanup. Temporary becomes permanent through inaction.
By production, the agent has accumulated access across a wide swath of internal API surface it was never meant to touch.[4]
Teams that catch this run monthly automated audits: every permission granted to an agent gets compared against the tool call logs from the previous 30 days. Anything not exercised is a candidate for removal. This is not just security hygiene. Agents with tighter permission scopes have smaller failure surfaces when the model reasons toward an edge case it was not designed for. The audit also surfaces tools that were added during exploration but never actually needed in production — drift that exists because someone was trying things, not because the agent's function requires them. Drift is the default state of any system without an explicit owner.
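A minimal sketch of that audit, assuming the one-JSON-line-per-tool-call log format from the audit sketch earlier; the function name and policy shape are illustrative:

```python
# A sketch of the monthly drift audit: permissions granted in the policy file
# versus tools actually exercised in the last 30 days of audit logs.
import json
from datetime import datetime, timedelta, timezone


def unused_permissions(policy: dict, log_path: str, days: int = 30) -> set[str]:
    granted = {t["name"] for t in policy["tools"]["allowed"]}
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    exercised = set()
    with open(log_path) as f:
        for line in f:
            record = json.loads(line)
            when = datetime.fromtimestamp(record["timestamp"], tz=timezone.utc)
            if when >= cutoff:
                exercised.add(record["tool"])
    # Anything granted but never exercised is a candidate for removal.
    return granted - exercised
```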
The Other Failure Mode: Approval Fatigue
Over-permissioning kills production. Over-gating kills the automation. Both end the same way.
The symmetric failure is equally real. Teams that gate every agent action quickly discover that humans stop reviewing them. Approval fatigue sets in fast. When every minor action requires a human decision, the operational overhead erodes the value of automation until humans begin auto-approving without actually reading. The gate is still running. It has stopped enforcing anything.
The calibrated approach treats autonomy as a graduated trust model — earned through demonstrated operational track record, not assumed at the start.[9] Every new workflow starts at the most conservative level. Graduation to more autonomy is explicit and review-driven, never automatic. The thresholds below are guidelines, not rules. Your operational context and risk tolerance set what evidence is sufficient to justify graduation.
| Stage | Trigger | Autonomy Mode |
|---|---|---|
| New workflow | Default start | Gate every action. Agent proposes, human approves before execution. |
| Established workflow | 200+ clean completions, zero incidents | Exception-based. Agent acts. Escalates only on uncertainty or policy boundary. |
| Mature workflow | Low-risk, high-volume, decision criteria well-understood | Audit-based. Agent acts. Humans review on a schedule, not in the path. |
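One way to keep graduation honest is to compute only eligibility from the track record and leave the graduation itself as an explicit human decision. A minimal sketch, with the 200-completion threshold from the table above treated as a guideline rather than a rule:

```python
# A sketch of graduated autonomy. The function reports what a workflow is
# eligible for; moving a workflow up a level remains a reviewed, human decision.
from enum import Enum


class AutonomyMode(Enum):
    GATE_EVERY_ACTION = "gate_every_action"   # default for every new workflow
    EXCEPTION_BASED = "exception_based"       # agent acts, escalates on uncertainty
    AUDIT_BASED = "audit_based"               # agent acts, humans review on a schedule


def eligible_mode(clean_completions: int, incidents: int, reviewed_as_mature: bool) -> AutonomyMode:
    if reviewed_as_mature:
        # Audit-based operation is only reachable through an explicit review.
        return AutonomyMode.AUDIT_BASED
    if clean_completions >= 200 and incidents == 0:
        return AutonomyMode.EXCEPTION_BASED
    return AutonomyMode.GATE_EVERY_ACTION
```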
The Test That Separates Real Architecture From Theater
One question. If you can answer 'no,' you have enforcement. If you cannot, you have decoration.
The practical test for whether a guardrail is real:
Can a crafted user input cause the agent to violate it?
If the answer is yes, it is a convention, not a constraint. Prompt instructions can be overridden.[6] Configuration files that agents read can be manipulated through prompt injection. The only constraints that hold under adversarial conditions live outside the agent's reasoning context: enforcement at the tool call layer, network-level blocks on unauthorized egress, IAM restrictions on what credentials can actually do.
Teams build elaborate system prompts with detailed safety instructions, then watch a single adversarial input route around every one of them in one exchange. The instructions were real. The enforcement was not.
The distinction matters most for irreversible actions. Sending external communications, deleting data, modifying production infrastructure — anything that cannot be rolled back needs enforcement that lives outside the prompt. Human approval gates are one mechanism. Policy engines that block the tool call before execution are another. Both are infrastructure. Neither is a sentence in a system prompt.
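The test is mechanical enough to automate. A minimal sketch using pytest, assuming the `enforce` helper and `PolicyViolation` error from the enforcement sketch earlier are packaged as a module, and that `support_agent_policy` is a fixture loading the agent's policy file; whatever a crafted input convinces the model to propose, the blocked call must never reach execution:

```python
# A sketch of the adversarial check for a blocked operation. A crafted input can
# only change what the model proposes; the proposal still passes through enforce,
# so the blocked list must hold regardless of prompt content.
import pytest

from policy import PolicyViolation, enforce  # hypothetical module wrapping the earlier sketch


def test_blocked_tool_never_executes(support_agent_policy):
    with pytest.raises(PolicyViolation):
        enforce(
            support_agent_policy,
            "delete_customer_record",
            {"customer_id": "any"},
            execute=lambda **kw: pytest.fail("blocked tool call reached execution"),
            enqueue_for_approval=lambda name, args: pytest.fail("blocked tool routed to approval"),
        )
```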
The models are capable. The infrastructure gap is where production incidents happen.
Pre-Production Safety Checklist for Agents
- Dedicated service account per agent — not a shared team credential
- Permission scope documented and limited to the agent's actual function
- Blocked operations list explicit and enforced at the infrastructure layer, not in the prompt
- Policy enforcement point intercepts every tool call at runtime, before execution
- Execution bounds configured — max steps, max wall-clock time, max cost
- Approval gates in place for every irreversible action
- Audit log captures tool name, inputs, outputs, identity context, approval status
- Kill switch implemented and tested — not just implemented
- Permission scope audited after development, before production deployment
- Escalation path names specific contacts, not just a Slack channel
Doesn't a good system prompt handle most safety requirements?
System prompts shape model behavior. They do not enforce it. A 'never delete production data' instruction can be overridden by a crafted input that convinces the model the rule does not apply in the current context. Infrastructure-level controls — policy engines that block the tool call before it executes — cannot be prompted away. Use prompts for behavioral guidance. Use infrastructure for enforcement. The Kiro incident is the clean example: the model had no instruction to avoid deleting production. It had permission to do so.
How do we avoid approval gate fatigue?
Classify actions by reversibility and risk, not by frequency. Read-only and easily reversible actions run without approval. External communications, production data mutations, and all deletions require approval. New workflows start with gates on everything. Specific action types graduate to exception-based escalation only after 200+ clean completions. Graduation is explicit, reviewed, scoped to the action type that earned the track record — not the agent as a whole.
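A minimal sketch of that classification, with tool names taken from the config example earlier; the lists are illustrative, and the mapping belongs in code reviewed alongside the policy file, not in the prompt:

```python
# A sketch of approval routing by reversibility and risk rather than frequency.
REVERSIBLE_READS = {"read_customer_record"}
REVERSIBLE_WRITES = {"update_order_status"}      # undone by a follow-up write
IRREVERSIBLE = {"send_customer_email", "delete_customer_record", "bulk_send_email"}


def requires_human_approval(tool_name: str, workflow_graduated: bool) -> bool:
    if tool_name in IRREVERSIBLE:
        return True          # always gated, regardless of track record
    if tool_name in REVERSIBLE_READS:
        return False         # read-only actions run without approval
    # Reversible writes graduate to exception-based escalation only after review.
    return not workflow_graduated
```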
What is the minimum viable safety stack for a first production agent?
Four things. A dedicated scoped identity. A list of explicitly blocked operations enforced at the infrastructure layer, not in the prompt. Execution bounds — step cap, time deadline, cost ceiling. An audit log that captures every tool call with inputs and outputs. On top of those four, a human approval gate on every irreversible action is not optional. It is the one control that catches model reasoning errors that no other layer will catch before they execute. Everything else can be added incrementally.
How does permission drift happen and how do we stop it?
Permissions are added during development to unblock engineers. Delete permissions for a testing scenario. Read-write granted for a one-off task. Search tools added and never removed. By production, the agent has accumulated access nobody explicitly chose to grant for its actual function. The fix is a pre-deployment permission audit: compare every granted permission against actual tool call logs from development. Remove anything unused. Repeat monthly, comparing against the previous 30 days of production logs. Drift is the default. The audit is the only thing that reverses it.
Infrastructure does not make headlines the way model quality does. Benchmark improvements get announced at conferences. Guardrails do not. But the incidents that make engineering leaders lose sleep — and customers lose access — are almost never about model capability. They are about what the model was permitted to do.
The agents that ship and stay shipped are not the ones running on the newest models. They are the ones running inside infrastructure that makes dangerous actions hard, requires explicit approval for irreversible steps, and leaves an auditable record of every tool call. That is not a constraint on what agents can accomplish. It is the foundation that makes accomplishment durable.
You build the enforcement stack while the agents are running. There is no other way.
- [1] Amazon's AI Coding Tool Deleted a Live Server and Took AWS Down for 13 Hours (365i.co.uk)
- [2] Amazon Kiro AI Outage: When an AI Agent Deleted Production (ruh.ai)
- [3] When AI Agents Delete Production: Lessons from Amazon's Kiro Incident (particula.tech)
- [4] Dr. Sarah Chen — AI Agent Governance: Best Practices for Production Environments (harness-engineering.ai)
- [5] Agentic AI Risks and Challenges Enterprises Must Tackle (domino.ai)
- [6] Agentic AI Guardrails: Controls That Work (redis.io)
- [7] AI Agent Guardrails That Won't Slow Your Team Down (hatchworks.com)
- [8] AI Agents in Production: Infrastructure Patterns for Reliable Agentic Systems (resiliotech.com)
- [9] AI Agents in Production: Patterns That Work (automationswitch.com)