Agent Infrastructure Safety: The Enforcement Stack That Ships

The Model Isn't What Fails in Production. The Permissions Are.

Amazon's Kiro deleted production in December 2025. The model didn't malfunction — it executed inside the permissions it had been given. The fix is not a better model. It's an enforcement stack the prompt cannot override. Four layers, executable constraints, no theater.

AI Engineering Platform · Advanced · May 8, 2026 · 5 min read

By Viktor Bezdek · VP Engineering, Groupon

In December 2025, Amazon's Kiro coding agent deleted an AWS Cost Explorer production environment. Thirteen hours of outage.^[1] The model did not malfunction. It analyzed the objective, picked the most direct path, and executed using the permissions it had been given — full write access to production infrastructure.^[2] The two-person approval gate that protected human-driven changes was not part of the agent's authorization path. The deletion completed faster than any human could read the confirmation prompt.

Engineering teams spend months tuning model quality. Better prompts. Newer versions. More elaborate reasoning chains. Meanwhile, documented production agent failures keep tracing back to the same place: infrastructure. Permissions scoped too broadly. Guardrails written into system prompts. Monitoring that watches outcomes after the fact instead of actions in flight.

Model improvement does not fix this. A better model with the same permissions makes the same mistake faster. The fix is an enforcement stack — executable constraints that hold under adversarial inputs, that survive prompt injection, that the agent cannot reason its way around because they live outside its reasoning context entirely. This is the infrastructure layer. It is the only layer that ships.

13 hrs

AWS outage caused by Kiro deleting production, December 2025

Faster than any human could read the confirmation prompt

40%+

Agentic AI projects Gartner forecasts will be cancelled by the end of 2027

Cited drivers: escalating costs, unclear business value, inadequate risk controls (Gartner, 2025)

4 layers

Enforcement layers between an agent and your production systems

Identity, policy enforcement, bounded execution, audit with replay fidelity

When the Model Works and Production Still Burns

Model quality and infrastructure safety solve different failure classes. Confusing them is how production goes down.

The Kiro incident is not a story about a bad model. It is a story about a structural gap. The agent did exactly what a capable agent should do — analyzed the objective, identified the most direct path, executed. The gap was between what the agent was supposed to do and what it was permitted to do. Those two were never reconciled.^[3] A safeguard that existed for human engineers was simply not in the authorization path for AI agents.

The same pattern recurs across documented production incidents. An agent with read-write access to customer records applies a bulk operation meant for test data to live records. An automation agent with deployment permissions ships during a code freeze. A support agent authorized to send emails misreads an edge case and messages an entire distribution list instead of one contact.

In every case the model reasoned toward something plausible. The infrastructure handed it the tools to act before anything could catch the error. A newer model with identical permissions would make the same mistake, or a more sophisticated version of it. Gartner forecasts that more than 40% of agentic AI projects will be cancelled by the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls.^[5] Model quality is not on that list. Inadequate risk controls is another name for the gap between what agents are permitted to do and what they should be permitted to do.

Theater

Safety rules written into the system prompt
"Never delete production" is a sentence, not a block
Agent inherits its developer's permissions and ships unchanged
Guardrails tested only on the happy path
Monitoring confirms outcomes after the action completed

Enforcement

A policy enforcement point intercepts every tool call before it executes
Deletion of production resources is blocked at IAM, not advised in the prompt
Agent permissions are scoped to its function and audited before deployment
Guardrails tested against adversarial inputs and the failure cases that actually break things
Monitoring captures every action in flight, every decision logged

Four Layers. Each Catches a Different Failure Class.

Each layer absorbs what the layer above it lets through. None of them is optional.

Agent safety is not a guardrail. It is a stack of four control surfaces, each designed against a specific failure class. Deployed together, they survive adversarial inputs, developer mistakes, and the edge-case reasoning that even capable models will produce. Deployed separately, they create the appearance of safety while leaving production exposed.

Agent Enforcement Stack: Four Layers Between Request and Production

Tool calls hit the policy enforcement point before execution. Irreversible actions divert to a human approval gate. Audit logs capture every call — approved, blocked, or escalated.

Layer 1: Scoped Identity and Credentials
Every agent run authenticates with a dedicated service account scoped to that agent's function. Not the engineer's credentials. Not a shared team API key. The question 'which identity executed this action?' must always have a specific, auditable answer. Agents that inherit broad developer permissions during prototyping carry that access into production unchanged. Per-tool scoped credentials keep the blast radius of a misbehaving or compromised agent limited to the tools it was granted, rather than your entire system.^[7]
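To make per-tool scoping concrete, here is a minimal Python sketch of a credential broker that mints short-lived tokens valid for exactly one identity and one tool. Everything in it is hypothetical: `BROKER_KEY`, `AGENT_TOOL_GRANTS`, and the token format are illustrative, and in a real deployment your cloud IAM or secrets manager would play the broker's role. What matters is the shape: the agent never holds a credential broader than one tool for a few minutes, and the tool backend checks that scope itself.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical signing key; in practice it lives in a KMS or secrets manager
# that the broker can use and the agent cannot read.
BROKER_KEY = b"replace-with-a-managed-secret"

# Hypothetical grants: the only tools each agent identity may get credentials for.
AGENT_TOOL_GRANTS = {
    "support-agent@myco.iam": {"read_customer_record", "update_order_status"},
}


def mint_tool_credential(identity: str, tool: str, ttl_seconds: int = 300) -> str:
    """Issue a short-lived token valid for exactly one (identity, tool) pair."""
    if tool not in AGENT_TOOL_GRANTS.get(identity, set()):
        raise PermissionError(f"{identity} has no grant for {tool}")
    claims = {"sub": identity, "tool": tool, "exp": int(time.time()) + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(BROKER_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def verify_tool_credential(token: str, tool: str) -> str:
    """Run by the tool backend: return the acting identity or refuse the call."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(BROKER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise PermissionError("signature mismatch")
    claims = json.loads(base64.urlsafe_b64decode(payload.encode()))
    if claims["tool"] != tool:
        raise PermissionError(f"token is scoped to {claims['tool']}, not {tool}")
    if claims["exp"] < time.time():
        raise PermissionError("token expired")
    return claims["sub"]
```

Because the backend checks the tool claim on its own, an agent that replays a read credential against a write tool is refused outside the model's reasoning context, and the returned identity answers 'which identity executed this action?' for the audit log.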
Layer 2: Policy Enforcement at the Tool Call Layer
A policy enforcement point intercepts every tool call before execution. It evaluates the proposed action against a rule set that lives in code, not in the system prompt. Allowed actions proceed. Blocked actions return a structured error. High-risk or irreversible actions route to a human approval queue. The rule set is declarative, enforceable at runtime, version-controlled alongside deployment configuration.^[6] If it lives in the prompt, it is not enforcement.
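A policy enforcement point can be as small as one function that runs in the orchestrator or tool gateway, after the model proposes a call and before anything executes. The sketch below is illustrative rather than any specific product's API: the rule set is a Python dict mirroring the agent-policy.yaml example in this piece, and the `record_ids` and `status` argument names are assumptions about what the tools accept.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"        # returned to the agent as a structured error
    ESCALATE = "escalate"  # routed to the human approval queue


@dataclass(frozen=True)
class Verdict:
    decision: Decision
    reason: str = ""


# Rule set mirroring agent-policy.yaml; in practice loaded from version control.
POLICY = {
    "allowed": {
        "read_customer_record": {},
        "update_order_status": {
            "max_records_per_run": 10,
            "allowed_statuses": {"cancelled", "refunded", "processing"},
        },
        "send_customer_email": {"requires_approval": True},
    },
    "blocked": {"delete_customer_record", "access_payment_raw", "bulk_send_email"},
}


def evaluate(tool: str, args: dict, records_touched_this_run: int = 0) -> Verdict:
    """Decide on a proposed tool call before it executes. Unknown tools are denied."""
    if tool in POLICY["blocked"]:
        return Verdict(Decision.BLOCK, f"{tool} is explicitly blocked")
    rule = POLICY["allowed"].get(tool)
    if rule is None:
        return Verdict(Decision.BLOCK, f"{tool} is not on the allow list")
    limit = rule.get("max_records_per_run")
    if limit is not None:
        # Hypothetical argument name: the records this call would modify.
        if records_touched_this_run + len(args.get("record_ids", [])) > limit:
            return Verdict(Decision.BLOCK, f"would exceed {limit} records this run")
    statuses = rule.get("allowed_statuses")
    if statuses is not None and args.get("status") not in statuses:
        return Verdict(Decision.BLOCK, f"status {args.get('status')!r} is not permitted")
    if rule.get("requires_approval"):
        return Verdict(Decision.ESCALATE, f"{tool} requires human approval")
    return Verdict(Decision.ALLOW)
```

The design choice worth copying is default-deny: a tool missing from the allow list is blocked, so a newly added tool does nothing until someone writes a rule for it. Nothing the model says can change the verdict, because the function only sees the proposed call.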
Layer 3: Bounded Execution
Every agent run carries explicit ceilings: a step cap, a wall-clock deadline, a cost budget. When any ceiling is hit, the orchestrator halts and escalates. It does not retry. Infinite loops and runaway executions are the most expensive failure mode in agentic systems because they compound silently until someone checks the billing dashboard.^[8] That makes cost an observability signal.
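Bounded execution amounts to a budget object the orchestrator consults before every step. This is a minimal sketch with assumed names (`RunBudget`, `run_agent`, and a `step_fn` that returns whether the run is done plus that step's cost in USD); the default ceilings mirror the execution block in agent-policy.yaml.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Signals the orchestrator to halt and escalate, never to retry."""


@dataclass
class RunBudget:
    max_steps: int = 25
    max_wall_clock_seconds: float = 120.0
    max_cost_usd: float = 2.00
    steps: int = 0
    cost_usd: float = 0.0
    started: float = field(default_factory=time.monotonic)

    def check(self) -> None:
        """Call before each step: refuse to start once any ceiling is reached."""
        if self.steps >= self.max_steps:
            raise BudgetExceeded(f"step cap of {self.max_steps} reached")
        if time.monotonic() - self.started >= self.max_wall_clock_seconds:
            raise BudgetExceeded(f"deadline of {self.max_wall_clock_seconds}s reached")
        if self.cost_usd >= self.max_cost_usd:
            raise BudgetExceeded(f"cost ceiling of ${self.max_cost_usd:.2f} reached")

    def record(self, step_cost_usd: float) -> None:
        self.steps += 1
        self.cost_usd += step_cost_usd


def run_agent(step_fn, budget: RunBudget, escalate) -> str:
    """Drive an agent loop under hard ceilings; step_fn returns (done, cost_usd)."""
    try:
        while True:
            budget.check()
            done, cost = step_fn()
            budget.record(cost)
            if done:
                return "completed"
    except BudgetExceeded as exc:
        escalate(str(exc))  # e.g. notify on-call; the run stays halted
        return "halted"
```

Because the check runs before each step, the step cap is exact, but a single expensive step can still overshoot the cost ceiling by its own cost. A stricter variant would estimate cost before making the call. Either way, `BudgetExceeded` ends the loop and hands off to a human; nothing retries.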
Layer 4: Audit Trails with Replay Fidelity
Logging that records 'agent succeeded' is not observability. It is an alibi. Effective audit trails capture every tool call, the inputs and outputs at the moment of execution, the identity context, and whether the action was approved, blocked, or escalated. The standard to meet: given any production incident from the past 90 days, you can reconstruct exactly what the agent did, in what order, with what authorization.^[4] If you cannot replay it, you cannot debug it.
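One way to get replay fidelity is an append-only log in which each record stores the tool name, inputs, outputs, acting identity, and decision, plus the hash of the previous record. The hash chaining is an addition for tamper evidence, not something this piece prescribes, and the class and field names are hypothetical. A production version would write to storage that the agent's identity cannot modify.

```python
import hashlib
import json
import time
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    seq: int
    timestamp: float
    run_id: str
    identity: str          # the service account that executed the call
    tool: str
    inputs: dict
    outputs: Optional[dict]
    decision: str          # "approved", "blocked", or "escalated"
    prev_hash: str

    def digest(self) -> str:
        body = json.dumps(asdict(self), sort_keys=True, default=str)
        return hashlib.sha256(body.encode()).hexdigest()


class AuditLog:
    """Append-only and hash-chained: editing any record but the newest breaks the chain."""

    def __init__(self) -> None:
        self.records: list[AuditRecord] = []

    def append(self, run_id, identity, tool, inputs, outputs, decision) -> AuditRecord:
        prev = self.records[-1].digest() if self.records else "genesis"
        record = AuditRecord(len(self.records), time.time(), run_id, identity,
                             tool, inputs, outputs, decision, prev)
        self.records.append(record)
        return record

    def verify(self) -> bool:
        prev = "genesis"
        for record in self.records:
            if record.prev_hash != prev:
                return False
            prev = record.digest()
        return True

    def replay(self, run_id: str) -> list[tuple[int, str, str, str]]:
        """What one run did, in order: (seq, identity, tool, decision)."""
        return [(r.seq, r.identity, r.tool, r.decision)
                for r in self.records if r.run_id == run_id]
```

The chain only proves history was not rewritten if the latest digest is also anchored somewhere the agent cannot write; otherwise the newest record could be altered undetected.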

agent-policy.yaml

# One file per agent role. Permissions in code, not in the prompt.
agent:
  id: "support-agent-v2"
  identity: "support-agent@myco.iam"

tools:
  allowed:
    - name: "read_customer_record"
      scope: ["customer_id", "order_history", "contact_info"]
    - name: "update_order_status"
      constraints:
        max_records_per_run: 10
        allowed_statuses: ["cancelled", "refunded", "processing"]
    - name: "send_customer_email"
      requires_approval: true
      approval_timeout_seconds: 300
      auto_deny_on_timeout: true
  blocked:
    - "delete_customer_record"
    - "access_payment_raw"
    - "bulk_send_email"

execution:
  max_steps: 25
  max_tokens_per_run: 50000
  max_wall_clock_seconds: 120
  max_cost_usd: 2.00
  on_budget_exceeded:
    action: "halt_and_escalate"
    notify: "#oncall-agents"

Permission Drift: The Default State of Any Agent Without an Owner

The pattern that shows up in almost every production agent incident review.

In those reviews, the root cause keeps repeating: the agent's permission scope was set during development, never formally audited before deployment, and significantly broader than its actual operational requirements.

During development, engineers add tools to unblock themselves. The agent needs to search — add the search tool. The agent needs to write to staging — grant write permissions. The agent needs to test a deletion flow — grant delete permissions temporarily. Development ends. Hardening begins. The permissions never get cleaned up because nobody explicitly owns the cleanup. Temporary becomes permanent through inaction.

By production, the agent has accumulated access across a wide swath of internal API surface it was never meant to touch.^[4]

Teams that catch this run monthly automated audits: every permission granted to an agent gets compared against the tool call logs from the previous 30 days. Anything not exercised is a candidate for removal. This is not just security hygiene. Agents with tighter permission scopes have smaller failure surfaces when the model reasons toward an edge case it was not designed for. The audit also surfaces tools that were added during exploration but never actually needed in production — drift that exists because someone was trying things, not because the agent's function requires them. Drift is the default state of any system without an explicit owner.
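The monthly audit reduces to a set difference between what was granted and what was exercised in the window. A minimal sketch, assuming tool call logs are available as `(timestamp, tool_name)` pairs with timezone-aware timestamps; `unused_permissions` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def unused_permissions(granted: set[str],
                       call_log: Iterable[tuple[datetime, str]],
                       now: Optional[datetime] = None,
                       window_days: int = 30) -> set[str]:
    """Tools granted to the agent but not called in the last `window_days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=window_days)
    exercised = {tool for called_at, tool in call_log if called_at >= cutoff}
    return granted - exercised  # removal candidates for a human owner to confirm
```

The output is a list of candidates, not an automatic revocation: a tool that legitimately runs once a quarter will look unused in a 30-day window, so a named owner confirms each removal.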

The Other Failure Mode: Approval Fatigue

Over-permissioning kills production. Over-gating kills the automation. Both end the same way.

The symmetric failure is equally real. Teams that gate every agent action quickly discover that humans stop reviewing them. Approval fatigue sets in fast. When every minor action requires a human decision, the operational overhead erodes the value of automation until humans begin auto-approving without actually reading. The gate is still running. It has stopped enforcing anything.

The calibrated approach treats autonomy as a graduated trust model — earned through demonstrated operational track record, not assumed at the start.^[9] Every new workflow starts at the most conservative level. Graduation to more autonomy is explicit and review-driven, never automatic. The thresholds below are guidelines, not rules. Your operational context and risk tolerance set what evidence is sufficient to justify graduation.

Stage	Trigger	Autonomy Mode
New workflow	Default start	Gate every action. Agent proposes, human approves before execution.
Established workflow	200+ clean completions, zero incidents	Exception-based. Agent acts. Escalates only on uncertainty or policy boundary.
Mature workflow	Low-risk, high-volume, decision criteria well-understood	Audit-based. Agent acts. Humans review on a schedule, not in the path.
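The graduated trust model can be encoded so that evidence only makes an action type eligible for more autonomy, while the promotion itself still needs a named reviewer. In this sketch the 200-completion threshold is the table's guideline exposed as a parameter, `low_risk_high_volume` stands in for the table's qualitative mature-stage criteria, and all names are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Mode(IntEnum):
    GATE_EVERY_ACTION = 0  # new workflow: agent proposes, human approves
    EXCEPTION_BASED = 1    # established: agent acts, escalates on uncertainty
    AUDIT_BASED = 2        # mature: humans review on a schedule


@dataclass
class TrackRecord:
    action_type: str             # trust is tracked per action type, not per agent
    clean_completions: int
    incidents: int
    low_risk_high_volume: bool   # a human judgment recorded here, not inferred


def eligible_mode(record: TrackRecord, min_clean: int = 200) -> Mode:
    """The highest mode the evidence supports. Eligibility alone grants nothing."""
    if record.incidents > 0 or record.clean_completions < min_clean:
        return Mode.GATE_EVERY_ACTION
    if record.low_risk_high_volume:
        return Mode.AUDIT_BASED
    return Mode.EXCEPTION_BASED


def promote(current: Mode, record: TrackRecord, reviewer: Optional[str]) -> Mode:
    """Move up at most one stage, and only with a named reviewer's sign-off."""
    target = eligible_mode(record)
    if reviewer is None or target <= current:
        return current
    return Mode(current + 1)
```

Promotion moves one stage at a time even when the evidence would justify more, which keeps graduation explicit and review-driven rather than automatic.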

The Test That Separates Real Architecture From Theater

One question. If you can answer 'no,' you have enforcement. If you cannot, you have decoration.

The practical test for whether a guardrail is real:

Can a crafted user input cause the agent to violate it?

If the answer is yes, it is a convention, not a constraint. Prompt instructions can be overridden.^[6] Configuration files that agents read can be manipulated through prompt injection. The only constraints that hold under adversarial conditions live outside the agent's reasoning context: enforcement at the tool call layer, network-level blocks on unauthorized egress, IAM restrictions on what credentials can actually do.

Teams build elaborate system prompts with detailed safety instructions, then watch a single adversarial input route around every one of them in one exchange. The instructions were real. The enforcement was not.

The distinction matters most for irreversible actions. Sending external communications, deleting data, modifying production infrastructure — anything that cannot be rolled back needs enforcement that lives outside the prompt. Human approval gates are one mechanism. Policy engines that block the tool call before execution are another. Both are infrastructure. Neither is a sentence in a system prompt.
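A human approval gate is infrastructure in the same sense: the irreversible call blocks until a decision arrives from outside the agent. This minimal sketch uses Python's standard `queue` module and mirrors the `approval_timeout_seconds` and `auto_deny_on_timeout` settings from agent-policy.yaml; `ApprovalGate` and its methods are hypothetical names, and the reviewer side (chat bot, UI, CLI) is left out.

```python
import queue
import threading


class ApprovalGate:
    """Holds an irreversible tool call until a human decides. Silence means no."""

    def __init__(self, timeout_seconds: float = 300.0) -> None:
        self.timeout_seconds = timeout_seconds  # approval_timeout_seconds
        self._pending: dict[str, queue.Queue] = {}
        self._lock = threading.Lock()

    def submit(self, request_id: str) -> None:
        with self._lock:
            self._pending[request_id] = queue.Queue(maxsize=1)

    def decide(self, request_id: str, approver: str, approved: bool) -> None:
        """Called from the reviewer's side; late or unknown requests are rejected."""
        with self._lock:
            pending = self._pending.get(request_id)
        if pending is None:
            raise KeyError(f"no open request {request_id!r}")
        pending.put_nowait((approver, approved))  # a second decision raises queue.Full

    def wait(self, request_id: str) -> bool:
        with self._lock:
            pending = self._pending[request_id]
        try:
            _approver, approved = pending.get(timeout=self.timeout_seconds)
            # In practice, write _approver to the audit log alongside the decision.
        except queue.Empty:
            approved = False  # auto_deny_on_timeout
        finally:
            with self._lock:
                self._pending.pop(request_id, None)
        return approved
```

Silence is treated as a denial, and a decision that arrives after the timeout raises instead of reviving the request, so a late click cannot execute a stale action.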

The models are capable. The infrastructure gap is where production incidents happen.

Pre-Production Safety Checklist for Agents

Dedicated service account per agent — not a shared team credential
Permission scope documented and limited to the agent's actual function
Blocked operations list explicit and enforced at the infrastructure layer, not in the prompt
Policy enforcement point intercepts every tool call at runtime, before execution
Execution bounds configured — max steps, max wall-clock time, max cost
Approval gates in place for every irreversible action
Audit log captures tool name, inputs, outputs, identity context, approval status
Kill switch implemented and tested — not just implemented
Permission scope audited after development, before production deployment
Escalation path names specific contacts, not just a Slack channel
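Some of these items can be verified mechanically before deployment. The sketch below checks an agent policy, already parsed into a Python dict shaped like agent-policy.yaml, for a dedicated identity, a non-empty blocked list, execution bounds, approval on irreversible tools, and named escalation contacts. `SHARED_IDENTITIES` and the `contacts` field are hypothetical additions, and which tools count as irreversible is supplied by the caller. Items such as the kill switch and audit log coverage still need a real test, not a config check.

```python
# Hypothetical: identities known to be shared team credentials.
SHARED_IDENTITIES = {"team-bot@myco.iam", "ci-deployer@myco.iam"}


def preflight(policy: dict, irreversible_tools: set[str]) -> list[str]:
    """Check one parsed agent policy against the checklist; [] means it passes."""
    problems: list[str] = []

    identity = policy.get("agent", {}).get("identity", "")
    if not identity or identity in SHARED_IDENTITIES:
        problems.append("identity: needs a dedicated service account")

    tools = policy.get("tools", {})
    if not tools.get("blocked"):
        problems.append("tools: blocked operations list is empty")
    for tool in tools.get("allowed", []):
        if tool["name"] in irreversible_tools and not tool.get("requires_approval"):
            problems.append(f"tools: {tool['name']} is irreversible but not approval-gated")

    execution = policy.get("execution", {})
    for bound in ("max_steps", "max_wall_clock_seconds", "max_cost_usd"):
        if not execution.get(bound, 0) > 0:
            problems.append(f"execution: {bound} is missing or not positive")

    # Hypothetical field: named people, not just a channel like "#oncall-agents".
    if not execution.get("on_budget_exceeded", {}).get("contacts"):
        problems.append("escalation: no specific contacts named")
    return problems
```

Run against the example policy in this piece with `send_customer_email` marked irreversible, the contacts check is the only one that fails, since `notify` names only `#oncall-agents`.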

Doesn't a good system prompt handle most safety requirements?

System prompts shape model behavior. They do not enforce it. A 'never delete production data' instruction can be overridden by a crafted input that convinces the model the rule does not apply in the current context. Infrastructure-level controls — policy engines that block the tool call before it executes — cannot be prompted away. Use prompts for behavioral guidance. Use infrastructure for enforcement. The Kiro incident is the clean example: the model had no instruction to avoid deleting production. It had permission to do so.

How do we avoid approval gate fatigue?

Classify actions by reversibility and risk, not by frequency. Read-only and easily reversible actions run without approval. External communications, production data mutations, and all deletions require approval. New workflows start with gates on everything. Specific action types graduate to exception-based escalation only after 200+ clean completions. Graduation is explicit, reviewed, and scoped to the action type that earned the track record, not the agent as a whole.
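Classifying by reversibility and risk rather than frequency can be expressed as a small decision function. This is a minimal sketch with hypothetical names. It takes the strict reading in which the high-risk tier never graduates out of approval, treats unclassified actions as high risk by default, and assumes `graduated` is tracked per action type, as the answer above recommends.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    LOW = "low"    # read-only or easily reversible
    HIGH = "high"  # external communication, production data mutation, any deletion


@dataclass(frozen=True)
class Action:
    name: str
    read_only: bool = False
    easily_reversible: bool = False
    deletes: bool = False
    external: bool = False
    mutates_production: bool = False


def classify(action: Action) -> Risk:
    """Classify by reversibility and blast radius, never by how often it runs."""
    if action.deletes or action.external or action.mutates_production:
        return Risk.HIGH
    if action.read_only or action.easily_reversible:
        return Risk.LOW
    return Risk.HIGH  # unclassified actions default to the conservative tier


def requires_approval(action: Action, graduated: bool) -> bool:
    """HIGH always needs a human; LOW only until its action type has graduated."""
    return classify(action) is Risk.HIGH or not graduated
```

Frequency never appears as an input, so a high-volume action does not earn a lighter gate just because it runs often.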

What is the minimum viable safety stack for a first production agent?

Four things, plus one that is not optional. A dedicated scoped identity. A list of explicitly blocked operations enforced at the infrastructure layer, not in the prompt. Execution bounds: step cap, time deadline, cost ceiling. An audit log that captures every tool call with inputs and outputs. And the non-optional one: a human approval gate on every irreversible action. It is the one control that catches model reasoning errors that no other layer will catch before they execute. Everything else can be added incrementally.

How does permission drift happen and how do we stop it?

Permissions are added during development to unblock engineers. Delete permissions for a testing scenario. Read-write granted for a one-off task. Search tools added and never removed. By production, the agent has accumulated access nobody explicitly chose to grant for its actual function. The fix is a pre-deployment permission audit: compare every granted permission against actual tool call logs from development. Remove anything unused. Repeat monthly, comparing against the previous 30 days of production logs. Drift is the default. The audit is the only thing that reverses it.

Infrastructure does not make headlines the way model quality does. Benchmark improvements get announced at conferences. Guardrails do not. But the incidents that make engineering leaders lose sleep — and customers lose access — are almost never about model capability. They are about what the model was permitted to do.

The agents that ship and stay shipped are not the ones running on the newest models. They are the ones running inside infrastructure that makes dangerous actions hard, requires explicit approval for irreversible steps, and leaves an auditable record of every tool call. That is not a constraint on what agents can accomplish. It is the foundation that makes accomplishment durable.

You build the enforcement stack while the agents are running. There is no other way.


Sources

[1] Amazon's AI Coding Tool Deleted a Live Server and Took AWS Down for 13 Hours (365i.co.uk)
[2] Amazon Kiro AI Outage: When an AI Agent Deleted Production (ruh.ai)
[3] When AI Agents Delete Production: Lessons from Amazon's Kiro Incident (particula.tech)
[4] Dr. Sarah Chen, AI Agent Governance: Best Practices for Production Environments (harness-engineering.ai)
[5] Agentic AI Risks and Challenges Enterprises Must Tackle (domino.ai)
[6] Agentic AI Guardrails: Controls That Work (redis.io)
[7] AI Agent Guardrails That Won't Slow Your Team Down (hatchworks.com)
[8] AI Agents in Production: Infrastructure Patterns for Reliable Agentic Systems (resiliotech.com)
[9] AI Agents in Production: Patterns That Work (automationswitch.com)
