Build the Cage: Five Enforcement Layers for Agentic AI

Build the Cage Before You Need It: Five Layers Between Your Agent and a Catastrophe

Five enforcement layers anchored to documented production incidents. Permission scoping, dry-run gates, deletion protection, blast radius scoring, and audit trails the agent cannot reach. Built before you need them, not after the first escape.

Strategy & Operating Model · Intermediate · Nov 19, 2025 · 6 min read

By Viktor Bezdek · VP Engineering, Groupon

In July 2025, an engineer ran a routine change through an agent-assisted coding tool during an active code freeze. The agent deleted the production database. Not staging. Not a test instance. The live database paying customers were sitting on top of. The post-mortem called it a "catastrophic failure."^[1]

This was not a thought experiment from a safety paper. The incident landed in the AI Incident Database^[2], got covered by Fortune, and chewed through engineering Slack channels for weeks. It was also not unique. By late 2025 the pattern was visible from a hundred feet up: agentic systems shipped into production with permission scopes that wildly exceeded the work they were doing, and with nothing meaningful between intent and execution.

The playbook here is not about slowing adoption. It is about the five enforcement layers that stop a helpful automation from becoming an expensive incident review. Each layer maps to a specific failure mode that has already happened to someone. Build them now or build them in the post-mortem.

64%

Of large companies took $1M+ losses from AI failures (EY survey). Real numbers swing hard by industry and headcount.

40%+

Agentic AI projects Gartner expects to be cancelled by 2027. Maturity of the operating org is the dominant variable.

5 layers

Enforcement surfaces between an agent and an irreversible action

3 tiers

Authorization model: read, write-confirm, write-auto

Every Pattern Here Comes From Something That Already Broke

Each layer is anchored to a documented incident. None of this is theoretical.

The Replit database deletion^[1] is the loudest example, not the only one. The AI Incident Database (Incident 1152)^[2] catalogs the same shape across stacks and industries:

An agent with broad Terraform permissions runs a configuration change that includes a destroy on a production database. The IaC tool executes it faithfully — exactly what the agent asked for. The execution was not the failure. The permission scope that allowed a routine task to cascade into a destructive operation was.

A deployment agent rolls back a failed canary, picks a target three versions back, and reintroduces a security vulnerability that was patched two releases ago. Nobody had estimated the blast radius of the rollback path itself.^[5]

A cleanup agent reads "older than 90 days" one way, the team meant another, and 18 months of audit logs under legal hold get deleted. No dry-run step ever showed what the operation would touch.

These are not failures of model intelligence. They are failures of architecture. The agents executed competently inside the permissions they had. The gap between agent capability and consequence containment is where the incidents live.

Five Layers. Each One Catches What the Last One Lets Through.

Permission scoping, dry-run gates, deletion protection, blast radius scoring, audit design.

The Enforcement Stack: Five Surfaces Between Request and Execution

Each layer is defense in depth. An action passes through all five before it touches a production resource — or gets denied and logged.

Layer 1: Permission Scoping. An Agent Cannot Break What It Cannot Touch.

The most-skipped layer, and the highest leverage. The shared admin service account is the production default.

Permission scoping means each agent runs with the minimum credentials its actual function requires. Nothing more. The reason this is the most-skipped layer is structural: nobody owns scope cleanup. One admin service account is faster to wire up than per-agent scoped credentials, and "we will tighten it later" never gets prioritized over the next feature.^[4]

The Replit incident happened in part because the agent had write access to production infrastructure during a code freeze.^[1] A scoped agent doing code review would have held read-only access on the codebase and zero infrastructure permissions. The deletion path simply would not have existed.

Scoping happens at three levels. Tool-level: which tools the agent can invoke at all. A documentation agent does not get deployment tools. Resource-level: which targets each tool can hit. A query agent gets read access to specific tables, not the whole database. Action-level: which operations are permitted. Read and list are near-zero risk. Create is moderate. Update and delete are high-risk and demand additional authorization paths.^[3]

Tier	Permission Level	Actions Allowed	Human Approval	Examples
Tier 1	Read-Only	Read, list, search, analyze	None required	Log analysis, code review, report generation
Tier 2	Write-with-Confirmation	Create, update (with preview)	Required before execution	Config changes, PR creation, ticket updates
Tier 3	Write-Autonomous	Create, update (within guardrails)	Post-hoc audit only	Formatting, dependency updates, test generation
NEVER	Destructive	Delete, drop, destroy, force-push	Always blocked for agents	Database drops, infrastructure teardown, data purge
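The three scoping levels and the tier table can be folded into one deny-by-default check. What follows is a minimal sketch under stated assumptions, not a reference implementation: `AgentScope`, `authorize`, the action names, and the example agent are all hypothetical, and a real system would enforce the same rules in IAM rather than in application code alone.

```typescript
// Hypothetical names throughout: a sketch of deny-by-default scoping, not a library API.
type Action = 'read' | 'list' | 'create' | 'update' | 'delete';
type Tier = 'read' | 'write-confirm' | 'write-auto';
type Decision = 'allow' | 'needs-approval' | 'blocked';

interface AgentScope {
  agentId: string;
  tier: Tier;
  tools: ReadonlySet<string>;                         // tool-level grants
  resources: ReadonlyMap<string, readonly string[]>;  // resource-level grants, keyed by tool
}

function authorize(scope: AgentScope, tool: string, resource: string, action: Action): Decision {
  // Action-level: destructive operations are blocked for every agent principal.
  if (action === 'delete') return 'blocked';
  // Tool-level: the agent cannot invoke tools it was never granted.
  if (!scope.tools.has(tool)) return 'blocked';
  // Resource-level: each tool is limited to an explicit target list.
  const targets = scope.resources.get(tool) ?? [];
  if (!targets.includes(resource)) return 'blocked';
  // Reads pass for any tier once the tool and resource checks succeed.
  if (action === 'read' || action === 'list') return 'allow';
  // Writes depend on the tier: Tier 1 cannot write, Tier 2 waits for a human.
  if (scope.tier === 'read') return 'blocked';
  return scope.tier === 'write-confirm' ? 'needs-approval' : 'allow';
}

// A code-review agent: read-only on one repository, no infrastructure tools at all.
const reviewAgent: AgentScope = {
  agentId: 'code-review',
  tier: 'read',
  tools: new Set(['repo']),
  resources: new Map([['repo', ['service-api']]]),
};
```

With this scope, a call such as `authorize(reviewAgent, 'terraform', 'prod-db', 'delete')` has no path to anything but `'blocked'`: the destructive action, the missing tool, and the missing resource each deny it independently.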

Layer 2: Dry-Run Gates. The Plan, Then the Apply.

Every write gets a preview. The gate is the human review — an unread plan is decoration, not enforcement.

A dry-run gate is a mandatory preview between proposal and execution. The agent generates a plan — files modified, records affected, infrastructure resources created or destroyed — and a human approves before anything runs.

Terraform got this right with terraform plan. Before any apply, you see exactly what will be created, modified, and destroyed. The bitter joke about Terraform-related agent incidents is that the plan step existed and was either skipped or auto-approved without anyone reading it.^[2] The gate is not the generation. The gate is the read.

For agentic systems, the dry-run gate is mandatory and non-bypassable for any operation classified Tier 2 or above. The plan must show:

What changes (specific files, records, resources)
How many entities are touched ("3 files" vs "4,200 database rows")
Whether the change is reversible — and if so, by what mechanism
What the rollback path actually looks like when something fails

dry-run-gate.ts

// Dry-run gate. Plan first, approve, then execute. No exceptions for Tier 2+.
interface DryRunResult {
  action: string;
  tier: 'read' | 'write-confirm' | 'write-auto';
  affectedResources: {
    type: string;
    count: number;
    identifiers: string[];
  }[];
  reversible: boolean;
  rollbackPlan?: string;
  estimatedBlastRadius: 'none' | 'low' | 'medium' | 'high' | 'critical';
  requiresApproval: boolean;
}

async function executeSafely(
  action: AgentAction,
  context: ExecutionContext
): Promise<ExecutionResult> {
  // Generate the plan before anything touches a real resource.
  const preview = await generateDryRun(action, context);

  // Critical blast radius escalates regardless of tier, before any approval prompt.
  if (preview.estimatedBlastRadius === 'critical') {
    return { status: 'escalated', reason: 'Blast radius exceeds threshold' };
  }

  // Approval gate. Tier 2 (write-confirm) blocks here.
  if (preview.requiresApproval) {
    const approval = await requestHumanApproval(preview);
    if (!approval.granted) {
      return { status: 'denied', reason: approval.reason };
    }
  }

  // Execute, audit, log every field. The audit log is the source of truth.
  return await executeWithAudit(action, preview, context);
}
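The gate only works if the plan is readable at a glance. As a rough illustration of the four items the plan must show, here is a sketch of a plain-text renderer. `PlanPreview` and `renderPlan` are hypothetical names; the shape is a trimmed-down copy of the `DryRunResult` fields above, and the cutoff of three listed identifiers is an arbitrary assumption.

```typescript
// Hypothetical, trimmed-down plan shape for illustration only.
interface PlanPreview {
  action: string;
  affectedResources: { type: string; count: number; identifiers: string[] }[];
  reversible: boolean;
  rollbackPlan?: string;
}

// Render the plan as text a reviewer can read in seconds.
function renderPlan(plan: PlanPreview, maxIds = 3): string {
  const lines: string[] = [`Action: ${plan.action}`];
  for (const r of plan.affectedResources) {
    // Show a few identifiers, then summarize the rest so large sets stay legible.
    const shown = r.identifiers.slice(0, maxIds).join(', ');
    const hidden = r.identifiers.length - maxIds;
    const more = hidden > 0 ? `, +${hidden} more` : '';
    lines.push(`  ${r.count} x ${r.type}: ${shown}${more}`);
  }
  // The total is what separates "3 files" from "4,200 database rows".
  const total = plan.affectedResources.reduce((n, r) => n + r.count, 0);
  lines.push(`Total entities touched: ${total}`);
  lines.push(`Reversible: ${plan.reversible ? 'yes' : 'NO'}`);
  lines.push(`Rollback: ${plan.rollbackPlan ?? 'none documented'}`);
  return lines.join('\n');
}
```

Printing "none documented" instead of omitting the rollback line is a deliberate choice in this sketch: a missing rollback path should be visible to the approver, not silent.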

Layer 3: Make Catastrophic Actions Physically Impossible

Below permissions, below approvals: hard infrastructure constraints that the agent cannot reason around.

Deletion protection is the last line — a hard infrastructure constraint that prevents destruction of critical resources regardless of who or what issues the call. This is not access control. This is making certain actions physically impossible without a separate, deliberate process executed by something other than the agent.

AWS ships deletion protection on RDS and DynamoDB, and termination protection on CloudFormation stacks. GCP has it for Cloud SQL and GKE. Terraform has prevent_destroy lifecycle rules. Every major cloud provider arrived at the same conclusion: some resources need protection that lives below access controls. Every production environment running agentic systems should turn these flags on.^[6]

The pipeline agent that deleted 18 months of audit logs would have failed at the storage layer if object lock and a retention policy had been enabled. The Terraform agent could not have destroyed the database with prevent_destroy = true set on the resource. These are five-minute configuration changes that close the door on million-dollar incidents.

Infrastructure Deletion Protection Checklist

Deletion protection enabled on every production database (RDS, Cloud SQL, equivalent)
prevent_destroy lifecycle set on every Terraform resource that cannot survive recreation
Object lock enabled on audit log buckets — retention period set to legal-hold floor
Production Kubernetes namespaces protected behind admission webhooks
Branch protection enforced on main and production-tracking branches
MFA delete enabled on every S3 bucket holding backups
Termination protection set on production EC2 instances and ECS services
Soft-delete with retention enabled on every data store an agent can reach
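A checklist like this drifts unless something checks it. One possible way to keep it honest is a small audit over a resource inventory, sketched below. Everything here is an assumption for illustration: `ResourceRecord`, the protection labels, and `findUnprotected` are hypothetical, and a real inventory would be populated from your cloud provider's APIs rather than written by hand.

```typescript
// Hypothetical inventory record; real data would come from cloud provider APIs.
interface ResourceRecord {
  id: string;
  kind: 'database' | 'bucket' | 'terraform-resource' | 'branch';
  environment: 'production' | 'staging';
  protections: ReadonlySet<string>; // labels for protections currently enabled
}

// Which protection each kind is expected to carry. Labels are illustrative.
const REQUIRED_PROTECTION: Record<ResourceRecord['kind'], string> = {
  database: 'deletion-protection',
  bucket: 'object-lock',
  'terraform-resource': 'prevent_destroy',
  branch: 'branch-protection',
};

// Return one finding per production resource that lacks its expected protection.
function findUnprotected(inventory: readonly ResourceRecord[]): string[] {
  return inventory
    .filter(r => r.environment === 'production')
    .filter(r => !r.protections.has(REQUIRED_PROTECTION[r.kind]))
    .map(r => `${r.id}: missing ${REQUIRED_PROTECTION[r.kind]}`);
}
```

Run on a schedule, a non-empty result is a reason to fail the pipeline or page an owner before an agent is granted access to that environment.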

Layer 4: Blast Radius. What Is the Worst Case Before You Run It?

Score the damage envelope before execution. Scope, reversibility, downstream coupling.

Blast radius scoring calculates the worst-case impact of an action before it runs. One question, answered before execution: if this fails or behaves outside spec, what is the maximum damage?^[5]

We got this wrong on a live system. An early version of our scoring labeled a config file change "low" — single file, fully versioned, clean rollback path. The file was read by seven services at startup. The change took all seven down on the next rolling restart. The blast radius of the file modification was low. The blast radius of its downstream effects was critical. Dependency mapping is not optional. We learned that on a Tuesday afternoon.

Three dimensions decide the score:

Scope — how many resources, records, or systems land in the affected set. An operation touching 5 files is smaller than one touching 5,000 database rows. An operation against one service is smaller than one against a shared database used by twelve. These thresholds are starting points; calibrate against your workload.

Reversibility — can it be undone, and at what cost. A file modification under version control is fully reversible. A DELETE against a database with no recent backup is effectively irreversible. Irreversible actions are categorically more dangerous regardless of how few entities they touch.

Downstream coupling — what depends on the resources being modified. Changing a shared API contract hits every consumer. Modifying a configuration file hits every service that reads it. The blast radius is not the target. It is the target plus everything coupled to it.

None/Low

Read-only operations or changes to isolated, versioned files

Medium

Writes to scoped resources with a known rollback path

High

Writes to shared resources or operations affecting 100+ entities

Critical

Irreversible operations on production data or shared infrastructure
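The three dimensions and the four levels above can be sketched as a scoring function. This is an illustration under stated assumptions, not the scoring we run: `ActionProfile`, `DependencyMap`, and `scoreBlastRadius` are hypothetical names, "shared" is taken to mean two or more dependents, and irreversible changes outside production are rated high. Whether seven dependents should rate high or critical is a calibration choice; the sketch follows the level table.

```typescript
type BlastRadius = 'none' | 'low' | 'medium' | 'high' | 'critical';

// Hypothetical description of a proposed action.
interface ActionProfile {
  target: string;       // resource being changed
  readOnly: boolean;
  entityCount: number;  // scope: how many entities are touched
  reversible: boolean;  // reversibility: can it be undone
  versioned: boolean;   // change is under version control
  production: boolean;
}

// Hypothetical dependency map: resource name -> services that depend on it.
type DependencyMap = ReadonlyMap<string, readonly string[]>;

function scoreBlastRadius(p: ActionProfile, deps: DependencyMap): BlastRadius {
  if (p.readOnly) return 'none';
  // Downstream coupling: count what depends on the target, not just the target.
  const dependents = deps.get(p.target)?.length ?? 0;
  const shared = dependents >= 2; // assumption: two or more consumers means shared
  // Irreversible changes to production data or shared infrastructure are critical.
  if (!p.reversible && (p.production || shared)) return 'critical';
  // Shared resources or 100+ entities are high, following the level table.
  if (shared || p.entityCount >= 100) return 'high';
  // Assumption: any other irreversible change still rates high.
  if (!p.reversible) return 'high';
  // Isolated and versioned is low; otherwise a scoped write with rollback is medium.
  return p.versioned ? 'low' : 'medium';
}
```

The same one-file config edit scores low against an empty dependency map and high once the map lists the seven services that read it at startup. The score is only as good as the map behind it.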

Layer 5: The Audit Log Is the Only Source of Truth

When something goes wrong — and it will — the audit trail is the difference between forensics and folklore.

Every agent action — including denials and escalations — gets logged with enough context to reconstruct the incident. The audit log is not compliance theater. It is the forensic instrument that turns a vague production failure into a diagnosable, preventable event.^[3]

A real audit log captures seven fields per action: agent identity, action requested, timestamp, authorization decision (approved, denied, escalated), the dry-run preview that was shown, the execution result, and the rollback status if applicable.

The load-bearing design decision is where the log lives. Outside the agent's reach. If the agent can write to its own audit trail, a malfunctioning agent can erase the evidence of its malfunction during the exact failure mode the log was meant to capture. Append-only storage in a separate account or service. The agent has zero write access. No exceptions on this one.

audit-logger.ts

// Audit entry schema. Append-only sink in a separate account.
// The agent has read access to its own actions only — never write.
interface AgentAuditEntry {
  id: string;                    // Unique event ID
  timestamp: string;             // ISO 8601
  agentId: string;               // Which agent principal
  sessionId: string;             // Conversation or task session
  action: {
    type: string;                // e.g., 'database.query', 'file.write'
    target: string;              // What resource
    parameters: Record<string, unknown>;
  };
  authorization: {
    tier: 'read' | 'write-confirm' | 'write-auto';
    decision: 'approved' | 'denied' | 'escalated';
    approvedBy?: string;         // Human approver, if any
    denialReason?: string;
  };
  dryRun?: {
    affectedCount: number;
    blastRadius: string;
    preview: string;             // Summary of planned changes
  };
  execution: {
    status: 'success' | 'failure' | 'partial' | 'not-executed';
    duration: number;            // Milliseconds
    error?: string;
  };
  rollback?: {
    available: boolean;
    executed: boolean;
    result?: string;
  };
}
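To make the "agent can read, never write" boundary concrete, here is a small in-memory sketch. It is a stand-in, not the real thing: `AuditRecord` is a trimmed-down, hypothetical version of the schema above carrying the seven fields, and `AppendOnlyAuditSink` and `agentReader` are illustrative names. In production the boundary is enforced by a separate account and IAM policy, not by a class in the same process.

```typescript
// Trimmed-down record carrying the seven fields; names are illustrative.
interface AuditRecord {
  agentId: string;
  action: string;
  timestamp: string; // ISO 8601
  decision: 'approved' | 'denied' | 'escalated';
  preview: string;   // the dry-run summary that was shown
  result: 'success' | 'failure' | 'partial' | 'not-executed';
  rollback: 'not-applicable' | 'available' | 'executed';
}

// Hypothetical in-memory stand-in for an append-only store in a separate account.
// It exposes append and read only: there is no update or delete method to call.
class AppendOnlyAuditSink {
  private readonly entries: Readonly<AuditRecord>[] = [];

  // Called by the enforcement layer, never by the agent. Returns the entry count.
  append(record: AuditRecord): number {
    // Copy and freeze so later mutation of the caller's object cannot alter the log.
    this.entries.push(Object.freeze({ ...record }));
    return this.entries.length;
  }

  // Returns a fresh array of the frozen entries for one agent.
  readFor(agentId: string): readonly Readonly<AuditRecord>[] {
    return this.entries.filter(e => e.agentId === agentId);
  }
}

// The agent is handed only this reader, scoped to its own ID. It never sees append().
function agentReader(sink: AppendOnlyAuditSink, agentId: string): () => readonly Readonly<AuditRecord>[] {
  return () => sink.readFor(agentId);
}
```

Denials and escalations go through the same `append` call as successful executions, with `result` set to `'not-executed'`, so the trail shows what the agent attempted as well as what it did.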

How the Five Layers Run as One System

Sequenced rollout. Each step closes a documented incident class.

[01]
Define the tiered authorization model and stop accepting one shared service account
Inventory every tool and resource your agents touch. Classify each as read-only, write-with-confirmation, or write-autonomous. Mark destructive operations (delete, drop, destroy) as permanently blocked for agent principals. Until the model exists in writing, scope drift is the default state.
[02]
Wire mandatory dry-run gates into every write path
Build the preview step into every write-capable tool. Tier 2 routes through human approval before execution. Tier 3 logs the preview and proceeds within guardrails. Skipping the preview is a safety violation, not a performance optimization.
[03]
Turn on every deletion protection your cloud already ships
One-time hardening. Enable every available deletion protection mechanism on databases, storage buckets, infrastructure stacks, and git branches. This layer is independent of the agent — it protects against any deletion source, including a tired engineer at 2am.
[04]
Add blast radius scoring to the dry-run output
Extend the dry-run preview to include a blast radius score. Count affected entities, check reversibility, map downstream dependencies. Anything that scores critical auto-escalates regardless of tier. The dependency map is what catches the seven-service config change masquerading as a one-file edit.
[05]
Stand up the audit log in an account the agent cannot reach
Append-only sink in a separate account or service. Log every action — including denials and escalations — with the seven required fields. Set retention beyond your compliance floor. Verify the agent has zero write access by attempting to write and confirming the IAM denial.

Theater

Agent inherits a broad service account with admin permissions
Writes execute the moment the agent decides to call them
Production databases can be dropped by any authenticated caller
Downstream impact is discovered after the rolling restart
Audit logs sit alongside application data — agent-writable
Incident response starts cold: what happened, who did it, when

Enforcement

Agent runs with credentials scoped to its actual function
Every write hits a preview; Tier 2 blocks on human approval
Deletion protection makes database drops physically impossible
Blast radius is scored before execution; critical auto-escalates
Audit log lives in a separate account, append-only, agent cannot write
Incident response starts hot: here is the trail, here is the call, here is the rollback

Does this slow agent execution to the point of being useless?

Tier 1 read-only calls clear the stack in milliseconds — no approval, just logging. Tier 3 write-autonomous adds the dry-run and blast radius scoring, on the order of one to three seconds. Only Tier 2 stops for a human, and those are the operations where a few minutes of delay buys you hours not spent on incident response. The throughput cost is real and small. The throughput cost without it is extracted in unplanned 3am pages.

How do you handle agents that chain multiple operations?

Treat the chain as one unit of work with a combined blast radius. If an agent modifies a config file, restarts a service, and verifies health, the dry-run shows the entire chain, the score reflects the combined impact, and the approval covers the full sequence. Per-step approvals across a chain produce approval fatigue, which is how the gate stops enforcing anything.
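One way to sketch the combined preview: fold the per-step previews into a single record for a single approval. The names `StepPreview`, `ChainPreview`, and `combineChain` are hypothetical. The combination rules are assumptions as well: the chain takes the worst step's blast radius as a floor, and it counts as reversible only if every step is. Real coupling between steps can push the true score higher than any single step.

```typescript
type BlastRadius = 'none' | 'low' | 'medium' | 'high' | 'critical';
const SEVERITY: readonly BlastRadius[] = ['none', 'low', 'medium', 'high', 'critical'];

interface StepPreview {
  name: string;
  affectedCount: number;
  reversible: boolean;
  blastRadius: BlastRadius;
}

interface ChainPreview {
  steps: string[];
  affectedCount: number;
  reversible: boolean;
  blastRadius: BlastRadius;
}

// Fold per-step previews into one preview that a human approves once.
function combineChain(steps: readonly StepPreview[]): ChainPreview {
  return {
    steps: steps.map(s => s.name),
    affectedCount: steps.reduce((sum, s) => sum + s.affectedCount, 0),
    // One irreversible step makes the whole chain irreversible.
    reversible: steps.every(s => s.reversible),
    // Assumption: the chain is at least as risky as its riskiest step.
    blastRadius: steps.reduce<BlastRadius>(
      (worst, s) => (SEVERITY.indexOf(s.blastRadius) > SEVERITY.indexOf(worst) ? s.blastRadius : worst),
      'none',
    ),
  };
}
```

For the config, restart, health-check chain, the approver sees one plan with one score instead of three prompts.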

What about agents in CI/CD pipelines where there is no human in the loop?

Classify CI/CD agents as Tier 3 inside strict scope limits. They create PRs, run tests, update non-production resources autonomously. Production deployments and infrastructure changes still hit a human approval — wired as a pipeline gate, not an interactive prompt. If no human approves before the timeout window closes, the pipeline pauses. Pausing is a feature. Auto-proceeding is the failure.
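The pause-on-timeout rule is easy to state and easy to get backwards. A minimal sketch of the decision, with the clock passed in so it can be tested, might look like this. `ApprovalState`, `GateOutcome`, and `resolveGate` are hypothetical names, and how the state is stored or how the pipeline polls it is left out.

```typescript
type GateOutcome = 'proceed' | 'denied' | 'waiting' | 'paused';

// Hypothetical approval record for one production deployment.
interface ApprovalState {
  requestedAtMs: number;
  decision?: 'approved' | 'rejected'; // unset until a human acts
}

// Decide what the pipeline does next. The clock is injected for testability.
function resolveGate(state: ApprovalState, nowMs: number, timeoutMs: number): GateOutcome {
  // Only an explicit human approval lets the deployment continue.
  if (state.decision === 'approved') return 'proceed';
  if (state.decision === 'rejected') return 'denied';
  // No decision yet: keep waiting inside the window, pause once it closes.
  return nowMs - state.requestedAtMs >= timeoutMs ? 'paused' : 'waiting';
}
```

The property that matters is visible in the code: no branch returns `'proceed'` without an explicit approval, so a timeout can only ever pause.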

Is five layers overkill for internal tools that touch non-critical data?

You can simplify, but not below three: permission scoping (always), audit logging (always), and one of the three middle layers. The tier model handles this naturally — most low-risk operations are Tier 1 or 3 and never hit the approval gate. The trap is the inverse: too many gates on low-stakes operations train humans to rubber-stamp without reading, and the gate stops enforcing anything. Calibrate so that real approvals demand real attention.

Non-Negotiable Rules for Production Agents

[01]

No agent holds standing destructive permissions in production

Delete, drop, destroy, force-push are permanently blocked on agent principals. When a legitimate destructive operation is needed, a human runs it manually with their own credentials. There is no shortcut around this.

[02]

Every write operation has a dry-run preview attached to a human read

No write skips the plan. The preview shows what changes, how many entities are touched, and whether the operation is reversible. The gate is the human reading the plan, not the system generating it.

[03]

Audit logs live in a separate system the agent cannot mutate

If the agent can write to its own audit trail, the failure mode the log was supposed to catch is the one that erases the evidence. Append-only, separate account, separate access controls.

[04]

Critical blast radius auto-escalates. No exceptions for trusted agents.

When the score returns critical, the operation lands on a human queue regardless of the agent's tier. This is the catch-net for the cases permission scoping alone cannot anticipate, including downstream coupling nobody mapped yet.

[05]

Build the cage before you grant production access

All five layers exist before the agent gets near production. Not after the first incident review. Every documented production agent failure was preventable by enforcement layers that were known and not yet implemented. The first escape is not an opportunity for learning. It is a customer outage.

Key terms in this piece

agentic AI safety · production safety playbook · AI permission scoping · dry-run gates · blast radius estimation · agent audit logs · deletion protection · tiered authorization

Sources

[1] Fortune: Replit AI Coding Tool Wiped Production Database (fortune.com)
[2] AI Incident Database: Incident 1152 (incidentdatabase.ai)
[3] OWASP GenAI: Top 10 Risks and Mitigations for Agentic AI Security (genai.owasp.org)
[4] Cobbai: AI Agent Tool Security Support (cobbai.com)
[5] LoginRadius: Limiting Data Exposure and Blast Radius for AI Agents (loginradius.com)
[6] Google ADK: Safety Documentation (google.github.io)
