Every conference talk covers the launch. Tool calls, context windows, the demo that ships. The end of the lifecycle gets nothing.
Teams that put agents in production through 2025 are now making their first retirement decisions. The customer support agent running on a model two generations behind. The internal research tool with 18 months of edge cases compressed into its system prompt and credentials wired into four systems no one mapped. The QA agent quietly invoking three other agents whose owners have no idea they are being called.
Shutting these down is messier than it looks. An agent is not just code. It is an identity with credentials, a memory store with accumulated patterns, a service with downstream consumers nobody enumerated. Archive the repo and close the ticket and you leave API keys live, service accounts provisioned, and calling agents broken on a delay timer. Security teams have a name for what you just created: ghost agents — retired in intent, live in practice, nobody watching.
Treat retiring an agent much like decommissioning a microservice. Version the prompt contract. Run a shadow period. Extract the patterns the agent learned in production before you wipe it. Communicate the deprecation with a sunset date downstream owners can plan against. The structural analogy makes the right behaviors obvious. Skipping any phase has a specific cost, and the cost compounds.
Five Signals the Retirement Decision Is Already Late
Inaction feels free until the bill arrives as an incident.
Retirement is harder than launch because the cost of doing nothing looks small until it is not.
Five triggers that should be treated as decisions, not vibes:
Performance has structurally degraded. Not a bad week. A trend. The base model is two generations old. Eval scores that looked acceptable six months ago now sit well under the current baseline. The gap between what the agent does and what a successor would do is wide enough to show up in the business numbers. That is a retirement signal, not a tuning task.
The use case has drifted. The workflow the agent was designed for no longer runs the same way. Teams keep agents alive because "it still mostly works" — while the definition of "works" quietly migrated. An agent optimized for a workflow that no longer exists is not an asset. It is maintenance debt with a per-call inference bill.
The base model is being deprecated. Providers set end-of-life dates. When the underlying model retires, every agent built on it migrates or shuts down. Migration means re-evaluating against the new model on your existing eval set — not assuming behavior transfers cleanly. It rarely does.
The business context shifted. A pivot. An acquisition. A process redesign. The workflow this agent was built to accelerate no longer exists in its original form, and rebuilding the agent around the new shape is more expensive than starting over.
Nobody can explain what it does. The most expensive trigger. The original author is gone. Documentation is thin or missing. The agent runs because the agent runs. Continuing to operate a system whose behavior nobody owns is a slow-burn incident waiting for a stakeholder to pull on the wrong thread.
Why Microservice Decommissioning Maps Almost Cleanly
Software has a vocabulary for shutdown. Agents inherit it.
Software engineering has a developed vocabulary for turning services off: deprecation notices, shadow periods, contract versioning, sunset dates. Patterns for notifying downstream consumers and handling clients who missed the migration window.
Agents need the same vocabulary. The parallel is closer than it looks.
A production agent has a public interface (the prompts it accepts, the outputs it produces), downstream consumers (orchestrators, calling agents, humans wired to its behavior), internal state (memory, embeddings, learned patterns), and credentials granting access to external systems. These map nearly one-for-one onto a microservice's API contract, client applications, database state, and service account permissions.
The analogy breaks in one direction that matters: an agent's "API" — the system prompt — is rarely version-controlled with the rigor applied to an HTTP endpoint. That gap is exactly where retirement goes wrong. Downstream systems depending on a specific output format, a specific tool call pattern, or a specific persona behavior have no contract to reference. When the agent disappears, they break silently. The orchestrator falls back to a default value. Nobody notices for two weeks. By then the trail is cold.
| Informal retirement | Structured retirement |
|---|---|
| Archive repo, delete code, close ticket | Audit documents every consumer, every dependency, every credential |
| Credentials left active in secrets manager indefinitely | Every credential explicitly revoked on retirement day, logged in manifest |
| Calling agents break with no warning and no migration path | Successor registered in the tool catalog before cutover, callers updated |
| System prompt deleted, institutional knowledge gone permanently | System prompt versioned, hashed, archived as a frozen contract |
| Edge cases discovered in production die with the prompt | Few-shot examples and edge cases extracted to a permanent eval dataset |
| Vector store orphaned, storage billing continues forever | Vector store archived or deleted under data retention policy, confirmed |
Five Phases. Sequential. None of Them Optional.
Each phase depends on the previous one being complete. Skip one and the cost shows up later.
The retirement pipeline borrows from microservice patterns and adapts them for the shape of an agent. Order matters.
Skip knowledge extraction before the shadow period and you lose institutional memory the moment the prompt is wiped. Run the hard shutdown before credential revocation and you mint a ghost identity with live permissions on Day 0. Each phase absorbs failure modes the next one cannot reach. The order is not arbitrary.
Phase 1–2: Document Everything Before You Touch Anything
Memory is not a dependency map. Trace logs are.
- [01]
Build the dependency inventory from trace logs, not from memory
Query observability — not the team's recollection — for every consumer of this agent: orchestrators, calling agents, webhooks, humans hitting it directly via API. Most teams discover at least one caller they had forgotten about. The audit is what catches the silent ones.
- [02]
Pull every credential into the retirement manifest
API keys, OAuth tokens, service accounts, federated trust relationships. They live in secrets managers, but rarely in one place. Enumerate the complete set now. You will need it intact during Phase 5 — and you will not get a second chance to find one you missed.
- [03]
Catalog every data store with a disposition decision
Vector stores, fine-tuning datasets, cached embeddings, long-term memory. Flag PII. Flag retention policies. Each store gets one of three decisions: archive, delete, migrate. Abandonment is not an option — orphaned vector stores keep billing and keep storing whatever they were storing.
- [04]
Define a migration path for every downstream consumer
For each caller: what breaks on retirement day, and what replaces it? Not every consumer gets a like-for-like successor. Some route to a different agent. Some make direct API calls. Some get nothing. Document the decision either way. Ambiguity here is how production goes down at 3am.
- [05]
Export the final system prompt as a versioned, hashed artifact
This is the prompt contract. Hash it. Tag it. Archive it. Other agents and workflows may have been designed against its output format, its tool call patterns, or its persona. The archived contract is what you reference when something breaks six months after the agent is gone.
- [06]
Extract edge cases and few-shot examples to a permanent eval dataset
Anything added reactively to the system prompt is institutional knowledge — patterns the team discovered in production, edge cases that surprised them. Pull these into a permanent eval set before shutdown. The successor benchmarks against it. You stop rediscovering the same lessons from scratch.
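Step 05 can be made concrete with a few lines of standard-library Python. This is a minimal sketch, not a prescribed tool: the function names and the record fields (`sha256`, `length_chars`) are assumptions chosen for illustration, and the prompt text is a placeholder.

```python
import hashlib
import json
from datetime import date


def freeze_prompt_contract(prompt_text: str, version: str, frozen_at: date) -> dict:
    """Build a record for the frozen prompt (hypothetical schema)."""
    # Hash the exact UTF-8 bytes so any later edit to the archived file is detectable.
    digest = hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()
    return {
        "version": version,
        "frozen_at": frozen_at.isoformat(),
        "sha256": digest,
        "length_chars": len(prompt_text),
    }


def verify_prompt_contract(prompt_text: str, record: dict) -> bool:
    """Check that an archived prompt still matches the recorded hash."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest() == record["sha256"]


if __name__ == "__main__":
    prompt = "You are a customer support agent."  # placeholder, not a real prompt
    record = freeze_prompt_contract(prompt, "3.4.1-final", date(2026, 3, 20))
    print(json.dumps(record, indent=2))
```

Storing the hash next to the archive path means a later reader can confirm the file they pulled is the contract that was frozen, not a copy someone edited afterwards.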
retirement-manifest.yaml

# One manifest per retiring agent. Filled out before any phase begins.
# Empty fields are blockers, not TODOs.
agent:
  id: customer-support-v3
  version: "3.4.1"
  first_deployed: "2025-01-15"
  retirement_date: "2026-04-01"
  owner: platform-team
  reason: replaced_by_model_upgrade
prompt_contract:
  version: "3.4.1-final"
  frozen_at: "2026-03-20"
  archive: s3://ai-artifacts/retired/cs-v3/system-prompt.txt
  breaking_changes_since_v2:
    - "Added tool: escalation_ticket (2025-08)"
    - "Removed limit: max_3_clarification_turns (2025-11)"
knowledge_artifacts:
  few_shot_examples: s3://ai-artifacts/retired/cs-v3/few-shots.jsonl
  edge_cases: s3://ai-artifacts/retired/cs-v3/edge-cases.md
  eval_dataset: s3://ai-artifacts/retired/cs-v3/eval-200.jsonl
  vector_store_id: pinecone://cs-v3-prod
  vector_store_action: DELETE  # per data-retention-policy.md section 4.2
credentials:
  # Every entry below is revoked on retirement_date. No exceptions.
  - { service: zendesk, type: api_key, status: REVOKED, date: "2026-04-01" }
  - { service: google_workspace, type: oauth_token, status: REVOKED, date: "2026-04-01" }
  - { service: internal_crm, type: service_account, status: REVOKED, date: "2026-04-01" }
downstream_consumers:
  - name: triage-router-v2
    type: orchestrator
    migration: updated tool_call to customer-support-v4
  - name: zendesk-inbound-webhook
    type: webhook
    migration: endpoint redirected to new handler
  - name: ops-oncall
    type: human
    migration: notified 2026-03-15, acknowledged
successor:
  id: customer-support-v4
  shadow_period: "2026-03-01 to 2026-03-28"
  output_equivalence_score: 0.94  # cosine similarity on 200-sample eval set

Phase 3: Shadow Mode Surfaces the Contracts Nobody Documented
Implicit dependencies do not live in code. They live in the divergence.
The shadow period is the safest transition mechanism in the retirement toolkit, and the most frequently skipped.
The mechanism: before the old agent goes dark, the successor runs in parallel. Same inputs, parallel outputs. The successor's responses are logged but not acted on. You are comparing behavior distributions — not running an A/B test on user experience.
How to run it: route production traffic to both agents simultaneously. Collect both outputs. Score equivalence — semantic similarity on a shared eval set works better than exact string matching for natural-language outputs. The goal is confirming the successor handles the same failure modes, declines the same request categories, and produces outputs in the format downstream systems are parsing.
Shadow mode finds the contracts no documentation captured. One platform team running a shadow period for an invoice processing agent discovered that a downstream orchestrator was extracting a specific field — confidence_score — from the retiring agent's structured output. The successor did not include that field. No automated test had flagged it. The orchestrator degraded gracefully, falling back silently to a default value. The divergence only appeared in the comparison delta between the two outputs. That is exactly what shadow periods are for. The implicit contract that never made it into a schema.
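A dropped field like this can be caught mechanically by diffing the key sets of paired outputs. The sketch below assumes both agents emit JSON-like dicts. The helper names and the sample invoice payload are hypothetical; only `confidence_score` comes from the example above.

```python
from collections import Counter
from typing import Any


def field_paths(value: Any, prefix: str = "") -> set[str]:
    """Collect dotted paths of every key in a nested JSON-like structure."""
    paths: set[str] = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= field_paths(child, path)
    elif isinstance(value, list):
        for item in value:
            # All list elements share one path, marked with a [] suffix.
            paths |= field_paths(item, f"{prefix}[]")
    return paths


def missing_in_successor(old_output: dict, new_output: dict) -> set[str]:
    """Fields the retiring agent emitted that the successor did not."""
    return field_paths(old_output) - field_paths(new_output)


def missing_field_counts(pairs) -> Counter:
    """Count how often each field went missing across shadow-period pairs."""
    counts: Counter = Counter()
    for old_output, new_output in pairs:
        counts.update(missing_in_successor(old_output, new_output))
    return counts


if __name__ == "__main__":
    old = {"invoice_id": "INV-1", "total": 120.5, "confidence_score": 0.91}
    new = {"invoice_id": "INV-1", "total": 120.5}
    print(sorted(missing_in_successor(old, new)))  # ['confidence_score']
```

A field that is missing in every pair is a schema change. A field missing in a small share of pairs is more likely conditional output, and worth a look before cutover either way.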
Phase 4: A Deprecation Notice Is a Document, Not a Slack Message
Downstream owners need a contract they can plan against, not a heads-up.
Stakeholders need a structured deprecation notice that answers four questions without requiring a follow-up: what is changing, when it stops, what the migration path is, who to contact when something breaks.
Borrow from the API deprecation playbook. Announce at least 30 days before shutdown for anything with integration dependencies. For any hard integration — another agent calling this one, a webhook pointed at the endpoint — 30 days is the floor, not the ceiling. If the shutdown will break a calling agent whose owning team needs to update their tool config, test it, and ship the change, 14 days is not lead time. It is a forced incident.
One stakeholder is consistently underestimated: the data and compliance team. They approve the data disposition plan before shutdown begins, not after. Vector stores containing PII have retention and deletion requirements that can extend the retirement timeline by weeks. Find this out in Phase 1, not the day before cutover.
| Stakeholder | Lead time | Channel | What they need |
|---|---|---|---|
| Platform team leads | 30 days | Written deprecation notice | Successor agent, migration guide, sunset date |
| Calling agents / orchestrators | 30 days | Tool catalog update + notice | New tool name, API contract diff, cutover date |
| End users (if directly exposed) | 14 days | In-product notice or email | What changes, when it changes, what replaces it |
| Data / compliance team | 30 days | Async ticket | Data deletion plan, retention confirmation, PII scope |
| On-call / SRE team | 7 days | Runbook update | Updated alert routing, removed dashboards, retired endpoints |
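The lead times in the table turn into calendar deadlines once a sunset date is fixed. The sketch below is one way to compute them and to flag stakeholders who were notified late or not at all. The function names are assumptions. The lead times are copied from the table, and the sample sunset date matches the manifest example.

```python
from datetime import date, timedelta

# Lead times in days, copied from the stakeholder table.
LEAD_TIME_DAYS = {
    "Platform team leads": 30,
    "Calling agents / orchestrators": 30,
    "End users (if directly exposed)": 14,
    "Data / compliance team": 30,
    "On-call / SRE team": 7,
}


def notice_deadlines(sunset: date) -> dict[str, date]:
    """Latest date each stakeholder can be notified and still get full lead time."""
    return {who: sunset - timedelta(days=days) for who, days in LEAD_TIME_DAYS.items()}


def late_notices(sunset: date, sent: dict[str, date]) -> list[str]:
    """Stakeholders notified after their deadline, or never notified."""
    deadlines = notice_deadlines(sunset)
    return [
        who
        for who, deadline in deadlines.items()
        if who not in sent or sent[who] > deadline
    ]


if __name__ == "__main__":
    sunset = date(2026, 4, 1)
    for who, deadline in notice_deadlines(sunset).items():
        print(f"{deadline.isoformat()}  {who}")
```

For a 2026-04-01 sunset, the 30-day groups must be notified by 2026-03-02, end users by 2026-03-18, and on-call by 2026-03-25.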
Phase 5: Hard Shutdown — Same Day, Not Next Sprint
The phase is short. The window between cutover and revocation is where ghost agents are born.
By this point, the successor is live, stakeholders have acknowledged the deprecation, and the data disposition plan is signed off. The hard shutdown is the shortest phase. All that remains is execution.
Every item below executes on the same day the agent stops receiving traffic. The most common failure mode at this stage: completing the cutover and parking the credential revocation as a follow-up ticket. That gap is the entire mechanism by which ghost agents come into existence. Close it the same day or do not call the agent retired.
Production Agent Retirement Checklist
[Pre-Retirement] Every downstream consumer identified from trace logs — not from memory
[Pre-Retirement] Every credential the agent holds enumerated: API keys, OAuth tokens, service accounts
[Pre-Retirement] Every data store catalogued with PII flags and explicit retention disposition
[Pre-Retirement] Migration path defined for every downstream consumer — including the ones that get nothing
[Pre-Retirement] Final system prompt exported as a versioned, hashed artifact
[Pre-Retirement] Few-shot examples and edge cases extracted to a permanent eval dataset
[Transition] Successor deployed in shadow mode alongside the retiring agent
[Transition] Shadow period run: 14 days minimum for internal, 30 for external/regulated
[Transition] Output distribution equivalence validated — not pass/fail on a single eval
[Transition] Structured deprecation notice sent to all stakeholders with explicit sunset date
[Transition] Internal tool catalog updated to point at the successor before cutover
[Transition] Data/compliance sign-off on the deletion plan obtained in writing
[Shutdown] All inbound triggers and entry points disabled
[Shutdown] All API keys revoked — on retirement day, not in a follow-up ticket
[Shutdown] All OAuth tokens revoked, service accounts deleted
[Shutdown] Agent removed from orchestration registry and internal tool catalog
[Shutdown] Calling agents' tool configs updated to successor or tool call removed entirely
[Shutdown] Vector store archived or deleted under the approved retention plan — confirmed
[Post-Retirement] Retired endpoint monitored for inbound calls for 72 hours
[Post-Retirement] Logs archived, dashboards closed, eval dataset folded into successor baseline
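The credential items in the checklist can be enforced with a small gate over the retirement manifest. The sketch below assumes the manifest has already been parsed into a Python dict (loading YAML needs a third-party parser, which is left out here). The `ACTIVE` status value and both function names are hypothetical. The manifest example only shows `REVOKED`.

```python
from datetime import date


def empty_fields(node, path=""):
    """Return dotted paths of values that are None or an empty string, list, or dict."""
    if node is None or node == "" or node == [] or node == {}:
        return [path]
    found = []
    if isinstance(node, dict):
        for key, child in node.items():
            found += empty_fields(child, f"{path}.{key}" if path else str(key))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            found += empty_fields(child, f"{path}[{index}]")
    return found


def shutdown_blockers(manifest: dict, today: date) -> list[str]:
    """List reasons the agent cannot be called retired as of `today`."""
    blockers = [f"empty field: {p}" for p in empty_fields(manifest)]
    retirement_raw = (manifest.get("agent") or {}).get("retirement_date")
    if not retirement_raw:
        blockers.append("missing agent.retirement_date")
        return blockers
    # From retirement day onward, any credential not marked REVOKED is a blocker.
    if today >= date.fromisoformat(retirement_raw):
        for cred in manifest.get("credentials") or []:
            if cred.get("status") != "REVOKED":
                blockers.append(f"credential not revoked: {cred.get('service')}")
    return blockers


if __name__ == "__main__":
    manifest = {
        "agent": {"id": "customer-support-v3", "retirement_date": "2026-04-01", "owner": ""},
        "credentials": [
            {"service": "zendesk", "type": "api_key", "status": "REVOKED"},
            {"service": "internal_crm", "type": "service_account", "status": "ACTIVE"},
        ],
    }
    for blocker in shutdown_blockers(manifest, date(2026, 4, 1)):
        print(blocker)
```

Run as a required check on retirement day, a gate like this makes "revocation parked as a follow-up ticket" a visible failure instead of a silent one.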
Ghost Agents: Live Credentials Attached to Dead Systems
Credentials do not expire when you stop thinking about them.
The security dimension of agent retirement gets less airtime than the operational one. It is the more urgent dimension.
Machine identities — service accounts, API keys, OAuth tokens issued to automated systems — are growing faster than human identities in most enterprise environments [3]. The structural problem: organizations apply real lifecycle discipline to human identity (offboarding checklists, IAM reviews, periodic access audits) and almost none of it to machine identity.
An agent that is retired informally — code deleted, repo archived, ticket closed — does not lose its credentials. The service account it used to hit the CRM is still provisioned. The API key for the webhook integration is still active. The OAuth token scoped to the data warehouse connector is still valid. The agent is gone. The blast radius is not.
Security teams call these ghost identities: credentials from systems that no longer exist, still live, nobody watching. They are a quiet privilege escalation surface. An attacker who finds a valid scoped API key with read access to customer records does not care that it belonged to an agent retired six months ago. They care that it works.
Teams that have run systematic audits report finding service accounts provisioned for agents that were never formally retired — live credentials attached to dead systems, sometimes still holding permissions that were never scoped down from the original "move fast" deployment. This is not a hypothetical attack vector. It is a cleanup task that compounds with every agent that skips Phase 5.
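A systematic audit of this kind reduces to a cross-reference: every active credential should belong to an agent that is still registered. The sketch below assumes you can export a credential inventory with an owner tag and a list of live agent IDs. The `Credential` fields and the sample IDs are hypothetical, and real secrets managers and IAM systems expose this data in different shapes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    credential_id: str
    owner_agent: str  # agent ID the credential was issued to (assumed tag)
    service: str
    active: bool


def find_ghost_credentials(
    credentials: list[Credential], live_agents: set[str]
) -> list[Credential]:
    """Active credentials whose owning agent is no longer registered."""
    return [c for c in credentials if c.active and c.owner_agent not in live_agents]


if __name__ == "__main__":
    inventory = [
        Credential("key-001", "customer-support-v3", "zendesk", active=True),
        Credential("sa-002", "customer-support-v4", "internal_crm", active=True),
        Credential("tok-003", "customer-support-v3", "google_workspace", active=False),
    ]
    live = {"customer-support-v4"}
    for ghost in find_ghost_credentials(inventory, live):
        print(f"ghost credential: {ghost.credential_id} ({ghost.service})")
```

The check depends on credentials carrying an owner tag in the first place. Untagged credentials are their own finding and should be listed separately.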
Gartner projected that 40% of enterprise applications would embed task-specific AI agents by 2026, up from under 5% in 2025 [5]. At those numbers, informal retirement does not just create operational mess. It creates a credential sprawl problem that is genuinely difficult to audit retroactively — and the audit gets harder every quarter.
Five Rules That Harden After Your First Production Retirement
Each one shows up in the post-mortem of the retirement that skipped it.
Hard Rules
Revoke every credential on retirement day — not in a follow-up ticket, not at the next IAM audit
Ghost credentials are live attack surface. The window between cutover and revocation is the entire failure mode. Close it the same day.
Archive the system prompt as a versioned artifact — never delete it
Prompt contracts have downstream dependencies that surface months after retirement. You need the record to diagnose breakage and to benchmark whatever replaces it.
Run a shadow period before every hard cutover that touches external integrations
Shadow mode surfaces parsing dependencies, field extractions, and format assumptions that no code review catches. For integrated agents, it is the only reliable way to find them.
Give calling agents at least 30 days of advance notice before the successor's tool name or contract changes
Unannounced changes to a tool's name or contract are a common cause of failures in calling agents. Downstream owners need lead time to update, test, and deploy. 14 days is a forced incident.
Assign retirement to an explicit owner — not the agent's original author
Builders are optimistic about their agent's dependencies and scope. Retirement needs someone who will ask uncomfortable questions about what breaks and who knows it.
How long should the shadow period last for a production agent?
14 days minimum for internal tools with no external integrations. 30 days for customer-facing agents or anything touching financial data, regulated workflows, or PII. Extend if output divergence sits above 5% on day 14 — do not cut over until you understand what is diverging. Some divergence is expected. Unexplained divergence is a blocker.
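The 5% rule can be written as a simple gate over paired outputs. The sketch below uses bag-of-words cosine similarity as a stand-in for whatever semantic similarity measure you use in practice, most likely an embedding model. The 0.8 per-pair similarity cutoff is an assumption. The 5% ceiling is the threshold from the answer above.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; a placeholder for an embedding-based score."""
    if a == b:
        return 1.0
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[token] * vb[token] for token in va.keys() & vb.keys())
    norm = math.sqrt(sum(n * n for n in va.values())) * math.sqrt(
        sum(n * n for n in vb.values())
    )
    return dot / norm if norm else 0.0


def divergence_rate(pairs: list[tuple[str, str]], min_similarity: float = 0.8) -> float:
    """Share of (old, new) output pairs scoring below the similarity cutoff."""
    if not pairs:
        raise ValueError("no shadow-period pairs to score")
    diverged = sum(1 for old, new in pairs if cosine_similarity(old, new) < min_similarity)
    return diverged / len(pairs)


def should_extend_shadow(
    pairs: list[tuple[str, str]],
    max_divergence: float = 0.05,
    min_similarity: float = 0.8,
) -> bool:
    """True when divergence sits above the ceiling and cutover should wait."""
    return divergence_rate(pairs, min_similarity) > max_divergence


if __name__ == "__main__":
    pairs = [("refund issued for order 12", "refund issued for order 12")] * 18
    pairs += [("escalating to a human agent", "I cannot help with that request")] * 2
    print(divergence_rate(pairs), should_extend_shadow(pairs))
```

The gate only says that outputs diverge. Deciding whether the divergence is explained still takes a person reading the diverging pairs.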
Do I need this process if I am just upgrading the model version?
No. A model upgrade is a rollout, not a retirement. Retirement applies when the agent identity itself is being permanently decommissioned. That said, a model upgrade that materially changes system prompt behavior is a prompt contract break and needs version management of its own — even without a full retirement. Rollouts can be rolled back. Retirements cannot. The distinction is what makes the rules different.
What counts as a prompt contract break?
Any change that causes a downstream consumer — human or automated — to see meaningfully different behavior. Added tools. Removed constraints. Changed response format. Changed persona. Modified output schema. Breaking changes get versioned and communicated with a deprecation timeline, exactly like a breaking API change. If you are unsure, treat it as a break and communicate proactively. The cost of over-communicating is low. The cost of under-communicating is an incident.
What happens to the vector store when an agent retires?
Nothing — and that is the problem. It does not auto-delete when the agent shuts down. You explicitly delete the namespace, archive the source documents if required, and confirm deletion under your retention policy. If the store contains PII or regulated data, that decision needs sign-off from data and compliance before shutdown begins, not after. Document it in the retirement manifest in Phase 1, before the operational pressure of cutover week.
My retiring agent is called by other agents — who owns those dependency updates?
The owner of the retiring agent. Map all calling agents from trace logs before shutdown. For each, update their tool config to call the successor or remove the tool call entirely. Verify in shadow mode that calling agents work against the successor before hard cutover. Do not assume callers will adapt on their own — they will fail silently and the resulting incidents will be attributed to something else, usually for weeks, before anyone traces the chain back to the retirement.
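Mapping callers from trace logs can start as a small aggregation. The sketch below assumes each trace span is a dict with `caller`, `callee`, and an ISO 8601 UTC `timestamp` string. That schema is hypothetical and will differ across observability tools, and `qa-agent-v1` is an invented example of a forgotten caller.

```python
def caller_inventory(spans: list[dict], agent_id: str) -> dict[str, dict]:
    """Summarize who called `agent_id`: call count and most recent call per caller."""
    inventory: dict[str, dict] = {}
    for span in spans:
        if span["callee"] != agent_id:
            continue
        entry = inventory.setdefault(
            span["caller"], {"calls": 0, "last_seen": span["timestamp"]}
        )
        entry["calls"] += 1
        # ISO 8601 strings in one format and timezone sort chronologically.
        entry["last_seen"] = max(entry["last_seen"], span["timestamp"])
    return inventory


def undocumented_callers(inventory: dict[str, dict], documented: set[str]) -> set[str]:
    """Callers seen in traces that are absent from the retirement manifest."""
    return set(inventory) - documented


if __name__ == "__main__":
    spans = [
        {"caller": "triage-router-v2", "callee": "customer-support-v3", "timestamp": "2026-03-01T10:00:00Z"},
        {"caller": "triage-router-v2", "callee": "customer-support-v3", "timestamp": "2026-03-02T09:30:00Z"},
        {"caller": "qa-agent-v1", "callee": "customer-support-v3", "timestamp": "2026-02-20T14:00:00Z"},
        {"caller": "triage-router-v2", "callee": "billing-agent", "timestamp": "2026-03-02T11:00:00Z"},
    ]
    inventory = caller_inventory(spans, "customer-support-v3")
    documented = {"triage-router-v2", "zendesk-inbound-webhook"}
    print(undocumented_callers(inventory, documented))  # {'qa-agent-v1'}
```

Query a window long enough to catch infrequent callers. A monthly batch job will not appear in a week of traces.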
- [1] Managing AI Agent Lifecycles: Birth to Retirement. Saviynt (saviynt.com)
- [2] AI Agent Lifecycle Security Guide: Provisioning to Decommissioning. Unosecur (unosecur.com)
- [3] AI Agents: The Next Wave Identity Dark Matter. The Hacker News (thehackernews.com)
- [4] Versioning, Rollback and Lifecycle Management of AI Agents. NJ Raman, Medium (medium.com)
- [5] Agent Lifecycle Management 2026: 6 Stages, Governance and ROI. OneReach.ai (onereach.ai)