The primary consumer of your documentation is no longer a human. It is an agent making code changes, retrieving context, executing workflows. Treat docs as infrastructure — versioned, tested, owned — or ship guesses every time the model runs.
Every Claude session starts from zero unless something carries the org forward. CLAUDE.md is that something — a persistent context layer that encodes team topology, current priorities, and the decisions you have already paid for. Treat it as a config file and you keep paying the coordination tax.
Four layers between a generic assistant and a colleague: always-on slow facts, on-demand skill files, live MCP data, and a persistent entity graph. One architecture. Zero fine-tuning. The teams that ship all four cut correction cycles in half.
New hires don't lack capability. They lack context. Three onboarding agents — orientation, historical reasoning, starter-ticket matching — index the institutional knowledge that already exists in PRs, ADRs, and post-mortems. Ramp compresses.
Cosine similarity scores look fine while your RAG pipeline gives wrong answers. Four failure modes that produce confident, wrong outputs — and the retrieval stack that actually fixes them.
Most enterprise AI failures are not model failures. They are retrieval failures. Chunking, embeddings, vector stores, knowledge graphs, and the context budget — what actually breaks at scale and how to build the memory layer that holds.
SRE runbooks assume one process, one stack trace, one bad line. Agent failures are distributed across dozens of reasoning steps — the wrong premise gets laundered through 33 more calls before the user sees it. Here is the taxonomy, the triage, the postmortem.
Most agent failures return HTTP 200. The dashboard stays green while the reasoning chain quietly compounds the wrong premise. Triage runbook, failure-mode field guide, observability instrumentation, and postmortem template for non-deterministic systems.
Most production agent failures are not model failures. They are missing constraints — business rules carried in four engineers' heads with no formal representation agents can query. The fix is a versioned, governed context store the data team owns instead of answers.
Why single-inference cost estimates fail for agentic workflows — the four-component inference multiplier (call count, context accumulation, tool schema overhead, retry tax) with concrete workflow examples and measurement patterns.