Docs Are Now Your AI's Runtime. Treat Them Like It.

Your Docs Are Now Your AI's Runtime. Most Teams Have Not Noticed.

The primary consumer of your documentation is no longer a human. It is an agent making code changes, retrieving context, executing workflows. Treat docs as infrastructure — versioned, tested, owned — or ship guesses every time the model runs.

Data, Context & Knowledge · intermediate · Sep 16, 2025 · 8 min read

By Viktor Bezdek · VP Engineering, Groupon

The audience for your documentation changed and nobody updated the contract. The on-call engineer hunting a runbook at 2 AM is still in the loop, but they are no longer the dominant reader. The dominant reader is an agent — coding assistant, retrieval system, workflow orchestrator — parsing your prose into context and shipping decisions out the other end.

This is a load-bearing change.

Snowflake's 2025 research on finance RAG found that retrieval and chunking strategy affected answer quality more than the choice of generating model.^[5] Translation: the model you picked matters less than the substrate it reads from. Your Claude Code session, your Copilot completions, your agentic pipelines: every one of them is bottlenecked on documentation, not capability.

The uncomfortable corollary: documentation is no longer the thing engineering managers nag about in retros. It is infrastructure. Treat it like infrastructure or accept that every AI interaction is degraded by the same gap.

The good news is structural. The constraints that make documentation machine-readable — self-contained sections, semantic headers, explicit scope — make it sharper for humans too. The skill is not the obstacle. The obstacle is that nobody made docs a blocking requirement until the model started reading them out loud.

Stale Docs Do Not Just Produce Bad Output. They Produce Confident Bad Output, At Scale.

Bad documentation is no longer a one-shot annoyance. It is a confidence amplifier on every wrong answer.

Garbage in, garbage out understates the failure mode. With agents in the loop, bad documentation produces confidently wrong output — repeatedly, across every session that touches the same context.

A coding agent reading outdated architecture docs builds on assumptions the team rejected months ago. A RAG system retrieving stale API references fabricates function calls that compile and fail at runtime. A workflow agent consuming process docs from last quarter automates the wrong process correctly.

Factory.ai's research on context windows found that flooding a model with noise actively degrades quality by diluting the signal needed to solve the task.^[4] Larger context windows do not fix this. They make it cheaper to degrade output without noticing. More context is not better. More relevant, accurate, current context is better. The discipline is curation, not capacity.

90%+

Token reduction when HTML docs are converted to clean markdown (Fern, 2026). Actual reduction varies with HTML complexity — but the direction is one-way.

75%

Developers projected to consume MCP servers for AI tools by end of 2026 (Document360). Adoption forecasts carry uncertainty; direction does not.

42%

Committed code that is now AI-assisted (ShiftMag State of Code 2025). All of it shaped by whatever context the model could reach.

Sit with the implication of forty-two percent.^[6] More than two-fifths of the code your team ships was conditioned on whatever documentation the agent could find. If that documentation lives in Confluence pages last edited in 2024, the agent is coding against a two-year-old snapshot of your system. Every pull request carries that drift forward into the next one.

This is not a developer-experience problem anymore. It is a product-quality problem with a documentation root cause.

Human-Readable and Machine-Readable Are Not the Same File.

Beautiful documentation sites burn tokens. Clean markdown ships context.

A documentation site with sidebar navigation, interactive code examples, and animated diagrams scores well on developer surveys. When an agent tries to consume it, the same surface becomes adversarial: JavaScript bundles, navigation chrome, cookie banners, layout markup that burns tokens and buries the content underneath.

The shift to machine-readable docs has three concrete layers. None of them require giving up the rendered version. They require committing to a parallel surface that the model can actually read.

Decoration

Rich HTML with navigation, sidebars, and interactive widgets
Content buried in DOM the model has to fight through
No standard for AI discovery or indexing
Documentation site is the only distribution surface
Freshness tracked informally — "this seems outdated"

Infrastructure

Clean markdown with semantic headers and structured frontmatter
Content reachable via llms.txt, MCP servers, or a raw markdown endpoint
llms.txt as the discovery layer — robots.txt for language models
Docs distributed across site, MCP, IDE, and CLI agents simultaneously
Freshness enforced in CI with staleness thresholds and named owners
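
A rough way to see the markup overhead for yourself: the sketch below uses Python's standard-library HTML parser to drop scripts, navigation, and footer chrome from a page, then reports what share of the characters is actual content. The sample page and the list of skipped tags are illustrative assumptions, and character count is only a crude proxy for tokens. It is not how any particular docs platform measures the reduction.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping script, style, and navigation chrome."""

    SKIP = {"script", "style", "nav", "header", "footer"}  # assumed chrome tags

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)


def content_ratio(html: str) -> float:
    """Share of the page's characters that are actual content."""
    return len(extract_text(html)) / len(html) if html else 0.0


# Hypothetical page: a small body wrapped in typical site chrome.
PAGE = """<html><head><script>var bundle = "x".repeat(5000);</script></head>
<body><nav><a href="/">Home</a><a href="/docs">Docs</a></nav>
<div class="content"><h1>Authentication</h1>
<p>Use OAuth2 or an API key.</p></div>
<footer>Cookie settings</footer></body></html>"""

print(f"content share of page: {content_ratio(PAGE):.0%}")
```

On a real documentation site the chrome is far heavier than in this toy page, which is why serving a raw markdown endpoint alongside the rendered site pays off.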

llms.txt Is the Discovery Layer. MCP Is the Runtime.

Two standards, two jobs. Confusing them is how teams ship the wrong one first.

The llms.txt specification — Jeremy Howard and the Answer.AI team — is the cleanest example of documentation infrastructure built for agents. A standardized file at /llms.txt that tells the model what your site contains and where to find it.^[1] Same role as robots.txt, different reader.

The spec defines two variants. llms.txt is the compact map: one-sentence descriptions and URLs per page. llms-full.txt embeds the body inline so the agent does not have to fetch every link. Fern, Mintlify, and ReadMe now generate both automatically.^[3]

Discovery is one job. Runtime is another. Google's Developer Knowledge API ships with a Model Context Protocol (MCP) server in early 2026, giving agents a machine-readable way to reach official documentation in real time.^[2] MCP — the open standard from Anthropic — lets the model retrieve structured, current context from external sources: docs, APIs, databases, configuration. llms.txt tells the agent what exists. MCP serves what is live. Build the first; reach for the second when static no longer holds.

llms.txt

# Acme Platform Documentation

> Acme Platform is a data orchestration layer for ML pipelines.
> This file points agents at the canonical sources.

One file. Discovery layer for every agent that hits the site.

## API Reference
- [Authentication](/docs/api/auth.md): OAuth2 and API key flows
- [Pipelines API](/docs/api/pipelines.md): create, configure, monitor pipelines
- [Transforms API](/docs/api/transforms.md): define and chain transformations

## Architecture
- [System Overview](/docs/arch/overview.md): high-level architecture and data flow
- [Data Model](/docs/arch/data-model.md): core entities, relationships, constraints

## Guides
- [Quick Start](/docs/guides/quickstart.md): first pipeline in under 5 minutes
- [Migration from v2](/docs/guides/migration-v2-v3.md): breaking changes and upgrade path
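
To make the "compact map" idea concrete, here is a minimal sketch of how a tool might read a file shaped like the one above: group the link entries under their `##` section headings and keep the title, URL, and description of each. The function name `parse_llms_txt` and the regular expression are assumptions for illustration, not part of the specification or of any real agent's implementation.

```python
import re

# One list entry: "- [Title](url): optional description"
LINK = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<desc>.*))?$")


def parse_llms_txt(text: str) -> dict[str, list[dict[str, str]]]:
    """Group link entries under their '## ' section headings."""
    sections: dict[str, list[dict[str, str]]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            match = LINK.match(line)
            if match:
                sections[current].append(
                    {
                        "title": match["title"],
                        "url": match["url"],
                        "description": match["desc"] or "",
                    }
                )
    return sections


SAMPLE = """# Acme Platform Documentation

> Acme Platform is a data orchestration layer for ML pipelines.

## API Reference
- [Authentication](/docs/api/auth.md): OAuth2 and API key flows
- [Pipelines API](/docs/api/pipelines.md): create, configure, monitor pipelines

## Guides
- [Quick Start](/docs/guides/quickstart.md): first pipeline in under 5 minutes
"""

index = parse_llms_txt(SAMPLE)
for section, links in index.items():
    print(section, [link["url"] for link in links])
```

The point of the exercise: with one-sentence descriptions per entry, an agent can decide which page to fetch without downloading any of them.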

If It Is Not in the Repo, It Does Not Exist.

Agents enforce a constraint that docs-as-code never had — co-locate or accept that the model is operating without you.

The docs-as-code movement is a decade old. Store documentation in the repo, write markdown, review in pull requests, deploy in CI. Most teams adopted it halfway. The API reference lives in the repo. Architecture decisions live in Notion. Runbooks live in Confluence. The onboarding guide is a Google Doc someone shared in Slack once and nobody can find again.

Agents broke that compromise.

An agent searching your repository finds your in-repo docs. It does not find Notion. It does not find Confluence. It does not find that Google Doc. If it is not in the repo, it does not exist for any tool the agent runs through. The fragmented documentation surface that humans tolerated for years stopped being tolerable the moment the model started doing the reading.

This is a forcing function the original docs-as-code pitch never produced: co-locate or accept that AI is operating with a blindfold. With forty-two percent of code AI-assisted, blindfolded means the blast radius now extends to your production codebase.

The AI-native shape of the docs tree:

AI-Native Documentation Structure

tree

repo/
├── docs/
│   ├── architecture/
│   │   ├── system-overview.md
│   │   └── data-model.md
│   ├── decisions/
│   │   ├── ADR-001-database-choice.md
│   │   └── ADR-002-auth-provider.md
│   ├── api/
│   │   ├── openapi.yaml
│   │   ├── auth.md
│   │   └── endpoints.md
│   ├── runbooks/
│   │   ├── incident-response.md
│   │   └── deploy-rollback.md
│   └── onboarding/
│       ├── setup.md
│       └── conventions.md
├── CLAUDE.md
├── llms.txt
└── .github/workflows/docs-freshness.yml

Three additions separate AI-native from plain docs-as-code. CLAUDE.md carries persistent project context for the coding agent. llms.txt carries structured discovery for external tools. docs-freshness.yml enforces that none of it rots — because stale documentation that an agent trusts unconditionally is worse than no documentation at all.

The first time we adopted this structure we made the predictable mistake. Two hundred Confluence pages migrated wholesale, no quality filter. Result: a docs directory full of outdated material and an agent confidently citing every bit of it. The fix was a scalpel, not a forklift: keep twenty to thirty load-bearing documents, archive the rest, and build the habit of keeping the core current before expanding the surface area. Migrate small. Hold the line. Add only when ownership is explicit.

Stale Docs Are the Default State. CI Is the Only Thing That Reverses It.

Drift is what happens when nobody owns the cleanup. Freshness has to be enforced, not encouraged.

Stale documentation has always been annoying. With agents in the loop, it becomes actively dangerous. A human reading old docs notices something feels wrong — the screenshots changed, the menu items moved. The agent has no such reflex. It treats every document as equally authoritative regardless of when it was last touched.

The enforcement pattern borrows from data engineering: define a freshness SLA per document type, track the last-modified date, and fail CI when a document exceeds its threshold. Drift becomes a build failure instead of a private complaint.

The minimum viable enforcement:

docs-freshness.yml

# .github/workflows/docs-freshness.yml
# Stale docs fail CI. The owner gets named in the warning. No exceptions.
name: Documentation Freshness Check
on:
  schedule:
    - cron: '0 9 * * 1'   # Every Monday 9 AM UTC
  push:
    paths: ['docs/**']

jobs:
  freshness:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # Full history for git log dates

      - name: Check document freshness
        run: |
          STALE_THRESHOLD_DAYS=90
          STALE_FILES=""
          NOW=$(date +%s)
          # Read paths line by line so filenames with spaces survive.
          while IFS= read -r file; do
            LAST_MODIFIED=$(git log -1 --format='%ct' -- "$file")
            # Skip files that have no commit history yet.
            if [ -z "$LAST_MODIFIED" ]; then continue; fi
            AGE_DAYS=$(( (NOW - LAST_MODIFIED) / 86400 ))
            if [ "$AGE_DAYS" -gt "$STALE_THRESHOLD_DAYS" ]; then
              OWNER=$(head -5 "$file" | grep -oP '(?<=owner: ).*' || echo 'unowned')
              # %0A is the newline escape that workflow commands understand.
              STALE_FILES="$STALE_FILES%0A$file ($AGE_DAYS days, owner: $OWNER)"
            fi
          done < <(find docs/ -name '*.md')
          if [ -n "$STALE_FILES" ]; then
            echo "::warning::Stale docs found:$STALE_FILES"
            exit 1
          fi

Document Type	Staleness Threshold	Owner	Review Trigger
API reference	30 days	API team lead	Any endpoint change in OpenAPI spec
Architecture decisions (ADRs)	180 days	Original author	Related system metric change
Runbooks	60 days	On-call rotation lead	Any incident that ran the runbook
Onboarding guides	90 days	Engineering manager	New-hire feedback or tooling change
CLAUDE.md / AI context	14 days	Tech lead	Any convention or dependency change
llms.txt	Auto-generated	CI pipeline	Any doc added, moved, or deleted
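
The workflow above applies a single 90-day threshold, while the table sets a different one per document type. A minimal sketch of the per-type lookup follows. The path prefixes are hypothetical and depend on your repository layout, and the default of 90 days for unmatched paths is an assumption.

```python
from datetime import date

# Thresholds in days; the values mirror the table, the prefixes are assumed.
THRESHOLDS = {
    "docs/api/": 30,
    "docs/runbooks/": 60,
    "docs/onboarding/": 90,
    "docs/decisions/": 180,
    "CLAUDE.md": 14,
}
DEFAULT_THRESHOLD = 90  # assumed fallback for paths that match no prefix


def threshold_for(path: str) -> int:
    """Return the staleness threshold for the first matching prefix."""
    for prefix, days in THRESHOLDS.items():
        if path.startswith(prefix):
            return days
    return DEFAULT_THRESHOLD


def is_stale(path: str, last_modified: date, today: date) -> bool:
    """A doc is stale once its age in days exceeds its type's threshold."""
    return (today - last_modified).days > threshold_for(path)


# The same 45-day-old file is stale as an API doc but fine as an ADR.
today = date(2025, 2, 15)
print(is_stale("docs/api/auth.md", date(2025, 1, 1), today))
print(is_stale("docs/decisions/ADR-001-database-choice.md", date(2025, 1, 1), today))
```

Keeping the thresholds in one table-shaped mapping means the policy in the docs and the policy in CI can be reviewed side by side.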

From Markdown to Model: The Pipeline That Has to Hold.

Authoring is the easy half. The flow from CI through distribution to the agent is where the failures live.

From Markdown to Agent: The Documentation Pipeline

Authored markdown clears freshness, passes CI validation, and fans out to MCP, CLAUDE.md, and the RAG index. Stale documents loop back to the author — the most common bottleneck in the pipeline.

Two Audiences, One Surface. The Constraints Overlap More Than You Think.

The patterns that make docs legible to agents make them sharper for humans. The overlap is the whole point.

Documentation that is well-structured for agents is, almost without exception, better for humans too. Clear headings, consistent formatting, explicit assumptions, self-contained sections — both audiences benefit. The bad news is most existing documentation served neither audience particularly well. The good news is the rewrite serves both at once.

The patterns that carry the most leverage when you restructure for dual consumption:

[01]
Lead with a purpose statement, not a preamble
The first paragraph of every doc must answer three questions: what is this, who is it for, when was it last verified. Agents use that paragraph to decide whether the document is even relevant before consuming the body. Humans use it to decide whether to keep reading. A purpose statement is a relevance filter — make it explicit or accept that both audiences guess.
[02]
Use semantic headers, not clever ones
A section titled 'Getting Your Feet Wet' tells a retrieval system nothing. A section titled 'Authentication Setup' tells it exactly what to expect. Headers function as an implicit table of contents for any retrieval system that ranks by relevance. Clever headers are a vanity tax paid in retrieval misses.
[03]
Make every section self-contained
RAG systems and agents retrieve sections, not full documents. If a section requires three paragraphs of context from above to make sense, the agent serves it without that context — and produces a confidently wrong answer. Each section has to carry its own minimum viable context. This is the most violated rule in legacy documentation.
[04]
Mark facts and opinions differently
Agents cannot distinguish 'we chose PostgreSQL' (fact) from 'PostgreSQL is probably the right choice for this use case' (opinion). They will weight both equally and cite both as authoritative. Mark opinions, recommendations, and assumptions explicitly so the agent — and the next human reader — can weight them honestly.
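
The self-contained-section rule is easier to appreciate once you see what a retrieval pipeline actually stores. The sketch below splits a markdown document at its headings and attaches the heading trail to each chunk, so a section retrieved alone still says where it came from. It is a simplified illustration, not the chunker of any specific RAG system; `chunk_by_heading` and the `context` field are hypothetical names.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.+)$")
FENCE = "`" * 3  # marks the start or end of a fenced code block


def chunk_by_heading(markdown: str, doc_title: str) -> list[dict[str, str]]:
    """Split a document at headings; each chunk carries its heading trail."""
    chunks: list[dict[str, str]] = []
    trail: list[tuple[int, str]] = []  # (level, title) from outermost inward
    body: list[str] = []
    in_fence = False

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            context = " > ".join([doc_title] + [title for _, title in trail])
            chunks.append({"context": context, "text": text})
        body.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines starting with '#' inside code fences are comments, not headings.
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            while trail and trail[-1][0] >= level:
                trail.pop()  # leave sibling and deeper sections behind
            trail.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return chunks


DOC = """# Authentication Setup
Use OAuth2 for user flows.

## API Keys
Rotate keys every 90 days.
"""

for chunk in chunk_by_heading(DOC, "Acme Platform"):
    print(chunk["context"], "|", chunk["text"])
```

Notice what the trail cannot rescue: a header like 'Getting Your Feet Wet' would travel with the chunk and still tell the retriever nothing.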

CLAUDE.md Is the Bootstrap File. Treat It Like One.

The document the agent reads before searching anything else. The leverage is in what you choose not to put there.

CLAUDE.md — and its peers, .cursorrules, .windsurfrules, Codex's AGENTS.md — is a specific kind of documentation infrastructure. The bootstrap file. The document that gives the agent enough context to operate competently before it starts searching for anything else.

The best CLAUDE.md files follow a progressive disclosure pattern. They do not dump every fact the agent might one day need. They carry exactly three things:

Slow facts. Team conventions, architecture decisions, naming patterns. Things that change quarterly, not daily. If it changes weekly, it does not belong here.
Navigation pointers. Where to find specific kinds of information. "Architecture decisions live in docs/decisions/. Runbooks live in docs/runbooks/. API reference is generated from openapi.yaml." The agent searches efficiently instead of wandering — and the agent that wanders burns tokens and produces drift.
Anti-patterns. What NOT to do. The highest-leverage sentences in any CLAUDE.md file, because they prevent the agent from making the same mistakes that already burned the team.

Anthropic's own guidance: keep CLAUDE.md under three hundred lines and ensure every line applies universally.^[7] If an instruction only matters for one type of task, it belongs in a more specific document, not in the bootstrap that loads on every session. The discipline is what you leave out.

CLAUDE.md

# Project Context
# Bootstrap file. Slow facts, navigation, anti-patterns. Nothing that changes weekly.

## Architecture
Monorepo: Next.js frontend + Go API services + Python ML services + shared protobuf schemas.
Services communicate via gRPC. REST is only for public-facing APIs.

## Where to Find Things
- Architecture decisions: `docs/decisions/ADR-*.md`
- API reference: auto-generated from `proto/` — do not edit `docs/api/` directly
- Runbooks: `docs/runbooks/` — each has an owner in frontmatter
- Environment configs: `deploy/envs/` — never hardcode env values

## Conventions
- Branch naming: `type/TICKET-description` (e.g., `feat/PLAT-123-add-caching`)
- Tests required for every new endpoint — match the `*_test.go` pattern
- No direct database queries from API handlers — use the repository pattern

## Do NOT
- Import from `internal/legacy/` — migration in progress, removed Q2
- Use `fmt.Println` for logging — structured logger lives in `pkg/log`
- Skip the linter — `make lint` must pass before PR review

Documentation ROI Was Theoretical. Agents Made It Concrete.

The measurement surface that documentation never had now lives inside every AI session your team runs.

Documentation has always resisted measurement. How do you put a number on "the new hire onboarded faster because the setup guide was clear"? You don't, not credibly. With AI-assisted development the surface finally becomes legible — every session is an instrumented interaction with your documentation, and every miss leaves a trace.

These are the signals that tell you whether your documentation infrastructure is doing real work:

Context Hit Rate

Share of agent queries that retrieve fresh, relevant docs versus stale or irrelevant results

Freshness Coverage

Share of docs inside their staleness SLA — target 95%+ for AI-consumed surfaces

First-Prompt Accuracy

How often agent-generated code is correct on the first attempt — tracks context quality

Context Prep Time

Minutes per day developers spend manually feeding context to AI tools — target trend toward zero

Context prep time is the most diagnostic metric of the four. If your team spends five minutes at the start of every AI session pasting in architecture context, your CLAUDE.md is failing — and the failure is structural, not personal. If developers routinely override agent suggestions because "it does not know our conventions," your conventions are not documented where the agent can reach them.

Teams running this discipline report seventy to eighty percent reductions in context prep time, though the exact number swings hard with team size, tooling maturity, and documentation baseline. Even a fifty percent reduction on a team that touches AI tools fifteen to twenty times a day recovers meaningful focused work — not because the model got smarter, but because the substrate it reads from finally stopped lying.
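
Two of these signals reduce to simple arithmetic, sketched below. Freshness coverage is the share of docs inside their SLA. The prep-time figure combines numbers used in this piece as examples (fifteen sessions a day, five minutes of pasting per session, a fifty percent reduction); that combination is an illustrative assumption, not a measured result. The `DocStatus` record and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DocStatus:
    path: str
    age_days: int
    threshold_days: int


def freshness_coverage(docs: list[DocStatus]) -> float:
    """Share of docs whose age is within their staleness threshold."""
    if not docs:
        return 1.0  # nothing tracked, so nothing is stale
    fresh = sum(1 for doc in docs if doc.age_days <= doc.threshold_days)
    return fresh / len(docs)


def prep_minutes_recovered(
    sessions_per_day: int, minutes_per_session: float, reduction: float
) -> float:
    """Minutes per developer per day saved by a given fractional reduction."""
    return sessions_per_day * minutes_per_session * reduction


docs = [
    DocStatus("docs/api/auth.md", age_days=12, threshold_days=30),
    DocStatus("docs/runbooks/deploy-rollback.md", age_days=75, threshold_days=60),
    DocStatus("docs/onboarding/setup.md", age_days=40, threshold_days=90),
    DocStatus("CLAUDE.md", age_days=3, threshold_days=14),
]
print(f"freshness coverage: {freshness_coverage(docs):.0%}")
print(prep_minutes_recovered(15, 5, 0.5), "minutes per developer per day")
```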

Documentation Without Tests Is Documentation You Cannot Trust.

If docs are infrastructure, the same enforcement bar that applies to code applies to them. Spell-check is not enforcement.

If documentation is infrastructure, it has tests. Not spell-checking and link validation — those are table stakes. Real tests that verify the documentation still reflects the system it describes.

The tests that matter sit in three layers:

Structural tests (run on every PR)

✓ Every markdown file carries required frontmatter: title, owner, last-verified, audience
✓ All internal links resolve to existing files, with no dead references
✓ Code blocks declare a language for syntax highlighting
✓ Headers follow consistent hierarchy: no H4 without a parent H3
✓ llms.txt entries match the actual files in the docs directory
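
As one example of a structural test, the sketch below checks the first item on the list: required frontmatter keys. It parses only flat `key: value` lines between the leading `---` delimiters, which is an assumption about how your frontmatter is written. A real pipeline would more likely use a full YAML parser. The function names are hypothetical.

```python
REQUIRED_KEYS = ("title", "owner", "last-verified", "audience")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat 'key: value' frontmatter between leading '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as missing frontmatter


def missing_keys(text: str) -> list[str]:
    """Required keys that are absent or empty, in contract order."""
    fields = parse_frontmatter(text)
    return [key for key in REQUIRED_KEYS if not fields.get(key)]


GOOD = """---
title: Deploy Rollback
owner: alice
last-verified: 2025-09-01
audience: ops
---
Roll back a failed deploy.
"""

NO_OWNER = GOOD.replace("owner: alice\n", "")

print(missing_keys(GOOD))
print(missing_keys(NO_OWNER))
```

In CI, a non-empty result for any file under `docs/` would fail the pull request.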

Freshness tests (run on schedule)

✓ No document exceeds its staleness threshold for its document type
✓ Owner field maps to an active team member, not someone who left six months ago
✓ Documents referencing specific software versions flagged when dependencies update
✓ API docs match the current OpenAPI specification; drift triggers a review

Semantic tests (run weekly or on major changes)

✓ Code examples in docs compile and run against the current codebase
✓ Architecture diagrams reference services that actually exist in deployment configs
✓ CLI commands documented in runbooks produce the expected output
✓ Environment variable names in docs match what is defined in config templates
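
The last semantic check is the cheapest to automate. A minimal sketch, assuming environment variables appear in docs as backticked UPPER_SNAKE_CASE names and that the config template is a dotenv-style file of `KEY=value` lines. Both conventions are assumptions, and names without an underscore are deliberately ignored to avoid flagging words like `GET`.

```python
import re

# Backticked UPPER_SNAKE_CASE with at least one underscore, e.g. `DATABASE_URL`.
ENV_NAME = re.compile(r"`([A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+)`")


def documented_env_vars(markdown: str) -> set[str]:
    """Environment variable names mentioned in a doc."""
    return set(ENV_NAME.findall(markdown))


def defined_env_vars(template: str) -> set[str]:
    """Keys from a dotenv-style template (KEY=value lines, '#' comments)."""
    names = set()
    for line in template.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            names.add(line.split("=", 1)[0].strip())
    return names


def undefined_in_docs(markdown: str, template: str) -> set[str]:
    """Names the docs mention that the config template does not define."""
    return documented_env_vars(markdown) - defined_env_vars(template)


RUNBOOK = "Set `DATABASE_URL` and `REDIS_HOST` before deploy. Use `GET` requests."
TEMPLATE = "DATABASE_URL=\n# cache settings\nCACHE_HOST=localhost\n"

print(sorted(undefined_in_docs(RUNBOOK, TEMPLATE)))
```

A non-empty result usually means a variable was renamed in config and the runbook was never updated, which is exactly the drift an agent would act on.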

Same Tool, Same Model, Wildly Different Outcomes.

Documentation infrastructure is a feedback loop that compounds. The doc-poor and doc-rich teams diverge with every session that runs.

Documentation infrastructure is a feedback loop that accelerates. Better docs produce better agent output. Better output means fewer corrections, less time fighting the tool, more time building — which includes building better docs. Each turn of the loop tightens.

The inverse is more common and equally compounding. Poor docs produce poor agent output. Developers lose trust in the tools and stop using them, or they pay the manual context tax every session. The team falls behind on documentation because everyone is too busy compensating for bad agent suggestions. The next interaction is worse than the last.

This is why documentation quality is no longer a developer-productivity issue. It is a competitive position. A team with strong documentation infrastructure runs forty-two percent^[6] of its code through an agent that actually understands the system. A team without it runs forty-two percent through an agent that is guessing. Same tool. Same model. Wildly different outcomes — and the gap widens with every commit.

Vicious Loop

Agent suggestions miss conventions — developers override or abandon AI tools
Context provided manually each session — thirty-plus minutes a day per developer
New hires onboard slowly because tribal knowledge is undocumented
Architecture decisions lost — teams re-litigate settled questions
Documentation seen as overhead — never funded in sprint planning

Virtuous Loop

Agent suggestions match conventions — developers extend agent output instead of fighting it
Context loaded automatically via CLAUDE.md and MCP — near-zero prep time per session
New hires (human and agent) productive in days because the context surface is structured
Architecture decisions indexed and retrievable — agent cites them in proposals
Documentation treated as infrastructure — tested, owned, budgeted alongside code

Four Weeks From Afterthought to Infrastructure.

A specific, ordered plan. Audit, bootstrap, enforce, measure. Each week answers the failure mode the last one exposed.

[01]

Week 1: Audit and consolidate

bash

# Find every doc scattered outside the repo.
# Notion, Confluence, Google Drive, Slack bookmarks. Pull the inventory.
# For each: migrate to repo, archive, or delete. Default to delete.

# Lay down the canonical structure.
mkdir -p docs/{architecture,api,runbooks,onboarding,decisions}

# Frontmatter template — the contract for every new doc.
cat > docs/.template.md << 'EOF'
---
title: [Document Title]
owner: [github-username]
last-verified: [YYYY-MM-DD]
audience: [engineers | all | ops]
staleness-threshold: 90
---
EOF

[02]

Week 2: Write the bootstrap files

bash

# Author CLAUDE.md (or the equivalent for your AI tool).
# Slow facts, navigation pointers, anti-patterns. Nothing that changes weekly.
# Target: under 300 lines, every line applies universally.

# Generate llms.txt from the docs directory.
# One-sentence description plus path per entry.
# -print emits each path ahead of its first three lines, so the draft keeps both.
find docs/ -name '*.md' -print -exec head -3 {} \; > llms.txt.draft

# Validation question: can the agent find what it needs
# from CLAUDE.md plus llms.txt alone? If not, the bootstrap is leaking.
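
Once the hand-edited draft settles, generation can move into a script. The sketch below builds an llms.txt body from a mapping of paths to file contents, taking each title from frontmatter and the first body line as the one-sentence description. It assumes well-formed frontmatter and a `docs/<section>/<file>.md` layout. Section names are derived mechanically from directory names, and `build_llms_txt` is a hypothetical helper, not a tool any docs platform ships.

```python
from collections import defaultdict


def title_and_summary(text: str, fallback: str) -> tuple[str, str]:
    """Take the title from frontmatter and the first body line as summary."""
    lines = text.splitlines()
    title, body_start = fallback, 0
    if lines and lines[0].strip() == "---":
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                body_start = i + 1
                break
            if line.startswith("title:"):
                title = line.split(":", 1)[1].strip()
    for line in lines[body_start:]:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return title, stripped
    return title, ""


def build_llms_txt(site_title: str, docs: dict[str, str]) -> str:
    """docs maps repo-relative paths (e.g. 'docs/api/auth.md') to file text."""
    sections = defaultdict(list)
    for path in sorted(docs):
        parts = path.split("/")
        section = parts[1].replace("-", " ").title() if len(parts) > 2 else "Docs"
        title, summary = title_and_summary(docs[path], fallback=parts[-1])
        entry = f"- [{title}](/{path})"
        sections[section].append(f"{entry}: {summary}" if summary else entry)
    out = [f"# {site_title}", ""]
    for section in sorted(sections):
        out += [f"## {section}", *sections[section], ""]
    return "\n".join(out)


SAMPLE_DOCS = {
    "docs/api/auth.md": "---\ntitle: Authentication\nowner: alice\n---\nOAuth2 and API key flows.\n",
    "docs/runbooks/deploy-rollback.md": "# Deploy Rollback\nRoll back a failed deploy.\n",
}

print(build_llms_txt("Acme Platform Documentation", SAMPLE_DOCS))
```

To run it against a real tree, one option is to build the mapping with `pathlib`, for example `{p.as_posix(): p.read_text() for p in Path("docs").rglob("*.md")}`.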

[03]

Week 3: Enforce in CI

bash

# Wire freshness checks into the CI pipeline.
# Add structural validation — frontmatter, links, headers, hierarchy.
# Add llms.txt sync — entries match actual files, no orphans.

# Staleness thresholds per document type:
# API docs: 30 days | Runbooks: 60 days
# Architecture: 180 days | CLAUDE.md: 14 days

# Run the first audit. Expect failures. Failures are the point.
bun run docs:freshness --report
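
The llms.txt sync step can be a small set comparison. A minimal sketch, assuming entries link to `.md` paths relative to the repository root: it reports files that llms.txt lists but that no longer exist, and docs that exist but are not listed. The function name and the report keys are hypothetical.

```python
import re

# Captures the path in "](/docs/api/auth.md)" without the leading slash.
LINK_TARGET = re.compile(r"\]\(/?([^)\s]+\.md)\)")


def sync_report(llms_txt: str, doc_paths: set[str]) -> dict[str, set[str]]:
    """Compare linked .md paths in llms.txt with the files that exist."""
    listed = set(LINK_TARGET.findall(llms_txt))
    return {
        "missing_files": listed - doc_paths,  # listed but not in the repo
        "unlisted_docs": doc_paths - listed,  # in the repo but not listed
    }


LLMS_TXT = """## API Reference
- [Authentication](/docs/api/auth.md): OAuth2 and API key flows
- [Legacy Tokens](/docs/api/old.md): deprecated token flow
"""
ON_DISK = {"docs/api/auth.md", "docs/api/new.md"}

report = sync_report(LLMS_TXT, ON_DISK)
print(sorted(report["missing_files"]), sorted(report["unlisted_docs"]))
```

Either set being non-empty is a reasonable condition for failing the build, since both mean the discovery layer no longer describes the repository.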

[04]

Week 4: Measure and iterate

bash

# Establish baseline metrics:
# - Context hit rate (share of agent queries finding fresh docs)
# - First-prompt accuracy (share of agent code correct first attempt)
# - Context prep time (minutes per day developers spend feeding context)

# Weekly freshness reports auto-posted to the team channel.
# Every unowned file gets an owner — no exceptions.
# Doc review on the sprint planning calendar, not optional.

Operating Doctrine

The questions teams ask after the first audit fails. The answers settle them.

Our team barely writes documentation now. How do we change the culture?

Do not try. Culture lectures do not produce documentation. The system that surrounds the writing does. Add frontmatter templates so the format is obvious. Add CI checks so missing docs block merges. Add ownership fields so a specific person is accountable. When documentation is part of the definition of done — like tests — it happens. When it is optional, it does not. The leverage is structural, not motivational.

Should we generate documentation with AI instead of writing it manually?

AI-generated documentation is fine for code-level surfaces — function signatures, API references, type definitions. It is the wrong tool for architecture decisions, runbooks, and context docs, which carry the most weight for agent context quality. Use AI to draft the mechanical docs. Write the strategic docs by hand. The failure mode to watch: agent-generated docs that sound authoritative but describe library defaults rather than how your team actually uses the library. Domain owner reviews everything before it enters the canonical store. No exceptions.

How does llms.txt relate to MCP servers? Do we need both?

llms.txt is a static file every AI tool can read with no setup. MCP servers serve dynamic context — query databases, check live system state, return personalized responses. Different jobs. Start with llms.txt because it ships in thirty minutes and works everywhere. Reach for MCP when the documentation surface outgrows static or when live data is the actual constraint. Most teams need llms.txt yesterday and MCP six months from now.

What about documentation for non-engineering teams?

The constraints are identical. Sales playbooks, support runbooks, HR policy docs — anywhere agents consume organizational knowledge, the same three properties have to hold: structure, freshness enforcement, named ownership. The tooling differs because not every team uses git. The infrastructure mindset does not. If an agent reads it, it is infrastructure.

Our docs are in Confluence or Notion. Do we have to migrate everything?

No, but you need a bridge. Some teams stand up MCP servers that expose Notion or Confluence content to AI tools. Others sync the load-bearing docs into the repo via automation. The constraint that decides the answer: if your AI coding tools cannot reach the docs, the docs do not exist for code generation. Pick the bridge that matches the workflow you actually run, then enforce the same freshness bar on the bridge that you enforce on in-repo docs.

Pre-Production Documentation Infrastructure Checklist

All load-bearing documentation co-located in the repository, or bridged via MCP with the same freshness bar
CLAUDE.md (or equivalent) carries slow facts, navigation pointers, and anti-patterns — under 300 lines
llms.txt generated and kept in sync with the docs directory by CI, not by hand
Frontmatter contract enforced: title, owner, last-verified, audience, staleness-threshold
CI pipeline validates documentation freshness on a schedule, not on demand
CI pipeline validates documentation structure on every PR, before merge
Every doc file has a named, currently-employed owner — no team aliases
Code examples in docs tested against the current codebase, not last quarter's snapshot
Context hit rate and first-prompt accuracy tracked and visible to the team
Documentation review on the sprint planning calendar — not optional

Key terms in this piece

documentation infrastructure, AI context quality, docs-as-code, llms.txt, MCP servers, CLAUDE.md, documentation freshness, AI-assisted development, machine-readable documentation, documentation testing

Sources

[1] llms.txt Specification (llmstxt.org)
[2] InfoQ, Google Documentation AI Agents (infoq.com)
[3] Fern, How To Write LLM-Friendly Documentation (buildwithfern.com)
[4] Factory AI, The Context Window Problem (factory.ai)
[5] Snowflake, Impact of Retrieval and Chunking in Finance RAG (snowflake.com)
[6] ShiftMag, State of Code 2025 (shiftmag.dev)
[7] Anthropic, Claude Code Best Practices (code.claude.com)
[8] Anthropic, Effective Context Engineering For AI Agents (anthropic.com)
[9] Document360, AI Documentation Trends (document360.com)
[10] ClickHelp, Documentation 2026: From Human-Centric to AI-First (clickhelp.com)

Share this article

X LinkedIn Hacker News

Your Docs Are Now Your AI's Runtime. Most Teams Have Not Noticed.

Data, Context & KnowledgeintermediateSep 16, 20258 min read

By Viktor Bezdek · VP Engineering, Groupon

# Acme Platform Documentation # One file. Discovery layer for every agent that hits the site. > Acme Platform is a data orchestration layer for ML pipelines. > This file points agents at the canonical sources. ## API Reference - [Authentication](/docs/api/auth.md): OAuth2 and API key flows - [Pipelines API](/docs/api/pipelines.md): create, configure, monitor pipelines - [Transforms API](/docs/api/transforms.md): define and chain transformations ## Architecture - [System Overview](/docs/arch/overview.md): high-level architecture and data flow - [Data Model](/docs/arch/data-model.md): core entities, relationships, constraints ## Guides - [Quick Start](/docs/guides/quickstart.md): first pipeline in under 5 minutes - [Migration from v2](/docs/guides/migration-v2-v3.md): breaking changes and upgrade path

repo/ ├── docs/ │ ├── architecture/ │ │ ├── system-overview.md │ │ ├── data-model.md │ │ └── decisions/ │ │ ├── ADR-001-database-choice.md │ │ └── ADR-002-auth-provider.md │ ├── api/ │ │ ├── openapi.yaml │ │ ├── auth.md │ │ └── endpoints.md │ ├── runbooks/ │ │ ├── incident-response.md │ │ └── deploy-rollback.md │ └── onboarding/ │ ├── setup.md │ └── conventions.md ├── CLAUDE.md ├── llms.txt └── .github/workflows/docs-freshness.yml

# .github/workflows/docs-freshness.yml # Stale docs fail CI. The owner gets named in the warning. No exceptions. name: Documentation Freshness Check on: schedule: - cron: '0 9 * * 1' # Every Monday 9 AM push: paths: ['docs/**'] jobs: freshness: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 with: fetch-depth: 0 # Full history for git log dates - name: Check document freshness run: | STALE_THRESHOLD_DAYS=90 STALE_FILES="" for file in $(find docs/ -name '*.md'); do LAST_MODIFIED=$(git log -1 --format='%ct' -- "$file") NOW=$(date +%s) AGE_DAYS=$(( (NOW - LAST_MODIFIED) / 86400 )) if [ $AGE_DAYS -gt $STALE_THRESHOLD_DAYS ]; then OWNER=$(head -5 "$file" | grep -oP '(?<=owner: ).*' || echo 'unowned') STALE_FILES="$STALE_FILES\n$file ($AGE_DAYS days, owner: $OWNER)" fi done if [ -n "$STALE_FILES" ]; then echo "::warning::Stale docs found:$STALE_FILES" exit 1 fi

Document Type

Staleness Threshold

Owner

Review Trigger

API reference

30 days

API team lead

Any endpoint change in OpenAPI spec

Architecture decisions (ADRs)

180 days

Original author

Related system metric change

Runbooks

60 days

On-call rotation lead

Any incident that ran the runbook

Onboarding guides

90 days

Engineering manager

New-hire feedback or tooling change

CLAUDE.md / AI context

14 days

Tech lead

Any convention or dependency change

llms.txt

Auto-generated

CI pipeline

Any doc added, moved, or deleted

The best CLAUDE.md files follow a progressive disclosure pattern. They do not dump every fact the agent might one day need. They carry exactly three things:

Slow facts. Team conventions, architecture decisions, naming patterns. Things that change quarterly, not daily. If it changes weekly, it does not belong here.
Navigation pointers. Where to find specific kinds of information. "Architecture decisions live in docs/decisions/. Runbooks live in docs/runbooks/. API reference is generated from openapi.yaml." The agent searches efficiently instead of wandering — and the agent that wanders burns tokens and produces drift.
Anti-patterns. What NOT to do. The highest-leverage sentences in any CLAUDE.md file, because they prevent the agent from making the same mistakes that already burned the team.

# Project Context
# Bootstrap file. Slow facts, navigation, anti-patterns. Nothing that changes weekly.

## Architecture
Monorepo: Next.js frontend + Python ML services + shared protobuf schemas.
Services communicate via gRPC. REST is only for public-facing APIs.

## Where to Find Things
- Architecture decisions: `docs/decisions/ADR-*.md`
- API reference: auto-generated from `proto/`; do not edit `docs/api/` directly
- Runbooks: `docs/runbooks/`; each has an owner in frontmatter
- Environment configs: `deploy/envs/`; never hardcode env values

## Conventions
- Branch naming: `type/TICKET-description` (e.g., `feat/PLAT-123-add-caching`)
- Tests required for every new endpoint; match the `*_test.go` pattern
- No direct database queries from API handlers; use the repository pattern

## Do NOT
- Import from `internal/legacy/`: migration in progress, removed Q2
- Use `fmt.Println` for logging: structured logger lives in `pkg/log`
- Skip the linter: `make lint` must pass before PR review
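A bootstrap file only stays a bootstrap file if something checks it. The following is a rough sketch of such a check, under two assumptions: the heading names are taken from the example file above rather than from any standard, and the line budget is the "under 300 lines" target this article sets for the bootstrap file.

```python
MAX_LINES = 300  # the article's target: stay under this many lines
# Heading names are assumptions taken from the example file above.
REQUIRED_HEADINGS = ("## Where to Find Things", "## Conventions", "## Do NOT")


def lint_bootstrap(text: str) -> list[str]:
    """Return a list of problems; an empty list means the file passes."""
    lines = text.splitlines()
    problems = []
    if len(lines) >= MAX_LINES:
        problems.append(f"too long: {len(lines)} lines (target: under {MAX_LINES})")
    headings = {line.strip() for line in lines if line.startswith("## ")}
    for required in REQUIRED_HEADINGS:
        if required not in headings:
            problems.append(f"missing section: {required}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a CI job report everything wrong with the file in a single run.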

# Find every doc scattered outside the repo.
# Notion, Confluence, Google Drive, Slack bookmarks. Pull the inventory.
# For each: migrate to repo, archive, or delete. Default to delete.

# Lay down the canonical structure.
mkdir -p docs/{architecture,api,runbooks,onboarding,decisions}

# Frontmatter template: the contract for every new doc.
cat > docs/.template.md << 'EOF'
---
title: [Document Title]
owner: [github-username]
last-verified: [YYYY-MM-DD]
audience: [engineers | all | ops]
staleness-threshold: 90
---
EOF
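A frontmatter contract is only useful if a structural test enforces it. Here is a minimal sketch of that test, assuming flat `key: value` frontmatter with the five keys from the template. The parser is deliberately simplistic (no nesting, no lists); a real pipeline would more likely use a YAML library.

```python
from datetime import date

# Keys taken from the frontmatter template.
REQUIRED_KEYS = ("title", "owner", "last-verified", "audience", "staleness-threshold")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs between two `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # closing delimiter never found


def validate_frontmatter(text: str) -> list[str]:
    """Return a list of problems; an empty list means the doc passes."""
    fields = parse_frontmatter(text)
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if not fields.get(k)]
    verified = fields.get("last-verified", "")
    if verified:
        try:
            date.fromisoformat(verified)
        except ValueError:
            problems.append(f"last-verified is not an ISO date: {verified}")
    if fields.get("owner", "").startswith("["):
        problems.append("owner is still the template placeholder")
    return problems
```

The placeholder check matters because a copied template with `[github-username]` left in place technically has an owner field while naming nobody.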

# Author CLAUDE.md (or the equivalent for your AI tool).
# Slow facts, navigation pointers, anti-patterns. Nothing that changes weekly.
# Target: under 300 lines, every line applies universally.

# Generate llms.txt from the docs directory.
# One-sentence description plus path per entry.
# -print writes each path ahead of that file's first three lines.
find docs/ -name '*.md' -print -exec head -3 {} \; > llms.txt.draft

# Validation question: can the agent find what it needs
# from CLAUDE.md plus llms.txt alone? If not, the bootstrap is leaking.
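Once the draft exists, generation can move from a one-off `find` to something CI can rerun. The sketch below shows one possible shape, assuming each doc's first `# ` heading is an acceptable link title and that top-level folders under `docs/` map to sections. The function names are hypothetical, and one-sentence descriptions are left out because the frontmatter template has no field to source them from.

```python
from pathlib import Path


def first_heading(text: str, fallback: str) -> str:
    """Use the first `# ` heading as the link title, else the fallback."""
    for line in text.splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return fallback


def build_llms_txt(repo_root: Path, site_title: str, summary: str) -> str:
    """Render an llms.txt index with one section per top-level docs/ folder."""
    docs = repo_root / "docs"
    sections: dict[str, list[str]] = {}
    for path in sorted(docs.rglob("*.md")):
        if path.name.startswith("."):
            continue  # skip hidden files such as a frontmatter template
        parts = path.relative_to(docs).parts
        group = parts[0] if len(parts) > 1 else "general"
        title = first_heading(path.read_text(encoding="utf-8"), path.stem)
        rel = path.relative_to(repo_root).as_posix()
        sections.setdefault(group, []).append(f"- [{title}](/{rel})")
    out = [f"# {site_title}", "", f"> {summary}", ""]
    for group, links in sections.items():
        out += [f"## {group}", *links, ""]
    return "\n".join(out)
```

Because the output is a pure function of the files on disk, a CI job can regenerate it and fail the build when the committed `llms.txt` differs.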

# Wire freshness checks into the CI pipeline.
# Add structural validation: frontmatter, links, headers, hierarchy.
# Add llms.txt sync: entries match actual files, no orphans.

# Staleness thresholds per document type:
#   API docs: 30 days      | Runbooks: 60 days
#   Architecture: 180 days | CLAUDE.md: 14 days

# Run the first audit. Expect failures. Failures are the point.
bun run docs:freshness --report
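The llms.txt sync rule ("entries match actual files, no orphans") reduces to a set comparison. A minimal sketch, assuming links use the root-relative `/docs/...md` form shown in the example index; `check_sync` is a hypothetical helper name, not part of any existing tool.

```python
import re
from pathlib import Path

# Matches markdown links whose target is a root-relative .md path.
LINK = re.compile(r"\[[^\]]*\]\((/[^)\s]+\.md)\)")


def check_sync(llms_text: str, repo_root: Path) -> tuple[set[str], set[str]]:
    """Return (dead_links, orphans) as sets of repo-relative paths.

    dead_links: listed in llms.txt but missing on disk.
    orphans: present under docs/ but not listed in llms.txt.
    """
    listed = {target.lstrip("/") for target in LINK.findall(llms_text)}
    on_disk = {
        p.relative_to(repo_root).as_posix()
        for p in (repo_root / "docs").rglob("*.md")
        if not p.name.startswith(".")  # ignore hidden files such as templates
    }
    return listed - on_disk, on_disk - listed
```

A CI step would fail when either set is non-empty: a dead link sends an agent to a file that does not exist, and an orphan is a doc no agent will discover.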

# Establish baseline metrics:
# - Context hit rate (share of agent queries finding fresh docs)
# - First-prompt accuracy (share of agent code correct first attempt)
# - Context prep time (minutes per day developers spend feeding context)

# Weekly freshness reports auto-posted to the team channel.
# Every unowned file gets an owner. No exceptions.
# Doc review on the sprint planning calendar, not optional.
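The three baseline metrics are plain ratios and averages once the raw events are logged. The sketch below assumes a hypothetical log format: one record per agent query carrying two booleans, plus a list of self-reported prep minutes per developer-day. How those events get captured is left open, and the record and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class AgentQuery:
    # Hypothetical log record; field names are illustrative.
    found_fresh_doc: bool
    correct_first_attempt: bool


def baseline(queries: list[AgentQuery], daily_prep_minutes: list[float]) -> dict[str, float]:
    """Compute the three baseline metrics from logged events."""
    if not queries or not daily_prep_minutes:
        raise ValueError("need at least one query and one prep-time sample")
    n = len(queries)
    return {
        # Share of agent queries that found a fresh doc.
        "context_hit_rate": sum(q.found_fresh_doc for q in queries) / n,
        # Share of agent outputs that were correct on the first attempt.
        "first_prompt_accuracy": sum(q.correct_first_attempt for q in queries) / n,
        # Mean minutes per developer-day spent feeding context by hand.
        "context_prep_minutes": sum(daily_prep_minutes) / len(daily_prep_minutes),
    }
```

Capturing these numbers before enforcement starts is what makes the later weekly reports comparable to something.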

Stale Docs Do Not Just Produce Bad Output. They Produce Confident Bad Output, At Scale.

Human-Readable and Machine-Readable Are Not the Same File.

llms.txt Is the Discovery Layer. MCP Is the Runtime.

If It Is Not in the Repo, It Does Not Exist.

AI-Native Documentation Structure

Stale Docs Are the Default State. CI Is the Only Thing That Reverses It.

From Markdown to Model: The Pipeline That Has to Hold.

Two Audiences, One Surface. The Constraints Overlap More Than You Think.

Lead with a purpose statement, not a preamble

Use semantic headers, not clever ones

Make every section self-contained

Mark facts and opinions differently

CLAUDE.md Is the Bootstrap File. Treat It Like One.

Documentation ROI Was Theoretical. Agents Made It Concrete.

Documentation Without Tests Is Documentation You Cannot Trust.

Structural tests (run on every PR)

Freshness tests (run on schedule)

Semantic tests (run weekly or on major changes)

Same Tool, Same Model, Wildly Different Outcomes.

Four Weeks From Afterthought to Infrastructure.

Week 1: Audit and consolidate

Week 2: Write the bootstrap files

Week 3: Enforce in CI

Week 4: Measure and iterate

Operating Doctrine

Pre-Production Documentation Infrastructure Checklist

Related
