Every Claude session starts from zero unless something carries the org forward. CLAUDE.md is that something — a persistent context layer that encodes team topology, current priorities, and the decisions you have already paid for. Treat it as a config file and you keep paying the coordination tax.
Exactly how CLAUDE.md files load — scopes, hierarchy, token costs, and what survives compaction
The seven sections that convert a config file into institutional memory
Skills vs CLAUDE.md vs MCP: the three-layer architecture and its failure modes
The /update-context weekly ritual that keeps context from rotting
Multi-team monorepo patterns with path-scoped rules and claudeMdExcludes
Monday-morning checklist: a 30-minute setup from zero to org OS
Most teams write a CLAUDE.md that lists their lint rules and a one-line project description. That's a config file. It misses the actual mechanism.
The file loads automatically at the start of every Claude Code session — concatenated into context before you type a word. That single property makes it the highest-leverage piece of persistent context in your entire workflow. Every conversation that doesn't start with the file ends up retyping the same context: who owns what, this quarter's priorities, the V1 API you already decided to deprecate. Each retype is coordination tax. It compounds across every engineer and every session.
Reframe it: CLAUDE.md is the operating system for how your team works with Claude. Not a README. Not a config file. A living document that encodes how the org thinks, decides, and communicates — loaded into every session by default.
One constraint, named upfront: the value of this system tracks the honesty of the maintenance. A CLAUDE.md full of priorities the team has quietly abandoned produces confidently misaligned advice — worse than no file at all. The weekly update ritual isn't optional. It's what keeps the rest from rotting.
The loading mechanism determines everything: what to put in the file, where to put it, and what to keep out.
Claude Code discovers CLAUDE.md files by walking up the directory tree from your working directory. If you launch from services/payments/, Claude reads services/payments/CLAUDE.md, services/CLAUDE.md, and the root CLAUDE.md — all concatenated into context, ordered from filesystem root down to your working directory. The most specific file appears last, giving project-level instructions the highest recency.
Subdirectory CLAUDE.md files load differently. A file in src/api/ doesn't load at session start — it loads on demand the first time Claude reads a file in that directory. That's the right behavior: API conventions are noise when you're working on the data pipeline.[9]
Four scopes exist, each serving a different ownership layer:[9]
| Scope | Location | Loaded By | Shared With |
|---|---|---|---|
| Managed policy | /etc/claude-code/CLAUDE.md (Linux) or equivalent | IT/DevOps via MDM | All users on the machine — cannot be excluded |
| User | ~/.claude/CLAUDE.md | Individual engineer | That engineer across all projects |
| Project | ./CLAUDE.md or ./.claude/CLAUDE.md | Committed to version control | Entire team via source control |
| Local | ./CLAUDE.local.md (gitignored) | Individual engineer | That engineer, current project only |
The managed policy scope is underused. Organizations running Claude Code across a fleet of developer machines can deploy a centrally managed CLAUDE.md through MDM, Group Policy, or Ansible. It loads before any user or project file, and engineers can't exclude it via claudeMdExcludes. Use it for org-wide compliance requirements and behavioral standards — not build commands, which belong in the project file.[9]
Token cost matters here. CLAUDE.md content enters the context window as a user message after the system prompt, consuming tokens on every session start. The official guidance: target under 200 lines per file. Longer files reduce adherence — not because Claude ignores them, but because the signal-to-noise ratio drops and important instructions get buried.[9]
One gotcha that burns teams: imported files via @path/to/file load in full at launch alongside the base CLAUDE.md. Splitting a 300-line file into three @imports doesn't reduce token cost — it just improves human readability. To actually reduce startup context, use path-scoped rules instead.
A .claude/rules/ directory lets you load instructions only when Claude works with matching files — real token savings, not organizational convenience.
The .claude/rules/ directory is separate from CLAUDE.md and solves a specific problem: instructions that apply to one part of the codebase but not others. You pay token cost for every rule that loads unconditionally. Path-scoped rules change the math.[9]
A rules file with paths frontmatter only loads when Claude reads files matching the pattern. An API-design rule scoped to src/api/**/*.ts costs zero tokens when you're fixing a frontend bug:
Rules without paths frontmatter load at launch with the same priority as .claude/CLAUDE.md. That's the right scope for project-wide conventions: commit format, test runner commands, naming conventions. Rules with paths are the right scope for domain-specific constraints that would just be noise outside that domain.
Monorepos with multiple teams can use claudeMdExcludes in .claude/settings.local.json to skip CLAUDE.md files from other teams. This keeps each engineer's context clean without forcing a central debate about what goes in the root file.
The gap isn't model quality. It's what the model was given to start with.
Open a Claude session without organizational context and you start at zero. The model doesn't know your team lead is on parental leave, that your infra budget got cut last month, or that the board approved an enterprise pivot last Tuesday.
Those facts shape every recommendation. Without them, the answers are technically sound and organizationally tone-deaf. With them, you get a collaborator that understands the constraints, respects ownership boundaries, and stays inside what the team already committed to.
Practitioner reports converge on roughly 40-60% less time spent on session-setup prompts when persistent org context is loaded — though the spread is wide and depends on team size and workflow shape.[2] The bigger gain shows up downstream: outputs need fewer correction cycles because the model isn't guessing at the org topology.
The official guidance from Anthropic's docs names when to add something to CLAUDE.md: when Claude makes the same mistake a second time, when a code review catches something Claude should have known about the codebase, or when you type the same correction into chat that you typed last session.[9] Each of those is a coordination tax. The file is where you pay it once.
Each section closes a specific gap the model can't infer from your code.
A useful organizational CLAUDE.md covers seven areas. Few teams need all seven on day one — start with two or three and expand as the gaps surface. The constraint is simple: capture context Claude can't infer from the code but that shapes every decision.
Teams, leads, what they own. Reporting lines if they affect routing. When Claude proposes a change to a service, it should know which team to tag for review and whose roadmap absorbs the impact. Without this, every suggestion routes to the same default reviewer regardless of ownership.
For each EM, document scope (reports, services owned) and known strengths. If the infra EM has deep Kubernetes expertise, the model should weight that when proposing migration paths. The output gets sharper because the constraint is named.
The current quarter's top 3-5 priorities with one-line rationale each. This section prevents Claude from suggesting work that conflicts with what leadership already committed to. Without it, you get plausible suggestions the org has already ruled out.
When and how to escalate. On-call rotation. Severity definitions. Final decision authority on cross-team disputes. Documented escalation prevents the failure mode where every borderline incident routes to whoever is most senior in Slack at 2am.
Spell out how the team works: async-first with Slack threads, synchronous standups, RFC-driven decisions, or some combination. Claude should draft communications that match the team's actual mode. Output that fits the channel ships. Output that doesn't gets rewritten.
Feature flags, A/B tests, architectural spikes currently running. The model should know not to touch code behind an active experiment without explicit approval. The blast radius of a stray refactor inside a live test is real and avoidable.
A running log of consequential decisions: date, participants, reasoning. This is the highest-leverage section in the file. It stops Claude from relitigating settled debates and gives the model the 'why' behind your architecture. Without it, the same trade-off gets re-argued every quarter as people roll on and off the team.
Stuff everything in one layer and the system breaks in predictable ways. Distribute it correctly and each layer pays for itself.
| Layer | Purpose | When to Use | Token Cost | Example |
|---|---|---|---|---|
| CLAUDE.md | Declarative knowledge — what and why | Facts the model needs every session | Full load every session; target ≤200 lines | Team topology, quarterly priorities, decision log |
| Skills (.claude/skills/) | Procedural knowledge — how | Multi-step workflows triggered on demand | ~100 tokens metadata at startup; full SKILL.md only when triggered | /deploy, /update-context, /write-rfc |
| Path-scoped rules (.claude/rules/) | Domain-specific constraints | Standards for a specific file pattern or directory | Conditional: zero tokens until matching file is read | API validation rules, frontend component standards |
| MCP Servers | External connectivity — live data | Real-time data from tools and databases | ~55K tokens for 5 servers + 58 tools | GitHub PRs, Jira tickets, Slack messages |
The most common failure is stuffing every fact into CLAUDE.md. A 300-line file dilutes signal with noise — the section the model actually needs gets buried. Working ceiling: roughly 100-150 lines in the root file, with @imports pulling in detailed sections for human organization (not token reduction) and path-scoped rules handling domain specifics. Calibrate against what your team actually consults.
Skills sit in the right layer for recurring procedures. Deployment checklist, incident response playbook, RFC template — procedural, not declarative. The token economics work in your favor: ~100 tokens of metadata loads at startup so Claude knows the skill exists; the full instructions enter context only when triggered.[10] Skills cost effectively nothing until invoked.
MCP closes the live-data gap. CLAUDE.md can say 'we use Linear for project tracking' but can't show the current sprint board. An MCP server for Linear bridges that — real-time access without pasting ticket details into every conversation.[7]
Three levels of loading — only one runs at startup. Getting this wrong doubles your token costs for no gain.
Skills use progressive disclosure: content loads in stages as needed, not all at once. Anthropic's official documentation names three distinct levels:[10]
Level 1 — Metadata (~100 tokens, always loaded at startup): The YAML frontmatter name and description fields. Claude reads these at session start to know each skill exists and when to use it. You can install dozens of skills without meaningful context penalty.
Level 2 — Instructions (under 5K tokens, loaded when triggered): The SKILL.md body — workflows, best practices, example code. Claude reads this from the filesystem via bash when your request matches the skill description. Only then does this content enter the context window.
Level 3 — Resources (effectively unlimited, loaded as needed): Additional files bundled in the skill directory — companion markdown files, executable scripts, reference material. Scripts run via bash and only their output enters context; the script code itself never does. This is why a skill can bundle hundreds of lines of Python without any context penalty unless the script runs.
500-line CLAUDE.md: deployment runbook, code standards, team topology, decisions all mixed together
Every session loads all 500 lines regardless of task
Instructions buried — adherence drops as file grows
Procedures update in CLAUDE.md; nobody knows if they're current
Live data (sprint status, open PRs) pasted manually each session
Root CLAUDE.md: ≤150 lines of org facts — topology, priorities, decisions
Deployment runbook → Skill: only loads for deploy-related prompts
Path-scoped rules: API standards only load when working in src/api/
/update-context skill proposes amendments weekly; human approves
MCP server surfaces live Linear sprint, GitHub PRs on demand
Static documentation decays. The fix isn't discipline. It's a structured weekly mechanism.
Static documentation rots. That's the failure mode every persistent context system inherits — the org moves on while the file sits unchanged. Untracked drift turns the OS into a confidently misaligned advisor.
The /update-context skill closes this loop. Once a week, you run it. The skill prompts Claude to scan recent changes across the codebase, the decision log, and team communications, then proposes specific amendments to your CLAUDE.md files. You review, approve or modify, and the org context stays current. Maintenance becomes a structured conversation instead of a chore that nobody owns.
Important: the skill proposes. It never auto-commits. Org context is too high-leverage for unsupervised edits — a wrong amendment misroutes every session that loads it.
Nested CLAUDE.md files give each team their own context surface. The failure mode is cross-team drift, not the nesting itself.
treemonorepo/
├── CLAUDE.md
│ └── # Global: org priorities, escalation paths, communication norms
├── .claude/
│ ├── rules/
│ │ ├── security.md # paths: ["**/*.ts", "**/*.py"] — PCI rules, never store raw card data
│ │ └── testing.md # Unconditional: test framework, coverage thresholds
│ └── skills/
│ ├── update-context/SKILL.md
│ └── write-rfc/SKILL.md
├── teams.md
│ └── # @imported: full team roster and ownership map
├── decisions.md
│ └── # @imported: organization-wide decision register
└── services/
├── payments/
│ └── CLAUDE.md # Stripe migration status, PCI compliance constraints
├── platform/
│ └── CLAUDE.md # Latency targets, infra constraints, k8s conventions
└── frontend/
└── CLAUDE.md # Design system version, browser support matrixClaude loads CLAUDE.md files hierarchically. Working in services/payments/ means the model sees both the root org context and the payments-specific context — but the subdirectory file loads on demand, not at session start, unless you're already working in that directory. Each team carries their own file without duplicating global information.
The leverage: the payments team documents Stripe-specific conventions, PCI compliance constraints, and migration timeline without polluting the file every other team sees. They still inherit org-wide priorities and the decision register from the root.
Here's the failure mode we hit early in multi-team deployments: every team writes context independently, without coordination. Within two months, three teams had contradictory escalation paths in their files, and two had different definitions of 'P1 incident.' The model then gives confidently inconsistent answers about which severity scale applies to which service.
The fix is a hard rule: the root CLAUDE.md owns all cross-team definitions. Sub-directory files reference root definitions — they don't redefine them. If a team needs to adjust the escalation definition for their domain, they open a PR against the root. That PR is the coordination mechanism. Without it, drift compounds silently.
Two specific failure modes the register prevents — both expensive, both invisible until they appear.
Teams make dozens of consequential decisions each quarter. Six months later, nobody remembers why you chose Postgres over DynamoDB, or why billing stayed a monolith. The decision register captures the reasoning at the moment it was made, blocking two specific failure modes.
First: it stops Claude from suggesting approaches the team already considered and rejected. Without the register, the model builds a case for the migration you ruled out three weeks ago. With it, the model knows the decision was made, understands the rationale, and works inside the constraint instead of relitigating it.
Second: it absorbs organizational amnesia when people leave. The EM who championed the Postgres choice rolls to a different team. The reasoning survives in the register — the institutional memory is no longer locked in a specific person's head. This is the closest thing to externalizing your org's reasoning capacity.
Format matters. An undated entry with no participants loses context within weeks. The minimum viable entry: date, one-sentence decision, one-sentence rationale, named participants. Link to the full RFC or ADR where it exists. That link is the difference between a pointer and a copy — the copy drifts, the pointer stays current.
Claude proposes DynamoDB for audit logs — you spend 10 minutes re-explaining why it was ruled out
New engineer suggests splitting billing into microservices — relitigates a settled debate
The security constraints behind the auth architecture are nowhere documented
Every session restarts the same context-rebuild cycle
When the proposing EM leaves, the reasoning leaves with them
Claude reads the register and works inside existing constraints by default
New engineer sees the decision, rationale, and participants — onboard ramp shortens
Security constraints sit next to the decision they shaped
Every session starts with full historical context already loaded
Reasoning is captured at the moment of decision — survives any individual departure
Use @imports for detailed sections. A bloated root dilutes the signal that matters most. Official guidance targets under 200 lines per file for adherence; staying under 150 leaves headroom for growth.
CLAUDE.md encodes what is true about the org. Procedural how-to belongs in Skills. Mixing them produces a file that fails at both.
Undated decisions lose context fast. Date, participants, one-sentence rationale — every time, no exceptions.
The file is version-controlled. Reference secrets by name only — keep the values in env vars or a secrets manager. This applies to API keys, tokens, passwords, and any value that would damage you in a public commit.
Instructions that only apply to src/api/ shouldn't load when you're working in the frontend. Scope them with paths frontmatter in .claude/rules/ — the token savings add up across large teams.
Stale context is worse than missing context. Archive completed priorities and resolved experiments. Drift is the default state without active pruning.
Link to the canonical source. Duplicated content drifts out of sync with the original within weeks. CLAUDE.md should say 'see docs/decisions/0042-postgres.md' not re-copy the content.
Auto memory (Claude Code v2.1.59+) lets Claude write its own notes across sessions — build commands it discovers, debugging patterns, style preferences it learns from your corrections. The first 200 lines of MEMORY.md (or 25KB, whichever comes first) load at session start. Topic files like debugging.md are machine-local and not in version control.
Use CLAUDE.md for org-shared facts you write deliberately — topology, priorities, decisions. Use auto memory for individual learned patterns Claude discovers on its own. They're complementary: CLAUDE.md is the team constitution; auto memory is each engineer's working notes. Don't duplicate content between them.
How long should the org CLAUDE.md be?
Root file under 150 lines. Anthropic's official documentation targets under 200 lines per file before adherence starts degrading — staying under 150 gives you room to grow. Use @imports to pull in detailed files like teams.md or decisions.md; this helps human maintainers but doesn't reduce token cost. To actually reduce startup context for domain-specific standards, use path-scoped rules in .claude/rules/ instead.
Does this replace documentation?
No. CLAUDE.md is context for the model, not documentation for humans. It complements existing docs by pointing to them. Treat it as an index card that says 'here is what matters right now,' not a comprehensive manual. Mixing the two roles produces a file that fails at both — too long for context, too sparse for human reference.
How do we handle sensitive information — salaries, performance data?
Keep it out. The project CLAUDE.md is version-controlled and visible to anyone with repo access. Reference sensitive data sources by name — 'salary bands live in the HR system' — without including values. If Claude needs the actual data during a session, paste it inline where it stays session-scoped and never persists. Use CLAUDE.local.md (gitignored) for per-engineer sandbox details. The test: would you be comfortable if this appeared in a public commit? If no, it doesn't belong in any persistent context file.
What if teams disagree on what goes in the root?
The root holds only org-wide context that affects every team. Team-specific context belongs in nested CLAUDE.md files in each team's directory. The dispute resolves with one question: does this fact change how Claude should behave when working in any part of the codebase? If yes, root. If no, sub-directory. Cross-team definitions that disagree between files produce confidently wrong answers — that's the cost of not resolving the dispute.
Can non-engineering teams use CLAUDE.md?
Yes, and it works the same way. Marketing encodes brand voice guidelines, content calendars, and campaign priorities. Product documents personas, roadmap commitments, and A/B test outcomes. The pattern is layer-agnostic — any team that works with Claude has org context worth persisting. The same rules apply: facts not instructions, under 200 lines, dated decisions, weekly review.
What survives context compaction (/compact)?
Project-root CLAUDE.md survives: after /compact, Claude re-reads it from disk and re-injects it into the session. Nested CLAUDE.md files in subdirectories are not re-injected automatically — they reload the next time Claude reads a file in that subdirectory. If an instruction disappeared after compaction, it was either given only in conversation or lives in a nested file that hasn't reloaded yet. Add conversation-only instructions to CLAUDE.md to make them persist across compaction.
The shift from CLAUDE.md as style guide to CLAUDE.md as organizational OS isn't a technical lift. It's a reframing. You stop telling Claude about the code and start telling Claude about the organization.
Start small. Add team topology and the top three priorities this week. Add a decision register entry for the last consequential architectural choice your team made. Build the /update-context skill next week. Inside a month, every session starts with the weight of your team's context, decisions, and direction already loaded — and the coordination tax you used to pay every conversation stops compounding.
The weekly update ritual is not optional — that's the constraint. A CLAUDE.md that nobody updates is worse than no file at all, because it confidently misroutes instead of simply knowing nothing.
GMV is the scoreboard, not the game. Marketplace teams that wait for revenue to confirm a category is dying have already lost the merchants whose absence caused it. Four signals, one weekly brief, three to six weeks of warning before the line bends.
App Store reviews, NPS verbatims, Zendesk tickets, interview notes, community mentions — five inputs, five biases, five cadences. Treat them equal and the loudest channel wins. The fix is a normalization and weighting layer that produces one weekly brief.
Engineering directors burn 45 minutes every morning reconstructing a picture five tools could have assembled. Replace the loop: five parallel collectors, one orchestrator, a confidence score, a 90-second RED/AMBER/GREEN brief. Triage out of working memory, into code.