Most organizations discover strategic misalignment the expensive way. Two teams ship contradictory features. A platform migration collides with a launch. An architecture decision quietly invalidates three months of roadmap planning. Every one of those incidents was preventable. The conflict was sitting in documents nobody compared side by side.
The coherence gap is the distance between what separate teams believe to be true about shared goals, timelines, and constraints. It does not grow from explicit disagreement — that gets aired in planning. It grows from semantic drift: the same strategic vocabulary, used in good faith, decoded into different operational definitions. One team reads "scalable" and pictures horizontal autoscaling. Another reads the same word and plans a monolith refactor with bigger machines. Both are internally consistent. Neither knows the other exists.
A weekly document-scanning agent closes the gap before it costs a quarter. Not by resolving conflicts — by making them specific enough that the right humans cannot pretend they did not see them.
Explicit Disagreements Get Resolved. Implicit Ones Metastasize.
Semantic drift is not a typo or a missing ticket. It is a slow divergence of meaning that no standup catches.
Semantic drift is the operational definition that two teams never reconciled because they were never in the same document at the same time. Same vocabulary. Different mechanics underneath.
A real shape: Team Alpha's PRD calls for "multi-tenant isolation" in the billing service. Team Beta's ADR pins "tenant-level data separation" as a Q3 milestone. The platform OKRs measure success by "customer onboarding velocity." All three documents touch the same surface. None defines where tenant isolation ends and onboarding speed begins. The billing team implements strict row-level security. Every API call costs an extra 200ms. The onboarding team's latency targets break the day the migration ships.
No standup catches this.[7] It lives in the assumptions between paragraphs of documents written weeks apart by people who never sat at the same table.
Team A wants microservices, Team B wants monolith — argued in a meeting
PM and engineering disagree on the launch date — surfaced in planning
Two department heads fight over the budget — visible in the spreadsheet
Competing feature requests from different stakeholders — written in tickets
Both teams write "scalable" and assume different architectures underneath
PRD says "real-time" meaning seconds. ADR assumes "real-time" meaning minutes. Nobody compared.
"Customer-first" decoded as NPS by one team and revenue retention by another
OKR targets quietly assume different dependency-resolution timelines from each other
An Agent That Reads What Teams Wrote and Finds What They Meant Differently
The agent runs weekly, compares cross-team documents, and surfaces the conflicts. It does not resolve them. Resolution needs authority the agent does not have.
The coherence agent runs on a fixed weekly cadence. It pulls the latest versions of strategic documents from each team, normalizes them into a shared semantic representation, then performs cross-document comparison to identify assumption conflicts. Output: a coherence brief delivered to leadership and team leads.
The agent surfaces. Humans resolve. Resolution requires judgment, context, and organizational authority. The agent's job ends at making conflicts visible with enough specificity that the right people cannot route around them.
- [01]
Step 1: Pull Documents and Normalize
The agent pulls the latest published versions of PRDs, roadmaps, ADRs, and OKR documents from the document store — Notion, Confluence, Google Docs, or a Git repository. Each document gets parsed into a structured representation that preserves section hierarchy, timestamps, and authorship metadata. No raw text leaves this layer once extraction completes.
- [02]
Step 2: Decompose to Atomic Claims
Surface-level document comparison fails. Two PRDs can share zero keywords and still contradict each other. The agent decomposes each document into discrete claims — factual assertions, assumed constraints, stated timelines, dependency expectations. Each claim gets embedded into a vector space alongside its surrounding context. This is where coherence detection actually starts working or fails silently.
- [03]
Step 3: Compare Across Team Boundaries
The agent compares claims across team boundaries, looking for pairs where semantic similarity is high (same topic) but operational meaning diverges (different mechanics underneath). This is the core: finding where two teams talk about the same thing while assuming different things about it. Same-team comparison is filtered out — that conflict gets resolved in the team's own planning.
- [04]
Step 4: Compile the Brief
Detected conflicts compile into a structured brief that names the conflicting documents, quotes the specific passages, explains the divergence, and proposes a resolution format — async doc review, 30-minute sync, or escalation to a shared decision record. The brief is the product. Everything upstream is plumbing.
What the Agent Looks For: A Claim Taxonomy
The hardest problem is not extracting what a document says. It is extracting what the document assumes.
The hardest problem in coherence detection is the gap between what a document states and what it takes for granted. A PRD might state "the recommendation engine will use collaborative filtering" (explicit claim) while assuming "the data pipeline delivers fresh user events within 5 minutes" (implicit dependency). If the data team's ADR pins batch processing on a 6-hour cycle, the implicit assumption is already broken. No keyword search catches it.
Reliable detection requires a claim taxonomy the agent can extract against. The taxonomy is what teaches the model where implicit assumptions live — which sentences are loadbearing and which are decoration.
| Claim Type | Definition | Example | Detection Difficulty |
|---|---|---|---|
| Explicit Goal | A directly stated objective with measurable criteria | "Reduce checkout latency to under 400ms by Q3" | Low |
| Explicit Constraint | A stated limitation or boundary condition | "Hold SOC 2 compliance through the migration" | Low |
| Implicit Dependency | An unstated reliance on another team's output or timeline | PRD assumes pipeline freshness that no ADR specifies | High |
| Implicit Definition | A term used without operational definition that splits across teams | "Real-time" decoded as seconds in one doc, minutes in another | High |
| Timeline Assumption | A date or duration assumed but never validated cross-team | Roadmap pins API v2 to Q2. API team's OKR targets Q3. | Medium |
| Resource Assumption | Assumed availability of people, budget, or infrastructure | Three teams independently plan the same ML engineer for Q3 | Medium |
| Architectural Assumption | Assumed technical approach that conflicts with another team's plan | Frontend assumes REST. Backend ADR specifies GraphQL migration. | Medium |
Blast Radius: Why Most Conflicts Should Not Get a Meeting
Flagging everything equally is a noise generator. Prioritization is the entire product.
An agent that flags every conflict equally is a noise generator. The product is the ranking. Blast-radius estimation scores each conflict by how many teams, timelines, and customer commitments will absorb damage if it stays unresolved.
Three factors drive blast radius. Dependency depth — how many downstream teams rely on the assumption. Time proximity — how soon the contradiction materializes in production. Reversibility — whether the decision can be undone cheaply or requires a rewrite.
What we got wrong on the first build: the original scoring weighted semantic divergence highest. That produced a brief where the top-ranked conflict was a definitional difference that two teams had already verbally resolved but had not updated in their documents. The actual critical conflict ranked lower — a timeline assumption that would break three production services in six weeks. Blast-radius scoring only became useful once we weighted time proximity above semantic distance. Semantic difference tells you the conflict exists. Blast radius tells you which one to fix this week. The agent that does not separate them is just a smarter typo finder.
coherence-scoring.ts// Blast radius is the product. Semantic similarity is the input.
interface CoherenceConflict {
id: string;
claimA: { document: string; team: string; text: string; type: ClaimType };
claimB: { document: string; team: string; text: string; type: ClaimType };
semanticSimilarity: number; // 0-1, topical overlap
operationalDivergence: number; // 0-1, mechanical disagreement
}
interface BlastRadius {
dependencyDepth: number; // downstream consumer count
timeProximity: number; // days until conflict materializes
reversibilityCost: 'low' | 'medium' | 'high';
customerExposure: boolean;
score: number; // composite 0-100
}
function estimateBlastRadius(
conflict: CoherenceConflict,
dependencyGraph: DependencyGraph,
roadmapTimelines: Map<string, Date>
): BlastRadius {
const depth = dependencyGraph.getDownstreamCount(
conflict.claimA.team,
conflict.claimB.team
);
const proximity = calculateTimeProximity(
conflict, roadmapTimelines
);
const reversibility = assessReversibility(conflict);
const exposure = dependencyGraph.hasExternalDependents(
conflict.claimA.team, conflict.claimB.team
);
return {
dependencyDepth: depth,
timeProximity: proximity,
reversibilityCost: reversibility,
customerExposure: exposure,
// Time proximity weighted above semantic distance — see article body
score: computeComposite(depth, proximity, reversibility, exposure)
};
}The Brief: Turning Misalignment Into a Specific Conversation
Structured output that names the conflict, attributes it to a passage, and proposes the resolution format. The format is not optional.
Each weekly brief carries three sections. A summary of the coherence score trend with contributing factors. A ranked list of detected conflicts with full source context. A proposed resolution format for each one.
The resolution format is not decoration. Not every conflict warrants a meeting. Some need an async clarification in a shared document. Others require an ADR revision. The high-blast-radius, low-reversibility conflicts need a sync with decision-making authority physically in the room. Picking the wrong format is how good detection turns into wasted hours.
What the Brief Contains
Overall coherence score (0-100) with week-over-week trend and contributing factors named
Top 3-5 ranked conflicts with source document excerpts and team attribution
Blast-radius estimate per conflict with dependency chain visualization
Proposed resolution format: async doc comment, 30-minute sync, or decision-record escalation
Recurring misalignment patterns across teams — drift that keeps coming back signals a structural cause
Resolution Format Selection
- ✓
Async doc review — low-blast-radius definitional mismatches that one team can clarify alone
- ✓
30-minute sync — medium-blast-radius timeline or dependency conflicts needing two-team negotiation
- ✓
Decision-record escalation — high-blast-radius architectural conflicts that need leadership sign-off
- ✓
Shared glossary update — recurring implicit definition conflicts that need permanent disambiguation
Implementation: Where Most Coherence Agents Fail
The infrastructure is straightforward. The calibration is where this stops working.
Coherence Agent Project Structure
treecoherence-agent/
├── connectors/
│ ├── notion.ts
│ ├── confluence.ts
│ ├── google-docs.ts
│ └── git-markdown.ts
├── extraction/
│ ├── claim-extractor.ts
│ ├── taxonomy-classifier.ts
│ └── implicit-detector.ts
├── analysis/
│ ├── semantic-comparator.ts
│ ├── conflict-ranker.ts
│ ├── blast-radius.ts
│ └── dependency-graph.ts
├── output/
│ ├── brief-generator.ts
│ ├── slack-notifier.ts
│ └── dashboard-api.ts
├── config.ts
├── scheduler.ts
└── index.tsDeployment Rules: Five Constraints That Are Not Negotiable
The agent never auto-resolves
Surface conflicts. Propose formats. Stop there. Auto-resolution requires organizational authority the agent does not have, and a wrong resolution applied at scale is worse than the original drift.
Run on a fixed weekly cadence — never real-time
Documents change constantly during drafting. Real-time scanning produces false positives that train recipients to ignore the brief. Weekly cadence aligns with publication rhythms and protects signal-to-noise.
Every flagged conflict links to a specific passage
"Teams may be misaligned on data freshness" is useless. "PRD line 47 assumes 5-minute pipeline freshness; ADR-104 specifies 6-hour batch" is actionable. Vague warnings get ignored on the second brief.
Calibrate against historical incidents before the first run
Feed the agent 5-10 past misalignment incidents your organization actually experienced. Use them as few-shot calibration to tune implicit assumption detection for your document patterns. Generic prompts produce generic noise.
Gate on precision, not recall
Missing a conflict costs less than flooding leaders with false positives. A brief with three real conflicts builds trust over months. A brief with twenty noise items gets filtered to spam permanently — and the agent dies with it.
Scaling: When Pairwise Comparison Stops Working
Ten teams generate 45 pairwise relationships. Thirty teams generate 435. Brute force is not a strategy.
Pairwise comparison stops scaling fast. Ten teams produce 45 pairwise relationships. Thirty teams produce 435.[8] Comparing every document against every other document burns tokens and surfaces noise. The agent needs a smarter topology.
Topical clustering with boundary detection is the strategy that holds up. The agent first groups documents by the strategic topics they address — billing, onboarding, data platform, observability — then concentrates comparison on documents that straddle topic boundaries. The highest-value conflicts live at those boundaries, where two teams touch the same system from different angles with different assumptions. Comparing inside a cluster catches drift the cluster owners would have caught themselves. Comparing across clusters catches drift nobody is currently watching for.
Pre-Deployment Readiness Checklist
Document sources identified and API access wired up — Notion, Confluence, Git, whatever is canonical
Claim taxonomy defined with at least 5 claim types tuned to your organization's document patterns
Historical misalignment incidents gathered (minimum 5) for few-shot calibration
Dependency graph mapped for inter-team service and data relationships
Blast-radius scoring weights tuned — time proximity above semantic distance
Coherence brief distribution list locked — team leads and the leaders who can actually decide
False-positive feedback channel built — recipients flag noise so the agent calibrates
Weekly cadence scheduled outside peak document-editing hours
Escalation path named with specific people, not a Slack channel
Measuring Whether the Agent Is Worth Its Tokens
ROI lives in conflicts caught before they became incidents. Track three numbers, not opinions.
ROI for coherence detection lives in conflicts caught before they materialized as incidents. Three numbers prove value. Conflicts surfaced per weekly brief. Resolution time from detection to decision. Production incidents and delivery delays that trace back to assumption mismatches — month-over-month, before and after the agent shipped.
Organizations running coherence analysis have reported roughly 40-60% reductions in cross-team coordination failures. Those figures come from limited early deployments and depend heavily on document quality, team participation, and calibration effort. Treat them as directional. The actual mechanism is not that the agent is smarter than humans. It is that the agent is more consistent. It reads every document, every week, and never assumes two teams have already talked.
Does the coherence agent replace cross-team planning meetings?
No. It makes them shorter and more specific. Instead of spending 45 minutes discovering that two teams hold different assumptions, the meeting opens with the conflict already named, the source documents linked, and a proposed resolution format on the table. Most conflicts get resolved asynchronously after the brief lands — one team updates a document, flags the resolution, and the meeting that would have surfaced the conflict never happens. The meetings that do happen are tighter because the discovery work was done by the agent, not by six people in a room.
How do you handle documents that are still in draft?
Filter by document status. Only published or approved documents enter the comparison pipeline. Scanning drafts produces noise — authors are still working through their own thinking and contradicting themselves on purpose. Allow teams to opt specific documents into early scanning when they want pre-publication coherence checks. Never make it the default.
What if teams intentionally hold different assumptions as part of an A/B strategy?
Surface them anyway. The resolution format becomes "acknowledged divergence." Teams mark specific conflicts as intentional, the agent records the annotation, and stops re-flagging that pair. Intentional divergence that is documented is not a coherence gap. Intentional divergence that is undocumented eventually becomes one — somebody new joins, reads one document, and assumes the other does not exist.
How much does it cost to run a coherence agent weekly?
For a typical organization with 50-100 strategic documents, expect 200-500K tokens per weekly run. At current LLM pricing, that runs roughly $5-15 per run for extraction and analysis, plus embedding costs that round to nothing. The infrastructure cost is negligible compared to one week of misaligned engineering work. The expensive part is calibration time, not runtime.
Can the agent work with documents in multiple languages?
Yes. Modern embedding models handle multilingual content cleanly. The extraction and comparison layers operate on semantic meaning, not surface text, so cross-language comparison works. Generate the brief in the organization's primary working language so leadership reads one register, not three.
Data Privacy and Document Access
The agent needs read access to strategic documents across teams. That raises real concerns about information barriers, competitive sensitivity, and access control. Implement with the minimum necessary permissions. Extract claims and discard raw document content after processing. Store the structured claims and conflict pairs, not the full text of every PRD. Work with the security team to define access boundaries before deployment, not after the first incident.
- [1]Happily — Team Alignment Audit Guide(happily.ai)↩
- [2]Happily — Why Focus and Goal Alignment Will Define High Performing Teams in 2026(happily.ai)↩
- [3]Profit.co — AI-Powered OKR Analytics: The Future of Enterprise Goal Management(profit.co)↩
- [4]Glean — Definitive Guide to AI-Based Enterprise Search for 2025(glean.com)↩
- [5]Anthropic Alignment — Automated Auditing(alignment.anthropic.com)↩
- [6]MDPI — Technologies Journal, Vol. 13 No. 2(mdpi.com)↩
- [7]EmergentMind — Semantic Drift Analysis(emergentmind.com)↩
- [8]arXiv — 2506.01080(arxiv.org)↩