Every organization makes hundreds of decisions each quarter. Strategy pivots, vendor selections, hiring freezes, architecture choices, budget allocations — they happen in meetings, get captured in notes, and then vanish into a Notion graveyard or a shared drive nobody searches.
Three months later, someone asks: Why did we pick Vendor X over Vendor Y? The answer lives in a transcript from a Tuesday standup that nobody tagged, indexed, or made retrievable. The person who made the call left the company. The rationale is gone.
This is the organizational memory problem, and it costs real money.[8] Teams re-litigate settled questions. They reverse decisions without knowing they existed. They build on assumptions that were explicitly rejected six months ago.
The fix is not another documentation initiative. People already have documentation fatigue. The fix is a Cowork agent workflow that processes meeting transcripts automatically, extracts decisions into a structured schema, and — critically — attaches review triggers that tell a separate scanning agent when to surface them again.
The Decision Decay Problem
Why organizations lose their own decisions within weeks
Most knowledge management systems treat decisions as a documentation problem. Write it down, put it somewhere, hope someone finds it. But documentation alone has three failure modes:
Capture failure — Nobody writes the decision down in the first place. The meeting ends, people move to the next thing. Action items get tracked, but the reasoning behind the action does not.
Retrieval failure — The decision was documented but lives in a format that resists search. It is buried in paragraph 7 of meeting notes from February 12th, tagged only with the meeting series name. Nobody thinks to look there when the question resurfaces.
Staleness failure — The decision was correct when made, but conditions changed. The vendor we rejected raised a round and dropped their price by 40%. The constraint that drove the architecture decision was removed. Nobody flagged it for review because there was no mechanism to do so.
A structured decision record solves all three. Automated extraction handles capture. Schema-based storage with indexed fields handles retrieval. And a review trigger field — the piece most systems miss — handles staleness.
The Decision Record Schema
Seven fields that make organizational decisions machine-readable
The schema borrows from Architectural Decision Records (ADRs) used in software engineering, but extends the concept for general organizational decisions.[1] ADRs have been battle-tested at companies from startups to enterprises — AWS documents ADR usage across projects with teams of varying sizes.[2] The key insight from ADR practice: recording alternatives considered and rationale matters more than recording the decision itself.[3]
| Field | Type | Purpose | Example |
|---|---|---|---|
| decision | string | The specific choice that was made | Adopt PostgreSQL for the analytics data store |
| rationale | string | Why this option was selected over alternatives | Needed JSON support and the team has existing Postgres expertise |
| alternatives | string[] | Other options that were considered and why they were rejected | MongoDB (scaling concerns), BigQuery (cost at our volume) |
| date | ISO 8601 | When the decision was finalized | 2026-02-14T10:30:00Z |
| decider | string | Person or group with final authority | Sarah Chen, VP Engineering |
| affected_teams | string[] | Teams whose work changes because of this decision | Data Platform, Product Analytics, Backend |
| review_trigger | string | Conditions that should prompt re-evaluation | If monthly analytics queries exceed 50M rows or if BigQuery drops pricing below $3/TB |
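The seven fields map directly onto a TypeScript type. Here is a minimal sketch of how the schema might be declared, with a basic validity check; the type and function names are illustrative, not part of any Cowork API.

```typescript
// Sketch of the decision record schema as a TypeScript type.
// Field names mirror the table above; everything else is illustrative.
interface DecisionRecord {
  decision: string;          // the specific choice that was made
  rationale: string;         // why this option won over the alternatives
  alternatives: string[];    // rejected options, with reasons where known
  date: string;              // ISO 8601 timestamp, e.g. "2026-02-14T10:30:00Z"
  decider: string;           // person or group with final authority
  affected_teams: string[];  // teams whose work changes
  review_trigger: string;    // condition that should prompt re-evaluation
}

// A record is only useful if its required text fields are populated and the
// date parses. This mirrors the governance rule that every record must carry
// a non-empty review_trigger.
function isValidRecord(r: DecisionRecord): boolean {
  return (
    r.decision.trim().length > 0 &&
    r.rationale.trim().length > 0 &&
    r.review_trigger.trim().length > 0 &&
    !Number.isNaN(Date.parse(r.date))
  );
}

const example: DecisionRecord = {
  decision: 'Adopt PostgreSQL for the analytics data store',
  rationale: 'Needed JSON support and the team has existing Postgres expertise',
  alternatives: ['MongoDB (scaling concerns)', 'BigQuery (cost at our volume)'],
  date: '2026-02-14T10:30:00Z',
  decider: 'Sarah Chen, VP Engineering',
  affected_teams: ['Data Platform', 'Product Analytics', 'Backend'],
  review_trigger:
    'If monthly analytics queries exceed 50M rows or if BigQuery drops pricing below $3/TB',
};
```

Storing records against an explicit type like this is what makes the indexed-field retrieval described earlier possible.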
The Cowork Extraction Workflow
Processing weekly transcripts into structured decision records
The Cowork workflow runs on a weekly cadence, processing all meeting transcripts generated since the last run. It uses two agents with distinct responsibilities:
Agent 1: Decision Extractor — Receives raw transcript text, identifies segments where decisions were made, and outputs structured records conforming to the schema above.[6]
Agent 2: Review Scanner — Runs on a separate schedule (daily or weekly), reads all existing decision records, evaluates their review triggers against current conditions, and flags any that should be revisited.
This separation matters. The extractor needs to be good at language understanding and structured output. The scanner needs to be good at evaluating conditions against external data. Different skills, different prompts, different evaluation criteria.
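On the extractor side, "good at structured output" in practice means parsing the model's response defensively and keeping only records that conform to the schema. A minimal sketch, assuming the model was asked for a JSON array of records (the names here are illustrative, not a Cowork API):

```typescript
// Defensive parsing of the extractor agent's output. Malformed output,
// non-array output, and incomplete records are all dropped rather than
// stored. All type and constant names are illustrative.
type Extracted = {
  decision: string;
  rationale: string;
  alternatives: string[];
  date: string;
  decider: string;
  affected_teams: string[];
  review_trigger: string;
};

const REQUIRED_STRINGS = ['decision', 'rationale', 'date', 'decider', 'review_trigger'] as const;
const REQUIRED_ARRAYS = ['alternatives', 'affected_teams'] as const;

function parseExtractionOutput(raw: string): Extracted[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return []; // malformed output: treat as "no decisions found"
  }
  if (!Array.isArray(parsed)) return [];
  return parsed.filter((item): item is Extracted => {
    if (typeof item !== 'object' || item === null) return false;
    const rec = item as Record<string, unknown>;
    const stringsOk = REQUIRED_STRINGS.every(
      (k) => typeof rec[k] === 'string' && (rec[k] as string).trim().length > 0,
    );
    const arraysOk = REQUIRED_ARRAYS.every(
      (k) => Array.isArray(rec[k]) && (rec[k] as unknown[]).every((v) => typeof v === 'string'),
    );
    return stringsOk && arraysOk;
  });
}
```

Dropping bad records silently is a design choice: a missed extraction is recoverable at the next run, while a malformed record in the database erodes trust in the whole system.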
Extraction Prompt Design in Depth
Building the prompt that reliably pulls decisions from messy transcripts
The extraction prompt is the hardest part to get right. Meeting transcripts are messy. Decisions are often implicit. Someone says "okay, let's go with option B then" and that is a decision — but there is no formal announcement, no gavel, no explicit declaration.
The prompt needs to handle several categories of decision language:
- Explicit decisions: "We've decided to…" or "The decision is…"
- Implicit consensus: "Sounds like we're all aligned on…" or "Let's move forward with…"
- Authority decisions: "I'm going to call this — we'll go with…" or "As the owner of this, I want to…"
- Negative decisions: "We're not going to do X" or "Let's table that for now"
The prompt also needs to distinguish decisions from opinions, preferences, and speculative statements. "I think we should use Postgres" is not a decision. "We're going with Postgres" is.
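Before a transcript ever reaches the model, it can help to shortlist utterances containing decision language. The sketch below is a rough lexical heuristic over the four categories above; it is a recall-oriented pre-filter, not a substitute for the extraction agent, and every phrase list is an assumption you would tune on your own transcripts.

```typescript
// Rough lexical pre-filter for decision language, keyed to the four
// categories described above. Phrase lists are illustrative starting
// points, not an exhaustive taxonomy.
const DECISION_CUES: Record<string, RegExp> = {
  explicit: /\b(we've decided|we have decided|the decision is)\b/i,
  implicit: /\b(we're all aligned on|let's move forward with)\b/i,
  authority: /\b(i'm going to call this|as the owner of this)\b/i,
  negative: /\b(we're not going to|let's table that)\b/i,
};

// Statements that look like opinions or speculation, which the prompt
// explicitly excludes from extraction.
const NON_DECISION_CUES = /\b(i think we should|what if we|we might want to consider)\b/i;

// Returns the matched category, or null if the utterance looks like an
// opinion/proposal or contains no decision cue at all.
function matchDecisionCue(utterance: string): string | null {
  if (NON_DECISION_CUES.test(utterance)) return null;
  for (const [category, pattern] of Object.entries(DECISION_CUES)) {
    if (pattern.test(utterance)) return category;
  }
  return null;
}
```

A filter like this only narrows what the model sees; the model still makes the final call on whether a shortlisted segment records an actual decision.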
`prompts/extract-decisions.ts`

```typescript
const DECISION_EXTRACTION_PROMPT = `
You are a decision extraction agent. You will receive a meeting transcript
and must identify all decisions that were made during the meeting.

A DECISION is a commitment to a specific course of action that was agreed
upon or authorized by someone with the authority to do so.

A decision is NOT:
- An opinion or preference ("I think we should...")
- A question or proposal under discussion ("What if we...")
- An action item without a preceding choice ("John will send the report")
- Speculation about future plans ("We might want to consider...")

For each decision found, extract ALL of the following fields:

1. decision: A clear, concise statement of what was decided. Use active
   voice. Start with a verb when possible.
2. rationale: Why this choice was made. Pull actual reasoning from the
   transcript — paraphrase but preserve the logic.
3. alternatives: Other options that were mentioned during discussion.
   For each, note why it was not chosen if stated.
4. date: Use the meeting date provided in the transcript metadata.
5. decider: The person who made the final call, or the group if by
   consensus. Use full names when available.
6. affected_teams: Teams or groups whose work will change. Infer from
   context if not explicitly stated.
7. review_trigger: Define a specific, measurable condition that should
   cause this decision to be revisited. Do NOT use time-based triggers
   like "revisit in 6 months." Instead, identify what assumption or
   constraint would need to change.

IMPORTANT: The review_trigger must be falsifiable and externally
verifiable. Good: "If customer churn exceeds 5% monthly." Bad:
"If things change significantly."

Output a JSON array of decision records. If no decisions were made
in the transcript, return an empty array.
`;
```

Designing Effective Review Triggers
Conditions that make decisions self-aware of their own expiration
Review triggers the scanner cannot evaluate:

- Revisit this decision next quarter
- Review if circumstances change
- Check back in 6 months
- Reconsider if the team grows
- Reassess when we have more data

Review triggers the scanner can check against real data:

- If monthly active users exceed 50,000
- If the Datadog bill exceeds $15K/month for two consecutive months
- If more than 3 engineers request TypeScript migration in feedback surveys
- If competitor Y launches a self-serve tier below $99/month
- If P95 API latency exceeds 400ms on the current architecture
Strong review triggers share three properties. They are specific — they reference a measurable quantity or observable event. They are falsifiable — you can check whether the condition is true or false without subjective judgment. And they are externally verifiable — the scanning agent can check them against data sources like monitoring dashboards, billing systems, survey results, or competitive intelligence feeds.
The review trigger is what transforms a decision record from a historical artifact into an active governance tool. Without it, you are just building a better filing cabinet. With it, you are building an early warning system that tells you when your own decisions need attention.
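The three properties can be partially checked in code before a record is stored. The sketch below flags triggers that lack comparison language or any measurable quantity; it is a heuristic gate on obvious vagueness, and the word lists and phrase lists are assumptions you would calibrate, not a complete verifiability test.

```typescript
// Heuristic lint for review triggers: a strong trigger should compare
// something measurable against a threshold or name an observable event.
// This catches obviously vague triggers; it cannot prove a trigger is
// externally verifiable.
const COMPARISON_WORDS = /\b(exceeds?|drops?|below|above|more than|less than|at least|launches?)\b/i;
const MEASURABLE = /(\d|\$|%)/; // a number, dollar amount, or percentage
const VAGUE_PHRASES = /\b(circumstances change|things change|next quarter|more data|check back)\b/i;

function lintTrigger(trigger: string): { ok: boolean; reason?: string } {
  if (trigger.trim().length === 0) return { ok: false, reason: 'empty trigger' };
  if (VAGUE_PHRASES.test(trigger)) return { ok: false, reason: 'vague phrasing' };
  if (!COMPARISON_WORDS.test(trigger)) return { ok: false, reason: 'no comparison language' };
  if (!MEASURABLE.test(trigger)) return { ok: false, reason: 'no measurable quantity' };
  return { ok: true };
}
```

Run at extraction time, a lint like this pushes weak triggers back for rewriting instead of letting them rot in the database.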
The Review Scanner Agent
Automated condition checking across all active decisions
1. Load all active decision records from the database

```typescript
const decisions = await db.decisions.findMany({
  where: { status: 'active' },
  orderBy: { date: 'desc' },
});
```

2. Classify each review trigger into checkable categories

```typescript
const classified = await agent.classify(decision.review_trigger, {
  categories: [
    'metric_threshold',  // Check monitoring/analytics
    'market_event',      // Check news/competitive intel
    'team_feedback',     // Check survey/retro data
    'time_elapsed',      // Simple calendar check
    'external_pricing',  // Check vendor pricing pages
  ],
});
```

3. Route to the appropriate data source and evaluate

```typescript
const result = await evaluateTrigger({
  trigger: decision.review_trigger,
  category: classified.category,
  dataSources: getSourcesForCategory(classified.category),
});
// result: { triggered: boolean, evidence: string, confidence: number }
```

4. If triggered, create a review request with full context

```typescript
if (result.triggered && result.confidence > 0.8) {
  await createReviewRequest({
    decision,
    triggerEvidence: result.evidence,
    originalContext: decision.rationale,
    suggestedReviewers: decision.affected_teams,
  });
}
```
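The four steps can be composed into a single loop. Here is an end-to-end sketch with the external pieces stubbed out: `METRICS` stands in for whatever monitoring or billing source a real deployment would query, and the toy evaluator handles only the metric_threshold category. All names are illustrative.

```typescript
// End-to-end sketch of the scanner loop with external data sources
// stubbed. A real scanner would route through the classification step
// shown above; this handles only metric_threshold triggers.
type Decision = {
  decision: string;
  rationale: string;
  review_trigger: string;
  affected_teams: string[];
};
type TriggerCheck = { triggered: boolean; evidence: string; confidence: number };

// Stubbed metric source: metric name -> current observed value.
const METRICS: Record<string, number> = {
  monthly_active_users: 62000,
  p95_latency_ms: 310,
};

function evaluateMetricTrigger(metric: string, threshold: number): TriggerCheck {
  const current = METRICS[metric];
  if (current === undefined) {
    return { triggered: false, evidence: 'metric unavailable', confidence: 0 };
  }
  return {
    triggered: current > threshold,
    evidence: `${metric} is ${current} (threshold ${threshold})`,
    confidence: 1.0, // direct numeric comparison, no ambiguity
  };
}

// Collect decisions whose trigger condition currently holds.
function scan(
  items: { record: Decision; metric: string; threshold: number }[],
): Decision[] {
  const toReview: Decision[] = [];
  for (const { record, metric, threshold } of items) {
    const result = evaluateMetricTrigger(metric, threshold);
    if (result.triggered && result.confidence > 0.8) toReview.push(record);
  }
  return toReview;
}
```

The confidence gate matters most for the fuzzier categories (market events, team feedback), where the evaluator's judgment is less certain than a numeric comparison.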
Handling Edge Cases in Extraction
What happens when transcripts are ambiguous, multi-part, or contradictory
**Multi-meeting decisions.** Some decisions span multiple meetings, discussed in one and finalized in another.

- Track a 'pending_decision' status for discussions that have not reached resolution
- Link related records with a thread_id so the full deliberation history is preserved
- Only promote to 'active' when explicit agreement or an authority call is detected

**Contradictory decisions.** Later decisions sometimes contradict earlier ones without explicit reference.

- The scanner should detect semantic overlap between new and existing records
- Flag contradictions for human review rather than auto-resolving
- Maintain both records with cross-references; the contradiction itself is valuable data

**Implicit authority.** Not every meeting has a clear decision-maker present.

- When authority is ambiguous, tag the record with 'consensus' as the decider
- Require validation from a designated owner before the record reaches 'active' status
- Build an org chart integration so the agent can infer decision authority by domain
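The semantic-overlap check for contradictory decisions can start much simpler than embeddings. The sketch below uses token-level Jaccard similarity to flag candidate overlaps for human review; the threshold is a guess you would calibrate on your own records, and a production system would likely use embedding similarity instead.

```typescript
// Naive overlap detector for candidate contradictions. Tokenizes two
// decision statements and computes Jaccard similarity over word sets.
// Flags pairs for human review rather than auto-resolving.
function tokens(text: string): Set<string> {
  return new Set(
    text
      .toLowerCase()
      .split(/[^a-z0-9]+/)
      .filter((w) => w.length > 2), // drop very short, stopword-ish tokens
  );
}

function jaccard(a: Set<string>, b: Set<string>): number {
  let intersection = 0;
  a.forEach((w) => {
    if (b.has(w)) intersection++;
  });
  const union = a.size + b.size - intersection;
  return union === 0 ? 0 : intersection / union;
}

function findOverlaps(
  newDecision: string,
  existing: string[],
  threshold = 0.3, // illustrative; calibrate on your own records
): string[] {
  const a = tokens(newDecision);
  return existing.filter((e) => jaccard(a, tokens(e)) >= threshold);
}
```

Even a crude detector like this catches the common case: a new record that re-decides the same subject with a different outcome shares most of its vocabulary with the original.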
Implementation Checklist
What you need to build this system
Cowork Decision Extraction Pipeline — Setup Checklist
- [ ] Configure the transcript source (Otter, Fireflies, Google Meet, or custom)
- [ ] Define the decision record schema in your data store
- [ ] Write and test the extraction prompt against 10+ real transcripts
- [ ] Set up the Cowork agent with structured output enforcement
- [ ] Build the review trigger classification taxonomy
- [ ] Connect data sources for trigger evaluation (metrics, billing, surveys)
- [ ] Create the review scanner agent with weekly scheduling
- [ ] Set up notification routing (Slack, email) for triggered reviews
- [ ] Build a searchable dashboard for browsing decision records
- [ ] Run a pilot with one team for 4 weeks before org-wide rollout
Iterating on the Extraction Prompt
A systematic approach to improving extraction accuracy over time
1. **Collect a ground truth dataset.** Have a human analyst manually label 20-30 transcripts, marking every actual decision. This becomes your evaluation set. Without ground truth, you cannot measure extraction quality — you are just guessing.

2. **Measure precision and recall separately.** Precision tells you what percentage of extracted records are real decisions. Recall tells you what percentage of real decisions were captured. Most teams optimize for precision first — false positives destroy trust faster than missed extractions.

3. **Add few-shot examples from your own org.** Generic prompts work at about 70% accuracy. Adding 3-5 examples from your actual transcripts — with your people, your jargon, your meeting cadence — pushes accuracy to 85-90%.[7] The examples teach the model your organization's specific patterns of how decisions get expressed.

4. **Deploy a human-in-the-loop validation step.** For the first 8 weeks, route all extracted records through a quick human review before they reach active status. The reviewer corrects errors, and those corrections feed back into the prompt. After 8 weeks, switch to spot-checking 20% of extractions.
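The metrics in step 2 are straightforward to compute once the ground truth set from step 1 exists. A minimal sketch, assuming you match extracted records to labeled ones by some identifier you define (for example, a normalized decision statement):

```typescript
// Precision and recall over an evaluation set. `extracted` and
// `groundTruth` are sets of decision identifiers; how records are keyed
// is up to you.
function precisionRecall(extracted: Set<string>, groundTruth: Set<string>) {
  let truePositives = 0;
  extracted.forEach((id) => {
    if (groundTruth.has(id)) truePositives++;
  });
  return {
    // share of extracted records that are real decisions
    precision: extracted.size === 0 ? 1 : truePositives / extracted.size,
    // share of real decisions that were captured
    recall: groundTruth.size === 0 ? 1 : truePositives / groundTruth.size,
  };
}
```

Tracking both numbers per monthly evaluation run is what makes the governance rule about measuring against ground truth actionable rather than aspirational.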
What This Unlocks for Your Organization
The compounding value of retrievable decisions
After two quarters of running this workflow, the decision record database becomes a genuine organizational asset. New hires search it during onboarding instead of asking "why do we do it this way?" in every meeting. Planning sessions reference specific records instead of relying on collective memory. And when a review trigger fires, the team revisits the decision with full context — the original rationale, the alternatives that were considered, and the specific condition that changed.
The real shift is cultural. When people know their decisions are being captured and indexed, they articulate their reasoning more clearly in meetings. When they know review triggers exist, they think harder about what conditions would invalidate their choices. The system makes the organization more deliberate about decision-making, not just better at remembering.[8]
> We stopped having the same argument every quarter about our deployment strategy. The decision record showed exactly why we chose the current approach, what we rejected, and under what conditions we should reconsider. When one of those conditions actually hit, the review trigger caught it before anyone had to remember.
How accurate is AI extraction compared to manual decision logging?
With a well-tuned prompt and 3–5 few-shot examples from your own transcripts, extraction accuracy can reach roughly 85–90% precision on decision identification — though this varies by transcript quality and meeting style. Individual field accuracy varies: the decision and date fields tend to be most reliable, while review_trigger is typically the hardest to extract and benefits most from human review during the first weeks. Treat these as starting-point estimates and measure against your own ground truth.
What transcript sources work best with this workflow?
Any service that produces timestamped, speaker-attributed transcripts works well. Otter.ai, Fireflies.ai, and Google Meet's built-in transcription all produce usable output. The key requirement is speaker attribution — knowing who said what is critical for identifying the decider field.
How do you prevent the decision database from becoming another graveyard?
The review trigger mechanism is specifically designed to prevent this. Unlike static documentation, triggered reviews actively surface records when conditions change. Combined with a searchable dashboard and Slack notifications for triggered reviews, the system stays alive because it reaches out to people rather than waiting to be found.
Can this work for async decisions made in Slack or email?
Yes, with modifications. The extraction prompt needs adjustment for written communication patterns versus spoken ones. Async decisions tend to be more explicit (people write 'Decision: we will do X') which actually makes extraction easier. The main challenge is identifying the right message threads to process.
What is the minimum team size where this adds value?
Teams of 8-10 people start seeing a meaningful return. Below that size, institutional memory tends to be held in a few heads and informal communication is enough. Above it, the combinatorial explosion of who-knows-what makes systematic capture worthwhile. For organizations above 50 people, it is close to essential.
Decision Record Governance Rules
- **Every decision record must have a non-empty review_trigger field.** Records without review triggers become static documentation. The trigger is what makes the system proactive rather than passive.
- **Review triggers must be falsifiable and externally verifiable.** Vague triggers like 'if things change' cannot be evaluated by the scanner agent. The trigger must reference a specific, measurable condition.
- **Contradicting a previous decision requires referencing the original record.** New decisions that override old ones must link to the original, ensuring the evolution of thinking is preserved and traceable.
- **Extraction accuracy must be measured monthly against ground truth.** Prompt drift is real. Models change, meeting culture changes, new jargon appears. Monthly measurement prevents silent degradation.
- [1] ADR GitHub Organization — Architectural Decision Records (ADRs) (adr.github.io)
- [2] AWS Architecture Blog — Master Architecture Decision Records (ADRs): Best Practices for Effective Decision Making (aws.amazon.com)
- [3] Joel Parker Henderson — Architecture Decision Record Templates and Examples (github.com)
- [4] Google Cloud — Architecture Decision Records (cloud.google.com)
- [5] Microsoft — Architecture Decision Records in Azure Well-Architected Framework (learn.microsoft.com)
- [6] Relevance AI — Extract Data From Meeting Transcripts (relevanceai.com)
- [7] Prompt Engineering — Agents at Work: The 2026 Playbook for Building Reliable Agentic Workflows (promptengineering.org)
- [8] Wikipedia — Organizational Memory (en.wikipedia.org)