Picking the First Three AI Workflows: Selection Framework

Your First Three AI Picks Are an Information Operation, Not a Bet

Most first AI picks fail because the workflow was wrong, not the model. Score risk, value, and signal quality as separate axes. Treat your first three pilots as three different questions about the organization. Pick boring. Pick measurable. Pick diverse.

Strategy & Operating Model · intermediate · Aug 15, 2025 · 8 min read

By Viktor Bezdek · VP Engineering, Groupon

95%

of enterprise GenAI pilots returning zero ROI in MIT NANDA's 2025 review of 300+ initiatives^[1]

60% evaluated tools, 20% hit pilot, 5% reached production

88%

of AI proof-of-concepts that never reach production, per IDC — 4 of every 33 graduate^[2]

The average organization abandoned 46% of its PoCs before production

30%

of GenAI projects forecast to be cancelled after PoC by end of 2025, per Gartner^[3]

Cited drivers: poor data quality, underestimated complexity, unclear ROI

74%

failure rate for enterprise customer-experience AI programs — the worst of any first-pick category^[6]

Customer service is still the #1 first pick. The vendors are setting the agenda.

The conversation starts in the wrong place every time. A vendor demo lands. Leadership gets excited. Three months later someone is defending a customer-facing chatbot pilot that produces inconsistent answers, generates legal exposure, and has no measurable baseline to compare against. MIT NANDA's 2025 research puts only 5% of enterprise GenAI initiatives in production^[4]. The technology is not what failed. The selection did.

Most "AI use case lists" floating around are vendor-sponsored aspiration. They rank workflows by transformational potential and demo appeal. They skip the questions that decide the outcome: what does "better" mean here, which legacy systems get touched and how brittle are they, who loses if this fails and will they resist it. A CIO picking from that list is optimizing the wrong variable.

IDC's number is the one to anchor on. 88% of enterprise AI proof-of-concepts never reach wide-scale production^[2]. Of every 33 PoCs, four graduate. The organizations that land in that top 12% share one structural trait: they picked first workflows on organizational readiness, not on transformational potential. They started boring. They started measurable. They built the muscle for AI before they took the workflows where the stakes were highest.

The real constraint is not finding a valuable use case. It is finding a valuable use case the organization can actually learn from in 90 days. Your first three picks are not a portfolio of bets. They are an information-gathering operation. The right framework scores risk, value, and signal quality on separate axes. The right anti-portfolio strategy makes each pilot test a different question.

The Four Failure Modes That Don't Show Up in the Pitch Deck

Post-mortems blame data quality and change management. The actual causes are structural, and they are detectable up front.

"Data quality" and "change management" are useful labels that hide what actually broke. Real failures land in four specific patterns. Recognizing them before the pick is the only intervention that works. After the fact, the time and credibility are already spent.

What we got wrong on our own first round: we assumed integration friction was visible from a system diagram. It is not. Two systems that look well-connected on paper can have authentication flows last touched in 2018, undocumented rate limits, and API error responses that return 200 OK with an error payload inside. Real integration cost only surfaces when something tries to connect them in production conditions. That is why Pilot 3 — the high-friction pilot — gets a 90-day window, not a 30-day one.

Brand risk is asymmetric. Customer-facing AI failures are public failures. A chatbot that hallucinates a refund rule, garbles a policy, or sounds robotic enough to trend on social media produces damage that exceeds the efficiency gain. Air Canada's 2024 chatbot ruling, in which a tribunal held the airline liable for incorrect information given by its chatbot, is the clean case study^[5]. There is no rollback for a public customer experience. The screenshot already exists.

No measurable baseline is the absence of a business case. Teams pick "improve customer satisfaction" or "reduce time-to-response" as goals, then discover the current number is unmeasured, stale, or stitched together from a 15% survey response rate. With no clean before, there is no after. The pilot becomes faith-based deployment.

Integration cost is hidden because the vendor's pitch assumes clean APIs and your systems do not have clean APIs. The model worked in the demo. Then it needed to read from your CRM, write to your ticketing system, authenticate through your identity provider, and log through your compliance layer. Each integration point added weeks. The pilot stalled on plumbing, not on prompts.

Ownership was contested. Legal, IT, marketing, operations — every stakeholder had veto power and a different success criterion. Nobody had single-threaded ownership, every decision became a committee meeting, and the pilot died of friction. The system rewards local optimization and hides accountability.

Vendor-Driven Pick

Customer service chatbot — highest brand risk, slowest signal loop, hardest technology
AI content marketing — no clean attribution baseline, ownership contested across three functions
AI sales outreach — brand risk via spam reputation, low-quality signal (clicks, not conversion)
Executive decision support — high political stakes, undefined success criteria, slow feedback
AI hiring screener — regulatory exposure (EEOC, GDPR), contested ownership across HR and legal

Operator-Driven Pick

Internal support ticket triage — zero customer exposure, baseline already in your SLA data
Meeting summarization — measurable in time saved, no integration surface, no brand risk
Code review assistant — developer adoption is fast, signal lands within a sprint
Internal search over documentation — clear baseline (time-to-answer), single owner
CRM data hygiene — quantifiable before/after, no customer exposure, single system boundary

The Third Axis Is What Kills the Pilot

Standard frameworks score value and complexity. Without signal quality, a high-value workflow you can't measure is just a story.

The standard prioritization matrix uses two axes: business value and implementation complexity. Better than nothing. It surfaces the high-value, low-complexity workflows. It also ignores whether you will be able to tell if the thing worked.

Signal quality — the ability to measure an outcome in under 30 days against a clean baseline — is the missing axis. It is what separates pilots that generate learning from pilots that generate opinions. When 42% of AI projects show zero measurable ROI^[8], the cause is rarely that nothing improved. The cause is that nobody built the measurement infrastructure before deploying, and post-hoc measurement is almost always compromised. A pilot that returns ambiguous results at 90 days produces organizational skepticism that makes the next pilot harder to fund, harder to staff, and harder to ship. The signal failure compounds across the entire program.

A workflow can be low risk, high value, and still be a terrible first pick when the signal quality is poor. Take a pilot aimed at a customer support workflow where current satisfaction is measured quarterly, in a survey with a 15% response rate. Even if the AI helps, the result lands at 90 days inside a wide confidence interval, and leadership has already moved on. Your first three picks should each score well on at least two of the three axes, and at least one should max out signal quality. That means workflows where a database already tracks the metric you care about, at a frequency short enough to see change inside a month.

Risk

Brand exposure plus regulatory surface plus reversibility. Low risk means the rollback does not require a public statement.

Value

Annualized time saved plus decision quality plus revenue impact. Without a measurable baseline the value is fiction.

Signal Quality

Outcome measurable inside 30 days against a clean baseline. Failure mode visible. If you can't see the result, the pilot did not happen.

Workflow	Risk	Value	Signal Quality	Verdict
Customer service chatbot	HIGH — brand exposure, legal liability	High ceiling	POOR — satisfaction measured quarterly	Wrong first pick. High ceiling, brutal learning environment.
Meeting summarization	LOW — internal only	Medium — $40–80K/year per team in time recovered	EXCELLENT — measurable inside one week	Best high-signal pilot. Ship this first.
Code review assistant	LOW — developer-facing	High — cycle-time reduction, fewer production bugs	EXCELLENT — sprint-level signal	Strong second pick. Fast feedback, real value.
AI for B2C marketing copy	MEDIUM — brand voice exposure	Medium	POOR — attribution requires 60–90 day cycles	Poor first pick. The signal arrives after leadership has lost interest.
Sales call coaching	LOW — internal only	High — conversion rate is measurable	GOOD — 30-day sales cycle gives a usable signal	Strong high-value pick. Requires clean CRM data.
Contract review	MEDIUM — legal exposure on misclassification	High — $200–500 per contract in legal time	GOOD — review time is measurable on day one	Solid pick. Legal ownership has to be single-threaded.
IT support ticket triage	LOW — internal only	Medium — resolution time reduction	EXCELLENT — SLA data is already the baseline	Excellent high-friction pilot when the IT estate is legacy.
Financial close commentary	LOW — internal reporting only	Medium — hours per close cycle	GOOD — close cycle is the natural measurement window	Solid pick when finance owns the outcome cleanly.

Three Pilots, Three Different Questions

Picking three similar pilots is the most common structural mistake. The point is to learn three different things.

Here is the mistake that follows once teams accept the "start with high-signal internal use cases" advice: they pick three high-signal internal use cases and learn the same thing three times. Three meeting summarizers. Three document classifiers. Three code assistants. The model works fine. They learn nothing about whether their teams can change workflows, whether the legacy integration layer is as bad as suspected, or whether anything makes it from pilot to production in this organization.

The anti-portfolio approach treats the first three pilots as three different questions about your AI readiness — not three bets on value. Each pilot exposes a different constraint. When all three complete, you have a multi-dimensional read on where you can scale and where you will be blocked.

Pilot 1 tests LLM viability. Can a language model add value here, and can you measure it. Pick the workflow with the highest signal quality and the lowest risk. This is the learning lab. When it fails, the cause is selection, not technology.

Pilot 2 tests value at scale. Is there a workflow with a known cost or time spend where AI compresses it measurably inside one quarter. Pick the candidate with the largest annualized dollar value, a clean owner, and a clean baseline.

Pilot 3 tests integration friction. Deliberately pick a workflow that requires touching your hardest legacy system. Not as masochism. As reconnaissance. You need to know how hard the integration actually is before you bet a roadmap on assumptions about it.

Pilot 1 — High Signal (maximize learning per dollar)

✓ Zero customer or brand exposure: a failure stays inside the team
✓ Baseline already exists today: the 'before' number is queryable without new instrumentation
✓ Outcome measurable inside two weeks of production use
✓ Single system of record: no cross-system integration in scope
✓ Failure mode is visible: when it goes wrong, you know why within days

Pilot 2 — High Value (maximize ROI per quarter)

✓ Known annualized cost or time spend: $200K+ in identifiable spend being compressed
✓ Single owner with budget authority and concrete success criteria
✓ Regulatory surface is low or well-understood
✓ At least one comparable internal or industry deployment as reference
✓ Path to production is 60 days or fewer, not a six-month integration project

Pilot 3 — High Friction (maximize integration learning)

✓ Touches at least one legacy system, specifically the one you are most uncertain about
✓ Internal-only workflow: integration failures cannot leak to customers
✓ Success criteria include 'we now know what the integration takes', not just output quality
✓ IT has agreed to treat this as exploration with shared ownership
✓ Timeline is 90 days, not 30: integration discovery is not a sprint

Three Pilots Feed One Decision

Each pilot runs in parallel and tests a different question. The synthesis node compresses the three signals into a single portfolio call: scale, kill, or reroute.

Score a Workflow in 30 Minutes or Don't Score It

The inputs are already inside the organization. The debate is the artifact, not the spreadsheet.

Scoring is not a quarterly planning process. A CIO who cannot score a candidate workflow in 30 minutes with a small team is missing the information that decides the outcome. The inputs are mostly already inside the organization. You are not running external research, you are auditing what your team already knows and what they are guessing at. The debate inside that 30-minute session is as valuable as the scores themselves. When a department head says "signal quality is fine" and IT says "we do not actually log that metric," you have just avoided a 90-day blind spot.

The rubric stays simple. For each axis, assign a score of 1–3. A 3 on risk means the workflow is low-risk on all three sub-dimensions: brand, regulatory, reversibility. A 1 means you are exposed. Plot every candidate on the three axes and prioritize workflows that score well on at least two, with extra weight on signal quality — because a high-value workflow you cannot measure is just a story. McKinsey's 2025 state of AI research found that organizations running three or more AI use cases in production hit 160% average ROI, while those with one realized 40%^[7]. The multiplier comes from organizational learning, not from any single workflow. Your first three picks are practice for the picks that actually matter.
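The rubric above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the `WorkflowScore` class, the reading of "scores well" as a 3, and the doubled weight on signal quality are all assumptions made for the example. The two sample rows translate the comparison table's LOW/HIGH/POOR labels into numbers by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowScore:
    """One candidate workflow scored 1-3 per axis (3 is always the good end)."""
    name: str
    risk: int    # 3 = low-risk on brand, regulatory, reversibility; 1 = exposed
    value: int   # 3 = large, quantified value; 1 = no concrete number
    signal: int  # 3 = measurable inside 30 days against a clean baseline

    def __post_init__(self) -> None:
        for axis in (self.risk, self.value, self.signal):
            if axis not in (1, 2, 3):
                raise ValueError("each axis must be scored 1, 2, or 3")

    def strong_axes(self) -> int:
        # Assumption: "scores well" means a 3 on that axis.
        return sum(score == 3 for score in (self.risk, self.value, self.signal))

    def qualifies(self) -> bool:
        # The rubric asks for a good score on at least two of the three axes.
        return self.strong_axes() >= 2

    def priority(self) -> int:
        # Assumption: "extra weight on signal quality" is modeled as a 2x weight.
        return self.risk + self.value + 2 * self.signal


chatbot = WorkflowScore("Customer service chatbot", risk=1, value=3, signal=1)
summaries = WorkflowScore("Meeting summarization", risk=3, value=2, signal=3)

for wf in sorted((chatbot, summaries), key=WorkflowScore.priority, reverse=True):
    print(f"{wf.name}: priority={wf.priority()}, qualifies={wf.qualifies()}")
```

The weights matter less than the act of writing them down: once the numbers are explicit, the disagreement between a department head and IT has something concrete to attach to.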

[01]
List ten candidate workflows
Cast a wide net before narrowing. Solicit from department heads, pull from vendor proposals, review what comparable companies have shipped. You need diversity before structural selection is possible.
[02]
Score risk (1–3) for each candidate
Three sub-questions decide the score. Is this customer-facing? Is there regulatory exposure? When it fails publicly, can we reverse it inside 48 hours? A workflow that is internal-only, free of regulatory exposure, and reversible scores a 3.
[03]
Score value (1–3) for each candidate
Value has to land on a real number, not an aspiration. When you cannot identify an annualized dollar amount or a specific hours-per-week reduction tied to a cost line, the value score is 1 by default. You are guessing.
[04]
Score signal quality (1–3) for each candidate
Signal quality is the hardest axis to score honestly because it forces an admission of where the baseline does not exist. Can you measure the outcome inside 30 days? Is the baseline clean and current? Is the failure mode visible — or can a bad outcome hide for weeks?
[05]
Pick three that span the axes
After scoring all ten, do not just take the top three by total score. Select for axis diversity. Your three picks should collectively cover all three tests: high signal, high value, high friction. When the top three are all high-signal and low-friction, swap one out for the highest-friction candidate.
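The five steps above end in a selection that favors axis diversity over total score. One way to sketch that selection, under stated assumptions: the rubric scores risk, value, and signal, but not friction, so the `friction` field here is a hypothetical fourth number (3 = touches the hardest legacy system) added only for the example. The greedy order (signal first, then value, then friction) and the sample scores are likewise illustrative, not taken from real data.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    risk: int      # 3 = low risk
    value: int     # 3 = high, quantified value
    signal: int    # 3 = measurable inside 30 days
    friction: int  # hypothetical: 3 = touches the hardest legacy system


def pick_three(candidates: list[Candidate]) -> dict[str, Candidate]:
    """Greedy pick: one pilot per question, never the same workflow twice."""
    if len(candidates) < 3:
        raise ValueError("need at least three candidates")
    pool = list(candidates)

    # Pilot 1: highest signal quality, ties broken by lowest risk.
    high_signal = max(pool, key=lambda c: (c.signal, c.risk))
    pool.remove(high_signal)

    # Pilot 2: highest value among the rest, ties broken by signal quality.
    high_value = max(pool, key=lambda c: (c.value, c.signal))
    pool.remove(high_value)

    # Pilot 3: highest friction among the rest, ties broken by lowest risk
    # so integration failures stay internal.
    high_friction = max(pool, key=lambda c: (c.friction, c.risk))

    return {
        "high_signal": high_signal,
        "high_value": high_value,
        "high_friction": high_friction,
    }


scored = [
    Candidate("Meeting summarization", risk=3, value=2, signal=3, friction=1),
    Candidate("Code review assistant", risk=3, value=3, signal=3, friction=1),
    Candidate("Sales call coaching", risk=3, value=3, signal=2, friction=2),
    Candidate("IT support ticket triage", risk=3, value=2, signal=3, friction=3),
    Candidate("Customer service chatbot", risk=1, value=3, signal=1, friction=3),
]

for role, pick in pick_three(scored).items():
    print(f"{role}: {pick.name}")
```

Ties resolve to the earlier entry in the list, so the order of the candidate list is itself a choice worth making deliberately.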

Customer Service Is Almost Always the Wrong First Pick

The most common first pick combines the worst possible profile: highest brand risk, hardest tech, slowest signal.

Customer service AI assembles the worst possible profile for a first pilot in one workflow: highest brand risk, hardest technology, slowest signal loop. It is also the most frequently proposed first pick in enterprise AI strategy. That tells you who is setting the agenda.

The tech is hard because customer service requires generative reasoning, multi-turn conversation management, policy grounding, tone calibration, and escalation logic — operating on inputs that are adversarial, ambiguous, and emotionally charged. None of these are solved problems. Klarna ran one of the most publicized customer service AI deployments. The chatbot handled two-thirds of customer conversations at peak. Then satisfaction fell, complaints grew, and the company quietly began rehiring human agents^[5]. The efficiency metrics looked great right up until the customer experience metrics did not.

The signal loop is slow because customer satisfaction is measured quarterly in most organizations, response times are logged but satisfaction is not, and complaint volume is a lagging indicator that only rises after damage is done. You will not know your customer service pilot failed for 60 to 90 days, and by then the brand story has already been written.

Brand risk is asymmetric. A customer service AI that performs at 90% of human quality sounds good until you do the math. A 10% failure rate across thousands of daily interactions produces dozens of public complaints per week. Enterprise CX AI programs fail at 74% — the highest rate of any category^[6]. It is still the #1 first pick for organizations under vendor pressure to ship something visible.
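The math in that paragraph is easy to run for your own volumes. The sketch below is illustrative only: the 3,000 daily interactions and the 2% share of failed interactions that turn into public complaints are assumed inputs, not figures from any cited source. Only the 10% failure rate comes from the scenario above.

```python
def weekly_public_complaints(
    daily_interactions: int,
    failure_rate: float,
    public_share: float,
) -> float:
    """Estimate public complaints per week from failed interactions."""
    failed_per_week = daily_interactions * failure_rate * 7
    return failed_per_week * public_share


# Assumed inputs: 3,000 interactions a day, 10% fail, 2% of failures go public.
estimate = weekly_public_complaints(3_000, failure_rate=0.10, public_share=0.02)
print(f"Estimated public complaints per week: {round(estimate)}")
```

Even with a small share of failures going public, the weekly count lands in the dozens at this volume, which is the asymmetry the efficiency metrics do not show.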

Five Anti-Patterns That Kill the Pilot Before It Ships

Each one is a structural failure that the scoring rubric catches before commitment.

[01]

The Vendor Demo Pick

The workflow got picked because the demo was impressive. Demo environments are tuned for capability, not for the data quality, integration debt, or edge case distribution of your environment. The workflow that looks magical in a demo is usually the one with the deepest hidden integration cost in your stack.

[02]

The Volume Trap

High-volume workflows look like obvious AI targets. The reasoning: automate something done 10,000 times a day and the impact compounds. What it ignores is that high-volume often means high-consequence-per-error and deeply embedded process dependencies. Volume amplifies the upside and the failure rate equally.

[03]

The Crown Jewel Pilot

Picking your most strategically important workflow as pilot one because leadership wants AI applied where it matters most. This produces maximum political pressure, maximum scrutiny, and minimum tolerance for the iterative failure that good pilots require. It also guarantees contested ownership across every senior stakeholder.

[04]

The Greenfield Lie

Picking a workflow with no existing baseline because building from scratch feels cleaner than measuring against a messy current state. There is no ROI argument without a before-and-after. A pilot without a baseline is a science project — interesting output, no business case.

[05]

The Three-of-the-Same Portfolio

Three workflows that test the same question. Three document classifiers tell you that document classification works — and nothing else. You learn nothing about change management capacity, integration friction, or cross-functional ownership dynamics. Your second batch of picks lands as uncertain as the first.

What the First 90 Days Actually Look Like

Inventory to first shipped pilot, in concrete weekly increments.

[01]
Weeks 1–3: build the inventory
Before scoring anything, gather raw material. Run 30-minute structured interviews with department heads from each major function. You are looking for three signals: where manual time is being spent, where the current process has a measurable baseline, and where the integration chain is cleanest.
[02]
Weeks 4–6: score and debate
Apply the three-axis rubric to the candidate list with a small cross-functional team — IT, legal, one business unit leader. The point is to surface disagreement about risk and integration complexity before commitment, not after.
[03]
Weeks 7–9: commit and baseline
Lock the three picks. For each, establish the baseline before any AI deploys. This is non-negotiable. The baseline measurement runs at least two weeks before deployment so it is not contaminated by the novelty effect of the launch.
[04]
Weeks 10–12: ship Pilot 1, instrument all three
Deploy the high-signal pilot in week 10. Stand up the measurement infrastructure for all three pilots so the baselines for Pilots 2 and 3 keep running while Pilot 1 is in early deployment. The measurement infrastructure carries as much weight as the AI itself. Organizations that measure rigorously are 3x more likely to scale from pilot to production.

Operating Questions

Leadership insists on customer-facing as Pilot 1. What now?

Make the trade-offs explicit, in writing, before commitment. Document the risk profile, the absent signal loop, and the brand exposure. Then propose a parallel path: run a small internal pilot concurrently so the learning environment does not depend on the customer-facing pilot succeeding. Leadership often insists on customer-facing because nobody has handed them an honest risk inventory. Hand them one.

How do you measure signal quality before you ship?

You are measuring whether the measurement infrastructure exists, not whether the AI is good. Before deployment, three questions decide it. Does this metric exist in a system today? Can you query it without manual effort? Is it measured at a frequency short enough to see 30-day changes? Any 'no' drops the signal score. The point is to identify measurement gaps before deployment, not to retrofit measurement after the pilot is live.
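The three pre-deployment questions translate directly into a check. A minimal sketch, assuming each "no" costs one point from a starting score of 3 with a floor of 1. The text only says a "no" drops the score, so the size of the drop and the function and parameter names are assumptions.

```python
def signal_quality_score(
    metric_exists_in_a_system: bool,
    queryable_without_manual_effort: bool,
    frequent_enough_for_30_day_change: bool,
) -> int:
    """Score signal quality 1-3 from the three measurement questions."""
    answers = (
        metric_exists_in_a_system,
        queryable_without_manual_effort,
        frequent_enough_for_30_day_change,
    )
    # Assumption: start at 3 and lose one point per "no", never below 1.
    return max(1, 3 - answers.count(False))


# SLA resolution time: logged in the ticketing system, queryable, tracked daily.
print(signal_quality_score(True, True, True))

# Quarterly satisfaction survey: exists, but needs manual export and is too slow.
print(signal_quality_score(True, False, False))
```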

Should the three pilots run in the same business unit?

No, deliberately. Running all three in one business unit tests AI readiness in a single organizational context. It produces good signal for that unit and poor signal for everyone else. Spread the pilots across at least two business units. The integration friction pilot in particular should touch the system that the most diverse set of teams depends on, not the cleanest system in your most cooperative department.

What if all three fail?

Three clean failures are more actionable than one ambiguous success. They mean the readiness gap is systemic, not workflow-specific. Audit the failure modes. When all three stalled on integration, the problem is infrastructure debt — typically six to nine months to resolve. When all three stalled on adoption, the problem is change management and no amount of better technology fixes it. When all three stalled on data quality, the problem is data governance, and the next investment is data engineering before any further AI dollar. Each failure points to a specific structural fix instead of a vague 'do better.'

When does the program graduate from pilots to platform investment?

When at least two of the three pilots reach production and sustain measurable value for 60 days post-launch. At that point you have organizational evidence, not vendor promises, that AI delivers in your context. That is the credibility threshold for a platform conversation. A platform bet placed before that evidence is faith, not strategy.
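That graduation rule is simple enough to state as a check. The sketch below assumes a hypothetical `PilotOutcome` record and treats "sustained measurable value" as a day count someone has already verified. The thresholds of two pilots and 60 days come from the answer above.

```python
from dataclasses import dataclass


@dataclass
class PilotOutcome:
    name: str
    reached_production: bool
    days_of_sustained_value: int  # days of measurable value since launch


def ready_for_platform(
    outcomes: list[PilotOutcome],
    min_pilots: int = 2,
    min_days: int = 60,
) -> bool:
    """True when enough pilots are in production with sustained value."""
    sustained = [
        o for o in outcomes
        if o.reached_production and o.days_of_sustained_value >= min_days
    ]
    return len(sustained) >= min_pilots


outcomes = [
    PilotOutcome("High signal", reached_production=True, days_of_sustained_value=75),
    PilotOutcome("High value", reached_production=True, days_of_sustained_value=40),
    PilotOutcome("High friction", reached_production=False, days_of_sustained_value=0),
]
print(ready_for_platform(outcomes))
```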

First Three Workflow Selection Checklist

Raw candidate list of 10+ workflows built from department head interviews
Each candidate scored on Risk, Value, and Signal Quality (1–3 per axis)
Three pilots selected that span all three axes — not three of the same type
Zero customer or brand exposure on at least two of the three picks
Clean, queryable baseline established for each pilot before any deployment
Single-threaded ownership assigned to each pilot — one accountable person, not a committee
One high-friction pilot deliberately included that touches a legacy integration
Documented why customer service was not picked (when it came up)
Measurement infrastructure in place before Pilot 1 ships
30-day readout scheduled with every stakeholder before any pilot scales beyond one team

The point of the framework is not caution. Caution as a default is how AI initiatives turn into 18-month design-by-committee exercises that produce decks and no production deployments. The point is to maximize learning per dollar in the first 90 days, because the second three picks land substantially better when the first three were chosen with intention.

Gartner forecast that 30% of GenAI projects would be cancelled after PoC by end of 2025^[3] — citing poor data quality, underestimated complexity, and unclear ROI. Every one of those failure conditions is detectable up front, in a 30-minute structured scoring session. Poor data quality shows up the moment you try to establish a baseline and the metric does not exist in any system. Underestimated complexity shows up when IT estimates the integration timeline. Unclear ROI shows up when nobody can name the dollar amount the workflow currently costs. The framework does not prevent failure. It makes the failure modes visible before resources commit.

MIT NANDA's number is the one to leave you with. 5% of enterprise AI initiatives reach production^[4]. The 95% are not mostly failing on technology. They are failing because the selection criteria were wrong, the baselines did not exist, and the organizational conditions were never verified before commitment. The organizations that graduate to platform investment treated their first three pilots as structured experiments, not vendor-driven bets. Pick boring. Pick measurable. Pick diverse. Then scale what worked.

Key terms in this piece

AI workflow selection framework, first AI use case, AI pilot selection, AI use case prioritization, enterprise AI projects, AI proof of concept

Sources

[1] MIT report: 95% of generative AI pilots at companies are failing. Fortune (fortune.com)
[2] 88% of AI pilots fail to reach production. CIO.com / IDC Research (cio.com)
[3] Gartner Predicts 30% of Generative AI Projects Will Be Abandoned After PoC by End of 2025 (gartner.com)
[4] MIT NANDA, The GenAI Divide: State of AI in Business 2025 (mlq.ai)
[5] I hate customer-service chatbots: The consumer-AI refund relationship is off to a rocky start. CNBC (cnbc.com)
[6] Why 74% of Enterprise CX AI Programs Fail, and How to Make Them Work (eglobalis.com)
[7] The State of AI in 2025: Agents, Innovation, and Transformation. McKinsey (mckinsey.com)
[8] Why 42% of AI Projects Show 0 ROI (And How to Be in the 58%). Beam.ai (beam.ai)
