60% of agentic projects stall on data, not models. A 30-minute, three-tier gate — Foundation, Workflow, Autonomous — that decides what autonomy your data can actually support, with a retrofit pattern for legacy systems you cannot rewrite.
Gartner found that 60% of agentic AI projects stall or get cancelled because the data was not ready when development started.[1] Not the model. Not the framework. Not the team. The data. Walk into any kickoff and you find one of two things: a 47-line enterprise audit nobody finishes, or nothing at all.
One team we tracked shipped a customer churn agent that passed every offline test on a clean CSV. Production hit them in week six. The CRM's last_active column was being backfilled by a nightly batch — the agent was treating data 23 hours stale as live, recommending retention plays for customers who had already cancelled. A 30-minute review would have caught it in minute eight.
Fivetran's 2026 Agentic AI Readiness Index found that only 15% of organizations are fully prepared to support agentic AI in production, yet 41% are already running it there.[6] The gap does not close by deploying faster. It closes by gating correctly before the first line of agent code is written.
This checklist closes that gap. Three tiers, each with a binary pass/fail gate. Tier 1 clears you to build. Tier 2 clears you to ship to users. Tier 3 clears you to remove the human from the loop. None of it takes longer than a sprint planning session.
60%: Gartner, August 2025. Insufficient AI-ready data is the primary cancellation driver, not model quality [1]
15% fully prepared: Fivetran 2026 Readiness Index. 41% are already running agentic AI despite critical data gaps [6]
vs. 18% for teams that hit data issues mid-build. The same gate run earlier closes a 3.7x gap [3]
Enterprise frameworks exist. Lean ones do not. So teams skip the review entirely.
The Gartner data readiness checklist is genuinely thorough. It is also paywalled, dozens of line items long, and requires coordinated input from data governance, legal, and infrastructure. For a product team with a two-week runway to prove a first agent, it is functionally inaccessible.
The result is a binary collapse. Teams either skip the data review entirely, or they do a surface-level "we have the data" check and move on. The first approach fails in week four when the agent returns nonsense. The second fails in production when the agent acts on a stale or malformed record and ships a real consequence to a real customer.
The Fivetran index found the top barriers to agentic AI goals are: data quality and lineage (cited by 42% of respondents), regulatory compliance (39%), and security and privacy risk (39%).[6] All three are checkable before the first agent tool call — but only if the gate exists.
The 30-minute scoreboard below is not a replacement for a data governance program. It is a go/no-go gate — the minimum signal you need to know whether your data can support the agent you are about to build, and at what autonomy level it is safe to operate.
Schema and freshness are binary. The contract is enforced or it is not.
Foundation gates are binary. The data contract exists and is enforced, or it does not. This tier runs in roughly 10 minutes and blocks everything downstream if it fails.
The two failure modes that show up over and over: schema drift and undefined freshness SLAs. Schema drift happens when an agent is built against a column that gets renamed, split, or silently dropped by an upstream migration nobody told the agent team about. Freshness SLA failures happen because nobody ever wrote down how stale is too stale — which means there is no way to know when the SLA is violated. Drift is the default state of any contract without an owner.
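The two Foundation checks can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the column names, the one-hour SLA, and both function names are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contract for illustration; replace with your own columns and SLA.
EXPECTED_COLUMNS = {"customer_id", "last_active", "plan"}
MAX_AGE = timedelta(hours=1)


def missing_columns(actual_columns, expected=EXPECTED_COLUMNS):
    """Return contract columns the live table no longer exposes (schema drift)."""
    return sorted(set(expected) - set(actual_columns))


def is_fresh(last_updated, now=None, max_age=MAX_AGE):
    """True when the newest record is no older than the written-down SLA."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated <= max_age
```

Fed the column list from production and the newest update timestamp, a non-empty `missing_columns` result or a `False` from `is_fresh` is a gate failure. A table last written 23 hours ago fails a one-hour SLA immediately, which is the kind of gap the churn example above ran into.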
Null rate thresholds deserve more attention than they usually get. A column that is null 0.3% of the time in your dev extract may be null 8% of the time in production at peak load, depending on write patterns. Check null rates on production data, at peak and off-peak, before you design the agent's fallback behavior.
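A rough sketch of that comparison, assuming you can pull one sample of a critical column at peak and one off-peak. The sample names and the 1% threshold used in the usage note are illustrative assumptions, not recommended values.

```python
def null_rate(values):
    """Fraction of values that are None; 0.0 for an empty sample."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def null_gate(samples, threshold):
    """Map each named sample (e.g. peak, off-peak) to pass/fail against the threshold."""
    return {name: null_rate(vals) <= threshold for name, vals in samples.items()}
```

With a 0.3% null rate off-peak and 8% at peak, `null_gate(samples, 0.01)` passes the first sample and fails the second, so the fallback behavior has to be designed for the peak case.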
For legacy systems you did not build, you are not checking whether the data is perfect. You are checking whether you can wrap a contract around what exists, and whether the system can honor it.
Paper governance is not enforcement. The owner answers in four hours or there is no owner.
Governance failures are the most expensive Tier 2 failure mode and almost entirely invisible until an agent takes a bad action in production. The shape is always the same: a data quality policy in Confluence, a data owner assigned in a kickoff meeting, and zero mechanism to surface a violation before the agent reads the data.
This is not hypothetical. An internal analyst agent at a mid-size e-commerce company sent promotional emails to already-churned customers because the suppression list had not been updated since a data migration four months earlier. The owner existed on paper. The policy existed on paper. Neither was wired into the agent's access path.
Access scoping matters more than it looks on a checklist. An agent service account with SELECT * permissions on a customer table is one schema change away from accidentally reading PII fields the agent was never meant to touch. The right model is a read-only service account with explicit column-level grants — not table-level, column-level. If your database does not support column-level grants, the wrapper pattern (see the legacy section) is the fix.
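The same idea can be mirrored in application code by building queries only from an explicit allowlist, so a wildcard or an ungranted column is rejected before it reaches the database. The sketch below is illustrative only: `GRANTED`, the table name, and the column names are hypothetical, and it complements database-level grants rather than replacing them.

```python
# Hypothetical column-level grants for one agent service account.
GRANTED = {"customers": {"customer_id", "plan", "last_active"}}


def build_select(table, columns):
    """Build a SELECT over explicitly granted columns only; '*' is never accepted."""
    granted = GRANTED.get(table, set())
    denied = [c for c in columns if c not in granted]
    if not columns or denied:
        raise PermissionError(f"columns not granted on {table}: {denied or 'none requested'}")
    return f"SELECT {', '.join(columns)} FROM {table}"
```

Because `*` is not a granted column name, a new PII column added upstream stays invisible to the agent until someone grants it on purpose.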
Schema change notifications are chronically under-wired. dbt's --warn-error flag on column removal, database NOTIFY events on DDL, or a simple pre-migration checklist that asks "which agents read this table?" — any of these is sufficient. The team that skips this finds out about the schema change when the agent starts returning parse errors at 2am.
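The pre-migration question "which agents read this table?" only works if the answer is written down somewhere queryable. A minimal sketch, assuming a hand-maintained registry; the `READERS` mapping and the agent names are hypothetical.

```python
# Hypothetical registry: table name -> agents that read it.
READERS = {
    "customers": ["churn-agent"],
    "orders": ["churn-agent", "refund-agent"],
}


def agents_to_notify(changed_tables, readers=READERS):
    """Return the agents whose inputs are touched by a planned schema change."""
    return sorted({agent for table in changed_tables for agent in readers.get(table, [])})
```

A migration touching `orders` and an unregistered `invoices` table yields `["churn-agent", "refund-agent"]`, the list of teams to alert before the DDL ships.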
Tier 2 checks that governance is operational, not documentary. A named owner who actually answers. A quality gate that fires before the agent reads. An incident response runbook the agent team can follow when the data degrades at 3am.
Lineage and audit trail are the bar for unsupervised production. Without them, post-incident analysis is guesswork.
Tier 3 is where an agent qualifies for full autonomy, or does not. Most teams skip it on first builds. That is exactly why those agents perform well in supervised mode and silently drift the moment oversight is relaxed.
Lineage is the hard part. Not technically — implementing it is straightforward. The hard part is buy-in from whoever owns the upstream systems. Every record the agent acts on needs a traceable chain: where it came from, when it last updated, who authorized it for autonomous use. Without that chain, post-incident analysis is guesswork. Regulators do not accept guesswork.
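One way to make that chain checkable is to attach a small provenance record to every row the agent may act on and refuse autonomous action when any link is missing. The sketch below is an assumption about shape, not a standard: the `Lineage` class and its field names are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Lineage:
    """Provenance for one record: origin, last update, and who cleared it for autonomy."""
    source: Optional[str]
    last_updated: Optional[datetime]
    authorized_by: Optional[str]

    def is_complete(self) -> bool:
        """True only when every link in the chain is present."""
        return all(v is not None for v in (self.source, self.last_updated, self.authorized_by))
```

A record whose `authorized_by` is empty can still be read in supervised mode, but `is_complete()` returning `False` is the signal to keep a human in the loop.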
ISO/IEC 42001, the first international AI management system standard, requires organizations to maintain records of AI system performance and decision-making. The retention guidance cited here aligns log retention to regulatory risk tier: 180 days minimum, 365 days for high-risk decisions.[9] If your agent touches customer data, financial records, or any regulated domain, audit trail retention is not a nice-to-have.
Data drift is the failure mode that creeps in slowly. An agent trained or prompted against a data distribution that shifts over time will degrade silently: no errors, just increasingly wrong outputs. The industry standard metric is the Population Stability Index (PSI): below 0.1 is stable, 0.1–0.2 warrants investigation, and above 0.2 requires action, whether retraining, re-prompting, or restricting the agent's autonomy until the distribution is understood.[7] Configure PSI monitoring before you scale, not after the agent starts returning strange results.
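For reference, PSI over pre-binned proportions is the sum of (actual - expected) * ln(actual / expected) across bins. The sketch below assumes both distributions are already binned identically and sum to 1. The `eps` floor for empty bins and the action labels are illustrative choices, with cutoffs taken from the thresholds above.

```python
import math


def psi(expected, actual, eps=1e-6):
    """Population Stability Index over two lists of bin proportions."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual must use the same bins")
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # floor empty bins to avoid log(0)
        total += (a - e) * math.log(a / e)
    return total


def drift_action(value):
    """Map a PSI value to the action tiers used in this checklist."""
    if value > 0.2:
        return "restrict"
    if value > 0.1:
        return "investigate"
    return "stable"
```

A feature that was split 50/50 at baseline and is now 70/30 scores roughly 0.17, which lands in the investigate band rather than forcing an autonomy restriction.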
Cost tracking is the underrated layer. High-volume agents — RAG pipelines hitting a vector store on every invocation, agents querying a live database per task — generate per-call costs that compound silently until someone checks the billing dashboard. Build cost observability before you scale, not after.
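Cost observability can start as a ledger that every data call reports into. This is a deliberately small sketch: the `CostLedger` class, the agent names, and the unit costs in the usage note are hypothetical, and a production version would export to whatever dashboard you already run.

```python
from collections import defaultdict


class CostLedger:
    """Accumulates per-agent data-access cost so it is visible before scale."""

    def __init__(self):
        self._totals = defaultdict(float)

    def record(self, agent, calls, unit_cost):
        """Add the cost of a batch of calls for one agent."""
        self._totals[agent] += calls * unit_cost

    def total(self, agent):
        return self._totals[agent]

    def over_budget(self, budgets):
        """Return agents whose running total exceeds their budget."""
        return sorted(a for a, t in self._totals.items() if t > budgets.get(a, float("inf")))
```

Recording 1,000 vector-store lookups at 0.25 each against a budget of 200 puts that agent in the `over_budget` list, which is the alert you want before the billing dashboard delivers it.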
How to pass Tier 1 and Tier 2 on a 2009 Oracle database with no SLA and an owner who has left.
The hardest case: you need to clear Tier 1 and Tier 2 against an Oracle instance from 2009 that has no SLA documentation, undocumented schema, and a listed owner who left two years ago. You cannot re-platform. You have two sprints.
The approach that works is a semantic wrapper layer — a lightweight service between the agent and the legacy system that enforces the schema contract and freshness SLA on every read. The legacy database is unchanged. The wrapper handles the contract the agent expects.
This is the adapter pattern applied to data access, with one specific goal: make legacy data agent-safe without touching the source. The wrapper is read-only — it caches and validates, never writes back. The operational win is the part teams underestimate: the wrapper lets you bolt lineage logging and cost instrumentation onto queries that were previously opaque.
The wrapper also unblocks Tier 2 access scoping. Rather than trying to carve column-level grants on an Oracle schema with years of accumulated permissions, the wrapper exposes only the fields the agent needs. The legacy database credentials never leave the wrapper's environment. The agent authenticates against the wrapper, which authenticates against the database.
| Without the wrapper | With the wrapper |
|---|---|
| Agent queries Oracle directly via JDBC or hand-rolled SQL | Agent queries the wrapper API; legacy DB internals are opaque to the agent |
| No freshness SLA; data is hours or days stale and nobody knows which | Wrapper enforces the freshness SLA and returns a structured error on violation |
| Upstream schema changes break the agent silently at runtime | Schema contract validated on every request; mismatches surface at the boundary |
| Zero audit trail of what records the agent read or when | Wrapper logs every read with timestamp, agent identity, and call context |
| DB user permissions are broader than the agent's actual function requires | Permissions enforced at the wrapper layer; DB credentials never leave it |
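A minimal sketch of the wrapper's read path, under stated assumptions: `fetch` is a hypothetical callable standing in for the real database read, rows are plain dicts, and each row carries an `updated_at` timestamp. The class names and error codes are invented for illustration.

```python
from datetime import datetime, timezone


class ContractError(Exception):
    """Structured error surfaced to the agent when the contract is violated."""

    def __init__(self, code, detail):
        super().__init__(f"{code}: {detail}")
        self.code = code
        self.detail = detail


class LegacyWrapper:
    """Read-only layer that enforces field scope, schema, and freshness on every read."""

    def __init__(self, fetch, exposed_fields, max_age, freshness_field="updated_at", clock=None):
        self._fetch = fetch  # hypothetical: key -> raw row dict from the legacy system
        self._fields = tuple(exposed_fields)
        self._max_age = max_age
        self._freshness_field = freshness_field
        self._clock = clock or (lambda: datetime.now(timezone.utc))
        self.audit_log = []

    def read(self, agent_id, key):
        now = self._clock()
        row = self._fetch(key)
        # Log every read attempt, including the ones that fail the contract.
        self.audit_log.append({"at": now, "agent": agent_id, "key": key})
        missing = [f for f in (*self._fields, self._freshness_field) if f not in row]
        if missing:
            raise ContractError("schema_mismatch", f"missing fields: {missing}")
        if now - row[self._freshness_field] > self._max_age:
            raise ContractError("stale_data", f"record older than {self._max_age}")
        # Expose only the fields the agent was scoped to see.
        return {f: row[f] for f in self._fields}
```

The agent never sees columns outside `exposed_fields`, and a stale or drifted record fails at the boundary with a code the agent can branch on instead of a silent bad read.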
One sprint planning slot. Three gates. A go/no-go signal scoped to autonomy, not project life.
The rubric maps each gate to one of three outcomes: pass (proceed), partial (proceed with a documented mitigation), fail (block). Partial means the workaround is real but temporary — the risk is named, the plan is on paper, the owner is real.
You do not need every Tier 3 gate to start building. You do need every Tier 1 gate to pass before any agent code is written, and every Tier 2 gate before real users touch the agent. Tier 3 is the bar for unsupervised production. Most teams reach Tier 2 in the first sprint and Tier 3 over the next two.
| Tier | Gate | Weight | Pass Condition | Fail Action |
|---|---|---|---|---|
| 1 — Foundation | Schema contract | Critical | All expected columns documented and verified against production | Block build |
| 1 — Foundation | Freshness SLA defined | Critical | Max acceptable data age per table specified before build starts | Block build |
| 1 — Foundation | Agent-accessible without human | High | No manual export or copy step in the access path | Scope down or fix |
| 1 — Foundation | Null rates acceptable | Medium | Critical columns under defined null threshold on production data | Document risk |
| 1 — Foundation | Referential integrity | Medium | FK relationships the agent traverses resolve in production | Document risk |
| 2 — Workflow | Data owner on call | Critical | Named person responds to incidents inside four hours | Block autonomous use |
| 2 — Workflow | Quality rules enforced upstream | High | Checks fire before records reach agent-accessible tables | Add quality gate |
| 2 — Workflow | Incident response runbook | High | Runbook covers agent behavior when data quality degrades | Write before shipping |
| 2 — Workflow | Access permissions audited | Critical | Service account is read-only on minimal required column scope | Fix before deploy |
| 2 — Workflow | Schema change notifications | High | Agent team receives alerts before upstream DDL changes ship | Wire in before deploy |
| 3 — Autonomous | Lineage tracked | Critical | Every acted-on record has queryable provenance | Scoped deploy only |
| 3 — Autonomous | Audit trail enabled | Critical | All reads and writes logged with agent identity context; 180+ day retention | Block full autonomy |
| 3 — Autonomous | Cost monitoring live | High | Per-agent data cost visible in dashboard before scale | Add before scaling |
| 3 — Autonomous | Data drift alerts (PSI) | High | PSI monitored; alert at >0.1, restrict autonomy at >0.2 | Monitor manually first |
| 3 — Autonomous | Rollback path defined | High | Runbook covers reverting actions taken on corrupted records | Define before scale |
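The tier logic can be reduced to a small function that returns the highest stage the data has cleared. This is a simplified sketch: it treats any `fail` as blocking its tier and ignores the Weight column, and the stage labels and function name are assumptions made for illustration.

```python
STAGES = ["blocked", "build", "ship_supervised", "autonomous"]


def cleared_stage(results):
    """results maps tier number (1, 2, 3) to a list of 'pass' / 'partial' / 'fail' outcomes.

    Tiers are evaluated in order; the first tier with a missing or failed gate
    caps the stage. 'partial' counts as proceed-with-mitigation.
    """
    stage = 0
    for tier in (1, 2, 3):
        outcomes = results.get(tier, [])
        if not outcomes or "fail" in outcomes:
            break
        stage = tier
    return STAGES[stage]
```

An agent with Tier 1 and Tier 2 clear but a failed lineage gate comes back as `ship_supervised`: cleared for users with a human in the loop, not for unsupervised production.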
Gates scope autonomy. They do not blanket-block projects.
The right response to a Tier 1 failure is not to push the agent into production and hope. It is also not necessarily to halt the project. Gate failures define the agent's valid operating scope.
Fail a freshness SLA gate? The agent runs historical analysis tasks but not real-time decisions. That is a scoped deploy, not a dead project. Fail a lineage gate? Run the agent in supervised mode with human review on every batch — autonomy is something the data has to earn. The tier structure maps directly to the autonomy level the data can safely support.
One finding that surprises teams: those who run this gate and fail two or three checks consistently ship faster than teams who skip it. Discovering a freshness SLA problem in sprint planning costs two days. Discovering it in week six, after building decision logic on top of stale data assumptions, costs weeks of rework plus the credibility hit of a failed demo. Failing early is the better outcome. Always.
Full autonomy is not the default destination. Most agents should stay at Tier 2 permanently.
Tier 3 is not the goal for every agent. It is the bar for agents that need to operate without human oversight on consequential decisions. Most agents — internal analytics, draft generation, data enrichment, recommendation engines — should stay at Tier 2 with a human in the loop on anything that crosses a dollar threshold or touches a customer-facing record.
The clearest signal that Tier 3 is premature: you cannot explain, in one sentence, what the rollback process is if the agent acts on corrupted data. If that sentence takes more than 30 seconds to draft, the autonomy level is ahead of the data infrastructure.
A useful heuristic for the decision: if a human would need to review the output before it triggers an external action, the agent belongs at Tier 2. Tier 3 is for agents where the review cost is higher than the error cost — which is true for a smaller fraction of use cases than most teams assume when they first scope an agentic project.
Irreversible external actions require full lineage and audit trail before autonomy is granted.
ISO/IEC 42001 mandates documented evidence of AI system performance and decision-making for high-risk domains.
Freshness SLAs and schema contracts are domain-specific. A pass on one table does not transfer to another.
Tier 3 adds compliance overhead without material risk reduction when no external action is triggered.
Read-only agents with no action path carry lower blast radius — but still need schema and access controls before production.
Do I need every tier passing before I write a line of agent code?
No. Tier 1 has to pass first, because it establishes whether the data is even readable. Tier 2 has to pass before real users touch the agent. Tier 3 is the bar for unsupervised production. Build and iterate against Tier 1 while Tier 2 and Tier 3 work happens in parallel. Running the gate in stages is fine; the mistake is not running it at all.
What if I do not own or control the data?
Most Foundation gates are checkable with read-only access — run the schema and freshness queries yourself. If there is no identifiable data owner, that is a Tier 2 failure: escalate before building, do not paper over it. For undocumented legacy systems, the wrapper pattern adds a contract layer the source never sees. You define the contract you need, the wrapper enforces it, the legacy system stays untouched.
What if nobody knows how stale our legacy data actually is?
Measure it before you write the SLA. Run the freshness check from the Tier 1 code over 30 days of historical records to get observed maximum age. Add 20% margin and use that as your baseline. Then decide whether your agent's specific decisions are safe at that staleness — an agent recommending products tolerates more lag than one processing refunds. The number comes from the data, not from a meeting.
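The arithmetic is simple enough to script. A sketch, assuming you have already collected the observed record ages over the 30-day window as `timedelta` values; the function name is hypothetical and the 20% margin comes from the answer above.

```python
from datetime import timedelta


def baseline_sla(observed_ages, margin=0.20):
    """Observed maximum age plus a safety margin, as a starting freshness SLA."""
    if not observed_ages:
        raise ValueError("need at least one observed age")
    return max(observed_ages) * (1 + margin)
```

If the worst observed age over the window is 20 hours, the baseline SLA comes out at 24 hours. Whether 24-hour-old data is safe for the agent's decisions is the separate question the answer above asks you to settle next.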
Is this gate sufficient for HIPAA, SOC 2, or PCI?
No, and it is not designed to be. This gate produces a reliable build decision, not a compliance posture. Regulated environments need additional controls: data classification tags, encryption in transit and at rest, access log retention policies, breach notification paths. Treat this as the engineering foundation you layer compliance controls on top of — not a substitute.
How do I set a PSI drift threshold for a use case I have never run in production?
Start with the industry standard: alert at PSI > 0.1, restrict autonomy at PSI > 0.2. Run the agent in supervised mode for your first 30 days to collect a baseline distribution. If you see frequent false-positive alerts at 0.1 on a feature you know is inherently noisy, raise that feature's threshold to 0.15. Never raise above 0.25 without a written justification reviewed by the data owner. The goal is a threshold you will actually act on — not one you tune until alerts stop firing.
We have dozens of legacy tables the agent might need. Do we run this gate on all of them?
No. Run it on the tables the agent will read on the critical path for its first use case. Scope the initial gate to the minimum data surface — typically 3–6 tables. Document which tables are in scope and which are deferred. As the agent's capabilities expand, each new data domain gets its own gate run before the agent can access it. The gate is cheap; the cost is the 30 minutes, not the documentation.
Eighty-five percent of organizations are not fully prepared to support agentic AI in production, and 41% are running it there anyway.[6] The ones that close the gap are not running longer governance programs. They are running shorter, sharper gates, and running them before the first tool call, not after the first failure.