Data Readiness Before Build: 3-Tier Gate for Agentic AI

Data Readiness Before Build: The Three-Tier Gate for Agentic Projects

60% of agentic projects stall on data, not models. A 30-minute, three-tier gate — Foundation, Workflow, Autonomous — that decides what autonomy your data can actually support, with a retrofit pattern for legacy systems you cannot rewrite.

Data, Context & Knowledge · Intermediate · Apr 21, 2026 · 6 min read

By Viktor Bezdek · VP Engineering, Groupon

Gartner found that 60% of agentic AI projects stall or get cancelled because the data was not ready when development started.^[1] Not the model. Not the framework. Not the team. The data. Walk into any kickoff and you find one of two things: a 47-line enterprise audit nobody finishes, or nothing at all.

One team we tracked shipped a customer churn agent that passed every offline test on a clean CSV. Production hit them in week six. The CRM's last_active column was backfilled by a nightly batch, so the agent treated data up to 23 hours stale as live and recommended retention plays for customers who had already cancelled. A 30-minute review would have caught it in minute eight.

This checklist closes that gap. Three tiers, each with a binary pass/fail gate. Tier 1 clears you to build. Tier 2 clears you to ship to users. Tier 3 clears you to remove the human from the loop. None of it takes longer than a sprint planning session.

60% of agentic projects cancelled by data gaps: Gartner, August 2025. Insufficient AI-ready data is the primary cancellation driver, not model quality.^[1]
27% of pre-production failures traced to data quality: Digital Applied, March 2026. The second most common failure pattern, behind only scope creep.^[2]
67% on-time delivery rate when data is pre-audited, vs. 18% for teams that hit data issues mid-build. Running the same gate earlier closes a 3.7x gap.^[3]
30 minutes: the full assessment fits inside one sprint planning slot. No governance program required.
3 tiers: Foundation, Workflow, Autonomous. Each tier maps to a specific autonomy level.
Legacy-ready: a retrofit pattern adds a contract layer to legacy systems without touching the source.
Go/no-go signal: a failed gate scopes autonomy down. It does not blanket-block the project.

Why the Standard Pre-Build Audit Never Gets Run

Enterprise frameworks exist. Lean ones do not. So teams skip the review entirely.

The Gartner data readiness checklist is genuinely thorough. It is also paywalled, dozens of line items long, and requires coordinated input from data governance, legal, and infrastructure. For a product team with a two-week runway to prove a first agent, it is functionally inaccessible.

The result is a binary collapse. Teams either skip the data review entirely, or they do a surface-level "we have the data" check and move on. The first approach fails in week four when the agent returns nonsense. The second fails in production when the agent acts on a stale or malformed record and ships a real consequence to a real customer.

The 30-minute scoreboard below is not a replacement for a data governance program. It is a go/no-go gate — the minimum signal you need to know whether your data can support the agent you are about to build, and at what autonomy level it is safe to operate.

Tier 1: Foundation — Can the Agent Read What It Needs, When It Needs It?

Schema and freshness are binary. The contract is enforced or it is not.

Foundation gates are binary. The data contract exists and is enforced, or it is not. This tier runs in roughly 10 minutes and blocks everything downstream if it fails.

The two failure modes that show up over and over: schema drift and undefined freshness SLAs. Schema drift happens when an agent is built against a column that gets renamed, split, or silently dropped by an upstream migration nobody told the agent team about. Freshness SLA failures happen because nobody ever wrote down how stale is too stale — which means there is no way to know when the SLA is violated. Drift is the default state of any contract without an owner.
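Drift is easiest to catch when the contract is data rather than prose. The sketch below assumes the contract is a version-controlled mapping of column name to type, and that the observed mapping comes from a query such as information_schema.columns. The function name diff_schema is illustrative, not part of any library.

```python
def diff_schema(contract: dict[str, str], observed: dict[str, str]) -> dict:
    """Compare a version-controlled schema contract with the live schema.

    Both arguments map column name -> type name. Type names are compared
    case-insensitively, so "TEXT" and "text" count as the same type.
    """
    missing = sorted(set(contract) - set(observed))      # dropped or renamed upstream
    unexpected = sorted(set(observed) - set(contract))   # new columns nobody declared
    retyped = sorted(
        col for col in set(contract) & set(observed)
        if contract[col].lower() != observed[col].lower()  # silent type change
    )
    return {
        "missing": missing,
        "retyped": retyped,
        "unexpected": unexpected,
        # New columns are informational; missing or retyped ones break the contract.
        "pass": not missing and not retyped,
    }


contract = {"customer_id": "text", "status": "text", "last_active": "timestamp"}
observed = {"customer_id": "text", "customer_status": "text", "last_active": "date"}
print(diff_schema(contract, observed))
# {'missing': ['status'], 'retyped': ['last_active'], 'unexpected': ['customer_status'], 'pass': False}
```

A rename shows up as one missing column plus one unexpected column, which is the pattern the agent team needs to see before an upstream migration ships, not after.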

For legacy systems you did not build, you are not checking whether the data is perfect. You are checking whether you can wrap a contract around what exists, and whether the system can honor it.

Tier 1 — Foundation Gates

Schema contract written down: expected tables, columns, types — version-controlled, not in a Confluence page
Freshness SLA defined: max acceptable data age per table specified before any agent code is written
Agent reads without human intervention: no manual export, no copy step, no "someone refreshes the sheet"
Null rates measured against threshold: critical columns checked on production data, not the demo extract
Type consistency verified: no silent string/int coercion, no format drift across rows in the live dataset

tier1_foundation_gate.py

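from datetime import datetime, timedelta, timezone

from psycopg2 import sql

# Foundation gate: schema contract and freshness SLA.
# If either returns pass=False, the build does not start.

def check_schema_contract(
    conn, table: str, required_columns: list[str], schema: str = "public"
) -> dict:
    """Confirm the columns the agent will reference actually exist."""
    with conn.cursor() as cursor:
        # Filter on schema as well: table_name alone can match same-named tables.
        cursor.execute(
            "SELECT column_name FROM information_schema.columns "
            "WHERE table_schema = %s AND table_name = %s",
            (schema, table),
        )
        actual = {row[0] for row in cursor.fetchall()}
    missing = sorted(set(required_columns) - actual)
    return {"table": table, "missing": missing, "pass": not missing}

def check_freshness_sla(
    conn, table: str, timestamp_col: str, max_age_minutes: int, schema: str = "public"
) -> dict:
    """Measure observed data age against the proposed SLA. No SLA, no pass."""
    # Quote identifiers with psycopg2.sql instead of interpolating an f-string.
    query = sql.SQL("SELECT MAX({col}) FROM {tbl}").format(
        col=sql.Identifier(timestamp_col), tbl=sql.Identifier(schema, table)
    )
    with conn.cursor() as cursor:
        cursor.execute(query)
        latest = cursor.fetchone()[0]
    if latest is None:
        # Empty table or all-NULL timestamps: freshness cannot be shown, so fail.
        return {"table": table, "age_minutes": None,
                "sla_minutes": max_age_minutes, "pass": False}
    # timestamptz values come back timezone-aware; naive values are assumed UTC.
    if latest.tzinfo is None:
        latest = latest.replace(tzinfo=timezone.utc)
    age = datetime.now(timezone.utc) - latest
    return {
        "table": table,
        "age_minutes": int(age.total_seconds() / 60),
        "sla_minutes": max_age_minutes,
        "pass": age < timedelta(minutes=max_age_minutes),
    }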
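The Tier 1 gate code covers schema and freshness. The null-rate and type-consistency gates can be sketched the same way. This version assumes rows have already been pulled from production (for example, a recent sample) as a list of dicts. The thresholds and the names check_null_rates and check_type_consistency are illustrative.

```python
from collections import Counter
from typing import Any


def check_null_rates(rows: list[dict[str, Any]], thresholds: dict[str, float]) -> dict:
    """Fail any critical column whose share of missing values exceeds its threshold."""
    total = len(rows)
    rates = {
        # An empty sample counts as fully null: no data, no pass.
        col: (sum(1 for r in rows if r.get(col) is None) / total if total else 1.0)
        for col in thresholds
    }
    failing = sorted(col for col, rate in rates.items() if rate > thresholds[col])
    return {"null_rates": rates, "failing": failing, "pass": not failing}


def check_type_consistency(rows: list[dict[str, Any]], columns: list[str]) -> dict:
    """Flag columns whose non-null values arrive as more than one Python type."""
    mixed = {}
    for col in columns:
        types = Counter(type(r[col]).__name__ for r in rows if r.get(col) is not None)
        if len(types) > 1:
            mixed[col] = dict(types)  # type name -> how many rows carried it
    return {"mixed_types": mixed, "pass": not mixed}


sample = [
    {"customer_id": 1, "plan": "pro"},
    {"customer_id": "2", "plan": None},
    {"customer_id": 3, "plan": "basic"},
    {"customer_id": 4, "plan": None},
]
print(check_null_rates(sample, {"customer_id": 0.0, "plan": 0.25})["failing"])  # ['plan']
print(check_type_consistency(sample, ["customer_id"])["mixed_types"])
# {'customer_id': {'int': 3, 'str': 1}}
```

The type check matters most for loosely typed sources (CSV extracts, JSON payloads, legacy text columns), where a driver will not coerce values for you.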

Tier 2: Workflow — Does Anything Catch Bad Data Before the Agent Acts on It?

Paper governance is not enforcement. The owner answers in four hours or there is no owner.

Governance failures are the most expensive Tier 2 failure mode and almost entirely invisible until an agent takes a bad action in production. The shape is always the same: a data quality policy in Confluence, a data owner assigned in a kickoff meeting, and zero mechanism to surface a violation before the agent reads the data.

This is not hypothetical. An internal analyst agent at a mid-size e-commerce company sent promotional emails to already-churned customers because the suppression list had not been updated since a data migration four months earlier. The owner existed on paper. The policy existed on paper. Neither was wired into the agent's access path.

Tier 2 checks that governance is operational, not documentary. A named owner who actually answers. A quality gate that fires before the agent reads. An incident response runbook the agent team can follow when the data degrades at 3am.
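A quality gate that fires before the agent reads can be as simple as named rules applied at publish time, with failing records quarantined instead of landing in the agent-accessible table. This is a minimal sketch: records are dicts, "published" and "quarantined" are plain lists standing in for separate tables, and the rule names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Rule = Callable[[dict[str, Any]], bool]


@dataclass
class GateResult:
    published: list[dict[str, Any]] = field(default_factory=list)
    quarantined: list[tuple[dict[str, Any], list[str]]] = field(default_factory=list)

    @property
    def failure_rate(self) -> float:
        total = len(self.published) + len(self.quarantined)
        return len(self.quarantined) / total if total else 0.0


def quality_gate(records: list[dict[str, Any]], rules: dict[str, Rule]) -> GateResult:
    """Apply every rule before publishing; quarantine records that break any of them."""
    result = GateResult()
    for record in records:
        broken = [name for name, rule in rules.items() if not rule(record)]
        if broken:
            # Keep the names of the broken rules so the data owner can triage.
            result.quarantined.append((record, broken))
        else:
            result.published.append(record)
    return result


rules = {
    "has_email": lambda r: bool(r.get("email")),
    "status_known": lambda r: r.get("status") in {"active", "churned"},
}
batch = [
    {"email": "a@example.com", "status": "active"},
    {"email": "", "status": "churned"},
    {"email": "c@example.com", "status": "unknown"},
]
result = quality_gate(batch, rules)
print(len(result.published), result.quarantined[0][1])  # 1 ['has_email']
```

Quarantining rather than dropping keeps the failure rate visible, and that rate is the signal an incident runbook can key off.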

Tier 2 — Workflow Gates

Data owner on call: a named person responds to incidents in this domain inside four hours — not a RACI entry
Quality rules enforced upstream: checks fire before records reach agent-accessible tables, not just in BI reports
Incident response runbook exists: covers what the agent does when quality drops below threshold, in code
Access permissions audited: agent service account is read-only, scoped to exactly what it needs — nothing more
Schema change notifications wired in: agent team sees alerts before upstream changes ship, not after they break
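The runbook gate asks for the agent's behavior under degraded data to live in code. One way to express that, as a sketch: map the observed quality failure rate and the data owner's reachability to an operating mode. The mode names and the 1% and 5% thresholds are assumptions chosen to mirror the tier structure, not values from the source.

```python
from enum import Enum


class Mode(Enum):
    AUTONOMOUS = "autonomous"  # act without review
    SUPERVISED = "supervised"  # propose actions; a human approves each batch
    HALTED = "halted"          # read-only: no actions until the data recovers


def operating_mode(
    failure_rate: float,
    owner_reachable: bool,
    warn_at: float = 0.01,
    halt_at: float = 0.05,
) -> Mode:
    """Decide how much autonomy the agent keeps given current data quality."""
    if failure_rate >= halt_at:
        return Mode.HALTED
    if failure_rate >= warn_at or not owner_reachable:
        # Degraded data, or nobody to escalate to: keep a human in the loop.
        return Mode.SUPERVISED
    return Mode.AUTONOMOUS


print(operating_mode(0.002, owner_reachable=True))   # Mode.AUTONOMOUS
print(operating_mode(0.002, owner_reachable=False))  # Mode.SUPERVISED
print(operating_mode(0.08, owner_reachable=True))    # Mode.HALTED
```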

Tier 3: Autonomous — Can You Reconstruct Every Decision the Agent Made?

Lineage and audit trail are the bar for unsupervised production. Without them, post-incident analysis is guesswork.

Tier 3 is where an agent qualifies for full autonomy, or does not. Most teams skip it on first builds. That is exactly why those agents perform well in supervised mode and then drift silently once oversight is relaxed.

Lineage is the hard part. Not technically — implementing it is straightforward. The hard part is buy-in from whoever owns the upstream systems. Every record the agent acts on needs a traceable chain: where it came from, when it last updated, who authorized it for autonomous use. Without that chain, post-incident analysis is guesswork. Regulators do not accept guesswork.
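The traceable chain described above (where a record came from, when it last updated, who authorized it for autonomous use) can travel with each record as a small provenance envelope, and the agent can refuse to act autonomously on anything without an authorization. In this sketch, the Provenance fields mirror those three questions, authorized_by is a hypothetical field, and JSON lines stand in for whatever queryable store you actually use.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class Provenance:
    source: str                   # system and table the record came from
    source_updated_at: datetime   # when the source last updated this record
    authorized_by: Optional[str]  # who approved this source for autonomous use


@dataclass(frozen=True)
class TracedRecord:
    data: dict[str, Any]
    provenance: Provenance


def can_act_autonomously(record: TracedRecord) -> bool:
    """No authorization on the lineage chain, no autonomous action."""
    return record.provenance.authorized_by is not None


def lineage_entry(record: TracedRecord, agent_id: str, decision: str) -> str:
    """Serialize one queryable lineage line: what was used, from where, for what."""
    entry = {
        "agent_id": agent_id,
        "decision": decision,
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "record": record.data,
        "provenance": asdict(record.provenance),
    }
    return json.dumps(entry, default=str)  # datetimes serialize as strings


rec = TracedRecord(
    {"customer_id": "c-42", "status": "active"},
    Provenance("crm.customers", datetime(2026, 4, 20, 2, 0, tzinfo=timezone.utc), None),
)
print(can_act_autonomously(rec))  # False: route to human review instead
```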

Cost tracking is the underrated layer. High-volume agents — RAG pipelines hitting a vector store on every invocation, agents querying a live database per task — generate per-call costs that compound silently until someone checks the billing dashboard. Build cost observability before you scale, not after.
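Cost observability can start as a per-agent accumulator around every data call, well before a billing dashboard exists. This is a minimal sketch. The unit prices are placeholders for your provider's actual rates, and CostMeter is an illustrative name.

```python
from collections import defaultdict


class CostMeter:
    """Accumulate estimated data-access cost per agent and per resource."""

    def __init__(self, unit_costs: dict[str, float]):
        self.unit_costs = unit_costs  # placeholder price per call, by resource
        self.totals: defaultdict[tuple[str, str], float] = defaultdict(float)
        self.calls: defaultdict[tuple[str, str], int] = defaultdict(int)

    def record(self, agent_id: str, resource: str, calls: int = 1) -> None:
        key = (agent_id, resource)
        self.calls[key] += calls
        self.totals[key] += calls * self.unit_costs[resource]

    def per_agent(self) -> dict[str, float]:
        """Roll totals up to one number per agent, ready for a dashboard."""
        out: defaultdict[str, float] = defaultdict(float)
        for (agent_id, _), cost in self.totals.items():
            out[agent_id] += cost
        return dict(out)

    def over_budget(self, budgets: dict[str, float]) -> list[str]:
        """Agents whose spend exceeds their budget; agents without a budget are skipped."""
        return sorted(a for a, c in self.per_agent().items()
                      if c > budgets.get(a, float("inf")))


meter = CostMeter({"vector_search": 0.0004, "warehouse_query": 0.002})
for _ in range(1000):
    meter.record("churn-agent", "vector_search")
meter.record("churn-agent", "warehouse_query", calls=50)
print(round(meter.per_agent()["churn-agent"], 4))  # 0.5
```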

Tier 3 — Autonomous Gates

Lineage tracked: provenance of every record the agent acts on is queryable after the fact, not reconstructed from memory
Audit trail enabled: every read and write logged with timestamp, agent identity, and decision context, in enough detail to replay the decision
Cost monitoring live: per-agent data access cost visible on a dashboard before scale, not as an incident postmortem
Drift detection configured: alerts fire when input distribution shifts past a defined threshold
Rollback path defined: a runbook to identify and revert agent actions taken on stale or corrupted records
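Drift detection needs a concrete statistic and a threshold. The source does not name one. One common choice, used here for illustration, is the Population Stability Index (PSI) over binned values of a numeric input, comparing a baseline window with the current window. The 0.2 alert threshold is a widely used rule of thumb, not a universal constant.

```python
import math


def psi(baseline: list[float], current: list[float], bins: int = 10, eps: float = 1e-4) -> float:
    """Population Stability Index between two non-empty samples of one numeric input.

    Bin edges come from the baseline's range; current values outside it are
    clipped into the first or last bin. eps keeps empty bins out of log(0).
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # constant baseline: avoid zero-width bins

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        return [max(c / len(values), eps) for c in counts]

    expected, actual = proportions(baseline), proportions(current)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


def drift_alert(baseline: list[float], current: list[float], threshold: float = 0.2) -> bool:
    return psi(baseline, current) > threshold


baseline = [float(x % 100) for x in range(1000)]      # roughly uniform over 0..99
shifted = [float(x % 100) + 40 for x in range(1000)]  # same shape, moved up by 40
print(drift_alert(baseline, baseline), drift_alert(baseline, shifted))  # False True
```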

Retrofitting Legacy Systems You Cannot Rewrite

How to pass Tier 1 and Tier 2 on a 2009 Oracle database with no SLA and an owner who has since left.

The hardest case: you need to clear Tier 1 and Tier 2 against an Oracle instance from 2009 with no SLA documentation, an undocumented schema, and a listed owner who left two years ago. You cannot re-platform. You have two sprints.

The approach that works is a semantic wrapper layer — a lightweight service between the agent and the legacy system that enforces the schema contract and freshness SLA on every read. The legacy database is unchanged. The wrapper handles the contract the agent expects.

This is the adapter pattern applied to data access, with one specific goal: make legacy data agent-safe without touching the source. The wrapper is read-only — it caches and validates, never writes back. The operational win is the part teams underestimate: the wrapper lets you bolt lineage logging and cost instrumentation onto queries that were previously opaque.

Raw legacy access

Agent queries Oracle directly via JDBC or hand-rolled SQL
No freshness SLA — data is hours or days stale and nobody knows which
Upstream schema changes break the agent silently at runtime
Zero audit trail of what records the agent read or when
DB user permissions are broader than the agent's actual function requires

Wrapper enforcement

Agent queries the wrapper API — legacy DB internals are opaque to the agent
Wrapper enforces the freshness SLA and returns a structured error on violation
Schema contract validated on every request; mismatches surface at the boundary
Wrapper logs every read with timestamp, agent identity, and call context
Permissions enforced at the wrapper layer — DB credentials never leave it

legacy_wrapper.py

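from datetime import datetime, timezone
from functools import wraps
from typing import Any, Callable

# Wrapper layer. Source system is unchanged. Contract lives here.
# Read-only by design: caches and validates, never writes back.

legacy_oracle_conn: Any = None  # placeholder: an open python-oracledb connection

class FreshnessViolation(Exception):
    """Structured error the agent receives instead of stale data."""
    def __init__(self, source: str, age_seconds: float, sla_seconds: int):
        super().__init__(f"{source}: data is {age_seconds:.0f}s old, SLA is {sla_seconds}s")
        self.source = source
        self.age_seconds = age_seconds
        self.sla_seconds = sla_seconds

_cache: dict[str, dict] = {}

def _age_seconds(record: dict, field: str) -> float:
    """Age of the source data. Naive timestamps are assumed to be UTC."""
    ts = record[field]
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return (datetime.now(timezone.utc) - ts).total_seconds()

def with_freshness_sla(max_age_seconds: int = 300, freshness_field: str = "updated_at"):
    """Enforce a freshness SLA on legacy reads without touching the source."""
    def decorator(fetch_fn: Callable[..., dict]) -> Callable[..., dict]:
        @wraps(fetch_fn)
        def wrapper(*args, **kwargs):
            key = f"{fetch_fn.__name__}:{args}:{sorted(kwargs.items())}"
            cached = _cache.get(key)
            # Serve the cached copy only while its data is still inside the SLA.
            if cached is not None and _age_seconds(cached, freshness_field) <= max_age_seconds:
                return cached
            record = fetch_fn(*args, **kwargs)
            age = _age_seconds(record, freshness_field)
            if age > max_age_seconds:
                # Stale at the source: fail loudly instead of handing it to the agent.
                raise FreshnessViolation(fetch_fn.__name__, age, max_age_seconds)
            _cache[key] = record
            return record
        return wrapper
    return decorator

# Every legacy query carries an explicit SLA. Contract in code, not a wiki.
@with_freshness_sla(max_age_seconds=180)  # 3-minute ceiling
def get_customer_status(customer_id: str) -> dict:
    # updated_at is a placeholder for the table's row-level load timestamp.
    with legacy_oracle_conn.cursor() as cur:
        cur.execute(
            "SELECT status, last_active, updated_at FROM customers WHERE id = :id",
            id=customer_id,
        )
        row = cur.fetchone()
        if row is None:
            raise LookupError(f"customer {customer_id} not found")
        # Oracle reports upper-case column names; normalize them for the contract.
        return dict(zip((d[0].lower() for d in cur.description), row))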
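The audit side of the wrapper (every read logged with timestamp, agent identity, and call context) can be a second decorator stacked on the same fetch functions, for example above with_freshness_sla so that refused reads are logged too. This sketch assumes the agent identity is passed explicitly as a keyword argument agent_id and that Python's standard logging module is the sink. A production wrapper would more likely write to an append-only store.

```python
import json
import logging
import time
from datetime import datetime, timezone
from functools import wraps
from typing import Any, Callable

audit_log = logging.getLogger("agent.audit")


def audited_read(fetch_fn: Callable[..., Any]) -> Callable[..., Any]:
    """Log every read through the wrapper: who, what, when, and whether it succeeded."""
    @wraps(fetch_fn)
    def wrapper(*args, agent_id: str, **kwargs):
        started = time.perf_counter()
        outcome = "error"
        try:
            result = fetch_fn(*args, **kwargs)
            outcome = "ok"
            return result
        finally:
            # Runs on success and on failure, so refused or failed reads are logged too.
            audit_log.info(json.dumps({
                "ts": datetime.now(timezone.utc).isoformat(),
                "agent_id": agent_id,
                "call": fetch_fn.__name__,
                "args": [str(a) for a in args],
                "kwargs": {k: str(v) for k, v in kwargs.items()},
                "outcome": outcome,
                "duration_ms": round((time.perf_counter() - started) * 1000, 2),
            }))
    return wrapper


@audited_read
def get_plan(customer_id: str) -> dict:
    return {"customer_id": customer_id, "plan": "pro"}  # stand-in for a legacy query


logging.basicConfig(level=logging.INFO)
get_plan("c-42", agent_id="churn-agent")
```

Because agent_id is consumed by the audit layer, it never reaches the inner function, so it does not fragment the freshness cache key.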

Running the 30-Minute Scoreboard

One sprint planning slot. Three tiers. A go/no-go signal scoped to autonomy, not to the life of the project.

The rubric maps each gate to one of three outcomes: pass (proceed), partial (proceed with a documented mitigation), or fail (take the gate's fail action). Partial means the workaround is real but temporary: the risk is named, the mitigation plan is written down, and a named owner is accountable for it.

You do not need every Tier 3 gate to start building. You do need every Critical Tier 1 gate to pass before any agent code is written, and every Critical Tier 2 gate to pass before real users touch the agent. Tier 3 is the bar for unsupervised production. Most teams reach Tier 2 in the first sprint and Tier 3 over the next two.

Tier	Gate	Weight	Pass Condition	Fail Action
1 — Foundation	Schema contract	Critical	All expected columns documented and verified against production	Block build
1 — Foundation	Freshness SLA defined	Critical	Max acceptable data age per table specified before build starts	Block build
1 — Foundation	Agent-accessible without human	High	No manual export or copy step in the access path	Scope down or fix
1 — Foundation	Null rates acceptable	Medium	Critical columns under defined null threshold in production	Document risk
2 — Workflow	Data owner on call	Critical	Named person responds to incidents inside four hours	Block autonomous use
2 — Workflow	Quality rules enforced upstream	High	Checks fire before records reach agent-accessible tables	Add quality gate
2 — Workflow	Incident response runbook	High	Runbook covers agent behavior when data quality degrades	Write before shipping
2 — Workflow	Access permissions audited	Critical	Service account is read-only on minimal required scope	Fix before deploy
3 — Autonomous	Lineage tracked	Critical	Every acted-on record has queryable provenance	Scoped deploy only
3 — Autonomous	Audit trail enabled	Critical	All reads and writes logged with agent identity context	Block full autonomy
3 — Autonomous	Cost monitoring live	High	Per-agent data cost visible in dashboard before scale	Add before scaling
3 — Autonomous	Data drift alerts	Medium	Alerts configured for input distribution shifts	Monitor manually first

Three-Tier Data Readiness Gate Flow

A fail at any Critical-weight gate blocks the downstream tier. Partial gates allow progression with a documented mitigation and a named owner.
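The rubric and flow reduce to a few lines of logic: walk the tiers in order, stop at the first tier with a failed Critical gate, and report the highest tier cleared plus any partial or failed gates still owed a mitigation. This sketch assumes gate results are recorded as "pass", "partial", or "fail", and uses a subset of the gates from the table. The autonomy labels paraphrase the tier descriptions and are otherwise illustrative.

```python
from dataclasses import dataclass

AUTONOMY = {
    0: "do not build yet",
    1: "build and iterate internally",
    2: "ship to users with a human in the loop",
    3: "unsupervised production",
}


@dataclass
class Gate:
    tier: int
    name: str
    weight: str  # "Critical", "High", or "Medium"
    result: str  # "pass", "partial", or "fail"


def score(gates: list[Gate]) -> dict:
    """Highest tier cleared, treating a failed Critical gate as a hard stop."""
    cleared = 0
    for tier in (1, 2, 3):
        tier_gates = [g for g in gates if g.tier == tier]
        if not tier_gates or any(g.weight == "Critical" and g.result == "fail" for g in tier_gates):
            break  # downstream tiers are blocked, whatever their own results
        cleared = tier
    follow_ups = [f"{g.name} ({g.result})" for g in gates
                  if g.tier <= cleared and g.result != "pass"]
    return {"tier_cleared": cleared, "autonomy": AUTONOMY[cleared], "follow_ups": follow_ups}


gates = [
    Gate(1, "Schema contract", "Critical", "pass"),
    Gate(1, "Freshness SLA defined", "Critical", "pass"),
    Gate(1, "Null rates acceptable", "Medium", "partial"),
    Gate(2, "Data owner on call", "Critical", "pass"),
    Gate(2, "Access permissions audited", "Critical", "pass"),
    Gate(3, "Lineage tracked", "Critical", "fail"),
]
result = score(gates)
print(result["tier_cleared"], result["follow_ups"])  # 2 ['Null rates acceptable (partial)']
```

The failed lineage gate scopes the agent to supervised use rather than killing the project, which is the behavior the next section argues for.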

A Failed Gate Is Information, Not a Stop Sign

Gates scope autonomy. They do not blanket-block projects.

The right response to a Tier 1 failure is not to push the agent into production and hope. It is also not necessarily to halt the project. Gate failures define the agent's valid operating scope.

Fail a freshness SLA gate? The agent runs historical analysis tasks but not real-time decisions. That is a scoped deploy, not a dead project. Fail a lineage gate? Run the agent in supervised mode with human review on every batch — autonomy is something the data has to earn. The tier structure maps directly to the autonomy level the data can safely support.

One finding that surprises teams: those who run this gate and fail two or three checks consistently ship faster than teams who skip it. Discovering a freshness SLA problem in sprint planning costs two days. Discovering it in week six, after building decision logic on top of stale data assumptions, costs weeks of rework plus the credibility hit of a failed demo. Failing early is the better outcome. Always.

Do I need every tier passing before I write a line of agent code?

No. Tier 1 has to pass, because it establishes whether the data is even readable. Tier 2 has to pass before real users touch the agent. Tier 3 is the bar for unsupervised production. Build and iterate against Tier 1 while the Tier 2 and Tier 3 work happens in parallel. The mistake is skipping the gate entirely; running it in stages is fine.

What if I do not own or control the data?

Most Foundation gates are checkable with read-only access: run the schema and freshness queries yourself. If there is no identifiable data owner, that is a Tier 2 failure; escalate before building rather than papering over it. For undocumented legacy systems, the wrapper pattern adds a contract layer the source never sees. You define the contract you need, the wrapper enforces it, and the legacy system stays untouched.

What if nobody knows how stale our legacy data actually is?

Measure it before you write the SLA. The Tier 1 freshness check only reports the current age, so either run it on a schedule for a few weeks or pull 30 days of update timestamps and find the largest gap between consecutive updates. That gives you the observed maximum age. Add 20% margin and use that as your baseline. Then decide whether your agent's specific decisions are safe at that staleness: an agent recommending products tolerates more lag than one processing refunds. The number comes from the data, not from a meeting.
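A sketch of the historical measurement, assuming you can pull the update (or batch load) timestamps for a table over the window. Data a reader sees keeps aging until the next update lands, so the worst case is the largest gap between consecutive updates, or the time since the latest update if that is larger. The 20% margin is the rule of thumb from the answer above. The name baseline_sla and the sample schedule are illustrative.

```python
from datetime import datetime, timedelta


def baseline_sla(update_times: list[datetime], now: datetime, margin: float = 0.20) -> timedelta:
    """Observed worst-case staleness over the window, padded by a safety margin."""
    if not update_times:
        raise ValueError("no updates in the window: freshness cannot be measured")
    times = sorted(update_times)
    # Gaps between consecutive updates, plus the open gap since the latest one.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    gaps.append(now - times[-1])
    return max(gaps) * (1 + margin)


# A nightly batch that usually lands at 02:00 but slipped to 08:00 once.
start = datetime(2026, 3, 1, 2, 0)
loads = [start + timedelta(days=d) for d in range(30)]
loads[12] += timedelta(hours=6)
print(baseline_sla(loads, now=datetime(2026, 3, 31, 1, 0)))  # 1 day, 12:00:00
```

A result measured in days rather than minutes is itself the answer: that table can support batch or historical tasks, not real-time decisions.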

Is this gate sufficient for HIPAA, SOC 2, or PCI?

No, and it is not designed to be. This gate produces a reliable build decision, not a compliance posture. Regulated environments need additional controls: data classification tags, encryption in transit and at rest, access log retention policies, breach notification paths. Treat this as the engineering foundation you layer compliance controls on top of — not a substitute.


Sources

[1] Deepak Seth, Roxane Edjlali. "Use This Checklist to Ensure Your Data Is Ready for the Agentic AI Era." gartner.com.
[2] Digital Applied. "Why 88% of AI Agents Fail Production: Analysis Guide." digitalapplied.com.
[3] AI Agent Corps. "Why Most Agentic AI Projects Fail (And How to Succeed in 2026)." agentcorps.co.
[4] According to Plan. "The Anatomy of an Enterprise AI Agent Failure." according-to-plan.com.
[5] Matteo Gazzurelli. "Why 40% of Agentic AI Projects Fail (And How to Avoid It)." building.theatlantic.com.
