Seven patterns for moving DB2, IMS, and VSAM data into RAG: nightly EBCDIC export, CDC, federation, event sourcing, dual-write, schema-on-read, and RAG over the COBOL itself. Pick by freshness budget, not preference.
Per Precisely, roughly 72% of global transactional workloads still execute on the mainframe.[1]
Hundreds of billions of operations per year across banking, insurance, and retail — the data plane your RAG system needs to read[2]
The nightly batch export is the most common extraction pipeline. It is also the reason RAG built on top of it answers yesterday's question with today's confidence.
Citizens Bank: 30% IT cost reduction, 50% faster data processing, and 99.99% of changes reflected in the cloud within four seconds, using Precisely CDC with Confluent.[9]
Seven concrete patterns for moving DB2, IMS, and VSAM data into a RAG pipeline — each with freshness, MIPS cost, and failure mode
The two-phase EBCDIC conversion order that prevents silent numeric corruption (with working Python)
When CDC beats federation, when federation beats CDC, and why most teams end up running both
The batch-window blind spot that breaks event sourcing architectures without a CDC fallback
A domain-by-domain freshness budget exercise you can run Monday morning
IBM z17 and Spyre: what on-platform inference means for the RAG architecture decision
Pre-production checklist: 12 gates before the first retrieval query goes live
Every mainframe-to-RAG project starts from the same architecture diagram. A box labeled "enterprise data" with an arrow pointing at a box labeled "vector database." Nobody asked what was inside the first box. The AI strategy assumed Postgres or Snowflake. The reality is DB2 on z/OS, IMS hierarchical segments that require COBOL procedural navigation to read, and VSAM files in EBCDIC — a character encoding that predates ASCII's widespread adoption and that has burned every team that assumed a file export would just work.[1]
The bridge between those two realities is the hard problem. Not the embedding model. Not the chunking strategy. The upstream pipeline, its freshness characteristics, and whether the records it hands your vector store are recent enough to be trusted. Everything downstream is decoration on a bad foundation.
This catalog covers the seven patterns operators actually run. Each trades freshness, blast radius, and MIPS cost differently. Most enterprises end up running two or three at once across different data domains. Not from indecision. From the fact that a single pattern applied to a heterogeneous mainframe estate optimizes for the wrong domain every time.
Three structural forces collapse mainframe-to-RAG projects that nailed everything north of the data layer.
The difficulty is structural. Three forces, and none of them yield to a better model.
The formats are foreign. EBCDIC is not ASCII with a swapped lookup table. It carries different control characters, a different sort order, and special-character handling that quietly corrupts data the moment a team treats the conversion as a one-byte-for-one-byte swap. Packed decimal (COMP-3) stores two digits per byte, with the low nibble of the last byte reserved for the sign. Zoned decimal stores one digit per byte in the low nibble and overpunches the sign into the zone (high) nibble of the last byte. There are also regional EBCDIC variants, among them EBCDIC-037 (US), EBCDIC-500 (international), and EBCDIC-1047 (Open Systems), and they map characters such as [ and ] to different byte values. Convert EBCDIC to UTF-8 before unpacking the numerics and the numbers are wrong. Unpack with the wrong copybook and the numbers are plausible nonsense.[8] Neither failure announces itself until a retrieval result is off in a way nobody can attribute back to the byte stream.
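A minimal Python sketch of the two numeric layouts just described, decoded straight from the raw bytes. The function names are illustrative rather than any library's API. The sketch assumes the common sign nibbles (C and F positive, D negative) and a trailing overpunched sign on zoned fields; SIGN LEADING and SIGN SEPARATE variants are not handled.

```python
from decimal import Decimal

VALID_SIGNS = {0xC, 0xD, 0xF}  # assumed convention: C and F positive, D negative


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Packed decimal (COMP-3): two digits per byte, sign in the final low nibble.

    `scale` is the number of implied decimal places (the V in PIC S9(7)V99).
    """
    nibbles = [n for b in raw for n in (b >> 4, b & 0x0F)]
    if not nibbles:
        raise ValueError("empty packed-decimal field")
    sign = nibbles.pop()  # the last nibble is the sign, not a digit
    if sign not in VALID_SIGNS or any(d > 9 for d in nibbles):
        raise ValueError(f"not valid packed decimal: {raw.hex()}")
    value = Decimal(int("".join(map(str, nibbles)))).scaleb(-scale)
    return -value if sign == 0xD else value


def unpack_zoned(raw: bytes, scale: int = 0) -> Decimal:
    """Zoned decimal (DISPLAY): one digit per byte in the low nibble, zone 0xF,
    with the sign overpunched into the zone nibble of the last byte."""
    if not raw:
        raise ValueError("empty zoned-decimal field")
    digits = [b & 0x0F for b in raw]
    zones = [b >> 4 for b in raw]
    if (any(d > 9 for d in digits) or any(z != 0xF for z in zones[:-1])
            or zones[-1] not in VALID_SIGNS):
        raise ValueError(f"not valid zoned decimal: {raw.hex()}")
    value = Decimal(int("".join(map(str, digits)))).scaleb(-scale)
    return -value if zones[-1] == 0xD else value


packed = bytes([0x00, 0x12, 0x34, 0x56, 0x7C])  # +12345.67 as PIC S9(7)V99 COMP-3
print(unpack_comp3(packed, scale=2))            # 12345.67
print(unpack_zoned(bytes([0xF1, 0xF2, 0xD3])))  # -123 as PIC S9(3)
# The same packed bytes decoded as cp037 text come out as non-digit characters.
print(repr(packed.decode("cp037")))
```

The last line is the failure the paragraph warns about: a text decoder happily accepts packed bytes and returns characters, so nothing errors and nothing is right.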
The latency gap is enormous and it lands inside the answer. A nightly extract is T+24h stale by the time embeddings reach the vector store. Account balance, open claim, current inventory — for any domain that moves intraday, T+24h is not a delay. It is a confidently wrong answer that erodes trust faster than having no RAG system at all. Freshness is not a technology preference. It is a property of the data domain.
The people who knew the schema have retired. COBOL copybooks — the byte-level layouts describing what each field of a VSAM record means — are routinely undocumented, inconsistently maintained, or missing entirely for files modified across decades. The data engineer assigned to build the pipeline inherits a system where the only authoritative schema definition is a COBOL program written in 1992, last touched by someone who left in 2009.[7] That is not a documentation gap. That is the default state.
Each one trades freshness, blast radius, and MIPS cost differently. None is universally correct.
The taxonomy below runs from a nightly batch export to a real-time event stream. The only question that matters: what freshness does this data domain require, and what does meeting that freshness cost in MIPS, latency, and on-call load?
Most enterprises run two or three of these in parallel. Reference data — product codes, regulatory tables, org hierarchies — tolerates nightly refresh and belongs in Pattern 1. Customer account state feeding fraud detection cannot live there. The architecture should encode that asymmetry, not paper over it.
One scar from running these projects: teams that wire up CDC before mapping freshness requirements always regret it. A pilot we sat through configured CDC for 40 DB2 tables before the business owners had reviewed the staleness windows. 35 of the 40 turned out to be reference data with a T+24h budget. Three months of IIDR licensing and MIPS overhead spent on a nightly-refresh problem. Map the freshness requirements first. Pattern selection is downstream of that map.
| Pattern | Freshness | Operational Load | MIPS Cost | When NOT to use it |
|---|---|---|---|---|
| 1. Nightly batch export | T+24h | Low | Low | Any domain where a stale answer has material business consequence: balances, open claims, inventory |
| 2. Change data capture (CDC) | T+1–5 min | High | Medium–High (budget 3–8%) | Reference data with a T+24h budget; CDC overhead is not justified by the freshness gain |
| 3. Federated query | Real-time query | Medium | High per query | High-frequency retrieval paths; MIPS cost at volume is a budget event, not a line item |
| 4. Event sourcing | Sub-minute | High | Medium | Any domain where batch jobs write directly to DB2 or VSAM; those writes are invisible to the event layer |
| 5. Dual-write | Real-time on write path | Very High | High (double) | Steady state; reconciliation cost compounds daily, so retire it after cutover or it never leaves |
| 6. Schema-on-read | T+24h (typically) | Medium | Low | Financial fields without human numeric verification; a misread COMP-3 field corrupts silently and permanently |
| 7. RAG over COBOL source | As-of last commit | Low | Low | Answering live transactional questions; this is schema discovery, not operational data retrieval |
Every mainframe team has this pipeline. A JCL job fires at 02:00 and exports DB2 tables or VSAM files to flat files. The files land via FTP or MFT, and an ETL job picks them up, unpacks the COMP-3 fields, decodes the character fields from EBCDIC to UTF-8, and loads the result into a warehouse or object store. It is honest, cheap, and well-understood.
Where it earns its keep: reference data that moves slowly — product catalogs, regulatory code tables, org hierarchies, postal mappings. Compliance snapshots. Anything where T+24h is acceptable and the operational tax of a real-time pipeline is not justified by the question being asked.
The failure mode is reuse. Account balances, open claims, current inventory, in-flight orders — they change during the business day. A RAG system answering questions about them off last night's snapshot produces confidently wrong answers, with detail. Surface the export timestamp in every retrieval result. If you do not, users will trust the wrong answer longer than they should — and that trust does not come back.
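One way to make the export timestamp impossible to drop is to attach it, together with a staleness verdict, to every hit before it reaches the prompt. The sketch below assumes each indexed chunk carries a hypothetical `exported_at` metadata field written by the nightly load, and that each domain has an agreed staleness budget; `RetrievedChunk` and `with_freshness` are illustrative names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RetrievedChunk:
    text: str
    source: str            # e.g. the DB2 table or VSAM dataset the record came from
    exported_at: datetime  # hypothetical metadata: when the nightly extract ran


def with_freshness(chunk: RetrievedChunk, budget: timedelta, now: datetime) -> dict:
    """Attach the export timestamp and a staleness verdict to one retrieval hit."""
    age = now - chunk.exported_at
    return {
        "text": chunk.text,
        "source": chunk.source,
        "as_of": chunk.exported_at.isoformat(),
        "age_hours": round(age.total_seconds() / 3600, 1),
        "stale": age > budget,
    }


now = datetime(2025, 3, 4, 15, 0, tzinfo=timezone.utc)
hit = RetrievedChunk("Current balance: 1,204.50", "CUSTMAST",
                     exported_at=datetime(2025, 3, 4, 2, 0, tzinfo=timezone.utc))
tagged = with_freshness(hit, budget=timedelta(hours=1), now=now)
print(tagged)
```

Here a balance exported at 02:00 and retrieved at 15:00 comes back 13 hours old and flagged stale against a one-hour budget, so the prompt template can tell the user the figure is as of last night instead of presenting it as current.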
The EBCDIC step gets less attention than it deserves. The conversion is two phases, not one, and they must run in order: extract and unpack the packed-decimal and binary fields from the raw bytes first, then decode the character fields. Reverse the order and the numbers corrupt silently.[8] The open-source Cobrix library for Apache Spark handles copybook-driven extraction reasonably well for VSAM files with known layouts.[7] For DB2, the UNLOAD utility (run through DSNUTILB) can produce delimited ASCII output directly, which sidesteps most of the encoding surface. Verify numeric field ranges against known-good control records before trusting the pipeline at production volume. Once corrupted embeddings are in the vector store, you do not patch them; you reindex.
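The two-phase order can be sketched in Python for a single fixed-length record. Everything about the layout is hypothetical: three fields, a 21-byte record, and the cp037 code page. A real pipeline would generate the layout from the actual copybook (or let Cobrix do it) and would also handle binary, zoned, OCCURS, and REDEFINES fields, which this sketch omits. Python's standard library ships cp037 and cp500 codecs but not cp1047, so files in that variant need a separate codec.

```python
from decimal import Decimal

# Hypothetical layout for a copybook such as:
#   05 CUST-ID    PIC X(6).
#   05 CUST-NAME  PIC X(10).
#   05 ACCT-BAL   PIC S9(7)V99 COMP-3.
# (name, byte offset, byte length, kind, implied decimal places)
LAYOUT = [
    ("CUST-ID", 0, 6, "char", 0),
    ("CUST-NAME", 6, 10, "char", 0),
    ("ACCT-BAL", 16, 5, "comp3", 2),
]
LRECL = 21  # fixed record length in bytes


def unpack_comp3(raw: bytes, scale: int) -> Decimal:
    nibbles = [n for b in raw for n in (b >> 4, b & 0x0F)]
    sign = nibbles.pop()  # last nibble holds the sign
    if sign not in (0xC, 0xD, 0xF) or any(d > 9 for d in nibbles):
        raise ValueError(f"not packed decimal: {raw.hex()}")
    value = Decimal(int("".join(map(str, nibbles)))).scaleb(-scale)
    return -value if sign == 0xD else value


def decode_record(record: bytes, codepage: str = "cp037") -> dict:
    if len(record) != LRECL:
        raise ValueError(f"expected {LRECL} bytes, got {len(record)}")
    out = {}
    # Phase 1: slice and unpack numeric fields from the untouched EBCDIC bytes.
    for name, off, length, kind, scale in LAYOUT:
        if kind == "comp3":
            out[name] = unpack_comp3(record[off:off + length], scale)
    # Phase 2: only now decode the character fields with the file's code page.
    for name, off, length, kind, _ in LAYOUT:
        if kind == "char":
            out[name] = record[off:off + length].decode(codepage).rstrip()
    return out


def verify_control(decoded: dict, expected: dict) -> list:
    """Fields that differ from a known-good control record; empty means pass."""
    return [k for k, v in expected.items() if decoded.get(k) != v]


control = ("C00042".encode("cp037") + "ACME CORP ".encode("cp037")
           + bytes([0x00, 0x12, 0x34, 0x56, 0x7C]))
decoded = decode_record(control)
print(decoded)  # {'ACCT-BAL': Decimal('12345.67'), 'CUST-ID': 'C00042', 'CUST-NAME': 'ACME CORP'}
print(verify_control(decoded, {"ACCT-BAL": Decimal("12345.67"), "CUST-ID": "C00042"}))  # []
```

Running `verify_control` over a handful of records whose true values the business already knows is the cheapest form of the control-record check: a wrong scale, a wrong code page, or a shifted offset shows up as a non-empty list before anything is embedded.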
CDC is the production-grade option when the domain demands near-real-time freshness. Instead of taking snapshots, CDC reads the database recovery log (the DB2 active and archive logs, the IMS OLDS, a VSAM journal) and emits every committed change as a structured event. Inserts, updates, and deletes flow downstream with 1–5 minute latency under normal load.
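On the consuming side, a CDC feed reduces to an ordered stream of keyed inserts, updates, and deletes. The sketch below assumes a hypothetical event shape (real IIDR, Qlik, and Precisely payloads each differ) with a monotonically increasing `log_pos` standing in for the source log position, such as a DB2 RBA or LRSN. Skipping already-applied positions makes redelivery after a consumer restart harmless, and the `dirty` set records which keys need their embeddings rebuilt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    op: str                # "I" insert, "U" update, "D" delete
    key: str               # primary key of the source row
    after: Optional[dict]  # row image after the change; None for deletes
    log_pos: int           # source log position, increasing in commit order


class ReplicaTable:
    """Keyed replica of one source table, maintained from a CDC stream."""

    def __init__(self) -> None:
        self.rows: dict = {}
        self.applied_pos = -1
        self.dirty: set = set()  # keys whose embeddings must be rebuilt

    def apply(self, ev: ChangeEvent) -> bool:
        if ev.log_pos <= self.applied_pos:
            return False  # already applied, e.g. redelivered after a restart
        if ev.op in ("I", "U"):
            self.rows[ev.key] = ev.after
        elif ev.op == "D":
            self.rows.pop(ev.key, None)
        else:
            raise ValueError(f"unknown operation {ev.op!r}")
        self.applied_pos = ev.log_pos
        self.dirty.add(ev.key)
        return True


table = ReplicaTable()
events = [
    ChangeEvent("I", "ACC1", {"bal": "10.00"}, 101),
    ChangeEvent("U", "ACC1", {"bal": "25.00"}, 102),
    ChangeEvent("U", "ACC1", {"bal": "25.00"}, 102),  # duplicate delivery
    ChangeEvent("D", "ACC1", None, 103),
]
applied = [table.apply(e) for e in events]
print(applied, table.rows, table.dirty)
```

A single high-water mark only works when one consumer owns one ordered stream; a partitioned topic would need a position per partition.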
The tools that ship. IBM InfoSphere Data Replication (IIDR) — rebranded IBM Data Replication — is the incumbent for DB2 z/OS CDC, with native IMS and VSAM source support.[3] Qlik Replicate runs a zero-footprint agent that avoids installing software on the mainframe itself, which matters enormously to mainframe operations teams who treat the MIPS budget as a controlled substance.[4] Precisely Connect is the third major option, historically stronger for VSAM and sequential file CDC where IIDR has been weaker — and it has a public production reference: Citizens Bank combined Precisely CDC with Confluent's Kafka platform to stream mainframe changes to the cloud in under four seconds (99.99% SLA), cutting IT costs 30% and data processing time 50%, saving roughly $1M annually.[9]
What CDC captures, and where it goes silent. For DB2 z/OS, archive-log CDC is well-understood and reliable for row-level changes. For IMS, CDC captures segment inserts, updates, and deletes, but reconstructing the hierarchical relationships between parent and child segments requires logic the CDC tool has to get right, and not every tool does for complex PCB structures. For VSAM, journal-based CDC depends on the file being opened for output with journaling enabled, which is not the default. Many VSAM files are updated by batch jobs that allocate them with DISP=OLD for exclusive access and write to them without journaling. Those writes are invisible to the CDC agent. Invisible failure modes are the worst kind.
MIPS overhead is real, and the vendor will tell you it is not. Every CDC tool claims minimal mainframe impact. Every mainframe operations team disagrees. Budget 3–8% MIPS overhead as the working assumption and negotiate it with capacity planning before launch, not after the first invoice cycle. Deploying the Kafka broker on Linux on Z via IBM Event Streams, co-located on the same system as the DB2 source, reduces cross-network latency and can cut the MIPS overhead of encryption by using HiperSockets memory-speed communication instead of encrypted TCP over the external network.[5]
Federated query virtualization — Denodo, Trino, Starburst — offers a clean promise. Expose mainframe data as SQL without copying it anywhere. The RAG retriever issues a query. The federation layer pushes it to the DB2 subsystem. The result comes back. No replication lag, no schema sync, no embedding rebuild.
Where it earns its keep. Ad-hoc reads on reference data with no sub-second latency requirement. Data-catalog flows where a human analyst spot-checks mainframe records. Reporting paths where query frequency is low and operations has pre-negotiated query windows with the DBAs. IBM watsonx.data, a Presto/Iceberg lakehouse that pairs with IBM Data Gate, offers this capability with better governance and lineage tooling than standalone Trino for teams already inside the IBM ecosystem. Queries that federate through to DB2 for z/OS still execute on the mainframe, so the MIPS cost still applies.
The trap is throughput. Federation does not eliminate compute cost. It moves it onto the mainframe. Every federated query runs on z/OS and consumes MIPS. At RAG retrieval volume, this is not a line item — it is a budget event. A retriever issuing 500 federated queries per minute lights up the MIPS invoice and your mainframe operations team will escalate to your VP before you have a chance to explain the architecture. A simple indexed DB2 lookup over federation costs 50–200ms under normal conditions; at scale that latency compounds before the MIPS bill does.
The honest framing: federation is not a replacement for replication. It is a complement for low-frequency authoritative reads where the latency of a live mainframe query fits inside the retrieval SLA. For high-frequency retrieval paths, replicate the hot data into a modern store and reserve federation for the records that demand a live read or sit behind access controls you cannot mirror.
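A minimal sketch of that split: serve from the replicated hot store first, and spend a live federated read only when the record is missing, under an explicit per-minute cap so MIPS consumption stays bounded. The `federated_lookup` callable, the token-bucket cap, and the return labels are all hypothetical; the cap itself would come from the capacity-planning negotiation, not from code.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class FederationBudget:
    """Token bucket that caps live federated (MIPS-consuming) queries per minute."""

    def __init__(self, per_minute: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = float(per_minute)
        self.tokens = float(per_minute)
        self.refill_per_sec = per_minute / 60.0
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def fetch_record(key: str, replica: Dict[str, dict],
                 federated_lookup: Callable[[str], Optional[dict]],
                 budget: FederationBudget) -> Tuple[Optional[dict], str]:
    """Serve from the replicated hot store; fall back to a live read within budget."""
    if key in replica:
        return replica[key], "replica"
    if budget.try_acquire():
        return federated_lookup(key), "federation"
    return None, "budget_exhausted"


replica = {"PRD001": {"desc": "Checking account"}}
budget = FederationBudget(per_minute=120)
record, served_by = fetch_record("PRD999", replica, lambda k: {"desc": "live read"}, budget)
print(record, served_by)
```

Returning the `served_by` label alongside the record also gives the MIPS observability the federation path otherwise lacks: counting "federation" and "budget_exhausted" per minute shows how much of the retrieval load is actually landing on z/OS.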
CDC captures changes at the database layer. Event sourcing captures them at the application layer: CICS transactions, IMS applications, and z/OS Connect services emitting business events to a message bus. IBM Event Streams runs Apache Kafka on Linux on IBM Z, which lets you co-locate the cluster with the mainframe, cut cross-LPAR network hops, and deliver events from application transaction to downstream consumer in under a minute.[5]
Why events beat CDC where the question is semantic. A CDC event tells you which DB2 columns changed and to what value. An application-emitted business event tells you why — claim denied because of fraud review, payment reversed on an NSF condition, account flagged by a risk rule. That context does not exist in the database log. For RAG systems answering business questions, the richer event is the difference between an accurate answer and a technically correct one.
The trap is the batch window. Enterprise mainframe estates run overnight batches where COBOL jobs make large-scale updates to DB2 tables and VSAM files through direct file access, bypassing the online transaction layer entirely. Application event sourcing sees none of this. An event-sourced pipeline that covers 100% of CICS transactions may still miss 40% of data changes because the batch window is where the bulk happens. Event sourcing without a CDC fallback for batch coverage is not a complete architecture. It is a half-architecture that produces convincing answers about the half of the world it can see.
The practical hybrid: use IBM Event Streams to capture CICS and IMS transaction semantics in real time, and run IIDR or Precisely CDC as the batch-window fallback for direct DB2 and VSAM writes. The consumer side merges both streams into a unified change log. The complexity cost is real, but it is the cost of a complete picture, not a decision to accept a partial one.
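The merge step might look like the sketch below. It assumes, hypothetically, that both streams carry a shared transaction identifier (for example a CICS unit-of-work ID propagated into the business event and recoverable from CDC metadata) and that each stream arrives sorted by commit time. CDC changes with a matching business event inherit its reason; CDC changes with no match are tagged as batch writes; business events with no CDC counterpart are surfaced as possible capture gaps.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Change:
    ts: float                     # commit timestamp
    key: str                      # business key, e.g. account number
    source: str                   # "cdc" or "app"
    txn_id: Optional[str] = None  # shared correlation id (assumed present in both streams)
    reason: Optional[str] = None  # business reason, carried only by application events


def unify(cdc: List[Change], app: List[Change]) -> List[dict]:
    """Merge two commit-ordered streams into one change log."""
    app_by_txn = {e.txn_id: e for e in app if e.txn_id is not None}
    cdc_txns = {c.txn_id for c in cdc if c.txn_id is not None}
    log = []
    for ch in heapq.merge(cdc, app, key=lambda c: c.ts):
        if ch.source == "cdc":
            event = app_by_txn.get(ch.txn_id)
            log.append({"ts": ch.ts, "key": ch.key,
                        "origin": "online" if event else "batch",
                        "reason": event.reason if event else None})
        elif ch.txn_id not in cdc_txns:
            # Business event whose data change never showed up in CDC.
            log.append({"ts": ch.ts, "key": ch.key, "origin": "app_only", "reason": ch.reason})
    return log


cdc = [Change(1.0, "ACC1", "cdc", "T1"), Change(5.0, "ACC2", "cdc", "B9")]
app = [Change(0.9, "ACC1", "app", "T1", "NSF reversal"),
       Change(3.0, "ACC3", "app", "T7", "fraud hold")]
for entry in unify(cdc, app):
    print(entry)
```

In this example ACC1 gets its business reason attached, ACC2 comes through as a batch write with no reason, and ACC3 is flagged as an application event whose data change CDC never saw, which is exactly the kind of divergence a reconciliation job should chase.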
Dual-write belongs to migrations, not steady state. During cutover — while the application is being moved from mainframe storage to a modern store — every write fires to both systems. The RAG pipeline reads from the new system, which is being continuously populated. The mainframe stays authoritative until it does not.
The write-path interception itself is mechanical: dual-issue inside a distributed transaction or with compensating rollback, log divergence on every commit. The reconciliation job is where dual-write actually lives or dies. Every dual-write architecture requires a parallel verification process that continuously compares records across systems and surfaces discrepancies. That job is typically 3–5x the implementation effort of the dual-write itself, and it runs for the entire cutover — which is always longer than anyone planned for.
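A minimal sketch of the comparison at the heart of that reconciliation job, assuming both sides can be read as keyed snapshots (key to record). It hashes a canonical JSON form of each record and reports missing, extra, and mismatched keys plus an overall divergence rate. At production volume you would compare per-partition hashes or rolling samples rather than holding both snapshots in memory; the function names are illustrative.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Stable content hash of a record, independent of field order."""
    canonical = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(source: dict, target: dict) -> dict:
    """Compare two keyed snapshots (key -> record) and summarise divergence."""
    missing = sorted(source.keys() - target.keys())
    extra = sorted(target.keys() - source.keys())
    mismatched = sorted(
        k for k in source.keys() & target.keys()
        if fingerprint(source[k]) != fingerprint(target[k])
    )
    total = len(source.keys() | target.keys())
    diverged = len(missing) + len(extra) + len(mismatched)
    return {
        "missing_in_target": missing,
        "extra_in_target": extra,
        "mismatched": mismatched,
        "divergence_rate": diverged / total if total else 0.0,
    }


mainframe = {"ACC1": {"bal": "10.00"}, "ACC2": {"bal": "5.00"}, "ACC3": {"bal": "1.00"}}
modern = {"ACC1": {"bal": "10.00"}, "ACC2": {"bal": "5.01"}, "ACC4": {"bal": "0.00"}}
report = reconcile(mainframe, modern)
print(report)
```

Alerting on the lists rather than only the rate matters: a single mismatched balance is a tiny fraction of the estate and still a customer escalation.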
Dual-write fits when the migration is bidirectional and legacy consumers still need to read from the mainframe while the AI pipeline reads from the modern store. It is not a long-term steady state. The reconciliation overhead, the dual MIPS consumption, and the operational complexity of keeping two systems consistent are costs that compound daily — and the second-order cost of forgetting to retire dual-write after cutover is permanent.
Sometimes the copybook is missing. Sometimes it exists but was last updated in 2003 and no longer matches the file. Sometimes the file has been modified by seven COBOL programs across fifteen years and nobody can tell you what byte 47 currently means. The schema is not gone — it has drifted past anyone's ability to vouch for it.
The only defensible move is to export the raw bytes and apply schema-on-read at query time. Heuristic field detection, LLM-assisted structure inference, human-verified sample records — assembled into a probabilistic schema that is good enough for embedding and explicit about its uncertainty.
This is risky, and sometimes it is the only path forward. The mechanism: export a 10,000-record sample, run it through a COBOL layout inference tool (or prompt an LLM with byte-frequency distributions and known field constraints), generate a candidate copybook, validate it against a known-good control set, and carry a schema-confidence metadata field on every embedded record. Retrieval results derived from schema-on-read data must surface their confidence to the user. Hide it and you have built a confident liar.
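The heuristic field-detection step can be illustrated in a few lines. Given the bytes of one candidate field across a sample of records, the sketch scores how often each value parses as packed decimal, zoned decimal, or printable EBCDIC text, and keeps the most specific type with the highest match rate. Field boundaries are assumed to be known already, cp037 is an assumed code page, and the match fraction is a deliberately crude stand-in for the schema-confidence value carried on each embedded record.

```python
from typing import Callable, List, Sequence, Tuple

SIGNS = (0xC, 0xD, 0xF)


def looks_packed(raw: bytes) -> bool:
    nibbles = [n for b in raw for n in (b >> 4, b & 0x0F)]
    return bool(nibbles) and nibbles[-1] in SIGNS and all(d <= 9 for d in nibbles[:-1])


def looks_zoned(raw: bytes) -> bool:
    return (bool(raw)
            and all(b >> 4 == 0xF and (b & 0x0F) <= 9 for b in raw[:-1])
            and raw[-1] >> 4 in SIGNS and (raw[-1] & 0x0F) <= 9)


def looks_text(raw: bytes, codepage: str = "cp037") -> bool:
    return raw.decode(codepage).isprintable()


# Ordered from most to least specific: a zoned field also decodes as printable digits.
CHECKS: List[Tuple[str, Callable[[bytes], bool]]] = [
    ("comp3", looks_packed), ("zoned", looks_zoned), ("char", looks_text),
]


def infer_field(samples: Sequence[bytes]) -> Tuple[str, float]:
    """Return (guessed type, fraction of samples that fit it)."""
    if not samples:
        raise ValueError("need at least one sample")
    best, confidence = "unknown", 0.0
    for name, check in CHECKS:
        score = sum(check(s) for s in samples) / len(samples)
        if score > confidence:  # strict '>' keeps the more specific type on ties
            best, confidence = name, score
    return best, confidence


sample = [bytes([0x00, 0x12, 0x5C]), bytes([0x01, 0x00, 0x0D]), "AB ".encode("cp037")]
print(infer_field(sample))  # two of the three values parse as packed decimal
```

A score below 1.0 is the signal the paragraph asks for: the guess is carried forward with its uncertainty instead of being silently promoted to fact, and any field that might be financial still goes to a human before its values are embedded.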
Never use schema-on-read for financial fields without human verification of every inferred numeric conversion. A packed decimal misread as a character string in a customer balance record is not recoverable by patching the schema later — the wrong embeddings are already in the vector store, and corrupt embeddings do not get corrected. They get reindexed.
The recovery path when schema-on-read is the only option: start with Pattern 7. Query the COBOL source to recover candidate field layouts before inferring anything statistically. The programs that write the VSAM file usually contain the field definitions in their DATA DIVISION — and querying the source is faster and more reliable than statistical inference from byte patterns.
Most teams miss this one. It is also the cheapest to ship and the fastest to repay its cost. The COBOL source — programs, copybooks, JCL, PROCs — is the authoritative documentation for what the data means and how it is laid out. Index it into a vector store. When an engineer or a downstream pipeline asks "what does field ACCT-BAL-CURRENT mean in the customer master file," the retriever returns the relevant copybook and the COBOL procedures that read or populate it.
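Chunking along the program's own structure keeps each retrieved piece meaningful. The sketch below splits fixed-format COBOL into one chunk per division, section, or paragraph and finds the chunks that mention a given data name. It ignores continuation lines, COPY expansion, and free-format source; the regular expression and the names `chunk_cobol` and `referencing` are illustrative, not a standard tool.

```python
import re

# Division, section, or paragraph header starting in Area A, e.g.
# "PROCEDURE DIVISION." / "WORKING-STORAGE SECTION." / "2000-POST-PAYMENT."
HEADER = re.compile(r"^([A-Z0-9][A-Z0-9-]*)(\s+(?:DIVISION|SECTION)\b[^.]*)?\s*\.", re.IGNORECASE)


def chunk_cobol(source: str) -> list:
    """Split fixed-format COBOL into one chunk per division, section, or paragraph."""
    chunks, name, body = [], "PROLOGUE", []
    for line in source.splitlines():
        if len(line) > 6 and line[6] in "*/":
            continue                      # column 7 '*' or '/' marks a comment line
        code = line[7:72].rstrip()        # keep columns 8-72 (Areas A and B)
        header = HEADER.match(code) if code[:4].strip() else None  # headers begin in Area A
        if header:
            if body:
                chunks.append({"name": name, "text": "\n".join(body)})
            name, body = header.group(0).rstrip(".").strip().upper(), [code]
        elif code:
            body.append(code)
    if body:
        chunks.append({"name": name, "text": "\n".join(body)})
    return chunks


def referencing(chunks: list, data_name: str) -> list:
    """Names of chunks that mention data_name as a whole COBOL word.

    COBOL names contain hyphens, so a plain regex word boundary would wrongly
    match ACCT-BAL inside ACCT-BAL-CURRENT; explicit look-arounds prevent that.
    """
    pattern = re.compile(rf"(?<![A-Z0-9-]){re.escape(data_name)}(?![A-Z0-9-])", re.IGNORECASE)
    return [c["name"] for c in chunks if pattern.search(c["text"])]


src = "\n".join([
    "000100 IDENTIFICATION DIVISION.",
    "000200 PROGRAM-ID. CUSTUPD.",
    "000300 DATA DIVISION.",
    "000400 WORKING-STORAGE SECTION.",
    "000500 01  WS-REC.",
    "000600     05  ACCT-BAL-CURRENT PIC S9(7)V99 COMP-3.",
    "000700*    OLD FIELD ACCT-BAL REMOVED",
    "000800 PROCEDURE DIVISION.",
    "000900 2000-POST-PAYMENT.",
    "001000     ADD WS-PAYMENT TO ACCT-BAL-CURRENT.",
    "001100     WRITE CUST-REC.",
])
chunks = chunk_cobol(src)
print(referencing(chunks, "ACCT-BAL-CURRENT"))
```

Each chunk's `name` makes a natural metadata field next to its embedding, and the whole-word lookup doubles as an exact-match pre-filter for questions about a specific field.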
This is the leverage point for the Pattern 6 scenario. When the copybook is missing, querying the COBOL source recovers the layout from the WRITE statements, the MOVE statements, and the FD definitions scattered across multiple programs. It does not replace the engineer who knew the system in 1998. It surfaces the relevant code in seconds instead of hours of grep across a 40-year-old repository — and that compresses the recovery loop from days to minutes.[7]
Build this first, before any data pipeline work begins. The investment is near-zero: chunk the COBOL source, embed it, load it into the same vector store you are building for data retrieval. The return is immediate: every Pattern 1–6 implementation decision benefits from having the source as a queryable artifact. Developers building the EBCDIC transformer can query the copybook for exact field offsets. Schema-on-read teams can cross-reference their inferred layouts against the WRITE statements that produced the file. Architects can query for all programs that touch a given VSAM dataset to understand the write paths before designing the CDC coverage.
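Recovering exact field offsets from a copybook is mostly arithmetic over PIC and USAGE clauses. A sketch for the simple case of consecutive elementary items: DISPLAY fields take one byte per character or digit, COMP-3 takes (digits + 1) / 2 bytes rounded up, and binary COMP fields take 2, 4, or 8 bytes depending on digit count. OCCURS, REDEFINES, SYNC alignment, SIGN SEPARATE, and edited pictures are deliberately out of scope, and the field list in the example is hypothetical.

```python
import re
from typing import List, Tuple

PIC_ITEM = re.compile(r"([XA9])(?:\((\d+)\))?|[SV]", re.IGNORECASE)
SIMPLE_PIC = re.compile(r"(?:[XA9](?:\(\d+\))?|[SV])+", re.IGNORECASE)


def storage_bytes(pic: str, usage: str = "DISPLAY") -> int:
    """Bytes occupied by one elementary item with the given PIC and USAGE."""
    if not SIMPLE_PIC.fullmatch(pic):
        raise ValueError(f"unsupported PIC {pic!r}")
    # Count character/digit positions; S (sign) and V (implied point) take no storage.
    n = sum(int(m.group(2) or 1) for m in PIC_ITEM.finditer(pic) if m.group(1))
    usage = usage.upper()
    if usage == "DISPLAY":
        return n
    if usage in ("COMP-3", "PACKED-DECIMAL"):
        return n // 2 + 1  # n digits plus one sign nibble, rounded up to whole bytes
    if usage in ("COMP", "COMP-4", "BINARY"):
        for max_digits, size in ((4, 2), (9, 4), (18, 8)):
            if n <= max_digits:
                return size
        raise ValueError(f"too many digits for a binary field: {pic}")
    raise ValueError(f"unsupported USAGE {usage}")


def layout(fields: List[Tuple[str, str, str]]) -> List[Tuple[str, int, int]]:
    """(name, PIC, USAGE) for consecutive elementary items -> (name, offset, length)."""
    result, offset = [], 0
    for name, pic, usage in fields:
        length = storage_bytes(pic, usage)
        result.append((name, offset, length))
        offset += length
    return result


print(layout([
    ("CUST-ID", "X(6)", "DISPLAY"),
    ("CUST-NAME", "X(10)", "DISPLAY"),
    ("ACCT-BAL", "S9(7)V99", "COMP-3"),
    ("TXN-COUNT", "S9(4)", "COMP"),
]))
```

The output is exactly the offset table an EBCDIC transformer needs; checking it against the record length in the JCL (LRECL) is a quick sanity test that the copybook still matches the file.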
IBM announced the z17 in April 2025 with the Telum II processor — a 5.5 GHz chip with a built-in AI accelerator capable of more than 450 billion inferencing operations per day at one millisecond latency.[10] The optional Spyre Accelerator, available from October 2025 as a PCIe card, adds 32 AI-optimized cores per card with RDMA-based multi-card scaling at 64 GB/s — a fully loaded z17 can take 192 Spyre cards, putting 6.1 TB of accelerator memory inside the mainframe's security boundary.[11]
The architecture question this creates: if the AI model runs on z/OS, does the RAG pipeline still need an external vector store?
Not quite, but it changes the cost calculus. On-platform inference eliminates the data-exfiltration latency and keeps regulated data inside the mainframe's compliance perimeter. IBM's Spyre supports retrieval-augmented generation from relational databases on the same system — which means DB2 data can feed an embedding and a retrieval step without touching the network. For regulated industries where data residency is not a preference but a legal requirement, this is materially different from any hybrid architecture.
What it does not change: the EBCDIC conversion problem, the batch-window blind spot, the schema-drift challenge, and the freshness budget exercise. Those are data-engineering problems, not model-placement problems. A z17 with Spyre still needs clean, structured, current data from its own DB2 schemas before the inference step can be useful. The bridge is shorter, but you still have to build it.
Nightly snapshot reused for high-velocity transactional data — balances and open claims are stale by 09:00
Application event sourcing with no CDC fallback — overnight batch updates are invisible to the pipeline
Federation deployed without MIPS observability — mainframe query load lands as a surprise on the invoice
Schema-on-read without numeric field verification — misread COMP-3 fields produce plausible wrong numbers silently
COBOL source treated as legacy noise — the schema documentation is sitting in the repo, unindexed
EBCDIC conversion run as a single pass — COMP-3 fields corrupted because byte conversion ran before numeric unpacking
Pattern picked per data domain: CDC for transactional, nightly for reference, schema-on-read where the copybook is gone
Event sourcing for application-layer semantics, CDC as the enforcement fallback for the batch window
Federation reserved for ad-hoc and low-frequency reads; replication for the hot retrieval path
Schema-on-read with mandatory sample verification and a confidence metadata field on every embedded record
COBOL source-code RAG indexed as a parallel layer for schema discovery, developer tooling, and catalog Q&A
Two-phase conversion: COMP-3 fields extracted as raw bytes first, EBCDIC character fields decoded second
Right pattern follows from what the domain requires. Wrong pattern follows from what the team is comfortable building.
The most common failure mode in mainframe-to-RAG projects is picking one pattern and applying it globally. The mainframe estate is not homogeneous. A DB2 database with 200 tables contains domains with radically different freshness requirements. The product master has 300 records updated once a quarter. The account transaction history has 50 million records updated continuously through the business day. Treating those two as the same problem is how the architecture overspends on the cheap domain and underbuilds the expensive one.
Run the freshness budget exercise before any architecture diagram is drawn. For each candidate data domain, answer three questions: what is the business consequence of serving a stale answer, what is the maximum acceptable staleness window, and what does the pattern capable of meeting that window cost to operate?
Reference data — product codes, regulatory tables, org hierarchies, postal mappings — almost always lands in Pattern 1. The operational tax of a real-time pipeline is not justified by the consequence of a one-day lag. Customer balances, open claims, current inventory, and in-flight transactions belong in Pattern 2 or Pattern 4. The consequence of a wrong balance is a customer escalation or, worse, a fraud loss. Audit logs and immutable historical records belong in Pattern 1 or Pattern 3 — they do not change, so freshness is irrelevant and query cost is the only variable. The freshness budget is the leverage point. Everything else is execution.
Every candidate data domain listed with its owning system — DB2 table, IMS segment, VSAM file, sequential dataset
Maximum acceptable staleness window stated in concrete business terms — T+24h, T+1h, T+5min, or real-time
Each staleness window mapped to the pattern tier capable of meeting it, with the operational cost recorded
Domains updated by batch jobs identified — they require Pattern 2 or dual CDC + event coverage, not event sourcing alone
Domains with missing or unreliable copybooks flagged — they require Pattern 6 or preliminary COBOL RAG to recover the layout
Staleness windows signed off by the business owners before any pipeline ships, not after the first retrieval complaint
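The mapping rules in this exercise are simple enough to encode, which makes the draft table reproducible before business owners review it. This sketch captures only some of the rules stated in this section: a budget of 24 hours or more points to Pattern 1, tighter budgets point to Pattern 2 or 4 (with CDC required where batch jobs write), and an untrusted copybook adds Pattern 6 with Pattern 7 first. The `Domain` fields and the 24-hour threshold are assumptions, and the output is a proposal for sign-off, not a decision.

```python
from dataclasses import dataclass
from datetime import timedelta

NIGHTLY = timedelta(hours=24)


@dataclass
class Domain:
    name: str
    max_staleness: timedelta  # signed-off staleness budget
    batch_writes: bool        # updated by batch jobs outside the online layer
    copybook_trusted: bool    # copybook exists and matches the file


def candidate_patterns(d: Domain) -> list:
    """Draft pattern choices for one domain, for business-owner review."""
    choices = []
    if not d.copybook_trusted:
        choices.append("Pattern 6: schema-on-read, with Pattern 7 first to recover the layout")
    if d.max_staleness >= NIGHTLY:
        choices.append("Pattern 1: nightly batch export")
    elif d.batch_writes:
        choices.append("Pattern 2: CDC, optionally plus Pattern 4 for business events")
    else:
        choices.append("Pattern 2: CDC or Pattern 4: event sourcing")
    return choices


for d in [
    Domain("product codes", timedelta(hours=24), batch_writes=True, copybook_trusted=True),
    Domain("account balances", timedelta(minutes=5), batch_writes=True, copybook_trusted=True),
    Domain("legacy claims VSAM", timedelta(hours=48), batch_writes=True, copybook_trusted=False),
]:
    print(d.name, "->", candidate_patterns(d))
```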
Attempting to migrate the entire mainframe estate to a vector store in a single phase. Schema surprises, encoding issues, and reconciliation failures compound across domains simultaneously, producing a project that is permanently 80% done. Migrate one domain at a time, prove the pattern, then expand. The blast radius of a domain-by-domain failure is bounded. The blast radius of a big-bang failure is the entire program.
Shipping the extraction pipeline without the verification harness that continuously compares the vector store against the source. Without reconciliation, you do not learn the pipeline broke. You learn it from a user asking a question whose answer is obviously wrong. The harness is not optional infrastructure. It is the only mechanism that catches silent failure.
Underestimating the MIPS impact of CDC tools, federated queries, or extra logging on the mainframe. Every byte read from the source consumes MIPS. Operations teams account for MIPS at the dollar level. Get capacity-planning approval before the first production query, not after the first invoice cycle.
Assuming EBCDIC-to-UTF-8 conversion is a solved problem a standard library handles. It is not. EBCDIC has regional variants (EBCDIC-037, EBCDIC-500, EBCDIC-1047), packed decimal fields must be excluded from byte-level conversion and unpacked separately, and EBCDIC sort order differs from ASCII in ways that quietly break range queries. Test the conversion against production data before declaring the pipeline production-ready.
Feeding RAG with mainframe data that has not been checked against a known-good control set. A vector store populated with structurally corrupted records — bad encoding, wrong decimal placement, truncated fields — produces retrieval results that look authoritative and are factually wrong. Verify before you embed. After the first complaint is too late — the embeddings are already there.
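The EBCDIC sort-order and variant pitfalls above can be made concrete in a few lines of Python. Encoding strings with the cp037 codec and comparing the resulting bytes mimics how a database orders EBCDIC data: lowercase sorts before uppercase, letters before digits, and some punctuation lands inside the letter range. The sample strings and the `in_range` helper are illustrative.

```python
codes = ["a10", "A10", "110"]

# Byte-wise ordering, as a database comparing stored bytes would apply it.
ascii_order = sorted(codes, key=lambda s: s.encode("ascii"))
ebcdic_order = sorted(codes, key=lambda s: s.encode("cp037"))
print(ascii_order)   # ['110', 'A10', 'a10']  digits < uppercase < lowercase
print(ebcdic_order)  # ['a10', 'A10', '110']  lowercase < uppercase < digits


def in_range(value: str, low: str, high: str, encoding: str) -> bool:
    """Byte-wise BETWEEN test under a given encoding."""
    v, lo, hi = (s.encode(encoding) for s in (value, low, high))
    return lo <= v <= hi


# EBCDIC letters are not contiguous: '}' (0xD0 in cp037) falls between 'I' and 'J'.
print(in_range("}", "A", "Z", "cp037"))  # True
print(in_range("}", "A", "Z", "ascii"))  # False

# Regional variants disagree even on printable characters.
print("[".encode("cp037") != "[".encode("cp500"))  # True
```

A range predicate that was correct on the mainframe can therefore return a different set of rows once the same data is stored as UTF-8, which is why the conversion has to be tested against production data, not just a round-trip of sample strings.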
Work with application and business owners to produce a complete list of candidate domains, source systems (DB2 table, IMS segment, VSAM dataset, COBOL-generated flat file), and freshness budgets. This is the only artifact that makes every downstream pattern choice defensible. Skip it and the architecture is a guess.
Index the COBOL programs, copybooks, and JCL into a vector store. Operationally near-free, immediate developer leverage for schema discovery, and the foundation for recovering schema information in Pattern 6 domains. It also forces the team to build the embedding and retrieval infrastructure on a dataset that does not pressure the vector store yet.
Pick two reference domains with clean T+24h or looser budgets. Run the full pipeline — EBCDIC conversion (two phases, COMP-3 first), field validation, embedding, vector store load — and verify against known-good control records before any retrieval query is enabled in production. The first production query is too late to discover the encoding bug.
The reconciliation job that continuously compares vector store records against the mainframe source and alerts on divergence must exist before Pattern 2 or Pattern 4 is introduced. CDC and event sourcing fail silently. The verification harness is the only mechanism that catches it. Logging that records 'pipeline succeeded' is not observability — it is an alibi.
Can we skip CDC and just snapshot nightly?
For reference data and historical records, yes — often the right call. For transactional data where the business consequence of a stale answer is material (account balances, open claims, current inventory), no. The decision belongs to the domain's freshness budget, not to the team's preference for simplicity. Nightly snapshots over high-velocity data produce a RAG system that confidently answers questions with yesterday's facts. That is worse than no system, because the wrongness is not visible.
How do we handle EBCDIC and packed decimal correctly?
Treat the conversion as two phases, in order. First, identify every COMP-3 (packed decimal) or COMP (binary) field from the copybook and extract them as raw bytes before any EBCDIC-to-UTF-8 conversion runs. Second, unpack those numeric fields with a COBOL-aware parser, then decode the remaining character fields with the correct EBCDIC variant (EBCDIC-037 for US, EBCDIC-500 for international; confirm per source file). Cobrix (open source, Apache Spark) handles this well for VSAM with known copybooks. For DB2, the UNLOAD utility (run through DSNUTILB) can produce delimited ASCII output directly, sidestepping most of the encoding surface. Test both against known-good control records before going live.
Federation or replication?
Federation (Denodo, Trino, Starburst, IBM watsonx.data) when query frequency is low, freshness must be absolute (zero replication lag), and MIPS cost per query is acceptable. Replication (CDC or nightly) when query frequency is high, the retrieval SLA is tight, and MIPS cost needs to be bounded. The pattern that ships in production is usually both: replicate the hot path into a modern store, keep federation as a fallback for records that have not been replicated yet or that demand an authoritative live read.
Does IBM z17 with Spyre change this architecture?
Partially. For organizations where data residency requirements prohibit sending records outside the mainframe perimeter, z17 with Spyre makes on-platform embedding and RAG inference feasible without network egress — a 5nm, 32-core PCIe accelerator with up to 6.1 TB of accelerator memory per system. What does not change: the EBCDIC conversion work, the batch-window blind spot in event sourcing, the schema-on-read problem, and the freshness budget exercise. Those are data-engineering problems, not model-placement problems. The bridge is shorter, but you still have to build it.
What does CDC miss that event sourcing catches — and vice versa?
CDC misses business context: it tells you a column value changed, not why the business decision behind that change was made. An application event from CICS carries the transaction code, the reason code, the operator ID — semantic information that never touches the DB2 row. Event sourcing misses batch-window writes: COBOL jobs that open VSAM files or DB2 tables directly, outside the online transaction layer, are invisible to application-event pipelines. A complete architecture runs both and merges them. The overlap is a feature, not a cost — double coverage on the same record is how you catch divergence.
When is the migration actually done?
When the verification harness shows less than 0.1% record divergence between vector store and mainframe source for 30 consecutive days, the retrieval latency SLA holds at p99 under production load, and the business owners have signed off on the freshness windows for every domain. 'Done' is not a technology milestone. It is a trust milestone. The verification harness is how you earn that trust systematically — not by asking stakeholders to take the architecture on faith.
The hardest part of enterprise AI is not the LLM. It is not the embedding model, the chunk size, or the choice between cosine and dot-product similarity. It is the half-mile between a 1985 DB2 schema and a 2026 vector database — the EBCDIC conversion logic nobody documented, the packed decimal fields that corrupt silently when decoded in the wrong order, the batch jobs that update half the records in the system without touching the application event layer, and the copybooks last maintained by someone who retired before the iPhone existed.
Every one of those problems is solvable. None of them is solved by picking a better retrieval algorithm or by waiting for IBM to add an AI accelerator to the box. Plan the bridge before you talk about agents. Build the verification harness before you trust retrieval results. Pick the pattern that matches the domain's freshness budget, not the one that looked simplest in the architecture presentation. The teams that get this right treat the data pipeline as the product. The teams that get it wrong discover, six weeks after go-live, that their RAG system is confidently answering questions about a world that no longer exists.