Wire an agent into your retrieval pipeline and it inherits every unfinished permission decision in the org chart. A decade of accumulated drift, in one query.
RBAC was a model for humans clicking through a UI. Roles map to resources. The user sees the page or hits a 403. That model collapses the moment the caller is an agent firing hundreds of retrievals per session, traversing data across departments, synthesizing across permission domains nobody ever reconciled. The clean role-to-resource diagram was never the enforcement layer. It was the marketing slide.
A March 2026 Cloud Security Alliance study found 68% of organizations cannot distinguish between human and AI agent actions in their access logs[7]. Non-human identities — service principals, API tokens, autonomous agents — outnumber human users by roughly 100 to 1 in the average enterprise. An identity layer built for a few thousand employees is now the policy enforcement point for millions of machine callers, most of them running with far more privilege than their function requires.
This is the architecture for permission-aware RAG. Pre-retrieval filters. Delegated identity. RLS at the data layer. Audit trails that survive the next ACL change. The patterns, the tradeoffs, the code.
One admission first. Permission enforcement and retrieval quality fight each other. Every filter you add cuts leakage risk and cuts the chance the agent finds the most relevant chunk. No architecture eliminates that tension. The only question is who decides where the line goes — you, or the pipeline by accident.
Broader Access Wins the Demo. Narrower Access Survives the Audit.
The standard enterprise permission model was never designed against an adversary that fires hundreds of semantic queries per session.
Every RAG-powered agent runs into the same forced tradeoff. Broader access produces better answers — the agent that cross-references HR data, financial reports, and engineering tickets generates insights no single-domain tool can match. Broader access also surfaces information the requesting user was never supposed to see.
A concrete failure mode. A sales manager asks the internal AI: "What's the competitive landscape for our Q2 deal with Acme Corp?" Answering well needs CRM data, competitive intelligence, deal history, pricing models. All legitimate for a sales manager. The same retrieval pass also pulls board-level strategic memos about the Acme relationship, HR data on the account executive's performance review, and finance margin targets restricted to VP-and-above.
The agent does not know what it should not know. It retrieves on semantic similarity, not on permission boundaries. The gap between semantically relevant and authorized to view is the leakage surface. Nobody owns it by default. That is why it widens.
| Traditional RBAC (human callers) | Agent-driven RAG |
|---|---|
| Static role assignments: admin, editor, viewer | Permission scope rebuilt per query, per context |
| One resource per request | Multi-resource retrieval across domains in a single call |
| Session auth with a single, clear identity | Delegated identity carrying the user's claims downstream |
| Permission check at the UI or API gateway, once | Permission check at every stage of the retrieval pipeline |
| Dozens to hundreds of unique callers | Thousands to millions of non-human callers, most over-privileged |
Three Places to Enforce. Pick All of Them.
Pre-retrieval, post-retrieval, hybrid. Each catches what the others let through.
A RAG pipeline gives you three enforcement surfaces. Treat them as alternatives and one becomes a single point of failure. Treat them as a stack and the failure modes stop overlapping.
Pre-retrieval filtering attaches permission metadata to every chunk at ingestion and adds filter clauses to the vector search query. The vector database returns only chunks the caller is authorized to see. Sensitive data never enters the pipeline at all.
Post-retrieval filtering runs semantic search first, then routes the top-k chunks through an authorization service that strips anything the user shouldn't see before the LLM gets context. Catches the chunks that were mis-tagged at ingestion. Last layer between the index and the model.
Hybrid is the only configuration that holds. Broad pre-retrieval filters — department, classification level — narrow the search space. Fine-grained post-retrieval checks handle the relationship-based permissions metadata cannot express. Each layer absorbs what the layer above it lets through.
| Dimension | Pre-Retrieval | Post-Retrieval | Hybrid |
|---|---|---|---|
| Sensitive data exposure | Never enters the pipeline | Fetched into memory, then filtered | Minimized by pre-filter, eliminated by post-filter |
| Retrieval quality | Misses semantically relevant chunks when filters narrow too hard | Best semantic results, then pruned | Strong — broad pre-filter preserves relevance |
| Performance | Fast — the database does the filtering | Slower — extra auth service call per chunk | Moderate — balanced between both |
| Permission model complexity | Simple metadata tags: role, department, classification | Supports ReBAC, ABAC, complex policies | Any model supported |
| Implementation effort | Low — metadata at ingestion, filters at query | Medium — requires a dedicated auth service integration | Higher — both systems coordinated |
| Best for | High-volume, simple permission models | Complex enterprise hierarchies | Production |
Identity Has to Survive the Trip Through the Pipeline
Most agent stacks lose the user somewhere between authentication and the vector search. That is the leak.
The hardest part of permission-aware RAG is not the filter logic. It is making sure the user's identity and permissions actually propagate through every stage of the pipeline without being lost, elevated, or confused.
Most agent architectures have a gap here. The user authenticates at the application layer, but by the time the request reaches the vector database, it is running under a service account with broad access. The permission check happens — if it happens — at the application layer after retrieval, not at the data layer during retrieval.
This is the confused deputy problem applied to AI pipelines. The agent is authorized to access everything. It should only retrieve data on behalf of the specific user who made the request. When the agent's identity is used for retrieval instead of the user's, every query runs at maximum privilege. Pretending otherwise is theater.
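rag-pipeline/permission-context.ts

```typescript
// One context object per request. Travels with every downstream call.
interface PermissionContext {
  userId: string;
  roles: string[];
  departments: string[];
  clearanceLevel: 'public' | 'internal' | 'confidential' | 'restricted';
  delegatedBy?: string; // Original user when the agent acts on someone's behalf
}

// Assumed to exist elsewhere: a vector store client and an authorization
// service client. The method shapes below are illustrative.
declare const vectorDB: {
  query(args: { query: string; filter: object; topK: number }): Promise<
    Array<{ metadata: { resourceId: string } }>
  >;
};
declare const authService: {
  batchCheck(
    checks: Array<{ subject: string; action: string; resource: string }>
  ): Promise<boolean[]>;
};

// Maps a clearance label to its rank so the filter can apply a numeric ceiling.
// Assumes each chunk stores the same rank in a numeric clearanceLevel field.
const CLEARANCE_ORDER = ['public', 'internal', 'confidential', 'restricted'];
function clearanceLevelToInt(level: PermissionContext['clearanceLevel']): number {
  return CLEARANCE_ORDER.indexOf(level);
}

function buildRetrievalFilter(ctx: PermissionContext) {
  return {
    $and: [
      // Pre-filter: clearance ceiling
      { clearanceLevel: { $lte: clearanceLevelToInt(ctx.clearanceLevel) } },
      // Pre-filter: department scope
      {
        $or: [
          { department: { $in: ctx.departments } },
          { department: 'shared' },
        ],
      },
    ],
  };
}

async function retrieveWithPermissions(
  query: string,
  ctx: PermissionContext,
  topK: number = 20 // Over-fetch: the post-filter will drop chunks
) {
  const filter = buildRetrievalFilter(ctx);
  const chunks = await vectorDB.query({ query, filter, topK });
  // Post-filter: fine-grained ReBAC check per chunk
  const authorized = await authService.batchCheck(
    chunks.map((c) => ({
      subject: ctx.userId,
      action: 'read',
      resource: c.metadata.resourceId,
    }))
  );
  // Keep only the chunks whose check came back true
  return chunks.filter((_, i) => authorized[i]);
}
```

Row-Level Security: Enforcement the Agent Cannot Bypass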
When the database refuses to return the row, the application layer cannot forget to check.
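Row-level security pushes enforcement down to the database layer. The agent, or the application, cannot read rows the policy excludes, provided it connects as a role that is subject to RLS. In PostgreSQL, superusers, roles with BYPASSRLS, and the table owner (unless FORCE ROW LEVEL SECURITY is set) skip policies entirely. Within that constraint there is no "forgot to add the filter" failure mode, because the filter is not in the application code.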
PostgreSQL has supported RLS since 9.5, and it is the cleanest fit for RAG pipelines running on pgvector or Supabase. Define policies that reference the current user's role or session variables and the database appends the filter to every query automatically. The agent never constructs the filter itself. It just queries.
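One way to hand the user's identity to a policy that reads session variables is Postgres's `set_config(name, value, is_local)`, called inside the same transaction as the query. The sketch below only builds the statements so it stays independent of any database client. The setting names `app.user_id` and `app.department`, the `id` and `content` columns, and the helper name are assumptions for illustration; a policy would read the values back with `current_setting('app.user_id')`.

```typescript
// One SQL statement plus its bound parameters.
interface Statement {
  sql: string;
  params: string[];
}

// Builds the statements for one user-scoped retrieval. They are meant to be
// run in order on a single connection, inside one transaction.
function buildScopedRetrieval(
  userId: string,
  department: string,
  queryEmbedding: number[],
  limit: number = 20
): Statement[] {
  return [
    { sql: 'BEGIN', params: [] },
    // The third argument `true` makes the setting transaction-local, so it
    // does not carry over to the next request on a pooled connection.
    { sql: "SELECT set_config('app.user_id', $1, true)", params: [userId] },
    { sql: "SELECT set_config('app.department', $1, true)", params: [department] },
    {
      // No permission predicate here: the RLS policy supplies the filter.
      sql:
        'SELECT id, content FROM document_chunks ' +
        'ORDER BY embedding <=> $1::vector LIMIT $2',
      // pgvector accepts the text form "[0.1,0.2,...]" for a vector value.
      params: [JSON.stringify(queryEmbedding), String(limit)],
    },
    { sql: 'COMMIT', params: [] },
  ];
}
```

The design point is that the retrieval statement carries no permission logic at all. If the application forgets the `set_config` calls, a policy written against `current_setting` should match nothing rather than everything, which is the failure direction you want.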
Supabase pushes this further with their RAG-specific pattern: each chunk carries an owner_id column, and the RLS policy checks the authenticated user's JWT claims against the row[5]. Combined with Edge Functions for the agent runtime, you get end-to-end propagation from the user's browser to the vector search to the LLM context. Identity does not get dropped on the floor, because the database refuses to do work without it.
supabase/rls-policy.sql

```sql
-- RLS on. The policy decides what the agent sees, not the application.
ALTER TABLE document_chunks ENABLE ROW LEVEL SECURITY;

-- Users see chunks they own or that are shared with them
CREATE POLICY "Users read own or shared chunks"
  ON document_chunks FOR SELECT
  USING (
    owner_id = auth.uid()
    OR classification = 'shared'
    OR EXISTS (
      SELECT 1 FROM document_access_grants
      WHERE document_access_grants.document_id = document_chunks.document_id
        AND document_access_grants.grantee_id = auth.uid()
        -- Assumes permission is an enum ordered from 'read' upward.
        -- On a text column this comparison would be lexicographic.
        AND document_access_grants.permission >= 'read'
    )
  );

-- Department-scoped access from JWT metadata.
-- Permissive policies on the same table combine with OR, so a row is
-- visible when either policy admits it.
CREATE POLICY "Department members read department docs"
  ON document_chunks FOR SELECT
  USING (
    department = (auth.jwt() -> 'app_metadata' ->> 'department')
    OR classification IN ('public', 'shared')
  );

-- The agent queries normally. RLS handles enforcement invisibly.
-- SELECT * FROM document_chunks
--   ORDER BY embedding <=> query_embedding
--   LIMIT 20;
```

The Single Architectural Decision That Decides Your Blast Radius
Service identity or delegated identity. Pick once. Live with the consequences.
The most consequential architectural decision in permission-aware AI is whether the agent authenticates as itself or as the user it acts for. Everything else downstream — filter design, audit fidelity, what a compromised credential exposes — falls out of that choice.
Service account model. The agent has its own identity with broad access. Application code filters results based on the requesting user's permissions. Easier to wire up. The agent's credentials become a high-value target. Compromise it and the attacker gets everything the agent can see, which is usually everything.
Delegated identity model. The agent receives a short-lived, scoped token that carries the requesting user's identity and permissions. Every downstream call — vector search, database query, API request — runs under that token. The agent can only access what the user can access[10].
Delegated identity is strictly better from a security standpoint, and the price is real infrastructure. You need a token exchange mechanism — OAuth2 token exchange, or a custom delegation service — and every system in the pipeline has to honor the delegated token. Most vector databases do not natively support this yet, so you end up implementing it at the application layer with pass-through filters. The blast radius shrinks from "everything the agent can touch" to "everything the user could already touch." That tradeoff is the entire point.
- [01]
Pull the user's identity off the incoming request
```typescript
const userToken = request.headers.get('Authorization');
const claims = await verifyAndDecode(userToken);
const permCtx: PermissionContext = {
  userId: claims.sub,
  roles: claims.roles,
  departments: claims.departments,
  clearanceLevel: claims.clearance,
};
```
- [02]
Exchange it for a short-lived, scoped agent token
```typescript
const agentToken = await tokenExchange({
  subjectToken: userToken,
  scope: 'rag:retrieve',
  audience: 'vector-db',
  expiresIn: '5m', // One query cycle. No reuse.
});
```
- [03]
Attach the permission context to every retrieval call
```typescript
const results = await vectorClient.query({
  embedding: queryEmbedding,
  filter: buildRetrievalFilter(permCtx),
  token: agentToken, // User identity, all the way down
  topK: 25,
});
```
- [04]
Audit every retrieval with the user's identity attached
```typescript
// authorizedResults is the subset of results that passed the post-retrieval check
await auditLog.write({
  action: 'rag_retrieve',
  userId: permCtx.userId,
  agentId: agentConfig.id,
  chunksRetrieved: results.length,
  chunksAfterFilter: authorizedResults.length,
  query: redact(originalQuery),
  timestamp: new Date().toISOString(),
});
```
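The token exchange in step 02 is left abstract. If the authorization server implements OAuth 2.0 Token Exchange (RFC 8693), the request body could be built roughly as follows. The grant type and token type URNs are the ones the RFC defines; the helper name and the sample values are assumptions. The RFC has no request parameter for token lifetime, so the five-minute expiry would be server-side policy rather than something the caller sends.

```typescript
// Inputs for an RFC 8693 token exchange request.
interface TokenExchangeInput {
  subjectToken: string; // the user's access token
  audience: string; // the downstream service the new token is for
  scope: string; // space-separated scopes to request
}

// Builds the application/x-www-form-urlencoded request body.
// Client authentication and the token endpoint URL are out of scope here.
function buildTokenExchangeBody(input: TokenExchangeInput): string {
  const fields: Array<[string, string]> = [
    ['grant_type', 'urn:ietf:params:oauth:grant-type:token-exchange'],
    ['subject_token', input.subjectToken],
    ['subject_token_type', 'urn:ietf:params:oauth:token-type:access_token'],
    ['requested_token_type', 'urn:ietf:params:oauth:token-type:access_token'],
    ['audience', input.audience],
    ['scope', input.scope],
  ];
  // Percent-encode every key and value, then join as key=value pairs.
  return fields
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join('&');
}

const exchangeBody = buildTokenExchangeBody({
  subjectToken: 'user-access-token',
  audience: 'vector-db',
  scope: 'rag:retrieve',
});
```

The body would be sent as a POST to the authorization server's token endpoint with `Content-Type: application/x-www-form-urlencoded`. Narrowing `audience` and `scope` at exchange time is what keeps a leaked agent token from being useful anywhere else.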
Five Ways RBAC Breaks the Moment You Plug an Agent Into It
Every team bolting traditional access control onto RAG hits the same five failure modes. These show up within weeks of deployment, not in theory.
The patterns are stable enough to name, and each one has a known fix.
RBAC Traps to Avoid
The God Service Account
The agent gets a service account with admin-level database access because "it needs to answer questions about anything." One compromised credential exposes the entire knowledge base. Use delegated identity, or scope service accounts per department. Never one identity for everything.
Stale Permission Metadata
Chunks get tagged with permission metadata at ingestion and never refreshed when the source ACL changes. A document moves from Engineering-only to company-wide, or the other way, and its chunks keep answering to the old tags. User-side claims go stale the same way: an employee moves from Engineering to Sales, and a cached permission context keeps granting Engineering scope until it is rebuilt. Drift is the default.
Semantic Leakage Through Summaries
Your post-filter blocks a restricted document. The LLM saw it three turns ago and quotes it in the current summary. Permission filtering has to happen before the LLM sees any context, not after generation. Once it is in the context window, it is in the response.
Permission Explosion in Metadata Filters
Encoding every possible permission combination as metadata tags. A user with 5 roles across 3 departments and 4 project teams becomes a filter clause with 60 OR conditions (5 × 3 × 4), and more once shared scopes are added. Vector database performance collapses. Use hierarchical levels, ordered public < internal < confidential < restricted, as pre-filters. Push fine-grained checks to the post-filter.
Missing Audit Trail
Filtering chunks correctly and never logging what got filtered out. When a user reports "the AI couldn't answer my question," you cannot tell whether retrieval missed it or whether the permission boundary held. Log the retrieved set and the authorized set. Both.
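To make the permission-explosion arithmetic concrete, the sketch below counts the OR branches produced by tagging every combination and contrasts them with one set-membership clause per dimension. This is an illustration of where the clause count comes from, not the fix the article recommends (hierarchical pre-filters plus a post-filter). The `$and` and `$in` operators mirror the Mongo-style filter syntax used earlier; whether a given vector database supports them is an assumption.

```typescript
interface CombinationClause {
  role: string;
  department: string;
  team: string;
}

// Anti-pattern: one OR branch per (role, department, team) combination.
function expandCombinations(
  roles: string[],
  departments: string[],
  teams: string[]
): CombinationClause[] {
  const clauses: CombinationClause[] = [];
  for (const role of roles) {
    for (const department of departments) {
      for (const team of teams) {
        clauses.push({ role, department, team });
      }
    }
  }
  return clauses;
}

// The same cross product expressed as three membership tests ANDed together.
function compactFilter(roles: string[], departments: string[], teams: string[]) {
  return {
    $and: [
      { role: { $in: roles } },
      { department: { $in: departments } },
      { team: { $in: teams } },
    ],
  };
}

const sampleRoles = ['r1', 'r2', 'r3', 'r4', 'r5'];
const sampleDepartments = ['d1', 'd2', 'd3'];
const sampleTeams = ['t1', 't2', 't3', 't4'];

const expanded = expandCombinations(sampleRoles, sampleDepartments, sampleTeams);
const compact = compactFilter(sampleRoles, sampleDepartments, sampleTeams);
```

Here `expanded` holds 60 clauses while `compact` holds 3. The two forms match the same chunks only when each chunk carries a single value per dimension, which is one more reason to keep the pre-filter coarse.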
Pre-Retrieval Filtering Is Only as Good as the Tags You Wrote at Ingestion
Get the taxonomy wrong and you over-restrict, under-restrict, or both at once.
Over-restrict and the agent cannot find anything relevant. Under-restrict and it surfaces what users should never see.
The leverage point is layered classification rather than flat role tags. Three metadata dimensions cover most enterprise permission patterns. Complex organizations with unusual access hierarchies need more dimensions; do not pretend otherwise.
Classification Level (vertical access)
- `public`: available to all authenticated users and external-facing agents
- `internal`: available to all employees, not external parties
- `confidential`: restricted to specific departments or project teams
- `restricted`: named individuals only, requires an explicit grant
Department Scope (horizontal access)
- Tag each chunk with the owning department: `engineering`, `sales`, `hr`, `finance`, `legal`
- Use `shared` for cross-departmental content like company-wide policies
- Support multi-department tagging when content is genuinely owned by multiple teams
Temporal Validity (time-based access)
- `embargo_until`: chunk hidden before a date (earnings data, product announcements)
- `expires_at`: chunk auto-restricted after a date (time-limited partnership terms)
- `review_by`: flags chunk for permission re-evaluation on a schedule
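The temporal dimension is the one most likely to be tagged and then never checked. A minimal sketch of the query-time check, assuming the embargo and expiry values are stored as ISO 8601 strings; the function and field names are illustrative. It fails closed when a date cannot be parsed.

```typescript
interface TemporalMetadata {
  embargoUntil?: string; // ISO date: hidden before this instant
  expiresAt?: string; // ISO date: hidden from this instant onward
}

// Returns true when the chunk may be shown at time `now`.
function isTemporallyVisible(meta: TemporalMetadata, now: Date): boolean {
  const current = now.getTime();
  if (meta.embargoUntil !== undefined) {
    const start = Date.parse(meta.embargoUntil);
    // An unparseable date hides the chunk rather than exposing it.
    if (isNaN(start) || current < start) return false;
  }
  if (meta.expiresAt !== undefined) {
    const end = Date.parse(meta.expiresAt);
    if (isNaN(end) || current >= end) return false;
  }
  return true;
}
```

A check like this belongs in the post-retrieval step unless the vector database can compare timestamps in its own filter syntax, in which case the same two comparisons can move into the pre-filter.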
ingestion/chunk-metadata.ts

```typescript
// Three dimensions cover most enterprises. Complex orgs need more.
interface ChunkPermissionMetadata {
  // Vertical: hierarchical classification
  classification: 'public' | 'internal' | 'confidential' | 'restricted';
  // Horizontal: department scope
  departments: string[]; // ['engineering', 'product'] or ['shared']
  projectTeams?: string[]; // Fine-grained project-level access
  // Temporal
  embargoUntil?: string; // ISO date: hidden before this
  expiresAt?: string; // ISO date: hidden after this
  reviewBy?: string; // ISO date: flag for permission audit
  // Source tracking: the chunk-to-source map for ACL sync
  sourceDocumentId: string;
  ownerId: string;
  lastAclSync: string; // ISO timestamp
}

// Assumed to exist elsewhere. The shapes below are illustrative.
interface RawChunk { id: string; text: string }
interface SourceDocument { id: string; createdBy: string }
type DocumentAcl = Pick<
  ChunkPermissionMetadata,
  'classification' | 'departments' | 'projectTeams' | 'embargoUntil' | 'expiresAt' | 'reviewBy'
>;
declare const aclService: {
  getDocumentPermissions(documentId: string): Promise<DocumentAcl>;
};

// Every chunk inherits the ACL of its source document at ingestion time.
async function enrichChunkWithPermissions(
  chunk: RawChunk, // not read here: permissions come from the source document
  sourceDoc: SourceDocument
): Promise<ChunkPermissionMetadata> {
  const docAcl = await aclService.getDocumentPermissions(sourceDoc.id);
  return {
    classification: docAcl.classification,
    departments: docAcl.departments,
    projectTeams: docAcl.projectTeams,
    embargoUntil: docAcl.embargoUntil,
    expiresAt: docAcl.expiresAt,
    reviewBy: docAcl.reviewBy,
    sourceDocumentId: sourceDoc.id,
    ownerId: sourceDoc.createdBy,
    lastAclSync: new Date().toISOString(),
  };
}
```

ACL Sync: Where the Embeddings Believe One Thing and Reality Believes Another
Permission changes in the source system. The vector index does not get the memo. Half a day of stale answers.
Here is the scenario that breaks most permission-aware RAG implementations. A confidential engineering document is chunked, embedded, and tagged with classification: confidential, departments: ['engineering']. Three weeks later, the project launches publicly and the document is reclassified to internal. The source system's ACL gets updated. The 47 chunks in your vector database still carry the old confidential tag.
Now every non-engineering employee who asks about this feature gets nothing, even though the information is public. Worse — if someone re-uploads the document and creates new chunks, the same content sits in the index at two different permission levels.
This happened to us in a production deployment. A product launch announcement was confidential during pre-launch, reclassified to internal at announcement time. The 12-hour delay in ACL sync meant the agent refused to discuss publicly-announced features for half a day. The fix was webhook-triggered sync instead of a batch job. ACL changes now propagate in under two minutes.
ACL synchronization is a pipeline problem, not a one-time task. You need a mechanism that detects permission changes in source systems and propagates them to every chunk derived from the affected document. Drift is the default state of any system without an explicit owner.
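A webhook-triggered sync of the kind described above comes down to one operation: find every chunk derived from the changed document and overwrite its permission tags. The sketch below uses an in-memory array in place of the vector store so the logic is visible. The event shape, the field names, and the out-of-order guard are assumptions, not the production implementation.

```typescript
type Classification = 'public' | 'internal' | 'confidential' | 'restricted';

interface StoredChunk {
  id: string;
  sourceDocumentId: string;
  classification: Classification;
  departments: string[];
  lastAclSync: string; // ISO timestamp of the ACL state the chunk reflects
}

// Hypothetical payload a source system might send when a document's ACL changes.
interface AclChangeEvent {
  documentId: string;
  classification: Classification;
  departments: string[];
  changedAt: string; // ISO timestamp
}

// Applies the change to every chunk derived from the document.
// Returns how many chunks were updated.
function applyAclChange(chunks: StoredChunk[], event: AclChangeEvent): number {
  let updated = 0;
  for (const chunk of chunks) {
    if (chunk.sourceDocumentId !== event.documentId) continue;
    // Ignore events that are not newer than what the chunk already reflects,
    // so late or repeated deliveries cannot roll permissions back.
    if (Date.parse(event.changedAt) <= Date.parse(chunk.lastAclSync)) continue;
    chunk.classification = event.classification;
    chunk.departments = event.departments.slice();
    chunk.lastAclSync = event.changedAt;
    updated++;
  }
  return updated;
}
```

In a real pipeline the loop becomes a metadata update filtered on `sourceDocumentId`, which is why the chunk-to-source mapping has to be written at ingestion. A periodic reconciliation job still matters, since webhooks get dropped.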
When Agents Call Other Agents, Whose Permissions Win?
Multi-agent orchestration multiplies the surface area. Each sub-agent talks to a different permission model. Identity has to survive every hop.
The permission model gets significantly more complex the moment you move from a single agent to a multi-agent orchestration. An orchestrator decomposes a user's question and dispatches it to three specialized sub-agents: one queries the knowledge base, one queries the CRM, one queries the financial system.
Each sub-agent talks to a different data source with a different permission model. The knowledge base uses document-level ACLs. The CRM uses account-level visibility rules. The finance system uses row-level security tied to cost-center hierarchies. The requesting user might have access in two of three systems.
When the orchestrator synthesizes results, the question is whether it knows some data is missing because of permission boundaries — and whether the final response reflects that gap honestly. Two patterns handle this. Only one of them is defensible.
| Propagated user context | Per-agent service identity |
|---|---|
| Orchestrator passes the user's permission context to every sub-agent | Each sub-agent has its own service identity with fixed permissions |
| Each sub-agent applies the same user's permissions in its data source | Results pooled regardless of the requesting user's access level |
| Missing data is explicitly flagged in the sub-agent's response | Requires a post-synthesis filter to scrub unauthorized data from the final response |
| Orchestrator knows which sources were permission-limited and can disclose it | Information leakage during synthesis: the agent "knows" restricted data even if it does not show it |
| Consistent with least privilege | Simpler to implement, harder to audit |
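The propagated-context pattern can be sketched in a few lines: the orchestrator hands the same permission context to every sub-agent, and each sub-agent reports whether the user's scope hid anything. The types and the two toy sub-agents are assumptions for illustration. Real sub-agents would be asynchronous and would enforce permissions in their own data source, not with an inline check.

```typescript
interface PermissionContext {
  userId: string;
  departments: string[];
}

interface SubAgentResult {
  source: string;
  facts: string[];
  permissionLimited: boolean; // true when the user's scope hid some records
}

// A sub-agent answers a question strictly within the caller's scope.
type SubAgent = (question: string, ctx: PermissionContext) => SubAgentResult;

function orchestrate(question: string, ctx: PermissionContext, agents: SubAgent[]) {
  // The same user context travels to every hop. No sub-agent falls back
  // to its own service identity.
  const results = agents.map((agent) => agent(question, ctx));
  const facts: string[] = [];
  for (const result of results) {
    for (const fact of result.facts) facts.push(fact);
  }
  const limitedSources = results
    .filter((result) => result.permissionLimited)
    .map((result) => result.source);
  return { facts, limitedSources };
}

// Toy sub-agents standing in for a CRM and a finance system.
const crmAgent: SubAgent = (_question, _ctx) => ({
  source: 'crm',
  facts: ['Acme renewal is open'],
  permissionLimited: false,
});
const financeAgent: SubAgent = (_question, ctx) => {
  const allowed = ctx.departments.indexOf('finance') !== -1;
  return {
    source: 'finance',
    facts: allowed ? ['Margin target is on file'] : [],
    permissionLimited: !allowed,
  };
};

const outcome = orchestrate(
  'What is the competitive landscape for the Acme deal?',
  { userId: 'u-42', departments: ['sales'] },
  [crmAgent, financeAgent]
);
```

For the sales user above, `outcome.limitedSources` names the finance system, so the final response can acknowledge a gap without describing what sits behind it.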
Implementation Checklist for Permission-Aware RAG
The ordered moves for adding access controls to a retrieval pipeline that is already running.
Permission-Aware RAG Rollout
Audit existing data sources — every document collection mapped to its current permission model
Permission metadata taxonomy defined: classification levels, department scopes, temporal rules
Metadata enrichment in the ingestion pipeline — every chunk tagged at ingest, no exceptions
Pre-retrieval filters added to vector search queries using the permission metadata
Post-retrieval authorization service integrated — Cerbos, OPA, or custom — for fine-grained checks
Delegated identity in place — agents carry user tokens, not their own service credentials
ACL sync pipeline shipping: CDC listeners, chunk-to-source mapping, periodic reconciliation
Audit log captures user identity, chunks retrieved, chunks authorized, chunks filtered
Monitoring on filter-to-pass ratio, permission-denied query rate, ACL sync latency
Load tested with realistic permission distributions — not just admin users
Red-teamed: cross-department access attempts, privilege escalation, prompt injection to bypass filters
Configured Permissions and Enforced Permissions Are Not the Same Thing
If the metric is missing, the enforcement is missing. Three numbers tell you whether your access controls are real or aspirational.
You need observability that tells you, in real time, whether the access controls you designed are actually holding in production. Logging that records "agent succeeded" is not observability. It is an alibi.
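Three metrics carry the weight: filter-to-pass ratio, permission-denied query rate, and ACL sync latency. The last two rows of the table are fault counters that should sit at zero.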
| Metric | What It Tells You | Alert Threshold |
|---|---|---|
| Filter-to-pass ratio | Share of retrieved chunks that survive permission filtering. Above 90% means pre-filters are too tight. Below 50% means too loose. | Below 50% or above 95% |
| Permission-denied query rate | Share of queries where the user got zero authorized chunks. Spikes mean a permission misconfiguration or a real access gap. | Above 15% for any department |
| ACL sync latency | Time between a permission change in the source system and the update reaching chunk metadata. Anything over five minutes is a leakage window. | Above 5 minutes |
| Cross-boundary retrieval attempts | The agent tried to retrieve chunks outside the user's scope. High rates mean the agent is constructing queries that ignore boundaries. | Any count above 0 in post-filter logs |
| Token expiry violations | Requests with expired delegated tokens. The agent runtime is not refreshing properly. | Any count above 0 |
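The first two metrics fall straight out of the audit log, provided each record keeps both the retrieved and the authorized counts. A minimal sketch, assuming a record shape with those two fields; the names are illustrative and the thresholds in the table are applied elsewhere.

```typescript
interface RetrievalAuditRecord {
  department: string;
  chunksRetrieved: number; // returned by the vector search, after the pre-filter
  chunksAuthorized: number; // still present after the post-filter
}

// Share of retrieved chunks that survived permission filtering.
// Returns null when nothing was retrieved, since the ratio is undefined.
function filterToPassRatio(records: RetrievalAuditRecord[]): number | null {
  let retrieved = 0;
  let authorized = 0;
  for (const record of records) {
    retrieved += record.chunksRetrieved;
    authorized += record.chunksAuthorized;
  }
  return retrieved === 0 ? null : authorized / retrieved;
}

// Share of queries where the user ended up with zero authorized chunks.
function permissionDeniedRate(records: RetrievalAuditRecord[]): number | null {
  if (records.length === 0) return null;
  const denied = records.filter((record) => record.chunksAuthorized === 0).length;
  return denied / records.length;
}
```

Grouping the records by `department` before calling `permissionDeniedRate` gives the per-department rate the alert threshold refers to.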
The Stack That Holds Under Adversarial Inputs
Five layers, in order. No layer trusts the layer above it to have done the permission check correctly.
Pulled together, a production-grade permission-aware RAG system has five layers running in sequence. The user's identity flows through every layer. No layer trusts the one above it to have done the permission check. That is the design rule. Trust between layers is the failure mode that ate the last system.
- [01]
Ingestion Layer
Documents get chunked, embedded, and tagged with permission metadata derived from the source system's ACL. A CDC pipeline keeps chunk metadata synchronized with source permissions. Drift starts here when this step is skipped, which is most of the time.
- [02]
Identity Layer
User authenticates at the application edge. JWT claims get extracted and packaged into a PermissionContext that travels with every downstream call. Lose it here and the rest of the pipeline runs as the agent, not the user.
- [03]
Pre-Retrieval Filter Layer
The vector search query carries metadata filters built from the PermissionContext. Classification level and department scope cut the search space before semantic similarity runs. The cheapest filter is the one that prevents the database from returning the row at all.
- [04]
Post-Retrieval Authorization Layer
Each retrieved chunk passes through a fine-grained authorization check against the full permission model — ReBAC, ABAC, or custom policy. Unauthorized chunks get removed before LLM context assembly. This layer catches the chunks that were mis-tagged at ingestion.
- [05]
Context Assembly and Response Layer
Only authorized chunks enter the LLM context window. The response includes a transparency signal when permission boundaries limited the available information. Telling the user nothing is worse than telling them "some content was outside your access."
Can I use row-level security with vector databases that aren't PostgreSQL?
Most dedicated vector databases — Pinecone, Weaviate, Milvus, Qdrant — support metadata filtering, which gives you pre-retrieval access control. Milvus added row-level RBAC with bitmap indexing. True database-enforced RLS, where the database refuses to return unauthorized rows regardless of the query, is still strongest in PostgreSQL with pgvector. On a dedicated vector DB, plan for application-layer enforcement via post-retrieval filtering. Treat the pre-filter as an optimization, not a guarantee.
What's the performance impact of adding permission filters to vector search?
Pre-retrieval metadata filters typically add 5–15% latency depending on filter selectivity. Highly selective filters, such as restricting to one department out of twenty, can actually improve performance by cutting the search space. Post-retrieval authorization adds a batch call to the auth service: 10–50ms per batch depending on chunk count and the auth service's architecture. Total overhead usually lands under 100ms, which is negligible next to LLM inference. The cost that matters is the observability work, not the latency.
How do I handle permissions for summarized or derived content?
Hardest problem in the stack. If an agent summarizes 10 chunks and three are later reclassified as restricted, the summary is tainted. Two options. Store provenance metadata linking every generated summary to its source chunks, then revalidate permissions when the summary gets retrieved. Or give summaries the most restrictive classification of any source chunk and re-summarize when source permissions change. The second is simpler and over-restricts. The first is correct and expensive.
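The second option, giving a summary the most restrictive classification of any source chunk, is small enough to sketch. The level ordering follows the classification list used throughout the article; the function name is illustrative, and returning `restricted` for a summary with no known sources is an assumption chosen to fail closed.

```typescript
// Ordered from least to most restrictive.
const LEVELS = ['public', 'internal', 'confidential', 'restricted'] as const;
type Classification = typeof LEVELS[number];

// A summary inherits the highest classification among its source chunks.
function mostRestrictive(sources: Classification[]): Classification {
  // With no provenance at all, assume the worst rather than the best.
  let maxIndex = sources.length === 0 ? LEVELS.length - 1 : 0;
  for (const classification of sources) {
    maxIndex = Math.max(maxIndex, LEVELS.indexOf(classification));
  }
  return LEVELS[maxIndex];
}
```

Re-running this when any source chunk is reclassified keeps the summary's tag current, at the price of over-restricting summaries that never drew on the sensitive chunk's content.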
Should I tell the user when permission boundaries limited their results?
Yes. Carefully. "Some information may not be available based on your access level" — not "You don't have access to 3 confidential engineering documents about Project X." The second leaks the existence of what is hidden. Acknowledge the boundary without revealing what sits behind it. Heuristic: surface the caveat when filter-to-pass drops below 60% for a query. Above 60%, assume the agent had enough context and skip the noise. Users who hit this message constantly should have their access reviewed — usually the role assignment is wrong, not the permission model.
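The 60% heuristic can be expressed as a small decision function. A minimal sketch, assuming the per-query retrieved and authorized counts are available from the post-filter; the function name and the default threshold parameter are illustrative.

```typescript
const ACCESS_NOTICE =
  'Some information may not be available based on your access level.';

// Returns the generic notice when too few retrieved chunks survived the
// permission filter, and null when no notice is needed.
function transparencyNotice(
  chunksRetrieved: number,
  chunksAuthorized: number,
  threshold: number = 0.6
): string | null {
  // Nothing retrieved means nothing was filtered at this stage. Chunks
  // excluded by the pre-filter are invisible to this check.
  if (chunksRetrieved === 0) return null;
  const passRatio = chunksAuthorized / chunksRetrieved;
  return passRatio < threshold ? ACCESS_NOTICE : null;
}
```

The notice text is fixed on purpose: it never interpolates counts, document names, or classifications, so it cannot leak the existence of what was filtered.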
A note on compliance frameworks
SOC 2 Type II, HIPAA, and GDPR all require demonstrable access controls on data used by automated systems. Permission-aware RAG is not just engineering hygiene. It is increasingly a compliance requirement. The audit logging patterns in this article map directly to SOC 2 CC6.1 (logical access controls) and CC7.2 (system activity monitoring). For protected health information, HIPAA's minimum necessary standard applies to AI agent retrieval the same way it applies to any other automated access — the agent should only see the minimum data needed to answer the query.
- [1] Couchbase: Securing Agentic RAG Pipelines (couchbase.com)
- [2] Zilliz: Fine-Grained Access Control with Milvus Row-Level RBAC (zilliz.com)
- [3] Elastic: RAG and RBAC Integration (elastic.co)
- [4] Pinecone: RAG Access Control (pinecone.io)
- [5] Supabase: RAG with Permissions (supabase.com)
- [6] Cerbos: Access Control for RAG and LLMs (cerbos.dev)
- [7] Cloud Security Alliance: Organizations Cannot Distinguish AI Agent from Human Actions (March 2026) (cloudsecurityalliance.org)
- [8] WorkOS: AI Agent Access Control (workos.com)
- [9] Descope: ReBAC for RAG Pipelines (descope.com)
- [10] Oso: AI Agent Permissions and Delegated Access (osohq.com)