Here is the uncomfortable reality of building AI agents that work with enterprise data: the moment you plug an agent into your retrieval pipeline, it inherits a permission problem that took your organization years to create and has never been properly solved.
Traditional RBAC was designed for humans clicking through UIs. You assign roles, roles map to resources, and the person either sees the page or gets a 403. Clean enough. But agents operate differently. They make hundreds of retrieval calls per session, traverse data across departments, and synthesize information from sources that belong to different permission domains. The clean role-to-resource mapping collapses under the weight of what agents actually do.
According to a March 2026 Cloud Security Alliance study, approximately 68% of organizations cannot clearly distinguish between human and AI agent actions in their access logs[7]. Meanwhile, non-human identities — service principals, API tokens, autonomous agents — are estimated to outnumber human users by a ratio of roughly 100 to 1 in the average enterprise. The identity layer that was built for a few thousand employees now needs to manage millions of machine callers, most of them running with far more privilege than they need.
This article covers the practical architecture for permission-aware RAG systems. Not the theory. The actual patterns, tradeoffs, and code that let you give agents enough access to be useful without handing them the keys to everything.
The Fundamental Tension: Access Breadth vs. Data Boundaries
Why the standard enterprise permission model breaks down for AI agents
Every RAG-powered agent faces a core design conflict. Broader data access produces better answers — the agent that can cross-reference HR data, financial reports, and engineering tickets will generate insights that no single-domain tool can match. But broader access means the agent can surface information the requesting user should never see.
Consider a concrete scenario. A sales manager asks your internal AI: "What's the competitive landscape for our Q2 deal with Acme Corp?" To answer well, the agent needs CRM data, competitive intelligence reports, previous deal history, maybe even pricing models. All legitimate for a sales manager. But the retrieval pipeline might also pull in board-level strategic memos about the Acme relationship, HR data about the assigned account executive's performance review, or finance documents about margin targets that are restricted to VP-level and above.
The agent doesn't know what it shouldn't know. It retrieves based on semantic similarity, not permission boundaries. And that gap between "semantically relevant" and "authorized to view" is where data leakage lives.
| Traditional human access (RBAC) | AI agent access |
|---|---|
| Static role assignments: admin, editor, viewer | Dynamic, context-dependent permission scoping |
| Single-resource access per request | Multi-resource retrieval across domains per query |
| Session-based authentication with clear identity | Delegated identity acting on behalf of a human user |
| Permissions checked at the UI or API gateway | Permissions must propagate through the entire retrieval pipeline |
| Dozens to hundreds of unique identities | Thousands to millions of non-human identities |
Three Filtering Strategies for Permission-Aware RAG
Pre-retrieval, post-retrieval, and hybrid — when each one wins
There are three places in a RAG pipeline where you can enforce data access controls. Each has real tradeoffs, and the correct answer is almost always a combination.
Pre-retrieval filtering attaches permission metadata to every chunk at ingestion time and adds filter clauses to the vector search query. The vector database only returns chunks the user is authorized to see. Sensitive data never enters the pipeline.
Post-retrieval filtering retrieves the top-k chunks based on semantic similarity first, then passes them through an authorization service that removes anything the user shouldn't see before the LLM receives context.
Hybrid filtering combines both: broad pre-retrieval filters (department, classification level) narrow the search space, while fine-grained post-retrieval checks handle complex relationship-based permissions that can't be expressed as simple metadata filters.
| Dimension | Pre-Retrieval | Post-Retrieval | Hybrid |
|---|---|---|---|
| Sensitive data exposure | Never enters pipeline | Fetched into memory, then filtered | Minimized by pre-filter, eliminated by post-filter |
| Retrieval quality | May miss semantically relevant chunks due to narrow filter | Best semantic results, then pruned | Strong — broad pre-filter preserves relevance |
| Performance | Fast — database handles filtering | Slower — extra auth service call per chunk | Moderate — balanced between both |
| Permission model complexity | Simple metadata tags (role, department, classification) | Supports ReBAC, ABAC, and complex policies | Any model supported |
| Implementation effort | Low — metadata at ingestion + query filters | Medium — requires dedicated auth service integration | Higher — requires both systems coordinated |
| Best for | High-volume, simple permission models | Complex enterprise hierarchies | Most production systems |
Permission Propagation Through the Retrieval Pipeline
How user identity flows from request to vector search to LLM context
The hardest part of permission-aware RAG isn't writing the filter logic. It's making sure the user's identity and permissions actually propagate through every stage of the pipeline without being lost, elevated, or confused.
Most agent architectures have a gap here. The user authenticates at the application layer, but by the time the request reaches the vector database, it's running under a service account with broad access. The permission check happens — if it happens — at the application layer after retrieval, not at the data layer during retrieval.
This is the confused deputy problem applied to AI pipelines. The agent (the deputy) is authorized to access everything, but it should only retrieve data on behalf of the specific user who made the request. If the agent's identity is used for retrieval instead of the user's identity, every query runs with maximum privilege.
`rag-pipeline/permission-context.ts`

```typescript
interface PermissionContext {
  userId: string;
  roles: string[];
  departments: string[];
  clearanceLevel: 'public' | 'internal' | 'confidential' | 'restricted';
  delegatedBy?: string; // Original user if the agent is acting on their behalf
}

// Map the hierarchical clearance levels to integers for range filtering
const CLEARANCE_ORDER = ['public', 'internal', 'confidential', 'restricted'] as const;

function clearanceLevelToInt(level: PermissionContext['clearanceLevel']): number {
  return CLEARANCE_ORDER.indexOf(level);
}

function buildRetrievalFilter(ctx: PermissionContext) {
  return {
    $and: [
      // Pre-filter: only chunks this user's clearance can access
      { clearanceLevel: { $lte: clearanceLevelToInt(ctx.clearanceLevel) } },
      // Pre-filter: department-scoped content
      {
        $or: [
          { department: { $in: ctx.departments } },
          { department: 'shared' },
        ],
      },
    ],
  };
}

// vectorDB and authService are assumed clients for your vector store
// and authorization service, respectively
async function retrieveWithPermissions(
  query: string,
  ctx: PermissionContext,
  topK: number = 20 // Over-fetch to account for post-filter drops
) {
  const filter = buildRetrievalFilter(ctx);
  const chunks = await vectorDB.query({ query, filter, topK });

  // Post-filter: fine-grained ReBAC check per chunk
  const authorized = await authService.batchCheck(
    chunks.map((c) => ({
      subject: ctx.userId,
      action: 'read',
      resource: c.metadata.resourceId,
    }))
  );
  return chunks.filter((_, i) => authorized[i]);
}
```

Row-Level Security for AI: What Databases Actually Support
Implementing data-layer enforcement that agents can't bypass
Row-level security (RLS) is the gold standard for data access control because enforcement happens at the database layer. The application — or in this case, the agent — physically cannot read rows it shouldn't see. There's no "forgot to add the filter" failure mode.
PostgreSQL has supported RLS since version 9.5, and it's the cleanest implementation for RAG pipelines that use pgvector or Supabase. You define policies that reference the current user's role or session variables, and the database automatically appends the filter to every query. The agent never constructs the filter itself — it just queries, and the database handles the rest.
Supabase takes this further with their RAG-specific pattern: each document chunk is stored with an owner_id column, and the RLS policy checks the authenticated user's JWT claims against the row[5]. When combined with their Edge Functions for the agent runtime, you get end-to-end permission propagation from the user's browser to the vector search to the LLM context.
`supabase/rls-policy.sql`

```sql
-- Enable RLS on the document chunks table
ALTER TABLE document_chunks ENABLE ROW LEVEL SECURITY;

-- Policy: users see only chunks from documents they own or that are shared
CREATE POLICY "Users read own or shared chunks"
ON document_chunks FOR SELECT
USING (
  owner_id = auth.uid()
  OR classification = 'shared'
  OR EXISTS (
    SELECT 1 FROM document_access_grants
    WHERE document_access_grants.document_id = document_chunks.document_id
      AND document_access_grants.grantee_id = auth.uid()
      -- Enumerate levels explicitly: comparing permission strings with >=
      -- is lexicographic and would wrongly exclude 'admin'
      AND document_access_grants.permission IN ('read', 'write', 'admin')
  )
);

-- Policy: department-scoped access via user metadata
CREATE POLICY "Department members read department docs"
ON document_chunks FOR SELECT
USING (
  department = (auth.jwt() -> 'app_metadata' ->> 'department')
  OR classification IN ('public', 'shared')
);

-- The agent queries normally — RLS handles filtering invisibly
-- SELECT * FROM document_chunks
-- ORDER BY embedding <=> query_embedding
-- LIMIT 20;
```

The Delegated Identity Pattern
Why agents should borrow user identity instead of holding their own permissions
The single most important architectural decision in permission-aware AI is whether your agent authenticates as itself or as the user it's acting for.
Service account model: The agent has its own identity with broad access. Application code filters results based on the requesting user's permissions. This is easier to implement, but it means the agent's credentials are a high-value target. If compromised, the attacker gets access to everything the agent can see — which is usually everything.
Delegated identity model: The agent receives a short-lived, scoped token that carries the requesting user's identity and permissions. Every downstream call — vector search, database query, API request — uses this token. The agent can only access what the user can access[10].
The delegated model is strictly better from a security standpoint, but it requires more infrastructure. You need a token exchange mechanism (OAuth2 token exchange, or a custom delegation service), and every system in the pipeline needs to honor the delegated token. Most vector databases don't natively support this yet, so you end up implementing it at the application layer with pass-through filters.
1. Extract the user's identity from the incoming request

```typescript
const userToken = request.headers.get('Authorization');
const claims = await verifyAndDecode(userToken);
const permCtx: PermissionContext = {
  userId: claims.sub,
  roles: claims.roles,
  departments: claims.departments,
  clearanceLevel: claims.clearance,
};
```

2. Exchange for a scoped, short-lived agent token

```typescript
const agentToken = await tokenExchange({
  subjectToken: userToken,
  scope: 'rag:retrieve',
  audience: 'vector-db',
  expiresIn: '5m', // Short-lived — one query cycle
});
```

3. Attach the permission context to every retrieval call

```typescript
const results = await vectorClient.query({
  embedding: queryEmbedding,
  filter: buildRetrievalFilter(permCtx),
  token: agentToken, // Carries user identity downstream
  topK: 25,
});
```

4. Audit every retrieval with the user's identity attached

```typescript
await auditLog.write({
  action: 'rag_retrieve',
  userId: permCtx.userId,
  agentId: agentConfig.id,
  chunksRetrieved: results.length,
  chunksAfterFilter: authorizedResults.length,
  query: redact(originalQuery),
  timestamp: new Date().toISOString(),
});
```
Five RBAC Traps That Break RAG Pipelines
Common mistakes when applying traditional access control to AI retrieval systems
Teams that bolt traditional RBAC onto RAG pipelines hit the same failure modes repeatedly. These aren't theoretical — they show up in production within weeks of deployment.
1. The God Service Account. Giving the agent a service account with admin-level database access because 'it needs to answer questions about anything.' One compromised credential exposes your entire knowledge base. Use delegated identity or scoped service accounts per department.

2. Stale Permission Metadata. Tagging chunks with permission metadata at ingestion time but never updating them when roles change. An employee moves from Engineering to Sales — the chunks ingested under their old department still show up in their new role's queries, and Engineering-restricted chunks they authored are now invisible to their former team.

3. Semantic Leakage Through Summaries. Your post-filter correctly blocks a restricted document, but the LLM has already seen it in a previous conversation turn and includes details in its summary. Permission filtering must happen before the LLM sees any context, not after generation.

4. Permission Explosion in Metadata Filters. Encoding every possible permission combination as metadata tags. A user with 5 roles across 3 departments and 4 project teams creates a filter clause with 60+ OR conditions. Vector database performance collapses. Use hierarchical permission levels (public > internal > confidential > restricted) as pre-filters, and fine-grained checks as post-filters.

5. Missing Audit Trail. Filtering chunks correctly but never logging what was filtered out. When a user reports 'the AI couldn't answer my question,' you have no way to tell whether it was a retrieval quality problem or a permission boundary working as intended. Log both the retrieved set and the authorized set.
Designing Permission Metadata at Ingestion Time
The taxonomy that makes pre-retrieval filtering actually work
Pre-retrieval filtering is only as good as the metadata you attach to each chunk during ingestion. Get the taxonomy wrong, and you're either over-restricting (agents can't find relevant content) or under-restricting (agents surface content users shouldn't see).
The trick is using layered classification rather than flat role tags. Three metadata dimensions cover the majority of enterprise permission patterns — though complex organizations with unusual access hierarchies may require additional dimensions.
Classification Level (vertical access)

- `public` — Available to all authenticated users and external-facing agents
- `internal` — Available to all employees but not external parties
- `confidential` — Restricted to specific departments or project teams
- `restricted` — Named individuals only, requires explicit grant

Department Scope (horizontal access)

- Tag each chunk with the owning department: engineering, sales, hr, finance, legal
- Use `shared` for cross-departmental content like company-wide policies
- Support multi-department tagging for chunks relevant to multiple teams

Temporal Validity (time-based access)

- `embargo_until` — Content not available before a specific date (e.g., earnings data, product announcements)
- `expires_at` — Content auto-restricted after a date (e.g., time-limited partnership terms)
- `review_by` — Flags content for permission re-evaluation on a schedule
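The temporal dimension also needs a query-time check, since a static metadata filter can't know the current date at ingestion. A minimal sketch, assuming the field names above; the function name is illustrative, and in production this belongs in the post-retrieval filter so embargoed chunks never reach the LLM:

```typescript
// Enforce embargo and expiry windows at query time.
// Field names follow the taxonomy above; ISO 8601 date strings assumed.
interface TemporalMetadata {
  embargoUntil?: string; // Hidden before this date
  expiresAt?: string; // Hidden on or after this date
}

function isTemporallyVisible(
  meta: TemporalMetadata,
  now: Date = new Date()
): boolean {
  if (meta.embargoUntil && now < new Date(meta.embargoUntil)) return false;
  if (meta.expiresAt && now >= new Date(meta.expiresAt)) return false;
  return true; // No temporal fields means no time-based restriction
}
```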
`ingestion/chunk-metadata.ts`

```typescript
interface ChunkPermissionMetadata {
  // Vertical access: hierarchical classification
  classification: 'public' | 'internal' | 'confidential' | 'restricted';
  // Horizontal access: department scope
  departments: string[]; // ['engineering', 'product'] or ['shared']
  projectTeams?: string[]; // Fine-grained project-level access
  // Temporal access
  embargoUntil?: string; // ISO date — chunk hidden before this date
  expiresAt?: string; // ISO date — chunk hidden after this date
  reviewBy?: string; // ISO date — flag for permission audit
  // Source tracking
  sourceDocumentId: string; // Link back to original document for ACL sync
  ownerId: string; // Creator for ownership-based policies
  lastAclSync: string; // ISO timestamp — when permissions were last refreshed
}

async function enrichChunkWithPermissions(
  chunk: RawChunk,
  sourceDoc: SourceDocument
): Promise<ChunkPermissionMetadata> {
  const docAcl = await aclService.getDocumentPermissions(sourceDoc.id);
  return {
    classification: docAcl.classification,
    departments: docAcl.departments,
    projectTeams: docAcl.projectTeams,
    embargoUntil: docAcl.embargoUntil,
    expiresAt: docAcl.expiresAt,
    reviewBy: docAcl.reviewBy,
    sourceDocumentId: sourceDoc.id,
    ownerId: sourceDoc.createdBy,
    lastAclSync: new Date().toISOString(),
  };
}
```

The ACL Sync Problem: Keeping Permissions Current
What happens when access changes but your embeddings don't know about it
Here is a scenario that breaks most permission-aware RAG implementations. A confidential engineering document is chunked, embedded, and tagged with classification: confidential, departments: ['engineering']. Three weeks later, the document is reclassified to internal because the project launched publicly. The original document's ACL is updated in the source system. But the 47 chunks in your vector database still carry the old confidential tag.
Now every non-engineering employee who asks about this feature gets no results, even though the information is public. Worse, if someone manually updates the document but creates new chunks, you have the same content with two different permission levels in your index.
ACL synchronization is a pipeline problem, not a one-time task. You need a mechanism that detects permission changes in source systems and propagates them to every chunk derived from the affected document.
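That mechanism can be sketched as a CDC-driven reconciliation step. The shapes below (`AclChangeEvent`, `StoredChunk`) are illustrative stand-ins for your change-feed payload and vector-index records, not a real API, and a plain array stands in for the index; in production this would be a bulk metadata update against the vector database:

```typescript
// A permission-change event from the source system's change feed (hypothetical shape)
interface AclChangeEvent {
  documentId: string;
  classification: 'public' | 'internal' | 'confidential' | 'restricted';
  departments: string[];
}

// A chunk record in the index, carrying the sourceDocumentId back-link
// that makes chunk-to-source mapping possible
interface StoredChunk {
  id: string;
  sourceDocumentId: string;
  metadata: {
    classification: string;
    departments: string[];
    lastAclSync: string;
  };
}

// Propagate a source-system permission change to every derived chunk.
// Returns the number of chunks touched so a reconciliation job can
// alert when a known document maps to zero chunks.
function propagateAclChange(
  event: AclChangeEvent,
  chunks: StoredChunk[]
): number {
  let updated = 0;
  for (const chunk of chunks) {
    if (chunk.sourceDocumentId !== event.documentId) continue;
    chunk.metadata.classification = event.classification;
    chunk.metadata.departments = [...event.departments];
    chunk.metadata.lastAclSync = new Date().toISOString();
    updated++;
  }
  return updated;
}
```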
Permission Boundaries in Multi-Agent Pipelines
When agents call other agents, whose permissions win?
The permission model gets significantly more complex when you move from a single agent to a multi-agent orchestration pipeline. Consider an orchestrator that decomposes a user's question and dispatches it to three specialized sub-agents: one queries the knowledge base, one queries the CRM, and one queries the financial system.
Each sub-agent talks to a different data source with different permission models. The knowledge base uses document-level ACLs. The CRM uses account-level visibility rules. The finance system uses row-level security tied to cost center hierarchies. The requesting user might have access in two of the three systems but not the third.
The question is: when the orchestrator synthesizes results from all three sub-agents, does it know that some data is missing because of permission boundaries? And does the final response reflect that gap honestly?
Two patterns handle this well.
Pattern 1: Permission-context propagation

- Orchestrator passes the user's permission context to every sub-agent
- Each sub-agent applies the same user's permissions in its data source
- Missing data is explicitly flagged in the sub-agent's response
- Orchestrator knows which sources were permission-limited and can disclose this
- Consistent with the principle of least privilege

Pattern 2: Per-agent service identity with post-synthesis filtering

- Each sub-agent has its own service identity with fixed permissions
- Results are pooled regardless of the requesting user's access level
- Requires a post-synthesis filter to remove unauthorized data from the final response
- Risk of information leakage during synthesis — the agent 'knows' restricted data even if it doesn't show it
- Simpler to implement but harder to audit
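The first pattern can be sketched as an orchestrator that forwards the user's context to every sub-agent and collects per-source permission flags. The sub-agent and result shapes are hypothetical, not a framework API:

```typescript
// Simplified user context (a trimmed-down PermissionContext)
interface UserContext {
  userId: string;
  departments: string[];
}

interface SubAgentResult {
  source: string;
  chunks: string[];
  permissionLimited: boolean; // Sub-agent flags when filtering removed data
}

// A sub-agent is any async function that retrieves under the *user's* context
type SubAgent = (query: string, ctx: UserContext) => Promise<SubAgentResult>;

async function orchestrate(
  query: string,
  ctx: UserContext,
  subAgents: SubAgent[]
): Promise<{ chunks: string[]; limitedSources: string[] }> {
  // Every sub-agent receives the user's context, never a service identity
  const results = await Promise.all(subAgents.map((agent) => agent(query, ctx)));
  return {
    chunks: results.flatMap((r) => r.chunks),
    // Surfaced to the response layer so the gap can be disclosed honestly
    limitedSources: results
      .filter((r) => r.permissionLimited)
      .map((r) => r.source),
  };
}
```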
Implementation Checklist for Permission-Aware RAG
The ordered steps for adding access controls to an existing retrieval pipeline
Permission-Aware RAG Rollout
1. Audit existing data sources: map every document collection to its current permission model
2. Define the permission metadata taxonomy: classification levels, department scopes, temporal rules
3. Implement metadata enrichment in the ingestion pipeline — every chunk gets permission tags
4. Add pre-retrieval filters to vector search queries using the permission metadata
5. Integrate a post-retrieval authorization service (Cerbos, OPA, or custom) for fine-grained checks
6. Implement delegated identity — agents carry user tokens, not their own service credentials
7. Build the ACL sync pipeline: CDC listeners, chunk-to-source mapping, periodic reconciliation
8. Add audit logging for every retrieval: user identity, chunks retrieved, chunks authorized, chunks filtered
9. Set up monitoring: filter-to-pass ratio, permission-denied query rate, ACL sync latency
10. Load test with realistic permission distributions — not just admin users
11. Red team the pipeline: attempt cross-department data access, privilege escalation, and prompt injection to bypass filters
Monitoring: How to Know Your Permissions Actually Work
The metrics that tell you whether access controls are enforced or just configured
Configured permissions and enforced permissions are not the same thing. You need observability that tells you, in real time, whether the access controls you designed are actually working in production.
Five metrics matter most.
| Metric | What It Tells You | Alert Threshold |
|---|---|---|
| Filter-to-pass ratio | Percentage of retrieved chunks that survive permission filtering. If it's consistently above 90%, your pre-filters might be too tight. If it's below 50%, they're too loose. | Below 50% or above 95% |
| Permission-denied query rate | Percentage of queries where the user received zero authorized chunks. Spikes indicate a permission misconfiguration or a real access gap. | Above 15% for any department |
| ACL sync latency | Time between a permission change in the source system and the update reaching chunk metadata. Anything over 5 minutes is a leakage window. | Above 5 minutes |
| Cross-boundary retrieval attempts | Cases where the agent attempted to retrieve chunks outside the user's permission scope. High rates suggest the agent is constructing queries that don't respect boundaries. | Any count above 0 in post-filter logs |
| Token expiry violations | Requests made with expired delegated tokens. Indicates the agent runtime isn't properly refreshing tokens. | Any count above 0 |
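The first metric falls directly out of the audit log described earlier. A minimal sketch of the filter-to-pass computation, with thresholds mirroring the alert column; the type and function names are illustrative:

```typescript
// One audit record per retrieval, matching the auditLog fields used earlier
interface RetrievalAudit {
  chunksRetrieved: number; // Before permission filtering
  chunksAuthorized: number; // After permission filtering
}

// Aggregate ratio over a window of retrievals
function filterToPassRatio(audits: RetrievalAudit[]): number {
  const retrieved = audits.reduce((s, a) => s + a.chunksRetrieved, 0);
  const authorized = audits.reduce((s, a) => s + a.chunksAuthorized, 0);
  return retrieved === 0 ? 1 : authorized / retrieved;
}

// Alert when pre-filters look too tight (> 0.95) or too loose (< 0.5)
function shouldAlert(ratio: number): boolean {
  return ratio < 0.5 || ratio > 0.95;
}
```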
Putting It Together: A Production Architecture
The full stack for permission-aware RAG from ingestion to response
Pulling all the patterns together, a production-grade permission-aware RAG system has five layers that work in sequence. The user's identity flows through every layer, and no layer trusts the one above it to have done the permission check correctly.
1. Ingestion Layer: Documents are chunked, embedded, and tagged with permission metadata derived from the source system's ACL. A CDC pipeline keeps chunk metadata synchronized with source permissions.

2. Identity Layer: The user authenticates at the application edge. Their JWT claims are extracted and packaged into a PermissionContext object that travels with every downstream call.

3. Pre-Retrieval Filter Layer: The vector search query includes metadata filters based on the PermissionContext. Classification level and department scope reduce the search space before semantic similarity runs.

4. Post-Retrieval Authorization Layer: Each retrieved chunk passes through a fine-grained authorization check against the full permission model (ReBAC, ABAC, or custom policies). Unauthorized chunks are removed before LLM context assembly.

5. Context Assembly and Response Layer: Only authorized chunks enter the LLM context window. The response includes a transparency signal if permission boundaries limited the available information.
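The layer-5 transparency signal can be sketched as follows. The types and wording are illustrative; the disclosure is deliberately vague so that it acknowledges the boundary without revealing what sits behind it:

```typescript
// Output of the post-retrieval authorization layer (hypothetical shape)
interface FilteredRetrieval {
  authorizedChunks: string[];
  filteredCount: number; // Chunks removed by permission checks
}

function buildResponseContext(r: FilteredRetrieval): {
  context: string;
  disclosure?: string;
} {
  return {
    // Only authorized chunks ever reach the LLM context window
    context: r.authorizedChunks.join('\n---\n'),
    // Vague by design: never name the documents that were withheld
    disclosure:
      r.filteredCount > 0
        ? 'Some information may not be available based on your access level.'
        : undefined,
  };
}
```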
Can I use row-level security with vector databases that aren't PostgreSQL?
Most dedicated vector databases (Pinecone, Weaviate, Milvus, Qdrant) support metadata filtering, which gives you pre-retrieval access control. Milvus added row-level RBAC with bitmap indexing. But true database-enforced RLS — where the database itself prevents unauthorized reads regardless of the query — is still strongest in PostgreSQL with pgvector. If you're using a dedicated vector DB, plan for application-layer enforcement via post-retrieval filtering.
What's the performance impact of adding permission filters to vector search?
Pre-retrieval metadata filters typically add 5-15% latency to vector search queries, depending on the selectivity of the filter. Highly selective filters (restricting to one department out of twenty) can actually improve performance by reducing the search space. Post-retrieval authorization adds a batch call to your auth service — expect 10-50ms per batch depending on the number of chunks and the auth service's architecture. The total overhead is usually under 100ms, which is negligible compared to LLM inference time.
How do I handle permissions for summarized or derived content?
This is one of the hardest problems. If an agent summarizes 10 chunks and 3 of them are later reclassified as restricted, the summary is tainted. Two approaches: (1) store provenance metadata linking every generated summary to its source chunks, and revalidate permissions when the summary is retrieved, or (2) give summaries the most restrictive classification of any source chunk and re-summarize when source permissions change.
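Approach (2) can be sketched as a fold over the source chunks' classifications, using the article's four-level hierarchy; the function name is illustrative:

```typescript
// Ordered from least to most restrictive, matching the taxonomy above
const LEVELS = ['public', 'internal', 'confidential', 'restricted'] as const;
type Classification = (typeof LEVELS)[number];

// A derived summary inherits the most restrictive classification
// among its source chunks. Fail closed: no traceable sources means
// the tightest level.
function derivedClassification(sources: Classification[]): Classification {
  if (sources.length === 0) return 'restricted';
  const maxIdx = Math.max(...sources.map((c) => LEVELS.indexOf(c)));
  return LEVELS[maxIdx];
}
```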
Should I tell the user when permission boundaries limited their results?
Yes, but carefully. Say 'Some information may not be available based on your access level' — not 'You don't have access to 3 confidential engineering documents about Project X.' The second version leaks information about what exists. Acknowledge the boundary without revealing what's behind it.
A note on compliance frameworks
SOC 2 Type II, HIPAA, and GDPR all require demonstrable access controls on data used by automated systems. Permission-aware RAG isn't just good engineering — it's increasingly a compliance requirement. The audit logging patterns described in this article map directly to SOC 2 CC6.1 (logical access controls) and CC7.2 (monitoring system activity). If your organization handles protected health information, HIPAA's minimum necessary standard applies to AI agent retrieval as well — the agent should only access the minimum data needed to answer the query.
- [1] Couchbase: Securing Agentic RAG Pipelines (couchbase.com)
- [2] Zilliz: Fine-Grained Access Control with Milvus Row-Level RBAC (zilliz.com)
- [3] Elastic: RAG and RBAC Integration (elastic.co)
- [4] Pinecone: RAG Access Control (pinecone.io)
- [5] Supabase: RAG with Permissions (supabase.com)
- [6] Cerbos: Access Control for RAG and LLMs (cerbos.dev)
- [7] Cloud Security Alliance: Organizations Cannot Distinguish AI Agent from Human Actions, March 2026 (cloudsecurityalliance.org)
- [8] WorkOS: AI Agent Access Control (workos.com)
- [9] Descope: ReBAC for RAG Pipelines (descope.com)
- [10] Oso: AI Agent Permissions and Delegated Access (osohq.com)