Postgres RLS for multi-tenant SaaS — the configuration that actually works

26 May, 2026

Application-layer authorization is necessary but not sufficient. A single bug in the boundary function leaks tenant data. A future engineer refactors the check out without realizing what it was protecting. Someone adds an internal admin tool that bypasses the application checks entirely because "it's just for us." Any one of these is a Tuesday at most multi-tenant SaaS companies, and any one of them ships unless the database refuses the wrong query on its own.

That's the case for Row-Level Security. RLS is a Postgres feature that lets you write policies on a table: predicates the database evaluates on every read and write to decide which rows the current session is allowed to see or change. Configured well, it's a second lock on the same door: the application layer says "this user can access this resource," and the database independently checks "yes, that resource belongs to this session's tenant." Both have to agree before data flows.

I think the reason RLS doesn't get used more often is not that it's obscure — it's that almost every tutorial gets the configuration just slightly wrong, and the slightly-wrong version is the worst of both worlds: you get the operational complexity of RLS without the actual isolation guarantee. Below is the configuration that actually works, the way I run it on Allset, and the bugs to avoid.

The connection-boundary contract

RLS policies need to know "who is the current tenant," and the only correct place to set that is at the connection boundary, before any application query runs. The pattern:

SET LOCAL app.current_tenant_id = '<uuid>';

SET LOCAL is critical. It scopes the setting to the current transaction, so a pooled connection can't carry one request's tenant ID into the next. If you SET without LOCAL, the setting persists for the rest of the session, and the next request on the pooled connection inherits the previous tenant's identity. That's a multi-tenant leak with extra steps. The flip side is that SET LOCAL only takes effect inside a transaction block; issued outside one, Postgres emits a warning and the setting does nothing.

In application code, this means every request handler — without exception — opens a transaction and issues the SET LOCAL before any business query. The pool needs a hook that enforces this. If your ORM doesn't have one, write the middleware yourself. Skipping the SET LOCAL on even one code path is the kind of bug that ships green and breaks a customer six months later, exactly like the application-layer bugs RLS is supposed to defend against.
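
One detail that matters when you write that hook: SET does not accept bind parameters, so from a driver it is usually easier to call the built-in set_config() function, whose third argument makes the value transaction-local in the same way SET LOCAL does. A minimal sketch of what a single request would issue; the UUID and the id and name columns are placeholders, not part of any real schema:

```sql
BEGIN;

-- Same effect as SET LOCAL, but callable with a bind parameter from a driver.
-- The third argument (is_local = true) scopes the value to this transaction.
SELECT set_config('app.current_tenant_id',
                  '11111111-1111-1111-1111-111111111111',
                  true);

-- Business queries run here, inside the same transaction.
SELECT id, name FROM resources;   -- hypothetical columns

COMMIT;

-- After COMMIT or ROLLBACK the transaction-local value is gone, so the next
-- request on this pooled connection has to set its own tenant.
```

The point of keeping everything between BEGIN and COMMIT is that the tenant setting and the queries it protects share one lifetime.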

The policy then reads the setting:

CREATE POLICY tenant_isolation ON resources
  USING (tenant_id = current_setting('app.current_tenant_id')::uuid);

current_setting() returns the value currently in effect for the named setting, which inside the request's transaction is whatever SET LOCAL just put there. The policy says: a row is visible to the current session only if its tenant_id matches. The database refuses everything else, regardless of what the application asked for.
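
The policy depends on two things that are easy to overlook: row security has to be enabled on the table, and the setting has to be present. A short sketch against the same resources table, showing how to enable it and how current_setting() behaves when nothing has been set:

```sql
-- A policy can be created at any time, but it is not applied until row
-- security is enabled on the table.
ALTER TABLE resources ENABLE ROW LEVEL SECURITY;

-- One-argument form: raises an error if the parameter has never been set in
-- this session, so a request that skipped SET LOCAL fails instead of reading
-- another tenant's rows.
SELECT current_setting('app.current_tenant_id');

-- Two-argument form (missing_ok = true): returns NULL instead of raising when
-- the parameter has never been set. Handy for diagnostics and logging.
SELECT current_setting('app.current_tenant_id', true);
```

On a connection where an earlier transaction used SET LOCAL, the value may read back as an empty string rather than unset; the ::uuid cast in the policy rejects that too, so the failure mode is still an error rather than a leak.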

The policy that doesn't lie

Three bugs I see in RLS configurations more often than I'd like.

Specify WITH CHECK explicitly, even when it's identical to USING. USING controls what rows are visible on reads (and which existing rows can be updated or deleted); WITH CHECK controls what values new or modified rows can take. Postgres lets you write a policy with only USING and quietly uses that expression as the write-time check too — the docs say "if no WITH CHECK expression is defined, then the USING expression will be used both to determine which rows are visible and which new rows will be allowed to be added." For simple tenant-isolation that's usually fine, and many tutorials skip WITH CHECK for exactly that reason.

The reason to specify it anyway is reasoning load, not correctness. The implicit reuse makes the policy harder to think about, especially on UPDATE, where USING evaluates the old row and WITH CHECK evaluates the new one. Spell it out:

CREATE POLICY tenant_isolation ON resources
  USING (tenant_id = current_setting('app.current_tenant_id')::uuid)
  WITH CHECK (tenant_id = current_setting('app.current_tenant_id')::uuid);

Same expression, twice. The duplication is the point — it removes the "wait, what does this policy do on INSERT?" question from every future code review.
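
To see the write-side check at work, here is a sketch of a session trying to insert a row for a different tenant. It assumes the policy above is in place, the session runs as a role that is subject to RLS, and resources has a name column (hypothetical) and a default for its primary key:

```sql
BEGIN;
SELECT set_config('app.current_tenant_id',
                  '11111111-1111-1111-1111-111111111111',
                  true);

-- Allowed: the new row's tenant_id matches the session's tenant.
INSERT INTO resources (tenant_id, name)
VALUES ('11111111-1111-1111-1111-111111111111', 'ok');

-- Rejected by WITH CHECK: the row would belong to a different tenant.
INSERT INTO resources (tenant_id, name)
VALUES ('22222222-2222-2222-2222-222222222222', 'not ok');
-- ERROR:  new row violates row-level security policy for table "resources"

ROLLBACK;
```

An UPDATE that tries to change tenant_id to another tenant's value fails the same check, because WITH CHECK is evaluated against the new version of the row.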

Leaving the application role as BYPASSRLS or SUPERUSER. Both attributes silently disable RLS for the role. Verify with \du in psql — every role your application connects with should show neither attribute. The postgres superuser bypasses RLS by design; that's fine for migrations and ops work, but the role your application uses must not.
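
If you would rather check this from a script or a CI job than from psql, the same attributes are exposed in the pg_roles catalog view. A sketch, where app_user is a hypothetical name standing in for whatever role your application connects as:

```sql
-- Both columns should be false for the application role.
SELECT rolname, rolsuper, rolbypassrls
FROM pg_roles
WHERE rolname = 'app_user';

-- If either one is true, clear it (requires sufficient privileges).
ALTER ROLE app_user NOSUPERUSER NOBYPASSRLS;
```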

Forgetting FORCE ROW LEVEL SECURITY. Without it, the table owner bypasses RLS even if the owner role isn't superuser. In practice this matters when your migration role owns the tables and also runs ad-hoc queries through the same connection. The fix is one statement per table:

ALTER TABLE resources FORCE ROW LEVEL SECURITY;

You want this. Without it, your audit-grade isolation claim has a quiet exception in it.
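
Since it is one statement per table, it is also easy to miss one. A sketch of a catalog query that lists ordinary tables where row security is either not enabled or not forced; it assumes your tenant tables live in the public schema:

```sql
SELECT c.relname,
       c.relrowsecurity      AS rls_enabled,
       c.relforcerowsecurity AS rls_forced
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'        -- assumption: adjust to your schema
  AND c.relkind = 'r'             -- ordinary tables only
  AND NOT (c.relrowsecurity AND c.relforcerowsecurity)
ORDER BY c.relname;
```

Tables that are deliberately global (lookup tables, for instance) will show up here too, so treat the output as a list to review rather than a list of failures.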

The migration plan for an existing schema

Adding RLS to a schema that's already in production is more work than greenfield, but the playbook is the same every time.

First, backfill tenant_id on every table that needs one. The hard part is usually not the backfill itself but identifying which tables actually need it — anything joined through a tenant-scoped relationship counts, not just tables with obvious tenant data on them. A 30-minute audit of the schema before you start saves a week of "oh, this one too" surprises mid-migration.
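
A starting point for that audit is to list every table that has no tenant_id column at all. This sketch assumes the public schema and the column name used throughout this post; it cannot tell you which of those tables are tenant-scoped through a join, so it narrows the list rather than answering the question:

```sql
SELECT t.table_name
FROM information_schema.tables t
WHERE t.table_schema = 'public'          -- assumption: adjust to your schema
  AND t.table_type = 'BASE TABLE'
  AND NOT EXISTS (
        SELECT 1
        FROM information_schema.columns c
        WHERE c.table_schema = t.table_schema
          AND c.table_name   = t.table_name
          AND c.column_name  = 'tenant_id'
      )
ORDER BY t.table_name;
```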

Second, enable RLS in two phases. Phase one: ENABLE ROW LEVEL SECURITY with an allow-all policy, USING (true), then deploy the application-layer SET LOCAL plumbing and verify in staging that every code path sets the tenant correctly. Phase two: replace the allow-all policy with the actual tenant-isolation policy and add FORCE ROW LEVEL SECURITY. (I'm using "allow-all" rather than "permissive" on purpose: Postgres has its own AS PERMISSIVE and AS RESTRICTIVE policy types, which are a separate concept.) If you skip phase one and go straight to the isolation policy, the first code path that forgot the SET LOCAL errors out in production instead of staging, which is the wrong order.
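
As a sketch, the two phases for a single table could look like the following. The policy name allow_all is my own placeholder; the rest reuses the statements from earlier in this post:

```sql
-- Phase one: row security is on, but the only policy lets everything through.
ALTER TABLE resources ENABLE ROW LEVEL SECURITY;
CREATE POLICY allow_all ON resources
  USING (true)
  WITH CHECK (true);

-- Phase two, once every code path sets the tenant: swap the policies in one
-- transaction so there is no window with no policy at all.
BEGIN;
DROP POLICY allow_all ON resources;
CREATE POLICY tenant_isolation ON resources
  USING (tenant_id = current_setting('app.current_tenant_id')::uuid)
  WITH CHECK (tenant_id = current_setting('app.current_tenant_id')::uuid);
ALTER TABLE resources FORCE ROW LEVEL SECURITY;
COMMIT;
```

The transaction in phase two matters because a table with row security enabled and no policies denies all access to roles that are subject to RLS; dropping and creating in separate steps would briefly lock every tenant out.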

Third, test with realistic multi-tenant data. A staging environment with two tenants and ten rows per table doesn't surface the bugs. A staging environment with twenty tenants and reasonable row volumes does — and it surfaces them while you can still fix them.
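
One check I would run against that staging data, as a sketch: connect as the application role, set a tenant, and group whatever is visible by tenant_id. The UUID is a placeholder for one of your staging tenants:

```sql
BEGIN;
SELECT set_config('app.current_tenant_id',
                  '11111111-1111-1111-1111-111111111111',
                  true);

-- With the isolation policy in force, this should return at most one group:
-- the session's own tenant. Any other tenant_id in the output means the
-- session is bypassing the policy (owner without FORCE, BYPASSRLS, superuser).
SELECT tenant_id, count(*)
FROM resources
GROUP BY tenant_id;

ROLLBACK;
```

Repeating it for each tenant-scoped table, and for more than one tenant, is what makes the larger staging dataset worth building.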

The performance worry is mostly wrong

The objection I hear most often: "won't RLS slow every query down?" The honest answer is no, with one caveat.

Policy evaluation is effectively free if your indexes include tenant_id as the leading column. Postgres pushes the policy predicate into the query plan; the planner uses the index the same way it would for an explicit WHERE tenant_id = ? clause. You're not paying for an extra scan; you're paying for one extra equality check on the index lookup, which is negligible.

The caveat is when your indexes don't include tenant_id. Without that, RLS adds a filter step on top of whatever scan the planner picks, which is fine on small tables and painful on large ones. The rule of thumb is to include tenant_id in your indexes on tenant-scoped tables, typically as the leading column unless the access pattern points elsewhere. Most schemas need 30 minutes of index work; very few need more.
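
A sketch of that index work for one table, followed by a way to confirm the planner is using it. The created_at column and the index name are assumptions for illustration; substitute whatever your queries actually sort or filter on:

```sql
-- Composite index with tenant_id leading.
-- CONCURRENTLY avoids blocking writes, but cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY resources_tenant_created_idx
  ON resources (tenant_id, created_at);

-- Run this part as the application role so the policy applies to the query.
BEGIN;
SELECT set_config('app.current_tenant_id',
                  '11111111-1111-1111-1111-111111111111',
                  true);

EXPLAIN
SELECT *
FROM resources
ORDER BY created_at DESC
LIMIT 50;

ROLLBACK;
```

In the plan, the thing to look for is the tenant_id condition appearing as an index condition rather than as a filter on top of a sequential scan. Checking as a superuser or as the table owner without FORCE would be misleading, because the policy is not applied to those sessions.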

The other place performance shows up is cross-tenant joins — joins between tenant-scoped tables where the joining session has access to both. These are rare in well-modeled multi-tenant schemas (most queries are tenant-scoped, not tenant-spanning), but worth identifying before you enable RLS so you don't get surprised by query plans that suddenly include extra scans.

The take

RLS is one of those features that's free to add on day one and expensive to add on day three hundred — same shape as cost-attribution tags, S3 lifecycle policies, every cross-cutting concern that lives outside any feature spec. The cost isn't in the configuration. It's in the discipline of always opening a transaction with SET LOCAL before any query.

The benefit shows up on the day the application-layer authorization fails. Either a bug shipped, or a backdoor opened, or a refactor moved the check somewhere it doesn't get called from anymore — every multi-tenant SaaS that runs long enough sees one of these. The version with RLS contains the damage to whatever single bug fired. The version without it sends the breach notification.

If your multi-tenant SaaS doesn't have RLS configured today, the meeting that puts it on the schedule is the lever.