From RBAC to ReBAC: Implementing Zanzibar-Style Authorization with SpiceDB and Cedar in 2025
Modern applications are collaborative, multi-tenant, and context-aware. A simple “role” attached to a user rarely captures who can see or change a resource. Google’s Zanzibar system (2019) popularized Relationship-Based Access Control (ReBAC) for global-scale apps (Drive, Photos, YouTube), and since then practical, production-grade implementations like SpiceDB have made Zanzibar-like authorization accessible to most teams.
This article is a hands-on, opinionated guide to getting from RBAC to ReBAC in 2025 using SpiceDB for relationships and Cedar/OPA for contextual policy. We’ll cover:
- Why RBAC breaks in modern collaborative apps
- ReBAC 101: Zanzibar’s model in practice
- Building schemas, tuples, and caveats in SpiceDB
- Integrating ABAC policy with Cedar or OPA
- Caching, consistency, and zed tokens
- Migration strategies from RBAC
- Multi-tenant patterns
- Testing and performance pitfalls
I’ll include concrete schemas, code snippets, and field-tested patterns you can apply immediately.
Executive summary
- Use SpiceDB as an authorization data plane for relationships. Keep your app logic clean: ask it “Can Alice edit doc123?” and cache answers smartly.
- Keep contextual conditions (time, risk signals, MFA state) in a separate policy layer (Cedar or OPA). Combine decisions deterministically.
- Start your migration by mapping roles to relationships, shadow-run, and prove parity with differential tests before flipping traffic.
- Avoid list-heavy endpoints over the authz layer; prefer lookup APIs designed for that purpose and precompute indexes when needed.
- Measure the graph: tuple counts, degree distributions, and max path depth. Cap depth. Watch dispatch latencies under load.
Why RBAC struggles in 2025
RBAC prescribes that users get roles, and roles map to permissions. It's simple and fast, but coarse.
Where RBAC falls short:
- Resource-centric sharing: “Share document with Bob and the Marketing group” doesn’t map cleanly to global roles.
- Hierarchies and inheritance: A folder’s permission flows down to its children. RBAC needs brittle, app-specific logic to mimic this.
- Cross-tenant collaboration: RBAC encourages global roles; multi-tenant collaboration needs resource-scoped grants.
- Contextual access: Conditions like “only during business hours,” “requires MFA for delete,” or “deny if risk score high” are not role-like.
What teams do: build a custom hybrid of RBAC, ACLs, and ad-hoc checks scattered across code. It works—until it doesn’t. Auditing becomes painful and changes become risky. ReBAC (relationships on resources) gives you a principled model with purpose-built infra.
ReBAC 101: Zanzibar in brief
Zanzibar models permissions as set operations over relations between subjects and resources. The core pieces:
- Objects: Users, groups, folders, documents, organizations, tenants, etc.
- Relations: Named edges like viewer, editor, owner, member.
- Tuples: Facts of the form object#relation@subject (e.g., document:doc1#viewer@user:alice).
- Permissions: Algebra over relations and traversals (union, intersection, exclusion; transitive edges like parent->viewer).
- Consistency: Tokens (revisions) let you choose between fully consistent and low-latency views.
Zanzibar's results: the paper reports millions of authorization checks per second across Google services with low, predictable latency, achieved by treating authz as a first-class global system. SpiceDB generalizes the model for everyone else.
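To make the set semantics concrete, here is a toy in-memory evaluator (illustrative Go, not the SpiceDB API) that answers a Check by following direct tuples plus a single transitive `parent` edge:

```go
package main

import "fmt"

// A relationship tuple: object#relation@subject. The subject may itself be
// a reference like "group:eng#member"; this toy treats subjects as strings.
type Tuple struct{ Object, Relation, Subject string }

// check answers "does subject have relation on object?" by scanning direct
// tuples and recursing through parent edges, mirroring Zanzibar's set
// semantics in miniature (no caveats, no cycle detection).
func check(tuples []Tuple, object, relation, subject string) bool {
	for _, t := range tuples {
		if t.Object == object && t.Relation == relation && t.Subject == subject {
			return true
		}
		// Transitive edge: access on the parent implies access on the child.
		if t.Object == object && t.Relation == "parent" {
			if check(tuples, t.Subject, relation, subject) {
				return true
			}
		}
	}
	return false
}

func main() {
	tuples := []Tuple{
		{"folder:f1", "viewer", "user:alice"},
		{"document:d1", "parent", "folder:f1"},
	}
	// Alice is a viewer of the folder, so she can view the document via parent.
	fmt.Println(check(tuples, "document:d1", "viewer", "user:alice"))
}
```

Real engines replace this linear scan with indexed storage, caching, and distributed dispatch, but the algebra is the same.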
References:
- Zanzibar paper: “Zanzibar: Google’s Consistent, Global Authorization System” (USENIX ATC 2019)
- SpiceDB (Authzed): https://github.com/authzed/spicedb
Architecture: a practical split of concerns
A clean, evolvable architecture typically looks like this:
- Data-plane for relationships: SpiceDB clusters connected to your primary data store. Your app calls Check, Expand, and Lookup APIs via gRPC/HTTP.
- Policy-plane for context: Cedar or OPA sidecar/service evaluates ABAC conditions using request/environment attributes.
- Orchestrator: Your app composes responses: allow = SpiceDB says allowed AND policy says allowed.
Data flow per request:
- Resolve principal, resource, and action.
- Ask SpiceDB: “Is subject S a member of permission P on resource R?” (Check)
- If allowed so far, ask the policy engine: “Are all contextual constraints satisfied?” (Cedar/OPA)
- Combine deterministically (usually logical AND). Log and meter both outcomes.
This separation lets relationship evolution and contextual policy evolve independently and be tested/validated by different teams.
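The combination step can be captured in a few lines; the types here are illustrative, not a real SDK:

```go
package main

import "fmt"

// Decision records the outcome of each layer so both can be logged and
// metered independently, as recommended above.
type Decision struct {
	RelationshipAllowed bool // SpiceDB Check outcome
	PolicyAllowed       bool // Cedar/OPA outcome
}

// Allow combines the layers deterministically: both must permit.
func (d Decision) Allow() bool {
	return d.RelationshipAllowed && d.PolicyAllowed
}

func main() {
	d := Decision{RelationshipAllowed: true, PolicyAllowed: false}
	fmt.Printf("relationship=%v policy=%v final=%v\n",
		d.RelationshipAllowed, d.PolicyAllowed, d.Allow())
}
```

Keeping the combinator this small makes the decision auditable: any allow can be traced to exactly two logged sub-decisions.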
Modeling a realistic app: schema, tuples, and caveats in SpiceDB
We’ll model a collaborative documents app with organizations, folders, documents, users, and groups. We’ll show:
- Relations for org membership
- Inheritance from folders to documents
- Direct shares to users and groups
- Admins overriding, and caveats for time-based constraints
SpiceDB schema (schema-language syntax current as of 2025):

```spicedb
definition user {}

definition group {
  relation member: user | group#member
}

definition organization {
  relation admin: user | group#member
  relation member: user | group#member
  permission manage = admin

  // Targets for the parent-> arrows below: arrows like parent->read
  // must resolve on every possible parent type.
  permission read = admin
  permission write = admin
}

definition folder {
  relation parent: organization | folder
  relation owner: user | group#member
  relation editor: user | group#member | group#member with mfa_required
  relation viewer: user | group#member | organization#member

  // Derived permissions
  permission read = viewer + editor + owner + parent->read
  permission write = editor + owner + parent->write
  permission admin = owner + parent->admin
}

definition document {
  relation parent: folder
  relation owner: user | group#member
  relation editor: user | group#member
  relation viewer: user | user with during_business_hours | group#member | organization#member

  permission read = viewer + editor + owner + parent->read
  permission write = editor + owner + parent->write
  permission admin = owner + parent->admin
}

// An optional caveat for "business hours" access
caveat during_business_hours(time: map<any>) {
  time.hour >= 9 && time.hour < 17 && time.tz == "UTC"
}

// A caveat requiring the request to assert MFA
caveat mfa_required(user: map<any>) {
  user.mfa_authenticated == true
}
```
Notes:
- Union (+), intersection (&), and exclusion (-) compose permissions.
- Traversal via parent->read propagates access through the hierarchy.
- Group membership is recursive (group nesting), but you should cap depth for performance.
- Caveats use CEL-like expressions in SpiceDB. They’re evaluated at Check time when you provide context.
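The depth cap mentioned for nested groups can be sketched as a bounded recursive resolver (a hypothetical helper, not SpiceDB code):

```go
package main

import "fmt"

// isMember resolves nested group membership with a hard depth cap, the
// guard recommended above for production graphs. groups maps a group name
// to its direct members, which may be users or other groups.
func isMember(groups map[string][]string, group, user string, maxDepth int) bool {
	if maxDepth < 0 {
		return false // depth cap reached: conservatively not a member
	}
	for _, m := range groups[group] {
		if m == user {
			return true
		}
		// Recurse only into members that are themselves groups.
		if _, isGroup := groups[m]; isGroup && isMember(groups, m, user, maxDepth-1) {
			return true
		}
	}
	return false
}

func main() {
	groups := map[string][]string{
		"marketing": {"design", "user:bob"},
		"design":    {"user:dana"},
	}
	fmt.Println(isMember(groups, "marketing", "user:dana", 4)) // nested membership resolves
	fmt.Println(isMember(groups, "marketing", "user:dana", 0)) // cut off by the cap
}
```

Note the cap also bounds runaway recursion if someone accidentally creates a group cycle.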
Tuples (facts) example:
```text
# Memberships
organization:acme#member@user:alice
organization:acme#member@group:marketing#member
group:marketing#member@user:bob

# Folder and document structure
folder:f123#parent@organization:acme
folder:f123#owner@user:carol
document:d1#parent@folder:f123

# Sharing
document:d1#viewer@user:alice

# Conditional share: Bob can view only during business hours
document:d1#viewer@user:bob[during_business_hours]

# Require MFA for write through parent editor
folder:f123#editor@group:marketing#member[mfa_required]
```
Checking with a context that satisfies caveats:
```bash
# Using zed (the Authzed CLI) to check access:
# zed permission check <resource> <permission> <subject>
zed permission check document:d1 read user:bob \
  --caveat-context '{"time": {"hour": 10, "tz": "UTC"}}'
```
If you omit or violate the caveat context, access will be conditional or denied.
Avoiding tuple explosion
Pitfall: many direct shares produce huge tuple counts. Prefer inheritance and groups:
- Grant at the folder level; let documents inherit.
- Prefer group grants over per-user grants.
- Use organization#member as a coarse-grained fallback, but monitor reach.
Measure tuple counts and degrees in production; consider quotas to prevent runaway shares.
Integrating Cedar or OPA for contextual ABAC
SpiceDB caveats can handle light context (time windows, MFA). For richer policy governance—audits, versioning, code review—you may prefer a dedicated policy engine.
Two common patterns:
- SpiceDB first, then Cedar/OPA
- Check relationship membership in SpiceDB.
- If allowed, evaluate ABAC constraints in Cedar or OPA.
- Final allow = membership AND context.
- Cedar/OPA gates SpiceDB invocation
- Evaluate cheap rejects first (deny outside tenant boundary, high risk score, etc.).
- Only call SpiceDB if policy permits.
Pattern 1 is safer: you avoid leaking existence of resources by proving membership first.
Cedar example
Cedar is a policy language with a strong type model and a formal semantics developed by AWS. It’s a great fit for resource and action schemas with conditionals.
Cedar schema snippet:
```cedar
// Entities
entity User {
  mfa_authenticated: Bool
};
entity Org;
entity Resource {
  owner: User,
  tenant: Org,
  classification: String
};

// Actions (context attributes are declared per action)
action "doc:read" appliesTo {
  principal: [User],
  resource: [Resource],
  context: { time: { hour: Long, tz: String }, risk_score: Long }
};
action "doc:write" appliesTo {
  principal: [User],
  resource: [Resource],
  context: { time: { hour: Long, tz: String }, risk_score: Long }
};
```
Cedar policies:
```cedar
// Require MFA for write of sensitive documents
permit(principal, action == Action::"doc:write", resource)
when { resource.classification == "sensitive" }
when { principal.mfa_authenticated == true };

// Allow read during business hours only (if desired globally)
permit(principal, action == Action::"doc:read", resource)
when {
  context.time.hour >= 9 &&
  context.time.hour < 17 &&
  context.time.tz == "UTC"
};

// Deny examples: Cedar supports explicit forbid, which overrides permits
forbid(principal, action, resource)
when { context.risk_score >= 80 };
```
Composition in code (Go):
```go
type AuthzInput struct {
	Principal string
	Resource  string
	Action    string
	Context   map[string]any
}

func Authorize(ctx context.Context, in AuthzInput) (bool, error) {
	// 1) SpiceDB relationship check (zed token ignored here; see the
	//    consistency section for propagating it)
	allowed, _, err := spicedb.Check(ctx, in.Principal, in.Action, in.Resource, in.Context)
	if err != nil || !allowed {
		return false, err
	}

	// 2) Cedar contextual evaluation
	decision, err := cedarEngine.Evaluate(ctx, in.Principal, in.Action, in.Resource, in.Context)
	if err != nil {
		return false, err
	}
	return decision.Allow, nil
}
```
Why both? SpiceDB models who is related to what; Cedar expresses when those relationships are usable. Keep logic in Cedar for evolving business constraints; keep relationships in SpiceDB for correctness, performance, and explainability.
OPA (Rego) alternative
OPA is a general-purpose policy engine. Example Rego snippet:
```rego
package authz

# Default deny
default allow = false

# Final decision: at least one grant fires and no deny overrides it
allow {
	grant
	not deny
}

# Require MFA for write if sensitive
grant {
	input.action == "doc:write"
	input.resource.classification == "sensitive"
	input.principal.mfa_authenticated == true
}

# Allow read during business hours
grant {
	input.action == "doc:read"
	input.context.time.tz == "UTC"
	input.context.time.hour >= 9
	input.context.time.hour < 17
}

# Risk-based deny overrides any grant
deny {
	input.context.risk_score >= 80
}
```

(Note: writing `allow = false { ... }` alongside `allow { ... }` rules would make the two complete rules conflict at evaluation time; splitting into `grant` and `deny` keeps the override explicit and conflict-free.)
If you already run OPA for admission control or API gateways, reusing it for ABAC is pragmatic. Cedar offers tighter typing and built-in resource/action modeling; either works.
Consistency, caching, and tokens (Zanzibar-style)
Authorization correctness depends on read-your-writes and predictable staleness. Zanzibar introduced “zookies” (SpiceDB calls them zed tokens) to carry a point-in-time revision of the graph.
SpiceDB consistency options per request:
- Fully consistent: Evaluate at the current head revision. Highest latency.
- Minimize latency: Allow slightly stale reads from replicas without a token. Lowest latency.
- At least as fresh as token: Caller passes a zed token from a prior write or read, ensuring monotonicity.
Practical guidance:
- On write (e.g., share doc, change group), capture the returned zed token and pass it to subsequent checks in the same user flow. This gives read-your-writes UX.
- For idempotent GETs without recent writes, use MinimizeLatency for p50 speed.
- For critical admin operations (e.g., revocation), use FullyConsistent or supply the zed token from the mutation.
Caching patterns:
- Client-side LRU keyed by (subject, permission, resource, tenant, token-hash).
- Short TTLs: 30–300ms in hot paths, or token-bound caches.
- Populate caches through batching: use BulkCheck or parallelization to amortize network latency.
Server-side watches:
- SpiceDB’s Watch API streams tuple updates. Sidecars can invalidate caches regionally.
- Maintain per-tenant invalidation channels if tenants are isolated in storage.
Do not cache negative authorizations long without a token bound; revocations and new grants will make negatives stale.
Query patterns: Check, Expand, LookupResources
- Check: boolean membership. Use it for point decisions: “Can Alice edit doc123?”
- Expand: expand a permission set into a tree explaining why access is granted. Use for audits and debugging.
- LookupResources ("what resources can I access?"): expensive queries that list the resources on which a subject holds a permission. Use sparingly; pre-index when possible.
Anti-patterns:
- Listing resources by fetching them all and then calling Check per resource. This explodes into N checks. Prefer LookupResources or maintain a separate search index keyed by ACL filters.
- Using Expand on hot paths. Expand is for introspection, not request-time gating.
Example: look up the documents a user can read. The LookupResources API has no subtree filter, so restricting results to a folder subtree is done in the application or a search index:

```bash
zed permission lookup-resources document read user:alice \
  --consistency-at-least "$ZED_TOKEN"
```
If your product requires large “list my stuff” pages, consider materialized views in your search/index layer that subscribe to Watch events from SpiceDB.
Migrating from RBAC to ReBAC
A successful migration is incremental, observable, and reversible.
- Inventory and map
- Enumerate roles and their implied permissions.
- Identify resource types and scopes (global, org, folder, doc).
- Map role semantics to relationships (e.g., project_admin -> folder.owner/editor).
- Design a first schema
- Model relationships first; delay ABAC until relationships work.
- Choose bounded depth for hierarchies.
- Identify expected tuple cardinalities; estimate per-tenant sizes.
- Backfill tuples
- Derive tuples from existing role assignments and ACL tables.
- Use a one-time migration, then CDC to keep SpiceDB in sync with your source of truth.
- Shadow read (dual-run)
- On every authorization, call both the legacy RBAC code and SpiceDB; log both outcomes with inputs.
- Build differential dashboards: agreement rate, false allows, false denies.
- Fix and iterate
- Where they disagree, adjust schema or backfill logic.
- Add caveats or contextual policies only after relationship parity is high (>99.9% agreement is a reasonable bar).
- Gradual cutover
- Start with low-risk actions (read-only) and a small cohort or feature flag.
- Roll out write permissions next, watching error budgets.
- Decommission
- When stable, remove legacy RBAC code paths, but keep fallback switches for emergency.
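A shadow-run counter like the following (illustrative) feeds the differential dashboards described above; during migration the legacy RBAC decision is treated as the reference:

```go
package main

import "fmt"

// diffStats accumulates shadow-run outcomes so dashboards can report
// agreement rate, false allows, and false denies.
type diffStats struct {
	total, agree, falseAllow, falseDeny int
}

// Record compares the legacy RBAC decision with the new ReBAC decision.
// "False allow" means ReBAC allowed where RBAC denied; "false deny" is
// the reverse.
func (s *diffStats) Record(rbacAllowed, rebacAllowed bool) {
	s.total++
	switch {
	case rbacAllowed == rebacAllowed:
		s.agree++
	case rebacAllowed:
		s.falseAllow++
	default:
		s.falseDeny++
	}
}

// AgreementRate returns the fraction of decisions where both systems agreed.
func (s *diffStats) AgreementRate() float64 {
	if s.total == 0 {
		return 1.0
	}
	return float64(s.agree) / float64(s.total)
}

func main() {
	var s diffStats
	s.Record(true, true)
	s.Record(false, true) // false allow: schema or backfill bug to chase
	s.Record(true, false) // false deny: usually a missing tuple
	s.Record(true, true)
	fmt.Printf("agreement=%.2f falseAllow=%d falseDeny=%d\n",
		s.AgreementRate(), s.falseAllow, s.falseDeny)
}
```

Wiring this into the dual-run path gives you the >99.9% agreement bar as a single gauge per permission.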
Code example: mapping a role table to tuples (pseudo-SQL):
```sql
-- RBAC: org_user_roles(user_id, org_id, role)
-- Map: org_admin -> organization#admin, org_member -> organization#member
INSERT INTO spicedb_tuples(object, relation, subject)
SELECT
  'organization:' || org_id,
  CASE role
    WHEN 'org_admin'  THEN 'admin'
    WHEN 'org_member' THEN 'member'
  END,
  'user:' || user_id
FROM org_user_roles;
```
Avoid expensive per-row network calls by using bulk write APIs or the SCIM/CSV importers some vendors offer. For ongoing sync, subscribe to database change streams (e.g., Postgres logical replication) and transform to tuple writes.
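For the CDC path, the same mapping can live in a small transform function (role names and relations are examples from this article, not a fixed vocabulary):

```go
package main

import "fmt"

// roleToTuple maps a legacy RBAC role row to a SpiceDB tuple string, the
// same transform the SQL above performs in bulk. Returning ok=false lets
// a CDC consumer skip roles with no ReBAC equivalent instead of writing
// malformed tuples.
func roleToTuple(orgID, userID, role string) (string, bool) {
	relation, ok := map[string]string{
		"org_admin":  "admin",
		"org_member": "member",
	}[role]
	if !ok {
		return "", false
	}
	return fmt.Sprintf("organization:%s#%s@user:%s", orgID, relation, userID), true
}

func main() {
	if tuple, ok := roleToTuple("acme", "alice", "org_admin"); ok {
		fmt.Println(tuple) // organization:acme#admin@user:alice
	}
}
```

A consumer would batch these strings into bulk write requests rather than issuing one write per change event.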
Multi-tenant patterns
Tenancy will shape your schema and deployment.
Pattern A: Tenant in object IDs
- Prefix every resource with tenant ID in its object ID: document:tenantA:doc1.
- Pros: simple logical isolation; no cross-tenant traversal by construction.
- Cons: more cumbersome IDs, potential complexity in filters.
Pattern B: Explicit tenant object
```spicedb
definition tenant {
  relation member: user | group#member
  permission manage = member
}

definition document {
  relation tenant: tenant
  relation owner: user | group#member
  relation viewer: user | group#member | tenant#member
  permission read = viewer + owner
}
```
- Pros: clear modeling of tenant membership; can reference tenant boundaries in permissions and policy.
- Cons: You must ensure you never traverse tenant->member across tenants inadvertently.
Operational isolation:
- Separate SpiceDB namespaces or clusters per tenant tier for noisy neighbors.
- Use storage-level sharding (e.g., per-tenant PostgreSQL schemas or Spanner keyspace partitions).
- Enforce tenant constraints in Cedar/OPA as a gate before calling SpiceDB.
Cross-tenant sharing:
- Model explicit cross-tenant groups or a resource “share token” relation that scopes specific resources.
- Be intentional: add audit logs and proactive scans for cross-tenant edges.
Testing and verification
You need more than unit tests. Aim for three layers: correctness, properties, and performance.
- Schema unit tests
- Author “golden” tuples and expected outcomes.
- Use zed’s test harness or SDKs to assert Check outcomes with/without caveats.
Example (Go):
```go
func TestDocumentRead(t *testing.T) {
	seedTuples := []Tuple{
		T("organization:acme#member@user:alice"),
		T("folder:f123#parent@organization:acme"),
		T("folder:f123#viewer@organization:acme#member"),
		T("document:d1#parent@folder:f123"),
		T("folder:f123#owner@user:carol"),
	}
	writeTuples(t, seedTuples)

	allowed := check(t, "user:alice", "document:d1", "read", nil)
	if !allowed {
		t.Fatal("expected alice to read via org membership and folder inheritance")
	}
}
```

(Note the explicit `folder:f123#viewer@organization:acme#member` grant: without it, org membership alone does not reach the folder.)
- Property-based tests
- Generate random trees (folders/docs) with bounded depth.
- Assign random memberships; assert invariants: owner implies admin implies write implies read; removing a parent edge removes derived access.
- Differential tests (migration)
- For a sample of requests, evaluate both RBAC and ReBAC. Fail the build if disagreements exceed a budget.
- Policy verification
- Cedar includes tooling for policy validation; use schema-aware checks for unused actions and unreachable rules.
- For OPA, write scenario tests and use conftest to enforce guardrails.
- Performance tests
- Reproduce production-like tuple counts and degrees.
- Measure p50/p95/p99 for Check and LookupResources under realistic QPS.
- Track zed token propagation delays, cache hit ratios, and watch lag.
Performance pitfalls and how to avoid them
- Unbounded depth: Deep parent chains cause dispatch recursion. Cap depth (e.g., 8–16) and segment long hierarchies.
- High fan-out groups: Large groups with >100k members stress intersections. Prefer nested groups with capped sizes and precomputed cohort edges for hot sets.
- Tuple explosion: Avoid per-user direct shares on hot resources. Use groups and org-level relations.
- Overuse of LookupResources: It’s inherently heavier. Cache, index, or paginate with prefilters.
- Caveat overreach: Complex CEL expressions with heavy context objects cost CPU. Keep caveats simple; push rich logic to Cedar/OPA.
- Cold caches: Warm caches at startup with common checks; enable locality-aware load balancing to improve hit rates.
- N+1 checks: Batch checks per request or per page view. Consider per-request deduplication of repeated checks.
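Per-request deduplication can be as simple as memoizing check results behind the client (a sketch; `doCheck` stands in for a real SpiceDB call):

```go
package main

import "fmt"

// checkDeduper collapses repeated identical checks within one request so
// that N references to the same (subject, permission, resource) cost one
// backend call. Scope one deduper per request; do not share across
// requests, or you reinvent an unbounded cache.
type checkDeduper struct {
	doCheck func(subject, permission, resource string) bool
	seen    map[[3]string]bool
	calls   int // backend calls issued, for observability
}

func newCheckDeduper(doCheck func(subject, permission, resource string) bool) *checkDeduper {
	return &checkDeduper{doCheck: doCheck, seen: make(map[[3]string]bool)}
}

func (d *checkDeduper) Check(subject, permission, resource string) bool {
	key := [3]string{subject, permission, resource}
	if allowed, ok := d.seen[key]; ok {
		return allowed // duplicate within this request: no backend call
	}
	d.calls++
	allowed := d.doCheck(subject, permission, resource)
	d.seen[key] = allowed
	return allowed
}

func main() {
	d := newCheckDeduper(func(s, p, r string) bool { return r == "document:d1" })
	for i := 0; i < 5; i++ {
		d.Check("user:alice", "read", "document:d1")
	}
	fmt.Println(d.calls) // five identical checks, one backend call
}
```

The same structure is a natural place to hang batching: collect distinct keys, issue one bulk request, then fill `seen`.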
Monitoring must-haves:
- QPS and latency histograms per API (Check, Expand, LookupResources)
- Cache hit ratio (client and server)
- Average/percentile graph depth and dispatch count per Check
- Tuple counts, by type and tenant; degree distributions
- Watch lag and dropped events
- Error budgets by permission: false allow/deny rates from diff tests
Tuning knobs:
- Consistency mode: favor MinimizeLatency for read-heavy flows.
- Dispatch workers and concurrency pool sizes.
- Per-tenant quotas and admission control to avoid noisy neighbors.
Operations: running SpiceDB in production
Storage backends (choose what matches your infra and SLOs):
- Postgres: great for small-to-medium scale; easy ops; HA with Patroni or cloud providers.
- CockroachDB: distributed SQL; good for multi-region with careful tuning.
- Spanner: if you already run on GCP and need global consistency; higher cost/complexity.
Key practices:
- Schema migrations: Version schema; use canary clusters; run Expand tests after changes.
- Backup/restore: Snapshot tuples and namespace schema; test restores quarterly.
- Multi-region: Read-local replicas for MinimizeLatency; write to a primary; measure replication times to guide token usage.
- Security: mTLS between clients and SpiceDB; RBAC for admin endpoints; audit tuple writes.
- Observability: Export OpenTelemetry traces for checks; include zed token and dispatch metrics.
Rollout playbook for schema change:
- Additive first: add new relations/permissions and dual-write tuples.
- Shadow-read: compare old vs new permission outcomes.
- Flip reads to new permission in a feature flag.
- Remove old relations after a soak period.
A worked end-to-end example
Let’s wire a small service in Go that composes SpiceDB and Cedar.
SpiceDB check wrapper (pseudocode):
```go
type SpiceDBClient interface {
	Check(ctx context.Context, subject, permission, resource string,
		caveatCtx map[string]any, consistency Consistency) (allowed bool, zedToken string, err error)
}

func Check(ctx context.Context, subject, action, resource string, env map[string]any) (bool, string, error) {
	// Map action -> permission by resource type
	permission := mapActionToPermission(action, resource)

	// Use minimize-latency reads unless the caller supplies a token
	consistency := ConsistencyMinimizeLatency
	if tok, ok := env["zed_token"].(string); ok {
		consistency = ConsistencyAtLeastAsFresh(tok)
	}

	// Provide caveat context if needed
	caveatCtx := map[string]any{
		"time": map[string]any{"hour": env["hour"], "tz": env["tz"]},
		"user": map[string]any{"mfa_authenticated": env["mfa_authenticated"]},
	}

	return spicedb.Check(ctx, subject, permission, resource, caveatCtx, consistency)
}
```
Cedar evaluation wrapper:
```go
func EvaluatePolicy(ctx context.Context, principal, action string, resource Resource, env map[string]any) (bool, error) {
	input := cedar.Input{
		Principal: cedar.Entity{Type: "User", ID: principal},
		Action:    cedar.Action(action),
		Resource: cedar.Entity{Type: "Resource", ID: resource.ID, Attrs: map[string]any{
			"owner":          resource.Owner,
			"tenant":         resource.Tenant,
			"classification": resource.Classification,
		}},
		Context: env,
	}
	res, err := cedar.Engine.Eval(ctx, input)
	if err != nil {
		return false, err
	}
	return res.Allow, nil
}
```
Composition:
```go
func Authorize(ctx context.Context, principal, action, resourceID string, env map[string]any) (bool, error) {
	allowed, zt, err := Check(ctx, principal, action, "document:"+resourceID, env)
	if err != nil || !allowed {
		return false, err
	}

	resource := loadResource(resourceID) // from DB/cache
	ok, err := EvaluatePolicy(ctx, principal, "doc:"+action, resource, env)
	if err != nil || !ok {
		return false, err
	}

	// Pass the zed token to subsequent requests to ensure monotonicity
	env["zed_token"] = zt
	return true, nil
}
```
This illustrates the typical production stack: a stable ReBAC core plus a flexible ABAC layer.
Common questions and gotchas
- Should I use SpiceDB caveats or Cedar/OPA? Use caveats for lightweight, tuple-scoped conditions. Use Cedar/OPA when policy governance, auditing, or complex conditions matter. You can combine them: SpiceDB enforces relationship + simple caveat, then Cedar enforces broader policy.
- How do I model denies? Zanzibar-style systems are allow-lists by default. Model excludes with set difference (permission p = allowed - excluded). Cedar also supports explicit forbids; be careful composing them with allow-lists so you don’t create paradoxical states.
- Listing: Why is list expensive? Because it has to evaluate many potential paths. Use LookupResources and precomputed indexes; avoid naive N Checks per page.
- How big is too big? Monitor tuple counts and peak degree. If a single group has millions of members, consider sharding the group or compressing via cohort relations.
- Can I use OpenFGA instead? OpenFGA is another Zanzibar-style engine. The principles here carry over; adjust syntax and tooling accordingly. For Cedar integration, the same orchestration pattern applies.
What “good” looks like in production
- Stable schema with clear ownership; changes are done via RFCs and schema reviews.
- p99 Check latency under your SLA (e.g., <20ms) with >80% client-side cache hit rate on hot paths.
- High agreement rate (>99.9%) during migration; emergent disagreements resolved within a day.
- Monitoring dashboards with dispatch depth, watch lag, and per-tenant tuple counts.
- A regression test suite that seeds a synthetic graph and checks 100s–1000s of scenarios.
Closing thoughts
ReBAC is not a silver bullet, but for collaborative, multi-tenant apps it’s the right foundation. SpiceDB brings Zanzibar’s rigor to your stack; Cedar or OPA give you a disciplined place to express business context. Keep relationships and context separate, use tokens to manage consistency, and migrate incrementally with measurement and guardrails.
If you do those things, you’ll end 2025 with authorization that’s faster, safer, easier to change, and—crucially—explainable to auditors and engineers alike.
References and further reading
- Zanzibar paper (USENIX ATC 2019): “Zanzibar: Google’s Consistent, Global Authorization System”
- SpiceDB (Authzed) docs and GitHub: https://authzed.com, https://github.com/authzed/spicedb
- zed CLI: https://github.com/authzed/zed
- Cedar language and tools: https://www.cedarpolicy.com
- Open Policy Agent: https://www.openpolicyagent.org
- OpenFGA (Zanzibar-like): https://github.com/openfga/openfga
- CEL language: https://github.com/google/cel-spec