You are building a SaaS product. Every customer has their own orders, invoices, and documents. A teammate opens a pull request that creates a new table for every customer that signs up: orders_customer_123, orders_customer_124, and so on. Another teammate comments: "this is insane, just add a customer_id column and an index."
Both of them are right. Both of them are wrong. The answer depends on numbers that neither of them has looked up yet: how many tenants you will have, how large each tenant's data will be, how often schemas change, and whether a regulator will one day ask you to delete exactly one customer's data within 24 hours.
This is one of the earliest architectural decisions in any multi-tenant system, and it is surprisingly hard to change later. The table structure, the indexes, the migration tooling, the monitoring, the backup strategy, all of it flows from this one choice.
The Problem
Multi-tenancy in a database boils down to one question: how do you isolate one customer's data from another?
There are two extremes:
- Physical isolation: Each customer gets their own table (or schema, or database). Their data is literally in a different place on disk.
- Logical isolation: All customers share the same table. A `customer_id` column and a WHERE clause separate them.
Physical isolation gives you strong boundaries. Logical isolation gives you operational simplicity. The problem is that each approach has costs that only show up at scale, and by the time you hit scale, migrating between strategies is extremely painful.
The stakes are real. Pick table-per-tenant with 50,000 customers and your Postgres catalog has 50,000 entries per table type. The planner slows to a crawl. Pick shared-table with a few whale customers and one tenant's 10M-row dataset skews the planner statistics for every other tenant with 1,000 rows. Both failure modes are silent until they are not.
Prerequisites
- Working knowledge of Postgres (tables, indexes, schemas, EXPLAIN output)
- Familiarity with B-tree indexes and how they affect writes
- Understanding of what VACUUM does and why it matters
- Experience with at least one multi-tenant application (even a small one)
Technical Decisions
Strategy 1: Table-Per-Tenant
The idea is straightforward. When customer 42 signs up, you run:
```sql
CREATE TABLE orders_42 (
    id BIGSERIAL PRIMARY KEY,
    product_id BIGINT NOT NULL,
    amount NUMERIC(10, 2) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT now()
);
```
Every query for that customer hits only their table. No WHERE clause needed. No risk of accidentally reading another customer's data.
Where it shines:
Queries are fast because each table is small. A customer with 10,000 orders has a 10,000-row table. Sequential scans are cheap. Indexes are tiny. The Postgres planner has accurate statistics for that specific table because the data distribution is homogeneous.
Customer deletion is instant: DROP TABLE orders_42. No DELETE of millions of rows. No VACUUM pass afterward. No dead tuples. The disk space is returned immediately.
Per-customer backup and restore is trivial. You can pg_dump a single customer's tables without touching anyone else. If a customer needs their data exported for compliance, you hand them a dump file.
Where it breaks down:
Schema migrations multiply. Adding a column to the orders table means running ALTER TABLE on every customer's table. With 1,000 customers, that is 1,000 DDL statements. With 10,000, it is 10,000. Each one takes a lock on that table. If you are doing this in a migration script, a single failure partway through leaves you in a state where some customers have the new column and some do not.
```python
# The migration nightmare
for customer_id in all_customer_ids:
    cursor.execute(
        f"ALTER TABLE orders_{customer_id} ADD COLUMN status TEXT DEFAULT 'pending'"
    )
    # What happens if this fails on customer 5,847?
    # 5,846 customers have the column. 4,154 do not.
    # Your application code cannot assume the column exists.
```
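One way to defang this, sketched below as a minimal example (the `orders_<id>` naming and the list of customer ids follow the snippet above): use `ADD COLUMN IF NOT EXISTS`, available since Postgres 9.6, so the DDL is idempotent and a failed run can simply be restarted from the top.

```python
def idempotent_add_column(customer_ids, column_def="status TEXT DEFAULT 'pending'"):
    """Generate re-runnable DDL. IF NOT EXISTS skips tables that were
    already migrated, so a crash at customer 5,847 is fixed by rerunning."""
    return [
        f"ALTER TABLE orders_{cid} ADD COLUMN IF NOT EXISTS {column_def}"
        for cid in customer_ids
    ]
```

Each statement still takes its own lock; the win is that the script no longer needs to track which tables it already touched.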
Postgres catalog bloat is the silent killer. Every table creates entries in pg_class, pg_attribute, pg_statistic, and other system catalogs. Each table has roughly 1-2 KB of metadata in pg_class alone. Each column adds entries to pg_attribute. Each index adds another pg_class entry plus pg_statistic entries.
For a schema with 5 tables per customer, each with 3 indexes:
| Tenants | Tables | Indexes | pg_class entries | Estimated catalog size |
|---|---|---|---|---|
| 100 | 500 | 1,500 | 2,000 | ~4 MB |
| 1,000 | 5,000 | 15,000 | 20,000 | ~40 MB |
| 10,000 | 50,000 | 150,000 | 200,000 | ~400 MB |
| 50,000 | 250,000 | 750,000 | 1,000,000 | ~2 GB |
At 50,000 tenants, just opening a connection and having Postgres load catalog metadata becomes noticeably slower. The planner, which consults the catalog to build query plans, starts spending more time in catalog lookups than in actual query execution.
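The table above can be reproduced with a back-of-envelope calculation. This sketch assumes, as the text does, roughly 2 KB of catalog metadata per relation and counts each table and each index as one `pg_class` entry:

```python
def catalog_footprint(tenants, tables_per_tenant=5, indexes_per_table=3,
                      bytes_per_entry=2048):
    """Estimate pg_class entries and catalog bytes for table-per-tenant."""
    tables = tenants * tables_per_tenant
    indexes = tables * indexes_per_table
    entries = tables + indexes          # every table and index is a relation
    return entries, entries * bytes_per_entry
```

`catalog_footprint(50_000)` gives 1,000,000 entries and roughly 2 GB, matching the last row of the table.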
Connection pooling also becomes complicated. If each customer's tables are in a separate schema and your application sets search_path per request, PgBouncer in transaction mode cannot safely pool those connections because search_path is a session-level setting.
Strategy 2: Shared Table with Index
The opposite approach: one table for all customers.
```sql
CREATE TABLE orders (
    id BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL,
    product_id BIGINT NOT NULL,
    amount NUMERIC(10, 2) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX idx_orders_customer_id ON orders (customer_id);
```
Every query includes WHERE customer_id = ?. One table. One migration. One index. One VACUUM configuration.
Where it shines:
Operational simplicity is dramatic. Schema changes are a single ALTER TABLE. Monitoring is one table's metrics. Your ORM does not need dynamic table names. Your migration tool works out of the box.
Adding a new tenant is free: just INSERT with a new customer_id. No DDL, no new metadata, no catalog entries.
Where it breaks down:
Index size grows with total data, not per-tenant data. A B-tree index on customer_id for 100 million rows is not small. Each index entry is roughly 16 bytes (8 bytes for the indexed value + 8 bytes for the tuple pointer). With some overhead for internal B-tree pages:
| Total rows | Index on customer_id | Index on (customer_id, created_at) |
|---|---|---|
| 1M | ~24 MB | ~40 MB |
| 10M | ~240 MB | ~400 MB |
| 100M | ~2.4 GB | ~4 GB |
| 1B | ~24 GB | ~40 GB |
A 2.4 GB index needs to fit in shared_buffers to avoid constant disk reads during index scans. If it does not fit, every query that touches the index generates random I/O.
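These figures follow from a simple model. The sketch below assumes roughly 16 bytes per single-column leaf entry and a 1.5x multiplier for page headers, fillfactor slack, and internal pages; crude, but within the right order of magnitude:

```python
def btree_size_bytes(rows, entry_bytes=16, overhead=1.5):
    """Rough B-tree index size: leaf entries plus structural overhead."""
    return int(rows * entry_bytes * overhead)
```

`btree_size_bytes(100_000_000)` is about 2.4 GB, the figure in the table; a composite index adds roughly 8 bytes per extra BIGINT or TIMESTAMPTZ column to `entry_bytes`.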
Write amplification is the other cost. Every INSERT into the orders table must also update every index on that table. With a primary key index and a customer_id index, each insert does three writes: the heap tuple, the primary key B-tree insert, and the customer_id B-tree insert. Add a composite index and it is four writes. At high write throughput, this becomes the bottleneck.
Planner statistics are table-level, not per-tenant. This is the subtlest problem. Postgres collects statistics (via ANALYZE) about the distribution of values in each column. The planner uses these statistics to decide whether to use an index scan or a sequential scan.
If you have 10,000 tenants and one of them has 10 million rows while the rest have 1,000, the planner sees the average: roughly 2,000 rows per customer_id value (about 20 million total rows divided by 10,000 distinct values). For the large tenant, the estimate is 5,000x too low, and the planner may choose an index scan when a sequential scan would be faster. For the small tenants, the estimate is too high, and it may make the opposite mistake. (A larger most-common-values list can catch the whale, but the thousands of small tenants still share one averaged estimate.) Neither plan is optimal.

```sql
-- The planner sees: avg rows per customer_id ≈ 2,000
-- For the customer with 10M rows: the estimate is 5,000x too low
-- For a customer with 500 rows: the estimate is 4x too high
-- Either way, scan and join choices are built from the wrong number
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
```
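The arithmetic behind the skew is worth making explicit. Assuming the planner falls back to total rows divided by distinct `customer_id` values (ignoring the most-common-values list for simplicity), a toy calculation:

```python
tenants = 10_000
whale_rows, small_rows = 10_000_000, 1_000

total = whale_rows + (tenants - 1) * small_rows   # ~20M rows
planner_estimate = total / tenants                # ~2,000 rows per tenant

# The whale is underestimated 5,000x; every small tenant is
# overestimated 2x. Every plan is built from the wrong number.
```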
VACUUM contention scales with table size. Autovacuum on a 100-million-row table is a heavyweight operation. It must scan the entire table for dead tuples, even if only one tenant's data was recently updated. A busy tenant generating lots of updates creates dead tuples that trigger VACUUM for the entire table, slowing reads for every other tenant.
Customer deletion is expensive. Deleting a customer requires DELETE FROM orders WHERE customer_id = 42, which generates dead tuples equal to the number of deleted rows. Those dead tuples consume disk space until VACUUM reclaims them. For a large customer, this can be millions of dead tuples, triggering an aggressive VACUUM that affects every other tenant.
Data leakage risk. Every query must include the customer_id filter. One missed WHERE clause, one raw SQL snippet without the filter, one debugging session where someone queries without it, and you are reading another customer's data. Row-Level Security (RLS) in Postgres mitigates this:
```sql
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON orders
    USING (customer_id = current_setting('app.current_tenant')::BIGINT);
```
But RLS adds a filter to every query, and the overhead is measurable: roughly 5-15% on simple queries depending on the policy complexity and table size.
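With RLS in place, the application's only job is to bind `app.current_tenant` at the start of each transaction. A minimal sketch, assuming a psycopg-style driver with `%s` placeholders; `set_config(..., true)` scopes the setting to the current transaction, which is what transaction-mode poolers need:

```python
def bind_tenant_statement(tenant_id):
    """Parameterized statement that sets app.current_tenant for RLS.
    The third argument to set_config (is_local = true) limits the
    setting to the enclosing transaction."""
    return ("SELECT set_config('app.current_tenant', %s, true)", (str(tenant_id),))
```

Run this as the first statement of every transaction; passing the id as a bound parameter keeps it out of the SQL string.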
Implementation
The Numbers at Scale
Here is what each strategy actually costs at different tenant counts, assuming 5 tables per tenant, 3 indexes per table, and an average of 10,000 rows per tenant:
| Metric | 100 tenants | 1,000 tenants | 10,000 tenants |
|---|---|---|---|
| **Table-per-tenant** | | | |
| Total tables | 500 | 5,000 | 50,000 |
| Catalog overhead | ~4 MB | ~40 MB | ~400 MB |
| Migration time (add column) | ~1 sec | ~10 sec | ~2 min |
| Customer deletion | instant (DROP) | instant (DROP) | instant (DROP) |
| Planner accuracy | excellent | excellent | excellent |
| **Shared table** | | | |
| Total tables | 5 | 5 | 5 |
| Index size (per table) | ~24 MB | ~240 MB | ~2.4 GB |
| Migration time (add column) | ~1 sec | ~5 sec | ~30 sec |
| Customer deletion | DELETE + VACUUM | DELETE + VACUUM | DELETE + VACUUM |
| Planner accuracy | good | degraded | poor (skewed stats) |
The crossover point is typically around 1,000-5,000 tenants. Below that, table-per-tenant is manageable. Above that, the catalog bloat and migration complexity start to dominate.
Query Latency Comparison
For a simple lookup query (SELECT * FROM orders WHERE customer_id = 42 AND created_at > '2026-01-01' LIMIT 100):
| Scenario | Table-per-tenant | Shared table (indexed) |
|---|---|---|
| Small tenant (1K rows) | ~0.1ms (seq scan) | ~0.3ms (index scan) |
| Medium tenant (100K rows) | ~0.2ms (index scan) | ~0.5ms (index scan) |
| Large tenant (10M rows) | ~0.3ms (index scan) | ~1.2ms (index scan, larger tree) |
| After heavy deletes | unchanged | ~2-5ms (bloated index, dead tuples) |
Table-per-tenant wins on raw query performance because the indexes and tables are smaller, they fit in cache more easily, and the planner has per-tenant statistics. The shared table penalty comes from larger index traversals and potentially suboptimal query plans from skewed statistics.
The Hybrid: Postgres Partitioning
Postgres native partitioning gives you most of the benefits of table-per-tenant with the manageability of a shared table.
```sql
CREATE TABLE orders (
    id BIGSERIAL,
    customer_id BIGINT NOT NULL,
    product_id BIGINT NOT NULL,
    amount NUMERIC(10, 2) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT now()
) PARTITION BY LIST (customer_id);

-- Each tenant gets a partition
CREATE TABLE orders_p42 PARTITION OF orders FOR VALUES IN (42);
CREATE TABLE orders_p43 PARTITION OF orders FOR VALUES IN (43);
```
Partition pruning is the key feature. When you query WHERE customer_id = 42, Postgres eliminates all other partitions before executing the query. The planner only considers orders_p42. It is as if the other partitions do not exist.
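Tenant signup then needs one piece of DDL per new customer. A minimal sketch, assuming the `orders_p<id>` naming from above; the integer check keeps anything non-numeric from ever reaching the SQL string, and `IF NOT EXISTS` makes signup retries safe:

```python
def new_tenant_partition_sql(tenant_id):
    """DDL to run once when a tenant signs up."""
    if not isinstance(tenant_id, int):
        raise TypeError("tenant_id must be an int")
    return (f"CREATE TABLE IF NOT EXISTS orders_p{tenant_id} "
            f"PARTITION OF orders FOR VALUES IN ({tenant_id})")
```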
The advantages:
- Single DDL for schema changes. `ALTER TABLE orders ADD COLUMN status TEXT` propagates to all partitions automatically.
- Per-partition statistics. Each partition has its own `pg_statistic` entries. The planner sees accurate row counts per tenant.
- Per-partition VACUUM. Autovacuum runs on individual partitions. A busy tenant does not trigger VACUUM on quiet tenants' data.
- Fast deletion. `DROP TABLE orders_p42` removes a single partition. Instant, like table-per-tenant.
- Standard query interface. Application code queries the parent table `orders` with a WHERE clause. No dynamic table names.
The disadvantages:
- Partition management overhead. You still need to CREATE a new partition for each new tenant. This is DDL, which takes a brief lock on the parent table.
- Global index limitations. A unique index across all partitions (e.g., a global primary key) requires the partition key to be part of the index. You cannot have a globally unique `id` column without including `customer_id` in the unique constraint.
- Catalog overhead still grows. Each partition is still a table in `pg_class`. 10,000 partitions means 10,000 catalog entries, better than table-per-tenant (no separate index entries to manage), but not zero.
- Planner overhead with many partitions. Before Postgres 14, the planner scaled poorly with thousands of partitions. Postgres 14+ improved this significantly, but above ~5,000 partitions, planning time can still be measurable (1-5ms added to simple queries).
```sql
-- Partition pruning in action
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Append
--   -> Seq Scan on orders_p42
--        Filter: (customer_id = 42)
-- Only orders_p42 is scanned. All other partitions are pruned.
```
Schema-Per-Tenant
A middle ground: each customer gets their own Postgres schema, but within the same database.
```sql
CREATE SCHEMA tenant_42;

CREATE TABLE tenant_42.orders (
    id BIGSERIAL PRIMARY KEY,
    product_id BIGINT NOT NULL,
    amount NUMERIC(10, 2) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT now()
);
```
The application sets search_path per request:
```sql
SET search_path TO tenant_42, public;
SELECT * FROM orders;  -- resolves to tenant_42.orders
```
This gives you physical isolation (separate tables, separate indexes, separate statistics) with the logical organization of schemas. Per-customer backup works via pg_dump -n tenant_42. Migrations still multiply, but schema-level operations can be scripted more cleanly than raw table name manipulation.
The main downside: connection pooling. PgBouncer in transaction mode resets session state between transactions, which means search_path gets cleared. You either need statement-level pooling (which has its own limitations) or you fully qualify every table name (tenant_42.orders), defeating the purpose of search_path.
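The full-qualification workaround can at least be centralized in one helper, so it is less painful than it sounds. A sketch; the `tenant_<id>` schema naming follows the example above, and the regex rejects table names that are not plain identifiers:

```python
import re

IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*$")

def qualify(table, tenant_id):
    """Return a schema-qualified name, e.g. tenant_42.orders, so queries
    work under transaction pooling without relying on search_path."""
    if not IDENTIFIER.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    return f"tenant_{int(tenant_id)}.{table}"
```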
Citus: Distributed Sharding by Tenant
For the largest deployments, Citus (open source, also offered as a managed service under the name Azure Cosmos DB for PostgreSQL) distributes data across multiple Postgres nodes, sharded by customer_id.
```sql
-- Citus: distribute the orders table by customer_id
SELECT create_distributed_table('orders', 'customer_id');
```
Queries with WHERE customer_id = ? are routed to the correct shard node. Each node holds a subset of tenants. This gives you horizontal scaling beyond what a single Postgres instance can handle.
The tradeoff: cross-tenant queries (analytics, reporting across all tenants) become distributed queries with network overhead. Citus handles this, but the latency is higher than querying a single local table.
How It All Fits Together
The decision comes down to four variables:
```
┌─────────────────────────────────────────────────────────┐
│ How many tenants?                                       │
├──────────────────────┬──────────────────────────────────┤
│ < 100                │ Table-per-tenant works fine      │
│ 100 - 5,000          │ Partitioning is the sweet spot   │
│ 5,000 - 50,000       │ Shared table or partitioning     │
│                      │ (test planner overhead)          │
│ > 50,000             │ Shared table + Citus             │
└──────────────────────┴──────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│ How skewed is the data?                                 │
├──────────────────────┬──────────────────────────────────┤
│ Uniform (all ~same)  │ Shared table is fine             │
│ Skewed (whale        │ Partitioning or table-per        │
│ tenants exist)       │ (planner stats matter)           │
└──────────────────────┴──────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│ Compliance requirements?                                │
├──────────────────────┬──────────────────────────────────┤
│ Data residency /     │ Physical isolation required      │
│ per-tenant deletion  │ (table-per, schema-per, or       │
│ guarantees           │ partitioning with DROP)          │
│                      │                                  │
│ No special           │ Shared table is simpler          │
│ requirements         │                                  │
└──────────────────────┴──────────────────────────────────┘
```
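The decision boxes above can be collapsed into a single rule-of-thumb function. The thresholds are the ones from this section, not hard limits; treat the output as a starting point, not a verdict:

```python
def recommend(tenants, skewed_data=False, needs_physical_isolation=False):
    """Rule of thumb distilled from the tenant-count, skew, and
    compliance decision boxes."""
    if tenants > 50_000:
        return "shared table + Citus"
    if needs_physical_isolation:
        return "table-per-tenant" if tenants < 100 else "partitioning (DROP per tenant)"
    if tenants < 100:
        return "table-per-tenant"
    if tenants <= 5_000 or skewed_data:
        return "partitioning"
    return "shared table or partitioning (test planner overhead)"
```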
| Factor | Table-per-tenant | Shared table | Partitioned | Schema-per-tenant |
|---|---|---|---|---|
| Query performance | Excellent | Good (degrades with skew) | Excellent (partition pruning) | Excellent |
| Schema migrations | O(n) tenants | O(1) | O(1) | O(n) tenants |
| Catalog overhead | High | Minimal | Moderate | High |
| Planner accuracy | Per-tenant | Table-level (skewed) | Per-partition | Per-tenant |
| VACUUM isolation | Full | None | Per-partition | Full |
| Customer deletion | DROP TABLE | DELETE + VACUUM | DROP partition | DROP SCHEMA CASCADE |
| Connection pooling | Complex | Simple | Simple | Complex |
| Max practical tenants | ~5,000 | 100,000+ | ~10,000 | ~5,000 |
| Data leakage risk | None (separate tables) | High (missing WHERE) | High (missing WHERE; pruning is performance, not isolation) | None (separate schemas) |
Lessons Learned
Start with shared, move to partitioned. For most SaaS products, a shared table is the right starting point. It is simpler to operate, simpler to migrate, and simpler to monitor. If you grow to the point where statistics skew or VACUUM contention becomes a problem, converting to a partitioned table is a well-documented migration. Going the other direction, from table-per-tenant back to shared, is much harder.
Partitioning is the answer more often than people think. Postgres native partitioning gives you the performance characteristics of table-per-tenant (small indexes, accurate statistics, isolated VACUUM) with the operational characteristics of a shared table (single DDL for migrations, standard query interface). The main reasons not to use it are global unique constraints and planner overhead above ~5,000 partitions.
The catalog is a database too. Postgres system catalogs are themselves tables that Postgres queries internally. When you have 200,000 entries in pg_class, every CREATE TABLE, every ALTER TABLE, and every connection establishment pays a cost. Teams that hit this wall are usually surprised because nothing in their application metrics shows it. The slowdown is in Postgres internals, not in query execution.
RLS is not free but it is worth it. If you go with a shared table, Row-Level Security adds 5-15% overhead on simple queries. That is a real cost. It is also far less expensive than a data leakage incident. Treat RLS as the seatbelt: you do not negotiate whether to wear it.
The whale tenant problem is unavoidable. In every multi-tenant system, eventually one customer has 100x more data than the average. Shared tables suffer the most because statistics and VACUUM are table-level. Partitioning handles it cleanly because each partition has its own statistics and VACUUM schedule. If you are on a shared table and cannot partition, consider moving whale tenants to their own table and routing queries in the application layer. It is ugly, but it works.
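The application-layer routing mentioned above can be as small as a dictionary lookup. A sketch with a hypothetical whale registry; the table names and ids are made up for illustration:

```python
# Hypothetical registry of tenants that outgrew the shared table
WHALE_TABLES = {4242: "orders_whale_4242"}

def orders_table_for(tenant_id):
    """Whales get a dedicated table with its own statistics and VACUUM
    schedule; everyone else stays in the shared orders table."""
    return WHALE_TABLES.get(tenant_id, "orders")
```

Every query path must go through this function, which is exactly why it is ugly.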
Deletion is an underestimated design constraint. GDPR Article 17, the right to erasure, gives customers the right to have their data deleted. On a shared table, DELETE FROM orders WHERE customer_id = 42 on 5 million rows generates 5 million dead tuples, triggers VACUUM, and temporarily bloats the table. With partitioning or table-per-tenant, it is a DROP, instant and clean. If your product serves European customers, factor this into the decision from day one.
What's Next
This post covered the storage layer. The multi-tenancy decision also affects:
- Caching: Do you partition your Redis keyspace by tenant? Use separate Redis instances? A shared instance with key prefixes?
- Background jobs: Does one tenant's heavy job queue starve others? How do you enforce fairness?
- Rate limiting: Per-tenant rate limits require tracking usage per tenant, which has its own data structure decisions.
Each of these layers inherits the same tension between isolation and simplicity.