Database Partitioning
What problem are we solving?
🐌 slow queries
scanning a massive tables, even with indexes
sometimes we need to get only a small slice of data
💾 storage
indexes grow big and slow
backups, restore, or replication of large monolithic get heavy
🥊 operational struggles
dropping old data is slow and painful
archiving old records becomes a nightmare
How to use it ?
splitting one large table into smaller (called partitions) based on a key (created_at, user_id, etc.)
When to use it ?
🐘 very large tables (millions to billions of rows)
if you table is growing rapidly and you're noticing slower reads, writes, it's time to
think partitioning.
especially true for time-series data (logs, events, metrics, etc.)
🕢 queries that often target subsets of data
show me users from region X
give me data from the 2022
🪣 need to archive or delete data often
if you're regularly deleting old records (e.g. logs older than 90 days) you can just
DROP a whole partition - way faster than DELETE
How does it work under the hood?
for instance, you can create partition databases by year column values
CREATE TABLE users (
id INT,
name TEXT,
created_at DATE
) PARTITION BY RANGE (created_at);
CREATE TABLE users_2024 PARTITION OF users FOR VALUES FROM ('2024-01-01') TO
('2025-01-01');
CREATE TABLE users_2025 PARTITION OF users FOR VALUES FROM ('2025-01-01') TO
('2026-01-01');
INSERT INTO users (id, name, created_at) VALUES (1, 'Alice', '2024-04-10');
INSERT INTO users (id, name, created_at) VALUES (2, 'Bob', '2025-04-10');
select * from users; -- alice + bob
select * from users_2024; -- alice
select * from users_2025; -- bob
Types or patterns available
🕰️ range partition
PARTITION BY RANGE (created_at)
-- Partition 2: 2024-01-01 to 2025-01-01
-- Partition 1: 2023-01-01 to 2024-01-01
📒 list partition
PARTITION BY LIST (country)
-- Partition 1: 'US', 'CA'
-- Partition 2: 'UK', 'DE'
🔢 hash partition
PARTITION BY HASH (user_id)
PARTITIONS 4;
-- Partition 0: user_id % 4 = 0
-- Partition 1: user_id % 4 = 1
Pros & Cons
👍
🔥 faster execution
🤏 local indexes smaller and faster to search
💅 spread data across multiple files or machines
⌫ drop, truncate or archive individual partitions
🌶️ writes spread across partitions can reduce contention on hot tables
👎
🕰️ need to manage partitions over time
🔑 foreign keys often not supported across partitions
👣 the db has to figure out which partition to route row into
😧 once a partition strategy is chosen, changing it later is a pain