Database - The cost of using UUID as Primary Key
We love UUIDs. They seduce us with a simple promise: you can generate a globally unique ID on a client device without talking to a database, and you can merge records from ten different distributed systems without a single collision. They are stateless. They are easy.
But in the world of database design, convenience usually comes with a tax.
When you make UUIDv4 your Primary Key, you are making a decision that fights against the fundamental physics of how databases store data. You are choosing randomness. And in a B-Tree, randomness is the enemy of locality.
This article explains why this "lazy default" causes your database to burn IOPS, trash its buffer pool, and bloat its storage.
The B-Tree
To understand why UUID v4 is dangerous, we have to look at the data structure that powers your database index: the B-Tree (or B+Tree).
A B-Tree is designed for one specific goal: Locality. It wants to store related data close together on the physical disk.
When you request Record 100, the database loads a "Page" (usually 8KB or 16KB) of data into memory. Because of locality, that page likely also contains Record 99 and Record 101. This makes range scans and sequential reads blazing fast.
To maintain this locality, the B-Tree must stay sorted.
The Sequential Insert
When you insert sequential IDs (like 1, 2, 3, or a timestamp), the database behaves like a logger. It simply adds new entries to the end of the file.
Locality - The database only needs to access the Rightmost page of the index. This page is “hot” (already in RAM).
Fill Factor - The database fills the page to 100% capacity, closes it, saves it to disk, and allocates a new empty page.
Result - Zero wasted space. Minimal disk movement.
This is the “Easy Mode” for any storage engine.
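Here is a minimal sketch of that behavior in Python. The PAGE_CAPACITY of 4 and the list-of-lists layout are toy stand-ins for real B-Tree leaf pages, not any engine's actual implementation:

```python
PAGE_CAPACITY = 4  # real pages hold hundreds of keys; 4 keeps the output readable

pages = [[]]  # leaf pages, ordered left to right

def insert_sequential(key):
    if len(pages[-1]) == PAGE_CAPACITY:  # rightmost page is full:
        pages.append([])                 # seal it and open a fresh empty page
    pages[-1].append(key)                # always an append, never a reshuffle

for key in range(1, 13):
    insert_sequential(key)

print(pages)  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]] - every sealed page 100% full
```

Only the rightmost page is ever touched, which is exactly the "hot page" behavior described above.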
The Random Insert
Now imagine inserting random UUID v4 values. A new ID, a902..., arrives.
This ID might have been generated on a user's phone milliseconds ago, or hours ago while the device was offline. To the database, time does not matter. Only the Value matters.
Even though this is a "new" record (inserted now), its Value (a902...) might be lexicographically smaller than records you created yesterday (e.g. b100...).
A B-Tree is strictly sorted by Value. The database cannot just append a902... to the end. It effectively says:
"Oh, based on alphabetical order, this new book belongs on a specific shelf that I filled up and sealed three days ago."
It forces the database to go back in time, fetch that old "cold" page from disk, and try to shove the new data in. This triggers a mechanical disaster.
The Page Splitting Problem
The database fetches that old page from disk into the Buffer Pool (memory) and tries to insert the new UUID.
The problem: the page is full.
To make room for this random entry, the database performs a Page Split (a minimal sketch in code follows these steps).
It allocates a new empty page.
It takes half the records from the old page and moves them to the new page.
It inserts the new UUID into the middle.
It updates the parent pointer nodes to reflect the new path.
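Continuing the toy model from the sequential example, here is a sketch of split-in-half behavior. The PAGE_CAPACITY of 4 is an illustrative simplification, and the parent-pointer bookkeeping from the last step is omitted:

```python
import bisect
import random

PAGE_CAPACITY = 4
pages = [[]]  # sorted leaf pages, ordered left to right

def insert_random(key):
    # Find the page whose key range should contain this value.
    i = next((i for i, p in enumerate(pages) if not p or key <= p[-1]),
             len(pages) - 1)
    page = pages[i]
    bisect.insort(page, key)              # keep the page internally sorted
    if len(page) > PAGE_CAPACITY:         # overflow: perform a page split
        mid = len(page) // 2
        pages[i:i + 1] = [page[:mid], page[mid:]]  # two roughly half-full pages

for _ in range(10_000):
    insert_random(random.random())        # random keys, like UUID v4

fill = sum(len(p) for p in pages) / (len(pages) * PAGE_CAPACITY)
print(f"average fill factor: {fill:.0%}")  # well below the 100% a sequential load achieves
```

Run it and note the printed fill factor: every split leaves two partially empty pages behind, which is the fragmentation discussed below.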
This has several consequences.
Write Amplification
You wanted to write 32 bytes of data. But to do it, the database had to load a 16KB page, create another 16KB page, and write both back to disk.
You are burning CPU and I/O bandwidth just to move data around.
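A back-of-the-envelope calculation makes the ratio concrete (assuming 16KB pages and the 32-byte row from the example above):

```python
row_bytes = 32           # the data you actually wanted to write
page_bytes = 16 * 1024   # one 16KB page
pages_written = 2        # the old page plus the newly allocated one
print(pages_written * page_bytes // row_bytes, "x amplification")  # 1024 x
```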
Buffer Pool Thrashing
With sequential inserts, you only need the "latest" page in memory. With random inserts, any page from the entire history of your database might be needed at any moment.
This defeats your caching strategy. Your Buffer Pool churns constantly, evicting "hot" data to load "cold" pages just to insert a random ID.
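A toy LRU cache standing in for the Buffer Pool shows the difference starkly. The cache size and key-to-page mapping below are arbitrary illustration values, not tuned to any real engine:

```python
import random
from collections import OrderedDict

CACHE_PAGES = 64     # the "buffer pool" holds 64 pages
KEYS_PER_PAGE = 100  # toy mapping: key // 100 -> page number

def hit_rate(keys):
    cache, hits = OrderedDict(), 0
    for key in keys:
        page = key // KEYS_PER_PAGE
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # LRU: mark page as recently used
        else:
            cache[page] = True
            if len(cache) > CACHE_PAGES:
                cache.popitem(last=False)  # evict the coldest page
    return hits / len(keys)

N = 1_000_000  # a million inserts spread over 10,000 pages
print(f"sequential: {hit_rate(range(N)):.1%}")                                 # ~99.0%
print(f"random:     {hit_rate([random.randrange(N) for _ in range(N)]):.1%}")  # under 1%
```

Sequential keys miss only once per page; random keys find their page cached only when it happens to be among the 64 most recently used.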
Fragmentation and Disk Bloat
After a split, you are left with two pages that are each only about 50% full. If you keep inserting randomly, your pages effectively stay half empty.
This means your index can take up to twice the disk space of a sequential index. And this is not just a storage cost issue.
It means your RAM can only hold half as many keys, leading to even more disk reads.
UUID v4 vs BigInt vs UUID v7
UUIDv4 is the standard 128-bit random identifier. It is globally unique and requires no coordination.
But it destroys B-Tree performance, fragments indexes, and clogs the WAL (Write-Ahead Log), because page splits create massive log records.
So if UUID v4 is the villain, who is the hero?
BigInt or Serial
The classic auto-incrementing integer.
Extremely fast sequential inserts. 64-bit size (half the size of a UUID).
But it leaks business intelligence. If I create a user and get ID 105, then create another one tomorrow and get 150, I know you grew by only 45 users.
It is also painful in active-active distributed systems, where two servers might generate ID 106 at the same time.
UUID v7
Approved in 2024 as part of RFC 9562, this standard is designed specifically to solve the DB problem.
It embeds a 48-bit Timestamp at the start of the ID followed by random bits.
Because the start of the ID is time-based, new records are generated in (mostly) increasing order.
To the B-Tree, a UUID v7 looks like a sequential integer. It appends to the right. It fills pages completely. It avoids splits. But it retains the distributed uniqueness and unguessability of a UUID.
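You can see the difference directly. Python 3.14 adds uuid.uuid7() to the standard library; on older versions, a hand-rolled sketch of the RFC 9562 layout works for illustration (this is not a production implementation):

```python
import os
import time
import uuid

def uuid7() -> uuid.UUID:
    """Illustrative UUID v7: 48-bit Unix ms timestamp, then random bits."""
    ts_ms = time.time_ns() // 1_000_000
    value = (ts_ms & 0xFFFF_FFFF_FFFF) << 80        # timestamp fills the top 48 bits
    value |= int.from_bytes(os.urandom(10), "big")  # 80 random bits below it
    value = (value & ~(0xF << 76)) | (0x7 << 76)    # version field = 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)    # variant field = 0b10
    return uuid.UUID(int=value)

v7_ids = []
for _ in range(5):
    v7_ids.append(uuid7())
    time.sleep(0.002)  # cross a millisecond boundary between IDs

print(sorted(v7_ids) == v7_ids)  # True: sort order follows creation order

v4_ids = [uuid.uuid4() for _ in range(5)]
print(sorted(v4_ids) == v4_ids)  # almost certainly False: order is random
```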
I was once called in to consult for a logistics company. They had an event-logging system that tracked packages. The system was simple: every scan event got a UUID v4 Primary Key.
For the first year it was fine. Then they hit 50 million rows.
Write latency spiked from 2ms to 200ms. CPU usage on the RDS instance sat at 90%. They upgraded the instance class twice, doubling their bill, but the CPU didn't drop.
After looking into the database memory, I found that the dataset size had finally exceeded the available RAM (the Buffer Pool).
When the table was small, the entire B-Tree lived in memory. Random inserts were just RAM operations.
The moment the index grew larger than RAM, every random insert became a disk read. To insert a record, the DB had to fetch a random cold page from disk into memory.
As for the fix: we couldn't migrate the primary keys easily. Instead, we introduced a composite sort key using a timestamp and rebuilt the clustered index to follow time rather than the UUID.
The UUID remained unique, but it no longer dictated the physical storage order. Writes dropped back to 5ms. CPU dropped to 10%.
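Their actual fix was applied on RDS; as a minimal sketch of the same idea, here is SQLite, whose WITHOUT ROWID tables cluster rows by their primary key. The scan_events table and its column names are hypothetical:

```python
import datetime
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")

# WITHOUT ROWID stores the table clustered by its primary key, so putting
# the timestamp first means inserts land at the "right edge" of the tree
# in time order, no matter how random the UUID is.
conn.execute("""
    CREATE TABLE scan_events (
        scanned_at TEXT NOT NULL,  -- leads the key: dictates physical order
        event_id   TEXT NOT NULL,  -- still globally unique, just demoted
        payload    TEXT,
        PRIMARY KEY (scanned_at, event_id)
    ) WITHOUT ROWID
""")

conn.execute(
    "INSERT INTO scan_events VALUES (?, ?, ?)",
    (datetime.datetime.now(datetime.timezone.utc).isoformat(),
     str(uuid.uuid4()),
     "package scanned at hub"),
)
conn.commit()
```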
Conclusion
We are not guessing here. The benchmarks are clear. In tests inserting 100 million rows:
UUID v4 takes 2x to 3x longer than sequential IDs.
UUID v4 indexes are 30% to 50% larger due to fragmentation.
UUID v7 performance is nearly identical to BigInt sequential inserts.
So, stop using UUID v4 for database Primary Keys. It is a legacy default that ignores computer science fundamentals.
If you need simple internal speed use BigInt.
If you need distributed scale and public safety use UUID v7.
Only use UUID v4 for ephemeral tokens or trace IDs that are never indexed.
Don’t let randomness kill your locality.


