Do you really need a Vector Database for your AI Product?
Before you sign a massive SaaS contract for a dedicated vector engine, you need to understand the physics of the hardware you are renting.
Last week, I sat in an architecture review with a client who had just secured Series A funding. They were building a standard Retrieval-Augmented Generation (RAG) pipeline for their internal documents.
Before anyone had written a single line of backend logic, the engineering lead proudly displayed a slide proposing a six-figure enterprise contract with a dedicated Vector Database.
When I asked why we weren’t just using Postgres, the answer was immediate:
“Because this is an AI app. You need a Vector DB for AI.”
This is the exact kind of hype-driven development that destroys startup runways.
At the physical level, Vector Databases are not magic AI boxes. They do not understand “meaning,” “context,” or “semantics.”
They are simply C++ or Rust memory allocators navigating multi-dimensional mathematical graphs.
To do this quickly, they trade exact accuracy for speed, and they force you to pay an absolute fortune in RAM to do it.
The Brute Force Math
To understand why traditional databases fail at vector search, you have to look at the math of a “Vector.”
An OpenAI text-embedding-3-small embedding is a 1,536-dimensional array of floating-point numbers. In physical memory, a single 32-bit float consumes 4 bytes.
1,536 dimensions × 4 bytes = 6,144 bytes (6 KB) per vector
If you want to find the closest vectors to a user’s query using exact math, a process called Exact K-Nearest Neighbors (KNN), your CPU must calculate the cosine similarity between the query vector and every single row in the database.
Let’s do the memory bandwidth math on a small 1-million-row table:
1,000,000 rows × 6 KB = 6.1 Gigabytes of data
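The arithmetic above can be checked in a few lines. This is a back-of-the-envelope sketch using the same figures as the text (1,536 dimensions, 4-byte floats, 1 million rows):

```python
# Back-of-the-envelope memory math for exact KNN over raw embeddings.
DIMS = 1536          # text-embedding-3-small output dimensions
FLOAT_BYTES = 4      # one 32-bit float
ROWS = 1_000_000

bytes_per_vector = DIMS * FLOAT_BYTES
total_gb = ROWS * bytes_per_vector / 1e9   # decimal gigabytes

print(f"{bytes_per_vector} bytes per vector")   # 6144 bytes (6 KB)
print(f"{total_gb:.1f} GB scanned per query")   # 6.1 GB
```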
To answer one user query, the CPU must pull 6.1 GB of data from RAM through the memory bus, load it into the L1 cache, and execute millions of AVX-512 SIMD (Single Instruction, Multiple Data) dot-product operations.
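That full-table scan is exactly what a brute-force cosine similarity search computes. Here is a minimal NumPy sketch at toy scale (10,000 rows instead of millions, so it runs instantly); the data is random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
DIMS = 1536
table = rng.standard_normal((10_000, DIMS)).astype(np.float32)  # toy "table"
query = rng.standard_normal(DIMS).astype(np.float32)

# Cosine similarity against EVERY row: this is the O(N) scan the text describes.
norms = np.linalg.norm(table, axis=1) * np.linalg.norm(query)
sims = table @ query / norms

k = 5
top_k = np.argsort(-sims)[:k]   # indices of the k nearest neighbors
print(top_k, sims[top_k])
```

At 100 million rows this same loop is what forces 6+ GB through the memory bus per query.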
Even on modern DDR5 RAM peaking at 50 GB/s of bandwidth, streaming 6.1 GB takes over 120 milliseconds, so a single user issuing one Exact KNN search per second consumes roughly 12% of your entire server’s memory bandwidth.
If you get 10 concurrent searches, your CPU is completely starved for data. The system grinds to an absolute halt.
Even traditional B-Trees cannot save you.
A B-Tree relies on 1-dimensional inequalities (Is X > 10? Go right). You cannot sort or bisect 1,536 dimensions simultaneously.
HNSW (Hierarchical Navigable Small World)
If Exact KNN locks up the CPU, how do Vector DBs return results in 20 milliseconds?
They don’t do exact searches. They cheat.
Vector databases rely on Approximate Nearest Neighbor (ANN) algorithms.
They accept that finding the perfect match is computationally impossible at scale, so they settle for finding a very good match almost instantly.
The undisputed king of these algorithms is HNSW.
Do not let the academic name intimidate you. HNSW is just a multi-layered skip-list mapped over a proximity graph.
Imagine you are driving from New York to a specific house in a suburb of Los Angeles:
The Top Layer (Interstates)
You don’t take local roads across the country. You get on an interstate.
In HNSW, the top layer has very few nodes, but they have long-distance links.
The search algorithm drops in here and makes massive, cross-graph jumps toward the general cluster of the target.
The Middle Layers (City Roads)
Once you are near Los Angeles, you drop down a layer.
There are more nodes here, connected by shorter links.
You navigate to the correct neighborhood.
The Bottom Layer (Local Streets)
You drop to the base layer, which contains every single vector in the database, and you traverse the local streets until you hit the closest possible house (a local minimum).
HNSW is a masterpiece of algorithmic engineering. It reduces a catastrophic O(N) full table scan into a blisteringly fast O(log N) graph traversal.
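The layered descent can be illustrated with a deliberately simplified sketch: 1-D integers stand in for 1,536-D vectors, sorted skip-list-style layers stand in for the real proximity graph, and distance is plain absolute difference. None of this is a real HNSW implementation, just the shape of the traversal:

```python
# Toy HNSW-style search: greedy descent through progressively denser layers.
# 1-D integers stand in for vectors; distance is plain |a - b|.
points = list(range(1000))
layers = [points[::100], points[::10], points]  # top (sparse) -> bottom (every node)

def search(target: int) -> int:
    current = layers[0][0]                      # enter at the sparse top layer
    for layer in layers:
        while True:                             # greedy walk: follow a "link"
            i = layer.index(current)            # only while it gets us closer
            neighbors = [layer[j] for j in (i - 1, i + 1) if 0 <= j < len(layer)]
            best = min(neighbors + [current], key=lambda p: abs(p - target))
            if best == current:                 # local minimum on this layer:
                break                           # drop down to the denser layer
            current = best
    return current

print(search(737))  # -> 737: a few big hops on top, small hops at the bottom
```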
But it comes with a brutal physical cost called Pointer Chasing.
Why NVMe SSDs Hate HNSW (Pointer Chasing)
Traditional relational databases are famous for being disk-friendly because they exploit Locality of Reference.
B-Tree nodes are packed cleanly into contiguous 8KB blocks (pages) on your SSD. When Postgres needs an index node, it pulls a single 8KB block into memory, and all the sequential keys are right there next to each other.
HNSW graphs are the exact opposite.
An HNSW index is a giant, chaotic web of pointers (memory addresses) pointing to other pointers across multiple layers.
The nodes are scattered randomly across the heap during insertion.
Traversing this graph means jumping wildly from memory address to memory address.
If your HNSW index does not fit in RAM and is forced onto a Solid-State Drive, following those pointers requires thousands of Random Disk Reads per query.
A standard NVMe SSD takes roughly 100 microseconds (µs) to complete a random 4KB read. If an HNSW search requires 200 graph hops to find the nearest neighbor:
200 hops × 100 µs = 20 milliseconds of pure disk latency
That sounds fast, until you realize this is for one query.
If your app is doing 1,000 queries per second, your SSD must sustain 200,000 random IOPS.
Even the most expensive AWS io2 Block Express volumes will buckle under that kind of random I/O queue depth.
Your 20ms latency will instantly spike to 2 seconds as the disk controllers choke.
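The I/O budget above is simple multiplication; the hop count and read latency are the illustrative figures from the text, not measurements:

```python
# Back-of-the-envelope I/O budget for an SSD-resident HNSW index.
HOPS_PER_QUERY = 200      # graph hops per search (illustrative)
RANDOM_READ_US = 100      # ~100 us per random 4KB NVMe read
QPS = 1_000               # target queries per second

latency_ms = HOPS_PER_QUERY * RANDOM_READ_US / 1_000
iops_needed = HOPS_PER_QUERY * QPS

print(f"{latency_ms:.0f} ms of pure disk latency per query")   # 20 ms
print(f"{iops_needed:,} random IOPS to sustain {QPS:,} QPS")   # 200,000
```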
Furthermore, pointer chasing completely defeats the OS Page Cache and the CPU’s hardware prefetchers.
The CPU cannot guess which memory address the graph will jump to next, resulting in continuous L3 cache misses.
The RAM Tax
Here is the physical reality of vector search:
To get the sub-50ms latency that Vector DBs advertise, the entire HNSW index must physically live in RAM.
This brings us to the invoice: the cost of that RAM.
RAM is exponentially more expensive than NVMe storage. Let’s do the math on a production-scale deployment of 100 million vectors.
Vector Payload - 100,000,000 × 6 KB (1,536-dim floats) = 600 GB
HNSW Graph Overhead - Each node in HNSW maintains bidirectional pointers to its neighbors across multiple layers. This pointer overhead usually adds 30% to 50% to the base vector size. Let’s add 200 GB.
To serve 100 million vectors, you need 800 GB of RAM.
Storing 800 GB on a standard Postgres SSD volume costs about $65 a month. Holding 800 GB in an AWS r6a.32xlarge memory-optimized instance (1 TiB of RAM) costs about $5,300 a month.
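The same invoice in code, using the figures from the text. The 33% graph overhead and the ~$0.08/GB-month SSD price are assumptions chosen to match the numbers above:

```python
# The RAM tax: HNSW index size and the SSD alternative, at 100M vectors.
VECTORS = 100_000_000
BYTES_PER_VECTOR = 1536 * 4     # 1,536 dims x 4-byte floats
GRAPH_OVERHEAD = 0.33           # HNSW pointer overhead (assumed ~30-50%)

payload_gb = VECTORS * BYTES_PER_VECTOR / 1e9      # ~614 GB of raw vectors
total_gb = payload_gb * (1 + GRAPH_OVERHEAD)       # ~817 GB incl. graph

SSD_PRICE_PER_GB = 0.08         # assumed $/GB-month for a gp3-class volume
ssd_monthly = 800 * SSD_PRICE_PER_GB               # ~$64/month on disk

print(f"{total_gb:.0f} GB must live in RAM; the same data on SSD: ~${ssd_monthly:.0f}/mo")
```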
When you buy a dedicated Vector Database, you are not buying better “AI.”
You are buying an expensive fleet of high-RAM cloud instances to hold a massive, chaotic pointer graph in volatile memory because the algorithm physically cannot survive on a disk.
What should you do?
Do not subsidize a SaaS company’s valuation because you think standard infrastructure can’t handle vector math.
Here is the architectural framework you should use before signing a vendor contract:
Use Postgres (pgvector)
If you have fewer than 5 million vectors, you do not have a scale problem. You have a standard CRUD problem.
Install the pgvector extension on your existing Postgres instance.
Postgres is perfectly capable of building an HNSW index and holding it in its shared_buffers (RAM).
You save thousands of dollars, you eliminate a fragile network hop in your infrastructure, and you keep your relational data and your embeddings in the exact same ACID-compliant transaction block.
You can JOIN your semantic search results directly against your user permission tables in a single query.
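For concreteness, here is what that looks like with pgvector (version 0.5+ for HNSW support). The table and column names are hypothetical, and the SQL is shown as Python strings the way you would hand them to a driver such as psycopg:

```python
# Hypothetical schema: documents(id, body, embedding) and acl(user_id, doc_id).
# Standard pgvector DDL: enable the extension, add a column, build an HNSW index.
setup_sql = """
CREATE EXTENSION IF NOT EXISTS vector;
ALTER TABLE documents ADD COLUMN embedding vector(1536);
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
"""

# Semantic search JOINed against permissions in ONE query -- the thing a
# separate vector DB cannot do without a second network round trip.
# `<=>` is pgvector's cosine distance operator.
search_sql = """
SELECT d.id, d.body, d.embedding <=> %(query_vec)s AS distance
FROM documents d
JOIN acl a ON a.doc_id = d.id
WHERE a.user_id = %(user_id)s
ORDER BY d.embedding <=> %(query_vec)s
LIMIT 10;
"""

print(setup_sql.strip())
print(search_sql.strip())
```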
Use a Dedicated Vector DB (Pinecone, Milvus, Qdrant)
You only graduate to a dedicated vector engine when your index physically outgrows the RAM limits of a single, massive database instance.
When you hit 50 million, 100 million, or a billion vectors, a single Postgres node will OOM (Out of Memory) trying to fit the HNSW graph into shared_buffers.
That is the exact moment you pay a premium for a distributed vector database.
You are paying them to shard the massive HNSW graph across multiple clustered nodes and handle the distributed scatter-gather networking required to query it.
Conclusion
Architecture requires alignment between the physical realities of the hardware and the economic realities of the business.
HNSW is a brilliant algorithm, but it is an unapologetic memory glutton.
It defeats disk I/O, thrashes CPU caches, and demands expensive DDR5 RAM to function at scale.
Start with Postgres, monitor your memory saturation, and scale out to a distributed vector engine only when the physics of the graph demand it.
Don’t buy a distributed system until you have a distributed problem.


