Every vector DB comparison you've read has a winner, and the winner is whoever paid for the benchmark. We ran ours on our own infrastructure, with our own data, against our own ground truth. Same 1.2 million chunks, same embedding model, same 4,800 queries against pgvector, Pinecone, Qdrant, and Weaviate. We measured what actually matters in production: tail latency, real recall, total cost of ownership, and the ops burden that nobody tells you about.
Short version: no vendor took every prize. Qdrant won latency. Pinecone won recall. pgvector won cost. Weaviate won nothing outright but has a multi-modal story the others don't. The interesting answers are in the gaps between those headlines.
What we actually ran
The dataset is 1.2 million chunks from a mixed enterprise corpus: documentation, support transcripts, and internal knowledge base content. Each chunk was embedded with text-embedding-3-large at 3072 dimensions, then truncated to 1536, which keeps the bill down on the stores that charge per dimension. Every store was loaded with the same 1536-dimension vectors. No index tuning beyond each vendor's recommended defaults, except where noted for pgvector.
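For readers reproducing the truncation step, here is a minimal sketch. It assumes plain prefix truncation followed by L2 re-normalization, which is what keeps cosine and dot-product scores comparable after cutting a vector short. The function name is ours for illustration, and the post does not say exactly how the truncation was done.

```python
import math


def truncate_and_normalize(embedding, dims=1536):
    """Keep the first `dims` components and rescale to unit L2 norm.

    Assumption: simple prefix truncation. The function name and the
    default of 1536 are illustrative, not taken from the benchmark code.
    """
    if len(embedding) < dims:
        raise ValueError("embedding is shorter than the target dimension")
    head = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]
```

Skipping the re-normalization leaves vectors with norms below 1, which skews dot-product rankings on stores that assume unit-length input.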
Query set: 4,800 queries drawn from real retrieval logs, stratified by domain (technical, procedural, factual, ambiguous). Queries ran hot, no cache, no connection pooling tricks, from a single EC2 m6i.2xlarge in us-east-1. All managed services were in the same region. pgvector ran on RDS db.r6g.2xlarge.
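A serial, no-batching run like the one described can be sketched as below. The `search` argument is a hypothetical callable standing in for whichever client call hits a given store; nothing here is the harness we actually shipped, only the shape of it.

```python
import time


def time_queries_serially(search, queries):
    """Run queries one at a time and return per-query latency in milliseconds.

    `search` is a hypothetical callable (one query in, results out).
    Each call is timed end to end with a monotonic clock, and the next
    query starts only after the previous one returns (concurrency of 1).
    """
    latencies_ms = []
    for query in queries:
        start = time.perf_counter()
        search(query)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms
```

Timing on the client side means the numbers include network round trips, which is what an application in the same region would see.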
Recall@10 was measured against a ground-truth set built by running exhaustive brute-force search on the full corpus. This is the honest number, not the vendors' marketing recall, which is usually measured against their own ANN approximation.
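In code, the measurement looks roughly like this. It is a sketch under the assumption that each query has an exact top-k list from brute-force search and a top-k list from the store under test; the function names and the dict-of-lists layout are ours for illustration.

```python
def recall_at_k(retrieved_ids, exact_ids, k=10):
    """Fraction of the exact top-k neighbors that appear in the retrieved top-k."""
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("ground truth is empty for this query")
    hits = len(set(retrieved_ids[:k]) & truth)
    return hits / len(truth)


def mean_recall_at_k(retrieved_by_query, exact_by_query, k=10):
    """Average recall@k over every query in the ground-truth set.

    Both arguments map a query id to a ranked list of document ids
    (an illustrative layout, not the benchmark's storage format).
    """
    if not exact_by_query:
        raise ValueError("no queries to score")
    scores = [
        recall_at_k(retrieved_by_query[q], exact_by_query[q], k)
        for q in exact_by_query
    ]
    return sum(scores) / len(scores)
```

The key design choice is that `exact_ids` comes from exhaustive search over the full corpus, so an approximate index is never graded against another approximation.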
Embedding model: text-embedding-3-large · Dimensions: 1536 · Chunks: 1,200,000 · Query set: 4,800 · Region: us-east-1 · Concurrency: 1 (serial, no batching) · Index type: vendor default HNSW where applicable · Date: April 2026.
pgvector: RDS db.r6g.2xlarge, pg 16, pgvector 0.7. Pinecone: serverless (us-east-1). Qdrant: Cloud managed, 8 vCPU. Weaviate: Cloud managed, equivalent tier.
Latency: p50 and p99, cold queries
Latency is where the story gets complicated. The p50 numbers cluster tighter than you'd expect: managed services have converged on similar median performance. The p99 is where they separate. Qdrant had the lowest p99 by a meaningful margin. Weaviate's tail was the longest, driven by garbage collection pauses in its Go runtime on larger result sets.
| store | p50 | p99 | tail signal |
|---|---|---|---|
| pgvector | 42 ms | 188 ms | steady, ivfflat sweet spot |
| Pinecone | 32 ms | 154 ms | serverless, opaque internals |
| Qdrant | 26 ms | 101 ms | Rust, no GC pauses |
| Weaviate | 38 ms | 253 ms | Go GC on large result sets |
pgvector's p99 was better than we expected. The ivfflat index on RDS behaved more consistently than its reputation suggests, though at 1.2M vectors we were within its sweet spot. At 5M+ the story likely changes.
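For anyone recomputing p50 and p99 from their own latency samples, a minimal sketch follows. It assumes the nearest-rank percentile definition. The post does not state which definition was used, and interpolating methods will give slightly different tails.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% at or below it.

    Assumption: nearest-rank, with no interpolation between samples.
    """
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(latencies_ms):
    """Return the two numbers reported per store: median and tail."""
    return {"p50": percentile(latencies_ms, 50), "p99": percentile(latencies_ms, 99)}
```

With 4,800 serial queries, a p99 under this definition is set by the 48 slowest requests, which is why a handful of pauses can move it so much.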
Recall@10: the honest number
Recall is the most politically charged metric in any vector DB comparison because every vendor publishes a favorable version of it. Ours is measured against brute-force exact search: the actual ground truth, not an approximation of an approximation. Pinecone won this category. Its serverless architecture seems to run with higher HNSW ef parameters than the other managed services by default.
| store | recall@10 | config | notes |
|---|---|---|---|
| Pinecone | 0.965 | serverless default | highest in group, opaque defaults |
| Qdrant | 0.951 | HNSW default | within noise of Pinecone |
| Weaviate | 0.938 | HNSW default | middle of pack |
| pgvector | 0.912 | ivfflat, lists=256, probes=10 | 0.87 at lists=100, tuning required |
pgvector's recall was the most sensitive to the index configuration. With ivfflat at default lists=100, recall dropped to 0.87. We tuned to lists=256 and probes=10, which recovered it to 0.912. Still the floor of the group. If recall is your primary constraint, pgvector requires more tuning attention than the managed alternatives.
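The two knobs live in different places: `lists` is fixed when the index is built, while `probes` is a session setting applied at query time. The sketch below only builds the SQL strings so the shape is visible. The table name `chunks`, the column name `embedding`, and the cosine operator class are assumptions for illustration; the post does not state the schema or the distance metric.

```python
def ivfflat_statements(table="chunks", column="embedding", lists=256, probes=10):
    """Build the pgvector statements for an ivfflat index and its probe count.

    Assumptions: table and column names are hypothetical, and
    vector_cosine_ops is a guess at the distance metric.
    """
    create_index = (
        f"CREATE INDEX ON {table} USING ivfflat ({column} vector_cosine_ops) "
        f"WITH (lists = {int(lists)});"
    )
    # probes is set per session (or per transaction with SET LOCAL), not on the index
    set_probes = f"SET ivfflat.probes = {int(probes)};"
    return create_index, set_probes


create_index_sql, set_probes_sql = ivfflat_statements()
print(create_index_sql)
print(set_probes_sql)
```

Changing `lists` means rebuilding the index, whereas `probes` can be raised or lowered per query, so it is the cheaper knob to experiment with when trading latency for recall.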
The gap between Pinecone and Qdrant (0.965 vs 0.951) is real but arguably within the noise of what matters downstream. In our use case, at k=10, the 0.014 recall difference works out to roughly 1.4 additional true neighbors per 100 results returned, or about 14 per 100 queries: meaningful for some applications, not for others.
Cost: $/1M reads at production volume
Cost comparisons are always approximate because pricing models differ structurally. Pinecone charges per read unit, Qdrant Cloud charges for compute, pgvector charges for the RDS instance. We modeled 1M reads per month at our observed query distribution and the minimum viable configuration to serve it without cold-start latency.
pgvector's cost advantage is real but comes with a catch: it has a fixed floor. At low read volumes, you're paying for an RDS instance whether you use it or not. Pinecone's serverless model has no floor. You pay only for what you read. The crossover point, where pgvector becomes cheaper than Pinecone, is approximately 400K reads/month. Below that threshold, Pinecone's serverless model is actually cheaper in absolute terms.
At high read volumes (10M+/month), pgvector's effective cost per read continues to fall because the fixed instance cost is spread over more reads, while the managed services' bills climb with volume. If you're already running Postgres and have the traffic to justify the floor, the total cost of ownership math becomes compelling.
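The crossover logic is easy to rerun with your own prices. The sketch below assumes the simplest possible model: one store with a flat monthly cost and no per-read charge, and one store with a pure per-read price and no floor. The function names are ours and the dollar figures in the example are placeholders chosen for round arithmetic, not the prices behind our numbers.

```python
def crossover_reads(fixed_monthly_usd, usage_usd_per_million_reads):
    """Monthly read volume at which a flat-cost store matches a usage-priced one.

    Assumption: the flat-cost store has no per-read charge and the
    usage-priced store has no monthly floor.
    """
    if usage_usd_per_million_reads <= 0:
        raise ValueError("usage price must be positive")
    return fixed_monthly_usd / usage_usd_per_million_reads * 1_000_000


def cheaper_store(reads_per_month, fixed_monthly_usd, usage_usd_per_million_reads):
    """Name which pricing model is cheaper at a given monthly read volume."""
    usage_cost = reads_per_month / 1_000_000 * usage_usd_per_million_reads
    return "fixed" if fixed_monthly_usd < usage_cost else "usage"


# Placeholder prices only: a $200/month floor against $400 per million reads.
print(crossover_reads(200.0, 400.0))
print(cheaper_store(100_000, 200.0, 400.0))
print(cheaper_store(2_000_000, 200.0, 400.0))
```

Real bills add storage, write units, and egress on the usage side and replicas and backups on the fixed side, so treat the output as a first-order estimate.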
Ops complexity: what nobody tells you
This is the metric that never shows up in vendor benchmarks and matters most in practice. We rated ops complexity across three dimensions: initial setup, ongoing maintenance, and observability. Scores are 1 to 10. For setup and maintenance, higher means more operational burden. Observability is scored the other way around: higher means better tooling.
| dimension | pgvector | Pinecone | Qdrant | Weaviate |
|---|---|---|---|---|
| initial setup | 7 / 10 | 2 / 10 | 3 / 10 | 4.5 / 10 |
| ongoing maintenance | 6.5 / 10 | 1 / 10 | 2.5 / 10 | 4 / 10 |
| observability (higher = better tools) | 9 / 10 | 3.8 / 10 | 6.2 / 10 | 5.5 / 10 |
pgvector's observability score is its most underrated advantage. If your team already operates Postgres, you get pg_stat_statements, EXPLAIN ANALYZE, standard slow query logging, and every monitoring integration your stack already has wired up. The operational knowledge transfer is near-zero.
Pinecone's maintenance burden is genuinely the lowest of the group. It's not close. But its observability is opaque: you get what the dashboard shows, and what the dashboard shows is not enough to debug a recall regression or latency spike without opening a support ticket.
Who should use what
| store | p50 | p99 | recall@10 | $/1M | ops |
|---|---|---|---|---|---|
| pgvector | 42 ms | 188 ms | 0.912 | $1.20 ★ | high |
| Pinecone | 32 ms | 154 ms | 0.965 ★ | $8.40 | lowest ★ |
| Qdrant | 26 ms ★ | 101 ms ★ | 0.951 | $3.10 | low |
| Weaviate | 38 ms | 253 ms | 0.938 | $4.80 | medium |
The uncomfortable truth: if you're optimizing for a single metric, the answer is easy. If you're optimizing across latency, cost, recall, and ops burden simultaneously, Qdrant is the closest thing to an all-rounder at this dataset size. Latency is the only category it wins outright, but it has no serious weak points either. That matters in production.
If you'd like us to look at which store is the right call for your retrieval workload, the contact form is the fastest way. We do 30-minute reviews for production systems, free.